


Title:
METHODS AND PRODUCTS FOR FUSION PROTEIN SYNTHESIS
Document Type and Number:
WIPO Patent Application WO/2016/193746
Kind Code:
A1
Abstract:
The present invention provides a method of producing a fusion protein, said method comprising: a) contacting a first protein with a second protein under conditions that enable the formation of an isopeptide bond between said proteins, wherein said first protein and said second protein each comprise a peptide linker, wherein said peptide linkers are a pair of peptide linkers which react to form an isopeptide bond that links said first protein to said second protein to form a linked protein; and b) contacting the linked protein from (a) with a third protein under conditions that enable the formation of an isopeptide bond between said third protein and said linked protein, wherein said third protein comprises a peptide linker which reacts with a further peptide linker in the linked protein from (a), and wherein said peptide linkers are a pair of peptide linkers that react to form an isopeptide bond that links said third protein to said linked protein to form a fusion protein, wherein said pair of peptide linkers used in (a) are orthogonal to the pair of peptide linkers used in (b). Peptide linkers and the use of orthogonal pairs of said linkers in the synthesis of fusion proteins are also provided. Recombinant proteins comprising said linkers, nucleic acid molecules encoding said proteins and linkers, vectors comprising said nucleic acid molecules and host cells comprising said vectors and nucleic acid molecules are also contemplated.

Inventors:
HOWARTH MARK (GB)
Application Number:
PCT/GB2016/051640
Publication Date:
December 08, 2016
Filing Date:
June 03, 2016
Assignee:
UNIV OXFORD INNOVATION LTD (GB)
International Classes:
C07K14/315
Domestic Patent References:
WO2011098772A1 (2011-08-18)
WO2012127060A1 (2012-09-27)
WO2012142113A2 (2012-10-18)
Other References:
DIXON EMMA K ET AL: "Nonenzymatic assembly of branched polyubiquitin chains for structural and biochemical studies", BIOORGANIC & MEDICINAL CHEMISTRY, PERGAMON, GB, vol. 21, no. 12, 15 March 2013 (2013-03-15), pages 3421 - 3429, XP028553075, ISSN: 0968-0896, DOI: 10.1016/J.BMC.2013.02.052
MICHAEL FAIRHEAD ET AL: "SpyAvidin Hubs Enable Precise and Ultrastable Orthogonal Nanoassembly", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 136, no. 35, 3 September 2014 (2014-09-03), pages 12355 - 12363, XP055200529, ISSN: 0002-7863, DOI: 10.1021/ja505584f
JACOB O. FIERER ET AL: "SpyLigase peptide-peptide ligation polymerizes affibodies to enhance magnetic cancer cell capture", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 111, no. 13, 17 March 2014 (2014-03-17), US, pages E1176 - E1181, XP055296264, ISSN: 0027-8424, DOI: 10.1073/pnas.1315776111
VEGGIANI GIANLUCA ET AL: "Superglue from bacteria: unbreakable bridges for protein nanotechnology", TRENDS IN BIOTECHNOLOGY, ELSEVIER PUBLICATIONS, CAMBRIDGE, GB, vol. 32, no. 10, 26 August 2014 (2014-08-26), pages 506 - 512, XP029064532, ISSN: 0167-7799, DOI: 10.1016/J.TIBTECH.2014.08.001
HU X ET AL: "Autocatalytic intramolecular isopeptide bond formation in Gram-positive bacterial pili: A QM/MM simulation", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, AMERICAN CHEMICAL SOCIETY, US, vol. 133, no. 3, 26 January 2011 (2011-01-26), pages 478 - 485, XP002631440, ISSN: 0002-7863, [retrieved on 20101213], DOI: 10.1021/JA107513T
SAYANI DASGUPTA ET AL: "Isopeptide Ligation Catalyzed by Quintessential Sortase A", JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 286, no. 27, 8 July 2011 (2011-07-08), US, pages 23996 - 24006, XP055234876, ISSN: 0021-9258, DOI: 10.1074/jbc.M111.247650
GIANLUCA VEGGIANI ET AL: "Programmable polyproteams built using twin peptide superglues", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 113, no. 5, 19 January 2016 (2016-01-19), US, pages 1202 - 1207, XP055295991, ISSN: 0027-8424, DOI: 10.1073/pnas.1519214113
Attorney, Agent or Firm:
WILKINS, Christopher (10 Salisbury Square, London, Greater London EC4Y 8JD, GB)
Claims:

1. A method of producing a fusion protein, said method comprising:

a) contacting a first protein with a second protein under conditions that enable the formation of an isopeptide bond between said proteins, wherein said first protein and said second protein each comprise a peptide linker, wherein said peptide linkers are a pair of peptide linkers which react to form an isopeptide bond that links said first protein to said second protein to form a linked protein; and

b) contacting the linked protein from (a) with a third protein under conditions that enable the formation of an isopeptide bond between said third protein and said linked protein, wherein said third protein comprises a peptide linker which reacts with a further peptide linker in the linked protein from (a), and wherein said peptide linkers are a pair of peptide linkers that react to form an isopeptide bond that links said third protein to said linked protein to form a fusion protein,

wherein said pair of peptide linkers used in (a) are orthogonal to the pair of peptide linkers used in (b).

2. The method of claim 1, being a method of producing a fusion protein, said method comprising:

a) providing a first protein comprising a first peptide linker;

b) contacting said first protein with a second protein, wherein said second protein comprises a second peptide linker and a third peptide linker, under conditions that enable said first peptide linker and said second peptide linker to form an isopeptide bond, thereby linking said first and second proteins; and

c) contacting said linked first and second proteins with a third protein, wherein said third protein comprises a fourth peptide linker, under conditions that enable said third peptide linker and said fourth peptide linker to form an isopeptide bond, thereby linking said second and third proteins to produce a fusion protein, wherein said first and second peptide linkers are a pair of peptide linkers that are orthogonal to the pair of peptide linkers consisting of said third and fourth peptide linkers.

3. The method of claim 1 or 2, wherein the method further comprises a step of extending the fusion protein, wherein the new protein to be linked to the fusion protein comprises a peptide linker that forms part of a pair of peptide linkers that is orthogonal to the pair of peptide linkers used to form the previous isopeptide bond in the fusion protein, wherein the peptide linker in the new protein is capable of forming an isopeptide bond with a peptide linker in a protein of the fusion protein, said method comprising contacting said new protein with said fusion protein under conditions that enable said new protein to form an isopeptide bond with a peptide linker in the fusion protein.

4. The method of any one of claims 1 to 3, wherein the fusion protein has a branched, linear or circular structure.

5. The method of claim 4, wherein the fusion protein is circularisable.

6. The method of any one of claims 1 to 5, wherein the formation of the isopeptide bond between said peptide linkers is spontaneous.

7. The method of any one of claims 1 to 5, wherein formation of the isopeptide bond between said peptide linkers is induced by a component that is added to the reaction.

8. The method of any one of claims 1 to 7, wherein each of said pair of peptide linkers is derived from an isopeptide protein.

9. The method of claim 8, wherein each pair of peptide linkers is derived from a different isopeptide protein.

10. The method of claim 8 or 9, wherein the isopeptide protein comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 21, 23, 25, 27, 29 or 31 or a protein with at least 70% sequence identity to an amino acid sequence as set forth in any one of SEQ ID NOs: 21, 23, 25, 27, 29 or 31.

11. The method of any one of claims 7 to 10, wherein the component that induces the formation of the isopeptide bond between said peptide linkers is a peptide ligase, preferably wherein said peptide ligase comprises a glutamic acid or aspartic acid residue that induces the formation of the isopeptide bond between said peptide linkers.

12. The method of claim 11, wherein the peptide ligase is derived from an isopeptide protein, preferably wherein the peptide ligase is derived from the same isopeptide protein as the pair of peptide linkers between which it induces the formation of an isopeptide bond.

13. The method of any one of claims 1 to 12, wherein one or more of the peptide linkers comprises a blocking group.

14. The method of claim 13, wherein the blocking group prevents the formation of an isopeptide bond between a cognate pair of peptide linkers.

15. The method of any one of claims 1 to 14, wherein each peptide linker in a pair of peptide linkers comprises at least 6 amino acids.

16. The method of any one of claims 1 to 15, wherein one peptide linker in each pair comprises 6-50 amino acids and the other peptide linker in said pair comprises 50-300 amino acids.

17. The method of any one of claims 1 to 16, wherein one peptide linker in each pair comprises a lysine residue and the other peptide linker in said pair comprises an asparagine or aspartate residue, wherein said residues are involved in the formation of an isopeptide bond.

18. The method of any one of claims 11 to 17, wherein said peptide ligase comprises 50-300 amino acids.

19. The method of any one of claims 1 to 18, wherein the orthogonal pairs of peptide linkers are selected from any one of:

(1) a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 1 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 1, wherein said amino acid sequence comprises a lysine residue at position 9 and a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 2 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 2, wherein said amino acid sequence comprises a glutamate or aspartate residue at position 55, a threonine residue at position 94, a glycine residue at position 100 and an asparagine or aspartate residue at position 106;

(2) a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 5 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 5, wherein said amino acid sequence comprises an aspartate or asparagine residue at position 8 and a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 6 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 6, wherein said amino acid sequence comprises a lysine residue at position 8;

(3) a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 9 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 9, wherein said amino acid sequence comprises an asparagine or aspartate residue at position 17 and a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 10 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 10, wherein said amino acid sequence comprises a lysine residue at position 9 and a glutamate or aspartate residue at position 70;

(4) a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 109 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 109, wherein said amino acid sequence comprises an asparagine or aspartate residue at position 17, a glycine residue at position 11 and optionally an isoleucine residue at position 20, a proline residue at positions 21 and 22 and a lysine residue at position 23, and a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 10 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 10, wherein said amino acid sequence comprises a lysine residue at position 9 and a glutamate or aspartate residue at position 70;

(5) a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 13 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 13, wherein said amino acid sequence comprises an aspartate or asparagine residue at position 7 and a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 14 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 14, wherein said amino acid sequence comprises a glutamate or aspartate residue at position 56, and a lysine residue at position 10;

(6) a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 13 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 13, wherein said amino acid sequence comprises an aspartate or asparagine residue at position 7 and a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 33 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 33, wherein said amino acid sequence comprises a lysine residue at position 8; and

(7) a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 17 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 17, wherein said amino acid sequence comprises an aspartate or asparagine residue at position 11 and a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 18 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 18, wherein said amino acid sequence comprises a glutamate or aspartate residue at position 241 and a lysine residue at position 162.

20. The method of claim 19, wherein the orthogonal pairs of peptide linkers comprise (1) and (4), (1) and (5), (1) and (6), (1) and (3), (1) and (2), (2) and (5), (2) and (6), (3) and (5), (3) and (6), (4) and (5) or (4) and (6).

21. The method of any one of claims 1 to 20, wherein said method is performed on a solid phase.

22. The method of claim 21, further comprising a step of eluting the fusion protein from the solid phase.

23. A peptide linker comprising:

(i) an amino acid sequence as set forth in SEQ ID NO: 1 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 1, wherein said amino acid sequence comprises a lysine residue at position 9;

(ii) an amino acid sequence as set forth in SEQ ID NO: 2 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 2, wherein said amino acid sequence comprises a glutamate or aspartate residue at position 55, a threonine residue at position 94, a glycine residue at position 100 and an asparagine or aspartate residue at position 106;

(iii) an amino acid sequence as set forth in SEQ ID NO: 5 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 5, wherein said amino acid sequence comprises an aspartate or asparagine residue at position 8;

(iv) an amino acid sequence as set forth in SEQ ID NO: 6 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 6, wherein said amino acid sequence comprises a lysine residue at position 8;

(v) an amino acid sequence as set forth in SEQ ID NO: 9 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 9, wherein said amino acid sequence comprises an asparagine or aspartate residue at position 17, preferably wherein said linker comprises an amino acid sequence as set forth in SEQ ID NO: 109 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 109, wherein said amino acid sequence comprises an asparagine or aspartate residue at position 17, a glycine residue at position 11 and optionally an isoleucine residue at position 20, a proline residue at positions 21 and 22 and a lysine residue at position 23; or

(vi) an amino acid sequence as set forth in SEQ ID NO: 10 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 10, wherein said amino acid sequence comprises a lysine residue at position 9 and a glutamate or aspartate residue at position 70.

24. The peptide linker of claim 23, wherein:

(a) the peptide linker in (i) comprises an amino acid sequence as set forth in SEQ ID NO: 38; and/or

(b) the peptide linker in (ii) comprises an amino acid sequence as set forth in SEQ ID NO: 39; and/or

(c) the peptide linker in (iii) comprises an amino acid sequence as set forth in SEQ ID NO: 42; and/or

(d) the peptide linker in (iv) comprises an amino acid sequence as set forth in SEQ ID NO: 43; and/or

(e) the peptide linker in (v) comprises an amino acid sequence as set forth in SEQ ID NO: 46; and/or

(f) the peptide linker in (vi) comprises an amino acid sequence as set forth in SEQ ID NO: 47.

25. The peptide linker of claim 23(i), wherein the linker is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as defined in claim 23(ii), preferably wherein the peptide linker of claim 23(i) is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 2.

26. The peptide linker of claim 23(ii), wherein the linker is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as defined in claim 23(i), preferably wherein the peptide linker of claim 23(ii) is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 1.

27. The peptide linker of claim 23(iii), wherein the linker is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as defined in claim 23(iv), preferably wherein the peptide linker of claim 23(iii) is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 6.

28. The peptide linker of claim 23(iv), wherein the linker is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as defined in claim 23(iii), preferably wherein the peptide linker of claim 23(iv) is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 5.

29. The peptide linker of claim 23(v), wherein the linker is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as defined in claim 23(vi), preferably wherein the peptide linker of claim 23(v) is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 10.

30. The peptide linker of claim 23(vi), wherein the linker is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as defined in claim 23(v), preferably wherein the peptide linker of claim 23(vi) is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 9 or SEQ ID NO: 109.

31. A pair of peptide linkers for use in the method of any one of claims 1 to 22 comprising:

(1) a peptide linker as defined in claim 23(i) and a peptide linker as defined in claim 23(ii);

(2) a peptide linker as defined in claim 23(iii) and a peptide linker as defined in claim 23(iv); or

(3) a peptide linker as defined in claim 23(v) and a peptide linker as defined in claim 23(vi).

32. A recombinant or synthetic polypeptide comprising a polypeptide and a peptide linker as defined in any one of claims 23 to 30.

33. The polypeptide of claim 32, wherein said polypeptide comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 50-59 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in any one of SEQ ID NOs: 50-59, wherein said polypeptides comprise a peptide linker as defined in any one of claims 23 to 30.

34. A nucleic acid molecule encoding a peptide linker as defined in any one of claims 23 to 30 or a polypeptide as defined in claim 32 or 33.

35. The nucleic acid molecule of claim 34, wherein the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 3, 4, 7, 8, 11, 12, 40, 41, 44, 45, 48, 49 or 60-69 or a nucleotide sequence with at least 70% sequence identity to a sequence as set forth in any one of SEQ ID NOs: 3, 4, 7, 8, 11, 12, 40, 41, 44, 45, 48, 49 or 60-69.

36. A vector comprising the nucleic acid molecule of claim 34 or 35.

37. A recombinant host cell containing a nucleic acid molecule and/or vector as defined in any one of claims 34 to 36.

38. A kit comprising:

(a) a recombinant or synthetic polypeptide comprising a peptide linker as defined in any one of claims 23(i), (iii) or (v), 25, 27 or 29; and

(b) a recombinant or synthetic polypeptide comprising a peptide linker as defined in any one of claims 23(ii), (iv) or (vi), 26, 28 or 30; and/or

(c) a nucleic acid molecule encoding a peptide linker as defined in any one of claims 23(i), (iii) or (v), 25, 27 or 29; and/or

(d) a nucleic acid molecule encoding a peptide or polypeptide linker as defined in any one of claims 23(ii), (iv) or (vi), 26, 28 or 30,

optionally wherein the recombinant or synthetic polypeptide of (a) and/or (b) comprises a further peptide linker that is part of a pair of peptide linkers that are orthogonal to the peptide linkers in the polypeptides of (a) and (b).

39. A fusion protein obtained or obtainable from a method of any one of claims 1 to 22.

40. A solid substrate comprising at least one fusion protein obtained or obtainable from the method of any one of claims 1 to 22.

41. The solid substrate of claim 40, wherein said substrate is an array.

42. A library of fusion proteins comprising at least two fusion proteins obtained or obtainable from the method of any one of claims 1 to 22.

43. Use of at least two orthogonal pairs of peptide linkers for the production of a fusion protein, wherein each pair of peptide linkers reacts to form an isopeptide bond.

44. The use of claim 43, wherein the fusion protein, isopeptide bond, peptide linkers, and peptide ligase are as defined in any one of claims 1 to 31.
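Claims 10, 19 and 23 define linker variants as sequences with "at least 70% sequence identity" to a reference sequence that also retain specific residues at numbered positions. The sketch below shows one simple way to read that combined test. It is an illustration only, under stated assumptions: the two sequences are taken to be already aligned and of equal length, identity is counted over the full reference length, and the 10-residue reference is a hypothetical toy sequence, not any of the SEQ ID NOs, which are not reproduced in this text. The claims quoted here do not specify an alignment method, and the function names are hypothetical.

```python
from typing import Mapping


def percent_identity(query: str, reference: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match.

    Simplifying assumption: no gaps and no realignment; real comparisons would
    normally run a proper pairwise alignment first.
    """
    if not reference or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_variant_definition(
    query: str,
    reference: str,
    required: Mapping[int, str],
    min_identity: float = 70.0,
) -> bool:
    """True if `query` is at least `min_identity` % identical to `reference` AND
    carries an allowed residue at every required 1-based position, e.g. {9: "K"},
    or {17: "ND"} for "an asparagine or aspartate residue at position 17"."""
    if percent_identity(query, reference) < min_identity:
        return False
    for position, allowed in required.items():
        if not 1 <= position <= len(query):
            raise ValueError(f"position {position} is outside the sequence")
        if query[position - 1].upper() not in allowed.upper():
            return False
    return True


if __name__ == "__main__":
    # Hypothetical toy reference with a lysine at position 9 (not a real SEQ ID NO).
    reference = "ACDEFGHIKL"
    print(meets_variant_definition("ACDQFGHMKL", reference, {9: "K"}))  # 80 %, Lys kept
    print(meets_variant_definition("ACDQFGHMRL", reference, {9: "K"}))  # 70 %, Lys lost
    print(meets_variant_definition("AQDQFGWMKL", reference, {9: "K"}))  # 60 % identity
```

The three calls print `True`, `False` and `False`: the second variant reaches the 70% threshold exactly but fails because the lysine at position 9 has been replaced, which mirrors how the claims pair an identity threshold with fixed reactive residues.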

Description:
Methods and products for fusion protein synthesis

The present invention relates to the synthesis (i.e. production, generation or assembly) of a fusion protein (i.e. a polymer comprising two or more covalently linked proteins, as defined below) and in particular to the modular (e.g. step-wise) synthesis of a fusion protein using orthogonal pairs of peptide linkers which react to form isopeptide bonds. The invention concerns the provision of a new method for synthesizing fusion proteins, particularly solid-phase synthesis. The method can advantageously be used in the production of a variety of products comprising fusion proteins, e.g. fusion protein arrays. The present invention also provides peptide linkers and the use of orthogonal pairs of said linkers in the synthesis of fusion proteins. Recombinant proteins comprising said linkers, nucleic acid molecules encoding said proteins and linkers, vectors comprising said nucleic acid molecules and host cells comprising said vectors and nucleic acid molecules are also provided. A kit comprising said recombinant polypeptides and/or nucleic acid molecules/vectors is also provided. Fusion proteins obtained by the method of the invention and products, e.g. arrays and libraries, comprising said fusion proteins are also contemplated.

Biological events usually depend on the cooperative activity of multiple proteins and the precise arrangement of proteins in complexes influences and determines their function. Thus, the ability to arrange individual proteins in a complex in a controlled manner represents a useful tool in characterising protein functions. Moreover, the conjugation of multiple proteins to form a so-called "fusion protein" can result in molecules with useful characteristics. For instance, clustering a single kind of protein often greatly enhances biological signals, e.g. the repeating antigen structures on vaccines. Clustering proteins with different activities can also result in complexes with improved activities, e.g. substrate channelling by enzymes.

However, clustering different kinds of proteins into precise artificial "fusion proteins" has encountered numerous problems. For instance, individual proteins or protein domains can be joined genetically into one long open reading frame, but errors in protein synthesis and misfolding soon become limiting. Alternative methods have focussed on expressing proteins or protein domains individually and then linking these "modules" or "units" together. For instance, methods have focussed on modifying proteins to contain well characterised interaction partners, such as biotin/avidin, thereby enabling proteins to form complexes through non-covalent interactions. Other methods have relied on reactive groups within the proteins, particularly cysteine residues, to link proteins through covalent bonds, i.e. disulfide bridges. However, even the best non-covalent linkages or reversible covalent linkages allow the rearrangement of fusion proteins. Accordingly, existing methods are limited, insofar as they commonly result in undefined mixtures of fusion proteins that are difficult to separate and/or fusion proteins that are not stable across a variety of environments, e.g. in reducing conditions.

Important features of a system for synthesizing fusion proteins include molecularly-defined connections between the individual proteins (i.e. modules, domains or units) within said fusion protein, independence from any template, and simple expression of each protein (i.e. module, domain or unit). It is also highly desirable to have a near quantitative yield for each reaction to minimize the inadvertent synthesis of heterogeneous products after just a few steps, which is a common consequence of incomplete chains within the mixture. It is also preferable for individual modules to be modified with relatively small peptide tags rather than large protein fusion domains, for minimal disruption to the function of each module within the fusion protein. However, no existing fusion protein synthesis method has been able to fulfil these criteria.
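The need for a near-quantitative yield at each step follows from simple arithmetic. Assuming, for illustration, that every coupling step has the same independent yield p, the fraction of chains that are full length after n sequential additions is p^n, and the remainder consists of truncated chains. The short sketch below evaluates this; the per-step yields are arbitrary example values, not data from the source, and the function name is hypothetical.

```python
def overall_yield(per_step: float, steps: int) -> float:
    """Fraction of chains that are full length after `steps` sequential couplings,
    assuming every step has the same independent yield `per_step` (0 to 1)."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    for p in (0.90, 0.95, 0.99):
        full = overall_yield(p, 5)
        print(f"{p:.0%} per step, 5 steps: {full:.1%} full length, "
              f"{1 - full:.1%} truncated")
```

For five additions, a 95% per-step yield leaves about 77% of chains full length (0.95^5 ≈ 0.774), whereas 99% per step leaves about 95%, which is why small losses per step quickly produce heterogeneous mixtures.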

Thus, there is a need and desire for an improved method for synthesizing fusion proteins and it has now been found that peptide linkers that form isopeptide bonds to generate irreversible covalent linkages can be used in a modular (e.g. step-wise), and high-yielding, method for synthesizing a fusion protein.

Isopeptide bonds are amide bonds formed between carboxyl/carboxamide and amino groups, where at least one of the carboxyl or amino groups is outside of the protein main-chain (the backbone of the protein). Such bonds are chemically irreversible under biological conditions and they are resistant to most proteases. In fact, isopeptide bonds between proteins have been determined to be the strongest measured protein interaction.

Isopeptide bond formation can be enzyme catalyzed, for example by transglutaminase enzymes. Isopeptide bonds are commonly found in natural environments to improve the strength and/or stability of a protein complex, e.g. the stabilization of extracellular matrix structures or reinforcement of blood clots.

Isopeptide bonds may also form spontaneously, as has been identified in HK97 bacteriophage capsid formation and Gram-positive bacterial pili.

Spontaneous isopeptide bond formation has been proposed to occur after protein folding, through nucleophilic attack of the ε-amino group from a lysine on the Cγ group of an asparagine or aspartate, promoted by a nearby glutamate or aspartate.
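As a worked illustration of the chemistry described above (standard amide-condensation stoichiometry, not a scheme reproduced from the source), the lysine ε-amino group condenses with the side-chain Cγ carbonyl of asparagine, releasing ammonia, or of aspartate, releasing water. Here "Lys", "Asn" and "Asp" stand for the rest of each residue (the Cα and main chain). The glutamate or aspartate that promotes the reaction is not consumed and therefore does not appear in the net equations.

```latex
\begin{aligned}
\mathrm{Lys}{-}(\mathrm{CH_2})_4{-}\mathrm{NH_2}
  + \mathrm{H_2N}{-}\mathrm{C}({=}\mathrm{O}){-}\mathrm{CH_2}{-}\mathrm{Asn}
  &\longrightarrow
  \mathrm{Lys}{-}(\mathrm{CH_2})_4{-}\mathrm{NH}{-}\mathrm{C}({=}\mathrm{O}){-}\mathrm{CH_2}{-}\mathrm{Asn}
  + \mathrm{NH_3} \\
\mathrm{Lys}{-}(\mathrm{CH_2})_4{-}\mathrm{NH_2}
  + \mathrm{HO}{-}\mathrm{C}({=}\mathrm{O}){-}\mathrm{CH_2}{-}\mathrm{Asp}
  &\longrightarrow
  \mathrm{Lys}{-}(\mathrm{CH_2})_4{-}\mathrm{NH}{-}\mathrm{C}({=}\mathrm{O}){-}\mathrm{CH_2}{-}\mathrm{Asp}
  + \mathrm{H_2O}
\end{aligned}
```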

Proteins that are capable of spontaneous isopeptide bond formation have been advantageously used to develop peptide tag/binding partner pairs which covalently bind to each other and which hence provide irreversible interactions (see e.g. WO2011/098772 herein incorporated by reference). In this respect, proteins which are capable of spontaneous isopeptide bond formation may be expressed as separate fragments, to give a peptide tag and a binding partner for the peptide tag, where the two fragments are capable of covalently reconstituting by isopeptide bond formation. The isopeptide bond formed by the peptide tag and binding partner pairs is stable under conditions where non-covalent interactions would rapidly dissociate, e.g. over long periods of time (e.g. weeks), at high temperature (to at least 95 °C), at high force, or with harsh chemical treatment (e.g. pH 2-11, organic solvent, detergents or denaturants).

In brief, a peptide tag/binding partner pair may be derived from any protein capable of spontaneously forming an isopeptide bond (an isopeptide protein), wherein the domains of the protein are expressed separately to produce a peptide tag that comprises one of the residues involved in the isopeptide bond (e.g. a lysine) and a peptide binding partner that comprises the other residue involved in the isopeptide bond (e.g. an asparagine or aspartate). In some instances, one of the peptide tag or binding partner comprises one or more other residues required to form the isopeptide bond (e.g. a glutamate). However, it has been found that it is possible to express the domains comprising the residues involved in isopeptide bond formation separately, i.e. as three separate peptides (domains, modules or units). In this respect, the peptide tag comprises one of the residues involved in the isopeptide bond (e.g. a lysine), the peptide binding partner comprises the other residue involved in the isopeptide bond (e.g. an asparagine or aspartate) and a third peptide comprises the one or more other residues involved in isopeptide bond formation (e.g. a glutamate). Mixing all three peptides results in the formation of an isopeptide bond between the two peptides comprising the residues that react to form the isopeptide bond, i.e. the peptide tag and binding partner. Thus, the third peptide mediates the conjugation of the peptide tag and binding partner but does not form part of the resultant structure, i.e. the third peptide is not covalently linked to the peptide tag or binding partner. As such, the third peptide may be viewed as a protein ligase or peptide ligase. This is particularly useful as it minimises the size of the peptide tag and binding partner that need to be fused to the protein of interest, thereby reducing the possibility of unwanted interactions caused by the addition of the peptide tag or binding partner, e.g. misfolding.
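The difference between the two-part (tag plus binding partner) and three-part (tag, binding partner and ligase) arrangements can be captured in a toy model. The sketch below is a deliberate simplification, not a kinetic or structural model: each peptide is reduced to the roles it carries (a reactive lysine "K", a reactive asparagine/aspartate "N" and a promoting glutamate/aspartate "E"), a bond is assumed to form whenever all three roles are present in the mixture, and only the K- and N-carrying peptides end up in the covalent product. The `Peptide` class and `ligate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Peptide:
    name: str
    # Roles carried by this peptide: "K" (reactive Lys), "N" (reactive Asn/Asp),
    # "E" (promoting Glu/Asp). A coarse, hypothetical representation.
    roles: FrozenSet[str]


def ligate(mixture: Iterable[Peptide]) -> Optional[FrozenSet[str]]:
    """Return the names of the peptides joined by the isopeptide bond, or None.

    Toy rule: a bond forms only if the mixture contains a K-carrier, a separate
    N-carrier and an E-carrier. The product contains only the K- and N-carriers,
    so a stand-alone E-carrier (a ligase) is never incorporated.
    """
    peptides = list(mixture)
    tags = [p for p in peptides if "K" in p.roles]
    partners = [p for p in peptides if "N" in p.roles and p not in tags]
    has_promoter = any("E" in p.roles for p in peptides)
    if not (tags and partners and has_promoter):
        return None
    return frozenset({tags[0].name, partners[0].name})


if __name__ == "__main__":
    tag = Peptide("tag", frozenset({"K"}))
    partner = Peptide("partner", frozenset({"N"}))
    ligase = Peptide("ligase", frozenset({"E"}))
    partner_with_e = Peptide("partner+E", frozenset({"N", "E"}))

    print(ligate([tag, partner]))          # no promoter present: None
    print(ligate([tag, partner, ligase]))  # three-part: ligase not in product
    print(ligate([tag, partner_with_e]))   # two-part: Glu built into the partner
```

In this model the ligase is required for the reaction yet absent from the product, which is the property the passage highlights as allowing smaller tags to be fused to each protein.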

As discussed in more detail below, various proteins which are capable of spontaneously forming one or more isopeptide bonds (a so-called "isopeptide protein") have been identified and may be modified to produce a peptide tag/binding partner pair and optionally a peptide ligase, as discussed above. Further proteins that are capable of spontaneously forming one or more isopeptide bonds may be identified by comparing their structures with those of proteins which are known to spontaneously form one or more isopeptide bonds. Particularly, other proteins which may spontaneously form an isopeptide bond may be identified by comparing their crystal structures with those from known isopeptide proteins, e.g. the major pilin protein Spy0128, and in particular comparing the Lys-Asn/Asp-Glu/Asp residues often involved in the formation of an isopeptide bond. Additionally, other isopeptide proteins may be identified by screening the Protein Data Bank for structural homologues of known isopeptide proteins using standard database searching tools. The SPASM server (http://eds.bmc.uu.se/eds/spana.php7spasm) may be used to target the 3D structural template of Lys-Asn/Asp-Glu/Asp of the isopeptide bond, or isopeptide proteins may also be identified by sequence homology alone.
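The structural comparison described above looks for a lysine, an asparagine or aspartate and a glutamate or aspartate arranged as in known isopeptide proteins. The sketch below illustrates only that geometric idea and is not the SPASM or RASMOT-3D method. It assumes side-chain coordinates have already been extracted into a dictionary (parsing of PDB files is not shown), flags a lysine NZ atom lying within an assumed cutoff of an asparagine/aspartate CG atom, and then requires a glutamate CD or aspartate CG atom nearby. Both cutoff values, the input format, the function name and the toy coordinates are illustrative assumptions, not values from the source.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical input format: residue id -> {"resname": ..., atom name: (x, y, z) in Å}
Residues = Dict[str, dict]


def find_candidate_triads(
    residues: Residues,
    bond_cutoff: float = 2.0,      # assumed NZ-CG distance for an existing bond
    promoter_cutoff: float = 4.5,  # assumed distance of the promoting carboxyl
) -> List[Tuple[str, str, str]]:
    """Return (Lys, Asn/Asp, Glu/Asp) residue-id triads matching the template."""
    lysines = [(rid, r["NZ"]) for rid, r in residues.items() if r["resname"] == "LYS"]
    acceptors = [(rid, r["CG"]) for rid, r in residues.items()
                 if r["resname"] in ("ASN", "ASP")]
    # Carboxyl carbon of the promoting residue: CD for Glu, CG for Asp.
    promoters = [(rid, r["CD"] if r["resname"] == "GLU" else r["CG"])
                 for rid, r in residues.items() if r["resname"] in ("GLU", "ASP")]

    triads = []
    for (lys_id, nz), (acc_id, cg) in product(lysines, acceptors):
        if math.dist(nz, cg) > bond_cutoff:
            continue
        for pro_id, carboxyl in promoters:
            # An Asp cannot be both the bonded acceptor and its own promoter.
            if pro_id != acc_id and math.dist(carboxyl, cg) <= promoter_cutoff:
                triads.append((lys_id, acc_id, pro_id))
    return triads


if __name__ == "__main__":
    toy = {  # hypothetical coordinates, not taken from any real structure
        "K1": {"resname": "LYS", "NZ": (0.0, 0.0, 0.0)},
        "N2": {"resname": "ASN", "CG": (1.3, 0.0, 0.0)},
        "E3": {"resname": "GLU", "CD": (1.3, 3.0, 0.0)},
        "K4": {"resname": "LYS", "NZ": (20.0, 0.0, 0.0)},
    }
    print(find_candidate_triads(toy))  # [('K1', 'N2', 'E3')]
```

A real screen would take coordinates from deposited structures and would need the cutoffs tuned against known isopeptide proteins such as Spy0128.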

Notably, proteins which form isopeptide bonds may also be designed de novo as described in WO2011/098772 (herein incorporated by reference). The Rosetta software can be used to design isopeptide proteins de novo and is available at http://depts.washington.edu/ventures/UW_Technology/Express_Licenses/rosetta.php (see also Das R, Baker D. Macromolecular modeling with Rosetta. Annu Rev Biochem, 2008, 77, 363-82). Additionally, the RASMOT-3D PRO server (http://biodev.extra.cea.fr/rasmot3d/) can be used to search the protein database for an appropriate orientation of residues.

The present inventors have advantageously determined that such peptide tag/binding partner pairs may be used as peptide linkers to covalently join multiple proteins, i.e. to produce a fusion protein. In particular, the inventors have demonstrated that orthogonal (i.e. mutually unreactive or non-cognate) pairs of peptide tag/binding partner peptides find utility in the fusion (e.g. conjugation, linkage) of two or more proteins, i.e. the production (synthesis, construction, assembly) of fusion proteins. As demonstrated in detail in the Examples below, the methods and uses of the present invention provide a modular (e.g. step-wise) and high-yielding approach to link proteins into chains based upon sequential isopeptide bond formation. In particular, the methods and uses described herein enable the controlled (i.e. specific, targeted) extension of protein chains without the generation of statistical mixtures. The approach is particularly advantageous over previous methods because it results in a fusion protein in which each protein unit (module, domain) is joined by an irreversible linkage, i.e. an isopeptide bond. Moreover, as the linkage is not reliant on the reaction of cysteine residues, it is applicable to proteins that contain free cysteine residues and/or disulfide bonds. Furthermore, each protein unit to be added to the chain needs to be modified with only two small peptide tags, which can be incorporated at various positions within the protein, i.e. at the N-terminus, the C-terminus or an internal site. Thus, each protein unit of the fusion protein may be completely genetically encoded, i.e. the method is not reliant on the use of unnatural (i.e. non-standard) amino acids or the post-translational modification of amino acid residues. Accordingly, the present invention provides a simple and scalable method for the synthesis of fusion proteins which is highly specific and does not require purification of intermediates.

A representative example of the method of the invention is set out in Figure 1, which shows a solid-phase embodiment of the invention. However, this is in no way intended to limit the scope of the invention, and various other permutations would be apparent to the skilled person from the description below and are intended to be encompassed by the present invention as defined in the appended claims.

Figure 1 shows two pairs of peptide linkers, termed SpyTag/SpyCatcher and SnoopTag/SnoopCatcher, wherein the "Tag" and "Catcher" of each pair react specifically and spontaneously to form an isopeptide bond, thereby linking the "Tag" peptide to the "Catcher" peptide. In this respect, the pairs are orthogonal, meaning that they are mutually unreactive, i.e. SpyTag and SpyCatcher cannot react with either SnoopCatcher or SnoopTag to form an isopeptide bond. As discussed below in more detail, in some embodiments, the "Tag" peptides may be viewed as peptide tags and the "Catcher" peptides may be viewed as binding partner proteins.

Thus, in step 1, a first protein, MBPx (a modified version of the maltose-binding protein, discussed below), is provided, wherein the protein has been modified to incorporate a peptide linker, SpyCatcher (i.e. the first part of a first pair of peptide linkers), e.g. via recombinant expression of a nucleic acid molecule encoding the MBPx polypeptide and the SpyCatcher peptide linker in a single open reading frame. In this representative example, the MBPx protein is used as a purification or immobilization tag that allows the extending fusion protein to be immobilized on a solid phase (an amylose resin). However, it will be evident from the discussion below that this is not an essential feature of the invention. For instance, the method may be heterogeneous (i.e. solid phase) or homogeneous (i.e. in solution) and, if heterogeneous, any suitable purification/immobilization tag may be used, i.e. it is not essential that the tag is a protein or peptide tag.

In step 2, the first protein (MBPx-SpyCatcher) is contacted with a second protein (A) which has been modified to incorporate two peptide linkers. One peptide linker is the second part of the first pair of linkers (SpyTag), the first part of which forms a domain of the first protein (SpyCatcher). The other peptide linker is the first part of a second pair of peptide linkers (SnoopTag); as discussed above, the second pair of linkers does not react with the first pair of linkers. Thus, on contacting the first and second proteins together, the first pair of linkers react (e.g. spontaneously) to form a specific isopeptide bond between the SpyCatcher and SpyTag peptide linkers, thereby linking the first protein (MBPx-SpyCatcher) and second protein (SpyTag-A-SnoopTag) together to form a fusion protein.

In step 3, the fusion protein (MBPx-SpyCatcher-SpyTag-A-SnoopTag) is contacted with a further protein comprising two peptide linkers, SnoopCatcher and SpyCatcher. Thus, one peptide linker (SnoopCatcher) is the second part of the second pair of peptide linkers and the other peptide linker (SpyCatcher) is from the first pair of peptide linkers. These peptide linkers may be connected via a spacer, e.g. a peptide spacer, or via a protein which is to be incorporated into the final fusion protein. On contacting the fusion protein (MBPx-SpyCatcher-SpyTag-A-SnoopTag) with the further protein (SnoopCatcher-SpyCatcher), the second pair of linkers react (e.g. spontaneously) to form an isopeptide bond, thereby extending the fusion protein. Alternatively, the addition of the SnoopCatcher-SpyCatcher protein may be viewed as functionalising or activating the fusion protein for further extension, i.e. by adding a reactive group (a reactive peptide linker) to the fusion protein.

In step 4, the extended fusion protein from step 3 (MBPx-SpyCatcher-SpyTag-A-SnoopTag-SnoopCatcher-SpyCatcher) is contacted with a further protein (B), which incorporates two peptide linkers akin to the A protein (SpyTag-B-SnoopTag). Again, an isopeptide bond is formed between the peptide linkers that are capable of reacting together, i.e. the first pair, SpyCatcher and SpyTag, to further extend the fusion protein.

It will be evident that this process may be repeated until all of the protein units of the desired fusion protein have been linked together. The fusion protein may then simply be eluted from the solid phase, e.g. with maltose, and used without further purification. It should be noted that the terminal protein of the fusion protein needs to be modified to incorporate only a single peptide linker, which can react with a free peptide linker in a protein of the fusion protein, e.g. the penultimate protein unit. As discussed in the Examples, the inventor has demonstrated synthesis of a fusion protein containing 10 protein units, which has been validated by gel electrophoresis and mass spectrometry.
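The step-wise scheme of Figure 1 can be traced with a short bookkeeping sketch. It is an illustration only, not a description of the experimental procedure: the helper names `plan_synthesis` and `assemble` are hypothetical, each protein is represented only by its name, and it is assumed (in line with the note above on the terminal protein) that the final protein carries a single SpyTag. At each step the sketch checks that the reactive linker of the incoming module is the cognate partner of the single free linker on the growing chain.

```python
# Cognate (mutually reactive) partners for the two orthogonal pairs of Figure 1.
COGNATE = {
    "SpyTag": "SpyCatcher",
    "SpyCatcher": "SpyTag",
    "SnoopTag": "SnoopCatcher",
    "SnoopCatcher": "SnoopTag",
}


def plan_synthesis(proteins):
    """Return the modules to add, in order, for a linear chain of `proteins`.

    Hypothetical helper: the chain starts from an immobilised MBPx-SpyCatcher
    module, each internal functional protein is flanked by SpyTag and SnoopTag,
    a SnoopCatcher-SpyCatcher linker protein re-activates the chain, and the
    terminal protein (assumed) carries a single SpyTag.
    """
    if not proteins:
        raise ValueError("at least one protein is required")
    modules = [["MBPx", "SpyCatcher"]]
    for i, protein in enumerate(proteins):
        if i == len(proteins) - 1:
            modules.append(["SpyTag", protein])  # terminal unit: one linker only
        else:
            modules.append(["SpyTag", protein, "SnoopTag"])
            modules.append(["SnoopCatcher", "SpyCatcher"])  # linker protein
    return modules


def assemble(modules):
    """Check every coupling step and return the product as a string.

    '-' joins parts within one genetically encoded module; ':' marks an
    isopeptide bond formed during assembly. The only free linker on the growing
    chain is assumed to be the last part of the previously added module, and it
    must be the cognate partner of the first part of the incoming module.
    """
    for previous, incoming in zip(modules, modules[1:]):
        free_linker = previous[-1]
        if COGNATE.get(free_linker) != incoming[0]:
            raise ValueError(f"{incoming[0]} cannot react with {free_linker}")
    return ":".join("-".join(module) for module in modules)
```

For example, `assemble(plan_synthesis(["A", "B"]))` returns `"MBPx-SpyCatcher:SpyTag-A-SnoopTag:SnoopCatcher-SpyCatcher:SpyTag-B"`. Each additional protein in the list adds one SpyTag-X-SnoopTag module and one SnoopCatcher-SpyCatcher linker protein, so the same two orthogonal pairs are reused along the whole chain.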

Whilst not wishing to be bound by theory, it is thought that the precise orientation of the amino acid residues in the peptide linker pairs, e.g. SnoopTag/SnoopCatcher, SpyTag/SpyCatcher etc., promotes nucleophilic attack and formation of an irreversible isopeptide bond between the peptide linkers. As mentioned above, lysine reacts with either aspartate or asparagine in each of these pairs. The SpyTag peptide has a reactive aspartate and so it cannot react with the reactive asparagine of SnoopCatcher. The SnoopTag peptide has a reactive lysine and so it cannot react with the reactive lysine of SpyCatcher.
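The residue argument above can be written as a small lookup. This is a simplified sketch: the table records only the reactive residue stated in the text for each linker, and the hypothetical function `residues_compatible` ignores structure entirely. By this rule alone SpyTag and SnoopTag (aspartate and lysine) would appear compatible, so residue chemistry accounts only for the Tag/non-cognate Catcher combinations discussed above; the precise orientation of the residues within each pair is thought to contribute as well.

```python
# Reactive residue carried by each peptide linker, as stated in the text.
REACTIVE_RESIDUE = {
    "SpyTag": "Asp",
    "SpyCatcher": "Lys",
    "SnoopTag": "Lys",
    "SnoopCatcher": "Asn",
}


def residues_compatible(linker_a, linker_b):
    """True if one linker supplies Lys and the other supplies Asp or Asn.

    Checks residue chemistry only; it does not test whether the residues are
    held in the orientation needed for isopeptide bond formation.
    """
    pair = {REACTIVE_RESIDUE[linker_a], REACTIVE_RESIDUE[linker_b]}
    return pair in ({"Lys", "Asp"}, {"Lys", "Asn"})
```

With this table, `residues_compatible("SpyTag", "SnoopCatcher")` (Asp with Asn) and `residues_compatible("SnoopTag", "SpyCatcher")` (Lys with Lys) both return `False`, while each cognate Tag/Catcher pair returns `True`.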

Therefore, these two peptide linker pairs are orthogonal, and it will be evident that any orthogonal pairs of peptide linkers could be used in the method of the invention to generate fusion proteins. In this respect, it is the orthogonal, mutually unreactive, properties of the peptide linker pairs that enable the generation of robust and programmable fusion proteins. In particular, if the growing fusion protein chain is attached to a solid phase, the reacting module (i.e. the next protein to be linked to the fusion protein) can be added in large excess, thereby driving the reaction to completion. Unreacted building blocks may then simply be washed away, so separate purification (i.e. separation of the growing fusion protein from unreacted components) is unnecessary at each step. Thus, elongating the chain one step at a time allows chain growth using only a small number of orthogonal connections. Hence, the method developed by the present inventor is superior to previously described protein-coupling methods, particularly in terms of the stability of the fusion protein product and the simplicity of the individual reaction steps. Thus, at its broadest, the invention may be viewed as the use of at least two orthogonal pairs of peptide linkers for the production of a fusion protein, wherein each pair of peptide linkers reacts to form an isopeptide bond.

In particular, the peptide linkers of each peptide linker pair react with each other to form an isopeptide bond. As mentioned above, each peptide linker forms a part (e.g. domain) of a protein that will form a unit (e.g. domain or module) of the fusion protein. In other words, the proteins to be linked together may be modified to incorporate at least one peptide linker (e.g. two, three, four peptide linkers etc.), wherein each pair of peptide linkers used in the production of a fusion protein is orthogonal to at least one other pair of peptide linkers used in the production of said fusion protein.

Thus, in some embodiments, the orthogonal pairs of peptide linkers are used in the production of a fusion protein containing at least two protein units (e.g. domains or modules). For instance, in the representative embodiment shown in Figure 1, the protein used to conjugate protein A with protein B may be viewed as a linker unit, i.e. the linker unit (linker protein) functions only to conjugate protein A with protein B. Thus, the fusion protein may be viewed as containing or comprising at least two functional proteins, i.e. proteins with functions other than as a linker. In other embodiments, the fusion protein may be viewed as containing or comprising at least three proteins (i.e. irrespective of their function).

In further embodiments, the fusion protein may be viewed as containing or comprising at least three functional proteins. For instance, with reference to the representative embodiment shown in Figure 1, if the linker protein used to conjugate protein A with protein B contains a protein (e.g. a functional protein) in addition to the peptide linkers, it may be viewed as a protein unit (domain or module) of the fusion protein. Thus, the fusion protein may be viewed as containing or comprising at least three functional proteins, i.e. proteins with functions other than, or in addition to, a linker.

Alternatively viewed, the invention provides a method of producing (e.g. generating, synthesizing, assembling etc.) a fusion protein, said method comprising:

a) contacting a first protein with a second protein under conditions that enable the formation of an isopeptide bond between said proteins, wherein said first protein and said second protein each comprise a peptide linker, wherein said peptide linkers are a pair of peptide linkers which react (with each other) to form an isopeptide bond that links said first protein to said second protein to form a linked protein; and

b) contacting the linked protein from (a) with a third protein under conditions that enable the formation of an isopeptide bond between said third protein and said linked protein, wherein said third protein comprises a peptide linker which reacts with a further peptide linker in the linked protein from (a), and wherein said peptide linkers are a pair of peptide linkers that react (with each other) to form an isopeptide bond that links said third protein to said linked protein to form a fusion protein, wherein said pair of peptide linkers from/used in (a) is orthogonal to the pair of peptide linkers from/used in (b).

Viewed from yet another aspect, the invention provides a method of producing (e.g. generating, synthesizing, assembling etc.) a fusion protein, said method comprising:

a) providing a first protein comprising a first peptide linker;

b) contacting said first protein with a second protein, wherein said second protein comprises a second peptide linker and a third peptide linker, under conditions that enable said first peptide linker and said second peptide linker to form an isopeptide bond, thereby linking said first and second proteins; and

c) contacting said linked first and second proteins with a third protein, wherein said third protein comprises a fourth peptide linker, under conditions that enable said third peptide linker and said fourth peptide linker to form an isopeptide bond, thereby linking said second and third proteins to produce a fusion protein, wherein said first and second peptide linkers are a pair of peptide linkers that are orthogonal to the pair of peptide linkers consisting of said third and fourth peptide linkers.

As noted above, in some embodiments, the second protein may function as a linker between the first and third proteins. Accordingly, the fusion protein may be viewed as comprising two "functional" proteins, i.e. proteins that have functions other than to link two protein units (modules, domains etc.) together. Thus, in some embodiments, the second protein may be viewed as a linker protein, i.e. a protein containing at least two peptide linkers each from different orthogonal pairs of peptide linkers and optionally a spacer domain, e.g. a peptide spacer.

Thus, in some embodiments, the second protein may be viewed as a linker protein that functionalises or activates the first protein to enable said first protein to be linked with (conjugated to) said third protein. Similarly, where further proteins are added to the fusion protein (i.e. where the fusion protein is extended) a linker protein may be used to functionalise or activate one or more proteins in the fusion protein to enable said one or more proteins to be linked with said further proteins.

As discussed above, the use of orthogonal pairs of peptide linkers facilitates the production of fusion proteins containing a large number of protein units.

Accordingly, additional proteins may be added to the fusion protein (i.e. the fusion protein may be extended (e.g. elongated, lengthened)) by contacting the fusion protein with a further protein that comprises at least one peptide linker that is capable of forming an isopeptide bond with a peptide linker in a protein of the fusion protein. In this respect, the peptide linker in the new protein forms part of a pair of peptide linkers that is orthogonal to the pair of peptide linkers used to form the previous isopeptide bond in the fusion protein.

Thus, in some embodiments, the method further comprises a step of extending the fusion protein, wherein the new protein (i.e. additional or further protein) to be linked to the fusion protein comprises a peptide linker that forms part of a pair of peptide linkers that is orthogonal to the pair of peptide linkers used to form the previous isopeptide bond in the fusion protein, wherein the peptide linker in the new protein is capable of forming an isopeptide bond with a peptide linker in a protein of the fusion protein, said method comprising contacting said new protein with said fusion protein under conditions that enable said new protein (particularly the peptide linker in said new protein) to form an isopeptide bond with a peptide linker in the fusion protein.

Thus, in some embodiments, the third protein may be viewed as a "further" protein, e.g. an additional or new protein, to be added to the fusion protein. Hence, extending the fusion protein may be viewed as repeating step (c) in the method above, wherein the peptide linkers in the further protein are a pair of peptide linkers that are orthogonal to the pair of peptide linkers used to join the previous protein added to the fusion protein.

In some embodiments, the new protein (i.e. further or additional protein) to be added to the fusion protein comprises at least a second peptide linker (e.g. to allow further extension of the fusion protein chain). Accordingly, the second peptide linker (and any further peptide linkers in the new protein) is orthogonal to the pair of peptide linkers used to link (conjugate) the fusion protein and the new protein.

Thus, in still further embodiments, the method of producing said fusion protein may comprise a step of extending said fusion protein, wherein said third protein comprises a fifth peptide linker and said method comprises a step of contacting said fusion protein with a fourth protein, wherein said fourth protein comprises a sixth peptide linker, under conditions that enable said fifth peptide linker and said sixth peptide linker to form an isopeptide bond, thereby linking said third and fourth proteins to extend said fusion protein, wherein said fifth and sixth peptide linkers form a pair of peptide linkers that is orthogonal to the pair of peptide linkers consisting of said third and fourth peptide linkers.

As shown in Figure 1, it is possible to generate a fusion protein comprising multiple protein units (e.g. more than 3 protein units, e.g. 4, 5, 6, 7, 8, 9, 10 or more protein units, such as 12, 15, 20 or more protein units) using two orthogonal pairs of peptide linkers. Thus, in some embodiments the pair of peptide linkers consisting of the fifth and sixth peptide linkers is identical to the pair of peptide linkers consisting of the first and second peptide linkers.

Thus, in still a further embodiment, the fusion protein may be further extended, wherein said fourth protein comprises a seventh peptide linker and said method comprises a step of contacting said fusion protein with a fifth protein, wherein said fifth protein comprises an eighth peptide linker, under conditions that enable said seventh peptide linker and said eighth peptide linker to form an isopeptide bond, thereby linking said fourth and fifth proteins to extend said fusion protein, wherein said seventh and eighth peptide linkers form a pair of peptide linkers that is orthogonal to the pair of peptide linkers consisting of said fifth and sixth peptide linkers.

In some embodiments the pair of peptide linkers consisting of the seventh and eighth peptide linkers is identical to the pair of peptide linkers consisting of the third and fourth peptide linkers.

It will be evident that the fusion protein chain may be extended further by repeating the steps described above, e.g. wherein the fifth protein comprises a ninth peptide linker and a sixth protein comprises a tenth peptide linker, and wherein said ninth and tenth peptide linkers form a pair of peptide linkers that is orthogonal to the pair of peptide linkers consisting of said seventh and eighth peptide linkers. In some embodiments, the pair of peptide linkers consisting of the ninth and tenth peptide linkers is identical to the pair of peptide linkers consisting of the first and second peptide linkers and/or said fifth and sixth peptide linkers.

Thus, in some embodiments, the at least two orthogonal pairs of peptide linkers may be used alternately to link (conjugate) proteins to form a fusion protein. Alternatively viewed, the new or further protein to be added to the fusion protein comprises at least one peptide linker that forms part of a pair of peptide linkers that is orthogonal to the pair of peptide linkers used to link the previously added protein in the fusion protein.
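The alternating use of orthogonal pairs, and the numbering of the first to tenth peptide linkers above, can be reproduced with a short sketch. The function name `number_linkers` and the pair labels "A" and "B" are hypothetical; the sketch assumes a linear chain in which protein 1 carries linker 1, each internal protein k carries linkers 2k-2 and 2k-1, and the terminal protein carries a single linker. Cycling through the supplied pair types guarantees that two consecutive isopeptide bonds never use the same pair.

```python
def number_linkers(n_proteins, pair_types=("A", "B")):
    """List the isopeptide bonds of a hypothetical linear chain of n_proteins.

    Each tuple is (protein_j, protein_j_plus_1, linker_on_j, linker_on_j_plus_1,
    pair_type). Bond j joins protein j to protein j + 1 through linkers 2j - 1
    and 2j, so protein 1 carries linker 1, internal protein k carries linkers
    2k - 2 and 2k - 1, and the last protein carries linker 2n - 2 only.
    """
    if len(pair_types) < 2:
        raise ValueError("at least two orthogonal pair types are required")
    if n_proteins < 2:
        raise ValueError("a fusion protein needs at least two proteins")
    bonds = []
    for j in range(1, n_proteins):
        # Cycling through the pair types means bond j and bond j + 1 never share a pair.
        pair_type = pair_types[(j - 1) % len(pair_types)]
        bonds.append((j, j + 1, 2 * j - 1, 2 * j, pair_type))
    return bonds
```

With the default two pair types, `number_linkers(6)` assigns pair A to the bonds formed by linkers 1/2, 5/6 and 9/10 and pair B to those formed by linkers 3/4 and 7/8, i.e. the fifth/sixth pair is identical to the first/second pair and the seventh/eighth pair to the third/fourth pair. Passing three or more pair types, e.g. `("A", "B", "C")`, cycles through them instead.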

Whilst the invention may be worked successfully using two orthogonal pairs of peptide linkers, it will be evident that more than two orthogonal pairs of peptide linkers may be utilized in the methods and uses of the invention. Thus, in the context of the representative examples given above, in some embodiments the pair of peptide linkers consisting of the fifth and sixth peptide linkers is different from, and preferably orthogonal to, the pair of peptide linkers consisting of the first and second peptide linkers. As discussed below, the use of more than two pairs of orthogonal peptide linkers would enable the production of complex fusion protein structures, e.g. branched structures. Accordingly, as discussed in detail below, the inventor has developed several orthogonal pairs of peptide linkers, which form a further embodiment of the invention.

For instance, a fusion protein comprising three proteins, 1, 2 and 3, may be produced according to the method described above, wherein protein 1 comprises peptide linker A, protein 2 comprises peptide linkers A' and B and protein 3 comprises peptide linker B'. In this respect, peptide linkers A and A' (a pair of peptide linkers) react to form an isopeptide bond and peptide linkers B and B' (a pair of peptide linkers) react to form an isopeptide bond, wherein peptide linker pairs A/A' and B/B' are orthogonal (i.e. neither pair reacts with the other pair to form an isopeptide bond). Using a third orthogonal pair of peptide linkers would enable the production of a branched structure. For example, protein 2 may comprise a third peptide linker C and a fourth protein, 4, may comprise a peptide linker C', wherein C and C' (a pair of peptide linkers) react to form an isopeptide bond and wherein peptide linker pairs A/A', B/B' and C/C' are orthogonal. When the fusion protein 1-2-3 is contacted with protein 4 under conditions that enable C and C' to form an isopeptide bond, the resultant fusion protein will be branched, i.e. 1-2(-4)-3 (see Figure 13A). Alternatively, a fusion protein 1-2-4 may be contacted with protein 3 under conditions that enable B and B' to form an isopeptide bond to produce the same branched fusion protein, 1-2(-4)-3. The skilled person would understand that complex branching structures may be generated using three pairs of orthogonal peptide linkers and that the complexity of the branching structures may be increased further by using additional orthogonal pairs of peptide linkers. In particular, more than two orthogonal pairs of peptide linkers may advantageously be used to generate asymmetric branching structures.

Thus, in some embodiments, the method and uses of the invention utilise more than two orthogonal pairs of peptide linkers, e.g. 3, 4, 5, 6, 7, 8, 9 or 10 or more orthogonal pairs of peptide linkers.

Branching may also be achieved using two orthogonal pairs of peptide linkers. For instance, a branched fusion protein comprising five proteins, 1-5, may be produced by including additional peptide linkers in one of the proteins, e.g. protein 2 may comprise four peptide linkers from two orthogonal pairs of peptide linkers. In this representative embodiment, protein 1 comprises peptide linker A, protein 2 comprises peptide linker A' and three B peptide linkers, and proteins 3, 4 and 5 each comprise peptide linker B', wherein peptide linkers A and A' (a pair of peptide linkers) react to form an isopeptide bond and peptide linkers B and B' (a pair of peptide linkers) react to form an isopeptide bond, wherein peptide linker pairs A/A' and B/B' are orthogonal. Thus, contacting the fusion protein 1-2 with proteins 3-5 would result in a branched fusion protein in which proteins 3-5 are all joined to protein 2 independently of each other (see Figure 13B). It will be evident that proteins 3-5 may be the same or different proteins. Moreover, one or more of proteins 3-5 may comprise additional peptide linkers from orthogonal pairs of peptide linkers to facilitate extension (e.g. the separate, independent extension) of each branch of the fusion protein.

Thus, in some embodiments the fusion protein may be branched. In other embodiments, the fusion protein may be linear. In some embodiments, e.g. where more than two orthogonal pairs of peptide linkers are used, the fusion protein may be comprised of asymmetric branches, i.e. the fusion protein may have an asymmetric structure. Thus, in some embodiments, the invention provides a method of producing a branched fusion protein. In some embodiments, the invention provides a method of producing a linear fusion protein.

The term "branched" refers to fusion proteins in which two or more protein units are linked (joined, conjugated) to the same internal protein unit (a non-terminal protein unit) of a fusion protein independently of each other, i.e. via independently (separately) formed isopeptide bonds. An internal protein unit or non-terminal protein unit may be defined as a protein that is linked (joined, conjugated) by isopeptide bonds to at least two other protein units in the fusion protein. A terminal protein unit may be defined as a protein that is linked (joined, conjugated) via an isopeptide bond to only one other protein unit in the fusion protein. Thus, in the representative examples discussed above and shown in Figure 13, protein 2 is an internal protein unit or non-terminal protein unit because it is joined via isopeptide bonds to proteins 1 and 3, whereas proteins 4 and 5 may be viewed as "branches" of the fusion protein. Proteins 1, 3, 4 and 5 may be viewed as terminal protein units. Thus, a branched fusion protein comprises more than two terminal protein units.

The term "linear" refers to fusion proteins in which all of the internal protein units are linked only to two other protein units in the fusion protein, thereby generating a linear chain of protein units. Thus, a linear fusion protein comprises only two terminal protein units.

In yet other embodiments, the fusion protein may be circular. For instance, taking the fusion protein 1-2-3 above, if protein 1 also contains a peptide linker C and protein 3 also contains a peptide linker C', proteins 1 and 3 may be linked by an isopeptide bond, thereby forming a circular protein. Thus, in some embodiments, a linear protein may be viewed as being circularisable, i.e. capable of forming a circular fusion protein. In this respect, as discussed below, one or more of the peptide linkers may be blocked or protected to prevent or delay its reaction. Thus, using the example above, if peptide linker C and/or C' is blocked, the fusion protein will be a linear fusion protein that is circularisable and may be circularised by unblocking C and/or C' to allow the peptide linkers to react to form an isopeptide bond.

Thus, in some embodiments, the invention provides a method of producing a circular or circularisable fusion protein.

Thus, the term "circular" generally refers to fusion proteins which do not contain any terminal protein units. However, it will be evident that it is possible to produce a "branched circular" fusion protein, which comprises a circular fusion protein in which one or more of the internal protein units is linked by an isopeptide bond to at least three other protein units in the fusion protein.
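The definitions of terminal and internal protein units, and of branched, linear and circular fusion proteins, can be applied mechanically by treating the fusion protein as a graph in which each protein unit is a node and each isopeptide bond is an edge. The sketch below uses a hypothetical helper, `classify_topology`, and assumes that all units form one molecule and that any two units share at most one isopeptide bond; under those assumptions a connected graph with at least as many bonds as units must contain a ring.

```python
from collections import defaultdict, deque


def classify_topology(bonds):
    """Classify a fusion protein as linear, branched, circular or branched circular.

    bonds: iterable of (unit_a, unit_b) pairs, one per isopeptide bond.
    Assumes all units form one molecule and two units share at most one bond.
    """
    adjacency = defaultdict(set)
    n_bonds = 0
    for a, b in bonds:
        if a == b or b in adjacency[a]:
            raise ValueError(f"unsupported bond: {a}-{b}")
        adjacency[a].add(b)
        adjacency[b].add(a)
        n_bonds += 1
    units = list(adjacency)
    if not units:
        raise ValueError("no isopeptide bonds given")

    # Breadth-first search confirms that the bonds describe a single molecule.
    seen = {units[0]}
    queue = deque([units[0]])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    if len(seen) != len(units):
        raise ValueError("bonds describe more than one molecule")

    degrees = [len(adjacency[u]) for u in units]
    n_terminal = sum(1 for d in degrees if d == 1)  # units bonded to exactly one other unit
    has_ring = n_bonds >= len(units)  # a ring-free connected graph has units - 1 bonds
    has_branch_point = any(d >= 3 for d in degrees)
    if has_ring:
        return "branched circular" if has_branch_point else "circular"
    return "branched" if n_terminal > 2 else "linear"
```

For example, the bonds of Figure 13A, `[(1, 2), (2, 3), (2, 4)]`, give `"branched"` (three terminal units), `[(1, 2), (2, 3)]` gives `"linear"`, and closing the chain 1-2-3 with a bond between proteins 1 and 3 gives `"circular"`.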

The term "orthogonal" as used herein refers to molecules that are mutually unreactive, e.g. molecules that are not capable of reacting with each other or that react with a reduced efficiency as compared to corresponding molecules that are capable of reacting with each other. In the context of the peptide linkers of the invention, particularly pairs of peptide linkers, the term orthogonal refers to pairs of peptide linkers that are not capable of reacting with other pairs of peptide linkers to form an isopeptide bond, or that react with a reduced efficiency as compared to corresponding molecules, e.g. endogenous proteins that are capable of spontaneously forming isopeptide bonds or pairs of peptide linkers that are capable of reacting with each other efficiently to form an isopeptide bond. An inability to react may be viewed as 5% or less of the peptide linkers in a sample reacting to form isopeptide bonds, e.g. 4%, 3%, 2% or 1% or less. A reduced efficiency may be viewed as an efficiency of less than 5%, e.g. less than 4%, 3%, 2% or 1%, for isopeptide bond formation between peptide linkers from orthogonal pairs, relative to the efficiency with which each pair of peptide linkers forms an isopeptide bond. Conversely, a pair of peptide linkers that react efficiently to form an isopeptide bond may react with at least 95% efficiency, e.g. at least 96%, 97%, 98%, 99% or 100% efficiency, i.e. at least 95% of the peptide linkers of a pair of peptide linkers in a sample react to form an isopeptide bond under conditions that enable the formation of an isopeptide bond. For example, two pairs of peptide linkers, A/A' and B/B', may be viewed as orthogonal when A and A' cannot react with B and/or B' to form an isopeptide bond, or when A and A' react with B and/or B' to form an isopeptide bond with less than 5% efficiency as compared to the isopeptide bond formation between A and A' and/or B and B'.
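The numerical thresholds in this definition can be expressed as two simple checks. This is a sketch only: the function names are hypothetical, the inputs are assumed to be fractions of peptide linkers reacted under conditions that enable isopeptide bond formation, and using the smaller of the two cognate yields as the reference is one deliberately strict reading of "A and A' and/or B and B'".

```python
COGNATE_MIN = 0.95  # cognate pair: at least 95% of linkers react
CROSS_MAX = 0.05    # orthogonal: cross-reaction below 5% of the cognate reaction


def reacts_efficiently(fraction_reacted):
    """True if a cognate pair reached at least 95% reaction."""
    return fraction_reacted >= COGNATE_MIN


def pairs_orthogonal(cross_fractions, cognate_fractions):
    """True if every non-cognate combination reacts below 5% of the cognate level.

    cross_fractions: fraction reacted for each non-cognate combination,
        e.g. A with B, A with B', A' with B and A' with B'.
    cognate_fractions: fraction reacted for A/A' and for B/B'.
    The smaller cognate fraction is used as the reference (a strict reading).
    """
    reference = min(cognate_fractions)
    if reference <= 0:
        raise ValueError("cognate pairs must react to serve as a reference")
    return all(cross / reference < CROSS_MAX for cross in cross_fractions)
```

With illustrative (not measured) values, cross-reactions of 1-2% against cognate yields of 97-98% pass the orthogonality check, whereas a 10% cross-reaction does not.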

Alternatively viewed, two peptide linkers that react together efficiently to form an isopeptide bond under conditions that enable or facilitate isopeptide bond formation may be defined as a cognate pair of peptide linkers, wherein the term "cognate" refers to components that function together, i.e. that react together to form an isopeptide bond. Such a pair of peptide linkers can also be referred to as a "complementary" pair of peptide linkers. As such, orthogonal pairs of peptide linkers may be viewed as non-cognate pairs or non-complementary pairs. For instance, based on the representative examples described above, the peptide linker pair A/A' may be viewed as a cognate or complementary pair of peptide linkers, whereas A/A' and B/B' are non-cognate or non-complementary pairs insofar as A and A' cannot react efficiently with B and/or B' to form an isopeptide bond under conditions that enable or facilitate isopeptide bond formation.

The peptide linkers for use in the methods and uses of the invention may be derived from a protein capable of spontaneously forming an isopeptide bond. In particular, "a protein capable of spontaneously forming an isopeptide bond" (also referred to herein as "an isopeptide protein") is one which may form an isopeptide bond in the absence of enzymes or other substances and/or without chemical modification, within its protein chain, i.e. intramolecularly. The two reactive residues for forming the isopeptide bond are therefore comprised within a single protein chain. Thus, proteins which only form isopeptide bonds intermolecularly, i.e. with other peptide or protein chains or units are not considered to be isopeptide proteins as used in the present invention. Particularly, the HK97 capsid subunits which have intermolecular isopeptide bonds are excluded.

The term "isopeptide bond" as used herein, refers to an amide bond between a carboxyl or carboxamide group and an amino group at least one of which is not derived from a protein main chain or alternatively viewed is not part of the protein backbone. An isopeptide bond may form within a single protein or may occur between two peptides or a peptide and a protein. Thus, an isopeptide bond may form intramolecularly within a single protein or intermolecularly i.e. between two peptide/protein molecules, e.g. between two peptide linkers. Typically, an isopeptide bond may occur between a lysine residue and an asparagine, aspartic acid, glutamine, or glutamic acid residue or the terminal carboxyl group of the protein or peptide chain or may occur between the alpha-amino terminus of the protein or peptide chain and an asparagine, aspartic acid, glutamine or glutamic acid. Each residue of the pair involved in the isopeptide bond is referred to herein as a reactive residue. In preferred embodiments of the invention, an isopeptide bond may form between a lysine residue and an asparagine residue or between a lysine residue and an aspartic acid residue. Particularly, isopeptide bonds can occur between the side chain amine of lysine and carboxamide group of asparagine or carboxyl group of an aspartate.
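The combinations of reactive groups listed in this definition can be captured in a small lookup. The function `isopeptide_possible` and its string labels are hypothetical; the sketch encodes only the combinations named above and excludes the alpha-amino/C-terminal carboxyl combination, because both groups belong to the main chain and would form an ordinary peptide bond.

```python
# Side chains that can supply the carboxyl or carboxamide partner.
SIDE_CHAIN_ACYL = {"Asn", "Asp", "Gln", "Glu"}


def isopeptide_possible(amine, acyl):
    """Return True if the combination is one listed in the definition above.

    amine: "Lys" (side-chain amine) or "N-terminus" (alpha-amino group).
    acyl:  "Asn", "Asp", "Gln" or "Glu" (side chain) or "C-terminus"
           (terminal carboxyl group).
    """
    if amine == "Lys":
        return acyl in SIDE_CHAIN_ACYL or acyl == "C-terminus"
    if amine == "N-terminus":
        # Alpha-amino plus C-terminal carboxyl would join two main-chain groups,
        # i.e. an ordinary peptide bond, so only side-chain partners are allowed.
        return acyl in SIDE_CHAIN_ACYL
    return False
```

The preferred embodiments of the invention correspond to `isopeptide_possible("Lys", "Asn")` and `isopeptide_possible("Lys", "Asp")`.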

Distances between residues involved in an isopeptide bond are measured from particular C atoms within each residue. Thus, when lysine is involved in the isopeptide bond, the distance is measured from the C-epsilon atom of the lysine; when aspartic acid is involved, from the C-gamma atom of the aspartic acid; when asparagine is involved, from the C-gamma atom of the asparagine; and when glutamic acid is involved, from the C-delta atom of the glutamic acid. These atoms (from which distances are calculated) of the reactive residues involved in the isopeptide bond are referred to herein as "relevant atoms". Typically, in order for an isopeptide bond to form, the reactive residues, e.g. the reactive lysine and asparagine/aspartate residues (and particularly the relevant atoms thereof; for lysine the C-epsilon atom and for asparagine/aspartate the C-gamma atom), should be positioned in close proximity to one another in space, e.g. in the isopeptide protein from which they are derived. In particular, the reactive residues, e.g. the lysine and asparagine/aspartate (and particularly the relevant atoms thereof), are within 4 Angstrom of each other in the folded protein from which they are derived, and may be within 3.8, 3.6, 3.4, 3.2, 3.0, 2.8, 2.6, 2.4, 2.2, 2.0, 1.8 or 1.6 Angstrom of each other. In particular, the reactive residues (and more particularly their relevant atoms) may be within 1.81, 2.63 or 2.60 Angstrom of each other in the isopeptide protein from which they are derived.
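By way of illustration only, the distance criterion between relevant atoms can be sketched in Python. The sketch assumes that atomic coordinates (in Angstrom) have already been extracted from a solved structure of the isopeptide protein and uses PDB-style residue and atom names; the coordinates and function names below are hypothetical placeholders, not data from any real structure.

```python
import math

# Relevant atom used for distance measurements, per residue type (PDB atom names).
RELEVANT_ATOM = {
    "LYS": "CE",  # C-epsilon of lysine
    "ASP": "CG",  # C-gamma of aspartic acid
    "ASN": "CG",  # C-gamma of asparagine
    "GLU": "CD",  # C-delta of glutamic acid
}

def reactive_pair_in_range(res1, res2, max_dist=4.0):
    """Distance between the relevant atoms of two residues, and whether it
    is within max_dist. Each residue is (residue_name, {atom_name: (x, y, z)})."""
    name1, atoms1 = res1
    name2, atoms2 = res2
    d = math.dist(atoms1[RELEVANT_ATOM[name1]], atoms2[RELEVANT_ATOM[name2]])
    return d, d <= max_dist

# Hypothetical coordinates, for illustration only.
lys = ("LYS", {"CE": (10.0, 5.0, 3.0)})
asn = ("ASN", {"CG": (11.5, 6.0, 4.5)})
d, ok = reactive_pair_in_range(lys, asn)
print(f"{d:.2f} A, within 4 A: {ok}")  # prints: 2.35 A, within 4 A: True
```

The tighter thresholds listed above (e.g. 3.0 or 1.8 Angstrom) can be checked by passing a different `max_dist`.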

Generally, isopeptide proteins from which the peptide linkers of the invention may be derived may comprise a glutamic acid or aspartic acid residue in close proximity to the two other reactive amino acid residues, e.g. to the lysine and asparagine/aspartate, which are involved in the formation of the isopeptide bond. In particular, the C-delta atom of the glutamic acid or the C-gamma atom of the aspartic acid residue may be within 5.5 Angstrom of a reactive asparagine/aspartate residue involved in the isopeptide bond, e.g. of the C-gamma atom of that residue, in the folded protein structure. For example, the glutamic acid (e.g. the C-delta atom thereof) may be within 5.4, 5.2, 5.0, 4.8, 4.6, 4.4, 4.2, 4.0, 3.8, 3.6, 3.4, 3.2 or 3.0 Angstrom of the reactive asparagine/aspartate residue, e.g. the C-gamma atom thereof, in the isopeptide bond. In particular, the glutamic acid residue, e.g. the C-delta atom thereof, may be 4.99, 3.84 or 3.73 Angstrom from the asparagine/aspartate residue, e.g. the C-gamma atom thereof.

Further, the glutamic acid residue, e.g. the C-delta atom thereof, may be within 6.5 Angstrom of a reactive lysine residue involved in the isopeptide bond, e.g. the C-epsilon atom thereof, for example within 6.3, 6.1, 5.9, 5.7, 5.5, 5.3, 5.1, 4.9, 4.7, 4.5, 4.3 or 4.1 Angstrom. In particular, the glutamic acid residue, e.g. the C-delta atom thereof, may be 6.07, 4.80 or 4.42 Angstrom from a reactive lysine, e.g. the C-epsilon atom thereof.
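As a hedged sketch, the two proximity criteria for the facilitating glutamic acid (within 5.5 Angstrom of the reactive asparagine/aspartate and within 6.5 Angstrom of the reactive lysine, measured between relevant atoms) can be checked together. The coordinates and the function name are hypothetical and serve only to illustrate the geometry.

```python
import math

def meets_catalytic_criteria(glu_cd, asn_cg, lys_ce, max_to_asn=5.5, max_to_lys=6.5):
    """Check the proximity criteria for a facilitating glutamic acid.

    glu_cd: C-delta of the glutamic acid (C-gamma if an aspartic acid is used)
    asn_cg: C-gamma of the reactive asparagine/aspartate
    lys_ce: C-epsilon of the reactive lysine
    Returns both distances (Angstrom) and whether both thresholds are met.
    """
    d_asn = math.dist(glu_cd, asn_cg)
    d_lys = math.dist(glu_cd, lys_ce)
    return d_asn, d_lys, (d_asn <= max_to_asn and d_lys <= max_to_lys)

# Hypothetical coordinates (Angstrom), for illustration only.
glu = (0.0, 0.0, 0.0)
asn = (3.0, 2.0, 0.0)
lys = (4.0, 3.0, 2.0)
d_asn, d_lys, ok = meets_catalytic_criteria(glu, asn, lys)
print(f"Glu-Asn {d_asn:.2f} A, Glu-Lys {d_lys:.2f} A, criteria met: {ok}")
```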

The glutamic acid residue (or aspartic acid residue) may help induce the formation of the isopeptide bond as discussed previously.

As discussed above, the peptide linkers for use in the methods and uses of the invention may be obtained by splitting the reactive domains of an isopeptide protein into two or three domains. Thus, each pair of peptide linkers consists of a peptide comprising a lysine residue and a peptide comprising an aspartate or asparagine residue, wherein said residues (i.e. lysine and aspartate or lysine and asparagine) are involved in the formation of an isopeptide bond (i.e. react to form an isopeptide bond), thereby joining (conjugating) said peptide linkers.

In some preferred embodiments, the formation of the isopeptide bond between said peptide linkers is spontaneous. In these embodiments, one of the peptide linkers comprises a glutamic acid or aspartic acid residue that facilitates, e.g. induces or catalyzes, the formation of the isopeptide bond between the lysine and asparagine or aspartate residues in the peptide linkers. In some embodiments, the glutamic acid or aspartic acid residue fulfils one or more of the proximity criteria set out above.

Thus, in embodiments where the formation of the isopeptide bond between said peptide linkers is spontaneous, one of the peptide linkers may be viewed as a peptide tag and the other peptide linker (i.e. the linker comprising the glutamic acid or aspartic acid residue that facilitates, e.g. induces or catalyzes, the formation of the isopeptide bond) may be viewed as a peptide binding partner, i.e. the binding partner for the peptide tag as defined further below.

The term "spontaneous" as used herein refers to a bond, e.g. an isopeptide or covalent bond, which can form in a protein or between peptides or proteins (e.g. between two peptides or a peptide and a protein, i.e. the peptide linkers of the invention) without any other agent (e.g. an enzyme catalyst) being present and/or without chemical modification of the protein or peptide, e.g. without native chemical ligation or chemical coupling using 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide (EDC). Thus, native chemical ligation to modify a peptide or protein to have a C-terminal thioester is not carried out.

Thus, a spontaneous isopeptide bond can form when a protein is isolated on its own or a covalent or isopeptide bond can form between two peptides or a peptide and a protein (i.e. peptide linkers of the invention) when in isolation or without chemical modification. A spontaneous isopeptide or covalent bond may therefore form of its own accord in the absence of enzymes or other exogenous substances or without chemical modification. Particularly however, a spontaneous isopeptide or covalent bond may require the presence of a glutamic acid or an aspartic acid residue in the protein or in one of the peptides/proteins (i.e. in one of the peptide linkers) involved in the bond to allow formation of the bond in a proximity-induced manner.

A spontaneous isopeptide or covalent bond may form almost immediately after the production of a protein or after contact between two or more proteins comprising peptide linkers of the invention, e.g. peptide tag and binding partner, e.g. within 1, 2, 3, 4, 5, 10, 15, 20, 25 or 30 minutes, or within 1, 2, 4, 8, 12, 16, 20 or 24 hours. The bond may form under a range of conditions, such as in phosphate-buffered saline (PBS) or Tris-buffered saline (TBS) at pH 4.0-9.0, e.g. 5.0, 5.5, 6.5, 7.0, 7.5, 8.0 or 8.5, and at 0-40°C, e.g. 1, 2, 3, 4, 5, 10, 12, 15, 18, 20, 22 or 25°C. The skilled person would readily be able to determine other suitable conditions.

Thus, in some embodiments, contacting proteins comprising peptide linkers as defined herein "under conditions that enable the formation of an isopeptide bond" includes contacting said proteins in buffered conditions, e.g. in a buffered solution or on a solid phase (e.g. column) that has been equilibrated with a buffer, such as PBS or TBS. The step of contacting may be at any suitable pH, such as pH 4.0-9.0, e.g. 4.5-8.5, 5.0-8.0, 5.5-7.5, such as about pH 6.2, 6.4, 6.6, 6.8, 7.0, 7.2, 7.4, 7.6, 7.8 or 8.0. Additionally or alternatively, the step of contacting may be at any suitable temperature, such as about 0-40°C, e.g. about 1-39, 2-38, 3-37, 4-36, 5-35, 6-34, 7-33, 8-32, 9-31 or 10-30°C, e.g. about 10, 12, 15, 18, 20, 22 or 25°C. The skilled person would understand that the conditions may need to be adapted depending on the characteristics of the peptide linkers used in the method of the invention and would readily be able to determine which conditions are suitable.
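Purely as an illustrative sketch, the broadest ranges named above can be encoded as a simple check on planned reaction conditions. The buffer names are only the two examples given (PBS, TBS) and other buffers are not excluded, so a mismatch is reported as a note rather than an error; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Broadest ranges stated in the text; narrower ranges may suit particular linkers.
PH_RANGE = (4.0, 9.0)
TEMP_RANGE_C = (0.0, 40.0)
EXAMPLE_BUFFERS = {"PBS", "TBS"}

@dataclass
class ReactionConditions:
    buffer: str
    ph: float
    temperature_c: float

def review_conditions(c: ReactionConditions) -> list:
    """Return notes on conditions falling outside the stated ranges.

    An empty list means pH and temperature are inside the broadest ranges
    and the buffer is one of the two named examples.
    """
    notes = []
    if c.buffer not in EXAMPLE_BUFFERS:
        notes.append(f"buffer {c.buffer!r} is not one of the named examples")
    if not PH_RANGE[0] <= c.ph <= PH_RANGE[1]:
        notes.append(f"pH {c.ph} outside {PH_RANGE}")
    if not TEMP_RANGE_C[0] <= c.temperature_c <= TEMP_RANGE_C[1]:
        notes.append(f"{c.temperature_c} C outside {TEMP_RANGE_C}")
    return notes

print(review_conditions(ReactionConditions("PBS", 7.4, 25.0)))  # []
```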

In some embodiments, contacting proteins comprising peptide linkers as defined herein "under conditions that enable the formation of an isopeptide bond" includes contacting said proteins in the presence of a chemical chaperone, e.g. a molecule that enhances or improves the reactivity of the peptide linkers. In some embodiments, the chemical chaperone is TMAO (trimethylamine N-oxide). In some embodiments, the chemical chaperone, e.g. TMAO, is present in the reaction at a concentration of at least about 0.2M, e.g. at least 0.3, 0.4, 0.5, 1.0, 1.5, 2.0 or 2.5M, e.g. about 0.2-3.0M, 0.5-2.0M, 1.0-1.5M.
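To make the chaperone concentrations concrete, the volume of a stock solution needed to reach a given final TMAO concentration follows from C1 x V1 = C2 x V2. The 4.0 M stock below is a hypothetical value chosen for illustration, not one given in the text.

```python
def stock_volume_needed(final_conc_m, final_volume_ul, stock_conc_m):
    """Volume of stock (same units as final_volume_ul) giving final_conc_m,
    from C1 * V1 = C2 * V2."""
    if final_conc_m > stock_conc_m:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc_m * final_volume_ul / stock_conc_m

# Hypothetical 4.0 M TMAO stock; 1.0 M final in a 100 uL reaction.
v = stock_volume_needed(1.0, 100.0, 4.0)
print(f"add {v:.1f} uL of stock, make up to 100 uL")  # add 25.0 uL of stock, make up to 100 uL
```

At the top of the stated range (3.0 M) the same stock would occupy 75 uL of a 100 uL reaction, leaving little volume for the proteins and buffer, which is one practical reason to prefer a more concentrated stock or a concentration in the lower part of the range.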

In some embodiments, the formation of the isopeptide bond between said peptide linkers is not spontaneous, i.e. the formation of the isopeptide bond is induced or catalyzed by a component that is added to the reaction. The component that induces or catalyzes the formation of the isopeptide bond may be a peptide, e.g. a polypeptide such as an enzyme, such as a transglutaminase. In a preferred embodiment, the component that induces or catalyzes the formation of the isopeptide bond may be a peptide derived from an isopeptide protein, i.e. a domain or fragment of an isopeptide protein comprising a glutamic acid or aspartic acid residue that facilitates, e.g. induces or catalyzes, the formation of the isopeptide bond between the lysine and asparagine or aspartate residues in the peptide linkers. A peptide that facilitates, e.g. induces or catalyzes, the formation of the isopeptide bond between the lysine and asparagine or aspartate residues in the peptide linkers may be viewed as a protein ligase or peptide ligase, insofar as it is capable of specifically inducing the formation of an isopeptide bond between two peptide linkers.

Thus, in embodiments where the formation of the isopeptide bond between said peptide linkers is not spontaneous, i.e. wherein a component (e.g. peptide, e.g. peptide ligase) that induces the formation of an isopeptide bond between the peptide linkers is provided separately, both of the peptide linkers may be viewed as peptide tags, as defined below. Accordingly, the peptide that induces the formation of an isopeptide bond between the peptide linkers (peptide tags) may be viewed as a peptide ligase or peptide linker pair binding partner.

Thus, in some embodiments, the method of the invention further comprises a step of contacting the proteins to be linked with a component (e.g. peptide) capable of inducing the formation of an isopeptide bond between said peptide linkers under conditions that enable the formation of an isopeptide bond between said proteins. In some embodiments, the component capable of inducing the formation of an isopeptide bond between said peptide linkers is a peptide comprising a glutamic acid or aspartic acid residue that induces the formation of the isopeptide bond between the lysine and asparagine or aspartate residues in the peptide linkers in said proteins.

The component (e.g. peptide) capable of inducing the formation of an isopeptide bond between said peptide linkers may be added to the reaction before, after or at the same time as the proteins to be joined together are contacted with each other. In some embodiments, the component (e.g. peptide) capable of inducing the formation of an isopeptide bond between said peptide linkers may be added to the reaction after the proteins to be joined together are contacted with each other.

The use of a component (e.g. peptide) capable of inducing the formation of an isopeptide bond between said peptide linkers is particularly advantageous because it allows the protein units of the fusion protein to be joined (conjugated) without the presence of large intervening peptide domains. Alternatively viewed, the use of a component (e.g. peptide) capable of inducing the formation of an isopeptide bond between said peptide linkers facilitates the use of small peptide linkers (e.g. peptide tags), i.e. the minimum peptide sequence of each peptide linker in a cognate peptide linker pair capable of forming an isopeptide bond between said peptide linkers.

In some embodiments, the pair of cognate peptide linkers and the peptide capable of inducing the formation of an isopeptide bond between said peptide linkers are derived from the same isopeptide protein.

Proteins capable of spontaneously forming an isopeptide bond may be capable of forming at least one such bond and may comprise more than one isopeptide bond, for example 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. It may be possible to develop several different peptide linker pairs from an isopeptide protein, particularly if more than one spontaneously formed isopeptide bond is present within a protein. In some embodiments, different peptide linker pairs derived from the same isopeptide protein may be orthogonal. It is preferred in the present invention to develop each pair of peptide linkers from an isopeptide protein which comprises only one or two isopeptide bonds.

Examples of known proteins capable of spontaneously forming one or more isopeptide bonds include Spy0128 (Kang et al, Science, 2007, 318(5856), 1625-8), Spy0125 (Pointon et al, J. Biol. Chem., 2010, 285(44), 33858-66) and FbaB (Oke et al, J. Struct Funct Genomics, 2010, 11(2), 167-80) from Streptococcus pyogenes, Cna of Staphylococcus aureus (Kang et al, Science, 2007, 318(5856), 1625-8), the ACE19 protein of Enterococcus faecalis (Kang et al, Science, 2007, 318(5856), 1625-8), the BcpA pilin from Bacillus cereus (Budzik et al, PNAS USA, 2009, 106(47), 19992-7), the minor pilin GBS52 from Streptococcus agalactiae (Kang et al, Science, 2007, 318(5856), 1625-8), SpaA from Corynebacterium diphtheriae (Kang et al, PNAS USA, 2009, 106(40), 16967-71), SpaP from Streptococcus mutans (Nylander et al, Acta Crystallogr Sect F Struct Biol Cryst Commun., 2011, 67(Pt 1), 23-6), RrgA (Izore et al, Structure, 2010, 18(1), 106-15), RrgB (El Mortaji et al, J. Biol. Chem., 2010, 285(16), 12405-15) and RrgC (El Mortaji et al, J. Biol. Chem., 2010, 285(16), 12405-15) from Streptococcus pneumoniae, and SspB from Streptococcus gordonii (Forsgren et al, J Mol Biol, 2010, 397(3), 740-51). As discussed above, any of these proteins may be used to generate peptide linkers (particularly cognate peptide linker pairs) for use in the methods and uses of the present invention.

The arrangement or order of the peptide linkers in the proteins to be linked to form the fusion protein is not particularly important. For instance, the first protein of the desired fusion protein may comprise a peptide tag (A) and the second protein may comprise a peptide binding partner that is cognate for the peptide tag on the first protein (A') and a peptide binding partner that is cognate for the peptide tag on the third protein (B'). Alternatively, the first protein of the desired fusion protein may comprise a peptide binding partner (A') and the second protein may comprise a peptide tag that is cognate for the peptide binding partner on the first protein (A) and a peptide tag that is cognate for the peptide binding partner on the third protein (B). In this respect, it is sufficient that the pair of peptide linkers used to link two proteins (e.g. a first protein and a second protein, or a fusion protein and a further protein) is orthogonal to the pair of peptide linkers used to extend the fusion protein. As discussed below, orthogonal peptide linkers may be achieved in a variety of ways.

Thus, in some preferred embodiments, the first pair of peptide linkers (A/A') comprises one peptide linker, A (e.g. peptide tag), with a reactive lysine residue and one peptide linker, A' (e.g. peptide binding partner), with a reactive aspartate or asparagine residue, and the second pair of peptide linkers (B/B') comprises one peptide linker, B (e.g. peptide tag), with a reactive aspartate or asparagine residue and one peptide linker, B' (e.g. peptide binding partner), with a reactive lysine residue. In this arrangement, there is no suitable route for A to react with B' (both carry a reactive lysine) and no suitable route for B to react with A' (both carry a reactive aspartate or asparagine). Accordingly, the peptide linker pairs are orthogonal to each other.
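The residue-type argument for orthogonality can be sketched as a simple complementarity check. This is a deliberately simplified model: it assumes a bond is possible only between a linker carrying the reactive lysine ("K") and one carrying the reactive aspartate/asparagine ("D/N"), and ignores docking and geometry.

```python
# Reactive residue carried by each linker in the arrangement above.
# "K" = reactive lysine; "D/N" = reactive aspartate or asparagine.
linkers = {"A": "K", "A'": "D/N", "B": "D/N", "B'": "K"}

def complementary(x, y):
    """True if one linker carries the lysine and the other the Asp/Asn,
    i.e. a Lys-Asp/Asn isopeptide bond is at least chemically possible."""
    return {linkers[x], linkers[y]} == {"K", "D/N"}

# Cognate pairs should be complementary; cross-pairs should not.
for x, y in [("A", "A'"), ("B", "B'"), ("A", "B'"), ("B", "A'")]:
    print(x, y, "can react" if complementary(x, y) else "no reactive route")
```

Complementarity is necessary but not sufficient for reaction, since the linkers must also dock. In arrangements where A and B carry the same type of reactive residue, as in the further embodiments described herein, this check alone would not rule out cross-reaction, which is why differences in anchor residues are used instead.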

In further embodiments, the first pair of peptide linkers (A/A') comprises one peptide linker, A (e.g. peptide tag), with a reactive lysine residue and one peptide linker, A' (e.g. peptide binding partner), with a reactive aspartate or asparagine residue, and the second pair of peptide linkers (B/B') comprises one peptide linker, B (e.g. peptide tag), with a reactive lysine residue and one peptide linker, B' (e.g. peptide binding partner), with a reactive aspartate or asparagine residue.

Alternatively, the first pair of peptide linkers (A/A') comprises one peptide linker, A (e.g. peptide tag), with a reactive aspartate or asparagine residue and one peptide linker, A' (e.g. peptide binding partner), with a reactive lysine residue, and the second pair of peptide linkers (B/B') comprises one peptide linker, B (e.g. peptide tag), with a reactive aspartate or asparagine residue and one peptide linker, B' (e.g. peptide binding partner), with a reactive lysine residue. In these embodiments, peptide linkers (peptide tags) A and B may be selected such that they have a substantial difference in size in at least one (e.g. two or three) "anchor" residue, so that the non-covalent docking of A with B' and of B with A' (i.e. the interaction between A and B' and between B and A') is inefficient, thereby ensuring that there is minimal cross-reaction.

The term "anchor residues" refers to amino acid residues in a β-strand of one of the peptide linkers in a cognate peptide linker pair (e.g. the peptide binding partner) that point toward the hydrophobic core of the peptide linker and accept the reactive residue from the other peptide linker of the cognate peptide linker pair (e.g. the peptide tag). A β-strand alternates between residues facing towards the solvent and residues facing towards the hydrophobic protein core, and the residue orientation is defined from the structure of the domain forming a spontaneous isopeptide bond in the isopeptide protein from which the peptide linker is derived. This may be determined by any suitable method known in the art, e.g. X-ray crystallography, nuclear magnetic resonance or cryo-electron microscopy.
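As a simplified sketch, the alternation along a β-strand means that, once the orientation of one residue is known from the structure, the remaining core-facing (candidate anchor) positions follow every second residue. The residue numbers below are hypothetical, and strict alternation is an idealization: irregularities such as β-bulges break it, so the structure itself remains the reference.

```python
def core_facing_positions(strand_start, strand_end, first_faces_core):
    """Residue numbers in a beta-strand whose side chains point to the core.

    Assumes strict alternation of side-chain orientation along the strand;
    whether the first residue faces the core must come from the structure.
    """
    offset = 0 if first_faces_core else 1
    return list(range(strand_start + offset, strand_end + 1, 2))

# Hypothetical strand spanning residues 20-27 whose first residue faces solvent.
print(core_facing_positions(20, 27, first_faces_core=False))  # [21, 23, 25, 27]
```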

Small anchor residues include alanine and valine. Intermediate size anchor residues include leucine, isoleucine and methionine. Large anchor residues include phenylalanine and tryptophan. Thus, in some embodiments at least one small anchor residue may be replaced with an intermediate size or large anchor residue. In some embodiments at least one intermediate size anchor residue may be replaced with a small or large anchor residue. In still further embodiments, at least one large anchor residue may be replaced with an intermediate size or small anchor residue.
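The size classes above can be captured in a small lookup table, from which substitutions that change the size class of an anchor residue can be enumerated. This sketch covers only the residues named in the text, using one-letter codes, and the function name is illustrative.

```python
# Size classes of anchor residues as listed above (one-letter codes).
SIZE_CLASS = {
    "A": "small", "V": "small",
    "L": "intermediate", "I": "intermediate", "M": "intermediate",
    "F": "large", "W": "large",
}

def size_changing_substitutions(residue):
    """Candidate replacements for an anchor residue from a different size class."""
    cls = SIZE_CLASS[residue]
    return sorted(r for r, c in SIZE_CLASS.items() if c != cls)

print(size_changing_substitutions("V"))  # ['F', 'I', 'L', 'M', 'W']
```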

In some embodiments, orthogonal pairs of peptide linkers may be derived from different isopeptide proteins or different domains of the same isopeptide protein. In some embodiments, orthogonal pairs of peptide linkers are produced de novo.

Pairs of peptide linkers that are produced de novo should possess the two required reactive amino acid residues for the spontaneous formation of the isopeptide bond, preferably together with a glutamic acid or aspartic acid residue. Thus, as described above, one peptide linker comprises a reactive lysine residue and the other peptide linker comprises a reactive asparagine or aspartate residue. In a preferred embodiment, one of the peptide linkers also comprises a glutamic acid or aspartic acid residue that induces or facilitates the formation of an isopeptide bond between said peptide linkers. However, as noted above, a component (e.g. peptide, e.g. a peptide ligase) comprising a glutamic acid or aspartic acid residue that induces or facilitates the formation of an isopeptide bond between said peptide linkers may be provided separately.

It will be evident that neither peptide linker in a cognate peptide linker pair comprises both reactive residues involved in the formation of the isopeptide bond, i.e. each peptide linker in a cognate pair of peptide linkers comprises one reactive residue, i.e. a lysine residue or an aspartate/asparagine residue.

In embodiments where one of the peptide linkers comprises a glutamic acid or aspartic acid residue that induces or facilitates the formation of an isopeptide bond between said peptide linkers, typically said glutamic acid or aspartic acid residue is within 6.5 Angstrom of the residue in the linker involved in the isopeptide bond, e.g. within 6.0, 5.5, 5.0, 4.5, 4.0, 3.5 or 3.0 Angstrom. These distances particularly refer to the distances between the relevant atoms within each residue, i.e. the atoms involved in forming the isopeptide bond. When the two peptide linkers are brought into proximity with each other, e.g. when the first and second proteins are contacted together, the two reactive residues (and more particularly, their relevant atoms) involved in the bond should be within 4 Angstrom from each other in space, preferably 3.8, 3.6, 3.4, 3.2, 3.0, 2.8, 2.6, 2.4, 2.2, 2.0, 1.8 or 1.6 Angstrom.

The skilled person would immediately recognise that the pKa of residues involved in the isopeptide bond formation should also be considered when designing an isopeptide protein de novo. For example, it is preferred that the reactive lysine residue be deprotonated before reaction, which at neutral pH may require the lysine to be buried in the hydrophobic core.
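The pKa consideration can be made concrete with the Henderson-Hasselbalch equation: the fraction of an amine in its neutral, deprotonated form is 1 / (1 + 10^(pKa - pH)). The sketch assumes a typical solvent-exposed lysine side-chain pKa of about 10.5; the lowered pKa of 7.0 used for a buried lysine is a hypothetical value chosen for illustration.

```python
def fraction_deprotonated(pka, ph):
    """Fraction of an amine (e.g. a lysine side chain) in the neutral,
    deprotonated form, from the Henderson-Hasselbalch equation."""
    return 1.0 / (1.0 + 10 ** (pka - ph))

exposed = fraction_deprotonated(10.5, 7.0)  # about 3.2e-4: almost fully protonated
buried = fraction_deprotonated(7.0, 7.0)    # 0.5 if burial lowered the pKa to 7 (hypothetical)
print(f"exposed: {exposed:.1e}, buried: {buried:.2f}")
```

Each unit by which the environment lowers the pKa therefore raises the reactive, deprotonated fraction at pH 7 roughly tenfold, as long as the pKa stays well above the pH.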

Whilst it is preferred that orthogonal pairs of peptide linkers are derived from different isopeptide proteins or different domains of the same isopeptide protein, it is possible to produce orthogonal pairs of peptide linkers from the same isopeptide protein, particularly from the same domains of an isopeptide protein. For instance, one peptide linker from a cognate pair of peptide linkers may be modified such that it does not react (or does not react efficiently) with the other peptide linker in the pair. The modification may be reversible, such that reversing or removing the modification that prevents the reaction between the peptide linkers reconstitutes the capacity of the peptide linker pair to react efficiently to form an isopeptide bond. Thus, by way of example, one of the peptide linkers of the cognate peptide linker pair A/A' may be modified, e.g. A is modified by the addition of a blocking group, to produce peptide linker B, wherein B cannot react efficiently with A' or A to form an isopeptide bond. Removal of the blocking group from B results in the peptide linker B', which is capable of reacting with A' to form an isopeptide bond.

The use of reversible or removable blocking groups is well known in the art. Thus, the addition of a blocking group to one peptide linker from a cognate pair of peptide linkers to produce an orthogonal pair of peptide linkers may be viewed as adding a protecting group to the peptide linker or caging the peptide linker. The blocking (e.g. protecting, masking or caging) group may be removed by any suitable means known in the art that reconstitutes the capacity of the peptide linker to react efficiently with the other peptide linker of the peptide linker pair to form an isopeptide bond. The removal of the blocking group (e.g. deprotecting, unmasking, uncaging) may be achieved via a chemical, enzymatic or light-induced reaction, depending on the nature of the blocking group. Suitable examples of blocking groups include bulky moieties, such as proteins, which may sterically impede reaction and may be removed by use of an enzyme, such as Tobacco Etch Virus protease (as reviewed in Leriche G, Chisholm L, Wagner A. Cleavable linkers in chemical biology. Bioorg Med Chem. 2012 Jan 15;20(2):571-82. doi: 10.1016/j.bmc.2011.07.048. Epub 2011 Jul 30); trans-cyclooctene-caged lysine (N-(((E)-cyclooct-2-en-1-yl)oxy)carbonyl-L-lysine), which is chemically decaged by reaction with a tetrazine (Li J, Jia S, Chen PR. Diels-Alder reaction-triggered bioorthogonal protein decaging in living cells. Nat Chem Biol. 2014 Dec;10(12):1003-5. doi: 10.1038/nchembio.1656. Epub 2014 Nov 2); or lysine caged with o-nitrobenzyl or coumarin groups, which are decaged by light of the appropriate wavelength, as well known in the art (see e.g. Klán P, Šolomek T, Bochet CG, Blanc A, Givens R, Rubina M, Popik V, Kostikov A, Wirz J. Photoremovable protecting groups in chemistry and biology: reaction mechanisms and efficacy. Chem Rev. 2013 Jan 9;113(1):119-91. doi: 10.1021/cr300177k. Epub 2012 Dec 21).

The use of blocking groups need not be limited to the production of additional orthogonal pairs of peptide linkers. For instance, blocking groups may be particularly useful to control the extension of fusion proteins, e.g. in multiplex reactions. By way of example, multiple fusion proteins may be synthesised on a single solid phase substrate, e.g. to produce an array comprising a variety of different fusion proteins. The physical separation of each fusion protein on the solid phase would facilitate the selective unblocking of peptide linkers on the substrate, e.g. using light-reactive blocking groups akin to the generation of nucleic acid arrays. The selective unblocking of peptide linkers would enable the extension of a single fusion protein, or a set of fusion proteins (e.g. in a specific location on the solid phase), in one extension reaction, and the extension of a different fusion protein, or set of fusion proteins, in subsequent reactions.
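The multiplex scheme can be illustrated with a toy model of an array in which each spot holds a growing fusion protein whose terminal linker is either blocked or free. This is a hypothetical sketch of the bookkeeping only: it assumes each newly added unit carries its own blocked linker for the next round, and it models nothing of the chemistry.

```python
# Toy array: spot position -> growing fusion protein and state of its free end.
array = {
    (0, 0): {"units": ["P1"], "blocked": True},
    (0, 1): {"units": ["P1"], "blocked": True},
    (1, 0): {"units": ["P1"], "blocked": True},
}

def unblock(spots, mask):
    """Remove the (e.g. light-labile) blocking group only at masked spots."""
    for pos in mask:
        spots[pos]["blocked"] = False

def extend(spots, new_unit):
    """Add new_unit wherever the terminal linker is free; the new unit is
    assumed to bring a blocked linker, so the spot is blocked again."""
    for spot in spots.values():
        if not spot["blocked"]:
            spot["units"].append(new_unit)
            spot["blocked"] = True

unblock(array, mask=[(0, 1)])
extend(array, "P2")
unblock(array, mask=[(0, 0), (1, 0)])
extend(array, "P3")
for pos, spot in sorted(array.items()):
    print(pos, spot["units"])
```

After two rounds, spot (0, 1) carries P1-P2 while spots (0, 0) and (1, 0) carry P1-P3, showing how one solid phase can hold different fusion proteins.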

Thus, in some embodiments, one or more of the peptide linkers may comprise a blocking group, i.e. a reversible blocking group. In some embodiments, the blocking group may be removed by exposing the fusion protein to light, e.g. UV light, or by contacting it with a chemical or an enzyme that removes the blocking group.

Thus, in some embodiments, the method of the invention may comprise a step of unblocking or removing a blocking group from a peptide linker in the fusion protein.

In a representative embodiment, the invention provides a method of producing (e.g. generating, synthesizing, assembling etc.) a fusion protein, said method comprising:

a) contacting a first protein with a second protein under conditions that enable the formation of an isopeptide bond between said proteins, wherein said first protein and said second protein each comprise a peptide linker, wherein said peptide linkers are a pair of peptide linkers which react (with each other) to form an isopeptide bond that links said first protein to said second protein to form a linked protein; and

b) contacting the linked protein from (a) with a third protein under conditions that enable the formation of an isopeptide bond between said third protein and said linked protein, wherein said third protein comprises a peptide linker which reacts with a further peptide linker in the linked protein from (a), and wherein said peptide linkers are a pair of peptide linkers that react (with each other) to form an isopeptide bond that links said third protein to said linked protein to form a fusion protein, wherein said pair of peptide linkers from (a) are orthogonal to the pair of peptide linkers from (b)

and wherein the further peptide linker in the linked protein comprises a blocking group and the conditions that enable the formation of an isopeptide bond between said third protein and said linked protein include treating the linked protein to remove the blocking group. In some embodiments, the blocking group may be removed (the peptide linker may be unblocked) before the step of contacting the linked protein with said third protein. In some embodiments, the blocking group may be removed (the peptide linker may be unblocked) after or contemporaneously with the step of contacting the linked protein with said third protein.
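Steps (a) and (b) of this embodiment can be sketched as a small simulation. In this hypothetical model, orthogonality is represented simply by letting only linkers of the same pair ("A" or "B") and opposite roles react, and a blocked linker cannot react until it is unblocked; the class and function names are illustrative and do not describe the underlying chemistry.

```python
from dataclasses import dataclass, field

@dataclass
class Linker:
    pair: str            # cognate pair this linker belongs to, e.g. "A" or "B"
    role: str            # "tag" or "partner"
    blocked: bool = False
    used: bool = False

@dataclass
class Construct:
    units: list
    linkers: list = field(default_factory=list)

def react(x: Construct, y: Construct) -> Construct:
    """Join x and y through the first free, unblocked, cognate linker pair."""
    for lx in x.linkers:
        for ly in y.linkers:
            if (not lx.used and not ly.used and not lx.blocked and not ly.blocked
                    and lx.pair == ly.pair and lx.role != ly.role):
                lx.used = ly.used = True
                return Construct(x.units + y.units, x.linkers + y.linkers)
    raise ValueError("no reactive, unblocked cognate pair")

first = Construct(["P1"], [Linker("A", "tag")])
second = Construct(["P2"], [Linker("A", "partner"), Linker("B", "tag", blocked=True)])
third = Construct(["P3"], [Linker("B", "partner")])

linked = react(first, second)       # step (a): A/A' pair
for lk in linked.linkers:           # remove the blocking group from the further linker
    if lk.pair == "B":
        lk.blocked = False
fusion = react(linked, third)       # step (b): B/B' pair
print(fusion.units)                 # ['P1', 'P2', 'P3']
```

Skipping the unblocking step makes `react(linked, third)` raise an error, mirroring the requirement that the blocking group be removed before the third protein can be attached.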

The term "peptide linker" as used herein generally refers to a peptide, oligopeptide or polypeptide which may be designed or derived directly from an isopeptide protein, e.g. the peptide linker may be a fragment of an isopeptide protein or a modification thereof. There is no standard definition of the size boundaries between peptides, oligopeptides and polypeptides, but typically a peptide may be viewed as comprising 2-20 amino acids, an oligopeptide 21-39 amino acids and a polypeptide at least 40 amino acids. Thus, a peptide linker as defined herein may be viewed as comprising at least 6 amino acids, e.g. 6-300 amino acids.
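The indicative size boundaries above can be written as a simple classification. This is a sketch only: the text stresses that the boundaries are not standardized, and the function names are illustrative.

```python
def length_class(n_residues):
    """Rough size class using the indicative boundaries given above."""
    if n_residues < 2:
        raise ValueError("need at least 2 amino acids")
    if n_residues <= 20:
        return "peptide"
    if n_residues <= 39:
        return "oligopeptide"
    return "polypeptide"

def in_peptide_linker_range(n_residues, lo=6, hi=300):
    """True if the length falls in the 6-300 residue range given for peptide linkers."""
    return lo <= n_residues <= hi

print(length_class(13), length_class(30), length_class(120))  # peptide oligopeptide polypeptide
```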

In some embodiments, a peptide linker may be referred to as a peptide tag and may be between 6-50 amino acids in length, e.g. 7-45, 8-40, 9-35, 10-30 or 11-25 amino acids in length, e.g. it may comprise or consist of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids. The peptide linker or tag specifically binds covalently, via an isopeptide bond, to a second peptide linker, which may be viewed as a peptide tag or a peptide binding partner, as defined below. Two peptide linkers (e.g. peptide tag and peptide tag, or peptide tag and peptide binding partner) that react with each other (e.g. specifically and efficiently) to form an isopeptide bond may be defined as a pair of peptide linkers, particularly a cognate pair of peptide linkers.

Thus, as mentioned above, a peptide linker must comprise at least one amino acid residue, e.g. lysine or asparagine/aspartate, that is involved in the formation of an isopeptide bond. Accordingly, each peptide linker in a pair of peptide linkers must comprise a different, i.e. complementary, reactive amino acid residue that is involved in the formation of an isopeptide bond, i.e. one peptide linker comprises a lysine residue and the other peptide linker comprises an asparagine or aspartate residue.

In some embodiments, a pair of peptide linkers comprises two peptide tags. Typically, two peptide tags do not react spontaneously to form an isopeptide bond, i.e. they require the addition of a component (e.g. peptide, e.g. a peptide ligase) that induces or catalyzes the formation of the isopeptide bond between said peptide tags/linkers, as defined above.

In some embodiments, a peptide linker (i.e. one of the peptide linkers in a cognate pair of peptide linkers) may be referred to as a peptide binding partner, which may be defined as a peptide (particularly an oligopeptide or polypeptide) which is derived or designed from an isopeptide protein and which may covalently bind to a peptide tag via an isopeptide bond (preferably via a spontaneous reaction). In some embodiments, the peptide binding partner may be designed or derived from the same isopeptide protein as the peptide tag to which it binds covalently, i.e. its corresponding peptide tag or linker.

Generally, a peptide binding partner is larger than its corresponding peptide tag and comprises or consists of a larger fragment or portion of the isopeptide protein compared to the peptide tag. In particular, in addition to comprising a residue that is involved in the formation of the isopeptide bond (i.e. a lysine or asparagine/aspartate) the peptide binding partner comprises a glutamic acid or aspartic acid residue that facilitates or induces the formation of the isopeptide bond between the peptide linkers, e.g. the peptide tag and peptide binding partner.

Thus, a peptide binding partner may comprise a fragment of an isopeptide protein which overlaps with a fragment designed to constitute a peptide tag or may comprise a discrete and separate fragment of the isopeptide protein compared to that of the peptide tag. Thus, the sequence of the peptide binding partner may overlap with that of the designed peptide tag or the peptide tag and peptide binding partner may comprise or consist of two discrete fragments of the isopeptide protein. In some embodiments, the peptide tag may not be based on the sequence of the isopeptide protein, e.g. the peptide tag (peptide linker) may be designed de novo.

Whilst there is no particular limit on the size of a peptide binding partner, practically it is preferable to minimise the size of the peptide linkers for use in the methods and uses of the invention.

Thus, in some embodiments, the peptide linker (e.g. peptide binding partner) may be between 50-300 amino acids in length, e.g. 60-250, 70-225, 80-200 amino acids in length, e.g. it may comprise or consist of 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190 or 200 amino acids.

Accordingly, in some embodiments, a pair of peptide linkers comprises a peptide tag and a peptide binding partner, wherein said peptide linkers react spontaneously to form an isopeptide bond. When the isopeptide bond formation between the peptide linkers is not spontaneous (e.g. when the peptide linkers are both peptide tags), the peptide that induces or catalyzes the formation of the isopeptide bond between said peptide tags/linkers may be viewed as a peptide (e.g. a peptide ligase) derived from an isopeptide protein, or as a peptide binding partner as defined above. In particular, the peptide comprises a glutamic acid or aspartic acid residue that facilitates or induces the formation of the isopeptide bond between the peptide linkers but, importantly, the ligase does not contain an amino acid residue that reacts to form an isopeptide bond with either of the peptide linkers in the peptide linker pair. In some embodiments, the peptide ligase may be between 50-300 amino acids in length, e.g. 60-250, 70-225, 80-200 amino acids in length, e.g. it may comprise or consist of 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190 or 200 amino acids.

A peptide linker (e.g. a peptide tag and/or peptide binding partner) therefore does not consist of the entire protein sequence of an isopeptide protein and is shorter in length. For instance, a peptide linker may comprise less than 5, 10, 20, 30, 40 or 50% of the number of amino acid residues present in the isopeptide protein.
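The size criteria above reduce to simple arithmetic. The sketch below is illustrative only: the function name and the 900-residue parent length are hypothetical and are not taken from the sequence listing. It checks a candidate binding partner or ligase against the 50-300 residue range and reports what percentage of the parent isopeptide protein it represents.

```python
def linker_size_report(linker_length: int, parent_length: int,
                       min_length: int = 50, max_length: int = 300) -> dict:
    """Check a candidate peptide binding partner/ligase against the length range
    described above and express its size as a percentage of the parent protein."""
    if parent_length <= 0:
        raise ValueError("parent_length must be positive")
    percent_of_parent = 100.0 * linker_length / parent_length
    return {
        "within_length_range": min_length <= linker_length <= max_length,
        "percent_of_parent": percent_of_parent,
        # thresholds listed in the text: less than 5, 10, 20, 30, 40 or 50 %
        "below_thresholds": [t for t in (5, 10, 20, 30, 40, 50) if percent_of_parent < t],
    }


# Hypothetical example: a 120-residue binding partner cut from a 900-residue protein
report = linker_size_report(120, 900)
```

Here the candidate is about 13.3% of the parent, so it falls within the 50-300 residue range and satisfies the "less than 20%" (and larger) thresholds but not "less than 10%". Short peptide tags will naturally fall below the 50-residue lower bound, which the text applies to binding partners and ligases.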

Whilst a peptide linker or pair of peptide linkers can be based upon a sequence of an isopeptide protein (particularly one or more fragments thereof), it will be readily understood by the skilled person that the sequence of the peptide linker may differ from the sequence of the portion of the isopeptide protein from which it is derived. Thus, in some embodiments the peptide linker or pair of peptide linkers may comprise mutations or alterations as compared to the sequence of the isopeptide protein from which it is derived. As discussed below, some mutations may be introduced to the peptide linker sequence to improve the stability and/or function of the peptide linker, e.g. to improve the reaction rate of the spontaneous isopeptide bond formation between the peptide linkers.

Thus, in some embodiments, a peptide linker may comprise or consist of a fragment of an isopeptide protein, wherein the fragment fulfils the size criteria set forth above and comprises at least 70, 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% sequence identity to a comparable region of the isopeptide protein from which it was derived.

Moreover, as noted above, isopeptide proteins may be identified by searching for structural homologues of known isopeptide proteins, i.e. proteins with sequence similarity or identity to known isopeptide proteins. Such homologues may be viewed as functionally equivalent proteins and may find utility in the production of peptide linkers of the present invention.

In some embodiments, a pair of peptide linkers for use in the methods and uses of the invention may be derived from any suitable isopeptide protein. As mentioned above, various isopeptide proteins are known in the art. For instance, peptide linkers may be derived from the major pilin protein Spy0128, which has an amino acid sequence as set out in SEQ ID NO. 23 and is encoded by a nucleotide sequence as set out in SEQ ID NO. 24. Two isopeptide bonds are formed in the protein. One isopeptide bond is formed between lysine at position 179 in SEQ ID NO. 23 and asparagine at position 303 in SEQ ID NO. 23 (the reactive residues). The glutamic acid residue which induces the spontaneous isopeptide bond is found at position 258 in SEQ ID NO. 23. Thus, a pair of peptide linkers developed from an isopeptide protein set forth in SEQ ID NO: 23 will preferably comprise a peptide linker comprising a fragment of the protein comprising the reactive asparagine at position 303 and a peptide linker comprising a fragment of the protein comprising the reactive lysine at position 179. In some embodiments, one of the peptide linkers will comprise a fragment that also contains the glutamic acid residue at position 258. In some embodiments, a fragment of the protein comprising the glutamic acid residue at position 258 may be provided separately, i.e. as a peptide ligase as defined above.

Another isopeptide bond in the major pilin protein Spy0128 occurs between the lysine residue at position 36 of SEQ ID NO. 23 and the asparagine residue at position 168 of SEQ ID NO. 23. The glutamic acid residue which induces isopeptide formation is found at position 117 in SEQ ID NO. 23. Thus, a pair of peptide linkers developed from an isopeptide protein set forth in SEQ ID NO: 23 will preferably comprise a peptide linker comprising a fragment of the protein comprising the reactive lysine residue at position 36 and a peptide linker comprising a fragment of the protein comprising the reactive asparagine at position 168. In some embodiments, one of the peptide linkers will comprise a fragment that also contains the glutamic acid residue at position 117. In some embodiments, a fragment of the protein comprising the glutamic acid residue at position 117 may be provided separately, i.e. as a peptide ligase as defined above.

ACE19, a domain of an adhesin protein from E. faecalis, also spontaneously forms an isopeptide bond. ACE19 has an amino acid sequence as set forth in SEQ ID NO. 27 and is encoded by a nucleotide sequence as set forth in SEQ ID NO. 28. The isopeptide bond occurs between a lysine residue at position 181 of SEQ ID NO. 27 and an asparagine residue at position 294 of SEQ ID NO. 27. The bond is induced by an aspartic acid residue at position 213 in SEQ ID NO. 27. Thus, a pair of peptide linkers developed from the isopeptide protein set forth in SEQ ID NO: 27 will preferably comprise a peptide linker comprising a fragment of the protein comprising the reactive asparagine residue at position 294 and a peptide linker comprising a fragment of the protein comprising the reactive lysine residue at position 181. In some embodiments, one of the peptide linkers will comprise a fragment that also contains the aspartic acid residue at position 213. In some embodiments, a fragment of the protein comprising the aspartic acid residue at position 213 may be provided separately, i.e. as a peptide ligase as defined above.

The collagen-binding domain from S. aureus, which has an amino acid sequence as set out in SEQ ID NO. 29, comprises one spontaneously formed isopeptide bond. The isopeptide bond occurs between lysine at position 176 of SEQ ID NO. 29 and asparagine at position 308 of SEQ ID NO. 29. The aspartic acid residue which induces the isopeptide bond is at position 209 of SEQ ID NO. 29. Thus, a pair of peptide linkers developed from the isopeptide protein set forth in SEQ ID NO: 29 will preferably comprise a peptide linker comprising a fragment of the protein comprising the reactive lysine at position 176 and a peptide linker comprising a fragment of the protein comprising the reactive asparagine at position 308. In some embodiments, one of the peptide linkers will comprise a fragment that also contains the aspartic acid residue at position 209. In some embodiments, a fragment of the protein comprising the aspartic acid residue at position 209 may be provided separately, i.e. as a peptide ligase as defined above.

FbaB from Streptococcus pyogenes comprises a domain, CnaB2, which has an amino acid sequence set out in SEQ ID NO. 25, is encoded by the nucleotide sequence set out in SEQ ID NO. 26 and comprises one spontaneously formed isopeptide bond. The isopeptide bond in the CnaB2 domain forms between a lysine at position 15 of SEQ ID NO. 25 and an aspartic acid residue at position 101 of SEQ ID NO. 25. The glutamic acid residue which induces the isopeptide bond is at position 61 of SEQ ID NO. 25. Thus, a pair of peptide linkers developed from the isopeptide protein set forth in SEQ ID NO: 25 will preferably comprise a peptide linker comprising a fragment of the protein comprising the reactive lysine at position 15 and a peptide linker comprising a fragment of the protein comprising the reactive aspartic acid at position 101. In some embodiments, one of the peptide linkers will comprise a fragment that also contains the glutamic acid residue at position 61. In some embodiments, a fragment of the protein comprising the glutamic acid residue at position 61 may be provided separately, i.e. as a peptide ligase as defined above (e.g. SEQ ID NO: 34).

The RrgA protein is an adhesion protein from Streptococcus pneumoniae, which has an amino acid sequence as set out in SEQ ID NO. 21 and is encoded by a nucleotide sequence as set out in SEQ ID NO. 22. An isopeptide bond is formed between lysine at position 742 in SEQ ID NO. 21 and asparagine at position 854 in SEQ ID NO. 21. The bond is induced by a glutamic acid residue at position 803 in SEQ ID NO. 21. Thus, a pair of peptide linkers developed from the isopeptide protein set forth in SEQ ID NO: 21 will preferably comprise a peptide linker comprising a fragment of the protein comprising the reactive asparagine at position 854 and a peptide linker comprising a fragment of the protein comprising the reactive lysine at position 742. In some embodiments, one of the peptide linkers will comprise a fragment that also contains the glutamic acid residue at position 803. In some embodiments, a fragment of the protein comprising the glutamic acid residue at position 803 may be provided separately, i.e. as a peptide ligase as defined above.

The PsCs protein is a fragment of the por secretion system C-terminal sorting domain protein from Streptococcus intermedius, which has an amino acid sequence as set out in SEQ ID NO. 31 and is encoded by a nucleotide sequence as set out in SEQ ID NO. 32. An isopeptide bond is formed between lysine at position 405 in SEQ ID NO. 31 and aspartate at position 496 in SEQ ID NO. 31. Thus, a pair of peptide linkers developed from the isopeptide protein set forth in SEQ ID NO: 31 will preferably comprise a peptide linker comprising a fragment of the protein comprising the reactive aspartate at position 496 and a peptide linker comprising a fragment of the protein comprising the reactive lysine at position 405.
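The residue positions listed in the preceding paragraphs can be collected in a small table, so that proposed fragment boundaries can be checked for the reactive lysine, the reactive asparagine/aspartate and the inducing glutamate/aspartate. The sketch below restates only positions given in the text; the class and function names and the example fragment boundaries are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IsopeptideSite:
    """Residue positions (1-based, in the cited SEQ ID NO) for one isopeptide bond."""
    protein: str
    seq_id_no: int
    lysine: int                     # reactive lysine
    partner: int                    # reactive asparagine or aspartate
    partner_residue: str            # "N" or "D"
    inducer: Optional[int]          # Glu/Asp that induces the bond, if stated
    inducer_residue: Optional[str]  # "E" or "D", if stated


# Positions restated from the text above
SITES = [
    IsopeptideSite("Spy0128 bond 1", 23, 179, 303, "N", 258, "E"),
    IsopeptideSite("Spy0128 bond 2", 23, 36, 168, "N", 117, "E"),
    IsopeptideSite("ACE19", 27, 181, 294, "N", 213, "D"),
    IsopeptideSite("S. aureus collagen-binding domain", 29, 176, 308, "N", 209, "D"),
    IsopeptideSite("FbaB CnaB2", 25, 15, 101, "D", 61, "E"),
    IsopeptideSite("RrgA", 21, 742, 854, "N", 803, "E"),
    IsopeptideSite("PsCs", 31, 405, 496, "D", None, None),  # no inducing residue stated
]


def residues_in_fragment(site: IsopeptideSite, start: int, end: int) -> List[str]:
    """Return the key residues of `site` lying in the inclusive 1-based fragment start..end."""
    roles = {"reactive K": site.lysine, "reactive " + site.partner_residue: site.partner}
    if site.inducer is not None:
        roles["inducing " + site.inducer_residue] = site.inducer
    return sorted(role for role, pos in roles.items() if start <= pos <= end)


rrga = next(s for s in SITES if s.protein == "RrgA")
print(residues_in_fragment(rrga, 780, 860))  # hypothetical binding-partner boundaries
```

With these hypothetical boundaries the RrgA fragment 780-860 carries the reactive asparagine and the inducing glutamate, i.e. it would play the binding-partner role, while a short fragment spanning position 742 would carry only the reactive lysine.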

Thus, in some embodiments, a pair of peptide linkers for use in the method of the invention may be derived from an isopeptide protein comprising an amino acid sequence as set forth in any one of SEQ ID NOs: 21, 23, 25, 27, 29 or 31, or a protein with at least 70% sequence identity to an amino acid sequence as set forth in any one of SEQ ID NOs: 21, 23, 25, 27, 29 or 31. In some embodiments, said isopeptide protein sequence is at least 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% identical to the sequence (SEQ ID NOs: 21, 23, 25, 27, 29 or 31) to which it is compared.

Preferably, peptide linkers derived from the isopeptide proteins defined above fulfil the size and sequence identity criteria described above.

Sequence identity may be determined by any suitable means known in the art, e.g. by searching the SWISS-PROT protein sequence databank using FASTA pep-cmp with a variable pamfactor, a gap creation penalty of 12.0, a gap extension penalty of 4.0 and a window of 2 amino acids. Other programs for determining amino acid sequence identity include the BestFit program of the Genetics Computer Group (GCG) Version 10 software package from the University of Wisconsin. This program uses the local homology algorithm of Smith and Waterman with the default values: gap creation penalty = 8, gap extension penalty = 2, average match = 2.912, average mismatch = -2.003.
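To illustrate the local homology algorithm mentioned above, the following is a minimal Smith-Waterman sketch with affine gap penalties (Gotoh's formulation). It is not the BestFit program: as a simplifying assumption it replaces a full substitution matrix with flat scores equal to the average match/mismatch values quoted above, and it assumes a gap of length k costs gap_open + gap_extend × k (programs differ in how they count the first gap position).

```python
def smith_waterman_score(a: str, b: str, match: float = 2.912, mismatch: float = -2.003,
                         gap_open: float = 8.0, gap_extend: float = 2.0) -> float:
    """Best local alignment score of a and b (Smith-Waterman with affine gaps).

    Simplifications (assumptions): flat match/mismatch scores instead of a
    substitution matrix; a gap of length k costs gap_open + gap_extend * k.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score of an alignment ending at (i, j)
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in `a`
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in `b`
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # open a new gap from H, or extend an existing one
            E[i][j] = max(H[i][j - 1] - gap_open - gap_extend, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open - gap_extend, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # local alignment: scores are floored at zero
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Hypothetical sequences: one deletion (F) separates two runs of four identical residues
score = smith_waterman_score("ACDEFGHIK", "ACDEGHIK")
```

In this example the gapped alignment (8 identical residues, one gap of length 1) scores 8 × 2.912 - 10 = 13.296, which beats either ungapped four-residue run (11.648), so the algorithm bridges the deletion.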

Preferably said comparison is made over the full length of the sequence, but may be made over a smaller window of comparison, e.g. less than 200, 100 or 50 contiguous amino acids.
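As a hedged illustration of comparing over the full length versus over a smaller window, the sketch below computes percent identity from two sequences that have already been aligned (gaps written as "-"). It divides the number of identical, non-gap columns by all alignment columns in the range considered; this denominator is one common convention and an assumption here, since programs differ. The function name and sequences are hypothetical.

```python
from typing import Optional, Tuple

GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str,
                     window: Optional[Tuple[int, int]] = None) -> float:
    """Percentage of alignment columns holding the same (non-gap) residue in both sequences.

    window: optional (start, end) 0-based, half-open range of alignment columns;
    if omitted, the comparison is made over the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    start, end = window if window is not None else (0, len(aligned_a))
    columns = list(zip(aligned_a[start:end], aligned_b[start:end]))
    if not columns:
        raise ValueError("empty comparison window")
    identical = sum(1 for x, y in columns if x == y and x != GAP)
    return 100.0 * identical / len(columns)


full = percent_identity("ACDEFGHIKL", "ACDEFGHIKM")            # full length
windowed = percent_identity("ACDEFGHIKL", "ACDEFGHIKM", (0, 5))  # first 5 columns only
```

The single difference at the last position gives 90% over the full length but 100% within the first five columns, which is why the size of the comparison window should be stated alongside any identity threshold.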

Preferably such sequence identity-related proteins are functionally equivalent to the polypeptides which are set forth in the recited SEQ ID NOs. As referred to herein, "functional equivalence" refers to homologues of the isopeptide proteins discussed above that may show some reduced efficacy in spontaneously forming isopeptide bonds relative to the parent molecule (i.e. the molecule with which it shows sequence homology), but preferably are as efficient or are more efficient.

In some embodiments, orthogonal pairs of peptide linkers may be derived from any two or more of the isopeptide proteins defined above. In preferred embodiments, a first pair of peptide linkers is derived from an isopeptide protein having an amino acid sequence as set forth in SEQ ID NO: 21 and a second, orthogonal, pair of peptide linkers is derived from an isopeptide protein having an amino acid sequence as set forth in SEQ ID NO: 25. As discussed below, in some embodiments, two orthogonal pairs of peptide linkers may be derived from the same isopeptide protein, e.g. SEQ ID NO: 21. Other orthogonal pairs of peptide linkers may be derived from isopeptide proteins having amino acid sequences as set forth in SEQ ID NOs: 21 and 23, 21 and 27, 21 and 29, 21 and 31, 25 and 27, 25 and 29 or 25 and 31. The skilled person would readily be able to determine whether any two pairs of peptide linkers are orthogonal based on the methods disclosed herein, particularly the Examples. For instance, various combinations of peptide linkers from different pairs of peptide linkers may be contacted, e.g. in solution, for a suitable period of time, e.g. 1-24 hours, under conditions that facilitate isopeptide bond formation, e.g. in PBS at pH 4-9, e.g. pH 7, at 1-40°C, e.g. 25°C. The sample may be analysed, e.g. by gel electrophoresis (e.g. SDS-PAGE), to determine whether any of the peptide linkers have reacted, i.e. by looking for conjugated peptides, see e.g. Figure 7. Thus, orthogonal pairs of peptide linkers for use in the method of the invention may be derived from any suitable combination of isopeptide proteins.
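The decision rule behind such a cross-reactivity screen can be written down directly: two cognate pairs are treated as orthogonal if no component of one pair was seen to form a conjugate with any component of the other. In this sketch the component names and the "observed" conjugates are hypothetical placeholders, not results from the Examples or Figure 7.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, Set, Tuple


def are_orthogonal(pair_p: Tuple[str, str], pair_q: Tuple[str, str],
                   reacted: Set[FrozenSet[str]]) -> bool:
    """True if no component of pair_p was observed to react with any component of pair_q."""
    return not any(frozenset((x, y)) in reacted for x, y in product(pair_p, pair_q))


def orthogonality_table(pairs: Dict[str, Tuple[str, str]],
                        reacted: Set[FrozenSet[str]]) -> Dict[Tuple[str, str], bool]:
    """Orthogonality for every unordered combination of named cognate pairs."""
    return {(p, q): are_orthogonal(pairs[p], pairs[q], reacted)
            for p, q in combinations(sorted(pairs), 2)}


# Hypothetical screen: each entry in `reacted` is a conjugate band seen on SDS-PAGE
pairs = {
    "pair_A": ("A_tag", "A_partner"),
    "pair_B": ("B_tag", "B_partner"),
    "pair_C": ("C_tag", "C_partner"),
}
reacted = {
    frozenset(("A_tag", "A_partner")),  # cognate reactions
    frozenset(("B_tag", "B_partner")),
    frozenset(("C_tag", "C_partner")),
    frozenset(("B_tag", "C_partner")),  # an (invented) cross-reaction
}
table = orthogonality_table(pairs, reacted)
```

In this invented data set pair_A is orthogonal to both other pairs, whereas the B_tag/C_partner conjugate means pair_B and pair_C would not be used together in one synthesis.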

The inventor has advantageously developed pairs of peptide linkers that find particular utility in the methods and uses of the invention. In this respect, the inventor has determined that peptide linker pairs may be derived from the RrgA protein as defined above. However, as described in detail in the Examples below, the inventor introduced mutations into the peptide linkers relative to the native RrgA sequence to improve the reactivity of the peptide linkers. Specifically, a glycine residue was replaced with a threonine residue to stabilize a β-strand and an aspartate residue was replaced with a glycine residue to stabilize a hairpin turn close to the reaction site.

Thus, the present invention provides a peptide linker comprising:

(i) an amino acid sequence as set forth in SEQ ID NO: 1 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 1, wherein said amino acid sequence comprises a lysine residue at position 9; or

(ii) an amino acid sequence as set forth in SEQ ID NO: 2 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 2, wherein said amino acid sequence comprises a glutamate or aspartate residue at position 55, a threonine residue at position 94, a glycine residue at position 100 and an asparagine or aspartate residue at position 106.
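The variant definitions in (i) and (ii) combine an identity threshold with residues that must be present at fixed positions. The sketch below checks both, assuming substitution-only variants of equal length (so identity is a simple position-by-position count); the function names are hypothetical, and because SEQ ID NOs: 1 and 2 are not reproduced here, the reference sequences in the usage example are invented stand-ins.

```python
from typing import Dict, Set


def ungapped_identity(candidate: str, reference: str) -> float:
    """Percent identity for equal-length sequences compared position by position."""
    if len(candidate) != len(reference) or not reference:
        raise ValueError("this sketch only handles equal-length, non-empty sequences")
    same = sum(1 for x, y in zip(candidate, reference) if x == y)
    return 100.0 * same / len(reference)


def meets_variant_definition(candidate: str, reference: str,
                             required: Dict[int, Set[str]],
                             min_identity: float = 70.0) -> bool:
    """Check the identity threshold plus the residues required at 1-based positions."""
    if ungapped_identity(candidate, reference) < min_identity:
        return False
    return all(pos <= len(candidate) and candidate[pos - 1] in allowed
               for pos, allowed in required.items())


# Residue requirements restated from (i) and (ii) above
SEQ1_REQUIRED = {9: {"K"}}
SEQ2_REQUIRED = {55: {"E", "D"}, 94: {"T"}, 100: {"G"}, 106: {"N", "D"}}

# Invented 12-residue stand-in with K at position 9 (not SEQ ID NO: 1)
ref1 = "ACDEFGHIKLMN"
ok = meets_variant_definition("SCDEFGHIKLMN", ref1, SEQ1_REQUIRED)
```

A single substitution away from position 9 keeps the stand-in at about 92% identity with the lysine retained, so it passes; replacing the lysine at position 9 fails regardless of overall identity.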

In some embodiments, the peptide linker in (i) comprises an amino acid sequence as set forth in SEQ ID NO: 38 and/or the peptide linker in (ii) comprises an amino acid sequence as set forth in SEQ ID NO: 39.

In a further embodiment, the present invention provides a peptide linker comprising:

(i) an amino acid sequence as set forth in SEQ ID NO: 5 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 5, wherein said amino acid sequence comprises an aspartate or asparagine residue at position 8; or

(ii) an amino acid sequence as set forth in SEQ ID NO: 6 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 6, wherein said amino acid sequence comprises a lysine residue at position 8.

In some embodiments, the peptide linker in (i) comprises an amino acid sequence as set forth in SEQ ID NO: 42 and/or the peptide linker in (ii) comprises an amino acid sequence as set forth in SEQ ID NO: 43.

In a still further embodiment, the present invention provides a peptide linker comprising:

(i) an amino acid sequence as set forth in SEQ ID NO: 9 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 9, wherein said amino acid sequence comprises an asparagine or aspartate residue at position 17; or

(ii) an amino acid sequence as set forth in SEQ ID NO: 10 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 10, wherein said amino acid sequence comprises a lysine residue at position 9 and a glutamate or aspartate residue at position 70.

In some embodiments, the peptide linker in (i) comprises an amino acid sequence as set forth in SEQ ID NO: 109 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 109, wherein said amino acid sequence comprises an asparagine or aspartate residue at position 17, a glycine residue at position 11 and preferably an isoleucine residue at position 20, a proline residue at positions 21 and 22 and a lysine residue at position 23.

In some embodiments, the peptide linker in (i) comprises an amino acid sequence as set forth in SEQ ID NO: 46 and/or the peptide linker in (ii) comprises an amino acid sequence as set forth in SEQ ID NO: 47.

In some embodiments, said peptide linker sequence above is at least 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% identical to the sequence (SEQ ID NOs: 1, 2, 5, 6, 9, 10 or 109) to which it is compared.

In preferred embodiments, the peptide linker defined in each part (i) above is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as defined in each respective part (ii) above. For instance, a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 1 or variant thereof is capable of spontaneously forming an isopeptide bond with a peptide linker comprising an amino acid sequence as set forth in SEQ ID NO: 2 or a variant thereof. Similarly, peptides comprising SEQ ID NOs: 5 and 6 or variants thereof are capable of spontaneously forming an isopeptide bond with each other, and peptides comprising SEQ ID NOs: 9 and 10 or variants thereof (e.g. SEQ ID NO: 109) are capable of spontaneously forming an isopeptide bond with each other (e.g. SEQ ID NOs: 109 and 10).

Thus, the invention provides a pair of peptide linkers that can be used in the methods and uses of the invention comprising:

(1) peptide linkers comprising SEQ ID NOs: 1 and 2 or variants thereof as defined above, e.g. SEQ ID NOs: 38 and 39;

(2) peptide linkers comprising SEQ ID NOs: 5 and 6 or variants thereof as defined above, e.g. SEQ ID NOs: 42 and 43;

(3) peptide linkers comprising SEQ ID NOs: 9 and 10 or variants thereof as defined above, e.g. SEQ ID NOs: 46 and 47; or

(4) peptide linkers comprising SEQ ID NOs: 109 and 10 or variants thereof as defined above.

Thus, each pair of peptide linkers defined above may be defined as a cognate peptide linker pair.

In some embodiments, each peptide linker pair defined above (i.e. each cognate peptide linker pair) may be viewed as being orthogonal (i.e. non-cognate) to the other peptide linker pairs, e.g. pair (1) is orthogonal to pair (2), (3) and/or pair (4), pair (2) is orthogonal to pair (1), (3) and/or pair (4), pair (3) is orthogonal to pair (1) and/or pair (2) and pair (4) is orthogonal to pair (1) and/or (2). In some embodiments, these orthogonal pairs represent preferred orthogonal (non-cognate) pairs of peptide (cognate) linkers for use in the methods and uses of the invention. Further preferred orthogonal pairs of peptide linkers are defined below.

As discussed above, the peptide linkers of the invention find particular utility in the synthesis of fusion proteins, wherein the peptide linkers are incorporated in (e.g. form domains of, or are linked to) a protein unit to be linked (conjugated) to another protein unit to form a fusion protein. Thus, in further embodiments, the invention provides a recombinant or synthetic polypeptide comprising a polypeptide and a peptide linker as defined above. It will be evident that the peptide linkers of the invention may find utility in other methods and uses, e.g. as peptide tags as described in WO2011/098772 (herein incorporated by reference).

Other peptide linkers that may be used in the methods and uses of the invention include:

(i) an amino acid sequence as set forth in SEQ ID NO: 13 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 13, wherein said amino acid sequence comprises an aspartate or asparagine residue at position 7; or

(ii) an amino acid sequence as set forth in SEQ ID NO: 14 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 14, wherein said amino acid sequence comprises a glutamate or aspartate residue at position 56, and a lysine residue at position 10; or

(iii) an amino acid sequence as set forth in SEQ ID NO: 33 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 33, wherein said amino acid sequence comprises a lysine residue at position 8; or

(iv) an amino acid sequence as set forth in SEQ ID NO: 17 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 17, wherein said amino acid sequence comprises an aspartate or asparagine residue at position 11; or

(v) an amino acid sequence as set forth in SEQ ID NO: 18 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 18, wherein said amino acid sequence comprises a glutamate or aspartate residue at position 241 and a lysine residue at position 162.

In some embodiments, said peptide linker sequence above is at least 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% identical to the sequence (SEQ ID NOs: 13, 14, 17, 18, or 33) to which it is compared.

Other peptide linker pairs that may be used in the methods and uses of the invention include:

(5) peptide linkers comprising SEQ ID NOs: 13 and 14 or variants thereof as defined above;

(6) peptide linkers comprising SEQ ID NOs: 13 and 33 or variants thereof as defined above; or

(7) peptide linkers comprising SEQ ID NOs: 17 and 18 or variants thereof as defined above.

In some embodiments, where the cognate peptide linker pair comprises the pair defined in (6) above, the reaction also comprises a component that induces or catalyzes the formation of the isopeptide bond. For instance, the reaction comprises a peptide ligase, preferably wherein said peptide ligase comprises an amino acid sequence as set forth in SEQ ID NO: 34 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in SEQ ID NO: 34.

In some embodiments, said peptide ligase sequence above is at least 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% identical to the sequence (SEQ ID NO: 34) to which it is compared.

Whilst any orthogonal pairs of peptide linker pairs selected from (1)-(7) above may be used in the methods and uses of the invention, particularly preferred orthogonal pairs of peptide linkers include any one of the following pairs, which are defined above: (1) and (4), (1) and (5), (1) and (6), (1) and (3), (1) and (2), (2) and (4), (2) and (5), (2) and (6), (3) and (5), (3) and (6), (4) and (5) and (4) and (6).
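Which of the preferred combinations can be used together in a synthesis with more than two ligation steps follows from treating the list above as a graph and searching for groups of pairs that are all linked to one another. The sketch below assumes, conservatively, that every cognate pair used in one multi-step synthesis must be orthogonal to every other pair used; the function name is hypothetical and the orthogonal combinations are restated from the list above.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

# Preferred orthogonal combinations of cognate pairs (1)-(7), restated from the text
PREFERRED_ORTHOGONAL: Set[FrozenSet[int]] = {
    frozenset(p) for p in [(1, 4), (1, 5), (1, 6), (1, 3), (1, 2), (2, 4),
                           (2, 5), (2, 6), (3, 5), (3, 6), (4, 5), (4, 6)]
}


def mutually_orthogonal_sets(pair_ids: Iterable[int], k: int,
                             orthogonal: Set[FrozenSet[int]] = PREFERRED_ORTHOGONAL
                             ) -> List[Tuple[int, ...]]:
    """All k-element sets of cognate pairs in which every two pairs are listed as orthogonal."""
    return [combo for combo in combinations(sorted(pair_ids), k)
            if all(frozenset((x, y)) in orthogonal for x, y in combinations(combo, 2))]


triples = mutually_orthogonal_sets(range(1, 8), 3)
quadruples = mutually_orthogonal_sets(range(1, 8), 4)
```

On the list as given, nine triples qualify (e.g. (1), (2) and (4)), the largest mutually orthogonal sets contain four pairs ((1), (2), (4) with either (5) or (6)), and pair (7) does not appear in any listed combination.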

The position of the peptide linker within a protein to be linked to another protein to form a fusion protein is not particularly important. Thus, in some embodiments the peptide linker may be located at the N-terminus or C-terminus of the recombinant or synthetic polypeptide or a protein to be linked in the fusion protein. In some embodiments, the peptide linker may be located internally within the recombinant or synthetic polypeptide or a protein to be linked in the fusion protein. Thus, in some embodiments the peptide linker may be viewed as an N-terminal, C-terminal or internal domain of the recombinant or synthetic polypeptide or a protein to be linked in the fusion protein.

In some embodiments, it may be useful to include one or more spacers, e.g. a peptide spacer, between the protein to be joined in or to the fusion protein and the peptide linker. Thus, the protein and peptide linker may be linked directly to each other or they may be linked indirectly by means of one or more spacer sequences. Thus, a spacer sequence may interspace or separate two or more individual parts of the recombinant or synthetic polypeptide or a protein to be linked in the fusion protein. In some embodiments, a spacer may be N-terminal or C-terminal to the peptide linker. In some embodiments, spacers may be on both sides of the peptide linker. The precise nature of the spacer sequence is not critical and it may be of variable length and/or sequence, for example it may have 1-40, more particularly 2-20, 1-15, 1-12, 1-10, 1-8, or 1-6 residues, e.g. 6, 7, 8, 9, 10 or more residues. By way of representative example the spacer sequence, if present, may have 1-15, 1-12, 1-10, 1-8 or 1-6 residues etc. The nature of the residues is not critical and they may for example be any amino acid, e.g. a neutral amino acid, or an aliphatic amino acid, or alternatively they may be hydrophobic, or polar or charged or structure-forming e.g. proline. In some preferred embodiments, the spacer is a serine- and/or glycine-rich sequence.

Exemplary spacer sequences thus include any single amino acid residue, e.g. S, G, L, V, P, R, H, M, A or E, or a di-, tri-, tetra-, penta- or hexapeptide composed of one or more of such residues. Representative and preferred spacer sequences comprise an amino acid sequence as set forth in SEQ ID NO: 36 or 37.

The recombinant or synthetic polypeptides of the invention may also comprise purification moieties or tags to facilitate their purification (e.g. prior to use in the methods and uses of the invention and/or during the extension of the fusion protein as discussed below). Any suitable purification moiety or tag may be incorporated into the polypeptide and such moieties are well known in the art. For instance, in some embodiments, the recombinant or synthetic polypeptide may comprise a peptide purification tag or moiety, e.g. a His-tag sequence. Such purification moieties or tags may be incorporated at any position within the polypeptide. In some preferred embodiments, the purification moiety is located at or towards (i.e. within 5, 10, 15 or 20 amino acids of) the N- or C-terminus of the polypeptide.
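The domain layout described in the last few paragraphs (purification tag, spacers, protein and peptide linker) can be sketched as a simple concatenation, together with a check that the tag lies within a given number of residues of either terminus. All names and sequences below are hypothetical: "GSGGSG" is an invented glycine/serine-rich spacer (SEQ ID NOs: 36 and 37 are not reproduced), and runs of "X" stand in for the protein and the peptide linker.

```python
from typing import List, Tuple


def assemble_construct(parts: List[Tuple[str, str]]) -> str:
    """Concatenate (label, sequence) parts, listed N- to C-terminus, into one polypeptide."""
    return "".join(sequence for _label, sequence in parts)


def tag_near_terminus(construct: str, tag: str, max_distance: int = 20) -> bool:
    """True if any occurrence of `tag` starts within max_distance residues of the
    N-terminus or ends within max_distance residues of the C-terminus."""
    start = construct.find(tag)
    while start != -1:
        residues_after = len(construct) - (start + len(tag))
        if start <= max_distance or residues_after <= max_distance:
            return True
        start = construct.find(tag, start + 1)
    return False


# Hypothetical layout: His-tag, spacer, protein to be linked, spacer, peptide linker
parts = [
    ("His-tag", "HHHHHH"),
    ("spacer", "GSGGSG"),
    ("protein to be linked", "X" * 30),  # placeholder residues
    ("spacer", "GSGGSG"),
    ("peptide linker", "X" * 15),        # placeholder residues
]
construct = assemble_construct(parts)
```

Here the His-tag starts at the very N-terminus, so it satisfies even the strictest "within 5 amino acids" criterion; a tag buried 30 residues from both ends would fail the 20-residue criterion.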

Representative recombinant or synthetic polypeptides of the invention include polypeptides with an amino acid sequence as set forth in any one of SEQ ID NOs: 50-59 or a sequence with at least 70% sequence identity to an amino acid sequence as set forth in any one of SEQ ID NOs: 50-59, wherein said polypeptides comprise a peptide linker as defined above.

Preferably the recombinant or synthetic polypeptide fulfils the sequence identity requirements defined above, e.g. is at least 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% identical to the sequence to which it is compared.

As noted above, an advantage of the present invention arises from the fact that the peptide linkers incorporated in the proteins (e.g. the recombinant or synthetic polypeptides of the invention) to be joined together to form a fusion protein may be completely genetically encoded. Thus, in a further aspect, the invention provides a nucleic acid molecule encoding a peptide linker or polypeptide as defined above.

In some embodiments, the nucleic acid molecule encoding a peptide linker defined above comprises a nucleotide sequence as set forth in any one of SEQ ID NOs: 3, 4, 7, 8, 11, 12, 40, 41, 44, 45, 48, 49 or 110 or a nucleotide sequence with at least 70% sequence identity to a sequence as set forth in any one of SEQ ID NOs: 3, 4, 7, 8, 11, 12, 40, 41, 44, 45, 48, 49 or 110.
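Whether a given nucleotide sequence encodes a given peptide linker can be checked by translating it. The sketch below assumes the standard genetic code and reads in frame from the first nucleotide; the function names and example sequences are hypothetical, and the sequence-listing entries themselves are not reproduced here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; "*" marks stops
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nucleotides: str, stop_at_stop: bool = True) -> str:
    """Translate DNA or RNA in frame 1; codons with unknown bases become 'X'."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


def encodes(nucleotides: str, peptide: str) -> bool:
    """True if the reading frame starting at nucleotide 1 translates exactly to `peptide`."""
    return translate(nucleotides) == peptide


print(translate("ATGAAAGGTTAA"))  # hypothetical ORF: Met-Lys-Gly followed by a stop codon
```

Because many codons are synonymous, two nucleic acid molecules with well below 100% nucleotide identity can still encode the same peptide linker, which is one reason the nucleotide and amino acid identity thresholds are stated separately.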

In some embodiments, the nucleic acid molecule encoding a recombinant or synthetic polypeptide defined above comprises a nucleotide sequence as set forth in any one of SEQ ID NOs: 60-69 or a nucleotide sequence with at least 70% sequence identity to a sequence as set forth in any one of SEQ ID NOs: 60-69.

Preferably, the nucleic acid molecule above is at least 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% identical to the sequence to which it is compared.

Nucleic acid sequence identity may be determined by, e.g. FASTA Search using GCG packages, with default values and a variable pamfactor, and gap creation penalty set at 12.0 and gap extension penalty set at 4.0 with a window of 6 nucleotides. Preferably said comparison is made over the full length of the sequence, but may be made over a smaller window of comparison, e.g. less than 600, 500, 400, 300, 200, 100 or 50 contiguous nucleotides.

The nucleic acid molecules of the invention may be made up of ribonucleotides and/or deoxyribonucleotides as well as synthetic nucleotide residues that are capable of participating in Watson-Crick type or analogous base pair interactions. Preferably, the nucleic acid molecule is DNA or RNA.

The nucleic acid molecules described above may be operatively linked to an expression control sequence, or may be contained in a recombinant DNA cloning vehicle or vector. This allows intracellular expression of the proteins for use in the methods and uses of the invention, e.g. the expression of the polypeptides of the invention, as a gene product, the expression of which is directed by the gene(s) introduced into cells of interest. Gene expression is directed from a promoter active in the cells of interest and may be inserted in any form of linear or circular nucleic acid (e.g. DNA) vector for incorporation in the genome or for independent replication or transient transfection/expression.

Suitable transformation or transfection techniques are well described in the literature. Alternatively, the naked nucleic acid (e.g. DNA) molecule may be introduced directly into the cell for the production of proteins and polypeptides of, and for use in, the invention. Alternatively, the nucleic acid may be converted to mRNA by in vitro transcription and the relevant proteins may be generated by in vitro translation.

Appropriate expression vectors include appropriate control sequences such as for example translational (e.g. start and stop codons, ribosomal binding sites) and transcriptional control elements (e.g. promoter-operator regions, termination stop sequences) linked in matching reading frame with the nucleic acid molecules of the invention. Appropriate vectors may include plasmids and viruses (including both bacteriophage and eukaryotic viruses). Suitable viral vectors include baculovirus and also adenovirus, adeno-associated virus, herpes and vaccinia/pox viruses. Many other viral vectors are described in the art. Preferred vectors include bacterial and mammalian expression vectors pGEX-KG, pEF-neo and pEF-HA.

As noted above, the polypeptide of the invention may comprise additional sequences (e.g. peptide/protein tags to facilitate purification of the polypeptide) and thus the nucleic acid molecule may conveniently be fused with DNA encoding an additional peptide or polypeptide, e.g. His-tag, maltose-binding protein, to produce a fusion protein on expression.

Thus viewed from a further aspect, the present invention provides a vector, preferably an expression vector, comprising a nucleic acid molecule as defined above.

Other aspects of the invention include methods for preparing recombinant nucleic acid molecules according to the invention, comprising inserting a nucleic acid molecule of the invention encoding the peptide linkers and/or polypeptides of the invention into vector nucleic acid.

Nucleic acid molecules of the invention, preferably contained in a vector, may be introduced into a cell by any appropriate means. Suitable transformation or transfection techniques are well described in the literature. A variety of techniques are known and may be used to introduce such vectors into prokaryotic or eukaryotic cells for expression. Preferred host cells for this purpose include insect cell lines, yeast, mammalian cell lines or E. coli, such as strain BL21(DE3). The invention also extends to transformed or transfected prokaryotic or eukaryotic host cells containing a nucleic acid molecule, particularly a vector as defined above.

Thus, in another aspect, there is provided a recombinant host cell containing a nucleic acid molecule and/or vector as described above. By "recombinant" is meant that the nucleic acid molecule and/or vector has been introduced into the host cell. The host cell may or may not naturally contain an endogenous copy of the nucleic acid molecule, but it is recombinant in that an exogenous or further endogenous copy of the nucleic acid molecule and/or vector has been introduced.

A further aspect of the invention provides a method of preparing a peptide linker and/or polypeptide of the invention as hereinbefore defined, which comprises culturing a host cell containing a nucleic acid molecule as defined above, under conditions whereby said nucleic acid molecule encoding said peptide linker and/or polypeptide is expressed and recovering said molecule (peptide linker and/or polypeptide) thus produced. The expressed peptide linker and/or polypeptide forms a further aspect of the invention.

In some embodiments, the peptide linkers and/or polypeptides of the invention, or for use in the method and uses of the invention, may be generated synthetically, e.g. by ligation of amino acids or smaller synthetically generated peptides, or more conveniently by recombinant expression of a nucleic acid molecule encoding said polypeptide as described hereinbefore.

Nucleic acid molecules of the invention may be generated synthetically by any suitable means known in the art.

Thus, the peptide linker and/or polypeptide of the invention may be an isolated, purified, recombinant or synthesized peptide linker or polypeptide. As noted above, the term "polypeptide" is used herein interchangeably with the term "protein", and typically includes any amino acid sequence comprising at least 40 consecutive amino acid residues, e.g. at least 50, 60, 70, 80, 90, 100 or 150 amino acids, such as 40-1000, 50-900, 60-800, 70-700, 80-600, 90-500 or 100-400 amino acids.

Similarly, the nucleic acid molecules of the invention may be isolated, purified, recombinant or synthesized nucleic acid molecules.

Thus, alternatively viewed, the peptide linkers, polypeptides and nucleic acid molecules of the invention preferably are non-native, i.e. non-naturally occurring, molecules.

Standard amino acid nomenclature is used herein. Thus, the full name of an amino acid residue may be used interchangeably with its one-letter code or three-letter abbreviation. For instance, lysine may be substituted with K or Lys, isoleucine may be substituted with I or Ile, and so on. Moreover, the terms aspartate and aspartic acid, and glutamate and glutamic acid, are used interchangeably herein and may be replaced with Asp or D, or Glu or E, respectively.
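The interchangeable use of full names, three-letter abbreviations and one-letter codes can be made concrete with a small lookup table. The following is a minimal sketch covering only the 20 genetically coded amino acids (the non-coded residues of Table 1 use their own codes and are not included); the function names are illustrative only.

```python
# Standard three-letter -> one-letter codes for the 20 genetically coded amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert three-letter abbreviations (any capitalisation) to a one-letter string."""
    letters = []
    for residue in residues:
        key = residue.capitalize()  # "ASP" or "asp" -> "Asp"
        if key not in THREE_TO_ONE:
            raise ValueError(f"not one of the 20 coded amino acids: {residue!r}")
        letters.append(THREE_TO_ONE[key])
    return "".join(letters)


def to_three_letter(sequence):
    """Convert a one-letter sequence to hyphen-separated three-letter abbreviations."""
    return "-".join(ONE_TO_THREE[letter] for letter in sequence.upper())


print(to_one_letter(["Lys", "ile", "ASP"]))  # KID
print(to_three_letter("kie"))                # Lys-Ile-Glu
```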

Whilst it is envisaged that the peptide linkers and polypeptides of, and for use in, the invention may be produced recombinantly, and this is a preferred embodiment of the invention, it will be evident that the peptide linkers of the invention may be conjugated to proteins to be joined in a fusion protein by other means. In other words, the peptide linker and protein may be produced separately by any suitable means, e.g. recombinantly, and subsequently conjugated (joined) to form a peptide linker-protein conjugate that can be used in the methods of the invention. For instance, the peptide linkers of the invention may be produced synthetically or recombinantly, as described above, and conjugated to a protein (to be linked in a fusion protein according to the method of the invention) via a non-peptide linker or spacer, e.g. a chemical linker or spacer.

Thus, in some embodiments, the peptide linker and protein to be incorporated into a fusion may be joined together either directly through a bond or indirectly through a linking group. Where linking groups are employed, such groups may be chosen to provide for covalent attachment of the peptide linker and protein component through the linking group. Linking groups of interest may vary widely depending on the nature of the protein component. The linking group, when present, is in many embodiments biologically inert.

A variety of linking groups are known to those of skill in the art and find use in the invention. In representative embodiments, the linking group is generally at least about 50 daltons, usually at least about 100 daltons, and may be as large as 1000 daltons or larger, for example up to 1,000,000 daltons if the linking group contains a spacer, but generally will not exceed about 500 daltons and usually will not exceed about 300 daltons. Generally, such linkers will comprise a spacer group terminated at either end with a reactive functionality capable of covalently bonding to the peptide linker and protein component. Spacer groups of interest may include aliphatic and unsaturated hydrocarbon chains, spacers containing heteroatoms such as oxygen (ethers such as polyethylene glycol) or nitrogen (polyamines), peptides, carbohydrates, and cyclic or acyclic systems that may contain heteroatoms. Spacer groups may also comprise ligands that bind to metals, such that the presence of a metal ion coordinates two or more ligands to form a complex. Specific spacer elements include: 1,4-diaminohexane, xylylenediamine, terephthalic acid, 3,6-dioxaoctanedioic acid, ethylenediamine-N,N-diacetic acid, 1,1'-ethylenebis(5-oxo-3-pyrrolidinecarboxylic acid) and 4,4'-ethylenedipiperidine.

Potential reactive functionalities include nucleophilic functional groups (amines, alcohols, thiols, hydrazides), electrophilic functional groups (aldehydes, esters, vinyl ketones, epoxides, isocyanates, maleimides), and functional groups capable of cycloaddition reactions, forming disulfide bonds, or binding to metals. Specific examples include primary and secondary amines, hydroxamic acids, N-hydroxysuccinimidyl esters, N-hydroxysuccinimidyl carbonates, oxycarbonylimidazoles, nitrophenyl esters, trifluoroethyl esters, glycidyl ethers, vinylsulfones and maleimides. Specific linker groups that may find use in the invention include heterofunctional compounds, such as azidobenzoyl hydrazide, N-[4-(p-azidosalicylamino)butyl]-3'-[2'-pyridyldithio]propionamide, bis-sulfosuccinimidyl suberate, dimethyl adipimidate, disuccinimidyl tartrate, N-maleimidobutyryloxysuccinimide ester, N-hydroxysulfosuccinimidyl-4-azidobenzoate, N-succinimidyl [4-azidophenyl]-1,3'-dithiopropionate, N-succinimidyl [4-iodoacetyl]aminobenzoate, glutaraldehyde, succinimidyl-4-[N-maleimidomethyl]cyclohexane-1-carboxylate, 3-(2-pyridyldithio)propionic acid N-hydroxysuccinimide ester (SPDP), 4-(N-maleimidomethyl)-cyclohexane-1-carboxylic acid N-hydroxysuccinimide ester (SMCC), and the like.

In some embodiments, it may be useful to modify one or more residues in the peptide linker and/or protein to facilitate the conjugation of these molecules and/or to improve the stability of the peptide linker and/or protein. Thus, in some embodiments, the peptide linker, polypeptide or protein of, or for use in, the invention may comprise unnatural or non-standard amino acids.

In some embodiments, the peptide linker, polypeptide or protein of, or for use in, the invention may comprise one or more, e.g. at least 1, 2, 3, 4 or 5, such as 10, 15, 20 or more, non-conventional amino acids, i.e. amino acids which possess a side chain that is not coded for by the standard genetic code, termed herein "non-coded amino acids" (see e.g. Table 1). These may be selected from amino acids which are formed through metabolic processes, such as ornithine or taurine, and/or artificially modified amino acids, such as 9H-fluoren-9-ylmethoxycarbonyl (Fmoc)-, tert-butyloxycarbonyl (Boc)- or 2,2,5,7,8-pentamethylchroman-6-sulphonyl (Pmc)-protected amino acids, or amino acids having the benzyloxycarbonyl (Z) group. Examples of non-standard or structural analogue amino acids which may be used in the peptide linkers or polypeptides of, and for use in, the invention are D-amino acids, amide isosteres (such as N-methyl amide, retro-inverso amide, thioamide, thioester, phosphonate, ketomethylene, hydroxymethylene, fluorovinyl, (E)-vinyl, methyleneamino, methylenethio or alkane), L-N-methylamino acids, D-α-methylamino acids and D-N-methylamino acids. Examples of non-conventional, i.e. non-coded, amino acids are listed in Table 1.

TABLE 1

Non-conventional amino acid | Code
α-aminobutyric acid | Abu
L-N-methylalanine | Nmala
α-amino-α-methylbutyrate | Mgabu
L-N-methylarginine | Nmarg
aminocyclopropane-carboxylate | Cpro
L-N-methylasparagine | Nmasn
L-N-methylaspartic acid | Nmasp
aminoisobutyric acid | Aib
L-N-methylcysteine | Nmcys
aminonorbornyl-carboxylate | Norb
L-N-methylglutamine | Nmgln
L-N-methylglutamic acid | Nmglu
cyclohexylalanine | Chexa
L-N-methylhistidine | Nmhis
cyclopentylalanine | Cpen
L-N-methylisoleucine | Nmile
D-alanine | Dal
L-N-methylleucine | Nmleu
D-arginine | Darg
L-N-methyllysine | Nmlys
D-aspartic acid | Dasp
L-N-methylmethionine | Nmmet
D-cysteine | Dcys
L-N-methylnorleucine | Nmnle
D-glutamine | Dgln
L-N-methylnorvaline | Nmnva
D-glutamic acid | Dglu
L-N-methylornithine | Nmorn
D-histidine | Dhis
L-N-methylphenylalanine | Nmphe
D-isoleucine | Dile
L-N-methylproline | Nmpro
D-leucine | Dleu
L-N-methylserine | Nmser
D-lysine | Dlys
L-N-methylthreonine | Nmthr
D-methionine | Dmet
L-N-methyltryptophan | Nmtrp
D-ornithine | Dorn
L-N-methyltyrosine | Nmtyr
D-phenylalanine | Dphe
L-N-methylvaline | Nmval
D-proline | Dpro
L-N-methylethylglycine | Nmetg
D-serine | Dser
L-N-methyl-t-butylglycine | Nmtbug
D-threonine | Dthr
L-norleucine | Nle
D-tryptophan | Dtrp
L-norvaline | Nva
D-tyrosine | Dtyr
α-methyl-aminoisobutyrate | Maib
D-valine | Dval
α-methyl-γ-aminobutyrate | Mgabu
D-α-methylalanine | Dmala
α-methylcyclohexylalanine | Mchexa
D-α-methylarginine | Dmarg
α-methylcyclopentylalanine | Mcpen
D-α-methylasparagine | Dmasn
α-methyl-α-naphthylalanine | Manap
D-α-methylaspartate | Dmasp
α-methylpenicillamine | Mpen
D-α-methylcysteine | Dmcys
N-(4-aminobutyl)glycine | Nlys
D-α-methylglutamine | Dmgln
N-(2-aminoethyl)glycine | Naeg
D-α-methylhistidine | Dmhis
N-(3-aminopropyl)glycine | Norn
D-α-methylisoleucine | Dmile
N-amino-α-methylbutyrate | Nmaabu
D-α-methylleucine | Dmleu
α-naphthylalanine | Anap
D-α-methyllysine | Dmlys
N-benzylglycine | Nphe
D-α-methylmethionine | Dmmet
N-(2-carbamylethyl)glycine | Ngln
D-α-methylornithine | Dmorn
N-(carbamylmethyl)glycine | Nasn
D-α-methylphenylalanine | Dmphe
N-(2-carboxyethyl)glycine | Nglu
D-α-methylproline | Dmpro
N-(carboxymethyl)glycine | Nasp
D-α-methylserine | Dmser
N-cyclobutylglycine | Ncbut
D-α-methylthreonine | Dmthr
N-cycloheptylglycine | Nchep
D-α-methyltryptophan | Dmtrp
N-cyclohexylglycine | Nchex
D-α-methyltyrosine | Dmty
N-cyclodecylglycine | Ncdec
D-α-methylvaline | Dmval
N-cyclododecylglycine | Ncdod
D-N-methylalanine | Dnmala
N-cyclooctylglycine | Ncoct
D-N-methylarginine | Dnmarg
N-cyclopropylglycine | Ncpro
D-N-methylasparagine | Dnmasn
N-cycloundecylglycine | Ncund
D-N-methylaspartate | Dnmasp
N-(2,2-diphenylethyl)glycine | Nbhm
D-N-methylcysteine | Dnmcys
N-(3,3-diphenylpropyl)glycine | Nbhe
D-N-methylglutamine | Dnmgln
N-(3-guanidinopropyl)glycine | Narg
D-N-methylglutamate | Dnmglu
N-(1-hydroxyethyl)glycine | Nthr
D-N-methylhistidine | Dnmhis
N-(hydroxyethyl)glycine | Nser
D-N-methylisoleucine | Dnmile
N-(imidazolylethyl)glycine | Nhis
D-N-methylleucine | Dnmleu
N-(3-indolylethyl)glycine | Nhtrp
D-N-methyllysine | Dnmlys
N-methyl-γ-aminobutyrate | Nmgabu
N-methylcyclohexylalanine | Nmchexa
D-N-methylmethionine | Dnmmet
D-N-methylornithine | Dnmorn
N-methylcyclopentylalanine | Nmcpen
N-methylglycine | Nala
D-N-methylphenylalanine | Dnmphe
N-methylaminoisobutyrate | Nmaib
D-N-methylproline | Dnmpro
N-(1-methylpropyl)glycine | Nile
D-N-methylserine | Dnmser
N-(2-methylpropyl)glycine | Nleu
D-N-methylthreonine | Dnmthr
D-N-methyltryptophan | Dnmtrp
N-(1-methylethyl)glycine | Nval
D-N-methyltyrosine | Dnmtyr
N-methyl-α-naphthylalanine | Nmanap
D-N-methylvaline | Dnmval
N-methylpenicillamine | Nmpen
γ-aminobutyric acid | Gabu
N-(p-hydroxyphenyl)glycine | Nhtyr
L-t-butylglycine | Tbug
N-(thiomethyl)glycine | Ncys
L-ethylglycine | Etg
penicillamine | Pen
L-homophenylalanine | Hphe
L-α-methylalanine | Mala
L-α-methylarginine | Marg
L-α-methylasparagine | Masn
L-α-methylaspartate | Masp
L-α-methyl-t-butylglycine | Mtbug
L-α-methylcysteine | Mcys
L-methylethylglycine | Metg
L-α-methylglutamine | Mgln
L-α-methylglutamate | Mglu
L-α-methylhistidine | Mhis
L-α-methylhomophenylalanine | Mhphe
L-α-methylisoleucine | Mile
N-(2-methylthioethyl)glycine | Nmet
L-α-methylleucine | Mleu
L-α-methyllysine | Mlys
L-α-methylmethionine | Mmet
L-α-methylnorleucine | Mnle
L-α-methylnorvaline | Mnva
L-α-methylornithine | Morn
L-α-methylphenylalanine | Mphe
L-α-methylproline | Mpro
L-α-methylserine | Mser
L-α-methylthreonine | Mthr
L-α-methyltryptophan | Mtrp
L-α-methyltyrosine | Mtyr
L-α-methylvaline | Mval
L-N-methylhomophenylalanine | Nmhphe
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine | Nnbhm
N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine | Nnbhe
1-carboxy-1-(2,2-diphenylethylamino)cyclopropane | Nmbc
L-O-methyl serine | Omser
L-O-methyl homoserine | Omhse

In some embodiments, the method of the present invention may be performed heterogeneously (as described above), for example using a solid phase on which the growing fusion protein is immobilized, preferably via the first or second protein in the fusion protein chain, permitting the use of washing steps. Thus, in some embodiments, the method is a solid phase method (i.e. a heterogeneous method). Alternatively viewed, the method is performed on a solid phase or solid substrate. The use of a solid phase offers advantages. For instance, washing steps can assist in the removal of excess, unreacted proteins and/or components that may interfere with subsequent rounds of reaction (i.e. the addition of further proteins to the fusion protein), e.g. peptide ligases and components involved in unblocking (uncaging, unmasking, deprotecting) peptide linkers.

Immobilization of the fusion protein on a solid phase, i.e. binding of the fusion protein to the support, may be achieved in any convenient way. In some embodiments, the first or second protein of the fusion protein is immobilized on a solid support. Thus, in some embodiments, the method may comprise a step of immobilizing the first protein on a solid support. In some embodiments, the method may comprise a step of immobilizing the linked protein, comprising the first and second protein, on a solid support.

Thus the manner or means of immobilization and the solid support may be selected, according to choice, from any number of immobilization means and solid supports as are widely known in the art and described in the literature. Thus, the fusion protein may be directly bound to the support, for example via a domain or moiety of at least one protein in the fusion protein (e.g. chemically cross-linked). In some embodiments, the fusion protein may be bound indirectly by means of a linker group, or by an intermediary binding group(s) (e.g. by means of a biotin-streptavidin interaction). Thus, the fusion protein may be covalently or non-covalently linked to the solid support. The linkage may be a reversible (e.g. cleavable) or irreversible linkage. Thus, in some embodiments, the linkage may be cleaved enzymatically, chemically or with light, e.g. the linkage may be a light-sensitive linkage.

Thus, in some embodiments, a protein to be included in the fusion protein may be provided with means for immobilization (e.g. an affinity binding partner, such as biotin or a hapten, capable of binding to its cognate binding partner, such as streptavidin or an antibody, provided on the support). In some embodiments, the protein to be immobilized on the support may be a binding protein, e.g. maltose binding protein, an antibody etc. The interaction between the fusion protein and the solid support must be robust enough to allow for washing steps, i.e. the interaction between the fusion protein and the solid support is not significantly disrupted by the washing steps. For instance, it is preferred that with each washing step less than 5%, preferably less than 4, 3, 2, 1, 0.5 or 0.1%, of the fusion protein is removed or eluted from the solid phase. In this respect, the inventor has developed a modified maltose binding protein that has an improved binding affinity for maltose and therefore finds particular utility in the methods of the invention.
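Because the per-wash loss compounds over the many washes of a multi-step synthesis, the per-wash thresholds above translate into an overall retention figure. The sketch below assumes that each wash independently removes at most a fixed fraction of the protein still bound; the number of washes (10) is an illustrative assumption, not a value from the description.

```python
def min_fraction_retained(loss_per_wash: float, n_washes: int) -> float:
    """Lower bound on the fraction of immobilized fusion protein left after n washes.

    Assumes each wash removes at most `loss_per_wash` of whatever is still bound,
    so retention compounds geometrically: (1 - loss) ** n.
    """
    if not 0.0 <= loss_per_wash < 1.0:
        raise ValueError("loss_per_wash must be in [0, 1)")
    if n_washes < 0:
        raise ValueError("n_washes must be non-negative")
    return (1.0 - loss_per_wash) ** n_washes


# Hypothetical protocol with 10 washes in total.
for loss in (0.05, 0.01, 0.001):
    retained = min_fraction_retained(loss, 10)
    print(f"{loss:.1%} loss per wash -> at least {retained:.1%} retained after 10 washes")
```

Under these assumptions, a 5% loss per wash leaves only about 60% of the protein after 10 washes, whereas a 0.1% loss per wash leaves about 99%, which illustrates why the tighter thresholds are preferred for longer syntheses.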

Thus, a further aspect of the invention provides a maltose binding protein comprising an amino acid sequence as set forth in SEQ ID NO: 70 or a sequence with at least 70% identity with an amino acid sequence as set forth in SEQ ID NO: 70.

In some embodiments, the maltose binding protein above is at least 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% identical to the sequence to which it is compared.

Preferably, a maltose binding protein with at least 70% identity with an amino acid sequence as set forth in SEQ ID NO: 70 is functionally equivalent to a protein consisting of the amino acid sequence as set forth in SEQ ID NO: 70, i.e. is capable of binding maltose with the same affinity as, or with greater affinity than, a protein consisting of the amino acid sequence as set forth in SEQ ID NO: 70. For instance, the maltose binding protein of the invention has a binding affinity for maltose of less than O^M, e.g. 0.1, 0.08, 0.05, 0.03 or 0.01 μM or less. In preferred embodiments, the maltose binding protein with at least 70% identity with an amino acid sequence as set forth in SEQ ID NO: 70 comprises a valine at positions 312 and 317.
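The two sequence criteria above (a minimum percentage identity to SEQ ID NO: 70 and a valine at positions 312 and 317) can be checked programmatically. The following is a minimal sketch under simplifying assumptions: it handles only equal-length, ungapped variants (substitutions only), so that position numbering is shared with the reference; real comparisons would normally use a sequence alignment, and the identity definition used here may differ from the one intended in the description. SEQ ID NO: 70 is not reproduced in this section, so the sequences below are placeholders and all function names are hypothetical.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of positions at which two equal-length, ungapped sequences agree."""
    if len(candidate) != len(reference) or not reference:
        raise ValueError("this sketch only handles equal-length, ungapped sequences")
    matches = sum(1 for a, b in zip(candidate, reference) if a == b)
    return 100.0 * matches / len(reference)


def has_residue_at(sequence: str, residue: str, positions) -> bool:
    """True if `residue` occurs at every 1-based position in `positions`."""
    return all(0 < p <= len(sequence) and sequence[p - 1] == residue for p in positions)


def matches_preferred_variant(candidate: str, reference: str,
                              min_identity: float = 70.0,
                              valine_positions=(312, 317)) -> bool:
    """Identity threshold plus valine at the stated (1-based) positions."""
    return (percent_identity(candidate, reference) >= min_identity
            and has_residue_at(candidate, "V", valine_positions))


# Placeholder sequences only (not SEQ ID NO: 70).
reference = "A" * 320
candidate = reference[:311] + "V" + reference[312:316] + "V" + reference[317:]
print(percent_identity(candidate, reference))           # 99.375
print(matches_preferred_variant(candidate, reference))  # True
```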

The invention also provides a nucleic acid molecule encoding the maltose binding protein defined above. In some embodiments, the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 71 or a sequence with at least 70% sequence identity to a nucleotide sequence as set forth in SEQ ID NO: 71.

In some embodiments, the maltose binding protein comprises (e.g. is conjugated to) a peptide linker as defined herein. In still further embodiments, the maltose binding protein comprises more than one (e.g. 2 or 3) amino acid sequences as defined above, i.e. it comprises a repeated sequence. The fusion protein, e.g. the first protein to be incorporated in the fusion protein, may be immobilized before or after it is contacted with a further protein (e.g. the second protein) to be incorporated into the fusion protein. Further, such an "immobilizable" fusion protein may be contacted with the further protein together with the support.

The solid support may be any of the well-known supports or matrices which are currently widely used or proposed for immobilization, separation etc. These may take the form of particles (e.g. beads which may be magnetic, para-magnetic or non-magnetic), sheets, gels, filters, membranes, fibres, capillaries, slides, arrays or microtitre strips, tubes, plates or wells etc.

The support may be made of glass, silica, latex or a polymeric material.

Materials presenting a high surface area for binding of the fusion protein are suitable. Such supports may have an irregular surface and may be, for example, porous or particulate, e.g. particles, fibres, webs, sinters or sieves. Particulate materials, e.g. beads, and particularly polymeric beads, are useful due to their greater binding capacity.

Conveniently, a particulate solid support used according to the invention will comprise spherical beads. The size of the beads is not critical, but they may for example have a diameter of at least 1 μm and preferably at least 2 μm, and preferably a maximum diameter of not more than 10 μm, e.g. not more than 6 μm.

Monodisperse particles, that is those which are substantially uniform in size (e.g. having a diameter standard deviation of less than 5%), have the advantage that they provide very uniform reproducibility of reaction. Representative monodisperse polymer particles may be produced by the technique described in US-A-4336173.
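The monodispersity criterion can be applied to a set of measured bead diameters. The sketch below assumes that "a diameter standard deviation of less than 5%" means the sample standard deviation relative to the mean diameter (the coefficient of variation); the diameters used are hypothetical measurements in μm.

```python
from statistics import mean, stdev


def diameter_cv_percent(diameters_um):
    """Coefficient of variation (sample SD / mean, in %) of bead diameters."""
    if len(diameters_um) < 2:
        raise ValueError("need at least two diameter measurements")
    return 100.0 * stdev(diameters_um) / mean(diameters_um)


def is_monodisperse(diameters_um, max_cv_percent=5.0):
    """Assumed reading of the criterion: relative SD of diameter below 5%."""
    return diameter_cv_percent(diameters_um) < max_cv_percent


# Hypothetical diameter measurements (μm) for two bead batches.
tight_batch = [2.95, 3.00, 3.05]
broad_batch = [2.8, 2.9, 3.0, 3.1, 3.2]
print(is_monodisperse(tight_batch))  # True  (CV about 1.7%)
print(is_monodisperse(broad_batch))  # False (CV about 5.3%)
```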

However, to aid manipulation and separation, magnetic beads are advantageous. The term "magnetic" as used herein means that the support is capable of having a magnetic moment imparted to it when placed in a magnetic field, and thus is displaceable under the action of that field. In other words, a support comprising magnetic particles may readily be removed by magnetic aggregation, which provides a quick, simple and efficient way of separating the particles following the isopeptide bond formation steps.

In some embodiments, the solid support is an amylose resin.

Upon the formation of an isopeptide bond between the penultimate and final proteins in the fusion protein, it may be desirable to remove or elute the protein from the solid support. Thus, in some embodiments, the method comprises a step of eluting or removing the fusion protein from the solid support.

As indicated above, in certain protocols the methods of the invention may allow the simultaneous production of two or more fusion proteins on the same solid support, e.g. array. Thus, in some embodiments, the method of the invention may be viewed as a multiplex and/or high throughput format.

In a further embodiment, the invention provides a fusion protein obtained or obtainable from the method of the invention. In some embodiments, the fusion protein is immobilized on a solid substrate. Thus, in yet a further embodiment, the present invention provides a solid substrate comprising at least one fusion protein, obtained or obtainable from the method of the invention. In some embodiments, the solid substrate may be in the form of an array (i.e. a protein array, particularly a fusion protein array) comprising two or more fusion proteins (fusion proteins with different sequences) obtained or obtainable from the method of the invention. In some embodiments, the array comprises at least 10, 20, 50, 100, 200, 300, 400, 500, 1000, 1500, 2000, 5000 or 10000 fusion proteins i.e. different fusion proteins (with different structures or sequences).

In some embodiments, two or more fusion proteins obtained or obtainable from the method of the invention may be mixed together to form a library of fusion proteins. Thus, in a further embodiment, the invention provides a library of fusion proteins comprising at least two fusion proteins, obtained or obtainable from the method of the invention. In some embodiments, the library comprises at least 10, 20, 50, 100, 200, 300, 400, 500, 1000, 1500, 2000, 5000 or 10000 fusion proteins, i.e. different fusion proteins (with different structures or sequences). In some embodiments, the library may comprise fusion proteins immobilized on a solid substrate, e.g. bead or particle. For instance, each solid substrate, e.g. bead or particle, may comprise a different fusion protein.

Whilst the method of the invention has been exemplified using a heterogeneous embodiment, it will be readily apparent from the disclosures herein that the method may be employed homogeneously (i.e. in solution). However, in order to prevent the production of mixtures of fusion proteins, it may in some embodiments be necessary to separate the fusion protein from other components in the reaction after each round of extension. Separation or purification can be achieved by any suitable means. For instance, one of the proteins in the fusion protein chain may comprise a purification tag or may be a binding protein (e.g. maltose binding protein) that would facilitate separation of the fusion protein from other components in the reaction, e.g. by affinity chromatography. Additionally or alternatively, other purification/separation methods may be utilised, e.g. ion-exchange chromatography, size-exclusion chromatography, ultracentrifugation, spin-filtration, dialysis, diafiltration etc.

Thus, in some embodiments, the method of the invention may comprise a step of separating or purifying the fusion protein after a step of isopeptide bond formation.

In a further embodiment, the invention provides a kit, particularly a kit for use in the methods and uses of the invention, i.e. in the production or synthesis of a fusion protein, wherein said kit comprises:

(a) a recombinant or synthetic polypeptide comprising a peptide linker as defined above; and

(b) a recombinant or synthetic polypeptide comprising a peptide linker as defined above that is capable of forming an isopeptide bond with the peptide linker in the polypeptide of (a); and/or

(c) a nucleic acid molecule, particularly a vector, encoding a peptide linker as defined above; and/or

(d) a nucleic acid molecule, particularly a vector, encoding a peptide linker that is capable of forming an isopeptide bond with the peptide linker encoded by the nucleic acid molecule of (c),

optionally wherein the recombinant or synthetic polypeptide of (a) and/or (b) comprises a further peptide linker that is part of a pair of peptide linkers that are orthogonal to the peptide linkers in the polypeptides of (a) and (b).

The methods and uses of the invention may be defined as in vitro methods and uses, i.e. the in vitro method for the synthesis of a fusion protein.

It will be evident that the method of the invention is not limited to linking any specific proteins together to form a fusion protein. Thus, the method may utilize any protein or polypeptide as defined herein, i.e. any desired protein or polypeptide. In other words, the invention may utilise any protein or polypeptide that is desired to be included or incorporated into a fusion protein. Furthermore, the recombinant or synthetic polypeptide of the invention may comprise any protein linked to a peptide linker of the invention. The proteins may be derived or obtained from any suitable source. For instance, the proteins may be in vitro translated or purified from biological and clinical samples, e.g. any cell or tissue sample of an organism (eukaryotic, prokaryotic), or any body fluid or preparation derived therefrom, as well as samples such as cell cultures, cell preparations, cell lysates etc. Proteins may also be derived or obtained, e.g. purified, from environmental samples, e.g. soil and water samples, or from food samples. The samples may be freshly prepared or they may be pre-treated in any convenient way, e.g. for storage.

As noted above, in a preferred embodiment, the proteins to be incorporated in the fusion protein may be produced recombinantly and thus the nucleic acid molecules encoding said proteins may be derived or obtained from any suitable source, e.g. any viral or cellular material, including all prokaryotic or eukaryotic cells, viruses, bacteriophages, mycoplasmas, protoplasts and organelles. Such biological material may thus comprise all types of mammalian and non-mammalian animal cells, plant cells, algae including blue-green algae, fungi, bacteria, protozoa etc. In some embodiments, the proteins to be linked together in the fusion protein may be synthetic proteins.

As a representative example, the proteins to be joined in a fusion protein according to the invention may be enzymes, structural proteins, antibodies, antigens, prions, receptors, ligands, cytokines, chemokines, hormones and so on or any combination thereof.

In some embodiments, the recombinant or synthetic polypeptide of, and for use in, the invention is not an isopeptide protein, or is an isopeptide protein different from the isopeptide protein from which the peptide linker is derived.

In some embodiments, the fusion protein comprises a repeated structure, e.g. the same protein may be linked together. Alternatively viewed, the fusion protein may contain two or more protein units of the same sequence. When the fusion protein contains two or more protein units of the same sequence, these protein units may be consecutive, e.g. separated only by the peptide linkers joining the protein units together, or they may be non-consecutive or non-sequential (e.g. separated by one or more proteins with a different sequence). In some preferred embodiments, the fusion protein comprises at least two proteins with different sequences, e.g. at least 2, 3, 4, 5, 6 proteins with different sequences. The proteins with different sequences may be arranged in any suitable order, depending on the purpose of the fusion protein.

In still further embodiments, the protein may consist of two or more peptide linkers as defined herein and optionally one or more spacers, e.g. peptide spacers, joining said peptide linkers. In this respect, the protein may be viewed as a non-functional protein or as a linker protein/peptide, as described above. In these embodiments, the other proteins in the fusion protein are different proteins or functional proteins, i.e. comprise sequences other than peptide linkers and spacers. Thus, in some embodiments the fusion protein comprises one or more proteins comprising an amino acid sequence as set forth in any one of SEQ ID NOs: 56-59 or with at least 70% sequence identity to an amino acid sequence as set forth in any one of SEQ ID NOs: 56-59, wherein said protein comprises at least two peptide linkers as defined above. For instance, a non-functional protein may be used as the second protein in a fusion protein, i.e. linking the first and third proteins, or as the fourth protein in a fusion protein, i.e. linking the third and fifth proteins, and so on. In this representative example, the second and fourth proteins may be the same protein or different proteins. Thus, in some embodiments, the protein units in the fusion protein may alternate with linker proteins, e.g. functional protein-linker protein-functional protein, or linker protein-functional protein-linker protein etc.

In some embodiments, the protein above is at least 75, 80, 85, 90, 95, 96, 97, 98, 99 or 100% identical to the sequence to which it is compared.

A "fusion protein" may be defined as a polymer comprising at least two protein units, e.g. 2, 3, 4, 5, 6, 7, 8, 9 or 10 or more protein units, such as 15, 20, 25 or 50 protein units, linked together by a covalent bond, preferably an isopeptide bond as defined herein. A protein unit may be defined as a molecule comprising at least 40 consecutive amino acids, preferably wherein the protein has a function in vivo, e.g. wherein the protein is capable of interacting specifically with one or more biological components, e.g. wherein the protein is active in vivo. Thus, a fusion protein may be viewed as a megastructure, macromolecule, megamolecule or polyprotein comprising at least two protein units, e.g. 2, 3, 4, 5, 6, 7, 8, 9 or 10 or more protein units, such as 15, 20, 25 or 50 protein units, linked together by a covalent bond, preferably an isopeptide bond as defined herein.

The terms "link", "linked" or "linking" in the context of the present invention, with respect to two or more proteins in a fusion protein, refer to joining or conjugating said proteins via a covalent bond, particularly an isopeptide bond which forms between the peptide linkers that are incorporated in said proteins (e.g. peptide linkers that form domains of said proteins).

Whilst the invention is described above in terms of pairs of linkers that react together to form an isopeptide bond, each (cognate) pair of linkers alternatively may be viewed as a single peptide linker that is formed of two separate or separable parts (tags, tags and binding partners) that react to form an isopeptide bond (to link/conjugate said proteins). Thus, viewed from this perspective, the invention may be viewed as the use of two orthogonal peptide linkers for the production of a fusion protein, wherein each peptide linker comprises or consists of two separable parts that react to form an isopeptide bond and wherein each part of the linker is incorporated in (forms a domain of) the proteins to be linked (conjugated) together.

It will be evident that the methods and uses described herein and the fusion proteins obtained or obtainable from the methods described herein have a wide range of utilities. Alternatively viewed, the fusion proteins produced by the methods described herein may be employed in a variety of industries. For instance, the methods of the invention may be useful for producing fusion proteins for vaccination. In this respect, the methods may be useful for linking protein antigens into chains, either to be injected directly or used to decorate virus-like particles (VLPs), since antigen multimerization gives a greatly enhanced immune response.

The methods of the invention may be useful for producing fusion proteins with enhanced enzymatic properties, e.g. substrate channelling. In this respect, enzymes often come together to function in pathways inside cells and traditionally it has been difficult to connect multiple enzymes together outside cells (in vitro). Thus, the method of the invention could be used to enhance activity of multi-step enzyme pathways, which could be useful in a range of industrial conversions and for diagnostics.

The fusion proteins of the invention may also have improved properties with respect to their stability, i.e. the stability of the protein units in the fusion protein may be enhanced relative to their stability as independent proteins. In particular, fusion proteins may improve the thermostability of protein units. In this respect, enzymes are valuable tools in many processes but are unstable and hard to recover. Enzyme polymers have greater stability to temperature, pH and organic solvent, and there is a growing desire to use enzyme polymers in industrial processes. However, prior to the present invention, enzyme polymer generation commonly used a non-specific glutaraldehyde reaction, which damages or denatures (i.e. reduces the activity of) many potentially useful enzymes. Site-specific linkage of proteins into chains (polymers) through isopeptide bonds according to the present invention is expected to enhance enzyme resilience, for example for enzymes used in diagnostics or added to animal feed. In particularly preferred embodiments, enzymes may be stabilized by circularization, as discussed above.

The methods of the invention will also find utility in the production of antibody polymers. In this respect, antibodies are one of the most important classes of pharmaceuticals and are often used attached to surfaces. However, antigen mixing in a sample, and therefore capture of said antigen in said sample, is inefficient near surfaces. By extending chains of antibodies, it is anticipated that capture efficiency will be improved. This will be especially valuable in circulating tumour cell isolation, which at present is one of the most promising ways to enable early cancer diagnosis. Also, antibodies of different specificities can be combined in any desired order.

In a still further embodiment, the methods of the invention may find utility in the production of drugs for activating cell signalling. In this respect, many of the most effective ways to activate cellular function are through protein ligands. However, in nature a protein ligand will usually not operate alone but with a specific combination of other signalling molecules. Thus, the methods of the invention allow the generation of tailored fusion proteins (i.e. protein teams), which could give optimal activation of cellular signalling. These fusion proteins (protein teams) might be applied for controlling cell survival, division, or differentiation.

In yet further embodiments, the peptide linkers of the invention, particularly pairs of linkers of the invention, may find utility in the generation of hydrogels for growth of stem cells, preparation of biomaterials, antibody functionalization with dyes or enzymes and stabilizing enzymes by cyclization.

The invention will now be described in more detail in the following non-limiting Examples with reference to the following drawings:

Figure 1 shows a schematic of a representative example of solid-phase synthesis of a fusion protein using two orthogonal pairs of peptide linkers, SnoopTag/SnoopCatcher and SpyTag/SpyCatcher.

Figure 2 shows a schematic of the isopeptide bond formation in the RrgA protein from which the SnoopTag and SnoopCatcher peptide linker pair is derived (numbering based on Protein Data Bank ID 2WW8).

Figure 3 shows a picture of an SDS-PAGE gel with Coomassie staining characterising the SnoopTag-MBP reaction with SnoopCatcher alongside controls with alanine mutation of SnoopTag's reactive Lys (KA) or SnoopCatcher's reactive Asn (NA).

Figure 4 shows (A) a graph depicting the time-course of SnoopTag reaction with a 1:1 or 2:1 ratio of SnoopCatcher to SnoopTag-MBP; (B) a picture of an SDS-PAGE gel with Coomassie staining characterising the SnoopTag-MBP reaction with SnoopCatcher at a 2:1 ratio of SnoopCatcher to SnoopTag-MBP; (C) a graph depicting the time-course of SnoopTag reaction with a 1:1, 2:1 or 4:1 ratio of SnoopCatcher to SnoopTag-MBP; and (D) a picture of an SDS-PAGE gel with Coomassie staining characterising the SnoopTag-MBP reaction with SnoopCatcher at a 4:1 ratio of SnoopCatcher to SnoopTag-MBP.

Figure 5 shows (A) a bar chart depicting the pH-dependence of the isopeptide bond formation between SnoopTag-MBP and SnoopCatcher; and (B) a chart depicting the temperature-dependence of the isopeptide bond formation between SnoopTag-MBP and SnoopCatcher.

Figure 6 shows (A) a bar chart depicting the dependence of the isopeptide bond formation between SnoopTag-MBP and SnoopCatcher on salt, reducing agent and detergent; and (B) a graph depicting the TMAO-dependence of the isopeptide bond formation between SnoopTag-MBP and SnoopCatcher.

Figure 7 shows a picture of an SDS-PAGE gel with Coomassie staining characterising SnoopTag/SnoopCatcher and SpyTag/SpyCatcher orthogonal reactivity.

Figure 8 shows (A) a picture of an SDS-PAGE gel with Coomassie staining characterising PsCsTag/PsCsCatcher, SnoopTag/SnoopCatcher and SpyTag/SpyCatcher orthogonal reactivity; and (B) a picture of an SDS-PAGE gel with Coomassie staining characterising RrgATag/RrgACatcher, SnoopTag/SnoopCatcher and SpyTag/SpyCatcher orthogonal reactivity.

Figure 9 shows (A) a picture of an SDS-PAGE gel with Coomassie staining analysing solid-phase fusion protein synthesis. Lanes 1-3 show MBPx-SpyCatcher, SnoopTag-Affi-SpyTag and SpyCatcher-SnoopCatcher in isolation. MBPx-SpyCatcher was bound to the amylose resin and stepwise reaction with SnoopTag-Affibody-SpyTag and SpyCatcher-SnoopCatcher was carried out. After each stage, one aliquot of sample was eluted from the resin with maltose (lanes 4-13). Samples were analysed without any further purification; and (B) a picture of an SDS-PAGE gel with Coomassie staining analysing solid-phase fusion protein synthesis. Lanes 1-3 show biotin-SpyCatcher, SnoopTag-Affi-SpyTag and SpyCatcher-SnoopCatcher in isolation. Biotin-SpyCatcher was bound to the streptavidin agarose and stepwise reaction with SnoopTag-Affibody-SpyTag and SpyCatcher-SnoopCatcher was carried out. After each stage, one aliquot of sample was eluted from the agarose with biotin (lanes 4-13). Samples were analysed without any further purification.

Figure 10 shows (A) a graph depicting electrospray ionization mass spectrometry to test the identity of the decamer fusion protein, biotin-SpyCatcher:(SnoopTag-Affi-SpyTag:SpyCatcher-SnoopCatcher)₄:SnoopTag-Affi-SpyTag; and (B) a graph depicting size-exclusion chromatography analysis of the decamer fusion protein, MBPx-SpyCatcher:(SnoopTag-Affi-SpyTag:SpyCatcher-SnoopCatcher)₄:SnoopTag-Affi-SpyTag. The inset shows the molecular weight standards.

Figure 11 shows (A) a picture of an SDS-PAGE gel with Coomassie staining analysing the thermostability of the decamer fusion protein, MBPx-SpyCatcher:(SnoopTag-Affi-SpyTag:SpyCatcher-SnoopCatcher)₄:SnoopTag-Affi-SpyTag; and (B) a picture of an SDS-PAGE gel with Coomassie staining analysing the time-dependent stability of the decamer fusion protein, biotin-SpyCatcher:(SnoopTag-Affi-SpyTag:SpyCatcher-SnoopCatcher)₄:SnoopTag-Affi-SpyTag.

Figure 12 shows (A) a picture of an SDS-PAGE gel with Coomassie staining analysing solid-phase fusion protein synthesis. Lanes 1-3 show MBPx-SpyCatcher, SnoopTag-mEGFP-SpyTag and SpyCatcher-SnoopCatcher in isolation. MBPx-SpyCatcher was bound to the amylose resin and stepwise reaction with SnoopTag-mEGFP-SpyTag and SpyCatcher-SnoopCatcher was carried out. After each stage, one aliquot of sample was eluted from the resin with maltose (lanes 4-9). Samples were analysed without any further purification; and (B) a picture of an SDS-PAGE gel with Coomassie staining analysing solid-phase fusion protein synthesis. Lanes 1-3 show MBPx-SpyCatcher, SnoopTag-SpyTag-Affi3 and SpyCatcher-SnoopCatcher in isolation. The stepwise reaction was carried out and analysed as in (A).

Figure 13 shows a cartoon of two simple branched fusion protein structures that can be obtained using the methods of the invention.

Figure 14 shows a picture of an SDS-PAGE gel with Coomassie staining comparing the activity of a mutated RrgATag (RrgATag2.0, SEQ ID NO: 111) fused to MBP reacted with RrgACatcher with the unmutated RrgATag (SEQ ID NO: 9) fused to MBP reacted with RrgACatcher.

Figure 15 shows a picture of an SDS-PAGE gel with Coomassie staining characterising various RrgATag peptide linker mutants (fused to SUMO) reacted with RrgACatcher.

Figure 16 shows graphs depicting the time-course of RrgATag2 reaction with a 1:1, 2:1 or 4:1 ratio of RrgATag2 to RrgACatcher. The inset graph shows the first 8 minutes of the reaction.

Figure 17 shows a picture of an SDS-PAGE gel with Coomassie staining characterising RrgACatcher reactivity with SnoopTag, SnoopCatcher, SpyTag, SpyCatcher and RrgATag2.

Figure 18 shows a picture of an SDS-PAGE gel with Coomassie staining characterising RrgATag2/RrgACatcher, SnoopTag/SnoopCatcher and SpyTag/SpyCatcher orthogonal reactivity.

EXAMPLES

Example 1 - Design and synthesis of cognate pairs of peptide linkers that form spontaneous isopeptide bonds

RrgA (SEQ ID NO: 21) is an adhesin from Streptococcus pneumoniae, a Gram-positive bacterium that can cause septicaemia, pneumonia and meningitis in humans. A spontaneous isopeptide bond forms in the D4 immunoglobulin-like domain of RrgA between residues Lys742 and Asn854 (Figure 2). The inventor split the D4 domain into a pair of peptide linkers: a peptide termed SnoopTag (residues 734-748, SEQ ID NO: 1) and a protein termed SnoopCatcher (residues 749-860, SEQ ID NO: 2).

However, the inventor found that it was necessary to introduce two mutations into the SnoopCatcher peptide linker in order to form a stable pair of peptide linkers for use in the invention. In this respect, the inventor introduced the G842T mutation in SnoopCatcher to stabilize a β-strand and the D848G mutation to stabilize a hairpin turn close to the reaction site.

The SnoopTag peptide was expressed as a recombinant polypeptide fused to the Maltose Binding Protein (MBP) and a His-Tag (SEQ ID NO: 50).

SnoopCatcher was expressed as a recombinant polypeptide fused to a His-Tag (SEQ ID NO: 39). SnoopTag-MBP and SnoopCatcher were expressed efficiently as soluble proteins in the cytosol of Escherichia coli and purified by Ni-NTA affinity chromatography. SnoopTag-MBP and SnoopCatcher, simply upon mixing, formed a complex stable to boiling in SDS (Figure 3). Mutations in the putative reactive Lys742 of SnoopTag (SnoopTag KA-MBP) and the putative reactive Asn854 of SnoopCatcher (SnoopCatcher NA) abolished reaction (Fig. 3). Electrospray ionization mass spectrometry supported the loss of NH3 from isopeptide bond formation between SnoopCatcher and synthetic SnoopTag peptide; acetylated and gluconylated side-products common for E. coli overexpression were also observed.
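The loss of NH3 noted above gives a simple check on mass spectrometry data: the expected mass of an isopeptide-linked product is the sum of the unit masses minus about 17.03 Da for each Lys-Asn bond. The sketch below is illustrative only and assumes average masses of unmodified proteins (the acetylated and gluconylated side-products mentioned above would shift observed masses). The loss of water for a Lys-Asp bond, as in SpyTag/SpyCatcher, is general background rather than a result reported here; the function name and example masses are hypothetical.

```python
# Average masses (Da) of the small molecule released when an isopeptide bond forms.
NH3_AVERAGE_DA = 17.0305  # Lys + Asn (e.g. SnoopTag/SnoopCatcher): ammonia released
H2O_AVERAGE_DA = 18.0153  # Lys + Asp (e.g. SpyTag/SpyCatcher): water released

LEAVING_GROUP_DA = {"Asn": NH3_AVERAGE_DA, "Asp": H2O_AVERAGE_DA}


def conjugate_mass(unit_masses_da, bond_partners):
    """Expected average mass of a linear chain of units joined by isopeptide bonds.

    unit_masses_da: average masses of the individual units, in chain order.
    bond_partners: for each bond, the residue that reacts with Lys ("Asn" or "Asp").
    """
    if len(bond_partners) != len(unit_masses_da) - 1:
        raise ValueError("a linear chain of n units has n - 1 bonds")
    return sum(unit_masses_da) - sum(LEAVING_GROUP_DA[p] for p in bond_partners)


# Hypothetical masses for a SnoopTag fusion and SnoopCatcher (one Lys-Asn bond).
print(round(conjugate_mass([15000.0, 12000.0], ["Asn"]), 4))  # 26982.9695
```

Comparing this value with the deconvoluted spectrum is one way to confirm that a covalent product, not a non-covalent complex, has formed.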

With a 1:1 ratio of SnoopCatcher to SnoopTag-MBP, the reaction proceeded to ~80% yield. However, with a two-fold excess of SnoopCatcher, SnoopTag-MBP reacted quantitatively (Figures 4A and B). Similarly, with an excess of SnoopTag-MBP, SnoopCatcher was ~100% consumed (Figures 4C and D).
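Because each tag reacts with one catcher, the partner present in lower amount caps how much product can form, which is why an excess of one partner can drive the other to full consumption. Yields like these are commonly estimated by densitometry of Coomassie-stained gels; the source does not state its quantification method, so the following is a hypothetical sketch of that approach, with invented function names and example intensities.

```python
def percent_reacted(unreacted_band: float, input_band: float) -> float:
    """Percentage of a species consumed, estimated from gel densitometry.

    unreacted_band: intensity of the species' unreacted band in the reaction lane.
    input_band: intensity of the same species in an unreacted control lane.
    """
    if input_band <= 0:
        raise ValueError("control band intensity must be positive")
    return 100.0 * (1.0 - unreacted_band / input_band)


def max_conjugate_uM(tag_uM: float, catcher_uM: float) -> float:
    """Upper bound on product for 1:1 tag:catcher chemistry: the limiting partner."""
    return min(tag_uM, catcher_uM)


# Hypothetical intensities: 20% of the tag band remains after reaction.
print(percent_reacted(20.0, 100.0))   # about 80
# With a two-fold excess of catcher, all of the tag can in principle react.
print(max_conjugate_uM(10.0, 20.0))   # 10.0
```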

The inventors further established that reaction proceeded efficiently from pH 6-9, but was slowed at pH 5 (Figure 5A). Reaction was fastest at room temperature but also occurred at 4°C and 37°C (Figure 5B). Cysteine is absent from SnoopTag and SnoopCatcher so, as expected, the reaction was insensitive to dithiothreitol (DTT). No specific buffer component was required, with reaction in PBS as well as in the presence of the detergents Triton X-100 and Tween 20, or high salt (1 M NaCl) (Figure 6A). The chemical chaperone trimethylamine N-oxide (TMAO) gave a modest enhancement (Figure 6B).

Spontaneous hydrolysis of an amide bond normally takes years under neutral conditions, but we tested whether hydrolysis was accelerated in this particular protein environment. We looked for cleavage of the SnoopTag-MBP/SnoopCatcher linkage by competing with an excess of an alternative SnoopTag-linked protein or with ammonia, but we did not observe reversibility.

A further pair of peptide linkers was developed from the RrgA protein by splitting the D4 immunoglobulin-like domain in a different direction to the SnoopTag/SnoopCatcher pair of peptide linkers. This pair of peptide linkers was termed RrgATag (SEQ ID NO: 9) and RrgACatcher (SEQ ID NO: 10). A pair of peptide linkers was also developed based on the PsCs protein (SEQ ID NO: 31), termed PsCsTag (SEQ ID NO: 5) and PsCsCatcher (SEQ ID NO: 6).

Each pair of peptide linkers is capable of spontaneously forming an isopeptide bond under a variety of conditions, akin to the SnoopTag/SnoopCatcher pair discussed above.

Example 2 - Investigating the cross-reactivity of pairs of peptide linkers

A peptide tag and binding partner, SpyTag and SpyCatcher (SEQ ID NOs: 13 and 14), that react spontaneously to form an isopeptide bond have been developed previously (WO2011/098772).

SnoopTag has a reactive Lys, whereas SpyTag has a reactive Asp, so the inventor hypothesized that the SnoopTag/SnoopCatcher and SpyTag/SpyCatcher pairs would be fully orthogonal, i.e. would not show cross-reactivity. Upon mixing the peptide linkers in various combinations, each cognate pair of peptide linkers reacted efficiently, but no trace of cross-reaction between pairs was found, even after overnight incubation (Figure 7). This result confirmed that the SnoopTag/SnoopCatcher pair is orthogonal to SpyTag/SpyCatcher.
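The orthogonality criterion used here can be stated as a simple check over all tag/catcher combinations: every cognate combination must react and no non-cognate combination may. The sketch below encodes only the qualitative outcome described for Figure 7; the data structure and function name are hypothetical.

```python
PAIRS = {
    "Spy": ("SpyTag", "SpyCatcher"),
    "Snoop": ("SnoopTag", "SnoopCatcher"),
}


def is_orthogonal(pairs, reacting):
    """True if every cognate tag/catcher combination reacts and no other one does.

    pairs: name -> (tag, catcher).
    reacting: set of (tag, catcher) combinations observed to form a covalent product.
    """
    tags = [tag for tag, _ in pairs.values()]
    catchers = [catcher for _, catcher in pairs.values()]
    cognate = set(pairs.values())
    non_cognate = {(t, c) for t in tags for c in catchers} - cognate
    return cognate <= reacting and not (reacting & non_cognate)


# Qualitative outcome described above: only the cognate pairs reacted.
observed = {("SpyTag", "SpyCatcher"), ("SnoopTag", "SnoopCatcher")}
print(is_orthogonal(PAIRS, observed))  # True
```

Adding further pairs (for example the RrgA or PsCs pairs) to `PAIRS` extends the same check to a larger matrix of combinations.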

The inventor also tested the PsCsTag/PsCsCatcher pair and the RrgATag/RrgACatcher pair for cross-reactivity with the SnoopTag/SnoopCatcher and SpyTag/SpyCatcher pairs. As shown in Figures 8A and B, no significant cross-reactivity was found between the "PsCs" pair and the "Spy" pair or "Snoop" pair. Similarly, no significant cross-reactivity was found between the "RrgA" pair and the "Spy" pair or "Snoop" pair. Thus, each pair of peptide linkers is orthogonal to the other pairs of peptide linkers.

Example 3 - Synthesis of fusion proteins using two orthogonal pairs of peptide linkers

The inventors used the "Spy" and "Snoop" pairs of peptide linkers to demonstrate that such orthogonal pairs of peptide linkers can be used successfully to synthesise fusion proteins.

The interaction of E. coli MBP with amylose resin is widely used in affinity purification: MBP fusions typically fold and express well and show low non-specific resin binding. MBP also allows selective, mild elution using maltose, avoiding the need for protease removal. The affinity of wild-type MBP for maltose is 1.2 μM, which is practical for protein purification but insufficient for the multiple rounds of washing and chain extension required in fusion protein synthesis. Therefore, the inventors developed a mutated MBP to improve its maltose-binding stability. Firstly, the inventors modified the polypeptide sequence by introducing the mutations A312V and I317V and deleting residues 172, 173, 175 and 176. Secondly, the MBP mutant (SEQ ID NO: 70) was tandemly linked to generate MBPx (His6-MBPmt-linker-MBPmt).

For initial chain building, the inventors incorporated affibodies, a non-immunoglobulin scaffold expressed efficiently in the E. coli cytosol. The affibody to HER2 was linked at its N-terminus to SnoopTag and at its C-terminus to SpyTag (SnoopTag-Affi-SpyTag, SEQ ID NO: 72). Affibody units were bridged using SpyCatcher connected through a helical spacer to SnoopCatcher (SpyCatcher-SnoopCatcher, SEQ ID NO: 56, which also expressed efficiently as a soluble protein in E. coli) (Figure 1). Since each linkage is covalent, the extension of the fusion protein could be followed by adding maltose to elute from the resin and then boiling the supernatant before SDS-PAGE with Coomassie staining (Figure 9A). MBPx-SpyCatcher (bound to amylose resin) reacted quantitatively with SnoopTag-Affi-SpyTag (Figure 9A, lane 5). This construct then reacted quantitatively with SpyCatcher-SnoopCatcher (Figure 9A, lane 6).

Sequential addition of SnoopTag-Affi-SpyTag and SpyCatcher-SnoopCatcher enabled efficient chain growth, extending to a product 10 units long (a decamer, Figure 9A, lane 13).
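The stepwise extension can be modelled as bookkeeping of the chain's single free reactive linker: each added module must carry the cognate partner of that linker, and its other linker becomes the new free end. The sketch below is a simplified model, not the inventors' protocol; it ignores washing, yields and side-reactions, and the names `COGNATE`, `MODULES` and `add_module` are hypothetical.

```python
# Cognate partner of each linker (orthogonal pairs never cross-react).
COGNATE = {
    "SpyTag": "SpyCatcher", "SpyCatcher": "SpyTag",
    "SnoopTag": "SnoopCatcher", "SnoopCatcher": "SnoopTag",
}

# Each building block carries two reactive linkers.
MODULES = {
    "SnoopTag-Affi-SpyTag": ("SnoopTag", "SpyTag"),
    "SpyCatcher-SnoopCatcher": ("SpyCatcher", "SnoopCatcher"),
}


def add_module(chain, free_linker, module):
    """React a module with the chain's free linker; return (new chain, new free linker)."""
    ends = MODULES[module]
    partner = COGNATE[free_linker]
    if partner not in ends:
        raise ValueError(f"{module} has no linker that reacts with {free_linker}")
    # The module's other linker becomes the free end of the growing chain.
    new_free = ends[1] if ends[0] == partner else ends[0]
    return chain + [module], new_free


chain, free = ["MBPx-SpyCatcher"], "SpyCatcher"  # resin-bound starting unit
for cycle in range(9):  # alternate the two modules, as in Figure 9A
    module = "SnoopTag-Affi-SpyTag" if cycle % 2 == 0 else "SpyCatcher-SnoopCatcher"
    chain, free = add_module(chain, free, module)

print(len(chain), free)  # 10 SnoopTag
```

Nine additions to the starting unit give the decamer ending in SnoopTag-Affi-SpyTag, and trying to add the same module twice in a row raises an error, reflecting that there is only one way for the chain to grow.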

To demonstrate solid-phase extension with a different kind of solid-phase attachment, a modified SpyCatcher protein was generated, AviTag-SpyCatcher, to allow site-specific N-terminal biotinylation. After linking biotinylated SpyCatcher to streptavidin-coated beads, fusion protein chains were assembled to the length of a decamer in the same way and eluted with free biotin (Figure 9B).

To validate the assembled decamer, electrospray ionization mass spectrometry was performed, showing good correspondence between observed and expected masses (Figure 10A). Whilst mass spectrometry gives a good indication of identity, SDS-PAGE is much better for assessing purity, because lower molecular weight side-products ionize more efficiently. Affibodies are usually monomeric, showing little self-association; thus, to analyze whether the decamer formed aggregates, size-exclusion chromatography (SEC) was performed. SEC gave one major peak, consistent with the expected monomeric mass of the decamer calibrated with globular protein standards, indicating that there was minimal decamer self-association under these conditions (Fig. 10B).
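Calibration with globular standards is commonly done by fitting log10(molecular mass) against elution volume and interpolating the sample peak. The source does not give its standards or fitting procedure, so this is a hypothetical sketch; the standards below are synthetic values generated from an assumed straight line purely to exercise the code.

```python
import math


def fit_sec_calibration(elution_ml, masses_da):
    """Least-squares fit of log10(mass) = intercept + slope * elution volume."""
    n = len(elution_ml)
    if n < 2 or n != len(masses_da):
        raise ValueError("need at least two standards with matching masses")
    logs = [math.log10(m) for m in masses_da]
    mean_v = sum(elution_ml) / n
    mean_log = sum(logs) / n
    sxx = sum((v - mean_v) ** 2 for v in elution_ml)
    sxy = sum((v - mean_v) * (y - mean_log) for v, y in zip(elution_ml, logs))
    slope = sxy / sxx
    return mean_log - slope * mean_v, slope


def apparent_mass(elution_ml, intercept, slope):
    """Interpolate the apparent mass (Da) of a peak from the calibration line."""
    return 10 ** (intercept + slope * elution_ml)


# Synthetic standards on the line log10(M) = 7 - 0.2 * V (illustration only).
standards_ml = [10.0, 12.0, 14.0, 16.0]
standards_da = [10 ** (7 - 0.2 * v) for v in standards_ml]
a, b = fit_sec_calibration(standards_ml, standards_da)
print(round(apparent_mass(13.0, a, b)))  # apparent mass of a peak at 13 mL
```

The apparent mass is only meaningful relative to the standards used; an elongated chain may elute differently from a globular protein of the same mass.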

To assess thermostability, the decamer was heated briefly at a range of temperatures; it remained largely soluble even at 70°C (Fig. 11A). Decamer integrity during storage was also tested, and little degradation and little loss of solubility were observed after 1 or 4 days (Fig. 11B).

Expanding from the initial incorporation of AffiHER2 into the chains, it was shown that other protein units could be incorporated efficiently using orthogonal isopeptide bond formation (Figure 12). In this respect, fusion protein chains containing a fluorescent protein (mEGFP) were generated (Figure 12A). Bottle-brush fusion protein polymers were also produced by incorporating a tandemly linked affibody against HER2 with both tags at its N-terminus (SnoopTag-SpyTag-Affi-Affi-Affi) (Figure 12B).

In summary, the inventor has developed a modular approach to the synthesis of fusion proteins, through spontaneous isopeptide bond formation between peptide linkers. The fusion proteins generated according to the method of the invention are linked through irreversible amide bonds, so they are stable over time (if protected from proteases) and allow easy analysis by SDS-PAGE. The initiation, extension and release steps use mild conditions, independent of redox state, so should be applicable to a wide range of proteins. With only a single way for the chain to grow, products are molecularly defined, favouring reproducibility and precise tuning of function. Also, subunits do not need to be connected in an N- to C-terminal orientation, as shown with the bottle-brush polymer architectures described above. No chemical modification of the module is required, avoiding time-consuming and hard-to-control bioconjugation steps, so the method is accessible to any laboratory able to express recombinant proteins. Spontaneous isopeptide bond formation has the advantage of a simple reaction pathway between two functional groups with low intrinsic reactivity (an amine with a carboxylic acid or a carboxamide), so there is little side-reaction.

Whilst this example demonstrates fusion protein synthesis using the "Spy" and "Snoop" pairs of peptide linkers, it will be evident that any orthogonal pairs of peptide linkers according to the invention may be utilised in the methods of the invention and, as discussed above, using more than two orthogonal pairs of peptide linkers may be particularly advantageous for synthesizing fusion proteins with complex structures, e.g. branched or circular structures.

Example 4 - Design and synthesis of an improved cognate pair of peptide linkers based on the RrgA protein

The RrgATag described in Example 1 was subjected to a variety of modifications with the objective of producing a pair of peptide linkers with improved activity relative to the RrgATag/RrgACatcher peptide linker pair. The inventor synthesized a mutant RrgATag peptide linker comprising a substitution of aspartic acid to glycine at position 11 (D11G), called RrgATag2.0 (see Table 2 below). RrgATag2.0 (SEQ ID NO: 111) and RrgATag (SEQ ID NO: 9) were expressed as fusion proteins linked to maltose binding protein (MBP), and their reactivity with RrgACatcher was compared. The reactions were performed for 6 hours in phosphate buffered saline (PBS) at pH 7.4 and room temperature, using 10 μM of each protein.

Figure 14 shows that RrgATag2.0 has greatly increased reactivity with RrgACatcher when compared with RrgATag.

The inventor synthesized a further eight peptide linkers comprising various mutations relative to the RrgATag (SEQ ID NO: 9), including extensions, truncations, substitutions and combinations thereof. Table 2 shows the sequences of the mutant RrgATag peptide linkers, wherein substitutions and extensions are underlined.

Table 2

Peptide name (SEQ ID NO) | Sequence | Modification relative to RrgATag (SEQ ID NO: 9)
RrgATag (SEQ ID NO: 9) | DIPATYEFTNDKHYITNEP | -
RrgATag2 (SEQ ID NO: 109) | DIPATYEFTNGKHYITNEPIPPK | D11G (substitution); 4 amino acid C-terminal extension
RrgATag2.0 (SEQ ID NO: 111) | DIPATYEFTNGKHYITNEP | D11G (substitution)
RrgATag2.1 (SEQ ID NO: 113) | DIPATYEFTNGKHYITNE | D11G (substitution); 1 amino acid C-terminal deletion
RrgATag2.2 (SEQ ID NO: 115) | DIPATYEFTNGKHYITN | D11G (substitution); 2 amino acid C-terminal deletion
RrgATag2.3 (SEQ ID NO: 117) | ATYEFTNGKHYITNEP | D11G (substitution); 3 amino acid N-terminal deletion
RrgATag2.4 (SEQ ID NO: 119) | KHYITNEP | 11 amino acid N-terminal deletion
RrgATag2.5 (SEQ ID NO: 121) | GKHYITNEP | D11G (substitution); 10 amino acid N-terminal deletion
RrgATag2.6 (SEQ ID NO: 123) | NGKHYITNEP | D11G (substitution); 9 amino acid N-terminal deletion
RrgATag2.7 (SEQ ID NO: 125) | IVPQDIPATYEFTNGKHYITNEP | D11G (substitution); 4 amino acid N-terminal extension
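Every variant in Table 2 can be derived from RrgATag by the D11G substitution followed by a terminal extension or truncation. The sketch below rebuilds the table sequences that way as a consistency check; the helper `substitute` and the dictionary layout are hypothetical, and the extension residues (IPPK, IVPQ) are taken directly from the table.

```python
PARENT = "DIPATYEFTNDKHYITNEP"  # RrgATag, SEQ ID NO: 9


def substitute(seq: str, mutation: str) -> str:
    """Apply a point substitution written like 'D11G' (1-based position)."""
    wild_type, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[pos - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]


tag20 = substitute(PARENT, "D11G")  # RrgATag2.0

VARIANTS = {
    "RrgATag2.0": tag20,
    "RrgATag2": tag20 + "IPPK",      # 4 amino acid C-terminal extension
    "RrgATag2.1": tag20[:-1],        # 1 amino acid C-terminal deletion
    "RrgATag2.2": tag20[:-2],        # 2 amino acid C-terminal deletion
    "RrgATag2.3": tag20[3:],         # 3 amino acid N-terminal deletion
    "RrgATag2.4": tag20[11:],        # 11 amino acid N-terminal deletion (removes G11 too)
    "RrgATag2.5": tag20[10:],        # 10 amino acid N-terminal deletion
    "RrgATag2.6": tag20[9:],         # 9 amino acid N-terminal deletion
    "RrgATag2.7": "IVPQ" + tag20,    # 4 amino acid N-terminal extension
}

for name, seq in VARIANTS.items():
    print(f"{name:<11} {seq}")
```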

The mutated RrgATag peptide linkers were expressed as fusion proteins linked to a SUMO (small ubiquitin-like modifier) protein, and the fusion proteins were tested for reactivity with RrgACatcher (SEQ ID NO: 10). The reactions were performed for 30 minutes in phosphate buffered saline (PBS) at pH 7.4 and room temperature, using 10 μM of each protein. Figure 15 shows that only four of the modified RrgATag peptide linkers showed observable reactivity with RrgACatcher: RrgATag2.0, RrgATag2.3, RrgATag2 and RrgATag2.7.

Of these, RrgATag2 showed a significant increase in activity relative to RrgATag2.0, which, as discussed above, itself has increased activity relative to RrgATag. Thus, RrgATag2 has significantly improved reactivity with RrgACatcher in comparison to RrgATag.

The speed of reaction between RrgATag2 (in the form of a fusion protein with SUMO) and RrgACatcher is shown in Figure 16 and indicates that an excess of RrgATag2 increases the speed of reaction. Nevertheless, the reaction neared completion, i.e. 100% consumption of RrgACatcher, at all concentrations of RrgATag2 tested.
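Why an excess of tag speeds up consumption of the catcher can be seen from a textbook irreversible second-order model, tag + catcher -> conjugate, with rate k[tag][catcher]. The source does not fit a kinetic model, so this is an assumption for illustration only, and the rate constant and time below are hypothetical values.

```python
import math


def fraction_catcher_reacted(catcher_uM, tag_uM, k_per_uM_s, t_s):
    """Fraction of the limiting partner (catcher) converted at time t.

    Assumes simple irreversible second-order kinetics with tag_uM >= catcher_uM.
    """
    a0, b0 = catcher_uM, tag_uM
    if b0 < a0:
        raise ValueError("this sketch assumes the tag is at equal or higher concentration")
    if math.isclose(a0, b0):
        # Equal starting concentrations: fraction = a0*k*t / (1 + a0*k*t)
        return a0 * k_per_uM_s * t_s / (1.0 + a0 * k_per_uM_s * t_s)
    # Unequal concentrations, written with exp(-...) so it cannot overflow.
    r = math.exp(-(b0 - a0) * k_per_uM_s * t_s)
    return b0 * (1.0 - r) / (b0 - a0 * r)


k = 0.001  # hypothetical rate constant, per uM per second
for ratio in (1, 2, 4):
    frac = fraction_catcher_reacted(10.0, 10.0 * ratio, k, 300.0)
    print(f"{ratio}:1 tag:catcher, fraction after 300 s = {frac:.3f}")
```

In this model every ratio approaches complete consumption of the catcher at long times, while higher ratios get there faster, which is the qualitative pattern described above.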

Whilst not wishing to be bound by theory, it is hypothesised that the significantly improved activity of RrgATag2 results from the combination of modifications/mutations relative to RrgATag. In this respect, the C-terminal extension, which is based on the native RrgA sequence, is thought to form favourable interactions with the RrgACatcher peptide linker. Furthermore, it is hypothesised that the D to G mutation (i.e. the reduction in the size of the side-chain) in the middle of the RrgATag2 peptide linker stabilises the hairpin turn in the peptide, a turn seen in the crystal structure of the full-length domain.

Example 5 - Investigating the cross-reactivity of the improved RrgATag2 peptide linker

The RrgATag2/RrgACatcher peptide linker pair was tested for cross-reactivity against the SnoopTag/SnoopCatcher and SpyTag/SpyCatcher peptide linker pairs, as described in Example 2 above. The RrgATag2 peptide linker was expressed as a fusion protein linked to SUMO, as described in Example 4.

Figure 17 shows that no significant cross-reactivity was found between the RrgACatcher peptide linker and the SpyTag or SnoopTag peptide linkers. Figure 18 shows that no significant cross-reactivity was found between the RrgATag2 peptide linker and the SpyCatcher or SnoopCatcher peptide linkers. Thus, each pair of peptide linkers is orthogonal to the other pairs of peptide linkers.

Materials and Methods

Cloning

KOD Hot Start DNA Polymerase (Roche) was used to perform all PCR and site-directed mutagenesis. Gibson Assembly® Master Mix (NEB) was used according to the manufacturer's instructions. Constructs were initially cloned into chemically competent E. coli DH5α (Life Technologies).

pET28a SpyTag-MBP (Addgene plasmid ID 35050), Glutathione-S-Transferase-BirA and pDEST14-SpyCatcher (GenBank JQ478411, Addgene plasmid ID 35044) have been described in B. Zakeri et al., 2012 (Proc Natl Acad Sci U S A 109, E690-697).

pET28a SnoopCatcher was generated by DNAWorks primer-mediated assembly from residues 749-860 of Streptococcus pneumoniae adhesin RrgA (numbering based on Protein Data Bank ID 2WW8), digested with HindIII and NdeI and subcloned into pET28a. To optimize reaction with SnoopTag, the G842T mutation was made in this construct by QuikChange with 5'-GTGCCGCAGGATATTCCGGCTACATATGAATTTACCAACG (SEQ ID NO: 73), and the D848G mutation with 5'-GCTACATATGAATTTACCAACGGTAAACATTATATCACCAATGAACC (SEQ ID NO: 74), together with their reverse complements. SnoopCatcher is 132 residues long (assuming fMet cleavage) and has an N-terminal thrombin cleavage site and His6 tag. pET28a SnoopCatcher NA was produced from pET28a SnoopCatcher by QuikChange of N854 to A using the forward primer 5'-ACATTATATCACCGCTGAACCGATACCGCCG (SEQ ID NO: 75) and its reverse complement.

pET28a SnoopTag-MBP was generated in two steps. First, the reactive peptide based on the N-terminal β-strand of RrgA's D4 domain (residues 734-748) was cloned into pET28a-SpyTag-MBP by site-directed, ligase-independent mutagenesis (SLIM) PCR (Chiu et al., 2004) using 5'-GGTAGTGGTGAAAGTGGTAAAATCGAAGAAG (SEQ ID NO: 76), 5'-AAACTGGGCGATATTGAATTTATTAAAGTGAACAAAAACGATAAAGGTAGTGGTGAAAGTGGTAAAATCGAAGAAG (SEQ ID NO: 77), 5'-TCCCATATGGCTGCCGCGCG (SEQ ID NO: 78) and 5'-TTTATCGTTTTTGTTCACTTTAATAAATTCAATATCGCCCAGTTTTCCCATATGGCTGCCGCGCG (SEQ ID NO: 79). The 3 C-terminal residues of the peptide were then removed using QuikChange with 5'-GAATTTATTAAAGTGAACAAAGGTAGTGGTGAAAGTGGTAAAATCG (SEQ ID NO: 80) and its reverse complement. pET28a SnoopTag KA-MBP, an unreactive version of SnoopTag, was generated by QuikChange of K742 to A on pET28a SnoopTag-MBP using 5'-GGGCGATATTGAATTTATTGCAGTGAACAAAGGTAGTGG (SEQ ID NO: 81) and its reverse complement.
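A quick way to sanity-check mutagenic primers such as SEQ ID NOs: 74, 75 and 81 is to translate them in frame and confirm the intended codon change, and to generate the reverse-complement partner primer. In this hypothetical sketch the reading frames were chosen by matching the translations to the SnoopTag/SnoopCatcher sequences; primer design parameters such as melting temperature are not considered, and the helper names are invented.

```python
# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO))

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement, e.g. to write out the partner QuikChange primer."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate_dna(dna: str, frame: int = 0) -> str:
    """Translate in the given frame, ignoring an incomplete final codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


# (primer, reading frame); the mutated codon is GGT (Gly), GCT (Ala) or GCA (Ala).
PRIMERS = {
    "D848G (SEQ ID NO: 74)": ("GCTACATATGAATTTACCAACGGTAAACATTATATCACCAATGAACC", 0),
    "N854A (SEQ ID NO: 75)": ("ACATTATATCACCGCTGAACCGATACCGCCG", 1),
    "K742A (SEQ ID NO: 81)": ("GGGCGATATTGAATTTATTGCAGTGAACAAAGGTAGTGG", 1),
}

for name, (primer, frame) in PRIMERS.items():
    print(name, translate_dna(primer, frame), reverse_complement(primer))
```

The translations read ATYEFTNGKHYITNE, HYITAEPIPP and GDIEFIAVNKGS, showing G, A and A at the mutated positions.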

pET28a MBP-SpyCatcher was generated by fusing SpyCatcher with a Gly/Ser spacer at the C-terminus of MBP, through overlap extension PCR.

SpyCatcher was amplified from pDEST14-SpyCatcher using the forward primer 5'-GTTCGGGCGGTAGTGGTGCCATGGTTGATACCTTATCAGGTTTATCAAGTGAGCAAG (SEQ ID NO: 82) and the reverse primer 5'-TACTAAGCTTCTATTAAATATGAGCGTCACCTTTAGTTGCTTTGCCATTTACAG (SEQ ID NO: 83). The forward primer 5'-ATCTCATATGGGCAGCAGCCATCATCATCATCATCAC (SEQ ID NO: 84) and the reverse primer 5'-GTATCAACCATGGCACCACTACCGCCCGAACCCGAGCTCGAATTAGTCTGCG (SEQ ID NO: 85) were used to amplify MBP from pET28a SpyTag-MBP. The two resulting PCR products were mixed and amplified again using the MBP forward primer and the SpyCatcher reverse primer, digested with NdeI and HindIII, and subcloned into pET21. To increase the affinity of MBP-SpyCatcher for amylose, we first made the A312V and I317V mutations in MBP by QuikChange using the forward primer 5'-GTCTTACGAGGAAGAGTTGGTGAAAGATCCACGTGTGGCCGCCACTATGGAAAACGC (SEQ ID NO: 86) and its reverse complement. Residues 172, 173, 175 and 176 were deleted from MBP using QuikChange with 5'-GGGTTATGCGTTCAAGTATGGCGACATTAAAGACGTGGGCG (SEQ ID NO: 87) and its reverse complement. We then shortened SpyCatcher's N-terminus by QuikChange using 5'-CACCATCACCATCACGATTACGATAGTGCTACCCATATTAAATTCTC (SEQ ID NO: 88) and its reverse complement. To decrease the dissociation from amylose resin even further, tandem copies of this mutant MBP were generated to give MBPx-SpyCatcher (N-terminal His6 tag-MBPmt-spacer-MBPmt-spacer-SpyCatcher).

MBPx was amplified and fused to MBPx-SpyCatcher via Gibson assembly using the forward primer 5'-GGCGGATCCGGAGGTGGATCCGGAAAGATAGAGGAGGGTAAACTGGTAATCTGG (SEQ ID NO: 89), the reverse primer 5'-CCTATAGTGAGTCGTATTAATTTCG (SEQ ID NO: 90), the forward primer 5'-CGAAATTAATACGACTCACTATAGG (SEQ ID NO: 91) and the reverse primer 5'-TCCGGATCCACCTCCGGATCCGCCGGAACTAGAATTCGTCTGCGCGTCTTTCAGG (SEQ ID NO: 92).

pET28a SpyCatcher-SnoopCatcher was produced in steps. Initially, SpyCatcher was fused with a Gly/Ser spacer at the N-terminus of SnoopCatcher, and then the Gly/Ser spacer was replaced with an α-helical spacer (sequence PANLKALEAQKQKEQRQAAEELANAKKLKEQLEK, SEQ ID NO: 93). The forward primer 5'-CTTTAAGAAGGAGATATACATATGTCGTACTACCATCACCATC (SEQ ID NO: 94) and the reverse primer 5'-CCGCTGCTTCCGGATCCAATATGAGCGTCACCTTTAGTTG (SEQ ID NO: 95) were used to amplify the SpyCatcher portion from pDEST14-SpyCatcher. The SnoopCatcher part was cloned from pET28a SnoopCatcher using the forward primer 5'-CATATTGGATCCGGAAGCAGCGGCCTGGTGCCGCGCGGATCCCATATGAAGCCGCTGC (SEQ ID NO: 96) and the reverse primer 5'-GTGGTGGTGGTGGTGCTCGAGTTATTATTTCGGCGGTATCGGTTC (SEQ ID NO: 97). Following SpyCatcher and SnoopCatcher fusion, the Gly/Ser spacer was replaced with the stable α-helical linker using the forward primer 5'-CTAAAGGTGACGCTCATATTGGATCCCCCGCCAACCTGAAGGCCCTGGAGGCCCAGAAGCAGAAGGAGCAGAGACAGGCCGCCGAGGAGC (SEQ ID NO: 98) and the reverse primer 5'-CACGGCACCACGCAGCGGCTTCATATGGGATCCCTTCTCCAGCTGCTCCTTCAGCTTCTTGGCGTTGGCCAGCTCCTCGGCGGCCTGTC (SEQ ID NO: 99). Thirty-five residues were deleted from SpyCatcher's N-terminus via QuikChange using the forward primer 5'-CACCATCACCATCACGATTACGATAGTGCTACCCATATTAAATTCTC (SEQ ID NO: 100) and its reverse complement.

pET28a SnoopTag-Affi-SpyTag (N-terminal His6 tag-SnoopTag-spacer-Affibody against HER2-spacer-SpyTag) was generated by Gibson assembly from pET28a-KTag-AffiHer2-SpyTag using the forward primer 5'-GTGAACAAAGGCAGTGGTGAGTCGGGATCCGGAGCTAGCATGACTGGTGG (SEQ ID NO: 101) and the reverse primer 5'-CATCACGATGTGGGCACCGGAACCTTCCCCGGATCCCTCGAGGCCTTTCGG (SEQ ID NO: 102).

pET28a SnoopTag-AffiTaq-SpyTag, containing an affibody against Taq DNA polymerase, was generated by inverse PCR from pET28a SnoopTag-AffiHer2-SpyTag using 5'-CTACCCAACCTAAACGGGGTACAAGTAAAGGCTTTCATAGACTCGCTAAGGGATGACCCAAGCCAAAGCGC (SEQ ID NO: 103) and 5'-GTTGAATATCTCCCAAGTAGCCCACCCTAGCTCCTTGTTGAACTTGTTGTCTACTTCTTTGTTGAATTTGTTGTCCACGCC (SEQ ID NO: 104).

pET28a SnoopTag-mEGFP-SpyTag was cloned by substituting mEGFP at the BamHI sites of pET28a SnoopTag-Affi-SpyTag and extending the spacer by PCR. pET28a SnoopTag-SpyTag-Affi3 was generated by PCR assembly of tandem copies of AffiHER2 linked by Gly/Ser spacers.

AviTag-SpyCatcher, containing a peptide tag for site-specific biotinylation at the N-terminus, was cloned by SLIM PCR from pDEST14-SpyCatcher using 5'-GATTACGACATCCCAACGACCGAAAACCTG (SEQ ID NO: 105), 5'-GCCTGAACGATATTTTTGAAGCGCAGAAAATTGAATGGCATGAAGGCGATTACGACATCCCAACGACCGAAAACCTG (SEQ ID NO: 106), 5'-GTGATGGTGATGGTGATGGTAGTACGACATATG (SEQ ID NO: 107) and 5'-TGCCATTCAATTTTCTGCGCTTCAAAAATATCGTTCAGGCCGCTGCCGTGATGGTGATGGTGATGGTAGTACGACATATG (SEQ ID NO: 108). All mutations and constructs were verified by sequencing.

Protein expression and purification

Proteins were expressed in E. coli BL21 DE3 RIPL (Agilent). Colonies were grown overnight at 37°C in LB containing 0.5 mg/mL kanamycin for pET28a vectors or 0.1 mg/mL ampicillin for pET21 vectors. The overnight cultures were diluted 1:100 in LB containing 0.8% glucose and the appropriate antibiotic, grown at 37°C and 200 rpm to an OD600 of 0.5-0.6, and induced with 0.4 mM IPTG at 30°C and 200 rpm for 4 h. Proteins were purified by standard methods on Ni-NTA (Qiagen) and dialyzed three times against TBS (50 mM Tris-HCl pH 8.0, 50 mM NaCl).

For purification of MBPx-SpyCatcher, the buffer was exchanged after elution from Ni-NTA by dialysis into 20 mM Tris-HCl pH 8.0 at 4°C. The protein was loaded onto quaternary high performance (Q-HP) resin (GE Healthcare) and eluted with a 10-column-volume (10 mL) linear gradient of 0-0.15 M NaCl at a flow rate of 1 mL/min. An additional elution step was performed with a linear gradient of 0.15-0.35 M NaCl at 1.5 mL/min, collecting 0.5 mL fractions. Collected fractions were dialyzed into TBS, concentrated using a Vivaspin centrifugal concentrator (5 kDa cut-off; GE Healthcare) and stored at -80°C.

SnoopTag-Affi-SpyTag was dialyzed into 20 mM 2-(N-morpholino)ethanesulfonic acid (MES) pH 5.8 at 4°C and loaded onto sulfopropyl high performance (SP-HP) resin (GE Healthcare). Protein was eluted with a linear gradient of 0.2-0.5 M NaCl, collecting 1 mL fractions. The eluted fractions were concentrated to 1-2 mg/mL using a Vivaspin centrifugal concentrator (5 kDa cut-off; GE Healthcare), dialyzed into TBS pH 8.0 and stored at -80°C.

For purification of SpyCatcher-SnoopCatcher, the buffer was exchanged after elution from Ni-NTA by dialysis into 20 mM Tris-HCl pH 8.0 at 4°C. The protein was loaded onto Q-HP resin and eluted with a linear gradient of 0.2-0.5 M NaCl. Collected fractions were dialyzed into TBS, concentrated using a Vivaspin centrifugal concentrator (5 kDa cut-off) and stored at -80°C.

Purified AviTag-SpyCatcher was biotinylated in PBS pH 7.4 containing 5 mM MgCl2, 1 mM ATP, 380 μM D-biotin and 7 μM GST-BirA for 1 h at 25°C. Further GST-BirA was then added to a final concentration of 14 μM and the reaction was incubated for another hour at 25°C. GST-BirA was removed by incubating the reaction mixture with 50 μL of Hi-Cap Glutathione matrix slurry (Qiagen) at 25°C with end-over-end rotation for 30 min. The resin was spun down at 4,000 g for 1 min, and the supernatant was collected and dialyzed overnight at 4°C into PBS. Complete biotinylation was confirmed by a streptavidin gel-shift assay, performed as described.

SDS-PAGE

SDS-PAGE was performed on polyacrylamide gels of the indicated percentage using an XCell SureLock electrophoresis cell (Life Technologies) at 200 V. Gels were stained with InstantBlue Coomassie stain (Triple Red Ltd.), and bands were quantified by densitometry using a Gel Doc XR imager and Image Lab 3.0 software (Bio-Rad). All running buffers were Tris-glycine, except for Figure 9A, for which Tris-acetate was used to improve resolution of high-Mw products.

Isopeptide bond reconstitution

To assess formation of a covalent bond between SnoopTag and SnoopCatcher, proteins were mixed, each at 10 μM final concentration, in TBS pH 8.0 containing 1.5 M trimethylamine N-oxide (TMAO; Sigma-Aldrich), which acts as a chemical chaperone. Reactions were stopped by adding 6× SDS loading buffer (0.23 M Tris-HCl pH 6.8, 24% v/v glycerol, 120 μM bromophenol blue, 0.23 M SDS). Samples were then heated at 95°C for 5 min in a Bio-Rad C1000 thermal cycler before SDS-PAGE on 16% polyacrylamide gels.
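The text does not state how much 6× SDS loading buffer was added. Assuming it is diluted to 1× in the sample (a common convention, not stated in the source), the volume v to add follows from stock·v = final·(V_sample + v). A minimal sketch, with a hypothetical function name:

```python
def loading_buffer_volume(sample_ul: float, stock_x: float = 6.0, final_x: float = 1.0) -> float:
    """Volume (µL) of concentrated loading buffer that brings the mix to `final_x`.

    Solves stock_x * v = final_x * (sample_ul + v) for v.
    Assumes a 1x final concentration by default (not stated in the source).
    """
    if sample_ul < 0:
        raise ValueError("sample volume must be non-negative")
    if stock_x <= final_x:
        raise ValueError("stock must be more concentrated than the target")
    return final_x * sample_ul / (stock_x - final_x)


# Example: a hypothetical 25 µL reaction needs 5 µL of 6x buffer to reach 1x.
print(loading_buffer_volume(25.0))  # 5.0
```

Under this assumption, the 0.23 M SDS in the 6× stock ends up at about 38 mM in the loaded sample.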

To test orthogonality, 10 μM SnoopTag-MBP was incubated with 10 μM SnoopCatcher or SpyCatcher for 1 h at 25°C in TBS pH 8.0 before SDS-PAGE. Similarly, 10 μM SpyTag-MBP was incubated with 10 μM SnoopCatcher or SpyCatcher as above.

For the other peptide linker pairs, 10 μM RrgATag-MBP or 10 μM PsCsTag-MBP was incubated with 10 μM SnoopCatcher, SpyCatcher, SnoopTag-MBP or SpyTag-MBP for 24 h at 25°C in PBS pH 7.4 before SDS-PAGE.

To evaluate pH dependence, each protein was mixed at 10 μM in succinate-phosphate-glycine buffer (12.5 mM succinic acid, 43.75 mM NaH2PO4, 43.75 mM glycine) at pH values from 4.0 to 9.0 and incubated at 25°C for 15 min. This buffer was chosen because it buffers effectively over a broad pH range.

To determine the effect of temperature, 10 μM SnoopTag-MBP and 10 μM SnoopCatcher were mixed for 15 min at the indicated temperatures in phosphate-buffered saline (PBS; 10 mM Na2HPO4, 137 mM NaCl, 2.7 mM KCl, 1.8 mM KH2PO4) pH 8.0 containing 1.5 M TMAO. PBS was used instead of TBS because the pH of Tris buffers changes substantially with temperature.

To investigate sensitivity to buffer composition, proteins were incubated at 25°C for 15 min in PBS pH 8.0, TBS pH 8.0, or TBS pH 8.0 containing 1% (w/v) Triton X-100, 1% (v/v) Tween 20, 10 mM ethylenediaminetetraacetic acid (EDTA), 10 mM MgCl2 or 10 mM DTT, or in 50 mM Tris pH 8.0 with 1 M NaCl.

Reaction rate was determined by reacting SnoopTag-MBP and SnoopCatcher at the indicated concentrations in TBS pH 8.0 containing 1.5 M TMAO at 25°C for various times. Reactions were stopped with SDS loading buffer as described above, prior to SDS-PAGE. Percent reconstitution was calculated as 100 × the band intensity of the covalent adduct divided by the sum of the band intensities of SnoopTag-MBP, SnoopCatcher and the covalent adduct.
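The percent-reconstitution formula above can be applied to a densitometry time course as in this minimal sketch. The function names and the input band intensities are hypothetical; the value is a ratio of band intensities exactly as defined in the text, not a molar fraction.

```python
from typing import Iterable, List, Tuple


def percent_reconstitution(adduct: float, tag: float, catcher: float) -> float:
    """100 x adduct band / (SnoopTag-MBP band + SnoopCatcher band + adduct band)."""
    for value in (adduct, tag, catcher):
        if value < 0:
            raise ValueError("band intensities must be non-negative")
    total = adduct + tag + catcher
    if total == 0:
        raise ValueError("no signal in any band")
    return 100.0 * adduct / total


def time_course(rows: Iterable[Tuple[float, float, float, float]]) -> List[Tuple[float, float]]:
    """Map (time, adduct, tag, catcher) rows to (time, % reconstitution) pairs."""
    return [(t, percent_reconstitution(a, s, c)) for t, a, s, c in rows]


# Hypothetical densitometry values (arbitrary units), for illustration only.
rows = [(0.5, 30.0, 50.0, 20.0), (5.0, 60.0, 25.0, 15.0)]
print(time_course(rows))  # [(0.5, 30.0), (5.0, 60.0)]
```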

To test reversibility with a competing tag, 10 μM SnoopCatcher was incubated with 15 μM SnoopTag-MBP for 6 h, and SnoopTag-Affi-SpyTag was then added to a final concentration of 130 μM for 16 h, all at 25°C. To test reversibility with ammonia, 10 μM SnoopCatcher was incubated with 10 μM SnoopTag-MBP for 2 h in TBS pH 8.0 containing 1.5 M TMAO, and then TBS pH 8.0 or NH4Cl pH 9.0 (final concentration 1 M) was added for 16 h, all at 25°C.

Mass spectrometry

20 μM SnoopTag-MBP and 20 μM SnoopCatcher were incubated at 25°C for 2 h in PBS pH 7.4. Mass spectrometry was performed using a Micromass LCT time-of-flight electrospray ionization mass spectrometer (Micromass), and the m/z spectrum was converted to a molecular mass profile using a maximum entropy algorithm in V4.00.00 software (Waters). ExPASy ProtParam was used to predict molecular masses from each protein's amino acid sequence, with the N-terminal fMet cleaved, subtracting 17.0 Da for isopeptide bond formation. Non-enzymatic gluconylation, which adds 178 Da, is often observed on His-tagged proteins expressed in E. coli. Similarly, E. coli-expressed proteins may also undergo some degree of acetylation.
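The mass prediction described above reduces to simple arithmetic: the two predicted chain masses (fMet removed), minus 17.0 Da for the isopeptide bond, plus 178 Da per gluconylation. A minimal sketch follows; the function names, the 5 Da matching tolerance and the example input masses are hypothetical, and acetylation is not modelled.

```python
ISOPEPTIDE_LOSS_DA = 17.0  # subtracted for isopeptide bond formation (value from the text)
GLUCONYLATION_DA = 178.0   # added per gluconylation (value from the text)


def predicted_adduct_masses(tag_da: float, catcher_da: float, max_gluconyl: int = 1) -> list:
    """Expected adduct masses carrying 0..max_gluconyl gluconylations.

    `tag_da` and `catcher_da` are predicted masses (e.g. from ProtParam)
    with the N-terminal fMet already removed.
    """
    base = tag_da + catcher_da - ISOPEPTIDE_LOSS_DA
    return [base + n * GLUCONYLATION_DA for n in range(max_gluconyl + 1)]


def closest_assignment(observed_da: float, predicted: list, tol_da: float = 5.0):
    """Index of the prediction nearest `observed_da`, or None if none lies within tol_da."""
    best = min(range(len(predicted)), key=lambda i: abs(predicted[i] - observed_da))
    return best if abs(predicted[best] - observed_da) <= tol_da else None


# Hypothetical input masses for illustration only (not values from this document).
masses = predicted_adduct_masses(45000.0, 12000.0)
print(masses)                               # [56983.0, 57161.0]
print(closest_assignment(57160.0, masses))  # 1
```

The 17.0 Da offset is consistent with loss of ammonia; a different linker chemistry would need a different constant.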

The decamer was concentrated to 15 μM and buffer-exchanged into 200 mM ammonium acetate using an Amicon Ultra 0.5 mL centrifugal filter with a 10 kDa cut-off (Millipore). Measurements were carried out on a first-generation Synapt High Definition Mass Spectrometry quadrupole time-of-flight mass spectrometer (Waters), calibrated using 10 mg/mL caesium iodide in 250 mM ammonium acetate. Aliquots of 2.5 μL of sample were delivered by nano-electrospray ionization via gold-coated capillaries prepared in house. Instrumental parameters were as follows: source pressure 6.0 mbar, capillary voltage 1.20 kV, cone voltage 150 V, trap energy 30 V, transfer energy 10 V, bias voltage 5 V, and trap pressure 0.0163 mbar. Mass spectra were smoothed and peak-centered, and masses were assigned using MassLynx v4.1 (Waters).
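Masses above were assigned in MassLynx, whose internal algorithm is not described here. The underlying arithmetic for a charge-state series can be sketched as follows, assuming positive-mode protonated ions [M + zH]z+ (native spectra can also show other adducts). For adjacent peaks p1 (charge z) and p2 (charge z + 1), M = z(p1 - H) = (z + 1)(p2 - H), so z = (p2 - H)/(p1 - p2). Function names are hypothetical.

```python
PROTON_DA = 1.007276  # proton mass in Da


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M of an [M + zH]z+ ion observed at `mz`."""
    if z <= 0:
        raise ValueError("charge must be a positive integer")
    return z * (mz - PROTON_DA)


def charge_from_adjacent(mz_z: float, mz_z_plus_1: float) -> int:
    """Charge z of the peak at `mz_z`, given its neighbour at charge z + 1.

    From M = z(p1 - H) = (z + 1)(p2 - H) it follows that z = (p2 - H) / (p1 - p2).
    """
    if mz_z <= mz_z_plus_1:
        raise ValueError("the lower-charge peak must have the larger m/z")
    return round((mz_z_plus_1 - PROTON_DA) / (mz_z - mz_z_plus_1))


# Synthetic check: peaks of a hypothetical 10,000 Da species at charges 10 and 11.
p10 = 10000.0 / 10 + PROTON_DA
p11 = 10000.0 / 11 + PROTON_DA
z = charge_from_adjacent(p10, p11)
print(z, round(neutral_mass(p10, z), 3))  # 10 10000.0
```

Averaging the masses computed from every peak in the series gives a more robust estimate than any single peak.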

Solid-phase synthesis of fusion proteins

40 μL of amylose resin slurry (NEB) was applied to a 1 mL Poly-Prep column (Bio-Rad), rinsed with 1 mL Milli-Q water and equilibrated with 1 mL TBS pH 8.0.

320 pmol tandem MBPx-SpyCatcher in TBS pH 8.0 in a final volume of 80 μL was added to the resin and incubated at 25°C for 1 h with 700 rpm shaking on a ThermoMixer comfort (Eppendorf). Unreacted protein was removed from the column by gravity flow and the resin was washed with 1 mL Wash Buffer (50 mM Tris-HCl pH 8.0 with 500 mM NaCl). 3 nmol SnoopTag-Affi-SpyTag in TBS pH 8.0 in a final volume of 80 μL was added to the resin and incubated at 25°C for 1 h with 700 rpm shaking. Unreacted SnoopTag-Affi-SpyTag was removed by gravity flow and the resin was washed with 1 mL Wash Buffer. 4 nmol SpyCatcher-SnoopCatcher in TBS pH 8.0 with 1.5 M TMAO was added to the resin and incubated at 25°C for 2 h with 700 rpm shaking. Unreacted SpyCatcher-SnoopCatcher was removed by gravity flow and the resin was washed with 1 mL Wash Buffer. Longer chains were produced by sequential addition of SnoopTag-Affi-SpyTag and SpyCatcher-SnoopCatcher under the conditions described above. After the resin was washed, chains were eluted by adding 40 μL TBS pH 8.0 containing 50 mM D-maltose (Sigma) and incubating at 25°C for 10 min with 700 rpm shaking. Chains were collected by centrifuging the column in a 1.5 mL microcentrifuge tube for 10 s at 17,000 g. Chains containing SnoopTag-mEGFP-SpyTag and SnoopTag-SpyTag-Affi3 were synthesized in exactly the same way.
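The amounts above imply the working concentrations and molar excesses of each building block. The sketch below lays out the addition schedule and computes them. The number of cycles is a hypothetical parameter, since the text does not state how many additions give a particular chain length. Excess is expressed relative to the 320 pmol of MBPx-SpyCatcher loaded, because the amount actually retained on the resin is not reported. The volume of the SpyCatcher-SnoopCatcher step is not stated, so no concentration is computed for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdditionStep:
    reagent: str
    nmol: float
    volume_ul: Optional[float]  # None where the text gives no volume
    minutes: int
    tmao: bool = False


def concentration_um(nmol: float, volume_ul: float) -> float:
    """nmol per µL equals mM, so multiply by 1000 to obtain µM."""
    return nmol / volume_ul * 1000.0


def build_schedule(n_cycles: int) -> List[AdditionStep]:
    """Capture step, then `n_cycles` rounds of tag addition followed by catcher addition.

    Each step is followed by a 1 mL Wash Buffer rinse in the text (not modelled here).
    """
    steps = [AdditionStep("MBPx-SpyCatcher", 0.32, 80.0, 60)]
    for _ in range(n_cycles):
        steps.append(AdditionStep("SnoopTag-Affi-SpyTag", 3.0, 80.0, 60))
        steps.append(AdditionStep("SpyCatcher-SnoopCatcher", 4.0, None, 120, tmao=True))
    return steps


loaded_nmol = 0.32  # 320 pmol MBPx-SpyCatcher applied to the resin
for step in build_schedule(n_cycles=2):  # hypothetical cycle count
    conc = (f"{concentration_um(step.nmol, step.volume_ul):.1f} uM"
            if step.volume_ul is not None else "volume not stated")
    print(f"{step.reagent}: {step.nmol / loaded_nmol:.1f}x of input, {conc}, {step.minutes} min")
```

This gives 4 μM for the capture step (matching the 4 μM biotinylated-SpyCatcher used in the avidin-based variant), 37.5 μM for SnoopTag-Affi-SpyTag (about 9.4-fold over input), and a 12.5-fold excess of SpyCatcher-SnoopCatcher.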

For SDS-PAGE analysis after each step, samples were eluted as described above, mixed with 6× SDS loading buffer and heated at 95°C for 5 min before SDS-PAGE.

For biotinylated-SpyCatcher-based assembly, 40 μL of monomeric avidin resin slurry (Thermo Scientific) was applied to a 1 mL Poly-Prep column, then rinsed and equilibrated as above. 4 μM biotinylated-SpyCatcher in TBS pH 8.0 in a final volume of 80 μL was added to the resin and incubated at 25°C for 1 h with 700 rpm shaking. Unreacted biotinylated-SpyCatcher was removed from the column by gravity flow, the resin was washed with 1 mL Wash Buffer, and SnoopTag-Affi-SpyTag and SpyCatcher-SnoopCatcher were added sequentially as described above. After the resin was washed, chains were eluted by applying 40 μL of 1 mM D-biotin in TBS pH 8.0 to the column and incubating at 25°C for 4 h with 700 rpm shaking. Chains were collected as described above and analyzed by SDS-PAGE on 16% and 8% Tris-glycine gels.

Gel filtration chromatography

Decamer chains were analyzed by gel filtration chromatography on a Superdex 200 GL 10/300 column (24 mL bed volume; GE Healthcare). The column was calibrated using gel filtration standards (Bio-Rad): thyroglobulin 670 kDa, IgG 158 kDa, ovalbumin 44 kDa, myoglobin 17 kDa and vitamin B12 1.35 kDa. Samples were eluted at 0.4 mL/min in 50 mM Tris-HCl pH 8.0 with 500 mM NaCl, and the absorbance profile was measured at 280 nm on an ÄKTA purifier 10 (GE Healthcare).
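A column calibrated with these standards is commonly used by fitting log10(MW) against elution volume and reading off an apparent molecular weight for a sample peak. The text does not describe how the calibration was applied, so the following is a minimal sketch of that common approach with hypothetical function names. The elution volumes of the standards are not reported here and must be supplied from the measured chromatogram. Vitamin B12 mainly marks the total volume and may be left out of the fit. A Kav-based fit, which additionally needs the void and total volumes, is a common alternative.

```python
import math
from typing import Dict, Tuple

# Standard masses (kDa) as listed in the text.
STANDARDS_KDA: Dict[str, float] = {
    "thyroglobulin": 670.0,
    "IgG": 158.0,
    "ovalbumin": 44.0,
    "myoglobin": 17.0,
    "vitamin B12": 1.35,
}


def fit_calibration(elution_ml: Dict[str, float]) -> Tuple[float, float]:
    """Least-squares fit of log10(MW/kDa) = intercept + slope * Ve over the supplied standards."""
    names = [n for n in elution_ml if n in STANDARDS_KDA]
    if len(names) < 2:
        raise ValueError("need at least two known standards")
    xs = [elution_ml[n] for n in names]
    ys = [math.log10(STANDARDS_KDA[n]) for n in names]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must elute at different volumes")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def apparent_mw_kda(ve_ml: float, slope: float, intercept: float) -> float:
    """Apparent molecular weight (kDa) of a species eluting at `ve_ml`."""
    return 10 ** (intercept + slope * ve_ml)
```

Usage: pass a dictionary mapping standard names to their measured peak elution volumes (mL) to `fit_calibration`, then call `apparent_mw_kda` with the elution volume of the decamer peak.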

Stability testing of chains

For temperature-stability testing, decamer chains at 3 μM in 150 mM ammonium acetate pH 8.0, in a final volume of 30 μL, were incubated at 25, 37, 50, 60 or 70°C for 3 min and cooled to 10°C at 3°C/s in a Bio-Rad C1000 thermal cycler. Samples were then centrifuged at 17,000 g at 4°C for 30 min to remove aggregates, and the supernatant was analyzed by SDS-PAGE with Coomassie staining on an 8% Tris-glycine gel. For time-dependent stability testing, decamer chains at 3 μM in 150 mM ammonium acetate pH 8.0 containing 0.1% sodium azide, 1 mM phenylmethylsulfonyl fluoride (PMSF), 1 mM EDTA and EDTA-free protease inhibitor cocktail (Roche), in a final volume of 30 μL, were incubated at 25°C for 1 or 4 days. At each time point, samples were centrifuged at 17,000 g at 4°C for 30 min and the supernatant was analyzed by SDS-PAGE with Coomassie staining on an 8% Tris-glycine gel.