

Title:
METHOD OF DETECTING PROTEIN-PROTEIN INTERACTIONS
Document Type and Number:
WIPO Patent Application WO/2021/025623
Kind Code:
A1
Abstract:
The present invention relates to a method of detecting an interaction between a first protein and a second protein, the method comprising: a) providing a first fusion protein comprising the first protein connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising the second protein connected to a processivity enhancing protein; c) contacting the first protein of the first fusion protein to the second protein of the second fusion protein; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) detecting the interaction between the first protein and the second protein, wherein the interaction between the first protein and the second protein restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of the interaction between the first protein and the second protein.

Inventors:
GHADESSY FARID JOHN (SG)
Application Number:
PCT/SG2020/050458
Publication Date:
February 11, 2021
Filing Date:
August 05, 2020
Assignee:
AGENCY SCIENCE TECH & RES (SG)
International Classes:
C12Q1/686; C12Q1/25
Domestic Patent References:
WO2002022869A2 2002-03-21
Other References:
PU J. ET AL.: "RNA Polymerase Tags To Monitor Multidimensional Protein-Protein Interactions Reveal Pharmacological Engagement of Bcl-2 Proteins", J AM CHEM SOC, vol. 139, no. 34, 2 August 2017 (2017-08-02), pages 11964 - 11972, XP055791727, [retrieved on 20201102], DOI: 10.1021/JACS.7B06152
MEHLA J. ET AL.: "A Comparison of Two-Hybrid Approaches for Detecting Protein-Protein Interactions", METHODS ENZYMOL, vol. 586, 5 January 2017 (2017-01-05), pages 333 - 358, [retrieved on 20201102], DOI: 10.1016/BS.MIE.2016.10.020
Attorney, Agent or Firm:
SPRUSON & FERGUSON (ASIA) PTE LTD (SG)
Claims:
Claims

1. A method of detecting an interaction between a first protein and a second protein, the method comprising: a) providing a first fusion protein comprising the first protein connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising the second protein connected to a processivity enhancing protein; c) contacting the first protein of the first fusion protein to the second protein of the second fusion protein; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) detecting the interaction between the first protein and the second protein, wherein the interaction between the first protein and the second protein restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of the interaction between the first protein and the second protein.

2. The method of claim 1, wherein the method further comprises providing a protein ligase; wherein the contacting of the first protein of the first fusion protein to the second protein of the second fusion protein in step c) comprises the ligation of the first protein to the second protein by the protein ligase; and wherein the amplification of the one or more template nucleic acid in step e) is indicative of protein ligase activity.

3. The method of claim 1, wherein the first protein of the first fusion protein or the second protein of the second fusion protein is a protein ligase; and wherein the amplification of the one or more template nucleic acid in step e) is indicative of protein ligase activity.

4. The method of any one of claims 1 to 3, wherein the interaction between the first protein and second protein is a covalent interaction or a non-covalent interaction.

5. The method of any one of claims 1 to 4, wherein the nucleic acid amplifying enzyme is Stoffel fragment of Taq DNA polymerase.

6. The method of any one of claims 1 to 5, wherein the processivity enhancing protein is Sso7d or topoisomerase V HhH.

7. The method of any one of claims 1 to 6, wherein the first protein and the second protein are respectively SpyCatcher and SpyTag, or SpyTag and SpyCatcher.

8. The method of any one of claims 1 to 7, wherein the method is carried out in vitro.

9. The method of any one of claims 1 to 8, wherein steps a) to c) of the method are carried out in a bacterial cell.

10. The method of any one of claims 1 to 9, wherein the one or more template nucleic acid comprises the polynucleotide sequence encoding the first protein, or the polynucleotide sequence encoding the second protein, or both.

11. The method of any one of claims 1 to 10, wherein the first fusion protein and the second fusion protein are expressed in a cell comprising a protein ligase.

12. The method of any one of claims 2 to 11, wherein the one or more template nucleic acid comprises the polynucleotide sequence encoding the protein ligase.

13. The method of any one of claims 9-12, wherein the cell is located in a water-in-oil droplet.

14. The method of claim 13, wherein the water-in-oil droplet further comprises a buffer, salt, deoxynucleotides (dNTPs) and primers.

15. The method of claim 13 or 14, wherein the droplets have a diameter of between 1 µm and 20 µm.

16. The method of any one of claims 1 to 15, wherein the nucleic acid amplification reaction is polymerase chain reaction (PCR).

17. The method of any one of claims 1 to 16, wherein the template nucleic acid is template DNA.

18. The method of any one of claims 1 to 17, wherein the condition of step d) is a salt concentration of at least 50 mM, or accelerated cycling conditions, or both.

19. The method of claim 18, wherein the condition is a salt concentration of between 50 mM and 300 mM.

20. The method of claim 18 or 19, wherein the salt is potassium chloride.

21. The method of claim 18, wherein the accelerated cycling conditions is a PCR annealing time that is shorter than the minimum annealing time required for the nucleic acid amplifying enzyme to amplify the one or more template nucleic acid, or a PCR extension time that is shorter than the minimum extension time required for the nucleic acid amplifying enzyme to amplify the one or more template nucleic acid, or both.

22. The method of any one of claims 1, 4-6, 8-10 and 13-21, wherein the first protein and the second protein are respectively the large peptide fragment (NB) and small peptide fragment (NS) of split NanoLuc luciferase, or the small peptide fragment (NS) and the large peptide fragment (NB) of split NanoLuc luciferase.

23. A method of determining activity of a protein ligase, the method comprising: a) providing a first fusion protein comprising a first protein connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising a second protein connected to a processivity enhancing protein; c) ligating the first protein of the first fusion protein to the second protein of the second fusion protein with the protein ligase; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) determining the activity of the protein ligase, wherein the interaction between the first protein and the second protein restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of protein ligase activity.

24. A bacterial cell comprising: a) a polynucleotide sequence encoding a first fusion protein comprising a first protein connected to a nucleic acid amplifying enzyme; b) a polynucleotide sequence encoding a second fusion protein comprising a second protein connected to a processivity enhancing protein.

25. The bacterial cell of claim 24, further comprising a polynucleotide sequence encoding a protein ligase.

26. A method of selecting one or more variants of an interacting protein pair, wherein the one or more variants have a higher affinity interaction relative to the wild type protein pair, the method comprising: a) providing a first fusion protein comprising a first protein variant of the interacting protein pair connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising a second protein variant of the interacting protein pair connected to a processivity enhancing protein; c) contacting the first protein variant of the first fusion protein to the second protein variant of the second fusion protein; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) detecting the interaction between the first protein variant and the second protein variant, wherein the interaction between the first protein variant and the second protein variant restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of the interaction between the first protein variant and the second protein variant; f) measuring the copy number of the one or more amplified template nucleic acids resulting from the interaction of the variant protein pair and comparing this to the copy number of the one or more amplified template nucleic acid resulting from the interaction of the wild type protein pair; g) selecting the one or more variant proteins with a copy number of amplified template nucleic acid higher than the wild type protein pair, wherein a higher copy number of amplified template nucleic acid is indicative of a higher affinity interaction.

Description:
METHOD OF DETECTING PROTEIN-PROTEIN INTERACTIONS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of priority of Singapore application No. 10201907377P, filed 8 August 2019, the contents of which are hereby incorporated by reference in their entirety for all purposes.

FIELD OF THE INVENTION

[0002] The invention is in the field of protein biochemistry. In particular, the invention relates to a method of detecting protein-protein interactions. The invention also relates to a method of detecting protein ligase activity.

BACKGROUND OF THE INVENTION

[0003] Cellular biology is governed by a complex network of protein-protein interactions. In many cases, the principal interacting component of one protein in a binary complex presents as a short, often alpha helical region that retains binding affinity in the form of a discrete peptide. This knowledge can guide development of both small molecule and peptidic antagonists towards therapeutic targets and protein biosensors. Protein engineering can further derive novel peptide-protein pairs by splitting compliant proteins into interacting components. This approach has yielded robust tools for biosensing, imaging and targeted protein conjugation.

[0004] Protein-protein interactions can also be facilitated by protein ligases. Protein ligases have recently emerged as important tools in the field of chemical biology. In particular, peptide ligases, which facilitate covalent linkage between the N- and C-termini of substrate proteins and peptides harbouring appropriate recognition sequences, have been used extensively in the development of new protein architectures and site-specific tagging. Protein ligases therefore allow a myriad of applications including the assembly of protein domains and the production of protein conjugates such as antibody-drug conjugates.

[0005] The conventional method of detecting protein ligase activity involves incubating substrate proteins in the presence of the enzyme and visualizing the product after separation by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE). However, this method is not readily amenable to high-throughput screening of protein ligase variants. Other methods of detecting protein ligase activity, including methods which utilise mass spectrometry, fluorescence resonance energy transfer (FRET), enzyme-linked immunosorbent assay (ELISA) or fluorescence-activated cell sorting (FACS) in the measurement of protein ligase activity, have been disclosed. However, these methods rely on bespoke custom reagents and/or expensive instrumentation and are only able to increase throughput to a limited extent.

[0006] Methodologies that disclose new protein-protein interactions, modulate affinities of known protein-protein interactions, and select for novel peptide/protein binders are therefore important tools for proteomics, drug discovery, target validation and biotechnology applications. To this end, a suite of “N-hybrid” platforms including the prototypical yeast 2-hybrid (Y2H) selection methodology have been developed and successfully implemented over the years. These couple in vivo protein-protein interactions to co-localisation of two protein domains required for signal generation. However, as conventional 2-hybrid platforms utilize mesophilic reporter proteins, these platforms are inadequate when used in the co-selection of thermostability of proteins, which is a desired feature in many downstream applications of evolved proteins.

[0007] There is therefore a need to identify methods of detecting protein-protein interactions that could be applied to high-throughput screening approaches for screening novel proteins such as protein ligases.

SUMMARY

[0008] In one aspect, there is provided a method of detecting an interaction between a first protein and a second protein, the method comprising: a) providing a first fusion protein comprising the first protein connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising the second protein connected to a processivity enhancing protein; c) contacting the first protein of the first fusion protein to the second protein of the second fusion protein; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) detecting the interaction between the first protein and the second protein, wherein the interaction between the first protein and the second protein restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of the interaction between the first protein and the second protein.

[0009] In another aspect, there is provided a method of determining activity of a protein ligase, the method comprising: a) providing a first fusion protein comprising a first protein connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising a second protein connected to a processivity enhancing protein; c) ligating the first protein of the first fusion protein to the second protein of the second fusion protein with the protein ligase; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) determining the activity of the protein ligase, wherein the interaction between the first protein and the second protein restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of protein ligase activity.

[0010] In one aspect, there is provided a bacterial cell comprising: a) a polynucleotide sequence encoding a first fusion protein comprising a first protein connected to a nucleic acid amplifying enzyme; b) a polynucleotide sequence encoding a second fusion protein comprising a second protein connected to a processivity enhancing protein.

[0011] In one aspect, there is provided a method of selecting one or more variants of an interacting protein pair, wherein the one or more variants have a higher affinity interaction relative to the wild type proteins, the method comprising: a) providing a first fusion protein comprising a first protein variant of the interacting protein pair connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising a second protein variant of the interacting protein pair connected to a processivity enhancing protein; c) contacting the first protein variant of the first fusion protein to the second protein variant of the second fusion protein; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) detecting the interaction between the first protein variant and the second protein variant, wherein the interaction between the first protein variant and the second protein variant restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of the interaction between the first protein variant and the second protein variant; f) measuring the copy number of the one or more amplified template nucleic acids resulting from the interaction of the variant protein pair and comparing this to the copy number of the one or more amplified template nucleic acid resulting from the interaction of the wild type protein pair; g) selecting the one or more variant proteins with a copy number of amplified template nucleic acid higher than the wild type protein pair, wherein a higher copy number of amplified template nucleic acid is indicative of a higher affinity interaction.
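
Steps f) and g) of the selection method above reduce to a comparison of amplicon copy numbers against the wild type pair. The following is a minimal sketch of that comparison only. The class name, function name, variant names and copy numbers are hypothetical illustrations and are not taken from the application, and how the copy numbers are measured is left outside the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VariantReadout:
    """Hypothetical record: one variant pair and its measured amplicon copy number."""
    name: str
    copy_number: float


def select_variants(readouts: List[VariantReadout],
                    wild_type_copy_number: float) -> List[VariantReadout]:
    """Keep variants whose copy number is higher than that of the wild type pair.

    A higher copy number is treated as indicative of a higher affinity
    interaction, as in step g).
    """
    return [r for r in readouts if r.copy_number > wild_type_copy_number]


if __name__ == "__main__":
    # Illustrative numbers only, not experimental data.
    readouts = [
        VariantReadout("variant_A", 1200.0),
        VariantReadout("variant_B", 300.0),
        VariantReadout("variant_C", 5000.0),
    ]
    selected = select_variants(readouts, wild_type_copy_number=1000.0)
    print([r.name for r in selected])
```

The comparison is strict, so a variant that merely matches the wild type copy number is not selected.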

DEFINITIONS

[0012] As used herein, the terms “peptide”, “polypeptide” and “protein” refer to a polymeric form of amino acids. Proteins and polypeptides are understood to comprise more amino acids than peptides. Proteins and polypeptides typically comprise at least about 35 amino acids while peptides typically comprise from 2 to about 35 amino acids. Proteins may comprise 1 or more polypeptides and the individual polypeptide chains may be covalently or non-covalently linked. A portion of the protein or polypeptide may have or be capable of acquiring a three-dimensional arrangement by forming secondary, tertiary or quaternary structures. Peptides, polypeptides and proteins may be naturally-occurring or non-naturally occurring. Proteins may include moieties other than amino acids (e.g. may be glycoproteins) and may be otherwise processed or modified. In the context of this application, the terms “peptide”, “polypeptide” and “protein” may be used interchangeably.

[0013] As used herein, the term “interacting protein pair” refers to 2 proteins that are capable of interacting covalently or non-covalently via protein-protein interactions. Non-covalent interactions may include hydrogen bonds, ionic bonds, van der Waals interactions and hydrophobic bonds. Examples of interacting protein pairs include SpyCatcher-SpyTag and the split NanoLuc luciferase system. A high affinity interaction refers to a protein-protein interaction that has a dissociation constant (Kd) of less than 200 nM.
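
The 200 nM threshold in the definition above can be expressed as a simple check. The sketch below also includes the standard single-site equilibrium binding relation, fraction bound = [partner] / (Kd + [partner]), which is textbook background and is not stated in the application; it assumes the free partner concentration is known and in excess. Function and constant names are hypothetical.

```python
HIGH_AFFINITY_KD_NM = 200.0  # threshold from the definition of a high affinity interaction


def is_high_affinity(kd_nm: float) -> bool:
    """True if the dissociation constant is below the 200 nM threshold."""
    return kd_nm < HIGH_AFFINITY_KD_NM


def fraction_bound(free_partner_nm: float, kd_nm: float) -> float:
    """Equilibrium fraction of a protein bound by its partner (single-site model).

    Assumption: the partner is in excess, so its free concentration is
    approximately its total concentration.
    """
    if free_partner_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_partner_nm / (kd_nm + free_partner_nm)


if __name__ == "__main__":
    print(is_high_affinity(50.0))
    print(fraction_bound(free_partner_nm=200.0, kd_nm=200.0))
```

When the free partner concentration equals Kd, half of the protein is bound, which is why a lower Kd corresponds to a tighter interaction at a given concentration.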

[0014] As used herein, the term “ligation” refers to covalent linkage of two proteins mediated by an enzyme. The ligation of two proteins involves the formation of a peptide or isopeptide bond between the two proteins. An isopeptide bond is a type of peptide bond that forms between the carboxyl group of one amino acid and the amino group of another, where at least one of these joining groups is part of the side chain of one of these amino acids. The enzyme mediating the ligation of proteins is known as “protein ligase” or “peptide ligase”. The term “ligation” may also refer to native chemical ligation, which involves a chemoselective reaction between a first peptide having a C-terminal α-carboxythioester moiety and a second peptide having an N-terminal cysteine residue. A thiol exchange reaction yields an initial thioester-linked intermediate, which spontaneously rearranges to give a native amide bond at the ligation site while regenerating the cysteine side chain thiol.

[0015] The terms “protein ligase” or “peptide ligase” as used in the context of this application refer to enzymes which catalyse the formation of peptide and isopeptide bonds with site and substrate specificity. In the context of this application, the terms “protein ligase” and “peptide ligase” are used interchangeably. Protein ligases catalyse protein ligase reactions and are used in a myriad of applications including the assembly of protein domains, the production of therapeutic protein conjugates (e.g. antibody-drug conjugates) and the production of fusion proteins.

[0016] As used herein, the term “fusion protein” refers to a protein comprising a polypeptide or a fragment thereof linked to another polypeptide. A fusion protein can be made recombinantly by constructing a nucleic acid sequence encoding a polypeptide or fragment thereof in frame with a nucleic acid sequence encoding a different protein, and then expressing the fusion protein. Alternatively, a fusion protein can be generated by connecting a polypeptide or fragment thereof with another protein via covalent bonds or non-covalent interactions. For example, fusion proteins can be generated by chemical methods such as cross-linking/native chemical ligation or can be generated by ligation by a protein ligase.

[0017] As used herein, the term “variant” in the context of a protein refers to a protein comprising a mutation of one or more amino acids as compared to a reference protein. In the context of a protein, the term “mutation” refers to a modification to the amino acid sequence resulting in a change in the amino acid sequence of the protein compared to a reference amino acid sequence. The mutation may involve one or more amino acid residues and may be selected from the group consisting of substitution, insertion, deletion, truncation and combinations thereof. A library of protein variants may be generated from mutation of the protein. The library may be randomised or semi-randomised. A randomised library is generated by random mutation of the amino acid residues of a protein. A semi-randomised library is generated by random mutation of specific amino acid residues of a protein. Protein variants may be functional or non-functional. A functional protein variant is one that retains the activity of the wild type protein or has increased activity compared to the wild type protein. A non-functional protein variant is one that has decreased activity or complete loss of activity compared to the wild type protein. The activity of a functional protein variant may refer to the affinity with which the functional protein variant interacts with another protein.

[0018] As used herein, the term “inhibit” in the context of inhibiting the activity of an enzyme refers to a partial decrease or a complete loss in the activity of the enzyme when compared to the enzyme’s original activity. The term “restore” in the context of restoring the activity of an enzyme refers to the reversal of a decrease in the activity of the enzyme. The term “restore” may refer to a partial recovery or a full recovery of the enzyme activity after a loss of activity. The conditions that inhibit or restore the activity of an enzyme may include but are not limited to temperature, pH, salt concentration and reaction durations.

[0019] As used herein, the term “polymerase chain reaction” (PCR) refers to a process where a single, or a few copies of a DNA molecule, can be amplified by several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence. PCR is based on three discrete steps: denaturation of a DNA template, annealing of a primer to the denatured template DNA, and extension of the primer with a polymerase to create a nucleic acid complementary to the template DNA. A round of denaturation, annealing and extension is referred to as a “cycle”. It will be generally known in the art that in a PCR reaction, the annealing and extension steps may be combined. The conditions under which PCR cycles are performed are well established in the art. Thermostable polymerases such as Taq DNA polymerase and its variants are commonly used. The versatility of PCR has led to a large number of variants of PCR such as real-time/quantitative PCR, reverse transcription PCR, nested PCR, assembly PCR and multiplex PCR. PCR can be used in various research applications including genotyping, cloning, sequencing, mutagenesis and microarrays. Amplification may be clonal or non-clonal. Clonal amplification refers to an amplification from a single template DNA molecule only. Non-clonal amplification refers to an amplification from a mixture of template DNA molecules comprising more than one template DNA molecule. The term “amplicon” refers to a piece of DNA or RNA that is the source and/or product of amplification or replication events. The term “amplicon” may be used to describe an amplification product. Amplicons can be formed from PCR reactions. The term “copy number” in the context of an amplicon refers to the number of copies of the amplicon. The copy number of an amplicon can be used to quantify the amplification reaction.
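
The amplification by "several orders of magnitude" described above follows from the standard exponential model of PCR, in which each cycle multiplies the template count by (1 + efficiency). The sketch below uses that textbook model, which is an assumption here and is not taken from the application; it ignores the plateau phase, and the function name and default values are hypothetical.

```python
def amplified_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised amplicon copy number after a number of PCR cycles.

    efficiency is the per-cycle fraction of templates that are copied:
    1.0 means perfect doubling, 0.0 means no amplification.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # A single template with perfect doubling over 20 cycles exceeds a million copies.
    print(amplified_copies(1, 20))
    # A polymerase that cannot extend under the chosen condition yields no gain.
    print(amplified_copies(1, 20, efficiency=0.0))
```

In this model, a condition under which the enzyme is unable to amplify corresponds to an efficiency near zero, so the copy number stays near its starting value.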

[0020] The term “not able to amplify” or “unable to amplify” in the context of a template nucleic acid refers to a nucleic acid amplifying enzyme not being able to amplify the full length of the template nucleic acid or amplicon. There are various factors that could result in a nucleic acid amplifying enzyme, such as a polymerase, being unable to amplify a template nucleic acid. These factors include conditions which inhibit the function of the enzyme, such as temperature, pH, salt concentration, as well as reaction parameters that limit the function of the enzyme, such as cycling conditions in a polymerase chain reaction (PCR). Cycling conditions may refer to PCR annealing times, or PCR extension times, or both. It will generally be understood that the PCR cycling conditions for optimal polymerase function, such as annealing time and extension time, will depend on the length of the template nucleic acid. For example, for a shorter amplicon, the polymerase would require a shorter annealing time and shorter extension time to carry out amplification of the full length of the template nucleic acid. For longer amplicons, the polymerase would require a longer annealing time and longer extension time to carry out amplification of the full length of the template nucleic acid. An example of a reaction parameter that limits the function of the nucleic acid amplifying enzyme is accelerated cycling conditions. The term “accelerated cycling conditions” refers to PCR cycling times that are shorter than the minimum cycling times required for the polymerase to amplify the full length of the template nucleic acid. For example, accelerated cycling conditions may refer to a PCR annealing time that is shorter than the minimum annealing time required for the nucleic acid amplifying enzyme to amplify the full length of the one or more template nucleic acid, or a PCR extension time that is shorter than the minimum extension time required for the nucleic acid amplifying enzyme to amplify the full length of the one or more template nucleic acid, or both. It will be understood by a person skilled in the art that a polymerase is unable to generate full-length amplicons under the accelerated cycling conditions.
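
The definition of accelerated cycling conditions above can be sketched as a comparison of the programmed times against the minimum times. The estimate of minimum extension time as template length divided by an extension rate is a simplifying assumption introduced here for illustration; the application does not specify how the minimum times are determined, and all names and numbers below are hypothetical.

```python
def min_extension_time_s(template_length_bp: int, extension_rate_bp_per_s: float) -> float:
    """Rough minimum extension time, assuming a constant extension rate (an assumption)."""
    if extension_rate_bp_per_s <= 0:
        raise ValueError("extension rate must be positive")
    return template_length_bp / extension_rate_bp_per_s


def is_accelerated(annealing_s: float, extension_s: float,
                   min_annealing_s: float, min_extension_s: float) -> bool:
    """True if the annealing time, the extension time, or both fall below the minimum."""
    return annealing_s < min_annealing_s or extension_s < min_extension_s


if __name__ == "__main__":
    # Hypothetical values: a 1000 bp template and a 50 bp/s extension rate.
    needed = min_extension_time_s(1000, 50.0)
    print(needed)
    print(is_accelerated(annealing_s=30.0, extension_s=10.0,
                         min_annealing_s=15.0, min_extension_s=needed))
```

Because longer templates raise the minimum extension time, the same cycling programme can be permissive for a short amplicon and accelerated for a long one.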

[0021] The term “Stoffel” or “Stoffel fragment” as used herein refers to a protein that makes up amino acid residues 293 to 832 of full length Taq polymerase and is also produced as a recombinant protein in Escherichia coli. Stoffel DNA polymerase is around 2-fold more thermostable than Taq DNA polymerase and works over a broader range of magnesium ion concentrations.

[0022] The term “Compartmentalised Self Replication” (CSR), for the purposes of this application, refers to a technique originally developed to select for thermostable nucleic acid polymerase variants with improved functionality. CSR is a technique based on the self replication of polymerase genes by the encoded polymerases within discrete compartments. In particular, CSR entails clonal encapsulation of bacteria expressing a library of polymerase variants into the aqueous compartments of a heat-stable emulsion. Subsequent thermal cycling permits amplification of a polymerase gene only by the particular enzyme it encodes, thereby quantitatively linking activity of the constituent library members to the copy number of their respective genes. Genotype-phenotype linkage is therefore maintained.
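
The quantitative link between activity and gene copy number described above can be illustrated with a toy enrichment model. The sketch assumes each clone sits alone in its own compartment and that its gene is amplified by a clone-specific fold factor; pooling the compartments then gives each clone's share of the recovered genes. This is a simplified illustration, not the procedure of the application, and the clone names and fold values are hypothetical.

```python
from typing import Dict, Tuple


def post_selection_fractions(library: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Share of each clone's gene in the pooled output after one round.

    library maps a clone name to (input gene copies, fold amplification
    achieved inside that clone's own compartment).
    """
    output = {name: copies * fold for name, (copies, fold) in library.items()}
    total = sum(output.values())
    if total <= 0:
        raise ValueError("library must contain at least one gene copy")
    return {name: count / total for name, count in output.items()}


if __name__ == "__main__":
    # Hypothetical library: equal input, very different amplification.
    library = {
        "active_clone": (1.0, 999.0),
        "inactive_clone": (1.0, 1.0),
    }
    for name, share in post_selection_fractions(library).items():
        print(name, round(share, 4))
```

Compartmentalisation is what keeps the fold factor tied to the clone that produced the enzyme; in a bulk reaction every gene would be amplified by the most active enzymes and the linkage would be lost.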

[0023] The term “Compartmentalised 2-Hybrid Replication” (CH2R) as used herein refers to a modified version of CSR which permits selection of other proteins or enzymes by coupling the activities of the proteins to polymerase read-out. For example, CH2R may be used to couple the interaction between an interacting protein pair to polymerase read-out. CH2R involves expressing candidate proteins as respective fusions to a polymerase and a processivity clamp. Protein-protein interaction between the interacting protein pair brings the processivity clamp into close proximity with the polymerase, allowing DNA amplification in conditions that are otherwise prohibitive to the function of the polymerase. CH2R may also be used in the co-evolution, or the selection of variants of, interacting protein pairs.

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:

[0025] Fig. 1 shows an overview of the peptide ligase detection paradigm. Peptides/protein substrates (X and Y) are expressed as fusion proteins with a processivity enhancing protein (Sso7d) and Stoffel fragment. In the absence of any ligation, the Stoffel fragment is unable to PCR-amplify template DNA in the presence of high salt concentrations (top panel). Ligation of the two fusion proteins results in tethering of Sso7d to Stoffel, and rescue of activity in the presence of high salt (middle panel). A fusion protein comprising Sso7d-Stoffel hybrid will PCR-amplify template DNA under high salt concentrations (lower panel).

[0026] Fig. 2 shows that the coupling of Sso7d and Stoffel fragment mediated by SpyCatcher-SpyTag interaction facilitates PCR in high salt buffer conditions. In (A), indicated proteins were (co)-expressed in E. coli and cells directly used in PCR reactions with increasing KCl concentrations (0, 100, 200, 300 mM). S7d-S: Sso7d-Stoffel fusion; S7d-SC: Sso7d-SpyCatcher fusion; ST-S: SpyTag-Stoffel fusion; S: Stoffel. (B) shows SDS-PAGE analysis of uninduced/induced E. coli cell lysates (co)-expressing indicated proteins. Highlighted bands indicate 1: Sso7d-Stoffel fusion (S7d-S); 2: Stoffel fragment (S); 3: Sso7d-SpyCatcher fusion (S7d-SC); 4: Sso7d-SpyCatcher fusion conjugated to SpyTag-Stoffel fusion (ST-S); 5: SpyTag-Stoffel fusion (ST-S); 6: Sso7d (S7d). In (C), indicated proteins (recombinantly expressed and purified) were (co)-incubated and an aliquot used in PCR with both normal and high salt buffer (+KCl). In (D), the same reaction mixes were analysed by SDS-PAGE (right). Highlighted band (*) indicates S7D-SC-ST-S fusion protein.

[0027] Fig. 3 shows Compartmentalised Self Replication (CSR) used in the selection of peptide ligase activity. A library of plasmids encoding mutagenized SpyCatcher variants (green) fused to Sso7d (blue) and SpyTag (magenta) fused to Stoffel (red) is transformed into E. coli. Post-expression, the cells are compartmentalized in the presence of dNTPs, oligonucleotide primers, PCR buffer and high salt. Thermal cycling lyses cells, and PCR amplification of genes encoding active SpyCatcher variants (single asterisk) can occur in permissive compartments (e.g. top right) where fusion of Sso7d-SpyCatcher and SpyTag-Stoffel has occurred. Post-PCR, the emulsion is broken, and genes encoding active SpyCatcher variants are harvested and analysed/subjected to further rounds of selection. The method can also be used to screen for improved SpyTag variants in addition to co-selection of novel orthogonal SpyCatcher-SpyTag variants.

[0028] Fig. 4 shows that novel SpyTag variants selected using the method of the invention are functional. Indicated proteins were (co)-expressed in E. coli and cell lysates analysed by SDS-PAGE. Lanes 4-7 show that the indicated novel SpyTag variants (residues shown replace XXXX in sequence GAHXXXXDAYKP) can form a covalent bond with SpyCatcher (indicated by top arrow). Ligation of endogenous SpyTag to SpyCatcher is shown in lane 2. Note that SpyCatcher and SpyTag components are respectively fused to Sso7d and Stoffel fragment.

[0029] Fig. 5 shows that processivity-clamp fusion enhances polymerase activity in high salt buffer conditions. (A) shows PCR amplicon yields at indicated KCl concentrations in reaction buffer using E. coli cells expressing either Stoffel fragment (S) or a Topoisomerase V HhH processivity domain - Stoffel fusion protein (H-S). (B) shows PCR amplicon yields at indicated KCl concentrations in reaction buffer using E. coli cells expressing Sso7d - Stoffel fusion protein with induction at 37°C for 3 hours (lane 1), 37°C overnight (lane 2) and room temperature overnight (lane 3).

[0030] Fig. 6 shows that the coupling of Sso7d and Stoffel fragment mediated by reconstitution of split NanoLuc luciferase facilitates PCR in high salt buffer conditions. (A) shows the structure of NanoLuc highlighting the large (light grey) and small (darker grey) fragments of split NanoLuc. Peptide sequences of the endogenous (NS6) and engineered small fragments (NS1-NS5), along with affinity constants, are indicated to the right. (B) shows PCR amplification in absence (top panel) and presence (lower panel) of 100 mM KCl by indicated co-expressed proteins. S7d-SC: Sso7d-SpyCatcher fusion; ST-S: SpyTag-Stoffel fusion; S: Stoffel. S7d-NB: Sso7d-NanoLuc large fragment fusion; NS(1-6)-S: NanoLuc small fragment-Stoffel fusion.

[0031] Fig. 7 shows a Compartmentalised 2-Hybrid Replication (C2HR) selection paradigm. (1) Genes encoding a protein (NB) and interacting peptide (NS1) are co-expressed in E. coli from a single plasmid as respective fusions to Sso7d (S7d) and Stoffel fragment of Taq polymerase. Gene and encoded protein are denoted in the same colour/pattern. (2) Cells are clonally segregated into discrete aqueous compartments comprising PCR reagents and high KCl buffer. (3) Thermal cycling lyses cells, and gene amplification mediated by specific primers (arrows) is only efficient in compartments hosting an interacting protein-peptide pair (top bubble). Deletion of the peptide gene from the expression plasmid (lower bubble) results in poor amplification because the Sso7d and Stoffel components are not co-localised. Gene amplification is correspondingly poor in cells co-expressing weak or non-interacting protein-peptide pairs when libraries are interrogated. (4) Amplicons are harvested for analysis and/or further rounds of selection.

[0032] Fig. 8 shows a C2HR model selection. E. coli cells co-expressing either Sso7d-NB + NS1-S or Sso7d-NB + S were mixed at different ratios prior to emulsification and CSR in high KCl buffer (left panel) or direct PCR in high salt buffer (open control). Upper arrow indicates amplicon derived from cells expressing Sso7d-NB + NS1-S. Lower arrow indicates amplicon derived from cells expressing Sso7d-NB + S. These bands correspond to the large and small amplicons depicted in Fig. 7.

[0033] Fig. 9 shows C2HR selection of functional SpyTag and related variants. (A) shows consensus sequence logos derived from naive (n=20) and library 1 selectants (top 20 enriched). (B) shows consensus sequence logos derived from 500 most abundant sequences selected from libraries 1 and 2.

[0034] Fig. 10 shows that SpyTag variants selected by C2HR retain function. In (A), Sso7d-SpyCatcher (S7d-SC) fusion protein was co-expressed with Stoffel fragment alone (S) or Stoffel fragment fusions with wild-type SpyTag (ST) and indicated selectants. The core “IVMVD” motif has been omitted for clarity (replaced with vertical bar). Highlighted bands represent 1: S7d-SC-ST-S fusion protein; 2: Stoffel fragment; 3: S7d-SC. All selectants yield correct size fusion protein corresponding to wild-type SpyTag control (band 1). In (B), the same expressor cells highlighted in (A) were used directly in PCR reactions ± KCl (100 mM). As with wild-type SpyTag, all SpyTag variants enabled PCR in high-salt buffer.

[0035] Fig. 11 shows the directed co-evolution of SpyCatcher and SpyTag. (A) shows that the two underlined phenylalanine residues in SpyCatcher and the underlined isoleucine in SpyTag were randomized prior to selection. The corresponding positions of these residues in the binary complex are shown on the right. (B) shows consensus sequence logos for naive and library selectants after one or two rounds of C2HR. The frequencies of endogenous (FF/I) and other enriched motifs are indicated.

[0036] Fig. 12 shows a pull-down assay of Sso7D-SpyCatcher protein (S7D-SC) by endogenous (ST) and selected (STL2) biotinylated SpyTag peptides. Covalently bound Sso7D-SpyCatcher protein is indicated by an arrow in the SDS-PAGE gel. Streptavidin beads with no peptides bound were used as a control. The lower protein band corresponds to streptavidin monomer co-eluted from the beads.

[0037] Fig. 13 shows that coupling of Sso7d and Stoffel fragment mediated by reconstitution of split NanoLuc luciferase facilitates PCR with accelerated cycling conditions. This figure shows that high affinity interactions can be detected under accelerated cycling conditions. (A) shows PCR amplification of a 1545 bp amplicon using shortened annealing and extension times (15 and 10 seconds respectively) and (B) shows PCR amplification using longer annealing and extension times (30 and 120 seconds respectively) by indicated co-expressed proteins. Under conditions of accelerated cycling, successful amplification only occurs for cells co-expressing high affinity interactant pairs (NS1/3/4/5-S co-expressed with S7d-NB and ST-S co-expressed with S7d-SC). S7d-SC: Sso7d-SpyCatcher fusion; ST-S: SpyTag-Stoffel fusion; S: Stoffel. S7d-NB: Sso7d-NanoLuc large fragment fusion; NS(1-6)-S: NanoLuc small fragment-Stoffel fusion. PetF2 and PetRC primer pair was used (500 nM final concentration) along with 10 ng pET-SBPp53delta plasmid template.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

[0038] In a first aspect, the present invention refers to a method of detecting an interaction between a first protein and a second protein, the method comprising: a) providing a first fusion protein comprising the first protein connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising the second protein connected to a processivity enhancing protein; c) contacting the first protein of the first fusion protein to the second protein of the second fusion protein; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) detecting the interaction between the first protein and the second protein, wherein the interaction between the first protein and the second protein restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of the interaction between the first protein and the second protein.

[0039] The first protein of the first fusion protein and the second protein of the second fusion protein may interact in the presence or absence of a protein ligase. In one embodiment, the method as described herein further comprises providing a protein ligase; wherein the contacting of the first protein of the first fusion protein to the second protein of the second fusion protein in step c) comprises the ligation of the first protein to the second protein by the protein ligase; and wherein the amplification of the one or more template nucleic acid in step e) is indicative of protein ligase activity.

[0040] In one embodiment, the first protein of the first fusion protein or the second protein of the second fusion protein is a protein ligase; wherein the contacting of the first protein of the first fusion protein to the second protein of the second fusion protein in step c) comprises the ligation of the first protein to the second protein by the protein ligase; and wherein the amplification of the one or more template nucleic acid in step e) is indicative of protein ligase activity.

[0041] In one aspect, the present invention refers to a method of determining activity of a protein ligase, the method comprising: a) providing a first fusion protein comprising a first protein connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising a second protein connected to a processivity enhancing protein; c) ligating the first protein of the first fusion protein to the second protein of the second fusion protein with the protein ligase; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) determining the activity of the protein ligase, wherein the interaction between the first protein and the second protein restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of protein ligase activity.

[0042] The method as described herein may be used to determine the activity of a protein ligase. Examples of protein ligases include SpyCatcher, Sortase, butelase-1 (CtAEP1) and OaAEP1b. In one example, in the presence of the protein ligase, the first protein of the first fusion protein and the second protein of the second fusion protein are ligated. The ligation of the first protein and the second protein restores the activity of the nucleic acid amplifying enzyme and facilitates amplification of the one or more template nucleic acid as described herein. Therefore, amplification of the one or more template nucleic acid is indicative of protein ligase activity. In other words, the activity of the protein ligase is coupled to the amplification of the template nucleic acid. The protein ligase activity may be quantified in relative terms based on the copy number of the amplicons.

[0043] In one embodiment, the first protein and the second protein may be naturally occurring or mutated. In another embodiment, the protein ligase may be naturally occurring or mutated. The mutation may be selected from the group consisting of substitution, insertion, deletion, truncation and combinations thereof. The mutation may be introduced by targeted mutation, random mutation, or combinations thereof. Libraries of protein variants may be generated by mutation of the proteins. Protein variants may be functional or non-functional.

[0044] In some embodiments, the first fusion protein may comprise the first protein connected to the N-terminus or the C-terminus of the nucleic acid amplifying enzyme. In some embodiments, the second fusion protein may comprise the second protein connected to the N-terminus or the C-terminus of the processivity enhancing protein.

[0045] In one embodiment, the interaction between the first protein and second protein is a covalent interaction or a non-covalent interaction. Non-covalent interactions may include hydrogen bonds, ionic bonds, van der Waals interactions and hydrophobic bonds. A covalent interaction may result from a ligation reaction.

[0046] The nucleic acid amplifying enzyme may be a DNA polymerase or an RNA polymerase. In one embodiment, the nucleic acid amplifying enzyme is a Stoffel fragment of a Taq DNA polymerase. In another embodiment, the nucleic acid amplifying enzyme is a domain functionally equivalent to the Stoffel fragment from a family A or family B polymerase, such as polB from Thermococcus kodakarensis or polA-DNA polymerase I from Thermus thermophilus.

[0047] The processivity of a nucleic acid amplifying enzyme refers to the number of nucleotides that the nucleic acid amplifying enzyme can incorporate into the nucleic acid during a single template-binding event, before dissociating from the nucleic acid template. The processivity of a nucleic acid amplifying enzyme can be increased by the binding of a processivity enhancing protein. In one embodiment, the processivity enhancing protein is Sso7d or topoisomerase V HhH. In another embodiment, the processivity enhancing protein is Sso7d.

[0048] In one embodiment, the first protein and the second protein are respectively SpyCatcher and SpyTag, or SpyTag and SpyCatcher.

[0049] In one embodiment, the first protein and the second protein are respectively the large peptide fragment (NB) and small peptide fragment (NS) of split NanoLuc luciferase, or the small peptide fragment (NS) and the large peptide fragment (NB) of split NanoLuc luciferase.

[0050] In one embodiment, the method of detecting an interaction between a first protein and a second protein or the method of determining activity of a protein ligase as described herein is carried out in vitro.

[0051] In one embodiment, step a) to step c) of the method as described herein is carried out in a cell. The cell may be a mammalian cell, a yeast cell or a bacterial cell. In one embodiment, the cell is a bacterial cell. The bacterial cell may be an Escherichia coli cell.

[0052] In one embodiment, the first fusion protein and the second fusion protein are expressed in a cell. It will generally be understood that the polynucleotide sequences encoding the fusion proteins are transformed into the cells and that the expression of the fusion proteins is induced in the cell. In one embodiment, the first fusion protein and the second fusion protein are expressed in a cell comprising a protein ligase. The protein ligase may be native to the cell or the polynucleotide sequence encoding the protein ligase may be transformed into the cell and expressed in the cell.

[0053] In one embodiment, the one or more template nucleic acid as described herein is template DNA. The template nucleic acid may be a predetermined template nucleic acid. In one embodiment, the one or more template nucleic acid as described herein comprises the polynucleotide sequence encoding the first protein, or the polynucleotide sequence encoding the second protein, or both. In one embodiment, the one or more template nucleic acid as described herein comprises the polynucleotide sequence encoding the protein ligase.

[0054] Where step a) to step c) of the method as described herein is carried out in a cell, it is possible to maintain genotype-phenotype linkage when each cell contains only a single variant of the first protein and when the template nucleic acid is the gene encoding the variant of the first protein. In this example, when the variant of the first protein is a functional variant that is able to interact with the second protein, clonal amplification of the template nucleic acid occurs during CSR. Since the template nucleic acid includes the gene encoding the variant of the first protein, a functional variant of the first protein generates amplification of its own gene. The affinity of interaction of the variant of the first protein with the second protein is therefore quantitatively linked to the copy number of the variant of the first protein after CSR. It will be understood by a person skilled in the art that the method as described herein may also be used in a situation where each cell contains a single variant of the second protein and when the template nucleic acid is the gene encoding the variant of the second protein.

[0055] It will be understood by a person skilled in the art that the method as described herein may also be used in a situation where each cell contains a single variant of a protein ligase and when the template nucleic acid includes the gene encoding the variant of the protein ligase. In one embodiment, the method can be used to screen a library of variants of a protein ligase by coupling the activity of the protein ligase variant to the amplification of the one or more template nucleic acid. Clonal amplification of the template nucleic acid (e.g. during CSR) is indicative of ligation of a first protein of an interacting protein pair to the second protein of the interacting protein pair. To maintain genotype-phenotype linkage, step a) to step c) of the method may be carried out in a cell and the template nucleic acid in each cell is the gene encoding the said protein ligase variant. In this example, ligation of the first protein to the second protein results in amplification of the gene encoding said protein ligase variant during CSR, thereby quantitatively linking activity of the protein ligase variants to the copy number of their respective genes. The activity of the protein ligase variants can therefore be compared with the activity of wild type or other variants of protein ligases. Therefore, the method as described herein may be used for the screening of protein ligases.

[0056] In one embodiment, the cell as described herein is located in a water-in-oil droplet.

[0057] In one embodiment, the water-in-oil droplet comprising the cell further comprises a buffer, salt, deoxynucleotides (dNTPs) and primers within the same droplet.

[0058] In one embodiment, the water-in-oil droplet has a diameter of between 1 µm and 20 µm. In another embodiment, the water-in-oil droplet has a diameter of between 1 µm and 10 µm.
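To give a sense of scale for the micrometre-range droplet diameters recited above, the volume of an aqueous compartment can be estimated from its diameter. The following sketch is illustrative only and does not form part of the described method; it assumes ideal spherical droplets, and the function name is hypothetical.

```python
import math


def droplet_volume_fl(diameter_um: float) -> float:
    """Volume of an ideal spherical droplet in femtolitres.

    One cubic micrometre equals one femtolitre, so V = pi * d**3 / 6
    with d in micrometres gives the volume directly in fL.
    """
    if diameter_um <= 0:
        raise ValueError("diameter must be positive")
    return math.pi * diameter_um ** 3 / 6.0


# Diameters spanning the recited ranges (1 to 10 um and 1 to 20 um).
for d in (1, 10, 20):
    print(f"{d} um diameter: {droplet_volume_fl(d):.1f} fL")
```

Because volume scales with the cube of the diameter, doubling the diameter from 10 µm to 20 µm increases the compartment volume eightfold.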

[0059] In one embodiment, the nucleic acid amplification reaction is polymerase chain reaction (PCR).

[0060] In one embodiment, the condition of step d) of the method as described herein is a salt concentration of at least 50 mM, or accelerated cycling conditions, or both.

[0061] In one embodiment, the condition is a salt concentration of between 50 mM and 300 mM. The condition may be a salt concentration of about 50 mM, about 100 mM, about 150 mM, about 200 mM or about 300 mM. In another embodiment, the salt concentration is about 100 mM.
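By way of a worked illustration of how such salt concentrations might be set up in practice, the volume of a concentrated salt stock needed to reach a target final concentration follows from a simple dilution calculation. The sketch below is not taken from the specification: the 50 µL reaction volume, the 1 M stock concentration and the function name are all assumptions chosen for the example.

```python
def salt_stock_volume_ul(target_mm, reaction_ul, stock_mm, base_mm=0.0):
    """Volume of salt stock (uL) needed to reach target_mm in the final reaction.

    base_mm is the salt concentration (mM, expressed at final volume) already
    contributed by the reaction buffer; it defaults to zero.
    """
    if reaction_ul <= 0 or stock_mm <= 0:
        raise ValueError("reaction volume and stock concentration must be positive")
    if not (base_mm <= target_mm < stock_mm):
        raise ValueError("require base_mm <= target_mm < stock_mm")
    return (target_mm - base_mm) * reaction_ul / stock_mm


# Assumed values: 50 uL reaction, 1 M (1000 mM) stock.
for target in (50, 100, 150, 200, 300):
    volume = salt_stock_volume_ul(target, reaction_ul=50, stock_mm=1000)
    print(f"{target} mM final: add {volume:.1f} uL of stock")
```

The optional `base_mm` argument allows for a reaction buffer that already contains some of the salt, in which case only the difference needs to be added from the stock.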

[0062] In one embodiment, the salt is potassium chloride.

[0063] Cycling conditions may refer to PCR annealing times, or PCR extension times, or both. In one embodiment, the accelerated cycling conditions are a PCR annealing time that is shorter than the minimum annealing time required for the nucleic acid amplifying enzyme to amplify the full length of the one or more template nucleic acid, or a PCR extension time that is shorter than the minimum extension time required for the nucleic acid amplifying enzyme to amplify the full length of the one or more template nucleic acid, or both. It will be understood that the nucleic acid amplifying enzyme is unable to amplify the full length of the one or more template nucleic acid under the accelerated cycling conditions.

[0064] The detection of the interaction between the first protein and the second protein in step e) of the method as described herein may be based on the detection and quantification of amplicons to determine the strength of the interaction relative to the interaction of a reference pair of interacting proteins. The reference may be an interacting protein pair, such as a wild type interacting protein pair, which serves as a positive control. The reference may be a non-interacting protein pair which serves as a negative control.

[0065] The determination of protein ligase activity in step e) of the method as described herein may be based on the detection and quantification of PCR amplicons to determine the activity of a protein ligase relative to a reference activity of a protein ligase. The reference may be a functional protein ligase, such as a wild type protein ligase, which serves as a positive control. The reference may be any protein ligase with known activity. The reference may be a non-functional protein ligase which serves as a negative control.

[0066] In one aspect, the present invention refers to a bacterial cell comprising: a) a polynucleotide sequence encoding a first fusion protein comprising a first protein connected to a nucleic acid amplifying enzyme; b) a polynucleotide sequence encoding a second fusion protein comprising a second protein connected to a processivity enhancing protein.

[0067] In one embodiment, the bacterial cell as described herein further comprises a polynucleotide sequence encoding a protein ligase.

[0068] In one aspect, the present invention refers to a method of selecting one or more variants of an interacting protein pair, wherein the one or more variants have a higher affinity interaction relative to the wild type protein pair, the method comprising: a) providing a first fusion protein comprising a first protein variant of the interacting protein pair connected to a nucleic acid amplifying enzyme; b) providing a second fusion protein comprising a second protein variant of the interacting protein pair connected to a processivity enhancing protein; c) contacting the first protein variant of the first fusion protein to the second protein variant of the second fusion protein; d) performing a nucleic acid amplification reaction of one or more template nucleic acid under a condition in which the nucleic acid amplifying enzyme is not able to amplify the one or more template nucleic acid; e) detecting the interaction between the first protein variant and the second protein variant, wherein the interaction between the first protein variant and the second protein variant restores the activity of the nucleic acid amplifying enzyme and wherein the amplification of the one or more template nucleic acid under said condition is indicative of the interaction between the first protein variant and the second protein variant; f) measuring the copy number of the one or more amplified template nucleic acids resulting from the interaction of the variant protein pair and comparing this to the copy number of the one or more amplified template nucleic acid resulting from the interaction of the wild type protein pair; g) selecting the one or more variant proteins with a copy number of amplified template nucleic acid higher than the wild type protein pair, wherein a higher copy number of amplified template nucleic acid is indicative of a higher affinity interaction.
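Steps f) and g) above amount to comparing the amplicon copy number obtained for each variant pair with that obtained for the wild type pair and retaining the variants that exceed it. A minimal sketch of that comparison is given below; it is illustrative only, the function name is hypothetical, and the copy numbers are invented placeholder values rather than experimental data.

```python
from typing import Dict, List, Tuple


def select_variants(
    copy_numbers: Dict[str, float], wild_type_copies: float
) -> List[Tuple[str, float]]:
    """Return (variant, fold change over wild type) for each variant whose
    amplicon copy number exceeds the wild type value, highest first."""
    if wild_type_copies <= 0:
        raise ValueError("wild type copy number must be positive")
    selected = [
        (name, copies / wild_type_copies)
        for name, copies in copy_numbers.items()
        if copies > wild_type_copies
    ]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Placeholder copy numbers, for illustration only.
example = {"variant_A": 4.0e6, "variant_B": 5.0e5, "variant_C": 2.0e6}
for name, fold in select_variants(example, wild_type_copies=1.0e6):
    print(f"{name}: {fold:.1f}-fold over wild type")
```

Expressing the result as a fold change relative to the wild type pair reflects the relative, rather than absolute, nature of the read-out described herein.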

[0069] In one example, the method as described herein can be used to screen a library of variants of a first protein of an interacting protein pair by coupling the interacting affinity of the first protein variant to nucleic acid amplification readout during clonal amplification. Clonal amplification may be facilitated by Compartmentalised Self Replication (CSR) which entails clonal encapsulation of bacterial cells into the aqueous compartments of a heat-stable emulsion. Clonal amplification may also be facilitated by depositing one bacterial colony in one reaction well or tube. Screening of clonal amplification may be conducted using robotic or manual methods. Amplification of the template nucleic acid is indicative of an interaction between the variant of the first protein with the second protein of the interacting protein pair. In embodiments where step a) to step c) of the method is carried out in a cell and the template nucleic acid is the gene encoding the said variant of the first protein, a high affinity interaction between the first protein variant and the second protein results in amplification of the gene encoding said first protein variant, thereby quantitatively linking the interacting affinity of the first protein variants to the copy number of their respective genes. Upon comparing the copy numbers of the amplified template nucleic acid against the wild type protein pair, variants of the first protein with high interacting affinity can be selected.

[0070] In another example, the method as described herein can be used to select variants of both members of an interacting protein pair, or to co-evolve an interacting protein pair, by coupling the interacting affinity of the first protein variant and second protein variant to nucleic acid amplification readout. Clonal amplification of the template nucleic acid is indicative of an interaction between the variant of the first protein with the variant of the second protein of the interacting protein pair. Clonal amplification may be facilitated by Compartmentalised Self Replication (CSR) which entails clonal encapsulation of bacterial cells into the aqueous compartments of a heat-stable emulsion. Clonal amplification may also be facilitated by depositing one bacterial colony in one reaction well or tube. Screening of clonal amplification may be conducted using robotic or manual methods. In embodiments where step a) to step c) of the method is carried out in a cell and the template nucleic acid comprises the gene encoding said variant of the first protein and the gene encoding said variant of the second protein, a high affinity interaction between the first protein variant and the second protein variant results in amplification of the gene encoding said first protein variant and the gene encoding said second protein variant, thereby quantitatively linking the interacting affinity of the protein variants to the copy number of their respective genes. Upon comparing the copy numbers of the amplified template nucleic acid against the wild type protein pair, variants of the first and second proteins with high interacting affinity can be selected.

[0071] In one embodiment, the interacting protein pair is SpyCatcher-SpyTag. In another embodiment, the interacting protein pair is the large peptide fragment (NB) and small peptide fragment (NS) of split NanoLuc luciferase.

[0072] The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms "comprising", "including", "containing", etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.

[0073] The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.

[0074] Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.

EXPERIMENTAL SECTION

[0075] Non-limiting examples of the invention and comparative examples will be further described in greater detail by reference to specific Examples, which should not be construed as in any way limiting the scope of the invention.

[0076] Materials and methods

[0077] Materials

[0078] Oligonucleotides and genes were from Integrated DNA Technologies; restriction enzymes, T4 polynucleotide kinase and T4 DNA ligase were from NEB; Pfu DNA polymerase (Agilent Technologies) and Taq DNA polymerase (Bioline) were used for DNA amplification. Nucleic acid purification kits were from Qiagen and chemicals from Sigma. Electrocompetent TG1 and BL21 cells were obtained from Lucigen.

[0079] Table 1. Oligonucleotides

[0080] Vector construction

[0081] Taq pET22b(+) was generated via amplification of the Taq polymerase gene with primers TAQNdel-F and TAQXholR, followed by infusion into pET22b(+) via NdeI and XhoI sites. Inverse PCR was carried out on Taq pET22b(+) with primers pET-ATG-R and Stoff-F, followed by intramolecular ligation to produce Stoffel pET22b(+), which encodes only the Stoffel fragment. HhH-Stoffel pET22b(+), which encodes the Topoisomerase V HhH processivity domain - Stoffel fusion, was produced via amplification of the processivity domain gene using primers HhH-Stoff-F and HhH-GGG-Stoff-R, followed by infusion into Stoffel pET22b(+) via the NdeI site. Inverse PCR and intramolecular ligation were carried out on Taq pET22b(+) with primers StoffAPWP-F and KALEtoLPETGGG-R to generate Exo-Stoffel pET22b(+). Sso7d-Stoffel pET22b(+), encoding the Sso7d - Stoffel fusion, was constructed via infusion cloning of the Sso7d gene with primers SS07DINF-F and SS07DINF-R into an inverse PCR product from amplification of Exo-Stoffel pET22b(+) generated using primers pET-ATG-R and EXOLPETV2-F. Stoffel pETDuet-1 was constructed by subcloning the Stoffel fragment from Stoffel pET22b(+) with primers Ndel-Stoff-F and Xhol-Stoff-R into the second multiple cloning site (MCS) of pETDuet-1 via NdeI and XhoI sites. Sso7d was introduced into the first MCS of pETDuet-1 using primers SS07D-BAM-pETDuet-l -F2 and SS07D-Sort-SalI- pETDuet-1 -R on Stoffel pET22b(+). This produces Sso7d Stoffel pETDuet-1. Inverse PCR was carried out on Sso7d Stoffel pETDuet-1 with primers DUETHHH-F and DUETHHH-R, followed by infusion of the SpyCatcher gene (residues 22 to 101) using primers SPYCDUET-F2 and SPYCHHH-R2. This gives Sso7d-SpyCatcher Stoffel pETDuet-1. Complementary primer pair SPYTINF-TOP and SPYTINF-B were annealed to form an oligo duplex which was cloned into Sso7d-SpyCatcher Stoffel pETDuet-1 via the NdeI site to yield Sso7d-SpyCatcher SpyTag-Stoffel pETDuet-1.

[0082] The large fragment of split NanoLuc luciferase was amplified using primers NanoBigDUET-F and NanoBigHHH-R and the product cloned into Sso7d Stoffel pETDuet-1 to create Sso7d-NB Stoffel pETDuet-1. A series of complementary primer pairs were annealed to form oligo duplexes which were cloned into this vector to give Sso7d-NB NS1/2/3/4/5/6-Stoffel pETDuet-1 for test selection.

[0083] Lib 1 and Lib 2 were created by amplifying Sso7d-SpyCatcher Stoffel pETDuet-1 with primers FPETGG-Sall-F and SPYTR6.2, and FPETGG-Sall-F and SPYTR5.2 respectively. Lib 3 was created by overlap extension PCR of two PCR products - the first with primers FPETGG-Sall-F and SpyC-NNKl-R and the second with primers SpyC-NNK2-F and SpyT-NNK-R on the same vector. All resultant library PCR products were then cloned into Sso7d-SpyCatcher Stoffel pETDuet-1 via SalI and SpeI.

[0084] Constructs for expression and purification of Sso7d-SpyCatcher and SpyTag-Stoffel fusion proteins were created by amplifying Sso7d-SpyCatcher SpyTag-Stoffel pETDuet-1 using primer pairs INF-Pet22-JW-SSO-SPYC-F and INF-Pet22-JW-SSO-SPYC-R, and INF-Pet22-JW-SPYT-F and INF-Pet22-JW-SPYT-R, and infusing the respective PCR products into pET22b(+).

[0085] Polymerase activity assays

[0086] Constructs expressing Stoffel, HhH-Stoffel fusion protein and Sso7d-Stoffel were transformed into E. coli BL21 (DE3) competent cells. Cells expressing HhH-Stoffel were induced for 3 hours at 37°C with 1 mM IPTG. Cells expressing Sso7d-Stoffel were induced with 1 mM IPTG at different temperatures and durations as described in the text. 1 mL of culture was then harvested by centrifugation, washed with PBS twice and resuspended in 50 µL of PBS. 2 µL of cell suspension was used for PCR (95°C for 5 mins, then 25 cycles of 95°C for 5 s, 55°C for 30 s, and 72°C for 1 min) with 10 ng pET22b(+) and 0.5 µM of each primer petF2 and pET-ATG-R. PCR reactions involving HhH-Stoffel were carried out with PCR reaction buffer containing 30 mM Tris pH 8.0 and 0.2% Tween 20, while reactions for Sso7d-Stoffel were carried out with PCR reaction buffer comprising 10 mM Tris-HCl pH 8.3, 10 mM KCl and 1.5 mM MgCl2. Differentiation of polymerase activity was carried out by adjustment of salt concentration using KCl. Successful polymerase activity yields a 198 bp amplicon using the expression plasmid as template. Subsequently selected SpyTag variants were screened and compared using the same method. Activity assays for PCR of the larger 1545 bp fragment directly using expressor cells were carried out essentially as above using 10 ng pET22-SBPp53delta plasmid template and 0.5 µM of each primer petF2 and petRC. Normal cycling conditions were 95°C for 5 mins (1 cycle), followed by 95°C for 5 s, 60°C for 30 s, and 72°C for 120 s (25 cycles). Accelerated cycling parameters were 95°C for 5 mins (1 cycle), followed by 95°C for 5 s, 60°C for 15 s, and 72°C for 10 s (35 cycles).
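The difference between the normal and accelerated cycling programs can be made concrete with a short calculation. The following Python sketch is illustrative only: the class and function names are hypothetical, temperature ramp times are ignored, and the "required extension rate" is simply amplicon length divided by extension hold time, which is an assumed simplification rather than a measured property of the enzyme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """Hold times (seconds) for one PCR program; ramp times are ignored."""
    initial_denature_s: int
    denature_s: int
    anneal_s: int
    extend_s: int
    cycles: int

    def hold_time_s(self) -> int:
        # Sum of programmed holds: one initial denaturation plus all cycles.
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.initial_denature_s + self.cycles * per_cycle


# Values taken from the cycling conditions stated for the 1545 bp amplicon.
NORMAL = CyclingProgram(300, 5, 30, 120, 25)
ACCELERATED = CyclingProgram(300, 5, 15, 10, 35)


def required_extension_rate(amplicon_bp: int, extend_s: int) -> float:
    """Minimum average rate (bp/s) to copy the amplicon within one extension hold."""
    return amplicon_bp / extend_s


if __name__ == "__main__":
    for name, prog in (("normal", NORMAL), ("accelerated", ACCELERATED)):
        rate = required_extension_rate(1545, prog.extend_s)
        print(f"{name}: {prog.hold_time_s()} s of holds, >= {rate:.1f} bp/s needed")
```

Under these assumptions, the accelerated program demands a roughly twelve-fold higher average extension rate per cycle than the normal program, which is one way of seeing why it is the more stringent condition.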

[0087] Compartmentalised Self Replication (CSR) selections

[0088] 200 µL of aqueous phase consisting of Stoffel buffer (10 mM Tris-HCl pH 8.3, 10 mM KCl and 1.5 mM MgCl2), 0.25 mM dNTPs, 0.5 µM of each primer, 100 mM KCl, 1 mg/mL BSA and 1 × 10^7 E. coli BL21 (DE3) expressor cells was manually dispersed (1 drop every 5 seconds) into 400 µL of oil phase [4.5% (v/v) Span 80, 0.45% (v/v) Tween 80 and 0.05% (v/v) Triton X-100 in mineral oil] with constant stirring at 1250 rpm. Stirring was continued for 9 mins before thermocycling. CSR was carried out using different primer pairs (BIOOLS79-duetMCS2-F and Spytag-SpeI-R2 for test selection, BIO-OLS79-LPETGG-SalI-F and Spytag-SpeI-R2 for selection of Lib 1 and 3, BIO-OLS79-LPETGG-SalI-F and NESTSpyTag-SpeI-R3 for selection of Lib 2) at 95°C for 5 mins, followed by 10 cycles of 95°C for 5 s, 55°C for 30 s and 72°C for 1 min. The aqueous phase was extracted twice with 900 µL ether and treated with 10 µL exonuclease and 2 µL DpnI overnight at 37°C. The aqueous phase was then incubated with 25 µL streptavidin M280 beads (Invitrogen) for 1 hour with rotation at room temperature before 3 washes with 200 µL of PBSBT [PBS + 0.1% (w/v) BSA, 0.1% (v/v) Tween 20] and 3 washes with 200 µL of PBS. The beads were then resuspended with PCR reactions containing different primer pairs (NESTOLS79-duetMCS2-F and Spytag-SpeI-R2 for test selection, NESTOLS79-LPETGG-F and Spytag-SpeI-R2 for selection of Lib 1 and 3, NESTOLS79-LPETGG-F and NESTSpyTag-SpeI-R4 for selection of Lib 2) and subjected to a rescue PCR (95°C for 5 mins followed by 20 cycles of 95°C for 5 s, 55°C for 20 s and 72°C for 1 min).

[0089] Sequence analysis

[0090] Amplicons generated by Compartmentalised 2-Hybrid Replication (C2HR) were adapted by PCR using primers SpyT-F19 and SpyT-R19 (Lib 1) and SpyT-F17 and SpyT-R17 (Lib 2) and sequencing carried out using the NextSeq Illumina platform (DNA Link, Korea). Data extraction/analysis was carried out using Python.
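The scripts used for data extraction are not reproduced in this specification. Purely by way of illustration, a minimal Python sketch of one plausible extraction step is given below: it pulls the four randomised residues out of translated reads that conform to the Lib 1 design GAHXXXXDAYKP and ranks the motifs by abundance. The function name, the regular expression and the tie-breaking rule are assumptions made for this sketch, and the reads are assumed to be already translated into protein sequence.

```python
import re
from collections import Counter

# Constant flanks of the Lib 1 design GAHXXXXDAYKP; the four captured
# residues are the randomised positions. (Assumed pattern for illustration.)
LIB1_PATTERN = re.compile(r"GAH([A-Z]{4})DAYKP")


def rank_motifs(protein_reads):
    """Return (rank, motif, count) tuples, most abundant motif first.

    Reads that do not contain the expected flanks are skipped. Ties in
    count are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter()
    for read in protein_reads:
        match = LIB1_PATTERN.search(read)
        if match:
            counts[match.group(1)] += 1
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, motif, n) for rank, (motif, n) in enumerate(ordered, start=1)]
```

With such a ranking, the position of any motif of interest in the abundance list can be looked up directly, for example by scanning the returned tuples for the wild-type motif.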

[0091] Protein expression and purification

[0092] The Sso7D-SpyC construct was cloned with an N-terminal 6xHis-tag and transformed into Escherichia coli BL21(DE3) (Invitrogen) competent cells. These were grown in LB medium at 37°C and induced at OD600 nm ~ 0.6 at 25°C with 1 mM IPTG and incubated overnight. Cells were then harvested by centrifugation, sonicated and heated at 65°C for 15 min before clarification by centrifugation. The clarified cell lysate was applied to a His-TrapFF column (GE Healthcare) and purified using a gradient elution. The fractions containing the protein were pooled and exchanged into buffer A solution (20 mM Tris, pH 8, 1 mM DTT) using a HiPrep 26/10 desalting column, and loaded onto an anion-exchange Resource Q 1 mL column (GE Healthcare) pre-equilibrated in buffer A. The column was then washed in 60 column volumes of buffer A and bound protein was eluted with a linear gradient in buffer comprising 1 M NaCl, 20 mM Tris pH 8, and 1 mM DTT over 30 column volumes. Protein purity as assessed by SDS-PAGE was ~95%, and the protein was concentrated using an Amicon-Ultra (3 kDa MWCO) concentrator (Millipore).

[0093] The SpyTag-Stoffel construct was cloned with a C-terminal 6xHis-tag and transformed into Escherichia coli BL21(DE3) (Invitrogen) competent cells. These were grown in LB medium at 37°C and induced at OD600 nm ~ 0.6 at 30°C with 0.5 mM IPTG and incubated overnight. Cells were then harvested by centrifugation, sonicated, then heated at 65°C for 15 min and clarified by centrifugation. The clarified cell lysate was applied to a His-TrapFF column (GE Healthcare) and purified using a gradient elution. The fractions containing the protein were pooled and buffer exchanged into buffer with 50 mM Tris pH 8, 150 mM NaCl, 1 mM DTT and run on a size exclusion HiLoad 16/600 Superdex S200 column. Fractions were pooled and protein purity as assessed by SDS-PAGE was ~95%. The protein was concentrated using an Amicon-Ultra (10 kDa MWCO) concentrator (Millipore).

[0094] The activity assay was carried out by co-incubating purified proteins (Sso7D-SpyC and SpyTag-Stoffel, 5 mM each) at room temperature for 30 minutes. 1 µL of the reaction mixture was then subjected to the polymerase activity assay.

[0095] Pull-down assay

[0096] Biotin-labelled peptides (100 µM) were incubated with streptavidin beads (50 µL) for 2 hours at room temperature, followed by 3 washes with PBS + 0.1% (v/v) Tween 20. Beads were next incubated at 4°C overnight with 500 µM of Sso7d-SpyCatcher protein, followed by 3 washes with PBS + 0.1% (v/v) Tween 20 and then 3 washes with PBS. Bound protein was eluted by boiling in SDS buffer prior to analysis by SDS-PAGE.

[0097] Example 1: Peptide ligase detection protocol

[0098] FIG. 1 is a schematic of the peptide ligase detection protocol. The two members of the substrate pair of interest were fused to the C-terminus of Sso7d and the N-terminus of the Stoffel fragment (residues 293 to 832 of Taq polymerase) respectively. Ligation of the substrate pair, effected by a peptide ligase or by intrinsic activity, resulted in a fusion protein capable of amplifying DNA in the presence of high (50-200 mM) salt concentrations. In the absence of ligation, no amplification products were detected. The ligation reaction can be carried out using purified components, or within the confines of a bacterial cell within which the enzyme/substrate proteins have been expressed.

[0099] Example 2: Coupled polymerase read-out of protein-peptide interactions using model interactants.

[00100] The polymerase activity of the Stoffel fragment of Taq DNA polymerase (amino acids 293-832) fused to either the Sso7d or Topoisomerase V HhH processivity domains was assayed. Both domains facilitated PCR amplification in higher salt concentrations (>50 mM KCl) that inhibited the non-chimeric Stoffel fragment (FIG. 5).

[00101] The SpyCatcher-SpyTag protein-peptide pair associates with relatively high affinity to form a complex with exceptional stability due to interlinking isopeptide bond formation. Sso7d-SpyCatcher and SpyTag-Stoffel fusion proteins were co-expressed in E. coli and polymerase activity was assayed by adding cells directly to other standard PCR components and carrying out thermal cycling in buffer with increasing salt concentrations. Covalent association between SpyCatcher and bound SpyTag peptide resulted in an Sso7d-SpyCatcher-SpyTag-Stoffel fusion protein competent for PCR in high salt buffer (FIG. 2A). Control reactions omitting either one or both of the SpyCatcher/SpyTag components did not show any DNA amplification. SDS-PAGE analysis of cell lysates used in PCR confirmed formation of the thermostable Sso7d-SpyCatcher-SpyTag-Stoffel fusion protein (FIG. 2B). Similar results were obtained using purified protein components (FIG. 2C). Only the reaction comprising Sso7d-SpyCatcher and SpyTag-Stoffel proteins yielded PCR amplicons in high-salt buffer, with formation of the Sso7d-SpyCatcher-SpyTag-Stoffel fusion protein again confirmed by SDS-PAGE (FIG. 2D). Next, the SpyCatcher and SpyTag components were replaced with the non-covalently interacting large and small peptide fragments of split NanoLuc luciferase (NB and NS respectively). A series of small peptide fragments with wide ranging affinities for the large fragment (Kds 0.7 nM to 1.9 × 10^5 nM) were fused to Stoffel and individually co-expressed with the Sso7d-large fragment chimera. PCR analysis directly using expressor cells showed a positive high-salt buffer read-out for peptide variants with affinities < 180 nM for the large NanoLuc fragment (FIG. 6). Furthermore, amplicon yields correlated with the reported affinities of the small fragment peptides, with maximal polymerase activity observed for the highest affinity peptide (NS1, Kd = 0.7 nM).

[00102] Example 3 : Clonal amplification by Compartmentalised Self Replication (CSR)

[00103] In order to transpose this assay into a directed evolution format (permitting selection from millions of enzyme variants), the Compartmentalised Self Replication (CSR) methodology (FIG. 3) may be used. CSR was first developed to enable engineering of thermostable polymerases. Bacterial cells expressing a mutagenized polymerase library were clonally segregated into the aqueous compartments of a water-in-oil emulsion. Thermocycling of the emulsion lysed each bacterium, releasing cellular components into the hosting aqueous compartment. By supplementing with reagents required for PCR (added to the aqueous phase during the emulsification procedure), active enzymes were only able to amplify the genes encoding them, thus maintaining genotype-phenotype linkage. Selection for improved peptide ligase activity using CSR first involved expression of relevant “signal-amplification” components (Sso7d and Stoffel fragments fused to relevant peptides/proteins) and a different ligase variant in each bacterial cell. The collection of cells (i.e. the library) was then emulsified to yield on average one bacterial cell per aqueous compartment that also comprised buffer, high salt, dNTPs and oligonucleotide primers flanking the gene encoding the peptide ligase. Thermal cycling first lysed cells (and destroyed non-thermostable E. coli proteins) and allowed for clonal amplification of peptide ligase genes that encoded functional enzymes. These amplicons were harvested post-CSR and further analysed (secondary assay/sequencing) or re-cloned into an appropriate expression plasmid for further rounds of selection.
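The statement that emulsification yields on average one cell per compartment can be explored numerically. The following Python sketch assumes, as a modelling choice not stated in this specification, that cells distribute among compartments according to a Poisson distribution with mean occupancy lam; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k cells in a compartment at mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy_summary(lam: float) -> dict:
    """Fractions of empty, singly and multiply occupied compartments.

    'clonal_among_occupied' is the share of cell-containing compartments
    that hold exactly one cell, i.e. where genotype-phenotype linkage is
    strictly clonal under this (assumed) Poisson loading model.
    """
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {
        "empty": empty,
        "single": single,
        "multiple": 1.0 - empty - single,
        "clonal_among_occupied": single / (1.0 - empty),
    }


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0):
        print(lam, occupancy_summary(lam))
```

Under this model, lowering the mean occupancy increases the proportion of occupied compartments that are clonal, at the cost of more empty compartments; the actual distribution in a stirred emulsion may deviate from the Poisson assumption.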

[00104] Example 5: Model selections for interacting proteins and peptides using the Compartmentalised Self Replication (CSR) platform.

[00105] The dynamic read-out of the reporter polymerase was next evaluated in the CSR platform. A test selection was carried out using E. coli cells co-expressing either Sso7d-NB + Stoffel or Sso7d-NB + NS1-Stoffel (FIG. 7). Cells were mixed at different ratios prior to emulsification and thermocycling in high salt buffer using a primer pair common to both expression constructs flanking the NS1 cassette. In the absence of emulsification, the Sso7d-NB-NS1-Stoffel complex amplified from both expression plasmid templates as expected (FIG. 8). In contrast, C2HR enabled clonal amplification/enrichment of the NS1 cassette in plasmids expressing NS1-Stoffel (upper arrowed band) over those expressing Stoffel only (lower arrowed band). This was readily apparent at the 1:100 ratio of cells, with selection for the NS1 gene cassette occurring only when C2HR is used. The panel of cells co-expressing Sso7d-NB and NS-Stoffel variants (FIG. 6) were next combined equally and one round of C2HR carried out. Analysis of only 10 selectants indicated preferential enrichment for the high affinity NS1 variant (Kd = 0.7 nM, 5/10 selectants) followed by the next highest affinity variant, NS5 (Kd = 3.4 nM, 3/10 selectants). The other two selectants encoded the lower affinity NS2 variant. Together, these experiments confirmed that C2HR was able to select for high-affinity interacting protein pairs.
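The selectant counts above can be expressed as fold-enrichment relative to an equal starting mixture. The following Python sketch is illustrative only: it assumes that the panel comprised six NS variants (NS1 to NS6) combined in equal proportion, so that each is expected at a frequency of 1/6 before selection, and the function name is hypothetical. With only 10 selectants, the resulting values should be read as rough indications.

```python
from fractions import Fraction


def enrichment_over_equal_mix(selected_counts, n_variants_in_pool):
    """Fold-enrichment of each observed variant over an equal starting mix.

    selected_counts: mapping of variant name to number of selectants.
    n_variants_in_pool: number of variants combined equally before selection
    (assumed to be six for the NS panel in this sketch).
    """
    total = sum(selected_counts.values())
    expected = Fraction(1, n_variants_in_pool)
    return {
        name: Fraction(count, total) / expected
        for name, count in selected_counts.items()
    }


if __name__ == "__main__":
    counts = {"NS1": 5, "NS5": 3, "NS2": 2}  # selectant counts reported above
    for name, fold in enrichment_over_equal_mix(counts, 6).items():
        print(f"{name}: {float(fold):.1f}-fold")
```

Variants absent from the selectants (here, three of the six assumed panel members) have an observed frequency of zero and are simply not listed in the output.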

[00106] Example 6: Selection of functional SpyTag variants from a semi-randomised library using CSR.

[00107] The selection strategy shown in FIG. 3 was implemented to select for randomized SpyTag variants capable of binding to and forming a covalent bond with SpyCatcher. The endogenous SpyTag amino acid sequence is: GAHIVMVDAYKP (SEQ ID NO: 20). The hydrophobic residues IVMV are important for high affinity interaction with SpyCatcher. We created a library of randomized SpyTags conforming to the sequence GAHXXXXDAYKP (SEQ ID NO: 21) where X represents any amino acid. These were genetically fused to the Stoffel fragment and selection for ligation to wild-type SpyCatcher carried out using CSR as described in FIG. 3. Three rounds of selection were carried out and enriched genes cloned and sequenced after each round.
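The size of the sequence space sampled by a GAHXXXXDAYKP library can be estimated with a few lines of arithmetic. The following Python sketch is illustrative only. The protein-level figure follows directly from four positions each open to 20 amino acids. The codon-level figures assume NNK degenerate codons (32 codons per position, one of which is a stop codon); the use of NNK codons for this particular library is an assumption of the sketch.

```python
def protein_diversity(n_positions: int, alphabet_size: int = 20) -> int:
    """Number of distinct peptides when each position may be any amino acid."""
    return alphabet_size ** n_positions


def nnk_codon_combinations(n_positions: int) -> int:
    """Number of DNA variants if each position is an NNK codon (4 * 4 * 2 = 32)."""
    return 32 ** n_positions


def stop_free_fraction(n_positions: int) -> float:
    """Fraction of NNK variants with no stop codon (NNK admits only TAG)."""
    return (31 / 32) ** n_positions


if __name__ == "__main__":
    n = 4  # the four X positions of GAHXXXXDAYKP
    print(protein_diversity(n))       # 160000
    print(nnk_codon_combinations(n))  # 1048576
    print(round(stop_free_fraction(n), 4))
```

These figures give an upper bound on the number of distinct peptides that a selection on this library could return, against which the number of transformants or sequenced variants can be compared.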

[00108] Several variant SpyTags comprising hydrophobic motifs selected for by CSR (LVLW (SEQ ID NO: 86), MMLM (SEQ ID NO: 87), FVFY (SEQ ID NO: 88), VVCR (SEQ ID NO: 89)) were expressed in E. coli as N-terminal fusions to the Stoffel fragment along with Sso7d-SpyCatcher. SDS-PAGE analysis (FIG. 4) of lysates from the same expressing cells used for the PCR assay indicated the presence of a covalent bond between SpyCatcher and the variant SpyTags selected; the top left blue arrow points to covalently bonded Sso7d-SpyCatcher and SpyTag-Stoffel proteins. Lane 1 is a positive control (Sso7d-Stoffel fusion protein). Lane 2 is a positive control (Sso7d-SpyCatcher and WT SpyTag-Stoffel fusion proteins expressed). Lane 3 is a negative control (no SpyTag fused to the Stoffel fragment). Lanes 4-7 are the same as lane 2 except that the selected variant SpyTag sequences are fused to Stoffel. This showed that the SpyTag variants selected for are functional, further validating the selection method.

[00109] Example 7: Selection for functional SpyTag peptide variants using Compartmentalised 2-Hybrid Replication (C2HR).

[00110] A library of SpyTag-Stoffel variants was created wherein the hydrophobic “IVMV” (SEQ ID NO: 90) motif in SpyTag essential for high affinity interaction with SpyCatcher (FIG. 11A) was randomised. This library (Lib 1) was co-expressed in E. coli along with Sso7d-SpyCatcher prior to encapsulation in emulsion compartments containing oligonucleotide primers flanking the randomized region of SpyTag along with other requisite PCR components (dNTPs, high-salt buffer). Ten rounds of thermal cycling were carried out to facilitate clonal amplification of genes encoding functional SpyTag core motifs, following which amplicons were harvested and sequenced en masse. Analysis of 96,400 unique protein sequence reads identified the “IVMV” (SEQ ID NO: 90) motif as the 161st most abundant, suggesting positive enrichment. Consensus motifs of 20 random sequences from the naive library and of the 20 sequences most enriched by selection varied notably, indicating stronger preference for hydrophobic residues in the latter (FIG. 9A). Further consensus sequence analysis of the top 500 abundant motifs identified the endogenous “IVMV” (SEQ ID NO: 90) motif, and highlighted tolerance for other bulky hydrophobic residues in place of the isoleucine and methionine residues (FIG. 9B). These pack into a hydrophobic groove in SpyCatcher and are essential for high affinity interaction (FIG. 11A). Higher sequence variation was tolerated at both valine positions in the motif, again commensurate with structural data showing these residues to project away from the SpyCatcher hydrophobic pocket and to contribute less to productive binding interactions.
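Consensus analysis of a set of selected motifs amounts to tabulating amino acid frequencies at each randomised position. The following Python sketch shows one simple way of doing this; it is an illustration under stated assumptions and not the analysis pipeline actually used. The function names are hypothetical, each motif is weighted equally (read abundance is ignored), and ties for the most frequent residue are broken alphabetically.

```python
from collections import Counter


def position_frequencies(motifs):
    """Per-position amino acid frequencies for equal-length motifs.

    Returns a list with one dict per position, mapping residue to the
    fraction of motifs carrying that residue at that position.
    """
    if not motifs:
        raise ValueError("at least one motif is required")
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    table = []
    for i in range(length):
        column = Counter(m[i] for m in motifs)
        table.append({aa: n / len(motifs) for aa, n in column.items()})
    return table


def consensus(motifs):
    """Most frequent residue at each position (alphabetical on ties)."""
    return "".join(
        max(sorted(column), key=column.get)
        for column in position_frequencies(motifs)
    )
```

Applied separately to a sample of naive library sequences and to the most enriched sequences, such a table makes shifts in residue preference at individual positions directly comparable.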

[00111] Next, a further single round selection was carried out, this time randomizing the 3 residues either side of the core “IVMVD” (SEQ ID NO: 91) motif of SpyTag (Lib 2). The obligate aspartic acid residue in this motif forms the isopeptide bond with lysine 31 in SpyCatcher. The endogenous flanking motifs “GAH” and “AYK” were present in the 52nd most abundant read out of 160,415 unique protein sequences, again suggesting positive selection. Notably, no clear consensus motif emerged upon analysis of the top 500 enriched sequences, signifying a higher degree of redundancy for residues flanking the SpyTag “IVMVD” (SEQ ID NO: 91) core motif (FIG. 9B). This was confirmed by analysis of the top 10 enriched flanking motifs for SpyCatcher binding. All showed a positive, covalent interaction with SpyCatcher as judged by high-salt PCR and SDS-PAGE analysis (FIG. 10). N-terminally biotinylated peptides encoding a SGSG linker and SpyTag (SGSGGAHIVMVDAYKPTKK; SpyTag sequence underlined) (SEQ ID NO: 24) and the top Lib 2 selected variant (STL2: SGSGSFDIVMVDHVSPTK; STL2 sequence underlined) (SEQ ID NO: 25) were synthesized and pull-down of a recombinant target protein (Sso7d-SpyCatcher) was assayed. As before, the variant showed comparable activity to SpyTag, pulling down a similar amount of the SpyCatcher fusion protein (FIG. 12).

[00112] Example 8: Co-evolution of an interacting protein-peptide pair using C2HR

[00113] The co-evolution of both peptide and an interacting partner was investigated using C2HR. The isoleucine residue in the core “IVMV” (SEQ ID NO: 90) motif of SpyTag packs into a discrete hydrophobic pocket lined by phenylalanines 75 and 92 of SpyCatcher (FIG. 11A). These three residues were simultaneously randomized and C2HR selection carried out. In contrast to previous selections, the primer pair was chosen to generate amplicons co-encoding interacting SpyCatcher and SpyTag variants during the emulsion PCR phase. Selections were carried out using uninduced cells, relying on T7 promoter leakiness to reduce protein levels and potentially increase selection pressure.

[00114] After one round of selection using induced cells, 1 out of the 42 selectants analysed comprised the endogenous FF/I residues at the randomized SpyCatcher/SpyTag positions. Other combinations that were enriched included IY/W, LF/Y and FF/P (2 out of 42 selectants for each). Consensus sequence analysis of all 42 selectants further highlighted preference for hydrophobic residues at the three randomized positions (FIG. 11B). In particular, clear selection for the endogenous phenylalanine residues in SpyCatcher was observed. No clear consensus emerged from analysis of 52 random sequences from the unselected library, although there was some inherent bias for phenylalanine and leucine at codon 92 of SpyCatcher. A second round of selection did not lead to enrichment of any specific motif, but clearly enriched for bulky hydrophobic residues at the randomized positions. In the absence of induction, the FFI motif was not observed in any of the selectants analysed in the first round. It was, however, enriched after the second round (4/47 selectants). As with induced C2HR conditions, clear selection for bulky hydrophobic residues was also observed.

[00115] The C2HR platform may be further adapted to select for other classes of proteins whose activity directly or indirectly facilitates co-localisation of polymerase and processivity factor components. These include peptide ligases belonging to the hydrolase and transglutaminase families and intein domains that regulate protein splicing. Nucleic acid modifying enzymes, particularly DNA recombinases could also be engineered by C2HR. In this case, enzyme activity fuses the otherwise split processivity and polymerase gene cassettes, leading to expression of the requisite fusion protein. Whilst this approach has been previously described using other reporter genes, dynamic read-out afforded by polymerase function may expedite selections.

[00116] Example 9: High affinity interactions can be detected under accelerated cycling conditions

[00117] Results shown in FIG. 13 indicate amplification of a large ~1.5 kb amplicon (arrowed) under conditions of short annealing and extension times (15 and 10 seconds respectively) only by cells co-expressing high affinity interactants: ST and S, and NS1/4/5 and NB. Under normal annealing and extension times (30 and 120 seconds respectively), amplification occurs irrespective of interactions.

[00118] Table 2. Summary of sequence listing I Amino acid sequence of NB

[00119] Equivalents

[00120] The foregoing examples are presented for the purpose of illustrating the invention and should not be construed as imposing any limitation on the scope of the invention. It will readily be apparent that numerous modifications and alterations may be made to the specific embodiments of the invention described above and illustrated in the examples without departing from the principles underlying the invention. All such modifications and alterations are intended to be embraced by this application.