Title:
CYCLOPROPENE AMINO ACIDS AND METHODS
Document Type and Number:
WIPO Patent Application WO/2015/136265
Kind Code:
A1
Abstract:
The invention relates to a polypeptide comprising an amino acid having a cyclopropene group wherein said cyclopropene group is joined to the amino acid via a carbamate group. Suitably the cyclopropene group is a 1,3-disubstituted cyclopropene such as a 1,3-dimethylcyclopropene. Suitably the cyclopropene group is present as a residue of a lysine amino acid. The invention also relates to methods of making the polypeptides. The invention also relates to an amino acid comprising cyclopropene wherein said cyclopropene group is joined to the amino acid moiety via a carbamate group.

Inventors:
ELLIOTT THOMAS (GB)
Application Number:
PCT/GB2015/050694
Publication Date:
September 17, 2015
Filing Date:
March 10, 2015
Assignee:
MEDICAL RES COUNCIL (GB)
International Classes:
C12N9/36; C07K1/13; C07K14/435
Other References:
THOMAS S ELLIOTT ET AL: "Proteome labeling and protein identification in specific tissues and at specific developmental stages in an animal", NATURE BIOTECHNOLOGY, vol. 32, no. 5, 13 April 2014 (2014-04-13), pages 465 - 472, XP055188555, ISSN: 1087-0156, DOI: 10.1038/nbt.2860
AMIT SACHDEVA ET AL: "Concerted, Rapid, Quantitative, and Site-Specific Dual Labeling of Proteins", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 136, no. 22, 4 June 2014 (2014-06-04), pages 7785 - 7788, XP055188528, ISSN: 0002-7863, DOI: 10.1021/ja4129789
WOLFGANG H. SCHMIED ET AL: "Efficient Multisite Unnatural Amino Acid Incorporation in Mammalian Cells via Optimized Pyrrolysyl tRNA Synthetase/tRNA Expression and Engineered eRF1", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 136, no. 44, 5 November 2014 (2014-11-05), pages 15577 - 15583, XP055188529, ISSN: 0002-7863, DOI: 10.1021/ja5069728
ZHIPENG YU ET AL: "Genetically Encoded Cyclopropene Directs Rapid, Photoclick-Chemistry-Mediated Protein Labeling in Mammalian Cells", ANGEWANDTE CHEMIE INTERNATIONAL EDITION, vol. 51, no. 42, 15 October 2012 (2012-10-15), pages 10600 - 10604, XP055069973, ISSN: 1433-7851, DOI: 10.1002/anie.201205352
Attorney, Agent or Firm:
SCRIPT IP LIMITED (Frome, Somerset BA11 1BB, GB)
Claims:
CLAIMS

1. A polypeptide comprising an amino acid having a cyclopropene group wherein said cyclopropene group is joined to the amino acid via a carbamate group.

2. A polypeptide according to claim 1 wherein said cyclopropene group is a 1,3-disubstituted cyclopropene.

3. A polypeptide according to claim 2 wherein said cyclopropene is a 1,3-dimethylcyclopropene.

4. A polypeptide according to any of claims 1 to 3 wherein said cyclopropene group is present as a residue of a lysine amino acid.

5. A polypeptide according to any of claims 1 to 4 further comprising a tetrazine compound linked to said cyclopropene group.

6. An amino acid comprising cyclopropene wherein said cyclopropene group is joined to the amino acid moiety via a carbamate group.

7. An amino acid according to claim 6 wherein said cyclopropene is a 1,3-disubstituted cyclopropene.

8. An amino acid according to claim 7 wherein said cyclopropene is a 1,3-dimethylcyclopropene.

9. An amino acid according to any of claims 6 to 8 wherein said amino acid is a lysine amino acid.

10. An amino acid according to claim 9 which comprises Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine.

11. An amino acid according to claim 10 which consists of

12. A method of producing a polypeptide comprising a cyclopropene group wherein said cyclopropene group is joined to the amino acid moiety via a carbamate group, said method comprising genetically incorporating an amino acid comprising a cyclopropene group joined to the amino acid moiety via a carbamate group, into a polypeptide.

13. A method according to claim 12 wherein producing the polypeptide comprises

(i) providing a nucleic acid encoding the polypeptide which nucleic acid comprises an orthogonal codon encoding the amino acid having a cyclopropene group;

(ii) translating said nucleic acid in the presence of an orthogonal tRNA synthetase/tRNA pair capable of recognising said orthogonal codon and incorporating said amino acid having a cyclopropene group into the polypeptide chain.

14. A method according to claim 12 or claim 13 wherein said orthogonal codon comprises an amber codon (TAG), said tRNA comprises MbtRNACUA and said tRNA synthetase comprises MbPylRS; or wherein said orthogonal codon comprises an amber codon (TAG), said tRNA comprises MmtRNACUA and said tRNA synthetase comprises MmPylRS.

15. A method according to any of claims 12 to 14 wherein said amino acid comprising a cyclopropene group is an amino acid according to any of claims 6 to 11.

16. A method of producing a polypeptide comprising a tetrazine group, said method comprising providing a polypeptide according to any of claims 1 to 4, contacting said polypeptide with a tetrazine compound, and incubating to allow joining of the tetrazine to the cyclopropene group by an inverse electron demand Diels-Alder cycloaddition reaction.

17. A method according to claim 16 wherein said reaction is allowed to proceed for 10 minutes or less, preferably for 1 minute or less, preferably for 30 seconds or less.

18. A polypeptide according to any of claims 1 to 5 wherein said polypeptide comprises two or more amino acids each having a cyclopropene group, wherein each said cyclopropene group is joined to each said amino acid via a carbamate group.

19. A polypeptide according to claim 18 wherein said polypeptide comprises four amino acids each having a cyclopropene group.

20. An antibody drug conjugate (ADC) comprising a polypeptide according to any of claims 1 to 5, 18 or 19.

21. A compound, polypeptide or method substantially as described herein.

22. A compound, polypeptide or method substantially as described herein with reference to the accompanying drawings.

Description:
The invention relates to site-specific incorporation of bio-orthogonal groups via the (expanded) genetic code. In particular the invention relates to incorporation of carbamate-bonded cyclopropenes into polypeptides via genetically incorporated amino acids such as lysines. Such cyclopropene groups are useful for addition of further chemical groups such as tetrazines.

BACKGROUND TO THE INVENTION

The site-specific incorporation of bio-orthogonal groups via genetic code expansion provides a powerful general strategy for site specifically labelling proteins with any probe. However, the slow reactivity of the bio-orthogonal functional groups that can be genetically encoded, and/or their need for photoactivation, has limited this strategy's utility.

The rapid, site-specific labeling of proteins with diverse probes remains an outstanding challenge for chemical biologists; enzyme mediated labeling approaches may be rapid, but use protein or peptide fusions that introduce perturbations into the protein under study and may limit the sites that can be labeled, while many 'bio-orthogonal' reactions for which a component can be genetically encoded are too slow to effect the quantitative and site specific labeling of proteins on a time-scale that is useful to study many biological processes. There is a pressing need for general methods to site-specifically label proteins, in diverse contexts, with user-defined probes.

Inverse electron demand Diels-Alder reactions involving tetrazines have emerged as an important class of rapid bio-orthogonal reactions. The rates reported for some of these reactions are very fast.

Yu et al 2012 (Angew. Chem. Int. Ed. Volume 51, pages 10600-10604) disclose Genetically Encoded Cyclopropene Directs Rapid, Photoclick Chemistry Mediated Protein Labelling in Mammalian Cells. The authors report the synthesis of a stable cyclopropene amino acid, the characterisation of its reactivity in a photo induced cycloaddition reaction with two tetrazoles, its site-specific incorporation into proteins both in E. coli and in mammalian cells, and its use in directing bioorthogonal labelling of proteins both in vitro and in vivo. In order to incorporate their cyclopropene containing amino acid into proteins, the authors had to evolve an orthogonal tRNA/tRNA synthetase pair that selectively charges their cyclopropene lysine amino acid in response to a TAG amber codon. This required a synthetase library to be constructed, five positions within that synthetase to be randomised, together with at least five rounds of positive and negative selection screening. It is a drawback of this work that it relies on the specific mutant synthetase produced. In joining their tetrazole compounds to the cyclopropene moiety in their modified amino acids, Yu et al use photo activation. Photo activation is carried out at either 302 nanometres or 365 nanometres. The requirement for photo activation in joining tetrazoles to the amino acid of Yu et al is a drawback in the art. This is a laborious extra step in the conjugation chemistry; UV is also damaging to cells and so is disadvantageous in the in vivo/cellular setting.

Kamber et al disclose Isomeric Cyclopropenes Exhibiting Unique Bioorthogonal Reactivities (2013 JACS Volume 135, pages 13680-13683). The authors discuss two reactions that can be used to tag biomolecules in complex environments: the inverse electron demand Diels-Alder reaction of tetrazines with 1,3-disubstituted cyclopropenes, and the 1,3-dipolar cycloaddition of nitrile imines with 3,3-disubstituted cyclopropenes. The authors discuss various chemical reaction schemes used to generate stable cycloadducts. None of the molecules discussed by Kamber et al are amino acids. There is no reason to imagine that the compounds as described could be incorporated into amino acids. Even if any such incorporation was attempted, there is absolutely no suggestion or guidance which might allow such compounds to be incorporated into polypeptides. No schemes for synthesis of amino acids comprising any of the chemical groups described are presented by Kamber et al. There are no biochemical tools for incorporation into proteins mentioned anywhere in this document. Kamber et al are solely concerned with examining the substitution pattern on the cyclopropene, one such pattern allowing reactions with tetrazines and one such pattern not being permissive of reactions with tetrazines.

The present invention seeks to overcome problem(s) associated with the prior art.

SUMMARY OF THE INVENTION

In one aspect the invention provides a polypeptide comprising an amino acid having a cyclopropene group wherein said cyclopropene group is joined to the amino acid via a carbamate group.

Suitably said cyclopropene group is a 1,3-disubstituted cyclopropene. Suitably said cyclopropene is a 1,3-dimethylcyclopropene. Suitably said cyclopropene group is present as a residue of a lysine amino acid. Suitably said polypeptide further comprises a tetrazine compound linked to said cyclopropene group.

In another aspect, the invention relates to an amino acid comprising cyclopropene wherein said cyclopropene group is joined to the amino acid moiety via a carbamate group.

Suitably said cyclopropene is a 1,3-disubstituted cyclopropene. Suitably said cyclopropene is a 1,3-dimethylcyclopropene. Suitably said amino acid is a lysine amino acid. Suitably said amino acid comprises Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine.

Suitably said amino acid comprises, or more suitably consists of:

In another aspect, the invention relates to a method of producing a polypeptide comprising a cyclopropene group wherein said cyclopropene group is joined to the amino acid moiety via a carbamate group, said method comprising genetically incorporating an amino acid comprising a cyclopropene group joined to the amino acid moiety via a carbamate group, into a polypeptide.

Suitably producing the polypeptide comprises

(i) providing a nucleic acid encoding the polypeptide which nucleic acid comprises an orthogonal codon encoding the amino acid having a cyclopropene group;

(ii) translating said nucleic acid in the presence of an orthogonal tRNA synthetase/tRNA pair capable of recognising said orthogonal codon and incorporating said amino acid having a cyclopropene group into the polypeptide chain.

Suitably said orthogonal codon comprises an amber codon (TAG), said tRNA comprises MbtRNACUA and said tRNA synthetase comprises MbPylRS.

Suitably said orthogonal codon comprises an amber codon (TAG), said tRNA comprises MmtRNACUA and said tRNA synthetase comprises MmPylRS.
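By way of illustration only, the provision of a nucleic acid comprising an orthogonal codon can be sketched in code. The following minimal Python sketch replaces the codon at a chosen residue position of a coding sequence with the amber codon TAG. The function name `introduce_amber`, the example sequence and the chosen position are hypothetical and are not taken from the examples of this specification.

```python
AMBER = "TAG"  # amber stop codon, used here as the orthogonal codon


def introduce_amber(cds: str, residue: int) -> str:
    """Return `cds` with the codon for the 1-based `residue` replaced by TAG."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(cds) // 3
    if not 1 <= residue <= n_codons:
        raise ValueError("residue position outside the coding sequence")
    start = 3 * (residue - 1)
    return cds[:start] + AMBER + cds[start + 3:]


# Hypothetical example: Met-Lys-Gly followed by an ochre stop codon.
example = "ATGAAAGGTTAA"
mutated = introduce_amber(example, 2)  # replace the Lys codon (AAA)
print(mutated)
```

Calling the function once per site would give a sequence with several amber codons, which corresponds to the multiple-incorporation embodiments discussed herein.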

In another aspect, the invention relates to a method as described above wherein said amino acid comprising a cyclopropene group is an amino acid as described above.

In another aspect, the invention relates to a method of producing a polypeptide comprising a tetrazine group, said method comprising providing a polypeptide comprising a cyclopropene group as described above, contacting said polypeptide with a tetrazine compound, and incubating to allow joining of the tetrazine to the cyclopropene group by an inverse electron demand Diels-Alder cycloaddition reaction. Suitably said reaction is allowed to proceed for 10 minutes or less, preferably for 1 minute or less, preferably for 30 seconds or less. Reactions in vivo, or in eukaryotic culture conditions such as tissue culture medium or other suitable media for eukaryotic cells, may need to be conducted for longer than 30 seconds to achieve maximal labelling. The skilled operator can determine optimum reaction times by trial and error based on the guidance provided herein.
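As a rough aid to choosing an incubation time, the extent of labelling can be estimated with a simple kinetic model. The sketch below assumes that the tetrazine is in large excess over the cyclopropene groups, so that the reaction is pseudo-first-order and the labelled fraction is 1 - exp(-k2 x [tetrazine] x t). The rate constant and concentration used are placeholder values chosen only to show the arithmetic. They are not measured values from this specification, and the function names are hypothetical.

```python
import math


def fraction_labelled(k2: float, tetrazine_molar: float, seconds: float) -> float:
    """Fraction of cyclopropene groups labelled under pseudo-first-order conditions.

    Assumes the tetrazine is in large excess, so its concentration stays constant.
    k2 is the second-order rate constant in M^-1 s^-1.
    """
    return 1.0 - math.exp(-k2 * tetrazine_molar * seconds)


def time_to_fraction(k2: float, tetrazine_molar: float, fraction: float) -> float:
    """Time in seconds needed to reach the given labelled fraction (0 < fraction < 1)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    return -math.log(1.0 - fraction) / (k2 * tetrazine_molar)


# Placeholder values chosen only to illustrate the arithmetic.
K2 = 30.0         # M^-1 s^-1, hypothetical
TETRAZINE = 1e-3  # 1 mM, hypothetical
for t in (30, 60, 600):
    print(t, round(fraction_labelled(K2, TETRAZINE, t), 3))
```

In practice the rate constant depends on the particular tetrazine and on the reaction environment, so such an estimate is only a starting point for the trial and error referred to above.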

In another aspect, the invention relates to a polypeptide as described above wherein said polypeptide comprises two or more amino acids each having a cyclopropene group, wherein each said cyclopropene group is joined to each said amino acid via a carbamate group. Provision of two or more cyclopropene groups on the polypeptide advantageously allows joining of two or more conjugated groups (functional groups) to the polypeptide. This is especially helpful when the conjugated groups (functional groups) comprise drug molecules such as cytotoxic molecules, for example in an antibody-drug conjugate.

Suitably said polypeptide comprises four amino acids each having a cyclopropene group. Suitably the antibody drug conjugate (ADC) comprising a polypeptide as described above comprises four amino acids each having a cyclopropene group. This is especially advantageous for the joining of four cytotoxic molecules to the ADC of interest.

In another aspect, the invention relates to an antibody drug conjugate (ADC) comprising a polypeptide as described above. Suitably the polypeptide is an antibody polypeptide such as a whole antibody (e.g. a monoclonal antibody (mAb)) or is an antibody fragment (e.g. a single-chain variable fragment [scFv]), suitably an antibody fragment comprising CDR amino acid sequence.

Suitably the antibody polypeptide (or fragment) may advantageously be humanised by manufacture of chimaeric antibody polypeptide(s); suitably the antibody polypeptide (or fragment) may advantageously be CDR-grafted; suitably the antibody polypeptide (or fragment) may advantageously be fully humanised to the extent that the technology permits.

Suitably the antibody polypeptide (or fragment) may be fused to another polypeptide of interest such as a ligand for the transferrin receptor, for example transferrin or a part thereof, to assist in transport and/or targeting of the ADC.

In another aspect, the invention relates to a polypeptide as described above wherein said tetrazine group is further joined to a fluorophore.

Suitably said fluorophore comprises fluorescein, tetramethyl rhodamine (TAMRA) or boron-dipyrromethene (BODIPY).

Suitably said fluorophore may comprise one or more Alexa fluorophore(s). Suitably said fluorophore may comprise one or more Cyanine based fluorophore(s).

DETAILED DESCRIPTION

Genetic code expansion methods allow the quantitative, site-specific, and genetically directed incorporation of unnatural amino acids with diverse chemical structures and bearing diverse functional groups. This is most commonly achieved by inserting the unnatural amino acid in response to an amber stop codon introduced into a gene of interest [12, 13]. Genetic code expansion is achieved via the introduction of an orthogonal aminoacyl-tRNA synthetase/tRNACUA pair into cells. The pyrrolysyl-tRNA synthetase/tRNACUA pair is amongst the most useful pairs for genetic code expansion, because it 1) can specifically recognize a range of useful unnatural amino acids, 2) can be evolved to recognize an extended range of chemical structures, and 3) can be used as an orthogonal pair for genetic code expansion in E. coli [14], yeast [15], mammalian cells [16-18], C. elegans [19] and D. melanogaster [20].

We demonstrate production of newly synthesized proteins with cyclopropene groups that can be labelled with tetrazine probes introduced via a chemoselective inverse electron demand Diels-Alder reaction.

In another aspect, the invention relates to a homogenous recombinant polypeptide as described above. Suitably said polypeptide is made by a method as described above.

Also disclosed is a polypeptide produced according to the method(s) described herein. As well as being the product of those new methods, such a polypeptide has the technical feature of comprising cyclopropene, suitably carbamate-linked cyclopropene.

Mutating has its normal meaning in the art and may refer to the substitution or truncation or deletion of the residue, motif or domain referred to. Mutation may be effected at the polypeptide level e.g. by synthesis of a polypeptide having the mutated sequence, or may be effected at the nucleotide level e.g. by making a nucleic acid encoding the mutated sequence, which nucleic acid may be subsequently translated to produce the mutated polypeptide. Where no amino acid is specified as the replacement amino acid for a given mutation site, suitably a randomisation of said site is used. As a default mutation, alanine (A) may be used. Suitably the mutations used at particular site(s) are as set out herein.

A fragment is suitably at least 10 amino acids in length, suitably at least 25 amino acids, suitably at least 50 amino acids, suitably at least 100 amino acids, suitably at least 200 amino acids, suitably at least 250 amino acids, suitably at least 300 amino acids, suitably at least 313 amino acids, or suitably the majority of the polypeptide of interest.

The methods of the invention may be practiced in vivo or in vitro. In one embodiment, suitably the methods of the invention are not applied to the human or animal body. Suitably the methods of the invention are in vitro methods. Suitably the methods do not require the presence of the human or animal body. Suitably the methods are not methods of diagnosis or of surgery or of therapy of the human or animal body.

The term 'comprises' (comprise, comprising) should be understood to have its normal meaning in the art, i.e. that the stated feature or group of features is included, but that the term does not exclude any other stated feature or group of features from also being present.

ADVANTAGES

Cyclopropene is a less carbon rich group than known protein labelling groups.

The cyclopropene amino acid of the current invention leads to more rapid protein labelling than prior art techniques.

Using the cyclopropene amino acid of the present invention leads to a more efficient incorporation than prior art labelled amino acids.

It has been known to incorporate amino acids bearing norbornene groups into proteins. The present invention offers specific advantages over prior art methods involving norbornene groups. For example, although the conjugation chemistry for cyclopropene amino acids of the invention is similar to that of norbornene containing amino acids, conjugation to cyclopropene amino acids can be faster.

Incorporation of cyclopropene amino acids according to the invention can be more efficient than incorporation of prior art unnatural amino acids. The incorporation of cyclopropene amino acids according to the invention can lead to a higher level of incorporation than prior art unnatural amino acids.

It is an advantage of the invention that the cyclopropene amino acids taught can be incorporated using wild type tRNA synthetases. Prior art unnatural amino acids have tended to require mutant tRNA synthetases for their incorporation, such as, for example, amino acids incorporating BCN groups. Rapid conjugation reactions for unnatural amino acids incorporated into polypeptides have been mentioned in the prior art. For example, TCO/BCN amino acids offer rapid reaction times, which can be faster than norbornene reaction times. However, it is an advantage of the cyclopropene amino acids that very rapid reaction times are provided.

Certain known unnatural amino acids are able to use the wild type tRNA synthetases. For example, amino acids comprising norbornene groups can be incorporated using wild type tRNA synthetase. However, by using cyclopropene containing amino acids of the invention a higher level of incorporation is achieved. In other words, the amount of material produced which comprises the unnatural amino acid is greater when using cyclopropene containing amino acids of the invention than when using prior art unnatural amino acids such as those comprising norbornene.

It is an advantage of the invention that the cyclopropene amino acids form excellent substrates for the tRNA synthetases noted herein, most suitably the wild type tRNA synthetases noted herein.

It is an advantage of the invention that the cyclopropene containing amino acids support excellent linker chemistry, for example rapid and specific reaction with tetrazine containing compounds.

It is an advantage of the invention that the cyclopropene containing amino acids are smaller in size than known unnatural amino acids previously used to label proteins. For example, a known unnatural amino acid comprising norbornene can be incorporated into polypeptides, but cyclopropene containing amino acids of the invention are advantageously of smaller size than the norbornene containing amino acids of the prior art.

It is an advantage of the invention that the cyclopropene amino acids are less likely to perturb protein structure when incorporated into polypeptides. At least part of this advantageous effect may be attributed to the small size of the cyclopropene molecular group.

A key advantage of incorporation of a cyclopropene group is that it permits a range of extremely useful further compounds such as labels to be easily and specifically attached to the cyclopropene group. In another aspect, the invention relates to a polypeptide as described above wherein said cyclopropene group is joined to a tetrazine group.

An unnatural amino acid comprising an amide bonded cyclopropene has been described in the prior art (Yu et al 2012). This amino acid is 3,3-disubstituted. This amino acid is as follows:

In order to incorporate this amino acid into polypeptides, it is essential to use a mutant tRNA synthetase.

In contrast, the amino acid comprising cyclopropene of the present invention contains a carbamate group (rather than an amide group). The cyclopropene containing amino acid of the present invention is therefore chemically distinct from the amide bonded cyclopropene amino acid in the art.

An exemplary amino acid of the invention is 1,3-disubstituted. An exemplary amino acid of the invention is as follows:

It is an advantage of the carbamate-cyclopropene amino acid of the invention that it is incorporated well by the wild type tRNA synthetase. This has the advantage of requiring less biological manipulation in order to obtain good incorporation. This also provides the advantage of enhanced or increased incorporation. In other words, the cyclopropene-carbamate amino acid of the present invention is incorporated to higher levels and/or more efficiently than known unnatural amino acids.

Use of the cyclopropene amino acid of the invention may provide a superior rate of reaction with tetrazine compounds.

The carbamate chemistry of the invention provides the advantage of more degrees of freedom in the chemical structure of the incorporated amino acid. In particular, the carbamate cyclopropene of the invention has more degrees of freedom compared to the amide cyclopropene known in the art. Similarly, the carbamate cyclopropene of the invention is more accessible when present in the polypeptide chain.

By comparison with the amide bonded cyclopropene known in the art, the carbamate cyclopropene of the present invention is a slightly "longer" amino acid. This provides the advantage of a greater "reach" for the groups of the amino acid protruding away from the amino acid backbone. Again, this can render those groups more accessible for further labelling or conjugation reactions.

The chemical structure of the carbamate cyclopropene of the invention advantageously provides more conformational degrees of freedom. In other words, the carbamate cyclopropene group of the invention can adopt more conformations within a protein structure than prior art amide bonded cyclopropene amino acids.

In more detail, this may arise from the nature of the bonding between cyclopropene group and amino acid group. In the prior art amide arrangement, the important bond is SP2 hybridised. In the invention, the important bond is SP3 hybridised, which is a more flexible bonding arrangement.

Moreover, the cyclopropene carbamate arrangement of the invention comprises a methylene group between the carbamate and the cyclopropene group. Firstly, this provides a longer molecule. The prior art amide bonded version is a less advantageous shorter molecule. More specifically, the prior art amide amino acid has a double bonded oxygen group (=O) at the position corresponding to the advantageous methylene carbon of the present invention. The double bonded version in the prior art amide amino acid cannot rotate as freely as the methylene carbon bonded group in the amino acid of the invention.

The fact that the amino acid of the present invention is smaller than prior art norbornene containing amino acids and yet still preserves the advantageous carbamate chemistry is a benefit of the invention. This benefit provides, among other things, better incorporation of the amino acid into the polypeptide chain.

In addition, the joining to tetrazine compounds (tetrazine conjugation) is advantageously facilitated by the carbamate cyclopropene arrangement in the amino acid of the present invention.

Suitably said tetrazine group is further joined to a fluorophore.

Suitably said tetrazine group is further joined to a polyethylene glycol (PEG) group.

Suitably said fluorophore comprises fluorescein, tetramethyl rhodamine (TAMRA) or boron-dipyrromethene (BODIPY).

Suitably the cyclopropene amino acid of the invention is incorporated into a polypeptide using the wild type tRNA synthetase.

Suitably the amino acid having a cyclopropene group is incorporated at a position corresponding to a lysine residue in the wild type polypeptide. This has the advantage of maintaining the closest possible structural relationship of the cyclopropene containing polypeptide to the wild type polypeptide from which it is derived.

Suitably the polypeptide comprises a single cyclopropene group. This has the advantage of maintaining specificity for any further chemical modifications which might be directed at the cyclopropene group. For example when there is only a single cyclopropene group in the polypeptide of interest then possible issues of partial modification (e.g. where only a subset of cyclopropene groups in the polypeptide are subsequently modified), or issues of reaction microenvironments varying between alternate cyclopropene groups in the same polypeptides (which could lead to unequal reactivity between different cyclopropene group(s) at different locations in the polypeptide) are advantageously avoided.

Suitably the polypeptide comprises two cyclopropene groups; suitably the polypeptide comprises three cyclopropene groups; suitably the polypeptide comprises four cyclopropene groups; suitably the polypeptide comprises five cyclopropene groups; suitably the polypeptide comprises ten cyclopropene groups or even more.

In principle multiple cyclopropene containing amino acids could be incorporated by the same or by different orthogonal codons/orthogonal tRNA pairs. Suitably multiple cyclopropene containing amino acids are incorporated by insertion of multiple amber codons (together with a suitable orthogonal tRNA synthetase as described herein).

Suitably the amino acid comprising cyclopropene is a lysine amino acid.

In one embodiment, the tRNA may be from one species such as Methanosarcina barkeri, and the tRNA synthetase may be from another species such as Methanosarcina mazei. In another embodiment, the tRNA may be from a first species such as Methanosarcina mazei and the tRNA synthetase may be from a second species such as Methanosarcina barkeri. When an orthogonal pair comprises tRNA and tRNA synthetase from different species, it is always with the proviso that the orthogonal pair work effectively together, i.e. that the tRNA synthetase will effectively aminoacylate the tRNA with the amino acid of interest. Equally, mutant tRNAs or mutant tRNA synthetases may be used provided they have the correct aminoacylation activity. Although it is an advantage of the invention that the cyclopropene containing amino acids of the invention are effectively charged onto tRNAs using the wild type PylRS synthetase, it is equally possible to use mutant PylRS synthetases provided they are effective in charging the tRNA with the cyclopropene containing amino acid of the invention. Most suitably, orthogonal pairs comprise the tRNA and a tRNA synthetase from the same species.

Of course it is possible to evolve the wild type synthetase (or another variant of a suitable synthetase) to make a synthetase for incorporation of the cyclopropene amino acid of the invention which may have increased efficiency. In principle, a Pyl derived tRNA synthetase might be of use. Chimeric tRNA synthetases may be produced provided that the charging/acylation part of the tRNA synthetase molecule is based on or derived from Pyl tRNA synthetase. In other words, the anti-codon part of the tRNA molecule may be varied according to operator choice, for example to direct tRNA in recognising an alternate codon such as a sense codon, a quadruplet codon, an amber codon or another "stop" codon. However, the functional acylation/charging part of the tRNA molecule should be conserved in order to preserve the cyclopropene charging activity. Either of the Methanosarcina barkeri and Methanosarcina mazei species pyrrolysine tRNA synthetases is suitable.

Both the Methanosarcina barkeri and Methanosarcina mazei tRNAs are suitable. In any case these tRNAs differ by only one nucleotide. This one nucleotide difference has no impact on their activity in connection with cyclopropene containing amino acids. Therefore, either tRNA is equally applicable in the present invention.

The tRNA used may be varied such as mutated. In all cases, any such variants or mutants of the Pyl tRNA should always retain the capacity to interact productively with the tRNA synthetase used to charge the tRNA with the cyclopropene containing amino acid.

Genetic Incorporation and Polypeptide Production

In the method according to the invention, said genetic incorporation preferably uses an orthogonal or expanded genetic code, in which one or more specific orthogonal codons have been allocated to encode the specific amino acid residue with the cyciopropene group so that it can be genetically incorporated by using an orthogonal tRNA synthetase/tRNA pair. The orthogonal tRNA synthetase/tRNA pair can in principle be any such pair capable of charging the tRNA with the amino acid comprising the cyciopropene group and capable of incorporating that amino acid comprising the cyciopropene group into the polypeptide chain in response to the orthogonal codon. The orthogonal codon may be the orthogonal codon amber, ochre, opal or a quadruplet codon. The codon simply has to correspond to the orthogonal tRNA which will be used to carry the amino acid comprising the cyciopropene group. Preferably the orthogonal codon is amber.

It should be noted that many of the specific examples shown herein have used the amber codon and the corresponding tRNA/tRNA synthetase. As noted above, these may be varied. Alternatively, in order to use other codons without going to the trouble of using or selecting alternative tRNA/tRNA synthetase pairs capable of working with the amino acid comprising the cyclopropene group, the anticodon region of the tRNA may simply be swapped for the desired anticodon region for the codon of choice. The anticodon region is not involved in the charging or incorporation functions of the tRNA nor recognition by the tRNA synthetase, so such swaps are entirely within the ambit of the skilled operator. Thus in some embodiments the anticodon region of the tRNA used in the invention, such as MbtRNACUA or MmtRNACUA, may be exchanged, i.e. a chimeric tRNACUA may be used such that the anticodon region is swapped to recognise an alternate codon. The cyclopropene containing amino acid may then be incorporated in response to a different orthogonal codon as discussed herein, including ochre, opal or a quadruplet codon, and the nucleic acid encoding the polypeptide into which the cyclopropene amino acid is to be incorporated is correspondingly mutated to introduce the cognate codon at the point of incorporation of the cyclopropene amino acid. Most suitably the orthogonal codon is amber.
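As a rough illustration of the anticodon swap described above, the anticodon matching a chosen codon is its reverse complement. The following Python sketch derives it for the amber, ochre and opal codons and for an arbitrary quadruplet codon. The function name `anticodon_for` and the example quadruplet AGGA are illustrative assumptions and do not come from this disclosure; strict Watson-Crick pairing is assumed and wobble pairing is ignored.

```python
# Illustrative sketch only: derive the anticodon (written 5'->3') that pairs
# with a given mRNA codon by strict Watson-Crick pairing (wobble is ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the reverse complement of an RNA codon (triplet or quadruplet)."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style input as well
    return "".join(COMPLEMENT[base] for base in reversed(codon))


# The quadruplet AGGA is an arbitrary example chosen for this sketch.
for name, codon in [("amber", "UAG"), ("ochre", "UAA"),
                    ("opal", "UGA"), ("quadruplet", "AGGA")]:
    print(f"{name}: codon {codon} -> anticodon {anticodon_for(codon)}")
```

The amber case gives the CUA anticodon of the amber suppressor tRNACUA referred to throughout.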

Thus alternative orthogonal tRNA synthetase/tRNA pairs may be used if desired. Preferably the orthogonal synthetase/tRNA pair are Methanosarcina barkeri MS pyrrolysine tRNA synthetase (MbPylRS) and its cognate amber suppressor tRNA (MbtRNACUA).

The Methanosarcina barkeri PylT gene encodes the MbtRNACUA tRNA.

The Methanosarcina barkeri PylS gene encodes the MbPylRS tRNA synthetase protein. When particular amino acid residues are referred to using numeric addresses, the numbering is taken using MbPylRS (Methanosarcina barkeri pyrrolysyl-tRNA synthetase) amino acid sequence as the reference sequence (i.e. as encoded by the publicly available wild type Methanosarcina barkeri PylS gene Accession number Q46E77):

MDKKPLDVLI SATGLWMSRT GTLHKIKHYE VSRSKIYIEM ACGDHLVVNN SRSCRTARAF RHHKYRKTCK RCRVSDEDIN NFLTRSTEGK TSVKVKVVSA PKVKKAMPKS VSRAPKPLEN PVSAKASTDT SRSVPSPAKS TPNSPVPTSA PAPSLTRSQL DRVEALLSPE DKISLNIAKP FRELESELVT RRKNDFQRLY TNDREDYLGK LERDITKFFV DRDFLEIKSP ILIPAEYVER MGINNDTELS KQIFRVDKNL CLRPMLAPTL YNYLRKLDRI LPDPIKIFEV GPCYRKESDG KEHLEEFTMV NFCQMGSGCT RENLESLIKE FLDYLEIDFE IVGDSCMVYG DTLDIMHGDL ELSSAVVGPV PLDREWGIDK PWIGAGFGLE RLLKVMHGFK NIKRASRSES YYNGISTNL.

If required, the person skilled in the art may adapt MbPylRS tRNA synthetase protein by mutating it so as to optimise for the cyclopropene amino acid to be used. The need for mutation (if any) depends on the cyclopropene amino acid used. An example where the MbPylRS tRNA synthetase may need to be mutated is when the cyclopropene amino acid is not processed by the MbPylRS tRNA synthetase protein. Such mutation (if desired) may be carried out by introducing mutations into the MbPylRS tRNA synthetase, for example at one or more of the following positions in the MbPylRS tRNA synthetase: M241, A267, Y271, L274 and C313.

tRNA Synthetases

The tRNA synthetase of the invention may be varied. Although specific tRNA synthetase sequences may have been used in the examples, the invention is not intended to be confined only to those examples.

In principle any tRNA synthetase which provides the same tRNA charging (aminoacylation) function can be employed in the invention.

For example the tRNA synthetase may be from any suitable species such as from archaea, for example from Methanosarcina barkeri MS; Methanosarcina barkeri str. Fusaro; Methanosarcina mazei Go1; Methanosarcina acetivorans C2A; Methanosarcina thermophila; or Methanococcoides burtonii. Alternatively the tRNA synthetase may be from bacteria, for example from Desulfitobacterium hafniense DCB-2; Desulfitobacterium hafniense Y51; Desulfitobacterium hafniense PCP1; Desulfotomaculum acetoxidans DSM 771.

Exemplary sequences from these organisms are the publicly available sequences. The following examples are provided as exemplary sequences for pyrrolysine tRNA synthetases:

>M.barkeriMS/1-419/

Methanosarcina barkeri MS

VERSION Q6WRH6.1 GI:74501411

MDKKPLDVLISATGLWMSRTGTLHKIKHHEVSRSKIYIEMACGDHLVVNNSRSCRTA

RAFRHiI YRKTCKRCRVSDED NFLTRSTESKNSVK ¾WSAPK KKAMPKSVSRAP KPLENSVSAKASTNTSRSVPSPAKSTPNSSVPASAPAPSLTRSQLDRVEALLSPEDKISL NMAKPFRELEPEL RRKNDFQRLYTNDREDYLGKLERDITKFF ¾RGFLEIKSPILIP AEWERMGINN D^LSKQIFRV r DKNLCLRPMLAPTLYNYLRKLDRILPGPIKIFEVGPC YRKESDGKEHLEEF MWFCQMGSGCTOENLEALIKEFLDYLEIDFEIVGDSCMVYGD TLDIMHGDLFi.SSAWGPVSLDREWGIDKPWIGAGFGLERLLKVMHGF NIKRASRS ESYYN GISTNL

>M.barkeriF/1-419/

Methanosarcina barkeri str. Fusaro

VERSION YP_304395.1 GI:73668380

MDKKPLDV1JSATGIA¥MSRTG^^

RAFRHH KYRKTCKRCRVSDEDMNFLTRSTEGKTSVKVKA^APKVK^

KPLENP\¾AKASTDTSRS SPAKSTPNSP TSAPAPSLTRSQLDR\¾ALLSPEDKISL NIAKPFRELESELV RRl¾ T DFQRLY'rNDREDYLGKLERDIT F DRDFLEIKSPILrPA EY T^RMGINNDTELSKQIFRVDKNirJ J RPMI PT ^LRKLDRILPDPIKIFEVGPCY RKESDGKEHLEEFTMVNFCQMGSGCTRENLESLIKEFLDYLEIDFEIVGDSCM GDT LDIMHGDI^LSSAWGPWLDREWGIDKPWIGAGFGLERLLK ^HGFKNIKRASRSE SYY GIST L

>M.mazei/1-454

Methanosarcina mazei Go1

VERSION NP_633469.1 GI:21227547

MDKKPLNTLISATGIAYMSRTGTIHKI HHEVSRSKIYIEMACGDFILVVNNSRSSRTAR ALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKK IP

PKPLEN^AAQAQPSGS FSPAIPVSTQESVSVPASVSTSISSISTGATASAI^GNTNPI TSMSAPVQASAPALTT SQTORI-EVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIY AEERENYLGKLERFJTRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVD NF CLRPMI-APNLYNYLRKLDRALPDPIKIFEIGPCTRKESDGKEHLE^

CTRENI SIITDFLNFILGIDFKWGDSCI R^ODTLDVMFIGDLELSSAWGPIPLDREW GIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL

>M.acetivorans/1-443

Methanosarcina acetivorans C2A

VERSION NP_615128.2 GI:161484944

MDKKPLDTLISATGLWMSRTCMffl

ALRHHK^ RKTCRFICRVSDEDINNFLTKTSEEKTT i T ^SAPR KAMPi<SVARAP KPLEATAQVPLSGSKPAPATTVSAPAQAPAPSTGSASATSASAQRMANSAAAPAAPVPT SAPALTKGQLDRLEGLLSPKDEISLDSEKPFRELESELLSRRKKDLKRIYAEERENYLG

KLEREIT FWDRGFLEIKSPILIPAE ERMGINSDTELSKQVFRIDKNFCLRPMLAPN LYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLEAII TEFLNHLGIDFEIIGDSCMVYGNTLDVMHDDLELSSAVVGPVPLDREWGIDKPWIGA GFGLERLLKV^HGFKNIKRAARSESYYNGISTNL

>M.thermophila/1-478

Methanosarcina thermophila, VERSION DQ017250.1 GI:67773308

MDKKPLNTLISATGLW

RALRHHKYRKICKHCRVSDEDLNKFLTRimDK^

PKPLENTAPVQTLPSESQPAPI PISASI APASTS fAPAPASI APAPASI APASAST TISTSAMPASTSAQGTTKFNYISGGFPRPIPVQASAPALTKSQIDRLQGLLSPKDEISLD S GTPFRKLESELLSRRRKDLKQIYi EEREHYLGKLEREITKi^\¾RGFLEIKSPILIPMEYI ERMGIDNDKF .SKQIFRVDNNFCLRPMLAPNL^m.RKLNRALPDPIKIFEIGPCYRK ESDGKEHLEEFI LNICQMGSGCTRENLEAIIKDFLDYLGIDFEIVGDSCMVYGDTLD

\TV1HGDLELSSAWGP MDRDWGINKPWIGAGFGLERLLK TV1HNFKNIKRASRSES YYNGISTNL

>M.burtonii/1-416

Methanococcoides burtonii DSM 6242, VERSION YP_566710.1 GI:91774018

MEKQLLDVL ΈLNGVWLSRSGLLHGIRNFEΠ KHIHIETDCGAllFT I^N τ SRSS SAR SLRHNKYRKPCKRCRPADEQIDRFVKKTFKEKRQWS SSPKKHWKKPK\ AVIKSFS

ISTPSPKEASVSNSIPTPSISVVKDEVKVPEVKYITSQIERLK^MSPDDKIPIQDEL PEF KA^ ^ EKELIQRRRDDIJ K ^EDREDRLGKLERDITEFWDRGFLEIKSPIMIPFEYIER MGIDM ) DHLNKQIFRVDESMCLRPMm 3 CLYNYLRKLDKVXPDPIRIFEIGPCTRKES DGSSHLF ,F I ¾TCQMGSGCTRENMEALIDEFLEHLGIEYEIEADNC I TGDTIDI MHGDLELSSA\^OPrPLDREWGVNKP\ GAGFGLERLLKVRHNY r raiRRASRSELYY NGINTNL

>D.hafniense_DCB-2/1-279

Desulfitobacterium hafniense DCB-2

VERSION YP_002461289.1 GI:219670854

MSSFWT¾VQYQRLKELNASGEQI^MGFSDALSRDRAFQGIEHQLMSQGKRHLEQLR

WKH PALLELEEGLAKALHQQ KKCLRPMLAPNLY LWRELERLWDKPIRIFEIGTCYRKESQGAQHLNEFmLNLTEL GTPLEERHQRLEDMARWVLEAAGI

PHFLDEKWErVDPWVGLGFGLERLLMIllEGTQHVQSMARSLSYLDGVRLNIN

>D.hafniense_Y51/1-312

Desulfitobacterium hafniense Y51

VERSION YP_521192.1 GI:89897705

MDRIDHTDSKFVQAGETTVLPATT FLTO

MGFSDALSRDRAFQGIEHQLMSQGKRHLEQLRTYKHRPALLELEEGLAKALHQQGF

VQV^TPTIITKSALAKMTIGEDHPLF

DKPlRIFEIGTCTRKESQGAQHLNEFTMLNLTELGTPLEERHQRLEDA AR\WLEAi\Gl REFEIΛ ESS WGDWD ^ IKGDLEI SGAMGPHFLJ)EK /VΈIVDPm' τ GLGFGLERLL MIREGTQHVQSMARSLSYLDGVRLNIN

>D.hafniensePCP1/1-288

Desulfitobacterium hafniense

VERSION AY692340.1 GI:53771772

MFLTRRDPPLSSFVVTi V QYQRLKELNASGEQLEMGFSDALSRDRAFQGIEHQLMSQG

KI HLEQLRTVXHRPALLELEEKL\KALHQQG QVVTPTnTKSALAiaViTIGEDHPLF SQWWLDGKKCLRPMIAPNLYTLWRELERLWDKPIRIFEIGTCYRKESQGAQH LNEF TMLNLTELGTPLEERHQRLEDMAI WVLEAAGII EFELVTESSVVYGD^ WMKGDLE LASGAMGPHFLDF iWEIFDP GLGFGLERLLMIREGTQHVQSMARSLSYLDGVRL NIN

>D.acetoxidans/1-277

Desulfotomaculum acetoxidans DSM 771

VERSION YP_003189614.1 GI:258513392

MSFL TVSQQKRLSELNASEEEKNMSFSSTSD REAAYKRVEMRLINESKQRLNKLRH ETRPAICALENRLAAALRGAGFVQVATPVIL^

LRPMIAPNLYYILKDLLRLV\¾KPWIFEIGSCFRKESQGSNHLNEFTMLNLVEWGL PE EQRQK ISELAKL ^ IDETGIDFΛΉLEί: ES GET M^I DIELGSGALGPHFLD GRWGWGPVVVGIGFGLERLLMVEQGGQNVRSMGKSLTYLDGVRLNI

When the particular tRNA charging (aminoacylation) function has been provided by mutating the tRNA synthetase, then it may not be appropriate to simply use another wild-type tRNA synthetase sequence, for example one selected from the above. In this scenario, it will be important to preserve the same tRNA charging (aminoacylation) function. This is accomplished by transferring the mutation(s) in the exemplary tRNA synthetase into an alternate tRNA synthetase backbone, such as one selected from the above.

In this way it should be possible to transfer selected mutations to corresponding tRNA synthetase sequences such as corresponding pylS sequences from other organisms beyond exemplary M. barkeri and/or M. mazei sequences.

Target tRNA synthetase proteins/backbones may be selected by alignment to known tRNA synthetases such as exemplary M. barkeri and/or M. mazei sequences.

This subject is now illustrated by reference to the pylS (pyrrolysine tRNA synthetase) sequences but the principles apply equally to the particular tRNA synthetase of interest.

For example, an alignment of all PylS sequences may be prepared. These can have a low overall % sequence identity. Thus it is important to study the sequence, such as by aligning the sequence to known tRNA synthetases (rather than simply to use a low sequence identity score), to ensure that the sequence being used is indeed a tRNA synthetase. Thus, when sequence identity is being considered, suitably it is considered across the sequences of the examples of tRNA synthetases as above. Suitably the % identity may be as defined from an alignment of the above sequences.

It may be useful to focus on the catalytic region. The aim of this is to provide a tRNA synthetase catalytic region from which a high % identity can be defined to capture/identify backbone scaffolds suitable for accepting transplanted mutations in order to produce the same tRNA charging (aminoacylation) function, for example new or unnatural amino acid recognition. Thus, when sequence identity is being considered, suitably it is considered across the catalytic region. Suitably the % identity may be as defined from the catalytic region.

'Transferring' or 'transplanting' mutations onto an alternate tRNA synthetase backbone can be accomplished by site directed mutagenesis of a nucleotide sequence encoding the tRNA synthetase backbone. This technique is well known in the art. Essentially the backbone pylS sequence is selected (for example using the active site alignment discussed above) and the selected mutations are transferred to (i.e. made in) the corresponding/homologous positions.

When particular amino acid residues are referred to using numeric addresses, unless otherwise apparent, the numbering is taken using MbPylRS (Methanosarcina barkeri pyrrolysyl-tRNA synthetase) amino acid sequence as the reference sequence (i.e. as encoded by the publicly available wild type Methanosarcina barkeri PylS gene Accession number Q46E77):

MDKKPLDVLI SATGLWMSRT GTLHKIKHYE VSRSKIYIEM ACGDHLVVNN SRSCRTARAF RHHKYRKTCK RCRVSDEDIN NFLTRSTEGK TSVKVKVVSA PKVKKAMPKS VSRAPKPLEN PVSAKASTDT SRSVPSPAKS TPNSPVPTSA PAPSLTRSQL DRVEALLSPE DKISLNIAKP FRELESELVT RRKNDFQRLY TNDREDYLGK LERDITKFFV DRDFLEIKSP ILIPAEYVER MGINNDTELS KQIFRVDKNL CLRPMLAPTL YNYLRKLDRI LPDPIKIFEV GPCYRKESDG KEHLEEFTMV NFCQMGSGCT RENLESLIKE FLDYLEIDFE IVGDSCMVYG DTLDIMHGDL ELSSAVVGPV PLDREWGIDK PWIGAGFGLE RLLKVMHGFK NIKRASRSES YYNGISTNL.

This is to be used as is well understood in the art to locate the residue of interest. This is not always a strict counting exercise: attention must be paid to the context or alignment. For example, if the protein of interest is of a slightly different length, then location of the correct residue in that sequence corresponding to (for example) L266 may require the sequences to be aligned and the equivalent or corresponding residue picked, rather than simply taking the 266th residue of the sequence of interest. This is well within the ambit of the skilled reader.

Notation for mutations used herein is the standard in the art. For example L266M means that the amino acid corresponding to L at position 266 of the wild type sequence is replaced with M.
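By way of illustration only, the notation can be read mechanically. The Python sketch below parses a label such as L266M and applies it to a sequence, first checking that the stated wild type residue is actually present at the 1-based position. The function names and the short toy sequence are assumptions made for this sketch and are not part of the disclosure.

```python
import re


def parse_mutation(label: str):
    """Split a label such as 'L266M' into (wild type residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutation(sequence: str, label: str) -> str:
    """Return a copy of sequence carrying the substitution described by label."""
    wild_type, position, replacement = parse_mutation(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Positions are 1-based, so residue N is at string index N - 1.
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + replacement + sequence[position:]


# Toy sequence invented for this example (not a real synthetase).
print(apply_mutation("MKLAY", "L3M"))
```

The wild type check is the useful part: it fails loudly when a label is applied to a sequence whose numbering differs from the reference.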

The transplantation of mutations between alternate tRNA synthetase backbones is now illustrated with reference to exemplary M. barkeri and M. mazei sequences, but the same principles apply equally to transplantation onto or from other backbones.

For example Mb AcKRS is an engineered synthetase for the incorporation of AcK

Parental protein/backbone: M. barkeri PylS

Mutations: L266V, L270I, Y271F, L274A, C313F

Mb PCKRS: engineered synthetase for the incorporation of PCK

Parental protein/backbone: M. barkeri PylS

Mutations: M241F, A267S, Y271C, L274M

Synthetases with the same substrate specificities can be obtained by transplanting these mutations into M. mazei PylS. Thus the following synthetases may be generated by transplantation of the mutations from the Mb backbone onto the Mm tRNA synthetase backbone:

Mm AcKRS introducing mutations L301V, L305I, Y306F, L309A, C348F into M. mazei PylS,

and

Mm PCKRS introducing mutations M276F, A302S, Y306C, L309M into M. mazei PylS.

Full length sequences of these exemplary transplanted mutation synthetases are given below.

>Mb_PylS/1-419

MDTGCPLDVLISATGLWMS TGTLHKIKHHEVSRSK1YIEMACGDHLVVNNSRSCRTA

KPLFA T SVSAKAST TSRS SPA1<STP SS ASAPAPSLTRSQLDRVFAL

MAKPFRELEPELVTRRKNDFQRLYTNDI EDYLGKLERDITKFFVDRGFLEIKSPILIP AEYVERMGINNDTELSKQIFRVDKNLCLRPMIAPTLYN^LRKLDRILPGPIKIFEVGPC YRKESDGKEHLEEFfMV FCQMGSGCTRENLEALIKEFLDYLEIDFErVGDSCMVYGD TLDIMHGDLELiSSAWGPVSLDPJSWGID

ESYYNGISTNL

>Mb_AcKRS/1-419

MDKKPLD\XISATGL\\^SRTGTLHKIKHHEVSRSKrnEMACGDHLVVN SRSCRTA RAFRHHKYRI rcKRCRVSGEDmNFLTRSTESKNSVKVR SA^

KPI^ SVSAKASTNTSRS SPAKSTP SS ASAPAPSLTRSQLDR_VEALLSPEDKISL NMAKPFRELEPELVTRRKNDFQRLYTO

AE ^¾RMGINNDTELSKQIFRVDKNLGLRPM ' T APTIFNYARKLDRILPGPIKIFEVGPC YRKESDGKEHLEEFrMVNFFQMGSGCTJ ENLEALIKEFLDYLEIDFEIVGDSCMVYGD TLJ)IMHGDLFJ J SSAWGPVSLDREWGIDKPWIGAGFGLERLLK 7 ]V1HGFKNIKRASRS ESYYNGISTNL

>Mb_PCKRS/1-419

MDKKPLD^JSATCn.WMSRTGTIJ-IKIKHHEVSRSKmEMACXJDHIA SRSCRTA

!^ ^Ί^!Hκvi· ΓS ' CKRCκ^ ' S! ! !^^·ί ;!ΈSΊΈ8κ^s^ ' κ\ R\ VSAPKYKKA. KSVSRAP

TG > LENSVSAKAS^^^SRSWSPA S^ NSSVPASAPAPSLTRSQLDR\ΈALLSPEDKISL MAKPFRELEPEL\ RRKNDFQRLYTNDS¾EDYLGKLERD1TKFFVDRGFLEIKSPILIP AF^T^RFGINNDTELSKQIFRVDKNirJ.RPMLSPTir^MRKLDRILPGPIKIFEVGPC YRS<ESDGKEHLEEin V FCQMGSGCTRENLEALIKEFLDYLEIDFElVGDSCMVYGD TLDIMHGDI ..SSAWGPVSLDREWGIDKPWIGAGFGI^RLLK^IHGFKNIKRASRS ESYYNGISTNL

>Mm_PylS/1-454

MDKKPLNTXISATGLWMSRTG^HKIKHHEVSRSKTOEMACGDHLVVNNSRSSRTAR

ALRHHKYRKTCKRCRVSDEDLN FLT^^

PKPLE TEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKG T PI TSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRR KDLQQIY AEEIENYLGKLEREITIFFVDRGFLEIKSPILIPLEYIER^IGIDNDTELSKQIFRVDK F CLRPMLAPNLYAm.RKLDRALPDPIKIFEIGPCJV^KESDG EHLEEFT'ILNFCQMGSG CTRE LESHTDFL HLGIDFKIVGDSCMVTGDTLDV'MHGDLELSSAVVGPIPLDREW GIDKP\ GAGFGLERLLKyKHDFKNIKRAARSESYYNGISTNL

>Mm_AcKRS/1-454

MDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSmiElvLCGDHLVVNNSRSSRTAR ALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKAT<WSAPTRTKKAMPKSVARA PKPLENTEAAQAQPSGSKFSPAIPVSTQE^^

TSMSAPVQASAPALTTCSQTORLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQ QIY AEERENYLGKLEREITRFFVDRGFLEIKSPILIPI.EYIERMGIDNDTELSKQIFRVDKN F CLRPMVAPNIFNYARKLDRALPDPIKIFEIGPCTRKESDGKEHLEEFTMLNFFQMGSG CTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREW GIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL

>Mm_PCKRS/1-454

MDKKPL TLISATC

ALRHHK^RKTCKRCRYSDEDLNKFLTKANFJ)QTS ^K^SAPTRTKKAMPKSVARA PKPLENTEAAQAQPSGSKFSP PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPI TSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQrir AEERENYLGKLEREI RFFVDRGFLEIKSPILIPLEYIERFGIDNDTELSKQIFRVDKNFC

LRl¾lLSP LCNYMRKLDRALPDPIKIFEIGPi¾¾KESDGKEHLEEFTMLNFCQMGSGC TRENLESIITDFLNHLGIDFK1VGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGI DKPWTGAG FGLE LL VKH DFKNIKR A ARSESYYNG ISTNL

The same principle applies equally to other mutations and/ or to other backbones.

Transplanted polypeptides produced in this manner should advantageously be tested to ensure that the desired function/substrate specificities have been preserved.
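The transplantation described above can be sketched as a position lookup through a pairwise alignment. In the Python sketch below, the function names and the short gapped toy alignment are invented for illustration; in practice the gapped strings would come from an alignment of the real source and target synthetase sequences prepared with an alignment tool of the operator's choice. The sketch refuses to transplant a mutation when the aligned target residue differs from the source wild type residue, since such cases call for manual inspection.

```python
import re


def column_map(aligned_src: str, aligned_dst: str) -> dict:
    """Map 1-based source positions to (1-based target position, target residue)."""
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    src_pos = dst_pos = 0
    for src_res, dst_res in zip(aligned_src, aligned_dst):
        if src_res != "-":
            src_pos += 1
        if dst_res != "-":
            dst_pos += 1
        if src_res != "-" and dst_res != "-":
            mapping[src_pos] = (dst_pos, dst_res)
    return mapping


def transplant(labels, aligned_src: str, aligned_dst: str) -> list:
    """Renumber substitution labels (e.g. 'L266V') onto the target backbone."""
    mapping = column_map(aligned_src, aligned_dst)
    renumbered = []
    for label in labels:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            raise ValueError(f"not a substitution label: {label!r}")
        wild_type, position, replacement = match.groups()
        if int(position) not in mapping:
            raise ValueError(f"{label}: position aligns to a gap in the target")
        dst_pos, dst_res = mapping[int(position)]
        if dst_res != wild_type:
            raise ValueError(f"{label}: target has {dst_res} at position {dst_pos}")
        renumbered.append(f"{dst_res}{dst_pos}{replacement}")
    return renumbered


# Toy alignment: the target carries a two-residue insertion after position 2.
print(transplant(["L3V", "Y5F"], "MK--LAY", "MKGSLAY"))
```

For the Mb and Mm examples listed above, the renumbered positions differ by a constant 35 (for example 266 becomes 301 and 241 becomes 276), which is what an alignment of the 419 and 454 residue sequences would be expected to yield across the catalytic region.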

Polynucleotides encoding the polypeptide of interest for the method described above can be incorporated into a recombinant replicable vector. The vector may be used to replicate the nucleic acid in a compatible host cell. Thus in a further embodiment, the invention provides a method of making polynucleotides of the invention by introducing a polynucleotide of the invention into a replicable vector, introducing the vector into a compatible host cell, and growing the host cell under conditions which bring about replication of the vector. The vector may be recovered from the host cell. Suitable host cells include bacteria such as E. coli.

Preferably, a polynucleotide of the invention in a vector is operably linked to a control sequence that is capable of providing for the expression of the coding sequence by the host cell, i.e. the vector is an expression vector. The term "operably linked" means that the components described are in a relationship permitting them to function in their intended manner. A regulatory sequence "operably linked" to a coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequences.

Vectors of the invention may be transformed or transfected into a suitable host cell as described to provide for expression of a protein of the invention. This process may comprise culturing a host cell transformed with an expression vector as described above under conditions to provide for expression by the vector of a coding sequence encoding the protein, and optionally recovering the expressed protein.

The vectors may be, for example, plasmid or virus vectors provided with an origin of replication, optionally a promoter for the expression of the said polynucleotide and optionally a regulator of the promoter. The vectors may contain one or more selectable marker genes, for example an ampicillin resistance gene in the case of a bacterial plasmid. Vectors may be used, for example, to transfect or transform a host cell.

Control sequences operably linked to sequences encoding the protein of the invention include promoters/enhancers and other expression regulation signals. These control sequences may be selected to be compatible with the host cell in which the expression vector is designed to be used. The term promoter is well-known in the art and encompasses nucleic acid regions ranging in size and complexity from minimal promoters to promoters including upstream elements and enhancers.

Another aspect of the invention is a method, such as an in vitro method, of incorporating the cyclopropene containing amino acid(s) genetically and site-specifically into the protein of choice, suitably in a eukaryotic cell. One advantage of incorporating genetically by said method is that it obviates the need to deliver the proteins comprising the cyclopropene amino acid into a cell once formed, since in this embodiment they may be synthesised directly in the target cell. The method comprises the following steps:

i) introducing, or replacing a specific codon with, an orthogonal codon such as an amber codon at the desired site in the nucleotide sequence encoding the protein

ii) introducing an expression system of orthogonal tRNA synthetase/tRNA pair in the cell, such as a pyrrolysyl-tRNA synthetase/tRNA pair

iii) growing the cells in a medium with the cyclopropene containing amino acid according to the invention.

Step (i) entails introducing, or replacing a specific codon with, an orthogonal codon such as an amber codon at the desired site in the genetic sequence of the protein. This can be achieved by simply introducing a construct, such as a plasmid, with the nucleotide sequence encoding the protein, wherein the site where the cyclopropene containing amino acid is desired to be introduced/replaced is altered to comprise an orthogonal codon such as an amber codon. This is well within the ability of the person skilled in the art and examples of such are given herein below.

Step (ii) requires an orthogonal expression system to specifically incorporate the cyclopropene containing amino acid at the desired location (e.g. the amber codon). Thus a specific orthogonal tRNA synthetase such as an orthogonal pyrrolysyl-tRNA synthetase and a specific corresponding orthogonal tRNA, which are together capable of charging said tRNA with the cyclopropene containing amino acid, are required. Examples of these are provided herein.
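As a minimal sketch of step (i), assuming the coding sequence is available as an in-frame DNA string, the codon at the chosen site can be replaced by the amber codon TAG as follows. The function name and the twelve-nucleotide toy sequence are hypothetical and serve only to illustrate the replacement; they are not taken from the examples of this disclosure.

```python
AMBER = "TAG"  # amber stop codon in DNA notation (UAG in the mRNA)


def introduce_amber(cds: str, codon_number: int) -> str:
    """Replace the codon at 1-based position codon_number with the amber codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    if not 1 <= codon_number <= len(cds) // 3:
        raise ValueError("codon number is outside the coding sequence")
    start = 3 * (codon_number - 1)  # index of the first base of the codon
    return cds[:start] + AMBER + cds[start + 3:]


# Toy open reading frame: Met-Lys-Gly-stop; the Lys codon is replaced.
print(introduce_amber("ATGAAAGGCTAA", 2))
```

In a real construct the same substitution would be made by site directed mutagenesis of the plasmid carrying the gene of interest.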

Protein Expression and Purification

Host cells comprising polynucleotides of the invention may be used to express proteins of the invention. Host cells may be cultured under suitable conditions which allow expression of the proteins of the invention. Expression of the proteins of the invention may be constitutive such that they are continually produced, or inducible, requiring a stimulus to initiate expression. In the case of inducible expression, protein production can be initiated when required by, for example, addition of an inducer substance to the culture medium, for example dexamethasone or IPTG.

Proteins of the invention can be extracted from host cells by a variety of techniques known in the art, including enzymatic, chemical and/or osmotic lysis and physical disruption.

Proteins of the invention can be purified by standard techniques known in the art such as preparative chromatography, affinity purification or any other suitable technique.

FURTHER ADVANTAGES

Yu et al join tetrazoles to cyclopropene amino acids in polypeptides. Yu et al require the use of ultraviolet irradiation in order to photoactivate their conjugation groups. Their best reaction rates were achieved with 302 nm UV irradiation.

However, this type of UV irradiation has high ionisation potential. This means that the molecules and/or cells upon which the radiation is directed are likely to be damaged by this UV energy. By contrast, the conjugations of the present invention do not require any UV step for photoactivation. Even when Yu et al use a less damaging source of UV irradiation (e.g. 365 nm UV irradiation), the observed reaction rates are considerably slower than those provided by the present invention. Thus, even if the UV irradiation is adjusted in Yu et al in an attempt to avoid or reduce some of the drawbacks associated with UV treatment, the same laborious irradiation step must still be carried out and slower reaction rates are achieved. It is an advantage of the present invention that UV irradiation can be omitted, and that excellent reaction rates are obtained even without photoactivation.

It is an advantage of the cyclopropene amino acids of the present invention that they are easy to manufacture. For example, the number of steps in the synthetic pathway is advantageously few.

It should be noted that the prior art cyclopropene amino acid of Yu et al contains an amide group. This amide bond is a potential substrate for peptidases. Peptidase action on the amide bond of the prior art cyclopropene amino acid would cleave the cyclopropene part of the molecule off the polypeptide. This is clearly a disadvantage. By contrast, it is an advantage of the carbamate linked cyclopropene groups of the present invention that carbamate bonded cyclopropene is not a target for peptidases. Prior art based techniques rely on tetrazole chemistry for conjugation. In contrast, the present invention teaches the use of advantageous tetrazine chemistry. It is an advantage of the carbamate bonded cyclopropene amino acids of the present invention that they enable the use of the wild type PylRS synthetase. Making use of the wild type synthetase is advantageous as it involves less labour by alleviating the need to prepare mutant synthetases. In addition, mutant synthetases do not always aminoacylate tRNA to the same level as wild type tRNA synthetases. In other words, the mutations required to be made to a synthetase in order to handle prior art cyclopropene amide bonded amino acids can cause a loss of efficiency of aminoacylation. In contrast, it is demonstrated herein that aminoacylation using the wild type synthetase with the amino acid of the present invention is a very efficient process, which is a further advantage over prior art techniques.

Further particular and preferred aspects are set out in the accompanying independent and dependent claims. Features of the dependent claims may be combined with features of the independent claims as appropriate, and in combinations other than those explicitly set out in the claims.

Where an apparatus feature is described as being operable to provide a function, it will be appreciated that this includes an apparatus feature which provides that function or which is adapted or configured to provide that function.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will now be described further, with reference to the accompanying drawings, in which:

Figure 1 SORT-M enables proteome tagging and labelling at diverse codons, with diverse chemistries, and in genetically targeted cells and tissues. (a) Proteome tagging via SORT (stochastic orthogonal recoding of translation) uses an orthogonal aminoacyl-tRNA synthetase/tRNA pair. The pyrrolysyl-tRNA synthetase/tRNA pair is used in this study. This synthetase (and its previously evolved active-site variants) recognizes a range of unnatural amino acids (yellow star, and yellow hexagon), does not aminoacylate endogenous tRNAs, but efficiently aminoacylates its cognate tRNA without regard to anticodon identity; PyltRNA is not a substrate for endogenous aminoacyl-tRNA synthetases. Orthogonal pyrrolysyl-tRNA synthetase/tRNAXXX pairs (XXX indicates choice of anticodon, yellow) in which the anticodon has been altered compete for the decoding of sense codons (dark blue and pink) via a pathway that is orthogonal to that used by natural synthetases and tRNAs (dark blue and pink) to direct natural amino acids. SORT allows the incorporation of diverse chemical groups into the proteome, in response to diverse codons. Since there is no competition at the active site of the orthogonal synthetase, starvation and minimal media are not required. In addition the expression pattern of the orthogonal proteome tagging system can be genetically directed, allowing tissue specific proteome labelling. Selective pressure incorporation approaches are shown in Supplementary Fig. 1 for comparison to SORT. (b) The combination of encoding amino acids (1-3) across the proteome via SORT and chemoselective modification of 3 with tetrazine probes (4a-g, 5, 6 and 7) allows detection of labelled proteins via SORT-M (stochastic orthogonal recoding of translation and chemoselective modification). Amino acid structures: Nε-((tert-butoxy)carbonyl)-L-lysine 1, Nε-((2-propynyloxy)carbonyl)-L-lysine 2 and Nε-(((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl)-L-lysine 3.

Figure 2 (Supplementary Figure 2) shows Quantitative site-specific incorporation of 3 into proteins expressed in E. coli and its rapid and quantitative labelling with tetrazine probes

A. The PylRS/tRNACUA pair directs efficient, site-specific incorporation of 3 into sfGFP bearing an amber stop codon at position 150. Incorporation of 3 is more efficient than that of 1, a well-established excellent substrate for the PylRS/tRNACUA pair.

B. Specific and quantitative labelling of 2 nmol sfGFP bearing 3 with 10 equivalents of tetrazine fluorophore 4a. ESI-MS analysis of sfGFP-3 purified from E. coli bearing the PylRS/tRNACUA pair and sfGFP150TAG, grown with 1 mM 3, confirms the incorporation of 3. sfGFP150-3: Expected mass: 27951.5 Da, Found mass: 27950 ± 1.0 Da, minor peak 27820 corresponding to loss of N-terminal methionine. Labelling sfGFP150-3 with 4a is quantitative, as judged by ESI-MS of the labelling reaction. Expected mass: 28758.4 Da, Found mass: 28758 ± 1.0 Da, minor peak 28627 corresponds to loss of N-terminal methionine.

C. Determining the rate constant for labelling of sfGFP-3 (10.6 μM, sfGFP incorporating 3 at position 150) with 10 equivalents of 4a. 2 nmol of purified sfGFP-3 (10.6 μM in 20 mM Tris-HCl, 100 mM NaCl, 2 mM EDTA, pH 7.4) were incubated with 20 nmol of tetrazine-dye conjugate 4a (10 μL of a 2 mM solution in DMSO). At different time points 8 μL aliquots were taken from the solution, quenched with a 700-fold excess of BCN and plunged into liquid nitrogen. Samples were mixed with NuPAGE LDS sample buffer supplemented with 5% β-mercaptoethanol, heated to 90°C for 10 min and analyzed by 4-12% SDS-PAGE. The amounts of labelled proteins were quantified by scanning the fluorescent bands with a Typhoon Trio phosphoimager (GE Life Sciences). Bands were quantified with the ImageQuant™ TL software (GE Life Sciences) using rubber band background subtraction. The rate constant was determined by fitting the data to a single-exponential equation. The calculated observed rate k' was divided by the concentration of 4a to obtain the rate constant k for the reaction. Measurements were done in triplicate. All data processing was performed using Kaleidagraph software (Synergy Software, Reading, UK). For comparison the rate of labelling sfGFP bearing Nε-5-norbornene-2-yloxycarbonyl-L-lysine (NorK), a known substrate for PylRS, was determined in a similar way using 11.25 μM sfGFP bearing NorK at position 150 (sfGFP-NorK) and 20 equivalents of 4a.
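The arithmetic behind the rate constant in panel C can be sketched as follows. The observed pseudo-first-order rate k' is estimated from the fraction of protein labelled over time and then divided by the tetrazine concentration to give the second-order rate constant k. This Python sketch uses a log-linearised least-squares fit as a simple stand-in for the single-exponential fit performed in Kaleidagraph. The time points and the value of k' are synthetic numbers invented for the example, not data from this disclosure, and the tetrazine concentration is taken as 10 equivalents of 10.6 μM on the assumption that dilution by the added DMSO stock is negligible.

```python
import math


def observed_rate(times_s, fractions_labelled):
    """Estimate k' (s^-1) from F(t) = 1 - exp(-k' t).

    Uses the linearised form -ln(1 - F) = k' t and a least-squares line
    through the origin; points with t <= 0 or F outside [0, 1) are skipped.
    """
    numerator = denominator = 0.0
    for t, fraction in zip(times_s, fractions_labelled):
        if t <= 0 or not 0.0 <= fraction < 1.0:
            continue
        y = -math.log(1.0 - fraction)
        numerator += t * y
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("no usable data points")
    return numerator / denominator


def second_order_rate(k_observed, tetrazine_molar):
    """k = k' / [tetrazine], valid when the tetrazine is in large excess."""
    return k_observed / tetrazine_molar


# Synthetic, noise-free illustration (all values below are invented).
K_PRIME_TRUE = 2.0e-3                      # s^-1
times = [0, 60, 120, 300, 600]             # s
fractions = [1.0 - math.exp(-K_PRIME_TRUE * t) for t in times]

k_prime = observed_rate(times, fractions)
tetrazine = 10 * 10.6e-6                   # M, assumes negligible dilution
print(f"k' = {k_prime:.3e} s^-1, k = {second_order_rate(k_prime, tetrazine):.1f} M^-1 s^-1")
```

With real band intensities the fractions would first be normalised to the plateau intensity, and a nonlinear fit would weight late time points more sensibly than the linearised form used here.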

Figure 3 shows Supplementary Table 1 - Primers

Figure 4 (Supplementary Figure 3) shows SORT-M enables codon specific proteome tagging and labelling in E. coli

A. Proteome labelling with 3 via the indicated PylRS/tRNAXXX pair. Cells contained two plasmids, one encoding MbPylRS, the other encoding T4 lysozyme and the indicated tRNAXXX. Cells were grown in the presence of 0.1 mM 3 from OD600 = 0.2, and T4 lysozyme expression was induced by the addition of 0.2 mM arabinose after 1 h. After a further 3 h cells were harvested. Tagged proteins in the lysate were detected via an inverse electron demand Diels-Alder reaction between incorporated 3 and tetrazine fluorophore 4a (20 mM, 1 h, RT). The amino acids in parentheses are the natural amino acids encoded by the endogenous tRNA bearing the corresponding anti-codon.

B. Lane profile analysis for each codon.

Figure 5 (Supplementary Figure 4) shows Specific amino acid replacement in SORT demonstrated by ESI-MS

T4 lysozyme isolated after SORT with UUU(Lys) in the presence of 1 mM 3. Expected mass WT T4 lysozyme: 19512.2 Da, Found mass: 19510 ± 2.0 Da. Expected mass WT T4 lysozyme Lys→3 single mutation: 19622.3 Da, Found mass: 19620 ± 2.0 Da.
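As a plausibility check on the expected masses quoted above, the mass difference between a lysine residue and a residue of 3 can be estimated from elemental composition. The Python sketch below assumes that replacing one Nε hydrogen of lysine with the ((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl group adds a net C6H6O2. That composition is derived here from the name of 3 given in the legend of Figure 1 and is not stated in this disclosure; standard average atomic masses are used.

```python
# Standard average atomic masses (Da).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Assumption for this sketch: the carbamate group C6H7O2 replaces one H on
# the lysine N-epsilon, so the net gain per Lys -> 3 replacement is C6H6O2.
NET_GAIN = {"C": 6, "H": 6, "O": 2}


def formula_mass(formula: dict) -> float:
    """Sum average atomic masses over an {element: count} composition."""
    return sum(AVERAGE_MASS[element] * count for element, count in formula.items())


calculated_shift = formula_mass(NET_GAIN)

# Expected masses quoted in the legend of Figure 5.
expected_wild_type = 19512.2
expected_single_mutant = 19622.3
quoted_shift = expected_single_mutant - expected_wild_type

print(f"calculated shift: {calculated_shift:.2f} Da")
print(f"quoted shift:     {quoted_shift:.2f} Da")
```

Under these assumptions the calculated shift agrees with the difference between the two expected masses to within about 0.1 Da.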

Fi gure 6 (Supplem entary Fi gure 5) shows I ncorporation o! 3 (0.1 m l ) via SORT-M is not foxi c to cei ls

Chemically competent DH10B cells were transformed with two plasmids: pBKwtPylRS necessary for expression of PylRS, and pBAD_wtT4L_MbPyfTxxx plasmids that is required for expression of PyltR Axxx and expresses lysozyme under arabinose control. The cells were recovered in i nil SOB medium for one hour at 37°C prior to aliquoting to 10 ml LB-KT (LB media with 50 ^ig ml "1 kanamycin, and 25 ^ig ml '1 tetracycline) and incubated overnight (37°C, 250 rpm, 12 h). The overnight culture (ΟΌ ()00 ~3) was diluted to a OD& 0 o~Q-3 in 10 mL LB-KT1/2 (LB media with 25 μg mb 1 kanamycin, and 12.5 μg mh 1 tetracycline) supplemented with 3 at different concentrations, o, 0.1, 0.5 mM. 200 μL aliquots of these cultures were transferred into a 96-well plate and OD 6 oo measured using a Microplate reader, Infinite 200 Pro (TECAN). OD 6oo was measured for each sample ever-}' 10 min with linear 1 mm shaking between the measurements. Fi gure 7 (Supplem entary Fi gure 6) shows Measurement of time-dependent variati n in incorporation of 3 in proieome v a SORT-M at different concentrations of 3 in response to AAA codon

Chemically competent DH10B cells were transformed with two plasmids: pBKwtPylRS, necessary for expression of PylRS, and the pBAD_wtT4L_MbPylTUUU plasmid, which is required for expression of PyltRNAUUU. The pBAD_wtT4L_MbPylTUUU plasmid also contains the gene for expression of T4 lysozyme downstream of an arabinose-inducible promoter. After transformation, cells were recovered in 1 ml SOB medium for one hour at 37°C prior to inoculation in 10 ml LB-KT (LB media with 50 μg ml⁻¹ kanamycin and 25 μg ml⁻¹ tetracycline). The culture was incubated overnight (37°C, 250 rpm, 12 h) and subsequently diluted to an OD600 of ~0.3 in 30 mL LB-KT1/2 (LB media with 25 μg ml⁻¹ kanamycin and 12.5 μg ml⁻¹ tetracycline) supplemented with 3 at different concentrations: 0, 0.1 and 0.5 mM. The cultures were incubated (37°C, 250 rpm) for 1 h, when OD600 reached approximately 0.6. A 2 ml culture aliquot was collected in a separate tube for each of the three cultures. This is the pre-induction culture (lane labelled as 1 in the gel image). Subsequently arabinose was added at a final concentration of 0.2% (v/v) to induce expression of T4 lysozyme, and culture aliquots of 2 mL were collected every hour (lanes labelled as 2, 3 and 4, corresponding to 1, 2 and 3 h culture collection after induction). For each of the collected cultures, bacterial cells were pelleted by centrifugation at 4 °C and washed with ice cold PBS (3 x 1 mL), and subsequently the pellets were frozen and stored at -20 °C. The pellets were then thawed in 200 μL of ice cold PBS and lysed by sonication (9 x 10 s ON / 20 s OFF, 70% power). The lysates were clarified by centrifugation at 15,000 RPM, 4 °C for 30 minutes. The supernatants were transferred to fresh 1.5 mL tubes. 50 μL of supernatant was transferred to a new tube for the labeling reactions, and the rest was frozen in liquid nitrogen and stored at -80 °C. To the 50 μL of supernatant, 0.5 μL of 2 mM 4a was added and the lysates were incubated at 25°C for 1 hour.
After 1 h, 17 μL of 4X LDS sample buffer (supplemented with 6 mM BCN and 5% BME) was added and mixed by vortexing gently. Samples were incubated for 10 min before boiling at 90 °C for 10 min. Samples were analysed by 4-12% SDS-PAGE and fluorescent images were acquired using a Typhoon Trio phosphoimager (GE Life Sciences).
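The volumes quoted in this labelling procedure can be sanity-checked with a simple dilution calculation. The following is a minimal sketch, assuming ideal volume additivity and that the sample buffer volume is 17 μL (as stated in the corresponding E. coli labelling protocol); the function name is illustrative and not part of the original procedure.

```python
def final_concentration(stock_conc, stock_vol, sample_vol):
    """Concentration after adding stock_vol of stock to sample_vol (same volume units)."""
    return stock_conc * stock_vol / (stock_vol + sample_vol)


# 0.5 uL of 2 mM 4a (2000 uM) added to 50 uL of cleared lysate.
probe_uM = final_concentration(2000.0, 0.5, 50.0)

# 17 uL of 4X LDS sample buffer added to the 50.5 uL labelling reaction.
lds_strength = final_concentration(4.0, 17.0, 50.5)

# The sample buffer carries 6 mM BCN; compare the moles of quencher and probe.
bcn_nmol = 6.0 * 17.0       # mM x uL = nmol
probe_nmol = 2.0 * 0.5      # mM x uL = nmol
bcn_excess = bcn_nmol / probe_nmol

print(f"4a during labelling: {probe_uM:.1f} uM")
print(f"LDS buffer strength: {lds_strength:.2f}X")
print(f"BCN excess over 4a:  {bcn_excess:.0f}-fold")
```

On these assumptions the probe is present at about 20 μM during labelling, the sample buffer ends up at roughly 1X, and BCN is in about 100-fold molar excess over the tetrazine.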

Figure 8 shows Site-specific incorporation of 3 into proteins at diverse codons and specific proteome labelling using SORT-M in human cells. (a) Western blot analysis demonstrates the efficient amino acid dependent expression of an mCherry-EGFP fusion protein separated by an amber stop codon and bearing a C-terminal HA-tag (mCh-TAG-EGFP-HA) in HEK293T cells. Anti-FLAG detected tagged PylRS. (b) Specific labelling of mCh-TAG-EGFP-HA (immunoprecipitated from 10^6 cells) with 4a (20 μM in 50 μL PBS, 1 h, RT) confirms the incorporation of 3 into protein in HEK293 cells. (c) SORT-M labelling of 3 that is statistically incorporated into newly synthesised proteins across the whole proteome of mammalian cells directed by six different PylRS/PyltRNAXXX mutants using 0.5 mM 3. Labeling with 4g (20 μM in PBS, 1 h, RT, as above). The amino acids in parentheses are the natural amino acids encoded by the endogenous tRNA bearing the corresponding anti-codon.

Figure 9 (Supplementary Figure 11) shows

A. Full blots from Figure 8.

B. Full blots from Figure 10.

Figure 10 shows Site-specific incorporation of amino acid 3 into protein produced in Drosophila melanogaster. (a) Incorporation of 3 demonstrated by a dual luciferase reporter. Dual luciferase assay on ovary extract from 10 female flies expressing Triple-Rep-L in the presence or absence of 10 mM 1 or 10 mM 3. The data show a representative example from 1 of 3 biological replicates. The error bars represent the standard deviation of 3 technical replicates from a single biological replicate. (b) Site-specific incorporation of 3 (or 1) into GFP_TAG_mCherry-HA in flies expressing PylRS/PyltRNACUA. The full-length protein resulting from unnatural amino acid incorporation is detected by anti-HA western blot. (c) Specific labelling of encoded 3 with tetrazine probes. Flies were fed with no amino acid, amino acid 1 (500 flies) or amino acid 3 (100 flies). 5 times more flies were fed with 1 in order to generate a comparable amount of reporter protein. The full-length protein containing the unnatural amino acid was immunoprecipitated from lysed ovaries with anti-GFP beads. The beads were labelled (4g, 4 μM, 200 μL PBS, RT, 2 h) and washed. Full length protein was detected by anti-HA blot, and the same gel imaged on a fluorescence scanner shows specific fluorescent labelling of the protein incorporating 3 but not 1, confirming the identity of the incorporated amino acid.

Figure 11 (example 6) Specific protein labeling at genetically encoded unnatural amino acids 1 and 2.
(a) Genetically encoded 1, but not 2, in calmodulin is specifically labeled with probe 3. Coomassie and fluorescence images demonstrate the specificity of labeling, and ESI-MS before labelling (black, expected mass: 17875, found mass: 17874) and after labelling (red, expected mass: 18553, found mass: 18552) demonstrates that the reaction is quantitative. (b) Genetically encoded 2, but not 1, in calmodulin is specifically labeled with probe 4. Coomassie and fluorescence images demonstrate the specificity of labeling, and ESI-MS before labeling (black, expected mass: 17930, found mass: 17930) and after labelling (green, expected mass: 18484, found mass: 18485) demonstrates that the reaction is quantitative. Raw (before deconvolution) ESI-MS spectra are not shown.

Figure 12 (example 6) Incorporating 1 and 2 at positions 1 and 40 of Calmodulin and the kinetics of specific labelling. (a) Expression was performed in E. coli bearing ribo-Q1, O-gst-cam1TAG-40AGTA, the PylRS/tRNAUACU pair and the A PrpRS/tRNACUA pair. Amino acids 1 and 2 were used at 4 and 1 mM, respectively. (b) Labelling time course for reaction of CaM1,2 40 with 3 and 4. Each reaction was followed for 2 h by in gel fluorescence and mobility shift.

Figure 13 (example 6) Concerted, quantitative one-pot, dual labeling of Calmodulin in 30 minutes. (a) Dye dependent labeling of CaM1,2 40; sequential labeling with purification after first labeling in lane 4, sequential labeling without purification in lane 5, one-pot dual labeling in lane 6. (b) ESI-MS of one-pot protein labeling, before labeling (black, expected mass: 18000, found mass: 18000) and after labeling (gold, expected mass: 19233, found mass: 19234). Raw (before deconvolution) ESI-MS spectra are not shown.

Figure 14 shows Scheme A

Figure 15 (Supplementary Figure 15) shows Amino acid and DNA sequence of Drosophila GFP-amber-mCherry-HA.

GFP (amino acid residues 1-238), Amber codon at position 248, mCherry (amino acid residues 255-489), HA tag (amino acid residues 491-499), Myc tag (amino acid residues 500-509), His tag (amino acid residues 510-515) and SV40 NLS (amino acid residues 523-528).

Figure 16 shows structure of exemplary amino acid Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine.

Although illustrative embodiments of the invention have been disclosed in detail herein, with reference to the accompanying drawings, it is understood that the invention is not limited to the precise embodiment and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims and their equivalents.

Chemical syntheses - general methods

All chemicals and solvents were purchased from Sigma-Aldrich, Alfa Aesar or Fisher Scientific and used without further purification unless otherwise stated. Qualitative analysis by thin layer chromatography (TLC) was performed on aluminium sheets coated with silica (Merck TLC 60F-254). The spots were visualized under a short wavelength ultra-violet lamp (254 nm) or stained with basic, aqueous potassium permanganate, ethanolic ninhydrin or vanillin. Flash column chromatography was performed with specified solvent systems on silica gel 60 (mesh 230-400).

LC-MS analysis was performed on an Agilent 1200 machine. The solvents used consisted of 0.2 % formic acid in water (buffer A) and 0.2 % formic acid in acetonitrile (buffer B). LC was performed using a Phenomenex Jupiter C18 column (150 x 2 mm, 5 μm) and monitored using variable wavelengths. Retention times (Rt) are recorded to the nearest 0.1 min and m/z ratios to the nearest 0.01 mass units. The following programme was used for the small molecule LC gradient: 0-1 min (A:B 10:90-10:90, 0.3 mL/min), 1-8 min (A:B 10:90-90:10, 0.3 mL/min), 8-10 min (A:B 90:10-90:10, 0.3 mL/min), 10-12 min (A:B 90:10-10:90, 0.3 mL/min). Mass spectrometry analysis following LC was carried out in ESI mode on a 6130 Quadrupole spectrometer and recorded in both positive and negative ion modes. NMR analysis was carried out on a Bruker 400 MHz instrument. All reported chemical shifts (δ) relative to TMS were referenced to the residual protons in the deuterated solvents used: d1-chloroform (1H δ = 7.26 ppm, 13C δ = 77.16 ppm), d6-dimethylsulfoxide (1H δ = 2.49 ppm, 13C δ = 39.52 ppm), D2O (1H δ = 4.70). APT or two-dimensional experiments (COSY, HSQC) were performed to provide additional information used for analysis where needed. Coupling constants are given in Hz and described as: singlet - s, doublet - d, triplet - t, quartet - q, broad singlet - br, multiplet - m, doublet of doublets - dd, etc. and combinations thereof.
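For readers who want to reproduce the small molecule gradient programmatically, the programme above can be written as a table of breakpoints with linear interpolation between them. This is only a sketch: the A:B ratios are transcribed exactly as written in the text, the breakpoint table and function name are illustrative, and instrument software may ramp differently.

```python
# (time in min, percent buffer A) breakpoints, read from the A:B ratios as written.
GRADIENT = [(0.0, 10.0), (1.0, 10.0), (8.0, 90.0), (10.0, 90.0), (12.0, 10.0)]
FLOW_ML_PER_MIN = 0.3


def percent_a(t_min, table=GRADIENT):
    """Linearly interpolated percentage of buffer A at time t_min."""
    if t_min < table[0][0] or t_min > table[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, a0), (t1, a1) in zip(table, table[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient table is not contiguous")


def percent_b(t_min):
    """Percentage of buffer B, the remainder of the binary mixture."""
    return 100.0 - percent_a(t_min)


for t in (0.5, 4.5, 9.0, 11.0):
    print(f"{t:4.1f} min  A = {percent_a(t):5.1f} %  B = {percent_b(t):5.1f} %")
```

The total solvent consumed per run follows directly from the constant flow rate: 12 min at 0.3 mL/min is 3.6 mL.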

Protein expression, purification and labelling of site-specifically incorporated 3 in E. coli

Expression and purification of sfGFP-3 from E. coli

Electrocompetent E. coli DH10B cells were co-transformed with pBK-MbPylRS and psfGFP150TAG PylT.14,26 Transformed cells were recovered in S.O.B. (1 mL, supplemented with 0.2% glucose) for 1 h at 37 °C and used to inoculate LB containing 50 μg/mL kanamycin and 25 μg/mL tetracycline (LB-KT). The cells were incubated with shaking overnight at 37 °C, 250 r.p.m. 1 mL of overnight culture was used to inoculate 100 mL of LB-KT½, and the day culture was then incubated (37 °C, 250 r.p.m.). At O.D.600 ~0.3, the culture was divided equally and supplemented with either 3 (1 mM) or H2O (500 μL) and incubated further (37 °C, 250 r.p.m.). At O.D.600 ~0.6 protein expression was induced by the addition of arabinose (0.2%). After 4 h, the cells were harvested by centrifugation (4000 r.p.m., 20 min) and the pellet frozen until further use.

The frozen bacterial pellet was thawed on ice and resuspended in 2.5 mL lysis buffer (Bugbuster®, Novagen®, 50 μg/mL DNase I, Roche inhibitor cocktail and 20 mM imidazole). Cells were incubated (4 °C, 30 minutes) and the lysate was then clarified by centrifugation (16000 g, 4 °C, 30 minutes). The clarified lysates were transferred to fresh tubes and 100 μL Ni-NTA slurry was added. The mixture was incubated with agitation (4 °C, 1 h) and the beads were then collected by centrifugation (1000 g, 4 °C, 5 min). The beads were resuspended three times in 500 μL wash buffer (10 mM Tris-HCl, 40 mM imidazole, 200 mM NaCl, pH 8) and collected by centrifugation (1000 g, 4 °C, 5 min). Finally, the beads were resuspended in 100 μL elution buffer (10 mM Tris-HCl, 300 mM imidazole, 200 mM NaCl, pH 8) and pelleted by centrifugation (1000 g, 4 °C, 5 min), and the supernatant was collected into fresh tubes. The elution was repeated three times with 100 μL of elution buffer. The purified proteins were analysed by 4-12% SDS-PAGE and LC-MS.

Protein Mass Spectrometry

Using an Agilent 1200 LC-MS system, ESI-MS was additionally carried out with a 6130 Quadrupole spectrometer. The solvent system consisted of 0.1 % formic acid in H2O as buffer A, and 0.1 % formic acid in acetonitrile (MeCN) as buffer B. Protein UV absorbance was monitored at 214 and 280 nm. Protein MS acquisition was carried out in positive ion mode and total protein masses were calculated by deconvolution within the MS Chemstation software (Agilent Technologies).

In vitro labeling of purified sfGFP150-3

To purified sfGFP150-1 or sfGFP150-3 protein (~30 μM, in elution buffer) was added 4a (10 molar equivalents, from a 2 mM stock solution in DMSO). The reactants were mixed by aspirating several times and the mixture was then incubated at room temperature for 2 hours, after which a sample was analysed by ESI-MS. Following incubation the proteins were separated by 4-12% SDS-PAGE and analysed using a Typhoon Trio phosphoimager (GE Life Sciences).

Time course of sfGFP150-3 and sfGFP150-NorK labelling and rate constant determination

2 nmol sfGFP-3 (10.6 μM) was labeled at room temperature by the addition of 20 nmol of tetrazine-dye conjugate 4a (10 μL of a 2 mM solution in DMSO), and the samples were mixed by aspirating several times. At different time points, 8 μL aliquots were taken from the solution, quenched with a 700-fold excess of bicyclo[6.1.0]non-4-yn-9-ylmethanol (BCN) and plunged into liquid nitrogen. Samples were mixed with NuPAGE LDS sample buffer supplemented with 5 % β-mercaptoethanol, heated for 10 min to 90°C and analyzed by 4-12% SDS-PAGE. The amounts of labelled proteins were quantified by scanning the fluorescent bands with a Typhoon Trio phosphoimager (GE Life Sciences). Bands were quantified with the ImageQuant TL software (GE Life Sciences) using rubber band background subtraction. The rate constant was determined by fitting the data to a single-exponential equation. The calculated observed rate k' was divided by the concentration of 4a to obtain the rate constant k for the reaction. Measurements were done in triplicate. All data processing was performed using Kaleidagraph software (Synergy Software, Reading, UK). For comparison, the rate of labelling sfGFP bearing Nε-5-norbornene-2-yloxycarbonyl-L-lysine (NorK), a known substrate for PylRS, was determined in a similar way using 11.25 mM sfGFP bearing NorK at position 150 (sfGFP-NorK) and 20 equivalents of 4a.

Plasmid construction for pBAD_wtT4L_MbPylTXXX

pBAD_T4L83TAG_MbPylTCUA was digested with NcoI and KpnI restriction enzymes. The same restriction enzymes were also used to digest the wild-type T4 lysozyme from (D67) pBAD_wtT4L. The insert and backbone were ligated in a 3:1 ratio using T4 DNA ligase (RT, 2 hours), transformed into chemically competent DH10B cells and grown on tetracycline agar plates (37°C, 18 hours). Single colonies were picked and the correct sequence was confirmed by DNA sequencing (GATC GmbH); this step created pBAD_wtT4L_MbPylTCUA. All final constructs were confirmed by DNA sequencing.

Proteomic incorporation of 3 via SORT in E. coli expressing T4 lysozyme

Electrocompetent E. coli DH10B cells (50 μL) were either doubly transformed with the pBAD_wtT4L_MbPylTXXX plasmid (2 μL, necessary for expression of PyltRNAXXX and expressing T4 lysozyme under arabinose control) and the pBKwtPylRS plasmid (2 μL, necessary for expression of PylRS) or singly transformed with pBAD_wtT4L_MbPylTXXX alone.

Transformed cells were recovered in 1 mL S.O.B. (supplemented with 0.2% glucose) for 1 h at 37 °C. 100 μL of the recovery was used to inoculate 5 mL LB-KT (50 μg/mL kanamycin and 25 μg/mL tetracycline) or LB-T (25 μg/mL tetracycline). Cultures were incubated overnight (37 °C, 250 r.p.m.). 1 mL of each overnight culture was used to inoculate 15 mL of LB-T or LB-KT medium containing antibiotics at ½ strength. Cultures were incubated at 37 °C until O.D.600 ~0.3 was reached; at this time each culture was divided into 5 mL aliquots and supplemented with either 3 (0.1 mM final conc.) or H2O (50 μL). Cultures were then incubated (37 °C, 250 r.p.m.). At O.D.600 ~0.6, T4 lysozyme expression was initiated by the addition of arabinose (0.2% final conc.) and cultures were incubated for a further 4 hours. Cells were harvested by centrifugation (4000 rpm, 4 °C, 20 minutes) and then resuspended three times in 1 mL of ice cold PBS and collected by centrifugation (4000 rpm, 4 °C, 20 minutes). The final bacterial pellets were immediately frozen for storage.

E. coli: Chemoselective labelling of proteomes tagged with 3 with tetrazine-dye conjugates

Frozen bacterial pellets were resuspended in 500 μL PBS and lysed using a bath sonicator (energy output 7.0, 90 s total sonication time, 10 s blasts and 20 s breaks, Misonix Sonicator 3000). The lysate was cleared by centrifugation (4 °C, 14000 r.p.m., 30 min) and the supernatant aspirated to a fresh tube. To 50 μL of cleared cell lysate was added 4a (2 mM stock in DMSO, final concentration 20 μM). The reactions were mixed by aspirating several times and the samples then incubated in the dark (room temperature, 1 h). After this time 17 μL of 4X LDS sample buffer (supplemented with 6 mM BCN and 5% BME) was added and mixed by vortexing gently. Samples were incubated for 10 min before boiling at 90 °C for 10 min. Samples were analysed by 4-12% SDS-PAGE and fluorescent images were acquired using a Typhoon Trio phosphoimager (GE Life Sciences).

The same protocol for fluorescent labelling of the E. coli proteins was applied for all tetrazine-dye conjugates.

Site-specific incorporation of 3 in HEK293 cells and chemoselective labelling with tetrazine probes

Site specific incorporation of 3 in HEK cells

HEK293 cells (ATCC CRL-1573) were plated on 24 well plates and grown to near confluence. The cells were transfected using Lipofectamine 2000 (Invitrogen) with the pMmPylS-mCherry-TAG-EGFP-HA construct and the p4CMVE-U6-PylT construct.18 After 16 h growth with or without 1 mM 3 or with 1 mM 1 the cells were lysed on ice using RIPA buffer (Sigma). The lysates were spun down and the supernatant was added to 4X LDS sample buffer (Life Technologies). The samples were run out by SDS-PAGE, transferred to a nitrocellulose membrane and blotted using primary rat anti-HA (clone 3F10, Roche, No. 1 867 423) and mouse anti-FLAG (clone G191, Abnova, cat. MAB8183); the secondary antibodies were anti-rat (Invitrogen, A11077) and anti-mouse (Cell Signaling Technologies, No. 70768).

Labelling site-specifically incorporated 3 from HEK 293 cells

Adherent HEK293T cells (ATCC CRL-11268; 4 × 10^6 per immunoprecipitation) were transfected with 7.5 μg p4CMVE-U6-PylT and 7.5 μg pPylRS-mCherry-TAG-EGFP-HA18 using TransIT-293 transfection reagent according to the manufacturer's protocol and cultured for 48 hours in DMEM/10% FBS, supplemented with 0.5 mM 1 or 2 mM 3 where indicated. Cells were washed twice with PBS and lysed on ice for 30 minutes in 1 mL Lysis Buffer (150 mM NaCl, 1% Triton X-100, 50 mM Tris HCl (pH 8.0)). After clarifying the lysate by centrifugation (10 min at 16000 g), HA-tagged proteins were captured using 50 μL μMACS HA-tag MicroBeads (Miltenyi Biotec) per transfection, then washed with 0.5 mL RIPA (150 mM NaCl, 1% Igepal CA-630, 0.5% sodium deoxycholate, 0.1% SDS, 50 mM Tris HCl (pH 8.0)) and 0.5 mL PBS (pH 7.4). The suspension of MicroBeads was incubated with 20 μM 4a in 50 μL PBS (pH 7.4) for 1 hour and subsequently washed with 0.5 mL RIPA to remove excess dye. HA-tagged proteins were eluted from the beads using SDS sample buffer and separated on a 4-12% Bis-Tris PAGE gel (Invitrogen), imaged using a Typhoon imager (GE Healthcare) and subsequently stained with DirectBlue or transferred for western blotting with Anti-HA-tag pAb-HRP-DirecT (MBL).

Expression and purification of SfGFP from mammalian cells

HEK293T cells were transfected in a 10 cm tissue culture dish with 15 μg DNA using PEI and incubated for 72 hours with 3 (0.5 μM). Cells were washed twice with PBS and lysed in 1 mL RIPA buffer. Cleared lysate was added to 50 μL GFP-Trap® M (ChromoTek) and incubated for 4 h. Beads were washed with 1 mL RIPA, 1 mL PBS, 1 mL PBS + 500 mM NaCl and 1 mL ddH2O, and eluted in 1% acetic acid/ddH2O. Purified protein was labeled with 2 μM 4a for 4 h and loaded on a 4-12% Bis-Tris PAGE gel. Fluorescence of 4a-labeled sfGFP was detected on a Typhoon imager and the gel was subsequently stained with DirectBlue.

Fly plasmids, transgenic flies and culture

For all fly experiments no randomisation or blinding was used within this study.

Plasmid construction for transgenic fly line generation

The PyltRNACUA anticodon was mutated using the QuikChange mutagenesis kit and pSG108 (pJet1.2-U6-PylT, gift from S. Greiss) as a template. This contains the PylT gene without its 3' terminal CCA fused to the Drosophila U6-b promoter. Primers FMT19 and FMT20 were used to generate PyltRNATGC to decode alanine codons (creating pFT18); primers FMT23 and FMT24 were used to generate PyltRNAGCT to decode serine codons (creating pFT20); primers FMT27 and FMT28 were used to generate PyltRNACAG to decode leucine codons (creating pFT22); and primers FMT29 and FMT30 were used to generate PyltRNACAT to decode methionine codons (creating pFT23). The mutated tRNA expression cassettes were subcloned from pFT18, pFT20, pFT22 and pFT23 into pUC18 using EcoRI and HindIII, then multimerised using AsiSI, BamHI and BglII to create 2, then 4 copies of the tRNA. The 4 copy versions of the tRNA cassette were subcloned into pSG118 using AsiSI and MluI to create pFT58 (Ala), pFT60 (Ser), pFT62 (Leu) and pFT63 (Met). pSG118 contains the M. mazei PylRS gene.20

Fly lines and culture conditions

Transgenic lines were created by P element insertion using a Drosophila embryo injection service (BestGene Inc.). Lines were generated using the following plasmids: pFT58 (Ala), pFT60 (Ser), pFT62 (Leu) and pFT63 (Met). nos-Gal4-VP16 (Bloomington 4937) and MS1096-Gal4 (Bloomington 8860) were used as Gal4 drivers. All flies were grown at 25°C on standard Iberian medium. Flies were fed unnatural amino acids by mixing dried yeast with the appropriate concentration of amino acid (usually 10 mM) diluted in dH2O to make a paste. Ovaries were prepared from females that were grown on Iberian fly food supplemented with a yeast paste with or without the amino acid for a minimum of 48 hours. For proteome labelling experiments transgenic male flies of constructs FT58, FT60, FT62 and FT63 were crossed with nos-vp16-GAL4 virgins to generate FT58/nos-vp16-GAL4, FT60/nos-vp16-GAL4, FT62/nos-vp16-GAL4 and FT63/nos-vp16-GAL4 respectively.

Site specific incorporation of 3 in D. melanogaster

Luciferase assays

Ovaries from 10 females of Triple Rep-L flies recombined with nos-Gal4-VP16 fed 3, 1 or no amino acid were dissected in 100 μL 1x Passive lysis buffer and processed for luciferase assays as previously described.20

Immunoprecipitation and labelling of site specifically incorporated 3

Ovaries from 100 (for control and 3) or 500 (for 1) females were dissected in PBS then lysed in 300 or 1500 μL RIPA buffer containing 1x complete protease inhibitor cocktail (Roche). A sample was taken into 4x LDS buffer as a total lysate control, then the remainder was used for immunoprecipitation with GFP-TRAP agarose beads (Chromotek) following the manufacturer's instructions. The total volume of the IP was 3 ml. After overnight incubation, the beads were washed 2x with RIPA buffer then 2x with PBS. For tetrazine labeling, the beads were resuspended in 200 μL PBS + 4 μM 4g and incubated for 2 hours on a roller at RT. The beads were washed 3 times with 500 μL of wash buffer then resuspended in 4x LDS sample buffer.

Example 1 - Synthesis of Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine 3

A class of reaction useful in protein labelling is the very rapid and specific inverse electron demand Diels-Alder reaction between strained alkenes (or alkynes) and tetrazines.21-25

While we, and others, have previously encoded unnatural amino acids bearing strained alkenes, alkynes and tetrazines via genetic code expansion and demonstrated their use for site-specific protein labelling via inverse electron demand Diels-Alder reactions,26-30 all the molecules used to date are rather large. We have previously shown that a variety of carbamate derivatives of lysine are good substrates for PylRS,31 and it has been demonstrated that 1,3-disubstituted cyclopropenes, unlike 3,3-disubstituted cyclopropenes,32,24 react efficiently with tetrazines.22 We therefore designed and synthesized a carbamate derivative of lysine bearing a 1,3-disubstituted cyclopropene (Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine 3, Fig. 1b) for incorporation into proteins and labelling with tetrazines.

Synthesis of Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine (3)

Scheme 1. Synthesis of Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine 3. Reagents and conditions: i. Rh2(OAc)4, propyne, CH2Cl2, 4 °C to RT, 75% yield; ii. DIBAL-H, CH2Cl2, 0 °C to RT; iii. 4-nitrophenyl chloroformate, Hünig's base, CH2Cl2, RT, 73% yield; iv. Fmoc-Lys-OH, Hünig's base, THF/DMF, 4 °C to RT, 82% yield; v. NaOH, THF/H2O, RT, 68% yield.
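The overall yield of the route follows from multiplying the step yields quoted in Scheme 1. The short sketch below illustrates the arithmetic; it assumes the 73% figure covers steps ii and iii together (as stated in the experimental procedure for S3), and the dictionary keys are illustrative labels only.

```python
from math import prod

# Isolated yields quoted in Scheme 1, expressed as fractions.
STEP_YIELDS = {
    "i (cyclopropenation)": 0.75,
    "ii-iii (reduction, carbonate formation)": 0.73,
    "iv (carbamate formation)": 0.82,
    "v (Fmoc removal)": 0.68,
}

# The overall yield of a linear sequence is the product of the step yields.
overall_yield = prod(STEP_YIELDS.values())

print(f"overall yield over the sequence: {100 * overall_yield:.1f} %")
```

On these figures the sequence delivers roughly 30% of 3 from ethyl diazoacetate.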

i. Ethyl 2-methylcycloprop-2-ene-1-carboxylate S1

A 100 mL 2-neck round bottom flask was charged with CH2Cl2 (2 mL) and rhodium acetate (442 mg, 1 mmol, 0.05 eq), and fitted with a dry ice condenser. Propyne (approx. 10 mL) was condensed into the rhodium acetate suspension and the flask was lowered into a water bath (20 °C), whereupon a steady reflux of propyne was obtained. Ethyl diazoacetate (2.1 mL, 20 mmol, 1 eq) was added drop-wise to the stirred propyne solution over 1 h using a syringe pump. The reaction was stirred at room temperature for a further 10 minutes, after which TLC analysis showed the reaction to be complete. The cyclopropene product was then purified by silica gel flash column chromatography eluting with pentane and diethyl ether (90:10). This gave the desired product S1 as a colourless volatile liquid (1.9 g, 75% yield). 1H NMR δH (400 MHz, CDCl3) 6.35 (1H, t, J 1.4), 4.18-4.09 (2H, m), 2.16 (3H, d, J 1.3), 2.12 (1H, d, J 1.6), 1.26 (3H, t, J 7.1); LRMS m/z (ES+) 127.2 [M+H]+.

These values are in good agreement with literature (Liao, 2004).
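The quoted yield for S1 can be reproduced from the isolated mass and the amount of the limiting reagent. The sketch below assumes the molecular formula C7H10O2 for S1 (derived here from its name, not stated in the text) and rounded average atomic masses; the helper names are illustrative.

```python
# Rounded standard average atomic masses (g/mol).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula):
    """Molar mass in g/mol of a composition given as {element: count}."""
    return sum(AVERAGE_MASS[element] * count for element, count in formula.items())


def percent_yield(product_g, product_molar_mass, limiting_mmol):
    """Isolated yield as a percentage of the limiting reagent."""
    product_mmol = product_g / product_molar_mass * 1000.0
    return 100.0 * product_mmol / limiting_mmol


S1 = {"C": 7, "H": 10, "O": 2}  # assumed formula for ethyl 2-methylcycloprop-2-ene-1-carboxylate

# 1.9 g of S1 isolated from 20 mmol of ethyl diazoacetate (limiting reagent).
s1_yield = percent_yield(1.9, molar_mass(S1), 20.0)

print(f"molar mass of S1: {molar_mass(S1):.2f} g/mol")
print(f"yield of S1:      {s1_yield:.0f} %")
```

The same two helpers can be applied to the later intermediates once their formulas are supplied.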

ii. and iii. (2-Methylcycloprop-2-en-1-yl)methyl (4-nitrophenyl) carbonate S3

DIBAL-H (22.5 mL of a 1 M solution in CH2Cl2, 22.5 mmol, 1.5 eq) was added drop-wise to a stirred solution of cyclopropene ester S1 (1.9 g, 15 mmol, 1 eq) in CH2Cl2 (15 mL) at -10 °C. The reaction was stirred at -10 °C for 20 minutes before quenching with the cautious addition of H2O (1 mL), then NaOH (1 mL of a 1 M solution in H2O) and H2O (2.3 mL). The mixture was stirred for a further 2 h at room temperature before it was dried (Na2SO4) and filtered. Hünig's base (3.9 mL, 22.5 mmol, 1.5 eq) was added to the filtrate (containing crude cyclopropene alcohol S2), followed by the addition of 4-nitrophenyl chloroformate (3.3 g, 16.5 mmol, 1.1 eq). After stirring at room temperature for 18 hours a significant colourless precipitate had formed, and TLC analysis showed complete consumption of the crude cyclopropene alcohol S2. The reaction was diluted with CH2Cl2 and then dry loaded onto silica gel, and the activated carbonate S3 was purified by silica gel column chromatography eluting with ethyl acetate and hexane (20:80). This gave the desired cyclopropene carbonate S3 as a colourless oil (2.7 g, 73% yield over 2 steps). 1H NMR δH (400 MHz, CDCl3) 8.28 (2H, d, J 9.2), 7.39 (2H, d, J 9.2), 6.62 (1H, s), 4.21 (1H, dd, J 10.9, 5.3), 4.14 (1H, dd, J 10.9, 5.3), 2.18 (3H, d, J 1.3), 1.78 (1H, td, J 5.3, 1.3).

iv. Nα-(Fmoc)-Nε-(((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl)-L-lysine S4

Fmoc-Lys-OH·HCl (6.7 g, 16.5 mmol, 1.5 eq) was dissolved in THF (30 mL) and DMF (10 mL). To this solution was added Hünig's base (9.0 mL, 55.0 mmol, 5 eq) followed by cyclopropene carbonate S3 (2.7 g, 11.0 mmol, 1 eq); an immediate yellow coloration was observed upon addition of the carbonate. The reaction was stirred at room temperature for 6 hours and was adjudged complete by the consumption of starting material after this time, as shown by TLC analysis. The crude reaction mixture was dry loaded onto silica gel and the major product purified by silica gel column chromatography eluting with ethyl acetate, hexane and acetic acid (50:49:1 then 99:0:1). This gave the desired product S4 as a colourless gum (4.3 g, 82% yield). 1H NMR δH (400 MHz, CDCl3) 7.77 (2H, t, J 7.6), 7.65-7.55 (2H, m), 7.39 (2H, t, J 7.6), 7.31 (2H, t, J 7.3), 6.54 (1H, s), 5.68-5.57 (1H, m), 4.84 (1H, br-s), 4.44-4.32 (2H, m), 4.22 (1H, t, J 7.0), 3.98-3.87 (1H, m), 3.17-3.09 (2H, m), 2.15-2.06 (6H, m), 1.99-1.86 (1H, m), 1.84-1.70 (1H, m), 1.68-1.59 (1H, m), 1.58-1.34 (2H, m); LRMS m/z (ES+) 479.3 [M+H]+, 501.3 [M+Na]+, m/z (ES-) 477.2

v. Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine 3

Nα-(Fmoc)-Nε-(((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl)-L-lysine S4 (3.5 g, 7.0 mmol, 1 eq) was dissolved in THF and H2O (3:1, 40 mL), and to this solution was added sodium hydroxide (0.9 g, 22.6 mmol, 3.1 eq). The reaction was stirred at room temperature for 8 hours, after which time the reaction was adjudged complete by LC-MS analysis. The reaction mixture was diluted with H2O (100 mL) and the pH adjusted to ~5 by the addition of HCl (1 M). The aqueous solution was washed with Et2O (5 × 100 mL), then concentrated to dryness, yielding a colourless solid. The solid was purified by preparative HPLC, the product fractions were combined and the solvent removed by freeze-drying. This gave Nε-[((2-methylcycloprop-2-en-1-yl)methoxy)carbonyl]-L-lysine 3 as a colourless solid. δH (400 MHz, D2O) 6.45 (1H, s), 3.90-3.61 (2H, m), 3.09 (1H, t, J 6.4), 2.98-2.86 (2H, m), 1.92 (3H, s), 1.52-1.37 (2H, m), 1.37-1.22 (2H, m), 1.21-1.08 (2H, m), 0.83 (1H, d, J 5.2). LRMS m/z (ES+) 257.2 [M+H]+, m/z (ES-) 255.2 [M-H]-. δC (100 MHz, D2O) 101.1 (CH), 72.3 (CH2), 55.9 (CH), 40.2 (CH2), 34.3 (CH2), 28.9 (CH2), 20.3 (CH2), 16.6 (CH3), 10.8 (CH). HRMS (ES+) Found: (M+Na)+ 279.1302. C12H20O4N2Na required M+, 279.1315.

Example 2 - Encoding the site-specific incorporation of 3 in E. coli

We demonstrated that 3 is efficiently and site-specifically incorporated into recombinant proteins in response to the amber codon using the PylRS/tRNACUA pair and an SfGFP gene bearing an amber codon at position 150 (Supplementary Fig. 2a). The yield of protein is 8 mg per litre of culture, which is greater than that obtained for a well-established efficient substrate for PylRS, Nε-[(tert-butoxy)carbonyl]-L-lysine 1 (4 mg per litre of culture).33 Electrospray ionisation mass spectrometry of SfGFP bearing 3 at position 150 (SfGFP-3) confirms the incorporation of the unnatural amino acid (Supplementary Fig. 2b). SfGFP-3 was specifically labelled with the fluorescent tetrazine probe 4a, while SfGFP-1 was left unlabelled (Supplementary Fig. 2b). 2 nmol of SfGFP-3 was quantitatively labelled with 10 equivalents of 4a in 30 minutes, as judged by both fluorescence imaging and mass spectrometry (Supplementary Fig. 2b). The second order rate constant for labelling SfGFP-3 with 4a was 27 ± 1.8 M⁻¹ s⁻¹ (Supplementary Fig. 2c).26
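The relationship between the observed pseudo-first-order rate k' and the second order rate constant k can be illustrated with a short calculation. The sketch below is not the fitting procedure used in this work (which fitted a single exponential in Kaleidagraph); it substitutes a linearised least-squares fit, uses synthetic noise-free data, and assumes a hypothetical probe concentration of 100 μM held in sufficient excess that it stays effectively constant.

```python
import math


def observed_rate(times_s, labelled_fractions):
    """Slope through the origin of -ln(1 - f) against t, i.e. k' in s^-1.

    Assumes single-exponential labelling: f(t) = 1 - exp(-k' t).
    """
    ys = [-math.log(1.0 - f) for f in labelled_fractions]
    numerator = sum(t * y for t, y in zip(times_s, ys))
    denominator = sum(t * t for t in times_s)
    return numerator / denominator


def second_order_rate(k_observed, probe_molar):
    """k = k' / [probe], valid when the probe is in large excess."""
    return k_observed / probe_molar


# Synthetic illustration only: fractions generated from an assumed k of 27 M^-1 s^-1.
PROBE_MOLAR = 100e-6            # hypothetical 4a concentration (M)
K_ASSUMED = 27.0                # M^-1 s^-1, value quoted in the text
times = [60.0, 120.0, 300.0, 600.0, 900.0]
fractions = [1.0 - math.exp(-K_ASSUMED * PROBE_MOLAR * t) for t in times]

k_obs = observed_rate(times, fractions)
k_second_order = second_order_rate(k_obs, PROBE_MOLAR)
half_life_s = math.log(2.0) / k_obs

print(f"k' = {k_obs:.2e} s^-1")
print(f"k  = {k_second_order:.1f} M^-1 s^-1")
print(f"labelling half-life at this probe concentration: {half_life_s / 60:.1f} min")
```

With real gel data the fractions would come from band intensities normalised to the plateau, and a direct non-linear fit is generally preferable because the logarithmic transform amplifies noise near completion.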

Since PylRS does not recognize the anticodon of its cognate tRNA,34 it is possible to alter the anticodon of this tRNA to decode distinct codons. We created a new tRNA in which the anticodon of PyltRNACUA was converted from CUA to UUU (Supplementary Table 1), to decode a set of lysine codons. We added 0.1 mM 3 to cells containing PylRS, PyltRNAUUU, and the gene for T4 lysozyme. Following expression of T4 lysozyme we detected proteins in the lysate bearing 3 with the tetrazine probe 4a (20 μM, 1 h, Supplementary Fig. 3). Control experiments show that the observed labelling requires the presence of the synthetase and tRNA, and electrospray ionization mass spectrometry demonstrates the incorporation of 3 in place of lysine in T4 lysozyme (Supplementary Fig. 4). The addition of 3 (0.1 or 0.5 mM) has little or no effect on cell growth (Supplementary Fig. 5), suggesting that the amino acid is not toxic at the concentration used, and there is substantial labelling within 1 h of amino acid addition (Supplementary Fig. 6).

Example 3 - Genetic encoding of 3 in human cells

Full-length mCherry-3-EGFP-HA was expressed in HEK293 cells carrying the PylRS/tRNACUA pair and mCherry-TAG-EGFP-HA (a fusion between the mCherry gene and the EGFP gene with a C-terminal HA tag, separated by the amber stop codon (TAG)). 18 Full-length protein was detected only in the presence of 3 (Fig. 8a; full gels in Supplementary Fig. 11). mCherry-3-EGFP-HA was selectively labelled with 4a, while mCherry-1-EGFP-HA was not labelled (Fig. 8b), 18 demonstrating the site-specific incorporation of 3 with the PylRS/tRNACUA pair in human cells.

Example 4 - Genetic encoding of 3 in D. melanogaster

We demonstrated that 3 can be site-specifically incorporated into proteins in D. melanogaster. To achieve this, we used flies containing the PylRS/tRNACUA pair (with the tRNA expressed ubiquitously from a U6 promoter and UAS-PylRS expression directed to ovaries using a nos-vp16-GAL4 driver), and a dual luciferase reporter bearing an amber codon between firefly and renilla luciferase. 20 We observe a strong luciferase signal that is dependent on the addition of 1 or 3, and the dual luciferase signal is larger with 3. These experiments demonstrate that 3 is taken up by flies and is more efficiently incorporated in vivo in response to an amber codon than 1 (Fig. 10a), a known excellent substrate for PylRS. 3 may be supplied by feeding food supplemented with amino acid 3 at 10 mM. In additional experiments, we demonstrated by western blot the efficient incorporation of 3 into a GFP-TAG-mCherry-HA construct (Supplementary Fig. 15) expressed in ovaries 20 (Fig. 10b), and the specific fluorescent labelling of the incorporated amino acid with 4g (Fig. 10c).

Example 5 - Synthesis of Tetrazine-BODIPY FL 4d

Scheme 3. Synthesis of tetrazine-BODIPY FL 4d. Reagents and conditions: i. HCl/dioxane, RT, 100% yield; ii. Bodipy-FL-NHS ester, Hünig's base, DMF, RT.

i. S6

Boc-protected Tetrazine S5 was synthesized using the procedure reported earlier. 6 4 M HCl in dioxane (500 µL, 2.0 mmol) was added to a stirring solution of Tetrazine S5 (8 mg, 0.02 mmol) in DCM (500 µL). The reaction was carried out for 2 h at room temperature and subsequently the solvent was removed under reduced pressure to yield primary amine hydrochloride S6 as a pink solid (6 mg, 0.02 mmol, 100%). The compound was used directly in the next step without any further purification.
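The quantities in this deprotection step can be cross-checked with simple arithmetic. The sketch below is only an illustration with hypothetical helper names; it uses the volumes, concentration and molar amounts stated in the procedure above and makes no further assumptions about the compounds.

```python
def mmol_from_solution(volume_ul, molarity):
    """Millimoles of solute in a given volume (µL) of solution at a given
    molarity (mol/L): µL x mol/L = µmol, divided by 1000 to give mmol."""
    return volume_ul * molarity / 1000.0

def equivalents(reagent_mmol, substrate_mmol):
    """Molar equivalents of reagent relative to the limiting substrate."""
    return reagent_mmol / substrate_mmol

def percent_yield(product_mmol, substrate_mmol):
    """Percentage yield from molar amounts of product and substrate."""
    return 100.0 * product_mmol / substrate_mmol

if __name__ == "__main__":
    hcl_mmol = mmol_from_solution(500, 4.0)       # 500 µL of 4 M HCl in dioxane
    print("HCl:", hcl_mmol, "mmol")
    print("HCl equivalents:", equivalents(hcl_mmol, 0.02))
    print("Yield:", percent_yield(0.02, 0.02), "%")
```

The calculation reproduces the stated 2.0 mmol of HCl, which corresponds to a roughly 100-fold molar excess over the 0.02 mmol of Boc-protected tetrazine, and the quantitative yield reported for the amine hydrochloride.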

ii. 4d

BODIPY FL succinimidyl ester (5 mg, 0.013 mmol, Life Technologies) and Hünig's base (50 µL, 2.8 mmol) were added to the solution of Tetrazine-amine S6 (6 mg, 0.02 mmol) in dry DMF (1 mL). The reaction mixture was stirred at room temperature for 16 h. The reaction mixture was diluted with 4 mL of water and the product was purified by semi-preparative reverse phase HPLC using a gradient from 10% to 90% of buffer B in buffer A (buffer A: H2O; buffer B: acetonitrile). The identity and purity of the tetrazine-BODIPY FL conjugate 4d was confirmed by LC-MS. ESI-MS: [M-H]⁻, calcd. 581.38, found 581.2.

SUMMARY OF EXAMPLES 1 to 5

We have characterized the synthesis and the genetically encoded, site-specific incorporation of a cyclopropene-containing amino acid 3, and demonstrated the quantitative labelling of 3 with tetrazine probes in proteins expressed in E. coli, mammalian cells and D. melanogaster, thereby showing the widespread utility and industrial application of the present invention.

Supplementary References to Examples 1 to 5

1. Gautier, A. et al. Genetically Encoded Photocontrol of Protein Localization in Mammalian Cells. Journal of the American Chemical Society 132, 4086-4088 (2010).

2. Karp, N.A., Kreil, D.P. & Lilley, K.S. Determining a significant change in protein expression with DeCyder during a pair-wise comparison using two-dimensional difference gel electrophoresis. Proteomics 4, 1421-1432 (2004).

3. Karp, N.A. & Lilley, K.S. Design and analysis issues in quantitative proteomics studies. Proteomics 7 Suppl 1, 42-50 (2007).

4. Lilley, K.S. in Current Protocols in Protein Science (John Wiley & Sons, Inc., 2001).

5. Von Stetina, J.R., Lafever, K.S., Rubin, M. & Drummond-Barbosa, D. A Genetic Screen for Dominant Enhancers of the Cell-Cycle Regulator alpha-Endosulfine Identifies Matrimony as a Strong Functional Interactor in Drosophila. G3 (Bethesda) 1, 607-613 (2011).

6. Lang, K. et al. Genetically encoded norbornene directs site-specific cellular protein labelling via a rapid bioorthogonal reaction. Nat Chem 4, 298-304 (2012).

Example 6 - Dual Labelling of Proteins

The ability to attach two distinct molecules to programmed sites in proteins will facilitate a variety of applications including FRET 1,2 to study protein structure, conformation and dynamics. Several approaches for doubly labeling proteins have been reported. One approach relies on the installation of one unnatural amino acid that is specifically labeled in combination with cysteine thiol labeling, but this approach is generally limited to proteins that do not contain free thiols. 3,4 Chemical ligation approaches can be combined with the genetic encoding of a single unnatural amino acid for protein labeling, 5 but this may limit the size and/or sites that may be labeled. Perhaps the most generally applicable approach for protein double labelling is based on the genetic incorporation of two distinct amino acids in response to two distinct codons introduced at user-defined sites in the gene of interest. An ideal strategy for dual labeling requires i) the efficient, cellular incorporation into a protein of two distinct unnatural amino acids that can be labelled in mutually orthogonal reactions, and ii) the development of mutually orthogonal reactions that allow the simultaneous addition of two molecules to the protein for rapid, quantitative labelling of the protein in aqueous media at physiological pH, temperature and pressure.

Scheme A (Figure 14) shows concerted, rapid, one-pot quantitative dual labelling of proteins in aqueous medium at physiological pH and temperature. (a) Unnatural amino acids and fluorophores used in this example. (b) Concerted labeling at an encoded terminal alkyne and an encoded cyclopropene via mutually orthogonal cycloadditions.

A limited range of chemistries have been investigated for the double labeling of proteins containing pairs of unnatural amino acids. The incorporation of azide- and alkyne-containing amino acids, and their non-quantitative labeling with alkyne and azide based fluorophores, has been reported, 7 but this is not ideal for double labeling of proteins; if the encoded azide and alkyne are in proximity they can react to form a triazole in the protein, a strategy which allows genetically directed protein stapling, 6 but precludes labeling with probes. Moreover, an efficient one-pot reaction is not feasible because the azide- and alkyne-bearing probes react with each other. The incorporation of ketone and azide containing amino acids has been reported, 8,10 which allows one-pot reaction of the encoded ketone with alpha effect nucleophiles, and of the azides with alkyne probes. 10 However, this approach is problematic because encoded azides are subject to reduction in many proteins when expressed in E. coli, 8,11 which prevents quantitative labeling. Moreover, ketone labeling with alpha effect nucleophiles is very slow (rate constant approximately 10⁻⁴ M⁻¹ s⁻¹) and the reaction is optimal at pH 4-5.5, 12 which limits its utility for many proteins that are denatured or precipitate when kept for long periods under acidic conditions.

We recently genetically installed a deactivated tetrazine containing amino acid 13 and a norbornene containing amino acid 14-16 into proteins using our optimized orthogonal translation system. 9 Because the rate of inverse electron demand Diels Alder reaction between the deactivated tetrazine and norbornene is very slow, but the tetrazine can react with bicyclononyne based probes and the norbornene can react with activated tetrazine probes, we were able to use this approach to specifically and quantitatively double label proteins. 9 While this approach has the advantage of proceeding in aqueous media at physiological pH, temperature and pressure, it does require sequential labeling steps (to avoid inverse electron demand reactions between probes), each of which takes several hours, with purification between steps. All approaches reported to date for doubly labeling proteins at genetically encoded unnatural amino acids take tens of hours to days to reach completion.

An ideal approach to double label proteins would allow rapid one-pot labeling of genetically installed bio-orthogonal functional groups, proceed rapidly in aqueous media at physiological pH, temperature and pressure, and be implemented simply by adding the labeling reagents to a recombinant protein bearing the site-specifically incorporated bioorthogonal groups. A promising pair of mutually orthogonal reactions for one-pot labeling under aqueous conditions at physiological pH are the Cu(I)-catalysed 3+2 cycloaddition between azides and terminal alkynes, 17 and the inverse electron demand Diels Alder reaction of strained alkenes and tetrazines 18-23 (Figure 11). The reaction of strained alkynes and azides can also be orthogonal to strained alkene tetrazine reactions, but since tetrazines react with strained alkynes this approach requires careful tuning of the rate constants for each reaction. 24 No combination of 3+2 cycloaddition and inverse electron demand Diels Alder reaction has been demonstrated for protein labelling.

We demonstrated in examples 1 to 5 that a 1,3 disubstituted cyclopropene containing amino acid, 2 (referred to as 3 in examples 1 to 5 and elsewhere in this document), can be efficiently and site-specifically incorporated into proteins using the PylRS/tRNACUA pair. 25 This amino acid, unlike the 3,3 disubstituted cyclopropene incorporated for photoclick reactions, 26 reacts with tetrazines 19,27 with on-protein rate constants of 27 M⁻¹ s⁻¹. 25 Here we demonstrate the efficient genetic encoding of a terminal alkyne containing amino acid 1 and a cyclopropene containing amino acid 2 into a single protein and their rapid, quantitative, one-pot labeling with azide and tetrazine probes (Figure 11). This work provides the first approach to the concerted double labeling of proteins in a one-pot process under aqueous conditions, at physiological pH, and provides a step change in the speed of double labeling, from days in previous work to 30 minutes in the approach reported here.
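The difference between the rate constants quoted in this example can be made concrete with a short calculation. The sketch below compares the on-protein cyclopropene-tetrazine rate constant (27 M⁻¹ s⁻¹) with the approximate value quoted earlier in this example for ketone labelling with alpha effect nucleophiles (about 10⁻⁴ M⁻¹ s⁻¹). It assumes pseudo-first-order conditions with the same probe concentration for both reactions; the 1 mM probe concentration and the function names are hypothetical and serve only to illustrate the scaling.

```python
import math

K_CYCLOPROPENE_TETRAZINE = 27.0     # M^-1 s^-1, on-protein value quoted in the text
K_KETONE_ALPHA_NUCLEOPHILE = 1e-4   # M^-1 s^-1, approximate value quoted in the text

def half_life_hours(k2, probe_conc_molar):
    """Reaction half-life in hours under pseudo-first-order conditions
    (probe in excess at a constant concentration)."""
    return math.log(2) / (k2 * probe_conc_molar) / 3600.0

def rate_ratio(k_fast, k_slow):
    """Ratio of two second-order rate constants. At equal probe
    concentration this is also the ratio of the two half-lives."""
    return k_fast / k_slow

if __name__ == "__main__":
    probe = 1e-3  # M, HYPOTHETICAL probe concentration used for both reactions
    print(half_life_hours(K_CYCLOPROPENE_TETRAZINE, probe), "h")
    print(half_life_hours(K_KETONE_ALPHA_NUCLEOPHILE, probe), "h")
    print(rate_ratio(K_CYCLOPROPENE_TETRAZINE, K_KETONE_ALPHA_NUCLEOPHILE))
```

Because the half-life scales inversely with the rate constant, the ratio of the two constants does not depend on the assumed concentration. Real labelling protocols differ in probe concentration and conditions, so this is an order-of-magnitude comparison only.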

Proteins containing either 1 or 2 were overexpressed to examine the specificity and orthogonality of the proposed labeling reactions. A fusion protein of glutathione-S-transferase and calmodulin (GST-CaM) with amino acid 1 at position 1 in calmodulin was expressed from cells containing ribo-Q1 (an evolved orthogonal ribosome 6,8,29), O-gst-cam1TAG (a fusion gene between glutathione-S-transferase (gst) and calmodulin (cam) on an orthogonal message 30 in which the first codon of cam is replaced with a TAG codon), and MjPrpRS/tRNACUA (a synthetase/tRNA pair developed for incorporating 1 in response to the TAG codon) 31 grown in the presence of 1 (4 mM). The GST tag was subsequently removed by cleavage using thrombin at an engineered thrombin-cleavage site between GST and CaM. CaM1₁ (CaM containing 1 at position 1, ~100 pmol) was labelled with the azide containing fluorophore 3 (2 nmol) in a Cu(I)-catalysed click reaction. The reaction was quantitative as judged by both the quantitative shift of the fluorescently labelled protein by SDS-PAGE and electrospray ionization mass spectrometry (ESI-MS) (Figure 11a).

The cyclopropene containing amino acid, 2, was site-specifically incorporated at position 40 of calmodulin. The modified protein was expressed in cells bearing the PylRS/tRNACUA pair (that efficiently directs the site-specific incorporation of 2), 25 ribo-Q1, and O-gst-cam40TAG grown in the presence of 2 (1 mM). CaM2₄₀ (~100 pmol) (obtained after thrombin cleavage of the GST tag) was labelled with the tetrazine containing fluorophore 4 (2 nmol). The reaction was quantitative as judged by both the quantitative shift of the fluorescently labelled protein by SDS-PAGE and electrospray ionization mass spectrometry (ESI-MS) (Figure 11b). CaM2₄₀ was not labeled with 3 under the conditions that led to quantitative labeling of CaM1₁ with 3 (Figure 11a). Similarly, CaM1₁ was not labeled with 4 under conditions where CaM2₄₀ was quantitatively labeled with 4. These experiments demonstrate that the two labeling reagents react quantitatively with their target amino acid, but do not react with their non-targeted unnatural amino acid in proteins.

Next we investigated labeling 1 and 2 within the same protein. We site-specifically incorporated 1 and 2 at positions 1 and 40 of calmodulin to produce CaM1₁2₄₀ (Figure 12). We directed the incorporation of amino acid 1 with an MjPrpRS/tRNACUA pair and the incorporation of amino acid 2 with the evolved PylRS/tRNAUACU pair, which efficiently decodes the quadruplet AGTA codon on orthogonal messages using ribo-Q1. 9 Unnatural amino acids were incorporated in response to UAG and AGTA codons at positions 1 and 40 in calmodulin, within a GST-calmodulin gene on an orthogonal message (O-gst-cam1TAG-40AGTA). Expression of full-length GST-CaM1₁2₄₀ was dependent on the addition of amino acids 1 and 2 to E. coli, and ESI-MS demonstrated the genetically directed incorporation of amino acids 1 and 2 (Figure 12c). The yield of full-length GST-CaM1₁2₄₀ was ~2 mg per L of culture.

To determine the time required to quantitatively label CaM1₁2₄₀ with azide 3 or tetrazine 4 we incubated 100 pmol of CaM1₁2₄₀ with 2 nmol of either 3 or 4 and followed each reaction by both mobility shift on SDS-PAGE and fluorescent imaging upon labeling (Figure 12b). These experiments demonstrate that fluorophore labeling is complete in 30 minutes.

Next we investigated the labeling of CaM1₁2₄₀ with both 3 and 4 (Figure 13). We first tested the addition of 4 (2 nmol) to CaM1₁2₄₀ (100 pmol) followed by purification to remove free 4, and subsequent labelling with 3 (2 nmol) (Figure 13a, lane 4). This led to efficient double labelling as judged by SDS-PAGE mobility shift and fluorescence imaging. Next we performed sequential labeling without purification by incubating CaM1₁2₄₀ with 4 for 30 minutes and then adding 3 and click reagents and incubating for a further 30 min (Figure 13a, lane 5). This also led to efficient double labelling as judged by SDS-PAGE mobility shift and fluorescence imaging. Finally, we simultaneously added 4 (2 nmol), 3 (2 nmol) and click reagents to CaM1₁2₄₀ (100 pmol) and incubated for 30 minutes (Figure 13a, lane 6). This again led to efficient double labelling as judged by SDS-PAGE mobility shift and fluorescence imaging. In all doubly labeled proteins we observe a decrease in the BODIPY-FL fluorescence relative to the singly labeled control upon excitation at 488 nm (compare lanes 4, 5, and 6 to lane 3 in Figure 13a), consistent with in-gel Förster resonance energy transfer (FRET) between BODIPY-FL and BODIPY-TMR-X. ESI-MS further demonstrates that this concerted, one-pot protocol leads to genetically directed, efficient, rapid and quantitative double labeling of proteins.

In summary, in this example we show an efficient and rapid protocol for expressing recombinant proteins bearing a site-specifically incorporated alkyne and a site-specifically incorporated cyclopropene. We demonstrate that the inverse electron demand Diels Alder reaction of an encoded 1,3 disubstituted cyclopropene and tetrazine probe, and the 3+2 cycloaddition reaction of the encoded alkyne and azide probe, are mutually orthogonal to each other and to the functional groups in proteins. By combining the genetic encoding of an alkyne and a cyclopropene in a single protein and labelling with the mutually orthogonal reactions we demonstrate the concerted, one-pot, rapid double labeling of a protein in aqueous media at physiological pH and temperature. This strategy has utility for doubly labeling proteins for a variety of studies and applications, and may be extended to the double labeling of diverse molecules in diverse cells and organisms.

Note on example 6: The chemical designations in example 6 and in the corresponding figures (drawings) discussed in example 6 are self-contained and apply only to example 6. Chemical designations in the rest of this document are consistent with one another, with the exception of example 6. For example, the skilled reader will immediately appreciate that compound 2 of example 6 corresponds to compound 3 in the rest of this document (i.e. the exemplary cyclopropene amino acid of the invention). Compounds 3 and 4 of example 6 are tetrazine compounds.

REFERENCES TO EXAMPLE 6

(1) Zhang, J.; Campbell, R. E.; Ting, A. Y.; Tsien, R. Y. Nature Reviews Molecular Cell Biology 2002, 3, 906.

(2) Kajihara, D.; Abe, R.; Iijima, I.; Komiyama, C.; Sisido, M.; Hohsaka, T. Nat Methods 2006, 3, 923.

(3) Brustad, E. M.; Lemke, E. A.; Schultz, P. G.; Deniz, A. A. J Am Chem Soc 2008, 130, 17664.

(4) Nguyen, D. P.; Elliott, T.; Holt, M.; Muir, T. W.; Chin, J. W. J Am Chem Soc 2011, 133, 11418.

(5) Wissner, R. F.; Batjargal, S.; Fadzen, C. M.; Petersson, E. J. J Am Chem Soc 2013, 135, 6529.

(6) Neumann, H.; Wang, K.; Davis, L.; Garcia-Alai, M.; Chin, J. W. Nature 2010, 464, 441.

(7) Wan, W.; Huang, Y.; Wang, Z.; Russell, W. K.; Pai, P. J.; Russell, D. H.; Liu, W. R. Angew Chem Int Ed Engl 2010, 49, 3211.

(8) Chatterjee, A.; Sun, S. B.; Furman, J. L.; Xiao, H.; Schultz, P. G. Biochemistry 2013,

(9) Wang, K.; Sachdeva, A.; Cox, D. J.; Wilf, N. W.; Wallace, S.; Mehl, R. A.; Chin, J. W. submitted.

(10) Wu, B.; Wang, Z.; Huang, Y.; Liu, W. R. Chembiochem: a European journal of chemical biology 2012, 13, 1405.

(11) Sasmal, P. K.; Carregal-Romero, S.; Han, A. A.; Streu, C. N.; Lin, Z.; Namikawa, K.; Elliott, S. L.; Koster, R. W.; Parak, W. J.; Meggers, E. ChemBioChem 2012, 13, 1116.

(12) Rotenberg, S. A.; Calogeropoulou, T.; Jaworski, J. S.; Weinstein, I. B.; Rideout, D. Proceedings of the National Academy of Sciences of the United States of America 1991, 88, 2490.

(13) Seitchik, J. L.; Peeler, J. C.; Taylor, M. T.; Blackman, M. L.; Rhoads, T. W.; Cooley, R. B.; Refakis, C.; Fox, J. M.; Mehl, R. A. J Am Chem Soc 2012, 134, 2898.

(14) Lang, K.; Davis, L.; Torres-Kolbus, J.; Chou, C.; Deiters, A.; Chin, J. W. Nat Chem 2012, 4, 298.

(15) Plass, T.; Milles, S.; Koehler, C.; Szymanski, J.; Mueller, R.; Wießler, M.; Schultz, C.; Lemke, E. A. Angewandte Chemie International Edition 2012, 51, 4166.

(16) Kaya, E.; Vrabel, M.; Deiml, C.; Prill, S.; Fluxa, V. S.; Carell, T. Angewandte Chemie International Edition 2012, 51, 4466.

(17) Wang, Q.; Chan, T. R.; Hilgraf, R.; Fokin, V. V.; Sharpless, K. B.; Finn, M. G. J Am Chem Soc 2003, 125, 3192.

(18) Devaraj, N. K.; Weissleder, R. Accounts of Chemical Research 2011, 44, 816.

(19) Yang, J.; Seckute, J.; Cole, C. M.; Devaraj, N. K. Angewandte Chemie International Edition 2012, 51, 7476.

(20) Blackman, M. L.; Royzen, M.; Fox, J. M. J Am Chem Soc 2008, 130, 13518.

(21) Lang, K.; Davis, L.; Wallace, S.; Mahesh, M.; Cox, D. J.; Blackman, M. L.; Fox, J. M.; Chin, J. W. J Am Chem Soc 2012, 134, 10317.

(22) Borrmann, A.; Milles, S.; Plass, T.; Dommerholt, J.; Verkade, J. M. M.; Wießler, M.; Schultz, C.; van Hest, J. C. M.; van Delft, F. L.; Lemke, E. A. ChemBioChem 2012, 13, 2094.

(23) Schoch, J.; Staudt, M.; Samanta, A.; Wiessler, M.; Jaschke, A. Bioconjug Chem 2012, 23, 1382.

(24) Karver, M. R.; Weissleder, R.; Hilderbrand, S. A. Angew Chem Int Ed Engl 2012, 51, 920.

(25) Bianco, A.; Elliott, T. S.; Townsley, F. M.; Pisa, R.; Davis, L.; Elsasser, S. J.; Ernst, R. J.; Lang, K.; Sachdeva, A.; Chin, J. W. Under Review.

(26) Yu, Z.; Pan, Y.; Wang, Z.; Wang, J.; Lin, Q. Angewandte Chemie International Edition 2012, 51, 10600.

(27) Kamber, D. N.; Nazarova, L. A.; Liang, Y.; Lopez, S. A.; Patterson, D. M.; Shih, H. W.; Houk, K. N.; Prescher, J. A. J Am Chem Soc 2013, 135, 13680.

(28) Wang, K.; Schmied, W. H.; Chin, J. W. Angew Chem Int Ed Engl 2012, 51, 2288.

(29) Wang, K.; Neumann, H.; Peak-Chew, S. Y.; Chin, J. W. Nature biotechnology 2007, 25, 770.

(30) Rackham, O.; Chin, J. W. Nature chemical biology 2005, 1, 159.

(31) Deiters, A.; Schultz, P. G. Bioorganic & Medicinal Chemistry Letters 2005, 15, 1521.