Title:
ORTHOGONAL Q-RIBOSOMES
Document Type and Number:
WIPO Patent Application WO/2011/077075
Kind Code:
A1
Abstract:
The invention relates to 16S rRNA comprising a mutation at A1196, and to 16S rRNA further comprising a mutation at C1195 and/or A1197, and to 16S rRNA which comprises (i) C1195A and A1196G; or (ii) C1195T, A1196G and A1197G; or (iii) A1196G and A1197G. The invention also relates to ribosomes comprising such 16S rRNAs and to use of same.

Inventors:
CHIN JASON (GB)
WANG KAIHANG (GB)
NEUMANN HEINZ (DE)
Application Number:
PCT/GB2010/002296
Publication Date:
June 30, 2011
Filing Date:
December 20, 2010
Assignee:
MEDICAL RES COUNCIL (GB)
CHIN JASON (GB)
WANG KAIHANG (GB)
NEUMANN HEINZ (DE)
International Classes:
C12N15/11; C12P21/02
Domestic Patent References:
WO2008065398A22008-06-05
Foreign References:
GB2007004562W2007-11-28
GB2006002637W2006-07-14
Other References:
MARK S WILSON ET AL: "Novel Archaea and Bacteria Dominate Stable Microbial Communities in North America's Largest Hot Spring", MICROBIAL ECOLOGY, SPRINGER-VERLAG, NE, vol. 56, no. 2, 13 December 2007 (2007-12-13), pages 292 - 305, XP019623389, ISSN: 1432-184X
MUTH GREGORY W ET AL: "Using a targeted chemical nuclease to elucidate conformational changes in the E. coli 30S ribosomal subunit", BIOCHEMISTRY, vol. 39, no. 14, 11 April 2000 (2000-04-11), pages 4068 - 4074, XP002633712, ISSN: 0006-2960
WANG KAIHANG ET AL: "Evolved orthogonal ribosomes enhance the efficiency of synthetic genetic code expansion", NATURE BIOTECHNOLOGY, NATURE PUBLISHING GROUP, NEW YORK, NY, US, vol. 25, no. 7, 1 July 2007 (2007-07-01), pages 770 - 777, XP002478973, ISSN: 1087-0156, [retrieved on 20070624], DOI: DOI:10.1038/NBT1314
RACKHAM OLIVER ET AL: "A network of orthogonal ribosome center dot mRNA pairs", NATURE CHEMICAL BIOLOGY, NATURE PUBLISHING GROUP, NEW YORK, NY, US, vol. 1, no. 3, 1 August 2005 (2005-08-01), pages 159 - 166, XP002478971, ISSN: 1552-4450, [retrieved on 20080717], DOI: DOI:10.1038/NCHEMBIO719
O'CONNOR MICHAEL ET AL: "Decoding fidelity at the ribosomal A and P sites: Influence of mutations in three different regions of the decoding domain in 16S rRNA", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 25, no. 6, 1 January 1997 (1997-01-01), pages 1185 - 1193, XP002478972, ISSN: 0305-1048, DOI: DOI:10.1093/NAR/25.6.1185
XIE JIANMING ET AL: "A CHEMICAL TOOLKIT FOR PROTEINS--AN EXPANDED GENETIC CODE", NATURE REVIEWS MOLECULAR CELL BIOLOGY, MACMILLAN MAGAZINES, LONDON, GB, vol. 7, no. 10, 1 October 2006 (2006-10-01), pages 775 - 782, XP009084566
TATUSOVA; MADDEN: "Blast 2 sequences - a new tool for comparing protein and nucleotide sequences", FEMS MICROBIOL. LETT., vol. 174, 1999, pages 247 - 250
"The Ribosome", vol. LXVI, 2001, COLD SPRING HARBOR LABORATORY PRESS
LAURSEN, B.S.; SORENSEN, H.P.; MORTENSEN, K.K.; SPERLING-PETERSEN, H.U., MICROBIOL MOL BIOL REV, vol. 69, 2005, pages 101 - 123
WIKSTROM, P.M.; LIND, L.K.; BERG, D.E.; BJORK, G.R., J MOL BIOL, vol. 224, 1992, pages 949 - 966
SHINE, J.; DALGARNO, L., BIOCHEM J, vol. 141, 1974, pages 609 - 615
STEITZ, J.A.; JAKES, K., PROC NATL ACAD SCI U S A, vol. 72, 1975, pages 4734 - 4738
YUSUPOVA, G.Z.; YUSUPOV, M.M.; CATE, J.H.; NOLLER, H.F., CELL, vol. 106, 2001, pages 233 - 241
CHEN, H.; BJERKNES, M.; KUMAR, R.; JAY, E., NUCLEIC ACIDS RES, vol. 22, 1994, pages 4953 - 4957
GOTTESMAN, S. ET AL.: "The Ribosome", vol. LXVI, 2001, COLD SPRING HARBOR LABORATORY PRESS
LOOMAN, A.C.; BODLAENDER, J.; DE GRUYTER, M.; VOGELAAR, A.; VAN KNIPPENBERG, P.H., NUCLEIC ACIDS RES, vol. 14, 1986, pages 5481 - 5497
LIEBHABER, S.A.; CASH, F.; ESHLEMAN, S.S., J MOL BIOL, vol. 226, 1992, pages 609 - 621
WINKLER, W.; NAHVI, A.; BREAKER, R.R., NATURE, vol. 419, 2002, pages 952 - 956
LAURSEN, B.S.; SORENSEN, H.P.; MORTENSEN, K.K.; SPERLING-PETERSEN, H.U.: "Initiation of protein synthesis in bacteria", MICROBIOL MOL BIOL REV, vol. 69, 2005, pages 101 - 123, XP055234766, DOI: doi:10.1128/MMBR.69.1.101-123.2005
SHULTZABERGER, R.K.; BUCHEIMER, R.E.; RUDD, K.E.; SCHNEIDER, T.D., J MOL BIOL, vol. 313, 2001, pages 215 - 228
LEE, K.; HOLLAND-STALEY, C.A.; CUNNINGHAM, P.R., RNA, vol. 2, 1996, pages 1270 - 1285
WOOD, T.K.; PERETTI, S.W., BIOTECHNOL. BIOENG, vol. 38, 1991, pages 891 - 906
JACOB, W.F.; SANTER, M.; DAHLBERG, A.E., PROC NATL ACAD SCI U S A, vol. 84, 1987, pages 4757 - 4761
LUDWIG; SCHLEIFER, FEMS MICROBIOL. REV., vol. 15, 1994, pages 155 - 73
"Bergey's Manual of Systematic Bacteriology", vol. 1, 2, SPRINGER
COLE JR; CHAI B; FARRIS RJ; WANG Q; KULAM SA; MCGARRELL DM; GARRITY GM; TIEDJE JM, NUCLEIC ACIDS RES, vol. 33, 2005, pages D294 - D296
DUNNY ET AL., APPL. ENVIRON. MICROBIOL., vol. 57, 1991, pages 1194 - 1201
XIE, J.; SCHULTZ, P. G.: "A chemical toolkit for proteins--an expanded genetic code", NAT REV MOL CELL BIOL, vol. 7, 2006, pages 775 - 82, XP009084566
STEER, B. A.; SCHIMMEL, P.: "Major anticodon-binding region missing from an archaebacterial tRNA synthetase", J BIOL CHEM, vol. 274, 1999, pages 35601 - 6, XP002976469, DOI: doi:10.1074/jbc.274.50.35601
CHIN, J. W.; MARTIN, A. B.; KING, D. S.; WANG, L.; SCHULTZ, P. G.: "Addition of a photocrosslinking amino acid to the genetic code of Escherichia coli", PROC NATL ACAD SCI U S A, vol. 99, no. 1, 2002, pages 1020 - 4
SRINIVASAN, G.; JAMES, C. M.; KRZYCKI, J. A.: "Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA", SCIENCE, vol. 296, 2002, pages 1459 - 62, XP002518931, DOI: doi:10.1126/science.1069588
POLYCARPO, C. ET AL.: "An aminoacyl-tRNA synthetase that specifically activates pyrrolysine", PROC NATL ACAD SCI U S A, vol. 101, 2004, pages 12450 - 4, XP002518930, DOI: doi:10.1073/PNAS.0405362101
NEUMANN, H.; PEAK-CHEW, S. Y.; CHIN, J. W.: "Genetically encoding N(epsilon)-acetyllysine in recombinant proteins", NAT CHEM BIOL, vol. 4, 2008, pages 232 - 4, XP002518929, DOI: doi:10.1038/NCHEMBIO.73
NGUYEN, D. P. ET AL.: "Genetic encoding and labeling of aliphatic azides and alkynes in recombinant proteins via a pyrrolysyl-tRNA Synthetase/tRNA(CUA) pair and click chemistry", J AM CHEM SOC, vol. 131, 2009, pages 8720 - 1, XP009127968, DOI: doi:10.1021/ja900553w
RACKHAM, O.; CHIN, J. W.: "A network of orthogonal ribosome x mRNA pairs", NAT CHEM BIOL, vol. 1, 2005, pages 159 - 66, XP002478971, DOI: doi:10.1038/nchembio719
WANG, K.; NEUMANN, H.; PEAK-CHEW, S. Y.; CHIN, J. W.: "Evolved orthogonal ribosomes enhance the efficiency of synthetic genetic code expansion", NAT BIOTECHNOL, vol. 25, 2007, pages 770 - 7, XP002478973, DOI: doi:10.1038/nbt1314
ROSTOVTSEV, V. V.; GREEN, L. G.; FOKIN, V. V.; SHARPLESS, K. B.: "A stepwise huisgen cycloaddition process: copper(I)-catalyzed regioselective "ligation" of azides and terminal alkynes", ANGEW CHEM INT ED ENGL, vol. 41, 2002, pages 2596 - 9, XP002524189, DOI: doi:10.1002/1521-3773(20020715)41:14<2596::AID-ANIE2596>3.0.CO;2-4
HOHSAKA, T.; SISIDO, M.: "Incorporation of non-natural amino acids into proteins", CURR OPIN CHEM BIOL, vol. 6, 2002, pages 809 - 15, XP008099492, DOI: doi:10.1016/S1367-5931(02)00376-9
OHTSUKI, T.; MANABE, T.; SISIDO, M.: "Multiple incorporation of non-natural amino acids into a single protein using tRNAs with non-standard structures", FEBS LETT, vol. 579, 2005, pages 6769 - 74, XP005205440, DOI: doi:10.1016/j.febslet.2005.11.010
MURAKAMI, H.; HOHSAKA, T.; ASHIZUKA, Y.; SISIDO, M.: "Site-directed incorporation of p-nitrophenylalanine into streptavidin and site-to-site photoinduced electron transfer from a pyrenyl group to a nitrophenyl group on the protein framework", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 120, 1998, pages 7520 - 7529
RODRIGUEZ, E. A.; LESTER, H. A.; DOUGHERTY, D. A.: "In vivo incorporation of multiple unnatural amino acids through nonsense and frameshift suppression", PROC NATL ACAD SCI U S A, vol. 103, 2006, pages 8650 - 5, XP055016054, DOI: doi:10.1073/pnas.0510817103
MONAHAN, S. L.; LESTER, H. A.; DOUGHERTY, D. A.: "Site-specific incorporation of unnatural amino acids into receptors expressed in mammalian cells", CHEMISTRY AND BIOLOGY, vol. 10, 2003, pages 573 - 580, XP002296506, DOI: doi:10.1016/S1074-5521(03)00124-8
ANDERSON, J. C. ET AL.: "An expanded genetic code with a functional quadruplet codon", PROC NATL ACAD SCI U S A, vol. 101, 2004, pages 7566 - 71, XP055227944, DOI: doi:10.1073/pnas.0401517101
ATKINS, J. F.; BJORK, G. R.: "A gripping tale of ribosomal frameshifting: extragenic suppressors of frameshift mutations spotlight P-site realignment", MICROBIOL MOL BIOL REV, vol. 73, 2009, pages 178 - 210
STAHL, G.; MCCARTY, G. P.; FARABAUGH, P. J.: "Ribosome structure: revisiting the connection between translational accuracy and unconventional decoding", TRENDS BIOCHEM SCI, vol. 27, 2002, pages 178 - 83, XP004347051, DOI: doi:10.1016/S0968-0004(02)02064-9
SELMER, M. ET AL.: "Structure of the 70S ribosome complexed with mRNA and tRNA", SCIENCE, vol. 313, 2006, pages 1935 - 42, XP055131420, DOI: doi:10.1126/science.1131127
MAGLIERY, T. J.; ANDERSON, J. C.; SCHULTZ, P. G.: "Expanding the genetic code: selection of efficient suppressors of four-base codons and identification of "shifty" four-base codons with a library approach in Escherichia coli", J MOL BIOL, vol. 307, 2001, pages 755 - 69, XP004464163, DOI: doi:10.1006/jmbi.2001.4518
KHAZAIE, K.; BUCHANAN, J. H.; ROSENBERGER, R. F.: "The accuracy of Q beta RNA translation. 1. Errors during the synthesis of Q beta proteins by intact Escherichia coli cells", EUR J BIOCHEM, vol. 144, 1984, pages 485 - 9
LAUGHREA, M.; LATULIPPE, J.; FILION, A. M.; BOULET, L.: "Mistranslation in twelve Escherichia coli ribosomal proteins. Cysteine misincorporation at neutral amino acid residues other than tryptophan", EUR J BIOCHEM, vol. 169, 1987, pages 59 - 64
KRAMER, E. B.; FARABAUGH, P. J.: "The frequency of translational misreading errors in E. coli is largely determined by tRNA competition", RNA, vol. 13, 2007, pages 87 - 96
CHIN, J. W. ET AL.: "Addition of p-azido-L-phenylalanine to the genetic code of Escherichia coli", J AM CHEM SOC, vol. 124, 2002, pages 9026 - 7, XP008129464, DOI: doi:10.1021/ja027007w
MUKAI, T. ET AL.: "Adding L-lysine derivatives to the genetic code of mammalian cells with engineered pyrrolysyl-tRNA synthetases", BIOCHEM BIOPHYS RES COMMUN, vol. 371, 2008, pages 818 - 22, XP022688470, DOI: doi:10.1016/j.bbrc.2008.04.164
CAMARERO, J. A.; PAVEL, J.; MUIR, T. W.: "Chemical Synthesis of a Circular Protein Domain: Evidence for Folding-Assisted Cyclization", ANGEWANDTE CHEMIE - INTERNATIONAL EDITION, vol. 37, 1998, pages 347 - 349
SCOTT, C. P.; ABEL-SANTOS, E.; WALL, M.; WAHNON, D. C.; BENKOVIC, S. J.: "Production of cyclic peptides and proteins in vivo", PROC NATL ACAD SCI U S A, vol. 96, 1999, pages 13638 - 43, XP002944416, DOI: doi:10.1073/pnas.96.24.13638
LI, P.; ROLLER, P. P.: "Cyclization strategies in peptide derived drug design", CURR TOP MED CHEM, vol. 2, 2002, pages 325 - 41, XP055094202, DOI: doi:10.2174/1568026023394209
WALENSKY, L. D. ET AL.: "Activation of apoptosis in vivo by a hydrocarbon-stapled BH3 helix", SCIENCE, vol. 305, 2004, pages 1466 - 70, XP002555229, DOI: doi:10.1126/science.1099191
TRAUGER, J. W.; KOHLI, R. M.; MOOTZ, H. D.; MARAHIEL, M. A.; WALSH, C. T.: "Peptide cyclization catalysed by the thioesterase domain of tyrocidine synthetase", NATURE, vol. 407, 2000, pages 215 - 8
STEMMER, W. P.; MORRIS, S. K.: "Enzymatic inverse PCR: a restriction site independent, single-fragment method for high-efficiency, site-directed mutagenesis", BIOTECHNIQUES, vol. 13, 1992, pages 214 - 20
SANTORO, S. W.; WANG, L.; HERBERICH, B.; KING, D. S.; SCHULTZ, P. G.: "An efficient system for the evolution of aminoacyl-tRNA synthetase specificity", NAT BIOTECHNOL, vol. 20, 2002, pages 1044 - 8, XP003013349, DOI: doi:10.1038/nbt742
RICE, J. B.; LIBBY, R. T.; REEVE, J. N.: "Mistranslation of the mRNA encoding bacteriophage T7 0.3 protein", J BIOL CHEM, vol. 259, 1984, pages 6505 - 10
HAO, B. ET AL.: "A new UAG-encoded residue in the structure of a methanogen methyltransferase", SCIENCE, vol. 296, 2002, pages 1462 - 6
KOBAYASHI, T. ET AL.: "Structural basis for orthogonal tRNA specificities of tyrosyl-tRNA synthetases for genetic code expansion", NAT STRUCT BIOL, vol. 10, 2003, pages 425 - 32, XP002375045, DOI: doi:10.1038/nsb934
Attorney, Agent or Firm:
RICHARDS, William (Shelley John Amor, Greenwood LLP,7 Gay Street, Bath BA1 2PH, GB)
Claims:
CLAIMS

1. A 16S rRNA comprising a mutation at A1196.

2. A 16S rRNA according to claim 1 wherein said mutation is A1196G.

3. A 16S rRNA according to claim 1 or claim 2 further comprising a mutation at C1195 and/or A1197.

4. A 16S rRNA according to any preceding claim which comprises

(i) C1195A and A1196G; or

(ii) C1195T, A1196G and A1197G; or

(iii) A1196G and A1197G.

5. A 16S rRNA according to any preceding claim which further comprises A531G and U534A.

6. A ribosome capable of translating a quadruplet codon, said ribosome comprising a 16S rRNA according to any preceding claim.

7. Use of a 16S rRNA according to any preceding claim in the translation of a mRNA comprising at least one quadruplet codon.

8. A 16S rRNA, ribosome, cell or method substantially as described herein.

9. A 16S rRNA, ribosome, cell or method according to claim 8 with reference to the accompanying figures.

Description:
Orthogonal Q-Ribosomes

Field of the invention

The invention relates to ribosomes for translation of quadruplet codons.

Background to the Invention

Since each of the 64 triplet codons is used to encode a natural amino acid or polypeptide termination, new blank codons are required for cellular genetic code expansion. In principle, quadruplet codons might provide 256 blank codons.
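The codon counts quoted above follow from choosing one of four bases at each of three or four positions. A minimal Python sketch of that arithmetic (the variable names are illustrative and do not come from the application):

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases

# Every ordered choice of 3 (or 4) bases is a distinct codon.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
quadruplets = ["".join(p) for p in product(BASES, repeat=4)]

print(len(triplets))     # 4**3 codons
print(len(quadruplets))  # 4**4 codons
```

The enumeration says nothing about which quadruplets are usable in practice; it only reproduces the counts of 64 and 256.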

Stoichiometrically aminoacylated extended anticodon tRNAs have been used to incorporate unnatural amino acids in response to 4-base codons with very low efficiency in in vitro systems , 3 and in limited in vivo systems, via import of previously aminoacylated tRNA 14,15. This is a problem in the art.

In one case a 4-base suppressor and amber codon have been used, in a non-generalizable approach, to encode two unremarkable amino acids with low efficiency ,6. Indeed, the inefficiency with which natural ribosomes decode quadruplet codons severely limits their utility for genetic code expansion, which is a problem in the art.

The present invention seeks to overcome problem(s) associated with the prior art.

Summary of the invention

The inventors have mutated certain ribosomal components to produce a ribosome with a new technical capability of translating quadruplet codons. The mutations have focussed on the 16S rRNA. The ribosomes produced according to the present invention are sometimes referred to as quadruplet-ribosomes or Q-Ribosomes (RiboQ).

In one aspect, the invention relates to a 16S rRNA comprising a mutation at A1196.

In one aspect, the invention relates to a 16S rRNA comprising a mutation at A1196 and at least one further mutation selected from C1195T, A1197G, C1195A. In another aspect, the invention relates to a 16S rRNA as described above further comprising a mutation at C1195 and/or A1197. In another aspect, the invention relates to a 16S rRNA as described above which comprises

(i) C1195A and A1196G; or

(ii) C1195T, A1196G and A1197G; or

(iii) A1196G and A1197G.

In another aspect, the invention relates to a ribosome capable of translating a quadruplet codon, said ribosome comprising a 16S rRNA as described above. In another aspect, the invention relates to use of a 16S rRNA as described above in the translation of a mRNA comprising at least one quadruplet codon.

Detailed Description of the Invention

In one aspect the invention relates to a 16S rRNA comprising a mutation at A1196.

Suitably said mutation is A1196G.

In another aspect, the invention relates to a 16S rRNA as described above further comprising a mutation at C1195 and/or A1197.

In another aspect, the invention relates to a 16S rRNA as described above which comprises

(i) C1195A and A1196G; or

(ii) C1195T, A1196G and A1197G; or

(iii) A1196G and A1197G.

In another aspect, the invention relates to a 16S rRNA as described above which further comprises A531G and U534A. In another aspect, the invention relates to a ribosome capable of translating a quadruplet codon, said ribosome comprising a 16S rRNA as described above.

In another aspect, the invention relates to use of a 16S rRNA as described above in the translation of a mRNA comprising at least one quadruplet codon.

Suitably the 16S rRNA of the invention comprising a mutation at A1196 comprises A1196G. This specific mutation is common to each of the preferred 16S rRNAs exemplified herein such as Q1, Q2, Q3 and Q4, which all possess A1196G (i.e. G at position 1196).

Suitably the 16S rRNA of the invention further comprises a mutation at A1197. Suitably the 16S rRNA of the invention comprising a mutation at A1197 comprises A1197G. This specific mutation is common to 75% of the preferred 16S rRNAs exemplified herein such as Q1, Q2 and Q3, which all possess A1197G (i.e. G at position 1197).

Suitably the 16S rRNA of the invention comprises a mutation at A1196 and a mutation at A1197. Most suitably the 16S rRNA of the invention comprises A1196G and A1197G. Each of Q1, Q2 and Q3 comprises this combination of mutations.

Suitably the 16S rRNA of the invention may comprise a mutation at C1195. This mutation may be C1195T or C1195A. Suitably the 16S rRNA of the invention which comprises a C1195 mutation also comprises an A1196 mutation such as A1196G. Suitably when the 16S rRNA of the invention comprises A1197G, it also comprises C1195T. Suitably when the 16S rRNA of the invention comprises A1196G and A1197G, it also comprises C1195T. Suitably when the 16S rRNA of the invention comprises A1196G and is wild type at A1197 (i.e. A at position 1197), it also comprises C1195A.
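The substitution labels used throughout (wild-type base, 1-based position, replacement base) can be read mechanically. The following Python sketch is illustrative only: the helper names are hypothetical, the five-nucleotide sequence is a toy stand-in rather than real 16S rRNA, and the parser accepts both T and U because the text uses both conventions.

```python
import re
from typing import Iterable, NamedTuple

class Substitution(NamedTuple):
    wild_type: str  # base expected at this position before mutation
    position: int   # 1-based nucleotide position
    mutant: str     # base present after the substitution

def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'A1196G' into its three parts."""
    m = re.fullmatch(r"([ACGTU])(\d+)([ACGTU])", label)
    if m is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(m.group(1), int(m.group(2)), m.group(3))

def apply_substitutions(sequence: str, labels: Iterable[str]) -> str:
    """Apply each substitution, checking the wild-type base first."""
    bases = list(sequence)
    for label in labels:
        sub = parse_substitution(label)
        found = bases[sub.position - 1]
        if found != sub.wild_type:
            raise ValueError(f"{label}: expected {sub.wild_type}, found {found}")
        bases[sub.position - 1] = sub.mutant
    return "".join(bases)

# Toy 5-nt sequence (hypothetical, NOT 16S rRNA) to show the mechanics.
toy = "GCAAU"
print(parse_substitution("A1196G"))
print(apply_substitutions(toy, ["C2A", "A3G"]))
```

Checking the wild-type base before substituting guards against off-by-one numbering, which matters when positions are quoted relative to a reference sequence.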

Further mutations may be present or may not be present.

Ribo-X and Ribo-Q

The Ribo-Q 16S rRNA sequences herein have been prepared from Ribo-X as a starting 16S rRNA sequence. Ribo-X is a published 16S rRNA sequence well known to the person skilled in the art. More specifically, Ribo-X refers to a 16S rRNA sequence which has two substitutions compared to wild type, namely A531G and U534A. Therefore suitably each Ribo-Q 16S rRNA sequence described herein also possesses A531G and U534A in addition to each further mutation or substitution discussed herein. It should be assumed that the 16S rRNAs of the invention each possess A531G and U534A in addition to any other mutations discussed, unless the context indicates otherwise. Thus, suitably each 16S rRNA of the invention comprises at least 3 mutations compared to wild type, namely a mutation at A1196 together with A531G and U534A, most suitably A1196G, A531G and U534A.

In case any more detail is needed, Ribo-X is discussed in depth in PCT/GB2007/004562 (published as WO2008/065398). This document is specifically incorporated herein by reference expressly for the detail of the Ribo-X 16S rRNA sequence which is the 'background' or parent sequence from which the Ribo-Q 16S rRNAs of the invention are derived and/or produced.

Suitably the 16S rRNA of the invention comprises A1196G and A1197G (Ribo-Q1, Ribo-Q2, Ribo-Q3).

Suitably the 16S rRNA of the invention comprises C1195T and A1196G and A1197G (Ribo-Q3). Suitably the 16S rRNA of the invention comprises C1195T and A1196G (Ribo-Q4).

In one embodiment the 16S rRNA of the invention consists of wild type 16S rRNA sequence and A531G and U534A and A1196G and A1197G (Ribo-Q1). In one embodiment the 16S rRNA of the invention consists of wild type 16S rRNA sequence and A531G and U534A and A1196G and A1197G and up to 8 further mutations/substitutions (Ribo-Q2).

In one embodiment the 16S rRNA of the invention consists of wild type 16S rRNA sequence and A531G and U534A and C1195T and A1196G and A1197G (Ribo-Q3).

In one embodiment the 16S rRNA of the invention consists of wild type 16S rRNA sequence and A531G and U534A and C1195T and A1196G (Ribo-Q4).

The invention relates to encoding multiple unnatural amino acids via evolution of a quadruplet decoding ribosome.
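The embodiments listed above can be tabulated to check the statements that A1196G is common to all four Ribo-Q variants and A1197G to 75% of them. This is a sketch under stated assumptions: the dictionary simply transcribes the substitutions named in this section, the further substitutions of Ribo-Q2 are not itemised here and are therefore omitted, and the function names are hypothetical.

```python
# Substitutions relative to wild-type 16S rRNA, transcribed from the
# embodiments in this section. Ribo-Q2 may carry up to 8 further
# substitutions that are not listed here, so they are left out.
RIBO_X = {"A531G", "U534A"}
VARIANTS = {
    "Ribo-Q1": RIBO_X | {"A1196G", "A1197G"},
    "Ribo-Q2": RIBO_X | {"A1196G", "A1197G"},
    "Ribo-Q3": RIBO_X | {"C1195T", "A1196G", "A1197G"},
    "Ribo-Q4": RIBO_X | {"C1195T", "A1196G"},
}

def shared_by_all(variants):
    """Substitutions present in every variant."""
    return set.intersection(*variants.values())

def fraction_with(variants, label):
    """Fraction of variants that carry the given substitution."""
    return sum(label in subs for subs in variants.values()) / len(variants)

print(sorted(shared_by_all(VARIANTS)))
print(fraction_with(VARIANTS, "A1197G"))
```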

Definitions

As the term "orthogonal" is used herein, it refers to a nucleic acid, for example rRNA or mRNA, which differs from natural, endogenous nucleic acid in its ability to cooperate with other nucleic acids. Orthogonal mRNA, rRNA and tRNA are provided in matched groups (cognate groups) which cooperate efficiently. For example, orthogonal rRNA, when part of a ribosome, will efficiently translate matched cognate orthogonal mRNA, but not natural, endogenous mRNA. For simplicity, a ribosome comprising an orthogonal rRNA is referred to herein as an "orthogonal ribosome," and an orthogonal ribosome will efficiently translate a cognate orthogonal mRNA. An orthogonal codon or orthogonal mRNA codon is a codon in orthogonal mRNA which is only translated by a cognate orthogonal ribosome, or translated more efficiently, or differently, by a cognate orthogonal ribosome than by a natural, endogenous ribosome. Orthogonal is abbreviated to O (as in O-mRNA).

Thus, by way of example, orthogonal ribosome (O-ribosome)•orthogonal mRNA (O-mRNA) pairs are composed of: an mRNA containing a ribosome binding site that does not direct translation by the endogenous ribosome, and an orthogonal ribosome that efficiently and specifically translates the orthogonal mRNA, but does not appreciably translate cellular mRNAs.

"Evolved", as applied herein for example in the expression "evolved orthogonal ribosome", refers to the development of a function of a molecule through diversification and selection. For example, a library of rRNA molecules diversified at desired positions can be subjected to selection according to the procedures described herein. An evolved rRNA is obtained by the selection process.

As used herein, the term "mRNA" when used in the context of an O-mRNA O-ribosome pair refers to an mRNA that comprises an orthogonal codon which is efficiently translated by a cognate O-ribosome, but not by a natural, wild-type ribosome. In addition, it may comprise a mutant ribosome binding site (particularly the sequence from the AUG initiation codon upstream to -13 relative to the AUG) that efficiently mediates the initiation of translation by the O-ribosome, but not by a wild-type ribosome. The remainder of the mRNA can vary, such that placing the coding sequence for any protein downstream of that ribosome binding site will result in an mRNA that is translated efficiently by the orthogonal ribosome, but not by an endogenous ribosome.

As used herein, the term "rRNA" when used in the context of an O-mRNA O-ribosome pair refers to an rRNA mutated such that the rRNA is an orthogonal rRNA, and a ribosome containing it is an orthogonal ribosome, i.e., it efficiently translates only a cognate orthogonal mRNA. The primary, secondary and tertiary structures of wild-type ribosomal rRNAs are very well known, as are the functions of the various conserved structures (stems-loops, hairpins, hinges, etc.). O-rRNA typically comprises a mutation in 16S rRNA which is responsible for binding of tRNA during the translation process. It may also comprise mutations in the 3' regions of the small rRNA subunit which are responsible for the initiation of translation and interaction with the ribosome binding site of mRNA. The expression of an "O-rRNA" in a cell, as the term is used herein, is not toxic to the cell. Toxicity is measured by cell death, or alternatively, by a slowing in the growth rate by 80% or more relative to a cell that does not express the "O-mRNA." Expression of an O-rRNA will preferably slow growth by less than 50%, preferably less than 25%, more preferably less than 10%, and more preferably still, not at all, relative to the growth of similar cells lacking the O-rRNA.

As used herein, the terms "more efficiently translates" and "more efficiently mediates translation" mean that a given O-mRNA is translated by a cognate O-ribosome at least 25% more efficiently, and preferably at least 2, 3, 4 or 8 or more times as efficiently as an O-mRNA is translated by a wild-type ribosome or a non-cognate O-ribosome in the same cell or cell type. As a gauge, for example, one may evaluate translation efficiency relative to the translation of an O-mRNA encoding chloramphenicol acetyl transferase using at least one orthogonal codon by a natural or non-cognate orthogonal ribosome.
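The threshold in this definition can be stated as a simple ratio test. The sketch below is an assumption-laden illustration: the function name, the idea of comparing two scalar reporter signals (for example chloramphenicol acetyl transferase activity), and the default ratio of 1.25 for "at least 25% more efficiently" are introduced here for clarity and are not a procedure given in the text.

```python
def more_efficiently_translates(cognate_signal: float,
                                reference_signal: float,
                                min_ratio: float = 1.25) -> bool:
    """Return True if the cognate O-ribosome signal is at least
    min_ratio times the reference (wild-type or non-cognate) signal.

    min_ratio=1.25 encodes "at least 25% more efficiently"; pass 2, 3,
    4 or 8 for the preferred fold-changes named in the definition.
    """
    if reference_signal <= 0:
        raise ValueError("reference_signal must be positive")
    return cognate_signal / reference_signal >= min_ratio

# Hypothetical reporter readings in arbitrary units.
print(more_efficiently_translates(130.0, 100.0))
print(more_efficiently_translates(120.0, 100.0))
print(more_efficiently_translates(300.0, 100.0, min_ratio=2))
```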

As used herein, the term "corresponding to" when used in reference to nucleotide sequence means that a given sequence in one molecule, e.g., in a 16S rRNA, is in the same position in another molecule, e.g., a 16S rRNA from another species. By "in the same position" is meant that the "corresponding" sequences are aligned with each other when aligned using the BLAST sequence alignment algorithm "BLAST 2 Sequences" described by Tatusova and Madden (1999, "Blast 2 sequences - a new tool for comparing protein and nucleotide sequences", FEMS Microbiol. Lett. 174:247-250) and available from the U.S. National Center for Biotechnology Information (NCBI). To avoid any doubt, the BLAST version 2.2.11 (available for use on the NCBI website or, alternatively, available for download from that site) is used, with default parameters as follows: program, blastn; reward for a match, 1; penalty for a mismatch, -2; open gap and extend gap penalties 5 and 2, respectively; gap x dropoff, 50; expect 10.0; word size 11; and filter on.
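Once two sequences have been aligned, finding the "corresponding" position is a matter of walking the gapped alignment. The sketch below assumes the alignment is already available as two equal-length strings with "-" for gaps; it does not run BLAST, and the function name and toy alignment are hypothetical.

```python
def corresponding_position(aligned_ref: str, aligned_other: str, ref_pos: int):
    """Map a 1-based position in the reference to the other sequence.

    Both arguments are rows of one pairwise alignment ('-' marks a gap).
    Returns None if the reference position aligns to a gap or is out of range.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    ref_count = other_count = 0
    for r, o in zip(aligned_ref, aligned_other):
        if r != "-":
            ref_count += 1
        if o != "-":
            other_count += 1
        # Only a column holding a reference base can match ref_pos.
        if r != "-" and ref_count == ref_pos:
            return other_count if o != "-" else None
    return None

# Toy alignment (hypothetical sequences, not 16S rRNA).
ref = "ACG-UAC"
other = "A-GGUAC"
print(corresponding_position(ref, other, 4))
print(corresponding_position(ref, other, 2))
```

In practice the two rows would come from the BLAST 2 Sequences output described above, with the E. coli 16S rRNA as the reference.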

As used herein, the term "selectable marker" refers to a gene sequence that permits selection for cells in a population that encode and express that gene sequence by the addition of a corresponding selection agent.

As used herein, the term "region comprising sequence that interacts with mRNA at the ribosome binding site" refers to a region of sequence comprising the nucleotides near the 3' terminus of 16S rRNA that physically interact, e.g., by base pairing or other interaction, with mRNA during the initiation of translation. The "region" includes nucleotides that base pair or otherwise physically interact with nucleotides in mRNA at the ribosome binding site, and nucleotides within five nucleotides 5' or 3' of such nucleotides. Also included in this "region" are bases corresponding to nucleotides 722 and 723 of the E. coli 16S rRNA, which form a bulge proximal to the minor groove of the Shine-Dalgarno helix formed between the ribosome and mRNA.

As used herein, the term "diversified" means that individual members of a library will vary in sequence at a given site. Methods of introducing diversity are well known to those skilled in the art, and can introduce random or less than fully random diversity at a given site. By "fully random" is meant that a given nucleotide can be any of G, A, T, or C (or in RNA, any of G, A, U and C). By "less than fully random" is meant that a given site can be occupied by more than one different nucleotide, but not all of G, A, T (U in RNA) or C, for example where diversity permits either G or A, but not U or C, or permits G, A, or U but not C at a given site.
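The distinction between fully random and less than fully random diversity determines library size. A minimal sketch, assuming a library is described by a template plus a map from 1-based positions to the bases allowed there (the function name, template and positions are hypothetical):

```python
from itertools import product

def enumerate_library(template: str, diversity: dict) -> list:
    """List every library member.

    diversity maps a 1-based position to a string of allowed bases,
    e.g. "GAUC" for fully random or "GA" for less than fully random.
    """
    positions = sorted(diversity)
    members = []
    for choice in product(*(diversity[p] for p in positions)):
        bases = list(template)
        for pos, base in zip(positions, choice):
            bases[pos - 1] = base
        members.append("".join(bases))
    return members

FULLY_RANDOM = "GAUC"
template = "AAAA"  # toy RNA template, not a real rRNA segment
library = enumerate_library(template, {2: FULLY_RANDOM, 3: "GA"})
print(len(library))  # 4 choices at position 2 times 2 at position 3
```

Library size is the product of the number of allowed bases at each diversified site, so restricting a site from four bases to two halves the library.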

As used herein, the term "ribosome binding site" refers to the region of an mRNA that is bound by the ribosome at the initiation of translation. As defined herein, the "ribosome binding site" of prokaryotic mRNAs includes the Shine-Dalgarno consensus sequence and nucleotides -13 to +1 relative to the AUG initiation codon.
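The window "-13 to +1 relative to the AUG" can be extracted by simple slicing. This sketch assumes the common numbering convention in which the A of AUG is +1 and the preceding base is -1 (no position 0), so the window spans 14 nucleotides; it also assumes the first AUG in the string is the initiation codon. The function name and the toy mRNA are hypothetical.

```python
def rbs_window(mrna: str, start_codon: str = "AUG", upstream: int = 13) -> str:
    """Return nucleotides -upstream..+1 relative to the first start codon.

    Assumes +1 is the A of AUG and there is no position 0.
    """
    aug = mrna.find(start_codon)  # 0-based index of the A in AUG
    if aug < 0:
        raise ValueError("no start codon found")
    if aug < upstream:
        raise ValueError("not enough upstream sequence")
    return mrna[aug - upstream: aug + 1]

# Toy mRNA (hypothetical) carrying the classic GGAGG motif upstream of AUG.
toy_mrna = "GCGCUUAAGGAGGUUUACCAUGGCU"
window = rbs_window(toy_mrna)
print(window, len(window))
```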

As used herein, the term "unnatural amino acid" refers to an amino acid other than the 20 amino acids that occur naturally in protein. Non-limiting examples include: a p-acetyl-L-phenylalanine, a p-iodo-L-phenylalanine, an O-methyl-L-tyrosine, a p-propargyloxyphenylalanine, a p-propargyl-phenylalanine, an L-3-(2-naphthyl)alanine, a 3-methyl-phenylalanine, an O-4-allyl-L-tyrosine, a 4-propyl-L-tyrosine, a tri-O-acetyl-GlcNAcβ-serine, an L-Dopa, a fluorinated phenylalanine, an isopropyl-L-phenylalanine, a p-azido-L-phenylalanine, a p-acyl-L-phenylalanine, a p-benzoyl-L-phenylalanine, an L-phosphoserine, a phosphonoserine, a phosphonotyrosine, a p-bromophenylalanine, a p-amino-L-phenylalanine, an isopropyl-L-phenylalanine, an unnatural analogue of a tyrosine amino acid; an unnatural analogue of a glutamine amino acid; an unnatural analogue of a phenylalanine amino acid; an unnatural analogue of a serine amino acid; an unnatural analogue of a threonine amino acid; an alkyl, aryl, acyl, azido, cyano, halo, hydrazine, hydrazide, hydroxyl, alkenyl, alkynyl, ether, thiol, sulfonyl, seleno, ester, thioacid, borate, boronate, phospho, phosphono, phosphine, heterocyclic, enone, imine, aldehyde, hydroxylamine, keto, or amino substituted amino acid, or a combination thereof; an amino acid with a photoactivatable cross-linker; a spin-labeled amino acid; a fluorescent amino acid; a metal binding amino acid; a metal-containing amino acid; a radioactive amino acid; a photocaged and/or photoisomerizable amino acid; a biotin or biotin-analogue containing amino acid; a keto containing amino acid; an amino acid comprising polyethylene glycol or polyether; a heavy atom substituted amino acid; a chemically cleavable or photocleavable amino acid; an amino acid with an elongated side chain; an amino acid containing a toxic group; a sugar substituted amino acid; a carbon-linked sugar-containing amino acid; a redox-active amino acid; an α-hydroxy containing acid; an amino thio acid; an α,α disubstituted amino acid; a β-amino acid; a cyclic amino acid other than proline or histidine, and an aromatic amino acid other than phenylalanine, tyrosine or tryptophan.

International patent application PCT/GB2006/002637 describes the generation of orthogonal ribosome/mRNA pairs in which the ribosome binding site in the O-mRNA binds specifically to the O-ribosome. Briefly, the bacterial ribosome is a 2.5 MDa complex of rRNA and protein responsible for translation of mRNA into protein (The Ribosome, Vol. LXVI (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York; 2001)). The interaction between the mRNA and the 30S subunit of the ribosome is an early event in translation (Laursen, B.S., Sorensen, H.P., Mortensen, K.K. & Sperling-Petersen, H.U., Microbiol Mol Biol Rev 69, 101-123 (2005)), and several features of the mRNA are known to control the expression of a gene, including the first codon (Wikstrom, P.M., Lind, L.K., Berg, D.E. & Bjork, G.R., J Mol Biol 224, 949-966 (1992)), the ribosome-binding sequence, including the Shine Dalgarno (SD) sequence (Shine, J. & Dalgarno, L., Biochem J 141, 609-615 (1974); Steitz, J.A. & Jakes, K., Proc Natl Acad Sci U S A 72, 4734-4738 (1975); Yusupova, G.Z., Yusupov, M.M., Cate, J.H. & Noller, H.F., Cell 106, 233-241 (2001)), and the spacing between these sequences (Chen, H., Bjerknes, M., Kumar, R. & Jay, E., Nucleic Acids Res 22, 4953-4957 (1994)). In certain cases mRNA structure (Gottesman, S. et al. in The Ribosome, Vol. LXVI (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York; 2001); Looman, A.C., Bodlaender, J., de Gruyter, M., Vogelaar, A. & van Knippenberg, P.H., Nucleic Acids Res 14, 5481-5497 (1986); Liebhaber, S.A., Cash, F. & Eshleman, S.S., J Mol Biol 226, 609-621 (1992)) or metabolite binding (Winkler, W., Nahvi, A. & Breaker, R.R., Nature 419, 952-956 (2002)) influences translation initiation, and in rare cases mRNAs can be translated without an SD sequence, though translation of these sequences is inefficient (Laursen, B.S., Sorensen, H.P., Mortensen, K.K. & Sperling-Petersen, H.U., Microbiol Mol Biol Rev 69, 101-123 (2005)) and operates through an alternate initiation pathway (Laursen, B.S., Sorensen, H.P., Mortensen, K.K. & Sperling-Petersen, H.U. Initiation of protein synthesis in bacteria. Microbiol Mol Biol Rev 69, 101-123 (2005)). For the vast majority of bacterial genes the SD region of the mRNA is a major determinant of translational efficiency. The classic SD sequence GGAGG interacts through RNA-RNA base-pairing with a region at the 3' end of the 16S rRNA containing the sequence CCUCC, known as the Anti Shine Dalgarno (ASD). In E. coli there are an estimated 4,122 translational starts (Shultzaberger, R.K., Bucheimer, R.E., Rudd, K.E. & Schneider, T.D., J Mol Biol 313, 215-228 (2001)), and these differ in the spacing between the SD-like sequence and the AUG start codon, the degree of complementarity between the SD-like sequence and the ribosome, and the exact region of sequence at the 3' end of the 16S rRNA with which the mRNA interacts. The ribosome therefore drives translation from a more complex set of sequences than just the classic Shine Dalgarno (SD) sequence. For clarity, mRNA sequences believed to bind the 3' end of 16S rRNA are referred to as SD sequences and the specific sequence GGAGG is referred to as the classic SD sequence.
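The complementarity between the classic SD sequence GGAGG and the ASD sequence CCUCC can be checked directly. This sketch counts only Watson-Crick pairs (G-U wobble pairs are ignored as a simplifying assumption), requires equal-length sequences, and uses hypothetical function names.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(seq))

def count_watson_crick_pairs(sd: str, asd: str) -> int:
    """Count Watson-Crick pairs when sd and asd (both written 5'->3')
    are paired antiparallel. Sequences must have equal length."""
    if len(sd) != len(asd):
        raise ValueError("sequences must have equal length")
    return sum(RNA_COMPLEMENT[a] == b for a, b in zip(sd, reversed(asd)))

print(reverse_complement_rna("GGAGG"))
print(count_watson_crick_pairs("GGAGG", "CCUCC"))
print(count_watson_crick_pairs("GGUGG", "CCUCC"))  # one mismatch introduced
```

The degree of complementarity mentioned above for SD-like sequences corresponds, in this simplified picture, to the pair count falling below the full length.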

Mutations in the ASD sequence often lead to rapid cell lysis and death (Lee, K., Holland-Staley, C.A. & Cunningham, P.R., RNA 2, 1270-1285 (1996); Wood, T.K. & Peretti, S.W., Biotechnol. Bioeng. 38, 891-906 (1991)). Such mutant ribosomes mis-regulate cellular translation and are not orthogonal. The sensitivity of cell survival to mutations in the ASD region is underscored by the observation that even a single change in the ASD can lead to cell death through catastrophic and global mis-regulation of proteome synthesis (Jacob, W.F., Santer, M. & Dahlberg, A.E., Proc Natl Acad Sci U S A 84, 4757-4761 (1987)). Other mutations in the rRNA can lead to inadequacies in processing or assembly of functional ribosomes.

PCT/GB2006/002637 describes methods for tailoring the molecular specificity of duplicated E. coli ribosome/mRNA pairs with respect to the wild-type ribosome and mRNAs to produce multiple orthogonal ribosome/orthogonal mRNA pairs. In these pairs the ribosome efficiently translates only the orthogonal mRNA, and the orthogonal mRNA is not an efficient substrate for cellular ribosomes. Orthogonal ribosomes as described therein, which do not translate endogenous mRNAs, permit specific translation of desired cognate mRNAs without interfering with cellular gene expression. The network of interactions between these orthogonal pairs is predicted and measured, and it is shown that orthogonal ribosome/mRNA pairs can be used to post-transcriptionally program the cell with Boolean logic. PCT/GB2006/002637 also describes a mechanism for positive and negative selection for evolution of orthogonal translational machinery. The selection methods are applied to evolving multiple orthogonal ribosome/mRNA pairs (O-ribosome/O-mRNA). Also described is the successful prediction of the network of interactions between cognate and non-cognate O-ribosomes and O-mRNAs.

Here we provide new, further modified orthogonal ribosomes and methods for producing such O-ribosomes which expand the molecular decoding properties of the ribosome. Specifically, we evolve orthogonal ribosomes that more efficiently decode quadruplet codons.

We disclose evolved orthogonal ribosomes which enhance the efficiency of synthetic genetic code expansion. We provide cellular modules composed of an orthogonal ribosome and an orthogonal mRNA. These pairs function in parallel with, but independent of, the natural ribosome-mRNA pair in Escherichia coli. Orthogonal ribosomes do not synthesize the proteome and may be diverged to operate using different tRNA decoding rules from natural ribosomes. Here we demonstrate the evolution of orthogonal ribosomes (ribo-Qs) for the efficient, high fidelity decoding of codons such as quadruplet codons placed within the context of an orthogonal mRNA in living cells. We combine ribo-Q, orthogonal mRNAs and orthogonal aminoacyl-tRNA synthetase/tRNA pairs to substantially increase the efficiency of site-specific unnatural amino acid incorporation in E. coli. This advantageously allows the efficient synthesis of proteins incorporating unnatural amino acids at multiple sites, and/or minimizes the functional and/or phenotypic effects of truncated proteins, for example in experiments that use unnatural amino acid incorporation to probe protein function in vivo.

ORTHOGONAL CODONS

We describe an evolved ribosome which is capable of translating an orthogonal mRNA codon, which means that the ribosome interprets mRNA information according to a code which is not the universal genetic code, but an orthogonal genetic code. This introduces a number of possibilities, including the possibility of having two separate genetic systems present in the cell, wherein cross-talk is eliminated by virtue of the difference in code; or of a mRNA molecule encoding different polypeptides according to which code is used to translate it.

An orthogonal codon, from which orthogonal genetic codes can be assembled, is a code which is other than the universal triplet code. Table 1 below represents the universal genetic code:

Table 1

First nucleotide | Second nucleotide: U | Second nucleotide: C | Second nucleotide: A | Second nucleotide: G | Third nucleotide
U | UUU Phenylalanine (Phe) | UCU Serine (Ser) | UAU Tyrosine (Tyr) | UGU Cysteine (Cys) | U
U | UUC Phe | UCC Ser | UAC Tyr | UGC Cys | C
U | UUA Leucine (Leu) | UCA Ser | UAA STOP | UGA STOP | A
U | UUG Leu | UCG Ser | UAG STOP | UGG Tryptophan (Trp) | G
C | CUU Leucine (Leu) | CCU Proline (Pro) | CAU Histidine (His) | CGU Arginine (Arg) | U
C | CUC Leu | CCC Pro | CAC His | CGC Arg | C
C | CUA Leu | CCA Pro | CAA Glutamine (Gln) | CGA Arg | A
C | CUG Leu | CCG Pro | CAG Gln | CGG Arg | G
A | AUU Isoleucine (Ile) | ACU Threonine (Thr) | AAU Asparagine (Asn) | AGU Serine (Ser) | U
A | AUC Ile | ACC Thr | AAC Asn | AGC Ser | C
A | AUA Ile | ACA Thr | AAA Lysine (Lys) | AGA Arginine (Arg) | A
A | AUG Methionine (Met) or START | ACG Thr | AAG Lys | AGG Arg | G
G | GUU Valine (Val) | GCU Alanine (Ala) | GAU Aspartic acid (Asp) | GGU Glycine (Gly) | U
G | GUC Val | GCC Ala | GAC Asp | GGC Gly | C
G | GUA Val | GCA Ala | GAA Glutamic acid (Glu) | GGA Gly | A
G | GUG Val | GCG Ala | GAG Glu | GGG Gly | G

Certain variations in this code occur naturally; for example, mitochondria use UGA to encode tryptophan (Trp) rather than as a chain terminator. In addition, most animal mitochondria use AUA for methionine rather than isoleucine, and all vertebrate mitochondria use AGA and AGG as chain terminators. Yeast mitochondria assign all codons beginning with CU to threonine instead of leucine (which is still encoded by UUA and UUG, as it is in cytosolic mRNA).

Plant mitochondria use the universal code, and this has permitted angiosperms to transfer mitochondrial genes to their nucleus with great ease.

Violations of the universal code are far rarer for nuclear genes. A few unicellular eukaryotes have been found that use one or two (of their three) STOP codons for amino acids instead. The vast majority of proteins are assembled from the 20 amino acids listed above, even though some of these may be chemically altered, e.g. by phosphorylation, at a later time.

However, two cases have been found in nature where an amino acid that is not one of the standard 20 is inserted by a tRNA into the growing polypeptide.

Selenocysteine. This amino acid is encoded by UGA. UGA is still used as a chain terminator, but the translation machinery is able to discriminate when a UGA codon should be used for selenocysteine rather than STOP. This codon usage has been found in certain Archaea, eubacteria, and animals (humans synthesize 25 different proteins containing selenium).

Pyrrolysine. In one gene found in a member of the Archaea, this amino acid is encoded by UAG. How the translation machinery knows when it encounters UAG whether to insert a tRNA with pyrrolysine or to stop translation is not yet known.

All of the above are, for the purposes of the present invention, considered to be part of the universal genetic code. The present invention enables novel codes, not previously known in nature, to be developed and used in the context of orthogonal mRNA/rRNA pairs.
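The natural variations listed above can be expressed as small sets of overrides on the standard table. The following Python sketch is illustrative only: the dictionary names and the example mRNA are hypothetical, and each variant table contains just the reassignments mentioned in the text above, not a complete description of any organelle's code.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for the 64 codons in UUU, UUC, UUA, UUG, UCU, ... order; "*" is STOP.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
STANDARD = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# Reassignments named in the text, applied on top of the standard table.
VERTEBRATE_MITO = {**STANDARD, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}
YEAST_MITO = {**STANDARD, "UGA": "W", **{"CU" + b: "T" for b in BASES}}


def translate(mrna: str, code: dict) -> str:
    """Translate in frame from the first base, stopping at the first STOP codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


example = "AUGAUAUGAUAA"  # hypothetical message: AUG AUA UGA UAA
print(translate(example, STANDARD))         # MI
print(translate(example, VERTEBRATE_MITO))  # MMW
```

The same message yields different polypeptides under different codes, which is the property an orthogonal genetic code exploits.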

SELECTION FOR ORTHOGONAL RIBOSOMES

A selection approach for the identification of orthogonal ribosome/orthogonal mRNA pairs, or other pairs of orthogonal molecules, requires selection for translation of orthogonal codons in O-mRNA. The selection is advantageously positive selection, such that cells which express O-mRNA are selected over those that do not, or do so less efficiently.

A number of different positive selection agents can be used. The most common selection strategies involve conditional survival on antibiotics. Of these positive selections, the chloramphenicol acetyl-transferase gene in combination with the antibiotic chloramphenicol has proved one of the most useful. Others as known in the art, such as ampicillin, kanamycin, tetracycline or streptomycin resistance, among others, can also be used.

O-mRNA/O-rRNA pairs can be used to produce an orthogonal transcript in a host cell, for example CAT, that can only be translated by the cognate orthogonal ribosome, thereby permitting extremely sensitive control of the expression of a polypeptide encoded by the transcript. The pairs can thus be used to produce a polypeptide of interest by, for example, introducing nucleic acid encoding such a pair to a cell, where the orthogonal mRNA encodes the polypeptide of interest. The translation of the orthogonal mRNA by the orthogonal ribosome results in production of the polypeptide of interest. It is contemplated that polypeptides produced in cells encoding orthogonal mRNA/orthogonal ribosome pairs can include unnatural amino acids.

The methods described herein are applicable to the selection of orthogonal mRNA/orthogonal rRNA pairs in species in which the O-mRNA comprises orthogonal codons which are translated by the O-rRNA. Thus, the methods are broadly applicable across prokaryotic and eukaryotic species, in which this mechanism is conserved. The sequence of 16S rRNA is known for a large number of bacterial species and has itself been used to generate phylogenetic trees defining the evolutionary relationships between the bacterial species (reviewed, for example, by Ludwig & Schleifer, 1994, FEMS Microbiol. Rev. 15: 155-73; see also Bergey's Manual of Systematic Bacteriology Volumes 1 and 2, Springer, George M. Garrity, ed.). The Ribosomal Database Project II (Cole JR, Chai B, Farris RJ, Wang Q, Kulaim SA, McGarrell DM, Garrity GM, Tiedje JM, Nucleic Acids Res, (2005) 33 (Database Issue): D294-D296. doi: 10.1093/nar/gki038) provides, in release 9.28 (6/17/05), 155,708 aligned and annotated 16S rRNA sequences, along with online analysis tools. Phylogenetic trees are constructed using, for example, 16S rRNA sequences and the neighbour joining method in the ClustalW sequence alignment algorithm. Using a phylogenetic tree, one can approximate the likelihood that a given set of mutations (on 16S rRNA and a codon in mRNA) that render the set orthogonal with respect to each other in one species will have a similar effect in another species. Thus, the mutations rendering mRNA/16S rRNA pairs orthogonal with respect to each other in one member of, for example, the Enterobacteriaceae Family (e.g., E. coli) would be more likely to result in orthogonal mRNA/orthogonal ribosome pairs in another member of the same Family (e.g., Salmonella) than in a member of a different Family on the phylogenetic tree.

In some instances, where bacterial species are very closely related, it may be possible to introduce corresponding 16S rRNA and mRNA mutations that result in orthogonal molecules in one species into the closely related species to generate an orthogonal mRNA/orthogonal rRNA pair in the related species. Also, where bacterial species are very closely related (e.g., for E. coli and Salmonella species), it may be possible to introduce orthogonal 16S rRNA and orthogonal mRNA from one species directly to the closely related species to obtain a functional orthogonal mRNA/orthogonal ribosome pair in the related species.

Alternatively, where the species in which one wishes to identify orthogonal mRNA/orthogonal ribosome pairs is not closely related (e.g., where they are not in the same phylogenetic Family) to a species in which a set of pairs has already been selected, one can use selection methods as described herein to generate orthogonal mRNA/orthogonal ribosome pairs in the desired species. Briefly, one can prepare a library of mutated orthogonal 16S rRNA molecules. The library can then be introduced to the chosen species. One or more O-mRNA sequences can be generated which comprise a sequence encoding a selection polypeptide as described herein using one or more orthogonal codons (the bacterial species must be sensitive to the activity of the selection agents, a matter easily determined by one of skill in the art). The O-mRNA library can then be introduced to cells comprising the O-rRNA library, followed by positive selection for those cells expressing the positive selectable marker in order to identify orthogonal ribosomes that pair with the O-mRNA.

The methods described herein are applicable to the identification of molecules useful to direct translation or other processes in a wide range of bacteria, including bacteria of industrial and agricultural importance as well as pathogenic bacteria. Pathogenic bacteria are well known to those of skill in the art, and sequence information, including not only 16S rRNA sequence but also numerous mRNA coding sequences, is available in public databases, such as GenBank. Common, but non-limiting, examples include Salmonella species; Clostridium species, e.g., Clostridium botulinum and Clostridium perfringens; Staphylococcus species, e.g., Staphylococcus aureus; Campylobacter species, e.g., Campylobacter jejuni; Yersinia species, e.g., Yersinia pestis, Yersinia enterocolitica and Yersinia pseudotuberculosis; Listeria species, e.g., Listeria monocytogenes; Vibrio species, e.g., Vibrio cholerae, Vibrio parahaemolyticus and Vibrio vulnificus; Bacillus cereus; Aeromonas species, e.g., Aeromonas hydrophila; Shigella species; Streptococcus species, e.g., Streptococcus pyogenes, Streptococcus faecalis, Streptococcus faecium, Streptococcus pneumoniae, Streptococcus durans and Streptococcus avium; Mycobacterium tuberculosis; Klebsiella species; Enterobacter species; Proteus species; Citrobacter species; Aerobacter species; Providencia species; Neisseria species, e.g., Neisseria gonorrhoeae and Neisseria meningitidis; Haemophilus species, e.g., Haemophilus influenzae; Helicobacter species, e.g., Helicobacter pylori; Bordetella species, e.g., Bordetella pertussis; Serratia species; and pathogenic species of E. coli, e.g., enterotoxigenic E. coli (ETEC), enteropathogenic E. coli (EPEC) and enterohemorrhagic E. coli O157:H7 (EHEC).

RELEASE FACTOR 1/AMBER CODONS

Advantageously, to maximize the efficiency of full-length protein synthesis with respect to truncated protein, the effects of release factor 1 (RF-1)-mediated chain termination would be minimized for the expression of a gene of interest.

Unlike the natural ribosome the orthogonal ribosome is not responsible for synthesizing the proteome, and is therefore tolerant to mutations in the highly conserved rRNA that cause lethal or dominant negative effects in the natural ribosome. Orthogonal ribosomes may therefore be advantageously evolved towards decreased RF-1 binding.

We disclose the synthetic evolution of orthogonal ribosomes (ribo-Qs) for the efficient, high fidelity decoding of quadruplet codons placed within the context of an orthogonal mRNA in living cells. Ribo-Qs may preferably be combined with orthogonal mRNAs and orthogonal aminoacyl-tRNA synthetase/tRNA pairs to advantageously and significantly increase the efficiency of site-specific unnatural amino acid incorporation in E. coli. This increase in efficiency makes it possible to synthesize proteins incorporating unnatural amino acids at multiple sites, and minimizes the functional and phenotypic effects of truncated proteins in vivo. This has clear industrial application and utility, for example in the manufacture of proteins incorporating unnatural amino acids.

BACTERIAL TRANSFORMATION

The methods described herein rely upon the introduction of foreign or exogenous nucleic acids into bacteria. Methods for bacterial transformation with exogenous nucleic acid, and particularly for rendering cells competent to take up exogenous nucleic acid, are well known in the art. For example, Gram negative bacteria such as E. coli are rendered transformation competent by treatment with multivalent cationic agents such as calcium chloride or rubidium chloride. Gram positive bacteria can be incubated with degradative enzymes to remove the peptidoglycan layer and thus form protoplasts. When the protoplasts are incubated with DNA and polyethylene glycol, one obtains cell fusion and concomitant DNA uptake. In both of these examples, if the DNA is linear, it tends to be sensitive to nucleases, so that transformation is most efficient when it involves the use of covalently closed circular DNA. Alternatively, nuclease-deficient cells (RecBC- strains) can be used to improve transformation.

Electroporation is also well known for the introduction of nucleic acid to bacterial cells. Methods are well known, for example, for electroporation of Gram negative bacteria such as E. coli, but are also well known for the electroporation of Gram positive bacteria, such as Enterococcus faecalis, among others, as described, e.g., by Dunny et al., 1991, Appl. Environ. Microbiol. 57: 1194-1201.

The in vivo, genetically programmed incorporation of designer amino acids allows the properties of proteins to be tailored with molecular precision [1]. The Methanococcus jannaschii tyrosyl-tRNA synthetase/tRNACUA (MjTyrRS/tRNACUA) [2,3] and the Methanosarcina barkeri pyrrolysyl-tRNA synthetase/tRNACUA (MbPylRS/tRNACUA) [4-6] orthogonal pairs have been evolved to incorporate a range of unnatural amino acids in response to the amber codon in E. coli [1,6,7]. However, the potential of synthetic genetic code expansion is generally limited to the low efficiency incorporation of a single type of unnatural amino acid at a time, since every triplet codon in the universal genetic code is used in encoding the synthesis of the proteome. In order to efficiently encode multiple distinct unnatural amino acids into proteins we require (i) blank codons and (ii) mutually orthogonal aminoacyl-tRNA synthetase/tRNA pairs that recognize unnatural amino acids and decode the new codons. Here we synthetically evolve an orthogonal ribosome [8,9] (ribo-Q1) that efficiently decodes a series of quadruplet codons and the amber codon, providing several blank codons on an orthogonal mRNA, which it specifically translates [8]. By creating mutually orthogonal aminoacyl-tRNA synthetase/tRNA pairs and combining these with ribo-Q1 we direct the incorporation of distinct unnatural amino acids in response to two of the new blank codons on the orthogonal mRNA (Figure 5). Using this code, we genetically direct the formation of a specific, redox insensitive, nanoscale protein cross-link via the bio-orthogonal cycloaddition of encoded azide and alkyne containing amino acids [10]. Since the synthetase/tRNA pairs used have been evolved to incorporate numerous unnatural amino acids [1,6,7], it will be possible to encode more than 200 unnatural amino acid combinations using this approach. Since ribo-Q1 independently decodes a series of quadruplet codons, this work provides foundational technologies for the encoded synthesis and synthetic evolution of unnatural polymers in cells.

A ribosome must accommodate an extended anticodon tRNA into its decoding centre to decode it [17,18]. Natural ribosomes are very inefficient at quadruplet decoding and cannot be evolved for it (Figure 6), since enhanced quadruplet decoding would increase misreading of the proteome. In contrast, orthogonal ribosomes [8], which are specifically addressed to the orthogonal message and are not responsible for synthesizing the proteome, may, in principle, be evolved to efficiently decode quadruplet codons on the orthogonal message. To discover evolved orthogonal ribosomes that enhance quadruplet decoding we first created 11 saturation mutagenesis libraries in the 16S rRNA of ribo-X (an orthogonal ribosome previously evolved for efficient amber codon decoding on an orthogonal message [9]); taken together these libraries cover 127 nucleotides that are within 12 Å of a tRNA bound in the decoding centre [19] (Figure 7). We used ribo-X as a starting point for library generation because we hoped to discover evolved orthogonal ribosomes that gain the ability to efficiently decode quadruplet codons while maintaining the ability to efficiently decode amber codons on the orthogonal mRNA, thereby maximizing the number of additional codons that can be decoded on the orthogonal ribosome.
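To give a feel for what a saturation mutagenesis library contains, the Python sketch below enumerates every base combination at three adjacent 16S rDNA positions. It is a hedged illustration: the function names are hypothetical, the three positions (1195 to 1197) are taken from the mutation names used in this document, and how the 127 targeted nucleotides were actually grouped into the 11 libraries is not stated in this section.

```python
from itertools import product

# Wild-type bases at three adjacent positions, as implied by the mutation names
# C1195A/T, A1196G and A1197G (rDNA notation, hence T rather than U).
WILD_TYPE = {1195: "C", 1196: "A", 1197: "A"}


def saturation_library(wild_type: dict) -> list:
    """All base combinations at the given positions (4**n variants, wild type included)."""
    positions = sorted(wild_type)
    return [dict(zip(positions, bases))
            for bases in product("ACGT", repeat=len(positions))]


def mutation_names(variant: dict, wild_type: dict) -> list:
    """Name each change relative to wild type, e.g. 'A1196G'."""
    return [f"{wild_type[p]}{p}{variant[p]}"
            for p in sorted(variant) if variant[p] != wild_type[p]]


library = saturation_library(WILD_TYPE)
print(len(library))  # 64

# Base identities at these positions for three of the selected clones.
selected = {
    "ribo-Q1": {1195: "C", 1196: "G", 1197: "G"},
    "ribo-Q3": {1195: "T", 1196: "G", 1197: "G"},
    "ribo-Q4": {1195: "A", 1196: "G", 1197: "A"},
}
for clone, variant in selected.items():
    print(clone, mutation_names(variant, WILD_TYPE), variant in library)
```

Because library size grows as 4 to the power of the number of randomized positions, covering a large region with several smaller libraries keeps each one small enough to sample fully.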

To select orthogonal ribosomes that efficiently decode quadruplet codons using extended anticodon tRNAs we combined each O-ribosome library with a reporter construct (O-cat (AAGA 146)/tRNASer2UCUU). The reporter contains a chloramphenicol acetyl transferase gene that is specifically translated by O-ribosomes [9], an in-frame AAGA quadruplet codon and tRNASer2UCUU (a designed variant of tRNASer2 that is aminoacylated by E. coli seryl-tRNA synthetase and decodes the AAGA codon [9,20]). The orthogonal cat gene is read in frame, and confers chloramphenicol resistance, only if tRNASer2UCUU efficiently decodes the AAGA codon and restores the reading frame. Clones surviving on chloramphenicol concentrations which kill cells containing ribo-X and the cat reporter have 4 distinct sequences. Clone ribo-Q4 has the double mutation C1195A and A1196G; ribo-Q3 has the triple mutation C1195T, A1196G and A1197G; ribo-Q2 and ribo-Q1 have the double mutation A1196G and A1197G, and ribo-Q2 also has eight additional non-programmed mutations. While the entire decoding centre was mutated, the selected mutations are spatially localized and might accommodate an extended anticodon:codon interaction in the decoding centre (Figure 1a). The chloramphenicol resistance of cells containing tRNASer2UCCU and cat with two AGGA codons is greatly enhanced when the cat gene is translated by ribo-Q ribosomes in place of unevolved ribosomes (Figure 1b,c). Indeed, the chloramphenicol resistance of cells containing two AGGA codons read by the ribo-Q ribosomes approaches that of a wild-type cat gene. This suggests that ribo-Q1 may decode quadruplet codons with an efficiency approaching that for triplet decoding and with a much greater efficiency than the unevolved ribosome. The enhancement in quadruplet decoding efficiency is maintained for a variety of quadruplet codon-anticodon interactions (Figure 8).
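The logic of the reporter, in which the reading frame is restored only when the quadruplet codon is read as four bases, can be sketched in Python. This is a deliberately simplified, deterministic model with hypothetical names and a made-up message; in cells, triplet and quadruplet reading compete, and the selection above reports on how efficiently the quadruplet reading wins.

```python
from itertools import product

AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}


def decode(mrna: str, quadruplets: dict = None) -> str:
    """Read 5'->3'. A four-base codon listed in `quadruplets` is read as one unit
    (modelling an extended anticodon tRNA); everything else is read as triplets."""
    quadruplets = quadruplets or {}
    protein, i = [], 0
    while i + 3 <= len(mrna):
        four = mrna[i:i + 4]
        if four in quadruplets:
            protein.append(quadruplets[four])
            i += 4
            continue
        amino_acid = STANDARD[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
        i += 3
    return "".join(protein)


# Hypothetical message: AUG GCU AGGA GGU UAA (one in-frame AGGA quadruplet codon).
message = "AUGGCUAGGAGGUUAA"
print(decode(message))                 # MARRL  (AGG read as a triplet, frame lost)
print(decode(message, {"AGGA": "S"}))  # MASG   (AGGA read by a serine tRNA, frame kept)
```

In the triplet-only reading the downstream sequence is out of frame, which is why a functional chloramphenicol acetyl transferase, and hence resistance, depends on quadruplet decoding.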
Natural ribosomes decode triplet codons with high fidelity (error frequencies ranging from 10⁻² to 10⁻⁴ errors per codon have been reported [21-23]). To explicitly compare the fidelity of triplet decoding and quadruplet decoding for the evolved orthogonal ribosomes and the progenitor ribosome we used two independent methods: the incorporation of 35S-cysteine into a protein which contains no cysteine codons in its gene [9], and variants of a dual luciferase system [9,23] (Figure 9). We find that the triplet and quadruplet decoding translational fidelity is the same for the evolved ribosome (ribo-Q1) and the un-evolved and wild-type ribosomes, and that the 4th base of the codon-anticodon interaction is discriminated equally well by all ribosomes (Figure 9). To demonstrate that the enhanced amber decoding properties of ribo-X are maintained in ribo-Q1 we compared the efficiency of incorporating p-benzoyl-L-phenylalanine (Bpa, 1) into a recombinant GST-MBP fusion in response to an amber codon on an orthogonal mRNA using orthogonal ribosomes and a previously evolved p-benzoyl-L-phenylalanyl-tRNA synthetase/tRNACUA pair [3] (BpaRS/tRNACUA) (Figure 2). Ribo-Q1 and ribo-X incorporate 1 with a comparable and high efficiency in response to the amber codons in the orthogonal mRNA (compare lanes 4 & 6 and lanes 10 & 12 in Figure 2a). Ribo-X and ribo-Q1 are substantially more efficient than the wild-type ribosome at incorporating 1 via amber suppression (compare lanes 4 & 6 to lane 2, and lanes 10 & 12 to lane 8, in Figure 2a).

To demonstrate the utility of ribo-Q1 for incorporating unnatural amino acids in response to quadruplet codons we compared the efficiency of incorporating p-azido-L-phenylalanine (AzPhe, 2) into a recombinant GST-MBP fusion in response to a quadruplet codon using ribo-Q1 or the wild-type ribosome. In order to direct the incorporation of 2 we used the AzPheRS*/tRNAUCCU pair (a variant of the pAzPheRS-7/tRNACUA pair [24] derived from the MjTyrRS/tRNACUA pair for the incorporation of 2, as described below). We find that ribo-Q1 substantially increases the efficiency of incorporation of 2 in response to a quadruplet codon, and even allows the incorporation of 2 in response to two quadruplet codons for the first time (compare lanes 2 & 6 and lanes 4 & 8, Figure 2b). The site and fidelity of incorporation of 2 were further confirmed by analysis of tandem mass spectrometry (MS/MS) fragmentation series of the relevant tryptic peptides (Figure 11).

To take advantage of ribo-Q1 for the incorporation of multiple distinct unnatural amino acids in recombinant proteins, we required mutually orthogonal aminoacyl-tRNA synthetase/tRNA pairs. We demonstrated that the MbPylRS/tRNACUA pair [4,5] and the MjTyrRS/tRNACUA pair [2], each of which has previously been evolved to incorporate a range of unnatural amino acids [1,6,7,25], are mutually orthogonal in their aminoacylation specificity (Figure 12). We created the AzPheRS*/tRNAUCCU pair, which is derived from the MjTyrRS/tRNACUA pair, by a series of generally applicable directed evolution steps (Figures 13-15). The MbPylRS/tRNACUA pair and the AzPheRS*/tRNAUCCU pair are mutually orthogonal: they decode distinct codons, use distinct amino acids and are orthogonal in their aminoacylation specificity (Figure 16).

To demonstrate the simultaneous incorporation of two useful unnatural amino acids into a single protein we combined the MbPylRS/MbtRNACUA pair, the AzPheRS*/tRNAUCCU pair and ribo-Q1 in E. coli. We used these components to produce full-length GST-calmodulin containing 2 (AzPhe) and N6-[(2-propynyloxy)carbonyl]-L-lysine (CAK, 4, which we recently discovered is an efficient substrate for MbPylRS [7]) (Figure 3) in response to an AGGA codon and a UAG codon in an orthogonal gene. Production of the full-length protein required the addition of both unnatural amino acids. We further confirmed the incorporation of 2 and 4 at the genetically programmed sites by MS/MS sequencing of a single tryptic fragment containing both unnatural amino acids (Figure 3).

To begin to demonstrate that emergent properties may be programmed into proteins via combinations of unnatural amino acids we genetically directed the formation of a triazole cross-link, via a copper catalysed Huisgen [3+2] cycloaddition reaction ("Click reaction") [10]. We first encoded 2 and 4 at positions 1 and 149 in calmodulin (Figure 4). After incubation of calmodulin incorporating the azide (2) and alkyne (4) at these positions with Cu(I) for 5 minutes we observe a more rapidly migrating protein band in SDS-PAGE. MS/MS sequencing unambiguously confirms that the faster mobility band results from the product of a bio-orthogonal cycloaddition reaction between 2 and 4. Our results demonstrate the genetically programmed proximity acceleration of a new class of asymmetric, redox insensitive cross-link that can be used to specifically constrain protein structure on the nanometer scale. Unlike existing protein cyclization methods for recombinant proteins [26,27], these cross-links can be encoded at any spatially compatible sites in a protein, not just placed at the termini. In contrast to the chemically diverse cyclization methods that can be accessed with peptides by solid-phase peptide synthesis [28], these cross-links can be encoded into proteins of essentially any size. Given the importance of disulfide bonds in natural therapeutic proteins and hormones, the utility of peptide stapling strategies [29], the importance of peptide cyclization [30], and the improved stability of proteins cyclized by native chemical ligation [26], it will be interesting to investigate the enhancement of protein function that may be accessed by combining the encoding of these cross-links with directed evolution methods.

By combining the numerous variant MjTyrRS/tRNACUA and MbPylRS/tRNACUA pairs reported for the incorporation of unnatural amino acids [1,6,7] (after appropriate anticodon conversion using the steps reported here) with ribo-Q1 it will be possible to encode more than 200 amino acid combinations in recombinant proteins.

Experimental (methods summary)

Methods for cloning, site-directed mutagenesis and library construction are described in the Supplementary Materials. Ribosome libraries were screened for quadruplet suppressors using a modification of the strategy to discover ribo-X.

E. coli GeneHogs or DH10B were used in all protein expression experiments using LB medium supplemented with appropriate antibiotics and unnatural amino acids. Proteins were purified by affinity chromatography using published standard protocols. Translational fidelity of evolved O-ribosomes was measured by mis-incorporation of 35S-labelled cysteine [9]. Briefly, GST-MBP was produced by the O-ribosome in the presence of 35S-cysteine. The protein was purified, cleaved with thrombin, which cleaves the linker between GST and MBP, and analysed by SDS-PAGE and phospho-imaging. A modified dual-luciferase assay was used to measure the fidelity of translation of O-ribosomes [9]. Luminescence from a luciferase mutant containing an inactivating missense mutation in this assay is a measure of translational inaccuracy of the ribosome. The dual-luciferase reporter (DLR) was translated by the O-ribosome, extracted in the cold and luciferase activity measured using the Dual-Luciferase Reporter Assay System (Promega).

LC/MS/MS of proteins was performed by NextGen Science (Ann Arbor, USA). Proteins were excised from Coomassie stained SDS-PAGE gels, digested with trypsin and analysed by LC/MS/MS. Total protein mass was obtained by ESI-MS; purified protein was dialysed into 10 mM ammonium bicarbonate pH 7.5, mixed 1:1 with 1% formic acid in 50% methanol and total mass determined in positive ion mode.

Cyclization reactions were performed for 5 minutes at room temperature on purified protein in 50 mM sodium phosphate pH 8.3 in the presence of 1 mM ascorbic acid, 1 mM CuSO4 and 2 mM bathophenanthroline. Details of all methods can be found in the Supplementary Materials.

Definitions

The term 'comprises' (comprise, comprising) should be understood to have its normal meaning in the art, i.e. that the stated feature or group of features is included, but that the term does not exclude any other stated feature or group of features from also being present.

Brief Description of the Figures

Figure 1. Selection and characterization of orthogonal quadruplet decoding ribosomes. a. Mutations in quadruplet decoding ribosomes form a structural cluster close to the space potentially occupied by an extended anticodon tRNA. Selected nucleotides are shown in red. b. Ribo-Qs substantially enhance the tRNA-dependent decoding of quadruplet codons. The tRNASer2UCCU-dependent enhancement in decoding AGGA codons in the O-cat (AGGA 103, AGGA 146) gene was measured by survival on increasing concentrations of chloramphenicol (Cm). c. As in b, but measuring CAT enzymatic activity directly by thin-layer chromatography; acetylated chloramphenicol (AcCm). Abbreviations: ribo-X (Rx), ribo-Q1-4 (Q1-Q4) and the O-ribosome (O).

Figure 2. Enhanced incorporation of unnatural amino acids in response to amber and quadruplet codons with ribo-Q1. a. Ribo-Q1 incorporates Bpa (p-benzoyl-L-phenylalanine) as efficiently as ribo-X. The entire gel is shown in Figure 10. b. Ribo-Q1 enhances the efficiency of AzPhe (p-azido-L-phenylalanine) incorporation in response to the AGGA quadruplet codon using AzPheRS*/tRNAUCCU. The gel showing the ratio of GST-MBP to GST, as well as MS/MS spectra of the single and double AzPhe incorporations, are shown in Figure 11. (UAG)n or (AGGA)n describes the number of amber or AGGA codons (n) between gst and malE.

Figure 3. Encoding an azide and an alkyne in a single protein via orthogonal translation. a. Expression of GST-CaM-His6 (a glutathione-S-transferase calmodulin His6 fusion) containing two unnatural amino acids. An orthogonal gene producing a GST-CaM-His6 fusion that contains an AGGA codon at position 1 and an amber codon at position 40 of calmodulin (CaM) was translated by ribo-Q1 in the presence of AzPheRS*/tRNAUCCU and MbPylRS/tRNACUA. The entire gel is shown in Figure 17. b. LC/MS/MS analysis of the incorporation of two distinct unnatural amino acids into the linker region of GST-MBP (2 is denoted as Y* and 4 as K*).

Figure 4. Genetically directed cyclization of calmodulin via a Cu(I)-catalyzed Huisgen [3+2]-cycloaddition. a. Structure of calmodulin indicating the sites of incorporation of 2 and 4 and their triazole product. Image created using Pymol (www.pymol.org) and pdb-file 4CLN. b. GST-CaM-His6 1AzPhe 149CAK specifically cyclizes with Cu(I)-catalyst. AzPhe is 2, Tyr is tyrosine, BocK is 3 and CAK is 4. Lanes 1 and 2 are from a separate gel. c. LC/MS/MS confirms the triazole formation. The MS/MS spectrum of a doubly charged peptide containing the crosslink is shown (m/z = 1226.6092, which is within 1.8 ppm of the mass expected for the cross-linked peptide).
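For readers unfamiliar with the mass accuracy figure quoted in panel c, the short Python sketch below shows what a 1.8 ppm tolerance means at m/z 1226.6092 and how the neutral mass of a doubly charged ion is obtained. The function names are hypothetical, the expected mass of the cross-linked peptide is not given in this section and is therefore not reproduced, and the tolerance is computed relative to the observed value as an approximation.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (standard value)


def ppm_error(observed: float, expected: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - expected) / expected * 1e6


def ppm_window(mz: float, ppm: float) -> float:
    """Absolute m/z deviation corresponding to a given ppm tolerance."""
    return mz * ppm * 1e-6


def neutral_mass(mz: float, charge: int) -> float:
    """Monoisotopic neutral mass of a positively charged, protonated ion."""
    return charge * (mz - PROTON_MASS)


observed_mz = 1226.6092  # doubly charged cross-linked peptide (from the caption)
print(round(ppm_window(observed_mz, 1.8), 4))  # 0.0022 (m/z units)
print(round(neutral_mass(observed_mz, 2), 4))  # 2451.2038 (Da)
```

In other words, the observed and expected m/z values agree to within roughly two thousandths of an m/z unit.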

Figure 5. Strategy for the synthesis of an orthogonal genetic code. Combining the two mutually orthogonal pairs (MbPylRS/MbtRNACUA and MjAzPheRS*/tRNAUCCU) with evolved orthogonal ribosomes (Ribo-Q) creates a system that is able to decode the UAG and AGGA codons on an orthogonal mRNA (O-mRNA) to produce a protein that contains two distinct unnatural amino acids at genetically encoded sites. UAG is decoded as 4 (CAK) or 3 (BocLys) by MbPylRS/MbtRNACUA while AGGA is decoded as 2.

Figure 6. Evolving an orthogonal quadruplet decoding ribosome. The natural ribosome (gray) and the progenitor orthogonal ribosome (green) utilize tRNAs with triplet anticodon to decode triplet codons in both wt- (black) and orthogonal- (purple) mRNAs, respectively. The decoding of quadruplet codons with extended anticodon tRNAs (red) is of low efficiency (light gray arrows) on both ribosomes. Synthetic evolution of the orthogonal ribosome leads to an evolved scenario in which a mutant (orange patch) orthogonal ribosome more efficiently decodes quadruplet codons on orthogonal mRNAs using extended anticodon tRNAs. Decoding of extended anticodon tRNAs on natural mRNAs is unaffected because the orthogonal ribosome does not read natural mRNAs and the natural ribosome is unaltered.

Figure 7. Comprehensive mutagenesis of the ribosome decoding centre.

A. Structure of the ribosomal small subunit with bound tRNAs and mRNAs. tRNA anticodon stem loops are bound to the A site (yellow), P site (cyan), and E site (dark blue). The mRNA is shown in purple. 16S ribosomal RNA is shown in green and ribosomal proteins in gray. The 118 residues in the decoding centre, targeted for mutation in the 11 libraries, are shown in orange (this figure was created using Pymol v0.99 (www.pymol.org) and PDB ID 2J00). B. Secondary structure of the E. coli 16S ribosomal RNA (www.rng.ccbb.utexas.edu). The nucleotides targeted for mutation are shown in orange.

Figure 8. Ribo-Q enhances the tRNA-dependent decoding of different quadruplet codons. Ribo-X, Ribo-Q1-4 and the O-ribosome were produced from pRSF-O-rDNA vectors. The tRNAser2(UCUA)-dependent enhancement in decoding UAGA codons in the O-cat (UAGA103, UAGA146), the tRNAser2(AGGG)-dependent enhancement in decoding CCCU codons in the O-cat (CCCU103, CCCU146), and the tRNAser2(UCUU)-dependent enhancement in decoding AAGA codons in the O-cat (AAGA146) were measured by survival on increasing concentrations of chloramphenicol. pRSF-O-rDNA vectors and corresponding O-cat vectors were co-transformed into GeneHogs cells. Transformed cells were recovered for 1 h in SOB medium containing 2% glucose and used to inoculate 200 ml of LB-GKT (LB medium with 2% glucose, 25 μg ml-1 kanamycin and 12.5 μg ml-1 tetracycline). After overnight growth (37°C, 250 r.p.m., 16 h), 2 ml of the cells were pelleted by centrifugation (3,000g), and washed three times with an equal volume of LB-KT (LB medium with 12.5 μg ml-1 kanamycin and 6.25 μg ml-1 tetracycline). The resuspended pellet was used to inoculate 18 ml of LB-KT, and the resulting culture incubated (37°C, 250 r.p.m. shaking, 90 min). To induce expression of plasmid encoded O-rRNA, 2 ml of the culture was added to 18 ml LB-IKT (LB medium with 1.1 mM isopropyl-D-thiogalactopyranoside (IPTG), 12.5 μg ml-1 kanamycin and 6.25 μg ml-1 tetracycline) and incubated for 4 h (37°C, 250 r.p.m.). Aliquots (250 μl, optical density at 600 nm (OD600) = 1.5) were plated on LB-IKT agar (LB agar with 1 mM IPTG, 12.5 μg ml-1 kanamycin and 6.25 μg ml-1 tetracycline) supplemented with 50 μg ml-1 chloramphenicol and incubated (37°C, 40 h).

Figure 9: The translation fidelity of evolved ribosomes is comparable to that of the natural ribosome. A. The translational error frequency for triplet decoding as measured by 35S-cysteine misincorporation is indistinguishable for ribo-Q1, ribo-Q3-Q4, ribo-X, the unevolved orthogonal ribosome and the wild-type ribosome. GST-MBP was synthesized by each ribosome in the presence of 35S-cysteine, purified on glutathione sepharose and digested with thrombin. The left panel shows a Coomassie stain of the thrombin digest. The un-annotated bands result primarily from the thrombin preparation. The right panel shows 35S labeling of proteins in the same gel, imaged using a Storm Phosphorimager. Lanes 1-6 show thrombin cleavage reactions of purified protein derived from cells containing the indicated ribosome (with the ribosomal RNA produced from pSC101* constructs that drive rRNA from a P1P2 promoter) and either pO-gst-malE (for orthogonal ribosomes) or pgst-malE (for wild-type ribosomes). The size markers are pre-stained standards (Bio-Rad 161-0305). The error frequency per codon translated by the ribo-Q ribosomes as measured by this method was less than 1x10^-3. Control experiments with the progenitor orthogonal ribosome, ribo-X and the wild-type ribosome allowed us to put the same limit on their fidelity. This limit compares favourably with previous measurements of error frequency using 35S mis-incorporation (4x10^-3 errors per codon) 33. B. The translational fidelity of ribo-Q1 in triplet decoding is comparable to that of the un-evolved ribosome, as measured by a dual-luciferase assay. In this system a C-terminal firefly luciferase is mutated at codon K529(AAA), which codes for an essential lysine residue.
The extent to which the mutant codon is misread by tRNALys(UUU) is determined by comparing the firefly luciferase activity resulting from the expression of the mutant gene to the wild-type firefly luciferase, and normalizing any variability in expression using the activity of the co-translated N-terminal Renilla luciferase. Previous work has demonstrated that measured firefly luciferase activities in this system result primarily from the synthesis of a small amount of protein that mis-incorporates lysine in response to the mutant codon 23, rather than a low activity resulting from the more abundant protein containing encoded mutations. In experiments examining the fidelity of ribo-Q1, lysates from cells containing pSC101*-ribo-Q1 and pO-DLR and its codon 529 variants were assayed. Control experiments used lysates from cells containing pSC101*-O-ribosome and pO-DLR and its codon 529 variants. C. The quadruplet decoding fidelity of ribo-Q is comparable to that of un-evolved ribosomes. Efficiencies were determined using a dual luciferase construct with an N-terminal Renilla and C-terminal Firefly luciferase (Ren-FF). The reporter was mutated to include a quadruplet AGGA codon in the linker between the two luciferases (Ren-AGGA-FF). Ren-AGGA-FF was transformed into DH10B cells along with a non-cognate anticodon Ser2A tRNA (UCUA or AGGG) and either ribo-Q or the O-ribosome. Readthrough efficiency for Ren-AGGA-FF was measured by taking the ratio of Firefly luminescence/Renilla luminescence. This ratio was divided by the same Firefly/Renilla ratio obtained with the Ren-FF construct in the presence of tRNA (to normalize for effects of the tRNA on sites outside the AGGA codon under investigation). In order to obtain the level of decoding by these non-cognate tRNAs as a fraction of decoding by cognate tRNA, these data were compared with those obtained from the same experiment using a cognate Ser2A tRNA with the UCCU anticodon.
The data represent the average of at least 4 trials. The error bars represent the standard deviation. D. Fourth base specificity in quadruplet decoding. E. coli DH10B expressing the indicated combination of an O-ribosome, a chloramphenicol acetyltransferase gene under the control of an orthogonal rbs with a quadruplet codon at a permissive site and E. coli Ser2A tRNAUCCU were scored for their ability to grow in the presence of increasing amounts of chloramphenicol. The fractional activity is the maximal Cm resistance of the cells relative to the combination containing a cognate codon in the mRNA and a particular O-ribosome.

Figure 10: Ribo-Q1 enhances the efficiency of BpaRS/tRNACUA-dependent unnatural amino acid incorporation in response to single and double UAG codons, maintaining the enhanced amber decoding of ribo-X. In each lane an equal volume of protein purified from glutathione sepharose under identical conditions is loaded. Orthogonal ribosomes are produced from pSC101*-ribo-X and pSC101*-ribo-Q1. Bpa, p-benzoyl-L-phenylalanine (1). The BpaRS/tRNACUA pair is produced from pSUPBpa that contains six copies of MjtRNACUA. (UAG)n describes the number of amber stop codons (n) between gst and malE in O-gst(UAG)nmalE or gst(UAG)nmalE. The ratio of GST-MBP to GST reflects the efficiency of amber suppression versus RF1 mediated termination. A part of this gel showing the band for full-length GST-MBP is shown in Figure 2 of the main text.

Figure 11: Ribo-Q1 enhances the efficiency of AzPheRS*/tRNAUCCU-dependent unnatural amino acid incorporation in response to AGGA quadruplet codons. A. Ribo-Q1 is produced from pSC101*-ribo-Q1. AzPhe, 2.5 mM 2. The AzPheRS*/tRNAUCCU pair is produced from pDULE AzPheRS*/tRNAUCCU that contains a single copy of MjtRNAUCCU. (AGGA)n describes the number of quadruplet codons (n) between gst and malE in O-gst(AGGA)nmalE or gst(AGGA)nmalE. The ratio of GST-MBP to GST reflects the efficiency of frameshift suppression. A part of this gel showing the bands for full-length GST-MBP is shown in Figure 2 of the main text. B & C. MS/MS spectra of tryptic fragments incorporating one or two AzPhes respectively.

Figure 12. MbPylRS/MbtRNACUA and MjTyrRS/tRNACUA pairs are mutually orthogonal in their aminoacylation specificity. A. The decoding network of MbPylRS/MbtRNACUA (lime) and MjTyrRS/tRNACUA (grey) and its unnatural amino acid incorporating derivatives. A unique unnatural amino acid is specifically recognized by each of the synthetases and used to aminoacylate its cognate tRNA.
We asked whether the MbPylRS/tRNACUA pair 4,5,34 and the MjTyrRS/tRNACUA pair are mutually orthogonal in their aminoacylation specificity. Our experiments demonstrate that there is no cross-acylation (grey arrows) between the two aminoacyl-tRNA synthetase/tRNACUA pairs (as shown by decoding the amber codon in myo4TAGHis6 using the different combinations of synthetases and tRNAs, see below). However, both tRNAs direct the incorporation of their amino acid in response to the amber codon. B. E. coli DH10B were transformed with pMyo4TAG-His6, a plasmid holding the gene for sperm whale myoglobin with an amber codon at position 4 and a C-terminal hexahistidine tag and an expression cassette for either MbtRNACUA or MjtRNACUA. MbPylRS or MjTyrRS were provided on pBKPylS or pB MjTyrRS, respectively. Cells expressing MbPylRS received 10 mM 3 (BocLys) as a substrate for the synthetase. Myoglobin-His6 produced by the cells was purified by Ni2+-affinity chromatography, analysed by SDS-PAGE and detected with Coomassie stain or Western blot against the

Figure 13. Genetically encoding 2 in response to a quadruplet codon. A. MjAzPheRS aminoacylates its cognate amber suppressor tRNACUA with 2. To differentiate the codons that the two mutually orthogonal tRNAs decode and to create a pair for the incorporation of an unnatural amino acid in response to a quadruplet codon, we altered the anticodon of MjtRNACUA from CUA to UCCU to create MjtRNAUCCU. After this, the resulting tRNAUCCU is no longer a substrate of the parent MjAzPheRS. To create a version of AzPheRS-7 that aminoacylates MjtRNAUCCU we identified six residues (Y230, C231, P232, F261, H283, D286) in the parent synthetase that recognize the anticodon of the tRNA 35 and mutated these residues to all possible combinations, creating a library of 10^8 possible synthetase mutants. To select for AzPheRS mutants that specifically aminoacylate MjtRNAUCCU we created a chloramphenicol acetyl transferase reporter (pREP JY(UCCU), derived from pREP YC-JYCUA 32), which contains the four base codon AGGA at position 111, a site permissive to the incorporation of a range of amino acids. In the absence or presence of AzPheRS/MjtRNAUCCU this reporter confers resistance to chloramphenicol at low levels (30-50 μg ml-1). We selected synthetase variants on 150 μg ml-1 of chloramphenicol that, in combination with MjtRNAUCCU, specifically direct the incorporation of 2 in response to the AGGA codon on pREP JY(UCCU). We characterized 24 synthetase/tRNAUCCU pairs by their chloramphenicol resistance in the presence of 2 and pREP JY(UCCU). The seven best synthetase/tRNAUCCU combinations confer a chloramphenicol resistance of 250-350 μg ml-1 on cells containing 2 and pREP JY(UCCU) (Figure 14). In the absence of 2, we observe only background levels of resistance (30 μg ml-1) for several synthetases, indicating that the synthetase/MjtRNAUCCU pairs specifically direct the incorporation of 2 in response to the quadruplet codon AGGA.
Sequencing these seven clones revealed similar but non-identical mutations (Figure 14). B. Library design. Structure of MjTyrRS (grey) bound to its cognate tRNA (orange). Residues of the synthetase that recognize the anticodon and which are mutated in the library, as well as bases of the natural anticodon (G34, U35, A36) are shown in blue (Figure created using Pymol, www.pymol.org, and pdb-file 1J1U). C. The production of full-length myoglobin from myo4(AGGA)-his6 by the AzPheRS*-2/MjtRNAUCCU pair is dependent on the presence of 2. In the remainder of the text we refer to MjAzPheRS*-2 as MjAzPheRS* for simplicity. MjAzPheRS*/tRNAUCCU efficiently suppresses an AGGA codon placed into the myoglobin gene. E. coli DH10B were transformed with pMyo4TAG-His6 or pMyo4AGGA-His6, a plasmid holding the gene for sperm whale myoglobin with an amber or an AGGA codon at position 4, respectively, and a C-terminal hexahistidine tag and an expression cassette for either MjtRNACUA or MjtRNAUCCU. MjAzPheRS or MjAzPheRS* were provided on pBKMjAzPheRS or pBKMjAzPheRS*, respectively. Cells received 2.5 mM 2 as a substrate for the synthetase. Myoglobin-His6 produced by the cells was purified by Ni2+-affinity chromatography, analysed by SDS-PAGE and detected with Coomassie stain. D. MjAzPheRS*/tRNAUCCU decodes AGGA codons specifically with 2. The incorporation of 2 into myoglobin-His6 purified from cells expressing Myo4(AGGA) and MjAzPheRS*/tRNAUCCU in the presence of 2.5 mM 2 was analysed by ESI-MS. The mass of the observed peak (18457.75 Da) corresponds to the calculated mass of myoglobin containing a single 2 (18456.2 Da).

Figure 14: Amino acid dependent growth of selected MjAzPheRS* variants. E. coli DH10B were co-transformed with isolates from a library built on pBK MjAzPheRS-7 and pREP JY(UCCU) (coding for MjtRNAUCCU and chloramphenicol acetyltransferase with an AGGA codon at position D111). Cells were grown in the presence or absence of 1 mM 2 for 5 h and pronged onto LB agar plates containing 25 μg ml-1 kanamycin, 12.5 μg ml-1 tetracycline and the indicated concentration of chloramphenicol with or without the unnatural amino acid. Plates were photographed after 18 h at 37°C. Sequencing of mutations for incorporating tyrosine, 2 and propargyl-L-tyrosine (Figure 15) in response to the AGGA codon reveals clones with common mutations Y230K, C231K and P232K, but divergent mutations at positions F261, H283 and D286. This suggests that amino acids 230, 231 and 232 confer affinity and specificity for the anticodon, and that 261, 283 and 286 may couple the identity of the anticodon to the amino acid identity.

Figure 15: Amino acid dependent growth of selected MjPrTyrRS* variants. E. coli DH10B transformed as in Figure 14 using isolates from a library built on MjPrTyrRS and tested for unnatural amino acid dependent growth. Mutations relative to MjPrTyrRS are given in the table below.

Figure 16: The MbPylRS/MbtRNACUA and MjAzPheRS*/tRNAUCCU pairs incorporate distinct unnatural amino acids in response to distinct unique codons. A. The two orthogonal pairs (MbPylRS/MbtRNACUA and MjAzPheRS*/tRNAUCCU) decode two distinct codons in the mRNA (UAG and AGGA) with two distinct amino acids (N6-[(tert-butyloxy)carbonyl]-L-lysine and 2). MbPylRS does not aminoacylate MjtRNAUCCU and MbtRNACUA is not a substrate for MjAzPheRS*. B. Suppression of a cognate codon at position 4 in the gene of sperm whale myoglobin by different combinations of MbPylRS/MbtRNACUA and MjAzPheRS*/tRNAUCCU. E. coli DH10B were transformed with pMyo4TAG-His6 or pMyo4AGGA-His6 as described in Figure 6C.
Cells were provided with MbPylRS (on pBKPylS) or MjAzPheRS* (on pBKMjPheRS*) and 2.5 mM N6-[(tert-butyloxy)carbonyl]-L-lysine or 5 mM 2, respectively. Myoglobin-His6 produced by the cells was purified by Ni2+-affinity chromatography, analysed by SDS-PAGE and detected with Coomassie stain. We see weak incorporation in response to the UAG codon using the MbPylRS pair. This incorporation is independent of the presence of MjAzPheRS* and results from a low level background acylation of the tRNA by E. coli synthetases in rich media, as previously observed.

Figure 17: Encoding an azide and an alkyne in a single protein via orthogonal translation. A. Expression of GST-CaM-His6 containing two unnatural amino acids. E. coli DH10B were transformed with four plasmids: pCDF PylST (expressing MbPylRS and MbtRNACUA), pDULE AzPheRS* tRNAUCCU (encoding MjAzPheRS*/tRNAUCCU), pSC101* ribo-Q1 and p-O-gst-CaM-His6 1AGGA 40UAG (a GST-CaM-His6 fusion translated by the orthogonal ribosome that contains an AGGA codon at position 1 and an amber codon at position 40 of calmodulin (CaM)). Cells were grown in LB medium containing antibiotics to maintain the plasmids and 2.5 mM 4 and/or 5 mM 2 as indicated. Cells were harvested, lysed and the protein purified on GSH-beads. Bound protein was eluted with 10 mM GSH in PBS and analysed by SDS-PAGE. A part of this gel is shown in Figure 3 of the main text. Full-length protein was produced by this method with yields of up to 0.5 mg/L.

Figure 18 shows Supplementary Table 1: Oligonucleotides used in this study.

The invention is now described by way of example. These examples are intended to be illustrative, and are not intended to limit the appended claims.

Examples

Plasmid construction

The previously described gst-malE protein expression vectors pgst-malE and pO-gst-malE 9 are translated by wild type and orthogonal ribosomes respectively. These vectors were used as templates to construct variants containing one or two quadruplet codons in the linker region between the gst and malE open reading frames.

To create vectors containing a single AGGA quadruplet codon between gst and malE (pgst(AGGA)malE and pO-gst(AGGA)malE) the Tyr codon, TAC, in the linker between gst and malE was changed to AGGA by Quikchange mutagenesis (Stratagene), using the primers GMx1AGGAf and GMx1AGGAr (all primers used in this study are listed in Supplementary Table 1). For double AGGA mutants we additionally mutated the fourth codon in malE from GAA to AGGA by Quikchange PCR, with the primers GMx2AGGAf and GMx2AGGAr, to create the vectors pgst(AGGA)2malE and pO-gst(AGGA)2malE. The vector pO-gst-malE(Y252AGGA) used for protein expression for mass spectrometry, in which the codon for Y17 of MBP was mutated to AGGA, was created by Quikchange mutagenesis (Stratagene) using the primers MBPY17AGGAf and MBPY17AGGAr.

To create vectors for constitutive production of the selected O-ribosomes, the mutations in pRSF-O-rDNA that confer the quadruplet decoding capacity on the orthogonal ribosome were transferred to pSC101-based O-rRNA expression vectors. pSC101*-ribo-X was used as a template and the mutations in 16S rDNA were introduced by enzymatic inverse PCR using the primers sc101Qr and sc101Q1f (for Ribo-Q1), sc101Q3f (for Ribo-Q3) and sc101Q4f (for Ribo-Q4).

pDULE AzPheRS* tRNAUCCU (containing the genes for MjtRNAUCCU and MjAzPheRS*, each under the control of the lpp promoter) was created by changing the anticodon of the MjtRNACUA to UCCU by Quikchange and replacing the ORF of the MjBPA-RS with MjAzPheRS*-2 via ligation of the MjAzPheRS*-2 gene, obtained by cutting pBK MjAzPheRS*-2 with the restriction enzymes NdeI and StuI, into the same sites on pDULE MjBPARS MjtRNAUCCU. pCDF PylST (a plasmid expressing MbPylRS and MbtRNACUA from constitutive promoters) was created by cloning PCR products containing expression cassettes for MbPylRS and MbtRNACUA into the BamHI and SalI or the SalI and NotI sites of pCDF DUET-1 (Novagen). The PCR products were obtained by amplifying the relevant regions of pBK PylRS and pREP PylT.

Plasmids encoding a fusion of GST and CaM were created by replacing the ORF of MBP in p-O-gst-malE with human CaM. The gene for CaM was amplified by PCR from pET3-CaM (a kind gift from K. Nagai) using primers CamEcof and CamH6Hindr (adding a C-terminal His6-tag) and cloned into the EcoRI and HindIII sites of pO-gst-malE. Methionine-1 of CaM was mutated to AGGA by a subsequent round of Quikchange mutagenesis using primers CaM1aggaf and CaM1aggar (simultaneously removing part of the linker between GST and CaM). In a second round of mutagenesis an amber codon was introduced at position 149 using primers CaMK149TAGf and CaMK149TAGr. To create a sterically hindered control the amber codon was inserted at position 40 instead using primers CaM40tagf and CaM40tagr.

Construction of ribosome libraries and quadruplet decoding reporters.

Eleven different 16S rDNA libraries were constructed by enzymatic inverse PCR 8,31 using pTrcRSF-O-ribo-X as a template. The resulting pRSF-O-rDNA libraries mutate between 7 and 13 nucleotides in defined regions of 16S rRNA and were constructed by multiple rounds of enzymatic inverse PCR using the library construction primers in Supplementary Table 1. Each library has a diversity of greater than 10^9, ensuring more than 99% coverage. There is overlap in the nucleotides mutated in the 11 libraries and overall they cover the entire surface of the decoding centre in the A site of the ribosome.
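The coverage statement can be checked with a short calculation. The sketch below assumes that each library randomizes N positions to all four nucleotides with equal probability and that clones are sampled independently, so the chance that a given variant appears among T clones is approximately 1 - exp(-T/4^N). These assumptions and the function names are illustrative and are not taken from the text.

```python
import math


def library_size(n_positions: int, alphabet: int = 4) -> int:
    """Number of sequence variants when n_positions are fully randomized."""
    return alphabet ** n_positions


def expected_coverage(n_clones: float, n_variants: int) -> float:
    """Poisson estimate of the fraction of variants sampled at least once.

    Assumes every variant is equally likely (an idealization).
    """
    return 1.0 - math.exp(-n_clones / n_variants)


if __name__ == "__main__":
    # The text gives 7 to 13 mutated nucleotides and more than 10^9 clones.
    for n in (7, 13):
        size = library_size(n)
        coverage = expected_coverage(1e9, size)
        print(f"{n} positions: {size} variants, coverage {coverage:.6f}")
```

Under these assumptions even the largest library (13 positions, about 6.7 x 10^7 variants) is sampled roughly 15-fold by 10^9 clones, which is in line with the stated coverage of more than 99%.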

To create a reporter of quadruplet decoding by orthogonal ribosomes, we used a previously described O-cat (UAGA146)/tRNA(UAGA) vector as a template 9. This vector contains a variant of E. coli tRNASer2 on an lpp promoter and rrnC transcriptional terminator. The tRNA has an altered anticodon, and the vector carries a selector codon for serine 146 in the chloramphenicol acetyl transferase (cat) gene downstream of an orthogonal ribosome-binding site. Ser146 is an essential and conserved catalytic serine residue that ensures the fidelity of incorporation. To create O-cat (AAGA103 AAGA146)/tRNA(UCUU) the AAGA codon was introduced at positions 146 and 103 and the anticodon of the tRNA was converted to UCUU by Quikchange mutagenesis using primers CAT146AGGAf, CAT146AGGAr and CAT103AGGAf, CAT103AGGAr. O-cat reporters containing the quadruplet codons AGGA, CCCU (using primers CAT146CCCUf, CAT146CCCUr and CAT103CCCUf and CAT103CCUr) and the corresponding tRNAs (Ser2AGGAf, Ser2AGGAr, Ser2CCCUf and Ser2CCCUr) were also created by Quikchange mutagenesis. Reporters containing a single quadruplet selector codon were intermediates in the vector construction process. Vectors having the O-cat gene but lacking the tRNA were created using O-cat(UAGA146), which does not contain the tRNA cassette, as a template using Quikchange primers CAT146AAGf, CAT146AGGAr, CAT103AGGAf, CAT103AGGAr, CAT146CCCUf, CAT146CCCUr, CAT103CCCUf and CAT103CCCUr that mutate the codons in O-cat.

Selection of orthogonal ribosomes with enhanced quadruplet decoding.

To select O-ribosomes with improved quadruplet decoding, each pRSF-O-rDNA library was transformed by electroporation into GeneHog E. coli (Invitrogen) cells containing O-cat (AAGA146). Transformed cells were recovered for 1 h in SOB medium containing 2% glucose and used to inoculate 200 ml of LB-GKT (LB medium with 2% glucose, 25 μg ml-1 kanamycin and 12.5 μg ml-1 tetracycline). After overnight growth (37 °C, 250 r.p.m., 16 h), 2 ml of the cells were pelleted by centrifugation (3,000g), and washed three times with an equal volume of LB-KT (LB medium with 12.5 μg ml-1 kanamycin and 6.25 μg ml-1 tetracycline). The resuspended pellet was used to inoculate 18 ml of LB-KT, and the resulting culture incubated (37 °C, 250 r.p.m. shaking, 90 min). To induce expression of plasmid encoded O-rRNA, 2 ml of the culture was added to 18 ml LB-IKT (LB medium with 1.1 mM isopropyl-D-thiogalactopyranoside (IPTG), 12.5 μg ml-1 kanamycin and 6.25 μg ml-1 tetracycline) and incubated for 4 h (37 °C, 250 r.p.m.). Aliquots (250 μl, optical density at 600 nm (OD600) = 1.5) were serially diluted and plated on LB-IKT agar (LB agar with 1 mM IPTG, 12.5 μg ml-1 kanamycin and 6.25 μg ml-1 tetracycline) supplemented with chloramphenicol at different concentrations (75 μg ml-1, 100 μg ml-1, 150 μg ml-1 and 200 μg ml-1 respectively) and incubated (37 °C, 40 h).
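The induction step mixes 2 ml of LB-KT culture with 18 ml of LB-IKT, so the working concentrations follow from a simple volume-weighted average. The sketch below works through that arithmetic; reading the 1.1 mM IPTG figure as compensation for the ten-fold dilution is an interpretation of the stated volumes, not something the text says, and the function name is hypothetical.

```python
def mixed_concentration(vol_a_ml: float, conc_a: float,
                        vol_b_ml: float, conc_b: float) -> float:
    """Concentration after combining two solutions (same units as the inputs)."""
    total_ml = vol_a_ml + vol_b_ml
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    return (vol_a_ml * conc_a + vol_b_ml * conc_b) / total_ml


# 2 ml culture in LB-KT (no IPTG) added to 18 ml LB-IKT (1.1 mM IPTG)
iptg_final_mM = mixed_concentration(2.0, 0.0, 18.0, 1.1)

# Both media contain 12.5 ug/ml kanamycin, so mixing leaves it unchanged
kan_final_ug_per_ml = mixed_concentration(2.0, 12.5, 18.0, 12.5)

if __name__ == "__main__":
    print(f"IPTG after mixing: {iptg_final_mM:.2f} mM")
    print(f"Kanamycin after mixing: {kan_final_ug_per_ml:.1f} ug/ml")
```

The IPTG concentration after mixing comes out at about 0.99 mM, close to the 1 mM used in the LB-IKT agar plates.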

Characterization of evolved orthogonal ribosomes with enhanced quadruplet decoding.

To separate selected pRSF-O-rDNA plasmids from the O-cat (AAGA146)/tRNAser2(UCUU) reporter plasmids, total plasmid DNA from selected clones was purified, digested with NotI restriction endonuclease, and transformed into DH10B E. coli. Individual transformants were replica plated onto kanamycin agar and tetracycline agar, and plasmid separation of pRSF-O-rDNA from the reporter was confirmed by restriction digest and agarose gel analysis.

To quantify the quadruplet decoding activity of selected 16S rDNA clones, the selected pRSF-O-rDNA plasmids were cotransformed with O-cat (AGGA103, AGGA146)/tRNAser2(UCCU). Cells were recovered (SOB, 2% glucose, 1 h) and used to inoculate 10 ml of LB-GKT, which was incubated (16 h, 37 °C, 250 r.p.m.). We used 1 ml of the resulting culture to inoculate 9 ml of LB-KT, which was incubated (90 min, 37 °C, 250 r.p.m.). We used 1 ml of the LB-KT culture to inoculate 9 ml of LB-IKT medium, which was incubated (37 °C, 250 r.p.m., 4 h). Individual clones were transferred to a 96-well block and arrayed, using a 96-well pin tool, onto LB-IKT agar plates containing chloramphenicol at concentrations from 0 to 500 μg ml-1. The plates were incubated (37°C, 16 h). We performed analogous experiments for other quadruplet codon-anticodon pairs.

To extract soluble cell lysates for in vitro CAT assays, 1 ml of each induced LB-IKT culture was pelleted by centrifugation at 3,000g. The cell pellets were washed three times with 500 μl Washing Buffer (40 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA, pH 7.5) and once with 500 μl Lysis Buffer (250 mM Tris-HCl, pH 7.8). Cells were lysed in 200 μl Lysis Buffer by five cycles of flash-freezing in dry ice/ethanol, followed by rapid thawing in a 50 °C water bath. Cell debris was removed from the lysate by centrifugation (12,000g, 5 min) and the top 150 μl of supernatant frozen at -20 °C. To assay CAT activity in the lysates, 10 μl of soluble cell extract was mixed with 2.5 μl of FAST CAT Green (deoxy) substrate (Invitrogen) and preincubated (37 °C, 5 min). We added 2.5 μl of 9 mM acetyl-CoA (Sigma), and incubated (37 °C, 1 h). The reaction was stopped by the addition of ice-cold ethyl acetate (200 μl, vortex 20 s). The aqueous and organic phases were separated by centrifugation (12,000g, 10 min) and the top 100 μl of the ethyl acetate layer collected. We spotted 1 μl of the collected solution onto a silica gel thin-layer chromatography plate (Merck) for thin-layer chromatography in chloroform:methanol (85:15 vol/vol). The fluorescence of the spatially resolved substrate and product was visualized and quantified using a phosphorimager (Storm 860, Amersham Biosciences) with excitation and emission wavelengths of 450 nm and 520 nm, respectively.

Small scale expression and purification of gst-malE fusions.

E. coli containing the appropriate plasmid combinations were pelleted (3,000g, 10 min) from 50 ml overnight cultures, resuspended and lysed in 800 μl Novagen BugBuster Protein Extraction Reagent (supplemented with 1x protease inhibitor cocktail (Roche), 1 mM PMSF, 1 mg ml-1 lysozyme (Sigma), 1 mg ml-1 DNase I (Sigma)), and incubated (60 min, 25 °C, 1,000 r.p.m.). The lysate was clarified by centrifugation (6 min, 25,000g, 2 °C). GST containing proteins from the lysate were bound in batch (1 h, 4 °C) to 50 μl of glutathione sepharose beads (GE Healthcare). Beads were washed 3 times with 1 ml PBS, before elution by heating for 10 min at 80 °C in 60 μl 1x SDS gel-loading buffer. All samples were analyzed on 10% Bis-Tris gels (Invitrogen).

Measuring the translational fidelity of orthogonal quadruplet decoding ribosomes

35S-cysteine misincorporation: E. coli containing either pO-gst-malE and pSC101*-O-ribosome, pO-gst-malE and pSC101*-ribo-X, pO-gst-malE and pSC101*-ribo-Q, or pgst-malE were resuspended in LB media (supplemented with 35S-cysteine (1,000 Ci mmol-1) to a final concentration of 3 nM, 750 μM methionine, 25 μg ml-1 ampicillin and 12.5 μg ml-1 kanamycin) to an OD600 of 0.1, and cells were incubated (3.5 h, 37°C, 250 r.p.m.). 10 ml of the resulting culture was pelleted (5,000g, 5 min), washed twice (1 ml PBS per wash), resuspended in 1 ml lysis buffer containing 1% Triton-X, incubated (30 min, 37°C, 1,000 r.p.m.) and lysed on ice by pipetting up and down. The clarified cell extract was bound to 100 μl of glutathione sepharose beads (1 h, 4°C) and the beads were pelleted (5,000g, 10 s) and washed twice in 1 ml PBS. The beads were added to a 10 ml polypropylene column (Biorad) and washed (30 ml of PBS; 10 ml 0.5 M NaCl, 0.5x PBS; 30 ml PBS) before elution in 1 ml of PBS supplemented with 10 mM glutathione. Purified GST-MBP was digested with 12.5 units of thrombin for 1 h, to yield a GST fragment and a MalE fragment. The reaction was precipitated with 15% trichloroacetic acid and loaded onto an SDS-PAGE gel to resolve the GST, MBP and thrombin, and stained with InstantBlue (Expedeon). The 35S activity in the GST and MBP protein bands was quantified by densitometry, using a Storm Phosphorimager (Molecular Dynamics) and ImageQuant (GE Healthcare). The error frequency per codon for each ribosome examined was determined as follows: GST contains four cysteine codons, so the number of counts per second (c.p.s.) resulting from GST divided by four gives A, the c.p.s. per quantitative incorporation of cysteine. MBP contains no cysteine codons, but misincorporation at noncysteine codons gives B c.p.s.
Because GST and MBP are present in equimolar amounts, (A/B) x 410, where 410 is the number of amino acids in the MBP-containing thrombin cleavage fragment, gives the number of amino acids translated for one cysteine misincorporation, C. Assuming the misincorporation frequency for all 20 amino acids is the same as that for cysteine, the number of codons translated per misincorporation is C/20, and the error frequency per codon is given by (C/20)^-1.
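The calculation described above can be written out as a short function. This is a minimal sketch that follows the stated steps (A, B, C, then C/20 and its reciprocal); the function and parameter names are hypothetical, and the example counts are invented purely to show the arithmetic and are not measured values.

```python
def error_frequency_per_codon(gst_cps: float,
                              mbp_cps: float,
                              gst_cys_codons: int = 4,
                              mbp_fragment_length: int = 410,
                              n_amino_acids: int = 20) -> float:
    """Error frequency per codon from 35S counts in the GST and MBP bands."""
    if gst_cps <= 0 or mbp_cps <= 0:
        raise ValueError("counts must be positive to compute a frequency")
    # A: c.p.s. per quantitative incorporation of one cysteine
    a = gst_cps / gst_cys_codons
    # B: c.p.s. from cysteine misincorporated into the cysteine-free MBP fragment
    b = mbp_cps
    # C: amino acids translated per cysteine misincorporation
    c = (a / b) * mbp_fragment_length
    # Codons translated per misincorporation of any amino acid, then its reciprocal
    codons_per_error = c / n_amino_acids
    return 1.0 / codons_per_error


if __name__ == "__main__":
    # Illustrative counts only: 4000 c.p.s. in GST, 1 c.p.s. in MBP
    print(error_frequency_per_codon(4000.0, 1.0))
```

With these illustrative counts A = 1000, C = 410,000 and C/20 = 20,500, so the function returns 1/20,500. When no signal above background is detected in the MBP band, only an upper limit on the error frequency can be stated, which is why the sketch rejects non-positive counts.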

Dual luciferase assays: The previously characterized pO-DLR contains a genetic fusion between a 5' Renilla luciferase (R-luc) and a 3' firefly luciferase (F-luc) on an orthogonal ribosome binding site 9. pO-DLR, and its K529 codon variants, were transformed into E. coli cells with pSC101*-O-ribosome or pSC101*-ribo-Q1. Where indicated an additional E. coli Ser2A tRNA with a mutated anticodon, as specified in individual experiments, was supplied on plasmid p15A-tRNA-Ser2A. In this case 25 μg ml-1 tetracycline was added to all culture media to maintain the additional plasmid. In experiments that used a suppressor tRNA recognizing AGGA codons, a natural AGG codon that is followed by a codon starting with an A was removed from the linker region of pO-DLR by QuikChange using primers DLR952AAGxf and DLR953AGGxr.

Individual colonies were incubated (37°C, 250 r.p.m., 36 h) in 2 ml LB supplemented with ampicillin (50 μg ml-1) and kanamycin (25 μg ml-1), pelleted (5,000g, 5 min), washed with ice cold Millipore water and resuspended in 300 μl (1 mg ml-1 lysozyme, 1 mg ml-1 DNase I, 10 mM Tris (pH 8.0), 1 mM EDTA). Cells were incubated on ice for 20 min, frozen on dry ice, and thawed on ice. 10 μl samples of this extract were assayed for firefly (F-luc) and Renilla (R-luc) luciferase activity using the Dual-Luciferase Reporter Assay System (Promega). Each ribosome reporter combination was assayed from four independent cultures using an Orion microplate luminometer (Berthold Detection Systems) and the data analyzed as previously described. The error reported is the standard deviation.
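The normalization described in the legend to Figure 9C (firefly/Renilla ratio for the AGGA reporter, divided by the same ratio for the Ren-FF construct, then expressed relative to the cognate tRNA) can be sketched as follows. The function names and the replicate values are hypothetical and serve only to show the arithmetic. The use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def normalized_readthrough(ff_agga: float, ren_agga: float,
                           ff_ctrl: float, ren_ctrl: float) -> float:
    """(Firefly/Renilla for Ren-AGGA-FF) divided by (Firefly/Renilla for Ren-FF)."""
    return (ff_agga / ren_agga) / (ff_ctrl / ren_ctrl)


def fraction_of_cognate(noncognate: float, cognate: float) -> float:
    """Decoding by a non-cognate tRNA as a fraction of decoding by the cognate tRNA."""
    return noncognate / cognate


def summarize(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across independent cultures."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Invented luminescence readings for four independent cultures
    replicates = [
        normalized_readthrough(10.0, 100.0, 50.0, 100.0),
        normalized_readthrough(12.0, 100.0, 50.0, 100.0),
        normalized_readthrough(9.0, 100.0, 50.0, 100.0),
        normalized_readthrough(11.0, 100.0, 50.0, 100.0),
    ]
    avg, sd = summarize(replicates)
    print(f"normalized readthrough: {avg:.3f} +/- {sd:.3f}")
```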

Mass spectrometric characterization of p-azido-L-phenylalanine (2) incorporation by Ribo-Q1

E. coli DH10B containing p-O-gst-malE(Y252AGGA), pSC101*Ribo-Q1 and pDULE-AzPheRS*tRNAUCCU were used to produce protein for mass spectrometry. Protein was expressed in the presence of 2.5 mM 2 and purified on glutathione. The purified proteins were resolved by SDS-PAGE, stained with Instant Blue (Expedeon) and the band containing full-length GST-MBP was excised for analysis by LC/MS/MS (NextGen Sciences). The samples were reduced with DTT at 60°C and alkylated with iodoacetamide after cooling to room temperature. The samples were then digested with trypsin (37°C, 4 h), and the reaction was stopped by the addition of formic acid. The samples were analyzed by nano LC/MS/MS on a ThermoFisher LTQ Orbitrap XL. 30 μl of hydrolysate was loaded onto a 5 mm × 75 μm ID C12 (Jupiter Proteo, Phenomenex) vented column at a flow rate of 10 μl min⁻¹. Gradient elution was over a 15 cm × 75 μm ID C12 column at 300 nl min⁻¹ with a 1 hour gradient. The mass spectrometer was operated in data-dependent mode, and ions were selected for MS/MS. The Orbitrap MS scan was performed at 60,000 FWHM resolution. MS/MS data were searched using Mascot (www.matrixscience.com).

Evolution of a quadruplet decoding MjAzPheRS

pBK MjAzPheRS-7²⁴ (a kanamycin resistant plasmid, which contains MjAzPheRS-7 on a GlnRS promoter and terminator) was used as a template to create a library in the region of MjAzPheRS that recognizes the anticodon. Codons for residues Y230, C231, P232, F261, H283 and D286 were randomized to NNK in two rounds of enzymatic inverse PCR, generating a library of 10⁸ mutant clones. pREP JY(UCCU) was created by changing the anticodon of MjtRNACUA in pREP YC-JYCUA³² from CUCUAAA to CUUCCUAA by QuikChange mutagenesis (Stratagene) and changing the amber codon in the chloramphenicol acetyltransferase gene to AGGA. E. coli DH10B harbouring this plasmid were transformed with the mutant library and grown in LB-KT (LB medium supplemented with 25 μg ml⁻¹ kanamycin and 12.5 μg ml⁻¹ tetracycline) supplemented with 1 mM 2. 10⁹ cells were plated on LB-KT plates containing 1 mM 2 and concentrations of chloramphenicol ranging from 50 to 250 μg ml⁻¹. After incubation (36 h, 37°C) individual clones were tested for 2-dependent growth on LB-KT plates with 0-250 μg ml⁻¹ chloramphenicol with and without 1 mM 2. The plasmid DNA from clones showing amino acid dependent growth was isolated and digested with HindIII to eliminate pREP JY(UCCU). After transformation and reisolation of the kanamycin resistant plasmid the DNA was sequenced.
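For orientation, the theoretical diversity of a six-codon NNK library can be worked out with the short Python sketch below. It assumes the standard genetic code and the usual meaning of NNK (N = A, C, G or T; K = G or T); the variable names are illustrative and nothing here describes how the library was actually constructed or screened.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered by BASES at each position.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + k for a, b, k in product(BASES, BASES, "GT")]
encoded = {CODON_TABLE[c] for c in nnk_codons}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

n_positions = 6  # Y230, C231, P232, F261, H283 and D286
dna_diversity = len(nnk_codons) ** n_positions
protein_diversity = len(encoded - {"*"}) ** n_positions

print(f"NNK codons: {len(nnk_codons)}, amino acids: {len(encoded - {'*'})}, "
      f"stop codons: {stop_codons}")
print(f"DNA variants: {dna_diversity:,}; protein variants: {protein_diversity:,}")
```

Comparing these figures with the number of clones obtained gives a rough sense of how much of the sequence space a given library can sample.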

To select quadruplet decoding pairs that incorporate other amino acids, the procedure above was repeated using the relevant starting template and unnatural amino acid.

Investigating the mutual orthogonality of MbPylRS/MbtRNACUA and MjTyrRS/MjtRNACUA

To test the ability of MbPylRS to aminoacylate MjtRNACUA, E. coli DH10B were transformed with a pBK MbPylRS encoding MbPylRS under the control of a GlnRS promoter and terminator and pMyo4TAG-His6, expressing sperm whale myoglobin with an amber codon at position 4 and MjtRNACUA. The cells were grown overnight at 37°C in LB-KT. Fresh LB-KT (50 ml) supplemented with 10 mM N6-[(tert.-butyloxy)carbonyl]-L-lysine (BocLys, 3) was inoculated 1:50 with overnight culture. After 3 h at 37°C protein expression was induced by addition of 0.2% arabinose. After a further 3 h cells were harvested and washed with PBS. Proteins were extracted by shaking at 25°C in 1 ml Ni-wash buffer (10 mM Tris/Cl, 20 mM imidazole, 200 mM NaCl pH 8.0) supplemented with protease inhibitor cocktail (Roche), 1 mM PMSF, and approx. 1 mg ml⁻¹ lysozyme and 0.1 mg ml⁻¹ DNase I. The extract was clarified by centrifugation (5 min, 25000 g, 4°C), supplemented with 50 μl Ni²⁺-NTA beads and incubated with agitation for 1 h at 4°C. Beads were washed in batch three times with 1 ml Ni-wash buffer and eluted in 100 μl sample buffer supplemented with 200 mM imidazole. To test the aminoacylation activity between the cognate pairs or between MjTyrRS and MbtRNACUA, analogous experiments were carried out as above using the relevant plasmids (pBK MjTyrRS or pBK MbPylRS and pMyo4TAG-His6 or pMyo4TAG-His6-PylT) and unnatural amino acids (3 or none). Proteins were analysed by 4-12% SDS-PAGE and stained with Instant Blue.

Characterization of the quadruplet suppressing AzPheRS*

Expression and purification of myoglobin from pMyo4TAG-His6 or pMyo4AGGA-His6 was carried out as above using the relevant pBK plasmids and 2.5 mM 2. Proteins were analysed by 4-12% SDS-PAGE.

Characterization of Myo4AzPhe produced with AzPheRS* from pMyo4AGGA-His6 by ESI mass spectrometry

Myoglobin was expressed in E. coli DH10B using plasmids pBK AzPheRS* and pMyo4AGGA-His6 essentially as described above but at 1 l scale. The protein was extracted by shaking at 25°C in 30 ml Ni-wash buffer supplemented with protease inhibitor cocktail (Roche), 1 mM PMSF, 1 mg ml⁻¹ lysozyme and 0.1 mg ml⁻¹ DNase I. The extract was clarified by centrifugation (15 min, 38000 g, 4°C), supplemented with 0.3 ml Ni²⁺-NTA beads and incubated with agitation for 1 h at 4°C. Beads were poured into a column and washed with 40 ml of Ni-wash buffer. Bound protein was eluted in 0.5 ml fractions of the same buffer containing 200 mM imidazole and immediately rebuffered to 10 mM ammonium carbonate pH 7.5 by dialysis. 50 μl of the sample was mixed 1:1 with 1% formic acid in 50% methanol and the total mass determined on an LCT time-of-flight mass spectrometer with electrospray ionization (Micromass). The sample was injected at 10 μl min⁻¹ and calibration performed in positive ion mode using horse heart myoglobin. 50 scans were averaged and molecular masses obtained by deconvoluting multiply charged protein mass spectra using MassLynx version 4.1 (Micromass). The theoretical mass of the wild-type myoglobin was calculated using Protparam (http://us.expasy.org/tools/protparam.html), and the theoretical mass for 2 adjusted manually.
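The principle behind deconvoluting a multiply charged electrospray spectrum can be illustrated with the Python sketch below. It is a textbook two-adjacent-peak calculation and is not the algorithm implemented in MassLynx; the function names and the 17,000 Da test mass are hypothetical and do not correspond to any measured value.

```python
PROTON_MASS = 1.007276  # Da, mass added per charge in positive ion mode


def mz(mass, charge):
    """m/z of a protein of neutral mass `mass` carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


def mass_from_adjacent_peaks(mz_low, mz_high):
    """Infer (charge, neutral mass) from two neighbouring charge-state peaks.

    mz_low is assumed to carry charge z + 1 and mz_high charge z, so that
    z = (mz_low - m_p) / (mz_high - mz_low) and M = z * (mz_high - m_p).
    """
    if mz_high <= mz_low:
        raise ValueError("mz_high must be greater than mz_low")
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)


def average_mass(peaks):
    """Average the mass estimates from every adjacent pair of sorted peaks."""
    ordered = sorted(peaks)
    estimates = [mass_from_adjacent_peaks(lo, hi)[1]
                 for lo, hi in zip(ordered, ordered[1:])]
    return sum(estimates) / len(estimates)


# Simulated charge-state envelope for a hypothetical 17,000 Da protein.
hypothetical_mass = 17000.0
envelope = [mz(hypothetical_mass, z) for z in range(16, 21)]
print(f"recovered mass: {average_mass(envelope):.2f} Da")
```

Real spectra contain noise, adducts and overlapping species, which is why dedicated deconvolution software is used in practice.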

MS/MS analysis of GST-MBP 234AzPhe 239CAK

E. coli DH10B were transformed with pDULE AzPheRS*/tRNAUCCU and pCDF PylST and grown to logarithmic phase in LB-ST (25 μg ml⁻¹ spectinomycin and 12.5 μg ml⁻¹ tetracycline). Electrocompetent cells were prepared and transformed with a plasmid for the constitutive expression of an orthogonal ribosome (pSC101* Ribo-Q) and p-O-gst(234AGGA 239TAG)malE. The recovery of the transformation was used to inoculate LB-AKST (LB medium containing 50 μg ml⁻¹ ampicillin, 12.5 μg ml⁻¹ kanamycin, 25 μg ml⁻¹ spectinomycin and 12.5 μg ml⁻¹ tetracycline). The culture was grown to saturation at 37°C and used to inoculate the main culture 1:50. Cells were grown overnight at 37°C, harvested by centrifugation and stored at -20°C. The GST-MBP protein was expressed at a scale of 100 ml using 2.5 mM each of AzPhe (2) and CAK (4). Proteins were extracted and purified as above. After washing the beads with PBS the protein was eluted by heating in 100 μl 1× sample buffer containing 50 mM β-mercaptoethanol to 80°C for 5 min. The protein sample was analysed by 4-12% SDS-PAGE and stained with Instant Blue. The band containing full-length GST-MBP was excised and submitted for LC/MS/MS analysis (by NextGen Sciences).

Cyclization of GST-CaM-His6 1AzPhe 149CAK

E. coli DH10B were transformed sequentially with four plasmids as described above using expression plasmids p-O-gst-CaM-His6 1AGGA 149UAG or p-O-gst-CaM-His6 1AGGA 40UAG. The protein was expressed at 0.5 l scale as described above using 5 mM 2 and 2.5 mM 4. The cells were extracted and GST-CaM-His6 purified as described for myoglobin-His6 and dialysed against 50 mM Na₂HPO₄ pH 8.3. To perform the cyclization reaction, 160 μl of protein sample was mixed with 40 μl of a fresh solution of 5 mM ascorbic acid, 5 mM CuSO₄ and 10 mM bathophenanthroline. The reaction was incubated at 4°C and analysed by 4-12% SDS-PAGE.
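The final reagent concentrations in the cyclization mixture follow from a simple dilution, sketched here in Python. The sketch assumes that the three reagents are combined in a single stock solution and that volumes are additive; the function and variable names are illustrative only.

```python
def final_concentrations(stock_mM, stock_volume_ul, sample_volume_ul):
    """Dilute each stock component into the combined reaction volume.

    stock_mM         -- mapping of reagent name to stock concentration (mM)
    stock_volume_ul  -- volume of reagent solution added (microlitres)
    sample_volume_ul -- volume of protein sample (microlitres)
    """
    total_ul = stock_volume_ul + sample_volume_ul
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return {name: conc * stock_volume_ul / total_ul
            for name, conc in stock_mM.items()}


stock = {"ascorbic acid": 5.0, "CuSO4": 5.0, "bathophenanthroline": 10.0}
final = final_concentrations(stock, stock_volume_ul=40.0, sample_volume_ul=160.0)
for reagent, conc in final.items():
    print(f"{reagent}: {conc:.1f} mM")
```

With 40 μl added to 160 μl the reagents are diluted five-fold, giving 1 mM ascorbic acid, 1 mM CuSO4 and 2 mM bathophenanthroline in the reaction.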

To analyze the cyclization product by mass spectrometry we introduced additional tryptic cleavage sites around the incorporation sites of unnatural amino acids to facilitate subsequent analysis. Therefore, the point mutations Q4 and M146 (numbering relative to the AGGA codon in p-O-gst-CaM-His6 1AGGA 149UAG) and a G3K linker directly following the TAG codon were introduced by QuikChange. The protein was expressed, purified and cyclized as above with very similar yields. The cyclized protein was subsequently excised from an SDS-PAGE gel and submitted for mass spectrometric analysis (NextGen Sciences, Ann Arbor, USA).

References:

1. Xie, J. & Schultz, P. G. A chemical toolkit for proteins-an expanded genetic code. Nat Rev Mol Cell Biol 7, 775-82 (2006).

2. Steer, B. A. & Schimmel, P. Major anticodon-binding region missing from an archaebacterial tRNA synthetase. J Biol Chem 274, 35601-6 (1999).

3. Chin, J. W., Martin, A. B., King, D. S., Wang, L. & Schultz, P. G. Addition of a photocrosslinking amino acid to the genetic code of Escherichia coli. Proc Natl Acad Sci U S A 99, 11020-4 (2002).

4. Srinivasan, G., James, C. M. & Krzycki, J. A. Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science 296, 1459-62 (2002).

5. Polycarpo, C. et al. An aminoacyl-tRNA synthetase that specifically activates pyrrolysine. Proc Natl Acad Sci U S A 101, 12450-4 (2004).

6. Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Genetically encoding N(epsilon)-acetyllysine in recombinant proteins. Nat Chem Biol 4, 232-4 (2008).

7. Nguyen, D. P. et al. Genetic encoding and labeling of aliphatic azides and alkynes in recombinant proteins via a pyrrolysyl-tRNA synthetase/tRNA(CUA) pair and click chemistry. J Am Chem Soc 131, 8720-1 (2009).

8. Rackham, O. & Chin, J. W. A network of orthogonal ribosome·mRNA pairs. Nat Chem Biol 1, 159-66 (2005).

9. Wang, K., Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Evolved orthogonal ribosomes enhance the efficiency of synthetic genetic code expansion. Nat Biotechnol 25, 770-7 (2007).

10. Rostovtsev, V. V., Green, L. G., Fokin, V. V. & Sharpless, K. B. A stepwise Huisgen cycloaddition process: copper(I)-catalyzed regioselective "ligation" of azides and terminal alkynes. Angew Chem Int Ed Engl 41, 2596-9 (2002).

11. Hohsaka, T. & Sisido, M. Incorporation of non-natural amino acids into proteins. Curr Opin Chem Biol 6, 809-15 (2002).

12. Ohtsuki, T., Manabe, T. & Sisido, M. Multiple incorporation of non-natural amino acids into a single protein using tRNAs with non-standard structures. FEBS Lett 579, 6769-74 (2005).

13. Murakami, H., Hohsaka, T., Ashizuka, Y. & Sisido, M. Site-directed incorporation of p-nitrophenylalanine into streptavidin and site-to-site photoinduced electron transfer from a pyrenyl group to a nitrophenyl group on the protein framework. Journal of the American Chemical Society 120, 7520-7529 (1998).

14. Rodriguez, E. A., Lester, H. A. & Dougherty, D. A. In vivo incorporation of multiple unnatural amino acids through nonsense and frameshift suppression. Proc Natl Acad Sci U S A 103, 8650-5 (2006).

15. Monahan, S. L., Lester, H. A. & Dougherty, D. A. Site-specific incorporation of unnatural amino acids into receptors expressed in mammalian cells. Chemistry and Biology 10, 573-580 (2003).

16. Anderson, J. C. et al. An expanded genetic code with a functional quadruplet codon. Proc Natl Acad Sci U S A 101, 7566-71 (2004).

17. Atkins, J. F. & Bjork, G. R. A gripping tale of ribosomal frameshifting: extragenic suppressors of frameshift mutations spotlight P-site realignment. Microbiol Mol Biol Rev 73, 178-210 (2009).

18. Stahl, G., McCarty, G. P. & Farabaugh, P. J. Ribosome structure: revisiting the connection between translational accuracy and unconventional decoding. Trends Biochem Sci 27, 178-83 (2002).

19. Selmer, M. et al. Structure of the 70S ribosome complexed with mRNA and tRNA. Science 313, 1935-42 (2006).

20. Magliery, T. J., Anderson, J. C. & Schultz, P. G. Expanding the genetic code: selection of efficient suppressors of four-base codons and identification of "shifty" four-base codons with a library approach in Escherichia coli. J Mol Biol 307, 755-69 (2001).

21. Khazaie, K., Buchanan, J. H. & Rosenberger, R. F. The accuracy of Q beta RNA translation. 1. Errors during the synthesis of Q beta proteins by intact Escherichia coli cells. Eur J Biochem 144, 485-9 (1984).

22. Laughrea, M., Latulippe, J., Filion, A. M. & Boulet, L. Mistranslation in twelve Escherichia coli ribosomal proteins. Cysteine misincorporation at neutral amino acid residues other than tryptophan. Eur J Biochem 169, 59-64 (1987).

23. Kramer, E. B. & Farabaugh, P. J. The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA 13, 87-96 (2007).

24. Chin, J. W. et al. Addition of p-azido-L-phenylalanine to the genetic code of Escherichia coli. J Am Chem Soc 124, 9026-7 (2002).

25. Mukai, T. et al. Adding l-lysine derivatives to the genetic code of mammalian cells with engineered pyrrolysyl-tRNA synthetases. Biochem Biophys Res Commun 371, 818-22 (2008).

26. Camarero, J. A., Pavel, J. & Muir, T. W. Chemical Synthesis of a Circular Protein Domain: Evidence for Folding-Assisted Cyclization. Angewandte Chemie - International Edition 37, 347-349 (1998).

27. Scott, C. P., Abel-Santos, E., Wall, M., Wahnon, D. C. & Benkovic, S. J. Production of cyclic peptides and proteins in vivo. Proc Natl Acad Sci U S A 96, 13638-43 (1999).

28. Li, P. & Roller, P. P. Cyclization strategies in peptide derived drug design. Curr Top Med Chem 2, 325-41 (2002).

29. Walensky, L. D. et al. Activation of apoptosis in vivo by a hydrocarbon-stapled BH3 helix. Science 305, 1466-70 (2004).

30. Trauger, J. W., Kohli, R. M., Mootz, H. D., Marahiel, M. A. & Walsh, C. T. Peptide cyclization catalysed by the thioesterase domain of tyrocidine synthetase. Nature 407, 215-8 (2000).

31. Stemmer, W. P. & Morris, S. K. Enzymatic inverse PCR: a restriction site independent, single-fragment method for high-efficiency, site-directed mutagenesis. Biotechniques 13, 214-20 (1992).

32. Santoro, S. W., Wang, L., Herberich, B., King, D. S. & Schultz, P. G. An efficient system for the evolution of aminoacyl-tRNA synthetase specificity. Nat Biotechnol 20, 1044-8 (2002).

33. Rice, J. B., Libby, R. T. & Reeve, J. N. Mistranslation of the mRNA encoding bacteriophage T7 0.3 protein. J Biol Chem 259, 6505-10 (1984).

34. Hao, B. et al. A new UAG-encoded residue in the structure of a methanogen methyltransferase. Science 296, 1462-6 (2002).

35. Kobayashi, T. et al. Structural basis for orthogonal tRNA specificities of tyrosyl-tRNA synthetases for genetic code expansion. Nat Struct Biol 10, 425-32 (2003).

All publications mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described aspects and embodiments of the present invention will be apparent to those skilled in the art without departing from the scope of the present invention. Although the present invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are apparent to those skilled in the art are intended to be within the scope of the following claims.