Title:
ENGINEERED OLIGOSACCHARYLTRANSFERASES
Document Type and Number:
WIPO Patent Application WO/2016/023018
Kind Code:
A2
Abstract:
The Campylobacter jejuni protein glycosylation locus (pgl) encodes machinery for asparagine-linked (N-linked) glycosylation and serves as the archetype for bacterial N-glycosylation. This machinery has been functionally transferred into Escherichia coli, thereby enabling convenient mechanistic dissection of the N-glycosylation process in this genetically tractable host. Here, we sought to identify sequence determinants in the oligosaccharyltransferase PglB that restrict its specificity to only those glycan acceptor sites containing a negatively charged amino acid residue at the -2 position relative to asparagine. This involved creation of a genetic assay named glycoSNAP (glycosylation of secreted N-linked acceptor proteins) that facilitates high-throughput screening of glycophenotypes in E. coli. Using this assay, we isolated several C. jejuni PglB variants that were capable of glycosylating an array of noncanonical acceptor sequences including one in a eukaryotic N-glycoprotein. Collectively, these results underscore the utility of glycoSNAP for shedding light on poorly understood aspects of N-glycosylation and for engineering designer N-glycosylation biocatalysts.

Inventors:
OLLIS ANNE A (US)
FISHER ADAM C (US)
MERRITT JUDITH H (US)
DELISA MATTHEW P (US)
Application Number:
PCT/US2015/044395
Publication Date:
February 11, 2016
Filing Date:
August 08, 2015
Assignee:
GLYCOBIA INC (US)
International Classes:
C12P21/02; C12N9/10
Attorney, Agent or Firm:
HONG, Chang, B. (Inc.410 Weill Hall,Mcgovern Cente, Ithaca NY, US)
Claims:
What is claimed is:

1. A modified oligosaccharyltransferase (EC 2.4.1.119) capable of catalyzing the transfer of a lipid-linked glycan onto a sequon comprising N-X-S/T, wherein X is any amino acid residue except a proline.

2. A modified oligosaccharyltransferase having a relaxed specificity to an N-glycan acceptor site comprising Y-X-N-X-S/T, wherein X is any amino acid residue except a proline and wherein Y is not a negatively charged amino acid residue.

3. The modified oligosaccharyltransferase of claim 2, wherein Y is not D or E amino acid residue.

4. An isolated or recombinant oligosaccharyltransferase comprising an amino acid residue change in the protein-binding pocket of the oligosaccharyltransferase.

5. The oligosaccharyltransferase of claim 4, wherein the oligosaccharyltransferase is prokaryotic.

6. The oligosaccharyltransferase of claim 4, wherein the oligosaccharyltransferase is selected from Campylobacter sp.

7. A mutant oligosaccharyltransferase comprising a substitution at position R327 and/or R328 in C. jejuni.

8. A mutant oligosaccharyltransferase comprising at least one of the mutations selected from at least one of R327D, R328L; R327N, R328L; R327L, R328Q; R327M, R328L; R328L; R327G, R328L; R327V, R328L; R328N; and R327P, R328V.

9. A mutant oligosaccharyltransferase comprising a substitution at position R331 in C. lari.

10. The mutant oligosaccharyltransferase of claim 9, wherein the mutant comprises Q330D, R330L.

11. The mutant oligosaccharyltransferase of any one of claims 7 - 10, wherein the mutant oligosaccharyltransferase catalyzes the transfer of a lipid-linked glycan onto a sequon comprising N-X-S/T, wherein X is any amino acid residue except a proline.

12. The mutant oligosaccharyltransferase of claim 11, wherein the lipid-linked glycan comprises bacterial O-antigen.

13. The mutant oligosaccharyltransferase of claim 11, wherein the lipid-linked glycan is a eukaryotic glycan.

14. An isolated or recombinant polypeptide encoding an oligosaccharyltransferase comprising or consisting of a mutant oligosaccharyltransferase wherein the polypeptide sequence is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8% or at least 99.9% identical to the mutant oligosaccharyltransferase.

15. A host cell comprising the oligosaccharyltransferase of any one of above claims.

16. A modified host cell comprising a recombinantly expressed protein comprising a modified oligosaccharyltransferase wherein the host cell catalyzes the transfer of a lipid-linked glycan onto a sequon comprising N-X-S/T, wherein X is any amino acid residue except a proline.

17. The modified host cell of claim 16, wherein the oligosaccharyltransferase catalyzes the transfer of lipid-linked glycans onto a broader range of sequons than that of the wild-type oligosaccharyltransferase.

18. The modified host cell of claim 16, wherein the oligosaccharyltransferase comprises a relaxed specificity to an N-glycan acceptor site comprising Y-X-N-X-S/T, wherein X is any amino acid residue except a proline and wherein Y is not a negatively charged amino acid residue.

19. A method for producing a glycoprotein in a host cell comprising

expressing a modified oligosaccharyltransferase;

expressing a protein of interest;

wherein the oligosaccharyltransferase catalyzes the transfer of lipid-linked glycan onto an asparagine residue of a sequon comprising N-X-S/T of the protein, wherein X is any amino acid residue except a proline.

20. The method of claim 19, further expressing one or more glycosyltransferases.

21. A method for detecting N-linked glycoproteins comprising

expressing one or more oligosaccharyltransferase activity;

expressing a glycosylated secreted protein;

separating the glycosylated secreted protein into the extracellular medium from cells, free oligosaccharides or membrane-associated lipid-linked oligosaccharides; and

identifying one or more oligosaccharyltransferase activity to an acceptor site of the glycosylated secreted protein.

Description:
ENGINEERED OLIGOSACCHARYLTRANSFERASES

STATEMENT OF FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

[0001] This invention was made with government support under National Science Foundation Grant CBET 1159581 and National Institutes of Health Grant R44 GM088905-01 and NIH SIG Grant 1S10RR025449-01. The government has certain rights in this invention.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0002] This application is related to U.S. Provisional Application No. 62/034,986, filed August 8, 2014, which is herein incorporated by reference, in its entirety, for all purposes.

SEQUENCE LISTING

[0003] This application contains a Sequence Listing which [ ] submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on [DATE], is named [.tx ] and is [#######] bytes in size.

FIELD OF INVENTION

[0004] The disclosure herein generally relates to the field of glycobiology and protein engineering. More specifically, the embodiments described herein relate to oligosaccharyl transferase compositions and production of therapeutic glycoproteins in recombinant hosts.

[0005] Background

[0006] Chemical modification of specific amino acid side chains with oligosaccharides, a process termed glycosylation, is estimated to affect more than half of all eukaryotic proteins 1,2. Asparagine-linked (N-linked) glycosylation is the most abundant type of glycosylation and affects numerous cellular processes, including protein folding, homeostasis, and trafficking 3-7. While originally believed to occur only in eukaryotes, N-linked glycosylation has now been observed in all domains of life 8. In eukaryotes, the N-glycosylation process is essential, as reflected by the well-conserved Glc3Man9GlcNAc2 glycan structure in animal, plant, and fungal species 1. In archaea and bacteria, N-glycosylation is not required for survival. These organisms employ much more diverse monosaccharides and linkages in their glycan structures 9, which appear to be optimized for specific purposes. For example, the N-glycan produced by the pathogenic bacterium Campylobacter jejuni is a heptasaccharide [GalNAc5(Glc)Bac, where Bac is bacillosamine or 2,4-diacetamido-2,4,6-trideoxyglucose] that helps mediate adherence to and invasion of host cells 10.

[0001] N-linked protein glycosylation minimally involves two distinct steps: synthesis of lipid-linked oligosaccharides (LLOs) and transfer of oligosaccharides from a lipid-phospho carrier (e.g., dolichol mono- or diphosphate in eukaryotes and archaea, and undecaprenol diphosphate in bacteria 11) to asparagine residues in acceptor proteins. This latter step is catalyzed by the oligosaccharyltransferase (OST). The eukaryotic OST is a multimeric protein complex with the STT3 protein serving as the central catalytic subunit 12, whereas archaeal and bacterial OSTs are single-subunit enzymes that bear homology to STT3 13,14. A hallmark of eukaryotic and archaeal OSTs is their broad acceptor site specificity, which permits glycosylation of Asn residues in the context of a very short consensus sequon (N-X-S/T; X ≠ P). Bacterial OSTs, on the other hand, recognize a more specific sequon that is extended by a negatively charged amino acid (Asp or Glu) in the -2 position relative to the Asn (D/E-X(-1)-N-X(+1)-S/T; X(-1), X(+1) ≠ P) 15. This so-called "minus two rule" was established based on studies of the C. jejuni OST PglB (CjPglB) and restricts bacterial glycosylation to a narrow set of polypeptides. A possible explanation for the minus two rule comes from the crystal structure of C. lari PglB (ClPglB; 56% identical to CjPglB) in which a salt bridge between R331 of the OST and the -2 Asp of a bound acceptor peptide was proposed to strengthen the PglB-peptide interaction. Since R331 is conserved in bacteria but not in eukaryotes or archaea, this residue may contribute to the more specific site selection by bacterial OSTs 13.
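The two sequon definitions above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name find_sequons and the labels "extended" and "short-only" are hypothetical, and the patterns encode nothing more than the N-X-S/T (X ≠ P) and D/E-X-N-X-S/T rules as stated in the text.

```python
import re

# Short sequon recognized by eukaryotic and archaeal OSTs: N-X-S/T, X != P.
# A lookahead is used so that overlapping sites are all reported.
SHORT_SEQUON = re.compile(r"(?=N[^P][ST])")
# Bacterial "minus two rule" sequon: D/E-X(-1)-N-X(+1)-S/T, X(-1), X(+1) != P.
EXTENDED_SEQUON = re.compile(r"[DE][^P]N[^P][ST]")


def find_sequons(seq):
    """Return (asn_index, kind) for every N-X-S/T site in seq (0-based index)."""
    hits = []
    for match in SHORT_SEQUON.finditer(seq):
        asn = match.start()
        # Five-residue window from position -2 to +2 relative to the Asn.
        window = seq[asn - 2:asn + 3] if asn >= 2 else ""
        kind = "extended" if EXTENDED_SEQUON.fullmatch(window) else "short-only"
        hits.append((asn, kind))
    return hits


if __name__ == "__main__":
    print(find_sequons("GGDQNATGK"))  # canonical bacterial site
    print(find_sequons("GGAQNATGK"))  # site lacking D/E at -2
```

Under this sketch, a site labeled "short-only" is one that a wild-type bacterial OST obeying the minus two rule would be expected to skip.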

[0002] A handful of genetic screens for N-linked glycosylation have been described for this purpose including ELISA-based detection of periplasmic glycoproteins 16,17, glycophage display 18,19, and cell surface display of glycoconjugates 20-22. All of these involve the use of glycoengineered Escherichia coli carrying the complete protein glycosylation (pgl) locus of C. jejuni 23; however, none have been used to engineer OST variants with improved or novel activities. While there may be several reasons for this, a potential limitation of some of these methods in identifying N-glycoproteins is that they can be confounded by the prevalence of glycan intermediates that have not been transferred to proteins (e.g., LLOs in bacterial cell membranes), increasing the likelihood of false-positive hits.

[0003] What is needed, therefore, is a method to test the capability of one or more OST variants to glycosylate a native site in a eukaryotic glycoprotein.

SUMMARY

[0007] The present invention provides methods and compositions for the recombinant expression of a modified oligosaccharyltransferase activity (EC 2.4.1.119). The methods and compositions further provide for the production of carbohydrates in prokaryotic host cells and attaching them as N-linked glycans to proteins. Various oligosaccharyltransferase enzymes are engineered to catalyze the transfer of desired glycans built on a lipid-linked precursor such as bactoprenol via various glycosyltransferase activities required to synthesize specified oligosaccharide structures.

[0008] In one embodiment, a polypeptide encoding an oligosaccharyltransferase activity comprises 310-VNQTIQEVENVDFSEFMRRISGSEIVF, wherein the polypeptide is engineered at amino acid position 327 and/or 328. In preferred embodiments, a C. jejuni oligosaccharyltransferase mutant is selected from R327D, R328L; R327N, R328L; R327L, R328Q; R327M, R328L; R328L; R327G, R328L; R327V, R328L; R328N; and R327P, R328V.
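The residue numbering in this embodiment can be checked with a minimal Python sketch. It assumes, as the "310-" prefix suggests, that the first residue of the quoted fragment is position 310; the helper apply_substitutions and the short variant names DL, NL and LQ (read here as the residues placed at positions 327 and 328) are illustrative assumptions, not definitions from this disclosure.

```python
# Fragment of C. jejuni PglB as quoted in the text; numbering assumed to start at 310.
FRAGMENT_START = 310
FRAGMENT = "VNQTIQEVENVDFSEFMRRISGSEIVF"

# Hypothetical shorthand for three of the listed double mutants.
VARIANTS = {
    "DL": ["R327D", "R328L"],
    "NL": ["R327N", "R328L"],
    "LQ": ["R327L", "R328Q"],
}


def apply_substitutions(fragment, start, substitutions):
    """Apply substitutions written like 'R327D' to a numbered fragment."""
    residues = list(fragment)
    for sub in substitutions:
        wild_type, position, new = sub[0], int(sub[1:-1]), sub[-1]
        index = position - start
        if not 0 <= index < len(residues):
            raise ValueError(f"{sub}: position outside fragment")
        if residues[index] != wild_type:
            raise ValueError(f"{sub}: fragment has {residues[index]} at {position}")
        residues[index] = new
    return "".join(residues)


if __name__ == "__main__":
    for name, subs in VARIANTS.items():
        print(name, apply_substitutions(FRAGMENT, FRAGMENT_START, subs))
```

The wild-type check in the helper confirms that, with this numbering, both positions 327 and 328 of the quoted fragment are arginine.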

[0009] In other embodiments, a polypeptide encoding an oligosaccharyltransferase activity comprising 313-NETIMEVNTIDPEVFMQRISSSVLVF is engineered at amino acid position 330 and/or 331.

[0010] In preferred aspects of the invention, a modified oligosaccharyltransferase enzyme catalyzes the transfer of a prokaryotic glycan, for example, GalNAc5(Glc)Bac, onto a sequon such as N-X-S/T lacking the canonical acidic residue (D/E) in the -2 position of the acceptor motif. Various bacterial O-antigens are suitable lipid-linked glycans for transfer onto a sequon.

[0011] The present invention also provides methods and compositions for the recombinant production of human or human-like glycans in recombinant host cells. In certain aspects, a method is provided for producing an oligosaccharide composition comprising: culturing a recombinant host cell to express a modified oligosaccharyltransferase activity (EC 2.4.1.119) that catalyzes the transfer of non-native or eukaryotic carbohydrate structures that are human or human-like glycans. Additional aspects provide for the expression of one or more glycosyltransferase activity comprising: mannosyltransferase (EC 2.4.1.131), N-acetylglucosaminyl transferase enzyme activity (EC 2.4.1.101; EC 2.4.1.143; EC 2.4.1.145; EC 2.4.1.155; EC 2.4.1.201); GalNAc transferase (EC 2.4.1.-); galactosyltransferase (EC 2.4.1.-); fucosyltransferase (EC 2.4.1.69); and sialyltransferase (EC 2.4.99.4, EC 2.4.99.-, EC 2.4.99.8).

[0012] While various eukaryotic and prokaryotic expression systems are used to express oligosaccharyltransferase activities to transfer glycans and produce glycoprotein compositions, a preferred expression system involves prokaryotic host cells.

[0013] Accordingly, the present invention demonstrates the catalytic transfer of a lipid-linked glycan onto a eukaryotic sequon comprising N-X-S/T through engineering an oligosaccharyltransferase enzyme.

BRIEF DESCRIPTION OF THE FIGURES

[0014] Figure 1. Specific detection of glycosylated proteins using the glycoSNAP assay. Western blot analyses with a GalNAc-specific lectin (SBA) and epitope-tag-specific antibody (anti-His) or Coomassie staining of membranes containing colony-secreted YebF variants. Images depicted here were derived from: glycosylation-competent colonies (left row, YebF 4xDQNAT + wt CjPglB), colonies expressing a catalytically inactive OST (middle row, YebF 4xDQNAT + CjPglB mut), or colonies expressing a target bearing a nonconsensus motif (right row, YebF 4xAQNAT + wt CjPglB). Results are representative examples of at least two biological replicates. See Fig. 10 for uncropped versions of the images.

[0015] Figure 2. Structure- and sequence-guided mutagenesis of PglB. (a) The ClPglB structure (pdb: 3RCE) shows a potential salt bridge between PglB R331 and the -2 Asp of a DQNAT acceptor peptide. ClPglB is shown as a surface representation, while the bound acceptor peptide is a stick representation. Shown at right is a homology model of CjPglB. Key residues in the peptide/protein binding cavity are labeled, and the red and dark blue coloring indicates oxygen and nitrogen atoms, respectively. The highly polar character of these residues is conserved between ClPglB and CjPglB. The protein backbone of PglB is colored blue for labeled residues and gray for the remainder of the structure. The acceptor peptide is depicted in yellow, with the target Asn residue in orange. (b) Sequence alignment of bacterial and eukaryotic OSTs. The regions of the OSTs homologous to ClPglB residues 313-339 were aligned using Clustal Omega. Shading indicates residue conservation. Bacterial conservation of R331 is shown in red. The residues of interest in this study are highlighted in yellow. (c) Homology models of CjPglB and DL, NL, and LQ mutants. Labels indicate the locations of the native Arg residues and the mutations isolated in this study. Coloring shows polar residues in the region of the mutations, as in (a), with the mutant protein backbones colored cyan.

[0016] Figure 3. PglB mutants exhibit relaxed substrate specificity. (a-d) Western blots of acceptor protein scFv13-R4 XQNAT, where X is one of the 20 amino acids indicated across the top, co-expressed with each of the CjPglB variants as indicated. The slower migrating band on anti-His immunoblots is the glycosylated form of scFv13-R4 XQNAT, confirmed by the anti-glycan immunoblots. Molecular weight (MW) markers are indicated on the left. Blots are representative examples of at least two biological replicates. See Fig. 11 for uncropped versions of the images. (e-h) Sequence logos showing experimentally determined substrate specificities of the indicated CjPglB variants from glycoSNAP YebF N24L/XXNXT library screening.

[0017] Figure 4. Glycosylation of a native eukaryotic protein by PglB variants. Western blot analysis of bovine pancreatic RNaseA, with either an S32D substitution in the -2 position of its sequon (a) or its native sequon (b), expressed with each of the CjPglB variants as indicated or with the catalytically inactive CjPglB D54N/E316Q (mut). Molecular weight (MW) markers are indicated on the left. The g0 and g1 labels on the right denote the aglycosylated and glycosylated forms of RNaseA, respectively. Blots are representative examples of at least two biological replicates. See Fig. 12 for uncropped versions of the images.

[0018] Figure 5. The glycoSNAP assay. Schematic of glycoSNAP assay for reporting N- glycosylation of secreted acceptor proteins in glycoengineered E. coli. Colonies carrying plasmids that encode N-glycosylation machinery and E. coli YebF modified with an acceptor sequon (e.g., YebF4x DQNAT ) were replicated on a filter, and protein expression was induced when the filter was overlaid on a plate layered with a nitrocellulose membrane. YebF that was secreted and glycosylated was specifically detected by Western blot and correlated to glycosylation-competent colonies.

[0019] Figure 6. Characterization of YebF glycosylation by isolated OST variants. (a) Western blot analysis of YebF4xAQNAT glycosylation by wt CjPglB or DL, NL, and LQ mutants isolated using glycoSNAP. Glycosylation of YebF4xDQNAT by wt CjPglB is shown for comparison. Shorter exposure of YebF4xDQNAT glycosylation clearly shows doubly through quadruply glycosylated YebF (far left panel). The g0 through g5 labels denote aglycosylated and singly through quintuply glycosylated forms of YebF. Results are representative of at least three biological replicates. (b) Western blot analysis to confirm glycosylation of residue N24 in YebF by CjPglB DL variant. The g0 through g2 labels denote aglycosylated, singly, and doubly glycosylated forms of YebF. Results are representative of at least two biological replicates. (c) Amino acid sequence of native E. coli YebF. Arrow indicates signal peptide cleavage site for processing to mature protein. The underlined Asn indicates the N-glycosylated residue in this study. (d) Glycosylation efficiency of CjPglB variants assessed using YebFN24 with single DQNAT or AQNAT acceptor site. Catalytically inactive CjPglB D54N/E316Q (mut) is shown for an aglycosylated YebF control. The g0 and g1 labels denote aglycosylated and singly glycosylated forms of YebF, respectively. Samples were from four-hour inductions and representative of at least three biological replicates. The bottom blot corresponds to membrane fractions prepared from osmotically lysed spheroplasts from the same cultures as the blots above. Note that both aglycosylated and glycosylated YebFN24L/AQNAT (g0(A) and g1(A), respectively) migrate faster than aglycosylated and glycosylated YebFN24L/DQNAT (g0(D) and g1(D), respectively). (e) SDS-PAGE analysis of YebFN24L/DQNAT or YebFN24L/AQNAT purified from culture supernatants of the same cells in (d). The g0 through g1 labels denote aglycosylated and singly glycosylated forms of YebF. Molecular weight (MW) markers are indicated at left of all Western blots and the SDS-PAGE gel.

[0020] Figure 7. Mass spectrometry analysis of nonconsensus glycosylation. (a) Ni-NTA-purified scFv13-R4AQNAT samples used in MS analysis, stained with Coomassie Brilliant Blue G-250. Glycosylated bands, indicated by g1 arrow, were excised and submitted for MS analysis. MS/MS spectrum of the triply-charged precursor ion [m/z 1189.01 for CjPglB DL variant (b); m/z 1189.03 for CjPglB NL variant (c), and m/z 1189.08 for CjPglB LQ variant (d)], identifying the glycopeptide and a 1405.56 Da glycan with bacillosamine as the innermost saccharide attached to the N273 site (shown in red) in scFv13-R4AQNAT. A series of y-ions covering from y1 to y15 was observed with the complete knockout of glycan molecule, leading to the confident identification of tryptic peptide 256-LISEEDLDGAALEGGAQNATGK-277, in which N263 residue was found to be deamidated to Asp (shown in green), consistent with commonly observed deamidation of Asn residues that are followed by Gly. A second series of y-ions with the added mass of 228.11 Da at N273 site was also found covering from y9/Y1 to y17/Y1, providing direct evidence for bacillosamine as the innermost saccharide (Y1) attached to N273 site. This result is also consistent with the previous observation that a relatively tight bond exists for Y1-peptide compared to the fragile internal glycan bonds. (e) Representative MS/MS spectrum (result from CjPglB DL variant is shown) for the quadruply-charged precursor (m/z 892.05) with low collision energy (CE = 29 eV) applied. A complete series of Y-type ions (from Y1 to Y6) attached to the core peptide reveals the expected C. jejuni heptasaccharide glycan.

[0021] Figure 8. Glycosylation of glycoSNAP-isolated YebF N24L/XXNXT variants. Western blot analysis of the most efficiently glycosylated YebF N24L/XXNXT targets for each CjPglB variant (DL, NL, or LQ) compared to glycosylation of YebF N24L/AQNAT. Molecular weight (MW) markers are indicated at left. The g0 and g1 labels denote aglycosylated and singly glycosylated forms of YebF, respectively. Results are representative of at least three biological replicates.

[0022] Figure 9. Substrate specificity is transferable between bacterial OSTs. Western blot analysis of YebF N24L/DQNAT and YebF N24L/AQNAT glycosylation by wt ClPglB and DL variant. Molecular weight (MW) markers are indicated at left. The g0 and g1 labels denote aglycosylated and singly glycosylated YebF, respectively. Note that glycosylated YebFN24L/AQNAT (g1(A)) migrates faster than glycosylated YebFN24L/DQNAT (g1(D)). Results are representative of two biological replicates.

[0023] Figure 10. Uncropped images of Figure 1 blots and Coomassie-stained membranes. Each circle was approximately 90 mm in diameter (cut to fit a standard 100 mm petri dish).

[0024] Figure 11. Uncropped images of Figure 3 immunoblots.

[0025] Figure 12. Uncropped images of Figure 4 immunoblots.

[0026] Figure 13. Schematic of modified oligosaccharyltransferases produced using the glycoSNAP assay.
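The glycan mass quoted in the description of Figure 7 can be reproduced with simple arithmetic, sketched here in Python. The monoisotopic residue masses for HexNAc and hexose and the proton mass are standard values that do not appear in this disclosure and are assumptions of the sketch; the bacillosamine residue mass is taken as the 228.11 Da stated in the text (given here to more digits), and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da (sugar minus one water per glycosidic bond).
# HexNAc and Hex are standard values assumed for this sketch; diNAcBac is the
# 228.11 Da bacillosamine increment reported in the text.
RESIDUE_MASS = {
    "HexNAc": 203.07937,    # GalNAc residue
    "Hex": 162.05282,       # Glc residue
    "diNAcBac": 228.11101,  # 2,4-diacetamido-2,4,6-trideoxyglucose residue
}
PROTON_MASS = 1.007276

# C. jejuni heptasaccharide GalNAc5(Glc)Bac.
HEPTASACCHARIDE = {"HexNAc": 5, "Hex": 1, "diNAcBac": 1}


def glycan_residue_mass(composition):
    """Sum of residue masses for a glycan attached to a peptide."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def neutral_mass(mz, charge):
    """Neutral mass of a positive ion observed at m/z with the given charge."""
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    print(round(glycan_residue_mass(HEPTASACCHARIDE), 2))
    print(round(neutral_mass(1189.01, 3), 3))
```

Summing five GalNAc, one Glc and one bacillosamine residue gives the 1405.56 Da glycan mass reported for the glycopeptide, and the second function converts the reported triply charged precursor m/z into a neutral mass.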

DETAILED DESCRIPTION OF THE INVENTION

[0027] Definitions

[0028] The following definitions of terms and methods are provided to better describe the present disclosure and to guide those of ordinary skill in the art in the practice of the present disclosure.

[0029] All publications, patents and other references mentioned herein are hereby incorporated by reference in their entireties and for all purposes.

[0030] EC numbers are established by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) (available at http://www.chem.qmul.ac.uk/iubmb/enzyme/). The EC numbers referenced herein are derived from the KEGG Ligand database, maintained by the Kyoto Encyclopedia of Genes and Genomes, sponsored in part by the University of Tokyo. Unless otherwise indicated, the EC numbers are as provided in the database as of March 2013.

[0031] The accession numbers referenced herein are derived from the NCBI database (National Center for Biotechnology Information) maintained by the National Institutes of Health, U.S.A. Unless otherwise indicated, the accession numbers are as provided in the database as of March 2013.

[0032] The methods and techniques of the present invention are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification unless otherwise indicated. See, e.g., Sambrook et al, Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989); Ausubel et al, Current Protocols in Molecular Biology, Greene Publishing Associates (1992, and Supplements to 2002); Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1990); Taylor and Drickamer, Introduction to Glycobiology, Oxford Univ. Press (2003); Worthington Enzyme Manual, Worthington Biochemical Corp., Freehold, N.J.; Handbook of Biochemistry: Section A Proteins, Vol I, CRC Press (1976); Handbook of Biochemistry: Section A Proteins, Vol II, CRC Press (1976); Essentials of Glycobiology, Cold Spring Harbor Laboratory Press (1999).

[0033] Unless explained otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. The materials, methods, and examples are illustrative only and not intended to be limiting. Other features of the disclosure are apparent from the following detailed description and the claims.

[0034] The term "claim" in the provisional application is synonymous with embodiments or preferred embodiments.

[0035] As used herein, "comprising" means "including" and the singular forms "a" or "an" or "the" include plural references unless the context clearly dictates otherwise. For example, reference to "comprising a cell" includes one or a plurality of such cells. The term "or" refers to a single element of stated alternative elements or a combination of two or more elements, unless the context clearly indicates otherwise.

[0036] The term "human-like" with respect to glycoproteins refers to proteins having an attached N-acetylglucosamine (GlcNAc) residue linked to the amide nitrogen of an asparagine residue (N-linked) in the protein, that is similar or even identical to those produced in humans.

[0037] "N-glycans" or "N-linked glycans" refer to N-linked oligosaccharide structures. The N-glycans can be attached to proteins or synthetic glycoprotein intermediates, which can be manipulated further in vitro or in vivo. The predominant sugars found on glycoproteins are glucose (Glc), galactose (Gal), mannose (Man), fucose (Fuc), N-acetylgalactosamine (GalNAc), N-acetylglucosamine (GlcNAc), and sialic acid (e.g., N-acetyl-neuraminic acid (NeuAc or NANA)). Hexose (Hex) may also be found. N-glycans differ with respect to the number of branches ("antennae" or "arms") comprising peripheral sugars (e.g., GlcNAc, galactose, fucose and sialic acid) that are added to the "trimannosyl core". The term "trimannosyl core", also referred to as "M3", "M3GN2", the "trimannose core", the "pentasaccharide core" or the "paucimannose core", reflects the Man3GlcNAc2 oligosaccharide structure where the Manα1,3 arm and the Manα1,6 arm extend from the di-GlcNAc structure (GlcNAc2): β1,4GlcNAc-β1,4GlcNAc. N-glycans are classified according to their branched constituents (e.g., high-mannose, complex or hybrid).

[0038] A "high-mannose" type N-glycan comprises four or more mannose residues on the di-GlcNAc oligosaccharide structure. "M4" reflects Man4GlcNAc2. "M5" reflects Man5GlcNAc2.

[0039] A "hybrid" type N-glycan has at least one GlcNAc residue on the terminal end of the α1,3 mannose (Manα1,3) arm of the trimannose core and zero or more mannoses on the α1,6 mannose (Manα1,6) arm of the trimannose core. The various N-glycans are also referred to as "glycoforms". An example of a hybrid glycan is "GNM3GN2", which is GlcNAcMan3GlcNAc2.

[0040] A "complex" type N-glycan typically has at least one GlcNAc residue attached to the Manα1,3 arm and at least one GlcNAc attached to the Manα1,6 arm of the trimannose core. Complex N-glycans may also have galactose or N-acetylgalactosamine residues that are optionally modified with sialic acid or derivatives (e.g., "Neu" refers to neuraminic acid and "Ac" refers to acetyl). Complex N-glycans may also have intrachain substitutions comprising "bisecting" GlcNAc and core fucose. Complex N-glycans may also have multiple antennae on the trimannose core, often referred to as "multiple antennary glycans" or also termed "multi-branched glycans," which can be tri-antennary, tetra-antennary or penta-antennary glycans.

[0041] As used herein, the term "predominantly" or variations such as "the predominant" or "which is predominant" will be understood to mean the glycan species as measured that has the highest mole percent (%) of total N-glycans after the glycoprotein has been removed (e.g., treated with PNGase and the glycans released) and are analyzed by mass spectroscopy, for example, MALDI-TOF MS. In other words, the phrase "predominantly" is defined as an individual entity, such as a specific glycoform, present in greater mole percent than any other individual entity. For example, if a composition consists of species A in 40 mole percent, species B in 35 mole percent and species C in 25 mole percent, the composition comprises predominantly species A. The terms "enriched", "uniform", "homogenous" and "consisting essentially of" are also synonymous with predominant in reference to the glycans.
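The definition of "predominantly" amounts to a strict-maximum rule, which the following Python sketch makes explicit. The function name predominant_species is hypothetical, and returning None when two species tie for the highest mole percent is an interpretation of "greater mole percent than any other individual entity", not something stated in the text.

```python
def predominant_species(mole_percent):
    """Return the species with a strictly higher mole percent than all others.

    mole_percent maps species name -> mole percent of total N-glycans.
    Returns None for an empty composition or when the top value is tied.
    """
    if not mole_percent:
        return None
    ranked = sorted(mole_percent.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no single species exceeds every other species
    return ranked[0][0]


if __name__ == "__main__":
    # Worked example from the text: A 40%, B 35%, C 25%.
    print(predominant_species({"A": 40, "B": 35, "C": 25}))
```

Note that under this definition a predominant species need not exceed 50 mole percent, as the 40/35/25 example shows.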

[0042] The mole % of N-glycans as measured by MALDI-TOF-MS in positive mode refers to mole % saccharide transfer with respect to mole % total N-glycans. Certain cation adducts such as K+ and Na+ are normally associated with the peaks eluted increasing the mass of the N-glycans by the molecular mass of the respective adducts.
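The mass shift caused by cation adducts in positive mode can be sketched as follows. The ion masses are standard monoisotopic values (atomic mass minus one electron) and are assumptions of this sketch rather than figures from the disclosure; the function name and the example neutral mass are hypothetical.

```python
# Monoisotopic masses (Da) of singly charged adduct ions.
ADDUCT_ION_MASS = {
    "H": 1.007276,    # [M+H]+
    "Na": 22.989218,  # [M+Na]+
    "K": 38.963158,   # [M+K]+
}


def singly_charged_mz(neutral_mass, adduct):
    """m/z of a singly charged ion formed by adding one adduct cation."""
    return neutral_mass + ADDUCT_ION_MASS[adduct]


if __name__ == "__main__":
    example_mass = 1000.0  # hypothetical neutral glycan mass in Da
    for adduct in ("H", "Na", "K"):
        print(adduct, round(singly_charged_mz(example_mass, adduct), 4))
```

Peaks from the same glycan observed as sodium and potassium adducts are therefore separated by the difference between the two ion masses, about 15.97 Da.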

[0043] Unless otherwise indicated, and as an example for all sequences described herein under the general format "SEQ ID NO:", "nucleic acid comprising SEQ ID NO: 1" refers to a nucleic acid, at least a portion of which has either (i) the sequence of SEQ ID NO: 1, or (ii) a sequence complementary to SEQ ID NO: 1. The choice between the two is dictated by the context. For instance, if the nucleic acid is used as a probe, the choice between the two is dictated by the requirement that the probe be complementary to the desired target.

[0044] An "isolated" or "substantially pure" nucleic acid or polynucleotide (e.g., RNA, DNA, or a mixed polymer) or glycoprotein is one which is substantially separated from other cellular components that naturally accompany the native polynucleotide in its natural host cell, e.g., ribosomes, polymerases and genomic sequences with which it is naturally associated. The term embraces a nucleic acid, polynucleotide that (1) has been removed from its naturally occurring environment, (2) is not associated with all or a portion of a polynucleotide in which the "isolated polynucleotide" is found in nature, (3) is operatively linked to a polynucleotide which it is not linked to in nature, or (4) does not occur in nature. The term "isolated" or "substantially pure" also can be used in reference to recombinant or cloned DNA isolates, chemically synthesized polynucleotide analogs, or polynucleotide analogs that are biologically synthesized by heterologous systems.

[0045] However, "isolated" does not necessarily require that the nucleic acid, polynucleotide or glycoprotein so described has itself been physically removed from its native environment. For instance, an endogenous nucleic acid sequence in the genome of an organism is deemed "isolated" if a heterologous sequence is placed adjacent to the endogenous nucleic acid sequence, such that the expression of this endogenous nucleic acid sequence is altered. In this context, a heterologous sequence is a sequence that is not naturally adjacent to the endogenous nucleic acid sequence, whether or not the heterologous sequence is itself endogenous (originating from the same host cell or progeny thereof) or exogenous (originating from a different host cell or progeny thereof). By way of example, a promoter sequence can be substituted (e.g., by homologous recombination) for the native promoter of a gene in the genome of a host cell, such that this gene has an altered expression pattern. This gene would now become "isolated" because it is separated from at least some of the sequences that naturally flank it.

[0046] A nucleic acid is also considered "isolated" if it contains any modifications that do not naturally occur to the corresponding nucleic acid in a genome. For instance, an endogenous coding sequence is considered "isolated" if it contains an insertion, deletion, or a point mutation introduced artificially, e.g., by human intervention. An "isolated nucleic acid" also includes a nucleic acid integrated into a host cell chromosome at a heterologous site and a nucleic acid construct present as an episome. Moreover, an "isolated nucleic acid" can be substantially free of other cellular material or substantially free of culture medium when produced by recombinant techniques or substantially free of chemical precursors or other chemicals when chemically synthesized.

[0047] Modified Oligosaccharyltransferase

[0048] Various oligosaccharyltransferases ("OSTs") that are useful in transferring glycans onto noncanonical acceptor sequences are expressed. The present invention discloses modified oligosaccharyltransferase activity (EC 2.4.1.119) that demonstrates relaxed specificity, wherein the oligosaccharyltransferase is capable of transferring a glycan onto an N-X-S/T motif on a protein. In further embodiments, the oligosaccharyltransferase enzyme activity (EC 2.4.1.119) is characterized as capable of glycosylating a native site in a eukaryotic glycoprotein. In preferred embodiments, the oligosaccharyltransferase transfers one or more of high-mannose, hybrid and complex glycans onto a protein comprising the N-X-S/T motif. In other embodiments, the oligosaccharyltransferase is expressed in a prokaryotic host cell comprising at least one other glycosyltransferase activity.

[0049] In certain embodiments, one or more modified OST variants exhibit transfer of glycans on one or more N-linked sites on a protein including but not limited to AQNAT, NQNAT, QQNAT, HQNAT, GQNAT, TQNAT and WQNAT.
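The distinction between the extended bacterial sequon (an acidic residue at the -2 position) and the minimal N-X-S/T sequon can be illustrated with a short pattern-matching sketch. This is only an illustration: the function name, the five-residue window convention (positions -2, -1, N, +1, +2) and the category labels are assumptions made here, not part of the disclosure.

```python
import re

# Five-residue window: positions -2, -1, N, +1, +2 relative to the asparagine.
# Hypothetical helper patterns written for this illustration only.
MINIMAL = re.compile(r"^..N[^P][ST]$")       # N-X-S/T, X is not proline
BACTERIAL = re.compile(r"^[DE].N[^P][ST]$")  # D/E at -2, then N-X-S/T


def classify(window: str) -> str:
    """Label a five-residue acceptor window by the sequon rule it satisfies."""
    window = window.upper()
    if BACTERIAL.match(window):
        return "canonical bacterial (D/E at -2)"
    if MINIMAL.match(window):
        return "minimal N-X-S/T only"
    return "no sequon"


# Sites named in the text, plus one proline-containing negative control.
for site in ["DQNAT", "AQNAT", "NQNAT", "WQNAT", "ANNET", "SRNLT", "AQNPT"]:
    print(f"{site}: {classify(site)}")
```

Under this convention, the sites listed in [0049] all fall in the "minimal N-X-S/T only" category, which is the class of sites that the wild-type bacterial enzyme glycosylates poorly.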

[0050] glycoSNAP Assay

[0051] To shed light on the sequence determinants governing the more stringent specificity of bacterial OSTs, we sought to isolate CjPglB variants capable of transferring glycans to short eukaryotic N-X-S/T sequons. We hypothesized that such variants could be isolated by laboratory evolution using a reporter assay that generates a genotype-glycophenotype linkage.

[0004] To directly detect N-linked glycoproteins produced in E. coli, we developed a versatile, high-throughput colony blotting assay based on glycosylation of YebF, a small (10 kDa in its native form) protein that is secreted into the extracellular medium 24. This assay, which we named glycoSNAP (glycosylation of secreted N-linked acceptor proteins), effectively separates glycosylated YebF proteins from their producing cells and any fOS or membrane-associated LLOs. Using this method, a combinatorial library of CjPglB variants was screened and a total of 26 unique variants were isolated based on their ability to conjugate glycans to eukaryotic N-X-S/T acceptor sites appended to the C-terminus of YebF. The glycoSNAP assay was subsequently applied to experimentally identify sequons that could be tolerated by the three most active CjPglB variants. As expected, the relaxed OSTs glycosylated an array of noncanonical acceptor sequences, exhibiting site selection that was reminiscent of eukaryotic OSTs. In fact, each of the relaxed CjPglB variants was capable of glycosylating a native site in a eukaryotic glycoprotein. Hence, glycoSNAP not only permitted the discovery of amino acids that govern OST acceptor site specificity but also yielded a set of more flexible N-glycosylation biocatalysts for use in glycoengineering applications.

[0052] Accordingly, provided herein are methods to screen a combinatorial library of oligosaccharyltransferase variants using a high-throughput colony-blotting assay. The method provides for secreting an N-linked protein into the extracellular medium and separating the glycosylated N-linked protein from cells and from free oligosaccharides or membrane-associated lipid-linked oligosaccharides (LLOs).

[0053] Host Cells

[0054] In accordance with the present invention, the host cell is a prokaryote. Such cells serve as a host for expression of recombinant proteins for production of recombinant therapeutic proteins of interest. Exemplary host cells include E. coli and other Enterobacteriaceae, Escherichia sp., Campylobacter sp., Wolinella sp., Desulfovibrio sp., Vibrio sp., Pseudomonas sp., Bacillus sp., Listeria sp., Staphylococcus sp., Streptococcus sp., Peptostreptococcus sp., Megasphaera sp., Pectinatus sp., Selenomonas sp., Zymophilus sp., Actinomyces sp., Arthrobacter sp., Frankia sp., Micromonospora sp., Nocardia sp., Propionibacterium sp., Streptomyces sp., Lactobacillus sp., Lactococcus sp., Leuconostoc sp., Pediococcus sp., Acetobacterium sp., Eubacterium sp., Heliobacterium sp., Heliospirillum sp., Sporomusa sp., Spiroplasma sp., Ureaplasma sp., Erysipelothrix sp., Corynebacterium sp., Enterococcus sp., Clostridium sp., Mycoplasma sp., Mycobacterium sp., Actinobacteria sp., Salmonella sp., Shigella sp., Moraxella sp., Helicobacter sp., Stenotrophomonas sp., Micrococcus sp., Neisseria sp., Bdellovibrio sp., Hemophilus sp., Klebsiella sp., Proteus mirabilis, Enterobacter cloacae, Serratia sp., Citrobacter sp., Proteus sp., Serratia sp., Yersinia sp., Acinetobacter sp., Actinobacillus sp., Bordetella sp., Brucella sp., Capnocytophaga sp., Cardiobacterium sp., Eikenella sp., Francisella sp., Haemophilus sp., Kingella sp., Pasteurella sp., Flavobacterium sp., Xanthomonas sp., Burkholderia sp., Aeromonas sp., Plesiomonas sp., Legionella sp. and alpha-proteobacteria such as Wolbachia sp., cyanobacteria, spirochaetes, green sulfur and green non-sulfur bacteria, Gram-negative cocci, Gram-negative bacilli which are fastidious, Enterobacteriaceae - glucose-fermenting Gram-negative bacilli, Gram-negative bacilli - non-glucose fermenters, Gram-negative bacilli - glucose fermenting, oxidase positive.

[0055] In one embodiment of the present invention, the OST is expressed in E. coli host strain C41(DE3), because this strain has been previously optimized for general membrane protein overexpression (Miroux et al., "Over-production of Proteins in Escherichia coli: Mutant Hosts That Allow Synthesis of Some Membrane Proteins and Globular Proteins at High Levels," J Mol Biol 260:289-298 (1996), which is hereby incorporated by reference in its entirety). Further optimization of the host strain includes deletion of the gene encoding the DnaJ protein (e.g., ΔdnaJ cells). The reason for this deletion is that inactivation of dnaJ is known to increase the accumulation of overexpressed membrane proteins and to suppress the severe cytotoxicity commonly associated with membrane protein overexpression (Skretas et al., "Genetic Analysis of G Protein-coupled Receptor Expression in Escherichia coli: Inhibitory Role of DnaJ on the Membrane Integration of the Human Central Cannabinoid Receptor," Biotechnol Bioeng (2008), which is hereby incorporated by reference in its entirety). Applicants have observed this following expression of Alg1 and Alg2. Furthermore, deletion of competing sugar biosynthesis reactions may be required to ensure optimal levels of N-glycan biosynthesis. For instance, the deletion of genes in the E. coli O antigen biosynthesis pathway (Feldman et al., "The Activity of a Putative Polyisoprenol-linked Sugar Translocase (Wzx) Involved in Escherichia coli O Antigen Assembly is Independent of the Chemical Structure of the O Repeat," J Biol Chem 274:35129-35138 (1999), which is hereby incorporated by reference in its entirety) will ensure that the bactoprenol-GlcNAc-PP substrate is available for other reactions. To eliminate unwanted side reactions, the following are representative genes that may be deleted from the E. coli host strain: wbbL, glcT, glf, gafT, wzx, wzy, waaL, nanA, wcaJ.

[0056] Methods for transforming/transfecting host cells with expression vectors are well-known in the art and depend on the host system selected, as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989). For eukaryotic cells, suitable techniques may include calcium phosphate transfection, DEAE-Dextran, electroporation, liposome-mediated transfection and transduction using retrovirus or other virus, e.g. vaccinia or, for insect cells, baculovirus. For bacterial cells, suitable techniques may include calcium chloride transformation, electroporation, and transfection using bacteriophage.

[0057] Target Glycoproteins

[0058] Various examples of suitable target glycoproteins may be produced according to the invention, which include without limitation: cytokines such as interferons, G-CSF, coagulation factors such as factor VIII, factor IX, and human protein C, soluble IgE receptor α-chain, IgG, IgG fragments, IgM, interleukins, urokinase, chymase, and urea trypsin inhibitor, IGF-binding protein, epidermal growth factor, growth hormone-releasing factor, annexin V fusion protein, angiostatin, vascular endothelial growth factor-2, myeloid progenitor inhibitory factor-1, osteoprotegerin, α-1 antitrypsin, DNase II, α-fetoproteins, AAT, rhTBP-1 (aka TNF binding protein 1), TACI-Ig (transmembrane activator and calcium modulator and cyclophilin ligand interactor), FSH (follicle stimulating hormone), GM-CSF, glucagon, glucagon peptides, GLP-1 w/ and w/o FC (glucagon like protein 1), GLP-1 receptor agonist e.g., exenatide, direct thrombin inhibitor e.g., bivalirudin, IGF-1 e.g., mecasermin, parathyroid hormone e.g., teriparatide, plasma kallikrein inhibitor e.g., ecallantide, IL-1 receptor agonist, sTNFr (aka soluble TNF receptor Fc fusion), CTLA4-Ig (Cytotoxic T Lymphocyte associated Antigen 4-Ig), receptors, hormones such as human growth hormone, erythropoietin, peptides, stapled peptides, human vaccines, animal vaccines, serum albumin and enzymes such as ATIII, rhThrombin, glucocerebrosidase and asparaginase.

[0059] Suitable target glycoproteins also include antibodies, fragments thereof and, more specifically, Fab regions, such as adalimumab, atorolimumab, fresolimumab, golimumab, lerdelimumab, metelimumab, morolimumab, sifalimumab, ipilimumab, tremelimumab, bertilimumab, briakinumab, canakinumab, fezakinumab, ustekinumab, adecatumumab, belimumab, cixutumumab, conatumumab, figitumumab, intetumumab, iratumumab, lexatumumab, lucatumumab, mapatumumab, necitumumab, ofatumumab, panitumumab, pritumumab, rilotumumab, robatumumab, votumumab, zalutumumab, zanolimumab, denosumab, stamulumab, efungumab, exbivirumab, foravirumab, libivirumab, rafivirumab, regavirumab, sevirumab, tuvirumab, nebacumab, panobacumab, raxibacumab, ramucirumab, gantenerumab.

[0060] One aspect of the present invention is directed to a glycoprotein conjugate comprising a glycan on the N-X-S/T motif of the protein, the glycan transferred onto the N-linked site via the modified oligosaccharyltransferase, wherein N is asparagine, X is any amino acid other than proline, S is serine and T is threonine.

[0061] Results

[0062] A secreted reporter for E. coli N-linked glycosylation. YebF modified at its C-terminus with a glycosylation tag consisting of four tandem repeats of an optimal glycosylation sequon (YebF 4xDQNAT) is glycosylated and accumulates in the extracellular medium of E. coli cells harboring plasmids encoding E. coli YebF and the C. jejuni pgl locus 22. Here, we leveraged secretion of glycosylated YebF 4xDQNAT with a colony blotting method to create a genetic screen named glycoSNAP (Fig. 5). Specifically, colonies replicated onto a filter membrane were induced to secrete YebF 4xDQNAT, which subsequently diffused away from filter-bound cells and bound to a second nitrocellulose membrane layer. Lectin- or immuno-blotting of the nitrocellulose membrane was then used to detect the presence of glycosylated YebF 4xDQNAT. Positive signals on nitrocellulose correlated to specific glycosylation-competent colonies, which were preserved on the initial filter membrane for further analysis as needed.

[0063] Colonies carrying plasmids that encode N-glycosylation machinery and E. coli YebF modified with an acceptor sequon (e.g., YebF 4xDQNAT) were replicated on a filter, and protein expression was induced when the filter was overlaid on a plate layered with a nitrocellulose membrane. YebF that was secreted and glycosylated was specifically detected by Western blot and correlated to glycosylation-competent colonies.

[0064]

[0005] To determine whether this method could reliably identify colonies producing glycoproteins, we transformed E. coli strain CLM24 with three plasmids: pMW07-pglΔB, which encodes the entire pgl pathway except for CjPglB; pMAF10, which encodes CjPglB 25; and pTrc99-YebF-GT-6x-His, which encodes YebF 4xDQNAT 22. When nitrocellulose membranes generated using these cells were blotted with soybean agglutinin (SBA) lectin that binds terminal GalNAc residues in the C. jejuni glycan 26, a clear signal corresponding to glycosylation-competent colonies was observed (Fig. 1). Probing the membrane with antibodies specific for the 6x-His tag present at the C-terminus of YebF 4xDQNAT confirmed that the SBA-reactive spots coincided with secreted YebF 4xDQNAT proteins (Fig. 1). When CjPglB was rendered inactive by mutation of two residues in the catalytic pocket, namely D54N and E316Q 13, no glycan-specific signal from SBA blotting was detected (Fig. 1). This lack of signal was attributed to absence of protein glycosylation because YebF 4xDQNAT secretion was still detected by anti-6x-His antibodies. Glycosylation was similarly abolished in cells co-expressing wt CjPglB with YebF lacking the canonical acidic residue in the -2 position of the acceptor motif (YebF 4xAQNAT). Longer exposures revealed a very faint glycan-specific SBA signal, while YebF secretion levels remained essentially the same (Fig. 1). The weak signal was later attributed to a very low level of nonconsensus glycosylation by wt CjPglB in this system (see below) and thus was still N-glycoprotein dependent. Taken together, these data confirmed the ability of our assay to reliably detect glycosylation-competent colonies and recapitulated the known acceptor site specificity of the bacterial OST.

[0065] Structure-guided laboratory evolution of OST specificity. Given the observation of a salt bridge between R331 of ClPglB and the -2 Asp of a bound acceptor peptide 13 (Fig. 2a), we hypothesized that the minus two rule may be a consequence of this PglB-peptide interaction. To test this hypothesis, we used the glycoSNAP assay to isolate CjPglB variants capable of efficiently glycosylating a minimal N-X-S/T acceptor motif. This first involved creation of a focused combinatorial library of OST variants. CjPglB shares 56% identity with ClPglB 13 and alignment of the two sequences revealed that R331 in ClPglB corresponds to R328 in CjPglB (Fig. 2b). However, CjPglB also has a second Arg immediately preceding R328 that is not conserved in ClPglB. A homology model generated for CjPglB showed R327 was prominently positioned amongst a conserved cluster of strongly polar residues lining the entrance of the peptide/protein binding cavity (Fig. 2a). Therefore, we chose to mutate both R327 and R328 in our focused CjPglB library. Codons for R327 and R328 were randomized by PCR using degenerate NNK primers, and the resulting 4.5x10^5-member library was screened for CjPglB variants capable of efficiently glycosylating YebF 4xAQNAT. A total of 26 unique hits were isolated, and their glycosylation activity was confirmed by immunoblotting (Fig. 6a, for example). Densitometry was then used for the relative comparison of glycosylation efficiency of the four AQNAT sites by each positive hit. The 9 most efficient OSTs with respect to the amount of glycosylated YebF 4xAQNAT are given in Table 1. For comparison, glycosylation of YebF 4xDQNAT by wild-type (wt) CjPglB resulted in the majority of the proteins appearing in either a triply or quadruply glycosylated form, while YebF 4xAQNAT appeared primarily aglycosylated in the presence of wt CjPglB. It should be noted that a small amount of mono- and diglycosylated protein was detected under the conditions tested here, which to our knowledge is the first reported instance of nonconsensus glycosylation by wt CjPglB. Nonetheless, the efficiency of this glycosylation was very low compared to glycosylation of DQNAT sites by wt CjPglB or AQNAT sites by the isolated mutants. Importantly, this low level of nonconsensus glycosylation by wt CjPglB corresponded to a very faint signal in the glycoSNAP assay that was barely above background (Fig. 1) and thus was never isolated in our screening efforts. The most common substitution uncovered by our screen was R328 substituted with Leu (isolated 9 times) or Gln (isolated 4 times). None of the 26 hits retained Arg in position 328, whereas 4 hits - RL, RM, RN, and RP - retained Arg in position 327, highlighting the importance of R328 in restricting the specificity of wt CjPglB. The DL, NL, and LQ variants each produced the greatest percentage of doubly glycosylated or greater YebF 4xAQNAT (Table 1) and thus were chosen for further analysis. Homology modeling of each revealed a more open peptide/protein binding pocket (Fig. 2c), which could potentially provide greater accessibility to acceptor peptide entry into the catalytic pocket.
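The theoretical diversity of a two-codon NNK library can be checked by direct enumeration. The sketch below assumes the standard genetic code and the usual IUPAC meaning of the degenerate bases (N = A/C/G/T, K = G/T); the variable names are illustrative and nothing here is taken from the actual primers used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[i] for i, (a, b, c) in enumerate(product(BASES, repeat=3))
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [a + b + c for a, b, c in product(BASES, BASES, "GT")]
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
stop_codons = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]

n_positions = 2  # R327 and R328 were randomized together
dna_variants = len(nnk_codons) ** n_positions
protein_variants = len(encoded - {"*"}) ** n_positions

print("NNK codons per position:", len(nnk_codons))
print("amino acids encoded:", len(encoded - {"*"}))
print("stop codons:", stop_codons)
print("DNA variants for two positions:", dna_variants)
print("amino acid pairs for two positions:", protein_variants)
```

By this count a two-position NNK library has 1,024 possible DNA sequences encoding 400 amino acid pairs, so a library of roughly 4.5x10^5 members samples each sequence many times over.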

[0066] Table 1. Extent of YebF 4xAQNAT glycosylation by CjPglB variants

Glycan occupancy (a) for each CjPglB clone (b)

Occupancy   RR (c)   RR     DL     NL     LQ     ML     RL     GL     VL     RN     PV
5x          2.2      nd     8.2    1.1    0.5    2.1    nd     2.7    nd     nd     nd
4x          25.0     nd     20.8   28.0   3.6    17.0   25.4   3.6    nd     nd     8.2
3x          47.2     nd     19.0   25.6   41.6   18.3   12.7   25.0   19.7   16.2   27.6
2x          24.0     7.0    32.1   29.2   34.8   34.5   39.3   37.8   36.9   37.6   43.8
1x          1.6      18.7   10.3   12.3   6.9    22.5   16.4   22.4   24.0   40.9   12.8
0x          nd       74.3   9.6    3.8    12.6   5.6    6.1    8.5    19.4   5.4    7.7

[0067] (a) Glycan occupancy defined as the relative % of each YebF 4xAQNAT form detected, ranging from aglycosylated (0x) through quintuply glycosylated (5x). YebF 4xAQNAT glycosylation levels were quantified by densitometry of anti-His immunoblots (see Fig. 6a, for example). (b) Different CjPglB clones including wt (RR) and variants with substitutions at positions 327 and 328. (c) Glycosylation of YebF 4xDQNAT by wt CjPglB. nd = not detected.
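The ranking of variants by "doubly glycosylated or greater" product can be reproduced from the Table 1 values with a few lines of arithmetic. In this sketch the "nd" entries are entered as 0.0, which is an assumption made only for the calculation, and the dictionary and function names are illustrative.

```python
# Table 1 values for YebF 4xAQNAT, ordered 5x, 4x, 3x, 2x, 1x, 0x (percent).
# "nd" (not detected) is entered as 0.0; this is an assumption for illustration.
TABLE1 = {
    "wt (RR)": [0.0, 0.0, 0.0, 7.0, 18.7, 74.3],
    "DL": [8.2, 20.8, 19.0, 32.1, 10.3, 9.6],
    "NL": [1.1, 28.0, 25.6, 29.2, 12.3, 3.8],
    "LQ": [0.5, 3.6, 41.6, 34.8, 6.9, 12.6],
    "ML": [2.1, 17.0, 18.3, 34.5, 22.5, 5.6],
    "RL": [0.0, 25.4, 12.7, 39.3, 16.4, 6.1],
    "GL": [2.7, 3.6, 25.0, 37.8, 22.4, 8.5],
    "VL": [0.0, 0.0, 19.7, 36.9, 24.0, 19.4],
    "RN": [0.0, 0.0, 16.2, 37.6, 40.9, 5.4],
    "PV": [0.0, 8.2, 27.6, 43.8, 12.8, 7.7],
}


def at_least_doubly(occupancy):
    """Sum of the 5x, 4x, 3x and 2x forms, in percent."""
    return round(sum(occupancy[:4]), 1)


ranked = sorted(TABLE1, key=lambda clone: at_least_doubly(TABLE1[clone]), reverse=True)
for clone in ranked:
    print(f"{clone:8s} {at_least_doubly(TABLE1[clone]):5.1f}")
```

With these inputs the three highest sums belong to the NL, LQ and DL clones, matching the variants chosen for further analysis.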

[0068] Glycosylation of an internal nonconsensus site. To simplify OST evaluation, we examined the ability of each CjPglB variant to glycosylate a single sequon at the C-terminus of YebF (YebF AQNAT). In our initial tests, the CjPglB variants reproducibly generated mono- and diglycosylated YebF AQNAT (Fig. 6b, shown for DL mutant). This result was consistent with the similarly unexpected appearance of five proteins that were reactive towards the anti-glycan antiserum for YebF 4xAQNAT, which only contained four engineered glycosylation sites (Fig. 6a). Analysis of the YebF primary structure identified one putative nonconsensus glycosylation site (ANNET) at the extreme N-terminus of the mature protein (Fig. 6c). We hypothesized that diglycosylated YebF AQNAT was the result of nonconsensus glycosylation at this site by the OST variants, which were isolated based on their ability to glycosylate sites like ANNET that lacked a negatively charged residue in the -2 position. To test this hypothesis, an N24L substitution was introduced to eliminate the putative nonconsensus glycosylation site. Indeed, glycosylation of YebF N24L/AQNAT by the DL variant produced only a single glycoform (Fig. 6b), providing additional evidence for relaxed site selection by the CjPglB variants. It is also worth noting that all samples were harvested from the culture supernatant; hence, glycosylation at both the extreme N- and C-termini of YebF did not interfere with its secretion across the outer membrane.

[0069] Broadly relaxed specificity of isolated OST variants. We next sought to more fully characterize the acceptor site preferences of the DL, NL, and LQ variants. All three of these OSTs were observed to generate a significant amount of monoglycosylated YebF N24L/AQNAT, whereas wt CjPglB was incapable of glycosylating this acceptor protein (Fig. 6d and e). To determine if the variants could still recognize a canonical bacterial motif, we generated a YebF N24L/DQNAT construct. As expected, wt CjPglB efficiently glycosylated YebF N24L/DQNAT, whereas the DL, NL, and LQ mutants did not detectably glycosylate YebF N24L/DQNAT under the same conditions (Fig. 6d and e). Immunoblotting against the HA epitope tag fused to each of the CjPglB constructs showed the DL and NL mutants were expressed at higher levels than wt CjPglB or the LQ mutant (Fig. 6d). However, the observed relaxed substrate specificity was not attributed solely to higher expression levels, since the higher expression of the DL and NL mutants did not yield significant glycosylation of YebF N24L/DQNAT (Fig. 6d).

[0006] We next examined whether the relaxed substrate specificity for AQNAT extended to different contexts. For this analysis, we employed a single-chain antibody fragment, scFv13-R4, which has a single glycosylation tag fused at its C-terminus 20. A panel of acceptor site variants was created by substituting all 20 amino acids in the -2 position of the glycosylation sequon. When tested against this panel, wt CjPglB showed the expected preference for D/E in the -2 position (Fig. 3a). A low level of glycosylation of the GQNAT and HQNAT sequons was also detected, suggesting some inherent relaxation in target specificity under the conditions used here. In contrast to the restricted specificity of the wt enzyme, each of the mutants exhibited much less stringent specificity. For example, the DL variant glycosylated 15/20 acceptor sites at clearly detectable levels, with the most efficient glycosylation observed for TQNAT and WQNAT (Fig. 3b). Likewise, the NL mutant readily glycosylated 19/20 targets, with only RQNAT lacking apparent modification (Fig. 3c). The most efficient glycosylation by this OST was observed in the context of AQNAT, NQNAT, and QQNAT motifs. The LQ variant glycosylated 14/20 target sites and recognized HQNAT most efficiently (Fig. 3d).

[0007] To confirm the nonconsensus site glycosylation observed for the different CjPglB variants, we performed mass spectrometry using scFv13-R4 AQNAT as acceptor protein. A trypsin site (Gly-Lys-Gly) was introduced immediately after the glycosylation tag in scFv13-R4 AQNAT to facilitate removal of the positively charged 6x-His tag. This new scFv13-R4 AQNAT construct was glycosylated in cells expressing one of the DL, NL, or LQ variants, after which glycoproteins were purified using nickel-affinity chromatography (Fig. 7a), treated with trypsin, and subjected to liquid chromatography-mass spectrometry (LC-MS) analysis. LC-MS of gel-extracted tryptic digests of all purified proteins showed a single major peak (eluting at ~27.3 min), whose MS spectra yielded only a single triply-charged ion and its associated quadruply-charged ion. The fragmentation spectra of the triply and quadruply-charged ions confirmed the amino acid sequence of the glycopeptides and identified the expected C. jejuni glycan containing seven monosaccharides with added mass of 1405.56 Da on the N273 residue in the tryptic peptide 256-LISEEDLDGAALEGGAQNATGK-277 of all three purified scFv13-R4 AQNAT proteins. The MS/MS profiles of the triply-charged precursor (m/z 1189.03) identified the glycopeptides and a 1405.56 Da glycan with bacillosamine as the innermost saccharide attached to the N273 sites in each (Fig. 7b-d). Due to the relatively high collision energy (CE = 56 eV) required for peptide sequencing, only partial glycan structural information was obtained as expected. However, when a lower CE (29 eV) was applied for the quadruply-charged ion (m/z 892.05), we obtained complete Y-type series ions (from Y1 to Y6β) attached to the core peptide revealing the expected heptasaccharide glycan structure (Fig. 7e). Taken together, these results unequivocally confirm glycan attachment to the nonconsensus AQNAT site by all three of the isolated CjPglB variants.
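As a consistency check on the two precursor ions, each m/z value can be converted to a neutral mass with the standard relation M = z x (m/z - m_proton). The sketch below is illustrative only: it uses the proton mass 1.007276 Da and takes the reported m/z values at face value, without assuming which isotopic peak they represent.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of an [M + zH]z+ ion from its m/z and charge state."""
    return charge * (mz - PROTON_MASS)


GLYCAN_MASS = 1405.56  # Da, added mass of the heptasaccharide reported above

mass_from_3plus = neutral_mass(1189.03, 3)
mass_from_4plus = neutral_mass(892.05, 4)
peptide_mass = mass_from_3plus - GLYCAN_MASS

print(f"neutral mass from 3+ ion: {mass_from_3plus:.2f} Da")
print(f"neutral mass from 4+ ion: {mass_from_4plus:.2f} Da")
print(f"difference: {abs(mass_from_3plus - mass_from_4plus):.2f} Da")
print(f"glycopeptide minus glycan: {peptide_mass:.2f} Da")
```

Both charge states give a neutral mass of about 3564 Da, which is what one would expect if they derive from the same glycopeptide; subtracting the glycan leaves roughly 2158.5 Da for the peptide portion.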

[0070] Unbiased determination of acceptor site preferences. To experimentally define the acceptor site specificity for the DL, NL, and LQ mutants in an unbiased fashion and to demonstrate the versatility of the glycoSNAP method, we screened a combinatorial library of acceptor site sequences against each of the OST variants. A library of sequons in which the -2, -1, and +1 positions were randomized by PCR using degenerate NNK primers was introduced in single copy at the C-terminus of YebF N24L. The resulting 2.4x10^5-member library was first screened in the presence of wt CjPglB to validate the assay for defining sequon specificity. Sequencing of 30 randomly chosen positive clones demonstrated the expected D/E-X(-1)-N-X(+1)-T specificity for efficient target glycosylation by wt PglB, with a greater preference for Asp in the -2 position (Fig. 3e). When the same library was screened with each of the DL, NL, and LQ mutants, no strong preferences for specific amino acids were observed in any of the randomized positions (Fig. 3f-h). For the DL variant, RXNXT was most commonly isolated, with 6/30 hits containing this sequence. For the NL and LQ variants, slight preference for AQNAT was observed, with 5/30 and 10/30 picks, respectively, having this sequence. On average, glycosylation efficiency was comparable to or better than what was observed for AQNAT glycosylation (Fig. 7). The most efficiently glycosylated sites for the DL variant were SGNIT, RGNIT, RGNQT, or RTNRT, while the NL variant efficiently glycosylated AGNVT, SNNIT, and STNST sites and the LQ variant preferred KGNNT and SANVT sequences (Fig. 8).
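A rough estimate of how completely a 2.4x10^5-member library samples three NNK-randomized positions can be made with a simple Poisson model. This is a sketch under stated assumptions: every DNA sequence is taken to be equally likely and clones are treated as independent draws, neither of which is claimed by the text; the function name is hypothetical.

```python
import math


def nnk_library_stats(n_positions: int, library_size: float):
    """Return (DNA variants, oversampling, expected fraction of variants present).

    Assumes 32 equiprobable codons per NNK position and independent sampling,
    so the chance that a given variant is absent is exp(-oversampling).
    """
    dna_variants = 32 ** n_positions
    oversampling = library_size / dna_variants
    expected_coverage = 1.0 - math.exp(-oversampling)
    return dna_variants, oversampling, expected_coverage


variants, fold, coverage = nnk_library_stats(3, 2.4e5)
print(f"possible DNA sequences: {variants}")
print(f"oversampling: {fold:.1f}-fold")
print(f"expected coverage: {coverage:.4f}")
```

Under these idealized assumptions the sequon library oversamples its 32,768 possible DNA sequences about seven-fold, so nearly all sequons at the -2, -1 and +1 positions would be expected to be present at least once.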

[0071] De novo relaxation of ClPglB acceptor site specificity. To determine if the identified mutations could similarly confer less stringent substrate specificity to homologous OSTs, we rationally designed a relaxed ClPglB variant by replacing Q330 and R331 with D and L, respectively. The wt ClPglB and the C. lari DL mutant glycosylated YebF N24L/DQNAT with nearly identical efficiency (Fig. 9), confirming that the DL mutant retained a strong preference for DQNAT. Moreover, the DL substitution endowed ClPglB with the ability to glycosylate YebF N24L/AQNAT, an activity not shared by wt ClPglB (Fig. 9). Thus, the contribution of these homologous residues to acceptor recognition appears to be a conserved feature of PglB.

[0072] Glycosylation of a native eukaryotic glycoprotein. Since all three CjPglB variants exhibited significantly relaxed specificity for the -2 position, we hypothesized that they might recognize a short eukaryotic N-X-S/T glycosylation site in a native glycoprotein. To test this notion, each variant was evaluated for the ability to glycosylate bovine RNaseA, which contains a single acceptor site at N34 in the context SRNLT. It has been shown previously that wt CjPglB can only glycosylate this site when it is changed to a canonical bacterial sequon with D or E substituted for S32 in the -2 position (RNaseA S32D) 20, 27. In agreement with earlier studies, wt CjPglB was only capable of glycosylating the RNaseA S32D mutant (Fig. 4a) but not wt RNaseA (Fig. 4b). On the other hand, the DL, NL, and LQ mutants not only glycosylated RNaseA S32D (Fig. 4a) but also glycosylated the short N-X-S/T sequon in wt RNaseA (Fig. 4b), confirming our hypothesis and marking the first instance of a bacterial OST recognizing a native eukaryotic sequon.

[0073] Discussion

[0074] The glycoSNAP assay described here is a versatile, high-throughput screen for N-linked protein glycosylation in E. coli strains. Using this assay, novel biocatalysts capable of recognizing the minimal N-X-S/T eukaryotic-type sequon in both peptide tags and native proteins were discovered. Given the modularity of the glycoSNAP assay, we anticipate that any protein component of an N-glycosylation pathway including acceptor proteins, OSTs, and glycosyltransferases (GTases) can be similarly interrogated in a combinatorial fashion. For example, by using different antibodies or lectins specific for a glycan of interest, one could isolate GTase variants and/or unique combinations of GTases that catalyze the biosynthesis of designer glycan structures that become successfully conjugated to acceptor proteins.

[0008] The isolated CjPglB variants revealed sequence determinants that govern substrate recognition by bacterial OSTs. CjPglB R328, homologous to ClPglB R331, appears to restrict substrate specificity to extended D/E-X-N-X-S/T sequons. The significance of this residue in regulating site selection was evidenced by the fact that (i) none of the positive hits retained R328 and (ii) 4 relaxed hits were able to glycosylate AQNAT after just a single R328 substitution. Moreover, the DL, NL, and LQ variants showed an unwavering preference for sequons that lacked an acidic residue in the -2 position. In fact, the glycosylation efficiency of these variants decreased significantly compared to wt CjPglB when a -2 Asp residue was present (e.g., YebF N24L/DQNAT, scFv13-R4 DQNAT, and RNaseA S32D). It is interesting to note that while eukaryotic OSTs exhibit no strong preferences for specific amino acids beyond N-X-S/T, Asp is most frequently present in the -2 position of confirmed aglycosylated sequons (16.7% out of a data set of 48 sites) and is the fourth least common residue (3.1% of 417 sites) in confirmed glycosylated eukaryotic sequons 28. These observations further support the shift of our CjPglB variants to more eukaryotic-like specificities.

[0009] Relaxed specificity was observed previously with PglB homologs from C. lari and Desulfovibrio desulfuricans (DdPglB) where each glycosylates a nonconsensus N-X-S/T motif, NNN 274 ST, in the C. jejuni acceptor protein AcrA 29, 30. However, in both cases, the relaxed substrate specificity is exclusive to this unique site in AcrA and not observed with any other N-X-S/T sites tested. In stark contrast, the relaxation of our mutants was much more general and potentially more useful for glycoengineering as demonstrated by glycosylation of RNaseA at its native N-X-S/T acceptor site. Recent efforts to further relax the specificity of ClPglB by swapping the charged residues between the bacterial OST and acceptor peptide resulted in no apparent glycosylation of an RQNAT sequon by ClPglB R331D/E mutants in vivo 31. Here, the application of glycoSNAP resulted in a unique instance of charge inversion involving R327 of CjPglB, where 3 of the 5 sequons most efficiently glycosylated by the DL variant contained Arg in the -2 position. For comparison, the NL variant, which contains a neutral residue at the 327 position with similar size and shape to Asp, performed most efficiently with Ser or Ala in the -2 sequon position. These results suggest both R327 and R328 in CjPglB make important contributions to defining substrate specificity. Interestingly, only the polar nature of residue 327 is conserved in ClPglB, where the corresponding residue is Q330. This could indicate some differences between CjPglB and ClPglB in their specific mode of sequon recruitment or binding, as has been observed with the NNN 274 ST site in AcrA that was glycosylated by ClPglB but not CjPglB 29.

[0010] Despite these differences, our results indicate that relaxed acceptor site specificity is a readily transferable trait between PglB homologs. A ClPglB variant in which the native Q330/R331 residues were rationally replaced with DL glycosylated an AQNAT sequon as efficiently as the CjPglB DL variant and retained highly efficient glycosylation of a DQNAT sequon. This is in stark contrast to studies where a ClPglB R331A mutant generates only a very low level of AQNAT glycosylation and significantly reduces glycosylation of a DQNAT sequon 31. Our results revealed that the adjacent Q330 (R327 in CjPglB) residue plays an important role in regulating site selection along with R331. The R327/Q330 and R328/R331 residues are prominently positioned at the mouth of a channel of highly polar residues in the peptide/protein binding cavity of PglB, where their side chains may provide a selective barrier through specific interactions that stabilize sequons containing acidic residues. Hydrogen bonded associations, in addition to potential electrostatic interactions, may help stabilize the acceptor peptide as it navigates the catalytic pocket of PglB. The more open conformation predicted by homology models of the CjPglB DL, NL, and LQ variants may abolish some of these interactions, thereby accommodating more structurally diverse sequences. It is also interesting to note that sequence alignments (Fig. 2b) revealed a conserved DLQ motif in the eukaryotic STT3 subunit of the OST that is shifted by one amino acid compared to our CjPglB DL/NL/LQ mutations. Hence, it is intriguing to speculate that the ability of our relaxed mutants to glycosylate eukaryotic sequons may stem from an STT3-like remodeling of the catalytic pocket.

[0075] While the contributions of other PglB residues to substrate recruitment or binding were not examined here, the established glycoSNAP assay should facilitate future studies using larger combinatorial PglB libraries that cover significantly greater sequence space. Additionally, the proven use of YebF chimeras to secrete a diverse range of target proteins to the extracellular medium raises the intriguing possibility of directly screening target sequons in the context of their native proteins, as fusions to YebF 22,24,32. Overall, the development of glycoSNAP for the discovery of novel glycosylation pathway enzymes is a significant advance for mechanistic dissection of poorly understood aspects of N-glycosylation and should enable the creation of potent new biocatalysts for biosynthesis of tailor-made glycoproteins.

EXAMPLES

[0076] The above disclosure generally describes the present invention. A more specific description is provided below in the following examples. The examples are described solely for the purpose of illustration and are not intended to limit the scope of the present invention. Changes in form and substitution of equivalents are contemplated as circumstances suggest or render expedient. Although specific terms have been employed herein, such terms are intended in a descriptive sense and not for purposes of limitation.

EXAMPLE 1

[0077] Bacterial strains and growth conditions. E. coli strain DH5α was used for cloning, site-directed mutagenesis, and library construction, while strain CLM24 25 was used for all glycosylation studies. Cultures were grown at 37°C in LB containing 100 μg/ml trimethoprim (Tmp), 20 μg/ml chloramphenicol (Cm), and either 100 μg/ml ampicillin (Amp) or 80 μg/ml spectinomycin (Spec) depending on the target-encoding plasmid. Cultures were typically induced at mid-log phase with 0.1 mM isopropyl β-D-thiogalactoside (IPTG) and 0.2% (w/v) L-arabinose. For RNaseA glycosylation, cultures were induced with 0.01 mM IPTG. Induction was carried out at 30°C for 16-20 h or, where indicated, for 4 h.

[0078] Plasmid construction. Plasmid pMW07-pglΔB encodes the C. jejuni pgl locus with a complete in-frame deletion of pglB and was constructed by homologous recombination in yeast. Briefly, the pgl operon (galE-pglG) excluding pglB was amplified from pACYCpgl 23 as two PCR products with overlapping ends. Both products were recombined with linearized vector pMW07 20 using a modified lazy bones protocol 33. In this protocol, 0.5 mL of an overnight yeast culture was pelleted and washed in sterile TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA). Then 0.4 mg of salmon sperm carrier DNA (Sigma), plasmid DNA, and PCR products were added to the pellet along with 0.5 mL Lazy Bones solution (40% polyethylene glycol MW 3350, 0.1 M lithium acetate, 10 mM Tris-HCl pH 7.5, and 1 mM EDTA). After vortexing for 1 min, this solution was incubated up to 4 days at room temperature. Cells were heat shocked at 42°C, pelleted, and plated on selective medium. All C. jejuni pglB variants were derivatives of pMAF10 25, which encodes wt pglB with a C-terminal HA epitope tag in a pMLBAD vector. A catalytic mutant of PglB was constructed by introducing double point mutations, D54N and E316Q, based on homologous inactivating mutations reported for C. lari pglB 13. All YebF constructs were derivatives of pTrc-YebF-GT 22, which encodes the native yebF gene from E. coli with a 4xDQNAT glycosylation tag and a 6x-His epitope tag at its C-terminus. For YebF 1x(D/A)QNAT constructs, the three C-terminal glycosylation sites were mutated to eliminate the glycosylation sequons, resulting in YebF 1x(D/A)QNAT-3x(D/A)QNAV. The YebF N24L constructs were created by site-directed mutagenesis of the YebF 1x(D/A)QNAT-3x(D/A)QNAV parental plasmids. The pSF vector was made by insertion of XbaI and SbfI sites, followed by the coding sequence for a FLAG epitope tag, into pSN18 21. The C. lari pglB gene from pET33b-ClStt3 (kindly provided by Bil Clemons) was cloned between the XbaI and SbfI sites of this vector to yield pSF-ClPglB. The pMLBAD-ClPglB plasmid was constructed by cloning C. lari pglB from pSF-ClPglB (without the FLAG epitope tag) into pMAF10, using EcoRI and NcoI sites to replace the C. jejuni pglB gene in the parental construct. A C-terminal HA tag was added to ClPglB during cloning. The pBS plasmid was constructed by combining the spectinomycin resistance cassette and pSC101 ori from the pZ expression vectors 34, digested with AvrII and AatII, with the expression region of pBAD24, amplified with primers pBAD-AvrII-for (5'-AACATACCTAGGATCGATGCATAATGTGCCTGTC-3') and pBAD-AatII-rev (5'-AAGATTGACGTCGATGCCTGGCAGTTTATGG-3'). The gene encoding scFv13-R4 DQNAT was cloned from pTrc-ssDsbA-scFv13-R4 DQNAT 20 into the XbaI site of pBS, and this construct was used as the template for site-directed mutagenesis to construct all scFv13-R4 XQNAT constructs. Plasmid pBS-scFv13-R4 AQNAT was used as the template for pBS-scFv13-R4 AQNAT GKG, where the codon for Lys was inserted between the existing codons for Gly by site-directed mutagenesis, to add a trypsin cleavage site to facilitate mass spectrometry studies. Plasmid pTrc-ssDsbA-RNaseA S32D 20 was used directly and as the template to construct pTrc-ssDsbA-RNaseA encoding wt RNaseA (D32 reverted to S). All site-directed mutagenesis was performed using two stages of extra-long PCR. Primers were designed with desired base changes flanked by up to 20 homologous bases. DpnI digestion was used to remove parental plasmid following PCR. All plasmids were confirmed by DNA sequencing at the Cornell Biotechnology Resource Center.

EXAMPLE 2

[0079] GlycoSNAP assay. Transformants were plated on 150 mm LB agar plates containing 100 μg/ml Tmp, 20 μg/ml Cm, 100 μg/ml Amp, and 0.2% (w/v) D-glucose and incubated overnight at 37°C. The second day, circles of nitrocellulose transfer membrane (Fisher Scientific) were pre-wet with sterile phosphate buffered saline (PBS) and placed onto induction plates consisting of LB agar containing 100 μg/ml Tmp, 20 μg/ml Cm, 100 μg/ml Amp, 0.1 mM IPTG, and 0.2% (w/v) L-arabinose. Colonies from the transformation plates were replicated onto Whatman 0.45 μm 142 mm cellulose nitrate membrane filters (VWR). Filters were placed colony side up onto the nitrocellulose layer on the induction plates. Induction plates were incubated at 30°C for 16-20 h. The third day, the colony-containing filters were transferred to fresh LB agar plates containing 100 μg/ml Tmp, 20 μg/ml Cm, 100 μg/ml Amp, and 0.2% (w/v) D-glucose and saved as needed. The nitrocellulose membranes were briefly rinsed in Tris buffered saline (TBS), then blotted with horseradish peroxidase (HRP)-conjugated lectin (0.5 μg/ml SBA-HRP) or immunoblotted with 6x-His tag-specific polyclonal antibodies (Abcam), as per standard Western blotting protocols. To detect all bound protein, membranes were stained with 0.1% Coomassie blue R-250 in 50% methanol and 7% acetic acid. Positive hits from library screening were individually picked and restreaked on LB agar plates containing 100 μg/ml Tmp, 20 μg/ml Cm, 100 μg/ml Amp, and 0.2% (w/v) D-glucose before further analysis.

[0080] Protein analysis. For YebF samples induced 16-20 h, cells were pelleted, and the supernatant was harvested and precipitated with ice cold 10% trichloroacetic acid (TCA). For 4 h induction samples, culture volumes containing both cells and supernatant were harvested and directly TCA precipitated. For scFv13-R4, periplasmic fractions were harvested after spheroplasting cells in buffers containing 0.2 M Tris-Ac (pH 8.2), 0.25 M sucrose, 160 μg/ml lysozyme, and 0.25 mM EDTA. For RNaseA, protein was purified by nickel-affinity chromatography from the periplasmic fractions of 50-100 ml cultures. In all cases, protein was solubilized in Laemmli sample buffer and resolved on SDS-polyacrylamide gels (BioRad). Western blotting used 6x-His tag-specific polyclonal antibodies (Abcam) or C. jejuni heptasaccharide glycan-specific antiserum hR6 29. Pierce enhanced chemiluminescent (ECL) substrate (Thermo Scientific) was used for detection of bound antibodies. All blots were visualized using a Chemidoc XRS+ system with Image Lab image capture software (BioRad).

[0081] MS analysis. Three recombinant scFv13-R4 AQNAT proteins (~1 μg), glycosylated in vivo by the PglB DL, NL, or LQ mutants, were purified using HisPur Ni-NTA resin (Thermo Fisher) from periplasmic fractions of 500 mL cultures and resolved on a 12% SDS-polyacrylamide gel. The corresponding glycoprotein bands at ~36 kDa, detected with Biosafe Coomassie stain (BioRad), were excised and subjected to in-gel digestion with trypsin followed by extraction of the tryptic peptides. Briefly, gel slices were sequentially washed with distilled water, 50% acetonitrile (ACN)-100 mM ammonium bicarbonate, and 100% ACN. Gel pieces were dried in a SpeedVac SC110 (Thermo Savant), reduced with 50 μL of 10 mM dithiothreitol at 56°C for 45 min, and alkylated by treatment with 70 μL of 55 mM iodoacetamide in the dark at room temperature for 45 min. After washing, the gel slices were dried, rehydrated with 40 μL of 10 ng/μL trypsin in 50 mM ammonium bicarbonate, 10% ACN on ice for 30 min, and then incubated at 35°C for 16 h. The resultant peptides were collected after centrifugation for 2 min at 4,000 x g. The residual peptides in the gel were then sequentially extracted with 100 μL of 5% formic acid (FA), 100 μL of 50% ACN, and 100 μL of 75% ACN, 5% FA (vortexed for 30 min and sonicated for 5 min in each extraction). Extracts from each sample were combined and evaporated to dryness in a SpeedVac SC110 (Thermo Savant). The tryptic peptides were reconstituted in 30 μL of 0.2% FA for subsequent precursor ion scanning MS analysis.

[0011] The nanoLC-ESI-MS/MS analysis was performed on an UltiMate3000 nanoLC (Thermo/Dionex) coupled with a hybrid triple quadrupole linear ion trap 4000 Q Trap mass spectrometer, which was equipped with a Micro Ion Spray Head II ion source (AB SCIEX). The tryptic peptides (5 μL) were injected with an autosampler onto a PepMap C18 trap column (5 μm, 300 μm i.d. x 5 mm, Thermo/Dionex) with 0.1% FA at 20 μL/min for 1 min and then separated on a PepMap C18 RP nano column (3 μm, 75 μm x 15 cm, Thermo/Dionex) and eluted in a 60-min gradient of 10% to 35% ACN in 0.1% FA at 300 nL/min, followed by a 3-min ramp to 95% ACN-0.1% FA and a 5-min hold at 95% ACN-0.1% FA. The column was re-equilibrated with 0.1% FA for 30 min prior to the next run.
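The elution program described above can be written out as a piecewise-linear function of time, which makes the solvent composition at any point in the run easy to check. The following is a minimal sketch only: the names `SEGMENTS` and `percent_acn` are hypothetical, time zero is assumed to be the start of the 60-min gradient, and each segment is assumed to be linear. The 30-min re-equilibration is left out because the text does not state the ACN percentage during that step.

```python
# Each segment: (start_min, end_min, start_percent_ACN, end_percent_ACN).
# Values are taken from the text; linear interpolation is an assumption.
SEGMENTS = [
    (0.0, 60.0, 10.0, 35.0),   # 60-min gradient, 10% to 35% ACN
    (60.0, 63.0, 35.0, 95.0),  # 3-min ramp to 95% ACN
    (63.0, 68.0, 95.0, 95.0),  # 5-min hold at 95% ACN
]


def percent_acn(t_min):
    """Return the programmed % ACN at t_min minutes after gradient start."""
    for t0, t1, p0, p1 in SEGMENTS:
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time is outside the 0-68 min elution program")


if __name__ == "__main__":
    for t in (0, 30, 60, 61.5, 65):
        print(f"{t:>5} min: {percent_acn(t):.1f}% ACN")
```

Under these assumptions the elution program itself lasts 68 min, so each injection occupies roughly 99 min once the 1-min trap loading and the 30-min re-equilibration are included.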

[0012] MS data acquisition was performed using Analyst 1.4.2 software (AB SCIEX) for precursor ion (PI) scan-triggered information-dependent acquisition (IDA) analysis 35. The precursor ion scan of the oxonium ion (HexNAc+ at m/z 204.08) was monitored using a step size of 0.2 Da across a mass range of m/z 400 to 1800 for detecting glycopeptides containing the N-acetylhexosamine unit. The nanospray voltage was 1.9 kV, and the instrument was operated in positive ion mode for all experiments. The declustering potential was set at 50 eV and nitrogen was used as the collision gas. For the IDA analysis, after each precursor ion scan, the two highest intensity ions with multiple charge states were selected for MS/MS using a rolling collision energy that was applied based on the different charge states and m/z values of the ions. All acquired MS and MS/MS spectra triggered by PI scan on m/z 204 were manually inspected and interpreted with Analyst 1.4.2 and BioAnalysis 1.4 software (Applied Biosystems) for identification of the glycopeptide sequence, the N-linked glycosylation sites, and glycan compositions.
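The m/z value monitored in the precursor ion scan follows directly from the monoisotopic mass of a HexNAc residue plus one proton, and the number of steps in the scan follows from the stated mass range and step size. The sketch below works through both numbers. The helper names (`mz`, `scan_steps`) are hypothetical, and the constants are standard monoisotopic values rather than figures taken from this disclosure.

```python
PROTON_MASS = 1.007276      # Da, mass of a proton
HEXNAC_RESIDUE = 203.07937  # Da, monoisotopic mass of C8H13NO5


def mz(neutral_mass, charge):
    """m/z of a species carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def scan_steps(mz_low, mz_high, step):
    """Number of steps needed to cover mz_low..mz_high at a fixed step size."""
    return round((mz_high - mz_low) / step)


if __name__ == "__main__":
    # Singly charged HexNAc oxonium ion, approximately 204.087
    print(f"HexNAc+ oxonium ion: m/z {mz(HEXNAC_RESIDUE, 1):.4f}")
    # Scan from m/z 400 to 1800 in 0.2 Da steps
    print(f"Steps per precursor ion scan: {scan_steps(400, 1800, 0.2)}")
```

The same `mz` helper can be applied to a candidate glycopeptide mass at charge 2 or 3 to see whether it would fall inside the m/z 400 to 1800 window.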

EXAMPLE 3

[0082] Generation of homology models and sequence logos. Homology modeling was performed using SWISS-MODEL in automated mode, which is considered reliable when >50% sequence identity is shared between the target and template proteins 36. Chain A of pdb 3RCE was specified as the template structure. Structure images were generated using PyMOL Molecular Graphics System, Version 1.7.0.1, Schrödinger, LLC. The acceptor peptide was added from alignment and overlay with 3RCE. Sequence logos were made from sequons of confirmed positive hits from YebF N24L/XXNXT glycoSNAP screening and generated using WebLogo 3 37. Sequence conservation at each position is indicated by the height of each stack. Within each stack, the height of each amino acid letter represents its relative frequency at that position.
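The two quantities that a sequence logo displays, the per-position relative frequency of each residue and the overall conservation of the position, can be computed from a list of aligned sequons as sketched below. This is an illustrative calculation only: the function name `column_stats` and the example sequons are hypothetical, conservation is taken as log2(20) minus the Shannon entropy of the column, and any small-sample correction that a logo generator may apply is omitted.

```python
import math
from collections import Counter


def column_stats(sequons):
    """Return, per position, (relative frequencies, information content in bits)."""
    length = len(sequons[0])
    if any(len(s) != length for s in sequons):
        raise ValueError("all sequons must have the same length")
    n = len(sequons)
    stats = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequons)
        freqs = {aa: count / n for aa, count in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        # Stack height: maximum entropy for 20 amino acids minus observed entropy.
        stats.append((freqs, math.log2(20) - entropy))
    return stats


if __name__ == "__main__":
    hits = ["DQNAT", "AQNAT", "RQNAT", "SQNAT"]  # hypothetical screening hits
    for offset, (freqs, bits) in zip(range(-2, 3), column_stats(hits)):
        print(f"position {offset:+d}: {bits:.2f} bits, {freqs}")
```

For these four example sequons the invariant Asn column reaches the maximum of about 4.32 bits, while the -2 column, with four equally frequent residues, drops to about 2.32 bits.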

INFORMAL SEQUENCE LISTINGS

[0083] C. jejuni PglB DNA

[0084] atgttgaaaaaagagtatttaaaaaacccttatttagttttgtttgcgatgattgtatta gcttatgtttttagtgtattttgcaggtttt attgggtttggtgggcaagtgagtttaacgagtattttttcaataatcaattaatgatca tttcaaacgatggctatgcttttgctgagggcgc aagagatatgatagcaggttttcatcagcctaatgatttgagttattatggatcttcttt atctacgcttacttattggctttataaaatcacacc tttttcttttgaaagtatcattttatatatgagtacttttttatcttctttggtggtgat tcctattattttactagctaatgaatacaaacgccctttaa tgggctttgtagctgctcttttagcaagtgtagcaaacagttattataatcgcactatga gtgggtattatgatacggatatgctggtaattgt tttacctatgtttattttattttttatggtaagaatgattttaaaaaaagactttttttc attgattgccttgccattatttataggaatttatctttggtg gtatccttcaagttatactttaaatgtagctttaattggactttttttaatttatacact tatttttcatagaaaagaaaagattttttatatagctgtg attttgtcttctcttactctttcaaatatagcatggttttatcaaagtgccattatagta atactttttgctttatttgctttagagcaaaaacgctta aattttatgattataggaattttaggtagtgcaactttgatatttttgattttaagtggt ggggttgatcccatactttatcagcttaaattttatatt tttagaagcgatgaaagtgcgaatttaacacagggctttatgtattttaatgttaatcaa accatacaagaagttgaaaatgtagattttagc gaatttatgcgaagaattagtggtagtgaaattgttttcttgttttctttgtttggtttt gtatggcttttgagaaaacataaaagtatgattatgg ctttacctatattggtgcttgggtttttagccttaaaaggaggacttagatttaccattt attctgtacctgtaatggctttaggatttggtttttta ttgagcgagtttaaggctatattggttaaaaaatatagccaattaacttcaaatgtttgt attgtttttgcaactattttgactttggctccagtat ttatccatatttacaactataaagcgccaacagttttttctcaaaatgaagcatcattat taaatcaattaaaaaatatagccaatagagaaga ttatgtggtaacttggtgggattatggttatcctgtgcgttattatagcgatgtgaaaac tttagtagatggtggaaagcatttaggtaagga taattttttcccttctttttctttaagtaaagatgaacaagctgcagctaatatggcaag acttagtgtagaatatacagaaaaaagcttttatg ctccgcaaaatgatattttaaaatcagacattttacaagccatgatgaaagattataatc aaagcaatgtggatttatttctagcttcattatca aaacctgattttaaaatcgatacaccaaaaactcgtgatatttatctttatatgcccgct agaatgtctttgattttttctacggtggctagttttt cttttattaatttagatacaggagttttggataaaccttttacctttagcacagcttatc cacttgatgttaaaaatggagaaatttatcttagca acggagtggttttaagcgatgattttagaagttttaaaataggtgataatgtggtttctg taaatagtatcgtagagattaattctattaaacaa 
ggtgaatacaaaatcactccaatcgatgataaggctcagttttatattttttatttaaag gatagtgctattccttacgcacaatttattttaatg gataaaaccatgtttaatagtgcttatgtgcaaatgttttttttgggaaattatgataag aatttatttgacttggtgattaattctagagatgcta aagtttttaaacttaaaatttaa

[0085] C. jejuni PglB AA

[0086] MLKKEYLKNPYLVLFAMIVLAYVFSVFCRFYWVWWASEFNEYFF NQLMII

SNDGYAFAEGARDMIAGFHQPNDLSYYGSSLSTLTYWLYKITPFSFESIILYMSTFL SS

LVVIPIILLANEYKRPLMGFVAALLASVANSYYNRTMSGYYDTDMLVIVLPMFILFF

MVRMILKKDFFSLIALPLFIGIYLWWYPSSYTLNVALIGLFLIYTLIFHRKEKIFYI AVIL

SSLTLSNIAWFYQSAIIVILFALFALEQKRLNFMIIGILGSATLIFLILSGGVDPIL YQLKF

YIFRSDESANLTQGFMYFNVNQTIQEVENVDFSEFMRRISGSEIVFLFSLFGFVWLL RK

HKSMIMALPILVLGFLALKGGLRFTIYSVPVMALGFGFLLSEFKAILVK YSQLTSNV CIVFATILTLAPVFIHIYNYKAPTVFSQNEASLLNQLK IANREDYVVTWWDYGYPV

RYYSDVKTLVDGGKHLGKDNFFPSFSLSKDEQAAANMARLSVEYTEKSFYAPQNDI

LKSDILQAMMKDYNQSNVDLFLASLSKPDFKIDTPKTRDIYLYMPARMSLIFSTVAS F

SFINLDTGVLDKPFTFSTAYPLDVK GEIYLSNGVVLSDDFRSFKIGDNVVSVNSIVEI

NSIKQGEYKITPIDDKAQFYIFYLKDSAIPYAQFILMDKTMFNSAYVQMFFLGNYDK

LFDLVINSRDAKVFKLKI*
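A DNA listing and its corresponding amino acid listing can be cross-checked by translating the DNA with the standard genetic code. The sketch below is one minimal way to do this. The names `CODON_TABLE` and `translate` are hypothetical, whitespace in the pasted sequence is ignored, and the two short test fragments are the first 24 and last 15 nucleotides of the C. jejuni pglB DNA listing above.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order at each position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an open reading frame, stopping after the first stop codon (*).

    Whitespace is removed and case is ignored. Characters other than
    a, c, g, t raise a KeyError.
    """
    seq = "".join(dna.split()).lower()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


if __name__ == "__main__":
    print(translate("atgttgaaaaaagagtatttaaaa"))  # MLKKEYLK
    print(translate("aaacttaaaatttaa"))           # KLKI*
```

Passing a complete DNA listing to `translate` and comparing the result with the amino acid listing, after removing spaces and line breaks from both, gives a quick consistency check.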

[0087] C. lari PglB DNA

[0088] atgaaactacaacaaaatttcacggataataattctataaaatatacctgtattttaatc cttatagcctttgcttttagtgttttgtgt agattatactgggtagcttgggcaagtgagttttatgagtttttctttaatgatcaactc atgattactactaatgatggctatgcttttgcaga aggtgcaagagatatgatagcaggttttcatcaacctaatgacttatcttattttggaag ctcactttctactttgacttattggctttatagtatt ttgccttttagctttgaaagtattattttatatatgagtgctttttttgcttctttgatt gttgtgcctattatattaatcgcaagagagtataaactca ctacctatggctttatagcagctttacttggaagcattgcaaatagttattataaccgca ctatgagtgggtattacgatacagatatgctag tgttagttttaccaatgcttattttgcttacctttatacgcttaactattaataaagaca ttttcaccctacttttaagtccggtttttatcatgattta tttgtggtggtatccatcaagttattctttaaattttgctatgataggactttttggact ttatactttagtatttcatagaaaagaaaagatttttta tctaactattgctttgatgatcatagctttaagtatgctagcatggcaatataagcttgc tttgattgtattattatttgctatttttgcttttaaaga agaaaaaatcaatttttatatgatttgggctttgatttttattagcattttgatattgca tttaagtggcggcttagatcctgttttataccaactta aattttatgtatttaaagcttctgatgtgcaaaatttaaaagatgctgcctttatgtatt ttaatgtcaatgaaaccattatggaagtaaatactat cgatcctgaagtatttatgcaaagaattagctctagtgttttagtatttatcctttcttt tataggttttatcttactttgcaaagatcacaaaagc atgcttttggctctacctatgcttgcactaggttttatggctttaagagctggacttaga tttaccatttatgcagttcctgtgatggctttgggt tttgggtattttttatatgcattttttaattttttagaaaaaaaacaaatcaaacttagc ctaagaaataaaaatatcttacttatactcattgcattt tttagtataagccctgctttgatgcatatttattattataaatcctctactgtttttact tcttatgaagctagtattttaaatgatttaaaaaataaa gctcaaagagaagattatgttgttgcttggtgggattatggttatccaatacgctattat agcgatgtaaaaaccttaatcgatggtggaaa acacctaggaaaagataattttttctcatcttttgtcttaagcaaagaacaaattccagc agccaatatggcaagacttagcgtagaataca ctgaaaaatctttcaaagaaaactatcctgatgttttaaaagctatggttaaagattata ataaaacaagtgctaaagattttttagaaagttta aatgataaagattttaaatttgataccaataaaactagagatgtatacatttatatgcct tatagaatgttgcgtatcatgcctgtggtggcac aatttgcaaatacaaatcctgataatggagagcaagaaaaaagtttatttttctcccaag ctaatgccatagctcaagataaaaccacagg ttctgttatgcttgataatggagtagaaattattaatgattttagagccttaaaagtaga aggtgcaagcatacctttaaaagcttttgtggat 
atagaatccattactaatggcaaattttattacaatgaaattgattcaaaagctcaaatt tatttgctctttttaagagaatataaaagctttgtg attttagatgaaagtctttataatagttcttatatacaaatgtttttgttaaatcaatac gatcaagatttatttgaacaaattactaatgatacaa gagcaaaaatttataggctaaaaagatga

[0089] C. lari PglB AA

[0090] MKLQQNFTD NSIKYTCILILIAFAFSVLCRLYWVAWASEFYEFFFNDQLMIT

TNDGYAFAEGARDMIAGFHQPNDLSYFGSSLSTLTYWLYSILPFSFESIILYMSAFF AS

LIVVPIILIAREYKLTTYGFIAALLGSIANSYYNRTMSGYYDTDMLVLVLPMLILLT FIR LTINKDIFTLLLSPVFIMIYLWWYPSSYSLNFAMIGLFGLYTLVFHRKEKIFYLTIALMI

IALSMLAWQYKLALIVLLFAIFAFKEEKINFYMIWALIFISILILHLSGGLDPVLYQ LKF

YVFKASDVQNLKDAAFMYFNVNETIMEVNTIDPEVFMQRISSSVLVFILSFIGFILL CK

DHKSMLLALPMLALGFMALRAGLRFTIYAVPVMALGFGYFLYAFFNFLEK QIKLSL

RNK ILLILIAFFSISPALMHIYYYKSSTVFTSYEASILNDLK KAQREDYVVAWWDY

GYPIRYYSDVKTLIDGGKHLGKDNFFSSFVLSKEQIPAANMARLSVEYTEKSFKENY P

DVLKAMVKDYNKTSAKDFLESLNDKDFKFDTNKTRDVYIYMPYRMLRIMPVVAQF

ANTNPDNGEQEKSLFFSQANAIAQDKTTGSVMLDNGVEIINDFRALKVEGASIPLKA F

VDIESITNGKFYYNEIDSKAQIYLLFLREYKSFVILDESLYNSSYIQMFLLNQYDQD LF

EQITNDTRAKIYRLKR*

References

[0091] 1. Apweiler, R., Hermjakob, H. & Sharon, N. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta 1473, 4-8 (1999).

[0092] 2. Zielinska, D.F., Gnad, F., Wisniewski, J.R. & Mann, M. Precision mapping of an in vivo N-glycoproteome reveals rigid topological and sequence constraints. Cell 141, 897-907 (2010).

[0093] 3. Helenius, A. & Aebi, M. Intracellular functions of N-linked glycans. Science 291, 2364-2369 (2001).

[0094] 4. Helenius, A. & Aebi, M. Roles of N-linked glycans in the endoplasmic reticulum. Annu Rev Biochem 73, 1019-1049 (2004).

[0095] 5. Varki, A. Biological roles of oligosaccharides: all of the theories are correct. Glycobiology 3, 97-130 (1993).

[0096] 6. Mitra, N., Sinha, S., Ramya, T.N. & Surolia, A. N-linked oligosaccharides as outfitters for glycoprotein folding, form and function. Trends Biochem Sci 31, 156-163 (2006).

[0097] 7. Aebi, M., Bernasconi, R., Clerc, S. & Molinari, M. N-glycan structures: recognition and processing in the ER. Trends Biochem Sci 35, 74-82 (2010).

[0098] 8. Abu-Qarn, M., Eichler, J. & Sharon, N. Not just for Eukarya anymore: protein glycosylation in Bacteria and Archaea. Curr Opin Struct Biol 18, 544-550 (2008).

[0099] 9. Schwarz, F. & Aebi, M. Mechanisms and principles of N-linked protein glycosylation. Curr Opin Struct Biol 21, 576-582 (2011).

[00100] 10. Szymanski, C.M. & Wren, B.W. Protein glycosylation in bacterial mucosal pathogens. Nat Rev Microbiol 3, 225-237 (2005).

[00101] 11. Larkin, A., Chang, M.M., Whitworth, G.E. & Imperiali, B. Biochemical evidence for an alternate pathway in N-linked glycoprotein biosynthesis. Nat Chem Biol 9, 367-373 (2013).

[00102] 12. Zufferey, R. et al. STT3, a highly conserved protein required for yeast oligosaccharyl transferase activity in vivo. Embo J 14, 4949-4960 (1995).

[00103] 13. Lizak, C., Gerber, S., Numao, S., Aebi, M. & Locher, K.P. X-ray structure of a bacterial oligosaccharyltransferase. Nature 474, 350-355 (2011).

[00104] 14. Matsumoto, S. et al. Crystal structures of an archaeal oligosaccharyltransferase provide insights into the catalytic cycle of N-linked protein glycosylation. Proc Natl Acad Sci USA 110, 17868-17873 (2013).

[00105] 15. Kowarik, M. et al. Definition of the bacterial N-glycosylation site consensus sequence. EMBO J 25, 1957-1966 (2006).

[00106] 16. Pandhal, J. et al. Inverse metabolic engineering to improve Escherichia coli as an N-glycosylation host. Biotechnol Bioeng 110, 2482-2493 (2013).

[00107] 17. Ihssen, J. et al. Structural insights from random mutagenesis of Campylobacter jejuni oligosaccharyltransferase PglB. BMC Biotechnol 12, 67 (2012).

[00108] 18. Celik, E., Fisher, A.C., Guarino, C., Mansell, T.J. & DeLisa, M.P. A filamentous phage display system for N-linked glycoproteins. Protein Sci 19, 2006-2013 (2010).

[00109] 19. Durr, C., Nothaft, H., Lizak, C., Glockshuber, R. & Aebi, M. The Escherichia coli glycophage display system. Glycobiology 20, 1366-1372 (2010).

[00110] 20. Valderrama-Rincon, J.D. et al. An engineered eukaryotic protein glycosylation pathway in Escherichia coli. Nat Chem Biol 8, 434-436 (2012).

[00111] 21. Mally, M. et al. Glycoengineering of host mimicking type-2 LacNAc polymers and Lewis X antigens on bacterial cell surfaces. Mol Microbiol 87, 112-131 (2013).

[00112] 22. Fisher, A.C. et al. Production of secretory and extracellular N-linked glycoproteins in Escherichia coli. Appl Environ Microbiol 77, 871-881 (2011).

[00113] 23. Wacker, M. et al. N-linked glycosylation in Campylobacter jejuni and its functional transfer into E. coli. Science 298, 1790-1793 (2002).

[00114] 24. Zhang, G., Brokx, S. & Weiner, J.H. Extracellular accumulation of recombinant proteins fused to the carrier protein YebF in Escherichia coli. Nat Biotechnol 24, 100-104 (2006).

[00115] 25. Feldman, M.F. et al. Engineering N-linked protein glycosylation with diverse O antigen lipopolysaccharide structures in Escherichia coli. Proc Natl Acad Sci USA 102, 3016-3021 (2005).

[00116] 26. Linton, D., Allan, E., Karlyshev, A.V., Cronshaw, A.D. & Wren, B.W. Identification of N-acetylgalactosamine-containing glycoproteins PEB3 and CgpA in Campylobacter jejuni. Mol Microbiol 43, 497-508 (2002).

[00117] 27. Kowarik, M. et al. N-linked glycosylation of folded proteins by the bacterial oligosaccharyltransferase. Science 314, 1148-1150 (2006).

[00118] 28. Gavel, Y. & von Heijne, G. Sequence differences between glycosylated and non-glycosylated Asn-X-Thr/Ser acceptor sites: implications for protein engineering. Protein Eng 3, 433-442 (1990).

[00119] 29. Schwarz, F. et al. Relaxed acceptor site specificity of bacterial oligosaccharyltransferase in vivo. Glycobiology 21, 45-54 (2011).

[00120] 30. Ielmini, M.V. & Feldman, M.F. Desulfovibrio desulfuricans PglB homolog possesses oligosaccharyltransferase activity with relaxed glycan specificity and distinct protein acceptor sequence requirements. Glycobiology 21, 734-742 (2011).

[00121] 31. Gerber, S. et al. Mechanism of bacterial oligosaccharyltransferase: in vitro quantification of sequon binding and catalysis. J Biol Chem 288, 8849-8861 (2013).

[00122] 32. Haitjema, C.H. et al. Universal Genetic Assay for Engineering Extracellular Protein Expression. ACS Synthetic Biology 3, 74-82 (2013).

[00123] 33. Shanks, R.M., Caiazza, N.C., Hinsa, S.M., Toutain, C.M. & O'Toole, G.A. Saccharomyces cerevisiae-based molecular tool kit for manipulation of genes from gram-negative bacteria. Appl Environ Microbiol 72, 5027-5036 (2006).

[00124] 34. Lutz, R. & Bujard, H. Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements. Nucleic Acids Res 25, 1203-1210 (1997).

[00125] 35. Zhang, S. et al. Comparative characterization of the glycosylation profiles of an influenza hemagglutinin produced in plant and insect hosts. Proteomics 12, 1269-1288 (2012).

[00126] 36. Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22, 195-201 (2006).

[00127] 37. Crooks, G.E., Hon, G., Chandonia, J.M. & Brenner, S.E. WebLogo: a sequence logo generator. Genome Res 14, 1188-1190 (2004).