Title:
PRODUCTION OF CROSS-REACTIVE MATERIAL 197 FUSION PROTEINS
Document Type and Number:
WIPO Patent Application WO/2022/263559
Kind Code:
A1
Abstract:
This invention relates to methods of producing a CRM197 fusion protein comprising expressing a nucleic acid encoding the CRM197 fusion protein in a prokaryotic cell, wherein the CRM197 fusion protein comprises a CRM197 polypeptide and a tag that comprises the amino acid sequence WSHPQFEK or a variant thereof. This allows CRM197 to be produced as a fusion protein in soluble form in prokaryotic host cell systems. Methods of production, CRM197 fusion proteins, encoding nucleic acids and vectors and host cells comprising the nucleic acids and vectors are provided.

Inventors:
BERNARDES GONÇALO JOSÉ LOPES (PT)
DE ALBUQUERQUE MARIA INÊS SOUSA (PT)
Application Number:
PCT/EP2022/066394
Publication Date:
December 22, 2022
Filing Date:
June 15, 2022
Assignee:
INST DE MEDICINA MOLECULAR JOAO LOBO ANTUNES (PT)
International Classes:
C07K14/34
Domestic Patent References:
WO2010150230A1 2010-12-29
WO2016079755A1 2016-05-26
Other References:
SCHMIDT THOMAS G M ET AL: "The Strep-tag system for one-step purification and high-affinity detection or capturing of proteins", NATURE PROTOCOLS, NATURE PUBLISHING GROUP, GB, vol. 2, no. 6, 1 January 2007 (2007-01-01), pages 1528 - 1535, XP001536704, ISSN: 1750-2799, DOI: 10.1038/NPROT.2007.209
AH-REUM PARK ET AL: "Efficient recovery of recombinant CRM197 expressed as inclusion bodies in E.coli", PLOS ONE, vol. 13, no. 7, 18 July 2018 (2018-07-18), pages e0201060, XP055694375, DOI: 10.1371/journal.pone.0201060
PHILIPPE GOFFIN ET AL: "High-yield production of recombinant CRM197, a non-toxic mutant of diphtheria toxin, in the periplasm of Escherichia coli", BIOTECHNOLOGY JOURNAL, vol. 12, no. 7, 1 July 2017 (2017-07-01), DE, pages 1700168, XP055576196, ISSN: 1860-6768, DOI: 10.1002/biot.201700168
ALESSANDRA STEFAN ET AL: "Overexpression and purification of the recombinant diphtheria toxin variant CRM197 in", JOURNAL OF BIOTECHNOLOGY, ELSEVIER, AMSTERDAM NL, vol. 156, no. 4, 15 August 2011 (2011-08-15), pages 245 - 252, XP028119257, ISSN: 0168-1656, [retrieved on 20110825], DOI: 10.1016/J.JBIOTEC.2011.08.024
PAPPENHEIMER, HARPER, SCIENCE, vol. 175, no. 1, 1972, pages 901 - 903
PAPPENHEIMER AM, ANNU. REV. BIOCHEM., vol. 46, no. 1, 1977, pages 69 - 94
BIGIO ET AL., FEBS LETTERS, vol. 218, no. 1, 1987, pages 271 - 276
GIANNINI, NUCLEIC ACIDS RES., vol. 12, no. 1, 1984, pages 4063 - 4069
MALITO ET AL., PNAS, vol. 109, no. 1, 2012, pages 5229 - 5234
SHINEFIELD HR, VACCINE, vol. 28, no. 1, 2010, pages 4335 - 4339
BUZZI, THERAPY, vol. 1, no. 1, 2004, pages 61 - 66
BUZZI ET AL., CANCER IMMUNOL. IMMUNOTHER, vol. 53, no. 1, 2004, pages 1041 - 1048
HU, J. CELL. PHYSIOL., vol. 230, no. 1, 2015, pages 1713 - 1728
STEFAN ET AL., J. BIOTECHNOL., vol. 156, no. 1, 2010, pages 245 - 252
GOFFIN ET AL., BIOTECH. J., vol. 12, no. 1, 2017, pages 1700168
PARK ET AL., PLOS ONE, vol. 13, no. 1, 2018, pages 1 - 16
PRINZ ET AL., J. BIOL. CHEM., vol. 272, no. 1, 1997, pages 15661 - 15667
MAHAMAD, BOONCHIRD, PANBAGRED, APPLIED MICROBIOL. BIOTECHNOL., vol. 1, no. 1, 2016, pages 1 - 16
ALTSCHUL, J. MOL. BIOL., vol. 215, 1990, pages 405 - 410
PEARSON, LIPMAN, PNAS USA, vol. 85, 1988, pages 2444 - 2448
SMITH, WATERMAN, J. MOL. BIOL., vol. 147, 1981, pages 195 - 197
NUCL. ACIDS RES., vol. 25, 1997, pages 3389 - 3402
SAMBROOK, RUSSELL: "Molecular Cloning: a Laboratory Manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
AUSUBEL ET AL.: "A Compendium of Methods from Current Protocols in Molecular Biology", 1999, JOHN WILEY & SONS, article "Short Protocols in Molecular Biology"
WIEDEMANN, BELLSTEDT, GORLACH, BIOINFORMATICS, vol. 29, no. 1, 2013, pages 1750 - 1757
LOUIS-JEUNE, ANDRADE-NAVARRO, PEREZ-IRATXETA, PROTEINS: STRUCT., FUNCT. AND BIOINF., vol. 80, no. 1, 2012, pages 374 - 381
BERNARDIM ET AL., NAT. COMMS., vol. 7, no. 1, 2016, pages 13128
LAI, SCHREIBER, VACCINE, vol. 27, no. 1, 2009, pages 3137 - 3144
Attorney, Agent or Firm:
MEWBURN ELLIS LLP (GB)
Claims:

1. A method of producing a CRM197 fusion protein, comprising: expressing a nucleic acid encoding the CRM197 fusion protein in a prokaryotic cell, wherein the CRM197 fusion protein comprises a CRM197 amino acid sequence and a tag that comprises the amino acid sequence WSHPQFEK or a variant thereof.

2. A method according to claim 1, wherein the CRM197 fusion protein is expressed in soluble form in the prokaryotic cell.

3. A method according to claim 1 or claim 2, wherein the CRM197 fusion protein comprises an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO: 1.

4. A method according to any one of the preceding claims, wherein the CRM197 polypeptide comprises an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO: 3.

5. A method according to any one of the preceding claims, wherein the prokaryotic cell is a bacterial cell.

6. A method according to claim 5, wherein the bacterial cell is an E. coli cell.

7. A method according to any one of the preceding claims, wherein the tag is connected to the N terminus of the CRM197 polypeptide.

8. A method according to any one of the preceding claims, wherein the tag is connected to the CRM197 polypeptide through a linker.

9. A method according to claim 8, wherein the linker comprises a protease recognition site.

10. A method according to claim 9, wherein the protease recognition site is an enteropeptidase recognition site.

11. A method according to claim 10, wherein the enteropeptidase recognition site comprises the amino acid sequence DDDDK.

12. A method according to any one of the preceding claims, wherein the CRM197 fusion protein comprises the amino acid sequence of SEQ ID NO: 3.

13. A method according to any one of the preceding claims, wherein the nucleic acid encoding the CRM197 fusion protein comprises the nucleotide sequence of SEQ ID NO: 4.

14. A method according to any one of the preceding claims, further comprising isolating the expressed CRM197 fusion protein.

15. A method according to any one of the preceding claims, further comprising removing the tag from the CRM197 fusion protein to produce a CRM197 polypeptide.

16. A CRM197 fusion protein, comprising a CRM197 polypeptide and a tag; wherein the tag comprises the amino acid sequence WSHPQFEK, or a variant thereof.

17. A CRM197 fusion protein according to claim 16, comprising the amino acid sequence of SEQ ID NO: 3 or a variant thereof.

18. A nucleic acid encoding a CRM197 fusion protein according to claim 16 or claim 17.

19. A nucleic acid according to claim 17, comprising a nucleotide sequence having at least 95% sequence identity to SEQ ID NO: 4.

20. An expression vector comprising a nucleic acid according to claim 18 or claim 19.

21. A host cell comprising the nucleic acid according to claim 18 or claim 19, or the expression vector of claim 20.

22. A host cell according to claim 21, wherein the cell is a prokaryotic cell.

Description:
Production of Cross-reactive material 197 Fusion Proteins

Field

The present invention relates to methods for the production of recombinant proteins, in particular proteins useful in therapy, such as Cross-reactive material 197 (CRM197).

Background

Cross-reactive material 197 (CRM197) is a genetically detoxified variant of the diphtheria toxin (DT), isolated from a group of six different nitrosoguanidine-treated DT mutants first described in the early 1970s by the laboratory of Alwin Max Pappenheimer Jr at Harvard University (Pappenheimer, Uchida and Harper (1972), Immunochem. 9(1); Uchida, Pappenheimer and Harper (1972), Science 175(1)901-903; Pappenheimer AM (1977), Annu. Rev. Biochem. 46(1)69-94).

CRM197 is a 535 amino acid (aa) protein that can be cleaved by trypsin-like proteases into two fragments: fragment A (~21 kDa, containing the catalytic domain) and fragment B (~38 kDa, containing the transmembrane and receptor domains) (Bigio et al. (1987), FEBS Letters 218(1)271-276; Giannini et al. (1984), Nucleic Acids Res. 12(1)4063-4069; Malito et al. (2012), PNAS 109(1)5229-5234). The protein contains two disulphide bonds: a more exposed bond at Cys186-201, and another at Cys461-471 that appears to be more buried inside the protein, and therefore could be less accessible to chemical modifications (Malito et al. (2012), PNAS 109(1)5229-5234). From the sequence first described by G. Giannini in 1984 to X-ray crystallography and molecular dynamics (MD) studies, it has become clear over the years that a single substitution at position 52 (due to a mutation of the wild-type codon “GGG” for Gly into “GAG” for Glu) is enough to render the protein non-toxic. This mutation seems to increase the flexibility of the active-site loop that covers the NAD-binding pocket of CRM197, when compared to that in DT (Malito et al. (2012), PNAS 109(1)5229-5234).
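The codon change described above can be illustrated with a short Python sketch. It uses only the two codons named in the text (GGG for Gly, GAG for Glu); the helper name `point_mutate` and the two-entry codon table are illustrative assumptions and do not represent any CRM197 sequence data.

```python
# Minimal illustration of the single-base change behind the Gly -> Glu
# substitution at position 52. Only the two codons named in the text are used.
CODON_TABLE = {"GGG": "Gly", "GAG": "Glu"}  # two entries of the standard genetic code


def point_mutate(codon: str, index: int, base: str) -> str:
    """Return `codon` with the base at 0-based `index` replaced by `base`."""
    if not 0 <= index < len(codon):
        raise IndexError("index outside codon")
    return codon[:index] + base + codon[index + 1:]


wild_type_codon = "GGG"
# Changing the middle base G -> A converts the Gly codon into a Glu codon.
mutant_codon = point_mutate(wild_type_codon, 1, "A")

print(wild_type_codon, CODON_TABLE[wild_type_codon])
print(mutant_codon, CODON_TABLE[mutant_codon])
```

The sketch only shows that one nucleotide difference in the middle position of the codon accounts for the amino acid substitution; it says nothing about the structural consequences discussed in the cited work.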

Besides the genetic mutation that makes this protein safe without the need for often destructive chemical treatments, very early on it was observed that CRM197 is capable of eliciting a consistent and memory-inducing immune response in children and toddlers, making it ideal as a carrier protein in paediatric vaccines (Shinefield HR. (2010), Vaccine 28(1)4335-4339). Interestingly, CRM197 has also been shown to have great potential as an anti-tumour agent, and could be effective in the treatment of atherosclerosis (since both cancer and vascular plaques overexpress HB-EGF, the specific receptor for DT and CRM197) (Buzzi et al. (2004), Therapy 1(1)61-66; Buzzi et al. (2004), Cancer Immunol. Immunother. 53(1)1041-1048; Hu et al. (2015), J. Cell. Physiol. 230(1)1713-1728).

There have been many attempts over the years, from both academic groups and the pharmaceutical industry, to develop methods of producing this protein in a system that does not require biosafety level (BSL)-3 facilities, contrary to production in Corynebacterium diphtheriae (Stefan et al. (2010), J. Biotechnol. 156(1)245-252; Goffin et al. (2017), Biotech. J. 12(1)1700168; Park et al. (2018), PLoS ONE 13(1)1-16). Furthermore, its high commercial costs, as well as the challenges its production in Escherichia coli creates, have limited the scope of research laboratories that can work with this protein and have made it almost prohibitive for use in academic research. The presence of structural disulphide bridges, which are important for maintenance of the immunogenic character of CRM197, makes this protein very challenging to express in its proper conformation in bacteria, as is the case with many other proteins bearing these chemical bonds (Prinz et al. (1997), J. Biol. Chem. 272(1)15661-15667). The usually reducing cytoplasmic environment that is common in bacteria renders proper folding of such proteins difficult to achieve. These conditions are usually due to the presence of thioredoxin (Trx), glutaredoxin (Grx) or glutathione (GSH), which have the capacity to reduce disulphide bonds in anaerobic conditions (Prinz et al. (1997), J. Biol. Chem. 272(1)15661-15667).

Various approaches are known in the art to promote expression and proper folding of these challenging proteins in E. coli (de Marco, A. (2009), Microb. Cell Factories 8(1)). One such approach is to target secretion of proteins to the periplasm. Contrary to the cytoplasm, this is an oxidizing compartment, containing enzymes that promote disulphide bond formation and isomerization, as well as specific chaperones and foldases. This is however a problematic solution, as the bottleneck created by proteins crossing the inner membrane may lead to the accumulation of misfolded proteins in the cytoplasm, in turn driving cellular cytotoxicity and decreased soluble protein yields. Other solutions (which may be applied in combination) include maintaining tight control of protein expression, using methods such as: temperature induction, genetic modification of the Shine-Dalgarno sequences or other expression-modulating regions, making changes to growth medium or the plasmid origin of replication, and the use of expression promoters. The genetic background of the host cell (often bacterial) used for protein production can also have a large impact on expression levels (Goffin et al. (2017), Biotech. J. 12(1)1700168).

Whilst useful, the value of such methods in improving the expression of challenging proteins varies, depending on the intrinsic characteristics of the protein to be expressed. Moreover, whilst scientists tend to favour one method over another, there is no reliable data comparing methods of improving protein expression against one another under the same experimental conditions (de Marco, A. (2009), Microb. Cell Factories 8(1)).

It has been previously described that the fusion of an N-terminal His-tag to CRM197 seems to render the protein non-toxic to E. coli, increasing total protein expression yields (Stefan et al. (2010), J. Biotechnol. 156(1)245-252). However, the addition of a His-tag has not been shown to be sufficient to render the protein soluble, and the majority of the expressed protein detected seemed to be associated with inclusion bodies (Stefan et al. (2010), J. Biotechnol. 156(1)245-252). There have also been attempts at co-expressing this protein with chaperones to promote proper folding (Mahamad, Boonchird and Panbagred (2016), Applied Microbiol. Biotechnol. 1(1)1-16) or directing protein expression to the periplasm (Park et al. (2018), PLoS ONE 13(1)1-16). Though in both cases the protein expression yields are quite impressive (~150 mg/mL and ~3 g/L, respectively), the methods described are quite complex for a non-specialist, and often these descriptions rely only on production yields, disregarding protein functionality, more specifically in vivo.

Summary

The present inventors have recognised that linking a CRM197 polypeptide to a tag comprising the amino acid sequence WSHPQFEK allows CRM197 to be produced as a fusion protein in soluble form in a prokaryotic host cell system. A first aspect of the invention provides a method of producing a CRM197 fusion protein comprising: expressing a nucleic acid encoding the CRM197 fusion protein in a prokaryotic cell, wherein the CRM197 fusion protein comprises a CRM197 polypeptide and a tag that comprises the amino acid sequence WSHPQFEK or a variant thereof.

Methods of the first aspect may further comprise isolating the expressed CRM197 fusion protein.

Methods of the first aspect may further comprise removing the tag from the CRM197 fusion protein to produce a CRM197 polypeptide.

A second aspect of the invention provides a CRM197 fusion protein comprising a CRM197 amino acid sequence and a tag comprising the amino acid sequence WSHPQFEK, or a variant thereof.

A third aspect of the invention provides a nucleic acid encoding a CRM197 fusion protein of the second aspect.

A fourth aspect of the invention provides an expression vector comprising a nucleic acid of the third aspect.

A fifth aspect of the invention provides a prokaryotic cell comprising a nucleic acid of the third aspect or an expression vector of the fourth aspect.

A CRM197 polypeptide of the first to the fifth aspects may comprise the full length native CRM197 amino acid sequence set forth in SEQ ID NO: 1 or may be a variant thereof.

A prokaryotic cell of the first and fifth aspects may be a non-pathogenic prokaryotic cell. Preferably, the prokaryotic cell is a bacterial cell, most preferably an E. coli cell.

Additional aspects of the disclosure and embodiments of the invention are described in more detail below.

Brief Description of the Figures

Figure 1 shows schematics of sequences coding for a CRM197 fusion protein known as iCRM197. The sequence was inserted in place of nucleotides 5010-6618 of pET-51b(+) to construct pET-IA1788.

Figure 2 shows the expression of iCRM197. Figure 2A shows a representative SDS-PAGE gel in which 20 µL of the flow-through, wash, and elution fractions were analysed. Some iCRM197 is still found in the flow-through and wash fractions, but these were later captured in the same affinity column, after recharging. Figure 2B shows a representative SDS-PAGE gel of a batch of iCRM197 after full capture through the affinity column. Figure 2C shows a representative gel showing efficiency of protease-dependent Strep-tag® II removal, analysed under reducing (with β-mercaptoethanol) or non-reducing conditions (without β-mercaptoethanol). The tagged protein could be collected and re-subjected to protease digestion.

Figure 3 shows biophysiochemical characterisation of iCRM197. Figure 3A shows a representative western-blot analysis of different purification steps of iCRM197. 20 µL of the flow-through, wash, and elution fractions were analysed. Some iCRM197 is still found in the flow-through and wash fractions, but as aforementioned, these were later re-captured with the same affinity column, after recharging. The membrane was probed first with the antibody against Strep-tag® II (left), and after mild stripping, was re-probed with an antibody against CRM197 (right). Figure 3B shows deconvoluted mass spectra of iCRM197, reconstructed from the ion series using the MaxEnt algorithm. Theoretical calculated mass is indicated. Figure 3C shows far-UV CD spectra of CRM197 commercially produced in C. diphtheriae (COM-CRM197) and iCRM197. Figure 3D shows results obtained from the Capito and K2D3 online CD analysis tools. In the case of Capito, α-helix includes α-, 3₁₀- and π-helix; β-sheet includes β-bridge and bonded turn; and bends and loops are included in the structural feature irregular. For K2D3, helical and β-strand contents were subtracted from 100 to obtain the percentage of irregular. All values are presented as percentages. Figure 3E shows a representative LC-MS spectrum obtained after Ellman’s reaction, showing unmodified protein (60368 Da), detection of 2 reacted thiols (60759 Da), and detection of 4 reacted thiols (61171 Da). Figure 3F shows a representative LC-MS spectrum after conjugation with CAA-NEt. Addition of four molecules of the carbonylacrylic reagent is detectable (61302 Da), as well as unmodified protein (60403 Da).

Figure 4 shows protein conjugation with an Alexa Fluor 488 fluorophore for internalization assays. Figure 4A shows schematics of the conjugation of CRM197 and iCRM197 with a fluorophore (Alexa Fluor 488), for protein tracking in live cells. Figure 4B shows an SDS-PAGE gel for confirmation of specificity of the 1 h click reaction, for unmodified or CAA-alkyne modified samples of CRM197 and iCRM197. The same Coomassie-stained lane (left) is shown next to the corresponding fluorescence detection (right). Marker is annotated with the molecular weights corresponding to each band.

Figure 5 shows the protein carrier potential of iCRM197. Figure 5A shows confocal imaging detection of the internalized proteins in Raji cells. Scale bar is 10 µm. Figure 5B shows imaging flow cytometry results for detection of fluorescently labelled proteins inside Raji cells (at 30 min after start of incubation), including photos corresponding to the GFP-positive population. Figure 5C shows anti-CRM197-specific IgG levels of BALB/cByJ mouse groups immunized with COM-CRM197 (¨) and iCRM197 (A), after first and second boost immunizations administered two weeks apart. Data is shown as ELISA units per mL of serum. Figure 5D shows ELISA raw data obtained from an immunization experiment with vehicle control (solid, black), COM-CRM197 (dashed, black) or iCRM197 (dotted, black). Absorbance at 450 nm is plotted against serial serum dilutions. “T0” refers to the timepoint at the time of the first injection (top panel); “post I” refers to the timepoint after the first booster injection (middle panel); “post II” refers to the timepoint after the second booster injection (bottom panel). Graphs show mean and standard error of the mean.

Detailed Description

This invention relates to the production of CRM197 in a prokaryotic system in a soluble form by expressing a nucleic acid encoding a fusion protein comprising CRM197 linked to a tag. In particular, the tag may comprise the amino acid sequence WSHPQFEK. The soluble fusion protein may be isolated following expression and used in various applications, or optionally cleaved to generate CRM197. Methods of the invention may be useful in the efficient, scalable production of biologically active CRM197 polypeptide.

A CRM197 fusion protein as described herein may comprise a CRM197 polypeptide. Cross-reactive material 197 (CRM 197) polypeptide is a diphtheria toxin (DT) polypeptide that is detoxified by the presence of a Gly to Glu mutation at a position corresponding to position 52 of the wild type CRM197 sequence shown in SEQ ID NO: 1. A CRM197 polypeptide may further comprise a disulphide bond between Cys residues at positions corresponding to positions 186 and 201 of SEQ ID NO: 1 and a disulphide bond between Cys residues at positions corresponding to positions 461 and 471 of SEQ ID NO: 1.

A CRM197 polypeptide within a CRM197 fusion protein may be correctly folded and may display the immunogenic and biological properties of wild-type CRM197.

A CRM197 polypeptide may comprise the amino acid sequence of SEQ ID NO: 1 or a variant thereof.

A variant of a reference sequence set out herein, such as a reference CRM197 polypeptide or reference CRM197 fusion protein sequence, may comprise an amino acid sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% sequence identity to the reference sequence. Particular amino acid sequence variants may differ from a reference sequence shown herein by insertion, addition, substitution or deletion of 1 amino acid, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more than 10 amino acids.

Sequence similarity and identity are commonly defined with reference to the algorithm GAP (Wisconsin Package, Accelrys, San Diego USA). GAP uses the Needleman and Wunsch algorithm to align two complete sequences in a way that maximizes the number of matches and minimizes the number of gaps. Generally, default parameters are used, with a gap creation penalty = 12 and gap extension penalty = 4. Use of GAP may be preferred but other algorithms may be used, e.g. BLAST (which uses the method of Altschul et al. (1990) J. Mol. Biol. 215: 405-410), FASTA (which uses the method of Pearson and Lipman (1988) PNAS USA 85: 2444-2448), or the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol. Biol. 147: 195-197), or the TBLASTN program, of Altschul et al. (1990) supra, generally employing default parameters. In particular, the psi-Blast algorithm (Nucl. Acids Res. (1997) 25: 3389-3402) may be used.

Sequence comparison may be made over the full-length of the relevant sequence described herein.
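The idea of a full-length identity comparison can be sketched in Python as follows. This is a simplified Needleman-Wunsch global alignment with a linear gap penalty; it does not reproduce GAP itself or its affine penalties (gap creation 12, gap extension 4). The scoring values, the function names and the choice to report identity as matches divided by alignment length are all assumptions made for illustration.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Simplified Needleman-Wunsch global alignment (linear gap penalty)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length (an assumed convention)."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# The tag sequence from the text compared with itself and with a
# hypothetical single-substitution variant.
print(percent_identity("WSHPQFEK", "WSHPQFEK"))
print(percent_identity("WSHPQFEK", "WSHPQFEA"))
```

Different tools normalise identity differently (for example by the shorter sequence rather than the alignment length), so values from this sketch are not expected to match GAP or BLAST output exactly.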

The CRM197 polypeptide may be linked to a tag in a CRM197 fusion protein described herein. The presence of the tag facilitates the soluble expression of the CRM197 fusion protein in prokaryotic systems. In addition, the tag may be useful for the affinity purification of the CRM197 fusion protein following expression.

Preferred tags for use in a CRM197 fusion protein are non-immunogenic and biochemically unreactive or inert.

Preferably the tag is linked to the N terminus of the CRM197 polypeptide.

A suitable tag sequence may comprise the amino acid sequence WSHPQFEK or a variant thereof. Suitable tags include Strep-tag® II and Twin-Strep-tag®, which are well-known in the art.

The tag may be linked directly to the CRM197 polypeptide or may be linked indirectly through a linker. A linker is a sequence of amino acid residues that connects the tag to the CRM197 polypeptide. The linker may be 1-20 amino acids in length, preferably 2-15 amino acids in length. For example, the linker may be 2 amino acids in length. Suitable examples of linker amino acid sequences are known in the art. Preferably, the linker does not comprise Cysteine (C) residues.

In some preferred embodiments, the linker may comprise a protease recognition site. A protease recognition site is an amino acid sequence that is specifically cleaved by a protease. Suitable proteases and protease recognition sites are known in the art and include enteropeptidase (enterokinase), which specifically cleaves at the recognition site DDDDK. Other suitable proteases may include Tobacco Etch Virus nuclear-inclusion-a endopeptidase (TEV), which specifically cleaves at the recognition site consensus sequence EXLYΦQ\φ (most commonly ENLYFQ(G/S)). In some embodiments, φ may denote the initial glycine (G) residue of the CRM197 polypeptide.
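Tag removal at a protease recognition site can be pictured with a small Python sketch. It assumes that cleavage occurs immediately C-terminal to the lysine of DDDDK, as is generally described for enteropeptidase; the function name `cleave_after_site` and the placeholder payload (a G followed by X residues standing in for the CRM197 polypeptide) are hypothetical and carry no real sequence information.

```python
ENTEROPEPTIDASE_SITE = "DDDDK"  # recognition site named in the text


def cleave_after_site(protein: str, site: str = ENTEROPEPTIDASE_SITE) -> tuple[str, str]:
    """Split `protein` immediately after the first occurrence of `site`.

    Returns (N-terminal fragment including the site, C-terminal fragment).
    Raises ValueError if the site is absent.
    """
    start = protein.find(site)
    if start < 0:
        raise ValueError(f"recognition site {site!r} not found")
    cut = start + len(site)
    return protein[:cut], protein[cut:]


TAG = "WSHPQFEK"                      # tag sequence from the text
LINKER = "GA" + ENTEROPEPTIDASE_SITE  # example linker GADDDDK from the text
PAYLOAD = "G" + "X" * 9               # hypothetical stand-in for the CRM197 polypeptide

released_tag, released_payload = cleave_after_site(TAG + LINKER + PAYLOAD)
print(released_tag)
print(released_payload)
```

Under this assumption the released payload begins at its own first residue, with no linker residues left attached. The sketch is purely a string operation and does not model protease specificity or efficiency.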

Suitable linkers may comprise a protease recognition site and a dipeptide, such as GA. For example, a linker may comprise the amino acid sequence GADDDDK or a variant thereof.

In some embodiments, the CRM197 fusion protein may further comprise an initiation methionine at its N terminus that is connected to the tag sequence via a dipeptide, such as AS.
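The N-terminal arrangement described in the preceding paragraphs (initiation methionine, AS dipeptide, tag, GA dipeptide plus enteropeptidase site, then the CRM197 polypeptide) can be sketched in Python. The class name `FusionLayout` and its fields are illustrative assumptions, the CRM197 portion is passed in as an opaque string, and the result is not claimed to reproduce SEQ ID NO: 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionLayout:
    """Example N-terminal elements, using the example sequences given in the text."""
    start: str = "M"         # initiation methionine
    spacer: str = "AS"       # dipeptide connecting Met to the tag
    tag: str = "WSHPQFEK"    # tag sequence
    linker: str = "GADDDDK"  # dipeptide GA plus enteropeptidase site DDDDK

    def prefix(self) -> str:
        """Everything that precedes the CRM197 polypeptide, N- to C-terminal."""
        return self.start + self.spacer + self.tag + self.linker

    def assemble(self, crm197: str) -> str:
        """Place the prefix at the N terminus of the supplied polypeptide sequence."""
        if "C" in self.linker:
            # The text states a preference for linkers without cysteine residues.
            raise ValueError("linker should not contain cysteine")
        return self.prefix() + crm197


layout = FusionLayout()
print(layout.prefix())
print(len(layout.prefix()))
```

Keeping each element as a separate field makes it straightforward to swap in another linker or protease site and to check constraints such as the absence of cysteine in the linker.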

In some embodiments, the CRM197 fusion protein may further comprise a signal sequence. Signal sequences are short peptides which direct the translocation of a newly synthesised peptide (target protein) towards a specific cellular location. In prokaryotes, signal sequences may act to direct the target protein towards the periplasm, or towards a secretory pathway. Suitable signal sequences are well known in the art.

A CRM197 fusion protein as described herein comprising a CRM197 polypeptide and a tag consisting of the amino acid sequence WSHPQFEK is provided as another aspect of the invention. A suitable fusion protein may comprise the amino acid sequence of SEQ ID NO: 3 or a variant thereof.

An isolated nucleic acid molecule encoding a CRM197 fusion protein as described above and a vector comprising such a nucleic acid are also provided as aspects of the invention. A nucleic acid may, for example, encode a CRM197 fusion protein comprising a CRM197 polypeptide and a tag consisting of the amino acid sequence WSHPQFEK. A suitable nucleic acid may comprise the nucleotide sequence of SEQ ID NO: 4 or a variant thereof.

A variant of a reference nucleotide sequence set out herein, such as a reference CRM197 coding sequence, may comprise a nucleotide sequence having at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% sequence identity to the reference sequence. Particular nucleotide sequence variants may differ from a reference sequence shown herein by insertion, addition, substitution or deletion of 1 nucleotide, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more than 10 nucleotides. Nucleic acid molecules may comprise DNA and/or RNA and may be partially or wholly synthetic. Reference to a nucleotide sequence as set out herein encompasses a DNA molecule with the specified sequence, and encompasses an RNA molecule with the specified sequence in which U is substituted for T, unless context requires otherwise.

A nucleic acid may be codon optimised for expression in a prokaryotic system, such as E. coli.

Further provided are constructs in the form of vectors (e.g. expression vectors), or transcription or expression cassettes which comprise such nucleic acids. Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator sequences, polyadenylation sequences, enhancer sequences, marker genes, origins of replication and other sequences as appropriate. A vector will typically contain expression control sequences compatible with the prokaryotic host cell (e.g., an origin of replication). In addition, any number of a variety of well-known promoters will be present, such as the lactose promoter system, a tryptophan (trp) promoter system, a beta-lactamase promoter system, or a promoter system from phage lambda. The promoters will typically control expression, optionally with an operator sequence, and have ribosome binding site sequences and the like, for initiating and completing transcription and translation. Vectors for use in prokaryotic cells may also require an origin of replication component.

Vectors may be plasmids, e.g. phagemid, or viral, e.g. phage, as appropriate. For further details see, for example, Sambrook & Russell (2001) Molecular Cloning: a Laboratory Manual: 3rd edition, Cold Spring Harbor Laboratory Press. Many known techniques and protocols for manipulation of nucleic acid, for example in the preparation of nucleic acid constructs, mutagenesis, sequencing, introduction of DNA into cells and gene expression, and analysis of proteins, are described in detail in Ausubel et al. (1999) 4th ed., Short Protocols in Molecular Biology: A Compendium of Methods from Current Protocols in Molecular Biology, John Wiley & Sons.

Preferred vectors of the invention may comprise a promoter sequence for the T7 polymerase. T7 is a strong promoter that provides high expression levels, and is specific to the T7 RNA polymerase, which has a very low error rate. Vectors comprising a T7 promoter sequence are readily available in the art.

Preferred vectors of the invention may further comprise a lac promoter-operator sequence. This may be useful in preventing basal expression from the T7 promoter.

A method of producing a CRM197 fusion protein as described herein may comprise introducing a nucleic acid or vector as described herein into a prokaryotic cell. For example, the vector or nucleic acid may be transformed into a prokaryotic cell in which the vector is functional.

Techniques for the introduction of nucleic acid into prokaryotic cells (generally referred to without limitation as “transformation”) are well established in the art and any suitable technique may be employed, in accordance with the circumstances. Suitable techniques may include calcium chloride transformation, electroporation and transfection using bacteriophage (see for example, Sambrook & Russell (2001) supra; Ausubel et al. (1999) supra). In addition, the vector may comprise a viral vector, such as an adenovirus, retrovirus, lentivirus, adeno-associated virus, baculovirus, vaccinia virus or herpes simplex virus vector. A viral vector may comprise a viral particle, comprising a nucleic acid and one or more viral proteins.

The introduced nucleic acid may be on an extra-chromosomal vector within the cell or the nucleic acid may be integrated into the genome of the host cell. Integration may be promoted by inclusion of sequences within the nucleic acid or vector which promote recombination with the genome, in accordance with standard techniques.

Marker genes such as antibiotic resistance or sensitivity genes may be used in identifying clones containing nucleic acid of interest, as is well-known in the art.

A range of host cells suitable for the production of recombinant CRM197 fusion proteins are known in the art. Suitable prokaryotic cells may include bacterial cells, such as Escherichia coli, Lactococcus lactis; bacilli, such as Bacillus subtilis, and other enterobacteriaceae, such as Salmonella, Serratia, and various Pseudomonas species.

In preferred embodiments, the prokaryotic cell may be an Escherichia coli cell, such as a BL21(DE3) strain E. coli cell.

Preferably, the prokaryotic cell does not naturally express CRM197. For example, the prokaryotic cell may be other than Corynebacterium diphtheriae.

Suitable prokaryotic cells may be non-pathogenic cells. Suitable non-pathogenic organisms may be handled at biosafety level 1 (BSL-1) or biosafety level 2 (BSL-2) (as defined by EU Directive 2000/54/EC).

In some embodiments, the prokaryotic cell may be deficient in disulphide reductase. For example, the prokaryotic cell may be a disulphide reductase-deficient E. coli cell.

Preferably, the prokaryotic cell does not express heterologous or recombinant chaperones.

A recombinant prokaryotic cell comprising a nucleic acid or vector that expresses a CRM197 fusion protein as described above is also provided by the invention.

The introduction may be followed by expression of the nucleic acid encoding the CRM197 fusion protein in the prokaryotic cell to produce the encoded CRM197 fusion protein. In some embodiments, the prokaryotic cell (which may include cells actually transformed although more likely the cells will be descendants of the transformed cells) may be cultured in vitro under conditions for expression of the nucleic acid, so that the encoded CRM197 fusion protein is produced. Suitable conditions are well known in the art. When an inducible promoter is used, expression may require the activation of the inducible promoter. For example, a bacterial strain, such as E. coli BL21 (DE3), which comprises T7 polymerase linked to an inducible lac UV5 promoter, may be used. The nucleic acid encoding the CRM197 fusion protein may be expressed in the cytoplasm of the prokaryotic cell to produce the CRM197 fusion protein in a soluble form. Preferably, the CRM197 fusion protein is expressed in the soluble cell fraction in its correctly folded conformation. For example, the CRM197 fusion protein may be correctly folded and may be present in a soluble cell fraction and not associated with inclusion bodies in the prokaryotic cell. Preferably, the CRM197 fusion protein is not expressed in the periplasm of the prokaryotic cell.

The CRM197 fusion protein may be produced using batch, fed batch or continuous cell culture bioprocesses well known in the art. In some embodiments, the CRM197 fusion protein is produced with a soluble protein yield of up to 12 mg/L, 24 mg/L, 48 mg/L, 98 mg/L, 100 mg/L, 300 mg/L, 500 mg/L, 1 g/L, 3 g/L or 5 g/L of cell culture material.

The expressed CRM197 fusion protein may be isolated, recovered and/or purified, after production. This may be achieved using any convenient method known in the art. Preferably, the CRM197 fusion protein may be isolated under physiological conditions. Suitable techniques for the purification of recombinant polypeptides include discontinuous batch purification or continuous purification methods and are well known in the art. For example, HPLC, FPLC, ion exchange (IEX), cation exchange (CEX), anion exchange (AEX), hydroxyapatite (HAC), hydrophobic interaction (HIC), mixed mode (MM) or affinity chromatography methods may be employed.

In some embodiments, the CRM197 fusion protein may be isolated, recovered and/or purified using heparin or heparin-like affinity chromatography methods that are well known in the art. Suitable chromatography resins for use in such methods may for example comprise functional sulphate groups, such as dextran sulphate or sulphate esters.

In other embodiments, the CRM197 fusion protein may be isolated, recovered and/or purified using anion exchange chromatography methods that are well known in the art. Suitable anion exchange chromatography resins may for example comprise functionalised diethylaminoethyl (DEAE), trimethylaminoethyl (TMAE), quaternary aminoethyl (QAE) or quaternary amine (Q) groups.

In some preferred embodiments, the CRM197 fusion protein may be purified using the tag, for example by affinity chromatography methods. A solid support, such as a chromatography material may be used to isolate the CRM197 fusion protein. For example, the solid support may comprise an affinity chromatography material bearing a functional group or ligand which binds specifically to the tag. Suitable functional groups or ligands may include streptavidin, Strep-Tactin ® and variants thereof. Suitable affinity chromatography materials may include affinity chromatography membranes and affinity chromatography columns.

For example, a method may comprise preparing a cell lysate comprising the expressed CRM197 fusion protein and contacting the cell lysate with a chromatography material, such as an affinity chromatography material, such that the CRM197 fusion protein in the lysate binds to the chromatography material. The chromatography material may specifically bind to the tag of the CRM197 fusion protein. In some embodiments, the soluble fraction and/or supernatant of a cell lysate may be contacted with the chromatography material. The method may further comprise eluting the CRM197 fusion protein from the chromatography material to produce the isolated CRM197 fusion protein. Suitable methods of elution are well known in the art.

Following elution, the isolated CRM197 fusion protein may be resuspended or formulated into any appropriate buffer. The isolated CRM197 fusion protein may be used in a range of applications.

In some embodiments, the tag may be removed from the CRM197 fusion protein following isolation to produce an isolated CRM197 polypeptide. For example, the fusion protein may be contacted with a protease that cleaves the CRM197 fusion protein at the protease recognition site to separate the CRM197 polypeptide from the tag. In some embodiments, the protease is enteropeptidase. The use of enteropeptidase (also called enterokinase) to cleave peptide sequences is well known in the art. Following cleavage, the CRM197 polypeptide may be isolated or purified from the tag. Suitable techniques, such as HPLC or FPLC, are well known in the art.

CRM197 polypeptides and fusion proteins produced as described herein may sustain immune responses comparable to CRM197 produced at commercial scale in C. diphtheriae. Suitable techniques for assessing immune responses are well known in the art, and are described in more detail below.

CRM197 polypeptides and fusion proteins produced as described herein may bind to heparin binding EGF-like growth factor (HB-EGF) with a binding affinity comparable to CRM197 produced at commercial scale in C. diphtheriae.

The CRM197 polypeptides and fusion proteins may therefore be useful as protein carriers for example in therapeutic compositions.

Following expression as described herein, a CRM197 polypeptide or the CRM197 fusion protein may be conjugated with a functional moiety, such as an antigen, drug or detectable label and/or formulated with a pharmaceutically acceptable excipient.

Suitable functional moieties and excipients are well known in the art.

A pharmaceutical composition, comprising a CRM197 fusion protein and a pharmaceutically acceptable excipient is further contemplated.

Other aspects and embodiments of the invention provide the aspects and embodiments described above with the term “comprising” replaced by the term “consisting of” and the aspects and embodiments described above with the term “comprising” replaced by the term “consisting essentially of”.

It is to be understood that the application discloses all combinations of any of the aspects and embodiments described above with each other, unless the context demands otherwise. Similarly, the application discloses all combinations of the preferred and/or optional features either singly or together with any of the other aspects, unless the context demands otherwise. Modifications of the above embodiments, further embodiments and modifications thereof will be apparent to the skilled person on reading this disclosure, and as such, these are within the scope of the present invention.

All documents and sequence database entries mentioned in this specification are incorporated herein by reference in their entirety for all purposes.

The term “and/or” where used herein is to be taken as specific disclosure of each of the two specified features or components with or without the other. For example, “A and/or B” is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each is set out individually herein.

Experiments

Experiment 1 - expression of soluble CRM197 in E. coli and its purification in non-denaturing conditions

The artificial codon-optimized sequence of CRM197 was cloned by In-Fusion cloning into pET-51b(+) in position 5010-6618 bp, by which the full MCS (except for the first nucleotide, which is common to the EK site), linker region and 10xHis tag were removed. The final plasmid (pET-IA1788) generates a fusion protein bearing an initiation methionine, followed by a dipeptide linker, the Strep-tag ® II, another dipeptide linker, and the enteropeptidase recognition site, followed by CRM197 (Figure 1). This fusion protein is herein referred to as iCRM197. Plasmid stocks exist in NZ5α cells (NZYTech), and these plasmids were used to transform BL21(DE3) cells.

For the expression protocol, all growths are performed in standard Erlenmeyer-type flasks, using orbital shakers with adjustable temperature. Cultures are grown in Luria broth (Miller’s formulation) low salt medium (Sigma-Aldrich, Cat# L3397) with (solid) or without (liquid) 10 g/L agar. Growth medium is always supplemented with ampicillin or carbenicillin (at 100 µg/mL, the latter for liquid media) to maintain selective pressure.

Growth for protein expression can be performed directly from frozen vials of pET-IA1788 transformants. Bacteria can be recovered from the glycerol stocks by removing a portion of the upper surface of the frozen stock with a pipette and using it to inoculate a starter culture. This culture is then grown overnight, at 37 °C, under 150-250 rpm agitation. The starter culture is then used to inoculate (at 1:50 inoculum) carbenicillin-supplemented LB. The ratio of liquid to flask capacity should be up to 25%. This culture is maintained at 37 °C under mild-shaking conditions (up to 180 rpm) until OD600 reaches mid-log phase (0.5-0.7), at which point cultures are removed from the shaker, and IPTG is added to the medium. Cultures are then incubated at 16-18 °C, for another 5 h, at 120 rpm. Collection of bacterial cells is performed at 4 °C, at 8000 × g, for 20 min. The pellet can be frozen at -20 °C until the protocol for isolation of the soluble fraction can be performed.
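The volume relationships in the growth protocol above can be sketched as a small Python helper. This is only an illustrative sketch: the function names are hypothetical, and it assumes that the "1:50 inoculum" means a starter volume equal to one fiftieth of the expression culture volume.

```python
def expression_culture_plan(flask_capacity_ml, fill_fraction=0.25, inoculum_dilution=50):
    """Return the working culture volume and starter inoculum for one flask.

    Assumptions (not stated explicitly in the protocol):
    - "1:50 inoculum" is read as starter volume = culture volume / 50.
    """
    # The protocol limits the liquid to at most 25% of the flask capacity.
    if not 0 < fill_fraction <= 0.25:
        raise ValueError("liquid should fill at most 25% of the flask capacity")
    culture_ml = flask_capacity_ml * fill_fraction
    starter_ml = culture_ml / inoculum_dilution
    return {"culture_ml": culture_ml, "starter_ml": starter_ml}


def ready_for_induction(od600, low=0.5, high=0.7):
    """True when OD600 lies in the mid-log window (0.5-0.7) given in the protocol."""
    return low <= od600 <= high


if __name__ == "__main__":
    plan = expression_culture_plan(2000)  # a 2 L flask, chosen only as an example
    print(plan)
    print(ready_for_induction(0.6))
```

Under these assumptions, a 2 L flask would hold at most 500 mL of culture, inoculated with 10 mL of overnight starter culture.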

For the isolation of soluble fractions, all flasks must be kept on ice. When it is necessary to store solutions overnight, and to avoid freeze-thaw cycles, flasks must be kept on ice, and the containers lidded and placed inside a cold-chamber (room temperature of 4 °C). Isolation of whole protein content of the soluble fraction is performed by resuspending the bacterial pellet directly in a standard binding buffer (100 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA, pH 8) appropriate for purification using a Strep-Tactin ® High-capacity Fast Flow column. To process bacterial pellet corresponding to up to 500 mL of culture, 10 mL of buffer solution should be used. The cell suspension is then sonicated, at 14 mV, for 4 min (20 s ON + 20 s OFF). The preparation is then centrifuged at 18000 × g, for 30 min, at 4 °C. The supernatant (hereafter named “soluble fraction”) is then kept on ice for purification of iCRM197. When required, DNase treatment can be performed, but DNA contamination is not common in this protocol.
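As a worked example of the binding buffer composition given above, the following Python sketch converts the stated concentrations into weighed masses for a chosen volume. The choice of reagent forms (Tris base titrated to pH 8 with HCl, EDTA as the disodium dihydrate salt) is an assumption made for illustration; the protocol itself only specifies the final concentrations and pH.

```python
# Molar masses in g/mol for the reagent forms assumed here (not specified in the protocol).
MOLAR_MASS_G_PER_MOL = {
    "Tris base": 121.14,
    "NaCl": 58.44,
    "EDTA disodium dihydrate": 372.24,
}

# Final concentrations in mM, taken from the binding buffer described in the text.
BINDING_BUFFER_MM = {
    "Tris base": 100.0,
    "NaCl": 150.0,
    "EDTA disodium dihydrate": 1.0,
}


def grams_needed(volume_l, recipe_mm=BINDING_BUFFER_MM):
    """Return the mass in grams of each component for the requested buffer volume."""
    return {
        name: conc_mm / 1000.0 * volume_l * MOLAR_MASS_G_PER_MOL[name]
        for name, conc_mm in recipe_mm.items()
    }


if __name__ == "__main__":
    for name, grams in grams_needed(1.0).items():
        print(f"{name}: {grams:.3f} g per litre")
    # The pH still has to be adjusted to 8 after dissolving the components.
```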

The soluble fraction is loaded onto a pre-equilibrated Strep-Tactin ® column, until the volume of flow-through matches the volume of column input, and the flow-through is fully collected in a Falcon tube that is kept on ice. At this point, 6 column volumes of binding buffer are used to wash off unbound proteins (and the fractions are collected into a new collection tube). Bound iCRM197 is then eluted with 6 column volumes of elution buffer (similar to binding buffer, but with 2.5 mM of desthiobiotin). SDS-PAGE can be performed with up to 10 µg of protein content from each of these fractions to confirm protein presence and stability (by comparing reduced samples with non-reduced samples).
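The wash and elution steps above scale with the column volume, which can be sketched as follows. The function name and the example column volume are hypothetical; the molar mass of desthiobiotin (214.26 g/mol) is used to convert the 2.5 mM concentration into a weighed amount.

```python
DESTHIOBIOTIN_G_PER_MOL = 214.26  # molar mass of desthiobiotin


def strep_tactin_run(column_volume_ml, wash_cv=6, elution_cv=6, desthiobiotin_mm=2.5):
    """Return wash and elution volumes plus the desthiobiotin needed for the elution buffer.

    Defaults follow the protocol text: 6 column volumes of wash, 6 column
    volumes of elution buffer containing 2.5 mM desthiobiotin.
    """
    wash_ml = wash_cv * column_volume_ml
    elution_ml = elution_cv * column_volume_ml
    # mM * L gives mmol; mmol * g/mol gives mg.
    desthiobiotin_mg = desthiobiotin_mm * (elution_ml / 1000.0) * DESTHIOBIOTIN_G_PER_MOL
    return {
        "wash_ml": wash_ml,
        "elution_ml": elution_ml,
        "desthiobiotin_mg": desthiobiotin_mg,
    }


if __name__ == "__main__":
    # A 5 mL column is used purely as an example; the text does not state the column size.
    print(strep_tactin_run(5.0))
```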

Cleavage of the Strep-tag ® II can be performed using any commercially available EK digestion kit. In our hands, EK digestion was performed using a commercial enteropeptidase obtained from Abcam (Cat# ab2007001), according to the manufacturer’s instructions, at 20 °C, for 4 h. iCRM197 can remain stable in the elution buffer for a couple of days if it remains on ice. If the protein will not be used during this interval or if another buffer is required, then it can be dialyzed and concentrated as needed, using routine concentrators with 10K-30K MWCO, or even an ammonium sulphate precipitation method, depending on the available conditions or the downstream requirements.

Expression of soluble CRM197 with an N-terminal Strep-tag ® II bearing an enteropeptidase cleavage site in the linker region is successful even using the most standard BL21(DE3) E. coli protein expressing strain, at 4-6 h post-induction, at 18 °C and low agitation, in LB (Luria's formulation) medium. Longer induction times did not substantially improve soluble protein expression levels, although that did not seem to be related to cell death, but instead to the cultures reaching saturation point (as observed by monitoring the turbidity of the culture). In all the batches of fusion protein produced, purification from the soluble fraction using a single affinity column was enough to obtain over 90% purity. Even in the purification assays where the binding capacity of the column was exceeded, and some soluble protein was collected in the flow-through and the wash fractions, it was possible to capture this “excess” protein simply by re-injecting these fractions into a newly recharged column (Figure 2A, 2B).

Moreover, we demonstrate that the protein produced in this manner does not seem to suffer proteolytic degradation or aggregation, as confirmed by SDS-PAGE analysis under denaturing versus non-denaturing conditions (Figure 2C). Although the tag does not interfere with the downstream assays shown here, we also demonstrate that it can be cleaved. For separation of untagged protein from tagged protein, it is possible to perform a reverse-affinity column, meaning that the tagged protein will remain bound to the Strep-Tactin ® column, but the “clean” CRM197 will be collected in the flow-through. In this way, even if tag-removal is required, the whole purification process can be performed using only one type of purification column (and indeed, the same column can be used, after sanitization).

Experiment 2 - Biophysical characterisation & Cys-directed modification of CRM197

The protein iCRM197 appears as a ~60 kDa band on an SDS-PAGE gel (consistent with the theoretical mass of 60459 Da), both under reducing and non-reducing conditions (Figure 2C). Analysis under non-reducing conditions allows detection of possible covalent and noncovalent aggregation in the samples, as well as of impurities. Gels run in reducing conditions allow detection of the proteolytically nicked form (if any) of the protein. Post-purification aggregation was not detected in any of the batches. The only lower weight band present in this particular gel (Figure 2C) corresponds to the protease used for the digestion, which was later removed in the polishing step.

Intact mass of iCRM197 was further confirmed by LC-MS (Figure 3B). Protein identity was routinely confirmed by western-blot assays performed using samples collected at different stages of the purification protocol. Membranes were probed with an antibody against Strep-tag ® II, as well as an antibody against full-length CRM197 (Figure 3A).

Characterization of protein secondary structure was performed through circular dichroism analysis, by directly comparing iCRM197 and the commercial protein control (COM-CRM197, Sigma-Aldrich, Cat# 322327) tested in the same experimental conditions. Quantification of the prevalence of secondary structures was done by direct comparison of the CD spectra (Figure 3C), as well as by blind-analysis of the molar ellipticity using two different online CD analysis tools: Capito (Wiedemann, Bellstedt and Gorlach, (2013), Bioinformatics 29(1)1750-1757) and K2D3 (Louis-Jeune, Andrade-Navarro and Perez-Iratxeta (2012), Proteins: Struct., Funct. and Bioinf. 80(1)374-381). Both use the spectra of proteins that have been characterized by X-ray crystallography as standards. Capito is a web-tool for estimating secondary structure content and analysing far-UV CD data based on a selected set of far-UV CD data as available from the PCDDB, and is especially suited for analysis of mutants of the same protein in multiple conditions. Alternatively, K2D3 is a tool that includes data not just from experimental far-UV CD data available in the PCDDB databases, but also uses DichroCalc to calculate the theoretical CD spectra of a non-redundant set of structures, thus adding to this database. It is described as particularly useful for analysis of proteins with a high percentage of beta-sheets in their structure (or less globular proteins, which is the case with CRM197). Direct analysis of the far-UV spectra did not reveal significant differences between the CRM197 standard and either of the variants produced in this work. Quantification of the percentage of helical content, beta-strands, and irregular structures confirms this observation (Figure 3D). Quantification of the beta-strand content using both tools provides different values. However, percentages of the categories are maintained when comparing the three proteins amongst each other, using the same tool.

Native CRM197 has been shown to be amenable to Cys-directed reactions. Therefore, thiol availability was assessed for iCRM197. Ellman’s reaction led to a change in the colour of the protein solution. We confirmed by LC-MS that the TCEP-reduced iCRM197 protein readily reacts with Ellman’s reagent, incorporating two (most prevalent) or four modifications (Figure 3E), a result confirmed in four independent experiments. iCRM197 was also tested for its malleability for modification with a carbonylacrylic reagent (Bernardim et al. (2016), Nat. Comms. 7(1)13128). These reagents readily and specifically react with thiols in an irreversible manner, thus creating conjugates that are highly stable in plasma, making them ideal for production of bioconjugates. Regardless of the number of equivalents of reagent used for the reaction, or its duration, the only detectable peak (other than that of the unmodified protein) corresponds to the mass of iCRM197 with four additions of CAA-NEt (Figure 3F). At 37 °C, there was clear degradation of the protein, as there was when the protein was allowed to react at 25 °C overnight. It is possible that allowing the protein to react for longer than 4 h, but less than 16 h (corresponding to the overnight reaction), would improve conversion without noticeable degradation of the protein carrier, but this is yet to be tested. Another possibility for improving protein stability is the addition of stabilizing molecules to the solution. However, this would require confirmation that these do not affect the reaction with CAA-NEt. Finally, it is also possible that increasing TCEP-reduction time would improve the outcome of the conjugation reaction. Nonetheless, the goal at this stage was to show that the protein is amenable to Cys-directed modifications, which was achieved.
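The assignment of LC-MS peaks to a number of thiol modifications, as described above, can be sketched in Python. This is an illustrative sketch only: the function names and the mass tolerance are hypothetical, and the per-modification mass shift is left as a parameter because the adduct masses of the reagents are not stated in the text. The value 100.0 Da used in the example is a placeholder, not the mass of either reagent.

```python
ICRM197_THEORETICAL_DA = 60459.0  # theoretical mass of iCRM197 given in the text


def expected_masses(base_da, adduct_da, max_additions=4):
    """Map the number of modifications (0..max_additions) to the expected intact mass."""
    return {n: base_da + n * adduct_da for n in range(max_additions + 1)}


def assign_peak(observed_da, base_da, adduct_da, max_additions=4, tol_da=2.0):
    """Return the number of modifications whose expected mass is closest to the peak.

    Returns None if no expected mass lies within tol_da (a hypothetical tolerance).
    """
    table = expected_masses(base_da, adduct_da, max_additions)
    n, mass = min(table.items(), key=lambda item: abs(item[1] - observed_da))
    return n if abs(mass - observed_da) <= tol_da else None


if __name__ == "__main__":
    placeholder_adduct_da = 100.0  # placeholder value, NOT a real reagent mass
    print(expected_masses(ICRM197_THEORETICAL_DA, placeholder_adduct_da))
    print(assign_peak(60659.5, ICRM197_THEORETICAL_DA, placeholder_adduct_da))
```

With the real mass shift of the reagent substituted for the placeholder, the same lookup would distinguish, for example, the two-fold and four-fold modified species reported for Ellman's reagent.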

Experiment 3 - the role of iCRM197 as a protein carrier

For iCRM197 to be able to function as a protein carrier, it is first essential that it is internalized by B cells. To assess this, it was necessary to be able to track the protein inside live cells. Thus, iCRM197 and the corresponding commercial control were modified with a carbonylacrylic reagent bearing an alkyne moiety (Bernardim et al. (2016), Nat. Comms. 7(1)13128), after which these modified proteins were conjugated by click chemistry to a fluorescent molecule (Figure 4). The addition of the fluorophore was shown to be specific for the CAA-alkyne modified proteins, as shown by detection of fluorescence on an SDS-PAGE gel (Figure 4).

Both proteins were tagged using the same reaction conditions, and these now fluorescent proteins were incubated with Raji cells for up to 2 h. Fluorescence inside the cells could already be detected 30 min after the start of incubation, both by confocal microscopy (Figure 5A) and imaging flow-cytometry (Figure 5B). Overall, this data indicates internalization of iCRM197, with internalization kinetics identical to those of wild-type (WT) CRM197, as has also been previously described for this protein (Lai and Schreiber (2009), Vaccine 27(1)3137-3144). However, for a protein to be useful as a protein carrier, it is essential that in vivo it has the capability not only to trigger antibody production, but also to be successfully internalized, thus activating T-cell dependent responses, such as generation of immunological memory. To test this, groups of 4 BALB/c mice were immunized with either a protein solution, or the same volume of vehicle. Each group received 3 injections, 2 weeks apart, and sera were collected every two weeks after each immunization to determine total CRM197-specific IgG production. ELISA results show that iCRM197 is comparable to CRM197 in terms of specific antibody responses (Figure 5C, 5D). Moreover, a booster effect is observable, which confirms its capacity to trigger immunological memory (Figure 5C, 5D). Taken together, this data shows that CRM197 variants produced as described herein can induce not only specific antibody responses, but also generation of immunological memory, both essential features of a vaccine protein carrier.

Reference Sequences