Title:
METHODS AND COMPOSITIONS FOR LECTIN PRODUCTION
Document Type and Number:
WIPO Patent Application WO/2019/170892
Kind Code:
A1
Abstract:
Methods for the production of recombinant lectin proteins in yeast are provided. The methods comprise transforming a Pichia pastoris yeast culture with a nucleic acid expression vector that comprises a gene encoding a lectin polypeptide, wherein the lectin gene is fused to a nucleic acid sequence that encodes a Saccharomyces α-factor prepro-peptide, and wherein the expression of the lectin polypeptide is under the control of a constitutive promoter present in the expression vector; and maintaining the yeast culture under conditions that support expression of the polypeptide, followed by isolating the recombinant protein from the yeast culture. The methods are suitable for the production of recombinant phytohaemagglutinin (PHA) lectin, particularly PHA-L and PHA-E.

Inventors:
MANTALIDI ANASTASIA (CH)
LOCKETT ANTHONY (CH)
Application Number:
PCT/EP2019/055915
Publication Date:
September 12, 2019
Filing Date:
March 08, 2019
Assignee:
SYNDERMIX AG (CH)
International Classes:
C12N15/81
Domestic Patent References:
WO2012161956A22012-11-29
Other References:
ROMAAN J. M. RAEMAEKERS ET AL: "Functional phytohemagglutinin (PHA) and Galanthus nivalis agglutinin (GNA) expressed in Pichia pastoris . Correct N-terminal processing and secretion of heterologous proteins expressed using the PHA-E signal peptide", EUROPEAN JOURNAL OF BIOCHEMISTRY, vol. 265, no. 1, 27 July 1999 (1999-07-27), GB, pages 394 - 403, XP055583075, ISSN: 0014-2956, DOI: 10.1046/j.1432-1327.1999.00749.x
BAUMGARTNER PHILIPPE ET AL: "Large-scale production, purification, and characterisation of recombinant Phaseolus vulgaris phytohemagglutinin E-form expressed in the methylotrophic yeast Pichia pastoris.", PROTEIN EXPRESSION AND PURIFICATION, vol. 26, no. 3, December 2002 (2002-12-01), pages 394 - 405, XP002791152, ISSN: 1046-5928
Attorney, Agent or Firm:
CREASE, Devanand et al. (GB)
Claims:
WHAT IS CLAIMED IS:

1. A method for the production of recombinant phytohaemagglutinin (PHA) lectin protein comprising: a. transforming a Pichia pastoris yeast culture with a nucleic acid expression vector that comprises a gene encoding a PHA polypeptide, wherein the PHA gene is fused to a nucleic acid sequence that encodes a Saccharomyces α-factor prepro-peptide, and wherein the expression of the PHA polypeptide is under the control of a constitutive promoter present in the expression vector; b. maintaining the yeast culture under conditions that support expression of the polypeptide; and c. isolating the recombinant PHA protein from the yeast culture.

2. The method of claim 1, wherein the constitutive promoter is selected from the group consisting of: GAP; TEF1; PGK1; TPI1; YPT1 and PGCW14.

3. The method of any one of claims 1 to 2, wherein the PHA protein is isolated from a supernatant of the yeast culture; or wherein the PHA protein is isolated from a cellular extract obtained from the yeast culture.

4. The method of any one of claims 1 to 3, wherein the PHA polypeptide is selected from: PHA-L; and PHA-E.

5. The method of any one of claims 1 to 4, wherein the PHA protein is a tetrameric PHA selected from the group consisting of: [(PHA-E)4]; [(PHA-E)3(PHA-L)1]; [(PHA-E)2(PHA-L)2]; [(PHA-E)1(PHA-L)3]; and [(PHA-L)4].

6. The method of claim 4 or 5, wherein PHA-L gene is obtained or derived from Phaseolus vulgaris.

7. The method of claim 6, wherein the PHA-L gene corresponds to SEQ ID NO: 11 or a homologue or derivative thereof.

8. The method of any one of claims 1 to 7, wherein the PHA protein comprises tetrameric PHA-L and is isolated from a supernatant of the yeast culture.

9. The method of any one of claims 7 and 8, wherein the isolated recombinant tetrameric PHA-L protein demonstrates mitogenic activity towards lymphocytes, suitably Xenopus laevis lymphocytes.

10. The method of claim 9, wherein the isolated recombinant tetrameric PHA-L protein demonstrates a level of mitogenic activity that is equivalent to that demonstrated for plant-derived native PHA-L protein.

11. The method of claims 1 to 10, wherein the method is performed in the absence of methanol.

12. A PHA protein produced by the method of any one of claims 1 to 11.

13. An expression vector for use in Pichia pastoris yeast culture, wherein the expression vector comprises a fusion gene encoding a PHA polypeptide, wherein the PHA gene is fused in-frame to a nucleic acid sequence that encodes a Saccharomyces α-factor prepro-peptide, and wherein the expression of the fusion gene is under the operative control of a constitutive promoter present in the expression vector.

14. The expression vector of claim 13, wherein the constitutive promoter is selected from the group consisting of: GAP; TEF1; PGK1; TPI1; YPT1 and PGCW14.

15. The expression vector of any one of claims 13 or 14, wherein the PHA polypeptide is selected from: PHA-L; and PHA-E.

16. The expression vector of any one of claims 13 to 15, wherein PHA gene is obtained or derived from Phaseolus vulgaris.

17. The expression vector of any one of claims 13 to 16, wherein the PHA gene corresponds to SEQ ID NO: 11 or a homologue or derivative thereof.

18. A cell culture comprising the expression vector of any one of claims 13 to 17.

19. The cell culture of claim 18, wherein the cells are Pichia pastoris yeast cells.

Description:
METHODS AND COMPOSITIONS FOR LECTIN PRODUCTION

FIELD OF THE INVENTION

The invention relates to the recombinant production of plant proteins, especially lectins, in yeast.

BACKGROUND OF THE INVENTION

Lectins are proteins that are found in a diversity of organisms and are highly specific for the reversible binding of carbohydrate moieties, which explains the diversity of reactions caused by lectins, including the agglutination of erythrocytes and the activation of lymphocytes and other cells.

Plants represent a common source of lectins, with the richest abundance occurring in the seeds. The diversity of biological functions of lectins in plants is not yet fully understood, but they are believed to offer protection to the plants from bacterial and fungal invasion, assist in plant growth and modulate plant metabolism. Plant lectins are also toxic to mammals, a feature that may protect the plant against ingestion.

The lectin-carbohydrate reaction finds multiple applications, for example as a biological assay tool and as a potential therapeutic agent. More specifically, lectins could be used as anticancer drugs as they have been shown to lead to apoptosis in different cancer cell lines. Examples include Korean mistletoe lectin, which has been shown to lead to apoptosis of human A253 cancer cells (Choi et al. 2004, Arch Pharm Res. 2004 Jan; 27(1):68-76.), Sophora flavescens lectin, which has been shown to inhibit the growth of HeLa cells (Liu et al. 2008, Phytomedicine. 2008 Oct; 15(10):867-75.) and French bean hemagglutinin, which has been shown to kill breast cancer MCF-7 cells (Lam et al. 2010, Acta Biochim Pol. 2009; 56(4):649-54.). The pathways leading to cell death are different but usually involve the activation of caspases, a family of protease enzymes involved in programmed cell death and inflammation.

Lectins also find application as antiviral drugs. Amongst many examples, gold coral lectin was reported to prevent infection of H9 cells with human immunodeficiency virus (HIV-1; Müller et al. 1988, J Acquir Immune Defic Syndr. 1988; 1(5):453-8.). The carbohydrate-binding lectins are believed to inhibit fusion of HIV-infected cells with CD4 cells by a carbohydrate-specific interaction with the HIV-infected cells (Hansen et al. 1989, AIDS. 1989 Oct; 3(10):635-41.). Similarly, Griffithsin, a lectin which can be isolated from red algae, may be used in the treatment of viral infections such as Zika. The lectin appears to inhibit flaviviral entry as it can crosslink mannose oligosaccharides found on the viral E glycoproteins (Alexandre et al. 2011, J Virol. 2011 Sep; 85(17): 9039-9050). Lectins have also been used in the cosmeceutical context. For example, International Patent Application No. WO-96/38162-A describes a method of using various lectins for prevention and treatment of skin diseases and disorders caused by bacteria, fungi, and viruses.

However, cosmetic compositions are difficult to standardise and can be prone to large batch-to-batch variations in the starting materials. Therefore, there is a need for biotechnologically derived ingredients which ensure that safety and efficacy can be guaranteed in the cosmetic product.

The common bean, Phaseolus vulgaris, contains genes encoding a number of different proteins belonging to the legume lectin family. The proteins expressed by the legume lectin family genes in P. vulgaris have differing functional activities, including inhibition of amylases as well as carbohydrate binding. The best characterised lectin proteins of Phaseolus vulgaris are the major seed lectins, which are derived from two genes, dlec1 and dlec2. These genes produce polypeptides designated PHA-E and PHA-L (from "phytohaemagglutinin"; PHA) respectively, in approximately equal amounts. PHA-E and PHA-L polypeptides form a tetrameric protein and are sufficiently similar that they can assemble with each other into the tetramers. Hence, the native PHA lectin purified from P. vulgaris seeds contains a mixture of five diverse isoforms, which can be written as [(PHA-E)4], [(PHA-E)3(PHA-L)1], [(PHA-E)2(PHA-L)2], [(PHA-E)1(PHA-L)3] and [(PHA-L)4].
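By way of illustration only, the five tetrameric isoforms named above can be enumerated programmatically. The following is a minimal sketch; the function names `pha_isoforms` and `random_assembly_fractions` are hypothetical and do not appear in this disclosure. The second function rests on a loudly stated assumption that is NOT made in this disclosure, namely that subunits assort randomly and independently into tetramers, and it is included only to show what relative isoform proportions such a simple binomial model would predict.

```python
from math import comb


def pha_isoforms(subunits: int = 4) -> list[str]:
    """List the isoform names from all PHA-E to all PHA-L subunits."""
    names = []
    for n_e in range(subunits, -1, -1):
        n_l = subunits - n_e
        parts = []
        if n_e:
            parts.append(f"(PHA-E){n_e}")
        if n_l:
            parts.append(f"(PHA-L){n_l}")
        names.append("[" + "".join(parts) + "]")
    return names


def random_assembly_fractions(p_e: float = 0.5, subunits: int = 4) -> list[float]:
    """Binomial fractions of each isoform, in the same order as pha_isoforms().

    ASSUMPTION (not from the source): subunits assort randomly and
    independently, with p_e the fraction of PHA-E in the subunit pool.
    """
    return [
        comb(subunits, n_e) * p_e**n_e * (1 - p_e) ** (subunits - n_e)
        for n_e in range(subunits, -1, -1)
    ]


if __name__ == "__main__":
    for name, fraction in zip(pha_isoforms(), random_assembly_fractions()):
        print(f"{name}: {fraction:.4f}")
```

Under this illustrative model with equal amounts of the two polypeptides, the homotetramer [(PHA-L)4] would make up only one sixteenth of the assembled tetramers; actual proportions in seed material may differ.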

PHA-E and PHA-L differ in their specificity of binding to carbohydrates. For example, there is a known difference in specificity of binding to the complex carbohydrates present on the surface of blood cells, such that PHA-E binds specifically to erythrocytes (red blood cells) whereas PHA-L binds poorly to erythrocytes but much more strongly to leucocytes (white blood cells). Consequently, a tetrameric isoform consisting of [(PHA-E)4] will agglutinate erythrocytes but not leucocytes, whereas a corresponding isoform consisting of [(PHA-L)4] agglutinates leucocytes but not erythrocytes; the other isoforms show a mixed agglutination specificity. It is presumed that these homologous genes arose through duplication and divergence of an ancestral PHA gene. The diversity in the biological properties may enable the native tetrameric complexes to exhibit diverse plant defence properties, thereby contributing to improved resistance to herbivores and pests. Hence, the heterogeneity of the PHA complex in P. vulgaris may be the result of positive selective pressure which favours the formation of heterotetramers over homotetramers.

PHA-L has been shown to have some useful therapeutic properties. International Patent Application No. WO-97/49420 describes chemoprotective effects upon small intestine tissue in rats when PHA-L is administered orally in combination with a high dose of the anti-cancer chemotherapeutic agent 5-fluorouracil. In WO-97/49420 the PHA-L was purified from native P. vulgaris plant material via a complex process requiring multiple steps of affinity chromatography and HPLC. Recovery yield of PHA-L from kidney bean meal starting material was barely over 0.6%.

Hence, due to technical difficulties, purification of [(PHA-L)4] from P. vulgaris is non-viable on anything but a very limited laboratory scale, usually microgram (µg) levels of production, even though the starting material is abundant. Recombinant expression of many eukaryotic lectins in a bacterial host such as E. coli is known to be unsuitable. Plant lectin products often do not fold correctly, are not glycosylated, or form complexes with the bacterial proteins, possibly due to their natural anti-bacterial role in plants. Plant lectins have been expressed as soluble functional proteins using the yeast Pichia pastoris as an expression host (Raemaekers et al., 1999, Eur. J Biochem. 265: 394-403). However, expression of lectin proteins such as PHA consistently, at commercially viable yields and at a sufficiently high grade to be used in pharmaceutical preparations, still remains a challenge.

It is desirable to provide an improved process for the manufacture of high quality pure functional recombinant lectin at commercial yields. It is an object of the present invention to overcome the disadvantages observed in the prior art.

These and other uses, features and advantages of the invention should be apparent to those skilled in the art from the teachings provided herein.

SUMMARY OF THE INVENTION

In a first aspect the invention provides a method for the production of recombinant phytohaemagglutinin (PHA) lectin protein comprising: a. transforming a Pichia pastoris yeast culture with a nucleic acid expression vector that comprises a gene encoding a PHA polypeptide, wherein the PHA gene is fused to a nucleic acid sequence that encodes a Saccharomyces α-factor prepro-peptide, and wherein the expression of the PHA polypeptide is under the control of a constitutive promoter present in the expression vector; b. maintaining the yeast culture under conditions that support expression of the polypeptide; and c. isolating the recombinant PHA protein from the yeast culture.

In specific embodiments the constitutive promoter is selected from the group consisting of: GAP; TEF1; PGK1; TPI1; YPT1 and PGCW14. Suitably synthetic constitutive promoters including core promoters may also be used. Typically, the PHA polypeptide is isolated from a supernatant of the yeast culture and/or isolated from a cellular extract obtained from the yeast culture.

In embodiments the PHA comprises a PHA-L and/or a PHA-E polypeptide.

In further embodiments the PHA lectin protein is a tetrameric PHA and may be selected from the group consisting of: [(PHA-E)4]; [(PHA-E)3(PHA-L)1]; [(PHA-E)2(PHA-L)2]; [(PHA-E)1(PHA-L)3]; and [(PHA-L)4].

Suitably, the PHA gene is obtained or derived from Phaseolus vulgaris. Optionally, the PHA gene is a PHA-L gene that corresponds to SEQ ID NO: 11 or a homologue or derivative thereof. Suitably the isolated recombinant tetrameric PHA protein is characterised as biologically functional by demonstrating mitogenic activity towards lymphocytes, suitably Xenopus laevis lymphocytes. Typically, the isolated recombinant PHA protein demonstrates a level of mitogenic activity that is equivalent to, or even greater than, that demonstrated for plant-derived native PHA-L protein.

In one embodiment of the invention the method is performed in the absence of methanol.

A second aspect of the invention provides for a PHA protein produced by the methods as described herein.

A third aspect of the invention provides an expression vector for use in Pichia pastoris yeast culture, wherein the expression vector comprises a fusion gene encoding a PHA polypeptide, wherein the PHA gene is fused in-frame to a nucleic acid sequence that encodes a Saccharomyces α-factor prepro-peptide, and wherein the expression of the fusion gene is under the operative control of a constitutive promoter present in the expression vector. In an embodiment of the invention, the constitutive promoter is selected from the group consisting of: GAP; TEF1; PGK1; TPI1; YPT1 and PGCW14.

A fourth aspect of the invention provides for a cell culture comprising an expression vector as described herein. Suitably, the cell culture comprises cells that are Pichia pastoris yeast cells.

It will be appreciated that the above statements are to be read in conjunction with the embodiments described in further detail below. Each embodiment of the invention may be utilised in isolation or in combination with other embodiments, unless otherwise specified.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is further illustrated with reference to the following drawings in which

Fig. 1 is a map of a pGAPZα expression vector of an embodiment of the present invention, showing the main features including the location of the constitutive promoter (GAP), the prepro sequence and the polycloning site.

Fig. 2 shows a Western blot of culture supernatant in which Pichia pastoris transformants are screened for expression of recombinant PHA-L.

Fig. 3 shows a Southern blot characterisation of DNA insertion in P. pastoris clone Pp(PHA-L). DNA extracted from Pp(PHA-L) was subjected to restriction digestion and Southern blotting; the blot is calibrated with gene copy equivalents. The blot was probed with the PHA-L coding sequence. Rsa I restriction fragment in yeast genomic DNA is circled; other digests gave only high mol. wt. fragments.

Figure 4 shows a graph of accumulation of recombinant PHA-L in separate yeast culture supernatants for 3, 4, 5 and 6 days. Cell extractions were performed on days 3 and 6 to provide comparison of PHA-L associated with the cells.

Figure 5 shows graphs of analytical gel filtration profiles for protein standards (panel A), PHA-L purified from plant tissues (Vector Labs, panel B) and recombinant PHA-L (panel C).

Figure 6 shows a graph of a mitogenic activity of recombinant PHA-L, as measured by a lymphocyte stimulation assay. A commercial mitogenic PHA preparation from plant tissues ("PHA-P", Sigma) was used as a positive control. Points and error bars show mean ± SE for 3 replicates at each concentration.

Figure 7 is a plasmid map of the pAVE1275 PHA-L secretion leader expression vector showing the location of the constitutive promoter (GAP), the prepro sequence and polycloning site.

Figure 8 is a plasmid map of the pAVE1274 PHA-L secretion leader expression vector showing the location of the constitutive promoter (GAP), the prepro sequence and polycloning site.

Figure 9 is Labchip data displaying elution fractions from culture supernatants using the pAVE1275 expression system.

Figure 10 is Labchip data displaying elution fractions from culture supernatants using the pAVE1274 expression productivity in the culture medium.

Figure 11 shows a chart of relative clonal productivity after 24 and 27 hours.

Figure 12 is a Labchip data output displaying the elution fractions from small scale protein purifications obtained from the pAVE1274 expression system.

DETAILED DESCRIPTION OF THE INVENTION

Unless otherwise indicated, the practice of the present invention employs conventional techniques of chemistry, molecular biology, microbiology, recombinant DNA technology, and chemical methods, which are within the capabilities of a person of ordinary skill in the art. Such techniques are also explained in the literature, for example, M.R. Green, J. Sambrook, 2012, Molecular Cloning: A Laboratory Manual, Fourth Edition, Books 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; Ausubel, F. M. et al. (1995 and periodic supplements; Current Protocols in Molecular Biology, ch. 9, 13, and 16, John Wiley & Sons, New York, N. Y.); B. Roe, J. Crabtree, and A. Kahn, 1996, DNA Isolation and Sequencing: Essential Techniques, John Wiley & Sons; J. M. Polak and James O'D. McGee, 1990, In Situ Hybridisation: Principles and Practice, Oxford University Press; M. J. Gait (Editor), 1984, Oligonucleotide Synthesis: A Practical Approach, IRL Press; and D. M. J. Lilley and J. E. Dahlberg, 1992, Methods of Enzymology: DNA Structure Part A: Synthesis and Physical Analysis of DNA Methods in Enzymology, Academic Press. Each of these general texts, as well as the others cited herein, is incorporated by reference. Prior to setting forth the invention, a number of definitions are provided that will assist in the understanding of the invention.

As used herein, the term ‘comprising’ means any of the recited elements are necessarily included and other elements may optionally be included as well. ‘Consisting essentially of’ means any recited elements are necessarily included, elements that would materially affect the basic and novel characteristics of the listed elements are excluded, and other elements may optionally be included. ‘Consisting of’ means that all elements other than those listed are excluded. Embodiments defined by each of these terms are within the scope of this invention.

The term ‘isolated’, when applied to a polynucleotide sequence, denotes that the sequence has been removed from its natural organism of origin and is, thus, free of extraneous or unwanted coding or regulatory sequences. The isolated sequence is suitable for use in recombinant DNA processes and within genetically engineered protein synthesis systems. Such isolated sequences include cDNAs, mRNAs and genomic clones. The isolated sequences may be limited to a protein encoding sequence only, or can also include 5’ and 3’ regulatory sequences such as promoters and transcriptional terminators.

A ‘polynucleotide’ is a single or double stranded covalently-linked sequence of nucleotides in which the 3' and 5' ends on each nucleotide are joined by phosphodiester bonds. The polynucleotide may be made up of deoxyribonucleotide bases or ribonucleotide bases. Polynucleotides include DNA and RNA, and may be manufactured synthetically in vitro or isolated from natural sources. Sizes of polynucleotides are typically expressed as the number of base pairs (bp) for double stranded polynucleotides, or in the case of single stranded polynucleotides as the number of nucleotides (nt). One thousand bp or nt equal a kilobase (kb). Polynucleotides of less than around 40 nucleotides in length are typically called ‘oligonucleotides’. The term ‘nucleic acid sequence’ as used herein, is a single or double stranded covalently-linked sequence of nucleotides in which the 3' and 5' ends on each nucleotide are joined by phosphodiester bonds. The polynucleotide may be made up of deoxyribonucleotide bases or ribonucleotide bases. Nucleic acid sequences may include DNA and RNA, and may be manufactured synthetically in vitro or isolated from natural sources. Sizes of nucleic acid sequences, also referred to herein as ‘polynucleotides’, are typically expressed as the number of base pairs (bp) for double stranded polynucleotides, or in the case of single stranded polynucleotides as the number of nucleotides (nt). One thousand bp or nt equal a kilobase (kb). Polynucleotides of less than around 40 nucleotides in length are typically called ‘oligonucleotides’ and may comprise primers for use in manipulation of DNA such as via polymerase chain reaction (PCR).

The term ‘nucleic acid’ as used herein, is a single or double stranded covalently-linked sequence of nucleotides in which the 3' and 5' ends on each nucleotide are joined by phosphodiester bonds. The polynucleotide may be made up of deoxyribonucleotide bases or ribonucleotide bases. Nucleic acids may include DNA and RNA, and may be manufactured synthetically in vitro or isolated from natural sources. Nucleic acids may further include modified DNA or RNA, for example DNA or RNA that has been methylated, or RNA that has been subject to post-transcriptional modification, for example 5’-capping with 7-methylguanosine, 3’-processing such as cleavage and polyadenylation, and splicing. Nucleic acids may also include synthetic nucleic acids (XNA), such as hexitol nucleic acid (HNA), cyclohexene nucleic acid (CeNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), locked nucleic acid (LNA) and peptide nucleic acid (PNA). Sizes of nucleic acids, also referred to herein as ‘polynucleotides’, are typically expressed as the number of base pairs (bp) for double stranded polynucleotides, or in the case of single stranded polynucleotides as the number of nucleotides (nt). One thousand bp or nt equal a kilobase (kb). Polynucleotides of less than around 100 nucleotides in length are typically called ‘oligonucleotides’ and may comprise primers for use in manipulation of DNA such as via polymerase chain reaction (PCR). In specific embodiments of the present invention the nucleic acid sequence comprises messenger RNA (mRNA).
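The size conventions in the foregoing definitions (bp for double stranded molecules, nt for single stranded molecules, one thousand bp or nt to the kilobase, and a length cut-off below which a molecule is called an oligonucleotide) can be sketched as a small helper. The function name `describe_size` and its parameters are hypothetical. The definitions above mention cut-offs of around 40 and around 100 nucleotides, so the cut-off is left as a parameter, with 40 chosen here as an arbitrary default.

```python
def describe_size(length: int, double_stranded: bool, oligo_cutoff: int = 40) -> str:
    """Describe a nucleic acid by size, following the conventions above.

    oligo_cutoff is an adjustable assumption: lengths below it are
    labelled 'oligonucleotide', all others 'polynucleotide'.
    """
    if length <= 0:
        raise ValueError("length must be a positive integer")
    unit = "bp" if double_stranded else "nt"
    if length >= 1000:
        # One thousand bp or nt equal one kilobase (kb).
        size = f"{length / 1000:g} kb"
    else:
        size = f"{length} {unit}"
    kind = "oligonucleotide" if length < oligo_cutoff else "polynucleotide"
    return f"{size} {kind}"


if __name__ == "__main__":
    print(describe_size(25, double_stranded=False))   # 25 nt oligonucleotide
    print(describe_size(1500, double_stranded=True))  # 1.5 kb polynucleotide
```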

The present invention also refers to homologues and homology. The term ‘homology’ as used herein refers in general terms to the existence of a shared ancestry between two polypeptides or proteins based on the amino acid/nucleotide sequence. Homology is inferred from the amino acid/nucleotide sequence similarity between the wild type polypeptide and another protein e.g. homologue.

Proteins are referred to as homologues if they have substantially similar sequence identity or homology to that of lectin proteins described herein. The term “substantially similar sequence identity” is used herein to denote a level of sequence similarity of from about 50%, 60%, 70%, 80%, 90%, 95% to about 99% identity. Percent sequence identity can be determined using conventional methods (Henikoff and Henikoff Proc. Natl. Acad. Sci. USA 1992; 89: 10915, and Altschul et al. Nucleic Acids Res. 1997; 25:3389-3402).
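As a simple illustration of a percent identity calculation, the sketch below scores two sequences that have already been aligned to equal length, with '-' marking gaps. This is not the method of the references cited above, which concern substitution matrices and database searching; it is only one common convention, assumed here, in which positions containing a gap in either sequence are excluded from the count. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    ASSUMPTION: both inputs are pre-aligned to the same length and use '-'
    for gaps; columns with a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences for illustration only; not taken from any PHA sequence.
    print(percent_identity("ACDE", "ACDF"))  # 75.0
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so a reported identity value should state which denominator was used.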

According to the present invention, homology to the nucleic acid sequences described herein is not limited simply to 100% sequence identity. Many nucleic acid sequences can demonstrate biochemical equivalence to each other despite having apparently low sequence identity. In the present invention homologous nucleic acid sequences are considered to be those that will hybridise to each other under conditions of low stringency (Sambrook J. et al, supra).

The term ‘operatively linked’, when applied to nucleic acid sequences, for example in an expression construct, indicates that the sequences are arranged so that they function cooperatively in order to achieve their intended purposes. By way of example, in a DNA vector a promoter sequence allows for initiation of transcription that proceeds through a linked coding sequence as far as a termination sequence. In the case of RNA sequences, one or more UTRs may be arranged in relation to a linked protein coding sequence or open reading frame (ORF). A UTR may be located 5’ or 3’ in relation to an operatively linked ORF.

The term ‘promoter’ as used herein denotes a site on DNA to which RNA polymerase will bind and initiate transcription. Promoters are commonly, but not always, located in the 5’ non-coding regions of genes. In the present invention ‘inducible’ promoters are those whose activity, i.e. ability to direct transcription of an operably linked ORF, is dependent upon the presence of a triggering chemical or physical factor. Typically, triggering chemical factors may include nutrients, alcohols, antibiotic compounds, signalling molecules and metal ions. Physical triggering factors may include presence or absence of light (photostimulation) or a change in temperature (thermo-/cryo-stimulation). Alternatives to inducible promoters are ‘constitutive’ promoters, which are generally non-inducible and are permanently active. The relative strength of constitutive promoters may vary and can be dependent upon cell culture conditions including nutrient status and cell density.

The term ‘polypeptide’ as used herein is a polymer of amino acid residues joined by peptide bonds, whether produced naturally or in vitro by synthetic means. Polypeptides of less than around 12 amino acid residues in length are typically referred to as ‘peptides’ and those between about 12 and about 30 amino acid residues in length may be referred to as ‘oligopeptides’. The term ‘polypeptide’ as used herein denotes the product of a naturally occurring polypeptide, precursor form or proprotein. Polypeptides can also undergo maturation or post-translational modification processes that may include, but are not limited to: glycosylation, proteolytic cleavage, lipidization, signal peptide cleavage, propeptide cleavage, phosphorylation, and such like. The term ‘protein’ is used herein to refer to a macromolecule comprising one or more polypeptide chains, such as a multimer.

The term ‘amino acid’ in the context of the present invention is used in its broadest sense and is meant to include naturally occurring L α-amino acids or residues. The commonly used one and three letter abbreviations for naturally occurring amino acids are used herein: A=Ala; C=Cys; D=Asp; E=Glu; F=Phe; G=Gly; H=His; I=Ile; K=Lys; L=Leu; M=Met; N=Asn; P=Pro; Q=Gln; R=Arg; S=Ser; T=Thr; V=Val; W=Trp; and Y=Tyr (Lehninger, A. L., (1975) Biochemistry, 2d ed., pp. 71-92, Worth Publishers, New York). In conventional notation X=any amino acid, although X is used herein also to refer to an absence or insertion of an amino acid residue in a specified sequence. The general term ‘amino acid’ further includes D-amino acids, retro-inverso amino acids as well as chemically modified amino acids such as amino acid analogues, naturally occurring amino acids that are not usually incorporated into proteins such as norleucine, and chemically synthesised compounds having properties known in the art to be characteristic of an amino acid, such as β-amino acids. For example, analogues or mimetics of phenylalanine or proline, which allow the same conformational restriction of the peptide compounds as do natural Phe or Pro, are included within the definition of amino acid. Such analogues and mimetics are referred to herein as ‘functional equivalents’ of the respective amino acid. Other examples of amino acids are listed by Roberts and Vellaccio (The Peptides: Analysis, Synthesis, Biology, Gross and Meienhofer, eds., Vol. 5 p. 341, Academic Press, Inc., N.Y. 1983). Such modifications may be particularly advantageous for increasing the stability of domains and/or for improving or modifying solubility, bioavailability and delivery characteristics (e.g. for in vivo applications).
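The one and three letter abbreviations listed above can be held in a lookup table, as in the following minimal sketch. The names `ONE_TO_THREE` and `to_three_letter` are hypothetical. Only the twenty abbreviations listed above are included; the handling of 'X' and of non-standard residues is deliberately left out, so any other symbol raises an error.

```python
# One letter to three letter codes for the twenty amino acids listed above.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one letter sequence to hyphen-separated three letter codes."""
    codes = []
    for position, residue in enumerate(sequence.upper(), start=1):
        if residue not in ONE_TO_THREE:
            raise ValueError(f"unsupported residue {residue!r} at position {position}")
        codes.append(ONE_TO_THREE[residue])
    return "-".join(codes)


if __name__ == "__main__":
    # Toy tripeptide for illustration only.
    print(to_three_letter("MKL"))  # Met-Lys-Leu
```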

The term ‘gene product’ as used herein refers to the product of a coding sequence or ORF. The gene product may comprise a polypeptide or protein. The term ‘lectin’ as used herein refers to any protein with carbohydrate recognition property which may be of plant, animal, fungal, bacterial, or viral origin. The lectin may be the wild type protein or genetically engineered.

In the context of this disclosure the term ‘PHA protein’ is to be understood to refer to PHA-L, PHA-E and homologues or derivatives thereof. More particularly, a PHA protein may be selected from tetrameric [(PHA-E)4], [(PHA-E)3(PHA-L)1], [(PHA-E)2(PHA-L)2], [(PHA-E)1(PHA-L)3] or [(PHA-L)4].

The term ‘PHA homologue’ is understood to refer to a polypeptide with common ancestry to the PHA polypeptide determined by sequence similarity.

The term ‘derivative’ is understood to refer to polypeptide variants of lectins described herein, which may be modified when compared to the wild type polypeptide by:

(i) N-terminal and/or C-terminal substitution and/or truncation of up to eight amino acid residues of the amino acid sequence of the polypeptide; and/or

(ii) protein engineering to include mutations, optionally point mutations, or larger substitutions, truncations, deletions or insertions, at one or more ends or within solvent-exposed loops or structural motifs comprised within the polypeptide; and/or

(iii) fusion with other proteins or polypeptides, either of lectin or non-lectin origin, including fluorophores (such as GFP or RFP), antibodies, affimers, aptamers, or polypeptides with an enzymatic activity.

The resulting derivative may have up to 50, 60, 70, 80, 90, 95, 99% sequence identity with the wild type PHA polypeptide.

The term ‘fusion protein or polypeptide’ as used herein refers to chimaeric proteins which are produced through joining of two or more polypeptide coding sequences (e.g. ORFs) originating from separate genes. When such a fusion gene is translated a single polypeptide is created which has functional properties derived from each of the original proteins. Fusion proteins are created by recombinant DNA technology as understood by the skilled person.

The term ‘transformation’ as used herein refers to the process by which exogenous DNA is introduced into a cell, resulting in a genetic modification.

The present invention is based in part upon the observation by the inventors that unexpectedly high yields of recombinant PHA-L protein are obtained in the methylotrophic yeast Pichia pastoris when the Saccharomyces α-factor prepro-peptide is used to direct secretion of the recombinant PHA into the culture medium. To achieve this, the inventors have pursued a strategy in which the yeast prepro-peptide is fused to mature PHA coding sequences. In addition, whilst production of recombinant PHA-L from Pichia pastoris transformed with expression constructs has been described previously in Raemaekers et al. 1999 (supra), it has presently been found that yields at least equivalent to those of the strong inducible alcohol oxidase 1 gene (AOX1) promoter system can be achieved, but without many of the drawbacks of that system.

In embodiments of the invention, the lectin may suitably be selected from the group consisting of: PHA; arcelin; GNA (Galanthus nivalis lectin); NICTABA (Nicotiana tabacum lectin); MOL (Moringa oleifera lectin); frutalin lectins (Jacalin lectin, helianthus lectins); EUL (Euonymus lectin - e.g. rice and the spindle tree lectins); monocot lectins (tulip, crocus and narcissus lectins) and tomato lectins.

Pichia pastoris is a widely used host for heterologous protein production. Along with favourable properties such as growth to high cell density in fermenters and a high capacity for protein secretion, P. pastoris has also been extensively chosen in the past because it provides the strong, methanol-inducible promoter of the AOX1 gene. However, with this promoter system the methanol levels in the culture are relatively high, which contributes to depleted cell density in production-scale fermenters, thereby impairing product yield and longer term culture viability. Further, methanol, which is a hazardous, flammable and toxic substance, needs to be kept on site in large amounts, which poses a considerable safety risk. Methanol-containing culture media also need to be disposed of by a licensed waste solvent agency, which increases costs and has a negative environmental impact. Nevertheless, the strength and general acceptance of the AOX system has made it the industry standard despite the potential drawbacks of its use.

The present invention, therefore, provides a recombinant protein expression system that advantageously expresses biologically active PHA proteins at high levels, but under a constitutive promoter which does not require the use of hazardous substances such as methanol, and in a form that allows for simple recovery of highly active protein from culture medium. The expression system is readily scaled up and facilitates production of biologically active PHA at commercial levels that hitherto were not considered possible and that are equivalent to those obtained using conventional approaches.

According to a further embodiment of the present invention there is provided a PHA expression system that comprises the use of a P. pastoris expression vector in which a native or derivatised PHA-L or PHA-E gene is fused, typically in frame, to the Saccharomyces α-factor prepro-peptide and wherein expression of the protein is placed under the control of a promoter that functions constitutively in P. pastoris. In one embodiment of the invention the system comprises an expression vector that comprises a P. pastoris glyceraldehyde-3-phosphate dehydrogenase (GAP) promoter. Alternative embodiments permit the use of other constitutive promoters such as, but not limited to, TEF1; PGK1; TPI1; YPT1 and PGCW14 (Ahmad et al. Appl Microbiol Biotechnol (2014) 98:5301-5317; Sears et al. Yeast, 14: 783-790 (1998); and Liang, S., Zou, C., Lin, Y. et al. Biotechnol Lett (2013) 35: 1865).

In embodiments PHA polypeptides may be modified by:

(i) N-terminal and/or C-terminal modification, such as substitution and/or truncation of up to eight amino acids of the amino acid sequence of the protein; and/or

(ii) protein engineering to include mutations at the ends or within the body of the protein; and/or

(iii) fusion with other proteins or polypeptides, lectin or non-lectin.

The resulting polypeptide may have up to 50, 60, 70, 80, 90, 95, 99% sequence identity with the wild type PHA polypeptide.

The lectin may be selected from the group consisting of: PHA-E; arcelin; GNA (Galanthus nivalis lectin); NICTABA (Nicotiana tabacum lectin); MOL (Moringa oleifera lectin); frutalin lectins (Jacalin lectin, helianthus lectins); EUL (Euonymus lectin - e.g. rice and the spindle tree lectins); monocot lectins (tulip, crocus and narcissus lectins) and tomato lectins.

In a further alternative embodiment it is envisaged that synthetic constitutive promoters may be utilised in the system of the invention. Such constitutive promoters may include core promoters comprising a core promoter nucleotide sequence which suitably may be flanked by variable nucleotide flanking regions at the 5’ and/or 3’ position of the core promoter nucleotide sequence.

Core promoter systems may employ synthetic transcription factors (sTFs) and engineered promoters depending on sTFs to control the expression of genes. The sTF-dependent promoters may comprise a variable number of sTF-binding sites linked to a core promoter (for example, see US 2018/371468 A1).

The operative components of a plasmid vector of an embodiment of the invention used for lectin protein production may comprise a constitutive promoter, which is located upstream of the gene encoding the lectin polypeptide; and a transcriptional termination region (terminator), which is located downstream of the gene and assists with the stability of the mRNA.

Further, the vector may comprise additional sequences such as one or more antibiotic-resistance genes, which allow for selection of yeast cultures that harbour the plasmid vector within the cells. Multiple cloning sites (MCSs), which contain specific positions at which restriction enzymes cut so that genes can be cloned into the plasmid vector, are also included. Another optional part is a sequence that encodes the signal peptide or secretion signal (α-factor secretion signal), allowing for secretion of the protein out of the cell into the culture medium, which assists with purification and isolation of the polypeptide/protein product.

Further, the target protein may be tagged, suitably with a polyhistidine tag, or equivalent, at the N-terminus and/or C-terminus to assist with purification.
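The arrangement of vector components described above can be summarised as a simple ordered record. The sketch below is purely illustrative: the class name, field names and the placeholder string for the terminator are hypothetical, and placing an N-terminal tag after the secretion signal (so that it remains on the secreted protein) is an assumption rather than something stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExpressionCassette:
    promoter: str                            # constitutive promoter, upstream of the gene
    gene: str                                # lectin coding sequence
    terminator: str                          # transcriptional terminator, downstream
    secretion_signal: Optional[str] = None   # e.g. alpha-factor prepro-peptide
    purification_tag: Optional[str] = None   # e.g. polyhistidine tag
    tag_at_n_terminus: bool = False          # False places the tag at the C-terminus

    def layout(self) -> List[str]:
        """Return the elements in 5' to 3' order."""
        orf: List[str] = []
        if self.secretion_signal:
            orf.append(self.secretion_signal)
        if self.purification_tag and self.tag_at_n_terminus:
            orf.append(self.purification_tag)
        orf.append(self.gene)
        if self.purification_tag and not self.tag_at_n_terminus:
            orf.append(self.purification_tag)
        return [self.promoter] + orf + [self.terminator]


cassette = ExpressionCassette(
    promoter="GAP promoter",
    gene="mature PHA-L",
    terminator="terminator",               # placeholder name
    secretion_signal="alpha-factor prepro",
)
print(" > ".join(cassette.layout()))
```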

It has been found by the inventors that the expression system described is particularly advantageous for generating commercial quantities of PHA-L under conditions that support high levels of heterologous recombinant protein expression, such as those described in Macauley-Patrick et al. (Yeast, Volume 22, Issue 4, March 2005: 249-270). Surprisingly, it has been found by the inventors that the expression system as described not only generates commercial quantities of PHA but also provides PHA-L protein that exhibits a low level of protein heterogeneity. By this it is meant that there is a low level of variation due to translational or post-translational processing of isogenic product (e.g. low levels of truncated variants).

According to embodiments of the invention, lectin produced by the described methods may be incorporated into pharmaceutical or cosmeceutical formulations suitable for administration to a subject. Such preparations of the invention are formulated to conform with regulatory standards and can be administered orally, intravenously, topically, or via other standard routes. The pharmaceutical preparations may be in the form of tablets, pills, lotions, gels, liquids, powders, suppositories, suspensions, liposomes, microparticles or other suitable formulations known in the art.

The lectin proteins of the present invention may be comprised within pharmaceutical compositions in certain embodiments. Typically, a specified protein will be isolated from a library and characterised for its desired therapeutic potential. Suitably the isolated protein will be utilised in purified form together with one or more pharmacologically approved carriers. Typically, these carriers will include aqueous or alcoholic/aqueous solutions, emulsions or suspensions, any including saline and/or buffered media. Parenteral vehicles include sodium chloride solution, Ringer's dextrose, dextrose and sodium chloride and lactated Ringer's. Suitable physiologically-acceptable adjuvants, if necessary to keep a polypeptide complex in suspension, may be chosen from thickeners such as carboxymethylcellulose, polyvinylpyrrolidone, gelatin and alginates. Intravenous vehicles include fluid and nutrient replenishers and electrolyte replenishers, such as those based on Ringer's dextrose. Preservatives and other additives, such as antimicrobials, antioxidants, chelating agents and inert gases, may also be present. A variety of suitable formulations can be used, including extended release formulations where there is particular need for such a mode of administration.

In specific embodiments of the present invention, the lectin proteins of the present invention are utilised as separately administered compositions or in conjunction with other therapeutic agents.

The route of administration of pharmaceutical compositions according to the invention may be any of those commonly known to those of ordinary skill in the art. For therapy, including without limitation immunotherapy, the selected ligands of the invention can be administered to any patient in accordance with standard techniques. The administration can be by any appropriate mode, including parenterally, intravenously, intramuscularly, intraperitoneally, transdermally, via the pulmonary route, or also, appropriately, by direct infusion with a catheter. The dosage and frequency of administration will depend on the age, sex and condition of the patient, concurrent administration of other drugs, contraindications and other parameters to be taken into account by the clinician. Administration can be local (e.g., local delivery to the lung by pulmonary administration, e.g., intranasal administration) or systemic as indicated. The proteins of the invention will be suitably preserved in order to be in a form appropriate for administration to human or animal patients. Preservation may also involve chemical or other modification so as to stabilise the polypeptides for in-vivo use. Stabilisation may include PEGylation or other appropriate chemical processing. In addition, the lectin proteins can be lyophilised for storage and reconstituted in a suitable carrier prior to use.

Pharmaceutical compositions containing the present modified polypeptides or a combination thereof with other drugs or biologicals can be administered for prophylactic and/or therapeutic treatments. In certain therapeutic applications, an adequate amount to accomplish at least partial inhibition, suppression, modulation, killing, or some other measurable parameter, of a population of selected cells is defined as a therapeutically-effective dose.

The invention is further illustrated by the following non-limiting examples.

EXAMPLES

Materials

Except where specifically stated otherwise, materials used in the research were as follows: molecular biology reagents were from a variety of suppliers including Promega (www.promega.com), New England Biolabs (www.neb.com), MBI-Fermentas and others; expression vectors and Pichia pastoris strains were from Invitrogen (www.invitrogen.com) and FUJIFILM Diosynth Biotechnologies; PCR primers were from Sigma-Genosys; protein purification materials were from Amersham-Pharmacia. Other reagents were of laboratory grade or better.

Mass spectrometry

Mass spectrometric data were obtained by means of a UPLC-MS system (Acquity UPLC and Xevo™ G2-S QToF).

Samples were prepared by diluting the protein from large-scale preparation formulated in ammonium bicarbonate at pH = 7.8 to 1 µL for intact mass analysis and to 2 mg/mL for deglycosylated samples. For deglycosylated samples 30 µL sample, 3.5 µL 10X Glycobuffer 2 and 2 µL PNGase F were incubated at 37 °C for 2 h.

A Waters Acquity Protein BEH C4, 300 Å, 1.7 µm, 2.1 x 50 mm column was used with Mobile Phase A: 0.1 % Formic Acid in Water and Mobile Phase B: 0.1 % Formic Acid in Acetonitrile.

The LC-MS data were processed using MaxEnt1 by combining spectra across a region of a TIC peak correlating with the main product elution (data from 4.5-6 min). The data were pre-processed by background subtraction and lock mass correction. The region of interest was set at m/z = 1070-1350, which captures data for five to six charge forms of the product. The UPLC and MS instrument parameters and methods are provided below.
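To illustrate why an m/z window of 1070-1350 would be expected to contain about five to six charge forms, the sketch below lists the positive-ion charge states of a protein that fall inside a given window, using m/z = (M + z x 1.00728)/z. The mass of 30,000 Da is an assumption taken from the approximate 30 kDa band reported for recombinant PHA-L in Example 3, not a measured intact mass, and the function names are hypothetical.

```python
PROTON_MASS_DA = 1.00728  # mass of a proton in daltons


def mz(mass_da: float, z: int) -> float:
    """m/z of an ion carrying z protons."""
    return (mass_da + z * PROTON_MASS_DA) / z


def charge_states_in_window(mass_da, mz_low, mz_high, z_max=100):
    """Charge states whose m/z falls inside [mz_low, mz_high]."""
    return [z for z in range(1, z_max + 1) if mz_low <= mz(mass_da, z) <= mz_high]


ASSUMED_MASS_DA = 30_000.0  # assumption: approximate subunit mass, not from MS data
states = charge_states_in_window(ASSUMED_MASS_DA, 1070.0, 1350.0)
print(states)  # -> [23, 24, 25, 26, 27, 28]
```

With this assumed mass, six charge states fall inside the window, which is in line with the five to six charge forms mentioned above; a somewhat different true mass would shift which charge states are captured.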

UPLC Instrument Method (MSeR.acm) summary:

UPLC Gradient:

Example 1 - Construction of an expression vector

1.1. PHA-L coding sequence

The complete PHA-L coding sequence was obtained by polymerase chain reaction (PCR) amplification of Phaseolus vulgaris genomic DNA. The dlec1 and dlec2 genes contain no introns and thus a complete coding sequence could be obtained in one step.

Genomic DNA was isolated from shoots of seedlings of Phaseolus vulgaris cv Tendergreen. Leaves and stipules were excised, frozen, and ground while frozen prior to extraction of DNA using a phenol-cresol extraction method (Ellis et al., (1984) Chromosoma 91: 74-81). Genomic DNA was ethanol precipitated, redissolved and stored frozen in aqueous solution.

Primers for PCR were designed using the published sequence of PHA-L (EMBL X02409). They corresponded to the first six codons of the PHA-L coding sequence (N-terminal primer) and the last six codons (including stop) of the PHA-L sequence, plus 5 bases of 3' UTR (C-terminal primer). BamH1 sites were added to both primers for subsequent operations.

PCR was carried out under standard conditions (with Taq DNA polymerase) using 1 µg of genomic DNA as template in a reaction volume of 50 µl. The annealing temperature was set at 55°C, and 30 cycles of PCR were carried out. A product of approx. 820bp was obtained after amplification; products were analysed by agarose gel electrophoresis. The band corresponding to the product was excised from the gel, and the DNA was purified by electroelution, followed by phenol extraction and ethanol precipitation. The isolated PCR product DNA was "blunt end" cloned by ligation into the Hinc II site of plasmid pUC18 (i.e. plasmid DNA restricted with Hinc II). The ligation mixture was transformed into E. coli strain JM101. Clones were selected by plating on media containing ampicillin (50 µg/ml) and X-gal. Plasmid DNA was prepared from selected white colonies and checked for the presence of inserts by restriction with BamH1 (excision of 820bp fragment diagnostic for transformants containing inserts). A number of clones were subjected to DNA sequencing using M13 forward and reverse primers to give a complete sequence for the inserts. A clone carrying no changes to amino acid coding sequence when compared to the database sequence was selected. This coding sequence served as the "template" for all further constructs; the pUC18-based recombinant plasmid was designated p232D2. The nucleotide and predicted amino acid sequence of PHA-L is shown in SEQ ID NO:3.

Example 2 - Expression construct for PHA-L

Expression of PHA-L using the yeast α-factor prepro-sequence to direct secretion.

To generate the expression construct pGAPZα A/PHA-L, the coding sequence of the mature PHA-L polypeptide (i.e. amino acid residues 21-272 in SEQ ID NO: 3) was excised by amplification with the following PCR primers:

The N-terminal primer shown was designed to provide a complete yeast α-factor prepro-sequence, including the amino acid residues involved in N-terminal processing of the prepro-peptide. The initial translation product has the signal peptide (pre-sequence) removed co-translationally, and is then acted on by the yeast KEX2 protease, which cleaves C-terminal to a Lys-Arg dibasic sequence to remove most of the pro-sequence during secretion. The primer therefore includes sequences encoding the lysine-arginine residues (AAAAGA = KR). Depending on the attached coding sequence, the remainder of the α-factor prepro-sequence may be further "trimmed" at the N-terminus by the yeast STE13 protease, removing the glutamate/alanine repeats (GAGGCTGAAGCT = EAEA; double underlined in the sequence above; SEQ ID NO: 6), which are the recognition sequence for STE13. In addition, the N-terminal primer contains an Xho I site to allow cloning into the vector in the correct reading frame.
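The codon assignments quoted above can be checked with a few lines of code. The sketch below translates the two nucleotide stretches given in the text using the relevant subset of the standard genetic code, then shows the product expected when cleavage occurs immediately after the Lys-Arg pair. The function names are hypothetical, and the PHA-L residues appended in the example are the N-terminal residues reported in Example 4.

```python
# Subset of the standard genetic code covering the codons quoted in the text.
CODON_TABLE = {"AAA": "K", "AGA": "R", "GAG": "E", "GAA": "E", "GCT": "A"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def cleave_after_kr(protein: str) -> str:
    """Return the part of the sequence C-terminal to the first Lys-Arg pair."""
    site = protein.find("KR")
    if site < 0:
        raise ValueError("no Lys-Arg site found")
    return protein[site + 2:]


print(translate("AAAAGA"))        # -> KR
print(translate("GAGGCTGAAGCT"))  # -> EAEA

junction = translate("AAAAGAGAGGCTGAAGCT") + "SNDIYFNFQR"
print(cleave_after_kr(junction))  # -> EAEASNDIYFNFQR
```

If STE13 does not remove the Glu-Ala repeats, the secreted protein therefore begins EAEA followed by the mature lectin sequence.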

The product of approximately 800 bp was isolated by agarose gel electrophoresis, purified by electroelution, and cloned into a vector for PCR products, pCR2.1 (Invitrogen). Clones were selected and checked by DNA sequencing to eliminate any PCR errors. The desired fragment was then excised from the intermediate clone by digestion with Xho I and Xba I, purified by agarose gel electrophoresis and electroelution, and ligated to pGAPZα A DNA (Invitrogen) that had been restricted with Xho I and Xba I - see Figure 1 for vector diagram. The ligation mixture was transformed into E. coli strain DH5α, which was plated on media containing zeocin (50 µg/ml) to select for transformants. Selected clones were screened by analysis of plasmid DNA by restriction digestion.

For a selected clone, the region containing the recombinant coding sequence in the resulting plasmid pGAPZα A/PHA-L was sequenced using 5' GAP (Invitrogen) and 3' AOX1 sequencing primers to check that no errors had been introduced and that the correct reading frame had been maintained. This construct was used for the production of recombinant tetrameric PHA-L protein complex. The sequence of the region of the pGAPZα A/PHA-L plasmid encoding the recombinant protein is shown in SEQ ID NO: 7.

Example 3

3.1 Transformation of Pichia pastoris

Pichia pastoris (strain X-33) cells were made competent and were then transformed with linearised plasmid DNA containing the expression construct described above, for example via electroporation or using the ‘EasyComp’ chemical transformation method as described in the Invitrogen user’s manual.

The pGAPZα A/PHA-L expression construct clone amplified in E. coli was used to prepare plasmid DNA, using a proprietary DNA purification kit (Qiagen). Plasmid DNA was linearised with Bln I (Roche), and checked for correct linearisation by agarose gel electrophoresis. The linearised DNA (1-2 µg) was used to transform Pichia pastoris strain X-33 using the Invitrogen "Easy Comp" transformation kit as described by the manufacturer. Transformants were selected by plating yeast cells on media containing zeocin (100 µg/ml). An average transformation gave 10-20 colonies representing individual yeast transformants. Yeast colonies growing on selective media were transferred to streak plates under the same zeocin selection after 4 days, and formed the primary bank of transformed P. pastoris clones.

Selected P. pastoris clones were then screened for expression of recombinant PHA-L. Five individual clones were grown as 50 ml cultures in shake flasks in YPD medium for 5 days. Culture supernatants were isolated by centrifugation, and were screened for the presence of PHA-L by analysis of samples by SDS-polyacrylamide gel electrophoresis, followed by Western blotting, using rabbit anti-PHA antibodies (Vector Labs) as the primary probe (dilution 1:5000). Immunoreactive material was detected using horseradish peroxidase (HRP)-coupled secondary antibodies (goat anti-rabbit HRP; Bio-Rad; dilution 1:10000), followed by detection of bound HRP using chemiluminescence (ECL system, Amersham). Recombinant PHA-L was visualised as a band of approx. mol. wt. 30 kDa. A typical screen of culture supernatants is shown in Figure 2. The clone with the highest expression levels from the screening process, designated Pp(PHA-L), was selected for further study.

The transformation was repeated several times with similar results. No yeast transformants with consistently higher expression levels of PHA-L than the clone selected initially were observed.

3.2 Characterisation of genomic transformation of Pichia pastoris with PHA-L

Pichia pastoris was transformed with empty pGAPZα vector (Invitrogen) as described above, and a transformed clone, designated Pp(GAPZ), was used as the negative control. 100 ml cultures of Pp(PHA-L) and Pp(GAPZ) were grown for 3 days at 30°C in YPG medium. Cells were harvested by centrifugation and DNA was extracted from the cell pellets using the procedure described by Sambrook and Russell (Sambrook and Russell, 2001). The cell pellets each yielded approximately 70 µg of purified DNA. The DNA was dissolved in water, and the concentration was estimated by the Hoechst 33258 dye-binding method (Sambrook and Russell, 2001). The DNA was fully digestible by restriction enzymes in a trial digestion of 5 µg with EcoR I.

A quantitative Southern blot was set up by restricting 2.12 µg amounts of DNA extracted from Pp(PHA-L) and Pp(GAPZ) with the following restriction enzymes: Rsa I, Asp 718 I, Hind III, BamH I, EcoR I. These digests were separated by agarose gel electrophoresis. Known amounts of a positive control standard DNA (PCR product corresponding to PHA-L coding sequence) corresponding to 1, 3 and 5 gene copy equivalents were also run on the gel. The gel was blotted, and the blot was probed with the PHA-L coding sequence (175 ng, labelled with 32P to a specific activity of >2×10^8 cpm/µg). The blot was washed to a stringency of 0.1×SSC (where SSC is saline/sodium citrate, 0.15M NaCl, 0.015M sodium citrate buffer pH 7.0) at 65°C. The blot was exposed to X-ray film for 2 days; results are shown in Figure 3. Both the negative control and Pp(PHA-L) gave background binding to high mol. wt. DNA fragments in all digests except Rsa I. Only Pp(PHA-L) gave a band corresponding to an expected restriction fragment (in the Rsa I digest).

The results from the blot show that a single inserted sequence of PHA-L is present in the yeast genome, and the fragment is consistent with insertion at the GAPDH locus, as expected.
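For readers unfamiliar with gene copy equivalents, the sketch below shows one way the mass of standard DNA representing a given number of copies per genome could be estimated: the genomic DNA mass loaded is scaled by the ratio of standard length to genome size. The genome size of about 9.4 Mbp for P. pastoris is an assumption that does not come from this document, and the 820 bp length is that of the PCR product in Example 1, which may differ from the standard actually used. The function name is hypothetical.

```python
def copy_equivalent_mass_pg(genomic_dna_ug: float,
                            standard_bp: float,
                            genome_bp: float,
                            copies: int = 1) -> float:
    """Mass of standard DNA (pg) equal to `copies` per haploid genome."""
    mass_ug = genomic_dna_ug * copies * standard_bp / genome_bp
    return mass_ug * 1e6  # micrograms to picograms


GENOMIC_DNA_UG = 2.12        # amount of digested genomic DNA per lane (from the text)
STANDARD_BP = 820            # assumption: length of the PHA-L PCR product
ASSUMED_GENOME_BP = 9.4e6    # assumption: approximate P. pastoris genome size

for n in (1, 3, 5):
    pg = copy_equivalent_mass_pg(GENOMIC_DNA_UG, STANDARD_BP, ASSUMED_GENOME_BP, n)
    print(f"{n} copy equivalent(s): {pg:.0f} pg")
```

Under these assumptions one copy equivalent is roughly 185 pg of standard DNA; the figure scales linearly with copy number and inversely with the genome size assumed.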

3.3 Assays of expression of PHA-L by Pp(PHA-L)

Pp(PHA-L) was grown under laboratory shake-flask conditions (YP glycerol media, 30°C) for 6 days. Cells and culture medium were then separated by centrifugation. The cells were extracted under non-denaturing conditions. Cells were lysed by repeated vortexing with glass beads, and soluble proteins were extracted by washing with sodium phosphate buffer, pH 7.5, followed by centrifugation to remove cell debris. PHA-L in culture medium and cell extracts was then quantitated by immuno-dot blot assay, using pure PHA as a standard to quantitate the blots, and anti-PHA antibodies to detect the protein (details of antibodies and conditions as above). Results are summarised in Figure 4 (day 6 time point). A proportion of recombinant PHA-L, at most approx. 30%, was associated with the yeast cells, although the vast majority of the protein was present as secreted soluble protein in the culture supernatant.

The levels of PHA-L expressed under shake-flask conditions were relatively low, largely due to this being an inefficient system for Pichia growth. The cell densities achieved in shake flask cultures were low, but significantly improved yields would be expected under cell growth conditions that are specifically optimised for Pichia.

Example 4

4.1 A small-scale purification of recombinant PHA-L produced by Pichia pastoris clone Pp(PHA-L)

A 50 ml culture of Pp(PHA-L) was grown for 53 hours at 30°C in YPD medium containing zeocin (50 µg/ml) under shake-flask conditions. The culture was centrifuged for 15 minutes at 3000g, and the supernatant was made 90% saturated in ammonium sulphate by adding solid (50g/100ml). After incubation at 4°C for 16h the precipitated protein was collected by centrifugation for 20min at 10,000g. The pellet was resuspended in 15ml 4M NaCl, and centrifuged for 10 minutes at 10,000g to remove insoluble material. The supernatant was loaded on a column of phenyl-Sepharose (20ml, 1cm diameter) which had been equilibrated with 4M NaCl. The column was washed with 4M NaCl to remove non-bound material, and then eluted with 50mM sodium acetate buffer, pH 3.5. The eluted peak of protein was collected. Analysis by SDS-polyacrylamide gel electrophoresis showed that this peak contained PHA-L.

Peak fractions from the phenyl-Sepharose column were applied directly to a 1 ml HiTrapS column which had been equilibrated with 50mM sodium acetate buffer, pH 3.5. The column was washed with equilibration buffer to remove non-bound material, and was then eluted with a gradient of NaCl (0-0.5M) over 40ml in 50mM sodium acetate buffer, pH 3.5, followed by a gradient of NaCl (0.5-1.0M) over 10ml in 50mM sodium acetate buffer, pH 3.5. The peak of protein eluting at a concentration of NaCl of approx. 0.4M was collected. Analysis of eluted fractions by SDS-polyacrylamide gel electrophoresis showed that this peak contained recombinant PHA-L, which was >95% pure. The yield of protein was approx. 80 µg, as estimated from the gel electrophoresis, corresponding to a yield of 1.6 mg protein per litre culture.
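The conversion from the recovered amount to a yield per litre is simple arithmetic, shown here as a short sketch: micrograms per millilitre of culture is numerically equal to milligrams per litre. The function name is hypothetical.

```python
def yield_mg_per_litre(protein_ug: float, culture_ml: float) -> float:
    """Volumetric yield; ug per ml is numerically the same as mg per litre."""
    if culture_ml <= 0:
        raise ValueError("culture volume must be positive")
    return protein_ug / culture_ml


# Approx. 80 ug of purified protein recovered from a 50 ml culture.
print(yield_mg_per_litre(80.0, 50.0))  # -> 1.6
```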

Purified PHA-L was subjected to N-terminal sequencing to confirm the identity of the recombinant protein. The protein retained the Glu-Ala repeats from the yeast α-factor prepro-sequence at its N-terminus, but the sequence of the succeeding 10 residues (N-terminal sequencing was halted after 14 residues) was identical to the N-terminal sequence of PHA-L reported in the literature and protein sequence database entry (PHA-L N-terminal sequence: EAEASNDIYFNFQR [SEQ ID NO: 8]). The retention of the Glu-Ala repeats at the N-terminus of recombinant PHA-L showed that the yeast STE13 protease had not been able to act on the recombinant protein during secretion. However, the presence of these extra residues at the N-terminus was not predicted to have any effect on the biological activity of the protein, since different isoforms of PHA show sequence variability in this region.

Example 5

Determining Molecular Weight

Native PHA (both E- and L- forms) exists predominantly as tetrameric molecules at pH 6.0-8.0. Since the biological activity properties of lectins are dependent on the degree of oligomerisation in protein molecules, the molecular weight of the recombinant PHA-L was estimated for comparison with PHA-L produced in plants. Analytical gel filtration was used as the method of choice for estimating this parameter.

A Superdex 200 column (HR10/30, Pharmacia-Amersham) was equilibrated with 50mM Tris-HCl buffer, pH 7.5, containing 0.5M NaCl. All runs were carried out at a flow rate of 0.4 ml/min. Protein samples of approx. 250 µg protein in 250 µl buffer were applied to the column, and the elution volume was measured using a post-column flow cell UV absorption monitor set to a wavelength of 280nm. The column was calibrated with standard proteins (bovine serum albumin, 66,000 and 133,000 mol. wt.; ovalbumin, 43,500 mol. wt.; soyabean Kunitz trypsin inhibitor, 20,100 mol. wt.). PHA-L from Phaseolus vulgaris seeds (Vector Labs) and recombinant PHA-L were run as the protein samples. Results are shown in Figure 5.

Both the plant PHA-L and recombinant PHA-L gave a major peak on gel filtration, corresponding to a mol. wt. of approx. 100,000. This corresponds to the tetrameric form, which would have an expected mol. wt. of approx. 120,000; the lower value on gel filtration has been observed previously. Plant PHA gave a clear peak due to other oligomeric forms, principally an octameric form. Recombinant PHA-L showed more evidence of size heterogeneity in the region between tetrameric and octameric forms, possibly due to the glycosylation of the recombinant protein differing from that present in plants.
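Molecular weight estimates from analytical gel filtration are commonly obtained by fitting log10(mol. wt.) of the standards against elution volume and reading the sample off the fitted line. The sketch below illustrates that calculation under the assumption of a linear relationship. The molecular weights of the standards are those given above, but the elution volumes are placeholders invented for illustration and are not data from this document; the function names are hypothetical.

```python
import math


def fit_calibration(standards):
    """Least-squares fit of log10(MW) = slope * elution_volume + intercept.

    `standards` is a list of (molecular_weight, elution_volume_ml) pairs.
    """
    xs = [ve for _, ve in standards]
    ys = [math.log10(mw) for mw, _ in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(elution_volume_ml, slope, intercept):
    """Molecular weight predicted by the calibration line."""
    return 10 ** (slope * elution_volume_ml + intercept)


# Molecular weights from the text; elution volumes are PLACEHOLDERS, not measured.
standards = [
    (133_000, 12.0),  # bovine serum albumin (higher mol. wt. form)
    (66_000, 13.5),   # bovine serum albumin
    (43_500, 14.5),   # ovalbumin
    (20_100, 16.2),   # soyabean Kunitz trypsin inhibitor
]
slope, intercept = fit_calibration(standards)
sample_volume_ml = 12.6  # placeholder elution volume for a sample
print(round(estimate_mw(sample_volume_ml, slope, intercept)))
```

In practice the measured elution volumes from the calibration runs would replace the placeholders, and the fit is only meaningful within the range spanned by the standards.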

This experiment led to the conclusion that the majority (>80%) of recombinant PHA-L existed as biologically active tetrameric protein molecules under the above experimental conditions.

Example 6

Many legume lectins, including those from Phaseolus vulgaris, are known to show mitogenic activity (stimulation of cell division) towards mammalian cells through binding to cell surface receptors. The toxicity at high dosage may be a result of this mitogenic activity. Due to the differences in carbohydrate-binding specificity, PHA-E and PHA-L have different activities as mitogens, and thus, potentially, as toxins. Thus the toxic effects of PHA containing mixed lectin isoforms with both PHA-E and PHA-L subunits may not be shown by individual isoforms. Specifically, PHA-L is believed to be a potent mitogen for cells of the gut epithelium, but is likely to exhibit low toxicity when administered as a pure [(PHA-L)4] protein.

The mitogenic activity of recombinant PHA-L was assayed by a lymphocyte stimulation assay, measuring the incorporation of [3H]-thymidine into Xenopus laevis lymphocyte DNA. Lectin concentration in the assay was varied over 3 orders of magnitude, from 0.1 mg/ml to 5 mg/ml. A commercial preparation of mitogenic PHA, "PHA-P" (Sigma Chemical Co.), was used as a positive control. Xenopus lymphocytes were prepared from adult frogs using a standard lymphocyte stimulation assay protocol (Horton et al., 1980, Dev. and Comparative Immunol. 4: 75-86). The results of the assay are shown in Figure 6.

Both the recombinant PHA-L and the commercial mitogenic "PHA-P" lectin preparation show stimulation of [3H]-thymidine incorporation by lymphocytes of 2.5-fold over background, with a similar level for maximal stimulation, at 0.5mg/ml.

This result shows that recombinant PHA-L produced according to the present expression system has mitogenic activity towards lymphocytes, and is thus biologically active in stimulating cell division.

Example 7 - Construction of an expression vector

The starting point for the construction of the GAP expression vector pAVE1232 is expression plasmid pAVE522. The alcohol oxidase (AOX) promoter region was removed by restriction digest using BglII and NotI. This was replaced with the GAP promoter BglII/NotI fragment shown in SEQ ID NO: 9. The resultant expression plasmid was named pAVE1232.

Example 8 - Expression construct for PHA-L

Expression of PHA-L using the yeast MFα factor prepro-sequence to direct secretion.

The DNA sequence of Phaseolus vulgaris was obtained from Genbank Database accession number K03289.1 (SEQ ID NO: 10). Two expression cassettes were synthesised by Geneart (Thermofisher).

Gene C was made using SEQ ID NO: 11 and the FUJIFILM Diosynth Biotechnologies Translation Initiation/Secretion Leader operably linked to the 5' end of the mature PHA-L gene. The complete expression cassette DNA sequence is shown in SEQ ID NO: 12, including translational stop codons and restriction sites.

Gene D was made using SEQ ID NO: 11 and the industry standard Translation Initiation/Secretion Leader from the Thermofisher/Life Technologies pGAP expression plasmid operably linked to the 5' end of the mature PHA-L gene. The complete expression cassette DNA sequence is shown in SEQ ID NO: 13.

Genes C and D were cloned into expression plasmid pAVE1232 by means of MfeI and NotI restriction sites to create expression plasmids pAVE1274 (Gene C) and pAVE1275 (Gene D). Expression plasmids were confirmed by DNA sequencing (SEQ ID NO: 11 and 12). The features of pAVE1274 and pAVE1275 are shown in Figures 7 and 8.

Example 9 - Transformation of Pichia pastoris

Pichia pastoris (strain NRRLY1 1430) cells were made competent and were then transformed with linearised plasmid DNA containing the expression construct described above via electroporation.

The pGAPZα A/PHA-L expression construct clones amplified in E. coli were used to prepare plasmid DNA using a proprietary DNA purification kit (Qiagen). Plasmid DNA was linearised with BglII (New England Biolabs) and checked for correct linearisation by agarose gel electrophoresis. The linearised DNA (40 µg) was used to transform Pichia pastoris strain NRRL Y-11430 by electroporation. Transformants were selected by plating yeast cells on media containing zeocin at concentrations ranging from 100 µg/ml to 1000 µg/ml. Yeast colonies growing on selective media were transferred after 4 days to streak plates under 100 µg/ml zeocin selection, and formed the primary bank of transformed P. pastoris clones.

Selected P. pastoris clones were then screened for expression of recombinant PHA-L. Twenty-three individual clones were grown as 2 ml cultures in deep-well microtitre plates in BMGLY medium for 5 days. Culture supernatants were isolated by centrifugation and screened for the presence of PHA-L by LabChip capillary electrophoresis (Figures 9 and 10).

Clones 9-1, 9-6, 9-7, 9-9, 9-10, 9-11 and 9-13 were selected for fermentation evaluation using ambr™250. Strains were grown to an OD600 of 8, centrifuged, and resuspended in 40% glycerol prior to storage at -70°C (Figure 11).

A vial of each strain was thawed and inoculated into a shake flask with the parameters shown in table 1 using the media in table 3. The fermentation vessels were filled with the basal media shown in table 8 and inoculated with cells to the parameters shown in table 2. Feed composition is shown in tables 4, 5, 6 and 7. Two bioreactors were run per strain. One set of fermentations received the defined feed shown in table 6. The other set of fermentations received the complex yeast extract feed shown in table 7.

Table 1 Inoculation Flask parameters

Table 2 ambr(TM)250 fermentation parameters

Table 3 Inoculation flask parameters

Table 4 Biotin solution

Table 5 Trace metal solution

Table 6 Glycerol feed solution

Table 7 Yeast Extract Feed Solution

Table 8 GAP Fermenter Medium

The harvesting time of around 72 hours is advantageous in order to produce large amounts of highly pure lectin protein.

Example 10

10.1 A small-scale purification of recombinant PHA-L produced by Pichia pastoris clone Pp(PHA-L)

Small-scale purifications of the four most productive bioreactors were performed using Capto S RoboColumn screening (Figure 12). The columns were flushed with 5 column volumes (CV) of water, cleaned with 3 CV of 1 M NaOH and then flushed with 5 CV of water. The columns were then equilibrated with 50 mM sodium acetate pH 3.6 before loading with 20 ml of culture supernatant adjusted to pH 3.6 and diluted 3-fold in water to a conductivity below 15 mS/cm. The columns were then subjected to a 5 CV post-load wash with 50 mM sodium acetate pH 3.6. The product was eluted over 12 CV of 50 mM sodium acetate pH 3.6 with a gradient of 0-1 M NaCl. A column strip was performed with 4 CV of 1 M NaCl pH 3.6.
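The elution step can be expressed numerically. The following Python sketch is a rough illustration and assumes that the 0-1 M NaCl gradient is linear across the 12 CV, which the text does not state explicitly. The column volume in millilitres is also not given, so `column_volume_ml` is a user-supplied, hypothetical parameter. The equilibration step is omitted because its length in CV is not stated.

```python
def nacl_molar(cv_elapsed: float, gradient_cv: float = 12.0,
               start_m: float = 0.0, end_m: float = 1.0) -> float:
    """NaCl concentration (M) after cv_elapsed column volumes.

    Assumes a linear gradient from start_m to end_m over gradient_cv.
    """
    if not 0.0 <= cv_elapsed <= gradient_cv:
        raise ValueError("cv_elapsed must lie within the gradient")
    return start_m + (end_m - start_m) * cv_elapsed / gradient_cv


def cv_to_ml(cv: float, column_volume_ml: float) -> float:
    """Convert column volumes to millilitres for a given column size."""
    return cv * column_volume_ml


# Steps whose length in column volumes is stated in the Example.
STATED_STEPS = [
    ("water flush", 5),
    ("1 M NaOH clean", 3),
    ("water flush", 5),
    ("post-load wash", 5),
    ("elution gradient", 12),
    ("1 M NaCl strip", 4),
]
TOTAL_STATED_CV = sum(cv for _, cv in STATED_STEPS)

print(f"NaCl at mid-gradient: {nacl_molar(6):.2f} M")
print(f"Buffer passed in stated steps: {TOTAL_STATED_CV} CV")
```

Given a measured elution position in CV, `nacl_molar` gives an estimate of the salt concentration at which the product leaves the column, which is useful when converting a gradient screen into a step elution.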

Example 11 - Determining Molecular Weight

The protein eluate obtained from the small-scale purification was analysed by peptide mapping using Liquid Chromatography Mass Spectrometry (LC-MS) to confirm identity. A Waters Acquity Protein BEH C18, 1.7 µm, 2.1 x 75 mm column was used for initial chromatography. Mobile Phase A was 0.1% trifluoroacetic acid in water. Mobile Phase B was 0.1% trifluoroacetic acid in acetonitrile. Seal wash was performed with 20% methanol. Peptide mapping was able to positively identify 83% of the molecule, confirming that the protein expressed and purified was PHA-L. Using the methods described above, protein quantities of 0.18 g/L of culture can be obtained. With the culturing process fully optimised, it is expected that quantities as large as 1.0 g/L of culture will be obtained.

Surprisingly, this is comparable to yields obtained with the more favoured inducible promoter systems in Pichia pastoris, such as the AOX system. This is of significant advantage in that the present invention removes the dependency upon methanol as an inducing agent and energy source, thereby increasing safety and reducing environmental impact.

The intact protein product as well as a deglycosylated sample were analysed by ultra-performance liquid chromatography mass spectrometry. Mass spectrometric data were recorded at a product elution time of 4.5 to 6 min. The protein mass fingerprint was consistently reproducible for intact full protein samples as well as deglycosylated samples.
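A sequence coverage figure of the kind reported for the peptide mapping is normally the percentage of residues in the reference sequence that fall within at least one identified peptide. The Python sketch below shows that calculation under stated assumptions. The reference string is only the first 20 residues encoded by the mature PHA-L DNA sequence (SEQ ID NO: 11), and the peptide list is hypothetical, not the experimental data.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of residues in protein covered by at least one peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# First 20 residues translated from the mature PHA-L DNA sequence.
N_TERMINUS = "SNDIYFNFQRFNETNLILQR"

# Hypothetical identified peptides, for illustration only.
HYPOTHETICAL_PEPTIDES = ["SNDIYFNFQR"]

print(f"{sequence_coverage(N_TERMINUS, HYPOTHETICAL_PEPTIDES):.0f}% covered")
```

Overlapping or repeated peptides are counted once per residue, so the result cannot exceed 100%.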

Although particular embodiments of the invention have been disclosed herein in detail, this has been done by way of example and for the purposes of illustration only. The aforementioned embodiments are not intended to be limiting with respect to the scope of the appended claims, which follow. The choice of nucleic acid starting material, the clone of interest, or type of library used is believed to be a routine matter for the person of skill in the art with knowledge of the presently described embodiments. It is contemplated by the inventors that various substitutions, alterations, and modifications may be made to the invention without departing from the spirit and scope of the invention as defined by the claims.

SEQ ID NO: 3 - Insert from plasmid p232D2, containing PHA-L coding sequence amplified from genomic DNA of P. vulgaris.

PCR Primer CGGATCCC

ATG GCT TCC TCC AAG TTC TTC ACT GTC CTC TTC CTT GTG CTT CTC ACC CAC GCA AAC TCA

M A S S K F F T V L F L V L L T H A N S

SEQ ID NO: 7 - Insert from plasmid p232D2, containing PHA-L coding sequence synthesised from genomic DNA sequence of P. vulgaris.

SEQ ID NO: 10

GAP Promoter DNA sequence.

AGATCTTTTTTGTAGAAATGTCTTGGTGTCCTCGTCCAATCAGGTAGCCATCTCTGAAATATCTGG
CTCCGTTGCAACTCCGAACGACCTGCTGGCAACGTAAAATTCTCCGGGGTAAAACTTAAATGTGG
AGTAATGGAACCAGAAACGTCTCTTCCCTTCTCTCTCCTTCCACCGCCCGTTACCGTCCCTAGGA
AATTTTACTCTGCTGGAGAGCTTCTTCTACGGCCCCCTTGCAGCAATGCTCTTCCCAGCATTACG
TTGCGGGTAAAACGGAGGTCGTGTACCCGACCTAGCAGCCCAGGGATGGAAAAGTCCCGGCCG
TCGCTGGCAATAATAGCGGGCGGACGCATGTCATGAGATTATTGGAAACCACCAGAATCGAATA
TAAAAGGCGAACACCTTTCCCAATTTTGGTTTCTCCTGACCCAAAGACTTTAAATTTAATTTATTTG
TCCCTATTTCAATCAATTGAACAACTATTTCGAAACGATGAGATTTCCTTCAATTTTTACTGCTGTT
TTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACACTACAACAGAAGATGAAACGGCACA
AATTCCGGCTGAAGCTGTCATCGGTTACTCAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGC
CATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATTGCCAGCATTGCTGCTA
AAGAAGAAGGGGTATCTCTCGAGAAAAGAGAGGCTGAAGCTGAATTCACGTGGCCCAGCCGGC
CGTCTCGGATCGGTACCTCGAGCCGCGAATTCGGCGGCCGC

SEQ ID NO: 11

Mature PHA-L DNA sequence

agcaacgatatctacttcaacttccaaaggttcaacgaaaccaaccttatcctccaacgcgatgcctccgtctcatcctccggccagttacgact
aaccaatcttaatggcaacggagaacccagggtgggctctctgggccgcgccttctactccgcccccatccaaatctgggacaacaccaccg
gcaccgtggccagcttcgccacctccttcacattcaatatacaggttcccaacaatgcaggacccgccgatggacttgcctttgctctcgtcccc
gtgggctctcagcccaaagacaaagggggttttctaggtcttttcgacggcagcaacagcaatttccatactgtggctgtggagttcgacaccct
ctacaacaaggactgggaccccacagagcgtcatattggcatcgacgtgaactccatcaggtctatcaaaacgacgcggtgggattttgtga
acggagaaaacgccgaggttctgatcacctatgactcctccacgaatctcttggtggcttctctggtttacccttctcagaaaacgagcttcatcgt
ctctgacacagtggacctgaagagcgttcttcccgagtgggtgagcgttgggttctctgccacaactgggattaataaagggaacgttgaaacg
aacgacgtcctctcttggtcttttgcttccaagctctccgatggcaccacatctgaaggtttgaatctcgccaacttggtcctcaacaaaatcctctag

SEQ ID NO: 12

Gene C DNA sequence

CAATTGgaaacgAtgagatttccttcaatttttactgcagttttattcgcagcatcctcCgcattagctgctccagtcaacactacaacagaag
atgaaaCggcacaaattccggctgaagctgtcatcggttacttagatttaGaaggggatttcgatgttgctgttttgccattttccaacagcacaA
ataacgggttattgtttataaatactactattgccagcattgctgctaaagaagaaggggtatctttggataaaagagaggctgaagctagcaac
gatatctacttcaacttccaaaggttcaacgaaaccaaccttatcctccaacgcgatgcctccgtctcatcctccggccagttacgactaaccaat
cttaatggcaacggagaacccagggtgggctctctgggccgcgccttctactccgcccccatccaaatctgggacaacaccaccggcaccgt
ggccagcttcgccacctccttcacattcaatatacaggttcccaacaatgcaggacccgccgatggacttgcctttgctctcgtccccgtgggctc
tcagcccaaagacaaagggggttttctaggtcttttcgacggcagcaacagcaatttccatactgtggctgtggagttcgacaccctctacaaca
aggactgggaccccacagagcgtcatattggcatcgacgtgaactccatcaggtctatcaaaacgacgcggtgggattttgtgaacggagaa
aacgccgaggttctgatcacctatgactcctccacgaatctcttggtggcttctctggtttacccttctcagaaaacgagcttcatcgtctctgacac
agtggacctgaagagcgttcttcccgagtgggtgagcgttgggttctctgccacaactgggattaataaagggaacgttgaaacgaacgacgt
cctctcttggtcttttgcttccaagctctccgatggcaccacatctgaaggtttgaatctcgccaacttggtcctcaacaaaatcctctagtaaGCG
GCCGC

SEQ ID NO: 13

Gene D DNA sequence

CAATTGAACAACTATTTCGAAACGATGAGATTTCCTTCAATTTTTACTGCTGTTTTATTCGCAGCAT
CCTCCGCATTAGCTGCTCCAGTCAACACTACAACAGAAGATGAAACGGCACAAATTCCGGCTGA
AGCTGTCATCGGTTACTCAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGCCATTTTCCAACAG
CACAAATAACGGGTTATTGTTTATAAATACTACTATTGCCAGCATTGCTGCTAAAGAAGAAGGGGT
ATCTCTCGAGAAAAGAGAGGCTGAAGCTagcaacgatatctacttcaacttccaaaggttcaacgaaaccaaccttatcctc
caacgcgatgcctccgtctcatcctccggccagttacgactaaccaatcttaatggcaacggagaacccagggtgggctctctgggccgcgc
cttctactccgcccccatccaaatctgggacaacaccaccggcaccgtggccagcttcgccacctccttcacattcaatatacaggttcccaac
aatgcaggacccgccgatggacttgcctttgctctcgtccccgtgggctctcagcccaaagacaaagggggttttctaggtcttttcgacggcag
caacagcaatttccatactgtggctgtggagttcgacaccctctacaacaaggactgggaccccacagagcgtcatattggcatcgacgtgaa
ctccatcaggtctatcaaaacgacgcggtgggattttgtgaacggagaaaacgccgaggttctgatcacctatgactcctccacgaatctcttgg
tggcttctctggtttacccttctcagaaaacgagcttcatcgtctctgacacagtggacctgaagagcgttcttcccgagtgggtgagcgttgggttc
tctgccacaactgggattaataaagggaacgttgaaacgaacgacgtcctctcttggtcttttgcttccaagctctccgatggcaccacatctgaa
ggtttgaatctcgccaacttggtcctcaacaaaatcctctagtaaGCGGCCGC