HANLY TIMOTHY (US)
YU OLIVER (US)
CLAIMS
What is claimed is:
1. A method of producing thaumatin, the method comprising culturing a recombinant yeast cell comprising a polynucleotide encoding a fusion polypeptide comprising tandem repeats of thaumatin.
2. The method of claim 1, wherein each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 1.
3. The method of claim 1, wherein each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 3.
4. The method of any one of claims 1-3, wherein each thaumatin in the tandem repeats is separated by a spacer.
5. The method of claim 4, wherein the spacer comprises a protease cleavage site for a yeast protease.
6. The method of claim 4 or claim 5, wherein the spacer comprises the amino acid sequence of any one of SEQ ID NOs: 39, 41, and 43.
7. The method of any one of claims 1-6, wherein the fusion polypeptide comprises 2-8 repeats of thaumatin.
8. The method of any one of claims 1-7, wherein the fusion polypeptide comprises an amino acid sequence at least 80% identical to the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55 and 57.
9. The method of any one of claims 1-8, wherein the fusion polypeptide comprises the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55 and 57.
10. The method of any one of claims 1-9, wherein the fusion polypeptide further comprises an N-terminal signal peptide.
11. The method of claim 10, wherein the signal peptide is a yeast alpha mating factor signal peptide, optionally wherein the yeast alpha mating factor signal peptide comprises the amino acid sequence of SEQ ID NO: 37.
12. The method of claim 10, wherein the signal peptide is an Ost1 signal peptide, wherein the Ost1 signal peptide comprises the amino acid sequence of SEQ ID NO: 5.
13. The method of claim 4, wherein the spacer comprises a 2A linker, optionally wherein the 2A linker comprises the amino acid sequence of SEQ ID NO: 7.
14. The method of any one of claims 1-13, wherein the polynucleotide is operably linked to a promoter, optionally wherein the promoter is an AOX1 promoter.
15. The method of any one of claims 1-14, wherein the polynucleotide is operably linked to a transcription terminator, optionally wherein the transcription terminator is an AOX1 terminator.
16. The method of any one of claims 1-15, wherein the polynucleotide is provided on a vector, optionally wherein the vector is a plasmid.
17. The method of any one of claims 1-15, wherein the polynucleotide is integrated into the genome of the recombinant yeast cell, optionally wherein the polynucleotide is integrated into a HIS4 locus of the genome of the recombinant yeast cell.
18. The method of any one of claims 1-17, wherein the recombinant yeast cell further comprises one or more polynucleotides encoding one or more chaperones selected from: PDI1, HAC1, ERO1, ERO2, ERV1, ERV2, KAR2, SEC1, SLY1, and GPX1, optionally wherein the recombinant yeast cell further comprises a polynucleotide encoding PpPDI1, a polynucleotide encoding HAC1, one or more polynucleotides encoding PpPDI1, PpERO1, and PpERV2, or one or more polynucleotides encoding AtPDI1, AtERV1, and AtERO1.
19. The method of claim 18, wherein the one or more polynucleotides encoding the one or more chaperones are provided on one or more vectors.
20. The method of claim 18, wherein the one or more polynucleotides encoding the one or more chaperones are integrated into the genome of the recombinant yeast cell, optionally wherein the one or more polynucleotides are integrated into a HIS4 locus of the genome of the recombinant yeast cell.
21. The method of any one of claims 1-20, wherein the recombinant yeast cell further comprises one or more polynucleotides encoding one or more proteases selected from KEX1, KEX2, and Ste13.
22. The method of claim 21, wherein the one or more polynucleotides encoding the one or more proteases are provided on one or more vectors.
23. The method of claim 21, wherein the one or more polynucleotides encoding the one or more proteases are integrated into the genome of the recombinant yeast cell, optionally wherein the one or more polynucleotides are integrated into an AOX1 locus of the genome of the recombinant yeast cell.
24. The method of any one of claims 1-23, wherein the yeast cell is a Pichia pastoris cell.
25. The method of any one of claims 1-24, further comprising isolating the thaumatin.
26. The method of claim 25, wherein the isolated thaumatin is selected from the group consisting of thaumatin I and thaumatin II.
27. Thaumatin produced using the method of any one of claims 1-26, for use as a sweetener.
28. A composition comprising the thaumatin produced using the method of any one of claims 1-26.
29. A consumable product comprising the thaumatin produced using the method of any one of claims 1-26.
30. The composition of claim 28, or the consumable product of claim 29, further comprising a second sweetener.
31. The composition of claim 28, or the consumable product of claim 29 or claim 30, wherein the second sweetener is a rebaudioside.
32. The composition or the consumable product of any one of claims 28-31, further comprising at least one additive selected from the group consisting of a carbohydrate, a polyol, an amino acid or salt thereof, a polyamino acid or salt thereof, a sugar acid or salt thereof, a nucleotide, an organic acid, an inorganic acid, an organic salt, an organic acid salt, an organic base salt, an inorganic salt, a bitter compound, a flavorant, a flavoring ingredient, an astringent compound, a protein, a protein hydrolysate, a surfactant, an emulsifier, a flavonoid, an alcohol, a polymer, and combinations thereof.
33. The consumable product of any one of claims 29-31, wherein the consumable product is selected from: a food product, a beverage product, a nutraceutical, a pharmaceutical, a dietary supplement, a dental hygienic composition, an edible gel composition, a cosmetic product and a tabletop flavoring.
34. The consumable product of any one of claims 29-32, wherein the beverage product is selected from the group consisting of a carbonated beverage product and a non-carbonated beverage product.
35. The consumable product of claim 34, wherein the beverage product is selected from the group consisting of a soft drink, a fountain beverage, a frozen beverage, a ready-to-drink beverage, a frozen and ready-to-drink beverage, coffee, tea, a dairy beverage, a powdered soft drink, a liquid concentrate, flavored water, enhanced water, fruit juice, a fruit juice flavored drink, a sport drink, and an energy drink.
36. A polynucleotide comprising the nucleotide sequence of any one of SEQ ID NOs: 10 and 12.
37. A polypeptide comprising an amino acid sequence that is at least 80% identical to any one of SEQ ID NOs: 9, 11, 51, 53, 55 and 57.
38. The polypeptide of claim 37, comprising the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55, and 57.
39. A recombinant yeast cell comprising a polynucleotide encoding a fusion polypeptide comprising tandem repeats of thaumatin.
40. The recombinant yeast cell of claim 39, wherein thaumatin is selected from the group consisting of thaumatin I and thaumatin II.
41. The recombinant yeast cell of claim 39 or 40, wherein the yeast cell is a Pichia pastoris cell.
42. The recombinant yeast cell of any one of claims 39 to 41, wherein each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 1.
43. The recombinant yeast cell of any one of claims 39 to 41, wherein each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 3.
44. The recombinant yeast cell of any one of claims 39 to 43, wherein each thaumatin in the tandem repeats is separated by a spacer.
45. The recombinant yeast cell of claim 44, wherein the spacer comprises a protease cleavage site for a yeast protease.
46. The recombinant yeast cell of claim 44, wherein the spacer comprises the amino acid sequence of any one of SEQ ID NOs: 39, 41, and 43.
47. The recombinant yeast cell of any one of claims 39 to 46, wherein the fusion polypeptide comprises 2-8 repeats of thaumatin.
48. The recombinant yeast cell of any one of claims 39 to 47, wherein the fusion polypeptide comprises an amino acid sequence at least 80% identical to the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55 and 57.
49. The recombinant yeast cell of any one of claims 39 to 47, wherein the fusion polypeptide comprises the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55, and 57.
50. The recombinant yeast cell of claim 39, wherein the polynucleotide comprises a nucleotide sequence at least 80% identical to the nucleotide sequence of any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58.
51. The recombinant yeast cell of claim 39, wherein the polynucleotide comprises the nucleotide sequence of any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58.
52. A polynucleotide comprising a nucleotide sequence that is at least 80% identical to any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58.
53. The polynucleotide of claim 52, wherein the polynucleotide comprises the nucleotide sequence of any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58.
RELATED APPLICATION
This application claims the benefit under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 63/374,764, entitled “PRODUCTION OF NATURAL PEPTIDE SWEETENER”, filed on September 07, 2022, the entire contents of which are incorporated herein by reference.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The contents of the electronic sequence listing (C149770091WO00-SEQ-VLJ.xml; Size: 102,226 bytes; and Date of Creation: September 5, 2023) are herein incorporated by reference in their entirety.
FIELD OF THE INVENTION
The field of the invention relates to methods and processes useful in the production of natural peptide sweeteners.
BACKGROUND
Zero- or low-calorie sweeteners or sugar substitutes that can be used in foods and/or beverages to replace or reduce high-calorie sweeteners and/or sugar content are desirable. Thaumatin, a protein first isolated from the fruit of the West African plant Thaumatococcus daniellii Benth, has been reported to be about 100,000 times sweeter than sucrose on a molar basis and may be suitable for use as a sweetener.
SUMMARY
The present disclosure, in some instances, provides methods of producing thaumatin. The method comprises culturing a recombinant yeast cell comprising a polynucleotide encoding a polypeptide comprising tandem repeats of thaumatin.
In a first aspect of the present invention, each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 1. In a second aspect of the present invention, each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 3. In some instances, each thaumatin in the tandem repeats is separated by a spacer. In one example, the spacer comprises a protease cleavage site for a yeast protease. In some cases, the spacer comprises the amino acid sequence of any one of SEQ ID NOs: 39, 41, and 43. In representative instances, the polypeptide comprises 2-8 repeats of thaumatin.
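As a rough illustration of the tandem-repeat architecture described above, the Python sketch below joins copies of a thaumatin repeat with a spacer between neighbouring copies and enforces the 2-8 repeat range. The repeat string is a hypothetical placeholder, not SEQ ID NO: 1 or 3, and the spacer is spelled out only from the KR(EA)5 name used for SEQ ID NO: 43 in the figure descriptions; the function name `build_tandem_fusion` is likewise illustrative.

```python
# Sketch only: a fusion polypeptide modelled as a one-letter amino acid string.
# REPEAT_PLACEHOLDER is hypothetical and does NOT reproduce SEQ ID NO: 1 or 3.
REPEAT_PLACEHOLDER = "ACDEFGHILM"

# KR(EA)5 spelled out from its name: Lys-Arg followed by five Glu-Ala pairs.
SPACER_KR_EA5 = "KR" + "EA" * 5


def build_tandem_fusion(repeat: str, spacer: str, copies: int) -> str:
    """Return `copies` repeats joined head-to-tail with `spacer` between neighbours."""
    if not 2 <= copies <= 8:
        raise ValueError("this sketch only models 2-8 repeats")
    return spacer.join([repeat] * copies)


if __name__ == "__main__":
    fusion = build_tandem_fusion(REPEAT_PLACEHOLDER, SPACER_KR_EA5, 4)
    # 4 repeats of 10 residues plus 3 spacers of 12 residues
    print(len(fusion), fusion.count(SPACER_KR_EA5))  # prints: 76 3
```

Because a spacer sits only between adjacent repeats, n repeats always carry n - 1 spacers, so neither end of the fusion carries a dangling cleavage site.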
The polypeptide may further comprise an N-terminal signal peptide, such as a yeast alpha mating factor signal peptide. In some cases, the yeast alpha mating factor signal peptide comprises the amino acid sequence of SEQ ID NO: 37. In some embodiments, the signal peptide comprises an Ost1 signal peptide. In further embodiments, the polypeptide comprises an Ost1 signal peptide for each repeat of thaumatin. In representative examples, the Ost1 signal peptide comprises the amino acid sequence of SEQ ID NO: 5.
In some embodiments, the polypeptide comprises a 2A linker between signal peptide-thaumatin open reading frames. In representative examples, the 2A linker comprises the amino acid sequence of SEQ ID NO: 7.
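A 2A linker acts during translation: 2A peptides end in an Asn-Pro-Gly-Pro (NPGP) motif, and the ribosome does not form the peptide bond between that glycine and the final proline, so one open reading frame yields an upstream product ending in ...NPG and a downstream product starting with P. The sketch below models this split on a protein string. It uses the well-known P2A sequence from porcine teschovirus-1 purely as an example, because SEQ ID NO: 7 is not reproduced here; the flanking residues are hypothetical placeholders, and the model assumes 100% skipping efficiency, which real 2A peptides do not achieve.

```python
# Sketch: idealised 2A "ribosomal skipping" on a one-letter protein string.
# P2A (porcine teschovirus-1) is an illustrative example, NOT SEQ ID NO: 7.
P2A = "ATNFSLLKQAGDVEENPGP"


def split_at_2a(polyprotein: str, motif: str = "NPGP") -> list[str]:
    """Split after the Gly of every NPGP motif, as if skipping were 100% efficient."""
    products = []
    start = 0
    hit = polyprotein.find(motif, start)
    while hit != -1:
        cut = hit + 3  # N-P-G stays upstream; the downstream product begins with P
        products.append(polyprotein[start:cut])
        start = cut
        hit = polyprotein.find(motif, start)
    products.append(polyprotein[start:])
    return products


if __name__ == "__main__":
    # "AAAA" and "CCCC" are hypothetical stand-ins for two signal peptide-thaumatin ORFs.
    for product in split_at_2a("AAAA" + P2A + "CCCC"):
        print(product)
    # prints:
    # AAAAATNFSLLKQAGDVEENPG
    # PCCCC
```

The model makes visible that 2A residues remain attached to the C-terminus of the upstream product, which is one reason a cleavable spacer between thaumatin and the 2A linker can be useful.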
The polynucleotide may be operably linked to a promoter. In some cases, the promoter is an AOX1 promoter. The polynucleotide may also be operably linked to a transcription terminator. In some cases, the transcription terminator is an AOX1 terminator.
In some instances, the polynucleotide is provided on a vector, optionally wherein the vector is a plasmid. The polynucleotide may be integrated into the genome of the recombinant yeast cell. In representative examples, the polynucleotide is integrated into a HIS4 locus of the genome of the recombinant yeast cell.
The recombinant yeast cell may further comprise one or more polynucleotides encoding one or more chaperones selected from: PDI1, HAC1, ERO1, ERO2, ERV1, ERV2, KAR2, SEC1, SLY1, and GPX1. In a number of instances, the recombinant yeast cell further comprises a polynucleotide encoding PDI1; a polynucleotide encoding HAC1; one or more polynucleotides encoding PDI1, ERO1, and ERV2; one or more polynucleotides encoding PDI1, ERV1, and ERO1; or polynucleotides encoding both PDI1 and KAR2. In some cases, the one or more polynucleotides encoding the one or more chaperones are provided on one or more vectors. The one or more polynucleotides encoding the one or more chaperones may be integrated into the genome of the recombinant yeast cell. In some cases, the one or more polynucleotides are integrated into a HIS4 locus of the genome of the recombinant yeast cell.
The recombinant yeast cell may further comprise one or more polynucleotides encoding one or more proteases selected from KEX1, KEX2, and Ste13. In some cases, the one or more polynucleotides encoding the one or more proteases are provided on one or more vectors. The one or more polynucleotides encoding the one or more proteases may be integrated into the genome of the recombinant yeast cell. In some cases, the one or more polynucleotides are integrated into an AOX1 locus of the genome of the recombinant yeast cell.
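To make the roles of these proteases concrete, the sketch below applies deliberately simplified models of each to a two-repeat fusion joined by a KR(EA)5 spacer: Kex2 cuts on the carboxyl side of Lys-Arg pairs, Kex1 (a carboxypeptidase) trims C-terminal Lys and Arg residues, and Ste13 (a dipeptidyl aminopeptidase) removes N-terminal X-Ala dipeptides such as Glu-Ala. Real specificity depends on sequence context and accessibility, so this is a string-level illustration under stated assumptions, not a prediction of the products formed in Pichia pastoris. The repeat sequence and all function names are hypothetical.

```python
import re

# Hypothetical placeholder repeat (NOT SEQ ID NO: 1 or 3), chosen to contain no K or R
# and not to start with EA/DA, so that only the spacer is processed in this model.
REPEAT = "ACDEFGHILM"
SPACER = "KR" + "EA" * 5  # KR(EA)5


def kex2_cleave(seq: str) -> list[str]:
    """Simplified Kex2: cut after every Lys-Arg pair (zero-width lookbehind split)."""
    return [piece for piece in re.split(r"(?<=KR)", seq) if piece]


def kex1_trim(seq: str) -> str:
    """Simplified Kex1: remove C-terminal Lys/Arg residues one at a time."""
    while seq and seq[-1] in "KR":
        seq = seq[:-1]
    return seq


def ste13_trim(seq: str, dipeptides: tuple[str, ...] = ("EA", "DA")) -> str:
    """Simplified Ste13: strip N-terminal X-Ala dipeptides (only EA and DA modelled)."""
    while seq[:2] in dipeptides:
        seq = seq[2:]
    return seq


def process(fusion: str) -> list[str]:
    """Kex2 first, then Kex1 and Ste13 trimming of every fragment."""
    return [ste13_trim(kex1_trim(fragment)) for fragment in kex2_cleave(fusion)]


if __name__ == "__main__":
    fusion = REPEAT + SPACER + REPEAT
    print(kex2_cleave(fusion))  # ['ACDEFGHILMKR', 'EAEAEAEAEAACDEFGHILM']
    print(process(fusion))      # ['ACDEFGHILM', 'ACDEFGHILM']
```

In this model, Kex2 alone leaves a KR tail on the upstream repeat and an (EA)5 head on the downstream repeat; only the combination of the three activities returns two unmodified repeats.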
The yeast cell may be a Pichia pastoris cell. The method may further comprise isolating the thaumatin. In one example, the method comprises isolating thaumatin I. In another example, the method comprises isolating thaumatin II. In some embodiments, the method further comprises isolating both thaumatin I and thaumatin II.
Also provided are uses of thaumatin produced using the method described herein as a sweetener. For example, in a further aspect, the present disclosure provides compositions or consumable products comprising the thaumatin produced using the method described herein. In some examples, the composition or the consumable product further comprises a second sweetener. In some examples, the second sweetener is a rebaudioside. In some examples, the composition or the consumable product further comprises at least one additive selected from the group consisting of a carbohydrate, a polyol, an amino acid or salt thereof, a polyamino acid or salt thereof, a sugar acid or salt thereof, a nucleotide, an organic acid, an inorganic acid, an organic salt, an organic acid salt, an organic base salt, an inorganic salt, a bitter compound, a flavorant, a flavoring ingredient, an astringent compound, a protein, a protein hydrolysate, a surfactant, an emulsifier, a flavonoid, an alcohol, a polymer, and combinations thereof.
In some instances, the consumable product is selected from: a food product, a beverage product, a nutraceutical, a pharmaceutical, a dietary supplement, a dental hygienic composition, an edible gel composition, a cosmetic product and a tabletop flavoring. In some instances, the beverage product is selected from the group consisting of a carbonated beverage product and a non-carbonated beverage product. In some examples, the beverage product is selected from the group consisting of a soft drink, a fountain beverage, a frozen beverage, a ready-to-drink beverage, a frozen and ready-to-drink beverage, coffee, tea, a dairy beverage, a powdered soft drink, a liquid concentrate, flavored water, enhanced water, fruit juice, a fruit juice flavored drink, a sport drink, and an energy drink.
Further provided herein are recombinant yeast cells comprising a polynucleotide encoding a polypeptide comprising tandem repeats of thaumatin. In some examples, the recombinant yeast cell comprises a polypeptide comprising an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 9, 11, 51, 53, 55 and 57. In some examples, the recombinant yeast cell comprises a polynucleotide comprising a nucleotide sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58.
Further provided herein are polypeptides comprising any one of SEQ ID NOs: 9, 11, 51, 53, 55 and 57. In some examples, the polypeptides comprise an amino acid sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 9, 11, 51, 53, 55 and 57.
Further provided herein are polynucleotides comprising any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58. In some examples, the polynucleotides comprise a nucleotide sequence that is at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58.
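A percent identity value depends on how the two sequences are aligned and on what the count of identical positions is divided by, and this section does not fix either choice. The sketch below assumes a global Needleman-Wunsch alignment with illustrative unit scores (match +1, mismatch -1, gap -1) and divides identical aligned positions by the full alignment length, gaps included. Other conventions, such as BLAST local alignments, substitution matrices, or dividing by the reference length, can give different numbers. The function names are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(a: str, b: str) -> float:
    """100 x identical aligned positions / alignment length (gaps included)."""
    if not a and not b:
        raise ValueError("at least one sequence must be non-empty")
    x, y = global_align(a, b)
    identical = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * identical / len(x)


if __name__ == "__main__":
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 90.0 (one substitution)
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIK"))   # 90.0 (one terminal gap)
```

Under this convention, a candidate would meet an "at least 80% identical" threshold when `percent_identity(candidate, reference) >= 80.0`; the same test applies to nucleotide strings.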
While the disclosure is susceptible to various modifications and alternative forms, specific instances thereof are shown by way of example in the figures and will herein be described in detail. It should be understood, however, that the figures and detailed description presented herein are not intended to limit the disclosure to the particular instances disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
Other features and advantages of this invention will become apparent in the following detailed description of preferred instances of this invention, taken with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying figures are not intended to be drawn to scale. In the figures, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every figure. In the figures:
FIG. 1 provides a schematic of the plasmid map of pHKA-ThmIx1. The plasmid is composed of 8,168 bp and contains an AOX1 promoter and an Ost1 signal peptide fused to Thaumatin.
FIG. 2 provides a schematic of the plasmid map of pHKA-ThmIx4. The plasmid is composed of 14,318 bp and contains 4 copies of the AOX1 promoter and an Ost1 signal peptide fused to Thaumatin.
FIG. 3 provides a schematic of the plasmid map of pHKA-ThmIIx1. The plasmid is composed of 8,168 bp and contains an AOX1 promoter and an Ost1 signal peptide fused to Thaumatin.
FIG. 4 provides a schematic of the plasmid map of pHKA-ThmIIx4. The plasmid is composed of 14,318 bp and contains 4 copies of the AOX1 promoter and an Ost1 signal peptide fused to Thaumatin.
FIG. 5 provides an SDS-PAGE analysis of medium samples from induced culture. Legend: M: standard ladder; lane 1: thaumatin I expressed in the ThmI engineered strain; lane 2: thaumatin II expressed in the ThmII engineered strain. Arrows show thaumatin I and II.
FIG. 6 provides an HPLC analysis. Panel A shows the thaumatin standard; panel B shows the media sample of the ThmI strain; and panel C shows the media sample of the ThmII strain. Arrows indicate peaks corresponding to thaumatin I and II.
FIG. 7 provides an LC-MS analysis of the thaumatin standard and samples. Panel A shows the thaumatin I standard; panel B shows the sample from the pHKA-ThmI strain; and panel C shows the sample from the pHKA-ThmII strain.
FIG. 8 provides a schematic of the plasmid map of pHKA-Ost1-ThmI-2A. The plasmid is composed of 8,915 bp and contains an AOX1 promoter, an Ost1 signal peptide fused to Thaumatin I, a 2A linker, and an Ost1 signal peptide fused to Thaumatin I.
FIG. 9 provides a schematic of the plasmid map of pHKA-Ost1-ThmII-2A. The plasmid is composed of 8,915 bp and contains an AOX1 promoter, an Ost1 signal peptide fused to Thaumatin II, a 2A linker, and an Ost1 signal peptide fused to Thaumatin II.
FIG. 10 provides an SDS-PAGE analysis of medium samples from induced culture. Lanes 1-6 are biological replicates of GS115 expressing pHKA-Ost1-ThmII-2A; lane M is the protein ladder.
FIG. 11 provides an LC-MS analysis of a media sample from the pHKA-Ost1-ThmII-2A strain.
FIG. 12 demonstrates that co-expression of PpPDI1 or AtPDI1 in the ThmII strain increased thaumatin II production. Control: parent strain; PpPDI1 3 and 5: individual colonies from transformation of pPICZ-PpPDI1 into the ThmII strain; AtPDI1 3 and 5: individual colonies from transformation of pPICZ-AtPDI1 into the ThmII strain.
FIG. 13 demonstrates that co-expression of HAC1 in the ThmII strain increased thaumatin production. Control: parent strain; HAC1 colony no. 3 and HAC1 colony no. 4: individual colonies from transformation of pPICZ-HAC1 into the ThmII strain.
FIG. 14 demonstrates that co-expression of PDI/ERO/ERV in the ThmII strain increased thaumatin production. Control: parent strain; 4 and 5: individual colonies from transformation of pPICZ-PpPDI/PpERO1/PpERV2 into the ThmII strain.
FIG. 15 demonstrates that co-expression of PDI1/KAR2 in the ThmII strain increased thaumatin production. Control: average of 5 replicates of the ThmII parent strain; PDI1+KAR2: average of five replicates of a colony from transformation of pPICZ-PpPDI1/PpKAR2 into the ThmII strain.
FIG. 16 provides an SDS-PAGE analysis of ThmIx4 fermentation samples. Lanes 1 and 2: samples before methanol induction; lanes 3-11: samples taken 22, 27, 47, 55, 72, 77, 95, 102, and 118 hr after methanol induction; M: protein ladder; lane 12: thaumatin standard. Arrow shows thaumatin I.
FIG. 17 provides an SDS-PAGE analysis of ThmIIx4 fermentation samples. Lane 1: thaumatin standard; M: protein ladder; lanes 2 and 3: samples before methanol induction; lanes 4-12: samples taken 22, 27, 47, 55, 72, 77, 95, 102, and 118 hr after methanol induction. Arrow shows thaumatin II.
FIG. 18 provides a schematic of the plasmid map of pHKA-Ost1-ThmI-linker-2Ax2, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43). The plasmid is composed of 8,954 bp and contains an AOX1 promoter, an Ost1 signal peptide fused to Thaumatin I, a KR(EA)5 linker (SEQ ID NO: 43), a 2A linker, and an Ost1 signal peptide fused to Thaumatin I.
FIG. 19 provides a schematic of the plasmid map of pHKA-Ost1-ThmI-linker-2Ax8, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43). The plasmid is composed of 13,652 bp and contains an AOX1 promoter, an Ost1 signal peptide fused to Thaumatin I, and 7 repeats of a unit comprising a KR(EA)5 linker (SEQ ID NO: 43), a 2A linker, and an Ost1 signal peptide fused to Thaumatin I.
FIG. 20 provides a schematic of the plasmid map of pHKA-Ost1-ThmI-linker-2Ax2M8, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43). The plasmid is composed of 28,806 bp and contains 8 copies of an AOX1 promoter, an Ost1 signal peptide fused to Thaumatin I, a KR(EA)5 linker (SEQ ID NO: 43), a 2A linker, and an Ost1 signal peptide fused to Thaumatin I.
FIG. 21 provides a schematic of the plasmid map of pHKA-Ost1-ThmII-linker-2Ax2, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43). The plasmid is composed of 8,954 bp and contains an AOX1 promoter, an Ost1 signal peptide fused to Thaumatin II, a KR(EA)5 linker (SEQ ID NO: 43), a 2A linker, and an Ost1 signal peptide fused to Thaumatin II.
FIG. 22 provides a schematic of the plasmid map of pHKA-Ost1-ThmII-linker-2Ax8, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43). The plasmid is composed of 13,652 bp and contains an AOX1 promoter, an Ost1 signal peptide fused to Thaumatin II, and 7 repeats of a unit comprising a KR(EA)5 linker (SEQ ID NO: 43), a 2A linker, and an Ost1 signal peptide fused to Thaumatin II.
FIG. 23 provides a schematic of the plasmid map of pHKA-Ost1-ThmII-2Ax2M8. The plasmid is composed of 28,806 bp and contains 8 copies of an AOX1 promoter, an Ost1 signal peptide fused to Thaumatin II, a KR(EA)5 linker (SEQ ID NO: 43), a 2A linker, and an Ost1 signal peptide fused to Thaumatin II.
FIG. 24 provides an SDS-PAGE analysis of medium samples from induced culture. Lanes 1-6 are biological replicates of GS115 expressing pHKA-Ost1-ThmII-linker-2Ax2, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43); lane M is the protein ladder.
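The plasmid sizes given in the figure descriptions allow a quick consistency check. Assuming, which the descriptions do not state, that each pair of related plasmids shares an identical backbone and differs only by the added units, the size difference divided by the number of added units gives an approximate size per unit. The unit counts below are inferred from the plasmid names and descriptions (one versus four copies; one versus seven linker-2A-Ost1-Thaumatin I units; one versus eight copies), and the dictionary keys are labels chosen for this sketch.

```python
# Plasmid sizes in bp, transcribed from FIGs. 1, 2, 18, 19 and 20.
SIZES_BP = {
    "ThmI x1 (FIG. 1)": 8_168,
    "ThmI x4 (FIG. 2)": 14_318,
    "ThmI-linker-2Ax2 (FIG. 18)": 8_954,
    "ThmI-linker-2Ax8 (FIG. 19)": 13_652,
    "ThmI-linker-2Ax2M8 (FIG. 20)": 28_806,
}


def bp_per_added_unit(smaller: str, larger: str, added_units: int) -> float:
    """Size difference spread evenly over the added units (assumes identical backbones)."""
    return (SIZES_BP[larger] - SIZES_BP[smaller]) / added_units


if __name__ == "__main__":
    # x1 -> x4: three additional copies
    print(bp_per_added_unit("ThmI x1 (FIG. 1)", "ThmI x4 (FIG. 2)", 3))  # 2050.0
    # 1 -> 7 linker-2A-Ost1-Thaumatin I units: six additional units
    print(bp_per_added_unit("ThmI-linker-2Ax2 (FIG. 18)",
                            "ThmI-linker-2Ax8 (FIG. 19)", 6))  # 783.0
    # 1 -> 8 copies of the 2Ax2 cassette: seven additional copies
    print(bp_per_added_unit("ThmI-linker-2Ax2 (FIG. 18)",
                            "ThmI-linker-2Ax2M8 (FIG. 20)", 7))  # 2836.0
```

All three differences divide evenly, and 783 bp corresponds to exactly 261 codons, which is consistent with an in-frame insertion per unit; the sketch cannot confirm that the backbones are in fact identical.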
DEFINITIONS
As used herein, the singular forms “a,” “an” and “the” include plural references unless the content clearly dictates otherwise.
To the extent that the term “include,” “have,” or the like is used in the description or the claims, such term is intended to be inclusive in a manner similar to the term “comprise” as “comprise” is interpreted when employed as a transitional word in a claim.
The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
“Cellular system” refers to any cells that provide for the expression of ectopic proteins. It includes bacteria, yeast, plant cells, and animal cells. It may include prokaryotic or eukaryotic host cells that are modified to express a recombinant protein and cultivated in an appropriate culture medium. It also includes the in vitro expression of proteins based on cellular components, such as ribosomes.
"Coding sequence" is to be given its ordinary and customary meaning to a person of ordinary skill in the art, and is used without limitation to refer to a DNA sequence that encodes for a specific amino acid sequence.
“Growing the Cellular System”. Growing includes providing an appropriate medium that would allow cells to multiply and divide, to form a cell culture. It also includes providing resources so that cells or cellular components can translate and make recombinant proteins.
“Protein Expression”. Protein production can occur after gene expression. It consists of the stages after DNA has been transcribed to messenger RNA (mRNA); the mRNA is then translated into polypeptide chains, which are ultimately folded into proteins. DNA or RNA may be present in the cells through transfection, a process of deliberately introducing nucleic acids into cells. The term is often used for non-viral methods in eukaryotic cells. It may also refer to other methods and cell types, although other terms are preferred: "transformation" is more often used to describe non-viral DNA transfer in bacteria and in non-animal eukaryotic cells, including plant cells. In animal cells, transfection is the preferred term, as transformation is also used to refer to progression to a cancerous state (carcinogenesis) in these cells. Transduction is often used to describe virus-mediated DNA transfer. Transformation, transduction, and viral infection are included under the definition of transfection for this application.
“Yeast”. According to the current disclosure, yeasts are eukaryotic, single-celled microorganisms classified as members of the fungus kingdom. Yeasts are believed to have evolved from multicellular ancestors.
The term "complementary" is to be given its ordinary and customary meaning to a person of ordinary skill in the art and is used without limitation to describe the relationship between nucleotide bases that are capable of hybridizing to one another. For example, with respect to DNA, adenine is complementary to thymine and cytosine is complementary to guanine. Accordingly, the subject technology also includes isolated nucleic acid fragments that are complementary to the complete sequences as reported in the accompanying Sequence Listing, as well as to those substantially similar nucleic acid sequences.
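The base-pairing rule in this definition can be expressed in a few lines. The sketch below, with hypothetical function names, builds the base-by-base complement of a DNA sequence and its reverse complement (the complementary strand read 5' to 3'), and checks whether two strands pair at every position; it handles only the four standard bases.

```python
# Watson-Crick pairing for DNA: A<->T and C<->G, upper and lower case.
_PAIRS = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Base-by-base complement, aligned position by position with the input."""
    return seq.translate(_PAIRS)


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' to 3' (given a 5'->3' input)."""
    return complement(seq)[::-1]


def is_perfect_duplex(top_5to3: str, bottom_5to3: str) -> bool:
    """True if two strands, both written 5'->3', pair at every position."""
    return reverse_complement(top_5to3).upper() == bottom_5to3.upper()


if __name__ == "__main__":
    print(reverse_complement("ATGGCT"))                 # AGCCAT
    print(is_perfect_duplex("ATGGCT", "AGCCAT"))        # True
```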
The terms "nucleic acid" and "nucleotide" are to be given their respective ordinary and customary meanings to a person of ordinary skill in the art and are used without limitation to refer to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally-occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified or degenerate variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated.
The term "isolated" is to be given its ordinary and customary meaning to a person of ordinary skill in the art, and when used in the context of an isolated nucleic acid or an isolated polypeptide, is used without limitation to refer to a nucleic acid or polypeptide that, by the hand of man, exists apart from its native environment and is therefore not a product of nature. An isolated nucleic acid or polypeptide can exist in a purified form or can exist in a non-native environment such as, for example, in a transgenic host cell.
The terms "incubating" and "incubation" as used herein mean a process of mixing two or more chemical or biological entities (such as a chemical compound and an enzyme) and allowing them to interact under conditions favorable for producing a thaumatin composition.
The term "degenerate variant" refers to a nucleic acid sequence having a residue sequence that differs from a reference nucleic acid sequence by one or more degenerate codon substitutions. Degenerate codon substitutions can be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed base and/or deoxyinosine residues. A nucleic acid sequence and all of its degenerate variants encode the same amino acid sequence or polypeptide.
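The defining property of a degenerate variant, a different nucleotide sequence that encodes the same amino acids, can be checked with a short Python sketch. It assumes the standard genetic code and plain A/C/G/T codons (mixed bases and deoxyinosine are not handled); the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence codon by codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_degenerate_variant(candidate: str, reference: str) -> bool:
    """True if the sequences differ as DNA but encode the same amino acid sequence."""
    return candidate.upper() != reference.upper() and translate(candidate) == translate(reference)


print(translate("ATGGCTAAA"), translate("ATGGCCAAG"))  # MAK MAK
print(is_degenerate_variant("ATGGCCAAG", "ATGGCTAAA"))  # True
```

The example pair differs only at third codon positions (GCT/GCC for Ala, AAA/AAG for Lys), matching the third-position substitutions described above.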
The terms "polypeptide," "protein," and "peptide" are to be given their respective ordinary and customary meanings to a person of ordinary skill in the art; the three terms are sometimes used interchangeably, and are used without limitation to refer to a polymer of amino acids, or amino acid analogs, regardless of its size or function. Although "protein" is often used in reference to relatively large polypeptides, and "peptide" is often used in reference to small polypeptides, usage of these terms in the art overlaps and varies. The term "polypeptide" as used herein refers to peptides, polypeptides, and proteins, unless otherwise noted. The terms "protein," "polypeptide," and "peptide" are used interchangeably herein when referring to a polyaminoacid product. Thus, exemplary polypeptides include polyaminoacid products, naturally occurring proteins, homologs, orthologs, paralogs, fragments and other equivalents, variants, and analogs of the foregoing.
The terms "polypeptide fragment" and "fragment," when used in reference to a reference polypeptide, are to be given their ordinary and customary meanings to a person of ordinary skill in the art and are used without limitation to refer to a polypeptide in which amino acid residues are deleted as compared to the reference polypeptide itself, but where the remaining amino acid sequence is usually identical to the corresponding positions in the reference polypeptide. Such deletions can occur at the amino-terminus or carboxy-terminus of the reference polypeptide, or alternatively both.
The term "functional fragment" of a polypeptide or protein refers to a peptide fragment that is a portion of the full-length polypeptide or protein, and has substantially the same biological activity, or carries out substantially the same function as the full-length polypeptide or protein (e.g., carrying out the same enzymatic reaction).
The terms "variant polypeptide," "modified amino acid sequence," or "modified polypeptide," which are used interchangeably, refer to an amino acid sequence that is different from the reference polypeptide by one or more amino acids, e.g., by one or more amino acid substitutions, deletions, and/or additions. In an aspect, a variant is a "functional variant" which retains some or all of the activity of the reference polypeptide.
The term "functional variant" further includes conservatively substituted variants. The term "conservatively substituted variant" refers to a peptide having an amino acid sequence that differs from a reference peptide by one or more conservative amino acid substitutions and maintains some or all of the activity of the reference peptide. A "conservative amino acid substitution" is a substitution of an amino acid residue with a functionally similar residue. Examples of conservative substitutions include the substitution of one non-polar (hydrophobic) residue such as isoleucine, valine, leucine or methionine for another; the substitution of one charged or polar (hydrophilic) residue for another such as between arginine and lysine, between glutamine and asparagine, between threonine and serine; the substitution of one basic residue such as lysine or arginine for another; or the substitution of one acidic residue, such as aspartic acid or glutamic acid for another; or the substitution of one aromatic residue, such as phenylalanine, tyrosine, or tryptophan for another. Such substitutions are expected to have little or no effect on the apparent molecular weight or isoelectric point of the protein or polypeptide. The phrase "conservatively substituted variant" also includes peptides wherein a residue is replaced with a chemically-derivatized residue, provided that the resulting peptide maintains some or all of the activity of the reference peptide as described herein.
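The substitution groups listed in this definition can be encoded directly. The Python sketch below classifies the substitutions between a reference and a variant of equal length. It covers only the example groups named above (the definition is not exhaustive), assumes a gap-free alignment, and does not handle chemically derivatized residues; all names are hypothetical.

```python
# Residue groups named in the definition above (one-letter amino acid codes).
CONSERVATIVE_GROUPS = [
    {"I", "V", "L", "M"},  # non-polar (hydrophobic)
    {"R", "K"},            # charged/polar pair, also the basic residues
    {"Q", "N"},            # polar pair
    {"T", "S"},            # polar pair
    {"D", "E"},            # acidic residues
    {"F", "Y", "W"},       # aromatic residues
]


def is_conservative(a: str, b: str) -> bool:
    """True if replacing residue a with residue b stays within one listed group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str, variant: str) -> dict:
    """Count conservative and non-conservative substitutions in a gap-free comparison."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (no insertions or deletions)")
    counts = {"conservative": 0, "non_conservative": 0}
    for ref_res, var_res in zip(reference.upper(), variant.upper()):
        if ref_res == var_res:
            continue
        key = "conservative" if is_conservative(ref_res, var_res) else "non_conservative"
        counts[key] += 1
    return counts


print(classify_substitutions("MKDILF", "MRELAW"))
# {'conservative': 4, 'non_conservative': 1}
```

Here K to R, D to E, I to L, and F to W fall within a listed group, while L to A does not.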
The term "variant," in connection with the polypeptides of the subject technology, further includes a functionally active polypeptide having an amino acid sequence at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, and even 100% identical to the amino acid sequence of a reference polypeptide.
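The percent-identity thresholds above presuppose an alignment between the variant and the reference. As an illustration only, the Python sketch below performs a simple Needleman-Wunsch-style global alignment with assumed scores (match +1, mismatch -1, gap -1) and reports identity as identical aligned residues divided by the reference length. The scoring scheme and the denominator are assumptions; this passage does not specify how identity is computed.

```python
def global_align(query: str, reference: str, match: int = 1,
                 mismatch: int = -1, gap: int = -1) -> tuple:
    """Needleman-Wunsch-style global alignment; returns the two gapped strings."""
    n, m = len(query), len(reference)
    # score[i][j] = best score for aligning query[:i] with reference[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if query[i - 1] == reference[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    q_out, r_out = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            q_out.append(query[i - 1])
            r_out.append(reference[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            q_out.append(query[i - 1])
            r_out.append("-")
            i -= 1
        else:
            q_out.append("-")
            r_out.append(reference[j - 1])
            j -= 1
    return "".join(reversed(q_out)), "".join(reversed(r_out))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned residues divided by reference length (an assumed convention)."""
    q_aln, r_aln = global_align(query.upper(), reference.upper())
    identical = sum(1 for a, b in zip(q_aln, r_aln) if a == b and a != "-")
    return 100.0 * identical / len(reference)


def meets_identity_threshold(query: str, reference: str, threshold: float = 75.0) -> bool:
    """True if the query is at least `threshold` percent identical to the reference."""
    return percent_identity(query, reference) >= threshold


print(percent_identity("ACDEFGHIKM", "ACDEFGHIKL"))  # 90.0
```

Production work would normally rely on an established alignment tool with a substitution matrix and affine gap penalties; the simple scores here only keep the example short.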
The term "homologous" in all its grammatical forms and spelling variations refers to the relationship between polynucleotides or polypeptides that possess a "common evolutionary origin," including polynucleotides or polypeptides from super-families and homologous polynucleotides or proteins from different species (Reeck et al., CELL 50:667, 1987). Such polynucleotides or polypeptides have sequence homology, as reflected by their sequence similarity, whether in terms of percent identity or the presence of specific amino acids or motifs at conserved positions. For example, two homologous polypeptides can have amino acid sequences that are at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, and even 100% identical.
"Suitable regulatory sequences" is to be given its ordinary and customary meaning to a person of ordinary skill in the art, and is used without limitation to refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters, translation leader sequences, introns, and polyadenylation recognition sequences.
"Promoter" is to be given its ordinary and customary meaning to a person of ordinary skill in the art, and is used without limitation to refer to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. Typically, a coding sequence is located 3' to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions.
Promoters which cause a gene to be expressed in most cell types at most times are commonly referred to as "constitutive promoters." It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.
The term "operably linked" refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is affected by the other. For example, a promoter is operably linked with a coding sequence when it is capable of affecting the expression of that coding sequence (i.e., the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in sense or antisense orientation.
The term "expression" as used herein is to be given its ordinary and customary meaning to a person of ordinary skill in the art, and is used without limitation to refer to the transcription and stable accumulation of sense (mRNA) or antisense RNA derived from the nucleic acid fragment of the subject technology. "Over-expression" refers to the production of a gene product in transgenic or recombinant organisms that exceeds levels of production in normal or non-transformed organisms.
"Transformation" is to be given its ordinary and customary meaning to a person of ordinary skill in the art, and is used without limitation to refer to the transfer of a polynucleotide into a target cell for further expression by that cell. The transferred polynucleotide can be incorporated into the genome or chromosomal DNA of a target cell, resulting in genetically stable inheritance, or it can replicate independently of the host chromosomal DNA. Host organisms containing the transformed nucleic acid fragments are referred to as "transgenic," "recombinant," or "transformed" organisms.
The terms "transformed," "transgenic," and "recombinant," when used herein in connection with host cells, are to be given their respective ordinary and customary meanings to a person of ordinary skill in the art, and are used without limitation to refer to a cell of a host organism, such as a plant or microbial cell, into which a heterologous nucleic acid molecule has been introduced. The nucleic acid molecule can be stably integrated into the genome of the host cell, or the nucleic acid molecule can be present as an extrachromosomal molecule. Such an extrachromosomal molecule can be auto-replicating. Transformed cells, tissues, or subjects are understood to encompass not only the end product of a transformation process, but also transgenic progeny thereof.
The terms "recombinant," "heterologous," and "exogenous," when used herein in connection with polynucleotides, are to be given their ordinary and customary meanings to a person of ordinary skill in the art, and are used without limitation to refer to a polynucleotide (e.g., a DNA sequence or a gene) that originates from a source foreign to the particular host cell or, if from the same source, is modified from its original form. Thus, a heterologous gene in a host cell includes a gene that is endogenous to the particular host cell but has been modified through, for example, the use of site-directed mutagenesis or other recombinant techniques. The terms also include non-naturally occurring multiple copies of a naturally occurring DNA sequence. Thus, the terms refer to a DNA segment that is foreign or heterologous to the cell, or homologous to the cell but in a position or form within the host cell in which the element is not ordinarily found.
Similarly, the terms "recombinant," "heterologous," and "exogenous," when used herein in connection with a polypeptide or amino acid sequence, refer to a polypeptide or amino acid sequence that originates from a source foreign to the particular host cell or, if from the same source, is modified from its original form. Thus, recombinant DNA segments can be expressed in a host cell to produce a recombinant polypeptide.
The terms "plasmid," "vector," and "cassette" are to be given their respective ordinary and customary meanings to a person of ordinary skill in the art and are used without limitation to refer to an extrachromosomal element often carrying genes which are not part of the central metabolism of the cell, and usually in the form of circular double-stranded DNA molecules. Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear or circular, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3' untranslated sequence into a cell. "Transformation cassette" refers to a specific vector containing a foreign gene and having elements in addition to the foreign gene that facilitate transformation of a particular host cell. "Expression cassette" refers to a specific vector containing a foreign gene and having elements in addition to the foreign gene that allow for enhanced expression of that gene in a foreign host.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure belongs. Although any methods and materials similar to or equivalent to those described herein may be used in the practice or testing of the present disclosure, the preferred materials and methods are described below.
DETAILED DESCRIPTION
Thaumatin is a group of intensely sweet proteins which was first isolated from the fruit of the West African plant Thaumatococcus daniellii Benth. Thaumatin is 1600 times sweeter than sucrose on a weight basis, or approximately 100,000 times sweeter on a molar basis. Two forms of thaumatin, thaumatin I and thaumatin II, have been identified in the fruit. The molecular mass of the protein is 22 kDa. Thaumatin I has 207 amino acid residues. Thaumatin II is also composed of 207 amino acid residues but differs from thaumatin I at 4 amino acid positions. Thaumatin is heat-stable, and its sweet taste is preserved after incubation at a pH below 5.5 for 1 hr. At these pH values, the sweetener is stable during heat-intensive processing steps such as pasteurization, canning, baking, and ultra-high temperature processing. Above 70°C at a pH of 7.0, loss of sweetness was observed. Thaumatin has eight intramolecular disulfide bonds, which are believed to contribute to its heat stability. Loss of sweetness can be associated with heat-driven denaturation or breakage of the disulfide bonds within the protein. In the present invention, Pichia strains were engineered to produce and secrete thaumatin I and thaumatin II. The resulting thaumatin had the same taste as thaumatin extracted from fruit.
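The molar figure follows from the weight figure and the two molar masses: one mole of thaumatin (about 22,000 g) is as sweet as 1600 × 22,000 g of sucrose, which is that mass divided by the molar mass of sucrose in moles. The Python sketch below performs this arithmetic. The sucrose molar mass (342.30 g/mol) is a standard value not stated in the text, and thaumatin is approximated as exactly 22,000 g/mol from the 22 kDa figure.

```python
SUCROSE_MOLAR_MASS = 342.30      # g/mol, sucrose (C12H22O11); standard value
THAUMATIN_MOLAR_MASS = 22_000.0  # g/mol, approximated from the 22 kDa molecular mass


def molar_sweetness(weight_basis: float, sweetener_molar_mass: float,
                    reference_molar_mass: float = SUCROSE_MOLAR_MASS) -> float:
    """Convert relative sweetness on a weight basis to relative sweetness on a molar basis.

    One mole of sweetener weighs `sweetener_molar_mass` grams and matches
    weight_basis * sweetener_molar_mass grams of sucrose; dividing by the
    molar mass of sucrose expresses that amount in moles of sucrose.
    """
    return weight_basis * sweetener_molar_mass / reference_molar_mass


print(round(molar_sweetness(1600, THAUMATIN_MOLAR_MASS)))  # 102834
```

The result, roughly 1.03 × 10^5, is consistent with the approximate value of 100,000 on a molar basis.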
Production of Thaumatin in Recombinant Yeast Cells
The present disclosure, in some instances, provides methods of producing thaumatin in which multiple strategies are employed to increase thaumatin folding, secretion, and/or production in engineered yeast cells (e.g., engineered Pichia cells).
In some instances, a method of producing thaumatin described herein comprises culturing a recombinant yeast cell comprising a polynucleotide encoding a fusion polypeptide which comprises tandem repeats of thaumatin. In some examples, the polynucleotide is provided on a vector (e.g., a plasmid such as an expression plasmid). In some cases, the plasmid is a high copy plasmid (e.g., for high-level expression of the fusion polypeptide comprising the tandem repeats of thaumatin). In some cases, the polynucleotide is integrated into the genome of the recombinant yeast cell. For example, in some embodiments, the polynucleotide is integrated into a HIS4 locus of the genome of the recombinant yeast cell.
In some instances, the method of producing thaumatin described herein comprises culturing a recombinant yeast cell comprising a polynucleotide encoding a fusion polypeptide which comprises tandem repeats of thaumatin. In some examples, the polynucleotide comprises 2-20 (e.g., 2-20, 2-16, 2-15, 2-10, 2-8, 8-20, 8-16, 8-15, 8-10, 10-20, 10-16, 10-15, 15-20, 15-16, or 16-20) repeats of thaumatin. In some examples, the polynucleotide comprises 2, 8, 10, 15, 16, or 20 repeats of thaumatin. In some examples, the polynucleotide comprises 2 repeats of thaumatin. In some examples, the polynucleotide comprises 8 repeats of thaumatin. In some examples, the polynucleotide comprises 16 repeats of thaumatin.
In some instances, the method of producing thaumatin described herein comprises culturing a recombinant yeast cell comprising a polynucleotide encoding a fusion polypeptide. In some embodiments, the fusion polypeptide comprises an amino acid sequence at least 80% (e.g., at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%) identical to the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55, and 57. In some embodiments, the fusion polypeptide comprises the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55, and 57. In some instances, each repeat of thaumatin comprises an amino acid sequence that is at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or even 100%) identical to the amino acid sequence of SEQ ID NO: 1. In some examples, each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 1. In some examples, each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 3. More generally, each fusion polypeptide comprises at least 2 (e.g., at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20) repeats of thaumatin. In some examples, each fusion polypeptide comprises 2-20 (e.g., 2-20, 2-15, 2-10, 2-5, 5-20, 5-15, 5-10, 8-16, 10-20, 10-15, or 15-20) repeats of thaumatin. In some cases, each fusion polypeptide comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 repeats of thaumatin. In some examples, each fusion polypeptide comprises 2 repeats of thaumatin. In some examples, each fusion polypeptide comprises 8 repeats of thaumatin. In some examples, each fusion polypeptide comprises 16 repeats of thaumatin.
In representative instances, in the fusion polypeptide comprising tandem repeats of thaumatin, each thaumatin repeat is separated by a spacer. In some cases, the spacer is cleaved by a protease (e.g., cleaved in vivo by a protease in the yeast cell). As such, in some cases, each spacer between the thaumatin repeats comprises a protease cleavage site for a yeast protease. In some embodiments, each spacer comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of any one of SEQ ID NOs: 39, 41, 43, 45, 47, or 49. In some examples, each spacer comprises the amino acid sequence of any one of SEQ ID NOs: 39, 41, 43, 45, 47, or 49.
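The tandem-repeat layout (repeat, spacer, repeat, and so on) and its in vivo processing can be modeled with a minimal Python sketch. The repeat and spacer strings below are hypothetical placeholders, not the thaumatin or spacer sequences of the Sequence Listing, and the spacer is assumed, for illustration only, to end in a Kex2-type Lys-Arg (KR) motif cleaved on its C-terminal side. The signal peptide is omitted, and a real analysis would also need to check whether the thaumatin repeat itself contains the cleavage motif.

```python
import re
from typing import List

# Hypothetical placeholders: NOT the thaumatin (SEQ ID NO: 1) or spacer
# (SEQ ID NOs: 39-49) sequences, which are not reproduced here.
REPEAT = "GSTNQWYDFE"
SPACER = "GGSKR"  # assumed to end in a Kex2-type Lys-Arg (KR) cleavage motif


def build_tandem_fusion(repeat: str, spacer: str, copies: int) -> str:
    """Join `copies` repeats with one spacer between each adjacent pair."""
    if copies < 1:
        raise ValueError("need at least one repeat")
    return spacer.join([repeat] * copies)


def cleave_after_motif(sequence: str, motif: str = "KR") -> List[str]:
    """Simulate protease cleavage on the C-terminal side of every occurrence of `motif`."""
    fragments, start = [], 0
    for hit in re.finditer(re.escape(motif), sequence):
        fragments.append(sequence[start:hit.end()])
        start = hit.end()
    fragments.append(sequence[start:])
    return [f for f in fragments if f]


fusion = build_tandem_fusion(REPEAT, SPACER, 8)
products = cleave_after_motif(fusion)
print(len(fusion), len(products))  # 115 8
```

In this simplified model, eight repeats joined by seven spacers release eight products; all but the last carry the spacer residues at their C-terminus.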
In some examples, the fusion polypeptide further comprises an N-terminal signal peptide. A “signal peptide” refers to a short peptide present at the N-terminus of a protein destined to be secreted from a cell. In some cases, a signal peptide comprises a stretch of hydrophobic amino acid residues that facilitates the translocation of a newly synthesized peptide or protein to the cell membrane for subsequent secretion through the cell membrane. Typically, a signal peptide is 5-23 (e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23) amino acids in length. A protein with a signal peptide can be encapsulated in a secretory vesicle and trafficked to the cell membrane via the secretory pathway. The mechanism by which a newly synthesized peptide or protein comprising a signal peptide is secreted from the cell is known to a person having ordinary skill in the art.
In some instances, the signal peptide is a yeast alpha mating factor signal peptide. In some examples, the yeast alpha mating factor signal peptide comprises the amino acid sequence of SEQ ID NO: 37. In other instances, the signal peptide is an Ost1 signal peptide. In some examples, the fusion polypeptide comprises an Ost1 signal peptide for each repeat of thaumatin. In some examples, the Ost1 signal peptide comprises the amino acid sequence of SEQ ID NO: 5.
In some instances, the fusion polypeptide comprises a 2A linker between signal peptide-thaumatin open reading frames. In some examples, the 2A linker comprises the amino acid sequence of SEQ ID NO: 7.
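2A peptides are generally understood to act by ribosomal skipping at a C-terminal Asn-Pro-Gly-Pro (NPGP) motif: translation yields an upstream product ending in NPG and a downstream product beginning with P. The minimal Python sketch below models that outcome for repeated signal peptide-thaumatin open reading frames separated by 2A linkers. It assumes the linker of SEQ ID NO: 7 ends in NPGP (not confirmed in this passage), and all segment strings are hypothetical placeholders.

```python
from typing import List

# Hypothetical placeholder segments: NOT the Ost1 (SEQ ID NO: 5), thaumatin
# (SEQ ID NO: 1), or 2A linker (SEQ ID NO: 7) sequences.
SIGNAL = "MSSSAAL"       # placeholder signal peptide
PAYLOAD = "GSTNQWYDFE"   # placeholder thaumatin repeat
LINKER_2A = "GDVEENPGP"  # placeholder 2A-like linker ending in NPGP


def skip_at_2a(polyprotein: str, motif: str = "NPGP") -> List[str]:
    """Split where ribosomal skipping is assumed to occur: between the G and final P of NPGP."""
    if len(motif) < 2:
        raise ValueError("motif must be at least two residues long")
    products, start = [], 0
    pos = polyprotein.find(motif, start)
    while pos != -1:
        cut = pos + len(motif) - 1  # N-P-G stays upstream; P begins the next product
        products.append(polyprotein[start:cut])
        start = cut
        pos = polyprotein.find(motif, start)
    products.append(polyprotein[start:])
    return products


orf = SIGNAL + PAYLOAD
construct = LINKER_2A.join([orf] * 3)  # three signal peptide-thaumatin ORFs
for product in skip_at_2a(construct):
    print(product)
```

In this model, every product except the last retains the 2A residues up to the glycine, and every product except the first begins with an extra proline ahead of its signal peptide.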
In representative examples, the polynucleotide encoding the fusion polypeptide comprising the tandem repeats of thaumatin is operably linked to a promoter. In some cases, the promoter is a constitutive promoter (e.g., a constitutive promoter in yeast). In some cases, the promoter comprises an AOX1 promoter (e.g., a yeast AOX1 promoter). In some cases, the polynucleotide encoding the fusion polypeptide comprising the tandem repeats of thaumatin is operably linked to a transcription terminator. In some cases, the transcription terminator is an AOX1 terminator (e.g., a yeast AOX1 terminator).
Co-Expression of Chaperones
In some examples, the method described herein comprises co-expressing the fusion polypeptide comprising the tandem repeats of thaumatin with one or more chaperones to facilitate intramolecular disulfide bond formation, folding, and/or secretion. In some cases, the one or more chaperones are selected from PDI1, HAC1, ERO1, ERO2, ERV1, ERV2, KAR2, SEC1, SLY1, and GPX1. Chaperones native to the same yeast strain as the recombinant yeast cell used for expression of the polypeptide may be used. Heterologous chaperones from other yeast strains may also be used. Non-limiting examples of chaperones and their Genbank accession numbers are provided below in Table 3. As such, in some cases, the recombinant yeast cell used in the methods described herein further comprises one or more polynucleotides encoding one or more chaperones selected from PDI1, HAC1, ERO1, ERO2, ERV1, ERV2, KAR2, SEC1, SLY1, and GPX1. PDI1 is the structural gene for Protein Disulfide Isomerase (PDI).
In some examples, the PDI1 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or even 100%) identical to the amino acid sequence of SEQ ID NO: 13. In some cases, the PDI1 comprises the amino acid sequence of SEQ ID NO: 13. In some examples, the polynucleotide encoding the PDI1 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100%) identical to the nucleotide sequence of SEQ ID NO: 14. In some cases, the polynucleotide encoding the PDI1 comprises the nucleotide sequence of SEQ ID NO: 14.
In some examples, the PDI1 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 15. In some cases, the PDI1 comprises the amino acid sequence of SEQ ID NO: 15. In some examples, the polynucleotide encoding the PDI1 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 16. In some cases, the polynucleotide encoding the PDI1 comprises the nucleotide sequence of SEQ ID NO: 16.
In some examples, the HAC1 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 17. In some cases, the HAC1 comprises the amino acid sequence of SEQ ID NO: 17. In some examples, the polynucleotide encoding the HAC1 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 18. In some cases, the polynucleotide encoding the HAC1 comprises the nucleotide sequence of SEQ ID NO: 18.
In some examples, the ERO1 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 19 or SEQ ID NO: 27. In some cases, the ERO1 comprises the amino acid sequence of SEQ ID NO: 19 or SEQ ID NO: 27. In some examples, the polynucleotide encoding the ERO1 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 20 or SEQ ID NO: 28. In some cases, the polynucleotide encoding the ERO1 comprises the nucleotide sequence of SEQ ID NO: 20 or SEQ ID NO: 28.
In some examples, the ERO2 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 21. In some cases, the ERO2 comprises the amino acid sequence of SEQ ID NO: 21. In some examples, the polynucleotide encoding the ERO2 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 22. In some cases, the polynucleotide encoding the ERO2 comprises the nucleotide sequence of SEQ ID NO: 22.
In some examples, the ERV1 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 23. In some cases, the ERV1 comprises the amino acid sequence of SEQ ID NO: 23. In some examples, the polynucleotide encoding the ERV1 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 24. In some cases, the polynucleotide encoding the ERV1 comprises the nucleotide sequence of SEQ ID NO: 24.
In some examples, the ERV2 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 25. In some cases, the ERV2 comprises the amino acid sequence of SEQ ID NO: 25. In some examples, the polynucleotide encoding the ERV2 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 26. In some cases, the polynucleotide encoding the ERV2 comprises the nucleotide sequence of SEQ ID NO: 26.
In some examples, the KAR2 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 29. In some cases, the KAR2 comprises the amino acid sequence of SEQ ID NO: 29. In some examples, the polynucleotide encoding the KAR2 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 30. In some cases, the polynucleotide encoding the KAR2 comprises the nucleotide sequence of SEQ ID NO: 30.
In some examples, the SEC1 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 31. In some cases, the SEC1 comprises the amino acid sequence of SEQ ID NO: 31. In some examples, the polynucleotide encoding the SEC1 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 32. In some cases, the polynucleotide encoding the SEC1 comprises the nucleotide sequence of SEQ ID NO: 32.
In some examples, the SLY1 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 33. In some cases, the SLY1 comprises the amino acid sequence of SEQ ID NO: 33. In some examples, the polynucleotide encoding the SLY1 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 34. In some cases, the polynucleotide encoding the SLY1 comprises the nucleotide sequence of SEQ ID NO: 34.
In some examples, the GPX1 comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 35. In some cases, the GPX1 comprises the amino acid sequence of SEQ ID NO: 35. In some examples, the polynucleotide encoding the GPX1 comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of SEQ ID NO: 36. In some cases, the polynucleotide encoding the GPX1 comprises the nucleotide sequence of SEQ ID NO: 36.
In some examples, the recombinant yeast cell further comprises a polynucleotide encoding PDI1 (e.g., a PDI1 comprising the amino acid sequence of SEQ ID NO: 13). In some cases, the recombinant yeast cell further comprises a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 14. In some examples, the recombinant yeast cell further comprises a polynucleotide encoding PDI1 (e.g., a PDI1 comprising the amino acid sequence of SEQ ID NO: 15). In some cases, the recombinant yeast cell further comprises a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 16.
In some examples, the recombinant yeast cell further comprises a polynucleotide encoding HAC1 (e.g., a HAC1 comprising the amino acid sequence of SEQ ID NO: 17). In some cases, the recombinant yeast cell further comprises a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 18.
In some examples, the recombinant yeast cell further comprises a polynucleotide encoding PDI1 (e.g., a PDI1 comprising the amino acid sequence of SEQ ID NO: 15), a polynucleotide encoding ERO1 (e.g., an ERO1 comprising the amino acid sequence of SEQ ID NO: 19), and a polynucleotide encoding ERV2 (e.g., an ERV2 comprising the amino acid sequence of SEQ ID NO: 25). In some cases, the recombinant yeast cell further comprises a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 16, a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 20, and a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 26.
In some examples, the recombinant yeast cell further comprises a polynucleotide encoding PDI1 (e.g, a PDI1 comprising the amino acid sequence of SEQ ID NO: 15), a polynucleotide encoding ERV1 (e.g, an ERV1 comprising the amino acid sequence of SEQ ID NO: 23), and a polynucleotide encoding ERO2 (e.g., an ERO2 comprising the amino acid sequence of SEQ ID No: 21). In some cases, the recombinant yeast cell further comprises a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 16, a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 24, and a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 22.
In some examples, the recombinant yeast cell further comprises a polynucleotide encoding PDI1 (e.g., a PDI comprising the amino acid sequence of SEQ ID NO: 15), a polynucleotide encoding KAR2 (e.g., an KAR2 comprising the amino acid sequence of SEQ ID NO: 29). In some cases, the recombinant yeast cell further comprises a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 16 and a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 30.
In some examples, each of the one or more polynucleotides encoding the one or more chaperones is operably linked to a promoter (e.g., a promoter selected from AOX1 promoter, GAP1 promoter, and CAT1 promoter). In some examples, the one or more polynucleotides encoding the one or more chaperones are provided on one or more vectors (e.g., plasmids). In some examples, the one or more (e.g., 1, 2, 3, 4, 5 or more) polynucleotides encoding the one or more chaperones are provided on the same vector as the polynucleotide encoding the polypeptide comprising the tandem thaumatin repeats. In some other examples, the one or more (e.g., 1, 2, 3, 4, 5 or more) polynucleotides encoding the one or more chaperones are provided on different vectors as the polynucleotide encoding the polypeptide comprising the tandem thaumatin repeats.
In some examples, the one or more polynucleotides encoding the one or more chaperones are integrated into the genome of the recombinant yeast cell. In some cases, the one or more polynucleotides are integrated into a HIS4 or AOX1 locus of the genome of the recombinant yeast cell.
In some embodiments, the recombinant yeast cell comprises a polynucleotide encoding a fusion polypeptide comprising tandem repeats of thaumatin. In some embodiments, the thaumatin is selected from the group consisting of thaumatin I and thaumatin II.
In some embodiments, each repeat of thaumatin comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 1. In some embodiments, each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 1. In some embodiments, each repeat of thaumatin comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of SEQ ID NO: 3. In some embodiments, each repeat of thaumatin comprises the amino acid sequence of SEQ ID NO: 3.
In some embodiments, each thaumatin in the tandem repeats is separated by a spacer. In some embodiments, the spacer comprises a protease cleavage site for a yeast protease. In some embodiments, the spacer comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of any one of SEQ ID NOs: 39, 41, and 43. In some embodiments, the spacer comprises the amino acid sequence of any one of SEQ ID NOs: 39, 41, and 43.
In some embodiments, the fusion polypeptide comprises 2-8 repeats of thaumatin. In some embodiments, the fusion polypeptide comprises 2-20 (e.g., 2-20, 2-16, 2-15, 2-10, 2-8, 8-20, 8-16, 8-15, 8-10, 10-20, 10-16, 10-15, 15-20, 15-16, or 16-20) repeats of thaumatin. In some embodiments, the fusion polypeptide comprises 2, 8, 10, 15, 16, or 20 repeats of thaumatin. In some embodiments, the fusion polypeptide comprises 2 repeats of thaumatin. In some embodiments, the fusion polypeptide comprises 8 repeats of thaumatin. In some embodiments, the fusion polypeptide comprises 16 repeats of thaumatin.
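The repeat counts above translate directly into construct size, since a construct with n repeats carries n - 1 spacers. A minimal sketch, assuming one monomer sequence and one spacer reused between every pair of repeats; the monomer and spacer strings and their lengths (207 and 6 residues) are hypothetical placeholders, not SEQ ID NOs: 1, 3, 39, 41, or 43:

```python
def build_tandem_construct(monomer: str, spacer: str, n_repeats: int) -> str:
    """Join n_repeats copies of a monomer with a spacer between each pair.

    Spacers sit only between consecutive repeats, so a construct with
    n repeats contains n - 1 spacers.
    """
    if not 2 <= n_repeats <= 20:
        raise ValueError("this sketch only covers the 2-20 repeat range")
    return spacer.join([monomer] * n_repeats)


def construct_length(monomer_len: int, spacer_len: int, n_repeats: int) -> int:
    """Residues in the fusion: n monomers plus (n - 1) spacers."""
    return n_repeats * monomer_len + (n_repeats - 1) * spacer_len


# Hypothetical placeholders: a 207-residue repeat and a 6-residue spacer.
monomer = "A" * 207
spacer = "G" * 6
for n in (2, 8, 16):
    construct = build_tandem_construct(monomer, spacer, n)
    assert len(construct) == construct_length(207, 6, n)
    print(n, "repeats ->", len(construct), "residues")
```

With these assumed lengths, 2, 8, and 16 repeats give 420, 1,698, and 3,402 residues, respectively.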
In some embodiments, the fusion polypeptide comprises an amino acid sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55, and 57. In some embodiments, the fusion polypeptide comprises an amino acid sequence at least 80% identical to the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55, and 57. In some embodiments, the fusion polypeptide comprises the amino acid sequence of any one of SEQ ID NOs: 9, 11, 51, 53, 55, and 57.
In some embodiments, the polynucleotide comprises a nucleotide sequence at least 70% (e.g., at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99%) identical to the nucleotide sequence of any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58. In some embodiments, the polynucleotide comprises a nucleotide sequence at least 80% identical to the nucleotide sequence of any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58. In some embodiments, the polynucleotide comprises the nucleotide sequence of any one of SEQ ID NOs: 10, 12, 52, 54, 56, and 58.
Co-Expression of Proteases
In some examples, the method described herein comprises co-expressing the fusion polypeptide comprising the tandem repeats of thaumatin, and optionally the one or more chaperones, with one or more proteases to facilitate processing of the polypeptide comprising the tandem thaumatin repeats, and/or release and secretion of the individual thaumatin proteins.
In some embodiments, the protease is selected from KEX1, KEX2, and STE13. Proteases native to the same yeast strain as the recombinant yeast cell used for expression of the polypeptide may be used. Heterologous proteases from other yeast strains may also be used. As such, in some embodiments, the recombinant yeast cell used in the methods described herein further comprises one or more polynucleotides encoding one or more proteases selected from KEX1, KEX2, and STE13.
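One way to picture how the proteases named above could release individual repeats is a simplified string model. The sketch assumes textbook specificities: KEX2 cutting C-terminal to Lys-Arg or Arg-Arg pairs, KEX1 trimming C-terminal Lys/Arg residues, and STE13 removing N-terminal dipeptides (only Glu-Ala and Asp-Ala are modeled here, a simplification). The spacer "KREAEA" and the monomer string are hypothetical examples, not SEQ ID NOs: 39, 41, or 43, and real cleavage also depends on sequence context and accessibility:

```python
# Simplified, hypothetical model of yeast secretory-pathway processing.


def kex2_cleave(seq: str) -> list[str]:
    """Cut C-terminal to each Lys-Arg (KR) or Arg-Arg (RR) pair."""
    fragments, start, i = [], 0, 0
    while i < len(seq) - 1:
        if seq[i:i + 2] in ("KR", "RR"):
            fragments.append(seq[start:i + 2])
            start = i = i + 2
        else:
            i += 1
    fragments.append(seq[start:])
    return [f for f in fragments if f]


def kex1_trim(fragment: str) -> str:
    """Remove C-terminal basic residues (Lys/Arg), as a carboxypeptidase would."""
    return fragment.rstrip("KR")


def ste13_trim(fragment: str) -> str:
    """Remove N-terminal Glu-Ala / Asp-Ala dipeptides, two residues at a time."""
    while len(fragment) >= 2 and fragment[0] in "ED" and fragment[1] == "A":
        fragment = fragment[2:]
    return fragment


def process(fusion: str) -> list[str]:
    """KEX2 cleavage, then KEX1 and STE13 trimming of each fragment."""
    return [ste13_trim(kex1_trim(f)) for f in kex2_cleave(fusion)]


monomer = "ACDEFGHIL"  # hypothetical stand-in for one thaumatin repeat
spacer = "KREAEA"      # hypothetical KEX2/STE13-processable spacer
fusion = spacer.join([monomer] * 3)
print(process(fusion))  # ['ACDEFGHIL', 'ACDEFGHIL', 'ACDEFGHIL']
```

Because `kex1_trim` strips every trailing Lys/Arg, a repeat whose own C-terminus is basic would be over-trimmed in this model; that is a limitation of the sketch, not a prediction about the actual constructs.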
In some examples, the one or more polynucleotides encoding the one or more proteases are provided on one or more vectors (e.g., plasmids). In some examples, the one or more (e.g., 1, 2, 3, or more) polynucleotides encoding the one or more proteases are provided on the same vector as the polynucleotide encoding the fusion polypeptide comprising the tandem thaumatin repeats. In some other examples, the one or more (e.g., 1, 2, 3, 4, 5 or more) polynucleotides encoding the one or more proteases are provided on vectors different from the vector carrying the polynucleotide encoding the polypeptide comprising the tandem thaumatin repeats. In some examples, the one or more polynucleotides encoding the one or more proteases are integrated into the genome of the recombinant yeast cell. In some cases, the one or more polynucleotides are integrated into a HIS4 or AOX1 locus of the genome of the recombinant yeast cell.
Host Yeast Strains
Any yeast strain may be suitable as the recombinant yeast cell used in the methods described herein. Non-limiting examples of yeast strains include: Pichia pastoris, Pichia farinosa, Pichia anomala, Pichia heedii, Pichia guilliermondii, Pichia kluyveri, Pichia membranifaciens, Pichia norvegensis, Pichia ohmeri, Pichia methanolica, Pichia subpelliculosa, Komagataella phaffii, Komagataella pastoris, Komagataella pseudopastoris, Candida vulgaris, Saccharomyces arboricolus, Saccharomyces bayanus, Saccharomyces bulderi, Saccharomyces cariocanus, Saccharomyces cariocus, Saccharomyces cerevisiae, Saccharomyces cerevisiae var. boulardii, Saccharomyces chevalieri, Saccharomyces dairenensis, Saccharomyces ellipsoideus, Saccharomyces eubayanus, Saccharomyces exiguus, Saccharomyces florentinus, Saccharomyces fragilis, Saccharomyces kudriavzevii, Saccharomyces martiniae, Saccharomyces mikatae, Saccharomyces monacensis, Saccharomyces norbensis, Saccharomyces paradoxus, Saccharomyces pastorianus, Saccharomyces spencerorum, Saccharomyces turicensis, Saccharomyces unisporus, Saccharomyces uvarum, and Saccharomyces zonatus. In some embodiments, the recombinant yeast cell in the methods described herein is a recombinant Pichia pastoris cell.
The method may further comprise isolating the thaumatin. In one example, the method comprises isolating thaumatin I. In another example, the method comprises isolating thaumatin II. In some embodiments, the method further comprises isolating both thaumatin I and thaumatin II.
Synthetic Biology
Standard recombinant DNA and molecular cloning techniques used here are well known in the art and are described, for example, by Sambrook, J., Fritsch, E. F. and Maniatis, T. MOLECULAR CLONING: A LABORATORY MANUAL, 2nd ed.; Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y., 1989 (hereinafter "Maniatis"); by Silhavy, T. J., Berman, M. L. and Enquist, L. W. EXPERIMENTS WITH GENE FUSIONS; Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y., 1984; and by Ausubel, F. M. et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, published by GREENE PUBLISHING AND WILEY-INTERSCIENCE, 1987 (the entirety of each of which is hereby incorporated herein by reference). Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, the preferred materials and methods are described below.
The disclosure will be more fully understood upon consideration of the following nonlimiting Examples. It should be understood that these Examples, while indicating preferred embodiments of the subject technology, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of the subject technology, and without departing from the spirit and scope thereof, can make various changes and modifications of the subject technology to adapt it to various uses and conditions. In some embodiments, the yeast cell is of the strain Pichia pastoris.
Yeast Production Systems
Expression of proteins in eukaryotes is commonly carried out in a yeast host cell with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, usually to the amino terminus of the recombinant protein. Such fusion vectors typically serve three purposes: (1) to increase expression of the recombinant protein; (2) to increase the solubility of the recombinant protein; and (3) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification. Often, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety after purification of the fusion protein. Such vectors are within the scope of the present disclosure.
Moreover, the expression vector typically includes the genetic elements required for expression of the recombinant polypeptide in yeast cells. The elements for transcription and translation in the yeast cell can include a promoter, a coding region for the recombinant polypeptide, and a transcriptional terminator.
A person of ordinary skill in the art will be aware of the molecular biology techniques available for the preparation of expression vectors. The polynucleotide used for incorporation into the expression vector of the subject technology, as described above, can be prepared by routine techniques such as polymerase chain reaction (PCR).
A number of molecular biology techniques have been developed to operably link DNA to vectors via complementary cohesive termini. In one example, complementary homopolymer tracts can be added to the nucleic acid molecule to be inserted into the vector DNA. The vector and nucleic acid molecule are then joined by hydrogen bonding between the complementary homopolymeric tails to form recombinant DNA molecules.
Alternatively, synthetic linkers containing one or more restriction sites are used to operably link the polynucleotide of the subject technology to the expression vector. In some examples, the polynucleotide is generated by restriction endonuclease digestion. In some cases, the nucleic acid molecule is treated with bacteriophage T4 DNA polymerase or E. coli DNA polymerase I, enzymes that remove protruding 3'-single-stranded termini with their 3'-to-5' exonucleolytic activities and fill in recessed 3'-ends with their polymerizing activities, thereby generating blunt-ended DNA segments. The blunt-ended segments are then incubated with a large molar excess of linker molecules in the presence of an enzyme that is able to catalyze the ligation of blunt-ended DNA molecules, such as bacteriophage T4 DNA ligase. The product of the reaction is thus a polynucleotide carrying polymeric linker sequences at its ends. These polynucleotides are then cleaved with the appropriate restriction enzyme and ligated to an expression vector that has been cleaved with an enzyme that produces termini compatible with those of the polynucleotide.
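Before choosing linker restriction sites, it helps to confirm that the chosen enzymes will not also cut inside the insert. A minimal sketch; the recognition sequences and top-strand cut positions listed (EcoRI G^AATTC, BamHI G^GATCC, XhoI C^TCGAG, NotI GC^GGCCGC) are standard, but the choice of these four enzymes and the example insert are illustrative assumptions. Because all four sites are palindromic, scanning one strand finds every site:

```python
# Recognition sequence (5'->3') and top-strand cut offset within the site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "BamHI": ("GGATCC", 1),   # G^GATCC
    "XhoI": ("CTCGAG", 1),    # C^TCGAG
    "NotI": ("GCGGCCGC", 2),  # GC^GGCCGC
}


def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence."""
    dna = dna.upper()
    positions, pos = [], dna.find(site)
    while pos != -1:
        positions.append(pos)
        pos = dna.find(site, pos + 1)
    return positions


def internal_sites(insert: str, planned: list[str]) -> dict[str, list[int]]:
    """Planned linker enzymes that would also cut inside the insert."""
    hits = {name: find_sites(insert, ENZYMES[name][0]) for name in planned}
    return {name: pos for name, pos in hits.items() if pos}


def digest_top_strand(dna: str, enzyme: str) -> list[str]:
    """Top-strand fragments after complete digestion with one enzyme."""
    site, offset = ENZYMES[enzyme]
    cuts = [p + offset for p in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


insert = "ATGGCTGAATTCGCTTAA"  # hypothetical insert with an internal EcoRI site
print(internal_sites(insert, ["EcoRI", "XhoI"]))  # {'EcoRI': [6]}
```

An enzyme that shows up in `internal_sites` would fragment the insert during the final cleavage step, so a different linker site would be chosen for that end.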
Alternatively, a vector having ligation-independent cloning (LIC) sites can be employed. The required PCR-amplified polynucleotide can then be cloned into the LIC vector without restriction digestion or ligation (Aslanidis and de Jong, NUCL. ACID. RES. 18:6069-74 (1990); Haun et al., BIOTECHNIQUES 13:515-18 (1992), each of which is incorporated herein by reference to the extent it is consistent herewith).
In some cases, in order to isolate and/or modify the polynucleotide of interest for insertion into the chosen plasmid, it is suitable to use PCR. Appropriate primers for use in PCR preparation of the sequence can be designed to isolate the required coding region of the nucleic acid molecule, add restriction endonuclease or LIC sites, and place the coding region in the desired reading frame.
In some cases, a polynucleotide for incorporation into an expression vector of the subject technology is prepared by the use of PCR using appropriate oligonucleotide primers. The coding region is amplified, whilst the primers themselves become incorporated into the amplified sequence product. In an embodiment, the amplification primers contain restriction endonuclease recognition sites, which allow the amplified sequence product to be cloned into an appropriate vector.
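The primer layout described above can be sketched as follows, assuming a short 5' clamp ("GCGC", an arbitrary choice) so the enzyme can cut close to the end of the PCR product, and a fixed annealing length. The Wallace rule used for Tm is only a crude approximation for short oligonucleotides, and an in-frame fusion to vector-encoded sequences would need the spacing around the added site checked separately:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Rough Tm (deg C) for short oligos: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def cloning_primers(orf: str, fwd_site: str, rev_site: str,
                    anneal_len: int = 20, clamp: str = "GCGC") -> tuple[str, str]:
    """Forward and reverse primers that add a restriction site to each end.

    Forward: clamp + site + first anneal_len nt of the coding region.
    Reverse: clamp + site + reverse complement of the last anneal_len nt.
    """
    if len(orf) < anneal_len:
        raise ValueError("coding region shorter than the annealing length")
    forward = clamp + fwd_site + orf[:anneal_len]
    reverse = clamp + rev_site + reverse_complement(orf[-anneal_len:])
    return forward, reverse


orf = "ATGAAACCCGGGTTTTAA"  # hypothetical coding region
fwd, rev = cloning_primers(orf, "GAATTC", "CTCGAG", anneal_len=6)
print(fwd, rev)  # GCGCGAATTCATGAAA GCGCCTCGAGTTAAAA
```

Only the annealing portion (here the last six bases of each primer) pairs with the template in the first cycles; the clamp and site are carried into the product, which is how the primers become incorporated into the amplified sequence.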
The expression vectors can be introduced into host cells by conventional transformation or transfection techniques. Transformation of appropriate cells with an expression vector of the subject technology is accomplished by methods known in the art and typically depends on both the type of vector and cell. Suitable techniques include calcium phosphate or calcium chloride co-precipitation, DEAE-dextran mediated transfection, lipofection, chemoporation or electroporation.
Successfully transformed cells, that is, those cells containing the expression vector, can be identified by techniques well known in the art. For example, cells transfected with an expression vector of the subject technology can be cultured to produce polypeptides described herein. Cells can be examined for the presence of the expression vector DNA by techniques well known in the art.
The host cells can contain a single copy of the expression vector described previously, or alternatively, multiple copies of the expression vector.
Typically, the vector or cassette contains sequences directing transcription and translation of the relevant polynucleotide, a selectable marker, and sequences allowing autonomous replication or chromosomal integration. Suitable vectors comprise a region 5' of the polynucleotide which harbors transcriptional initiation controls and a region 3' of the DNA fragment which controls transcriptional termination. It is preferred for both control regions to be derived from genes homologous to the transformed host cell, although it is to be understood that such control regions need not be derived from the genes native to the specific species chosen as a host.
Initiation control regions or promoters, which are useful to drive expression of the recombinant polypeptide in the desired microbial host cell, are numerous and familiar to those skilled in the art. Virtually any promoter capable of driving these genes is suitable for the subject technology, including but not limited to CYC1, HIS4, GAL1, GAL10, ADH1, PGK, PHO5, GAPDH, ADC1, TRP1, URA3, LEU2, ENO, and TPI (useful for expression in Saccharomyces); and AOX1 (useful for expression in Pichia).
Termination control regions may also be derived from various genes native to the microbial hosts. A termination site optionally may be included for the microbial hosts described herein.
Analysis of Sequence Similarity Using Identity Scoring
As used herein, "sequence identity" refers to the extent to which two optimally aligned polynucleotide or peptide sequences are invariant throughout a window of alignment of components, e.g., nucleotides or amino acids. An "identity fraction" for aligned segments of a test sequence and a reference sequence is the number of identical components shared by the two aligned sequences divided by the total number of components in the reference sequence segment, i.e., the entire reference sequence or a smaller defined part of the reference sequence.
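The identity fraction defined above can be computed directly from an existing alignment. A minimal sketch, assuming the alignment is given as two equal-length rows with "-" marking gaps; the example sequences are hypothetical:

```python
def identity_fraction(test_aln: str, ref_aln: str, gap: str = "-") -> float:
    """Identical aligned positions / residues in the reference segment.

    Both arguments are rows of one alignment (equal length, gaps as '-').
    """
    if len(test_aln) != len(ref_aln):
        raise ValueError("aligned rows must have equal length")
    ref_residues = sum(1 for r in ref_aln if r != gap)
    if ref_residues == 0:
        raise ValueError("reference segment contains no residues")
    identical = sum(1 for t, r in zip(test_aln, ref_aln) if t == r and r != gap)
    return identical / ref_residues


def percent_identity(test_aln: str, ref_aln: str) -> float:
    """Identity fraction multiplied by 100."""
    return 100.0 * identity_fraction(test_aln, ref_aln)


# Hypothetical alignment: one substitution (F -> Y) and one deletion (K).
ref_aln = "ACDEFGHIKL"
test_aln = "ACDEYGHI-L"
print(percent_identity(test_aln, ref_aln))  # 80.0
```

Here 8 of the 10 reference residues are matched identically, so the identity fraction is 0.8. Using the reference length as the denominator means a deletion in the test sequence lowers identity, while extra test residues opposite reference gaps are not counted.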
As used herein, the term "percent sequence identity" or "percent identity" refers to the percentage of identical nucleotides in a linear polynucleotide sequence of a reference ("query") polynucleotide molecule (or its complementary strand) as compared to a test ("subject") polynucleotide molecule (or its complementary strand) when the two sequences are optimally aligned (with appropriate nucleotide insertions, deletions, or gaps totaling less than 20 percent of the reference sequence over the window of comparison). Methods for optimally aligning sequences over a comparison window are well known to those skilled in the art, and alignment may be conducted by tools such as the local homology algorithm of Smith and Waterman, the homology alignment algorithm of Needleman and Wunsch, the search for similarity method of Pearson and Lipman, and preferably by computerized implementations of these algorithms such as GAP, BESTFIT, FASTA, and TFASTA available as part of the GCG® Wisconsin Package® (Accelrys Inc., Burlington, MA). An "identity fraction" for aligned segments of a test sequence and a reference sequence is the number of identical components shared by the two aligned sequences divided by the total number of components in the reference sequence segment, i.e., the entire reference sequence or a smaller defined part of the reference sequence. Percent sequence identity is represented as the identity fraction multiplied by 100. The comparison of one or more polynucleotide sequences may be to a full-length polynucleotide sequence or a portion thereof, or to a longer polynucleotide sequence. For purposes of this disclosure, "percent identity" may also be determined using BLASTX version 2.0 for translated nucleotide sequences and BLASTN version 2.0 for polynucleotide sequences.
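To illustrate the global (Needleman-Wunsch) approach named above, here is a minimal sketch that aligns two sequences end to end and then reports percent identity against the full reference length. The scoring (match +1, mismatch -1, linear gap -2) is an assumption chosen for illustration; GAP and the other programs mentioned use substitution matrices and affine gap penalties, so their numbers can differ:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment of a and b with a linear gap penalty."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), score[n][m]


def global_percent_identity(test: str, ref: str, **scoring: int) -> float:
    """Identical aligned positions as a percentage of the full reference length."""
    test_row, ref_row, _ = needleman_wunsch(test, ref, **scoring)
    identical = sum(1 for t, r in zip(test_row, ref_row) if t == r and r != "-")
    return 100.0 * identical / len(ref)


print(needleman_wunsch("ACDEGH", "ACDEFGH"))                   # ('ACDE-GH', 'ACDEFGH', 4)
print(round(global_percent_identity("ACDEGH", "ACDEFGH"), 2))  # 85.71
```

When several alignments tie for the best score, the traceback order (diagonal first, then gap in the reference row) decides which one is reported; identity can shift slightly between tied alignments, which is one reason results from different programs may not match exactly.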
The percent sequence identity is preferably determined using the "Best Fit" or "Gap" program of the Sequence Analysis Software Package™ (Version 10; Genetics Computer Group, Inc., Madison, WI). "Gap" utilizes the algorithm of Needleman and Wunsch (Needleman and Wunsch, JOURNAL OF MOLECULAR BIOLOGY 48:443-453, 1970) to find the alignment of two sequences that maximizes the number of matches and minimizes the number of gaps. "BestFit" performs an optimal alignment of the best segment of similarity between two sequences and inserts gaps to maximize the number of matches using the local homology algorithm of Smith and Waterman (Smith and Waterman, ADVANCES IN APPLIED MATHEMATICS, 2:482-489, 1981; Smith et al., NUCLEIC ACIDS RESEARCH 11:2205-2220, 1983). The percent identity is most preferably determined using the "Best Fit" program.
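The local (Smith-Waterman) strategy behind "BestFit" differs from the global one in that scores are floored at zero, so only the best-matching segment is reported. A minimal sketch, assuming simple scores (match +2, mismatch -1, linear gap -2) rather than the matrices and affine gaps used by the GCG programs; the example sequences are hypothetical:

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[str, str, int]:
    """Best-scoring local alignment of a and b (linear gap penalty)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1];
    # the floor of 0 lets an alignment start fresh anywhere.
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j
    # Trace back from the best cell until the score drops to 0.
    row_a, row_b, i, j = [], [], best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), best


def local_percent_identity(test: str, ref: str, **scoring: int) -> float:
    """Identity over the reference residues inside the best local segment."""
    test_row, ref_row, _ = smith_waterman(test, ref, **scoring)
    ref_residues = sum(1 for r in ref_row if r != "-")
    if ref_residues == 0:
        return 0.0
    identical = sum(1 for t, r in zip(test_row, ref_row) if t == r and r != "-")
    return 100.0 * identical / ref_residues


print(smith_waterman("XXXGATTACAYYY", "ZZGATTACAZZ"))  # ('GATTACA', 'GATTACA', 14)
```

Note that identity over a local segment can be 100% even when the full sequences share little, which is why the text distinguishes the entire reference sequence from "a smaller defined part" of it.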
Useful methods for determining sequence identity are also disclosed in the Basic Local Alignment Search Tool (BLAST) programs, which are publicly available from the National Center for Biotechnology Information (NCBI) at the National Library of Medicine, National Institutes of Health, Bethesda, Md. 20894; see BLAST Manual, Altschul et al., NCBI, NLM, NIH; Altschul et al., J. MOL. BIOL. 215:403-410 (1990). Version 2.0 or higher of the BLAST programs allows the introduction of gaps (deletions and insertions) into alignments. For peptide sequences, BLASTX (which compares a translated nucleotide query against protein sequences) can be used to determine sequence identity; for polynucleotide sequences, BLASTN can be used to determine sequence identity.
As used herein, the term "substantial percent sequence identity" refers to a percent sequence identity of at least about 70% sequence identity, at least about 80% sequence identity, at least about 85% identity, at least about 90% sequence identity, or even greater sequence identity, such as about 98% or about 99% sequence identity. Thus, one example of the disclosure is a polynucleotide molecule that has at least about 70% sequence identity, at least about 80% sequence identity, at least about 85% identity, at least about 90% sequence identity, or even greater sequence identity, such as about 98% or about 99% sequence identity with a polynucleotide sequence described herein.
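To make the "at least X%" language concrete, a percent identity computed by any of the methods above can be checked against the recited tiers. A minimal sketch; the tiers are taken from the "at least 70% (e.g., at least 70%, at least 75%, ...)" phrasing used throughout, and treating "about 70%" as a hard 70.0 cut-off is an assumption, since the text does not define how "about" should be applied:

```python
# Identity tiers recited in the "at least 70% (e.g., ...)" language above.
CLAIM_TIERS = (70, 75, 80, 85, 90, 95, 99)


def highest_tier_met(percent_identity: float, tiers=CLAIM_TIERS):
    """Largest tier the value meets or exceeds, or None if below all tiers."""
    met = [t for t in tiers if percent_identity >= t]
    return max(met) if met else None


def is_substantial(percent_identity: float, floor: float = 70.0) -> bool:
    """'Substantial percent sequence identity' read as at least ~70% here."""
    return percent_identity >= floor


for value in (68.0, 85.7, 99.4, 100.0):
    print(value, highest_tier_met(value), is_substantial(value))
```

For example, 85.7% identity meets the 70%, 75%, 80%, and 85% tiers, so the highest tier reported is 85, while 68.0% meets none of them.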
Identity is the fraction of amino acids that are the same between a pair of sequences after an alignment of the sequences (which can be done using only sequence information, structural information, or some other information, but is usually based on sequence information alone), and similarity is the score assigned based on an alignment using a similarity matrix. The similarity matrix can be any one of the following: BLOSUM62, PAM250, or GONNET, or any other matrix used by one skilled in the art for the sequence alignment of proteins.
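The distinction between identity and similarity can be shown on a single alignment: identity counts only exact matches, whereas similarity is read off a substitution matrix. A minimal sketch; the scores below are toy values invented for illustration and are NOT taken from BLOSUM62, PAM250, or GONNET, and the example alignment is hypothetical:

```python
# Toy substitution scores for illustration only: these are NOT BLOSUM62,
# PAM250, or GONNET values. Real analyses should load a published matrix.
CONSERVATIVE = {frozenset(p) for p in ("IL", "IV", "LV", "KR", "DE", "ST", "FY")}


def toy_score(x: str, y: str) -> int:
    """+2 for identical residues, +1 for a listed conservative pair, else -1."""
    if x == y:
        return 2
    if frozenset((x, y)) in CONSERVATIVE:
        return 1
    return -1


def identity_and_similarity(test_aln: str, ref_aln: str, score=toy_score):
    """Return (identity fraction, positive-score fraction, summed score).

    Fractions use the number of reference residues as the denominator;
    columns containing a gap are skipped when scoring.
    """
    if len(test_aln) != len(ref_aln):
        raise ValueError("aligned rows must have equal length")
    pairs = [(t, r) for t, r in zip(test_aln, ref_aln) if t != "-" and r != "-"]
    ref_residues = sum(1 for r in ref_aln if r != "-")
    identical = sum(1 for t, r in pairs if t == r)
    positives = sum(1 for t, r in pairs if score(t, r) > 0)
    total = sum(score(t, r) for t, r in pairs)
    return identical / ref_residues, positives / ref_residues, total


print(identity_and_similarity("ILKDSFW", "ILRESYA"))
```

In this example only 3 of 7 positions are identical, but 6 of 7 score positively (K/R, D/E, and F/Y count as conservative under the toy scores), and the summed score is 8. Swapping in a real matrix changes the numbers but not the logic.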
Identity is the degree of correspondence between two sub-sequences (with no gaps between the sequences). As a rule of thumb, an identity of 25% or higher suggests similarity of function, while 18-25% suggests possible similarity of structure or function. Keep in mind that two completely unrelated or random sequences (longer than 100 residues) can have higher than 20% identity. Similarity is the degree of resemblance between two sequences when they are compared, and it depends on their identity.
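Identity in this ungapped sense is a position-by-position comparison of two equal-length sub-sequences. A brute-force sketch with hypothetical inputs; the 18-25% and 25% rules of thumb quoted above are not encoded here:

```python
def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sub-sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sub-sequences must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_ungapped_window(a: str, b: str, window: int) -> tuple[float, int, int]:
    """Highest ungapped identity over all window-length sub-sequence pairs.

    Returns (identity, start in a, start in b). Brute force, so it is only
    practical for short sequences.
    """
    best = (0.0, 0, 0)
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            frac = ungapped_identity(a[i:i + window], b[j:j + window])
            if frac > best[0]:
                best = (frac, i, j)
    return best


print(ungapped_identity("ACDE", "ACDF"))            # 0.75
print(best_ungapped_window("XXACDEXX", "ACDE", 4))  # (1.0, 2, 0)
```

Because no gaps are allowed, a single insertion or deletion shifts every downstream position out of register; that is why gapped alignment methods are generally preferred for the percent identity determinations described above.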
As is evident from the foregoing description, certain instances of the present disclosure are not limited by the particular details of the examples illustrated herein, and it is therefore contemplated that other modifications and applications, or equivalents thereof, will occur to those skilled in the art. It is accordingly intended that the claims shall cover all such modifications and applications that do not depart from the spirit and scope of the present disclosure. Moreover, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, the preferred methods and materials are described above.
Orally Consumable Products
Some instances of the present disclosure provide compositions comprising the thaumatin produced using the methods described herein. In some cases, the thaumatin produced using the methods described herein can be used, e.g., as sweeteners, in products, e.g., consumable products (e.g., orally consumable products).
In some cases, the consumable products can be, for example, a food product, a beverage product, a nutraceutical, a pharmaceutical, a dietary supplement, a dental hygienic composition, an edible gel composition, a cosmetic product, or a tabletop flavoring.
Any one of the consumable products (e.g., orally consumable products) can also have at least one additional sweetener. The at least one additional sweetener can be a natural high-intensity sweetener, for example. The additional sweetener can be selected from a stevia extract, a steviol glycoside, stevioside, rebaudioside A, rebaudioside B, rebaudioside C, rebaudioside D, rebaudioside D2, rebaudioside E, rebaudioside F, rebaudioside M, rebaudioside V, rebaudioside W, rebaudioside Z1, rebaudioside Z2, rebaudioside D3, dulcoside A, rubusoside, rebaudioside N, rebaudioside I, rebaudioside G, rebaudioside WB1, rebaudioside WB2, rebaudioside R6-2A, rebaudioside R6-2B, rebaudioside R6-4A, rebaudioside R6-4B, rebaudioside R7-2, steviolbioside, sucrose, high fructose corn syrup, fructose, glucose, xylose, arabinose, rhamnose, erythritol, xylitol, mannitol, sorbitol, inositol, AceK, aspartame, neotame, sucralose, saccharin, naringin dihydrochalcone (NarDHC), neohesperidin dihydrochalcone (NDHC), mogroside IV, siamenoside I, mogroside V, monatin, thaumatin, monellin, L-alanine, glycine, Lo Han Guo, hernandulcin, phyllodulcin, trilobatin, and combinations thereof.
Any one of the consumable products (e.g., orally consumable products) can also have at least one additive. The additive can be, for example, a carbohydrate, a polyol, an amino acid or salt thereof, a polyamino acid or salt thereof, a sugar acid or salt thereof, a nucleotide, an organic acid, an inorganic acid, an organic salt, an organic acid salt, an organic base salt, an inorganic salt, a bitter compound, a flavorant, a flavoring ingredient, an astringent compound, a protein, a protein hydrolysate, a surfactant, an emulsifier, a flavonoid, an alcohol, a polymer, and combinations thereof.
In some instances, the present disclosure provides a beverage product comprising a sweetening amount of thaumatin produced using the methods described herein. Any one of the beverage products can be, for example, a carbonated beverage product or a noncarbonated beverage product. Any one of the beverage products can also be, for example, a soft drink, a fountain beverage, a frozen beverage, a ready-to-drink beverage, a frozen and ready-to-drink beverage, coffee, tea, a dairy beverage, a powdered soft drink, a liquid concentrate, flavored water, enhanced water, fruit juice, a fruit juice flavored drink, a sport drink, or an energy drink.
In some instances, any one of the beverage products of the present disclosure can include one or more beverage ingredients such as, for example, acidulants, fruit juices and/or vegetable juices, pulp, flavorings, coloring, preservatives, vitamins, minerals, electrolytes, erythritol, tagatose, glycerine, and carbon dioxide. Such beverage products may be provided in any suitable form, such as a beverage concentrate or a carbonated, ready-to-drink beverage.
In certain instances, any one of the beverage products of the present disclosure can have any of numerous different specific formulations or constitutions. The formulation of a beverage product of the present disclosure can vary to a certain extent, depending upon such factors as the product’s intended market segment, its desired nutritional characteristics, flavor profile, and the like. For example, in certain embodiments, further ingredients can be added to the formulation of a particular beverage product: additional (i.e., more and/or other) sweeteners, flavorings, electrolytes, vitamins, fruit juices or other fruit products, tastants, masking agents, flavor enhancers, and/or carbonation may be added to any such formulation to vary the taste, mouthfeel, nutritional characteristics, etc.
Exemplary flavorings can be, for example, cola flavoring, citrus flavoring, and spice flavorings. In some examples, carbonation in the form of carbon dioxide can be added for effervescence. In other examples, preservatives can be added, depending upon the other ingredients, production technique, desired shelf life, etc. In certain cases, caffeine can be added. In some cases, the beverage product can be a cola-flavored carbonated beverage, characteristically containing carbonated water, sweetener, kola nut extract and/or other flavoring, caramel coloring, one or more acids, and optionally other ingredients.
As used herein, “dietary supplement(s)” refers to compounds intended to supplement the diet and provide nutrients, such as vitamins, minerals, fiber, fatty acids, amino acids, etc., that may be missing or may not be consumed in sufficient quantities in a diet. Any suitable dietary supplement known in the art may be used. Examples of suitable dietary supplements include nutrients, vitamins, minerals, fiber, fatty acids, herbs, botanicals, amino acids, and metabolites.
As used herein, “nutraceutical(s)” refers to any food or part of a food that may provide medicinal or health benefits, including the prevention and/or treatment of disease or disorder (e.g., fatigue, insomnia, effects of aging, memory loss, mood disorders, cardiovascular disease and high levels of cholesterol in the blood, diabetes, osteoporosis, inflammation, autoimmune disorders, etc.). Any suitable nutraceutical known in the art may be used. In some cases, nutraceuticals can be used as supplements to food and beverages and as pharmaceutical formulations for enteral or parenteral applications, which may be solid formulations, such as capsules or tablets, or liquid formulations, such as solutions or suspensions.
In some cases, dietary supplements and nutraceuticals can further contain protective hydrocolloids (such as gums, proteins, modified starches), binders, film-forming agents, encapsulating agents/materials, wall/shell materials, matrix compounds, coatings, emulsifiers, surface active agents, solubilizing agents (oils, fats, waxes, lecithins, etc.), adsorbents, carriers, fillers, co-compounds, dispersing agents, wetting agents, processing aids (solvents), flowing agents, taste-masking agents, weighting agents, jellifying agents, gel-forming agents, antioxidants and antimicrobials.
As used herein, a “gel” refers to a colloidal system in which a network of particles spans the volume of a liquid medium. Although gels mainly are composed of liquids, and thus exhibit densities similar to liquids, gels have the structural coherence of solids due to the network of particles that spans the liquid medium. For this reason, gels generally appear to be solid, jelly-like materials. Gels can be used in a number of applications. For example, gels can be used in foods, paints, and adhesives. Gels that can be eaten are referred to as “edible gel compositions.” Edible gel compositions typically are eaten as snacks, as desserts, as a part of staple foods, or along with staple foods. Examples of suitable edible gel compositions can be, for example, gel desserts, puddings, jams, jellies, pastes, trifles, aspics, marshmallows, gummy candies, and the like. In some embodiments, edible gel mixes generally are powdered or granular solids to which a fluid may be added to form an edible gel composition. Examples of suitable fluids can be, for example, water, dairy fluids, dairy analogue fluids, juices, alcohol, alcoholic beverages, and combinations thereof. Examples of suitable dairy fluids can be, for example, milk, cultured milk, cream, fluid whey, and mixtures thereof. Examples of suitable dairy analogue fluids can be, for example, soy milk and non-dairy coffee whitener.
As used herein, the term “gelling ingredient” refers to any material that can form a colloidal system within a liquid medium. Examples of suitable gelling ingredients can be, for example, gelatin, alginate, carrageenan, gum, pectin, konjac, agar, food acid, rennet, starch, starch derivatives, and combinations thereof. It is well known to those in the art that the amount of gelling ingredient used in an edible gel mix or an edible gel composition can vary considerably depending on a number of factors such as, for example, the particular gelling ingredient used, the particular fluid base used, and the desired properties of the gel.
Gel mixes and gel compositions of the present disclosure can be prepared by any suitable method known in the art. In some embodiments, edible gel mixes and edible gel compositions of the present disclosure can be prepared using other ingredients in addition to the gelling ingredient. Examples of other suitable ingredients can be, for example, a food acid, a salt of a food acid, a buffering system, a bulking agent, a sequestrant, a cross-linking agent, one or more flavors, one or more colors, and combinations thereof.
Pharmaceutical compositions are also provided comprising thaumatin produced using the methods described herein. In some cases, any one of the pharmaceutical compositions of the present disclosure can be used to formulate pharmaceutical drugs containing one or more active agents that exert a biological effect. Accordingly, in some embodiments, any one of the pharmaceutical compositions of the present disclosure can contain one or more active agents that exert a biological effect. Suitable active agents are well known in the art (e.g., The Physician's Desk Reference). Such compositions can be prepared according to procedures well known in the art, for example, as described in Remington’s Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa., USA.
The thaumatin produced using the methods described herein can be used with any suitable dental and oral hygiene compositions known in the art. Examples of suitable dental and oral hygiene compositions can be, for example, toothpastes, tooth polishes, dental floss, mouthwashes, mouth rinses, dentifrices, mouth sprays, mouth refreshers, plaque rinses, dental pain relievers, and the like. Dental and oral hygiene compositions comprising the thaumatin produced using the methods described herein are also provided.
As used herein, “food product composition(s)” refers to any solid or liquid ingestible material that can, but need not, have a nutritional value and be intended for consumption by humans and animals. Examples of suitable food product compositions can be, for example, confectionary compositions, such as candies, mints, fruit flavored drops, cocoa products, chocolates, and the like; condiments, such as ketchup, mustard, mayonnaise, and the like; chewing gums; cereal compositions; baked goods, such as breads, cakes, pies, cookies, and the like; dairy products, such as milk, cheese, cream, ice cream, sour cream, yogurt, sherbet, and the like; tabletop sweetener compositions; soups; stews; convenience foods; meats, such as ham, bacon, sausages, jerky, and the like; gelatins and gelatin-like products such as jams, jellies, preserves, and the like; fruits; vegetables; egg products; icings; syrups including molasses; snacks; nut meats and nut products; and animal feed.
Food product compositions can also be herbs, spices and seasonings, natural and synthetic flavors, and flavor enhancers, such as monosodium glutamate. In some embodiments, any one of the food product compositions can be, for example, prepared packaged products, such as dietetic sweeteners, liquid sweeteners, granulated flavor mixes, pet foods, livestock feed, tobacco, and materials for baking applications, such as powdered baking mixes for the preparation of breads, cookies, cakes, pancakes, donuts and the like. In other embodiments, any one of the food product compositions can also be diet and low-calorie food and beverages containing little or no sucrose.
EXAMPLES
Example 1 - Expression of thaumatin in Pichia pastoris
The following experiments were conducted to transform Pichia pastoris cells and generate several engineered Pichia strains suitable for secreted thaumatin production. Full-length DNA fragments of the thaumatin I and II genes (SEQ ID NOs: 2 and 4) were codon-optimized for Pichia pastoris expression and synthesized for use in transforming the Pichia pastoris cells. The thaumatin fragments were inserted in frame after a nucleotide sequence encoding a mating factor signal peptide in the pHKA vector (a modified Pichia expression vector) to generate single-copy plasmids (pHKA-ThmI and pHKA-ThmII, FIGs. 1 and 2). Multiple signal peptides for secretion of thaumatin into the culture medium were tested, and the signal peptide from the S. cerevisiae dolichyl-diphosphooligosaccharide-protein glycosyltransferase (Ost1) gene (amino acid SEQ ID NO: 5, DNA SEQ ID NO: 6) was determined to be the best for both thaumatin genes. In the plasmid, each expression cassette contains the AOX1 promoter, the S. cerevisiae Ost1 signal peptide-thaumatin fusion gene and the AOX1 transcription terminator. The Ost1 signal peptide-thaumatin fusion protein can be cleaved by endogenous signal peptidase to release the mature thaumatin peptides (SEQ ID NOs: 1 and 3) into the extracellular space.
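Because the thaumatin genes were codon-optimized, a quick consistency check is to translate DNA entries of the sequence listing back into protein and compare them with the corresponding amino acid entries. The sketch below uses plain Python and the standard genetic code; the helper names are hypothetical, and it checks only the Ost1 signal peptide (SEQ ID NOs: 5 and 6) and the ERBV-1 2A linker (SEQ ID NOs: 7 and 8). Biopython's `Seq.translate()` performs the same task if that library is available.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; whitespace and line breaks are ignored."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


OST1_SP_DNA = "ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCT"  # SEQ ID NO: 6
ERBV_2A_DNA = "GGAGCAACTAATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCG"  # SEQ ID NO: 8

print(translate(OST1_SP_DNA))  # MRQVWFSWIVGLFLCFFNVSSA (SEQ ID NO: 5)
print(translate(ERBV_2A_DNA))  # GATNFSLLKLAGDVELNPGP (SEQ ID NO: 7)
```

The same check can be applied to the longer entries, for example SEQ ID NO: 4 against SEQ ID NO: 3, after joining the wrapped lines.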
To generate multiple copies of the expression cassette in vitro, the above plasmid was digested with BspEI and BglII or with BspEI and BamHI. The fragments containing the thaumatin coding sequence were gel-purified and then ligated together. The resulting E. coli colonies were screened by digestion with BglII and BamHI to find colonies with an insert double the size of the single expression cassette, i.e., plasmids containing two expression cassettes. This procedure was repeated on the two-copy plasmid to generate pHKA Pichia expression plasmids harboring four copies of identical thaumatin expression cassettes (pHKA-ThmIx4 and pHKA-ThmIIx4, FIGs. 3 and 4).
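The copy-doubling scheme relies on BglII (A^GATCT) and BamHI (G^GATCC) leaving identical 5'-GATC overhangs. A BamHI end ligated to a BglII end gives a hybrid GGATCT junction that neither enzyme cuts, while the outermost BglII and BamHI sites survive, so a BglII/BamHI digest releases the whole multimer as one insert. The string-level sketch below models only the cassette region (backbone and BspEI sites are omitted), and the cassette body is a hypothetical placeholder sequence.

```python
BGLII = "AGATCT"  # BglII recognition site, cuts A^GATCT
BAMHI = "GGATCC"  # BamHI recognition site, cuts G^GATCC


def join_copies(left: str, right: str) -> str:
    """Ligate the BamHI end of `left` to the BglII end of `right`.

    Both enzymes leave a 5'-GATC overhang, so the ends are compatible; the
    junction becomes GGATCT, which neither enzyme recognizes.
    """
    b = left.rfind(BAMHI)
    g = right.find(BGLII)
    if b < 0 or g < 0:
        raise ValueError("need a BamHI site on the left and a BglII site on the right")
    return left[:b + 1] + right[g + 1:]


def multimerize(cassette: str, rounds: int) -> str:
    """Double the number of cassette copies in each round (1 -> 2 -> 4 ...)."""
    construct = cassette
    for _ in range(rounds):
        construct = join_copies(construct, construct)
    return construct


# Hypothetical placeholder cassette: BglII site, short stand-in body, BamHI site.
BODY = "ATGGCTACTTTTGAAATCGTT"
CASSETTE = BGLII + BODY + BAMHI
x4 = multimerize(CASSETTE, rounds=2)
# copies, BglII sites, BamHI sites, hybrid junctions
print(x4.count(BODY), x4.count(BGLII), x4.count(BAMHI), x4.count("GGATCT"))  # 4 1 1 3
```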
The identified plasmids were linearized at the HIS4 gene by BspEI digestion. The linearized expression plasmids were transformed into Pichia pastoris (GS115) cells using known methods, and the expression cassettes were integrated into the HIS4 locus of the Pichia genome. After screening, positive strains were identified, as summarized in Table 1.
Table 1. Summary of Pichia pastoris strains
The following experiment was conducted to demonstrate thaumatin production. Single colonies of the Pichia pastoris strains were inoculated into BMGY medium in a 24-well plate or baffled flask and grown at 28-30°C in a shaking incubator (250-300 rpm) until the culture reached an OD600 of 2-6 (log-phase growth). The cells were harvested by centrifugation and resuspended to an OD600 of 1.0 in BMM/BMMY medium to induce expression. Neat (100%) methanol was added to the BMMY medium to a final concentration of 1% every 24 hours to maintain induction. The medium was harvested by centrifugation at different induction times and analyzed by SDS-PAGE, HPLC and LC-MS as described below.
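The two volumetric steps in this induction protocol reduce to simple arithmetic. The sketch below assumes that OD600 is proportional to cell density (OD1 x V1 = OD2 x V2) and that the methanol from the previous feed has been consumed, so each feed restores 1% (v/v); the function names and the 5 mL example culture are illustrative, not taken from the protocol.

```python
def resuspension_volume_ml(od600_measured: float, culture_volume_ml: float,
                           target_od600: float = 1.0) -> float:
    """Volume of induction medium that gives `target_od600` after resuspension.

    Assumes OD600 is proportional to cell density (OD1 * V1 = OD2 * V2).
    """
    return od600_measured * culture_volume_ml / target_od600


def methanol_feed_ul(culture_volume_ml: float, final_fraction: float = 0.01) -> float:
    """Microlitres of 100% methanol to bring a culture to `final_fraction` (v/v).

    Solves x / (V + x) = f for x, assuming the previous feed has been consumed.
    """
    return 1000.0 * final_fraction * culture_volume_ml / (1.0 - final_fraction)


# Example: 5 mL of BMGY culture read at OD600 = 4.0
v_bmmy = resuspension_volume_ml(4.0, 5.0)
print(v_bmmy)                              # 20.0 (mL of BMMY)
print(round(methanol_feed_ul(v_bmmy), 1))  # 202.0 (uL per feed)
print(round(methanol_feed_ul(2.0), 1))     # 20.2 (uL for a 2 mL well)
```

Simply adding 1% of the culture volume (200 uL to 20 mL) gives about 0.99% (v/v), so the two approaches differ by roughly 1% of the target value.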
Multiple methods were used to detect thaumatin in the culture media. Medium samples were subjected to electrophoresis on a 10-20% SDS-PAGE gel. As shown in FIG. 5, a 22 kDa band was present in all thaumatin-expressing medium samples, indicating thaumatin production in the engineered Pichia strains. Increasing the number of thaumatin expression cassette copies through in vitro multimerization can increase thaumatin production in Pichia (FIG. 5). To confirm that the Pichia-produced thaumatin is correctly folded with all native disulfide bonds, these samples were analyzed by HPLC and LC-MS and compared with a thaumatin standard.
HPLC analysis was performed using a C4 HPLC column. A linear gradient of 0.1% TFA in water : 0.1% TFA in acetonitrile was increased from 20% to 40% over 10 minutes and then dropped back to 20% for an additional 5 minutes, at a flow rate of 0.6 mL/min. The thaumatin I standard elutes at 8.7 minutes and the thaumatin II standard at 8.6 minutes. When medium samples from the ThmI and ThmII strains were precipitated with ammonium sulfate and resuspended in water, a peak with the same retention time as the standard was observed (FIG. 6).
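The HPLC program can be written as a piecewise function of time. This sketch assumes that the 20% and 40% values refer to the acetonitrile phase (0.1% TFA in acetonitrile) and that the drop back to 20% is an instantaneous step; both are readings of the text rather than stated parameters. The value returned is the programmed pump composition, which reaches the column only after the system dwell volume has been swept.

```python
RAMP_START_B = 20.0     # % acetonitrile phase at t = 0 (assumed meaning of "20%")
RAMP_END_B = 40.0       # % at the end of the linear ramp
RAMP_MIN = 10.0         # ramp duration, minutes
HOLD_MIN = 5.0          # hold back at 20% after the ramp, minutes
FLOW_ML_PER_MIN = 0.6


def percent_b(t_min: float) -> float:
    """Programmed % mobile phase B at time t (minutes) for the 15-minute method."""
    if not 0.0 <= t_min <= RAMP_MIN + HOLD_MIN:
        raise ValueError("time is outside the 15-minute program")
    if t_min <= RAMP_MIN:
        return RAMP_START_B + (RAMP_END_B - RAMP_START_B) * t_min / RAMP_MIN
    return RAMP_START_B  # assumed instantaneous step back to 20%


total_volume_ml = FLOW_ML_PER_MIN * (RAMP_MIN + HOLD_MIN)
# Programmed composition when the thaumatin I standard elutes, and solvent used per run.
print(round(percent_b(8.7), 1), round(total_volume_ml, 1))  # 37.4 9.0
```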
Samples were analyzed by LC-MS using a C4 column. Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1% formic acid in acetonitrile. The flow rate was set at 0.2 mL/min. Mass spectrometry analysis was performed on a Q Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Fisher Scientific) using an optimized method in positive ion mode. The peptides produced in the ThmI and ThmII strains have retention times similar to that of a thaumatin standard purchased from Sigma-Aldrich. The standard is a mixture of type I and type II thaumatin extracted from the Thaumatococcus daniellii plant. The produced thaumatin I peptide has the same mass ([M+H]+: 22174 m/z) as the major compound in the thaumatin standard (thaumatin I with eight disulfide bonds). The Pichia-produced thaumatin II had a mass of 22257 m/z, matching the expected mass of the compound with eight disulfide bonds. These results support thaumatin production in the engineered Pichia strains (ThmI, ThmII, etc.) (FIG. 7).
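Each disulfide bond removes two hydrogen atoms, so a fully oxidized thaumatin with eight disulfide bonds is about 16.13 Da lighter than the reduced chain. The sketch below counts the cysteines in SEQ ID NO: 3 and computes theoretical neutral average masses from standard average residue masses; it is illustrative only and is not claimed to reproduce the measured values above, which depend on deconvolution, calibration and charge state (an [M+H]+ value is about 1.007 Da above the neutral mass).

```python
# Standard average residue masses (Da): free amino acid minus H2O.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini
H2 = 2.01588      # two H atoms lost per disulfide bond

THAUMATIN_II = (  # SEQ ID NO: 3, wrapped lines joined
    "ATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDC"
    "YFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFS"
    "PTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRF"
    "FKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTA"
)


def average_mass(seq: str, disulfides: int = 0) -> float:
    """Theoretical neutral average mass of a chain with the given number of disulfides."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER - disulfides * H2


n_cys = THAUMATIN_II.count("C")
reduced = average_mass(THAUMATIN_II)
oxidized = average_mass(THAUMATIN_II, disulfides=n_cys // 2)
print(len(THAUMATIN_II), "residues,", n_cys, "Cys ->", n_cys // 2, "possible disulfide bonds")
print(f"reduced {reduced:.1f} Da, oxidized {oxidized:.1f} Da, shift {reduced - oxidized:.2f} Da")
```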
Example 2 - Expression of thaumatin I and II tandem repeats using a 2A linker
Another expression strategy for thaumatin uses 2A peptide linkers between signal peptide-thaumatin open reading frames. During translation, the ribosome pauses at the glycine residue near the C-terminus of the 2A peptide without forming a peptide bond to the next residue, and translation is then re-initiated at the proline residue at the C-terminus of the 2A peptide. Of the various known 2A linker sequences, we selected the Equine rhinitis B virus (ERBV-1) 2A linker (amino acid SEQ ID NO: 7 and DNA SEQ ID NO: 8), which has been shown to be highly efficient at pausing and restarting translation in yeasts similar to P. pastoris.
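Because of this mechanism, each upstream product keeps the 2A residues up to the final glycine at its C-terminus, while the downstream product starts with the 2A proline. The sketch below splits a polyprotein at each NPG/P junction; it assumes that skipping occurs at every 2A copy and ignores signal peptide processing. `ribosome_skip` and the toy sequence are hypothetical.

```python
ERBV_2A = "GATNFSLLKLAGDVELNPGP"  # SEQ ID NO: 7


def ribosome_skip(polyprotein: str, motif: str = "NPGP") -> list[str]:
    """Split a polyprotein at each 2A site, between the G and P of NPGP.

    Assumes every 2A copy causes a skip (100% efficiency).
    """
    products, start = [], 0
    idx = polyprotein.find(motif, start)
    while idx != -1:
        cut = idx + len(motif) - 1  # no peptide bond between G and the final P
        products.append(polyprotein[start:cut])
        start = cut                 # the next chain begins with the proline
        idx = polyprotein.find(motif, start)
    products.append(polyprotein[start:])
    return products


# Hypothetical toy polyprotein: "XXXX" and "MYYYY" stand in for two ORFs.
toy = "XXXX" + ERBV_2A + "MYYYY"
for chain in ribosome_skip(toy):
    print(chain)
# XXXXGATNFSLLKLAGDVELNPG
# PMYYYY
```

Applied to SEQ ID NO: 9, the same function would return two chains: Ost1-thaumatin I carrying the 19-residue 2A remnant, and a proline-prefixed Ost1-thaumatin I.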
The 2A linker was added to the pHKA-Ost1-ThmI and pHKA-Ost1-ThmII plasmids by amplifying the thaumatin gene with primers carrying overlaps that encode the 2A linker and joining the fragments by Gibson assembly. The assembly was transformed into 10G cells, and the correct sequence was confirmed by Sanger sequencing. Correct plasmids were linearized by BspEI digestion and transformed into Pichia cells.
Table 2. Summary of strains and plasmids (2A linker)
The following experiment was conducted to demonstrate thaumatin I and II production. Single colonies of the Pichia pastoris strains were inoculated into BMGY medium in a 24-well plate or baffled flask and grown at 28-30°C in a shaking incubator (250-300 rpm) until the culture reached an OD600 of 2-6 (log-phase growth). The cells were harvested by centrifugation and resuspended to an OD600 of 1.0 in BMM/BMMY medium to induce expression. Neat (100%) methanol was added to the BMMY medium to a final concentration of 1% every 24 hours to maintain induction. The medium was harvested by centrifugation at different induction times and subjected to SDS-PAGE, HPLC and LC-MS analysis as described below (FIGs. 10 and 11).
The ThmII-2A2 strain produces thaumatin II, which can be detected by SDS-PAGE (FIG. 10) and LC-MS (FIG. 11).
Example 3 - Improvement of thaumatin folding, secretion and production in Pichia pastoris
For the thaumatin peptide to have its characteristic sweetness, its eight disulfide bonds must be formed in the correct positions. While P. pastoris is a good host for disulfide bond formation, overexpression of a heterologous disulfide-bonded product can overwhelm the cell's native capacity. To increase the amount of correctly folded thaumatin produced by the Pichia strains identified above, a series of chaperones and other proteins involved in protein expression, secretion, folding and disulfide bond formation were overexpressed in the thaumatin-producing Pichia strains. These chaperones and proteins were selected from P. pastoris or from plants and co-expressed with thaumatin in Pichia. While none of the chaperones involved in disulfide bond formation in Thaumatococcus daniellii have been identified, the disulfide bond formation system of the model plant Arabidopsis thaliana has been characterized (Table 3).
Protein disulfide isomerase (PDI) is a chaperone localized primarily in the endoplasmic reticulum (ER) that aids in forming disulfide bonds between cysteine residues. Overexpressing PDI in P. pastoris has also been shown to improve the expression of certain proteins that lack disulfide bonds. ER oxidoreductin (ERO) proteins work in tandem with PDI by supplying oxidizing equivalents for disulfide bond formation. ERVs are a family of sulfhydryl oxidases that play a role similar to that of EROs but may also directly catalyze disulfide bond formation. HAC1 is a transcriptional regulator of the unfolded protein response in P. pastoris. GPX1 is a cytosolic peroxidase involved in cellular redox balance. KAR2 encodes the ER chaperone BiP, which aids in proper folding and directs misfolded proteins to degradation. The genes SLY1 and SEC1 regulate vesicle traffic from the ER to the Golgi and from the Golgi to the plasma membrane, respectively. All selected transcriptional regulators, chaperones and disulfide bond formation-related proteins are listed in Table 3.
Table 3. Selection of candidates for co-expression in thaumatin production strain
Chaperone genes were cloned into a modified pPICZ vector that has an NdeI site after the AOX1 promoter in place of the EcoRI site. To construct vectors with multiple chaperones or multiple copies of the same chaperone, the vector containing the chaperone to be added was digested with BglII and BamHI to release the expression cassette. The expression cassette was then ligated into a second chaperone vector linearized with BamHI. A BglII and BamHI digest was performed on the resulting plasmids to confirm insertion of the chaperone in the proper orientation. The generated vectors were linearized at the AOX1 promoter with the SacI restriction enzyme and used to transform GS115 P. pastoris strains carrying thaumatin expression cassettes integrated at the HIS4 locus. Colonies with chaperone integration were selected on YPD-Zeocin and confirmed by colony PCR.
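The orientation check works because BamHI and BglII ends are compatible. In the head-to-tail orientation, the new junction between the resident and incoming cassettes is a hybrid GGATCT site and the BamHI site is restored downstream of the insert, so a BglII/BamHI digest releases both cassettes as one fragment; in the reverse orientation, the BamHI site is restored between the two cassettes and they fall into separate fragments. The string-level sketch below models this on a linear stretch using hypothetical placeholder sequences, and it assumes no other BglII or BamHI sites are present.

```python
BGLII = "AGATCT"  # cut A^GATCT
BAMHI = "GGATCC"  # cut G^GATCC
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def ligate_into_bamhi(vector: str, cassette: str, reverse: bool = False) -> str:
    """Insert a BglII...BamHI cassette into the first BamHI site of `vector`.

    `cassette` is the undigested fragment AGATCT...GGATCC. After digestion the
    top strand keeps GATCT...G; in the reverse orientation the other strand,
    GATCC...A, becomes the top strand.
    """
    site = vector.find(BAMHI)
    if site < 0:
        raise ValueError("vector has no BamHI site")
    left, right = vector[:site + 1], vector[site + 1:]
    fragment = revcomp(cassette) if reverse else cassette
    return left + fragment[1:-5] + right


def digest(seq: str) -> list[str]:
    """Cut a linear sequence after the first base of every BglII/BamHI site."""
    cuts = sorted({i + 1 for i in range(len(seq) - 5) if seq[i:i + 6] in (BGLII, BAMHI)})
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical placeholders: resident cassette A in the vector, incoming cassette B.
A, B = "ATGCCCAAA", "ATGTTTGGG"
vector = BGLII + A + BAMHI + "TTTT"
cassette = BGLII + B + BAMHI

for reverse in (False, True):
    pieces = digest(ligate_into_bamhi(vector, cassette, reverse))
    b_strand = revcomp(B) if reverse else B
    together = any(A in p and b_strand in p for p in pieces)
    print("reverse" if reverse else "head-to-tail", "-> A and B on one fragment:", together)
# head-to-tail -> A and B on one fragment: True
# reverse -> A and B on one fragment: False
```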
Because the selected chaperones often work in tandem with other enzymes to improve secretion or disulfide bonding, expression vectors carrying multiple genes were generated. To ensure integration at the AOX1 locus, the promoters of the added genes had to be replaced. The GAP1 and CAT1 promoters were amplified from GS115 genomic DNA with primers that added BglII and NdeI sites to the 5' and 3' ends, respectively. The AOX1 promoter was then excised from the pPICZ vectors with a BglII/NdeI digest and replaced with the GAP1 or CAT1 promoter. Various combinations of expression cassettes were generated by ligating a chaperone cassette with the GAP1 or CAT1 promoter, excised from its expression vector with a BamHI/BglII digest, into a pPICZ AOX1 chaperone vector linearized with BamHI. All plasmids are listed below in Table 4.
Table 4. Summary of selected plasmids for co-expression of single chaperones and combinations of different chaperones
To demonstrate improvement of thaumatin production with chaperone co-expression, confirmed colonies were grown overnight in BMGY medium in 24-well plates. The next day, the cells were resuspended in 2 mL BMMY medium to an OD600 of 1.0. The cultures were induced at 30 °C for 48 hours with additional feeding of 1% methanol to each well twice daily. The cells were then pelleted by centrifugation, and the supernatant was analyzed by SDS-PAGE, HPLC and LC-MS.
Co-expression of PpPDI1 or AtPDI1 can increase thaumatin production. As shown in FIG. 12, the co-expression strains had higher thaumatin production than the parent strain. Co-expression of the HAC1 transcriptional regulator can also increase thaumatin production (FIG. 13). Co-expression of multiple chaperones and disulfide bond formation-related proteins (PDI, ERO and ERV) from either Pichia pastoris or Arabidopsis can increase thaumatin production in the ThmII strain (FIG. 14). Combining the disulfide bond formation chaperone PDI1 with the ER chaperone PpKAR2 also increases the amount of thaumatin secreted by Pichia (FIG. 15).
Example 5 - Fermentation, purification and identification of thaumatin I and thaumatin II
The identified thaumatin I and II production strains (ThmIx4 and ThmIIx4) were cultured in 3 L fermenters for thaumatin production. Seed cultures were inoculated into rich medium containing glycerol, and after the glycerol was fully consumed, methanol was fed continuously into the medium to induce thaumatin expression. Medium samples were collected at different time points and analyzed by SDS-PAGE and HPLC. As shown in FIG. 16, no thaumatin I was detected before methanol feeding (FIG. 16: 1 and 2); after methanol feeding, thaumatin I production increased with induction time throughout the 138 h fermentation (FIG. 16: 3-11). As shown in FIG. 17, no thaumatin II was detected before methanol feeding (FIG. 17: 2 and 3); after methanol feeding, thaumatin II production increased with induction time throughout the 138 h fermentation (FIG. 17: 4-12).
Example 6 - Alternate 2A peptide with alpha mating factor spacer peptides
An alternative method of expressing thaumatin as a tandem repeat was demonstrated by combining the 2A peptide strategy with a linker that mimics the alpha mating factor spacer. The mating factor mimic linker can be cleaved by endogenous P. pastoris proteases to improve the release of thaumatin monomers from tandem repeats. The length of mating factor linkers varies between yeast species. Our experiments showed that linkers with five repeats of the EA dipeptide are efficiently cleaved by the endogenous P. pastoris KEX2 protease. A 2A peptide (amino acid SEQ ID NO: 37 and DNA SEQ ID NO: 38) has also been shown to be highly efficient at pausing and restarting translation in yeasts similar to P. pastoris. When the linker is combined with the 2A peptide, the yeast cell processes the tandem-repeat polypeptide into thaumatin by pausing and restarting translation at each 2A peptide. Extra C-terminal amino acids are then removed by the combined action of the KEX2 and KEX1 proteases.
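Assuming the thaumatin-KR(EA)5-2A order implied by the cloning steps described next, the upstream 2A product ends in a KR(EA)5 spacer followed by the 2A remnant. The sketch below models KEX2 cleavage C-terminal to the KR that begins the (EA)5 spacer, followed by KEX1 removal of the exposed C-terminal Lys/Arg. Thaumatin itself contains internal KR dipeptides (for example in ...LLQCKRFG... of SEQ ID NO: 3), so the sketch matches only a KR followed by five EA repeats; real KEX2 specificity also depends on structural context, which this toy model does not capture. The function names and the shortened thaumatin stand-in are hypothetical.

```python
import re

# KEX2 site modelled as the KR that starts the KR(EA)5 spacer (SEQ ID NO: 43).
LINKER_SITE = re.compile(r"KR(?=(?:EA){5})")


def kex2_cleave_linker(chain: str) -> list[str]:
    """Cut C-terminal to each KR that is followed by five EA repeats."""
    pieces, start = [], 0
    for match in LINKER_SITE.finditer(chain):
        pieces.append(chain[start:match.end()])
        start = match.end()
    pieces.append(chain[start:])
    return pieces


def kex1_trim(chain: str) -> str:
    """Model the KEX1 carboxypeptidase: remove C-terminal Lys/Arg residues."""
    return chain.rstrip("KR")


# Hypothetical upstream 2A product: a short thaumatin stand-in (with an internal KR),
# then the KR(EA)5 spacer and the 19-residue 2A remnant.
upstream = "LLQCKRFGRPPCPTA" + "KR" + "EA" * 5 + "GATNFSLLKLAGDVELNPG"
first, tail = kex2_cleave_linker(upstream)
print(kex1_trim(first))  # LLQCKRFGRPPCPTA
print(tail)              # EAEAEAEAEAGATNFSLLKLAGDVELNPG
```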
The mating factor linker was added to the pHKA-Ost1-ThmII-2A plasmid by amplifying the end of the first thaumatin tandem repeat in the reverse direction and the 2A peptide in the forward direction, with overlaps that encode the mating factor linker DNA sequence. The DNA fragments were joined by Gibson assembly and sequence-confirmed. The PCR and DNA assembly steps were repeated to generate plasmids with 8 thaumatin sequences separated by 7 linkers. Alternatively, the pHKA-Ost1-ThmII-linker-2Ax2 plasmid, in which the linker comprises a KR(EA)5 sequence (SEQ ID NO: 43), was multimerized by the methods described in Example 1 until a plasmid with 8 copies of the Ost1-ThmII-linker-2Ax2 coding sequence was obtained. Plasmids were linearized by BspEI digestion and transformed into Pichia cells by electroporation. The strains secrete thaumatin II of the correct mass, as shown by both SDS-PAGE (FIG. 24) and LC-MS. The plasmids and strains generated in this example are listed in Table 5.
Table 5. Summary of linker-2A plasmids, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43)
References
1. Van der Wel H., Loeve K. (1972). Isolation and characterization of thaumatin I and II, the sweet-tasting proteins from Thaumatococcus daniellii Benth. Eur. J. Biochem. 31, 221-225.
2. Ide N., Kaneko R., Wada R., Mehta A., Tamaki S., Tsuruta T. (2007a). Cloning of the thaumatin I cDNA and characterization of recombinant thaumatin I secreted by Pichia pastoris. Biotechnol. Prog. 23, 1023-1030.
3. Joseph J. A., Akkermans S., Nimmegeers P., Van Impe J. F. M. (2019). Bioproduction of the Recombinant Sweet Protein Thaumatin: Current State of the Art and Perspectives. Front. Microbiol. 10, 695.
Sequences:
GCTACTTTTGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTCTAAA
GGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGAC
TATTAATGTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAACTGATTGTT
ACTTCGATGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTGGTTTGTTGA
GATGTAAAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCTTTGAACCAA
TACGGTAAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGTTCCTATGGA
TTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGG
TCAATGTCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGATGCTTGTACTG
TTTTTCAAACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCAACTGAATACT
CTAGATTCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTGGATAAACCAA
CTACTGTTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGTCCAACTGCT
SEQ ID NO: 3 Amino acid sequence of thaumatin II
ATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDC
YFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFS
PTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRF
FKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTA
SEQ ID NO: 4 DNA sequence encoding SEQ ID NO: 3 thaumatin II
GCTACTTTCGAAATTGTTAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTCTAAG
GGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAACTCTGGTGAATCTTGGAC
TATTAATGTTGAACCAGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTGATTGTT
ACTTTGATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGTTTGTTG
CAATGTAAGAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTTGAACCA
ATATGGTAAAGACTACATTGATATCTCTAACATTAAGGGTTTCAACGTTCCAATGG
ATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTG
GTCAATGTCCAGCTAAATTGAAGGCTCCTGGTGGTGGTTGTAATGATGCTTGTACT
GTTTTTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTACTGAATAT
TCTAGATTCTTCAAAAGACTGTGTCCTGATGCTTTTTCTTACGTTTTGGATAAGCCT
ACTACTGTTACTTGTCCTGGTTCTTCTAACTACAGAGTTACTTTTTGTCCTACTGCT
SEQ ID NO: 5 Amino acid sequence of Ost1 signal peptide
MRQVWFSWIVGLFLCFFNVSSA
SEQ ID NO: 6 DNA sequence encoding Ost1 signal peptide
ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTG
TCTTCTGCT
SEQ ID NO: 7 Amino acid sequence of 2A linker
GATNFSLLKLAGDVELNPGP
SEQ ID NO: 8 DNA sequence encoding 2A linker
GGAGCAACTAATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCG
SEQ ID NO: 9 Amino acid sequence of fusion protein ThmI-2A
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSG
ESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSL
NQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDAC
TVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAG
ATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAA
ASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGG
LLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVG
QCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTT
VTCPGSSNYRVTFCPTA
SEQ ID NO: 10 DNA sequence encoding fusion protein ThmI-2A
ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTG
TCTTCTGCTGCTACTTTTGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCT
GCTTCTAAAGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGA
ATCTTGGACTATTAATGTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAA
CTGATTGTTACTTCGATGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTG
GTTTGTTGAGATGTAAAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCT
TTGAACCAATACGGTAAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGT
TCCTATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTG
ATATTGTTGGTCAATGTCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGAT
GCTTGTACTGTTTTTCAAACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCA
ACTGAATACTCTAGATTCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTG
GATAAACCAACTACTGTTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGT
CCAACTGCTGGAGCAACTAATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACT
GAACCCTGGACCGATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTAT
GTTTTTTCAACGTGTCTTCTGCTGCTACTTTTGAAATCGTTAACAGATGTTCTTACA
CTGTTTGGGCTGCTGCTTCTAAAGGTGACGCTGCTTTGGATGCTGGTGGTAGACAA
TTGAATAGTGGTGAATCTTGGACTATTAATGTTGAACCAGGTACTAATGGTGGTAA
AATTTGGGCTAGAACTGATTGTTACTTCGATGATTCTGGTTCTGGTATTTGTAAAA
CTGGTGACTGTGGTGGTTTGTTGAGATGTAAAAGATTCGGTAGACCACCTACTACT
TTGGCTGAATTTTCTTTGAACCAATACGGTAAAGATTACATTGATATCTCTAACAT
CAAGGGTTTTAACGTTCCTATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTG
TTAGATGTGCTGCTGATATTGTTGGTCAATGTCCTGCTAAGTTGAAAGCTCCTGGT
GGTGGTTGTAACGATGCTTGTACTGTTTTTCAAACTTCTGAATATTGCTGTACTACT
GGTAAATGTGGTCCAACTGAATACTCTAGATTCTTCAAAAGATTGTGTCCTGATGC
TTTTTCTTACGTTTTGGATAAACCAACTACTGTTACTTGTCCTGGTTCTTCTAATTAT
AGAGTTACTTTCTGTCCAACTGCT
SEQ ID NO: 11 Amino acid sequence of fusion protein ThmII-2A
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSG
ESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSL
NQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDAC
TVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAG
ATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAA
ASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGG
LLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVG
QCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTT
VTCPGSSNYRVTFCPTA
SEQ ID NO: 12 DNA sequence encoding fusion protein ThmII-2A
ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTG
TCTTCTGCTGCTACTTTCGAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCT
GCTTCTAAGGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGA
ATCTTGGACTATTAACGTTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAA
CTGATTGTTACTTCGATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGT
GGTTTGTTGCAATGTAAAAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTC
TTTGAATCAATACGGTAAAGACTACATTGATATCTCTAATATCAAGGGTTTCAACG
TTCCAATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCT
GATATTGTTGGTCAATGTCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGA
TGCTTGTACTGTTTTTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCC
TACTGAATATTCTAGATTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTT
GGATAAACCTACTACTGTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTTTG
TCCTACTGCTGGAGCAACTAATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAAC
TGAACCCTGGACCGATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTA
TGTTTTTTCAACGTGTCTTCTGCTGCTACTTTCGAAATTGTTAATAGATGTTCTTAC
ACCGTTTGGGCTGCTGCTTCTAAGGGTGACGCTGCTTTGGATGCTGGTGGTAGACA
ATTGAATAGTGGTGAATCTTGGACTATTAACGTTGAACCTGGTACTAAGGGTGGTA
AAATTTGGGCTAGAACTGATTGTTACTTCGATGATTCTGGTAGAGGTATTTGTAGA
ACTGGTGACTGTGGTGGTTTGTTGCAATGTAAAAGATTTGGTAGACCACCAACTAC
TTTGGCTGAATTTTCTTTGAATCAATACGGTAAAGACTACATTGATATCTCTAATAT
CAAGGGTTTCAACGTTCCAATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTG
TTAGATGTGCTGCTGATATTGTTGGTCAATGTCCAGCTAAATTGAAAGCTCCAGGT
GGTGGTTGTAACGATGCTTGTACTGTTTTTCAAACTTCTGAATACTGTTGTACTACT
GGTAAATGTGGTCCTACTGAATATTCTAGATTTTTCAAAAGACTGTGCCCAGATGC
TTTTTCTTACGTTTTGGATAAACCTACTACTGTTACTTGTCCAGGTTCTTCTAACTA
TAGAGTTACTTTCTGTCCAACTGCT
SEQ ID NO: 13 Amino acid sequence of PpPDI
MQFNWDIKTVASILSALTLAQASDQEAIAPEDSHVVKLTEATFESFITSNPHVLAEFFAP
WCGHCKKLGPELVSAAEILKDNEQVKIAQIDCTEEKELCQGYEIKGYPTLKVFHGEVE
VPSDYQGQRQSQSIVSYMLKQSLPPVSEINATKDLDDTIAEAKEPVIVQVLPEDASNLE
SNTTFYGVAGTLREKFTFVSTKSTDYAKKYTSDSTPAYLLVRPGEEPSVYSGEELDET
HLVHWIDIESKPLFGDIDGSTFKSYAEANIPLAYYFYENEEQRAAAADIIKPFAKEQRG
KINFVGLDAVKFGKHAKNLNMDEEKLPLFVIHDLVSNKKFGVPQDQELTNKDVTELIE
KFIAGEAEPIVKSEPIPEIQEEKVFKLVGKAHDEVVFDESKDVLVKYYAPWCGHCKRM
APAYEELATLYANDEDASSKVVIAKLDHTLNDVDNVDIQGYPTLILYPAGDKSNPQLY
DGSRDLESLAEFVKERGTHKVDALALRPVEEEKEAEEEAESEADAHDEL
SEQ ID NO: 14 DNA sequence encoding PpPDI
ATGCAATTCAACTGGGATATTAAAACTGTGGCAAGTATTTTGTCCGCTCTCACACT
AGCACAAGCAAGTGATCAGGAGGCTATTGCTCCAGAGGACTCTCATGTCGTCAAA
TTGACTGAAGCCACTTTTGAGTCTTTCATCACCAGTAATCCTCACGTTTTGGCAGA
GTTTTTTGCCCCTTGGTGTGGTCACTGTAAGAAGTTGGGCCCTGAACTTGTTTCTGC
TGCCGAGATTTTAAAGGACAATGAGCAGGTTAAGATTGCTCAAATTGATTGTACG
GAGGAGAAGGAATTATGTCAAGGCTACGAAATTAAAGGGTATCCTACTTTGAAGG
TGTTCCATGGTGAGGTTGAGGTCCCAAGTGACTATCAAGGTCAAAGACAGAGCCA
AAGCATTGTCAGCTATATGCTAAAGCAGAGTTTACCCCCTGTCAGTGAAATCAATG
CAACCAAAGATTTAGACGACACAATCGCCGAGGCAAAAGAGCCCGTGATTGTGCA
AGTACTACCGGAAGATGCATCCAACTTGGAATCTAACACCACATTTTACGGAGTTG
CCGGTACTCTCAGAGAGAAATTCACTTTTGTCTCCACTAAGTCTACTGATTATGCC
AAAAAATACACTAGCGACTCGACTCCTGCCTATTTGCTTGTCAGACCTGGCGAGGA
ACCTAGTGTTTACTCTGGTGAGGAGTTAGATGAGACTCATTTGGTGCACTGGATTG
ATATTGAGTCCAAACCTCTATTTGGAGACATTGACGGATCTACCTTCAAATCATAC
GCTGAAGCTAACATCCCTTTAGCCTACTATTTCTATGAGAACGAAGAACAACGTGC
TGCTGCTGCCGATATTATTAAACCTTTTGCTAAAGAGCAACGTGGCAAAATTAACT
TTGTTGGCTTAGATGCCGTTAAATTCGGTAAGCATGCCAAGAACTTAAACATGGAT
GAAGAGAAACTCCCTCTATTTGTCATTCATGATTTGGTGAGCAACAAGAAGTTTGG
AGTTCCTCAAGACCAAGAATTGACGAACAAAGATGTGACCGAGCTGATTGAGAAA
TTCATCGCAGGAGAGGCAGAACCAATTGTGAAATCAGAGCCAATTCCAGAAATTC
AAGAAGAGAAAGTCTTCAAGCTAGTCGGAAAGGCCCACGATGAAGTTGTCTTCGA
TGAATCTAAAGATGTTCTAGTCAAGTACTACGCCCCTTGGTGTGGTCACTGTAAGA
GAATGGCTCCTGCTTATGAGGAATTGGCTACTCTTTACGCCAATGATGAGGATGCC
TCTTCAAAGGTTGTGATTGCAAAACTTGATCACACTTTGAACGATGTTGACAACGT
TGATATTCAAGGTTATCCTACTTTGATCCTTTATCCAGCTGGTGATAAATCCAATCC
TCAACTGTATGATGGATCTCGTGACCTAGAATCATTGGCTGAGTTTGTAAAGGAGA
GAGGAACCCACAAAGTGGATGCCCTAGCACTCAGACCAGTCGAGGAAGAAAAGG
AAGCTGAAGAAGAAGCTGAAAGTGAGGCAGACGCTCACGACGAGCTTTAA
SEQ ID NO: 15 Amino acid sequence of AtPDI1
MASSSTSISLLLFVSFILLLVNSRAENASSGSDLDEELAFLAAEESKEQSHGGGSYHEEE
HDHQHRDFENYDDLEQGGGEFHHGDHGYEEEPLPPVDEKDVAVLTKDNFTEFVGNN
SFAMVEFYAPWCGACQALTPEYAAAATELKGLAALAKIDATEEGDLAQKYEIQGFPT
VFLFVDGEMRKTYEGERTKDGIVTWLKKKASPSIHNITTKEEAERVLSAEPKLVFGFL
NSLVGSESEELAAASRLEDDLSFYQTASPDIAKLFEIETQVKRPALVLLKKEEEKLARF
DGNFTKTAIAEFVSANKVPLVINFTREGASLIFESSVKNQLILFAKANESEKHLPTLREV
AKSFKGKFVFVYVQMDNEDYGEAVSGFFGVTGAAPKVLVYTGNEDMRKFILDGELT
VNNIKTLAEDFLADKLKPFYKSDPLPENNDGDVKVIVGNNFDEIVLDESKDVLLEIYAP
WCGHCQSFEPIYNKLGKYLKGIDSLVVAKMDGTSNEHPRAKADGFPTILFFPGGNKSF
DPIAVDVDRTVVELYKFLKKHASIPFKLEKPATPEPVISTMKSDEKIEGDSSKDEL
SEQ ID NO: 16 DNA sequence encoding AtPDI1
ATGGCTTCTTCTTCTACTTCTATTTCTTTGTTGTTGTTCGTTTCTTTCATCTTGTTGTT
GGTTAATTCTAGAGCTGAAAACGCTTCTTCTGGTTCTGATTTGGATGAAGAATTGG
CTTTTCTTGCTGCTGAAGAATCTAAAGAACAATCTCATGGTGGTGGTTCTTATCAT
GAAGAAGAACATGATCATCAACATAGAGATTTTGAAAACTACGATGATTTGGAAC
AAGGTGGTGGTGAATTTCATCATGGTGACCATGGTTACGAAGAAGAACCATTGCC
ACCAGTTGATGAAAAAGATGTTGCTGTTTTGACTAAAGATAACTTCACTGAATTTG
TCGGTAATAACTCTTTCGCTATGGTTGAATTTTACGCTCCATGGTGTGGTGCTTGTC
AAGCTTTGACTCCTGAATATGCTGCTGCTGCTACTGAATTGAAAGGTTTGGCTGCT
TTGGCTAAGATTGATGCTACTGAAGAAGGTGACTTGGCTCAAAAGTATGAAATTC
AAGGTTTTCCTACTGTTTTCTTGTTTGTTGATGGTGAAATGAGAAAGACTTATGAA
GGTGAAAGAACTAAGGATGGTATTGTTACTTGGTTGAAAAAGAAAGCTTCTCCTTC
TATTCATAACATTACTACTAAGGAAGAGGCTGAAAGAGTTTTGTCTGCTGAACCAA
AGTTGGTTTTTGGTTTTCTTAACTCTTTGGTTGGTTCTGAATCTGAAGAATTGGCCG
CTGCTTCTAGATTGGAAGATGATTTGTCTTTTTACCAAACTGCTTCTCCTGATATTG
CTAAATTGTTCGAAATTGAAACCCAAGTTAAGCGTCCTGCTTTGGTTTTGTTGAAA
AAGGAAGAAGAAAAGTTGGCTAGATTTGATGGTAATTTTACTAAGACTGCTATCG
CTGAATTTGTTTCTGCTAATAAGGTTCCATTGGTTATTAATTTCACCAGAGAAGGT
GCTTCTTTGATTTTCGAATCTTCTGTTAAGAACCAATTGATTTTGTTCGCTAAAGCT
AATGAATCTGAAAAGCATTTGCCTACTTTGAGAGAAGTTGCTAAGTCTTTCAAAGG
TAAATTCGTTTTCGTTTACGTTCAAATGGATAATGAAGATTACGGTGAAGCTGTTT
CTGGTTTCTTTGGTGTTACTGGTGCTGCTCCAAAGGTTTTGGTTTATACTGGTAACG
AAGATATGAGAAAGTTCATTTTGGATGGTGAATTGACTGTTAACAATATTAAGACT
CTGGCTGAAGATTTTCTTGCTGATAAGTTGAAACCATTCTACAAGTCTGATCCATT
GCCTGAAAACAACGATGGTGACGTTAAGGTTATTGTTGGTAACAACTTCGATGAA
ATTGTTTTGGATGAATCTAAGGATGTTTTGTTGGAAATCTATGCTCCATGGTGCGG
TCATTGTCAATCTTTTGAACCAATCTATAACAAGTTGGGTAAATACTTGAAGGGTA
TTGATTCTTTGGTTGTTGCTAAAATGGATGGTACTTCTAACGAACATCCAAGAGCT
AAAGCTGATGGTTTTCCTACCATTTTGTTTTTCCCTGGTGGTAATAAGTCTTTCGAT
CCTATTGCTGTTGATGTTGATAGAACTGTTGTTGAATTGTATAAGTTCTTGAAGAA
GCATGCTTCTATTCCTTTCAAGTTGGAAAAGCCAGCTACTCCAGAACCTGTTATTT
CTACTATGAAGTCTGATGAAAAGATCGAAGGTGACTCTTCTAAGGATGAATTGTAA
SEQ ID NO: 17 Amino acid sequence of HAC1
MPVDSSHKTASPLPPRKRAKTEEEKEQRRVERILRNRRAAHASREKKRRHVEFLENHV
VDLESALQESAKATNKLKEIQDIIVSRLEALGGTVSDLDLTVPEVDFPKSSDLEPMSDL
STSSKSEKASTSTRRSLTEDLDEDDVAEYDDEEEDEELPRKMKVLNDKNKSTSIKQEK
LNELPSPLSSDFSDVDEEKSTLTHLKLQQQQQQPVDNYVSTPLSLPEDSVDFINPGNLKI
ESDENFLLSSNTLQIKHENDTDYITTAPSGSINDFFNSYDISESNRLHHPAVMTDSS
LHITAGSIGFFSLIGGGESSVAGRRSSVGTYQLTCIAIR
SEQ ID NO: 18 DNA sequence encoding HAC1
ATGCCCGTAGATTCTTCTCATAAGACAGCTAGCCCACTTCCACCTCGTAAAAGAGC
AAAGACGGAAGAAGAAAAGGAGCAGCGTCGAGTGGAACGTATCCTACGTAATAG
GAGAGCGGCCCATGCTTCCAGAGAGAAGAAACGAAGACACGTTGAATTTCTGGAA
AACCACGTCGTCGACCTGGAATCTGCACTTCAAGAATCAGCCAAAGCCACTAACA
AGTTGAAAGAAATACAAGATATCATTGTTTCAAGGTTGGAAGCCTTAGGTGGTAC
CGTCTCAGATTTGGATTTAACAGTTCCGGAAGTCGATTTTCCCAAATCTTCTGATTT
GGAACCCATGTCTGATCTCTCAACTTCTTCGAAATCGGAGAAAGCATCTACATCCA
CTCGCAGATCTTTGACTGAGGATCTGGACGAAGATGACGTCGCTGAATATGACGA
CGAAGAAGAGGACGAAGAGTTACCCAGGAAAATGAAAGTCTTAAACGACAAAAA
CAAGAGCACATCTATCAAGCAGGAGAAGTTGAATGAACTTCCATCTCCTTTGTCAT
CCGATTTTTCAGACGTAGATGAAGAAAAGTCAACTCTCACACATTTAAAGTTGCAA
CAGCAACAACAACAACCAGTAGACAATTATGTTTCTACTCCTTTGAGTCTTCCGGA
GGATTCAGTTGATTTTATTAACCCAGGTAACTTAAAAATAGAGTCCGATGAGAACT
TCTTGTTGAGTTCAAATACTTTACAAATAAAACACGAAAATGACACCGACTACATT
ACTACAGCTCCATCAGGTTCCATCAATGATTTTTTTAATTCTTATGACATTAGCGAG
TCGAATCGGTTGCATCATCCAGCAGTGATGACGGATTCATCTTTACACATTACAGC
AGGCTCCATCGGCTTTTTCTCTTTGATTGGGGGGGGGGAAAGTTCTGTAGCAGGGA
GGCGCAGTTCAGTTGGCACATATCAGTTGACATGCATAGCGATCAGG
SEQ ID NO: 19 Amino acid sequence of AtERO1
MGKGAIKEEESEKKRKTWRWPLATLVVVFLAVAVSSRTNSNVGFFFSDRNSCSCSLQ
KTGKYKGMIEDCCCDYETVDNLNTEVLNPLLQDLVTTPFFRYYKVKLWCDCPFWPD
DGMCRLRDCSVCECPENEFPEPFKKPFVPGLPSDDLKCQEGKPQGAVDRTIDNRAFRG
WVETKNPWTHDDDTDSGEMSYVNLQLNPERYTGYTGPSARRIWDSIYSENCPKYSSG
ETCPEKKVLYKLISGLHSSISMHIAADYLLDESRNQWGQNIELMYDRILRHPDRVRNM
YFTYLFVLRAVTKATAYLEQAEYDTGNHAEDLKTQSLIKQLLYSPKLQTACPVPFDEA
KLWQGQSGPELKQQIQKQFRNISALMDCVGCEKCRLWGKLQVQGLGTALKILFSVGN
QDIGDQTLQLQRNEVIALVNLLNRLSESVKMVHDMSPDVERLMEDQIAKVSAKPARL
RRIWDLAVSFW
SEQ ID NO: 20 DNA sequence encoding AtERO1
ATGGGTAAAGGTGCTATTAAGGAAGAAGAATCTGAAAAGAAGAGAAAAACTTGG
AGATGGCCTTTGGCTACTTTGGTTGTTGTTTTCTTGGCTGTTGCTGTTTCTTCTAGA
ACTAACTCTAACGTTGGTTTCTTTTTCTCTGATAGAAATTCTTGTTCCTGTTCTTTGC
AAAAAACTGGTAAATACAAGGGTATGATTGAAGATTGTTGTTGTGATTATGAGAC
TGTTGATAACTTGAATACTGAAGTTTTGAACCCTTTGTTGCAAGATTTGGTTACTAC
TCCATTTTTCAGATACTACAAAGTTAAGTTGTGGTGTGATTGTCCATTCTGGCCAG
ATGATGGTATGTGTAGATTGAGAGATTGTTCTGTTTGTGAATGTCCAGAAAACGAA
TTTCCTGAACCATTCAAAAAGCCTTTCGTTCCTGGTTTGCCATCTGATGATTTGAAA
TGTCAAGAAGGTAAACCACAAGGTGCTGTTGATAGAACTATTGATAACAGAGCTT
TTAGAGGTTGGGTTGAAACTAAAAACCCTTGGACTCATGATGATGATACTGATTCT
GGTGAAATGTCTTATGTTAATTTGCAATTGAACCCAGAAAGATACACTGGTTACAC
TGGTCCTTCTGCTAGAAGAATTTGGGATTCTATCTATTCTGAAAACTGTCCAAAGT
ACTCTTCTGGTGAAACTTGTCCAGAAAAGAAAGTTTTGTATAAGTTGATCTCCGGT
TTGCATTCTTCTATTTCTATGCATATTGCTGCTGATTATTTGTTGGATGAATCTAGA
AATCAGTGGGGTCAAAACATTGAATTGATGTATGATAGAATCCTGAGACATCCAG
ATAGAGTTAGAAATATGTATTTCACTTACCTGTTCGTTTTGAGAGCTGTTACTAAA
GCTACTGCTTATTTGGAACAAGCTGAATACGATACTGGTAACCATGCTGAAGATTT
GAAAACTCAATCTTTGATTAAGCAGTTGTTGTATTCTCCTAAATTGCAAACTGCTT GTCCAGTTCCTTTTGATGAAGCTAAGTTGTGGCAAGGTCAATCTGGTCCAGAATTG
AAACAACAAATTCAAAAACAGTTCAGAAACATCTCTGCTTTGATGGATTGTGTTGG
TTGTGAAAAGTGTAGATTGTGGGGTAAATTGCAAGTTCAAGGTTTGGGTACTGCTT
TGAAAATTTTGTTTTCTGTTGGTAACCAGGATATCGGTGACCAAACTTTGCAATTG
CAAAGAAACGAAGTTATTGCTTTGGTTAATTTGTTGAACAGATTGTCTGAATCTGT
TAAGATGGTTCATGATATGTCTCCAGATGTTGAAAGATTGATGGAAGATCAAATTG
CTAAAGTTTCTGCTAAACCTGCTAGATTGAGAAGAATTTGGGACTTGGCTGTTTCT TTCTGGTAA
SEQ ID NO: 21 Amino acid sequence of AtERO2
MAETDVGSVKGKEKGSGKRWILLIGAIAAVLLAVVVAVFLNTQNSSISEFTGKICNCR
QAEQQKYIGIVEDCCCDYETVNRLNTEVLNPLLQDLVKTPFYRYFKVKLWCDCPFWP
DDGMCRLRDCSVCECPESEFPEVFKKPLSQYNPVCQEGKPQATVDRTLDTRAFRGWT
VTDNPWTSDDETDNDEMTYVNLRLNPERYTGYIGPSARRIWEAIYSENCPKHTSEGSC
QEEKILYKLVSGLHSSISVHIASDYLLDEATNLWGQNLTLLYDRVLRYPDRVQNLYFT
FLFVLRAVTKAEDYLGEAEYETGNVIEDLKTKSLVKQVVSDPKTKAACPVPFDEAKL
WKGQRGPELKQQLEKQFRNISAIMDCVGCEKCRLWGKLQILGLGTALKILFTVNGED
NLRHNLELQRNEVIALMNLLHRLSESVKYVHDMSPAAERIAGGHASSGNSFWQRIVTSIAQSKAVSGKRS
SEQ ID NO: 22 DNA sequence encoding AtERO2
ATGGCTGAAACTGATGTTGGTTCTGTTAAGGGTAAAGAAAAGGGTTCTGGTAAAA
GATGGATTTTGTTGATTGGTGCTATTGCTGCTGTTTTGTTGGCTGTTGTTGTTGCTG
TTTTCTTGAACACTCAAAACTCTTCTATTTCTGAGTTTACTGGTAAAATCTGTAACT
GTAGACAAGCTGAACAACAAAAGTACATTGGTATTGTTGAAGATTGTTGTTGTGAT
TATGAGACTGTTAACAGATTGAACACTGAAGTTTTGAACCCATTGTTGCAAGATTT
GGTTAAGACTCCATTCTACAGATACTTTAAGGTTAAGTTGTGGTGTGATTGTCCTTT
CTGGCCAGATGATGGTATGTGTAGATTGAGAGATTGTTCTGTTTGTGAATGTCCAG
AATCTGAATTTCCTGAAGTTTTCAAGAAACCTTTGTCTCAATATAACCCAGTTTGTC
AAGAAGGTAAACCACAAGCTACTGTTGATAGAACTTTGGATACTAGAGCTTTCAG
AGGTTGGACTGTTACTGATAATCCTTGGACTTCTGATGATGAAACTGATAACGATG
AAATGACTTATGTTAACTTGAGATTGAACCCAGAAAGATACACTGGTTATATTGGT
CCATCTGCTAGAAGAATTTGGGAAGCTATCTATTCTGAAAATTGTCCAAAACATAC CTCTGAAGGTTCTTGTCAAGAAGAAAAGATTTTGTATAAGCTGGTTTCTGGTTTGC
ATTCTTCTATTTCCGTTCATATTGCTTCTGATTACTTGTTGGATGAAGCTACTAACT
TGTGGGGTCAAAACTTGACTTTGTTGTATGATAGAGTTTTGAGATACCCAGATAGA
GTTCAAAACTTGTACTTTACTTTCTTGTTCGTTTTGAGAGCTGTTACTAAAGCTGAA
GATTACTTGGGTGAAGCTGAATACGAAACTGGTAACGTTATTGAAGATTTGAAAA
CTAAATCTCTGGTCAAGCAAGTTGTTTCTGATCCAAAAACTAAGGCTGCTTGTCCA
GTTCCATTTGATGAAGCTAAGTTGTGGAAGGGTCAAAGAGGTCCAGAATTGAAGC
AACAATTGGAAAAGCAATTTCGTAACATTTCTGCTATTATGGATTGTGTTGGTTGT
GAAAAATGTAGATTGTGGGGTAAATTGCAAATTTTGGGTTTGGGTACTGCTTTGAA
AATTTTGTTTACTGTTAACGGTGAGGATAATTTGAGACATAACTTGGAATTGCAAA
GAAACGAAGTTATTGCTTTGATGAATTTGTTGCATAGATTGTCTGAATCTGTTAAA
TACGTTCATGATATGTCTCCTGCTGCTGAAAGAATTGCTGGTGGTCATGCTTCTTCT
GGTAATTCTTTTTGGCAAAGAATTGTTACTTCCATTGCTCAATCTAAAGCTGTTTCT GGTAAAAGATCCTAA
SEQ ID NO: 23 Amino acid sequence of AtERV1
MGEKPWQPLLQSFEKLSNCVQTHLSNFIGIKNTPPSSQSTIQNPIISLDSSPPIATNSSSLQ
KLPLKDKSTGPVTKEDLGRATWTFLHTLAAQYPEKPTRQQKKDVKELMTILSRMYPC
RECADHFKEILRSNPAQAGSQEEFSQWLCHVHNTVNRSLGKLVFPCERVDARWGKLE
CEQKSCDLHGTSMDF
SEQ ID NO: 24 DNA sequence encoding AtERV1
ATGGGTGAAAAACCATGGCAACCATTGTTGCAATCTTTCGAAAAGTTGTCTAATTG
TGTTCAAACTCATTTGTCTAACTTCATTGGTATTAAGAACACTCCACCATCTTCTCA
ATCTACTATTCAAAACCCTATTATCTCTTTGGATTCTTCTCCACCAATTGCTACTAA
TTCTTCTTCTTTGCAAAAGTTGCCTTTGAAGGATAAGTCTACTGGTCCAGTTACTAA
GGAAGATTTGGGTAGAGCTACTTGGACTTTTCTTCATACTTTGGCTGCTCAATACC
CTGAAAAACCTACTAGACAACAAAAGAAAGATGTTAAGGAATTGATGACTATCTT
GTCTAGAATGTATCCATGTAGAGAATGTGCTGATCATTTCAAAGAAATTTTGAGAT
CCAACCCTGCTCAAGCTGGTTCTCAAGAAGAATTTTCTCAATGGTTGTGTCATGTT
CATAACACTGTTAATAGATCCTTGGGTAAATTGGTTTTCCCTTGTGAAAGAGTTGA
TGCTAGATGGGGTAAATTGGAATGTGAACAAAAATCTTGTGACTTGCATGGTACTT CTATGGATTTTTAA
SEQ ID NO: 25 Amino acid sequence of PpERV2
MIKFNKRVATLTATLLSFIVLYTLFNSGARFANQLDQPVPLKTPELIIPNQSTKNDAPLP
FMPKMANETLKAELGNASWKLFHTILARYPESPSENQKSTLNDYIYLFAQVYPCGDCARHFNLLLQKYPPQLSSRQVAAVWGCHIHNQVNKRLEKPQYDCSNILEDYDCGCGSDEKEVDDTLNNETMEHLQSIKITEKENEQFGR
SEQ ID NO: 26 DNA sequence encoding PpERV2
ATGATAACATTCAACAAACGAATAGCAACATTAGCGGCAACGTTATTTTCATTCAT
TGTGCTTTATACTCTCTTTAACAGTGGTGCTCAATTTTCCAACCAACTAGATCAGCC
TGTTCCCCTCAAAACTCCAGAACTCATCATACCGAATCAGAGTACTGAGAATGATC
CCCCTCTTCCATTCATGCCAAAAATGGCTAACGAAACTTTGAAAGCAGAACTTGGA
AATGCTTCCTGGAAACTCTTTCACACTATTCTTGCTAGATATCCTGAATCCCCATCG
GAGAATCAAAAATCAACCTTAAATGACTACATTTATTTGTTTGCACAGGTTTATCC
ATGTGGAGACTGTGCAAGACATTTCAATTTATTGCTGCAGAAATACCCTCCACAAT
TGTCCTCAAGACAGGTGGCTGCAGTGTGGGGATGTCATATTCACAATCAGGTCAAT
AAGAGATTGGAGAAACCACAATACGACTGCTCCAATATTCTAGAGGATTACGATT
GTGGATGTGGCTCTGATGAAAAGGAAGTAGATGACACTCTGAATAACGAAACAAT
AGAACACTTGCAAAGTATCAAAATTACTGAAAAAGAGAGTGAACAATTTGGTCGA
SEQ ID NO: 27 Amino acid sequence of PpERO1
MRIVRSLAVTITCYCITALANPQIPFDGNYTEITVPDTEVNIGQIVDINHEIKPKLVELVN
TDFFKYYKLNLWKPCPFWNGDEGFCKYKDCSVDFITDWSQVPDIWQPDQLGKLGDN
TVHKDKGQDENELSSNDYCALDKDDDEDLVYVNLIDNPERFTGYGGQQSESIWTAVY
DENCFQPNEGSQLGQVEDLCLEKQIFYRLVSGLHSSISTHLTNEYLNLKNGEYEPNLKQ
FMIKVGYFTERIQNLHLNYVLVLKSLIKLQEYNVIENLPLDDSLKAGLSGLISQGAQNI
NQTDDYLFNEKVLFQNDQNDDLKNEFRDKFRNVTRLMDCVHCERCKLWGKLQTTG
YGTALKILFDLKNPNDSINLKRVELVALVNTFHRLSKSVESIENFEKLYKIQPPTQDHPSPSSESLDVFDNEDEQNFFDSFSVDQTVTSSKEPPEEIKSKPVGKAEYKKTNSCPSSGSKSIKEAFHEELYAFIDAIGFILNSYRTLPKLLYTLFLVKSSELWDIFIGTQRHRDSTYRVDL
SEQ ID NO: 28 DNA sequence encoding PpERO1
ATGAGGATAGTAAGGAGCGTAGCTATCGCAATAGCCTGTCATTGTATAACAGCGT
TAGCAAACCCTCAAATCCCTTTTGACGGCAACTACACCGAGATCATCGTGCCAGAT
ACCGAAGTTAACATCGGACAGATTGTAGATATTAACCACGAAATAAAACCCAAAC
TGGTGGAACTGGTCAACACAGACTTCTTCAAATATTACAAATTAAACCTATGGAA
ACCATGTCCGTTTTGGAATGGTGATGAGGGATTCTGCAAGTATAAGGATTGCTCTG
TTGACTTTATCACTGATTGGTCCCAGGTGCCTGATATCTGGCAACCAGACCAATTG
GGTAAGCTTGGAGATAACACGGTACATAAGGATAAGGGCCAAGATGAAAATGAG
CTGTCCTCAAATGATTATTGCGCTTTGGATAAAGACGACGATGAAGATTTAGTATA
TGTCAATTTGATTGATAACCCTGAAAGATTCACCGGTTATGGTGGTCAGCAATCTG
AATCTATTTGGACTGCGGTCTATGATGAGAACTGTTTCCAGCCGAATGAAGGATCA
CAATTGGGTCAAGTTGAAGACCTCTGTTTGGAGAAACAAATCTTTTACCGATTGGT
TTCTGGTTTGCATTCTAGTATCTCCACCCACCTCACAAACGAATATCTGAATTTGA
AAAATGGAGCATACGAACCAAATTTGAAACAGTTCATGATCAAAGTTGGGTATTT
TACTGAAAGAATCCAAAACTTACATCTCAATTATGTCCTTGTATTGAAGTCACTAA
TAAAGCTACAAGAATACAATGTTATCGACAATCTACCTCTCGATGACTCTTTGAAA
GCTGGTCTTAGCGGTTTAATATCTCAAGGAGCACAGGGTATTAACCAGAGTTCTGA
TGATTATCTATTTAACGAGAAGGTTCTTTTCCAAAATGACCAAAATGATGATTTGA
AAAATGAATTTCGTGACAAATTCCGCAACGTGACTAGATTAATGGATTGTGTCCAT
TGCGAGAGATGCAAATTATGGGGAAAATTGCAAACTACAGGGTACGGGACTGCAT
TGAAGATTCTATTTGATTTGAAGAATCCTAATGACTCCATCAATTTAAAGAGAGTT
GAGTTAGTTGCTCTAGTCAACACATTCCATAGATTGTCCAAATCTGTTGAAAGCAT
TGAAAACTTTGAAAAACTATATAAGATTCAACCGCCAACGCAGGATCGTGCATCA
GCGTCGTCCGAATCCTTAGGCCTTTTCGATAACGAAGATGAACAAAATCTCCTCAA
CTCGTTTTCGGTTGATCAGGCAGTCATTTCATCGAAAGAGGCACCAGAAGAAATC
AAAAGCAAACCTGTTGGAAAAGCCGCATATAAACAAAACAGTTGTCCATCATTGG
GTTCAAAATCTATCAAAGAAGCATTCCATGAAGAACTTCACGCATTTATTGATGCA
ATTGGATTTATATTGAACTCTTACAGGACTTTGCCCAAGCTGTTGTACACACTTTTC
CTCGTTAAATCATCTGAATTATGGGACATTTTCATTGGCACTCAAAGGCACCGAGA
TACCACATATAGAGTAGACTTGTAAGCGGCCGCCAGCTT
SEQ ID NO: 29 Amino acid sequence of PpKAR2
MLSLKPSWLTLAALMYAMLLVVVPFAKPVRADDVESYGTVIGIDLGTTYSCVGVMK
SGRVEILANDQGNRITPSYVSFTEDERLVGDAAKNLAASNPKNTIFDIKRLIGMKYDAP
EVQRDLKRLPYTVKSKNGQPVVSVEYKGEEKSFTPEEISAMVLGKMKLIAEDYLGKKVTHAVVTVPAYFNDAQRQATKDAGLIAGLTVLRIVNEPTAAALAYGLDKTGEERQIIV
YDLGGGTFDVSLLSIEGGAFEVLATAGDTHLGGEDFDYRVVRHFVKIFKKKHNIDISN
NDKALGKLKREVEKAKRTLSSQMTTRIEIDSFVDGIDFSEQLSRAKFEEINIELFKKTLK
PVEQVLKDAGVKKSEIDDIVLVGGSTRIPKVQQLLEDYFDGKKASKGINPDEAVAYGA
AVQAGVLSGEEGVDDIVLLDVNPLTLGIETTGGVMTTLINRNTAIPTKKSQIFSTAADN
QPTVLIQVYEGERALAKDNNLLGKFELTGIPPAPRGTPQVEVTFVLDANGILKVSATD
KGTGKSESITINNDRGRLSKEEVDRMVEEAEKYAAEDAALREKIEARNALENYAHSLR
NQVTDDSETGLGSKLDEDDKETLTDAIKDTLEFLEDNFDTATKEELDEQREKLSKIAY
PITSKLYGAPEGGTPPGGQGFDDDDGDFDYDYDYDHDEL
SEQ ID NO: 30 DNA sequence encoding PpKAR2
ATGCTGTCGTTAAAACCATCTTGGCTGACTTTGGCGGCATTAATGTATGCCATGCT
ATTGGTCGTAGTGCCATTTGCTAAACCTGTTAGAGCTGACGATGTCGAATCTTATG
GAACAGTGATTGGTATCGATTTGGGTACCACGTACTCTTGTGTCGGTGTGATGAAG
TCGGGTCGTGTAGAAATTCTTGCTAATGACCAAGGTAACAGAATCACTCCTTCCTA
CGTTAGTTTCACTGAAGACGAGAGACTGGTTGGTGATGCTGCTAAGAACTTAGCTG
CTTCTAACCCAAAAAACACCATCTTTGATATTAAGAGATTGATCGGTATGAAGTAT
GATGCCCCAGAGGTCCAAAGAGACTTGAAGCGTCTTCCTTACACTGTCAAGAGCA
AGAACGGCCAACCTGTCGTTTCTGTCGAGTACAAGGGTGAGGAGAAGTCTTTCAC
TCCTGAGGAGATTTCCGCCATGGTCTTGGGTAAGATGAAGTTGATCGCTGAGGACT
ACTTAGGAAAGAAAGTCACTCATGCTGTCGTTACCGTTCCAGCCTACTTCAACGAC
GCTCAACGTCAAGCCACTAAGGATGCCGGTCTGATCGCCGGTTTGACTGTTCTGAG
AATTGTGAACGAGCCTACCGCCGCTGCCCTTGCTTACGGTTTGGACAAGACTGGTG
AGGAAAGACAGATCATCGTCTACGACTTGGGTGGAGGAACCTTCGATGTTTCTCTG
CTTTCTATTGAGGGTGGTGCTTTCGAGGTTCTTGCTACCGCCGGTGACACCCACTT
GGGTGGTGAGGACTTTGACTACAGAGTTGTTCGCCACTTCGTTAAGATTTTCAAGA
AGAAGCATAACATTGACATCAGCAACAATGATAAGGCTTTAGGTAAGCTGAAGAG
AGAGGTCGAAAAGGCCAAGCGTACTTTGTCTTCCCAGATGACTACCAGAATTGAG
ATTGACTCTTTCGTTGACGGTATCGACTTCTCTGAGCAACTGTCTAGAGCTAAGTTT
GAGGAGATCAACATTGAATTATTCAAGAAGACACTGAAACCAGTTGAACAAGTCC
TCAAAGACGCTGGTGTCAAGAAATCTGAAATTGATGACATTGTCTTGGTTGGTGGT
TCTACCAGAATCCCAAAGGTTCAACAATTATTGGAGGATTACTTTGACGGAAAGA
AGGCTTCTAAGGGAATTAACCCAGATGAAGCTGTCGCATACGGTGCTGCTGTTCA
GGCTGGTGTTTTGTCTGGTGAGGAAGGTGTCGATGACATCGTCTTGCTTGATGTGA ACCCCCTAACTCTGGGTATCGAGACTACTGGTGGCGTTATGACTACCTTAATCAAC AGAAACACTGCTATCCCAACTAAGAAATCTCAAATTTTCTCCACTGCTGCTGACAA CCAGCCAACTGTGTTGATTCAAGTTTATGAGGGTGAGAGAGCCTTGGCTAAGGAC AACAACTTGCTTGGTAAATTCGAGCTGACTGGTATTCCACCAGCTCCAAGAGGTAC TCCTCAAGTTGAGGTTACTTTTGTTTTAGACGCTAACGGAATTTTGAAGGTTTCTGC CACCGATAAGGGAACTGGAAAATCCGAGTCCATCACCATCAACAATGATCGTGGT AGATTGTCCAAGGAGGAGGTTGACCGTATGGTTGAAGAGGCCGAGAAGTACGCCG CTGAGGATGCTGCACTAAGAGAAAAGATTGAGGCTAGAAACGCTCTGGAGAACTA CGCTCATTCCCTTAGGAACCAAGTTACTGATGACTCTGAAACCGGGCTTGGTTCTA AATTGGACGAGGACGACAAAGAGACATTGACAGATGCCATCAAAGATACCCTAG AGTTCTTGGAAGACAACTTCGACACCGCAACCAAGGAAGAATTAGACGAACAAAG AGAAAAGCTTTCCAAGATTGCTTACCCAATCACTTCTAAGCTATACGGTGCTCCAG AGGGTGGTACTCCACCTGGTGGTCAAGGTTTTGACGATGATGATGGAGACTTTGAC TACGACTATGACTATGATCATGATGAGTTGTAA
SEQ ID NO: 31 Amino acid sequence of PpSEC1
MDLVKVGQSYVDKIVTDTGIKVLLLDDITSSIISLVSTQSELLNHQVYLIDKLENENRDTIKQLDCVCFLSVSEKTINLLVEELGAPKYKSYKLYFNNVVPNSFLERLAERDDLEMVDKVMELFLDYDILNKNLFSFKQLNIFNSIDAWNQQQFLLTLASLKSLCFSLQTNPIIRYESNSRMCSKLASDLSYEFGQSSKIMEKFPVNDIPPVLLILDRKNDPITPLLNPWTYQSMVHELLGIFNNTVDLTGTPSDLPPDLIKLVLNPSQDPFYAQSLYLNFGDLSDSIKTYVNEYKEKTVKHNSNELTDLNDMKHFLESFPEFKKLSNNISKHMGLITELDRKINENHLWQVSELEQSIAVNDNHNADLQELEKLLTSQEFKIANNLKVKLVCLYAIRYELHPNNQLPKMLSILLQQGVPEFEINTVNRMLKYSGSTKRLNDDSESSIFNQATNNLLQGFKQSHENDNIYMQHIPRLERVISKLVKNKLPTAHYPTLINDFLKKQRPVSDLNGARLQDIIIFFVGGVTYEEARIINNFNLVNKSTRIVIGGTTVHNTNSFMTQVLELE
SEQ ID NO: 32 DNA sequence encoding PpSEC1
ATGGACTTGGTTAAGGTTGGACAATCCTACGTGGATAAAATTGTCACAGACACAG GCATTAAGGTTCTTTTATTGGATGATATCACTTCTTCCATAATTTCCCTAGTGAGCA CCCAATCAGAATTGTTGAACCATCAGGTGTATTTGATCGACAAGTTGGAGAACGA GAATAGAGATACGATAAAGCAATTGGATTGTGTGTGTTTCCTATCAGTATCAGAA AAAACTATAAACTTGCTTGTTGAGGAATTAGGTGCTCCCAAATACAAATCCTACAA GCTCTACTTCAATAATGTAGTTCCCAACTCATTCTTAGAGAGGTTGGCGGAGAGGG
ACGATTTGGAAATGGTCGATAAGGTCATGGAATTGTTCCTAGATTACGACATTTTG
AACAAGAACTTGTTTTCCTTCAAACAACTGAATATTTTCAATTCAATTGATGCTTG
GAATCAGCAACAGTTTCTCTTGACTTTAGCAAGCTTGAAATCACTCTGCTTCTCCTT
GCAAACGAATCCTATAATCAGGTATGAATCTAATAGTCGAATGTGTTCTAAGCTAG
CTTCCGATTTGTCATACGAATTTGGGCAAAGTTCTAAAATTATGGAAAAGTTCCCG
GTGAATGATATCCCTCCTGTCCTGTTAATTCTTGACCGAAAAAACGACCCAATCAC
TCCATTATTAAATCCTTGGACTTATCAATCTATGGTACACGAGCTTTTAGGAATTTT
CAATAATACGGTGGATTTAACGGGAACTCCTTCTGATCTGCCCCCAGACCTAATCA
AACTGGTATTGAATCCCTCTCAAGATCCATTTTATGCTCAGTCTCTATATTTGAATT
TCGGAGACTTGTCCGATAGTATAAAAACATACGTAAACGAGTACAAAGAAAAAAC
CGTCAAACACAATTCTAATGAATTGACAGATTTGAATGATATGAAACACTTTCTGG
AATCTTTTCCAGAGTTCAAAAAACTTTCAAACAACATTTCCAAACACATGGGCTTG
ATTACAGAATTAGATAGAAAAATCAACGAAAATCACTTATGGCAAGTGAGTGAAT
TGGAACAATCCATAGCTGTTAATGACAATCATAATGCTGACCTTCAAGAACTAGA
AAAGCTGTTGACATCTCAAGAGTTCAAGATTGCCAACAACTTAAAAGTTAAATTA GTATGTTTGTATGCCATACGATATGAACTTCATCCCAACAACCAGCTTCCAAAAAT
GTTGTCAATACTTTTACAGCAGGGGGTGCCAGAGTTTGAAATAAATACAGTCAAC
AGGATGTTGAAATACTCGGGAAGTACCAAACGATTGAATGATGACTCTGAATCTT CGATATTTAACCAGGCAACAAATAATCTACTGCAGGGGTTCAAACAAAGTCATGA
AAACGACAATATTTATATGCAGCATATTCCAAGGTTGGAAAGAGTTATCAGCAAG
TTAGTGAAAAATAAGCTACCCACAGCGCATTATCCGACTTTAATCAATGATTTTTT
GAAGAAGCAACGCCCTGTTTCTGATCTAAATGGAGCCAGGCTGCAAGATATTATT ATTTTCTTTGTTGGTGGAGTCACTTATGAAGAGGCCCGAATAATTAACAATTTCAA
TCTGGTGAACAAGTCTACGAGGATAGTTATAGGGGGAACTACAGTACACAACACG AATAGTTTTATGACTCAAGTTCTAGAATTGGAGTAA
SEQ ID NO: 33 Amino acid sequence of PpSLY1
MSFTTSLPSLRDRQIATLEKMLHLNEPIVDNGSDIQAELTWKVLILDSRSTAIVSSVLRVNDLLSSGITMHSNIRSKRAALPDVPVIYFVEPNAENINFIIDDLERDQYAHFYINFTSSLNRDLLEEFAKKVATIGKSYKIKQVYDQYLDYIVTEPNLFSLDLVNIYSQLNNPNSLEDEINKVADKISNGIFAAILTMNGIPTIRCCRGGPAELIASKLDQKLRDHVINTKSSASFTNSKLVLILLDRNIDLASMFAHSWIYQCMVSDVFELKRNTIKIPSQKPNESTKEYDIDPKDFFWAANNSLPFPDAVENVENELSRYKADAAELTRKTGVSSLQDIDPNAITDTTDIQLAVKSLPELAFRKSILDMHMKVLASLLQELESKSLDSYFEIEQNYKDPKNQKQFISILNNGNE
HTLNDKLRTYIMLYLLTDLPGSFVEECEEYFKKNSAELGSLSYIKRAKEVIKLSNYELS
MSIDASHSTTSGLVNEAQKSALFQGLSSKLYGLTDGGSRLTEGVGSLITGLKNLLPDK
KQLPITNIVESIMEPSLATQESIKLTDDYLYFDPISTRGVHSKPPKRQQYNNSIVFVVGG
GNYLEYQNLQEWVTKTNTSNVNGTKSVIYGSTSIVTANEFLKECSLLGAEAK
SEQ ID NO: 34 DNA sequence encoding PpSLY1
ATGCTTCATTTGAATGAGCCCATTGTGGATAATGGTTCAGATATACAAGCGGAGTT
AACATGGAAGGTACTGATTCTGGATAGTAGGAGTACTGCAATTGTTTCTTCTGTTC
TGCGAGTTAATGACCTGCTTTCTTCTGGCATCACTATGCATAGCAATATCAGATCC
AAGAGAGCGGCTTTGCCAGATGTTCCTGTCATTTACTTTGTTGAACCTAATGCGGA
AAATATCAACTTTATCATTGATGACTTGGAAAGAGATCAGTACGCTCATTTTTATA
TCAACTTCACTTCCAGTCTAAATAGGGACCTTTTGGAGGAGTTTGCTAAGAAAGTG
GCTACGATTGGTAAGTCCTACAAGATTAAACAGGTTTATGATCAGTACCTCGATTA
CATTGTCACTGAACCCAACCTGTTCTCTTTGGACTTGGTTAACATTTACTCGCAGCT
AAATAACCCTAACTCACTGGAAGATGAAATCAATAAAGTTGCTGACAAGATTTCC
AATGGTATATTCGCAGCAATCCTAACTATGAATGGTATCCCTACTATTAGATGTTG
CAGAGGAGGTCCAGCAGAACTAATAGCGTCCAAACTAGATCAGAAGCTACGTGAT
CATGTTATCAATACAAAGTCATCTGCCTCTTTCACTAACAGTAAATTAGTGCTTAT
CCTGCTGGATAGAAACATTGATTTGGCTTCCATGTTTGCTCATTCATGGATTTATCA
ATGTATGGTGAGTGATGTTTTTGAGTTGAAAAGAAATACAATCAAAATTCCCTCTC
AAAAGCCCAATGAATCTACGAAAGAATATGATATCGACCCAAAGGATTTTTTTTG
GGCAGCCAACAACAGTTTGCCCTTCCCTGATGCTGTAGAAAATGTGGAGAACGAA
CTTTCTAGATACAAAGCGGATGCTGCAGAGCTAACTAGAAAGACTGGGGTTTCTTC
TCTTCAAGATATTGATCCCAATGCAATTACTGACACCACAGATATACAGCTTGCTG
TGAAGTCTTTACCTGAATTGGCTTTTAGAAAAAGCATCCTTGATATGCACATGAAA
GTACTTGCGTCTTTGCTGCAAGAACTGGAATCAAAGTCATTGGATTCATACTTTGA
AATTGAACAAAACTACAAAGATCCCAAAAACCAGAAGCAGTTTATCAGTATCCTC
AACAACGGGAATGAGCATACCTTGAACGACAAACTGAGAACCTACATCATGTTGT
ATCTGTTAACAGACCTCCCAGGGTCGTTCGTTGAAGAATGTGAAGAGTATTTCAAA
AAGAACTCCGCTGAGCTTGGTTCGTTGAGTTATATCAAGCGGGCAAAAGAGGTGA
TCAAGTTGTCTAATTATGAGTTGTCCATGTCAATTGATGCTAGCCACTCGACCACT
AGTGGATTGGTGAATGAAGCTCAAAAGTCTGCTTTGTTCCAAGGATTGTCGTCCAA
GCTATATGGATTAACAGATGGTGGTAGTAGGCTTACAGAGGGGGTGGGGTCATTA ATTACTGGGTTGAAAAACTTGCTACCCGACAAGAAACAACTGCCTATTACCAATAT
TGTTGAATCGATAATGGAACCAAGTCTGGCCACTCAAGAGTCGATAAAACTAACG
GACGATTACCTATATTTTGACCCTATTAGCACAAGAGGAGTTCACTCCAAACCACC
CAAAAGACAGCAATACAACAATTCTATTGTGTTTGTTGTAGGAGGGGGCAACTAT
TTGGAGTACCAAAATTTGCAAGAATGGGTTACGAAGACCAATACTAGCAACGTCA
ATGGCACTAAGTCTGTAATCTACGGTAGTACCAGTATCGTGACCGCGAACGAGTTC
TTGAAGGAGTGCTCCTTGCTCGGTGCCGAAGCAAAATAA
SEQ ID NO: 35 Amino acid sequence of PpGPX1
MSSFYDLAPLDKKGEPFPFEQLKGKVVLIVNVASKCGFTPQYTELEKLYKDHKDEGLTIVGFPCNQFGHQEPGNDEEIGQFCQLNFGVTFPILKKIDVNGSEADPVYEFLKSKKSGLLGFKGIKWNFEKFLIDKQGNVIERYSSLTKPSSIESKIEELLKK
SEQ ID NO: 36 DNA sequence encoding PpGPX1
ATGTCTTCATTTTATGATCTGGCCCCATTAGATAAGAAAGGCGAACCTTTTCCTTTC
GAACAATTAAAAGGCAAAGTGGTGTTGATTGTGAATGTTGCTTCTAAGTGTGGGTT
TACTCCACAATATACCGAGTTGGAAAAGCTCTACAAAGACCACAAGGACGAGGGA
TTGACTATTGTCGGATTTCCCTGTAACCAGTTTGGTCATCAGGAACCAGGAAATGA
TGAAGAAATTGGACAGTTTTGCCAGTTGAATTTTGGTGTAACTTTCCCAATTCTAA
AAAAGATTGATGTCAACGGTTCGGAAGCTGATCCTGTTTACGAATTTCTCAAGTCA
AAAAAGTCTGGTCTGCTCGGATTCAAAGGTATTAAGTGGAACTTTGAAAAATTCTT
GATCGATAAGCAAGGAAACGTTATTGAGAGATATTCGTCCTTGACTAAGCCCTCAT
CGATCGAGTCCAAGATTGAAGAACTATTAAAGAAATAA
SEQ ID NO: 37 Amino acid sequence of alpha mating factor signal peptide
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNST
NNGLLFINTTIASIAAKEEGVSLEKR
SEQ ID NO: 38 DNA sequence encoding alpha mating factor signal peptide
ATGAGATTTCCTTCAATTTTTACTGCTGTTTTATTCGCAGCATCCTCCGCATTAGCT
GCTCCAGTCAACACTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTG
TCATCGGTTACTCAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGCCATTTTCCA ACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATTGCCAGCATTGCTGCT
AAAGAAGAAGGGGTATCTCTCGAGAAAAGA
SEQ ID NO: 39 Amino acid sequence of spacer 1 (KR)
KR
SEQ ID NO: 40 DNA sequence encoding spacer 1
AAGCGA
SEQ ID NO: 41 Amino acid sequence of spacer 2 (KREA)
KREA
SEQ ID NO: 42 DNA sequence encoding spacer 2
AAGCGAGAAGCC
SEQ ID NO: 43 Amino acid sequence of spacer 3 (KREAEAEAEAEA; also referred to herein as KR(EA)5 linker (SEQ ID NO: 43))
KREAEAEAEAEA
SEQ ID NO: 44 DNA sequence encoding spacer 3
AAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCG
SEQ ID NO: 45 Amino acid sequence of spacer 4 (KREAEA, also referred to herein as KR(EA)2)
KREAEA
SEQ ID NO: 46 DNA sequence encoding spacer 4
AAGCGAGAAGCCGAAGCA
SEQ ID NO: 47 Amino acid sequence of spacer 5 (KREAEAEA, also referred to herein as KR(EA)3)
KREAEAEA
SEQ ID NO: 48 DNA sequence encoding spacer 5
AAGCGAGAAGCAGAAGCAGAAGCG
SEQ ID NO: 49 Amino acid sequence of spacer 6 (KREAEAEAEA, also referred to herein as KR(EA)4)
KREAEAEAEA
SEQ ID NO: 50 DNA sequence encoding spacer 6
AAGCGAGAAGCAGAAGCAGAAGCAGAAGCG
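The KR(EA)n shorthand used for spacers 3 to 6 expands to Lys-Arg followed by n Glu-Ala repeats, so KR(EA)2 is KREAEA. The short Python sketch below is an illustrative consistency check and not part of the listing: it translates each spacer DNA entry with the standard genetic code and compares the result with the expanded shorthand. The names `kr_ea`, `translate`, `CODON_TABLE`, and `SPACERS` are hypothetical, and the sketch assumes each DNA entry is read in frame from its first base and carries no stop codon.

```python
# Hypothetical helpers for checking the spacer entries; not part of the listing.
from itertools import product

# Standard genetic code. Codons are enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ..., GGG); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def kr_ea(n: int) -> str:
    """Expand the KR(EA)n shorthand: Lys-Arg followed by n Glu-Ala repeats."""
    return "KR" + "EA" * n


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    seq = "".join(dna.split()).upper()  # drop any wrapping whitespace
    if len(seq) % 3 != 0:
        raise ValueError("DNA length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# (label, n, DNA) for the spacers that the listing names as KR(EA)n.
SPACERS = [
    ("spacer 3, SEQ ID NOs: 43/44", 5, "AAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCG"),
    ("spacer 4, SEQ ID NOs: 45/46", 2, "AAGCGAGAAGCCGAAGCA"),
    ("spacer 5, SEQ ID NOs: 47/48", 3, "AAGCGAGAAGCAGAAGCAGAAGCG"),
    ("spacer 6, SEQ ID NOs: 49/50", 4, "AAGCGAGAAGCAGAAGCAGAAGCAGAAGCG"),
]

for label, n, dna in SPACERS:
    peptide = translate(dna)
    print(f"{label}: {peptide} matches KR(EA){n}: {peptide == kr_ea(n)}")
```

For the four spacers listed, every comparison should report True. The same two helpers also cover spacers 1 and 2 (KR and KREA), which correspond to n = 0 and n = 1.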
SEQ ID NO: 51 Amino acid sequence of ThmI-linker-2Ax2, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43).
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSL
NQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTA
SEQ ID NO: 52 DNA sequence encoding ThmI-linker-2Ax2, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43).
ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTG TCTTCTGCTGCTACTTTTGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCT GCTTCTAAAGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGA ATCTTGGACTATTAATGTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAA CTGATTGTTACTTCGATGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTG GTTTGTTGAGATGTAAAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCT TTGAACCAATACGGTAAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGT
TCCTATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTG ATATTGTTGGTCAATGTCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGAT GCTTGTACTGTTTTTCAAACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCA ACTGAATACTCTAGATTCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTG GATAAACCAACTACTGTTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGT CCAACTGCTAAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACT AATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAG GCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTC TGCTGCTACTTTTGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTC TAAAGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTT GGACTATTAATGTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAACTGA TTGTTACTTCGATGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTGGTTT GTTGAGATGTAAAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCTTTGA ACCAATACGGTAAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGTTCCT ATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATAT TGTTGGTCAATGTCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGATGCTT
GTACTGTTTTTCAAACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCAACTG AATACTCTAGATTCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTGGATA AACCAACTACTGTTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGTCCAA CTGCT
SEQ ID NO: 53 Amino acid sequence of ThmI-linker-2Ax8, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43).
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCND
ACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTA
SEQ ID NO: 54 DNA sequence encoding ThmI-linker-2Ax8, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43).
ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTG TCTTCTGCTGCTACTTTTGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCT GCTTCTAAAGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGA ATCTTGGACTATTAATGTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAA CTGATTGTTACTTCGATGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTG GTTTGTTGAGATGTAAAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCT TTGAACCAATACGGTAAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGT TCCTATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTG ATATTGTTGGTCAATGTCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGAT GCTTGTACTGTTTTTCAAACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCA
ACTGAATACTCTAGATTCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTG
GATAAACCAACTACTGTTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGT
CCAACTGCTAAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACT
AATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAG
GCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTC
TGCTGCTACTTTTGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTC
TAAAGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTT
GGACTATTAATGTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAACTGA
TTGTTACTTCGATGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTGGTTT
GTTGAGATGTAAAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCTTTGA
ACCAATACGGTAAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGTTCCT
ATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATAT
TGTTGGTCAATGTCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGATGCTT
GTACTGTTTTTCAAACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCAACTG
AATACTCTAGATTCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTGGATA
AACCAACTACTGTTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGTCCAA
CTGCTAAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATT
TCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCA
GGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCT
GCTACTTTTGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTCTAAA
GGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGAC
TATTAATGTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAACTGATTGTT
ACTTCGATGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTGGTTTGTTGA
GATGTAAAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCTTTGAACCAA
TACGGTAAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGTTCCTATGGA
TTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGG
TCAATGTCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGATGCTTGTACTG
TTTTTCAAACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCAACTGAATACT
CTAGATTCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTGGATAAACCAA
CTACTGTTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGTCCAACTGCTA
AGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTC
TCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTG
GTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACT TTTGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTCTAAAGGTGA
CGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTA
ATGTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTC
GATGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTGGTTTGTTGAGATG
TAAAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCTTTGAACCAATACG
GTAAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGTTCCTATGGATTTC
TCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCA
ATGTCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGATGCTTGTACTGTTT
TTCAAACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCAACTGAATACTCT
AGATTCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTGGATAAACCAACT
ACTGTTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGTCCAACTGCTAAG
CGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTCTC
CTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTGGT
TCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACTTT
TGAAATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTCTAAAGGTGACG
CTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTAAT
GTTGAACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTCGA
TGATTCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTGGTTTGTTGAGATGTA
AAAGATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCTTTGAACCAATACGGT
AAAGATTACATTGATATCTCTAACATCAAGGGTTTTAACGTTCCTATGGATTTCTCT
CCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCAATG
TCCTGCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGATGCTTGTACTGTTTTTCA
AACTTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCAACTGAATACTCTAGAT
TCTTCAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTGGATAAACCAACTACTG
TTACTTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGTCCAACTGCTAAGCGAG
AAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTCTCCTGA
AGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTGGTTCTCT
TGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACTTTTGAA
ATCGTTAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTCTAAAGGTGACGCTGC
TTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTAATGTTG
AACCAGGTACTAATGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTCGATGAT
TCTGGTTCTGGTATTTGTAAAACTGGTGACTGTGGTGGTTTGTTGAGATGTAAAAG
ATTCGGTAGACCACCTACTACTTTGGCTGAATTTTCTTTGAACCAATACGGTAAAG
ATTACATTGATATCTCTAACATCAAGGGTTTTAACGTTCCTATGGATTTCTCTCCTA CTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCAATGTCCT
GCTAAGTTGAAAGCTCCTGGTGGTGGTTGTAACGATGCTTGTACTGTTTTTCAAAC
TTCTGAATATTGCTGTACTACTGGTAAATGTGGTCCAACTGAATACTCTAGATTCTT
CAAAAGATTGTGTCCTGATGCTTTTTCTTACGTTTTGGATAAACCAACTACTGTTAC
TTGTCCTGGTTCTTCTAATTATAGAGTTACTTTCTGTCCAACTGCTAAGCGAGAAGC
CGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTCTCCTGAAGTTG
GCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTGGTTCTCTTGGAT
TGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACTTTTGAAATCGT
TAACAGATGTTCTTACACTGTTTGGGCTGCTGCTTCTAAAGGTGACGCTGCTTTGG
ATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTAATGTTGAACCA
GGTACTAATGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTCGATGATTCTGG
TTCTGGTATTTGTAAAACTGGTGACTGTGGTGGTTTGTTGAGATGTAAAAGATTCG
GTAGACCACCTACTACTTTGGCTGAATTTTCTTTGAACCAATACGGTAAAGATTAC
ATTGATATCTCTAACATCAAGGGTTTTAACGTTCCTATGGATTTCTCTCCTACTACT
AGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCAATGTCCTGCTAA
GTTGAAAGCTCCTGGTGGTGGTTGTAACGATGCTTGTACTGTTTTTCAAACTTCTG
AATATTGCTGTACTACTGGTAAATGTGGTCCAACTGAATACTCTAGATTCTTCAAA
AGATTGTGTCCTGATGCTTTTTCTTACGTTTTGGATAAACCAACTACTGTTACTTGT
CCTGGTTCTTCTAATTATAGAGTTACTTTCTGTCCAACTGCTAAGCGAGAAGCCGA
AGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTCTCCTGAAGTTGGCC
GGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTGGTTCTCTTGGATTGT
GGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACTTTTGAAATCGTTAA
CAGATGTTCTTACACTGTTTGGGCTGCTGCTTCTAAAGGTGACGCTGCTTTGGATG
CTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTAATGTTGAACCAGGT
ACTAATGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTCGATGATTCTGGTTC
TGGTATTTGTAAAACTGGTGACTGTGGTGGTTTGTTGAGATGTAAAAGATTCGGTA
GACCACCTACTACTTTGGCTGAATTTTCTTTGAACCAATACGGTAAAGATTACATT
GATATCTCTAACATCAAGGGTTTTAACGTTCCTATGGATTTCTCTCCTACTACTAGA
GGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCAATGTCCTGCTAAGTT
GAAAGCTCCTGGTGGTGGTTGTAACGATGCTTGTACTGTTTTTCAAACTTCTGAAT
ATTGCTGTACTACTGGTAAATGTGGTCCAACTGAATACTCTAGATTCTTCAAAAGA
TTGTGTCCTGATGCTTTTTCTTACGTTTTGGATAAACCAACTACTGTTACTTGTCCT
GGTTCTTCTAATTATAGAGTTACTTTCTGTCCAACTGCT
SEQ ID NO: 55 Amino acid sequence of ThmII-linker-2Ax2, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43).
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGPMRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRG
CRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTA
SEQ ID NO: 56 DNA sequence encoding ThmII-linker-2Ax2, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43).
ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTG TCTTCTGCTGCTACTTTCGAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCT GCTTCTAAGGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGA ATCTTGGACTATTAACGTTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAA CTGATTGTTACTTCGATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGT GGTTTGTTGCAATGTAAAAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTC TTTGAATCAATACGGTAAAGACTACATTGATATCTCTAATATCAAGGGTTTCAACG
TTCCAATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCT GATATTGTTGGTCAATGTCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGA TGCTTGTACTGTTTTTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCC TACTGAATATTCTAGATTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTT GGATAAACCTACTACTGTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTTTG TCCTACTGCTAAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAAC TAATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGA
GGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTT CTGCTGCTACTTTCGAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCTGCTT CTAAGGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCT TGGACTATTAACGTTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTG ATTGTTACTTCGATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGT TTGTTGCAATGTAAAAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTT GAATCAATACGGTAAAGACTACATTGATATCTCTAATATCAAGGGTTTCAACGTTC CAATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGAT ATTGTTGGTCAATGTCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGATGC TTGTACTGTTTTTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTAC TGAATATTCTAGATTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTTGGA TAAACCTACTACTGTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTCTGTCC AACTGCT
SEQ ID NO: 57 Amino acid sequence of ThmII-linker-2Ax8, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43).
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGP
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGP
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGP
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGP
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGP
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGP
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTAKREAEAEAEAEAGATNFSLLKLAGDVELNPGP
MRQVWFSWIVGLFLCFFNVSSAATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTKGGKIWARTDCYFDDSGRGICRTGDCGGLLQCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTA
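The sequence above is a tandem of identical ThmII units, each joined to the next by the KR(EA)5 linker followed by a 2A-type segment. As a minimal sketch, assuming the junction between units is exactly the 32-residue stretch `KREAEAEAEAEAGATNFSLLKLAGDVELNPGP` that recurs in SEQ ID NO: 57, the following Python splits such a concatemer at that junction. The names `LINKER`, `SEGMENT_2A`, `JUNCTION` and `split_units` are illustrative and do not come from the source.

```python
# Hypothetical helper: split a ThmII-linker-2A concatemer into its units.
# Assumes the junction between units is exactly the KR(EA)5 linker plus the
# 2A-type segment as it appears in SEQ ID NO: 57.

LINKER = "KR" + "EA" * 5             # KR(EA)5 -> "KREAEAEAEAEA"
SEGMENT_2A = "GATNFSLLKLAGDVELNPGP"  # residues between the linker and the next unit
JUNCTION = LINKER + SEGMENT_2A


def split_units(polyprotein: str) -> list[str]:
    """Return the units separated by the linker-2A junction."""
    # Drop whitespace left over from line wrapping before splitting.
    seq = "".join(polyprotein.split())
    return seq.split(JUNCTION)


if __name__ == "__main__":
    # Toy concatemer with placeholder units "AAA", "BBB", "CCC".
    toy = "AAA" + JUNCTION + "BBB" + JUNCTION + "CCC"
    print(split_units(toy))  # ['AAA', 'BBB', 'CCC']
```

Applied to the full SEQ ID NO: 57 sequence, the split should give eight identical units, consistent with the "x8" in the construct name; the final unit carries no trailing linker.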
SEQ ID NO: 58 DNA sequence encoding ThmII-linker-2Ax8, wherein the linker sequence comprises a KR(EA)5 linker (SEQ ID NO: 43).
ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTG
TCTTCTGCTGCTACTTTCGAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCT
GCTTCTAAGGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGA
ATCTTGGACTATTAACGTTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAA
CTGATTGTTACTTCGATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGT
GGTTTGTTGCAATGTAAAAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTC
TTTGAATCAATACGGTAAAGACTACATTGATATCTCTAATATCAAGGGTTTCAACG
TTCCAATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCT
GATATTGTTGGTCAATGTCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGA
TGCTTGTACTGTTTTTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCC
TACTGAATATTCTAGATTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTT
GGATAAACCTACTACTGTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTTTG
TCCTACTGCTAAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAAC
TAATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGA
GGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTT
CTGCTGCTACTTTCGAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCTGCTT
CTAAGGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCT
TGGACTATTAACGTTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTG
ATTGTTACTTCGATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGT
TTGTTGCAATGTAAAAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTT
GAATCAATACGGTAAAGACTACATTGATATCTCTAATATCAAGGGTTTCAACGTTC
CAATGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGAT
ATTGTTGGTCAATGTCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGATGC
TTGTACTGTTTTTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTAC
TGAATATTCTAGATTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTTGGA
TAAACCTACTACTGTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTCTGTCC
AACTGCTAAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAA
TTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGC
AGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTG
CTGCTACTTTCGAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCTGCTTCTA
AGGGTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTG
GACTATTAACGTTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTGATT
GTTACTTCGATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGTTTG
TTGCAATGTAAAAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTTGAA
TCAATACGGTAAAGACTACATTGATATCTCTAATATCAAGGGTTTCAACGTTCCAA
TGGATTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATT
GTTGGTCAATGTCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGATGCTTG
TACTGTTTTTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTACTGA
ATATTCTAGATTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTTGGATAA
ACCTACTACTGTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTCTGTCCAAC
TGCTAAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTT
CAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAG
GTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTG
CTACTTTCGAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCTGCTTCTAAGG
GTGACGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACT
ATTAACGTTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTGATTGTTA
CTTCGATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGTTTGTTGC
AATGTAAAAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTTGAATCAA
TACGGTAAAGACTACATTGATATCTCTAATATCAAGGGTTTCAACGTTCCAATGGA
TTTCTCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGG
TCAATGTCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGATGCTTGTACTG
TTTTTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTACTGAATATT
CTAGATTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTTGGATAAACCTA
CTACTGTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTCTGTCCAACTGCTA
AGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTC
TCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTG
GTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACT
TTCGAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCTGCTTCTAAGGGTGA
CGCTGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTA
ACGTTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTC
GATGATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGTTTGTTGCAATG
TAAAAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTTGAATCAATACG
GTAAAGACTACATTGATATCTCTAATATCAAGGGTTTCAACGTTCCAATGGATTTC
TCTCCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCA
ATGTCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGATGCTTGTACTGTTT
TTCAAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTACTGAATATTCTA
GATTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTTGGATAAACCTACTA
CTGTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTCTGTCCAACTGCTAAGC
GAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTCTCC
TGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTGGTT
CTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACTTTC
GAAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCTGCTTCTAAGGGTGACGC
TGCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTAACG
TTGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTCGAT
GATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGTTTGTTGCAATGTAA
AAGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTTGAATCAATACGGTA
AAGACTACATTGATATCTCTAATATCAAGGGTTTCAACGTTCCAATGGATTTCTCT
CCTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCAATG
TCCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGATGCTTGTACTGTTTTTC
AAACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTACTGAATATTCTAGA
TTTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTTGGATAAACCTACTACT
GTTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTCTGTCCAACTGCTAAGCGA
GAAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTCTCCTG
AAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTGGTTCT
CTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACTTTCG
AAATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCTGCTTCTAAGGGTGACGCT
GCTTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTAACGT
TGAACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTCGATG
ATTCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGTTTGTTGCAATGTAAA
AGATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTTGAATCAATACGGTAA
AGACTACATTGATATCTCTAATATCAAGGGTTTCAACGTTCCAATGGATTTCTCTC
CTACTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCAATGT
CCAGCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGATGCTTGTACTGTTTTTCA
AACTTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTACTGAATATTCTAGAT
TTTTCAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTTGGATAAACCTACTACTG
TTACTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTCTGTCCAACTGCTAAGCGAG
AAGCCGAAGCAGAAGCAGAAGCAGAAGCGGGAGCAACTAATTTCAGTCTCCTGA
AGTTGGCCGGTGACGTTGAACTGAACCCTGGACCGATGAGGCAGGTTTGGTTCTCT
TGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACTTTCGAA
ATTGTTAATAGATGTTCTTACACCGTTTGGGCTGCTGCTTCTAAGGGTGACGCTGC
TTTGGATGCTGGTGGTAGACAATTGAATAGTGGTGAATCTTGGACTATTAACGTTG
AACCTGGTACTAAGGGTGGTAAAATTTGGGCTAGAACTGATTGTTACTTCGATGAT
TCTGGTAGAGGTATTTGTAGAACTGGTGACTGTGGTGGTTTGTTGCAATGTAAAAG
ATTTGGTAGACCACCAACTACTTTGGCTGAATTTTCTTTGAATCAATACGGTAAAG
ACTACATTGATATCTCTAATATCAAGGGTTTCAACGTTCCAATGGATTTCTCTCCTA
CTACTAGAGGTTGTAGAGGTGTTAGATGTGCTGCTGATATTGTTGGTCAATGTCCA
GCTAAATTGAAAGCTCCAGGTGGTGGTTGTAACGATGCTTGTACTGTTTTTCAAAC
TTCTGAATACTGTTGTACTACTGGTAAATGTGGTCCTACTGAATATTCTAGATTTTT
CAAAAGACTGTGCCCAGATGCTTTTTCTTACGTTTTGGATAAACCTACTACTGTTA
CTTGTCCAGGTTCTTCTAACTATAGAGTTACTTTCTGTCCAACTGCT
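To check that the DNA of SEQ ID NO: 58 encodes the protein of SEQ ID NO: 57, the following Python sketch translates two stretches of it with the standard genetic code: the start of the first ThmII unit and the linker-2A junction between the first and second units. The DNA fragments are copied from the listing; the codon table is built in the conventional TCAG order, and the names `CODON_TABLE`, `translate`, `START` and `JUNCTION_DNA` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in-frame DNA, ignoring whitespace and any trailing partial codon."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Start of the first ThmII unit (first 84 nt of SEQ ID NO: 58).
START = "ATGAGGCAGGTTTGGTTCTCTTGGATTGTGGGATTGTTCCTATGTTTTTTCAACGTGTCTTCTGCTGCTACTTTCGAAATTGTT"
# Linker-2A junction between the first and second units.
JUNCTION_DNA = (
    "AAGCGAGAAGCCGAAGCAGAAGCAGAAGCAGAAGCG"
    "GGAGCAACTAATTTCAGTCTCCTGAAGTTGGCCGGTGACGTTGAACTGAACCCTGGACCG"
)

if __name__ == "__main__":
    print(translate(START))         # MRQVWFSWIVGLFLCFFNVSSAATFEIV
    print(translate(JUNCTION_DNA))  # KREAEAEAEAEAGATNFSLLKLAGDVELNPGP
```

The same function can be run over the whole of SEQ ID NO: 58 once the line breaks are stripped; it raises `KeyError` on any non-ACGT character, which helps catch transcription errors in a copied sequence.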