Title:
PROMOTER FOR YEAST
Document Type and Number:
WIPO Patent Application WO/2019/236451
Kind Code:
A1
Abstract:
A promoter operably linked to a gene encoding a protein is disclosed. The promoter drives expression of the protein in a yeast cell in the absence of methanol. Also disclosed are vectors, host cells and expression systems that include the promoter, as well as methods of using the promoter to express proteins in yeast.

Inventors:
TAN XUQIU (US)
CAI RUORONG (US)
ZHONG JINGPING (US)
SCRANTON MELISSA ANN (US)
SPEER MICHAEL (US)
Application Number:
PCT/US2019/035140
Publication Date:
December 12, 2019
Filing Date:
June 03, 2019
Assignee:
BASF SE (DE)
TAN XUQIU (US)
International Classes:
C12N15/31; C07K14/39; C12N1/16; C12N1/19; C12N15/81; C12P21/00
Domestic Patent References:
WO2012129036A2 2012-09-27
WO2017021541A1 2017-02-09
WO2016139279A1 2016-09-09
WO2013050551A1 2013-04-11
WO2006089329A2 2006-08-31
Foreign References:
EP2862933A2 2015-04-22
Other References:
WANG JINJIA ET AL: "Methanol-Independent Protein Expression by AOX1 Promoter with trans-Acting Elements Engineering and Glucose-Glycerol-Shift Induction in Pichia pastoris.", SCIENTIFIC REPORTS 02 02 2017, vol. 7, 2 February 2017 (2017-02-02), pages 41850, XP002793387, ISSN: 2045-2322
SHEN WEI ET AL: "A novel methanol-free Pichia pastoris system for recombinant protein expression.", MICROBIAL CELL FACTORIES 21 OCT 2016, vol. 15, no. 1, 21 October 2016 (2016-10-21), pages 178, XP055589024, ISSN: 1475-2859
ROLAND PRIELHOFER ET AL: "Induction without methanol: novel regulated promoters enable high-level expression in Pichia pastoris", MICROBIAL CELL FACTORIES,, vol. 12, no. 1, 24 January 2013 (2013-01-24), pages 5, XP021146863, ISSN: 1475-2859, DOI: 10.1186/1475-2859-12-5
GARCÍA-ORTEGA XAVIER ET AL: "Rational development of bioprocess engineering strategies for recombinant protein production in Pichia pastoris (Komagataella phaffii) using the methanol-free GAP promoter. Where do we stand?", NEW BIOTECHNOLOGY 10 JUN 2019, vol. 53, 10 June 2019 (2019-06-10), pages 24 - 34, XP002793388, ISSN: 1876-4347
MCGEHEE ET AL., MOL. ENDOCRINOL., vol. 7, 1993, pages 551
TREISMAN, SEMINARS IN CANCER BIOL., vol. 1, 1990, pages 47
O'REILLY ET AL., J. BIOL. CHEM., vol. 267, 1992, pages 19938
YE ET AL., J. BIOL. CHEM., vol. 269, 1994, pages 25728
LOEKEN, GENE EXPR., vol. 3, 1993, pages 253
"Molecular Biology of the Gene", 1987, THE BENJAMIN/CUMMINGS PUBLISHING COMPANY, INC.
LEMAIGRE AND ROUSSEAU, BIOCHEM. J., vol. 303, 1994, pages 1
NEEDLEMAN AND WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453
Attorney, Agent or Firm:
LOMPREY, Jeffrey R. et al. (US)
Claims:
WHAT IS CLAIMED IS:

1. A promoter comprising a nucleic acid sequence having at least 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 99% or more sequence identity to any one of SEQ ID NO: 1-7, or a fragment thereof, wherein the promoter is operably linked to a gene encoding a protein, and the promoter drives expression of the protein in a yeast cell in the absence of methanol.

2. The promoter of claim 1, wherein the sequence identity is over a region of at least 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, or more residues, or the full length of the nucleic acid.

3. The promoter of claim 1, wherein the fragment is over a region of at least 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, or more residues, or the full length of the nucleic acid.

4. The promoter of claim 1, wherein the protein is an enzyme, a peptide, an antibody, or a recombinant protein.

5. The promoter of claim 2, wherein the enzyme is a lipase, amylase, xylanase, protease, glucoamylase, glucanase, mannanase, phytase, or cellulase.

6. The promoter of any one of Claims 1-3, wherein the protein is glycosylated or non-glycosylated.

7. The promoter of any one of Claims 1-4, wherein the protein optionally comprises disulfide bonds.

8. The promoter of Claim 1, wherein the nucleic acid sequence is 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000 or 5000 bases upstream from a translational start site of the at least one sequence encoding the protein or any number of bases in between a range defined by any two aforementioned values upstream from the start site of the at least one sequence encoding the protein.

9. The promoter of Claim 1, wherein the yeast cell is a species of methylotrophic yeast.

10. The promoter of Claim 1, wherein the yeast cell is of the genus Komagataella.

11. The promoter of Claim 10, wherein the yeast cell is selected from: K. farinosa, K. anomala, K. heedii, K. guilliermondii, K. kluyveri, K. membranifaciens, K. norvegensis, K. ohmeri, K. pastoris, K. phaffii, K. methanolica and K. subpelliclosa.

12. The promoter of claim 1, wherein the expression of protein is up to 40 g/l.

13. A vector comprising the promoter of any one of Claims 1-12.

14. The vector of Claim 13, wherein the vector is a yeast integrative plasmid, episomal plasmid, centromere plasmid or artificial chromosome.

15. The vector of Claim 13 or 14 wherein the vector comprises a selectable marker.

16. A yeast cell comprising the promoter of any one of Claims 1-12, or the vector of claims 13-15.

17. A protein expression system comprising the yeast cell of Claim 16.

18. A method of expressing protein in a yeast cell, the method comprising:

providing a yeast cell;

introducing the promoter of any one of claims 1-12, or the vector of any one of Claims 13-15 into the cell;

fermenting the yeast cell under at least one fermentation condition in the absence of methanol in a nutrient broth;

harvesting the cells; and

recovering protein from the cells.

19. The method of Claim 18, wherein the protein is excreted or is intracellular.

20. The method of Claim 18 or 19, wherein the protein is an enzyme, a peptide, an antibody, or a recombinant protein.

21. The method of Claim 20, wherein the enzyme is lipase, amylase, xylanase, protease, glucoamylase, glucanase, mannanase, phytase, or cellulase.

22. The method of any one of Claims 18-21, wherein the method further comprises driving protein expression.

23. The method of any one of Claims 18-22, wherein the yeast cells are a species of methylotrophic yeast.

24. The method of any one of Claims 18-23, wherein the yeast cells are of the genus Komagataella.

25. The method of Claim 24, wherein the yeast cells are selected from the group consisting of K. farinosa, K. anomala, K. heedii, K. guilliermondii, K. kluyveri, K. membranifaciens, K. norvegensis, K. ohmeri, K. pastoris, K. methanolica, K. phaffii and K. subpelliclosa.

26. The method of Claim 25, wherein the yeast cell is K. phaffii.

27. The method of any one of Claims 18-26, wherein the nutrient broth comprises at least one carbon source.

28. The method of Claim 27, wherein the at least one carbon source is selected from a group consisting of dextrose, maltose, glucose, dextrin, glycerol, sorbitol, mannitol, lactic acid, acetate, xylose, or other partially hydrolyzed starches, and any mixtures thereof.

29. The method of Claim 27 or 28, wherein the concentration of the at least one carbon source varies from 0.5 g/L, 1 g/L, 2 g/L, 4 g/L, 6 g/L, 8 g/L, 10 g/L, 11 g/L, 12 g/L, 13 g/L, 14 g/L, 15 g/L, 16 g/L, 18 g/L, 20 g/L, 22 g/L, 24 g/L, 26 g/L, 28 g/L or 60 g/L or any concentration within a range defined by any two aforementioned values.

Description:
PROMOTER FOR YEAST

CROSS-REFERENCE TO RELATED APPLICATIONS

[0000] The present application claims the benefit of priority to US Application No. 62/682053, filed on June 7, 2018, the contents of which are incorporated herein in their entirety.

INCORPORATION BY REFERENCE TO ANY PRIORITY APPLICATIONS

[0001] Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet as filed with the present application are hereby incorporated by reference under 37 CFR 1.57.

SEQUENCE LISTING

[0002] This application includes a nucleotide and amino acid sequence listing in computer readable form (CRF) as an ASCII text (.txt) file according to “Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in International Patent Applications Under the Patent Cooperation Treaty (PCT)” ST.25. The sequence listing is identified below and is hereby incorporated by reference into the specification of this application in its entirety and for all purposes.

BACKGROUND

Field

[0003] Despite numerous disadvantages to using methanol in protein expression, methanol inducible promoters, such as the AOX1 promoter (PAOX1), have been widely used in Komagataella yeast expression systems for protein expression. As described herein, a promoter that can drive protein expression independently of methanol has been identified that works well with a variety of proteins for expression, such as enzymes.

Description of the Related Art

[0004] Komagataella phaffii is a successful system for the production of a wide variety of recombinant proteins that are not native to the Komagataella cell. Several factors have contributed to its success as a protein manufacturing system, including: (1) a promoter derived from the alcohol oxidase I (AOX1) gene of K. phaffii that is well suited for controlled expression of foreign genes; (2) similarity of the techniques needed for the molecular genetic manipulation of K. phaffii to those of S. cerevisiae, which are well established; (3) the strong preference of K. phaffii for respiratory growth, a key physiological trait that facilitates its culturing at high cell densities relative to fermentative yeasts; and (4) the knowledge base on the Komagataella system as described in numerous recent publications. Furthermore, the genomes of several K. phaffii species have been sequenced, which facilitates studies of the RNA and protein expression pathways. Culturing K. phaffii is also relatively easy, as the cells can grow in a high density culture with high levels of protein expressed both intracellularly and extracellularly.

[0005] K. phaffii is a single-celled microorganism that is easy to manipulate and culture. K. phaffii is a eukaryote capable of many of the post-translational modifications performed by higher eukaryotic cells, such as proteolytic processing, folding, disulfide bond formation and glycosylation. Thus, the system may help to avoid the loss of proteins that may end up as inactive inclusion bodies in bacterial systems, as bacterial systems lack mechanisms for post-translational modification. Foreign proteins requiring post-translational modification may be produced as biologically active molecules in K. phaffii. Additionally, the K. phaffii system has been shown to give higher expression levels of protein than many bacterial systems.

[0006] The ability of K. phaffii to utilize methanol as a sole source of carbon and energy was discovered in the 1970s. There are two alcohol oxidase genes, AOX1 and AOX2, which have strongly inducible promoters, the AOX promoters. These genes allow Komagataella to use methanol as a carbon and energy source. For example, the AOX1 protein is produced in response to depletion of some carbon sources, such as glucose, and the presence of methanol. In some cases, the gene encoding a desired heterologous protein can be introduced under the control of the AOX1 promoter, which means that gene expression and subsequent protein expression may be induced by the addition of methanol. As methanol could be synthesized from natural gas, methane, there was an interest in using these organisms for generating yeast biomass or single cell protein (SCP) to be marketed primarily as a high protein animal feed. During the 1970s, media and methods for growing K. phaffii on methanol in continuous culture at high cell densities (<130 g/l dry cell weight) were developed. However, during this same period, the cost of methane increased dramatically due to the oil crisis. Thus, the SCP process was never economically competitive for protein production.

[0007] Methods were then developed in the 1980s to use K. phaffii as a heterologous gene expression system. The AOX1 gene (and its promoter) was isolated, and vectors, strains and methods for molecular genetic manipulation of K. phaffii were developed. The combination of strong regulated expression under control of the AOX1 promoter along with the fermentation media and methods developed for the SCP process resulted in high levels of foreign proteins in K. phaffii.

[0008] Recombinant protein expression in K. phaffii may be driven by the AOX1 promoter, which is induced by methanol and repressed by other carbon sources such as glucose, glycerol and ethanol. This induction and repression feature functions as a switch which turns recombinant protein expression on and off under different culture conditions. This switch is advantageous when expressing proteins that are toxic towards the host cell and towards cell growth. However, there are several limitations with this system. As the AOX1 system requires methanol, a toxic and flammable material, special handling and protocols may be required. Additionally, hydrogen peroxide (H2O2) may be produced from methanol metabolism, which may also result in the degradation of recombinant proteins by the produced free radicals. The nature of methanol induction also limits where manufacturing may be located and, in some circumstances, may require long fermentation times and high biomass production. The production cost is considered to be high for a traditional Komagataella system (methanol inducible system).

[0009] As such, promoters that drive protein expression independently of methanol and that work as well as or better than methanol inducible promoters are sought. A promoter that may drive protein expression independently of methanol in yeast may reduce the protein expression cost and fermentation time. Additionally, there would be no need for food grade methanol in the process, thus allowing an easy and robust fermentation method for products such as edible and medical products. Thus, a promoter system for production of protein without the presence of methanol, or a constitutive promoter system, would be advantageous for the expression of recombinant proteins in the K. phaffii system.

SUMMARY

[0010] In a first aspect, a promoter comprising a nucleic acid sequence having at least 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 99% or more sequence identity to any one of SEQ ID NO: 1-7, or a fragment thereof, wherein the promoter is operably linked to a gene encoding a protein, and the promoter drives the expression of the protein from a yeast cell in the absence of methanol, is provided. In some embodiments, the sequence identity is over a region of at least 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, or more residues, or the full length of the nucleic acid. In some embodiments, the fragment of SEQ ID NO: 1-7 is over a region of at least 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, or more residues, or the full length of the nucleic acid. In some embodiments, the protein is an enzyme, a peptide, an antibody, or a recombinant protein. In some embodiments, the enzyme is a lipase, amylase, xylanase, protease, glucoamylase, glucanase, mannanase, phytase, or cellulase. In some embodiments, the protein is glycosylated. In some embodiments, the protein comprises disulfide bonds. In some embodiments, the nucleic acid sequence is 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000 or 5000 bases upstream from a translational start site of the at least one sequence encoding the protein or any number of bases in between a range defined by any two aforementioned values upstream from the start site of the at least one sequence encoding the protein. In some embodiments, the yeast cell is a species of methylotrophic yeast. In some embodiments, the yeast cell is of the genus Komagataella. In some embodiments, the yeast cell is selected from: K. farinosa, K. anomala, K. heedii, K. guilliermondii, K. kluyveri, K. membranifaciens, K. norvegensis, K. ohmeri, K. pastoris, K. phaffii, K. methanolica and K. subpelliclosa. In some embodiments, the expression of protein is up to 40 g/l.
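As an illustration of how a percent sequence identity figure of the kind recited above might be computed, the following is a minimal sketch of a global alignment in the style of Needleman and Wunsch followed by an identity calculation. The scoring values (match +1, mismatch -1, gap -1), the function names, and the choice of the alignment length as the denominator are all assumptions made for this sketch; the specification does not fix them, and other conventions give different percentages.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return the two gapped strings.

    Scoring values are illustrative assumptions, not taken from the text.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of alignment columns with identical residues (assumed convention)."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(a, b, threshold_pct=55.0):
    """True if the two sequences reach the given percent identity."""
    return percent_identity(a, b) >= threshold_pct
```

Under these assumptions, two short hypothetical sequences that differ at one of four aligned positions score 75% identity, which would satisfy a 55% threshold but not a 90% threshold.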

[0011] In a second aspect, a vector comprising the promoter of any one of the embodiments herein is provided. In some embodiments, the vector is a yeast integrative plasmid, episomal plasmid, centromere plasmid or artificial chromosome. In some embodiments, the vector comprises a selectable marker.

[0012] In a third aspect, a yeast cell comprising the promoter or the vector of any one of the embodiments herein is provided.

[0013] In a fourth aspect, a protein expression system comprising the yeast cell of any one of the embodiments herein is provided.

[0014] In a fifth aspect, a method of expressing protein in a yeast cell is provided. The method comprises providing a yeast cell, introducing the promoter or the vector of any one of the embodiments herein into the cell, fermenting the yeast cell under at least one fermentation condition in the absence of methanol in a nutrient broth, harvesting the cells and recovering protein from the cells. In some embodiments, the protein is excreted or is intracellular. In some embodiments, the protein is an enzyme, a peptide, an antibody, or a recombinant protein. In some embodiments, the enzyme is lipase, amylase, xylanase, protease, glucoamylase, glucanase, mannanase, phytase, or cellulase. In some embodiments, the method further comprises driving protein expression. In some embodiments, the yeast cells are a species of methylotrophic yeast. In some embodiments, the yeast cells are of the genus Komagataella. In some embodiments, the yeast cells are selected from the group consisting of K. farinosa, K. anomala, K. heedii, K. guilliermondii, K. kluyveri, K. membranifaciens, K. norvegensis, K. ohmeri, K. pastoris, K. methanolica, K. phaffii and K. subpelliclosa. In some embodiments, the yeast cell is K. phaffii. In some embodiments, the nutrient broth comprises at least one carbon source. In some embodiments, the at least one carbon source is selected from a group consisting of dextrose, maltose, glucose, dextrin, glycerol, sorbitol, mannitol, lactic acid, acetate, xylose, or other partially hydrolyzed starches, and any mixtures thereof. In some embodiments, the concentration of the at least one carbon source varies from 0.0 g/L, 0.5 g/L, 1 g/L, 2 g/L, 4 g/L, 6 g/L, 8 g/L, 10 g/L, 11 g/L, 12 g/L, 13 g/L, 14 g/L, 15 g/L, 16 g/L, 18 g/L, 20 g/L, 22 g/L, 24 g/L, 26 g/L, 28 g/L, 30 g/L, 35 g/L, 40 g/L, 45 g/L, 50 g/L, 55 g/L, or 60 g/L, or any concentration within a range defined by any two aforementioned values. In some embodiments, the method further comprises addition of the at least one carbon source by pulse or continuous feeding.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] Figure 1 demonstrates the expression of lipase under the control of the promoter pAOX1 or pSD001 in microtiter plates. Shown are protein PAGE gels in which expression of the lipase is shown under the control of pAOX1 or pSD001.

[0016] Figure 2 demonstrates the lipase fermentation yields from 45 to 150 hours in a broth that is deficient in methanol and a broth for methanol induction. As shown, the promoter pSD001 drives more lipase expression in the absence of methanol than the pAOX1 promoter in methanol induction conditions over the same period of time.

[0017] Figure 3 shows a schematic of the promoter pSD001 as three functional forms, a 1.5 kb promoter, a 1 kb promoter and a 0.66 kb promoter, for driving expression of lipase 1 (1.5 kb promoter (A)) and lipase 2 (1.5 kb promoter (B), 1 kb promoter (C) and 0.66 kb promoter (D)). As shown in the protein PAGE gel, all variations of the promoter were able to drive expression of the lipases.

[0018] Figures 4A, 4B, and 4C are an array of protein gel assays which demonstrate the expression of amylase 1 and 2 and a xylanase under the control of the promoter pSD001 in yeast cells in methanol-free expression conditions.

[0019] Figure 5 is a panel of protein gel assays which demonstrate the expression of lipase 1 and 3 under the control of several promoters (pSD001, pSD002, pSD003, pSD004, pSD005, pSD007, pSD008) in yeast cells in methanol-free expression conditions in microtiter plates.

[0020] Figure 6 shows whole broth fermentation yields of lipase 3 expressed in yeast cells under the control of the promoters pSD003, pSD004 and pSD007. As shown, the pSD007 promoter led to the highest lipase expression among the promoters under two different methanol-free fermentation conditions at 120 hrs.

DETAILED DESCRIPTION

[0021] In the description that follows, the terms should be given their plain and ordinary meaning when read in light of the specification. One of skill in the art would understand the terms as used in view of the whole specification.

[0022] As used herein, “a” or “an” may mean one or more than one.

[0023] “About” as used herein when referring to a measurable value is meant to encompass variations of ±20% or ±10%, more preferably ±5%, even more preferably ±1%, and still more preferably ±0.1 % from the specified value.
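The tolerance bands in this definition can be read as a simple relative-deviation check. The following is a minimal sketch; the function name `within_about` and its parameters are hypothetical, and the default of 20% is simply the widest band named in the definition.

```python
def within_about(value, nominal, tolerance_pct=20.0):
    """Return True if value lies within +/- tolerance_pct percent of nominal.

    tolerance_pct would be one of the bands named in the definition
    (20, 10, 5, 1 or 0.1); the default is the widest of these.
    """
    allowed = abs(nominal) * tolerance_pct / 100.0
    return abs(value - nominal) <= allowed
```

For example, under this reading a measured 11 g/L falls within "about 10 g/L" at the ±10% band, while 13 g/L falls outside even the ±20% band.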

[0024] “Methylotrophic yeast,” as described herein, has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a limited number of yeast species that can use reduced one-carbon compounds such as methanol or methane, and multi-carbon compounds that contain no carbon-carbon bonds, such as dimethyl ether and dimethylamine. For example, these species can use methanol as the sole carbon and energy source for cell growth. Without being limiting, methylotrophs may include the genus Methanosarcina, Methylococcus capsulatus, Hansenula polymorpha, Candida boidinii, Komagataella pastoris and Komagataella phaffii, for example. In the embodiments described herein, a promoter that drives protein expression independently of methanol is provided for protein expression in a methylotrophic yeast cell.

[0025] “Komagataella phaffii” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a species of methylotrophic yeast. “Pichia phaffii” may also refer to the colloquial name as it has officially been renamed Komagataella phaffii (for the GS115 strain used herein) or it may also be referred to as Komagataella pastoris, depending on which lineage it has.

[0026] Komagataella has been widely used for protein expression using recombinant DNA techniques since its alcohol oxidase promoters were isolated and cloned. It is used in biochemical and genetic research in academia and the biotechnology industry, as it can express a wide range of diverse genes compared to other microorganisms such as Pseudomonas, Bacillus, and Aspergillus. Furthermore, the protein product is easier to purify and leads to a clean product. Komagataella is well suited for protein expression as it has a high growth rate and is able to grow on a simple, inexpensive medium. K. phaffii can grow in either shaker flasks or a fermenter, which makes it suitable for both small and large scale expression. K. phaffii has two alcohol oxidase genes, AOX1 and AOX2, which have strongly inducible promoters. These genes allow Komagataella to use methanol as a carbon and energy source. The AOX promoters are induced by methanol and are repressed by glucose, for example. Often, the gene for a desired heterologous protein is introduced under the control of the AOX1 promoter, which means that protein expression can be induced by the addition of methanol. In a popular expression vector, the desired protein is produced as a fusion product to the secretion signal of the α-mating factor from Saccharomyces cerevisiae (baker's yeast). This causes the protein to be secreted into the growth medium, which greatly facilitates subsequent protein purification. Komagataella also has advantages over S. cerevisiae. Komagataella can easily be grown in cell suspension in reasonably strong methanol solutions that would kill most other microorganisms, a system that is difficult to set up and maintain. As the protein yield from expression in a microbe is roughly equal to the product of the protein produced per cell and the number of cells, this makes Komagataella of great use when trying to produce large quantities of protein without expensive equipment. However, Komagataella may be unable to produce proteins for which the host lacks the proper chaperones. As such, Komagataella may be co-transformed with a nucleic acid or a gene that encodes a chaperone for proper protein folding.
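The rough yield relationship stated above (total protein is approximately the protein produced per cell multiplied by the number of cells) can be sketched as follows. The function name, the choice of picograms per cell as the unit, and the numbers in the usage note are hypothetical and serve only to show the arithmetic; they are not measured values.

```python
PICOGRAMS_PER_GRAM = 1e12


def total_protein_grams(protein_pg_per_cell, cell_count):
    """Approximate total protein yield as per-cell yield times cell number.

    protein_pg_per_cell: assumed average protein per cell, in picograms.
    cell_count: assumed number of cells in the culture.
    """
    if protein_pg_per_cell < 0 or cell_count < 0:
        raise ValueError("inputs must be non-negative")
    return protein_pg_per_cell * cell_count / PICOGRAMS_PER_GRAM
```

With a hypothetical 2 pg of protein per cell and 10**9 cells, the sketch gives about 0.002 g. Because the relationship is a product, doubling the cell density at the same per-cell yield doubles the estimate, which is why growth to high cell density matters for total output.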

[0027] “Chaperone protein,” “molecular chaperones,” or “chaperones” have their plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, proteins that assist the covalent folding or unfolding and the assembly or disassembly of other macromolecular structures. Chaperones are present when the macromolecules perform their normal biological functions and have correctly completed the processes of folding and/or assembly. The chaperones are concerned primarily with protein folding. In some embodiments of the promoter, the promoter may drive protein expression independently of methanol. The protein may be a recombinant protein, such as, for example, an enzyme. In some embodiments, a chaperone is expressed with the enzyme, wherein the chaperone assists in the folding of the enzyme. In some embodiments, expression of a chaperone leads to a functional enzyme. In some embodiments, the chaperone is expressed with a recombinant protein. In some embodiments, the promoter drives expression constitutively and independently of the presence of methanol.

[0028] The budding yeast K. phaffii can grow on methanol and has been widely used for over 30 years for heterologous protein expression. For example, over 70 products including therapeutic biologicals (mostly) and industrial enzymes have been produced using the K. phaffii system. Protein from the system may be either secreted (>16 g/L) or produced intracellularly (>20 g/L). Most enzyme companies produce enzymes using a native host or homologous expression of the enzyme. However, no native enzymes from Komagataella have been discovered for industrial use. Methods for genome sequencing and molecular tools available for strain manipulation are also appreciated by those skilled in the art. Growth of the cells, fermentation and expression processes are also well developed, as the system has a long history of safe use and is regulatory friendly. Methods of growth of a typical culture for protein expression can be appreciated by those of skill in the art. In the embodiments provided herein, K. phaffii is used as an expression host for the expression of protein in a methanol-free environment.

[0029] “Nucleic acid” or“nucleic acid molecule” have their plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, polynucleotides, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), oligonucleotides, fragments generated by the polymerase chain reaction (PCR), and fragments generated by any of ligation, scission, endonuclease action, and exonuclease action. Sugar modifications include, for example, replacement of one or more hydroxyl groups with halogens, alkyl groups, amines, and azido groups, or sugars can be functionalized as ethers or esters. Moreover, the entire sugar moiety can be replaced with sterically and electronically similar structures, such as aza-sugars and carbocyclic sugar analogs. Examples of modifications in a base moiety include alkylated purines and pyrimidines, acylated purines or pyrimidines, or other well-known heterocyclic substitutes. Nucleic acid monomers can be linked by phosphodiester bonds or analogs of such linkages. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoroselenoate, phosphorodiselenoate, phosphoroanilothioate, phosphoranilidate, phosphoramidate, and the like. Nucleic acids can be either single stranded or double stranded. In some embodiments, a nucleic acid sequence encoding a fusion protein or recombinant protein is provided, wherein the protein expression is driven by a promoter that drives protein expression independent of methanol. In some embodiments, the nucleic acid comprises a promoter that is not inducible by methanol. In some embodiments, a cell comprising the nucleotide for protein expression that is independent of methanol is provided.

[0030] “Coding for” or “encoding” are used herein, and have their plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, the property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other macromolecules such as a defined sequence of amino acids. Thus, a gene codes for a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system. In some embodiments, a vector comprises a nucleic acid encoding a protein, wherein the nucleic acid encoding the protein is under the influence of a promoter that drives protein expression independently of methanol.

[0031] A “nucleic acid sequence coding for a polypeptide” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, all nucleotide sequences that are degenerate versions of each other and that code for the same amino acid sequence.
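To make the idea of degenerate nucleotide sequences concrete, the following is a minimal sketch showing two different DNA sequences that translate to the same amino acid sequence. The codon table is deliberately partial (only the standard-code codons needed for the example), and the sequences and function names are hypothetical illustrations rather than sequences from this application.

```python
# Partial standard genetic code: only the codons used in this illustration.
CODON_TABLE = {
    "ATG": "M",               # methionine (start)
    "GCT": "A", "GCC": "A",   # alanine
    "TCT": "S", "AGC": "S",   # serine
    "TAA": "*", "TGA": "*",   # stop
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon.

    Raises ValueError if the length is not a multiple of three and
    KeyError for any codon missing from the partial table above.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def are_degenerate_versions(seq_a, seq_b):
    """True if the sequences differ as DNA but encode the same amino acids."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)
```

Here the hypothetical sequences ATGGCTTCTTAA and ATGGCCAGCTGA differ at several nucleotide positions yet both translate to Met-Ala-Ser followed by a stop codon, so each would count as a degenerate version of the other in the sense described above.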

[0032] “Vector,” “expression vector” or “construct” have their plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a nucleic acid used to introduce heterologous nucleic acids into a cell that has regulatory elements to provide expression of the heterologous nucleic acids in the cell. The vector, as described herein, is a nucleic acid molecule encoding a gene that is expressed in a host cell. Typically, an expression vector comprises a transcription promoter, a gene, and a transcription terminator. Gene expression is usually placed under the control of a promoter, and such a gene is said to be “operably linked to” the promoter. Similarly, a regulatory element and a core promoter are operably linked if the regulatory element modulates the activity of the core promoter. Vectors include but are not limited to plasmids, minicircles, yeast, and viral genomes. Available commercial vectors are known to those of skill in the art. Commercial vectors are available from European Molecular Biology Laboratory and Atum, for example.

[0033] A “promoter that drives protein expression independently of methanol” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a promoter that may allow an increase in the expression of a specific gene in the absence of methanol.

[0034] “Constitutive promoter” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a promoter that is active in most circumstances in the cell. In some embodiments, the promoter drives heterologous protein expression independent of methanol, in yeast. In some embodiments, the yeast cells are a species of methylotrophic yeast. In some embodiments, the yeast cells are of the genus Komagataella. In some embodiments, the promoter is a constitutive promoter that drives expression in the absence of methanol.

[0035] “Protein expression” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, the biotechnological process of generating a specific protein. It may be achieved by the manipulation of gene expression in an organism such that it expresses large amounts of a recombinant gene. Without being limiting, this may include the transcription of the recombinant DNA to messenger RNA (mRNA) and the translation of mRNA into polypeptide chains, which are ultimately folded into functional proteins and may be targeted to specific subcellular or extracellular locations.

[0036] “Fusion proteins” or “chimeric proteins” have their plain and ordinary meaning when read in light of the specification, and may include but are not limited to, for example, proteins created through the joining of two or more genes that originally coded for separate proteins or portions of proteins. The fusion proteins can also be made up of specific protein domains from two or more separate proteins. Translation of this fusion gene can result in a single or multiple polypeptides with functional properties derived from each of the original proteins. Recombinant fusion proteins can be created artificially by recombinant DNA technology for use in biological research or therapeutics. Such methods for creating fusion proteins are known to those skilled in the art. Some fusion proteins combine whole peptides and therefore can contain all domains, especially functional domains, of the original proteins. However, other fusion proteins, especially those that are non-naturally occurring, combine only portions of coding sequences and therefore do not maintain the original functions of the parental genes that formed them. In some embodiments, promoters are provided that drive protein expression independently of methanol and are useful in driving protein expression in yeast. In some embodiments, the promoter is useful in driving expression of a fusion protein.

[0037] “Promoter” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a nucleotide sequence that directs the transcription of a structural gene. In some embodiments, a promoter is located in the 5’ non-coding region of a gene, proximal to the transcriptional start site of a structural gene. Sequence elements within promoters that function in the initiation of transcription may also be characterized by consensus nucleotide sequences. These promoter elements include RNA polymerase binding sites, TATA sequences, CAAT sequences, differentiation-specific elements (DSEs; McGehee et al., Mol. Endocrinol. 7:551 (1993); incorporated by reference in its entirety), cyclic AMP response elements (CREs), serum response elements (SREs; Treisman, Seminars in Cancer Biol. 1:47 (1990); incorporated by reference in its entirety), glucocorticoid response elements (GREs), and binding sites for other transcription factors, such as CRE/ATF (O'Reilly et al., J. Biol. Chem. 267:19938 (1992); incorporated by reference in its entirety), AP2 (Ye et al., J. Biol. Chem. 269:25728 (1994); incorporated by reference in its entirety), SP1, cAMP response element binding protein (CREB; Loeken, Gene Expr. 3:253 (1993); incorporated by reference in its entirety) and octamer factors (see, in general, Watson et al., eds., Molecular Biology of the Gene, 4th ed. (The Benjamin/Cummings Publishing Company, Inc. 1987; incorporated by reference in its entirety), and Lemaigre and Rousseau, Biochem. J. 303:1 (1994); incorporated by reference in its entirety). A promoter may be constitutively active, repressible or inducible. If a promoter is an inducible promoter, then the rate of transcription initiation increases in response to an inducing agent. In contrast, the rate of transcription initiation is not regulated by an inducing agent if the promoter is a constitutive promoter. Repressible promoters are also known. In some embodiments, a gene delivery polynucleotide or vector is provided. In some embodiments, the gene delivery polynucleotide comprises a promoter sequence. The promoter can be specific for bacterial, mammalian or yeast expression, for example. In some embodiments, wherein a nucleic acid encoding a protein of interest is provided, the nucleic acid further comprises a promoter sequence. In some embodiments, the promoter is specific for expression in yeast. In some embodiments, the promoter is a conditional, inducible or constitutive promoter. In some embodiments, the promoter is a promoter that is useful in driving protein expression independently of methanol, wherein the promoter drives protein expression in a methanol-free medium. The promoters isolated herein may be inducible or constitutive and may drive protein expression in the absence of methanol.

[0038] “Conditional” or “inducible” have their plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a nucleic acid construct that includes a promoter that provides for gene expression in the presence of an inducer and does not substantially provide for gene expression in the absence of the inducer. In some embodiments, the promoter is an inducible promoter. In some embodiments, the promoter is an inducible promoter for yeast protein expression.

[0039] “Regulatory element” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a regulatory sequence, which is any DNA sequence that is responsible for the regulation of gene expression, such as promoters and operators. The regulatory element can be a segment of a nucleic acid molecule that is capable of increasing or decreasing the expression of specific genes within an organism. In some alternatives described herein, the gene is under the control of a regulatory element.

[0040] “Host cell” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a cell into which a nucleic acid or vector that encodes a protein or gene of interest is introduced. In some embodiments, the host cell is an isolated cell. In the embodiments described herein, the host cell is a yeast cell. In some embodiments, the cell is a methylotrophic yeast cell. In some embodiments, the yeast cell is of Komagataella phaffii. In some embodiments, promoters that drive protein expression independently of methanol and that are useful in driving protein expression in yeast are provided. In some embodiments, the promoter drives heterologous protein expression in yeast. In some embodiments, the yeast cells are of the genus Komagataella. In some embodiments, the isolated host cell is a yeast cell. In some embodiments, the isolated host cell is Komagataella phaffii.

[0041] The term “gene expression” refers to the biosynthesis of a gene product. For example, in the case of a gene encoding a structural protein, gene expression involves transcription of the gene into mRNA and translation of mRNA into the structural protein.

[0042] “Protein” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, a macromolecule comprising one or more polypeptide chains. A protein can also comprise non-peptide components, such as carbohydrate groups. Carbohydrates and other non-peptide substituents, such as post-translational modifications, can be added to a protein by the cell in which the protein is produced, and will vary with the type of cell. Proteins are defined herein in terms of their amino acid backbone structures; substituents such as carbohydrate groups are generally not specified, but can be present nonetheless. In some embodiments, a gene delivery polynucleotide or vector is provided for expressing a protein, in a methanol-independent method, in a Komagataella system. In some embodiments, the gene delivery polynucleotide or vector further comprises a sequence for at least one protein.

[0043] “Gene” has its plain and ordinary meaning when read in light of the specification, and may include but is not limited to, for example, the molecular unit of heredity of a living organism, describing some stretches of deoxyribonucleic acids (DNA) and ribonucleic acids (RNA) that code for a polypeptide or for an RNA chain that has a function in the organism, and can be a locatable region in the genome of an organism.

[0044] Genetic modification performed by transformation is described herein. “Transformation” refers to transferring genetic material, such as, for example, nucleic acids, PCR-amplified nucleic acids, or synthetic DNA or RNA, to a cell. Common techniques employed for transferring genetic material may use viruses or viral vectors, electroporation, and/or chemical reagents to increase cell permeability. In some alternatives herein, the isolated host cell is transformed by electroporation. In some embodiments, the isolated host cell is transformed by exposure to alkali cations in the presence of a vector, plasmid or DNA.

[0045] Various transformation techniques have been developed and can be appreciated by one of skill in the art. Thus, gene transfer and expression methods are numerous but essentially function to introduce and express genetic material in yeast cells.

[0046] “Sequence alignment” means a comparison of a first amino acid sequence to a second amino acid sequence, or a comparison of a first nucleic acid sequence to a second nucleic acid sequence, and is calculated as a percentage based on the comparison. The result of this calculation can be described as “percent identical” or “percent ID.”

[0047] Generally, a sequence alignment can be used to calculate the sequence identity by one of two different approaches. In the first approach, both mismatches at a single position and gaps at a single position are counted as non-identical positions in the final sequence identity calculation. In the second approach, mismatches at a single position are counted as non-identical positions in the final sequence identity calculation; however, gaps at a single position are ignored rather than counted as non-identical positions. The difference between these two approaches, i.e. counting gaps as non-identical positions vs ignoring gaps, can lead to variability in the sequence identity value between two sequences.
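The two approaches can be illustrated with a minimal Python sketch. It assumes the alignment has already been produced and that gaps are written as "-"; the function name `percent_identity` and the toy sequences are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gaps: bool = True) -> float:
    """Percent identity between two already-aligned, equal-length sequences.

    count_gaps=True  -> first approach: gap columns count as non-identical.
    count_gaps=False -> second approach: gap columns are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        is_gap = x == "-" or y == "-"
        if is_gap and not count_gaps:
            continue  # second approach: skip gap columns entirely
        compared += 1
        if not is_gap and x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment: 10 columns, 7 identical, 1 mismatch, 2 gap columns.
a = "ACGT-ACGTA"
b = "ACGTTAC-TT"
print(percent_identity(a, b, count_gaps=True))   # first approach: 7 of 10 columns
print(percent_identity(a, b, count_gaps=False))  # second approach: 7 of 8 columns
```

For this toy alignment the first approach yields 70% and the second 87.5%, which shows how the treatment of gaps alone can change the reported identity.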

[0048] A sequence identity is determined by a program, which produces an alignment, and calculates identity counting both mismatches at a single position and gaps at a single position as non-identical positions in the final sequence identity calculation. An example is the program Needle (EMBOSS), which implements the algorithm of Needleman and Wunsch (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453), and which calculates sequence identity per default settings by first producing an alignment between a first sequence and a second sequence, then counting the number of identical positions over the length of the alignment, then dividing the number of identical residues by the length of the alignment, then multiplying this number by 100 to generate the % sequence identity [% sequence identity = (# of identical residues / length of alignment) x 100].

[0049] A sequence identity can be calculated from a pairwise alignment showing both sequences over the full length, so showing the first sequence and the second sequence in their full length (“Global sequence identity”). For example, program Needle (EMBOSS) produces such alignments [% sequence identity = (# of identical residues / length of alignment) x 100].

[0050] A sequence identity can be calculated from a pairwise alignment showing only a local region of the first sequence or the second sequence (“Local Identity”). For example, program Blast (NCBI) produces such alignments [% sequence identity = (# of identical residues / length of alignment) x 100].

[0051] The sequence alignment is preferably generated by using the algorithm of Needleman and Wunsch (J. Mol. Biol. (1970) 48, p. 443-453). Preferably, the program “NEEDLE” (The European Molecular Biology Open Software Suite (EMBOSS)) is used with the program's default parameters (gap open=10.0, gap extend=0.5 and matrix=EBLOSUM62 for proteins and matrix=EDNAFULL for nucleotides). Then, a sequence identity can be calculated from the alignment showing both sequences over the full length, so showing the first sequence and the second sequence in their full length (“Global sequence identity”). For example: [% sequence identity = (# of identical residues / length of alignment) x 100].
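As a hedged illustration of global sequence identity, the following Python sketch builds a Needleman-Wunsch alignment and applies the formula above. It is a simplification and not the EMBOSS NEEDLE implementation: it assumes a flat match/mismatch score and a linear gap penalty (hypothetical values +1, -1 and -2) in place of the EDNAFULL or EBLOSUM62 matrices and the affine gap open/extend parameters, so its alignments can differ from those NEEDLE produces.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def global_identity(a: str, b: str) -> float:
    """% sequence identity = (# identical residues / length of alignment) x 100."""
    al_a, al_b = needleman_wunsch(a, b)
    if not al_a:
        return 0.0
    identical = sum(1 for x, y in zip(al_a, al_b) if x == y and x != "-")
    return 100.0 * identical / len(al_a)


# Hypothetical toy sequences, not taken from SEQ ID NOs: 1-7.
print(global_identity("GATTACA", "GATACA"))
```

Because the denominator is the full alignment length, the single gap column in the toy example counts against identity, in line with the first approach described above.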

[0052] The variant nucleic acids are described by reference to a nucleic acid sequence which is at least n% identical to the nucleic acid sequence of the respective parent nucleic acid, with “n” being an integer between 80 and 100. The variant nucleic acids include sequences that are at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% identical when compared to the full-length sequence of the parent nucleic acid according to SEQ ID Nos. 1-7, wherein the variant is a promoter.
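The “at least n% identical” language can be read as a simple threshold test on a computed percent identity. The Python sketch below is illustrative only; the function names are hypothetical, and the percent identity is assumed to have been obtained separately from a full-length alignment against the parent sequence.

```python
from typing import Optional


def meets_identity_threshold(percent_identity: float, n: int) -> bool:
    """True if a sequence is 'at least n% identical', with n an integer from 80 to 100."""
    if not 80 <= n <= 100:
        raise ValueError("n must be an integer between 80 and 100")
    return percent_identity >= n


def highest_threshold_met(percent_identity: float) -> Optional[int]:
    """Largest integer n in [80, 100] that the identity satisfies, or None if below 80%."""
    for n in range(100, 79, -1):
        if percent_identity >= n:
            return n
    return None


# Hypothetical identity value for a candidate variant.
print(meets_identity_threshold(87.5, 85))
print(highest_threshold_met(87.5))
```

A sequence with 87.5% identity therefore satisfies every recited threshold from “at least 80%” up to “at least 87%” and none above that.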

[0053] The variant nucleic acid comprises at least one modification compared to the parent nucleic acid. The variant nucleic acid of the present invention comprises at least one nucleotide substitution, nucleotide insertion and/or nucleotide deletion compared to the parent nucleic acid.

[0054] The yeast Komagataella phaffii has been widely used as a heterologous protein expression host. Strong inducible promoters derived from methanol utilization genes or constitutive glycolytic promoters are typically used to drive gene expression. Notably, genes involved in methanol utilization are not only repressed by the presence of glucose, but also by glycerol.

[0055] As described herein, novel promoters that drive protein expression independently of methanol to drive high heterologous expression in scale-relevant fermentation conditions in Komagataella phaffii are provided. Use of the promoters may lower the overall biomass and reduce cost of the expression of protein. Thus, the promoters described herein, drive protein expression independently of methanol and are helpful for allowing robust and efficient high throughput screening in Komagataella.

[0056] As described herein, the identified promoters may influence heterologous gene expression using fermentation conditions.

[0057] Some promoters for expression of genes in the absence of methanol have been previously described. For example, inducible promoters have previously been published for small molecule induction. Without being limiting, current promoters that are induced independently of methanol include SUC2, PCUP1, PGAL1 and PADH, for example. However, the inducers for these specific promoters can be expensive. In addition, carbon-source dependent promoters have also been published. These can rely on relatively expensive carbon sources and can also be repressed by glucose, such as PADH2, GLK1, HXK2 and PIS1, for example. Likewise, constitutive promoters have also been described, such as the glyceraldehyde-3-phosphate dehydrogenase (GAP) promoter (Weinhandl et al. 2014; included by reference in its entirety herein).

[0058] A problem with such known systems of promoters that drive protein expression independently of methanol for Komagataella is that these promoters have a weaker activity compared to the methanol-inducible AOX1 promoters. Previous studies have focused on promoters that are strong under shaker flask conditions, which might not correlate well to performance in scale-relevant or full-scale fermentation conditions. An ideal promoter would be strongly induced under scale-relevant fermentation conditions.

[0059] Thus, promoters that drive protein expression independently of methanol are commercially desired to enable robust processes of protein expression, low-cost medium components, and lower levels of biomass.

[0060] Described herein are identified Komagataella native promoters that are capable of driving protein expression in a medium that lacks methanol.

[0061] The recombinant expression system driven by methanol induction has several limitations. As the promoter PAOX1 requires methanol, the methods require special handling and may not be suitable for the expression of edible and medical products. Additionally, methanol metabolism may produce the by-product hydrogen peroxide (H2O2), which is known to cause oxidative stress and may in turn lead to degradation of the recombinant protein being expressed.

Example 1: Expression of proteins under the control of the promoters

[0062] Expression vectors are constructed with the promoter regions upstream of a gene for expression of a fusion protein or an enzyme, such as lipase. Vectors for protein expression may be constructed with the promoter placed immediately upstream of the translational start site of a gene encoding the protein. Thus, in some embodiments, these vectors can be used for transforming cells for protein expression in the absence of methanol. In some embodiments the cells are Komagataella cells.
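The arrangement described here, with the promoter placed immediately upstream of the translational start site and followed by a terminator, can be sketched as a small Python data structure. This is only an illustration: the class name `ExpressionCassette` and the short placeholder sequences are hypothetical and are not the promoters of SEQ ID NOs: 1-7 or any actual lipase gene.

```python
from dataclasses import dataclass

START_CODON = "ATG"


@dataclass(frozen=True)
class ExpressionCassette:
    """Promoter, coding sequence and terminator, in 5' to 3' order."""
    promoter: str
    coding_sequence: str
    terminator: str

    def __post_init__(self):
        # The coding sequence must open with the translational start codon.
        if not self.coding_sequence.upper().startswith(START_CODON):
            raise ValueError("coding sequence must begin with the ATG start codon")

    def assemble(self) -> str:
        """Concatenate the parts so the promoter ends directly before the start codon."""
        return self.promoter + self.coding_sequence + self.terminator

    def start_codon_index(self) -> int:
        """0-based position of the start codon in the assembled cassette."""
        return len(self.promoter)


# Placeholder sequences for illustration only.
cassette = ExpressionCassette(promoter="TATAAAGG", coding_sequence="ATGGCTTAA", terminator="CCGG")
seq = cassette.assemble()
idx = cassette.start_codon_index()
print(seq, idx, seq[idx:idx + 3])
```

Keeping the three parts as separate fields makes it straightforward to swap one promoter for another while leaving the gene and terminator unchanged, which mirrors how different promoters are compared with the same reporter gene in the examples.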

[0063] Protein expression from the Komagataella cells may be assayed under fermentation conditions. It should be expected that the promoters described herein will drive protein expression independent of methanol (SEQ ID NO: 1-7).

Example 2: Expression of proteins under the control of a promoter

[0064] As shown in Figure 1, the isolated promoter pSD001, when compared to the control pAOX1, was able to drive expression of Lipase 1 in the absence of methanol in microtiter plates. Assays to measure the yield of lipase in fermentation broth were also performed, which show that the pSD001 promoter led to expression of the marker protein, lipase, in fermentation broth in the absence of methanol (Figure 2). In both microtiter plate and fermentation conditions, the expression of lipase 1 was higher in methanol-free conditions using the pSD001 promoter than in methanol induction conditions using the pAOX1 promoter.

[0065] Several variations of the pSD001 promoter were constructed as shown in the diagram in Figure 3. These constructs were then ligated to Lipase 1 or Lipase 2 genes and placed in an expression vector. As shown, a 1.5 kb promoter, a 1 kb promoter and a 0.66 kb promoter for driving expression of lipase 1 (1.5 kb promoter (A)) and lipase 2 (1.5 kb promoter (B), 1 kb promoter (C) and 0.66 kb promoter (D)) were used to drive lipase expression. As shown in the protein PAGE gel of Figure 3, all variations of the promoters were able to drive expression of the lipases.

[0066] The pSD001 promoter was also tested for driving expression of other classes of enzymes. As shown in the panels, the promoter was able to drive expression of two amylases (amylase 1 and 2) and a xylanase in the absence of methanol (Figures 4A-4C).

[0067] Various promoters were also tested for the ability to drive protein expression of lipase 1 and lipase 3 (Promoters: pSD001 (SEQ ID NO: 1), pSD002 (SEQ ID NO: 2), pSD003 (SEQ ID NO: 3), pSD004 (SEQ ID NO: 4), pSD005 (SEQ ID NO: 5), pSD007 (SEQ ID NO: 6) and pSD008 (SEQ ID NO: 7)). All promoters can drive lipase expression to various levels in microtiter plates, as shown in Figure 5. The promoters pSD003, pSD004 and pSD007 were also tested in methanol-free fermentation conditions. As shown in Figure 6, all three promoters led to lipase expression and the promoter pSD007 led to the most expression of protein.

[0068] With respect to the use of plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

[0069] It will be understood by those of skill within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations). 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”

[0070] In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.

[0071] Any of the features of an embodiment of any one of the aspects is applicable to all aspects and embodiments identified herein. Moreover, any of the features of an embodiment of any one of the aspects is independently combinable, partly or wholly, with other embodiments described herein in any way, e.g., one, two, or three or more embodiments may be combinable in whole or in part. Further, any of the features of an embodiment of any one of the aspects may be made optional to other aspects or embodiments.

Sequences

pSD001-SEQ ID NO: 1

T CC AGT GT AGC ACT AAAAT CT AAT AT CTTCGGCTTT AT ACTTTTTT GTT C AT CCG AAAGCTT ACG AAC AATT CTTTCTCCT GTTTT ATT GT GG AT AT AG AC AAT TTCGT C AGTTT CTT GG AG AG AAG AGTT ATTT CCGGTTTT GGCT GGCCCT AT AA ACGGGTT CTT GG ATTT GG AT CT AGT AAT AAAAAT GT C ACT GT C ATT CTCGG AG CT G AACTTT GT GTT GT ACG AAG AT GGGTT GTT CC ACT GTTTT GCC AGCT CTTC A TT GAT G ATTTT CTT AGT GGGT GTT CTT GG AGGTT C ACGTT GCCT AT AAT CTT G A CGTT CTTCTT CAT C ACT ATCG AT GCC AT C AAAATT AAGCGT CCTT ATT GC AGGC TTTT GT G ATTT C AACT GC AAT CCTTCT AT CTCTT CAT C AG AGCTTTCG AACT G A AT ACT AT C ACT C AAAACT GGCG AC ATT GC AC ATTT CCGC AAACC ATTTCGGG A AT CT AT GCT AGCT CTTCT AG ACG AT AAAG AACG ACCGG AACC AAT ACGGGGTT GT GC AGGT GGG AAT AAAT AT GTT GGTTT GG ATT CTT G ACGT G AAG AAGGT ATT CT AGTCG AT G AAGT GGTT GAT AAGG AT AT GGCGT C ACT G AGTT GTTTT CTTTT CCT AT GTT GCGGT GTT GGGT C AGG AGTT AATT GATT C ACCT CC AT AACT CTGG AATTT CTT G AAT GT GGGGTTTT C AG AT GGGC AT CTTTCTT G ACGGGGTT GT G A GTAACGGAGGAACCTGGTGTCTTGGGTGTGAACGGTGTTTGAGCCTGTACGCG GTT ACTT CT GGGCGG AGT ACTCGG AGT CAT G AG AGCC ATT GATT AG AAGGT G A AT G AGGG AGT C ACC ACT CT AAGCAAAC AAAAT G AGGTCG AAGC AAAAAAT AA AGT AAAGT AGC ACTT CT GGC AGGTT AG AT C AAAG AGT G ACGGG AG ATTT G AA GAT GGCT GGTTTTT CCTT AGT CTT GG AAG AGGTTT GT GT GGGT AT C AGCG AAT ATT CCCCG ATT AGGC AAATT AGTT GC ATT G AAATT AAC ACG AC AT GGT G ATTT GT GGT AAC AAAT AT CT ATT GGT GGTT GGT GT GT GGGT GT AAT AGT GGTCGT GT CAT GAT GAT GGT GTT C AGGT GTT GT CAT AG ATCGGT CTT C AGT AAG AG AAGG A AGCTT GGT G ACG AT C AC AGCT AT GAT GT AAT AG AAATT GCT AAGC AATT GT G A GGT GT GAT GT ATTTT GC AG AGC AATT GT GCGGT AC AACGGGGT GTT ATT GT CT T C AC AAGGC ATTT ATT GCG AATTTCGT AGTT G AAAG AAT ATTTT AGC AC AGGG T GCTT G ACCCCT ATT GTT GCTCGCT AAACC AT GATT GCT AAAT GAT G AC AT AG C AAT C ACTTT ACT AAG ATT GCT AT AAGG AC ACCTTT CTT AGT AT AAAT GG AC A CTCTTTTCCCCT GCT AAACTT CTTTT ATTTTT C AC ACTT AAAC AGTT AC AAAAC AC AAAC AC AACT AG AA pSD002-SEQ ID NO: 2 GT GCT AAAAT 
CT G AGGTTT AC AAGCT GT GAT GTT CCCCT AAG AT CT C AC AATCG AAC AATCGCG AAGCC AAT GC AAGTT GTTT AAGGGG AAACG ACT C ACT A TTCCT G AAATT AGT ATT C AAAACTT GGT CCGG AAG AAC AAT G AGGCGGCCGTT AAAAT ACT C ACGT AAACGGT GT CT AC AAGCGC ATT AAAAT CCGTTT G AATT C A AGC AAAAGCC ACC AG AGGCTT AT GCTT GGTT AT ACCC AGC ATT G ACCTTT GGT AT G AGC AT CT G AAAAAC AACC AGGT GTT GC AAAGTT AAAC AT CCTTCTTT GTT CAT AT AG AACCC ACT ATT CAT GGT ACT CCCC AATCG AATTT C AC ATT CT GGTTT T G AAATT AC AC ACC ACGTT AGCTT AT AAG ATTT CAT AT AACTT ATT GAT AT ACG GTTT CC ATT GTTCG AAT AGTT G AGGTT GT AT GT AATTCG ATT G AAGGGGCC AT TTTT GTTT CCT ACTTTT CCT GGG AGCTT AT CCG AT GCGCTT C AAAGCT GG AATT GT AAAT AT AG AG AAAAAG AAGG AT GTT GTTTT ATT CTT G AAAG AGT AT AATTT T ACTT CT AGC AACT CT CCC ACTTCGCTT G ACTT C ATTT ATTT CTT GGGC AC AT A GGCGT AGT AAT CT AG ACC AAC AG AT AATTT GCCGG AAT GAT AT AGCG ATT GG A AAAT G AACT G AAATTTTTT GCT GT CTTT C AATTT G ACGGGC AGTT CAT C AGT G A CCG ACC AT AT AAAT ACGTT GAG AAT GTT ATT CTT CCTCGT AGTT G AAGT GGCTT CAT AATTT C AG AACT C AAT AG AT AAACT AGG AT GTTTT AAAGC AATT AAT GCT C AC AAGT AAGG AGCG ACT CTCTT GCTTTTCG AAT ACT AAAAGT ATCGT CCC AA CCC AG AAAAAAAG ACCT CTT AACT GC AAAAT AAACT CT AT AT ATTT CTTCT AA AAC AGTTT C AGGTT GG AT AGT ATCGC ATT CT CAT C ACTT CT AACT AGT AGGCC AT GAG AT AT ATT AACGTTT ACTT G AGTT CT AAGTT CT CCG AATT AG AT GC AC A GC AC AAAC AAG ATT AGGTTT C ACTT GGT AC AAAAT ACG AAC AG AGTTT AAGGT CGT AATTT C ATTTCGTT ATT GAT CCCC AC AAT CT ATT CTT AT C AC AGT CAT C AG AT AGTCGCG AAAAAGC AT GC AG AAAAGGGGGTCGT CCCT AT CT AAGTT GT AG C ATT AC AAC AAAT AT G ACT AC ACT C AGT GTCGC AATCGGT AT AGCC AACGCT G C AAAAT GG ATT CT ACT GAG AAT GGT AT GAT GAT CCC AGG AT C AATTT CCC AAA AATT AAAAAAAGT AAAAT AAAAAGC AT C AG AT ATT AGGG AGGT GGT AAG ATT GCT CT GC AAGCG AT C ACG AG ATTTT AGGTTTT CCTTT AT GT ACT AT AT AAAGCG C AG ATT GG AT GCCGCTTTT CCCTCCT GGGCT AT GAT AAT AT AGCG AACG AAAT AC ACGCC AAAAT AAA pSD003-SEQ ID NO: 3

T C AC ATT CAT AGC AT CT CTCGCCT GC AAT AGCTT CC ACG AT AGG AAT AT CT GT G AAAGT G AAC AT GCT ATTTCG AT GAT AT AAG ACTTT AAG AT CT GGC AT G TTT GT GTT GG AGGTT ACCCT GGGGT C AAT AACCCT AATT AT CTCCTT C ACT AAA AAT GAT G AAG ATT CTTCGG ATTCGTTTTT G AAC AG AGTT AAT GCC ATTT CTTCG T C AAT AG AAAAAT C AAT AT CT GGT AT CT CAT CTTTT AC AT ATT G AGG ATTT AGT TTTCTTCCCTTT GG AT AGT AC ATT AT GAT C AAT GT ATT CCT GT CTTT ATT GAT A AAGT ATT GGC ATT CT GCTT CTT GT AC ACCTTT G AATT GTTT GT CT GG AAGT G AC T G AC ATTTTT CC AC ATT GCT AACGGTTT GGC ACG AATT AC AT CT AAAT AAAAT GT CTTCT CCGG ATTCGT GT ATT AAGT GAT ACT CC AAT GAT AAAT CCCC ACCT AT CG AACC AG AATCGGC ATT GGCC AC AGT C AC AGGT AACTTT AGGT CTT G AAAAA TCCTTCT AT AGGCTT C ATT G AC ATT GT CAT AAG ACTT AAG ACC AT CTTCTTTGG T C AAGT C AAAAG AAT AGGC AT CTTT CAT GAG AAACT CTCGT CCTCT C AAC AAA CCTCCCCT AGGT CT C AACT CAT CTCT AT ATTT GCGGG AAATTT GGT AC ACG AG AAGGGGT AAAT CTTT AT AT G ACG AAC AT AAGT C ACC AACT AAGTTT GT G ATTT CCTCTT C AC AAGTT GGC ACT AAAC AGT AGT CTCT AT CCTT GG AGT CTTT G AACT T G AAC AATT C ATT GTT GT CCC AT CTCTT AGTT CTCT CCC AT AAAT GCTT GG AAG AC AGGCT ACTT AATT CC ATTT CC AGCCC ACC AGCCT GAT CC ATT CTTTTCCT AA TT AC ATTTT G AAGCTTTTT AT AGGT ACGG AGT CCT AAT GG AAGCC AGT G AACT ATT CCT GCT GC AGGCT GGT AAAT AAACCTT GATT G AAGG AGC AT AT CAT G AGT AGT AAGGT CCTTT AC AG AAAAT AGTTT ACTT CCTT G AAG AG AAGT AG AAT AAA ACCT CAT GTT GGGT CT CC AT G AAAGGTT C AAAGGC ATT GAT CCTTT AGGT ACT T C AGG AT GTTT AAGT CAT C AAACT GT CC AT C AAAGGT AGT AT AGT ATTT ACC A TCT AG AT AGT GAT GT AT GGGT GT AAC AC AAC ATTT AAAT GTT GT AAATT AAC A TT AGG ACT G AGT CCGG AG AT GCT ATT GT C ACCT AAAT CT ATT AG AAAGC ACTT C AGTT AT AT C ATCG AT AG AGGTTT G AAG AT AAACCT ATT GTT GAT AAAT AACC CC ATT ACCCGTTT ACGT AGC AAGGTT C AAAAATTT GCTT AG ATCGG AGCT AAA AATTCG ACT G ACTT CTTTCG AAAAT GT GG ATT AT GC AAGC AACGTT GCT ATCG G AAT AGT AT AT AAGGTCG AT CT GCCCC ATT AC AAATT GT AAAGC AAC AAAC AT CCTACGCAAA p SD004- SEQ ID NO: 4

TCAGTTTCACGGTTATGTGAGCTGTCTCCGCGTGAGGCAGTAACCTCTG T GT CAT GG AT AC AGGCT GGT AC AC ATTT GGC AGT AGG AAC AC AAT CT GGTTT A GTT G AAAT AT GGG ACGCC ACG ACGT CC AAAT GT AC AAG AT C AAT G ACT GGGC ATTCGGCCCGAACCTCAGCGCTGAGTTGGAACCGTCATGTTTTGAGTTCTGGT T C AAG AG ATCGC AGT AT CTT AC ATCGGG AT GT ACGT GC AGC AGCT C ACT AT AC AAGTCGCATTGTTGAACACCGCCAAGAGGTTTGTGGCTTACGTTGGAACGTGG AT G AAAAC AAGCT GGCC AGT GGTT CC AAT GAT AACCGT AT GAT GGT AT GGG AT GCACTGCGTGTAGAACAGCCCCTTATGAAAGTTGAAGAGCATACTGCGGCTGT TAAGGCGTTGGCATGGTCACCTCATCAACGTGGAATACTGGCTTCGGGTGGAG GT ACT GCT G AC AG ACGT AT C AAGGT GT GG AAT ACTTT AAC AGG AT CC AAGCT G C ACG AT GTT GAT ACT GG AT CT C AAGTTT GT AAT CTCTT GT GGT CTCGC AATT CT AAT G AATT GGT AAGT ACT CAT GG AT ATT CTCG AAACC AAGTCGTT ATTT GG AA AT AT CCGC AAAT G AAGC AACT AGC AT CTTT G ACT GGT CAT ACTT ATCG AGT CC TTTACCTTTCCATGTCACCTGATGGAACTACAGTCGTAACGGGGGCTGGAGAC GAAACTTTAAGATTTTGGAACTGTTTCGAGAAGTCACGACAAAGCGGAGGAG GAT C AAT ATT ACT AG ACGCTTTT AGT C AGCTTCGTT AAATT ACC ACC AAATTT G GT GC AAAAGGGCCC AT AT GGT GCT AC AACC AAAGG AACTTT CT AATTTT GAT A AT GAT GT C ATTT CTCT C ATCGGG AT G AAAAT AG AAGTCG AAAGG ATTTTT GT C ACT ATTT C AAGCCCC ACCT GC AGCT GGC AGC ATTT CT ATT GTTT AT GC ATT GT C ATTT AT GGG AAAACT AAG AAAGTT CCTCT CC ACCCGG ACT CC ACT GGT AAAT A T GCG AT ATCGG AAT CAT G ACC AACCC AT ATTTT GAT CCT AAT C ATTTCGGTT CT AGTCTCCGATCGGACTCCGTAAAACTGCGGAGTGAACTCCAACGGAGAATACT GC AGCC AAT CT CAT ATTT C ATTT GTT ATTT GT CCCT C AACT GT CTCG AT AAGGT CAT CT GT GTTT G ACT AG AT GTTCGT C ATT GGC AT GT C AAAC AAGGCT AG ACCT T AC AAT CAT CTCTT ACG AAT GT AAGT G AAT GT AACT AT ATTTT CCTT GCT ACTT TAACGAGGTTAACCAACCCCCGCACATCCCCACACCACCGCTCTTGATAAGCA TCT CCG AAAAT GC AT G ACGCG AC AACTT C AAGC AT GTT GT ATTT ACT G AGTTT T C AGCCT C ACT ATCG AT ACCT CT AT AAAT AG AGGC ACTTTCGT CTCTTCTCCCT CCCCACAAGAAACCA p SD005- SEQ ID NO: 5

AG AAGT ACT GTT AT GAATCG ATCG ACGT G AC AT GTT GTT GAT GGTT CTG ACTT CTT GAT GT CCGCGTTTT CT GT CTCT C AAT AGT GGT GTTCGGGGG AAGT AT GGTT CT AAT ACTT AAC AGGT AAG AT GGTT GC AAT G AGC ACCT GGT AAAGC AAC TT G AATTT CCT GCCCT GT CT CCGTT AAGTT AT ATTCG ACT C AAGGT CCTT GCTT CCT GT CT GTT CT GT AAAACTT CCCTTT GGT GT CTTCT AT AT C AACTTT AAAAAC AAGGT AGT GT GTCG AGCG AT AGT ACT GT GT CTTTTTCCCT AT G AAAAAAATCG C ACC AT CC AAG ACTT CT C ACCTT C AAC AGCTT C AAC AT CAT GTTCGGT CCTTTT AGAGCTACGCTGGTCGATCTAGGAGGTCTGCTATGGAAACGTCCTTGGAGAAT GT CC AAACC AC AG AAAT AT AG ACT CCGC AAAAG AAT GC AACTT GT AG ACT CCA AT ATCG AC ATT ATTT ACC AGGG ACT G ACT G AGG AGGGT CT GT CTT GC AAAGT G AT AG AT AACTT GAAAC AAAACTT CCC AAAGG AGC AT G AAGT GCT CCCC AAAAA C AAGT AT ACCGT GTTT AAC AAG AC AGCC AAAAACT AT AG AAAGGGT GTT C ATT T GGTT CC AAAAT GG ACC AAG AAGT CTTT G AG AG AG AACCCCG AGTT CTTCT AA TT GC AC ATTT CTTCCT GTT CAT AG ATT AT CCC AC AC AT AGTT GCT C AC AAAAAA AT C ACT AT AATTTT CCT CC ACCGGC AGT AT AT C ACT AAC ACCTTT AT CTTT ATT GT AG ATT AT AAT CT GAT CTTT AT CCTT AG AT GT AT CT AT CAT C AACCCC AT GCT CTT G AAAAGCTT G AGT CTT AAC ACT GTCG AATCGT AGTTTT CTT GT AG AT C ATT CG AT AT C ACT GCTTTTT CTT GCT CTTCT AATTCGTT GAG ATT CT GGGT C AAACT AG AG ATT G AATT CT G AAGGT GATT CAT GTT CAT CT CC AG AT CT GTT ATT G ATTT T GCT AATTT AAATTTTTCGT GTT C AAGCT CTTCG AT ACT CTTT AGGGT CT GTT G ACGGT CTTCT GTTT CC AAT AATT GCTT GTT G AACT CTTT AAGTTCGT CTCTCTG TTT ACT GAT ACGT G AC AAC AAAT CT AGCT GGT G ATCG AGTTT AAGTTT CCGTTT GG AGCT C AAC AG AG AAAG ATTTT C ATT AATTT GGTT GAT AGTTT GC ACGT CCG GTTCG AT CT G AAAATT CTCT AT AGTCG ACCT GATT AAGG AC AC AGT CTCTT G A AG ATCGG AC ATT GG ATTT AT GG AG AAGGG AG AT CAAAGCGG AACC AGTT GCA CTGTTTACCTTTCCAGTCGAGATACTTATCCCACAGGGCCCTCACTTTCCAGGC AGAAGTCACCTAGGAGGCGCATCCCTCCGTTTGCTTCCCTCGCGACAAACTCC CCT GT AAAAG AAAACTT C ACT G AATCGT AC ACCT AAT CAT ACG AC ACT AAC AC AGATATA pSD007-SEQ ID NO: 6

GTCCTTTCCAAATTTTTGGTTGAAGGCATCGCTTAAATTATGAGCAGGATCGGTGGAAATAAGCAGGTATTTCTTGTTAGGATTGTGAAGGGCAAGCTGGATAGATATAGAAGAAGATGTCGTGGTTTTACCGACACCCCCCTTACCTCCAACAAAGATCCACTTCAGCGATTCGTGGTTCACAATTGATCGCAAACTTGGCTCTGCCTCAATATCCATGGTTGATGTCTAGTTGAGTGGCGTTTGTGGTCTCTTGATGAGTTCAAGGCGAAAGAATATGATAGGAAAGCATGGTTTGAACTTTTCGCGAAAGAAGGAATACTGTTCCGCGAGAAACTCCCCGGTGCCAGAACCTTCCATTGAGGTTAATCGGTGGGAGGTGTTCGAATGACAATGTCAGACAAGGCGAACACGTCTTGTGACACCAGCTGGACTAAGAAGATTCGGTATGCACCGAAGAAGAAGGCCGTGTCTCAATTGGCAACTTTGCAACAAACTACGGAGGAAAAGTCTCACAAGCTTTTAACCAAGTTGAATCACGACGACAACGATAAAGAAATCCTCAACCATCTAACACATGAAGTACAAAGTAGAAATGTGATCTTATTGGACAAACTAGAGGAGCTCAACAAGGAACTGGGCTGGATTAAAGACCGAAAATGAGGAACCATGAGCACTGGGCGTTTCCAGAAAAACTGCAACCAACGATGGGAAAATGATACCACACTACTATGGTCACCCCACATTGTGAAATTTCAAACCAAAAAAGATCAACCCCATAATTCCCCAGAGGGTTTTCCCAACAATTTTCCAACGGACTTGATAATGAGTCAGATCATTTGAGCATATTCATCTTACCCCTTATTCCGTGACAATTTACCTATTCCATTCAAAGCATACGGTATCCCGTGACCTTCTCATGGAGATCATTCTCCACCGATACAGCATATACACAGATATACCCAACTAATATCAATTGGACCTTGATATGGTCGACCTTGATGGTCCCGTCCAACCTTAAAACTTAGTTTAATGCTATACTTTCGCCTTGAACCAAATCTGTCTCCCCCTCAATCATCTCTATGCAAGAAGGTCAACACTGATTACGTGAGCAACAGCCAGCAATCGTTCGAGTCCCCGCCAAAAAAGGCGGAGTTACTGCTCCTTGTGACCACACCCCCTGAGACCACGTCCCTAAACGATCCTTGTCGGTTCCTTCGTCCAATTGGCAATTGCCACGCATACGTGAATCGTTATTGTTTCGCCTACCTTGCGTCATTCGTTCCAGAATGTTCGACATACTCCTCTAGAACATACCGTCACACCACCATCTTAAGTTATCTTCACGTGACCATGACGTACATTGTAGTTGACTACCCCATTCTCATCATTCCGATGCGGCCAAAAATCTCTATATAAAGACCGTATCCCCTAATATTCTCTTCTTGTTAAGACATTAACTTAGTTAATTCACCAATTACTCACTTATAAACAAACAAA pSD008-SEQ ID NO: 7

GTTTCTCTTGGGGAGATACTTTTTTCGCGTGCTCCTCCGTGCGGAACTTCCTTCTGAGCTTCTACCTCTCAGATTAGTCTAATCGCATCAGGAATAAGACTGAGAATGCTTTTAAGGAGAGGCTTGAGATTGGCTAATTGCGTTCCGAAGTACTCTTTCAAAAGGAGTTATACCCCTCTCAACTACGATTCTCTAAAGAATTATCGTAGGCATGCTCAGGCGCCTCAACCCCATCAGTTTGACGCCACTAGATGGGACCAACAACCAGTTACTAATGAGCAAGGAGTAATACTCCCATCCGACTCAATTGCAAACATTCTGAGACAACCAACTCTGGTCATAGAACGGCAAATGGAAATGATGAATATATTTTTAGGATTTGAGCAGGCGAACCGATATGTTATCATGGATCCTACAGGAAGTATTTTGGGTTACATGCTAGAAAGGGATCTGGGCATCACCAAAGCTATATTGAGACAGATCTACCGTTTGCATCGACCTTTTACAGTGGATGTAATGGATACTGCAGGAAATGTATTAATGACAATCAAGAGGCCGTTTAGTTTCATCAATTCGCACATCAAAGCTATATTACCCCCTTTCAGGAACAGCGACCCAGACGAACATGTAATTGGAGAATCCGTTCAAAGCTGGCATCCTTGGAGACGAAGATACAATCTATTTACAGCACAAATTGGCGAAAAGGACACTGTCTACGATCAGTTCGGGTACATTGACGCACCGTTTCTTTCCTTTGAGTTTCCTGTACTTTCAGAATCTAGGCAAACGCTAGGTGCTGTCTCTAGAAACTTCGTGGGCTTTGCAAGAGAGCTTTTCACAGATACAGGAGTTTACATCATCCGTATGGGGCCTGAATCTTTTGTAGGGCTAGAAGGGAACTACGGGAACAATGTGGCCCAACATGCCCTTACGCTGGACCAAAGGGCTGTATTATTAGCCAATGCCGTTTCAATTGACTTTGATTACTTTTCTAGGCACTCGTCACACAGTGGTGGCTTCATTGGGTTTGAGGAATAGACAGGGTCTCGTCAACTCAGCTCCTGCCACCAAACCAATCATTGATCAACGAGCACACTTTTGTCCACGTGAGATCGCTTTCGCTTGCAGAAAGAGCAATGCATGAAAACGGCAAACGCAAAACGAGCAAAAAAACGAGTAAATAACTACAATTTCACCACCAACAGGGTCAAAGAGCTTTTGAGACACTATAAAAGGGGCCCTTTCCCCCCAGGTTCCTTGAAATCCTCATTCAATTATGTTTTTTACTCATAATTTGACTCAATTGGCATCTTCTTCTTTGTTCATATACAGTAATTGATATGACGCTTAGTCATTATTAGTGTTCTCGACTAGCAGTGGCGAAAAAAGGGGGAGTTATTTTCTAGAACCGACCGCAAACTATAAAAGAAAGCTGCCCCTCATATACCTTTCGAATTCTTTATTTTCTGTGTTTCTTCCCTATTTAACATCTACACAAAA