

Title:
EXPRESSION OF XYLOSE ISOMERASE ACTIVITY IN YEAST
Document Type and Number:
WIPO Patent Application WO/2014/098939
Kind Code:
A1
Abstract:
Expression of a xylose isomerase in a yeast cell that expresses the chaperonins GroES and GroEL was found to result in enzymatically active xylose isomerase, while there is little to no activity with expression of the bacterial xylose isomerase in a yeast cell lacking GroES and GroEL. A yeast cell expressing xylose isomerase activity, and a complete xylose utilization pathway, provides a yeast cell that can produce a target compound, such as ethanol, butanol, or 1,3-propanediol, using xylose derived from lignocellulosic biomass as a carbon source.

Inventors:
HITZ WILLIAM D (US)
QI MIN (US)
RUSH SARAH EVE (US)
TAO LUAN (US)
VIITANEN PAUL V (US)
YANG JIANJUN (US)
YE RICK W (US)
Application Number:
PCT/US2013/030326
Publication Date:
June 26, 2014
Filing Date:
March 12, 2013
Assignee:
DU PONT (US)
International Classes:
C07K14/395; C12P5/00; C12R1/645
Domestic Patent References:
WO2011066356A1    2011-06-03
WO2011153516A2    2011-12-08
WO2011149353A1    2011-12-01
WO2011079388A1    2011-07-07
WO2006115455A1    2006-11-02
Foreign References:
US7622284B2    2009-11-24
US20110318790A1    2011-12-29
US20110318801A1    2011-12-29
US8206970B2    2012-06-26
US20070292927A1    2007-12-20
US20090155870A1    2009-06-18
US7851188B2    2010-12-14
US20080182308A1    2008-07-31
US6514733B1    2003-02-04
US5686276A    1997-11-11
US7005291B1    2006-02-28
US6013494A    2000-01-11
US7629151B2    2009-12-08
US8058040B2    2011-11-15
US7943366B2    2011-05-17
US20100112658A1    2010-05-06
US20100028975A1    2010-02-04
US20090061502A1    2009-03-05
US20070155000A1    2007-07-05
US20060216804A1    2006-09-28
US8129171B2    2012-03-06
US20080081358A1    2008-04-03
Other References:
VAN MARIS A J A ET AL: "Development of Efficient Xylose Fermentation in Saccharomyces cerevisiae: Xylose Isomerase as a Key Component", ADVANCES IN BIOCHEMICAL ENGINEERING, BIOTECHNOLOGY, SPRINGER, BERLIN, DE, vol. 108, 1 January 2007 (2007-01-01), pages 179 - 204, XP008086128, ISSN: 0724-6145
BU S ET AL: "GroEL-GroES solubilizes abundantly expressed xylulokinase in Escherichia coli", JOURNAL OF APPLIED MICROBIOLOGY 2005 GB, vol. 98, no. 1, 2005, pages 210 - 215, XP002696794, ISSN: 1364-5072
CHANG HUNG-CHUN: "Mechanism of De Novo Multi-domain Protein Folding in Bacteria and Eukaryotes (Dissertation zur Erlangen des Doktorgrades der Fakultät für Chemie und Pharmazie der LMU München)", 30 December 2006, pages: p100 - 103, XP002696795
SARTHY, APPL. ENVIRON. MICROBIOL., vol. 53, 1987, pages 1996 - 2000
AMORE ET AL., APPL. ENVIRON. MICROBIOL., vol. 30, 1989, pages 351 - 357
GARDONYI ET AL., ENZYME AND MICROBIAL TECHNOLOGY, vol. 32, 2003, pages 252 - 259
HARTL; HAYER-HARTL, SCIENCE, vol. 295, 2002, pages 1852 - 1858
KERNER ET AL., CELL, vol. 122, 2005, pages 209 - 220
HUNG-CHUN WANG, PHD THESIS, 2006
PARK; BATT, APPLIED AND ENVIRONMENTAL MICROBIOLOGY, vol. 70, 2004, pages 4318 - 4325
SAMBROOK, J.; FRITSCH, E. F.; MANIATIS, T.: "Molecular Cloning: A Laboratory Manual, 2nd ed.,", 1989, COLD SPRING HARBOR LABORATORY
LESK, A. M.,: "Computational Molecular Biology", 1988, OXFORD UNIVERSITY
SMITH, D. W.,: "Biocomputing: Informatics and Genome Projects", 1993, ACADEMIC
GRIFFIN, A. M., AND GRIFFIN, H. G.,: "Computer Analysis of Sequence Data, Part I", 1994, HUMANA
VON HEIJNE, G.,: "Sequence Analysis in Molecular Biology", 1987, ACADEMIC
GRIBSKOV, M. AND DEVEREUX, J.,: "Sequence Analysis Primer", 1991, STOCKTON
HIGGINS; SHARP, CABIOS, vol. 5, 1989, pages 151 - 153
HIGGINS, D.G. ET AL., COMPUT. APPL. BIOSCI., vol. 8, 1992, pages 189 - 191
THOMPSON, J.D. ET AL., NUCLEIC ACID RESEARCH, vol. 22, no. 22, 1994, pages 4673 - 4680
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
W. R. PEARSON: "Proc. Int. Symp.", 1992, PLENUM, article "Comput. Methods Genome Res.", pages: 111 - 20
SAMBROOK, J.; RUSSELL, D.: "Molecular Cloning: A Laboratory Manual, Third Edition,", 2001, COLD SPRING HARBOR LABORATORY PRESS
SILHAVY, T. J.; BERMAN, M. L.; ENQUIST, L. W.: "Experiments with Gene Fusions", 1984, COLD SPRING HARBOR LABORATORY PRESS
AUSUBEL, F. M.: "Current Protocols, 5th Ed.", 2002, JOHN WILEY AND SONS, INC., article "Short Protocols in Molecular Biology"
"Guide to Yeast Genetics and Molecular and Cell Biology", vol. 194, 2004, ELSEVIER ACADEMIC PRESS, article "Methods in Enzymology"
MATSUSHIKA ET AL., APPL. MICROBIOL. BIOTECHNOL., vol. 84, 2009, pages 37 - 53
KUYPER ET AL., FEMS YEAST RES., vol. 5, 2005, pages 399 - 409
CHRISTINE GUTHRIE AND GERALD R. FINK: "Methods in Enzymology", vol. 194, 2004, ELSEVIER ACADEMIC PRESS, article "Guide to Yeast Genetics and Molecular and Cell Biology"
MATTHIAS HESS ET AL., SCIENCE, vol. 331, 2011, pages 463 - 467
SILHAVY, T. J.; BERMAN, M. L.; ENQUIST, L. W.: "Experiments with Gene Fusions", 1984, COLD SPRING HARBOR LABORATORY
AUSUBEL, F. M. ET AL.: "Current Protocols in Molecular Biology", 1987, GREENE PUBLISHING ASSOC. AND WILEY-INTERSCIENCE
"Methods in Yeast Genetics", 2005, COLD SPRING HARBOR LABORATORY PRESS
AMBERG ET AL., METHODS IN YEAST GENETICS, 2005
Attorney, Agent or Firm:
FELTHAM, S., Neil (Legal Patent Records Center, 4417 Lancaster Pike, Wilmington DE, US)
Claims:
CLAIMS

What is claimed is:

1. A recombinant yeast cell comprising:

a) at least one gene encoding each amino acid sequence of an interacting pair of Group I chaperonin polypeptides; and

b) at least one gene encoding a bacterial xylose isomerase polypeptide;

wherein:

i) the interacting pair of Group I chaperonins are active in the cytosol of the cell;

ii) the bacterial xylose isomerase polypeptide is converted to an active xylose isomerase enzyme; and

iii) the specific activity of the bacterial xylose isomerase enzyme is higher as compared with the specific activity of the same xylose isomerase enzyme expressed in the absence of the interacting pair of Group I chaperonin polypeptides.

2. The yeast cell of Claim 1 wherein the at least one gene encoding each amino acid sequence of an interacting pair of Group I chaperonin polypeptides is derived from a bacterium.

3. The yeast cell of Claim 1 wherein the xylose isomerase polypeptide is included in the enzyme classification defined by EC 5.3.1.5.

4. The yeast cell of Claim 3 wherein the xylose isomerase polypeptide is selected from the group consisting of Class I xylose isomerases and Class II xylose isomerases.

5. The yeast cell of Claim 1 wherein the bacterial xylose isomerase is derived from a member of a genus selected from the group consisting of Actinoplanes, Escherichia, Bacillus, Streptomyces, Burkholderia, Citrobacter, Pseudomonas, Photobacterium, Pantoea, Plautia, Vibrio, Yokenella, Bacteroides, Ruminococcus, and Zymomonas.

6. The yeast cell of Claim 2 wherein the bacterium is a member of a genus selected from the group consisting of Actinoplanes, Escherichia, Bacillus, Streptomyces, Burkholderia, Citrobacter, Pseudomonas, Photobacterium, Pantoea, Plautia, Vibrio, Yokenella, Bacteroides, Ruminococcus, and Zymomonas.

7. The yeast cell of Claim 2 wherein the interacting pair of Group I chaperonin polypeptides comprises a polypeptide selected from the group consisting of GroEL, GroES, Hsp60 and Hsp10.

8. The yeast cell of Claim 7 wherein the interacting pair of Group I chaperonin polypeptides is derived from E. coli.

9. The yeast cell of Claim 1 wherein the at least one gene of a) and the at least one gene of b) are derived from different organisms.

10. The yeast cell of Claim 9 wherein the xylose isomerase specific activity is at least 50% of the specific activity of the cell wherein a) and b) are from the same bacteria.

11. The yeast cell of Claim 1 wherein the xylose isomerase specific activity is at least 50% of the xylose isomerase specific activity obtained in yeast cells expressing E. coli GroES and GroEL chaperonins, and E. coli xylose isomerase.

12. The yeast cell of Claim 1 wherein the cell has a complete xylose utilization pathway and has the ability to grow on xylose as a sole carbon source.

13. The yeast cell of Claim 12 further comprising a target compound.

14. The yeast cell of Claim 13 wherein the target compound is selected from the group consisting of ethanol, butanol, and 1,3-propanediol.

15. A method for producing a yeast strain that has xylose isomerase activity comprising:

a) providing a yeast cell;

b) introducing a heterologous nucleic acid molecule encoding a GroEL polypeptide and a heterologous nucleic acid molecule encoding a GroES polypeptide; and

c) introducing a heterologous nucleic acid molecule encoding a bacterial xylose isomerase polypeptide;

wherein:

i) the GroEL and GroES polypeptides are expressed in the cytosol of the cell;

ii) the xylose isomerase polypeptide is converted to an active xylose isomerase enzyme; and

iii) the specific activity of the xylose isomerase enzyme is higher as compared with the specific activity of the same xylose isomerase enzyme expressed in the absence of the GroEL and GroES polypeptides.

16. A method for expressing an active bacterial xylose isomerase enzyme in yeast comprising:

a) providing a recombinant yeast cell of Claim 1; and

b) growing the yeast cell of a) whereby the xylose isomerase polypeptide is converted to an active xylose isomerase enzyme.

17. The method of Claim 16 wherein the recombinant yeast cell of (a) further comprises a complete xylose utilization pathway and growing of (b) is in a medium comprising xylose as a carbon source.

18. The method of Claim 17 wherein the yeast cell comprises a metabolic pathway that produces a target compound.

19. The method of Claim 18 wherein the target compound is selected from the group consisting of ethanol, butanol, and 1,3-propanediol.

Description:
TITLE

EXPRESSION OF XYLOSE ISOMERASE ACTIVITY IN YEAST

This application claims the benefit of United States Provisional Application 61/739755, filed December 20, 2012, which is incorporated by reference in its entirety.

FIELD OF THE INVENTION

The invention relates to the field of genetic engineering of yeast. More specifically, Saccharomyces cerevisiae is engineered to express an active xylose isomerase enzyme by also expressing GroES and GroEL proteins, and thus can grow on xylose as the sole carbohydrate when the rest of the xylose-utilization pathway is active.

BACKGROUND OF THE INVENTION

Currently fermentative production of ethanol is typically by yeasts, particularly Saccharomyces cerevisiae, using hexoses obtained from grains or mash as the carbohydrate source. Use of hydrolysate prepared from cellulosic biomass as a carbohydrate source for fermentation is desirable, as this is a readily renewable resource that does not compete with the food supply. After glucose, the second most abundant sugar in cellulosic biomass is xylose, a pentose. Saccharomyces cerevisiae is not naturally capable of metabolizing xylose, but can be engineered to metabolize xylose with expression of xylose isomerase activity to convert xylose to xylulose, and additional pathway engineering.

Success in expressing heterologous xylose isomerase enzymes that are active in yeast has been limited. Expression of xylose isomerase activity in S. cerevisiae was disclosed in US 7622284 and US 20110318790. However, many bacterial xylose isomerases do not provide significant amounts of catalytically active enzyme when expressed in yeast, as reported in Sarthy et al. ((1987) Appl. Environ. Microbiol. 53: 1996-2000), Amore et al. ((1989) Appl. Environ. Microbiol. 30: 351-357), and Gardonyi et al. ((2003) Enzyme and Microbial Technology 32: 252-259).

Chaperones, which include the chaperonins, are proteins that assist in the post-translational folding of a wide variety of proteins (reviewed in Hartl and Hayer-Hartl (2002) Science 295: 1852-1858). A proteome-wide analysis of E. coli identified about 85 proteins that require the GroEL/GroES chaperonins for proper folding in vivo, called Class III proteins (Kerner et al. (2005) Cell 122:209-220). Xylose isomerase was predicted to belong to this Class III. E. coli xylose isomerase was found in a soluble fraction when expressed in S. cerevisiae along with E. coli GroEL and GroES (Hung-Chun Wang, PhD Thesis (2006) Ludwig-Maximilians-Universität München).

There remains a need for additional engineered yeast cells that express xylose isomerase activity for successful utilization of xylose, thereby allowing effective use of sugars from cellulosic biomass during fermentation.

SUMMARY OF THE INVENTION

The invention provides recombinant yeast cells that are engineered to express chaperonins and bacterial xylose isomerase, and therefore have xylose isomerase enzyme activity to enable the utilization of xylose as a carbon source.

Accordingly, the invention provides a recombinant yeast cell comprising:

a) at least one gene encoding each amino acid sequence of an interacting pair of Group I chaperonin polypeptides; and

b) at least one gene encoding a bacterial xylose isomerase polypeptide;

wherein:

i) the interacting pair of Group I chaperonins are active in the cytosol of the cell;

ii) the xylose isomerase polypeptide is converted to an active xylose isomerase enzyme; and

iii) the specific activity of the xylose isomerase enzyme is higher as compared with the specific activity of the same xylose isomerase enzyme expressed in the absence of the interacting pair of Group I chaperonin polypeptides.

In another aspect the invention provides a method for producing a yeast strain that has xylose isomerase activity comprising:

a) providing a yeast cell;

b) introducing a heterologous nucleic acid molecule encoding a GroEL polypeptide and a heterologous nucleic acid molecule encoding a GroES polypeptide; and

c) introducing a heterologous nucleic acid molecule encoding a bacterial xylose isomerase polypeptide;

wherein:

i) the GroEL and GroES polypeptides are expressed in the cytosol of the cell;

ii) the xylose isomerase polypeptide is converted to an active xylose isomerase enzyme; and

iii) the specific activity of the xylose isomerase enzyme is higher as compared with the specific activity of the same xylose isomerase enzyme expressed in the absence of the GroEL and GroES polypeptides.

In yet another aspect the invention provides a method for expressing an active bacterial xylose isomerase enzyme in yeast comprising:

a) providing a recombinant yeast cell described above; and

b) growing the yeast cell of a) whereby xylose isomerase polypeptide is converted to an active xylose isomerase enzyme.

BRIEF DESCRIPTION OF THE FIGURES AND SEQUENCE DESCRIPTIONS

Figure 1A shows a plasmid map of pHR81-AMXA.

Figure 1B shows a plasmid map of pHR81-AMXA-GELS.

Figure 2 shows a plasmid map of pRS423-GELS.

Figure 3A shows a plasmid map of pRS313-AMXA-GELS.

Figure 3B shows a plasmid map of pRS313-GELS.

The invention can be more fully understood from the following detailed description and the accompanying sequence descriptions which form a part of this application.

The following sequences conform with 37 C.F.R. 1.821-1.825 ("Requirements for Patent Applications Containing Nucleotide Sequences and/or Amino Acid Sequence Disclosures - the Sequence Rules") and are consistent with World Intellectual Property Organization (WIPO) Standard ST.25 (2009) and the sequence listing requirements of the EPO and PCT (Rules 5.2 and 49.5(a-bis), and Section 208 and Annex C of the Administrative Instructions). The symbols and format used for nucleotide and amino acid sequence data comply with the rules set forth in 37 C.F.R. §1.822.

Table 1 SEQ ID NOs for GroEL polypeptides, and coding regions that are codon optimized for expression in S. cerevisiae

Table 2 SEQ ID NOs for GroES polypeptides, and coding regions that are codon optimized for expression in S. cerevisiae

*Two different codon-optimized sequences for the same amino acid sequence

SEQ ID NO:61 is the nucleotide sequence of a chimeric AMxylA expression cassette.

SEQ ID NO:62 is the nucleotide sequence of a chimeric ECgroES expression cassette.

SEQ ID NO:63 is the nucleotide sequence of a chimeric ECgroEL expression cassette.

SEQ ID NO:64 is the nucleotide sequence of pHR81-AMXA.

SEQ ID NO:65 is the nucleotide sequence of pHR81-AMXA-GELS.

SEQ ID NO:66 is the nucleotide sequence of pRS423-GELS.

SEQ ID NO:67 is the nucleotide sequence of pRS313-AMXA-GELS.

SEQ ID NO:68 is the nucleotide sequence of pRS313-GELS.

SEQ ID NOs:68 - 86 are the nucleotide sequences of primers and probes.

SEQ ID NO:87 is the nucleotide sequence of P5 Integration Vector.

SEQ ID NO:88 is the nucleotide sequence of a URA3 deletion scar.

SEQ ID NO:89 is the nucleotide sequence of the upstream ura3Δ post deletion region.

SEQ ID NO:90 is the nucleotide sequence of the downstream ura3Δ post deletion region.

SEQ ID NO:91 is the nucleotide sequence of the upstream his3Δ post deletion region.

SEQ ID NO:92 is the nucleotide sequence of the downstream his3Δ post deletion region.

SEQ ID NO:93 is the nucleotide sequence of pJT254.

SEQ ID NO:94 is the nucleotide sequence of pRS423 Am 104GroES 550 GroEL.

SEQ ID NO:95 is the nucleotide sequence of pRS423 Am 112GroES 540 GroEL.

SEQ ID NO:96 is the amino acid sequence of the xylose isomerase from Ruminococcus flavefaciens FD-1.

SEQ ID NO:97 is the amino acid sequence of the xylose isomerase from Ruminococcus champanellensis 18P13.

SEQ ID NO:98 is the amino acid sequence of Ru2.

SEQ ID NO:99 is the nucleotide sequence of xylA(Ru2), the codon optimized coding region for Ru2.

SEQ ID NO:100 is the amino acid sequence of Ru3.

SEQ ID NO:101 is the nucleotide sequence of xylA(Ru3), the codon optimized coding region for Ru3.

DETAILED DESCRIPTION

The following definitions may be used for the interpretation of the claims and specification:

As used herein, the terms "comprises," "comprising," "includes," "including," "has," "having," "contains" or "containing," or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a composition, a mixture, process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such composition, mixture, process, method, article, or apparatus. Further, unless expressly stated to the contrary, "or" refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).

Also, the indefinite articles "a" and "an" preceding an element or component of the invention are intended to be nonrestrictive regarding the number of instances (i.e. occurrences) of the element or component. Therefore "a" or "an" should be read to include one or at least one, and the singular word form of the element or component also includes the plural unless the number is obviously meant to be singular.

The term "invention" or "present invention" as used herein is a non-limiting term and is not intended to refer to any single embodiment of the particular invention but encompasses all possible embodiments as described in the specification and the claims.

As used herein, the term "about" modifying the quantity of an ingredient or reactant of the invention employed refers to variation in the numerical quantity that can occur, for example, through typical measuring and liquid handling procedures used for making concentrates or use solutions in the real world; through inadvertent error in these procedures; through differences in the manufacture, source, or purity of the ingredients employed to make the compositions or carry out the methods; and the like. The term "about" also encompasses amounts that differ due to different equilibrium conditions for a composition resulting from a particular initial mixture. Whether or not modified by the term "about", the claims include equivalents to the quantities. In one embodiment, the term "about" means within 10% of the reported numerical value, preferably within 5% of the reported numerical value.

The term "chaperones" refers to proteins that assist in folding of certain newly synthesized proteins to prevent misfolding and aggregation.

The term "chaperonins" refers to a class of chaperones that are large double-ring complexes of about 800-1,000 kD enclosing a central cavity. There are two groups of chaperonins with similar architecture but distantly related sequence: Group I and Group II.

The term "Group I chaperonins" refers to a group of chaperonins which includes the GroELs or Hsp60s, which are found in eubacteria, and in mitochondria and chloroplasts of eukaryotic cells. These proteins interact with cofactors referred to as GroES or Hsp10. Together, a GroEL or Hsp60 protein and a GroES or Hsp10 protein that interact to form an active chaperonin complex are referred to herein as an "interacting pair" of Group I chaperonin polypeptides.

The term "Group II chaperonins" refers to a group of chaperonins found in archaeal bacteria and the eukaryotic cytosol, which are GroES and Hsp10 independent. An example is TRiC (TCP-1 ring complex, also called CCT for chaperonin-containing TCP-1).

The term "xylose isomerase" refers to an enzyme that catalyzes the interconversion of D-xylose and D-xylulose. Xylose isomerases (XI) belong to the group of enzymes classified as EC 5.3.1.5.

The term "Group I xylose isomerase" refers herein to a xylose isomerase (XI) protein that belongs to Group I as defined by at least one of the following criteria: a) it falls within a 50% threshold sequence identity grouping that includes the A. missouriensis XI that is prepared using molecular phylogenetic bioinformatics analysis as in Example 4 of US 20110318801, which is incorporated herein by reference; b) it substantially fits the amino acids for Group I in the specificity determining positions (SDP) identified using GroupSim analysis of the Group I and Group II XI sets determined from molecular phylogenetic analysis that are given in Table 6 in Example 4 of US 20110318801; and/or c) it has an E-value of 1E-15 or less when queried using a Profile Hidden Markov Model prepared using SEQ ID NOs: 2, 24, 32, 34, 42, 54, 66, 68, 78, 96, 100, 106, 108, 122, 126, 128, 130, 132, 135, 137, and 142 of US 20110318801; where the query is carried out using the hmmsearch algorithm with the Z parameter set to 1 billion, as in Example 4 of US 20110318801. It is understood that although "Group I" xylose isomerases are known and defined in the literature, the definition provided herein is more precise than the literature definition and is the definition that informs the following discussion.
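As an illustration of criterion c), the following minimal Python sketch assembles an hmmsearch command line with the database size fixed at 1 billion and a reporting threshold of 1E-15. It only builds and prints the command; it does not run HMMER. The file names (group1_xi.hmm, candidates.fasta, hits.tbl) are hypothetical placeholders, and the sketch assumes a profile HMM has already been built from an alignment of the listed sequences. The -Z, -E and --tblout options are standard HMMER 3 hmmsearch options, but the exact invocation used in US 20110318801 is not given here and is not implied.

```python
import shlex


def build_hmmsearch_command(hmm_path, fasta_path, tblout_path,
                            e_value=1e-15, z=1_000_000_000):
    """Return the argument list for an hmmsearch run with a fixed Z."""
    return [
        "hmmsearch",
        "-Z", str(z),                # database size used when computing E-values
        "-E", format(e_value, "g"),  # report sequences at or below this E-value
        "--tblout", tblout_path,     # tabular per-sequence hit summary
        hmm_path,                    # hypothetical profile HMM file
        fasta_path,                  # hypothetical candidate XI sequences
    ]


cmd = build_hmmsearch_command("group1_xi.hmm", "candidates.fasta", "hits.tbl")
print(shlex.join(cmd))
```

Fixing Z makes E-values comparable between searches run against sequence sets of different sizes, which is presumably why the definition specifies it.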

The term "Group II xylose isomerase" refers herein to a xylose isomerase (XI) protein that belongs to Group II as defined in the art, such as in Park and Batt ((2004) Applied and Environmental Microbiology 70:4318-4325), wherein Group II XIs are distinguished from Group I XIs in being typically longer than Group I XIs: about 440 to 460 amino acids vs about 380 to 390 amino acids, respectively. Group II XIs have only 20-30% amino acid identity with Group I XIs, while among Group I XIs there is amino acid identity of at least about 50%. Analysis of Group I and Group II XIs is more fully disclosed in US 20110318801, which includes a phylogenetic tree.
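The length and identity figures above can be turned into a rough first-pass check, sketched below in Python. This is only a heuristic under stated assumptions: the hard cut-offs are taken literally from the "about" ranges in the text, the function names are hypothetical, and the identity calculation expects two sequences that have already been aligned to equal length with "-" as the gap character. It does not replace the phylogenetic, SDP, or profile HMM criteria.

```python
def classify_xi_by_length(n_residues):
    """Rough group call from polypeptide length alone (heuristic only)."""
    if 380 <= n_residues <= 390:
        return "Group I"
    if 440 <= n_residues <= 460:
        return "Group II"
    return "undetermined"


def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


print(classify_xi_by_length(385))        # Group I
print(classify_xi_by_length(450))        # Group II
print(percent_identity("MKAL", "MKGL"))  # 75.0
```

A sequence whose length falls outside both ranges is reported as "undetermined" rather than forced into a group, since the text describes typical lengths, not limits.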

The term "E-value", as known in the art of bioinformatics, is "Expect-value" which provides the probability that a match will occur by chance. It provides the statistical significance of the match to a sequence. The lower the E-value, the more significant the hit.
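To show how the Z parameter interacts with an E-value threshold, the short Python sketch below assumes the convention used by HMMER-style searches, in which the E-value is the per-comparison P-value multiplied by the number of comparisons Z. The function names and the example P-values are hypothetical; the defaults of 1 billion and 1E-15 are the values named in the Group I xylose isomerase definition.

```python
def expected_hits(p_value, z):
    """E-value taken as per-comparison P-value times the number of comparisons."""
    return p_value * z


def passes_threshold(p_value, z=1_000_000_000, cutoff=1e-15):
    """True when the E-value at database size z is at or below the cutoff."""
    return expected_hits(p_value, z) <= cutoff


print(passes_threshold(1e-25))  # E-value about 1e-16, below the cutoff: True
print(passes_threshold(1e-20))  # E-value about 1e-11, above the cutoff: False
```

Under this convention the same match scores a larger E-value as Z grows, so a cutoff is only meaningful together with the Z at which it was computed.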

The term "gene" refers to a nucleic acid fragment that expresses a specific protein or functional RNA molecule, which may optionally include regulatory sequences preceding (5' non-coding sequences) and following (3' non-coding sequences) the coding sequence. "Native gene" or "wild type gene" refers to a gene as found in nature with its own regulatory sequences. "Chimeric gene" refers to any gene that is not a native gene, comprising regulatory and coding sequences that are not found together in nature. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature.

"Endogenous gene" refers to a native gene in its natural location in the genome of an organism. A "foreign" gene refers to a gene not normally found in the host organism, but that is introduced into the host organism by gene transfer. Foreign genes can comprise native genes inserted into a non-native organism, or chimeric genes.

The term "promoter" or "Initiation control regions" refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3' to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. Promoters which cause a gene to be expressed in most cell types at most times are commonly referred to as "constitutive promoters".

The term "expression", as used herein, refers to the transcription and stable accumulation of coding (mRNA) or functional RNA derived from a gene. Expression may also refer to translation of mRNA into a polypeptide. "Overexpression" refers to the production of a gene product in transgenic organisms that exceeds levels of production in normal or non-transformed organisms.

The term "transformation" as used herein, refers to the transfer of a nucleic acid fragment into a host organism, resulting in genetically stable inheritance. The transferred nucleic acid may be in the form of a plasmid maintained in the host cell, or some transferred nucleic acid may be integrated into the genome of the host cell. Host organisms containing the transformed nucleic acid fragments are referred to as "transgenic" or "recombinant" or "transformed" organisms.

The terms "plasmid" and "vector" as used herein, refer to an extrachromosomal element often carrying genes which are not part of the central metabolism of the cell, and usually in the form of circular double-stranded DNA molecules. Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear or circular, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3' untranslated sequence into a cell.

The term "operably linked" refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is affected by the other. For example, a promoter is operably linked with a coding sequence when it is capable of affecting the expression of that coding sequence (i.e., that the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in sense or antisense orientation.

The term "selectable marker" means an identifying factor, usually an antibiotic or chemical resistance gene, that is able to be selected for based upon the marker gene's effect, i.e., resistance to an antibiotic, wherein the effect is used to track the inheritance of a nucleic acid of interest and/or to identify a cell or organism that has inherited the nucleic acid of interest.

As used herein the term "codon degeneracy" refers to the nature in the genetic code permitting variation of the nucleotide sequence without affecting the amino acid sequence of an encoded polypeptide. The skilled artisan is well aware of the "codon-bias" exhibited by a specific host cell in usage of nucleotide codons to specify a given amino acid. Therefore, when synthesizing a gene for improved expression in a host cell, it is desirable to design the gene such that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.

The term "codon-optimized" as it refers to genes or coding regions of nucleic acid molecules for transformation of various hosts, refers to the alteration of codons in the gene or coding regions of the nucleic acid molecules to reflect the typical codon usage of the host organism without altering the polypeptide encoded by the DNA.
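The idea of codon optimization without altering the encoded polypeptide can be sketched in Python as follows. Everything specific here is an assumption for illustration: the seven-residue peptide, the "native" coding sequence, and the one-codon-per-amino-acid preference table are hypothetical and are not measured S. cerevisiae codon usage data or sequences from this application. Only the codon-to-amino-acid assignments are taken from the standard genetic code.

```python
# Subset of the standard genetic code needed for this example ("*" = stop).
GENETIC_CODE = {
    "ATG": "M", "GCG": "A", "GCT": "A", "GGC": "G", "GGT": "G",
    "AAA": "K", "AAG": "K", "CTG": "L", "TTG": "L",
    "AGC": "S", "TCT": "S", "GAA": "E", "GAG": "E", "TAA": "*",
}

# Hypothetical host preference: one chosen codon per amino acid.
PREFERRED_CODON = {
    "M": "ATG", "A": "GCT", "G": "GGT", "K": "AAG",
    "L": "TTG", "S": "TCT", "E": "GAA", "*": "TAA",
}


def translate(dna):
    """Translate a coding sequence up to, but not including, the first stop."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = GENETIC_CODE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def codon_optimize(dna):
    """Recode each codon with the preferred synonymous codon of the host."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return "".join(PREFERRED_CODON[GENETIC_CODE[c]] for c in codons)


native = "ATGGCGGGCAAACTGAGCGAATAA"   # hypothetical source coding region
optimized = codon_optimize(native)
print(optimized)                      # ATGGCTGGTAAGTTGTCTGAATAA
print(translate(native) == translate(optimized))  # True
```

Real codon optimization typically weighs codon frequencies across the whole gene rather than always choosing a single codon per residue; the fixed table is used here only to keep the sketch short.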

The term "carbon substrate" or "fermentable carbon substrate" refers to a carbon source capable of being metabolized by microorganisms. A type of carbon substrate is "fermentable sugars" which refers to oligosaccharides and monosaccharides that can be used as a carbon source by a microorganism in a fermentation process.

The term "lignocellulosic" refers to a composition comprising both lignin and cellulose. Lignocellulosic material may also comprise hemicellulose.

The term "cellulosic" refers to a composition comprising cellulose and additional components, which may include hemicellulose and lignin.

The term "saccharification" refers to the production of fermentable sugars from polysaccharides.

The term "pretreated biomass" means biomass that has been subjected to thermal, physical and/or chemical pretreatment to increase the availability of polysaccharides in the biomass to saccharification enzymes.

"Biomass" refers to any cellulosic or lignocellulosic material and includes materials comprising cellulose, and optionally further comprising hemicellulose, lignin, starch, oligosaccharides and/or monosaccharides. Biomass may also comprise additional components, such as protein and/or lipid. Biomass may be derived from a single source, or biomass can comprise a mixture derived from more than one source; for example, biomass could comprise a mixture of corn cobs and corn stover, or a mixture of grass and leaves. Biomass includes, but is not limited to, bioenergy crops, agricultural residues, municipal solid waste, industrial solid waste, sludge from paper manufacture, yard waste, wood and forestry waste. Examples of biomass include, but are not limited to, corn cobs, crop residues such as corn husks, corn stover, grasses, wheat, wheat straw, barley straw, hay, rice straw, switchgrass, waste paper, sugar cane bagasse, sorghum, components obtained from milling of grains, trees, branches, roots, leaves, wood chips, sawdust, shrubs and bushes, vegetables, fruits, flowers and animal manure.

"Biomass hydrolysate" refers to the product resulting from saccharification of biomass. The biomass may also be pretreated or pre-processed prior to saccharification.

The term "heterologous" means not naturally found in the location of interest. For example, a heterologous gene refers to a gene that is not naturally found in the host organism, but that is introduced into the host organism by gene transfer. For example, a heterologous nucleic acid molecule that is present in a chimeric gene is a nucleic acid molecule that is not naturally found associated with the other segments of the chimeric gene, such as the nucleic acid molecules having the coding region and promoter segments not naturally being associated with each other.

As used herein, an "isolated nucleic acid molecule" is a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. An isolated nucleic acid molecule in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.

A nucleic acid fragment is "hybridizable" to another nucleic acid fragment, such as a cDNA, genomic DNA, or RNA molecule, when a single-stranded form of the nucleic acid fragment can anneal to the other nucleic acid fragment under the appropriate conditions of temperature and solution ionic strength. Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory: Cold Spring Harbor, NY (1989), particularly Chapter 11 and Table 11.1 therein (entirely incorporated herein by reference). The conditions of temperature and ionic strength determine the "stringency" of the hybridization. Stringency conditions can be adjusted to screen for moderately similar fragments (such as homologous sequences from distantly related organisms), to highly similar fragments (such as genes that duplicate functional enzymes from closely related organisms).

Post-hybridization washes determine stringency conditions. One set of preferred conditions uses a series of washes starting with 6X SSC, 0.5% SDS at room temperature for 15 min, then repeated with 2X SSC, 0.5% SDS at 45 °C for 30 min, and then repeated twice with 0.2X SSC, 0.5% SDS at 50 °C for 30 min. A more preferred set of stringent conditions uses higher temperatures in which the washes are identical to those above except that the temperature of the final two 30 min washes in 0.2X SSC, 0.5% SDS is increased to 60 °C. Another preferred set of highly stringent conditions uses two final washes in 0.1X SSC, 0.1% SDS at 65 °C. An additional set of stringent conditions includes hybridization at 0.1X SSC, 0.1% SDS, 65 °C and washes with 2X SSC, 0.1% SDS followed by 0.1X SSC, 0.1% SDS, for example.

Hybridization requires that the two nucleic acids contain complementary sequences, although depending on the stringency of the hybridization, mismatches between bases are possible. The appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of Tm for hybrids of nucleic acids having those sequences. The relative stability (corresponding to higher Tm) of nucleic acid hybridizations decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for calculating Tm have been derived (see Sambrook et al., supra, 9.50-9.51). For hybridizations with shorter nucleic acids, i.e., oligonucleotides, the position of mismatches becomes more important, and the length of the oligonucleotide determines its specificity (see Sambrook et al., supra, 11.7-11.8). In one embodiment the length for a hybridizable nucleic acid is at least about 10 nucleotides. Preferably a minimum length for a hybridizable nucleic acid is at least about 15 nucleotides; more preferably at least about 20 nucleotides; and most preferably the length is at least about 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as length of the probe.

The term "complementary" is used to describe the relationship between nucleotide bases that are capable of hybridizing to one another. For example, with respect to DNA, adenosine is complementary to thymine and cytosine is complementary to guanine.
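As an illustration of the base-pairing rule just described, the following minimal Python sketch computes the strand that would anneal to a given DNA sequence. The function names are hypothetical and chosen only for this example; the sketch handles the four standard DNA bases and nothing else.

```python
# Watson-Crick pairing for DNA bases, as stated in the text (A-T, C-G).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the strand that would anneal to `seq`, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_complementary(strand_a: str, strand_b: str) -> bool:
    """True if the two strands pair perfectly over their full length."""
    return strand_b.upper() == reverse_complement(strand_a)
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`. Real hybridization tolerates mismatches depending on stringency, which this exact-match check does not model.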

The term "percent identity", as known in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, "identity" also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. "Identity" and "similarity" can be readily calculated by known methods, including but not limited to those described in: 1.) Computational Molecular Biology (Lesk, A. M., Ed.) Oxford University: NY (1988); 2.) Biocomputing: Informatics and Genome Projects (Smith, D. W., Ed.) Academic: NY (1993); 3.) Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H. G., Eds.) Humana: NJ (1994); 4.) Sequence Analysis in Molecular Biology (von Heijne, G., Ed.) Academic (1987); and 5.) Sequence Analysis Primer (Gribskov, M. and Devereux, J., Eds.) Stockton: NY (1991).

Preferred methods to determine identity are designed to give the best match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs.

Sequence alignments and percent identity calculations may be performed using the MegAlign program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI).

Multiple alignment of the sequences is performed using the "Clustal method of alignment" which encompasses several varieties of the algorithm including the "Clustal V method of alignment" corresponding to the alignment method labeled Clustal V (described by Higgins and Sharp, CABIOS. 5:151-153 (1989); Higgins, D.G. et al., Comput. Appl. Biosci., 8:189-191 (1992)) and found in the MegAlign v8.0 program of the LASERGENE bioinformatics computing suite (DNASTAR Inc.). For multiple alignments, the default values correspond to GAP PENALTY=10 and GAP LENGTH PENALTY=10. Default parameters for pairwise alignments and calculation of percent identity of protein sequences using the Clustal method are KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5. For nucleic acids these parameters are KTUPLE=2, GAP PENALTY=5, WINDOW=4 and DIAGONALS SAVED=4. After alignment of the sequences using the Clustal V program, it is possible to obtain a "percent identity" by viewing the "sequence distances" table in the same program.

Additionally the "Clustal W method of alignment" is available and corresponds to the alignment method labeled Clustal W (described by Higgins and Sharp, CABIOS. 5:151-153 (1989); Higgins, D.G. et al., Comput. Appl. Biosci. 8:189-191 (1992); Thompson, J.D. et al., Nucleic Acids Research, 22 (22): 4673-4680, 1994) and found in the MegAlign v8.0 program of the LASERGENE bioinformatics computing suite (DNASTAR Inc.). Default parameters for multiple alignment, stated as protein/nucleic acid, are GAP PENALTY=10/15, GAP LENGTH PENALTY=0.2/6.66, Delay Divergen Seqs(%)=30/30, DNA Transition Weight=0.5, Protein Weight Matrix=Gonnet Series, and DNA Weight Matrix=IUB. After alignment of the sequences using the Clustal W program, it is possible to obtain a "percent identity" by viewing the "sequence distances" table in the same program.
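To make the notion of "percent identity" concrete, the following minimal Python sketch scores two sequences that have already been aligned (for example by a Clustal program). It is an illustration only: the function names are hypothetical, and the convention used here (identical residues divided by ungapped aligned columns) is one of several in use and is an assumption, not necessarily the calculation behind the MegAlign "sequence distances" table.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps written as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 95.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With the toy alignment `"MKT-LV"` against `"MKTALI"`, five columns are compared and four match, so the function returns 80.0, which would not meet a 95% threshold.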

It is well understood by one skilled in the art that many levels of sequence identity are useful in identifying polypeptides, from other species, wherein such polypeptides have the same or similar function or activity. Useful examples of percent identities include, but are not limited to: 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%, or any integer percentage from 50% to 100% may be useful in identifying polypeptides of interest, such as 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%. Suitable nucleic acid fragments not only have the above identities but typically encode a polypeptide having at least 50 amino acids, preferably at least 100 amino acids, and more preferably at least 125 amino acids.

The term "sequence analysis software" refers to any computer algorithm or software program that is useful for the analysis of nucleotide or amino acid sequences. "Sequence analysis software" may be commercially available or independently developed. Typical sequence analysis software will include, but is not limited to: 1.) the GCG suite of programs (Wisconsin Package Version 9.0, Genetics Computer Group (GCG), Madison, WI); 2.) BLASTP, BLASTN, BLASTX (Altschul et al., J. Mol. Biol., 215:403-410 (1990)); 3.) DNASTAR (DNASTAR, Inc. Madison, WI); 4.) Sequencher (Gene Codes Corporation, Ann Arbor, MI); and 5.) the FASTA program incorporating the Smith-Waterman algorithm (W. R. Pearson, Comput. Methods Genome Res., [Proc. Int. Symp.] (1994), Meeting Date 1992, 111-20. Editor(s): Suhai, Sandor. Plenum: New York, NY). Within the context of this application it will be understood that where sequence analysis software is used for analysis, the results of the analysis will be based on the "default values" of the program referenced, unless otherwise specified. As used herein "default values" will mean any set of values or parameters that originally load with the software when first initialized.

Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described by Sambrook, J. and Russell, D., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (2001); and by Silhavy, T. J., Berman, M. L. and Enquist, L. W., Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1984); and by Ausubel, F. M. et al., Short Protocols in Molecular Biology, 5th Ed. Current Protocols, John Wiley and Sons, Inc., N.Y., 2002.

Additional methods used here are in Methods in Enzymology, Volume 194, Guide to Yeast Genetics and Molecular and Cell Biology (Part A, 2004, Christine Guthrie and Gerald R. Fink (Eds.), Elsevier Academic Press, San Diego, CA).

The present invention relates to engineered yeast strains that have xylose isomerase enzyme activity. A challenge for engineering yeast to utilize xylose, which is the second most predominant sugar obtained from cellulosic biomass, is to produce sufficient xylose isomerase activity in the yeast cell. Xylose isomerase catalyzes the conversion of xylose to xylulose, which is the first step in a xylose utilization pathway. Applicants have found that expression of a bacterial xylose isomerase in a yeast cell that expresses the chaperonins GroES and GroEL results in enzymatically active xylose isomerase, while there is little to no activity with expression of the bacterial xylose isomerase in a yeast cell lacking GroES and GroEL. A yeast cell expressing xylose isomerase activity provides a host cell for expression of a complete xylose utilization pathway, thereby enabling the engineering of a yeast cell that can produce a target compound, such as ethanol, butanol, or 1,3-propanediol, using xylose derived from lignocellulosic biomass as a carbon source.

Yeast Host Cells

Any yeast cells that either produce a target chemical, or can be engineered to produce a target chemical, may be used as host cells.

Examples of such yeasts include, but are not limited to, yeasts of the genera Kluyveromyces, Candida, Pichia, Hansenula, Schizosaccharomyces, Kloeckera, Schwanniomyces, Yarrowia, and Saccharomyces.

Engineering of a yeast cell for expression of xylose isomerase activity as disclosed herein and for production of a target chemical may occur simultaneously or in any order. In one embodiment, yeast cells that produce ethanol may be used as host cells in engineering to produce the present cells. In one embodiment the yeast cells are capable of anaerobic alcoholic fermentation. The yeast cells may naturally produce ethanol, or may be engineered to produce ethanol, or to produce increased yields of ethanol.

In other embodiments yeast cells that are engineered to express a pathway for synthesis of butanol or 1,3-propanediol are host cells, with engineering steps occurring in any order. Engineering of pathways for butanol synthesis (including isobutanol, 1-butanol, and 2-butanol) has been disclosed, for example in US 8,206,970, US 20070292927, US 20090155870, US 7,851,188, and US 20080182308, which are incorporated herein by reference. Engineering of pathways for 1,3-propanediol has been disclosed in US 6,514,733, US 5,686,276, US 7,005,291, US 6,013,494, and US 7,629,151, which are incorporated herein by reference.

For utilization of xylose as a carbon source, a yeast cell is engineered for expression of a complete xylose utilization pathway.

Engineering of yeast such as S. cerevisiae for production of ethanol from xylose is described in Matsushika et al. (Appl. Microbiol. Biotechnol. (2009) 84:37-53) and in Kuyper et al. (FEMS Yeast Res. (2005) 5:399-409). In one embodiment, in addition to engineering a yeast cell as disclosed herein to have xylose isomerase activity, the activities of other pathway enzymes are increased in the cell. Typically the activity levels of five pentose pathway enzymes are increased: xylulokinase (XKS1), transaldolase (TAL1), transketolase 1 (TKL1), D-ribulose-5-phosphate 3-epimerase (RPE1), and ribose 5-phosphate ketol-isomerase (RKI1). Any method known to one skilled in the art for increasing expression of a gene may be used. For example, as described herein in Example 5 these activities may be increased by expressing the host coding region for each protein using a highly active promoter. Chimeric genes for expression are constructed and are integrated into the yeast genome. Alternatively, heterologous coding regions for these enzymes may be expressed in the yeast cell to obtain increased enzyme activities. For additional methods for engineering yeast capable of metabolizing xylose see for example US7622284B2, US8058040B2, US 7,943,366 B2, WO2011153516A2, WO2011149353A1, WO2011079388A1, US20100112658A1, US20100028975A1, US20090061502A1, US20070155000A1, WO2006115455A1, US20060216804A1 and US8129171B2.

GroES and GroEL polypeptides in yeast host cells

Bacteria, as well as mitochondria and chloroplasts of eukaryotic cells, have a variety of proteins that assist in the folding of other proteins which are called chaperones. Chaperones that are called chaperonins include proteins named GroEL, HSP60, GroES, and HSP10, which are proteins that mediate folding to produce active enzymes. These chaperonins function in interacting pairs to form active complexes, for example GroEL with GroES, and Hsp60 with Hsp10. These complexes mediate the proper folding of certain proteins to convert them into an enzymatically active form. The present yeast cells express an interacting pair of Group I chaperonin polypeptides. No additional chaperonins or other chaperones are needed in the present cells to convert a xylose isomerase polypeptide into an active enzyme.

Any interacting pair of Group I chaperonin polypeptides may be expressed in the present cells. The individual chaperonin polypeptides of the pair may be from the same organism, or from different organisms, as long as together they form a functional complex. The chaperonins are expressed so that they are active in the cytoplasm of the cell. Chaperonins that are expressed in the nucleus of a eukaryotic cell and are transported into the mitochondria or chloroplast may be engineered so that they remain in the cytoplasm. The coding region for the transit signal sequence, which directs transport into the organelle, can be deleted so that the polypeptide remains in the cytoplasm. For example, Hsp60 and Hsp10 with the transit signal sequences removed may be expressed in a yeast cell to provide the present interacting pair of Group I chaperonin polypeptides in the cytoplasm.
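The deletion of a transit signal coding region can be pictured with the short Python sketch below. It is only a hedged illustration: the function name is hypothetical, the length of the transit sequence must be supplied by the user (no lengths are given in this document), and re-adding an ATG start codon when the remaining sequence lacks one is an assumption made for the example rather than a step stated in the text.

```python
def remove_transit_sequence(coding_region: str, transit_codons: int) -> str:
    """Delete the first `transit_codons` codons (counting the initiating ATG)
    from an in-frame coding region, restoring an ATG start if needed."""
    cds = coding_region.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    if not cds.startswith("ATG"):
        raise ValueError("coding region must begin with an ATG start codon")
    # Require at least one codon plus the stop codon to remain.
    if transit_codons < 1 or 3 * transit_codons >= len(cds) - 3:
        raise ValueError("transit_codons is out of range for this sequence")
    mature = cds[3 * transit_codons:]
    # Assumption for this sketch: the truncated gene needs its own start codon.
    return mature if mature.startswith("ATG") else "ATG" + mature
```

For a toy sequence `"ATGGCTGCTGCTAAGTAA"` with a four-codon transit sequence, the result is `"ATGAAGTAA"`.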

In one embodiment the interacting pair of Group I chaperonin polypeptides is derived from a bacterium. A wide variety of bacteria have chaperonins called GroEL and GroES polypeptides. In one embodiment the present yeast cells express bacterial GroEL and GroES polypeptides. Any bacteria-derived pair of GroEL and GroES polypeptides may be expressed in the present yeast cells. In various embodiments the GroEL and GroES proteins are encoded by genes of bacteria of the genera Actinoplanes, Escherichia, Bacillus, Streptomyces, Burkholderia, Citrobacter, Pseudomonas, Photobacterium, Pantoea, Plautia, Vibrio, Yokenella, Bacteroides, Ruminococcus, or Zymomonas. In one embodiment the GroEL and GroES polypeptides are derived from E. coli.

Typically a GroEL polypeptide is paired with its natural partner GroES polypeptide from the same bacterium. Examples of amino acid sequences of GroEL proteins that may be used in the present cells include, but are not limited to, SEQ ID NOs:1, 3, 5, 7, 9, 11, and 13. In various embodiments the GroEL polypeptide in the present cells has at least about 95% amino acid sequence identity to any of SEQ ID NOs:1, 3, 5, 7, 9, 11, and 13. The GroEL polypeptide may have at least about 95%, 96%, 97%, 98%, or 99% identity to any of SEQ ID NOs:1, 3, 5, 7, 9, 11, and 13. Because GroEL proteins are well known, and because of the prevalence of genomic sequencing, suitable GroEL polypeptides may be readily identified by one skilled in the art on the basis of sequence similarity using bioinformatics approaches. Typically BLAST (described above) searching of publicly available databases with known GroEL amino acid sequences, such as those provided herein, is used to identify GroEL polypeptides, and their encoding sequences, that may be used in the present strains. In one embodiment the GroEL polypeptide in the present cells has at least about 95% amino acid sequence identity to the amino acid sequence of the E. coli GroEL (SEQ ID NO:1) or to the amino acid sequence of the Actinoplanes missouriensis GroEL polypeptide of SEQ ID NO:3.

Examples of amino acid sequences of GroES polypeptides that may be used in the present cells include, but are not limited to, SEQ ID NOs:15, 17, 19, 21, 23, 25, and 27. In various embodiments the GroES polypeptide in the present cells has at least about 95% amino acid sequence identity to any of SEQ ID NOs:15, 17, 19, 21, 23, 25, and 27. The GroES polypeptide may have at least about 95%, 96%, 97%, 98%, or 99% identity to any of SEQ ID NOs:15, 17, 19, 21, 23, 25, and 27. Because GroES polypeptides are well known, and because of the prevalence of genomic sequencing, suitable GroES polypeptides may be readily identified by one skilled in the art on the basis of sequence similarity using bioinformatics approaches as described above. In one embodiment the GroES polypeptide in the present cells has at least about 95% amino acid sequence identity to the amino acid sequence of the E. coli GroES (SEQ ID NO:15) or to the amino acid sequence of the Actinoplanes missouriensis GroES polypeptide of SEQ ID NO:17.

The coding region for each GroEL and GroES polypeptide is readily obtained from the genome of the bacterial strain in which it is natively expressed, as well known to one skilled in the art. Native nucleotide sequences encoding each of these proteins may be codon optimized for expression in the yeast host cell to be engineered, as is well known to one skilled in the art. For example, codon-optimized coding sequences for expression in yeast for GroEL polypeptides are provided as SEQ ID NOs:2, 4, 6, 8, 10, 12, and 14, and for GroES polypeptides are provided as SEQ ID NOs:16, 18, 20, 22, 24, 26, and 28. The coding regions for GroEL and GroES are heterologous to the yeast cell. Thus heterologous nucleic acid molecules encoding GroEL and GroES polypeptides are introduced into a yeast cell for expression.
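Codon optimization as mentioned above can be sketched, in its simplest form, as back-translating a polypeptide with one chosen codon per amino acid. The Python example below is a rough illustration under stated assumptions: the codon choices in `ILLUSTRATIVE_CODONS` are valid codons of the standard genetic code but are placeholders, not a measured yeast codon usage table and not the method used to generate the SEQ ID NOs herein, and the function name is hypothetical. Practical optimization tools also weigh factors such as restriction sites and mRNA structure, which are ignored here.

```python
# One codon per amino acid from the standard genetic code.
# Placeholder choices for illustration only; substitute a real
# host-specific codon usage table before any practical use.
ILLUSTRATIVE_CODONS = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATT",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
    "*": "TAA",  # stop codon
}


def back_translate(protein: str, codon_choice=ILLUSTRATIVE_CODONS) -> str:
    """Return a coding sequence for `protein` (one-letter codes, no stop),
    using one fixed codon per residue and appending a stop codon."""
    codons = []
    for residue in protein.upper():
        if residue not in codon_choice or residue == "*":
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(codon_choice[residue])
    codons.append(codon_choice["*"])
    return "".join(codons)
```

For instance, `back_translate("MK")` yields `"ATGAAGTAA"` with the placeholder table above.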

Methods for gene expression in yeasts are known in the art (see for example Methods in Enzymology, Volume 194, Guide to Yeast Genetics and Molecular and Cell Biology (Part A, 2004, Christine Guthrie and Gerald R. Fink (Eds.), Elsevier Academic Press, San Diego, CA)).

Expression of genes in yeast typically requires a promoter, operably linked to a coding region of interest, and a transcriptional terminator. A number of yeast promoters can be used in constructing expression cassettes for genes encoding GroES and GroEL, including, but not limited to, constitutive promoters FBA1, GPD1, ADH1, GPM, TPI1, TDH3, PGK1, Ilv5, and the inducible promoters GAL1, GAL10, and CUP1. Suitable transcription terminators include, but are not limited to, FBAt, GPDt, GPMt, ERG10t, GAL1t, CYC1t, ADH1t, TAL1t, TKL1t, ILV5t, and ADHt.

Suitable promoters, transcriptional terminators, and GroEL and GroES coding regions may be cloned into E. coli-yeast shuttle vectors, and transformed into yeast cells. These vectors allow strain propagation in both E. coli and yeast strains.

Typically the vector contains a selectable marker and sequences allowing autonomous replication or chromosomal integration in the desired host. Typically used plasmids in yeast are shuttle vectors pRS423, pRS424, pRS425, and pRS426 (American Type Culture Collection, Rockville, MD), which contain an E. coli replication origin (e.g., pMB1), a yeast 2μ origin of replication, and a marker for nutritional selection. The selection markers for these four vectors are His3 (vector pRS423), Trp1 (vector pRS424), Leu2 (vector pRS425) and Ura3 (vector pRS426).

Additional vectors that may be used include pHR81 (ATCC #87541) and pRS313 (ATCC #77142). Construction of expression vectors with chimeric genes encoding GroEL and GroES may be performed by either standard molecular cloning techniques in E. coli or by the gap repair recombination method in yeast.

The gap repair cloning approach takes advantage of the highly efficient homologous recombination in yeast. Typically, a yeast vector DNA is digested (e.g., in its multiple cloning site) to create a "gap" in its sequence. The "gapped" vector and insert DNAs having sequentially overlapping ends (overlapping with each other and with the gapped vector ends, in the desired order of inserts) are then co-transformed into yeast cells which are plated on the medium containing the appropriate compound mixtures that allow complementation of the nutritional selection markers on the plasmids. The presence of correct insert combinations can be confirmed by PCR mapping using plasmid DNA prepared from the selected cells. The plasmid DNA isolated from yeast can then be transformed into an E. coli strain, e.g. TOP10, followed by mini preps and restriction mapping to further verify the plasmid construct. Finally the construct can be verified by sequence analysis. Like the gap repair technique, integration into the yeast genome also takes advantage of the homologous recombination system in yeast. Typically, a cassette containing a coding region plus control elements (promoter and terminator) and auxotrophic marker is PCR-amplified with a high-fidelity DNA polymerase using primers that hybridize to the cassette and contain 40-70 base pairs of sequence homology to the regions 5' and 3' of the genomic area where insertion is desired. The PCR product is then transformed into yeast cells which are plated on medium containing the appropriate compound mixtures that allow selection for the integrated auxotrophic marker. Transformants can be verified either by colony PCR or by direct sequencing of chromosomal DNA.
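The primer design described above for genomic integration (cassette-binding primers carrying 40-70 base pairs of homology to the target locus) can be sketched as follows. This is a minimal illustration, not a validated design tool: the function names are hypothetical, the 20-nucleotide cassette-annealing length is an assumption not taken from this document, and no melting temperature or secondary-structure checks are made. Flanking sequences are given as the top strand, 5' to 3', immediately upstream and downstream of the intended insertion site.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def integration_primers(upstream_flank: str, cassette: str,
                        downstream_flank: str,
                        homology_len: int = 40,
                        anneal_len: int = 20):
    """Return (forward, reverse) primers, each written 5' to 3'.

    Each primer has a 5' tail homologous to the genome and a 3' end that
    anneals to the cassette. `anneal_len` is an assumed, adjustable value.
    """
    if not 40 <= homology_len <= 70:
        raise ValueError("homology_len should be 40-70, per the text")
    if min(len(upstream_flank), len(downstream_flank)) < homology_len:
        raise ValueError("flanking sequences are shorter than homology_len")
    if len(cassette) < anneal_len:
        raise ValueError("cassette is shorter than anneal_len")
    up, cas, down = (s.upper() for s in
                     (upstream_flank, cassette, downstream_flank))
    # Forward primer: end of the 5' flank, then the start of the cassette.
    forward = up[-homology_len:] + cas[:anneal_len]
    # Reverse primer: bottom strand spanning cassette end and 3' flank start.
    reverse = reverse_complement(cas[-anneal_len:] + down[:homology_len])
    return forward, reverse
```

The resulting PCR product carries the chosen homology at both ends, which is what allows the yeast homologous recombination system to direct the cassette to the intended genomic location.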

Xylose isomerase enzyme activity in yeast host cells

Expression of xylose isomerases in yeast cells has been problematic; some xylose isomerases have been found to have little to no activity when expressed in yeast cells. For example, the xylose isomerase typically expressed to provide a xylose utilization pathway in Zymomonas, that from E. coli, was found to be barely active in S. cerevisiae, producing about 1000-fold lower activity than expected (Sarthy et al. (1987) Appl. Environ. Microbiol. 53: 1996-2000). A xylose isomerase disclosed in US 20110318801 as providing higher levels of activity in Zymomonas than the E. coli xylose isomerase, that from Actinoplanes missouriensis, is found herein to be inactive in S. cerevisiae.

In the present yeast cell, at least one gene encoding a xylose isomerase polypeptide is introduced together with at least one gene encoding each amino acid sequence of an interacting pair of Group I chaperonin polypeptides, which are described above. Expression of the xylose isomerase in the presence of the Group I chaperonins gives a higher xylose isomerase specific activity as compared with the specific activity of the same xylose isomerase enzyme expressed in the absence of the interacting pair of Group I chaperonin polypeptides.

Any polypeptide having increased xylose isomerase activity in the presence of Group I chaperonins, and belonging to the classification EC 5.3.1, may be expressed in the present yeast cells. In one embodiment the xylose isomerase is derived from a bacterium. In one embodiment the specific activity of the bacterial xylose isomerase is at least 50% of the xylose isomerase specific activity obtained in yeast cells expressing E. coli GroEL and GroES chaperonins, and E. coli xylose isomerase. The activity may be at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of this level.

Xylose isomerases are classified as belonging to Group I or Group II of xylose isomerases. In the present yeast cell a xylose isomerase polypeptide of either Group I or Group II may be introduced. Bacterial Group I and Group II xylose isomerases are described in US 20110318801, which is incorporated herein by reference. Examples of Group I xylose isomerases are disclosed in US 20110318801 as the even numbered sequences starting with SEQ ID NO:2 and ending with SEQ ID NO:130, as well as SEQ ID NOs:131-147 (Table 3 of US 20110318801). Coding regions for the xylose isomerases are the odd numbered sequences starting with SEQ ID NO:1 and ending with SEQ ID NO:129 (Table 3 of US 20110318801). Examples of Group II xylose isomerases are disclosed in US 20110318801 as SEQ ID NOs:148-306 (Table 4 of US 20110318801). The following are xylose isomerase amino acid sequences with their SEQ ID NOs herein, and in US 20110318801, respectively:

Actinoplanes missouriensis (SEQ ID NO:29; 66), E. coli (SEQ ID NO:31; 219), Streptomyces rubiginosus (SEQ ID NO:35; 128), Burkholderia phytofirmans (SEQ ID NO:37; 272), Burkholderia phymatum (SEQ ID NO:39; 258), and Photobacterium profundum (SEQ ID NO:47; 177).

Additional examples of xylose isomerases that may be used in the present yeast cell include those from Bacillus subtilis (SEQ ID NO:33), Citrobacter youngae (SEQ ID NO:41), E. blattae (SEQ ID NO:43), Pseudomonas fluorescens (SEQ ID NO:45), Pantoea stewartii (SEQ ID NO:49), Plautia stali symbiont (SEQ ID NO:51), Pseudomonas syringae (SEQ ID NO:53), Vibrio sp. XY-214 (SEQ ID NO:55), and Yokenella regensburgei (SEQ ID NO:57).

Further examples of xylose isomerases that may be used in the present yeast cell include amino acid sequences identified among translated open reading frames of a metagenomic cow rumen database (Matthias Hess, et al. Science 331:463-467 (2011)) by BLAST searching using xylose isomerase sequences from Ruminococcus flavefaciens FD-1 (SEQ ID NO:96) and Ruminococcus champanellensis 18P13 (SEQ ID NO:97). The sequences identified and tested herein (in Example 9) from an uncultured bacterium from cow rumen were named Ru2 (SEQ ID NO:98) and Ru3 (SEQ ID NO:100).

In one embodiment the xylose isomerase is derived from a bacterium of the genera Actinoplanes, Escherichia, Bacillus, Streptomyces, Burkholderia, Citrobacter, Pseudomonas, Photobacterium, Pantoea, Plautia, Vibrio, Yokenella, Bacteroides, Ruminococcus, or Zymomonas.

In various embodiments the xylose isomerase polypeptide in the present cells has at least about 95% amino acid sequence identity to any of the SEQ ID NOs listed above for xylose isomerases: those disclosed in US 20110318801 as the even numbered sequences starting with SEQ ID NO:2 and ending with SEQ ID NO:130, as well as SEQ ID NOs:131-147 (Table 3 of US 20110318801), also SEQ ID NOs:148-306 (Table 4 of US 20110318801), and additionally sequences herein that are SEQ ID NOs:29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, and 57. The amino acid sequence has at least about 95%, 96%, 97%, 98%, or 99% sequence identity to any of the SEQ ID NOs for xylose isomerases listed above and those referred to in US 20110318801. Because xylose isomerase proteins are well known, and because of the prevalence of genomic sequencing, suitable xylose isomerase proteins may be readily identified by one skilled in the art on the basis of sequence similarity using bioinformatics approaches. Typically BLAST (described above) searching of publicly available databases with known xylose isomerase amino acid sequences, such as those provided herein, is used to identify xylose isomerase proteins, and their encoding sequences, that may be used in the present strains.

The coding region sequence for each xylose isomerase polypeptide is readily obtained from the genome of the bacterial strain in which it is natively expressed, as is well known to one skilled in the art. Native nucleotide sequences encoding each of these proteins may be codon optimized for expression in the yeast host cell to be engineered, as is well known to one skilled in the art. Examples of coding sequences that are codon optimized for expression in S. cerevisiae, for xylose isomerases of SEQ ID NOs that are odd numbers starting with 29 and ending with 57 are SEQ ID NOs that are even numbers starting with 30 and ending with 58, as well as 59 and 60.

Methods for gene expression in yeasts are as described above for GroEL and GroES, and exemplified in Examples herein. The coding region for a bacterial xylose isomerase is heterologous to the yeast cell. Thus a heterologous nucleic acid molecule encoding a xylose isomerase polypeptide is introduced into a yeast cell for expression.

The present invention provides a method for producing a yeast strain that has xylose isomerase activity following the teachings above. In one embodiment a heterologous nucleic acid molecule encoding a GroEL polypeptide and a heterologous nucleic acid molecule encoding a GroES polypeptide, as well as a heterologous nucleic acid molecule encoding a bacterial xylose isomerase polypeptide, are introduced into a yeast cell. In the yeast cell the GroEL and GroES polypeptides are expressed in the cytosol of the cell, the xylose isomerase polypeptide is converted to an active xylose isomerase enzyme, and the specific activity of the xylose isomerase enzyme is higher as compared with the specific activity of the same xylose isomerase enzyme expressed in the absence of the GroEL and GroES polypeptides.

In one embodiment of the present yeast cell the at least one gene encoding xylose isomerase and the at least one gene encoding each amino acid sequence of an interacting pair of Group I chaperonin polypeptides are derived from the same organism. In one embodiment of the present yeast cell the at least one gene encoding xylose isomerase and the at least one gene encoding each amino acid sequence of an interacting pair of Group I chaperonin polypeptides are derived from different organisms. For example, the coding regions for the Group I chaperonin polypeptides may be derived from E. coli while the coding regions for the xylose isomerase may be derived from Citrobacter youngae, Yokenella regensburgei, or Pseudomonas syringae as in Example 7 herein. In one embodiment the xylose isomerase specific activity in a yeast cell, having the coding regions for the interacting pair of Group I chaperonin polypeptides and the coding region for the xylose isomerase derived from different organisms, is at least 50% of the specific activity in a yeast cell in which the coding regions for the interacting pair of Group I chaperonin polypeptides and the coding region for the xylose isomerase are derived from the same bacterium. The activity may be at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of this level.
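The comparison of specific activities described above reduces to a simple ratio, shown in the Python sketch below. The function names are hypothetical and any input values are the user's own measurements; the only requirement assumed here is that both activities are expressed in the same units (for example, units per mg of protein).

```python
def relative_specific_activity(test_activity: float,
                               reference_activity: float) -> float:
    """Specific activity of the test strain as a percent of the reference."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * test_activity / reference_activity


def meets_activity_threshold(test_activity: float,
                             reference_activity: float,
                             threshold_percent: float = 50.0) -> bool:
    """True if the test strain reaches the given percent of the reference."""
    return (relative_specific_activity(test_activity, reference_activity)
            >= threshold_percent)
```

Here the reference would be the strain in which the chaperonin and xylose isomerase coding regions come from the same bacterium, and the test strain the one in which they come from different organisms.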

In one embodiment the present yeast cell has, in combination, the features described above which provide xylose isomerase activity, as well as a complete xylose utilization pathway as described above, and thereby the cell is able to grow on xylose as a sole carbon source. In one embodiment the cell additionally produces a target compound that the cell either naturally synthesizes, or is engineered to synthesize. In various embodiments the target compound is ethanol, butanol, or 1,3-propanediol, pathways for which are referenced above. Thus the present cell is able to utilize xylose in the synthesis of a target compound. Xylose may be the sole carbon source, or it may be one component of the carbon source. Additional carbon source components may include glucose and other components that the cell is naturally able to metabolize, or is engineered to metabolize.

The present yeast cell expresses an active xylose isomerase enzyme when it is grown in a nutrient medium that supports growth of yeast cells. Thus the present invention provides a method for expressing an active bacterial xylose isomerase enzyme in yeast comprising:

a) providing a yeast host cell having at least one gene encoding each amino acid sequence of an interacting pair of Group I chaperonin polypeptides and at least one gene encoding a xylose isomerase polypeptide;

wherein: i) the interacting pair of Group I chaperonins are active in the cytosol of the cell;

ii) the xylose isomerase polypeptide is converted to an active xylose isomerase enzyme; and

iii) the specific activity of the xylose isomerase enzyme is higher as compared with the specific activity of the same xylose isomerase enzyme expressed in the absence of the interacting pair of Group I chaperonin polypeptides; and

b) growing the yeast cell of a) whereby xylose isomerase polypeptide is converted to an active xylose isomerase enzyme.

In one embodiment the yeast cell has a complete xylose utilization pathway and is grown in a medium using xylose as a sole carbon source. More typically, the yeast cell is grown in medium containing xylose as well as other sugars such as glucose and arabinose. This allows effective use of the sugars found in a hydrolysate medium that is prepared from cellulosic biomass by pretreatment and saccharification.

In one embodiment the yeast cell has a metabolic pathway that produces a target compound. In one embodiment the target compound is selected from the group consisting of ethanol, butanol, and 1,3-propanediol. Yeast cells having these metabolic pathways are described above.

EXAMPLES

The present invention is further defined in the following Examples. It should be understood that these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various uses and conditions.

GENERAL METHODS

The meaning of abbreviations is as follows: "kb" means kilobase(s), "bp" means base pairs, "nt" means nucleotide(s), "hr" means hour(s), "min" means minute(s), "sec" means second(s), "d" means day(s), "L" means liter(s), "ml" or "mL" means milliliter(s), "μL" means microliter(s), "μg" means microgram(s), "ng" means nanogram(s), "mg" means milligram(s), "mM" means millimolar, "μM" means micromolar, "nm" means nanometer(s), "μmol" means micromole(s), "pmol" means picomole(s), "XI" is xylose isomerase.

Standard recombinant DNA and molecular cloning techniques used here are well known in the art and are described by Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory: Cold Spring Harbor, NY (1989) (hereinafter "Maniatis"); by Silhavy, T. J., Berman, M. L. and Enquist, L. W., Experiments with Gene Fusions, Cold Spring Harbor Laboratory: Cold Spring Harbor, NY (1984); by Ausubel, F. M. et al., Current Protocols in Molecular Biology, published by Greene Publishing Assoc. and Wiley-Interscience, Hoboken, NJ (1987); and by Methods in Yeast Genetics, 2005, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Media and plates

YPD medium: 10 g/L yeast extract, 20 g/L peptone (both from Difco), plus varied glucose concentration

CM+Glucose-Ura plates (Teknova Inc, Hollister, CA)

CM+Glucose-His plates (Teknova Inc, Hollister, CA)

CM+Glucose-Ura liquid medium: 6.7 g/L yeast nitrogen base without amino acids (Amresco, Solon, OH), 0.77 g/L minus ura Drop Out supplement (Clontech Laboratories, Mountain View, CA), 20 g/L glucose

CM+Glucose-His liquid medium: 6.7 g/L yeast nitrogen base without amino acids (Amresco, Solon, OH), 0.77 g/L minus his Drop Out supplement (Clontech Laboratories, Mountain View, CA), 20 g/L glucose

CM+Glucose-Ura-His liquid medium: 6.7 g/L yeast nitrogen base without amino acids (Amresco, Solon, OH), 0.77 g/L minus ura/his Drop Out supplement (Clontech Laboratories, Mountain View, CA ), 20 g/L glucose

HPLC analysis

Fermentation samples were taken at timed intervals and analyzed for EtOH and xylose using either a Waters HPLC system (Alliance system, Waters Corp., Milford, MA) or an Agilent 1100 Series LC; conditions = 0.6 mL/min of 0.01 N H2SO4, injection volume = 10 μL, autosampler temperature = 10°C, column temperature = 65°C, run time = 25 min, detection by refractive index (maintained at 40°C). The HPLC column was purchased from BioRad (Aminex HPX-87H, BioRad Inc., Hercules, CA). Analytes were quantified by refractive index detection and compared to known standards.
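Quantification against known standards is commonly done with a linear calibration curve. The following is a minimal sketch of that approach, not the method actually used here: the function names and all standard concentrations and peak areas are illustrative assumptions.

```python
# Hypothetical external-standard calibration for refractive index detection.
# All numbers below are made up for illustration; they are not measured data.

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to get a concentration (g/L) from a peak area."""
    return (area - intercept) / slope


# Illustrative xylose standards (g/L) and their peak areas (arbitrary units).
std_conc = [1.0, 5.0, 10.0, 20.0]
std_area = [100.0, 500.0, 1000.0, 2000.0]

slope, intercept = fit_line(std_conc, std_area)
sample_conc = concentration_from_area(750.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, sample={sample_conc:.2f} g/L")
```

The same fit would be repeated for each analyte (for example EtOH and xylose), since each has its own detector response.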

Example 1

AMxylA, ECgroES, and ECgroEL expression cassettes constructed in yeast shuttle vectors

Vectors were prepared for yeast engineering to study whether the Actinoplanes missouriensis xylose isomerase (AMXI) can be expressed and function in Saccharomyces cerevisiae. AMXI is a group I xylose isomerase, which was found to provide higher activity than other prokaryotic xylose isomerases when expressed in Zymomonas mobilis, as described in US 20110318801. In addition, to study effects of co-expressing the Escherichia coli GroES and GroEL chaperonin coding sequences, ECgroEL and ECgroES, yeast shuttle vectors were constructed for their expression.

The AMxylA, ECgroES, and ECgroEL genes encode a 394-aa AMXI protein (SEQ ID NO:29), a 548-aa ECGroEL protein (SEQ ID NO:1), and a 97-aa ECGroES protein (SEQ ID NO:15), respectively. The gene sequences are available in GenBank with accession numbers of X16042, NC313150, and NC313151, respectively. Coding sequences for the proteins of SEQ ID NOs:29, 1, and 15 were codon-optimized for expression in S. cerevisiae (SEQ ID NOs:30, 2, and 16, respectively) and synthesized de novo in chimeric genes by GenScript Corporation (Piscataway, NJ). During the synthesis, a 1,184-nt promoter of the S. cerevisiae acetohydroxyacid reductoisomerase gene (ILV5p) with a 5' NotI site and a 3' PmeI site was added upstream of the 1,185-nt AMxylA coding sequence, and a 635-nt terminator of the S. cerevisiae acetohydroxyacid reductoisomerase gene (ILV5t) with a 5' SfiI site and a 3' XhoI site was added downstream of the AMxylA. The resulting synthesized DNA segment formed a 3,036-nt chimeric AMxylA expression cassette (SEQ ID NO:61).

A chimeric ECgroES expression cassette was synthesized that included a 679-nt promoter of the S. cerevisiae glyceraldehyde-3-phosphate dehydrogenase gene (GPDp) with a 5' BglII site, a 294-nt codon-optimized coding region of ECgroES and a 252-nt terminator of the S. cerevisiae iso-1-cytochrome C gene (CYC1t) with a 5' PacI site and a 3' NotI site. The resulting 1,247-nt chimeric ECgroES expression cassette is SEQ ID NO:62. A chimeric ECgroEL expression cassette was synthesized that included a 678-nt promoter of the S. cerevisiae alcohol dehydrogenase I gene (ADH1p) with a 5' EcoRI site, a 1,647-nt codon-optimized coding region of ECgroEL, and a 314-nt terminator of the S. cerevisiae alcohol dehydrogenase I gene (ADH1t) with a 5' PacI site and a 3' SpeI site. The resulting 2,678-nt chimeric ECgroEL expression cassette is SEQ ID NO:63.
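As a consistency check on the sizes quoted in this Example, the sketch below confirms that each coding-region length equals three nucleotides per residue plus one stop codon, and reports how many nucleotides of each cassette are not accounted for by promoter, coding region, and terminator. The remainder presumably corresponds to the added restriction sites and junction sequence, which the text does not itemize; the dictionary layout and function names are illustrative.

```python
# Sizes (aa and nt) are the values stated in Example 1.
# Treating the remainder as restriction-site/junction sequence is an assumption.

STOP_CODON_NT = 3


def coding_region_nt(n_residues):
    """Length in nt of an ORF encoding n_residues amino acids plus one stop codon."""
    return 3 * n_residues + STOP_CODON_NT


CASSETTES = {
    "AMxylA": {"protein_aa": 394, "promoter_nt": 1184, "terminator_nt": 635, "cassette_nt": 3036},
    "ECgroES": {"protein_aa": 97, "promoter_nt": 679, "terminator_nt": 252, "cassette_nt": 1247},
    "ECgroEL": {"protein_aa": 548, "promoter_nt": 678, "terminator_nt": 314, "cassette_nt": 2678},
}


def unaccounted_nt(name):
    """Cassette length minus promoter, coding region, and terminator."""
    c = CASSETTES[name]
    parts = c["promoter_nt"] + coding_region_nt(c["protein_aa"]) + c["terminator_nt"]
    return c["cassette_nt"] - parts


for name, c in CASSETTES.items():
    print(name, coding_region_nt(c["protein_aa"]), unaccounted_nt(name))
```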

The AMxylA expression cassette was cloned into a shuttle vector (ATCC #87541), generating a 9,766-bp vector called pHR81-AMXA (SEQ ID NO:64, see Figure 1A diagram). The pHR81 vector contains a pMB1 origin and an ampicillin resistance (ampR) marker to allow plasmid propagation and selection, respectively, in E. coli. In addition, pHR81 has a 2 micron replication origin, a URA3 selection marker, and LEU2-d for propagation and selection in yeast; LEU2-d selection is correlated with high copy number in S. cerevisiae when cells are grown in medium lacking leucine. Selection for URA3 produces a plasmid copy number of 20 to 40, while selection for LEU2-d produces a plasmid copy number of 100 to 200.

The AMxylA, ECgroES and ECgroEL expression cassettes were cloned into a pHR81 vector, resulting in a 13,921-bp vector called pHR81-AMXA-GELS (SEQ ID NO:65, see Figure 1B). In this vector the ECgroEL expression cassette is located downstream of the AMxylA expression cassette and the ECgroES expression cassette is downstream of the ECgroEL expression cassette, in the opposite orientation.

The ECgroES and ECgroEL expression cassettes were also cloned in a pRS423 vector in opposite orientation, forming a 9,684-bp vector called pRS423-GELS (SEQ ID NO:66, Figure 1A). Similar to pHR81, the pRS423 shuttle vector (ATCC 77104) provides a pMB1 origin and an ampR marker to allow plasmid propagation in E. coli. It also provides a 2 micron origin for plasmid propagation in S. cerevisiae but uses a HIS3 marker for selection, resulting in about 20 copies in S. cerevisiae.

The AMxylA, ECgroES, and ECgroEL expression cassettes were cloned together in a pRS313 shuttle vector in the same order as in pHR81-AMXA-GELS, forming a 12,642-bp vector called pRS313-AMXA-GELS (SEQ ID NO:67, see Figure 3A). Also the ECgroES and ECgroEL expression cassettes were cloned into pRS313 in the same order as in pRS423-GELS, producing a vector of 8,848 bp called pRS313-GELS (SEQ ID NO:68, see Figure 1B). The pRS313 backbone (ATCC #77142) contains a pMB1 origin and an ampR marker for propagation in E. coli. In addition it has a CEN6/ARSH4 origin and HIS3 marker for vector selection and maintenance in S. cerevisiae, resulting in 1 to 2 copies per cell.

Example 2

Characterization of A. missouriensis xylose isomerase expression in yeast together with E. coli GroES and GroEL expression

S. cerevisiae strain BY4741 (ATCC 4040002) is a common laboratory strain with a genotype of [MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0]. In order to transform it with the constructed yeast shuttle vectors, competent cells of BY4741 were prepared using the Frozen-EZ Yeast Transformation II Kit from Zymo Research (Orange, CA). Briefly, 1 mL of overnight grown BY4741 strain was diluted 10 fold using fresh YPD medium and cultured for 4 to 6 hours at 30 °C to reach mid-log phase. Cells were collected by centrifuging at 500 x g for 4 minutes, washed with EZ-1 solution, and then resuspended in 1 mL EZ-2 solution. The resulting competent cells could be stored at -80 °C. To introduce pHR81-AMXA-GELS and pRS313-AMXA-GELS into BY4741, 50 μL of competent cells were mixed with 1 μg (< 5 μL in volume) of vector DNA. Then, 500 μL of EZ-3 solution was added. The mixture was incubated at 30 °C for 1 hour, with vortexing every 15 minutes. The cells transformed with pHR81-AMXA-GELS were spread on CM+Glucose-Ura plates, while the cells transformed with pRS313-AMXA-GELS were spread on CM+Glucose-His plates. After 2 days incubation at 30 °C, the transformants grew and became visible. Colonies were streaked to a fresh CM+Glucose-Ura or CM+Glucose-His plate and grown for another 2 days. The resulting transformants containing pHR81-AMXA-GELS were named BY4741 SC8 and those containing pRS313-AMXA-GELS were named BY4741 SC9.

For characterization of AMxylA, ECgroEL, and ECgroES expression, two BY4741 SC8 transformants (#1 and #2), two BY4741 SC9 transformants (#1 and #2), and the parental BY4741 strain were grown in CM+Glucose-Ura, CM+Glucose-His, and YPD liquid media, respectively, at 30 °C overnight. Cell density reached an OD600 value of 3. In order to estimate copy numbers of the vectors, 1 μL of each overnight culture was mixed with 46 μL of TE buffer and 1 μL Zymolyase (Zymo Research, Orange, CA), incubated at 37 °C for 30 minutes, and then heated to 95 °C for 10 minutes. The prepared cell lysate samples were subjected to Real Time PCR to estimate plasmid copy number in each transformant, using an Applied Biosystems 7900 Sequence Detection System instrument. The target genes were URA3 for the pHR81 vector, HIS3 for the pRS313 vector, and TEF1 (which encodes translational elongation factor EF-1 alpha) as an internal control. Wild type S. cerevisiae cell lysate was also prepared to use as a control since it has one copy of genomic URA3 and HIS3. A 20-μL Real Time PCR reaction included the following reagents (2x TaqMan master Mix from ABI-Gene): 10 μL of ABI TaqMan Universal PCR Master Mix w/o UNG, 0.2 μL each of 100 μM forward and reverse primers, 0.05 μL of 100 μM TaqMan probe, 1 μL of cell lysate, and 8.55 μL RNase free water. The PCR primers and dual labeled TaqMan probes were designed using Primer Express v2.0 software from Applied Biosystems and were purchased from Sigma-Genosys (Woodlands, TX). Primers were qualified for real time quantitation using a dilution series of genomic DNA. A linear regression was performed for each primer and probe set and the efficiencies were confirmed to be within 90-110%. The primer and probe SEQ ID NOs are given in Table 4.
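The text does not state how efficiency was derived from the linear regression. A common convention, assumed here, is to regress Ct against log10 of template amount across the dilution series and convert the slope to an efficiency as E = 10^(-1/slope) - 1, so that a slope of about -3.32 corresponds to 100%. The function names and the example slopes below are illustrative.

```python
# Sketch of the usual standard-curve efficiency calculation (an assumed convention,
# not necessarily the exact procedure used for these primer and probe sets).

def amplification_efficiency(slope):
    """Efficiency from the slope of Ct versus log10(template amount).

    Returns a fraction: 1.0 means perfect doubling per cycle (100%).
    """
    return 10.0 ** (-1.0 / slope) - 1.0


def within_range(efficiency, low=0.90, high=1.10):
    """True if the efficiency falls inside the 90-110% acceptance window."""
    return low <= efficiency <= high


for slope in (-3.321928, -3.4, -4.0):  # illustrative slopes
    eff = amplification_efficiency(slope)
    print(f"slope={slope:.3f}  efficiency={eff * 100:.1f}%  accepted={within_range(eff)}")
```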

Table 4 Primers and probes used in Real Time PCR analysis

PCR reactions were heated at 95°C for 10 minutes, followed by 40 cycles of denaturing at 95°C for 15 seconds and annealing/extending at 60°C for 1 minute. All reactions were run in triplicate, and the results were averaged. The relative quantitation of the target genes URA3, HIS3, and TEF1 in the lysate samples was calculated using the ΔΔCt method (ABI User Bulletin). The Ct value of the TEF1 gene was used to normalize the quantitation of the URA3 and HIS3 genes for differences in the number of cells added to each reaction. The relative copy number (RCN) is the fold difference in the quantitation of the target genes in a strain relative to that in a wild type strain which has one copy of URA3 and HIS3. Though BY4741 has no URA3 and HIS3, in this experiment BY4741 showed one copy of URA3 and HIS3. Results are shown in Table 5.
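The ΔΔCt calculation described above can be sketched as follows. This is a minimal illustration assuming perfect doubling per cycle unless another efficiency is supplied; the function names and all Ct values are hypothetical and are not data from Table 5.

```python
# Minimal sketch of the delta-delta-Ct calculation, with TEF1 as the reference gene
# and the wild type strain as the calibrator. Ct values are invented for illustration.

def mean_ct(replicates):
    """Average Ct over replicate reactions (reactions were run in triplicate)."""
    return sum(replicates) / len(replicates)


def relative_copy_number(ct_target, ct_ref, ct_target_cal, ct_ref_cal, efficiency=1.0):
    """Fold difference of target in a sample relative to the calibrator strain.

    efficiency=1.0 assumes the template doubles every cycle (base 2).
    """
    delta_sample = ct_target - ct_ref              # normalize sample to TEF1
    delta_calibrator = ct_target_cal - ct_ref_cal  # normalize wild type to TEF1
    ddct = delta_sample - delta_calibrator
    return (1.0 + efficiency) ** (-ddct)


ura3_sample = mean_ct([20.0, 20.5, 19.5])  # hypothetical transformant, URA3
tef1_sample = mean_ct([25.0, 25.0, 25.0])  # hypothetical transformant, TEF1
ura3_wt = mean_ct([25.0, 25.0, 25.0])      # hypothetical wild type, URA3
tef1_wt = mean_ct([25.0, 25.0, 25.0])      # hypothetical wild type, TEF1

rcn = relative_copy_number(ura3_sample, tef1_sample, ura3_wt, tef1_wt)
print(f"RCN = {rcn:.1f}")
```

A strain with the same ΔCt as the calibrator gives an RCN of 1, which is why the wild type strain serves as the one-copy reference.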

Table 5 Relative Copy Numbers of URA3 and HIS3 genes in transformants containing pHR81-AMXA-GELS (BY4741 SC8) and those containing pRS313-AMXA-GELS (BY4741 SC9)

These data show that BY4741 SC8-1 and BY4741 SC8-2 strains propagated a large number of the pHR81-AMXA-GELS vector, but BY4741 SC9-1 and BY4741 SC9-2 strains only had a few copies of the pRS313-AMXA-GELS vector. BY4741, BY4741 SC9-1, and BY4741 SC9-2 strains have no genomic or plasmid-based URA3, but they show a low RCN rather than zero. A similar situation appears for the HIS3 gene in BY4741, BY4741 SC8-1, and BY4741 SC8-2 strains. These low numbers indicate background in the real time PCR assay.

To measure expression of transcripts, total RNA was isolated from the above overnight cultures using the Qiagen RNeasy Mini Kit (Valencia, CA), following the manufacturer's protocol. RNA concentration was determined by using a Nanodrop ND-1000 (Thermo Fisher Scientific, Wilmington, DE). Expression of AMxylA, ECgroEL, and ECgroES transcripts was examined by quantitative Real Time RT-PCR analysis on an Applied Biosystems 7900 Sequence Detection System instrument using a two-step method. Expression of S. cerevisiae TEF1 RNA was examined as an internal control. In order to eliminate residual genomic DNA, 2 μg of total RNA was first treated with DNAse for 15 minutes at room temperature followed by inactivation for 5 min at 75°C in the presence of 0.1 mM EDTA. cDNA was generated from 1 μg of DNAse treated RNA using the High Capacity cDNA Reverse Transcription Kit from Applied Biosystems according to the manufacturer's recommended protocol. A 20-μL Real Time PCR reaction included 10 μL ABI TaqMan Universal PCR Master Mix w/o UNG, 0.2 μL each of 100 μM forward and reverse primers, 0.05 μL of 100 μM TaqMan probe, 2 μL of 1:10 diluted cDNA, and 7.55 μL of RNAse free water. The PCR primers and dual labeled TaqMan probes were designed using Primer Express v2.0 software from Applied Biosystems and were purchased from Sigma-Genosys. Primers were qualified for real time quantitation using a dilution series of genomic DNA and the PCR conditions detailed below. A linear regression was performed for each primer and probe set and the efficiencies were confirmed to be within 90-110%. The primer and probe SEQ ID NOs are given in Table 6.

Table 6 Primers and probes used in Real Time PCR analysis

PCR reactions were heated at 95°C for 10 minutes, followed by 40 cycles of denaturing at 95°C for 15 seconds and annealing/extending at 60°C for 1 minute. All reactions were run in triplicate and the results averaged. The relative quantitation of the AMxylA, ECgroEL, ECgroES and S. cerevisiae TEF1 transcripts in the RNA samples was calculated using the ΔΔCt method (ABI). The Ct value of TEF1 RNA was used to normalize the quantitation of the target transcripts for differences in the amount of total RNA added to each reaction. The relative quantitation (RQ) value is the fold difference in expression of the target transcripts in a strain relative to that in the BY4741 SC9-1 strain. The results in Table 7 show that, relative to BY4741 SC9-1, its sibling strain BY4741 SC9-2 that only contains a few copies of pRS313-AMXA-GELS expressed similar amounts of the target transcripts; BY4741 SC8-1 and BY4741 SC8-2 strains that contain more copies of the pHR81-AMXA-GELS vector expressed much more of the target transcripts, especially ECgroEL and ECgroES transcripts. The BY4741 control that contains no vector did not express any of the target transcripts.

Table 7 Relative Quantitation of transcripts in transformants containing pHR81-AMXA-GELS (BY4741 SC8) and those containing pRS313-AMXA-GELS (BY4741 SC9)

In order to make protein extracts, cells from the above overnight cultures were collected by centrifugation and resuspended in Cell Breaking Buffer (CBB) which contains 10 mM TEA (pH 7.5), 10 mM MgSO4, 10 mM MnCl2, 1 mM DTT, and Roche complete Mini EDTA-free protease inhibitor cocktail (Indianapolis, IN) in an amount of 1 tablet per 50 mL CBB. One milliliter of the cell re-suspension was added into a 2-mL screw-cap bead beating tube containing approximately 1 g of VWR 400 micron acid washed silica beads, and subjected to breakage on a Minibeadbeater (BioSpec products, Bartlesville, OK) using 3 x 1 minute cycles with chilling of the tubes on ice between cycles. The tubes were then centrifuged for 1 min at 15,000 x g to pellet large particles and reduce foaming, and 600 μL of supernatant was transferred to a new microcentrifuge tube and centrifuged at 15,000 x g for one hour at 4°C. Finally, 500 μL of the supernatant was transferred to a new microcentrifuge tube and stored as protein extract.

Total protein concentrations in the protein extracts were determined in triplicate on a microtiter plate, using Thermo Scientific Coomassie protein assay reagent (Rockford, IL) and following the manufacturer's instructions. BSA was used as the protein standard. Xylose isomerase (XI) activities in the protein extracts were measured on a Varian Cary 300 Bio spectrophotometer (Agilent Technologies, Santa Clara, CA) at 30°C. At first, 0.8 mL of XI assay stock solution (10 mM TEA, pH 7.5, 10 mM MgSO4 heptahydrate, 10 mM MnCl2, 0.28 mM NADH, 1 μL/mL sorbitol dehydrogenase) was added to a quartz cuvette and placed on the cuvette holder of the instrument to allow temperature equilibration for 10 minutes. Then, 0.1 mL of the diluted protein extract was added and A340 nm was monitored until a stable linear reading was reached. Finally, 0.1 mL of 0.5 M xylose was added to start the reaction. Monitoring at A340 nm resulted in a slope of A340 nm change (dA340/min), which was used to calculate XI activity. One unit of XI activity was defined as the formation of 1 μmole of D-xylulose per minute at 30°C. It was calculated in equations as follows: U (μmole/min) = slope (dA340/min) * volume of reaction (μL) / 6220 / 1 cm; Specific activity (μmole/min-mg) = μmole/min/protein concentration (mg) (US Patent Application 20080081358).
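The unit and specific-activity equations can be written out as a short calculation. The sketch below is illustrative: 6220 is taken to be the molar extinction coefficient of NADH at 340 nm (in M^-1 cm^-1), the protein term is interpreted as milligrams of protein present in the cuvette, and the slope and protein values are hypothetical. Because NADH is consumed, the measured slope is negative, so its magnitude is used.

```python
# Sketch of the XI unit calculation; input values are hypothetical.

NADH_EXTINCTION_PER_M_CM = 6220.0  # assumed meaning of the constant 6220 in the equation


def xi_units(slope_per_min, reaction_volume_ul=1000.0, path_cm=1.0):
    """XI activity in micromole/min from the A340 slope (absorbance units per minute).

    slope / extinction gives mol/L per minute; multiplying by the reaction
    volume in microliters converts this to micromole per minute.
    """
    return abs(slope_per_min) * reaction_volume_ul / NADH_EXTINCTION_PER_M_CM / path_cm


def specific_activity(units, protein_mg):
    """Micromole/min per mg of protein in the assay (an interpretation of 'mg')."""
    return units / protein_mg


units = xi_units(-0.0622)               # hypothetical slope of -0.0622 A340/min
activity = specific_activity(units, 0.05)  # hypothetical 0.05 mg protein in the cuvette
print(f"{units:.4f} U, {activity:.3f} U/mg")
```

The default reaction volume of 1000 μL reflects the 0.8 mL of stock solution, 0.1 mL of extract, and 0.1 mL of xylose described above.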

The results given in Table 8 demonstrate that both BY4741 SC8 and BY4741 SC9 strains had XI activity, but the BY4741 strain did not. Specific activity in the BY4741 SC8 strains was at least about 4-fold higher than that in the BY4741 SC9 strains, indicating that higher copy number of the pHR81-AMXA-GELS vector in BY4741 SC8 strains supported a higher level of expression of AMXI activity.

Table 8 Xylose isomerase activity in transformants containing pHR81-AMXA-GELS (BY4741 SC8) and those containing pRS313-AMXA-GELS (BY4741 SC9)

Example 3

Expression of A. missouriensis xylose isomerase in yeast alone or with E. coli GroES and GroEL

To determine whether the A. missouriensis xylose isomerase alone can be expressed as an active enzyme in yeast, or whether it requires the E. coli chaperonins GroEL and GroES, the pHR81-AMXA vector was transformed into competent cells of the S. cerevisiae BY4741 strain as described in Example 2. Transformants were selected on a CM+Glucose-Ura plate and recovered strains were named BY4741 SC5.

In addition, a 5-μL DNA mixture containing 1 μg pHR81-AMXA and 1 μg pRS423-GELS, and another 5-μL DNA mixture containing 1 μg pHR81-AMXA and 1 μg pRS313-GELS were each used to transform 50 μL of competent cells of the S. cerevisiae BY4741 strain. The transformants were selected on Teknova CM+Glucose-Ura-His plates. Resulting strains having pHR81-AMXA and pRS423-GELS were named BY4741 SC6 while those containing pHR81-AMXA and pRS313-GELS were named BY4741 SC7.

For characterization of AMxylA, ECgroEL, and ECgroES expression in these strains, two BY4741 SC6 transformants (#1 and #2) and two BY4741 SC7 transformants (#1 and #2) were grown in CM+Glucose-Ura-His liquid medium at 30°C overnight. Two BY4741 SC5 transformants (#1 and #2) were grown in CM+Glucose-Ura liquid medium at 30°C overnight. Cell density reached an OD600 value of 3.

In order to estimate relative copy number of the transformed vectors in these strains, the cell lysate was prepared, real time PCR was performed, and RCN was calculated as described in Example 2. The target genes were URA3 for the pHR81-AMXA vectors, HIS3 for pRS313-GELS and pRS423-GELS vectors, and TEF1 as an internal control. Wild type S. cerevisiae DNA was used as a standard for containing one copy of URA3 and HIS3. The RCN is the fold difference in the quantitation of the target genes in a strain relative to that in the wild type strain. Results shown in Table 9 confirm that all tested strains contained a large number of pHR81-AMXA vectors.

Table 9 Relative Copy Numbers of URA3 and HIS3 genes in transformants containing no chaperonins (BY4741 SC5), those containing pHR81-AMXA and pRS423-GELS (BY4741 SC6), and those containing pHR81-AMXA and pRS313-GELS (BY4741 SC7)

The results showed that the BY4741 SC6 strains contained about 3- to 4-fold more copies of the pRS423-GELS vector than the BY4741 SC7 strains contained of the pRS313-GELS vector (HIS3 assay). Since BY4741 SC5 strains did not receive either of the pRS313-GELS and pRS423-GELS vectors, the 2.5 RCN of HIS3 for these strains represents background in the real time PCR assay. Copies of the pHR81-AMXA vector showed variation among the strains, potentially with some influence of the presence of a second vector.

In order to determine the expression levels of AMxylA, ECgroEL, and ECgroES transcripts in these strains, total RNA was isolated, quantitative real time RT-PCR analysis was performed, and relative quantitation of transcripts was calculated as described in Example 2. Expression of S. cerevisiae TEF1 RNA was examined as an internal control. The RQ value is the fold difference in expression of the target transcript in a strain relative to that in the BY4741 SC9-1 strain, which was shown in the previous example. The results given in Table 10 indicate that all strains express AMxylA transcripts, though at different levels which correlate with the vector copy numbers in Table 9. The BY4741 SC5 strains express no ECgroEL and ECgroES transcripts due to absence of these genes. The BY4741 SC7 and BY4741 SC6 strains expressed ECgroEL and ECgroES transcripts, again at relative levels correlating with the vector copy numbers in Table 9.

Table 10 Relative Quantitation of transcripts in transformants containing no chaperonins (BY4741 SC5), those containing pHR81-AMXA and pRS423-GELS (BY4741 SC6), and those containing pHR81-AMXA and pRS313-GELS (BY4741 SC7)

In order to measure xylose isomerase activity in these strains, protein extracts were prepared and XI activities were measured and calculated as described in Example 2. The results in Table 11 show that the BY4741 SC5 strains did not have XI activity, indicating that though the AMxylA transcript was present, enzymatically active protein was not produced. The BY4741 SC7 and BY4741 SC6 strains contained levels of XI activity correlating with the relative levels of transcripts and vector copy numbers. Thus expression of ECgroEL and ECgroES in either the pRS423-GELS or pRS313-GELS vector enabled functional expression of AMxylA. Specific activity of AMXI in BY4741 SC6 strains was significantly higher than that in BY4741 SC7 strains as shown in Table 11.

Table 11 Xylose isomerase activity in transformants containing no chaperonins (BY4741 SC5), those containing pHR81-AMXA and pRS423-GELS (BY4741 SC6), and those containing pHR81-AMXA and pRS313-GELS (BY4741 SC7)

Example 4

Expression of Additional Prokaryotic Xylose Isomerases in S. cerevisiae with and without co-expression of GroEL/GroES

Construction of additional prokaryotic xylose isomerase expression plasmids

To test whether other bacterial xylose isomerases also require GroEL and GroES for functional expression in S. cerevisiae, three other proteins were evaluated using the same fungal host strain and the same expression plasmid that was used for the A. missouriensis xylose isomerase (AMXI). Enzymes tested were from both gram negative (E. coli) and gram positive (Bacillus subtilis and Streptomyces rubiginosus) bacteria. The amino acid sequences that were used for the three test proteins were based on the published amino acid sequences for E. coli xylose isomerase (ECXI), B. subtilis xylose isomerase (BSXI), and S. rubiginosus xylose isomerase (SRXI), which have GenBank accession numbers AAB18542 (SEQ ID NO:31), AFQ57693.1 (SEQ ID NO:33), and AAA26838.1 (SEQ ID NO:35), respectively. As was the case for AMXI, the nucleotide sequences for their open reading frames were codon-optimized for expression in S. cerevisiae and synthesized by GenScript Corporation (Piscataway, NJ). All three synthetic DNA fragments were prepared with a PmeI site just upstream of the start codon and a unique SfiI site immediately following the stop codon. The two restriction sites were used for cloning purposes as described below. The codon-optimized nucleotide sequences for the ECXI, BSXI, and SRXI synthetic DNA fragments are given as SEQ ID NO:32, SEQ ID NO:34 and SEQ ID NO:36, respectively. Table 12 shows the similarity of these proteins to each other and to AMXI at the amino acid sequence level (% Identity). Note that the two most closely related proteins are only 67% identical.

Table 12 Xylose Isomerases amino acid sequence % identity

As described in Example 1, plasmid pHR81-AMXA (SEQ ID NO:64, Figure 1A) is a high-copy number expression plasmid for AMXI. The 5'-end of the codon-optimized AMXA open reading frame is attached to the ILV5 promoter (ILV5p) and its 3'-end is attached to the ILV5 transcriptional terminator (ILV5t). The entire open reading frame is conveniently located between a unique PmeI site that is just upstream from the start codon and a unique SfiI site that is immediately after the stop codon. To generate corresponding expression plasmids for ECXI, BSXI, and SRXI, plasmid pHR81-AMXA was digested with PmeI and SfiI and the large vector fragment was purified by agarose gel electrophoresis. The purified vector fragment was then ligated to each of the three synthetic, codon-optimized bacterial xylose isomerase DNA fragments described above after they too were digested with PmeI and SfiI. The resulting xylose isomerase expression plasmids were called pHR81 ilv5p xylA (ECXI), pHR81 ilv5p xylA (BSXI), and pHR81 ilv5p xylA (SRXI).

Introduction of xylose isomerase and GroEL/GroES plasmids into S. cerevisiae

Competent BY4741 (ATCC 4040002) cells were prepared using the Frozen-EZ Yeast Transformation II Kit from Zymo Research (Orange, CA) and the vendor's protocol as described above. To generate strains that only express AMXI, ECXI, BSXI, or SRXI, without co-expression of GroEL/GroES, 50 μL of ice-thawed BY4741 competent cells was mixed with 1 μg plasmid DNA (either pHR81-AMXA, pHR81 ilv5p xylA (ECXI), pHR81 ilv5p xylA (BSXI), or pHR81 ilv5p xylA (SRXI)), and 500 μL of EZ-3 solution was added. After a 1-hr incubation period at 30 °C with shaking at 220 rpm, the mixtures were spread onto CM+Glucose-Ura plates, and the plates were incubated for two days at 30°C until colonies appeared. Two colonies from each transformation reaction were randomly selected for further characterization, and were patched onto a fresh CM+Glucose-Ura plate. The resulting strains were named Am-A and Am-B for the AMXI strains; Ec-A and Ec-B for the ECXI strains; Bs-A and Bs-B for the BSXI strains; SR-A and SR-B for the SRXI strains.

To generate an analogous series of strains that co-express the E. coli GroEL and GroES chaperonins in addition to the above bacterial xylose isomerases, we used the GroEL/GroES expression plasmid pRS423-GELS (SEQ ID NO:9, Figure 2) that is described in detail in Example 1. The transformation protocol was the same as that described above, but 1 μg of pRS423-GELS and 1 μg of xylose isomerase expression plasmid DNA was added to the competent cells, and transformants were plated onto CM+Glucose-Ura-His plates to select for both of the plasmids. The plates were incubated for 2 days at 30 °C until colonies appeared. Two colonies from each transformation reaction were randomly selected for further characterization, and they were patched onto a fresh CM+Glucose-Ura-His plate. These strains were named Am/GroEL/ES-A and -B; Ec/GroEL/ES-A and -B; Bs/GroEL/ES-A and -B; SR/GroEL/ES-A and -B.

Preparation of cell-free extracts and protocol for measuring xylose isomerase activity

The eight strains that only had a xylose isomerase expression plasmid were grown overnight at 30 °C in CM+Glucose-Ura liquid medium to an OD600 value of 3.0-4.7. Thirty milliliter aliquots of the cultures were then harvested by centrifugation, and the drained cell pellets were rapidly frozen on dry ice and stored at -80 °C. The same procedure was used for the eight strains that also had the GroEL/GroES expression plasmid but the growth medium was CM+Glucose-His-Ura.

Cell breaking buffer was prepared with 10 mM TEA, pH 7.5, 10 mM MgSO4, 10 mM MnCl2, 1 mM DTT, and one tablet of complete Mini, EDTA-free protease inhibitor cocktail (Roche Diagnostics GmbH, Mannheim, Germany) in 50 mL total volume. Bead beating tubes were prepared with approximately 1 gram of 400 micron acid washed silica beads (VWR) in a 2-mL screw cap tube. Cell pellets were resuspended to a concentration of 100 OD units/mL of breaking buffer and 1 mL of this suspension was added to the bead beating tube. Tubes were stored on ice. Cell breakage was performed using a Minibeadbeater (Biospec Products; Bartlesville, OK) using 3 x 1 minute cycles with chilling of the tubes on ice between cycles. Tubes were centrifuged for 1 min at 15,000 x g to pellet large particles and reduce foaming. A 600-μL aliquot was removed and transferred to a new microcentrifuge tube. These were centrifuged at 15,000 x g for one hour at 4°C. A 500-μL aliquot of the supernatant was transferred to a new microcentrifuge tube and stored on ice. The samples were then diluted 15-fold by adding 20 μL of the extract to 280 μL of breaking buffer. The remaining extract was frozen on dry ice and transferred to the -80°C freezer while the dilutions were stored on ice until analysis, which was carried out on the same day.
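The resuspension and dilution arithmetic in this protocol can be sketched as follows. The sketch assumes the common convention that one OD unit is the amount of cells in 1 mL of culture at OD600 = 1 (the text does not define the term), and the function names are illustrative.

```python
# Sketch of the resuspension and dilution arithmetic.
# "OD unit" = OD600 x culture volume in mL is an assumed convention.

def breaking_buffer_ml(od600, culture_ml, target_od_units_per_ml=100.0):
    """Buffer volume needed to resuspend a pellet to the target cell density."""
    return od600 * culture_ml / target_od_units_per_ml


def dilution_factor(sample_ul, diluent_ul):
    """Fold dilution when sample_ul of extract is mixed with diluent_ul of buffer."""
    return (sample_ul + diluent_ul) / sample_ul


# A 30 mL culture harvested at the low and high ends of the reported OD600 range.
for od in (3.0, 4.7):
    print(f"OD600 {od}: resuspend in {breaking_buffer_ml(od, 30.0):.2f} mL")

print(f"dilution: {dilution_factor(20.0, 280.0):.0f}-fold")
```

Under this convention a 30 mL culture at OD600 = 3.0 would be resuspended in about 0.9 mL, which is slightly less than the 1 mL loaded per bead beating tube.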

Xylose isomerase enzyme activity was measured spectrophotometrically by monitoring NADH disappearance at 340 nm, using a coupled enzyme assay with sorbitol dehydrogenase. A stock xylose isomerase assay solution was prepared by adding the volumes found in Table 13 to a tube that was stored at room temperature. A solution of 0.5 M xylose was also prepared and stored in a separate tube.

All chemicals were obtained from Sigma Aldrich and the source of sorbitol dehydrogenase was sheep liver.

Table 13 Assay stock solution composition and final assay concentration of components

| Reagent | Volume for stock solution | Final concentration in assay (1 mL assay volume) |
|---|---|---|
| 1 M TEA, pH 7.5 | 200 μL | 10 mM |
| 1 M MgSO4 heptahydrate | 200 μL | 10 mM |
| 1 M MnCl2 | 200 μL | 10 mM |
| NADH (1 mg/mL in water) | 4 mL | 0.28 mM |
| Sorbitol Dehydrogenase (10 U/mL in water) | 2 mL | 1 U/mL |
| Water | 9.4 mL | |

A Cary 300 Bio spectrophotometer (Varian, Inc., purchased by Agilent Technologies, Santa Clara, CA) was set up for a 10 minute assay time and the cuvette block heater was set to 30 °C. Eight hundred microliters of the assay stock solution was added to a quartz cuvette and inserted into the instrument cuvette holder, and the temperature was allowed to equilibrate for 10 minutes. One hundred microliters of the extract dilution was added to the cuvette and monitoring at A340 nm was initiated. This was continued until a stable linear signal was obtained (background), which typically took 2-4 minutes. Next, 100 μL of 0.5 M xylose was added to start the reaction. Monitoring at A340 nm continued until a stable linear signal was obtained (signal), which typically took 2-4 minutes. The resulting change in slope at A340 nm was then used to calculate XI activity. One unit of enzyme activity was defined as the formation of 1 μmole of D-xylulose per minute at 30 °C. Activity was calculated using the following equations:

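U (μmole/min) = slope (dA340/min) * volume of reaction (μL) / 6220 / 1 cm

Specific activity (μmole/min-mg) = μmole/min / protein concentration (mg)

(US Patent Application 20080081358). Here 6220 is the molar extinction coefficient of NADH at 340 nm (M^-1 cm^-1) and 1 cm is the cuvette path length.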
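The two equations can be illustrated with a short calculation. This is a minimal sketch: the function names and the example slopes are hypothetical, and "protein (mg)" is interpreted here as the mass of protein present in the cuvette, which the text does not state explicitly.

```python
NADH_EXTINCTION_M_CM = 6220.0  # M^-1 cm^-1 at 340 nm, the value used in the equation


def xi_units(background_slope, signal_slope, reaction_volume_ul=1000.0, path_cm=1.0):
    """Activity in U (umol/min) from two A340 slopes given in dA340/min.

    NADH is consumed, so A340 falls; the xylose-dependent rate is the extra
    decrease of the signal slope relative to the background slope.
    dA/(extinction * path) gives mol/L per min; multiplying by uL gives umol/min.
    """
    delta = background_slope - signal_slope  # positive when xylose speeds up NADH loss
    return delta * reaction_volume_ul / NADH_EXTINCTION_M_CM / path_cm


def specific_activity(units, protein_mg):
    """U per mg protein (assumed: mg of protein present in the cuvette)."""
    return units / protein_mg


if __name__ == "__main__":
    # Hypothetical slopes for illustration only, not measured data
    u = xi_units(background_slope=-0.002, signal_slope=-0.064)
    print(u, specific_activity(u, protein_mg=0.02))
```

With the 1 mL (1000 μL) reaction volume used in the assay, a change in slope of 0.0622 A340 units per minute corresponds to 0.01 U.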

A protein assay was performed on the same dilutions used for the xylose isomerase activity assay. A 15-μL aliquot of each sample was added to a microtiter plate in triplicate. BSA protein standards (ThermoFisher Scientific) were also added in triplicate. Then 300 μL of Coomassie Plus - The Better Bradford Assay Reagent (ThermoFisher Scientific) was added and the plate was equilibrated at room temperature for 15 minutes. The A595 nm was obtained. A trend line with a polynomial fit was used for the standards to calculate the protein concentration for the samples.

As already noted, the four bacterial xylose isomerases that were used in the present work were chosen because previous attempts in other laboratories to express them in S. cerevisiae did not result in significant amounts of catalytically active enzymes. Indeed, the results shown in Table 14 validate these earlier observations: none of the four proteins produced detectable amounts of enzyme activity when they were expressed in S. cerevisiae in the absence of E. coli GroEL and GroES. In marked contrast, all of the proteins yielded active enzymes when they were co-expressed with GroEL/GroES in the same fungal host. The highest enzyme activity was obtained with the E. coli homolog (ECXI), which had a specific activity of > 0.5 U/mg protein. However, the three other test proteins also resulted in reasonable amounts of catalytically active enzyme based on literature values for other xylose isomerases that do not require GroEL and GroES for functional expression in S. cerevisiae. The above experiments provide a dramatic demonstration of the beneficial effects of bacterial molecular chaperones on the functional expression of prokaryotic xylose isomerases that would otherwise fail to fold properly in yeast cytosol.

Table 14 Bacillus subtilis and Streptomyces rubiginosus xylose isomerase activity assay results

Example 5

Up-Regulation of the Native Pentose Pathway in S. cerevisiae

In addition to expression of an active xylose isomerase enzyme, a robust pentose pathway is necessary for efficient use of xylose and ethanol production under oxygen-limiting conditions in S. cerevisiae. The pentose pathway consists of five enzymes. In S. cerevisiae, these proteins are xylulokinase (XKS1), transaldolase (TAL1), transketolase 1 (TKL1), D-ribulose-5-phosphate 3-epimerase (RPE1), and ribose 5-phosphate ketol-isomerase (RKI1). In order to increase the expression of these proteins, their coding regions from the S. cerevisiae genome were cloned for expression under different promoters and integrated in the S. cerevisiae chromosome. The GRE3 locus encoding aldose reductase was chosen for integration. To construct this strain, the first step was the construction of an integration vector called P5 Integration Vector in GRE3.

The sequence of the P5 Integration Vector in GRE3 is given as SEQ ID NO:87, and the following numbers refer to nucleotide positions in this vector sequence. Gaps between the given nt numbers include sequence regions containing restriction sites. The TAL1 coding region (nt 15210 to 16217) was expressed with the TPI1 promoter (nt 14615 to 15197) and uses the TAL1t terminator. The RPE1 coding region (nt 13893 to 14609) was expressed with the FBA1 promoter (nt 13290 to 13879) and uses the terminator at the upstream end of the TPI1 promoter. The RKI1 coding region (nt 11907 to 12680) was expressed with the TDH3 promoter (nt 11229 to 11900) and uses the GPDt (previously called TDH3t) terminator. The TKL1 coding region (nt 8830 to 10872) was expressed with the PGK1 promoter (nt 8018 to 8817) and uses the TKL1t terminator. The XKS1 coding region (nt 7297 to 5495) was expressed with the ILV5 promoter (nt 8009 to 7310) and uses the ADH terminator. In this integration vector, the URA3 marker (nt 332 to 1135) was flanked by loxP sites (nt 42 to 75 and nt 1513 to 1546) for recycling of the marker. The vector contains integration arms for the GRE3 locus (nt 1549 to 2089 and nt 4566 to 5137). This P5 Integration Vector in GRE3 can be linearized by digesting with the KasI enzyme before integration.
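As a reading aid for the coordinates above, the coding-region positions can be tabulated and checked with a short script. This is an illustrative sketch only: the coordinates are those quoted in the text for SEQ ID NO:87, a start position larger than the end position is taken to indicate the reverse strand, and the helper name `describe` is hypothetical.

```python
# (name, start_nt, end_nt) as quoted in the text; start > end is read as reverse strand
CODING_REGIONS = [
    ("TAL1", 15210, 16217),
    ("RPE1", 13893, 14609),
    ("RKI1", 11907, 12680),
    ("TKL1", 8830, 10872),
    ("XKS1", 7297, 5495),
]


def describe(start, end):
    """Return (length in nt, strand, whether the length is a whole number of codons)."""
    length = abs(end - start) + 1
    strand = "+" if end >= start else "-"
    return length, strand, length % 3 == 0


if __name__ == "__main__":
    for name, start, end in CODING_REGIONS:
        length, strand, whole_codons = describe(start, end)
        print(f"{name}: {length} nt, strand {strand}, whole codons: {whole_codons}")
```

Each of the five quoted intervals has a length divisible by three, as expected for a coding region, and XKS1 with its ILV5 promoter is the only cassette listed in the reverse orientation.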

The yeast strain chosen for this study was BP1548, which is a haploid strain derived from prototrophic diploid strain CBS 8272 (Centraalbureau voor Schimmelcultures (CBS) Fungal Biodiversity Centre, Netherlands). This strain is in the CEN.PK lineage of Saccharomyces cerevisiae strains. BP1548 contains the MATa mating type and deletions of the URA3 and HIS3 genes. To produce BP1548, first CBS 8272 was sporulated and a tetrad was dissected to yield four haploid strains using standard procedures (Amberg et al., Methods in Yeast Genetics, 2005). One of the MATa haploids, PNY0899, was selected for further modifications. The URA3 coding sequence (ATG through stop codon) and 130 bp of sequence upstream of the URA3 coding sequence were deleted by homologous recombination using a KanMX deletion cassette flanked by loxP sites, primer binding sites, and homologous sequences outside of the URA3 region to be deleted. After removal of the KanMX marker using the cre recombinase, a 95 bp sequence consisting of a loxP site flanked by the primer binding sites remained as a URA3 deletion scar in the genome (SEQ ID NO:88). This sequence is located in the genome between URA3 upstream sequence (SEQ ID NO:89) and URA3 downstream sequence (SEQ ID NO:90). The HIS3 coding sequence (ATG up to the stop codon) was deleted by homologous recombination using a scarless method. The deletion joins genomic sequences that were originally upstream (SEQ ID NO:91) and downstream (SEQ ID NO:92) of the HIS3 coding sequence.

The KasI integration fragment containing all five pentose pathway genes in vector P5 Integration Vector in GRE3 was transformed into the BP1548 strain using the Frozen-EZ Yeast Transformation II Kit from Zymo Research (Irvine, CA). Transformants were selected on synthetic dropout (SD) medium lacking uracil. To recycle the URA3 marker, the CRE recombinase vector pJT254 (SEQ ID NO:93) was transformed into these integrated strains. This vector was derived from pRS413 and the cre coding region (nt 2562 to 3593) was under the control of the GAL1 promoter (nt 2119 to 2561). Strains that could no longer grow on SD (-uracil) medium were selected. Further passages on YPD medium were used to cure the plasmid pJT257. The resulting strain was designated as C52-79.

Example 6

Growth and Ethanol Production by S. cerevisiae Containing Different Bacterial Xylose Isomerases and E. coli Chaperonins

The constructed C52-79 S. cerevisiae strain could not use xylose as an energy and carbon source since it lacks xylose isomerase activity. In this experiment, xylA chimeric genes encoding xylose isomerases from bacterial sources were expressed in the C52-79 host with or without the presence of E. coli chaperonins. The bacterial xylose isomerases tested, listed as (abbreviation; SEQ ID NO; accession number), are those from Actinoplanes missouriensis (AMyXI; SEQ ID NO:29; YP_005460771), Burkholderia phytofirmans PsJN (BPP; SEQ ID NO:37; YP_001890302), Burkholderia phymatum (BPS; SEQ ID NO:39; YP_001858563), Citrobacter youngae (CYXI; SEQ ID NO:41; ZP_06571492), Escherichia blattae (EBXI; SEQ ID NO:43; YP_006317764), E. coli MG1655 (ECXI; SEQ ID NO:31; NP_418022), Pseudomonas fluorescens (PFSXI; SEQ ID NO:45; EIK60355), Photobacterium profundum (PPXI; SEQ ID NO:47; YP_128690), Pantoea stewartii (PS; SEQ ID NO:49; ZP_09830211), Plautia stali symbiont (PSS; SEQ ID NO:51; ZP_0825515), Pseudomonas syringae (PST; SEQ ID NO:53; ZP_03398764), Vibrio sp. XY-214 (VSXI; SEQ ID NO:55; BAI23199), and Yokenella regensburgei (YRXI; SEQ ID NO:57; ZP_09387709). The coding sequence for each of these proteins was codon-optimized for expression in S. cerevisiae (SEQ ID NOs:59, 38, 40, 42, 44, 60, 46, 48, 50, 52, 54, 56, and 58, respectively) and synthesized de novo in chimeric genes by GenScript Corporation (Piscataway, NJ). The ILV5p promoter and ILV5t terminator were used in each chimeric gene, which was cloned into the pHR81 vector as described in Example 1. The resulting plasmids were called pHR81 ilv5p xylA (AMyXI), pHR81 ilv5p xylA (BPPXI), pHR81 ilv5p xylA (BPSXI), pHR81 ilv5p xylA (CYXI), pHR81 ilv5p xylA (EBXI), pHR81 ilv5p xylA (ECXI), pHR81 ilv5p xylA (PFSXI), pHR81 ilv5p xylA (PPXI), pHR81 ilv5p xylA (PSXI), pHR81 ilv5p xylA (PSSXI), pHR81 ilv5p xylA (PSTXI), pHR81 ilv5p xylA (VSXI), and pHR81 ilv5p xylA (YRXI). The plasmid pHR81 ilv5p xylA (ECXI) is the same construction as previously made in Example 4, named pHR81-ECXA. The plasmid pHR81 ilv5p xylA (AMyXI) uses different codon optimization than the previously constructed pHR81-AMXA (Example 1) that was used for A. missouriensis XI expression in Examples 1 and 2.

The same plasmid pRS423-GELS that was described in Example 1 was used for expression of E. coli GroES and GroEL. Each xylose isomerase expression plasmid was co-transformed with this groES and groEL expression plasmid into the C52-79 strain (Example 5), and transformants were selected as described in Example 4.

The yeast strains obtained as described above, expressing xylA genes in the presence of E. coli groES and groEL, were tested in YPX medium (10 g/L yeast extract, 20 g/L peptone, and 40 g/L xylose). To perform this test, strains were inoculated into 10 mL of YPX medium in 50 mL tissue culture tubes at a starting OD of 0.5 at 600 nm. The lids were tightly closed and the tubes were placed in a 30 °C rotary shaker set at a speed of 225 rpm. After 24, 48 or 72 hours, samples were taken to measure the xylose and ethanol concentrations by HPLC as in General Methods. Three strains from each transformation were tested, and the results shown in Table 15 are the average and standard deviation for each set.
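The per-set summary described above (an average and a standard deviation over three transformants) can be sketched as follows. The replicate values are hypothetical, and the sample standard deviation is assumed here because the text does not say whether the sample or the population form was used.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (average, sample standard deviation) for one set of replicate strains."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical ethanol readings (g/L) for three transformants of one construct
    ethanol_g_per_l = [16.3, 16.5, 16.6]
    avg, sd = summarize(ethanol_g_per_l)
    print(f"{avg:.2f} +/- {sd:.2f}")
```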

As shown in Table 15, with the expression of E. coli chaperonins, all strains tested consumed xylose and, at the same time, produced ethanol. Strains expressing xylose isomerases from C. youngae and E. blattae showed the best performance. In the absence of E. coli chaperonins, no xylose consumption or ethanol production was observed.

Table 15 Growth rate, xylose consumption and ethanol production in S. cerevisiae strains expressing bacterial XIs in the presence of E. coli GroES and GroEL

| XI source (accession) | Growth rate | SD | Xylose consumption | SD | Ethanol production | SD |
|---|---|---|---|---|---|---|
| Burkholderia phymatum (YP_001858563) | 7.28 | 0.44 | 9.31 | 0.60 | 3.30 | 0.19 |
| Citrobacter youngae (ZP_06571492) | 10.49 | 1.11 | 26.00 | 3.49 | 10.32 | 1.36 |
| E. blattae (YP_006317764) | 9.66 | 1.33 | 24.16 | 1.02 | 9.41 | 0.51 |
| E. coli MG1655 (NP_418022) | 9.21 | 0.89 | 17.94 | 4.69 | 6.91 | 1.95 |
| Pseudomonas fluorescens (EIK60355) | 4.47 | 0.36 | 2.28 | 0.24 | 0.00 | 0.00 |
| Photobacterium profundum (YP_128690) | 4.10 | 0.53 | 2.39 | 0.08 | 0.00 | 0.00 |
| Pantoea stewartii DC283 (ZP_09830211) | 7.52 | 0.59 | 13.92 | 2.03 | 5.18 | 0.80 |
| Plautia stali symbiont (ZP_0825515) | 6.75 | 1.35 | 11.19 | 2.35 | 3.96 | 0.97 |
| Pseud. syringae (ZP_03398764) | 6.13 | 1.15 | 11.81 | 1.10 | 4.29 | 0.43 |
| Vibrio sp. XY-214 (BAI23199) | 4.88 | 0.89 | 7.26 | 2.18 | 2.37 | 0.92 |
| Y. regensburgei (ZP_09387709) | 9.42 | 0.48 | 24.14 | 1.50 | 8.96 | 0.24 |

After 48 hours of growth

| XI source (accession) | Growth rate | SD | Xylose consumption | SD | Ethanol production | SD |
|---|---|---|---|---|---|---|
| Actinoplanes missouriensis (YP_005460771) | 5.68 | 0.72 | 4.36 | 0.50 | 0.36 | 0.32 |
| Burk. phytofirmans PsJN (YP_001890302) | 12.14 | 0.30 | 40.00 | 0.00 | 16.44 | 0.16 |
| Burkholderia phymatum (YP_001858563) | 11.86 | 0.25 | 38.78 | 0.51 | 15.08 | 0.32 |
| Citrobacter youngae (ZP_06571492) | 12.63 | 0.08 | 40.00 | 0.00 | 16.45 | 0.14 |
| E. blattae (YP_006317764) | 12.94 | 0.08 | 40.00 | 0.00 | 16.50 | 0.01 |
| E. coli MG1655 (NP_418022) | 12.63 | 0.19 | 40.00 | 0.00 | 16.19 | 0.41 |
| Pseudomonas fluorescens (EIK60355) | 7.55 | 0.82 | 13.45 | 0.95 | 4.27 | 0.39 |
| Photobacterium profundum (YP_128690) | 9.01 | 0.16 | 19.77 | 0.38 | 7.05 | 0.17 |
| Pantoea stewartii DC283 (ZP_09830211) | 10.90 | 1.63 | 40.00 | 0.00 | 16.38 | 0.07 |
| Plautia stali symbiont (ZP_0825515) | 12.21 | 0.65 | 40.00 | 0.00 | 15.62 | 0.30 |
| Pseud. syringae (ZP_03398764) | 10.22 | 1.33 | 40.00 | 0.00 | 16.47 | 0.03 |
| Vibrio sp. XY-214 (BAI23199) | 10.45 | 1.65 | 35.95 | 4.62 | 13.53 | 2.05 |
| Y. regensburgei (ZP_09387709) | 11.45 | 1.53 | 40.00 | 0.00 | 16.35 | 0.32 |

After 72 hours of growth

| XI source (accession) | Growth rate | SD | Xylose consumption | SD | Ethanol production | SD |
|---|---|---|---|---|---|---|
| Actinoplanes missouriensis (YP_005460771) | 7.29 | 0.88 | 8.52 | 1.39 | 1.80 | 0.57 |
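One way to put the Table 15 values in context is to express ethanol produced per xylose consumed as a fraction of the stoichiometric maximum. The sketch below is illustrative only. It assumes that the second and third value pairs in each row are xylose consumed and ethanol produced in g/L, and it uses the standard stoichiometry of 5 mol ethanol per 3 mol xylose; the function names are hypothetical.

```python
ETHANOL_G_PER_MOL = 46.07
XYLOSE_G_PER_MOL = 150.13

# Stoichiometric maximum: 3 xylose -> 5 ethanol + 5 CO2 (about 0.51 g/g)
THEORETICAL_YIELD_G_PER_G = (5 * ETHANOL_G_PER_MOL) / (3 * XYLOSE_G_PER_MOL)


def ethanol_yield(ethanol_g_per_l, xylose_consumed_g_per_l):
    """Grams of ethanol produced per gram of xylose consumed."""
    if xylose_consumed_g_per_l <= 0:
        raise ValueError("xylose consumed must be positive")
    return ethanol_g_per_l / xylose_consumed_g_per_l


def percent_of_theoretical(ethanol_g_per_l, xylose_consumed_g_per_l):
    """Yield expressed as a percentage of the stoichiometric maximum."""
    y = ethanol_yield(ethanol_g_per_l, xylose_consumed_g_per_l)
    return 100.0 * y / THEORETICAL_YIELD_G_PER_G


if __name__ == "__main__":
    # C. youngae row after 48 hours: 40.00 xylose consumed, 16.45 ethanol (assumed g/L)
    print(round(ethanol_yield(16.45, 40.00), 3))
    print(round(percent_of_theoretical(16.45, 40.00), 1))
```

Under these assumptions, the strains that consumed all 40 g/L of xylose by 48 hours produced roughly 0.4 g of ethanol per g of xylose, which is about 80% of the stoichiometric maximum.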

Example 7

Xylose Isomerase Activities in Yeast in the Presence or Absence of GroES and GroEL

Xylose isomerase enzyme activity was assayed in S. cerevisiae strains expressing CYXI, EBXI, ECXI, PSTXI, or YRXI in the presence or absence of EC GroES and GroEL. Strains described above were used, as well as C52-79 cells transformed with only the xylose isomerase expression plasmids described in Example 6. Transformants were selected as described in Example 4. The cells were grown in SD medium lacking uracil.

Cell breaking buffer was prepared with 10 mM TEA, pH 7.5, 10 mM MgSO4, 10 mM MnCl2, 1 mM DTT, and one tablet of complete Mini, EDTA-free protease inhibitor cocktail (Roche Diagnostics GmbH) in 50 mL total volume. Bead beating tubes were prepared with approximately 1 gram of 400 micron acid washed silica beads (VWR) in a 2 mL screw cap tube. Cell pellets were resuspended in 1 mL of breaking buffer and added to the bead beating tubes. Tubes were stored on ice. Cell breakage was performed using a Minibeadbeater (Biospec Products) using 3 x 1 minute cycles with chilling of the tubes on ice between cycles. Tubes were centrifuged for 1 min at 15,000 x g to pellet large particles and reduce foaming. 600 μL was removed and transferred to a new microcentrifuge tube. The tubes were centrifuged at 15,000 x g for one hour at 4 °C. 500 μL of the supernatant was transferred to a new microcentrifuge tube and stored on ice. 1:10 dilutions were made by adding 30 μL of the extract to 270 μL of breaking buffer. The remaining extract was frozen on dry ice and transferred to the -80 °C freezer while the dilutions were stored on ice until analysis, which took place the same day.

A stock xylose isomerase assay solution was prepared by adding the volumes found in Table 16 to a tube which was stored at room temperature. A solution of 0.5 M xylose was also prepared and stored in a separate tube. All chemicals were obtained from Sigma Aldrich. The source of sorbitol dehydrogenase was sheep liver.

Table 16 Assay stock solution composition and final assay concentration of components.

| Chemical | Volume for stock solution | Final concentration in assay (1 mL assay volume) |
|---|---|---|
| 1 M TEA, pH 7.5 | 250 μL | 10 mM |
| 1 M MgSO4 heptahydrate | 250 μL | 10 mM |
| 1 M MnCl2 | 250 μL | 10 mM |
| NADH (1 mg/mL in water) | 5 mL | 0.28 mM |
| Sorbitol Dehydrogenase (10 U/mL in water) | 2.5 mL | 1 U/mL |
| Water | 11.75 mL | |
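The final concentrations listed in Table 16 can be cross-checked from the stock volumes, given that 800 μL of stock solution is used in each 1 mL assay. The sketch below is illustrative only. The dictionary keys and function name are hypothetical, and the NADH molar mass of 709.4 g/mol assumes the disodium salt, which the text does not specify.

```python
# Volumes (mL) of each component added to the stock solution, from Table 16
STOCK_ML = {
    "TEA": 0.25,
    "MgSO4": 0.25,
    "MnCl2": 0.25,
    "NADH": 5.0,
    "SDH": 2.5,
    "water": 11.75,
}

NADH_G_PER_MOL = 709.4  # assumption: NADH disodium salt


def final_concentration(source_conc, component_ml, total_ml,
                        stock_in_assay_ml=0.8, assay_ml=1.0):
    """Concentration in the cuvette after dilution into the stock and then the assay."""
    return source_conc * (component_ml / total_ml) * (stock_in_assay_ml / assay_ml)


if __name__ == "__main__":
    total = sum(STOCK_ML.values())  # total stock volume in mL
    tea_mm = final_concentration(1000.0, STOCK_ML["TEA"], total)   # 1 M = 1000 mM
    sdh_u_ml = final_concentration(10.0, STOCK_ML["SDH"], total)   # 10 U/mL source
    nadh_mg_ml = final_concentration(1.0, STOCK_ML["NADH"], total)  # 1 mg/mL source
    nadh_mm = nadh_mg_ml / NADH_G_PER_MOL * 1000.0
    print(total, tea_mm, sdh_u_ml, round(nadh_mm, 2))
```

Under these assumptions the stock totals 20 mL and the calculated cuvette concentrations agree with the 10 mM, 1 U/mL, and 0.28 mM values in the table.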

A Cary 300 Bio spectrophotometer (Varian) was set up for a 10 minute assay time and the cuvette block heater was set to 30 °C. 800 μL of the assay stock solution was added to a quartz cuvette and inserted into the instrument cuvette holder, and the temperature was allowed to equilibrate for 10 minutes. 100 μL of the extract dilution was added to the cuvette and monitoring at A340 nm was initiated. This was continued until a stable linear signal was obtained (background), which typically took 2-4 minutes. 100 μL of 0.5 M xylose was then added to start the reaction. Monitoring at A340 nm continued until a stable linear signal was obtained (signal), which typically took 2-4 minutes.

A protein assay was performed on the same dilutions used for the xylose isomerase activity assay. 25 μL of each sample was added to a microtiter plate in triplicate. BSA protein standards (Thermo Scientific) were also added in triplicate. 280 μL of Coomassie Plus - The Better Bradford Assay Reagent (Thermo Scientific) was added and the plate was equilibrated at room temperature for 15 minutes. The A595 nm was obtained. A trend line with a polynomial fit was used for the standards to calculate the protein concentration for the samples. A sample was determined to have no activity if the slope after addition of xylose was more positive than the background slope. The activity assay results can be seen in Table 17. Two different transformants were assayed for each construction.
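The "no activity" criterion stated above can be expressed as a simple decision rule. This is a hedged sketch with a hypothetical function name and hypothetical slopes; it only encodes the comparison of the two slopes and does not reproduce the full activity calculation.

```python
def xylose_dependent_rate(background_slope, signal_slope):
    """Return the xylose-dependent decrease in A340 (dA340/min), or None for 'No Activity'.

    Slopes are negative while NADH is being consumed. Following the rule in the
    text, a signal slope more positive than the background slope means no activity.
    """
    if signal_slope > background_slope:
        return None
    return background_slope - signal_slope


if __name__ == "__main__":
    # Hypothetical slopes for illustration only
    print(xylose_dependent_rate(-0.002, -0.050))  # active extract
    print(xylose_dependent_rate(-0.002, -0.001))  # reported as No Activity
```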

Table 17 Xylose isomerase activity assay results

(YP_006317764)
with groEL/ES: 0.15
no groEL/ES: 0.003
no groEL/ES: No Activity

Y. regensburgei (ZP_09387709), 92%
with groEL/ES: 0.311
with groEL/ES: 0.394
no groEL/ES: 0.014
no groEL/ES: 0.007

Pseud. syringae (ZP_03398764), 68%
with groEL/ES: 0.182
with groEL/ES: 0.209
no groEL/ES: No Activity
no groEL/ES: No Activity

Example 8

Use of A. missouriensis Chaperones with Xylose Isomerase

The previous examples demonstrate the effectiveness of E. coli GroEL and GroES in improving the folding and in vivo function of various bacterial xylose isomerases when expressed in S. cerevisiae. In this example, chaperonins from Actinoplanes missouriensis were used.

Plasmid pRS423 Am 104GroES 550 GroEL (SEQ ID NO:94) was constructed to contain a set of A. missouriensis chaperonins. In this plasmid the groEL nucleic acid fragment (nt 6446 to 8092 in SEQ ID NO:94; also SEQ ID NO:4) that encodes a polypeptide of 550 amino acids was under the control of the ADH promoter (nt 5762 to 6439). The groES nucleic acid fragment that encodes a polypeptide of 104 amino acids (SEQ ID NO:18) was under the control of the GPD (TDH3) promoter (nt 9332 to 8654). Both expression cassettes were terminated with a bidirectional CYC1 terminator (nt 8101 to 8319). In order to determine whether expression of this set of chaperonins from A. missouriensis can improve the folding and function of the xylose isomerase, plasmids pHR81 ilv5p xylA (AMyXI) (Example 6) and pRS423 Am 104GroES 550 GroEL were transformed into yeast strain C52-79 as described in Example 6. The resulting strain was analyzed for growth and ethanol production from xylose. Using the same growth conditions as described in Example 6, the strain used up all of the xylose (40 g/L) in the medium, producing about 16 g of ethanol after 6 days. The result demonstrated that the xylose isomerase from A. missouriensis can be functionally expressed in the presence of the A. missouriensis chaperonins in yeast.

A second set of coding regions in the A. missouriensis genome sequence is annotated as encoding GroEL and GroES. These coding regions were also cloned in the vector pRS423 in the same way as the first set of chaperonins, described above. The resulting construct was pRS423 Am 112GroES 540 GroEL (SEQ ID NO:95). In this construct, the GroEL coding region (nt 6446 to 8068 in SEQ ID NO:95; also SEQ ID NO:6) was under the control of the ADH promoter (nt 5762 to 6439). The GroES coding region (nt 8642 to 8034 in SEQ ID NO:95; also SEQ ID NO:20) was under the control of the GPD (TDH3) promoter (nt 9332 to 8654). The bidirectional CYC1 terminator (nt 8077 to 8295) was placed between these two expression cassettes. To test whether this construct is functional in yeast, plasmids pHR81 ilv5p xylA (AMyXI) and pRS423 Am 112GroES 540 GroEL were transformed into yeast strain C52-79 as described in Example 6. The resulting strain was analyzed for growth and ethanol production from xylose. Using the same growth conditions as described in Example 6, the strain used very little xylose and no detectable amount of ethanol was present in the growth medium. It is possible that this set of chaperonins was not expressed in yeast, or that the GroEL and GroES were not matched properly. It is also possible that the annotation in the database is incorrect.

Example 9

Expression of xylose isomerases from a cow rumen metagenomic library

Additional candidate bacterial xylose isomerases were tested for activity in yeast when expressed with or without GroES and GroEL. These were two polypeptides identified using amino acid sequences of the xylose isomerases from Ruminococcus flavefaciens FD-1 (SEQ ID NO:96) and from Ruminococcus champanellensis 18P13 (SEQ ID NO:97) in a BLAST search against translated open reading frames of the metagenomic database generated from cow rumen (Matthias Hess, et al. Science 331:463-467 (2011)). These two proteins have 77% amino acid identity to each other. No protein sequences were found to have greater than 70% identity to either of these sequences. Two proteins with sequence identities in the range of 59% to 64% were selected for testing and named Ru2 (SEQ ID NO:98) and Ru3 (SEQ ID NO:100). DNA sequences encoding these proteins were designed using codon optimization for expression in S. cerevisiae, and given designations of xylA(Ru2) (SEQ ID NO:99) and xylA(Ru3) (SEQ ID NO:101). The designed nucleic acid molecules were synthesized, including a PmeI site just upstream of the start codon and a SfiI site immediately following the stop codon.
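For readers unfamiliar with how percent identity figures such as those above are derived, the following is a minimal sketch. It assumes two sequences that have already been aligned, with "-" marking gaps, and it uses identical positions divided by the alignment length, which is one common convention. The function name and the toy sequences are hypothetical and are not taken from any sequence in this application.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy aligned fragments for illustration only
    print(round(percent_identity("MAKEYF-PQ", "MAKDYFAPQ"), 1))
```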

The synthesized xylA coding regions xylA(Ru2) and xylA(Ru3) were inserted between PmeI and SfiI sites in pHR81-AMXA, creating chimeric genes for expression as described in Example 4. The xylA(Ru2) vector was named pHR81 ilv5p xylA(Ru2) and the xylA(Ru3) vector was named pHR81 ilv5p xylA(Ru3). These constructs were transformed into the C52-79 strain (Example 5) with or without pRS423-GELS (Example 1), the plasmid containing ECgroES and ECgroEL expression cassettes, as in Example 6.

Transformed strains were examined for their ability to consume xylose and to convert xylose to ethanol as described in Example 6.

Results of analysis after 24 hr of growth are shown in Table 18.

Expression of xylA(Ru2) or xylA(Ru3) alone without E. coli chaperonins did not enable the yeast strain to consume xylose or convert xylose to ethanol. On the other hand, with the expression of E. coli chaperonins, the yeast strains containing each of these xylose isomerases could consume xylose and convert xylose to ethanol. The result indicates that expression of E. coli chaperonins enables expression of active Ru2 and Ru3 xylose isomerase enzymes in yeast.

Table 18 Growth rate, xylose consumption and ethanol production in S. cerevisiae strains expressing bacterial XIs in the presence or absence of E. coli GroES and GroEL