

Title:
ENZYME WITH ENHANCED CELLULOLYTIC ACTIVITY
Document Type and Number:
WIPO Patent Application WO/2012/125677
Kind Code:
A2
Abstract:
A CelZ variant having at least 90% sequence identity to SEQ ID NO:Z having a substitution at D90, Q234, D304, D324, V330 or N383 of SEQ ID NO:Z has improved cellulase activity as compared to SEQ ID NO:Z.

Inventors:
BLAZEJ ROBERT (US)
COHEN RICHARD (US)
EMRICH CHARLES (US)
TORIELLO NICHOLAS (US)
Application Number:
PCT/US2012/028997
Publication Date:
September 20, 2012
Filing Date:
March 14, 2012
Assignee:
ALLOPARTIS BIOTECHNOLOGIES INC (US)
BLAZEJ ROBERT (US)
COHEN RICHARD (US)
EMRICH CHARLES (US)
TORIELLO NICHOLAS (US)
International Classes:
C12N9/42; C12N15/56; C12N15/63; C12P7/06
Other References:
PY, B. ET AL.: 'Cellulase EGZ of Erwinia chrysanthemi: structural organization and importance of His98 and Glu133 residues for catalysis' PROTEIN ENGINEERING. vol. 4, no. 3, February 1991, pages 325 - 333
LIM, W. J. ET AL.: 'Construction of minimum size cellulase (Cel5Z) from Pectobacterium chrysanthemi PY35 by removal of the C-terminal region' APPLIED MICROBIOLOGY AND BIOTECHNOLOGY. vol. 68, no. 1, 22 January 2005, pages 46 - 52
PARK, S. R. ET AL.: 'Activity enhancement of Cel5Z from Pectobacterium chrysanthemi PY35 by removing C-terminal region' BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS. vol. 291, no. 2, 22 February 2002, pages 425 - 430
CHAPON, V. ET AL.: 'Alteration of a single tryptophan residue of the cellulose-binding domain blocks secretion of the Erwinia chrysanthemi Cel5 cellulase (ex-EGZ) via the type II system' JOURNAL OF MOLECULAR BIOLOGY vol. 303, no. 2, 20 October 2000, pages 117 - 123
Attorney, Agent or Firm:
SHUSTER, Michael, J. et al. (Silicon Valley Center, 801 California Street, Mountain View CA, US)
Claims:
CLAIMS

What is claimed is:

1. A CelZ variant wherein:

the variant has at least 90% sequence identity to SEQ. ID NO:Z;

the variant comprises a substitution at D90, Q234, D304, D324, V330 or N383 using the amino acid sequence of SEQ. ID NO:Z for determining position numbering; and wherein said variant exhibits improved cellulase activity as compared to SEQ ID NO: Z.

2. The CelZ variant of claim 1 wherein the variant has at least 95 % sequence identity to SEQ. ID NO:Z.

3. The CelZ variant of claim 1 wherein the variant has at least 98 % sequence identity to SEQ. ID NO:Z.

4. The CelZ variant of claim 1 wherein the variant has at least 99 % sequence identity to SEQ. ID NO:Z.

5. The CelZ variant of claim 1 comprising at least one substitution selected from the group consisting of D90G, Q234H, D304Y, D324E, V330A and ΔN383TDNNIGSG using the amino acid sequence of SEQ. ID NO:Z for determining position numbering.

6. The CelZ variant of claim 5 comprising D90G, Q234H and ΔN383TDNNIGSG.

7. The CelZ variant of claim 6 comprising SEQ ID NO:B.

8. The CelZ variant of claim 1 comprising V330A.

9. The CelZ variant of claim 8 comprising SEQ ID NO:D.

10. The CelZ variant of claim 1 comprising D324E.

11. The CelZ variant of claim 10 comprising SEQ ID NO:E.

12. The CelZ variant of claim 1 comprising D304Y.

13. The CelZ variant of claim 12 comprising SEQ ID NO:C.

14. An isolated polypeptide consisting of SEQ ID NO:BA.

15. An isolated polypeptide consisting of SEQ ID NO:CA.

16. An isolated polypeptide consisting of SEQ ID NO:DA.

17. An isolated polypeptide consisting of SEQ ID NO:EA.

18. An isolated polypeptide consisting of SEQ ID NO:A:

wherein X1 is selected from the group consisting of G, A, N, D and S;

wherein X2 is selected from the group consisting of H, Y, N, E, Q and R;

wherein X3 is selected from the group consisting of TDNNIGSG and N; and

with the proviso that X1 is not D when X2 is Q and X3 is N.

19. An isolated polypeptide consisting of SEQ ID NO:AA:

wherein X1 is selected from the group consisting of G, A, N, D and S;

wherein X2 is selected from the group consisting of H, Y, N, E, Q and R;

wherein X3 is selected from the group consisting of TDNNIGSG and N; and

with the proviso that X1 is not D when X2 is Q and X3 is N.

20. An isolated or recombinant polynucleotide comprising or consisting of a sequence selected from the group consisting of:

(g) any one of SEQ ID NO:F, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I;

(h) a nucleic acid sequence that is a degenerate variant of any one of SEQ ID NO:F, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I;

(i) a nucleic acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or at least 99.9% identical to any one of SEQ ID NO:F, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I;

(j) a nucleic acid sequence that encodes a polypeptide having any one of the amino acid sequences SEQ ID NO:A, SEQ ID NO:AA, SEQ ID NO:B, SEQ ID NO:BA, SEQ ID NO:C, SEQ ID NO:CA, SEQ ID NO:D, SEQ ID NO:DA, SEQ ID NO:E and SEQ ID NO:EA;

(k) a nucleic acid sequence that encodes a polypeptide at least 99% or at least 99.9% identical to any one of the sequences SEQ ID NO:A, SEQ ID NO:AA, SEQ ID NO:B, SEQ ID NO:BA, SEQ ID NO:C, SEQ ID NO:CA, SEQ ID NO:D, SEQ ID NO:DA, SEQ ID NO:E and SEQ ID NO:EA; and

(l) a nucleic acid sequence that hybridizes under stringent conditions to any one of the sequences SEQ ID NO:F, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I.

21. A vector comprising the polynucleotide of claim 20.

22. A host cell comprising the polynucleotide of claim 20.

23. A cellulolytic mixture comprising one of the CelZ variants of claims 1-13 or polypeptide of claims 14-19.

24. A method for processing biomass comprising contacting biomass with any one of the CelZ variants of claims 1-13 or polypeptide of claims 14-19.

25. A method for preparing ethanol, the method comprising contacting biomass with any one of the CelZ variants of claims 1-13 or polypeptide of claims 14-19; a beta-glucosidase and a yeast.

Description:
ENZYME WITH ENHANCED CELLULOLYTIC ACTIVITY

Inventors: Robert Blazej, Richard Cohen, Charles Emrich and Nicholas Toriello

Cross-Reference to Related Applications

[0001] This application is related to and claims benefit of the earlier filing date of U.S. Application 61/452,584, filed March 14, 2011, which is hereby incorporated by reference in its entirety for all purposes.

Field of the Invention

[0002] The invention is in the field of cellulolytic enzymes useful in the processing of plant biomass material.

Background of the Invention

[0003] Cellulosic biomass is the most abundant renewable natural resource. Generated at a rate of 100 billion dry tons/year by the biosphere, cellulosic biomass has the potential to replace the world’s demand for diminishing fossil fuels. However, according to Zhang, Y. H. P., “[o]ne of the most important and difficult technological challenges is to overcome the recalcitrance of natural lignocellulosic materials, which must be enzymatically hydrolyzed to produce fermentable sugars.” See Zhang, Y. H. P. et al., “Outlook for cellulase improvement: Screening and selection strategies.” Biotechnol. Adv., 2006, 24: 452-481. A major limitation for the conversion of biomass to biofuel and renewable chemicals is the high cost and large quantities of enzymes required for hydrolysis.

[0004] Presently, commercial cellulases are typically over-expressed wild-type enzymes. Reductions in cost have thus far been achieved by increasing the yield of produced enzymes. Current yields are near theoretical maximums and further cost reductions through increased yield have not been forthcoming. Enzymes with improved capability to hydrolyze cellulose into its constituent oligo- and mono-saccharides are required to produce biofuel and renewable chemicals cost-effectively from biomass.

Summary of the Invention

[0005] A CelZ variant is disclosed wherein the variant has at least 90% sequence identity to SEQ. ID NO:Z; the variant comprises a substitution at D90, Q234, D304, D324, V330 or N383 using the amino acid sequence of SEQ. ID NO:Z for determining position numbering; and wherein said variant exhibits improved cellulase activity as compared to SEQ ID NO: Z.

[0006] In one embodiment the CelZ variant has at least 95% sequence identity to SEQ. ID NO:Z.

[0007] In one embodiment the CelZ variant has at least 98% sequence identity to SEQ. ID NO:Z.

[0008] In one embodiment the CelZ variant has at least 99% sequence identity to SEQ. ID NO:Z.

[0009] In one embodiment, the CelZ variant comprises at least one substitution selected from the group consisting of D90G, Q234H, D304Y, D324E, V330A and ΔN383TDNNIGSG using the amino acid sequence of SEQ. ID NO:Z for determining position numbering.

[0010] In one embodiment, the CelZ variant comprises D90G, Q234H and ΔN383TDNNIGSG.

[0011] In one embodiment, the CelZ variant is SEQ ID NO:B.

[0012] In one embodiment, the CelZ variant comprises V330A.

[0013] In one embodiment, the CelZ variant is SEQ ID NO:D.

[0014] In one embodiment, the CelZ variant comprises D324E.

[0015] In one embodiment, the CelZ variant is SEQ ID NO:E.

[0016] In one embodiment, the CelZ variant comprises D304Y.

[0017] In one embodiment, the CelZ variant is SEQ ID NO:C.
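The point substitutions named in the embodiments above follow the pattern of wild-type residue, position, replacement residue (for example, D90G). The following is a minimal sketch, not part of the disclosure, of how such codes might be parsed and applied to a reference sequence. The function names and the six-residue toy reference are hypothetical; the toy reference is not SEQ ID NO:Z, and the longer ΔN383TDNNIGSG change is deliberately left out because its exact reading is not defined in this passage.

```python
import re

# Point substitutions are written <wild-type residue><position><new residue>,
# e.g. "D90G". Positions are 1-based, following the reference numbering.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'D90G' into (wild_type, position, replacement)."""
    match = SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a point substitution: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitution(sequence, code):
    """Return a copy of `sequence` carrying the substitution described by `code`."""
    wild_type, position, replacement = parse_substitution(code)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    found = sequence[position - 1]
    if found != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}, found {found}")
    return sequence[: position - 1] + replacement + sequence[position:]


# Hypothetical toy reference used only for illustration (NOT SEQ ID NO:Z).
TOY_REFERENCE = "MDQDVN"

variant = apply_substitution(TOY_REFERENCE, "D2G")
variant = apply_substitution(variant, "Q3H")
print(variant)  # MGHDVN
```

Checking the wild-type residue before substituting guards against applying a code to a sequence whose numbering differs from the reference.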

[0018] An isolated polypeptide is disclosed having SEQ ID NO:BA.

[0019] In one embodiment the isolated polypeptide is SEQ ID NO:CA.

[0020] In one embodiment the isolated polypeptide is SEQ ID NO:DA.

[0021] In one embodiment the isolated polypeptide is SEQ ID NO:EA.

[0022] In one embodiment the isolated polypeptide is SEQ ID NO:A, wherein X1 is selected from the group consisting of G, A, N, D and S; wherein X2 is selected from the group consisting of H, Y, N, E, Q and R; wherein X3 is selected from the group consisting of TDNNIGSG and N; and with the proviso that X1 is not D when X2 is Q and X3 is N.

[0023] In one embodiment the isolated polypeptide is SEQ ID NO:AA, wherein X1 is selected from the group consisting of G, A, N, D and S; wherein X2 is selected from the group consisting of H, Y, N, E, Q and R; wherein X3 is selected from the group consisting of TDNNIGSG and N; and with the proviso that X1 is not D when X2 is Q and X3 is N.
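The X1, X2 and X3 definitions and the proviso in paragraphs [0022] and [0023] describe a small combinatorial set. The sketch below, offered only as an illustration, enumerates it. The variable and function names are hypothetical. The excluded triple (D, Q, N) appears to correspond to the unmodified residues D90, Q234 and N383, which is an inference from the claims rather than a statement in the text.

```python
from itertools import product

# Options for each variable position as listed in the text.
X1_OPTIONS = ["G", "A", "N", "D", "S"]
X2_OPTIONS = ["H", "Y", "N", "E", "Q", "R"]
X3_OPTIONS = ["TDNNIGSG", "N"]

# Proviso: X1 is not D when X2 is Q and X3 is N.
EXCLUDED = ("D", "Q", "N")


def allowed_combinations():
    """Return every (X1, X2, X3) triple permitted by the definitions and proviso."""
    return [
        combo
        for combo in product(X1_OPTIONS, X2_OPTIONS, X3_OPTIONS)
        if combo != EXCLUDED
    ]


combos = allowed_combinations()
print(len(combos))  # 5 * 6 * 2 = 60 triples, minus the one excluded = 59
```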

[0024] An isolated or recombinant polynucleotide is disclosed comprising or consisting of a sequence selected from the group consisting of:

(a) any one of SEQ ID NO:F, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I;

(b) a nucleic acid sequence that is a degenerate variant of any one of SEQ ID NO:F, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I;

(c) a nucleic acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99% or at least 99.9% identical to any one of SEQ ID NO:F, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I;

(d) a nucleic acid sequence that encodes a polypeptide having any one of the amino acid sequences SEQ ID NO:A, SEQ ID NO:AA, SEQ ID NO:B, SEQ ID NO:BA, SEQ ID NO:C, SEQ ID NO:CA, SEQ ID NO:D, SEQ ID NO:DA, SEQ ID NO:E and SEQ ID NO:EA;

(e) a nucleic acid sequence that encodes a polypeptide at least 99% or at least 99.9% identical to any one of the sequences SEQ ID NO:A, SEQ ID NO:AA, SEQ ID NO:B, SEQ ID NO:BA, SEQ ID NO:C, SEQ ID NO:CA, SEQ ID NO:D, SEQ ID NO:DA, SEQ ID NO:E and SEQ ID NO:EA; and

(f) a nucleic acid sequence that hybridizes under stringent conditions to any one of the sequences SEQ ID NO:F, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I.

[0025] A vector is disclosed comprising the isolated or recombinant polynucleotide.

[0026] A host cell comprising the isolated or recombinant polynucleotide is disclosed.

[0027] In another embodiment, cellulolytic mixtures comprising one of the CelZ variants or polypeptide are disclosed.

[0028] In another embodiment a method for processing biomass comprising contacting biomass with any one of the CelZ variants or polypeptide is disclosed.

[0029] In another embodiment, a method for preparing ethanol is disclosed, the method comprising contacting biomass with any one of the CelZ variants or polypeptides, a beta-glucosidase and a yeast.

Brief Description of the Drawings

[0030] FIG. 1 illustrates alignment of sequences DA, EA, CA and BA with wild-type CelZ.

[0031] FIG. 2 illustrates activity of 0.5 μg of wild-type CelZ and improved cellulase variants (SEQ ID NOS: B, C, D, E) as tested in Example 4.

[0032] FIG. 3 illustrates activity of 0.25, 0.5, 1, 2, 4, 8 mg/g (enzyme/AVICEL™) of wild-type CelZ and improved cellulase variant SEQ ID NO: B as tested in Example 5.

Detailed Description

[0033] The following explanations of terms and methods are provided to better describe the present disclosure and to guide those of ordinary skill in the art in the practice of the present disclosure. As used herein, “comprising” means “including” and the singular forms “a” or “an” or “the” include plural references unless the context clearly dictates otherwise. For example, reference to “comprising a cell” includes one or a plurality of such cells, and so forth. The term “or” refers to a single element of stated alternative elements or a combination of two or more elements, unless the context clearly indicates otherwise.

[0034] Unless explained otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. The materials, methods, and examples are illustrative only and not intended to be limiting. Other features of the disclosure are apparent from the following detailed description and the claims.

[0035] All publications disclosed herein are incorporated by reference in their entirety for all purposes.

[0036] The term “peptide” as used herein refers to a short polypeptide, e.g., one that is typically less than about 50 amino acids long and more typically less than about 30 amino acids long. The term as used herein encompasses analogs and mimetics that mimic structural and thus biological function.

[0037] The term “polypeptide” encompasses both naturally-occurring and non-naturally-occurring proteins, and fragments, mutants, derivatives and analogs thereof. A polypeptide may be monomeric or polymeric. Further, a polypeptide may comprise a number of different domains each of which has one or more distinct activities.

[0038] The term “isolated protein” or “isolated polypeptide” is a protein or polypeptide that by virtue of its origin or source of derivation (1) is not associated with naturally associated components that accompany it in its native state, (2) exists in a purity not found in nature, where purity can be adjudged with respect to the presence of other cellular material (e.g., is free of other proteins from the same species), (3) is expressed by a cell from a different species, or (4) does not occur in nature (e.g., it is a fragment of a polypeptide found in nature or it includes amino acid analogs or derivatives not found in nature or linkages other than standard peptide bonds). Thus, a polypeptide that is chemically synthesized or synthesized in a cellular system different from the cell from which it naturally originates will be “isolated” from its naturally associated components. A polypeptide or protein may also be rendered substantially free of naturally associated components by isolation, using protein purification techniques well known in the art. As thus defined, “isolated” does not necessarily require that the protein, polypeptide, peptide or oligopeptide so described has been physically removed from its native environment.

[0039] The term “polypeptide fragment” as used herein refers to a polypeptide that has a deletion, e.g., an amino-terminal and/or carboxy-terminal deletion compared to a full-length polypeptide. In a preferred embodiment, the polypeptide fragment is a contiguous sequence in which the amino acid sequence of the fragment is identical to the corresponding positions in the naturally-occurring sequence. Fragments typically are at least 5, 6, 7, 8, 9 or 10 amino acids long, preferably at least 12, 14, 16 or 18 amino acids long, more preferably at least 20 amino acids long, more preferably at least 25, 30, 35, 40 or 45 amino acids long, even more preferably at least 50 or 60 amino acids long, and even more preferably at least 70 amino acids long.

[0040] A protein has “homology” or is “homologous” to a second protein if the nucleic acid sequence that encodes the protein has a similar sequence to the nucleic acid sequence that encodes the second protein. Alternatively, a protein has homology to a second protein if the two proteins have “similar” amino acid sequences. (Thus, the term “homologous proteins” is defined to mean that the two proteins have similar amino acid sequences.) As used herein, homology between two regions of amino acid sequence (especially with respect to predicted structural similarities) is interpreted as implying similarity of function.

[0041] When “homologous” is used in reference to proteins or peptides, it is recognized that residue positions that are not identical often differ by conservative amino acid substitutions. A “conservative amino acid substitution” is one in which an amino acid residue is substituted by another amino acid residue having a side chain (R group) with similar chemical properties (e.g., charge or hydrophobicity). In general, a conservative amino acid substitution will not substantially change the functional properties of a protein. In cases where two or more amino acid sequences differ from each other by conservative substitutions, the percent sequence identity or degree of homology may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art. See, e.g., Pearson, 1994, Methods Mol. Biol. 24:307-31 and 25:365-89 (herein incorporated by reference).

[0042] The following six groups each contain amino acids that are conservative substitutions for one another: 1) Serine (S), Threonine (T); 2) Aspartic Acid (D), Glutamic Acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Alanine (A), Valine (V), and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W). Attached as Table 1 is a BLOSUM62 matrix ranking substitutions for glycine. Attached as Table 2 is a BLOSUM62 matrix ranking substitutions for histidine. Attached as Table 3 is a general BLOSUM62 amino acid substitution matrix.
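The six groups in paragraph [0042] can be expressed as a simple lookup. The sketch below is illustrative only: the names are hypothetical, and it classifies a substitution as conservative solely by shared membership in one of the six listed groups. Residues absent from every group (such as glycine and histidine) are therefore never reported as conservative by this check.

```python
# The six conservative-substitution groups listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"S", "T"},
    {"D", "E"},
    {"N", "Q"},
    {"R", "K"},
    {"I", "L", "M", "A", "V"},
    {"F", "Y", "W"},
]


def is_conservative(original, replacement):
    """True if both residues differ and fall in the same listed group."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(
        original in group and replacement in group for group in CONSERVATIVE_GROUPS
    )


# Point substitutions named elsewhere in the document.
for code in ["D90G", "Q234H", "D304Y", "D324E", "V330A"]:
    print(code, is_conservative(code[0], code[-1]))
```

Under this grouping, D324E and V330A are reported as conservative, while D90G, Q234H and D304Y are not.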

[0043] Sequence homology for polypeptides, which is also referred to as percent sequence identity, is typically measured using sequence analysis software. See, e.g., the Sequence Analysis Software Package of the Genetics Computer Group (GCG), University of Wisconsin Biotechnology Center, 910 University Avenue, Madison, Wis. 53705. Protein analysis software matches similar sequences using a measure of homology assigned to various substitutions, deletions and other modifications, including conservative amino acid substitutions. For instance, GCG contains programs such as “Gap” and “Bestfit” which can be used with default parameters to determine sequence homology or sequence identity between closely related polypeptides, such as homologous polypeptides from different species of organisms or between a wild-type protein and a mutein thereof. See, e.g., GCG Version 6.1.

[0044] A preferred algorithm when comparing a particular polypeptide sequence to a database containing a large number of sequences from different organisms is the computer program BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Gish and States, Nature Genet. 3:266-272 (1993); Madden et al., Meth. Enzymol. 266:131-141 (1996); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); Zhang and Madden, Genome Res. 7:649-656 (1997)), especially blastp or tblastn (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)).

[0045] Preferred parameters for BLASTp are: Expectation value: 10 (default); Filter: seg (default); Cost to open a gap: 11 (default); Cost to extend a gap: 1 (default); Max. alignments: 100 (default); Word size: 11 (default); No. of descriptions: 100 (default); Penalty Matrix: BLOSUM62.

[0046] One skilled in the art may also use the ALIGN program incorporating the non-linear algorithm of Myers and Miller (Comput. Appl. Biosci. (1988) 4:11-17). For amino acid sequence comparison using the ALIGN program one skilled in the art may use a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4.

[0047] The length of polypeptide sequences compared for homology will generally be at least about 16 amino acid residues, usually at least about 20 residues, more usually at least about 24 residues, typically at least about 28 residues, and preferably more than about 35 residues. When searching a database containing sequences from a large number of different organisms, it is preferable to compare amino acid sequences. Database searching using amino acid sequences can be measured by algorithms other than blastp known in the art. For instance, polypeptide sequences can be compared using FASTA, a program in GCG Version 6.1. FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences. Pearson, Methods Enzymol. 183:63-98 (1990) (herein incorporated by reference). For example, percent sequence identity between amino acid sequences can be determined using FASTA with its default parameters (a word size of 2 and the PAM250 scoring matrix), as provided in GCG Version 6.1, herein incorporated by reference.
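Once two sequences have been aligned by one of the programs named above, percent sequence identity reduces to counting matching columns. The sketch below is a simplified illustration and not the calculation performed by FASTA, Gap or BLAST. It assumes the alignment is already given as two equal-length strings with "-" for gaps, and it assumes the denominator is the full alignment length; other conventions (for example, dividing by the shorter sequence length) give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumption: the denominator is the total number of alignment columns,
    so columns containing a gap count against identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignments, not taken from the sequence listing.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # one mismatch in ten columns
print(percent_identity("MKT-LLV", "MKTALIV"))        # one gap and one mismatch
```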

[0048] Codons are triplets of nucleotides in DNA molecules and code for an amino acid. The term codon is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the DNA sequence is transcribed.
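The relationship between DNA codons and the complementary mRNA triplets described in paragraph [0048] can be sketched as follows. The function names and the nine-base template are hypothetical, and the template strand is assumed to be written 3'→5' so that the resulting mRNA reads 5'→3'.

```python
# Base pairing used when a DNA template strand is transcribed into mRNA.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3_to_5):
    """Return the mRNA (5'->3') complementary to a DNA template written 3'->5'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5.upper())


def split_codons(sequence):
    """Split a nucleotide sequence into consecutive triplets."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [sequence[i : i + 3] for i in range(0, len(sequence), 3)]


template = "TACGGTCTA"  # hypothetical template strand, written 3'->5'
mrna = transcribe_template(template)
print(split_codons(mrna))  # ['AUG', 'CCA', 'GAU']
```

In the standard genetic code, AUG, CCA and GAU specify methionine, proline and aspartic acid, respectively.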

[0049] CelZ: CelZ is a cellulase expressed by the celZ gene in Dickeya dadantii. CelZ is identified as EC 3.2.1.4 in the ExPASy Proteomics Server. Alternatively, CelZ is called ENZ, endoglucanase Z, avicelase, β-1,4-endoglucan hydrolase, β-1,4-glucanase, carboxymethyl cellulase, celludextrinase, endo-1,4-β-D-glucanase, endo-1,4-β-D-glucanohydrolase, endo-1,4-β-glucanase, or endoglucanase. ENZ catalyzes endohydrolysis of (1→4)-β-D-glucosidic linkages in cellulose, lichenin and cereal β-D-glucans. The wild-type sequence for CelZ is SEQ ID NO:Z.

[0050] Nucleic Acid Molecule: The term “nucleic acid molecule” or “polynucleotide” refers to a polymeric form of nucleotides of at least 10 bases in length. The term includes DNA molecules (e.g., cDNA or genomic or synthetic DNA) and RNA molecules (e.g., mRNA or synthetic RNA), as well as analogs of DNA or RNA containing non-natural nucleotide analogs, non-native inter-nucleoside bonds, or both. The nucleic acid can be in any topological conformation. For instance, the nucleic acid can be single-stranded, double-stranded, triple-stranded, quadruplexed, partially double-stranded, branched, hair-pinned, circular, or in a padlocked conformation. If single stranded, the nucleic acid molecule can be the sense strand or the antisense strand. “Nucleic acid molecule” includes nucleic acid molecules which are not naturally occurring.

[0051] Isolated: An“isolated” nucleic acid or polynucleotide (e.g., an RNA, DNA or a mixed polymer) is one which is substantially separated from other cellular components that naturally accompany the native polynucleotide in its natural host cell, e.g., ribosomes, polymerases, and genomic sequences with which it is naturally associated. The term embraces a nucleic acid or polynucleotide that (1) has been removed from its naturally occurring environment, (2) is not associated with all or a portion of a polynucleotide in which the“isolated polynucleotide” is found in nature, (3) is operatively linked to a polynucleotide which it is not linked to in nature, or (4) does not occur in nature. The term“isolated” or “substantially pure” also can be used in reference to recombinant or cloned DNA isolates, chemically synthesized polynucleotide analogs, or polynucleotide analogs that are biologically synthesized by heterologous systems. However,“isolated” does not necessarily require that the nucleic acid or polynucleotide so described has itself been physically removed from its native environment. For instance, an endogenous nucleic acid sequence in the genome of an organism is deemed“isolated” herein if a heterologous sequence (i.e., a sequence that is not naturally adjacent to this endogenous nucleic acid sequence) is placed adjacent to the endogenous nucleic acid sequence, such that the expression of this endogenous nucleic acid sequence is altered. By way of example, a non native promoter sequence can be substituted (e.g. by homologous recombination) for the native promoter of a gene in the genome of a human cell, such that this gene has an altered expression pattern. This gene would now become“isolated” because it is separated from at least some of the sequences that naturally flank it. A nucleic acid is also considered“isolated” if it contains any modifications that do not naturally occur to the corresponding nucleic acid in a genome. 
For instance, an endogenous coding sequence is considered“isolated” if it contains an insertion, deletion or a point mutation introduced artificially, e.g. by human intervention. An “isolated nucleic acid” also includes a nucleic acid integrated into a host cell chromosome at a heterologous site, as well as a nucleic acid construct present as an episome. Moreover, an “isolated nucleic acid” can be substantially free of other cellular material, or substantially free of culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. The term also embraces nucleic acid molecules and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acid molecules and proteins.

[0052] The term “percent sequence identity” or “identical” in the context of nucleic acid sequences refers to the residues in the two sequences which are the same when aligned for maximum correspondence. The length of sequence identity comparison may be over a stretch of at least about nine nucleotides, usually at least about 20 nucleotides, more usually at least about 24 nucleotides, typically at least about 28 nucleotides, more typically at least about 32 nucleotides, and preferably at least about 36 or more nucleotides. There are a number of different algorithms known in the art which can be used to measure nucleotide sequence identity. For instance, polynucleotide sequences can be compared using FASTA, Gap or Bestfit, which are programs in Wisconsin Package Version 10.0, Genetics Computer Group (GCG), Madison, Wis. FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences. Pearson, Methods Enzymol. 183:63-98 (1990) (hereby incorporated by reference in its entirety). For instance, percent sequence identity between nucleic acid sequences can be determined using FASTA with its default parameters (a word size of 6 and the NOPAM factor for the scoring matrix) or using Gap with its default parameters as provided in GCG Version 6.1, herein incorporated by reference. Alternatively, sequences can be compared using the computer program, BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Gish and States, Nature Genet. 3:266-272 (1993); Madden et al., Meth. Enzymol. 266:131-141 (1996); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); Zhang and Madden, Genome Res. 7:649-656 (1997)), especially blastp or tblastn (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)).

[0053] A particular, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is that of Karlin and Altschul (Proc. Natl. Acad. Sci. USA (1990) 87:2264-68; Proc. Natl. Acad. Sci. USA (1993) 90:5873-77) as used in the NBLAST and XBLAST programs (version 2.0) of Altschul et al. (J. Mol. Biol. (1990) 215:403-10). BLAST nucleotide searches can be performed with the NBLAST program, score=100, to obtain nucleotide sequences homologous to nucleic acid molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al. (Nucleic Acids Research (1997) 25(17):3389-3402). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used (see website for BLAST hosted by the National Center for Biotechnology Information).

[0054] Purified: The term purified does not require absolute purity; rather, it is intended as a relative term. Thus, for example, a purified product preparation is one in which the product is more concentrated than the product is in its environment within a cell. For example, a purified wax is one that is substantially separated from cellular components (nucleic acids, lipids, carbohydrates, and other peptides) that can accompany it. In another example, a purified wax preparation is one in which the wax is substantially free from contaminants, such as those that might be present following fermentation.

[0055] Recombinant: A recombinant nucleic acid molecule or protein is one that has a sequence that is not naturally occurring, has a sequence that is made by an artificial combination of two otherwise separated segments of sequence, or both. This artificial combination can be achieved, for example, by chemical synthesis or by the artificial manipulation of isolated segments of nucleic acid molecules or proteins, such as genetic engineering techniques. Recombinant is also used to describe nucleic acid molecules that have been artificially manipulated, but contain the same regulatory sequences and coding regions that are found in the organism from which the nucleic acid was isolated.

[0056] “Specific binding” refers to the ability of two molecules to bind to each other in preference to binding to other molecules in the environment. Typically, “specific binding” discriminates over adventitious binding in a reaction by at least two-fold, more typically by at least 10-fold, often at least 100-fold. Typically, the affinity or avidity of a specific binding reaction, as quantified by a dissociation constant, is about 10⁻⁷ M or stronger (e.g., about 10⁻⁸ M, 10⁻⁹ M or even stronger).

[0057] In general, “stringent hybridization” is performed at about 25 °C below the thermal melting point (Tm) for the specific DNA hybrid under a particular set of conditions. “Stringent washing” is performed at temperatures about 5 °C lower than the Tm for the specific DNA hybrid under a particular set of conditions. The Tm is the temperature at which 50% of the target sequence hybridizes to a perfectly matched probe. See Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989), page 9.51, hereby incorporated by reference. For purposes herein, “stringent conditions” are defined for solution phase hybridization as aqueous hybridization (i.e., free of formamide) in 6x SSC (where 20x SSC contains 3.0 M NaCl and 0.3 M sodium citrate), 1% SDS at 65 °C for 8-12 hours, followed by two washes in 0.2x SSC, 0.1% SDS at 65 °C for 20 minutes. It will be appreciated by the skilled worker that hybridization at 65 °C will occur at different rates depending on a number of factors including the length and percent identity of the sequences which are hybridizing.
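The buffer strengths and temperature offsets stated in paragraph [0057] involve simple arithmetic, sketched below for illustration. The function names are hypothetical, and the calculation assumes that SSC component concentrations scale linearly with the stated fold strength relative to the 20x stock.

```python
# 20x SSC composition as stated in the text (molar concentrations).
STOCK_FOLD = 20.0
STOCK_MOLARITY = {"NaCl": 3.0, "sodium citrate": 0.3}


def ssc_molarity(fold):
    """Component molarities of SSC at the given fold strength (linear dilution)."""
    return {
        solute: molarity * fold / STOCK_FOLD
        for solute, molarity in STOCK_MOLARITY.items()
    }


def stringent_temperatures(tm_celsius):
    """Approximate hybridization and wash temperatures relative to Tm.

    Per the text: hybridization about 25 degrees C below Tm, washing about
    5 degrees C below Tm.
    """
    return {"hybridization": tm_celsius - 25.0, "wash": tm_celsius - 5.0}


print(ssc_molarity(6))    # hybridization buffer: about 0.9 M NaCl, 0.09 M citrate
print(ssc_molarity(0.2))  # wash buffer: about 0.03 M NaCl, 0.003 M citrate
print(stringent_temperatures(90.0))  # hypothetical Tm of 90 degrees C
```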

[0058] A preferred, non-limiting example of stringent hybridization conditions includes hybridization in 4x sodium chloride/sodium citrate (SSC), at about 65-70 °C (or hybridization in 4x SSC plus 50% formamide at about 42-50 °C) followed by one or more washes in 1x SSC, at about 65-70 °C. A preferred, non-limiting example of highly stringent hybridization conditions includes hybridization in 1x SSC, at about 65-70 °C (or hybridization in 1x SSC plus 50% formamide at about 42-50 °C) followed by one or more washes in 0.3x SSC, at about 65-70 °C. A preferred, non-limiting example of reduced stringency hybridization conditions includes hybridization in 4x SSC, at about 50-60 °C (or alternatively hybridization in 6x SSC plus 50% formamide at about 40-45 °C) followed by one or more washes in 2x SSC, at about 50-60 °C. Intermediate ranges, e.g., at 65-70 °C or at 42-50 °C, are also within the scope of the invention. SSPE (1x SSPE is 0.15 M NaCl, 10 mM NaH₂PO₄, and 1.25 mM EDTA, pH 7.4) can be substituted for SSC (1x SSC is 0.15 M NaCl and 15 mM sodium citrate) in the hybridization and wash buffers; washes are performed for 15 minutes each after hybridization is complete. The hybridization temperature for hybrids anticipated to be less than 50 base pairs in length should be 5-10 °C less than the melting temperature (Tm) of the hybrid, where Tm is determined according to the following equations. For hybrids less than 18 base pairs in length, Tm (°C) = 2(# of A+T bases) + 4(# of G+C bases). For hybrids between 18 and 49 base pairs in length, Tm (°C) = 81.5 + 16.6(log₁₀[Na⁺]) + 0.41(% G+C) - (600/N), where N is the number of bases in the hybrid, and [Na⁺] is the concentration of sodium ions in the hybridization buffer ([Na⁺] for 1x SSC = 0.165 M).

[0059] The skilled practitioner recognizes that reagents can be added to hybridization and/or wash buffers. For example, to decrease non-specific hybridization of nucleic acid molecules to, for example, nitrocellulose or nylon membranes, blocking agents (including but not limited to BSA or salmon or herring sperm carrier DNA), detergents (including but not limited to SDS), chelating agents (e.g., EDTA), Ficoll, PVP and the like can be used. When using nylon membranes, in particular, an additional, non-limiting example of stringent hybridization conditions is hybridization in 0.25-0.5 M NaH₂PO₄, 7% SDS at about 65 °C, followed by one or more washes at 0.02 M NaH₂PO₄, 1% SDS at 65 °C (Church and Gilbert (1984) Proc. Natl. Acad. Sci. USA 81:1991-1995) or, alternatively, 0.2x SSC, 1% SDS.
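The two melting-temperature equations given in paragraph [0058] for hybrids shorter than 50 base pairs can be sketched as follows. This is a minimal illustration, not a validated tool: the function name `melting_temperature` is hypothetical, the default sodium concentration is the 1x SSC value quoted in the text, and the 24-base input is simply the SEQ ID NO:J sequence of this document used as a convenient test string.

```python
import math


def melting_temperature(seq, sodium_molar=0.165):
    """Estimate Tm in degrees C using the two equations quoted for hybrids under 50 bases."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G and T")
    n = len(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if n < 18:
        # Hybrids shorter than 18 bases: 2 degrees per A/T, 4 degrees per G/C.
        return 2.0 * at + 4.0 * gc
    if n <= 49:
        # Hybrids of 18 to 49 bases: salt-, GC- and length-adjusted equation.
        percent_gc = 100.0 * gc / n
        return 81.5 + 16.6 * math.log10(sodium_molar) + 0.41 * percent_gc - 600.0 / n
    raise ValueError("the text gives no equation for hybrids of 50 bases or more")


tm_short = melting_temperature("ATGCATGCATGC")             # 12 bases, first equation
tm_24mer = melting_temperature("AAAGTATCATCCAGAGCTGGCCGT")  # 24 bases, second equation
hybridization_range = (tm_24mer - 10.0, tm_24mer - 5.0)     # 5-10 degrees below Tm
```

The last line applies the stated rule that the hybridization temperature for such short hybrids should be 5-10 °C below the calculated Tm.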

[0060] The term “substantial homology” or “substantial similarity,” when referring to a nucleic acid or fragment thereof, indicates that, when optimally aligned with appropriate nucleotide insertions or deletions with another nucleic acid (or its complementary strand), there is nucleotide sequence identity in at least about 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, preferably at least about 90%, and more preferably at least about 95%, 96%, 97%, 98% or 99% of the nucleotide bases, as measured by any well-known algorithm of sequence identity, such as FASTA, BLAST or Gap, as discussed above.

[0061] Alternatively, substantial homology or similarity exists when a nucleic acid or fragment thereof hybridizes to another nucleic acid, to a strand of another nucleic acid, or to the complementary strand thereof, under stringent hybridization conditions. “Stringent hybridization conditions” and “stringent wash conditions” in the context of nucleic acid hybridization experiments depend upon a number of different physical parameters. Nucleic acid hybridization will be affected by such conditions as salt concentration, temperature, solvents, the base composition of the hybridizing species, length of the complementary regions, and the number of nucleotide base mismatches between the hybridizing nucleic acids, as will be readily appreciated by those skilled in the art. One having ordinary skill in the art knows how to vary these parameters to achieve a particular stringency of hybridization.

[0062] As used herein, a composition that is a “substantially pure” compound is substantially free of one or more other compounds, i.e., the composition contains greater than 80 vol.%, greater than 90 vol.%, greater than 95 vol.%, greater than 96 vol.%, greater than 97 vol.%, greater than 98 vol.%, greater than 99 vol.%, greater than 99.5 vol.%, greater than 99.6 vol.%, greater than 99.7 vol.%, greater than 99.8 vol.%, or greater than 99.9 vol.% of the compound; or less than 20 vol.%, less than 10 vol.%, less than 5 vol.%, less than 3 vol.%, less than 1 vol.%, less than 0.5 vol.%, less than 0.1 vol.%, or less than 0.01 vol.% of the one or more other compounds, based on the total volume of the composition.

[0063] Vector: The term “vector” as used herein refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a “plasmid,” which refers to a circular double stranded DNA loop into which additional DNA segments may be ligated. Other vectors include cosmids, bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs). Another type of vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome (discussed in more detail below). Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., vectors having an origin of replication which functions in the host cell). Other vectors can be integrated into the genome of a host cell upon introduction into the host cell, and are thereby replicated along with the host genome. Moreover, certain preferred vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “recombinant expression vectors” (or simply, “expression vectors”). A vector can also include one or more selectable marker genes and other genetic elements known in the art. Suitable vectors for use in cyanobacteria include self-replicating plasmids (e.g., multiple copy and high-level expression) and chromosomal integration plasmids. Both integration of a vector into the host genome and autonomous replication of a vector allow for gene expression in the host cell. When stable expression results from integration, integration of the construct can occur randomly within the host genome or can be targeted through the use of constructs containing regions of homology with the host genome sufficient to target recombination with the host locus. Where constructs are targeted to an endogenous locus, all or some of the transcriptional and translational regulatory regions can be provided by the endogenous locus.

New Cellulases

[0064] Provided herein are cellulases that are new variants of CelZ cellulase. Cellulase (E.C. 3.2.1.4) is an enzyme that hydrolyzes the β(1→4) linkages between D-glucose residues of cellulose. The cellulases of the invention have improved activity over the wild-type CelZ. Additionally, isolated nucleic acid molecules encoding the cellulases of the invention are provided along with vectors comprising the isolated nucleic acid molecules and engineered host cells to produce the cellulases of the invention.

Enzymes

[0065] In a first aspect, improved cellulases of the invention consist of or comprise SEQ ID NO:A.

SEQ ID NO:A

SVEPLSVNGNKIYAGEKAKSFAGNSLFWSNNGWGGEKFYTADTVASLKKDWKSSIV RAAMGVQESGGYLQDPAGNKAKVERVVDAAIANX1 MYAIIGWHSHSAENNRSEAIR FFQEMARKYGNKPNVIYEIYNEPLQVSWSNTIKPYAEAVISAIRAIDPDNLIIVGTPSW SQNVDEASRDPINAKNIAYTLHFYAGTHGESLRNKARQALNNGIALFVTEWGTVNA DGNGGVNX2 TETDAWVTFMRDNNISNANWALNDKNEGASTYYPDSKNLTESGKKV KSIIQSWPYKAGSAASATTDPSTDTTTDTTVDEPTTTDTPATADCANANVYPNWVSK DWAGGQPTHNEAGQSIVYKGNLYTANWYTASVPGSDSSWTQVGSCX3

wherein X1 is selected from the group consisting of G, A, N, D and S; wherein X2 is selected from the group consisting of H, Y, N, E, Q and R; wherein X3 is selected from the group consisting of TDNNIGSG and N; and with the proviso that X1 is not D when X2 is Q and X3 is N.

[0066] In an alternate embodiment, cellulases of the invention consist of or comprise SEQ ID NO:AA. SEQ ID NO:AA includes a signal sequence which may be absent in the mature enzyme.

SEQ ID NO:AA

MPLSYLDKNPVIDSKKHALRKKLFLSCAYFGLSLACLSSNAWASVEPLSVNGNKIYA GEKAKSFAGNSLFWSNNGWGGEKFYTADTVASLKKDWKSSIVRAAMGVQESGGYL QDPAGNKAKVERVVDAAIANX1 MYAIIGWHSHSAENNRSEAIRFFQEMARKYGNKP NVIYEIYNEPLQVSWSNTIKPYAEAVISAIRAIDPDNLIIVGTPSWSQNVDEASRDPINA KNIAYTLHFYAGTHGESLRNKARQALNNGIALFVTEWGTVNADGNGGVNX2 TETDA WVTFMRDNNISNANWALNDKNEGASTYYPDSKNLTESGKKVKSIIQSWPYKAGSAA SATTDPSTDTTTDTTVDEPTTTDTPATADCANANVYPNWVSKDWAGGQPTHNEAG QSIVYKGNLYTANWYTASVPGSDSSWTQVGSCX3

wherein X1 is selected from the group consisting of G, A, N, D and S; wherein X2 is selected from the group consisting of H, Y, N, E, Q and R; wherein X3 is selected from the group consisting of TDNNIGSG and N; and with the proviso that X1 is not D when X2 is Q and X3 is N.

[0067] In an alternate embodiment, cellulases of the invention consist of or comprise SEQ ID NO:B or SEQ ID NO:BA. SEQ ID NO:BA includes a signal sequence which may be absent in the mature enzyme.
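The variable positions and the proviso recited for SEQ ID NO:A and SEQ ID NO:AA define a finite set of residue combinations. As a minimal sketch, the allowed (X1, X2, X3) combinations can be enumerated as follows; the constant and function names are hypothetical, and the residue lists are copied from the "wherein" clauses above.

```python
from itertools import product

# Residues permitted at each variable position, as recited in the "wherein" clauses.
X1_CHOICES = ["G", "A", "N", "D", "S"]
X2_CHOICES = ["H", "Y", "N", "E", "Q", "R"]
X3_CHOICES = ["TDNNIGSG", "N"]


def allowed_combinations():
    """Return every (X1, X2, X3) combination that satisfies the proviso."""
    combos = []
    for x1, x2, x3 in product(X1_CHOICES, X2_CHOICES, X3_CHOICES):
        # Proviso: X1 is not D when X2 is Q and X3 is N.
        if x1 == "D" and x2 == "Q" and x3 == "N":
            continue
        combos.append((x1, x2, x3))
    return combos


combos = allowed_combinations()
```

Of the 5 x 6 x 2 = 60 raw combinations, the proviso removes exactly one, leaving 59.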

[0068] In yet another alternate embodiment, cellulases of the invention consist of or comprise SEQ ID NO:C, SEQ ID NO:CA, SEQ ID NO:D, SEQ ID NO:DA, SEQ ID NO:E or SEQ ID NO:EA. SEQ ID NOs:CA, DA and EA include a signal sequence which may be absent in the mature enzyme.

[0069] In an alternative embodiment of the present invention, the isolated polypeptide comprises a polypeptide sequence at least 85% identical to any one of the disclosed sequences. Preferably the isolated polypeptide of the present invention has 90%, 95%, 98%, 98.1%, 98.2%, 98.3%, 98.4%, 98.5%, 98.6%, 98.7%, 98.8%, 98.9%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9% or even higher identity to any one of the disclosed sequences.
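To make the identity thresholds of paragraph [0069] concrete, the following sketch computes percent identity for two sequences that have already been aligned (gaps written as "-"). It is an assumption-laden illustration: the function name is hypothetical, gapped columns are simply skipped, and alignment tools such as BLAST or FASTA may use a different denominator, so values need not match theirs exactly.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped column: excluded from the comparison (an assumption)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


# Short illustrative strings only; the second differs from the first at one position.
identity = percent_identity("SVEPLSVNGN", "SVEPLAVNGN")
meets_85_percent = identity >= 85.0
```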

[0070] Alignment of sequences DA, EA, CA and BA with wild-type CelZ is provided in FIG. 1. Wild-type CelZ includes a signal sequence and is identified as SEQ ID NO:ZA.

Asterisks indicate a variant location and the variant amino acids are bold and underlined.

[0071] Using the aligned sequences, one of ordinary skill in the art can prepare additional CelZ variants that share the increased activity of the disclosed sequences.

[0072] According to other embodiments of the first aspect, isolated polypeptides comprising a fragment of the above-described polypeptide sequences are provided. These fragments preferably include at least 20 contiguous amino acids, more preferably at least 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or even more contiguous amino acids.

[0073] According to yet another embodiment of the first aspect, fusions between the above-described polypeptide sequences and heterologous polypeptides are provided. The heterologous sequences can, for example, include sequences designed to facilitate purification, e.g., histidine tags, and/or visualization of recombinantly-expressed proteins. Other non-limiting examples of protein fusions include those that permit display of the encoded protein on the surface of a phage or a cell, fusions to intrinsically fluorescent proteins, such as green fluorescent protein (GFP), fusions to signal peptides to direct polypeptide processing and export, fusions to cellulose binding module(s), fusions to dockerin domain(s), fusions to cohesin domain(s), fusions to fibronectin-like domain(s), and fusions to the IgG Fc region.

Nucleic Acid Molecules

[0074] In a second aspect, nucleic acid molecules that encode cellulases of the invention are provided. In one embodiment of the second aspect, the nucleic acid molecule comprises or consists of SEQ ID NO:F. SEQ ID NO:F is the nucleic acid sequence encoding the cellulase of SEQ ID NO:B.

[0075] In another embodiment, nucleic acid sequences, SEQ ID NO:G, SEQ ID NO:H and SEQ ID NO:I are provided. These are the nucleic acid sequences that encode enzyme sequences, SEQ ID NO:C, SEQ ID NO:D and SEQ ID NO:E, respectively.

[0076] In another embodiment of the second aspect, nucleic acid molecules that hybridize under stringent conditions to SEQ ID NOS:F, G, H or I or the complement of SEQ ID NOS:F, G, H or I are provided.

[0077] It is well understood that one of skill in the art can mutate (e.g., substitute) nucleotides in a manner which, due to the degeneracy of the genetic code, does not change which amino acid is encoded. This may be desirable in order to improve the codon usage of a nucleic acid to be expressed in a particular organism. Moreover, it is well understood that one of skill in the art can mutate (e.g., substitute) nucleotides so as to encode conservative amino acid substitutions. Therefore, the scope of the invention includes nucleic acid molecules that encode the cellulases of the invention as well as polypeptides substantially similar to the cellulases of the invention.
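The degeneracy point in paragraph [0077] can be illustrated with a short sketch that translates DNA using the standard genetic code and shows that a recoded sequence yields the same peptide. The helper names are hypothetical. The first input is the five codons ATG AGC GTG GAA CCG, which appear at the 3' end of the forward primer SEQ ID NO:L in this document; the recoded version is an arbitrary synonymous choice, not a codon-optimized sequence from the application.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids ('*' marks a stop)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(codon):
    """Return all codons that encode the same amino acid as the given codon."""
    aa = CODON_TABLE[codon.upper()]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


original = "ATGAGCGTGGAACCG"  # ATG AGC GTG GAA CCG
recoded = "ATGTCTGTTGAGCCA"   # ATG TCT GTT GAG CCA (synonymous substitutions)
same_protein = translate(original) == translate(recoded)
```

Both sequences translate to MSVEP, consistent with the SVEP... start of SEQ ID NO:A preceded by an initiator methionine.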

[0078] In one other embodiment of the second aspect, nucleic acid molecules comprising a fragment of any one of the above-described nucleic acid sequences are also provided. These fragments preferably contain at least 20 contiguous nucleotides. More preferably the fragments of the nucleic acid sequences contain at least 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or even more contiguous nucleotides. In one embodiment, nucleic acid molecule fragments comprise or consist of SEQ ID NO:J: 5'-AAAGTATCATCCAGAGCTGGCCGT-3' (DdCel5Z For 977) or SEQ ID NO:K: 5'-TTCGGATACACATTTGCGTTGGCG-3' (DdCel5Z Rev 1133).

Vectors

[0079] In a third aspect of the invention, vectors for engineering a host cell to express the cellulase of the invention are provided.

[0080] In one embodiment of the third aspect, the vector is a plasmid. The plasmid comprises a nucleic acid molecule encoding the cellulase of the invention as well as regulatory sequences. Regulatory sequences include promoters, enhancers, termination signals, anti-termination signals and other expression control elements that, for example, serve as sequences to which repressors or inducers bind or serve as or encode binding sites for transcriptional and/or translational regulatory polypeptides, for example, in the transcribed mRNA (see Sambrook, J., Fritsch, E. F., and Maniatis, T., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989). In one embodiment, the plasmid incorporates the T7 promoter sequence (5'-TAA TAC GAC TCA CTA TAG GG-3') to allow regulated expression of heterologous genes in E. coli by utilizing the high activity and specificity of the bacteriophage T7 RNA polymerase (Rosenberg et al., 1987; Studier & Moffatt, 1986; Studier et al., 1990). The commercially available pET101/D-TOPO (accession number not available) vector is a suitable expression vector containing the following additional genetic elements: lac operator, ribosome binding site, TOPO® cloning site, V5 epitope, polyhistidine region, T7 transcription termination region, bla promoter, ampicillin (bla) resistance gene, pBR322 origin, ROP ORF, and lacI ORF. A proprietary ligation independent cloning (LIC) vector functionally similar to pET101/D-TOPO was used in the present invention.

Example 1 – Generating Plasmid

[0081] DNA encoding novel variants was PCR amplified using LIC-tailed gene-specific primers as follows: 0.3 μM primers (forward 5'-TACTTCCAATCCAATGCAATGAGCGTGGAACCGCT-3' (SEQ ID NO:L); reverse 5'-TTATCCACTTCCAATGTTATTATCAGTTACAGCTACCAACCT-3' (SEQ ID NO:M)), 1 ng template DNA, 1x KOD Hot Start Master Mix (#71842-3, EMD Chemicals), water to 50 μL. Reactions were incubated at 95 °C for 2 min to activate the polymerase followed by 30x thermal cycling at 95 °C for 20 s, 60 °C for 10 s, and 70 °C for 30 s. Resulting amplicons were band-purified by agarose gel electrophoresis to remove non-specific amplicons and purified by using the illustra GFX PCR DNA and Gel Band Purification kit (#28-9034-71, GE Healthcare) according to the manufacturer's protocols. 0.2 pmol of purified amplicon was treated at 22 °C for 30 min and 75 °C for 20 min with 1 U of LIC-qualified T4 DNA polymerase (#70099-3, EMD Chemicals) in a 20 μL reaction containing 2.5 mM dCTP, 5.0 mM DTT, and 1x T4 DNA polymerase buffer to generate sticky overhangs. 0.02 pmol of the treated product was hybridized with 0.014 pmol of similarly prepared LIC vector in a 3.4 μL reaction at 25 °C for 5 min followed by the addition of 1 μL 25 mM EDTA and another 25 °C 5 min incubation.
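The quantities in Example 1 lend themselves to a few quick checks. The sketch below, with hypothetical function names, totals the programmed hold times of the PCR (ignoring ramp times, which depend on the instrument), computes the insert-to-vector molar ratio of the annealing step, and estimates the EDTA concentration after the 1 μL addition.

```python
def cycling_seconds(initial_hold_s, cycles, step_seconds):
    """Total programmed hold time: initial hold plus all cycle steps (ramping ignored)."""
    return initial_hold_s + cycles * sum(step_seconds)


def diluted_concentration(stock_conc, added_volume, starting_volume):
    """Concentration after adding a stock to a reaction, in the units of stock_conc."""
    return stock_conc * added_volume / (starting_volume + added_volume)


# 95 C for 2 min, then 30 cycles of 20 s, 10 s and 30 s.
total_s = cycling_seconds(2 * 60, 30, [20, 10, 30])

# 0.02 pmol treated insert annealed with 0.014 pmol treated vector.
insert_to_vector = 0.02 / 0.014

# 1 uL of 25 mM EDTA added to the 3.4 uL annealing reaction.
final_volume_ul = 3.4 + 1.0
edta_mM = diluted_concentration(25.0, 1.0, 3.4)
```

The programmed time comes to 1920 s (32 min), the insert is in roughly 1.4-fold molar excess over the vector, and the 4.4 μL final volume corresponds to the volume used for transformation in Example 2.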

Host Cell Transformants

[0082] In a fourth aspect of the invention, host cells transformed with the nucleic acid molecules or vectors, and descendants thereof, are provided. In some embodiments, these cells carry the nucleic acid sequences on vectors, which may but need not be freely replicating vectors. In other embodiments, the nucleic acids have been integrated into the genome of the host cells.

[0083] In one embodiment of the fourth aspect, suitable host cells are SHuffle™ competent E. coli available from New England BioLabs in Ipswich, MA.

Example 2 – Transforming E. coli

[0084] The hybridization reaction above was transformed into SHuffle™ competent E. coli (#C3029H, New England Biolabs) as follows: 4.4 μL of hybridized insert and vector were added to thawed SHuffle™ cells and incubated on ice for 30 min. The mixture was heat shocked at 42 °C for 30 s followed by a 1 hr outgrowth at 30 °C in 250 μL SOC media. The mixture was split and plated onto 2 pre-warmed LB plates containing 100 μg/mL carbenicillin and grown inverted for 16 hr at 37 °C.

[0085] In another embodiment of the fourth aspect, methods for expressing a polypeptide under suitable culture conditions and choice of host cell line for optimal enzyme expression, activity and stability (codon usage, salinity, pH, temperature, etc.) are provided.

Example 3 – Generating Cellulase

[0086] Individual colonies were picked into 1 mL Magic Media (#K6803, Life Technologies) and incubated at 37 °C for 6 hrs followed by 25 °C for 18 hrs, both with shaking at 900 rpm. OD600 measurements were made on 50 μL of sample to determine cell growth. Cell pellets were recovered by decanting the supernatant after centrifuging at 3,000 rpm at 4 °C for 10 min. Cell pellets were lysed for 20 min in 200 μL BugBuster™ (#71456-3, EMD Chemicals) stock solution (40 mL BugBuster™, 400 μL 100 mM PMSF, 40 μL 1 mg/mL pepstatin, 40 μL 1 mg/mL leupeptin, 40 mg lysozyme). Lysed cells were centrifuged at 3,000 rpm at 4 °C for 30 min and the cleared lysate supernatant containing expressed cellulase variant was recovered for testing.
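The lysis stock recipe in Example 3 implies the following approximate final additive concentrations. This is a back-of-the-envelope sketch with hypothetical names; it assumes volumes are additive and that the dissolved lysozyme contributes no volume.

```python
def final_concentration(stock_conc, added_volume_ml, total_volume_ml):
    """Final concentration after dilution, in the units of stock_conc."""
    return stock_conc * added_volume_ml / total_volume_ml


# Recipe from Example 3: 40 mL BugBuster, 400 uL PMSF, 40 uL pepstatin, 40 uL leupeptin.
total_ml = 40.0 + 0.400 + 0.040 + 0.040

pmsf_mM = final_concentration(100.0, 0.400, total_ml)                # 100 mM stock
pepstatin_ug_per_ml = final_concentration(1000.0, 0.040, total_ml)   # 1 mg/mL stock
leupeptin_ug_per_ml = final_concentration(1000.0, 0.040, total_ml)   # 1 mg/mL stock
lysozyme_mg_per_ml = 40.0 / total_ml                                 # 40 mg solid
```

Under these assumptions the stock works out to roughly 1 mM PMSF, 1 μg/mL each of pepstatin and leupeptin, and 1 mg/mL lysozyme.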

Example 4 – Determining Activity of Generated Cellulases

[0087] 5 μL of cleared lysate diluted 1:10 in 50 mM HEPES buffer pH 7.2 was combined with 45 μL of 1.1% carboxymethyl cellulose (CMC) and incubated at 50 °C for 30 min. After incubation, 120 μL of stock 3,5-dinitrosalicylic acid (DNS) solution (5 g DNS, 250 mL H₂O, heat to 40 °C, 50 g KNa tartrate, 50 mL 4 N NaOH, H₂O to 500 mL) was added and incubated at 95 °C for 5 min. Absorbance at 540 nm from a 100 μL sample was measured to determine the amount of liberated reducing sugars by the cellulase variant. Measurements were calibrated to a glucose standard curve and compared to wild-type activity under identical conditions. Wild-type CelZ (SEQ ID NO:Z) and improved variants of SEQ ID NO: B, C, D and E were tested and the results are shown below in FIG. 2.
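The calibration step in Example 4 (absorbance at 540 nm converted to reducing sugar through a glucose standard curve, then compared with wild type) can be sketched with an ordinary least-squares line. Every number below is a hypothetical placeholder chosen to lie on a straight line; none is a measurement from the application, and the function names are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def reducing_sugar(a540, slope, intercept):
    """Convert an absorbance reading to glucose equivalents using the standard curve."""
    return (a540 - intercept) / slope


# Hypothetical glucose standards (mg/mL) and their A540 readings.
standards_mg_per_ml = [0.0, 0.25, 0.5, 1.0]
standards_a540 = [0.05, 0.30, 0.55, 1.05]
slope, intercept = fit_line(standards_mg_per_ml, standards_a540)

# Hypothetical sample readings for wild type and one variant.
wild_type_sugar = reducing_sugar(0.45, slope, intercept)
variant_sugar = reducing_sugar(0.85, slope, intercept)
relative_activity = variant_sugar / wild_type_sugar
```

With these placeholder readings the variant would show twice the wild-type signal; real comparisons would also need to account for the lysate dilution and for differences in expression level, which this sketch does not model.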

Example 5 – Determining Activity of Generated Cellulases on AVICEL™

[0088] Cellulase variants were purified from cleared lysate (Example 3 – Generating Cellulase) by IMAC, desalted by ultrafiltration, and quantified by absorbance at 280 nm. 125 μL of purified enzymes at 5, 10, 20, 40, 80, 160 μg/mL in 50 mM HEPES buffer pH 7.2 were combined with 125 μL of 20 mg/mL AVICEL™ PH-101 (Sigma-Aldrich) in 50 mM HEPES buffer pH 7.2 and incubated at 50 °C, shaking at 1000 rpm (ø 3 mm orbit). After 24 h, 75 μL of the AVICEL™ PH-101 digestion reaction was transferred to a new tube and pelleted at 2000 x g for 5 min. 25 μL of the supernatant was then added to 225 μL of p-hydroxybenzoic acid hydrazide (pHBAH) solution (0.185% pHBAH, 0.1 M NaOH) and incubated at 95 °C for 5 min. Absorbance at 410 nm from a 100 μL sample was then measured to determine the amount of liberated reducing sugars by the cellulase variant. Absorbance measurements were transformed into % conversion assuming complete hydrolysis of the substrate into cellobiose. Wild-type CelZ (SEQ ID NO:Z) and improved variant SEQ ID NO:B were tested under identical conditions and the results are shown in FIG. 3.

Example 6 – Biofuel production by using enzyme variant SEQ ID NO:B

[0089] The following procedure may be used to produce ethanol from biomass. Generally, the procedure comprises simultaneous saccharification and fermentation (SSF) of pretreated lignocellulosic biomass whereby cellulases convert the biomass into accessible sugar and yeast ferment the sugar into ethanol.
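The "% conversion" transformation described in Example 5 can be approximated as follows. This sketch rests on stated assumptions that are not in the original text: AVICEL™ is treated as pure cellulose, each pair of anhydroglucose units (162.14 g/mol each) yields one cellobiose, and the measured reducing sugar has already been expressed as millimolar cellobiose equivalents. The function names and the sample reading are hypothetical.

```python
ANHYDROGLUCOSE_G_PER_MOL = 162.14  # molar mass of one glucose unit within cellulose


def max_cellobiose_mM(cellulose_mg_per_ml):
    """Cellobiose concentration (mM) if all cellulose were hydrolyzed to cellobiose."""
    mol_per_l = cellulose_mg_per_ml / (2.0 * ANHYDROGLUCOSE_G_PER_MOL)  # mg/mL equals g/L
    return mol_per_l * 1000.0


def percent_conversion(measured_cellobiose_mM, cellulose_mg_per_ml):
    """Measured cellobiose equivalents as a percentage of the theoretical maximum."""
    return 100.0 * measured_cellobiose_mM / max_cellobiose_mM(cellulose_mg_per_ml)


# 125 uL of 20 mg/mL AVICEL mixed with 125 uL of enzyme gives 10 mg/mL in the reaction.
reaction_avicel_mg_per_ml = 20.0 * 125.0 / (125.0 + 125.0)
ceiling_mM = max_cellobiose_mM(reaction_avicel_mg_per_ml)

# Hypothetical reading of about 3.08 mM cellobiose equivalents.
example_conversion = percent_conversion(3.0837, reaction_avicel_mg_per_ml)
```

Under these assumptions the theoretical ceiling is about 30.8 mM cellobiose, so a reading near 3.08 mM would correspond to roughly 10% conversion.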

[0090] Biomass may be pretreated in various manners. One method is to solubilize the biomass in concentrated phosphoric acid then precipitate the swollen cellulose using cold water. The cellulose is collected and washed with sufficient water to neutralize the pH. Alternatively, the biomass may be pretreated with dilute sulfuric acid. Briefly, milled and washed biomass at 20% total solids concentration is treated for 3–12 min in 0.5–1.41% (w/w liquid phase) H₂SO₄ at 165–183 °C. Following treatment, the biomass is washed with water.

[0091] Simultaneous saccharification and fermentation is conducted in a shaking incubator (150 rpm) at a working volume of 100 mL in 250-mL baffled flasks. The washed pretreated biomass is loaded to a level of 6–7% (w/w) cellulose fraction and combined with 35 mg cellulase enzyme variant SEQ ID NO:B per gram of cellulose and beta-glucosidase such as ACCELLERASE™ BG (Danisco A/S, Copenhagen, DK) at 0.05 mL product per gram of cellulose. The medium consists of yeast extract (1% [w/v]), peptone (2% [w/v]), and citrate buffer (0.05 M). The initial pH is adjusted to 5.2 using NaOH, and then the culture is inoculated with the yeast, Saccharomyces cerevisiae D5A, to achieve an initial optical density (at 600 nm) of 0.5. The flask is maintained at 32–38 °C for 7 days.
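The loadings in paragraph [0091] translate into per-flask amounts as sketched below. The calculation assumes, purely for illustration, that the 100 mL working volume has a mass of about 100 g (a slurry density near 1 g/mL, which the text does not state); the function name and dictionary keys are hypothetical.

```python
def ssf_loading(slurry_mass_g, cellulose_fraction,
                enzyme_mg_per_g=35.0, beta_glucosidase_ml_per_g=0.05):
    """Per-flask cellulose, cellulase and beta-glucosidase amounts for the SSF setup."""
    cellulose_g = slurry_mass_g * cellulose_fraction
    return {
        "cellulose_g": cellulose_g,
        "cellulase_mg": cellulose_g * enzyme_mg_per_g,
        "beta_glucosidase_ml": cellulose_g * beta_glucosidase_ml_per_g,
    }


# Assumed slurry mass of 100 g for the 100 mL working volume.
low = ssf_loading(100.0, 0.06)   # 6% (w/w) cellulose
high = ssf_loading(100.0, 0.07)  # 7% (w/w) cellulose
```

On that assumption each flask would hold about 6–7 g of cellulose, 210–245 mg of the cellulase variant and 0.30–0.35 mL of beta-glucosidase product.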

Informal Sequence Listing
