

Title:
HETEROLOGOUS AND HOMOLOGOUS CELLULASE EXPRESSION SYSTEM
Document Type and Number:
WIPO Patent Application WO/2008/153903
Kind Code:
A3
Abstract:
The present invention provides filamentous fungi that express a combination of heterologous and homologous polypeptides, polypeptide mixtures comprising a combination of heterologous and homologous polypeptides and methods of producing the polypeptide mixtures.

Inventors:
BOWER BENJAMIN S (US)
LARENAS EDMUND A (US)
Application Number:
PCT/US2008/007077
Publication Date:
March 05, 2009
Filing Date:
June 05, 2008
Assignee:
DANISCO US INC GENENCOR DIV (US)
BOWER BENJAMIN S (US)
LARENAS EDMUND A (US)
International Classes:
C12N1/15; C12N15/80; C12P21/02
Domestic Patent References:
WO1998031821A2 1998-07-23
WO2005093073A1 2005-10-06
WO1996000281A1 1996-01-04
WO1997014804A1 1997-04-24
Foreign References:
US5753484A 1998-05-19
Other References:
NYYSSONEN E ET AL: "MULTIPLE ROLES OF THE CELLULASE CBHI IN ENHANCING PRODUCTION OF FUSION ANTIBODIES BY THE FILAMENTOUS FUNGUS TRICHODERMA REESEI", CURRENT GENETICS, NEW YORK, NY, US, vol. 28, no. 1, 1 January 1995 (1995-01-01), pages 71 - 79, XP008039837, ISSN: 0172-8083
BAKER JOHN O ET AL: "Hydrolysis of cellulose using ternary mixtures of purified cellulases", APPLIED BIOCHEMISTRY AND BIOTECHNOLOGY, vol. 70-72, no. 0, April 1998 (1998-04-01), pages 395 - 403, XP002498059, ISSN: 0273-2289
WALKER L P ET AL: "Engineering cellulase mixtures by varying the mole fraction of Thermomonospora fusca E-5 and E-3, Trichoderma reesei CBHI, and Caldocellum saccharolyticum beta-glucosidase", BIOTECHNOLOGY AND BIOENGINEERING, vol. 42, no. 9, 1993, pages 1019 - 1028, XP002498060, ISSN: 0006-3592
Attorney, Agent or Firm:
KOLMAN, Michael, F. (Inc. Genencor Division, 925 Page Mill Road, Palo Alto CA, US)
Claims:

CLAIMS

What is Claimed:

1. A filamentous fungus comprising a first polynucleotide encoding a first heterologous polypeptide, a second polynucleotide encoding a second heterologous polypeptide, and a third polynucleotide encoding a homologous polypeptide wherein the filamentous fungus is capable of expressing the first and second heterologous polypeptide and the homologous polypeptide and wherein the first and second heterologous polypeptide and the homologous polypeptide form a functional mixture.

2. The filamentous fungus of claim 1, wherein the first polynucleotide is operably linked to a first promoter.

3. The filamentous fungus of claim 1, wherein the second polynucleotide is fused with the third polynucleotide and wherein the second and third polynucleotides are operably linked to a second promoter.

4. The filamentous fungus of claim 1, wherein the first polynucleotide is operably linked to a promoter native to the gene encoding the homologous polypeptide.

5. The filamentous fungus of claim 1, wherein the second polynucleotide is fused with the third polynucleotide and wherein the third polynucleotide is operably linked to a promoter of a gene encoding the homologous polypeptide.

6. The filamentous fungus of claim 1, wherein the second polynucleotide is fused with the third polynucleotide to form a polynucleotide encoding a fusion protein, wherein the fusion protein comprises the second heterologous polypeptide and the homologous polypeptide separated by a linker.

7. The filamentous fungus of claim 6, wherein the fusion protein further comprises a cleavage site.

8. The filamentous fungus of claim 1 further comprising a fourth polynucleotide encoding a selectable marker.

9. The filamentous fungus of claim 1 further comprising a fourth polynucleotide encoding a third heterologous polypeptide, wherein the filamentous fungus is capable of expressing the third heterologous polypeptide.

10. The filamentous fungus of claim 1, wherein the first heterologous polypeptide is a modified homologous polypeptide.

11. The filamentous fungus of claim 1 further comprising a fourth polynucleotide encoding a third heterologous polypeptide, wherein the first and second heterologous polypeptides are modified homologous polypeptides.

12. The filamentous fungus of claim 1, wherein the first heterologous polypeptide, the second heterologous polypeptide or the homologous polypeptide is an enzyme.

13. The filamentous fungus of claim 1, wherein the first heterologous polypeptide, the second heterologous polypeptide or the homologous polypeptide is a cellulase.

14. The filamentous fungus of claim 1, wherein the functional mixture is a mixture of cellulases.

15. The filamentous fungus of claim 1, wherein the first heterologous polypeptide, the second heterologous polypeptide or the homologous polypeptide is a cellulase selected from the group consisting of exo-cellobiohydrolases, endoglucanases, and beta-glucosidases.

16. The filamentous fungus of claim 1, wherein the first heterologous polypeptide is an exo-cellobiohydrolase and the second heterologous polypeptide is an endoglucanase.

17. The filamentous fungus of claim 1, wherein the first heterologous polypeptide is an exo-cellobiohydrolase selected from the group consisting of GH family 5, 6, 7, 9 and 48, and wherein the second heterologous polypeptide is an endoglucanase selected from the group consisting of GH family 5, 6, 7, 8, 9, 12, 17, 31, 44, 45, 48, 51, 61, 64, 74, and 81.

18. The filamentous fungus of claim 1, wherein the first heterologous polypeptide is an exo-cellobiohydrolase, the second heterologous polypeptide is an endoglucanase, and wherein the homologous polypeptide is an exo-cellobiohydrolase.

19. The filamentous fungus of claim 1, wherein the first heterologous polypeptide is a first exo-cellobiohydrolase, the second heterologous polypeptide is an endoglucanase, the homologous polypeptide is a second exo-cellobiohydrolase, and wherein the first exo-cellobiohydrolase and the second exo-cellobiohydrolase correspond to the same member of cellobiohydrolases.

20. The filamentous fungus of claim 1, wherein the filamentous fungus is selected from the group consisting of Aspergillus, Acremonium, Aureobasidium, Beauveria, Cephalosporium, Ceriporiopsis, Chaetomium, Paecilomyces, Chrysosporium, Claviceps, Cochiobolus, Cryptococcus, Cyathus, Endothia, Fusarium, Gilocladium, Humicola, Magnaporthe, Myceliophthora, Myrothecium, Mucor, Neurospora, Phanerochaete, Podospora, Paecilomyces, Penicillium, Pyricularia, Rhizomucor, Rhizopus, Schizophylum, Stagonospora, Talaromyces, Trichoderma, Thermomyces, Thermoascus, Thielavia, Tolypocladium, Trichophyton, Trametes, and Pleurotus.

21. The filamentous fungus of claim 1, wherein the filamentous fungus is T. reesei and wherein the first heterologous polypeptide is Humicola grisea CBHI, the second heterologous polypeptide is Acidothermus cellulolyticus endoglucanase 1, and wherein the homologous polypeptide is Trichoderma reesei CBHI.

22. The filamentous fungus of claim 1, wherein the filamentous fungus is T. reesei and wherein the first heterologous polypeptide or the second heterologous polypeptide is selected from the group consisting of Penicillium funiculosum cellobiohydrolase CBHI, Thermobifida endoglucanases E3, Thermobifida endoglucanases E5, Acidothermus cellulolyticus GH74-core and GH48.

23. The filamentous fungus of claim 1 further comprising a fourth polynucleotide encoding a third heterologous polypeptide, wherein the first polypeptide is a modified Trichoderma reesei CBHI, the second heterologous polypeptide is a modified Trichoderma reesei CBHII, the third heterologous polypeptide is Acidothermus cellulolyticus endoglucanase 1, and the homologous polypeptide is Trichoderma reesei CBHI.

24. The filamentous fungus of claim 1, wherein the first heterologous polypeptide is an exo-cellobiohydrolase, the second heterologous polypeptide is an endoglucanase, and the homologous polypeptide is an exo-cellobiohydrolase, and wherein expression of the first heterologous polypeptide, the second heterologous polypeptide and the homologous polypeptide forms a mixture of thermostable cellulases.

25. The filamentous fungus of claim 1, wherein the third polynucleotide is an extrachromosomal polynucleotide.

26. The filamentous fungus of claim 1, wherein the first, second, and third polynucleotide are extrachromosomal polynucleotides.

27. A culture medium comprising a population of the filamentous fungus of claim 1.

28. A polypeptide mixture comprising the first heterologous polypeptide, the second heterologous polypeptide, and the homologous polypeptide obtained from the filamentous fungus of claim 1.

29. The polypeptide mixture of claim 28, wherein the mixture is a mixture of cellulases.

30. A method of producing a mixture of cellulases comprising obtaining a polypeptide mixture from the filamentous fungus of claim 1, wherein the polypeptide mixture comprises the first heterologous polypeptide, the second heterologous polypeptide, and the homologous polypeptide.

31. A method of producing a mixture of cellulases comprising obtaining a polypeptide mixture from the filamentous fungus of claim 1, wherein the polypeptide mixture comprises the first heterologous polypeptide, the second heterologous polypeptide, and the homologous polypeptide, and wherein the first heterologous polypeptide is an exo-cellobiohydrolase, the second heterologous polypeptide is an endoglucanase, and the homologous polypeptide is an exo-cellobiohydrolase.

32. A method of producing a mixture of cellulases comprising obtaining a polypeptide mixture from the filamentous fungus of claim 1, wherein the polypeptide mixture comprises the first heterologous polypeptide, the second heterologous polypeptide, and the homologous polypeptide, wherein the first heterologous polypeptide is a first exo-cellobiohydrolase, the second heterologous polypeptide is an endoglucanase, the homologous polypeptide is a second exo-cellobiohydrolase, and wherein the first exo-cellobiohydrolase and the second exo-cellobiohydrolase correspond to the same member of cellobiohydrolases.

33. A method of producing a mixture of cellulases comprising obtaining a polypeptide mixture from the filamentous fungus of claim 1, wherein the polypeptide mixture comprises the first heterologous polypeptide, the second heterologous polypeptide, and the homologous polypeptide and wherein the filamentous fungus is T. reesei and the first heterologous polypeptide is Humicola grisea CBHI, the second heterologous polypeptide is Acidothermus cellulolyticus endoglucanase 1, and the homologous polypeptide is Trichoderma reesei CBHI.

34. A method of producing a mixture of cellulases comprising obtaining a polypeptide mixture from the filamentous fungus of claim 23, wherein the polypeptide mixture comprises the first heterologous polypeptide, the second heterologous polypeptide, the third heterologous polypeptide and the homologous polypeptide.

Description:

HETEROLOGOUS AND HOMOLOGOUS CELLULASE EXPRESSION SYSTEM

1. CROSS-REFERENCES TO RELATED APPLICATION

[0001] The present application claims benefit of and priority to U.S. Provisional Application Ser. No. US 60/933,894, filed June 8, 2007, which is incorporated herein by reference in its entirety.

2. STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

[0002] Portions of this work were funded by Subcontract No. ZCO-0-30017-01 with the National Renewable Energy Laboratory under Prime Contract No. DE-AC36-99GO10337 with the United States Department of Energy. Accordingly, the United States Government may have certain rights in the invention.

3. INTRODUCTION

[0003] Cellulose and hemicellulose are the most abundant plant materials produced by photosynthesis. They can be degraded and used as an energy source by numerous microorganisms, including bacteria, yeast and fungi, which produce extracellular enzymes capable of hydrolyzing the polymeric substrates to monomeric sugars.

[0004] Cellulases are enzymes that hydrolyze cellulose (beta-1,4-glucan or beta D-glucosidic linkages) resulting in the formation of glucose, cellobiose, cellooligosaccharides, and the like. Cellulases have been traditionally divided into three major classes: endoglucanases (EC 3.2.1.4) ("EG"), exoglucanases or cellobiohydrolases (EC 3.2.1.91) ("CBH") and beta-glucosidases (beta-D-glucoside glucohydrolase; EC 3.2.1.21) ("BG"). Endoglucanases act mainly on the amorphous parts of the cellulose fiber, whereas cellobiohydrolases are also able to degrade crystalline cellulose. In order to efficiently convert crystalline cellulose to glucose, the complete cellulase system comprising components from each of the CBH, EG and BG classifications is required, with isolated components less effective in hydrolyzing crystalline cellulose (Filho et al., Can. J. Microbiol. 42: 1-5, 1996). It would be advantageous to express these multi-component cellulase systems in a filamentous fungus for industrial scale cellulase production.

4. SUMMARY

[0005] Accordingly, the present teachings provide filamentous fungi that express a combination of heterologous and homologous polypeptides, polypeptide mixtures comprising a combination of heterologous and homologous polypeptides and methods of producing the polypeptide mixtures.

[0006] In some embodiments, the present teachings provide a filamentous fungus comprising two or more polynucleotides that encode two or more heterologous polypeptides and a polynucleotide encoding a homologous polypeptide. The filamentous fungus is capable of expressing the heterologous and homologous polypeptides that together form a functional mixture.

[0007] In some embodiments, the present teachings provide a culture medium comprising a population of the filamentous fungus of the present teachings.

[0008] In some embodiments, the present teachings provide a polypeptide mixture comprising two or more heterologous polypeptides and a homologous polypeptide. The polypeptide mixture can be obtained from the filamentous fungi of the present teachings.

[0009] In some embodiments, the present teachings provide a method of producing a mixture of cellulases. The method comprises obtaining a polypeptide mixture comprising two or more heterologous polypeptides and a homologous polypeptide from the filamentous fungus of the present teachings. In some embodiments, the heterologous polypeptides are an exo-cellobiohydrolase and an endoglucanase, and the homologous polypeptide is an exo-cellobiohydrolase. The heterologous exo-cellobiohydrolase and the homologous exo-cellobiohydrolase may, but need not, be the same member of exo-cellobiohydrolases.

[0010] These and other features of the present teachings are set forth below.

[0011]

5. BRIEF DESCRIPTION OF THE FIGURES

[0012] The skilled artisan will understand that the drawings are for illustration purposes only. The drawings are not intended to limit the scope of the present teachings in any way.

[0013] Figure 1 provides the nucleotide sequence (SEQ ID NO: 1) of the heterologous cellulase fusion construct comprising 2656 bases.

[0014] Figure 2 provides the predicted amino acid sequence (SEQ ID NO: 2) of the cellulase fusion protein based on the nucleic acid sequence of Figure 1.

[0015] Figures 3A-F depict the nucleotide sequence (SEQ ID NO: 14) of the pTrex4 vector containing the E1 catalytic domain.

[0016] Figure 4 depicts the plasmid map of T. reesei expression vector pTrex3g.

[0017] Figure 5A depicts the expression vector pTrex3g-Hgrisea-cbh1 used for making an exemplary tripartite strain.

[0018] Figures 5B-E provide the nucleotide sequence (SEQ ID NO: 7) of the expression vector of Figure 5A.

[0019] Figure 6 shows the three DNA expression fragments transformed into the cbh1-deleted strain to create a 4-part strain.

[0020] Figure 7A provides the nucleotide sequence (SEQ ID NO: 8) from start to stop codon of the polynucleotide expressing the engineered CBHI protein.

[0021] Figure 7B provides the sequence of the engineered CBHI protein (SEQ ID NO: 9). The CBHI signal sequence is underlined.

[0022] Figure 8A depicts the cbh1 expression vector pTrex3g-cbh1.

[0023] Figures 8B-F provide the nucleotide sequence (SEQ ID NO: 10) of the expression vector pTrex3g-cbh1.

[0024] Figure 9A provides the nucleotide sequence (SEQ ID NO: 11) from start to stop codon of the polynucleotide expressing the engineered CBHI protein.

[0025] Figure 9B provides the amino acid sequence of the engineered CBHII protein (SEQ ID NO: 12). The signal sequence is underlined.

[0026] Figure 10A depicts the cbhII expression vector pExp-cbhII.

[0027] Figures 10B-G provide the nucleotide sequence (SEQ ID NO: 13) of the expression vector pExp-cbhII.

6. DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS

[0028] The present teachings will now be described in detail by way of reference only using the following definitions and examples. Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Numeric ranges are inclusive of the numbers defining the range. The headings provided herein are not limitations of the various aspects or embodiments which can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole.

[0029] The term "polypeptide" as used herein refers to a compound made up of a single chain of amino acid residues linked by peptide bonds. The term "protein" as used herein is used interchangeably with the term "polypeptide."

[0030] The terms "nucleic acid" and "polynucleotide" are used interchangeably and encompass DNA, RNA, cDNA, single stranded or double stranded and chemical modifications thereof. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and the present invention encompasses all polynucleotides that encode a particular amino acid sequence.
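By way of illustration only, the degeneracy noted above can be made concrete with a short Python sketch that counts how many distinct DNA coding sequences encode a given peptide under the standard genetic code. The function names `codons_for` and `count_coding_sequences` are hypothetical and are not part of the present teachings; the sketch assumes the standard nuclear codon table and ignores stop codons and codon-usage preferences of any particular host.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(residue):
    """Return all DNA codons encoding a one-letter amino acid (hypothetical helper)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == residue)


def count_coding_sequences(peptide):
    """Count the distinct coding sequences for a peptide (stop codon excluded)."""
    total = 1
    for residue in peptide:
        options = codons_for(residue)
        if not options:
            raise ValueError(f"unknown residue: {residue!r}")
        total *= len(options)
    return total


if __name__ == "__main__":
    # Methionine has one codon, leucine six, lysine two.
    print(count_coding_sequences("MLK"))
```

Even a three-residue peptide such as Met-Leu-Lys has twelve synonymous coding sequences, which is why a definition tied to the encoded amino acid sequence covers a large family of polynucleotides.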

[0031] The term "recombinant" when used in reference to a cell, nucleic acid, protein or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified or that a protein is expressed in a non-native or genetically modified environment, e.g., in an expression vector for a prokaryotic or eukaryotic system. Thus, for example, recombinant cells express nucleic acids or polypeptides that are not found within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under expressed, over expressed or not expressed at all.

[0032] The term "heterologous" with reference to a polynucleotide or polypeptide refers to a polynucleotide or polypeptide having a sequence that does not naturally occur in a host cell. In some embodiments, the polypeptide is a commercially important industrial protein and in some embodiments, the heterologous polypeptide is a therapeutic protein. It is intended that the term encompasses proteins that are encoded by naturally occurring genes, mutated genes, and/or synthetic genes.

[0033] The term "homologous" with reference to a polynucleotide or polypeptide refers to a polynucleotide or polypeptide having a sequence that occurs naturally in the host cell.

[0034] As used herein, a "fusion nucleic acid" comprises two or more nucleic acids operably linked together. The nucleic acid may be DNA, both genomic and cDNA, or RNA, or a hybrid of RNA and DNA. Nucleic acid encoding all or part of the sequence of a polypeptide can be used in the construction of the fusion nucleic acid sequences. In some embodiments, nucleic acids encoding full-length polypeptides are used. In some embodiments, nucleic acid encoding a portion of the polypeptide may be employed.

[0035] The term "fusion polypeptide" refers to a protein that comprises at least two separate and distinct regions that may or may not originate from the same protein. For example, a signal peptide linked to the protein of interest wherein the signal peptide is not normally associated with the protein of interest would be termed a fusion polypeptide or fusion protein.

[0036] The terms "recovered", "isolated", and "separated" are used interchangeably herein to refer to a protein, cell, nucleic acid, amino acid etc. that is removed from at least one component with which it is naturally associated.

[0037] As used herein, the term "gene" refers to a polynucleotide (e.g., a DNA segment) involved in producing a polypeptide chain, that may or may not include regions preceding and following the coding region, e.g. 5' untranslated (5' UTR) or "leader" sequences and 3' UTR or "trailer" sequences, as well as intervening sequences (introns) between individual coding segments (exons).

[0038] As used herein, the term "promoter" refers to a nucleic acid sequence that functions to direct transcription of a downstream gene. The promoter will generally be appropriate to the host cell in which the target gene is being expressed. The promoter together with other transcriptional and translational regulatory nucleic acid sequences (also termed "control sequences") are necessary to express a given gene. In general, the transcriptional and translational regulatory sequences include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences.

[0039] As used herein, the term "operably linked" means that the transcriptional nucleic acid is positioned relative to the coding sequences in such a manner that transcription is initiated. Generally, this will mean that the promoter and transcriptional initiation or start sequences are positioned 5' to the coding region. The transcriptional nucleic acid will generally be appropriate to the host cell used to express the protein. Numerous types of appropriate expression vectors, and suitable regulatory sequences are known in the art for a variety of host cells.

[0040] As used herein, the term "expression" refers to the process by which a polypeptide is produced based on the nucleic acid sequence of a gene. The process includes both transcription and translation.

[0041] As used herein, the term "vector" refers to a polynucleotide construct designed to introduce nucleic acids into one or more cell types. Vectors include cloning vectors, expression vectors, shuttle vectors, plasmids, cassettes and the like.

[0042] As used herein, the term "expression vector" refers to a vector that has the ability to incorporate and express heterologous DNA fragments in a foreign cell. Many prokaryotic and eukaryotic expression vectors are commercially available.

[0043] As used herein, the terms "DNA construct," "transforming DNA" and "expression vector" are used interchangeably to refer to DNA used to introduce sequences into a host cell or organism. The DNA may be generated in vitro by PCR or any other suitable technique(s) known to those in the art, for example using standard molecular biology methods described in Sambrook et al. In addition, the DNA of the expression construct could be artificially synthesized, for example, chemically. The DNA construct, transforming DNA or recombinant expression cassette can be incorporated into a plasmid, chromosome, extrachromosomal element, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant expression cassette portion of an expression vector, DNA construct or transforming DNA includes, among other sequences, a nucleic acid sequence to be transcribed and a promoter. In preferred embodiments, expression vectors have the ability to incorporate and express heterologous DNA fragments in a host cell.

[0044] The term "introduced" in the context of inserting a nucleic acid sequence into a cell, means "transfection" or "transformation" or "transduction" and includes reference to the incorporation of a nucleic acid sequence into a eukaryotic or prokaryotic cell where the nucleic acid sequence may be incorporated into the genome of the cell (for example, chromosome, extrachromosomal element, plasmid, plastid, or mitochondrial DNA), converted into an autonomous replicon, or transiently expressed (for example, transfected mRNA).

[0045] By the term "host cell" is meant a cell that contains a vector and supports the replication, and/or transcription or transcription and translation (expression) of the expression construct.

[0046] As used herein, the term "culturing" refers to growing a population of cells under suitable conditions in a liquid, semi-solid or solid medium.

[0047] As used herein, "substituted" and "modified" are used interchangeably and refer to a sequence, such as an amino acid sequence or a nucleic acid sequence that includes a deletion, insertion, replacement or interruption of a naturally occurring sequence. Often in the context of the invention, a substituted sequence shall refer, for example, to the replacement of a naturally occurring residue.

[0048] As used herein, "modified enzyme" refers to an enzyme that includes a deletion, insertion, replacement or interruption of a naturally occurring sequence.

[0049] The term "variant" refers to a region of a protein that contains one or more different amino acids as compared to a reference protein, for example, a naturally occurring or wild-type protein.

[0050] The term "cellulase" refers to a category of enzymes capable of hydrolyzing cellulose (beta-1,4-glucan or beta D-glucosidic linkages) polymers to shorter cello-oligosaccharide oligomers, cellobiose and/or glucose.

[0051] The term "exo-cellobiohydrolase" (CBH) refers to a group of cellulase enzymes classified as EC 3.2.1.91 and/or those in certain GH families, including, but not limited to, those in GH families 5, 6, 7, 9 or 48. These enzymes are also known as exoglucanases or cellobiohydrolases. CBH enzymes hydrolyze cellobiose from the reducing or non-reducing end of cellulose. In general a CBHI type enzyme preferentially hydrolyzes cellobiose from the reducing end of cellulose and a CBHII type enzyme preferentially hydrolyzes the non-reducing end of cellulose.

[0052] The term "cellobiohydrolase activity" is defined herein as a 1,4-D-glucan cellobiohydrolase activity which catalyzes the hydrolysis of 1,4-beta-D-glucosidic linkages in cellulose, cellotetriose, or any beta-1,4-linked glucose containing polymer, releasing cellobiose from the ends of the chain. As used herein, cellobiohydrolase activity is determined by release of water-soluble reducing sugar from cellulose as measured by the PHBAH method of Lever et al., 1972, Anal. Biochem. 47: 273-279. A distinction between the exoglucanase mode of attack of a cellobiohydrolase and the endoglucanase mode of attack is made by a similar measurement of reducing sugar release from substituted cellulose such as carboxymethyl cellulose or hydroxyethyl cellulose (Ghose, 1987, Pure & Appl. Chem. 59: 257-268). A true cellobiohydrolase will have a very high ratio of activity on unsubstituted versus substituted cellulose (Bailey et al., 1993, Biotechnol. Appl. Biochem. 17: 65-76).
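As a non-limiting illustration of the ratio test described above, the following Python sketch computes the ratio of reducing-sugar release on unsubstituted cellulose to that on substituted cellulose. The function names and the `min_ratio` cutoff are assumptions introduced for this example; the cited references, not this sketch, govern how activity is measured, and no particular cutoff value is specified in the present teachings.

```python
import math


def activity_ratio(unsubstituted, substituted):
    """Ratio of activity on unsubstituted vs. substituted cellulose.

    Both inputs are reducing-sugar release values in the same units
    (hypothetical inputs, e.g. from a PHBAH-type assay).
    """
    if unsubstituted < 0 or substituted < 0:
        raise ValueError("activities must be non-negative")
    if unsubstituted == 0 and substituted == 0:
        raise ValueError("no activity measured on either substrate")
    if substituted == 0:
        # Activity only on unsubstituted cellulose: ratio is unbounded.
        return math.inf
    return unsubstituted / substituted


def has_exo_mode_of_attack(unsubstituted, substituted, min_ratio):
    """True if the ratio meets a caller-chosen cutoff (assumed, not from the source)."""
    return activity_ratio(unsubstituted, substituted) >= min_ratio


if __name__ == "__main__":
    # Illustrative numbers only, not experimental data.
    print(activity_ratio(10.0, 0.5))
    print(has_exo_mode_of_attack(10.0, 0.5, min_ratio=10.0))
```

The cutoff is left as a required argument because the text says only that the ratio is "very high" for a true cellobiohydrolase; any numeric threshold is the user's choice.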

[0053] The term "endoglucanase" (EG) refers to a group of cellulase enzymes classified as EC 3.2.1.4, and/or those in certain GH families, including, but not limited to, those in GH families 5, 6, 7, 8, 9, 12, 17, 31, 44, 45, 48, 51, 61, 64, 74 or 81. An EG enzyme hydrolyzes internal beta-1,4 glucosidic bonds of the cellulose. The term "endoglucanase" is defined herein as an endo-1,4-(1,3;1,4)-beta-D-glucan 4-glucanohydrolase which catalyses endohydrolysis of 1,4-beta-D-glycosidic linkages in cellulose, cellulose derivatives (for example, carboxymethyl cellulose), lichenin, beta-1,4 bonds in mixed beta-1,3 glucans such as cereal beta-D-glucans or xyloglucans, and other plant material containing cellulosic components. As used herein, endoglucanase activity is determined using carboxymethyl cellulose (CMC) hydrolysis according to the procedure of Ghose, 1987, Pure and Appl. Chem. 59: 257-268.

[0054] The term "beta-glucosidase" is defined herein as a beta-D-glucoside glucohydrolase classified as EC 3.2.1.21, and/or those in certain GH families, including, but not limited to, those in GH families 1, 3, 9 or 48, which catalyzes the hydrolysis of cellobiose with the release of beta-D-glucose. As used herein, beta-glucosidase activity may be measured by methods known in the art, e.g., HPLC.

[0055] "Cellulolytic activity" encompasses exoglucanase activity, endoglucanase activity or both types of enzyme activity, as well as beta-glucosidase activity.

[0056] The terms "thermally stable" and "thermostable" refer to polypeptides or enzymes of the present teaching that retain a specified amount of biological, e.g., enzymatic, activity after exposure to an elevated temperature, i.e., higher than room temperature. In some embodiments, a polypeptide or an enzyme is considered thermostable if it retains greater than 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95% or 98% of its biological activity after exposure to a specified temperature, e.g., 40 °C, 45 °C, 50 °C, 55 °C, 60 °C, 65 °C, 70 °C, 75 °C or 80 °C for 2, 5, 7, 10, 15, 20, 30, 40, 50 or 60 minutes at a pH of, e.g., 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5 or 8.
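For illustration, the thermostability criterion above reduces to a residual-activity comparison, which may be sketched in Python as follows. The function names and the example numbers are hypothetical; the threshold, temperature, exposure time and pH are whichever values from the lists above a given embodiment specifies, and the sketch assumes activity is measured in the same units before and after the heat exposure.

```python
def residual_activity_percent(activity_before, activity_after):
    """Percent of the initial activity remaining after heat exposure."""
    if activity_before <= 0:
        raise ValueError("initial activity must be positive")
    if activity_after < 0:
        raise ValueError("residual activity cannot be negative")
    return 100.0 * activity_after / activity_before


def is_thermostable(activity_before, activity_after, threshold_percent=50.0):
    """True if strictly more than threshold_percent of activity is retained.

    The default of 50% is the lowest threshold listed in the text;
    other embodiments use 60% through 98%.
    """
    return residual_activity_percent(activity_before, activity_after) > threshold_percent


if __name__ == "__main__":
    # Illustrative values only: 200 units before, 150 units after exposure.
    print(residual_activity_percent(200.0, 150.0))
    print(is_thermostable(200.0, 150.0, threshold_percent=70.0))
```

The comparison is strict because the text requires retention of "greater than" the stated percentage, so an enzyme retaining exactly 50% would not meet a 50% threshold.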

[0057] The term "filamentous fungi" means any and all filamentous fungi recognized by those of skill in the art. In general, filamentous fungi are eukaryotic microorganisms and include all filamentous forms of the subdivision Eumycotina. These fungi are characterized by a vegetative mycelium with a cell wall composed of chitin, beta-glucan, and other complex polysaccharides. In some embodiments, the filamentous fungi of the present teachings are morphologically, physiologically, and genetically distinct from yeasts. In some embodiments, the filamentous fungi include, but are not limited to the following genera: Aspergillus, Acremonium, Aureobasidium, Beauveria, Cephalosporium, Ceriporiopsis, Chaetomium paecilomyces, Chrysosporium, Claviceps, Cochiobolus, Cryptococcus, Cyathus, Endothia, Endothia mucor, Fusarium, Gilocladium, Humicola, Magnaporthe, Myceliophthora, Myrothecium, Mucor, Neurospora, Phanerochaete, Podospora, Paecilomyces, Penicillium, Pyricularia, Rhizomucor, Rhizopus, Schizophylum, Stagonospora, Talaromyces, Trichoderma, Thermomyces, Thermoascus, Thielavia, Tolypocladium, Trichophyton, and Trametes pleurotus. In some embodiments, the filamentous fungi include, but are not limited to the following: A. nidulans, A. niger, A. awamori, e.g., NRRL 3112, ATCC 22342 (NRRL 3112), ATCC 44733, ATCC 14331 and strain UVK 143f, A. oryzae, e.g., ATCC 11490, N. crassa, Trichoderma reesei, e.g., NRRL 15709, ATCC 13631, 56764, 56765, 56466, 56767, and Trichoderma viride, e.g., ATCC 32098 and 32086.

[0058] The term "Trichoderma" or "Trichoderma species" used herein refers to any fungal organisms which have previously been classified as a Trichoderma species or strain, or which are currently classified as a Trichoderma species or strain, or as a Hypocrea species or strain. In some embodiments, the species include Trichoderma longibrachiatum, Trichoderma reesei, Trichoderma viride, or Hypocrea jecorina. Also contemplated for use as an original strain are cellulase-overproducing strains such as T. longibrachiatum/reesei RL-P37 (Sheir-Neiss et al., Appl. Microbiol. Biotechnology, 20 (1984) pp. 46-53; Montenecourt B.S., Can., 1-20, 1987), and Rut-C30 strain. In some embodiments, the production of cellulases in the species targeted for improvement is tightly regulated and is sensitive to various environmental conditions.

[0059] The present teachings provide a filamentous fungus comprising two or more polynucleotides that encode two or more heterologous polypeptides and a polynucleotide encoding a homologous polypeptide. The filamentous fungus is capable of expressing the heterologous and homologous polypeptides that form a functional mixture. In some embodiments, the filamentous fungus contains a first polynucleotide and a second polynucleotide, encoding a first heterologous polypeptide and a second heterologous polypeptide, respectively, and a third polynucleotide encoding a homologous polypeptide. In some embodiments, the filamentous fungus contains an additional polynucleotide, a fourth polynucleotide, encoding a third heterologous polypeptide. In some embodiments, the filamentous fungus contains four or more polynucleotides encoding four or more heterologous polypeptides and one or more polynucleotides encoding one or more homologous polypeptides.

[0060] According to the present teachings, a functional mixture includes any mixture of polypeptides, provided that such mixture has at least one function, biological or otherwise, that is derived from at least two or three polypeptides from the mixture. In other words, at least two or three polypeptides from the mixture contribute, at a detectable level, to the function of the polypeptide mixture. In some embodiments, the functional mixture includes at least three polypeptides and has a function derived from at least two or three of the polypeptides from the mixture. In some other embodiments, the functional mixture includes at least three polypeptides and has an enzymatic function derived from at least two or three polypeptides from the mixture. In some embodiments, the functional mixture includes at least three polypeptides and has a cellulase function derived from at least two or three of the polypeptides of the mixture. In some embodiments, the functional mixture includes four polypeptides and has a function derived from two, three or four of the polypeptides from the mixture.

[0061] In some embodiments, the functional mixture includes a function that corresponds to or is an improvement of any activity, e.g., secretable protein activity including, without any limitation, cellulase activity, saccharification activity or thermal stability associated with or provided by a filamentous fungus. In some embodiments, the functional mixture includes a function derived from the activity of exo-cellobiohydrolases, endoglucanases, or beta-glucosidases or any combination thereof. In some embodiments, the functional mixture does not include any bacterial enzyme in combination with its carrier filamentous protein. In some embodiments, the functional mixture does not form any antibody or functional antibody fragments, e.g., Fab, single chain antibody, etc.

[0062] In some embodiments, the polynucleotides encoding heterologous or homologous polypeptides are operably linked to one or more promoters. The promoter can be any suitable promoter now known, or later discovered, in the art. In some embodiments, the polynucleotides are expressed under a promoter native to the filamentous fungus. In some embodiments, the polynucleotides are under a heterologous promoter. In some embodiments, the polynucleotides are expressed under a constitutive or inducible promoter. Examples of promoters that can be used include, but are not limited to, a cellulase promoter, a xylanase promoter, and the 1818 promoter (previously identified as a highly expressed protein by EST mapping of Trichoderma). In some embodiments, the promoter is a cellulase promoter of the filamentous fungus. In some embodiments, the promoter is an exo-cellobiohydrolase, endoglucanase, or beta-glucosidase promoter. In some embodiments, the promoter is a cellobiohydrolase I (cbh1) promoter. Non-limiting examples of promoters include a cbh1, cbh2, egl1, egl2, egl3, egl4, egl5, pki1, gpd1, xyn1, and xyn2 promoter. Further, two or more of the polynucleotides encoding the heterologous or homologous polypeptides, or portions thereof, can be fused together to form a fusion polynucleotide. The fusion polynucleotide can be operably linked to any suitable promoter as discussed above.

[0063] In some embodiments, the first polynucleotide encoding a first heterologous polypeptide is operably linked to a first promoter. The first promoter can, but need not, be different from the promoter or promoters to which the second or third polynucleotides are operably linked. In some embodiments, the first polynucleotide is operably linked to a promoter of a gene encoding the homologous polypeptide.

[0064] In some embodiments, a polynucleotide, e.g., the second polynucleotide, encoding a second heterologous polypeptide, is fused to another polynucleotide, e.g., with the third polynucleotide encoding a homologous polypeptide, to form a fusion polynucleotide. The fusion polynucleotide can be operably linked to any suitable promoter, including, but not limited to, a promoter of a gene encoding the homologous polypeptide. The fusion polynucleotide encodes a fusion polypeptide or fusion protein that comprises two polypeptides, or domains or portions thereof. The portions or domains of the polypeptides can be any portion or domain of the polypeptides that either has at least one function, biological or otherwise, or becomes functional when combined into a fusion polypeptide or when combined with the other polypeptides of the functional mixture. In some embodiments, the fusion protein comprises the second heterologous polypeptide and the homologous polypeptide.

[0065] In some embodiments, the fusion polynucleotide encodes a fusion protein that comprises two polypeptides, e.g., the second heterologous polypeptide and the homologous polypeptide, separated by a linker or a linker region. The linker can be any suitable linker for connecting two polypeptides. The linker region generally forms an extended, semi-rigid spacer between independently folded peptide domains. A linker region between the polypeptides of the fusion protein may be beneficial in allowing the polypeptides to fold independently. In some embodiments, the linker is a linker from glucoamylase from Aspergillus species or a CBHI linker from Trichoderma species. In some embodiments, the linker can, but need not, be a portion of the polypeptides comprising the fusion protein. In some embodiments, the polypeptides of the fusion protein are the second heterologous polypeptide and the homologous polypeptide.

[0066] In some embodiments, the fusion polynucleotide encodes a fusion protein that comprises two polypeptides separated by a linker or linker region and a cleavage site. In some embodiments, the polypeptides of the fusion protein are the second heterologous polypeptide and the homologous polypeptide. In general, the cleavage site will be located within the linker region and will allow the separation of the sequences bordering the cleavage site. The cleavage site can comprise any sequence that can be cleaved by any means now known or later developed, including, but not limited to, cleavage by a protease or after exposure to certain chemicals. Examples of such sequences include, but are not limited to, a kexin cleavage site, e.g., a KEX2 recognition site which includes codons for the amino acids Lys Arg, trypsin protease recognition sites of Lys and Arg, and the cleavage recognition site for endoproteinase Lys-C.

[0067] In some embodiments, the filamentous fungus of the present teachings further comprises a polynucleotide encoding a selectable marker. The marker can be any suitable marker that allows the selection of transformed host cells. In general, a selectable marker will be a gene capable of expression in a host cell which allows for ease of selection of those hosts containing the vector. As used herein, the term generally refers to genes that provide an indication that a host cell has taken up an incoming DNA of interest or some other reaction has occurred. Generally, selectable markers are genes that confer antimicrobial resistance or a metabolic advantage on the host cell to allow cells containing the exogenous DNA to be distinguished from cells that have not received any exogenous sequence during the transformation. Examples of such selectable markers include, but are not limited to, markers for resistance to antimicrobials (e.g., kanamycin, erythromycin, actinomycin, chloramphenicol and tetracycline). Additional examples of markers include, but are not limited to, a T. reesei pyr4, acetolactate synthase, Streptomyces hyg, Aspergillus nidulans amdS gene and an Aspergillus niger pyrG gene.

[0068] In some embodiments, the filamentous fungus of the present teachings further comprises, and is capable of expressing, a fourth polynucleotide encoding a third heterologous polypeptide. The heterologous or homologous polypeptides can be naturally occurring polypeptides or variants thereof. In some embodiments, one or more of the heterologous polypeptides may be variants of the homologous polypeptides. For example, the first heterologous polypeptide can be a modified homologous polypeptide. In some embodiments, the first and second heterologous polypeptides are modified homologous polypeptides. In some embodiments, the first and second heterologous polypeptides are modified homologous polypeptides and the filamentous fungus contains a fourth polynucleotide encoding a third heterologous polypeptide. The third heterologous polypeptide may or may not be a modified homologous polypeptide.

[0069] The heterologous and homologous polypeptides of the present teachings can be any desired polypeptide that, when mixed with the other polypeptides of the present teachings, produces a functional mixture that has at least one function, biological or otherwise, that is derived from at least two or three polypeptides from the mixture. In some embodiments, the mixture of the heterologous and homologous polypeptides allows the functional mixture to display improved function with respect to an activity of, associated with, or provided by a filamentous fungus. In some embodiments, the activities include, but are not limited to, an improved secretable protein activity, improved saccharification activity or thermal stability, i.e., stability at higher temperatures or altered pH values, and/or sustained activity for greater time periods at the same temperature.

[0070] In some embodiments, the heterologous or homologous polypeptides do not include any bacterial enzyme in combination with its carrier filamentous protein. In some embodiments, the heterologous or homologous polypeptides do not combine to form any antibody or functional antibody fragments, e.g., Fab, single chain antibody, etc.

[0071] In some embodiments, one or more of the first or the second heterologous polypeptide or the homologous polypeptide is an enzyme or a portion thereof. In some embodiments, the first or the second heterologous polypeptide or the homologous polypeptide is a cellulase, hemicellulase, xylanase, mannanase or a domain or portion thereof. In some embodiments, the first or the second heterologous polypeptide or the homologous polypeptide is a cellulase or a portion thereof. In some embodiments, the first and the second heterologous polypeptides and the homologous polypeptide combine to form a functional mixture of cellulases.

[0072] In some embodiments, the first or second heterologous polypeptide or the homologous polypeptide is a cellulase selected from the group of: exo-cellobiohydrolases, endoglucanases, beta-glucosidases or portions thereof. The first or the second heterologous polypeptide, the homologous polypeptide and, if present, the third heterologous polypeptide, can be selected from the group of: exo-cellobiohydrolases, endoglucanases, beta-glucosidases or domains thereof without any restriction. In some embodiments, more than one polypeptide, heterologous or homologous, can belong to the same class or group of cellulases. For example, two or more of the polypeptides can belong to the class of exo-cellobiohydrolases. In some embodiments, one of the heterologous polypeptides belongs to the same class of cellulases as the homologous polypeptide. In some embodiments, the heterologous and homologous polypeptides are the same member of the class, but have sequences from different origins.

[0073] In some embodiments, the filamentous fungus of the present teachings contains a first polynucleotide and a second polynucleotide, encoding a first heterologous polypeptide and a second heterologous polypeptide, respectively, wherein the first heterologous polypeptide is an exo-cellobiohydrolase and the second heterologous polypeptide is an endoglucanase. In some embodiments, the first heterologous polypeptide is an exo-cellobiohydrolase, classified as EC 3.2.1.91, and the second heterologous polypeptide is an endoglucanase, classified as EC 3.2.1.4. In some embodiments, the first heterologous polypeptide is an exo-cellobiohydrolase selected from the group consisting of GH family 5, 6, 7, 9, 48, and wherein the second heterologous polypeptide is an endoglucanase selected from the group consisting of GH family 5, 6, 7, 8, 9, 12, 17, 31, 44, 45, 48, 51, 61, 64, 74 and 81.

[0074] As discussed above, the heterologous and homologous polypeptides of the present teachings can be selected without restriction from the classes of cellulase enzymes. Exemplary combinations of enzymes are provided herein. In some embodiments, the first heterologous polypeptide is an exo-cellobiohydrolase, the second heterologous polypeptide is an endoglucanase, and the homologous polypeptide is an exo-cellobiohydrolase. In some embodiments, the first heterologous polypeptide is a first exo-cellobiohydrolase, the second heterologous polypeptide is an endoglucanase, the homologous polypeptide is a second exo-cellobiohydrolase, and the first exo-cellobiohydrolase and the second exo-cellobiohydrolase correspond to the same member of cellobiohydrolases, for example, both the first and second exo-cellobiohydrolases are CBHI or both are CBHII.

[0075] The filamentous fungi of the present teachings can be any filamentous fungus recognized by those of skill in the art. In some embodiments, the filamentous fungi include, but are not limited to, the following genera: Aspergillus, Acremonium, Aureobasidium, Beauveria, Cephalosporium, Ceriporiopsis, Chaetomium paecilomyces, Chrysosporium, Claviceps, Cochliobolus, Cryptococcus, Cyathus, Endothia, Endothia mucor, Fusarium, Gliocladium, Humicola, Magnaporthe, Myceliophthora, Myrothecium, Mucor, Neurospora, Phanerochaete, Podospora, Paecilomyces, Penicillium, Pyricularia, Rhizomucor, Rhizopus, Schizophyllum, Stagonospora, Talaromyces, Trichoderma, Thermomyces, Thermoascus, Thielavia, Tolypocladium, Trichophyton, and Trametes pleurotus. In some embodiments, the filamentous fungi include, but are not limited to, the following: A. nidulans, A. niger, A. awamori, e.g., NRRL 3112, ATCC 22342 (NRRL 3112), ATCC 44733, ATCC 14331 and strain UVK 143f, A. oryzae, e.g., ATCC 11490, N. crassa, Trichoderma reesei, e.g., NRRL 15709, ATCC 13631, 56764, 56765, 56466, 56767, and Trichoderma viride, e.g., ATCC 32098 and 32086.

[0076] In some embodiments, the filamentous fungus of the present teachings is Trichoderma. In some embodiments, the filamentous fungus of the present teachings is Trichoderma reesei. In some embodiments, the heterologous polypeptides can be from any of the following: Humicola grisea, Acidothermus cellulolyticus, Thermobifida fusca, or Penicillium funiculosum. In some embodiments, the heterologous polypeptides are from Humicola grisea, Acidothermus cellulolyticus, Thermobifida, e.g., Thermobifida fusca, or Penicillium funiculosum and the homologous polypeptide is from Trichoderma reesei.

[0077] Exemplary combinations of heterologous and homologous polypeptides are provided herein. In some embodiments, the heterologous and the homologous polypeptides of the functional mixture can be selected from the group consisting of T. reesei EGI, EGII, EGIII (CEL7B, CEL5A, CEL12A, respectively), variants of CEL12A, H. grisea EGIII, T. fusca E5 and E3, and A. cellulolyticus E1 and GH74. In some embodiments, the heterologous polypeptides of the functional mixture can be an exo-endo cellulase fusion construct. In some embodiments, the fusion protein has cellulolytic activity comprising a catalytic domain derived from a fungal exo-cellobiohydrolase and a catalytic domain derived from an endoglucanase. Suitable, but non-limiting, examples are provided in U.S. Patent Application Publication No. 20060057672.

[0078] In some embodiments, the heterologous polypeptides of the functional mixture can be variants of H. jecorina CBH I, a Cel7 enzyme. In some embodiments, the cellobiohydrolases can have improved thermostability and reversibility, including but not limited to those described in U.S. Patent Application Publication Nos. 20050277172 and 20050054039.

[0079] In some embodiments, the heterologous polypeptides of the functional mixture can be variants of H. jecorina CBH 2, a Cel6 enzyme. In some embodiments, the cellobiohydrolases can have improved thermostability and reversibility, including but not limited to those described in U.S. Patent Application Publication No. 20060205042.

[0080] In some embodiments, the host filamentous fungus is T. reesei, the first heterologous polypeptide is Humicola grisea CBHI, the second heterologous polypeptide is Acidothermus cellulolyticus endoglucanase 1, and the homologous polypeptide is Trichoderma reesei CBHI. In some embodiments, the filamentous fungus is T. reesei and the first heterologous polypeptide or the second heterologous polypeptide is selected from the group consisting of Penicillium funiculosum cellobiohydrolase CBHI, Thermobifida endoglucanase E3, Thermobifida endoglucanase E5, Acidothermus cellulolyticus GH74-core and GH48.

[0081] In some embodiments, the filamentous fungus comprises a fourth polynucleotide encoding a third heterologous polypeptide. Here, the first heterologous polypeptide is a modified T. reesei CBHI, the second heterologous polypeptide is a modified T. reesei CBHII, the third heterologous polypeptide is Acidothermus cellulolyticus endoglucanase 1, and the homologous polypeptide is T. reesei CBHI.

[0082] The present teachings also provide functional mixtures with improved properties and/or activities. In some embodiments, the first heterologous polypeptide is an exo-cellobiohydrolase, the second heterologous polypeptide is an endoglucanase, and the homologous polypeptide is an exo-cellobiohydrolase. Here, the first heterologous polypeptide, the second heterologous polypeptide and the homologous polypeptide form a mixture of thermostable cellulases.

[0083] Further, in some embodiments, the present teachings provide that the polynucleotides encoding the heterologous as well as the homologous polypeptides can be extrachromosomal, i.e., in a vector or plasmid, or alternatively, the polynucleotides can be integrated within the chromosomes of the filamentous fungus host. In some embodiments, the filamentous fungus host has at least one polynucleotide encoding the first, second or third heterologous polypeptide or the homologous polypeptide integrated into its genome. In some embodiments, the filamentous fungus host has at least one polynucleotide encoding the first, second or third heterologous polypeptide or the homologous polypeptide integrated into its genome and at least one polynucleotide encoding a heterologous or homologous polypeptide in a stable vector transformed into the host.

[0084] In some embodiments, the host is T. reesei with at least one polynucleotide encoding the first or second heterologous polypeptide or the homologous polypeptide integrated into its genome. In some embodiments, the host is T. reesei with two polynucleotides integrated into its genome. The polynucleotides encode either the first, second, or, if present, the third heterologous polypeptide or the homologous polypeptide. In some embodiments, one or more polynucleotides expressing either a heterologous or homologous exo-cellobiohydrolase are integrated into the genome of a T. reesei host. In some embodiments, a polynucleotide encoding a heterologous endoglucanase is integrated into the genome of a T. reesei host. In some embodiments, a polynucleotide encoding a heterologous endoglucanase and a polynucleotide encoding either a heterologous or homologous exo-cellobiohydrolase are integrated into the genome of a T. reesei host. It is understood that when only one or two of the three or four polynucleotides that encode the polypeptides of the functional mixture are integrated into the host genome, the remaining polynucleotides are transformed into the host and are present in a vector or plasmid. In some embodiments, the filamentous fungus contains a first polynucleotide and a second polynucleotide, encoding a first heterologous polypeptide and a second heterologous polypeptide, respectively, and a third polynucleotide encoding a homologous polypeptide and all three polynucleotides are extrachromosomal.

[0085] The present teachings also provide a culture medium comprising a population of the filamentous fungi described above. The culture medium can be solid, semi-solid or liquid and suitably chosen depending on the host as well as the polypeptides expressed therein.

[0086] Further, the present teachings also provide a polypeptide mixture comprising the first heterologous polypeptide, the second heterologous polypeptide, and the homologous polypeptide obtained from the filamentous fungi described herein. In some embodiments, the polypeptide mixture is a mixture of enzymes or domains thereof. In some embodiments, the polypeptide mixture is a mixture of cellulases, hemicellulases, xylanases, mannanases or domains thereof.

[0087] In addition, the present teachings provide a method of producing a mixture of polypeptides comprising obtaining a polypeptide mixture from the filamentous fungi described herein. The polypeptide mixture contains a first heterologous polypeptide, a second heterologous polypeptide, and a homologous polypeptide. In some embodiments, the mixture of polypeptides contains a third heterologous polypeptide. As discussed above, the mixture of polypeptides is a functional mixture. In some embodiments, the mixture of polypeptides is a mixture of enzymes or domains thereof. In some embodiments, the mixture of polypeptides is a mixture of cellulases, hemicellulases, xylanases, mannanases or domains thereof.

[0088] In some embodiments, the mixture of polypeptides is a mixture of cellulases comprising a first heterologous polypeptide that is an exo-cellobiohydrolase, a second heterologous polypeptide that is an endoglucanase, and a homologous polypeptide that is an exo-cellobiohydrolase. In some embodiments, the mixture of cellulases contains a first heterologous polypeptide that is a first exo-cellobiohydrolase, a second heterologous polypeptide that is an endoglucanase, and a homologous polypeptide that is a second exo-cellobiohydrolase. Here, the first exo-cellobiohydrolase and the second exo-cellobiohydrolase correspond to the same member of cellobiohydrolases. In some embodiments, the first and second exo-cellobiohydrolases are CBHI. In some embodiments, the first and second exo-cellobiohydrolases are CBHII.

[0089] As will be apparent to one of skill in the art, several other combinations of heterologous and homologous polypeptides can be expressed in the filamentous fungi of the present teachings. Another exemplary mixture of cellulases comprises a first heterologous polypeptide that is Humicola grisea CBHI, a second heterologous polypeptide that is Acidothermus cellulolyticus endoglucanase 1, and a homologous polypeptide that is Trichoderma reesei CBHI.

[0090] Aspects of the present teachings may be further understood in light of the following examples, which should not be construed as limiting the scope of the present teachings. It will be apparent to those skilled in the art that many modifications, both to materials and methods, may be practiced without departing from the present teachings.

7. EXAMPLES

7.1 Example 1 Construction of the Tripartite Strain

[0091] The Tripartite strain consists of the following three parts: (i) a T. reesei cellulase production strain; (ii) nucleic acid comprising a Humicola grisea cbh1 gene in that strain; and (iii) an exo-endo cellulase fusion of T. reesei cbh1 with Acidothermus cellulolyticus endoglucanase 1.

[0092] Construction of a CBH1-E1 Fusion Vector

[0093] The CBH1-E1 fusion construct included the T. reesei cbh1 promoter; the T. reesei cbh1 gene sequence from the start codon to the end of the cbh1 linker; an additional 12 bases of DNA 5' to the start of the endoglucanase coding sequence; the endoglucanase coding sequence; a stop codon; and the T. reesei cbh1 terminator. The nucleotide sequence (SEQ ID NO: 1) of the heterologous cellulase fusion construct comprised 2656 bases (see Figure 1), and included the T. reesei cbh1 signal sequence; the catalytic domain of the T. reesei cbh1; the T. reesei cbh1 linker sequence; a kexin cleavage site which includes codons for the amino acids SKR; and the sequence coding for the Acidothermus cellulolyticus GH5A-E1 catalytic domain. The predicted amino acid sequence (SEQ ID NO: 2) of the cellulase fusion protein based on the nucleic acid sequence of Figure 1 is shown in Figure 2. The additional 12 DNA bases, ACTAGTAAGCGG (nucleotides 1565 to 1576 of SEQ ID NO: 1), contain the recognition site for the restriction endonuclease SpeI and code for the amino acids Thr, Ser, Lys, and Arg.
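The reading of the 12-base junction can be checked with a short Python sketch. This is only an illustration, not part of the described method: the codon table is restricted to the four codons involved, and the names CODON_TABLE, SPEI_SITE and translate are hypothetical helpers introduced here.

```python
# Entries of the standard genetic code for the four codons in the junction.
CODON_TABLE = {"ACT": "Thr", "AGT": "Ser", "AAG": "Lys", "CGG": "Arg"}
SPEI_SITE = "ACTAGT"  # SpeI recognition sequence


def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


junction = "ACTAGTAAGCGG"  # nucleotides 1565 to 1576 of SEQ ID NO: 1
residues = translate(junction)
has_spei = junction.startswith(SPEI_SITE)
print(residues, has_spei)
```

The first six bases form the SpeI site and, read in frame, the same twelve bases give Thr, Ser, Lys, Arg, so the last three residues supply the SKR kexin site.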

[0094] The plasmid E1-pUC19, which contained the open reading frame for the E1 gene locus, was used as the DNA template in a PCR reaction. (Equivalent plasmids are described in U.S. Pat. No. 5,536,655, which also describes the cloning of the E1 gene from the actinomycete Acidothermus cellulolyticus ATCC 43068, Mohagheghi A. et al., 1986). Standard procedures for working with plasmid DNA and amplification of DNA using the polymerase chain reaction (PCR) are found in Sambrook, et al., 2001.

[0095] The following two primers were used to amplify the coding region of the catalytic domain of the E1 endoglucanase. Forward Primer 1=EL-316 (containing a SpeI site): GCTTATACTAGTAAGCGCGCGGGCGGCGGCTATTGGCACAC (SEQ ID NO: 3); Reverse Primer 2=EL-317 (containing an AscI site and stop codon, reverse complement): GCTTATGGCGCGCCTTAGACAGGATCGAAAATCGACGAC (SEQ ID NO: 4).
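As an illustrative check of the primer features named above, the following Python sketch locates the SpeI site in the forward primer and, after reverse-complementing the reverse primer, the stop codon that sits immediately before the AscI site on the sense strand. The helper reverse_complement and the variable names are assumptions made for this sketch; the primer sequences are those of SEQ ID NOs: 3 and 4.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


SPEI = "ACTAGT"    # SpeI recognition sequence
ASCI = "GGCGCGCC"  # AscI recognition sequence

forward = "GCTTATACTAGTAAGCGCGCGGGCGGCGGCTATTGGCACAC"  # SEQ ID NO: 3
reverse = "GCTTATGGCGCGCCTTAGACAGGATCGAAAATCGACGAC"    # SEQ ID NO: 4

spei_pos = forward.find(SPEI)            # 0-based position of the SpeI site
sense = reverse_complement(reverse)      # reverse primer read on the sense strand
asci_pos = sense.find(ASCI)              # 0-based position of the AscI site
stop_codon = sense[asci_pos - 3:asci_pos]  # codon directly upstream of AscI
print(spei_pos, asci_pos, stop_codon)
```

Both primers carry six extra 5' bases ahead of the restriction site, which is a common way to give the enzyme room to cut near the end of a PCR product.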

[0096] The reaction conditions were as follows, using materials from the PLATINUM Pfx DNA Polymerase kit (Invitrogen, Carlsbad, CA): 1 μl dNTP Master Mix (final concentration 0.2 mM); 1 μl primer 1 (final conc. 0.5 μM); 1 μl primer 2 (final conc. 0.5 μM); 2 μl DNA template (final conc. 50-200 ng); 1 μl 50 mM MgSO4 (final conc. 1 mM); 5 μl 10x Pfx Amplification Buffer; 5 μl 10x PCRx Enhancer Solution; 1 μl Platinum Pfx DNA Polymerase (2.5 U total); 33 μl water for 50 μl total reaction volume.
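The reaction set-up can be verified arithmetically. The Python sketch below, offered only as an illustration, sums the listed volumes and recomputes two of the stated final concentrations by simple dilution; the dictionary keys and the helper final_concentration are names chosen here, not terms from the kit documentation.

```python
# Component volumes in microliters, as listed in the reaction conditions.
volumes_ul = {
    "dNTP master mix": 1,
    "primer 1": 1,
    "primer 2": 1,
    "DNA template": 2,
    "50 mM MgSO4": 1,
    "10x Pfx amplification buffer": 5,
    "10x PCRx enhancer": 5,
    "Platinum Pfx polymerase": 1,
    "water": 33,
}
total_ul = sum(volumes_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_volume_ul


mgso4_mM = final_concentration(50, volumes_ul["50 mM MgSO4"], total_ul)
buffer_x = final_concentration(10, volumes_ul["10x Pfx amplification buffer"], total_ul)
print(total_ul, mgso4_mM, buffer_x)
```

The volumes add up to the stated 50 μl, the MgSO4 comes out at 1 mM, and the 10x buffer is diluted to 1x.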

[0097] Amplification parameters were: step 1: 94 °C for 2 min (1st cycle only, to denature antibody-bound polymerase); step 2: 94 °C for 45 sec; step 3: 60 °C for 30 sec; step 4: 68 °C for 2 min; step 5: repeat from step 2 for 24 cycles; and step 6: 68 °C for 4 min.
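For planning purposes, the programmed hold time of this protocol can be estimated with the following Python sketch. It is a rough illustration only: ramp times between temperatures are ignored, and because the text leaves open whether the 24 cycles include the first pass through steps 2 to 4, the cycle count is a parameter. The function name hold_time_seconds is hypothetical.

```python
def hold_time_seconds(cycles, initial=120, denature=45, anneal=30,
                      extend=120, final=240):
    """Sum of programmed hold times in seconds (ramping not included)."""
    return initial + cycles * (denature + anneal + extend) + final


t24 = hold_time_seconds(24)  # 24 passes through steps 2 to 4 in total
t25 = hold_time_seconds(25)  # first pass plus 24 repeats
print(t24 / 60, t25 / 60)    # minutes
```

Under either reading the run is roughly an hour and a half of hold time, with each additional cycle adding 195 seconds.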

[0098] The appropriately sized PCR product was cloned into the Zero Blunt TOPO vector and transformed into chemically competent Top10 E. coli cells (Invitrogen Corp., Carlsbad, Calif.), plated onto appropriate selection media (LA with 50 ppm kanamycin) and grown overnight at 37 °C. Several colonies were picked from the plate media and grown overnight in 5 ml cultures at 37 °C in selection media (LB with 50 ppm kanamycin) from which plasmid mini-preps were made. Plasmid DNA from several clones was restriction digested to confirm the correct size insert. The correct sequence was confirmed by DNA sequencing. Following sequence verification, the E1 catalytic domain was excised from the TOPO vector by digesting with the restriction enzymes SpeI and AscI. This fragment was ligated into the pTrex4 vector which had been digested with the restriction enzymes SpeI and AscI as shown in Figure 3.

[0099] The ligation mixture was transformed into MM294 competent E. coli cells, plated onto appropriate selection media (LA with 50 ppm carbenicillin) and grown overnight at 37 °C. Several colonies were picked from the plate media and grown overnight in 5 ml cultures at 37 °C in selection media (LB with 50 ppm carbenicillin) from which plasmid mini-preps were made. Correctly ligated CBH1-E1 fusion protein vectors were confirmed by restriction digestion.

[00100] Construction of a H. grisea cbh1 Expression Vector

[00101] The H. grisea cbh1 expression construct included the T. reesei cbh1 promoter, the H. grisea cbh1 gene sequence, the T. reesei cbh1 terminator and the A. nidulans amdS selectable marker. These sequences can be assembled in a number of ways by those skilled in the art; one method is described as follows.

[00102] Genomic DNA was extracted from a sample of mycelia of Humicola grisea var. thermoidea (CBS 225.63). Genomic DNA may be isolated using any method known in the art. The following protocol may be used.

[00103] Cells were grown at 45 °C in 20 ml Potato Dextrose Broth (PDB) for 24 hours. The cells were diluted 1:20 in fresh PDB medium and grown overnight. Two milliliters of cells were centrifuged and the pellet washed in 1 ml KC (60 g KCl, 2 g citric acid per liter, pH adjusted to 6.2 with 1 M KOH). The cell pellet was resuspended in 900 μl KC. 100 μl Novozyme (20 mg/ml) was added, mixed gently, and the protoplasting was followed microscopically at 37 °C until greater than 90% protoplasts were formed, for a maximum of 2 hours. The cells were centrifuged at 1500 rpm (4600 x g) for 10 minutes. 200 μl TES/SDS (10 mM Tris, 50 mM EDTA, 150 mM NaCl, 1% SDS) was added, mixed and incubated at room temperature for 5 minutes. DNA was isolated using a Qiagen mini-prep isolation kit (Qiagen). The column was eluted with 100 μl milli-Q water and the DNA collected.
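Two working concentrations implied by this protocol can be derived with a short Python sketch, given here as an illustration only. It assumes the 100 μl enzyme stock is added to the 900 μl suspension with no other volume change, and it uses a molar mass of 74.55 g/mol for KCl; the variable names are chosen for this sketch.

```python
# Final enzyme concentration after adding 100 ul of 20 mg/ml stock to 900 ul KC.
stock_mg_per_ml = 20.0
enzyme_ul = 100.0
suspension_ul = 900.0
final_enzyme_mg_per_ml = stock_mg_per_ml * enzyme_ul / (enzyme_ul + suspension_ul)

# Approximate molarity of KCl in the KC wash buffer (60 g per liter).
KCL_MOLAR_MASS = 74.55  # g/mol
kcl_molar = 60.0 / KCL_MOLAR_MASS

print(final_enzyme_mg_per_ml, round(kcl_molar, 3))
```

On these assumptions the protoplasting enzyme is used at 2 mg/ml in a buffer of roughly 0.8 M KCl, which acts as the osmotic stabilizer for the protoplasts.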

[00104] An alternative method used the FastPrep method for isolating genomic DNA from H. grisea var. thermoidea grown on PDA plates at 45 °C. The system consists of the FastPrep Instrument as well as the FastPrep kit for nucleic acid isolation. (FastPrep is available from Qbiogene, MP Biomedicals United States, 29525 Fountain Pkwy., Solon, OH 44139).

[00105] Primers to PCR amplify the H. grisea cbh1 gene were based on NCBI ACCESSION D63515. They were designed to amplify from the H. grisea cbh1 coding start to the terminator. The sequence of the forward primer included the 4 nucleotides CACC to facilitate cloning into the vector TOPO pENTR to enable use of the Gateway cloning system (Invitrogen).

[00106] Forward Primer: 5' CACCATGCGTACCGCCAAGTTCGC 3' (SEQ ID NO: 5)

[00107] Reverse Primer: 5' TTACAGGCACTGAGAGTACCAG 3' (SEQ ID NO: 6).
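The design features of these two primers can be confirmed with a small Python sketch, shown only as an illustration. It checks that the forward primer begins with the CACC tag followed directly by a start codon, and that the reverse primer, read on the sense strand, ends in a stop codon. The helper reverse_complement and the variable names are assumptions of this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "CACCATGCGTACCGCCAAGTTCGC"  # SEQ ID NO: 5
reverse = "TTACAGGCACTGAGAGTACCAG"    # SEQ ID NO: 6

has_cacc_tag = forward.startswith("CACC")     # tag for directional TOPO cloning
start_codon = forward[4:7]                    # first codon after the tag
sense_end = reverse_complement(reverse)[-3:]  # last codon on the sense strand
print(has_cacc_tag, start_codon, sense_end)
```

The amplified product therefore runs from the ATG of the coding sequence to its stop codon, with the four-base tag added only at the 5' end.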

[00108] PCR reaction conditions

[00109] The PCR product was cloned into pENTR/D, according to the Invitrogen Gateway system protocol. The vector was then transformed into chemically competent Top10 E. coli (Invitrogen) with kanamycin selection. Plasmid DNA from several clones was restriction digested to confirm the correct size insert, followed by sequencing to confirm the correct sequence. Plasmid DNA from one clone was added to an LR clonase reaction (Invitrogen Gateway system) with pTrex3g/amdS destination vector DNA.

[00110] Construction of pTrex3g

[00111] This section describes the construction of the basic vector used to express the genes of interest. The vector pTrex3g has been previously described; see, for example, U.S. Patent Application Publication No. 20070015266. Briefly, the vector is based on the E. coli vector pSL1180 (Pharmacia Inc., Piscataway, NJ, USA), which is a pUC118 phagemid based vector (Brosius, J. (1989) DNA 8: 759) with an extended multiple cloning site containing 64 hexamer restriction enzyme recognition sequences. It was engineered to become a Gateway destination vector (Hartley, J. L. et al., (2000) Genome Research 10: 1788-1795) to allow insertion using Gateway technology (Invitrogen) of any desired open reading frame between the promoter and terminator regions of the T. reesei cbh1 gene. The Aspergillus nidulans amdS gene was inserted for use as a selectable marker in transformation. A promoter and terminator were positioned to allow expression of a gene of interest.

[00112] The details of pTrex3g are as follows:

[00113] The vector is 10.3 kb in size. Inserted into the polylinker region of pSL1180 are the following segments of DNA: (i) a 2.2 kb segment of DNA from the promoter region of the T. reesei cbh1 gene; (ii) the 1.7 kb Gateway reading frame A cassette acquired from Invitrogen that includes the attR1 and attR2 recombination sites at either end flanking the chloramphenicol resistance gene (CmR) and the ccdB gene; (iii) a 336 bp segment of DNA from the terminator region of the T. reesei cbh1 gene; and (iv) a 2.7 kb fragment of DNA containing the Aspergillus nidulans amdS gene with its native promoter and terminator regions. Figure 4 depicts the plasmid map of T. reesei expression vector pTrex3g.
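The stated sizes can be cross-checked with a brief Python sketch, offered as a rough consistency check rather than a description of the actual vector. It takes the promoter segment as 2.2 kb and attributes whatever remains of the 10.3 kb total to the pSL1180 backbone; the dictionary keys and variable names are chosen for this sketch.

```python
# Sizes in kilobases of the four segments inserted into the polylinker.
segments_kb = {
    "cbh1 promoter": 2.2,          # assumed to be kilobases
    "Gateway cassette A": 1.7,
    "cbh1 terminator": 0.336,      # 336 bp
    "amdS fragment": 2.7,
}
vector_kb = 10.3

inserted_kb = round(sum(segments_kb.values()), 3)
backbone_kb = round(vector_kb - inserted_kb, 3)
print(inserted_kb, backbone_kb)
```

The inserted segments account for about 6.9 kb, leaving about 3.4 kb for the backbone, which is a plausible size for a pUC-type cloning vector.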

[00114] A clone of the H. grisea cbhl in the vector pENTR, described above, was recombined with the pTrex3g destination vector in an LR clonase reaction according to the manufacturer's instructions (Invitrogen). The recombination replaced the CmR and ccdB genes of the pTrex3g destination vector with the H. grisea cbhl from the pENTR/D vector. The recombination directionally inserted the H. grisea cbhl between the T. reesei cbhl promoter and T. reesei cbhl terminator of the destination vector, and resulted in attB sequences of 25 bp flanking the H. grisea cbhl both upstream and downstream. An aliquot of the LR clonase reaction was transformed into chemically competent TOP10 E. coli cells (Invitrogen) and grown overnight with carbenicillin selection. Plasmid DNA from several clones was digested with appropriate restriction enzymes to confirm the correct insert size, followed by sequencing to confirm the correct sequence. To provide DNA for transformation, plasmid DNA from a correct clone was digested with the endonuclease XbaI to release the expression fragment including the T. reesei cbhl promoter:H. grisea cbhl:T. reesei cbhl terminator:A. nidulans amdS. This 6.2 kb fragment was isolated from the E. coli DNA by agarose gel extraction using standard techniques and transformed into a strain of T. reesei derived from the publicly available strain QM6a, as further described below. The expression vector including the two XbaI sites is shown schematically in Figure 5A and the nucleotide sequence (SEQ ID NO: 7) of the expression vector is provided in Figure 5B.
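
The XbaI digest described above can be checked in silico before a gel is run. The following is a minimal Python sketch and is not part of the original disclosure; the function name `circular_digest` and the toy sequence are illustrative assumptions. It lists the fragment lengths produced by cutting a circular plasmid at every occurrence of a recognition sequence (TCTAGA for XbaI).

```python
def circular_digest(seq: str, site: str) -> list[int]:
    """Return sorted fragment lengths for a circular sequence cut at each `site`.

    Only the given strand is searched, which is sufficient for a palindromic
    site such as XbaI (tctaga). A non-palindromic site would also require
    searching for its reverse complement.
    """
    seq = "".join(seq.lower().split())  # tolerate spaces and line breaks
    site = site.lower()
    n = len(seq)
    # Append the start of the sequence so sites spanning the origin are found.
    doubled = seq + seq[: len(site) - 1]
    positions = [i for i in range(n) if doubled.startswith(site, i)]
    if not positions:
        return [n]  # uncut circle
    # Distance from each site to the next one, wrapping around the circle.
    # The cut offset inside the site is the same everywhere, so it cancels.
    return sorted(
        (positions[(k + 1) % len(positions)] - p) % n or n
        for k, p in enumerate(positions)
    )


# Toy 28-base circle with two XbaI sites (hypothetical, for illustration only).
toy = "aaaa" + "tctaga" + "c" * 10 + "tctaga" + "gg"
print(circular_digest(toy, "tctaga"))
```

Applied to the full vector sequence (SEQ ID NO: 7), a digest consistent with the text would be expected to yield two fragments, one of them approximately 6.2 kb.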

[00115] Co-Transformation and fermentation of Trichoderma reesei

[00116] A derivative of T. reesei host strain RL-P37 (Sheir-Neiss, et al., 1984) which had undergone a number of mutagenesis steps to increase cellulase production, including deletion of the native cbhl gene (Suominen, P. L. et al., 1993, Mol Gen Genet 241:523-30), was used as a host strain for transformations with the constructs of the present teachings.

[00117] Biolistic transformation of T. reesei with the H. grisea cbhl expression construct and the fusion construct of T. reesei cbhl and A. cellulolyticus E1 was performed using the protocol outlined below.

[00118] A suspension of spores (approximately 3.5 x 10^8 spores/ml) from a P-37 derived strain of T. reesei was prepared. Between 100 μl and 200 μl of this spore suspension was spread onto the center of plates of MM acetamide medium. MM acetamide medium had the following composition: 0.6 g/L acetamide; 1.68 g/L CsCl; 20 g/L glucose; 20 g/L KH2PO4; 0.6 g/L CaCl2·2H2O; 1 ml/L 1000X trace elements solution; 20 g/L Noble agar; pH 5.5. The 1000X trace elements solution contained 5.0 g/L FeSO4·7H2O, 1.6 g/L MnSO4·H2O, 1.4 g/L ZnSO4·7H2O and 1.0 g/L CoCl2·6H2O. The spore suspension was allowed to dry on the surface of the MM acetamide medium in a sterile hood.
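
For preparing batches of other than one liter, the per-liter composition above scales linearly. The following Python sketch is illustrative only; the dictionary and function names are assumptions introduced here, and the concentrations are copied from the paragraph above.

```python
# Per-liter composition of MM acetamide medium, taken from the text (g/L).
MM_ACETAMIDE_G_PER_L = {
    "acetamide": 0.6,
    "CsCl": 1.68,
    "glucose": 20.0,
    "KH2PO4": 20.0,
    "CaCl2.2H2O": 0.6,
    "Noble agar": 20.0,
}
# 1000X trace elements solution is added at 1 ml per liter of medium.
TRACE_ELEMENTS_ML_PER_L = 1.0


def batch_amounts(volume_l: float) -> dict[str, float]:
    """Return grams of each solid (and ml of trace elements) for `volume_l` liters."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    amounts = {name: round(g * volume_l, 4) for name, g in MM_ACETAMIDE_G_PER_L.items()}
    amounts["1000X trace elements (ml)"] = round(TRACE_ELEMENTS_ML_PER_L * volume_l, 4)
    return amounts


# Example: amounts for a 0.5 L batch (pH is still adjusted to 5.5 separately).
for component, amount in batch_amounts(0.5).items():
    print(f"{component}: {amount}")
```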

[00119] Transformation of T. reesei was performed using a Biolistic® PDS-1000/He Particle Delivery System from Bio-Rad (Hercules, CA) following the manufacturer's instructions (Lorito, M. et al., 1993, Curr Genet 24:349-56). 60 mg of M10 tungsten particles were placed in a microcentrifuge tube. 1 mL of ethanol was added, and the mixture was briefly vortexed and allowed to stand for 15 minutes. The particles were centrifuged at 15,000 rpm for 15 minutes. The ethanol was removed and the particles were washed three times with sterile dH2O before 1 mL of 50% (v/v) sterile glycerol was added. After ten seconds of vortexing to suspend the tungsten, 25 μl of tungsten/glycerol particle suspension was removed and placed into a microcentrifuge tube.

[00120] While continuously vortexing the 25 μl tungsten/glycerol particle suspension, the following were added in order, allowing 5-minute incubations between additions: 2 μl (100-300 ng/μl) of H. grisea cbhl expression vector (XbaI cut fragment), 2 μl (100-300 ng/μl) of cbhl-E1 expression vector (XbaI cut fragment), 25 μl of 2.5 M CaCl2 and 10 μl of 0.1 M spermidine. After a 5-minute incubation following the spermidine addition, the particles were centrifuged for 3 seconds. The supernatant was removed; the particles were washed with 200 μl of 70% (v/v) ethanol and then centrifuged for 3 seconds. The supernatant was removed; the particles were washed with 200 μl of 100% ethanol and centrifuged for 3 seconds. The supernatant was removed and 24 μl of 100% ethanol was added and mixed by pipetting. The tube was placed in an ultrasonic cleaning bath for approximately 15 seconds to further resuspend the particles in the ethanol. While the tube was in the ultrasonic bath, 8 μl aliquots of suspended particles were removed and placed onto the center of macrocarrier disks that were placed into a desiccator.
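
The volumes and concentrations above fix the amount of DNA in each particle preparation. The following Python sketch works through that arithmetic; the function names are assumptions introduced here, and the per-disk figure further assumes, hypothetically, that all added DNA binds the particles and is divided evenly among the aliquots.

```python
def dna_loaded_ng(volume_ul: float, conc_ng_per_ul: float) -> float:
    """Mass of one DNA fragment added to the particle suspension, in ng."""
    return volume_ul * conc_ng_per_ul


def macrocarriers_per_prep(resuspension_ul: float = 24.0, aliquot_ul: float = 8.0) -> int:
    """Number of whole aliquots obtainable from the final ethanol suspension."""
    return int(resuspension_ul // aliquot_ul)


def dna_per_macrocarrier_ng(volume_ul: float, conc_ng_per_ul: float, n_disks: int) -> float:
    """Upper-bound DNA per disk, assuming complete binding and an even split."""
    return dna_loaded_ng(volume_ul, conc_ng_per_ul) / n_disks


disks = macrocarriers_per_prep()
for conc in (100, 300):  # the stated concentration range, ng/ul
    total = dna_loaded_ng(2, conc)
    per_disk = dna_per_macrocarrier_ng(2, conc, disks)
    print(f"{conc} ng/ul: {total:.0f} ng per fragment, at most {per_disk:.1f} ng per disk")
```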

[00121] Once the tungsten/DNA solution had dried onto the macrocarrier (approximately 15 minutes), it was placed into the bombardment chamber. Next, a plate containing MM acetamide with spores was placed in the chamber, and the bombardment process was performed using 1100 psi rupture discs according to the manufacturer's instructions. After the bombardment of the plated spores with the tungsten/DNA particles, the plates were incubated at 28 °C. Large transformed colonies were picked to fresh secondary plates of MM acetamide after 5 days (Penttila et al., (1987) Gene 61: 155-164) and incubated another 3 days at 28 °C. Colonies which showed dense, opaque growth on secondary plates were transferred to individual MM acetamide plates. These were grown another three days, transferred to potato dextrose agar (PDA) plates and allowed to incubate another 7-10 days at 28 °C to allow sporulation.

[00122] The expression of enzymes from the transformants was next evaluated in two-stage shake flasks. They were first grown in an inoculum shake flask containing the following media: 22.5 g/L Proflo, 30 g/L α-lactose·H2O, 6.5 g/L (NH4)2SO4, 2 g/L KH2PO4, 0.3 g/L MgSO4·7H2O, 0.26 g/L CaCl2·2H2O, 0.72 g/L CaCO3, 2 ml of 10% Tween 80, 1 ml of 1000x TRI Trace Salts (1000x TRI Trace Salts consists of: 5 g/L FeSO4·7H2O, 1.6 g/L MnSO4·H2O, 1.4 g/L ZnSO4·7H2O). The conditions were as follows: 50 ml media in a 4-baffled, 250 ml shake flask (Bellco Biotechnology, 340 Edrudo Road, Vineland, NJ 08360 USA), incubation at 28 °C, shaking speed 225 RPM (2.5 cm diameter orbit). Transformants were inoculated into the inoculum shake flasks by transferring a 4 cm² piece of PDA containing the transformant mycelia and spores.

[00123] After 2 days of growth in the inoculum flask, 5 ml was transferred into an expression shake flask containing 50 ml of the following media: 5 g/L (NH4)2SO4, 33 g/L PIPPS Buffer, 9 g/L Bacto Casamino Acids, 4.5 g/L KH2PO4, 1.32 g/L CaCl2·2H2O, 1 g/L MgSO4·7H2O, 5 ml Mazu DF204 antifoam, 2.5 ml 400x T. reesei Trace Salts (400x T. reesei Trace Salts consists of: 175 g/L citric acid (anhydrous), 200 g/L FeSO4·7H2O, 16 g/L ZnSO4·7H2O, 3.2 g/L CuSO4·5H2O, 1.4 g/L MnSO4·H2O, 0.8 g/L H3BO3, added in the order listed). The pH was adjusted to 5.5 and the media was sterilized; after sterilization, 40 ml of 40% lactose was added. Expression shake flasks were grown under the following conditions: 4-baffled, 250 ml shake flask, incubation at 28 °C, shaking speed 225 RPM. A sample was removed at 5 days, and the supernate was analyzed on Coomassie-stained SDS-PAGE protein gels.

6.2 Example 2 Four-Part Strain Construction

[00124] A strain was constructed which comprised four parts: (i) a host strain consisting of a cbhl deleted production strain; (ii) a nucleic acid sequence for expression of a cbhl-E1 fusion gene; (iii) a nucleic acid sequence for expression of a protein engineered thermostable T. reesei cbhl gene; and (iv) a nucleic acid sequence for expression of a protein engineered thermostable T. reesei cbhll gene. The DNA of all three expression fragments was co-transformed into the cbhl deleted production strain as shown in Figure 6.

[00125] T. reesei transformants were screened for the presence of all three expression fragments integrated into the genome. PCR primer pairs were designed to amplify each of the three expression fragments. Thirty-two transformants that, on the basis of PCR, showed the presence of all three expression fragments were chosen for shake flask fermentation. Shake flasks were grown for three days; supernate samples were obtained and run on 8% tris-glycine NuPAGE (Invitrogen) gels, 1 mm, in tris-glycine SDS running buffer. Sample preps (8 μl supernate + 2 μl reducing agent + 10 μl of 2X tris-glycine SDS sample buffer) were loaded at 20 μl/lane unless noted, after incubation at 100 °C for 7 minutes followed by 5 minutes of incubation on ice. Several of the 32 samples showed high-level presence of the expressed genes as evidenced by protein bands.

[00126] DNA encoding an amino acid sequence variant of the T. reesei cbhl and cbhll can be prepared by a variety of methods known in the art. These methods include, but are not limited to, gene synthesis, preparation by site-directed (or oligonucleotide-mediated) mutagenesis, PCR mutagenesis, and cassette mutagenesis of an earlier prepared DNA encoding the T. reesei cDNA sequence.

[00127] A vector was constructed in pTrex3g expressing an enzyme engineered T. reesei cbhl gene encoding an engineered protein with the following mutations in the mature amino acid sequence: S8P+T41I+N49S+A68T+N89D+S92T+S113N+S196T+P227L+D249K+T255P+S278P+E295K+T296P+T332Y+V403D+S411F. The DNA sequence from start to stop codon was 1545 bases (SEQ ID NO: 8) as provided in Figure 7A. The sequence of the engineered CBHI protein (SEQ ID NO: 9) is provided in Figure 7B (the CBHI signal sequence is underlined). A diagram of the cbhl expression vector pTrex3g-cbhl is shown in Figure 8A. The DNA sequence of the expression vector pTrex3g-cbhl was 10145 bases (SEQ ID NO: 10) as provided in Figure 8B.
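
A substitution list in the wild-type/position/replacement notation used above can be parsed and applied to a mature protein sequence programmatically. The following Python sketch is illustrative only; the function names are assumptions introduced here. The short test peptide `QSACTLQS` corresponds to the first eight residues following the 17-residue signal sequence in SEQ ID NO: 2, so that position 8 is the serine targeted by S8P.

```python
import re

# One substitution: wild-type residue, 1-based position, replacement residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Parse a list such as 'S8P+T41I' (separators '+' or ',') into tuples."""
    mutations = []
    for token in re.split(r"[+,]", spec):
        token = token.strip()
        if not token:
            continue
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        mutations.append((match.group(1), int(match.group(2)), match.group(3)))
    return mutations


def apply_mutations(mature_seq: str, mutations: list[tuple[str, int, str]]) -> str:
    """Apply substitutions, checking each wild-type residue against the sequence."""
    residues = list(mature_seq)
    for wild_type, position, replacement in mutations:
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


CBHI_SUBSTITUTIONS = (
    "S8P+T41I+N49S+A68T+N89D+S92T+S113N+S196T+P227L+D249K+T255P"
    "+S278P+E295K+T296P+T332Y+V403D+S411F"
)
print(len(parse_mutations(CBHI_SUBSTITUTIONS)))
print(apply_mutations("QSACTLQS", parse_mutations("S8P")))
```

The wild-type check in `apply_mutations` guards against numbering mismatches, for example applying mature-sequence positions to a sequence that still carries its signal peptide.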

[00128] A vector was constructed to express an enzyme engineered CBHII protein. The vector included the cbhll promoter, the engineered cbhll gene, the cbhll terminator, the A. nidulans acetamidase (amdS) as selectable marker, and additional flanking 3' sequence to the cbhll terminator. The vector was constructed using the shuttle vector pCR-XL-TOPO (Invitrogen). The expression portion of the vector was excised from the shuttle vector by digestion of the plasmid with the unique restriction endonucleases NotI and SrfI, generating a fragment of approximately 10.68 kb in length which was used to transform T. reesei.

[00129] The vector expressed a T. reesei cbhll gene encoding an engineered protein with the following mutations in the amino acid sequence: P98L, M134V, T154A, I212V, S316P, and S413Y. The DNA sequence from start to stop codon was 1416 bases (SEQ ID NO: 11) as provided in Figure 9A. The amino acid sequence (SEQ ID NO: 12) is provided in Figure 9B (the signal sequence is underlined). A diagram of the cbhll expression vector is shown in Figure 10A. The DNA sequence of the entire cbhll expression pExp-cbhll vector was 14158 bases (SEQ ID NO: 11) as provided in Figure 10B.
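
The stated start-to-stop lengths can be converted to protein lengths as a quick consistency check on the substitution positions. The following Python sketch assumes that the quoted lengths are intron-free and include the stop codon; neither assumption is stated in the text, and the function name is introduced here for illustration.

```python
def encoded_residues(orf_length_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an intron-free ORF of the given length."""
    if orf_length_nt <= 0 or orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of three")
    codons = orf_length_nt // 3
    # The final codon is the stop codon and encodes no residue.
    return codons - 1 if includes_stop else codons


print(encoded_residues(1545))  # engineered cbhl ORF length from the text
print(encoded_residues(1416))  # engineered cbhll ORF length from the text
```

Under these assumptions, the highest-numbered substitutions listed for each protein fall within the computed lengths.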

[00130] Co-transformation was carried out on a T. reesei strain deleted for cbhl, using three fragments of DNA:

[00131] The engineered cbhll expression fragment that was cut from the plasmid pExp-cbhll using NotI and SrfI.

[00132] The engineered cbhl in the expression vector pTrex3g that was used as a PCR template to generate a linear fragment of only the cbhl promoter, engineered cbhl and cbhl terminator (without amdS marker).

The cbhl-E1 fusion fragment described in the previous example that was used as a PCR template to generate a linear fragment consisting of the cbhl promoter, the cbhl-E1 fusion gene and cbhl terminator (without amdS marker).

These three fragments were used to coat tungsten particles in biolistic cotransformation. The procedure was carried out as described in the previous example. In this cotransformation, each of the three fragments (1, 2 and 3) was added to the tungsten particles at a volume of 1.5 μl (100-300 ng/μl DNA concentration). Transformant selection was on MM acetamide media as described.

6.3 Example 3 Assay of Cellulolytic Activity from Transformed Trichoderma reesei Clones

[00133] The following assays and substrates were used to determine the cellulolytic activity of the CBHI-E1 fusion protein. Trichoderma reesei strains Tr-A and Tr-D were derived from RL-P37 through mutagenesis.

[00134] Pretreated corn stover (PCS): Corn stover was pretreated with 2% w/w H2SO4 as described in Schell, D. et al., J. Appl. Biochem. Biotechnol. 105:69-86 (2003), followed by multiple washes with deionized water to obtain a pH of 4.5. Sodium acetate was added to make a final concentration of 50 mM and this was titrated to pH 5.0.

[00135] Measurement of Total Protein: Protein concentrations were measured using the bicinchoninic acid method with bovine serum albumin as a standard (Smith, P. K., et al. (1985) Anal. Biochem. 150:76-85).

[00136] Cellulose conversion (Soluble sugar determinations) was evaluated by HPLC according to the methods described in Baker et al., Appl. Biochem. Biotechnol. 70-72:395 - 403 (1998).

[00137] A standard cellulosic conversion assay was used in the experiments. In this assay, enzyme and buffered substrate were placed in containers and incubated at a temperature over time. The reaction was quenched with enough 100 mM glycine, pH 11.0, to bring the pH of the reaction mixture to at least pH 10. Once the reaction was quenched, an aliquot of the reaction mixture was filtered through a 0.2 micron membrane to remove solids. The filtered solution was then assayed for soluble sugars by HPLC as described above. The cellulose concentration in the reaction mixture was approximately 7%. The enzyme or enzyme mixtures were dosed anywhere from 1 to 60 mg of total protein per gram of cellulose.
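
The dosing range above translates into an absolute amount of protein per reaction once the reaction size is fixed. The following Python sketch is illustrative only; it assumes that the approximately 7% cellulose concentration is a weight fraction of the reaction mixture (the text does not state the basis), and the function name and the 10 g reaction size are assumptions introduced here.

```python
def protein_dose_mg(
    reaction_mass_g: float,
    dose_mg_per_g_cellulose: float,
    cellulose_fraction: float = 0.07,
) -> float:
    """Total protein (mg) to add to a reaction of the given mass.

    cellulose_fraction is treated as a w/w fraction of the reaction mixture,
    which is an assumption rather than something stated in the text.
    """
    if not 0 < cellulose_fraction <= 1:
        raise ValueError("cellulose_fraction must be in (0, 1]")
    cellulose_g = reaction_mass_g * cellulose_fraction
    return cellulose_g * dose_mg_per_g_cellulose


# Hypothetical 10 g reaction dosed at both ends of the stated 1-60 mg/g range.
for dose in (1, 60):
    print(f"{dose} mg/g cellulose: {protein_dose_mg(10, dose):.2f} mg total protein")
```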

[00138] Table 1, below, summarizes the data showing the increased specific performance of the 4-part strain over a modified Tr-D.

Table 2, below, summarizes the data showing the increased specific performance of the 3-part strain over Tr-A.

[00139] All references and publications cited herein are incorporated by reference in their entirety. It should be noted that there are alternative ways of implementing the present invention. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

SEQUENCE LISTING

<110> Danisco US Inc., Genencor Div.

Bowers, Benjamin S. Larenas, Edmund A.

<120> Heterologous and Homologous Cellulase Expression System <130> 30984WO

<150> US 60/933,894 <151> 2007-06-08

<160> 14

<170> Patentln version 3.4

<210> 1

<211> 2656 <212> DNA

<213> Artificial

<220>

<223> composite of Trichoderma reesei, Acidothermus cellulolyticus and synthetic sequences

<400> 1 atgtatcgga agttggccgt catctcggcc ttcttggcca cagctcgtgc tcagtcggcc 60 tgcactctcc aatcggagac tcacccgcct ctgacatggc agaaatgctc gtctggtggc 120 acttgcactc aacagacagg ctccgtggtc atcgacgcca actggcgctg gactcacgct 180 acgaacagca gcacgaactg ctacgatggc aacacttgga gctcgaccct atgtcctgac 240 aacgagacct gcgcgaagaa ctgctgtctg gacggtgccg cctacgcgtc cacgtacgga 300 gttaccacga gcggtaacag cctctccatt ggctttgtca cccagtctgc gcagaagaac 360 gttggcgctc gcctttacct tatggcgagc gacacgacct accaggaatt caccctgctt 420 ggcaacgagt tctctttcga tgttgatgtt tcgcagctgc cgtaagtgac ttaccatgaa 480 cccctgacgt atcttcttgt gggctcccag ctgactggcc aatttaaggt gcggcttgaa 540 cggagctctc tacttcgtgt ccatggacgc ggatggtggc gtgagcaagt atcccaccaa 600 caccgctggc gccaagtacg gcacggggta ctgtgacagc cagtgtcccc gcgatctgaa 660 gttcatcaat ggccaggcca acgttgaggg ctgggagccg tcatccaaca acgcaaacac 720 gggcattgga ggacacggaa gctgctgctc tgagatggat atctgggagg ccaactccat 780 ctccgaggct cttacccccc acccttgcac gactgtcggc caggagatct gcgagggtga 840 tgggtgcggc ggaacttact ccgataacag atatggcggc acttgcgatc ccgatggctg 900 cgactggaac ccataccgcc tgggcaacac cagcttctac ggccctggct caagctttac 960 cctcgatacc accaagaaat tgaccgttgt cacccagttc gagacgtcgg gtgccatcaa 1020 ccgatactat gtccagaatg gcgtcacttt ccagcagccc aacgccgagc ttggtagtta 1080

ctctggcaac gagctcaacg atgattactg cacagctgag gaggcagaat tcggcggatc 1140 ctctttctca gacaagggcg gcctgactca gttcaagaag gctacctctg gcggcatggt 1200 tctggtcatg agtctgtggg atgatgtgag tttgatggac aaacatgcgc gttgacaaag 1260 agtcaagcag ctgactgaga tgttacagta ctacgccaac atgctgtggc tggactccac 1320 ctacccgaca aacgagacct cctccacacc cggtgccgtg cgcggaagct gctccaccag 1380 ctccggtgtc cctgctcagg tcgaatctca gtctcccaac gccaaggtca ccttctccaa 1440 catcaagttc ggacccattg gcagcaccgg caaccctagc ggcggcaacc ctcccggcgg 1500 aaacccgcct ggcaccacca ccacccgccg cccagccact accactggaa gctctcccgg 1560 acctactagt aagcgggcgg gcggcggcta ttggcacacg agcggccggg agatcctgga 1620 cgcgaacaac gtgccggtac ggatcgccgg catcaactgg tttgggttcg aaacctgcaa 1680 ttacgtcgtg cacggtctct ggtcacgcga ctaccgcagc atgctcgacc agataaagtc 1740 gctcggctac aacacaatcc ggctgccgta ctctgacgac attctcaagc cgggcaccat 1800 gccgaacagc atcaattttt accagatgaa tcaggacctg cagggtctga cgtccttgca 1860 ggtcatggac aaaatcgtcg cgtacgccgg tcagatcggc ctgcgcatca ttcttgaccg 1920 ccaccgaccg gattgcagcg ggcagtcggc gctgtggtac acgagcagcg tctcggaggc 1980 tacgtggatt tccgacctgc aagcgctggc gcagcgctac aagggaaacc cgacggtcgt 2040 cggctttgac ttgcacaacg agccgcatga cccggcctgc tggggctgcg gcgatccgag 2100 catcgactgg cgattggccg ccgagcgggc cggaaacgcc gtgctctcgg tgaatccgaa 2160 cctgctcatt ttcgtcgaag gtgtgcagag ctacaacgga gactcctact ggtggggcgg 2220 caacctgcaa ggagccggcc agtacccggt cgtgctgaac gtgccgaacc gcctggtgta 2280 ctcggcgcac gactacgcga cgagcgtcta cccgcagacg tggttcagcg atccgacctt 2340 ccccaacaac atgcccggca tctggaacaa gaactgggga tacctcttca atcagaacat 2400 tgcaccggta tggctgggcg aattcggtac gacactgcaa tccacgaccg accagacgtg 2460 gctgaagacg ctcgtccagt acctacggcc gaccgcgcaa tacggtgcgg acagcttcca 2520 gtggaccttc tggtcctgga accccgattc cggcgacaca ggaggaattc tcaaggatga 2580 ctggcagacg gtcgacacag taaaagacgg ctatctcgcg ccgatcaagt cgtcgatttt 2640 cgatcctgtc ggctaa 2656

<210> 2

<211> 841

<212> PRT

<213> Artificial

<220>

<223> composite of T. reesei, Aciothermus cellulyticus and synthetic sequences

<220 >

<221> SIGNAL <222> (1) .. (17)

<400> 2

Met Tyr Arg Lys Leu Ala Val Ile Ser Ala Phe Leu Ala Thr Ala Arg 1 5 10 15

Ala Gln Ser Ala Cys Thr Leu Gln Ser Glu Thr His Pro Pro Leu Thr 20 25 30

Trp Gln Lys Cys Ser Ser Gly Gly Thr Cys Thr Gln Gln Thr Gly Ser 35 40 45

Val Val Ile Asp Ala Asn Trp Arg Trp Thr His Ala Thr Asn Ser Ser 50 55 60

Thr Asn Cys Tyr Asp Gly Asn Thr Trp Ser Ser Thr Leu Cys Pro Asp 65 70 75 80

Asn Glu Thr Cys Ala Lys Asn Cys Cys Leu Asp Gly Ala Ala Tyr Ala 85 90 95

Ser Thr Tyr Gly Val Thr Thr Ser Gly Asn Ser Leu Ser Ile Gly Phe 100 105 110

Val Thr Gln Ser Ala Gln Lys Asn Val Gly Ala Arg Leu Tyr Leu Met 115 120 125

Ala Ser Asp Thr Thr Tyr Gln Glu Phe Thr Leu Leu Gly Asn Glu Phe 130 135 140

Ser Phe Asp Val Asp Val Ser Gln Leu Pro Cys Gly Leu Asn Gly Ala 145 150 155 160

Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Val Ser Lys Tyr Pro 165 170 175

Thr Asn Thr Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser Gln 180 185 190

Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Ala Asn Val Glu Gly 195 200 205

Trp Glu Pro Ser Ser Asn Asn Ala Asn Thr Gly Ile Gly Gly His Gly 210 215 220

Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Ser Ile Ser Glu 225 230 235 240

Ala Leu Thr Pro His Pro Cys Thr Thr Val Gly Gln Glu Ile Cys Glu 245 250 255

Gly Asp Gly Cys Gly Gly Thr Tyr Ser Asp Asn Arg Tyr Gly Gly Thr 260 265 270

Cys Asp Pro Asp Gly Cys Asp Trp Asn Pro Tyr Arg Leu Gly Asn Thr 275 280 285

Ser Phe Tyr Gly Pro Gly Ser Ser Phe Thr Leu Asp Thr Thr Lys Lys 290 295 300

Leu Thr Val Val Thr Gln Phe Glu Thr Ser Gly Ala Ile Asn Arg Tyr 305 310 315 320

Tyr Val Gln Asn Gly Val Thr Phe Gln Gln Pro Asn Ala Glu Leu Gly 325 330 335

Ser Tyr Ser Gly Asn Glu Leu Asn Asp Asp Tyr Cys Thr Ala Glu Glu 340 345 350

Ala Glu Phe Gly Gly Ser Ser Phe Ser Asp Lys Gly Gly Leu Thr Gln 355 360 365

Phe Lys Lys Ala Thr Ser Gly Gly Met Val Leu Val Met Ser Leu Trp 370 375 380

Asp Asp Tyr Tyr Ala Asn Met Leu Trp Leu Asp Ser Thr Tyr Pro Thr 385 390 395 400

Asn Glu Thr Ser Ser Thr Pro Gly Ala Val Arg Gly Ser Cys Ser Thr 405 410 415

Ser Ser Gly Val Pro Ala Gln Val Glu Ser Gln Ser Pro Asn Ala Lys 420 425 430

Val Thr Phe Ser Asn Ile Lys Phe Gly Pro Ile Gly Ser Thr Gly Asn 435 440 445

Pro Ser Gly Gly Asn Pro Pro Gly Gly Asn Pro Pro Gly Thr Thr Thr 450 455 460

Thr Arg Arg Pro Ala Thr Thr Thr Gly Ser Ser Pro Gly Pro Thr Ser 465 470 475 480

Lys Arg Ala Gly Gly Gly Tyr Trp His Thr Ser Gly Arg Glu Ile Leu 485 490 495

Asp Ala Asn Asn Val Pro Val Arg Ile Ala Gly Ile Asn Trp Phe Gly 500 505 510

Phe Glu Thr Cys Asn Tyr Val Val His Gly Leu Trp Ser Arg Asp Tyr 515 520 525

Arg Ser Met Leu Asp Gln Ile Lys Ser Leu Gly Tyr Asn Thr Ile Arg 530 535 540

Leu Pro Tyr Ser Asp Asp Ile Leu Lys Pro Gly Thr Met Pro Asn Ser 545 550 555 560

Ile Asn Phe Tyr Gln Met Asn Gln Asp Leu Gln Gly Leu Thr Ser Leu 565 570 575

Gln Val Met Asp Lys Ile Val Ala Tyr Ala Gly Gln Ile Gly Leu Arg 580 585 590

Ile Ile Leu Asp Arg His Arg Pro Asp Cys Ser Gly Gln Ser Ala Leu 595 600 605

Trp Tyr Thr Ser Ser Val Ser Glu Ala Thr Trp Ile Ser Asp Leu Gln 610 615 620

Ala Leu Ala Gln Arg Tyr Lys Gly Asn Pro Thr Val Val Gly Phe Asp 625 630 635 640

Leu His Asn Glu Pro His Asp Pro Ala Cys Trp Gly Cys Gly Asp Pro 645 650 655

Ser Ile Asp Trp Arg Leu Ala Ala Glu Arg Ala Gly Asn Ala Val Leu 660 665 670

Ser Val Asn Pro Asn Leu Leu Ile Phe Val Glu Gly Val Gln Ser Tyr 675 680 685

Asn Gly Asp Ser Tyr Trp Trp Gly Gly Asn Leu Gln Gly Ala Gly Gln 690 695 700

Tyr Pro Val Val Leu Asn Val Pro Asn Arg Leu Val Tyr Ser Ala His 705 710 715 720

Asp Tyr Ala Thr Ser Val Tyr Pro Gln Thr Trp Phe Ser Asp Pro Thr 725 730 735

Phe Pro Asn Asn Met Pro Gly Ile Trp Asn Lys Asn Trp Gly Tyr Leu 740 745 750

Phe Asn Gln Asn Ile Ala Pro Val Trp Leu Gly Glu Phe Gly Thr Thr 755 760 765

Leu Gln Ser Thr Thr Asp Gln Thr Trp Leu Lys Thr Leu Val Gln Tyr 770 775 780

Leu Arg Pro Thr Ala Gln Tyr Gly Ala Asp Ser Phe Gln Trp Thr Phe 785 790 795 800

Trp Ser Trp Asn Pro Asp Ser Gly Asp Thr Gly Gly Ile Leu Lys Asp 805 810 815

Asp Trp Gln Thr Val Asp Thr Val Lys Asp Gly Tyr Leu Ala Pro Ile 820 825 830

Lys Ser Ser Ile Phe Asp Pro Val Gly 835 840

<210> 3

<211> 41

<212> DNA <213> Artificial

<220>

<223> forward PCR primer <400> 3 gcttatacta gtaagcgcgc gggcggcggc tattggcaca c 41

<210> 4

<211> 39

<212> DNA

<213> Artificial

<220>

<223> reverse PCR primer

<400> 4 gcttatggcg cgccttagac aggatcgaaa atcgacgac 39

<210> 5

<211> 24

<212> DNA

<213> Artificial

<220>

<223> forward PCR primer

<400 > 5 caccatgcgt accgccaagt tcgc 24

<210> 6

<211> 22

<212> DNA

<213> Artificial <220>

<223> reverse PCR primer

<400> 6 ttacaggcac tgagagtacc ag 22

<210> 7

<211> 10232

<212> DNA <213> Artificial

<220>

<223> pTrex3g-Hgrisea-cbhl expression vector <400> 7 aagcttacta gtacttctcg agctctgtac atgtccggtc gcgacgtacg cgtatcgatg 60 gcgccagctg caggcggccg cctgcagcca cttgcagtcc cgtggaattc tcacggtgaa 120 tgtaggcctt ttgtagggta ggaattgtca ctcaagcacc cccaacctcc attacgcctc 180 ccccatagag ttcccaatca gtgagtcatg gcactgttct caaatagatt ggggagaagt 240 tgacttccgc ccagagctga aggtcgcaca accgcatgat atagggtcgg caacggcaaa 300 aaagcacgtg gctcaccgaa aagcaagatg tttgcgatct aacatccagg aacctggata 360 catccatcat cacgcacgac cactttgatc tgctggtaaa ctcgtattcg ccctaaaccg 420 aagtgcgtgg taaatctaca cgtgggcccc tttcggtata ctgcgtgtgt cttctctagg 480 tgccattctt ttcccttcct ctagtgttga attgtttgtg ttggagtccg agctgtaact 540 acctctgaat ctctggagaa tggtggacta acgactaccg tgcacctgca tcatgtatat 600 aatagtgatc ctgagaaggg gggtttggag caatgtggga ctttgatggt catcaaacaa 660 agaacgaaga cgcctctttt gcaaagtttt gtttcggcta cggtgaagaa ctggatactt 720 gttgtgtctt ctgtgtattt ttgtggcaac aagaggccag agacaatcta ttcaaacacc 780 aagcttgctc ttttgagcta caagaacctg tggggtatat atctagagtt gtgaagtcgg 840 taatcccgct gtatagtaat acgagtcgca tctaaatact ccgaagctgc tgcgaacccg 900 gagaatcgag atgtgctgga aagcttctag cgagcggcta aattagcatg aaaggctatg 960 agaaattctg gagacggctt gttgaatcat ggcgttccat tcttcgacaa gcaaagcgtt 1020 ccgtcgcagt agcaggcact cattcccgaa aaaactcgga gattcctaag tagcgatgga 1080 accggaataa tataataggc aatacattga gttgcctcga cggttgcaat gcaggggtac 1140

tgagcttgga cataactgtt ccgtacccca cctcttctca acctttggcg tttccctgat 1200 tcagcgtacc cgtacaagtc gtaatcacta ttaacccaga ctgaccggac gtgttttgcc 1260 cttcatttgg agaaataatg tcattgcgat gtgtaatttg cctgcttgac cgactggggc 1320 tgttcgaagc ccgaatgtag gattgttatc cgaactctgc tcgtagaggc atgttgtgaa 1380 tctgtgtcgg gcaggacacg cctcgaaggt tcacggcaag ggaaaccacc gatagcagtg 1440 tctagtagca acctgtaaag ccgcaatgca gcatcactgg aaaatacaaa ccaatggcta 1500 aaagtacata agttaatgcc taaagaagtc atataccagc ggctaataat tgtacaatca 1560 agtggctaaa cgtaccgtaa tttgccaacg gcttgtgggg ttgcagaagc aacggcaaag 1620 ccccacttcc ccacgtttgt ttcttcactc agtccaatct cagctggtga tcccccaatt 1680 gggtcgcttg tttgttccgg tgaagtgaaa gaagacagag gtaagaatgt ctgactcgga 1740 gcgttttgca tacaaccaag ggcagtgatg gaagacagtg aaatgttgac attcaaggag 1800 tatttagcca gggatgcttg agtgtatcgt gtaaggaggt ttgtctgccg atacgacgaa 1860 tactgtatag tcacttctga tgaagtggtc catattgaaa tgtaaagtcg gcactgaaca 1920 ggcaaaagat tgagttgaaa ctgcctaaga tctcgggccc tcgggccttc ggcctttggg 1980 tgtacatgtt tgtgctccgg gcaaatgcaa agtgtggtag gatcgaacac actgctgcct 2040 ttaccaagca gctgagggta tgtgataggc aaatgttcag gggccactgc atggtttcga 2100 atagaaagag aagcttagcc aagaacaata gccgataaag atagcctcat taaacggaat 2160 gagctagtag gcaaagtcag cgaatgtgta tatataaagg ttcgaggtcc gtgcctccct 2220 catgctctcc ccatctactc atcaactcag atcctccagg agacttgtac accatctttt 2280 gaggcacaga aacccaatag tcaaccatca caagtttgta caaaaaagca ggctatgcgt 2340 accgccaagt tcgccaccct cgccgccctt gtggcctcgg ccgccgccca gcaggcgtgc 2400 agtctcacca ccgagaggca cccttccctc tcttggaaga agtgcaccgc cggcggccag 2460 tgccagaccg tccaggcttc catcactctc gactccaact ggcgctggac tcaccaggtg 2520 tctggctcca ccaactgcta cacgggcaac aagtgggata ctagcatctg cactgatgcc 2580 aagtcgtgcg ctcagaactg ctgcgtcgat ggtgccgact acaccagcac ctatggcatc 2640 accaccaacg gtgattccct gagcctcaag ttcgtcacca agggccagca ctcgaccaac 2700 gtcggctcgc gtacctacct gatggacggc gaggacaagt atcagagtac gttctatctt 2760 cagccttctc gcgccttgaa tcctggctaa cgtttacact tcacagcctt cgagctcctc 2820 ggcaacgagt 
tcaccttcga tgtcgatgtc tccaacatcg gctgcggtct caacggcgcc 2880 ctgtacttcg tctccatgga cgccgatggt ggtctcagcc gctatcctgg caacaaggct 2940 ggtgccaagt acggtaccgg ctactgcgat gctcagtgcc cccgtgacat caagttcatc 3000 aacggcgagg ccaacattga gggctggacc ggctccacca acgaccccaa cgccggcgcg 3060

ggccgctatg gtacctgctg ctctgagatg gatatctggg aagccaacaa catggctact 3120 gccttcactc ctcacccttg caccatcatt ggccagagcc gctgcgaggg cgactcgtgc 3180 ggtggcacct acagcaacga gcgctacgcc ggcgtctgcg accccgatgg ctgcgacttc 3240 aactcgtacc gccagggcaa caagaccttc tacggcaagg gcatgaccgt cgacaccacc 3300 aagaagatca ctgtcgtcac ccagttcctc aaggatgcca acggcgatct cggcgagatc 3360 aagcgcttct acgtccagga tggcaagatc atccccaact ccgagtccac catccccggc 3420 gtcgagggca attccatcac ccaggactgg tgcgaccgcc agaaggttgc ctttggcgac 3480 attgacgact tcaaccgcaa gggcggcatg aagcagatgg gcaaggccct cgccggcccc 3540 atggtcctgg tcatgtccat ctgggatgac cacgcctcca acatgctctg gctcgactcg 3600 accttccctg tcgatgccgc tggcaagccc ggcgccgagc gcggtgcctg cccgaccacc 3660 tcgggtgtcc ctgctgaggt tgaggccgag gcccccaaca gcaacgtcgt cttctccaac 3720 atccgcttcg gccccatcgg ctcgaccgtt gctggtctcc ccggcgcggg caacggcggc 3780 aacaacggcg gcaacccccc gccccccacc accaccacct cctcggctcc ggccaccacc 3840 accaccgcca gcgctggccc caaggctggc cgctggcagc agtgcggcgg catcggcttc 3900 actggcccga cccagtgcga ggagccctac acttgcacca agctcaacga ctggtactct 3960 cagtgcctgt aaacccagct ttcttgtaca aagtggtgat cgcgccagct ccgtgcgaaa 4020 gcctgacgca ccggtagatt cttggtgagc ccgtatcatg acggcggcgg gagctacatg 4080 gccccgggtg atttattttt tttgtatcta cttctgaccc ttttcaaata tacggtcaac 4140 tcatctttca ctggagatgc ggcctgcttg gtattgcgat gttgtcagct tggcaaattg 4200 tggctttcga aaacacaaaa cgattcctta gtagccatgc attttaagat aacggaatag 4260 aagaaagagg aaattaaaaa aaaaaaaaaa acaaacatcc cgttcataac ccgtagaatc 4320 gccgctcttc gtgtatccca gtaccagttt attttgaata gctcgcccgc tggagagcat 4380 cctgaatgca agtaacaacc gtagaggctg acacggcagg tgttgctagg gagcgtcgtg 4440 ttctacaagg ccagacgtct tcgcggttga tatatatgta tgtttgactg caggctgctc 4500 agcgacgaca gtcaagttcg ccctcgctgc ttgtgcaata atcgcagtgg ggaagccaca 4560 ccgtgactcc catctttcag taaagctctg ttggtgttta tcagcaatac acgtaattta 4620 aactcgttag catggggctg atagcttaat taccgtttac cagtgccatg gttctgcagc 4680 tttccttggc ccgtaaaatt cggcgaagcc agccaatcac cagctaggca ccagctaaac 4740 cctataatta 
gtctcttatc aacaccatcc gctcccccgg gatcaatgag gagaatgagg 4800 gggatgcggg gctaaagaag cctacataac cctcatgcca actcccagtt tacactcgtc 4860 gagccaacat cctgactata agctaacaca gaatgcctca atcctgggaa gaactggccg 4920

ctgataagcg cgcccgcctc gcaaaaacca tccctgatga atggaaagtc cagacgctgc 4980 ctgcggaaga cagcgttatt gatttcccaa agaaatcggg gatcctttca gaggccgaac 5040 tgaagatcac agaggcctcc gctgcagatc ttgtgtccaa gctggcggcc ggagagttga 5100 cctcggtgga agttacgcta gcattctgta aacgggcagc aatcgcccag cagttagtag 5160 ggtcccctct acctctcagg gagatgtaac aacgccacct tatgggacta tcaagctgac 5220 gctggcttct gtgcagacaa actgcgccca cgagttcttc cctgacgccg ctctcgcgca 5280 ggcaagggaa ctcgatgaat actacgcaaa gcacaagaga cccgttggtc cactccatgg 5340 cctccccatc tctctcaaag accagcttcg agtcaaggta caccgttgcc cctaagtcgt 5400 tagatgtccc tttttgtcag ctaacatatg ccaccagggc tacgaaacat caatgggcta 5460 catctcatgg ctaaacaagt acgacgaagg ggactcggtt ctgacaacca tgctccgcaa 5520 agccggtgcc gtcttctacg tcaagacctc tgtcccgcag accctgatgg tctgcgagac 5580 agtcaacaac atcatcgggc gcaccgtcaa cccacgcaac aagaactggt cgtgcggcgg 5640 cagttctggt ggtgagggtg cgatcgttgg gattcgtggt ggcgtcatcg gtgtaggaac 5700 ggatatcggt ggctcgattc gagtgccggc cgcgttcaac ttcctgtacg gtctaaggcc 5760 gagtcatggg cggctgccgt atgcaaagat ggcgaacagc atggagggtc aggagacggt 5820 gcacagcgtt gtcgggccga ttacgcactc tgttgagggt gagtccttcg cctcttcctt 5880 cttttcctgc tctataccag gcctccactg tcctcctttc ttgcttttta tactatatac 5940 gagaccggca gtcactgatg aagtatgtta gacctccgcc tcttcaccaa atccgtcctc 6000 ggtcaggagc catggaaata cgactccaag gtcatcccca tgccctggcg ccagtccgag 6060 tcggacatta ttgcctccaa gatcaagaac ggcgggctca atatcggcta ctacaacttc 6120 gacggcaatg tccttccaca ccctcctatc ctgcgcggcg tggaaaccac cgtcgccgca 6180 ctcgccaaag ccggtcacac cgtgaccccg tggacgccat acaagcacga tttcggccac 6240 gatctcatct cccatatcta cgcggctgac ggcagcgccg acgtaatgcg cgatatcagt 6300 gcatccggcg agccggcgat tccaaatatc aaagacctac tgaacccgaa catcaaagct 6360 gttaacatga acgagctctg ggacacgcat ctccagaagt ggaattacca gatggagtac 6420 cttgagaaat ggcgggaggc tgaagaaaag gccgggaagg aactggacgc catcatcgcg 6480 ccgattacgc ctaccgctgc ggtacggcat gaccagttcc ggtactatgg gtatgcctct 6540 gtgatcaacc tgctggattt cacgagcgtg gttgttccgg ttacctttgc ggataagaac 6600 atcgataaga 
agaatgagag tttcaaggcg gttagtgagc ttgatgccct cgtgcaggaa 6660 gagtatgatc cggaggcgta ccatggggca ccggttgcag tgcaggttat cggacggaga 6720 ctcagtgaag agaggacgtt ggcgattgca gaggaagtgg ggaagttgct gggaaatgtg 6780 gtgactccat agctaataag tgtcagatag caatttgcac aagaaatcaa taccagcaac 6840

tgtaaataag cgctgaagtg accatgccat gctacgaaag agcagaaaaa aacctgccgt 6900 agaaccgaag agatatgaca cgcttccatc tctcaaagga agaatccctt cagggttgcg 6960 tttccagtct agacacgtat aacggcacaa gtgtctctca ccaaatgggt tatatctcaa 7020 atgtgatcta aggatggaaa gcccagaata tcgatcgcgc gcagatccat atatagggcc 7080 cgggttataa ttacctcagg tcgacgtccc atggccattc gaattcgtaa tcatggtcat 7140 agctgtttcc tgtgtgaaat tgttatccgc tcacaattcc acacaacata cgagccggaa 7200 gcataaagtg taaagcctgg ggtgcctaat gagtgagcta actcacatta attgcgttgc 7260 gctcactgcc cgctttccag tcgggaaacc tgtcgtgcca gctgcattaa tgaatcggcc 7320 aacgcgcggg gagaggcggt ttgcgtattg ggcgctcttc cgcttcctcg ctcactgact 7380 cgctgcgctc ggtcgttcgg ctgcggcgag cggtatcagc tcactcaaag gcggtaatac 7440 ggttatccac agaatcaggg gataacgcag gaaagaacat gtgagcaaaa ggccagcaaa 7500 aggccaggaa ccgtaaaaag gccgcgttgc tggcgttttt ccataggctc cgcccccctg 7560 acgagcatca caaaaatcga cgctcaagtc agaggtggcg aaacccgaca ggactataaa 7620 gataccaggc gtttccccct ggaagctccc tcgtgcgctc tcctgttccg accctgccgc 7680 ttaccggata cctgtccgcc tttctccctt cgggaagcgt ggcgctttct catagctcac 7740 gctgtaggta tctcagttcg gtgtaggtcg ttcgctccaa gctgggctgt gtgcacgaac 7800 cccccgttca gcccgaccgc tgcgccttat ccggtaacta tcgtcttgag tccaacccgg 7860 taagacacga cttatcgcca ctggcagcag ccactggtaa caggattagc agagcgaggt 7920 atgtaggcgg tgctacagag ttcttgaagt ggtggcctaa ctacggctac actagaagaa 7980 cagtatttgg tatctgcgct ctgctgaagc cagttacctt cggaaaaaga gttggtagct 8040 cttgatccgg caaacaaacc accgctggta gcggtggttt ttttgtttgc aagcagcaga 8100 ttacgcgcag aaaaaaagga tctcaagaag atcctttgat cttttctacg gggtctgacg 8160 ctcagtggaa cgaaaactca cgttaaggga ttttggtcat gagattatca aaaaggatct 8220 tcacctagat ccttttaaat taaaaatgaa gttttaaatc aatctaaagt atatatgagt 8280 aaacttggtc tgacagttac caatgcttaa tcagtgaggc acctatctca gcgatctgtc 8340 tatttcgttc atccatagtt gcctgactcc ccgtcgtgta gataactacg atacgggagg 8400 gcttaccatc tggccccagt gctgcaatga taccgcgaga cccacgctca ccggctccag 8460 atttatcagc aataaaccag ccagccggaa gggccgagcg cagaagtggt cctgcaactt 8520 tatccgcctc 
catccagtct attaattgtt gccgggaagc tagagtaagt agttcgccag 8580 ttaatagttt gcgcaacgtt gttgccattg ctacaggcat cgtggtgtca cgctcgtcgt 8640 ttggtatggc ttcattcagc tccggttccc aacgatcaag gcgagttaca tgatccccca 8700

tgttgtgcaa aaaagcggtt agctccttcg gtcctccgat cgttgtcaga agtaagttgg 8760 ccgcagtgtt atcactcatg gttatggcag cactgcataa ttctcttact gtcatgccat 8820 ccgtaagatg cttttctgtg actggtgagt actcaaccaa gtcattctga gaatagtgta 8880 tgcggcgacc gagttgctct tgcccggcgt caatacggga taataccgcg ccacatagca 8940 gaactttaaa agtgctcatc attggaaaac gttcttcggg gcgaaaactc tcaaggatct 9000 taccgctgtt gagatccagt tcgatgtaac ccactcgtgc acccaactga tcttcagcat 9060 cttttacttt caccagcgtt tctgggtgag caaaaacagg aaggcaaaat gccgcaaaaa 9120 agggaataag ggcgacacgg aaatgttgaa tactcatact cttccttttt caatattatt 9180 gaagcattta tcagggttat tgtctcatga gcggatacat atttgaatgt atttagaaaa 9240 ataaacaaat aggggttccg cgcacatttc cccgaaaagt gccacctgac gtctaagaaa 9300 ccattattat catgacatta acctataaaa ataggcgtat cacgaggccc tttcgtctcg 9360 cgcgtttcgg tgatgacggt gaaaacctct gacacatgca gctcccggag acggtcacag 9420 cttgtctgta agcggatgcc gggagcagac aagcccgtca gggcgcgtca gcgggtgttg 9480 gcgggtgtcg gggctggctt aactatgcgg catcagagca gattgtactg agagtgcacc 9540 ataaaattgt aaacgttaat attttgttaa aattcgcgtt aaatttttgt taaatcagct 9600 cattttttaa ccaataggcc gaaatcggca aaatccctta taaatcaaaa gaatagcccg 9660 agatagggtt gagtgttgtt ccagtttgga acaagagtcc actattaaag aacgtggact 9720 ccaacgtcaa agggcgaaaa accgtctatc agggcgatgg cccactacgt gaaccatcac 9780 ccaaatcaag ttttttgggg tcgaggtgcc gtaaagcact aaatcggaac cctaaaggga 9840 gcccccgatt tagagcttga cggggaaagc cggcgaacgt ggcgagaaag gaagggaaga 9900 aagcgaaagg agcgggcgct agggcgctgg caagtgtagc ggtcacgctg cgcgtaacca 9960 ccacacccgc cgcgcttaat gcgccgctac agggcgcgta ctatggttgc tttgacgtat 10020 gcggtgtgaa ataccgcaca gatgcgtaag gagaaaatac cgcatcaggc gccattcgcc 10080 attcaggctg cgcaactgtt gggaagggcg atcggtgcgg gcctcttcgc tattacgcca 10140 gctggcgaaa gggggatgtg ctgcaaggcg attaagttgg gtaacgccag ggttttccca 10200 gtcacgacgt tgtaaaacga cggccagtgc cc 10232

<210> 8 <211> 1545 <212> DNA <213> Artificial

<220> <223> engineered sequence based on T. reesei

<400> 8 atgtatcgga agttggccgt catctcggcc ttcttggcca cagctcgtgc tcagtcggcc 60

tgcactcttc aaccggagac tcacccgcct ctgacatggc agaaatgctc gtctggtggc 120 acgtgcactc aacagacagg ctccgtggtc atcgacgcca actggcgctg gattcacgct 180 acgaacagca gcacgagctg ctacgatggc aacacttgga gctcgaccct atgtcctgac 240 aacgagacct gcacgaagaa ctgctgtctg gacggtgccg cctacgcgtc cacgtacgga 300 gttaccacga gcggtgacag cctcaccatt ggctttgtca cccagtctgc gcagaagaac 360 gttggcgctc gcctttacct tatggcgaac gacacgacct accaggaatt caccctgctt 420 ggcaacgagt tctctttcga tgttgatgtt tcgcagctgc cgtgcggctt gaacggagct 480 ctctacttcg tgtccatgga cgcggatggt ggcgtgagca agtatcccac caacaccgct 540 ggcgccaagt acggcacggg gtactgtgac agccagtgtc cccgcgatct gaagttcatc 600 aatggccagg ccaacgttga gggctgggag ccgtcaacca acaacgcgaa cacgggcatt 660 ggaggacacg gaagctgctg ctctgagatg gatatctggg aggccaactc tatctccgag 720 gctcttaccc tccacccttg cacgactgtc ggccaggaga tctgcgaggg tgatgggtgc 780 ggcggaactt actccaagaa cagatatggc ggcccttgcg atcccgatgg ctgcgactgg 840 aacccatacc gcctgggcaa caccagcttc tacggccctg gcccaagctt taccctcgat 900 accaccaaga aattgaccgt tgtcacccag ttcaagccgt cgggtgccat caaccgatac 960 tatgtccaga atggcgtcac tttccagcag cccaacgccg agcttggtag ttactctggc 1020 aacgagctca acgatgatta ctgctacgct gaggaggcag aattcggcgg atcctctttc 1080 tcagacaagg gcggcctgac tcagttcaag aaggctacct ctggcggcat ggttctggtc 1140 atgagtctgt gggatgatta ctacgccaac atgctgtggc tggactccac ctacccgaca 1200 aacgagacct cctccacacc cggtgccgtg cgcggaagct gctccaccag ctccggtgac 1260 cctgctcagg tcgaatctca gtttcccaac gccaaggtca ccttctccaa catcaagttc 1320 ggacccattg gcagcaccgg caaccctagc ggcggcaacc ctcccggcgg aaacccgcct 1380 ggcaccacca ccacccgccg cccagccact accactggaa gctctcccgg acctacccag 1440 tctcactacg gccagtgcgg cggtattggc tacagcggcc ccacggtctg cgccagcggc 1500 acaacttgcc aggtcctgaa cccttactac tctcagtgcc tgtaa 1545

<210> 9

<211> 514 <212> PRT

<213> Artificial

<220>

<223> engineered sequence based on T. reesei

<400> 9

Met Tyr Arg Lys Leu Ala Val Ile Ser Ala Phe Leu Ala Thr Ala Arg

1 5 10 15

Ala Gln Ser Ala Cys Thr Leu Gln Pro Glu Thr His Pro Pro Leu Thr 20 25 30

Trp Gln Lys Cys Ser Ser Gly Gly Thr Cys Thr Gln Gln Thr Gly Ser 35 40 45

Val Val Ile Asp Ala Asn Trp Arg Trp Ile His Ala Thr Asn Ser Ser 50 55 60

Thr Ser Cys Tyr Asp Gly Asn Thr Trp Ser Ser Thr Leu Cys Pro Asp 65 70 75 80

Asn Glu Thr Cys Thr Lys Asn Cys Cys Leu Asp Gly Ala Ala Tyr Ala

85 90 95

Ser Thr Tyr Gly Val Thr Thr Ser Gly Asp Ser Leu Thr Ile Gly Phe 100 105 110

Val Thr Gln Ser Ala Gln Lys Asn Val Gly Ala Arg Leu Tyr Leu Met 115 120 125

Ala Asn Asp Thr Thr Tyr Gln Glu Phe Thr Leu Leu Gly Asn Glu Phe 130 135 140

Ser Phe Asp Val Asp Val Ser Gln Leu Pro Cys Gly Leu Asn Gly Ala 145 150 155 160

Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Val Ser Lys Tyr Pro

165 170 175

Thr Asn Thr Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser Gln 180 185 190

Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Ala Asn Val Glu Gly

195 200 205

Trp Glu Pro Ser Thr Asn Asn Ala Asn Thr Gly Ile Gly Gly His Gly 210 215 220

Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Ser Ile Ser Glu 225 230 235 240

Ala Leu Thr Leu His Pro Cys Thr Thr Val Gly Gln Glu Ile Cys Glu

245 250 255

Gly Asp Gly Cys Gly Gly Thr Tyr Ser Lys Asn Arg Tyr Gly Gly Pro 260 265 270

Cys Asp Pro Asp Gly Cys Asp Trp Asn Pro Tyr Arg Leu Gly Asn Thr 275 280 285

Ser Phe Tyr Gly Pro Gly Pro Ser Phe Thr Leu Asp Thr Thr Lys Lys 290 295 300

Leu Thr Val Val Thr Gln Phe Lys Pro Ser Gly Ala Ile Asn Arg Tyr 305 310 315 320

Tyr Val Gln Asn Gly Val Thr Phe Gln Gln Pro Asn Ala Glu Leu Gly 325 330 335

Ser Tyr Ser Gly Asn Glu Leu Asn Asp Asp Tyr Cys Tyr Ala Glu Glu 340 345 350

Ala Glu Phe Gly Gly Ser Ser Phe Ser Asp Lys Gly Gly Leu Thr Gln 355 360 365

Phe Lys Lys Ala Thr Ser Gly Gly Met Val Leu Val Met Ser Leu Trp 370 375 380

Asp Asp Tyr Tyr Ala Asn Met Leu Trp Leu Asp Ser Thr Tyr Pro Thr 385 390 395 400

Asn Glu Thr Ser Ser Thr Pro Gly Ala Val Arg Gly Ser Cys Ser Thr 405 410 415

Ser Ser Gly Asp Pro Ala Gln Val Glu Ser Gln Phe Pro Asn Ala Lys 420 425 430

Val Thr Phe Ser Asn Ile Lys Phe Gly Pro Ile Gly Ser Thr Gly Asn 435 440 445

Pro Ser Gly Gly Asn Pro Pro Gly Gly Asn Pro Pro Gly Thr Thr Thr 450 455 460

Thr Arg Arg Pro Ala Thr Thr Thr Gly Ser Ser Pro Gly Pro Thr Gln 465 470 475 480

Ser His Tyr Gly Gln Cys Gly Gly Ile Gly Tyr Ser Gly Pro Thr Val 485 490 495

Cys Ala Ser Gly Thr Thr Cys Gln Val Leu Asn Pro Tyr Tyr Ser Gln 500 505 510

Cys Leu

<210> 10

<211> 10145

<212> DNA

<213> Artificial

<220>

<223> pTrex3g-cbhl expression vector

<400> 10 aagcttacta gtacttctcg agctctgtac atgtccggtc gcgacgtacg cgtatcgatg 60 gcgccagctg caggcggccg cctgcagcca cttgcagtcc cgtggaattc tcacggtgaa 120 tgtaggcctt ttgtagggta ggaattgtca ctcaagcacc cccaacctcc attacgcctc 180 ccccatagag ttcccaatca gtgagtcatg gcactgttct caaatagatt ggggagaagt 240 tgacttccgc ccagagctga aggtcgcaca accgcatgat atagggtcgg caacggcaaa 300 aaagcacgtg gctcaccgaa aagcaagatg tttgcgatct aacatccagg aacctggata 360 catccatcat cacgcacgac cactttgatc tgctggtaaa ctcgtattcg ccctaaaccg 420 aagtgcgtgg taaatctaca cgtgggcccc tttcggtata ctgcgtgtgt cttctctagg 480 tgccattctt ttcccttcct ctagtgttga attgtttgtg ttggagtccg agctgtaact 540 acctctgaat ctctggagaa tggtggacta acgactaccg tgcacctgca tcatgtatat 600 aatagtgatc ctgagaaggg gggtttggag caatgtggga ctttgatggt catcaaacaa 660 agaacgaaga cgcctctttt gcaaagtttt gtttcggcta cggtgaagaa ctggatactt 720 gttgtgtctt ctgtgtattt ttgtggcaac aagaggccag agacaatcta ttcaaacacc 780 aagcttgctc ttttgagcta caagaacctg tggggtatat atctagagtt gtgaagtcgg 840 taatcccgct gtatagtaat acgagtcgca tctaaatact ccgaagctgc tgcgaacccg 900 gagaatcgag atgtgctgga aagcttctag cgagcggcta aattagcatg aaaggctatg 960 agaaattctg gagacggctt gttgaatcat ggcgttccat tcttcgacaa gcaaagcgtt 1020 ccgtcgcagt agcaggcact cattcccgaa aaaactcgga gattcctaag tagcgatgga 1080 accggaataa tataataggc aatacattga gttgcctcga cggttgcaat gcaggggtac 1140 tgagcttgga cataactgtt ccgtacccca cctcttctca acctttggcg tttccctgat 1200 tcagcgtacc cgtacaagtc gtaatcacta ttaacccaga ctgaccggac gtgttttgcc 1260 cttcatttgg agaaataatg tcattgcgat gtgtaatttg cctgcttgac cgactggggc 1320 tgttcgaagc ccgaatgtag gattgttatc cgaactctgc tcgtagaggc atgttgtgaa 1380 tctgtgtcgg gcaggacacg cctcgaaggt tcacggcaag ggaaaccacc gatagcagtg 1440 tctagtagca acctgtaaag ccgcaatgca gcatcactgg aaaatacaaa ccaatggcta 1500

aaagtacata agttaatgcc taaagaagtc atataccagc ggctaataat tgtacaatca 1560 agtggctaaa cgtaccgtaa tttgccaacg gcttgtgggg ttgcagaagc aacggcaaag 1620 ccccacttcc ccacgtttgt ttcttcactc agtccaatct cagctggtga tcccccaatt 1680 gggtcgcttg tttgttccgg tgaagtgaaa gaagacagag gtaagaatgt ctgactcgga 1740 gcgttttgca tacaaccaag ggcagtgatg gaagacagtg aaatgttgac attcaaggag 1800 tatttagcca gggatgcttg agtgtatcgt gtaaggaggt ttgtctgccg atacgacgaa 1860 tactgtatag tcacttctga tgaagtggtc catattgaaa tgtaaagtcg gcactgaaca 1920 ggcaaaagat tgagttgaaa ctgcctaaga tctcgggccc tcgggccttc ggcctttggg 1980 tgtacatgtt tgtgctccgg gcaaatgcaa agtgtggtag gatcgaacac actgctgcct 2040 ttaccaagca gctgagggta tgtgataggc aaatgttcag gggccactgc atggtttcga 2100 atagaaagag aagcttagcc aagaacaata gccgataaag atagcctcat taaacggaat 2160 gagctagtag gcaaagtcag cgaatgtgta tatataaagg ttcgaggtcc gtgcctccct 2220 catgctctcc ccatctactc atcaactcag atcctccagg agacttgtac accatctttt 2280 gaggcacaga aacccaatag tcaaccatca caagtttgta caaaaaacag gctatgtatc 2340 ggaagttggc cgtcatctcg gccttcttgg ccacagctcg tgctcagtcg gcctgcactc 2400 ttcaaccgga gactcacccg cctctgacat ggcagaaatg ctcgtctggt ggcacgtgca 2460 ctcaacagac aggctccgtg gtcatcgacg ccaactggcg ctggattcac gctacgaaca 2520 gcagcacgag ctgctacgat ggcaacactt ggagctcgac cctatgtcct gacaacgaga 2580 cctgcacgaa gaactgctgt ctggacggtg ccgcctacgc gtccacgtac ggagttacca 2640 cgagcggtga cagcctcacc attggctttg tcacccagtc tgcgcagaag aacgttggcg 2700 ctcgccttta ccttatggcg aacgacacga cctaccagga attcaccctg cttggcaacg 2760 agttctcttt cgatgttgat gtttcgcagc tgccgtgcgg cttgaacgga gctctctact 2820 tcgtgtccat ggacgcggat ggtggcgtga gcaagtatcc caccaacacc gctggcgcca 2880 agtacggcac ggggtactgt gacagccagt gtccccgcga tctgaagttc atcaatggcc 2940 aggccaacgt tgagggctgg gagccgtcaa ccaacaacgc gaacacgggc attggaggac 3000 acggaagctg ctgctctgag atggatatct gggaggccaa ctctatctcc gaggctctta 3060 ccctccaccc ttgcacgact gtcggccagg agatctgcga gggtgatggg tgcggcggaa 3120 cttactccaa gaacagatat ggcggccctt gcgatcccga tggctgcgac tggaacccat 3180 accgcctggg 
caacaccagc ttctacggcc ctggcccaag ctttaccctc gataccacca 3240 agaaattgac cgttgtcacc cagttcaagc cgtcgggtgc catcaaccga tactatgtcc 3300 agaatggcgt cactttccag cagcccaacg ccgagcttgg tagttactct ggcaacgagc 3360

tcaacgatga ttactgctac gctgaggagg cagaattcgg cggatcctct ttctcagaca 3420 agggcggcct gactcagttc aagaaggcta cctctggcgg catggttctg gtcatgagtc 3480 tgtgggatga ttactacgcc aacatgctgt ggctggactc cacctacccg acaaacgaga 3540 cctcctccac acccggtgcc gtgcgcggaa gctgctccac cagctccggt gaccctgctc 3600 aggtcgaatc tcagtttccc aacgccaagg tcaccttctc caacatcaag ttcggaccca 3660 ttggcagcac cggcaaccct agcggcggca accctcccgg cggaaacccg cctggcacca 3720 ccaccacccg ccgcccagcc actaccactg gaagctctcc cggacctacc cagtctcact 3780 acggccagtg cggcggtatt ggctacagcg gccccacggt ctgcgccagc ggcacaactt 3840 gccaggtcct gaacccttac tactctcagt gcctgtaaac ccagctttct tgtacaaagt 3900 ggtgatcgcg ccgcgcgcca gctccgtgcg aaagcctgac gcaccggtag attcttggtg 3960 agcccgtatc atgacggcgg cgggagctac atggccccgg gtgatttatt ttttttgtat 4020 ctacttctga cccttttcaa atatacggtc aactcatctt tcactggaga tgcggcctgc 4080 ttggtattgc gatgttgtca gcttggcaaa ttgtggcttt cgaaaacaca aaacgattcc 4140 ttagtagcca tgcattttaa gataacggaa tagaagaaag aggaaattaa aaaaaaaaaa 4200 aaaacaaaca tcccgttcat aacccgtaga atcgccgctc ttcgtgtatc ccagtaccag 4260 tttattttga atagctcgcc cgctggagag catcctgaat gcaagtaaca accgtagagg 4320 ctgacacggc aggtgttgct agggagcgtc gtgttctaca aggccagacg tcttcgcggt 4380 tgatatatat gtatgtttga ctgcaggctg ctcagcgacg acagtcaagt tcgccctcgc 4440 tgcttgtgca ataatcgcag tggggaagcc acaccgtgac tcccatcttt cagtaaagct 4500 ctgttggtgt ttatcagcaa tacacgtaat ttaaactcgt tagcatgggg ctgatagctt 4560 aattaccgtt taccagtgcc atggttctgc agctttcctt ggcccgtaaa attcggcgaa 4620 gccagccaat caccagctag gcaccagcta aaccctataa ttagtctctt atcaacacca 4680 tccgctcccc cgggatcaat gaggagaatg agggggatgc ggggctaaag aagcctacat 4740 aaccctcatg ccaactccca gtttacactc gtcgagccaa catcctgact ataagctaac 4800 acagaatgcc tcaatcctgg gaagaactgg ccgctgataa gcgcgcccgc ctcgcaaaaa 4860 ccatccctga tgaatggaaa gtccagacgc tgcctgcgga agacagcgtt attgatttcc 4920 caaagaaatc ggggatcctt tcagaggccg aactgaagat cacagaggcc tccgctgcag 4980 atcttgtgtc caagctggcg gccggagagt tgacctcggt ggaagttacg ctagcattct 5040 gtaaacgggc 
agcaatcgcc cagcagttag tagggtcccc tctacctctc agggagatgt 5100 aacaacgcca ccttatggga ctatcaagct gacgctggct tctgtgcaga caaactgcgc 5160 ccacgagttc ttccctgacg ccgctctcgc gcaggcaagg gaactcgatg aatactacgc 5220 aaagcacaag agacccgttg gtccactcca tggcctcccc atctctctca aagaccagct 5280

tcgagtcaag gtacaccgtt gcccctaagt cgttagatgt ccctttttgt cagctaacat 5340 atgccaccag ggctacgaaa catcaatggg ctacatctca tggctaaaca agtacgacga 5400 aggggactcg gttctgacaa ccatgctccg caaagccggt gccgtcttct acgtcaagac 5460 ctctgtcccg cagaccctga tggtctgcga gacagtcaac aacatcatcg ggcgcaccgt 5520 caacccacgc aacaagaact ggtcgtgcgg cggcagttct ggtggtgagg gtgcgatcgt 5580 tgggattcgt ggtggcgtca tcggtgtagg aacggatatc ggtggctcga ttcgagtgcc 5640 ggccgcgttc aacttcctgt acggtctaag gccgagtcat gggcggctgc cgtatgcaaa 5700 gatggcgaac agcatggagg gtcaggagac ggtgcacagc gttgtcgggc cgattacgca 5760 ctctgttgag ggtgagtcct tcgcctcttc cttcttttcc tgctctatac caggcctcca 5820 ctgtcctcct ttcttgcttt ttatactata tacgagaccg gcagtcactg atgaagtatg 5880 ttagacctcc gcctcttcac caaatccgtc ctcggtcagg agccatggaa atacgactcc 5940 aaggtcatcc ccatgccctg gcgccagtcc gagtcggaca ttattgcctc caagatcaag 6000 aacggcgggc tcaatatcgg ctactacaac ttcgacggca atgtccttcc acaccctcct 6060 atcctgcgcg gcgtggaaac caccgtcgcc gcactcgcca aagccggtca caccgtgacc 6120 ccgtggacgc catacaagca cgatttcggc cacgatctca tctcccatat ctacgcggct 6180 gacggcagcg ccgacgtaat gcgcgatatc agtgcatccg gcgagccggc gattccaaat 6240 atcaaagacc tactgaaccc gaacatcaaa gctgttaaca tgaacgagct ctgggacacg 6300 catctccaga agtggaatta ccagatggag taccttgaga aatggcggga ggctgaagaa 6360 aaggccggga aggaactgga cgccatcatc gcgccgatta cgcctaccgc tgcggtacgg 6420 catgaccagt tccggtacta tgggtatgcc tctgtgatca acctgctgga tttcacgagc 6480 gtggttgttc cggttacctt tgcggataag aacatcgata agaagaatga gagtttcaag 6540 gcggttagtg agcttgatgc cctcgtgcag gaagagtatg atccggaggc gtaccatggg 6600 gcaccggttg cagtgcaggt tatcggacgg agactcagtg aagagaggac gttggcgatt 6660 gcagaggaag tggggaagtt gctgggaaat gtggtgactc catagctaat aagtgtcaga 6720 tagcaatttg cacaagaaat caataccagc aactgtaaat aagcgctgaa gtgaccatgc 6780 catgctacga aagagcagaa aaaaacctgc cgtagaaccg aagagatatg acacgcttcc 6840 atctctcaaa ggaagaatcc cttcagggtt gcgtttccag tctagacacg tataacggca 6900 caagtgtctc tcaccaaatg ggttatatct caaatgtgat ctaaggatgg aaagcccaga 6960 atatcgatcg 
cgcgcagatc catatatagg gcccgggtta taattacctc aggtcgacgt 7020 cccatggcca ttcgaattcg taatcatggt catagctgtt tcctgtgtga aattgttatc 7080 cgctcacaat tccacacaac atacgagccg gaagcataaa gtgtaaagcc tggggtgcct 7140

aatgagtgag ctaactcaca ttaattgcgt tgcgctcact gcccgctttc cagtcgggaa 7200 acctgtcgtg ccagctgcat taatgaatcg gccaacgcgc ggggagaggc ggtttgcgta 7260 ttgggcgctc ttccgcttcc tcgctcactg actcgctgcg ctcggtcgtt cggctgcggc 7320 gagcggtatc agctcactca aaggcggtaa tacggttatc cacagaatca ggggataacg 7380 caggaaagaa catgtgagca aaaggccagc aaaaggccag gaaccgtaaa aaggccgcgt 7440 tgctggcgtt tttccatagg ctccgccccc ctgacgagca tcacaaaaat cgacgctcaa 7500 gtcagaggtg gcgaaacccg acaggactat aaagatacca ggcgtttccc cctggaagct 7560 ccctcgtgcg ctctcctgtt ccgaccctgc cgcttaccgg atacctgtcc gcctttctcc 7620 cttcgggaag cgtggcgctt tctcatagct cacgctgtag gtatctcagt tcggtgtagg 7680 tcgttcgctc caagctgggc tgtgtgcacg aaccccccgt tcagcccgac cgctgcgcct 7740 tatccggtaa ctatcgtctt gagtccaacc cggtaagaca cgacttatcg ccactggcag 7800 cagccactgg taacaggatt agcagagcga ggtatgtagg cggtgctaca gagttcttga 7860 agtggtggcc taactacggc tacactagaa gaacagtatt tggtatctgc gctctgctga 7920 agccagttac cttcggaaaa agagttggta gctcttgatc cggcaaacaa accaccgctg 7980 gtagcggtgg tttttttgtt tgcaagcagc agattacgcg cagaaaaaaa ggatctcaag 8040 aagatccttt gatcttttct acggggtctg acgctcagtg gaacgaaaac tcacgttaag 8100 ggattttggt catgagatta tcaaaaagga tcttcaccta gatcctttta aattaaaaat 8160 gaagttttaa atcaatctaa agtatatatg agtaaacttg gtctgacagt taccaatgct 8220 taatcagtga ggcacctatc tcagcgatct gtctatttcg ttcatccata gttgcctgac 8280 tccccgtcgt gtagataact acgatacggg agggcttacc atctggcccc agtgctgcaa 8340 tgataccgcg agacccacgc tcaccggctc cagatttatc agcaataaac cagccagccg 8400 gaagggccga gcgcagaagt ggtcctgcaa ctttatccgc ctccatccag tctattaatt 8460 gttgccggga agctagagta agtagttcgc cagttaatag tttgcgcaac gttgttgcca 8520 ttgctacagg catcgtggtg tcacgctcgt cgtttggtat ggcttcattc agctccggtt 8580 cccaacgatc aaggcgagtt acatgatccc ccatgttgtg caaaaaagcg gttagctcct 8640 tcggtcctcc gatcgttgtc agaagtaagt tggccgcagt gttatcactc atggttatgg 8700 cagcactgca taattctctt actgtcatgc catccgtaag atgcttttct gtgactggtg 8760 agtactcaac caagtcattc tgagaatagt gtatgcggcg accgagttgc tcttgcccgg 8820 cgtcaatacg 
ggataatacc gcgccacata gcagaacttt aaaagtgctc atcattggaa 8880 aacgttcttc ggggcgaaaa ctctcaagga tcttaccgct gttgagatcc agttcgatgt 8940 aacccactcg tgcacccaac tgatcttcag catcttttac tttcaccagc gtttctgggt 9000 gagcaaaaac aggaaggcaa aatgccgcaa aaaagggaat aagggcgaca cggaaatgtt 9060

gaatactcat actcttcctt tttcaatatt attgaagcat ttatcagggt tattgtctca 9120 tgagcggata catatttgaa tgtatttaga aaaataaaca aataggggtt ccgcgcacat 9180 ttccccgaaa agtgccacct gacgtctaag aaaccattat tatcatgaca ttaacctata 9240 aaaataggcg tatcacgagg ccctttcgtc tcgcgcgttt cggtgatgac ggtgaaaacc 9300 tctgacacat gcagctcccg gagacggtca cagcttgtct gtaagcggat gccgggagca 9360 gacaagcccg tcagggcgcg tcagcgggtg ttggcgggtg tcggggctgg cttaactatg 9420 cggcatcaga gcagattgta ctgagagtgc accataaaat tgtaaacgtt aatattttgt 9480 taaaattcgc gttaaatttt tgttaaatca gctcattttt taaccaatag gccgaaatcg 9540 gcaaaatccc ttataaatca aaagaatagc ccgagatagg gttgagtgtt gttccagttt 9600 ggaacaagag tccactatta aagaacgtgg actccaacgt caaagggcga aaaaccgtct 9660 atcagggcga tggcccacta cgtgaaccat cacccaaatc aagttttttg gggtcgaggt 9720 gccgtaaagc actaaatcgg aaccctaaag ggagcccccg atttagagct tgacggggaa 9780 agccggcgaa cgtggcgaga aaggaaggga agaaagcgaa aggagcgggc gctagggcgc 9840 tggcaagtgt agcggtcacg ctgcgcgtaa ccaccacacc cgccgcgctt aatgcgccgc 9900 tacagggcgc gtactatggt tgctttgacg tatgcggtgt gaaataccgc acagatgcgt 9960 aaggagaaaa taccgcatca ggcgccattc gccattcagg ctgcgcaact gttgggaagg 10020 gcgatcggtg cgggcctctt cgctattacg ccagctggcg aaagggggat gtgctgcaag 10080 gcgattaagt tgggtaacgc cagggttttc ccagtcacga cgttgtaaaa cgacggccag 10140 tgccc 10145

<210> 11 <211> 1416 <212> DNA <213> Artificial

<220>

<223> engineered sequence based on T. reesei

<400> 11 atgattgtcg gcattctcac cacgctggct acgctggcca cactcgcagc tagtgtgcct 60 ctagaggagc ggcaagcttg ctcaagcgtc tggggccaat gtggtggcca gaattggtcg 120 ggtccgactt gctgtgcttc cggaagcaca tgcgtctact ccaacgacta ttactcccag 180 tgtcttcccg gcgctgcaag ctcaagctcg tccacgcgcg ccgcgtcgac gacttctcga 240 gtatccccca caacatcccg gtcgagctcc gcgacgcctc cacctggttc tactactacc 300 agagtacctc cagtcggatc gggaaccgct acgtattcag gcaacccttt tgttggggtc 360 actctttggg ccaatgcata ttacgcctct gaagttagca gcctcgctat tcctagcttg 420

actggagcca tggccactgc tgcagcagct gtcgcaaagg ttccctcttt tgtgtggcta 480 gatactcttg acaagacccc tctcatggag caaaccttgg ccgacatccg cgccgccaac 540 aagaatggcg gtaactatgc cggacagttt gtggtgtatg acttgccgga tcgcgattgc 600 gctgcccttg cctcgaatgg cgaatactct attgccgatg gtggcgtcgc caaatataag 660 aactatatcg acaccattcg tcaaattgtc gtggaatatt ccgatgtccg gaccctcctg 720 gttattgagc ctgactctct tgccaacctg gtgaccaacc tcggtactcc aaagtgtgcc 780 aatgctcagt cagcctacct tgagtgcatc aactacgccg tcacacagct gaaccttcca 840 aatgttgcga tgtatttgga cgctggccat gcaggatggc ttggctggcc ggcaaaccaa 900 gacccggccg ctcagctatt tgcaaatgtt tacaagaatg catcgtctcc gagagctctt 960 cgcggattgg caaccaatgt cgccaactac aacgggtgga acattaccag ccccccaccg 1020 tacacgcaag gcaacgctgt ctacaacgag aagctgtaca tccacgctat tggacctctt 1080 cttgccaatc acggctggtc caacgccttc ttcatcactg atcaaggtcg atcgggaaag 1140 cagcctaccg gacagcaaca gtggggagac tggtgcaatg tgatcggcac cggatttggt 1200 attcgcccat ccgcaaacac tggggactcg ttgctggatt cgtttgtctg ggtcaagcca 1260 ggcggcgagt gtgacggcac cagcgacagc agtgcgccac gatttgacta ccactgtgcg 1320 ctcccagatg ccttgcaacc ggcgcctcaa gctggtgctt ggttccaagc ctactttgtg 1380 cagcttctca caaacgcaaa cccatcgttc ctgtaa 1416

<210> 12

<211> 471

<212> PRT

<213> Artificial

<220>

<223> engineered sequence based on T. reesei

<400> 12

Met Ile Val Gly Ile Leu Thr Thr Leu Ala Thr Leu Ala Thr Leu Ala 1 5 10 15

Ala Ser Val Pro Leu Glu Glu Arg Gln Ala Cys Ser Ser Val Trp Gly 20 25 30

Gln Cys Gly Gly Gln Asn Trp Ser Gly Pro Thr Cys Cys Ala Ser Gly 35 40 45

Ser Thr Cys Val Tyr Ser Asn Asp Tyr Tyr Ser Gln Cys Leu Pro Gly 50 55 60

Ala Ala Ser Ser Ser Ser Ser Thr Arg Ala Ala Ser Thr Thr Ser Arg 65 70 75 80

Val Ser Pro Thr Thr Ser Arg Ser Ser Ser Ala Thr Pro Pro Pro Gly 85 90 95

Ser Thr Thr Thr Arg Val Pro Pro Val Gly Ser Gly Thr Ala Thr Tyr 100 105 110

Ser Gly Asn Pro Phe Val Gly Val Thr Pro Trp Ala Asn Ala Tyr Tyr 115 120 125

Ala Ser Glu Val Ser Ser Leu Ala Ile Pro Ser Leu Thr Gly Ala Met 130 135 140

Ala Thr Ala Ala Ala Ala Val Ala Lys Val Pro Ser Phe Met Trp Leu 145 150 155 160

Asp Thr Leu Asp Lys Thr Pro Leu Met Glu Gln Thr Leu Ala Asp Ile 165 170 175

Arg Thr Ala Asn Lys Asn Gly Gly Asn Tyr Ala Gly Gln Phe Val Val 180 185 190

Tyr Asp Leu Pro Asp Arg Asp Cys Ala Ala Leu Ala Ser Asn Gly Glu 195 200 205

Tyr Ser Ile Ala Asp Gly Gly Val Ala Lys Tyr Lys Asn Tyr Ile Asp 210 215 220

Thr Ile Arg Gln Ile Val Val Glu Tyr Ser Asp Ile Arg Thr Leu Leu 225 230 235 240

Val Ile Glu Pro Asp Ser Leu Ala Asn Leu Val Thr Asn Leu Gly Thr

245 250 255

Pro Lys Cys Ala Asn Ala Gln Ser Ala Tyr Leu Glu Cys Ile Asn Tyr 260 265 270

Ala Val Thr Gln Leu Asn Leu Pro Asn Val Ala Met Tyr Leu Asp Ala 275 280 285

Gly His Ala Gly Trp Leu Gly Trp Pro Ala Asn Gln Asp Pro Ala Ala 290 295 300

Gln Leu Phe Ala Asn Val Tyr Lys Asn Ala Ser Ser Pro Arg Ala Leu 305 310 315 320

Arg Gly Leu Ala Thr Asn Val Ala Asn Tyr Asn Gly Trp Asn Ile Thr

325 330 335

Ser Pro Pro Ser Tyr Thr Gln Gly Asn Ala Val Tyr Asn Glu Lys Leu 340 345 350

Tyr Ile His Ala Ile Gly Pro Leu Leu Ala Asn His Gly Trp Ser Asn 355 360 365

Ala Phe Phe Ile Thr Asp Gln Gly Arg Ser Gly Lys Gln Pro Thr Gly 370 375 380

Gln Gln Gln Trp Gly Asp Trp Cys Asn Val Ile Gly Thr Gly Phe Gly 385 390 395 400

Ile Arg Pro Ser Ala Asn Thr Gly Asp Ser Leu Leu Asp Ser Phe Val

405 410 415

Trp Val Lys Pro Gly Gly Glu Cys Asp Gly Thr Ser Asp Ser Ser Ala 420 425 430

Pro Arg Phe Asp Ser His Cys Ala Leu Pro Asp Ala Leu Gln Pro Ala 435 440 445

Pro Gln Ala Gly Ala Trp Phe Gln Ala Tyr Phe Val Gln Leu Leu Thr 450 455 460

Asn Ala Asn Pro Ser Phe Leu 465 470

<210> 13

<211> 14158

<212> DNA

<213> Artificial

<220>

<223> expression vector pExp-cbhll

<400> 13 gcggccgccg gggtagacga agtgacacgt atccgaaaca gcagtggtat tatggcagct 60 cagcggcatc aaacacgaac ctgagctggc catcgctgag ctgacaagag ccccgccgag 120 cagccatcgc tgaagcgcca tccttatgag caaggaaggg agttattttc gaggatggaa 180 atcttgagtg gatgtctgat ctaggttctt tattgcccag agctgtccct tttaatactc 240 tcgacatcta aaagtttttt ctcctcagcg gtctagcccg cattagcagc agtttcgtca 300 aagcttacgg ctgcatttgc acaccgaggt cgatgtgcca agagctgggg tgctgagagc 360 tggacaatga ttctccactt cagtgttgtt atcggtttcg agcttccact tgaagttagc 420 aggtgcgagt cgctatctct gtagttgagg acgggaccat ttgtactttg ttgtatgtag 480

cctctgcagt ggttggtcct gaataatctt tgaatactcc ggccggctgt gcatttccgt 540 tctctacagc gcagcatctg actagttgta tcgaaccatt agtccgtata gtatcgcatg 600 caattgctag tcaatggtag cagatcagtc gaaggcgtga agtcaatata cgattgcatt 660 gcccgccttg ttatgacaaa cgtaccgagg aagagaagac agtgtatgcc tctatgtatc 720 aaataaggag ccaggaacct cattacccgt atgctattat cgagtggcac tacatgatct 780 ccgaaaaatt taaaaaagaa ataaaaaatt gtcgttaggt ttttacagca agctctcttg 840 ggttatcgga ggctggctgg ttggagcttg tgcagtctct ttgctgatcg agaagattag 900 catgtttctt tcacaatgca aaagaagtat tgctaggaag gttcgaaaga acacttactc 960 ttctacacag tcatttcctg gagactaaca gagctttatg tagtatatat ggagacgtga 1020 agctactgcc gggtgcatgg cttgcccatc accgcacgag ttcgagcacg ttaatattcc 1080 aattacgact caaatcaata cctttgtcaa tgggagctcg tcttgacatt aacgcatcct 1140 ttcaagtaat gcaatgcagc aatggaggaa cttgtagaga ccgagggagg aatggcgaag 1200 ggcggccgga gcttggagtc ctggtggagg ctgaaagctt cgagtttcag cgtctcccag 1260 aagttaccca acccaagtgg ctacaacgac aataagtatt ctatacctag taatattgtt 1320 cgatgcttgt atggagtaga tgctggagtc tggtgtaata ttaatggctt agttcatact 1380 acatttgaca tttccagccc gagagcgcac cgaagccaca tgccgcatat tgacaaagtg 1440 ctagattgtg taaggagggc attctctata gaggaatcag cgtttgcata tacctactac 1500 gtcattgccc taatggacag taagctagcc agctgcatta tgataagagt aacgtgagat 1560 aggtaataag tcttacaaca ctttccctta tagccactaa actacaacat cgtcctgcag 1620 ttcctatatg atacgtataa cccattgata catccaagta tccagaggtg tatggaaata 1680 tcagatcaag acctctctct tctaagaaac ctagaaccag acgctggtag tataataagc 1740 acactgtgac tcgcttaggc ccttaagctt aggccggctt gcttactatt aacctctcat 1800 aaacgctact gcaatgattg gaaacttctt atagtagaat gaggcaataa gacgcatctc 1860 aggtcacata tagtcttatg tttgaaaccc ctcactactg ccatttatct tgtggaaata 1920 tctattattt cagtctatac gtaatgaagg cacttttcag gatctcttcc ctaagcttgt 1980 ataagcaggt ttgttgccgt aaccattctg tctcctcgcc taatacctgt gaagcacaga 2040 atacgtttat tctataagag acgtcttacc ttccatcgag attgaaagct taaaccgtct 2100 acaacggatg ccctcatcat gacccgtcta actcgaacat ctgccacatt agtctcgggt 2160 aacaggagga 
gtaacacgac cagtgtaaca cgttaagcat acaattgaac gagaatggtg 2220 aggactgaga taaaagaatt ctgttaagga tctaaaatta tagtgcatac aaggtagatg 2280 ttagtaggtg gtttcagttt tcctttcctt tacgttggta tagagcagcg ttcaccaaat 2340

gttagcagag ttctatctat gtcgtatcca ttctgcctta tatctctcaa gggcgccgag 2400 ctcatcctac gaagctctca ggccatcgta ggaaatacag gatagacact gaattctagg 2460 ctaggtatgc gaggcacgcg gatctagggc agactgggca ttgcatagct atggtgtagt 2520 agaactcccg tcaacggcta ttctcaccta gactttcccc ttcgaactga caagttgtta 2580 tattgcctgt gtaccaagcg ctaatgtgga caggattaat gccagagttc attagcctca 2640 agtagagcct atttcctcgc cggaaagtca tctctcttat tgcatttctg cccttcccac 2700 taactcaggg tgcagcgcaa cactacacgc aacatataca ctttattagc cgtgcaacaa 2760 ggctattcta cgaaaaatgc tacactccac atgttaaagg cgcattcaac cagcttcttt 2820 attgggtaat atacagccag gcggggatga agctcattag ccgccactca aggctataca 2880 atgttgccaa ctctccgggc tttatcctgt gctcccgaat accacatcgt gatgatgctt 2940 cagcgcacgg aagtcacaga caccgcctgt ataaaagggg gactgtgacc ctgtatgagg 3000 cgcaacatgg tctcacagca gctcacctga agaggcttgt aagatcaccc taggctgtgt 3060 attgcaccat gattgtcggc attctcacca cgctggctac gctggccaca ctcgcagcta 3120 gtgtgcctct agaggagcgg caagcttgct caagcgtctg gggccaatgt ggtggccaga 3180 attggtcggg tccgacttgc tgtgcttccg gaagcacatg cgtctactcc aacgactatt 3240 actcccagtg tcttcccggc gctgcaagct caagctcgtc cacgcgcgcc gcgtcgacga 3300 cttctcgagt atcccccaca acatcccggt cgagctccgc gacgcctcca cctggttcta 3360 ctactaccag agtacctcca gtcggatcgg gaaccgctac gtattcaggc aacccttttg 3420 ttggggtcac tctttgggcc aatgcatatt acgcctctga agttagcagc ctcgctattc 3480 ctagcttgac tggagccatg gccactgctg cagcagctgt cgcaaaggtt ccctcttttg 3540 tgtggctaga tactcttgac aagacccctc tcatggagca aaccttggcc gacatccgcg 3600 ccgccaacaa gaatggcggt aactatgccg gacagtttgt ggtgtatgac ttgccggatc 3660 gcgattgcgc tgcccttgcc tcgaatggcg aatactctat tgccgatggt ggcgtcgcca 3720 aatataagaa ctatatcgac accattcgtc aaattgtcgt ggaatattcc gatgtccgga 3780 ccctcctggt tattgagcct gactctcttg ccaacctggt gaccaacctc ggtactccaa 3840 agtgtgccaa tgctcagtca gcctaccttg agtgcatcaa ctacgccgtc acacagctga 3900 accttccaaa tgttgcgatg tatttggacg ctggccatgc aggatggctt ggctggccgg 3960 caaaccaaga cccggccgct cagctatttg caaatgttta caagaatgca tcgtctccga 4020 gagctcttcg 
cggattggca accaatgtcg ccaactacaa cgggtggaac attaccagcc 4080 ccccaccgta cacgcaaggc aacgctgtct acaacgagaa gctgtacatc cacgctattg 4140 gacctcttct tgccaatcac ggctggtcca acgccttctt catcactgat caaggtcgat 4200 cgggaaagca gcctaccgga cagcaacagt ggggagactg gtgcaatgtg atcggcaccg 4260

gatttggtat tcgcccatcc gcaaacactg gggactcgtt gctggattcg tttgtctggg 4320 tcaagccagg cggcgagtgt gacggcacca gcgacagcag tgcgccacga tttgactacc 4380 actgtgcgct cccagatgcc ttgcaaccgg cgcctcaagc tggtgcttgg ttccaagcct 4440 actttgtgca gcttctcaca aacgcaaacc catcgttcct gtaaggcgcg cctaaggctt 4500 tcgtgaccgg gcttcaaaca atgatgtgcg atggtgtggt tcccggttgg cggagtcttt 4560 gtctactttg gttgtctgtc gcaggtcggt agaccgcaaa tgagcaactg atggattgtt 4620 gccagcgata ctataattca catggatggt ctttgtcgat cagtagctag tgagagagag 4680 agaacatcta tccacaatgt cgagtgtcta ttagacatac tccgagaata aagtcaactg 4740 tgtctgtgat ctaaagatcg attcggcagt cgagtagcgt ataacaactc cgagtaccag 4800 caaaagcacg tcgtgacagg agcagggctt tgccaactgc gcaaccttaa ttaaaatagc 4860 tcgcccgctg gagagcatcc tgaatgcaag taacaaccgt agaggctgac acggcaggtg 4920 ttgctaggga gcgtcgtgtt ctacaaggcc agacgtcttc gcggttgata tatatgtatg 4980 tttgactgca ggctgctcag cgacgacagt caagttcgcc ctcgctgctt gtgcaataat 5040 cgcagtgggg aagccacacc gtgactccca tctttcagta aagctctgtt ggtgtttatc 5100 agcaatacac gtaatttaaa ctcgttagca tggggctgat agcttaatta ccgtttacca 5160 gtgccatggt tctgcagctt tccttggccc gtaaaattcg gcgaagccag ccaatcacca 5220 gctaggcacc agctaaaccc tataattagt ctcttatcaa caccatccgc tcccccggga 5280 tcaatgagga gaatgagggg gatgcggggc taaagaagcc tacataaccc tcatgccaac 5340 tcccagttta cactcgtcga gccaacatcc tgactataag ctaacacaga atgcctcaat 5400 cctgggaaga actggccgct gataagcgcg cccgcctcgc aaaaaccatc cctgatgaat 5460 ggaaagtcca gacgctgcct gcggaagaca gcgttattga tttcccaaag aaatcgggga 5520 tcctttcaga ggccgaactg aagatcacag aggcctccgc tgcagatctt gtgtccaagc 5580 tggcggccgg agagttgacc tcggtggaag ttacgctagc attctgtaaa cgggcagcaa 5640 tcgcccagca gttagtaggg tcccctctac ctctcaggga gatgtaacaa cgccacctta 5700 tgggactatc aagctgacgc tggcttctgt gcagacaaac tgcgcccacg agttcttccc 5760 tgacgccgct ctcgcgcagg caagggaact cgatgaatac tacgcaaagc acaagagacc 5820 cgttggtcca ctccatggcc tccccatctc tctcaaagac cagcttcgag tcaaggtaca 5880 ccgttgcccc taagtcgtta gatgtccctt tttgtcagct aacatatgcc accagggcta 5940 cgaaacatca 
atgggctaca tctcatggct aaacaagtac gacgaagggg actcggttct 6000 gacaaccatg ctccgcaaag ccggtgccgt cttctacgtc aagacctctg tcccgcagac 6060 cctgatggtc tgcgagacag tcaacaacat catcgggcgc accgtcaacc cacgcaacaa 6120

gaactggtcg tgcggcggca gttctggtgg tgagggtgcg atcgttggga ttcgtggtgg 6180 cgtcatcggt gtaggaacgg atatcggtgg ctcgattcga gtgccggccg cgttcaactt 6240 cctgtacggt ctaaggccga gtcatgggcg gctgccgtat gcaaagatgg cgaacagcat 6300 ggagggtcag gagacggtgc acagcgttgt cgggccgatt acgcactctg ttgagggtga 6360 gtccttcgcc tcttccttct tttcctgctc tataccaggc ctccactgtc ctcctttctt 6420 gctttttata ctatatacga gaccggcagt cactgatgaa gtatgttaga cctccgcctc 6480 ttcaccaaat ccgtcctcgg tcaggagcca tggaaatacg actccaaggt catccccatg 6540 ccctggcgcc agtccgagtc ggacattatt gcctccaaga tcaagaacgg cgggctcaat 6600 atcggctact acaacttcga cggcaatgtc cttccacacc ctcctatcct gcgcggcgtg 6660 gaaaccaccg tcgccgcact cgccaaagcc ggtcacaccg tgaccccgtg gacgccatac 6720 aagcacgatt tcggccacga tctcatctcc catatctacg cggctgacgg cagcgccgac 6780 gtaatgcgcg atatcagtgc atccggcgag ccggcgattc caaatatcaa agacctactg 6840 aacccgaaca tcaaagctgt taacatgaac gagctctggg acacgcatct ccagaagtgg 6900 aattaccaga tggagtacct tgagaaatgg cgggaggctg aagaaaaggc cgggaaggaa 6960 ctggacgcca tcatcgcgcc gattacgcct accgctgcgg tacggcatga ccagttccgg 7020 tactatgggt atgcctctgt gatcaacctg ctggatttca cgagcgtggt tgttccggtt 7080 acctttgcgg ataagaacat cgataagaag aatgagagtt tcaaggcggt tagtgagctt 7140 gatgccctcg tgcaggaaga gtatgatccg gaggcgtacc atggggcacc ggttgcagtg 7200 caggttatcg gacggagact cagtgaagag aggacgttgg cgattgcaga ggaagtgggg 7260 aagttgctgg gaaatgtggt gactccatag ctaataagtg tcagatagca atttgcacaa 7320 gaaatcaata ccagcaactg taaataagcg ctgaagtgac catgccatgc tacgaaagag 7380 cagaaaaaaa cctgccgtag aaccgaagag atatgacacg cttccatctc tcaaaggaag 7440 aatcccttca gggttgcgtt tccagtctag acacgtataa cggcacaagt gtctctcacc 7500 aaatgggtta tatctcaaat gtgatctaag gatggaaagc ccagaatatc gatcgcgcgc 7560 atttaaatca gctgcggagc atgagcctat ggcgatcagt ctggtcatgt taaccagcct 7620 gtgctctgac gttaatgcag aatagaaagc cgcggttgca atgcaaatga tgatgccttt 7680 gcagaaatgg cttgctcgct gactgatacc agtaacaact ttgcttggcc gtctagcgct 7740 gttgattgta ttcatcacaa cctcgtctcc ctcctttggg ttgagctctt tggatggctt 7800 tccaaacgtt 
aatagcgcgt ttttctccac aaagtattcg tatggacgcg cttttgcgtg 7860 tattgcgtga gctaccagca gcccaattgg cgaagtcttg agccgcatcg catagaataa 7920 ttgattgcgc atttgatgcg atttttgagc ggctgtttca ggcgacattt cgcccgccct 7980 tatttgctcc attatatcat cgacggcatg tccaatagcc cggtgatagt cttgtcgaat 8040

atggctgtcg tggataaccc atcggcagca gatgataatg attccgcagc acaagctcgt 8100 atgtgggtag cagaagaact gagcgagatc ttcgagggcg taactctgca tatccgattg 8160 gcctgctgcc acatgtcatt tgcttcggtt tcttttctgt tgagttcttg tatttgggtg 8220 aaagtaacat ggtgtatgac gagagacatt ggtggtaaga aaaaatttca cctcctctta 8280 gtgcaggact gactctcaaa atctatatgc aaatgtgtcg tgtaacaccc ttcgcatgag 8340 cgctgaccgt accctaccat ttcgccccac tcatgatagc agaagagaca tattaattcg 8400 gcaatgctac gaaagtctgc aggtatgctt aaataaacgc ttgccacaga agccgacagt 8460 ttattgttac tacttactat actgtattat tgttgctcac ataaggcggt gaaccattgg 8520 ttcaccacga cgcctgacga ggtaaattac tctctcgtag ggctgccaag gtaggtccca 8580 accccgtatc ctcggtcgag ggtgcgaggt tctttggtcc ttccctcttt ggtaaagccc 8640 agtagcgtgt ttgaatcagt tcacaatctc tcctaaacac agtccgacac taggtaggta 8700 cgttgtaata gcaactcaaa catgtaattc gttcaaggca ggaacatttt ataaacttcc 8760 ctgcgtattt aatcaataaa gatcctagtc caatcgtata ctacctacct acctagctaa 8820 ggtaggtagg tagttcgtgg gaacctggtc gctaattcac gcaacccact ttgcgctctt 8880 cgcctggccg tcgttgaagg taaagcagtt gtacccatca cctaactcaa ccgacacacc 8940 gttgatctgc tcaaggcagt tttcgtcact gtagaattcc acaggttgtt ccacgttgtc 9000 gaattggatc cccctatatt gggcactggc aaacgcggtc gtggacctgg tacagtcgcc 9060 tggctgaaca gtagtagttt cgactacgac gccgccagca caccttccgc cggtatagga 9120 attgaagagt acggggttct gtgcgaagac agccgggcag gcggaaagga tatagaagag 9180 ctgtccagtc acgttagcta gtgaagtaac gtaatggaag gaaagagaaa aggggagcag 9240 ggaggaaact cgtcatttac tcacaacttt gtgcatcttg acaaaagact tctgatatgg 9300 caacctataa ttcaacaaca tgcagcgtag taaagaatag gtgatcttct tgattcagtt 9360 gcttgagggc agggagaatg aagttccttg gaacgattta tatacccttc gcagcaagag 9420 agtcggctta aagaaaggag actgaaagtg tttacgggac gaatatctat ccgattagcg 9480 tagtatcgtc tctacaaggc ggggcgtaaa ttatgttcca aggccggaca acgtgaacaa 9540 caaatggaaa ttccagacgt ttgaggagaa tcaagctcac ttgctcgtgg ataccagtgg 9600 ttatgagcgc caccgctcaa cattgccgcc aatcggataa aaaaaagcct ctagaagagg 9660 agaccagcag ttgttttagg caaaacaatt gtacagagat cggttgtcgt ttgcgagata 9720 ggtaggtatt 
tacggagtaa cactaaatca aagatacaaa gttttctgcg attattaatt 9780 ctgcgacggt tggcgccatg tggtcttcca gggtgagcaa acgttactct tgctattgac 9840 tattgcaacg acgccgctcg gctgcgacac aacaaagaga cataaggccc tggggaggaa 9900

cgatgtgatc gtcagatcct tcgtagtgaa gatggcgcta cttatgactg catcaagcac 9960 actgtaccga acgcgttaca aaggatcctt tactgacctt cataccaagt ttccaatttg 10020 ttacttgcta aggtcgtgat aatattcatg gtctcctaga ggattgttac agatattaac 10080 agcttgaata gtgtcgagct tataacctgc aaggtacagc caagttgccc agcaccagga 10140 tgttacctcg cttaagttag gcaatagttt gcgagcctaa tgtcgacaaa gtatggcgca 10200 agctgagtac tgccttgggt gaatcctcgc tcaatggtaa ctttgcaagc tcatatgctt 10260 tccaaagctt gtgatacgtg cggttataag ctggcactga cgtgtttcga ggccagatgc 10320 ttgcgaaatc atcaagtgta ttgtggaaag gtctcaggat gaggtcctag aatacgcgag 10380 gcaaatttgt ctgatcgtct ttcaataacc tcatagtcga gtcacaaatg ttggaggtct 10440 ggttcaagcc gagccaagca atagcttggt cgggcgcgtc acagcatcag gaatgctaac 10500 gcttgcacat ctcgcggact ttattatgcc tggacgcaaa tattgatacc agaatcaagc 10560 cacaccctgt gaagcgtaac ttgtttttct ctgctttctt aaaaagctgc gtatatcatt 10620 gctagagcgc ccgtgaacaa cggaactcat tgtctcttta tcttcttact cgcccgggca 10680 agggcgaatt ccagcacact ggcggccgtt actagtggat ccgagctcgg taccaagctt 10740 gatgcatagc ttgagtattc taacgcgtca cctaaatagc ttggcgtaat catggtcata 10800 gctgtttcct gtgtgaaatt gttatccgct cacaattcca cacaacatac gagccggaag 10860 cataaagtgt aaagcctggg gtgcctaatg agtgagctaa ctcacattaa ttgcgttgcg 10920 ctcactgccc gctttccagt cgggaaacct gtcgtgccag ctgcattaat gaatcggcca 10980 acgcgcgggg agaggcggtt tgcgtattgg gcgctcttcc gcttcctcgc tcactgactc 11040 gctgcgctcg gtcgttcggc tgcggcgagc ggtatcagct cactcaaagg cggtaatacg 11100 gttatccaca gaatcagggg ataacgcagg aaagaacatg tgagcaaaag gccagcaaaa 11160 gcccaggaac cgtaaaaagg ccgcgttgct ggcgtttttc cataggctcc gcccccctga 11220 cgagcatcac aaaaatcgac gctcaagtca gaggtggcga aacccgacag gactataaag 11280 ataccaggcg tttccccctg gaagctccct cgtgcgctct cctgttccga ccctgccgct 11340 taccggatac ctgtccgcct ttctcccttc gggaagcgtg gcgctttctc atagctcacg 11400 ctgtaggtat ctcagttcgg tgtaggtcgt tcgctccaag ctgggctgtg tgcacgaacc 11460 ccccgttcag cccgaccgct gcgccttatc cggtaactat cgtcttgagt ccaacccggt 11520 aagacacgac ttatcgccac tggcagcagc cactggtaac aggattagca 
gagcgaggta 11580 tgtaggcggt gctacagagt tcttgaagtg gtggcctaac tacggctaca ctagaaggac 11640 agtatttggt atctgcgctc tgctgaagcc agttaccttc ggaaaaagag ttggtagctc 11700 ttgatccggc aaacaaacca ccgctggtag cggtggtttt tttgtttgca agcagcagat 11760 tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatc ttttctacgg ggtctgacgc 11820

tcagtggaac gaaaactcac gttaagggat tttggtcatg agattatcaa aaaggatctt 11880 cacctagatc cttttaaatt aaaaatgaag ttttagcacg tgtcagtcct gctcctcggc 11940 cacgaagtgc acgcagttgc cggccgggtc gcgcagggcg aactcccgcc cccacggctg 12000 ctcgccgatc tcggtcatgg ccggcccgga ggcgtcccgg aagttcgtgg acacgacctc 12060 cgaccactcg gcgtacagct cgtccaggcc gcgcacccac acccaggcca gggtgttgtc 12120 cggcaccacc tggtcctgga ccgcgctgat gaacagggtc acgtcgtccc ggaccacacc 12180 ggcgaagtcg tcctccacga agtcccggga gaacccgagc cggtcggtcc agaactcgac 12240 cgctccggcg acgtcgcgcg cggtgagcac cggaacggca ctggtcaact tggccatggt 12300 ggccctcctc acgtgctatt attgaagcat ttatcagggt tattgtctca tgagcggata 12360 catatttgaa tgtatttaga aaaataaaca aataggggtt ccgcgcacat ttccccgaaa 12420 agtgccacct gtatgcggtg tgaaataccg cacagatgcg taaggagaaa ataccgcatc 12480 aggaaattgt aagcgttaat aattcagaag aactcgtcaa gaaggcgata gaaggcgatg 12540 cgctgcgaat cgggagcggc gataccgtaa agcacgagga agcggtcagc ccattcgccg 12600 ccaagctctt cagcaatatc acgggtagcc aacgctatgt cctgatagcg gtccgccaca 12660 cccagccggc cacagtcgat gaatccagaa aagcggccat tttccaccat gatattcggc 12720 aagcaggcat cgccatgggt cacgacgaga tcctcgccgt cgggcatgct cgccttgagc 12780 ctggcgaaca gttcggctgg cgcgagcccc tgatgctctt cgtccagatc atcctgatcg 12840 acaagaccgg cttccatccg agtacgtgct cgctcgatgc gatgtttcgc ttggtggtcg 12900 aatgggcagg tagccggatc aagcgtatgc agccgccgca ttgcatcagc catgatggat 12960 actttctcgg caggagcaag gtgagatgac aggagatcct gccccggcac ttcgcccaat 13020 agcagccagt cccttcccgc ttcagtgaca acgtcgagca cagctgcgca aggaacgccc 13080 gtcgtggcca gccacgatag ccgcgctgcc tcgtcttgca gttcattcag ggcaccggac 13140 aggtcggtct tgacaaaaag aaccgggcgc ccctgcgctg acagccggaa cacggcggca 13200 tcagagcagc cgattgtctg ttgtgcccag tcatagccga atagcctctc cacccaagcg 13260 gccggagaac ctgcgtgcaa tccatcttgt tcaatcatgc gaaacgatcc tcatcctgtc 13320 tcttgatcag agcttgatcc cctgcgccat cagatccttg gcggcgagaa agccatccag 13380 tttactttgc agggcttccc aaccttacca gagggcgccc cagctggcaa ttccggttcg 13440 cttgctgtcc ataaaaccgc ccagtctagc tatcgccatg taagcccact 
gcaagctacc 13500 tgctttctct ttgcgcttgc gttttccctt gtccagatag cccagtagct gacattcatc 13560 cggggtcagc accgtttctg cggactggct ttctacgtga aaaggatcta ggtgaagatc 13620 ctttttgata atctcatgcc tgacatttat attccccaga acatcaggtt aatggcgttt 13680

ttgatgtcat tttcgcggtg gctgagatca gccacttctt ccccgataac ggagaccggc 13740 acactggcca tatcggtggt catcatgcgc cagctttcat ccccgatatg caccaccggg 13800 taaagttcac gggagacttt atctgacagc agacgtgcac tggccagggg gatcaccatc 13860 cgtcgccccg gcgtgtcaat aatatcactc tgtacatcca caaacagacg ataacggctc 13920 tctcttttat aggtgtaaac cttaaactgc cgtacgtata ggctgcgcaa ctgttgggaa 13980 gggcgatcgg tgcgggcctc ttcgctatta cgccagctgg cgaaaggggg atgtgctgca 14040 aggcgattaa gttgggtaac gccagggttt tcccagtcac gacgttgtaa aacgacggcc 14100 agtgaattgt aatacgactc actatagggc gaattgggcc ctctagatgc atgctcga 14158

<210> 14 <211> 11312 <212> DNA

<213> Artificial

<220>

<223> pTrex4: CBH1-E1 expression vector

<400> 14 aagcttaact agtacttctc gagctctgta catgtccggt cgcgacgtac gcgtatcgat 60 ggcgccagct gcaggcggcc gcctgcagcc acttgcagtc ccgtggaatt ctcacggtga 120 atgtaggcct tttgtagggt aggaattgtc actcaagcac ccccaacctc cattacgcct 180 cccccataga gttcccaatc agtgagtcat ggcactgttc tcaaatagat tggggagaag 240 ttgacttccg cccagagctg aaggtcgcac aaccgcatga tatagggtcg gcaacggcaa 300 aaaagcacgt ggctcaccga aaagcaagat gtttgcgatc taacatccag gaacctggat 360 acatccatca tcacgcacga ccactttgat ctgctggtaa actcgtattc gccctaaacc 420 gaagtgacgt ggtaaatcta cacgtgggcc cctttcggta tactgcgtgt gtcttctcta 480 ggtgccattc ttttcccttc ctctagtgtt gaattgtttg tgttggagtc cgagctgtaa 540 ctacctctga atctctggag aatggtggac taacgactac cgtgcacctg catcatgtat 600 ataatagtga tcctgagaag gggggtttgg agcaatgtgg gactttgatg gtcatcaaac 660 aaagaacgaa gacgcctctt ttgcaaagtt ttgtttcggc tacggtgaag aactggatac 720 ttgttgtgtc ttctgtgtat ttttgtggca acaagaggcc agagacaatc tattcaaaca 780 ccaagcttgc tcttttgagc tacaagaacc tgtggggtat atatctagag ttgtgaagtc 840 ggtaatcccg ctgtatagta atacgagtcg catctaaata ctccgaagct gctgcgaacc 900 cggagaatcg agatgtgctg gaaagcttct agcgagcggc taaattagca tgaaaggcta 960 tgagaaattc tggagacggc ttgttgaatc atggcgttcc attcttcgac aagcaaagcg 1020 ttccgtcgca gtagcaggca ctcattcccg aaaaaactcg gagattccta agtagcgatg 1080 gaaccggaat aatataatag gcaatacatt gagttgcctc gacggttgca atgcaggggt 1140

actgagcttg gacataactg ttccgtaccc cacctcttct caacctttgg cgtttccctg 1200 attcagcgta cccgtacaag tcgtaatcac tattaaccca gactgaccgg acgtgttttg 1260 cccttcattt ggagaaataa tgtcattgcg atgtgtaatt tgcctgcttg accgactggg 1320 gctgttcgaa gcccgaatgt aggattgtta tccgaactct gctcgtagag gcatgttgtg 1380 aatctgtgtc gggcaggaca cgcctcgaag gttcacggca agggaaacca ccgatagcag 1440 tgtctagtag caacctgtaa agccgcaatg cagcatcact ggaaaataca aaccaatggc 1500 taaaagtaca taagttaatg cctaaagaag tcatatacca gcggctaata attgtacaat 1560 caagtggcta aacgtaccgt aatttgccaa cggcttgtgg ggttgcagaa gcaacggcaa 1620 agccccactt ccccacgttt gtttcttcac tcagtccaat ctcagctggt gatcccccaa 1680 ttgggtcgct tgtttgttcc ggtgaagtga aagaagacag aggtaagaat gtctgactcg 1740 gagcgttttg catacaacca agggcagtga tggaagacag tgaaatgttg acattcaagg 1800 agtatttagc cagggatgct tgagtgtatc gtgtaaggag gtttgtctgc cgatacgacg 1860 aatactgtat agtcacttct gatgaagtgg tccatattga aatgtaagtc ggcactgaac 1920 aggcaaaaga ttgagttgaa actgcctaag atctcgggcc ctcgggcctt cggcctttgg 1980 gtgtacatgt ttgtgctccg ggcaaatgca aagtgtggta ggatcgaaca cactgctgcc 2040 tttaccaagc agctgagggt atgtgatagg caaatgttca ggggccactg catggtttcg 2100 aatagaaaga gaagcttagc caagaacaat agccgataaa gatagcctca ttaaacggaa 2160 tgagctagta ggcaaagtca gcgaatgtgt atatataaag gttcgaggtc cgtgcctccc 2220 tcatgctctc cccatctact catcaactca gatcctccag gagacttgta caccatcttt 2280 tgaggcacag aaacccaata gtcaaccgcg gactgcgcat catgtatcgg aagttggccg 2340 tcatctcggc cttcttggcc acagctcgtg ctcagtcggc ctgcactctc caatcggaga 2400 ctcacccgcc tctgacatgg cagaaatgct cgtctggtgg cacttgcact caacagacag 2460 gctccgtggt catcgacgcc aactggcgct ggactcacgc tacgaacagc agcacgaact 2520 gctacgatgg caacacttgg agctcgaccc tatgtcctga caacgagacc tgcgcgaaga 2580 actgctgtct ggacggtgcc gcctacgcgt ccacgtacgg agttaccacg agcggtaaca 2640 gcctctccat tggctttgtc acccagtctg cgcagaagaa cgttggcgct cgcctttacc 2700 ttatggcgag cgacacgacc taccaggaat tcaccctgct tggcaacgag ttctctttcg 2760 atgttgatgt ttcgcagctg ccgtaagtga cttaccatga acccctgacg tatcttcttg 2820 tgggctccca 
gctgactggc caatttaagg tgcggcttga acggagctct ctacttcgtg 2880 tccatggacg cggatggtgg cgtgagcaag tatcccacca acaccgctgg cgccaagtac 2940 ggcacggggt actgtgacag ccagtgtccc cgcgatctga agttcatcaa tggccaggcc 3000

aacgttgagg gctgggagcc gtcatccaac aacgcaaaca cgggcattgg aggacacgga 3060 agctgctgct ctgagatgga tatctgggag gccaactcca tctccgaggc tcttaccccc 3120 cacccttgca cgactgtcgg ccaggagatc tgcgagggtg atgggtgcgg cggaacttac 3180 tccgataaca gatatggcgg cacttgcgat cccgatggct gcgactggaa cccataccgc 3240 ctgggcaaca ccagcttcta cggccctggc tcaagcttta ccctcgatac caccaagaaa 3300 ttgaccgttg tcacccagtt cgagacgtcg ggtgccatca accgatacta tgtccagaat 3360 ggcgtcactt tccagcagcc caacgccgag cttggtagtt actctggcaa cgagctcaac 3420 gatgattact gcacagctga ggaggcagaa ttcggcggat cctctttctc agacaagggc 3480 ggcctgactc agttcaagaa ggctacctct ggcggcatgg ttctggtcat gagtctgtgg 3540 gatgatgtga gtttgatgga caaacatgcg cgttgacaaa gagtcaagca gctgactgag 3600 atgttacagt actacgccaa catgctgtgg ctggactcca cctacccgac aaacgagacc 3660 tcctccacac ccggtgccgt gcgcggaagc tgctccacca gctccggtgt ccctgctcag 3720 gtcgaatctc agtctcccaa cgccaaggtc accttctcca acatcaagtt cggacccatt 3780 ggcagcaccg gcaaccctag cggcggcaac cctcccggcg gaaacccgcc tggcaccacc 3840 accacccgcc gcccagccac taccactgga agctctcccg gacctactag taagcgggcg 3900 ggcggcggct attggcacac gagcggccgg gagatcctgg acgcgaacaa cgtgccggta 3960 cggatcgccg gcatcaactg gtttgggttc gaaacctgca attacgtcgt gcacggtctc 4020 tggtcacgcg actaccgcag catgctcgac cagataaagt cgctcggcta caacacaatc 4080 cggctgccgt actctgacga cattctcaag ccgggcacca tgccgaacag catcaatttt 4140 taccagatga atcaggacct gcagggtctg acgtccttgc aggtcatgga caaaatcgtc 4200 gcgtacgccg gtcagatcgg cctgcgcatc attcttgacc gccaccgacc ggattgcagc 4260 gggcagtcgg cgctgtggta cacgagcagc gtctcggagg ctacgtggat ttccgacctg 4320 caagcgctgg cgcagcgcta caagggaaac ccgacggtcg tcggctttga cttgcacaac 4380 gagccgcatg acccggcctg ctggggctgc ggcgatccga gcatcgactg gcgattggcc 4440 gccgagcggg ccggaaacgc cgtgctctcg gtgaatccga acctgctcat tttcgtcgaa 4500 ggtgtgcaga gctacaacgg agactcctac tggtggggcg gcaacctgca aggagccggc 4560 cagtacccgg tcgtgctgaa cgtgccgaac cgcctggtgt actcggcgca cgactacgcg 4620 acgagcgtct acccgcagac gtggttcagc gatccgacct tccccaacaa catgcccggc 4680 atctggaaca 
agaactgggg atacctcttc aatcagaaca ttgcaccggt atggctgggc 4740 gaattcggta cgacactgca atccacgacc gaccagacgt ggctgaagac gctcgtccag 4800 tacctacggc cgaccgcgca atacggtgcg gacagcttcc agtggacctt ctggtcctgg 4860 aaccccgatt ccggcgacac aggaggaatt ctcaaggatg actggcagac ggtcgacaca 4920

gtaaaagacg gctatctcgc gccgatcaag tcgtcgattt tcgatcctgt ctaaggcgcg 4980 ccgcgcgcca gctccgtgcg aaagcctgac gcaccggtag attcttggtg agcccgtatc 5040 atgacggcgg cgggagctac atggccccgg gtgatttatt ttttttgtat ctacttctga 5100 cccttttcaa atatacggtc aactcatctt tcactggaga tgcggcctgc ttggtattgc 5160 gatgttgtca gcttggcaaa ttgtggcttt cgaaaacaca aaacgattcc ttagtagcca 5220 tgcattttaa gataacggaa tagaagaaag aggaaattaa aaaaaaaaaa aaaacaaaca 5280 tcccgttcat aacccgtaga atcgccgctc ttcgtgtatc ccagtaccag tttattttga 5340 atagctcgcc cgctggagag catcctgaat gcaagtaaca accgtagagg ctgacacggc 5400 aggtgttgct agggagcgtc gtgttctaca aggccagacg tcttcgcggt tgatatatat 5460 gtatgtttga ctgcaggctg ctcagcgacg acagtcaagt tcgccctcgc tgcttgtgca 5520 ataatcgcag tggggaagcc acaccgtgac tcccatcttt cagtaaagct ctgttggtgt 5580 ttatcagcaa tacacgtaat ttaaactcgt tagcatgggg ctgatagctt aattaccgtt 5640 taccagtgcc gcggttctgc agctttcctt ggcccgtaaa attcggcgaa gccagccaat 5700 caccagctag gcaccagcta aaccctataa ttagtctctt atcaacacca tccgctcccc 5760 cgggatcaat gaggagaatg agggggatgc ggggctaaag aagcctacat aaccctcatg 5820 ccaactccca gtttacactc gtcgagccaa catcctgact ataagctaac acagaatgcc 5880 tcaatcctgg gaagaactgg ccgctgataa gcgcgcccgc ctcgcaaaaa ccatccctga 5940 tgaatggaaa gtccagacgc tgcctgcgga agacagcgtt attgatttcc caaagaaatc 6000 ggggatcctt tcagaggccg aactgaagat cacagaggcc tccgctgcag atcttgtgtc 6060 caagctggcg gccggagagt tgacctcggt ggaagttacg ctagcattct gtaaacgggc 6120 agcaatcgcc cagcagttag tagggtcccc tctacctctc agggagatgt aacaacgcca 6180 ccttatggga ctatcaagct gacgctggct tctgtgcaga caaactgcgc ccacgagttc 6240 ttccctgacg ccgctctcgc gcaggcaagg gaactcgatg aatactacgc aaagcacaag 6300 agacccgttg gtccactcca tggcctcccc atctctctca aagaccagct tcgagtcaag 6360 gtacaccgtt gcccctaagt cgttagatgt ccctttttgt cagctaacat atgccaccag 6420 ggctacgaaa catcaatggg ctacatctca tggctaaaca agtacgacga aggggactcg 6480 gttctgacaa ccatgctccg caaagccggt gccgtcttct acgtcaagac ctctgtcccg 6540 cagaccctga tggtctgcga gacagtcaac aacatcatcg ggcgcaccgt caacccacgc 6600 aacaagaact 
ggtcgtgcgg cggcagttct ggtggtgagg gtgcgatcgt tgggattcgt 6660 ggtggcgtca tcggtgtagg aacggatatc ggtggctcga ttcgagtgcc ggccgcgttc 6720 aacttcctgt acggtctaag gccgagtcat gggcggctgc cgtatgcaaa gatggcgaac 6780

agcatggagg gtcaggagac ggtgcacagc gttgtcgggc cgattacgca ctctgttgag 6840 ggtgagtcct tcgcctcttc cttcttttcc tgctctatac caggcctcca ctgtcctcct 6900 ttcttgcttt ttatactata tacgagaccg gcagtcactg atgaagtatg ttagacctcc 6960 gcctcttcac caaatccgtc ctcggtcagg agccatggaa atacgactcc aaggtcatcc 7020 ccatgccctg gcgccagtcc gagtcggaca ttattgcctc caagatcaag aacggcgggc 7080 tcaatatcgg ctactacaac ttcgacggca atgtccttcc acaccctcct atcctgcgcg 7140 gcgtggaaac caccgtcgcc gcactcgcca aagccggtca caccgtgacc ccgtggacgc 7200 catacaagca cgatttcggc cacgatctca tctcccatat ctacgcggct gacggcagcg 7260 ccgacgtaat gcgcgatatc agtgcatccg gcgagccggc gattccaaat atcaaagacc 7320 tactgaaccc gaacatcaaa gctgttaaca tgaacgagct ctgggacacg catctccaga 7380 agtggaatta ccagatggag taccttgaga aatggcggga ggctgaagaa aaggccggga 7440 aggaactgga cgccatcatc gcgccgatta cgcctaccgc tgcggtacgg catgaccagt 7500 tccggtacta tgggtatgcc tctgtgatca acctgctgga tttcacgagc gtggttgttc 7560 cggttacctt tgcggataag aacatcgata agaagaatga gagtttcaag gcggttagtg 7620 agcttgatgc cctcgtgcag gaagagtatg atccggaggc gtaccatggg gcaccggttg 7680 cagtgcaggt tatcggacgg agactcagtg aagagaggac gttggcgatt gcagaggaag 7740 tggggaagtt gctgggaaat gtggtgactc catagctaat aagtgtcaga tagcaatttg 7800 cacaagaaat caataccagc aactgtaaat aagcgctgaa gtgaccatgc catgctacga 7860 aagagcagaa aaaaacctgc cgtagaaccg aagagatatg acacgcttcc atctctcaaa 7920 ggaagaatcc cttcagggtt gcgtttccag tctagacacg tataacggca caagtgtctc 7980 tcaccaaatg ggttatatct caaatgtgat ctaaggatgg aaagcccaga atctaggcct 8040 attaatattc cggagtatac gtagccggct aacgttaaca accggtacct ctagaactat 8100 agctagcatg cgcaaattta aagcgctgat atcgatcgcg cgcagatcca tatatagggc 8160 ccgggttata attacctcag gtcgacgtcc catggccatt cgaattcgta atcatggtca 8220 tagctgtttc ctgtgtgaaa ttgttatccg ctcacaattc cacacaacat acgagccgga 8280 agcataaagt gtaaagcctg gggtgcctaa tgagtgagct aactcacatt aattgcgttg 8340 cgctcactgc ccgctttcca gtcgggaaac ctgtcgtgcc agctgcatta atgaatcggc 8400 caacgcgcgg ggagaggcgg tttgcgtatt gggcgctctt ccgcttcctc gctcactgac 8460 tcgctgcgct 
cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata 8520 cggttatcca cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa 8580 aaggccagga accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct 8640 gacgagcatc acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa 8700

agataccagg cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg 8760 cttaccggat acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca 8820 cgctgtaggt atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa 8880 ccccccgttc agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg 8940 gtaagacacg acttatcgcc actggcagca gccactggta acaggattag cagagcgagg 9000 tatgtaggcg gtgctacaga gttcttgaag tggtggccta actacggcta cactagaaga 9060 acagtatttg gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc 9120 tcttgatccg gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag 9180 attacgcgca gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac 9240 gctcagtgga acgaaaactc acgttaaggg attttggtca tgagattatc aaaaaggatc 9300 ttcacctaga tccttttaaa ttaaaaatga agttttaaat caatctaaag tatatatgag 9360 taaacttggt ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgt 9420 ctatttcgtt catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggag 9480 ggcttaccat ctggccccag tgctgcaatg ataccgcgag acccacgctc accggctcca 9540 gatttatcag caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaact 9600 ttatccgcct ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgcca 9660 gttaatagtt tgcgcaacgt tgttgccatt gctacaggca tcgtggtgtc acgctcgtcg 9720 tttggtatgg cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatccccc 9780 atgttgtgca aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttg 9840 gccgcagtgt tatcactcat ggttatggca gcactgcata attctcttac tgtcatgcca 9900 tccgtaagat gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgt 9960 atgcggcgac cgagttgctc ttgcccggcg tcaatacggg ataataccgc gccacatagc 10020 agaactttaa aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc 10080 ttaccgctgt tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagca 10140 tcttttactt tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa 10200 aagggaataa gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattat 10260 tgaagcattt atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaa 10320 aataaacaaa taggggttcc gcgcacattt ccccgaaaag tgccacctga cgtctaagaa 10380 
accattatta tcatgacatt aacctataaa aataggcgta tcacgaggcc ctttcgtctc 10440 gcgcgtttcg gtgatgacgg tgaaaacctc tgacacatgc agctcccgga gacggtcaca 10500 gcttgtctgt aagcggatgc cgggagcaga caagcccgtc agggcgcgtc agcgggtgtt 10560

ggcgggtgtc ggggctggct taactatgcg gcatcagagc agattgtact gagagtgcac 10620 cataaaattg taaacgttaa tattttgtta aaattcgcgt taaatttttg ttaaatcagc 10680 tcatttttta accaataggc cgaaatcggc aaaatccctt ataaatcaaa agaatagccc 10740 gagatagggt tgagtgttgt tccagtttgg aacaagagtc cactattaaa gaacgtggac 10800 tccaacgtca aagggcgaaa aaccgtctat cagggcgatg gcccactacg tgaaccatca 10860 cccaaatcaa gttttttggg gtcgaggtgc cgtaaagcac taaatcggaa ccctaaaggg 10920 agcccccgat ttagagcttg acggggaaag ccggcgaacg tggcgagaaa ggaagggaag 10980 aaagcgaaag gagcgggcgc tagggcgctg gcaagtgtag cggtcacgct gcgcgtaacc 11040 accacacccg ccgcgcttaa tgcgccgcta cagggcgcgt actatggttg ctttgacgta 11100 tgcggtgtga aataccgcac agatgcgtaa ggagaaaata ccgcatcagg cgccattcgc 11160 cattcaggct gcgcaactgt tgggaagggc gatcggtgcg ggcctcttcg ctattacgcc 11220 agctggcgaa agggggatgt gctgcaaggc gattaagttg ggtaacgcca gggttttccc 11280 agtcacgacg ttgtaaaacg acggccagtg cc 11312