Title:
NUTRITIVE POLYPEPTIDE PRODUCTION SYSTEMS, AND METHODS OF MANUFACTURE AND USE THEREOF
Document Type and Number:
WIPO Patent Application WO/2015/054507
Kind Code:
A1
Abstract:
The application concerns reagents and methods for the production of "nutritive polypeptides" in fungi, in particular Aspergillus, where the expression of said polypeptide is increased by co-expressing in the fungal host (i) a variant nucleic acid encoding a secreted protease (in particular pepA, pepB, pepD, pepF or pepH), (ii) a "secretory pathway polypeptide" (including a chaperone such as a foldase, a vesicle trafficking polypeptide, a glycosylation pathway polypeptide or a protein degradation pathway), and/or (iii) a nucleic acid encoding a Cas9 endonuclease plus a nucleic acid encoding a guide RNA (gRNA). The application is also directed to said nutritive proteins, nucleic acids encoding the proteins, recombinant microorganisms that make the proteins, vectors for expressing the proteins, methods of making the proteins using recombinant microorganisms, compositions that comprise the proteins, and methods of using the proteins.

Inventors:
BASU SUBHAYU (US)
CHEN YING-JA (US)
HARVEY CAITLYM (US)
GORA KATHERINE (US)
BERRY DAVID A (US)
YOUNG DAVID (US)
Application Number:
PCT/US2014/059923
Publication Date:
April 16, 2015
Filing Date:
October 09, 2014
Assignee:
PRONUTRIA INC (US)
BASU SUBHAYU (US)
CHEN YING-JA (US)
HARVEY CAITLYM (US)
GORA KATHERINE (US)
BERRY DAVID A (US)
YOUNG DAVID (US)
International Classes:
C12N9/62; C12N9/22; C12N15/10; C12N15/52; C12P21/02; C12R1/66
Domestic Patent References:
WO2006110677A22006-10-19
WO2009126463A22009-10-15
Foreign References:
US20130032232A12013-02-07
US20130032180A12013-02-07
US20130032225A12013-02-07
US20130032218A12013-02-07
US20130032212A12013-02-07
US20130032206A12013-02-07
US20130038682A12013-02-14
US5156956A1992-10-20
US20140023828W2014-03-12
US20130053287W2013-08-01
US20130032589W2013-03-15
US20140029304W2014-03-14
US20140028630W2014-03-14
US20140029068W2014-03-14
US20140028445W2014-03-14
US20140273235A12014-09-18
US20140273234A12014-09-18
US20140273233A12014-09-18
US20140273232A12014-09-18
US20140273231A12014-09-18
US20140273230A12014-09-18
US20140273226A12014-09-18
US20140057526W2014-09-25
US20140057527W2014-09-25
US20140057528W2014-09-25
US20070264688A12007-11-15
US20070269862A12007-11-22
Other References:
VAN DEN HOMBERGH J P ET AL: "Aspergillus as a host for heterologous protein production: the problem of proteases", TRENDS IN BIOTECHNOLOGY, ELSEVIER PUBLICATIONS, CAMBRIDGE, GB, vol. 15, no. 7, 1 July 1997 (1997-07-01), pages 256 - 263, XP027556962, ISSN: 0167-7799, [retrieved on 19970701]
JAEWOO YOON ET AL: "Disruption of ten protease genes in the filamentous fungus Aspergillus oryzae highly improves production of heterologous proteins", APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, SPRINGER, BERLIN, DE, vol. 89, no. 3, 19 October 2010 (2010-10-19), pages 747 - 759, XP019874932, ISSN: 1432-0614, DOI: 10.1007/S00253-010-2937-0
JAEWOO YOON ET AL: "Construction of quintuple protease gene disruptant for heterologous protein production in Aspergillus oryzae", APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, SPRINGER, BERLIN, DE, vol. 82, no. 4, 24 December 2008 (2008-12-24), pages 691 - 701, XP019705459, ISSN: 1432-0614
FRANCISCO J MORALEJO ET AL: "Silencing of the Aspergillopepsin B (pepB) Gene of Aspergillus awamori by Antisense RNA Expression or Protease Removal by Gene Disruption Results in a Large Increase in Thaumatin Production", APPLIED AND ENVIRONMENTAL MICROBIOLOGY, AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 68, no. 7, 1 July 2002 (2002-07-01), pages 3550 - 3559, XP002644547, ISSN: 0099-2240, DOI: 10.1128/AEM.68.7.3550-3559.2002
J. E. DICARLO ET AL: "Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems", NUCLEIC ACIDS RESEARCH, vol. 41, no. 7, 4 March 2013 (2013-03-04), pages 4336 - 4343, XP055086617, ISSN: 0305-1048, DOI: 10.1093/nar/gkt135
HINTZ WILLIAM E ET AL: "Improved gene expression in Aspergillus nidulans", CANADIAN JOURNAL OF BOTANY, vol. 73, no. SUPPL. 1 SECT. E-H, 1995, pages S876 - S884, XP009182477, ISSN: 0008-4026
IMURA K; OKADA A: "Amino acid metabolism in pediatric patients", NUTRITION, vol. 14, no. 1, 1998, pages 143 - 8, XP029641333, DOI: doi:10.1016/S0899-9007(97)00230-X
FURST P; STEHLE P: "What are the essential elements needed for the determination of amino acid requirements in humans?", JOURNAL OF NUTRITION, vol. 134, no. 6, 1 June 2004 (2004-06-01), pages 1558S - 1565S
REEDS PJ: "Dispensable and indispensable amino acids for humans", J. NUTR., vol. 130, no. 7, 1 July 2000 (2000-07-01), pages 1835S - 40S
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
GISH; STATES, NATURE GENET, vol. 3, 1993, pages 266 - 272
MADDEN ET AL., METH. ENZYMOL., vol. 266, 1996, pages 131 - 141
ALTSCHUL ET AL., NUCLEIC ACIDS RES., vol. 25, 1997, pages 3389 - 3402
ZHANG; MADDEN, GENOME RES., vol. 7, 1997, pages 649 - 656
PEARSON, METHODS MOL. BIOL., vol. 24, 1994, pages 307 - 31
METHODS MOL. BIOL., vol. 25, pages 365 - 89
AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 1992, GREENE PUBLISHING ASSOCIATES
PEARSON, METHODS ENZYMOL., vol. 183, 1990, pages 63 - 98
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS, pages: 9.51
DANIEL, H.: "Molecular and Integrative Physiology of Intestinal Peptide Transport", ANNUAL REVIEW OF PHYSIOLOGY, vol. 66, 2003, pages 361 - 384
MORENO ET AL.: "Stability of the major allergen Brazil nut 2S albumin (Ber e 1) to physiologically relevant in vitro gastrointestinal digestion", FEBS JOURNAL, 2005, pages 341 - 352
MARTOS, G.; CONTRERAS, P.; MOLINA, E.; LOPEZ-FANDINO, R.: "Egg White Ovalbumin Digestion Mimicking Physiological Conditions", JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY, 2010, pages 5640 - 5648
MORENO, F. J.; MACKIE, A. R.; CLARE MILLS, E. N.: "Phospholipid interactions protect the milk allergen a-Lactalbumin from proteolysis during in vitro digestion", JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY, 2005, pages 9810 - 9816
KONG, F.; SINGH, R. P.: "Disintegration of Solid Foods in Human Stomach", JOURNAL OF FOOD SCIENCE, 2008, pages 67 - 80
GOODMAN, R. E. ET AL.: "Allergenicity assessment of genetically modified crops - what makes sense?", NATURE BIOTECHNOLOGY, 2008, pages 73 - 81
I. M. REDDY; N. K. D. KELLA; J. E. KINSELLA: "Structural and Conformational Basis of the Resistance of B-Lactoglobulin to Peptic and Chymotryptic Digestion", J. AGRIC. FOOD CHEM., vol. 36, 1988, pages 737 - 741
CRANKSHAW, M. W.; GRANT, G. A.: "Modification of Cysteine", CURRENT PROTOCOLS IN PROTEIN SCIENCE, 2001, pages 15.1.1 - 15.1.18
"DNA Microarrays: A Practical Approach (Practical Approach Series)", 1999, OXFORD UNIVERSITY PRESS
NATURE GENET., vol. 21, no. 1, 1999, pages 1 - 60
"Microarray Biochip: Tools and Technology", 2000, EATON PUBLISHING COMPANY/BIOTECHNIQUES BOOKS DIVISION
GERHOLD ET AL., TRENDS BIOCHEM. SCI., vol. 24, 1999, pages 168 - 173
ZWEIGER, TRENDS BIOTECHNOL., vol. 17, 1999, pages 429 - 436
"DNA Microarrays: A Practical Approach", 1999, OXFORD UNIVERSITY PRESS
NATURE GENET, vol. 21, no. 1, 1999, pages 1 - 60
LOPEZ-MAUY ET AL., CELL, vol. 43, 2002, pages 247 - 256
QI ET AL., APPLIED AND ENVIRONMENTAL MICROBIOLOGY, vol. 71, 2005, pages 5678 - 5684
WATERHOUSE ET AL., NUCLEIC ACIDS RESEARCH, 2013, pages D358 - 65
HOLKERI, H ET AL., FEBS LETT, vol. 429, 1998, pages 162 - 6
GASSER ET AL., APPL ENVIRON MICROBIOL., vol. 73, 2007, pages 6499 - 507
GASSER ET AL., BIOTECHNOL LETT, vol. 29, 2007, pages 201 - 12
LE CROM ET AL., PNAS, vol. 106, 2009, pages 16151 - 6
YOON ET AL., APP ENVIRON MICROBIOL., vol. 76, 2010, pages 5718 - 27
GASSER ET AL., APPL. ENVIRON MICROBIOL., vol. 73, 2007, pages 6499 - 507
HARMSEN ET AL., APPL MICROBIOL.BIOTECHNOL., vol. 46, 1996, pages 365 - 70
LARSSON ET AL., APPL. ENVIRON. MICROBIOL., vol. 67, 2001, pages 1163 - 70
TOIKKANEN ET AL., YEAST, vol. 21, 2004, pages 1045 - 55
PUNT ET AL., FUNGAL GENETICS AND BIOL., vol. 45, 2008, pages 1591 - 9
ZHANG ET AL., JCB, vol. 153, 2001, pages 1187 - 98
BUSSEY ET AL., CURR. GENET., vol. 7, 1983, pages 449 - 56
TYO ET AL., BMC BIOL., vol. 10, 2012, pages 16
SAMPSON ET AL., BIOESSAYS, vol. 36, no. 1, 2014, pages 34 - 8
GILBERT ET AL., CELL, vol. 154, no. 2, 18 December 2012 (2012-12-18), pages 442 - 51
FARZADFARD ET AL., ACS SYNTH BIOL., vol. 2, no. 10, 18 December 2012 (2012-12-18), pages 604 - 13
JINEK ET AL., SCIENCE, vol. 337, pages 816 - 821
MORRISON, S., SCIENCE, vol. 229, 1985, pages 1202
GOODMAN R. E. ET AL.: "Allergenicity assessment of genetically modified crops-what makes sense?", NAT. BIOTECH., vol. 26, 2008, pages 73 - 81
AALBERSE R. C.: "Structural biology of allergens", J. ALLERGY CLIN. IMMUNOL., vol. 106, 2000, pages 228 - 238, XP002343238, DOI: doi:10.1067/mai.2000.108434
JENKINS J. A. ET AL.: "Evolutionary distance from human homologs reflects allergenicity of animal food proteins", J. ALLERGY CLIN IMMUNOL., vol. 120, 2007, pages 1399 - 1405, XP022383436, DOI: doi:10.1016/j.jaci.2007.08.019
NIESEN, F. H.; BERGLUND, H.; VADADI, M.: "The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability", NATURE PROTOCOLS, vol. 2, 2007, pages 2212 - 2221
LAVINDER, J. J.; HARI, S. B.; SUILLIVAN, B. J.; MAGILERY, T. J.: "High-Throughput Thermal Scanning: A General, Rapid Dye-Binding Thermal Shift Screen for Protein Engineering", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2009, pages 3794 - 3795, XP055137712, DOI: doi:10.1021/ja8049063
PELEGRINE ET AL., LEBENSM.-WISS. U.-TECHNOL., vol. 38, 2005, pages 77 - 80
LEE ET AL., JAOCS, vol. 80, no. 1, 2003, pages 85 - 90
BIOCHEM. J., vol. 376, 2003, pages 339 - 350
WHEELER, E. L.; FERREL, R. E., CEREAL CHEM., vol. 48, 1971, pages 312
TANAKA, M.; THANANUNKUL, D.; LEE, T. C.; CHICHESTER, C. O., J. FOOD SCI., vol. 40, 1975, pages 1087 - 1088
KAKADE, M. L.; RACKIS, J. J.; MCGHEE, J. E.; PUSKI, G., CEREAL CHEM., vol. 51, 1974, pages 376 - 82
BAKER, J. E.; WOO, S. M.; THRONE, J. E.; FINNY, P. L., ENVIRONM. ENTOMOL., vol. 20, 1991, pages 53 - 60
HUESING, J. E.; SHADE, R. E.; CHRISPEELS, M. J.; MURDOK, L. L., PLANT PHYSIOL., vol. 96, 1991, pages 993 - 996
PAREDES-LOPEZ, O.; SCHEVENIN, M. L.; GUEVARA-LARA, F., FOOD CHEM., vol. 31, 1989, pages 129 - 137
"Eur. J. Biochem.", vol. 138, 1984, pages: 519
LIS, H.; SHARON, N., METHODS ENZYMOL., vol. 28, 1972, pages 360 - 368
"AOAC, Official Methods of Analysis", 1990
BASSI S: "A Primer on Python for Life Science Researchers", PLOS COMPUT BIOL, vol. 3, no. 11, 2007, pages E199
D. SITKOFF; K. A. SHARP; B. HONIG: "Accurate Calculation of Hydration Free Energies Using Macroscopic Solvent Models", J. PHYS. CHEM., vol. 98, 1994, XP055234455, DOI: doi:10.1021/j100058a043
KYTE J; DOOLITTLE RF: "A simple method for displaying the hydropathic character of a protein", J. MOL. BIOL., vol. 157, no. I, May 1982 (1982-05-01), pages 105 - 32, XP024014365, DOI: doi:10.1016/0022-2836(82)90515-0
T.E. CREIGHTON: "Proteins: Structures and Molecular Properties", 1993, W.H. FREEMAN AND COMPANY
A.L. LEHNINGER: "Biochemistry", WORTH PUBLISHERS, INC.
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989
"Methods In Enzymology", ACADEMIC PRESS, INC.
"Remington's Pharmaceutical Sciences", 1990, EASTON, PENNSYLVANIA: MACK PUBLISHING COMPANY
CAREY; SUNDBERG: "Advanced Organic Chemistry", vol. A, B, 1992, PLENUM PRESS
LIU, HONGBIN ET AL., ANALYTICAL CHEMISTRY, vol. 76.14, 2004, pages 4193 - 4201
EINHAUER ET AL., JOURNAL OF BIOCHEMICAL AND BIOPHYSICAL METHODS, 2001
SCOPES R.: "Protein Purification: Principles and Practice", 1987, SPRINGER
F.HOFMEISTER, ARCH. EXP. PATHOL. PHARMACOL., vol. 24, pages 247 - 260
JIM KLING: "Highly Concentrated Protein Formulations: Finding Solutions for the Next Generation of Parenteral Biologics", BIOPROCESS INTERNATIONAL, 2014
LAWRENCE, M. S.; PHILLIPS, K. J.; LIU, D. R.: "Supercharging proteins can impart unusual resilience", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 129, no. 33, 2007, pages 10110 - 2, XP002679970, DOI: doi:10.1021/JA071641Y
PUNT ET AL., METHODS IN ENZYMOLOGY, vol. 216, 1992, pages 447 - 457
E. KARNAUKHOVA ET AL., MICROBIAL CELL FACTORIES, vol. 6, pages 34
CONESA ET AL., APPLIED AND ENVIRONMENTAL MICROBIOLOGY, vol. 66, no. 7, 2000, pages 3016 - 23
JUNXIN LI ET AL., MICROBIAL CELL FACTORIES, vol. 11, 2012, pages 84
FITZGERALD; GLICK, MICROBIAL CELL FACTORIES, vol. 13, 2014, pages 125
Attorney, Agent or Firm:
KABLER, Kevin et al. (Silicon Valley Center, 801 California Street, Mountain View CA, US)
Claims:
CLAIMS

1. A method of increasing production of a nutritive polypeptide in a host organism, comprising the steps of:

a) providing a first Aspergillus host organism comprising i) a variant nucleic acid sequence encoding a secreted protease or a secretory pathway polypeptide, and ii) at least one recombinant nucleic acid encoding at least one nutritive polypeptide; and

b) culturing the first Aspergillus host organism in a fermentation medium, wherein the production of the at least one nutritive polypeptide in the cultured first Aspergillus host organism is increased relative to a cultured Aspergillus host organism comprising a reference nucleic acid sequence.

2. The method of claim 1, further comprising the step of purifying the produced nutritive polypeptide from the first Aspergillus host organism.

3. A nutritive composition comprising the purified nutritive polypeptide obtained by the method of claim 2, wherein the nutritive composition is substantially free of non-comestible products.

4. The nutritive composition of claim 3, comprising the purified nutritive polypeptide in an amount effective to treat or prevent a disease, disorder or condition characterized by the lack of adequate protein nutrition.

5. The method of claim 1, wherein the secreted protease is selected from the group consisting of pepA, pepB, pepD, pepF, and pepH.

6. The method of claim 1, wherein the secretory pathway polypeptide is selected from the group consisting of a chaperone, a vesicle trafficking polypeptide, a glycosylation pathway polypeptide, and a protein degradation pathway.

7. The method of claim 6, wherein the chaperone comprises a foldase.

8. The method of claim 1, wherein the recombinant nucleic acid encoding the nutritive polypeptide is codon-optimized for production of the nutritive polypeptide in Aspergillus.

9. The method of claim 1, wherein the first Aspergillus host organism comprises i) a first nucleic acid sequence encoding a Cas9 endonuclease polypeptide, and ii) a second nucleic acid sequence comprising a guide nucleic acid encoding a gRNA.

10. The method of claim 9, wherein the first nucleic acid sequence and/or the second nucleic acid sequence is incorporated into a genomic locus present in the Aspergillus host organism.

11. The method of claim 9, wherein each of: the first nucleic acid sequence, the second nucleic acid sequence and the recombinant nucleic acid encoding the nutritive polypeptide, is either i) independently incorporated into more than one genomic loci present in the Aspergillus host organism or ii) incorporated into one genomic locus present in the Aspergillus host organism.

12. The method of claim 1, wherein the nutritive polypeptide is substantially secreted from the Aspergillus host organism.

13. The method of claim 12, wherein the solubility of the secreted nutritive polypeptide is greater than a reference nutritive polypeptide secreted from a cultured Aspergillus host organism comprising a reference nucleic acid sequence encoding the reference nutritive polypeptide.

14. The method of claim 12, wherein i) the content of O-linked glycosylation, N-linked glycosylation, or both O-linked glycosylation and N-linked glycosylation, of the secreted nutritive polypeptide is greater than a reference nutritive polypeptide secreted from a cultured Aspergillus host organism comprising a reference nucleic acid sequence, or ii) wherein the content of O-linked glycosylation, N-linked glycosylation, or both O-linked glycosylation and N-linked glycosylation, of the secreted nutritive polypeptide is less than a reference nutritive polypeptide secreted from the cultured Aspergillus host organism comprising a reference nucleic acid sequence encoding the reference nutritive polypeptide.

15. The method of claim 1, comprising providing i) a plurality of recombinant nucleic acids individually encoding at least one nutritive polypeptide or ii) a recombinant nucleic acid that encodes a plurality of nutritive polypeptides.

16. A method of increasing production of a nutritive polypeptide in an isolated fungal host organism, comprising the step of introducing into a fungal host organism at least one recombinant nucleic acid comprising (i) a first nucleic acid sequence encoding a nutritive polypeptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is codon-optimized for production of the Cas9 endonuclease polypeptide in the fungal host organism, and (iii) a guide nucleic acid encoding a gRNA, under conditions such that the first nucleic acid sequence is integrated into a genomic locus present in the fungal host organism such that the nutritive polypeptide is produced by the fungal host organism at an increased level as compared to a fungal host organism comprising a non-integrated nucleic acid encoding the nutritive polypeptide.

17. The method of claim 16, wherein the non-integrated nucleic acid comprises a plasmid.

18. A method of increasing secretion of a nutritive polypeptide in an isolated Aspergillus host organism, comprising the step of introducing into the Aspergillus host organism one or more nucleic acids, wherein the one or more nucleic acids comprise: (i) a first nucleic acid sequence encoding a nutritive polypeptide, wherein the first nucleic acid further encodes a secretion leader peptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is codon-optimized for production of the Cas9 endonuclease polypeptide in Aspergillus, and (iii) a guide nucleic acid, wherein the guide nucleic acid comprises a gRNA or encodes a gRNA, under conditions such that at least one of the first or second nucleic acid sequences is incorporated into a genomic locus present in the Aspergillus host organism such that the nutritive polypeptide is secreted from the Aspergillus host organism at an increased level as compared to an Aspergillus host organism comprising a non-integrated nucleic acid encoding the nutritive polypeptide, wherein the non-integrated nucleic acid optionally comprises a plasmid.

19. The method of claim 18, wherein the ratio of secreted nutritive polypeptide to total nutritive polypeptide produced from the Aspergillus host organism is increased as compared to the Aspergillus host organism comprising the non-integrated nucleic acid encoding the nutritive polypeptide.

20. The method of claim 18, wherein the nutritive polypeptide is encoded by a nucleic acid sequence native to an Aspergillus organism, wherein the nutritive polypeptide is not natively secreted by the Aspergillus organism.

21. A method of producing a nutritive polypeptide in an isolated Aspergillus host organism, comprising the step of introducing into the Aspergillus host organism at least one recombinant nucleic acid, wherein the at least one nucleic acid comprises (i) a first nucleic acid sequence encoding a nutritive polypeptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is codon-optimized for production of the Cas9 endonuclease polypeptide in Aspergillus, and (iii) a guide nucleic acid encoding a gRNA, under conditions such that the first nucleic acid sequence is incorporated into a genomic locus present in the Aspergillus host organism such that the nutritive polypeptide is produced by the Aspergillus host organism.

22. The method of claim 21, further comprising the step of purifying the nutritive polypeptide from at least one cellular component of the Aspergillus host organism.

23. The method of claim 21, wherein the nutritive polypeptide is secreted from the Aspergillus host organism.

24. A recombinant Aspergillus host organism comprising a genomically integrated nucleic acid encoding a nutritive polypeptide, wherein the Aspergillus host organism further comprises a variant nucleic acid sequence encoding a secreted protease or a secretory pathway polypeptide.

25. The recombinant Aspergillus host organism of claim 24, wherein the secreted protease is selected from pepA, pepB, pepD, pepF, and pepH.

26. The recombinant Aspergillus host organism of claim 24, wherein the secretory pathway polypeptide is selected from a chaperone, a vesicle trafficking polypeptide, a glycosylation pathway polypeptide, and a protein degradation pathway.

27. The recombinant Aspergillus host organism of claim 24, wherein at least one endogenous Aspergillus nucleic acid is deleted.

28. The recombinant Aspergillus host organism of claim 27, wherein the deleted Aspergillus nucleic acid encodes for a natively secreted polypeptide.

29. A recombinant host organism of the division Ascomycota comprising a genomically integrated nucleic acid encoding a nutritive polypeptide, wherein the host organism further comprises a variant nucleic acid sequence encoding a secreted protease or a secretory pathway polypeptide.

30. The recombinant host organism of claim 29, wherein the secreted protease is selected from pepA, pepB, pepD, pepF, and pepH.

31. The recombinant host organism of claim 29, wherein the secretory pathway polypeptide is selected from a chaperone, a vesicle trafficking polypeptide, a glycosylation pathway polypeptide, and a protein degradation pathway.

32. The recombinant host organism of claim 29, selected from the group consisting of Aspergillus, Trichoderma, Neurospora, Podospora, Endothia, Mucoromycotina, Cochliobolus and Pyricularia.

33. The recombinant host organism of claim 29, selected from the group consisting of A. niger, A. awamori, A. oryzae, Neurospora crassa, Trichoderma reesei, Fusarium graminearum, and Chrysosporium lucknowense.

34. A polypeptide production system comprising an isolated Aspergillus host organism comprising i) a variant nucleic acid sequence encoding a secreted protease or a secretory pathway polypeptide, and ii) a recombinant nucleic acid encoding a nutritive polypeptide.

35. A polypeptide production system comprising an isolated Aspergillus host organism comprising (i) a first nucleic acid sequence encoding a nutritive polypeptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is optionally codon-optimized for production of the Cas9 endonuclease polypeptide in Aspergillus, and (iii) a guide nucleic acid encoding a guide RNA (gRNA).

36. The system of claim 34 or 35, wherein the host organism is deficient in production of at least one protease naturally produced by Aspergillus.

37. The system of claim 34 or 35, wherein the host organism is deficient in production of at least one amylase naturally produced by Aspergillus.

38. The system of claim 35, wherein the gRNA comprises a nucleic acid sequence complementary to a target sequence present in the genomic DNA of the Aspergillus host organism.

39. The system of claim 34 or 35, wherein the first nucleic acid sequence comprises one or more nucleotide insertions, deletions or substitutions compared to a reference nucleic acid encoding a reference nutritive polypeptide.

40. A nutritive polypeptide produced by the system of claim 34 or 35.

41. The nutritive polypeptide of claim 40, wherein the nutritive polypeptide is secreted from the host cell.

42. A nutritive composition comprising the nutritive polypeptide of claim 40 or 41, wherein the nutritive composition is purified to be substantially free of non-comestible products.

43. The nutritive composition of claim 42, comprising the purified nutritive polypeptide in an amount effective to treat or prevent a disease, disorder or condition characterized by the lack of adequate protein nutrition.

44. The nutritive composition of claim 42, substantially free of native Aspergillus polypeptides.

45. A formulation containing the isolated nutritive polypeptide of claim 40 or 41, present at a concentration of at least 10% w/w.

46. A formulation containing the isolated nutritive polypeptide of claim 40 or 41, present in a nutritive amount.

47. A food product containing at least 1 g of the isolated nutritive polypeptide of claim 40 or 41 or the nutritive composition of any one of claims 42-44.

48. A kit comprising in one or more containers a first nucleic acid and a second nucleic acid, wherein the first nucleic acid comprises a first nucleic acid sequence encoding a Cas9 polypeptide, wherein the first nucleic acid sequence is optionally codon-optimized for production of the Cas9 polypeptide in Aspergillus, wherein the Cas9 polypeptide is catalytically active, and wherein the second nucleic acid comprises a second nucleic acid sequence encoding a gRNA.

49. An isolated nucleic acid comprising (i) a first nucleic acid sequence encoding a nutritive polypeptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is codon-optimized for production of the Cas9 endonuclease polypeptide in Aspergillus, and (iii) a guide nucleic acid encoding a gRNA.

50. A polypeptide production system comprising an isolated Aspergillus host organism comprising a first nucleic acid and a second nucleic acid, wherein the first nucleic acid comprises a first nucleic acid sequence encoding a nutritive polypeptide, and wherein the second nucleic acid comprises a second nucleic acid sequence encoding a Cas9 polypeptide, wherein the second nucleic acid sequence is optionally codon-optimized for production of the Cas9 polypeptide in Aspergillus, and wherein the Cas9 polypeptide is catalytically inactive.

51. The system of claim 50, further comprising a third nucleic acid comprising a third nucleic acid sequence comprising gRNA.

52. The system of claim 50, further comprising a third nucleic acid comprising a third nucleic acid sequence encoding gRNA.

53. The system of claim 50, wherein at least one endogenous Aspergillus nucleic acid is mutated or deleted.

54. The system of claim 50, wherein the level of expression of at least one endogenous Aspergillus nucleic acid is reduced compared to an Aspergillus host organism lacking the second nucleic acid.

55. The system of claim 50, wherein the level of expression of at least one endogenous Aspergillus nucleic acid is increased compared to an Aspergillus host organism lacking the second nucleic acid.

56. A polypeptide production system comprising an isolated Aspergillus host organism comprising a first nucleic acid, wherein the first nucleic acid comprises a first nucleic acid sequence encoding a Cas9 polypeptide, wherein the first nucleic acid sequence is optionally codon-optimized for production of the Cas9 polypeptide in Aspergillus, and wherein the Cas9 polypeptide is catalytically inactive, or progeny of the isolated Aspergillus host organism.

57. The system of claim 56, capable of accepting a second nucleic acid, wherein the second nucleic acid comprises a second nucleic acid sequence encoding a nutritive polypeptide.

58. The system of claim 57, comprising a population of isolated Aspergillus host organisms under conditions such that acceptance of the second nucleic acid occurs at a higher frequency than in an isolated Aspergillus host organism lacking the first nucleic acid, or in progeny of an isolated Aspergillus host organism lacking the first nucleic acid.

59. An isolated nucleic acid encoding a Cas9 polypeptide, wherein the nucleic acid is codon-optimized for production of the Cas9 polypeptide in an Aspergillus host organism.

Description:
TITLE

[0001] Nutritive polypeptide production systems, and methods of manufacture and use thereof.

CROSS REFERENCE TO RELATED APPLICATIONS

[0002] This application is related to PCT/US2014/057526, filed September 25, 2014; PCT/US2014/057527, filed September 25, 2014; PCT/US2014/057528, filed September 25, 2014; and U.S. provisional application no. 61/889,205, filed on October 10, 2013, the entire disclosures of which are hereby incorporated by reference in their entirety for all purposes.

BACKGROUND

[0003] Dietary protein is an essential nutrient for human health and growth. The World Health Organization recommends that dietary protein should contribute approximately 10 to 15% of energy intake when in energy balance and weight stable. Average daily protein intakes in various countries indicate that these recommendations are consistent with the amount of protein being consumed worldwide. Meals with an average of 20 to 30% of energy from protein are representative of high-protein diets when consumed in energy balance. The body cannot synthesize certain amino acids that are necessary for health and growth, and instead must obtain them from food. These amino acids, called "essential amino acids", are Histidine (H), Isoleucine (I), Leucine (L), Lysine (K), Methionine (M), Phenylalanine (F), Threonine (T), Tryptophan (W), and Valine (V). Dietary protein sources that provide all the essential amino acids are referred to as "high quality" proteins. Animal foods such as meat, fish, poultry, eggs, and dairy products are generally regarded as high quality protein sources that provide a good balance of essential amino acids. Casein (a protein commonly found in mammalian milk, making up 80% of the proteins in cow milk) and whey (the protein in the liquid that remains after milk has been curdled and strained) are major sources of high quality dietary protein. Foods that do not provide a good balance of essential amino acids are referred to as "low quality" protein sources. Most fruits and vegetables are poor sources of protein. Some plant foods including beans, peas, lentils, nuts and grains (such as wheat) are better sources of protein but may have allergenicity issues. Soy, a vegetable protein manufactured from soybeans, is considered by some to be a high quality protein. Studies of high protein diets for weight loss have shown that protein positively affects energy expenditure and lean body mass. 
Further studies have shown that overeating produces significantly less weight gain in diets containing at least 5% of energy from protein, and that a high-protein diet decreases energy intake. Proteins commonly found in foods do not necessarily provide an amino acid composition that meets the amino acid requirements of a mammal, such as a human, in an efficient manner. The result is that, in order to attain the minimal requirements of each essential amino acid, a larger amount of total protein must be consumed in the diet than would be required if the quality of the dietary protein were higher. By increasing the quality of the protein in the diet it is possible to reduce the total amount of protein that must be consumed compared to diets that include lower quality proteins.
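By way of illustration only, the two quantities discussed above (the share of dietary energy supplied by protein, and the proportion of essential amino acids in a protein) can be sketched in a few lines of Python. This sketch is not part of the disclosed methods. The function names are hypothetical, the energy factor of 4 kcal per gram of protein is the general Atwater factor and is an assumption not stated in this application, and the essential amino acid fraction is computed by residue count rather than by mass.

```python
# Illustrative sketch only; names and the 4 kcal/g factor are assumptions.

# One-letter codes of the nine essential amino acids listed in the Background.
ESSENTIAL_AMINO_ACIDS = frozenset("HILKMFTWV")

# General Atwater factor for protein (assumption, not taken from this application).
KCAL_PER_GRAM_PROTEIN = 4.0


def essential_fraction(sequence: str) -> float:
    """Return the fraction of residues (by count) that are essential amino acids."""
    residues = [r for r in sequence.upper() if r.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    essential = sum(1 for r in residues if r in ESSENTIAL_AMINO_ACIDS)
    return essential / len(residues)


def protein_grams_for_energy_share(total_kcal: float, share: float) -> float:
    """Return grams of protein supplying `share` (0 to 1) of `total_kcal`."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return total_kcal * share / KCAL_PER_GRAM_PROTEIN


if __name__ == "__main__":
    # Hypothetical four-residue peptide: M and K are essential, A and G are not.
    print(essential_fraction("MKAG"))
    # Protein needed to supply 10% and 15% of a hypothetical 2000 kcal intake.
    print(protein_grams_for_energy_share(2000, 0.10))
    print(protein_grams_for_energy_share(2000, 0.15))
```

A residue-count fraction is only a coarse proxy for protein quality; a fuller comparison would weight each residue by mass and compare each essential amino acid against a requirement pattern.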

Traditionally, desirable mixtures of amino acids, such as mixtures comprising essential amino acids, have been provided by hydrolyzing a protein with relatively high levels of essential amino acids, such as whey protein, and/or by combining free amino acids in a mixture that optionally also includes a hydrolyzed protein such as whey. Mixtures of this type may have a bitter taste and undesirable mouthfeel, may be poorly soluble, and may be deemed unsuitable or undesirable for certain uses. As a result, such mixtures sometimes include flavoring agents to mask the taste of the free amino acids and/or hydrolyzed protein. In some cases compositions in which a proportion of the amino acid content is provided by polypeptides or proteins are found to have a better taste than compositions with a high proportion of total amino acids provided as free amino acids and/or certain hydrolyzed proteins. The availability of such compositions has been limited, however, because nutritional formulations have traditionally been made from protein isolated from natural food products, such as whey isolated from milk, or soy protein isolated from soy. The amino acid profiles of those proteins do not necessarily meet the amino acid requirements for a mammal. In addition, commodity proteins typically consist of mixtures of proteins and/or protein hydrolysates which can vary in their protein composition, thus leading to unpredictability regarding their nutritional value.

Moreover, the limited number of sources of such high quality proteins has meant that only certain combinations of amino acids are available on a large scale for ingestion in protein form. The agricultural methods required for the supply of high quality animal protein sources such as casein and whey, eggs, and meat, as well as plant proteins such as soy, also require significant energy inputs and have potentially deleterious environmental impacts.

[0004] Accordingly, it would be useful in certain situations to have alternative sources and methods of supplying proteins for mammalian consumption. One feature that can enhance the utility of a nutritive protein is its solubility. Nutritive proteins with higher solubility can exhibit desirable characteristics such as increased stability, resistance to aggregation, and desirable taste profiles. For example, a nutritive protein that exhibits enhanced solubility can be formulated into a beverage or liquid formulation that includes a high concentration of nutritive protein in a relatively low volume of solution, thus delivering a large dose of protein nutrition per unit volume. A soluble nutritive protein can be useful in sports drinks or recovery drinks wherein a user (e.g., an athlete) wants to ingest nutritive protein before, during or after physical activity. A nutritive protein that exhibits enhanced solubility can also be particularly useful in a clinical setting wherein a subject (e.g., a patient or an elderly person) is in need of protein nutrition but is unable to consume solid foods or large volumes of liquids.

[0005] Therefore, it is also useful to have nutritive protein production systems, such as host cells, which produce nutritive polypeptides for mammalian consumption.

SUMMARY OF THE INVENTION

[0006] In a first aspect, provided are methods of increasing production of a nutritive polypeptide in a host organism, including the steps of: a) providing a first Aspergillus host organism including i) a variant nucleic acid sequence encoding a secreted protease or a secretory pathway polypeptide, and ii) at least one recombinant nucleic acid encoding at least one nutritive polypeptide; and b) culturing the first Aspergillus host organism in a fermentation medium, wherein the production of the at least one nutritive polypeptide in the cultured first Aspergillus host organism is increased relative to a cultured Aspergillus host organism including a reference nucleic acid sequence. In some embodiments, the method further includes the step of purifying the produced nutritive polypeptide from the first Aspergillus host organism. Also provided are nutritive compositions including the purified nutritive polypeptide, wherein the nutritive composition is substantially free of non-comestible products. In some embodiments, the composition includes the purified nutritive polypeptide in an amount effective to treat or prevent a disease, disorder or condition characterized by the lack of adequate protein nutrition. In some embodiments, the secreted protease is selected from the group consisting of pepA, pepB, pepD, pepF, and pepH. In some embodiments, the secretory pathway polypeptide is selected from the group consisting of a chaperone, a vesicle trafficking polypeptide, a glycosylation pathway polypeptide, and a protein degradation pathway. In some embodiments, the chaperone includes a foldase. In some embodiments, the recombinant nucleic acid encoding the nutritive polypeptide is codon-optimized for production of the nutritive polypeptide in Aspergillus.
In some embodiments, the first Aspergillus host organism includes i) a first nucleic acid sequence encoding a Cas9 endonuclease polypeptide, and ii) a second nucleic acid sequence including a guide nucleic acid encoding a gRNA. In some embodiments, the first nucleic acid sequence and/or the second nucleic acid sequence is incorporated into a genomic locus present in the Aspergillus host organism. In some embodiments, each of: the first nucleic acid sequence, the second nucleic acid sequence and the recombinant nucleic acid encoding the nutritive polypeptide, is either i) independently incorporated into more than one genomic loci present in the Aspergillus host organism or ii) incorporated into one genomic locus present in the Aspergillus host organism. In some embodiments, the nutritive polypeptide is substantially secreted from the Aspergillus host organism. In some embodiments, the solubility of the secreted nutritive polypeptide is greater than a reference nutritive polypeptide secreted from a cultured Aspergillus host organism including a reference nucleic acid sequence encoding the reference nutritive polypeptide. In some embodiments, the content of O-linked glycosylation, N-linked glycosylation, or both O-linked glycosylation and N-linked glycosylation, of the secreted nutritive polypeptide is greater than a reference nutritive polypeptide secreted from a cultured Aspergillus host organism including a reference nucleic acid sequence, or the content of O-linked glycosylation, N-linked glycosylation, or both O-linked glycosylation and N-linked glycosylation, of the secreted nutritive polypeptide is less than a reference nutritive polypeptide secreted from the cultured Aspergillus host organism including a reference nucleic acid sequence encoding the reference nutritive polypeptide.
In some embodiments, the method includes providing a plurality of recombinant nucleic acids individually encoding at least one nutritive polypeptide or a recombinant nucleic acid that encodes a plurality of nutritive polypeptides.

[0007] In another aspect, provided are methods of increasing production of a nutritive polypeptide in an isolated fungal host organism, including the step of introducing into a fungal host organism at least one recombinant nucleic acid including a first nucleic acid sequence encoding a nutritive polypeptide, a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is codon-optimized for production of the Cas9 endonuclease polypeptide in the fungal host organism, and a guide nucleic acid encoding a gRNA, under conditions such that the first nucleic acid sequence is integrated into a genomic locus present in the fungal host organism such that the nutritive polypeptide is produced by the fungal host organism at an increased level as compared to a fungal host organism including a non-integrated nucleic acid encoding the nutritive polypeptide. In some embodiments, the non-integrated nucleic acid includes a plasmid.

[0008] In another aspect, provided are methods of increasing secretion of a nutritive polypeptide in an isolated Aspergillus host organism, including the step of introducing into the Aspergillus host organism one or more nucleic acids, wherein the one or more nucleic acids comprise: (i) a first nucleic acid sequence encoding a nutritive polypeptide, wherein the first nucleic acid further encodes a secretion leader peptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is codon-optimized for production of the Cas9 endonuclease polypeptide in Aspergillus, and (iii) a guide nucleic acid, wherein the guide nucleic acid includes a gRNA or encodes a gRNA, under conditions such that at least one of the first or second nucleic acid sequences is incorporated into a genomic locus present in the Aspergillus host organism such that the nutritive polypeptide is secreted from the Aspergillus host organism at an increased level as compared to an Aspergillus host organism including a non-integrated nucleic acid encoding the nutritive polypeptide, wherein the non-integrated nucleic acid optionally includes a plasmid. In some embodiments, the ratio of secreted nutritive polypeptide to total nutritive polypeptide produced from the Aspergillus host organism is increased as compared to the Aspergillus host organism including the non-integrated nucleic acid encoding the nutritive polypeptide. In some embodiments, the nutritive polypeptide is encoded by a nucleic acid sequence native to an Aspergillus organism, wherein the nutritive polypeptide is not natively secreted by the Aspergillus organism.

[0009] In another aspect, provided are methods of producing a nutritive polypeptide in an isolated Aspergillus host organism, including the step of introducing into the Aspergillus host organism at least one recombinant nucleic acid, wherein the at least one nucleic acid includes (i) a first nucleic acid sequence encoding a nutritive polypeptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is codon-optimized for production of the Cas9 endonuclease polypeptide in Aspergillus, and (iii) a guide nucleic acid encoding a gRNA, under conditions such that the first nucleic acid sequence is incorporated into a genomic locus present in the Aspergillus host organism such that the nutritive polypeptide is produced by the Aspergillus host organism. In some embodiments, the method further includes the step of purifying the nutritive polypeptide from at least one cellular component of the Aspergillus host organism. In some embodiments, the nutritive polypeptide is secreted from the Aspergillus host organism.

[0010] In another aspect, provided are recombinant Aspergillus host organisms including a genomically integrated nucleic acid encoding a nutritive polypeptide, wherein the Aspergillus host organism further includes a variant nucleic acid sequence encoding a secreted protease or a secretory pathway polypeptide. In some embodiments, the secreted protease is selected from pepA, pepB, pepD, pepF, and pepH. In some embodiments, the secretory pathway polypeptide is selected from a chaperone, a vesicle trafficking polypeptide, a glycosylation pathway polypeptide, and a protein degradation pathway. In some embodiments, at least one endogenous Aspergillus nucleic acid is deleted. In some embodiments, the deleted Aspergillus nucleic acid encodes for a natively secreted polypeptide.

[0011] In another aspect, provided are recombinant host organisms of the division Ascomycota including a genomically integrated nucleic acid encoding a nutritive polypeptide, wherein the host organism further includes a variant nucleic acid sequence encoding a secreted protease or a secretory pathway polypeptide. In some embodiments, the secreted protease is selected from pepA, pepB, pepD, pepF, and pepH. In some embodiments, the secretory pathway polypeptide is selected from a chaperone, a vesicle trafficking polypeptide, a glycosylation pathway polypeptide, and a protein degradation pathway. In some embodiments, the organism is Aspergillus, Trichoderma, Neurospora, Podospora, Endothia, Mucoromycotina, Cochliobolus or Pyricularia. In some embodiments, the organism is A. niger, A. awamori, A. oryzae, Neurospora crassa, Trichoderma reesei, Fusarium graminearum, or Chrysosporium lucknowense.

[0012] In another aspect, provided are polypeptide production systems including an isolated Aspergillus host organism including i) a variant nucleic acid sequence encoding a secreted protease or a secretory pathway polypeptide, and ii) a recombinant nucleic acid encoding a nutritive polypeptide.

[0013] In another aspect, provided are polypeptide production systems including an isolated Aspergillus host organism including (i) a first nucleic acid sequence encoding a nutritive polypeptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is optionally codon-optimized for production of the Cas9 endonuclease polypeptide in Aspergillus, and (iii) a guide nucleic acid encoding a guide RNA (gRNA). In some embodiments, the host organism is deficient in production of at least one protease naturally produced by Aspergillus. In some embodiments, the host organism is deficient in production of at least one amylase naturally produced by Aspergillus. In some embodiments, the gRNA includes a nucleic acid sequence complementary to a target sequence present in the genomic DNA of the Aspergillus host organism. In some embodiments, the first nucleic acid sequence includes one or more nucleotide insertions, deletions or substitutions compared to a reference nucleic acid encoding a reference nutritive polypeptide. Also provided is the nutritive polypeptide produced by the systems described herein. In some embodiments, the nutritive polypeptide is secreted from the host cell. In some embodiments, the nutritive composition is purified to be substantially free of non-comestible products. In some embodiments, the composition includes the purified nutritive polypeptide in an amount effective to treat or prevent a disease, disorder or condition characterized by the lack of adequate protein nutrition. In some embodiments, the composition is substantially free of native Aspergillus polypeptides. Also provided are formulations containing the isolated nutritive polypeptides, present at a concentration of at least 10% w/w or otherwise in a nutritive amount. Also provided are food products containing at least 1 g of the isolated nutritive polypeptides or nutritive compositions described herein.

[0014] In another aspect, provided are kits including in one or more containers a first nucleic acid and a second nucleic acid, wherein the first nucleic acid includes a first nucleic acid sequence encoding a Cas9 polypeptide, wherein the first nucleic acid sequence is optionally codon-optimized for production of the Cas9 polypeptide in Aspergillus, wherein the Cas9 polypeptide is catalytically active, and wherein the second nucleic acid includes a second nucleic acid sequence encoding a gRNA.

[0015] In another aspect, provided are isolated nucleic acids including (i) a first nucleic acid sequence encoding a nutritive polypeptide, (ii) a second nucleic acid sequence encoding a Cas9 endonuclease polypeptide, wherein the second nucleic acid sequence is codon-optimized for production of the Cas9 endonuclease polypeptide in Aspergillus, and (iii) a guide nucleic acid encoding a gRNA.

[0016] In another aspect, provided are polypeptide production systems including an isolated Aspergillus host organism including a first nucleic acid and a second nucleic acid, wherein the first nucleic acid includes a first nucleic acid sequence encoding a nutritive polypeptide, and wherein the second nucleic acid includes a second nucleic acid sequence encoding a Cas9 polypeptide, wherein the second nucleic acid sequence is optionally codon-optimized for production of the Cas9 polypeptide in Aspergillus, and wherein the Cas9 polypeptide is catalytically inactive. In some embodiments, the system includes a third nucleic acid including a third nucleic acid sequence including gRNA. In some embodiments, the system includes a third nucleic acid including a third nucleic acid sequence encoding gRNA. In some embodiments, at least one endogenous Aspergillus nucleic acid is mutated or deleted. In some embodiments, the level of expression of at least one endogenous Aspergillus nucleic acid is reduced compared to an Aspergillus host organism lacking the second nucleic acid. In some embodiments, the level of expression of at least one endogenous Aspergillus nucleic acid is increased compared to an Aspergillus host organism lacking the second nucleic acid.

[0017] In another aspect, provided are polypeptide production systems including an isolated Aspergillus host organism including a first nucleic acid, wherein the first nucleic acid includes a first nucleic acid sequence encoding a Cas9 polypeptide, wherein the first nucleic acid sequence is optionally codon-optimized for production of the Cas9 polypeptide in Aspergillus, and wherein the Cas9 polypeptide is catalytically inactive, or progeny of the isolated Aspergillus host organism. In some embodiments, the system is capable of accepting a second nucleic acid, wherein the second nucleic acid includes a second nucleic acid sequence encoding a nutritive polypeptide. In some embodiments, the system includes a population of isolated Aspergillus host organisms under conditions such that acceptance of the second nucleic acid occurs at a higher frequency than in an isolated Aspergillus host organism lacking the first nucleic acid, or in progeny of an isolated Aspergillus host organism lacking the first nucleic acid.

[0018] In another aspect, provided are isolated nucleic acids encoding a Cas9 polypeptide, wherein the nucleic acid is codon-optimized for production of the Cas9 polypeptide in an Aspergillus host organism.

DETAILED DESCRIPTION

[0019] Terms used in the claims and specification are defined as set forth below unless otherwise specified.

[0020] A "comestible product" includes an edible product, while a "non-comestible product" is generally an inedible product or contains an inedible product. To be "substantially free of non-comestible products" means a composition does not have an amount or level of non-comestible product sufficient to render the composition inedible, dangerous or otherwise unfit for consumption by its intended consumer. Alternatively, a polypeptide can be substantially free of non-comestible products, meaning the polypeptide does not contain or have associated therewith an amount or level of non-comestible product sufficient to render a composition containing the polypeptide inedible by, or unsafe or deleterious to, its intended consumer. In preferred embodiments a composition substantially free of non-comestible products can be consumed in a nutritional amount by an intended consumer who does not suffer or is not at increased risk of suffering a deleterious event from such consumption. For example, levels of lead and other metals are well-documented as having significant risk including toxicity to humans when present in food, particularly foods containing an agriculturally-derived product grown in soil contaminated with lead and/or other metals. Thus, products such as foods, beverages, and compounds containing industrially-produced polypeptides having metal content above a certain parts per million (ppm), are considered non-comestible products, such metal content depending upon the metal as recognized in the art. For example, inclusion of lead or cadmium in an industrially-produced polypeptide at levels such that the lead will have a deleterious biological effect when consumed by a mammal will generally render a composition containing the industrially-produced polypeptide non-comestible. Notwithstanding the above, some polypeptides have certain amounts of metals complexed to or incorporated therein (such as iron, zinc, calcium and magnesium) and such metals shall not necessarily render the polypeptides non-comestible.

[0021] The term "control sequences" is intended to encompass, at a minimum, any component whose presence is essential for expression, and can also encompass an additional component whose presence is advantageous, for example, leader sequences and fusion partner sequences.

[0022] As used herein, the phrase "degenerate variant" of a reference nucleic acid sequence encompasses nucleic acid sequences that can be translated, according to the standard genetic code, to provide an amino acid sequence identical to that translated from the reference nucleic acid sequence. The term "degenerate oligonucleotide" or "degenerate primer" is used to signify an oligonucleotide capable of hybridizing with target nucleic acid sequences that are not necessarily identical in sequence but that are homologous to one another within one or more particular segments.
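The definition of a degenerate variant can be checked mechanically. The sketch below is illustrative only and is not part of the disclosure: the function names and the example sequences are hypothetical. It builds the standard genetic code table, translates two coding sequences, and reports whether they yield identical amino acid sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_variant(candidate, reference):
    """True if both sequences translate to the same amino acid sequence."""
    return translate(candidate) == translate(reference)


# Hypothetical example sequences.
reference = "ATGGCTAAA"      # Met-Ala-Lys
synonymous = "ATGGCCAAG"     # different codons, same amino acids
nonsynonymous = "ATGGCTAAT"  # last codon now encodes Asn

print(is_degenerate_variant(synonymous, reference))
print(is_degenerate_variant(nonsynonymous, reference))
```

The same comparison underlies codon optimization, where codons are exchanged for synonymous ones without changing the encoded polypeptide.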

[0023] As used herein, an "essential amino acid" is an amino acid selected from Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, and Valine. However, it should be understood that "essential amino acids" can vary through a typical lifespan, e.g., cysteine, tyrosine, and arginine are considered essential amino acids in infant humans. Imura K, Okada A (1998). "Amino acid metabolism in pediatric patients". Nutrition 14 (1): 143-8. In addition, the amino acids arginine, cysteine, glycine, glutamine, histidine, proline, serine and tyrosine are considered "conditionally essential" in adults, meaning they are not normally required in the diet, but must be supplied exogenously to specific populations that do not synthesize them in adequate amounts. Furst P, Stehle P (1 June 2004). "What are the essential elements needed for the determination of amino acid requirements in humans?". Journal of Nutrition 134 (6 Suppl): 1558S-1565S; and Reeds PJ (1 July 2000). "Dispensable and indispensable amino acids for humans". J. Nutr. 130 (7): 1835S-40S.

[0024] As used herein, an "expression control sequence" refers to polynucleotide sequences which are necessary to effect the expression of coding sequences to which they are operatively linked. Expression control sequences are sequences which control the transcription, post-transcriptional events and translation of nucleic acid sequences. Expression control sequences include appropriate transcription initiation, termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (e.g., ribosome binding sites); sequences that enhance protein stability; and when desired, sequences that enhance protein secretion. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include promoter, ribosomal binding site, and transcription termination sequence.

[0025] The term "fusion protein" refers to a polypeptide comprising a polypeptide or fragment coupled to heterologous amino acid sequences. Fusion proteins are useful because they can be constructed to contain two or more desired functional elements that can be from two or more different proteins. A fusion protein comprises at least 10 contiguous amino acids from a polypeptide of interest, or at least 20 or 30 amino acids, or at least 40, 50 or 60 amino acids, or at least 75, 100 or 125 amino acids. The heterologous polypeptide included within the fusion protein is usually at least 6 amino acids in length, or at least 8 amino acids in length, or at least 15, 20, or 25 amino acids in length. Fusions that include larger polypeptides, such as an IgG Fc region, and even entire proteins, such as the green fluorescent protein ("GFP") chromophore-containing proteins, have particular utility. Fusion proteins can be produced recombinantly by constructing a nucleic acid sequence which encodes the polypeptide or a fragment thereof in frame with a nucleic acid sequence encoding a different protein or peptide and then expressing the fusion protein. Alternatively, a fusion protein can be produced chemically by crosslinking the polypeptide or a fragment thereof to another protein.

[0026] Sequence homology for polypeptides, which is also referred to as percent sequence identity, is typically measured using sequence analysis software. See, e.g., the Sequence Analysis Software Package of the Genetics Computer Group (GCG), University of Wisconsin Biotechnology Center, 910 University Avenue, Madison, Wis. 53705. Protein analysis software matches similar sequences using a measure of homology assigned to various substitutions, deletions and other modifications, including conservative amino acid substitutions. For instance, GCG contains programs such as "Gap" and "Bestfit" which can be used with default parameters to determine sequence homology or sequence identity between closely related polypeptides, such as homologous polypeptides from different species of organisms or between a wild-type polypeptide and a mutein thereof. See, e.g., GCG Version 6. An exemplary algorithm when comparing a particular polypeptide sequence to a database containing a large number of sequences from different organisms is the computer program BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Gish and States, Nature Genet. 3:266-272 (1993); Madden et al., Meth. Enzymol. 266:131-141 (1996); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); Zhang and Madden, Genome Res. 7:649-656 (1997)), especially blastp or tblastn (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)).

[0027] As used herein, the term "heterotrophic" refers to an organism that cannot fix carbon and uses organic carbon for growth.

[0028] As used herein, a polypeptide has "homology" or is "homologous" to a second polypeptide if the nucleic acid sequence that encodes the polypeptide has a similar sequence to the nucleic acid sequence that encodes the second polypeptide. Alternatively, a polypeptide has homology to a second polypeptide if the two polypeptides have similar amino acid sequences. (Thus, the term "homologous polypeptides" is defined to mean that the two polypeptides have similar amino acid sequences.) When "homologous" is used in reference to polypeptides or peptides, it is recognized that residue positions that are not identical often differ by conservative amino acid substitutions. A "conservative amino acid substitution" is one in which an amino acid residue is substituted by another amino acid residue having a side chain (R group) with similar chemical properties (e.g., charge or hydrophobicity). In general, a conservative amino acid substitution will not substantially change the functional properties of a polypeptide. In cases where two or more amino acid sequences differ from each other by conservative substitutions, the percent sequence identity or degree of homology can be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art. See, e.g., Pearson, 1994, Methods Mol. Biol. 24:307-31 and 25:365-89. The following six groups each contain amino acids that are conservative substitutions for one another: 1) Serine, Threonine; 2) Aspartic Acid, Glutamic Acid; 3) Asparagine, Glutamine; 4) Arginine, Lysine; 5) Isoleucine, Leucine, Methionine, Alanine, Valine, and 6) Phenylalanine, Tyrosine, Tryptophan. 
In some embodiments, polymeric molecules (e.g., a polypeptide sequence or nucleic acid sequence) are considered to be homologous to one another if their sequences are at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical. In some embodiments, polymeric molecules are considered to be "homologous" to one another if their sequences are at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% similar. The term "homologous" necessarily refers to a comparison between at least two sequences (nucleotide sequences or amino acid sequences). In some embodiments, two nucleotide sequences are considered to be homologous if the polypeptides they encode are at least about 50% identical, at least about 60% identical, at least about 70% identical, at least about 80% identical, or at least about 90% identical for at least one stretch of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50 or over 50 amino acids. In some embodiments, homologous nucleotide sequences are characterized by the ability to encode a stretch of at least 4-5 uniquely specified amino acids. Both the identity and the approximate spacing of these amino acids relative to one another must be considered for nucleotide sequences to be considered homologous. In some embodiments of nucleotide sequences less than 60 nucleotides in length, homology is determined by the ability to encode a stretch of at least 4-5 uniquely specified amino acids.
In some embodiments, two polypeptide sequences are considered to be homologous if the polypeptides are at least about 50% identical, at least about 60% identical, at least about 70% identical, at least about 80% identical, or at least about 90% identical for at least one stretch of at least about 20 amino acids. In other embodiments, two polypeptide sequences are considered to be homologous if the polypeptides are similar, such as at least about 50% similar, at least about 60% similar, at least about 70% similar, at least about 80% similar, or at least about 90% similar, or at least about 95% similar for at least one stretch of at least about 20 amino acids. In some embodiments similarity is demonstrated by fewer nucleotide changes that result in an amino acid change (e.g., a nucleic acid sequence having a single nucleotide change is more similar to a reference nucleic acid sequence than a nucleic acid sequence having two nucleotide changes, even if both changes result in an identical amino acid substitution).
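The six conservative substitution groups and the percent identity and similarity thresholds described above can be expressed as a short calculation. The following is a simplified sketch under stated assumptions, not the GCG or BLAST algorithms referenced earlier: it assumes the two sequences are already aligned, ungapped, and of equal length, and all function names and example sequences are hypothetical.

```python
# The six conservative substitution groups listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("ST"), set("DE"), set("NQ"), set("RK"), set("ILMAV"), set("FYW"),
]


def is_conservative(a, b):
    """True if residues a and b differ but belong to the same group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def percent_identity(x, y):
    """Percent of aligned positions carrying identical residues."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(a == b for a, b in zip(x, y)) / len(x)


def percent_similarity(x, y):
    """Percent of positions that are identical or conservatively substituted."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(a == b or is_conservative(a, b) for a, b in zip(x, y))
    return 100.0 * hits / len(x)


def has_homologous_stretch(x, y, window=20, threshold=50.0):
    """True if any window of the given length meets the identity threshold."""
    if len(x) != len(y):
        raise ValueError("sequences must be of equal length")
    return any(
        percent_identity(x[i:i + window], y[i:i + window]) >= threshold
        for i in range(len(x) - window + 1)
    )


# Hypothetical example: the second half of seq_b is unrelated to seq_a.
seq_a = "ACDEFGHIKLMNPQRSTVWY"
seq_b = "ACDEFGHIKL" + "ACDEFGHIKL"
print(percent_identity(seq_a, seq_b))
print(has_homologous_stretch(seq_a, seq_b, window=20, threshold=50.0))
```

Real sequence comparisons require an alignment step that handles insertions and deletions, which this sketch deliberately omits.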

[0029] As used herein, a "modified derivative" refers to polypeptides or fragments thereof that are substantially homologous in primary structural sequence to a reference polypeptide sequence but which include, e.g., in vivo or in vitro chemical and biochemical modifications or which incorporate amino acids that are not found in the reference polypeptide. Such modifications include, for example, acetylation, carboxylation, phosphorylation, glycosylation, ubiquitination, labeling, e.g., with radionuclides, and various enzymatic modifications, as will be readily appreciated by those skilled in the art. A variety of methods for labeling polypeptides and of substituents or labels useful for such purposes are well known in the art, and include radioactive isotopes such as 125I, 32P, 35S, and 3H, ligands that bind to labeled antiligands (e.g., antibodies), fluorophores, chemiluminescent agents, enzymes, and antiligands that can serve as specific binding pair members for a labeled ligand. The choice of label depends on the sensitivity required, ease of conjugation with the primer, stability requirements, and available instrumentation. Methods for labeling polypeptides are well known in the art. See, e.g., Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates (1992, and Supplements to 2002).

[0030] The term "nucleic acid fragment" as used herein refers to a nucleic acid sequence that has a deletion, e.g., a 5'-terminal or 3'-terminal deletion compared to a full-length reference nucleotide sequence. In an embodiment, the nucleic acid fragment is a contiguous sequence in which the nucleotide sequence of the fragment is identical to the corresponding positions in the naturally-occurring sequence. In some embodiments, fragments are at least 10, 15, 20, or 25 nucleotides long, or at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 nucleotides long. In some embodiments a fragment of a nucleic acid sequence is a fragment of an open reading frame sequence. In some embodiments such a fragment encodes a polypeptide fragment (as defined herein) of the protein encoded by the open reading frame nucleotide sequence.

[0031] A composition, formulation or product is "nutritional" or "nutritive" if it provides an appreciable amount of nourishment to its intended consumer, meaning the consumer assimilates all or a portion of the composition or formulation into a cell, organ, and/or tissue. Generally such assimilation into a cell, organ and/or tissue provides a benefit or utility to the consumer, e.g., by maintaining or improving the health and/or natural function(s) of said cell, organ, and/or tissue. A nutritional composition or formulation that is assimilated as described herein is termed "nutrition." By way of non-limiting example, a polypeptide is nutritional if it provides an appreciable amount of polypeptide nourishment to its intended consumer, meaning the consumer assimilates all or a portion of the protein, typically in the form of single amino acids or small peptides, into a cell, organ, and/or tissue. "Nutrition" also means the process of providing to a subject, such as a human or other mammal, a nutritional composition, formulation, product or other material. A nutritional product need not be "nutritionally complete," meaning if consumed in sufficient quantity, the product provides all carbohydrates, lipids, essential fatty acids, essential amino acids, conditionally essential amino acids, vitamins, and minerals required for health of the consumer. Additionally, a "nutritionally complete protein" contains all protein nutrition required (meaning the amount required for physiological normalcy by the organism) but does not necessarily contain micronutrients such as vitamins and minerals, carbohydrates or lipids.

[0032] In preferred embodiments, a composition or formulation is nutritional in its provision of polypeptide capable of decomposition (i.e., the breaking of a peptide bond, often termed protein digestion) to single amino acids and/or small peptides (e.g., two amino acids, three amino acids, or four amino acids, possibly up to ten amino acids) in an amount sufficient to provide a "nutritional benefit." In addition, in certain embodiments provided are nutritional polypeptides that transit across the gastrointestinal wall and are absorbed into the bloodstream as small peptides (e.g., larger than single amino acids but smaller than about ten amino acids) or larger peptides, oligopeptides or polypeptides (e.g., >11 amino acids). A nutritional benefit in a polypeptide-containing composition can be demonstrated and, optionally, quantified, by a number of metrics. For example, a nutritional benefit is the benefit to a consuming organism equivalent to or greater than at least about 0.5% of a reference daily intake value of protein, such as about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100% or greater than about 100% of a reference daily intake value. Alternatively, a nutritional benefit is demonstrated by the feeling and/or recognition of satiety by the consumer. In other embodiments, a nutritional benefit is demonstrated by incorporation of a substantial amount of the polypeptide component of the composition or formulation into the cells, organs and/or tissues of the consumer, such incorporation generally meaning that single amino acids or short peptides are used to produce polypeptides de novo intracellularly. A "consumer" or a "consuming organism" means any animal capable of ingesting the product having the nutritional benefit. Typically, the consumer will be a mammal such as a healthy human, e.g., a healthy infant, child, adult, or older adult. 
Alternatively, the consumer will be a mammal such as a human (e.g., an infant, child, adult or older adult) at risk of developing or suffering from a disease, disorder or condition characterized by (i) the lack of adequate nutrition and/or (ii) the alleviation thereof by the nutritional products of the present invention. An "infant" is generally a human under about age 1 or 2, a "child" is generally a human under about age 18, and an "older adult" or "elderly" human is a human aged about 65 or older.
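
The percentage-of-reference-intake metric described above can be illustrated with a minimal sketch. The function names and the 50 g reference daily intake used in the usage note are hypothetical and chosen only for illustration; the 0.5% default is the lowest threshold recited in the paragraph above.

```python
def percent_of_reference_intake(protein_g: float, reference_daily_intake_g: float) -> float:
    """Return the protein amount as a percentage of a reference daily intake value."""
    if reference_daily_intake_g <= 0:
        raise ValueError("reference daily intake must be positive")
    return 100.0 * protein_g / reference_daily_intake_g


def meets_nutritional_benefit(protein_g: float,
                              reference_daily_intake_g: float,
                              threshold_percent: float = 0.5) -> bool:
    """Check whether the protein amount reaches the chosen percentage threshold.

    The default of 0.5% mirrors the "at least about 0.5%" floor in the text;
    any other recited percentage (1%, 2%, ... 100%) can be passed instead.
    """
    return percent_of_reference_intake(protein_g, reference_daily_intake_g) >= threshold_percent
```

For instance, under an assumed reference daily intake of 50 g of protein, calling `meets_nutritional_benefit(5.0, 50.0, threshold_percent=10.0)` tests whether a 5 g serving reaches the 10% level.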

[0033] In other preferred embodiments, a composition or formulation is nutritional in its provision of carbohydrate capable of hydrolysis by the intended consumer (termed a "nutritional carbohydrate"). A nutritional benefit in a carbohydrate-containing composition can be demonstrated and, optionally, quantified, by a number of metrics. For example, a nutritional benefit is the benefit to a consuming organism equivalent to or greater than at least about 2% of a reference daily intake value of carbohydrate.

[0034] A polypeptide "nutritional domain" as used herein means any domain of a polypeptide that is capable of providing nutrition. Preferably, a polypeptide nutritional domain provides one or more advantages over the full-length polypeptide containing the nutritional domain, such as the nutritional domain provides more nutrition than the full-length polypeptide. For example, a polypeptide nutritional domain has a higher concentration of desirable amino acids, has a lower concentration of undesirable amino acids, contains a site for cleavage by a digestive protease, is easier to digest and/or is easier to produce from the digestion of a larger polypeptide, has improved storage characteristics, or a combination of these and/or other factors, in comparison to (i) a reference polypeptide or a reference polypeptide-containing mixture or composition, (ii) the protein(s) or polypeptide(s) present in an agriculturally-derived food product, and/or (iii) the protein or polypeptide products present in the diet of a mammalian subject. Other advantages of a polypeptide nutritional domain include easier and/or more efficient production, different or more advantageous physicochemical properties, and/or different or more advantageous safety properties (e.g., elimination of one or more allergy domains) relative to the full-length polypeptide. A reference polypeptide can be a naturally occurring polypeptide or a recombinantly produced polypeptide, which in turn may have an amino acid sequence identical to or different from a naturally occurring polypeptide. A reference polypeptide may also be a consensus amino acid sequence not present in a naturally-occurring polypeptide. Additionally, a reference polypeptide-containing mixture or composition can be a naturally-occurring mixture, such as a mixture of polypeptides present in a dairy product such as milk or whey, or can be a synthetic mixture of polypeptides (which, in turn, can be naturally-occurring or synthetic). In certain embodiments the nutritional domain contains an amino acid sequence having an N-terminal amino acid and/or a C-terminal amino acid different from the N-terminal amino acid and/or a C-terminal amino acid of a reference secreted polypeptide, such as a full-length secreted polypeptide. 
For example, a nutritional domain has an N-terminal amino acid sequence that corresponds to an amino acid sequence internal to a larger secreted polypeptide that contains the nutritional domain. A nutritional domain may include or exclude a signal sequence of a larger secreted polypeptide. As used herein, a polypeptide that "contains" a polypeptide nutritional domain contains the entirety of the polypeptide nutritional domain as well as at least one additional amino acid, either N-terminal or C-terminal to the polypeptide nutritional domain. Generally polypeptide nutritional domains are secreted from the cell or organism containing a nucleic acid encoding the nutritional domain, and are termed "secreted polypeptide nutritional domains," and, in circumstances wherein the nutritional domain is secreted from a unicellular (or single celled) organism, it is termed a "unicellular secreted polypeptide nutritional domain."

[0035] In other preferred embodiments, a composition or formulation is nutritional in its provision of lipid capable of digestion, incorporation, conversion, or other cellular uses by the intended consumer (termed a "nutritional lipid"). A nutritional benefit in a lipid-containing composition can be demonstrated and, optionally, quantified, by a number of metrics. For example, a nutritional benefit is the benefit to a consuming organism equivalent to or greater than at least about 2% of a reference daily intake value of lipid (i.e., fat).

[0036] As used herein, "operatively linked" or "operably linked" expression control sequences refers to a linkage in which the expression control sequence is contiguous with the gene of interest to control the gene of interest, as well as expression control sequences that act in trans or at a distance to control the gene of interest.

[0037] The term "percent sequence identity" or "identical" in the context of nucleic acid sequences refers to the residues in the two sequences that are the same when aligned for maximum correspondence. There are a number of different algorithms known in the art that can be used to measure nucleotide sequence identity. For instance, polynucleotide sequences can be compared using FASTA, Gap or Bestfit, which are programs in Wisconsin Package Version 10.0, Genetics Computer Group (GCG), Madison, Wis. FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences. Pearson, Methods Enzymol. 183:63-98 (1990).
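
By way of illustration only, percent sequence identity over two sequences that have already been aligned can be sketched as follows. This sketch does not perform the alignment itself (programs such as FASTA, Gap or Bestfit do that), and it assumes one particular convention, namely that the denominator is the full alignment length including gapped columns; other conventions exist. The function name and the gap character are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identity is counted over every alignment column, so a column
    containing a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # A column matches only when both residues are equal and neither is a gap.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)
```

For example, `percent_identity("ACGT-ACGT", "ACGTTACGA")` counts seven matching columns out of nine.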

[0038] The term "polynucleotide," "nucleic acid molecule," "nucleic acid," or "nucleic acid sequence" refers to a polymeric form of nucleotides of at least 10 bases in length. The term includes DNA molecules (e.g., cDNA or genomic or synthetic DNA) and RNA molecules (e.g., mRNA or synthetic RNA), as well as analogs of DNA or RNA containing non-natural nucleotide analogs, non-native internucleoside bonds, or both. The nucleic acid can be in any topological conformation. For instance, the nucleic acid can be single-stranded, double-stranded, triple-stranded, quadruplexed, partially double-stranded, branched, hairpinned, circular, or in a padlocked conformation. A "synthetic" RNA, DNA or a mixed polymer is one created outside of a cell, for example one synthesized chemically. The term "nucleic acid fragment" as used herein refers to a nucleic acid sequence that has a deletion, e.g., a 5'-terminal or 3'-terminal deletion of one or more nucleotides compared to a full-length reference nucleotide sequence. In an embodiment, the nucleic acid fragment is a contiguous sequence in which the nucleotide sequence of the fragment is identical to the corresponding positions in the naturally-occurring sequence. In some embodiments, fragments are at least 10, 15, 20, or 25 nucleotides long, or at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800 or greater than 1800 nucleotides long. In some embodiments a fragment of a nucleic acid sequence is a fragment of an open reading frame sequence. In some embodiments such a fragment encodes a polypeptide fragment (as defined herein) of the polypeptide encoded by the open reading frame nucleotide sequence.

[0039] The terms "polypeptide" and "protein" can be interchanged, and these terms encompass both naturally-occurring and non-naturally occurring polypeptides, and, as provided herein or as generally known in the art, fragments, mutants, derivatives and analogs thereof. A polypeptide can be monomeric, meaning it has a single chain, or polymeric, meaning it is composed of two or more chains, which can be covalently or non-covalently associated. Further, a polypeptide may comprise a number of different domains each of which has one or more distinct activities. For the avoidance of doubt, a polypeptide can be any length greater than or equal to two amino acids. The term "isolated polypeptide" is a polypeptide that by virtue of its origin or source of derivation (1) is not associated with naturally associated components that accompany it in any of its native states, (2) exists in a purity not found in nature, where purity can be adjudged with respect to the presence of other cellular material (e.g., is free of other polypeptides from the same species or from the host species in which the polypeptide was produced), (3) is expressed by a cell from a different species, (4) is recombinantly expressed by a cell (e.g., a polypeptide is an "isolated polypeptide" if it is produced from a recombinant nucleic acid present in a host cell and separated from the producing host cell), (5) does not occur in nature (e.g., it is a domain or other fragment of a polypeptide found in nature or it includes amino acid analogs or derivatives not found in nature or linkages other than standard peptide bonds), or (6) is otherwise produced, prepared, and/or manufactured by the hand of man. Thus, an "isolated polypeptide" includes a polypeptide that is produced in a host cell from a recombinant nucleic acid (such as a vector), regardless of whether the host cell naturally produces a polypeptide having an identical amino acid sequence. 
A "polypeptide" includes a polypeptide that is produced by a host cell via overexpression, e.g., homologous overexpression of the polypeptide from the host cell such as by altering the promoter of the polypeptide to increase its expression to a level above its normal expression level in the host cell in the absence of the altered promoter. A polypeptide that is chemically synthesized or synthesized in a cellular system different from a cell from which it naturally originates will be "isolated" from its naturally associated components. A polypeptide may also be rendered substantially free of naturally associated components by isolation, using protein purification techniques well known in the art. As thus defined, "isolated" does not necessarily require that the protein, polypeptide, peptide or oligopeptide so described has been physically removed from a cell in which it was synthesized.

[0040] The term "polypeptide fragment" or "protein fragment" as used herein refers to a polypeptide or domain thereof that has fewer amino acids compared to a reference polypeptide, e.g., a full-length polypeptide or a polypeptide domain of a naturally occurring protein. A "naturally occurring protein" or "naturally occurring polypeptide" includes a polypeptide having an amino acid sequence produced by a non-recombinant cell or organism. In an embodiment, the polypeptide fragment is a contiguous sequence in which the amino acid sequence of the fragment is identical to the corresponding positions in the naturally-occurring sequence. Fragments typically are at least 5, 6, 7, 8, 9 or 10 amino acids long, or at least 12, 14, 16 or 18 amino acids long, or at least 20 amino acids long, or at least 25, 30, 35, 40 or 45 amino acids long, or at least 50, 60, 70, 80, 90 or 100 amino acids long, or at least 110, 120, 130, 140, 150, 160, 170, 180, 190 or 200 amino acids long, or 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600 or greater than 600 amino acids long. A fragment can be a portion of a larger polypeptide sequence that is digested inside or outside the cell. Thus, a polypeptide that is 50 amino acids in length can be produced intracellularly, but proteolyzed inside or outside the cell to produce a polypeptide less than 50 amino acids in length. This is of particular significance for polypeptides shorter than about 25 amino acids, which can be more difficult than larger polypeptides to produce recombinantly or to purify once produced recombinantly. The term "peptide" as used herein refers to a short polypeptide or oligopeptide, e.g., one that typically contains fewer than about 50 amino acids and more typically fewer than about 30 amino acids, or more typically fewer than about 15 amino acids, such as fewer than about 10, 9, 8, 7, 6, 5, 4, or 3 amino acids. 
The term as used herein encompasses analogs and mimetics that mimic structural and thus biological function.
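
The contiguous-fragment embodiment of the "polypeptide fragment" definition above can be sketched as a simple check. This is a minimal illustration, not a definitive test: the function name is hypothetical, sequences are assumed to be one-letter amino acid strings, and the default minimum length of 5 is the smallest length recited above.

```python
def is_contiguous_fragment(candidate: str, reference: str, min_length: int = 5) -> bool:
    """Return True if `candidate` is a contiguous stretch of `reference`.

    The candidate must be shorter than the reference (it has fewer amino
    acids), at least `min_length` residues long, and identical to the
    corresponding positions of the reference sequence.
    """
    if not (min_length <= len(candidate) < len(reference)):
        return False
    # Substring membership enforces both contiguity and positional identity.
    return candidate in reference
```

With an arbitrary illustrative reference such as `"MKTAYIAKQRQISFVK"`, the stretch `"AYIAKQ"` qualifies, whereas `"AYIKQ"` does not because it skips a residue.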

[0041] As used herein, "polypeptide mutant" or "mutein" refers to a polypeptide whose sequence contains an insertion, duplication, deletion, rearrangement or substitution of one or more amino acids compared to the amino acid sequence of a reference protein or polypeptide, such as a native or wild-type protein. A mutein may have one or more amino acid point substitutions, in which a single amino acid at a position has been changed to another amino acid, one or more insertions and/or deletions, in which one or more amino acids are inserted or deleted, respectively, in the sequence of the reference protein, and/or truncations of the amino acid sequence at either or both the amino or carboxy termini. A mutein may have the same or a different biological activity compared to the reference protein. In some embodiments, a mutein has, for example, at least 85% overall sequence homology to its counterpart reference protein. In some embodiments, a mutein has at least 90% overall sequence homology to the wild-type protein. In other embodiments, a mutein exhibits at least 95% sequence identity, or 98%, or 99%, or 99.5% or 99.9% overall sequence identity.

[0042] As used herein, a "polypeptide tag for affinity purification" is any polypeptide that has a binding partner that can be used to isolate or purify a second protein or polypeptide sequence of interest fused to the first "tag" polypeptide. Several examples are well known in the art and include a His-6 tag, a FLAG epitope, a c-myc epitope, a Strep-TAGII, a biotin tag, a glutathione S-transferase (GST), a chitin binding protein (CBP), a maltose binding protein (MBP), or a metal affinity tag.

[0043] The terms "purify," "purifying" and "purified" refer to a substance (or entity, composition, product or material) that has been separated from at least some of the components with which it was associated either when initially produced (whether in nature or in an experimental setting), or during any time after its initial production. A substance such as a nutritional polypeptide will be considered purified if it is isolated at production, or at any level or stage up to and including a final product, but a final product may contain other materials up to about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or above about 90% and still be considered "isolated." Purified substances or entities can be separated from at least about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or more of the other components with which they were initially associated. In some embodiments, purified substances are more than about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more than about 99% pure. In the instance of polypeptides and other polypeptides provided herein, such a polypeptide can be purified from one or more other polypeptides capable of being secreted from the unicellular organism that secretes the polypeptide. As used herein, a polypeptide substance is "pure" if it is substantially free of other components or other polypeptide components.

[0044] As used herein, "recombinant" refers to a biomolecule, e.g., a gene or polypeptide, that (1) has been removed from its naturally occurring environment, (2) is not associated with all or a portion of a polynucleotide in which the gene is found in nature, (3) is operatively linked to a polynucleotide which it is not linked to in nature, or (4) does not occur in nature. The term "recombinant" may also encompass a "heterologous" biomolecule or a "heterologously expressed" polypeptide. Also, "recombinant" refers to a cell or an organism, such as a unicellular organism, herein termed a "recombinant unicellular organism," a "recombinant host" or a "recombinant cell" that contains, produces and/or secretes a biomolecule, which can be a recombinant biomolecule or a non-recombinant biomolecule. For example, a recombinant unicellular organism may contain a recombinant nucleic acid providing for enhanced production and/or secretion of a recombinant polypeptide or a non-recombinant polypeptide. A "recombinant" cell or organism is also intended to refer to a cell into which a recombinant nucleic acid such as a recombinant vector has been introduced. A "recombinant unicellular organism" includes a recombinant microorganism host cell and refers not only to the particular subject cell but to the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the terms herein. The term "recombinant" can be used in reference to cloned DNA isolates, chemically-synthesized polynucleotide analogs, or polynucleotide analogs that are biologically synthesized by heterologous systems, as well as polypeptides and/or mRNAs encoded by such nucleic acids. Thus, for example, a polypeptide synthesized by a microorganism is recombinant, for example, if it is produced from an mRNA transcribed from a recombinant gene or other nucleic acid sequence present in the cell.

[0045] As used herein, an endogenous nucleic acid sequence in the genome of an organism (or the encoded polypeptide product of that sequence) is deemed "recombinant" herein if a heterologous sequence is placed adjacent to the endogenous nucleic acid sequence, such that the expression of this endogenous nucleic acid sequence is altered. In this context, a heterologous sequence is a sequence that is not naturally adjacent to the endogenous nucleic acid sequence, whether or not the heterologous sequence is itself endogenous (originating from the same host cell or progeny thereof) or exogenous (originating from a different host cell or progeny thereof). By way of example, a promoter sequence can be substituted (e.g., by homologous recombination) for the native promoter of a gene in the genome of a host cell, such that this gene has an altered expression pattern. This gene would now become "recombinant" because it is separated from at least some of the sequences that naturally flank it. A nucleic acid is also considered "recombinant" if it contains any modifications that do not naturally occur to the corresponding nucleic acid in a genome. For instance, an endogenous coding sequence is considered "recombinant" if it contains an insertion, deletion or a point mutation introduced artificially, e.g., by human intervention. A "recombinant nucleic acid" also includes a nucleic acid integrated into a host cell chromosome at a heterologous site and a nucleic acid construct present as an episome.

[0046] The term "recombinant host cell" (or simply "recombinant cell" or "host cell"), as used herein, is intended to refer to a cell into which a recombinant nucleic acid such as a recombinant vector has been introduced. In some instances the word "cell" is replaced by a name specifying a type of cell. For example, a "recombinant microorganism" is a recombinant host cell that is a microorganism host cell and a "recombinant cyanobacteria" is a recombinant host cell that is a cyanobacteria host cell. It should be understood that such terms are intended to refer not only to the particular subject cell but to the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term "recombinant host cell," "recombinant cell," and "host cell", as used herein. A recombinant host cell can be an isolated cell or cell line grown in culture or can be a cell which resides in a living tissue or organism.

[0047] As used herein, "secrete," "secretion" and "secreted" all refer to the act or process by which a polypeptide is relocated from the cytoplasm of a cell of a multicellular organism or unicellular organism into the extracellular milieu thereof. As provided herein, such secretion may occur actively or passively. Further, the terms "excrete," "excretion" and "excreted" generally connote passive clearing of a material from a cell or unicellular organism; however, as appropriate such terms can be associated with the production and transfer of materials outwards from the cell or unicellular organism.

[0048] In general, "stringent hybridization" is performed at about 25°C below the thermal melting point (Tm) for the specific DNA hybrid under a particular set of conditions. "Stringent washing" is performed at temperatures about 5°C lower than the Tm for the specific DNA hybrid under a particular set of conditions. The Tm is the temperature at which 50% of the target sequence hybridizes to a perfectly matched probe. See Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989), page 9.51, hereby incorporated by reference. For purposes herein, "stringent conditions" are defined for solution phase hybridization as aqueous hybridization (i.e., free of formamide) in 6xSSC (where 20xSSC contains 3.0 M NaCl and 0.3 M sodium citrate), 1% SDS at 65°C for 8-12 hours, followed by two washes in 0.2xSSC, 0.1% SDS at 65°C for 20 minutes. It will be appreciated by the skilled worker that hybridization at 65°C will occur at different rates depending on a number of factors including the length and percent identity of the sequences which are hybridizing.

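The arithmetic behind the stringent hybridization and washing definition above can be sketched as follows. This is an illustrative calculation only: the function and constant names are hypothetical, the Tm must be supplied by the user (it is not computed here), and the 25°C and 5°C offsets and the 20xSSC composition are taken from the paragraph above.

```python
# Composition of 20xSSC as stated in the text (molar concentrations).
STOCK_20X_SSC_M = {"NaCl": 3.0, "sodium citrate": 0.3}


def ssc_molarity(fold: float) -> dict:
    """Molar concentrations of an SSC buffer at the given fold (e.g., 6 for 6xSSC)."""
    if fold < 0:
        raise ValueError("fold concentration must be non-negative")
    # Linear dilution from the 20x stock.
    return {solute: conc * fold / 20.0 for solute, conc in STOCK_20X_SSC_M.items()}


def stringent_temperatures(tm_celsius: float) -> dict:
    """Approximate hybridization and wash temperatures relative to a supplied Tm."""
    return {
        "hybridization": tm_celsius - 25.0,  # about 25 degrees C below Tm
        "wash": tm_celsius - 5.0,            # about 5 degrees C below Tm
    }
```

Under these figures, 6xSSC corresponds to about 0.9 M NaCl and 0.09 M sodium citrate, and 0.2xSSC to about 0.03 M NaCl and 0.003 M sodium citrate.
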
[0049] The term "substantial homology" or "substantial similarity," when referring to a nucleic acid or fragment thereof, indicates that, when optimally aligned with appropriate nucleotide insertions or deletions with another nucleic acid (or its complementary strand), there is nucleotide sequence identity in at least about 76%, 80%, 85%, or at least about 90%, or at least about 95%, 96%, 97%, 98% or 99% of the nucleotide bases, as measured by any well-known algorithm of sequence identity, such as FASTA, BLAST or Gap, as discussed above.

[0050] The term "sufficient amount" means an amount sufficient to produce a desired effect, e.g., an amount sufficient to modulate protein aggregation in a cell.

[0051] A "synthetic" RNA, DNA, protein, polypeptide or a mixed polymer is one created outside of a cell, for example one synthesized chemically.

[0052] The term "therapeutically effective amount" is an amount that is effective to ameliorate a symptom of a disease. A therapeutically effective amount can be a "prophylactically effective amount" as prophylaxis can be considered therapy.

[0053] As used herein, a "variant" of a nucleic acid or a nucleic acid sequence, such as a gene or a genomic sequence, encompasses nucleic acid sequences that differ by at least one nucleotide from a reference nucleic acid sequence; a "variant" also includes the deletion of the gene or the genomic sequence from a genome. A "variant" organism includes an organism containing a variant nucleic acid.

[0054] As used herein, a "vector" is intended to refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a "plasmid," which generally refers to a circular double stranded DNA loop into which additional DNA segments can be ligated, but also includes linear double-stranded molecules such as those resulting from amplification by the polymerase chain reaction (PCR) or from treatment of a circular plasmid with a restriction enzyme. Other vectors include cosmids, bacterial artificial chromosomes (BAC) and yeast artificial chromosomes (YAC). Another type of vector is a viral vector, wherein additional DNA segments can be ligated into the viral genome (discussed in more detail below). Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., vectors having an origin of replication which functions in the host cell). Other vectors can be integrated into the genome of a host cell upon introduction into the host cell, and are thereby replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as "recombinant expression vectors" (or simply "expression vectors"). It must be noted that, as used in the specification and the appended claims, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise.

Systems for Production of Nutritive Polypeptides

[0055] The invention relates to the field of genetic engineering and more particularly to the area of gene and/or genome modification of organisms, particularly eukaryotes, such as eukaryotes useful as recombinant hosts for production of nutritive polypeptides. The invention also concerns methods of making specific tools for use in methods of genome analysis and genetic modification. The invention more particularly relates to synthesis of recombinant host cells carrying variants of one or more genes, such variants having the capability of producing, and preferably secreting, nutritive polypeptides.

Nutritive Polypeptides and Amino Acid Sequences

[0056] Proteins present in dietary food sources can vary greatly in their nutritive value. Provided are nutritive polypeptides that have enhanced nutritive value and physiological and pharmacological effects due to their amino acid content and digestibility. Provided are nutritive polypeptides that have enhanced levels of essential amino acids; the inadequate availability of such essential amino acids in a person negatively impacts general health and physiology through the perturbation of a network of cellular functions, and is associated with a wide array of health issues and diseases. Also provided are nutritive polypeptides that have reduced levels of certain amino acids; the presence or overabundance of such amino acids in the diet of an affected subject results in increased morbidity and mortality.

[0057] Traditionally, nutritionists and health researchers have utilized specific source ingredients (e.g., whey protein, egg whites, soya) or fractionates and isolates (e.g., soy protein isolates) to modulate the relative concentration of total protein in the diet, without the ability to modulate the specific amino acid constituents.

[0058] Herein provided are nutritive polypeptides capable of transforming health and treating, preventing and reducing the severity of a multitude of diseases, disorders and conditions associated with amino acid pathophysiology, as they are selected for specific physiologic benefits to improve health and address many nutrition-related conditions, including gastrointestinal malabsorption, muscle wasting, diabetes or pre-diabetes, obesity, oncology, metabolic diseases, and other cellular and systemic diseases. Also provided are the compositions and formulations that contain the nutritive polypeptides, as food, beverages, medical foods, supplements, and pharmaceuticals.

[0059] Herein are provided important elucidations in the genomics, proteomics, protein characterization and production of nutritive polypeptides. The present invention utilizes the synergistic advancements, described herein, of (a) the genomics of edible species (those human food source organisms) and human genomics, (b) substantial advances in protein identification and quantification in food protein and food nucleic acid libraries, (c) new correlations between protein physical chemistry, solubility, structure-digestibility relationships and amino acid absorption and metabolism in animals and humans, (d) physiology and pathophysiology information of how amino acids, the components of nutritive polypeptides, affect protein malnutrition, chronic disease, responses to acute injury, and aging, (e) recombinant nutritive polypeptide production utilizing a phylogenetically broad spectrum of host organisms, and (f) qualification of allergenicity and toxicogenicity and in vitro and in vivo tests to assess human safety of orally consumed nutritive polypeptides.

[0060] Identification and selection of amino acid sequences encoding nutritive polypeptides.

[0061] In its broadest sense, a nutritive polypeptide encompasses a polypeptide capable of delivering amino acid and peptide nutrition to its intended consumer, who derives a benefit from such consumption. Each nutritive polypeptide contains one or more amino acid sequences, and the present invention provides methods by which an amino acid sequence is identified and utilized in production, formulation and administration of the nutritive polypeptide having such an amino acid sequence.

[0062] In some embodiments, the source of a nutritive polypeptide amino acid sequence encompasses any protein-containing material, e.g., a food, beverage, composition or other product, known to be eaten, or otherwise considered suitable for consumption, without deleterious effect by, e.g., a human or other organism, in particular a mammal.

[0063] Nutritive polypeptide amino acid sequences derived from edible species.

[0064] In some embodiments a nutritive polypeptide comprises or consists of a protein or fragment of a protein that naturally occurs in an edible product, such as a food, or in the organism that generates biological material used in or as the food. In some embodiments an "edible species" is a species known to produce a protein that can be eaten by humans without deleterious effect. A protein or polypeptide present in an edible species, or encoded by a nucleic acid present in the edible species, is termed an "edible species protein" or "edible species polypeptide" or, if the edible species is a species consumed by a human, the term "naturally occurring human food protein" is used interchangeably herein. Some edible products are an infrequent but known component of the diet of only a small group of a type of mammal in a limited geographic location while others are a dietary staple throughout much of the world. In other embodiments an edible product is one not known to be previously eaten by any mammal, but that is demonstrated to be edible upon testing or analysis of the product or one or more proteins contained in the product.

[0065] Food organisms include but are not limited to those organisms of edible species disclosed in PCT/US2013/032232, filed March 15, 2013, PCT/US2013/032180, filed March 15, 2013, PCT/US2013/032225, filed March 15, 2013, PCT/US2013/032218, filed March 15, 2013, PCT/US2013/032212, filed March 15, 2013, PCT/US2013/032206, filed March 15, 2013, and PCT/US2013/038682, filed April 29, 2013, and any phylogenetically related organisms.

[0066] In some embodiments a nutritive polypeptide amino acid sequence is identified in a protein that is present in a food source, such as an abundant protein in food, or is a derivative or mutein thereof, or is a fragment of an amino acid sequence of a protein in food or a derivative or mutein thereof. An abundant protein is a protein that is present in a higher concentration in a food relative to other proteins present in the food. Alternatively, a nutritive polypeptide amino acid sequence is identified from an edible species that produces a protein containing the amino acid sequence in relatively lower abundance, but the protein is detectable in a food product derived from the edible species, or from biological material produced by the edible species. In some embodiments a nucleic acid that encodes the protein is detectable in a food product derived from the edible species, or the nucleic acid is detectable from a biological material produced by the edible species. An edible species can produce a food that is a known component of the diet of only a small group of a type of mammal in a limited geographic location, or a dietary staple throughout much of the world.

[0067] Exemplary edible species include animals such as goats, cows, chickens, pigs and fish. In some embodiments the abundant protein in food is selected from chicken egg proteins such as ovalbumin, ovotransferrin, and ovomucoid; meat proteins such as myosin, actin, tropomyosin, collagen, and troponin; cereal proteins such as casein, alpha1 casein, alpha2 casein, beta casein, kappa casein, beta-lactoglobulin, alpha-lactalbumin, glycinin, beta-conglycinin, glutelin, prolamine, gliadin, glutenin, albumin, globulin; chicken muscle proteins such as albumin, enolase, creatine kinase, phosphoglycerate mutase, triosephosphate isomerase, apolipoprotein, ovotransferrin, phosphoglucomutase, phosphoglycerate kinase, glycerol-3-phosphate dehydrogenase, glyceraldehyde 3-phosphate dehydrogenase, hemoglobin, cofilin, glycogen phosphorylase, fructose-1,6-bisphosphatase, actin, myosin, tropomyosin α-chain, casein kinase, glycogen phosphorylase, fructose-1,6-bisphosphatase, aldolase, tubulin, vimentin, endoplasmin, lactate dehydrogenase, destrin, transthyretin, fructose bisphosphate aldolase, carbonic anhydrase, aldehyde dehydrogenase, annexin, adenosyl homocysteinase; pork muscle proteins such as actin, myosin, enolase, titin, cofilin, phosphoglycerate kinase, enolase, pyruvate dehydrogenase, glycogen phosphorylase, triosephosphate isomerase, myokinase; and fish proteins such as parvalbumin, pyruvate dehydrogenase, desmin, and triosephosphate isomerase.

[0068] Nutritive polypeptides may contain amino acid sequences present in edible species polypeptides. In one embodiment, a biological material from an edible species is analyzed to determine the protein content in the biological material. An exemplary method of analysis is to use mass spectrometry analysis of the biological material, as provided in the Examples below. Another exemplary method of analysis is to generate a cDNA library of the biological material to create a library of edible species cDNAs, and then express the cDNA library in an appropriate recombinant expression host, as provided in the Examples below. Another exemplary method of analysis is to query a nucleic acid and/or protein sequence database as provided in the Examples below.

[0069] By way of non-limiting examples, polypeptides of the present invention are provided in Table 1. The Predicted leader column shows the sequence indices of predicted leaders (if a leader exists). The Fragment Indices column shows the sequence indices of fragment sequences. The DBID column lists either the UniProt or GenBank Accession numbers for each sequence as available as of September 24, 2014, each of which is herein incorporated by reference. DBIDs with only numerical characters are from a GenBank database, and those with mixed alphabetical/numerical characters are from a UniProt database.

[0070] Nutritive glycoproteins and nutritive polypeptides with modulated glycosylation produced by recombinant host cells.

[0071] In some embodiments provided are formulations containing a nutritive polypeptide that is identical to the amino acid sequence of a polypeptide in a reference edible species glycoprotein, but the carbohydrate component of the nutritive polypeptide differs from a carbohydrate component of the reference edible species glycoprotein. The nutritive polypeptide is produced, for example, by expressing the polypeptide of the reference glycoprotein in a non-native host such as Aspergillus or Saccharomyces, optionally such host contains a variant nucleic acid such that glycosylation of the nutritive polypeptide is modified compared to a reference nutritive polypeptide produced in a reference host not containing the variant nucleic acid. Also provided are variant nutritive polypeptides, where the amino acid sequence differs from the amino acid sequence of a polypeptide in a reference glycoprotein by <1%, <5%, <10%, or more than 10%, and the mass of the carbohydrate component of the nutritive polypeptide is different from the mass of the carbohydrate component of the reference glycoprotein. The nutritive polypeptide variant is created by the insertion, deletion, substitution, or replacement of amino acid residues in the amino acid sequence of the polypeptide of the reference glycoprotein. Preferably, the nutritive polypeptide has distinguishable chemical, biochemical, biophysical, biological, or immunological properties from the reference glycoprotein. For example, the nutritive polypeptide is more hygroscopic, hydrophilic, or soluble in aqueous solutions than the reference glycoprotein. Alternatively, the nutritive polypeptide is less hygroscopic, hydrophilic, or soluble in aqueous solutions than the reference glycoprotein.

[0072] The term "glycan" or "glycoyl" refers to a polysaccharide or oligosaccharide which may be linked to a polypeptide, lipid, or proteoglycan. In some embodiments, a glycan is linked covalently or non-covalently to the polypeptide. In some embodiments the linkage occurs via a glycosidic bond. In some embodiments, the linkage is directly between the glycan (or glycoyl) and polypeptide or via an intermediary molecule. In some embodiments, the glycosidic bond is N-linked or O-linked. The term "polysaccharide" or "oligosaccharide" refers to one or more monosaccharide units joined together by glycosidic bonds. In some embodiments, the polysaccharide or oligosaccharide has a linear or branched structure. In some embodiments, the monosaccharide units comprise N-acetyl galactosamine, N-acetylglucosamine, galactose, neuraminic acid, fructose, mannose, fucose, glucose, xylose, N-acetylneuraminic acid, N-glycolylneuraminic acid, O-lactyl-N-acetylneuraminic acid, O-acetyl-N-acetylneuraminic acid, or O-methyl-N-acetylneuraminic acid. In some embodiments, the monosaccharide is modified by a phosphate, sulfate, or acetate group. The term "glycosylation acceptor site" refers to an amino acid along a polypeptide which carries a glycan or glycoyl in the native composition. In some embodiments the acceptor site consists of a nucleophilic acceptor of a glycosidic bond. In some embodiments, the nucleophilic acceptor site consists of an amino group. In some embodiments the amino acid consists of an asparagine, arginine, serine, threonine, hydroxyproline, hydroxylysine, tryptophan, phosphothreonine, serine, or phosphoserine. The term "exogenous glycosylation acceptor site" refers to a glycosylation acceptor site not present in the native composition of the polypeptide. In some embodiments the amino acid for the exogenous glycosylation acceptor site did not carry a glycan or glycoyl in the native composition. In some embodiments, the amino acid does not occur in the primary sequence of the polypeptide in the native composition. The term "exogenous glycan" or "exogenous glycoyl" refers to a glycan or glycoyl that occupies a glycosylation acceptor site, which was not present in the native composition on the same glycosylation acceptor site. In some embodiments, the glycosylation acceptor site is an exogenous glycosylation site or a native glycosylation site. The term "glycoprotein" refers to a polypeptide that is bound to at least one glycan or glycoyl.

[0073] Disclosed herein are formulations containing isolated nutritive polypeptides having at least one exogenous glycosylation acceptor site present on an amino acid of the nutritive polypeptide. In some aspects, the at least one exogenous glycosylation acceptor site is occupied by an exogenous glycoyl or glycan, or alternatively, is unoccupied or is occupied by a non-natively occupying glycoyl or glycan. In some embodiments, the nutritive polypeptide is a polypeptide having an amino acid sequence at least 90% identical to SEQID 00001-03909, or is an edible species polypeptide sequence or fragment thereof at least 50 amino acids in length, or is a polypeptide having substantial immunogenicity when the glycosylation acceptor site is not present or is unoccupied. The nutritive polypeptide is more thermostable, is more digestible, and/or has a lower aggregation score than a reference polypeptide that has an amino acid sequence identical to the nutritive polypeptide but the glycosylation acceptor site is not present or is unoccupied in the reference polypeptide. The amino acids, e.g., asparagine, arginine, serine, threonine, hydroxyproline, and hydroxylysine, containing an exogenous glycosylation acceptor site are resistant to proteolysis. Exemplary glycans are N-acetyl galactosamine, N-acetylglucosamine, galactose, neuraminic acid, fructose, mannose, fucose, glucose, xylose, N-acetylneuraminic acid, N-glycolylneuraminic acid, O-lactyl-N-acetylneuraminic acid, O-acetyl-N-acetylneuraminic acid, and O-methyl-N-acetylneuraminic acid.

[0074] In another example, the nutritive polypeptide is more antigenic, immunogenic, or allergenic than the reference glycoprotein, or alternatively, the nutritive polypeptide is less antigenic, immunogenic, or allergenic than the reference glycoprotein. The nutritive polypeptide is more stable or resistant to enzymatic degradation than the reference glycoprotein or the nutritive polypeptide is more unstable or susceptible to enzymatic degradation than the reference glycoprotein. The carbohydrate component of the nutritive polypeptide is substantially free of N-glycolylneuraminic acid or has reduced N-glycolylneuraminic acid in comparison to the reference glycoprotein. Alternatively, the carbohydrate component of the nutritive polypeptide has elevated N-glycolylneuraminic acid in comparison to the reference glycoprotein.

[0075] Also provided is a nutritive polypeptide that has at least one exogenous glycosylation acceptor site present on an amino acid of the nutritive polypeptide, and the at least one exogenous glycosylation acceptor site is occupied by an exogenous glycoyl or glycan, and the nutritive polypeptide includes a polypeptide having an amino acid sequence at least 90% identical to SEQID 00001-03909, where the nutritive polypeptide is present in at least 0.5 g at a concentration of at least 10% on a mass basis, and where the formulation is substantially free of non-comestible products.

[0076] Nutritive polypeptide physicochemical properties.

[0077] Digestibility. In some aspects the nutritive polypeptide is substantially digestible upon consumption by a mammalian subject. Preferably, the nutritive polypeptide is easier to digest than at least a reference polypeptide or a reference mixture of polypeptides, or a portion of other polypeptides in the consuming subject's diet. As used herein, "substantially digestible" can be demonstrated by measuring half-life of the nutritive polypeptide upon consumption. For example, a nutritive polypeptide is easier to digest if it has a half-life in the gastrointestinal tract of a human subject of less than 60 minutes, or less than 50, 40, 30, 20, 15, 10, 5, 4, 3, 2 minutes or 1 minute. In certain embodiments the nutritive polypeptide is provided in a formulation that provides enhanced digestion; for example, the nutritive polypeptide is provided free from other polypeptides or other materials. In some embodiments, the nutritive polypeptide contains one or more recognition sites for one or more endopeptidases. In a specific embodiment, the nutritive polypeptide contains a secretion leader (or secretory leader) sequence, which is then cleaved from the nutritive polypeptide. As provided herein, a nutritive polypeptide encompasses polypeptides with or without signal peptides and/or secretory leader sequences. In some embodiments, the nutritive polypeptide is susceptible to cleavage by one or more exopeptidases.

Digestion Assays

[0078] Digestibility is a parameter relevant to the benefits and utility of proteins. Information relating to the relative completeness of digestion can serve as a predictor of peptide bioavailability (Daniel, H., 2003. Molecular and Integrative Physiology of Intestinal Peptide Transport. Annual Review of Physiology, Volume 66, pp. 361-384). In some embodiments proteins disclosed herein are screened to assess their digestibility. Digestibility of proteins can be assessed by any suitable method known in the art. In some embodiments digestibility is assessed by a physiologically relevant in vitro digestion reaction that includes one or both phases of protein digestion, simulated gastric digestion and simulated intestinal digestion (see, e.g., Moreno, et al., 2005. Stability of the major allergen Brazil nut 2S albumin (Ber e 1) to physiologically relevant in vitro gastrointestinal digestion. FEBS Journal, pp. 341-352; Martos, G., Contreras, P., Molina, E. & Lopez-Fandino, R., 2010. Egg White Ovalbumin Digestion Mimicking Physiological Conditions. Journal of Agricultural and Food Chemistry, pp. 5640-5648; Moreno, F. J., Mackie, A. R. & Clare Mills, E. N., 2005. Phospholipid interactions protect the milk allergen α-lactalbumin from proteolysis during in vitro digestion. Journal of Agricultural and Food Chemistry, pp. 9810-9816). Briefly, test proteins are sequentially exposed to a simulated gastric fluid (SGF) for 120 minutes (the length of time it takes 90% of a liquid meal to pass from the stomach to the small intestine; see Kong, F. & Singh, R. P., 2008. Disintegration of Solid Foods in Human Stomach. Journal of Food Science, pp. 67-80) and then transferred to a simulated duodenal fluid (SDF) to digest for an additional 120 minutes. Samples at different stages of the digestion (e.g., 2, 5, 15, 30, 60 and 120 min) are analyzed by electrophoresis (e.g., chip electrophoresis or SDS-PAGE) to monitor the size and amount of intact protein as well as any large digestion fragments (e.g., larger than 4 kDa). 
The disappearance of protein over time indicates the rate at which the protein is digested in the assay. By monitoring the amount of intact protein observed over time, the half-life (τ1/2) of digestion is calculated for SGF and, if intact protein is detected after treatment with SGF, the τ1/2 of digestion is calculated for SIF. This assay can be used to assess comparative digestibility (i.e., against a benchmark protein such as whey) or to assess absolute digestibility. In some embodiments the digestibility of the protein is higher (i.e., the SGF τ1/2 and/or SIF τ1/2 is shorter) than whey protein. In some embodiments the protein has a SGF τ1/2 of 30 minutes or less, 20 minutes or less, 15 minutes or less, 10 minutes or less, 5 minutes or less, 4 minutes or less, 3 minutes or less, 2 minutes or less or 1 minute or less. In some embodiments the protein has a SIF τ1/2 of 30 minutes or less, 20 minutes or less, 15 minutes or less, 10 minutes or less, 5 minutes or less, 4 minutes or less, 3 minutes or less, 2 minutes or less or 1 minute or less. In some embodiments the protein is not detectable in one or both of the SGF and SIF assays by 2 minutes, 5 minutes, 15 minutes, 30 minutes, 60 minutes, or 120 minutes. In some embodiments the protein is digested at a constant rate and/or at a controlled rate in one or both of SGF and SIF. In such embodiments the rate of digestion of the protein may not be optimized for the highest possible rate of digestion. In such embodiments the rate of absorption of the protein following ingestion by a mammal can be slower and the total time period over which absorption occurs following ingestion can be longer than for proteins of similar amino acid composition that are digested at a faster initial rate in one or both of SGF and SIF. In some embodiments the protein is completely or substantially completely digested in SGF. 
In some embodiments the protein is substantially not digested or not digested by SGF; in most such embodiments the protein is digested in SIF.
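By way of illustration only, the half-life calculation described above could be sketched as follows. The sketch assumes that the amount of intact protein decays with first-order (single-exponential) kinetics, which is an assumption made here and is not stated in this disclosure; the function name and the sample values are hypothetical.

```python
import math

def digestion_half_life(times_min, intact_fraction):
    """Estimate a digestion half-life (minutes) from a time course.

    Assumes first-order decay: fraction(t) = exp(-k * t), so that
    ln(fraction) is linear in t and half-life = ln(2) / k.
    Time points with no detectable protein (fraction <= 0) are skipped.
    """
    pts = [(t, math.log(f)) for t, f in zip(times_min, intact_fraction) if f > 0]
    if len(pts) < 2:
        raise ValueError("need at least two time points with detectable protein")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx  # least-squares slope of ln(fraction) versus time
    if slope >= 0:
        return math.inf  # no measurable disappearance of intact protein
    return math.log(2) / -slope

# Hypothetical time course sampled at the assay time points (minutes).
times = [0, 2, 5, 15, 30]
fractions = [0.5 ** (t / 10) for t in times]  # synthetic data, half-life 10 min
half_life = digestion_half_life(times, fractions)
```

The same function can be applied separately to the SGF and SIF time courses; a log-linear fit is only one of several reasonable ways to obtain τ1/2.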

[0079] Assessing protein digestibility can also provide insight into a protein's potential allergenicity, as proteins or large fragments of proteins that are resistant to digestive proteases can have a higher risk of causing an allergenic reaction (Goodman, R. E. et al., 2008. Allergenicity assessment of genetically modified crops - what makes sense? Nature Biotechnology, pp. 73-81). To detect and identify peptides too small for chip electrophoresis analysis, liquid chromatography and mass spectrometry can be used. In SGF samples, peptides can be directly detected and identified by LC/MS. SIF protein digestions may require purification to remove bile acids before detection and identification by LC/MS.

[0080] In some embodiments digestibility of a protein is assessed by identification and quantification of digestive protease recognition sites in the protein amino acid sequence. In some embodiments the protein comprises at least one protease recognition site selected from a pepsin recognition site, a trypsin recognition site, and a chymotrypsin recognition site.

[0081] As used herein, a "pepsin recognition site" is any site in a polypeptide sequence that is experimentally shown to be cleaved by pepsin. In some embodiments it is a peptide bond after (i.e., downstream of) an amino acid residue selected from Phe, Trp, Tyr, Leu, Ala, Glu, and Gln, provided that the following residue is not an amino acid residue selected from Ala, Gly, and Val.

[0082] As used herein, a "trypsin recognition site" is any site in a polypeptide sequence that is experimentally shown to be cleaved by trypsin. In some embodiments it is a peptide bond after an amino acid residue selected from Lys or Arg, provided that the following residue is not a proline.

[0083] As used herein, a "chymotrypsin recognition site" is any site in a polypeptide sequence that is experimentally shown to be cleaved by chymotrypsin. In some embodiments it is a peptide bond after an amino acid residue selected from Phe, Trp, Tyr, and Leu.
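The sequence-based recognition rules of paragraphs [0081] to [0083] can be illustrated with a short sketch. It encodes only the "in some embodiments" residue rules given above (not experimentally determined cleavage), uses one-letter amino acid codes, and the names `RULES` and `recognition_sites` are hypothetical.

```python
# Each rule: (residues after which the bond may be cleaved,
#             residues that block cleavage when they follow).
RULES = {
    "pepsin": (set("FWYLAEQ"), set("AGV")),
    "trypsin": (set("KR"), set("P")),
    "chymotrypsin": (set("FWYL"), set()),
}

def recognition_sites(sequence, protease):
    """Return 1-based positions of residues whose C-terminal bond matches the rule."""
    after, blocked_next = RULES[protease]
    seq = sequence.upper()
    return [
        i + 1
        for i in range(len(seq) - 1)  # the last residue has no following bond
        if seq[i] in after and seq[i + 1] not in blocked_next
    ]

# Hypothetical peptide used only to exercise the rules.
peptide = "AKPRFG"
counts = {name: len(recognition_sites(peptide, name)) for name in RULES}
```

Counting sites per protease in this way gives one possible quantification of recognition sites for comparing candidate sequences.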

[0084] Disulfide bonded cysteine residues in a protein tend to reduce the rate of digestion of the protein compared to what it would be in the absence of the disulfide bond. For example, it has been shown that the rate of digestion of the protein β-lactoglobulin is increased when its disulfide bridges are cleaved (I. M. Reddy, N. K. D. Kella, and J. E. Kinsella. "Structural and Conformational Basis of the Resistance of β-Lactoglobulin to Peptic and Chymotryptic Digestion". J. Agric. Food Chem. 1988, 36, 737-741). Accordingly, digestibility of a protein with fewer disulfide bonds tends to be higher than for a comparable protein with a greater number of disulfide bonds. In some embodiments the proteins disclosed herein are screened to identify the number of cysteine residues present in each and in particular to allow selection of a protein comprising a relatively low number of cysteine residues. For example, edible species proteins or fragments can be identified that comprise no Cys residues or that comprise a relatively low number of Cys residues, such as 10 or fewer Cys residues, 9 or fewer Cys residues, 8 or fewer Cys residues, 7 or fewer Cys residues, 6 or fewer Cys residues, 5 or fewer Cys residues, 4 or fewer Cys residues, 3 or fewer Cys residues, 2 or fewer Cys residues, 1 Cys residue, or no Cys residues. In some embodiments one or more Cys residues in an edible species protein or fragment thereof is removed by deletion and/or by substitution with another amino acid. 
In some embodiments 1 Cys residue is deleted or replaced, 1 or more Cys residues are deleted or replaced, 2 or more Cys residues are deleted or replaced, 3 or more Cys residues are deleted or replaced, 4 or more Cys residues are deleted or replaced, 5 or more Cys residues are deleted or replaced, 6 or more Cys residues are deleted or replaced, 7 or more Cys residues are deleted or replaced, 8 or more Cys residues are deleted or replaced, 9 or more Cys residues are deleted or replaced, or 10 or more Cys residues are deleted or replaced. In some embodiments the protein of this disclosure comprises a ratio of Cys residues to total amino acid residues equal to or lower than 5%, 4%, 3%, 2%, or 1%. In some embodiments the protein comprises 10 or fewer Cys residues, 9 or fewer Cys residues, 8 or fewer Cys residues, 7 or fewer Cys residues, 6 or fewer Cys residues, 5 or fewer Cys residues, 4 or fewer Cys residues, 3 or fewer Cys residues, 2 or fewer Cys residues, 1 Cys residue, or no Cys residues. In some embodiments, the protein comprises 1 or fewer Cys residues. In some embodiments, the protein comprises no Cys residues.
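The cysteine screen described in paragraph [0084] can be sketched as a simple filter over candidate sequences. The default thresholds below (10 or fewer Cys residues and a Cys ratio of at most 5%) are taken from the ranges recited above; the function names and example sequences are hypothetical.

```python
def cys_stats(sequence):
    """Return (number of Cys residues, ratio of Cys to total residues)."""
    seq = sequence.upper()
    n_cys = seq.count("C")
    ratio = n_cys / len(seq) if seq else 0.0
    return n_cys, ratio

def passes_cys_screen(sequence, max_cys=10, max_ratio=0.05):
    """True if the sequence meets both the count and the ratio threshold."""
    n_cys, ratio = cys_stats(sequence)
    return n_cys <= max_cys and ratio <= max_ratio

# Hypothetical candidates: a low-Cys sequence and a Cys-rich sequence.
low_cys = "A" * 98 + "CC"
cys_rich = "CCCC"
selected = [s for s in (low_cys, cys_rich) if passes_cys_screen(s)]
```

Stricter embodiments (for example, no Cys residues) correspond to calling `passes_cys_screen(sequence, max_cys=0)`.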

[0085] Alternatively or in addition, disulfide bonds that are or can be present in a protein can be removed. Disulfides can be removed using chemical methods by reducing the disulfide to two thiol groups with reducing agents such as beta-mercaptoethanol, dithiothreitol (DTT), or tris(2-carboxyethyl)phosphine (TCEP). The thiols can then be covalently modified or "capped" with reagents such as iodoacetamide, N-ethylmaleimide, or sodium sulfite (see, e.g., Crankshaw, M. W. and Grant, G. A. 2001. Modification of Cysteine. Current Protocols in Protein Science. 15.1.1-15.1.18).

[0086] Nutritive polypeptides and nutritive polypeptide formulations with modulated viscosity.

[0087] Disclosed herein are compositions, formulations, and food products that contain viscosity-modulating nutritive polypeptides. In one aspect, provided are formulations substantially free of non-comestible products that contain nutritive polypeptides present in a nutritional amount, and the nutritive polypeptide decreases the viscosity of a food product. In some embodiments, the nutritive polypeptide is present at about 10 g/l and the viscosity of the formulation is from about 1,000 mPa·s to about 10,000 mPa·s at 25 degrees C, such as from about 2,500 mPa·s to about 5,000 mPa·s at 25 degrees C.

[0088] The formulations are incorporated into food products having advantages over similar food products lacking the nutritive polypeptides, or the formulations are incorporated into other products such as beverage products or animal feed products. For example, the food products have a reduced fat content, a reduced sugar content, and/or a reduced calorie content compared to a food product not having the nutritive polypeptide. Preferably, the nutritive polypeptide is present in the food product such that consumption of a nutritional amount of the food product is satiating. In an embodiment of the invention, gelatin, an animal-derived material, is replaced by a non-animal derived product, containing one or more nutritive polypeptides. Typically the nutritive polypeptide is present in an amount effective to replace gelatin in the product. The gelatin replacement is incorporated into a food product, a beverage product, or an animal feed product, and the formulation is substantially free of non-comestible products.

[0089] Also provided are formulations containing a nutritive polypeptide present in a functional and/or nutritional amount, which increases the viscosity of a food or beverage product, such as formulations containing viscosity-increasing nutritive polypeptides incorporated into food products having advantages over similar food products lacking the nutritive polypeptides. For example, the food products have a reduced fat content, a reduced sugar content, and/or a reduced calorie content compared to a food product not having the nutritive polypeptide. Viscous nutritive polypeptides can be used as a nutritionally favorable low calorie substitute for fat. Additionally, it may be desired to add to the compositions and products one or more polysaccharides or emulsifiers, resulting in a further improvement in the creamy mouthfeel.

[0090] In some embodiments, the viscosity of nutritive polypeptide-containing materials is enhanced by crosslinking the nutritive polypeptides or crosslinking nutritive polypeptides to other proteins present in the material. An example of an effective crosslinker is transglutaminase, which crosslinks proteins between an ε-amino group of a lysine residue and a γ-carboxamide group of a glutamine residue, forming a stable covalent bond. The resulting gel strength and emulsion strength of nutritive polypeptides identified and produced as described herein are examined by preparing a transglutaminase-coupled nutritive protein composition, followed by gel strength and emulsion strength assays. A suitable transglutaminase derived from microorganisms in accordance with the teachings of U.S. Pat. No. 5,156,956 is commercially available. These commercially available transglutaminases typically have an enzyme activity of about 100 units. The amount of transglutaminase (having an activity of about 100 units) added to isolated nutritive polypeptide is expressed as a transglutaminase concentration which is the units of transglutaminase per 100 grams of isolated nutritive polypeptide. The isolated nutritive polypeptide contains from 5 to 95%, preferably 20 to 80%, preferably 58% to 72% protein and also preferably from 62% to 68% protein. The transglutaminase concentration is at least 0.15, preferably 0.25 and most preferably 0.30 units transglutaminase per gram protein up to 0.80 and preferably 0.65 units transglutaminase per gram protein. Higher and lower amounts may be used. This enzyme treatment can also be followed by thermal processing to make a viscous solution containing a nutritive polypeptide. To generate nutritive polypeptide samples containing crosslinks, a sample is mixed with a transglutaminase solution at pH 7.0 to give an enzyme to protein weight ratio of 1:25. The enzyme-catalyzed cross-linking reaction is conducted at 40 °C in most of the experiments.

[0091] Oscillatory shear measurements can be used to investigate the rheological properties of nutritive polypeptides. Also, to determine the viscosity of nutritive polypeptide solutions and gels, viscoelasticity is investigated by dynamic oscillatory rheometry. A 2 mL sample of nutritive polypeptide solution or nutritive polypeptide solution containing transglutaminase is poured into the Couette-type cylindrical cell (2.5 cm i.d., 2.75 cm o.d.) of the rheometer and covered with a thin layer of low-viscosity silicone oil to prevent evaporation. For samples with enzyme present, gelation is induced in situ by incubation at 40 °C. For nutritive polypeptide samples without enzyme, gelation is induced by subjecting the sample to the following thermal treatment process: temperature increased at a constant rate of 2 K min⁻¹ from 40 to 90 °C, kept at 90 °C for 30 min, cooled at 1 K min⁻¹ from 90 to 30 °C, and kept at 30 °C for 15 min. Some samples can be subjected to this thermal treatment after the enzyme treatment. Small deformation shear rheological properties are mostly determined in the linear viscoelastic regime (maximum strain amplitude 0.5%) with storage and loss moduli (G' and G") measured at a constant frequency of 1 Hz. In addition, some small deformation measurements are made as a function of frequency, e.g., 2 × 10⁻³ to 2 Hz, and some large deformation measurements are carried out at strains up to nearly 100%.
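For illustration, the thermal treatment process recited in paragraph [0091] can be written out as a piecewise-linear temperature program. The sketch below only restates the recited ramp rates and hold times; the names `SEGMENTS`, `segment_minutes` and `temperature_at` are hypothetical and do not correspond to any rheometer software interface.

```python
# (kind, start temperature in C, end temperature in C, value)
# For a "ramp", value is the rate in K per minute; for a "hold", value is minutes.
SEGMENTS = [
    ("ramp", 40.0, 90.0, 2.0),   # heat from 40 to 90 C at 2 K/min
    ("hold", 90.0, 90.0, 30.0),  # keep at 90 C for 30 min
    ("ramp", 90.0, 30.0, 1.0),   # cool from 90 to 30 C at 1 K/min
    ("hold", 30.0, 30.0, 15.0),  # keep at 30 C for 15 min
]

def segment_minutes(kind, start, end, value):
    """Duration of one segment in minutes."""
    return abs(end - start) / value if kind == "ramp" else value

def temperature_at(t_min):
    """Programmed temperature (C) at t_min minutes after the start."""
    elapsed = 0.0
    for kind, start, end, value in SEGMENTS:
        duration = segment_minutes(kind, start, end, value)
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    return SEGMENTS[-1][2]  # after the program ends, stay at the final temperature

total_minutes = sum(segment_minutes(*segment) for segment in SEGMENTS)
```

Under these figures the heating ramp takes 25 min and the cooling ramp 60 min, so the complete program runs for 130 min.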

[0092] Amino acid pharmacology.

[0093] Amino acids are organic molecules containing both amino and acid groups. All amino acids except glycine have an asymmetric carbon, and all protein amino acids except proline have an alpha-carbon bound to a carboxyl group and a primary amino group.

[0094] Amino acids exhibit a diverse range of biochemical properties and biological function due to their varying side chains. They are stable in solution at physiological pH, save for glutamine and cysteine. In the context of some proteins, conditional upon the host and translational machinery, amino acids can undergo post-translational modification. This can have significant effects on their bioavailability, metabolic function, and bioactivity in vivo. Sugar moieties appended to proteins post-translationally may reduce the usefulness of the nutritive proteins by affecting the gastrointestinal release of amino acids and embedded peptides. A comparison of digestion of glycosylated and non-glycosylated forms of the same proteins shows that the non-glycosylated forms are digested more quickly than the glycosylated forms (our data).

[0095] Although over 300 amino acids exist in nature, 20 serve as building blocks in protein. Non-protein alpha-AAs and non-alpha AAs are direct products of these 20 protein amino acids and play significant roles in cell metabolism. Due to the metabolic reactions of amino acid catabolism that drive the interconversion between amino acids, a subset of 11 of the 20 standard protein amino acids are considered non-essential for humans because they can be synthesized from other metabolites (amino acids, ketones, etc.) in the body: Alanine; Arginine; Asparagine; Aspartic acid; Cysteine; Glutamic acid; Glutamine; Glycine; Proline; Serine; and Tyrosine.

[0096] Arginine, cysteine, glycine, glutamine, histidine, proline, serine and tyrosine are considered conditionally essential, as they are not normally required in the diet, but are not synthesized in adequate amounts in specific populations to meet optimal needs where rates of utilization are higher than rates of synthesis. Functional needs such as reproduction, disease prevention, or metabolic abnormalities, however, can be taken into account when considering whether an amino acid is truly non-essential or can be conditionally essential in a population. The other 9 protein amino acids, termed essential amino acids, are taken as food because their carbon skeletons are not synthesized de novo by the body to meet optimal metabolic requirements: Histidine; Isoleucine; Leucine; Lysine; Methionine; Phenylalanine; Threonine; Tryptophan; and Valine.
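As a non-limiting illustration, the classification in paragraphs [0095] and [0096] can be used to compute the fraction of essential amino acid residues in a polypeptide sequence. The sketch counts residues by number rather than by mass, which is a simplification assumed here; the names `ESSENTIAL`, `NON_ESSENTIAL` and `essential_fraction` are hypothetical, and residues are given in one-letter code.

```python
# One-letter codes for the 9 essential amino acids listed above:
# His, Ile, Leu, Lys, Met, Phe, Thr, Trp, Val.
ESSENTIAL = set("HILKMFTWV")

# One-letter codes for the 11 non-essential amino acids listed above:
# Ala, Arg, Asn, Asp, Cys, Glu, Gln, Gly, Pro, Ser, Tyr.
NON_ESSENTIAL = set("ARNDCEQGPSY")

def essential_fraction(sequence):
    """Fraction of residues (by count) that are essential amino acids."""
    seq = sequence.upper()
    counted = [aa for aa in seq if aa in ESSENTIAL or aa in NON_ESSENTIAL]
    if not counted:
        return 0.0
    return sum(aa in ESSENTIAL for aa in counted) / len(counted)

# Hypothetical sequences used only to exercise the function.
example_fractions = [essential_fraction(s) for s in ("HILKMFTWV", "AG", "AH")]
```

Characters outside the 20 standard protein amino acids are ignored, so ambiguous codes such as X do not affect the result.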

[0097] All 20 protein amino acids (and non-protein metabolites) are used for normal cell functionality, and shifts in metabolism driven by changing availability of a single amino acid can affect whole body homeostasis and growth. Additionally, amino acids function as signaling molecules and regulators of key metabolic pathways used for maintenance, growth, reproduction, and immunity.

[0098] In the body, skeletal muscle represents the largest store of both free and protein-bound amino acids because it makes up a large share of body mass (around 40-45%). The small intestine is another important site for amino acid catabolism, governing the first-pass metabolism and entry of dietary amino acids into the portal vein and into the peripheral plasma. 30-50% of EAA in the diet may be catabolized by the small intestine in first-pass metabolism. The high activity of BCAA transaminases in the intestinal mucosa leads to BCAA conversion to branched-chain alpha-ketoacids to provide energy for enterocytes, similar to what occurs in skeletal muscle. Differences in the physiological state of muscle and small intestine metabolism have large implications for amino acid biology systemically across tissues in humans.

[0099] Amino acids can exist in both L- and D- isoforms, except for glycine (non-chiral). Almost all amino acids in proteins exist in the L- isoform, except for cysteine (D-cys) due to its sulfur atom at the second position of the side-chain, unless otherwise enzymatically post-translationally modified or chemically treated for storing or cooking purposes. Most D- amino acids, except for D-arg, D-cys, D-his, D-lys, and D-thr, can be converted into the L chirality by D-AA oxidases and transaminases. In order to be catabolized, these D enantiomers are transported across the plasma and other biological membranes and undergo D-oxidation or deaminate the amino acid to convert to its alpha-ketoacid or racemization to convert the D-AA to its L-isoform. The transport of D-isomers is limited by a lower affinity of L-AA transporters to D-AAs. For this reason the efficiency of D-AA utilization, on a molar basis of the L-isomer, can range from 20-100% depending on the amino acid and the species.

Nucleic Acids

[00100] Also provided herein are nucleic acids encoding polypeptides or proteins. In some embodiments the nucleic acid is isolated. In some embodiments the nucleic acid is purified.

[00101] In some embodiments of the nucleic acid, the nucleic acid comprises a nucleic acid sequence that encodes a first polypeptide sequence disclosed herein. In some embodiments of the nucleic acid, the nucleic acid consists of a nucleic acid sequence that encodes a first polypeptide sequence disclosed herein. In some embodiments of the nucleic acid, the nucleic acid comprises a nucleic acid sequence that encodes a protein disclosed herein. In some embodiments of the nucleic acid, the nucleic acid consists of a nucleic acid sequence that encodes a protein disclosed herein. In some embodiments of the nucleic acid the nucleic acid sequence that encodes the first polypeptide sequence is operatively linked to at least one expression control sequence. For example, in some embodiments of the nucleic acid the nucleic acid sequence that encodes the first polypeptide sequence is operatively linked to a promoter such as a promoter described herein.

[00102] Accordingly, in some embodiments the nucleic acid molecule of this disclosure encodes a polypeptide or protein that itself is a nutritive polypeptide or protein. Such a nucleic acid molecule can be referred to as a "nutritive nucleic acid." In some embodiments the nucleic acid encodes a polypeptide or protein that itself comprises at least one of: a) a ratio of branched chain amino acid residues to total amino acid residues of at least 24%; b) a ratio of Leu residues to total amino acid residues of at least 11%; and c) a ratio of essential amino acid residues to total amino acid residues of at least 49%. In some embodiments the nucleic acid comprises at least 10 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, at least 500 nucleotides, at least 600 nucleotides, at least 700 nucleotides, at least 800 nucleotides, at least 900 nucleotides, or at least 1,000 nucleotides. In some embodiments the nutritive nucleic acid comprises from 10 to 100 nucleotides, from 20 to 100 nucleotides, from 10 to 50 nucleotides, or from 20 to 40 nucleotides. In some embodiments the nucleic acid comprises all or part of an open reading frame that encodes an edible species polypeptide or protein. In some embodiments the nucleic acid consists of an open reading frame that encodes a fragment of an edible species protein, wherein the open reading frame does not encode the complete edible species protein. In some embodiments the nucleic acid is a cDNA.
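By way of illustration only, the three residue ratios recited above can be computed directly from a one-letter protein sequence. The following is a minimal sketch, not a method taken from this disclosure. It assumes that ratios are counted per residue (not by mass), that the essential amino acids are the nine listed in paragraph [0096], and that "branched chain" means leucine, isoleucine and valine. The function names and the example sequence are hypothetical.

```python
from collections import Counter

# One-letter codes for the nine essential amino acids listed in [0096]:
# His, Ile, Leu, Lys, Met, Phe, Thr, Trp, Val.
ESSENTIAL = set("HILKMFTWV")
# Branched-chain amino acids (Leu, Ile, Val); the standard definition, assumed here.
BRANCHED_CHAIN = set("LIV")


def residue_ratios(sequence):
    """Return BCAA, Leu and EAA fractions of total residues (count-based)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    total = len(seq)
    return {
        "bcaa": sum(counts[aa] for aa in BRANCHED_CHAIN) / total,
        "leu": counts["L"] / total,
        "eaa": sum(counts[aa] for aa in ESSENTIAL) / total,
    }


def meets_any_threshold(sequence, bcaa=0.24, leu=0.11, eaa=0.49):
    """True if at least one of criteria a), b) or c) is satisfied."""
    r = residue_ratios(sequence)
    return r["bcaa"] >= bcaa or r["leu"] >= leu or r["eaa"] >= eaa


# Hypothetical 8-residue peptide, used only to exercise the functions.
example = "MLLKVIAG"
ratios = residue_ratios(example)
print(ratios, meets_any_threshold(example))
```

The default thresholds mirror the "at least 24%", "at least 11%" and "at least 49%" figures above; a caller wanting a stricter screen can pass different values.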

[00103] In some embodiments nucleic acid molecules are provided that comprise a sequence that is at least 50%, 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.9% identical to an edible species nucleic acid. In some embodiments nucleic acids are provided that hybridize under stringent hybridization conditions with at least one reference nucleic acid.
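As a hedged illustration of the identity levels listed above, the sketch below computes percent identity for two sequences that have already been aligned to equal length. It assumes identity is the number of matching positions divided by the number of gap-free aligned positions; other conventions exist, and this paragraph does not specify one. The function names and the example sequences are hypothetical.

```python
# Identity levels recited in paragraph [00103].
THRESHOLDS = (50, 60, 70, 80, 85, 90, 95, 96, 97, 98, 99, 99.9)


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free aligned positions to compare")
    return 100.0 * matches / compared


def highest_threshold_met(pid):
    """Largest recited identity level that pid reaches, or None if below 50%."""
    met = [t for t in THRESHOLDS if pid >= t]
    return max(met) if met else None


# Hypothetical 10-nucleotide pair differing at one position.
pid = percent_identity("ATGGCCAAGT", "ATGGCTAAGT")
print(pid, highest_threshold_met(pid))
```

Producing the alignment itself (for example with a pairwise alignment tool) is outside the scope of this sketch.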

[00104] The nucleic acids and fragments thereof provided in this disclosure display utility in a variety of systems and methods. For example, the fragments can be used as probes in various hybridization techniques. Depending on the method, the target nucleic acid sequences can be either DNA or RNA. The target nucleic acid sequences can be fractionated (e.g., by gel electrophoresis) prior to the hybridization, or the hybridization can be performed on samples in situ. One of skill in the art will appreciate that nucleic acid probes of known sequence find utility in determining chromosomal structure (e.g., by Southern blotting) and in measuring gene expression (e.g., by Northern blotting). In such experiments, the sequence fragments are preferably detectably labeled, so that their specific hybridization to target sequences can be detected and optionally quantified. One of skill in the art will appreciate that the nucleic acid fragments of this disclosure can be used in a wide variety of blotting techniques not specifically described herein.

[00105] It should also be appreciated that the nucleic acid sequence fragments disclosed herein also find utility as probes when immobilized on microarrays. Methods for creating microarrays by deposition and fixation of nucleic acids onto support substrates are well known in the art. These methods are reviewed in DNA Microarrays: A Practical Approach (Practical Approach Series), Schena (ed.), Oxford University Press (1999) (ISBN: 0199637768); Nature Genet. 21(1)(suppl):1-60 (1999); Microarray Biochip: Tools and Technology, Schena (ed.), Eaton Publishing Company/BioTechniques Books Division (2000) (ISBN: 1881299376), the disclosures of which are incorporated herein by reference in their entireties. Analysis of, for example, gene expression using microarrays comprising nucleic acid sequence fragments, such as the nucleic acid sequence fragments disclosed herein, is a well-established utility for sequence fragments in the field of cell and molecular biology. Other uses for sequence fragments immobilized on microarrays are described in Gerhold et al., Trends Biochem. Sci. 24:168-173 (1999) and Zweiger, Trends Biotechnol. 17:429-436 (1999); DNA Microarrays: A Practical Approach (Practical Approach Series), Schena (ed.), Oxford University Press (1999) (ISBN: 0199637768); Nature Genet. 21(1)(suppl):1-60 (1999); Microarray Biochip: Tools and Technology, Schena (ed.), Eaton Publishing Company/BioTechniques Books Division (2000) (ISBN: 1881299376).

Expression

[00106] Vectors

[00107] Also provided are one or more vectors, including expression vectors, which comprise at least one of the nucleic acid molecules disclosed herein, as described further herein. In some embodiments, the vectors comprise at least one isolated nucleic acid molecule encoding a protein as disclosed herein. In alternative embodiments, the vectors comprise such a nucleic acid molecule operably linked to one or more expression control sequences. The vectors can thus be used to express at least one recombinant protein in a recombinant microbial host cell. In some aspects, a vector or set of vectors can include a nucleic acid sequence coding for a signal peptide, e.g., to cause secretion of a protein disclosed herein. See below for further discussion of signal peptides and secretion.

[00108] Promoters

[00109] Promoters useful for expressing the recombinant genes described herein include both constitutive and inducible/repressible promoters. Examples of inducible/repressible promoters include nickel-inducible promoters (e.g., PnrsA, PnrsB; see, e.g., Lopez-Maury et al., Cell (2002) v.43: 247-256) and urea-repressible promoters such as PnirA (described in, e.g., Qi et al., Applied and Environmental Microbiology (2005) v.71: 5678-5684).

Additional examples of inducible/repressible promoters include PnirA (promoter that drives expression of the nirA gene, induced by nitrate and repressed by urea) and Psuf (promoter that drives expression of the sufB gene, induced by iron stress). Examples of constitutive promoters include Pcpc (promoter that drives expression of the cpc operon), Prbc (promoter that drives expression of rubisco), PpsbAII (promoter that drives expression of psbAII), and Pcro (lambda phage promoter that drives expression of cro). In other embodiments, a PaphII and/or a lacIq-Ptrc promoter can be used to control expression. Where multiple recombinant genes are expressed in an engineered microorganism, the different genes can be controlled by different promoters or by identical promoters in separate operons, or the expression of two or more genes can be controlled by a single promoter as part of an operon.

[00110] In some embodiments, the inducible promoter is induced by iron starvation or by entering the stationary growth phase. In some embodiments, the inducible promoter can be variant sequences of the promoter sequence of cyanobacterial genes that are up-regulated under Fe-starvation conditions such as isiA, or when the culture enters the stationary growth phase, such as isiA, phrA, sigC, sigB, and sigH genes, or a variant or fragment thereof.

[00111] In some embodiments, the inducible promoter is induced by a metal or metal ion. By way of non-limiting example, the inducible promoter can be induced by copper, zinc, cadmium, mercury, nickel, gold, silver, cobalt, or bismuth or ions thereof. In some embodiments, the inducible promoter is induced by nickel or a nickel ion. In some embodiments, the inducible promoter is induced by a nickel ion, such as Ni2+. In another embodiment, the inducible promoter can be induced by copper or a copper ion. In yet another embodiment, the inducible promoter can be induced by zinc or a zinc ion. In still another embodiment, the inducible promoter can be induced by cadmium or a cadmium ion. In yet still another embodiment, the inducible promoter can be induced by mercury or a mercury ion. In an alternative embodiment, the inducible promoter can be induced by gold or a gold ion. In another alternative embodiment, the inducible promoter can be induced by silver or a silver ion. In yet another alternative embodiment, the inducible promoter can be induced by cobalt or a cobalt ion. In still another alternative embodiment, the inducible promoter can be induced by bismuth or a bismuth ion.

[00112] In some embodiments, the promoter is induced by exposing a cell comprising the inducible promoter to a metal or metal ion. The cell can be exposed to the metal or metal ion by adding the metal to the microbial growth media. In certain embodiments, the metal or metal ion added to the microbial growth media can be efficiently recovered from the media. In other embodiments, the metal or metal ion remaining in the media after recovery does not substantially impede downstream processing of the media or of the bacterial gene products.

[00113] Hosts.

[00114] Provided herein are methods, host cells and systems for enhancing recombinant nutritive polypeptide production and, optionally, secretion, from eukaryotic host cells, in particular host cells of the fungal kingdom, division Ascomycota. By way of non-limiting example, provided herein is disclosure relating to filamentous fungi, including species of the genus Aspergillus. Provided in Table VI are nucleic acids and proteins that are suitable for modulation in fungal host cells, such as yeast or Aspergillus.

Table VI.

Nucleic Acids   Proteins        Aspergillus orthologs   OrthoDB
prtT            PRTT_ASPNC      PRTT_ASPNC              EOG70PKRS
pepA            PEPA_ASPNC      A2R613_ASPNC            EOG7V1PKK
pepD            A2QTZ2_ASPNC    A2QMZ7_ASPNC            EOG7WQGMW
pepF            A2QP32_ASPNC    A2QCT2_ASPNC            EOG72CCTC
bipA            A2QW80_ASPNC    A2QW80_ASPNC            EOG7TXT9G
prpA            A2Q8J8_ASPNC    A2Q8J8_ASPNC            EOG78HBQ5
hacA            HAC1_ASPOR      A2Q7D0_ASPNC            EOG7NWF5R
kar2p           GRP78_YEAST     A2QW80_ASPNC            EOG7TXT9G
aovps10         VPS10_ASPOR     VPS10_ASPNC             EOG715XXP
cog6            COG6_ASPNC      COG6_ASPNC              EOG71KDWJ
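For readers who wish to work with Table VI programmatically, the following is a minimal sketch that holds the rows as tuples and groups the listed nucleic acids by OrthoDB group. The values are transcribed from Table VI; the spellings hacA and aovps10 are assumed readings of those two entries, and the names TABLE_VI, genes_by_group and shared_groups are hypothetical.

```python
from collections import defaultdict

# (nucleic acid, protein, Aspergillus ortholog, OrthoDB group), per Table VI.
# "hacA" and "aovps10" are assumed spellings of the corresponding table entries.
TABLE_VI = [
    ("prtT", "PRTT_ASPNC", "PRTT_ASPNC", "EOG70PKRS"),
    ("pepA", "PEPA_ASPNC", "A2R613_ASPNC", "EOG7V1PKK"),
    ("pepD", "A2QTZ2_ASPNC", "A2QMZ7_ASPNC", "EOG7WQGMW"),
    ("pepF", "A2QP32_ASPNC", "A2QCT2_ASPNC", "EOG72CCTC"),
    ("bipA", "A2QW80_ASPNC", "A2QW80_ASPNC", "EOG7TXT9G"),
    ("prpA", "A2Q8J8_ASPNC", "A2Q8J8_ASPNC", "EOG78HBQ5"),
    ("hacA", "HAC1_ASPOR", "A2Q7D0_ASPNC", "EOG7NWF5R"),
    ("kar2p", "GRP78_YEAST", "A2QW80_ASPNC", "EOG7TXT9G"),
    ("aovps10", "VPS10_ASPOR", "VPS10_ASPNC", "EOG715XXP"),
    ("cog6", "COG6_ASPNC", "COG6_ASPNC", "EOG71KDWJ"),
]


def genes_by_group(rows):
    """Map each OrthoDB group identifier to the nucleic acids listed under it."""
    groups = defaultdict(list)
    for gene, _protein, _ortholog, group in rows:
        groups[group].append(gene)
    return dict(groups)


def shared_groups(rows):
    """Return only the groups that more than one table row points to."""
    return {g: genes for g, genes in genes_by_group(rows).items() if len(genes) > 1}


print(shared_groups(TABLE_VI))
```

On these rows the only group listed more than once is EOG7TXT9G, under which both bipA and kar2p map to the same Aspergillus ortholog, A2QW80_ASPNC.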

[00115] Also provided are homologs and orthologs of the nucleic acids and proteins. Thus, the invention provides the generation of variants of orthologs disclosed in the OrthoDB ortholog database (see Waterhouse et al., Nucleic Acids Research (2013): D358-65) or otherwise known, or identified using methods generally known in the art.

[00116] Orthologs provided herein include EOG70PKRS: C9SMQ4, G2WZ92, E3QD15, H1V8Z1, K2RZ24, B6HRH5, K9GAI0, K9GIW5, A1CHV7, A1CWV3, B0Y6K1, Q4WPQ8, G7XXN9, G3Y9N3, A2QJF9, Q0D009, B8N0S8, I8IE18, A7BJS7.

[00117] Orthologs provided in OrthoDB include EOG715XXP: L0PAT7, B6JY29, O42930, Q6C6Y6, K0KAZ9, 0KJJ3, I2GX17, G8BWT0, A7TQ37, A7TT43, J7S0X4, H2ASC6, G0V9M5, G0W6B5, Q6FPF1, J8LR89, C8ZFU4, C8Z3X9, C8ZAZ9, E7LW18, E7KWA6, E7QBB1, E7QK87, E7QMG5, E7NMI3, E7NP85, E7NPG7, E7NPL7, E7NPL8, E7QAZ3, B3LPH8, B3LNF5, C7GX93, P32319, P40438, P40890, P53751, A6ZPM0, A6ZRF6, A6ZRF7, A6ZSD9, A6ZVA5, A6ZKT1, G2W8Z2, G2WLC2, B5VR90, B5VDW2, E7K9D2, E7KHM1, G8ZPN3, G8ZXM5, C5DU19, C5E2W7, Q6CNR4, G8JV99, Q754Q4, C4Y3C1, G3BBQ4, A5D1Y4, A5D1Y5, B5RU30, G8XZT4, A3GHY0, G3AJV7, G3ANC4, A5E1P1, H8X4T8, G8BFA0, C5MBU2, C5MBU4, B9W9N8, Q59M21, C4YG73, E7R0E4, E7R6F7, F2QSG6, F2QVS8, C4R564, C4R192, G1XBI6, D5GKL7, C7YGZ0, J9MC36, F9FQ36, K3W132, I1RAS9, G9NH33, G0RCD8, G9N6W3, E9DZM8, E9EVQ7, J4UX30, G3JNT9, C9SM62, G2WYQ2, L2G4G6, E3QER5, H1VIR1, H1VSF8, D1Z9Q3, Q7SH60, F8N0Q4, G4UAH6, G0S0A2, G2QY58, Q2HAB1, G2Q383, F0XD99, J3NJL4, A4RF05, L8FVY4, K1X6J4, H0EG59, A7ERD9, G2Y4B2, F9X071, K2SBE0, Q0TVB2, E4ZVX1, B2WDP9, E3RE00, H6C7G3, C1GTA5, C1FZI7, C0S7J1, F2T5C7, C5JC96, C5GVC6, A6QWZ2, C0NKP8, F0UD48, C6HAY7, C5FYX2, E4V1C4, E4UV76, F2PGH1, F2RXW9, F2SJB6, D4AQV3, D4D4B1, C4JI06, J3KLG4, C5P311, E9CU72, B6QMS8, B8MG72, B6H711, K9GNV5, K9GE63, A1C8D8, A1DAY6, B0YA89, Q4WBM1, Q5AS50, G7XV72, G3Y7X5, A2QHH4, Q0C7E3, B8NX76, I8TNI3, Q2TVY7.

[00118] Orthologs provided herein include EOG71 KDWJ: L0PC22, B6K3H4, 094677, Q6C1G5, K0KN49, 12H727, G8BSH8, A7THG4, J7S8L6, Η2ΑΥΉ2, G0VHH4, G0WDG5, Q6FTL7, J8LIS0, J5RQI1, C8ZGI8, B3LNQ3, C7GLV9, P53959, A6ZS37, G2WM62, B5VQZ6, G8ZRY6, C5DUZ7, C5DMM3, Q6CLC8, G8JRN3, Q75AZ8, C4Y1K1, G3B3F3, A5DK56, Q6BPU6, G8YP97, A3LT37, G3AM15, A5DUD8, H8WYI9, G8BDK2, C5M6P2, B9WBN8, Q59MF9, C4YIQ5, E7R6L5, F2R0B6, C4R6S3, G1X658, D5G544, D5G545, C7YP22, J9MZ18, K3VDM3, 11RZK3, G9PA30, G0RKF7, G9N1R9, E9DS45, E9EWT4, C9SXQ9, G2XGZ3, L2FED4, E3QI24, H1VLF9, F7VV93, Q7S4D8, F8MU50, G4UVL2, G0SAZ4, G2R730, Q2HH52, G2QI00, F0XNF7, J3NVD7, A4QUK5, L8G9L6, K1Y0X5, A7ERK2, G2YFW1 , F9XD71, K2SLI8, Q0UYL3, E5ACR4, B2VT31, E3SA56, H6C1H6, C1H2R7, C1GBA8, C0S972, F2T4Q8, C5JDA5, C5GW18, A6R6L9, C0NH47, F0U7T7, C6H6T2, C5FL60, E5QZK3, F2Q4Z6, F2RQ77, F2SF28, D4ATB8, D4D993, C4JSL3, Q1E6R9, C5PFS2, E9DJE6, B6QDN5, B8MCW6, B6HSN7, K9FXN5, K9FE46, A1CU89, A1DNX2, B0Y8T4, Q4WLS7, C8V7C6, G7XM19, G3YFX3, A2QLL1, Q0CSR3, B8NS60, I8TYE9, Q2UUV3.

[00119] Orthologs provided herein include EOG71KDWW: G1XAI1, D5GAQ4, C7YM52, J9MJX1, F9FXF4, K3UNI5, I1RWE6, G9NXY3, G0RN85, G9MPS9, E9E0V3, E9EQW7, J5JRB0, C9ST03, G2X5U9, L2GGN3, E3QU67, H1V6N3, H1VFM2, F7W518, Q7S217, F8MV63, G4V1F3, G0SA35, G2QR97, Q2GSF9, G2QHP3, F0XFY0, J3P064, G4NCT0, L8G3I2, K1X5T3, H0EX72, A7F9Y7, G2YB14, F9XGB4, Q0UML3, E4ZMR9, B2VR00, E3RSK1, H6BX95, C1GPA0, C1G2G6, C0SFZ3, F2T8T5, C5JN19, C5GAD3, A6QX76, C0NQR4, F0USA6, C6HQ49, C5G0L1, E4V6J2, F2PXX8, F2S6G4, F2SLI3, D4AUX2, D4DB55, C4JTE5, J3 3J8, C5P623, E9CRW8, B6QQY1, B8M739, K9GPL6, K9GHJ7, A1CQK3, A1D3G9, B0XQ28, Q4WTJ7, Q5BDK5, G7XKR3, G3YFF4, A2QPX9, Q0CCQ3, B8MWG4, I8TXW1, Q2UPJ3.

[00120] Orthologs provided herein include EOG728HM1: L0PFS1, B6K1Q2, O14171, Q6C8R7, Q6CBH1, K0KUQ2, K0KVN3, I2H009, I2H5E7, G8BNP2, G8BW58, A7TEJ0, A7TJ53, J7RQU8, J7RZ87, H2AM99, H2AWV1, G0VFE2, G0VGK7, G0WDS0, G0WF45, Q6FN24, Q6FWN2, J8PR46, J8Q458, J5PGF9, J6EHC5, C8Z3Z8, C8ZEV3, E7LR41, E7LYG3, E7KK52, E7KSM7, E7QBC2, E7NEQ5, E7NLJ6, E7Q0U3, B3LLZ2, B3LND8, C7GM47, C7GVD8, P35196, Q03175, A6ZKU9, A6ZMF8, G2W910, G2WKE9, B5VDX7, B5VPM9, E7K9P0, G8ZLR6, G8ZTQ9, C5DPM6, C5DZA1, C5DGF5, C5DNE9, Q6CMW4, Q6CVD6, G8JPT6, I6NDB9, Q757K9, Q75DN9, C4Y1R4, G3B2N5, G3B4M7, A5DB58, A5DHK6, Q6BIJ8, Q6BRU3, G8YAW2, G8YNE2, A3GHZ7, A3LQN4, G3ANJ9, G3ASU3, A5E1U8, A5E3B3, H8X8W1, H8XAF6, G8B6A0, G8BK33, C5MBR8, C5MJ96, B9W9M1, B9WI75, Q5A3N4, Q5AJZ3, C4YG93, C4YS41, E7RAB6, F2QY30, C4QX29, C4R914, G1XAT3, D5GCA1, C7YHV1, J9MD21, F9FPN8, K3VB40, I1R9Y2, G9NS82, G0RA94, G9NAI5, E9DZ58, E9EP90, J5K2B9, G3J5B5, G3JEK6, C9SKL0, G2XA84, L2GGQ3, E3QR88, H1VJT2, F7W9D8, Q7RWJ5, F8MNV9, G4UTX4, G0RYH9, G2RH00, Q2GV28, G2QP92, F0XD43, J3NTK6, G4NH37, L8G9W7, K1XY70, A7ELA5, G2Y1E4, F9X7P9, K2RKC8, Q0UPV6, E4ZSJ3, B2VZ02, E3S849, H6CAN8, C1GVZ9, C1G454, C0S6P4, F2TT69, C5JXA6, C5GWD9, A6QW17, C0NJU4, F0UC78, C6HIX1, C5FUG1, E4V197, E5QZU9, F2PSM2, F2Q2P1, F2RPI6, F2S9L8, F2SEY4, F2SXE7, D4AT81, D4B1K1, D4DA51, D4DC39, C4JS36, J3KJQ8, C5PG81, E9CVY2, B6QQG7, B8LVZ9, B6HV70, K9G4F8, K9F6E4, A1CI67, A1CWJ3, B0Y6W6, Q4WQ28, C8VG05, G7X804, G3Y9B3, A2QIZ3, Q0CQU0, B8N3C3, I8AC72, I8U255, Q2U367, Q2UJT3.

[00121] Orthologs provided herein include EOG72CCTC: G1X5Y4, D5GLB8, C7YVF3, C7ZN76, J9MX35, J9NES0, F9FH46, F9G1V4, K3V9D6, K3W1V5, 11RKX5, 11RP45, G9NIG2, G0RQV3, G9N1E3, E9DU06, E9DYR5, E9E0Y1, E9E484, E9EE16, E9EP20, E9EWZ3, E9F029, E9FCF1, E9FDH3, J4VSU4, J5K795, G3J2W4, G3JKN0, C9SW80, G2XAM6, L2FA23, L2FER6, L2FVK8, L2FX15, L2G789, L2GFJ3, E3Q7N6, E3Q8T0, E3QJL5, H1V2Q9, H1VG39, H1W1R1, F7W734, Q1K6X1, F8MD19, G4UH69, G0SFE3, G2QRN5, G2QZS1, Q2GXW4, Q2GZP6, G2Q6L2, G2Q7H1, F0XE90, J3NXA1, J3P9C6, J3PGF3, L8FPB9, K1WLL1, K1WVJ3, H0ECG3, H0ESJ5, H0EXD4, A7EJF6, A7EVW6, A7F7Q3, G2XVD0, G2XWS7, G2Y8T0, G2Y9J3, F9WW81, F9XCH5, F9XCT4, F9XIQ6, K2RJK5, K2RLJ4, K2S7P0, Q0U045, Q0U0P7, Q0U0P8, E5AA56, E5AC79, B2VRW7, B2VVE2, E3RJS2, E3S7D9, H6C4C1, H6C6A0, H6CBZ6, C1H627, C1GKX3, C0SFR5, F2T599, F2T7J1, C5JCG8, C5JJ79, C5GE56, C5GVA1, C0NIV0, F0U5J5, C6H875, C5FD41, C5FEZ9, C5FX94, E4UY02, E5R0C0, E5R311, F2PH15, F2PMI5, F2PX65, F2RRE9, F2RX74, F2S485, F2SE80, F2SIM5, F2SYE3, D4AQA7, D4AZG9, D4B0Q6, D4D1U7, D4D8X7, D4DFB5, C4JI27, C4JW44, J3KD58, J3KLD8, C5P2Y9, C5P8U4, E9CU49, E9D117, B6Q4A8, B6QU02, B8M556, B8MPH0, B6HBR0, B6HFM4, B6HS28, B6HV07, K9FFE8, K9H4I4, K9H6X0, K9FIA6, K9GF56, K9GFB3, A1CEV4, A1CH38, A1CN56, AICXQO, A1CZ48, A1DKU1, B0XUM7, B0Y5R9, B0YDY3, E9R4W4, Q4W8Y5, Q4WNX3, Q5BA75, G7X577, G7XE34, G7XJ53, G7XQK8, G3XP42, G3XU20, G3XZB9, G3Y2V9, A2QCT2, A2QL12, A2QP32, A2R2W1, Q0C9U7, Q0CFM5, QOCLFO, Q0CNE5, B8N9W2, B8NLK5, B8NUL2, B8NWI1, 18A0U7, 18I6L4, 18TNY8, 18TWJ6, Q2TWJ3, Q2U4G6, Q2U6H7, Q2UGG7.

[00122] Orthologs provided herein include EOG779WRH: B6K6Q9, Q09175, P42781, K0KH55, 12GXK6, G8C2F8, A7TQN8, J7S878, H2APD8, G0V5B2, G0WD46, Q6FP09, C8ZFZ8, E7KTA9, E7QJN4, B3LP77, C7GPC2, PI 3134, A6ZRK5, G2WLM8, B5VQH2, E7KHE8, G8ZUJ0, C5DYT4, C5E1X5, P09231, G8JUS4, Q75E73, C4Y0Y7, G3B8L5, A5DKC2, Q6BUI9, G8YHZ5, A3LYR4, G3APZ6, G3APZ7, G3AQ17, A5DSI0, H8WVT5, G8BBU3, C5MB07, B9W8S2, Q5APK9, 013359, E7R628, F2QTE8, C4R095, G1XB39, D5GIV9, C7ZNV5, J9MS93, F9G7P0, K3V1N7, 11RXT1 , G9PBI0, G0RTE8, G9MJ01, E9DYU9, E9EZ14, J5J3Q0, G3JFZ7, C9SHB4, G2WTC0, L2G8Y5, E3Q447, H1VGF4, F7VXX2, Q7SGV0, F8N3I8, G4U720, G0SHU5, G2RDJ2, Q2HCW1, G2Q6S3, F0XS58, J3P482, G4N6E8, L8G4A9, K1WKK9, H0EH45, A7EL30, G2Y1N5, F9XB24, K2SI37, Q0UUZ4, E4ZV53, B2WD96, E3RST5, H6C7D2, C1H845, C1G3N7, C0S4Y0, F2T2E2, C5JIA6, C5GGC8, A6RBK7, C0NN41, F0U5T3, C6H2S7, C5FEH8, E5R259, F2PMA1, F2PUB7, F2RRA5, F2S9R9, F2SBH4, F2SKZ2, F2SY55, D4AUN6, D4AVC2, D4DAD7, D4DHL6, C4JKT2, J3KHH7, C5PIA7, E9DHD5, B6QQ06, B8LXM9, B6HPR8, K9FX78, K9FCN0, A1CIL8, A1CW48, B0Y4Q1, Q4WQI8, G5EB75, G7X5T2, G3XT78, A2Q9N6, Q0CT05, B8NSY2, 18A2Y7, Q2UUJ5.

[00123] Orthologs provided herein include EOG78HBQ5: L0PFJ1, B6K5S0, 013704, Q6BZZ7, K0KEN9, 12H2K7, G8BVG0, A7TMH6, J7S977, H2ANB2, G0VB25, GOWAXl, Q6FJP0, J4TYN6, C8ZH38, E7M0Q1, E7KUM8, E7QL10, B3LJV2, C7GWA4, Q12404, A6ZPC3, G2WNF4, B5VSG5, G8ZPE4, C5DPS7, C5DF81, F2Z6B5, G8JVF9, Q752L5, C4Y0Y3, G3BDB3, A5DIG2, Q6BHB9, G8YJV1, A3LVR0, G3ASF2, A5E364, H8X926, G8B6H1, C5MIZ9, B9WHY2, Q5AKA0, C4YRU9, F2QM07, C4QVB2, G1X4R0, D5G9T7, C7ZNL8, J9MSG1, F9FXM6, K3VJ95, 11RYB8, G9NJZ2, G0R8Q3, G9MLQ9, E9E9M8, E9F9Q8, J4KMV8, G3JP03, C9SYF8, G2X8T3, L2FQT7, E3QS29, HI VBK1, H1VBY3, F7VQZ8, Q7SEC1, F8MZQ4, G4U977, G0SC20, G2RCI6, Q2HET9, G2Q4Y6, F0XCT2, J3NX63, G4MM08, L8FV64, KlWYGl, H0EGL8, A7F296, G2YAV7, F9XAE5, K2T0A9, Q0UV07, E4ZV78, B2WCZ2, E3RLC3, H6C583, C1H482, C1GKM9, C0SEQ1, F2TMA3, C5K2F6, C5GCY8, A6R594, C0NC56, F0ULG1, C6HAG1, C5FTA6, E4V0A3, F2PTG7, F2S9W6, F2SUN8, D4AJS2, D4D8T7, C4JVV2, J3KEU8, C5P0Y6, E9DE17, B6QNQ6, B8MH64, B6H1Z3, K9FR70, K9FWN4, A1CRD4, A1D496, B0XP24, Q4WJN9, C8VUK6, G7XP87, G3XS40, A2Q8J8, Q0CUS6, B8N1C6, 18AD89, Q2UKF5.

[00124] Orthologs provided herein include EOG790PXD: I2H8L1, G8BMJ1, A7TR06, J7RRW2, H2API9, G0VFD6, G0WDR4, Q6FJR6, J5PN38, C8ZGK3, E7LZ40, E7KT64, E7QJI2, E7NLX9, E7Q8G4, B3LPF5, C7GWR8, P17260, A6ZSF9, G2WLE8, B5VQ98, E7KH66, G8ZL75, C5DSV4, C5DJV5, Q6CMD7, G8JNX9, Q75AW0, A5DAP6.

[00125] Orthologs provided herein include EOG7B90XF: L0PFC2, B6JXB2, 074552, Q6C5K0, K0KSK0, 12GWB6, 12H1Y5, G8BN90, G8BU38, A7TFJ4, A7TGQ4, J7S5L5, H2AXI7, G0VAH7, G0VKB0, G0W371, Q6FQI0, J8PM89, C8ZB66, B3LPW7, C7GPX9, P34110, A6ZQH8, G2WGL9, B5VL31, G8ZRV7, C5E155, C5DKJ2, Q6CW25, 16ND11, Q757W9, C4XVR8, G3B2I7, A5DCK7, B5RTC3, G8Y998, A3LT23, G3AM00, A5DUF6, H8WYH1, G8BDI4, C5M6R8, B9WBL8, Q59T42, C4YIN5, E7R5N5, F2QPU8, C4QYN3, G1WYC6, D5G5T6, C7ZKT4, J9N528, F9G2Q0, K3VB71, 11RG99, G9NJU4, G0R8M4, G9MLV1, E9EEL0, E9EMB 1, J5JHC5, G3J562, C9SMT8, G2WZC0, L2FH96, E3QNT5, HI VCW5, F7VNW1, Q7SAE6, F8MFN7, G4UJV7, G0S709, G2R102, Q2H1N0, G2QLZ1, F0XJA9, J3P2W0, G4N4D6, L8G6K4, K1WWY4, A7ENB9, G2Y4U2, F9X2J3, K2SD19, Q0UIM3, E5A8J3, B2WEV9, E3RKZ7, H6BUL7, C1GTN2, C1G112, C0S388, F2TAD4, C5JXZ8, C5G9Q7, A6QRV5, C0NU30, F0U8N5, C6H1I2, C5FDR7, C5FDR8, E5QZF8, F2PWB0, F2RUK9, F2SF39, D4AMM7, D4DBS2, C4JYW7, 19XMV5, C5P9T9, E9CZ76, B6Q3W6, B8M7K2, B6H4R0, K9G7M4, K9FCS2, AlCKEl, A1D730, B0XXW4, Q4WXQ8, Q5B3C9, G7XCR9, G3Y608, A2R7P6, Q0CP14, B8N1T0, 18TRN7, Q2UL15.

[00126] Orthologs provided herein include EOG7GR5RH: L0PAW6, B6K6Q0, Q02953, Q6C5Z0, K0KQF5, 12GV49, 12H756, G8BU00, A7TJ41, J7S7A8, H2AS68, G0VEG8, G0WFW9, Q6FS51, J8Q551, J5PMU9, J6EFG2, C8Z8H4, C8ZIA2, E7M0E1, B3LHE0, B3LJH9, C7GN83, C7GW40, PI 0961, P20134, A6ZNY5, A6ZUA5, G2WE25, G2WN13, B5VIQ0, B5VS28, G8ZVW6, C5E007, C5E2E6, P22121, G8JP52, Q755B0, C4Y8W8, G3BA89, A5DAC0, Q6BZ70, G8YC18, A3LRV1 , G3AMW9, A5E1V6, H8X9Y2, G8BLA3, C5MB27, B9W8T9, Q5AQ33, C4YDC1, E7R0Q3, F2QU77, C4QZF2, G1XTX7, D5GK43, C7YI28, J9MDG4, J9NIK2, F9FTZ4, F9FWJ5, K3VSP4, 11RBL8, G9NSH6, G0RL00, G9NAQ4, E9DS17, E9EJ54, E9ENR0, J4UNK3, G3JI32, C9SKP8, G2XAC1, L2GG95, E3QGD3, H1UX60, F7VP15, Q7SBY0, F8MPC3, G4UTR6, G0RY37, G2RI23, Q2GPD3, G2QQ58, F0XRR6, J3PCK1, G4N9W1, L8GDI5, K1XU83, HOEEVO, A7E9S5, G2Y0R1, F9XHQ4, K2S0F7, Q0UGS6, E5AC00, B2VQY3, E3S3B6, H6BVH2, C1HCH3, C1G5F3, C0RX65, F2TD38, C5JZG7, C5GQW1, A6R5J9, C0NRM9, F0UVW8, C6H3C6, C6H3C7, C5FIV2, E4UUV2, F2PZ50, F2S7L8, F2SN94, D4AXB0, D4DJY7, C4JME6, J3K8M7, C5P817, E9CSW4, B6QPS5, B8LWT2, B6HI52, K9GJQ1, K9FPI7, A1C5F5, A1D0C5, B0XZG7, B0Y201, Q4WE63, Q5AUJ5, G7XDC9, G3Y599, A2R6Z7, Q0C923, B8N2X9, 18IVR7, Q2UJ75.

[00127] Orthologs provided herein include EOG7HQW4X: L0PBQ7, B6K4H5, Q9USH7, Q6C4W2, Q6C7R9, Q6CF28, K0KV85, 12H0W2, G8BX88, A7TE55, J7S5G8, H2AWG5, G0VEI0, G0VHX8, G0WCE6, G0WFV8, Q6FLU6, Q6FUY7, J8LQ32, J8PYG9, J4U053, J8TYM7, C8ZF47, C8ZIE7, E7LYU3, E7M0X7, E7KSV6, E7KVA0, E7QJF3, E7QLL2, E7NLP1, E7NNL8, E7Q823, E7QA36, B3LKJ9, B3LM70, C7GTR7, C7GU71, P32867, P39926, A6ZMP4, A6ZW23, G2WKN8, G2WNV3, B5VPV0, B5VSV0, E7KGV2, E7KJ94, G8ZN09, C5E4P7, C5E296, F2Z6C6, G8JNJ3, Q755G2, C4Y7J6, G3BEQ2, A5DFS3, Q6BWQ2, G8YK27, A3GHM2, A3LU22, G3AMG6, A5DYS5, H8X349, G8BJH6, C5M529, B9WCN9, Q59YF0, C4YKP1, E7R8K2, F2QLH5, C4QY09, G1XU32, G1XV70, D5G6S2, D5GKK0, C7YI40, C7YRE0, J9MDH0, J9MRG7, F9FS38, F9FTY8, K3VP74, K3VRX9, 11RBM2, 11RZS5, G9NW14, G9PAD6, G0RK14, GORLTl, G9MHY6, G9N236, E9DRH8, E9DU63, E9ELE2, E9EZX5, J4KLA6, J4UKS7, J4VXJ0, G3J4J5, G3JNS7, C9S6U5, C9SQW0, G2WRU5, G2X8A9, L2FRG1, L2GFP2, E3QIZ1, E3QW08, H1VBE4, H1VKF4, F7VM71, F7VZ59, Q1K7J3, Q7RVS5, F8MBQ5, F8MU06, G4UDL2, G4UVR8, G0SE89, G2QSY7, Q2GY86, G2Q840, F0XFM9, F0XJX5, J3NHF6, J3P3I9, G4ND45, G4NGM1, L8FSS7, L8FT16, K1WHW2, K1XL75, H0EKZ2, H0ET40, A7E6Q0, A7F5T9, G2XUR7, G2YXT2, F9WZ52, F9XAQ4, K2RWJ9, 2SEF5, Q0UCC4, Q0V5R9, E4ZSR7, E5A5E6, B2WAX1, E3RW63, E3RXN7, H6BNZ3, C1HE55, C1GMU5, C0SJE5, F2TTD6, C5K1U2, C5GRW7, A6RE71, C0NVY4, F0URZ3, C6HH21, C5FG08, E4V3C6, F2Q3V7, F2SB01, F2STR3, D4APR8, D4D161, C4JFN0, J3K5I1, C5PJD0, E9DC70, B6Q8W7, B8M122, B6HNA6, K9FR42, K9GVA2, A1CG90, A1D929, B0YBW7, Q4WAJ3, G7XW83, G3XUT1, A2QYG9, Q0CC87, B8NTT9, B8NW67, 18TN91, Q2TX29. 
[00128] Orthologs provided herein include EOG7NWF5R: Q6CEV1 , 0KKL3, 12H2Z9, Q6FLY3, C5DX88, C4Y6E4, G3AYN6, A5DH55, Q6BQC2, G8Y8H5, A3GFS3, G3AGG1, A5DX95, H8X1G0, G8B7A0, C5MDH5, B9W820, Q5AA52, C4YE41, E7R4C4, F2QMN8, C4QWY5, G1X2Z0, G1XDG5, C7ZKY2, J9NGA5, J9NK99, F9F7K5, F9F8K7, K3VII7, I1S462, G9P2N0, G0RFQ8, G9MRQ4, E9E7I2, E9F928, J5JN42, G3JRZ2, C9SBS6, G2X3E6, L2FG89, E3Q7M6, H1V9R5, F7VSK1, Q7SHF0, F8N209, G4U5G0, G0S0S3, G2R1U8, Q2H3S4, G2QME7, F0XB04, J3NUD6, G5EI02, L8FMV9, K 1X831, H0EJZ4, A7F237, G2XPL3, F9X6M4, K2RZK2, Q0UIF5, E5A8K3, B2WE36, E3RFE9, H6C6A4, C1GSP3, C1G051, C0S8J3, F2TNQ6, C5K1H4, C5GY20, A6RDC0, C0NYZ2, F0UHV9, C6H5B4, C5FGR9, E4V5X8, F2PTA6, F2RSI5, F2SSE4, D4AP89, D4DLQ2, C4JGL8, J3K657, C5P3I2, E9DEL0, B6QTQ5, B8MNW8, B6H5M4, K9FGA7, K9FP24, A1CC56, A1DKN3, B0XZC0, Q4WEY8, Q8TFU8, G7Y0N2, A2Q7D0, Q0CTX2, B8NL91, 18A7Z1, Q1XGE2.

[00129] Orthologs provided herein include EOG7P61 VJ: L0PG89, B6JZD5, Q6C080, Q6CAK4, K0K6D5, 12GYM0, G8BSE6, A7TH54, A7TKN3, J7S2U9, H2AZ95, G0VCB7, G0WBA7, Q6FK02, J8PGZ5, J5PEL5, C8ZIM4, E7M129, E7KVG3, E7QL96, E7QA90, B3LKS3, C7GJA6, P07267, A6ZW99, G2WP28, B5VT16, E7KIT9, G8ZMW1, C5DXS3, C5DF06, Q6CRW3, G8JNR1, Q74Z00, Q75BX7, Q75BX8, C4Y7E6, G3AWX0, A5DLJ4, Q6BUT8, G8YLU0, A3LYI5, G3AKT4, A5DWV8, H8WZZ7, G8B7P2, C5M792, B9WC26, Q59U59, C4YK41, E7RBI7, F2QUG8, C4R6G8, G1X6I4, G1XQ11, D5G878, D5GDR4, C7YZL1, C7Z131, J9NBZ1 , J9NF62, F9FHP2, F9FQ08, K3VXC2, K3VYN7, 11R J1, I1S202, G9NPY9, G9NZH3, G0RIW3, G9MNJ2, G9N404, E9DV76, E9DW11, E9EY37, E9EZD7, J4KN57, J4W833, G3J729, G3JGS6, C9SGE5, C9SKE7, G2WSP4, G2XA22, L2GAQ7, L2GFD4, E3QLB6, E3QLN9, H1VKW9, F7VP05, F7VVY8, A7UXG4, Q01294, F8MPD5, F8MUL2, G4UTQ3, G4UYE7, G0S1H4, G0S288, G2R7T2, G2RFQ3, Q2GQW8, G2QK78, G2QN49, F0XNB4, J3PHE1, J3PKM1, G4MTL8, G4NDG4, L8FSE2, L8G7H2, K1WGG2, K1X2K4, H0EL35, A7EF50, A7F3H2, G2YGA3, G2YYJ6, F9XDK0, K2R5R4, K2SAI9, Q0USG7, Q0UY52, E5AFF5, B2W0W6, B2WNK4, E3RIK4, E3RZ17, H6BU59, H6C3W6, C1GTX1, C1HAC9, C1G194, C1GLP1, C0S3G7, C0SIP3, F2T7G3, F2TJK7, C5JSV4, C5JUE9, C5G7D8, C5GE29, A6QRN1, A6R7J7, C0NIS4, C0NV12, F0U5B1, F0U899, C6H846, C6HRN5, C5FS55, C5FEK4, E4UNJ6, E5R345, F2PHT3, F2PMM2, F2RRI6, F2S7A8, F2SVY2, F2SYH9, D4B385, D4AZK1, D4DEN7, D4D8U6, C4JSZ2, C4JZ47, J3KC79, J3KDA9, C5P8P0, C5P9L1, E9CYX4, E9D0U3, B6Q1L3, B6Q342, B8LUJ6, B8M4Z6, B6H445, B6HU56, K9G926, K9H6T8, K9F9X3, K9GFA6, A1CH14, AlCKPl, A1CXS3, A1D6T2, B0XXK9, B0Y5P6, 042630, Q4WNV0, C8V8R7, Q5B977, G7X6L3, G7XPJ6, G3Y3K2, G3YA50, A2QDI4, A5AAJ6, Q0CLC4, Q0CNQ8, B8N3P1, B8NMS2, 17ZWY0, 18TQI4, 18TWD1, Q2U319, Q2UGE5, Q2UKS5.

[00130] Orthologs provided herein include EOG7PPD30: L0P6Q8, L0PD82, B6K3K0, 074484, Q6CCU3, K0KTA0, 12GWV1, G8BPH4, G8BQG4, A7TRY9, J7RYE4, H2AY98, G0V5R3, G0VKS6, G0WAE7, G0WH28, Q9Y725, J8PQE1, J6EIA2, C8Z4L4, E7LS70, E7KL28, E7QCK5, E7Q1X1, B3LGV3, C7GT83, P41940, A6ZXS0, G2WCA5, B5VFL2, E7KAI8, G8ZUT9, C5DRB6, C5DFC2, Q70SJ2, G8JQ17, Q752H4, C4Y4D7, G3BDU9, A5DL19, Q6BN12, G8YPH1, A3 GET 1, G3AS40, A5E01 1, H8X2F5, G8BB15, B9WF1 1, 093827, E7R9A4, F2QV49, C4R5U0, G1XAV3, D5GJ95, D5GJC2, C7YM04, J9MJG5, F9FRG8, K3V8D6, Q4I1Y5, G9NY30, G0RMQ5, G9MPY2, E9DYE9, E9EQL5, C9ST00, G2X5U6, L2FTG5, E3QYA9, HI VYV3, F7W542, Q7RVR8, F8MV38, G4V1I1, G0SAH0, G2QQX0, Q2GSE0, G2QHM4, F0XHM7, J3NFK4, G4MY73, L8G642, K1X5D9, HOELFl , A7E4E2, G2YI83, F9X786, K2S093, Q0UNJ5, E5A9J2, B2VWC7, E3S710, H6BP97, C1GW23, C1G479, C0S6R9, F2THF0, C5JXD0, C5GX01, A6QVZ5, C0NJR8, FOUCOl, C6HIZ7, C5FUD7, E4V169, F2Q5E5, F2S7X0, F2SXH4, D4B1M6, D4DC15, C4JS61, J3KJS4, C5PG67, E9CVW0, B6QQ95, B8LVX3, B6HVJ2, K9G791, K9FPP2, A1CI82, A1CWH9, B0Y6Y1 , Q4U3E8, Q5B1J4, G7X7Z0, G3Y9A2, A2QIW7, Q0CQV3, B8N3D7, I8A7T7, Q2UJU5.

[00131] Orthologs provided herein include EOG7QK75C: L0PEW3, L0PFR4, B6K041, P87319, Q6C2L6, K0KEY6, 12H6L3, G8BUS7, A7TRZ3, J7S6G6, H2B1Y1, G0V5W7, G0WAJ6, Q6FWS9, J8PZL7, C8ZCT4, E7KRC6, C7GRK1, Q07878, A7A0L8, G2WIF7, B5VMQ8, G8ZYR9, C5DZW7, C5DHN4, Q6CQ11, G8JXV5, Q75CI3, C4XZK5, C4XZK9, G3B4K5, A5DNG9, A5DNH1, Q6BS09, G8Y6L9, A3GH59, G3AE43, A5E4H7, A5E4H8, H8WZK9, G8BHJ3, C5MCF6, B9WGQ1, Q59Q73, E7R273, F2QNT3, C4QWF7, G1XAS3, D5GAK4, C7YJL9, J9MPR6, F9FT34, F9G0X7, K3VMV7, 11S1D0, G9NQU5, G0RP04, G9N4Q0, E9EBU9, E9EZQ1, J5J8Y8, G3J2I7, C9SVP5, G2XJ63, L2GE08, E3QQH6, H1VSD8, H1VXR2, HlWOOl, F7W5H9, Q7S5K6, F8MSR9, G4UWX4, G0S3B8, G2R6K5, Q2HC06, G2QFV6, F0XV75, J3NT79, G4N6L6, L8G3J3, K1XNG8, HOEDQO, A7EK83, G2YWZ7, G2YWZ8, F9X639, K2RF79, Q0U1 E3, E5AEZ2, B2VZ22, E3RS87, H6CAX4, C1GW28, C1G487, C0S567, F2THE3, C5JXD8, C5GWZ3, A6QVY9, A6QVZ0, C0NJR3, F0UBZ4, C6HJ02, C6HJ03, C5FUC9, E4V160, F2Q5D9, F2S7W4, F2SXI1, D4AW10, D4DC10, C4JS67, J3KJS9, C5PG61, E9CVV0, B6QQA2, B8LVX8, B6HVJ7, K9G787, K9FPN7, A1CI87, A1CWH4, B0Y6Y7, Q4WQ48, Q5B1K1, G7X7Y6, G3Y999, A2QIW3, Q0CQW1, B8N0Y1, 18U2F7, Q2UJU9.

[00132] Orthologs provided herein include EOG7SBWGK: L0PCA4, B6JUZ0, 094395, Q6CCK2, K0KT55, 12GYC9, G8BXB0, A7TSQ6, J7RT39, H2B2F6, G0VC15, G0WHG6, Q6FQX8, J4TZ43, C8ZFF1, E7LZ28, E7Q7Y0, B3LMH2, C7GTL5, P32807, A6ZN00, G2WKZ3, B5VQ51, G8ZNF3, C5DU89, C5DDT2, Q6CUB4, G8JSC7, Q752Y0, Q75DZ0, C4Y5Q7, G3B8F4, A5DF18, Q6BLW3, G8YAC3, A3GHQ1, G3AN11, A5DSK7, H8WWL2, G8BA26, C5MBM4, B9W9I0, Q59KD5, Q59TV9, C4YCM3, E7R8A2, F2QWR2, C4R481, G1XTI5, D5GAB3, C7ZPH9, J9MTZ7, F9GBU5, K3UKH0, 11S788, G9P2M2, G0RLD9, G9MYY3, E9EA63, E9EMQ0, J5JE72, G3J893, C9STJ7, G2XJB5, L2GDN7, E3QI80, H1VG63, F7VNH3, Q7SA95, F8MJD2, G4UJ15, G0S575, G2R4E7, Q2H0I3, G2Q4B1 , F0XAB8, J3P4L2, G4MT19, L8FRG8, K1WAB1, H0EYD1 , A7EBN1, G2Y6D4, F9X7N2, K2S5P3, Q0U5F2, E5A101, B2WLB5, E3RT55, H6BZK4, C1HDC8, C1G695, C0RXY6, F2TIW5, C5K0V1, C5GXR8, A6QZZ3, C0NDU6, F0UF13, C6HST2, C5FJC9, E4V2Y0, F2PZ70, F2S626, F2STC0, D4APC9, D4CZS6, C4JN36, Q1DU75, C5P7F9, E9CU17, B6Q2C3, B8MR17, B6HPY7, K9FV70, K9GID3, A1CAT1, A1DF55, B0Y3S7, Q4WUA2, Q5AVC7, G7XUB0, G3XXY5, A2R550, Q0CD79, B8NFA3, Q2MHH4.

[00133] Orthologs provided herein include EOG7SRCG7: Q6C9N3, K0KS31, 12H8J3, G8BW93, A7TG97, J7RFK8, J7S6C1 , J7S6G0, H2AM 5, H2AUE0, G0V6G7, G0V8L2, G0W603, G0WHY7, Q6FJ09, Q6FN56, J8LN32, J5S664, E7LV57, E7KP64, E7QFG8, E7NIE2, E7Q4I9, B3LSB0, C7GUD2, P38749, A6ZSP3, G2WF58, B5VJS1, E7KDA0, G8ZMK7, C5DQK6, C5DGJ8, Q6CV98, G8JPA2, Q75DK3, C4Y3Z4, G3BAK3, A5DJD7, Q6BPQ6, G8Y260, A3LZA1, G3ANQ3, A5E2C1, H8X854, G8B9R4, C5MGV0, B9WH93, Q59YI7, Q59YX5, C4YR67, E7R499, F2QQ86, C4QZ26.

[00134] Orthologs provided herein include EOG7T7QMN: L0P9Y4, L0PGL2, B6K6I3, 013898, Q6C5U6, K0KMF8, 12GZ46, 12H062, G8BQL5, G8BVV3, A7TH27, A7TIM9, J7RTX6, J7S3W9, H2B1U5, H2B1U6, G0V9Q2, G0V9Q3, G0W6E0, G0W6E1, Q6FL03, Q6FL05, J8Q6W2, J6EGJ5, C8Z6K9, C8Z6L0, E7LSE6, E7KL05, E7QCI2, E7NGG8, E7Q1U9, B3LGY8, B3LGY9, C7GJM0, C7GJM1, P33775, P52867, A6ZXN3, A6ZXN4, G2WC68, G2WC69, B5VFH3, B5VFH5, E7KAG7, G8ZXP2, C5DQ12, C5DL60, Q6CPY0, I6NE59, Q759J8, C4XZJ9, C4Y5C0, G3AZA6, G3B9M7, A5DP95, A5DQF4, B5RTX3, Q6BST4, G8YB21, G8YTI7, A3LS25, A3LY52, G3AIX8, G3AVG7, A5E6F4, A5E6X0, H8XAB9, H8XAH6, G8BK52, G8BKX1, C5MFZ7, C5MII7, B9WKF2, B9WMY4, 074189, Q5ACU3, C4YN04, C4YTX6, E7R1R2, F2QP12, F2QTP4, C4QWN8, C4QZZ6, G1XCR2, D5G7U5, C7YNF9, J9ML99, F9FAR2, K3VAW0, 11RX83, G9P2Z6, G0RFY7, G9MRI9, E9EFG2, E9FDL5, J5JB66, G3JMV2, C9SBU1, G2X3G2, L2FKP6, E3QF27, H1VTR9, F7VSQ7, Q7SH94, F8N4A0, G4U5Y4, G0S0X1, G2R2A9, Q2H3Y3, G2QM85, F0XBP5, J3NUH5, G5EHM5, L8FQQ5, K1X8U5, H0EYE9, A7EFD6, G2YTK5, F9XQ12, K2RNI7, Q0V727, E4ZUH8, B2WAJ6, E3RPN5, H6BQN1 , C1H569, C1GFP5, C0SDI8, F2T6N2, C5JV90, C5GPP0, A6QUZ0, C0NYI1, F0UPW3, C6HLV2, C5FZH9, E4UWL2, F2SB83, F2SH83, D4ASS5, D4D4K5, C4JE62, J3KH14, C5PIQ2, E9D4S5, B6QTD7, B8MLN2, B6HEJ9, K9GL27, K9GPJ4, A1CJB3, A1D867, B0XYZ3, Q4WWN0, Q5B3W9, G7XFD5, G3YBJ8, A5ABN8, Q0CC05, B8NV10, 18TU59, Q2U526.

[00135] Orthologs provided herein include EOG7TR2T6: L0PFE1, B6JV04, 013756, Q6CB03, K0KMT7, 12H488, G8BUX6, A7TKA8, J7S567, H2ANY8, G0VKP7, G0WGZ3, Q6FXX8, C8Z3H6, E7KJY0, E7QB47, E7NEJ3, B3LUU1, C7GTB0, P39702, A7A0G5, G2W8M8, B5VDL7, G8ZWN3, C5E087, C5DHZ2, Q6CSR5, G8JV83, Q754S0, C4XZF4, G3B566, A5DFJ5, B5RST3, G8YKU1, A3LP33, G3ARI2, A5DXW2, H8X6G1, G8BH20, C5MI27, B9WD45, Q5A7T0, C4YPK5, E7R368, F2QQ52, C4QYZ1, G1XAZ9, D5GJ51, C7YIB2, J9MD96, F9G051, K3V4W8, 11S4T0, G9P9V2, G0RNQ1, G9MII4, E9E8L8, E9EMJ3, J4KRB5, G3JJP0, C9SH74, C9SH75, G2WT84, L2FGZ3, E3Q3D0, HI VAI3, HI VLB3, HlVQXl, F7VTA7, Q7SHR1, F8N1D5, G4UBZ8, G0SBW5, G2RC85, Q2HEI3, G2Q440, F0XAI6, J3NRR1, G4MXK4, L8G1N4, K1Y016, H0EH52, A7EL07, G2Y1S0, F9XC99, K2SBS5, Q0UUI8, E4ZWY3, B2WPP9, E3RHC6, H6C572, C1H476, C1GKP0, C0SEP3, F2TMA8, C5K2G2, C5GCY4, A6R599, C0NC61, F0ULG6, C6HAG6, C5FTA1, E4V0A7, F2PTG2, F2S9X0, F2SUP3, D4AJR8, D4D947, C4JVU8, J3KEV5, C5P0X6, E9DE32, B6QNQ9, B8MH59, B6H1Z7, K9FB00, K9GFK9, A1CRD8, A1D4A0, B0XP20, Q4WJP3, C8VUL0, G7XP91, G3XS36, A2Q8J4, Q0CUS2, B8N1D1, 18U8F9, Q2UKF9.

[00136] Orthologs provided herein include EOG7TXT9G: L0PFV0, B6JWL3, P36604, Q99170, K0KHM2, I2H386, G8C145, A7TJ69, J7S8H7, H2AVB9, H2B0M3, G0V5E5, G0W7Y0, Q6FW50, J8LM34, C8ZBH9, E7LW13, E7QGM1, B3LQ77, C7GSQ6, P16474, A6ZPU2, G2WGY4, E7KEB7, G8ZVI6, C5DQE5, C5DMC8, P22010, G8JXH1, Q75C78, C4YC34, G3AYE2, Q6BZH1, G8YB62, A3LN25, G3AK92, A5DUX8, H8WXM9, G8BEU5, C5M618, B9WAG1, Q5ADI3, C4YJ74, E7R7Y6, F2QTW5, C4QZS3, G1XSH8, D5GM11, C7Z550, J9MTC9, F9FL45, K3VXR0, I1RYL5, G9NMR9, G0RPK6, G9ML11, E9DYG4, E9EQK1, J5JYV0, G3JHM3, C9SSS1, G2X5L1, L2FLM9, E3QX97, H1VJE3, F7W676, P78695, F8MXP0, G4UYV3, G0S916, G2QXK9, Q2GS48, G2QHD4, F0XBZ3, J3NFN7, G4MKA5, L8G135, K1WW58, H0EWA5, A7EAD4, G2YSM3, F9WWJ1, K2RDV3, E4ZXX2, B2W9E6, E3RVA0, H6CAK8, C1GRW7, C1G8H6, C0S072, F2T234, C5JI01, C5GGS1, A6RA17, C0NS16, F0UU32, C5FQB8, E4V4P9, F2PPH0, F2RV54, F2SM79, D4ALR8, D4D5D5, C4JGG9, J3KIY6, C5PGZ0, E9CWZ3, B6Q4C4, B8LYJ0, B6H0S5, K9FZ18, K9FFL8, A1CEK9, A1DFN8, B0XV51, Q4WHP9, Q5BBL8, G7XAE1, G3YCR7, A2QW80, Q0CJU4, B8N4E9, I8I8C8, Q2ULV1.

[00137] Orthologs provided herein include EOG7V1PKK: B5FVC3, G1X5X0, G1XNC2, G1XNC3, C7Z832, C7ZCV9, C7ZF51, J9MHV2, J9MM74, J9NN59, F9FKY5, F9FM97, F9FRL0, F9GFK7, K3VLH0, K3VYY0, K3W2V2, 11RI12, 11RQZ9, 11RU91, G9NK25, G9NQ54, G9NV68, G9P71 1, G0R8T0, G0R9K1, G0RSP8, G0RTY7, G9MF49, G9MLM6, G9MUE5, G9NC88, E9DU92, E9DV01, E9E2F1, E9E8X2, E9EI07, E9E F8, E9EN41, E9EZU7, E9F4F2, E9F7H7, E9F7Q9, E9FCW9, J4W9B9, J5JB84, J5JTV4, J5K1S9, G3J685, G3J6D7, G3JB67, G3JBW9, G3JQR5, C9S7L8, C9SEM1, C9SUH4, G2WQ57, G2X1N1, G2XDN3, L2FR36, L2G2C3, L2G480, E3QIU6, E3QS16, E3QSB7, H1UXW5, H1V264, H1VSK1, F7VKM5, F7VUJ4, F7VZP8, F7W590, Q7S3R4, Q7SCF6, Q7SD30, Q7SDD9, F8MH83, F8MNQ5, F8N0Z3, F8N2X8, G4U7B2, G4UBD0, G4ULF2, G4UUI6, G0RYR0, G0SGN4, G0SI60, G2QYS0, G2R0S9, G2R4N8, G2RCL3, G2RGE0, Q2GUS3, Q2GZ56, Q2HCE4, Q2HFJ8, G2Q6W1, G2QDM7, G2QIG8, G2QNS5, F0XCI9, FOXKEl, F0XT61, J3NQ97, J3NU10, J3PGN6, G4N3E0, G4NI47, G5EH06, L8FYN9, L8G2Y8, L8G444, K1X016, H0EK58, H0EMW6, A7ECZ2, A7EE89, G2XY33, G2YJS4, F9X2B9, F9XK71, K2RB15, K2RGL3, K2RP03, K2RPM8, K2RQR5, K2S322, K2SZM9, Q69IF8, Q0UHR7, E4ZRH6, E5A7T3, B2WGB0, B2WKI0, E3RJ23, E3RV82, E3S587, H6BKH3, H6BM53, C1GZJ9, C1G562, C0RZV4, F2TGB5, C5JCV4, C5GN06, A6QUE3, C0NPB3, F0UNC9, C6HMJ2, C5FBS2, C5FW52, C5FZ57, E4UT30, E4UVV5, E5R1B9, F2PIN2, F2PQ50, F2PV53, F2RMH2, F2RVW8, F2S2U6, F2SC70, F2SGJ3, F2SPK0, D4ANC3, D4AIC4, D4AT39, D4D7C5, D4DE18, D4DGR1 , C4JDF6, C4JN46, J3KGQ7, C5PEI9, E9D624, B6Q1I3, B6Q274, B6Q6T6, B6Q6T7, B6Q8H5, B6Q9G4, B6Q9H7, B6Q9I3, B6Q9J3, B6Q9K3, B6Q9L8, B6QBG2, B6QEJ4, B6QG34, B6QGC4, B6QHY1, B6QJ96, B6QKT4, B6QLX8, B6QN81, B6QTU1, B6QWF0, B6QWF9, B6QWN8, B8MBN5, B8MF81, B6HJ 1, B6HL60, K9FBY4, K9FD05, K9FYH4, K9GAX5, A1C7T3, A1CBR4, A1DIF5, A1DDK1, B0XUW1, B0Y1V8, P41748, Q4WZS3, 093885, Q5BBC3, G7XV40, G7XX71, G3XNE5, G3XWX0, A2R613, A2R3L3, Q0CJF2, Q0CVD5, B8N6F5, B8N6H9, B8NLY9, 17ZVS3, 18A3B4, 18A4M3, Q06902, Q2UCB1, Q2UDE1.

[00138] Orthologs provided herein include EOG7VXHQH: L0PFM1 , B6K5C2, Q09803, Q6CEE2, K0KHZ6, 12H443, G8BY43, A7TH89, J7S7M5, H2APP1, G0V9C0, G0WGJ3, Q6FQG5, J8PV56, J5RU25, C8ZJK2, E7M1B1, E7KVQ5, E7QAN5, B3LKD3, C7GP56, P52917, A6ZX48, G2WPY5, B5VTV5, E7KIZ7, G8ZZ56, C5DUT4, C5DBA6, Q6CVM8, G8JQD8, Q758U9, C4Y9U8, G3B2U9, A5DQ68, Q6BPY2, G8YAM4, A3LVF1, G3API6, A5E2L0, H8X0P1, G8BHP6, C5MHK4, B9WHM5, Q5AGH7, C4YRJ0, E7R5E0, F2QSM3, C4R134, G1X6L2, D5GDW2, C7Z0G9, J9NCI9, F9FAK3, K3V9X9, 11S2H0, G9NXJ6, G0RUD7, G9MPF3, E9DR36, E9EY52, E9F1R1, J5JZ89, G3J484, C9SDP0, G2WV30, L2G5B1, E3QHT1, H1W1P2, F7W8P8, Q7S0H4, F8MSY5, G4UX09, G0S2U5, G2R8H3, Q2GQ74, G2QJM2, F0XG40, J3NML0, G4N2E6, L8FTA6, K1WRG0, H0EJC8, A7F3H9, G2XRL0, F9XML0, K2S1H5, Q0U7R6, E5A3X4, B2VXZ4, E3S1Q2, H6C2F6, C1H9G7, C1GCX1, C0SHS5, F2TM57, C5JDP2, C5GXE6, A6R703, C0NGS1, F0U6J5, C6H763, C5FLK6, E5R0G2, F2PIZ9, F2S5M7, F2SEC2, D4AYA0, D4D821, C4JW95, J3KKJ4, C5PFC4, E9D9U2, B6QQZ4, B8M727, B6GYF9, K9FCH3, K9FKM2, A1CK47, A1D7B7, B0XY62, Q4WXF8, Q5B8R9, G7XDR3, G3Y5Q9, A2R7C1, Q0CXN9, B8MZP8, 18IP85, Q2UQD2. 
[00139] Orthologs provided herein include EOG7WQGMW: L0PD75, B6K2G3, B6K7R9, P40903, Q9UTS0, P09230, Q6CAG3, Q6CDL6, Q6CED6, Q6CHQ5, K0KKS7, K0KLN5, I2GWL7, G8C0U0, A7TGW1, A7TI81, J7R679, J7S7G8, H2ARF1, H2B1Z3, G0VE53, G0VG86, G0WG86, G0WI68, Q6FVE0, Q6FXI6, J8LI91, J8PPP8, J6ECK6, C8Z6T1, C8ZHW6, E7LTE5, E7KME8, E7KU66, E7QDM3, E7NGQ6, B3LJ58, B3LRV8, C7GJV0, C7GXE7, P09232, P25036, A6ZNK8, A6ZQP1, G2WCH4, G2WMN3, B5VH77, B5VRQ5, E7KB19, G8ZY55, C5DTE7, C5DER2, Q6CSH5, G8JQW6, Q75CA0, C4XVW3, C4Y0Y9, G3B1H8, G3B5Z2, A5DQT9, A5DRK4, Q6BT81, Q6BU37, G8Y2W7, G8YID5, A3LTQ3, A3LY46, G3AKH7, G3AKH8, G3AKH9, G3AVA7, A5DYI0, A5E6E6, H8X442, H8XAB0, G8BJR5, G8BKW5, C5M7Q3, C5MGI9, B9WBX7, B9WJU6, Q59Z57, Q5A099, C4YJZ5, C4YT18, E7R5X5, F2QNU8, F2QPZ2, C4QWH2, C4QYT0, G1X9K0, G1XDP8, G1XLL2, G1X8P8, D5G4Q1, D5G948, D5GKK3, C7YIT9, C7YJM3, C7YLP6, C7YSV9, C7ZG07, C7ZI44, C7Z J9, J9ME45, J9MEI2, J9M 46, J9MPS0, J9N3P2, J9NAP2, J9NE39, F9F249, F9FF64, F9FP14, F9FUQ4, F9G0Y1, F9G3L2, F9G471, F9G7A8, K3UE80, K3UVC8, K3VB03, K3VK54, K3VMV3, K3VTN6, K3VZS1, K3W2J6, 11R9N9, 11RB96, I1RGU8, 11RHQ4, 11RUV5, 11RW12, 11S1C5, 11S3L6, G9N R3, G9NVB2, G9P5N1, G9P6E2, G0REN2, G0RHA8, G0RRH0, G0RRK1, G9MFA6, G9MZX8, G9N8A5, G9N8F3, G9NA09, E9DTW1, E9DVC3, E9E3D7, E9E5J4, E9E788, E9EAE1 , E9EFF2, E9EFM9, E9EGK7, E9EQQ6, E9F076, E9F724, E9F8V2, E9F8W9, E9FBI7, E9FBR3, E9FDB 1 , E9FDM8, J4KP98, J4KRC0, J4UNL8, J5JM98, J5K3B7, G3J3F3, G3J969, G3JEB0, G3JHW1, G3JJD0, G3JJM4, C9S9N0, C9SL49, C9SQM3, G2WYI8, G2X826, G2X9Y1, L2FCI6, L2FNW0, L2FTS6, L2FV 5, L2G4B7, L2G4M0, L2GHG7, E3Q3S5, E3QH13, E3QJ58, E3QPF9, H1UWA2, H1UZ47, H1V5K4, H1VCC3, H1VCH6, H1VTG6, H1W1B9, F7VQK3, F7W7E4, F7W8Q7, Q1K5X6, Q7S0X5, F8MEY9, F8MT44, F8N4Q2, G4U7U5, G4UFP5, G4UWU8, G0RZZ4, G0S9X5, G0S9Y5, G2QUE4, G2QWI7, G2R4H9, Q2GUF9, Q2GYU7, Q2H4N5, G2Q6Z6, G2Q925, G2QGL4, F0XT69, J3NLS1, J3NM96, J3P285, J3P432, G4MW44, G4MY16, G4N2V5, G5EHJ3, P58371, L8FSM5, L8FUP2, L8G6I7, K1WZP9, K1XP69, H0EGQ5, H0EQU7, H0EWV7, A7ED92, 
A7F4T0, G2YQS9, G2YRA6, F9WXN5, F9X5S9, F9XA84, F9XB63, F9XE34, K2R445, K2RXV9, F5HHJ0, Q0TXC6, Q0TZZ4, Q0U4V4, Q0UA54, Q0UP23, E4ZK32, E4ZK33, E4ZQP9, E4ZXM2, E5AA55, E5ACC0, B2VSR3, B2VY68, B2VY69, B2WEW2, E3RL07, E3RUQ2, E3RUQ3, E3RVQ3, H6C8Q2, C1H074, C1GJI6, C0S808, F2TE84, C5JF1 1, C5GHL3, A6QTC9, C0NTY8, F0ULK1, C6HEQ0, C5G168, C5FII2, E4UUN0, E4UZP9, F2Q5Y4, F2RNF1, F2S2G7, F2SN26, F2T0C0, D4AZ75, D4AX50, D4DLI5, D4DKQ4, C4JK17, C4JLR1 , C4JQ49, C4JQB5, C4K087, 19NM85, 19XM64, J3K0S2, J3KCY7, C5P6D1, C5P906, C5NZ70, C5P4Z8, E9CZL0, E9D3H7, E9DBN4, E9DIN5, B6QEB0, B8MD40, B6HLS3, K9FXJ0, K9FN1 1 , Al CAEl , A1CIA7, A1DER5, A1CWF3, B0Y473, B0Y708, P87184, P28296, C8VUL6, Q00208, G7XH05, G7XXA4, G3XMS 1 , G3Y0L4, A2QMZ7, A2QTZ2, Q0CID8, Q0CQY4, B8NUE0, B8N106, 17ZW44, 18A6W5, P12547, Q2U428.

[00140] Orthologs provided herein include EOG7X6TT9 : B6JV82, Q 10057, Q6C781 , K0KYD4, 12GVZ5, G8BQ74, A7TFB1, J7RMC3, J7RW07, H2ATH7, H2AXQ8, G0V556, G0VAV1 , G0W5H7, G0WDA5, Q6FSC0, Q6FX95, J8LJC2, J8LQX0, J6EHM2, J6EQ53, C8Z447, C8Z674, E7LRU8, E7LT70, E7KKR7, E7KMC0, E7QC03, E7QDH9, E7NGE6, E7Q2U8, B3LFF2, B3LU41, C7GKV1, C7GSZ8, P32474, P17967, A6ZTE9, A6ZZA5, G2W9Y3, G2WBP5, B5VER8, B5VH39, E7K9Z8, E7KBH1, G8ZQS4, C5DZ36, C5DK19, F2Z6F2, G8JM34, Q751 V7, C4Y795, G3B0N1, A5DNC3, Q6BN93, G8Y1C1 , A3LSL2, G3AV65, A5E6G6, H8XAC9, G8BKY1, C5MGL3, B9WKI6, Q5A5F2, C4YSW3, E7R696, F2QY05, C4R938, G1XG88, D5GF75, C7YK79, J9MB95, F9F193, K3V2C6, 11S4Z7, G9NW38, G0RMH6, G9MI10, E9DSE8, E9F4R4, J5JHT2, G3JLF8, C9SLW1, G2X024, L2FNS9, E3Q5T5, HI VYS5, F7VKY9, Q7S399, F8N1K6, G4UC97, G0SGS2, G2R472, Q2HFQ2, G2PZX2, F0X918, J3P7V1, G4MPX2, L8G5P5, K1X203, H0ECW9, A7ECC8, G2YZD4, F9XN30, K2RC15, Q0U7L3, Q0UGY2, E5A315, B2W8Q8, E3S8N2, H6C3Q5, C1GR41, C1G9A5, COSIOO, F2T410, C5JSZ7, C5GDB9, A6R1X2, C0NCH9, F0UIA6, C6HAT9, C5FFB3, E5QZ06, F2S5A0, F2SP00, D4B2L8, D4D7G0, C4JT91, J3K3P6, C5P668, E9CSC9, B6Q377, B8LT84, B6HK48, K9GD09, 9GF84, A1C5W8, A1DG36, B0XVY4, Q4WH99, Q5AW94, G7XB83, G3Y367, A2QFJ8, Q0CL25, B8NPF9, 17ZML7, Q00248.
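
The ortholog listings above read as comma-separated protein accession numbers. As a minimal sketch, assuming the entries are UniProt accessions (the listings do not say so explicitly), the following Python helper flags tokens that do not fit the accession pattern documented by UniProt, which may help locate entries damaged in transcription. The function names are hypothetical and chosen only for illustration.

```python
import re

# Accession pattern as documented in the UniProt help pages.
# Assumption: the listed identifiers are UniProt accessions.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def split_accessions(listing: str) -> list[str]:
    """Split a comma-separated listing into trimmed, non-empty tokens."""
    return [tok.strip().rstrip(".") for tok in listing.split(",") if tok.strip()]


def flag_malformed(listing: str) -> list[str]:
    """Return the tokens that do not match the accession pattern."""
    return [
        tok
        for tok in split_accessions(listing)
        if UNIPROT_ACCESSION.fullmatch(tok) is None
    ]
```

A flagged token is only a candidate for review; the pattern cannot say what the intended accession was.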

[00141] Filamentous fungi are desirable host organisms for heterologous nutritive polypeptide production. Host organisms such as Aspergillus are provided that achieve high-yield secretion. Additionally, provided herein are nutritive polypeptides produced from fungal hosts, such nutritive polypeptides having modulated post-translational modifications such as glycosylation. Limitations in the post-transcriptional processing by fungal hosts, such as those imposed by vacuolar sorting genes, previously restricted heterologous protein secretion. (See, e.g., Yoon, J et al. 2010). Provided are fungal host organisms containing variant nucleic acids, such variant nucleic acids representing one or more modifications, deletions, and/or amplifications to genes encoding proteins involved in post-transcriptional processing, thereby enhancing nutritive protein processing with the result of increasing heterologous secretion of nutritive polypeptides having preferred protein characteristics including digestibility and solubility. By way of non-limiting example, VPS10 is a vacuolar protein sorting gene that encodes a type 1 transmembrane receptor protein involved in targeting and delivery of recombinant proteins from the late-Golgi compartments to vacuoles (See, e.g., Holkeri, H et al. 1998 FEBS Lett 429:162-6). Modifications to this gene increase sorting of the vacuolar proteins to the vacuoles and provide increased secretion of heterologous nutritive polypeptides, including the ability to increase the ratio of secreted nutritive polypeptide to total produced nutritive polypeptide.

[00142] Further, provided are filamentous fungi having variants of homologous secretion-supporting genes and their encoded proteins. Exemplary genes such as CUP5, SSA4, BMH2, KIN2, SSE1, BFR2, PDI1, COG6, SSO2, KAR2, HAC1, ERO1, COY1, IMH1, and SEC31, when expressed with a heterologous nutritive polypeptide in different combinations and at different levels compared to a wild-type fungal host, significantly impact secretion of the nutritive polypeptide. For example, increased expression of one or more of CUP5, SSA4, BMH2, KIN2, SSE1, and BFR2 in a fungal host provides increased secretion of the heterologous protein. In another example, COG6, COY1, and SSO2 over-expression in combination with other genes may be useful to demonstrate increases in nutritive polypeptide secretion. (See, e.g., Gasser et al. 2007 Appl Environ Microbiol. 73:6499-507; Gasser et al. 2007 Biotechnol Lett 29:201-12).

[00143] Filamentous fungi have numerous checkpoints within the endoplasmic reticulum (ER) and the Golgi apparatus, which may in certain fungal hosts limit the types and conformations of proteins that are able to be secreted. Provided are methods of altering intra-organelle conditions, resulting in an increased heterologous nutritive polypeptide retention time, increased proper folding, stabilization of the ER, and increased transport of nutritive polypeptides between organelles and into the extracellular milieu. (See, e.g., Martin 2014). For example, provided is the manipulation of the PMR1 gene to create variants in a fungal host. PMR1 encodes a polypeptide that mediates Ca2+ levels within the organelles to allow for significantly higher Ca2+ concentrations, which in turn provide higher BiP binding to incompletely folded nutritive polypeptides. This Ca2+-BiP-nutritive polypeptide complex results in increased ER stability, as well as increased nutritive polypeptide residence time within the ER, providing an environment conducive to proper folding. It is believed that once protein folding is achieved, BiP is released from the matrix and the ER is destabilized, allowing for vesicle budding; the properly folded protein is then moved into the Golgi, still in the presence of higher Ca2+ concentrations. This increased Ca2+ allows the nutritive polypeptide to bypass major Golgi checkpoints, driving the movement of secretory vesicles through the fungal hyphae, and resulting in increased nutritive polypeptide secretion.

[00144] In one embodiment, provided are methods for producing Aspergillus host cells with variant vacuolar protein sorting genes (See, e.g., Le Crom et al. 2009 PNAS 106:16151-6), such as Aovps10 (Yoon et al., 2010 Appl Environ Microbiol. 76:5718-27), COG6, COY1, and IMH1 (Gasser et al. 2007 Appl. Environ Microbiol. 73:6499-507), PMR1 (Harmsen et al. 1996 Appl Microbiol. Biotechnol. 46:365-70), and SSO1 and SSO2 (Larsson et al., 2001, Appl. Environ. Microbiol. 67:1163-70; Toikkanen et al., 2004, Yeast 21:1045-55).

[00145] In one embodiment, provided are methods for producing Aspergillus host cells with variant protease genes, or protease protein levels. Exemplary secreted proteases are modulated by, e.g., variant prtT (Punt et al. 2008, Fungal Genetics and Biol. 45:1591-9). Additionally, secreted protease genes are deleted or modified to produce Aspergillus hosts having one of the following deletions ("Δ"): ΔpepA, ΔpepB, ΔpepD, ΔpepF, and ΔpepH. In some embodiments, a protease that is only secreted under fermentation conditions is deleted or its expression and/or secretion reduced. In other embodiments, cytoplasmic proteases are deleted, e.g., highly transcribed proteases, or proteases that are generally transcribed under fermentation conditions.

[00146] In one embodiment, provided are methods for producing Aspergillus host cells with variant post-Golgi trafficking proteins, or variant endoplasmic reticulum processing, such as by deletion of one or more substantial secreted proteins, such as glaA or cbh1.

[00147] In one embodiment, provided are methods for producing Aspergillus host cells with variant sub-apical regions, such as by variants of chsD and chsE of Aspergillus nidulans.

[00148] In one embodiment, provided are methods for producing Aspergillus host cells with modified post-transcriptional processing capability.

[00149] Also provided are host cells having one or more variant nucleic acids resulting in reduced-expression promoters, in order to decrease cellular stresses and increase secretion ability.

[00150] Also provided are host cells having one or more variant nucleic acids resulting in modulated level and/or activity of one or more glycosylation enzymes, e.g., the O-mannosyltransferase pmt1, dpm1, psa1, or rer1. (See, e.g., Kruszewska et al., 1999; Uccelletti et al., 2005; or Perlinska-Lenart et al., 2006). In addition, provided are host cells that are capable of over-expressing one or more SNARE regulating proteins. (See, e.g., Hou et al., 2012).

[00151] Also provided are host cells having one or more variant nucleic acids resulting in increased levels and/or activities of various chaperone proteins, particularly foldases, e.g., HSF1, bipA, pdiA, hacA, or prpA. Also provided are variant nucleic acids of the transcriptional regulator hac1 (See, e.g., Valkonen et al., 2003), as well as karlp. (See, e.g., Smith and Robinson, 2002).

[00152] Also provided are host cells having one or more variant nucleic acids resulting in decreased levels and/or activities of protein degradation pathways, such as the VPS4, VPS8, VPS13, VPS35, VPS36, PEP4 vacuolar proteases (See, e.g., Zhang et al., 2001 JCB 153:1187-98), as well as YAP3 and/or KEX2 (See, e.g., Geisow et al., 1991), or SKI5P (See, e.g., Bussey et al., 1983, Curr. Genet. 7:449-56).

[00153] Also provided are host cells having one or more variant nucleic acids resulting in altered metabolic state of the host cell, such as amino acid metabolism and/or respiration/oxidative stress, and oxygen and ATP consumption (See, e.g., Tyo et al., 2012 BMC Biol. 10:16). In some embodiments, the citric acid pathway is disrupted to decrease citric acid production, variant genes including citrate synthase, aconitase, and isocitrate dehydrogenase. In some embodiments, the oxalic acid pathway is disrupted to decrease oxalic acid production, variant genes including oxaloacetate acetylhydrolase.

[00154] Also provided are host cells having one or more variant nucleic acids that allow enhanced gene targeting frequency, as described herein, such as Ku70.

[00155] Also provided are host cells that contain a variant nucleic acid that results in increased expression of one or more genes in the host cell. (See, e.g., Sampson et al., Bioessays. (2014): 36(1):34-8; Gilbert et al., Cell. (2013): 18:154(2):442-51; and Farzadfard et al., ACS Synth Biol. (2013): 18:2(10):604-13).

[00156] As described herein, provided are host cells transformed with the nucleic acid molecules or vectors disclosed herein, and descendants thereof. In some embodiments the host cells are microbial cells. In some embodiments, the host cells carry the nucleic acid sequences on vectors, which may but need not be freely replicating vectors. In other embodiments, the nucleic acids have been integrated into the genome of the host cells and/or into an endogenous plasmid of the host cells. The transformed host cells find use, e.g., in the production of recombinant proteins disclosed herein.

[00157] In preferred embodiments, provided are fungal host cells with variant nucleic acids. Several aspects of the invention relate to host cells, nucleic acids, polypeptides and systems for expression of CRISPR transcripts (e.g. nucleic acid transcripts, proteins, or enzymes) in eukaryotic cells. For example, CRISPR transcripts can be expressed in fungal cells, such as Aspergillus, Trichoderma, and yeast cells.

[00158] In general, "CRISPR" or the "CRISPR system" refers collectively to nucleic acids, proteins and other elements involved in the expression of or directing the activity of Clustered Regularly Interspaced Short Palindromic Repeats ("CRISPR") and CRISPR-associated ("Cas") genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a "direct repeat" and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a "spacer" in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus.

[00159] Use of the CRISPR system is provided in, e.g., PCT/US2014/023828, PCT/US2013/053287, PCT/US2013/032589, PCT/US2013/045602, PCT/US2014/029304, PCT/US2014/028630, PCT/US2014/029068, PCT/US2014/028445, US20140273235, US20140273234, US20140273233, US20140273232, US20140273231, US20140273230, and US20140273226, the contents of which are hereby incorporated by reference in their entireties.

[00160] In some embodiments, one or more elements of a CRISPR system is derived from a type I, type II, or type III CRISPR system. In some embodiments, one or more elements of a CRISPR system is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). In the context of formation of a CRISPR complex, "target sequence" refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. Full complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell. In some embodiments, the target sequence may be within an organelle of a eukaryotic cell, for example, mitochondrion or chloroplast. A sequence or template that may be used for recombination into the targeted locus comprising the target sequences is referred to as an "editing template" or "editing polynucleotide" or "editing sequence". In aspects of the invention, an exogenous template polynucleotide may be referred to as an editing template. In an aspect of the invention the recombination is homologous recombination. 
Non-limiting examples of suitable CRISPR/Cas proteins include Cas3, Cas4, Cas5, Cas5e (or CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9, Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3, Cse1 (or CasA), Cse2 (or CasB), Cse3 (or CasE), Cse4 (or CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csz1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966. In exemplary embodiments, the CRISPR/Cas-like protein of the fusion protein is derived from a Cas9 protein. The Cas9 protein can be from, or codon-optimized for expression in, Aspergillus sp., Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicellulosiruptor bescii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsoni, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus, or Acaryochloris marina. The CRISPR/Cas-like protein of the fusion protein can be a wild type CRISPR/Cas protein, a modified CRISPR/Cas protein, or a fragment of a wild type or modified CRISPR/Cas protein. The CRISPR/Cas protein can be modified to increase nucleic acid binding affinity and/or specificity, alter an enzymatic activity, and/or change another property of the protein. For example, nuclease (i.e., DNase, RNase) domains of the CRISPR/Cas protein can be modified, deleted, or inactivated. Alternatively, the CRISPR/Cas protein can be truncated to remove domains that are not essential for the function of the fusion protein. The CRISPR/Cas protein can also be truncated or modified to optimize the activity of the effector domain of the fusion protein. In some embodiments, the CRISPR/Cas-like protein of the fusion protein can be derived from a wild type Cas9 protein or fragment thereof. In other embodiments, the CRISPR/Cas-like protein of the fusion protein can be derived from modified Cas9 protein. For example, the amino acid sequence of the Cas9 protein can be modified to alter one or more properties (e.g., nuclease activity, affinity, stability, etc.) of the protein. Alternatively, domains of the Cas9 protein not involved in RNA-guided cleavage can be eliminated from the protein such that the modified Cas9 protein is smaller than the wild type Cas9 protein. In general, a Cas9 protein comprises at least two nuclease (i.e., DNase) domains.
For example, a Cas9 protein can comprise a RuvC-like nuclease domain and a HNH-like nuclease domain. The RuvC and HNH domains each cut a single strand, and together they make a double-stranded break in DNA. (Jinek et al., Science, 337: 816-821). In some embodiments, the Cas9-derived protein can be modified to contain only one functional nuclease domain (either a RuvC-like or a HNH-like nuclease domain). For example, the Cas9-derived protein can be modified such that one of the nuclease domains is deleted or mutated such that it is no longer functional (i.e., the nuclease activity is absent). In some embodiments in which one of the nuclease domains is inactive, the Cas9-derived protein is able to introduce a nick into a double-stranded nucleic acid (such protein is termed a "nickase"), but not cleave the double-stranded DNA. For example, an aspartate to alanine (D10A) conversion in a RuvC-like domain converts the Cas9-derived protein into a nickase. Likewise, a histidine to alanine (H840A) conversion in a HNH domain converts the Cas9-derived protein into a nickase.
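
The relationship between the two nuclease domains and the resulting cleavage behavior can be sketched in Python as follows. This is an illustrative model only: the function name and the returned labels are hypothetical, and the case in which both domains are inactivated is not described in the paragraph above and is included here as an assumption (no strand is cut).

```python
# Substitutions named in the text, keyed to the domain each one inactivates.
INACTIVATING = {"D10A": "RuvC", "H840A": "HNH"}


def classify_cas9(mutations):
    """Classify a Cas9 variant by which nuclease domains remain active."""
    inactive = {INACTIVATING[m] for m in mutations if m in INACTIVATING}
    active = {"RuvC", "HNH"} - inactive
    if len(active) == 2:
        return "nuclease"  # each domain cuts one strand: double-stranded break
    if len(active) == 1:
        return "nickase"  # only one strand is cut
    # Assumption, not stated in the text: with both domains inactive,
    # neither strand is cut.
    return "inactive"
```

Under this sketch, `classify_cas9(["D10A"])` and `classify_cas9(["H840A"])` both yield a nickase, matching the two conversions described above.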

[00161] A variety of host microorganisms can be transformed with a nucleic acid sequence disclosed herein and can in some embodiments be used to produce a recombinant protein disclosed herein. Suitable host microorganisms include both autotrophic and heterotrophic microbes. In some applications the use of autotrophic microorganisms allows for a reduction in the fossil fuel and/or electricity inputs required to make a protein encoded by a recombinant nucleic acid sequence introduced into the host microorganism. This, in turn, in some applications reduces the cost and/or the environmental impact of producing the protein and/or reduces the cost and/or the environmental impact in comparison to the cost and/or environmental impact of manufacturing alternative proteins, such as whey, egg, and soy. For example, the cost and/or environmental impact of making a protein disclosed herein using a host microorganism as disclosed herein is in some embodiments lower than the cost and/or environmental impact of making whey protein in a form suitable for human consumption by processing of cow's milk.

[00162] Non-limiting examples of heterotrophs include Escherichia coli, Salmonella typhimurium, Bacillus subtilis, Bacillus megaterium, Corynebacterium glutamicum, Streptomyces coelicolor, Streptomyces lividans, Streptomyces venezuelae, Streptomyces roseosporus, Streptomyces fradiae, Streptomyces griseus, Streptomyces clavuligerus, Streptomyces hygroscopicus, Streptomyces platensis, Saccharopolyspora erythraea, Aspergillus niger, Aspergillus nidulans, Aspergillus oryzae, Aspergillus terreus, Aspergillus sojae, Penicillium chrysogenum, Trichoderma reesei, Clostridium acetobutylicum, Clostridium beijerinckii, Clostridium thermocellum, Fusibacter paucivorans, Saccharomyces cerevisiae, Saccharomyces boulardii, Pichia pastoris, and Pichia stipitis.

[00163] Photoautotrophic microorganisms include eukaryotic algae, as well as prokaryotic cyanobacteria, green-sulfur bacteria, green non-sulfur bacteria, purple sulfur bacteria, and purple non-sulfur bacteria. Extremophiles are also contemplated as suitable organisms. Such organisms are provided, e.g., in Mixotrophic organisms are also suitable organisms. Algae and cyanobacteria are contemplated as suitable organisms. See the organisms disclosed in, e.g., PCT/US2013/032232, filed March 15, 2013; PCT/US2013/032180, filed March 15, 2013; PCT/US2013/032225, filed March 15, 2013; PCT/US2013/032218, filed March 15, 2013; PCT/US2013/032212, filed March 15, 2013; PCT/US2013/032206, filed March 15, 2013; PCT/US2013/038682, filed April 29, 2013; PCT/US2014/057526, filed September 25, 2014; PCT/US2014/057527, filed September 25, 2014; and PCT/US2014/057528, filed September 25, 2014.

[00164] Yet other suitable organisms include synthetic cells or cells produced by synthetic genomes as described in Venter et al. US Pat. Pub. No. 2007/0264688, and cell-like systems or synthetic cells as described in Glass et al. US Pat. Pub. No. 2007/0269862.

[00165] Still other suitable organisms include Escherichia coli, Acetobacter aceti, Bacillus subtilis, yeast and fungi such as Clostridium ljungdahlii, Clostridium thermocellum, Penicillium chrysogenum, Pichia pastoris, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pseudomonas fluorescens, or Zymomonas mobilis. In some embodiments those organisms are engineered to fix carbon dioxide while in other embodiments they are not.

[00166] Transfection

[00167] Proteins can be produced in a host cell using, for example, a combination of recombinant DNA techniques and gene transfection methods as is well known in the art (e.g., Morrison, S. (1985) Science 229:1202). For expression of the protein, the expression vector(s) encoding the protein are transfected into a host cell by standard techniques. The various forms of the term transfection are intended to encompass a wide variety of techniques commonly used for the introduction of exogenous DNA into a prokaryotic or eukaryotic host cell, e.g., electroporation, calcium-phosphate precipitation, DEAE-dextran transfection and the like.

[00168] Production

[00169] Skilled artisans are aware of many suitable methods available for culturing recombinant cells to produce (and optionally secrete) a protein as disclosed herein, as well as for purification and/or isolation of expressed proteins. The methods chosen for protein purification depend on many variables, including the properties of the protein of interest, its location and form within the cell, the vector, host strain background, and the intended application for the expressed protein. Culture conditions can also have an effect on solubility and localization of a given target protein. Many approaches can be used to purify target proteins expressed in recombinant microbial cells as disclosed herein, including without limitation ion exchange and gel filtration.

[00170] In some embodiments a peptide fusion tag is added to the recombinant protein, making possible a variety of affinity purification methods that take advantage of the peptide fusion tag. In some embodiments, the use of an affinity method enables the purification of the target protein to near homogeneity in one step. Purification may include cleavage of part or all of the fusion tag with enterokinase, factor Xa, thrombin, or HRV 3C proteases, for example. In some embodiments, before purification or activity measurements of an expressed target protein, preliminary analysis of expression levels, cellular localization, and solubility of the target protein is performed. The target protein can be found in any or all of the following fractions: soluble or insoluble cytoplasmic fractions, periplasm, or medium. Depending on the intended application, preferential localization to inclusion bodies, medium, or the periplasmic space can be advantageous, in some embodiments, for rapid purification by relatively simple procedures.

[00171] In some embodiments the protein is initially not folded correctly or is insoluble. A variety of methods are well known for refolding of insoluble proteins. Most protocols comprise the isolation of insoluble inclusion bodies by centrifugation followed by solubilization under denaturing conditions. The protein is then dialyzed or diluted into a non-denaturing buffer where refolding occurs. Because every protein possesses unique folding properties, the optimal refolding protocol for any given protein can be empirically determined by a skilled artisan. Optimal refolding conditions can, for example, be rapidly determined on a small scale by a matrix approach, in which variables such as protein concentration, reducing agent, redox treatment, divalent cations, etc., are tested. Once the optimal conditions are found, they can be applied to a larger scale solubilization and refolding of the target protein.
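The matrix approach described above can be sketched as a full-factorial enumeration of candidate refolding conditions. The sketch below is illustrative only: the variable names and every level listed (concentrations, additives) are hypothetical placeholders and are not values taken from this disclosure.

```python
from itertools import product

# Hypothetical screening levels; real levels would be chosen empirically.
REFOLDING_VARIABLES = {
    "protein_mg_per_ml": [0.05, 0.1, 0.5],
    "reducing_agent_mM_DTT": [0.0, 1.0, 5.0],
    "redox_pair": ["none", "GSH/GSSG"],
    "divalent_cation": ["none", "MgCl2", "CaCl2"],
}

def refolding_matrix(variables):
    """Return one dict per combination of levels (full-factorial design)."""
    names = list(variables)
    return [dict(zip(names, levels)) for levels in product(*variables.values())]

conditions = refolding_matrix(REFOLDING_VARIABLES)
print(len(conditions), "small-scale conditions to test")
```

Each dictionary describes one small-scale trial; the condition giving the best recovery of soluble protein would then be carried forward to the larger scale.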

[00172] In some embodiments the protein does not comprise a tertiary structure. In some embodiments less than half of the amino acids in the protein participate in a tertiary structure. In some embodiments the protein does not comprise a secondary structure. In some embodiments less than half of the amino acids in the protein participate in a secondary structure. Recombinant proteins can be isolated from a culture of cells expressing them in a state that comprises one or more of these structural features. In some embodiments the tertiary structure of a recombinant protein is reduced or eliminated after the protein is isolated from a culture producing it. In some embodiments the secondary structure of a recombinant protein is reduced or eliminated after the protein is isolated from a culture producing it.

[00173] In some embodiments a CAPS buffer at alkaline pH in combination with N-lauroylsarcosine is used to achieve solubility of the inclusion bodies, followed by dialysis in the presence of DTT to promote refolding. Depending on the target protein, expression conditions, and intended application, proteins solubilized from washed inclusion bodies can be >90% homogeneous and may not require further purification. Purification under fully denaturing conditions (before refolding) is possible using His•Tag® fusion proteins and His•Bind® immobilized metal affinity chromatography (Novagen®). In addition, S•Tag™, T7•Tag®, and Strep•Tag® II fusion proteins solubilized from inclusion bodies using 6 M urea can be purified under partially denaturing conditions by dilution to 2 M urea (S•Tag and T7•Tag) or 1 M urea (Strep•Tag II) prior to chromatography on the appropriate resin. Refolded fusion proteins can be affinity purified under native conditions using His•Tag, S•Tag, Strep•Tag II, and other appropriate affinity tags (e.g., GST•Tag™ and T7•Tag) (Novagen®).

[00174] In some embodiments the protein is an endogenous protein of the host cell used to express it. That is, the cellular genome of the host cell comprises an open reading frame that encodes the recombinant protein. In some embodiments regulatory sequences sufficient to increase expression of the protein are inserted into the host cell genome and operatively linked to the endogenous open reading frame such that the regulatory sequences drive overexpression of the recombinant protein from a recombinant nucleic acid. In some embodiments heterologous nucleic acid sequences are fused to the endogenous open reading frame of the protein and cause the protein to be synthesized comprising a heterologous amino acid sequence that changes the cellular trafficking of the recombinant protein, such as directing it to an organelle or to a secretion pathway. In some embodiments an open reading frame that encodes the endogenous host cell protein is introduced into the host cell on a plasmid that further comprises regulatory sequences operatively linked to the open reading frame. In some embodiments the recombinant host cell expresses at least 2 times, at least 3 times, at least 4 times, at least 5 times, at least 10 times, at least 20 times, at least 30 times, at least 40 times, at least 50 times, or at least 100 times more of the recombinant protein than the amount of the protein produced by a similar host cell grown under similar conditions.

[00175] Purification

[00176] Secreted

[00177] It is generally recognized that nearly all secreted bacterial proteins, and those proteins from other unicellular hosts, are synthesized as pre-proteins that contain N-terminal sequences known as signal peptides. These signal peptides influence the final destination of the protein and the mechanisms by which they are transported. Most signal peptides can be placed into one of four groups based on their translocation mechanism (e.g., Sec- or Tat-mediated) and the type of signal peptidase used to cleave the signal peptide from the preprotein. Also provided are N-terminal signal peptides containing a lipoprotein signal peptide. Although proteins carrying this type of signal are transported via the Sec translocase, their peptide signals tend to be shorter than normal Sec-signals and they contain a distinct sequence motif in the C-domain known as the lipo box (L(A/S)(G/A)C) at the -3 to +1 position. The cysteine at the +1 position is lipid modified following translocation, whereupon the signal sequence is cleaved by a type II signal peptidase. Also provided are type IV or prepilin signal peptides, wherein type IV peptidase cleavage domains are localized between the N- and H-domain rather than in the C-domain common in other signal peptides.
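As an illustration of the lipo box motif described above, the following minimal sketch scans an N-terminal region for the consensus L, then A or S, then G or A, then C, and reports the index of the +1 cysteine. The function name and the example sequence are hypothetical; a real lipoprotein predictor would also check signal peptide length and the charge and hydrophobicity of the N- and H-domains.

```python
import re

# Consensus lipo box: L at -3, A or S at -2, G or A at -1, C at +1.
LIPOBOX = re.compile(r"L[AS][GA]C")

def find_lipobox_cysteine(sequence: str, max_signal_length: int = 40):
    """Return the 0-based index of the +1 cysteine of the first lipo box
    found in the N-terminal region, or None if no motif is present."""
    match = LIPOBOX.search(sequence[:max_signal_length])
    if match is None:
        return None
    return match.end() - 1  # the cysteine is the last residue of the motif

# Hypothetical pre-protein; the motif "LAGC" begins at index 11.
example = "MKKLLIAAVLSLAGCSSN"
cys = find_lipobox_cysteine(example)
print(cys, example[cys:])
```

In this toy sequence the type II signal peptidase would cleave immediately before the reported cysteine, which becomes the first residue of the mature lipoprotein.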

[00178] As provided herein, the signal peptides can be attached to a heterologous polypeptide sequence (i.e., different than the protein the signal peptide is derived or obtained from) containing a nutritive polypeptide, in order to generate a recombinant nutritive polypeptide sequence. Alternatively, if a nutritive polypeptide is naturally secreted in the host organism, it can be sufficient to use the native signal sequence or a variety of signal sequences that directs secretion. In some embodiments of the nutritive polypeptides, the heterologous nutritive polypeptide sequence attached to the carboxyl terminus of the signal peptide is an edible species eukaryotic protein, a mutein or derivative thereof, or a polypeptide nutritional domain. In other embodiments of the polypeptide, the heterologous nutritive polypeptide sequence attached to the carboxyl terminus of the signal peptide is an edible species intracellular protein, a mutein or derivative thereof, or a polypeptide nutritional domain.

[00179] Purification of nutritive polypeptides.

[00180] Also provided are methods for recovering the secreted nutritive polypeptide from the culture medium. In some embodiments the secreted nutritive polypeptide is recovered from the culture medium during the exponential growth phase or after the exponential growth phase (e.g., in pre-stationary phase or stationary phase). In some embodiments the secreted nutritive polypeptide is recovered from the culture medium during the stationary phase. In some embodiments the secreted nutritive polypeptide is recovered from the culture medium at a first time point, the culture is continued under conditions sufficient for production and secretion of the recombinant nutritive polypeptide by the microorganism, and the recombinant nutritive polypeptide is recovered from the culture medium at a second time point. In some embodiments the secreted nutritive polypeptide is recovered from the culture medium by a continuous process. In some embodiments the secreted nutritive polypeptide is recovered from the culture medium by a batch process. In some embodiments the secreted nutritive polypeptide is recovered from the culture medium by a semi-continuous process. In some embodiments the secreted nutritive polypeptide is recovered from the culture medium by a fed-batch process. Those skilled in the art are aware of many suitable methods available for culturing recombinant cells to produce (and optionally secrete) a recombinant nutritive polypeptide as disclosed herein, as well as for purification and/or isolation of expressed recombinant polypeptides. The methods chosen for polypeptide purification depend on many variables, including the properties of the polypeptide of interest. Various methods of purification are known in the art, including diafiltration, precipitation, and chromatography.

[00181] Non-secreted

[00182] In some aspects, proteins can be isolated in the absence of secretion. For example, a cell having the protein (e.g., on the cell surface or intracellularly) can be lysed and the protein can be purified using standard methods such as chromatography or antibody-based isolation of the protein from the lysate. In some aspects, a cell surface expressed protein can be enzymatically cleaved from the surface.

Allergenicity Assays

[00183] For some embodiments it is preferred that the protein not exhibit inappropriately high allergenicity. Accordingly, in some embodiments the potential allergenicity of the protein is assessed. This can be done by any suitable method known in the art. In some embodiments an allergenicity score is calculated. The allergenicity score is a primary-sequence-based metric, based on WHO recommendations (fao.org/ag/agii/food/pdf/allergygm.pdf), for assessing how similar a protein is to any known allergen, the primary prediction being that high percent identity between a target and a known allergen is likely indicative of cross reactivity. For a given protein, the likelihood of eliciting an allergic response can be assessed via one or both of a complementary pair of sequence homology based tests. The first test determines the protein's percent identity across the entire sequence via a global-global sequence alignment to a database of known allergens using the FASTA algorithm with the BLOSUM50 substitution matrix, a gap open penalty of 10, and a gap extension penalty of 2. It has been suggested that proteins with less than 50% global homology are unlikely to be allergenic (Goodman R. E. et al. Allergenicity assessment of genetically modified crops—what makes sense? Nat. Biotech. 26, 73-81 (2008); Aalberse R. C. Structural biology of allergens. J. Allergy Clin. Immunol. 106, 228-238 (2000)).

[00184] In some embodiments of a protein, the protein has less than 50% global homology to any known allergen in the database used for the analysis. In some embodiments a cutoff of less than 40% homology is used. In some embodiments a cutoff of less than 30% homology is used. In some embodiments a cutoff of less than 20% homology is used. In some embodiments a cutoff of less than 10% homology is used. In some embodiments a cutoff of from 40% to 50% is used. In some embodiments a cutoff of from 30% to 50% is used. In some embodiments a cutoff of from 20% to 50% is used. In some embodiments a cutoff of from 10% to 50% is used. In some embodiments a cutoff of from 5% to 50% is used. In some embodiments a cutoff of from 0% to 50% is used. In some embodiments a cutoff of greater than 50% global homology to any known allergen in the database used for the analysis is used. In some embodiments a cutoff of from 50% to 60% is used. In some embodiments a cutoff of from 50% to 70% is used. In some embodiments a cutoff of from 50% to 80% is used. In some embodiments a cutoff of from 50% to 90% is used. In some embodiments a cutoff of from 55% to 60% is used. In some embodiments a cutoff of from 65% to 70% is used. In some embodiments a cutoff of from 70% to 75% is used. In some embodiments a cutoff of from 75% to 80% is used.
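The global identity test can be sketched as follows. This is a simplified stand-in and not the method of the disclosure: it uses a plain Needleman-Wunsch alignment with match/mismatch scoring and a linear gap penalty instead of the FASTA algorithm with BLOSUM50 and the 10/2 gap penalties named above, so the percentages it produces will differ from those of the real tool. The function names and the default 50% cutoff argument are assumptions chosen for illustration.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity over a global alignment (toy scoring, linear gaps)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                identical += a[i - 1] == b[j - 1]
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return 100.0 * identical / columns if columns else 0.0

def max_global_identity(protein: str, allergens) -> float:
    """Highest global percent identity of the protein to any database entry."""
    return max((global_identity(protein, a) for a in allergens), default=0.0)

def passes_global_test(protein: str, allergens, cutoff: float = 50.0) -> bool:
    """True if the protein is below the chosen identity cutoff for every allergen."""
    return max_global_identity(protein, allergens) < cutoff
```

Here percent identity is computed over the alignment length, gaps included; other conventions (for example, dividing by the shorter sequence length) would give different numbers, so the cutoff and the convention need to be chosen together.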

[00185] The second test assesses the local allergenicity along the protein sequence by determining the local allergenicity of all possible contiguous 80 amino acid fragments via a global-local sequence alignment of each fragment to a database of known allergens using the FASTA algorithm with the BLOSUM50 substitution matrix, a gap open penalty of 10, and a gap extension penalty of 2. The highest percent identity of any 80 amino acid window with any allergen is taken as the final score for the protein of interest. The WHO guidelines suggest using a 35% identity cutoff with this fragment test. In some embodiments of a protein, all possible fragments of the protein have less than 35% local homology to any known allergen in the database used for the analysis using this test. In some embodiments a cutoff of less than 30% homology is used. In some embodiments a cutoff of from 30% to 35% homology is used. In some embodiments a cutoff of from 25% to 30% homology is used. In some embodiments a cutoff of from 20% to 25% homology is used. In some embodiments a cutoff of from 15% to 20% homology is used. In some embodiments a cutoff of from 10% to 15% homology is used. In some embodiments a cutoff of from 5% to 10% homology is used. In some embodiments a cutoff of from 0% to 5% homology is used. In some embodiments a cutoff of greater than 35% homology is used. In some embodiments a cutoff of from 35% to 40% homology is used. In some embodiments a cutoff of from 40% to 45% homology is used. In some embodiments a cutoff of from 45% to 50% homology is used. In some embodiments a cutoff of from 50% to 55% homology is used. In some embodiments a cutoff of from 55% to 60% homology is used. In some embodiments a cutoff of from 65% to 70% homology is used. In some embodiments a cutoff of from 70% to 75% homology is used. In some embodiments a cutoff of from 75% to 80% homology is used.
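The fragment test can be sketched as a sliding window over the protein, with the highest window score taken as the final score. In this sketch the window enumeration follows the text, but `ungapped_identity` is a hypothetical toy scorer used only to make the example self-contained; the disclosure describes a global-local FASTA alignment with BLOSUM50 instead, which any function with the same signature could supply through `identity_fn`.

```python
from typing import Callable, Iterable, Iterator

def windows(seq: str, size: int = 80) -> Iterator[str]:
    """Yield every contiguous window of `size` residues
    (the whole sequence once if it is shorter than `size`)."""
    if len(seq) <= size:
        yield seq
        return
    for start in range(len(seq) - size + 1):
        yield seq[start:start + size]

def ungapped_identity(fragment: str, target: str) -> float:
    """Toy scorer: best percent identity of the fragment over all
    ungapped placements along the target."""
    if not fragment:
        return 0.0
    best = 0
    for offset in range(-len(fragment) + 1, len(target)):
        matches = sum(
            1 for k, residue in enumerate(fragment)
            if 0 <= offset + k < len(target) and target[offset + k] == residue
        )
        best = max(best, matches)
    return 100.0 * best / len(fragment)

def window_score(seq: str, allergens: Iterable[str], size: int = 80,
                 identity_fn: Callable[[str, str], float] = ungapped_identity) -> float:
    """Highest identity of any window to any allergen."""
    allergens = list(allergens)
    return max((identity_fn(w, a) for w in windows(seq, size) for a in allergens),
               default=0.0)
```

A protein would pass this test under the 35% cutoff mentioned above when `window_score(seq, allergens) < 35.0`.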

[00186] Skilled artisans are able to identify and use a suitable database of known allergens for this purpose. In some embodiments the database is custom made by selecting proteins from more than one database source. In some embodiments the custom database comprises pooled allergen lists collected by the Food Allergy Research and Resource Program (allergenonline.org/), UNIPROT annotations (uniprot.org/docs/allergen), and the Structural Database of Allergenic Proteins (SDAP, fermi.utmb.edu/SDAP/sdap_lnk.html). This database includes all allergens currently recognized by the International Union of Immunological Societies (IUIS, allergen.org/) as well as a large number of additional allergens not yet officially named. In some embodiments the database comprises a subset of known allergen proteins available in known databases; that is, the database is a custom selected subset of known allergen proteins. In some embodiments the database of known allergens comprises at least 10 proteins, at least 20 proteins, at least 30 proteins, at least 40 proteins, at least 50 proteins, at least 100 proteins, at least 200 proteins, at least 300 proteins, at least 400 proteins, at least 500 proteins, at least 600 proteins, at least 700 proteins, at least 800 proteins, at least 900 proteins, at least 1,000 proteins, at least 1,100 proteins, at least 1,200 proteins, at least 1,300 proteins, at least 1,400 proteins, at least 1,500 proteins, at least 1,600 proteins, at least 1,700 proteins, at least 1,800 proteins, at least 1,900 proteins, or at least 2,000 proteins. In some embodiments the database of known allergens comprises from 100 to 500 proteins, from 200 to 1,000 proteins, from 500 to 1,000 proteins, or from 1,000 to 2,000 proteins.

[00187] In some embodiments all (or a selected subset) of contiguous amino acid windows of different lengths (e.g., 70, 60, 50, 40, 30, 20, 10, 8 or 6 amino acid windows) of a protein are tested against the allergen database and peptide sequences that have 100% identity, 95% or higher identity, 90% or higher identity, 85% or higher identity, 80% or higher identity, 75% or higher identity, 70% or higher identity, 65% or higher identity, 60% or higher identity, 55% or higher identity, or 50% or higher identity matches are identified for further examination of potential allergenicity.
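For the 100% identity case, the short-window comparison reduces to finding peptides of a given length that occur in both the protein and an allergen. The sketch below illustrates only that exact-match case, using sets of overlapping peptides; the function names, the default window lengths, and the example sequences are hypothetical, and thresholds below 100% identity would require an alignment-based comparison instead.

```python
def peptides(seq: str, k: int) -> set:
    """All contiguous peptides of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_peptides(protein: str, allergens, lengths=(8, 6)) -> dict:
    """Map each window length to the sorted peptides of that length
    that occur in both the protein and at least one allergen."""
    allergens = list(allergens)
    hits = {}
    for k in lengths:
        allergen_peptides = set().union(*(peptides(a, k) for a in allergens))
        hits[k] = sorted(peptides(protein, k) & allergen_peptides)
    return hits

# Hypothetical sequences sharing the 6-residue stretch "TAYIAK".
print(shared_peptides("MKTAYIAKQR", ["GGTAYIAKGG"]))
```

Any peptide reported here would be flagged for further examination of potential allergenicity rather than treated as conclusive.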

[00188] Another method of predicting the allergenicity of a protein is to assess the homology of the protein to a protein of human origin. The human immune system is exposed to a multitude of possible allergenic proteins on a regular basis and has the intrinsic ability to differentiate between the host body's proteins and exogenous proteins. The exact nature of this ability is not always clear, and there are many diseases that arise as a result of the failure of the body to differentiate self from non-self (e.g., arthritis). Nonetheless, the fundamental analysis is that proteins that share a degree of sequence homology to human proteins are less likely to elicit an immune response. In particular, it has been shown that for some protein families with known allergenic members (tropomyosins, parvalbumins, caseins), those proteins that bear more sequence homology to their human counterparts relative to known allergenic proteins are not thought to be allergenic (Jenkins J. A. et al. Evolutionary distance from human homologs reflects allergenicity of animal food proteins. J. Allergy Clin. Immunol. 120 (2007): 1399-1405). For a given protein, a human homology score is measured by determining the maximum percent identity of the protein to a database of human proteins (e.g., the UNIPROT database) from a global-local alignment using the FASTA algorithm with the BLOSUM50 substitution matrix, a gap open penalty of 10, and a gap extension penalty of 2. According to Jenkins et al. (Jenkins J. A. et al. Evolutionary distance from human homologs reflects allergenicity of animal food proteins. J. Allergy Clin. Immunol. 120 (2007): 1399-1405), proteins with a sequence identity to a human protein above about 62% are less likely to be allergenic. Skilled artisans are able to identify and use a suitable database of known human proteins for this purpose, for example, by searching the UNIPROT database (uniprot.org).
In some embodiments the database is custom made by selecting proteins from more than one database source. Of course the database may but need not be comprehensive. In some embodiments the database comprises a subset of human proteins; that is, the database is a custom selected subset of human proteins. In some embodiments the database of human proteins comprises at least 10 proteins, at least 20 proteins, at least 30 proteins, at least 40 proteins, at least 50 proteins, at least 100 proteins, at least 200 proteins, at least 300 proteins, at least 400 proteins, at least 500 proteins, at least 600 proteins, at least 700 proteins, at least 800 proteins, at least 900 proteins, at least 1,000 proteins, at least 2,000 proteins, at least 3,000 proteins, at least 4,000 proteins, at least 5,000 proteins, at least 6,000 proteins, at least 7,000 proteins, at least 8,000 proteins, at least 9,000 proteins, or at least 10,000 proteins. In some embodiments the database comprises from 100 to 500 proteins, from 200 to 1,000 proteins, from 500 to 1,000 proteins, from 1,000 to 2,000 proteins, from 1,000 to 5,000 proteins, or from 5,000 to 10,000 proteins. In some embodiments the database comprises at least 90%, at least 95%, or at least 99% of all known human proteins.

[00189] In some embodiments of a protein, the protein is at least 20% homologous to a human protein. In some embodiments a cutoff of at least 30% homology is used. In some embodiments a cutoff of at least 40% homology is used. In some embodiments a cutoff of at least 50% homology is used. In some embodiments a cutoff of at least 60% homology is used. In some embodiments a cutoff of at least 70% homology is used. In some embodiments a cutoff of at least 80% homology is used. In some embodiments a cutoff of at least 62% homology is used. In some embodiments a cutoff of from at least 20% homology to at least 30% homology is used. In some embodiments a cutoff of from at least 30% homology to at least 40% homology is used. In some embodiments a cutoff of from at least 50% homology to at least 60% homology is used. In some embodiments a cutoff of from at least 60% homology to at least 70% homology is used. In some embodiments a cutoff of from at least 70% homology to at least 80% homology is used.

Thermostability Assays

[00190] As used herein, a "stable" protein is one that resists changes (e.g., unfolding, oxidation, aggregation, hydrolysis, etc.) that alter the biophysical (e.g., solubility), biological (e.g., digestibility), or compositional (e.g., proportion of leucine amino acids) traits of the protein of interest.

[00191] Protein stability can be measured using various assays known in the art, and proteins disclosed herein having stability above a threshold can be selected. In some embodiments a protein is selected that displays thermal stability that is comparable to or better than that of whey protein. Thermal stability is a property that can help predict the shelf life of a protein. In some embodiments of the assay, the stability of protein samples is determined by monitoring aggregate formation using size exclusion chromatography (SEC) after exposure to extreme temperatures. Aqueous samples of the protein to be tested are placed in a heating block at 90°C and samples are taken after 0, 1, 5, 10, 30 and 60 min for SEC analysis. Protein is detected by monitoring absorbance at 214 nm, and aggregates are characterized as peaks eluting faster than the protein of interest. No overall change in peak area indicates no precipitation of protein during the heat treatment. Whey protein has been shown to rapidly form ~80% aggregates when exposed to 90°C in such an assay.
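The readout of such an SEC time course can be sketched as simple peak-area arithmetic. In this sketch the peak lists, retention times, and the tolerance parameter are hypothetical; it assumes that peak integration has already been performed and that the retention time of the protein of interest is known from the unheated sample.

```python
def percent_aggregates(peaks, monomer_rt: float, tolerance: float = 0.1) -> float:
    """peaks: list of (retention_time_min, area) tuples.
    Peaks eluting earlier than the protein of interest are counted as aggregates."""
    total = sum(area for _, area in peaks)
    if total == 0:
        return 0.0
    aggregate_area = sum(area for rt, area in peaks if rt < monomer_rt - tolerance)
    return 100.0 * aggregate_area / total

def percent_recovery(peaks, reference_peaks) -> float:
    """Total peak area relative to the unheated (0 min) sample;
    a drop below 100% suggests loss of protein to precipitation."""
    reference = sum(area for _, area in reference_peaks)
    if reference == 0:
        return 0.0
    return 100.0 * sum(area for _, area in peaks) / reference

# Hypothetical integrated chromatograms at 0 min and after heating.
t0 = [(9.0, 100.0)]
t10 = [(6.5, 80.0), (9.0, 20.0)]
print(percent_aggregates(t10, monomer_rt=9.0), percent_recovery(t10, t0))
```

Plotting `percent_aggregates` against heating time for the test protein and for a whey reference would give the comparison described above.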

[00192] In some embodiments the thermal stability of a protein is determined by heating a sample slowly from 25°C to 95°C in the presence of a hydrophobic dye (e.g., ProteoStat® Thermal shift stability assay kit, Enzo Life Sciences) that binds to aggregated proteins that are formed as the protein denatures with increasing temperature (Niesen, F. H., Berglund, H. & Vedadi, M., 2007. The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability. Nature Protocols, Volume 2, pp. 2212-2221). Upon binding, the dye's fluorescence increases significantly, which is recorded by an rtPCR instrument and represented as the protein's melting curve (Lavinder, J. J., Hari, S. B., Sullivan, B. J. & Magliery, T. J., 2009. High-Throughput Thermal Scanning: A General, Rapid Dye-Binding Thermal Shift Screen for Protein Engineering. Journal of the American Chemical Society, pp. 3794-3795). After the thermal shift is complete, samples are examined for insoluble precipitates and further analyzed by analytical size exclusion chromatography (SEC).

Solubility Assays

[00193] In some embodiments of the proteins disclosed herein the protein is soluble. Solubility can be measured by any method known in the art. In some embodiments solubility is examined by centrifuge concentration followed by protein concentration assays. Samples of proteins in 20 mM HEPES pH 7.5 are tested for protein concentration according to protocols using two methods, Coomassie Plus (Bradford) Protein Assay (Thermo Scientific) and Bicinchoninic Acid (BCA) Protein Assay (Sigma-Aldrich). Based on these measurements 10 mg of protein is added to an Amicon Ultra 3 kDa centrifugal filter (Millipore). Samples are concentrated by centrifugation at 10,000 × g for 30 minutes. The final, now concentrated, samples are examined for precipitated protein and then tested for protein concentration as above using two methods, Bradford and BCA.

[00194] In some embodiments the proteins have a final solubility limit of at least 5 g/L, 10 g/L, 20 g/L, 30 g/L, 40 g/L, 50 g/L, or 100 g/L at physiological pH. In some embodiments the proteins are greater than 50%, greater than 60%, greater than 70%, greater than 80%, greater than 90%, greater than 95%, greater than 96%, greater than 97%, greater than 98%, greater than 99%, or greater than 99.5% soluble with no precipitated protein observed at a concentration of greater than 5 g/L, or 10 g/L, or 20 g/L, or 30 g/L, or 40 g/L, or 50 g/L, or 100 g/L at physiological pH. In some embodiments, the solubility of the protein is higher than those typically reported in studies examining the solubility limits of whey (12.5 g/L; Pelegrine et al., Lebensm.-Wiss. U.-Technol. 38 (2005) 77-80) and soy (10 g/L; Lee et al., JAOCS 80(1) (2003) 85-90).

[00195] Eukaryotic proteins are often glycosylated, and the carbohydrate chains that are attached to proteins serve various functions. N-linked and O-linked glycosylation are the two most common forms of glycosylation occurring in proteins. N-linked glycosylation is the attachment of a sugar molecule to a nitrogen atom in an amino acid residue in a protein. N-linked glycosylation occurs at asparagine and arginine residues. O-linked glycosylation is the attachment of a sugar molecule to an oxygen atom in an amino acid residue in a protein. O-linked glycosylation occurs at threonine and serine residues.

[00196] Glycosylated proteins are often more soluble than their un-glycosylated forms. In terms of protein drugs, proper glycosylation usually confers high activity, proper antigen binding, better stability in the blood, etc. However, glycosylation necessarily means that a protein "carries with it" sugar moieties. Such sugar moieties may reduce the usefulness of the proteins of this disclosure, including recombinant proteins. For example, as demonstrated in the examples, a comparison of digestion of glycosylated and non-glycosylated forms of the same proteins shows that the non-glycosylated forms are digested more quickly than the glycosylated forms. For these reasons, in some embodiments the nutritive proteins according to the disclosure comprise low or no glycosylation. For example, in some embodiments the proteins comprise a ratio of non-glycosylated to total amino acid residues of at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%. In some embodiments the proteins do not comprise any glycosylation.

[00197] In some embodiments, the protein according to the disclosure is de-glycosylated after it is produced or after it is isolated. Proteins of low or no glycosylation can be made by any method known in the art. For example, enzymatic and/or chemical methods can be used (Biochem. J. (2003) 376, p339-350.). Enzymes are produced commercially at research scales for the removal of N-linked and O-linked oligosaccharides. Chemical methods include use of trifluoromethanesulfonic acid to selectively break N-linked and O-linked peptide-saccharide bonds. This method often results in a more complete deglycosylation than does the use of enzymatic methods.

[00198] In other embodiments, the protein according to the disclosure is produced with low or no glycosylation by a host organism. Most bacteria and other prokaryotes have very limited capabilities to glycosylate proteins, especially heterologous proteins. Accordingly, in some embodiments of this disclosure a protein is made recombinantly in a microorganism such that the level of glycosylation of the recombinant protein is low or zero. In some embodiments the level of glycosylation of the recombinant protein is lower than the level of glycosylation of the protein as it occurs in the organism from which it is derived. Glycosylation of a protein can vary based on the host organism; in other words, some hosts will produce more glycosylation relative to one or more other hosts, while other hosts will produce less glycosylation relative to one or more other hosts. Differences in the amount of glycosylation can be measured based upon, e.g., the mass of glycosylation present and/or the total number of glycosylation sites present.

Toxicity and Anti-Nutricity Assays

[00199] For most embodiments it is preferred that the protein not exhibit inappropriately high toxicity. Accordingly, in some embodiments the potential toxicity of the protein is assessed. This can be done by any suitable method known in the art. In some embodiments a toxicity score is calculated by determining the protein's percent identity to databases of known toxic proteins (e.g., toxic proteins identified from the UNIPROT database). A global- global alignment of the protein of interest against the database of known toxins is performed using the FASTA algorithm with the BLOSUM50 substitution matrix, a gap open penalty of 10, and a gap extension penalty of 2. In some embodiments of a protein, the protein is less than 35% homologous to a known toxin. In some embodiments a cutoff of less than 35% homology is used. In some embodiments a cutoff of from 30% to 35% homology is used. In some embodiments a cutoff of from 25% to 35% homology is used. In some embodiments a cutoff of from 20% to 35% homology is used. In some embodiments a cutoff of from 15% to 35% homology is used. In some embodiments a cutoff of from 10% to 35% homology is used. In some embodiments a cutoff of from 5% to 35% homology is used. In some embodiments a cutoff of from 0% to 35% homology is used. In some embodiments a cutoff of greater than 35% homology is used. In some embodiments a cutoff of from 35% to 40% homology is used. In some embodiments a cutoff of from 35% to 45% homology is used. In some embodiments a cutoff of from 35% to 50% homology is used. In some embodiments a cutoff of from 35% to 55% homology is used. In some embodiments a cutoff of from 35% to 60% homology is used. In some embodiments a cutoff of from 35% to 70% homology is used. In some embodiments a cutoff of from 35% to 75% homology is used. In some embodiments a cutoff of from 35% to 80% homology is used. 
Skilled artisans are able to identify and use a suitable database of known toxins for this purpose, for example, by searching the UNIPROT database (uniprot.org). In some embodiments the database is custom made by selecting proteins identified as toxins from more than one database source. In some embodiments the database comprises a subset of known toxic proteins; that is, the database is a custom selected subset of known toxic proteins. In some embodiments the database of toxic proteins comprises at least 10 proteins, at least 20 proteins, at least 30 proteins, at least 40 proteins, at least 50 proteins, at least 100 proteins, at least 200 proteins, at least 300 proteins, at least 400 proteins, at least 500 proteins, at least 600 proteins, at least 700 proteins, at least 800 proteins, at least 900 proteins, at least 1,000 proteins, at least 2,000 proteins, at least 3,000 proteins, at least 4,000 proteins, at least 5,000 proteins, at least 6,000 proteins, at least 7,000 proteins, at least 8,000 proteins, at least 9,000 proteins, or at least 10,000 proteins. In some embodiments the database comprises from 100 to 500 proteins, from 200 to 1,000 proteins, from 500 to 1,000 proteins, from 1,000 to 2,000 proteins, from 1,000 to 5,000 proteins, or from 5,000 to 10,000 proteins.

[00200] Anti-nutricity and anti-nutrients

[00201] For some embodiments it is preferred that the protein not exhibit anti-nutritional activity ("anti-nutricity"), i.e., that the protein not have the potential to prevent the absorption of nutrients from food. Examples of anti-nutritive sequences causing such anti-nutricity include protease inhibitors, which inhibit the actions of trypsin, pepsin and other proteases in the gut, preventing the digestion and subsequent absorption of protein.

[00202] Disclosed herein are formulations containing isolated nutritive polypeptides that are substantially free of anti-nutritive sequences. In some embodiments the nutritive polypeptide has an anti-nutritive similarity score below about 1, below about 0.5, or below about 0.1. The nutritive polypeptide is present in the formulation in an amount greater than about 10 g, and the formulation is substantially free of anti-nutritive factors. The formulation is present as a liquid, semi-liquid or gel in a volume not greater than about 500 mL or as a solid or semi-solid in a mass not greater than about 200 g. The nutritive polypeptide may have low homology with a protease inhibitor, such as a member of the serpin family of polypeptides, e.g., it is less than 90% identical, or is less than 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, or less than 5% identical.

[00203] Accordingly, in some embodiments the potential anti-nutricity of the protein is assessed. This can be done by any suitable method known in the art. In some embodiments an anti-nutricity score is calculated by determining the protein's percent identity to databases of known protease inhibitors (e.g., protease inhibitors identified from the UNIPROT database). A global-global alignment of the protein of interest against the database of known protease inhibitors is performed using the FASTA algorithm with the BLOSUM50 substitution matrix, a gap open penalty of 10, and a gap extension penalty of 2, to identify whether the protein is homologous to a known anti-protein. In some embodiments of a protein, the protein has less than 35% global homology to any known anti-protein (e.g., any known protease inhibitor) in the database used for the analysis. In some embodiments a cutoff of less than 35% identity is used. In some embodiments a cutoff of from 30% to 35% is used. In some embodiments a cutoff of from 25% to 35% is used. In some embodiments a cutoff of from 20% to 35% is used. In some embodiments a cutoff of from 15% to 35% is used. In some embodiments a cutoff of from 10% to 35% is used. In some embodiments a cutoff of from 5% to 35% is used. In some embodiments a cutoff of from 0% to 35% is used. In some embodiments a cutoff of greater than 35% identity is used. In some embodiments a cutoff of from 35% to 40% is used. In some embodiments a cutoff of from 35% to 45% is used. In some embodiments a cutoff of from 35% to 50% is used. In some embodiments a cutoff of from 35% to 55% is used. In some embodiments a cutoff of from 35% to 60% is used. In some embodiments a cutoff of from 35% to 70% is used. In some embodiments a cutoff of from 35% to 75% is used. In some embodiments a cutoff of from 35% to 80% is used.
Skilled artisans are able to identify and use a suitable database of known protease inhibitors for this purpose, for example, by searching the UNIPROT database (uniprot.org). In some embodiments the database is custom made by selecting proteins identified as protease inhibitors from more than one database source. In some embodiments the database comprises a subset of known protease inhibitors available in databases; that is, the database is a custom selected subset of known protease inhibitor proteins. In some embodiments the database of known protease inhibitor proteins comprises at least 10 proteins, at least 20 proteins, at least 30 proteins, at least 40 proteins, at least 50 proteins, at least 100 proteins, at least 200 proteins, at least 300 proteins, at least 400 proteins, at least 500 proteins, at least 600 proteins, at least 700 proteins, at least 800 proteins, at least 900 proteins, at least 1,000 proteins, at least 1,100 proteins, at least 1,200 proteins, at least 1,300 proteins, at least 1,400 proteins, at least 1,500 proteins, at least 1,600 proteins, at least 1,700 proteins, at least 1,800 proteins, at least 1,900 proteins, or at least 2,000 proteins. In some embodiments the database of known protease inhibitor proteins comprises from 100 to 500 proteins, from 200 to 1,000 proteins, from 500 to 1,000 proteins, from 1,000 to 2,000 proteins, or from 2,000 to 3,000 proteins.
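
By way of illustration only, the percent-identity cutoff described above might be applied as in the following Python sketch. It assumes that the global alignments (FASTA algorithm, BLOSUM50, gap open 10, gap extension 2) have already been computed upstream and are supplied as gapped strings. The function names, the "-" gap character, and the use of alignment length as the denominator for percent identity are assumptions made for this example and are not taken from the disclosure.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity over a pre-computed global alignment.

    Assumption: identity = identical non-gap columns / alignment length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def passes_cutoff(alignments, cutoff: float = 35.0):
    """Check a candidate against every database entry.

    `alignments` is an iterable of (aligned_query, aligned_subject) pairs,
    one pair per protease inhibitor in the database.
    Returns (passes, best_identity); `passes` is True only when the highest
    identity found is below the cutoff.
    """
    best = max((percent_identity(q, s) for q, s in alignments), default=0.0)
    return best < cutoff, best
```

The cutoff argument can be set to any of the values recited above; 35% is used as the default only because it is the first cutoff named in the paragraph.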

[00204] In other embodiments a protein that does exhibit some degree of protease inhibitor activity is used. For example, in some embodiments such a protein can be useful because it delays protease digestion when the nutritive protein is consumed such that the protein traverses a greater distance within the GI tract before it is digested, thus delaying absorption. For example, in some embodiments the protein inhibits gastric digestion but not intestinal digestion. Delaney B. et al. (Evaluation of protein safety in the context of agricultural biotechnology. Food Chem. Toxicol. 46 (2008): S71-S97) suggests that one should avoid both known toxic and anti-proteins when assessing the safety of a possible food protein. In some embodiments of a protein, the protein has a favorably low level of global homology to a database of known toxic proteins and/or a favorably low level of global homology to a database of known anti-nutricity proteins (e.g., protease inhibitors), as defined herein.

[00205] Antinutrients. Provided are nutritional compositions that lack anti-nutrients (or antinutrients). Antinutrients are compounds, usually other than proteins, which are typically found in plant foods and have been found to have both adverse effects and, in some situations, certain health benefits. For instance, phytic acid, lectins, phenolic compounds, saponins, and enzyme inhibitors have been shown to reduce the availability of nutrients and to cause the inhibition of growth, and phytoestrogens and lignans have been linked with infertility problems. On the other hand, phytic acid, lectins, phenolic compounds, amylase inhibitors, and saponins have been shown to reduce the blood glucose and insulin response to starch foods and/or the plasma cholesterol and triglycerides. Furthermore, phytic acid, phenolics, saponins, protease inhibitors, phytoestrogens, and lignans have been linked to reduced cancer risks.

[00206] Provided are methods for reducing the amount of anti-nutritional factors in a food product by treating the food product with a thermal treatment comprising steam or hot air having a temperature greater than about 90 degrees C for at least 1 minute, and combining the treated food product with a composition containing an isolated nutritive polypeptide. Optionally, the step of thermal treatment degrades at least one anti-nutritional factor such as a saponin, a lectin, a prolamin, a protease inhibitor, or phytic acid.

[00207] Anti-nutritional factors are detected in a protein composition as follows. Phytic acid: The procedure of Wheeler and Ferrel (Wheeler, E. L., Ferrel, R. E., Cereal Chem. 1971, 48, 312) is used for the determination of phytic acid extracted in 3% trichloroacetic acid. Raffinose family oligosaccharides: Protein samples are extracted with 70% ethanol using a Soxhlet apparatus for 6-8 h, and thin-layer chromatography is used for the quantitative determination of raffinose and stachyose in the extract according to the procedure of Tanaka et al. (Tanaka, M., Thananunkul, D., Lee, T. C., Chichester, C. O., J. Food Sci. 1975, 40, 1087-1088). Trypsin inhibitor: The method of Kakade et al. (Kakade, M. L., Rackis, J. J., McGhee, J. E., Puski, G., Cereal Chem. 1974, 51, 376-82) is used for determining the trypsin inhibitor activity in raw and treated samples. One trypsin inhibitor unit (TIU) is defined as a decrease in absorbance at 410 nm by 0.01 in 10 min, and data are expressed as TIU·mg⁻¹. Amylase inhibitor: The inhibitor is extracted in 0.15 M NaCl according to the procedure of Baker et al. (Baker, J. E., Woo, S. M., Throne, J. E., Finny, P. L., Environm. Entomol. 1991, 20, 53-60) and assayed by the method of Huesing et al. (Huesing, J. E., Shade, R. E., Chrispeels, M. J., Murdok, L. L., Plant Physiol. 1991, 96, 993-996). One amylase inhibitor unit (AIU) is defined as the amount that gives 50% inhibition of a portion of the amylase that produced one mg maltose monohydrate per min. Lectins: The procedure of Paredes-Lopez et al. (Paredes-Lopez, O., Schevenin, M. L., Guevara-Lara, F., Food Chem. 1989, 31, 129-137) is applied to the extraction of lectins using phosphate-buffered saline (PBS). The hemagglutinin activity (HA) of lectins in the sample extract is determined according to Kortt (Kortt, A. A. (Ed.), Eur. J. Biochem. 1984, 138, 519). Trypsinized human red blood cell (A, B and O) suspensions are prepared according to Lis and Sharon (Lis, H., Sharon, N., Methods Enzymol. 1972, 28, 360-368). HA is expressed as the reciprocal of the highest dilution giving positive agglutination. Tannins: The tannin contents are determined as tannic acid by Folin-Denis reagent according to the procedure of the AOAC (Helrich, K. (Ed.), AOAC, Official Methods of Analysis, Association of Official Analytical Chemists, Arlington, VA 1990).
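
As a worked illustration of the trypsin inhibitor unit defined above (a decrease in absorbance at 410 nm of 0.01 over the 10 min assay), the arithmetic can be sketched in Python as follows. The function and parameter names are hypothetical, and the sketch assumes that a control reading without inhibitor and a sample reading with inhibitor are both available; it does not reproduce the cited assay procedure itself.

```python
ABSORBANCE_PER_TIU = 0.01  # one TIU = 0.01 decrease in A410 over 10 min


def trypsin_inhibitor_units(a410_control: float, a410_sample: float) -> float:
    """TIU from the drop in absorbance at 410 nm relative to the control."""
    return (a410_control - a410_sample) / ABSORBANCE_PER_TIU


def tiu_per_mg(a410_control: float, a410_sample: float, sample_mg: float) -> float:
    """Express the activity per mg of sample, as in the text."""
    if sample_mg <= 0:
        raise ValueError("sample_mg must be positive")
    return trypsin_inhibitor_units(a410_control, a410_sample) / sample_mg
```

For example, a control absorbance of 0.50 and a sample absorbance of 0.40 correspond to about 10 TIU, or about 5 TIU per mg for a 2 mg sample.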

Charge Assays and Solvation Scoring

[00208] One feature that can enhance the utility of a protein is its charge (or per amino acid charge). Proteins with higher charge can in some embodiments exhibit desirable characteristics such as increased solubility, increased stability, resistance to aggregation, and desirable taste profiles. For example, a charged protein that exhibits enhanced solubility can be formulated into a beverage or liquid formulation that includes a high concentration of protein in a relatively low volume of solution, thus delivering a large dose of protein nutrition per unit volume. A charged protein that exhibits enhanced solubility can be useful, for example, in sports drinks or recovery drinks wherein a user (e.g., an athlete) wants to ingest protein before, during or after physical activity. A charged protein that exhibits enhanced solubility can also be particularly useful in a clinical setting wherein a subject (e.g., a patient or an elderly person) is in need of protein nutrition but is unable to ingest solid foods or large volumes of liquids.

[00209] For example, the net charge (ChargeP) of a polypeptide at pH 7 can be calculated using the following formula:

ChargeP = -0.002 - (C)(0.045) - (D)(0.999) - (E)(0.998) + (H)(0.091) + (K)(1.0) + (R)(1.0) - (Y)(-0.001)

where C is the number of cysteine residues, D is the number of aspartic acid residues, E is the number of glutamic acid residues, H is the number of histidine residues, K is the number of lysine residues, R is the number of arginine residues and Y is the number of tyrosine residues in the polypeptide. The per amino acid charge (ChargeA) of the polypeptide can be calculated by dividing the net charge (ChargeP) by the number of amino acid residues (N), i.e., ChargeA = ChargeP/N. (See Bassi S (2007), "A Primer on Python for Life Science Researchers." PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199).
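
The ChargeP and ChargeA calculations above can be sketched in Python as follows. This is a minimal illustration rather than the code of the cited primer; the names used are hypothetical. The coefficients are transcribed from the formula as printed, so the tyrosine term "- (Y)(-0.001)" evaluates to +0.001 per residue. A reader who instead treats tyrosine as weakly acidic at pH 7 would set that entry to -0.001.

```python
# Per-residue contributions at pH 7, transcribed from the ChargeP formula.
# Tyrosine is kept as printed: "- (Y)(-0.001)" gives +0.001 per residue.
CHARGE_PER_RESIDUE = {
    "C": -0.045,
    "D": -0.999,
    "E": -0.998,
    "H": 0.091,
    "K": 1.0,
    "R": 1.0,
    "Y": 0.001,
}
CHARGE_OFFSET = -0.002  # constant term of the formula


def net_charge(sequence: str) -> float:
    """ChargeP: net charge of a polypeptide at pH 7 (one-letter codes)."""
    seq = sequence.upper()
    return CHARGE_OFFSET + sum(CHARGE_PER_RESIDUE.get(aa, 0.0) for aa in seq)


def charge_per_residue(sequence: str) -> float:
    """ChargeA: ChargeP divided by the number of residues N."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return net_charge(sequence) / len(sequence)
```

Residues that do not appear in the formula contribute zero, so a sequence of only glycine, for instance, has a net charge equal to the constant term.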

[00210] One metric for assessing the hydrophilicity and potential solubility of a given protein is the solvation score. Solvation score is defined as the total free energy of solvation (i.e., the free energy change associated with transfer from gas phase to a dilute solution) for all amino acid side chains if each residue were solvated independently, normalized by the total number of residues in the sequence. The side chain solvation free energies are found computationally by calculating the electrostatic energy difference between a vacuum dielectric of 1 and a water dielectric of 80 (by solving the Poisson-Boltzmann equation) as well as the non-polar, van der Waals energy using a linear solvent accessible surface area model (D. Sitkoff, K. A. Sharp, B. Honig. "Accurate Calculation of Hydration Free Energies Using Macroscopic Solvent Models". J. Phys. Chem. 98, 1994). For amino acids with ionizable side chains (Arg, Asp, Cys, Glu, His, Lys and Tyr), an average solvation free energy is used based on the relative probabilities for each ionization state at the specified pH.

Solvation scores start at 0 and continue into negative values, and the more negative the solvation score, the more hydrophilic and potentially soluble the protein is predicted to be. In some embodiments of a protein, the protein has a solvation score of -10 or less at pH 7. In some embodiments of a protein, the protein has a solvation score of -15 or less at pH 7. In some embodiments of a protein, the protein has a solvation score of -20 or less at pH 7. In some embodiments of a protein, the protein has a solvation score of -25 or less at pH 7. In some embodiments of a protein, the protein has a solvation score of -30 or less at pH 7. In some embodiments of a protein, the protein has a solvation score of -35 or less at pH 7. In some embodiments of a protein, the protein has a solvation score of -40 or less at pH 7.

[00211] The solvation score is a function of pH by virtue of the pH dependence of the molar ratio of undissociated weak acid ([HA]) to conjugate base ([A-]) as defined by the Henderson-Hasselbalch equation:

pH = pKa + log10([A-]/[HA])

[00212] All weak acids have different solvation free energies compared to their conjugate bases, and the solvation free energy used for a given residue when calculating the solvation score at a given pH is the weighted average of those two values.

[00213] Accordingly, in some embodiments of a protein, the protein has a solvation score of -10 or less at an acidic pH. In some embodiments of a protein, the protein has a solvation score of -15 or less at an acidic pH. In some embodiments of a protein, the protein has a solvation score of -20 or less at an acidic pH. In some embodiments of a protein, the protein has a solvation score of -25 or less at an acidic pH. In some embodiments of a protein, the protein has a solvation score of -30 or less at an acidic pH. In some embodiments of a protein, the protein has a solvation score of -35 or less at an acidic pH. In some embodiments of a protein, the protein has a solvation score of -40 or less at an acidic pH.
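
The pH-weighted averaging described in paragraph [00212] can be sketched in Python as follows. The sketch assumes that the solvation free energies of the protonated and deprotonated forms, the side chain pKa values, and a per-residue energy table are supplied by the caller; none of those values are given here, and the function and parameter names are hypothetical.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a weak acid present as HA.

    From Henderson-Hasselbalch, [A-]/[HA] = 10**(pH - pKa), so the
    fraction of HA is 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def averaged_solvation_energy(
    ph: float, pka: float, dg_protonated: float, dg_deprotonated: float
) -> float:
    """Weighted average of the two ionization-state solvation energies."""
    f_ha = protonated_fraction(ph, pka)
    return f_ha * dg_protonated + (1.0 - f_ha) * dg_deprotonated


def solvation_score(sequence: str, side_chain_energy: dict) -> float:
    """Sum of per-residue side chain energies divided by sequence length.

    `side_chain_energy` maps one-letter codes to solvation free energies
    (already pH-averaged for ionizable residues); it is caller-supplied.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(side_chain_energy[aa] for aa in sequence) / len(sequence)
```

When the pH equals the pKa, the two forms are weighted equally, which gives a quick sanity check on any table of energies used with this sketch.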

[00214] Accordingly, in some embodiments of a protein, the protein has a solvation score of -10 or less at a basic pH. In some embodiments of a protein, the protein has a solvation score of -15 or less at a basic pH. In some embodiments of a protein, the protein has a solvation score of -20 or less at a basic pH. In some embodiments of a protein, the protein has a solvation score of -25 or less at a basic pH. In some embodiments of a protein, the protein has a solvation score of -30 or less at a basic pH. In some embodiments of a protein, the protein has a solvation score of -35 or less at a basic pH. In some embodiments of a protein, the protein has a solvation score of -40 or less at a basic pH.

[00215] Accordingly, in some embodiments of a protein, the protein has a solvation score of -10 or less at a pH range selected from 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, and 11-12. In some embodiments of a protein, the protein has a solvation score of -15 or less at a pH range selected from 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, and 11-12. In some embodiments of a protein, the protein has a solvation score of -20 or less at a pH range selected from 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, and 11-12. In some embodiments of a protein, the protein has a solvation score of -25 or less at a pH range selected from 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, and 11-12. In some embodiments of a protein, the protein has a solvation score of -30 or less at a pH range selected from 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, and 11-12. In some embodiments of a protein, the protein has a solvation score of -35 or less at a pH range selected from 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, and 11-12. In some embodiments of a protein, the protein has a solvation score of -40 or less at a pH range selected from 2-3, 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, and 11-12.

Aggregation Assays and Aggregation Scoring

[00216] In some embodiments a protein of this disclosure shows resistance to aggregation, exhibiting, for example, less than 80% aggregation, 10% aggregation, or no detectable aggregation at elevated temperatures (e.g., 50°C, 60°C, 70°C, 80°C, 85°C, 90°C, or 95°C).

[00217] One benefit of stable proteins as disclosed herein is that they can be stored for an extended period of time before use, in some instances without the need for refrigeration or cooling. In some embodiments, proteins are processed into a dry form (e.g., by lyophilization). In some embodiments, proteins are stable upon lyophilization. In some embodiments, such lyophilized proteins maintain their stability upon reconstitution (e.g., liquid formulation).

[00218] The aggregation score is a primary sequence based metric for assessing the hydrophobicity and likelihood of aggregation of a given protein. Using the Kyte and Doolittle hydrophobicity scale (Kyte J, Doolittle RF (May 1982) "A simple method for displaying the hydropathic character of a protein". J. Mol. Biol. 157 (1): 105-32), which gives hydrophobic residues positive values and hydrophilic residues negative values, the average hydrophobicity of a protein sequence is calculated using a moving average of five residues. The aggregation score is drawn from the resulting plot by determining the area under the curve for values greater than zero and normalizing by the total length of the protein. The underlying view is that aggregation is the result of two or more hydrophobic patches coming together to exclude water and reduce surface exposure, and the likelihood that a protein will aggregate is a function of how densely packed its hydrophobic (i.e., aggregation prone) residues are. Aggregation scores start at 0 and continue into positive values, and the smaller the aggregation score, the less hydrophobic and potentially less prone to aggregation the protein is predicted to be. In some embodiments of a protein, the protein has an aggregation score of 2 or less. In some embodiments of a protein, the protein has an aggregation score of 1.5 or less. In some embodiments of a protein, the protein has an aggregation score of 1 or less. In some embodiments of a protein, the protein has an aggregation score of 0.9 or less. In some embodiments of a protein, the protein has an aggregation score of 0.8 or less. In some embodiments of a protein, the protein has an aggregation score of 0.7 or less. In some embodiments of a protein, the protein has an aggregation score of 0.6 or less. In some embodiments of a protein, the protein has an aggregation score of 0.5 or less. In some embodiments of a protein, the protein has an aggregation score of 0.4 or less. In some embodiments of a protein, the protein has an aggregation score of 0.3 or less. In some embodiments of a protein, the protein has an aggregation score of 0.2 or less. In some embodiments of a protein, the protein has an aggregation score of 0.1 or less.
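
One way the aggregation score described above might be computed is sketched below in Python. The Kyte and Doolittle values are the published hydropathy indices. The remaining choices are assumptions made for this example: only full five-residue windows are averaged, the area under the curve is approximated as the sum of the positive window averages at unit spacing, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy indices (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aggregation_score(sequence: str, window: int = 5) -> float:
    """Positive area of the moving-average hydropathy plot per residue."""
    seq = sequence.upper()
    if len(seq) < window:
        raise ValueError("sequence shorter than the averaging window")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on unknown residues
    # Moving average over every full window of `window` residues.
    averages = [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
    # Area above zero, approximated with unit spacing between windows.
    positive_area = sum(v for v in averages if v > 0)
    # Normalize by the total length of the protein.
    return positive_area / len(seq)
```

Under these assumptions a sequence composed only of hydrophilic residues scores exactly 0, which matches the statement that aggregation scores start at 0 and continue into positive values.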

[00219] In some cases, soluble expression is desirable because it can increase the amount and/or yield of the protein and facilitate one or more of the isolation and purification of the protein. In some embodiments, the proteins of this disclosure are solubly expressed in the host organism. Solvation score and aggregation score can be used to predict soluble expression of recombinant proteins in a host organism. As shown in Example 8, this disclosure provides evidence suggesting that proteins with solvation scores of < -20 and aggregation scores of < 0.75 are more likely to be recombinantly expressed in a particular E. coli expression system. Moreover, the data also suggest that proteins with solvation scores of < -20 and aggregation scores of < 0.5 are more likely to be solubly expressed in this system. Therefore, in some embodiments the protein of this disclosure has a solvation score of -20 or less. In some embodiments the nutritive protein has an aggregation score of 0.75 or less. In some embodiments the nutritive protein has an aggregation score of 0.5 or less. In some embodiments the protein has a solvation score of -20 or less and an aggregation score of 0.75 or less. In some embodiments the protein has a solvation score of -20 or less and an aggregation score of 0.5 or less.
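
The two thresholds discussed above can be applied as a simple screen, sketched here in Python. The category labels and the function name are hypothetical, and the inclusive comparisons follow the "or less" wording of the embodiments; the output is a heuristic prediction for the particular E. coli system mentioned, not a guarantee of expression.

```python
def predict_expression(solvation_score: float, aggregation_score: float) -> str:
    """Classify a candidate using the solvation and aggregation cutoffs."""
    if solvation_score <= -20 and aggregation_score <= 0.5:
        return "soluble"      # more likely to be solubly expressed
    if solvation_score <= -20 and aggregation_score <= 0.75:
        return "expressed"    # more likely to be recombinantly expressed
    return "unclassified"     # outside both ranges
```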

Taste and Mouth Characteristics

[00220] Certain free amino acids and mixtures of free amino acids are known to have a bitter or otherwise unpleasant taste. In addition, hydrolysates of common proteins (e.g., whey and soy) often have a bitter or unpleasant taste. In some embodiments, proteins disclosed and described herein do not have a bitter or otherwise unpleasant taste. In some embodiments, proteins disclosed and described herein have a more acceptable taste as compared to at least one of free amino acids, mixtures of free amino acids, and/or protein hydrolysates. In some embodiments, proteins disclosed and described herein have a taste that is equal to or exceeds at least one of whey protein.

[00221] Proteins are known to have tastes covering the five established taste modalities: sweet, sour, bitter, salty, and umami. Fat can be considered a sixth taste. The taste of a particular protein (or its lack thereof) can be attributed to several factors, including the primary structure, the presence of charged side chains, and the electronic and conformational features of the protein. In some embodiments, proteins disclosed and described herein are designed to have a desired taste (e.g., sweet, salty, umami) and/or not to have an undesired taste (e.g., bitter, sour). In this context "design" includes, for example, selecting edible species proteins embodying features that achieve the desired taste property, as well as creating muteins of edible species polypeptides that have desired taste properties. For example, proteins can be designed to interact with specific taste receptors, such as sweet receptors (T1R2-T1R3 heterodimer) or umami receptors (T1R1-T1R3 heterodimer, mGluR4, and/or mGluR1). Further, proteins can be designed not to interact, or to have diminished interaction, with other taste receptors, such as bitter receptors (T2R receptors).

[00222] Proteins disclosed and described herein can also elicit different physical sensations in the mouth when ingested, sometimes referred to as "mouth feel." The mouth feel of the proteins can be due to one or more factors including primary structure, the presence of charged side chains, and the electronic and conformational features of the protein. In some embodiments, proteins elicit a buttery or fat-like mouth feel when ingested.

Compositions

[00223] At least one protein disclosed herein can be combined with at least one second component to form a composition. In some embodiments the only source of amino acid in the composition is the at least one protein disclosed herein. In such embodiments the amino acid composition of the composition will be the same as the amino acid composition of the at least one protein disclosed herein. In some embodiments the composition comprises at least one protein disclosed herein and at least one second protein. In some embodiments the at least one second protein is a second protein disclosed herein, while in other embodiments the at least one second protein is not a protein disclosed herein. In some embodiments the composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more proteins disclosed herein. In some embodiments the composition comprises 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more proteins that are not proteins disclosed herein. In some embodiments the composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more proteins and the composition comprises 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more proteins that are not proteins disclosed herein.

[00224]

EXAMPLES

[00225] Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

[00226] The practice of the present invention will employ, unless otherwise indicated, conventional methods of protein chemistry, biochemistry, recombinant DNA techniques and pharmacology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., T.E. Creighton, Proteins: Structures and Molecular Properties (W.H. Freeman and Company, 1993); A.L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.); Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pennsylvania: Mack Publishing Company, 1990); Carey and Sundberg, Advanced Organic Chemistry, 3rd Ed. (Plenum Press), Vols. A and B (1992).

[00227] Example 1. Identification and selection of amino acid sequences of nutritive polypeptides of edible species using mass spectrometric analyses.

[00228] Provided is a process for identifying one or a plurality of nutritive polypeptide amino acid sequences, such as from a polypeptide or nucleic acid library, or from a relevant database of protein sequences. Here, nutritive polypeptide amino acid sequences were identified by mass spectroscopy analysis of proteins extracted and purified from edible species.

[00229] Protein Isolation for Mass Spectroscopy. Proteins were extracted from solid edible sources. Samples from the following species were included in the analysis: Actinidia deliciosa, Agaricus bisporus var. bisporus, Arthrospira platensis, Bos taurus, Brassica oleracea, Cannabis, Chenopodium quinoa, Chlorella regularis, Chlorella variabilis, Cicer arietinum, Cucurbita maxima, Fusarium graminearum, Gadus morhua, Gallus gallus, Glycine max, Lactobacillus acidophilus, Laminariales, Linum usitatissimum, Meleagris gallopavo, Odocoileus virginianus, Oreochromis niloticus, Oryza sativa, Ovis aries, Palmaria palmata, Persea americana, Prunus mume, Saccharomyces cerevisiae, Salmo salar, Solanum lycopersicum, Solanum tuberosum, Sus scrofa, Thunnus thynnus, Vaccinium corymbosum, Vitis vinifera, and Zea mays. Each sample was first frozen at -80°C and then ground using a mortar and pestle before weighing 50 mg of material into a microcentrifuge tube. The 50 mg sample was then resuspended in 1 mL of extraction buffer (8.3 M urea, 2 M thiourea, 2% w/v CHAPS, 1% w/v DTT) and agitated for 30 minutes. Addition of 500 of 100-μm zirconium beads (Ops Diagnostics) was followed by continued agitation for an additional 30 minutes. Samples were run on a TissueLyser II (Qiagen) at 30 Hz for 3 minutes and then centrifuged for 10 minutes at 21,130 g in a benchtop microcentrifuge (Eppendorf). Supernatants were transferred to clean microcentrifuge tubes, divided into 50 μL aliquots, and stored at -80°C. The amount of soluble protein extracted was measured by Coomassie Plus (Bradford) Protein Assay (Thermo Scientific). 20 μg of protein was run on a 10% 10-lane BisTris SDS-PAGE gel (Invitrogen) and then excised for analysis by LC/MS/MS.

[00230] Proteins were also isolated from liquid cultures of the following edible organisms: Aspergillus niger, Bacillus subtilis, Bacillus licheniformis, and Bacillus amyloliquefaciens. Aspergillus and Bacillus organisms were cultured as described herein. Clarified supernatants were isolated by centrifuging cultures (10,000 x g) for 10 minutes, followed by filtering the supernatant using a 0.2 μm filter. The amount of soluble protein in the clarified supernatant was measured by Coomassie Plus (Bradford) Protein Assay (Thermo Scientific). Protein samples (20 μg) were run on a 10% Precast BisTris SDS-PAGE gel (Invitrogen) according to the manufacturer's protocol.

[00231] Mass Spectroscopy. For LC/MS/MS analysis each gel was excised into five equally sized pieces. Trypsin digestion was performed using a robot (ProGest, DigiLab) with the following protocol: washed with 25 mM ammonium bicarbonate followed by acetonitrile, reduced with 10 mM dithiothreitol at 60°C followed by alkylation with 50 mM iodoacetamide at RT, digested with trypsin (Promega) at 37°C for 4 h, and quenched with formic acid; the supernatant was analyzed directly without further processing. The gel digests for each sample were pooled and analyzed by nano LC/MS/MS with a Waters NanoAcquity HPLC system interfaced to a ThermoFisher Q Exactive. Peptides were loaded on a trapping column and eluted over a 75 μm analytical column at 350 nL/min; both columns were packed with Jupiter Proteo resin (Phenomenex). The mass spectrometer was operated in data-dependent mode, with MS and MS/MS performed in the Orbitrap at 70,000 FWHM and 17,500 FWHM resolution, respectively. The fifteen most abundant ions were selected for MS/MS. Resulting data were searched against a Uniprot and/or NCBI protein database from the corresponding organism using Mascot with the following parameters: Enzyme - Trypsin/P, Fixed modification - Carbamidomethyl (C), Variable modifications - Oxidation (M), Acetyl (Protein N-term), Pyro-Glu (N-term Q), Deamidation (NQ), Mass values - Monoisotopic, Peptide Mass Tolerance - 10 ppm, Fragment Mass Tolerance - 0.015 Da, Max Missed Cleavages - 2. Mascot DAT files were parsed into the Scaffold software for validation, filtering and to create a non-redundant list per sample. Data were filtered using a minimum protein value of 90%, a minimum peptide value of 50% (Prophet scores) and requiring at least two unique peptides per protein. Relative abundance of detected proteins was determined by spectral counts, which is the number of spectra acquired for each protein.
Spectral counting is a label-free quantification method commonly used in protein mass spectrometry (Liu, Hongbin et al. Analytical chemistry 76.14 (2004): 4193-4201). To calculate the relative abundance of each protein in the protein isolate, the number of spectral counts for that protein is divided by the total protein spectral counts. SEQID 894 - 3415 were identified using this method.
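
The filtering and relative-abundance steps described above can be sketched in Python as follows. The record layout (dictionary keys "protein_prob" and "unique_peptides") and the function names are assumptions made for this example; they do not reflect the export format of the Scaffold software, and the per-peptide probability filter is assumed to have been applied upstream.

```python
def filter_identifications(records, min_protein_prob=0.90, min_unique_peptides=2):
    """Keep proteins meeting the protein-level filters named in the text.

    Each record is a dict with hypothetical keys "protein_prob" (0-1)
    and "unique_peptides" (int).
    """
    return [
        r for r in records
        if r["protein_prob"] >= min_protein_prob
        and r["unique_peptides"] >= min_unique_peptides
    ]


def relative_abundance(spectral_counts: dict) -> dict:
    """Spectral counts of each protein divided by the total spectral counts."""
    total = sum(spectral_counts.values())
    if total <= 0:
        raise ValueError("total spectral count must be positive")
    return {protein: count / total for protein, count in spectral_counts.items()}
```

For instance, two proteins with 30 and 10 spectral counts would have relative abundances of 0.75 and 0.25, respectively.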

[00232] Homolog discovery. For the nutritive polypeptide sequences identified as described, similar sequences are identified from other species. SEQID-00093, which was identified by this method, was used to search for homologs using the computer program BLAST as described herein. Example nutritive polypeptide homologs from the edible database identified in this way are shown in Table E1A. Example nutritive polypeptide homologs from the expressed sequence database identified in this way are shown in Table E1B.

[00233] Table E1A. Edible Sequences identified as homologs to SEQID-00093.

[00234] Table E1B. Expressed Sequences identified as homologs to SEQID-00093

SEQID-00870  0.45  43.8
SEQID-00867  0.47  44.5
SEQID-00105  0.44  41.1
SEQID-00103  0.45  40.5
SEQID-00866  0.40  39.8

[00235] Example 2. Identification and selection of amino acid sequences of nutritive polypeptides of edible species using cDNA libraries. Here, nutritive polypeptide amino acid sequences were identified by analysis of proteins produced from nucleic acid sequences extracted and purified from edible species.

[00236] Construction of cDNA Library. A library of cDNA from twelve edible species was constructed. The twelve edible species were divided into five categories for RNA extraction. Animal tissues, including ground beef, pork, lamb, chicken, turkey, and a portion of tilapia, were combined, with 50 mg from each edible species. Fruit tissues from grape and tomato, including both the skin and the fruit, were ground and combined, with 2.5 g from each species. Seeds of rice and soybean were combined, with 1 g from each species, and ground into powder. A 12 mL culture of Saccharomyces cerevisiae was grown overnight and spun down to obtain 110 mg wet cell weight of yeast. 1 g of mushroom mycelium was ground and processed using fungal RNA extraction protocols. All five categories of samples were snap frozen with liquid nitrogen, thawed and lysed using category-specific RNA extraction protocols. The RNA from the different food categories was extracted and combined as one pooled sample. The combined pool of RNA was reverse transcribed into cDNA using oligo-dT primers, resulting in cDNA of length from 500 bp to 4 kb. Adaptors were ligated to each end of the cDNA. The adaptors were used as PCR primers for amplification of the cDNA library and also included SfiI restriction digestion sites for cloning the library into an expression vector. The cDNA library was denatured and re-annealed and the single-stranded DNA was selected using gel electrophoresis. This process removed extra cDNA from highly abundant RNA species to obtain a normalized cDNA library. The normalized cDNA library was precipitated using ethanol precipitation before PCR amplification and cloning into the expression vectors.

[00237] Table E2A. Primer and adapter sequences flanking the cDNA.

[00238] Cloning of cDNA library into E. coli for protein expression. The cDNA library was cloned into the pET15b backbone vector, which was amplified with primers with overhangs that contain the corresponding SfiI restriction sites (forward primer overhang: TACGTGTATGGCCGCCTCGGCC; reverse primer overhang: TACGTGTATGGCCGTAATGGCC). pET15b contains a pBR322 origin of replication, a lac-controlled T7 promoter, and a bla gene conferring resistance to carbenicillin. Both the cDNA library and the PCR-amplified backbone were cut with SfiI, PCR purified, and ligated. The ligation reaction was transformed into 10-Beta High Efficiency Competent Cells (New England Biolabs), and transformed cells were plated onto four LB agar plates containing 100 mg/L carbenicillin. Plates were incubated at 37 °C overnight. After colonies had grown, 2 mL of liquid LB medium was added to each plate. Cells were scraped into the liquid and mixed together, and the suspension was prepared for plasmid extraction to form the multiplex cDNA plasmid library.

[00239] E. coli cDNA Multiplex Expression Methods. Four strains were used to express the cDNA libraries: T7 Express from New England Biolabs; and Rosetta 2(DE3), Rosetta-gami B(DE3), and Rosetta-gami 2(DE3) from EMD Millipore. T7 Express is an enhanced BL21 derivative which contains the T7 RNA polymerase in the lac operon, while lacking the Lon and OmpT proteases. The genotype of T7 Express is: [...] T7 gene1 [lon] ompT gal [...] mrr)114::IS10. Rosetta 2(DE3) is a BL21 derivative that supplies tRNAs for 7 rare codons (AGA, AGG, AUA, CUA, GGA, CCC, CGG). The strain is a lysogen of λDE3, and carries the T7 RNA polymerase gene under the lacUV5 promoter. The genotype of Rosetta 2(DE3) is: F [...]. Rosetta-gami B(DE3) has the same properties as Rosetta 2(DE3) but includes characteristics that enhance the formation of protein disulfide bonds in the cytoplasm. The genotype of Rosetta-gami B(DE3) is: F ompT [...]. Rosetta-gami 2(DE3), similarly to Rosetta-gami B(DE3), alleviates codon bias, enhances disulfide bond formation, and has the T7 RNA polymerase gene under the lacUV5 promoter in the chromosome. The genotype of Rosetta-gami 2(DE3) is: Δ(ara- [...] F′[lac+ lacIq pro] gor522::Tn10 trxB pRARE2 (CamR, StrR, TetR).

[00240] Roughly 200 ng of prepared cDNA libraries were transformed into the four background strains: T7 Express, Rosetta 2(DE3), Rosetta-gami B(DE3), and Rosetta-gami 2(DE3) competent cells. After transformation, 100 μL of each strain was plated onto four LB (10 g/L NaCl, 10 g/L tryptone, and 5 g/L yeast extract) 1.5% agar plates containing 100 mg/L carbenicillin and incubated at 37 °C for 16 hrs. After incubation, 2 mL of LB media with 100 mg/L carbenicillin was added to the surface of each plate containing several thousand transformants, and the cells were suspended in the surface medium by scraping with a cell spreader and mixing. Suspended cells from the four replicate plates from each background were combined to form the pre-inoculum cultures for the expression experiments.

[00241] The OD600 of the pre-inoculum cultures made from re-suspended cells was measured using a plate reader to be between 35 and 40 (T7 Express and Rosetta 2(DE3)) or between 15 and 20 (Rosetta-gami B(DE3) and Rosetta-gami 2(DE3)). For the four background strains, 125 mL baffled shake flasks containing 10 mL of LB medium with 100 mg/L carbenicillin were inoculated to OD600 0.2 to form the inoculum cultures, and incubated at 37 °C shaking at 250 rpm for roughly 6 hours. OD600 was measured and the inoculum cultures were used to inoculate expression cultures in 2 L baffled shake flasks containing 250 mL of BioSilta EnBase medium with 100 mg/L carbenicillin, 600 mU/L of glucoamylase and 0.01% Antifoam 204 to an OD600 of 0.1. Cultures were shaken at 30 °C and 250 rpm for 18 hours, and were then induced with 1 mM IPTG and supplemented with additional EnBase media components and another 600 mU/L of glucoamylase. Heterologous expression was carried out for 24 hours at 30 °C and 250 rpm, at which point the cultures were terminated. The terminal cell density was measured and the cells were harvested by centrifugation (5000 x g, 10 min, RT). Cells were stored at -80 °C before being lysed with B-PER (Pierce) according to the manufacturer's protocol. After cell lysis, the whole cell lysate was sampled for analysis. In the Rosetta (DE3) strain, the whole cell lysate was centrifuged (3000 x g, 10 min, RT) and the supernatant was collected as the soluble fraction of the lysate. Cell lysates were run on SDS-PAGE gels, separated into ten fractions, and then analyzed using MS-MS.

[00242] Cloning of cDNA library into Bacillus for protein secretion. The cDNA library was cloned into the pHT43 vector for protein secretion assay in Bacillus subtilis. The unmodified pHT43 vector from MoBiTec contains the Pgrac promoter, the SamyQ signal peptide, Amp and Cm resistance genes, a lacI region, a repA region, and the ColE1 origin of replication. The SamyQ signal peptide was removed. The pHT43 backbone vector with no signal peptide, as well as a modified version with the aprE promoter substituted for the grac promoter and with the lacI region removed, were amplified with primers with overhangs that contain the corresponding SfiI restriction sites (forward primer overhang: TACGTGTATGGCCGCCTCGGCC; reverse primer overhang: TACGTGTATGGCCGTAATGGCC). Both the cDNA library and the two PCR-amplified backbones were cut with SfiI and PCR purified. The cDNA library inserts were ligated into each background. The ligation reactions were transformed into 10-Beta High Efficiency Competent Cells (New England Biolabs), and cells from each ligation were plated onto four LB agar plates containing 100 mg/L carbenicillin. Plates were incubated at 37 °C overnight. After colonies had grown, 2 mL of liquid LB medium was added to each plate. For each ligation, cells were scraped into the liquid and mixed together, and the suspensions were prepped for plasmid extraction to form the multiplex cDNA plasmid libraries (henceforth referred to as the multiplex Grac-cDNA and AprE-cDNA libraries).

[00243] The expression strains used in this expression experiment are based on the WB800N strain (MoBiTec). The WB800N strain has the following genotype: nprE aprE epr bpr mpr::ble nprB::bsr vpr wprA::hyg cm::neo; NeoR. Strain cDNA-1 contains a mutation that synergizes with the paprE promoter and has these alterations in addition to the WB800N genotype: pXylA-comK::Erm, degU32(Hy), sigF::Str. Strain cDNA-2 has these alterations to WB800N: pXylA-comK::Erm.

[00244] Roughly 1 μg of the multiplex Grac-cDNA library was transformed into both Strain cDNA-1 and Strain cDNA-2, and 1 μg of the multiplex AprE-cDNA library was transformed into Strain cDNA-1. After transformation, 100 μL of each strain was plated onto four LB (10 g/L NaCl, 10 g/L tryptone, and 5 g/L yeast extract) 1.5% agar plates containing 5 mg/L chloramphenicol and incubated at 37 °C for 16 hrs. After incubation, 2 mL of LB media with 5 mg/L chloramphenicol was added to the surface of each plate containing several thousand transformants, and the cells were suspended in the surface medium by scraping with a cell spreader and mixing. Suspended cells from the four replicate plates from each transformation were combined to form the preinoculum cultures for the expression experiments.

[00245] The OD600 of the preinoculum cultures made from resuspended cells was measured using a plate reader to be roughly 20-25. For the three strains (strain cDNA-1 + multiplex Grac-cDNA, strain cDNA-1 + multiplex AprE-cDNA, strain cDNA-2 + multiplex Grac-cDNA), 500 mL baffled shake flasks containing 50 mL of 2xMal medium (20 g/L NaCl, 20 g/L tryptone, 10 g/L yeast extract, 75 g/L D-maltose) with 5 mg/L chloramphenicol were inoculated to OD600 ≈ 0.2 to form the inoculum cultures, and incubated at 30 °C shaking at 250 rpm for roughly 6 hours. OD600 was measured and the inoculum cultures were used to inoculate expression cultures in 2 L baffled shake flasks containing 2xMal medium with 5 mg/L chloramphenicol, 1X Teknova Trace Metals, and 0.01% Antifoam 204 to an OD600 of 0.1. The strain cDNA-1 + multiplex AprE-cDNA culture was shaken at 30 °C and 250 rpm for 18 hours, at which point the culture was harvested. The terminal cell density was measured and the cells were harvested by centrifugation (5000 x g, 30 min, RT). The strain cDNA-1 + multiplex Grac-cDNA and strain cDNA-2 + multiplex Grac-cDNA cultures were shaken at 37 °C and 250 rpm for 4 hours, and were then induced with 1 mM IPTG. Heterologous expression was carried out for 4 hours at 37 °C and 250 rpm, at which point the cultures were harvested. Again, the terminal cell density was measured and the cells were harvested by centrifugation (5000 x g, 30 min, RT). The supernatant was collected and run on SDS-PAGE gels, separated into ten fractions, and then analyzed using LC-MS/MS to identify secreted proteins.

[00246] Mass spectrometry analysis. Whole cell lysate and soluble lysate samples were analyzed for protein expression using LC-MS/MS. To analyze samples, 10 μg of sample was loaded onto a 10% SDS-PAGE gel (Invitrogen) and separated over approximately 5 cm. The gel was excised into ten segments and the gel slices were processed by washing with 25 mM ammonium bicarbonate, followed by acetonitrile. Gel slices were then reduced with 10 mM dithiothreitol at 60 °C, followed by alkylation with 50 mM iodoacetamide at room temperature. Finally, the samples were digested with trypsin (Promega) at 37 °C for 4 h and the digestions were quenched with the addition of formic acid. The supernatant samples were then analyzed by nano LC/MS/MS with a Waters NanoAcquity HPLC system interfaced to a ThermoFisher Q Exactive. Peptides were loaded on a trapping column and eluted over a 75 μm analytical column at 350 nL/min; both columns were packed with Jupiter Proteo resin (Phenomenex). A 1 h gradient was employed. The mass spectrometer was operated in data-dependent mode, with MS and MS/MS performed in the Orbitrap at 70,000 FWHM resolution and 17,500 FWHM resolution, respectively. The fifteen most abundant ions were selected for MS/MS. Data were searched against a database using Mascot to identify peptides. The database was constructed by combining the complete proteome sequences from all twelve species: Bos taurus, Gallus gallus, Vitis vinifera, Ovis aries, Sus scrofa, Oryza sativa, Glycine max, Oreochromis niloticus, Solanum lycopersicum, Agaricus bisporus var. bisporus, Saccharomyces cerevisiae, and Meleagris gallopavo. Mascot DAT files were parsed into the Scaffold software for validation and filtering, and to create a nonredundant list per sample. Data were filtered at 1% protein and peptide false discovery rate (FDR), requiring at least two unique peptides per protein.

[00247] Example 3. Identification and selection of amino acid sequences of nutritive polypeptides of edible species using annotated protein sequence databases.

[00248] Construction of Protein Databases. UniProtKB/Swiss-Prot (a collaboration between the European Bioinformatics Institute and the Swiss Institute of Bioinformatics) is a manually curated and reviewed protein database, and was used as the starting point for constructing a protein database. To construct a protein database of edible species, a search was performed on the UniProt database for proteins from edible species as disclosed in, e.g., PCT/US2013/032232, filed March 15, 2013, PCT/US2013/032180, filed March 15, 2013, PCT/US2013/032225, filed March 15, 2013, PCT/US2013/032218, filed March 15, 2013, PCT/US2013/032212, filed March 15, 2013, and PCT/US2013/032206, filed March 15, 2013. To identify proteins that are secreted from microorganisms, the UniProt database was searched for proteins from microorganism species as disclosed herein that are annotated with keywords or annotations that include secreted, extracellular, cell wall, and outer membrane. To identify proteins that are abundant in the human diet, the reference proteomes of edible species were assembled from genome databases. As provided herein, mass spectrometry was performed on proteins extracted from each edible species. The peptides identified by mass spectrometry were mapped to the reference proteomes, and the spectrum counts of the peptides associated with the reference protein sequences were converted to a measure of the abundance of the corresponding protein in food. All proteins that were detected above a cutoff spectrum count with high confidence were assembled into a database. These databases are used for identifying proteins that are derived from edible species, that are secreted, and/or that are abundant in the human diet.

[00249] Processes for selection of amino acid sequences. A process for picking a protein or group of proteins can include identifying a set of constraints that define the class of protein one is interested in finding, selecting the database of proteins to search, and performing the actual search.

[00250] The protein class criteria can be defined by nutritional literature (i.e., what has been previously identified as efficacious), desired physicochemical traits (e.g., expressible, soluble, nonallergenic, nontoxic, digestible, etc.), and other characteristics. A relevant database of proteins that can be used for searching purposes can be derived from the sequences disclosed herein.

[00251] One example of proteins that can be searched is a highly soluble class of proteins for muscle anabolism/immune health/diabetes treatment. These proteins are generally solubly expressible, highly soluble upon purification/isolation, non-allergenic, non-toxic, fast digesting, and meet some basic nutritional criteria (e.g., [EAA] > 0.3, [BCAA] > 0.15, [Leucine or Glutamine or Arginine] > 0.08, EAA complete).

[00252] A search is conducted for expressible, soluble proteins using a binary classification model based on two parameters related to the hydrophilicity and hydrophobicity of the protein sequence: solvation score and aggregation score (see examples below for various descriptions of these two metrics and measures of efficacy of the model). Alternatively, a search can be conducted for highly charged proteins with high (or low) net charge per amino acid, which is indicative of a net excess of negative or positive charges per amino acid (see example below for additional description).

[00253] The nutritional criteria are satisfied by computing the mass fractions of all relevant amino acids based on primary sequence. For cases in which it is desired to match a known, clinically efficacious amino acid blend, a weighted Euclidean distance method can be used (see example below).
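The mass-fraction computation described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: it assumes average residue masses (amino acid minus water), treats the nine classical essential amino acids (H, I, L, K, M, F, T, W, V) as the EAA set, reads "[Leucine or Glutamine or Arginine] > 0.08" as "at least one of the three exceeds 0.08", and reads "EAA complete" as "every essential amino acid is present". The function names are hypothetical.

```python
from collections import Counter

# Average residue masses in Da (amino acid minus water); an assumed convention.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
EAA = set("HILKMFTWV")   # essential amino acids (assumed set)
BCAA = set("LIV")        # branched-chain amino acids


def mass_fractions(seq: str) -> dict:
    """Return the mass fraction of each of the 20 amino acids in a sequence."""
    counts = Counter(seq.upper())
    unknown = set(counts) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    total = sum(RESIDUE_MASS[aa] * n for aa, n in counts.items())
    return {aa: RESIDUE_MASS[aa] * counts.get(aa, 0) / total for aa in RESIDUE_MASS}


def meets_criteria(seq: str, eaa_min=0.3, bcaa_min=0.15, single_min=0.08) -> bool:
    """Apply the example nutritional thresholds quoted in the text."""
    f = mass_fractions(seq)
    eaa = sum(f[aa] for aa in EAA)
    bcaa = sum(f[aa] for aa in BCAA)
    best_single = max(f["L"], f["Q"], f["R"])      # Leu, Gln, or Arg
    eaa_complete = all(f[aa] > 0 for aa in EAA)    # every EAA present
    return eaa > eaa_min and bcaa > bcaa_min and best_single > single_min and eaa_complete
```

A candidate list could then be reduced with `[s for s in candidates if meets_criteria(s)]` before any ranking step is applied.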

[00254] As provided herein, allergenicity/toxicity/nonallergenicity/antinutricity criteria are searched for using sequence-based homology assessments in which each candidate sequence is compared to libraries of known allergens, toxins, nonallergens, or antinutritive (e.g., protease inhibitory) proteins (see examples herein). In general, cutoffs of <50% global or <35% local (over any given 80 aa window) homology (percent ID) can be used for the allergenicity screens, and <35% global for the toxicity and antinutricity screens. In all cases, smaller implies less allergenic/toxic/antinutritive. The nonallergenicity screen is less typically used as a cutoff, but >62% can be used as a cutoff (greater implies more nonallergenic). These screens reduce the list to a smaller subset of proteins enriched in the criteria of interest. This list is then ranked using a variety of aggregate objective functions, and selections are made from this rank-ordered list.
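As a rough sketch of how the cutoffs above might be applied, the snippet below filters a candidate using percent-identity values that are assumed to have been computed beforehand by an alignment tool; the alignment itself is not shown. It reads the allergenicity cutoff as requiring both the global and the local value to fall below their thresholds, which is one interpretation of the text. The `HomologyScores` container, its field names, and the `windows` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HomologyScores:
    """Best percent identities of one candidate against each reference library."""
    allergen_global: float       # best global % identity to any known allergen
    allergen_local_80aa: float   # best % identity over any 80-residue window
    toxin_global: float          # best global % identity to any known toxin
    antinutritive_global: float  # best global % identity to any antinutritive protein


def passes_screens(s: HomologyScores,
                   allergen_global_max: float = 50.0,
                   allergen_local_max: float = 35.0,
                   toxin_max: float = 35.0,
                   antinutritive_max: float = 35.0) -> bool:
    """True when every score is strictly below its cutoff (smaller is safer)."""
    return (s.allergen_global < allergen_global_max
            and s.allergen_local_80aa < allergen_local_max
            and s.toxin_global < toxin_max
            and s.antinutritive_global < antinutritive_max)


def windows(seq: str, size: int = 80):
    """Yield every contiguous window of `size` residues (whole sequence if shorter)."""
    if len(seq) <= size:
        yield seq
        return
    for start in range(len(seq) - size + 1):
        yield seq[start:start + size]
```

The `windows` helper only enumerates the 80-residue segments that a local comparison would score; the scoring function applied to each window is left to whatever alignment method is in use.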

[00255] Example 4. Purification of nutritive polypeptides.

[00256] Various methods of purification have been used to isolate nutritive polypeptides from, or away from, other materials such as raw foods, cells, salts, small molecules, host cell proteins, and lipids. These methods include diafiltration, precipitation, flocculation, aqueous two-phase extraction, and chromatography.

[00257] Purification by anti-FLAG Affinity Chromatography. Anti-FLAG purification provides a method to purify nutritive polypeptides from low-titer expression systems or from similarly charged host cell proteins. Nutritive polypeptides were engineered to contain either a single FLAG tag (DYKDDDDK) or a triple tandem FLAG tag (DYKDDDDKDYKDDDDKDYKDDDDK) appended to the C-terminus of the protein. Anti-FLAG affinity purification is a single-step process that offers non-denaturing process conditions and elution purities of >95% (Einhauer et al., 2001, Journal of Biochemical and Biophysical Methods).

[00258] Nutritive polypeptides were purified using Anti-FLAG M2 Affinity Agarose Gel (Sigma Aldrich, St. Louis, MO). The M2 affinity resin is designed specifically for use with C-terminal FLAG epitopes. For purification of N-terminally appended FLAG epitopes, the M1 Affinity Agarose Gel was used. The M2 Affinity Agarose Gel (resin) has an advertised static binding capacity (SBC) of approximately 0.5 mg nutritive polypeptide per mL of resin.

[00259] Purification of nutritive polypeptides from Aspergillus niger secretion media and Bacillus subtilis secretion media was performed using 20-40 mL of anti-FLAG resin. Prior to purification, secretion media was adjusted to 150 mM NaCl and pH 7.4. Resin was equilibrated by rinsing the resin with an excess of 1X tris-buffered saline (TBS) pH 7.4 ± 0.1 and collecting it through a 0.2 μm polyethersulfone (PES) vacuum filter. Equilibrated resin was then mixed with secretion media in batch mode and allowed to mix at room temperature for one hour. Unbound material was removed from the resin by passing the entire mixture through a 0.2 μm PES vacuum filter. The resin was physically collected on the surface of the filter and was subsequently washed with 20 resin volumes of TBS pH 7.4 ± 0.1 to further remove unbound material through the 0.2 μm PES vacuum filter. Washed resin was transferred to drip columns (10 mL each) and the bound polypeptides were eluted with two column volumes (CV) of 0.1 M glycine pH 3.0. The eluted polypeptides flowed directly from the drip columns into conical tubes that contained 1 M Tris pH 8.0; this strategy was used to neutralize the pH of the eluted polypeptide solution as quickly as possible. Resin was regenerated using an additional 3 CV of 0.1 M glycine pH 3.0. For short-term storage, resin was stored in 1X TBS pH 7.4 at 4 °C; for long-term storage, resin was stored in 0.5X TBS pH 7.4, 50% glycerol at -20 °C.

[00260] Exemplary anti-FLAG purification of SEQID-00105 from B. subtilis yielded 4.0 mg of protein in a 4.3 mL elution. The sample was loaded onto a polyacrylamide gel at three different dilutions for increased sensitivity, and SEQID-00105 was found to be 95% pure. Exemplary anti-FLAG purification of SEQID-00298 from A. niger was performed according to the same procedure. The elution fraction was neutralized, as described, and analyzed by SDS-PAGE and Bradford assay as described herein. The main band in the elution was found to be 95% pure. The main band in the elution was compared to the MW ladder on the same gel, and matched the expected molecular weight of SEQID-00298. Forty mL of anti-FLAG resin captured 4.0 mg of material, resulting in an estimated resin capacity of 0.10 mg/mL.

[00261] Purification by 5 mL Immobilized Metal Affinity Chromatography (IMAC). E. coli was grown in shake flask fermentation with targeted expression of individual nutritive polypeptides with HIS8 tags, as described herein. Cells were harvested from each shake flask by bucket centrifugation. The supernatant was discarded, and the cells were suspended in 30 mM imidazole, 50 mM sodium phosphate, 0.5 M NaCl, pH 7.5 at a wet cell weight (WCW) concentration of 20% w/v. The suspended cells were then lysed with two passes through an M110-P microfluidizer (Microfluidics, Westwood, MA) at 20,000 psi through an 87 μm interaction chamber. The lysed cells were centrifuged at 15,000 relative centrifugal force (RCF) for 120 minutes, and then decanted. Cellular debris was discarded, and the supernatants were 0.2 μm filtered. These filtered protein solutions were then purified by immobilized metal affinity chromatography (IMAC) on an AKTA Explorer 100 FPLC (GE Healthcare, Piscataway, NJ). Nutritive polypeptides were purified over 5 mL (1.6 cm diameter x 2.5 cm height) IMAC Sepharose 6 Fast Flow columns (GE Healthcare, Piscataway, NJ).

[00262] IMAC resin (GE Healthcare, IMAC Sepharose 6 Fast Flow) was charged with nickel using 0.2 NiSO4 and washed with 500 mM NaCl, 200 mM imidazole, pH 7.5, followed by equilibration in 30 mM imidazole, 50 mM sodium phosphate, 0.5 M NaCl, pH 7.5. 50 mL of each protein load solution was applied onto a 5 mL IMAC column, and the column was washed with additional equilibration solution to remove unbound impurities. The protein of interest was then eluted with 15 mL of IMAC Elution Solution (0.25 M imidazole, 0.5 M NaCl, pH 7.5). All column blocks were performed at a linear flow rate of 150 cm/hr. Each IMAC elution fraction was buffer exchanged by dialysis into a neutral pH formulation solution. The purified proteins were analyzed for concentration and purity by capillary electrophoresis and/or SDS-PAGE. Concentration was also tested by Bradford and A280 measurement, as described herein. Table E9A lists nutritive polypeptides that were purified by IMAC at 5 mL scale.

[00263] Table E9A. Nutritive polypeptides that were purified by IMAC at 5 mL scale.

00085 34 51% 00105 3.8 93%

00103 4.5 56% 00343 35.3 95%

00359 40.5 56% 00103 112 95%

00346 30.7 56% 00511 179 95%

00510 112 61% 00354 85.8 96%

00622 70 70% 00587 93 96%

00522 47 72% 00610 90.5 97%

00546 235.6 75% 00485 269 98%

00353 5.6 76% 00356 76.9 98%

00601 83.8 77% 00352 134.9 99%

00418 14 80% 00345 196.2 100%

00502 93.2 84% 00338 123.2 100%

00100 68 87% 00298 0.6 100%

00606 77.8 87% 00357 104.8 100%

00104 93 89% 00605 202 100%

00076 92 91% 00559 241.8 100%

00341 176.6 91% 00338 268 100%

[00264] Purification by 1 L Immobilized Metal Affinity Chromatography (IMAC). E. coli was grown in 20 L fermentation with targeted expression of individual nutritive polypeptides with HIS8 tags, as described herein. Cells were harvested from the fermenter and centrifuged using a Sharples AS-16P centrifuge to collect wet cell mass. Cells were subsequently resuspended in 30 mM imidazole, 50 mM sodium phosphate, 0.5 M NaCl, pH 7.5 at a wet cell weight (WCW) concentration of 20% w/v. The cell suspension was then lysed using four passes through a Niro Soavi Homogenizer (Niro Soavi, Parma, Italy) at an operational pressure of 12,500 - 15,000 psi and a flow rate of 1 L/min. The lysate was clarified using a Beckman J2-HC bucket centrifuge (Beckman-Coulter, Brea, CA) at 13,700 x g for 1 hour. Cellular debris was discarded, and the supernatant was filtered through a Sartopore II XLG 0.8/0.2 μm filter (Sartorius Stedim, Bohemia, NY) at 30 L/m2/hr. Filtered lysate was purified by IMAC using IMAC Sepharose 6 Fast Flow resin packed in a 0.9 L column (9 cm diameter x 13.8 cm height).
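The column dimensions quoted for the 5 mL and 0.9 L columns can be related to bed volume and flow rate with simple cylinder geometry. The sketch below is illustrative only: the function names are hypothetical, and applying the 150 cm/hr linear flow rate stated for the 5 mL column blocks to the larger column is done here purely to show how a fixed linear velocity scales with column diameter.

```python
import math


def column_volume_ml(diameter_cm: float, height_cm: float) -> float:
    """Packed bed volume of a cylindrical column, in mL (1 cm^3 = 1 mL)."""
    return math.pi * (diameter_cm / 2.0) ** 2 * height_cm


def volumetric_flow_ml_per_min(linear_cm_per_hr: float, diameter_cm: float) -> float:
    """Convert a linear flow rate (cm/hr) to a volumetric flow rate (mL/min)."""
    cross_section_cm2 = math.pi * (diameter_cm / 2.0) ** 2
    return linear_cm_per_hr * cross_section_cm2 / 60.0


def residence_time_min(height_cm: float, linear_cm_per_hr: float) -> float:
    """Time for liquid to traverse the bed height at a given linear flow rate."""
    return height_cm / linear_cm_per_hr * 60.0


if __name__ == "__main__":
    for label, d, h in [("5 mL column", 1.6, 2.5), ("0.9 L column", 9.0, 13.8)]:
        print(label,
              round(column_volume_ml(d, h), 1), "mL bed;",
              round(volumetric_flow_ml_per_min(150.0, d), 1), "mL/min at 150 cm/hr;",
              round(residence_time_min(h, 150.0), 2), "min residence time")
```

With these dimensions the computed bed volumes are about 5.0 mL and about 878 mL, consistent with the nominal 5 mL and 0.9 L column sizes.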

[00265] IMAC resin was equilibrated, as described herein, at a linear flow rate of 300 cm/hr. Once equilibrated, the entirety of the filtered lysate was passed over the column at a linear flow rate of 150 cm/hr. Load volumes ranged from six to ten column volumes. After the load, unbound material was washed off of the column, and the target protein was eluted. Elution pools were shipped at room temperature, 4°C or frozen. This decision was dependent on the stability of the nutritive polypeptide in Elution Solution. Table E9B summarizes a number of nutritive polypeptides that have been purified by IMAC at the 1 L column scale.

[00266] Table E9B. Nutritive polypeptides that have been purified by IMAC at 1 L scale.

SEQID IMAC Elution Mass IMAC Elution Purity

00240 9.00 grams 98 %

00338 43.5 grams 100 %

00341 54.3 grams 100 %

00352 19.8 grams 100 %

00559 19.5 grams 89 %

00587 8.6 grams 69 %

[00267] Nutritive polypeptides were filtered through a Sartopore II XLG 0.8/0.2 μm filter and loaded directly into an ultrafiltration/diafiltration (UF/DF) unit operation. Membrane area and nominal molecular weight cutoff were chosen as appropriate for each nutritive polypeptide. Nutritive polypeptides were ultrafiltered at a cross flow of 12 L/m2/min and a TMP target of 25 psi. Nutritive polypeptides were concentrated approximately ten-fold on Hydrosart ultrafiltration cassettes (Sartorius Stedim, Bohemia, NY), and diafiltered seven diavolumes into a formulation buffer that is specific to the nutritive polypeptide. Ultrafiltration permeate was discarded. The diafiltered, concentrated retentate was collected, filtered through a 0.22 μm membrane filter and frozen at -80 °C.

[00268] In some cases, frozen protein concentrates were lyophilized using a Labconco lyophilizer (Labconco, Kansas City, MO). Residual water content of the cake was analyzed using the Karl Fischer method.

[00269] Purification by Precipitation. Protein precipitation is a well-known method for purification of polypeptides (Scopes R. 1987. Protein Purification: Principles and Practice. New York: Springer). Many polypeptides precipitate as salt concentrations increase, a phenomenon known as salting out. Salt types have been ranked and organized in the Hofmeister series according to their different abilities to salt out proteins (F. Hofmeister, Arch. Exp. Pathol. Pharmacol. 24 (1888) 247-260). Proteins also differ in their propensity to precipitate at high salt concentration, depending on their physicochemical properties; however, a universal metric to rank proteins for this characteristic has not been established. The use of such a ranking metric to select nutritive polypeptides for their ability to be purified has implications for the speed of process development, cost of manufacture, final purity, and robustness of the purification.

[00270] In most industrial applications of purification by polypeptide precipitation, the polypeptide of interest is selectively precipitated, and the impurities are then rinsed away from the solid precipitate. In certain embodiments, polypeptides do not precipitate with high levels of salt, and purification is achieved by precipitating the impurities. In the present application, we have defined two methods of screening a library of polypeptides to rank-order them for their ability to remain soluble through harsh precipitation conditions. One method is an in silico prediction based on calculation of protein total charge across a range of pH using the primary sequence of the polypeptides, as described herein. The second method is a multiplexed purification screen in vitro, as described herein. The two methods have successfully been used independently of each other, and they have been used together on the same set of 168 nutritive polypeptides with supportive data, as described herein.

[00271] The solubility of a polypeptide correlates directly with the abundance of surface charges (Jim Kling, Highly Concentrated Protein Formulations: Finding Solutions for the Next Generation of Parenteral Biologics, BioProcess International, 2014). It has been established that surface charges can impart physical characteristics to a polypeptide (Lawrence, M. S., Phillips, K. J., & Liu, D. R. (2007). Supercharging proteins can impart unusual resilience. Journal of the American Chemical Society, 129(33), 10110-2. doi:10.1021/ja071641y).

[00272] The in silico method of predictive solubility ranking is based on calculating total number of charges of a nutritive polypeptide at a range of pH based on the primary sequence.

[00273] A prevalence of one or more certain amino acids, e.g., histidine, arginine, and lysine in a polypeptide imparts in that polypeptide, or a portion thereof, a positive charge when the pH of the protein solvent is below the pKa of the one or more amino acids. A prevalence of one or more certain amino acids, e.g., glutamic acid and aspartic acid in a polypeptide imparts in that polypeptide, or a portion thereof, a negative charge when the pH of the protein solvent is above the pKa of the one or more amino acids.

[00274] The total number of charges of a polypeptide changes as a function of the pH of the protein solvent. The number of positive charges and negative charges can be calculated at any pH based on the primary sequence of the polypeptide, as described herein. The sum of the positive charges and negative charges at any one pH results in the calculated net charge. The isoelectric point (pI) of the polypeptide is the pH at which its calculated net charge is 0. To make comparisons across nutritive polypeptides, the total number of positive charges is added to the absolute value of the total number of negative charges, and that total charge is normalized by the number of amino acids in the sequence. The resulting parameter, "total charge per amino acid," is the novel comparator between sequences and is used to predict the polypeptide's resistance to precipitation. The more resistant a polypeptide is, the higher the likelihood that it can be purified to a high degree by precipitating out the impurities. Unlike chromatographic performance, solubility is not affected by the polarity of the charges. While it is often true that a polypeptide experiences its lowest solubility at the pI of the sequence, some polypeptides have a high total charge and are still extremely soluble at their pI, as shown herein.

[00275] Nutritive polypeptide sequences have been evaluated by calculating the total charge per amino acid of each polypeptide at a range of pH (1-14). Nutritive polypeptides were ranked by total charge per amino acid. Polypeptides with a low pI and very negative net charge per amino acid across a wide range of pH and polypeptides with a high pI and very positive net charge per amino acid across a wide range of pH are all expected to score equally well by this ranking. Polypeptides with a large number of both charges score the best.
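The total-charge-per-amino-acid metric described above can be sketched with the Henderson-Hasselbalch relation. This is a simplified illustration under stated assumptions: the side-chain pKa values are generic textbook-style numbers rather than values from this application, only Lys, Arg, His, Asp, and Glu are counted (termini, Cys, and Tyr are ignored), and ranking by the mean over integer pH 1 to 14 is one hypothetical way of aggregating across the pH range. All function names are hypothetical.

```python
# Illustrative side-chain pKa values (assumed, not taken from the application).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.65, "E": 4.25}


def positive_charge(seq: str, ph: float) -> float:
    """Expected number of protonated (positively charged) side chains at this pH."""
    return sum(1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
               for aa in seq.upper() if aa in POSITIVE_PKA)


def negative_charge(seq: str, ph: float) -> float:
    """Expected number (magnitude) of deprotonated, negatively charged side chains."""
    return sum(1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
               for aa in seq.upper() if aa in NEGATIVE_PKA)


def net_charge(seq: str, ph: float) -> float:
    """Positive minus negative charges; zero at the isoelectric point."""
    return positive_charge(seq, ph) - negative_charge(seq, ph)


def total_charge_per_aa(seq: str, ph: float) -> float:
    """Positive plus absolute negative charges, normalized by sequence length."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return (positive_charge(seq, ph) + negative_charge(seq, ph)) / len(seq)


def rank_by_total_charge(seqs: dict, ph_values=range(1, 15)) -> list:
    """Return sequence names sorted by mean total charge per residue, highest first."""
    phs = list(ph_values)

    def score(name: str) -> float:
        return sum(total_charge_per_aa(seqs[name], ph) for ph in phs) / len(phs)

    return sorted(seqs, key=score, reverse=True)
```

Because the positive and negative contributions are added rather than subtracted, a sequence rich in both acidic and basic residues scores highly even when its net charge is near zero, which matches the observation that polypeptides with a large number of both charges score the best.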

[00276] A set of nutritive polypeptides includes polypeptides with a low pI (<4) and very negative net charge per amino acid across a wide range of pH ((1) SEQID-00475, (2) SEQID-00009). This set includes polypeptides with a high pI (>10) and very positive net charge per amino acid across a wide range of pH ((4) SEQID-00433, (5) SEQID-00472). This set includes polypeptides with more neutral pI ((3) SEQID-00478). Many of the polypeptides displayed here show high charge even at extreme pH values, such as <4 and >12. The entire set is expected to be extremely soluble and resist precipitation across a wide range of pH.

[00277] In this demonstration, the E. coli cells were harvested from shake flask fermentation by centrifugation, and the whole cells were distributed into tubes (1 gram of cells per tube). To each tube, 4 mL of lysis solution was added, and cells were lysed by sonication at 75 Amps for 30 seconds. Lysis solutions included: Water, 8 M Urea 0.1 M Tris 0.1 M NaCl, 0.1 M Acetate 10% gly 0.1% Tween-80 0.3 M Arg 0.3 M NaCl 10 mM EDTA, 10 mM Imidazole pH 5.0, 0.1 M Acetate 10% gly 0.1% Tween-80 0.3 M Arg 0.3 M NaCl 10 mM EDTA, 10 mM Imidazole pH 7.0, 100 NaCl 100 Hepes 10 Imidazole pH 7.5, 500 NaCl 100 Hepes 10 Imidazole pH 7.5, 100 mM Phos, 150 mM NaCl, 10 mM Imidazole pH 7.5, 0.1 M NaCl 0.1 M Hepes 10 mM Imidazole 50 mM CaCl2 pH 7.5, 0.1 M Hepes 3.5 M Am. Sulfate pH 7.5, 0.1 M Hepes 2 M Am. Sulfate pH 7.5, 0.1 M Tris, 0.1 M Tris 0.5 M NaCl, 150 mM NaCl 10 mM Acetate 15 mM Imidazole pH 6.04, and 500 mM NaCl 100 mM Acetate 15 mM Imidazole pH 6.04. The lysate was clarified by centrifugation and 0.2 μm filtration. The clarified supernatant was analyzed by SDS-PAGE (blue stain), as described herein. SEQID-00009 demonstrated solubility in each of these conditions. E. coli host cell proteins generally demonstrated solubility in these conditions as well, with one exception. The presence of 3.5 M ammonium sulfate effectively precipitated the majority of the host cell proteins, resulting in SEQID-00009 at 85% purity after the cell harvest stage of the process. This result indicates that SEQID-00009 is more soluble than most E. coli host cell proteins and that precipitation can be used as part of a low-cost method for isolation. This correlates with the high total charge of SEQID-00009 and supports the accuracy of the prediction. Furthermore, it is predicted that polypeptides with more charge than SEQID-00009 would be even more soluble, which could offer the benefit of higher polypeptide yield relative to SEQID-00009.

[00278] In subsequent experiments, SEQID-00009 was purified to 99% purity with a single stage of ammonium sulfate precipitation. In this demonstration, the E. coli cells were harvested from shake flask fermentation by centrifugation, and the whole cells were suspended in 0.1 M sodium carbonate, pH 10 (1 gram of cells in 4 mL of solution). The cells were lysed by sonication (80 Amp for 2 minutes). The lysate was clarified by centrifugation and 0.2 μm membrane filtration. The clarified supernatant was divided into a series of 3 mL fractions, to which a stock solution of 4 M ammonium sulfate, pH 9.8 was added. Variable amounts of stock solution were added to achieve a range of ammonium sulfate concentrations. The samples were mixed for 10 minutes at room temperature, and clarified by centrifugation and 0.2 μm membrane filtration. The clarified supernatants were analyzed by SDS-PAGE (blue stain), as described herein.
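As an illustration of the titration arithmetic, the following minimal sketch computes how much 4 M stock would be added to a 3 mL fraction to reach a chosen final ammonium sulfate concentration. It assumes additive volumes and no ammonium sulfate in the starting fraction; the function name and the target concentrations are hypothetical, since the concentrations actually tested are not listed here.

```python
def stock_volume_ml(target_molar, sample_ml=3.0, stock_molar=4.0):
    """Volume of stock (mL) to add to sample_ml so the mixture reaches target_molar.

    Assumes volumes are additive and that the sample initially contains
    no ammonium sulfate (both are simplifying assumptions).
    """
    if not 0 <= target_molar < stock_molar:
        raise ValueError("target must be >= 0 and below the stock concentration")
    # Mass balance: stock_molar * v = target_molar * (sample_ml + v)
    return target_molar * sample_ml / (stock_molar - target_molar)


if __name__ == "__main__":
    # Hypothetical target series, for illustration only.
    for target in (0.5, 1.0, 1.5, 2.0, 2.5):
        volume = stock_volume_ml(target)
        print(f"{target:.1f} M final: add {volume:.2f} mL of 4 M stock to 3 mL")
```

Because the required volume grows steeply as the target approaches the stock concentration, targets near 4 M are not practical with this stock.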

[00279] Example 5. Expression of nutritive polypeptides in Aspergillus niger fungi.

[00280] Gene Synthesis & Plasmid Construction. Genes encoding natively secreted proteins were PCR amplified from Aspergillus niger ATCC 64974 using primers designed from the genome sequence of Aspergillus niger CBS 513.88. In one example, genes included native 5' secretion sequences and were cloned into the expression vector pAN56-1 (Genbank: Z32700.1) directly under the control of the gpdA promoter from Aspergillus nidulans with the addition of a C-terminal 3X FLAG tag (DYKDHDGDYKDHDIDYKDDDDK). In another example, genes included only the mature peptide with the addition of a heterologous 5' secretion signal. Plasmids were constructed using the Gibson Assembly Kit (New England Biolabs, Beverly, MA). Recombinant plasmids were sequence verified before transformation into Aspergillus hosts.

[00281] pFGLAHIL6T was obtained from the BCCM/LMBP (Ghent, Belgium). Plasmids were constructed using the Gibson Assembly Kit (New England Biolabs, Beverly, MA). Recombinant plasmids were sequence verified before transformation into Aspergillus hosts.

[00282] The pyrA nutritional marker was PCR amplified from Aspergillus niger ATCC 64974 using primers designed from the genome sequence of Aspergillus niger ATCC 1015. The pyrA PCR fragment was digested with XbaI and ligated into an XbaI fragment of pCSN44 (Staben et al., 1989) to construct pES1947. pCSN44 was obtained from the BCCM/LMBP (Ghent, Belgium). The recombinant plasmid was sequence verified before transformation into Aspergillus hosts.

[00283] Strain Construction. A protease deficient derivative of Aspergillus niger ATCC 62590 was used as the expression host. Expression vectors were co-transformed with pES1947 using the protoplast method as described in Punt et al., 1992, Methods in Enzymology, 216, 447-457. Approximately 5 μg of each plasmid was transformed into Aspergillus niger protoplasts. Transformants were selected on minimal media supplemented with 1.2 M sorbitol and 1.5% bacto agar (10 g/l glucose, 4 g/l sodium nitrate, 20 ml/l salts solution (containing 26.2 g/l potassium chloride and 74.8 g/l Potassium phosphate monobasic at pH 5.5), 1 ml/l vitamin solution (containing 100 mg/l Pyridoxine hydrochloride, 150 mg/l Thiamine hydrochloride, 750 mg/l 4-Aminobenzoic acid, 2.5 g/l Nicotinic acid, 2.5 g/l riboflavin, 20 g/l choline chloride, and 30 mg/l biotin), and 1 ml/l of metals solution (containing 20 g/l Zinc sulfate heptahydrate (ZnSO4·7H2O), 11 g/l Boric acid (H3BO3), 5 g/l Manganese(II) chloride tetrahydrate (MnCl2·4H2O), 5 g/l Iron(II) sulfate heptahydrate (FeSO4·7H2O), 1.7 g/l Cobalt(II) chloride hexahydrate (CoCl2·6H2O), 1.6 g/l Copper(II) sulfate pentahydrate (CuSO4·5H2O), 1.5 g/l Sodium molybdate dihydrate (Na2MoO4·2H2O), and 5.0 g/l EDTA disodium salt dihydrate (Na2EDTA·2H2O) at pH 6.5)). Individual transformants were isolated on minimal media plates and allowed to grow at 30°C until they sporulated. Spores were harvested in water and stored at 4°C.

[00284] Expression Testing. Spore stocks of Aspergillus niger strains were inoculated at 10^6 spores/mL into 2 mL of CM (MM plus 5.0 g/l yeast extract, 2.0 g/l casamino acids) adjusted to pH 7 with 40 mM MES and SigmaFast Protease Inhibitor Cocktail (1 tab/100 mL, Sigma Aldrich) in 24 well square bottom deep well blocks. Culture blocks were covered with porous adhesive plate seals and incubated for 48 hrs in a micro-expression chamber (Glas-Col, Terre Haute, IN) at 30°C at 600 rpm. After the growth period, 0.5-ml aliquots of the culture supernatants were filtered first through a 25 μm/0.45 μm dual stage filter followed by a 0.22 μm filter. The filtrates were then assayed to determine the levels of secreted protein of interest (POI).

Table E15A. Exemplary results of Aspergillus leader peptide library screening.

[00285] Aspergillus niger signal peptide library construction. It is difficult to predict which secretion signal peptide will best facilitate the secretion of any given protein of interest. Therefore, one approach to optimizing secretion is to fuse a library of signal peptide sequences to the protein and screen for those that result in the highest level of secretion. We constructed a signal peptide library for SEQID-00409 and SEQID-00420. Table E15A shows the signal peptides that were fused with SEQID-00409 and SEQID-00420. DNA encoding the individual signal peptides was constructed by duplexing single stranded oligonucleotides comprising the forward and reverse strands of each signal peptide sequence. The oligonucleotides were designed such that single strand tails were formed at the 5'-ends of the duplexed molecule. Genes encoding natively secreted proteins SEQID-00409 and SEQID-00420 were PCR amplified from Aspergillus niger ATCC 64974 using primers designed from the genome sequence of Aspergillus niger CBS 513.88. Genes included native 5' secretion sequences and were cloned into the expression vector pAN56-1 (Genbank: Z32700.1) directly under the control of the gpdA promoter from Aspergillus nidulans with the addition of a C-terminal 3X FLAG tag (DYKDHDGDYKDHDIDYKDDDDK). Then the vectors were amplified without the native signal peptide, and plasmids with different signal peptides were reconstructed with the duplex signal peptide sequences using the Gibson Assembly Kit (New England Biolabs, Beverly, MA). Recombinant plasmids were sequence verified before transformation into Aspergillus hosts.

[00286] The pyrA nutritional marker was PCR amplified from Aspergillus niger ATCC 64974 using primers designed from the genome sequence of Aspergillus niger ATCC 1015. The pyrA PCR fragment was digested with XbaI and ligated into an XbaI fragment of pCSN44 (Staben et al., 1989) to construct pES1947. pCSN44 was obtained from the BCCM/LMBP (Ghent, Belgium). The recombinant plasmid was sequence verified before transformation into Aspergillus hosts.

[00287] Aspergillus niger signal peptide library strain construction. A protease deficient derivative of Aspergillus niger ATCC 62590 was used as the expression host. Each signal peptide-gene combination vector was individually co-transformed with a plasmid containing the nutritional marker pyrG using the protoplast method as described in Punt et al., 1992. Approximately 5 μg of each plasmid was transformed into Aspergillus niger protoplasts. Transformants were selected on minimal media supplemented with 1.2 M sorbitol and 1.5% bacto agar (10 g/l glucose, 4 g/l sodium nitrate, 20 ml/l salts solution (containing 26.2 g/l potassium chloride and 74.8 g/l Potassium phosphate monobasic at pH 5.5), 1 ml/l vitamin solution (containing 100 mg/l Pyridoxine hydrochloride, 150 mg/l Thiamine hydrochloride, 750 mg/l 4-Aminobenzoic acid, 2.5 g/l Nicotinic acid, 2.5 g/l riboflavin, 20 g/l choline chloride, and 30 mg/l biotin), and 1 ml/l of metals solution (containing 20 g/l Zinc sulfate heptahydrate (ZnSO4·7H2O), 11 g/l Boric acid (H3BO3), 5 g/l Manganese(II) chloride tetrahydrate (MnCl2·4H2O), 5 g/l Iron(II) sulfate heptahydrate (FeSO4·7H2O), 1.7 g/l Cobalt(II) chloride hexahydrate (CoCl2·6H2O), 1.6 g/l Copper(II) sulfate pentahydrate (CuSO4·5H2O), 1.5 g/l Sodium molybdate dihydrate (Na2MoO4·2H2O), and 5.0 g/l EDTA disodium salt dihydrate (Na2EDTA·2H2O) at pH 6.5)). Individual transformants were isolated on minimal media plates and allowed to grow at 30°C until they sporulated.

[00288] Aspergillus niger signal peptide library expression testing. Six different primary transformants from each construct were inoculated into 1 ml of minimal media as defined above (supplemented with 5.0 g/l yeast extract, 2.0 g/l casamino acids) adjusted to pH 7 with 160 mM MES in 96 deep well culture blocks. Culture blocks were covered with porous adhesive plate seals and incubated for 48 hrs in a micro-expression chamber (Glas-Col, Terre Haute, IN) at 33°C at 800 rpm. After the growth period, 0.5-ml aliquots of the culture supernatants were filtered first through a 25 μm/0.45 μm dual stage filter followed by a 0.22 μm filter. The filtered supernatants were then analyzed using Chip Electrophoresis as described below or anti-FLAG DOT-BLOT and SDS-PAGE as described below. Based on these results, the primary transformants that demonstrated higher secretion than the native signal peptide were isolated on minimal media plates and allowed to grow at 30°C until they sporulated.

[00289] Spore stocks of the above Aspergillus niger strains, along with the control Aspergillus niger strains containing expression constructs with the gpdA promoter and the native signal peptide of SEQID-00409 and SEQID-00424, were inoculated at 10^6 spores/mL into 10 mL of minimal media as defined above (supplemented with 5.0 g/l yeast extract, 2.0 g/l casamino acids) adjusted to pH 7 with 160 mM MES in a 125 ml plastic flask. Aspergillus spores were then grown at 30°C with 150 RPM for two days. After the growth period, aliquots of the culture supernatants were filtered first through a 25 μm/0.45 μm dual stage filter followed by a 0.22 μm filter. The filtrates were then analyzed using SDS-PAGE as described herein.

[00290] Aspergillus niger heterologous nutritive polypeptide gene synthesis & plasmid construction. Genes encoding nutritive polypeptides were synthesized (Geneart, Life Technologies). Genes were codon optimized for expression in Aspergillus niger. Synthesized genes were PCR amplified and cloned into the expression vector pAN56-1 (Genbank: Z32700.1) fused to glucoamylase with its native leader sequence under the control of the gpdA promoter from Aspergillus nidulans, with the addition of a C-terminal 3X FLAG tag (DYKDHDGDYKDHDIDYKDDDDK) and a Kexin protease site (NVISKR) between the glucoamylase gene and the gene of interest. Plasmids were constructed using the Gibson Assembly Kit (New England Biolabs, Beverly, MA). Recombinant plasmids were sequence verified before transformation into Aspergillus hosts. SEQID-00087, SEQID-00103, SEQID-00105, SEQID-00115, SEQID-00218, SEQID-00298, SEQID-00302, SEQID-00341, SEQID-00352, and SEQID-00354 genes were utilized.

[00291] Aspergillus niger heterologous nutritive polypeptide strain construction. A protease deficient derivative of Aspergillus niger D15#26 (E. Karnaukhova et al., Microbial Cell Factories, 6:34) was used as the expression host. 10 μg of expression vector was co-transformed with 1 μg of a plasmid containing the pyrG selection marker using the protoplast method described in Punt et al., 1992, Methods in Enzymology, 216, 447-457. Transformants were selected on minimal media containing 10 g/l glucose, 6 g/l sodium nitrate, 20 ml/l salts solution (containing 26 g/l potassium chloride and 76 g/l Potassium phosphate monobasic at pH 5.5), 2 mM magnesium sulphate, 1 ml/l vitamin solution (containing 100 mg/l Pyridoxine hydrochloride, 150 mg/l Thiamine hydrochloride, 750 mg/l 4-Aminobenzoic acid, 2.5 g/l Nicotinic acid, 2.5 g/l riboflavin, 20 g/l choline chloride, and 30 mg/l biotin), and 1 ml/l of metals solution (containing 20 g/l Zinc sulfate heptahydrate (ZnSO4·7H2O), 11 g/l Boric acid (H3BO3), 5 g/l Manganese(II) chloride tetrahydrate (MnCl2·4H2O), 5 g/l Iron(II) sulfate heptahydrate (FeSO4·7H2O), 1.7 g/l Cobalt(II) chloride hexahydrate (CoCl2·6H2O), 1.6 g/l Copper(II) sulfate pentahydrate (CuSO4·5H2O), 1.5 g/l Sodium molybdate dihydrate (Na2MoO4·2H2O), and 5.0 g/l EDTA disodium salt dihydrate (Na2EDTA·2H2O) at pH 6.5) and supplemented with 1.2 M sorbitol and 1.5% bacto agar. Individual transformants were isolated on minimal media plates and allowed to grow at 30°C until they sporulated. Spores were harvested in water and stored at 4°C.

[00292] Aspergillus niger heterologous nutritive polypeptide expression testing. 90 different primary transformants from each construct were inoculated into 1 ml of minimal media as defined above supplemented with 1 g/L casamino acids in 96 deep well culture blocks (1st MTP). Culture blocks were covered with porous adhesive plate seals and incubated for 72 hrs in a micro-expression chamber (Glas-Col, Terre Haute, IN) at 33°C at 800 rpm. After the growth period, 0.5-ml aliquots of the culture supernatants were filtered first through a 25 μm/0.45 μm dual stage filter followed by a 0.22 μm filter. The filtrates were then assayed using an anti-FLAG ELISA method, as described herein, to determine the levels of secreted protein of interest (POI). At least five colonies from each of nine expression constructs (all except SEQID-00302) yielded positive signals in the anti-FLAG ELISA, as reported in Table E15B. At least 5 primary transformants that showed positive signals in the anti-FLAG ELISA from each of the nine expression strains were also streaked onto a fresh minimal media agar plate for single spore purification and retested for confirmation (2nd MTP).

[00293] Spores were harvested from the plate, and inoculation was done with fresh spore crops with a density of approximately 1E9 spores/ml. 10 μl of these spore crops were added to 10 ml of minimal medium, giving a start density of 1E6 spores/ml. Aspergillus spores were then grown at 33°C with 150 RPM for three days. After the growth period, aliquots of the culture supernatants were filtered first through a 25 μm/0.45 μm dual stage filter followed by a 0.22 μm filter. The filtrates were then analyzed using anti-FLAG ELISA, anti-FLAG western blot and SDS-PAGE as described below.
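The inoculation arithmetic above (10 μl of a 1E9 spores/ml crop into 10 ml giving about 1E6 spores/ml) can be sketched as follows. This is only an illustrative helper with hypothetical function names; it treats the stock density as known exactly, whereas the text describes it as approximate.

```python
def inoculum_volume_ul(stock_per_ml, target_per_ml, culture_ml):
    """Microlitres of spore stock needed to seed culture_ml at target_per_ml.

    Ignores the small volume contributed by the inoculum itself.
    """
    if stock_per_ml <= 0:
        raise ValueError("stock density must be positive")
    return target_per_ml * culture_ml / stock_per_ml * 1000.0


def actual_density(stock_per_ml, inoculum_ul, culture_ml):
    """Spore density after mixing, accounting for the added inoculum volume."""
    inoculum_ml = inoculum_ul / 1000.0
    return stock_per_ml * inoculum_ml / (culture_ml + inoculum_ml)


if __name__ == "__main__":
    volume = inoculum_volume_ul(1e9, 1e6, 10.0)
    print(f"Inoculum: {volume:.1f} ul")
    print(f"Start density: {actual_density(1e9, volume, 10.0):.3g} spores/ml")
```

The same helper applies to the one litre cultures by changing `culture_ml`.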

[00294] Certain clones from different expression constructs were grown using fresh spore crops at a final density of approximately 1E6 spores/ml in one litre of minimal media. Aspergillus spores were then grown at 33°C with 150 RPM for three days. After the growth period, aliquots of the culture supernatants were filtered first through a 25 μm/0.45 μm dual stage filter followed by a 0.22 μm filter. The filtrates were then analyzed using anti-FLAG ELISA, anti-FLAG western blot and SDS-PAGE as described below. Any secreted protein above 39 mg/l in an anti-FLAG ELISA from a one litre shake flask was detected by anti-FLAG western blot and SDS-PAGE.

[00295] Table E15B shows the anti-FLAG ELISA results and anti-FLAG western blot results for secretion of different proteins in Aspergillus niger.

[00296] Example 6: Nutritive polypeptide expression analysis.

[00297] Nutritive polypeptides intracellularly expressed and/or secreted were detected using a variety of methods. These methods include electrophoresis, western blot, dot-blot, ELISA, and quantitative LC/MS/MS.

[00298] Electrophoresis Analysis. Extracellular and/or intracellular expressed proteins were analyzed by chip electrophoresis (Labchip GXII) or SDS-PAGE analysis to evaluate expression level.

[00299] For SDS-PAGE, 10 μl sample in Invitrogen LDS Sample Buffer mixed with 5% β-mercaptoethanol was boiled and loaded onto either: 1) a Novex® NuPAGE® 12% Bis-Tris gel (Life Technologies), or 2) a Novex® 16% Tricine gel (Life Technologies), and run using standard manufacturer's protocols. Gels were stained using SimplyBlue™ SafeStain (Life Technologies) using the standard manufacturer's protocol and imaged using the Molecular Imager® Gel Doc™ XR+ System (Bio-Rad). Over-expressed heterologous proteins were identified by comparison against a molecular weight marker and control cultures.

[00300] For chip electrophoresis (Labchip GX II), samples were analyzed using a HT Low MW Protein Express LabChip® Kit (following the manufacturer's protocol) by adding 2 μl of sample to 7 μl sample buffer. A protein ladder was loaded every 12 samples for molecular weight determination and quantification (molecular weight in kDa).

[00301] LC-MS/MS analysis. Whole cell, cell lysate and secreted samples can be analyzed for protein expression using LC-MS/MS. To analyze samples, 10 μg of sample were loaded onto a 10% SDS-PAGE gel (Invitrogen) and separated approximately 2 cm. The gel was excised into ten segments and the gel slices were processed by washing with 25 mM ammonium bicarbonate, followed by acetonitrile. Gel slices were then reduced with 10 mM dithiothreitol at 60 °C, followed by alkylation with 50 mM iodoacetamide at room temperature. Finally, the samples were digested with trypsin (Promega) at 37 °C for 4 h and the digestions were quenched with the addition of formic acid. The supernatant samples were then analyzed by nano LC/MS/MS with a Waters NanoAcquity HPLC system interfaced to a ThermoFisher Q Exactive. Peptides were loaded on a trapping column and eluted over a 75 μm analytical column at 350 nL/min; both columns were packed with Jupiter Proteo resin (Phenomenex). A 1 h gradient was employed. The mass spectrometer was operated in data-dependent mode, with MS and MS/MS performed in the Orbitrap at 70,000 FWHM resolution and 17,500 FWHM resolution, respectively. The fifteen most abundant ions were selected for MS/MS. Data were searched against an appropriate database using Mascot to identify peptides. Mascot DAT files were parsed into the Scaffold software for validation, filtering and to create a nonredundant list per sample. Data were filtered at 1% protein and peptide false discovery rate (FDR), requiring at least two unique peptides per protein.

[00302] Anti-FLAG Western Blot. Extracellular and/or intracellular protein was analyzed using western blot to evaluate expression level.

[00303] For SDS-PAGE, 10 μl sample in Invitrogen LDS Sample Buffer mixed with 5% β-mercaptoethanol was boiled and loaded onto a Novex® NuPAGE® 12% Bis-Tris gel (Life Technologies). For standards, 0.5 μg to 2 μg Amino-terminal FLAG-BAP™ Fusion Protein (Sigma) were loaded as a positive control. Gel electrophoresis was performed according to manufacturer's protocol. Once run, the gel was transferred onto an iBlot® Mini Transfer Stack nitrocellulose 0.2 μm pore size membrane (Life Technologies) according to manufacturer protocol. Next, the nitrocellulose membrane was removed from the stack and assembled into a Millipore SNAP i.d.® 2.0 Protein Detection Apparatus. 30 ml of Millipore Blok CH Noise Cancelling reagent was placed into an assembled reservoir tray and vacuumed through. 3 ml of antibody solution was prepared by diluting 2 μl of Sigma Monoclonal ANTI-FLAG® M2-Peroxidase (HRP) antibody into 3 ml of Millipore Blok CH Noise Cancelling Reagent. Antibody solution was added to the reservoir tray and allowed to incubate for 10 minutes without vacuum. After incubation, the reservoir tray was filled with 90 ml of 1X PBS + 0.1% Tween 20 and vacuumed through as the final wash step. After the wash, the nitrocellulose membrane was removed and placed into a reagent tray. 20 ml of Millipore Luminata Classico Western HRP substrate was added and allowed to incubate for 1 minute. After incubation, the membrane was placed into the imaging tray of the Gel Doc™ XR+ System (Bio-Rad) and imaged using a chemiluminescent protocol.

[00304] Anti-FLAG Dot Blot. Extracellular and/or intracellular proteins were analyzed using dot blot to evaluate expression level.

[00305] 110 μl of 0.2 μm filtered sample was mixed with 110 μl of 8.0 M Guanidine Hydrochloride, 0.1 M Sodium Phosphate (Denaturing Buffer) to allow for even protein binding. A standard curve of Amino-terminal FLAG-BAP™ Fusion Protein (Sigma) was prepared in the same matrix as the samples, starting at 2 μg and diluting 2X serially to 0.0313 μg. Invitrogen 0.45 μm nitrocellulose membrane was pre-wet in 1X PBS buffer for 5 minutes and then loaded onto a Bio-Rad Dot Blot Apparatus. 300 μl of PBS was vacuumed through to further wet the membrane. Next, 200 μl of the 1:1 Sample:Denaturing Buffer mixture was loaded into each well and allowed to drain through the dot blot apparatus by gravity for 30 minutes. Next, a 300 μl PBS wash was performed on all wells by vacuum, followed by loading 300 μl of Millipore Blok CH Noise Cancelling reagent and incubating for 60 minutes. After blocking, the membrane was washed with 300 μl of 1X PBS + 0.1% Tween 20. Next, antibody solution was prepared by adding 2.4 μl of Sigma Monoclonal ANTI-FLAG® M2-Peroxidase (HRP) antibody to 12 ml of Millipore Blok CH Noise Cancelling reagent (1:5000 dilution). 100 μl of the resulting antibody solution was added to each well and allowed to incubate for 30 minutes by gravity. After antibody incubation, three final washes were performed with 300 μl 1X PBS + 0.1% Tween 20 by vacuum. After the washes, the nitrocellulose membrane was removed and placed into a reagent tray. 20 ml of Millipore Luminata Classico Western HRP substrate was added and allowed to incubate for 1 minute. After incubation, the membrane was placed into the imaging tray of the Gel Doc™ XR+ System (Bio-Rad) and imaged using a chemiluminescent protocol.
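The two dilution calculations in the dot blot procedure (the 2X serial standard curve from 2 μg down to 0.0313 μg, and the 1:5000 antibody dilution) can be checked with a short sketch. The function names are hypothetical, and the 1% tolerance is an assumption needed because 0.0313 μg is a rounded value of 0.03125 μg.

```python
def serial_dilution(start, fold, lowest, rel_tol=0.01):
    """List the points of a serial dilution from `start` down to about `lowest`.

    rel_tol allows for the end point being quoted as a rounded value.
    """
    if fold <= 1:
        raise ValueError("fold must be greater than 1")
    points = []
    value = start
    while value >= lowest * (1.0 - rel_tol):
        points.append(value)
        value /= fold
    return points


def dilution_factor(antibody_ul, diluent_ml):
    """Approximate 1:N dilution factor for antibody_ul added to diluent_ml."""
    return diluent_ml * 1000.0 / antibody_ul


if __name__ == "__main__":
    standards = serial_dilution(2.0, 2.0, 0.0313)
    print("Standards (ug):", ", ".join(f"{s:g}" for s in standards))
    print(f"Dot blot antibody dilution: 1:{dilution_factor(2.4, 12.0):.0f}")
```

Under these assumptions the standard curve has seven points, and 2.4 μl of antibody in 12 ml of reagent corresponds to the stated 1:5000 dilution.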

[00306] Anti-FLAG ELISA. Protein expression was detected by direct ELISA using an anti-FLAG antibody. Briefly, a dilution series (0.005-10 μg/mL) of FLAG fusion protein (Sigma) was prepared in 0.1 M NaHCO3 (pH 9.5). A dilution series (0.01-20 μg/mL) of FLAG fusion protein was also prepared in spent medium from an empty fungus culture diluted 10-fold in 0.1 M NaHCO3 (pH 9.5). Experimental cultured medium samples were diluted 10-fold in 0.1 M NaHCO3 (pH 9.5). The FLAG fusion protein dilution series and the experimental samples were then transferred (0.2 mL) to the wells of a Nunc-immuno™ Maxisorp™ (Thermo) 96-well plate and incubated overnight at 2-8 °C to facilitate protein adsorption. The following morning, plates were rinsed three times with Tris-buffered saline (TBS) containing 0.05% TWEEN 80 (TBST). The wells were blocked from non-specific protein binding by incubation with 0.2 mL of 1% non-fat dried milk dissolved in TBST for 1 h at room temperature. The plates were rinsed three more times with TBST and then incubated at room temperature for 1 h with 0.2 mL of the monoclonal antibody anti-FLAG M2-HRP (Sigma) diluted 1:2000 in blocking buffer. The plates were again rinsed three more times with TBST before incubation with 0.2 mL/well of SIGMAFAST™ o-phenylenediamine dihydrochloride (OPD) (Sigma) for 30 min at room temperature. The reaction was terminated with the addition of 0.05 mL/well of 1 M HCl and the absorbance of samples was measured at 492 nm on a spectrophotometer equipped with a plate reader.
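One way such absorbance readings could be converted to concentrations is sketched below, purely as an illustration. The curve-fitting method is not specified in the text; this sketch assumes piecewise log-linear interpolation between neighbouring standards and multiplies by the 10-fold sample dilution. The function names are hypothetical, and the standard-curve values in the demo are placeholders, not measured data.

```python
import math


def interpolate_concentration(a492, standards):
    """Estimate concentration (ug/mL) from absorbance by log-linear interpolation.

    standards: iterable of (concentration_ug_per_ml, absorbance) pairs.
    Returns None when a492 lies outside the range of the standards.
    """
    points = sorted(standards, key=lambda p: p[1])
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo <= a492 <= a_hi:
            if a_hi == a_lo:
                return c_lo
            frac = (a492 - a_lo) / (a_hi - a_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


def culture_titer(a492, standards, dilution=10.0):
    """Concentration in the undiluted culture medium, or None if out of range."""
    conc = interpolate_concentration(a492, standards)
    return None if conc is None else conc * dilution


if __name__ == "__main__":
    # Placeholder standard curve for illustration only (not real data).
    demo_standards = [(0.1, 0.2), (1.0, 0.6), (10.0, 1.4)]
    print(culture_titer(1.0, demo_standards))
```

Readings outside the standard range return `None` rather than an extrapolated value, which would prompt re-assaying at a different dilution.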

EXAMPLE 7. Expression of Cas9 in Aspergillus niger.

[00307] Plasmid Construction: The cas9 gene from Streptococcus pyogenes was codon optimized for expression in Aspergillus and synthesized. An SV40 nuclear localization signal was introduced at the 5' end of the Cas9 coding sequence ("Ancas9"). The Ancas9 gene was cloned into an integrating plasmid for filamentous fungi. The resulting construct had Cas9 expression under the control of PglaA, a strong inducible glucoamylase promoter from Aspergillus niger, and the TrpC terminator. The vector containing the PglaA promoter and TrpC terminator was amplified by PCR, and the Ancas9 gene with a 3X FLAG tag (DYKDHDGDYKDHDIDYKDDDDK) was cloned between the PglaA promoter and TrpC terminator using Gibson Assembly (New England Biolabs, MA, USA) according to manufacturer's instructions. The resulting construct was propagated in E. coli and verified by DNA sequencing.
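As a quick sanity check of the 5' nuclear localization signal, the sketch below translates the first 60 nucleotides of the codon optimized cas9 sequence provided later in this document using the standard genetic code. The helper names are hypothetical, and the comparison motif PKKKRKV is the commonly cited SV40 large T antigen NLS, which is an assumption about which NLS variant was intended.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence, ignoring whitespace and trailing partial codons."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 60 nt of the Aspergillus codon optimized cas9 listed in this document.
ANCAS9_5PRIME = "ATGGCCTCCCCGCCCAAGAAGAAGCGCAAGGTCGGCTCCATGGATAAGAAGTACTCCATC"
SV40_NLS = "PKKKRKV"  # assumed reference motif

if __name__ == "__main__":
    peptide = translate(ANCAS9_5PRIME)
    print(peptide)
    print("NLS present:", SV40_NLS in peptide)
```

The same `translate` helper could be applied to the full coding sequence to confirm the reading frame and the position of the 3X FLAG tag.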

[00308] Strain construction: For expression of Cas9 in Aspergillus niger, a ΔaamA, pyrE derivative strain of Aspergillus niger MGG029 (Conesa et al. (2000). Applied and Environmental Microbiology, 66(1), 3016-23) was used. Expression vectors were co-transformed with another vector containing the nutritional marker pyrE gene using the protoplast method as described (See, e.g., Punt et al., 1992, Methods in Enzymology, 216, 447-457) to generate the resulting strain An348. Approximately 5 μg of each plasmid was transformed into Aspergillus niger protoplasts. Transformants were selected on minimal media supplemented with 1.2 M sorbitol and 1.5% bacto agar (10 g/l glucose, 4 g/l sodium nitrate, 20 ml/l salts solution (containing 26.2 g/l potassium chloride and 74.8 g/l Potassium phosphate monobasic at pH 5.5), 1 ml/l vitamin solution (containing 100 mg/l Pyridoxine hydrochloride, 150 mg/l Thiamine hydrochloride, 750 mg/l 4-Aminobenzoic acid, 2.5 g/l Nicotinic acid, 2.5 g/l riboflavin, 20 g/l choline chloride, and 30 mg/l biotin), and 1 ml/l of metals solution (containing 20 g/l Zinc sulfate heptahydrate (ZnSO4·7H2O), 11 g/l Boric acid (H3BO3), 5 g/l Manganese(II) chloride tetrahydrate (MnCl2·4H2O), 5 g/l Iron(II) sulfate heptahydrate (FeSO4·7H2O), 1.7 g/l Cobalt(II) chloride hexahydrate (CoCl2·6H2O), 1.6 g/l Copper(II) sulfate pentahydrate (CuSO4·5H2O), 1.5 g/l Sodium molybdate dihydrate (Na2MoO4·2H2O), and 5.0 g/l EDTA disodium salt dihydrate (Na2EDTA·2H2O) at pH 6.5)). Individual transformants were isolated on minimal media plates and allowed to grow at 30°C until they sporulated. Spores were harvested in water and stored at 4°C.

[00309] Expression: Strain An348 was grown in liquid complete media with 10 g/l maltose replacing glucose as indicated (5.0 g/l yeast extract, 2.0 g/l casamino acids, 10 g/l glucose, 4 g/l sodium nitrate, 20 ml/l salts solution (containing 26.2 g/l potassium chloride and 74.8 g/l Potassium phosphate monobasic at pH 5.5), 1 ml/l of metals solution (containing 20 g/l Zinc sulfate heptahydrate (ZnSO4·7H2O), 11 g/l Boric acid (H3BO3), 5 g/l Manganese(II) chloride tetrahydrate (MnCl2·4H2O), 5 g/l Iron(II) sulfate heptahydrate (FeSO4·7H2O), 1.7 g/l Cobalt(II) chloride hexahydrate (CoCl2·6H2O), 1.6 g/l Copper(II) sulfate pentahydrate (CuSO4·5H2O), 1.5 g/l Sodium molybdate dihydrate (Na2MoO4·2H2O), and 5.0 g/l EDTA disodium salt dihydrate (Na2EDTA·2H2O) at pH 6.5), and 1 ml/l vitamin solution (containing 100 mg/l Pyridoxine hydrochloride, 150 mg/l Thiamine hydrochloride, 750 mg/l 4-Aminobenzoic acid, 2.5 g/l Nicotinic acid, 2.5 g/l riboflavin, 20 g/l choline chloride, and 30 mg/l biotin)). Spores were directly inoculated into 800 μl of complete media in a 96 well square bottom deep well block. Culture blocks were covered with porous adhesive plate seals and incubated for 48 hours in a micro-expression chamber (Glas-Col, Terre Haute, IN) at 30°C with shaking at 1000 rpm.

[00310] After incubation, mycelia were transferred to 2 ml Eppendorf tubes and the media was aspirated. Next, tubes were freeze-thawed in liquid nitrogen three times. 25 μl of 8 M Urea was added and samples were vortexed. Next, 25 μl of 4X LDS with 5% β-mercaptoethanol was added (total 50 μl). This mixture was heated at 95°C for 10 minutes. The resulting mixture (15 μl) was loaded on a Novex® NuPAGE® 12% Bis-Tris gel (Life Technologies) and run according to manufacturer's protocol. For standards, 0.5 μg Amino-terminal FLAG-BAP™ Fusion Protein (Sigma) was loaded as a positive control. Once run, the gel was transferred onto an iBlot® Mini Transfer Stack nitrocellulose 0.2 μm pore size membrane (Life Technologies) according to manufacturer protocol. Next, the nitrocellulose membrane was removed from the stack and assembled into a Millipore SNAP i.d.® 2.0 Protein Detection Apparatus. 30 ml of Millipore Blok CH Noise Cancelling reagent was placed into the assembled reservoir tray and vacuumed through. 3 ml of antibody solution was prepared by diluting 2 μl of Sigma Monoclonal ANTI-FLAG® M2-Peroxidase (HRP) antibody into 3 ml of Millipore Blok CH Noise Cancelling Reagent. Antibody solution was added to the reservoir tray and allowed to incubate for 10 minutes without vacuum. After incubation, the reservoir tray was filled with 90 ml of 1X PBS + 0.1% Tween 20 and vacuumed through as the final wash step. After the wash, the nitrocellulose membrane was removed and placed into a reagent tray. 20 ml of Millipore Luminata Classico Western HRP substrate was added and allowed to incubate for 1 minute. After incubation, the membrane was placed into the imaging tray of the Gel Doc™ XR+ System (Bio-Rad) and imaged using a chemiluminescent protocol. A maltose dependent induction of cas9 was observed with anti-FLAG Western blot analysis.

[00311] EXAMPLE 8. Introduction of arginine auxotrophy using CRISPR in Aspergillus niger.

[00312] Plasmid Construction: The integrating plasmid containing the Cas9 expression construct was used. Target sites for CRISPR cleavage were identified using a publicly available computational tool (Zifit.partners.org) against the query An14g03400, the argB gene from A. niger CBS 513.88. The target site was chosen to start with "AT" to match the +1 and +2 positions of the gpdA promoter for efficient guide RNA (or gRNA) transcription. In addition, a target site closer to the 5' end of the CDS was chosen to disrupt most of the argB gene. The target sequence at 60 bp of argB that was selected was ATTTCGCTGCCTCGGGCAAG. The gRNA inserts were constructed by annealing oligonucleotides (Table E8). The vector backbone was generated by PCR amplification from an integrating plasmid containing the gpdA promoter from Aspergillus nidulans and the trpC terminator with KG398 (TTTAATAGCTCCATGTCAACAAGAATAAAAC) and KG399 (GGGAAAAGAAAGAGAAAAGAAAAGAGC). Plasmids were constructed using the Gibson Assembly Kit (New England Biolabs, Beverly, MA). Recombinant plasmids were sequence verified before transformation into Aspergillus hosts.
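The selection criteria described for the argB target site can be expressed as a small check, sketched below. The "AT" prefix rule comes from the text; the 20-nucleotide length is an assumption based on the length of the sequence given, and the GC fraction is reported only as a descriptive statistic, not as a criterion stated in the text. The function name is hypothetical.

```python
def check_target(target, required_prefix="AT", expected_length=20):
    """Summarize simple properties of a candidate CRISPR target sequence."""
    target = target.strip().upper()
    if not target:
        raise ValueError("empty target sequence")
    gc_count = sum(base in "GC" for base in target)
    return {
        "valid_bases": set(target) <= set("ACGT"),
        "length_ok": len(target) == expected_length,
        "prefix_ok": target.startswith(required_prefix),
        "gc_fraction": gc_count / len(target),
    }


if __name__ == "__main__":
    report = check_target("ATTTCGCTGCCTCGGGCAAG")
    for key, value in report.items():
        print(f"{key}: {value}")
```

A candidate that fails `prefix_ok` would not match the +1 and +2 positions of the gpdA promoter as described above.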

Table E8. Oligonucleotides for gRNA cloning.

[00313] Strain Construction: For generating arginine auxotrophy in Aspergillus niger, a ΔcspA1, pyrG1, prt13 derivative strain of Aspergillus niger ATCC 62590 was used. This strain was co-transformed with a Cas9 expressing integrating vector and a vector encoding the selection marker pyrG from Aspergillus niger, in combination with sgRNA vector pES2683, to give rise to populations of transformants identified as AN386 using the method provided herein.

[00314] Arginine Auxotrophy: Transformants were selected on arginine minimal media plates with sorbitol. Pooled spores from transformants were colony purified by transfer to arginine minimal media plates with maltose to induce expression of cas9. 25 colonies were replica plated on arginine minimal media plates with glucose or maltose as the carbon source. Five of the 25 colonies failed to grow on minimal media but grew on arginine minimal media, indicating that these isolates were auxotrophs for arginine.

[00315] AN387 was further characterized by complementation analysis. AN387 was transformed with LMBP 2630 (BCCM LMBP Plasmid Collection, University of Ghent) encoding the Aspergillus nidulans argB homolog. This gene is known to complement argB mutants of Aspergillus niger (Lenouvel et al., 2002). Transformants were selected on minimal media with sorbitol, and a single arginine prototroph, AN396, was recovered. DNA from AN396 was isolated and characterized for its argB genotype with PCR analysis as above. In addition, PCR primers specific to the Aspergillus nidulans argB were utilized in PCR analysis, demonstrating the failure of AN396 to generate amplicons across the argB locus, while the expected PCR product was observed for Aspergillus nidulans argB. Therefore, LMBP 2630 complemented the argB mutation of AN387.

EXAMPLE 9. aygA and albA gene disruption using CRISPR in Aspergillus niger.

[00316] Plasmid Construction: The Aspergillus codon optimized cas9 gene was cloned downstream of an inducible PglaA promoter from Aspergillus nidulans and upstream of the trpC terminator using Gibson Assembly (New England BioLabs, MA, USA) in a plasmid that is self-replicating in Aspergillus niger. Constructs for expression of gRNA as provided herein were designed by inserting a synthetic gRNA sequence downstream of the strong constitutive PgpdA such that the first nucleotide of the gRNA is positioned at the +1 transcription site of the gpdA promoter. The gRNA inserts were designed and synthesized with flanking sequences that overlapped those of the integrative vector backbone fragment containing the gpdA promoter from Aspergillus nidulans and the trpC terminator, which was generated by PCR using pES2002 as template with primers KG398 (TTTAATAGCTCCATGTCAACAAGAATAAAAC) and KG399 (GGGAAAAGAAAGAGAAAAGAAAAGAGC). The gRNA inserts were constructed by synthesizing linear fragments as provided herein to yield three plasmids, pES2696, pES2694 and pES2693. Recombinant plasmids were sequence verified before transformation into Aspergillus hosts.

pES2693 (Alb6) gRNA insert: TCTTTTCTTTTCTCTTTCTTTTCCCATCAGGTCCTAGCCCTGTCCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTTTTAATAGCTCCATGTCAACAAGAA
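The layout of the Alb6 insert shown above can be checked with a short script. This is a minimal sketch, not the applicants' procedure. It assumes that the printed fragments join into one contiguous sequence, that each flank is 25 nucleotides long and corresponds to primer KG399 (as its reverse complement) or KG398, and that the guide scaffold begins with GTTTTAGAGCTAGAAATAGC. The function names and the short albA excerpt (copied from the albA sequence in this document) are illustrative only.

```python
# Sketch: dissect the Alb6 gRNA insert into flank / protospacer / scaffold.
# Assumption: the sequence fragments printed above form one contiguous insert.

KG398 = "TTTAATAGCTCCATGTCAACAAGAATAAAAC"
KG399 = "GGGAAAAGAAAGAGAAAAGAAAAGAGC"

INSERT = (
    "TCTTTTCTTTTCTCTTTCTTTTCCCATCAGGTCCTAGCCCTGT"
    "CCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTC"
    "CGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT"
    "TTTAATAGCTCCATGTCAACAAGAA"
)

# Excerpt of the albA sequence from this document, around the target site.
ALBA_EXCERPT = "AGGAGTCGTGAATCAGGTCCTAGCCCTGTCCTGGAGAGTGCATTGACATGC"

SCAFFOLD_START = "GTTTTAGAGCTAGAAATAGC"  # assumed start of the guide scaffold
FLANK_LEN = 25                           # assumed length of each overlap flank


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dissect(insert):
    """Split the insert and compare its flanks with the backbone primers."""
    five_flank = insert[:FLANK_LEN]
    three_flank = insert[-FLANK_LEN:]
    scaffold_at = insert.index(SCAFFOLD_START)
    return {
        "five_flank_matches_KG399": reverse_complement(KG399).endswith(five_flank),
        "three_flank_matches_KG398": KG398.startswith(three_flank),
        "protospacer": insert[FLANK_LEN:scaffold_at],
    }


def pam_after(protospacer, target):
    """Return the three nucleotides that follow the protospacer in the target."""
    end = target.index(protospacer) + len(protospacer)
    return target[end:end + 3]


parts = dissect(INSERT)
pam = pam_after(parts["protospacer"], ALBA_EXCERPT)
print(parts)
print("PAM:", pam)
```

Under these assumptions the recovered protospacer is 20 nucleotides long and is followed in the albA excerpt by TGG, which is consistent with an NGG protospacer-adjacent motif.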

[00317] Strain Construction: For disruption of albA and aygA in Aspergillus niger, a AcspAl, pyrGl, prtl3 derivative strain of Aspergillus niger ATCC 62590 was used as the recipient. This strain was co-transformed with the Cas9-expressing self-replicating vector and a vector encoding the selection marker pyrG from Aspergillus niger, in combination with the sgRNA vectors pES2693, pES2694 and pES2696, to give rise to populations of transformants using the methods provided herein.

[00318] Gene disruption: Transformants were selected on minimal media plates with sorbitol. Pooled spores from transformants were colony purified by transfer to arginine minimal media plates with maltose to induce expression of cas9. Twenty colonies from the aygA-targeted strain and ten colonies from each albA-targeted strain were picked, and the aygA and albA gene regions, respectively, were sequenced. Four colonies of the aygA113-targeted strain showed disruption of the aygA gene, and three colonies of the albA6-targeted strain and five colonies of the albA104-targeted strain showed disruption of the albA gene, as shown in the table below.
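The colony counts reported in the paragraph above can be expressed as disruption frequencies. The following sketch only restates those counts as fractions; the dictionary layout and function name are illustrative and are not part of the original disclosure.

```python
# Disruption counts as reported in the paragraph above:
# guide name -> (colonies with a disrupted gene, colonies sequenced)
results = {
    "aygA113": (4, 20),
    "albA6": (3, 10),
    "albA104": (5, 10),
}


def disruption_rate(disrupted, screened):
    """Fraction of sequenced colonies that carry a disrupted target gene."""
    return disrupted / screened


rates = {guide: disruption_rate(*counts) for guide, counts in results.items()}

for guide, (disrupted, screened) in results.items():
    print(f"{guide:8s} {disrupted:2d}/{screened:2d}  {rates[guide]:.0%}")
```

With these counts the frequencies are 20% for aygA113, 30% for albA6 and 50% for albA104.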

[00319] Provided are nucleic acid sequences of aygA and albA variants.

[00320] Aspergillus niger codon optimized cas9 with SV40s

ATGGCCTCCCCGCCCAAGAAGAAGCGCAAGGTCGGCTCCATGGATAAGAAGTACTCCATC GGCCTCGATATCGGC ACCAACTCCGTCGGCTGGGCCGTCATTACCGATGAGTACAAGGTCCCCTCCAAGAAGTTC AAGGTCCTCGGCAAC ACCGATCGCCACTCCATCAAGAAGAACCTGATCGGCGCCCTCCTGTTCGATTCCGGCGAA ACCGCCGAGGCCACC CGTCTCAAGCGTACCGCCCGTCGCCGCTACACCCGCCGCAAGAACCGCATCTGCTACCTC CAAGAGATCTTCTCC AACGAGATGGCCAAGGTCGATGATAGCTTCTTCCACCGCCTCGAGGAATCCTTCCTGGTC GAGGAAGATAAGAAG CACGAGCGCCACCCCATCTTCGGCAACATCGTCGATGAGGTCGCCTACCACGAGAAGTAC CCCACCATCTACCAC CTCCGCAAGAAGCTCGTCGATTCCACCGATAAGGCCGATCTCCGCCTCATCTACCTCGCC CTCGCCCACATGATC AAGTTCCGCGGCCACTTCCTCATCGAGGGCGATCTCAACCCCGATAACTCCGATGTCGAT AAGCTGTTCATCCAG CTCGTCCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCTCCGGCGTCGAT GCCAAGGCCATCCTC TCCGCTCGCCTCTCCAAGTCTCGCCGCCTCGAGAACTTGATCGCCCAGCTCCCCGGCGAG AAGAAGAACGGCCTC TTCGGGAACCTGATCGCCCTCTCCCTGGGCCTCACCCCCAACTTCAAGTCCAACTTCGAT CTCGCCGAGGATGCC AAGCTCCAGCTCTCCAAGGATACCTACGATGATGATCTCGATAACCTCCTCGCCCAGATC GGGGATCAGTACGCC GATCTGTTCCTCGCCGCCAAGAACCTCTCCGATGCCATCCTCCTCTCCGATATCCTCCGC GTCAACACCGAGATC ACCAAGGCCCCCCTCTCCGCCTCCATGATCAAGCGCTACGATGAGCACCACCAGGATCTC ACCCTGCTCAAGGCC CTCGTCCGCCAGCAGCTCCCGGAGAAGTACAAGGAAATCTTCTTCGATCAGTCCAAGAAC GGCTACGCCGGCTAC ATCGATGGCGGCGCTTCCCAAGAGGAATTCTACAAGTTCATCAAGCCCATCCTCGAGAAG ATGGATGGCACCGAG GAACTCCTCGTCAAGCTCAACCGCGAGGATCTCCTCCGCAAGCAGCGCACCTTCGATAAC GGCTCCATCCCCCAC CAGATCCACCTCGGCGAGCTGCACGCCATCTTGCGCCGGCAAGAGGATTTCTATCCGTTC CTCAAGGATAACCGC GAGAAGATCGAAAAGATCCTGACCTTCCGCATCCCCTACTACGTCGGCCCCCTCGCTCGC GGCAACTCCCGCTTC GCCTGGATGACCCGCAAGTCCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTCGTC GATAAGGGCGCCTCC GCCCAGTCCTTCATCGAGCGCATGACCAACTTCGATAAGAACCTCCCCAACGAGAAGGTC CTCCCCAAGCACTCC CTGCTCTACGAGTACTTCACCGTCTACAACGAGCTGACCAAGGTCAAGTACGTCACCGAG GGTATGCGCAAGCCC GCCTTCCTGTCCGGTGAGCAGAAGAAGGCCATCGTCGATCTGCTGTTCAAGACCAACCGC AAGGTCACCGTCAAG CAGCTCAAGGAAGATTACTTCAAAAAGATCGAGTGCTTCGATTCCGTCGAGATCAGCGGC GTCGAGGATCGCTTC AACGCCTCCCTCGGAACCTACCACGATCTCCTCAAGATTATCAAGGATAAGGATTTCCTC GACAACGAGGAAAAC GAGGACATCCTCGAGGACATCGTCCTCACCCTCACCCTCTTCGAGGATCGGGAGATGATC 
GAGGAACGCCTCAAG ACCTACGCCCACCTCTTCGATGATAAGGTCATGAAGCAGCTGAAGCGCCGTCGCTACACC GGCTGGGGTCGCCTC TCCCGCAAGCTCATCAACGGCATCCGCGATAAGCAGTCCGGCAAGACTATCCTCGATTTC CTCAAGTCCGATGGC TTCGCCAACCGCAACTTCATGCAGCTCATCCACGATGATTCCCTCACCTTCAAGGAAGAT ATCCAGAAGGCCCAG GTCAGCGGCCAGGGCGATTCCCTCCACGAGCATATCGCCAACCTCGCCGGCTCCCCCGCC ATCAAGAAGGGCATC CTCCAGACCGTCAAGGTCGTCGATGAGCTGGTCAAGGTCATGGGCCGCCACAAGCCCGAG AACATCGTCATCGAG ATGGCCCGCGAGAACCAGACCACCCAGAAGGGCCAGAAGAACTCCCGCGAGCGCATGAAG CGCATCGAGGAAGGC ATCAAGGAACTCGGCTCCCAGATCCTCAAGGAACACCCCGTCGAGAACACCCAGCTCCAG AACGAGAAGCTCTAC CTCTACTACCTCCAGAACGGCCGCGATATGTACGTCGATCAAGAGCTGGATATCAACCGC CTCTCCGATTACGAT GTCGATCATATCGTCCCCCAGTCCTTCCTGAAGGATGATTCCATCGATAACAAGGTCCTC ACCCGCTCCGATAAG AACCGCGGCAAGTCCGATAACGTCCCCTCCGAAGAGGTCGTCAAGAAGATGAAGAACTAC TGGCGGCAGCTCCTC AACGCCAAGCTCATCACCCAGCGCAAGTTCGATAACCTCACCAAGGCCGAGCGCGGTGGC CTCTCCGAGCTGGAT AAGGCCGGCTTCATCAAGCGCCAGCTCGTCGAAACCCGCCAGATCACCAAGCACGTCGCC CAGATCCTCGATTCC CGCATGAACACCAAGTACGATGAGAACGATAAGCTGATCCGCGAGGTCAAGGTGATCACC CTCAAGTCCAAGCTC GTGTCCGATTTCCGCAAGGATTTCCAGTTCTACAAGGTCCGCGAGATCAACAACTACCAC CACGCCCACGATGCC TACCTCAACGCCGTCGTCGGCACCGCCCTCATCAAGAAGTACCCTAAGCTCGAGTCCGAG TTCGTCTACGGCGAT TACAAGGTCTACGATGTCCGCAAGATGATCGCCAAGTCCGAGCAAGAGATCGGCAAGGCT ACCGCCAAGTACTTC TTCTACTCCAACATCATGAATTTCTTCAAGACCGAAATCACCCTCGCCAACGGCGAGATC CGCAAGCGCCCCCTC ATCGAGACTAACGGCGAGACTGGCGAGATCGTCTGGGATAAGGGCCGCGATTTCGCCACC GTCCGCAAGGTCCTC TCCATGCCCCAGGTCAACATCGTGAAGAAGACCGAGGTCCAGACCGGCGGCTTCTCCAAG GAATCCATCCTGCCC AAGCGCAACTCCGACAAGCTGATCGCCCGCAAGAAGGATTGGGACCCCAAGAAGTACGGA GGCTTCGATTCCCCC ACCGTCGCCTACTCCGTCCTCGTCGTCGCCAAGGTCGAGAAGGGCAAGTCCAAGAAGCTC AAGTCCGTCAAGGAA CTCCTGGGCATCACTATCATGGAACGCTCCAGCTTCGAGAAGAACCCTATCGATTTCCTC GAGGCCAAGGGCTAC AAGGAAGTCAAGAAGGATCTCATCATCAAGCTCCCTAAGTACTCCCTGTTCGAGCTGGAA AACGGCCGCAAGCGC ATGCTCGCGTCCGCTGGCGAGCTGCAGAAGGGCAACGAGCTGGCCCTGCCCTCCAAGTAC GTCAACTTCCTCTAC CTCGCCTCCCACTACGAGAAGCTCAAGGGCTCCCCCGAGGATAACGAGCAGAAGCAGCTG TTCGTCGAGCAGCAC 
AAGCACTACCTCGATGAGATCATCGAGCAGATCTCCGAGTTCTCCAAGCGCGTCATCCTC GCCGATGCCAACCTC GATAAGGTCCTGTCCGCCTACAACAAGCACCGCGATAAGCCCATCCGCGAGCAGGCCGAG AACATCATCCACCTC TTCACCCTCACCAACCTCGGTGCCCCTGCCGCCTTCAAGTACTTCGATACCACCATCGAT CGCAAGCGCTACACC TCCACCAAGGAAGTCCTCGACGCCACCCTCATCCACCAGTCCATCACCGGCCTCTACGAA ACCCGCATCGATCTC TCCCAGCTCGGCGGCGATTCCCCCGTCCGCTCCCCTAAGAAGAAGAGAAAGGTCGATTAC AAGGATCACGACGGC GACTACAAGGATCATGATATCGATTATAAGGACGACGACGACAAGTAG
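The terminal features of the cas9 coding sequence above can be inspected by translating its first and last codons. The sketch below is illustrative only: it uses the standard genetic code, the two nucleotide strings are copied from the start and end of the sequence above with the layout spaces removed, and the names are hypothetical. PKKKRKV is the SV40 large T antigen nuclear localization signal. The C-terminal residues after it resemble a triple FLAG epitope, which the text itself does not mention.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First 60 nt and last 93 nt of the cas9 sequence above (spaces removed).
CAS9_5P = "ATGGCCTCCCCGCCCAAGAAGAAGCGCAAGGTCGGCTCCATGGATAAGAAGTACTCCATC"
CAS9_3P = (
    "TCCCCTAAGAAGAAGAGAAAGGTCGATTAC"
    "AAGGATCACGACGGC"
    "GACTACAAGGATCATGATATCGATTATAAGGACGACGACGACAAGTAG"
)

SV40_NLS = "PKKKRKV"


def translate(dna):
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


n_term = translate(CAS9_5P)
c_term = translate(CAS9_3P)
print("N-terminus:", n_term, "| NLS present:", SV40_NLS in n_term)
print("C-terminus:", c_term, "| NLS present:", SV40_NLS in c_term)
```

The N-terminal fragment translates to MASPPKKKRKVGSMDKKYSI and the C-terminal fragment to SPKKKRKVDYKDHDGDYKDHDIDYKDDDDK followed by a stop codon, so the signal is present at both ends of the coding sequence.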

[00321] AlbA gene sequence from Aspergillus niger

ATGGAGGGTCCATCTCGTGTGTACCTTTTTGGAGACCAGACCAGCGACATCGAAGCTGGC CTGCGCCGTCTGCTC CAAGCGAAGAATAGTACCATTGTCCAGTCCTTTTTCCAGCAATGCTTCCATGCAATTCGT CAAGAGATCGCGAAG CTCCCGCCGTCTCATCGGAAGCTCTTCCCACGCTTCACGAGCATCGTTGATCTCCTTTCC AGGAGTCGTGAATCA GGTCCTAGCCCTGTCCTGGAGAGTGCATTGACATGCATCTACCAATTGGGTTGTTTCATT CAGTAAGTCAATGAG TTACCATCTATACTTGACAAGTCTGACCAGCCTTCAGCTTTTACGGGGATCTTGGACATG ACTACCCTACACCCT CCAACAGCCATCTTGTTGGCCTGTGCACTGGTGTTCTGAGCTGCACGGCTGTAAGTTGCG CCAGAAATGTTGGAG AGCTTATTCCAGCTGCAGTGGAATCGGTTGTAATTGCACTGCGACTGGGAATCTGCGTTT TTCGAGTTCGAGAAC TGGTGGACTCCGCCGATTCCGAGTCAACATGCTGGTCAGCGTTGGTTTCTGGAATCAGTG AAGCAGAGGCTAGCC ACCTGATCGACGAGTACAGTAGTAAGAAGGTGTGCTCTTCCAACTTTAAACCCCCGCATT GTGGGATGCTGACAG ATGCAGGCTACTCCGCCTTCTTCGAAACCGTATATCAGCGCGGTAAGCTCTAATGGCGTT ACTGTCAGCGCACCA CCTACGGTACTTGATGAATTCGTCGAGACCTGCATTTCCAAGAATTACAAGCCAGTGAAG GCCCCTATTCATGGC CCGTACCATGCGCCACATCTGTATGATGATAAGGATATCGACCGCATCCTGCAGCAGTCC TCTGCTCTAGAAGGA CTGACCGGCTGTTCACCCGTTATTCCCATCATCTCCAGTAACACTGGAAAGCCGATCAAG GCCAAGTCCATCAAA GATCTCTTCAAGGTCGCACTGGAGGAGATACTCCTACGACGACTATGCTGGGACAAGGTC ACGGAGTCCTGCACA TCAGTCTGCAAGACCGGCACAAACCACTCTTGCAAATTGTTTCCGATCTCGAGTAGCGCC ACTCAAAGTTTGTTC ACAGTCCTCAAGAAGGCCGGTGTGAGCATCAGCTTGGAGACTGGGGTAGGAGAGATCGCG ACGAACCCAGAAATG CGGAACCTTACTGGCAAGGCAGAAAATTCAAAGATTGCTATCATTGGTATGTCTGGAAGA TTTCCTGACTCGGAT GGTACGGAGAGCTTCTGGAACCTCCTGTACAAAGGACTCGACGTACATCGCAAAGTCCCC GCAGACCGTTGGGAC GTTGATGCCCACGTCGACATGACCGGGTCAAAGAGAAACACAAGCAAAGTGGCTTACGGT TGCTGGATCAACGAA CCCGGCCTGTTTGACCCCCGATTCTTCAACATGTCGCCTCGGGAAGCACTCCAAGCAGAT CCTGCACAACGTCTT GCGTTGCTTACAGCGTACGAGGCTCTCGAGATGGCTGGCTTCATCCCGGATAGCTCTCCA TCGACGCAGAGGGAC CGTGTGGGTATTTTCTACGGAATGACCAGTGACGACTACCGTGAGATCAACAGCGGCCAG GACATTGATACCTAT TTCATCCCTGGCGGTAACCGAGCATTTACGCCGGGTCGGATAAACTACTACTTCAAATTT AGCGGCCCCAGTGTG AGCGTTGACACAGCGTGCTCGTCTAGTCTTGCTGCTATCCACATGGCTTGCAATTCGATC TGGAGAAATGACTGC GATGCCGCCATCACTGGAGGTGTGAACATTCTGACCAGCCCTGACAACCACGCCGGTCTG GATCGGGGCCATTTC CTGTCCACCACTGGCAACTGTAACACCTTTGATGACGGCGCCGACGGCTACTGTAGAGCG 
GACGGAGTTGGAAGC ATCGTTTTGAAGCGGCTTGAAGATGCCGAGGCCGACAACGACCCGATCCTGGCCGTCATC AACGGTGCTTACACC AACCACTCGGCGGAGGCCGTGTCAATCACTCGTCCCCATGTTGGCGCGCAAGCATTCATC TTCAACAAGCTGCTC AATGATGCGAATATCGACCCTAAGGACGTGAGCTACGTGGAAATGCATGGCACTGGAACT CAAGCAGGTGATGCA GTCGAAATGCAGTCCGTTCTTGACGTCTTCGCACCAGACTACCGCCGGGGTCCCGGTCAA TCGCTTCATATCGGT TCTGCCAAGGCAAACATTGGACACGGTGAATCCGCATCAGGAGTGACTGCTCTTGTCAAG GTCCTCCTAATGATG AGAGAGAACATGATTCCTCCTCATTGTGGTATCAAGACCAAGATCAATTCCAATTTCCCG ACAGACTTGGCGAAG CGCAATGTTCATATCGCCTTCCAACCCACTCCCTGGAATCGGCCAGCTTCAGGAAAGCGG CGAACTTTCGTCAAC AACTTTTCTGCTGCTGGTGGTAACACTGCTCTTCTACTGGAAGATGCTCCCATACCGGAA CGCCAAGGGCAGGAC CCCAGGTCGTTCCATTTGGTCTCCGTGTCAGCAAGATCCCAGTCTGCATTGAAGAACAAC GTCGAAGCTCTGGTG AAGTACATTGACTCTCAGGGCAAGTCCTTTGGTGTGAAAGAGACTGAATTCCTTCCAAAC CTGGCGTACACGACC ACCGCACGCCGTATCCACCATCCCTTCCGTGTCATTGCGGTTGGAGCGAACCTACAATCA CTGCGTGACTCGCTG CATGGTGCTTTGCACCGTGAGACATATACCCCAGTTCCCTCAACGGCTCCTGGTATTGGT TTCGTCTTCACCGGC CAAGGAGCCCAATACTCCGGAATGGGCAAGGAACTCTACCGCAGTTGTTTCCAATTCCGA ACCACCATTGAGCAT TTTGACTGCATCGCAAGAAGCCAGGGCCTTCCTTCTATCCTTCCTCTTGTCGATGGAAGC GTGGCTGTCGAAGAA CTTAGCCCTGTCGTGGTACAAGTGGGAACTACCTGTGTACAAATGGCTCTAGTAAATTAC TGGACTGCTCTGGGT GTGAAGCCGGCCTTTATCATCGGACACAGTCTTGGAGACTATGCAGCCCTTAACACGGCC GGTGTTCTATCCACC AGCGATACAATCTATCTTTGTGGCCGGCGTGCTCAGTTGCTGACGAAGGAATGCAAGATT GGGACACATTCGATG CTGGCCATCAAGGCGTCCCTGGCAGAGGTCAAACATTTCCTCAGAGACGAGCTCCACGAA GTCTCTTGTGTTAAC GCACCTGCGGAGACCGTCGTCAGCGGCCTTGTCGCTGATATCGACGAGTTGGCTCAGAAA TGCTCCACAGAGGGT TTGAAGTCAACCAAGCTCAAGGTTCCTTACGCGTTCCATTCCTCTCAGGTTGATCCTATC TTGGAGGCCTTCGAA GATATTGCCCAAGGTGTCACCTTCCACAAGCCGACAACACCTTTCGTCTCAGCCCTGTTC GGGGAAGTGATCACC GATGCTAACTGGGAGTGTCTCGGCCCCAAGTACCTGCGCGATCATTGCAGAAAGACGGTC AACTTCCTTGGCGGC GTGGAGGCTACGAGGCATGCGAAGCTGACCAATGACAAGACTCTGTGGGTTGAGATCGGC TCACATACCATTTGC TCTGGAATGATCAAAGCAACTCTTGGACCGCAAGTTACAACGGTTGCATCTCTACGCCGC GAAGAAGATACCTGG AAGGTCCTTTCGAACAGTCTTGCGAGCCTTCATCTGGCGGGTATTGATATCAACTGGAAG CAATATCACCAGGAC 
TTTAGCTCCTCTCTCCAGGTCCTCCGCCTCCCAGCCTACAAGTGGGATCTCAAGAACTAC TGGATTCCCTATACC AACAACTTCTGCCTGAGCAAGGGCGCTCCAGTTGCGACAGTAGCGGCAGGGCCACAGCAT GAGTACCTGACAACC GCGGCTCAGAAGGTCATTGAGACTCGAAGTGATGGAGCAACAGCTACAGTCGTGATAGAG AACGACATTGCTGAT CCCGAGCTCAACCGCGTCATTCAAGGCCATAAGGTCAACGGTACTGCTTTGTGTCCCTCA GTAAGTTACCGCTCT TGCCCAACGACTGCGTTAAGATTCGTACTAATCAGGATATAGTCACTATATGCCGACATC TCTCAAACGCTTGCA GAGTATCTCATCAAAAAGTACAAGCCTGAGTACGACGGACTTGGACTGGATGTGTGTGAG GTCACAGTGCCACGA CCACTGATTGCGAAAGGCGGACAGCAGCTCTTTAGAGTATCTGCGACAGCGGATTGGGCG GAGAAGAAGACAACC CTTCAGATATATTCAGTCACTGCGGAGGGGAAGAAGACGGCTGACCACGCAACTTGCACT GTCCGATTCTTTGAC TGCGCTGCTGCGGAGGCGGAATGGAAACGAGTTTCCTACCTTGTCAAGAGGAGCATTGAC CGACTGCATGATATC GCCGAAAATGGTGACGCTCACCGTCTTGGTAGAGGCATGGTTTACAAACTCTTCGCTGCC TTGGTTGATTATGAC GACAACTTCAAGTCCATTCGCGAGGTTATTCTTGACAGTGAACAGCACGAAGCGACTGCA CGCGTCAAGTTCCAA GCACCACAAGGCAATTTCCACCGAAACCCGTTCTGGATTGACAGTTTTGGACACCTGTCT GGGTTCATCATGAAC GCAAGCGATGCAACCGACTCCAAGAACCAGGTCTTTGTCAATCACGGATGGGACTCCATG CGTTGTTTGAAGAAG TTCTCGCCTGATGTCACCTACAGGACTTATGTTAGAATGCAGCCTTGGAAAGACTCCATC TGGGCTGGTGATGTC TACGTTTTCGATGGGGATGATATCGTTGCGGTGTATGGTGCAGTCAAGGTGAGTTCGGCC CGCGCTCAGTTGCAT AAGATTCAAGGTGCTAATCATTGGTGTCACAGTTCCAAGCCTTATCACGCAAGATTCTCG ATACGGTCCTACCTC CAGTTGGGGCTTCGAAGGGCCCCGCCAGACCAGCCGCTAGCGCTCAGAAGGCGGCCCCTG CTGCTGCTGCCAGCA AGAGTCGTGCTAGCGCCCCGGCCCCGGCGAAGCCTGCTGCTAAGCCCAGCGCCCCAAGCT TGGTCAAACGGGCAC TTACCATCCTCGCAGAGGAAGTGGGTCTGTCTGAATCCGAGATTACGGATGATCTGGTCT TCGCAGACTACGGTG TGGACTCCCTTCTTTCGTTGACGGTCACGGGCAGGTATCGTGAAGAGCTGGATATCGATC TCGAATCCTCCATCT TCATCGACCAGCCGACCGTGAAAGACTTCAAGCAGTTCTTGGCCCCAATGAGCCAGGGAG AAGCCAGCGATGGGT CCACCAGTGACCCAGAGTCTAGTAGCTCCTTCAATGGTGGCTCTTCAACAGACGAGTCCA GTGCTGGGTCCCCTG TCAGCTCACCACCAAATGAGAAGGTTACGCAGGTCGAGCAGCATGCTACGATAAAGGAGA TTCGCGCCATTTTGG CCGATGAGATTGGTGTTACGGAGGAGGAGCTGAAGGACGATGAGAACTTGGGAGAGATGG GGATGGACTCTCTGC TTTCGCTTACGGTGCTTGGTAGGATCCGTGAGACATTGGATCTGGATCTACCGGGCGAGT TCTTCATCGAGAATC AAACTCTGAATGACGTGGAGGATGCATTGGGCCTCAAACCCAAGGCAGCTCCTGCGCCTG 
CGCCTGCGCCTGCTC CCGTACCCGCACCCGTGTCCGCGCCCATATTGAAGGAGCCTGTCCCCAACGCAAACTCTA CCATCATGGCCCGGG CGAGCCCGCACCCTCGATCAACCTCCATTCTGTTGCAAGGAAACCCGAAAACCGCGACCA AGACCCTGTTCCTGT TCCCTGATGGGTCTGGCTCCGCAACATCGTATGCAACCATTCCCGGAGTGTCCCCGGACG TGTGTGTCTACGGAT TGAACTGCCCGTACATGAAGACTCCAGAGAAGCTCAAGTATCCCCTTGCTGAGATGACAT TCCCCTATCTGGCCG AGATCCGCCGCAGACAGCCCAAGGGCCCGTACAACTTCGGTGGATGGTCTGCAGGTGGTA TTTGCGCCTATGATG CCGCTCGCTACCTAATCCTTGAAGAGGGCGAACAGGTTGACCGATTGCTTCTTCTTGACT CGCCCTTCCCCATTG GCTTAGAGAAGTTGCCCACTCGGCTGTACGGCTTCATCAACTCAATGGGTCTCTTTGGTG AAGGCAACAAGGCTC CCCCGGCCTGGTTGCTCCCTCATTTCCTGGCCTTCATTGATTCCCTCGATACCTACAAGG CCGTCCCCCTCCCCT TTGACGATCCGAAGTGGGCCAAGAAGATGCCAAAGACATTCATGGTCTGGGCCAAGGACG GTATCTGCAGCAAGC CGGATGACCCGTGGCCCGAGCCGGACCCGGACGGCAAGCCGGACACGAGAGAGATGGTCT GGCTCCTCAAGAACC GGACCGACATGGGACCCAACAAGTGGGACACACTCGTCGGGCCCCAAAACGTCGGTGGAA TCACTGTGATAGAGG GTGCGAATCATTTCACCATGACTTTGGGACCCAAGGCTAAAGAATTGGGCTCGTTCATTG GCAACGCCATGGCCA ATTAA

[00322] aygA sequence in Aspergillus niger

ATGGCTCCTTGGATCCTCGGCGAGAAGTTCAACACCGTTTACCCCCACAAGGGCTCTATC AAAGCTCTCTGGGAA ACGAAGTGGAAGTTTGCAGTAAGTTTTCACTGGTGGTCGGCATCACCACCCCTTGCTCAG TGGTTGGCCAACCGC TCAGCCAGGTCTTATCTAACGTAGCATGCAGTGTGAAAAATCAGTCTATCCGTTTCACGA CGGTGCCATCGAAGA CTTTCGACCTATCTTCCAAAAGCTTATCGATGTAGGTTATCATAATCTTGCCATGTGCGC CTTACGAGCGCAAGG GTAAAATACTCACTTCTAGATAGGAAAATATCAACGATGCCTACACCGATGCCTACACGC AGGCTTTCTTCCCGG TTGCTGAGGCACTCGAAAATAAGGCGTCAGCTGCTTTGAACAACAACAATGTGGAGATGG CATCTGACTTGCTCC GAAGAGCTGCTGTGGTCTACCGTATCTCCCGCTTCCCATATGTCGACCCGACCAGAGAAG ACATCAAAAAAGAGG CCTTCAACCGCCAGAAGAAGGTCTATCTGAAGGCAGCATCCTTCTGGAAGCCCACCATCC AGGAGGTCATCATCC CGCACAAGCATAAGTCGGCCACCGACGGAGCCTATGTTCCCCTTTTTGTTCGTGTCCCGG AACATGCGACGGCGG AGAACCCCGTCCCAGTTGTAGTTCTAATGACTGGTCTGGATGGATACCGTCCCGACAACA GTCAGCGTACGCATG AGATCATCAACCGGGGTTGGGCAACAGTTATCTGTGAGATCCCCGGTACCGCGGACTCGC CTGCCGATCCCAGCG ACCCCGAATCGGCGGACCGTCAGTGGACTACCGTTCTCGACTACATGGCGACTCGTCCGG AGTTCGACATGTCCC GGGTTGCTGCCTGGGGTCTAAGTGCTGGTGGGTTCTACGCCATCCGGGCTGCACATACCC ACCGTGATCGCTTCG TGGGAACTCTCGCCCACGGACCGGGCTGCCATTACTTCCTGGACCCCGAGTGGTTGAGCC GTGTTGATGACCACG AGTACCCATTCCTGTAAGTACTGTGACCACGTGTGTCATTGCTAACGACACTAATGGGAC ATTTTGTAGGCTCAC CCCTGCCTGGGCCAAGAAGTACGGCTACTCCAACCCGGAGGATTTCAAGAAGCATGGCCA GAAGAAGTTCTCTCT TCTAGAGACGGGCATCTTGGATCAGCCTAGCTGCAGACTGTTGCTGCTCAACGTAAGACC GATACAGCCGCGATC TTCTGTATGATACTGACCCATGCTATGCAGGGTGTGGATGATGGCGTCACCCCCATCGAA GACTGCTTGATGCTC TTCAATCACGGAAGCCCCAAAGAGGGAAGGTCAGTCAGCTCCCTCCAATTCCGCATTTGC CTAGTCCTTAACCAT GATCGCCCCTTAGATTCTTCCATGGACTACCCCACATGGGATACCCTCACAGTCTGGTCC CGTCGTACAAGTGGT TCGAGGATGTTCTGCGCTCGCCCCAAGAGCCTCTCAAGAACTGA

[00323] albA sequence in Aspergillus niger

[00324] ATGGAGGGTCCATCTCGTGTGTACCTTTTTGGAGACCAGACCAGCGACATCGAAGCTGGC CTGCGC CGTCTGCTCCAAGCGAAGAATAGTACCATTGTCCAGTCCTTTTTCCAGCAATGCTTCCAT GCAATTCGTCAAGAG ATCGCGAAGCTCCCGCCGTCTCATCGGAAGCTCTTCCCACGCTTCACGAGCATCGTTGAT CTCCTTTCCAGGAGT CGTGAATCAGGTCCTAGCCCTGTCCTGGAGAGTGCATTGACATGCATCTACCAATTGGGT TGTTTCATTCAGTAA GTCAATGAGTTACCATCTATACTTGACAAGTCTGACCAGCCTTCAGCTTTTACGGGGATC TTGGACATGACTACC CTACACCCTCCAACAGCCATCTTGTTGGCCTGTGCACTGGTGTTCTGAGCTGCACGGCTG TAAGTTGCGCCAGAA ATGTTGGAGAGCTTATTCCAGCTGCAGTGGAATCGGTTGTAATTGCACTGCGACTGGGAA TCTGCGTTTTTCGAG TTCGAGAACTGGTGGACTCCGCCGATTCCGAGTCAACATGCTGGTCAGCGTTGGTTTCTG GAATCAGTGAAGCAG AGGCTAGCCACCTGATCGACGAGTACAGTAGTAAGAAGGTGTGCTCTTCCAACTTTAAAC CCCCGCATTGTGGGA TGCTGACAGATGCAGGCTACTCCGCCTTCTTCGAAACCGTATATCAGCGCGGTAAGCTCT AATGGCGTTACTGTC AGCGCACCACCTACGGTACTTGATGAATTCGTCGAGACCTGCATTTCCAAGAATTACAAG CCAGTGAAGGCCCCT ATTCATGGCCCGTACCATGCGCCACATCTGTATGATGATAAGGATATCGACCGCATCCTG CAGCAGTCCTCTGCT CTAGAAGGACTGACCGGCTGTTCACCCGTTATTCCCATCATCTCCAGTAACACTGGAAAG CCGATCAAGGCCAAG TCCATCAAAGATCTCTTCAAGGTCGCACTGGAGGAGATACTCCTACGACGACTATGCTGG GACAAGGTCACGGAG TCCTGCACATCAGTCTGCAAGACCGGCACAAACCACTCTTGCAAATTGTTTCCGATCTCG AGTAGCGCCACTCAA AGTTTGTTCACAGTCCTCAAGAAGGCCGGTGTGAGCATCAGCTTGGAGACTGGGGTAGGA GAGATCGCGACGAAC CCAGAAATGCGGAACCTTACTGGCAAGGCAGAAAATTCAAAGATTGCTATCATTGGTATG TCTGGAAGATTTCCT GACTCGGATGGTACGGAGAGCTTCTGGAACCTCCTGTACAAAGGACTCGACGTACATCGC AAAGTCCCCGCAGAC CGTTGGGACGTTGATGCCCACGTCGACATGACCGGGTCAAAGAGAAACACAAGCAAAGTG GCTTACGGTTGCTGG ATCAACGAACCCGGCCTGTTTGACCCCCGATTCTTCAACATGTCGCCTCGGGAAGCACTC CAAGCAGATCCTGCA CAACGTCTTGCGTTGCTTACAGCGTACGAGGCTCTCGAGATGGCTGGCTTCATCCCGGAT AGCTCTCCATCGACG CAGAGGGACCGTGTGGGTATTTTCTACGGAATGACCAGTGACGACTACCGTGAGATCAAC AGCGGCCAGGACATT GATACCTATTTCATCCCTGGCGGTAACCGAGCATTTACGCCGGGTCGGATAAACTACTAC TTCAAATTTAGCGGC CCCAGTGTGAGCGTTGACACAGCGTGCTCGTCTAGTCTTGCTGCTATCCACATGGCTTGC AATTCGATCTGGAGA AATGACTGCGATGCCGCCATCACTGGAGGTGTGAACATTCTGACCAGCCCTGACAACCAC GCCGGTCTGGATCGG GGCCATTTCCTGTCCACCACTGGCAACTGTAACACCTTTGATGACGGCGCCGACGGCTAC 
TGTAGAGCGGACGGA GTTGGAAGCATCGTTTTGAAGCGGCTTGAAGATGCCGAGGCCGACAACGACCCGATCCTG GCCGTCATCAACGGT GCTTACACCAACCACTCGGCGGAGGCCGTGTCAATCACTCGTCCCCATGTTGGCGCGCAA GCATTCATCTTCAAC AAGCTGCTCAATGATGCGAATATCGACCCTAAGGACGTGAGCTACGTGGAAATGCATGGC ACTGGAACTCAAGCA GGTGATGCAGTCGAAATGCAGTCCGTTCTTGACGTCTTCGCACCAGACTACCGCCGGGGT CCCGGTCAATCGCTT CATATCGGTTCTGCCAAGGCAAACATTGGACACGGTGAATCCGCATCAGGAGTGACTGCT CTTGTCAAGGTCCTC CTAATGATGAGAGAGAACATGATTCCTCCTCATTGTGGTATCAAGACCAAGATCAATTCC AATTTCCCGACAGAC TTGGCGAAGCGCAATGTTCATATCGCCTTCCAACCCACTCCCTGGAATCGGCCAGCTTCA GGAAAGCGGCGAACT TTCGTCAACAACTTTTCTGCTGCTGGTGGTAACACTGCTCTTCTACTGGAAGATGCTCCC ATACCGGAACGCCAA GGGCAGGACCCCAGGTCGTTCCATTTGGTCTCCGTGTCAGCAAGATCCCAGTCTGCATTG AAGAACAACGTCGAA GCTCTGGTGAAGTACATTGACTCTCAGGGCAAGTCCTTTGGTGTGAAAGAGACTGAATTC CTTCCAAACCTGGCG TACACGACCACCGCACGCCGTATCCACCATCCCTTCCGTGTCATTGCGGTTGGAGCGAAC CTACAATCACTGCGT GACTCGCTGCATGGTGCTTTGCACCGTGAGACATATACCCCAGTTCCCTCAACGGCTCCT GGTATTGGTTTCGTC TTCACCGGCCAAGGAGCCCAATACTCCGGAATGGGCAAGGAACTCTACCGCAGTTGTTTC CAATTCCGAACCACC ATTGAGCATTTTGACTGCATCGCAAGAAGCCAGGGCCTTCCTTCTATCCTTCCTCTTGTC GATGGAAGCGTGGCT GTCGAAGAACTTAGCCCTGTCGTGGTACAAGTGGGAACTACCTGTGTACAAATGGCTCTA GTAAATTACTGGACT GCTCTGGGTGTGAAGCCGGCCTTTATCATCGGACACAGTCTTGGAGACTATGCAGCCCTT AACACGGCCGGTGTT CTATCCACCAGCGATACAATCTATCTTTGTGGCCGGCGTGCTCAGTTGCTGACGAAGGAA TGCAAGATTGGGACA CATTCGATGCTGGCCATCAAGGCGTCCCTGGCAGAGGTCAAACATTTCCTCAGAGACGAG CTCCACGAAGTCTCT TGTGTTAACGCACCTGCGGAGACCGTCGTCAGCGGCCTTGTCGCTGATATCGACGAGTTG GCTCAGAAATGCTCC ACAGAGGGTTTGAAGTCAACCAAGCTCAAGGTTCCTTACGCGTTCCATTCCTCTCAGGTT GATCCTATCTTGGAG GCCTTCGAAGATATTGCCCAAGGTGTCACCTTCCACAAGCCGACAACACCTTTCGTCTCA GCCCTGTTCGGGGAA GTGATCACCGATGCTAACTGGGAGTGTCTCGGCCCCAAGTACCTGCGCGATCATTGCAGA AAGACGGTCAACTTC CTTGGCGGCGTGGAGGCTACGAGGCATGCGAAGCTGACCAATGACAAGACTCTGTGGGTT GAGATCGGCTCACAT ACCATTTGCTCTGGAATGATCAAAGCAACTCTTGGACCGCAAGTTACAACGGTTGCATCT CTACGCCGCGAAGAA GATACCTGGAAGGTCCTTTCGAACAGTCTTGCGAGCCTTCATCTGGCGGGTATTGATATC AACTGGAAGCAATAT 
CACCAGGACTTTAGCTCCTCTCTCCAGGTCCTCCGCCTCCCAGCCTACAAGTGGGATCTC AAGAACTACTGGATT CCCTATACCAACAACTTCTGCCTGAGCAAGGGCGCTCCAGTTGCGACAGTAGCGGCAGGG CCACAGCATGAGTAC CTGACAACCGCGGCTCAGAAGGTCATTGAGACTCGAAGTGATGGAGCAACAGCTACAGTC GTGATAGAGAACGAC ATTGCTGATCCCGAGCTCAACCGCGTCATTCAAGGCCATAAGGTCAACGGTACTGCTTTG TGTCCCTCAGTAAGT TACCGCTCTTGCCCAACGACTGCGTTAAGATTCGTACTAATCAGGATATAGTCACTATAT GCCGACATCTCTCAA ACGCTTGCAGAGTATCTCATCAAAAAGTACAAGCCTGAGTACGACGGACTTGGACTGGAT GTGTGTGAGGTCACA GTGCCACGACCACTGATTGCGAAAGGCGGACAGCAGCTCTTTAGAGTATCTGCGACAGCG GATTGGGCGGAGAAG AAGACAACCCTTCAGATATATTCAGTCACTGCGGAGGGGAAGAAGACGGCTGACCACGCA ACTTGCACTGTCCGA TTCTTTGACTGCGCTGCTGCGGAGGCGGAATGGAAACGAGTTTCCTACCTTGTCAAGAGG AGCATTGACCGACTG CATGATATCGCCGAAAATGGTGACGCTCACCGTCTTGGTAGAGGCATGGTTTACAAACTC TTCGCTGCCTTGGTT GATTATGACGACAACTTCAAGTCCATTCGCGAGGTTATTCTTGACAGTGAACAGCACGAA GCGACTGCACGCGTC AAGTTCCAAGCACCACAAGGCAATTTCCACCGAAACCCGTTCTGGATTGACAGTTTTGGA CACCTGTCTGGGTTC ATCATGAACGCAAGCGATGCAACCGACTCCAAGAACCAGGTCTTTGTCAATCACGGATGG GACTCCATGCGTTGT TTGAAGAAGTTCTCGCCTGATGTCACCTACAGGACTTATGTTAGAATGCAGCCTTGGAAA GACTCCATCTGGGCT GGTGATGTCTACGTTTTCGATGGGGATGATATCGTTGCGGTGTATGGTGCAGTCAAGGTG AGTTCGGCCCGCGCT CAGTTGCATAAGATTCAAGGTGCTAATCATTGGTGTCACAGTTCCAAGCCTTATCACGCA AGATTCTCGATACGG TCCTACCTCCAGTTGGGGCTTCGAAGGGCCCCGCCAGACCAGCCGCTAGCGCTCAGAAGG CGGCCCCTGCTGCTG CTGCCAGCAAGAGTCGTGCTAGCGCCCCGGCCCCGGCGAAGCCTGCTGCTAAGCCCAGCG CCCCAAGCTTGGTCA AACGGGCACTTACCATCCTCGCAGAGGAAGTGGGTCTGTCTGAATCCGAGATTACGGATG ATCTGGTCTTCGCAG ACTACGGTGTGGACTCCCTTCTTTCGTTGACGGTCACGGGCAGGTATCGTGAAGAGCTGG ATATCGATCTCGAAT CCTCCATCTTCATCGACCAGCCGACCGTGAAAGACTTCAAGCAGTTCTTGGCCCCAATGA GCCAGGGAGAAGCCA GCGATGGGTCCACCAGTGACCCAGAGTCTAGTAGCTCCTTCAATGGTGGCTCTTCAACAG ACGAGTCCAGTGCTG GGTCCCCTGTCAGCTCACCACCAAATGAGAAGGTTACGCAGGTCGAGCAGCATGCTACGA TAAAGGAGATTCGCG CCATTTTGGCCGATGAGATTGGTGTTACGGAGGAGGAGCTGAAGGACGATGAGAACTTGG GAGAGATGGGGATGG ACTCTCTGCTTTCGCTTACGGTGCTTGGTAGGATCCGTGAGACATTGGATCTGGATCTAC CGGGCGAGTTCTTCA TCGAGAATCAAACTCTGAATGACGTGGAGGATGCATTGGGCCTCAAACCCAAGGCAGCTC 
CTGCGCCTGCGCCTG CGCCTGCTCCCGTACCCGCACCCGTGTCCGCGCCCATATTGAAGGAGCCTGTCCCCAACG CAAACTCTACCATCA TGGCCCGGGCGAGCCCGCACCCTCGATCAACCTCCATTCTGTTGCAAGGAAACCCGAAAA CCGCGACCAAGACCC TGTTCCTGTTCCCTGATGGGTCTGGCTCCGCAACATCGTATGCAACCATTCCCGGAGTGT CCCCGGACGTGTGTG TCTACGGATTGAACTGCCCGTACATGAAGACTCCAGAGAAGCTCAAGTATCCCCTTGCTG AGATGACATTCCCCT ATCTGGCCGAGATCCGCCGCAGACAGCCCAAGGGCCCGTACAACTTCGGTGGATGGTCTG CAGGTGGTATTTGCG CCTATGATGCCGCTCGCTACCTAATCCTTGAAGAGGGCGAACAGGTTGACCGATTGCTTC TTCTTGACTCGCCCT TCCCCATTGGCTTAGAGAAGTTGCCCACTCGGCTGTACGGCTTCATCAACTCAATGGGTC TCTTTGGTGAAGGCA ACAAGGCTCCCCCGGCCTGGTTGCTCCCTCATTTCCTGGCCTTCATTGATTCCCTCGATA CCTACAAGGCCGTCC CCCTCCCCTTTGACGATCCGAAGTGGGCCAAGAAGATGCCAAAGACATTCATGGTCTGGG CCAAGGACGGTATCT GCAGCAAGCCGGATGACCCGTGGCCCGAGCCGGACCCGGACGGCAAGCCGGACACGAGAG AGATGGTCTGGCTCC TCAAGAACCGGACCGACATGGGACCCAACAAGTGGGACACACTCGTCGGGCCCCAAAACG TCGGTGGAATCACTG TGATAGAGGGTGCGAATCATTTCACCATGACTTTGGGACCCAAGGCTAAAGAATTGGGCT CGTTCATTGGCAACG CCATGGCCAATTAA


[00325] EXAMPLE 10. Utilization of recombinant host organism variants.

[00326] The cas9 gene from Streptococcus pyogenes was codon optimized for expression in Aspergillus niger and synthesized as described herein. An SV40 nuclear localization signal is introduced at both the 5' and 3' ends of the cas9 coding sequence. The cas9 cassette is cloned into pMA171, a replicating plasmid for filamentous fungi containing the AMA autonomously replicating sequence and a hygromycin antibiotic resistance cassette. Cas9 expression is placed under control of the strong inducible glucoamylase promoter from Aspergillus niger. In one example, the gene encoding a guide RNA (gRNA) targeting the coding sequence of a gene involved in citrate biosynthesis is cloned into the same vector under the control of the strong constitutive gpdA promoter from Aspergillus nidulans. The resulting Cas9/gRNA plasmid is transformed into Aspergillus niger using standard protoplast transformation methods. Transformants are selected on complete media with 100-200 µg/mL of hygromycin. Transformants are picked and purified by passage on complete media plus hygromycin plates. In one example, purified isolates are allowed to sporulate and spores are harvested in water to generate spore stocks. The resulting spore stocks are inoculated into complete media with maltose to induce the expression of cas9. Cultures are grown for approximately twenty-four hours, and samples are then plated onto complete media without hygromycin to cure the Cas9/gRNA plasmid. Colonies are screened by PCR and sequencing to identify isolates containing DNA sequence modifications (e.g., insertions, deletions, or a combination thereof) that cause gene disruption at the target citrate biosynthesis gene locus. Optionally, purified transformants are patched directly to complete media plates without hygromycin to cure the Cas9/gRNA plasmid. Colonies are screened by PCR and sequencing to identify isolates with DNA sequence modifications that cause gene disruption at the target citrate biosynthesis gene locus. The resulting strain is then grown and tested for citric acid secretion in the media. Since the citrate synthesis pathway is disrupted, the pH of the media is retained at pH 8, which reduces the production of major acidic proteases and glucoamylase. The resulting strain is then transformed with an integrating plasmid containing the strong constitutive gpdA promoter from Aspergillus nidulans, which drives the expression of glucoamylase protein fused at the N-terminus of the protein of interest with a Kexin protease site in between. The signal peptide of glucoamylase directs the fusion protein to the endoplasmic reticulum, where the signal peptide is cleaved by signal peptidase. The fusion protein is then transported to the Golgi apparatus, where the mature protein of interest is cleaved from the carrier protein by Kexin protease, and the protein of interest is secreted intact into the media. In one example, a smaller part of glucoamylase that includes the signal peptide of glucoamylase is used as a carrier protein.
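The sequencing screen described above amounts to comparing each isolate's amplicon with the wild-type sequence. The following is a minimal sketch under stated assumptions: reads are already trimmed to exactly the same amplicon boundaries as the wild type, only the net length change is examined, and the wild-type excerpt (borrowed from the albA sequence in this document) and the mutant reads are hypothetical illustrations, not data from the application.

```python
# Hypothetical wild-type amplicon; excerpt of the albA sequence in this document.
WILD_TYPE = "AGGAGTCGTGAATCAGGTCCTAGCCCTGTCCTGGAGAGTGCATTGACATGC"


def classify_edit(wild_type, read):
    """Classify a trimmed read by its net length change versus the wild type."""
    if read == wild_type:
        return "wild type"
    delta = len(read) - len(wild_type)
    if delta == 0:
        return "substitution"
    kind = "insertion" if delta > 0 else "deletion"
    frame = "in-frame" if delta % 3 == 0 else "frameshift"
    return f"{abs(delta)} bp {kind} ({frame})"


# Hypothetical reads built by editing the wild type near position 28.
reads = {
    "isolate_1": WILD_TYPE,
    "isolate_2": WILD_TYPE[:28] + WILD_TYPE[29:],        # one base removed
    "isolate_3": WILD_TYPE[:28] + "A" + WILD_TYPE[28:],  # one base added
    "isolate_4": WILD_TYPE[:26] + WILD_TYPE[29:],        # three bases removed
}

calls = {name: classify_edit(WILD_TYPE, read) for name, read in reads.items()}
for name, call in calls.items():
    print(name, call)
```

A net change that is not a multiple of three shifts the reading frame and is the most likely to disrupt the gene. In-frame changes would need closer inspection, and a real screen would align reads rather than rely on length alone.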

[00327] The cas9 gene from Streptococcus pyogenes is codon optimized for expression in Trichoderma reesei and synthesized. An SV40 nuclear localization signal is introduced at both the 5' and 3' ends of the cas9 coding sequence. The cas9 cassette is cloned into pMA171, a replicating plasmid for filamentous fungi containing the AMA autonomously replicating sequence and a hygromycin antibiotic resistance cassette. Cas9 expression is placed under control of the strong cellulose-inducible cbh1 promoter from Trichoderma reesei. In one example, the gene encoding a guide RNA (gRNA) targeting the coding sequence of an aspartyl acidic protease gene is cloned into the same vector under the control of the strong constitutive pdc promoter from Trichoderma reesei (Junxin Li, et al., Microbial Cell Factories, 2012, 11:84). The resulting Cas9/gRNA plasmid is transformed into Trichoderma reesei using standard protoplast transformation methods. Transformants are selected on complete media with 100-200 µg/mL of hygromycin. Transformants are picked and purified by passage on complete media plus hygromycin plates. Optionally, purified isolates are allowed to sporulate and spores are harvested in water to generate spore stocks. The resulting spore stocks are inoculated into complete media with cellulose to induce the expression of cas9. Cultures are grown for approximately twenty-four hours, and samples are then plated onto complete media without hygromycin to cure the Cas9/gRNA plasmid. Colonies are screened by PCR and sequencing to identify isolates containing DNA sequence variants (e.g., insertions, deletions, or a combination thereof) that cause gene disruption at the target protease locus. In a modification to this process, purified transformants are patched directly to complete media plates without hygromycin to cure the Cas9/gRNA plasmid. Colonies are screened by PCR and sequencing to identify isolates with DNA sequence variations that cause gene disruption at the target protease locus. The resulting strain is then transformed with an integrating plasmid containing the strong constitutive pdc promoter from Trichoderma reesei, which drives the expression of cellulase (Cbh1) protein fused at the N-terminus of the protein of interest with a Kexin protease site in between. The signal peptide of Cbh1 directs the fusion protein to the endoplasmic reticulum, where the signal peptide is cleaved by signal peptidase. The fusion protein is then transported to the Golgi apparatus, where the mature protein of interest is cleaved from the carrier protein by Kexin protease and secreted intact into the media. In one example, a smaller part of cellulase that includes the signal peptide of glucoamylase is used as a carrier protein.
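The two processing steps described above for the carrier fusion (removal of the signal peptide, then cleavage at the Kexin site) can be illustrated with a toy model. This sketch is purely schematic: every amino acid string is a made-up placeholder rather than a real Cbh1 or cargo sequence, the Kexin site is assumed to be the dibasic motif Lys-Arg (KR) with cleavage on its C-terminal side, and the function name is hypothetical.

```python
# Placeholder segments (NOT real protein sequences).
SIGNAL_PEPTIDE = "MSSSLLAAV"    # hypothetical signal peptide
CARRIER = "QACTLQSETHPPLTW"     # hypothetical carrier segment, no KR motif
KEXIN_SITE = "KR"               # assumed dibasic Kexin recognition site
CARGO = "GSHMEELQDAIT"          # hypothetical protein of interest


def process_fusion(signal, carrier, site, cargo):
    """Model signal-peptide removal followed by cleavage after the Kexin site."""
    fusion = signal + carrier + site + cargo
    # Step 1 (endoplasmic reticulum): signal peptidase removes the signal peptide.
    after_signal_peptidase = fusion[len(signal):]
    # Step 2 (Golgi): cleavage on the C-terminal side of the first Kexin site.
    cut = after_signal_peptidase.index(site) + len(site)
    released_carrier = after_signal_peptidase[:cut]
    released_cargo = after_signal_peptidase[cut:]
    return released_carrier, released_cargo


carrier_part, cargo_part = process_fusion(
    SIGNAL_PEPTIDE, CARRIER, KEXIN_SITE, CARGO
)
print("carrier fragment:", carrier_part)
print("secreted cargo:  ", cargo_part)
```

In this toy model the cargo is released with its native N-terminus only when no additional KR motif occurs upstream of the intended site, which is one reason a carrier or cargo sequence would be checked for internal dibasic motifs before use.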

[00328] The cas9 gene from Streptococcus pyogenes is codon optimized for expression in Saccharomyces cerevisiae and synthesized. An SV40 nuclear localization signal is introduced at both the 5' and 3' ends of the cas9 coding sequence. The cas9 cassette is cloned into a yeast replicating plasmid containing the ARS autonomously replicating sequence and a URA3 uracil (pyrimidine) selection marker. Cas9 expression is placed under control of the strong galactose-inducible gal4 promoter from Saccharomyces cerevisiae. In one example, the gene encoding a guide RNA (gRNA) targeting the coding sequence of a vacuolar targeting gene (Fitzgerald and Glick, Microbial Cell Factories 2014, 13:125) is cloned into the same vector under the control of the strong constitutive gpd promoter from Saccharomyces cerevisiae. The resulting Cas9/gRNA plasmid is transformed into Saccharomyces cerevisiae using standard transformation methods. Transformants are selected on minimal media without uracil/uridine. Transformants are picked and purified by passage on minimal media plates. The resulting transformants are inoculated into minimal media with galactose to induce the expression of cas9. Cultures are grown for approximately twenty-four hours, and samples are then plated onto minimal media with 5-FOA to cure the Cas9/gRNA plasmid. Colonies are screened by PCR and sequencing to identify isolates containing DNA sequence variations (e.g., insertions, deletions, or a combination thereof) that cause gene disruption at the target vacuolar targeting gene locus. The resulting strain is then transformed with an integrating plasmid containing the strong constitutive gpd promoter from Saccharomyces cerevisiae, which drives the expression of the alpha mating factor signal peptide fused at the N-terminus of the protein of interest. The signal peptide of alpha mating factor provides protein transport to the endoplasmic reticulum, where the signal peptide is cleaved from the protein of interest by signal peptidase, and the nutritive polypeptide is secreted intact into the media, bypassing vacuolar trafficking and thereby reducing one or more inefficient steps in secretion.

[00329] While the invention has been particularly shown and described with reference to a preferred embodiment and various alternate embodiments, it will be understood by persons skilled in the relevant art that various changes in form and details can be made therein without departing from the spirit and scope of the invention.

[00330] All references, issued patents and patent applications cited within the body of the instant specification are hereby incorporated by reference in their entirety, for all purposes.
