Title:
A MICROBIAL ORGANISM FOR PRODUCING TEREPHTHALATE FROM BIOMASS
Document Type and Number:
WIPO Patent Application WO/2014/102280
Kind Code:
A1
Abstract:
There is disclosed a non-naturally occurring microbial organism comprising a first exogenous nucleic acid encoding a first enzyme to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from a dihydroxybenzoate, the first enzyme being either a wild type or a mutant. The non-naturally occurring microbial organism may further comprise a second exogenous nucleic acid encoding a second enzyme to produce terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate. The non-naturally occurring microbial organism may also comprise a third exogenous nucleic acid encoding at least a third enzyme to produce a dihydroxybenzoate from a carbon source, the third enzyme being 3-dehydroshikimate dehydratase.

Inventors:
CARBONELL PABLO (FR)
FAULON JEAN-LOUP (FR)
CAPASSO ELISABETTA (IT)
VOLPATI LAURA (IT)
Application Number:
PCT/EP2013/077989
Publication Date:
July 03, 2014
Filing Date:
December 24, 2013
Assignee:
BIOCHEMTEX SPA (IT)
International Classes:
C12P7/44; C12N9/02; C12P7/40
Domestic Patent References:
WO2011094131A1 2011-08-04
WO2011017560A1 2011-02-10
Foreign References:
US5616496A 1997-04-01
Other References:
SASOH M ET AL: "Characterization of the terephthalate degradation genes of Comamonas sp. strain E6", APPLIED AND ENVIRONMENTAL MICROBIOLOGY, AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 72, no. 3, 1 March 2006 (2006-03-01), pages 1825 - 1832, XP002579013, ISSN: 0099-2240, DOI: 10.1128/AEM.72.3.1825-1832.2006
JASLEEN BAINS ET AL: "Investigating Terephthalate Biodegradation: Structural Characterization of a Putative Decarboxylating cis-Dihydrodiol Dehydrogenase", JOURNAL OF MOLECULAR BIOLOGY, vol. 423, no. 3, 11 August 2012 (2012-08-11), pages 284 - 293, XP055077046, ISSN: 0022-2836, DOI: 10.1016/j.jmb.2012.07.022
ABBAS ABOU HAMDAN ET AL: "Understanding and Tuning the Catalytic Bias of Hydrogenase", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 134, no. 20, 23 May 2012 (2012-05-23), pages 8368 - 8371, XP055077043, ISSN: 0002-7863, DOI: 10.1021/ja301802r
Attorney, Agent or Firm:
ZAMBARDINO, Umberto et al. (Via Cappellini 11, Milano, IT)
Claims:
CLAIMS

1. A non-naturally occurring microbial organism, the non-naturally occurring microbial organism comprising at least a first exogenous nucleic acid encoding at least a first enzyme expressed in a sufficient amount to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from a dihydroxybenzoate, the at least a first enzyme being either a wild-type or a mutant and selected from the group consisting of (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase (EC 1.3.1.53), 5-epi-aristolochene 1,3-dihydroxylase (EC 1.14.13.119), 2,4-dichlorophenol 6-monooxygenase (EC 1.14.13.20), Phenol 2-monooxygenase (EC 1.14.13.7), Benzoate 4-monooxygenase (EC 1.14.13.12), and Benzoate 1,2-dioxygenase (EC 1.14.12.10).

2. The non-naturally occurring microbial organism of claim 1, further comprising at least a second exogenous nucleic acid encoding at least a second enzyme expressed in a sufficient amount to produce terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD), the at least a second enzyme being either a wild-type or a mutant and selected from the group consisting of TPA 1,2-dioxygenase (EC 1.14.12.15), 3-Dehydroquinate dehydratase (EC 4.2.1.10), Salicylaldehyde dehydrogenase (EC 1.2.1.65), and 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase (EC 1.3.1.25).

3. The non-naturally occurring microbial organism of claim 1, further comprising at least a third exogenous nucleic acid encoding at least a third enzyme expressed in a sufficient amount to produce a dihydroxybenzoate from a carbon source, the at least a third enzyme being 3-dehydroshikimate dehydratase.

4. The non-naturally occurring microbial organism of claim 2, further comprising at least a third exogenous nucleic acid encoding at least a third enzyme expressed in a sufficient amount to produce a dihydroxybenzoate from a sugar source, the at least a third enzyme being 3-dehydroshikimate dehydratase.

5. The non-naturally occurring microbial organism of claim 1, wherein the (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase enzyme is SEQ ID NO: 1 and comprises at least one mutation at a position selected from the group consisting of A238, H203, L151, Q257, Y250, R152, G150, H247, A196, Q194, D248, P261, N201, M240, D233, F198, A252, P193, A243, A231, K246, P235, G237, R245, Q244, V232, L242, V197, M236, and L249, wherein the amino acid position and wild-type amino acid are indicated, and wherein the at least one mutation increases a reverse enzymatic reaction and the production of 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate.

6. The non-naturally occurring microbial organism of claim 1, wherein the (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase enzyme is SEQ ID NO: 1 and comprises at least one mutation pair selected from the group consisting of V232A/M240G; L151R/V232A; V232A/M236A; V232A/M240L; L151R/M240G; V232A/M240A; M236A/M240G; V232A/P235A; V232A/Y250A; and Q194 V232A, wherein the at least one mutation pair increases a reverse enzymatic reaction and the production of 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate.

7. The non-naturally occurring microbial organism of claim 1, wherein the Benzoate 1,2-dioxygenase enzyme is SEQ ID NO: 6 and comprises at least one mutation pair selected from the group consisting of W25G/W28A; D21G/K23A; W25A/W28L; W25A/W28A; S17A/H91I; S17A/W28L; W25R/W28L; W28L/H91I; W25G/W28L; and S17A/Y125A.

8. The non-naturally occurring microbial organism of claim 1, wherein the 5-epi-aristolochene 1,3-dihydroxylase enzyme is SEQ ID NO: 2 and comprises at least one mutation pair selected from the group consisting of F97A/N138L; F97L/N138L; F97A/M300A; F97A/N138A; F97A/N138D; F97A/M445A; F97A/K130N; H124A/W125A; F97G/N138L; and I116E/N138L.

9. The non-naturally occurring microbial organism of claim 2, wherein the TPA 1,2-dioxygenase enzyme is SEQ ID NO: 7 and comprises at least one mutation pair selected from the group consisting of I89A/P141E; F59K/I89A; I89A/R142K; I89A/P141L; E62I/I89A; T63R/I89A; T63K/I89A; I89A/R142Y; F78K/I89A; and I89A/K143G, wherein the at least one mutation pair increases a reverse enzymatic reaction and the production of terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD).

10. The non-naturally occurring microbial organism of claim 2, wherein the 3-Dehydroquinate dehydratase enzyme is SEQ ID NO: 8 and comprises at least one mutation pair selected from the group consisting of S141A/H143Q; T183A/A187G; P169V/A187G; E86G/A187G; T183L/A187G; P169V/T183A; E86G/T183A; E86G/P169V; P169N/K170Q; and P169V/T183L.

11. The non-naturally occurring microbial organism of claim 2, wherein the 3-Salicylaldehyde dehydrogenase enzyme is SEQ ID NO: 9 and comprises at least one mutation pair selected from the group consisting of C352G/G353L; M361A/P362A; C352I/G353A; D315T/C352G; C352G/G353A; S312A/D315T; D321A/C322G; N337L/C352G; D315V/C352G; and P326L/M327G.

12. The non-naturally occurring microbial organism of claim 2, wherein the 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase enzyme is SEQ ID NO: 10 and comprises at least one mutation pair selected from the group consisting of P157A/Y158V; P157A/Y158E; I112A/Y158V; C185A/P188A; T150A/Y158V; P157A/Y158Q; Y158V/E233A; Y158V/V256T; P157A/Y158A; and Y158V/R227A.

13. The non-naturally occurring microbial organism of claim 1, wherein the dihydroxybenzoate is protocatechuate.

14. The non-naturally occurring microbial organism of claim 1, wherein the wild-type (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase enzyme is encoded by TphB, from organism Comamonas testosteroni.

15. The non-naturally occurring microbial organism of claim 2, wherein the wild-type TPA 1,2-dioxygenase enzyme comprises a first oxygenase component, encoded by TphA2, TphA3, and a second reductase component, encoded by TphA1, from organism Comamonas testosteroni.

16. The non-naturally occurring microbial organism of any of claims 1-15, wherein at least one of the at least a first enzyme, the at least a second enzyme, or the at least a third enzyme is a heterologous enzyme.

17. The non-naturally occurring microbial organism of claim 3, wherein the non-naturally occurring microbial organism comprises a first pathway comprising converting the carbon source to D-erythrose 4-phosphate (E4P) and phosphoenolpyruvate (PEP), converting E4P and PEP to 3-dehydroshikimate (DHS), and the at least a third enzyme expressed in a sufficient amount to produce a dihydroxybenzoate from DHS; and a second pathway comprising the at least a first exogenous nucleic acid encoding the at least a first enzyme expressed in a sufficient amount to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate.

18. The non-naturally occurring microbial organism of claim 4, wherein the non-naturally occurring microbial organism comprises a first pathway comprising converting the carbon source to D-erythrose 4-phosphate (E4P) and phosphoenolpyruvate (PEP), converting E4P and PEP to 3-dehydroshikimate (DHS), and the at least a third enzyme expressed in a sufficient amount to produce a dihydroxybenzoate from DHS; a second pathway comprising the at least a first exogenous nucleic acid encoding the at least a first enzyme expressed in a sufficient amount to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate; and a third pathway comprising the at least a second exogenous nucleic acid encoding the at least a second enzyme expressed in a sufficient amount to produce terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD).

19. The non-naturally occurring microbial organism of claim 1, wherein the production of 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate by the exogenously expressed wild-type or mutant (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase comprises a reverse enzymatic reaction as compared to an endogenous, homologous wild-type (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase.

20. The non-naturally occurring microbial organism of claim 2, wherein the production of terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) by the exogenously expressed wild-type or mutant TPA 1,2-dioxygenase comprises a reverse enzymatic reaction as compared to an endogenous, homologous wild-type TPA 1,2-dioxygenase.

21. A method for producing 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD), the method comprising culturing the non-naturally occurring microbial organism of claim 1 or claim 3 in the presence of a carbon source, and under conditions and for a sufficient amount of time to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD).

22. A method for producing terephthalate, the method comprising culturing the non-naturally occurring microbial organism of claim 2 or claim 4 in the presence of a carbon source, and under conditions and for a sufficient amount of time to produce terephthalate.

23. The method of claim 21 or claim 22, wherein the product accumulates in the non-naturally occurring microbial organism or is secreted from the non-naturally occurring microbial organism.

24. The method of claim 21 or claim 22, wherein the non-naturally occurring microbial organism is cultured in anaerobic culture conditions.

25. The method of claim 21 or claim 22, wherein the carbon source comprises simple sugars.

26. The method of claim 21 or claim 22, wherein the carbon source comprises a biomass feedstock.

27. The method of claim 26, wherein the biomass feedstock comprises a ligno-cellulosic feedstock.

28. The method of claim 27, wherein the ligno-cellulosic feedstock is subjected to a pretreatment process to create a pretreated ligno-cellulosic feedstock.

29. The method of claim 28, wherein the pretreated ligno-cellulosic feedstock is subjected to a hydrolysis.

30. The method of claim 29, wherein the hydrolysis is an enzymatic hydrolysis.

31. The method of any of claims 22 to 30, wherein at least a portion of the terephthalate is converted into polyethylene terephthalate.

32. The use of the polyethylene terephthalate of claim 31 for making polyethylene terephthalate preforms and bottles.
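
By way of non-limiting illustration only, and not as part of the claims, the mutation-pair notation used in claims 6 to 12 can be read programmatically. The Python sketch below assumes that a label such as V232A denotes the wild-type residue, the 1-based position and the replacement residue, and that a slash joins the two members of a pair. The function names and the five-residue toy sequence are hypothetical; the toy sequence is not SEQ ID NO: 1 or any other sequence of the listing.

```python
import re
from typing import List, Tuple

# One substitution: (wild-type residue, 1-based position, replacement residue).
Substitution = Tuple[str, int, str]

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation_pair(label: str) -> List[Substitution]:
    """Split a label such as 'V232A/M240G' into its substitutions."""
    substitutions = []
    for part in label.split("/"):
        match = _SUBSTITUTION.match(part.strip())
        if match is None:
            raise ValueError(f"not a substitution: {part!r}")
        substitutions.append((match.group(1), int(match.group(2)), match.group(3)))
    return substitutions


def apply_substitutions(sequence: str, substitutions: List[Substitution]) -> str:
    """Return a copy of the sequence with every substitution applied."""
    residues = list(sequence)
    for wild_type, position, replacement in substitutions:
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        found = residues[position - 1]
        if found != wild_type:
            # The label names the wild-type residue, so a mismatch means the
            # label and the sequence do not belong together.
            raise ValueError(f"expected {wild_type} at {position}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical toy sequence, used only to exercise the functions.
toy_sequence = "MKVLA"
mutant = apply_substitutions(toy_sequence, parse_mutation_pair("V3A/A5G"))
print(mutant)
```

Checking the wild-type residue before substituting is a deliberate choice in this sketch: it catches a label applied to the wrong sequence or with a shifted numbering.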

Description:
A microbial organism for producing terephthalate from biomass

BACKGROUND

Terephthalate, also known as terephthalic acid or TPA, is a monomer which can be converted to polyethylene terephthalate and its copolyesters, also known as PET. PET is a raw material widely used in the packaging industry, for instance for making containers and bottles for food and beverages, and in the textile industry. TPA is obtained mainly from petrochemical sources, making its cost dependent on the price of oil. Moreover, there is a general concern regarding the use of fossil sources, which are believed responsible for the greenhouse effect. Many attempts have been made to obtain terephthalate from renewable sources. Different chemical processes have been proposed for converting carbon sources derived from biomass to terephthalate. Some processes involve only biological conversion steps, which are promoted by an enzymatic catalyst produced by microorganisms.

In WO2011044243, a method for preparing renewable and relatively high purity p-xylene from biomass is presented. Biomass treated to provide a fermentation feedstock is fermented with a microorganism to isobutanol, which is further catalytically converted to renewable p-xylene in three conversion steps. The p-xylene can then be oxidized to form terephthalic acid or terephthalate esters.

WO2010148049 describes a method for converting cis,cis-muconic acid to terephthalate and its derivatives by means of a multi-step catalytic reaction. Muconic acid can be prepared from biomass, also by means of biological conversion.

WO2011094131 describes a non-naturally occurring microorganism comprising pathways necessary for the production of (2-hydroxy-3-methyl-4-oxobutoxy)phosphonate, which can subsequently be converted to p-toluate, which can in turn be converted to terephthalate.

There is therefore the need to develop processes for converting carbon sources contained in biomass feedstock to terephthalate and its derivatives.

SUMMARY

This specification discloses a non-naturally occurring microbial organism comprising at least a first exogenous nucleic acid encoding at least a first enzyme expressed in a sufficient amount to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from a dihydroxybenzoate, the at least a first enzyme being either a wild-type or a mutant and selected from the group consisting of (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase (EC 1.3.1.53), 5-epi-aristolochene 1,3-dihydroxylase (EC 1.14.13.119), 2,4-dichlorophenol 6-monooxygenase (EC 1.14.13.20), Phenol 2-monooxygenase (EC 1.14.13.7), Benzoate 4-monooxygenase (EC 1.14.13.12), and Benzoate 1,2-dioxygenase (EC 1.14.12.10).

It is further disclosed that the non-naturally occurring microbial organism may further comprise at least a second exogenous nucleic acid encoding at least a second enzyme expressed in a sufficient amount to produce terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD), the at least a second enzyme being either a wild-type or a mutant and selected from the group consisting of TPA 1,2-dioxygenase (EC 1.14.12.15), 3-Dehydroquinate dehydratase (EC 4.2.1.10), Salicylaldehyde dehydrogenase (EC 1.2.1.65), and 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase (EC 1.3.1.25).

It is yet further disclosed that the non-naturally occurring microbial organism may further comprise at least a third exogenous nucleic acid encoding at least a third enzyme expressed in a sufficient amount to produce a dihydroxybenzoate from a carbon source, the at least a third enzyme being 3-dehydroshikimate dehydratase.

It is also disclosed that the dihydroxybenzoate may be protocatechuate. It is also disclosed that at least one of the at least a first enzyme, the at least a second enzyme, or the at least a third enzyme is a heterologous enzyme. It is further disclosed that the wild-type (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase enzyme may be encoded by TphB, from organism Comamonas testosteroni, and the wild-type TPA 1,2-dioxygenase enzyme comprises a first oxygenase component, which may be encoded by TphA2, TphA3, and a second reductase component, which may be encoded by TphA1, from organism Comamonas testosteroni.

It is also disclosed that the non-naturally occurring microbial organism comprises at least one of a first pathway comprising converting the carbon source to D-erythrose 4-phosphate (E4P) and phosphoenolpyruvate (PEP), converting E4P and PEP to 3-dehydroshikimate (DHS), and the at least a third enzyme expressed in a sufficient amount to produce a dihydroxybenzoate from DHS; a second pathway comprising the at least a first exogenous nucleic acid encoding the at least a first enzyme expressed in a sufficient amount to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate; and a third pathway comprising the at least a second exogenous nucleic acid encoding the at least a second enzyme expressed in a sufficient amount to produce terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD).

Also disclosed is a method for producing terephthalate and/or DCD comprising culturing the non-naturally occurring microbial organism in the presence of a carbon source, under conditions and for a sufficient time to produce terephthalate and/or DCD.

It is further disclosed that the non-naturally occurring microbial organism may be cultured in aerobic or anaerobic culture conditions.

It is also disclosed that the carbon source may comprise simple sugars.

It is further disclosed that the carbon source may comprise a biomass feedstock, preferably a ligno-cellulosic feedstock, and that more preferably the ligno-cellulosic feedstock is subjected to a pretreatment process to create a pretreated ligno-cellulosic feedstock.

It is also disclosed that the pretreated ligno-cellulosic feedstock may be subjected to a hydrolysis, preferably to enzymatic hydrolysis. It is further disclosed that at least a portion of the terephthalate may be converted to polyethylene terephthalate, and that the polyethylene terephthalate may be used for making polyethylene terephthalate preforms and bottles.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 presents the chemical structure of: a) terephthalate; b) dihydroxybenzoate; c) protocatechuate and d) 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate.

Figure 2 presents a pathway for the conversion of glucose to terephthalate, according to a preferred embodiment of the invention.

Figure 3 shows a schematic of a degradation pathway of TPA to PCA.

Figures 4A and 4B show reactions in the KEGG database that are similar to the parent reaction, compared with its reversed reaction: A) PCA->DCD; B) DCD->TPA.

Figure 5 shows a target reaction that transforms PCA into DCD by reverse DCD dehydrogenase, according to one embodiment.

Figure 6 shows a schematic of a reaction catalyzed by 5-epi-aristolochene 1,3-dihydroxylase EC 1.14.13.119.

Figure 7 shows a schematic of a reaction catalyzed by 2,4-dichlorophenol 6-monooxygenase EC 1.14.13.20.

Figure 8 shows a schematic of a reaction catalyzed by phenol 2-monooxygenase EC 1.14.13.7.

Figure 9 shows a schematic of a reaction catalyzed by benzoate 4-monooxygenase EC 1.14.13.12.

Figure 10 shows a schematic of a reaction catalyzed by benzoate 1,2-dioxygenase EC 1.14.12.10.

Figure 11 shows a target reaction that transforms DCD into TPA by reverse TPA 1,2-dioxygenase, according to one embodiment.

Figure 12 shows a schematic of a reaction catalyzed by 3-dehydroquinate dehydratase (DHQD) EC 4.2.1.10.

Figure 13 shows a schematic of a reaction catalyzed by salicylaldehyde dehydrogenase EC 1.2.1.65.

Figure 14 shows a schematic of a reaction catalyzed by 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase EC 1.3.1.25.

Figure 15 shows a density plot of the z-score of a theoretical protein structure model according to one embodiment.

Figure 16 shows homology modeling for DCD dehydrogenase (yellow) from template 2hilA (green).

Figure 17 shows secondary and tertiary structure of DCD dehydrogenase.

Figure 18 shows prediction of binding site pockets based on Q-SiteFinder, according to one embodiment.

Figure 19 shows predicted docking area for PCA in the structural model of DCD dehydrogenase.

Figures 20A, B show predicted mutations M240G for reversing DCD dehydrogenase, according to one embodiment.

Figure 21 shows candidate mutations for benB-PCA interaction in order to show reverse DCD dehydrogenase activity, according to one embodiment.

Figure 22 shows predicted mutations in 5-epi-aristolochene 1,3-dihydroxylase in order to show reverse DCD dehydrogenase activity, according to one embodiment.

Figure 23 shows predicted mutations in hot-spots and docking, according to one embodiment.

Figure 24 shows docking of DCD with enzyme from gene aroD, according to one embodiment.

Figure 25 shows docking of substrate DCD in the structure model for salicylaldehyde dehydrogenase, according to one embodiment.

Figure 26 shows docking of DCD to enzyme structural model for benD, according to one embodiment.

Figure 27 shows hot spots from the mutagenesis for benD-DCD interaction, according to one embodiment.

Figure 28 shows a schematic of candidate predictions obtained through the tensor product method for the reverse reactions of enzymes EC 1.3.1.53 and EC 1.14.12.15.

Figures 29a and 29b present a common pathway of aromatic compounds biosynthesis.

Figures 30a and 30b represent a pathway for the conversion of glucose to protocatechuate according to a preferred embodiment of the invention.

Figure 31 represents a pathway for the conversion of glucose to protocatechuate according to another preferred embodiment of the invention.

DESCRIPTION

In one aspect of the invention, a non-naturally occurring microbial organism is described, which converts a carbon source to 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate. In another aspect of the invention, a non-naturally occurring microbial organism is described, which converts a carbon source to terephthalate. In various embodiments, the non-naturally occurring microbial organism comprises at least one of the following metabolic pathways: a first pathway which converts a carbon source to at least a dihydroxybenzoate; a second pathway which converts at least a fraction of the dihydroxybenzoate to 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate; and a third pathway which converts at least a fraction of the 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate to terephthalate. In one embodiment, the dihydroxybenzoate is protocatechuate. As used herein, 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate will also be indicated by DCD, protocatechuate will also be indicated by PCA, and terephthalate will also be indicated by TPA.
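
By way of non-limiting illustration only, the order of the three pathways described above can be written as a small lookup structure. The Python sketch below is an assumption-laden simplification: each pathway is reduced to a single substrate-to-product step, and the names PATHWAY_STEPS and route are hypothetical and do not appear elsewhere in this specification.

```python
from typing import Dict, List, Tuple

# (substrate, product, pathway label), in the order given in the text.
PATHWAY_STEPS: List[Tuple[str, str, str]] = [
    ("carbon source", "dihydroxybenzoate", "first pathway"),
    ("dihydroxybenzoate", "DCD", "second pathway"),
    ("DCD", "terephthalate", "third pathway"),
]


def route(start: str, target: str) -> List[str]:
    """List the compounds passed through when going from start to target."""
    next_product: Dict[str, str] = {s: p for s, p, _ in PATHWAY_STEPS}
    compounds = [start]
    while compounds[-1] != target:
        current = compounds[-1]
        # Stop if there is no further step, or if more steps were taken
        # than the table contains (guards against a cyclic table).
        if current not in next_product or len(compounds) > len(PATHWAY_STEPS):
            raise ValueError(f"no route from {start!r} to {target!r}")
        compounds.append(next_product[current])
    return compounds


print(" -> ".join(route("carbon source", "terephthalate")))
print(" -> ".join(route("carbon source", "DCD")))
```

The second call corresponds to the aspect in which the organism stops at DCD, that is, an organism carrying the first and second pathways but not the third.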

Terephthalate is a chemical compound having the molecular formula C8H4O4(2-) (IUPAC name terephthalate), which is the ionized form of terephthalic acid, also referred to as p-phthalic acid or TPA, and it is understood that terephthalate and terephthalic acid can be used interchangeably throughout to refer to the compound in any of its neutral or ionized forms, including any salt forms thereof. It is understood by those skilled in the art that the specific form will depend on the pH. The chemical structure of terephthalate in acidic form is represented in Figure 1(a).

Dihydroxybenzoate is a class of aromatic chemical compounds having the molecular formula C7H6O4 and the general chemical structure represented in Figure 1(b). In one embodiment, dihydroxybenzoate is protocatechuate and is represented as 3,4-dihydroxybenzoic acid. The chemical structure of protocatechuate is reported in Figure 1(c). DCD is a chemical compound having the molecular formula C8H8O6 and the chemical structure is represented in Figure 1(d). DCD is not an aromatic compound.
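
By way of non-limiting illustration only, a molecular formula of the kind given above can be turned into element counts and an approximate molar mass. The Python sketch below assumes plain formulas without brackets or charges; the function names are hypothetical, and the atomic masses are rounded standard values supplied for the example rather than taken from this specification. The two formulas used are those of a dihydroxybenzoic acid (C7H6O4) and of terephthalic acid in its neutral form (C8H6O4).

```python
import re
from typing import Dict

# Rounded standard atomic masses in g/mol (an input assumed for this sketch).
ATOMIC_MASS: Dict[str, float] = {"C": 12.011, "H": 1.008, "O": 15.999}


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms in a plain formula such as 'C7H6O4' (no brackets, no charge)."""
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molar_mass(formula: str) -> float:
    """Sum the atomic masses of all atoms in the formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


for name, formula in [("dihydroxybenzoic acid", "C7H6O4"),
                      ("terephthalic acid", "C8H6O4")]:
    print(f"{name}: {parse_formula(formula)} {molar_mass(formula):.2f} g/mol")
```

An element missing from the table raises a KeyError, which is intended here: the sketch covers only the carbon, hydrogen and oxygen compounds discussed in this section.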

In one embodiment, a production of terephthalate is represented by the pathway shown in Figure 2. In this example, the carbon source is represented by glucose.

In various embodiments, dihydroxybenzoate and PCA may be accumulated in the non-naturally occurring microbial organism. By the expression "a product is accumulated in the non-naturally occurring microbial organism", it is meant that the product is made available inside the cell of the microbial organism.

If the product is accumulated in the non-naturally occurring microbial organism, and is not further subjected to biochemical conversion, it may be present in the microbial organism and it can be detected and harvested.

If at least a fraction of the product accumulated in the non-naturally occurring microbial organism is further subjected to biochemical conversion reactions, it may or may not be present and detectable, depending on the intracellular condition, the reaction kinetics, and the fraction of the product involved in the subsequent conversion.

In various embodiments, the described product(s) may be secreted from the non-naturally occurring microbial organism, or a combination of accumulation and secretion of the described products may occur. In one embodiment, when a product(s) is secreted, it can be harvested from a medium external to the non-naturally occurring microbial organism, such as the culture medium.

In one embodiment, the non-naturally occurring microbial organism may be fed with the carbon source with subsequent conversion to terephthalate through the formation of intermediate compounds, comprising at least a dihydroxybenzoate and DCD.

In one embodiment, the non-naturally occurring microbial organism may be fed with the carbon source with subsequent conversion to DCD through the formation of intermediate compounds, comprising at least a dihydroxybenzoate.

In other words, in various embodiments, terephthalate or DCD is obtained from a carbon source by means of a non-naturally occurring microbial organism through at least one of the described metabolic pathways comprising described enzymes, which produce at least a dihydroxybenzoate and DCD.

In various embodiments, the carbon source may be obtained from a biomass, for example, from a ligno-cellulosic feedstock, which may be further subjected to a pretreatment and to an enzymatic hydrolysis process.

As used herein, the terms "microbial," "microbial organism" or "microorganism" are equivalent terms for indicating any organism that exists as a microscopic cell included within the domains of archaea, bacteria or eukarya. Therefore, the term comprises prokaryotic or eukaryotic cells or organisms having a microscopic size and includes bacteria, archaea and eubacteria of all species as well as eukaryotic microorganisms such as yeast and fungi. The term also includes cells of any species that can be cultured for the production of a biochemical.

The term "non-naturally occurring" microbial organism or microorganism of the invention means that the microbial organism has at least one genetic alteration not normally found in a naturally occurring strain of the referenced species, including naturally-occurring strains of the referenced species. Genetic alterations include, for example, modifications introducing expressible nucleic acids encoding metabolic polypeptides, other nucleic acid additions, nucleic acid deletions and/or other functional disruption of the microbial organism's genetic material. Such modifications include, for example, coding regions and functional fragments thereof, for heterologous, homologous or both heterologous and homologous polypeptides for the referenced species. Additional modifications include, for example, non-coding regulatory regions in which the modifications alter expression of a gene or operon. Exemplary metabolic polypeptides include enzymes within a dihydroxybenzoate, PCA and TPA biosynthetic pathway. A metabolic modification refers to a biochemical reaction that is altered from its naturally occurring state. Therefore, non-naturally occurring microorganisms can have genetic modifications to nucleic acids encoding metabolic polypeptides, or functional fragments thereof.

The present invention discloses metabolic pathways that can be designed and inserted in a micro-organism to achieve biosynthesis of terephthalate and/or DCD in cells or organisms. For example, biosynthetic production of terephthalate can be confirmed by construction of strains having the designed metabolic genotype. These metabolically engineered cells or organisms also can be subjected to adaptive evolution to further augment terephthalate biosynthesis, including under conditions approaching theoretical maximum growth.

"Exogenous" as used herein is intended to mean that the referenced molecule or the referenced activity is introduced into the host microbial organism. The molecule can be introduced, for example, by introduction of an encoding nucleic acid into the host genetic material such as by integration into a host chromosome or as non-chromosomal genetic material such as a plasmid. Therefore, the term as it is used in reference to expression of an encoding nucleic acid refers to introduction of the encoding nucleic acid in an expressible form into the microbial organism. When used in reference to a biosynthetic activity, the term refers to an activity that is introduced into the host reference organism. The source can be, for example, a homologous or heterologous encoding nucleic acid that expresses the referenced activity following introduction into the host microbial organism. Therefore, the term "endogenous" refers to a referenced molecule or activity that is present in the host. Similarly, the term when used in reference to expression of an encoding nucleic acid refers to expression of an encoding nucleic acid contained within the microbial organism. The term "heterologous" refers to a molecule or activity derived from a source other than the referenced species whereas "homologous" refers to a molecule or activity derived from the host microbial organism. Accordingly, exogenous expression of an encoding nucleic acid of the invention can utilize either or both a heterologous or homologous encoding nucleic acid.

It is understood that when more than one exogenous nucleic acid is included in a microbial organism, the more than one exogenous nucleic acids refer to the referenced encoding nucleic acid or biosynthetic activity, as discussed above. It is further understood, as disclosed herein, that such more than one exogenous nucleic acids can be introduced into the host microbial organism on separate nucleic acid molecules, on polycistronic nucleic acid molecules, or a combination thereof, and still be considered as more than one exogenous nucleic acid. For example, as disclosed herein a microbial organism can be engineered to express two or more exogenous nucleic acids encoding a desired pathway enzyme or protein. In the case where two exogenous nucleic acids encoding a desired activity are introduced into a host microbial organism, it is understood that the two exogenous nucleic acids can be introduced as a single nucleic acid, for example, on a single plasmid, on separate plasmids, can be integrated into the host chromosome at a single site or multiple sites, and still be considered as two exogenous nucleic acids. Similarly, it is understood that more than two exogenous nucleic acids can be introduced into a host organism in any desired combination, for example, on a single plasmid, on separate plasmids, can be integrated into the host chromosome at a single site or multiple sites, and still be considered as two or more exogenous nucleic acids, for example three exogenous nucleic acids. Thus, the number of referenced exogenous nucleic acids or biosynthetic activities refers to the number of encoding nucleic acids or the number of biosynthetic activities, not the number of separate nucleic acids introduced into the host organism.
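
By way of non-limiting illustration only, the counting rule stated above (the number of encoding nucleic acids, not the number of separate molecules introduced) can be sketched in Python. The class Molecule, the function count_exogenous_nucleic_acids and the labels enzyme_1 and enzyme_2 are hypothetical. Counting each encoded enzyme label once, even if it is carried on several molecules, is an assumption of this sketch and is not stated in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Molecule:
    """One separate nucleic acid molecule or integration event (hypothetical model)."""
    name: str
    kind: str                   # for example "plasmid" or "chromosomal integration"
    encoded_enzymes: List[str]  # enzymes encoded by this molecule


def count_exogenous_nucleic_acids(molecules: List[Molecule]) -> int:
    """Count distinct encoded enzymes, independent of how they are delivered."""
    encoded = set()
    for molecule in molecules:
        encoded.update(molecule.encoded_enzymes)
    return len(encoded)


# Two encoding nucleic acids on one polycistronic plasmid.
single_plasmid = [Molecule("pA", "plasmid", ["enzyme_1", "enzyme_2"])]

# The same two encoding nucleic acids on two separate plasmids.
two_plasmids = [
    Molecule("pB", "plasmid", ["enzyme_1"]),
    Molecule("pC", "plasmid", ["enzyme_2"]),
]

print(count_exogenous_nucleic_acids(single_plasmid), len(single_plasmid))
print(count_exogenous_nucleic_acids(two_plasmids), len(two_plasmids))
```

Both configurations give the same count of exogenous nucleic acids although the number of separate molecules differs, which is the distinction the paragraph draws.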

The non-naturally occurring microbial organisms of the invention can contain stable genetic alterations, which refers to microorganisms that can be cultured for greater than five generations without loss of the alteration. Generally, stable genetic alterations include modifications that persist greater than 10 generations, particularly stable modifications will persist more than about 25 generations, and more particularly, stable genetic modifications will be greater than 50 generations, including indefinitely.

Such genetic alterations include, for example, genetic alterations of species homologs, in general, and in particular, orthologs, paralogs or nonorthologous gene displacements.

An ortholog is a gene or genes that are related by vertical descent and are responsible for substantially the same or identical functions in different organisms. Genes are related by vertical descent when, for example, they share sequence similarity of sufficient amount to indicate they are homologous, or related by evolution from a common ancestor. Genes can also be considered orthologs if they share three-dimensional structure but not necessarily sequence similarity, of a sufficient amount to indicate that they have evolved from a common ancestor to the extent that the primary sequence similarity is not identifiable. Genes that are orthologous can encode proteins with sequence similarity of about 25% to 100% amino acid sequence identity. Genes encoding proteins sharing an amino acid similarity of less than 25% can also be considered to have arisen by vertical descent if their three-dimensional structure also shows similarities.

Orthologs include genes or their encoded gene products that through, for example, evolution, have diverged in structure or overall activity. For example, where one species encodes a gene product exhibiting two functions and where such functions have been separated into distinct genes in a second species, the three genes and their corresponding products are considered to be orthologs. For the production of a biochemical product, the orthologous gene harboring the metabolic activity to be introduced or disrupted is to be chosen for construction of the non-naturally occurring microorganism.

In contrast, paralogs are homologs related by, for example, duplication followed by evolutionary divergence and have similar or common, but not identical functions. Paralogs can originate or derive from, for example, the same species or from a different species. Paralogs are proteins from the same species with significant sequence similarity to each other suggesting that they are homologous, or related through co-evolution from a common ancestor.

A nonorthologous gene displacement is a nonorthologous gene from one species that can substitute for a referenced gene function in a different species. Substitution includes, for example, being able to perform substantially the same or a similar function in the species of origin compared to the referenced function in the different species. Although generally, a nonorthologous gene displacement will be identifiable as structurally related to a known gene encoding the referenced function, less structurally related but functionally similar genes and their corresponding gene products nevertheless will still fall within the meaning of the term as it is used herein. Functional similarity requires, for example, at least some structural similarity in the active site or binding region of a nonorthologous gene product compared to a gene encoding the function sought to be substituted. Therefore, a nonorthologous gene includes, for example, a paralog or an unrelated gene.

Host microbial organisms can be selected both from naturally occurring and non-naturally occurring microbial organisms generated in, for example, bacteria, yeast, fungus or any of a variety of other microorganisms applicable to fermentation processes. Exemplary bacteria include species selected from Escherichia coli, Klebsiella oxytoca, Anaerobiospirillum succiniciproducens, Actinobacillus succinogenes, Mannheimia succiniciproducens, Rhizobium etli, Bacillus subtilis, Corynebacterium glutamicum, Gluconobacter oxydans, Zymomonas mobilis, Lactococcus lactis, Lactobacillus plantarum, Streptomyces coelicolor, Clostridium acetobutylicum, Pseudomonas fluorescens, and Pseudomonas putida. Exemplary yeasts or fungi include species selected from Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Pichia pastoris, Rhizopus arrhizus, Rhizopus oryzae, and the like. E. coli is a particularly useful host organism since it is a well characterized microbial organism suitable for genetic engineering. Other particularly useful host organisms include yeast such as Saccharomyces cerevisiae. It is understood that any suitable microbial host organism can be used to introduce metabolic and/or genetic modifications to produce a desired product.

Techniques for producing a non-naturally occurring microbial organism are well-known to those of ordinary skill in the art, who will also understand how to choose appropriate vectors and promoters for the insertion of an exogenous nucleic acid in particular organisms or strains. (For example, see methods in Sambrook, J. and Russell, D. W., Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, 2001). Very basically, a coding region of the homologous and/or heterologous gene is isolated from a "donor" organism that possesses the gene. In one well-known technique, a coding region is isolated by first preparing a genomic DNA library or a cDNA library, and second, identifying the coding region in the genomic DNA library or cDNA library, such as by probing the library with a labeled nucleotide probe that is at least partially homologous with the coding region, determining whether expression of the coding region imparts a detectable phenotype to a library microorganism comprising the coding region, or amplifying the desired sequence by PCR. Other techniques for isolating the coding region may also be used.

Methods for transferring the exogenous nucleic acid into a host organism are well-known to those of ordinary skill in the art. Briefly, the desired coding region is incorporated into the recipient organism in such a manner that the encoded enzyme is produced by the organism in functional form. That is, the coding region is inserted into an appropriate vector and operably linked to an appropriate promoter on the vector. If necessary, codons in the coding region may be altered, for example, to create compatibility with codon usage in the target organism, to change coding sequences that can impair transcription or translation of the coding region or stability of the transcripts, or to add or remove sequences encoding signal peptides that direct the generated protein to a specific location in or outside the cell, e.g., for secretion of the protein. Any type of vector, e.g., integrative, chromosomal, or episomal, may be used. The vector may be a plasmid, cosmid, yeast artificial chromosome, virus, or any other vector appropriate for the target organism. The vector may comprise other genetic elements, such as an origin of replication to allow the vector to be passed on to progeny cells of the host carrying the vector, sequences that facilitate integration into the host genome, restriction endonuclease sites, etc. Any promoter active in the selected organism, e.g., homologous, heterologous, constitutive, inducible, or repressible may be used. An "appropriate" vector or promoter is one that is compatible with the selected organism and will generate a functional protein in that organism. The non-naturally occurring microbial organism of the invention can be obtained by any method allowing an exogenous nucleic acid to be introduced into a cell, for example, transformation, electroporation, conjugation, fusion of protoplasts or any other known technique (Spencer J. F. et al. (1988), Journal of Basic Microbiology 28, 321-333) and techniques yet to be invented. A number of protocols are known for transforming yeast, bacteria, and eukaryotic cells. Transformation can be carried out by treating the whole cells in the presence of lithium acetate and of polyethylene glycol according to Ito H. et al. ((1983), J. Bacteriol., 153: 163), or in the presence of ethylene glycol and dimethyl sulphoxide according to Durrens P. et al. ((1990) Curr. Genet., 18:7). Electroporation can be carried out according to Becker D. M. and Guarente L. ((1991) Methods in Enzymology, 194: 18).

In various embodiments, the terephthalate and/or DCD pathway(s) of the non-naturally occurring microbial organism comprise a set of biochemical reactions catalyzed by enzyme(s) which convert the carbon source to terephthalate and/or DCD. A pathway may comprise many reactions or a single reaction. An enzyme may catalyze more than one reaction, which may occur in one or different catalytic domains. A catalytic domain is the catalytically active region of the enzyme where reactions actually occur. The catalytic domain may be formed by several sub-units, which are polypeptide chains. The reactions can occur simultaneously or sequentially. For instance, an enzyme may promote a decarboxylation and a dehydrogenation of the substrate.

The cofactor is a non-peptide molecule which is required for the enzyme activity to be performed. The cofactor can be tightly or loosely bound to the enzyme, can be organic or inorganic. A person skilled in the art will understand that enzyme activity can be modified by applying several cofactor alterations comprising modification of cofactor intracellular availability or cofactor replacement.

An enzyme which catalyzes a reaction for converting a first compound to a second compound may be tuned to catalyze a reverse reaction to convert the second compound to the first compound. That is, the compound which is the reagent in a direct reaction may become the product in the reverse reaction. The enzyme may, under certain conditions, catalyze the reverse reaction of a reaction catalyzed by the same enzyme. For instance, this may be done by genetic modification of a micro-organism or modification of the culture medium in such a way that the reverse reaction is promoted. In a similar way, a pathway comprising many reactions may be reverted by reverting each single reaction which forms the pathway.

Using a multidisciplinary approach involving techniques ranging from molecular graph-based methods to protein engineering, new biosynthetic pathways for producing TPA and/or DCD were identified. In the present disclosure, molecular graph-based methods for the modeling of enzymatic activity and specificity and for building predictors were used. The predictors are based on statistical inference and machine learning methods. The results are presented in two broad categories - identification of enzymes which are suitable for the desired reactions, and identification of mutations within the enzyme which increase the suitability of the enzyme to perform the desired reaction.

The starting point for the new biosynthetic pathways is the degradation pathway from TPA to PCA, taking as reference its characterization and gene cluster identification for Comamonas sp. strain E6 (Sasoh et al, 2006). As shown in Figure 3, this pathway involves two steps: 1) converting terephthalic acid (TPA) to (3S,4R)-3,4-dihydroxycyclohexa-1,5-diene-1,4-dicarboxylate (DCD), which is catalyzed by the enzyme TPA 1,2-dioxygenase (EC 1.14.12.15, KEGG reaction R05148). This dioxygenase is formed by two components: the oxygenase (encoded by genes TphA2, TphA3) and the reductase (encoded by TphA1); and 2) converting DCD to 3,4-dihydroxybenzoate or protocatechuic acid (PCA), which is catalyzed by the enzyme DCD dehydrogenase (EC 1.3.1.53, KEGG reaction R01633) and encoded by gene TphB. In the present disclosure, enzymes capable of catalyzing the reverse steps of the TPA degradation pathway were determined, in order to synthesize TPA from PCA.

In a first aspect, candidate enzymes for the target reactions were determined using an annotation tool for reactions in the RetroPath tool (Carbonell et al, 2011). The method is based on a kernel-based predictor built from the tensor product of the reactions' molecular signatures and the string kernel of enzyme sequences (Faulon et al, 2008), and employed a computing cluster of 8 nodes x 12 cores (76 CPUs) and two metabolic databases, KEGG (release 50) and MetaCyc (release 16.0). This analysis screened 566117 enzyme sequences in KEGG and 5418 enzyme sequences in MetaCyc, as well as 6746 reactions in KEGG and 4392 reactions in MetaCyc.
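The tensor-product idea can be illustrated with a minimal Python sketch. It is not the RetroPath implementation: it assumes that a reaction is summarized as a dictionary of fragment counts compared with a min/max (Tanimoto-like) similarity, and that the string kernel is a normalized k-mer spectrum kernel. All function names, fragment labels and sequences are hypothetical placeholders.

```python
from collections import Counter
from math import sqrt

def spectrum(seq, k=3):
    """Count every overlapping k-mer in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def string_kernel(a, b, k=3):
    """Normalized k-mer spectrum kernel (cosine of the k-mer count vectors)."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    dot = sum(count * sb[kmer] for kmer, count in sa.items())
    norm = sqrt(sum(c * c for c in sa.values())) * sqrt(sum(c * c for c in sb.values()))
    return dot / norm if norm else 0.0

def reaction_kernel(r1, r2):
    """Min/max similarity between two non-negative fragment-count dicts."""
    frags = set(r1) | set(r2)
    shared = sum(min(r1.get(f, 0), r2.get(f, 0)) for f in frags)
    total = sum(max(r1.get(f, 0), r2.get(f, 0)) for f in frags)
    return shared / total if total else 0.0

def tensor_kernel(pair1, pair2, k=3):
    """Kernel on (reaction, sequence) pairs: product of the two kernels."""
    (r1, s1), (r2, s2) = pair1, pair2
    return reaction_kernel(r1, r2) * string_kernel(s1, s2, k)

# Toy, made-up data: fragment labels and sequences are placeholders.
rxn_a = {"C=C": 2, "C-OH": 2, "COO-": 2}
rxn_b = {"C=C": 2, "C-OH": 1, "COO-": 1}
same = tensor_kernel((rxn_a, "MKTAYIAKQR"), (rxn_a, "MKTAYIAKQR"))
mixed = tensor_kernel((rxn_a, "MKTAYIAKQR"), (rxn_b, "MKTAYLAKQR"))
print(same, mixed)
```

Because the pair kernel is a product, a candidate scores highly only when both its reaction chemistry and its sequence resemble a training example, which is the property the predictor relies on.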

For each reversed reaction, the closest reactions in the chemical reaction space of both the KEGG and MetaCyc databases, measured through molecular signatures, were searched. Figure 4 shows the similarity of the original reactions (TPA biodegradation route) and of the corresponding reverted ones to the reactions in the KEGG database. As can be seen from the figure, this technique basically identifies those reactions whose substrates and products are chemically transformed in a similar fashion to the reversed reactions.
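A small sketch of such a nearest-reaction search follows. It assumes, as a simplification not spelled out in the text, that a reaction signature is the net fragment count (products minus substrates), so that reversing a reaction simply negates its signature, and that similarity is a cosine between signed count vectors. The fragment labels and database entries are invented for illustration.

```python
from collections import Counter
from math import sqrt

def reaction_signature(substrates, products):
    """Net fragment counts of a reaction: products minus substrates."""
    sig = Counter()
    for mol in products:
        for frag, n in mol.items():
            sig[frag] += n
    for mol in substrates:
        for frag, n in mol.items():
            sig[frag] -= n
    return {frag: n for frag, n in sig.items() if n != 0}

def reverse(sig):
    """Signature of the reverse reaction (all net counts change sign)."""
    return {frag: -n for frag, n in sig.items()}

def cosine(a, b):
    """Cosine similarity between two signed fragment-count dicts."""
    dot = sum(n * b.get(frag, 0) for frag, n in a.items())
    norm = sqrt(sum(n * n for n in a.values())) * sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

def closest(target, database, top=3):
    """Names of the database reactions most similar to the target."""
    ranked = sorted(database.items(), key=lambda kv: cosine(target, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top]]

# Hypothetical fragments: a forward step and two made-up database reactions.
forward = reaction_signature([{"arC-H": 2}], [{"C-OH": 2}])
target = reverse(forward)
database = {
    "rxn_1": {"C-OH": -1, "arC-H": 1, "C=O": 1},
    "rxn_2": forward,
}
print(closest(target, database, top=1))
```

Under this representation a reaction and its own reverse are maximally dissimilar, which is why the search has to be run on the reverted signatures rather than on the native degradation steps.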

From the set of reactions closer to the target reactions than a predefined cutoff, the list of sequences corresponding to enzymes catalyzing the reactions was chosen. The cutoff was selected in order to guarantee that enough non-redundant sequences were known. This list of enzymes constitutes the "positive set", i.e. enzymes that are known to catalyze reactions that are similar to the desired one. In one aspect, enzymes with a certain promiscuous level for some of these reactions for the desired activity may be obtained. Enzyme promiscuity is the ability of an enzyme to catalyze more than one reaction, or to show broad substrate specificity. The negative set of the training set was selected by sampling the sequence and reaction space dissimilar to the target reaction in a way that guarantees chemical and sequence diversity. The set of given reactions paired with the set of enzyme sequences was used to develop a predictor for the chemical-sequence space around the desired reaction. To that end, a support vector machine was trained from the convolution of the kernels of similarities of both reactions and sequences (Karatzoglou et al, 2005). The list of all non-redundant enzyme sequences available in KEGG and MetaCyc was then scored by the predictors of the two target reactions. In order to filter out false positives or highly unlikely hits, three scores were combined: 1) The output of the tensor-product predictor, which indicates how likely the enzyme is to catalyze the reaction; 2) The sequence similarity of the given sequence to the positive set, in order to filter out sequences too far away from the positive set; and 3) The chemical similarity of the native reaction of the enzyme to the target reaction, in order to consider promiscuous activities that are closer to the parent. Enzymes shortlisted after the filtering were then inspected for their native reactions, and information was collected from literature sources. When several enzymes with the same activity had been selected, only the most promising ones were kept. A list of candidate enzymes was then ranked and considered for subsequent analysis.
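The three-score filtering step can be pictured with the following Python sketch. The text does not say how the three scores were combined, so a simple conjunction of thresholds followed by ranking on the predictor output is assumed here; the threshold values, field names and candidate entries are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    tensor_score: float      # output of the tensor-product predictor
    seq_closeness: float     # max sequence similarity to the positive set
    chem_similarity: float   # native reaction vs. target reaction

def shortlist(candidates, min_tensor=0.7, min_seq=0.5, min_chem=0.5):
    """Keep candidates passing all three (assumed) cutoffs, best predictor score first."""
    kept = [
        c for c in candidates
        if c.tensor_score >= min_tensor
        and c.seq_closeness >= min_seq
        and c.chem_similarity >= min_chem
    ]
    return sorted(kept, key=lambda c: c.tensor_score, reverse=True)

# Made-up candidates used only to exercise the filter.
pool = [
    Candidate("enzyme_A", 0.95, 0.90, 0.80),
    Candidate("enzyme_B", 0.99, 0.20, 0.90),  # too far from the positive set
    Candidate("enzyme_C", 0.75, 0.85, 0.60),
    Candidate("enzyme_D", 0.40, 0.95, 0.95),  # low predictor output
]
print([c.name for c in shortlist(pool)])
```

A weighted sum or a rank aggregation would be equally plausible readings of "combined"; the conjunction is used only because it makes the role of each score as a filter explicit.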

As described above, a predictor that scores the ability of a given enzyme by its gene sequence to catalyze the reverse reaction of the DCD dehydrogenase (Figure 5) was built based on enzyme annotations available in either KEGG or MetaCyc databases. Because no known enzyme exists catalyzing this reaction, a positive training set of enzymes was selected from those catalyzing reactions that are close in terms of their molecular signature-based similarity to the target reaction.

Next, a list of the most promising candidates obtained from the screening of more than half a million enzyme sequences was created. Predicted enzymes can often catalyze several reactions promiscuously. For those cases, only one reference reaction is provided. Similarly, enzyme sequences catalyzing the same reaction(s) that are in addition close homologues were usually scored with approximately the same values. Redundancy was removed from the list so that only top scored representative sequences are provided in a summarized form. Therefore, close homologues of selected genes with the same catalytic function can be considered candidates as well, even if they have not been explicitly indicated.

For each candidate enzyme, two parameters are provided: 1) Tensor product: this score is the output from our molecular signature-based predictor (Faulon et al, 2008), (Carbonell and Faulon, 2010). A higher value of this score can be interpreted as a higher feasibility for the enzyme to be able to catalyze efficiently the target reaction; and 2) Closeness of sequence to positive set: this score provides the maximum similarity between the predicted sequence and the sequences in the positive set. It is used in order to control the maximum allowed departure from the positive set that is screened in the sequence space.

Based on the above described methods, the following candidate enzymes were determined.

5-epiaristolochene 1,3-dihydroxylase (EC 1.14.13.119) as a candidate for catalyzing inverse DCD dehydrogenase. This gene, which was ranked first in the predictions from the MetaCyc database, is from the organism Nicotiana tabacum. It belongs to the cytochrome P450 family, and is a membrane protein. Therefore, it might be challenging to express it in a bacterial host such as E. coli, although there are precedents for expressing plant cytochrome P450 enzymes in E. coli after directed evolution toward the desired activity (Ajikumar et al, 2010). The enzyme consists of a long sequence of 504 amino acids. The parent reaction of the enzyme encoded by this gene is involved in the mevalonate pathway and produces the plant secondary metabolite capsidiol from its substrate, the fungal toxin aristolochone (Figure 6), which is a bicyclic sesquiterpene produced by certain fungi including the cheese mold Penicillium roqueforti. This last step in the biosynthesis of capsidiol catalyzes the regio- and stereospecific insertion of two hydroxyl moieties into the bicyclic sesquiterpene 5-epiaristolochene (Takahashi et al, 2005). The scores for this enzyme sequence are given below.

2,4-dichlorophenol 6-monooxygenase (EC 1.14.13.20) as a candidate for catalyzing inverse DCD dehydrogenase. This enzyme belongs to the family of PheA/TfdB FAD monooxygenases. It converts 2,4-dichlorophenol into 3,5-dichlorocatechol (Figure 7). The top ranked gene is tfdB from the bacterium Cupriavidus pinatubonensis (strain JMP134 / LMG 1197), where it is part of a cluster involved in xenobiotic degradation. The enzyme is a homotetramer of 4 x 598 amino acids and it was successfully expressed in E. coli (Ledger et al, 2006).

Parameter                                Value                   p-value
Reference sequence                       sp|P27138|TFDB_CUPPJ
Tensor product                           0.967                   3.75E-4
Closeness of sequence to positive set    0.964                   6.83E-2

Phenol 2-monooxygenase (EC 1.14.13.7) as a candidate for catalyzing inverse DCD dehydrogenase. The phenol 2-monooxygenase is a flavoprotein involved in several degradation pathways such as those of chlorocyclohexane, chlorobenzene, toluene or aminobenzoate. This enzyme is highly promiscuous: it transforms phenol to catechol (Figure 8), as well as toluene to o-cresol, 3-cresol to 2,3-dihydroxytoluene, 4-cresol to 4-methylcatechol, o-cresol to 2,3-dihydroxytoluene, and resorcinol to benzene-1,2,4-triol. The top ranked gene is the one from Arthrobacter aurescens; it encodes a protein of 644 amino acids (Mongodin et al, 2006).

Benzoate 4-monooxygenase (EC 1.14.13.12) as a candidate for catalyzing inverse DCD dehydrogenase. This is a cytochrome P450 benzoate 4-monooxygenase that transforms benzoate into 4-hydroxybenzoate (Parales et al, 2006) (Figure 9). This enzyme is promiscuous, being able to accept as substrates benzoate derivatives like chloro/fluoro/hydroxy/methyl/methoxybenzoate, cinnamate, nicotinate or picolinate. The top ranked gene is from the fungus Aspergillus fumigatus Af293, and encodes a protein of 519 amino acids (Nierman et al, 2005).

Parameter                                Value                   p-value
Reference sequence                       afm:AFUA_2G15230
Tensor product                           0.753                   N/A
Closeness of sequence to positive set    0.864                   1.41E-3

Benzoate 1,2-dioxygenase (EC 1.14.12.10) as a candidate for catalyzing inverse DCD dehydrogenase. This enzyme belongs to the benzoate degradation pathway (from benzoate to catechol). It degrades benzoate into 1,2-cis-dihydroxybenzoate (Figure 10). Benzoate is known to play a role in the anaerobic degradation of terephthalate (Kleerebezem et al, 1999). Within this enzyme family, the sequence that was ranked at the top by the predictor was the one corresponding to the benzoate 1,2-dioxygenase beta subunit in Burkholderia thailandensis MSMB43. It consists of 163 amino acids.

Target reactions that transform DCD into TPA by reverse TPA 1,2-dioxygenase were also examined. Similar to the previous section, a predictor was determined that scores the ability of a given enzyme to catalyze, in this case, the reverse reaction of TPA 1,2-dioxygenase (Figure 11). The same considerations about redundancy that are described above apply to this section. The scores that are provided here for each enzyme candidate, i.e. the tensor product and the closeness of sequence to positive set, were defined above. Based on the above described methods, the following candidate enzymes were determined.

3-Dehydroquinate dehydratase (DHQD) (EC 4.2.1.10) as a candidate for catalyzing inverse TPA 1,2-dioxygenase. This enzyme belongs to the family of lyases and is found in the shikimate pathway, the pathway that allows plants, fungi, and bacteria to produce aromatic amino acids. This reaction transforms 3-dehydroquinate (DHQ) into 3-dehydroshikimate (DHS) (Figure 12). The best ranked gene is ARO1 from yeast (Gientka et al, 2009), which encodes a pentafunctional enzyme that catalyzes 5 consecutive reactions in the shikimate pathway.

Salicylaldehyde dehydrogenase (EC 1.2.1.65) as a candidate for catalyzing inverse TPA 1,2-dioxygenase. This enzyme belongs to the family of oxidoreductases and is involved in the naphthalene degradation pathway. It accepts salicylaldehyde as its main substrate and yields salicylate as its main product (Figure 13), which is further degraded into catechol. The top ranked gene is NahF, from Pseudomonas putida G7 (Coitinho et al, 2012).

1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase (EC 1.3.1.25) as a candidate for catalyzing inverse TPA 1,2-dioxygenase. This enzyme transforms (1R,6R)-1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate into catechol (Figure 14). It belongs to the benzoate degradation pathway that transforms benzoate into catechol through two steps, EC 1.14.12.10 and EC 1.3.1.25. The top ranked sequence is the one belonging to the bacterium Novosphingobium aromaticivorans.

As broadly introduced above, in silico mutagenesis was also performed in order to investigate variants of parent enzyme sequences with the ability to catalyze the target reactions. Two approaches were combined: 1) A conventional computational protein design procedure that looks for areas around active regions which are more likely to interact with the substrate; and 2) Use of a proprietary bioinformatics tool in order to predict the performance for a given pair of enzyme sequence - catalytic reaction (Carbonell et al, 2011).

Initially, mutational analysis of candidate enzymes for reverse DCD dehydrogenase was examined. Based on the above described methods, the following enzymes and associated mutations were determined.

With respect to the reverse reaction of (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) dehydrogenase (Figure 5), mutations in the parent enzyme DCD dehydrogenase (EC 1.3.1.53) were considered for their ability to reverse the reaction direction. The parent gene sequence is the DCD dehydrogenase gene TphB from the organism Comamonas testosteroni (GenBank: AAX18943); the reference protein sequence is UniProt: Q5D0X4. DCD dehydrogenase from Comamonas testosteroni T-2 has been purified and some of its properties characterized (Sailer et al, 1995). It corresponds to a homodimer (60.0 kD), with KM for NAD+ = 43 μM and KM for DCD = 90 μM. This enzyme employs one cofactor, iron. According to UniProt, the closest template (27% similarity) having a known crystal structure is 1ptmA (E. coli PdxA). According to PFAM and InterPro, it is classified in the family of pyridoxal phosphate biosynthetic proteins PdxA. According to Sailer et al, 1995, DCD dehydrogenase shows most similarity to benzene dihydrodiol dehydrogenase (EC 1.3.1.19). It has low similarity with other gene sequences in databases: even in the 50% cluster of UniProt, it belongs to a cluster of 6 proteins, consisting of homologues in Comamonas strains, except for one in another organism (Ramlibacter tataouinensis), which is less annotated.

Annotations in MetaCyc predict that the reaction is reversible. In addition, the reversibility and the possibility of tuning the direction or "catalytic bias" of hydrogenases have been investigated and reported by several groups (AbouHamdan et al, 2012), (Mcintosh et al, 2011), (Stiebritz et al, 2012). The catalytic cycle of oxidoreductases involves various steps (substrate binding, product release, proton and electron transfers, active site chemistry). The mechanism employed for biasing the enzyme in one direction was to slow the step that is rate-limiting for the enzyme only when it is working in the opposite direction (AbouHamdan et al, 2012). According to this study, these rate-limiting steps might occur on sites of the protein that are remote from the active site.

The procedure begins with an identification of conserved regions of the DCD dehydrogenase enzyme sequence. Since there is limited information available about the functional role of residues and no structure is available, a multiple sequence alignment from a BLAST search was run and downloaded from UniProt. In the following table, the closest homologues found by BLAST are shown. Conservation of each position in the sequence was scored by using t_coffee with BLOSUM62 as substitution matrix.
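As a rough illustration of per-position conservation scoring, the Python sketch below rates each alignment column on a 0-9 scale. It is a deliberately simplified stand-in and not the t_coffee procedure: it uses the fraction of sequences sharing the most common residue instead of a BLOSUM62-weighted score, counts gaps as mismatches, and the 0-9 scaling and toy alignment are assumptions made for the example.

```python
from collections import Counter

def conservation_index(alignment):
    """Return one 0-9 conservation value per column of an aligned sequence set."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        # Gaps ('-') never count toward the most common residue.
        residues = Counter(c for c in column if c != "-")
        top = residues.most_common(1)[0][1] if residues else 0
        # Fraction of rows sharing the dominant residue, floored onto 0..9.
        scores.append(int(9 * top / len(column)))
    return scores

# Toy alignment of four made-up fragments.
toy = ["ACD", "ACE", "AGE", "A-F"]
print(conservation_index(toy))
```

A substitution-matrix score would additionally reward conservative replacements (for example leucine for isoleucine), which this identity-based index treats as full mismatches.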

List of closest homologues to TphB from Comamonas testosteroni.

gb|AAX18943.1| terephthalate dihydrodiol dehydrogenase [Comamonas testosteroni]
gi|60326848 terephthalate dihydrodiol dehydrogenase [Comamonas testosteroni]
gi|78210740 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase [Comamonas sp. E6]
gi|221066791 4-hydroxythreonine-4-phosphate dehydrogenase [Comamonas testosteroni KF-1]
gi|264677631 terephthalate dihydrodiol dehydrogenase [Comamonas testosteroni CNB-2]
gi|78210749 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase [Comamonas sp. E6]
gi|337279312 4-hydroxythreonine-4-phosphate dehydrogenase [Ramlibacter tataouinensis TTB310]
gi|388565267 4-hydroxythreonine-4-phosphate dehydrogenase [Hydrogenophaga sp. PBC]
gi|388824001 4-hydroxythreonine-4-phosphate dehydrogenase [Rhodococcus opacus M213]
gi|384101989 4-hydroxythreonine-4-phosphate dehydrogenase [Rhodococcus imtechensis RKJ300]
gi|111024961 4-hydroxythreonine-4-phosphate dehydrogenase [Rhodococcus jostii RHA1]
gi|377811638 unnamed protein product [Burkholderia sp. YI23]
gi|40787272 putative pyridoxal phosphate biosynthesis protein [Rhodococcus sp. DK17]

The structural study was performed through homology modeling. The template model was 2hilA (putative 4-hydroxythreonine-4-phosphate dehydrogenase from Salmonella typhimurium). Quality of the model was estimated with a z-score = -2.44, a value that can be considered close in quality to "good models" (see Figure 15 and Benkert et al, 2011). The structural alignment between the template and DCD dehydrogenase is shown in Figure 16, and the secondary and tertiary structure of the model in Figure 17. Active sites in the enzyme were predicted using Q-SiteFinder (Laurie et al, 2005), an energy-based method for the prediction of protein-ligand binding sites. Figure 18 shows the predicted pocket regions identified by Q-SiteFinder.

Next, a docking of the desired substrate (PCA) into the structural model (Grossdidier et al, 2011) was performed. Then, values from conservation, prediction of active sites and interface from the docking depending on the binding energies were combined. The cluster of substrate conformers with the lowest fitness energy was obtained at -1618.0659. Full fitness is a global energy score that considers internal, solvent and surface energies. This conformation with the lowest fitness was thus selected as a model of the enzyme-substrate transition state, and the docking area was then computed in order to determine the active site, as shown in Figure 19.

By putting together the different computed scores for residues in TphB, a list of 30 hot-spot positions was selected, as shown in the table below, which were scored by their docking fitness, distance to ligand, active site, and conservation index.
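The values reported for the hot-spot positions in this section are numerically consistent with a simple relation, score = (fitness + calpha_dist) x (ligand_site + cons). This relation is inferred here from the listed numbers and is not stated in the disclosure, so the Python sketch below should be read as a consistency check under that assumption rather than as the scoring method actually used.

```python
def hotspot_score(fitness, calpha_dist, ligand_site, cons):
    """Assumed combination of the four per-residue values into one score."""
    return (fitness + calpha_dist) * (ligand_site + cons)

FITNESS = -1618.06590  # lowest full-fitness value reported for the docked PCA

# (label, calpha_dist, ligand_site, cons) for three of the listed positions.
rows = [
    ("250 Y", 3.78507, 2, 9),
    ("238 A", 7.05418, 4, 9),
    ("203 H", 8.61715, 4, 8),
]

# Most negative score first, which matches the order of the listed positions.
ranked = sorted(
    (hotspot_score(FITNESS, dist, site, cons), label)
    for label, dist, site, cons in rows
)
for score, label in ranked:
    print(f"{label} score={score:.5f}")
```

Under this reading, the ranking is driven mainly by the sum of the active-site and conservation indices, with the distance to the ligand acting as a small correction.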

Identified hot-spot positions along with their scores, for Comamonastestoteroni

DCD dehydrogenase.

238 A score=- -20943.15236;fitness=-

1618.06590;calpha . _dist= =7.05418 ;ligand_site=4;cons=9

203 H score=- -19313.38500;fitness=-

1618.06590;calpha . _dist= =8.61715;ligand_site=4;cons=8

151 L score=- -19307.51652;fitness=-

1618.06590;calpha . _dist= =9.10619;ligand_site==4;cons==8

257 Q score=- -19287.56688;fitness=-

1618.06590;calpha . _dist= = 10.76866;ligand_site=4;cons=8

250 Y score=- -17757.08913;fitness=-

1618.06590;calpha . _dist= =3.78507;ligand_site=2;cons=9

152 R score=- -17681.26833;fitness=-

1618.06590;calpha . _dist= = 10.67787 ;ligand_site=3 ;cons=8

150 G score=- -17669.74935;fitness=-

1618.06590;calpha_dist=11.72505;ligand_site=2;cons=9

247 H score=-16158.57420;fitness=-1618.06590;calpha_dist=2.20848;ligand_site=1;cons=9
196 A score=-16148.67460;fitness=-1618.06590;calpha_dist=3.19844;ligand_site=1;cons=9
194 Q score=-16109.63470;fitness=-1618.06590;calpha_dist=7.10243;ligand_site=1;cons=9
248 D score=-16098.56440;fitness=-1618.06590;calpha_dist=8.20946;ligand_site=4;cons=6
261 P score=-16074.01380;fitness=-1618.06590;calpha_dist=10.66452;ligand_site=3;cons=7
201 N score=-16064.28580;fitness=-1618.06590;calpha_dist=11.63732;ligand_site=1;cons=9
240 M score=-14539.84695;fitness=-1618.06590;calpha_dist=2.52735;ligand_site=0;cons=9
233 D score=-14521.84785;fitness=-1618.06590;calpha_dist=4.52725;ligand_site=0;cons=9
198 F score=-14513.58558;fitness=-1618.06590;calpha_dist=5.44528;ligand_site=1;cons=8
252 A score=-14464.70325;fitness=-1618.06590;calpha_dist=10.87665;ligand_site=1;cons=8
193 P score=-14460.97734;fitness=-1618.06590;calpha_dist=11.29064;ligand_site=1;cons=8
243 A score=-12894.88816;fitness=-1618.06590;calpha_dist=6.20488;ligand_site=1;cons=7
231 A score=-12868.56120;fitness=-1618.06590;calpha_dist=9.49575;ligand_site=1;cons=7
246 K score=-11306.31404;fitness=-1618.06590;calpha_dist=2.87818;ligand_site=2;cons=5
235 P score=-11273.53647;fitness=-1618.06590;calpha_dist=7.56069;ligand_site=1;cons=6
237 G score=-8065.85335;fitness=-1618.06590;calpha_dist=4.89523;ligand_site=0;cons=5
245 R score=-8048.23515;fitness=-1618.06590;calpha_dist=8.41887;ligand_site=1;cons=4
244 Q score=-4832.42001;fitness=-1618.06590;calpha_dist=7.25923;ligand_site=2;cons=1
232 V score=-4828.95903;fitness=-1618.06590;calpha_dist=8.41289;ligand_site=1;cons=2
242 L score=-3231.58214;fitness=-1618.06590;calpha_dist=2.27483;ligand_site=1;cons=1
197 V score=-3223.95412;fitness=-1618.06590;calpha_dist=6.08884;ligand_site=2;cons=0
236 M score=-1615.32540;fitness=-1618.06590;calpha_dist=2.74050;ligand_site=0;cons=1
249 L score=-1610.52979;fitness=-1618.06590;calpha_dist=7.53611;ligand_site=1;cons=0

Selected positions were mutated in order to build a combinatorial library containing all mutations and pairwise combinations of mutations. The list of sequences (total: 372,620) was then submitted to the predictor of enzyme sequence-reaction performance (tensor product), which had originally been trained with data from reactions that are similar to the one sought (the reverse DCD dehydrogenase).
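The hot-spot entries above share one record layout (position, residue, then `key=value` fields separated by semicolons), so they can be read mechanically. The following is a minimal sketch, not the workflow used in the study: it parses three entries, written with the residue numbering used in the text (M240, V232), and checks an assumed relation that the tabulated numbers happen to satisfy, namely score = (ligand_site + cons) × (fitness + calpha_dist). That relation is inferred from the listed values only; the document does not state how the score is computed, and the names `parse_entry` and `combined_score` are hypothetical.

```python
import math
import re

# Three entries copied from the hot-spot list (position, residue, fields).
ENTRIES = """
247 H score=-16158.57420;fitness=-1618.06590;calpha_dist=2.20848;ligand_site=1;cons=9
240 M score=-14539.84695;fitness=-1618.06590;calpha_dist=2.52735;ligand_site=0;cons=9
232 V score=-4828.95903;fitness=-1618.06590;calpha_dist=8.41289;ligand_site=1;cons=2
"""

LINE_RE = re.compile(r"^(\d+)\s+([A-Z])\s+(.*)$")


def parse_entry(line):
    """Turn one 'pos residue key=value;...' line into a dictionary."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    position, residue, fields = match.groups()
    record = {"position": int(position), "residue": residue}
    for field in fields.split(";"):
        key, value = field.split("=", 1)
        record[key.strip()] = float(value)
    return record


def combined_score(record):
    """Assumed relation, inferred from the tabulated values only."""
    weight = record["ligand_site"] + record["cons"]
    return weight * (record["fitness"] + record["calpha_dist"])


records = [parse_entry(line) for line in ENTRIES.strip().splitlines()]

# Most negative score first, which is the order used in the list.
for rec in sorted(records, key=lambda r: r["score"]):
    consistent = math.isclose(rec["score"], combined_score(rec), abs_tol=1e-4)
    print(rec["position"], rec["residue"], rec["score"], consistent)
```

A check of this kind is mainly useful for spotting transcription errors in the list, since a digit lost from any field breaks the relation for that entry.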

Two residues appeared at the top of the predictions, M240 and V232, both to be mutated to lighter residues such as glycine and alanine. As shown in Figures 20A and 20B, the mutation from methionine to glycine might make the active site much more flexible in order to accommodate the substrate, whereas the mutation from valine to alanine might be linked to substrate recognition.

Based on the above, a list of top ranked variants for EC 1.3.1.53 which promote the desired enzymatic reaction for reversing DCD dehydrogenase was determined, and listed below.

Top ranked variants for benzoate 1,2-dioxygenase (EC 1.14.12.10) were also determined using the method described above. This enzyme belongs to the benzoate degradation pathway (from benzoate to catechol). Benzoate is known to play a role in the anaerobic degradation of terephthalate (Kleerebezem et al, 1999). The sequence is that of the benzoate 1,2-dioxygenase beta subunit from Burkholderia thailandensis MSMB43, which consists of 163 amino acids. A structural template was used directly (no modeling), corresponding to PDB code 3E99 (crystal structure of the beta subunit of the benzoate 1,2-dioxygenase from Burkholderia mallei ATCC 23344). The structure is an alpha and beta protein (a+b) with the NTF2-like superfamily fold according to the SCOP classification. No information about the catalytic site was found in the Catalytic Site Atlas.

Docking was performed with the substrate PCA, which was positioned in one of the main cavities of the protein. Then, values from conservation, prediction of active sites and the interface from the docking, depending on the binding energies, were combined. The cluster of substrate conformers with the lowest fitness energy was obtained at -1089.09. 60 hot-spots were considered for mutagenesis in order to build a combinatorial library containing all mutations and pairwise combinations of mutations. The list of sequences (total: 1,021,020) was then submitted to the predictor of enzyme sequence-reaction performance, which had originally been trained with data from reactions that are similar to the one sought (the reverse DCD dehydrogenase).

The best candidate hot-spots for mutations, shown below, are to some extent directly linked to the ligand interface (Figure 21). The main kind of mutation observed is the substitution of an aromatic residue (tryptophan) by a considerably lighter amino acid such as glycine or alanine. One may therefore consider that replacing an aromatic residue with a lighter one allows the molecules to adopt a better conformation, which makes it possible to extend the interface between the enzyme and PCA.

Based on the above, a list of top ranked variants for EC 1.14.12.10 which promote the desired enzymatic reaction for reversing DCD dehydrogenase was determined, and is listed below.

Pos1    Pos2    Consensus score    p-value
W25G    W28A    12.04              3.03E-14
D21G    K23A    9.769              5.69E-10
W25A    W28L    9.387              2.45E-09
W25A    W28A    8.907              1.42E-08
S17A    H91I    8.72               2.77E-08
S17A    W28L    8.55               4.97E-08
W25R    W28L    8.53               5.24E-08
W28L    H91I    8.389              8.54E-08
W25G    W28L    8.28               1.23E-07
S17A    Y125A   8.14               1.97E-07

Top ranked variants for 5-epi-aristolochene 1,3-dihydroxylase (EC 1.14.13.119) which promote the desired enzymatic reaction for reversing DCD dehydrogenase were determined. The top ranked candidate in this family of enzymes corresponds to a gene from the plant Nicotiana tabacum. This enzyme belongs to the cytochrome P450 family and is a membrane protein. It can therefore be challenging to express in host bacteria such as E. coli, although there are precedents for expressing in E. coli plant cytochrome P450 enzymes that have been evolved by directed evolution toward the desired activity (Ajikumar 2010). The parent reaction of this gene produces the plant secondary metabolite capsidiol, derived from the mevalonate pathway, from the fungal toxin aristolochene, a bicyclic sesquiterpene produced by certain fungi, including the cheese mold Penicillium roqueforti. This last step in the biosynthesis of capsidiol catalyzes the regio- and stereospecific insertion of two hydroxyl moieties into the bicyclic sesquiterpene 5-epi-aristolochene (Takahashi et al, 2005). Our approach was to take advantage of the stereospecific hydroxylating activity of this enzyme in order to search for variants with the potential ability to catalyze the reverse DCD dehydrogenase reaction.

This protein consists of a sequence of 504 amino acids. Structures in the cytochrome P450 family are not always well characterized, and good templates are not always available. In our case, the modeler selected the PDB structure 3CZH:B as template, with a z-score of -4.513. Therefore, the quality of the model structure was rather low (see Figure 15). The proposed top substitutions are given below and depicted in Figure 22. It should be noted that, in a region adjacent to the one considered for this enzyme, mutations at residues S368 and I463 have reportedly been observed to alter substrate specificity (Takahashi et al, 2005). Based on the above, a list of top ranked variants for EC 1.14.13.119 which promote the desired enzymatic reaction for reversing DCD dehydrogenase was determined, and is listed below.

Candidate enzymes and mutations were also determined for the reverse TPA 1,2-dioxygenase reaction, as described above. As in previous sections, we start our mutagenesis study for the reverse reaction of terephthalate (TPA) 1,2-dioxygenase (Figure 11) by considering the possibility of reversing the reaction direction in the parent enzyme. This enzyme belongs to a class of multicomponent enzyme systems (EC 1.14.12.-) that catalyze reductive dihydroxylation of their substrates. These enzymes are typically quite promiscuous, catalyzing the oxidation of a wide range of compounds in addition to the native substrate (Parales et al, 2006). This enzyme is composed of three subunits. In our study, we have considered mutations in the large subunit of the oxygenase component (TphA2 from Comamonas sp. strain E6) (Sasoh et al, 2006). This subunit is known to play a major role in determining substrate specificity in ring-hydroxylating dioxygenases (see Parales et al, 2006, and references therein). It contains two conserved domains, a Rieske-type [2Fe-2S] cluster-binding motif and an aromatic ring-hydroxylating catalytic motif.

The oxygenase component of TPA 1,2-dioxygenase has been purified from strains T-2 and T-7 of Comamonas testosteroni (Fukuhara et al, 2008). For the two strains it was estimated to be a heterotetramer and a heterohexamer, composed of two subunits of 49 kDa and 18 kDa, and of 46 kDa and 16 kDa, respectively. It showed activity only toward TPA, and the cofactor iron was required. The KM value for TPA was determined to be 72 μM, while the KM values for NADH and NADPH were 51 μM and 10 μM, respectively (Fukuhara et al, 2008).

Homology modeling was performed in automated mode by the modeler from template 3VCA, chain A, which corresponds to the crystal structure of a ring-hydroxylating dioxygenase from Sinorhizobium meliloti. Sequence identity was low (less than 20%) and the z-score was -5.9. Therefore, the quality of this model was low (see Figure 15). In addition, structures 107N:A, 3GZX:A, 1ULLE, 1WQL:A and 2B1X:E, which correspond to enzymes that are closer both in terms of function and similarity, were manually submitted as templates to the modeler. Following the same workflow as above (conservation sites, prediction of active sites in the structure, and docking with substrate DCD), a list of 34 hot-spot positions was identified in the sequence (out of 413 amino acids), as shown in the table below. The positions were scored by their docking fitness, distance to ligand, active site, and conservation index.

Identified hot-spot positions along with their scores, for Comamonas testosteroni TPADO.

142 R score=-40890.00150;fitness=-2727.88280;calpha_dist=1.88270;ligand_site=6;cons=9
104 Y score=-35354.34877;fitness=-2727.88280;calpha_dist=8.31751;ligand_site=4;cons=9
85 R score=-32702.46156;fitness=-2727.88280;calpha_dist=2.67767;ligand_site=3;cons=9
65 I score=-32615.12592;fitness=-2727.88280;calpha_dist=9.95564;ligand_site=3;cons=9
64 P score=-29924.58018;fitness=-2727.88280;calpha_dist=7.46642;ligand_site=2;cons=9
63 T score=-29907.58320;fitness=-2727.88280;calpha_dist=9.01160;ligand_site=2;cons=9
144 L score=-29896.84907;fitness=-2727.88280;calpha_dist=9.98743;ligand_site=2;cons=9
108 S score=-29875.73171;fitness=-2727.88280;calpha_dist=11.90719;ligand_site=2;cons=9
81 R score=-27255.80870;fitness=-2727.88280;calpha_dist=2.30193;ligand_site=1;cons=9
86 G score=-27254.11570;fitness=-2727.88280;calpha_dist=2.47123;ligand_site=1;cons=9
79 E score=-27234.01460;fitness=-2727.88280;calpha_dist=4.48134;ligand_site=1;cons=9
138 E score=-27215.17030;fitness=-2727.88280;calpha_dist=6.36577;ligand_site=1;cons=9
102 C score=-27195.18210;fitness=-2727.88280;calpha_dist=8.36459;ligand_site=2;cons=8
143 K score=-27193.19220;fitness=-2727.88280;calpha_dist=8.56358;ligand_site=1;cons=9
62 E score=-27188.13550;fitness=-2727.88280;calpha_dist=9.06925;ligand_site=1;cons=9
159 E score=-27185.37190;fitness=-2727.88280;calpha_dist=9.34561;ligand_site=3;cons=7
121 E score=-27171.65350;fitness=-2727.88280;calpha_dist=10.71745;ligand_site=2;cons=8
36 R score=-27168.14530;fitness=-2727.88280;calpha_dist=11.06827;ligand_site=1;cons=9
157 F score=-27165.42230;fitness=-2727.88280;calpha_dist=11.34057;ligand_site=2;cons=8
103 V score=-27164.15300;fitness=-2727.88280;calpha_dist=11.46750;ligand_site=1;cons=9
82 C score=-24534.10566;fitness=-2727.88280;calpha_dist=1.87106;ligand_site=0;cons=9
83 A score=-24530.87070;fitness=-2727.88280;calpha_dist=2.23050;ligand_site=0;cons=9
84 H score=-24521.91525;fitness=-2727.88280;calpha_dist=3.22555;ligand_site=0;cons=9
141 P score=-24502.34079;fitness=-2727.88280;calpha_dist=5.40049;ligand_site=1;cons=8
139 H score=-21783.33536;fitness=-2727.88280;calpha_dist=4.96588;ligand_site=1;cons=7
136 K score=-19022.49069;fitness=-2727.88280;calpha_dist=10.38413;ligand_site=1;cons=6
87 A score=-16335.69210;fitness=-2727.88280;calpha_dist=5.26745;ligand_site=2;cons=4
105 H score=-13595.10315;fitness=-2727.88280;calpha_dist=8.86217;ligand_site=3;cons=2
90 A score=-10868.01988;fitness=-2727.88280;calpha_dist=10.87783;ligand_site=2;cons=2
89 I score=-8164.48965;fitness=-2727.88280;calpha_dist=6.38625;ligand_site=2;cons=1
78 F score=-8158.35915;fitness=-2727.88280;calpha_dist=8.42975;ligand_site=1;cons=2
115 L score=-8158.05609;fitness=-2727.88280;calpha_dist=8.53077;ligand_site=2;cons=1
100 F score=-5433.22770;fitness=-2727.88280;calpha_dist=11.26895;ligand_site=1;cons=1
59 F score=-2716.16243;fitness=-2727.88280;calpha_dist=11.72037;ligand_site=1;cons=0

Selected positions were mutated, leading to a mutant library of 449,480 sequences containing all combinations of mutants and pairs of mutants. The best-scored variants, shown below and in Figure 23, contained the substitution of Ile89 by Ala, a small amino acid, a substitution that might play a role in substrate recognition. Together with that substitution, mutations at positions P141, R142, F59, E62, and T63 appear among the highest scores.
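The library size quoted above can be checked with a short calculation. The sketch below rests on an assumption that is not stated in the text: each position is counted with 20 residue choices, and pairs of positions are counted in both orders, giving N = 20n + 400n(n - 1) for n hot-spot positions. With n = 34 this reproduces 449,480. The function names are hypothetical.

```python
def library_size(n_positions, n_residues=20):
    """Library size under the assumed counting rule (not stated in the text)."""
    singles = n_positions * n_residues
    ordered_pairs = n_positions * (n_positions - 1) * n_residues ** 2
    return singles + ordered_pairs


def positions_for_size(total, n_residues=20, max_positions=500):
    """Return the number of positions that yields `total`, or None."""
    for n in range(1, max_positions + 1):
        if library_size(n, n_residues) == total:
            return n
    return None


# 34 hot-spot positions, as reported for TPA 1,2-dioxygenase.
print(library_size(34))

# Position counts implied by the other library totals quoted in the document.
for total in (372620, 1021020, 1989420):
    print(total, positions_for_size(total))
```

Under the same assumption, the other totals quoted in the document (372,620, 1,021,020 and 1,989,420) correspond to 31, 51 and 71 positions, respectively. This is an arithmetic observation only, and it does not establish how the libraries were actually enumerated.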

Based on the above, a list of top ranked variants for EC 1.14.12.15 which promote the desired enzymatic reaction was determined, and listed below.

Top ranked variants for 3-dehydroquinate dehydratase (DHQD) (EC 4.2.1.10) which promote the desired enzymatic reaction were determined. This enzyme belongs to the family of lyases and is found in the shikimate pathway, the pathway that allows plants, fungi, and bacteria to produce aromatic amino acids. The sequence (UniProt P58687) consists of 252 amino acids. For the selection of hot-spots, we selected the PDB structure 3M7W:A from Salmonella Typhimurium LT2 (gene aroD), to which DCD was docked (Figure 24).

34 hot-spots were selected based on their scores, leading to a combinatorial library of 449,480 mutants. After the mutagenesis study was performed, the best mutations appeared distant from the region where the substrate was positioned. However, residues that appear at the top of the prediction (S141, H143 proton acceptor, P169 Schiff-base intermediate with substrate) are reported to play a role in the reaction mechanism (Yao et al, 2012), which provides some evidence of the potential modulating effect of the proposed substitutions on substrate specificity and on the reaction mechanism. Based on the above, a list of top ranked variants for EC 4.2.1.10 which promote the desired enzymatic reaction was determined, and is listed below.

Top ranked variants for salicylaldehyde dehydrogenase (EC 1.2.1.65) which promote the desired enzymatic reaction were determined. The parent sequence is that of gene nahF in Pseudomonas putida. The closest homologues (35% identity) with a known structure are the crystal structure 3R64 of the NAD-dependent benzaldehyde dehydrogenase from Corynebacterium glutamicum (EC 1.2.1.28) (according to the UniProt site) and 2BJK:A, a 1-pyrroline-5-carboxylate dehydrogenase from Thermus (EC 1.5.1.12) (according to NCBI). The modeler took 1BXS:D, an aldehyde dehydrogenase, as template. The quality of the model was good (z-score = -2.6). After the docking was performed, the substrate was positioned in the main pocket (see Figure 25).

Based on the above, a list of top ranked variants for EC 1.2.1.65 which promote the desired enzymatic reaction was determined, and listed below.

Top ranked variants for 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase (EC 1.3.1.25) which promote the desired enzymatic reaction were determined. This enzyme transforms (1R,6R)-1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate into catechol. It belongs to the benzoate degradation pathway that transforms benzoate into catechol through two steps, EC 1.14.12.10 and EC 1.3.1.25. The top ranked sequence is the one belonging to the bacterium Novosphingobium aromaticivorans (NCBI: YP_001165609.1, UniProt: A4XDS2). According to UniProt, no structure is known. The closest template (35% similarity) is 1VL8. The sequence has high to very high similarity to 6 proteins, which are homologs in other Novosphingobium or Sphingomonas strains. The structural study was performed through homology modeling. The template model was 1VL8:B (gluconate 5-dehydrogenase (TM00441) from Thermotoga maritima). The model quality score was 0.61 (z-score = -2.36), a value that can be considered close to that of a good model. Structural analysis reveals that gluconate 5-dehydrogenase adopts a protein fold similar to the ones found in members of the short chain dehydrogenase/reductase (SDR) family, while the enzyme itself represents a previously uncharacterized member of this family. Docking of the desired substrate (DCD) into the structural model was performed (Figure 26). Then, values from conservation, prediction of active sites and the interface from the docking, depending on the binding energies, were combined. The cluster of substrate conformers with the lowest fitness energy was obtained at -1318.33.

Selected positions were mutated in order to build a combinatorial library containing all mutations and pairwise combinations of mutations. The list of sequences (total: 1,989,420) was then submitted to the predictor of enzyme sequence-reaction performance, which had originally been trained with data from reactions that are similar to the one sought (the reverse TPA 1,2-dioxygenase).

The top candidate mutations are given below. Mutations are found mainly on proline, which might imply conformational changes that improve the interaction with DCD in the pocket. As shown in Figure 27, the two main mutations appear on Tyr158 and Thr150. On the one hand, we observe the substitution Y158E or substitutions to lighter amino acids. Because this residue lies directly in the pocket, and thus quite close to DCD, it is reasonable that replacing a bulky aromatic residue, which imposes more structural constraints, with a lighter one would leave more space for the substrate. On the other hand, the substitution T150A is located around the access zone of the pocket, and appears to be a mutation that also makes the access of the substrate to the pocket easier.

Based on the above, a list of top ranked variants for EC 1.3.1.25 which promote the desired enzymatic reaction was determined, and is listed below.

Pos1    Pos2    Consensus score    p-value
P157A   Y158V   9.201              1.49E-08
P157A   Y158E   8.09               5.46E-07
I112A   Y158V   7.899              9.75E-07
C185A   P188A   7.67               1.89E-06
T150A   Y158V   7.283              5.73E-06
P157A   Y158Q   7.192              7.37E-06
Y158V   E233A   7.15               8.22E-06
Y158V   V256T   7.058              1.06E-05
P157A   Y158A   6.976              1.32E-05
Y158V   R227A   6.97               1.35E-05
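The variant list above uses the usual substitution notation (wild-type residue, position, new residue). As an illustration only, the sketch below parses the ten rows listed for EC 1.3.1.25, checks that every position is always written with the same wild-type residue, and counts how often each position occurs among the top ranked pairs. The helper names `parse_mutation` and `parse_row` are hypothetical and are not part of the described workflow.

```python
import re
from collections import Counter

# Rows copied from the list of top ranked variants for EC 1.3.1.25.
ROWS = """
P157A Y158V 9.201 1.49E-08
P157A Y158E 8.09 5.46E-07
I112A Y158V 7.899 9.75E-07
C185A P188A 7.67 1.89E-06
T150A Y158V 7.283 5.73E-06
P157A Y158Q 7.192 7.37E-06
Y158V E233A 7.15 8.22E-06
Y158V V256T 7.058 1.06E-05
P157A Y158A 6.976 1.32E-05
Y158V R227A 6.97 1.35E-05
"""

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(token):
    """Split a token such as 'Y158V' into ('Y', 158, 'V')."""
    match = MUTATION_RE.match(token)
    if match is None:
        raise ValueError(f"malformed mutation: {token!r}")
    wild_type, position, substitute = match.groups()
    return wild_type, int(position), substitute


def parse_row(line):
    """Parse one row: two mutations, consensus score, p-value."""
    first, second, score, p_value = line.split()
    return parse_mutation(first), parse_mutation(second), float(score), float(p_value)


rows = [parse_row(line) for line in ROWS.strip().splitlines()]

# Each position should always carry the same wild-type residue.
wild_type_at = {}
for first, second, _, _ in rows:
    for wild_type, position, _ in (first, second):
        assert wild_type_at.setdefault(position, wild_type) == wild_type

# Count how often each site occurs among the listed pairs.
site_counts = Counter(
    f"{wild_type}{position}"
    for first, second, _, _ in rows
    for wild_type, position, _ in (first, second)
)
print(site_counts.most_common(2))  # [('Y158', 9), ('P157', 4)]
```

The count makes the pattern in the list explicit: Y158 occurs in nine of the ten listed pairs and P157 in four.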

In summary, several predicted candidates for the reverse reactions of EC 1.3.1.53 and EC 1.14.12.15 were determined (Figure 28). In both cases, candidates were found mainly among enzymes involved in degradation pathways such as those for benzoate or naphthalene. In addition, an enzyme involved in the mevalonate pathway was identified as a candidate for reversing DCD dehydrogenase, and another enzyme involved in the shikimate pathway was identified as a candidate for reversing TPA 1,2-dioxygenase. Moreover, the mutagenesis workflow described above provided several mutations of the parent enzymes (TPA 1,2-dioxygenase and DCD dehydrogenase) from Comamonas sp. strain E6, which were determined to be capable of reversing the reaction direction from degradation to production of TPA from PCA. Furthermore, the studies also indicated mutants of the other described candidate enzymes which would favor the desired reactions. These comprise enzymes capable of inverting DCD dehydrogenase (benzoate 1,2-dioxygenase and 5-epi-aristolochene 1,3-dihydroxylase) and enzymes capable of inverting TPA 1,2-dioxygenase (3-dehydroquinate dehydratase, salicylaldehyde dehydrogenase, and 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase).

In one embodiment, the disclosed terephthalate pathway comprises a first pathway for converting the carbon source to dihydroxybenzoate; a second pathway for converting at least a fraction of the dihydroxybenzoate to at least DCD; and a third pathway for converting at least a fraction of DCD to at least TPA. By this expression it is intended that the disclosed terephthalate pathway comprises reactions for accumulating in the non-naturally occurring microbial organism at least the two intermediate products, dihydroxybenzoate and DCD. Other intermediate products may be formed in the pathway. Stated another way, the reactions of the disclosed terephthalate pathway may be grouped into a different set of pathways, and such a set of pathways is within the disclosure of the present invention provided that it comprises the formation of at least dihydroxybenzoate and DCD.

An endogenous pathway is comprised of reactions catalyzed only by homologous enzymes.

An exogenous pathway comprises at least one reaction catalyzed by a heterologous enzyme.

The first pathway converts the carbon source to the dihydroxybenzoate; the conversion occurs by means of reactions catalyzed by enzymes, which convert the carbon source to dihydroxybenzoate through the formation of intermediate compounds. The reactions may be catalyzed by homologous and/or heterologous enzymes. In one embodiment, the first pathway comprises a set of reactions for converting the carbon source to D-erythrose 4-phosphate (E4P) and phosphoenolpyruvate (PEP). Different sets of reactions may be included. One set of reactions comprises glycolysis, for the production of PEP from the carbon source. In one embodiment, glycolysis is an endogenous pathway. Another set of reactions further comprises the pentose phosphate pathway, for the production of E4P from the carbon source. These pathways are well known to a person skilled in the art.

The first pathway may further comprise the conversion of the two intermediate compounds E4P and PEP to aromatics through a set of other intermediate compounds.

A common pathway of aromatic compound biosynthesis from E4P and PEP is represented in Figure 29. The pathway of aromatic compound biosynthesis is endogenous. Aromatics pathways are endogenous in a wide variety of microorganisms, and are used for the production of various aromatic compounds. The aromatic pathway leads from E4P and PEP to chorismic acid with many intermediates in the pathway. The intermediates in the pathway include 3-deoxy-D-arabino-heptulosonic acid 7-phosphate (DAHP), 3-dehydroquinate (DHQ), 3-dehydroshikimate (DHS), shikimic acid, shikimate 3-phosphate (S3P), and 5-enolpyruvoylshikimate-3-phosphate (EPSP). The enzymes in the common pathway include DAHP synthase, DHQ synthase, DHQ dehydratase, shikimate dehydrogenase, shikimate kinase, EPSP synthase and chorismate synthase. In one embodiment, the first pathway comprises the conversion of E4P and PEP to PCA by means of a modification of the pathway of aromatic compound biosynthesis of Figure 29. The modification of the pathway is disclosed in US 5,616,496, the teachings of which are herein incorporated by reference. In one embodiment, an exogenous nucleic acid is inserted in the host microbial organism, said exogenous nucleic acid encoding 3-dehydroshikimate dehydratase, for converting at least a fraction of the DHS to protocatechuate. The resulting pathway is schematically represented in Figure 30.

The enzyme 3-dehydroshikimate dehydratase may be recruited from the ortho cleavage pathways which enable microbes such as Neurospora, Aspergillus, Acinetobacter, Klebsiella, and Pseudomonas to use aromatics (benzoate and p-hydroxybenzoate) as well as hydroaromatics (shikimate and quinate) as sole sources of carbon for growth.

In one embodiment, the exogenous nucleic acid is aroZ from Klebsiella pneumoniae. In another embodiment, schematically represented in Figure 31, the first pathway further comprises an enzyme encoded by an exogenous nucleic acid that blocks the conversion of DHS to chorismate. Such mutants are unable to catalyze the conversion of 3-dehydroshikimate (DHS) to chorismate due to a mutation in one or more of the genes encoding shikimate dehydrogenase, shikimate kinase, EPSP synthase and chorismate synthase, and will thus accumulate elevated intracellular levels of DHS. As an example, E. coli AB2834 is unable to catalyze the conversion of 3-dehydroshikimate (DHS) to shikimic acid due to a mutation in the aroE locus, which encodes shikimate dehydrogenase. Similarly, E. coli AB2829 and E. coli AB2849 also result in increased intracellular levels of DHS.
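The topology of the first pathway described above can be summarized in a few lines of code. This is a sketch for illustration only: it encodes the common aromatic pathway steps in the order given in the text, adds the 3-dehydroshikimate dehydratase branch from DHS to PCA, and shows which compounds remain reachable when shikimate dehydrogenase is inactive, as in the aroE mutant. It models connectivity only, not fluxes or accumulation levels, and the names `COMMON_PATHWAY`, `PCA_BRANCH` and `reachable` are hypothetical.

```python
# (enzyme, substrate, product) steps of the common aromatic pathway.
COMMON_PATHWAY = [
    ("DAHP synthase", "E4P + PEP", "DAHP"),
    ("DHQ synthase", "DAHP", "DHQ"),
    ("DHQ dehydratase", "DHQ", "DHS"),
    ("shikimate dehydrogenase", "DHS", "shikimate"),
    ("shikimate kinase", "shikimate", "S3P"),
    ("EPSP synthase", "S3P", "EPSP"),
    ("chorismate synthase", "EPSP", "chorismate"),
]

# Branch introduced by the exogenous nucleic acid (for example aroZ).
PCA_BRANCH = ("3-dehydroshikimate dehydratase", "DHS", "PCA")


def reachable(start, steps, inactive=()):
    """Return the compounds reachable from `start`, skipping inactive enzymes."""
    found = [start]
    changed = True
    while changed:
        changed = False
        for enzyme, substrate, product in steps:
            if enzyme in inactive:
                continue
            if substrate in found and product not in found:
                found.append(product)
                changed = True
    return found


steps = COMMON_PATHWAY + [PCA_BRANCH]

# Host with the PCA branch and an intact common pathway.
print(reachable("E4P + PEP", steps))

# Host with the PCA branch and a defective shikimate dehydrogenase (aroE).
print(reachable("E4P + PEP", steps, inactive={"shikimate dehydrogenase"}))
```

In the second case the route to chorismate is cut at DHS, so the only remaining outlet for DHS in this simplified picture is the conversion to PCA.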

In the second pathway, at least a fraction of protocatechuate is converted to DCD. In one embodiment, the second pathway comprises at least a carboxylation reaction catalyzed by an enzyme, and a reduction reaction, catalyzed by another enzyme. At least one of these enzymes is a heterologous enzyme. The reactions may occur simultaneously or sequentially. When the reactions occur sequentially, the carboxylation may precede or follow the reduction.

The carboxylation reaction occurs in the para position of the dihydroxybenzoate, and may be performed by a heterologous enzyme of the class of carboxylases. This enzyme may further catalyze more than one reaction, one of which is the carboxylation reaction. The reduction reaction occurs in the aromatic ring of the dihydroxybenzoate. In one embodiment, the enzyme catalyzing the reduction is a heterologous enzyme of the class EC 1.3.-.-, which indicates the enzyme class of oxidoreductases acting on alkyl bonds. This enzyme may further catalyze more than one reaction, one of which is the reduction reaction. In one embodiment, the second pathway is comprised of a carboxylation reaction and a reduction reaction, that is, there are no other reactions comprised in the second pathway.

In an embodiment, the carboxylation reaction may be the reverse reaction of a decarboxylation reaction, both reactions being catalyzed by the same enzyme.

In an embodiment, the reduction reaction may be the reverse reaction of an oxidation reaction, both reactions being catalyzed by the same enzyme.

In an embodiment, in the second pathway the conversion of protocatechuate to DCD occurs in a single reductive carboxylation reaction, and the same enzyme catalyzes the carboxylation and reduction reactions. The carboxylation and reduction reactions may occur in one or in different catalytic domains, and they may occur simultaneously or sequentially. In one embodiment, the carboxylation reaction is the reverse reaction of a decarboxylation reaction and the reduction reaction is the reverse reaction of an oxidation reaction, the four reactions being catalyzed by the same enzyme.

In one embodiment, the enzyme is classified as EC 1.3.1.53, named 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase.

The non-naturally occurring microbial organism having a terephthalate pathway may further comprise a third pathway for converting at least a fraction of 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate to terephthalate. In one embodiment, the third pathway comprises a di-dehydration reaction and at least a part of the third pathway is catalyzed by a heterologous enzyme. The di-dehydration involves both hydroxyl groups of the vicinal diol of the DCD. In a di-dehydration reaction, two hydroxyl groups are removed from the substrate. The di-dehydration restores the aromaticity of the ring, leading to the formation of TPA. The di-dehydration reaction may be catalyzed by more than one heterologous enzyme. In this case, the di-dehydration reaction may be composed of two dehydration reactions occurring separately, each reaction being catalyzed by a different enzyme. The two dehydration reactions may occur simultaneously or sequentially.

In one embodiment, the third pathway is comprised of a di-dehydration reaction catalyzed by at least a heterologous enzyme, that is, there are no other reactions comprised in the third pathway. In one embodiment, this enzyme is a dehydratase. In another embodiment, the di-dehydration reaction is the reverse reaction of a di-hydroxylation reaction, both reactions being catalyzed by the same enzyme. In one embodiment, the enzyme is classified as EC 1.14.12.15, named terephthalate 1,2-dioxygenase.

In one embodiment, the non-naturally occurring microbial organism comprises at least a first exogenous nucleic acid encoding at least a first enzyme expressed in a sufficient amount to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from a dihydroxybenzoate, the at least a first enzyme being either a wild-type or a mutant and selected from the group consisting of (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase (EC 1.3.1.53), 5-epi-aristolochene 1,3-dihydroxylase (EC 1.14.13.119), 2,4-dichlorophenol 6-monooxygenase (EC 1.14.13.20), Phenol 2-monooxygenase (EC 1.14.13.7), Benzoate 4-monooxygenase (EC 1.14.13.12), and Benzoate 1,2-dioxygenase (EC 1.14.12.10).

In a further embodiment, the non-naturally occurring microbial organism further comprises at least a second exogenous nucleic acid encoding at least a second enzyme expressed in a sufficient amount to produce terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD), the at least a second enzyme being either a wild-type or a mutant and selected from the group consisting of TPA 1,2-dioxygenase (EC 1.14.12.15), 3-Dehydroquinate dehydratase (EC 4.2.1.10), Salicylaldehyde dehydrogenase (EC 1.2.1.65), and 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase (EC 1.3.1.25).

In yet a further embodiment, the non-naturally occurring microbial organism further comprises at least a third exogenous nucleic acid encoding at least a third enzyme expressed in a sufficient amount to produce a dihydroxybenzoate from a carbon source, the at least a third enzyme being 3-dehydroshikimate dehydratase.

In one embodiment, the (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase enzyme is SEQ ID NO: 1 and comprises at least one mutation at a position selected from the group consisting of A238, H203, L151, Q257, Y250, R152, G150, H247, A196, Q194, D248, P261, N201, M240, D233, F198, A252, P193, A243, A231, K246, P235, G237, R245, Q244, V232, L242, V197, M236, and L249, wherein the wild-type amino acid and amino acid position are indicated, and wherein the at least one mutation increases a reverse enzymatic reaction and the production of 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate.

In one embodiment, the (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase enzyme is SEQ ID NO: 1 and comprises at least one mutation pair selected from the group consisting of V232A/M240G; L151R/V232A; V232A/M236A; V232A/M240L; L151R/M240G; V232A/M240A; M236A/M240G; V232A/P235A; V232A/Y250A; and Q194 V232A, wherein the at least one mutation pair increases a reverse enzymatic reaction and the production of 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate.

In one embodiment, the Benzoate 1,2-dioxygenase enzyme is SEQ ID NO: 6 and comprises at least one mutation pair selected from the group consisting of W25G/W28A; D21G/K23A; W25A/W28L; W25A/W28A; S17A/H91I; S17A/W28L; W25R/W28L; W28L/H91I; W25G/W28L; and S17A/Y125A.

In one embodiment, the 5-epi-aristolochene 1,3-dihydroxylase enzyme is SEQ ID NO: 2 and comprises at least one mutation pair selected from the group consisting of F97A/N138L; F97L/N138L; F97A/M300A; F97A/N138A; F97A/N138D; F97A/M445A; F97A/K130N; H124A/W125A; F97G/N138L; and I116E/N138L.

In one embodiment, the TPA 1,2-dioxygenase enzyme is SEQ ID NO: 7 and comprises at least one mutation pair selected from the group consisting of I89A/P141E; F59K/I89A; I89A/R142K; I89A/P141L; E62M89A; T63R/I89A; T63K/I89A; I89A/R142Y; F78K/I89A; and I89A/K143G, wherein the at least one mutation pair increases a reverse enzymatic reaction and the production of terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD).

In one embodiment, the 3-Dehydroquinate dehydratase enzyme is SEQ ID NO: 8 and comprises at least one mutation pair selected from the group consisting of S141A/H143Q; T183A/A187G; P169V/A187G; E86G/A187G; T183L/A187G; P169V/T183A; E86G/T183A; E86G/P169V; P169N/K170Q; and P169V/T183L.

In one embodiment, the Salicylaldehyde dehydrogenase enzyme is SEQ ID NO: 9 and comprises at least one mutation pair selected from the group consisting of C352G/G353L; M361A/P362A; C352I/G353A; D315T/C352G; C352G/G353A; S312A/D315T; D321A/C322G; N337L/C352G; D315V/C352G; and P326L/M327G.

In one embodiment, the 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase enzyme is SEQ ID NO: 10 and comprises at least one mutation pair selected from the group consisting of P157A/Y158V; P157A/Y158E; I112A/Y158V; C185A/P188A; T150A/Y158V; P157A/Y158Q; Y158V/E233A; Y158V/V256T; P157A/Y158A; and Y158V/R227A.

In various embodiments, the dihydroxybenzoate is protocatechuate. In various embodiments, the wild-type (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase enzyme is encoded by TphB, from organism Comamonas testosteroni, and the wild-type TPA 1,2-dioxygenase enzyme comprises a first oxygenase component, encoded by TphA2 and TphA3, and a second reductase component, encoded by TphA1, from organism Comamonas testosteroni.

In various embodiments, at least one of the at least a first enzyme, the at least a second enzyme, or the at least a third enzyme is a heterologous enzyme.

In one embodiment, the non-naturally occurring microbial organism comprises a first pathway comprising converting the carbon source to D-erythrose 4-phosphate (E4P) and phosphoenolpyruvate (PEP), converting E4P and PEP to 3-dehydroshikimate (DHS), and the at least a third enzyme is expressed in a sufficient amount to produce a dihydroxybenzoate from DHS; and a second pathway comprising the at least a first exogenous nucleic acid encoding the at least a first enzyme expressed in a sufficient amount to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate.

In one embodiment, the production of 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) from the dihydroxybenzoate by the exogenously expressed wild-type or mutant (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase comprises a reverse enzymatic reaction as compared to an endogenous, homologous wild-type (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase. In one embodiment, the production of terephthalate from 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD) by the exogenously expressed wild-type or mutant TPA 1,2-dioxygenase comprises a reverse enzymatic reaction as compared to an endogenous, homologous wild-type TPA 1,2-dioxygenase.

In one embodiment, a method is provided for producing 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD), where the method comprises culturing the described non-naturally occurring microbial organism in the presence of a carbon source, and under conditions and for a sufficient amount of time to produce 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate (DCD).

In one embodiment, a method is provided for producing terephthalate, where the method comprises culturing the described non-naturally occurring microbial organism in the presence of a carbon source, and under conditions and for a sufficient amount of time to produce terephthalate.

In various embodiments of the method, the product accumulates in the non-naturally occurring microbial organism or is secreted from the non-naturally occurring microbial organism.

In various embodiments of the method, the non-naturally occurring microbial organism is cultured in anaerobic culture conditions.

The non-naturally occurring microbial organism comprising a terephthalate pathway is cultured in the presence of a carbon source.

Carbon sources commonly used as feed for the non-naturally occurring microbial organism may include carbohydrates, comprising complex carbohydrates such as cellulose and hemicellulose, starch, and simple carbohydrates such as oligomeric and monomeric sugars. Oligomeric and monomeric sugars may be derived from complex carbohydrates. In the context of the present disclosure, simple sugars are the monomeric sugars, and may be selected from the group consisting of glucose, xylose, arabinose, mannose, galactose, and fructose. It should be noted that there may be other simple sugars not in the preceding list.

The carbon source may comprise a biomass feedstock. In the context of the present disclosure, biomass is biological material derived from living, or recently living organisms and comprises animal and vegetable derived material.

Plant biomass is a preferred feedstock. The constituents of plant biomass may comprise simple and dimeric sugars, and starch. Apart from sugars and starch, the three major constituents in plant biomass are cellulose, hemicellulose and lignin, which are commonly referred to by the generic term lignocellulose. Polysaccharide-containing biomass is a generic term that includes both starch and lignocellulosic biomasses. Therefore, some types of feedstocks can be plant biomass, polysaccharide-containing biomass, and lignocellulosic biomass. In this specification, a ligno-cellulosic biomass may or may not contain starch and/or monomeric and dimeric sugars.

The carbon source may comprise a biomass feedstock, preferably a ligno-cellulosic feedstock.

According to the invention, ligno-cellulosic feedstock includes any material that comprises ligno-cellulose. Ligno-cellulose is generally found, for example, in the stems, leaves, hulls, husks, and cobs of plants or leaves, branches, and wood of trees. The ligno-cellulosic material can also be, but is not limited to, herbaceous material, agricultural residues, forestry residues, municipal solid wastes, waste paper, and pulp and paper mill residues. It is understood herein that ligno-cellulosic material may be in the form of plant cell wall material containing lignin, cellulose, and hemi-cellulose in a mixed matrix.

In an embodiment the ligno-cellulosic feedstock is corn fiber, rice straw, pine wood, wood chips, poplar, wheat straw, switch grass, bagasse, Arundo donax, miscanthus, eucalyptus, bamboo, or paper and pulp processing waste. In a preferred embodiment the ligno-cellulosic material is Arundo donax. In another preferred embodiment the ligno-cellulosic material is woody or herbaceous plants selected from the group consisting of the grasses. Alternatively phrased, the preferred ligno-cellulosic feedstock is selected from the group consisting of the plants belonging to the Poaceae or Gramineae family. The ligno-cellulosic biomass preferably has less than 70% starch by dry weight, with less than 50% starch by dry weight being more preferred and less than 25% starch by dry weight being most preferred.

Because the feedstock may use naturally occurring ligno-cellulosic biomass for the microorganism cultivation, the stream will have relatively young carbon materials. The following, taken from ASTM D 6866 - 04, describes the contemporary carbon, which is that found in bio-based hydrocarbons, as opposed to hydrocarbons derived from oil wells, which were derived from biomass thousands of years ago. "[A] direct indication of the relative contribution of fossil carbon and living biospheric carbon can be expressed as the fraction (or percentage) of contemporary carbon, symbol fC. This is derived from fM through the use of the observed input function for atmospheric 14C over recent decades, representing the combined effects of fossil dilution of the 14C (minor) and nuclear testing enhancement (major). The relation between fC and fM is necessarily a function of time. By 1985, when the particulate sampling discussed in the cited reference [of ASTM D 6866 - 04, the teachings of which are incorporated by reference in their entirety] the fM ratio had decreased to ca. 1.2."

Fossil carbon is carbon that contains essentially no radiocarbon because its age is very much greater than the 5730 year half life of 14C. Modern carbon is explicitly 0.95 times the specific activity of SRM 4990b (the original oxalic acid radiocarbon standard), normalized to δ13C = -19 per mil. Functionally, the fraction of modern carbon = (1/0.95) where the unit 1 is defined as the concentration of 14C contemporaneous with 1950 [A.D.] wood (that is, pre-atmospheric nuclear testing) and 0.95 is used to correct for the post 1950 [A.D.] bomb 14C injection into the atmosphere. As described in the analysis and interpretation section of the test method, a 100% 14C indicates an entirely modern carbon source, such as the products derived from this process. Therefore, the percent 14C of the product stream from the process will be at least 75%, with 85% more preferred, 95% even more preferred, at least 99% even more preferred and at least 100% the most preferred. (The test method notes that the percent 14C can be slightly greater than 100% for the reasons set forth in the method.) These percentages can also be equated to the amount of contemporary carbon. Therefore the amount of contemporary carbon relative to the total amount of carbon is preferred to be at least 75%, with 85% more preferred, 95% even more preferred, at least 99% even more preferred and at least 100% the most preferred. Correspondingly, each carbon containing compound in the reactor, which includes a plurality of carbon containing conversion products, will have an amount of contemporary carbon relative to the total amount of carbon that is preferably at least 75%, with 85% more preferred, 95% even more preferred, at least 99% even more preferred and at least 100% the most preferred. This means that the products made from the terephthalate will have contemporary carbon, with the amount of contemporary carbon relative to the total amount of carbon preferably being at least 25%, with 50% more preferred, 75% even more preferred, at least 99% even more preferred and at least 100% the most preferred.
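As an illustration of why downstream products can carry a lower share of contemporary carbon than the terephthalate itself, consider polyethylene terephthalate: each repeat unit (C10H8O4) contains ten carbon atoms, eight from the terephthalate moiety and two from ethylene glycol. The sketch below computes a carbon-weighted percentage. It assumes, purely for illustration, terephthalate with 100% contemporary carbon combined with ethylene glycol of fossil origin; the function name is hypothetical.

```python
def contemporary_carbon_percent(components):
    """Carbon-weighted percent contemporary carbon of a product.

    components: sequence of (carbon_atoms, percent_contemporary) pairs,
    one pair per building block of the product.
    """
    components = list(components)
    total_carbon = sum(atoms for atoms, _ in components)
    if total_carbon <= 0:
        raise ValueError("the product must contain at least one carbon atom")
    weighted = sum(atoms * percent for atoms, percent in components)
    return weighted / total_carbon


# Assumed inputs: bio-based terephthalate (8 C, 100%) and
# fossil-derived ethylene glycol (2 C, 0%) in one PET repeat unit.
pet_repeat_unit = [(8, 100.0), (2, 0.0)]
print(contemporary_carbon_percent(pet_repeat_unit))  # 80.0
```

Under these assumptions the polymer contains 80% contemporary carbon even though the terephthalate is entirely bio-based.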

In an embodiment, the ligno-cellulosic feedstock is subjected to a pretreatment process to create a pretreated ligno-cellulosic feedstock.

The term "pretreated" may be replaced with the term "treated". However, preferred techniques contemplated are those well known for "pretreatment" of ligno-cellulosic material, as will be described further below. As mentioned above, treatment or pretreatment may be carried out using conventional methods known in the art, which promote the separation and/or release of cellulose and increase the accessibility of the cellulose from ligno-cellulosic material.

Pre-treatment techniques are well known in the art and include physical, chemical, and biological pre-treatment, or any combination thereof. In preferred embodiments the pretreatment of ligno-cellulosic material is carried out as a batch or continuous process.

Physical pretreatment techniques include various types of milling/comminution (reduction of particle size), irradiation, steaming/steam explosion, and hydrothermolysis. In the preferred embodiment, the pretreatment comprises soaking, removal of the solids from the liquid, and steam exploding the solids to create the pre-treated ligno-cellulosic biomass. Comminution includes dry, wet and vibratory ball milling. Preferably, physical pre-treatment involves use of high pressure and/or high temperature (steam explosion). In the context of the invention, high pressure includes pressure in the range from 3 to 6 MPa, preferably 3.1 MPa. In the context of the invention, high temperature includes temperatures in the range from about 100 to 300 °C, preferably from about 160 to 235 °C. In a specific embodiment impregnation is carried out at a pressure of about 3.1 MPa and at a temperature of about 235 °C.

In one embodiment, the pre-treatment is done according to the process described in WO 2010/113129, the entire teachings of which are incorporated by reference.

Although not needed or preferred, chemical pre-treatment techniques include acid, dilute acid, base, organic solvent, lime, ammonia, sulfur dioxide, carbon dioxide, pH-controlled hydrothermolysis, wet oxidation, and solvent treatment.

If the chemical treatment process is an acid treatment process, it is more preferably a continuous dilute or mild acid treatment, such as treatment with sulfuric acid, or with an organic acid, such as acetic acid, citric acid, tartaric acid, succinic acid, or any mixture thereof. Other acids may also be used. Mild acid treatment means, at least in the context of the invention, that the treatment pH lies in the range from 1 to 5, preferably 1 to 3.

In a specific embodiment, the acid concentration is in the range from 0.1 to 2.0 wt % acid, preferably sulfuric acid. The acid is mixed or contacted with the ligno-cellulosic material and the mixture is held at a temperature in the range of around 160-220 °C for a period ranging from seconds to minutes. Specifically the pre-treatment conditions may be the following: 165-183 °C, 3-12 minutes, 0.5-1.4% (w/w) acid concentration, and 15-25%, preferably around 20%, (w/w) total solids concentration. Other contemplated methods are described in U.S. Pat. Nos. 4,880,473, 5,366,558, 5,188,673, 5,705,369 and 6,228,177. Wet oxidation techniques involve the use of oxidizing agents, such as sulfite based oxidizing agents and the like. Examples of solvent treatments include treatment with DMSO (dimethyl sulfoxide) and the like. Chemical treatment processes are generally carried out for about 5 to about 10 minutes, but may be carried out for shorter or longer periods of time.
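The specific dilute acid conditions given above can be captured as a small range check. This is only a sketch: the ranges are copied from the preceding paragraph, while the class, field, and function names are hypothetical and the example run values are invented for illustration.

```python
from dataclasses import dataclass

# Ranges quoted in the text for the specific dilute acid pre-treatment.
SPECIFIC_RANGES = {
    "temperature_c": (165.0, 183.0),
    "time_min": (3.0, 12.0),
    "acid_wt_percent": (0.5, 1.4),
    "total_solids_wt_percent": (15.0, 25.0),
}


@dataclass(frozen=True)
class DiluteAcidRun:
    temperature_c: float
    time_min: float
    acid_wt_percent: float
    total_solids_wt_percent: float


def out_of_range(run: DiluteAcidRun) -> list:
    """Return the names of the parameters that fall outside the quoted ranges."""
    problems = []
    for name, (low, high) in SPECIFIC_RANGES.items():
        value = getattr(run, name)
        if not low <= value <= high:
            problems.append(name)
    return problems


# Invented example runs, for illustration only.
print(out_of_range(DiluteAcidRun(175.0, 8.0, 1.0, 20.0)))
print(out_of_range(DiluteAcidRun(200.0, 8.0, 1.0, 30.0)))
```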

Biological pre-treatment techniques include applying lignin-solubilizing micro-organisms (see, for example, Hsu, T.-A., 1996, Pre-treatment of biomass, in Handbook on Bioethanol: Production and Utilization, Wyman, C. E., ed., Taylor & Francis, Washington, D.C., 179-212; Ghosh, P., and Singh, A., 1993, Physicochemical and biological treatments for enzymatic/microbial conversion of ligno-cellulosic biomass, Adv. Appl. Microbiol. 39: 295-333; McMillan, J. D., 1994, Pretreating ligno-cellulosic biomass: a review, in Enzymatic Conversion of Biomass for Fuels Production, Himmel, M. E., Baker, J. O., and Overend, R. P., eds., ACS Symposium Series 566, American Chemical Society, Washington, D.C., chapter 15; Gong, C. S., Cao, N. J., Du, J., and Tsao, G. T., 1999, Ethanol production from renewable resources, in Advances in Biochemical Engineering/Biotechnology, Scheper, T., ed., Springer-Verlag Berlin Heidelberg, Germany, 65: 207-241; Olsson, L., and Hahn-Hagerdal, B., 1996, Fermentation of ligno-cellulosic hydrolysates for ethanol production, Enz. Microb. Tech. 18: 312-331; and Vallander, L., and Eriksson, K.-E. L., 1990, Production of ethanol from ligno-cellulosic materials: State of the art, Adv. Biochem. Eng./Biotechnol. 42: 63-95).

In an embodiment both chemical and physical pre-treatment are carried out including, for example, both mild acid treatment and high temperature and pressure treatment. The chemical and physical treatment may be carried out sequentially or simultaneously.

In one embodiment, the pre-treatment is carried out as a soaking step with water at greater than 1 °C, removing the ligno-cellulosic biomass from the water, followed by a steam explosion step.

In another embodiment, the pre-treated ligno-cellulosic material is comprised of complex sugars, also known as glucans and xylans (cellulose and hemicellulose) and lignin.

The pretreated ligno-cellulosic feedstock may be subjected to a hydrolysis. Hydrolysis is a process which converts polymeric sugars that are not soluble to soluble sugars and oligomeric sugars into sugars having lower molecular weight.

In one embodiment, the hydrolysis of the pre-treated ligno-cellulosic feedstock is conducted in the presence of a catalyst.

In one embodiment, the catalyst comprises an enzyme or enzymes mixture and the hydrolysis is an enzymatic hydrolysis.

In another embodiment, the enzymatic hydrolysis is conducted according to the process disclosed in WO2010113130, the teachings of which are hereby incorporated in their entirety.

The biomass feedstock will contain some compounds which are hydrolysable into a water-soluble species obtainable from the hydrolysis of the biomass. In the case of water soluble hydrolyzed species of cellulose, cellulose can be hydrolyzed into glucose, cellobiose, and higher glucose polymers, including dimers and oligomers. Thus some of the water soluble hydrolyzed species of cellulose are glucose, cellobiose, and higher glucose polymers, including their respective dimers and oligomers. Cellulose is hydrolysed into glucose by the carbohydrolytic cellulases. Thus the carbohydrolytic cellulases are examples of catalysts for the hydrolysis of cellulose. The prevalent understanding of the cellulolytic system divides the cellulases into three classes: exo-1,4-[beta]-D-glucanases or cellobiohydrolases (CBH) (EC 3.2.1.91), which cleave off cellobiose units from the ends of cellulose chains; endo-1,4-[beta]-D-glucanases (EG) (EC 3.2.1.4), which hydrolyse internal [beta]-1,4-glucosidic bonds randomly in the cellulose chain; and 1,4-[beta]-D-glucosidase (EC 3.2.1.21), which hydrolyses cellobiose to glucose and also cleaves off glucose units from cellooligosaccharides. Therefore, if the biomass contains cellulose, then glucose is a water soluble hydrolyzed species obtainable from the hydrolysis of the biomass, and the aforementioned cellulases are specific examples, as well as those mentioned in the experimental section, of catalysts for the hydrolysis of cellulose.

By similar analysis, the hydrolysis products of hemicellulose are water soluble species obtainable from the hydrolysis of the biomass, assuming of course that the biomass contains hemicellulose. Hemicellulose includes xylan, glucuronoxylan, arabinoxylan, glucomannan, and xyloglucan. The different sugars in hemicellulose are liberated by the hemicellulases. The hemicellulolytic system is more complex than the cellulolytic system due to the heterogeneous nature of hemicellulose. The systems may involve, among others, endo-1,4-[beta]-D-xylanases (EC 3.2.1.8), which hydrolyse internal bonds in the xylan chain; 1,4-[beta]-D-xylosidases (EC 3.2.1.37), which attack xylooligosaccharides from the non-reducing end and liberate xylose; endo-1,4-[beta]-D-mannanases (EC 3.2.1.78), which cleave internal bonds; and 1,4-[beta]-D-mannosidases (EC 3.2.1.25), which cleave mannooligosaccharides to mannose. The side groups are removed by a number of enzymes, such as [alpha]-D-galactosidases (EC 3.2.1.22), [alpha]-L-arabinofuranosidases (EC 3.2.1.55), [alpha]-D-glucuronidases (EC 3.2.1.139), cinnamoyl esterases (EC 3.1.1.-), acetyl xylan esterases (EC 3.1.1.6) and feruloyl esterases (EC 3.1.1.73). Therefore, if the biomass contains hemicellulose, then xylose and mannose are examples of a water soluble hydrolyzed species obtainable from the hydrolysis of the hemicellulose containing biomass, and the aforementioned hemicellulases are specific examples, as well as those mentioned in the experimental section, of catalysts for the hydrolysis of hemicellulose.

For the hydrolysis to occur effectively, the hydrolysis is conducted in the presence of a catalyst. The catalyst may comprise at least one enzyme or microorganism which converts at least one of the compounds in the biomass to a compound or compounds of lower molecular weight, down to, and including, the basic sugar or carbohydrate used to make the compound in the biomass. The enzymes capable of doing this for the various polysaccharides such as cellulose, hemicellulose, and starch are well known in the art and would include those not invented yet.

The catalyst may also comprise an inorganic acid preferably selected from the group consisting of sulfuric acid, hydrochloric acid, phosphoric acid, and the like, or mixtures thereof. The inorganic acid is believed useful for processing at temperatures greater than 100 °C. The process may also be run specifically without the addition of an inorganic acid.

Often the ligno-cellulosic biomass will contain starch. The more important enzymes for use in starch hydrolysis are alpha-amylases (1,4-[alpha]-D-glucan glucanohydrolases, (EC 3.2.1.1)). These are endo-acting hydrolases which cleave 1,4-[alpha]-D-glucosidic bonds and can bypass but cannot hydrolyse 1,6-[alpha]-D-glucosidic branchpoints. However, also exo-acting glycoamylases such as beta-amylase (EC 3.2.1.2) and pullulanase (EC 3.2.1.41) can be used for starch hydrolysis. The result of starch hydrolysis is primarily glucose, maltose, maltotriose, [alpha]-dextrin and varying amounts of oligosaccharides. When the starch-based hydrolysate is used for fermentation it can be advantageous to add proteolytic enzymes. Such enzymes may prevent flocculation of the microorganism and may generate amino acids available to the microorganism. Therefore, if the biomass contains starch, then glucose, maltose, maltotriose, [alpha]-dextrin and oligosaccharides are examples of a water soluble hydrolyzed species obtainable from the hydrolysis of the starch containing biomass, and the aforementioned alpha-amylases are specific examples of catalysts for the hydrolysis of starch.

The non-naturally occurring microbial organism is cultured in a medium with a carbon source and other essential nutrients, under conditions and for a sufficient time to produce terephthalate, both of which depend on the host microbial organism. A person skilled in the art may easily define the suitable culture conditions according to the needs of the host microorganism.

In some embodiments, culture conditions include anaerobic or substantially anaerobic growth or maintenance conditions. Exemplary anaerobic conditions are well known in the art. Such conditions can be obtained, for example, by culturing the non-naturally occurring microbial organism in a fermenter which can be sealed, and then sparging the medium. For strains where growth is not observed anaerobically, microaerobic conditions can be applied.

Nitrogen sources, growth stimulators and the like may be added to improve the microorganism cultivation and terephthalate production. Nitrogen sources include urea, ammonia salts (for example NH4Cl or (NH4)2SO4) and peptides. Protease may be used, e.g., to digest proteins to produce free amino nitrogen (FAN). Such free amino acids may function as nutrient for the host cell, thereby enhancing the growth and enzyme or enzyme mixture production. Preferred cultivation stimulators for growth include vitamins and minerals. Examples of vitamins include multivitamins, biotin, pantothenate, nicotinic acid, meso-inositol, thiamine, pyridoxine, para-aminobenzoic acid, folic acid, riboflavin, and Vitamins A, B, C, D, and E. Examples of minerals include minerals and mineral salts that can supply nutrients comprising P, K, Mg, S, Ca, Fe, Zn, Mn, and Cu. Optionally, the pH of the medium can be maintained at a desired pH, in particular neutral pH, such as a pH of around 7, by addition of a base, such as NaOH or other bases, or acid, as needed to maintain the culture medium at a desirable pH.

Cultivation procedures are well known in the art. Briefly, cultivation for the biosynthetic production of terephthalate can be utilized in, for example, fed-batch cultivation and batch separation; fed-batch cultivation and continuous separation, or continuous cultivation and continuous separation. Examples of batch and continuous cultivation procedures are well known in the art. During the cultivation of the microorganism the carbon source is normally consumed by the microorganism and new carbon source is added. A person skilled in the art can easily determine the procedure for adding the carbon source according to the invention, for instance by monitoring carbon source depletion over time and measuring the microbial organism growth rate by measuring optical density using a spectrophotometer.
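The monitoring described above can be sketched numerically. Assuming exponential growth and optical density readings in the linear range of the spectrophotometer, the specific growth rate between two readings is mu = ln(OD2/OD1)/(t2 - t1) and the doubling time is ln(2)/mu. The feed threshold in the sketch is a hypothetical placeholder, not a value taken from this disclosure.

```python
import math


def specific_growth_rate(od_start: float, od_end: float, hours: float) -> float:
    """Specific growth rate (1/h) from two optical density readings."""
    if od_start <= 0 or od_end <= 0 or hours <= 0:
        raise ValueError("optical densities and elapsed time must be positive")
    return math.log(od_end / od_start) / hours


def needs_feed(carbon_g_per_l: float, threshold_g_per_l: float) -> bool:
    """True when the measured carbon source has dropped below the threshold."""
    return carbon_g_per_l < threshold_g_per_l


# Illustrative readings: OD rises from 0.2 to 0.8 over 2 hours.
mu = specific_growth_rate(0.2, 0.8, 2.0)
doubling_time_h = math.log(2) / mu
print(f"mu = {mu:.3f} 1/h, doubling time = {doubling_time_h:.2f} h")

# Hypothetical threshold of 2 g/L for triggering a new carbon source addition.
print(needs_feed(carbon_g_per_l=1.5, threshold_g_per_l=2.0))
```

A fourfold increase in two hours corresponds to two doublings, so the doubling time in this example is one hour.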

The carbon source may be added to the culture medium in a continuous, semi-continuous or single step manner. According to the invention the carbon source may be added to the culture medium either prior to inoculation, simultaneously with inoculation or after inoculation of non-naturally occurring microorganism in the culture medium.

The culture conditions described herein can be scaled up for manufacturing of terephthalate in commercial quantities. Generally, and as with non-continuous culture procedures, the continuous and/or near-continuous production of terephthalate will include culturing a non-naturally occurring terephthalate producing organism of the invention in sufficient nutrients and medium to sustain and/or nearly sustain growth in an exponential phase. Continuous culture under such conditions can include, for example, growth for 1 day, 2, 3, 4, 5, 6 or 7 days or more. Additionally, continuous culture can include longer time periods of 1 week, 2, 3, 4 or 5 or more weeks and up to several months. Alternatively, organisms of the invention can be cultured for hours, if suitable for a particular application. It is to be understood that the continuous and/or near-continuous culture conditions also can include all time intervals in between these exemplary periods. It is further understood that the time of culturing the microbial organism of the invention is for a sufficient period of time to produce a sufficient amount of product for a desired purpose.

The terephthalate and/or DCD produced by the non-naturally occurring microbial organism may be removed or separated from the culture medium by any techniques known in the art and still to be invented. The removal or separation may occur in a batch, continuous, or semi-continuous manner and may involve purification processes.

The terephthalate and/or DCD produced by the non-naturally occurring microbial organism may be further converted to other compounds. Preferably, the conversion occurs after the terephthalate removal or separation from the culture medium. The conversion process may include chemical and biological conversion processes.

In one embodiment, at least a portion of the terephthalate is converted to polyethylene terephthalate.

In another preferred embodiment, the polyethylene terephthalate is used for making polyethylene terephthalate preforms and bottles.

EXAMPLES

Example 1 - Example of first pathway for the accumulation of protocatechuate

US 5,616,496 discloses a heterologous cell transformant that biocatalytically converts a carbon source to catechol and cis,cis-muconic acid. The cell transformant expresses heterologous genes encoding the enzyme 3-dehydroshikimate dehydratase and other genes relevant for the invention of US 5,616,496.

In Example 1 of US 5,616,496 - Cloning of the aroZ gene, the gene which encodes DHS dehydratase, designated aroZ, was isolated from a genomic library of Klebsiella pneumoniae DNA and cloned in E. coli DH5α. After the growth of the genetically modified microorganism in a suitable growing medium, the growth medium appeared brown in color, analogous to the darkening of the medium which occurred when protocatechuic acid was spotted onto the plate.

In Example 2 of US 5,616,496 - Confirmation of the Cloning of the aroZ gene, it is confirmed that a transformed E. coli strain which typically converts D-glucose to DHS could further convert DHS to protocatechuic acid. E. coli AB2834 accumulates DHS in the culture supernatant due to a mutation in the aroE gene, which encodes shikimate dehydrogenase. The accumulation of protocatechuate is verified by NMR analysis.

This example demonstrates a non-naturally occurring microbial organism having a pathway for the accumulation of protocatechuate.

Example 2 - Example of second pathway for the conversion of protocatechuate to DCD

The conversion of protocatechuate to DCD is catalyzed by an enzyme which acts on an aromatic substrate, adding a carboxylic group in the para-position of PCA.

Example 2a

In "Crystal Structures of Nonoxidative Zinc-dependent 2,6-Dihydroxybenzoate (γ-Resorcylate) Decarboxylase from Rhizobium sp. Strain MTP-10005", Masaru Goto et al., The Journal of Biological Chemistry, Vol. 281, No. 45, pp. 34365-34373, November 10, 2006, a reversible 2,6-dihydroxybenzoate decarboxylase from Rhizobium sp. strain MTP-10005 is reported to catalyze the decarboxylation of 2,6-dihydroxybenzoate; the enzyme also catalyzes the decarboxylation of 2,3-dihydroxybenzoate.

The enzyme possesses one Zn2+ ion, which is bound by the amino acid residues Glu8, His10, His164, Asp287, and a water molecule at the active site center. The carboxylate substrate takes the place of the water molecule and is coordinated to the Zn ion. The 2-hydroxy group of the substrate is hydrogen-bonded to Asp287, which forms a triad together with His218 and Glu221 and is assumed to be the catalytic base. Given the structural similarity between PCA and the substrates of the reactions catalyzed by 2,6-dihydroxybenzoate decarboxylase, it is reasonable to expect some activity of this enzyme on PCA. Moreover, a person skilled in the art knows how to modify the enzyme, using techniques such as random mutagenesis, directed evolution, gene shuffling, and other well-known techniques, in such a way that the enzyme is able to accommodate the protocatechuate substrate in the catalytic site and to carboxylate the PCA in the para-position, thereby obtaining DCD.

Example 2b

In "Characterizing a proposed novel enzyme family of hydroxyarylic acid decarboxylases", Delina Y. Lyon, Thesis Submitted to the Graduate Faculty of The University of Georgia, Athens 2002, it is explained that, although decarboxylase reactions are often physiologically irreversible, one enzyme can be capable of performing both reactions. The physiological function of some of these enzymes depends on downstream reactions. The downstream reactions can enable thermodynamically unfavorable carboxylations by keeping the concentrations of the reaction products low inside the cell. This allows the carboxylations to be "pulled" in one direction or another.
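The "pull" exerted by downstream reactions follows from the standard relation between the actual Gibbs energy change and the reaction quotient Q, namely ΔG = ΔG°' + RT ln Q. The sketch below illustrates the effect numerically. The standard Gibbs energy of +10 kJ/mol and all concentrations are hypothetical values chosen for illustration; they are not measured data for any enzyme named in this specification.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # kJ/(mol*K)


def reaction_gibbs_energy(delta_g0_kj, substrates_molar, products_molar,
                          temperature_k=298.15):
    """Actual Gibbs energy change, delta_G = delta_G0' + R*T*ln(Q).

    Concentrations are in mol/L, relative to a 1 M standard state.
    """
    quotient = 1.0
    for concentration in products_molar:
        quotient *= concentration
    for concentration in substrates_molar:
        quotient /= concentration
    return delta_g0_kj + GAS_CONSTANT_KJ * temperature_k * math.log(quotient)


HYPOTHETICAL_DG0_KJ = 10.0  # assumed unfavorable carboxylation, illustration only

# Two substrates at 10 mM each (e.g. aromatic acid and CO2), product at 0.1 mM: Q = 1.
not_pulled = reaction_gibbs_energy(HYPOTHETICAL_DG0_KJ, [1e-2, 1e-2], [1e-4])

# Downstream consumption keeps the product at 1 micromolar: Q = 0.01.
pulled = reaction_gibbs_energy(HYPOTHETICAL_DG0_KJ, [1e-2, 1e-2], [1e-6])

print(f"Q = 1:    delta_G = {not_pulled:.2f} kJ/mol")
print(f"Q = 0.01: delta_G = {pulled:.2f} kJ/mol")
```

With these assumed numbers, lowering the product concentration a hundredfold is enough to change the sign of ΔG, which is the thermodynamic content of the statement that downstream reactions can enable an otherwise unfavorable carboxylation.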

For example, the 4-hydroxybenzoate decarboxylase (ShdC, EC 4.1.1.61, isolated from Sedimentibacter hydroxybenzoicus) is able to decarboxylate a number of compounds.

S. hydroxybenzoicus also contains a 3,4-dihydroxybenzoate decarboxylase (Shd34, EC 4.1.1.63), which is a separate enzyme from ShdC. Shd34 is induced by and specific for 3,4-dihydroxybenzoate, as reported in He, Z., and J. Wiegel, "Purification and characterization of an oxygen-sensitive, reversible 3,4-dihydroxybenzoate decarboxylase from Clostridium hydroxybenzoicum", J. Bacteriol. 178(12):3539-43, and in Zhang, X., L. Mandelco, and J. Wiegel, "Clostridium hydroxybenzoicum sp. nov., an amino acid-utilizing, hydroxybenzoate-decarboxylating bacterium isolated from methanogenic freshwater pond sediment", Int. J. Syst. Bacteriol. 44(2):213-222.

Clostridium hydroxybenzoicum has been reclassified as Sedimentibacter hydroxybenzoicus. Both decarboxylases are reversible. Carbon dioxide, not bicarbonate, is used as the carbon source for the reverse carboxylating activity of these enzymes. Given the structural similarity between PCA and the substrates of the reverse reactions catalyzed by ShdC and Shd34, it is reasonable to expect some activity of those enzymes on PCA. DCD dehydrogenase is an enzyme which catalyzes a dehydrogenation coupled to a decarboxylation, leading to the conversion of DCD to PCA. Therefore, it is reasonable to expect that it catalyzes the reverse reaction, thereby carboxylating the PCA in the position where the decarboxylation occurs. Based on the previous considerations, DCD dehydrogenase is another candidate enzyme for the conversion of PCA to DCD.

Examples 2a and 2b demonstrate that it is possible to define enzymes for converting PCA to DCD. A person skilled in the art knows how to functionally express at least one of these enzymes in a suitable micro-organism, thereby defining a pathway corresponding to the second pathway of the present invention. These techniques may comprise plasmid and cosmid transformation and direct integration into the host DNA.

Example 3 - Example of third pathway for the conversion of 1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate to terephthalate

The conversion of DCD to terephthalate is catalyzed by an enzyme which di-dehydrates the vicinal diol of the DCD.

In the database MetaCyc the enzyme TPADO, which catalyzes the conversion of TPA to DCD, is reported to be reversible.

A person skilled in the art will understand that the reversal may be achieved by altering the carbon fluxes by means of genetic modifications of the microorganism, expressing the appropriate genes, tailoring their expression and altering culture conditions in order to enhance the intracellular DCD concentration and/or redox (for example NADH/NAD+) ratios. Accordingly, the metabolic flux will be driven through this pathway in the direction of terephthalate synthesis rather than degradation.

The following references, as well as references cited in the body of the specification, are expressly incorporated by reference herein.

AbouHamdan A, Dementin S, Liebgott PP, Gutierrez-Sanz O et al., Understanding andTuning the Catalytic Bias of Hydrogenase, J. Am. Chem. Soc, 134:20, 8368-8371 (2012).

Ajikumar PK, Xiao WH, Tyo KEJ et al., Isoprenoid pathway optimization for taxol precursoroverproduction in Escherichia coli, Science, 330:70-74 (2010).

Benkert P, Biasini M, Schwede T, Toward the estimation of the absolute quality of individual protein structure models, Bioinformatics, 1;27(3):343-50 (2011).

Bramucci MG, McCutchen CM, Nagarajan V, Thomas SM, Microbial production of terephthalic acid and isophthalic acid, US E. I. du Pont de Nemours and Comp., http://www.freepatentsonline.com/6187569.html (2001).

Carbonell P, Faulon JL, Molecular signatures-based prediction of enzyme promiscuity, Bioinformatics 26, 2012-2019 (2010).

Carbonell P, Nussinov R, Del Sol A, Energetic determinants of protein binding specificity: insights into protein interaction networks, Proteomics 9, 1744-1753 (2009).

Carbonell P, Planson AG, Fichera D, Faulon JL, A retrosynthetic biology approach to metabolic pathway design for therapeutic production, BMC Syst. Biol. 5, 122+ (2011).

Coitinho et al., Expression, purification and preliminary crystallographic studies of NahF, a salicylaldehyde dehydrogenase from Pseudomonas putida G7 involved in naphthalene degradation, Acta Crystallo. Section F vol. 68, no. 1, 93-97 (2012).

Faulon JL, Misra M, Martin S, Sale K et al., Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor, Bioinformatics 24, 225-233 (2008).

Fukuhara Y, Kasai D, Katayama Y, Fukuda M et al., Enzymatic properties of terephthalate 1,2-dioxygenase of Comamonas sp. strain E6, Biosc., Biotech. & Biochem. 72(9): 2335-2341 (2008).

Gientka I, Duszkiewicz-Reinhard W, Shikimate pathway in yeast cells: enzymes, functioning, regulation, Polish Jour. of Food & Nut. Sc., 9(2): 113-118 (2009).

Grosdidier A, Zoete V, Michielin O, SwissDock, a protein-small molecule docking web service based on EADock DSS, Nuc. Acids Res. 39 (2011).

Karatzoglou, A, Smola, As, Hornik, K, Zeileis, A, kernlab - An S4 Package for Kernel Methods in R, J. Stat. Soft. 11(9): 1-20 (2004).

Kleerebezem R, Hulshoff Pol LW, Lettinga G, The role of benzoate in anaerobic degradation of terephthalate, Appl. Env. Microbiol., 65(3):1161-7, PMID: 10049877 (1999).

Laurie AT, Jackson RM, Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites, Bioinformatics, 21: 1908-1916 (2005).

Ledger T, Pieper DH, Gonzalez B, Chlorophenol hydroxylases encoded by plasmid pJP4 differentially contribute to chlorophenoxyacetic acid degradation, Appl. Environ. Microbiol. 72(4):2783-92 (2006).

McIntosh, C L, Germer, F et al., The [NiFe]-Hydrogenase of the Cyanobacterium Synechocystis sp. PCC 6803 Works Bidirectionally with a Bias to H2 Production, J. Am. Chem. Soc., vol. 133, no. 29, 11308-11319 (2011).

Mongodin EF, Shapir N, Daugherty SC, DeBoy RT, et al., Secrets of soil survival revealed by the genome sequence of Arthrobacter aurescens TC1, PLoS Genet. 2(12):e214 (2006).

Nierman WC, Pain A, Anderson MJ, Wortman JR, et al., Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus, Nature 438(7071):1151-6 (2005).

Parales, RE, Resnick, Sol M, Aromatic Ring Hydroxylating Dioxygenases, Pseudomonas, chap. 9 (Ramos, Juan-Luis, Levesque, Roger C eds.), p. 287-340 (2006).

Planson AG, Carbonell P, Paillard E, Pollet N et al., Compound toxicity screening and structure-activity relationship modeling in Escherichia coli, Biotech. Bioeng. (2011).

Saller E, Laue H, Schläfli Oppenberg HR, Cook AM, Purification and some properties of (1R,2S)-1,2-dihydroxy-3,5-cyclohexadiene-1,4-dicarboxylate dehydrogenase from Comamonas testosteroni T-2, FEMS Microbiol. Let., 130:97-102 (1995).

Sarnowski C, Carbonell P, Elati M, Faulon JL, Prediction of catalytic efficiency to discover new enzymatic activities, Proc. of the 4th Internat. Workshop on Machine Learning in Syst. Biol., 153-156 (2010).

Sasoh M, Masai E, Ishibashi S, Hara H et al., Characterization of the terephthalate degradation genes of Comamonas sp. strain E6, Appl. Env. Microbiol., 72(3): 1825-32 (2006).

Stiebritz, Martin T, Reiher, Markus, Hydrogenases and oxygen, Chem. Sc. vol. 3, no. 6, 1739-1751 (2012).

Takahashi S, Zhao Y, O'Maille PE, Greenhagen BT et al., Kinetic and Molecular Analysis of 5-Epiaristolochene 1,3-Dihydroxylase, a Cytochrome P450 Enzyme Catalyzing Successive Hydroxylations of Sesquiterpenes, Jour. of Biol. Chem. vol. 280, no. 5, 3686-3696 (2005).

Wang YZ, Zhou Y, Zylstra GJ, Molecular analysis of isophthalate and terephthalate degradation by Comamonas testosteroni YZW-D, Env. Health Perspect. 103, 9-12 (1995).

Yao YL, Ze S, The reaction mechanism for dehydration process catalyzed by type I dehydroquinate dehydratase from Gram-negative Salmonella enterica, Chem. Physics Let. vol. 519-520, 100-104 (2012).