COMPOSITIONS AND METHODS FOR BIOREMEDIATION

Title:

COMPOSITIONS AND METHODS FOR BIOREMEDIATION

Document Type and Number:

WIPO Patent Application WO/1998/050587

Kind Code:

Abstract:

Compositions and methods for the degradation of compounds contained in a liquid or solid waste stream are described. Genes encoding toluene-degrading enzymes are described. The enzymes have homology to the E. coli pyruvate formate lyase and pyruvate formate lyase activator.

Inventors:

COSCHIGANO PETER W

Application Number:

PCT/US1998/009174

Publication Date:

November 12, 1998

Filing Date:

May 05, 1998

Assignee:

UNIV OHIO (US)

International Classes:

C12N1/26; C12N9/88; C12N15/54; C12P7/46; C12S1/00; C12S3/02; (IPC1-7): C12S3/02

Foreign References:

US5610065A	1997-03-11
US5300629A	1994-04-05

Attorney, Agent or Firm:

Carroll, Peter G. (LLP Suite 2200, 220 Montgomery Street, San Francisco CA, US)

Claims:

CLAIMS

1.	A method of degrading compounds contained in a liquid or solid waste source, comprising the steps of: a) providing i) a waste source comprising toluene, ii) a reaction containing means, and iii) a compound selected from the group consisting of a functional, cell-free pyruvate formate lyase homologue of a toluene-degrading bacterium and a functional, cell-free pyruvate formate lyase activating homologue of a toluene-degrading bacterium; and b) reacting said homologue and said waste source in said containing means under conditions such that toluene is degraded.

2.	The method of Claim 1, wherein said homologue is derived from an organism selected from the group consisting of Thauera aromatica, Xanthomonas maltophilia, Geobacter metallireducens, and Azoarcus tolulyticus.

3.	The method of Claim 1, wherein said homologue has the amino acid sequence shown in Figure 11.

4.	The method of Claim 1, wherein said homologue has the amino acid sequence shown in Figure 13.

5.	The method of Claim 1, wherein said reaction containing means is a bioreactor.

6.	A protein having the amino acid sequence shown in Figure 11.

7.	A protein having the amino acid sequence shown in Figure 13.

Description:

COMPOSITIONS AND METHODS FOR BIOREMEDIATION

This invention was made with government support under NSF grant MCB9507132 and DARPA grant N00014-92-J-1888. The Government of the United States of America has certain rights in the invention.

FIELD OF THE INVENTION

This invention relates to biological treatment of organic compounds, and particularly to the degradation of toluene and toluene analogues.

BACKGROUND

Industrial processes that use or generate toxic organic compounds (e.g., toluene, benzene, xylenes) have led to the contamination of nearby water and land. Such compounds are among the most water soluble of all gasoline components and can also enter aquatic environments from many sources such as gasoline underground storage tanks, leaks, and spills.

Most approaches to decontamination or "remediation" involve stopping the local dumping of such compounds and transport of the waste to another area for containment. This is costly and does not eliminate the hazard.

As a remediation technology, bioremediation is considerably more attractive. Rather than merely transporting wastes, it offers the possibility of degrading toxic compounds to harmless reaction products by the use of biologicals.

Bioremediation field trials have involved both in-situ and ex-situ treatment methods.

Typically, ex-situ treatment involves the transfer of contaminated waste from the site into a treatment tank designed to support microbial growth, i.e., a "bioreactor". The reactor provides for effective mixing of nutrients and control over temperature, pH and aeration to allow optimum microbial growth.

In-situ treatment involves adding biologicals directly to the waste. This avoids the problems associated with handling (e.g., pumping) toxic compounds. However, in-situ treatment has its own problems. Unlike bioreactors, where microbial growth can be monitored and adjusted, in-situ environmental conditions are difficult to measure and control.

Fries et al., "Isolation, characterization and distribution of denitrifying toluene degraders from a variety of habitats," Appl. Environ. Microbiol. 60:2802 (1994) generally indicates that biodegradation of benzene, toluene, ethylbenzene and xylenes under aerobic conditions is well known, although the availability of oxygen is rate limiting because of its low solubility in water and low rate of transport in soils and sediments. Fries et al. describe anaerobic respiration of toluene by microorganisms isolated from nature. The microorganisms could grow on 25 ppm toluene and could be fed 50 ppm toluene.

Rates have been determined at 28-30°C with intact cells from a variety of strains. The rates vary from 8 to 80 nmoles toluene min⁻¹ mg⁻¹ protein. A. Frazer et al., "Toluene Metabolism Under Anaerobic Conditions: A Review," Anaerobe 1:293 (1995).

There remains a need to develop a bioremediation procedure that can be operated economically on a commercial scale. Such a procedure must be able to degrade organic compounds with high efficiency.

SUMMARY OF THE INVENTION

This invention relates to biological treatment of organic compounds, and particularly to the degradation of toluene and toluene analogues. In one embodiment, the present invention contemplates a method of degrading compounds contained in a liquid or solid waste source, comprising the steps of: a) providing, i) a waste source comprising toluene (and/or a toluene analogue), ii) a reaction containing means, and iii) a compound selected from the group consisting of a functional, cell-free pyruvate formate lyase homologue of a toluene-degrading bacterium and a functional, cell-free pyruvate formate lyase activating homologue of a toluene-degrading bacterium; and b) reacting said homologue and said waste source in said containing means under conditions such that toluene (and/or the toluene analogue) is degraded.

It is not intended that the present invention be limited by the specific toluene-degrading bacterium. In one embodiment, said homologue is derived from an organism of the genus Thauera. In one embodiment, the organism is Thauera aromatica.

In another embodiment, said homologue is derived from an organism of the genus Xanthomonas. In one embodiment, the organism is Xanthomonas maltophilia.

In yet another embodiment, said homologue is derived from an organism of the genus Geobacter. In one embodiment, the organism is Geobacter metallireducens.

In still another embodiment, said homologue is derived from members of the genus Azoarcus. In one embodiment, the organism is Azoarcus tolulyticus.

The present invention contemplates the nucleic acid and amino acid sequences of toluene degrading enzymes as compositions of matter. In one embodiment, the present invention contemplates a purified nucleic acid comprising DNA having the sequence as set forth in Figure 12. In one embodiment, said DNA is in a vector. In another embodiment, said vector is a bacterial plasmid. In a particular embodiment, said bacterial plasmid is in a host cell. In one embodiment, said host cell expresses a toluene-degrading enzyme.

The present invention contemplates a functional, cell-free product of the tutD gene having the amino acid sequence as set forth in Figure 11. In one embodiment, said product is contained within a reaction containing means. In a preferred embodiment, said reaction containing means is a bioreactor.

It is also not intended that the present invention be limited by the precise amino acid sequence of the homologue. In one embodiment, it is encoded by the tutD gene, a nucleic acid sequence for which is shown in Figure 5, and has the amino acid sequence shown in Figure 7. In another embodiment, the homologue is an expanded TutD protein having the amino acid sequence shown in Figure 11 and the corresponding nucleic acid sequence shown in Figure 12. In another embodiment, the homologue is encoded by the tutE gene having a nucleic acid sequence shown in Figure 12, and a corresponding amino acid sequence shown in Figure 13.

Additionally, the present invention contemplates a reporter gene fusion product constructed by fusing the tutD gene in frame to a reporter such as lacZ, luxA, or green fluorescent protein. Such constructs can be used to demonstrate regulated expression in response to toluene.

In another embodiment, the present invention contemplates a reporter gene fusion product constructed by fusing the tutE gene in frame to a reporter such as lacZ, luxA, or green fluorescent protein. Such constructs can be used to demonstrate regulated expression in response to toluene.

DEFINITIONS

To facilitate understanding of the invention, a number of terms are defined below.

The term "reaction" or "chemical reaction" means reactions involving chemical reactants, such as organic compounds. A "reaction containing means" refers to anything that can contain a reaction, including but not limited to, tubes, microtiter plates, vessels, and bioreactors. It is not intended that the present invention be limited by a particular reaction containing means. U.S. Patent No. 5,610,061, No. 5,585,272, No. 5,571,705, No. 5,560,737, No. 5,057,221 and No. 5,037,551 all describe various reaction containing means (including bioreactors) and are hereby incorporated by reference.

"Initiating a reaction" means causing a reaction to take place. Reactions can be initiated by any means (e.g., mixing, heat, wavelengths of light, addition of a catalyst, etc.).

A "solvent" is a liquid substance capable of dissolving or dispersing one or more other substances. It is not intended that the present invention be limited by the nature of the solvent used.

A "waste source" can be a solid or liquid waste source (e.g., paper pulp, pulp mill effluent, sludge, wastewater, petroleum spill, etc.).

"Toluene analogues" are structural analogues of toluene. While it is not intended that the present invention be limited to particular analogues, examples include the o-, m-, and p-isomers of chlorotoluene, fluorotoluene and xylene.

A "pyruvate formate lyase homologue" is defined as a gene product from a toluene-degrading organism, said gene product comprising i) regions of identity with the pyruvate formate lyase from E. coli (the PflD gene GenBank G418519) and/or from Clostridium pasteurianum (GenBank G1072361) such that the gene product contains the motif RVSGY (SEQ ID NO:1), RVAGY (SEQ ID NO:2), or VRVSGYSA (SEQ ID NO:3) at the essential glycine (shown in bold and discussed below), and ii) regions of non-identity. The gene product may contain other regions of identity with pyruvate formate lyase from E. coli (the PflD gene GenBank G418519) and from Clostridium pasteurianum (GenBank G1072361), including but not limited to, the motif TPDGR (SEQ ID NO:4), TPDGRF (SEQ ID NO:5), GPTAVL (SEQ ID NO:6), and GNDDD (SEQ ID NO:7). As noted below, the present invention also identifies other conserved regions, including but not limited to those associated with an essential conserved cysteine.
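The motif criterion in this definition can be illustrated with a short Python sketch. It only scans a one-letter amino acid string for the seven motifs quoted above; the function names, the dictionary, and the toy sequence are hypothetical and are not taken from the figures. Finding a motif says nothing about whether a homologue is functional.

```python
# Motifs quoted in the definition above, keyed by their sequence identifiers.
# The dictionary and function names are illustrative assumptions.
PFL_MOTIFS = {
    "SEQ ID NO:1": "RVSGY",
    "SEQ ID NO:2": "RVAGY",
    "SEQ ID NO:3": "VRVSGYSA",
    "SEQ ID NO:4": "TPDGR",
    "SEQ ID NO:5": "TPDGRF",
    "SEQ ID NO:6": "GPTAVL",
    "SEQ ID NO:7": "GNDDD",
}

# The three motifs that the definition places at the essential glycine.
GLYCINE_MOTIF_IDS = ("SEQ ID NO:1", "SEQ ID NO:2", "SEQ ID NO:3")


def find_motifs(protein, motifs=PFL_MOTIFS):
    """Return {identifier: [0-based start positions]} for every motif present."""
    protein = protein.upper()
    hits = {}
    for label, motif in motifs.items():
        positions = []
        start = protein.find(motif)
        while start != -1:
            positions.append(start)
            start = protein.find(motif, start + 1)
        if positions:
            hits[label] = positions
    return hits


def has_glycine_motif(protein):
    """True if at least one of the essential-glycine motifs occurs."""
    hits = find_motifs(protein)
    return any(label in hits for label in GLYCINE_MOTIF_IDS)


# Made-up toy sequence for demonstration only (not a real gene product).
toy = "MKTPDGRFAAVRVSGYSALL"
print(find_motifs(toy))
print(has_glycine_motif(toy))
```

An exact substring search is the simplest reading of "contains the motif"; a real comparison of gene products would normally rely on sequence alignment rather than string matching.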

A "functional" homologue is one where transfer of the gene or expression of the gene product confers the ability to degrade toluene. Functional homologues need not comprise the entire gene product, i.e. functional peptide fragments (portions that are less than the entire gene product) are specifically contemplated.

The term "purified" means separated from some components that are normally present in the native state. Thus, a spectrum of purity is contemplated. At the very basic level, a cell-free preparation is "purified." Similarly, nucleic acid that is even substantially protein-free is "purified." At a more extreme level, the present invention contemplates a particular toluene degrading protein that is substantially free of all other proteins (usually less than 10% and preferably less than 5% of other proteins are present).

The term "gene" refers to a DNA sequence that comprises control and coding sequences necessary for the production of a polypeptide or precursor thereof. The polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired enzymatic activity is retained.

The term "wild-type" refers to a gene or gene product which has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the "normal" or "wild-type" form of the gene. In contrast, the term "modified" or "mutant" refers to a gene or gene product which displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product.

It is noted that naturally-occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.

The term "oligonucleotide" as used herein is defined as a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, usually more than three (3), and typically more than ten (10) and up to one hundred (100) or more (although preferably between twenty and thirty). The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof.

Because mononucleotides are reacted to make oligonucleotides in a manner such that the 5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of its neighbor in one direction via a phosphodiester linkage, an end of an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its 3' oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5' and 3' ends.

When two different, non-overlapping oligonucleotides anneal to different regions of the same linear complementary nucleic acid sequence, and the 3' end of one oligonucleotide points towards the 5' end of the other, the former may be called the "upstream" oligonucleotide and the latter the "downstream" oligonucleotide.

The term "primer" refers to an oligonucleotide which is capable of acting as a point of initiation of synthesis when placed under conditions in which primer extension is initiated.

An oligonucleotide "primer" may occur naturally, as in a purified restriction digest or may be produced synthetically.

A primer or oligonucleotide is selected to be "substantially" complementary to a strand of specific sequence of the template. A primer must be sufficiently complementary to hybridize with a template strand for primer elongation to occur. A primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5' end of the primer, with the remainder of the primer sequence being substantially complementary to the strand. Non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridize and thereby form a template primer complex for synthesis of the extension product of the primer.

"Hybridization" methods involve the annealing of a complementary sequence to the target nucleic acid (the sequence to be detected). The ability of two polymers of nucleic acid containing complementary sequences to find each other and anneal through base pairing interaction is a well-recognized phenomenon. The initial observations of the "hybridization" process by Marmur and Lane, Proc. Natl. Acad. Sci. USA 46:453 (1960) and Doty et al., Proc. Natl. Acad. Sci. USA 46:461 (1960) have been followed by the refinement of this process into an essential tool of modern biology.

Even where the sequence of a probe or oligonucleotide is completely complementary to the sequence of the target, i.e., the target's primary structure, the target sequence must be made accessible to the probe via rearrangements of higher-order structure. These higher-order structural rearrangements may concern either the secondary structure or tertiary structure of the molecule. Secondary structure is determined by intramolecular bonding. In the case of DNA or RNA targets this consists of hybridization within a single, continuous strand of bases (as opposed to hybridization between two different strands). Depending on the extent and position of intramolecular bonding, the probe can be displaced from the target sequence preventing hybridization.

Solution hybridization of oligonucleotide probes to denatured double-stranded DNA is further complicated by the fact that the longer complementary target strands can renature or reanneal. Again, hybridized probe is displaced by this process. This results in a low yield of hybridization (low "coverage") relative to the starting concentrations of probe and target.

Hybridization, regardless of the method used, requires some degree of complementarity between the sequence being assayed (the target sequence) and the fragment of DNA used to perform the test (the probe). (Of course, one can obtain binding without any complementarity but this binding is nonspecific and to be avoided.) The complement of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5' end of one sequence is paired with the 3' end of the other, is in "antiparallel association." Certain bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, base composition and sequence of the oligonucleotide, ionic strength and incidence of mismatched base pairs.
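The antiparallel pairing described above can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes DNA written 5' to 3' using only A, C, G and T; unusual bases such as inosine or 7-deazaguanine are not handled and would raise a KeyError. Counting mismatches is only a crude stand-in for duplex stability, which the text notes is determined empirically.

```python
# Watson-Crick pairing for the four standard DNA bases (assumption: DNA only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs with seq in antiparallel association.

    Both the input and the result are written 5' to 3'.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(probe, target):
    """Count mismatched positions when probe anneals antiparallel to target.

    Both sequences are written 5' to 3' and must have the same length.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    perfect = reverse_complement(target)
    return sum(1 for p, q in zip(probe.upper(), perfect) if p != q)


print(reverse_complement("ATGC"))
print(count_mismatches("GCAT", "ATGC"))
print(count_mismatches("GCAA", "ATGC"))
```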

Stability of a nucleic acid duplex is measured by the melting temperature, or "Tm." The Tm of a particular nucleic acid duplex under specified conditions is the temperature at which on average half of the base pairs have disassociated. The equation for calculating the Tm of nucleic acids is well known in the art. As indicated by standard references, an estimate of the Tm value may be calculated by the equation:

Tm = 81.5°C + 16.6 log M + 0.41(%GC) - 0.61(%form) - 500/L

where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, %form is the percentage of formamide in the hybridization solution, and L = length of the hybrid in base pairs. [See e.g., Guide to Molecular Cloning Techniques, Ed. S.L. Berger and A.R. Kimmel, in Methods in Enzymology Vol. 152, 401 (1987)]. Other references include more sophisticated computations which take structural as well as sequence characteristics into account for the calculation of Tm.
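As a rough illustration, the Tm estimate can be written as a small Python function. This sketch assumes the commonly cited form of the estimate, in which the final term is 500/L (L being the hybrid length defined above) and the logarithm is base 10. The function name and the example values are illustrative only.

```python
import math


def estimate_tm(monovalent_molar, percent_gc, percent_formamide, length_bp):
    """Estimate the melting temperature of a DNA duplex in degrees Celsius.

    Assumed form of the estimate:
        Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%form) - 500/L
    """
    if monovalent_molar <= 0 or length_bp <= 0:
        raise ValueError("molarity and length must be positive")
    return (
        81.5
        + 16.6 * math.log10(monovalent_molar)
        + 0.41 * percent_gc
        - 0.61 * percent_formamide
        - 500.0 / length_bp
    )


# Illustrative values: 1 M monovalent cations, 50% GC, 500 bp hybrid.
print(estimate_tm(1.0, 50.0, 0.0, 500))   # no formamide
print(estimate_tm(1.0, 50.0, 50.0, 500))  # 50% formamide lowers the estimate
```

Under this form each percent of formamide lowers the estimate by 0.61 degrees, which is why formamide is a convenient handle for adjusting hybridization stringency.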

The present invention contemplates utilizing the nucleic acid sequence of the tutD gene to isolate other genes encoding pyruvate formate lyase homologues by hybridizing portions of the tutD gene to total DNA of various toluene-degrading organisms. Preferably, hybridization is carried out at high stringency (i.e., carried out at or near the Tm of the particular duplex). Hybridization can be used to capture other genes. Alternatively, hybridization can be followed by primer extension or PCR.

The present invention also contemplates utilizing the nucleic acid sequence of the tutE gene to isolate other genes encoding pyruvate formate lyase homologues by hybridizing portions of the tutE gene to total DNA of various toluene-degrading organisms. Preferably, hybridization is carried out at high stringency (i.e., carried out at or near the Tm of the particular duplex). Hybridization can be used to capture other genes. Alternatively, hybridization can be followed by primer extension or PCR.

Mullis, et al., U.S. patents Nos. 4,683,195 and 4,683,202 (both of which are hereby incorporated by reference), describe a method for increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a molar excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence. The two primers are complementary to their respective strands of the double-stranded sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization, and polymerase extension can be repeated as often as needed to obtain a relatively high concentration of a segment of the desired target sequence. The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to by the inventors as the "Polymerase Chain Reaction" (hereinafter PCR). Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be "PCR-amplified."

It is not intended that the present invention be limited to a particular toluene-degrading organism. The present invention contemplates identifying homologues in both known and yet undiscovered toluene-degrading organisms. Known organisms are set forth in Table 1.

Table 1

Strain Designations	Energy Metabolism
T	Denitrifying
T1	Denitrifying
Thauera aromatica, K 172	Denitrifying
S100 and S2	Denitrifying
Azoarcus tolulyticus, Tol 4 (type strain); others include Td-1, Td-2, Td-3, Td-15, Td-17, Td-19, Td-21	Denitrifying
ToN1, mXyN1, and EbN1	Denitrifying
Xanthomonas maltophilia, SU1	Denitrifying
Geobacter metallireducens, GS-15	Denitrifying
Desulfobacula toluolica, Tol2	Denitrifying
PRTOL1	Denitrifying

The term "probe" as used herein refers to a labeled oligonucleotide which forms a duplex structure with a sequence in another nucleic acid, due to complementarity of at least one sequence in the probe with a sequence in the other nucleic acid.

The term "label" as used herein refers to any atom or molecule which can be used to provide a detectable (preferably quantifiable) signal, and which can be attached to a nucleic acid or protein. Labels may provide signals detectable by fluorescence, radioactivity, colorimetry, gravimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, and the like.

The terms "nucleic acid substrate" and "nucleic acid template" are used herein interchangeably and refer to a nucleic acid molecule which may comprise single- or double-stranded DNA or RNA.

The term "substantially single-stranded" when used in reference to a nucleic acid substrate means that the substrate molecule exists primarily as a single strand of nucleic acid in contrast to a double-stranded substrate which exists as two strands of nucleic acid which are held together by inter-strand base pairing interactions.

The term "sequence variation" as used herein refers to differences in nucleic acid sequence between two nucleic acid templates. For example, a wild-type structural gene and a mutant form of this wild-type structural gene may vary in sequence by the presence of single base substitutions and/or deletions or insertions of one or more nucleotides. These two forms of the structural gene are said to vary in sequence from one another. A second mutant form of the structural gene may exist. This second mutant form is said to vary in sequence from both the wild-type gene and the first mutant form of the gene. It is noted, however, that the invention does not require that a comparison be made between one or more forms of a gene to detect sequence variations.

The term "Km" as used herein refers to the Michaelis-Menten constant for an enzyme and is defined as the concentration of the specific substrate at which a given enzyme yields one-half its maximum velocity in an enzyme catalyzed reaction.
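The definition of Km can be checked numerically with a short Python sketch. It assumes simple Michaelis-Menten kinetics, v = Vmax·[S] / (Km + [S]), which is the usual context for this constant but is not spelled out in the text above. The function name and the numerical values are purely illustrative and are not measured data.

```python
def mm_velocity(substrate, vmax, km):
    """Reaction velocity under simple Michaelis-Menten kinetics (assumed model)."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be non-negative and km positive")
    return vmax * substrate / (km + substrate)


# Illustrative values only: Vmax = 10 (arbitrary units), Km = 2 (same units as [S]).
vmax, km = 10.0, 2.0
print(mm_velocity(km, vmax, km))         # at [S] = Km the velocity is Vmax / 2
print(mm_velocity(100 * km, vmax, km))   # at high [S] the velocity approaches Vmax
```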

The term "nucleotide analog" as used herein refers to modified or non-naturally occurring nucleotides such as 7-deaza purines (i.e., 7-deaza-dATP and 7-deaza-dGTP).

Nucleotide analogs include base analogs and comprise modified forms of deoxyribonucleotides as well as ribonucleotides. As used herein the term "nucleotide analog" when used in reference to substrates present in a PCR mixture refers to the use of nucleotides other than dATP, dGTP, dCTP and dTTP; thus, the use of dUTP (a naturally occurring dNTP) in a PCR would comprise the use of a nucleotide analog in the PCR. A PCR product generated using dUTP, 7-deaza-dATP, 7-deaza-dGTP or any other nucleotide analog in the reaction mixture is said to contain nucleotide analogs.

"Oligonucleotide primers matching or complementary to a gene sequence" refers to oligonucleotide primers capable of facilitating the template-dependent synthesis of single or double-stranded nucleic acids. Oligonucleotide primers matching or complementary to a gene sequence may be used in PCRs, RT-PCRs and the like.

A "consensus gene sequence" refers to a gene sequence which is derived by comparison of two or more gene sequences and which describes the nucleotides most often present in a given segment of the genes; the consensus sequence is the canonical sequence. A "motif" refers to the corresponding amino acid sequence defining a region of identity following a comparison of two or more amino acid sequences.
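A minimal Python sketch of deriving a consensus sequence follows. It assumes the input sequences are already aligned and of equal length, and it takes the most frequent residue in each column. The function name and the toy sequences are hypothetical. Where two residues tie in a column, the one seen first is reported, which is an arbitrary choice made for this sketch.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent residue at each position of aligned sequences."""
    if not aligned:
        raise ValueError("at least one sequence is required")
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


# Toy aligned sequences for demonstration only.
print(consensus(["ATGC", "ATGA", "TTGC"]))
```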

The term "polymorphic locus" refers to a locus present in a population which shows variation between members of the population (i.e., the most common allele has a frequency of less than 0.95). In contrast, a "monomorphic locus" is a genetic locus at which little or no variation is seen between members of the population (generally taken to be a locus at which the most common allele exceeds a frequency of 0.95 in the gene pool of the population).
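The 0.95 threshold in this definition can be applied to a list of observed alleles as in the following Python sketch. The function name and the sample data are hypothetical. The text does not say how a frequency of exactly 0.95 is classified, so the sketch treats it as not polymorphic, which is an assumption.

```python
from collections import Counter


def is_polymorphic(alleles, threshold=0.95):
    """True if the most common allele has a frequency below the threshold."""
    if not alleles:
        raise ValueError("at least one observed allele is required")
    most_common_count = Counter(alleles).most_common(1)[0][1]
    return most_common_count / len(alleles) < threshold


# Made-up samples of 100 observed alleles each.
print(is_polymorphic(["A"] * 90 + ["a"] * 10))  # most common allele at 0.90
print(is_polymorphic(["A"] * 99 + ["a"] * 1))   # most common allele at 0.99
```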

The term "microorganism" as used herein means an organism too small to be observed with the unaided eye and includes, but is not limited to bacteria, viruses, protozoans, fungi, and ciliates.

The term "microbial gene sequences" refers to gene sequences derived from a microorganism.

The term "bacteria" refers to any bacterial species including eubacterial and archaebacterial species.

The term "recombinant DNA molecule" as used herein refers to a DNA molecule which is comprised of segments of DNA joined together by means of molecular biological techniques.

The terms "in operable combination" or "operably linked" as used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the synthesis of a desired protein molecule is produced. When a promoter sequence is operably linked to sequences encoding a protein, the promoter directs the expression of mRNA which can be translated to produce a functional form of the encoded protein. The term also refers to the linkage of amino acid sequences in such a manner that a functional protein is produced.

The term "an oligonucleotide having a nucleotide sequence encoding a gene" means a DNA sequence comprising the coding region of a gene or, in other words, the DNA sequence which encodes a gene product. The coding region may be present in either a cDNA or genomic DNA form. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the coding region utilized in the expression vectors of the present invention may contain endogenous enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc. or a combination of both endogenous and exogenous control elements.

The term "recombinant oligonucleotide" refers to an oligonucleotide created using molecular biological manipulations, including but not limited to, the ligation of two or more oligonucleotide sequences generated by restriction enzyme digestion of a polynucleotide sequence, the synthesis of oligonucleotides (e.g., the synthesis of primers or oligonucleotides) and the like.

The term "recombinant oligonucleotide having a sequence encoding a protein operably linked to a heterologous promoter" or grammatical equivalents indicates that the coding region encoding the protein (e.g., an enzyme) has been joined to a promoter which is not the promoter naturally associated with the coding region in the genome of an organism (i.e., it is linked to an exogenous promoter). The promoter which is naturally associated or linked to a coding region in the genome is referred to as the "endogenous promoter" for that coding region.

The term "transcription unit" as used herein refers to the segment of DNA between the sites of initiation and termination of transcription and the regulatory elements necessary for the efficient initiation and termination. For example, a segment of DNA comprising an enhancer/promoter, a coding region, and a termination and polyadenylation sequence comprises a transcription unit.

The term "regulatory element" as used herein refers to a genetic element which controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element which facilitates the initiation of transcription of an operably linked coding region.

The term "expression vector" or "vector" as used herein refers to a recombinant DNA molecule containing a desired coding sequence and appropriate nucleic acid sequences necessary for the expression of the operably linked coding sequence in a particular host organism. Nucleic acid sequences necessary for expression in prokaryotes include a promoter, optionally an operator sequence, a ribosome binding site and possibly other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and termination and polyadenylation signals.

Transcriptional control signals in eukaryotes comprise "promoter" and "enhancer" elements. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription [Maniatis et al., Science 236:1237 (1987)]. Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect and mammalian cells and viruses (analogous control elements, i.e., promoters, are also found in prokaryotes). The selection of a particular promoter and enhancer depends on what cell type is to be used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types [for review see Voss et al., Trends Biochem. Sci. 11:287 (1986) and Maniatis et al., supra (1987)]. For example, the SV40 early gene enhancer is very active in a wide variety of cell types from many mammalian species and has been widely used for the expression of proteins in mammalian cells [Dijkema et al., EMBO J. 4:761 (1985)]. Two other examples of promoter/enhancer elements active in a broad range of mammalian cell types are those from the human elongation factor 1α gene [Uetsuki et al., J. Biol. Chem., 264:5791 (1989); Kim et al., Gene 91:217 (1990); and Mizushima and Nagata, Nuc. Acids. Res., 18:5322 (1990)] and the long terminal repeats of the Rous sarcoma virus [Gorman et al., Proc. Natl. Acad. Sci. USA 79:6777 (1982)] and the human cytomegalovirus [Boshart et al., Cell 41:521 (1985)].

The term "promoter/enhancer" denotes a segment of DNA which contains sequences capable of providing both promoter and enhancer functions (for example, the long terminal repeats of retroviruses contain both promoter and enhancer functions). The enhancer/promoter may be "endogenous" or "exogenous" or "heterologous." An endogenous enhancer/promoter is one which is naturally linked with a given gene in the genome. An exogenous (heterologous) enhancer/promoter is one which is placed in juxtaposition to a gene by means of genetic manipulation (i.e., molecular biological techniques).

DESCRIPTION OF THE DRAWINGS

Figure 1 shows the restriction map of a cosmid clone capable of restoring the ability to grow on toluene in toluene-nondegrading mutants.

Figure 2 shows the nucleic acid sequence (SEQ ID NO: 8) of the tutB gene and tutC gene (submitted to the GenBank data base and assigned accession number U57900).

Figure 3 shows the amino acid sequence of the tutB gene product (SEQ ID NO: 9).

Figure 4 shows the amino acid sequence of the tutC gene product (SEQ ID NO: 10).

Figure 5 shows the nucleic acid sequence of the tutD gene (SEQ ID NO: 11).

Figure 6 shows part of the nucleic acid sequence of the tutE gene (SEQ ID NO:12).

Figure 7 shows the amino acid sequence of the tutD gene product (SEQ ID NO:13).

Figure 8 shows the restriction map for pRK415.

Figure 9 shows the polylinker contained in pRK415.

Figure 10 shows the restriction map of a cosmid clone containing the tutD and tutE genes.

Figure 11 shows an expanded amino acid sequence of the tutD gene product (SEQ ID NO:14).

Figure 12 shows an expanded nucleic acid sequence encompassing both the tutD and tutE gene (SEQ ID NO:15).

Figure 13 shows the amino acid sequence for the tutE gene product (SEQ ID NO:16).

Figure 14 shows Northern gel results indicating that both tutD and tutE are regulated by toluene.

DESCRIPTION OF THE INVENTION

This invention relates to biological treatment of organic compounds, and particularly to the degradation of toluene. Toluene, along with benzene and xylenes, is a common contaminant of ground and surface water. Toluene has been classified by the U.S. Environmental Protection Agency as a priority pollutant due to its ability to depress the central nervous system and to enhance the effect of known carcinogens.

Anaerobic toluene degrading bacterial strains have been isolated. Most importantly, mutants have been obtained. These mutants fall into two classes, one class that fails to metabolize toluene, and another class that metabolizes toluene but fails to use it as a growth substrate.

A cosmid library was generated from total DNA isolated from the toluene-degrading bacterium strain T1. Triparental matings were used to identify a clone that restored the ability of mutants to grow on toluene and utilize it as a carbon source. This clone has now been characterized (Figure 1 shows the restriction map). The DNA of this clone has now been sequenced and the genes identified are believed to be both regulatory and structural.

Regulatory Genes

The sequence of the cloned SacII-ClaI-ClaI fragment (approximately 6.4 kb, containing the tutB gene and the tutC gene), which fully complements the tutB-16 mutation and carries all the information necessary to restore the ability to utilize toluene, is shown in Figure 2 (the restriction sites for SacII and ClaI are indicated in Figure 1 as "Sa" and "C" respectively, although not all SacII sites are shown; BamHI, HindIII, PstI, SmaI and SalI sites are indicated as "B", "H", "P", "Sm" and "S", respectively). The subclone complements the mutation when inserted into the pRK415 vector (described below) in either orientation. This strongly suggests that the subclone provides all the cis-acting factors necessary for gene expression and that the vector does not provide any elements essential for expression of the insert.

DNA sequence analysis of the fragment has identified an open reading frame that has homology to the nodW gene product of B. japonicum and other proteins presented in Figure 3. All of these proteins have been identified as DNA binding regulatory proteins and members of the two component family of signal transduction proteins. All have phosphorylation sites at a conserved aspartic acid residue. The tutB gene product also has an aspartic acid residue in the analogous location, at amino acid 58.

Additional DNA sequence analysis has identified a second open reading frame upstream of the tutB gene. This open reading frame, named tutC, has homology to the nodV gene product of B. japonicum and other proteins presented in Figure 4. These gene products are proposed to serve as the sensor protein in the two component regulatory system. In their role as sensor proteins, they must autophosphorylate and then transfer the phosphate to the DNA binding protein. The site of autophosphorylation is a histidine residue that is conserved in all the systems. The tutC gene product has a histidine residue in the analogous location at amino acid 757. As can also be seen in Figure 4, the homology of the sensor proteins extends only about 400 amino acids. This region is proposed to be the transmitter domain, the part of the protein that sends the regulatory signal to the DNA binding protein. The remainder of the protein presumably serves to detect the signal from the environment and would not be expected to be conserved across the different systems.

The proteins that have the greatest similarity to the tutCB gene products appear to regulate a diverse set of genes. Both FixL/FixJ from R. meliloti and from A. caulinodans regulate genes involved in nitrogen fixation, while FixL/FixJ from B. japonicum are proposed to regulate anaerobic respiratory genes. The nodVW gene products of B. japonicum play a role in the nodulation process, while the dctSR gene products of R. capsulatus serve as regulators of C4-dicarboxylate transport. It is apparent that these genes function in a similar manner but the classes of genes they regulate have little in common.

Structural Genes

Sequencing of another region of the cosmid clone has revealed the tutD gene (Figure 5 shows the sequence of an approximately 3.1 kb fragment) and part of the tutE gene (Figure 6). An expanded tutD gene is presented in Figure 12 (Figure 12 shows the sequence of an approximately 5 kb fragment) with a corresponding amino acid sequence presented in Figure 11 (shown aligned with other pyruvate formate lyases). An analysis of this sequence shows that tutD encodes a protein having homologies with the pyruvate formate lyase from E. coli (the PflD gene, GenBank G418519) and from Clostridium pasteurianum (GenBank G1072361) (Figure 7). Other pyruvate formate lyases also show homologies (not shown).

Pyruvate formate lyase catalyzes the conversion of pyruvate and CoA to acetyl-CoA and formate, which is the key step of the glucose fermentation route in anaerobically grown E. coli cells. See generally, Knappe and Wagner, Methods Enzymol. 258:343 (1995). The active form of pyruvate formate-lyase (PFL) from Escherichia coli contains a glycyl radical in position 734 of the polypeptide chain which is produced post-translationally by pyruvate formate-lyase-activating enzyme (PFL activase) using S-adenosylmethionine (AdoMet) and dihydroflavodoxin as co-substrates. A.F. Wagner et al., "The free radical in pyruvate formate-lyase is located on glycine-734," Proc. Natl. Acad. Sci. U.S.A. 89, 996-1000 (1992).

The glycyl radical has been shown to participate in catalysis by guiding the carbon-carbon bond cleavage step along a radical-chemical route. The radical is thought to interact with a cysteine residue; indeed, a reversible hydrogen transfer, induced by substrate binding, has been proposed between the Gly-734 resting-state spin localization and Cys-418, whose thiyl radical will function as the "working radical" for substrate processing.

It is not known how the homologue of the present invention functions. However, the comparison shown in Figure 7 reveals the essential glycine (marked in the Figure). While an understanding of the precise mechanism is not necessary to the successful practice of the invention, it is now known that a cysteine of the tutD gene product is also involved in the transfer that is ultimately directed to the methyl group of toluene (see discussion below).

Again, while it is not necessary to the successful practice of the invention, the lack of homology at the 5' end of the tutD gene suggests that this portion of the gene product involves the unique substrate recognition.

Experimental

The following examples serve to illustrate certain preferred embodiments and aspects of the present invention and are not to be construed as limiting the scope thereof.

In the experimental disclosure which follows, the following abbreviations apply: eq (equivalents); M (Molar); µM (micromolar); N (Normal); mol (moles); mmol (millimoles); µmol (micromoles); nmol (nanomoles); gm (grams); mg (milligrams); µg (micrograms); L (liters); ml (milliliters); µl (microliters); cm (centimeters); mm (millimeters); µm (micrometers); nm (nanometers); °C (degrees Centigrade).

Strains And Plasmids

The Escherichia coli strains HB101, XL-1 Kan Blue (Stratagene, La Jolla, CA), and XL-1 Blue (Stratagene), used to propagate and transfer DNA, were transformed by the calcium chloride technique or were purchased from the company as competent cells. Strain HB101(pRK2013) (KanR) contains a helper plasmid that permitted mobilization of cosmids and plasmids into the T1 strain background.

Plasmids used in this study include pLAFR3 for construction of the genomic cosmid library, pRK415 (Figure 8) for construction of subclones and matings, and the pBluescript vector (Stratagene) for subcloning and preparation of DNA fragments.

Ditta et al. [Plasmid 13(1985) 149-153] constructed the moderately-sized cloning vector pRK404 from pRK290. In order to increase the cloning usefulness of this plasmid, the EcoRI site outside the polylinker was deleted and the polylinker, derived from pUC9, was replaced by the pUC19 polylinker (Figure 9). The resulting construct, pRK415 (Figure 8), permits cloning into all of the polylinker restriction sites of pRK404 as well as the additional unique EcoRI, XbaI, KpnI and SstI sites. The SphI site of the pUC19 polylinker is not generally useful because an SphI site occurs elsewhere in the plasmid. The unique DraI, ApaI, SmaI and EcoRV sites are convenient for mapping the orientation of DNA fragments inserted into the polylinker sites. Since pRK415 retains the lac promoter of pRK404, bacterial genes inserted in the proper orientation into the polylinker should be expressed in E. coli. XGal color screening can also be used for plasmid constructions in E. coli. pRK415 has proven useful for subcloning and maintaining small DNA fragments in field isolates of P. syringae pv. glycinea and other P. syringae pathovars. If fragments larger than approx. 5 kb are cloned, however, from a few to more than 50% of the P. syringae exconjugants have been observed to suffer deletions in the inserted DNA.

The restriction map for pRK415 is shown in Figure 8. This DNA was transformed into strain JM-101, a blue colony on XGal medium was retained and the resultant plasmid designated pRK415. The deleted EcoRI site is shown in brackets. Restriction sites separated by a slash occur close together.

Media

Strain T1 and all strains derived from T1 were grown on either Brain Heart Infusion (BHI, Difco Laboratories, Detroit, MI) medium or a mineral salts medium (vitamins and yeast extract omitted). Unless otherwise specified, toluene (0.3 - 0.5 mM) or pyruvic acid (5 mM) was used as the carbon source to supplement the minimal medium. Nitrate was supplied to a concentration of 10 to 20 mM unless otherwise specified. Plates always contained 2% Agar Noble (Difco Laboratories). Liquid media were prepared and placed in serum bottles which were then tightly stoppered with teflon coated butyl rubber and aluminum crimp seals.

Anaerobic conditions were generated by evacuation and subsequent filling of the bottles with argon. This process was performed a total of four times. E. coli was grown in Luria-Bertani agar or broth (LB) or on BHI agar plates.

The antibiotics kanamycin (used at 50 µg/ml) and tetracycline (used at 25 µg/ml) were supplied where indicated. A 12.5 mg/ml stock of tetracycline was made in ethanol. Upon addition to minimal media the tetracycline served to select for the cosmid while the ethanol (final concentration of approximately 17 mM) served as the carbon source for the transconjugant strains.

Mutagenesis

Mutagenesis was carried out on strain T1 under aerobic conditions. Strain T1 was grown in a rich medium (BHI + nitrate), washed, and resuspended in 100 mM sodium citrate buffer (pH 5.5) to a cell density of about 3.5 x 10^8 cells/ml. The cell suspensions were treated with nitrosoguanidine (final concentration of 50 µg/ml) and aliquots were removed at various times. The mutagenized cells were harvested by centrifugation and washed with 100 mM potassium phosphate buffer (pH 7.0) to remove the nitrosoguanidine and then resuspended in the phosphate buffer. The treated cells were titered on BHI plates to establish a killing curve. The treatment group that resulted in about 50% killing was used for the isolation of mutants. Treated cells were diluted in phosphate buffer to yield 100-200 colonies per plate and spread onto minimal medium plates supplemented with nitrate and pyruvic acid.

After 5 days of incubation (30°C, anoxic) colonies were replica plated to rich medium and minimal medium with nitrate and toluene supplied in the vapor phase. The plates were placed in an anaerobic incubation jar which was then sealed and filled with hydrogen gas (to 12 psi). In the presence of a palladium catalyst, oxygen is removed by reaction with the hydrogen, producing water and resulting in an anoxic atmosphere. After 5 days of anaerobic incubation (30°C) colonies that grew on the rich medium but not on the minimal medium with nitrate and toluene were picked and streaked onto rich plates. The strains were retested for the ability to grow with toluene serving as the sole carbon source in both liquid and solid media. The strains were later tested for the ability to utilize toluene and produce the dead-end products benzylfumaric acid and benzylsuccinic acid in liquid culture.
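The selection of the treatment group from the killing curve can be sketched as follows. This is a minimal illustration, not the procedure's actual bookkeeping: the exposure times and titers are hypothetical values chosen for the example, and the function names are assumptions.

```python
def percent_killing(untreated_cfu_per_ml, treated_cfu_per_ml):
    """Percent of cells killed relative to the untreated control titer."""
    return 100.0 * (1.0 - treated_cfu_per_ml / untreated_cfu_per_ml)


def pick_treatment(titers, untreated_time=0, target=50.0):
    """Return the exposure time whose killing is closest to the target percentage.

    titers maps exposure time (minutes) to viable titer (CFU/ml);
    the entry at untreated_time is the untreated control.
    """
    control = titers[untreated_time]
    killing = {
        t: percent_killing(control, cfu)
        for t, cfu in titers.items()
        if t != untreated_time
    }
    best = min(killing, key=lambda t: abs(killing[t] - target))
    return best, killing


# Hypothetical titers (CFU/ml) keyed by minutes of nitrosoguanidine exposure.
example_titers = {0: 3.5e8, 5: 2.8e8, 10: 1.8e8, 20: 0.6e8}
best_time, killing_by_time = pick_treatment(example_titers)
print(best_time, {t: round(k, 1) for t, k in killing_by_time.items()})
```

With these illustrative numbers the 10-minute aliquot (about 49% killing) would be the one carried forward for mutant isolation.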

Chemicals

Tetracycline was purchased from Fluka (Ronkonkoma, NY). Kanamycin and N-methyl-N'-nitro-N-nitrosoguanidine (nitrosoguanidine) were obtained from Sigma (St. Louis, MO).

Construction Of Cosmid Library

Strain T1 was grown in 500 ml of minimal + nitrate + ethanol medium under anaerobic conditions and genomic DNA was isolated. The DNA was purified by two successive CsCl gradient centrifugations. A partial digest of the DNA with Sau3AI enzyme was carried out and fragments of 15-25 kb were isolated on a 10-40% glycerol gradient.

These fragments were ligated into the BamHI site of pLAFR3. The resulting ligation mix was packaged into phage heads using a Packagene kit from Promega (Madison, WI). E. coli strain HB101 was infected with the phage and plated onto LB + tetracycline plates. The resulting 750 colonies were streaked on plates of the same medium and the isolates served as the genomic library for obtaining the cosmid clone.
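How thoroughly a 750-clone library of 15-25 kb inserts samples a genome can be estimated with the standard Clarke-Carbon relation, P = 1 - (1 - f)^N, where f is the insert size divided by the genome size. The sketch below is illustrative only: the genome size of strain T1 is not stated here, so the 4,500 kb figure is an assumed placeholder, and 20 kb is taken as the midpoint of the stated insert range.

```python
import math


def library_coverage(n_clones, insert_kb, genome_kb):
    """Probability that a given sequence is present in a random library (Clarke-Carbon)."""
    fraction = insert_kb / genome_kb
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(probability, insert_kb, genome_kb):
    """Number of clones needed to reach the requested probability of representation."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


ASSUMED_GENOME_KB = 4500.0  # assumption: not given in the text
MIDPOINT_INSERT_KB = 20.0   # midpoint of the 15-25 kb size fraction

coverage = library_coverage(750, MIDPOINT_INSERT_KB, ASSUMED_GENOME_KB)
needed_99 = clones_needed(0.99, MIDPOINT_INSERT_KB, ASSUMED_GENOME_KB)
print(f"coverage with 750 clones: {coverage:.3f}; clones for 99%: {needed_99}")
```

Under these assumptions a 750-clone library would contain any given locus with a probability of roughly 96%; a different genome size would change that figure accordingly.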

Triparental Mating

Triparental matings were carried out. Mutants of strain T1 were grown for 3 days in minimal + nitrate + pyruvic acid media. HB101 (or XL-1 Kan Blue) carrying the donor cosmid or plasmid was grown in LB + tetracycline overnight. HB101(pRK2013) was grown in LB + kanamycin overnight. One ml of each culture was centrifuged and resuspended in an equal volume of 100 mM phosphate buffer (pH 7). Ten µl of each culture was spotted (one on top of the other) onto a BHI + nitrate plate. After a three day incubation at 30°C in an anoxic environment, the resulting growth was scraped off the plate, resuspended in phosphate buffer, and spotted onto a minimal agar plate containing pyruvic acid, nitrate, ethanol, and tetracycline to select for transconjugants. After another three day incubation, cells from the resultant growth were streaked onto the same media and grown in a sealed jar in the absence of oxygen. After three days of incubation, single transconjugant colonies were isolated from these plates and tested for complementation.

Restriction Mapping And Subcloning

DNA manipulations were carried out as described by Maniatis et al. All enzymes were obtained from New England Biolabs (Beverly, MA). Cosmid 13-6-4 was the original clone isolated. Plasmid pPWC1-HSma was constructed in two steps. The first step entailed deleting the HindIII fragment of 13-6-4 (from the HindIII site internal to the insert to the HindIII site (not shown in Figure 1) in the pLAFR vector just beyond (to the right of) the BamHI site) by digestion of 13-6-4 with HindIII and subsequent religation. The resulting cosmid (13-6-4-ΔH) was digested with the enzymes HindIII and SmaI and the 3.8 kb DNA fragment was isolated and inserted into HindIII-SmaI digested pBluescript. The HindIII-SmaI fragment was transferred to pRK415 by cutting both plasmids with the enzymes XbaI and KpnI and then isolating and ligating the fragments. The resulting plasmid was designated pPWC2-HSma (see Figure 1). Plasmid pPWC1-Cs was constructed by cutting 13-6-4 with ClaI enzyme, isolating the small (3.3 kb) DNA fragment and inserting it into ClaI digested, calf intestinal alkaline phosphatase treated pBluescript. The ClaI fragment was transferred into pRK415 by cutting pPWC1-Cs and the vector with XbaI and KpnI enzymes (to generate pPWC2-Cs) or with KpnI and EcoRI enzymes (to generate pPWC2-Cs', the reverse orientation of pPWC2-Cs) and ligating.

Restriction mapping was carried out with fragments inserted into the pBluescript vector to facilitate identification of restriction sites and to help place the sites on a restriction map. Digests were run on varying percentages of agarose gels with size standards to estimate the size of the fragments and to locate restriction sites.

Testing For Complementation

Cosmid clones and subclones constructed in pLAFR3 or plasmid subclones constructed in pRK415 were mated into the tutB-16 mutant background via the triparental mating technique. The resultant transconjugant strain was tested to determine if the subclone complements the mutation. First, the transconjugants were streaked onto minimal + nitrate plates in which toluene was supplied in the vapor phase. After 5-7 days of anaerobic incubation (30°C), the subclones were scored for the ability to restore growth on toluene to the mutants. The transconjugants were also grown in sealed 50 ml serum bottles of minimal + nitrate (10 mM) + pyruvic acid (1 mM) + toluene (0.4 mM) liquid media with an argon headspace. After 3-4 days of incubation (30°C) samples were withdrawn for toluene and dead-end product analysis (see below). The clones were scored for the ability to restore toluene utilization (in the presence of pyruvate) in liquid culture and for the ability to restore production of the dead-end metabolites under the same conditions to the mutants. If the transconjugant was positive for all three of these tests, the subclone was considered to complement the mutation.
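The three-part scoring rule described above can be expressed compactly. The record type and field names below are hypothetical, introduced only to illustrate that a subclone is scored as complementing when, and only when, all three tests are positive.

```python
from dataclasses import dataclass


@dataclass
class ComplementationResult:
    """Outcome of the three tests for one transconjugant (field names are illustrative)."""
    grows_on_toluene_plates: bool        # growth with toluene supplied in the vapor phase
    utilizes_toluene_in_liquid: bool     # toluene utilization in the presence of pyruvate
    produces_dead_end_products: bool     # dead-end metabolites detected in liquid culture

    def complements(self) -> bool:
        """A subclone complements the mutation only if all three tests are positive."""
        return all(
            (
                self.grows_on_toluene_plates,
                self.utilizes_toluene_in_liquid,
                self.produces_dead_end_products,
            )
        )


full = ComplementationResult(True, True, True)
partial = ComplementationResult(True, True, False)
print(full.complements(), partial.complements())
```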

Toluene Analysis

One ml samples of the culture to be tested were withdrawn anaerobically and added to 400 µl of pentane containing 1 mM fluorobenzene as an internal standard in a sample vial. One µl of the organic phase (into which toluene had been extracted) was injected using a CTC A200S autosampler (LEAP Technologies, Chapel Hill, NC) into an HP5890 gas chromatograph (Hewlett Packard, Palo Alto, CA) equipped with a Flame Ionization Detector, a DB-WAX column (J&W Scientific, Folsom, CA) and helium as the carrier gas. The injector temperature was set at 250°C, the detector at 300°C, and the column at 35°C. The amount of toluene present in each sample was quantified by comparison to external standards using the Chemstation software (Hewlett Packard).
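Quantification against standards can be illustrated with a simple calibration sketch. The calibration model used by the Chemstation software is not described here; the example below assumes a linear relation between toluene concentration and the toluene/fluorobenzene peak-area ratio, and every peak area and standard concentration in it is a hypothetical value.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def toluene_mM(area_toluene, area_internal_std, slope, intercept):
    """Convert a sample's peak-area ratio to concentration using the fitted line."""
    ratio = area_toluene / area_internal_std
    return (ratio - intercept) / slope


# Hypothetical standards: toluene concentration (mM) versus
# toluene/fluorobenzene peak-area ratio.
standard_conc = [0.1, 0.2, 0.4, 0.8]
standard_ratio = [0.25, 0.5, 1.0, 2.0]

slope, intercept = fit_line(standard_conc, standard_ratio)
sample_conc = toluene_mM(750.0, 1000.0, slope, intercept)
print(f"estimated toluene: {sample_conc:.2f} mM")
```

Normalizing to the internal standard compensates for injection-to-injection variation; whether the original analysis used area ratios or raw areas is an assumption of this sketch.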

Analysis Of Dead-End Products

Samples of the culture were withdrawn anaerobically with a sterile syringe flushed with argon. The samples were centrifuged (5 min., microfuge) and the supernatant was filtered through a 0.45 µm filter (Millipore, Bedford, MA) into a sample vial. Samples were analyzed by high pressure liquid chromatography using a Beckman System Gold HPLC (Fullerton, CA) equipped with a Gilson (Middleton, WI) autosampler and a C18 column (250 mm by 4.6 mm, particle size 5 µm, Beckman) with UV detection at 260 nm. The mobile phase was 30:68:2 methanol:water:acetic acid (vol/vol) at a flow rate of 1 ml/min. Peaks were identified by comparison to the external standards benzylmaleic acid and benzylsuccinic acid.

Plasmid DNA Preparation

In general, DNA plasmid minipreps were performed. When larger scale preps were needed, Qiagen maxi-preps were carried out (Qiagen, Chatsworth, CA) according to the manufacturer's instructions.

DNA Sequence Analysis

DNA was sequenced (both strands) by the dideoxy method of Sanger et al. with [α-35S]dATP serving as the label. Sequenase enzyme (modified T7 polymerase) and reagents were obtained in a Sequenase kit from U.S. Biochemicals (Cleveland, Ohio). The Bluescript vector and the T3, T7, -20, and M13 reverse primers used for sequence analysis were obtained from Stratagene. An Erase-a-Base System (Promega, Madison, Wis.) was used to generate deletions of the cloned DNA inserted in the Bluescript vector for sequence analysis.

Synthetic oligonucleotide primers were also purchased so that sequence data could be obtained to fill in gaps not covered by the deletions. Searches for protein sequence similarity were carried out against the Swissprot data base (release 32.0) of protein sequences using the FASTA and BLAST programs in the GCG software package (version 7.2) (GCG software, Madison, Wis.). Multiple sequence alignment was performed with the Lasergene software package from DNASTAR (Madison, Wis.).

Site-Directed Mutagenesis

The QuikChange site directed mutagenesis kit (Stratagene) is used to make mutations in the tutD gene. To change the glycine to an alanine, primers G828AF (GTGCGCGTTTCCGCCTACAGCGCTC) and G828AR (GAGCGCTGTAGGCGGAAACGCGCAC) are synthesized and used as directed. Plasmid pPWC3-CL-SacII serves as the target for the mutagenesis. The resulting plasmids are sequenced to identify those containing the desired mutation. The 4.9 kb SacI/SacII fragments of three plasmids with the correct change are subcloned into plasmid pRK415 and used to test for complementation of the tutD17 mutation. To change the cysteine at position 492 to an alanine, primers C492AF (CAACGTGCTGGCCATGTCGCCCGGCATCC) and C492AR (GGATGCCGGGCGACATGCCCAGCACGTTG) are synthesized and used in the same manner described above.
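Mutagenic primer pairs of this kind are normally designed as exact reverse complements of one another, which makes a quick consistency check possible. The sketch below uses the four primer sequences exactly as printed above; the helper names are illustrative and are not part of any kit or software mentioned in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(forward, reverse):
    """1-based positions where `reverse` differs from the reverse complement of `forward`."""
    expected = reverse_complement(forward)
    if len(expected) != len(reverse):
        raise ValueError("primers differ in length")
    return [i + 1 for i, (a, b) in enumerate(zip(expected, reverse)) if a != b]


# Primer sequences as printed in the text (forward, reverse).
PRIMERS = {
    "G828A": ("GTGCGCGTTTCCGCCTACAGCGCTC", "GAGCGCTGTAGGCGGAAACGCGCAC"),
    "C492A": ("CAACGTGCTGGCCATGTCGCCCGGCATCC", "GGATGCCGGGCGACATGCCCAGCACGTTG"),
}

report = {name: mismatches(fwd, rev) for name, (fwd, rev) in PRIMERS.items()}
print(report)
```

As printed here, the G828A pair is an exact reverse complement, while the C492A pair differs at a single position (position 18 of the reverse primer). That discrepancy may be a transcription artifact in this text rather than a feature of the primers actually used, so sequences should be confirmed against the sequence listing before ordering oligonucleotides.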

EXAMPLE 1

This example describes the isolation and characterization of tut mutants. Cells of strain T1 were grown and mutagenized with nitrosoguanidine as described above. Mutants were isolated from the treatment group that resulted in about 50% killing. Cells were diluted and plated onto minimal medium supplemented with nitrate and pyruvic acid to a density of about 100 - 200 colonies per plate. After about 5 days of incubation at 30°C in the absence of oxygen the colonies were replica plated to both rich medium and minimal medium with nitrate and with toluene supplied in the vapor phase. After incubation, colonies that grew on the rich medium but failed to grow on the minimal medium with nitrate and toluene were chosen for further study. Of about 10,000 colonies screened, 32 candidates were isolated in this manner. These 32 mutant candidates were again tested for their ability to grow on minimal medium supplemented with nitrate and toluene both in liquid and on plates.

Retesting the candidates identified seven mutants which were truly defective for toluene utilization. These seven were designated tut mutants for their defect in toluene utilization.

The seven tut mutants were tested for their ability to grow on various carbon sources.

Four of the mutants are able to use benzoic acid and phenylpropionic acid as a sole carbon source while three are not able to use either substrate. Based on this observation, the first group is predicted to be blocked early in the toluene utilization pathway and was designated tutB. The second group is blocked later in the pathway, probably in benzoic acid utilization. This group was designated tutA. These designations are not meant to imply that all mutants in a particular group are defective in the same gene or in the same step of the pathway, only that they utilize the same range of substrates.

The tut mutants were also tested for their ability to metabolize toluene when provided with both toluene and pyruvic acid in liquid media. Pyruvic acid was added to ensure that the strains grew and that there was no selective pressure for reversion of the mutation to occur. Although the tutB-16 mutant metabolized toluene slightly, none of the tutB mutants tested were able to metabolize toluene to the same extent as the wild type control. Similarly, the tutB mutants did not produce significant amounts of the dead-end products benzylsuccinic acid and benzylfumaric acid. Members of the tutA class of mutants were able to both metabolize toluene and produce the dead-end products. This result indicates that the tutB mutants are blocked in a step (or steps) that is common to both the metabolic degradation of toluene and the side reaction that produces the dead-end compounds or in the regulation of such a step (or steps).

EXAMPLE 2

This example describes the generation of a T1 DNA library and the isolation of a clone that complements the tutB-16 mutant. It has previously been shown that pLAFR3 derived cosmids can be transferred into and stably maintained in the strain T1 background.

Consequently, this vector was chosen for the construction of a genomic DNA library of strain T1. Genomic DNA was isolated from strain T1 as described above. A partial digest of the genomic DNA was carried out with the restriction enzyme Sau3AI and fragments of between 15 and 25 kb were isolated. These fragments were ligated into the BamHI site of pLAFR3.

The resulting ligation mix was packaged into lambda phage heads and used to infect E. coli strain HB101. About 750 tetracycline resistant E. coli colonies were picked and formed the genomic library used to isolate clones that complement the tut mutations.

The genomic T1 library constructed in pLAFR3 was introduced into a T1 derived strain carrying the tutB-16 mutation via a triparental mating. The donors for all the cosmids were E. coli strain HB101 derived strains, while E. coli HB101 carrying plasmid pRK2013 served as the helper to mobilize the cosmids. Transconjugants were selected on minimal medium supplemented with nitrate, pyruvic acid, and tetracycline and then screened for the ability to grow with toluene serving as the sole carbon source. One cosmid, designated 13-6-4, restored the ability of the tutB-16 carrying T1 strain to grow on toluene. This cosmid also restored the ability of the mutant strain to metabolize toluene in the presence of pyruvic acid in liquid culture and produce the dead-end products benzylsuccinic acid and benzylfumaric acid in this culture. This cosmid was used for further subcloning and restriction mapping to specifically identify the region containing the complementing gene.

In an effort to determine where on the cosmid the fragment that complements the tutB-16 mutation lies, deletions and subclones were constructed. All subclones were made in plasmid pRK415, a broad host range tetracycline resistance vector that can be conjugatively transferred into the T1 background in the same manner as pLAFR3 and is stably maintained in this background. Figure 1 shows a restriction map of cosmid 13-6-4. The relevant region of the cosmid is shown in more detail. The figure includes a number of subclones that were constructed in an effort to identify the region of the cosmid that contains the complementing gene.

Complementation tests were performed for the various subclones shown in Figure 1 when mated into a T1 strain carrying the tutB-16 mutation. Complementation was assayed in three ways: (1) the ability to grow with toluene serving as the sole carbon source on solid media, (2) the ability to metabolize toluene in the presence of pyruvic acid in liquid media, and (3) the ability to produce the dead-end products benzylsuccinic acid and benzylfumaric acid from toluene in liquid media. The original clone and all complementing subclones were positive (i.e., behaved just as the wild type strain) in all three assays.

The small 3.3 kb ClaI fragment of 13-6-4 when inserted into pRK415 in either orientation is able to complement the tutB-16 mutation. Subclones constructed that do not contain this entire region do not complement this mutation. These results indicate that this 3.3 kb fragment is sufficient to replace the missing activity in the tutB-16 mutant strain.

EXAMPLE 3

This example describes the sequence analysis of the tutCB region. The complete nucleotide sequence of the 3.3 kb ClaI fragment of 13-6-4 (containing the tutB gene) was determined in both orientations. Analysis of this sequence revealed the presence of a second open reading frame (designated tutC) upstream of the tutB gene. As a result, the sequence was extended to a SacII site about 3 kb upstream of the ClaI site. Figure 2 presents the complete 6393 bp nucleotide sequence of the tutCB region. The protein translations of the two genes are presented below the DNA sequence in the figure. The TutC protein is 979 amino acids long with a calculated molecular mass of 108.0 kDa and a calculated pI of 5.2, while the TutB protein is 218 amino acids long with a calculated molecular mass of 24.3 kDa and a calculated pI of 7.9.
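The calculated molecular masses can be sanity-checked from the protein lengths alone. The sketch below uses the common rule of thumb of roughly 110 Da per amino acid residue; this average is an assumption of the example and is not how the reported values were computed, which would have used the actual residue composition.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def approx_mass_kda(n_residues):
    """Rough molecular mass estimate from chain length alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# (length in amino acids, calculated mass in kDa) as reported in the text.
reported = {"TutC": (979, 108.0), "TutB": (218, 24.3)}

for name, (length, mass_kda) in reported.items():
    estimate = approx_mass_kda(length)
    print(f"{name}: estimated {estimate:.1f} kDa, reported {mass_kda:.1f} kDa")
```

Both estimates fall within about 1 kDa of the reported values, which is the level of agreement expected from a length-only approximation.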

Goldman-Engelman-Steitz hydropathicity analysis failed to detect any membrane spanning regions in either protein but Kyte-Doolittle analysis suggested two possible membrane spanning regions in the TutC protein, amino acids 367-399 and 489-508 (data not shown). The translation of the tutB gene is shown as overlapping the sequence of the tutC gene by 13 nucleotides. This methionine was chosen as likely to be the first amino acid in the sequence based on the location of a potential Shine-Dalgarno sequence and protein similarity analysis.

The protein sequence of the tutC gene product was compared to the Swissprot protein data base in an effort to identify other proteins with homologous sequences. The results of this analysis are presented in Figure 4. The TutC protein shows significant sequence similarity to sensor members of the two component family of signal transduction proteins, a set of bacterial regulatory proteins in which one member senses the environmental conditions of the microorganism and transmits a signal (via phosphorylation) to the other member (a DNA binding protein). The five proteins, all sensor proteins, with the greatest sequence similarity to the tutC gene product are included in Figure 4. These proteins (and their percent identity to the tutC gene product) are the products of the nodV gene of Bradyrhizobium japonicum (36%), the fixL gene of B. japonicum (33%), Azorhizobium caulinodans (30%), and Rhizobium meliloti (30%), and the dctS gene of Rhodobacter capsulatus (33%).
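Percent identity figures such as those above are derived from pairwise alignments. The following sketch shows one common way to compute the value; the choice of denominator (aligned positions with no gap in either sequence) is an assumption, since the convention used by the FASTA and BLAST programs for the reported numbers is not stated, and the two short sequences are toy examples rather than real TutC or NodV fragments.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned positions that have no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


# Toy aligned sequences (not taken from the figures).
toy_a = "MKT-AYIAK"
toy_b = "MRTQAY-AR"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values computed this way are comparable only when the same convention is used throughout.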

In a similar manner, the sequence of the tutB gene product was compared to the Swissprot protein data base in an effort to identify other proteins with homologous sequences.

The results of this analysis are presented in Figure 3. The TutB protein shows significant sequence similarity to DNA binding protein members of two component sensor/regulator families. These proteins (and their percent identity to the tutB gene product) are the products of the nodW gene of B. japonicum (48%), the fixJ gene of B. japonicum (38%), A. caulinodans (37%), and R. meliloti (39%), and the dctR gene of R. capsulatus (38%).

Because the similarity between these proteins and TutB extends nearly to the methionine that overlaps the tutC gene product, it is believed that translation begins at this overlapping methionine. Based on the results of the sequence similarity analysis and the previous result that the toluene utilization pathway of strain T1 is inducible, the tutB and tutC gene products are likely involved in the regulation of gene expression (specifically of toluene metabolic genes) in response to toluene.

EXAMPLE 4 This example describes the identification and cloning of the tutD and tutE genes. One class of mutants, the tutB class, is unable to grow with toluene serving as the sole carbon source but is able to grow when provided with benzoate. These mutants are also unable to metabolize toluene (at wild type levels) when provided with pyruvate and are unable to produce (at wild type levels) benzylsuccinic acid and a monounsaturated derivative from toluene in liquid media. P.J. Evans, et al., Metabolites formed during anaerobic transformation of toluene and o-xylene and their proposed relationship to the initial steps of toluene mineralization. Appl. Environ. Microbiol. 58:496(1992). Hence, it is determined that this class of mutants is blocked early in the toluene utilization pathway. A cosmid with a genomic insert of approximately 20 kb (cosmid 13-6-4) is isolated for its ability to complement the tutB16 mutation. P.W. Coschigano, et al., Identification and sequence analysis of two regulatory genes involved in anaerobic toluene metabolism by strain T1. Appl. Environ. Microbiol. 63:652(1997). This original cosmid clone and a number of subclones generated in the characterization of the tutB gene are tested for their ability to complement the mutations referred to as tutB17 and tutB21, which have phenotypes similar to the tutB16 mutation. These mutations are placed in new complementation groups and are designated tutD17 and tutE21.

To determine where on the cosmid the fragments that complement the tutD17 and tutE21 mutations are located, a series of subclones is constructed. Subclones are made in plasmid pRK415, a broad host range, tetracycline resistant vector that can be conjugatively transferred into the T1 background. Figure 10 shows a restriction map of cosmid 13-6-4 and a schematic representation of three of the subclones. Each subclone is tested for its ability to complement the tutD17 and tutE21 mutations. Complementation is assayed in three ways: (1) the ability to grow with toluene serving as the sole carbon source on solid media, (2) the ability to metabolize toluene in the presence of pyruvic acid in liquid media, and (3) the ability to produce benzylsuccinic acid and a monounsaturated derivative from toluene in liquid media. P.J. Evans, et al., Metabolites formed during anaerobic transformation of toluene and o-xylene and their proposed relationship to the initial steps of toluene mineralization. Appl. Environ. Microbiol. 58:496(1992). Restoration of the wild type phenotype in all three assays is required in order for a subclone to be considered as complementing the mutation.

As shown in Figure 10B, the tutD17 mutation and the tutE21 mutation are complemented by mutually exclusive subclones. The 3.0 kb NcoI fragment of 13-6-4 (pPWC4-CLN) is able to complement the tutD17 mutation but not the tutE21 mutation. Conversely, the adjacent 1.3 kb NcoI/SacII fragment (pPWC4-CLNSac) is able to complement the tutE21 mutation but not the tutD17 mutation. These data suggest that the 3.0 kb NcoI fragment is sufficient to replace the missing activity in the tutD17 mutant strain and that the 1.3 kb NcoI/SacII fragment is sufficient to replace the missing activity in the tutE21 mutant strain, thereby confirming that the mutations belong to distinct complementation groups.

EXAMPLE 5 This example describes the complete nucleotide sequence of the 4905 bp SacII/EcoRI fragment of cosmid 13-6-4 (containing the tutD and tutE genes), as determined for both strands. This nucleotide sequence has been deposited in GenBank (accession number AF036765). Analysis of this sequence reveals the presence of four open reading frames on the same strand of DNA. The first open reading frame, present between the SacII and NcoI sites (subclone pPWC4-CLNSac) and corresponding to the tutE gene, encodes a sequence of 375 amino acids. The TutE protein has a calculated molecular mass of 41,300 Da and a predicted pI of 6.8.

Two open reading frames are identified on the 3.0 kb NcoI fragment immediately downstream of the tutE gene (subclone pPWC4-CLN). The first of these two open reading frames (designated open reading frame 2) consists of a 60 amino acid sequence which would code for a protein with a calculated molecular mass of 6,900 Da and a predicted pI of 5.2. The translational start is located at the NcoI restriction site, and hence no upstream transcriptional regulatory sites or ribosome binding sites for this open reading frame are included on this fragment. Therefore, it is highly unlikely that this open reading frame is responsible for the complementation of the tutD17 mutation observed with this subclone. This observation, along with evidence from the site-directed mutagenesis experiments, indicates that ORF2 is not the tutD gene.

The second open reading frame in this fragment is 864 amino acids in length with a calculated molecular mass of 97,600 Da. The predicted pI of this protein is 6.0. Results from the site-directed mutagenesis clearly show that this open reading frame corresponds to the tutD gene.

The fourth open reading frame (designated open reading frame 4) identified in the SacII/EcoRI fragment consists of a sequence of 81 amino acids with a calculated molecular mass of 9,300 Da and a predicted pI of 7.8. The pPWC4-CLN subclone removes approximately 50% of the C-terminal end of this protein. This result, in conjunction with the evidence presented regarding the third open reading frame, indicates that this 81 amino acid protein is not the tutD gene product.
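The open reading frame analysis described in this example can be sketched in Python as a scan of one DNA strand in its three reading frames for an ATG followed in frame by a stop codon. This is a simplified illustration and not the analysis software used for the SacII/EcoRI fragment: it ignores alternative start codons, ribosome binding sites, and the complementary strand, and the function name and the `min_codons` cutoff are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=50):
    """Return (start, end, n_codons) for ATG-to-stop ORFs on the given strand.

    start and end are 1-based nucleotide positions, end includes the stop
    codon, and n_codons counts the coding codons (stop codon excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, n_codons))
                start = None  # resume the search after this stop codon
    return sorted(orfs)
```

Under these assumptions, an open reading frame of 375 amino acids such as the one assigned to tutE would be reported with `n_codons` equal to 375.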

EXAMPLE 6 This example describes homologies between the protein sequences of the tutD and tutE gene products and proteins in the GenBank protein database. The BLAST program identified a number of similar proteins, all of which are identified as either pyruvate formate-lyases (formate acetyl transferases) or pyruvate formate-lyase homologues. Interestingly, the sequences showing the highest degree of similarity with TutD are the E. coli proteins f810 (27% identical to TutD as calculated by the BLAST program) and PflD (26% identical to TutD), both pyruvate formate-lyase homologues. F.R. Blattner, et al., Analysis of the Escherichia coli genome. IV. DNA sequence of the region from 89.2 to 92.8 minutes. Nucleic Acids Res. 21:5408(1993). F.R. Blattner, et al., The complete genome sequence of Escherichia coli K-12. Science (Wash. D.C.). 277:1453(1997).

The sequence similarities between TutD and these two proteins plus PflB (22% identical to TutD), a pyruvate formate-lyase from E. coli, are shown in Figure 11. R. Rabus, et al., Complete oxidation of toluene under strictly anoxic conditions by a new sulfate-reducing bacterium. Appl. Environ. Microbiol. 59:1444(1993). A.F. Wagner, et al., The free radical in pyruvate formate-lyase is located on glycine-734. Proc. Natl. Acad. Sci. USA. 89:996(1992). As can be seen in Figure 11, the most conserved region is in the carboxyl end of these proteins. There is a highly conserved region around the glycine residue at position 828 of TutD (marked with an asterisk). In the E. coli pyruvate formate-lyase, this glycine has been shown to form a free radical which is essential for enzymatic function.

Additionally, in a less conserved region there is a cysteine residue at position 492 of TutD (marked with a dagger) that has been shown to transiently form a covalent bond with the acetyl group that is being transferred, an action which is also essential to enzyme function.

A. Ogiwara, et al., Construction and analysis of a profile library characterizing groups of structurally known proteins. Protein Sci. 5:1991(1996). W. Rödel, et al., Primary structure of Escherichia coli pyruvate formate-lyase and pyruvate formate-lyase activating enzyme deduced from the DNA nucleotide sequences. Eur. J. Biochem. 177:153(1988). While it is not intended that the instant invention be limited to any one mechanism, the results of this protein sequence similarity analysis suggest a mechanism for TutD in which glycine-828 forms a free radical that is necessary for the transient formation of a covalent bond between cysteine-492 and the compound (possibly acetate or fumarate) that is being transferred to the methyl group of toluene (or a toluene metabolite). This mechanism may involve a transient cysteine radical at an undetermined location, as proposed in the E. coli pyruvate formate-lyase system. A.F. Wagner, et al., The free radical in pyruvate formate-lyase is located on glycine-734. Proc. Natl. Acad. Sci. USA. 89:996(1992).

A similar search was performed with the protein sequence of the tutE gene product.

The proteins with the highest homology are identified as pyruvate formate-lyase activating enzymes or pyruvate formate-lyase activating enzyme homologues. The sequence similarities between TutE and f308 (34% identical to TutE as calculated by the BLAST program), PflC (32% identical to TutE), and PflA (28% identical to TutE) (all from E. coli) are shown in Figure 13. Subsequent subjection of the TutE protein sequence to a Motif analysis identified a radical activating region from amino acids 60 to 81 (labeled with a line over it in Figure 13). This region, which contains potential Fe binding sites (as identified by the Motif analysis), is conserved in the pyruvate formate-lyase activating enzymes. Additionally, the analysis revealed a 4Fe-4S binding domain typically found in ferredoxins (amino acids 98 to 109, labeled with a box over it in Figure 13). This region is not very well conserved in the E. coli pyruvate formate-lyase activating enzyme and homologues. PflA is missing this region, and both f308 and PflC have alterations to the spacing or sequence. The results of this protein sequence similarity analysis are consistent with the predicted role of TutE serving as the activator for TutD and suggest that the activation may involve iron and/or iron-sulfur binding.

EXAMPLE 7 This example describes various protocols to examine the regulation of the tutD and tutE genes. To confirm that the tutD and tutE genes are regulated in response to toluene, a Northern blot analysis is performed. Wild type cells of strain T1 are grown in liquid media containing either pyruvate or toluene as the carbon source. RNA is isolated from both of these cultures and subjected to Northern analysis. About 1 microgram of total RNA from each culture is loaded in each of two lanes on a 1% gel. After electrophoresis, the RNA is transferred to a nylon membrane, which is cut in two. One set of RNA is hybridized to a tutD probe while the other is hybridized to a tutE probe. Figure 14 shows that only cells grown with toluene as the carbon source have tutD and tutE mRNA. It can also be seen that the sizes of the two messages differ, indicating that the two genes are not contained in one polycistronic mRNA. The fact that both genes are regulated by toluene suggests that common regulatory protein binding sites are located upstream of these and possibly other toluene metabolic genes.

EXAMPLE 8 This example describes the site-directed mutagenesis of the TutE protein. Specifically, two cysteines are individually changed to an alanine in an effort to determine if the conserved potential Fe binding site (as identified by the Motif analysis) of TutE plays a role in the enzymatic function of the protein. Three independent isolates of the resulting plasmids (pPWC4-CLNSac-C72A, pPWC4-CLNSac-C79A, and pPWC4-CLNSac-C101A) are mated into the strain carrying the tutE21 mutation and the resulting transconjugants are then tested for their ability to complement the mutation. The plasmid carrying the unaltered clone (pPWC4-CLNSac) fully complements the tutE21 mutation (utilizing 100% of the toluene provided in the presence of pyruvate and producing wild type levels of benzylsuccinic acid and a monounsaturated derivative). Neither of the altered plasmids pPWC4-CLNSac-C72A and pPWC4-CLNSac-C79A is able to complement the tutE21 mutation (see Table 2). Both of these strains utilize about the same amount of toluene as is utilized by the mutant carrying plasmid pRK415, the vector alone. Likewise, they produce significantly less benzylsuccinic acid and a monounsaturated derivative than the tutE21 mutant strain carrying the unaltered plasmid pPWC4-CLNSac. In fact, they produce about the same amount of these compounds as the mutant carrying plasmid pRK415. Therefore, the results in Table 2 clearly demonstrate that cysteine 72 and cysteine 79 are essential for function of the TutE protein. Thus, while it is not intended that the present invention be limited to any one mechanism, iron binding appears to be a mechanistic feature of the TutE protein in its role in toluene metabolism by strain T1.

TABLE 2

Plasmid                 Percent toluene utilized    Percent benzylsuccinic acid-like compound produced (c)
pPWC4-CLNSac (a)        100                         100
pRK415 (b)              31.3 ± 5.4                  8.6 ± 1.2
pPWC4-CLNSac-C72A       19.5 ± 7.4                  8.3 ± 0.8
pPWC4-CLNSac-C79A       31.3 ± 13.8                 7.8 ± 1.9
pPWC4-CLNSac-C101A      88.8 ± 13.8                 55.7 ± 6.1

(a) The plasmid carrying the unaltered clone, serving as a positive control.
(b) The vector alone, serving as a negative control.
(c) Normalized to 100% for pPWC4-CLNSac, the positive control.
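The normalization to the positive control used in the table can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the raw values below are hypothetical and are not the measurements behind the table, the spread is computed as a sample standard deviation over three isolates (the statistic actually reported is not stated), and the function name is hypothetical.

```python
from statistics import mean, stdev


def normalize_to_control(raw, control):
    """Express each plasmid's replicate values as percent of the control mean.

    Returns {plasmid: (mean_percent, sample_standard_deviation)}.
    """
    control_mean = mean(raw[control])
    summary = {}
    for plasmid, values in raw.items():
        percents = [100.0 * v / control_mean for v in values]
        summary[plasmid] = (mean(percents), stdev(percents))
    return summary


# Hypothetical raw measurements (arbitrary units), three isolates per plasmid.
raw = {
    "pPWC4-CLNSac": [2.0, 2.0, 2.0],  # unaltered clone, positive control
    "pRK415": [0.5, 0.6, 0.7],        # vector alone, negative control
}
for plasmid, (avg, sd) in normalize_to_control(raw, "pPWC4-CLNSac").items():
    print(f"{plasmid}: {avg:.1f} ± {sd:.1f}")
```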

EXAMPLE 9 This example describes the site-directed mutagenesis of the TutD protein. To determine if the conserved glycine and cysteine residues of TutD play an essential role in the enzymatic function of the protein, as has been shown for PflB, both amino acids are individually changed to an alanine as described in materials and methods. W. Plaga, et al., Catalytic-site mapping of pyruvate formate lyase. Eur. J. Biochem. 178:445(1988). W. Rödel, et al., Primary structure of Escherichia coli pyruvate formate-lyase and pyruvate formate-lyase activating enzyme deduced from the DNA nucleotide sequences. Eur. J. Biochem. 177:153(1988). Three independent isolates of the resulting plasmids (pPWC4-CLSac-G828A and pPWC4-CLSac-C492A) are mated into the strain carrying the tutD17 mutation and the resulting transconjugants are then tested for their ability to complement the mutation. The plasmid carrying the unaltered clone (pPWC4-CLSac) fully complements the tutD17 mutation (utilizing 100% of the toluene provided in the presence of pyruvate and producing wild type levels of benzylsuccinic acid and a monounsaturated derivative). Neither of the altered plasmids, pPWC4-CLSac-G828A and pPWC4-CLSac-C492A, is able to fully complement the tutD17 mutation (see Table 3).

Both of these strains utilize about the same amount of toluene as is utilized by the mutant carrying plasmid pRK415, the vector alone. Likewise, they produce significantly less benzylsuccinic acid and a monounsaturated derivative than the tutD17 mutant strain carrying the unaltered plasmid pPWC4-CLSac. The mutant carrying plasmid pPWC4-CLSac-C492A produces about the same amount of these compounds as the mutant carrying plasmid pRK415, while the strain carrying plasmid pPWC4-CLSac-G828A shows higher levels of these compounds than the vector alone but levels much lower than those observed with the unaltered plasmid. Since the E. coli pyruvate formate-lyase is known to be a homodimer which requires the formation of only one glycine free radical, the small amount of activity observed in the mutant carrying plasmid pPWC4-CLSac-G828A may be due to mixed dimers in which the free radical forms on the defective chromosomally encoded TutD protein. A.F. Wagner, et al., The free radical in pyruvate formate-lyase is located on glycine-734. Proc. Natl. Acad. Sci. USA. 89:996(1992). The results in Table 3 clearly demonstrate that glycine 828 and cysteine 492 are essential for function of the TutD protein. While it is not intended that the present invention be limited to any one mechanism, a glycine free radical and a covalent substrate-cysteine bond appear to be important mechanistic features of the TutD protein in its role in toluene metabolism by strain T1.
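The substitution names used in these examples (for instance G828A and C492A) encode the original residue, its 1-based position, and the replacement residue. The following Python sketch shows one way to check and apply such a name to a protein sequence for bookkeeping purposes. It operates on the amino acid sequence only and is not the mutagenesis protocol, which alters the DNA; the function name and the short test sequence are hypothetical.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a point substitution such as 'G828A' to a protein sequence.

    The name is read as <original residue><1-based position><new residue>.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {mutation!r}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} is outside the sequence")
    if protein[pos - 1] != old:
        # Guards against numbering errors before any change is made.
        raise ValueError(
            f"expected {old} at position {pos}, found {protein[pos - 1]}"
        )
    return protein[:pos - 1] + new + protein[pos:]


# Hypothetical five-residue sequence; not the TutD or TutE sequence.
print(apply_substitution("MKCGL", "C3A"))
```

Checking the original residue before substituting is a design choice that catches off-by-one numbering mistakes, which matter here because the conserved positions are identified by number.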

From the above, it should be clear that the present invention provides genes encoding toluene degrading enzymes useful for bioremediation. The genes can be used with an expression vector to over-express the enzymes in a host. In addition, the genes can be used to confer the ability to degrade toluene on a host organism that was not otherwise able to degrade toluene. In this manner, an organism that is native to a waste source (and therefore adapted for competition in the waste source) can be modified to have toluene degrading capabilities. In addition, an organism that is adapted to the laboratory and that can over-express the enzyme in large amounts can be made and used to provide a more efficient system of bioremediation (both in situ and ex situ).

TABLE 3

Plasmid                 Percent toluene utilized    Percent monounsaturated benzylsuccinic acid derived compound produced (c)
pPWC4-CLSac (a)         100                         100
pRK415 (b)              23.5 ± 64                   1.3 ± 0.1
pPWC4-CLSac-G828A       34.2 ± 9.7                  13.0 ± 3.8
pPWC4-CLSac-C492A       17.7 ± 5.4                  1.8 ± 0.1

(a) The plasmid carrying the unaltered clone, serving as a positive control.
(b) The vector alone, serving as a negative control.
(c) Normalized to 100% for pPWC4-CLSac, the positive control.
