Title:
DNA ENCODING CEPHAMYCIN BIOSYNTHESIS LATE ENZYMES
Document Type and Number:
WIPO Patent Application WO/1995/029253
Kind Code:
A1
Abstract:
DNA molecules encoding the enzymes involved in the late steps of biosynthesis of the antibiotic cephamycin have been isolated and purified. These DNAs have been sequenced and cloned into recombinant expression vectors for expression in host cells. The DNAs, vectors containing them, and recombinant host cells which express them are useful for the production of antibiotics.

Inventors:
COQUE JUAN JOSE R (US)
ENGUITA FRANCISCO J (US)
FUENTE JUAN L (US)
LLARENA FRANCISCO J (US)
LIRAS PALOMA (US)
MARTIN JUAN F (US)
Application Number:
PCT/US1995/004801
Publication Date:
November 02, 1995
Filing Date:
April 17, 1995
Assignee:
MERCK & CO INC (US)
COQUE JUAN JOSE R (US)
ENGUITA FRANCISCO J (US)
FUENTE JUAN L (US)
LLARENA FRANCISCO J (US)
LIRAS PALOMA (US)
MARTIN JUAN F (US)
International Classes:
C12N15/09; C12N9/02; C12N9/10; C12N15/54; C12P35/00; C12P35/06; C12R1/365; (IPC1-7): C12P35/06; C12N9/02; C12N9/10; C12N15/53; C12N15/54; C12N15/74
Foreign References:
US4885251A1989-12-05
US5070020A1991-12-03
Other References:
BIO/TECHNOLOGY, Volume 6, Number 10, issued October 1988, C.W CHEN et al., "Cloning and Expression of a DNA Sequence Conferring Cephamycin C Production", pages 1222-1224.
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY, Volume 37, Number 1, issued January 1993, X. XIAO et al., "Cloning of a Streptomyces Clavuligerus DNA Fragment Encoding the Cephalosporin 7alpha-Hydroxylase and Its Expression in Streptomyces Lividans", pages 84-88.
THE EMBO JOURNAL, Volume 12, Number 2, issued 1993, J.J.R. COQUE et al., "Genes for a beta-Lactamase, a Penicillin-Binding Protein and a Transmembrane Protein are Clustered with the Cephamycin Biosynthetic Genes in Nocardia Lactamdurans", pages 631-639.
BIOCHEMICAL JOURNAL, Volume 185, Number 3, issued 01 March 1980, S.J. BREWER et al., "An Adenosine Triphosphate-Dependent Carbamoylphosphate-3-Hydroxymethylcephem O-Carbamoyltransferase from Streptomyces Clavuligerus", pages 555-564.
BIOCHEMICAL JOURNAL, Volume 280, Number 2, issued 01 December 1991, X. XIAO et al., "Purification and Characterization of Cephalosporin 7alpha-Hydroxylase from Streptomyces Clavuligerus", pages 471-474.
MOLECULAR AND GENERAL GENETICS, Volume 236, Numbers 2-3, issued January 1993, J.J.R. COQUE et al., "Characterization and Expression in Streptomyces Lividans of cefD and cefE Genes from Nocardia Lactamdurans: the Organization of the Cephamycin Gene Cluster Differs from that in Streptomyces Clavuligerus", pages 453-458.
FEMS MICROBIOLOGY LETTERS, Volume 110, Number 1, issued 01 June 1993, J.J.R. COQUE et al., "Analysis of the Codon Usage of the Cephamycin C Producer Nocardia Lactamdurans", pages 91-95.
JOURNAL OF BACTERIOLOGY, Volume 173, Number 19, issued October 1991, J.J.R. COQUE et al., "A Gene Encoding Lysine 6-Aminotransferase, which Forms the beta-Lactam Precursor alpha-Aminoadipic Acid, is Located in the Cluster of Cephamycin Biosynthetic Genes in Nocardia Lactamdurans", pages 6258-6264.
FEMS MICROBIOLOGY LETTERS, Volume 110, Number 2, issued 1993, J.M. WARD et al., "The Biosynthetic Genes for Clavulanic Acid and Cephamycin Production Occur as a 'Super-Cluster' in Three Streptomyces", pages 239-242.
Claims:
WHAT IS CLAIMED IS:
1. An isolated and purified DNA molecule encoding a late enzyme of cephamycin biosynthesis.
2. The isolated and purified DNA molecule of claim 1, wherein said DNA encodes a protein having 3'-hydroxymethylcephem O-carbamoyltransferase activity.
3. The DNA molecule of claim 2 wherein said DNA comprises a nucleotide sequence: ACGATGCACGCGGTCACCTCGTAGCCACCGTGCCCGCGACCCCCGGCCCACGAGGCCGGGGGCGCGGTC CCGACCCTCGTCACGCGTTGGAGGAACGAATGCTCATCGTCGCGTTCAAACCGGGGCACGACGGTGCCG TCGCCGCGATCGGCGATCGCCGGTTGCTCTACTCGCTCGAATCGGAGAAGGACTCCCGGCCGCGGTACT CGCCGATCCTGGCCACCACCGTGCTCGACCTCGCCGAGCGGCTGGGCGAGGTGCCGGACGTGGTCGCCC TCGGCGGCTGGAGCGACCTGCGGCCCAACCGCATCTCCTACACCGGCGCCGGGTACTCGGGCATCGAAG AACCCACCGTGACCACCTCGCGCTTCTTCGGCAAGGAGGTGAAGTTCTTCAGCTCCACGCACGAACGTT CGCACATCTACATGGCCCTGGGCATGGCGCCGAGGGACGACAGCCCGGTCCAGACGGTGCTGGTGTGGG AGGGTGACGTCGGTGCCTTCTACGTGATCGACGGGCACCAGCGGATCACCCGCAAGGTCCAGGTGATGT CCGGCCCCGGCGCGCGCTACTCGTTCCTCTTCGGCCTCGCCGACCCCACTTTCCCCACCACCGGCGGGA AACCGCGGCTGAACGACGCCGGGAAGCTGATGGCGCTGGCGGCCTTCGGCGACTCCGCCGACGCGGACG CGGACATCACGCACGTGGTCGAGCGGATCCTCAAGCAGGACTCGATGTACCCGGCGCCGAAGGGTGAAT ACCGGGATTCGGTGCTGTACAACGCCGGGGTCGAGTCGCCGGAGTGCAAGATCGCCGCCGCGCTGCTCA CCGAACGCCTCTTCGAGACCTTCGCCGAGGTCGCCAGGCAGGAGATGCCCGAAGGCAGCCCGCTCTACA TCTCCGGCGGCTGCGGGCTGAACTGCGACTGGAACAGCCTGTGGGCGCAGCTCGGCCACTTCTCCTCGG TGTTCGTCGCGCCGTGCACCAACGACTCCGGTTCCGCGCTGGGCACCGCCATCGACGCGCTCACCACCT TCACCGGTGACCCGCACGTCGACTGGAGCGTCTACAGCGGACTGGAATTCGTCACCGACACCCAGCCGG ACCCGGCCAGGTGGACCTCCCGCCCGCTCGAGCACGACGAGCTCTCCGGCGCGCTCGCCGGTGGCCGGG TCGTCGCCTGGGTGCAGGGCCGCTGGGAGATCGGTCCGCGCGCGCTGTGCAACCGCTCGCTGCTGGCCG AGCCGTTCGGCGCGGTGACCAGGGACCGGCTCAACGAGATCAAGCAGCGCGAGGACTACCGCCCGATCG CGCCCGTGTGCCGGGTCGAGGACCTGGGCAAGGTCTTCCACGAGGACTTCGAAGACCCGTACATGCTCT ACTTCCGGCGGGTGCGCGAGTCCAGCGGCCTGCGCGCGGTGACCCACGTGGACGGTTCGGCCCGCGTGC AGACCGTGCGGGATTCGGGCAACCCGCAGATGCACCGGCTGCTCTCGGCCTTCGCCGCCCAGCGCGGTG TCGGCGTGCTGTGCAACACCTCGCTGAACTTCAACGGCGAGGGGTTCATCAACCGCATGTCGGACCTGG TGCTCTACTGCGAATCCCGCGGCATCTCGGACATGGTCGTCGGCGATACCTGGTACCAGCGTGCCGAGG GCTGACCCCGGCGCCGGACGGCCGTGAGCGCGAGCGTCCGGCGG (SEQ.ID.NO. :1) Or functional derivative thereof .
4. The isolated and purified DNA molecule of claim 1, wherein said DNA encodes a protein having 3'-methylcephem hydroxylase activity.
5. The DNA molecule of claim 4 wherein said DNA comprises a nucleotide sequence: AAGACGGTACCGGTCTTCAGCATGGCCGAACTGCGCGACGGCTCGCGCCAGGACGAGTTCCGCGAGTGG GCCCGCCGCGGGGTCTTCTACCTCACCGGGTACGGCGCCACCGAACGAGACCACCGGGTGGCCACCGAC ACCGCGATGGACTTCTTCGCCCAAGGCACGGCCGAGGAGAAGCAGGCCGTGACCACGAAGGTCCCGACC ATGCGGCGCGGGTACTCGGCGCTGGAGGCGGAAAGCACCGCCCAGGTCACCAACACCGGCACCTACACC GACTACTCCATGTCGTACTCGATGGGCATCGGCGGCAACCTGTTCCCGTCGAAGGAGTTCGAGTCGGTC TGGACGGACTACTTCGACAGCCTGTACCGCGCCGCGCAGGAGACCGCGCGCCTGGTGCTGACCGCCGCG GGCACCTACGACGGCGAGGACCTCGACACCCTGCTCGACTGCGACCCGGTGCTGCGCCTGCGGTACTTC CCGGAGGTCCCGGAGCACCGCGCCGCCGAGTACGAGCCACGCCGGATGGCCCCGCACTACGACCTGTCC ATCATCACCTTCATCCACCAGACCCCGTGCGCCAACGGTTTCGTCAGCCTGCAGGCCGAAGTGGACGGT GAGATGGTGAGCCTGCCGCACGTCGAGGACGCCGTGGTGGTGCTGTGCGGCGCGATCGCGCCGCTGGTC ACCCAGGGCGCGGTGCCCGCGCCCAACCACCACGTGGTCTCCCCGGACGCGAGCATGCTCAAGGGCAGC GACCGCACCTCGAGCGTGTTCTTCCTGCGCCCGTCGACCGATTTCACCTTCTCGGTGCCCGACGCCAGG AAGTACGGCCTCGACGTCAGCCTGGACATGGAGAAGGCGACCTTCGGCGACTGGATCGGGACCAACTAC GTCACGATGCACGCGGTCACCTCGTAGCCACCGTGCCCGCGACCCCCGGCCCACGAGGCCGGGGGCGCG GTCCCG (SEQ . ID. NO . : 2 ) or functional derivative thereof .
6. The isolated and purified DNA molecule of claim 1, wherein said DNA encodes a protein having C7 hydroxycephem methyltransferase activity.
7. The DNA molecule of claim 6 wherein said DNA comprises a nucleotide sequence: CGTGCCGACCGTGTCGAGTCCGCTGTACTACGCCGCCCCGCTGACCCCGGACGGCGGGGACGGGGACTG GTGCATCGACGCCGTGACCCGGCCGCCGGAGGTGCTGTTCAACTTCCGCAAGGTGGGCGTGGAGACGAC CATCACCGACCTGCGCGAGGGCTCGATCCGGCCGGCGCTAGACGAAACGGGGTTCGAGAAGGTCACCGC GCCCACCGGCGCGTCCCAGCGGGGCCTGCTGGACAGCGAGGAAGCCGCGCTGGAGCAGTACCGGCGGGA AACCGGTGAGCTGCTCCGCTCGCTCACCGGCGCGGACGTGGTGGAGTTCTTCGACGCCACCCTGCGGCG GCAGGACGCGGCCGACGACCCGGCCGCCCAGTCCCCGCACCAGCGGGTGCACGTGGACCAGAGCCCGGG CAGCGCGCGGGCCAGGGCCGAGCGGCACCTCGGCCCCGGCCGGGAGTTCCGGCGCTTCCAGATCATCAA CGTCTGGCGGCCGCTGCTCGAGCCGGTGCGCAACTTCCCGCTGGCGCTGTGCGACTACCGGTCGCTGGA CCTGTCCGCCGACCTGGTGCCGACCCGGCTGGACTTCCCGGACTGGCTGAAGGACCGCGAGAACTACTC GGTCCGGCACAACCCCGCGCACCGCTGGTACTTCTGGGACTCGCTGACACCGGCCGAAGCGCTGGTCTT CAAGTGCTACGACAGCGCGAGCCGCGGGCTGGCCATGGCCGGTGGCGAGCCGGACGGCGGCGAACTGCG CGACGTGGCGGGTCTCTGCCCGCACACGGCCTTCTTCGACGAGAACGGGCCGTCGACCGGCCACCTGCG CACTTCGCTGGAACTGCGCGCGCTGGCCTTCCACGAATGAACGACGAACGAGGAGCAGGGGAAATGTCG GAc (SEQ . ID . NO . : 3 ) or functional derivative thereof .
8. The DNA molecule of claim 6 wherein said DNA comprises a nucleotide sequence: GGCGGTCCGCTACGGCGACTACCTCAACCACGGCCTGCACTCGCTGATCGTGAAGAACGGCCAGACCTG ATCGAGGAGCACGCATGACTGACACCACCCGCCAGGACTTCCTGGACCTCAACCTGTTCCGGGGGCTGG GGGAGGACCCGGTCTACCACCCGCCGGTGCTGGCCGACCGCCCGCGCGACTGGCCGCTCGACCGGTGGG CCGAGGCCCCGCGCGATCTCGGGTTCTCCGACTTCGCCCGCTACCAGTGGCGCGGCCTGCGCATGCTGA AGAACCCGGACACCCAGGCCGCCTACCACGACCTGATGGTCGAACTGCGGCCCCGCACGGTGATCGAGC TGGGCGTGTACAGCGGTGGCTCGCTGGCTCGGTTCCGGGACATGGCCGAGCTGATGGGCTTCGACTGCC AGGTGCTCGGCATCGACCGGGACCTGTCCCGCTGCCAGATCCCCGAGTCCGAGATGAAGAACATCTCGC TGCGCGAGGCCGACTGCAGCCTGGACCGGTGGAAGCTCGTGGACGCGCTGGACGGCGTGGGCGACCACA AGTAGCTGCTGCGCGTGCGCTTGTGGAAGTTGTAGGACGCCACCAGCCTGGACCACCTCCTGCACGAAG GCGACTACTTCATCATCGAGGACATGATCCCGTACTGGTACCGGTACAGCCCCAAGCTGCTCACCGAGT ACCTCGCCGCGTTCGCCGGGGAGCTGAGCATGGACATGGTCTACGCCAACGCCAGTTCACAACTGGAAC GCGGTGTGCTGCGCCGGTCGGCACCGAAGGCGTAGGTGGAT (SEQ.ID.NO.:4) or functional derivative thereof.
9. A process for the production of cephamycin antibiotics by a cell culture wherein said cells contain one or more recombinant genes encoding one or more late enzymes of cephamycin biosynthesis.
10. The process of claim 9 wherein the late enzymes of cephamycin biosynthesis are selected from the group consisting of 3'-hydroxymethylcephem O-carbamoyltransferase, 3'-methylcephem hydroxylase, and C7 hydroxycephem methyltransferase.
11. The process of claim 9 wherein the late enzymes of cephamycin biosynthesis are selected from the group consisting of 3'-hydroxymethylcephem O-carbamoyltransferase, 3'-methylcephem hydroxylase, and C7 hydroxycephem methyltransferase and wherein said cell culture is capable of cephamycin antibiotic biosynthesis.
12. The process of claim 11 wherein said cells capable of cephamycin biosynthesis are a species of Nocardia.
13. An isolated and purified protein wherein said protein is a late enzyme of cephamycin biosynthesis.
14. The isolated and purified protein of claim 13, wherein said protein has 3'-hydroxymethylcephem O-carbamoyltransferase activity.
15. The protein of claim 14 wherein said protein comprises an amino acid sequence: M L I V A F K P G H D G A V A A I G D R R L L Y S L E S E K D S R P R Y S P I L A T T V L D L A E R L G E V P D V V A L G G W S D L R P N R I S Y T G A G Y S G I E E P T V T T S R F F G K E V K F F S S T H E R S H I Y M A L G M A P R D D S P V Q T V L V W E G D V G A F Y V I D G H Q R I T R K V Q V M S G P G A R Y S F L F G L A D P T F P T T G G K P R L N D A G K L M A L A A F G D S A D A D A D I T H V V E R I L K Q D S M Y P A P K G E Y R D S V L Y N A G V E S P E C K I A A A L L T E R L F E T F A E V A R Q E M P E G S P L Y I S G G C G L N C D W N S L W A Q L G H F S S V F V A P C T N D S G S A L G T A I D A L T T F T G D P H V D W S V Y S G L E F V T D T Q P D P A R W T S R P L E H D E L S G A L A G G R V V A W V Q G R W E I G P R A L C N R S L L A E P F G A V T R D R L N E I K Q R E D Y R P I A P V C R V E D L G K V F H E D F E D P Y M L Y F R R V R E S S G L R A V T H V D G S A R V Q T V R D S G N P Q M H R L L S A F A A Q R G V G V L C N T S L N F N G E G F I N R M S D L V L Y C E S R G I S D M V V G D T W Y Q R A E G (SEQ.ID.NO.:5) or functional derivative thereof.
16. The isolated and purified protein of claim 13, wherein said protein has 3'-methylcephem hydroxylase activity.
17. The protein of claim 16 wherein said protein comprises an amino acid sequence: M S D K T V P V F S M A E L R D G S R Q D E F R E W A R R G V F Y L T G Y G A T E R D H R V A T D T A M D F F A Q G T A E E K Q A V T T K V P T M R R G Y S A L E A E S T A Q V T N T G T Y T D Y S M S Y S M G I G G N L F P S K E F E S V W T D Y F D S L Y R A A Q E T A R L V L T A A G T Y D G E D L D T L L D C D P V L R L R Y F P E V P E H R A A E Y E P R R M A P H Y D L S I I T F I H Q T P C A N G F V S L Q A E V D G E M V S L P H V E D A V V V L C G A I A P L V T Q G A V P A P N H H V V S P D A S M L K G S D R T S S V F F L R P S T D F T F S V P D A R K Y G L D V S L D M E K A T F G D W I G T N Y V T M H A V T S (SEQ.ID.NO. :6) or functional derivative thereof.
18. The isolated and purified protein of claim 13, wherein said protein has C7 hydroxycephem methyltransferase activity.
19. The protein of claim 18 wherein said protein comprises an amino acid sequence: V P T V S S P L Y Y A A P L T P D G G D G D W C I D A V T R P P E V L F N F R K V G V E T T I T D L R E G S I R P A L D E T G F E K V T A P T G A S Q R G L L D S E E A A L E Q Y R R E T G E L L R S L T G A D V V E F F D A T L R R Q D A A D D P A A Q S P H Q R V H V D Q S P G S A R A R A E R H L G P G R E F R R F Q I I N V W R P L L E P V R N F P L A L C D Y R S L D L S A D L V P T R L D F P D W L K D R E N Y S V R H N P A H R W Y F W D S L T P A E A L V F K C Y D S A S R G L A M A G G E P D G G E L R D V A G L C P H T A F F D E N G P S T G H L R T S L E L R A L A F H E ( SEQ . ID. NO . : 7 ) or functional derivative thereof .
20. The protein of claim 18 wherein said protein comprises an amino acid sequence: T D T T R Q D F L D L N L F R G L G E D P V Y H P P V L A D R P R D P L D R W A E A P R D L G F S D F A R Y Q W R G L R M L K N P D T Q A Y H D L M V E L R P R T V I E L G V Y S G G S L A R F R D M A E L M G F D C Q V L G I D R D L S R C Q I P E S E M K N I S L R E A D C S D L A T F E H L R D L P H P L V F I D D A H A N T F N I L R W S V D H L L H E G D Y F I I E D M I P Y W Y R Y S P K L L T E Y L A A F A G E L S M D M V Y A N A S S Q L E R G V L R R S A P K A (SEQ.ID.NO. :8) or functional derivative thereof.
Description:
TITLE OF THE INVENTION

DNA ENCODING CEPHAMYCIN BIOSYNTHESIS LATE ENZYMES

INTRODUCTION

Cephamycin C is a cephalosporin produced by Nocardia lactamdurans (Stapley et al., 1972), Streptomyces clavuligerus (Brown et al., 1979) and several other actinomycetes (see review by Martin and Liras, 1989). Cephamycin C is synthesized from the precursor amino acids L-α-aminoadipic acid, L-cysteine and L-valine by the multienzyme α-aminoadipyl-cysteinyl-valine synthetase (Martin et al., 1992; Aharonowitz et al., 1993), which is encoded by the pcbAB gene (Coque et al., 1991a). α-Aminoadipic acid is formed from L-lysine by the lysine-6-aminotransferase, encoded by the lat gene (Coque et al., 1991b; Madduri et al., 1991). The tripeptide is later cyclized to form isopenicillin N, and this intermediate is epimerized to form penicillin N, which is later converted to deacetoxycephalosporin C (DAOC) by the deacetoxycephalosporin C synthase (expandase). The genes encoding these three enzymatic steps in N. lactamdurans, pcbC, cefD and cefE, are known to be clustered with lat and pcbAB (Coque et al., 1993a,b).

Cephalosporin C is the end product of the biosynthetic pathway in Cephalosporium acremonium. However, in cephamycin-producing actinomycetes further reactions are involved in the synthesis of the C-7 methoxyl group and in the attachment of the carbamoyl group at C-3' (Fig. 1). Little information is available, however, about the so-called "late" genes, which convert deacetoxycephalosporin C into cephamycin C. Deacetoxycephalosporin C is known to be hydroxylated in S. clavuligerus to form deacetylcephalosporin C (DAC) by an α-ketoglutarate-requiring dioxygenase (Turner et al., 1979; Baker et al., 1991), but the enzyme has not been described in N. lactamdurans. In parallel, deacetylcephalosporin C is enzymatically converted into O-carbamoyldeacetylcephalosporin C by an O-carbamoyltransferase that transfers a carbamoyl group from carbamoylphosphate (Brewer et al., 1980). The methoxyl group at C-7 in the cephamycins derives from molecular oxygen and methionine (Whitney et al., 1972) by the action of a monooxygenase and a methyltransferase (O'Sullivan et al., 1979).

There are at least two types of oxygenases involved in hydroxylations of microbial metabolites. The first class is α-ketoglutarate-dependent and requires Fe2+ ions to introduce one of the oxygen atoms from O2 into the substrate (Abbot and Lindstedt, 1974). A second class, the flavin monooxygenases, requires pyridine nucleotides as electron donors and O2. One of the best known flavin monooxygenases is the p-hydroxyphenylacetate-3-hydroxylase of Pseudomonas putida, which is a two-protein-component enzyme (Arunachalan et al., 1992). For many years it has been unclear whether the C-7 hydroxylase was different from the C-3' hydroxylase which converts DAOC into DAC. Demain and coworkers (Xiao et al., 1991) have purified the 7-hydroxylase activity of S. clavuligerus, and the amino-terminal sequence of the protein is very similar to that of the previously cloned C-3' hydroxylase (Xiao et al., 1991).

The carbamoyl group is present in some intermediates of primary metabolism such as citrulline and carbamoylaspartate. These molecules are formed from ornithine or aspartate and carbamoyl phosphate by the ornithine carbamoyl transferase or the aspartate carbamoyl transferase, during the biosynthesis of arginine or pyrimidines, respectively.

In secondary metabolism, the carbamoyl group is present in a variety of antibiotics and other metabolites. It is found in venturicidin A, produced by Streptomyces aureofaciens Duggar and Streptomyces hygroscopicus A-130 (Brufani et al., 1971; 1968); in the 3'-O-carbamoyl-2-deoxy-β-D-rhamnose moiety of the antifungal antibiotic irumamycin, produced by Streptomyces subflavus (Nakagawa et al., 1985), and the related macrolide antibiotic X-149523 from Streptomyces sp. (Omura et al., 1985); in the antitumoral antibiotics mitomycins and porfiromycins, produced by Streptomyces caespitosus, Streptomyces ardus and Streptomyces verticillatus (Glasby, 1979); and in novobiocin, an inhibitor of DNA gyrase, produced by Streptomyces niveus (Kominek, 1972). The presence of carbamoyl groups in oligosaccharides of Rhizobium sp. involved in nodulation is also well documented (Price et al., 1992; Holsters et al., 1993).

In the β-lactam antibiotics of the cephamycin family, a carbamoyl group is attached to the C-3' hydroxymethyl side chain of cephamycin C. Much is now known about the biochemistry (Jensen, 1986; Martin and Liras, 1989) and genetics (Kovacevic et al., 1990; Kovacevic and Miller, 1991; Coque et al., 1991a,b) of cephamycin C biosynthesis (see review by Aharonowitz et al., 1992). Genes encoding enzymes for the early steps (lat, pcbAB, pcbC) (Coque et al., 1991; Madduri et al., 1991) and intermediate steps of the pathway (cefD, cefE, cefF) have been cloned from Streptomyces clavuligerus (Kovacevic et al., 1990; Kovacevic et al., 1989) and Nocardia lactamdurans (Coque et al., 1993).

The late steps of cephamycin C biosynthesis are poorly understood. The biosynthetic intermediate deacetylcephalosporin C (DAC) is the substrate for a two-step methoxylation at C-7. The genes encoding the C-7 hydroxylase and the C-7 O-methyltransferase (cmcI, cmcJ) have never been described from N. lactamdurans. The carbamoylation at the C-3'-hydroxymethyl side chain of DAC occurs after methoxylation, or perhaps in parallel, forming a metabolic grid (Fig. 1). A preliminary description of an ATP-dependent carbamoyl transferase that uses carbamoylphosphate as the carbamoyl donor was given by Brewer et al. (1980). However, nothing is known about the gene encoding such an enzyme, or about any other gene encoding enzymes that carry out carbamoylation reactions in the biosynthesis of different antibiotics. Moreover, it is unknown whether the genes for cephem-carbamoyl transferases and ornithine (or aspartate) carbamoyl transferases have any similarity.

The identification, isolation and purification of the DNA encoding the enzymes which catalyze the late steps of cephamycin biosynthesis would be extremely useful to produce these antibiotics. These DNA's would be useful to establish recombinant host cells to produce these antibiotics on an industrial scale.

Three genes located in the cephamycin C biosynthesis cluster of N. lactamdurans have been isolated and sequenced; they encode the deacetoxycephalosporin C hydroxylase and two other proteins which introduce the methoxyl group at C-7. The sequence of one of the latter proteins resembles both cholesterol hydroxylases and methyltransferases of different origins acting on hydroxyl groups present in aromatic or quinone-type compounds; both proteins are required for the hydroxylation at C-7 and the transfer of the methyl group from S-adenosylmethionine to the 7-hydroxycephem intermediate. In addition, the isolation, nucleotide sequence, and characterization of a gene encoding a 3'-hydroxymethylcephem O-carbamoyltransferase of Nocardia are disclosed. The gene is named cmcH according to the standard nomenclature of genes involved in the biosynthesis of β-lactam antibiotics (cef for genes common to cephalosporin and cephamycin producers; cmc for genes specific for cephamycin biosynthesis) (Martin et al., 1991; Aharonowitz et al., 1992).

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 - A diagram of the late steps of the cephamycin C biosynthetic pathway is shown; carbamoylation may also proceed prior to the introduction of the methoxyl group at C7.

Figure 2 - Restriction map of the 5.4 kb BamHI DNA fragment of N. lactamdurans containing ORF7, ORF8, ORF9 and ORF10. Black bars (below) indicate the DNA fragments subcloned to give the pIJ702-derived plasmids.

Figure 3 - Sequence of 2672 bp internal to the 5.4 kb BamHI DNA fragment. The first 69 nt correspond to the 3' end of the pcbC gene. The deduced amino acid sequences encoded by ORF7, ORF8 and ORF9 are indicated on the right. The translation initiation and termination codons, and the putative ribosome-binding sites, are underlined. A sequence (nt 2734-2765) downstream of ORF9 forming a stem-and-loop structure in the RNA that may correspond to a transcription terminator is indicated with arrows.

Figure 4 - Nucleotide and deduced amino acid sequence of the cmcH gene (ORF10) of N. lactamdurans. Note the short intergenic region with the inverted repeat and the GGAGGA (putative ribosome-binding) sequence preceding the first in-frame ATG.

Figure 5 - Fragment of the S. clavuligerus cephamycin cluster carrying the cefF and the cmcH genes. The plasmids pULFJP62 and pULFJ30 are indicated by solid bars.

Figure 6 - Panels A and B - HPLC analysis of the reaction products of a 3' cephem hydroxylase (DAOC hydroxylase) assay using desalted ammonium sulfate fractions (30-70%) of extracts of S. lividans pIJ702-58a as indicated in Materials and Methods. Panel A) Time zero. Panel B) After two hours of reaction.

Figure 7 - Panels A, B, C and D - HPLC of the reaction products of 7-cephem hydroxylase (Panels A, B) and 7-hydroxycephem methyltransferase assays (Panels C, D) using desalted ammonium sulfate fractions of extracts of S. lividans pUL702-55a. A, C) Time zero. B, D) After two hours of reaction.

Figure 8 - Panels A, B and C - NADH oxidation by extracts of S. lividans pUL702-56a (Panel A), S. lividans pUL702-57a (Panel B) and S. lividans pIJ702 (Panel C) in the presence (n) and absence (s) of cephalosporin C (50 μg/ml). At zero time 50 μg NADH were added to the reaction.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is drawn to the isolation, purification and characterization of DNA molecules which encode enzymes involved in the late steps of biosynthesis of cephamycins. The present invention is also drawn to the use of these DNA molecules for expression in recombinant host cells. Recombinant expression of the DNA molecules of the present invention is useful for the production of cephamycin antibiotics. Recombinant expression will also facilitate the production, purification and characterization of the recombinant proteins, and use of the recombinant proteins for antibiotic production.

The present invention relates to DNA encoding novel enzymes for cephamycin biosynthesis termed cmcH, cmcI and cmcJ. The present invention is also related to recombinant host cells which express the cloned enzyme-encoding DNA contained in a recombinant expression plasmid. The DNA of the present invention is isolated from cephamycin-producing cells. In particular, the cephamycin-producing cells suitable for the isolation of DNA encoding these enzymes include but are not limited to Nocardia lactamdurans, Streptomyces clavuligerus, Streptomyces lipmanii, Streptomyces panayensis, Streptomyces cattleya, Streptomyces griseus, Streptomyces wadayamensis, Streptomyces todorominensis, Streptomyces filipinensis cephamycini and Streptomyces heteromorphus. The most preferred cephamycin-producing cells are of the genus Nocardia.

Other cells and cell lines may also be suitable for use to isolate DNA encoding the enzymes of the present invention. Selection of suitable cells may be done by screening for enzymatic activity in the cells. Methods for detecting the enzymatic activity are well known in the art and are described below. Cells which possess the enzymatic activity in these assays may be suitable for the isolation of DNA encoding the enzymes.

Any of a variety of procedures may be used to clone DNA. These methods include, but are not limited to, direct functional expression of the DNA following the construction of an enzyme-containing DNA library in an appropriate expression vector system. Another method is to screen an enzyme activity-containing DNA library constructed in a bacteriophage or plasmid shuttle vector with a labelled oligonucleotide probe designed from the amino acid sequence of the specific protein. The preferred method consists of screening an enzyme-containing DNA library constructed in a bacteriophage or plasmid shuttle vector with a partial DNA encoding the specific protein. This partial DNA is obtained by the specific PCR amplification of DNA fragments through the design of degenerate oligonucleotide primers from the amino acid sequence known for the particular enzyme or other enzymes which are related to the enzymes of the present invention.

It is readily apparent to those skilled in the art that other types of libraries, as well as libraries constructed from other cells or cell types, may be useful for isolating enzyme-encoding DNA. Other types of libraries include, but are not limited to, DNA libraries derived from cells or cell lines other than Nocardia cells, and genomic DNA libraries.

It is readily apparent to those skilled in the art that suitable DNA libraries may be prepared from cells or cell lines which have the particular enzymatic activity. The selection of cells or cell lines for use in preparing a DNA library to isolate enzyme-encoding DNA may be done by first measuring cell-associated enzymatic activity using the known assays used herein.

Preparation of DNA libraries can be performed by standard techniques well known in the art. Well known DNA library construction techniques can be found for example, in Maniatis, T., Fritsch, E.F., Sambrook, J., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1982).

It is also readily apparent to those skilled in the art that DNA encoding the enzymes of the present invention may also be isolated from a suitable genomic DNA library.

Construction of genomic DNA libraries can be performed by standard techniques well known in the art. Well known genomic DNA library construction techniques can be found in Maniatis, T., Fritsch, E.F., Sambrook, J., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1982).

In order to clone the enzyme-encoding DNA by one of the above methods, the amino acid sequence or DNA sequence of the particular enzyme or a related enzyme from another organism is necessary. To accomplish this, the particular enzyme or a related enzyme may be purified and a partial amino acid sequence determined by automated sequenators. It is not necessary to determine the entire amino acid sequence; the linear sequence of two regions of 6 to 8 amino acids is sufficient for the PCR amplification of a partial DNA fragment.

Once suitable amino acid sequences have been identified, the DNA sequences capable of encoding them are synthesized. Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid, and therefore, the amino acid sequence can be encoded by any of a set of similar DNA oligonucleotides. Only one member of the set will be identical to the enzyme sequence but others in the set will be capable of hybridizing to the DNA even in the presence of DNA oligonucleotides with mismatches. The mismatched DNA oligonucleotides may still sufficiently hybridize to the DNA to permit identification and isolation of enzyme-encoding DNA.
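As an illustration of the degeneracy described above, the following Python sketch enumerates the oligonucleotides capable of encoding a short peptide under the standard genetic code. The function names are hypothetical, and the six-residue stretch FKPGHD (taken from near the amino terminus of SEQ.ID.NO.:5) is chosen here only as an example; this is not a description of the primer design actually used.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def degeneracy(peptide):
    """Number of distinct oligonucleotides that can encode the peptide."""
    total = 1
    for aa in peptide:
        total *= len(codons_for(aa))
    return total


def encoding_oligos(peptide):
    """Yield every oligonucleotide in the degenerate set for the peptide."""
    for combo in product(*(codons_for(aa) for aa in peptide)):
        yield "".join(combo)


if __name__ == "__main__":
    peptide = "FKPGHD"  # illustrative choice, residues 6-11 of SEQ.ID.NO.:5
    print(peptide, "can be encoded by", degeneracy(peptide), "oligonucleotides")
```

Only one member of the enumerated set matches the genomic sequence exactly; for this example that member is the corresponding stretch TTCAAACCGGGGCACGAC of SEQ.ID.NO.:1. Peptide regions rich in methionine or tryptophan give smaller sets and are therefore usually preferred for primer design.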

Another method for obtaining DNA encoding the enzymes of the present invention is to utilize DNA sequences encoding a separate and distinct protein which is known or suspected to have at least some degree of homology. DNA encoding the separate and distinct protein may have partial homology or share a region of homology with the DNA encoding the enzymes sought. By using the DNA encoding a protein suspected of sharing some degree of homology as a hybridization probe, a library, as described above, can be screened to identify DNA fragments which hybridize with the probe. Hybridizing DNA fragments identified by this means are further characterized to determine whether they encode the sought enzymes.
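Where candidate sequences are already available, the degree of similarity between a probe and a target can be estimated computationally before any hybridization is attempted. The sketch below is a crude, assumption-laden proxy: it slides the probe along one strand of the target without gaps and reports the best fraction of identical positions. The function name is hypothetical, and real hybridization behaviour also depends on salt, temperature, GC content and probe length, none of which is modelled here.

```python
def best_ungapped_identity(probe, target):
    """Slide probe along target and return (best identity fraction, offset).

    Only ungapped, same-strand alignments are considered.
    """
    probe, target = probe.upper(), target.upper()
    if not probe or len(probe) > len(target):
        raise ValueError("probe must be non-empty and no longer than target")
    best_fraction, best_offset = 0.0, 0
    for offset in range(len(target) - len(probe) + 1):
        window = target[offset:offset + len(probe)]
        matches = sum(p == t for p, t in zip(probe, window))
        fraction = matches / len(probe)
        if fraction > best_fraction:
            best_fraction, best_offset = fraction, offset
    return best_fraction, best_offset


if __name__ == "__main__":
    # Toy sequences, not taken from the disclosure.
    print(best_ungapped_identity("ACGA", "TTACGTTT"))
```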

Using one of the above methods, DNA clones encoding the enzymes are isolated in a two-stage approach employing polymerase chain reaction (PCR) based technology and DNA library screening. In the first stage, NH2-terminal and internal amino acid sequence information from the purified enzyme or a homologous protein is used to design degenerate oligonucleotide primers for the amplification of enzyme-specific DNA fragments. In the second stage, these fragments are cloned to serve as probes for the isolation of full length DNA from a DNA library derived from Nocardia or other cephamycin producing cells.

The cloned DNA obtained through the methods described above may be recombinantly expressed by molecular cloning into an expression vector containing a suitable promoter and other appropriate transcription regulatory elements, and transferred into prokaryotic or eukaryotic host cells to produce recombinant enzyme. Techniques for such manipulations can be found described in Maniatis, T., et al., supra, and are well known in the art.

Expression vectors are defined herein as DNA sequences that are required for the transcription of cloned DNA and the translation of their mRNAs in an appropriate host. Such vectors can be used to express eukaryotic DNA in a variety of hosts such as bacteria, blue-green algae, plant cells, insect cells, fungal cells including yeast and filamentous fungi, and animal cells.

Specifically designed vectors allow the shuttling of DNA between hosts such as bacteria-fungal cells or bacteria-animal cells. An appropriately constructed expression vector should contain: an origin of replication for autonomous replication in host cells, selectable markers, a limited number of useful restriction enzyme sites, a potential for high copy number, and active promoters. A promoter is defined as a DNA sequence that directs RNA polymerase to bind to DNA and initiate RNA synthesis. A strong promoter is one which causes mRNAs to be initiated at high frequency. Expression vectors may include, but are not limited to, cloning vectors, modified cloning vectors, specifically designed plasmids or viruses.

A variety of expression vectors may be used to express the recombinant cephamycin biosynthesis enzymes of the present invention in fungal cells. Commercially available expression vectors which may be suitable for recombinant enzyme expression include, but are not limited to, pIJ702 (ATCC 35287), pVEI (ATCC 14585), and pULJL43 (University of Leon).

DNA encoding the enzymes of the present invention may be cloned into an expression vector for expression in a host cell. Host cells may be prokaryotic or eukaryotic, including but not limited to bacteria, mammalian cells, insect cells, and fungal cells including yeast and filamentous fungi. Cells derived from fungal species which may be suitable and which are commercially available include, but are not limited to, Cephalosporium acremonium (Acremonium chrysogenum), Aspergillus nidulans, Penicillium chrysogenum, and Penicillium notatum.

The expression vector may be introduced into host cells via any one of a number of techniques including but not limited to transformation, transfection, protoplast fusion, and electroporation. The expression vector-containing cells are individually analyzed to determine whether they produce the recombinant protein. Identification of enzyme-expressing cells may be done by several means, including but not limited to immunological reactivity with anti-enzyme antibodies, and the presence of host cell-associated enzymatic activity.

To determine the DNA sequence(s) that yields optimal levels of enzymatic activity and/or protein, DNA molecules including but not limited to the following can be constructed: the full-length open reading frame of the DNA and various constructs containing portions of the DNA encoding only specific domains of the protein or rearranged domains of the protein. All constructs can be designed to contain none, all or portions of the 5' and/or 3' untranslated region of the enzyme-encoding DNA. Enzymatic activity and levels of protein expression can be determined following the introduction, both singly and in combination, of these constructs into appropriate host cells. Following determination of the DNA cassette yielding optimal expression in transient assays, this DNA construct is transferred to a variety of expression vectors (including recombinant viruses), including but not limited to those for insect cells, bacteria and fungal cells including yeast and filamentous fungi.

Levels of the specific recombinant protein in host cells are quantitated by a variety of techniques including, but not limited to, immunoaffinity and/or enzymatic activity techniques. Enzyme-specific affinity beads or enzyme-specific antibodies are used to isolate 35S-methionine labelled or unlabelled recombinant enzyme. Labelled recombinant enzyme is analyzed by SDS-PAGE. Unlabelled recombinant protein is detected by Western blotting, ELISA or RIA assays employing enzyme-specific antibodies. Enzymatic activity of the recombinant enzyme is also detected and measured as described below.

Following expression of the enzyme in a host cell, the enzyme may be recovered to provide the enzyme in active form, capable of carrying out its specific activity. Several recombinant enzyme purification procedures are available and suitable for use. Recombinant enzyme may be purified from cell lysates and extracts, or from conditioned culture medium, by various combinations of, or individual application of salt fractionation, ion exchange chromatography, size exclusion chromatography, hydroxylapatite adsorption chromatography and hydrophobic interaction chromatography.

In addition, recombinant enzyme can be separated from other cellular proteins by use of an immuno-affinity column made with monoclonal or polyclonal antibodies specific for full length enzyme, or polypeptide fragments of the enzyme.

Monospecific antibodies to the enzyme are purified from mammalian antisera containing antibodies reactive against the specific enzyme or are prepared as monoclonal antibodies reactive with the enzyme using the technique of Kohler and Milstein, Nature 256: 495-497 (1975). Monospecific antibody as used herein is defined as a single antibody species or multiple antibody species with homogenous binding characteristics for the enzyme. Homogenous binding as used herein refers to the ability of the antibody species to bind to a specific antigen or epitope, such as those associated with the enzyme, as described above. Recombinant enzyme-specific antibodies are raised by immunizing animals such as mice, rats, guinea pigs, rabbits, goats, horses and the like, with an appropriate concentration of the enzyme either with or without an immune adjuvant.

It is readily apparent to those skilled in the art that the above described methods for producing monospecific antibodies may be utilized to produce antibodies specific for polypeptide fragments, or full-length polypeptide.

Enzyme-specific antibody affinity columns are made by adding the antibodies to Affigel-10 (Biorad), a gel support which is pre-activated with N-hydroxysuccinimide esters such that the antibodies form covalent linkages with the agarose gel bead support. The antibodies are then coupled to the gel via amide bonds with the spacer arm. The remaining activated esters are then quenched with 1 M ethanolamine HCl (pH 8). The column is washed with water followed by 0.23 M glycine HCl (pH 2.6) to remove any non-conjugated antibody or extraneous protein. The column is then equilibrated in phosphate buffered saline (pH 7.3) and the cell culture supernatants or cell extracts containing recombinant enzyme or fragments of it are slowly passed through the column. The column is then washed with phosphate buffered saline until the optical density (A280) falls to background, then the protein is eluted with 0.23 M glycine-HCl (pH 2.6). The purified protein is then dialyzed against phosphate buffered saline.

The following Examples are provided as illustrative of the present invention without, however, limiting the same thereto.

EXAMPLE 1

Bacterial strains and plasmids

N. lactamdurans LC411, an improved cephamycin C producer strain, was used as the source of DNA and RNA. Streptomyces lividans 1326 (Hopwood et al., 1985), a strain unable to synthesize β-lactam antibiotics, was used as a host for transformation and for expression experiments. E. coli DH5α was used for high frequency transformation and E. coli WK6 with the helper phage M13K07 was used to obtain single-strand DNA.

The genes of the cephamycin C biosynthetic cluster were isolated from phages lambda EMBL-C2 and C-8 (Coque et al., 1991a), and subcloned into plasmids pBluescript KS(+) or pIJ2921. To express the N. lactamdurans genes in S. lividans they were subcloned into pIJ702 (Katz et al., 1983).

EXAMPLE 2

Fermentation conditions

For preparation of seed cultures, S. lividans transformants containing recombinant plasmids were grown in YEME medium with 34% sucrose (Hopwood et al., 1985). After 48 hours of growth in YEME, 25 ml of this culture was used to inoculate 500 ml triple baffled flasks containing 100 ml of minimal medium with glucose and lysine (Coque et al., 1991b) and grown at 30°C in an orbital shaker at 250 rpm. All the inoculum and fermentation media used to grow cultures of transformant strains contained thiostrepton (5 μg/ml).

Cell free extracts were prepared from washed mycelium suspended in MOPS buffer 100 mM pH 7.5 containing 20 μg/ml DNAse and 1 mM PMSF. The cells were broken by sonication with a Branson Sonifier B-12 or alternatively with a French Press (Aminco).

EXAMPLE 3

DNA isolation, sequencing and manipulation.

Fragments to be sequenced were subcloned in pBluescript KS(+) in both orientations. Ordered sets of nested DNA fragments were generated by sequential deletions using the Erase-a-Base system (Promega, Madison, Wis.). The DNA was sequenced in both orientations by the dideoxynucleotide method (Sanger et al., 1977) using Taq polymerase (Promega) and 7-deaza-dGTP to avoid compressions. Isolation of plasmid DNA, digestion with endonucleases, labelling and Southern hybridizations were carried out according to standard procedures (Sambrook et al., 1989). Transformation of S. lividans was done as described previously (Hopwood et al., 1985; García-Domínguez et al., 1991).

DNA and protein sequence analysis.

Open reading frames (ORFs) in DNA were identified using the GENEPLOT Program. Inverted repeated sequences were located with the STEM&LOOP Program. Protein comparisons were made with the AALIGN program using the EMBL Swiss Prot data bank. Dot Plot analysis was made with the DOT-PLOT Program using a window size of 25 amino acids and a percent match of 30%.
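The dot plot parameters quoted above (a window of 25 amino acids and a 30% match) can be illustrated with a minimal sketch. The internal algorithm of the DOT-PLOT Program is not described here, so the code below is only one common windowed-identity formulation and is an assumption; the function name and arguments are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=25, min_percent=30.0):
    """Return the set of (i, j) window start positions that score a dot.

    A dot is recorded when the window starting at i in seq_a and the window
    starting at j in seq_b share at least min_percent identical residues.
    This is an assumed formulation, not the documented DOT-PLOT algorithm.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if 100.0 * matches / window >= min_percent:
                dots.add((i, j))
    return dots
```

Related proteins show up as runs of dots along a diagonal; with a 25-residue window, the 30% threshold corresponds to at least 8 identical residues per window.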

One of three open reading frames (ORF) located downstream from the pcbC gene hybridizes to cefE probes.

During the cloning of the cefE gene from N. lactamdurans using a 734 bp SacII internal probe of the A. chrysogenum gene cefEF, two positive hybridization bands were found in the total DNA of N. lactamdurans: one of them was characterized (cefE gene) and shown to encode the DAOC synthase (expandase) (Coque et al., 1993a). The identity of the second hybridizing band was unknown, but it could be a closely related gene or a duplicate cefE gene. Therefore, in order to determine what the DNA band encoded, phages containing DNA fragments of the N. lactamdurans cephamycin C gene cluster were digested with BamHI and hybridized with a 503 bp AvaII DNA fragment internal to the cefE gene from N. lactamdurans. A 5.4 kb BamHI DNA fragment (Fig. 2) [known to contain the pcbC gene but not the cefE gene (Coque et al., 1991; 1993)] gave a strong positive hybridization. The entire region of this 5.4 kb DNA fragment downstream from pcbC was sequenced. Three ORFs, ORF7, ORF8 and ORF9 (Fig. 2), were found and a fourth, incomplete ORF (ORF10) was present downstream of ORF9. The four ORFs showed clearly, using the GENEPLOT program (DNASTAR), a high frequency of GC in the third position of the codons, as expected in actinomycetes genes.
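The third-position GC bias used above as a coding-region signal can be computed directly from an in-frame sequence. The following is a minimal sketch and not the GENEPLOT method; the function name is hypothetical. The example string is a 30-nucleotide stretch copied from SEQ ID NO:1 starting at the ATG at nt 99, which appears to be the start of ORF10 (that reading is an assumption).

```python
def gc3_fraction(orf):
    """Fraction of codons whose third position is G or C.

    The sequence must be in frame and a non-empty multiple of three.
    """
    orf = orf.upper()
    if not orf or len(orf) % 3 != 0:
        raise ValueError("sequence must be a non-empty multiple of three")
    third_positions = orf[2::3]
    gc_count = sum(1 for base in third_positions if base in "GC")
    return gc_count / len(third_positions)


# SEQ ID NO:1, nt 99-128 (assumed to be the first ten codons of ORF10).
fragment = "ATGCTCATCGTCGCGTTCAAACCGGGGCAC"
print(gc3_fraction(fragment))
```

Nine of the ten codons in this stretch end in G or C, consistent with the bias expected for high G + C actinomycete genes.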

By subcloning and deletion experiments it was shown that the cefE probe hybridizes specifically to ORF9.

Characterization of ORF9 as the cefF gene.

ORF9 is located 1.6 kb downstream from pcbC and has a size of 933 nucleotides (Fig. 3) and a G + C content of 68.1%. It encodes a protein with a deduced Mr of 34,366 and a predicted pI of 4.65. The gene is separated from the upstream ORF (ORF8) by 23 nucleotides. A putative ribosome binding site (RBS), GAGGAGCA, is present in the intergenic region 7 bp upstream from the ATG translation initiation triplet. Downstream of the TAG termination codon, a secondary structure (nt 2734-2767) forms a stem and loop structure that may correspond to a terminator with a calculated free energy of -31 kcal/mol. Computer comparison of the amino acid and nucleotide sequence of ORF9 with expandases and hydroxylases involved in cephalosporin or cephamycin biosynthesis showed 80.8% identity in nucleotides and 77.5% in amino acids with the cefF encoded protein of S. clavuligerus. The protein encoded by ORF9 also showed a high similarity to the bifunctional expandase/hydroxylase of A. chrysogenum (encoded by cefEF) and to the expandases of N. lactamdurans and S. clavuligerus (encoded by cefE).

Characterization of ORF7 and ORF8.

Downstream from the pcbC gene and separated by only 13 nucleotides, starts an ORF (ORF7) of 711 nucleotides encoding a protein (named P7) of 236 amino acids (27,364 daltons), with a deduced pI of 4.89. Immediately downstream from ORF7, with a separation of only 7 nucleotides, begins ORF8, a sequence of 876 nucleotides (Fig. 3) encoding a deduced protein (P8) of 292 amino acids (32,090 daltons), with a pI of 5.02. The G + C content of ORF7 and ORF8 was 67.5% and 71.9%, respectively. Thirteen nucleotides upstream from the ATG of ORF7 a putative RBS sequence (GAGGAGCA) was observed, identical to that found upstream of ORF9. The small separation between ORF7 and ORF8 and the presence of a putative RBS, GAAGG, before the end of ORF7 (inside the coding sequence), suggest that both genes are co-translated without release of the ribosomes.

A comparative DOTPLOT analysis of the predicted protein encoded by ORF7 with other proteins present in databanks indicates that the ORF7 product shows homology to O-methyltransferases involved in chemotaxis (che genes) from S. typhimurium and E. coli, to hydroxyindole, catechol and caffeic acid O-methyltransferases, and lower homology to tylosin methyltransferases from Streptomyces fradiae (tcmP, tcmO) and methyltransferases for the methylation of oligosaccharides involved in nodulation in Azorhizobium (nodS). A typical S-adenosylmethionine binding motif is present in the N-terminal region (amino acids 10 to 26) (Ingrosso et al., 1989). Since all these proteins catalyze the O-methylation of phenolic or heterocyclic hydroxyl groups, ORF7 seems to encode the C-7 hydroxycephem methyltransferase. However, in addition, the protein exhibits a 30.5% identity in 59 amino acids with human and rat cholesterol 7-α-monooxygenase, an enzyme which introduces oxygen at the C-7 position in the cholesterol nucleus.

The ORF8 protein showed no significant homology with any protein present in the EMBL and Swiss-Prot data banks and it behaves as a coupling protein.

Location of ORF10 of N. lactamdurans.

During the characterization of the genes cefF, cmcI and cmcJ present in a 5.4 kb BamHI DNA fragment, an incomplete ORF (ORF10) was found. This ORF was located downstream of cefF. In order to obtain the complete sequence of this gene, a 3.6 kb BamHI DNA fragment of the cephamycin C gene cluster [known to be adjacent to the 5.4 kb BamHI fragment downstream from ORF9 (cefF) (Coque et al., 1993)] was subcloned and sequenced in both orientations. This 3.6 kb BamHI fragment contains the bla gene (Coque et al., 1993), the 3' region of ORF10, and the 5' end of ORF14. In order to obtain ORF10 in a single DNA fragment, a 3.4 kb NotI DNA fragment was isolated from the recombinant phage lambda EMBL-C2 and subcloned in pBluescript KS(+). The fragment was recovered, the ends filled with Klenow polymerase, ligated to a synthetic 8-mer BglII linker and subcloned in BglII digested pIJ2921. From this plasmid the 3.4 kb fragment with BglII ends was subcloned in the Streptomyces plasmid pIJ702 in the same orientation as the mel gene and downstream from the mel promoter to give plasmid pUL702-37a (Fig. 2).
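Because ORF10 is split between two adjacent BamHI fragments, it can be useful to locate restriction enzyme recognition sites in a sequence. The following is a minimal sketch, assuming a plain string search on one strand (sufficient for the palindromic sites listed); the function name and site table are hypothetical helpers. The two test fragments are copied from SEQ ID NO:1 (nt 711-730 and nt 1631-1650).

```python
# Recognition sequences of enzymes named in the Examples (all palindromic).
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "KpnI": "GGTACC",
    "NotI": "GCGGCCGC",
    "PstI": "CTGCAG",
    "XhoI": "CTCGAG",
}


def find_sites(sequence, enzyme):
    """Return 1-based start positions of the enzyme's recognition site."""
    site = RECOGNITION_SITES[enzyme]
    sequence = sequence.upper()
    positions = []
    index = sequence.find(site)
    while index != -1:
        positions.append(index + 1)  # report 1-based coordinates
        index = sequence.find(site, index + 1)
    return positions


print(find_sites("GAGCGGATCCTCAAGCAGGA", "BamHI"))  # SEQ ID NO:1, nt 711-730
print(find_sites("CGATACCTGGTACCAGCGTG", "KpnI"))   # SEQ ID NO:1, nt 1631-1650
```

Positions are reported relative to the fragment supplied, so the offset of the fragment within the full sequence must be added to obtain listing coordinates.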

Characterization of ORF10

The translation initiator ATG codon of ORF10 is located 74 bp downstream from cefF. It was preceded by a GGAGGA sequence that resembles the Shine-Dalgarno ribosome binding sequences. An inverted repeat of 15 bp that may form, if transcribed, a stem and loop structure with a calculated free energy of -31 kcal/mol, is present in the intergenic region. If this structure is a functional terminator, ORF10 should be expressed from its own promoter; alternatively ORF10 may be expressed from an upstream promoter and the inverted repeat may work as a terminator regulated by an antitermination mechanism. ORF10 contains 1563 nt (Fig. 4) and has a G + C content of 68.8%, which is similar to the average G + C content of the N. lactamdurans sequenced genome (70.4%) (Coque et al., 1993a). It encodes a protein of 520 amino acids with a deduced Mr of 57,149 and a pI of 5.2. When amino acid comparison was made using the AALIGN Program, the protein encoded by ORF10 showed 32.1% and 30.2% identity (in 287 and 281 amino acids respectively) with the C-terminal end of the nodU genes from both Rhizobium fredii and Bradyrhizobium japonicum. The nodU genes encode O-carbamoyl transferases for the biosynthesis of carbamoylated polysaccharides required for nodulation (Lewin et al., 1990). The gene corresponding to ORF10 encodes, therefore, the cephem-carbamoyltransferase and was named cmcH, according to the proposal of Martin et al. (1991) and Aharonowitz et al. (1992) for designation of the β-lactam biosynthesis genes. The cmcH encoded protein showed little overall homology with aspartate carbamoyl transferases and ornithine carbamoyl transferase of E. coli, Aspergillus nidulans and a variety of other microorganisms.
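The relationship between the reported ORF length and the size of the deduced protein can be checked with simple arithmetic. The sketch below assumes that the stated ORF length includes the termination codon, which is an assumption that reproduces the figures given for ORF10 (1563 nt, 520 amino acids) and ORF7 (711 nt, 236 amino acids); the function names are hypothetical.

```python
def protein_length(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of three")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


def mean_residue_mass(mr, residues):
    """Average mass per residue, a quick plausibility check on a deduced Mr."""
    return mr / residues


print(protein_length(1563))                      # ORF10
print(round(mean_residue_mass(57149, 520), 1))   # daltons per residue
```

A mean residue mass close to 110 daltons is typical of proteins, so the deduced Mr of ORF10 is consistent with its length.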

Location of the cmcH gene of S. clavuligerus.

From a genomic library of S. clavuligerus DNA in lambda-GEM12 the cmcH and the adjacent cefF gene were subcloned into pIJ702 as a 6.2 kb BamHI DNA fragment originating from plasmid pULFJP62. When lambda-GEM12 phages containing DNA fragments of the cephamycin cluster were hybridized with the cefF or the cmcH genes of N. lactamdurans, positive hybridizations were found in phage EMBL-C5 and in the 6.2 kb BamHI DNA fragment. The 6.2 kb DNA fragment of S. clavuligerus contains, therefore, the cefF (Kovacevic et al., 1991) and the cmcH genes.

The hybridizing S. clavuligerus DNA sequence was mapped more precisely within a 3.0 kb KpnI DNA fragment containing also the cefF gene (Fig. 5). The 3.0 kb KpnI DNA fragment was subcloned in pBluescript KS(+) giving plasmid pULFJP30 and 160 bp at the KpnI site of the distal end (downstream from cefF) were sequenced. The 160 nt sequence matched closely the sequence of the cmcH gene of N. lactamdurans (nt 1479-1639 in Fig. 3A) with 80% identity in nucleotides and 81% identity in the deduced amino acids. These results established that the relative location of cefF and cmcH is identical in the cephamycin cluster of both cephamycin C producers S. clavuligerus and N. lactamdurans, although the overall organization of the clusters is different (Martin et al., 1992; and Martin and Gutierrez 1994).
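Identity figures such as those quoted above are obtained by counting identical positions in an alignment. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length; the function name is hypothetical and the example strings are invented for illustration, not taken from the sequenced fragments.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    identical = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x == y
    )
    return 100.0 * identical / len(aligned_a)


# Invented 10-nt example with two mismatches.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))
```

The same function applies to nucleotide and amino acid alignments; only the alphabet of the input strings changes.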

EXAMPLE 4

Cell growth and preparation of cell-free extracts.

Cell free extracts were obtained from S. lividans transformants grown for 48 hours in minimal medium with glucose and lysine (Madduri et al., 1991). The cells were washed and suspended in 100 mM MOPS, pH 7.5, containing DTT (1 mM), PMSF (1 mM) and DNAse (20 μg/ml). The cell suspension was sonicated in a Branson B-12 sonifier. Nucleic acids were removed from the cell free extracts by treatment with protamine sulphate (0.1%) and the protein in the supernatant was precipitated with ammonium sulphate (0-80%). The protein precipitate was dissolved in the same MOPS buffer and passed through a PD-10 column (Pharmacia).

EXAMPLE 5

3'-Hydroxymethylcephem O-carbamoyltransferase (CCT) assays.

3'-Hydroxymethylcephem O-carbamoyltransferase activity was assayed by the following three different methods: i) By using decarbamoylcefuroxime and as substrates and measuring the ethyl acetate-extractable carbamoylated cefuroxime radioactivity in a Phillips PW4700 scintillation counter, as reported by Brewer et al. (1980). ii) Alternatively, for qualitative determination of the carbamoylation, TLC of the reaction products was performed on Silica gel 60 plates (20 x 20) using n-propanol:glacial acetic acid:water (5:1:2) and the compounds containing the cephem ring were detected by spraying the plates with a) I2/KI 5 mM solution containing 1% w/v sodium azide and b) starch 1% solution.

cmcH genes of N. lactamdurans and S. clavuligerus encode a functional 3'-hydroxymethylcephem O-carbamoyltransferase activity.

Cell free extracts of cultures (48 hours in minimal medium) of S. lividans [pIJ702], S. lividans [pUL702-37a] (containing the cmcH gene from N. lactamdurans) and S. lividans [pULFP62] (containing the cmcH gene from S. clavuligerus) were obtained and the CCT activity was quantified. Table 1 shows that the CCT specific activity in S. lividans strains containing the N. lactamdurans or S. clavuligerus genes is 7.5- and 7.7-fold higher, respectively, than the background level of S. lividans transformed with pIJ702. The 3'-hydroxymethylcephem O-carbamoyltransferase was found to precipitate after treatment of the cell free extracts with ammonium sulphate in the 45-65% fraction and has been purified for further biochemical characterization. This demonstrates that ORF10 (cmcH) encodes a functional 3'-hydroxymethylcephem O-carbamoyltransferase.

Table 1

Carbamoyl transferase

STRAIN                   [14C]carbamoylcephuroxime   Protein*   Sp. activity
                         formed (cpm*/ml)            (mg/ml)    (cpm/mg of protein)

Experiment 1
S. lividans pIJ702       13.285                      20.5       641
S. lividans pUL702-37a   29.515                      6.0        4919

Experiment 2
S. lividans pIJ702       3.550                       8.5        410
S. lividans pULFP62      14.025                      5.5        2550

*Radioactive carbamoylated cefuroxime extracted by the Brewer assay; cpm after speed-vac desiccation of 1 ml of the organic phase.
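The specific activity column of Table 1 is the radioactivity recovered per ml divided by the protein concentration. The following minimal sketch reproduces that division for the two transformant rows; it assumes that the cpm/ml entries are written with a point as thousands separator (for example, 29.515 read as 29,515 cpm/ml), which is an interpretation and not stated in the table. The function name is hypothetical.

```python
def specific_activity(cpm_per_ml, protein_mg_per_ml):
    """Specific activity in cpm per mg of protein."""
    if protein_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    return cpm_per_ml / protein_mg_per_ml


# Transformant rows of Table 1, with cpm/ml read as thousands (assumption).
print(round(specific_activity(29515, 6.0)))   # S. lividans pUL702-37a
print(round(specific_activity(14025, 5.5)))   # S. lividans pULFP62
```

Dividing the specific activity of a transformant by that of the pIJ702 control from the same experiment gives the fold increase over background.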

EXAMPLE 6

3'-Methylcephem-Hydroxylase Activity.

This activity (also known as DAOC hydroxylase) was measured after precipitation of the crude extracts with ammonium sulphate (30-70%) and desalting the preparation through a Sephadex G-25 column (Pharmacia PD-2). The assay, based on the conversion of DAOC to DAC, was carried out as described by Kovacevic et al. (1989) and incubated for 120 min at 30°C. The reaction was stopped with methanol (200 μl); the precipitated proteins were removed by centrifugation at 14,000 rpm for 10 min and the reaction product was quantified in the supernatant. The hydroxylation of DAOC to DAC was followed by HPLC using a Waters μBondapak C18 column (300 x 3.8 mm) equilibrated and eluted with NaH2PO4 200 mM pH 4.0 at a flow of 1.5 ml/min. The eluted fractions were monitored at 254 nm. Under these conditions DAOC eluted with a retention time of 10.7 minutes and DAC at 3.7 minutes.

The cefF gene encodes a functional cephem-3-hydroxylase without cephem-7-hydroxylase activity.

In order to fully characterize the product of ORF9, the 5.4 kb BamHI fragment was subcloned in pIJ702 [Katz, E. et al., 1983, J. Gen. Microbiol., 129, pp. 2703-2714] to give plasmid pIJ702-54a. Additionally, a 1.4 kb NotI-MluI DNA fragment (internal to the 5.4 kb fragment) was end-filled and subcloned into the polylinker of pIJ2921, recovered with BglII and subcloned into the BglII site of pIJ702 to give pIJ702-58a (Fig. 2). In both cases ORF9 was subcloned downstream from and in the same orientation as the mel promoter. Additionally the 1.4 kb NotI-MluI fragment was subcloned in the BamHI site of pIJ699 [Kieser, D., and Melton, R.D., 1988, Gene, 65, pp. 83-91] to give plasmid pIJ699-58a.

Cell free extracts from a 48 hour culture of S. lividans transformed by the expression plasmid [pIJ702-54a] were assayed for DAOC hydroxylase activity. HPLC analysis of the products of the reaction indicated the formation of a product with a retention time of 3.7 minutes (Fig. 6B) which co-eluted with pure deacetylcephalosporin C (Fig. 6A), demonstrating DAOC hydroxylase activity. The same DAOC hydroxylase activity was observed in cultures of S. lividans [pIJ702-58a].

Cultures of S. lividans [pIJ702-58a] and S. lividans [pIJ699-58a] were tested for C-7 hydroxylase activity at different times during the fermentation. No C-7 hydroxylase activity could be detected at any time, indicating that ORF9 encodes the cefF gene of N. lactamdurans, which hydroxylates the cephem nucleus at the C-3' position but not at the C-7 position. These results show that two different hydroxylases are required in the late reactions of cephamycin biosynthesis.

EXAMPLE 7

7-Cephem-hydroxylase and 7-Hydroxycephem methyltransferase assays.

The enzymes were measured in the 30-70% ammonium sulphate precipitate of crude extracts after desalting through a Sephadex G-25 column as indicated above. The assays are based on the conversion of cephalosporin C to 7-hydroxycephalosporin C and 7-methoxycephalosporin C (Xiao and Demain, 1991). The reactions were carried out in a water bath with shaking to favor the oxygen transfer required for the reaction. After incubation for 120 minutes at 30°C the reaction was stopped by addition of acetic acid (10 μl); the proteins were eliminated by centrifugation at 14,000 rpm for 5 minutes.

The supernatant was applied to a QAE-Sephadex column (1 ml bed volume) equilibrated with 50 mM Tris-HCl pH 6.0. After washing the column with 600 μl of the same buffer, the products of the reaction were eluted with 3 N NaCl (600 μl). Chromatography of the products of the reaction was performed in a Waters μBondapak C18 column as indicated previously except that the elution was done with a flow of 1.5 ml/min and a gradient of methanol as follows: time 0-20 minutes, 0% methanol; 30 minutes, 5% methanol; 40 minutes, 10% methanol. Under these conditions the retention times of the substrate and products were as follows: cephalosporin C, 29.2 minutes; 7-hydroxycephalosporin C, 14.2 minutes; 7-methoxycephalosporin C, 17.7 minutes. Alternatively, S-adenosyl-L-[methyl-14C]methionine (25 μCi/ml) was used in the assay (250 nCi per reaction) and the formation of labeled 7-methoxycephalosporin C was monitored after HPLC chromatography using a radioisotope detector (Beckman 171).
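The methanol gradient above is given as a set of time points. A minimal sketch of the mobile phase composition at any time follows, assuming linear ramps between the listed points; the shape of the ramps is not stated in the text and is an assumption, as are the names used.

```python
# (time in minutes, % methanol) points taken from the gradient described above.
GRADIENT = [(0.0, 0.0), (20.0, 0.0), (30.0, 5.0), (40.0, 10.0)]


def methanol_percent(t_min, program=GRADIENT):
    """Percent methanol at t_min, assuming linear ramps between points."""
    if t_min < program[0][0] or t_min > program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, p0), (t1, p1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient program has no segment covering this time")


# Composition at the reported retention times of the two products.
print(methanol_percent(14.2), methanol_percent(17.7))
```

Under this assumption both 7-hydroxycephalosporin C and 7-methoxycephalosporin C elute during the isocratic 0% methanol phase, and cephalosporin C (29.2 minutes) elutes near the end of the first ramp.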

Expression of ORF7 and ORF8 in S. lividans results in C-7 methoxylating activity.

The similarity of the protein encoded by ORF7 to methyltransferases suggests that ORF7 (and ORF8) might correspond to genes encoding enzymes for the methoxylation at C-7 in the cephem nucleus. Therefore a 3061 bp PstI DNA fragment from pIJ702-54a (containing ORF7 and ORF8) was subcloned into pIJ2921, rescued with BglII and subcloned in pIJ702, downstream from and in the same orientation as the mel promoter, to give expression plasmid pIJ702-55a. Cell free extracts of 48 hour cultures of S. lividans [pIJ702-55a] were precipitated with ammonium sulphate (30-70%) and tested for C-7 hydroxylase and methyltransferase activity. A product with a retention time of 14.2 minutes (Fig. 7B) is formed during the C-7 hydroxylation reaction. This peak decreases when S-adenosylmethionine (SAM) is added to the assay (Fig. 7D), with concomitant formation of a second product with a retention time of 17.7 minutes (identified as 7-methoxyCPC), showing that both C-7 hydroxylase and methyltransferase activities are present in the transformants; neither of the two activities was found in control S. lividans [pIJ702] cell free extracts (Figs. 7A and 7C). To identify the product with retention time 17.7 minutes, labeled SAM was used in the assay and the radioactivity in the reaction products was observed by HPLC and by TLC followed by autoradiography. Three peaks of radioactivity were found in the HPLC eluted fractions with retention times of 10.7, 14.0 and 17.7 minutes. Formation of the three labeled peaks was dependent upon addition of CPC substrate to the reaction. One of the peaks (retention time 17.7 minutes) was identified as 7-methoxycephalosporin C, whereas the others probably correspond to hydrolysis products of the 7-methoxycephalosporin C due to the acid treatment to stop the reaction.

The C-7 hydroxylase and methyltransferase activities of the transformants during the fermentation were determined. Two types of transformants were used: S. lividans [pIJ702-55a], which contains the three genes downstream of the mel promoter, and constructions in which either the cefF gene or the fragment containing ORF7-ORF8 were subcloned in pIJ699, a plasmid with two transcriptional terminators in which the ORF7-ORF8 should be expressed from its own promoters. The time course of methyltransferase activity during the fermentation overlaps with that of the 7-hydroxylase activity, suggesting that both activities are expressed coordinately.

ORF7 encodes a 7-hydroxylase activity but the two proteins encoded by ORF7 and ORF8 are required for 7-methoxylase activity.

To define which enzyme activity corresponds to each gene, ORF7 and ORF8 were subcloned individually. DNA fragments PstI-XhoI (2056 bp) and KpnI (1048 bp) containing ORF7 and ORF8 respectively (Fig. 2) were cloned in pIJ2921, rescued with BglII and subcloned in the BglII site of pIJ702 downstream from and in the orientation of the mel promoter to give plasmids pUL702-56a and pUL702-57a.

C-7 hydroxylase activity was found in cultures of S. lividans [pUL702-56a] in repeated experiments. No 7-hydroxymethyltransferase activity was observed unless S. lividans was transformed with DNA fragments carrying both ORF7 and ORF8 (i.e. pUL702-55a).

Therefore in vitro complementation of both proteins was tested. Cell free extracts of S. lividans [pUL702-56a] and S. lividans [pUL702-57a] were incubated together for 30 minutes in the presence of the cofactors required for i) the C-7 hydroxylase activity; and ii) the C-7 hydroxylase and methyltransferase activities. Under these conditions a peak of C-7 hydroxycephalosporin C was observed, showing that a C-7 hydroxylase activity was indeed encoded by ORF7. No formation of 7-methoxy CPC was observed in vitro when the ORF7 and ORF8 proteins were mixed together, suggesting that expression levels of ORF8 are low but sufficient to detect methoxy CPC. It is likely that proteins from both ORF7 and ORF8 become associated in vivo but such an association in vitro is difficult due to the low concentration of the proteins in cell free extracts.

The 7-hydroxylase shows a cephalosporin-dependent NADH-oxidase activity.

The proteins encoded by ORF7 and ORF8 behave as a two protein component system. Cell-free extracts of S. lividans [pIJ702-56a] expressing ORF7 showed a strong cephalosporin-dependent NADH oxidase activity (Fig. 8). The protein encoded by ORF8 did not show a significant NADH-oxidase activity but was absolutely essential for productive 7-methoxylation. These results indicate that hydroxylation at C-7 is mediated by a hydroxylase that uses NADH as an electron donor for reduction of molecular oxygen to introduce the hydroxyl group. The protein encoded by ORF8 is strictly required for introduction of the methyl group to form 7-methoxy CPC derivatives.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Martin, Juan F. Coque, Juan R. Enguita, Francisco J. Fuente, Juan L. Llarena, Francisco J. Liras, Paloma

(ii) TITLE OF INVENTION: DNA ENCODING CEPHAMYCIN BIOSYNTHESIS LATE GENES

(iii) NUMBER OF SEQUENCES: 8

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: John W. Wallen III

(B) STREET: P.O. Box 2000

(C) CITY: Rahway

(D) STATE: New Jersey

(E) COUNTRY: USA

(F) ZIP: 07065

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Wallen III, John W.

(B) REGISTRATION NUMBER: 35,403

(C) REFERENCE/DOCKET NUMBER: 19179

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (908) 594-3905

(B) TELEFAX: (908) 594-4720

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1700 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1 :

ACGATGCACG CGGTCACCTC GTAGCCACCG TGCCCGCGAC CCCCGGCCCA CGAGGCCGGG 60

GGCGCGGTCC CGACCCTCGT CACGCGTTGG AGGAACGAAT GCTCATCGTC GCGTTCAAAC 120

CGGGGCACGA CGGTGCCGTC GCCGCGATCG GCGATCGCCG GTTGCTCTAC TCGCTCGAAT 180

CGGAGAAGGA CTCCCGGCCG CGGTACTCGC CGATCCTGGC CACCACCGTG CTCGACCTCG 240

CCGAGCGGCT GGGCGAGGTG CCGGACGTGG TCGCCCTCGG CGGCTGGAGC GACCTGCGGC 300

CCAACCGCAT CTCCTACACC GGCGCCGGGT ACTCGGGCAT CGAAGAACCC ACCGTGACCA 360

CCTCGCGCTT CTTCGGCAAG GAGGTGAAGT TCTTCAGCTC CACGCACGAA CGTTCGCACA 420

TCTACATGGC CCTGGGCATG GCGCCGAGGG ACGACAGCCC GGTCCAGACG GTGCTGGTGT 480

GGGAGGGTGA CGTCGGTGCC TTCTACGTGA TCGACGGGCA CCAGCGGATC ACCCGCAAGG 540

TCCAGGTGAT GTCCGGCCCC GGCGCGCGCT ACTCGTTCCT CTTCGGCCTC GCCGACCCCA 600

CTTTCCCCAC CACCGGCGGG AAACCGCGGC TGAACGACGC CGGGAAGCTG ATGGCGCTGG 660

CGGCCTTCGG CGACTCCGCC GACGCGGACG CGGACATCAC GCACGTGGTC GAGCGGATCC 720

TCAAGCAGGA CTCGATGTAC CCGGCGCCGA AGGGTGAATA CCGGGATTCG GTGCTGTACA 780

ACGCCGGGGT CGAGTCGCCG GAGTGCAAGA TCGCCGCCGC GCTGCTCACC GAACGCCTCT 840

TCGAGACCTT CGCCGAGGTC GCCAGGCAGG AGATGCCCGA AGGCAGCCCG CTCTACATCT 900

CCGGCGGCTG CGGGCTGAAC TGCGACTGGA ACAGCCTGTG GGCGCAGCTC GGCCACTTCT 960

CCTCGGTGTT CGTCGCGCCG TGCACCAACG ACTCCGGTTC CGCGCTGGGC ACCGCCATCG 1020

ACGCGCTCAC CACCTTCACC GGTGACCCGC ACGTCGACTG GAGCGTCTAC AGCGGACTGG 1080

AATTCGTCAC CGACACCCAG CCGGACCCGG CCAGGTGGAC CTCCCGCCCG CTCGAGCACG 1140

ACGAGCTCTC CGGCGCGCTC GCCGGTGGCC GGGTCGTCGC CTGGGTGCAG GGCCGCTGGG 1200

AGATCGGTCC GCGCGCGCTG TGCAACCGCT CGCTGCTGGC CGAGCCGTTC GGCGCGGTGA 1260

CCAGGGACCG GCTCAACGAG ATCAAGCAGC GCGAGGACTA CCGCCCGATC GCGCCCGTGT 1320

GCCGGGTCGA GGACCTGGGC AAGGTCTTCC ACGAGGACTT CGAAGACCCG TACATGCTCT 1380

ACTTCCGGCG GGTGCGCGAG TCCAGCGGCC TGCGCGCGGT GACCCACGTG GACGGTTCGG 1440

CCCGCGTGCA GACCGTGCGG GATTCGGGCA ACCCGCAGAT GCACCGGCTG CTCTCGGCCT 1500

TCGCCGCCCA GCGCGGTGTC GGCGTGCTGT GCAACACCTC GCTGAACTTC AACGGCGAGG 1560

GGTTCATCAA CCGCATGTCG GACCTGGTGC TCTACTGCGA ATCCCGCGGC ATCTCGGACA 1620

TGGTCGTCGG CGATACCTGG TACCAGCGTG CCGAGGGCTG ACCCCGGCGC CGGACGGCCG 1680

TGAGCGCGAG CGTCCGGCGG 1700

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 972 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

AAGACGGTAC CGGTCTTCAG CATGGCCGAA CTGCGCGACG GCTCGCGCCA GGACGAGTTC 60

CGCGAGTGGG CCCGCCGCGG GGTCTTCTAC CTCACCGGGT ACGGCGCCAC CGAACGAGAC 120

CACCGGGTGG CCACCGACAC CGCGATGGAC TTCTTCGCCC AAGGCACGGC CGAGGAGAAG 180

CAGGCCGTGA CCACGAAGGT CCCGACCATG CGGCGCGGGT ACTCGGCGCT GGAGGCGGAA 240

AGCACCGCCC AGGTCACCAA CACCGGCACC TACACCGACT ACTCCATGTC GTACTCGATG 300

GGCATCGGCG GCAACCTGTT CCCGTCGAAG GAGTTCGAGT CGGTCTGGAC GGACTACTTC 360

GACAGCCTGT ACCGCGCCGC GCAGGAGACC GCGCGCCTGG TGCTGACCGC CGCGGGCACC 420

TACGACGGCG AGGACCTCGA CACCCTGCTC GACTGCGACC CGGTGCTGCG CCTGCGGTAC 480

TTCCCGGAGG TCCCGGAGCA CCGCGCCGCC GAGTACGAGC CACGCCGGAT GGCCCCGCAC 540

TACGACCTGT CCATCATCAC CTTCATCCAC CAGACCCCGT GCGCCAACGG TTTCGTCAGC 600

CTGCAGGCCG AAGTGGACGG TGAGATGGTG AGCCTGCCGC ACGTCGAGGA CGCCGTGGTG 660

GTGCTGTGCG GCGCGATCGC GCCGCTGGTC ACCCAGGGCG CGGTGCCCGC GCCCAACCAC 720

CACGTGGTCT CCCCGGACGC GAGCATGCTC AAGGGCAGCG ACCGCACCTC GAGCGTGTTC 780

TTCCTGCGCC CGTCGACCGA TTTCACCTTC TCGGTGCCCG ACGCCAGGAA GTACGGCCTC 840

GACGTCAGCC TGGACATGGA GAAGGCGACC TTCGGCGACT GGATCGGGAC CAACTACGTC 900

ACGATGCACG CGGTCACCTC GTAGCCACCG TGCCCGCGAC CCCCGGCCCA CGAGGCCGGG 960

GGCGCGGTCC CG 972

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 900 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CGTGCCGACC GTGTCGAGTC CGCTGTACTA CGCCGCCCCG CTGACCCCGG ACGGCGGGGA 60

CGGGGACTGG TGCATCGACG CCGTGACCCG GCCGCCGGAG GTGCTGTTCA ACTTCCGCAA 120

GGTGGGCGTG GAGACGACCA TCACCGACCT GCGCGAGGGC TCGATCCGGC CGGCGCTAGA 180

CGAAACGGGG TTCGAGAAGG TCACCGCGCC CACCGGCGCG TCCCAGCGGG GCCTGCTGGA 240

CAGCGAGGAA GCCGCGCTGG AGCAGTACCG GCGGGAAACC GGTGAGCTGC TCCGCTCGCT 300

CACCGGCGCG GACGTGGTGG AGTTCTTCGA CGCCACCCTG CGGCGGCAGG ACGCGGCCGA 360

CGACCCGGCC GCCCAGTCCC CGCACCAGCG GGTGCACGTG GACCAGAGCC CGGGCAGCGC 420

GCGGGCCAGG GCCGAGCGGC ACCTCGGCCC CGGCCGGGAG TTCCGGCGCT TCCAGATCAT 480

CAACGTCTGG CGGCCGCTGC TCGAGCCGGT GCGCAACTTC CCGCTGGCGC TGTGCGACTA 540

CCGGTCGCTG GACCTGTCCG CCGACCTGGT GCCGACCCGG CTGGACTTCC CGGACTGGCT 600

GAAGGACCGC GAGAACTACT CGGTCCGGCA CAACCCCGCG CACCGCTGGT ACTTCTGGGA 660

CTCGCTGACA CCGGCCGAAG CGCTGGTCTT CAAGTGCTAC GACAGCGCGA GCCGCGGGCT 720

GGCCATGGCC GGTGGCGAGC CGGACGGCGG CGAACTGCGC GACGTGGCGG GTCTCTGCCC 780

GCACACGGCC TTCTTCGACG AGAACGGGCC GTCGACCGGC CACCTGCGCA CTTCGCTGGA 840

ACTGCGCGCG CTGGCCTTCC ACGAATGAAC GACGAACGAG GAGCAGGGGA AATGTCGGAC 900

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 800 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GGCGGTCCGC TACGGCGACT ACCTCAACCA CGGCCTGCAC TCGCTGATCG TGAAGAACGG 60

CCAGACCTGA TCGAGGAGCA CGCATGACTG ACACCACCCG CCAGGACTTC CTGGACCTCA 120

ACCTGTTCCG GGGGCTGGGG GAGGACCCGG TCTACCACCC GCCGGTGCTG GCCGACCGCC 180

CGCGCGACTG GCCGCTCGAC CGGTGGGCCG AGGCCCCGCG CGATCTCGGG TTCTCCGACT 240

TCGCCCGCTA CCAGTGGCGC GGCCTGCGCA TGCTGAAGAA CCCGGACACC CAGGCCGCCT 300

ACCACGACCT GATGGTCGAA CTGCGGCCCC GCACGGTGAT CGAGCTGGGC GTGTACAGCG 360

GTGGCTCGCT GGCTCGGTTC CGGGACATGG CCGAGCTGAT GGGCTTCGAC TGCCAGGTGC 420

TCGGCATCGA CCGGGACCTG TCCCGCTGCC AGATCCCCGA GTCCGAGATG AAGAACATCT 480

CGCTGCGCGA GGCCGACTGC AGCCTGGACC GGTGGAAGCT CGTGGACGCG CTGGACGGCG 540

TGGGCGACCA CAAGTAGCTG CTGCGCGTGC GCTTGTGGAA GTTGTAGGAC GCCACCAGCC 600

TGGACCACCT CCTGCACGAA GGCGACTACT TCATCATCGA GGACATGATC CCGTACTGGT 660

ACCGGTACAG CCCCAAGCTG CTCACCGAGT ACCTCGCCGC GTTCGCCGGG GAGCTGAGCA 720

TGGACATGGT CTACGCCAAC GCCAGTTCAC AACTGGAACG CGGTGTGCTG CGCCGGTCGG 780

CACCGAAGGC GTAGGTGGAT 800

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 520 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Leu Ile Val Ala Phe Lys Pro Gly His Asp Gly Ala Val Ala Ala 1 5 10 15

Ile Gly Asp Arg Arg Leu Leu Tyr Ser Leu Glu Ser Glu Lys Asp Ser 20 25 30

Arg Pro Arg Tyr Ser Pro Ile Leu Ala Thr Thr Val Leu Asp Leu Ala 35 40 45

Glu Arg Leu Gly Glu Val Pro Asp Val Val Ala Leu Gly Gly Trp Ser 50 55 60

Asp Leu Arg Pro Asn Arg Ile Ser Tyr Thr Gly Ala Gly Tyr Ser Gly 65 70 75 80

Ile Glu Glu Pro Thr Val Thr Thr Ser Arg Phe Phe Gly Lys Glu Val 85 90 95

Lys Phe Phe Ser Ser Thr His Glu Arg Ser His Ile Tyr Met Ala Leu 100 105 110

Gly Met Ala Pro Arg Asp Asp Ser Pro Val Gln Thr Val Leu Val Trp 115 120 125

Glu Gly Asp Val Gly Ala Phe Tyr Val Ile Asp Gly His Gln Arg Ile 130 135 140

Thr Arg Lys Val Gln Val Met Ser Gly Pro Gly Ala Arg Tyr Ser Phe 145 150 155 160

Leu Phe Gly Leu Ala Asp Pro Thr Phe Pro Thr Thr Gly Gly Lys Pro 165 170 175

Arg Leu Asn Asp Ala Gly Lys Leu Met Ala Leu Ala Ala Phe Gly Asp 180 185 190

Ser Ala Asp Ala Asp Ala Asp Ile Thr His Val Val Glu Arg Ile Leu 195 200 205

Lys Gln Asp Ser Met Tyr Pro Ala Pro Lys Gly Glu Tyr Arg Asp Ser 210 215 220

Val Leu Tyr Asn Ala Gly Val Glu Ser Pro Glu Cys Lys Ile Ala Ala 225 230 235 240

Ala Leu Leu Thr Glu Arg Leu Phe Glu Thr Phe Ala Glu Val Ala Arg 245 250 255

Gln Glu Met Pro Glu Gly Ser Pro Leu Tyr Ile Ser Gly Gly Cys Gly 260 265 270

Leu Asn Cys Asp Trp Asn Ser Leu Trp Ala Gln Leu Gly His Phe Ser 275 280 285

Ser Val Phe Val Ala Pro Cys Thr Asn Asp Ser Gly Ser Ala Leu Gly 290 295 300

Thr Ala Ile Asp Ala Leu Thr Thr Phe Thr Gly Asp Pro His Val Asp 305 310 315 320

Trp Ser Val Tyr Ser Gly Leu Glu Phe Val Thr Asp Thr Gln Pro Asp 325 330 335

Pro Ala Arg Trp Thr Ser Arg Pro Leu Glu His Asp Glu Leu Ser Gly 340 345 350

Ala Leu Ala Gly Gly Arg Val Val Ala Trp Val Gln Gly Arg Trp Glu 355 360 365

Ile Gly Pro Arg Ala Leu Cys Asn Arg Ser Leu Leu Ala Glu Pro Phe 370 375 380

Gly Ala Val Thr Arg Asp Arg Leu Asn Glu Ile Lys Gln Arg Glu Asp 385 390 395 400

Tyr Arg Pro Ile Ala Pro Val Cys Arg Val Glu Asp Leu Gly Lys Val 405 410 415

Phe His Glu Asp Phe Glu Asp Pro Tyr Met Leu Tyr Phe Arg Arg Val 420 425 430

Arg Glu Ser Ser Gly Leu Arg Ala Val Thr His Val Asp Gly Ser Ala 435 440 445

Arg Val Gln Thr Val Arg Asp Ser Gly Asn Pro Gln Met His Arg Leu 450 455 460

Leu Ser Ala Phe Ala Ala Gln Arg Gly Val Gly Val Leu Cys Asn Thr 465 470 475 480

Ser Leu Asn Phe Asn Gly Glu Gly Phe Ile Asn Arg Met Ser Asp Leu 485 490 495

Val Leu Tyr Cys Glu Ser Arg Gly Ile Ser Asp Met Val Val Gly Asp 500 505 510

Thr Trp Tyr Gln Arg Ala Glu Gly 515 520

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 310 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Ser Asp Lys Thr Val Pro Val Phe Ser Met Ala Glu Leu Arg Asp 1 5 10 15

Gly Ser Arg Gln Asp Glu Phe Arg Glu Trp Ala Arg Arg Gly Val Phe 20 25 30

Tyr Leu Thr Gly Tyr Gly Ala Thr Glu Arg Asp His Arg Val Ala Thr 35 40 45

Asp Thr Ala Met Asp Phe Phe Ala Gln Gly Thr Ala Glu Glu Lys Gln 50 55 60

Ala Val Thr Thr Lys Val Pro Thr Met Arg Arg Gly Tyr Ser Ala Leu 65 70 75 80

Glu Ala Glu Ser Thr Ala Gln Val Thr Asn Thr Gly Thr Tyr Thr Asp 85 90 95

Tyr Ser Met Ser Tyr Ser Met Gly Ile Gly Gly Asn Leu Phe Pro Ser 100 105 110

Lys Glu Phe Glu Ser Val Trp Thr Asp Tyr Phe Asp Ser Leu Tyr Arg 115 120 125

Ala Ala Gln Glu Thr Ala Arg Leu Val Leu Thr Ala Ala Gly Thr Tyr 130 135 140

Asp Gly Glu Asp Leu Asp Thr Leu Leu Asp Cys Asp Pro Val Leu Arg 145 150 155 160

Leu Arg Tyr Phe Pro Glu Val Pro Glu His Arg Ala Ala Glu Tyr Glu 165 170 175

Pro Arg Arg Met Ala Pro His Tyr Asp Leu Ser Ile Ile Thr Phe Ile 180 185 190

His Gln Thr Pro Cys Ala Asn Gly Phe Val Ser Leu Gln Ala Glu Val 195 200 205

Asp Gly Glu Met Val Ser Leu Pro His Val Glu Asp Ala Val Val Val 210 215 220

Leu Cys Gly Ala Ile Ala Pro Leu Val Thr Gln Gly Ala Val Pro Ala 225 230 235 240

Pro Asn His His Val Val Ser Pro Asp Ala Ser Met Leu Lys Gly Ser 245 250 255

Asp Arg Thr Ser Ser Val Phe Phe Leu Arg Pro Ser Thr Asp Phe Thr 260 265 270

Phe Ser Val Pro Asp Ala Arg Lys Tyr Gly Leu Asp Val Ser Leu Asp 275 280 285

Met Glu Lys Ala Thr Phe Gly Asp Trp Ile Gly Thr Asn Tyr Val Thr 290 295 300

Met His Ala Val Thr Ser 305 310

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 288 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Val Pro Thr Val Ser Ser Pro Leu Tyr Tyr Ala Ala Pro Leu Thr Pro 1 5 10 15

Asp Gly Gly Asp Gly Asp Trp Cys Ile Asp Ala Val Thr Arg Pro Pro 20 25 30

Glu Val Leu Phe Asn Phe Arg Lys Val Gly Val Glu Thr Thr Ile Thr 35 40 45

Asp Leu Arg Glu Gly Ser Ile Arg Pro Ala Leu Asp Glu Thr Gly Phe 50 55 60

Glu Lys Val Thr Ala Pro Thr Gly Ala Ser Gln Arg Gly Leu Leu Asp 65 70 75 80

Ser Glu Glu Ala Ala Leu Glu Gln Tyr Arg Arg Glu Thr Gly Glu Leu 85 90 95

Leu Arg Ser Leu Thr Gly Ala Asp Val Val Glu Phe Phe Asp Ala Thr 100 105 110

Leu Arg Arg Gln Asp Ala Ala Asp Asp Pro Ala Ala Gln Ser Pro His 115 120 125

Gln Arg Val His Val Asp Gln Ser Pro Gly Ser Ala Arg Ala Arg Ala 130 135 140

Glu Arg His Leu Gly Pro Gly Arg Glu Phe Arg Arg Phe Gln Ile Ile 145 150 155 160

Asn Val Trp Arg Pro Leu Leu Glu Pro Val Arg Asn Phe Pro Leu Ala 165 170 175

Leu Cys Asp Tyr Arg Ser Leu Asp Leu Ser Ala Asp Leu Val Pro Thr 180 185 190

Arg Leu Asp Phe Pro Asp Trp Leu Lys Asp Arg Glu Asn Tyr Ser Val 195 200 205

Arg His Asn Pro Ala His Arg Trp Tyr Phe Trp Asp Ser Leu Thr Pro 210 215 220

Ala Glu Ala Leu Val Phe Lys Cys Tyr Asp Ser Ala Ser Arg Gly Leu 225 230 235 240

Ala Met Ala Gly Gly Glu Pro Asp Gly Gly Glu Leu Arg Asp Val Ala 245 250 255

Gly Leu Cys Pro His Thr Ala Phe Phe Asp Glu Asn Gly Pro Ser Thr 260 265 270

Gly His Leu Arg Thr Ser Leu Glu Leu Arg Ala Leu Ala Phe His Glu 275 280 285

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 236 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Thr Asp Thr Thr Arg Gln Asp Phe Leu Asp Leu Asn Leu Phe Arg 1 5 10 15

Gly Leu Gly Glu Asp Pro Val Tyr His Pro Pro Val Leu Ala Asp Arg 20 25 30

Pro Arg Asp Trp Pro Leu Asp Arg Trp Ala Glu Ala Pro Arg Asp Leu 35 40 45

Gly Phe Ser Asp Phe Ala Arg Tyr Gln Trp Arg Gly Leu Arg Met Leu 50 55 60

Lys Asn Pro Asp Thr Gln Ala Ala Tyr His Asp Leu Met Val Glu Leu 65 70 75 80

Arg Pro Arg Thr Val Ile Glu Leu Gly Val Tyr Ser Gly Gly Ser Leu 85 90 95

Ala Arg Phe Arg Asp Met Ala Glu Leu Met Gly Phe Asp Cys Gln Val 100 105 110

Leu Gly Ile Asp Arg Asp Leu Ser Arg Cys Gln Ile Pro Glu Ser Glu 115 120 125

Met Lys Asn Ile Ser Leu Arg Glu Ala Asp Cys Ser Asp Leu Ala Thr 130 135 140

Phe Glu His Leu Arg Asp Leu Pro His Pro Leu Val Phe Ile Asp Asp 145 150 155 160

Ala His Ala Asn Thr Phe Asn Ile Leu Arg Trp Ser Val Asp His Leu 165 170 175

Leu His Glu Gly Asp Tyr Phe Ile Ile Glu Asp Met Ile Pro Tyr Trp 180 185 190

Tyr Arg Tyr Ser Pro Lys Leu Leu Thr Glu Tyr Leu Ala Ala Phe Ala 195 200 205

Gly Glu Leu Ser Met Asp Met Val Tyr Ala Asn Ala Ser Ser Gln Leu 210 215 220

Glu Arg Gly Val Leu Arg Arg Ser Ala Pro Lys Ala 225 230 235