Title:
A NEW SENSITIVE METHOD FOR QUANTIFYING ACTIVE TRANSFORMING GROWTH FACTOR-BETA AND COMPOSITIONS THEREFOR
Document Type and Number:
WIPO Patent Application WO/1995/019987
Kind Code:
A1
Abstract:
The present invention describes a highly sensitive and specific non-radioactive quantitative assay method for quantifying transforming growth factor-beta (TGF-β) in a liquid sample. Also disclosed are TGF-β responsive expression vectors that express the indicator molecule, luciferase, in a dose-dependent response to TGF-β activation. Eucaryotic cells transformed with the disclosed expression vectors are also described. Diagnostic systems in the form of kits for quantifying the amount of TGF-β in a liquid sample using the disclosed methods and expression vectors are described.

Inventors:
LOCKUTOFF DAVID J
CURRIDEN SCOTT A
Application Number:
PCT/US1995/001153
Publication Date:
July 27, 1995
Filing Date:
January 25, 1995
Assignee:
SCRIPPS RESEARCH INST (US)
International Classes:
C07K14/495; C12N15/18; C12Q1/68; (IPC1-7): C07H21/00; C07K14/495; C12N15/18
Foreign References:
US5216126A (1993-06-01)
US5268295A (1993-12-07)
Other References:
CELL, Volume 71, issued 11 December 1992, J.L. WRANA et al., "TGFbeta Signals Through a Heteromeric Protein Kinase Receptor Complex", pages 1003-1014.
JOURNAL OF CELLULAR PHYSIOLOGY, Volume 152, issued 1992, R. FLAUMENHAFT et al., "Cell Density Dependent Effects of TGF-beta Demonstrated by a Plasminogen Activator-Based Assay for TGF-beta", pages 48-55.
THE JOURNAL OF BIOLOGICAL CHEMISTRY, Volume 193, issued 1951, O.H. LOWRY et al., "Protein Measurement With the Folin Phenol Reagent", pages 265-275.
MOLECULAR AND CELLULAR BIOLOGY, Volume 12, No. 4, issued April 1992, A. RICCIO et al., "Transforming Growth Factor beta1-Responsive Element: Closely Associated Binding Sites for USF and CCAAT-Binding Transcription Factor-Nuclear Factor I in the Type 1 Plasminogen Activator Inhibitor Gene", pages 1846-1855.
Claims:
WE CLAIM:
1. A method for quantifying the amount of transforming growth factor-β (TGF-β) in a liquid sample, which method comprises: (a) incubating said liquid sample together with eucaryotic cells that contain a TGF-β responsive expression vector having a gene encoding luciferase for a predetermined time period sufficient for said eucaryotic cells to express a detectable amount of said luciferase; (b) measuring the amount of said luciferase expressed during said time period; and (c) determining the amount of TGF-β present in said sample by comparing the measured amount of said luciferase against a reference curve.
2. The method in accordance with claim 1 wherein the reference curve represents a series of measured amounts of said luciferase produced from a series of known concentrations of TGF-β by said eucaryotic cells.
3. The method in accordance with claim 1 wherein said eucaryotic cells are mammalian cells.
4. The method in accordance with claim 3 wherein said mammalian cells are members of the group consisting of mink lung epithelial cells, HeLa cells, Chinese hamster ovary cells, Hep3B cells, GM7373 cells, and NIH 3T3 cells.
5. The method in accordance with claim 1 wherein the TGF-β responsive expression vector is a plasmid comprising, in the direction of transcription, a regulatory region that includes at least one TGF-β inducible response element that is operatively linked to a promoter, and a structural region downstream of said promoter, said response element being capable of inducing dose-dependent luciferase activity and said structural region coding for said luciferase.
6. The method in accordance with claim 5 wherein said plasmid includes a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 1-10.
7. The method in accordance with claim 5 wherein said plasmid has the identifying characteristics of a plasmid selected from the group consisting of plasmid ATCC Accession Number 75627, plasmid ATCC Accession Number 74628 and plasmid ATCC Accession Number 75629.
8. The method in accordance with claim 5 wherein said TGF-β inducible response element comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 11-17.
9. The method in accordance with claim 5 wherein said promoter comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 18 and 19.
10. The method in accordance with claim 1 wherein said eucaryotic cells are stably transformed cells that contain said TGF-β responsive vector, and wherein said vector also includes a gene encoding a selectable marker.
11. The method in accordance with claim 10 wherein said vector is a plasmid comprising a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 1-6.
12. The method in accordance with claim 1 wherein said eucaryotic cells are transiently transformed cells that contain said TGF-β responsive vector, and wherein said vector is a plasmid comprising a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 7-10.
13. The method in accordance with claim 1 wherein said liquid sample is selected from the group consisting of a body fluid, culture medium and a tissue extract.
14. A method for quantifying the amount of transforming growth factor-β (TGF-β) in a liquid sample comprising: (a) providing, in eucaryotic cells capable of expressing an indicator molecule, a plasmid comprising, in the direction of transcription, a regulatory region that includes at least one TGF-β inducible response element that is operatively linked to a promoter, and a structural region downstream of said promoter, said response element being capable of inducing dose-dependent indicator molecule activity and said structural region coding for said indicator molecule; (b) incubating said liquid sample with said eucaryotic cells for a predetermined time period sufficient for said eucaryotic cells to express a detectable amount of said indicator molecule; (c) measuring the amount of said indicator molecule expressed during said time period; and (d) comparing the measured amount of said indicator molecule produced in step (c) with the amount of indicator molecule produced in a control assay performed according to steps (a) through (c) by treating said liquid sample with an anti-TGF-β antibody to obtain a net measured amount of said indicator molecule induced by said TGF-β.
15. The method in accordance with claim 14 wherein said liquid sample contains an isoform of TGF-β selected from the group consisting of TGF-β1, TGF-β2 and TGF-β3.
16. The method in accordance with claim 14 wherein said liquid sample is selected from the group consisting of a body fluid, culture medium and a tissue extract.
17. The method in accordance with claim 14 wherein said eucaryotic cell is a mammalian cell.
18. The method in accordance with claim 14 wherein said mammalian cell is selected from the group consisting of mink lung epithelial cells, HeLa cells, Chinese Hamster Ovary cells, Hep3B cells, GM7373 cells and NIH 3T3 cells.
19. The method in accordance with claim 14 wherein said indicator molecule is luciferase.
20. The method in accordance with claim 14 wherein said plasmid comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 1-10.
21. The method in accordance with claim 14 wherein said TGF-β inducible response element comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 11-17.
22. The method in accordance with claim 14 wherein said promoter comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 18 and 19.
23. The method in accordance with claim 14 wherein said plasmid has the identifying characteristics of a plasmid selected from the group consisting of plasmid ATCC Accession Number 75627, plasmid ATCC Accession Number 74628 and plasmid ATCC Accession Number 75629.
24. The method in accordance with claim 14 wherein said eucaryotic cells are stably transformed cells that contain said plasmid, and wherein said plasmid contains a gene encoding a selectable marker for the selection of said stably transformed cells.
25. The method in accordance with claim 24 wherein said plasmid comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 1-6.
26. The method in accordance with claim 14 wherein said eucaryotic cells are stably transformed cells that contain the TGF-β response element having the nucleotide sequence in SEQ ID NO 11, and wherein said cells correspond to cells on deposit with ATCC having the ATCC Accession Number CRL 11508.
27. The method in accordance with claim 14 wherein eucaryotic cells comprise transiently transformed cells that contain said plasmid comprising a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 7-10.
28. The method in accordance with claim 14 further comprising the step of: (e) determining the amount of said TGF-β present in said sample by comparing the measured amount of said indicator molecule obtained in step (d) against a reference curve.
29. The method in accordance with claim 28 wherein said reference curve represents a series of measured amounts of said indicator molecule produced from a series of known concentrations of TGF-β in said eucaryotic cells.
30. A plasmid vector in substantially pure form capable of causing expression of an indicator molecule in a eucaryotic cell, said plasmid including in the direction of transcription, a first nucleotide sequence comprising a regulatory region that includes at least one TGF-β inducible response element operatively linked to a promoter, a second nucleotide sequence comprising a structural region downstream of said promoter and coding for said indicator molecule, and a third nucleotide sequence comprising a gene encoding a selectable marker for the selection of a stably transformed cell, said response element being capable of inducing dose-dependent luciferase activity and said structural region coding for said luciferase.
31. The plasmid vector in accordance with claim 30 capable of expressing a chemiluminescent indicator molecule.
32. The plasmid vector in accordance with claim 30 wherein said plasmid comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 1-6.
33. The plasmid vector in accordance with claim 30 wherein said TGF-β inducible response element comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 11-17.
34. The plasmid vector in accordance with claim 30 wherein said promoter comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 18 and 19.
35. The plasmid vector in accordance with claim 30 wherein said gene comprises the nucleotide sequence in SEQ ID NO 20.
36. A plasmid vector in substantially pure form and capable of causing expression of luciferase in a eucaryotic cell, said plasmid comprising in the direction of transcription, a regulatory region that includes at least one TGF-β inducible response element that is operatively linked to a promoter, and a structural region downstream of said promoter for transcription therefrom and coding for said luciferase, said response element being capable of inducing dose-dependent luciferase activity and said structural region coding for said luciferase, and wherein said plasmid has the identifying characteristics of a plasmid selected from the group consisting of plasmid ATCC Accession Number 75627, plasmid ATCC Accession Number 74628 and plasmid ATCC Accession Number 75629.
37. A plasmid vector in substantially pure form and capable of causing expression of luciferase in a eucaryotic cell, said plasmid comprising in the direction of transcription, a regulatory region that includes at least one TGF-β inducible response element that is operatively linked to a promoter, and a structural region downstream of said promoter for transcription therefrom and coding for said luciferase, said response element being capable of inducing dose-dependent luciferase activity and said structural region coding for said luciferase, and wherein said plasmid comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 7-10.
38. A eucaryotic cell containing a plasmid vector having a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 1-10.
39. The eucaryotic cell in accordance with claim 38 wherein said cell is selected from the group consisting of mink lung epithelial cells, HeLa cells, Chinese hamster ovary cells, Hep3B cells, GM7373 cells and NIH 3T3 cells.
40. A kit useful in assaying the amount of TGF-β in a liquid sample comprising (a) packaging material; (b) eucaryotic cells contained within said packaging material, said cells capable of expressing an indicator molecule and containing a plasmid comprising, in the direction of transcription, a regulatory region that includes at least one TGF-β inducible response element that is operatively linked to a promoter, and a structural region downstream of said promoter, said response element being capable of inducing dose-dependent indicator molecule activity and said structural region coding for said indicator molecule; and (c) an aliquot of TGF-β contained within said packaging material, said TGF-β used for generating a reference curve representing a measured amount of the indicator molecule produced from a known concentration of TGF-β.
41. The kit in accordance with claim 40 wherein said eucaryotic cells are selected from the group consisting of mink lung epithelial cells, HeLa cells, Chinese Hamster Ovary cells, Hep3B cells, GM7373 cells and NIH 3T3 cells.
42. The kit in accordance with claim 40 wherein said plasmid comprises a nucleotide sequence that corresponds to a sequence selected from the group consisting of SEQ ID NOs 1-10.
43. The kit in accordance with claim 40 wherein said plasmid comprises a plasmid having the identifying characteristics of a plasmid selected from the group consisting of plasmid ATCC Accession Number 75627, plasmid ATCC Accession Number 74628 and plasmid ATCC Accession Number 75629.
44. The kit in accordance with claim 40 wherein said packaging material comprises a label indicating that said eucaryotic cells can be used for determining the amount of TGF-β in said liquid sample comprising the steps of (a) incubating said cells with said liquid sample; (b) measuring the amount of said indicator molecule produced thereby; and (c) comparing the amount of measured indicator molecule with said reference curve.
45. The kit in accordance with claim 40 wherein said eucaryotic cells are stably transformed cells that contain the TGF-β response element having the nucleotide sequence in SEQ ID NO 11, and wherein said cells correspond to cells on deposit with ATCC having the ATCC Accession Number CRL 11508.
46. The kit in accordance with claim 40 further comprising: (d) an anti-TGF-β antibody for use in a parallel control assay for determining the amount of indicator molecule produced other than by TGF-β induction.
Description:
A NEW SENSITIVE METHOD FOR QUANTIFYING ACTIVE TRANSFORMING GROWTH FACTOR-BETA AND COMPOSITIONS THEREFOR

Technical Field

The present invention relates to a sensitive assay method for quantifying the amount of active transforming growth factor beta (TGF-β) and vector compositions for use therein for expressing an indicator molecule in response to TGF-β activation of a TGF-β response element in the vector.

Background

Transforming growth factor beta, hereinafter referred to as TGF-β, is a 25 kilodalton (kD) homodimeric protein that belongs to a family of regulators of cell growth and differentiation that includes activins, inhibins, Mullerian inhibiting substance, the Drosophila decapentaplegic complex and bone morphogenic proteins. For review, see Massague, Ann. Rev. Cell Biol., 6:597-641 (1990); Roberts et al., in Peptide Growth Factors and Their Receptors, Sporn et al., Eds., Springer-Verlag, Berlin, 1:419-472 (1990); and Hoffman, Curr. Opin. Cell Biol., 3:947-952 (1991). TGF-β was initially defined by its ability to induce morphological transformation of fibroblastic cells in monolayer culture and stimulation of colony formation in soft agar. Delarco et al., Proc. Natl. Acad. Sci., USA, 75:4001-4005 (1978) and Todaro et al., Proc. Natl. Acad. Sci., USA, 77:5258-5262 (1980).

Three distinct molecular isoforms of TGF-β, the genes of which are located on different chromosomes, have been identified in mammals and are designated TGF-β1, TGF-β2 and TGF-β3. Derynck et al., Nature, 316:701-705 (1985); Hanks et al., Proc. Natl. Acad. Sci., USA, 85:71-72 (1988); and Madisen et al., DNA, 7:1-8 (1988). Each of the isoforms is first synthesized as a high molecular weight latent or inactive precursor polypeptide that is then processed to 12.5 kD monomers. Activation of the latent complex can occur through a variety of physicochemical or enzymatic treatments as well as in various tissue culture systems. For review, see Barnard et al., Biochim. Biophys. Acta, 1032:79-87 (1990). Two processed monomers then dimerize to form biologically active TGF-β.

The activation process must occur to allow binding of the dimerized TGF-β to the high affinity TGF-β receptors expressed on the surfaces of all normal cells and almost all neoplastic cells. Tucker et al., Proc. Natl. Acad. Sci., USA, 81:6757-6761 (1984); Frolik et al., J. Biol. Chem., 259:10995-11000 (1984); Pircher et al., Biochem. Biophys. Res. Commun., 136:30-37 (1986).

Although some TGF-β activation systems generate the mature TGF-β in nanogram quantities, the majority liberate picogram amounts. These low concentrations, however, are sufficient to induce a variety of biological responses such as macrophage chemotaxis (Wahl et al., Proc. Natl. Acad. Sci., USA, 84:5788-5792 (1987)), inhibition of endothelial cell migration and proliferation (Heimark et al., Science, 233:1078-1080 (1986)), stimulation of extracellular matrix deposition (Ignotz et al., J. Biol. Chem., 261:4337-4345 (1986)) and decreased plasminogen activator (PA) activity as a result of decreased PA production (Laiho et al., J. Cell. Biol., 103:2403-2410 (1986) and Flaumenhaft et al., J. Cell. Physiol., 152:48-55 (1992)) along with increased secretion of its inhibitor, plasminogen activator inhibitor-1 (PAI-1) (Laiho et al., J. Biol. Chem., 262:17467-17474 (1987)).

PAI-1 is the primary inhibitor of both tissue-type plasminogen activator (t-PA) and urokinase-type plasminogen activator (u-PA), and as such is a potent anti-fibrinolytic molecule. PAI-1 synthesis by cultured cells in vitro is induced by a variety of molecules including cytokines, growth factors, hormones, and other agents such as endotoxin and phorbol myristate acetate. Nuclear transcription run-on assays demonstrate that the regulation of PAI-1 by many of these agents, including TGF-β, occurs primarily at the level of transcription.

TGF-β released from platelets may be an important negative regulator of the fibrinolytic system of the vessel wall since the TGF-β in releasates of thrombin-activated platelets causes large increases in PAI-1 synthesis by endothelial cells. This increased PAI-1 synthesis may account for the resistance of platelet-rich thrombi to thrombolytic therapy. The accumulation of PAI-1 in the extracellular matrix in response to TGF-β protects matrix proteins from proteolytic degradation. Thus, the induction of PAI-1 by TGF-β may also play a role in both wound healing and fibrotic responses.

These and other biological effects of TGF-β activity have been used to develop a variety of semiquantitative and quantitative bioassays including those based on chondrogenesis, inhibition of DNA synthesis and cell growth, differentiation, migration or PA activity. Differentiation-based assays include the induction of cartilage specific proteoglycan expression (ED50 = 5 ng/ml; 200 pM) (Ogawa et al., in Peptide Growth Factors, Barnes et al., Eds., Academic Press Inc., 198:317-327 (1991); Seyedin et al., Proc. Natl. Acad. Sci., USA, 82:2267-2271 (1985)) and inhibition of rat L6 myoblast differentiation (ED50 = 0.2 ng/ml; 8 pM) (Florini et al., J. Biol. Chem., 261:16509-16513 (1986)). An ED50 represents the amount of factor required to produce a half-maximal effect, activation or inhibition, on differentiation of target cells. The abbreviations ng/ml, pg/ml, nM and pM respectively stand for nanograms/milliliter, picograms/milliliter, nanomolar and picomolar. These assays are utilized primarily for studying differentiation rather than for quantification of TGF-β.
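The paired mass and molar values quoted in this section (for example, 5 ng/ml alongside 200 pM) follow from the 25 kD mass given above for the TGF-β homodimer. The following is a minimal Python sketch of that conversion; the function and constant names are hypothetical and are introduced only for illustration.

```python
# Assumption: the mature TGF-beta homodimer has a molecular mass of 25 kD
# (25,000 g/mol), as stated in the Background. All names are illustrative.
TGF_BETA_DIMER_G_PER_MOL = 25_000.0


def ng_per_ml_to_pM(conc_ng_per_ml, mass_g_per_mol=TGF_BETA_DIMER_G_PER_MOL):
    """Convert a mass concentration (ng/ml) to a molar concentration (pM).

    1 ng/ml equals 1e-6 g/L; dividing by g/mol gives mol/L, and
    multiplying by 1e12 gives pM, hence the combined factor of 1e6.
    """
    return conc_ng_per_ml * 1e6 / mass_g_per_mol


def pM_to_ng_per_ml(conc_pM, mass_g_per_mol=TGF_BETA_DIMER_G_PER_MOL):
    """Convert a molar concentration (pM) back to ng/ml."""
    return conc_pM * mass_g_per_mol / 1e6


if __name__ == "__main__":
    for ng_ml in (5.0, 0.2, 0.01):
        print(f"{ng_ml} ng/ml is about {ng_per_ml_to_pM(ng_ml):.3g} pM")
```

Under these assumptions, `ng_per_ml_to_pM(5.0)` evaluates to 200.0, consistent with the ED50 pair quoted above for cartilage specific proteoglycan induction.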

Assays based on TGF-β's ability to inhibit DNA synthesis and cell growth in mink lung epithelial cells (MLE cells) (ED50 = 10-20 pg/ml; 0.4-0.8 pM) (Lucas et al., in Peptide Growth Factors, Barnes et al., Eds., Academic Press Inc., 198:303-316 (1991) and Danielpour et al., J. Cell. Physiol., 138:79-86 (1989)), African green monkey kidney epithelial cells (ED50 = 1 ng/ml; 40 pM) (Holley et al., Proc. Natl. Acad. Sci., USA, 77:5989-5992 (1980)), rat hepatocytes (ED50 = 0.4 ng/ml; 16 pM) (Nakamura et al., Biochem. Biophys. Res. Commun., 133:1042-1050 (1985)), and fetal bovine heart endothelial cells (ED50 = 75-125 pg/ml; 3-5 pM) (Qian et al., Proc. Natl. Acad. Sci., USA, 89:6290-6294 (1992)) are sensitive but can be affected by a variety of molecules such as insulin, EGF, PDGF, and bFGF.

Migration and plasminogen activator (PA) activity assays have also been described. The migration of bovine aortic endothelial cells (BAEs) into a denuded area of a monolayer is inhibited by TGF-β (ED50 = 2 ng/ml; 80 pM: sensitivity 10-20 pg/ml; 0.4-0.8 pM) (Sato et al., J. Cell Biol., 107:1199-1205 (1988); Sato et al., J. Cell Biol., 109:309-315 (1989); and Sato et al., J. Cell Biol., 111:757-763 (1990)). Migration of BAEs, however, can be simultaneously stimulated by endogenously or exogenously supplied bFGF that can abrogate TGF-β's inhibitory effect (Sato et al., J. Cell Biol., 107:1199-1205 (1988)). The PA assay for measurement of TGF-β concentration is very sensitive and rapid (Flaumenhaft et al., J. Cell. Physiol., 152:48-55 (1992)). The assay is based on the ability of TGF-β to decrease PA activity of BAEs by inhibiting PA synthesis and secretion and by inducing expression of its inhibitor, PAI-1. This assay, however, is also sensitive to other molecules, such as bFGF, that can alter PA activity (Flaumenhaft et al., J. Cell. Physiol., 152:48-55 (1992) and Sato et al., J. Cell Biol., 107:1199-1205 (1988)). The ED50 of the assay varies from 1 to 35 pg/ml (0.04-1.4 pM) of TGF-β depending on differences in basal PA levels and sensitivity to TGF-β among primary BAE cultures.

The ability of TGF-β to stimulate PAI-1 expression has recently been used to study TGF-β receptors. Wrana et al., Cell, 71:1003-1014 (1992) transiently transfected a PAI-1 luciferase construct together with a human type II TGF-β receptor expression vector into TGF-β resistant MLE cells. This luciferase construct contained a short, synthetic TGF-β response element based on the human PAI-1 promoter and was used to report functional expression of the receptor. Although only used to screen transfected mutant cell lines, this construct appeared to be less sensitive to TGF-β than the constructs of this invention when transiently transfected into MLE cells, and no information was reported regarding its dose-responsiveness or specificity.

In another study of the TGF-β stimulation of PAI-1 expression, Riccio et al., Mol. Cell. Biol., 12:1846-1855 (1992), transiently transfected TGF-β responsive cells with constructs containing varying regions of the 5'-flanking domain of the human PAI-1 gene to determine the transcription regulatory mechanism used by TGF-β. All the constructs contained the gene encoding the enzyme chloramphenicol acetyltransferase to provide for an indirect determination of the transcriptional effect of the various constructs. With this approach, a 67 base pair region was identified that contained binding sites for two proteins, the CCAAT-binding transcription factor-nuclear factor I family and USF. Both sites were necessary to obtain TGF-β induction. The constructs, however, were not utilized in assays to determine dose-responsiveness or to measure the amount of TGF-β in a sample.

The most specific assays for TGF-β are the radioreceptor assay, radioimmunoassay (RIA), and enzyme-linked immunosorbent assay (ELISA). Radioreceptor assays using a variety of cell types, such as A549 human lung carcinomas and murine AKR-2B, have been described and have ranges of 125 pg/ml to 25 ng/ml (5 pM-1 nM) with ED50 of approximately 0.5 ng/ml (20 pM). See Wakefield et al., J. Cell. Biol., 105:965-975 (1987); Sato et al., J. Cell Biol., 111:757-763 (1990); Lucas et al., in Peptide Growth Factors, Barnes et al., Eds., Academic Press Inc., 198:303-316 (1991) and O'Connor-McCourt et al., J. Biol. Chem., 262:14090-14099 (1987). RIAs specific for TGF-β1 and β2 have ED50s of 12 and 37 pM, respectively (Danielpour et al., J. Cell Physiol., 138:79-86 (1989)). Others, using different antibodies, describe the range of TGF-β1 specific RIAs to be 6.25-200 ng/ml (0.25-8 nM), with a sensitivity of 2.4 ng/ml (0.1 nM) (Lucas et al., in Peptide Growth Factors, Barnes et al., Eds., Academic Press Inc., 198:303-316 (1991)). As demonstrated by the differences in these results, the affinities of the antibodies can greatly alter the sensitivity of the assay.

Isoform-specific double antibody or sandwich ELISAs (SELISA) are also very sensitive to the affinities of the antibodies. One such assay, using two different monoclonal antibodies specific for TGF-β1, had a useful range of 0.63 to 40 ng/ml (0.025-1.6 nM) (Lucas et al., in Peptide Growth Factors, Barnes et al., Eds., Academic Press Inc., 198:303-316 (1991)). Using a combination of isoform-specific turkey and rabbit antibodies, Danielpour et al., J. Cell Physiol., 138:79-86 (1989) created a SELISA with detection limits of 2-5 pg/well (20-50 pg/ml; 0.8-2 pM). Although highly sensitive and specific, SELISAs such as these are not readily available and are expensive.

Although all of these other TGF-β assays can detect mature TGF-β, the low concentrations (<2 pM) generated in various biological systems make many of them impractical without prior concentration of the sample. Concentration can result in large losses of the mature growth factor or, more importantly, in activation of latent TGF-β. Moreover, many of the assays are complicated to establish and can be influenced by other factors present in the samples, thus reducing their utility for accurately measuring the amount of TGF-β in the sample. For this reason, a need exists for a relatively simple, sensitive and nonconfounding assay for TGF-β.

Brief Description of the Invention

A highly sensitive and specific, non-radioactive assay for mature (active) TGF-β has now been developed. When compared to the sensitive and widely used proliferation-based MLEC method for measuring TGF-β concentration, the TGF-β assay method of this invention is more rapid, has comparable sensitivity, and has a greater detection range. Specificity of this novel assay was also higher, as evidenced by its relative insensitivity to factors such as EGF and bFGF which can greatly affect other assays. Because it uses a truncated PAI-1 promoter that does not respond to other growth modulators such as PDGF found in biological samples, the method of this invention can be used in conditions where other bioassays are difficult to interpret. Because of its large range and specificity, the rapid, sensitive, non-radioactive, easily performed assay method of this invention is useful in determining active TGF-β concentrations in complex solutions. Thus, the present invention overcomes the limitations of existing methods used to quantify the amount of TGF-β in a liquid sample.

This invention contemplates a method for quantifying the amount of TGF-β in a sample using a system comprising a TGF-β responsive cell containing an expression vector having a regulatory region comprising a TGF-β response element operatively linked to a promoter and having a structural region encoding an indicator molecule. Following TGF-β induced activation of the TGF-β response element, transcription results in the expression of an indicator molecule, the amount of which allows for the measurement of the amount of TGF-β responsible for the induced activation. In particular, one embodiment of the invention contemplates a method for quantifying the amount of TGF-β in a liquid sample, which method comprises:

(a) incubating the liquid sample together with eucaryotic cells that contain a TGF-β responsive expression vector having a gene encoding luciferase for a predetermined time period sufficient for the eucaryotic cells to express a detectable amount of the luciferase;

(b) measuring the amount of the luciferase expressed during the time period; and

(c) determining the amount of TGF-β present in the sample by comparing the measured amount of the luciferase against a reference curve.

The invention further contemplates that the reference curve represents a quantitative relationship derived from a series of measured amounts of luciferase produced from a series of known concentrations of TGF-β.
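The comparison against a reference curve in step (c) can be illustrated with a short sketch. The Python below assumes, purely for illustration, that the reference curve is a set of (concentration, luciferase signal) pairs in which the signal rises with concentration, and that readings falling between two standards are interpolated linearly against log concentration. Neither the interpolation scheme nor any of the names or numbers used is taken from the specification.

```python
import math


def interpolate_concentration(signal, standards):
    """Estimate a TGF-beta concentration from a luciferase reading.

    `standards` is a list of (concentration, signal) pairs measured from
    known TGF-beta concentrations. Interpolation is linear in
    log10(concentration) between the two bracketing standards; readings
    outside the curve are rejected rather than extrapolated.
    """
    points = sorted(standards)  # ascending concentration
    if len(points) < 2:
        raise ValueError("at least two standards are required")
    if any(conc <= 0 for conc, _ in points):
        raise ValueError("concentrations must be positive")
    signals = [sig for _, sig in points]
    if any(hi <= lo for lo, hi in zip(signals, signals[1:])):
        raise ValueError("signal must increase with concentration")
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("reading lies outside the reference curve")

    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= signal <= s_hi:
            fraction = (signal - s_lo) / (s_hi - s_lo)
            log_c = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_c


if __name__ == "__main__":
    # Hypothetical standards: (pM of TGF-beta, relative light units).
    curve = [(1.0, 100.0), (10.0, 200.0), (100.0, 300.0)]
    print(interpolate_concentration(150.0, curve))
```

Rejecting readings outside the standards is a deliberate choice in this sketch: such a sample would be diluted or the curve extended before a value is reported.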

Another embodiment of the invention contemplates a method for quantifying the amount of transforming growth factor-β (TGF-β) in a liquid sample comprising:

(a) providing, in eucaryotic cells capable of expressing an indicator molecule, a plasmid comprising, in the direction of transcription, a regulatory region that includes at least one TGF-β inducible response element that is operatively linked to a promoter, and a structural region downstream of the promoter, where the response element is capable of inducing dose-dependent indicator molecule activity and where the structural region codes for the indicator molecule;

(b) incubating the liquid sample with the eucaryotic cells for a predetermined time period sufficient for the eucaryotic cells to express a detectable amount of the indicator molecule;

(c) measuring the amount of the indicator molecule expressed during the time period; and

(d) comparing the measured amount of the indicator molecule produced in step (c) with the amount of indicator molecule produced in a control assay performed according to steps (a) through (c) by treating the liquid sample with an anti-TGF-β antibody to obtain a net measured amount of the indicator molecule induced by TGF-β.
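The control comparison in step (d) amounts to subtracting the signal obtained from the antibody-treated sample from the signal obtained from the untreated sample. A minimal Python sketch follows, assuming that replicate luciferase readings are averaged and that a negative difference is reported as zero; both choices, and all names, are illustrative assumptions rather than details from the specification.

```python
from statistics import mean


def net_tgf_beta_signal(sample_readings, antibody_control_readings):
    """Return the luciferase signal attributable to TGF-beta.

    `sample_readings` are replicate readings from cells incubated with the
    untreated liquid sample; `antibody_control_readings` are replicate
    readings from the parallel assay in which the sample was treated with
    an anti-TGF-beta antibody. The difference of the means is the net
    signal, floored at zero (an assumption of this sketch).
    """
    net = mean(sample_readings) - mean(antibody_control_readings)
    return max(net, 0.0)


if __name__ == "__main__":
    # Hypothetical relative light unit readings.
    untreated = [1200.0, 1300.0]
    with_antibody = [200.0, 300.0]
    print(net_tgf_beta_signal(untreated, with_antibody))
```

The net value, rather than the raw reading, is the quantity that would then be compared against a reference curve to obtain a TGF-β concentration.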

Contemplated for use with the methods of this invention are plasmids having identifying characteristics of plasmids on deposit with ATCC having the ATCC Accession Numbers 75627, 75628 and 75629. Also contemplated are stably transformed eucaryotic cells that contain the TGF-β response element having the nucleotide sequence in SEQ ID NO 11 where the cells correspond to cells on deposit with ATCC having the ATCC Accession Number CRL 11508. The invention describes plasmids for use in the methods that comprise a nucleotide sequence corresponding to nucleotide sequences listed in SEQ ID NOs 1-10. TGF-β inducible response elements that comprise a nucleotide sequence corresponding to nucleotide sequences listed in SEQ ID NOs 11-17 are also described. Contemplated promoter nucleotide sequences are listed in SEQ ID NOs 18 and 19.

A further embodiment of the methods of the invention uses eucaryotic cells that are stably transformed cells containing a plasmid having a gene encoding a selectable marker for the selection of said stably transformed cells. The invention describes such plasmids having nucleotide sequences listed in SEQ ID NOs 1-6. The invention further describes a stably transformed eucaryotic cell on deposit with ATCC having ATCC Accession Number CRL 11508 containing the TGF-β response element having the nucleotide sequence in SEQ ID NO 11.

An additional embodiment uses eucaryotic cells that are transiently transformed with plasmids corresponding to the nucleotide sequences listed in SEQ ID NOs 7-10.

The invention describes quantifying the amount of TGF-β in a body fluid, in culture medium, and in a tissue extract. A further preferred embodiment is the determination of the amount of a specific isoform of TGF-β, specifically TGF-β1, TGF-β2 or TGF-β3, in a liquid sample.

In a preferred embodiment, this invention describes the use of mammalian cells. Preferred mammalian cells include mink lung epithelial cells, HeLa cells, Chinese hamster ovary cells, Hep3B cells, GM7373 cells, and NIH 3T3 cells.

A preferred indicator molecule also described for use with the methods of this invention is a chemiluminescent molecule, preferably luciferase.

The invention describes a composition of a plasmid vector capable of causing expression of an indicator molecule in a eucaryotic cell, where the plasmid contains nucleotide sequences comprising a regulatory region that includes at least one TGF-β inducible response element operatively linked to a promoter, a structural region downstream of said promoter and coding for said indicator molecule, and a gene encoding a selectable marker for the selection of a stably transformed cell, where the response element is capable of inducing dose-dependent luciferase activity.

In preferred embodiments, plasmids with selectable marker genes have the nucleotide sequences corresponding to SEQ ID NOs 1-6. Preferred TGF-β inducible response elements for use in the expression vectors of this invention have the nucleotide sequences corresponding to SEQ ID NOs 11-17.

A further preferred embodiment of the expression vectors of this invention is the use of the neomycin gene for selecting stable transformants, the nucleotide sequence of which is listed in SEQ ID NO 20. The invention further describes plasmids lacking a selectable marker gene having the identifying characteristics of plasmids having ATCC Accession Numbers 75627, 75628 and 75629, corresponding to SEQ ID NOs 8-10, respectively.

The invention describes a eucaryotic cell containing a plasmid having a nucleotide sequence listed in SEQ ID NOs 1-10.

Also described are kits useful in assaying the amount of TGF-β in a liquid sample, comprising (a) packaging material; (b) eucaryotic cells capable of expressing an indicator molecule and containing a plasmid of this invention; and an aliquot of TGF-β, where the latter is used for generating a reference curve.

Other embodiments will be apparent to one skilled in the art.

Brief Description of the Drawings

Figure 1 shows the structure and construction of the p800neoLuc expression vector. p800Luc was digested with Acc I and blunt-ended. pMAMneo was then digested with Sal I and Eco RI, blunt-ended, and the fragment containing the neomycin-resistance gene (neo^r) was ligated to the linearized p800Luc to form p800neoLuc. Clones were analyzed via restriction enzyme mapping and one clone with the proper insert was selected. (MCS, multiple cloning site; PA1, 2, 3, polyadenylation regions 1, 2, and 3). The details of the construction are described in Example 1A.

Figure 2A, having an inset (Figure 2B), shows the dose-dependent induction of the plasminogen activator inhibitor-1/luciferase (PAI/L) construct in the p800neoLuc expression vector in stably transformed MLE cells by TGF-β1, TGF-β2, and TGF-β3. The TGF-β assay was performed as described in Example 3 with DMEM-BSA containing the indicated picomolar (pM) concentrations of recombinant (r) TGF-β1 (closed squares), TGF-β2 (closed circles), or TGF-β3 (closed triangles) on the X-axis. The amount of expressed luciferase detected by a luminometer is plotted on the Y-axis and is expressed in relative light units (RLU). The results shown in Figures 2A, 2B and 2C are described in Example 3B. Figure 2B shows the treatment of p800neoLuc-transformed MLE cells with all three TGF-β isoforms in a TGF-β assay that resulted in a linear dose-response over the range of 0 to 4 pM of TGF-β. In Figure 2C, the TGF-β assay was performed with 8 pM rTGF-β1, TGF-β2 or TGF-β3 in DMEM-BSA in the presence (cross-hatched bars) or absence (open bars) of 100 μg/ml of anti-TGF-β, TGF-β2 and TGF-β3 monoclonal antibody. Baseline induction is indicated by medium alone (filled bars).

Figures 3A, 3B, 3C and 3D show the effects of medium, cell density and incubation time on sensitivity of the TGF-β assay as described in Example 3B, with the amount of TGF-β1 plotted on the X-axis in pM against the measured RLU on the Y-axis. In Figure 3A, the assay was performed with increasing rTGF-β1 concentrations in DMEM (closed squares), alpha-MEM (closed circles), CMEM (closed triangles: Eagle's MEM supplemented with non-essential amino acids) or RPMI-1640 (closed diamonds: Bio-Whittaker). All media contained 0.1% BSA. In Figure 3B, increasing concentrations of rTGF-β1 in DMEM, 0.1% BSA were measured using 3.2 x 10^4 (closed squares), 1.6 x 10^4 (closed circles), or 0.8 x 10^4 (closed triangles) clone 32 (C32) mink lung epithelial cells/well (MLE cells) after a three hour attachment period. Samples were incubated with the cells for 14 hours prior to assaying for luciferase activity. In Figures 3C and 3D (an inset in Figure 3C), 1.6 x 10^4 C32 cells were allowed to attach for 3 hours prior to addition of the indicated concentrations of rTGF-β1. The samples were incubated for 6 (closed squares), 14 (closed circles), or 22 (closed triangles) hours prior to assaying for luciferase activity. The results are described in Example 3B.

Figures 4A and 4B show the effects of growth factors on the TGF-β assay and MLEC assay while Figure 4C shows the effects caused by serum. For all figures, either the growth factors or TGF-β are plotted on the X-axis against the RLU on the Y-axis. In Figure 4A, the TGF-β assays were performed with DMEM-BSA containing the indicated concentrations of rTGF-β1 (closed squares), recombinant human bFGF (closed circles), recombinant IL-1alpha (closed triangles), recombinant PDGF-BB (closed diamonds), or EGF (open squares). In Figure 4B, TGF-β assays were performed with DMEM-BSA containing 1 pM rTGF-β1 (closed squares) and the indicated concentrations of recombinant human bFGF (closed circles), recombinant IL-1alpha (closed triangles), recombinant PDGF (closed triangles), or EGF (open squares). The assays and results are described in Example 3C. In Figure 4C, TGF-β assays were performed with DMEM-BSA containing the indicated concentrations of rTGF-β1 alone (closed squares) or with 0.5% (closed circles), 1% (closed triangles), or 2% (closed diamonds) calf serum. The assays and results are described in Example 3D.

Figure 5 shows the comparison of conditioned media (CMs) assayed by the TGF-β (shown as the PAI/L assay) and MLEC assays. DMEM-BSA (closed squares), COS (X-marked lines), BSM (closed triangles) or BAE (closed circles) cell conditioned medium (CM) with the indicated concentrations of rTGF-β1 were assayed by the PAI/L (TGF-β) assay (broken line), as measured by RLU on the right-hand Y-axis, and by the MLEC assay (unbroken line), as measured by tritiated thymidine (3H-thymidine) incorporation as a percent of controls, as described in Example 3E. The data points were normalized to DMEM-BSA.

Figure 6 shows the effects of growth factors on DNA synthesis as measured by 3H-thymidine incorporation as a percent of control. In the graph, DMEM-BSA containing rTGF-β1 (closed squares), TGF-β2 (closed circles), TGF-β3 (closed triangles), recombinant human bFGF (closed diamonds), recombinant IL-1alpha (open squares), EGF (open circles), or recombinant PDGF-BB (open triangles) were separately assayed using the MLEC assay as described in Example 3C.

Detailed Description of the Invention

A. Definitions

Recombinant DNA (rDNA) Molecule: A DNA molecule produced by operatively linking two DNA segments. Thus, a recombinant DNA molecule is a hybrid DNA molecule comprising at least two nucleotide sequences not normally found together in nature. rDNAs not having a common biological origin, i.e., evolutionarily different, are said to be "heterologous".

Vector: A rDNA molecule capable of autonomous replication in a cell and to which a DNA segment, e.g., gene or polynucleotide, can be operatively linked so as to bring about replication of the attached segment. Vectors capable of directing the expression of genes encoding one or more polypeptides are referred to herein as "expression vectors".

Upstream: In the direction opposite to the direction of DNA transcription, and therefore going from 5' to 3' on the non-coding strand, or 3' to 5' on the mRNA.

Downstream: Further along a DNA sequence in the direction of sequence transcription or read out, that is, traveling in a 3'- to 5'-direction along the non-coding strand of the DNA or 5'- to 3'-direction along the RNA transcript.

Reading Frame: Particular sequence of contiguous nucleotide triplets (codons) employed in translation that define the structural protein encoding-portion of a gene, or structural gene. The reading frame depends on the location of the translation initiation codon.
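The dependence of the reading frame on the initiation codon can be illustrated with a short sketch. It assumes a sense strand written 5' to 3' in the DNA alphabet, takes the first ATG as the initiation codon, and stops at the first in-frame stop codon. The function name and the example sequence are hypothetical and do not correspond to any sequence in the Sequence Listing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_in_frame(sense_strand):
    """Split a 5'->3' sense strand into codons, starting at the first ATG.

    Returns an empty list when no initiation codon is found. Reading stops
    after the first in-frame stop codon or when fewer than three bases remain.
    """
    seq = sense_strand.upper()
    start = seq.find("ATG")
    if start == -1:
        return []
    codons = []
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


# Hypothetical sequence, for illustration only.
print(codons_in_frame("CCATGGAAGACTAAGG"))
```

Moving the ATG by one or two bases relative to the downstream sequence changes every triplet that follows, which is why the frame is fixed by the initiation codon.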

Response Element: Also referred to as an enhancer element, is a short DNA sequence that occurs further upstream than the upstream promoter element. Response elements contain specific nucleotide sequences recognized by transcription factors that are DNA-binding proteins.

Promoter: A region on a DNA molecule, generally from 100 to 200 base pairs long, upstream from the coding sequence; an area to which the RNA polymerase initially binds prior to the initiation of transcription. The nucleotide sequence of the promoter, or at least part of it, determines the nature of the polymerase that associates with it. Certain consensus sequences, CAT and TATA boxes, within the promoter region are important for binding of RNA polymerase.

Regulatory Region: A DNA control module upstream from the coding sequence containing an upstream promoter element and response elements, the latter of which are also referred to as enhancer elements.

Growth Factor: A small protein that binds to a receptor for controlling cell proliferation.

Receptor: A molecule, such as a protein, glycoprotein and the like, that can specifically (non-randomly) bind to another molecule. Receptors of one type are plasma membrane proteins that bind specific molecules including growth factors, hormones, or neurotransmitters, resulting in the transmission of a signal to the cell's interior causing the cell to respond in a specific manner.

Sense Strand: A nucleotide sequence referred to as a sense strand of a double-stranded deoxyribonucleic acid sequence is the nucleotide sequence that, when read in the 5' to 3' direction by the genetic code, defines an amino acid sequence of interest. Alternatively, the sense strand is referred to as a coding strand.

B. Transforming Growth Factor-β (TGF-β)

Transforming growth factor-β, hereinafter referred to as TGF-β, is a growth inhibitor that exhibits a diversity of biological activities in addition to its effects on cellular proliferation. TGF-β belongs to a large family of related molecules with a wide range of regulatory activities as described in the Background. For review, see Barnard et al., Biochim. Biophys. Acta, 1032:79-87 (1990), the disclosure of which is hereby incorporated by reference.

As previously discussed, TGF-β is produced and secreted from cells. Three distinct molecular isoforms of TGF-β, the genes of which are located on different chromosomes, have been identified in mammals and are designated TGF-β1, TGF-β2 and TGF-β3. Derynck et al., Nature, 316:701-705 (1985); Hanks et al., Proc. Natl. Acad. Sci. USA, 85:71-72 (1988); and Madisen et al., DNA, 7:1-8 (1988). Each of the isoforms is synthesized as a high molecular weight latent or inactive precursor polypeptide that is then processed to 12.5 kD monomers that then dimerize to form biologically active, also referred to as mature, TGF-β.

The activation process must occur to allow binding of the dimerized TGF-β to the high affinity TGF-β receptors expressed on the surfaces of all normal cells and almost all neoplastic cells. Tucker et al., Proc. Natl. Acad. Sci. USA, 81:6757-6761 (1984); Frolik et al., J. Biol. Chem., 259:10995-11000 (1984); Pircher et al., Biochem. Biophys. Res. Commun., 136:30-37 (1986).

TGF-β has been shown to induce increased secretion of the inhibitor plasminogen activator inhibitor-1 (PAI-1) (Laiho et al., J. Biol. Chem., 262:17467-17474 (1987)). PAI-1 is the primary inhibitor of both tissue-type plasminogen activator (t-PA) and urokinase-type plasminogen activator (u-PA), and as such is a potent anti-fibrinolytic molecule. As a consequence of PAI-1 induction by TGF-β, the activity of plasminogen activator (PA) is decreased. The cascade of activation of plasminogen to plasmin, and the subsequent degradation of fibrin, is thereby inhibited.

While induction of PAI-1 synthesis by TGF-β has been shown to occur primarily at the level of transcription following the TGF-β receptor-ligand interaction, the mechanism of activation of the PAI-1 promoter resulting in the transcription of the PAI-1 gene is less well understood. Studies of PAI-1 gene transcription have shown that the signal transduction mechanisms are independent of de novo protein synthesis, as determined by the lack of inhibition by cycloheximide and the rapid onset of induction, as described by Sawdey et al., J. Biol. Chem., 264:10396-10401 (1989), the disclosure of which is hereby incorporated by reference. The TGF-β-induced enhancement of promoter activity for the alpha 2 collagen gene has been shown to be mediated by a binding site for nuclear factor I as described by Sporn et al., J. Cell Biol., 105:1039-1045 (1987).

As shown in Example 4, the PAI-1 promoter contains AP-1-like nucleotide sequences which are bound by the AP-1 heterodimeric transcription factor complex of Fos and Jun protein subunits. Although AP-1-like DNA enhancer sites are present in PAI-1, as shown in Example 4, activation of these sites by the AP-1 heterodimeric complex was independent of the TGF-β-mediated induction of PAI-1 synthesis. Although neither the exact transcriptional mechanism of PAI-1 promoter activation following TGF-β receptor-ligand interaction nor the identity of the responsible TGF-β-related transcription factor is known, the activation of a TGF-β response element of this invention following TGF-β occupancy of the TGF-β receptor will be referred to as TGF-β-induced activation. Since the TGF-β response element is activated by TGF-β, resulting in the induction of indicator protein expression, the TGF-β response element is also referred to as a TGF-β inducible response element.

C. TGF-β Response Elements

The present invention is based on the discovery that when eucaryotic cells, transformed with a TGF-β-responsive expression vector of this invention, were exposed to liquid samples of TGF-β, the resulting expression of an indicator molecule was dose-dependent in relationship to the amount of TGF-β present in the sample. Thus, the present invention provides for a method to quantify the amount of TGF-β in a liquid sample by measuring the amount of indicator molecules expressed.

The induced expression of the indicator molecules was the result of activation of TGF-β response elements present in the regulatory region of the TGF-β responsive expression vectors, the latter of which are described in Section D. In practicing this invention, the regulation of transcription in the TGF-β responsive expression vector-transformed eucaryotic cells is dependent on TGF-β. As described above, the TGF-β occupation of the TGF-β receptor expressed on the surface of cells results in the activation of a TGF-β-related transcription factor. In general, transcription factors are site-specific DNA-binding proteins. Typically, positioned 5' to a structural gene is a region of nucleotide sequences that is responsible for controlling transcription. This region has been coined the "control module".

The control module comprises two categories of regulatory sequences, the promoter element and the enhancer elements. The promoter is referred to as an upstream promoter as it lies upstream of the structural genes. Promoter elements are usually 100 to 200 base pairs long and the segment of DNA is relatively close to the site of initiation of transcription. A particular sequence recognized by one of several transcription factors that are known to bind to the promoter region is the TATA box, a region that is rich in A-T base pairs. The enhancer regions are also referred to as response regions or response elements. Thus the term "TGF-β response element" can also be designated "TGF-β enhancer", "TGF-β enhancer region", or "TGF-β response region", and the like. The enhancer region is hereinafter referred to as a response element. Response elements are short DNA segments that occur further upstream from the initiator site than the upstream promoter element. Response elements contain specific sequences that are recognized by transcription factors. The response elements are often a few thousand base pairs 5' to the promoter but may even be 20,000 base pairs or more distant.

The binding of a transcription factor to a nucleotide sequence comprising either a response element or a promoter resembles an "on switch". In the context of the present invention, the binding of the TGF-β-related transcription factor results in the dose-dependent activation of the promoter, resulting in the transcription of a structural region gene from DNA into RNA. In most cases, the resulting RNA molecule serves as a template for synthesis of a specific molecule, such as the indicator molecule of this invention. Thus, "activation" of a TGF-β response element refers to a process whereby the functional state of the TGF-β response element is altered. The result of the TGF-β activation of the TGF-β response element is an increase in the transcriptional efficiency of the structural gene driven from the promoter. A further feature of a TGF-β response element is that it is inducible. The term "inducible" refers to an enhancement of a particular function. In this invention, the functional activity of a TGF-β response element is increased or induced following activation by the TGF-β-related transcription factor. Thus, the TGF-β response element is also referred to as a TGF-β inducible response element.

The result of TGF-β response element activation is the coordinate transcription and translation of the structural region containing a gene encoding an indicator protein of this invention as described in Section D. The resulting expression of an indicator molecule is dose-dependent in relationship to the amount of TGF-β present in the sample.

The term "dose-dependent" refers to the functional relationship between the amount of TGF-β activating the TGF-β response element and the resulting expression of the indicator molecule. Thus, the functional relationship between TGF-β activation and expression of an indicator molecule can be referred to as a linear relationship. Because of the dose-dependent expression of an indicator molecule, such as luciferase, in response to TGF-β exposure, the amount of TGF-β responsible for the activation of the expression can be readily determined using the methods of this invention.
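Where the relationship is linear, an unknown amount of TGF-β can be read from a straight line fitted to standards of known concentration. The following is a minimal sketch of that calculation using an ordinary least-squares fit. The function names and the standard readings are hypothetical, and the 0 to 4 pM range is used only because Figure 2B is described as linear over that range.

```python
def fit_line(doses_pm, rlu):
    """Ordinary least-squares fit of RLU = slope * dose + intercept."""
    if len(doses_pm) != len(rlu) or len(doses_pm) < 2:
        raise ValueError("need at least two paired standards")
    n = len(doses_pm)
    mean_x = sum(doses_pm) / n
    mean_y = sum(rlu) / n
    sxx = sum((x - mean_x) ** 2 for x in doses_pm)
    if sxx == 0:
        raise ValueError("standards must span more than one dose")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses_pm, rlu))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_dose(rlu_reading, slope, intercept):
    """Invert the fitted line to estimate the TGF-beta concentration in pM."""
    if slope == 0:
        raise ValueError("a flat reference line cannot be inverted")
    return (rlu_reading - intercept) / slope


# Hypothetical standards, for illustration only.
slope, intercept = fit_line([0, 1, 2, 3, 4], [100, 300, 500, 700, 900])
print(estimate_dose(600, slope, intercept))
```

Readings that fall outside the range spanned by the standards should not be extrapolated with this line, since linearity is only asserted within that range.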

Thus, based on the teachings herein, a TGF-β response element nucleotide sequence is characterized by its ability to be responsive to TGF-β-induced activation. Such a TGF-β response element is useful herein as a component in the expression vectors of this invention to provide for the ability to quantify the amount of TGF-β responsible for the transcriptional activation. Thus, a TGF-β response element of this invention comprises any nucleotide sequence that is activated by TGF-β, the process of which is as described in Section B.

In the context of this invention, the term nucleotide sequence refers to a plurality of joined nucleotide units formed from naturally- or non-naturally occurring bases and cyclofuranosyl groups joined by phosphodiester bonds. Thus, the nucleotide sequence includes the use of nucleotide analogs.

One embodiment of a TGF-β response element of this invention is an isolated double-stranded deoxyribonucleic acid molecule comprising a sequence of nucleotide bases that defines a TGF-β response element. However, it is neither necessary that the obtained TGF-β response element be a naturally occurring sequence present in other genes nor that the TGF-β response element be limited to deoxyribonucleotides. The TGF-β response element may be found in DNA or RNA, in regulatory sequences, exons, or introns.

Preferred TGF-β response elements are derived from selected regions of the promoter region of the plasminogen activator inhibitor type 1 gene, hereinafter referred to as PAI-1, as described by Loskutoff et al., Biochem., 26:3763-3768 (1987), the disclosure of which is hereby incorporated by reference. Loskutoff et al. describe a cosmid containing the entire PAI-1 gene. In a related study, the glucocorticoid regulation of the PAI-1 promoter was described by van Zonneveld et al., Proc. Natl. Acad. Sci., 85:5525-5529 (1988), the disclosure of which is hereby incorporated by reference. The sequence of the PAI-1 promoter beginning at nucleotide position -800, extending through the TATA box and initiation site, and ending at nucleotide position +200, the latter of which corresponds to the PAI-1 encoded protein at the ninth amino acid residue, is available in the GenBank™/EMBL Data Bank with Accession Number J03836.

Moreover, Bosma et al., J. Biol. Chem., 263:9129-9141 (1988), have described the entire 15,867 bp PAI-1 gene sequence including significant stretches of DNA that extend into its 5'- and 3'-flanking DNA regions, the nucleotide sequence of which is available in the GenBank™/EMBL Data Bank with Accession Number J03764.

The PAI-1 promoter-derived TGF-β response elements for use in this invention are identified by the nucleotide positions corresponding to the region in the PAI-1 promoter as listed in the GenBank™/EMBL Data Bank Accession Number J03836.

Exemplary TGF-β response elements derived from the PAI-1 promoter have the nucleotide sequences listed in the Sequence Listing in SEQ ID NOs 11-17. The nucleotide sequences are listed showing only the sense strand in the 5' to 3' direction of a double-stranded isolated TGF-β response element nucleotide sequence. The PAI-1-derived TGF-β response elements corresponding to SEQ ID NOs 11-17 have the respective designations with the nucleotide regions corresponding to the PAI-1 promoter indicated in parentheses: 1) SEQ ID NO 11 = 1500 (-1481 to -40); 2) SEQ ID NO 12 = 800 (-800 up to -40); 3) SEQ ID NO 13 = 800/636 (-800 up to -636); 4) SEQ ID NO 14 = 56 (-56 to -41); 5) SEQ ID NO 15 = 674 (-674 to -650); 6) SEQ ID NO 16 = 743 (-743 to -708); and 7) SEQ ID NO 17 = 732 (-732 to -708).

In one embodiment, a TGF-β response element useful for practicing the present invention may be derived from any promoter nucleotide sequence. In a further embodiment, a TGF-β response element may be designed to contain preselected nucleotide bases. In other words, a subject TGF-β response element need not be identical to the nucleotide sequence of the PAI-1-derived TGF-β response elements described herein, so long as the nucleotide sequence is activatable by TGF-β. A TGF-β response element of this invention thus may contain a variety of nucleotide units of any length, typically from about 5 to about 2000 nucleotides in length. More preferably, a TGF-β response element comprises nucleotide units from about 15 to about 1500 nucleotides in length. A preferred embodiment is a TGF-β response element having a nucleotide sequence that is greater than 50 base pairs in length. Exemplary long TGF-β response elements derived from PAI-1 are listed in the Sequence Listing in SEQ ID NOs 11-13. Another preferred embodiment is a TGF-β response element having a nucleotide sequence that is less than 50 base pairs in length. Exemplary short TGF-β response elements derived from PAI-1 are listed in the Sequence Listing in SEQ ID NOs 14-17.
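The long and short groupings can be checked against the promoter coordinates given for SEQ ID NOs 11-17. The sketch below computes the span of each element and compares it with the 50 base pair threshold. It assumes that both stated endpoints are included in the element (the text uses both "to" and "up to") and that promoter numbering follows the usual convention of having no position 0. The function and variable names are hypothetical, and the computed spans are illustrative rather than a substitute for the lengths in the Sequence Listing.

```python
# Promoter coordinates as stated for SEQ ID NOs 11-17.
ELEMENTS = {
    "SEQ ID NO 11": (-1481, -40),
    "SEQ ID NO 12": (-800, -40),
    "SEQ ID NO 13": (-800, -636),
    "SEQ ID NO 14": (-56, -41),
    "SEQ ID NO 15": (-674, -650),
    "SEQ ID NO 16": (-743, -708),
    "SEQ ID NO 17": (-732, -708),
}


def span_length(start, end):
    """Number of positions from start to end, both endpoints included."""
    if start == 0 or end == 0 or start > end:
        raise ValueError("invalid promoter coordinates")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # assumed convention: numbering skips position 0
    return length


def classify(length, threshold=50):
    """Label an element as long or short relative to the 50 bp threshold."""
    if length > threshold:
        return "long"
    if length < threshold:
        return "short"
    return "boundary"


for name, (start, end) in ELEMENTS.items():
    n = span_length(start, end)
    print(name, n, classify(n))
```

Under these assumptions SEQ ID NOs 11-13 fall in the long group and SEQ ID NOs 14-17 in the short group, in agreement with the groupings stated above.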

In one embodiment, the invention contemplates the presence of at least one TGF-β response element present in the regulatory region of the expression vectors as described in Section D. Thus, one or more stretches of a nucleotide sequence comprising a TGF-β response element may be present within a regulatory region. If more than one TGF-β response element is present, they are not required to be identical. In other words, TGF-β response elements having different nucleotide sequences as well as different lengths can be combined in a regulatory region of an expression vector of this invention.

TGF-β response elements can be derived or produced from the PAI-1 promoter by truncation or expansion of the native or wild-type PAI-1 promoter nucleotide sequence or as a variant of the native PAI-1 promoter by site-directed substitution of a preselected nucleotide base or bases.

Also contemplated in this context are regulatory regions containing multiple TGF-β response elements that can be either longer, shorter, tandemly arranged, reversed in orientation, and permutations thereof. The design and construction of such arrangements are well known to one of ordinary skill in the art of oligonucleotide design and synthesis and are described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, pp 390-401 (1982).

It is also contemplated that nucleotide base modifications can be made resulting in nucleotide analogs to provide certain advantages to the TGF-β response elements of this invention. A nucleotide analog refers to moieties that function similarly to nucleotide sequences in a TGF-β response element of this invention but which have non-naturally occurring portions. Thus, nucleotide analogs can have altered sugar moieties or inter-sugar linkages. Exemplary are the phosphorothioate and other sulfur-containing species, analogs having altered base units, or other modifications consistent with the spirit of this invention.

Preferred modifications include, but are not limited to, the ethyl or methyl phosphonate modifications disclosed in U.S. Patent No. 4,469,863 and the phosphorothioate modified deoxyribonucleotides described by LaPlanche et al., Nucl. Acids Res., 14:9081 (1986) and Stec et al., J. Am. Chem. Soc., 106:6077 (1984), the disclosures of which are hereby incorporated by reference. These modifications provide resistance to nucleolytic degradation. Preferred modifications are the modifications of the 3'-terminus using the phosphorothioate (PS) sulfurization modification described by Stein et al., Nucl. Acids Res., 16:3209 (1988).

TGF-β response elements comprising nucleotide sequences can be obtained by a variety of procedures well known in the art, including de novo chemical synthesis of complementary oligonucleotides and derivation of nucleic acid fragments from native nucleic acid sequences existing as genes, or parts of genes, in a genome, plasmid, or other vector, such as by restriction endonuclease digestion of larger nucleic acid fragments and strand separation or by enzymatic synthesis using a nucleic acid template.

De novo chemical synthesis of oligonucleotides can be carried out, for example, by the phosphotriester method described by Matteucci et al., J. Am. Chem. Soc., 103:3185 (1981), or as described in U.S. Patent No. 4,356,270, the disclosures of which are hereby incorporated by reference. A particularly preferred method is the phosphoramidite method using commercial automated synthesizers, such as the ABI automated synthesizer by Applied Biosystems, Inc. (Foster City, CA). Oligonucleotides can be purified after synthesis using published procedures as described by Miller et al., J. Biol. Chem., 255:9659 (1980). Thereafter, complementary oligonucleotides are hybridized to form double-stranded DNA segments that are TGF-β response elements. Particularly preferred chemically-synthesized oligonucleotides are described in Example 1C, the sense strands of which are listed in SEQ ID NOs 14-17, as described above.
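Because only the sense strands are listed, the complementary oligonucleotide needed for hybridization has to be derived from each sense strand. The following is a minimal sketch of that derivation, assuming unmodified DNA bases written 5' to 3'. The function names and the example sequence are hypothetical; the example is a placeholder, not one of the listed response elements.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sense_strand):
    """Return the complementary strand, also written 5'->3'."""
    if set(sense_strand.upper()) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    return sense_strand.translate(_COMPLEMENT)[::-1]


def oligo_pair(sense_strand):
    """Return the two oligonucleotides to synthesize and hybridize."""
    return sense_strand, reverse_complement(sense_strand)


# Placeholder sequence, for illustration only.
print(oligo_pair("GATTACA"))
```

Writing both strands 5' to 3' matches the way oligonucleotides are normally specified for synthesis. The pair anneals in antiparallel orientation.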

Derivation of a TGF-β response element from nucleic acids involves the cloning of a nucleic acid into an appropriate host by means of a cloning vector, replication of the vector and therefore multiplication of the amount of the cloned nucleic acid, followed by isolation of subfragments of the cloned nucleic acids. For a description of subcloning nucleic acid fragments, see Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, pp 390-401 (1982); and see U.S. Patent Nos. 4,416,988 and 4,403,036.

In one embodiment, TGF-β response elements are obtained by restriction digestion of cloned vectors containing the PAI-1 promoter as described in Examples 1A and 1C. Particularly preferred nucleotide sequences containing TGF-β response elements as well as the minimal promoter sequence obtained in this manner include nucleotide sequences corresponding to the nucleotide positions in the PAI-1 promoter sequence from -1481 to +76, specifically a Kpn I/Eco RI digest, and from -800 to +76, specifically a Hind III/Eco RI digest.

In an additional embodiment, in the practice of this invention, it is not necessary that the TGF-β response element nucleotide sequence be known in order to obtain a TGF-β response element capable of being activated by TGF-β. To that end, contemplated for use in this invention are TGF-β response elements obtained from promoter regions of other genes that can be determined to contain TGF-β response elements using the methods of this invention.

D. TGF-β Responsive Plasmid Expression Vectors

The present invention contemplates TGF-β responsive plasmid expression vectors in substantially pure form capable of causing expression of an indicator molecule in a eucaryotic cell. The term "TGF-β responsive" identifies an expression vector of this invention that by its composition contains TGF-β response elements that are activated by TGF-β mediated through a TGF-β response element specific transcription factor as described in Section C. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as "expression vectors".

As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting between different genetic environments another nucleic acid to which it has been operatively linked. One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Preferred vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. A TGF-β expression vector of this invention is a circular double-stranded plasmid that contains at least the following elements: 1) a regulatory region having at least one TGF-β response element as defined in Section C, where the regulatory region is operatively linked to a promoter; and 2) a structural region downstream of the promoter that contains a gene coding for an indicator molecule of this invention.

In a separate embodiment, a TGF-β expression vector also contains a gene, the expression of which confers a selective advantage, such as drug resistance, to the eucaryotic host cell when introduced or transformed into those cells. A typical eucaryotic drug resistance gene confers resistance to neomycin, also referred to as G418 or Geneticin.

The choice of vector to which the regulatory region, promoter, and structural region of the present invention are operatively linked depends directly, as is well known in the art, on the functional properties desired, e.g., replication or protein expression, and the host cell to be transformed, these being limitations inherent in the art of constructing recombinant DNA molecules. In preferred embodiments, the vector utilized includes procaryotic sequences that facilitate the propagation of the vector in bacteria, i.e., a DNA sequence having the ability to direct autonomous replication and maintenance of the recombinant DNA molecule extra-chromosomally when introduced into a bacterial host cell. Such replicons are well known in the art. In addition, the TGF-β expression vector of this invention includes one or more transcription units that are expressed only in eucaryotic cells.

The eucaryotic transcription unit consists of noncoding sequences and sequences encoding selectable markers. The expression vectors of this invention also contain distinct sequence elements that are required for accurate and efficient polyadenylation, referred to as PA1, 2 and 3 and as shown in Figure 1. In addition, splicing signals for generating mature mRNA are included in the vector. The eucaryotic TGF-β responsive expression vectors contain viral replicons, the presence of which provides for the increase in the level of expression of cloned genes. A preferred replication sequence is provided by the simian virus 40 or SV40 papovavirus. Operatively linking refers to the covalent joining of nucleotide sequences, preferably by conventional phosphodiester bonds, into one strand of DNA, whether in single- or double-stranded form. Moreover, the joining of nucleotide sequences results in the joining of functional elements such as response elements in regulatory regions with promoters and downstream structural regions as described herein.

A preferred eucaryotic expression vector of this invention as prepared in Example 1 contains a regulatory region having TGF-β response elements derived from the 5' promoter end of the human plasminogen activator inhibitor type 1 (PAI-1) gene operatively linked to PAI-1 minimal promoter and a downstream structural region containing a gene coding for an indicator polypeptide, preferably luciferase.

Exemplary TGF-β responsive expression vectors include the following expression vectors, the designations of which are indicated along with the corresponding SEQ ID NO in which the sense strand of the expression vector is listed where the first nucleotide of the double-stranded circular vector is the middle "T" nucleotide present in the Eco RI restriction site as described in Example 1: 1) p800neoLuc (SEQ ID NO 1); 2) p800/636neoLuc (SEQ ID NO 2); 3) p56neoLuc (SEQ ID NO 3); 4) p674neoLuc (SEQ ID NO 4); 5) p743neoLuc (SEQ ID NO 5); 6) p732neoLuc (SEQ ID NO 6); 7) p56Luc (SEQ ID NO 7); 8) p674Luc (SEQ ID NO 8); 9) p743Luc (SEQ ID NO 9); and 10) p732Luc (SEQ ID NO 10).

The exemplary TGF-β expression vectors of this invention are derived from the starting cloning expression vector, designated p19Luc, as described in Example 1. The nucleotide sequence of the sense strand of an Eco RI-linearized p19Luc vector is listed in the Sequence Listing as SEQ ID NO 21.

A further embodiment of this invention is the preparation of TGF-β responsive expression vectors having altered arrangements of and selected types of TGF-β response elements in the regulatory region. To that end, p19Luc and the p19Luc-derived p39Luc expression cloning vectors, both of which are described in Example 1, are vectors that allow for the construction of TGF-β responsive vectors having any selected regulatory region operatively ligated to a selected promoter. Therefore, any regulatory region of any length containing one or more TGF-β response elements can be paired with any promoter, including but not limited to the non-TGF-β responsive PAI-1 or heterologous HBV promoters used herein, to prepare TGF-β responsive expression vectors that provide for the quantitation of inducing TGF-β. In a related embodiment, in addition to the construction methods detailed herein, other methods of preparing p19Luc-derived expression vectors having TGF-β response elements and promoters are familiar to one of ordinary skill in the art of vector construction and are described by Ausubel et al., in Current Protocols in Molecular Biology, Wiley and Sons, New York (1993) and by Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, 1989.

1. Plasmid Vectors for Stable Transformations

In practicing one aspect of this invention, a preferred embodiment is a TGF-β responsive expression vector having a gene for encoding a selectable marker providing for stably transformed cells. Stably transformed cells confer the ability to utilize a reproducible source for practicing the methods of this invention over a course of time. A preferred selectable marker gene is the gene conferring neomycin-resistance. Such a gene for encoding the selectable marker was derived from an expression vector, designated pMAMneo, as described in Example 1. The nucleotide sequence of the neomycin-resistance conferring gene is listed in SEQ ID NO 20. In one embodiment, a TGF-β responsive expression vector contains a first nucleotide sequence comprising a regulatory region that includes at least one TGF-β inducible response element operatively linked to a promoter, a second nucleotide sequence comprising a structural region downstream of the promoter and coding for an indicator molecule, and a third nucleotide sequence comprising a gene encoding a selectable marker for the selection of a stably transformed cell, where the response element is capable of inducing dose-dependent luciferase activity and the structural region codes for luciferase.

Preferred expression vectors containing the neomycin-resistance conferring gene include the following designations followed in parentheses by the corresponding SEQ ID NO in which the sense strand of each Eco RI-linearized vector is listed according to the convention adopted in this invention for listing vector sequences: 1) p800neoLuc (SEQ ID NO 1); 2) p800/636neoLuc (SEQ ID NO 2); 3) p56neoLuc (SEQ ID NO 3); 4) p674neoLuc (SEQ ID NO 4); 5) p743neoLuc (SEQ ID NO 5); 6) p732neoLuc (SEQ ID NO 6).

In a further embodiment, the plasmid expression vectors of this invention contain TGF-β inducible response elements that correspond to a nucleotide sequence listed in SEQ ID NOs 11-17 as described in Section C. Preferred promoters for use in the TGF-β expression vectors of this invention for stably transforming cells as well as for transient transformation are the PAI-1 minimal promoter sequence and the hepatitis B virus minimal promoter sequence, the sense sequences of which are respectively listed in SEQ ID NOs 18 and 19. Contemplated for use in this invention are promoters that are not responsive to TGF-β. The selection of alternative promoters is within the scope of one having ordinary skill in the art.

This invention contemplates additional TGF-β expression vectors for stably transforming cells that can be designed to have regulatory regions that contain alternative TGF-β response elements and promoters.

a. Regulatory Region

The regulatory region of a TGF-β expression vector of this invention contains at least one TGF-β response element as described herein and in Section C of this invention. As contemplated for use in this invention, the regulatory region of a TGF-β expression vector can range in length from 5 to 2000 base pairs, preferably 15 to 1500 base pairs, and can contain more than one TGF-β response element in any orientation and arrangement. Thus, if two or more TGF-β response elements are present in a regulatory region, they may be contiguous with one another or separated by an intervening nucleotide sequence. The design and construction of such arrangements are well known to one of ordinary skill in the art of oligonucleotide design and synthesis and are described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, pp 390-401 (1982). Preferred TGF-β response elements present in the regulatory region of a TGF-β expression vector are derived from the PAI-1 promoter and have the nucleotide sequences listed in the Sequence Listing in SEQ ID NOs 11-17. The PAI-1-derived TGF-β response elements corresponding to SEQ ID NOs 11-17 have the respective designations with the nucleotide regions corresponding to the PAI-1 promoter indicated in parentheses: 1) SEQ ID NO 11 = 1500 (-1481 to -40); 2) SEQ ID NO 12 = 800 (-800 up to -40); 3) SEQ ID NO 13 = 800/636 (-800 up to -636); 4) SEQ ID NO 14 = 56 (-56 to -41); 5) SEQ ID NO 15 = 674 (-674 to -650); 6) SEQ ID NO 16 = 743 (-743 to -708); and 7) SEQ ID NO 17 = 732 (-732 to -708).
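The promoter coordinates listed above can be checked against the contemplated regulatory-region length of 5 to 2000 base pairs. The following is only an illustrative sketch: the table and function names are hypothetical, and counting both end coordinates as part of the element (an inclusive length) is an assumption, so the computed lengths need not match the lengths of the deposited sequences exactly.

```python
# Hypothetical lookup table built from the coordinates listed above:
# SEQ ID NO -> (designation, 5' coordinate, 3' coordinate) on the PAI-1 promoter.
PAI1_RESPONSE_ELEMENTS = {
    11: ("1500", -1481, -40),
    12: ("800", -800, -40),
    13: ("800/636", -800, -636),
    14: ("56", -56, -41),
    15: ("674", -674, -650),
    16: ("743", -743, -708),
    17: ("732", -732, -708),
}


def element_length(start, end):
    """Length in base pairs, assuming both end coordinates are included."""
    return end - start + 1


def within_contemplated_range(length, low=5, high=2000):
    """True if the length falls in the 5 to 2000 base pair range given above."""
    return low <= length <= high


if __name__ == "__main__":
    for seq_id, (name, start, end) in sorted(PAI1_RESPONSE_ELEMENTS.items()):
        n = element_length(start, end)
        print(f"SEQ ID NO {seq_id} ({name}): {n} bp, in range: {within_contemplated_range(n)}")
```

Under the inclusive-counting assumption, every listed element falls inside the contemplated range.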

b. Structural Region

A plasmid vector of the present invention contains a structural region having a nucleotide sequence that encodes an indicator molecule. The structural region is operatively linked to the regulatory region such that the inducible promoter of the regulatory region, under the inducible control of the TGF-β response element, controls transcription and expression of the indicator molecule. Thus, upon induction of the TGF-β response element, the regulatory region transcribes and thereby expresses the indicator molecule resulting in a detectable event in the cell, which event can be measured by detection of the amount of the expressed indicator molecule. In other words, the response element is capable of inducing the expression of the indicator molecule by virtue of its controlling expression of the indicator through the promoter to which the response element is operatively linked. Typically, the structural region is "downstream" of the regulatory region in the plasmid, and positioned to be under the direct control of the regulatory region. Other configurations can be utilized so long as the induction of the TGF-β response element results in the expression of the indicator polypeptide. Exemplary and preferred configurations are described in the Examples.

The term "indicator molecule" as used in this invention refers to a molecule encoded by a reporter gene, the expression of which in the expression vectors of this invention, results in a detectable measurable protein, polypeptide, enzyme and the like. Alternative expressions for indicator molecule are reporter molecule, reporter polypeptide, indicator protein, indicator polypeptide and the like. In preferred embodiments, the indicator molecule is a protein.

There are a variety of indicator polypeptides suitable for use in the present invention, and the invention need not be limited to any particular indicator. A preferred indicator polypeptide is luciferase encoded by the firefly luciferase gene. Use of the luciferase gene for expression of luciferase has been described by Gould et al., Anal. Biochem., 7:5-13 (1986) and Brasier et al., BioTechniques, 7:1116-1122 (1989). A preferred structural region includes a nucleotide sequence having the sequence characteristics of the luciferase gene shown in SEQ ID NO 21. Alternative embodiments include indicator proteins such as β-galactosidase and chloramphenicol acetyltransferase (CAT). Use of β-galactosidase and CAT as reporter molecules has been described, respectively, by Luskin et al., Neuron, 1:635-647 (1988) and Gorman et al., Mol. Cell Biol., 2:1044-1051 (1982).

Associated with the use of an indicator molecule in quantifying TGF-β are means for measuring the indicator molecule. A preferred method for detecting the luciferase indicator molecule is the use of a luminometer commercially available from Dynatech Laboratories Inc., Chantilly, VA as described in Example 3A and analyzed according to manufacturer's instructions. For detecting CAT activity, a simple-phase extraction assay has been developed and described by Seed et al., Gene, 67:271-277 (1986), the disclosure of which is hereby incorporated by reference. Alternative preferred methods for detecting CAT activity are described in Current Protocols in Molecular Biology, Eds. Ausubel et al., Unit 9.0, John Wiley & Sons (1993). β-Galactosidase activity is measured in activity assays performed essentially as described by Miller, Experiments in Molecular Genetics, Cold Spring Harbor Laboratory, New York, (1972), the disclosure of which is hereby incorporated by reference. With β-galactosidase, additional reagents are required to visualize its presence following induced expression. Such additional reagents for β-galactosidase include o-nitrophenyl-β-D-galactopyranoside and the like for the development of a color reaction by absorbance at wavelengths of 500 and 420.

c. Selectable Marker Gene

In preferred embodiments, the plasmid vector of the present invention includes a gene that encodes a selectable marker that is effective in a eucaryotic cell, preferably a drug resistance selection marker. A preferred drug resistance selection marker is a gene whose expression results in neomycin resistance, i.e., the neomycin phosphotransferase (neo) gene [Southern et al., J. Mol. Appl. Genet., 1:327-341 (1982)] or a gene whose expression results in kanamycin resistance, i.e., the chimeric gene containing nopaline synthetase promoter, Tn5 neomycin phosphotransferase II and nopaline synthetase 3' non-translated region described by Rogers et al., Methods for Plant Molecular Biology, A. Weissbach and H. Weissbach, eds., Academic Press, Inc., San Diego, CA (1988). Other selectable markers which are utilizable in eucaryotic cells can be utilized in the present vectors and methods and therefore the invention need not be limited to any particular selectable marker. Thus, the invention contemplates the use of a nucleotide sequence which confers a eucaryotic selection means, including but not limited to genes for resistance to neomycin and kanamycin.

A preferred nucleotide sequence defining a selectable marker gene is a nucleotide sequence having the sequence characteristics of the neomycin resistance gene shown in SEQ ID NO 20.

The use of a selectable marker for eucaryotic cells provides the advantage of producing stably transformed cells, as discussed herein. Thus, one can produce a eucaryotic cell line containing a plasmid vector of this invention for use in the present methods wherein all the cells of the culture are selected to be uniform and each contain intact plasmid vector, thereby assuring that all of the eucaryotic cells in the culture are substantially similar in responsiveness to TGF-β, thereby increasing the reliability and sensitivity of the assay.

In addition, preferred embodiments that include a procaryotic replicon also include a gene whose expression confers a selective advantage, such as drug resistance, to the bacterial host cell when introduced into those transformed cells. Typical bacterial drug resistance genes are those that confer resistance to ampicillin or tetracycline.

Those vectors that include a procaryotic replicon also typically include convenient restriction sites for insertion of a recombinant DNA molecule of the present invention. Typical of such vector plasmids are pUC8, pUC9, pBR322, and pBR329 available from BioRad Laboratories, (Richmond, CA) and pPL and pKK223 available from Pharmacia, (Piscataway, NJ), and pBLUESCRIPT and pBS available from Stratagene, (La Jolla, CA). A vector of the present invention may also be a Lambda phage vector including those Lambda vectors described in Molecular Cloning: A Laboratory Manual, Second Edition, Maniatis et al., eds., Cold Spring Harbor, NY (1989).

Plasmid vectors for use in the present invention are also compatible with eucaryotic cells. Eucaryotic cell expression vectors are well known in the art and are available from several commercial sources. Typically, such vectors provide convenient restriction sites for insertion of the desired recombinant DNA molecule, and further contain promoters for expression of the encoded genes which are capable of expression in the eucaryotic cell, as discussed earlier. Typical of such vectors are pSVO and pKSV-10 (Pharmacia), and pBPV-1/pML2d (International Biotechnology, Inc.), and pTDT1 (ATCC, No. 31255).

2. Plasmid Vectors for Co-transformation and Transient Transformation

This invention contemplates the use of TGF-β responsive expression vectors having regulatory, promoter and structural regions but lacking a gene for encoding a selectable marker. In other words, in practicing this invention, TGF-β expression vectors for transient transformation of eucaryotic cells are contemplated. This embodiment allows for an alternative to stable transformation of cells for use in practicing the methods of this invention. Transiently transformed cells produced as described in Example 2D are useful for performing TGF-β assays when having stably transformed cells is not required or necessitated. As described in Example 4, transiently transformed cells are useful for determining the nucleotide sequence of TGF-β response elements as well as quantifying the amount of TGF-β present in a heterogeneous or homogeneous liquid sample.

Preferred TGF-β expression vectors used for transiently transforming eucaryotic cells include the following vectors shown with their designations and SEQ ID NOs in which the sense strand of the double-stranded Eco RI-linearized vectors is listed: 1) p56Luc (SEQ ID NO 7); 2) p674Luc (SEQ ID NO 8); 3) p743Luc (SEQ ID NO 9); and 4) p732Luc (SEQ ID NO 10).

The invention further describes TGF-β responsive plasmids lacking a selectable marker gene having the identifying characteristics of plasmids that have been deposited with the American Type Culture Collection, Rockville, MD having the assigned ATCC Accession Numbers 75627, 75628 and 75629, the plasmids of which respectively correspond to the Eco RI-linearized sense strand nucleotide sequences listed in SEQ ID NOs 8-10.

In an additional embodiment, this invention describes the co-transformation of TGF-β expression vectors for transient transformation in conjunction with a second expression vector from which a selectable marker is expressed. A preferred selectable marker expressing plasmid is RSVneo as described in Example 2C. The ability to prepare stably transformed cells through the use of a vector that only confers transient transformation is accomplished with this approach. The advantage this approach provides is that further vector constructions for inserting selectable marker genes can be avoided, thereby providing stably transformed cells for use in practicing this invention when necessitated. Thus, eucaryotic cells that have been co-transformed with a transient TGF-β expression vector and a second plasmid such as RSVneo provide for an alternative approach to create stably transformed eucaryotic cells.

Any transient TGF-β expression vector of this invention can be used in this context. A preferred co-transformed eucaryotic cell is the cell line Hep3B that has been co-transformed with RSVneo and the p1500Luc expression vector having the TGF-β response element in SEQ ID NO 11. This stably transformed cell line has been deposited with the American Type Culture Collection, Rockville, MD and has been assigned ATCC Accession Number CRL 11508.

With the teachings of this invention, additional TGF-β expression vectors for transiently transforming cells can be designed to have regulatory regions that contain alternative TGF-β response elements and promoters. In a further embodiment, these additional vectors can be used to prepare stably transformed cells through the use of the co-transformation approach.

3. Recipient Cells for Transformations

Insofar as the invention describes plasmid vectors for use in the present invention, the invention also contemplates a eucaryotic cell containing a plasmid vector of the present invention.

A eucaryotic cell suitable for use can be any eucaryotic cell which expresses a TGF-β receptor on its cell surface and


element and a gene encoding an indicator polypeptide, wherein the plasmid is capable of expression of the indicator polypeptide in response to TGF-β induction. Particularly preferred are eucaryotic cells that contain a plasmid vector having a nucleotide sequence with the nucleotide sequence characteristics of the TGF-β response element selected from the group consisting of the sequences shown in SEQ ID NOs 11-17. A particularly preferred eucaryotic cell contains a plasmid vector having a nucleotide sequence with the nucleotide sequence characteristics of the plasmid vector selected from the group consisting of the sequences shown in SEQ ID NOs 1-10.

A preferred eucaryotic cell described further herein is Hep3B stably transformed with the plasmid vector p1500Luc, referred to as LUCI, and having the ATCC Accession No. CRL 11508.

E. Methods for Quantifying TGF-β

The present invention describes methods for detecting the presence, and preferably quantifying the amount, of TGF-β in a liquid sample, either containing purified TGF-β or TGF-β in a heterogeneous admixture, and is also referred to herein as a TGF-β assay. The assay system provides for the quantification of TGF-β through the expression of an indicator polypeptide which is expressed in levels proportional to the amount of TGF-β being detected.

The assay is a highly sensitive and specific, non-radioactive assay for detecting mature (active) TGF-β. When compared to the sensitive and widely used proliferation-based mink lung epithelial cell (MLE cells) method for measuring TGF-β concentration, the TGF-β assay method of this invention is more rapid, has comparable sensitivity, and has a greater detection range. Specificity of this novel assay was also higher as evidenced by its relative insensitivity to factors such as epidermal growth factor (EGF) and basic fibroblast growth factor (bFGF) which can greatly affect other assays.

cells that contain a TGF-β responsive expression vector having a gene encoding an indicator polypeptide for a predetermined time period sufficient for the eucaryotic cells to express a detectable amount of the indicator polypeptide;

(b) measuring the amount of the indicator polypeptide expressed during the time period; and

(c) determining the amount of TGF-β present in the sample by comparing the measured amount of the indicator polypeptide against a reference curve.

Preferably, the reference curve represents a quantitative relationship derived from a series of measured amounts of indicator polypeptide produced from a series of known concentrations of TGF-β.

The standardized reference curve is obtained from parallel assays performed by exposing similarly transfected cells to a range, usually in serial dilution, of known (measured) amounts of one or more of the known TGF-β isoforms. The resulting expressed indicator polypeptide is then determined by direct detection of the indicator polypeptide. A reference curve is then generated by plotting the measured amount of expressed indicator polypeptide against the known range of inducing amounts of TGF-β. The amount of unknown TGF-β in the test liquid sample is then determined by extrapolating the measured amount of test indicator polypeptide to the reference curve.
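Reading an unknown sample against the reference curve can be sketched as follows. This is only an illustration and not the inventors' procedure: it uses straight-line interpolation between the two bracketing standards, which is one common choice, and the function name, units and all numeric values are hypothetical.

```python
def concentration_from_reference_curve(standards, signal):
    """Estimate a TGF-beta concentration from a measured indicator signal.

    standards: list of (known_concentration, measured_signal) pairs taken
               from the parallel standard assays; the signal is assumed to
               rise with concentration.
    signal:    measured indicator signal of the test sample.
    """
    points = sorted(standards)
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return c0
            # Straight-line interpolation between the bracketing standards.
            return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)
    # Refuse to guess beyond the standards that were actually measured.
    raise ValueError("signal lies outside the range of the reference curve")


if __name__ == "__main__":
    # Made-up standards: (concentration in pM, relative light units).
    curve = [(0.0, 100.0), (1.0, 300.0), (2.0, 500.0), (4.0, 900.0)]
    print(concentration_from_reference_curve(curve, 400.0))
```

The sketch deliberately raises an error for signals outside the measured standards, since values read beyond the ends of the curve are not supported by the standard data.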

The use of standard curves in quantifying the amount of protein in a liquid sample in general has been described by Lowry et al., J. Biol. Chem., 193:265-275 (1951), the disclosure of which is hereby incorporated by reference. As shown in the Examples herein, the TGF-β assay of this invention allows for the measurement of TGF-β from the expression and subsequent detection of an indicator polypeptide from a concentration range from less than 5 picograms/ml (pg/ml) equivalent to 0.2 pM up to 10 ng/ml equivalent to 400 pM (or 0.4 nM). The dose-dependent response to TGF-β is linear between 0.2 pM up to 100 pM depending on the assay conditions.
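The mass and molar concentrations quoted above are related through the molecular weight of TGF-β. The sketch below shows that arithmetic, assuming a molecular weight of 25,000 g/mol; that figure is an assumption, chosen because it is the value implied by equating 5 pg/ml with 0.2 pM, and the function name is hypothetical.

```python
# Assumed molecular weight (g/mol); not stated in this passage, but implied
# by the equivalence of 5 pg/ml and 0.2 pM.
ASSUMED_TGF_BETA_MW = 25_000.0


def pg_per_ml_to_pM(pg_per_ml, molecular_weight=ASSUMED_TGF_BETA_MW):
    """Convert a mass concentration in pg/ml to a molar concentration in pM.

    1 pg/ml equals 1e-9 g/L; dividing by g/mol gives mol/L, and multiplying
    by 1e12 gives pM, so the combined factor is 1000 / molecular_weight.
    """
    return pg_per_ml * 1000.0 / molecular_weight


if __name__ == "__main__":
    print(pg_per_ml_to_pM(5.0))       # lower end of the range, in pM
    print(pg_per_ml_to_pM(10_000.0))  # 10 ng/ml expressed in pM
```

Under this assumption 5 pg/ml corresponds to 0.2 pM, and 10 ng/ml corresponds to 400 pM, that is, 0.4 nM.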

including those which exhibit responsiveness to factors in addition to TGF-β, which activity is subtracted out by the use of the control data obtained using the antibody treatment. Second, one can correct for spurious induction or inhibition of a TGF-β response element by factors other than TGF-β. The analysis of comparative data produced by conducting the present method both with and without anti-TGF-β antibody, for the purpose of determining the level of TGF-β in a liquid sample, can be conducted by a variety of statistical methods that are not to be construed as limiting to the invention.

Exemplary comparative analyses are described in the Examples.
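In its simplest form, the antibody control described above amounts to a subtraction: the signal that remains when anti-TGF-β antibody is present is treated as background not attributable to TGF-β. The sketch below illustrates only that subtraction; it is not one of the statistical methods of the Examples, and the function name and readings are hypothetical.

```python
def tgf_beta_specific_signal(signal_without_antibody, signal_with_antibody):
    """Indicator signal attributed to TGF-beta.

    signal_without_antibody: reading for the sample assayed as is.
    signal_with_antibody:    reading for the same sample assayed in the
                             presence of anti-TGF-beta antibody.
    A negative result is returned unchanged so that spurious inhibition
    by other factors stays visible to the analyst.
    """
    return signal_without_antibody - signal_with_antibody


if __name__ == "__main__":
    # Made-up luminometer readings in relative light units.
    print(tgf_beta_specific_signal(950.0, 150.0))
```

The corrected signal, not the raw reading, would then be the value compared against the reference curve.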

Contemplated for use with any of the above TGF-β assay methods of this invention are plasmids having identifying characteristics of plasmids on deposit with ATCC having the ATCC Accession Numbers 75627, 75628 and 75629. Also contemplated are eucaryotic cells that contain the TGF-β response element having the nucleotide sequence in SEQ ID NO 11 where the cells correspond to cells on deposit with ATCC having the ATCC Accession Number CRL 11508. In one embodiment, the use of stably transformed eucaryotic cells is contemplated. The invention describes plasmids for use in the methods that comprise a nucleotide sequence corresponding to nucleotide sequences listed in SEQ ID NOs 1-10. TGF-β inducible response elements that comprise a nucleotide sequence corresponding to nucleotide sequences listed in SEQ ID NOs 11-17 are also described. Contemplated promoter nucleotide sequences are listed in SEQ ID NOs 18 and 19.

A further embodiment of the methods of the invention employs eucaryotic cells that are stably transformed cells containing a plasmid having a gene encoding a selectable marker for the selection of said stably transformed cells. The invention describes such plasmids having nucleotide sequences listed in SEQ ID NOs 1-6. The invention further describes a stably transformed eucaryotic cell on deposit with ATCC having ATCC Accession Number CRL 11508 containing the TGF-β response element having the nucleotide sequence in SEQ ID NO 11.

An additional embodiment employs eucaryotic cells that are transiently transformed with plasmids corresponding to the nucleotide sequences listed in SEQ ID NOs 7-10. The use of stably transformed cells is particularly preferred because it provides uniformity and reproducibility to the cell based assay without the need for additional controls for the efficiency of transformation typically associated with methods using transient transformation. Stably transformed cells do not require the use of an internal standard for transformation efficiency, and all of the cells utilized are typically uniformly transformed. Furthermore, the methods do not require the additional step of transforming the cells transiently because the stably transformed cell line is already available.

The invention describes quantifying the amount of TGF-β in a body fluid, in culture medium, in a tissue extract, and in the like liquid samples. A further preferred embodiment is the determination of the amount of a specific isoform of TGF-β, specifically TGF-β1, TGF-β2 or TGF-β3, in a liquid sample.

In a preferred embodiment, this invention describes the use of any eucaryotic host cell that contains a TGF-β receptor and is capable of inducing a TGF-β response element upon activation by TGF-β. Exemplary assays for measuring activation by TGF-β and induction of a TGF-β response element are described herein and can be used to identify candidate host cells suitable for use in the present diagnostic methods. A preferred host cell is a mammalian cell. Preferred mammalian cells include mink lung epithelial (MLE) cells, particularly clone C32 from MLE cells, HeLa cells, Chinese hamster ovary (CHO) cells, Hep3B cells, GM7373 cells, NIH 3T3 cells, and the like cells.

Conditions for incubating a eucaryotic cell in the present methods are the same as general cell culture methods. Typical cell culture media for culturing and incubating eucaryotic cells include alpha-MEM, Eagle's MEM (having non-essential amino acids), RPMI 1640 and Dulbecco's modified MEM (DMEM), all of which are well known in the art. The culture medium preferably contains 0.5 to 2% (v/v) serum, preferably a fetal calf or fetal bovine serum (FCS or FBS). Cell culture conditions include the use of cells plated at a density of about 0.8 to about 3.2 x 10^4 cells per well of a 96-well tissue culture plate, preferably about 1.6 x 10^4 cells per well. Cells are typically plated at the indicated density, and allowed to grow until they reach a confluence density of from about 70% confluent to about 1 day post-confluent, but should preferably be allowed to grow after plating for a time period sufficient for the cells to express detectable levels of TGF-β receptor, which time period is typically about 0.5-24 hours, preferably about 1-5 hours, and more preferably about 3 hours.
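The plating densities above translate into a dispensing volume once the concentration of the cell suspension is known. The sketch below shows that calculation together with a check against the stated range of about 0.8 to 3.2 x 10^4 cells per well; the function names and the example suspension concentration are hypothetical and are not taken from the Examples.

```python
# Plating range per well of a 96-well plate, taken from the text above.
MIN_CELLS_PER_WELL = 0.8e4
MAX_CELLS_PER_WELL = 3.2e4
PREFERRED_CELLS_PER_WELL = 1.6e4


def volume_per_well_ul(cells_per_well, suspension_cells_per_ml):
    """Microliters of suspension that deliver the requested cells per well."""
    if suspension_cells_per_ml <= 0:
        raise ValueError("suspension concentration must be positive")
    return cells_per_well * 1000.0 / suspension_cells_per_ml


def in_plating_range(cells_per_well):
    """True if the density lies within the stated plating range."""
    return MIN_CELLS_PER_WELL <= cells_per_well <= MAX_CELLS_PER_WELL


if __name__ == "__main__":
    # Hypothetical suspension at 1.6 x 10^5 cells/ml.
    print(volume_per_well_ul(PREFERRED_CELLS_PER_WELL, 1.6e5))
    print(in_plating_range(PREFERRED_CELLS_PER_WELL))
```

With the hypothetical suspension shown, the preferred density corresponds to 100 microliters per well.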

After plating and culturing, the eucaryotic cells are incubated under culturing conditions with culture medium that includes a predetermined volume of a liquid sample believed to contain TGF-β. The incubation time period is a time sufficient for any TGF-β present in the liquid sample to interact with the eucaryotic cell TGF-β receptor and thereby induce the TGF-β response element and express the indicator polypeptide. The time required for the expressed indicator polypeptide to accumulate to detectable levels will vary with the choice of indicator and method of detection, and can be predetermined.

However, typical incubation times for contacting the cell with the liquid sample can range from 2 to 24 hours, preferably about 6 to 22 hours, more preferably 10 to 20 hours, and particularly about 14 hours. Particularly preferred culturing and incubation conditions for use in the present methods are described in the Examples.

The detection of TGF-β in liquid samples such as body fluid or tissue extract samples is useful in following the levels of TGF-β in patients experiencing a variety of conditions where the TGF-β level is important to the clinician.

For example, TGF-β levels are significant in diseases characterized by excessive fibrosis such as hepatic fibrosis and the like, in proliferative conditions and in conditions where there is an increase in collagen expression, and the like conditions where TGF-β is believed to participate. In addition, there are many therapeutic uses of TGF-β, and therefore, the present assay methods are useful for measuring the therapeutic fate of administered TGF-β in patients being treated therapeutically with TGF-β.

F. Diagnostic Methods and Kits

The present invention also contemplates a diagnostic system in kit form for assaying the amount of TGF-β in a liquid sample according to the present methods. The diagnostic kit contains, in an amount sufficient for at least one assay, a eucaryotic cell of this invention useful for practicing the diagnostic methods for detection of TGF-β.

The kit can further contain a packaging material. Packaging material can include container(s) for storage of the materials of the kit, and can include a label or instructions for use.

The kit can additionally contain an aliquot of reference TGF-β for use in generating a standard reference curve using the methods of the invention. Thus in preferred embodiments, a diagnostic kit includes, in an amount sufficient for at least one assay, the following: (a) packaging material; (b) eucaryotic cells contained within the packaging material, where the cells are capable of expressing an indicator molecule and contain a plasmid comprising, in the direction of transcription, a regulatory region that includes at least one TGF-β inducible response element that is operatively linked to a promoter, and a structural region downstream of said promoter, where the TGF-β response element is capable of inducing dose-dependent indicator molecule activity and the structural region codes for said indicator molecule; and (c) an aliquot of TGF-β contained within said packaging material, where the TGF-β is used for generating a reference curve as described herein representing a measured amount of the indicator molecule produced from a known concentration of TGF-β.

As used herein, the term "packaging material" refers to a solid matrix or material such as glass, plastic, paper, foil and the like capable of holding within fixed limits eucaryotic cells and an aliquot of TGF-β. Thus, for example, packaging material can be a plastic vial used to contain eucaryotic cells in growth medium to which liquid samples can be added for activating the TGF-β responsive plasmid within the cells. Packaging material can also be a glass vial in which an aliquot of TGF-β is contained for use in generating a reference curve, the latter of which is described in Section E.

As used herein, an "aliquot" of TGF-β refers to an amount of TGF-β sufficient to generate a reference curve of this invention. In preferred embodiments, the aliquot of TGF-β is provided in the form of a substantially dry powder, i.e., in lyophilized form, for subsequent reconstitution or in the form of a solution, i.e., a liquid dispersion. Preferably the amount of powdered TGF-β is in the range of 25 nanograms (ng), more preferably 125 ng to 625 ng, and most preferably 250 ng. Preferably the amount of TGF-β in liquid solution is in the range of 1 to 50 nanomolar (nM), more preferably 5 to 25 nM and most preferably 10 nM. Preferred serial dilutions of TGF-β used in generating the reference curve are described in Section E. The TGF-β provided in the kit preferably includes each of the three TGF-β isoforms as described in Section B.

The term "indicator molecule or indicator polypeptide" as used in this invention and described in Section D1 refers to a molecule encoded by a reporter gene, the expression of which in the expression vectors of this invention results in a detectable measurable protein, polypeptide, enzyme and the like.

In preferred embodiments, the packaging material includes a label indicating that eucaryotic cells containing TGF-β responsive expression vectors can be used for determining the amount of TGF-β in a liquid sample by a method that includes the steps of (a) incubating the cells with the selected liquid sample; (b) measuring the amount of the induced indicator molecule; and (c) comparing the amount of measured indicator molecule with a reference curve. Thus, the packaging material contains a label that is a tangible expression describing the methods of this invention, as described in Section E, of using plasmid-transformed eucaryotic cells for quantifying the amount of TGF-β in a test liquid sample.

The packaging materials discussed herein in relation to the kit of this invention are those customarily utilized in kits or diagnostic systems. Such materials include glass and plastic, the latter of which include polyethylene, polypropylene and polycarbonate, bottles, vials, plastic and plastic-foil laminated envelopes and the like.

The eucaryotic cells transformed with the TGF-β responsive expression vectors of this invention are cells that express TGF-β receptor on their cell surface as described in Section E. All normal cells and almost all neoplastic cells have cell surface membrane receptors, also referred to as binding proteins, for TGF-β. For review, see Tucker et al., Proc. Natl. Acad. Sci. USA, 81:6757-6761 (1984) and Frolik et al., J. Biol. Chem., 259:10995-11000 (1984). The receptors have previously been described in Section E. Preferred cells for use with the TGF-β assay kit include mink lung epithelial cells (MLE cells), HeLa cells, Chinese Hamster Ovary cells, Hep3B cells, GM7373 cells and NIH 3T3 cells, with the C32 clone from the mink lung epithelial cells being the most preferred cell line.

In preferred embodiments, the eucaryotic cells are transformed with the expression vector plasmids described in Section D that have a nucleotide sequence that corresponds to a sequence in SEQ ID NOs 1-10. Contemplated for use in the kit are stably and transiently transformed eucaryotic cells. As described in Section D1, for preparing stably transformed eucaryotic cells, the plasmids corresponding to SEQ ID NOs 1-6 are preferred for use. A further preferred eucaryotic cell for use in the kit is the Hep3B cell line co-transfected with p1500Luc and RSVneo for preparing stably transformed cells that have been deposited with ATCC having the ATCC Accession Number CRL 11508 and identified by the designation "LUCI". For preparing transiently transformed eucaryotic cells, the plasmids corresponding to SEQ ID NOs 7-10 are preferred for use.

In preferred embodiments, eucaryotic cells for use with the kit contain a plasmid having the identifying characteristics of a plasmid on deposit with ATCC having the Accession Numbers 75627, 75628 and 75629 as described in Section C.

The kit of this invention further includes an anti-TGF-β antibody for use in a parallel control assay for determining the amount of indicator molecule produced other than by TGF-β induction. Preferred anti-TGF-β antibodies are anti-TGF-β1, anti-TGF-β2 or anti-TGF-β3 monoclonal antibodies commercially available from Genzyme Corp., Cambridge, MA.

Preferred diagnostic assays accomplished with the kit performed as described herein are for the quantitation of the amount of TGF-β in a liquid sample. A liquid sample can include an isoform of TGF-β, specifically TGF-β1, TGF-β2 or TGF-β3. A liquid sample further includes any body fluid, culture medium and a tissue extract that may contain unknown quantities of TGF-β. Thus, the liquid sample includes the body fluids, serum, plasma, whole blood, lymph fluid, synovial fluid, follicular fluid, seminal fluid, amniotic fluid, urine, spinal fluid, saliva, sputum, tears, perspiration, mucus and the like. Culture medium includes culture supernatant, also referred to as conditioned medium, collected from cells maintained in tissue culture as described in Example 3B.

Tissue extracts also encompass extracts of cells, referred to as cellular extracts. In addition, organs such as placentas can be obtained and extracted with well known procedures to prepare placental extracts. Extracts can also be obtained of any body organ or portion thereof, tissue or cells, including normal, tumorigenic, and malignant cells. This is generally accomplished by surgical means, i.e., by biopsy samples including needle aspirates, tissue scrapings, or freshly dissected tissues and the like. Extracts of the collected samples are then prepared by means including homogenization in lysis buffers, including detergents such as NP-40, Triton X-100, and the like. Common methods include using potters, blenders, ultrasound generators, and dounce homogenizers.

EXAMPLES

The following examples relating to this invention are illustrative and should not, of course, be construed as specifically limiting the invention. Moreover, such variations of the invention, now known or later developed, which would be within the purview of one skilled in the art are to be considered to fall within the scope of the present invention hereinafter claimed.

1. Preparation of Expression Vectors Containing TGF-β Response Elements

A. Source Cloning Vector Constructs and Preparation of Expression Vectors for Stable Transformation

Eucaryotic expression vectors having a regulatory region having at least one TGF-β response element derived from the 5' promoter end of the human plasminogen activator inhibitor type 1 (PAI-1) gene operatively linked to a PAI-1 minimal promoter and a downstream structural region containing a gene coding for an indicator polypeptide, preferably luciferase, were prepared and designated generally as PAI/L eukaryotic expression constructs. Operatively linking refers to the covalent joining of nucleotide sequences, preferably by conventional phosphodiester bonds, into one strand of DNA, whether in single- or double-stranded form. Moreover, the joining of nucleotide sequences results in the joining of functional elements such as response elements in regulatory regions with promoters and downstream structural regions as described herein.

The expression vector constructs of this invention were then used for preparing stably transformed cells for use in the quantitative TGF-β assays of this invention. The expression vectors were designed to contain varying lengths and arrangements of the TGF-β response elements from the PAI-1 promoter, a neomycin-resistance conferring gene for selection and a gene encoding an indicator polypeptide, preferably luciferase. Two starting vectors were required to prepare the expression vectors having a neomycin-resistance conferring gene. One of these starting cloning plasmid vectors, designated p19Luc, was previously described by van Zonneveld et al., Proc. Natl. Acad. Sci. USA, 85:5525-5529 (1988), the disclosure of which is hereby incorporated by reference.

1) Preparation of Cloning Vector p19Luc

The promoter-less reporter gene p19Luc plasmid was originally designed by van Zonneveld et al., Proc. Natl. Acad. Sci. USA, 85:5525-5529 (1988) to monitor promoter activity with a structural region, having the firefly luciferase gene to function as a reporter gene, fused to a SV40 splice and polyadenylation site. The p19Luc plasmid also contained a multiple cloning site preceded by two SV40-derived polyadenylation sites. The p19Luc plasmid was constructed from pSV0AL-AΔ5', a vector described by De Wet et al., Mol. Cell. Biol., 7:725-737 (1987). The pSV0AL-AΔ5' was first linearized with Hind III and one portion of the plasmid was blunt-ended by filling in the Hind III sites with E. coli DNA polymerase I large fragment (Klenow) and ligated to phosphorylated Eco RI linkers (New England Biolabs, Beverly, MA). Two of the resulting fragments, the 621 bp fragment originally containing the 5' end of the luciferase gene and the 2718 bp fragment originally located on the 5' end of this fragment, were isolated. A second portion of the Hind III-cleaved pSV0AL-AΔ5' was ligated to a 55 bp polylinker and cleaved with Eco RI. The resulting 2831 bp fragment containing the multiple cloning site and the pBR322-derived ampicillin resistance-conferring gene was isolated. These fragments were ligated to create the circular double-stranded p19Luc plasmid that contained the three fragments in their original orientation but with the multiple cloning site in the original Hind III site.

The continuous 6170 bp sense strand, also referred to as the coding strand, nucleotide sequence of an Eco RI-linearized p19Luc vector is listed in the Sequence Listing as SEQ ID NO 21. The convention adopted for listing the nucleotide sequences of the p19Luc vector as well as all the expression vectors of this invention derived from p19Luc is to list only the sense strand of each vector with the nucleotide position 1 always beginning with the middle of the Eco RI site, specifically the first T nucleotide.
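The numbering convention described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions and not part of the disclosed methods: the function name `linearize_at_ecori` and the toy sequence are hypothetical, and where a vector carries more than one Eco RI site (GAATTC) the sketch simply uses the first site it finds. The sketch rotates a circular sense-strand sequence so that position 1 is the first T in the middle of the Eco RI site and the last position is the preceding A.

```python
ECORI_SITE = "GAATTC"  # Eco RI recognition sequence


def linearize_at_ecori(circular_seq: str) -> str:
    """Rotate a circular sense-strand sequence so that it starts at the
    first T in the middle of an Eco RI site (GAA|TTC).

    Assumption: if several Eco RI sites are present, the first one found
    is used as the origin.
    """
    seq = circular_seq.upper()
    # Append the first few bases so a site spanning the arbitrary
    # start/end junction of the circular sequence is still found.
    wrapped = seq + seq[: len(ECORI_SITE) - 1]
    idx = wrapped.find(ECORI_SITE)
    if idx < 0:
        raise ValueError("no Eco RI site found in sequence")
    start = (idx + 3) % len(seq)  # index of the first T in GAATTC
    return seq[start:] + seq[:start]


if __name__ == "__main__":
    toy_plasmid = "ACGGAATTCTT"  # hypothetical toy sequence, not a real vector
    linear = linearize_at_ecori(toy_plasmid)
    print(linear, "position 1 =", linear[0], "last position =", linear[-1])
```

With this convention a linearized listing always begins with TTC and ends with GAA, which matches the "T" at position 1 and the "A" at the final position described for SEQ ID NO 21.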

The Eco RI-linearized p19Luc vector contained the following list of elements and restriction sites beginning with the 5' middle Eco RI "T" nucleotide position 1 and extending to the 3' end of the vector ending with the middle Eco RI "A" nucleotide position 6170 (nucleotide positions as listed in SEQ ID NO 21 are indicated in parentheses): a Pst I restriction site (750-755) within the pBR322-derived ampicillin resistance-conferring gene (amp); an Acc I restriction site downstream of the amp gene (2113-2118); two tandem polyadenylation sites immediately upstream of the multiple cloning site beginning with Bam HI (2771-2776) and Hind III (2778-2783), continuing with adjacent Sph I, Pst I, Hinc II/Acc I/Sal I, Xba I, Bam HI, Xma I/Sma I, Kpn I, Sst I, and ending with Eco RI (2829-2834);

methods of preparing p19Luc-derived expression vectors having TGF-β response elements and promoters are familiar to one of ordinary skill in the art of vector construction and are described by Ausubel et al., In Current Protocols in Molecular Biology, Wiley and Sons, New York (1993) and by Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, 1989.

2) Preparation of Expression Vector p1500Luc

One expression vector of this invention, designated p1500Luc, was constructed from p19Luc and a cosmid containing the PAI-1 promoter in which TGF-β response elements are located. To prepare p1500Luc, a 1547 base pair (bp) Kpn I-Eco RI fragment of the PAI-1 promoter was obtained from a cosmid containing the entire PAI-1 gene (Loskutoff et al., Biochem., 26:3763-3768 (1987), the disclosure of which is hereby incorporated by reference) and was cloned into the Kpn I and Eco RI sites of pUC19, a plasmid available from American Type Culture Collection, Rockville, MD with the ATCC Accession Number 37254, to create a vector designated pUCEK19. The fragment contained the 1442 bp TGF-β response element (SEQ ID NO 11) from the PAI-1 promoter that corresponded to nucleotide position -1481 and extended to the nucleotide position -40, continuous with a 115 bp minimal (non-TGF-β responsive) PAI-1 promoter sense strand sequence (SEQ ID NO 18) corresponding to nucleotide position -39 and ending with an E. coli DNA polymerase filled-in Eco RI site at nucleotide position +76 as described by Bosma et al., J. Biol. Chem., 263:9129-9141 (1988). The entire 15,867 bp PAI-1 gene sequence, including significant stretches of DNA that extend into its 5'- and 3'-flanking DNA regions, was described by Bosma et al., J. Biol. Chem., 263:9129-9141 (1988), and is available in the GenBank™/EMBL Data Bank with accession number J03764.

To create a sensitive reporter gene system with a regulatory region having the 1442 bp TGF-β response element of the PAI-1 promoter contiguous with the minimal PAI-1 promoter, the pUCEK19 plasmid prepared above was then digested with Kpn I and Eco RI and the isolated fragment was then ligated into the multiple cloning site of a similarly digested p19Luc. The resulting vector was designated p1500Luc.

3) Preparation of Expression Vector p800Luc

Another vector, designated p800Luc, was prepared for subsequent construction of p800neoLuc as described below. The p800Luc plasmid, having a deletion in the 5' end of the PAI-1 construct so that the 5' end began with the -800 nucleotide in the native PAI-1 promoter, was prepared by digesting the PAI-1-gene-containing cosmid described above with Hind III and Eco RI. The actual Hind III-Eco RI digest of the PAI-1 promoter resulted in a fragment that corresponded to nucleotides -799 to +71 in the PAI-1 promoter that was subsequently ligated into a similarly digested p19Luc vector forming a PAI-1 region extending from nucleotide -800 to +76. The resulting p800Luc plasmid retained all the features of p19Luc with the exception of the insertion of the PAI-1-derived regulatory region having a TGF-β response element and a promoter.

The restriction fragments described to prepare p1500Luc and p800Luc had an identical 3' end (an Eco RI site at +71 nucleotide of the PAI-1 promoter) and a different 5' end. The vectors, p1500Luc and p800Luc, were used for transient transformations as they lacked a selectable marker gene. The p1500Luc plasmid was also used to prepare stable transformations with a second vector as described in Example 1C. In addition, the p800Luc served as the starting cloning construct for the preparation of p800neoLuc as described below. The TGF-β response element in the -800 to +76 PAI-1 promoter region began at -800 and ended at -40, the nucleotide sequence of which is listed in SEQ ID NO 12. The remaining nucleotides, which comprised the non-TGF-β responsive minimal promoter in this PAI-1 fragment, are listed in SEQ ID NO 18.

4) Preparation of Cloning Vector p39Luc

An expression vector, designated p39Luc, having a promoter for activating transcription of the luciferase gene while lacking TGF-β response elements, thereby lacking responsiveness to TGF-β, was prepared as described by Keeton et al., J. Biol. Chem., 266:23048-23052 (1991). A fragment of the PAI-1 promoter (i.e., between -39 and +76), which had been determined in the TGF-β assay as described in Example 3A to have low basal activity and only minimal response to TGF-β (average induction of 2.7-fold), was used as a minimal promoter in the constructs for use in quantifying the amount of TGF-β in a test liquid sample. Since the minimal promoter sequence conferred only a minimal background response to TGF-β as shown in Example 3A, the minimal PAI-1-derived promoter is also referred to as being "non-TGF-β responsive".

Briefly, the p800Luc vector was linearized by digestion with Hind III followed by 5' digestion of PAI-1 promoter with Bal-31 slow exonuclease (International Biotechnologies, New Haven, CT) as described by Keeton et al., J. Biol. Chem., 266:23048-23052 (1991). The digestion was allowed to proceed until the -39 nucleotide position of the PAI-1 promoter was reached. Thereafter, the linearized and Bal-31 digested plasmid was ligated with T4 ligase forming a double-stranded circular vector designated p39Luc.

The resultant expression vector, into which TGF-β response elements were subsequently ligated as described in Example 1C, contained the PAI-1 minimal promoter nucleotide sequence corresponding to -39 to +76 of the promoter as listed in SEQ ID NO 18. This minimal promoter was operatively linked to and continuous with the structural region that contained the firefly luciferase gene present in the vector. Since the p39Luc cloning vector was derived from p800Luc which itself was derived from p19Luc, the remaining elements and features of the vector were retained unchanged from p19Luc. The 6229 bp sense strand nucleotide sequence of the Eco RI-linearized p39Luc vector is listed in SEQ ID NO 23.

The p39Luc cloning expression vector is also obtained by preparing a double-stranded oligonucleotide sequence corresponding to the minimal promoter sequence in SEQ ID NO 18 and ligating it into the Hind III/Eco RI multiple cloning site of p19Luc. The overhang from the Hind III/Eco RI digests in the p19Luc vector is first digested with mung bean nuclease, followed by ligation with the blunt-ended double-stranded oligonucleotide promoter. Other construction methods are well known to and easily accomplished by one of ordinary skill in the art.

The p39Luc vector was useful for operatively ligating regulatory regions that contained TGF-β response elements resulting in an expression vector that was responsive to DNA-binding proteins, the result of which was induction of the transcription and translation of the indicator molecule, luciferase. TGF-β responsive expression vectors for use in practicing this invention having TGF-β response elements other than those specified herein are readily constructed through the use of either p19Luc or p39Luc starting cloning expression vectors.

5) Preparation of Cloning Vector HBVLuc

To create expression vectors having heterologous non-TGF-β responsive promoters instead of having the PAI-1-derived minimal promoter described above, a minimal promoter construct derived from the Hepatitis B viral promoter (HBV) was selected. This promoter contained the nucleotide sequence from -188 to +145 of the Hepatitis B promoter and showed only a 4-fold induction in response to TGF-β. The sense strand of the double-stranded nucleotide sequence of the HBV minimal promoter is listed in SEQ ID NO 19.

The 6464 bp sense strand nucleotide sequence of the Eco RI-linearized pHBVLuc vector is listed in SEQ ID NO 25.

6) Preparation of Expression Vector p800neoLuc

For preparing an expression vector for use in stable transformations, the neomycin-resistance conferring gene from pMAMneo (Clontech, Palo Alto, CA) was inserted into the p800Luc vector containing -800 to +76 of the 5' end of the human PAI-1 gene followed by the firefly luciferase gene. As shown in Figure 1, p800Luc prepared above was first digested with Acc I, repaired to blunt ends with the Klenow fragment of DNA polymerase I, and then was isolated. The pMAMneo plasmid was digested with Sal I and Eco RI then blunt-ended with Klenow. The neomycin-resistance gene containing fragment was then isolated and had the 4302 bp sense strand nucleotide sequence listed in the Sequence Listing in SEQ ID NO 20. The linearized p800Luc and neomycin-resistance fragment were ligated, and one clone with the insert in the correct orientation was selected by restriction mapping and designated p800neoLuc. The entire Eco RI-linearized 11293 bp nucleotide sequence of the sense strand of the double-stranded p800neoLuc vector is listed in the Sequence Listing in SEQ ID NO 1. DNA sequencing was performed by a modification of the dideoxy chain-termination procedure with a Sequenase kit (United States Biochemical; Cleveland, OH). This clone, purified from large scale plasmid preparations via CsCl gradients, was used for subsequent transfections.

Since the p800neoLuc cloning vector was derived from p800Luc which itself was derived from p19Luc, the remaining elements and features of the vector were retained unchanged from p19Luc. The p800neoLuc vector thus contained the neomycin-resistance conferring gene providing for stable transformants. The p800neoLuc vector also contained an operatively ligated regulatory region that contained the TGF-β response element in the sequence corresponding to -800 to -40 of the PAI-1 promoter, resulting in an expression vector that was responsive to TGF-β. With this expression vector construct, the induced activation of the transcription and translation of the indicator molecule, luciferase, was obtained, further allowing for the quantitation of the amount of TGF-β responsible for activating gene expression.

7) Preparation of Cloning Vector p39neoLuc

To create an expression vector useful for constructing TGF-β responsive vectors that resulted in stably transformed cells, the p39Luc cloning vector prepared above was linearized as described above for p800Luc and ligated with the neomycin-resistance conferring gene fragment from pMAMneo. The construction of the vector was performed as described in Example 1A6). The resultant p39neoLuc cloning expression vector had the Eco RI-linearized 10533 bp sense strand nucleotide sequence listed in SEQ ID NO 22. Regulatory regions containing TGF-β response elements were operatively ligated 5' to the minimal promoter sequence of the p39neoLuc as described in Example 1C for the preparation of plasmids for transient transformation.

8) Preparation of Cloning Vector pHBVneoLuc

To create an expression vector useful for constructing TGF-β responsive vectors with a heterologous promoter for stably transforming cells, the pHBVLuc cloning vector prepared above was linearized as described above for p800Luc and ligated with the neomycin-resistance conferring gene fragment from pMAMneo. The construction of the vector was performed as described in Example 1A6). The resultant pHBVneoLuc cloning expression vector had the Eco RI-linearized 10768 bp sense strand nucleotide sequence listed in SEQ ID NO 24. Regulatory regions containing TGF-β response elements were operatively ligated 5' to the minimal promoter sequence of the pHBVneoLuc as described in Example 1C for preparing plasmids for transient transformation.

9) Preparation of p1500neoLuc, p800/636neoLuc, p56neoLuc, p674neoLuc, p743neoLuc and p732neoLuc Expression Vectors

The p1500Luc vector prepared above is similarly ligated with the neomycin-resistance gene from pMAMneo to form p1500neoLuc. Other PAI-1-promoter containing expression vectors lacking the neomycin resistance gene, p800/636Luc, p56Luc, p674Luc, p743Luc and p732Luc, containing smaller TGF-β response elements were prepared as described in Example 1C. To create the corresponding neomycin-resistance expression vectors for stably transforming recipient cells, the neomycin-resistance gene from pMAMneo is separately ligated with each of these five vectors to form expression vectors used for generating stable cell transformations. The five resultant vectors having the neomycin-resistance gene inserted are designated p800/636neoLuc (10697 bp), p56neoLuc (10549 bp), p674neoLuc (10558 bp), p743neoLuc (10569 bp) and p732neoLuc (10558 bp) and have the respective complete nucleotide sequences of the sense strand from the Eco RI-linearized double-stranded vectors in SEQ ID NOs 2-6. Depending on the vector into which the PAI-1 promoter fragments were cloned, the designated names had either "Luc" alone or "neoLuc", respectively, for vectors lacking the neomycin (neo) selectable marker gene or containing it. In addition, the plasmids were further designated by the 5' end of the PAI-1 TGF-β response element. For example, five plasmids with shorter TGF-β response elements were thus named p800/636Luc, p56Luc, p674Luc, p743Luc and p732Luc.

As with all the expression vectors of this invention, the operative elements from the original cloning vector p19Luc, from which the vectors were all derived, were retained.

The above neomycin-resistance containing expression vectors were then used in the TGF-β assay method as described in Example 3 following transformation of host recipient cells.

B. Expression Vectors for Co-Transformation of TGF-β Responsive Vectors and a Selectable Marker Vector for Stable Transformation

Stably transformed Hep3B cells were also obtained as described in Example 2C below through the use of co-transfections of a TGF-β responsive vector of this invention lacking a selectable marker gene, specifically the p1500Luc prepared in Example 1A2), with a selectable marker vector, RSVneo, available from American Type Culture Collection (ATCC), Rockville, MD, ATCC Accession Number 37198. The stably transformed cell line containing plasmid p1500Luc, designated LUCI, was deposited with the ATCC on or before December 16, 1993 and was assigned the ATCC Accession Number CRL 11508.

C. Expression Vectors for Transient Transformation

Additional TGF-β responsive expression vectors were prepared for use in this invention. In the vectors prepared as described herein, the TGF-β response elements having a smaller length, thereby providing responsiveness to TGF-β with reduced or absent responsiveness to other growth modulators, were made by either restriction digestion of the PAI-1 promoter or synthesizing double-stranded blunt-end oligonucleotides. The oligonucleotide sequences corresponded to preselected regions of the PAI-1 promoter sequence. The resultant TGF-β response elements present within a regulatory region were then directionally ligated into p39Luc or p39HBV.

The regulatory region from the PAI-1 promoter corresponding to nucleotide position -800 up to and including -636 was obtained by restriction digestion and had the following sense strand sequence: 5'AAGCTTACCATGGTAACCCCTGGTCCCGTTCAGCCACCACCACCCCACCCAGCACACCTCCAACCTCAGCCAGACAAGGTTGTTGACACAAGAGAGCCCTCAGGGGCACAGAGAGAGTCTGGACACGTGGGGAGTCAGCCGTGTATCATCGGAGGCGGCCGGGCA3' (SEQ ID NO 13). The additional selected regions for preparing oligonucleotides included the following sense strand nucleotide sequences with the indicated nucleotide positions as present in the intact PAI-1 promoter: 1) promoter nucleotide position -56 up to and including -41: 5'AGTTCATCTATTTCCT3' (SEQ ID NO 14); 2) promoter nucleotide position -674 up to and including -650: 5'GTGGGGAGTCAGCCGTGTATCATCG3' (SEQ ID NO 15); 3) nucleotide position -743 up to and including -708: 5'CTCCAACCTCAGCCAGACAAGGTTGTTGACACAAGA3' (SEQ ID NO 16); and 4) nucleotide position -732 up to and including -708: 5'GCCAGACAAGGTTGTTGACACAAGA3' (SEQ ID NO 17). The complementary sequences to each of the sense oligonucleotide sequences were also synthesized to allow for the formation of double-stranded oligonucleotides for ligation 5' to the PAI-1 minimal promoter sequence containing the TATA box.
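As an illustrative check of the oligonucleotide design described above, the following Python sketch derives the complementary strand for each sense oligonucleotide and confirms that each sequence length agrees with its stated promoter coordinates. It is only a sketch: the helper names (`reverse_complement`, `span_length`, `promoter_position`) are hypothetical, and the sequences are transcribed from the text above with the line-break spaces removed.

```python
# Sense-strand region -800 to -636 of the PAI-1 promoter (SEQ ID NO 13),
# transcribed from the text with the extraction spaces removed.
SEQ_ID_13 = (
    "AAGCTTACCATGGTAACCCCTGGTCCCGTTCAGCCACCACCACCCCACCCAGCACACCT"
    "CC"
    "AACCTCAGCCAGACAAGGTTGTTGACACAAGAGAGCCCTCAGGGGCACAGAGAGAGTCTG"
    "GAC"
    "ACGTGGGGAGTCAGCCGTGTATCATCGGAGGCGGCCGGGCA"
)

# name: (first position, last position, sense sequence)
OLIGOS = {
    "SEQ ID NO 14": (-56, -41, "AGTTCATCTATTTCCT"),
    "SEQ ID NO 15": (-674, -650, "GTGGGGAGTCAGCCGTGTATCATCG"),
    "SEQ ID NO 16": (-743, -708, "CTCCAACCTCAGCCAGACAAGGTTGTTGACACAAGA"),
    "SEQ ID NO 17": (-732, -708, "GCCAGACAAGGTTGTTGACACAAGA"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def span_length(first: int, last: int) -> int:
    """Length of an inclusive span lying entirely upstream of +1."""
    if not first <= last < 0:
        raise ValueError("span must be upstream of +1 with first <= last")
    return last - first + 1


def promoter_position(region: str, region_first: int, oligo: str) -> int:
    """Promoter coordinate at which `oligo` starts within `region`,
    given the coordinate of the region's first base."""
    idx = region.find(oligo)
    if idx < 0:
        raise ValueError("oligonucleotide not found in region")
    return region_first + idx


if __name__ == "__main__":
    for name, (first, last, seq) in OLIGOS.items():
        ok = len(seq) == span_length(first, last)
        print(name, len(seq), "nt, length matches coordinates:", ok)
        print("  complementary strand 5'-", reverse_complement(seq), "-3'")
```

The three oligonucleotides that fall within the -800 to -636 region can also be located in SEQ ID NO 13 with `promoter_position(SEQ_ID_13, -800, seq)`, which gives a quick consistency check between the restriction fragment and the synthetic sequences.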

The resulting double-stranded oligonucleotides were then separately operatively linked to the -39 position of the minimal promoter sense strand sequence listed in SEQ ID NO 18 present in the expression vector, p39Luc, prepared as described in Example 1A4). The sequences were confirmed by double-stranded sequencing methods.

The resulting five plasmids with shorter TGF-β response elements were thus named p800/636Luc, p56Luc, p674Luc, p743Luc and p732Luc. The plasmids, p56Luc, p674Luc, p743Luc and p732Luc, have the respective complete sense strand nucleotide sequences, beginning with the middle T of the Eco RI site as previously described, listed in SEQ ID NOs 7-10. The plasmids, p674Luc, p743Luc and p732Luc, were deposited with ATCC as described in Example 5 and respectively assigned the ATCC Accession Numbers 75627, 75628 and 75629.

In similar procedures, five plasmids having a heterologous hepatitis B viral promoter, HBV, instead of the PAI-1 minimal promoter were prepared with the shorter TGF-β response elements used in p800/636Luc, p56Luc, p674Luc, p743Luc and p732Luc. The TGF-β response elements were ligated into linearized HBVLuc, the cloning expression vector prepared as described in Example 1A5), to form TGF-β response element-containing plasmids lacking the neomycin-resistance-conferring gene.

Furthermore, as previously mentioned, the cloning vector constructs, p19Luc and p39Luc, provide for the operative linking of preselected regulatory regions with preselected promoters, both of which are not limited to the specific constructs described herein and above. Additional TGF-β response elements in varied lengths and arrangements along with promoters that provide for the transcription of the reporter gene are contemplated for use in this invention.

2. Transformation of Eucaryotic Cells with Expression Vectors Containing TGF-β Response Elements

A. Recipient Eucaryotic Cells

To identify the cell types most responsive to TGF-β in which to transfect the TGF-β responsive expression vectors for use in assaying the amount of TGF-β, the vectors prepared in Example 1 were transfected as described in Example 2B and 2C into recipient cell lines including mink lung epithelial cells (MLE cells) (ATCC CCL 64), HeLa cells (ATCC CCL 2), Chinese hamster ovary (CHO cells) (ATCC CCL 61), GM7373 (chemically transformed fetal bovine aortic endothelial cells or BAEs) (NIGMS Human Genetic Mutant Cell Repository, Camden, NJ), Hep3B (ATCC HB 8064) and NIH 3T3 cells (ATCC CRL 1658).

B. Stable Transformation

For preparing stably transfected cells for use with expression vectors containing the pMAMneo construct prepared in Example 1A, transfections of mink lung epithelial cells (hereinafter referred to as MLE cells to distinguish from the TGF-β proliferation assay called MLEC) were performed. The MLE cells were seeded at 7 x 10^5 cells/100 mm dish for 24 hours at which point they were transfected with the PAI/L construct, p800neoLuc, by calcium phosphate precipitation as described by Wigler et al., Proc. Natl. Acad. Sci. USA, 76:1373-1376 (1979). Twenty-four hours after transfection, the medium was replaced and supplemented with 400 μg/ml of Geneticin. The resistant cells were expanded in mass culture or cloned by limiting dilution for further experiments. Following selection, transfected MLE cells were maintained in DMEM containing 10% fetal calf serum and 250 μg/ml Geneticin (G-418 sulfate) (Gibco BRL, Grand Island, NY).

Stable transformations are also performed as described above with the expression vectors, p800/636neoLuc, p56neoLuc, p674neoLuc, p743neoLuc and with p732neoLuc, all of which are prepared as described in Example 1A.

C. Stable Transformation Obtained by Co-transfection of Cells

For transfecting 6 wells, 15 micrograms (μg) of p1500Luc expression vector prepared in Example 1A2) that did not have a neomycin-resistance gene was admixed with 3 μg of a plasmid encoding the neomycin selectable marker gene driven from a Rous sarcoma virus promoter, RSVneo. The RSVneo plasmid is available from ATCC with ATCC Accession Number 37198. Hep3B cells at a concentration of 6 x 10^5 cells/well were seeded as described above in Example 2B for 24 hours at which point they were transfected with the PAI/L construct, p1500Luc, by calcium phosphate precipitation followed by selection with Geneticin. The resultant cell line stably transformed with p1500Luc, designated LUCI, was deposited with ATCC on December 16, 1993 and was assigned the ATCC Accession Number CRL 11508.

D. Transient Transformation

For preparing transiently transformed cells containing TGF-β responsive expression vectors lacking the neomycin resistance gene prepared as described in Example 1C, Hep3B human hepatoma cells obtained from ATCC (ATCC Accession Number HB 8064) were maintained in DMEM/Ham's F-12 (Whittaker Bioproducts, Walkersville, MD) supplemented with 10% fetal bovine serum (Hyclone Laboratories, Logan, UT), glutamine, sodium pyruvate, non-essential amino acids and penicillin/streptomycin (Whittaker). For transfection experiments, semiconfluent cells in 6-well (10 cm^2 per well) tissue culture plates (Corning Inc., Corning, NY) were washed twice with serum free media (DMEM/F-12) then incubated in serum free media. Separate mixtures (50 μl/well) of lipofectin (GIBCO, Grand Island, NY) at a concentration of 12.5 μg/well and DNA vector constructs prepared in Example 1A-1C at a concentration of 2.5 μg/well each in water were added to the cell-containing wells and the plates were incubated for 18 hours. After lipofection, plates were incubated an additional 24 hours in the absence or presence of 1 ng/ml TGF-β provided by Berlix Biosciences, South San Francisco, CA. The monolayers were then washed followed by extraction into 0.25% Triton X-100. Each construct was tested with at least 2 independent DNA preparations in order to rule out any effects related to differences in DNA preparation. For each experiment, two independent transfections were performed with every construct.

3. Method for Quantifying the Amount of TGF-β in a Sample

A. The TGF-β Assay Method

The p800neoLuc construct stably transfected into Hep3B cells was used in the initial characterization of the assay method as described herein. TGF-β measurement assays performed with cells transiently transformed with the remaining expression vectors containing TGF-β response elements are presented in Example 4. The TGF-β assay allows for the quantification of the amount of TGF-β in a liquid sample, either containing purified TGF-β or TGF-β in a heterogeneous admixture. The assay system provides for the quantification of TGF-β through the expression of an indicator polypeptide, such as luciferase. When TGF-β receptor-bearing cells, transfected with a TGF-β responsive expression vector of this invention, are exposed to TGF-β, the activation of the TGF-β response element in the vector results in the concomitant expression of luciferase. The resulting expressed luciferase is isolated and then measured as described herein. The measured luciferase resulting from activation by TGF-β in the test liquid sample is then compared to a standardized reference curve.

This reference curve is obtained from parallel assays performed by exposing similarly transfected cells to a range of known, measured amounts of one or more of the known TGF-β isoforms. The resulting expressed luciferase is then determined in a luminometer. A reference curve is then generated by plotting the measured amount of expressed luciferase against the known range of inducing amounts of TGF-β. The amount of unknown TGF-β in the test liquid sample is then determined by extrapolating the measured amount of test luciferase to the reference curve. The use of standard curves in quantifying the amount of protein in a liquid sample in general has been described by Lowry et al., J. Biol. Chem., 193:265-275 (1951), the disclosure of which is hereby incorporated by reference. As shown in the Examples herein, the TGF-β assay of this invention allows for the measurement of TGF-β, through the expression and subsequent detection of an indicator polypeptide, over a concentration range from less than 5 picograms/ml (pg/ml), equivalent to 0.2 pM, to 10 ng/ml, equivalent to 0.4 nM. The dose-dependent response is linear from 0.2 pM up to 30 pM, and even up to 100 pM depending on the assay conditions.
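The mass and molar concentrations quoted above can be related with a short conversion. The following is a minimal sketch, assuming a molecular mass of about 25,000 g/mol for the mature TGF-β dimer; that value is an assumption chosen because it reproduces the equivalences stated in the text (5 pg/ml with 0.2 pM, 10 ng/ml with 0.4 nM), and the function names are hypothetical.

```python
# Assumed molecular mass of the mature TGF-beta dimer in g/mol (about 25 kDa).
# This number is an illustrative assumption, not a value stated in the text.
TGF_BETA_MW = 25_000.0


def pg_per_ml_to_pM(pg_per_ml, mw=TGF_BETA_MW):
    """Convert a mass concentration (pg/ml) to a molar concentration (pM).

    1 pg/ml = 1e-9 g/l; dividing by g/mol gives mol/l; times 1e12 gives pM.
    The combined factor is therefore 1000 / mw.
    """
    return pg_per_ml * 1000.0 / mw


def pM_to_pg_per_ml(pM, mw=TGF_BETA_MW):
    """Inverse conversion: molar concentration (pM) to pg/ml."""
    return pM * mw / 1000.0


if __name__ == "__main__":
    print(pg_per_ml_to_pM(5.0))        # lower end of the stated range, in pM
    print(pg_per_ml_to_pM(10_000.0))   # 10 ng/ml expressed in pM
```

Under the same assumption, the 4 pM level cited later for the linear range corresponds to 100 pg/ml.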

An additional aspect of the assay for quantifying TGF-β in complex solutions was the use of neutralizing anti-TGF-β monoclonal antibodies admixed with the test liquid sample in assays run in parallel to untreated test liquid samples as described in Example 3B. These control assays are used to determine if other molecules are present in the test sample that can affect the assay through either inhibition or activation of other regions of the truncated PAI-1 promoter. For example, conditioned medium obtained from cell cultures and body fluids contain growth factors and DNA binding proteins that function as transcriptional activators or inhibitors. If a corresponding response element for an additional non-TGF-β activator or inhibitor is present in the expression vector, the binding of that molecule to the response element may cause enhanced or diminished expression of the indicator polypeptide. By antibody neutralization of the TGF-β in the test sample, any residual measured luciferase can then be ascribed to non-TGF-β activation.
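The logic of the parallel antibody control can be sketched in a few lines. This is an illustration only: it assumes that the TGF-β-dependent and TGF-β-independent contributions to the luciferase signal are additive, which the text does not state, and the function name and the numbers in the usage lines are hypothetical.

```python
def split_luciferase_signal(untreated_rlu, neutralized_rlu):
    """Split a luciferase reading into TGF-beta and non-TGF-beta parts.

    untreated_rlu   -- RLU from the test sample assayed as is
    neutralized_rlu -- RLU from the same sample pre-mixed with
                       neutralizing anti-TGF-beta antibody

    Assumption (not from the source): the two contributions are additive,
    so the antibody-resistant residual can simply be subtracted.
    """
    residual = neutralized_rlu                          # non-TGF-beta activation
    specific = max(untreated_rlu - neutralized_rlu, 0.0)  # TGF-beta-dependent part
    return specific, residual


if __name__ == "__main__":
    # Hypothetical readings for one sample.
    specific, residual = split_luciferase_signal(5200.0, 400.0)
    print(specific, residual)
```

Only the TGF-β-dependent part would then be compared with the reference curve.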

The shorter TGF-β response elements used in the expression vector systems of this invention, even including the longer p800neoLuc, are less likely to have non-TGF-β response elements that are bound by other DNA-binding proteins as shown in Examples 3C-3F. Thus, the use of parallel antibody control assays to allow for a determination of the amount of luciferase produced from only TGF-β activation is preferred when expression vectors having longer response elements are used.

Moreover, while the TGF-β assay is not isoform specific, using the appropriate standard reference curves and parallel assays with neutralizing antibodies to the various TGF-β species allows for quantification of unique TGF-β isoforms.

The reagents used in the assays described herein, and their sources, were as follows: recombinant human TGF-β1 (rTGF-β1) (gift from Berlix Biosciences, South San Francisco, CA); rTGF-β2 and neutralizing monoclonal antibodies against TGF-β1, TGF-β2 and TGF-β3 (Genzyme, Cambridge, MA); rTGF-β3, recombinant human interleukin-1alpha (rIL-1alpha) and recombinant human platelet-derived growth factor-BB (PDGF-BB) (R&D Systems, Minneapolis, MN); recombinant human basic fibroblast growth factor (bFGF) (Synergen Inc., Boulder, CO); epidermal growth factor (EGF) from mouse submaxillary glands (Boehringer Mannheim Biochemicals, Indianapolis, IN); dexamethasone, retinoic acid, and plasmin (Sigma Chemical Co., St. Louis, MO); thrombin (Armour Pharmaceutical Co., Kankakee, IL); and the hematopoietic factors granulocyte-colony stimulating factor (GCSF), granulocyte-macrophage-colony stimulating factor (GMCSF), stem cell factor, and IL-3 (Amgen, Thousand Oaks, CA).

The TGF-β quantification assay of this invention was performed as follows: 1.6 x 10^4 stably transfected MLE cells per well plated in 96-well tissue culture dishes were allowed to attach for 3 hours at 37°C in a 5% CO2 incubator. The medium was replaced with the test sample containing unknown quantities of TGF-β, or with DMEM, 0.1% BSA (DMEM-BSA) containing rTGF-β1, rTGF-β2, rTGF-β3, IL-1alpha, PDGF-BB, bFGF, or EGF, for 14 hours at 37°C. Time courses of exposure to the samples were performed to optimize the assay as shown below. In general, however, approximately 24 hours after addition of the sample to the transfected cells, the cells were observed under phase contrast microscopy. In at least one vector-transfected cell line, Hep3B cells, the presence of TGF-β in quantities of at least 0.1 ng/ml in the sample was detected visually by the change of morphology and density of the cell population. The untreated cells remained organized, with cell size decreasing upon confluence until the cell borders were no longer visible. In the presence of TGF-β, the untreated cell density was never attained and the cells were larger, flatter and less organized. Following visual inspection, cell extracts were prepared and assayed for luciferase activity using the enhanced luciferase assay kit (Analytical Luminescence, San Diego, CA) as per the manufacturer's instructions.
Treated cells were first washed twice with 2 ml phosphate-buffered saline (PBS) without Ca++ and Mg++ and then extracted with 100 μl of 0.25% Triton-X 100 (cell lysis buffer, Analytical Luminescence). The plates were gently shaken until the monolayer detached from the plastic. The plates were then placed on a rotator at room temperature for 20 minutes. Eighty μl of the resultant lysates were transferred to a Microlight 1 96-well plate (Dynatech Laboratories Inc., Chantilly, VA) and were analyzed using an ML1000 luminometer (Dynatech) with 100 μl injections of both Substrates A and B (Analytical Luminescence). Luciferase activity was reported as relative light units (RLU) as measured by the light generated over a ten second period. All assays were performed in triplicate. Error bars in the collected data represent the standard error of the mean of the samples.
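The triplicate readings and their error bars can be summarized as follows. This is a minimal sketch using the usual definition of the standard error of the mean (sample standard deviation divided by the square root of the number of replicates); the function name and the RLU values in the usage lines are hypothetical.

```python
import math
import statistics


def mean_and_sem(rlu_values):
    """Return (mean, standard error of the mean) for replicate RLU readings.

    The SEM is the sample standard deviation (n - 1 denominator) divided
    by sqrt(n). At least two replicates are required.
    """
    n = len(rlu_values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate the SEM")
    mean = statistics.mean(rlu_values)
    sem = statistics.stdev(rlu_values) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    # Hypothetical triplicate luminometer readings for one well condition.
    mean, sem = mean_and_sem([100.0, 110.0, 120.0])
    print(f"{mean:.1f} +/- {sem:.2f} RLU")
```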

To quantitate the amount of TGF-β inducing the measured amount of luciferase from liquid samples, reference curves were prepared from parallel assays performed by exposing similarly transfected cells to a range of known, measured amounts of one or more of the known TGF-β isoforms. Serial dilutions of the control TGF-β concentrations were prepared from a 1 nanomolar (nM) concentration down to 0.078 picomolar (pM). The TGF-β assay was performed for each serial dilution and the resulting expressed luciferase was then determined in a luminometer. A reference (standard) curve was then generated by plotting the measured amount of expressed luciferase against each of the known concentrations of inducing amounts of TGF-β. The amount of unknown TGF-β in the test liquid sample was then determined by extrapolating the measured amount of test luciferase to the reference curve.
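The reference-curve step described above can be sketched as a straight-line fit over the linear part of the dose response, followed by inversion of the fit for an unknown sample. This is an illustration under stated assumptions, not the procedure used in the Examples: the least-squares fit, the function names and the standard values are all hypothetical, and a real curve would be restricted to the range in which the response is actually linear.

```python
def fit_line(concentrations_pM, rlu_values):
    """Ordinary least-squares fit of RLU = slope * concentration + intercept."""
    n = len(concentrations_pM)
    mean_x = sum(concentrations_pM) / n
    mean_y = sum(rlu_values) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations_pM)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations_pM, rlu_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_rlu(rlu, slope, intercept):
    """Read an unknown sample's TGF-beta concentration off the fitted line."""
    return (rlu - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards (pM) and their measured luciferase activity (RLU).
    standards_pM = [0.0, 1.0, 2.0, 4.0]
    standards_rlu = [50.0, 150.0, 250.0, 450.0]
    slope, intercept = fit_line(standards_pM, standards_rlu)
    print(concentration_from_rlu(350.0, slope, intercept))  # unknown sample, in pM
```

Readings that fall outside the span of the standards are better handled by diluting the sample and repeating the assay than by extending the line.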

B. Sensitivity of the TGF-β Assay Method

To identify the cell type most responsive to TGF-β for use in the methods of this invention, the p800neoLuc construct prepared in Example 1A was stably transfected as described in Example 2B into a variety of cell lines including MLE cells, HeLa, Chinese hamster ovary (CHO), GM7373 cells (chemically transformed fetal bovine aortic endothelial cells obtained from the NIGMS Human Genetic Mutant Cell Repository, Camden, NJ) and NIH 3T3 cells. After treatment of the transfected cell lines with recombinantly-produced TGF-β1, designated rTGF-β1, the cell lysates were assayed for luciferase activity and protein content. There was a linear relationship between the luciferase activity and the protein content of the cell lysates between 0.7 and 14 μg for all of the cell lines. Nontransfected parental cells demonstrated no detectable luciferase activity. Of the various cell lines, the transfected MLE cells demonstrated the greatest sensitivity to TGF-β. After cloning the transfected MLE cells by limiting dilution, cells from clone 32 (C32) were found to be the most sensitive and were used for all subsequent assays. C32 cells were sensitive to rTGF-β1, β2 and β3 in the picomolar (pM) to the nanomolar (nM) range as evidenced by increased luciferase activity in relative light units (RLU) as shown in Figure 2A. All three isoforms, rTGF-β1, rTGF-β2 and rTGF-β3, respectively graphed as closed squares, closed circles and closed triangles, demonstrated good dose-dependent responses, particularly at low TGF-β concentrations (<4 pM: 100 pg/ml) where the responses were essentially linear (Figure 2B). rTGF-β3 was the most potent inducer of luciferase activity, consistent with the observation that MLE cells were most sensitive to this isoform of TGF-β as described by van Zonneveld et al., Proc. Natl. Acad. Sci., USA, 85:5525-5529 (1988) (see also Figure 6 as described in Example 3E).

To further assess the dose-dependent responsiveness of luciferase activity by TGF-β induction, the TGF-β assay was performed with 8 pM of rTGF-β1, rTGF-β2 or rTGF-β3 in DMEM-BSA in the presence (partially filled squares) or absence (open squares) of 100 μg/ml of anti-TGF-β1, anti-TGF-β2 or anti-TGF-β3 monoclonal antibodies (Genzyme Corp., Cambridge, MA). As shown in Figure 2C, the induction of luciferase activity by rTGF-β1, rTGF-β2 and rTGF-β3 was inhibited by the addition of rTGF-β1, rTGF-β2 and rTGF-β3 neutralizing monoclonal antibodies as compared to the baseline induction obtained when using medium alone (filled squares).

The effects of cell culture medium, cell density and assay incubation time on the sensitivity of the TGF-β assay were also assessed. To test the effects of cell culture medium, the TGF-β assay was performed using increasing concentrations of rTGF-β1 in DMEM (closed squares), alpha-MEM (closed circles), CMEM (Eagle's medium supplemented with nonessential amino acids; closed triangles), or RPMI-1640 (closed diamonds). All media contained 0.1% BSA. The quantification of TGF-β in test samples was accomplished in the TGF-β assay in all tested media as shown in Figure 3A, although samples assayed in DMEM yielded the greatest luciferase activity. The effect of different cell plating densities on the induction of luciferase activity by rTGF-β1 was also examined when transfected cells were maintained in the presence of DMEM. For this assay, increasing concentrations of rTGF-β1 in DMEM and 0.1% BSA were measured using 3.2 x 10^4 (closed squares), 1.6 x 10^4 (closed circles), or 0.2 x 10^4 (closed triangles) C32 cells/well after a three hour attachment period. The test samples were maintained with the transfected cells for 14 hours prior to assaying for luciferase activity. The results graphed in Figure 3B show that 1.6 x 10^4 cells/well yielded the best overall results. Cell densities greater than 1.6 x 10^4 cells/well decreased the sensitivity of the assay at low TGF-β concentrations and did not significantly increase sensitivity at higher TGF-β3 levels. Decreasing the concentration of cells to 0.8 x 10^4 cells/well increased the sensitivity at low TGF-β3 levels (Figure 3D, inset in Figure 3C) but decreased sensitivity at higher TGF-β concentrations.

Unlike the traditional MLEC assay where the density of the cells prior to plating affects the sensitivity, there was little or no difference whether the cells were 70% confluent, confluent or 1 day post confluent prior to plating for the TGF-β assay. The cell attachment and incubation times, however, did affect the sensitivity. When C32 cells were plated for 2, 3 or 4 hours prior to the addition of samples, a 3 hour plating time appeared to be optimal. Shorter plating times decreased sensitivity, whereas longer times had little effect on the subsequent assay.

Incubation time with the sample also affected the assay. After a three hour attachment period, 1.6 x 10^4 C32 cells were incubated with various concentrations of rTGF-β1 ranging from 0 to 50 pM for 6 (closed squares), 14 (closed circles) or 22 hours (closed triangles) prior to assaying for luciferase activity as shown in Figure 3C. Incubation times of 12-14 hours were found to give the best results over the widest concentration range. The sensitivity of cells incubated for 6 hours was not as great at higher TGF-β1 concentrations, whereas the sensitivity of cells incubated for 22 hours was decreased at low TGF-β1 concentrations. There also appeared to be a slight decrease in sensitivity to TGF-β as the cells were repeatedly passaged (>30). This phenomenon was observed for the MLEC assay as well.

C. Specificity of the TGF-β Assay Method

After examining the sensitivity of the assay, the specificity of the TGF-β assay was then examined. Four known inducers of PAI-1 expression were incubated with C32 cells and the luciferase activity determined. The inducers tested included fibroblast growth factor (bFGF) (Saksela et al., J. Cell Biol., 105:957-963 (1987)), platelet-derived growth factor (PDGF-BB) (Reilly et al., J. Biol. Chem., 266:9419-9427 (1991)), interleukin-1 alpha (rIL-1alpha) (Schleef et al., J. Biol. Chem., 263:5797-5803 (1986)) and epidermal growth factor (EGF) (Seebacher et al., Exp. Cell Res., 203:504-507 (1992) and Sato et al., Exp. Cell Res., 204:223-229 (1993)). The assay was performed as described in Example 3A with DMEM-BSA containing rTGF-β1 (closed squares), recombinant human bFGF (closed circles), recombinant IL-1alpha (closed triangles), recombinant PDGF-BB (closed triangles) or EGF (open squares) ranging in concentration from 0.1 to 500 pM. As seen in Figure 4A, even at high concentrations of these factors (500 pM), there was little or no induction of luciferase expression except by PDGF, which demonstrated a slight induction.

Additional inducers of PAI-1, dexamethasone (10^-7 M), retinoic acid (1 μM), plasmin (0.1 U/ml), thrombin (1 U/ml), and the hematopoietic factors granulocyte colony stimulating factor (10 ng/ml; 525 pM), granulocyte-macrophage-colony stimulating factor (10 ng/ml; 690 pM), stem cell factor (50 ng/ml; 2.7 nM) and IL-3 (10 ng/ml; 666 pM), were also tested for their ability to induce luciferase expression in the assay method of this invention. Only plasmin and thrombin elicited minor elevations of luciferase activity, and these were inhibited by the addition of aprotinin or hirudin, respectively. Of the molecules tested in the TGF-β cell assay, only the TGF-βs demonstrated dose-dependent increases in luciferase expression. When these factors were tested in the presence of TGF-β1, a slightly different pattern emerged. These assays were performed with C32 cells maintained in DMEM/BSA containing 1 pM rTGF-β1 (closed squares) separately admixed with each of the growth factors, bFGF (closed circles), recombinant IL-1alpha (closed triangles), recombinant PDGF (closed diamonds) or EGF (open squares), ranging in concentration from 0.2 to 500 pM. The results, graphed in Figure 4B, show that high concentrations (500 pM) of PDGF-BB and rIL-1alpha increased the luciferase activity above that induced by TGF-β alone. bFGF had a similar effect that was observed at lower concentrations. This induction, maximal at 10 pM bFGF, was abrogated by the addition of bFGF neutralizing antibodies, and did not increase at higher concentrations (>10 nM) of bFGF.

Because this enhancement may have resulted from a bFGF-mediated increase in total cell number and/or protein, crystal violet staining of parallel cultures and protein assays of the cell lysates were performed. The normalization of the amount of protein using these values, however, did not reduce the luciferase activity in the bFGF plus rTGF-β1-treated cultures to that of cells treated with rTGF-β1 alone. Interestingly, uncloned transfected MLE cells were less sensitive to bFGF and other factors including TGF-β.

Additional TGF-β assays were performed using the ATCC deposited LUCI cell line containing the p1500Luc expression vector co-transfected with RSVneo as described in Example 2C to determine the specificity of activation of the PAI-1 promoter by other cell activating molecules (agents). The TGF-β assays were performed as described in Example 3A with the exception that the p1500Luc vector was used instead of the p800neoLuc vector. Controls in these assays included the use of two additional luciferase-expressing vectors that had the vitronectin (VN) and respiratory syncytial virus (RSV) promoters in place of the PAI-1 truncated promoter. The molecules used in the assays included the following (the source and concentrations are indicated in the parentheses): 1) human recombinant IL-6 (Boehringer Mannheim, Indianapolis, IN; 500 U/ml); 2) dexamethasone (Sigma Chemical Co.; 10^-5 M); 3) TGF-β (Berlix Biosciences; 1 ng/ml); 4) lipopolysaccharide (LPS) (Sigma Chemical Co.; 1 ng/ml); 5) human recombinant alpha tumor necrosis factor (TNF) (Boehringer Mannheim; 100 ng/ml); 6) human recombinant IL-1 (Sigma Chemical Co.; 50 U/ml); and 7) thrombin (NY State Department of Health, Albany, NY; 10 U/ml).

The assays were performed as indicated in Table 1, in which the fold induction is indicated as measured by relative light units of luciferase that resulted from the activation of either the PAI-1, VN or RSV promoters when exposed to the various agents.

The 1500 bp PAI-1 promoter present in the p1500Luc vector was slightly responsive to IL-6, LPS and a mixture of IL-6 plus dexamethasone. In contrast, the induction of luciferase expression in response to activation by TGF-β was 147-fold over that seen in the control untreated cells. Furthermore, IL-6 and IL-6 plus dexamethasone were effective activating agents when used in the presence of a vitronectin promoter. None of the agents were significantly effective at inducing expression from the RSV promoter.
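Fold induction, as used in the comparison above, is the ratio of luciferase activity in agent-treated cells to that in untreated control cells. The following sketch shows the calculation; the function name and the RLU readings are hypothetical and were chosen only so that the ratio matches the 147-fold figure quoted in the text.

```python
def fold_induction(treated_rlu, control_rlu):
    """Ratio of luciferase activity in treated cells to untreated controls."""
    if control_rlu <= 0:
        raise ValueError("control RLU must be positive to form a ratio")
    return treated_rlu / control_rlu


if __name__ == "__main__":
    # Hypothetical readings: untreated control wells and TGF-beta-treated wells.
    control = 100.0
    treated = 14_700.0
    print(f"{fold_induction(treated, control):.0f}-fold")
```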

These results confirm that TGF-β is the predominant activator of the PAI-1 promoter and that the TGF-β assay of this invention exhibits remarkable specificity. Thus, the assay is valuable in that TGF-β that has been purified, or even TGF-β present in unknown quantities in a complex solution containing many promoter-specific molecules, can be readily measured without confounding by contaminants. With the added control of pre-treating the liquid samples with neutralizing antibodies to TGF-β isoforms, the absolute amounts of TGF-β as well as the isoform type can be determined.

D. Effects of Serum for Quantifying TGF-β in the TGF-β Assay Method

To assess the effects of serum on the quantification of TGF-β, TGF-β assays were performed in the presence of DMEM-BSA containing rTGF-β1 alone (closed squares), or with 0.5% (closed circles), 1% (closed triangles), or 2% (closed diamonds) calf serum. The rTGF-β1 concentrations in the assays ranged from 0 to 8 pM. As shown in Figure 4C, serum enhanced the induction of the PAI/L construct by rTGF-β1 in a manner similar to that of the purified growth factors shown in Example 3C. At low rTGF-β1 concentrations (<1 pM), addition of 0.5, 1 or 2% serum had little effect on the luciferase activity. As the rTGF-β1 concentration was increased, the serum-containing curves were shifted upwards, possibly as a result of growth factors such as bFGF in the serum.

E. Comparison of the TGF-β Assay with the MLEC Assay and the Radioreceptor Assay for Quantifying TGF-β

The need to quantify TGF-β in a defined medium (DMEM-BSA) lacking growth factors or serum, as demonstrated in Example 3D, however, rarely arises in the laboratory. For this reason, TGF-β assays were also performed in COS, BSM and BAE cell conditioned medium (CM), all of which normally contain latent but little, if any, active TGF-β. These samples were tested using the TGF-β assay method of this invention in comparison with the MLEC assay (mink lung epithelial cell tritiated thymidine uptake assay).

The TGF-β assay was performed as described in Example 3A with rTGF-β1 ranging in concentration from 0 to 40 pM in the presence of either DMEM-BSA (closed squares), COS CM (crosses), BSM CM (closed triangles) or BAE CM (closed circles). To prepare conditioned medium, BAE cells were cultured in alphaMEM medium (Bio-Whittaker, Walkersville, MD) containing 5% fetal calf serum. BSM and COS cells were cultured in DMEM supplemented with 10% calf serum (Bio-Whittaker). Conditioned medium was prepared by a 24 hour incubation of the indicated cells with DMEM containing 0.1% pyrogen-poor BSA (weight/volume) (Pierce, Rockford, IL). All media were supplemented with L-glutamine (2 mM), penicillin G (100 U/ml) and streptomycin sulfate (100 μg/ml) (Irvine Scientific, Santa Ana, CA). The MLEC assay was performed essentially as described by Lucas et al., In Peptide Growth Factors, Barnes et al., Eds, Academic Press Inc. 198:303-316 (1991). Briefly, 100 μl aliquots of the samples were placed in 96-well plates containing 10^4 MLE cells per well in 100 μl of assay buffer (DMEM containing 0.25% fetal calf serum and 10 mM HEPES).

After 20 hours at 37°C, one μCi of 3H-thymidine (6.7 Ci/mmol, Du Pont Co., Boston, MA) in 20 μl of the assay buffer was added to each well, and the plates were incubated an additional 4 hours. The cells were harvested by incubation with 100 μl of 0.25% trypsin/1 mM EDTA at 37°C for 15 minutes, transferred onto glass fiber filters, and placed into vials containing liquid scintillation solution. The amount of radioactivity was quantified with a Beckman LS 3801 β-scintillation counter (Fullerton, CA). As clearly shown by the data indicated by the unbroken lines in Figure 5, both BAE and BSM CM contained factors that stimulated thymidine incorporation in the MLEC assay 5-6 fold. Only at rTGF-β1 levels greater than or equal to 1 pM was the 3H-thymidine incorporation suppressed to a level equal to that of non-conditioned medium (DMEM-BSA). In contrast, COS CM contained factors that strongly inhibited 3H-thymidine incorporation. With all three of these CM, calculation of TGF-β concentration would be very difficult using 3H-thymidine incorporation. In contrast, when different CM were used in the TGF-β assay, as indicated in Figure 5 by the data plotted with broken lines, there were also slight changes, but these differences were much less significant than those seen with the MLEC assay. BAE CM, which contains bFGF, shifted the response curve to higher values. BSM and COS CM had only minor effects on the standard curves.

When bFGF (closed diamonds), EGF (open circles), PDGF-BB (open triangles), rIL-1alpha (open squares), and the TGF-βs (rTGF-β1 (closed squares), rTGF-β2 (closed circles), and rTGF-β3 (closed triangles)) were tested for their ability to affect 3H-thymidine incorporation into non-transfected MLE cells in the MLEC assay performed as described above, more striking effects were observed as shown in Figure 6. The three TGF-β isoforms, especially TGF-β3, decreased 3H-thymidine incorporation as expected. IL-1alpha and PDGF-BB had little effect, but bFGF and EGF had strong dose-dependent stimulatory effects on 3H-thymidine incorporation. Such effects can make the MLEC assays inaccurate and difficult to analyze.

F. Quantitation of Total TGF-β Levels in Activated

In order to analyze total levels of TGF-β, BAE CM collected after 12 or 24 hours was heat treated at 80°C for 10-12 minutes to activate endogenous latent TGF-β as described by Brown et al., Growth Factors, 3:35-43 (1990). After cooling, the samples were diluted to 5, 10 or 20% of their original concentration with DMEM-BSA and were quantified using the TGF-β assay. TGF-β concentrations of 23.4±3.4 pM (12 hour CM) and 122.1±16 pM (24 hour CM) were determined via comparison with a rTGF-β standard reference curve generated from plotting the detected amounts of luciferase activity that resulted from a range of predetermined amounts of TGF-β as described in Example 3A.
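Because the heat-activated samples were assayed after dilution to 5, 10 or 20% of their original concentration, each reading taken from the reference curve has to be scaled back by its dilution. The following sketch shows that correction, assuming the reading for each dilution falls within the linear range of the curve; the function name and the measured values are hypothetical and are not data from this Example.

```python
def original_concentration(measured_pM, fraction_of_original):
    """Scale a reading on a diluted sample back to the undiluted sample.

    fraction_of_original is the dilution expressed as a fraction,
    for example 0.05 for a sample diluted to 5% of its original strength.
    """
    if not 0.0 < fraction_of_original <= 1.0:
        raise ValueError("fraction_of_original must be in (0, 1]")
    return measured_pM / fraction_of_original


if __name__ == "__main__":
    # Hypothetical curve readings (pM) for one CM at the three dilutions.
    readings = {0.05: 1.2, 0.10: 2.4, 0.20: 4.8}
    estimates = [original_concentration(pM, f) for f, pM in readings.items()]
    print(sum(estimates) / len(estimates))  # mean estimate for the undiluted CM
```

Agreement between the estimates obtained at different dilutions is a useful check that the sample contains nothing that distorts the dose response.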

The heat-activated CM were also assayed using the highly specific radioreceptor assay as described by Kojima et al., J. Cell. Physiol., 155:323-332 (1993), the disclosure of which is hereby incorporated by reference. Briefly, murine AKR-2B fibroblasts at 1 x 10^5 cells/well were plated in a 24-well plate in McCoy's 5A medium (Gibco BRL) supplemented with 5% fetal calf serum. The following day, the cells were washed 3 times with binding buffer (McCoy's 5A, 0.1% BSA, 25 mM HEPES at pH 7.4) and were pre-incubated in 250 μl of binding buffer for 1 hour at room temperature. The medium was removed, and the cells were incubated for 2 hours at room temperature in a mixture of 125 μl of binding buffer containing 50 pM 125I-rTGF-β1 and an equal volume of heat-activated (80°C for 10 minutes) BAE CM or serial dilutions of cold rTGF-β1. The cells were washed 3 times with binding buffer, and the bound radioactivity was solubilized in cell lysis buffer (Analytical Luminescence) and was measured in a Packard Multi-PRIAS 1 gamma counter (Meriden, CT). The radioreceptor assay was sensitive between 0.0004 and 2 nM rTGF-β1.

In the radioreceptor assay, concentrations of 24±1.1 pM (12 hour CM) and 128±48.8 pM (24 hour CM) were calculated. The essentially identical results obtained for the amount of TGF-β in conditioned medium with the TGF-β assay described above and with the radioreceptor assay verify the accuracy and specificity of the TGF-β assay of this invention.
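The agreement between the two methods can be expressed as a relative difference. The sketch below uses the mean concentrations reported above for the 12 hour and 24 hour CM; the choice of metric (absolute difference divided by the average of the two values) and the function name are assumptions made for illustration and are not part of the Example.

```python
def relative_difference_percent(a, b):
    """Absolute difference between two values as a percentage of their mean."""
    return abs(a - b) / ((a + b) / 2.0) * 100.0


if __name__ == "__main__":
    # Mean concentrations (pM) reported in the text:
    # (luciferase-based TGF-beta assay, radioreceptor assay)
    results = {"12 hour CM": (23.4, 24.0), "24 hour CM": (122.1, 128.0)}
    for label, (luciferase_pM, radioreceptor_pM) in results.items():
        diff = relative_difference_percent(luciferase_pM, radioreceptor_pM)
        print(f"{label}: {diff:.1f}% difference")
```

For both samples the difference between the means is smaller than the reported error of either measurement.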

Thus, a highly sensitive and specific, non-radioactive assay for mature TGF-β has now been developed. When compared to the sensitive and widely used MLEC method for measuring TGF-β concentration, the TGF-β assay was more rapid, had comparable sensitivity, and had a greater detection range. Specificity of this assay was also higher, as evidenced by its relative insensitivity to factors such as EGF and bFGF which can greatly affect other assays. The most remarkable example of the TGF-β assay specificity was observed with COS cell CM, which completely inhibited the MLEC assay while having no detrimental effects in the TGF-β assay.

In addition to the TGF-β assay of this invention and the MLEC and radioreceptor assays described herein, other assays have been used to detect mature TGF-β, including anchorage-independent growth assays, differentiation-based assays, cell migration and plasminogen activity assays, radioimmunoassays and enzyme-linked immunosorbent assays. Although all of these assays can detect mature TGF-β, the low concentrations of TGF-β, generally less than 2 pM, generated in many biological systems make many of them impractical without prior concentration of the sample, which can result in large losses of the mature growth factor or even activation of latent TGF-β. The TGF-β assay of this invention overcomes these deficiencies by being highly sensitive and specific as well as nonradioactive. The specificity and sensitivity of the assay are the result of using a truncated PAI-1 promoter beginning at -800 and extending through 76 of the PAI-1 5' promoter that retains two regions responsible for maximal response to TGF-β as described by Keeton et al., J. Biol. Chem., 266:23048-23052 (1991). Use of the complete PAI-1 promoter and upstream elements results in decreased specificity, as responsive elements for other molecules present in complex solutions may be activated or inhibited, deleteriously affecting the ability to quantify TGF-β. Moreover, the truncated PAI-1 promoter used above has been further fragmented into smaller, more specific TGF-β response elements as described in Example 4 to enhance specificity and increase the sensitivity of the TGF-β assay method. When the TGF-β assay is compared to the sensitive and widely used MLEC assay for quantifying TGF-β concentrations, the TGF-β assay was more rapid and had comparable sensitivity but a greater detection range. Specificity of the assay was also higher, as evidenced by the TGF-β assay's insensitivity to growth factors such as EGF and bFGF that have been shown to greatly affect other assays. The most striking example of the specificity of the TGF-β assay was observed with the COS cell line conditioned medium, which completely inhibited the MLEC assay while having no detrimental effects in the TGF-β assay as shown in Figure 5.

Although the TGF-β assay is not isoform specific, use of the appropriate standard reference curves and addition of neutralizing antibodies to the various TGF-β species allows for quantification of unique isoforms. While the TGF-β assay of this invention is highly specific, highly specific neutralizing antibodies to TGF-β were used to verify that no other molecules were present in test liquid samples that may have affected the quantitation of TGF-β in the assay. Considering its large range and specificity, this rapid, sensitive, non-radioactive, easily performed assay is of invaluable use in determining active TGF-β concentrations in complex solutions, particularly so with the use of parallel assays with neutralizing antibodies to TGF-β in complex unknown samples to verify that no other molecules are present that can affect the assay through either inhibition or activation of other regions of the truncated PAI-1 promoter.

4. Quantifying TGF-β with Cells Transiently Transformed with Expression Vectors Having Shorter Fragments of the PAI-1 Promoter Containing TGF-β Response Elements

The regulation of PAI-1 by TGF-β appears to affect a number of biological systems and the mechanism of transcriptional regulation by TGF-β has been studied by a number of groups. For example, the autoinduction of the TGF-β1 promoter suggests a feedback loop designed to amplify the response to TGF-β under certain conditions. This response was shown to involve specific AP-1 sites. AP-1 is a heterodimeric complex of Fos and Jun protein subunits that binds to specific DNA enhancer sites which have the consensus sequence TGASTCA (SEQ ID NO 26), where S can be either G or C. AP-1 is believed to mediate the transcriptional effects of the tumor promoting phorbol esters.

In contrast to these results, the TGF-β response sequence in the promoter for type 1 collagen has been localized to a sequence with homology to a nuclear factor 1 (NF-1) binding site. A number of different consensus sequences for NF-1 have been described and these include the sequences TGGN7GCCAA (SEQ ID NO 27), where N can be either A, C, G or T, and TGGCA (SEQ ID NO 28). The effect of TGF-β on the PAI-1 promoter has been studied, resulting in the demonstration that the responsive regions contain sequences with homology to the AP-1 consensus sequence.

To determine the role of AP-1 in the regulation of the PAI-1 promoter in more detail and to identify smaller TGF-β responsive regions within the PAI-1 promoter of the p800neoLuc expression vector prepared in Example 1 for use in quantifying TGF-β in Example 3, the effect of both TGF-β and AP-1 on the activity of a 25 bp fragment corresponding to the PAI-1 promoter between -674 and -650 in the 5' flanking region was evaluated. This fragment contained one of the AP-1 like sequences that responded to TGF-β. The expression vectors for use in assessing the requirement for AP-1, including the one containing the 25 bp fragment, were prepared as described in Example 1C.

A. TGF-β Activation of PAI-1 Promoter Fragments

AP-1 like sites are located within each of three regions of the 5' flanking region of the PAI-1 promoter: from -87 to -49, from -674 to -636 and from -740 to -703.

Oligonucleotides having portions or all of these regions were synthesized and cloned into a pUC-luciferase expressing plasmid containing the minimal promoter as described in Example 1C. The resultant plasmids were transiently transfected into recipient Hep3B cells as described in Example 2C and evaluated for their response to TGF-β as measured by luciferase expression as described in Example 3A. The plasmid designated p56Luc contained an oligonucleotide sequence that corresponded to -56 to -41 of the PAI-1 promoter (also referred to as region A) and conferred a 10-fold induction in response to TGF-β, as compared to a 3-fold induction obtained with a plasmid expression vector containing only the minimal promoter sequence.

Another plasmid designated p674Luc, deposited with ATCC and having ATCC Accession Number 75627, contained an oligonucleotide sequence 25 bp in length that corresponded to -674 to -650 of the PAI-1 promoter (also referred to as region B). This nucleotide sequence conferred a 70-fold induction on the minimal promoter. The plasmid designated p743Luc contained an oligonucleotide sequence 35 bp in length that corresponded to -743 to -708 of the PAI-1 promoter (also referred to as region C). This nucleotide sequence conferred a 35-fold induction on the promoter. The plasmid designated p732Luc exhibited 62-fold induction, while the plasmid p732HBV, having the hepatitis B virus (HBV) minimal promoter sequence instead of the PAI-1 sequence, exhibited 47-fold induction.

This result compares with a 6-fold basal induction from a control plasmid having only the HBV minimal promoter without any TGF-β response elements. The nucleotide sequences of the sense strand of the HBV minimal promoter-containing plasmid having or lacking the neomycin selectable marker gene are listed respectively in SEQ ID NOs 23 and 24. In parallel assays, the p800Luc plasmid that contained 3 AP-1 like sequences conferred greater than 150-fold induction of TGF-β responsiveness as compared to the minimal promoter sequence. The stably transformed p1500Luc similarly resulted in approximately 150-fold induction. These results, as well as the others presented in the Examples, represent the average of at least 4 independent experiments, each performed in duplicate.
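The fold-induction figures quoted above are ratios of luciferase signal with and without TGF-β, averaged over independent experiments run in duplicate. A minimal Python sketch of that arithmetic follows. The order of operations (average the duplicates, take the ratio per experiment, then average the ratios) is an assumption, since the specification does not state it, and the readings are hypothetical.

```python
from statistics import mean


def fold_induction(treated_rlu, untreated_rlu):
    """Luciferase signal with TGF-beta divided by the signal without it."""
    if untreated_rlu <= 0:
        raise ValueError("untreated signal must be positive")
    return treated_rlu / untreated_rlu


def mean_fold_induction(experiments):
    """Average fold induction over independent experiments.

    Each experiment is a pair (treated_duplicates, untreated_duplicates);
    duplicates are averaged within an experiment before the ratio is taken.
    """
    per_experiment = [
        fold_induction(mean(treated), mean(untreated))
        for treated, untreated in experiments
    ]
    return mean(per_experiment)


# Hypothetical readings from four experiments, each in duplicate.
example = [
    ([7000.0, 7000.0], [100.0, 100.0]),
    ([6000.0, 8000.0], [90.0, 110.0]),
    ([7200.0, 6800.0], [100.0, 100.0]),
    ([14000.0, 14000.0], [200.0, 200.0]),
]
print(mean_fold_induction(example))
```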

Regions A and C contained only a single AP-1 like sequence whereas region B contained 2 AP-1 like binding sequences. Thus, oligonucleotides containing AP-1 like sequences from each region were able to confer TGF-β responsiveness to a non-responsive minimal promoter.

B. Responsiveness of the TGF-β Responsive Regions A, B and C to c-fos/c-jun

In order to directly test the response of the p56Luc, p674Luc and p743Luc plasmids to AP-1, they were cotransfected into Hep3B cells with plasmids containing the mouse genes for c-fos and c-jun under the control of the RSV promoter. All three of these regions showed a dose dependent response to increasing amounts of c-fos/c-jun, with maximum responses seen using 0.1 μg/well of c-fos and c-jun plasmids.

This response was dependent on co-transfection of both plasmids since neither c-fos nor c-jun alone was able to cause this induction.

C. Detailed Analysis of the TGF-β Responsive Nucleotide Sequence in the PAI-1 Promoter from Nucleotide -743 to -708 (Region C)

To find the minimal TGF-β responsive sequence in the PAI-1 promoter region from nucleotide position -743 to -708, the sequence of which is listed in SEQ ID NO 16, two oligonucleotides were made, the first from the 3' side of region C which contained the AP-1 like sequence (C2: -723 to -708, corresponding to the sequence in SEQ ID NO 16 from 21 to 36) and the second from the remaining 5' sequence (C3: -743 to -727, corresponding to the sequence in SEQ ID NO 16 from 1 to 17). When the oligonucleotides were examined for response to TGF-β, neither the C2 nor the C3 sequence showed maximal induction with TGF-β (10-fold and 3-fold induction, respectively) as compared to region C itself (25-fold induction). This result suggested that a portion of a TGF-β responsive binding site located between -723 and -727 was deleted. The 5' side of C2 was then progressively extended to include bases between -723 and -728 (7-fold induction), but this did not improve the TGF-β response. However, when this region was extended another 4 bp there was a dramatic increase in the TGF-β response (63-fold induction), indicating that this region was crucial to this response.
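The correspondence between promoter coordinates and positions in SEQ ID NO 16 is a simple offset, which the following Python sketch makes explicit. The helper name is hypothetical, and the sketch assumes all coordinates lie upstream of the start site (negative), so that no coordinate numbering gap around the start site needs handling.

```python
def promoter_to_seq_position(coord, fragment_start):
    """Map a promoter coordinate to a 1-based position within a fragment
    whose first base sits at promoter coordinate fragment_start."""
    if coord < fragment_start:
        raise ValueError("coordinate lies 5' of the fragment")
    return coord - fragment_start + 1


REGION_C_START = -743  # first base of SEQ ID NO 16, per the text

c2 = (promoter_to_seq_position(-723, REGION_C_START),
      promoter_to_seq_position(-708, REGION_C_START))
c3 = (promoter_to_seq_position(-743, REGION_C_START),
      promoter_to_seq_position(-727, REGION_C_START))
print(c2, c3)  # (21, 36) (1, 17)
```

The computed positions agree with the ranges stated for C2 and C3, and place nucleotide -708 at position 36 of the listed sequence.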

D. Site-Specific Mutations of the PAI-1 Promoter from Nucleotide -732 to -708, Region C5

To assess the role of the AP-1 site compared to the 5' TGF-β responsive site, the response of the minimal promoter having the 5' flanking region of the PAI-1 promoter from -39 to +76 to direct stimulation with c-fos/c-jun was determined. It showed 10-fold induction with AP-1 compared to only 3-fold induction with TGF-β. When C5 was tested in a similar manner there was only a 2-fold increase above the vector background induced by c-fos/c-jun, compared to a greater than 20-fold increase above background seen with TGF-β (C5 itself showed 63-fold induction). Thus, although the wild type AP-1 site in C5 was only a relatively poor responsive sequence to c-fos/c-jun, this region still showed a strong response to TGF-β. The AP-1 site was therefore mutated to produce a consensus AP-1 sequence (TGACACA to TGAGTCA, SEQ ID NOs 29 and 30, respectively) and the response of the mutant to both c-fos/c-jun and TGF-β was compared. This mutation increased the AP-1 response from 19-fold to 105-fold but did not improve the TGF-β response. In fact, a consistent decrease was seen in the TGF-β response following this mutation (63-fold induction with TGF-β for the wild type AP-1 like site to 30-fold for the consensus AP-1 site).

The AP-1 like site was then mutated by changing the critical TGA bases, a change shown by others to decrease the activity of the AP-1 binding site. Although this mutation had the expected effect of abolishing the AP-1 response, it did not completely abolish the response of this construct to TGF-β (10-fold induction with c-fos/c-jun [i.e., vector background] but a 13-fold induction with TGF-β [i.e., 5-fold above vector background]). This result once again suggested that the 5' portion of C5 (-732 to -708) was more critical than the AP-1 like sequence in mediating the TGF-β response. To further test this hypothesis, 4 bp between -728 and -732 were mutated (the resultant mutated vector was designated C8) since the previous deletion results suggested that this sequence was critical to the TGF-β response. A 3 bp sequence between -726 and -728 was also mutated (the resultant vector was designated C9). As expected, both of these 5' mutations caused dramatic reductions in the response of C5 to TGF-β (60-fold to 4-fold for both C8 and C9). These changes had little effect on the AP-1 response, which decreased only slightly from 19-fold to 13-fold. A double mutation of both of these sites was also created and this abolished both the TGF-β and the AP-1 activity.

E. Heterologous Promoter Induction

To test whether the 25 bp oligonucleotide from the PAI-1 promoter region C5, -732 to -708 (SEQ ID NO 15), was able to activate a heterologous promoter, it was cloned into a hepatitis B viral promoter, the latter of which had the nucleotide sequence from -188 to +145 of the viral promoter (SEQ ID NO 19). Control experiments found that the viral promoter construct alone showed 28-fold induction with fos/jun. However, the viral promoter showed only 4-fold induction with TGF-β. Thus, even though the hepatitis B viral promoter had active AP-1 like sites, these were not sufficient for a strong TGF-β response. The region between -708 and -732 of the PAI-1 promoter (C5) was then cloned into the viral promoter and the resultant construct was tested as above. The 25 bp PAI-1 fragment was able to dramatically increase the TGF-β response of the viral promoter from 4-fold to 47-fold but did not alter the AP-1 response (25-fold compared to 26-fold). Finally, mutation of bases between -732 and -728 of the PAI-1 promoter oligonucleotide dramatically reduced the TGF-β induction of this fragment but did not lower the response to AP-1.

F. AP-1-Independent TGF-β Induction

To determine if the 5' portion of the -732 to -708 nucleotide sequence from the PAI-1 promoter could function independently of the AP-1 site in the TGF-β response, a 15 bp oligonucleotide containing bases between -732 and -718 (corresponding to the nucleotide sequence from position 1 to 15 in SEQ ID NO 17), which excludes the AP-1 like site, was cloned into a pUC-luciferase expression vector having the minimal PAI-1 promoter. This 15 bp sequence was able to confer 20-fold induction with TGF-β on the minimal PAI-1 promoter and did not show any AP-1 activity.

With regard to the AP-1 like sites involved in this response, unlike the consensus sequence for AP-1 (TGASTCA, where S is G or C (SEQ ID NO 26)), the most active sequences from the PAI-1 promoter all have the sequence TGA(N)ACA, where N is either A, C, G or T (SEQ ID NO 31) (PAI-1 promoter: -717 to -711 = TGACACA (SEQ ID NO 29); -659 to -653 = TGATACA (SEQ ID NO 32)). It is possible that the T to A substitution may affect the binding affinity enough to preferentially bind a protein other than c-fos/c-jun. This is consistent with the functional data on the AP-1 like site of the PAI-1 promoter (between -717 and -711), which indicate that the wild type sequence is a poor AP-1 binding site and yet is still important in the TGF-β response. The mutation and deletion data of the 25 bp sequence from the wild type PAI-1 promoter (-732 to -708) suggested that the 5' side of the oligonucleotide may contain a second binding site of importance in the TGF-β response. In fact, this region appeared to be more critical than the AP-1 sequence since mutation of this region almost completely abolished the TGF-β response even though the AP-1 region was intact. When this sequence alone was evaluated, it was able to act independently of the AP-1 site and promote strong TGF-β induction of the normally unresponsive minimal promoter. However, the full TGF-β response was dependent on the functional activity of both the AP-1 like site and the 5' site. When the 5' 15 bp sequence was compared to the other region of the PAI-1 promoter which also showed strong TGF-β induction (region B = 60-fold), a sequence was found that was common to both of these regions (CCNTGTNT, where N is either A, C, G or T (SEQ ID NO 33)).
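The difference between the PAI-1 sites and the AP-1 consensus can be checked position by position. The Python sketch below, offered only as an illustration with a hypothetical helper name, reports the 1-based positions at which a concrete site departs from a degenerate consensus, using the sequences quoted in the text.

```python
# Subset of the nucleotide ambiguity codes: only those used in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "N": "ACGT"}


def mismatches(site, consensus):
    """1-based positions where a concrete site departs from a degenerate consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return [
        i
        for i, (base, code) in enumerate(zip(site, consensus), start=1)
        if base not in IUPAC[code]
    ]


for site in ("TGACACA", "TGATACA"):
    # Compare each PAI-1 site with the AP-1 consensus and with TGA(N)ACA.
    print(site, mismatches(site, "TGASTCA"), mismatches(site, "TGANACA"))
```

Both PAI-1 sites fit TGA(N)ACA with no mismatch and differ from the AP-1 consensus at position 5, the T to A substitution discussed above; TGATACA additionally differs at position 4, where the consensus allows only G or C.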

In summary, the TGF-β response of the PAI-1 promoter has been localized to specific AP-1 like sites. However, the full TGF-β response of this region of the PAI-1 promoter is dependent on the interaction of two binding sites. The first site has homology to an AP-1 site but does not appear to bind AP-1. While this site is not essential, it is required for the full TGF-β induction from this region. The second site, located 5' to the AP-1 site, appears to be critical in the TGF-β response. This site is 15 bp in size and contains a motif that is present in both active regions of the PAI-1 promoter as well as in the most responsive region of the TGF-β promoter. This novel sequence does not appear to match any previously described transcription factor binding sites and may represent a new and specific binding site which is critical for a strong TGF-β response.

5. Deposit of Materials

The plasmids, p674Luc, p743Luc and p732Luc, were deposited on or before December 16, 1993, with the American Type Culture Collection, 1301 Parklawn Drive, Rockville, MD, USA (ATCC) and assigned the respective ATCC Accession Numbers ATCC 75627, ATCC 75626 and ATCC 75629. The cell line, Hep3B, stably transfected with plasmid p1500Luc to give a transformed cell line designated LUCI, was also deposited on or before December 16, 1993 with ATCC and assigned the ATCC Accession Number CRL 11508. The deposit thus provides plasmids and a stably transfected cell line containing plasmid p1500Luc. These deposits were made under the provisions of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purpose of Patent Procedure and the Regulations thereunder (Budapest Treaty). This assures maintenance of viable plasmids and cell lines for 30 years from the date of deposit. The plasmids and cell line will be made available by ATCC under the terms of the Budapest Treaty which assures permanent and unrestricted availability of the progeny of the culture to the public upon issuance of the pertinent U.S. patent or upon laying open to the public of any U.S. or foreign patent application, whichever comes first, and assures availability of the progeny to one determined by the U.S. Commissioner of Patents and Trademarks to be entitled thereto according to 35 U.S.C. §122 and the Commissioner's rules pursuant thereto (including 37 CFR §1.14 with particular reference to 886 OG 638). The assignee of the present application has agreed that if the plasmid or cell line deposits should die or be lost or destroyed when cultivated under suitable conditions, they will be promptly replaced on notification with a viable specimen of the same plasmid or cell culture. Availability of the deposited plasmids is not to be construed as a license to practice the invention in contravention of the rights granted under the authority of any government in accordance with its patent laws.

The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the invention. The present invention is not to be limited in scope by the plasmids deposited, since the deposited embodiment is intended as a single illustration of one aspect of the invention and any plasmids that are functionally equivalent are within the scope of this invention. The deposit of material does not constitute an admission that the written description herein contained is inadequate to enable the practice of any aspect of the invention, including the best mode thereof, nor is it to be construed as limiting the scope of the claims to the specific illustration that it represents. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and fall within the scope of the appended claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: The Scripps Research Institute

(B) STREET: 10666 North Torrey Pines Road

(C) CITY: La Jolla

(D) STATE: CA

(E) COUNTRY: USA

(F) POSTAL CODE (ZIP): 92037

(G) TELEPHONE: 619-554-2937

(H) TELEFAX: 619-554-6312

(ii) TITLE OF INVENTION: A NEW SENSITIVE METHOD FOR QUANTIFYING ACTIVE TRANSFORMING GROWTH FACTOR-BETA AND COMPOSITIONS THEREFOR

(iii) NUMBER OF SEQUENCES: 33

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(v) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: PCT/US 95/

(B) FILING DATE: 25-JAN-1995

(vi) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/188,227

(B) FILING DATE: 25-JAN-1994

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11293 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG [...] 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG [...] 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC [...] 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT [...] 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG [...] 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG [...] 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA [...] 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA [...] 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA [...] 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA [...] 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT [...] 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT [...] 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC [...] 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC [...] 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC [...] 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG [...] 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG [...] 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC [...] 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT [...] 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC [...] 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC [...] 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG [...] 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG [...] 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG [...] 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG [...] 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT [...] 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080

CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA CCACCCCACC CAGCACACCT 7140

CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC TCAGGGGCAC AGAGAGAGTC 7200

TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC CGGGCACATG GCAGGGATGA 7260

GGGAAAGACC AAGAGTCCTC TGTTGGGCCC AAGTCCTAGA CAGACAAAAC CTAGACAATC 7320

ACGTGGCTGG CTGCATGCCT GTGGCTGTTG GGCTGGGCAG GAGGAGGGAG GGGCGCTCTT 7380

TCCTGGAGGT GGTCCAGAGC ACCGGGTGGA CAGCCCTGGG GGAAAACTTC CACGTTTTGA 7440

TGGAGGTTAT CTTTGATAAC TCCACAGTGA CCTGGTTCGC CAAAGGAAAA GCAGGCAACG 7500

TGAGCTGTTT TTTTTTTCTC CAAGCTGAAC ACTAGGGGTC CTAGGCTTTT TGGGTCACCC 7560

GGCATGGCAG ACAGTCAACC TGGCAGGACA TCCGGGAGAG ACAGACACAG GCAGAGGGCA 7620

GAAAGGTCAA GGGAGGTTCT CAGGCCAAGG CTATTGGGGT TTGCTCAATT GTTCCTGAAT 7680

GCTCTTACAC ACGTACACAC ACAGAGCAGC ACACACACAC ACACACACAT GCCTCAGCAA 7740

GTCCCAGAGA GGGAGGTGTC GAGGGGGACC CGCTGGCTGT TCAGACGGAC TCCCAGAGCC 7800

AGTGAGTGGG TGGGGCTGGA ACATGAGTTC ATCTATTTCC TGCCCACATC TGGTATAAAA 7860

GGAGGCAGTG GCCCACAGAG GAGCACAGCT GTGTTTGGCT GCAGGGCCAA GAGCGCTGTC 7920

AAGAAGACCC ACACGCCCCC CTCCAGCAGC TGAATTCCAG CTGGCATTCC GGTACTGTTG 7980

GTAAAATGGA AGACGCCAAA AACATAAAGA AAGGCCCGGC GCCATTCTAT CCTCTAGAGG 8040

ATGGAACCGC TGGAGAGCAA CTGCATAAGG CTATGAAGAG ATACGCCCTG GTTCCTGGAA 8100

CAATTGCTTT TACAGATGCA CATATCGAGG TGAACATCAC GTACGCGGAA TACTTCGAAA 8160

TGTCCGTTCG GTTGGCAGAA GCTATGAAAC GATATGGGCT GAATACAAAT CACAGAATCG 8220

TCGTATGCAG TGAAAACTCT CTTCAATTCT TTATGCCGGT GTTGGGCGCG TTATTTATCG 8280

GAGTTGCAGT TGCGCCCGCG AACGACATTT ATAATGAACG TGAATTGCTC AACAGTATGA 8340

ACATTTCGCA GCCTACCGTA GTGTTTGTTT CCAAAAAGGG GTTGCAAAAA ATTTTGAACG 8400

TGCAAAAAAA ATTACCAATA ATCCAGAAAA TTATTATCAT GGATTCTAAA ACGGATTACC 8460

AGGGATTTCA GTCGATGTAC ACGTTCGTCA CATCTCATCT ACCTCCCGGT TTTAATGAAT 8520

ACGATTTTGT ACCAGAGTCC TTTGATCGTG ACAAAACAAT TGCACTGATA ATGAATTCCT 8580

CTGGATCTAC TGGGTTACCT AAGGGTGTGG CCCTTCCGCA TAGAACTGCC TGCGTCAGAT 8640

TCTCGCATGC CAGAGATCCT ATTTTTGGCA ATCAAATCAT TCCGGATACT GCGATTTTAA 8700

GTGTTGTTCC ATTCCATCAC GGTTTTGGAA TGTTTACTAC ACTCGGATAT TTGATATGTG 8760

GATTTCGAGT CGTCTTAATG TATAGATTTG AAGAAGAGCT GTTTTTACGA TCCCTTCAGG 8820

ATTACAAAAT TCAAAGTGCG TTGCTAGTAC CAACCCTATT TTCATTCTTC GCCAAAAGCA 8880

CTCTGATTGA CAAATACGAT TTATCTAATT TACACGAAAT TGCTTCTGGG GGCGCACCTC 8940

TTTCGAAAGA AGTCGGGGAA GCGGTTGCAA AACGCTTCCA TCTTCCAGGG ATACGACAAG 9000

GATATGGGCT CACTGAGACT ACATCAGCTA TTCTGATTAC ACCCGAGGGG GATGATAAAC 9060

CGGGCGCGGT CGGTAAAGTT GTTCCATTTT TTGAAGCGAA GGTTGTGGAT CTGGATACCG 9120

GGAAAACGCT GGGCGTTAAT CAGAGAGGCG AATTATGTGT CAGAGGACCT ATGATTATGT 9180

CCGGTTATGT AAACAATCCG GAAGCGACCA ACGCCTTGAT TGACAAGGAT GGATGGCTAC 9240

ATTCTGGAGA CATAGCTTAC TGGGACGAAG ACGAACACTT CTTCATAGTT GACCGCTTGA 9300

AGTCTTTAAT TAAATACAAA GGATATCAGG TGGCCCCCGC TGAATTGGAA TCGATATTGT 9360

TACAACACCC CAACATCTTC GACGCGGGCG TGGCAGGTCT TCCCGACGAT GACGCCGGTG 9420

AACTTCCCGC CGCCGTTGTT GTTTTGGAGC ACGGAAAGAC GATGACGGAA AAAGAGATCG 9480

TGGATTACGT CGCCAGTCAA GTAACAACCG CGAAAAAGTT GCGCGGAGGA GTTGTGTTTG 9540

TGGACGAAGT ACCGAAAGGT CTTACCGGAA AACTCGACGC AAGAAAAATC AGAGAGATCC 9600

TCATAAAGGC CAAGAAGGGC GGAAAGTCCA AATTGTAAAA TGTAACTGTA TTCAGCGATG 9660

ACGAAATTCT TAGCTATTGT AATGACTCTA GAGGATCTTT GTGAAGGAAC CTTACTTCTG 9720

TGGTGTGACA TAATTGGACA AACTACCTAC AGAGATTTAA AGCTCTAAGG TAAATATAAA 9780

ATTTTTAAGT GTATAATGTG TTAAACTACT GATTCTAATT GTTTGTGTAT TTTAGATTCC 9840

AACCTATGGA ACTGATGAAT GGGAGCAGTG GTGGAATGCC TTTAATGAGG AAAACCTGTT 9900

TTGCTCAGAA GAAATGCCAT CTAGTGATGA TGAGGCTACT GCTGACTCTC AACATTCTAC 9960

TCCTCCAAAA AAGAAGAGAA AGGTAGAAGA CCCCAAGGAC TTTCCTTCAG AATTGCTAAG 10020

TTTTTTGAGT CATGCTGTGT TTAGTAATAG AACTCTTGCT TGCTTTGCTA TTTACACCAC 10080

AAAGGAAAAA GCTGCACTGC TATACAAGAA AATTATGGAA AAATATTCTG TAACCTTTAT 10140

AAGTAGGCAT AACAGTTATA ATCATAACAT ACTGTTTTTT CTTACTCCAC ACAGGCATAG 10200

AGTGTCTGCT ATTAATAACT ATGCTCAAAA ATTGTGTACC TTTAGCTTTT TAATTTGTAA 10260

AGGGGTTAAT AAGGAATATT TGATGTATAG TGCCTTGACT AGAGATCATA ATCAGCCATA 10320

CCACATTTGT AGAGGTTTTA CTTGCTTTAA AAAACCTCCC ACACCTCCCC CTGAACCTGA 10380

AACATAAAAT GAATGCAATT GTTGTTGTTA ACTTGTTTAT TGCAGCTTAT AATGGTTACA 10440

AATAAAGCAA TAGCATCACA AATTTCACAA ATAAAGCATT TTTTTCACTG CATTCTAGTT 10500

GTGGTTTGTC CAAACTCATC AATGTATCTT ATCATGTCTG GATCCCCAGG AAGCTCCTCT 10560

GTGTCCTCAT AAACCCTAAC CTCCTCTACT TGAGAGGACA TTCCAATCAT AGGCTGCCCA 10620

TCCACCCTCT GTGTCCTCCT GTTAATTAGG TCACTTAACA AAAAGGAAAT TGGGTAGGGG 10680

TTTTTCACAG ACCGCTTTCT AAGGGTAATT TTAAAATATC TGGGAAGTCC CTTCCACTGC 10740

TGTGTTCCAG AAGTGTTGGT AAACAGCCCA CAAATGTCAA CAGCAGAAAC ATACAAGCTG 10800

TCAGCTTTGC ACAAGGGCCC AACACCCTGC TCAGCAAGAA GCACTGTGGT TGCTGTGTTA 10860

GTAATGTGCA AAACAGGAGG CACATTTTCC CCACCTGTGT AGGTTCCAAA ATATCTAGTG 10920

TTTTCATTTT TACTTGGATC AGGAACCCAG CACTCCACTG GATAAGCATT ATCCTTATCC 10980

AAAACAGCCT TGTGGTCAGT GTTCATCTGC TGACTGTCAA CTGTAGCATT TTTTGGGGTT 11040

ACAGTTTGAG CAGGATATTT GGTCCTGTAG TTTGCTAACA CACCCTGCAG CTCCAAAGGT 11100

TCCCCACCAA CAGCAAAAAA ATGAAAATTT GACCCTTGAA TGGGTTTTCC AGCACCATTT 11160

TCATGAGTTT TTTGTGTCCC TGAATGCAAG TTTAACATAG CAGTTACCCC AATAACCTCA 11220

GTTTTAACAG TAACAGCTTC CCACATCAAA ATATTTCCAC AGGTTAAGTC CTCATTTAAA 11280

TTAGGCAAAG GAA 11293

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10697 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080

CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA CCACCCCACC CAGCACACCT 7140

CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC TCAGGGGCAC AGAGAGAGTC 7200

TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC CGGGCACCCA CATCTGGTAT 7260

AAAAGGAGGC AGTGGCCCAC AGAGGAGCAC AGCTGTGTTT GGCTGCAGGG CCAAGAGCGC 7320

TGTCAAGAAG ACCCACACGC CCCCCTCCAG CAGCTGAATT CCAGCTGGCA TTCCGGTACT 7380

GTTGGTAAAA TGGAAGACGC CAAAAACATA AAGAAAGGCC CGGCGCCATT CTATCCTCTA 7440

GAGGATGGAA CCGCTGGAGA GCAACTGCAT AAGGCTATGA AGAGATACGC CCTGGTTCCT 7500

GGAACAATTG CTTTTACAGA TGCACATATC GAGGTGAACA TCACGTACGC GGAATACTTC 7560

GAAATGTCCG TTCGGTTGGC AGAAGCTATG AAACGATATG GGCTGAATAC AAATCACAGA 7620

ATCGTCGTAT GCAGTGAAAA CTCTCTTCAA TTCTTTATGC CGGTGTTGGG CGCGTTATTT 7680

ATCGGAGTTG CAGTTGCGCC CGCGAACGAC ATTTATAATG AACGTGAATT GCTCAACAGT 7740

ATGAACATTT CGCAGCCTAC CGTAGTGTTT GTTTCCAAAA AGGGGTTGCA AAAAATTTTG 7800

AACGTGCAAA AAAAATTACC AATAATCCAG AAAATTATTA TCATGGATTC TAAAACGGAT 7860

TACCAGGGAT TTCAGTCGAT GTACACGTTC GTCACATCTC ATCTACCTCC CGGTTTTAAT 7920

GAATACGATT TTGTACCAGA GTCCTTTGAT CGTGACAAAA CAATTGCACT GATAATGAAT 7980

TCCTCTGGAT CTACTGGGTT ACCTAAGGGT GTGGCCCTTC CGCATAGAAC TGCCTGCGTC 8040

AGATTCTCGC ATGCCAGAGA TCCTATTTTT GGCAATCAAA TCATTCCGGA TACTGCGATT 8100

TTAAGTGTTG TTCCATTCCA TCACGGTTTT GGAATGTTTA CTACACTCGG ATATTTGATA 8160

TGTGGATTTC GAGTCGTCTT AATGTATAGA TTTGAAGAAG AGCTGTTTTT ACGATCCCTT 8220

CAGGATTACA AAATTCAAAG TGCGTTGCTA GTACCAACCC TATTTTCATT CTTCGCCAAA 8280

AGCACTCTGA TTGACAAATA CGATTTATCT AATTTACACG AAATTGCTTC TGGGGGCGCA 8340

CCTCTTTCGA AAGAAGTCGG GGAAGCGGTT GCAAAACGCT TCCATCTTCC AGGGATACGA 8400

CAAGGATATG GGCTCACTGA GACTACATCA GCTATTCTGA TTACACCCGA GGGGGATGAT 8460

AAACCGGGCG CGGTCGGTAA AGTTGTTCCA TTTTTTGAAG CGAAGGTTGT GGATCTGGAT 8520

ACCGGGAAAA CGCTGGGCGT TAATCAGAGA GGCGAATTAT GTGTCAGAGG ACCTATGATT 8580

ATGTCCGGTT ATGTAAACAA TCCGGAAGCG ACCAACGCCT TGATTGACAA GGATGGATGG 8640

CTACATTCTG GAGACATAGC TTACTGGGAC GAAGACGAAC ACTTCTTCAT AGTTGACCGC 8700

TTGAAGTCTT TAATTAAATA CAAAGGATAT CAGGTGGCCC CCGCTGAATT GGAATCGATA 8760

TTGTTACAAC ACCCCAACAT CTTCGACGCG GGCGTGGCAG GTCTTCCCGA CGATGACGCC 8820

GGTGAACTTC CCGCCGCCGT TGTTGTTTTG GAGCACGGAA AGACGATGAC GGAAAAAGAG 8880

ATCGTGGATT ACGTCGCCAG TCAAGTAACA ACCGCGAAAA AGTTGCGCGG AGGAGTTGTG 8940

TTTGTGGACG AAGTACCGAA AGGTCTTACC GGAAAACTCG ACGCAAGAAA AATCAGAGAG 9000

ATCCTCATAA AGGCCAAGAA GGGCGGAAAG TCCAAATTGT AAAATGTAAC TGTATTCAGC 9060

GATGACGAAA TTCTTAGCTA TTGTAATGAC TCTAGAGGAT CTTTGTGAAG GAACCTTACT 9120

TCTGTGGTGT GACATAATTG GACAAACTAC CTACAGAGAT TTAAAGCTCT AAGGTAAATA 9180

TAAAATTTTT AAGTGTATAA TGTGTTAAAC TACTGATTCT AATTGTTTGT GTATTTTAGA 9240

TTCCAACCTA TGGAACTGAT GAATGGGAGC AGTGGTGGAA TGCCTTTAAT GAGGAAAACC 9300

TGTTTTGCTC AGAAGAAATG CCATCTAGTG ATGATGAGGC TACTGCTGAC TCTCAACATT 9360

CTACTCCTCC AAAAAAGAAG AGAAAGGTAG AAGACCCCAA GGACTTTCCT TCAGAATTGC 9420

TAAGTTTTTT GAGTCATGCT GTGTTTAGTA ATAGAACTCT TGCTTGCTTT GCTATTTACA 9480

CCACAAAGGA AAAAGCTGCA CTGCTATACA AGAAAATTAT GGAAAAATAT TCTGTAACCT 9540

TTATAAGTAG GCATAACAGT TATAATCATA ACATACTGTT TTTTCTTACT CCACACAGGC 9600

ATAGAGTGTC TGCTATTAAT AACTATGCTC AAAAATTGTG TACCTTTAGC TTTTTAATTT 9660

GTAAAGGGGT TAATAAGGAA TATTTGATGT ATAGTGCCTT GACTAGAGAT CATAATCAGC 9720

CATACCACAT TTGTAGAGGT TTTACTTGCT TTAAAAAACC TCCCACACCT CCCCCTGAAC 9780

CTGAAACATA AAATGAATGC AATTGTTGTT GTTAACTTGT TTATTGCAGC TTATAATGGT 9840

TACAAATAAA GCAATAGCAT CACAAATTTC ACAAATAAAG CATTTTTTTC ACTGCATTCT 9900

AGTTGTGGTT TGTCCAAACT CATCAATGTA TCTTATCATG TCTGGATCCC CAGGAAGCTC 9960

CTCTGTGTCC TCATAAACCC TAACCTCCTC TACTTGAGAG GACATTCCAA TCATAGGCTG 10020

CCCATCCACC CTCTGTGTCC TCCTGTTAAT TAGGTCACTT AACAAAAAGG AAATTGGGTA 10080

GGGGTTTTTC ACAGACCGCT TTCTAAGGGT AATTTTAAAA TATCTGGGAA GTCCCTTCCA 10140

CTGCTGTGTT CCAGAAGTGT TGGTAAACAG CCCACAAATG TCAACAGCAG AAACATACAA 10200

GCTGTCAGCT TTGCACAAGG GCCCAACACC CTGCTCAGCA AGAAGCACTG TGGTTGCTGT 10260

GTTAGTAATG TGCAAAACAG GAGGCACATT TTCCCCACCT GTGTAGGTTC CAAAATATCT 10320

AGTGTTTTCA TTTTTACTTG GATCAGGAAC CCAGCACTCC ACTGGATAAG CATTATCCTT 10380

ATCCAAAACA GCCTTGTGGT CAGTGTTCAT CTGCTGACTG TCAACTGTAG CATTTTTTGG 10440

GGTTACAGTT TGAGCAGGAT ATTTGGTCCT GTAGTTTGCT AACACACCCT GCAGCTCCAA 10500

AGGTTCCCCA CCAACAGCAA AAAAATGAAA ATTTGACCCT TGAATGGGTT TTCCAGCACC 10560

ATTTTCATGA GTTTTTTGTG TCCCTGAATG CAAGTTTAAC ATAGCAGTTA CCCCAATAAC 10620

CTCAGTTTTA ACAGTAACAG CTTCCCACAT CAAAATATTT CCACAGGTTA AGTCCTCATT 10680

TAAATTAGGC AAAGGAA 10697

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10549 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080

CAAGTTCATC TATTTCCTCC CACATCTGGT ATAAAAGGAG GCAGTGGCCC ACAGAGGAGC 7140

ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA AGACCCACAC GCCCCCCTCC 7200

AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA AATGGAAGAC GCCAAAAACA 7260

TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG AACCGCTGGA GAGCAACTGC 7320

ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGGAACAAT TGCTTTTACA GATGCACATA 7380

TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC CGTTCGGTTG GCAGAAGCTA 7440

TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT ATGCAGTGAA AACTCTCTTC 7500

AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT TGCAGTTGCG CCCGCGAACG 7560

ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT TTCGCAGCCT ACCGTAGTGT 7620

TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA AAAAAAATTA CCAATAATCC 7680

AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG ATTTCAGTCG ATGTACACGT 7740

TCGTCACATC TCATCTACCT CCCGGTTTTA ATGAATACGA TTTTGTACCA GAGTCCTTTG 7800

ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG ATCTACTGGG TTACCTAAGG 7860

GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC GCATGCCAGA GATCCTATTT 7920

TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT TGTTCCATTC CATCACGGTT 7980

TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT TCGAGTCGTC TTAATGTATA 8040

GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA CAAAATTCAA AGTGCGTTGC 8100

TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT GATTGACAAA TACGATTTAT 8160

CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC GAAAGAAGTC GGGGAAGCGG 8220

TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA TGGGCTCACT GAGACTACAT 8280

CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG CGCGGTCGGT AAAGTTGTTC 8340

CATTTTTTGA AGCGAAGGTT GTGGATCTGG ATACCGGGAA AACGCTGGGC GTTAATCAGA 8400

GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG TTATGTAAAC AATCCGGAAG 8460

CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC TGGAGACATA GCTTACTGGG 8520

ACGAAGACGA ACACTTCTTC ATAGTTGACC GCTTGAAGTC TTTAATTAAA TACAAAGGAT 8580

ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA ACACCCCAAC ATCTTCGACG 8640

CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT TCCCGCCGCC GTTGTTGTTT 8700

TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA TTACGTCGCC AGTCAAGTAA 8760

CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA CGAAGTACCG AAAGGTCTTA 8820

CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT AAAGGCCAAG AAGGGCGGAA 8880

AGTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA AATTCTTAGC TATTGTAATG 8940

ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT GTGACATAAT TGGACAAACT 9000

ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT TTAAGTGTAT AATGTGTTAA 9060

ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC TATGGAACTG ATGAATGGGA 9120

GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC TCAGAAGAAA TGCCATCTAG 9180

TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT CCAAAAAAGA AGAGAAAGGT 9240

AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT TTGAGTCATG CTGTGTTTAG 9300

TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG GAAAAAGCTG CACTGCTATA 9360

CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT AGGCATAACA GTTATAATCA 9420

TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG TCTGCTATTA ATAACTATGC 9480

TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG GTTAATAAGG AATATTTGAT 9540

GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC ATTTGTAGAG GTTTTACTTG 9600

CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA TAAAATGAAT GCAATTGTTG 9660

TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA AAGCAATAGC ATCACAAATT 9720

TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG TTTGTCCAAA CTCATCAATG 9780

TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT CCTCATAAAC CCTAACCTCC 9840

TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA CCCTCTGTGT CCTCCTGTTA 9900

ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGGTTTT TCACAGACCG CTTTCTAAGG 9960

GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG TTCCAGAAGT GTTGGTAAAC 10020

AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG CTTTGCACAA GGGCCCAACA 10080

CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA TGTGCAAAAC AGGAGGCACA 10140

TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT CATTTTTACT TGGATCAGGA 10200

ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA CAGCCTTGTG GTCAGTGTTC 10260

ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG TTTGAGCAGG ATATTTGGTC 10320

CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC CACCAACAGC AAAAAAATGA 10380

AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT GAGTTTTTTG TGTCCCTGAA 10440

TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT TAACAGTAAC AGCTTCCCAC 10500

ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG GCAAAGGAA 10549

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10558 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080

CAGTGGGGAG TCAGCCGTGT ATCATCGCCC ACATCTGGTA TAAAAGGAGG CAGTGGCCCA 7140

CAGAGGAGCA CAGCTGTGTT TGGCTGCAGG GCCAAGAGCG CTGTCAAGAA GACCCACACG 7200

CCCCCCTCCA GCAGCTGAAT TCCAGCTGGC ATTCCGGTAC TGTTGGTAAA ATGGAAGACG 7260

CCAAAAACAT AAAGAAAGGC CCGGCGCCAT TCTATCCTCT AGAGGATGGA ACCGCTGGAG 7320

AGCAACTGCA TAAGGCTATG AAGAGATACG CCCTGGTTCC TGGAACAATT GCTTTTACAG 7380

ATGCACATAT CGAGGTGAAC ATCACGTACG CGGAATACTT CGAAATGTCC GTTCGGTTGG 7440

CAGAAGCTAT GAAACGATAT GGGCTGAATA CAAATCACAG AATCGTCGTA TGCAGTGAAA 7500

ACTCTCTTCA ATTCTTTATG CCGGTGTTGG GCGCGTTATT TATCGGAGTT GCAGTTGCGC 7560

CCGCGAACGA CATTTATAAT GAACGTGAAT TGCTCAACAG TATGAACATT TCGCAGCCTA 7620

CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC AAAAAATTTT GAACGTGCAA AAAAAATTAC 7680

CAATAATCCA GAAAATTATT ATCATGGATT CTAAAACGGA TTACCAGGGA TTTCAGTCGA 7740

TGTACACGTT CGTCACATCT CATCTACCTC CCGGTTTTAA TGAATACGAT TTTGTACCAG 7800

AGTCCTTTGA TCGTGACAAA ACAATTGCAC TGATAATGAA TTCCTCTGGA TCTACTGGGT 7860

TACCTAAGGG TGTGGCCCTT CCGCATAGAA CTGCCTGCGT CAGATTCTCG CATGCCAGAG 7920

ATCCTATTTT TGGCAATCAA ATCATTCCGG ATACTGCGAT TTTAAGTGTT GTTCCATTCC 7980

ATCACGGTTT TGGAATGTTT ACTACACTCG GATATTTGAT ATGTGGATTT CGAGTCGTCT 8040

TAATGTATAG ATTTGAAGAA GAGCTGTTTT TACGATCCCT TCAGGATTAC AAAATTCAAA 8100

GTGCGTTGCT AGTACCAACC CTATTTTCAT TCTTCGCCAA AAGCACTCTG ATTGACAAAT 8160

ACGATTTATC TAATTTACAC GAAATTGCTT CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG 8220

GGGAAGCGGT TGCAAAACGC TTCCATCTTC CAGGGATACG ACAAGGATAT GGGCTCACTG 8280

AGACTACATC AGCTATTCTG ATTACACCCG AGGGGGATGA TAAACCGGGC GCGGTCGGTA 8340

AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG TGGATCTGGA TACCGGGAAA ACGCTGGGCG 8400

TTAATCAGAG AGGCGAATTA TGTGTCAGAG GACCTATGAT TATGTCCGGT TATGTAAACA 8460

ATCCGGAAGC GACCAACGCC TTGATTGACA AGGATGGATG GCTACATTCT GGAGACATAG 8520

CTTACTGGGA CGAAGACGAA CACTTCTTCA TAGTTGACCG CTTGAAGTCT TTAATTAAAT 8580

ACAAAGGATA TCAGGTGGCC CCCGCTGAAT TGGAATCGAT ATTGTTACAA CACCCCAACA 8640

TCTTCGACGC GGGCGTGGCA GGTCTTCCCG ACGATGACGC CGGTGAACTT CCCGCCGCCG 8700

TTGTTGTTTT GGAGCACGGA AAGACGATGA CGGAAAAAGA GATCGTGGAT TACGTCGCCA 8760

GTCAAGTAAC AACCGCGAAA AAGTTGCGCG GAGGAGTTGT GTTTGTGGAC GAAGTACCGA 8820

AAGGTCTTAC CGGAAAACTC GACGCAAGAA AAATCAGAGA GATCCTCATA AAGGCCAAGA 8880

AGGGCGGAAA GTCCAAATTG TAAAATGTAA CTGTATTCAG CGATGACGAA ATTCTTAGCT 8940

ATTGTAATGA CTCTAGAGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG TGACATAATT 9000

GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT TAAGTGTATA 9060

ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT ATGGAACTGA 9120

TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT 9180

GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC CAAAAAAGAA 9240

GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT TGAGTCATGC 9300

TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG AAAAAGCTGC 9360

ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA GGCATAACAG 9420

TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT CTGCTATTAA 9480

TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG TTAATAAGGA 9540

ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA TTTGTAGAGG 9600

TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT AAAATGAATG 9660

CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA AGCAATAGCA 9720

TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT TTGTCCAAAC 9780

TCATCAATGT ATCTTATCAT GTCTGGATCC CCAGGAAGCT CCTCTGTGTC CTCATAAACC 9840

CTAACCTCCT CTACTTGAGA GGACATTCCA ATCATAGGCT GCCCATCCAC CCTCTGTGTC 9900

CTCCTGTTAA TTAGGTCACT TAACAAAAAG GAAATTGGGT AGGGGTTTTT CACAGACCGC 9960

TTTCTAAGGG TAATTTTAAA ATATCTGGGA AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG 10020

TTGGTAAACA GCCCACAAAT GTCAACAGCA GAAACATACA AGCTGTCAGC TTTGCACAAG 10080

GGCCCAACAC CCTGCTCAGC AAGAAGCACT GTGGTTGCTG TGTTAGTAAT GTGCAAAACA 10140

GGAGGCACAT TTTCCCCACC TGTGTAGGTT CCAAAATATC TAGTGTTTTC ATTTTTACTT 10200

GGATCAGGAA CCCAGCACTC CACTGGATAA GCATTATCCT TATCCAAAAC AGCCTTGTGG 10260

TCAGTGTTCA TCTGCTGACT GTCAACTGTA GCATTTTTTG GGGTTACAGT TTGAGCAGGA 10320

TATTTGGTCC TGTAGTTTGC TAACACACCC TGCAGCTCCA AAGGTTCCCC ACCAACAGCA 10380

AAAAAATGAA AATTTGACCC TTGAATGGGT TTTCCAGCAC CATTTTCATG AGTTTTTTGT 10440

GTCCCTGAAT GCAAGTTTAA CATAGCAGTT ACCCCAATAA CCTCAGTTTT AACAGTAACA 10500

GCTTCCCACA TCAAAATATT TCCACAGGTT AAGTCCTCAT TTAAATTAGG CAAAGGAA 10558

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10569 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080

CACTCCAACC TCAGCCAGAC AAGGTTGTTG ACACAAGACC CACATCTGGT ATAAAAGGAG 7140

GCAGTGGCCC ACAGAGGAGC ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA 7200

AGACCCACAC GCCCCCCTCC AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA 7260

AATGGAAGAC GCCAAAAACA TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG 7320

AACCGCTGGA GAGCAACTGC ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGGAACAAT 7380

TGCTTTTACA GATGCACATA TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC 7440

CGTTCGGTTG GCAGAAGCTA TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT 7500

ATGCAGTGAA AACTCTCTTC AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT 7560

TGCAGTTGCG CCCGCGAACG ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT 7620

TTCGCAGCCT ACCGTAGTGT TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA 7680

AAAAAAATTA CCAATAATCC AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG 7740

ATTTCAGTCG ATGTACACGT TCGTCACATC TCATCTACCT CCCGGTTTTA ATGAATACGA 7800

TTTTGTACCA GAGTCCTTTG ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG 7860

ATCTACTGGG TTACCTAAGG GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC 7920

GCATGCCAGA GATCCTATTT TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT 7980

TGTTCCATTC CATCACGGTT TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT 8040

TCGAGTCGTC TTAATGTATA GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA 8100

CAAAATTCAA AGTGCGTTGC TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT 8160

GATTGACAAA TACGATTTAT CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC 8220

GAAAGAAGTC GGGGAAGCGG TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA 8280

TGGGCTCACT GAGACTACAT CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG 8340

CGCGGTCGGT AAAGTTGTTC CATTTTTTGA AGCGAAGGTT GTGGATCTGG ATACCGGGAA 8400

AACGCTGGGC GTTAATCAGA GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG 8460

TTATGTAAAC AATCCGGAAG CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC 8520

TGGAGACATA GCTTACTGGG ACGAAGACGA ACACTTCTTC ATAGTTGACC GCTTGAAGTC 8580

TTTAATTAAA TACAAAGGAT ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA 8640

ACACCCCAAC ATCTTCGACG CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT 8700

TCCCGCCGCC GTTGTTGTTT TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA 8760

TTACGTCGCC AGTCAAGTAA CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA 8820

CGAAGTACCG AAAGGTCTTA CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT 8880

AAAGGCCAAG AAGGGCGGAA AGTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA 8940

AATTCTTAGC TATTGTAATG ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT 9000

GTGACATAAT TGGACAAACT ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT 9060

TTAAGTGTAT AATGTGTTAA ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC 9120

TATGGAACTG ATGAATGGGA GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC 9180

TCAGAAGAAA TGCCATCTAG TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT 9240

CCAAAAAAGA AGAGAAAGGT AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT 9300

TTGAGTCATG CTGTGTTTAG TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG 9360

GAAAAAGCTG CACTGCTATA CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT 9420

AGGCATAACA GTTATAATCA TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG 9480

TCTGCTATTA ATAACTATGC TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG 9540

GTTAATAAGG AATATTTGAT GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC 9600

ATTTGTAGAG GTTTTACTTG CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA 9660

TAAAATGAAT GCAATTGTTG TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA 9720

AAGCAATAGC ATCACAAATT TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG 9780

TTTGTCCAAA CTCATCAATG TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT 9840

CCTCATAAAC CCTAACCTCC TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA 9900

CCCTCTGTGT CCTCCTGTTA ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGGTTTT 9960

TCACAGACCG CTTTCTAAGG GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG 10020

TTCCAGAAGT GTTGGTAAAC AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG 10080

CTTTGCACAA GGGCCCAACA CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA 10140

TGTGCAAAAC AGGAGGCACA TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT 10200

CATTTTTACT TGGATCAGGA ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA 10260

CAGCCTTGTG GTCAGTGTTC ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG 10320

TTTGAGCAGG ATATTTGGTC CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC 10380

CACCAACAGC AAAAAAATGA AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT 10440

GAGTTTTTTG TGTCCCTGAA TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT 10500

TAACAGTAAC AGCTTCCCAC ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG 10560

GCAAAGGAA 10569

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10558 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080

CAGCCAGACA AGGTTGTTGA CACAAGACCC ACATCTGGTA TAAAAGGAGG CAGTGGCCCA 7140

CAGAGGAGCA CAGCTGTGTT TGGCTGCAGG GCCAAGAGCG CTGTCAAGAA GACCCACACG 7200

CCCCCCTCCA GCAGCTGAAT TCCAGCTGGC ATTCCGGTAC TGTTGGTAAA ATGGAAGACG 7260

CCAAAAACAT AAAGAAAGGC CCGGCGCCAT TCTATCCTCT AGAGGATGGA ACCGCTGGAG 7320

AGCAACTGCA TAAGGCTATG AAGAGATACG CCCTGGTTCC TGGAACAATT GCTTTTACAG 7380

ATGCACATAT CGAGGTGAAC ATCACGTACG CGGAATACTT CGAAATGTCC GTTCGGTTGG 7440

CAGAAGCTAT GAAACGATAT GGGCTGAATA CAAATCACAG AATCGTCGTA TGCAGTGAAA 7500

ACTCTCTTCA ATTCTTTATG CCGGTGTTGG GCGCGTTATT TATCGGAGTT GCAGTTGCGC 7560

CCGCGAACGA CATTTATAAT GAACGTGAAT TGCTCAACAG TATGAACATT TCGCAGCCTA 7620

CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC AAAAAATTTT GAACGTGCAA AAAAAATTAC 7680

CAATAATCCA GAAAATTATT ATCATGGATT CTAAAACGGA TTACCAGGGA TTTCAGTCGA 7740

TGTACACGTT CGTCACATCT CATCTACCTC CCGGTTTTAA TGAATACGAT TTTGTACCAG 7800

AGTCCTTTGA TCGTGACAAA ACAATTGCAC TGATAATGAA TTCCTCTGGA TCTACTGGGT 7860

TACCTAAGGG TGTGGCCCTT CCGCATAGAA CTGCCTGCGT CAGATTCTCG CATGCCAGAG 7920

ATCCTATTTT TGGCAATCAA ATCATTCCGG ATACTGCGAT TTTAAGTGTT GTTCCATTCC 7980

ATCACGGTTT TGGAATGTTT ACTACACTCG GATATTTGAT ATGTGGATTT CGAGTCGTCT 8040

TAATGTATAG ATTTGAAGAA GAGCTGTTTT TACGATCCCT TCAGGATTAC AAAATTCAAA 8100

GTGCGTTGCT AGTACCAACC CTATTTTCAT TCTTCGCCAA AAGCACTCTG ATTGACAAAT 8160

ACGATTTATC TAATTTACAC GAAATTGCTT CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG 8220

GGGAAGCGGT TGCAAAACGC TTCCATCTTC CAGGGATACG ACAAGGATAT GGGCTCACTG 8280

AGACTACATC AGCTATTCTG ATTACACCCG AGGGGGATGA TAAACCGGGC GCGGTCGGTA 8340

AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG TGGATCTGGA TACCGGGAAA ACGCTGGGCG 8400

TTAATCAGAG AGGCGAATTA TGTGTCAGAG GACCTATGAT TATGTCCGGT TATGTAAACA 8460

ATCCGGAAGC GACCAACGCC TTGATTGACA AGGATGGATG GCTACATTCT GGAGACATAG 8520

CTTACTGGGA CGAAGACGAA CACTTCTTCA TAGTTGACCG CTTGAAGTCT TTAATTAAAT 8580

ACAAAGGATA TCAGGTGGCC CCCGCTGAAT TGGAATCGAT ATTGTTACAA CACCCCAACA 8640

TCTTCGACGC GGGCGTGGCA GGTCTTCCCG ACGATGACGC CGGTGAACTT CCCGCCGCCG 8700

TTGTTGTTTT GGAGCACGGA AAGACGATGA CGGAAAAAGA GATCGTGGAT TACGTCGCCA 8760

GTCAAGTAAC AACCGCGAAA AAGTTGCGCG GAGGAGTTGT GTTTGTGGAC GAAGTACCGA 8820

AAGGTCTTAC CGGAAAACTC GACGCAAGAA AAATCAGAGA GATCCTCATA AAGGCCAAGA 8880

AGGGCGGAAA GTCCAAATTG TAAAATGTAA CTGTATTCAG CGATGACGAA ATTCTTAGCT 8940

ATTGTAATGA CTCTAGAGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG TGACATAATT 9000

GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT TAAGTGTATA 9060

ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT ATGGAACTGA 9120

TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT 9180

GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC CAAAAAAGAA 9240

GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT TGAGTCATGC 9300

TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG AAAAAGCTGC 9360

ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA GGCATAACAG 9420

TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT CTGCTATTAA 9480

TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG TTAATAAGGA 9540

ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA TTTGTAGAGG 9600

TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT AAAATGAATG 9660

CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA AGCAATAGCA 9720

TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT TTGTCCAAAC 9780

TCATCAATGT ATCTTATCAT GTCTGGATCC CCAGGAAGCT CCTCTGTGTC CTCATAAACC 9840

CTAACCTCCT CTACTTGAGA GGACATTCCA ATCATAGGCT GCCCATCCAC CCTCTGTGTC 9900

CTCCTGTTAA TTAGGTCACT TAACAAAAAG GAAATTGGGT AGGGGTTTTT CACAGACCGC 9960

TTTCTAAGGG TAATTTTAAA ATATCTGGGA AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG 10020

TTGGTAAACA GCCCACAAAT GTCAACAGCA GAAACATACA AGCTGTCAGC TTTGCACAAG 10080

GGCCCAACAC CCTGCTCAGC AAGAAGCACT GTGGTTGCTG TGTTAGTAAT GTGCAAAACA 10140

GGAGGCACAT TTTCCCCACC TGTGTAGGTT CCAAAATATC TAGTGTTTTC ATTTTTACTT 10200

GGATCAGGAA CCCAGCACTC CACTGGATAA GCATTATCCT TATCCAAAAC AGCCTTGTGG 10260

TCAGTGTTCA TCTGCTGACT GTCAACTGTA GCATTTTTTG GGGTTACAGT TTGAGCAGGA 10320

TATTTGGTCC TGTAGTTTGC TAACACACCC TGCAGCTCCA AAGGTTCCCC ACCAACAGCA 10380

AAAAAATGAA AATTTGACCC TTGAATGGGT TTTCCAGCAC CATTTTCATG AGTTTTTTGT 10440

GTCCCTGAAT GCAAGTTTAA CATAGCAGTT ACCCCAATAA CCTCAGTTTT AACAGTAACA 10500

GCTTCCCACA TCAAAATATT TCCACAGGTT AAGTCCTCAT TTAAATTAGG CAAAGGAA 10558

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6245 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCAAG TTCATCTATT TCCTCCCACA TCTGGTATAA AAGGAGGCAG 2820

TGGCCCACAG AGGAGCACAG CTGTGTTTGG CTGCAGGGCC AAGAGCGCTG TCAAGAAGAC 2880

CCACACGCCC CCCTCCAGCA GCTGAATTCC AGCTGGCATT CCGGTACTGT TGGTAAAATG 2940

GAAGACGCCA AAAACATAAA GAAAGGCCCG GCGCCATTCT ATCCTCTAGA GGATGGAACC 3000

GCTGGAGAGC AACTGCATAA GGCTATGAAG AGATACGCCC TGGTTCCTGG AACAATTGCT 3060

TTTACAGATG CACATATCGA GGTGAACATC ACGTACGCGG AATACTTCGA AATGTCCGTT 3120

CGGTTGGCAG AAGCTATGAA ACGATATGGG CTGAATACAA ATCACAGAAT CGTCGTATGC 3180

AGTGAAAACT CTCTTCAATT CTTTATGCCG GTGTTGGGCG CGTTATTTAT CGGAGTTGCA 3240

GTTGCGCCCG CGAACGACAT TTATAATGAA CGTGAATTGC TCAACAGTAT GAACATTTCG 3300

CAGCCTACCG TAGTGTTTGT TTCCAAAAAG GGGTTGCAAA AAATTTTGAA CGTGCAAAAA 3360

AAATTACCAA TAATCCAGAA AATTATTATC ATGGATTCTA AAACGGATTA CCAGGGATTT 3420

CAGTCGATGT ACACGTTCGT CACATCTCAT CTACCTCCCG GTTTTAATGA ATACGATTTT 3480

GTACCAGAGT CCTTTGATCG TGACAAAACA ATTGCACTGA TAATGAATTC CTCTGGATCT 3540

ACTGGGTTAC CTAAGGGTGT GGCCCTTCCG CATAGAACTG CCTGCGTCAG ATTCTCGCAT 3600

GCCAGAGATC CTATTTTTGG CAATCAAATC ATTCCGGATA CTGCGATTTT AAGTGTTGTT 3660

CCATTCCATC ACGGTTTTGG AATGTTTACT ACACTCGGAT ATTTGATATG TGGATTTCGA 3720

GTCGTCTTAA TGTATAGATT TGAAGAAGAG CTGTTTTTAC GATCCCTTCA GGATTACAAA 3780

ATTCAAAGTG CGTTGCTAGT ACCAACCCTA TTTTCATTCT TCGCCAAAAG CACTCTGATT 3840

GACAAATACG ATTTATCTAA TTTACACGAA ATTGCTTCTG GGGGCGCACC TCTTTCGAAA 3900

GAAGTCGGGG AAGCGGTTGC AAAACGCTTC CATCTTCCAG GGATACGACA AGGATATGGG 3960

CTCACTGAGA CTACATCAGC TATTCTGATT ACACCCGAGG GGGATGATAA ACCGGGCGCG 4020

GTCGGTAAAG TTGTTCCATT TTTTGAAGCG AAGGTTGTGG ATCTGGATAC CGGGAAAACG 4080

CTGGGCGTTA ATCAGAGAGG CGAATTATGT GTCAGAGGAC CTATGATTAT GTCCGGTTAT 4140

GTAAACAATC CGGAAGCGAC CAACGCCTTG ATTGACAAGG ATGGATGGCT ACATTCTGGA 4200

GACATAGCTT ACTGGGACGA AGACGAACAC TTCTTCATAG TTGACCGCTT GAAGTCTTTA 4260

ATTAAATACA AAGGATATCA GGTGGCCCCC GCTGAATTGG AATCGATATT GTTACAACAC 4320

CCCAACATCT TCGACGCGGG CGTGGCAGGT CTTCCCGACG ATGACGCCGG TGAACTTCCC 4380

GCCGCCGTTG TTGTTTTGGA GCACGGAAAG ACGATGACGG AAAAAGAGAT CGTGGATTAC 4440

GTCGCCAGTC AAGTAACAAC CGCGAAAAAG TTGCGCGGAG GAGTTGTGTT TGTGGACGAA 4500

GTACCGAAAG GTCTTACCGG AAAACTCGAC GCAAGAAAAA TCAGAGAGAT CCTCATAAAG 4560

GCCAAGAAGG GCGGAAAGTC CAAATTGTAA AATGTAACTG TATTCAGCGA TGACGAAATT 4620

CTTAGCTATT GTAATGACTC TAGAGGATCT TTGTGAAGGA ACCTTACTTC TGTGGTGTGA 4680

CATAATTGGA CAAACTACCT ACAGAGATTT AAAGCTCTAA GGTAAATATA AAATTTTTAA 4740

GTGTATAATG TGTTAAACTA CTGATTCTAA TTGTTTGTGT ATTTTAGATT CCAACCTATG 4800

GAACTGATGA ATGGGAGCAG TGGTGGAATG CCTTTAATGA GGAAAACCTG TTTTGCTCAG 4860

AAGAAATGCC ATCTAGTGAT GATGAGGCTA CTGCTGACTC TCAACATTCT ACTCCTCCAA 4920

AAAAGAAGAG AAAGGTAGAA GACCCCAAGG ACTTTCCTTC AGAATTGCTA AGTTTTTTGA 4980

GTCATGCTGT GTTTAGTAAT AGAACTCTTG CTTGCTTTGC TATTTACACC ACAAAGGAAA 5040

AAGCTGCACT GCTATACAAG AAAATTATGG AAAAATATTC TGTAACCTTT ATAAGTAGGC 5100

ATAACAGTTA TAATCATAAC ATACTGTTTT TTCTTACTCC ACACAGGCAT AGAGTGTCTG 5160

CTATTAATAA CTATGCTCAA AAATTGTGTA CCTTTAGCTT TTTAATTTGT AAAGGGGTTA 5220

ATAAGGAATA TTTGATGTAT AGTGCCTTGA CTAGAGATCA TAATCAGCCA TACCACATTT 5280

GTAGAGGTTT TACTTGCTTT AAAAAACCTC CCACACCTCC CCCTGAACCT GAAACATAAA 5340

ATGAATGCAA TTGTTGTTGT TAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGC 5400

AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 5460

TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCCCCA GGAAGCTCCT CTGTGTCCTC 5520

ATAAACCCTA ACCTCCTCTA CTTGAGAGGA CATTCCAATC ATAGGCTGCC CATCCACCCT 5580

CTGTGTCCTC CTGTTAATTA GGTCACTTAA CAAAAAGGAA ATTGGGTAGG GGTTTTTCAC 5640

AGACCGCTTT CTAAGGGTAA TTTTAAAATA TCTGGGAAGT CCCTTCCACT GCTGTGTTCC 5700

AGAAGTGTTG GTAAACAGCC CACAAATGTC AACAGCAGAA ACATACAAGC TGTCAGCTTT 5760

GCACAAGGGC CCAACACCCT GCTCAGCAAG AAGCACTGTG GTTGCTGTGT TAGTAATGTG 5820

CAAAACAGGA GGCACATTTT CCCCACCTGT GTAGGTTCCA AAATATCTAG TGTTTTCATT 5880

TTTACTTGGA TCAGGAACCC AGCACTCCAC TGGATAAGCA TTATCCTTAT CCAAAACAGC 5940

CTTGTGGTCA GTGTTCATCT GCTGACTGTC AACTGTAGCA TTTTTTGGGG TTACAGTTTG 6000

AGCAGGATAT TTGGTCCTGT AGTTTGCTAA CACACCCTGC AGCTCCAAAG GTTCCCCACC 6060

AACAGCAAAA AAATGAAAAT TTGACCCTTG AATGGGTTTT CCAGCACCAT TTTCATGAGT 6120

TTTTTGTGTC CCTGAATGCA AGTTTAACAT AGCAGTTACC CCAATAACCT CAGTTTTAAC 6180

AGTAACAGCT TCCCACATCA AAATATTTCC ACAGGTTAAG TCCTCATTTA AATTAGGCAA 6240

AGGAA 6245

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6254 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCAGT GGGGAGTCAG CCGTGTATCA TCGCCCACAT CTGGTATAAA 2820

AGGAGGCAGT GGCCCACAGA GGAGCACAGC TGTGTTTGGC TGCAGGGCCA AGAGCGCTGT 2880

CAAGAAGACC CACACGCCCC CCTCCAGCAG CTGAATTCCA GCTGGCATTC CGGTACTGTT 2940

GGTAAAATGG AAGACGCCAA AAACATAAAG AAAGGCCCGG CGCCATTCTA TCCTCTAGAG 3000

GATGGAACCG CTGGAGAGCA ACTGCATAAG GCTATGAAGA GATACGCCCT GGTTCCTGGA 3060

ACAATTGCTT TTACAGATGC ACATATCGAG GTGAACATCA CGTACGCGGA ATACTTCGAA 3120

ATGTCCGTTC GGTTGGCAGA AGCTATGAAA CGATATGGGC TGAATACAAA TCACAGAATC 3180

GTCGTATGCA GTGAAAACTC TCTTCAATTC TTTATGCCGG TGTTGGGCGC GTTATTTATC 3240

GGAGTTGCAG TTGCGCCCGC GAACGACATT TATAATGAAC GTGAATTGCT CAACAGTATG 3300

AACATTTCGC AGCCTACCGT AGTGTTTGTT TCCAAAAAGG GGTTGCAAAA AATTTTGAAC 3360

GTGCAAAAAA AATTACCAAT AATCCAGAAA ATTATTATCA TGGATTCTAA AACGGATTAC 3420

CAGGGATTTC AGTCGATGTA CACGTTCGTC ACATCTCATC TACCTCCCGG TTTTAATGAA 3480

TACGATTTTG TACCAGAGTC CTTTGATCGT GACAAAACAA TTGCACTGAT AATGAATTCC 3540

TCTGGATCTA CTGGGTTACC TAAGGGTGTG GCCCTTCCGC ATAGAACTGC CTGCGTCAGA 3600

TTCTCGCATG CCAGAGATCC TATTTTTGGC AATCAAATCA TTCCGGATAC TGCGATTTTA 3660

AGTGTTGTTC CATTCCATCA CGGTTTTGGA ATGTTTACTA CACTCGGATA TTTGATATGT 3720

GGATTTCGAG TCGTCTTAAT GTATAGATTT GAAGAAGAGC TGTTTTTACG ATCCCTTCAG 3780

GATTACAAAA TTCAAAGTGC GTTGCTAGTA CCAACCCTAT TTTCATTCTT CGCCAAAAGC 3840

ACTCTGATTG ACAAATACGA TTTATCTAAT TTACACGAAA TTGCTTCTGG GGGCGCACCT 3900

CTTTCGAAAG AAGTCGGGGA AGCGGTTGCA AAACGCTTCC ATCTTCCAGG GATACGACAA 3960

GGATATGGGC TCACTGAGAC TACATCAGCT ATTCTGATTA CACCCGAGGG GGATGATAAA 4020

CCGGGCGCGG TCGGTAAAGT TGTTCCATTT TTTGAAGCGA AGGTTGTGGA TCTGGATACC 4080

GGGAAAACGC TGGGCGTTAA TCAGAGAGGC GAATTATGTG TCAGAGGACC TATGATTATG 4140

TCCGGTTATG TAAACAATCC GGAAGCGACC AACGCCTTGA TTGACAAGGA TGGATGGCTA 4200

CATTCTGGAG ACATAGCTTA CTGGGACGAA GACGAACACT TCTTCATAGT TGACCGCTTG 4260

AAGTCTTTAA TTAAATACAA AGGATATCAG GTGGCCCCCG CTGAATTGGA ATCGATATTG 4320

TTACAACACC CCAACATCTT CGACGCGGGC GTGGCAGGTC TTCCCGACGA TGACGCCGGT 4380

GAACTTCCCG CCGCCGTTGT TGTTTTGGAG CACGGAAAGA CGATGACGGA AAAAGAGATC 4440

GTGGATTACG TCGCCAGTCA AGTAACAACC GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT 4500

GTGGACGAAG TACCGAAAGG TCTTACCGGA AAACTCGACG CAAGAAAAAT CAGAGAGATC 4560

CTCATAAAGG CCAAGAAGGG CGGAAAGTCC AAATTGTAAA ATGTAACTGT ATTCAGCGAT 4620

GACGAAATTC TTAGCTATTG TAATGACTCT AGAGGATCTT TGTGAAGGAA CCTTACTTCT 4680

GTGGTGTGAC ATAATTGGAC AAACTACCTA CAGAGATTTA AAGCTCTAAG GTAAATATAA 4740

AATTTTTAAG TGTATAATGT GTTAAACTAC TGATTCTAAT TGTTTGTGTA TTTTAGATTC 4800

CAACCTATGG AACTGATGAA TGGGAGCAGT GGTGGAATGC CTTTAATGAG GAAAACCTGT 4860

TTTGCTCAGA AGAAATGCCA TCTAGTGATG ATGAGGCTAC TGCTGACTCT CAACATTCTA 4920

CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG ACCCCAAGGA CTTTCCTTCA GAATTGCTAA 4980

GTTTTTTGAG TCATGCTGTG TTTAGTAATA GAACTCTTGC TTGCTTTGCT ATTTACACCA 5040

CAAAGGAAAA AGCTGCACTG CTATACAAGA AAATTATGGA AAAATATTCT GTAACCTTTA 5100

TAAGTAGGCA TAACAGTTAT AATCATAACA TACTGTTTTT TCTTACTCCA CACAGGCATA 5160

GAGTGTCTGC TATTAATAAC TATGCTCAAA AATTGTGTAC CTTTAGCTTT TTAATTTGTA 5220

AAGGGGTTAA TAAGGAATAT TTGATGTATA GTGCCTTGAC TAGAGATCAT AATCAGCCAT 5280

ACCACATTTG TAGAGGTTTT ACTTGCTTTA AAAAACCTCC CACACCTCCC CCTGAACCTG 5340

AAACATAAAA TGAATGCAAT TGTTGTTGTT AACTTGTTTA TTGCAGCTTA TAATGGTTAC 5400

AAATAAAGCA ATAGCATCAC AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT 5460

TGTGGTTTGT CCAAACTCAT CAATGTATCT TATCATGTCT GGATCCCCAG GAAGCTCCTC 5520

TGTGTCCTCA TAAACCCTAA CCTCCTCTAC TTGAGAGGAC ATTCCAATCA TAGGCTGCCC 5580

ATCCACCCTC TGTGTCCTCC TGTTAATTAG GTCACTTAAC AAAAAGGAAA TTGGGTAGGG 5640

GTTTTTCACA GACCGCTTTC TAAGGGTAAT TTTAAAATAT CTGGGAAGTC CCTTCCACTG 5700

CTGTGTTCCA GAAGTGTTGG TAAACAGCCC ACAAATGTCA ACAGCAGAAA CATACAAGCT 5760

GTCAGCTTTG CACAAGGGCC CAACACCCTG CTCAGCAAGA AGCACTGTGG TTGCTGTGTT 5820

AGTAATGTGC AAAACAGGAG GCACATTTTC CCCACCTGTG TAGGTTCCAA AATATCTAGT 5880

GTTTTCATTT TTACTTGGAT CAGGAACCCA GCACTCCACT GGATAAGCAT TATCCTTATC 5940

CAAAACAGCC TTGTGGTCAG TGTTCATCTG CTGACTGTCA ACTGTAGCAT TTTTTGGGGT 6000

TACAGTTTGA GCAGGATATT TGGTCCTGTA GTTTGCTAAC ACACCCTGCA GCTCCAAAGG 6060

TTCCCCACCA ACAGCAAAAA AATGAAAATT TGACCCTTGA ATGGGTTTTC CAGCACCATT 6120

TTCATGAGTT TTTTGTGTCC CTGAATGCAA GTTTAACATA GCAGTTACCC CAATAACCTC 6180

AGTTTTAACA GTAACAGCTT CCCACATCAA AATATTTCCA CAGGTTAAGT CCTCATTTAA 6240

ATTAGGCAAA GGAA 6254

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6265 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCACT CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGACCCACA 2820

TCTGGTATAA AAGGAGGCAG TGGCCCACAG AGGAGCACAG CTGTGTTTGG CTGCAGGGCC 2880

AAGAGCGCTG TCAAGAAGAC CCACACGCCC CCCTCCAGCA GCTGAATTCC AGCTGGCATT 2940

CCGGTACTGT TGGTAAAATG GAAGACGCCA AAAACATAAA GAAAGGCCCG GCGCCATTCT 3000

ATCCTCTAGA GGATGGAACC GCTGGAGAGC AACTGCATAA GGCTATGAAG AGATACGCCC 3060

TGGTTCCTGG AACAATTGCT TTTACAGATG CACATATCGA GGTGAACATC ACGTACGCGG 3120

AATACTTCGA AATGTCCGTT CGGTTGGCAG AAGCTATGAA ACGATATGGG CTGAATACAA 3180

ATCACAGAAT CGTCGTATGC AGTGAAAACT CTCTTCAATT CTTTATGCCG GTGTTGGGCG 3240

CGTTATTTAT CGGAGTTGCA GTTGCGCCCG CGAACGACAT TTATAATGAA CGTGAATTGC 3300

TCAACAGTAT GAACATTTCG CAGCCTACCG TAGTGTTTGT TTCCAAAAAG GGGTTGCAAA 3360

AAATTTTGAA CGTGCAAAAA AAATTACCAA TAATCCAGAA AATTATTATC ATGGATTCTA 3420

AAACGGATTA CCAGGGATTT CAGTCGATGT ACACGTTCGT CACATCTCAT CTACCTCCCG 3480

GTTTTAATGA ATACGATTTT GTACCAGAGT CCTTTGATCG TGACAAAACA ATTGCACTGA 3540

TAATGAATTC CTCTGGATCT ACTGGGTTAC CTAAGGGTGT GGCCCTTCCG CATAGAACTG 3600

CCTGCGTCAG ATTCTCGCAT GCCAGAGATC CTATTTTTGG CAATCAAATC ATTCCGGATA 3660

CTGCGATTTT AAGTGTTGTT CCATTCCATC ACGGTTTTGG AATGTTTACT ACACTCGGAT 3720

ATTTGATATG TGGATTTCGA GTCGTCTTAA TGTATAGATT TGAAGAAGAG CTGTTTTTAC 3780

GATCCCTTCA GGATTACAAA ATTCAAAGTG CGTTGCTAGT ACCAACCCTA TTTTCATTCT 3840

TCGCCAAAAG CACTCTGATT GACAAATACG ATTTATCTAA TTTACACGAA ATTGCTTCTG 3900

GGGGCGCACC TCTTTCGAAA GAAGTCGGGG AAGCGGTTGC AAAACGCTTC CATCTTCCAG 3960

GGATACGACA AGGATATGGG CTCACTGAGA CTACATCAGC TATTCTGATT ACACCCGAGG 4020

GGGATGATAA ACCGGGCGCG GTCGGTAAAG TTGTTCCATT TTTTGAAGCG AAGGTTGTGG 4080

ATCTGGATAC CGGGAAAACG CTGGGCGTTA ATCAGAGAGG CGAATTATGT GTCAGAGGAC 4140

CTATGATTAT GTCCGGTTAT GTAAACAATC CGGAAGCGAC CAACGCCTTG ATTGACAAGG 4200

ATGGATGGCT ACATTCTGGA GACATAGCTT ACTGGGACGA AGACGAACAC TTCTTCATAG 4260

TTGACCGCTT GAAGTCTTTA ATTAAATACA AAGGATATCA GGTGGCCCCC GCTGAATTGG 4320

AATCGATATT GTTACAACAC CCCAACATCT TCGACGCGGG CGTGGCAGGT CTTCCCGACG 4380

ATGACGCCGG TGAACTTCCC GCCGCCGTTG TTGTTTTGGA GCACGGAAAG ACGATGACGG 4440

AAAAAGAGAT CGTGGATTAC GTCGCCAGTC AAGTAACAAC CGCGAAAAAG TTGCGCGGAG 4500

GAGTTGTGTT TGTGGACGAA GTACCGAAAG GTCTTACCGG AAAACTCGAC GCAAGAAAAA 4560

TCAGAGAGAT CCTCATAAAG GCCAAGAAGG GCGGAAAGTC CAAATTGTAA AATGTAACTG 4620

TATTCAGCGA TGACGAAATT CTTAGCTATT GTAATGACTC TAGAGGATCT TTGTGAAGGA 4680

ACCTTACTTC TGTGGTGTGA CATAATTGGA CAAACTACCT ACAGAGATTT AAAGCTCTAA 4740

GGTAAATATA AAATTTTTAA GTGTATAATG TGTTAAACTA CTGATTCTAA TTGTTTGTGT 4800

ATTTTAGATT CCAACCTATG GAACTGATGA ATGGGAGCAG TGGTGGAATG CCTTTAATGA 4860

GGAAAACCTG TTTTGCTCAG AAGAAATGCC ATCTAGTGAT GATGAGGCTA CTGCTGACTC 4920

TCAACATTCT ACTCCTCCAA AAAAGAAGAG AAAGGTAGAA GACCCCAAGG ACTTTCCTTC 4980

AGAATTGCTA AGTTTTTTGA GTCATGCTGT GTTTAGTAAT AGAACTCTTG CTTGCTTTGC 5040

TATTTACACC ACAAAGGAAA AAGCTGCACT GCTATACAAG AAAATTATGG AAAAATATTC 5100

TGTAACCTTT ATAAGTAGGC ATAACAGTTA TAATCATAAC ATACTGTTTT TTCTTACTCC 5160

ACACAGGCAT AGAGTGTCTG CTATTAATAA CTATGCTCAA AAATTGTGTA CCTTTAGCTT 5220

TTTAATTTGT AAAGGGGTTA ATAAGGAATA TTTGATGTAT AGTGCCTTGA CTAGAGATCA 5280

TAATCAGCCA TACCACATTT GTAGAGGTTT TACTTGCTTT AAAAAACCTC CCACACCTCC 5340

CCCTGAACCT GAAACATAAA ATGAATGCAA TTGTTGTTGT TAACTTGTTT ATTGCAGCTT 5400

ATAATGGTTA CAAATAAAGC AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC 5460

TGCATTCTAG TTGTGGTTTG TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCCCCA 5520

GGAAGCTCCT CTGTGTCCTC ATAAACCCTA ACCTCCTCTA CTTGAGAGGA CATTCCAATC 5580

ATAGGCTGCC CATCCACCCT CTGTGTCCTC CTGTTAATTA GGTCACTTAA CAAAAAGGAA 5640

ATTGGGTAGG GGTTTTTCAC AGACCGCTTT CTAAGGGTAA TTTTAAAATA TCTGGGAAGT 5700

CCCTTCCACT GCTGTGTTCC AGAAGTGTTG GTAAACAGCC CACAAATGTC AACAGCAGAA 5760

ACATACAAGC TGTCAGCTTT GCACAAGGGC CCAACACCCT GCTCAGCAAG AAGCACTGTG 5820

GTTGCTGTGT TAGTAATGTG CAAAACAGGA GGCACATTTT CCCCACCTGT GTAGGTTCCA 5880

AAATATCTAG TGTTTTCATT TTTACTTGGA TCAGGAACCC AGCACTCCAC TGGATAAGCA 5940

TTATCCTTAT CCAAAACAGC CTTGTGGTCA GTGTTCATCT GCTGACTGTC AACTGTAGCA 6000

TTTTTTGGGG TTACAGTTTG AGCAGGATAT TTGGTCCTGT AGTTTGCTAA CACACCCTGC 6060

AGCTCCAAAG GTTCCCCACC AACAGCAAAA AAATGAAAAT TTGACCCTTG AATGGGTTTT 6120

CCAGCACCAT TTTCATGAGT TTTTTGTGTC CCTGAATGCA AGTTTAACAT AGCAGTTACC 6180

CCAATAACCT CAGTTTTAAC AGTAACAGCT TCCCACATCA AAATATTTCC ACAGGTTAAG 6240

TCCTCATTTA AATTAGGCAA AGGAA 6265

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6254 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCAGC CAGACAAGGT TGTTGACACA AGACCCACAT CTGGTATAAA 2820

AGGAGGCAGT GGCCCACAGA GGAGCACAGC TGTGTTTGGC TGCAGGGCCA AGAGCGCTGT 2880

CAAGAAGACC CACACGCCCC CCTCCAGCAG CTGAATTCCA GCTGGCATTC CGGTACTGTT 2940

GGTAAAATGG AAGACGCCAA AAACATAAAG AAAGGCCCGG CGCCATTCTA TCCTCTAGAG 3000

GATGGAACCG CTGGAGAGCA ACTGCATAAG GCTATGAAGA GATACGCCCT GGTTCCTGGA 3060

ACAATTGCTT TTACAGATGC ACATATCGAG GTGAACATCA CGTACGCGGA ATACTTCGAA 3120

ATGTCCGTTC GGTTGGCAGA AGCTATGAAA CGATATGGGC TGAATACAAA TCACAGAATC 3180

GTCGTATGCA GTGAAAACTC TCTTCAATTC TTTATGCCGG TGTTGGGCGC GTTATTTATC 3240

GGAGTTGCAG TTGCGCCCGC GAACGACATT TATAATGAAC GTGAATTGCT CAACAGTATG 3300

AACATTTCGC AGCCTACCGT AGTGTTTGTT TCCAAAAAGG GGTTGCAAAA AATTTTGAAC 3360

GTGCAAAAAA AATTACCAAT AATCCAGAAA ATTATTATCA TGGATTCTAA AACGGATTAC 3420

CAGGGATTTC AGTCGATGTA CACGTTCGTC ACATCTCATC TACCTCCCGG TTTTAATGAA 3480

TACGATTTTG TACCAGAGTC CTTTGATCGT GACAAAACAA TTGCACTGAT AATGAATTCC 3540

TCTGGATCTA CTGGGTTACC TAAGGGTGTG GCCCTTCCGC ATAGAACTGC CTGCGTCAGA 3600

TTCTCGCATG CCAGAGATCC TATTTTTGGC AATCAAATCA TTCCGGATAC TGCGATTTTA 3660

AGTGTTGTTC CATTCCATCA CGGTTTTGGA ATGTTTACTA CACTCGGATA TTTGATATGT 3720

GGATTTCGAG TCGTCTTAAT GTATAGATTT GAAGAAGAGC TGTTTTTACG ATCCCTTCAG 3780

GATTACAAAA TTCAAAGTGC GTTGCTAGTA CCAACCCTAT TTTCATTCTT CGCCAAAAGC 3840

ACTCTGATTG ACAAATACGA TTTATCTAAT TTACACGAAA TTGCTTCTGG GGGCGCACCT 3900

CTTTCGAAAG AAGTCGGGGA AGCGGTTGCA AAACGCTTCC ATCTTCCAGG GATACGACAA 3960

GGATATGGGC TCACTGAGAC TACATCAGCT ATTCTGATTA CACCCGAGGG GGATGATAAA 4020

CCGGGCGCGG TCGGTAAAGT TGTTCCATTT TTTGAAGCGA AGGTTGTGGA TCTGGATACC 4080

GGGAAAACGC TGGGCGTTAA TCAGAGAGGC GAATTATGTG TCAGAGGACC TATGATTATG 4140

TCCGGTTATG TAAACAATCC GGAAGCGACC AACGCCTTGA TTGACAAGGA TGGATGGCTA 4200

CATTCTGGAG ACATAGCTTA CTGGGACGAA GACGAACACT TCTTCATAGT TGACCGCTTG 4260

AAGTCTTTAA TTAAATACAA AGGATATCAG GTGGCCCCCG CTGAATTGGA ATCGATATTG 4320

TTACAACACC CCAACATCTT CGACGCGGGC GTGGCAGGTC TTCCCGACGA TGACGCCGGT 4380

GAACTTCCCG CCGCCGTTGT TGTTTTGGAG CACGGAAAGA CGATGACGGA AAAAGAGATC 4440

GTGGATTACG TCGCCAGTCA AGTAACAACC GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT 4500

GTGGACGAAG TACCGAAAGG TCTTACCGGA AAACTCGACG CAAGAAAAAT CAGAGAGATC 4560

CTCATAAAGG CCAAGAAGGG CGGAAAGTCC AAATTGTAAA ATGTAACTGT ATTCAGCGAT 4620

GACGAAATTC TTAGCTATTG TAATGACTCT AGAGGATCTT TGTGAAGGAA CCTTACTTCT 4680

GTGGTGTGAC ATAATTGGAC AAACTACCTA CAGAGATTTA AAGCTCTAAG GTAAATATAA 4740

AATTTTTAAG TGTATAATGT GTTAAACTAC TGATTCTAAT TGTTTGTGTA TTTTAGATTC 4800

CAACCTATGG AACTGATGAA TGGGAGCAGT GGTGGAATGC CTTTAATGAG GAAAACCTGT 4860

TTTGCTCAGA AGAAATGCCA TCTAGTGATG ATGAGGCTAC TGCTGACTCT CAACATTCTA 4920

CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG ACCCCAAGGA CTTTCCTTCA GAATTGCTAA 4980

GTTTTTTGAG TCATGCTGTG TTTAGTAATA GAACTCTTGC TTGCTTTGCT ATTTACACCA 5040

CAAAGGAAAA AGCTGCACTG CTATACAAGA AAATTATGGA AAAATATTCT GTAACCTTTA 5100

TAAGTAGGCA TAACAGTTAT AATCATAACA TACTGTTTTT TCTTACTCCA CACAGGCATA 5160

GAGTGTCTGC TATTAATAAC TATGCTCAAA AATTGTGTAC CTTTAGCTTT TTAATTTGTA 5220

AAGGGGTTAA TAAGGAATAT TTGATGTATA GTGCCTTGAC TAGAGATCAT AATCAGCCAT 5280

ACCACATTTG TAGAGGTTTT ACTTGCTTTA AAAAACCTCC CACACCTCCC CCTGAACCTG 5340

AAACATAAAA TGAATGCAAT TGTTGTTGTT AACTTGTTTA TTGCAGCTTA TAATGGTTAC 5400

AAATAAAGCA ATAGCATCAC AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT 5460

TGTGGTTTGT CCAAACTCAT CAATGTATCT TATCATGTCT GGATCCCCAG GAAGCTCCTC 5520

TGTGTCCTCA TAAACCCTAA CCTCCTCTAC TTGAGAGGAC ATTCCAATCA TAGGCTGCCC 5580

ATCCACCCTC TGTGTCCTCC TGTTAATTAG GTCACTTAAC AAAAAGGAAA TTGGGTAGGG 5640

GTTTTTCACA GACCGCTTTC TAAGGGTAAT TTTAAAATAT CTGGGAAGTC CCTTCCACTG 5700

CTGTGTTCCA GAAGTGTTGG TAAACAGCCC ACAAATGTCA ACAGCAGAAA CATACAAGCT 5760

GTCAGCTTTG CACAAGGGCC CAACACCCTG CTCAGCAAGA AGCACTGTGG TTGCTGTGTT 5820

AGTAATGTGC AAAACAGGAG GCACATTTTC CCCACCTGTG TAGGTTCCAA AATATCTAGT 5880

GTTTTCATTT TTACTTGGAT CAGGAACCCA GCACTCCACT GGATAAGCAT TATCCTTATC 5940

CAAAACAGCC TTGTGGTCAG TGTTCATCTG CTGACTGTCA ACTGTAGCAT TTTTTGGGGT 6000

TACAGTTTGA GCAGGATATT TGGTCCTGTA GTTTGCTAAC ACACCCTGCA GCTCCAAAGG 6060

TTCCCCACCA ACAGCAAAAA AATGAAAATT TGACCCTTGA ATGGGTTTTC CAGCACCATT 6120

TTCATGAGTT TTTTGTGTCC CTGAATGCAA GTTTAACATA GCAGTTACCC CAATAACCTC 6180

AGTTTTAACA GTAACAGCTT CCCACATCAA AATATTTCCA CAGGTTAAGT CCTCATTTAA 6240

ATTAGGCAAA GGAA 6254

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1442 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

GGTACCCAGG CTGCATAACC AGGAGGTGAG TGGCAGGTGA GTGAAATTTC ATCTGTAGTT 60

ACAGCCACTC CTCATCACTC GCATTACCAC CAGAGCTCCA CTCCCTGTCA GATCAGCGGC 120

GGCATTAGAT TCTCATAGGA GCTCGAACCC TATTCTAAAC TGTTCATGTG AGGGATCTAG 180

GTTGCAAGCT CCCTATGAGA ATCTAATGCC TGATGATCTG TCACGGTCTC CCATCACCCC 240

TAGATGGGAC CATCTAGTTG CAGGAAAACA AGCTCAGGCT CCCACTGATT CTACACGATG 300

GTGAATTGTG GAATTATTTC ATTATATATA TTACAATGTA ATAATAATAG AAATAAAGCA 360

CACAATAAAT GTAATGTGCT TGAATCATCC CGAAACCATC CCACCCTGGT CTGTGAAAAA 420

ATTGTCTTCC ATGAAACCAG TCCCTGGTGC CAAAAACGTT GAGGACCACT GCTCCACAGA 480

ATCTATCGGT CACTCTTCCT CCCCTCACCC CCTTGCCCTA AAAGCACACC CTGCAAACCT 540

GCCATGAATT GACACTCTGT TTCTATCCCT TTTCCCCTTG TGTCTGTGTC TGGAGGAAGA 600

GGATAAAGGA CAAGCTGCCC CAAGTCCTAG CGGGCAGCTC GAGGAAGTGA AACTTACACG 660

TTGGTCTCCT GTTTCCTTAC CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA 720

CCACCCCACC CAGCACACCT CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC 780

TCAGGGGCAC AGAGAGAGTC TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC 840

CGGGCACATG GCAGGGATGA GGGAAAGACC AAGAGTCCTC TGTTGGGCCC AAGTCCTAGA 900

CAGACAAAAC CTAGACAATC ACGTGGCTGG CTGCATGCCT GTGGCTGTTG GGCTGGGCAG 960

GAGGAGGGAG GGGCGCTCTT TCCTGGAGGT GGTCCAGAGC ACCGGGTGGA CAGCCCTGGG 1020

GGAAAACTTC CACGTTTTGA TGGAGGTTAT CTTTGATAAC TCCACAGTGA CCTGGTTCGC 1080

CAAAGGAAAA GCAGGCAACG TGAGCTGTTT TTTTTTTCTC CAAGCTGAAC ACTAGGGGTC 1140

CTAGGCTTTT TGGGTCACCC GGCATGGCAG ACAGTCAACC TGGCAGGACA TCCGGGAGAG 1200

ACAGACACAG GCAGAGGGCA GAAAGGTCAA GGGAGGTTCT CAGGCCAAGG CTATTGGGGT 1260

TTGCTCAATT GTTCCTGAAT GCTCTTACAC ACGTACACAC ACAGAGCAGC ACACACACAC 1320

ACACACACAT GCCTCAGCAA GTCCCAGAGA GGGAGGTGTC GAGGGGGACC CGCTGGCTGT 1380

TCAGACGGAC TCCCAGAGCC AGTGAGTGGG TGGGGCTGGA ACATGAGTTC ATCTATTTCC 1440

TG 1442

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 761 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

AAGCTTACCA TGGTAACCCC TGGTCCCGTT CAGCCACCAC CACCCCACCC AGCACACCTC 60

CAACCTCAGC CAGACAAGGT TGTTGACACA AGAGAGCCCT CAGGGGCACA GAGAGAGTCT 120

GGACACGTGG GGAGTCAGCC GTGTATCATC GGAGGCGGCC GGGCACATGG CAGGGATGAG 180

GGAAAGACCA AGAGTCCTCT GTTGGGCCCA AGTCCTAGAC AGACAAAACC TAGACAATCA 240

CGTGGCTGGC TGCATGCCTG TGGCTGTTGG GCTGGGCAGG AGGAGGGAGG GGCGCTCTTT 300

CCTGGAGGTG GTCCAGAGCA CCGGGTGGAC AGCCCTGGGG GAAAACTTCC ACGTTTTGAT 360

GGAGGTTATC TTTGATAACT CCACAGTGAC CTGGTTCGCC AAAGGAAAAG CAGGCAACGT 420

GAGCTGTTTT TTTTTTCTCC AAGCTGAACA CTAGGGGTCC TAGGCTTTTT GGGTCACCCG 480

GCATGGCAGA CAGTCAACCT GGCAGGACAT CCGGGAGAGA CAGACACAGG CAGAGGGCAG 540

AAAGGTCAAG GGAGGTTCTC AGGCCAAGGC TATTGGGGTT TGCTCAATTG TTCCTGAATG 600

CTCTTACACA CGTACACACA CAGAGCAGCA CACACACACA CACACACATG CCTCAGCAAG 660

TCCCAGAGAG GGAGGTGTCG AGGGGGACCC GCTGGCTGTT CAGACGGACT CCCAGAGCCA 720

GTGAGTGGGT GGGGCTGGAA CATGAGTTCA TCTATTTCCT G 761

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 165 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

AAGCTTACCA TGGTAACCCC TGGTCCCGTT CAGCCACCAC CACCCCACCC AGCACACCTC 60

CAACCTCAGC CAGACAAGGT TGTTGACACA AGAGAGCCCT CAGGGGCACA GAGAGAGTCT 120

GGACACGTGG GGAGTCAGCC GTGTATCATC GGAGGCGGCC GGGCA 165

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

AGTTCATCTA TTTCCT 16

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

GTGGGGAGTC AGCCGTGTAT CATCG 25

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

CTCCAACCTC AGCCAGACAA GGTTGTTGAC ACAAGA 36

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

GCCAGACAAG GTTGTTGACA CAAGA 25

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 115 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

CCCACATCTG GTATAAAAGG AGGCAGTGGC CCACAGAGGA GCACAGCTGT GTTTGGCTGC 60

AGGGCCAAGA GCGCTGTCAA GAAGACCCAC ACGCCCCCCT CCAGCAGCTG AATTC 115

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 345 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

GGCCAGACGC CAACAAGGTA GGAGCTGGAG CATTCGGGCT GGGTTTCACC CCACCGCACG 60

GAGGCCTTTT GGGGTGGAGC CCTCAGGCTC AGGGCATACT ACAAACTTTG CCAGCAAATC 120

CGCCTCCTGC CTCCACCAAT CGCCAGTCAG GAAGGCAGCC TACCCCGCTG TCTCCACCTT 180

TGAGAAACAC TCATCCTCAG GCCATGCAGT GGAATTCCAC AACCTTCCAC CAAACTCTGC 240

AAGATCCCAG AGTGAGAGGC CTGTATTTCC CTGCTGGTGG CTCCAGTTCA GGAACAGTAA 300

ACCCTGTTCT GACTACTGCC TCTCCCTTAT CGTCAATCTT CTCGA 345

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4302 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

TCGACCTCGA GGGATCTTTG TGAAGGAACC TTACTTCTGT GGTGTGACAT AATTGGACAA 60

ACTACCTACA GAGATTTAAA GCTCTAAGGT AAATATAAAA TTTTTAAGTG TATAATGTGT 120

TAAACTACTG ATTCTAATTG TTTGTGTATT TTAGATTCCA ACCTATGGAA CTGATGAATG 180

GGAGCAGTGG TGGAATGCCT TTAATGAGGA AAACCTGTTT TGCTCAGAAG AAATGCCATC 240

TAGTGATGAT GAGGCTACTG CTGACTCTCA ACATTCTACT CCTCCAAAAA AGAAGAGAAA 300

GGTAGAAGAC CCCAAGGACT TTCCTTCAGA ATTGCTAAGT TTTTTGAGTC ATGCTGTGTT 360

TAGTAATAGA ACTCTTGCTT GCTTTGCTAT TTACACCACA AAGGAAAAAG CTGCACTGCT 420

ATACAAGAAA ATTATGGAAA AATATTCTGT AACCTTTATA AGTAGGCATA ACAGTTATAA 480

TCATAACATA CTGTTTTTTC TTACTCCACA CAGGCATAGA GTGTCTGCTA TTAATAACTA 540

TGCTCAAAAA TTGTGTACCT TTAGCTTTTT AATTTGTAAA GGGGTTAATA AGGAATATTT 600

GATGTATAGT GCCTTGACTA GAGATCATAA TCAGCCATAC CACATTTGTA GAGGTTTTAC 660

TTGCTTTAAA AAACCTCCCA CACCTCCCCC TGAACCTGAA ACATAAAATG AATGCAATTG 720

TTGTTGTTAA CTTGTTTATT GCAGCTTATA ATGGTTACAA ATAAAGCAAT AGCATCACAA 780

ATTTCACAAA TAAAGCATTT TTTTCACTGC ATTCTAGTTG TGGTTTGTCC AAACTCATCA 840

ATGTATCTTA TCATGTCTGG ATCCGGCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT 900

CCCCAGGCTC CCCAGCAGGC AGAAGTATGC AAAGCATGCA TCTCAATTAG TCAGCAACCA 960

GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT GCAAAGCATG CATCTCAATT 1020

AGTCAGCAAC CATAGTCCCG CCCCTAACTC CGCCCATCCC GCCCCTAACT CCGCCCAGTT 1080

CCGCCCATTC TCCGCCCCAT GGCTGACTAA TTTTTTTTAT TTATGCAGAG GCCGAGGCCG 1140

CCTCGGCCTC TGAGCTATTC CAGAAGTAGT GAGGAGGCTT TTTTGGAGGC CTAGGCTTTT 1200

GCAAAAAGCT TCACGCTGCC GCAAGCACTC AGGGCGCAAG GGCTGCTAAA GGAAGCGGAA 1260

CACGTAGAAA GCCAGTCCGC AGAAACGGTG CTGACCCCGG ATGAATGTCA GCTACTGGGC 1320

TATCTGGACA AGGGAAAACG CAAGCGCAAA GAGAAAGCAG GTAGCTTGCA GTGGGCTTAC 1380

ATGGCGATAG CTAGACTGGG CGGTTTTATG GACAGCAAGC GAACCGGAAT TGCCAGCTGG 1440

GGCGCCCTCT GGTAAGGTTG GGAAGCCCTG CAAAGTAAAC TGGATGGCTT TCTTGCCGCC 1500

AAGGATCTGA TGGCGCAGGG GATCAAGATC TGATCAAGAG ACAGGATGAG GATCGTTTCG 1560

CATGATTGAA CAAGATGGAT TGCACGCAGG TTCTCCGGCC GCTTGGGTGG AGAGGCTATT 1620

CGGCTATGAC TGGGCACAAC AGACAATCGG CTGCTCTGAT GCCGCCGTGT TCCGGCTGTC 1680

AGCGCAGGGG CGCCCGGTTC TTTTTGTCAA GACCGACCTG TCCGGTGCCC TGAATGAACT 1740

GCAGGACGAG GCAGCGCGGC TATCGTGGCT GGCCACGACG GGCGTTCCTT GCGCAGCTGT 1800

GCTCGACGTT GTCACTGAAG CGGGAAGGGA CTGGCTGCTA TTGGGCGAAG TGCCGGGGCA 1860

GGATCTCCTG TCATCTCACC TTGCTCCTGC CGAGAAAGTA TCCATCATGG CTGATGCAAT 1920

GCGGCGGCTG CATACGCTTG ATCCGGCTAC CTGCCCATTC GACCACCAAG CGAAACATCG 1980

CATCGAGCGA GCACGTACTC GGATGGAAGC CGGTCTTGTC GATCAGGATG ATCTGGACGA 2040

AGAGCATCAG GGGCTCGCGC CAGCCGAACT GTTCGCCAGG CTCAAGGCGC GCATGCCCGA 2100

CGGCGAGGAT CTCGTCGTGA CCCATGGCGA TGCCTGCTTG CCGAATATCA TGGTGGAAAA 2160

TGGCCGCTTT TCTGGATTCA TCGACTGTGG CCGGCTGGGT GTGGCGGACC GCTATCAGGA 2220

CATAGCGTTG GCTACCCGTG ATATTGCTGA AGAGCTTGGC GGCGAATGGG CTGACCGCTT 2280

CCTCGTGCTT TACGGTATCG CCGCTCCCGA TTCGCAGCGC ATCGCCTTCT ATCGCCTTCT 2340

TGACGAGTTC TTCTGAGCGG GACTCTGGGG TTCGAAATGA CCGACCAAGC GACGCCCAAC 2400

CTGCCATCAC GAGATTTCGA TTCCACCGCC GCCTTCTATG AAAGGTTGGG CTTCGGAATC 2460

GTTTTCCGGG ACGCCGGCTG GATGATCCTC CAGCGCGGGG ATCTCATGCT GGAGTTCTTC 2520

GCCCACCCCG GGCTCGATCC CCTCGCGAGT TGGTTCAGCT GCTGCCTGAG GCTGGACGAC 2580

CTCGCGGAGT TCTACCGGCA GTGCAAATCC GTCGGCATCC AGGAAACCAG CAGCGGCTAT 2640

CCGCGCATCC ATGCCCCCGA ACTGCAGGAG TGGGGAGGCA CGATGGCCGC TTTGGTCCCG 2700

GATCTTTGTG AAGGAACCTT ACTTCTGTGG TGTGACATAA TTGGACAAAC TACCTACAGA 2760

GATTTAAAGC TCTAAGGTAA ATATAAAATT TTTAAGTGTA TAATGTGTTA AACTACTGAT 2820

TCTAATTGTT TGTGTATTTT AGATTCCAAC CTATGGAACT GATGAATGGG AGCAGTGGTG 2880

GAATGCCTTT AATGAGGAAA ACCTGTTTTG CTCAGAAGAA ATGCCATCTA GTGATGATGA 2940

GGCTACTGCT GACTCTCAAC ATTCTACTCC TCCAAAAAAG AAGAGAAAGG TAGAAGACCC 3000

CAAGGACTTT CCTTCAGAAT TGCTAAGTTT TTTGAGTCAT GCTGTGTTTA GTAATAGAAC 3060

TCTTGCTTGC TTTGCTATTT ACACCACAAA GGAAAAAGCT GCACTGCTAT ACAAGAAAAT 3120

TATGGAAAAA TATTCTGTAA CCTTTATAAG TAGGCATAAC AGTTATAATC ATAACATACT 3180

GTTTTTTCTT ACTCCACACA GGCATAGAGT GTCTGCTATT AATAACTATG CTCAAAAATT 3240

GTGTACCTTT AGCTTTTTAA TTTGTAAAGG GGTTAATAAG GAATATTTGA TGTATAGTGC 3300

CTTGACTAGA GATCATAATC AGCCATACCA CATTTGTAGA GGTTTTACTT GCTTTAAAAA 3360

ACCTCCCACA CCTCCCCCTG AACCTGAAAC ATAAAATGAA TGCAATTGTT GTTGTTAACT 3420

TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT TTCACAAATA 3480

AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA ACTCATCAAT GTATCTTATC 3540

ATGTCTGGAT CCCCAGGAAG CTCCTCTGTG TCCTCATAAA CCCTAACCTC CTCTACTTGA 3600

GAGGACATTC CAATCATAGG CTGCCCATCC ACCCTCTGTG TCCTCCTGTT AATTAGGTCA 3660

CTTAACAAAA AGGAAATTGG GTAGGGGTTT TTCACAGACC GCTTTCTAAG GGTAATTTTA 3720

AAATATCTGG GAAGTCCCTT CCACTGCTGT GTTCCAGAAG TGTTGGTAAA CAGCCCACAA 3780

ATGTCAACAG CAGAAACATA CAAGCTGTCA GCTTTGCACA AGGGCCCAAC ACCCTGCTCA 3840

TCAAGAAGCA CTGTGGTTGC TGTGTTAGTA ATGTGCAAAA CAGGAGGCAC ATTTTCCCCA 3900

CCTGTGTAGG TTCCAAAATA TCTAGTGTTT TCATTTTTAC TTGGATCAGG AACCCAGCAC 3960

TCCACTGGAT AAGCATTATC CTTATCCAAA ACAGCCTTGT GGTCAGTGTT CATCTGCTGA 4020

CTGTCAACTG TAGCATTTTT TGGGGTTACA GTTTGAGCAG GATATTTGGT CCTGTAGTTT 4080

GCTAACACAC CCTGCAGCTC CAAAGGTTCC CCACCAACAG CAAAAAAATG AAAATTTGAC 4140

CCTTGAATGG GTTTTCCAGC ACCATTTTCA TGAGTTTTTT GTGTCCCTGA ATGCAAGTTT 4200

AACATAGCAG TTACCCCAAT AACCTCAGTT TTAACAGTAA CAGCTTCCCA CATCAAAATA 4260

TTTCCACAGG TTAAGTCCTC ATTTAAATTA GGCAAAGGAA TT 4302

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6170 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCAAG CTTGCATGCC TGCAGGTCGA CTCTAGAGGA TCCCCGGGTA 2820

CCGAGCTCGA ATTCCAGCTG GCATTCCGGT ACTGTTGGTA AAATGGAAGA CGCCAAAAAC 2880

ATAAAGAAAG GCCCGGCGCC ATTCTATCCT CTAGAGGATG GAACCGCTGG AGAGCAACTG 2940

CATAAGGCTA TGAAGAGATA CGCCCTGGTT CCTGGAACAA TTGCTTTTAC AGATGCACAT 3000

ATCGAGGTGA ACATCACGTA CGCGGAATAC TTCGAAATGT CCGTTCGGTT GGCAGAAGCT 3060

ATGAAACGAT ATGGGCTGAA TACAAATCAC AGAATCGTCG TATGCAGTGA AAACTCTCTT 3120

CAATTCTTTA TGCCGGTGTT GGGCGCGTTA TTTATCGGAG TTGCAGTTGC GCCCGCGAAC 3180

GACATTTATA ATGAACGTGA ATTGCTCAAC AGTATGAACA TTTCGCAGCC TACCGTAGTG 3240

TTTGTTTCCA AAAAGGGGTT GCAAAAAATT TTGAACGTGC AAAAAAAATT ACCAATAATC 3300

CAGAAAATTA TTATCATGGA TTCTAAAACG GATTACCAGG GATTTCAGTC GATGTACACG 3360

TTCGTCACAT CTCATCTACC TCCCGGTTTT AATGAATACG ATTTTGTACC AGAGTCCTTT 3420

GATCGTGACA AAACAATTGC ACTGATAATG AATTCCTCTG GATCTACTGG GTTACCTAAG 3480

GGTGTGGCCC TTCCGCATAG AACTGCCTGC GTCAGATTCT CGCATGCCAG AGATCCTATT 3540

TTTGGCAATC AAATCATTCC GGATACTGCG ATTTTAAGTG TTGTTCCATT CCATCACGGT 3600

TTTGGAATGT TTACTACACT CGGATATTTG ATATGTGGAT TTCGAGTCGT CTTAATGTAT 3660

AGATTTGAAG AAGAGCTGTT TTTACGATCC CTTCAGGATT ACAAAATTCA AAGTGCGTTG 3720

CTAGTACCAA CCCTATTTTC ATTCTTCGCC AAAAGCACTC TGATTGACAA ATACGATTTA 3780

TCTAATTTAC ACGAAATTGC TTCTGGGGGC GCACCTCTTT CGAAAGAAGT CGGGGAAGCG 3840

GTTGCAAAAC GCTTCCATCT TCCAGGGATA CGACAAGGAT ATGGGCTCAC TGAGACTACA 3900

TCAGCTATTC TGATTACACC CGAGGGGGAT GATAAACCGG GCGCGGTCGG TAAAGTTGTT 3960

CCATTTTTTG AAGCGAAGGT TGTGGATCTG GATACCGGGA AAACGCTGGG CGTTAATCAG 4020

AGAGGCGAAT TATGTGTCAG AGGACCTATG ATTATGTCCG GTTATGTAAA CAATCCGGAA 4080

GCGACCAACG CCTTGATTGA CAAGGATGGA TGGCTACATT CTGGAGACAT AGCTTACTGG 4140

GACGAAGACG AACACTTCTT CATAGTTGAC CGCTTGAAGT CTTTAATTAA ATACAAAGGA 4200

TATCAGGTGG CCCCCGCTGA ATTGGAATCG ATATTGTTAC AACACCCCAA CATCTTCGAC 4260

GCGGGCGTGG CAGGTCTTCC CGACGATGAC GCCGGTGAAC TTCCCGCCGC CGTTGTTGTT 4320

TTGGAGCACG GAAAGACGAT GACGGAAAAA GAGATCGTGG ATTACGTCGC CAGTCAAGTA 4380

ACAACCGCGA AAAAGTTGCG CGGAGGAGTT GTGTTTGTGG ACGAAGTACC GAAAGGTCTT 4440

ACCGGAAAAC TCGACGCAAG AAAAATCAGA GAGATCCTCA TAAAGGCCAA GAAGGGCGGA 4500

AAGTCCAAAT TGTAAAATGT AACTGTATTC AGCGATGACG AAATTCTTAG CTATTGTAAT 4560

GACTCTAGAG GATCTTTGTG AAGGAACCTT ACTTCTGTGG TGTGACATAA TTGGACAAAC 4620

TACCTACAGA GATTTAAAGC TCTAAGGTAA ATATAAAATT TTTAAGTGTA TAATGTGTTA 4680

AACTACTGAT TCTAATTGTT TGTGTATTTT AGATTCCAAC CTATGGAACT GATGAATGGG 4740

AGCAGTGGTG GAATGCCTTT AATGAGGAAA ACCTGTTTTG CTCAGAAGAA ATGCCATCTA 4800

GTGATGATGA GGCTACTGCT GACTCTCAAC ATTCTACTCC TCCAAAAAAG AAGAGAAAGG 4860

TAGAAGACCC CAAGGACTTT CCTTCAGAAT TGCTAAGTTT TTTGAGTCAT GCTGTGTTTA 4920

GTAATAGAAC TCTTGCTTGC TTTGCTATTT ACACCACAAA GGAAAAAGCT GCACTGCTAT 4980

ACAAGAAAAT TATGGAAAAA TATTCTGTAA CCTTTATAAG TAGGCATAAC AGTTATAATC 5040

ATAACATACT GTTTTTTCTT ACTCCACACA GGCATAGAGT GTCTGCTATT AATAACTATG 5100

CTCAAAAATT GTGTACCTTT AGCTTTTTAA TTTGTAAAGG GGTTAATAAG GAATATTTGA 5160

TGTATAGTGC CTTGACTAGA GATCATAATC AGCCATACCA CATTTGTAGA GGTTTTACTT 5220

GCTTTAAAAA ACCTCCCACA CCTCCCCCTG AACCTGAAAC ATAAAATGAA TGCAATTGTT 5280

GTTGTTAACT TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT 5340

TTCACAAATA AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA ACTCATCAAT 5400

GTATCTTATC ATGTCTGGAT CCCCAGGAAG CTCCTCTGTG TCCTCATAAA CCCTAACCTC 5460

CTCTACTTGA GAGGACATTC CAATCATAGG CTGCCCATCC ACCCTCTGTG TCCTCCTGTT 5520

AATTAGGTCA CTTAACAAAA AGGAAATTGG GTAGGGGTTT TTCACAGACC GCTTTCTAAG 5580

GGTAATTTTA AAATATCTGG GAAGTCCCTT CCACTGCTGT GTTCCAGAAG TGTTGGTAAA 5640

CAGCCCACAA ATGTCAACAG CAGAAACATA CAAGCTGTCA GCTTTGCACA AGGGCCCAAC 5700

ACCCTGCTCA GCAAGAAGCA CTGTGGTTGC TGTGTTAGTA ATGTGCAAAA CAGGAGGCAC 5760

ATTTTCCCCA CCTGTGTAGG TTCCAAAATA TCTAGTGTTT TCATTTTTAC TTGGATCAGG 5820

AACCCAGCAC TCCACTGGAT AAGCATTATC CTTATCCAAA ACAGCCTTGT GGTCAGTGTT 5880

CATCTGCTGA CTGTCAACTG TAGCATTTTT TGGGGTTACA GTTTGAGCAG GATATTTGGT 5940

CCTGTAGTTT GCTAACACAC CCTGCAGCTC CAAAGGTTCC CCACCAACAG CAAAAAAATG 6000

AAAATTTGAC CCTTGAATGG GTTTTCCAGC ACCATTTTCA TGAGTTTTTT GTGTCCCTGA 6060

ATGCAAGTTT AACATAGCAG TTACCCCAAT AACCTCAGTT TTAACAGTAA CAGCTTCCCA 6120

CATCAAAATA TTTCCACAGG TTAAGTCCTC ATTTAAATTA GGCAAAGGAA 6170

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10533 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080

CACCCACATC TGGTATAAAA GGAGGCAGTG GCCCACAGAG GAGCACAGCT GTGTTTGGCT 7140

GCAGGGCCAA GAGCGCTGTC AAGAAGACCC ACACGCCCCC CTCCAGCAGC TGAATTCCAG 7200

CTGGCATTCC GGTACTGTTG GTAAAATGGA AGACGCCAAA AACATAAAGA AAGGCCCGGC 7260

GCCATTCTAT CCTCTAGAGG ATGGAACCGC TGGAGAGCAA CTGCATAAGG CTATGAAGAG 7320

ATACGCCCTG GTTCCTGGAA CAATTGCTTT TACAGATGCA CATATCGAGG TGAACATCAC 7380

GTACGCGGAA TACTTCGAAA TGTCCGTTCG GTTGGCAGAA GCTATGAAAC GATATGGGCT 7440

GAATACAAAT CACAGAATCG TCGTATGCAG TGAAAACTCT CTTCAATTCT TTATGCCGGT 7500

GTTGGGCGCG TTATTTATCG GAGTTGCAGT TGCGCCCGCG AACGACATTT ATAATGAACG 7560

TGAATTGCTC AACAGTATGA ACATTTCGCA GCCTACCGTA GTGTTTGTTT CCAAAAAGGG 7620

GTTGCAAAAA ATTTTGAACG TGCAAAAAAA ATTACCAATA ATCCAGAAAA TTATTATCAT 7680

GGATTCTAAA ACGGATTACC AGGGATTTCA GTCGATGTAC ACGTTCGTCA CATCTCATCT 7740

ACCTCCCGGT TTTAATGAAT ACGATTTTGT ACCAGAGTCC TTTGATCGTG ACAAAACAAT 7800

TGCACTGATA ATGAATTCCT CTGGATCTAC TGGGTTACCT AAGGGTGTGG CCCTTCCGCA 7860

TAGAACTGCC TGCGTCAGAT TCTCGCATGC CAGAGATCCT ATTTTTGGCA ATCAAATCAT 7920

TCCGGATACT GCGATTTTAA GTGTTGTTCC ATTCCATCAC GGTTTTGGAA TGTTTACTAC 7980

ACTCGGATAT TTGATATGTG GATTTCGAGT CGTCTTAATG TATAGATTTG AAGAAGAGCT 8040

GTTTTTACGA TCCCTTCAGG ATTACAAAAT TCAAAGTGCG TTGCTAGTAC CAACCCTATT 8100

TTCATTCTTC GCCAAAAGCA CTCTGATTGA CAAATACGAT TTATCTAATT TACACGAAAT 8160

TGCTTCTGGG GGCGCACCTC TTTCGAAAGA AGTCGGGGAA GCGGTTGCAA AACGCTTCCA 8220

TCTTCCAGGG ATACGACAAG GATATGGGCT CACTGAGACT ACATCAGCTA TTCTGATTAC 8280

ACCCGAGGGG GATGATAAAC CGGGCGCGGT CGGTAAAGTT GTTCCATTTT TTGAAGCGAA 8340

GGTTGTGGAT CTGGATACCG GGAAAACGCT GGGCGTTAAT CAGAGAGGCG AATTATGTGT 8400

CAGAGGACCT ATGATTATGT CCGGTTATGT AAACAATCCG GAAGCGACCA ACGCCTTGAT 8460

TGACAAGGAT GGATGGCTAC ATTCTGGAGA CATAGCTTAC TGGGACGAAG ACGAACACTT 8520

CTTCATAGTT GACCGCTTGA AGTCTTTAAT TAAATACAAA GGATATCAGG TGGCCCCCGC 8580

TGAATTGGAA TCGATATTGT TACAACACCC CAACATCTTC GACGCGGGCG TGGCAGGTCT 8640

TCCCGACGAT GACGCCGGTG AACTTCCCGC CGCCGTTGTT GTTTTGGAGC ACGGAAAGAC 8700

GATGACGGAA AAAGAGATCG TGGATTACGT CGCCAGTCAA GTAACAACCG CGAAAAAGTT 8760

GCGCGGAGGA GTTGTGTTTG TGGACGAAGT ACCGAAAGGT CTTACCGGAA AACTCGACGC 8820

AAGAAAAATC AGAGAGATCC TCATAAAGGC CAAGAAGGGC GGAAAGTCCA AATTGTAAAA 8880

TGTAACTGTA TTCAGCGATG ACGAAATTCT TAGCTATTGT AATGACTCTA GAGGATCTTT 8940

GTGAAGGAAC CTTACTTCTG TGGTGTGACA TAATTGGACA AACTACCTAC AGAGATTTAA 9000

AGCTCTAAGG TAAATATAAA ATTTTTAAGT GTATAATGTG TTAAACTACT GATTCTAATT 9060

GTTTGTGTAT TTTAGATTCC AACCTATGGA ACTGATGAAT GGGAGCAGTG GTGGAATGCC 9120

TTTAATGAGG AAAACCTGTT TTGCTCAGAA GAAATGCCAT CTAGTGATGA TGAGGCTACT 9180

GCTGACTCTC AACATTCTAC TCCTCCAAAA AAGAAGAGAA AGGTAGAAGA CCCCAAGGAC 9240

TTTCCTTCAG AATTGCTAAG TTTTTTGAGT CATGCTGTGT TTAGTAATAG AACTCTTGCT 9300

TGCTTTGCTA TTTACACCAC AAAGGAAAAA GCTGCACTGC TATACAAGAA AATTATGGAA 9360

AAATATTCTG TAACCTTTAT AAGTAGGCAT AACAGTTATA ATCATAACAT ACTGTTTTTT 9420

CTTACTCCAC ACAGGCATAG AGTGTCTGCT ATTAATAACT ATGCTCAAAA ATTGTGTACC 9480

TTTAGCTTTT TAATTTGTAA AGGGGTTAAT AAGGAATATT TGATGTATAG TGCCTTGACT 9540

AGAGATCATA ATCAGCCATA CCACATTTGT AGAGGTTTTA CTTGCTTTAA AAAACCTCCC 9600

ACACCTCCCC CTGAACCTGA AACATAAAAT GAATGCAATT GTTGTTGTTA ACTTGTTTAT 9660

TGCAGCTTAT AATGGTTACA AATAAAGCAA TAGCATCACA AATTTCACAA ATAAAGCATT 9720

TTTTTCACTG CATTCTAGTT GTGGTTTGTC CAAACTCATC AATGTATCTT ATCATGTCTG 9780

GATCCCCAGG AAGCTCCTCT GTGTCCTCAT AAACCCTAAC CTCCTCTACT TGAGAGGACA 9840

TTCCAATCAT AGGCTGCCCA TCCACCCTCT GTGTCCTCCT GTTAATTAGG TCACTTAACA 9900

AAAAGGAAAT TGGGTAGGGG TTTTTCACAG ACCGCTTTCT AAGGGTAATT TTAAAATATC 9960

TGGGAAGTCC CTTCCACTGC TGTGTTCCAG AAGTGTTGGT AAACAGCCCA CAAATGTCAA 10020

CAGCAGAAAC ATACAAGCTG TCAGCTTTGC ACAAGGGCCC AACACCCTGC TCAGCAAGAA 10080

GCACTGTGGT TGCTGTGTTA GTAATGTGCA AAACAGGAGG CACATTTTCC CCACCTGTGT 10140

AGGTTCCAAA ATATCTAGTG TTTTCATTTT TACTTGGATC AGGAACCCAG CACTCCACTG 10200

GATAAGCATT ATCCTTATCC AAAACAGCCT TGTGGTCAGT GTTCATCTGC TGACTGTCAA 10260

CTGTAGCATT TTTTGGGGTT ACAGTTTGAG CAGGATATTT GGTCCTGTAG TTTGCTAACA 10320

CACCCTGCAG CTCCAAAGGT TCCCCACCAA CAGCAAAAAA ATGAAAATTT GACCCTTGAA 10380

TGGGTTTTCC AGCACCATTT TCATGAGTTT TTTGTGTCCC TGAATGCAAG TTTAACATAG 10440

CAGTTACCCC AATAACCTCA GTTTTAACAG TAACAGCTTC CCACATCAAA ATATTTCCAC 10500

AGGTTAAGTC CTCATTTAAA TTAGGCAAAG GAA 10533

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6229 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCACC CACATCTGGT ATAAAAGGAG GCAGTGGCCC ACAGAGGAGC 2820

ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA AGACCCACAC GCCCCCCTCC 2880

AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA AATGGAAGAC GCCAAAAACA 2940

TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG AACCGCTGGA GAGCAACTGC 3000

ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGGAACAAT TGCTTTTACA GATGCACATA 3060

TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC CGTTCGGTTG GCAGAAGCTA 3120

TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT ATGCAGTGAA AACTCTCTTC 3180

AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT TGCAGTTGCG CCCGCGAACG 3240

ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT TTCGCAGCCT ACCGTAGTGT 3300

TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA AAAAAAATTA CCAATAATCC 3360

AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG ATTTCAGTCG ATGTACACGT 3420

TCGTCACATC TCATCTACCT CCCGGTTTTA ATGAATACGA TTTTGTACCA GAGTCCTTTG 3480

ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG ATCTACTGGG TTACCTAAGG 3540

GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC GCATGCCAGA GATCCTATTT 3600

TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT TGTTCCATTC CATCACGGTT 3660

TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT TCGAGTCGTC TTAATGTATA 3720

GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA CAAAATTCAA AGTGCGTTGC 3780

TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT GATTGACAAA TACGATTTAT 3840

CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC GAAAGAAGTC GGGGAAGCGG 3900

TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA TGGGCTCACT GAGACTACAT 3960

CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG CGCGGTCGGT AAAGTTGTTC 4020

CATTTTTTGA AGCGAAGGTT GTGGATCTGG ATACCGGGAA AACGCTGGGC GTTAATCAGA 4080

GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG TTATGTAAAC AATCCGGAAG 4140

CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC TGGAGACATA GCTTACTGGG 4200

ACGAAGACGA ACACTTCTTC ATAGTTGACC GCTTGAAGTC TTTAATTAAA TACAAAGGAT 4260

ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA ACACCCCAAC ATCTTCGACG 4320

CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT TCCCGCCGCC GTTGTTGTTT 4380

TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA TTACGTCGCC AGTCAAGTAA 4440

CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA CGAAGTACCG AAAGGTCTTA 4500

CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT AAAGGCCAAG AAGGGCGGAA 4560

AGTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA AATTCTTAGC TATTGTAATG 4620

ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT GTGACATAAT TGGACAAACT 4680

ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT TTAAGTGTAT AATGTGTTAA 4740

ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC TATGGAACTG ATGAATGGGA 4800

GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC TCAGAAGAAA TGCCATCTAG 4860

TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT CCAAAAAAGA AGAGAAAGGT 4920

AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT TTGAGTCATG CTGTGTTTAG 4980

TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG GAAAAAGCTG CACTGCTATA 5040

CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT AGGCATAACA GTTATAATCA 5100

TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG TCTGCTATTA ATAACTATGC 5160

TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG GTTAATAAGG AATATTTGAT 5220

GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC ATTTGTAGAG GTTTTACTTG 5280

CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA TAAAATGAAT GCAATTGTTG 5340

TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA AAGCAATAGC ATCACAAATT 5400

TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG TTTGTCCAAA CTCATCAATG 5460

TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT CCTCATAAAC CCTAACCTCC 5520

TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA CCCTCTGTGT CCTCCTGTTA 5580

ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGGTTTT TCACAGACCG CTTTCTAAGG 5640

GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG TTCCAGAAGT GTTGGTAAAC 5700

AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG CTTTGCACAA GGGCCCAACA 5760

CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA TGTGCAAAAC AGGAGGCACA 5820

TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT CATTTTTACT TGGATCAGGA 5880

ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA CAGCCTTGTG GTCAGTGTTC 5940

ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG TTTGAGCAGG ATATTTGGTC 6000

CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC CACCAACAGC AAAAAAATGA 6060

AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT GAGTTTTTTG TGTCCCTGAA 6120

TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT TAACAGTAAC AGCTTCCCAC 6180

ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG GCAAAGGAA 6229

(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10768 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080

CAGGCCAGAC GCCAACAAGG TAGGAGCTGG AGCATTCGGG CTGGGTTTCA CCCCACCGCA 7140

CGGAGGCCTT TTGGGGTGGA GCCCTCAGGC TCAGGGCATA CTACAAACTT TGCCAGCAAA 7200

TCCGCCTCCT GCCTCCACCA ATCGCCAGTC AGGAAGGCAG CCTACCCCGC TGTCTCCACC 7260

TTTGAGAAAC ACTCATCCTC AGGCCATGCA GTGGAATTCC ACAACCTTCC ACCAAACTCT 7320

GCAAGATCCC AGAGTGAGAG GCCTGTATTT CCCTGCTGGT GGCTCCAGTT CAGGAACAGT 7380

AAACCCTGTT CTGACTACTG CCTCTCCCTT ATCGTCAATC TTCTCGAAAT TCCAGCTGGC 7440

ATTCCGGTAC TGTTGGTAAA ATGGAAGACG CCAAAAACAT AAAGAAAGGC CCGGCGCCAT 7500

TCTATCCTCT AGAGGATGGA ACCGCTGGAG AGCAACTGCA TAAGGCTATG AAGAGATACG 7560

CCCTGGTTCC TGGAACAATT GCTTTTACAG ATGCACATAT CGAGGTGAAC ATCACGTACG 7620

CGGAATACTT CGAAATGTCC GTTCGGTTGG CAGAAGCTAT GAAACGATAT GGGCTGAATA 7680

CAAATCACAG AATCGTCGTA TGCAGTGAAA ACTCTCTTCA ATTCTTTATG CCGGTGTTGG 7740

GCGCGTTATT TATCGGAGTT GCAGTTGCGC CCGCGAACGA CATTTATAAT GAACGTGAAT 7800

TGCTCAACAG TATGAACATT TCGCAGCCTA CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC 7860

AAAAAATTTT GAACGTGCAA AAAAAATTAC CAATAATCCA GAAAATTATT ATCATGGATT 7920

CTAAAACGGA TTACCAGGGA TTTCAGTCGA TGTACACGTT CGTCACATCT CATCTACCTC 7980

CCGGTTTTAA TGAATACGAT TTTGTACCAG AGTCCTTTGA TCGTGACAAA ACAATTGCAC 8040

TGATAATGAA TTCCTCTGGA TCTACTGGGT TACCTAAGGG TGTGGCCCTT CCGCATAGAA 8100

CTGCCTGCGT CAGATTCTCG CATGCCAGAG ATCCTATTTT TGGCAATCAA ATCATTCCGG 8160

ATACTGCGAT TTTAAGTGTT GTTCCATTCC ATCACGGTTT TGGAATGTTT ACTACACTCG 8220

GATATTTGAT ATGTGGATTT CGAGTCGTCT TAATGTATAG ATTTGAAGAA GAGCTGTTTT 8280

TACGATCCCT TCAGGATTAC AAAATTCAAA GTGCGTTGCT AGTACCAACC CTATTTTCAT 8340

TCTTCGCCAA AAGCACTCTG ATTGACAAAT ACGATTTATC TAATTTACAC GAAATTGCTT 8400

CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG GGGAAGCGGT TGCAAAACGC TTCCATCTTC 8460

CAGGGATACG ACAAGGATAT GGGCTCACTG AGACTACATC AGCTATTCTG ATTACACCCG 8520

AGGGGGATGA TAAACCGGGC GCGGTCGGTA AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG 8580

TGGATCTGGA TACCGGGAAA ACGCTGGGCG TTAATCAGAG AGGCGAATTA TGTGTCAGAG 8640

GACCTATGAT TATGTCCGGT TATGTAAACA ATCCGGAAGC GACCAACGCC TTGATTGACA 8700

AGGATGGATG GCTACATTCT GGAGACATAG CTTACTGGGA CGAAGACGAA CACTTCTTCA 8760

TAGTTGACCG CTTGAAGTCT TTAATTAAAT ACAAAGGATA TCAGGTGGCC CCCGCTGAAT 8820

TGGAATCGAT ATTGTTACAA CACCCCAACA TCTTCGACGC GGGCGTGGCA GGTCTTCCCG 8880

ACGATGACGC CGGTGAACTT CCCGCCGCCG TTGTTGTTTT GGAGCACGGA AAGACGATGA 8940

CGGAAAAAGA GATCGTGGAT TACGTCGCCA GTCAAGTAAC AACCGCGAAA AAGTTGCGCG 9000

GAGGAGTTGT GTTTGTGGAC GAAGTACCGA AAGGTCTTAC CGGAAAACTC GACGCAAGAA 9060

AAATCAGAGA GATCCTCATA AAGGCCAAGA AGGGCGGAAA GTCCAAATTG TAAAATGTAA 9120

CTGTATTCAG CGATGACGAA ATTCTTAGCT ATTGTAATGA CTCTAGAGGA TCTTTGTGAA 9180

GGAACCTTAC TTCTGTGGTG TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC 9240

TAAGGTAAAT ATAAAATTTT TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG 9300

TGTATTTTAG ATTCCAACCT ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA 9360

TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA 9420

CTCTCAACAT TCTACTCCTC CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC 9480

TTCAGAATTG CTAAGTTTTT TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT 9540

TGCTATTTAC ACCACAAAGG AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA 9600

TTCTGTAACC TTTATAAGTA GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC 9660

TCCACACAGG CATAGAGTGT CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG 9720

CTTTTTAATT TGTAAAGGGG TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA 9780

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 9840

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 9900

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 9960

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 10020

CCAGGAAGCT CCTCTGTGTC CTCATAAACC CTAACCTCCT CTACTTGAGA GGACATTCCA 10080

ATCATAGGCT GCCCATCCAC CCTCTGTGTC CTCCTGTTAA TTAGGTCACT TAACAAAAAG 10140

GAAATTGGGT AGGGGTTTTT CACAGACCGC TTTCTAAGGG TAATTTTAAA ATATCTGGGA 10200

AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG TTGGTAAACA GCCCACAAAT GTCAACAGCA 10260

GAAACATACA AGCTGTCAGC TTTGCACAAG GGCCCAACAC CCTGCTCAGC AAGAAGCACT 10320

GTGGTTGCTG TGTTAGTAAT GTGCAAAACA GGAGGCACAT TTTCCCCACC TGTGTAGGTT 10380

CCAAAATATC TAGTGTTTTC ATTTTTACTT GGATCAGGAA CCCAGCACTC CACTGGATAA 10440

GCATTATCCT TATCCAAAAC AGCCTTGTGG TCAGTGTTCA TCTGCTGACT GTCAACTGTA 10500

GCATTTTTTG GGGTTACAGT TTGAGCAGGA TATTTGGTCC TGTAGTTTGC TAACACACCC 10560

TGCAGCTCCA AAGGTTCCCC ACCAACAGCA AAAAAATGAA AATTTGACCC TTGAATGGGT 10620

TTTCCAGCAC CATTTTCATG AGTTTTTTGT GTCCCTGAAT GCAAGTTTAA CATAGCAGTT 10680

ACCCCAATAA CCTCAGTTTT AACAGTAACA GCTTCCCACA TCAAAATATT TCCACAGGTT 10740

AAGTCCTCAT TTAAATTAGG CAAAGGAA 10768
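
The row layout used in these listings (blocks of ten bases followed by a cumulative base count) allows a simple mechanical consistency check against the stated LENGTH. The following Python sketch is illustrative only and is not part of the original disclosure; the names `parse_row` and `check_listing_rows` are hypothetical, and the two sample rows are the first two rows of SEQ ID NO:24 as printed above.

```python
import re

# IUPAC nucleotide codes; rows containing anything else are rejected.
VALID_BASES = re.compile(r"[ACGTRYSWKMBDHVN]+")


def parse_row(line):
    """Split one listing row into (bases, cumulative_count)."""
    tokens = line.split()
    if len(tokens) < 2 or not tokens[-1].isdigit():
        raise ValueError(f"row has no trailing base count: {line!r}")
    bases = "".join(tokens[:-1]).upper()
    if not VALID_BASES.fullmatch(bases):
        raise ValueError(f"unexpected character in row: {line!r}")
    return bases, int(tokens[-1])


def check_listing_rows(lines, start=0):
    """Join the rows into one sequence, verifying every running count.

    `start` is the number of bases that precede the first row given,
    so a fragment from the middle of a listing can also be checked.
    """
    total = start
    parts = []
    for line in lines:
        if not line.strip():
            continue  # the listing separates rows with blank lines
        bases, count = parse_row(line)
        total += len(bases)
        if total != count:
            raise ValueError(f"row claims {count} bases, counted {total}")
        parts.append(bases)
    return "".join(parts)


rows = [
    "TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60",
    "",
    "AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120",
]
sequence = check_listing_rows(rows)
print(len(sequence))  # prints 120
```

Applied to a complete listing, the final running count should equal the value given under (A) LENGTH; a mismatch or a rejected character points to a transcription or extraction error in that row.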

(2) INFORMATION FOR SEQ ID NO:25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6464 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCAGG CCAGACGCCA ACAAGGTAGG AGCTGGAGCA TTCGGGCTGG 2820

GTTTCACCCC ACCGCACGGA GGCCTTTTGG GGTGGAGCCC TCAGGCTCAG GGCATACTAC 2880

AAACTTTGCC AGCAAATCCG CCTCCTGCCT CCACCAATCG CCAGTCAGGA AGGCAGCCTA 2940

CCCCGCTGTC TCCACCTTTG AGAAACACTC ATCCTCAGGC CATGCAGTGG AATTCCACAA 3000

CCTTCCACCA AACTCTGCAA GATCCCAGAG TGAGAGGCCT GTATTTCCCT GCTGGTGGCT 3060

CCAGTTCAGG AACAGTAAAC CCTGTTCTGA CTACTGCCTC TCCCTTATCG TCAATCTTCT 3120

CGAAATTCCA GCTGGCATTC CGGTACTGTT GGTAAAATGG AAGACGCCAA AAACATAAAG 3180

AAAGGCCCGG CGCCATTCTA TCCTCTAGAG GATGGAACCG CTGGAGAGCA ACTGCATAAG 3240

GCTATGAAGA GATACGCCCT GGTTCCTGGA ACAATTGCTT TTACAGATGC ACATATCGAG 3300

GTGAACATCA CGTACGCGGA ATACTTCGAA ATGTCCGTTC GGTTGGCAGA AGCTATGAAA 3360

CGATATGGGC TGAATACAAA TCACAGAATC GTCGTATGCA GTGAAAACTC TCTTCAATTC 3420

TTTATGCCGG TGTTGGGCGC GTTATTTATC GGAGTTGCAG TTGCGCCCGC GAACGACATT 3480

TATAATGAAC GTGAATTGCT CAACAGTATG AACATTTCGC AGCCTACCGT AGTGTTTGTT 3540

TCCAAAAAGG GGTTGCAAAA AATTTTGAAC GTGCAAAAAA AATTACCAAT AATCCAGAAA 3600

ATTATTATCA TGGATTCTAA AACGGATTAC CAGGGATTTC AGTCGATGTA CACGTTCGTC 3660

ACATCTCATC TACCTCCCGG TTTTAATGAA TACGATTTTG TACCAGAGTC CTTTGATCGT 3720

GACAAAACAA TTGCACTGAT AATGAATTCC TCTGGATCTA CTGGGTTACC TAAGGGTGTG 3780

GCCCTTCCGC ATAGAACTGC CTGCGTCAGA TTCTCGCATG CCAGAGATCC TATTTTTGGC 3840

AATCAAATCA TTCCGGATAC TGCGATTTTA AGTGTTGTTC CATTCCATCA CGGTTTTGGA 3900

ATGTTTACTA CACTCGGATA TTTGATATGT GGATTTCGAG TCGTCTTAAT GTATAGATTT 3960

GAAGAAGAGC TGTTTTTACG ATCCCTTCAG GATTACAAAA TTCAAAGTGC GTTGCTAGTA 4020

CCAACCCTAT TTTCATTCTT CGCCAAAAGC ACTCTGATTG ACAAATACGA TTTATCTAAT 4080

TTACACGAAA TTGCTTCTGG GGGCGCACCT CTTTCGAAAG AAGTCGGGGA AGCGGTTGCA 4140

AAACGCTTCC ATCTTCCAGG GATACGACAA GGATATGGGC TCACTGAGAC TACATCAGCT 4200

ATTCTGATTA CACCCGAGGG GGATGATAAA CCGGGCGCGG TCGGTAAAGT TGTTCCATTT 4260

TTTGAAGCGA AGGTTGTGGA TCTGGATACC GGGAAAACGC TGGGCGTTAA TCAGAGAGGC 4320

GAATTATGTG TCAGAGGACC TATGATTATG TCCGGTTATG TAAACAATCC GGAAGCGACC 4380

AACGCCTTGA TTGACAAGGA TGGATGGCTA CATTCTGGAG ACATAGCTTA CTGGGACGAA 4440

GACGAACACT TCTTCATAGT TGACCGCTTG AAGTCTTTAA TTAAATACAA AGGATATCAG 4500

GTGGCCCCCG CTGAATTGGA ATCGATATTG TTACAACACC CCAACATCTT CGACGCGGGC 4560

GTGGCAGGTC TTCCCGACGA TGACGCCGGT GAACTTCCCG CCGCCGTTGT TGTTTTGGAG 4620

CACGGAAAGA CGATGACGGA AAAAGAGATC GTGGATTACG TCGCCAGTCA AGTAACAACC 4680

GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT GTGGACGAAG TACCGAAAGG TCTTACCGGA 4740

AAACTCGACG CAAGAAAAAT CAGAGAGATC CTCATAAAGG CCAAGAAGGG CGGAAAGTCC 4800

AAATTGTAAA ATGTAACTGT ATTCAGCGAT GACGAAATTC TTAGCTATTG TAATGACTCT 4860

AGAGGATCTT TGTGAAGGAA CCTTACTTCT GTGGTGTGAC ATAATTGGAC AAACTACCTA 4920

CAGAGATTTA AAGCTCTAAG GTAAATATAA AATTTTTAAG TGTATAATGT GTTAAACTAC 4980

TGATTCTAAT TGTTTGTGTA TTTTAGATTC CAACCTATGG AACTGATGAA TGGGAGCAGT 5040

GGTGGAATGC CTTTAATGAG GAAAACCTGT TTTGCTCAGA AGAAATGCCA TCTAGTGATG 5100

ATGAGGCTAC TGCTGACTCT CAACATTCTA CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG 5160

ACCCCAAGGA CTTTCCTTCA GAATTGCTAA GTTTTTTGAG TCATGCTGTG TTTAGTAATA 5220

GAACTCTTGC TTGCTTTGCT ATTTACACCA CAAAGGAAAA AGCTGCACTG CTATACAAGA 5280

AAATTATGGA AAAATATTCT GTAACCTTTA TAAGTAGGCA TAACAGTTAT AATCATAACA 5340

TACTGTTTTT TCTTACTCCA CACAGGCATA GAGTGTCTGC TATTAATAAC TATGCTCAAA 5400

AATTGTGTAC CTTTAGCTTT TTAATTTGTA AAGGGGTTAA TAAGGAATAT TTGATGTATA 5460

GTGCCTTGAC TAGAGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 5520

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 5580

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 5640

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 5700

TATCATGTCT GGATCCCCAG GAAGCTCCTC TGTGTCCTCA

TTGAGAGGAC ATTCCAATCA TAGGCTGCCC ATCCACCCTC

GTCACTTAAC AAAAAGGAAA TTGGGTAGGG GTTTTTCACA

TTTAAAATAT CTGGGAAGTC CCTTCCACTG CTGTGTTCCA

ACAAATGTCA ACAGCAGAAA CATACAAGCT GTCAGCTTTG

CTCAGCAAGA AGCACTGTGG TTGCTGTGTT AGTAATGTGC

CCCACCTGTG TAGGTTCCAA AATATCTAGT GTTTTCATTT

GCACTCCACT GGATAAGCAT TATCCTTATC CAAAACAGCC

CTGACTGTCA ACTGTAGCAT TTTTTGGGGT TACAGTTTGA

GTTTGCTAAC ACACCCTGCA GCTCCAAAGG TTCCCCACCA

TGACCCTTGA ATGGGTTTTC CAGCACCATT TTCATGAGTT

GTTTAACATA GCAGTTACCC CAATAACCTC AGTTTTAACA

AATATTTCCA CAGGTTAAGT CCTCATTTAA ATTAGGCAAA

(2) INFORMATION FOR SEQ ID NO:26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: TGASTCA

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: TGGNNNNNNN GCCCAA 16
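
SEQ ID NO:26 and SEQ ID NO:27 are written with IUPAC ambiguity codes (S stands for G or C, N for any base), so each describes a family of sequences rather than a single one. As an illustration only, and not part of the original disclosure, the Python sketch below expands such a motif into a regular expression and reports where it occurs in a test sequence. The helper names `motif_to_regex` and `find_motif` and the short test sequences are hypothetical, and only the strand as written is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as 'TGASTCA' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    # A lookahead lets matches that overlap each other all be reported.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical test sequences, chosen only to exercise the two motifs.
print(find_motif("TGASTCA", "AATGAGTCATTTGACTCAGG"))       # prints [2, 11]
print(find_motif("TGGNNNNNNNGCCCAA", "TGGACGTACGGCCCAA"))  # prints [0]
```

Searching the opposite strand as well would require running the same search on the reverse complement of the sequence, which this sketch leaves out.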

(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: TGGCA 5

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: TGACACA

(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: TGAGTCA

(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: TGANACA

(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: TGATACA

(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: CCNTGTNT