Title:
BACTERIOPHAGE RM 378 OF A THERMOPHILIC HOST ORGANISM
Document Type and Number:
WIPO Patent Application WO/2000/075335
Kind Code:
A2
Abstract:
A novel bacteriophage RM 378 of Rhodothermus marinus, the nucleic acids of its genome, nucleic acids comprising nucleotide sequences of open reading frames (ORFs) of its genome, and polypeptides encoded by the nucleic acids, are described.

Inventors:
HJORLEIFSDOTTIR SIGRIDUR (IS)
HREGGVIDSSON GUDMUNDUR O (IS)
FRIDJONSSON OLAFUR H (IS)
AEVARSSON ARNTHOR (IS)
KRISTJANSSON JAKOB K (IS)
Application Number:
PCT/IB2000/000893
Publication Date:
December 14, 2000
Filing Date:
June 02, 2000
Assignee:
DECODE GENETICS EHF (IS)
HJORLEIFSDOTTIR SIGRIDUR (IS)
HREGGVIDSSON GUDMUNDUR O (IS)
FRIDJONSSON OLAFUR H (IS)
AEVARSSON ARNTHOR (IS)
KRISTJANSSON JAKOB K (IS)
International Classes:
C07K14/01; C12N1/21; C12N5/02; C12N7/00; C12N7/01; C12N15/34; C12N15/54; C12N15/55; C12N15/61; (IPC1-7): C12N15/34; C07K14/01; C12N7/00; C12N15/54; C12N15/55; C12N15/61
Domestic Patent References:
WO1994026766A1 1994-11-24
Other References:
ALFREDSSON G A ET AL: "RHODOTHERMUS MARINUS,GEN. NOV., SP. NOV., A THERMOPHILIC, HALOPHILIC BACTERIUM FROM SUBMARINE HOT SPRINGS IN ICELAND" JOURNAL OF GENERAL MICROBIOLOGY, vol. 134, no. 2, 1988, pages 299-306, XP000652195 ISSN: 0022-1287 cited in the application
WANG J ET AL: "Crystal structure of a pol alpha family replication DNA polymerase from bacteriophage RB69." CELL, vol. 89, no. 7, 1997, pages 1087-1099, XP002164082 ISSN: 0092-8674 cited in the application
HOPFNER KARL-PETER ET AL: "Crystal structure of a thermostable type B DNA polymerase from Thermococcus gorgonarius." PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES, vol. 96, no. 7, 30 March 1999 (1999-03-30), pages 3600-3605, XP002164083 ISSN: 0027-8424 cited in the application
"Exonuclease III (E. coli)" NEW ENGLAND BIOLABS CATALOG, 1998 - 1999, page 94 XP002164084
"DNA POLYMERASE I, KLENOW (EXONUCLEASE-FREE)" MOLECULAR BIOLOGY REAGENTS,US,USB, CLEVELAND, OH, 1990, page 88 XP000606205
PISANI ET AL: "Amino Acid Residues Involved in Determining the Processivity of the 3'-5' Exonuclease Activity in a Family B DNA Polymerase from the Thermoacidophilic Archaeon Sulfolobus solfataricus" BIOCHEMISTRY, vol. 37, no. 42, 20 October 1998 (1998-10-20), pages 15005-15012, XP002145197 ISSN: 0006-2960
Attorney, Agent or Firm:
Fridriksson, Einar Karl (Borgartun 24, IS-105 Reykjavik, IS)
Claims:
CLAIMS What is claimed is:
1. Isolated bacteriophage RM 378.
2. Isolated nucleic acid molecule comprising the genome of bacteriophage RM 378.
3. Isolated nucleic acid molecule comprising the nucleotide sequence shown in Figure 1 (SEQ ID NO: 1).
4. Isolated nucleic acid molecule comprising a nucleotide sequence of an open reading frame of the nucleotide sequence shown in Figure 1 (SEQ ID NO: 1).
5. The isolated nucleic acid molecule of Claim 4, wherein the nucleic acid molecule comprises a nucleotide sequence of more than one open reading frame of the nucleotide sequence shown in Figure 1 (SEQ ID NO: 1).
6. The isolated nucleic acid molecule of Claim 4, wherein the open reading frame is selected from the group consisting of the open reading frames set forth in Figure 2.
7. The isolated nucleic acid molecule of Claim 6, wherein the open reading frame is selected from the group consisting of : ORF 056e (locus GP43a), ORF 632e (locus GP43b), ORF 739f (locus GP63), ORF 1218a (locus DAS), and ORF 1293b (locus GP41).
8. Isolated nucleic acid molecule which encodes a polypeptide obtainable from bacteriophage RM 378, or an active derivative or fragment thereof.
9. Isolated nucleic acid molecule of Claim 8, wherein the polypeptide is a protein selected from the group consisting of: DNA polymerase, 3'-5' exonuclease, 5'-3' exonuclease (RNase H), DNA helicase and RNA ligase.
10. Isolated nucleic acid molecule of Claim 9, wherein the polypeptide is a DNA polymerase lacking exonuclease domains.
11. Isolated nucleic acid molecule of Claim 8, wherein the polypeptide is a 3'-5' exonuclease lacking DNA polymerase domain.
12. Isolated nucleic acid molecule of Claim 8, wherein the polypeptide is a derivative possessing substantial sequence identity with the endogenous polypeptide.
13. Isolated nucleic acid molecule which encodes a polypeptide that possesses substantial sequence identity with an endogenous polypeptide obtainable from bacteriophage RM 378.
14. A DNA construct comprising an isolated nucleic acid molecule of Claim 2, operatively linked to a regulatory sequence.
15. A host cell comprising a DNA construct of Claim 14.
16. A DNA construct comprising an isolated nucleic acid molecule of Claim 4, operatively linked to a regulatory sequence.
17. A host cell comprising a DNA construct of Claim 16.
18. Isolated polypeptide encoded by a nucleic acid molecule comprising a nucleotide sequence of an open reading frame of the nucleotide sequence shown in Figure 1 (SEQ ID NO: 1).
19. The isolated polypeptide of Claim 18, wherein the open reading frame is selected from the group consisting of the open reading frames set forth in Figure 2.
20. The isolated polypeptide of Claim 19, wherein the open reading frame is selected from the group consisting of : ORF 056e (locus GP43a), ORF 632e (locus GP43b), ORF 739f (locus GP63), ORF 1218a (locus DAS), and ORF 1293b (locus GP41).
21. Isolated polypeptide obtainable from bacteriophage RM 378, or an active derivative or fragment thereof.
22. Isolated polypeptide of Claim 21, wherein the polypeptide is a derivative possessing substantial sequence identity with the endogenous polypeptide.
23. Isolated polypeptide of Claim 21, wherein the polypeptide is a protein selected from the group consisting of: DNA polymerase, 3'-5' exonuclease, 5'-3' exonuclease (RNase H), DNA helicase and RNA ligase.
24. Isolated polypeptide that possesses substantial sequence identity with an endogenous polypeptide obtainable from bacteriophage RM 378.
25. An isolated DNA polymerase lacking exonuclease domains.
26. An isolated 3'-5' exonuclease lacking DNA polymerase domain.
Description:
BACTERIOPHAGE RM 378 OF A THERMOPHILIC HOST ORGANISM

RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application No. 60/137,120, filed June 2, 1999, the entire teachings of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

The use of thermophilic enzymes has revolutionized the field of recombinant DNA technology. Polymerases (DNA and RNA), ligases, exonucleases, reverse transcriptases, polynucleotide kinases and lysozymes, as well as many other thermophilic enzymes, are of great importance in the research industry today. In addition, thermophilic enzymes are also used in commercial settings (e.g., proteases and lipases used in washing powder, hydrolytic enzymes used in bleaching).

Identification of new thermophilic enzymes will facilitate continued DNA research as well as assist in improving commercial enzyme-based products.

SUMMARY OF THE INVENTION

This invention pertains to a novel bacteriophage of Rhodothermus marinus, bacteriophage RM 378, which can be isolated from its native environment or can be recombinantly produced. The invention additionally pertains to the nucleic acids of the genome of bacteriophage RM 378 as deposited, as well as to the nucleic acids of a portion of the genome of bacteriophage RM 378 as shown in Figure 1; to isolated nucleic acid molecules containing a nucleotide sequence of an open reading frame (or more than one open reading frame) of the genome of bacteriophage RM 378, such as an open reading frame as set forth in Figure 2; to isolated nucleic acid molecules encoding a polypeptide obtainable from bacteriophage RM 378 or an active derivative or fragment of the polypeptide (e.g., a DNA polymerase, such as a DNA polymerase lacking exonuclease domains; a 3'-5' exonuclease, such as a 3'-5' exonuclease lacking DNA polymerase domain; a 5'-3' exonuclease (RNase H); a DNA helicase; or an RNA ligase); to DNA constructs containing the isolated nucleic acid molecule operatively linked to a regulatory sequence; and also to host cells comprising the DNA constructs. The invention further pertains to isolated polypeptides encoded by these nucleic acids, as well as active derivatives or fragments of the polypeptides.

Because the host organism of the RM 378 bacteriophage is a thermophile, the enzymes and proteins of the RM 378 bacteriophage are expected to be significantly more thermostable than those of other (e.g., mesophilic) bacteriophages, such as the T4 bacteriophage of Escherichia coli. The enhanced stability of the enzymes and proteins of RM 378 bacteriophage allows their use under temperature conditions which would be prohibitive for other enzymes, thereby increasing the range of conditions which can be employed not only in DNA research but also in commercial settings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGs. 1A-1Q2 are a depiction of the nucleic acid sequence (SEQ ID NO: 1) of the genome of bacteriophage RM 378.

FIGs. 2A-2C delineate the open reading frames (ORFs) in the genome of bacteriophage RM 378.

FIGs. 3A-3P depict a sequence alignment of the predicted gene products of ORF056e and ORF632e and sequences of DNA polymerases of family B. The sequence marked RM378 (SEQ ID NO: 36) is the combined sequences of the gene products of ORF056e and ORF632e in bacteriophage RM378. The end of one sequence and the beginning of another is indicated. Other sequences are: Vaccinia virus (strain Copenhagen) DNA polymerase (DPOL_VACCC) (SEQ ID NO: 2); Vaccinia virus (strain WR) DNA polymerase (DPOL_VACCV) (SEQ ID NO: 3); Variola virus DNA polymerase (DPOL_VARV) (SEQ ID NO: 4); Fowlpox virus DNA polymerase (DPOL_FOWPV) (SEQ ID NO: 5); Bos taurus (Bovine) DNA polymerase delta catalytic chain (DPOD_BOVIN) (SEQ ID NO: 6); Human DNA polymerase delta catalytic chain (DPOD_HUMAN) (SEQ ID NO: 7); Candida albicans (Yeast) DNA polymerase delta large chain (DPOD_CANAL) (SEQ ID NO: 8); Saccharomyces cerevisiae DNA polymerase delta large chain (DPOD_YEAST) (SEQ ID NO: 9); Schizosaccharomyces pombe DNA polymerase delta large chain (DPOD_SCHPO) (SEQ ID NO: 10); Plasmodium falciparum DNA polymerase delta catalytic chain (DPOD_PLAFK) (SEQ ID NO: 11); Chlorella virus NY-2A DNA polymerase (DPOL_CHVN2) (SEQ ID NO: 12); Paramecium bursaria chlorella virus 1 DNA polymerase (DPOL_CHVP1) (SEQ ID NO: 13); Epstein-Barr virus (strain B95-8) DNA polymerase (DPOL_EBV) (SEQ ID NO: 14); Herpesvirus saimiri (strain 11) DNA polymerase (DPOL_HSVSA) (SEQ ID NO: 15); Herpes simplex virus (type 1/strain 17) DNA polymerase (DPOL_HSV11) (SEQ ID NO: 16); Herpes simplex virus (type 2/strain 186) DNA polymerase (DPOL_HSV21) (SEQ ID NO: 17); Equine herpesvirus type 1 (strain Ab4p) (EHV-1) DNA polymerase (DPOL_HSVEB) (SEQ ID NO: 18); Varicella-zoster virus (strain Dumas) (VZV) DNA polymerase (DPOL_VZVD) (SEQ ID NO: 19); Human cytomegalovirus (strain AD 169) DNA polymerase (DPOL_HCMVA) (SEQ ID NO: 20); Murine cytomegalovirus (strain Smith) DNA polymerase (DPOL_MCMVS) (SEQ ID NO: 21); Herpes simplex virus (type 6/strain Uganda-1102) DNA polymerase (DPOL_HSV6U) (SEQ ID NO: 22); Human DNA polymerase alpha catalytic subunit (DPOA_HUMAN) (SEQ ID NO: 23); Mouse DNA polymerase alpha catalytic subunit (DPOA_MOUSE) (SEQ ID NO: 24); Drosophila melanogaster DNA polymerase alpha catalytic subunit (DPOA_DROME) (SEQ ID NO: 25); Schizosaccharomyces pombe DNA polymerase alpha catalytic subunit (DPOA_SCHPO) (SEQ ID NO: 26); Saccharomyces cerevisiae DNA polymerase alpha catalytic subunit (DPOA_YEAST) (SEQ ID NO: 27); Trypanosoma brucei DNA polymerase alpha catalytic subunit (DPOA_TRYBB) (SEQ ID NO: 28); Autographa californica nuclear polyhedrosis virus DNA polymerase (DPOL_NPVAC) (SEQ ID NO: 29); Lymantria dispar multicapsid nuclear polyhedrosis virus DNA polymerase (DPOL_NPVLD) (SEQ ID NO: 30); Saccharomyces cerevisiae DNA polymerase zeta catalytic subunit (DPOZ_YEAST) (SEQ ID NO: 31); Pyrococcus woesei DNA polymerase (DPOL_PYRFU) (SEQ ID NO: 32); Sulfolobus solfataricus DNA polymerase I (DPO1_SULSO) (SEQ ID NO: 33); Escherichia coli DNA polymerase II (DPO2_ECOLI) (SEQ ID NO: 34); Desulfurococcus strain Tok DNA polymerase (DpolDtok) (SEQ ID NO: 35); and bacteriophage RB69 DNA polymerase (RB69) (SEQ ID NO: 37). Most of the sequences are partial as found in the Pfam protein family database (http://www.sanger.ac.uk/Pfam, family Da

FIG. 4 depicts a sequence alignment of the predicted gene product of ORF739f from bacteriophage RM378 (ORF-739f) (SEQ ID NO: 40), Autographa californica nucleopolyhedrovirus putative bifunctional polynucleotide kinase and RNA ligase (ACNV-RNAlig) (SEQ ID NO: 38); and bacteriophage T4 RNA ligase (T4-RNAlig) (SEQ ID NO: 39).

FIG. 5 depicts a sequence alignment of the predicted gene product of ORF1218a from bacteriophage RM378 (ORF-1218a) (SEQ ID NO: 43) with proteins or domains with 5'-3' exonuclease activity, including: Escherichia coli DNA polymerase I (Ecoli-polI) (SEQ ID NO: 41), Thermus aquaticus DNA polymerase I (Taq-polI) (SEQ ID NO: 42), bacteriophage T4 ribonuclease H (T4-RNaseH) (SEQ ID NO: 44) and bacteriophage T7 gene 6 exonuclease (T7-gp6exo) (SEQ ID NO: 45). Conservation of acidic residues, mainly clustered at the proposed active site, is seen.

FIGs. 6A-6B depict a sequence alignment of the predicted gene product of ORF1293b (SEQ ID NO: 55) from bacteriophage RM378 (ORF1293b) with sequences of replicative DNA helicases of the DnaB family, including: Escherichia coli (DnaB-Ecoli) (SEQ ID NO: 46), Haemophilus influenzae (DnaB-Hinflu) (SEQ ID NO: 47), Chlamydia trachomatis (SEQ ID NO: 48), Bacillus stearothermophilus (DnaB-Bstearo) (SEQ ID NO: 49), Helicobacter pylori (DnaB-Hpylor) (SEQ ID NO: 50), Mycoplasma genitalium (DnaB-Mgenital) (SEQ ID NO: 51), Borrelia burgdorferi (DnaB-Bburgdor) (SEQ ID NO: 52), bacteriophage T4 gene 41 (T4-gp41) (SEQ ID NO: 53), bacteriophage T7 gene 4 (T7-gp4) (SEQ ID NO: 54) (from Pfam protein family database, http://www.sanger.ac.uk/Pfam, family DnaB, accession no. PF00772). The sequences have been truncated at the N-termini, and conserved sequence motifs are indicated.

FIGs. 7A-7B depict the nucleic acid sequence of open reading frame ORF 056e (nucleotides 21993-23042 of the genome) (SEQ ID NO: 56) with flanking sequences, and the putative encoded polypeptide (SEQ ID NO: 57) which displays amino acid sequence similarity to polymerase 3'-5' exonucleases.

FIGs. 8A-8B depict the nucleic acid sequence of open reading frame ORF 632e (nucleotides 79584-81152 of the genome) (SEQ ID NO: 58) with flanking sequences, and the putative encoded polypeptide (SEQ ID NO: 59) which displays amino acid sequence similarity to polymerases.

FIGs. 9A-9B depict the nucleic acid sequence of open reading frame ORF 739f (nucleotides 90291-91607 of the genome) (SEQ ID NO: 60) with flanking sequences, and the putative encoded polypeptide (SEQ ID NO: 40) which displays amino acid sequence similarity to RNA ligase.

FIGs. 10A-10B depict the nucleic acid sequence of open reading frame ORF 1218a (nucleotides 8212-9168 of the genome) (SEQ ID NO: 61) with flanking sequences, and the putative encoded polypeptide (SEQ ID NO: 43) which displays amino acid sequence similarity to 5'-3' exonuclease of DNA polymerase I and T4 RNase H.

FIGs. 11A-11B depict the nucleic acid sequence of open reading frame ORF 1293b (nucleotides 15785-17035 of the genome) (SEQ ID NO: 62) with flanking sequences, and the putative encoded polypeptide (SEQ ID NO: 55) which displays amino acid sequence similarity to T4 DNA helicase.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a bacteriophage, the nucleic acid sequence of the bacteriophage genome as well as portions of the nucleic acid sequence of the bacteriophage genome (e.g., a portion containing an open reading frame), and proteins encoded by the nucleic acid sequences, as well as nucleic acid constructs comprising portions of the nucleic acid sequence of the bacteriophage genome, and host cells comprising such nucleic acid constructs. As described herein, Applicants have isolated and characterized a novel bacteriophage active against the slightly halophilic, thermophilic eubacterium Rhodothermus marinus. The bacteriophage, RM 378, is a member of the Myoviridae family, with an A2 morphology. RM 378, which is completely stable up to about 65°C, appears to consist of approximately 16 proteins with one major protein of molecular weight of 61,000 daltons. RM 378 can be replicated in Rhodothermus marinus species ITI 378.

RHODOTHERMUS MARINUS SPECIES ITI 378

Accordingly, one embodiment of the invention is the bacterium, Rhodothermus marinus species ITI 378. Rhodothermus marinus, and particularly Rhodothermus marinus species ITI 378, can be cultured in a suitable medium, such as medium 162 for Thermus as described by Degryse et al. (Arch. Microbiol. 117: 189-196 (1978)), with 1/10 buffer and with 1% NaCl. Rhodothermus marinus species ITI 378 can be used in replication of bacteriophage RM 378, as described herein, or in replication or identification of other bacteriophages, particularly thermophilic bacteriophages. Rhodothermus marinus species ITI 378 can also be used in the study of the relationship between the bacteriophages and their host cells (e.g., between bacteriophage RM 378 and Rhodothermus marinus species ITI 378).

BACTERIOPHAGE RM 378

Another embodiment of the invention is isolated RM 378 bacteriophage. "Isolated" RM 378 bacteriophage refers to bacteriophage that has been separated, partially or totally, from its native environment (e.g., separated from Rhodothermus marinus host cells) ("native bacteriophage"), and also refers to bacteriophage that has been chemically synthesized or recombinantly produced ("recombinant bacteriophage"). A bacteriophage that has been "recombinantly produced" refers to a bacteriophage that has been manufactured using recombinant DNA technology, such as by inserting the bacteriophage genome into an appropriate host cell (e.g., by introducing the genome itself into a host cell, or by incorporating the genome into a vector, which is then introduced into the host cell).

Isolated bacteriophage RM 378 can be used in the study of the relationship between the bacteriophages and their host cells (e.g., Rhodothermus marinus, such as Rhodothermus marinus species ITI 378). Isolated bacteriophage RM 378 can also be used as a vector to deliver nucleic acids to a host cell; that is, the bacteriophage can be modified to deliver nucleic acids comprising a gene from an organism other than the bacteriophage (a "foreign" gene). For example, nucleic acids encoding a polypeptide (e.g., an enzyme or pharmaceutical peptide) can be inserted into the genome of bacteriophage RM 378, using standard techniques. The resultant modified bacteriophage can then be used to infect host cells, and the protein encoded by the foreign nucleic acids can then be produced.

Bacteriophage RM 378 can be produced by inoculating appropriate host cells with the bacteriophage. Representative host cells in which the bacteriophage can replicate include Rhodothermus marinus, particularly species isolated in a location that is geographically similar to the location where bacteriophage RM 378 was isolated (e.g., northwest Iceland). In a preferred embodiment, the host cell is Rhodothermus marinus species ITI 378. The host cells are cultured in a suitable medium (e.g., medium 162 for Thermus as described by Degryse et al., Arch. Microbiol. 117: 189-196 (1978), with 1/10 buffer and with 1% NaCl). In addition, the host cells are cultured under conditions suitable for replication of the bacteriophage. For example, in a preferred embodiment, the host cells are cultured at a temperature of at least approximately 50°C. In a more preferred embodiment, the host cells are cultured at a temperature between about 50°C and about 80°C. The bacteriophage can also be stored in a cell lysate at about 4°C.

NUCLEIC ACIDS OF THE INVENTION

Another embodiment of the invention pertains to isolated nucleic acid sequences obtainable from the genome of bacteriophage RM 378. As described herein, approximately 130 kB of the genome of bacteriophage RM 378 have been sequenced. The sequence of this 130 kB is set forth in Figure 1. There are at least approximately 200 open reading frames (ORFs) in the sequence; of these, at least approximately 120 putatively encode a polypeptide of 100 amino acids in length or longer. These 120 are set forth in Figure 2. Figure 2 sets forth the locus of each ORF; the start and stop nucleotides in the sequence of each ORF; the number of nucleotides in the ORF, and the expected number of amino acids encoded therein; the direction of the ORF; the identity of the putative protein encoded therein; the protein identified by a BLAST search as being the closest match to the putative protein; the percentage identity at the amino acid level of the putative protein (based on partial sequence similarity; the overall similarity is lower); the organism from which the closest matching protein is derived; and other information relating to the ORFs.
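As a rough illustration of the per-ORF quantities that Figure 2 is said to tabulate, the following sketch derives nucleotide length and codon count from start and stop coordinates. The coordinates are those quoted for Figures 7-11; the function names are hypothetical, and whether each quoted range includes a stop codon is an assumption, so the amino acid count holds only under that assumption.

```python
# Illustrative sketch only: names are hypothetical, not taken from the application.
# Coordinates are the nucleotide ranges quoted in the descriptions of Figures 7-11.
ORF_COORDS = {
    "056e": (21993, 23042),
    "632e": (79584, 81152),
    "739f": (90291, 91607),
    "1218a": (8212, 9168),
    "1293b": (15785, 17035),
}


def orf_length(start: int, stop: int) -> int:
    """Number of nucleotides in an ORF given inclusive, 1-based coordinates."""
    return abs(stop - start) + 1


def codon_count(start: int, stop: int) -> int:
    """Number of complete codons spanned by the ORF."""
    return orf_length(start, stop) // 3


def expected_amino_acids(start: int, stop: int, includes_stop_codon: bool = True) -> int:
    """Assumption: a range that includes the stop codon encodes one residue fewer."""
    codons = codon_count(start, stop)
    return codons - 1 if includes_stop_codon else codons


for name, (start, stop) in ORF_COORDS.items():
    print(name, orf_length(start, stop), expected_amino_acids(start, stop))
```

The same arithmetic can be used to check the 100-amino-acid cutoff mentioned above for any ORF whose coordinates are known.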

The invention thus pertains to isolated nucleic acid sequence of the genome ("isolated genomic DNA") of the bacteriophage RM 378 that has been deposited with the Deutsche Sammlung Von Mikroorganismen und Zellkulturen GmbH (DSMZ) as described below. The invention also pertains to isolated nucleic acid sequence of the genome of bacteriophage RM 378 as is shown in Figure 1 (SEQ ID NO: 1).

The invention additionally pertains to isolated nucleic acid molecules comprising the nucleotide sequences of each of the ORFs described above or fragments thereof, as well as nucleic acid molecules comprising nucleotide sequences of more than one of the ORFs described above or fragments of more than one of the ORFs. The nucleic acid molecules of the invention can be DNA, or can also be RNA, for example, mRNA. DNA molecules can be double-stranded or single-stranded; single-stranded RNA or DNA can be either the coding, or sense, strand or the non-coding, or antisense, strand. Preferably, the nucleic acid molecule comprises at least about 100 nucleotides, more preferably at least about 150 nucleotides, and even more preferably at least about 200 nucleotides. The nucleotide sequence can be only that which encodes at least a fragment of the amino acid sequence of a polypeptide; alternatively, the nucleotide sequence can include at least a fragment of a coding sequence along with additional non-coding sequences such as non-coding 3' and 5' sequences (including regulatory sequences, for example).

In certain preferred embodiments, the nucleotide sequence comprises one of the following ORFs: ORF 056e, 632e, 739f, 1218a, 1293b. For example, the nucleotide sequence can consist essentially of one of the ORFs and its flanking sequences, such as are shown in Figures 7-11 (e.g., ORF 056e (SEQ ID NO: 56), 632e (SEQ ID NO: 58), 739f (SEQ ID NO: 60), 1218a (SEQ ID NO: 61), 1293b (SEQ ID NO: 62)).

Additionally, the nucleotide sequence(s) can be fused to a marker sequence, for example, a sequence which encodes a polypeptide to assist in isolation or purification of the polypeptide. Representative sequences include, but are not limited to, those which encode a glutathione-S-transferase (GST) fusion protein. In one embodiment, the nucleotide sequence contains a single ORF in its entirety (e.g., encoding a polypeptide, as described below); or contains a nucleotide sequence encoding an active derivative or active fragment of the polypeptide; or encodes a polypeptide which has substantial sequence identity to the polypeptides described herein. In a preferred embodiment, the nucleic acid encodes a polymerase (e.g., DNA polymerase); DNA polymerase accessory protein; dsDNA binding protein; deoxyribonucleotide-3-phosphatase; DNA topoisomerase; DNA helicase; an exonuclease (e.g., 3'-5' exonuclease, 5'-3' exonuclease (RNase H)); RNA ligase; site-specific RNase inhibitor of protease; endonuclease; exonuclease; mobility nuclease; reverse transcriptase; single-stranded binding protein; endolysin; lysozyme; helicase; alpha-glucosyltransferase; or thymidine kinase, as described herein. In a particularly preferred embodiment, the nucleic acid encodes a DNA polymerase, 3'-5' exonuclease, 5'-3' exonuclease (RNase H), DNA helicase or RNA ligase. In another particularly preferred embodiment, the nucleic acid encodes a DNA polymerase that lacks exonuclease domains, or a 3'-5' exonuclease that lacks DNA polymerase domain, as described below.

The nucleic acid molecules of the invention are "isolated"; as used herein, an "isolated" nucleic acid molecule or nucleotide sequence is intended to mean a nucleic acid molecule or nucleotide sequence which is not flanked by nucleotide sequences which normally (in nature) flank the gene or nucleotide sequence (as in genomic sequences) and/or has been completely or partially purified from other transcribed sequences (e.g., as in an RNA library). For example, an isolated nucleic acid of the invention may be substantially isolated with respect to the complex cellular milieu in which it naturally occurs. In some instances, the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix. In other circumstances, the material may be purified to essential homogeneity, for example as determined by PAGE or column chromatography such as HPLC. Thus, an isolated nucleic acid molecule or nucleotide sequence can include a nucleic acid molecule or nucleotide sequence which is synthesized chemically or by recombinant means. Therefore, recombinant DNA contained in a vector is included in the definition of "isolated" as used herein. Also, isolated nucleotide sequences include recombinant DNA molecules in heterologous organisms, as well as partially or substantially purified DNA molecules in solution. In vivo and in vitro RNA transcripts of the DNA molecules of the present invention are also encompassed by "isolated" nucleotide sequences.

The present invention also pertains to nucleotide sequences which are not necessarily found in nature but which encode the polypeptides described below. Thus, DNA molecules which comprise a sequence which is different from the naturally-occurring nucleotide sequence but which, due to the degeneracy of the genetic code, encode the polypeptides of the present invention are the subject of this invention. The invention also encompasses variations of the nucleotide sequences of the invention, such as those encoding active fragments or active derivatives of the polypeptides as described below. Such variations can be naturally-occurring, or non-naturally-occurring, such as those induced by various mutagens and mutagenic processes. Intended variations include, but are not limited to, addition, deletion and substitution of one or more nucleotides which can result in conservative or non-conservative amino acid changes, including additions and deletions. Preferably, the nucleotide or amino acid variations are silent or conserved; that is, they do not alter the characteristics or activity of the encoded polypeptide.
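The degeneracy of the genetic code can be made concrete with a short sketch. The example below builds the standard genetic code table and shows two different DNA sequences that translate to the same peptide. Both sequences are hypothetical toy examples, not taken from SEQ ID NO: 1, and the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Hypothetical sequences: the variant differs at three third-codon positions.
natural = "ATGGCTAAAGAA"
variant = "ATGGCCAAGGAG"
print(translate(natural), translate(variant))
```

Because every changed position here is a synonymous third-base substitution, the variant is a "silent" variation in the sense used above.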

The invention described herein also relates to fragments of the isolated nucleic acid molecules described herein. The term "fragment" is intended to encompass a portion of a nucleotide sequence described herein which is from at least about 25 contiguous nucleotides to at least about 50 contiguous nucleotides or longer in length; such fragments are useful as probes and also as primers. Particularly preferred primers and probes selectively hybridize to the nucleic acid molecule encoding the polypeptides described herein. For example, fragments which encode polypeptides that retain activity, as described below, are particularly useful.

The invention also pertains to nucleic acid molecules which hybridize under high stringency hybridization conditions, such as for selective hybridization, to a nucleotide sequence described herein (e.g., nucleic acid molecules which specifically hybridize to a nucleotide sequence encoding polypeptides described herein, and, optionally, have an activity of the polypeptide). Hybridization probes are oligonucleotides which bind in a base-specific manner to a complementary strand of nucleic acid. Suitable probes include polypeptide nucleic acids, as described in (Nielsen et al., Science 254, 1497-1500 (1991)).

Such nucleic acid molecules can be detected and/or isolated by specific hybridization (e.g., under high stringency conditions). "Stringency conditions" for hybridization is a term of art which refers to the incubation and wash conditions, e.g., conditions of temperature and buffer concentration, which permit hybridization of a particular nucleic acid to a second nucleic acid; the first nucleic acid may be perfectly (i.e., 100%) complementary to the second, or the first and second may share some degree of complementarity which is less than perfect (e.g., 60%, 75%, 85%, 95%). For example, certain high stringency conditions can be used which distinguish perfectly complementary nucleic acids from those of less complementarity.

"High stringency conditions", "moderate stringency conditions" and "low stringency conditions" for nucleic acid hybridizations are explained on pages 2.10.1-2.10.16 and pages 6.3.1-6 in Current Protocols in Molecular Biology (Ausubel, F. M. et al., "Current Protocols in Molecular Biology", John Wiley & Sons, (1998)) the teachings of which are hereby incorporated by reference. The exact conditions which determine the stringency of hybridization depend not only on ionic strength (e.g., 0.2XSSC, 0.1XSSC), temperature (e.g., room temperature, 42°C, 68°C) and the concentration of destabilizing agents such as formamide or denaturing agents such as SDS, but also on factors such as the length of the nucleic acid sequence, base composition, percent mismatch between hybridizing sequences and the frequency of occurrence of subsets of that sequence within other non-identical sequences. Thus, high, moderate or low stringency conditions can be determined empirically.

By varying hybridization conditions from a level of stringency at which no hybridization occurs to a level at which hybridization is first observed, conditions which will allow a given sequence to hybridize (e. g., selectively) with the most similar sequences in the sample can be determined.
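The degree of complementarity between two strands can be quantified with a simple calculation. The sketch below is a minimal illustration that assumes two equal-length DNA strands, both written 5' to 3', with no gaps or bulges; the function names and the short test sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs perfectly with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(probe: str, target: str) -> float:
    """Percentage of positions at which probe and target form Watson-Crick pairs.

    Assumes equal-length, ungapped strands, each written 5' to 3'.
    """
    if not probe or len(probe) != len(target):
        raise ValueError("strands must be non-empty and of equal length")
    perfect_partner = reverse_complement(target)
    matches = sum(p == q for p, q in zip(probe.upper(), perfect_partner))
    return 100 * matches / len(probe)


print(percent_complementarity("ATGC", "GCAT"))  # perfectly complementary pair
print(percent_complementarity("ATGC", "GCAA"))  # one mismatched position
```

Real hybrids may also contain gaps or loops, which this position-by-position comparison does not model.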

Exemplary conditions are described in Krause, M. H. and S. A. Aaronson, Methods in Enzymology, 200: 546-555 (1991). See also Ausubel, et al., "Current Protocols in Molecular Biology", John Wiley & Sons, (1998), which describes protocols for determination of washing conditions for moderate or low stringency conditions.

Washing is the step in which conditions are usually set so as to determine a minimum level of complementarity of the hybrids. Generally, starting from the lowest temperature at which only homologous hybridization occurs, each °C by which the final wash temperature is reduced (holding SSC concentration constant) allows an increase by 1% in the maximum extent of mismatching among the sequences that hybridize. Generally, doubling the concentration of SSC results in an increase in Tm. Using these guidelines, the washing temperature can be determined empirically for high, moderate or low stringency, depending on the level of mismatch sought.
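The temperature guideline above (roughly 1% additional mismatch tolerated per °C of reduction in final wash temperature, at constant SSC) can be written as simple arithmetic. This is only a sketch of that rule of thumb: the 68°C reference value in the usage lines is a hypothetical stand-in for "the lowest temperature at which only homologous hybridization occurs", which must be found empirically, and the function names are illustrative.

```python
def max_mismatch_percent(stringent_temp_c: float, wash_temp_c: float,
                         percent_per_degree: float = 1.0) -> float:
    """Approximate maximum mismatch tolerated at a given final wash temperature.

    stringent_temp_c is the lowest temperature at which only homologous
    hybridization occurs (an empirically determined value, assumed known here).
    """
    reduction = max(0.0, stringent_temp_c - wash_temp_c)
    return reduction * percent_per_degree


def wash_temp_for_mismatch(stringent_temp_c: float, mismatch_percent: float,
                           percent_per_degree: float = 1.0) -> float:
    """Final wash temperature that would tolerate roughly the given mismatch."""
    return stringent_temp_c - mismatch_percent / percent_per_degree


# Hypothetical reference temperature of 68 degrees C.
print(max_mismatch_percent(68, 58))
print(wash_temp_for_mismatch(68, 15))
```

The estimate holds only while SSC concentration is kept constant, as the guideline states.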

For example, a low stringency wash can comprise washing in a solution containing 0.2XSSC/0.1% SDS for 10 min at room temperature; a moderate stringency wash can comprise washing in a prewarmed (42°C) solution containing 0.2XSSC/0.1% SDS for 15 min at 42°C; and a high stringency wash can comprise washing in prewarmed (68°C) solution containing 0.1XSSC/0.1% SDS for 15 min at 68°C. Furthermore, washes can be performed repeatedly or sequentially to obtain a desired result as known in the art.
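The three example washes can be captured as a small lookup table, which also makes the ionic strength explicit. The sketch below assumes the conventional SSC formulation (1X SSC = 150 mM NaCl and 15 mM sodium citrate), which is not stated in the text; the class and field names are hypothetical, and room temperature is represented as None because no numeric value is given.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: conventional SSC, 1X = 150 mM NaCl + 15 mM sodium citrate.
NACL_MM_PER_1X_SSC = 150.0
CITRATE_MM_PER_1X_SSC = 15.0


@dataclass(frozen=True)
class Wash:
    ssc_x: float               # SSC concentration as a multiple of 1X
    sds_percent: float         # SDS concentration in percent
    minutes: int               # duration of the wash
    temp_c: Optional[float]    # None stands for room temperature


# Example washes as given in the text.
WASHES = {
    "low": Wash(ssc_x=0.2, sds_percent=0.1, minutes=10, temp_c=None),
    "moderate": Wash(ssc_x=0.2, sds_percent=0.1, minutes=15, temp_c=42.0),
    "high": Wash(ssc_x=0.1, sds_percent=0.1, minutes=15, temp_c=68.0),
}


def nacl_mM(wash: Wash) -> float:
    """NaCl concentration of the wash solution under the SSC assumption above."""
    return wash.ssc_x * NACL_MM_PER_1X_SSC


for name, wash in WASHES.items():
    print(name, round(nacl_mM(wash), 1), wash.temp_c)
```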

Equivalent conditions can be determined by varying one or more of the parameters given as an example, as known in the art, while maintaining a similar degree of identity or similarity between the target nucleic acid molecule and the primer or probe used. Hybridizable nucleic acid molecules are useful as probes and primers, e. g., for diagnostic applications.

Such hybridizable nucleotide sequences are useful as probes and primers for diagnostic applications. As used herein, the term "primer" refers to a single-stranded oligonucleotide which acts as a point of initiation of template-directed DNA synthesis under appropriate conditions (e.g., in the presence of four different nucleoside triphosphates and an agent for polymerization, such as DNA or RNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate length of a primer depends on the intended use of the primer, but typically ranges from 15 to 30 nucleotides. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact sequence of the template, but must be sufficiently complementary to hybridize with a template. The term "primer site" refers to the area of the target DNA to which a primer hybridizes. The term "primer pair" refers to a set of primers including a 5' (upstream) primer that hybridizes with the 5' end of the DNA sequence to be amplified and a 3' (downstream) primer that hybridizes with the complement of the 3' end of the sequence to be amplified.

The invention also pertains to nucleotide sequences which have a substantial identity with the nucleotide sequences described herein; particularly preferred are nucleotide sequences which have at least about 10%, preferably at least about 20%, more preferably at least about 30%, more preferably at least about 40%, even more preferably at least about 50%, yet more preferably at least about 70%, still more preferably at least about 80%, and even more preferably at least about 90% identity, with nucleotide sequences described herein. Particularly preferred in this instance are nucleotide sequences encoding polypeptides having an activity of a polypeptide described herein. For example, in one embodiment, the nucleotide sequence encodes a DNA polymerase, 3'-5'exonuclease, 5'-3'exonuclease (RNase H), DNA helicase, or RNA ligase, as described below. In a preferred embodiment, the nucleotide encodes a DNA polymerase lacking exonuclease domains, or a 3'-5' exonuclease lacking DNA polymerase domain, as described below.

To determine the percent identity of two nucleotide sequences, the sequences are aligned for optimal comparison purposes (e. g., gaps can be introduced in the

sequence of a first nucleotide sequence). The nucleotides at corresponding nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = # of identical positions/total # of positions x 100).
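The percent identity formula above can be illustrated with a short sketch. It assumes the two sequences have already been aligned (gaps written as "-") and takes the total number of positions to be the alignment length; the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    % identity = (# of identical positions / total # of positions) x 100.
    A gap character ("-") never counts as an identical position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    identical = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return identical / len(aligned_a) * 100


# Seven of eight aligned positions match.
print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
```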

The determination of percent identity between two sequences can be accomplished using a mathematical algorithm. A preferred, non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin et al., Proc. Natl. Acad. Sci. USA, 90: 5873-5877 (1993). Such an algorithm is incorporated into the NBLAST program which can be used to identify sequences having the desired identity to nucleotide sequences of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., Nucleic Acids Res., 25: 3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., NBLAST) can be used. See http://www.ncbi.nlm.nih.gov. In one embodiment, parameters for sequence comparison can be set at W=12. Parameters can also be varied (e.g., W=5 or W=20). The value "W" determines how many contiguous nucleotides must be identical for the program to identify two sequences as containing regions of identity.
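The role of the word size W can be illustrated with a simplified sketch that lists the exact W-nucleotide words shared by two sequences. This is not the BLAST implementation; the function name and the example sequences are hypothetical and serve only to show how a larger W demands a longer run of identical nucleotides.

```python
def shared_words(seq_a: str, seq_b: str, w: int = 12) -> set[str]:
    """Return every exact word of length w that occurs in both sequences."""
    if w <= 0:
        raise ValueError("word size must be positive")
    a, b = seq_a.upper(), seq_b.upper()
    # All words of length w present in the first sequence.
    words_a = {a[i:i + w] for i in range(len(a) - w + 1)}
    # Keep the words of the second sequence that also occur in the first.
    return {b[j:j + w] for j in range(len(b) - w + 1) if b[j:j + w] in words_a}


query = "TTGACCGTAGCTAGGCTAACG"
subject = "CCGTAGCTAGGCTAAGGT"
# The sequences share a 15-nucleotide identical stretch, so W=12 finds
# shared words while W=20 finds none.
print(len(shared_words(query, subject, w=12)))  # 4
print(len(shared_words(query, subject, w=20)))  # 0
```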

The invention also provides expression vectors containing a nucleic acid sequence encoding a polypeptide described herein (or an active derivative or fragment thereof), operably linked to at least one regulatory sequence. Many expression vectors are commercially available, and other suitable vectors can be readily prepared by the skilled artisan. "Operably linked" is intended to mean that the nucleotide sequence is linked to a regulatory sequence in a manner which allows expression of the nucleic acid sequence. Regulatory sequences are art-recognized and are selected to produce the polypeptide or active derivative or fragment thereof.

Accordingly, the term"regulatory sequence"includes promoters, enhancers, and other expression control elements which are described in Goeddel, Gene Expression

Technology: Methods in Enzymology 185, Academic Press, San Diego, CA (1990).

For example, the native regulatory sequences or regulatory sequences native to bacteriophage RM 378 can be employed. It should be understood that the design of the expression vector may depend on such factors as the choice of the host cell to be transformed and/or the type of polypeptide desired to be expressed. For instance, the polypeptides of the present invention can be produced by ligating the cloned gene, or a portion thereof, into a vector suitable for expression in an appropriate host cell (see, for example, Broach, et al., Experimental Manipulation of Gene Expression, ed. M. Inouye (Academic Press, 1983) p. 83; Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. Sambrook et al. (Cold Spring Harbor Laboratory Press, 1989) Chapters 16 and 17). Typically, expression constructs will contain one or more selectable markers, including, but not limited to, the gene that encodes dihydrofolate reductase and the genes that confer resistance to neomycin, tetracycline, ampicillin, chloramphenicol, kanamycin and streptomycin. Thus, prokaryotic and eukaryotic host cells transformed by the described expression vectors are also provided by this invention. For instance, cells which can be transformed with the vectors of the present invention include, but are not limited to, bacterial cells such as Rhodothermus marinus, E. coli (e.g., E. coli K12 strains), Streptomyces, Pseudomonas, Bacillus, Serratia marcescens and Salmonella typhimurium. The host cells can be transformed by the described vectors by various methods (e.g., electroporation; transfection using calcium chloride, rubidium chloride, calcium phosphate, DEAE-dextran, or other substances; microprojectile bombardment; lipofection; infection where the vector is an infectious agent such as a retroviral genome; and other methods), depending on the type of cellular host. The nucleic acid molecules of the present invention can be produced, for example, by replication in such a host cell, as described above.
Alternatively, the nucleic acid molecules can also be produced by chemical synthesis.

The isolated nucleic acid molecules and vectors of the invention are useful in the manufacture of the encoded polypeptide, as probes for isolating homologous sequences (e. g., from other bacteriophage species), as well as for detecting the presence of the bacteriophage in a culture of host cells.

The nucleotide sequences of the nucleic acid molecules described herein (e.g., a nucleic acid molecule comprising any of the open reading frames shown in Figure 2, such as a nucleic acid molecule comprising the open reading frames depicted in Figures 7-11 (SEQ ID NO: 61 and 62, respectively)) can be amplified by methods known in the art. For example, this can be accomplished by PCR. See generally PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Patent 4,683,202.

Other suitable amplification methods include the ligase chain reaction (LCR) (see Wu and Wallace, Genomics 4, 560 (1989); Landegren et al., Science 241, 1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173 (1989)), self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87, 1874 (1990)) and nucleic acid based sequence amplification (NASBA). The latter two amplification methods involve isothermal reactions based on isothermal transcription, which produce both single stranded RNA (ssRNA) and double stranded DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1, respectively.

The amplified DNA can be radiolabelled and used as a probe for screening a library or other suitable vector to identify homologous nucleotide sequences.

Corresponding clones can be isolated, DNA can be obtained following in vivo excision, and the cloned insert can be sequenced in either or both orientations by art-recognized methods, to identify the correct reading frame encoding a protein of the appropriate molecular weight. For example, the direct analysis of the nucleotide sequence of homologous nucleic acid molecules of the present invention can be accomplished using either the dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual, (Acad. Press, 1988)). Using these or similar methods, the protein(s) and the DNA encoding the protein can be isolated, sequenced and further characterized.

POLYPEPTIDES OF THE INVENTION

The invention additionally relates to isolated polypeptides obtainable from the bacteriophage RM 378. The term "polypeptide," as used herein, includes proteins, enzymes, peptides, and gene products encoded by nucleic acids described herein. In one embodiment, the invention pertains to the polypeptides encoded by the ORFs as described above. In addition, as described in detail below, bacteriophage RM 378 is similar to the well-known E. coli bacteriophage T4. Thus, it is expected that bacteriophage RM 378 comprises additional polypeptides that are homologous to those found in bacteriophage T4.

For example, representative proteins expected to be encoded by genes of bacteriophage RM 378 include the following: DNA topoisomerase; exonuclease (e.g., 3'-5' exonuclease, 5'-3' exonuclease (RNase H)); helicase; enzymes related to DNA or RNA synthesis (e.g., dCTPase, dUTPase, dCDPase, dUDPase, GTPase, dGTPase, ATPase, dATPase); transposase; reverse transcriptase; polymerase (e.g., DNA polymerase, RNA polymerase); DNA polymerase accessory protein; DNA packaging protein; DNA topoisomerase; RNA polymerase binding protein; RNA polymerase sigma factor; site-specific RNase inhibitor of protease; recombinant protein; alpha-glucosyltransferase; mobility nuclease; endonuclease (e.g., endonuclease II, endonuclease V, endonuclease VII); inhibitor of Lon protease; thymidine kinase; site-specific RNase; N-glycosidase; endolysin; lysozyme; dNMP kinase; DNA ligase; deoxyribonucleotide-3'-phosphatase; ssDNA binding protein; dsDNA binding protein; and RNA ligase.

In a particularly preferred embodiment, the polypeptide is polymerase (e.g., DNA polymerase); DNA polymerase accessory protein; dsDNA binding protein; deoxyribonucleotide-3'-phosphatase; DNA topoisomerase; RNA ligase; site-specific RNase inhibitor of protease; endonuclease; exonuclease (e.g., 3'-5' exonuclease, 5'-3' exonuclease (RNase H)); mobility nuclease; reverse transcriptase; single-stranded binding protein; endolysin; lysozyme; helicase; alpha-glucosyltransferase; or thymidine kinase. In an especially preferred embodiment, the polypeptide is a DNA polymerase, a 3'-5' exonuclease, a 5'-3' exonuclease (RNase H), a DNA helicase, or an RNA ligase, such as those shown in Figures 7-11 (e.g., for a DNA polymerase, SEQ ID NO: 58; a 3'-5' exonuclease, SEQ ID NO: 56; a 5'-3' exonuclease (RNase H), SEQ ID NO: 61; a DNA helicase, SEQ ID NO: 62; or an RNA ligase, SEQ ID NO: 60). In a most preferred embodiment, the polypeptide is a DNA polymerase that lacks exonuclease domains, or a 3'-5' exonuclease that lacks DNA polymerase domain, as described in the examples below. As used herein, the term "lacking exonuclease domains" indicates that the polypeptide does not contain an amino acid domain (e.g., a consecutive or closely spaced series of amino acids) homologous to domains where such exonuclease activity resides in other similar polymerases (such as polymerases in the same family); it does not refer to the presence of a non-functional domain homologous to domains where exonuclease activity resides.

Similarly, the term "lacking DNA polymerase domain" indicates that the polypeptide does not contain an amino acid domain (e.g., a consecutive or closely spaced series of amino acids) homologous to domains where such DNA polymerase activity resides in other similar exonucleases (such as exonucleases in the same family); it does not refer to the presence of a non-functional domain homologous to domains where DNA polymerase activity resides.

These polypeptides can be used in a similar manner as the homologous polypeptides from bacteriophage T4; for example, polymerases and ligases of bacteriophage RM 378 can be used for amplification or manipulation of DNA and RNA sequences. The polymerases and ligases of bacteriophage RM 378, however, are expected to be much more thermostable than those of bacteriophage T4, because of the thermophilic nature of the host of bacteriophage RM 378 (in contrast with the mesophilic nature of E. coli, the host of bacteriophage T4).

The polypeptides of the invention can be partially or substantially purified (e. g., purified to homogeneity), and/or are substantially free of other polypeptides.

According to the invention, the amino acid sequence of the polypeptide can be that of the naturally-occurring polypeptide or can comprise alterations therein.

Polypeptides comprising alterations are referred to herein as"derivatives"of the

native polypeptide. Such alterations include conservative or non-conservative amino acid substitutions, additions and deletions of one or more amino acids; however, such alterations should preserve at least one activity of the polypeptide, i.e., the altered or mutant polypeptide should be an active derivative of the naturally-occurring polypeptide. For example, the mutation(s) can preferably preserve the three dimensional configuration of the binding site of the native polypeptide, or can preferably preserve the activity of the polypeptide (e.g., if the polypeptide is a DNA polymerase, any mutations preferably preserve the ability of the enzyme to catalyze combination of nucleotide triphosphates to form a nucleic acid strand complementary to a nucleic acid template strand). The presence or absence of activity or activities of the polypeptide can be determined by various standard functional assays including, but not limited to, assays for binding activity or enzymatic activity.

Additionally included in the invention are active fragments of the polypeptides described herein, as well as fragments of the active derivatives described above. An"active fragment,"as referred to herein, is a portion of polypeptide (or a portion of an active derivative) that retains the polypeptide's activity, as described above.

Appropriate amino acid alterations can be made on the basis of several criteria, including hydrophobicity, basic or acidic character, charge, polarity, size, the presence or absence of a functional group (e.g., -SH or a glycosylation site), and aromatic character. Assignment of various amino acids to similar groups based on the properties above will be readily apparent to the skilled artisan; further appropriate amino acid changes can also be found in Bowie et al. (Science 247: 1306-1310 (1990)). For example, conservative amino acid replacements can be those that take place within a family of amino acids that are related in their side chains.

Genetically encoded amino acids are generally divided into four families: (1) acidic=aspartate, glutamate; (2) basic=lysine, arginine, histidine; (3) nonpolar=alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar=glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine. Phenylalanine, tryptophan and tyrosine are sometimes classified jointly as aromatic amino acids. For example, it is reasonable to expect that an isolated replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, a threonine with a serine or a similar conservative replacement of an amino acid with a structurally related amino acid will not have a major effect on activity or functionality.
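The four-family grouping above lends itself to a simple lookup. The following sketch classifies a single replacement as conservative when both residues fall in the same family; the one-letter codes, dictionary layout and function names are illustrative assumptions, and a same-family replacement is only expected, not guaranteed, to preserve activity.

```python
# One-letter codes grouped by the four side-chain families named in the text.
AMINO_ACID_FAMILIES = {
    "acidic": {"D", "E"},                                      # aspartate, glutamate
    "basic": {"K", "R", "H"},                                  # lysine, arginine, histidine
    "nonpolar": {"A", "V", "L", "I", "P", "F", "M", "W"},
    "uncharged polar": {"G", "N", "Q", "C", "S", "T", "Y"},
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    for family, members in AMINO_ACID_FAMILIES.items():
        if residue.upper() in members:
            return family
    raise ValueError(f"unknown amino acid code: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues belong to the same side-chain family."""
    return family_of(original) == family_of(replacement)


# Replacements cited in the text as examples of conservative changes.
print(is_conservative("L", "I"))  # True  (leucine -> isoleucine)
print(is_conservative("D", "E"))  # True  (aspartate -> glutamate)
print(is_conservative("T", "S"))  # True  (threonine -> serine)
print(is_conservative("D", "K"))  # False (acidic -> basic)
```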

The polypeptides of the invention can also be fusion polypeptides comprising all or a portion (e. g., an active fragment) of the native bacteriophage RM 378 polypeptide amino acid sequence fused to an additional component, with optional linker sequences. Additional components, such as radioisotopes and antigenic tags, can be selected to assist in the isolation or purification of the polypeptide or to extend the half life of the polypeptide; for example, a hexahistidine tag would permit ready purification by nickel chromatography. The fusion protein can contain, e. g., a glutathione-S-transferase (GST), thioredoxin (TRX) or maltose binding protein (MBP) component to facilitate purification; kits for expression and purification of such fusion proteins are commercially available. The polypeptides of the invention can also be tagged with an epitope and subsequently purified using antibody specific to the epitope using art recognized methods. Additionally, all or a portion of the polypeptide can be fused to carrier molecules, such as immunoglobulins, for many purposes, including increasing the valency of protein binding sites. For example, the polypeptide or a portion thereof can be linked to the Fc portion of an immunoglobulin; for example, such a fusion could be to the Fc portion of an IgG molecule to create a bivalent form of the protein.

Also included in the invention are polypeptides which are at least about 90% identical (i.e., polypeptides which have substantial sequence identity) to the polypeptides described herein. However, polypeptides exhibiting lower levels of identity are also useful, particularly if they exhibit high, e.g., at least about 90%, identity over one or more particular domains of the polypeptide. For example, polypeptides sharing high degrees of identity over domains necessary for particular activities, such as binding or enzymatic activity, are included herein. Thus, polypeptides which have at least about 10%, preferably at least about 20%, more preferably at least about 30%, more preferably at least about 40%, even more preferably at least about 50%, yet more preferably at least about 70%, still more preferably at least about 80%, and even more preferably at least about 90% identity, are encompassed by the invention.

Polypeptides described herein can be isolated from naturally-occurring sources (e. g., isolated from host cells infected with bacteriophage RM 378).

Alternatively, the polypeptides can be chemically synthesized or recombinantly produced. For example, PCR primers can be designed to amplify the ORFs from the start codon to stop codon, using DNA of RM 378 or related bacteriophages or respective recombinant clones as a template. The primers can contain suitable restriction sites for an efficient cloning into a suitable expression vector. The PCR product can be digested with the appropriate restriction enzyme and ligated between the corresponding restriction sites in the vector (the same restriction sites, or restriction sites producing the same cohesive ends or blunt end restriction sites).
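The primer design described above can be sketched as follows. This is a minimal illustration under stated assumptions: the ORF is a made-up toy sequence, the BamHI (GGATCC) and XhoI (CTCGAG) sites are arbitrary example choices, the accepted start codons are an assumption, and the function names are hypothetical. Practical primer design would also consider melting temperature and extra 5' bases flanking the restriction site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cloning_primers(orf: str, forward_site: str, reverse_site: str,
                    anneal_len: int = 20) -> tuple[str, str]:
    """Build 5'->3' primers spanning an ORF from start codon to stop codon.

    Each primer carries a restriction site on its 5' end. Sites are assumed
    to be palindromic, so checking one strand for internal sites suffices.
    """
    orf = orf.upper()
    if len(orf) % 3 or orf[:3] not in START_CODONS or orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must run in frame from a start codon to a stop codon")
    if anneal_len > len(orf):
        raise ValueError("annealing length exceeds ORF length")
    for site in (forward_site, reverse_site):
        if site.upper() in orf:
            raise ValueError(f"site {site} also occurs inside the ORF")
    forward = forward_site.upper() + orf[:anneal_len]
    reverse = reverse_site.upper() + reverse_complement(orf[-anneal_len:])
    return forward, reverse


toy_orf = "ATGGCTAAACGTCTGGAAGTTCCGTAA"  # hypothetical 9-codon ORF
fwd, rev = cloning_primers(toy_orf, "GGATCC", "CTCGAG", anneal_len=12)
print(fwd)  # GGATCCATGGCTAAACGT
print(rev)  # CTCGAGTTACGGAACTTC
```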

Polypeptides of the present invention can be used as a molecular weight marker on SDS-PAGE gels or on molecular sieve gel filtration columns using art-recognized methods. They are particularly useful as molecular weight markers for analysis of proteins from thermophilic organisms, as they will behave similarly (e.g., they will not denature as proteins from mesophilic organisms would).

The polypeptides of the present invention can be isolated or purified (e.g., to homogeneity) from cell culture (e.g., from culture of host cells infected with bacteriophage RM 378) by a variety of processes. These include, but are not limited to, anion or cation exchange chromatography, ethanol precipitation, affinity chromatography and high performance liquid chromatography (HPLC). The particular method used will depend upon the properties of the polypeptide; appropriate methods will be readily apparent to those skilled in the art. For example, with respect to protein or polypeptide identification, bands identified by gel analysis can be isolated and purified by HPLC, and the resulting purified protein can be sequenced. Alternatively, the purified protein can be enzymatically digested by methods known in the art to produce polypeptide fragments which can be sequenced.

The sequencing can be performed, for example, by the methods of Wilm et al. (Nature 379 (6564): 466-469 (1996)). The protein may be isolated by conventional means of protein biochemistry and purification to obtain a substantially pure product, i.e., 80, 95 or 99% free of cell component contaminants, as described in Jacoby, Methods in Enzymology Volume 104, Academic Press, New York (1984); Scopes, Protein Purification, Principles and Practice, 2nd Edition, Springer-Verlag, New York (1987); and Deutscher (ed), Guide to Protein Purification, Methods in Enzymology, Vol. 182 (1990).

The following Examples are offered for the purpose of illustrating the present invention and are not to be construed to limit the scope of this invention. The teachings of all references cited are hereby incorporated herein by reference in their entirety.

EXAMPLE 1 Isolation, Purification and Characterization of Bacteriophage

A. Materials and Methods

Bacterial strains and growth media

The thermophilic, slightly halophilic eubacterium, Rhodothermus marinus, was first isolated from shallow water submarine hot springs in Isafjardardjup in northwest Iceland (Alfredsson, G. A. et al., J. Gen. Microbiol. 134: 299-306 (1988)).

Since then Rhodothermus has also been isolated from two other areas in Iceland (Petursdottir et al., in prep), from the Azores and the Bay of Naples in Italy (Nunes, O. C. et al., Syst. Appl. Microbiol. 15: 92-97 (1992); Moreira, L. et al., Syst. Appl. Microbiol. 19: 83-90 (1996)). Rhodothermus is distantly related to the group containing Flexibacter, Bacteroides and Cytophaga species (Anderson, O. S. and Fridjonsson, O. H., J. Bacteriol. 176: 6165-6169 (1994)).

Strain ITI 378 (originally R-21) is one of the first Rhodothermus strains isolated from submarine hot springs in Isafjardardjup in northwest Iceland. The strain was grown at 65°C in medium 162 for Thermus (Degryse et al., Arch. Microbiol. 117: 189-196 (1978)), with 1/10 the buffer and with 1% NaCl. Strain ITI 378 is phenotypically and phylogenetically similar (over 99% similarity in 16S rRNA sequence) to type strain DSM 4252.

Bacteriophage Isolation

A water sample with some sand and mud was collected from a hot spring (62°C) appearing at low tide in Isafjardardjup at the same site as the bacterium was originally isolated. The same kind of samples were collected from the Blue Lagoon and the Salt factory on Reykjanes in southwest Iceland.

After mixing a sample in a Waring blender, the sample was filtered through a Buchner funnel, followed by centrifugation, before filtering the water through a 0.45 µm membrane. After centrifuging again, the sample was filtered through a sterile 0.2 µm membrane. This filtrate was used for infecting 18 different Rhodothermus strains (8 from Isafjardardjup in northwest Iceland, and 10 from Reykjanes in southwest Iceland). The sample (4 ml) was mixed with 5 ml of soft agar A (the above growth medium with 2% agar) and 1 ml of overnight culture of different Rhodothermus strains. After pouring the sample onto a thin layer agar plate, the plates were incubated for 1-2 days at 65°C. A single, well-isolated plaque was stabbed with a sterile Pasteur pipette and dissolved in 100 µl of 10 mM MgCl2 solution (forming the plaque solution).

The bacteriophage is sensitive to freezing; it can be stored in a cell lysate at 4°C (e. g., as described below under"Liquid Lysate").

Plate Lysate

Overnight culture (0.9 ml) was mixed with 100 µl of the plaque solution and incubated for 15 minutes at 65°C before adding 3 ml of soft agar B (same as A, but 1% agar and 10 mM MgCl2). After mixing and pouring onto thin layer agar plates, the plates were incubated for 1-2 days at 65°C. To nearly totally lysed plates was added 1 ml of 10 mM MgCl2, and after incubating at 4°C for a few hours, the top layer was scraped off and put into a sterile tube. After adding 100 µl chloroform and mixing it, the sample was centrifuged and the supernatant collected. The sample was centrifuged again and filtered through a 0.2 µm filter; the filtrate was stored at 4°C. This lysate was used for testing host specificity.

Liquid Lysate

Liquid cultures were infected when they had reached an absorbance of 0.5 at 600 nm (expected to contain 2. D X 108 cells/ml). The phage ratio was 0.1 pfu/cell culture. The cultures were incubated at high shaking (300 rpm) and growth was followed by measuring absorbance at 600 nm. When lysis had occurred, chloroform was added to the cultures (10 µl/ml) and shaking continued for 1 hour. Cell debris was removed by centrifugation and titer estimation was performed on the supernatant. Large-scale purification from 300 ml culture was undertaken for DNA isolation and for protein composition analysis, as well as for electron microscopy.

Bacteriophage Purification

For electron microscopy, the bacteriophages were precipitated using PEG 8000 (Sambrook, J. et al., Molecular Cloning, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989) and resuspended in SM buffer (Sambrook, J. et al., Molecular Cloning, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989) before loading on the top of CsCl (0.75 g/ml). This sample was centrifuged for 23 hours at 38,000 rpm in TY-64 rotor (Sorvall Ultracentrifuge). The layer of bacteriophage was collected using a syringe.

DNA and Protein Isolation

Purified bacteriophage supernatant with a titer of approximately 103 pfu/ml was boiled for 5 minutes in SDS and β-mercaptoethanol loading buffer according to the method of Laemmli (Laemmli, U. K., Nature 227: 680-685 (1970)) using 10% polyacrylamide gel, and stained with Coomassie brilliant blue. Bio-Rad pre-stained low molecular weight standards (7.7-204 kDa) were used as size markers.

Bacteriophage DNA was isolated from a purified phage lysate containing approximately 10^13 pfu/ml using the Qiagen lambda kit (Catalog No. 12543, Qiagen) according to manufacturer's instructions.

Temperature and Chloroform Sensitivity

Bacteriophage RM 378 at approximately 10^11 pfu/ml was incubated for 30 minutes over a temperature range of 50-96°C before the remaining bacteriophage titer was determined. The bacteriophage lysate at approximately 10^11 pfu/ml was mixed with an equal volume of chloroform, and incubated at room temperature. After 30 minutes, the remaining viable bacteriophage were titrated with strain ITI 378 as a host.

Determination of G+C Content

The mole percent guanine plus cytosine content of the bacteriophage was determined by CSM with HPLC according to Mesbah (Mesbah, M. et al., Int. J. Syst. Bacteriol. 39: 159-167 (1989)).

Genome Size Estimation

Bacteriophage DNA was digested individually with a variety of restriction endonucleases, and the fragments separated by electrophoresis on (w/v) agarose gel. Pulsed-field gel electrophoresis (PFGE) was also used for size estimation. Pulsed Field Certified Agarose from Bio Rad (Catalog No. 162-0137, Bio Rad) (1%) was used for the gel, and low-melt agarose (Catalog No. 162-0017, Bio Rad) (1%) for filling the wells when using marker plugs. Samples of 1.0 and 0.5 µg DNA were used and Bio Rad low range marker (#350) as well as λ-ladder (Catalog No. 170-3635, Bio Rad) was employed. The running buffer was 0.5 x TBE (Sambrook, J. et al., Molecular Cloning, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989). Bio Rad Pulsed Field Electrophoresis system (CHEF-DRIII) was used with an initial switch time of 60 seconds, final switch time of 60 seconds, 6 V/cm, angle of 120° and 21 hour run time. Gels were stained with ethidium bromide and washed in distilled water for 3 hours before photographing under a UV light illuminator.

Electron Microscopy

The bacteriophage was stained with 2.5% phosphotungstic acid and the grids examined with a Philips EM 300 electron microscope. Bacteriophage samples from CsCl purification, as well as directly from a liquid lysed culture with titer of 10 pfu/ml, were used for microscopy studies.

DNA Sequencing and Genome Analysis

The phage genome was sequenced using the "shot gun sequencing" technique (see, e.g., Fleischmann, R. D. et al., Science 269: 496-512 (1995)). The sequences were aligned (Ewing, B., et al., Genome Research 8: 175-185 (1998); Ewing, B. and Green, P., Genome Research 8: 186-194 (1998)). The consensus sequence of 130,480 bp was visualized with the program XBB-Tools (Sicheritz-Ponten, T., Department of Molecular Evolution, Uppsala, Sweden) for open reading frames (ORFs).

B. Results

Bacteriophage Isolation

The phage sample from the southwest area of Iceland, prepared as described above, infected 4 strains of Rhodothermus, all from Reykjanes in southwest Iceland.

The phage sample from the northwest area of Iceland, prepared as described above, infected 7 strains of Rhodothermus, all from Isafjardardjup in northwest Iceland.

Bacteriophages were isolated from two of the strains infected with the sample from the southwest, and from all 7 of the strains infected with the sample from the northwest. Of these, one of the bacteriophages from the sample from the northwest was isolated from strain ITI 378 and designated RM 378. The titer of this bacteriophage was estimated; in liquid culture it repeatedly gave titers of 5-8 x 10^13 pfu/ml.

Attempts to isolate the bacteriophages from Rhodothermus by subjecting it to stress such as ultraviolet (UV) exposure did not succeed. Because such stress would have excised a prophage from the chromosome and initiated a lytic response, the failed attempts suggest that Rhodothermus did not contain prophages.

Bacteriophage Morphology

Bacteriophage RM 378 is a tailed phage with a moderately elongated head. It is a T4-like phage, resembling the T4 phage of Escherichia coli both in morphology and genome size, and has a double-stranded DNA genome. RM 378 belongs to the Myoviridae family and has the A2 morphology (Ackermann, H. W., Arch. Virol. 124: 201-209 (1992)). The bacteriophage head measures 85 nm on one side and 95 nm on the other. The tail is 150 nm in length, with a clear right-handed spiral to the tail sheath. The head/tail ratio is 0.63 and the total length is 245 nm.
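The reported head/tail ratio and total length follow from the measurements above if the longer head dimension (95 nm) is taken as the head length. That reading is an inference from the numbers rather than something the text states; the short check below, with hypothetical variable names, shows the arithmetic.

```python
# Measurements from the text, in nanometres.
head_length_nm = 95   # longer head dimension (assumed to be the head length)
tail_length_nm = 150

head_tail_ratio = round(head_length_nm / tail_length_nm, 2)
total_length_nm = head_length_nm + tail_length_nm

print(head_tail_ratio)   # 0.63
print(total_length_nm)   # 245
```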

Host and Infection

RM 378 concentrated bacteriophage was tested against 9 different Rhodothermus strains from the two different areas (Isafjardardjup in northwest Iceland, and Reykjanes in southwest Iceland). It infected 5 strains from the northwest, but no strains from the southwest. Thus, the bacteriophage infected only strains of Rhodothermus from the same geographical area from which the bacteriophage was isolated. It did not infect any of the 6 Thermus strains that were tested.

Growth of the bacteria was followed at 65°C in liquid culture. An uninfected culture was used as a control, and growth was followed until the control culture had reached stationary phase. Cell lysis started 9 hours after infection of the culture, and stationary phase in the control was reached about 14 hours after infection.

Stability of the Bacteriophage

Bacteriophage RM 378 was stable to 30 minutes of exposure to chloroform, indicating that it probably does not contain lipids. Heat stability of the phage was tested at 50°C-96°C by incubating the phage concentrate for 30 minutes, followed by estimation of titer. There was no change in the titer up to 65°C, but at 70°C and 80°C a 100-fold drop in pfu/ml was measured. A linear decrease of the titer was observed up to 96°C, where it was 10,000 times lower after 30 minutes than in the starting solution. After 3 months of storage at 4°C the titer dropped 100-fold (down to 10^11 pfu/ml). After 27 months of storage the titer had fallen from 10^11 pfu/ml to 10^5 pfu/ml in a CsCl-purified sample.

Composition of Bacteriophage RM 378

Purified bacteriophage was subjected to SDS-PAGE analysis for examination of its protein composition. The phage was composed of at least 16 proteins with apparent molecular weights from 23-150 kDa. The five main bands were at 92, 61, 52, 50 and 26 kDa, and were in a ratio of 0.14:0.45:0.21:0.13:0.06. The major protein band of 61 kDa accounted for about 20% of the total protein; the five main bands together represented about 50% of total proteins.

The average G+C mol% of the RM 378 phage DNA was 42.0 ± 0.1. The DNA was digested with a variety of restriction enzymes (HindIII, XhoI, ClaI, AluI, NotI, SacI, PstI, BamHI, SmaI, SpeI, EcoRV). Three of the enzymes (NotI, SmaI, SpeI) did not cleave RM 378 DNA, and the rest resulted in multiple fragments. Because the addition of the fragment sizes resulted in a variable amount for the total genome size, the phage DNA was also run on PFGE, which estimated the size of the DNA to be about kb.

Characteristics of the Bacteriophage

The RM 378 bacteriophage is a virulent bacteriophage following a lytic cycle of infection. Very high titer lysates of up to 10^13 pfu/ml could be obtained, which indicated a large burst size of more than 100. Because no bacteriophages have been reported against this bacterial genus, RM 378 represents a new species.
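The G+C measurement and the restriction digests described above have simple in silico counterparts once a sequence is available. The following is a minimal sketch, assuming the standard recognition sequences for a subset of the listed enzymes; the function names and the toy sequence are invented for illustration and are not RM 378 data.

```python
# Hypothetical illustration only: TOY_DNA is an invented sequence, not RM 378 DNA.
# Recognition sequences are the standard palindromic sites for these enzymes.
RECOGNITION_SITES = {
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
    "SmaI": "CCCGGG",
    "SpeI": "ACTAGT",
    "BamHI": "GGATCC",
    "EcoRV": "GATATC",
}


def gc_percent(seq: str) -> float:
    """Return the G+C content of a DNA string as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of a recognition site (overlaps allowed)."""
    seq = seq.upper()
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def digest_summary(seq: str) -> dict:
    """Map each enzyme name to the number of its sites in seq."""
    return {name: count_sites(seq, site) for name, site in RECOGNITION_SITES.items()}


TOY_DNA = "ATGAAGCTTACGGATCCTTAAGCTTGATATCAA"

if __name__ == "__main__":
    print(f"G+C: {gc_percent(TOY_DNA):.1f}%")
    for enzyme, n in digest_summary(TOY_DNA).items():
        print(f"{enzyme}: {n} site(s)")
```

An enzyme that reports zero sites in such a scan corresponds to the "did not cleave" outcome in a digest; because all sites listed here are palindromic, scanning one strand is sufficient.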

Genome and Comparison to T4 Bacteriophage

The nucleic acid sequence of RM 378 is set forth in Figure 1. The nucleic acid sequence of RM 378 contains at least 200 open reading frames (ORFs); see, for example, the ORFs described in Figure 2. Of these, five were identified in more detail, as described in Example 2, including the ORFs expected to encode DNA polymerase, 3'-5' exonuclease, 5'-3' exonuclease, RNA ligase and DNA helicase.
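The text does not state how the open reading frames were called. As a rough sketch of one common convention (an ATG start followed in frame by a stop codon, scanned in all six frames), the code below may help make the idea concrete; the minimum length, the function names and the toy sequence are assumptions for illustration only, and alternative start codons are ignored.

```python
# Hypothetical sketch of a six-frame ORF scan; not the method used in the source.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Return (strand, start, end, n_codons) for ATG...stop ORFs.

    Coordinates are 0-based, end-exclusive, on the scanned strand;
    n_codons counts coding codons and excludes the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None  # look for the next ORF in this frame
    return orfs


TOY_DNA = "CCATGAAATTTGGGTAACC"  # invented sequence, not RM 378 DNA

if __name__ == "__main__":
    for orf in find_orfs(TOY_DNA):
        print(orf)
```

On a real genome the minimum length would be set far higher (typically tens to hundreds of codons) to avoid reporting spurious short frames.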

RM 378 belongs in the T-even family, in that it is similar to bacteriophage T4 of Escherichia coli. Bacteriophage T4 of E. coli is a well-studied phage which, together with T2 and T6, belongs to the family of bacteriophages known as T-even phages. T-even phages are nearly identical not only in structure and composition, but also in properties. Several enzymes isolated from bacteriophage T4 are used in the field of recombinant DNA technology as well as in other commercial applications. For example, T4 DNA polymerase, T4 DNA ligase and T4 RNA ligase are frequently used in the research industry today.

The genome of RM 378 was aligned in a consensus sequence, and the open reading frames (ORFs) were analyzed and compared to the T4 bacteriophage genome. The overall genome arrangement seemed to be different and the overall similarity to known proteins was low. However, despite this apparently high genetic divergence, several structural and morphological features were highly conserved.

Furthermore, homologues to proteins in T4 were identified in the RM 378 bacteriophage. These similarities are set forth in Table 1, below.

In view of the similarities between bacteriophage T4 and bacteriophage RM 378, it is reasonable to expect that bacteriophage RM 378 comprises genes that are homologous to those found in bacteriophage T4, and that these genes in bacteriophage RM 378 encode proteins and enzymes that correlate to those proteins and enzymes found in bacteriophage T4.

EXAMPLE 2 Detailed Analysis of Five Open Reading Frames (ORFs)

A. Selection of Reading Frames for Analysis

Five open reading frames (ORFs) of the numerous ORFs described above in the genome of bacteriophage RM378 have been further characterized, and the corresponding genes cloned and expressed. The genes include a DNA polymerase, 3'-5' exonuclease, 5'-3' exonuclease (RNase H), replicative DNA helicase and RNA ligase. These genes were chosen as examples of the many valuable genes encoded by the bacteriophage genome. The corresponding polypeptide products of these genes are mainly components of the bacteriophage replication machinery and can be utilized in various molecular biology applications, as evident from the current use of homologous counterparts from other sources. The sequences of the five ORFs show low similarity to sequences in public databases, indicative of a distant relationship to known proteins; however, probable homology to known sequences can be established by comparison with families of sequences showing overall sequence similarity as well as conservation of shorter regions, sequence motifs and functionally important residues, in some cases aided by three-dimensional structural information. The limited sequence similarity of these sequences to publicly available sequences suggests that these gene products have functional properties very different from corresponding proteins currently in use in molecular biology applications. Together with the presumed thermostability, the properties of these gene products render them valuable in various applications in molecular biology.

DNA Polymerase

DNA polymerases have evolved to accommodate the varied tasks required for replication and repair. DNA replication involves 1) local melting of the DNA duplex at an origin of replication, 2) synthesis of a primer and Okazaki fragment, 3) DNA melting and unwinding at the replication fork, 4) extension of the primer on the leading strand and discontinuous synthesis of primers followed by extension of the lagging strand, 5) removal of RNA primers and 6) sealing of nicks (Perler et al., Adv. Protein Chem. 48: 377-435 (1996)). The different types of DNA polymerases have been grouped into Families A, B, C and X, corresponding to similarity with E. coli pol I, II and III and pol β, respectively (Braithwaite, D. K. and Ito, J., Nucleic Acids Res. 21: 787-802 (1993)).

Each of these Families contains conserved sequence regions (Perler et al., Adv. Protein Chem. 48: 377-435 (1996); Blanco, L., et al., Gene 100: 27-38 (1991); Morrison, A. et al., Proc Natl Acad Sci U S A 88: 9473-9477 (1991)). Family B DNA polymerases are also called Pol α Family DNA polymerases.

The DNA polymerases of family B type include bacteriophage T4 and bacteriophage RB69 DNA polymerase as well as archaeal polymerases and E. coli polymerase II. Polymerases of this type normally have two activities, the polymerase activity and the proofreading 3'-5' exonuclease activity, found in different domains within the same polypeptide, with the exonuclease domain being N-terminal to the polymerase domain (Steitz, T. A., J Biol Chem 274: 17395-8 (1999); Kornberg, A. and Baker, T. A., DNA Replication, Freeman, New York (1992); Brautigam, C. A. and Steitz, T. A., Curr. Opin. Struct. Biol. 8: 45-63 (1998)). Polymerases of family B have an overall domain architecture different from polymerases of family A and do not have the 5'-3' exonuclease activity which is normally found in polymerases in family A. The determined structure of RB69 DNA polymerase is a representative structure of a family B type polymerase and shows clearly the modular organization of the enzyme with separate domains (Wang, J. et al., Cell 89: 1087-99 (1997), Protein Data Bank (PDB) accession code 1WAJ). The structure of the archaeal DNA polymerase from Desulfurococcus strain Tok was shown to have the same overall structure (Zhao, Y. et al., Structure Fold Des 7: 1189-99 (1999), PDB accession code 1QQC). The alignment of polymerases in this family indicates the presence of several conserved regions in the sequences, with characteristic sequence motifs belonging to both the exonuclease domain and the polymerase domain (Hopfner, K. P. et al., Proc Natl Acad Sci U S A 96: 3600-3605 (1999)).

Exonucleases

Besides the basic polymerization function, DNA polymerases may contain a 5'-3' and a 3'-5' exonuclease activity. The 3'-5' exonuclease activity is required for proofreading. In general, the family B polymerases have 3'-5' exonuclease activity, but not 5'-3' exonuclease activity. If both exonucleases are present, the 5'-3' exonuclease domain is at the N-terminus, followed by the 3'-5' exonuclease domain and the C-terminal polymerase domain. The structure of the polymerases can be defined further in terms of domain structure. The polymerase domain is thus composed of a number of smaller domains, often referred to as the palm, fingers and thumb, and although these parts are not homologous across families, they do show analogous structural features (Steitz, T. A., J Biol Chem 274: 17395-8 (1999); Kornberg, A. & Baker, T. A., DNA Replication, Freeman, New York (1992); Brautigam, C. A. & Steitz, T. A., Curr. Opin. Struct. Biol. 8: 45-63 (1998)).

RNase H (Ribonuclease H), e.g. from bacteriophage T4, removes the RNA primers that initiate lagging strand fragments during DNA replication of duplex DNA. The enzyme has a 5'-3' exonuclease activity on double-stranded DNA and RNA-DNA duplexes. Further, T4 RNase H has a flap endonuclease activity that cuts preferentially on either side of the junction between single and double-stranded DNA in flap and fork DNA structures. Besides replication, T4 RNase H also plays a role in DNA repair and recombination (Bhagwat, M., et al., J. Biol. Chem. 272: 28531-28538 (1997); Bhagwat, M., et al., J. Biol. Chem. 272: 28523-28530 (1997)).

T4 RNase H shows sequence similarity to other enzymes with a demonstrated role in removing RNA primers, including phage T7 gene 6 exonuclease, the 5'-3' nuclease domain of E. coli DNA polymerase I, and human FEN-1 (flap endonuclease). These enzymes have 5'-3' exonuclease activity on both RNA-DNA and DNA-DNA duplexes, and most of them have a flap endonuclease activity that removes the 5' ssDNA tail of flap or fork structures. The T4 enzyme is homologous to members of the RAD2 family of prokaryotic and eukaryotic replication and repair nucleases (Mueser, T. C., et al., Cell 85: 1101-1112 (1996)).

RNase H is a part of the reverse transcriptase complex of various retroviruses. The HIV-1 RT-associated ribonuclease H displays both endonuclease and 3'-5' exonuclease activity (Ben-Artzi, H., et al., Nucleic Acids Res. 20: 5115-5118 (1992); Schatz, O., et al., 4: 1171-1176 (1990)).

In molecular biology, RNase H is applied to the replacement synthesis of the second strand of cDNA. The enzyme produces nicks and gaps in the mRNA strand of the cDNA:mRNA hybrid, creating a series of RNA primers that are used by the corresponding DNA polymerase during the synthesis of the second strand of cDNA (Sambrook, J., et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press (1989)). The RNase H of E. coli can promote the formation and cleavage of an RNA-DNA hybrid between an RNA site and a base-paired strand of a stable hairpin or duplex DNA at temperatures below their Tm (Li, J., and R. M. Wartell, Biochemistry 37: 5154-5161 (1998); Shibahara, S., et al., Nucleic Acids Res. 15: 4403-4415 (1987)). Thus, the enzyme has been used for site-directed cleavage of RNA using chimeric DNA splints (presence of complementary chimeric oligonucleotides) (Inoue, H., et al., Nucleic Acids Symp. Ser. 19: 135-138 (1988)) or an oligoribonucleotide capable of forming a stem and loop structure (Hosaka, H., et al., J. Biol. Chem. 259: 20090-20094 (1994)).

DNA Helicase

DNA helicases use energy derived from hydrolysis of nucleoside triphosphate to catalyze the disruption of the hydrogen bonds that hold the two strands of double-stranded DNA together. The reaction results in the formation of the single-stranded DNA required as a template or reaction intermediate in DNA replication, repair or recombination (Matson, S. et al., BioEssays 16: 13-21 (1993)).

The bacteriophage T4 gp41 is a highly processive replicative helicase (similar to the DnaB protein of E. coli) and has been shown to form a hexamer in the presence of ATP (Dong, F., and P. H. von Hippel, J. Biol. Chem. 271: 19625-19631 (1996)). The enzyme facilitates the unwinding of the DNA helix ahead of the advancing DNA polymerase and accelerates the movement of the replication fork. It has been suggested that gp41 interacts with the polymerase holoenzyme at the replication fork (Schrock, R. D. and B. Alberts, J. Biol. Chem. 271: 16678-16682 (1995)). Gp41 has a 5'-3' polarity and requires a single-stranded region on the 5' side of the duplex to be unwound. The ATP-activated helicase binds to a single gp61 primase molecule on an appropriate DNA template (Morris, P. D., and K. D. Raney, Biochemistry 38: 5164-5171 (1999)) to reconstitute a stable primosome (Richardson, R. W. and N. G. Nossal, J. Biol. Chem. 264: 4725-4731 (1989)). Although gp41 alone does not form a stable complex with the DNA template, this helicase by itself can carry out moderately processive ATP-driven translocation along single-stranded DNA (Dong, F., and P. H. von Hippel, J. Biol. Chem. 271: 19625-19631 (1996)). The T4 gene 59 protein accelerates the loading of gp41 onto DNA when it is covered with 32 protein (the T4 single strand binding protein), and stimulates the helicase activity to catalyze replication fork movement through a DNA double helix, even through a promoter-bound RNA polymerase molecule (Barry, J., and B. Alberts, J. Biol. Chem. 269: 33063-33068 (1994); Tarumi, K., and T. Yonesaki, J Biol Chem. 270: 2614-2619 (1995)). The T4 gp41 helicase has also been disclosed to participate in DNA recombination. Following exonuclease nicking of dsDNA and further expansion into a gap, gp41 creates a free 3' end, which is required as a substrate by recombination proteins (RecA-like) (Tarumi, K., and T. Yonesaki, J Biol Chem. 270: 2614-2619 (1995)).

RNA Ligase

RNA ligase is abundant in T4-infected cells and has been purified in high yields. Bacteriophage T4 RNA ligase catalyzes the ATP-dependent ligation of a 5'-phosphoryl-terminated nucleic acid donor (i.e. RNA or DNA) to a 3'-hydroxyl-terminated nucleic acid acceptor. The reaction can be either intramolecular or intermolecular, i.e., the enzyme catalyzes the formation of circular DNA/RNA, linear DNA/RNA dimers, and RNA-DNA or DNA-RNA block co-polymers. The use of a 5'-phosphate, 3'-hydroxyl terminated acceptor and a 5'-phosphate, 3'-phosphate terminated donor limits the reaction to a unique product. Thus, the enzyme can be an important tool in the synthesis of DNA of defined sequence (Marie I., et al., Biochemistry 19: 635-642 (1980); Sugino, A. et al., J. Biol. Chem. 252: 1732-1738 (1977)).

The practical use of T4 RNA ligase has been demonstrated in many ways. Various ligation-anchored PCR amplification methods have been developed, where an anchor of defined sequence is directly ligated to single-stranded DNA (following primer extension, e.g. first strand cDNA). The resultant product is amplified by PCR using primers specific for both the DNA of interest and the anchor (Apte, A. N., and P. D. Siebert, BioTechniques 15: 890-893 (1993); Troutt, A. B., et al., Proc. Natl. Acad. Sci. USA 89: 9823-9825 (1992); Zhang, X. H., and V. L. Chiang, Nucleic Acids Res. 24: 990-991 (1996)). Furthermore, T4 RNA ligase has been used in fluorescence-, isotope- or biotin-labeling of the 5'-end of single-stranded DNA/RNA molecules (Kinoshita, Y., et al., Nucleic Acids Res. 25: 3747-3748 (1997)), synthesis of circular hammerhead ribozymes (Wang, L., and D. E. Ruffner, Nucleic Acids Res. 26: 2502-2504 (1998)), synthesis of dinucleoside polyphosphates (Atencia, E. A., et al., Eur. J. Biochem. 261: 802-811 (1999)), and for the production of composite primers (Kaluz, S., et al., BioTechniques 19: 182-186 (1995)).

B. DNA Polymerase Activity and 3'-5'Exonuclease Activity Are Found in Gene Products of Separate Genes in the Phage RM378 Genome

The predicted gene products of two open reading frames (ORF056e and ORF632e), which are widely separated in the genome of phage RM378, both showed similarity to family B type polymerases as shown below.

Identification of the ORF056e gene product as 3'-5' exonuclease

The predicted gene product of ORF056e (locus GP43a) was run against a sequence database (NCBI nr) in a similarity search using BLAST (Altschul, S. F. et al., J. Mol. Biol. 215: 403-410 (1990)) (Table 2). Out of 64 hits with an E-value lower (better) than 1, all sequences were of DNA polymerases of family B type, including DNA polymerase from bacteriophage RB69, archaeal DNA polymerases and E. coli polymerase II. Importantly, all these sequences are DNA polymerase sequences having the sequence characteristics of the DNA polymerase domain as well as the 3'-5' exonuclease domain, and are considerably longer (excluding partial sequences) than the predicted gene product of ORF056e, which has a length of 349 residues. The similarity is restricted to the N-terminal halves of these sequences, corresponding to the part of the protein where the 3'-5' proofreading exonuclease domain is located.
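The hit-filtering step described above (keeping hits with an E-value below 1 and ranking them) can be sketched as follows. This assumes the 12-column tabular output of current NCBI BLAST+ (`-outfmt 6`), which is a tooling assumption and not necessarily what was used for the searches reported here; the hit lines and subject names are invented for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    pident: float   # percent identity over the aligned region
    length: int     # alignment length
    evalue: float


def parse_tabular(lines):
    """Parse BLAST+ outfmt 6 lines: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), int(f[3]), float(f[10])))
    return hits


def significant(hits, max_evalue=1.0):
    """Keep hits with E-value below the cutoff, best (lowest) first."""
    return sorted((h for h in hits if h.evalue < max_evalue), key=lambda h: h.evalue)


# Invented example rows; subject names and numbers are NOT from Table 2.
EXAMPLE_LINES = [
    "\t".join(["ORF056e", "hypothetical_polB_2", "21.0", "180", "130", "6", "20", "199", "55", "240", "0.3", "35.0"]),
    "\t".join(["ORF056e", "hypothetical_other", "19.0", "90", "70", "3", "10", "99", "400", "489", "4.2", "28.5"]),
    "\t".join(["ORF056e", "hypothetical_polB_1", "27.0", "260", "170", "8", "5", "250", "40", "300", "2e-15", "80.1"]),
]

if __name__ == "__main__":
    for hit in significant(parse_tabular(EXAMPLE_LINES)):
        print(hit.subject, hit.evalue, f"{hit.pident}% over {hit.length}")
```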

Table 2 lists the 20 sequences with strongest similarity to the ORF056e sequence together with the length and E-value according to the BLAST search. The sequence identity with the ORF056e sequence ranges from 21 to 27%. Of the 64 sequences identified in the sequence database, 34 are of viral origin and 15 of archaeal origin. Out of the twenty top scoring sequences, 16 are of viral origin.

Identification of the ORF632e gene product as DNA polymerase

The sequence similarity program BLAST (Altschul, S. F. et al., J. Mol. Biol. 215: 403-410 (1990)) was also used to identify potential homologues of the ORF632e (locus GP43b) gene product. The 100 sequences in the sequence database (NCBI nr) with the strongest similarity to the ORF632e sequence were all defined as DNA polymerase sequences. These sequences all had an E-value lower than 10^-5 and are considerably longer (excluding partial sequences) than the predicted gene product of ORF632e, which has a length of 522 residues (Table 3). Sequence alignments between the ORF632e sequence and the sequences identified in the database show that the similarity is restricted to a domain with the DNA polymerase activity, as characterized by conserved sequence motifs such as DxxSLYPS (Hopfner, K. P. et al., Proc Natl Acad Sci USA 96: 3600-3605 (1999)). In these sequences this domain is always preceded by a long N-terminal region where the 3'-5' exonuclease activity normally is found. The corresponding N-terminal region is lacking in ORF632e, which consists only of the DNA polymerase domain (family B type polymerases).

The sequence motif DXXSLYPS (SEQ ID NO: 63) in the ORF632e sequence is found very close to its N-terminus unlike its location in all the 100 analyzed sequences in the public database.
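The observation that the DxxSLYPS motif sits unusually close to the N-terminus can be expressed as a simple pattern search that reports where the motif starts relative to the sequence length. The sketch below uses a regular expression in which each "x" is any residue; both toy sequences are invented for illustration and are not the ORF632e sequence or any database entry.

```python
import re

# D, any two residues, then SLYPS (motif A as written in the text).
MOTIF_A = re.compile(r"D..SLYPS")


def motif_positions(protein: str, pattern=MOTIF_A):
    """Return 1-based start positions of non-overlapping motif matches."""
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


def relative_position(protein: str, pattern=MOTIF_A):
    """Return the first match start as a fraction of sequence length, or None."""
    m = pattern.search(protein.upper())
    return None if m is None else m.start() / len(protein)


# Invented toy sequences (hypothetical, for illustration only).
COMPACT_LIKE = "MKTDAESLYPSIIR"                   # motif near the N-terminus
FULL_LENGTH_LIKE = "M" + "A" * 40 + "DFNSLYPSIM"  # motif after a long N-terminal region

if __name__ == "__main__":
    for name, seq in (("compact", COMPACT_LIKE), ("full-length", FULL_LENGTH_LIKE)):
        print(name, motif_positions(seq), relative_position(seq))
```

A polymerase that retains an N-terminal exonuclease domain would show the motif at a large offset, as in the second toy sequence, whereas a polymerase-domain-only protein would show it near position 1.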

Table 3 lists the 20 sequences with strongest similarity to the ORF632e sequence together with the length and E-value according to a BLAST search. The sequence identity with the ORF632e sequence ranges from 23 to 28% within aligned regions of 300 to 428 residues. The majority of these 20 sequences are of archaeal DNA polymerases of family B type.
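Sequence identity figures such as those quoted for Tables 2 and 3 are ratios of identical positions to the length of the aligned region. A minimal sketch of that calculation follows, assuming the convention that gap columns count toward the alignment length (as in BLAST's reported identities); the two aligned strings are invented for illustration.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Identical non-gap columns are divided by the total number of
    alignment columns, so gap columns lower the score.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return 100.0 * identical / len(aln_a)


# Invented toy alignment (hypothetical residues, '-' marks a gap).
ALN_QUERY = "MKT-AYLS"
ALN_SUBJECT = "MRTGAY-S"

if __name__ == "__main__":
    print(f"{percent_identity(ALN_QUERY, ALN_SUBJECT):.1f}% identity "
          f"over {len(ALN_QUERY)} aligned columns")
```

Identities in the 20-30% range over several hundred residues, as reported here, are low enough that the supporting evidence from conserved motifs and structural alignment becomes important.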

The results of the similarity searches indicated that the gene products of ORF056e and ORF632e correspond to the exonuclease domain and the polymerase domain of family B type polymerases, respectively. A partial alignment of sequences of a number of members of this family was obtained from the Pfam protein family database (Pfam database at http://www.sanger.ac.uk/Pfam, accession number PF00136). The sequences of ORF056e and ORF632e could be combined as one continuous polypeptide and aligned to the previous set of sequences. The coordinates of the three-dimensional structures of DNA polymerases from bacteriophage RB69 (PDB ID 1WAJ), the archaeon Thermococcus gorgonarius (PDB ID 1TGO) and the archaeon Desulfurococcus strain Tok (PDB ID 1QQC) were structurally aligned, and a sequence alignment was produced from the structural alignment. The corresponding sequences were added to the previous alignment and the alignment adjusted, guided by the alignment from the structural superposition, mainly in regions which are less conserved. The resulting alignment, shown in Figure 3, strongly supports the previous interpretation that 3'-5' proofreading activity and DNA polymerase activity are found in two proteins encoded by separate genes in bacteriophage RM378. As seen in the alignment (Figure 3), the major conserved regions in this protein family in the 3'-5' exonuclease domain and in the polymerase domain are also conserved in the gene products of ORF056e and ORF632e, respectively. As defined by Hopfner et al. (Hopfner, K. P. et al., Proc Natl Acad Sci USA 96: 3600-3605 (1999)), this includes regions exo I, II and III in the exonuclease domain and motifs A, B and C in the polymerase protein. Motif A corresponds to the DxxSLYPS motif mentioned above and includes an aspartic acid residue, involved in coordinating one of the two Mg2+ ions which are essential for the polymerase activity, and a tyrosine residue which stacks its side chain against an incoming nucleotide in the polymerase reaction. Another aspartic acid residue which also acts as a Mg2+ ion ligand (motif C), and is essential for the catalytic mechanism, is also found in the sequence of ORF632e (D215). Inspection of the three-dimensional structure of bacteriophage RB69 DNA polymerase (PDB ID 1WAJ), with respect to the alignment, shows that the end of the ORF056e sequence and the beginning of the ORF632e sequence are found between the 3'-5' exonuclease domain and the DNA polymerase domain.

The polymerase activity encoded by bacteriophage RM378 thus resides in an enzyme which is relatively short, corresponding only to the polymerase domain of other members of this family, and which, unlike those relatives, does not have a 3'-5' exonuclease domain. The 3'-5' exonuclease is found as another protein encoded by a separate gene elsewhere in the genome. The natural form of DNA polymerase from Thermus aquaticus (Taq) also lacks the proofreading 3'-5' exonuclease activity, but this polymerase differs from the polymerase of RM378 in several aspects: i) it belongs to a different family of polymerases (family A) which have a different general architecture, ii) the lack of 3'-5' exonuclease activity is due to a non-functional domain, since it still contains a structural domain homologous to a domain where this activity resides in other polymerases in this family, and iii) naturally occurring Taq has 5'-3' exonuclease activity besides its polymerase activity (Kim, Y. et al., Nature 376: 612-616 (1995)). Thus, the current protein is the only known example of a DNA polymerase which by nature lacks proofreading activity and the corresponding structural domain present in other polymerases of this type, and therefore represents the discovery of a unique compact type of DNA polymerase found in nature lacking both 3'-5' and 5'-3' exonuclease activity.

C. ORF739f Encodes an RNA Ligase

Several sequences of RNA ligases in a protein sequence database showed similarity to the ORF739f sequence (locus GP63), as identified in a similarity search using BLAST (Altschul, S. F. et al., J. Mol. Biol. 215: 403-410 (1990)). The top scoring sequences found in the BLAST search are shown in Table 4. Only 3 sequences showed a score with an E-value below 1.0. The two most significant and extensive similarities were found to the sequences of RNA ligases from Autographa californica nucleopolyhedrovirus and bacteriophage T4. The similarity to the third sequence, that of a DNA helicase, is much less extensive and has a considerably higher E-value. The sequence identity between the ORF739f sequence and the two RNA ligase sequences is 23% over regions of 314 and 381 residues. A sequence alignment of these three sequences is shown in Figure 4.

The site of covalent reaction with ATP (adenylation) has been located at residue K99 in bacteriophage T4 RNA ligase (Thogersen, H. C., et al., Eur J Biochem 147: 325-9 (1985); Heaphy, S., Singh, M. and Gait, M. J., Biochemistry 26: 1688-96 (1999)). A corresponding lysine residue (K126) is also found in the sequence of ORF739f. An aspartic acid residue close to the adenylation site in T4 RNA ligase has also been implicated as important for the catalytic mechanism (Heaphy, S., Singh, M. and Gait, M. J., Biochemistry 26: 1688-96 (1999)). This residue is also conserved in ORF739f (D128). It has been suggested that the motif KX(D/N)G may be a signature element for covalent catalysis in nucleotidyl transfer (Cong, P., and Shuman, S., J Biol Chem 268: 7256-60 (1993)). The conservation of these active site residues supports the interpretation of the ORF739f gene product as an RNA ligase having a catalytic mechanism in common with other RNA ligases and involving covalent reaction with ATP.

Table 4 shows sequences with strongest similarity (E-value cutoff of 1.0) to the ORF739f sequence together with their length and E-value according to BLAST search.

D. ORF1218a Encodes a Gene Product with 5'-3' Exonuclease Activity

A BLAST search (Altschul, S. F. et al., J. Mol. Biol. 215: 403-410 (1990)) identified about 60 sequences in the database (NCBI nr) with significant similarity (corresponding to an E-value lower than 1) to the sequence of the predicted gene product of ORF1218a (locus DAS). Almost all the identified sequences are of DNA polymerase I from bacterial species (DNA polymerase family A), and the similarity is restricted to the N-terminal halves of these sequences. The ORF1218a sequence is much shorter, 318 residues, than the identified sequences, which usually are between 800 and 900 residues (Table 5).

Structural and functional studies of DNA polymerases of this type (family A) have defined the different structural domains and how these correlate with the different activities of the enzyme. Polymerases of this type normally have a polymerase activity located in a C-terminal domain and two exonuclease activities, a 3'-5' exonuclease proofreading activity in a central domain and a 5'-3' exonuclease activity in an N-terminal domain (Kornberg, A. and Baker, T. A., DNA Replication, Freeman, New York (1992); Brautigam, C. A. and Steitz, T. A., Curr. Opin. Struct. Biol. 8: 45-63 (1998)). The sequence of ORF1218a corresponds to the 5'-3' exonuclease domain of these polymerases.

The 5'-3' exonuclease domain of DNA polymerase I belongs to a large family of proteins which also includes ribonuclease H (RNase H), including bacteriophage T4 RNase H. The analysis of the structure of bacteriophage T4 RNase H revealed the conservation of several acidic residues in this family of proteins. These residues are clustered at the active site, and some of them help coordinate two functionally important Mg2+ ions (Mueser, T. C., et al., Cell 85: 1101-12 (1996)). The corresponding alignment shown in Figure 5, including the sequence of the ORF1218a gene product, shows that these acidic residues (possibly with the exception of one) are also found in the gene product of ORF1218a, thus further supporting its proposed activity as a 5'-3' exonuclease.

The 5'-3' exonuclease of polymerase I and RNase H both remove RNA primers that have been formed during replication, but T4 DNA polymerase and other polymerases of the same type (family B), including the polymerase of phage RM378 identified here (see above), lack this exonuclease activity. T4 RNase H (305 residues) and the ORF1218a gene product (318 residues) are of similar size, with conserved regions scattered throughout most of the sequences (Figure 5). These proteins are likely to have a very similar structure, given the structural similarity between T4 RNase H and the 5'-3' exonuclease domain of polymerase I (Mueser, T. C., et al., Cell 85: 1101-12 (1996)). The gene product of ORF1218a probably has a function analogous to the function of RNase H in bacteriophage T4.

Table 5 sets forth the 21 sequences with strongest similarity to the ORF1218a sequence together with the length and E-value according to the BLAST search. The sequence identity with the ORF1218a sequence ranges from 31 to 41% within aligned regions of 82 to 145 residues.

E. A Replicative DNA Helicase Is Part of the Replication Machinery of Phage RM378

Several sequences of replicative DNA helicases were identified in a similarity search using BLAST (Altschul, S. F., et al., J. Mol. Biol. 215: 403-410 (1990)) with the ORF1293b (locus GP41) sequence as query sequence. Fifteen sequences had an E-value lower than 1.0, with the sequence of bacteriophage T4 replicative DNA helicase (product of gene 41, accession number P04530) having by far the lowest E-value. Some of the sequences found in the similarity search are hypothetical proteins and some are defined as RAD4 repair protein homologues. However, the most extensive similarity was found with the replicative helicase sequences, with sequence identity of 20-23% spanning 210-295 residues, and these sequences are all of length similar to the length of the ORF1293b gene product (416 residues). Table 6 shows the identified sequences of the similarity search.

The replicative DNA helicases with similarity to the ORF1293b sequence are of the same protein family, often named after the corresponding helicase in E. coli encoded by the DnaB gene (e.g. DnaB-like helicases). The Pfam protein family database holds 37 sequences in this family (family DnaB, accession number PF00772; http://www.sanger.ac.uk/Pfam), and the alignment of these sequences shows clearly several regions with conserved sequence motifs. One of these motifs is characteristic for ATPases and GTPases (Walker A motif, P-loop) and forms a loop that is involved in binding the phosphates of the nucleotide (Sawaya, M. R. et al., Cell 99: 167-77 (1999)). The replicative helicases bind single-stranded DNA (at the replication fork) and translocate in the 5'-3' direction with ATP (GTP) driven translocation (Matson, S. W., et al., BioEssays 16: 13-22 (1993)). The significant similarity found in the BLAST search to sequences other than helicase sequences is partly due to the presence of an ATP/GTP binding sequence motif in these sequences.

Figure 6 shows the sequence alignment of some members of the DnaB protein family together with the sequence of ORF1293b. Sawaya et al. have shown how several conserved motifs and functionally important residues of the DnaB family relate to the crystal structure of the helicase domain of the T7 helicase-primase (Sawaya, M. R. et al., Cell 99: 167-77 (1999)). The alignment in Figure 6 shows how these conserved motifs are present in the ORF1293b sequence, thereby supporting its role as a replicative helicase.

The bacteriophage T4 replicative helicase sequence was indicated as most closely related to the ORF1293b sequence in the similarity search. The structure and function of the corresponding helicases may be very similar in these two bacteriophages and, together with the similarity of numerous other components of these phages, may be indicative of other similarities of their replication machinery.

T4 replicative helicase is known to be an essential protein in the phage replication and interact with other proteins at the replication fork such as the primase to form the primosome (Nossal, N. G., FASEB J. 6: 871-8 (1992)). Similarly, the helicase

encoded by ORF1293b may have an essential function in bacteriophage RM378.

Other homologues of components of the T4 replication system have been detected, as shown above, and still others may be expected to be encoded by the bacteriophage genome. Table 6 sets forth the sequences with strongest similarity (E-value cutoff of 1.0) to the ORF1293b sequence, together with their lengths and E-values according to the BLAST search.

F. Subcloning of Selected ORFs from RM378

Plasmids designated pSH1, pGK1, pOL6, pJB1 and pJB2 were generated for the genes encoding the 3'-5' exonuclease, the DNA polymerase, the RNA ligase, the RNaseH and the helicase, respectively. The correct insertion of the ORFs into the expression vector was verified by DNA sequencing, and the expression of the genes was verified by SDS gel electrophoresis of crude extracts of the respective host strains.

E. coli strain JM109 [supE44 Δ(lac-proAB), hsdR17, recA1, endA1, gyrA96, thi-1, relA1 (F' traD36, proAB, lacIqZΔM15)] (Vieira and Messing, Gene 19: 259-268 (1982)) and strain XL10-Gold [Tetr Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173 endA1 supE44 thi-1 recA1 gyrA96 relA1 lac Hte (F' proAB lacIqZΔM15 Tn10 (Tetr) Amy Camr)] (Stratagene) were used as hosts for expression plasmids.

Restriction enzyme digestions, plasmid preparations, and other in vitro manipulations of DNA were performed using standard protocols (Sambrook et al., Molecular Cloning, 2nd Ed., Cold Spring Harbor Press, 1989).

The PCR amplification of the nucleic acid sequence containing open reading frame (ORF) 056e, which displayed similarity to the 3'-5' exonuclease domain of family B polymerase genes, was as follows. The forward primer exo-f: CACGAGCTC ATG AAG ATC ACG CTA AGC GCA AGC (SEQ ID NO: 64), spanning the start codon (underlined) and containing a restriction enzyme site, was used with the reverse primer exo-r: ACAGGTACC TTA CTC AGG TAT TTT TTT GAA CAT (SEQ ID NO: 65), containing a restriction site and spanning the stop codon (underlined, reverse complement) [codon 350 of ORF 056e shown in Figure 7].

The PCR amplification was performed with 0.5 U of DyNAzyme DNA polymerase (Finnzymes), 10 ng of RM378 phage DNA, a 1 µM concentration of each synthetic primer, a 0.2 mM concentration of each deoxynucleoside triphosphate, and 1.5 mM MgCl2 in the buffer recommended by the manufacturer. A total of 30 cycles were performed. Each cycle consisted of denaturing at 94°C for 50 s, annealing at 50°C for 40 s, and extension at 72°C for 90 s. The PCR products were digested with Kpn I and Sac I and ligated into Kpn I and Sac I digested pTrcHis A (Invitrogen) to produce pSH1. Epicurian Coli XL10-Gold cells (Stratagene) were transformed with pSH1 and used for induction of protein expression, although any host strain carrying a lac repressor could be used.
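The design of the exo-f and exo-r primers can be checked with a short script. This is a sketch under stated assumptions: the recognition sequences GAGCTC (Sac I) and GGTACC (Kpn I) are the standard ones for these enzymes and are not given in the text, and the helper names are hypothetical. It confirms that each primer carries one site directly followed by the start codon (forward primer) or by the reverse complement of a stop codon (reverse primer).

```python
# Assumed standard recognition sequences; not stated in the text.
SAC_I = "GAGCTC"
KPN_I = "GGTACC"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def site_and_next_codon(primer: str, site: str) -> tuple[int, str]:
    """Return (0-based site position, the three bases right after the site)."""
    seq = primer.replace(" ", "").upper()
    pos = seq.find(site)
    if pos < 0:
        return -1, ""
    end = pos + len(site)
    return pos, seq[end:end + 3]

exo_f = "CACGAGCTC ATG AAG ATC ACG CTA AGC GCA AGC"  # SEQ ID NO: 64
exo_r = "ACAGGTACC TTA CTC AGG TAT TTT TTT GAA CAT"  # SEQ ID NO: 65

f_pos, f_codon = site_and_next_codon(exo_f, SAC_I)
r_pos, r_codon = site_and_next_codon(exo_r, KPN_I)

print("exo-f: Sac I site at", f_pos, "followed by", f_codon)
print("exo-r: Kpn I site at", r_pos, "followed by", r_codon,
      "which reads", reverse_complement(r_codon), "on the coding strand")
print("reverse primer spans a stop codon:",
      reverse_complement(r_codon) in STOP_CODONS)
```

The three extra bases placed 5' of each site (CAC and ACA) are present in the primers as given; the script does not assess why they were included.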

The PCR amplification of the nucleic acid sequence containing ORF 632e, which exhibited similarity to the DNA polymerase domain of family B polymerase genes, was similar to that described above for the putative 3'-5' exonuclease gene, except that other PCR primers were used. The forward primer pol-f: CACGAGCTCATGAACATCAACAAGTATCGTTAT (SEQ ID NO: 66), spanning the start codon (underlined) and containing a restriction enzyme site, was used with the reverse primer pol-r: ACAGGTACCTTAGTTTTCACTCTCTACAAG (SEQ ID NO: 67), containing a restriction site and spanning the stop codon (underlined, reverse complement) [codon 523 of ORF 632e shown in Figure 8]. The PCR products were digested with Kpn I and Sac I and ligated into Kpn I and Sac I digested pTrcHis A (Invitrogen) to produce pGK1. Epicurian Coli XL10-Gold cells (Stratagene) were transformed with pGK1 and used for induction of protein expression. The expressed protein was observed with Anti-Xpress Antibody (Invitrogen) after Western blotting.

The PCR amplification of the nucleic acid sequence containing ORF 739f (which displayed similarity to the T4 RNA ligase gene) was similar to the procedure described above for the putative 3'-5' exonuclease gene. The forward primer Rlig-f: GGG AAT TCT TAT GAA CGT AAA ATA CCC G (SEQ ID NO: 68), spanning the start codon (underlined) and containing a restriction enzyme site, was used with the reverse primer Rlig-r: GGA GAT CTT ATT TAA ATA ACC CCT TTT C (SEQ ID NO: 69), containing a restriction site and spanning the stop codon (underlined, reverse complement) [codon 437 of the ORF shown in Figure 9]. The PCR products were digested with EcoRI and BglII. Subsequently, the amplified products were cloned into EcoRI and BamHI digested pBTac1 (Amann et al., Gene 25: 167-178 (1983)) to produce pOL6. Cells of E. coli strain JM109 were transformed with pOL6 and used for induction of protein expression, although any host strain carrying a lac repressor could be used.
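Ligating a BglII-digested insert into a BamHI-digested vector works because the two enzymes leave identical single-stranded overhangs. The sketch below illustrates this, assuming the standard cut positions G^AATTC (EcoRI), A^GATCT (BglII) and G^GATCC (BamHI), which are not stated in the text; the function names are hypothetical.

```python
# (recognition sequence, top-strand cut offset); assumed standard values.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BglII": ("AGATCT", 1),  # A^GATCT
    "BamHI": ("GGATCC", 1),  # G^GATCC
}

def overhang(enzyme: str) -> str:
    """5' overhang left by a symmetric cut in a palindromic site."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]

def compatible(a: str, b: str) -> bool:
    """Two ends can anneal if their overhangs are identical."""
    return overhang(a) == overhang(b)

def junction(left: str, right: str) -> str:
    """Sequence formed when a 'left' end is ligated to a 'right' end."""
    l_site, l_cut = ENZYMES[left]
    r_site, r_cut = ENZYMES[right]
    return l_site[:l_cut] + overhang(left) + r_site[len(r_site) - r_cut:]

print(overhang("BglII"), overhang("BamHI"), compatible("BglII", "BamHI"))
hybrid = junction("BglII", "BamHI")
print(hybrid, "recut by either enzyme:",
      hybrid in {ENZYMES["BglII"][0], ENZYMES["BamHI"][0]})
```

Under these assumptions the hybrid junction is no longer a recognition site for either enzyme, while the EcoRI end of the insert still pairs only with the EcoRI end of the vector, which fixes the orientation of the insert.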

The PCR amplification of the nucleic acid sequence containing ORF 1218a (which displayed similarity to the T4 RNaseH gene) was similar to the procedure described above for the putative 3'-5' exonuclease gene, except that other PCR primers were used. The forward primer RnH-f: GGGAATTCTT ATG AAA AGA CTG AGG AAT AT (SEQ ID NO: 70), spanning the start codon (underlined) and containing a restriction enzyme site, was used with the reverse primer RnH-r: GGA GAT CTC ATA GTC TCC TCT TTC TT (SEQ ID NO: 71), containing a restriction site and spanning the stop codon (underlined, reverse complement) [codon 319 of the ORF shown in Figure 10]. The PCR products were digested with EcoRI and BglII and ligated into EcoRI and BamHI digested pBTac1 (Amann et al., Gene 25: 167-178 (1983)) to produce pJB1. As for the RNA ligase clone, cells of E. coli strain JM109 were transformed with pJB1 and used for induction of protein expression.

The PCR amplification of the nucleic acid sequence containing ORF 1293b, which displayed similarity to the dnaB-like helicase genes, was as described above for the putative 3'-5' exonuclease gene, except that other PCR primers were used. The forward primer HelI-f: GGGCAATTGTT ATG GAA ACG ATT GTA ATT TC (SEQ ID NO: 72), spanning the start codon (underlined) and containing a restriction enzyme site, was used with the reverse primer HelI-r: CGGGATCC TCA TTT AAC AGC AAC GTC (SEQ ID NO: 73), containing a restriction site and spanning the stop codon (underlined, reverse complement) [codon 417 of the ORF shown in Figure 11].

The PCR products were digested with EcoRI and BglII and ligated into EcoRI and BamHI digested pBTac1 (Amann et al., Gene 25: 167-178 (1983)) to produce pJB2.

Cells of E. coli strain JM109 were transformed with pJB2 and used for induction of protein expression.

Deposit of Biological Material

A deposit of Rhodothermus marinus strain ITI 378, and a deposit of Rhodothermus marinus strain ITI 378 infected with bacteriophage RM 378, were made at the following depository under the terms of the Budapest Treaty: Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ), Mascheroder Weg 1b, D-38124 Braunschweig, Germany.

The deposit was made in the name of the following depositor: IceThem, Ltd., Liftaeknihus, Keldnaholt, IS-112 Reykjavik, Iceland.

The deposit of Rhodothermus marinus strain ITI 378 received accession number DSM 12830, with an accession date of May 28th, 1999. The infected strain (Rhodothermus marinus strain ITI 378 infected with bacteriophage RM 378) received accession number DSM 12831, with an accession date of May 31st, 1999.

During the pendency of this application, access to the deposits described herein will be afforded to the Commissioner upon request. All restrictions upon the availability to the public of the deposited material will be irrevocably removed upon granting of a patent on this application, except for the requirements specified in 37 C.F.R. 1.808(b) and 1.806. The deposits will be maintained in a public depository for a period of at least 30 years from the date of deposit, or for the enforceable life of the patent, or for a period of five years after the date of the most recent request for the furnishing of a sample of the biological material, whichever is longer. The deposits will be replaced if they should become nonviable or nonreplicable.

Depositor has authorized deCODE genetics, ehf., Lynghals 1, IS-110 Reykjavik, Iceland, for the International or European patent application, to refer to the aforementioned deposited biological material in the subject application and gives unreserved and irrevocable consent to the deposited biological material being made available to the public in accordance with Rule 28 EPC. This authorization and consent is effective from the filing date of 2 June 1999 of U.S. Provisional Application No. 60/137,120.

Table 1: Comparison of Structural Features of T4 and RM 378

| Feature | T4 | RM378 |
|---|---|---|
| Phage type | T-even, A2 morphology | T-even, A2 morphology |
| Family | Myoviridae | Myoviridae |
| Genome size | 168,900 bases | ca. 130,480 bases |
| Number of ORFs | ca. 300 | >200 |
| Characteristic structural proteins | GP3, GP13, GP17, GP18, GP20, GP21, GP22 | Putative homologs of the same were identified |
| Arrangement of structural proteins | All of the above genes are on the same strand and clustered in a region covering 35 kb | All of the above genes were dispersed over the whole genome and found on both strands |
| Representative enzymes | lysozyme and thymidine kinase (on same strand) | lysozyme and thymidine kinase (on different strands) |

Table 2

| Source | Accession # | Definition | Length | E-value* |
|---|---|---|---|---|
| Spodoptera litura nucleopolyhedrovirus | AAC33750.1 | DNA polymerase (partial) | 603 | 9e-08 |
| Spodoptera littoralis nucleopolyhedrovirus | AAF61904.1 | DNA polymerase | 998 | 9e-08 |
| Sulfurisphaera ohwakuensis | O50607 | DNA POLYMERASE I (DNA POLYMERASE B1) | 872 | 3e-07 |
| Xestia c-nigrum granulovirus | AAC06350.1 | DNA polymerase | 109S | 4e-07 |
| Lymantria dispar nucleopolyhedrovirus | T30431 | DNA-directed DNA polymerase | 1014 | 5e-07 |
| Lymantria dispar nucleopolyhedrovirus | P30318 | DNA POLYMERASE | 1013 | 5e-07 |
| Buzura suppressaria nucleopolyhedrovirus | AAC33747.1 | DNA polymerase (partial) | 647 | 8e-07 |
| Sulfolobus acidocaldarius | P95690 | DNA POLYMERASE I | 875 | 4e-06 |
| Bacteriophage RB69 | Q38087 | DNA POLYMERASE | 903 | 5e-06 |
| Spodoptera exigua nucleopolyhedrovirus | AAC33749.1 | DNA polymerase (partial) | 636 | 2e-04 |
| Spodoptera exigua nucleopolyhedrovirus | AAF33622.1 | DNA polymerase | 1063 | 2e-04 |
| Mamestra brassicae nucleopolyhedrovirus | AAC33746.1 | DNA polymerase (partial) | 628 | 9e-04 |
| Melanoplus sanguinipes entomopoxvirus | AAC97837.1 | putative DNA polymerase | 1079 | 9e-04 |
| Orgyia anartoides nucleopolyhedrovirus | AAC33748.1 | DNA polymerase | 658 | 0.003 |
| Sulfolobus solfataricus | AAB53090.1 | polymerase | 882 | 0.003 |
| Sulfolobus solfataricus | P26811 | DNA POLYMERASE I | 882 | 0.003 |
| Human herpesvirus 7 | AAC40752.1 | catalytic subunit of replicative DNA polymerase | 1013 | 0.004 |
| Human herpesvirus 7 | AAC40752.1 | catalytic subunit of replicative DNA polymerase | 1013 | 0.004 |
| Methanococcus voltae | P52025 | DNA POLYMERASE | 824 | 0.010 |
| Bombyx mori nuclear polyhedrosis virus | P41712 | DNA POLYMERASE | 986 | 0.013 |
| Bombyx mori nuclear polyhedrosis virus | BAA03756.1 | DNA polymerase | 986 | 0.051 |

* An E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.
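The interpretation of the E-value given in the footnote can be made concrete with a small calculation. This is a sketch that assumes, as in standard BLAST statistics, that the number of chance hits scoring at least as well follows a Poisson distribution with mean E, so that the probability of at least one such hit is 1 - exp(-E). The function name is hypothetical.

```python
import math

def prob_at_least_one_chance_hit(e_value: float) -> float:
    """P(one or more chance hits) = 1 - exp(-E), assuming Poisson statistics."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)

# E-values taken from Table 2, plus E = 1 from the footnote.
for e in (9e-08, 0.003, 0.051, 1.0):
    print(f"E = {e:g}  ->  P = {prob_at_least_one_chance_hit(e):.3g}")
```

For small E-values the probability is practically equal to E itself, whereas an E-value of 1 corresponds to a probability of about 0.63 of seeing at least one such match by chance.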

Table 3

| Source | Accession # | Definition | Length | E-value* |
|---|---|---|---|---|
| Aeropyrum pernix | O93745 | DNA POLYMERASE I | 959 | 4e-20 |
| Aeropyrum pernix | BAA75662.1 | DNA polymerase | 923 | 4e-20 |
| Aeropyrum pernix | BAA75663.1 | DNA polymerase II | 772 | 7e-14 |
| Aeropyrum pernix | O93746 | DNA POLYMERASE II | 784 | 7e-14 |
| Pyrodictium occultum | BAA07579.1 | DNA polymerase | 914 | 2e-16 |
| Pyrodictium occultum | A56277 | DNA-directed DNA polymerase | 879 | 2e-16 |
| Pyrodictium occultum | B56277 | DNA-directed DNA polymerase | 803 | 6e-11 |
| Sulfolobus acidocaldarius | P95690 | DNA POLYMERASE I | 875 | 5e-16 |
| Archaeoglobus fulgidus | O29753 | DNA POLYMERASE | 781 | 1e-14 |
| Chlorella virus NY2A | P30320 | DNA POLYMERASE | 913 | 3e-14 |
| Thermococcus gorgonarius | P56689 | DNA POLYMERASE | 773 | 4e-14 |
| Paramecium bursaria Chlorella virus 1 | A42543 | DNA-directed DNA polymerase | 913 | 9e-14 |
| Paramecium bursaria Chlorella virus 1 | P30321 | DNA POLYMERASE | 913 | 4e-13 |
| Pyrobaculum islandicum | AAF27815.1 | family B DNA polymerase | 785 | 9e-14 |
| Homo sapiens | P09884 | DNA POLYMERASE ALPHA CATALYTIC SUBUNIT | 1462 | 1e-13 |
| Homo sapiens | | polymerase (DNA directed), delta 1, catalytic subunit | 1107 | 6e-07 |
| Homo sapiens | S35455 | DNA-directed polymerase delta 1 | 107 | 9e-07 |
| Chlorella | BAA35142.1 | DNA polymerase K2 | 913 | 3e-13 |
| Sulfolobus solfataricus | | DNA polymerase | 882 | 3e-13 |
| Sulfolobus solfataricus | P26811 | DNA POLYMERASE I | 882 | 3e-13 |

* An E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.

Table 4

| Source | Accession # | Definition | Length | E-value* |
|---|---|---|---|---|
| Autographa californica nucleopolyhedrovirus | P41476 | PUTATIVE BIFUNCTIONAL POLYNUCLEOTIDE KINASE/RNA LIGASE | 694 | 3e-07 |
| Coliphage T4 | P00971 | RNA LIGASE | 374 | 0.002 |
| Aquifex aeolicus | D70476 | DNA helicase | 530 | 0.25 |

* An E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.

Table 5

| Source | Accession # | Definition | Length | E-value* |
|---|---|---|---|---|
| Streptococcus pneumoniae | P13252 | DNA POLYMERASE I | 877 | 2e-08 |
| Lactococcus lactis subsp. cremoris | O32801 | DNA POLYMERASE I | 877 | 2e-06 |
| Bacillus stearothermophilus | AAB52611.1 | DNA polymerase I | 876 | 1e-05 |
| Bacillus stearothermophilus | AAB62092.1 | DNA polymerase I | 877 | 2e-05 |
| Bacillus stearothermophilus | S70368 | DNA polymerase I | 876 | 2e-05 |
| Bacillus stearothermophilus | P52026 | DNA POLYMERASE I | 876 | 2e-05 |
| Bacillus stearothermophilus | JC4286 | DNA-directed DNA polymerase | 879 | 4e-05 |
| Bacillus stearothermophilus | | DNA polymerase | 954 | 4e-05 |
| Thermus thermophilus | 2113329A | DNA polymerase | 834 | 3e-05 |
| Thermus thermophilus | P52028 | DNA POLYMERASE I | 834 | 3e-05 |
| Thermus thermophilus | BAA85001.1 | DNA polymerase | 834 | 3e-05 |
| Bacillus subtilis | O34996 | DNA POLYMERASE I | 880 | 4e-05 |
| Bacillus caldotenax | Q04957 | POLYMERASE I | 877 | 4e-05 |
| Deinococcus radiodurans | A40597 | DNA-directed DNA polymerase | 921 | 4e-05 |
| Deinococcus radiodurans | P52027 | DNA POLYMERASE I | 956 | 4e-05 |
| Aquifex aeolicus | D70440 | DNA 3'-5' exo domain I | 289 | 7e-05 |
| Thermus filiformis | O52225 | DNA POLYMERASE I | 833 | 7e-05 |
| Anaerocellum thermophilum | Q59156 | POLYMERASE I | 850 | 3e-04 |
| Rickettsia felis | CAB56067.1 | DNA polymerase I | 922 | 3e-04 |
| Rhodothermus sp. 'ITI 518' | AAC98908.1 | DNA polymerase type I | 924 | 4e-04 |
| Thermus aquaticus | P19821 | DNA POLYMERASE I | 832 | 4e-04 |

* An E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.

Table 6

| Source | Accession # | Definition | Length | E-value* |
|---|---|---|---|---|
| Coliphage T4 | P04530 | PRIMASE-HELICASE (PROTEIN GP41) | 475 | 3e-06 |
| Campylobacter jejuni | CAB75198.1 | replicative DNA helicase | 458 | 0.003 |
| Listeria monocytogenes | Q48761 | DNA REPAIR PROTEIN RADA HOMOLOG | 452 | 0.003 |
| Listeria monocytogenes | AAC33293.1 | RadA homolog | 457 | 0.016 |
| Mycoplasma arthritidis bacteriophage MAV1 | AAC33767.1 | putative replication protein | 276 | 0.007 |
| Aeropyrum pernix | B72665 | hypothetical protein | 726 | 0.016 |
| Porphyra purpurea | P51333 | PROBABLE REPLICATIVE DNA HELICASE | 568 | 0.027 |
| Escherichia coli | P03005 | REPLICATIVE DNA HELICASE | 471 | 0.047 |
| Saccharomyces cerevisiae | NP_011861.1 | SH3 domain | 452 | 0.047 |
| Chlamydia trachomatis | O84300 | DNA REPAIR PROTEIN RADA HOMOLOG | 454 | 0.14 |
| Haemophilus influenzae | P45256 | REPLICATIVE DNA HELICASE | 504 | 0.14 |
| Caenorhabditis elegans | T16375 | hypothetical protein | 566 | 0.18 |
| Pyrococcus horikoshii | E71133 | hypothetical protein | 483 | 0.18 |
| Cyanidium caldarium | AAF12980.1 | unknown; replication helicase subunit | 489 | 0.53 |
| Rickettsia prowazekii | Q9ZD04 | DNA REPAIR PROTEIN RADA HOMOLOG | 448 | 0.69 |

* An E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.

While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM OR OTHER BIOLOGICAL MATERIAL (PCT Rule 13bis)

A. The indications made below relate to the deposited microorganism or other biological material referred to in the description on page 45, line 14.

B. IDENTIFICATION OF DEPOSIT
Name of depositary institution: The Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ)
Address of depositary institution (including postal code and country): Mascheroder Weg 1b, D-38124 Braunschweig, GERMANY
Date of deposit: May 28, 1999
Accession Number: DSM 12830

C. ADDITIONAL INDICATIONS (leave blank if not applicable)
In respect of those designations for which a European patent is sought, the Applicant(s) hereby inform(s) the International Bureau that the Applicant wishes that, until the publication of the mention of the grant of a European patent or for 20 years from the date of filing if the application is refused or withdrawn or deemed to be withdrawn, the biological material deposited with the DSMZ under Accession No. DSM 12830

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)
The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")

For receiving Office use only: This sheet was received with the international application. Authorized officer
For International Bureau use only: This sheet was received by the International Bureau on: Authorized officer

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM OR OTHER BIOLOGICAL MATERIAL (PCT Rule 13bis)

A. The indications made below relate to the deposited microorganism or other biological material referred to in the description on page 45, line 16.

B. IDENTIFICATION OF DEPOSIT
Name of depositary institution: The Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ)
Address of depositary institution (including postal code and country): Mascheroder Weg 1b, D-38124 Braunschweig, GERMANY
Date of deposit: May 31, 1999
Accession Number: DSM 12831

C. ADDITIONAL INDICATIONS (leave blank if not applicable)
In respect of those designations for which a European patent is sought, the Applicant(s) hereby inform(s) the International Bureau that the Applicant wishes that, until the publication of the mention of the grant of a European patent or for 20 years from the date of filing if the application is refused or withdrawn or deemed to be withdrawn, the biological material deposited with the DSMZ under Accession No. DSM 12831

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)
The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")

For receiving Office use only: This sheet was received with the international application. Authorized officer
For International Bureau use only: This sheet was received by the International Bureau on: Authorized officer