

Title:
POLYMERASE ENZYME
Document Type and Number:
WIPO Patent Application WO/2021/242740
Kind Code:
A2
Abstract:
The present invention is in the field of molecular biology and is directed to novel reverse transcriptase enzymes and compositions, and to methods and kits for producing, amplifying, or sequencing nucleic acid molecules using these novel reverse transcriptase enzymes or compositions. In particular, the invention relates to a polymerase selected from the group of: a polymerase (O15) as encoded by a nucleic acid according to SEQ ID NO. 9 or a nucleic acid that is at least 98% identical thereto, a polymerase (O15) with the amino acid sequence according to SEQ ID NO: 10 or a polymerase that is at least 90% identical thereto, a polymerase (O57) as encoded by a nucleic acid according to SEQ ID NO. 11 or a nucleic acid that is at least 98% identical thereto, a polymerase (O57) with the amino acid sequence according to SEQ ID NO: 12 or a polymerase that is at least 90% identical thereto, a polymerase (O58) as encoded by a nucleic acid according to SEQ ID NO. 13 or a nucleic acid that is at least 98% identical thereto, and a polymerase (O58) with the amino acid sequence according to SEQ ID NO: 14 or a polymerase that is at least 90% identical thereto.

Inventors:
HELLER RYAN CHARLES (US)
SCHUSTER DAVID M (US)
Application Number:
PCT/US2021/034027
Publication Date:
December 02, 2021
Filing Date:
May 25, 2021
Assignee:
QIAGEN BEVERLY LLC (US)
International Classes:
C07K14/005; C12N9/12
Domestic Patent References:
WO2019211749A12019-11-07
Foreign References:
US5455170A1995-10-03
US8093030B22012-01-10
US20090137008A12009-05-28
EP1050587B12007-04-18
US9758812B22017-09-12
US5338671A1994-08-16
US5773258A1998-06-30
US5683896A1997-11-04
US5322770A1994-06-21
EP1934339A22008-06-25
Other References:
MYERS AND GELFAND, BIOCHEMISTRY, vol. 30, 1991, pages 7661
SAUTER AND MARX, ANGEW. CHEM. INT. ED. ENGL., vol. 45, 2006, pages 7633
KRANASTER ET AL., BIOTECHNOL. J., vol. 5, 2010, pages 224
BLATTER ET AL., ANGEW. CHEM. INT. ED. ENGL., vol. 52, 2013, pages 11935
ZHAO ET AL., RNA, vol. 24, 2018, pages 183
MOHR ET AL., RNA, vol. 19, 2013, pages 958
ELLEFSON ET AL., SCIENCE, vol. 352, 2016, pages 1590
"GenBank", Database accession no. AFN99405.1
SELLNER AND TURBETT, BIOTECHNIQUES, vol. 25, no. 2, 1998, pages 230 - 234
SELLNER ET AL., J. VIROL. METHODS, vol. 40, 1992, pages 255 - 264
CHANDLER ET AL., APPL. AND ENVIRONM MICROBIOL., vol. 64, no. 2, 1998, pages 669 - 677
YASUKAWA ET AL., J. BIOCHEM., vol. 143, 2008, pages 261
MALBOEUF ET AL., BIOTECHNIQUES, vol. 30, 2001, pages 1074
WANG ET AL., NUCLEIC ACIDS RES., vol. 32, 2004, pages 1197
MERKENS ET AL., BIOCHIM. BIOPHYS. ACTA, vol. 1264, 1995, pages 243
MURALI ET AL., PROC. NATL. ACAD. SCI. U.S.A., vol. 95, 1998, pages 12562
NUCLEIC ACIDS RESEARCH, 2019
Attorney, Agent or Firm:
MARTY, Scott D. et al. (US)
Claims:
CLAIMS

1. Polymerase selected from the group of,
   a. a polymerase (O15) as encoded by a nucleic acid according to SEQ ID NO. 9 or a nucleic acid that is at least 98% identical thereto,
   b. a polymerase (O15) with the amino acid sequence according to SEQ ID NO: 10 or a polymerase that is at least 90% identical thereto,
   c. a polymerase (O57) as encoded by a nucleic acid according to SEQ ID NO. 11 or a nucleic acid that is at least 98% identical thereto,
   d. a polymerase (O57) with the amino acid sequence according to SEQ ID NO: 12 or a polymerase that is at least 90% identical thereto,
   e. a polymerase (O58) as encoded by a nucleic acid according to SEQ ID NO. 13 or a nucleic acid that is at least 98% identical thereto, and
   f. a polymerase (O58) with the amino acid sequence according to SEQ ID NO: 14 or a polymerase that is at least 90% identical thereto.

2. Polymerase comprising,
   a. an N-terminal 5’-3’ nuclease domain,
      i. stemming from Taq polymerase or,
      ii. a polymerase sharing at least 95% amino acid sequence identity with the N-terminal 5’-3’ nuclease domain of Taq polymerase,
   b. an adjacent and linked polymerase domain, stemming from a viral family A polymerase, wherein the polymerase domain stems preferably from,
      1. JGI20132J14458_100001622 (1607 amino acids; SEQ ID NO. 20), or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q627N, H751Q, Q752K, and V753K, or
      2. Ga0186926_122605 (1595 amino acids; SEQ ID NO. 21), or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q627N, H752Q, Q753K, and V754K, or
      3. Ga0080008_15802729 (1619 amino acids; SEQ ID NO. 22) or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q628N, H752Q, Q753K, and L754K, or
      4. Ga0079997_11796739 (1608 amino acids; SEQ ID NO. 23), or a functional fragment that shares at least 98% amino acid sequence identity thereto and is altered to comprise the following amino acid changes, Q627N, H752Q, Q753K, and I754K.

3. Polymerase according to claim 2, wherein
   a. there is a peptide linker between the exonuclease domain and the polymerase domain and,
   b. optionally said peptide linker has the amino acid sequence according to SEQ ID NO. 19 (GGGGSGGGGS).

4. Polymerase according to claims 2 or 3, wherein polymerase domain is codon optimized for expression in E. coli.

5. Polymerase comprising,
   a. the amino acid sequence of
      i. SEQ ID NO. 16 (OP-2605) comprising the following additional amino acid changes, Q627N, H752Q, Q753K, and V754K,
      ii. or an amino acid sequence at least 95%, preferably at least 98% identical thereto,
   b. the amino acid sequence of
      i. SEQ ID NO. 15 (OS-1622) comprising the following additional amino acid changes, Q627N, H751Q, Q752K, and V753K,
      ii. or an amino acid sequence at least 90%, preferably at least 95%, more preferably at least 98% identical thereto,
   c. the amino acid sequence of
      i. SEQ ID NO. 17 (CS-2729) comprising the following additional amino acid changes, Q628N, H752Q, Q753K, and L754K, or an amino acid sequence at least 90%, preferably at least 95%, more preferably at least 98% identical thereto, or
   d. the amino acid sequence of
      i. SEQ ID NO. 18 (PS-6739) comprising the following additional amino acid changes, Q627N, H752Q, Q753K, and I754K,
      ii. or an amino acid sequence at least 90%, preferably at least 95%, more preferably at least 98% identical thereto.

6. A method for amplifying template nucleic acids comprising contacting the template nucleic acids with a polymerase according to any one of claims 1 to 5, preferably wherein the method is reverse transcription (RT) PCR.

7. The method according to claim 6, wherein the method comprises:
   a) generating cDNA using a polypeptide according to any one of claims 1 to 6, and
   b) amplifying the generated cDNA using a polypeptide according to any one of claims 1 to 6.

8. Kit comprising a polymerase according to claims 1 to 5.

9. A vector encoding a polymerase according to any one of claims 1 to 5.

10. A transformed host cell comprising the vector according to claim 9.

11. A viral family A polymerase, or a portion thereof comprising one of the following mutations, selected from the group of
    a. Q627N or Q628N;
    b. H752Q or H751Q;
    c. Q753K or Q752K;
    d. V754K or V753K or L754K or I754K;
    or mutations in similar residues from locally aligned family A polymerases per the amino acid numbering of polymerases according to claims 1 to 5.

12. Polymerase domain selected from the group of:
    (a) OP-2605 (577 amino acids) according to SEQ ID NO. 25 (derived from Locus tag Ga0186926_122605),
    (b) OS-1622 (576 amino acids) according to SEQ ID NO. 24 (derived from Locus tag JGI20132J14458_100001622),
    (c) CS-2729 (577 amino acids) according to SEQ ID NO. 26 (derived from Locus tag Ga0080008_15802729), or
    (d) PS-6739 (577 amino acids) according to SEQ ID NO. 27 (derived from Locus tag Ga0079997_11796739), or
    (e) polypeptide polymerase domain or functional fragment that shares more than 80%, 85%, 90%, 95% or 99% sequence identity with (a), (b), (c) or (d).

13. Use of a polymerase domain according to claim 12 for constructing a chimeric enzyme, preferably an enzyme with polymerase activity, more preferably an enzyme with reverse transcriptase activity.
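The domain order recited in claims 2 and 3 (an N-terminal nuclease domain, an optional peptide linker, then the polymerase domain) can be illustrated with a minimal Python sketch. Only the linker sequence GGGGSGGGGS (SEQ ID NO. 19) is taken from the text. The function names and the short placeholder domain strings are hypothetical and do not correspond to any SEQ ID of the application.

```python
# Linker as recited in claim 3 (SEQ ID NO. 19).
LINKER = "GGGGSGGGGS"


def _check_protein(name, sequence, allow_empty=False):
    """Reject anything that is not an upper-case one-letter amino acid string."""
    if not sequence:
        if allow_empty:
            return
        raise ValueError(f"{name} must not be empty")
    if not (sequence.isalpha() and sequence.isupper()):
        raise ValueError(f"{name} must contain only upper-case letters")


def build_chimera(nuclease_domain, polymerase_domain, linker=LINKER):
    """Concatenate N-terminal nuclease domain, linker, and polymerase domain."""
    _check_protein("nuclease_domain", nuclease_domain)
    _check_protein("polymerase_domain", polymerase_domain)
    _check_protein("linker", linker, allow_empty=True)  # the linker is optional
    return nuclease_domain + linker + polymerase_domain


def domain_boundaries(nuclease_domain, polymerase_domain, linker=LINKER):
    """Return 1-based inclusive (start, end) coordinates of each segment."""
    nuclease_end = len(nuclease_domain)
    linker_end = nuclease_end + len(linker)
    total = linker_end + len(polymerase_domain)
    return {
        "nuclease": (1, nuclease_end),
        "linker": (nuclease_end + 1, linker_end) if linker else None,
        "polymerase": (linker_end + 1, total),
    }


if __name__ == "__main__":
    # Hypothetical placeholder domains, NOT real sequences.
    toy_nuclease = "MAAA"
    toy_polymerase = "KLLL"
    print(build_chimera(toy_nuclease, toy_polymerase))
    print(domain_boundaries(toy_nuclease, toy_polymerase))
```

Tracking the segment coordinates is useful because residue numbers in a fusion construct shift relative to the isolated polymerase domain by the combined length of the nuclease domain and linker.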

Description:
POLYMERASE ENZYME

TECHNICAL FIELD

[0001] The present invention is in the field of molecular biology, in particular in the field of enzymes, more particularly in the field of polymerases, and in the field of nucleic acid amplification and reverse transcription. The present invention is directed to novel reverse transcriptase enzymes and compositions, and to methods and kits for producing, amplifying, or sequencing nucleic acid molecules, particularly cDNA molecules, using these novel reverse transcriptase enzymes or compositions.

BACKGROUND ART

[0002] The detection, analysis, sequencing, transcription and amplification of nucleic acids are among the most important procedures in modern molecular biology. The application of such procedures for amplification, detection, quantification, sequencing and analysis of RNA is most typically dependent on the conversion of RNA into complementary DNA (cDNA) by reverse transcriptases. The term "reverse transcriptase" describes a class of polymerases characterized as RNA-dependent DNA polymerases. Reverse transcriptases are therefore considered foundational enzymes in molecular biology and are important for many applications, especially including the investigation of gene expression, the diagnosis and management of infectious agents, such as RNA viruses, and the analysis of disease states including cancers and genetic disorders. Consequently, reverse transcriptases with improved properties, such as higher efficiency, speed, thermal stability, or resistance to inhibitory compounds in sample matrixes that negatively impact reverse transcription, will lead to improved analysis of RNA and are highly valued in the areas of diagnostics, human and veterinary health care, agriculture, food safety, environmental monitoring and scientific research.

[0003] The primary tools for detecting and quantifying RNA are variants of reverse transcription polymerase chain reaction (RT-PCR), such as quantitative RT-PCR (RT-qPCR) or real-time RT-PCR. Other variants of RT-PCR include digital RT-PCR (dRT-PCR) or digital droplet RT-PCR (ddRT-PCR). In addition, reverse transcriptases are essential for many next-generation RNA sequencing (RNA-Seq) methods for RNA analysis.

[0004] The RT-PCR procedure involves two separate molecular syntheses: first, the synthesis of cDNA from an RNA template; and second, the replication of the newly synthesized cDNA through PCR amplification. RT-PCR may be performed under three general protocols:

1) Uncoupled RT-PCR, also referred to as two-step RT-PCR.

2) Single enzyme coupled RT-PCR, also referred to as one-step RT-PCR or continuous RT-PCR, in which a single polymerase is used for both the cDNA generation from RNA and the subsequent DNA amplification.

3) Two (or more) enzyme coupled RT-PCR, in which a thermolabile retroviral RT synthesizes complementary DNA (cDNA) using an RNA template, and a distinct DNA polymerase, commonly Taq polymerase, amplifies the DNA product. Commonly, a 5'-3' nuclease activity, inherent in Taq DNA polymerase, facilitates fluorescent detection by amplification-dependent hydrolysis and dequenching of a fluorescent DNA probe. This is sometimes also referred to as one-step RT-PCR or, alternatively, one-tube RT-PCR.

[0005] In uncoupled RT-PCR, reverse transcription is performed as an independent step using buffer and reaction conditions optimal for reverse transcriptase activity. Following cDNA synthesis, an aliquot of the RT reaction product is used as template for PCR amplification with a thermostable DNA polymerase, such as Taq DNA Polymerase, under conditions optimal for PCR amplification.

[0006] Coupled RT-PCR provides numerous advantages over uncoupled RT-PCR. Coupled RT-PCR requires less handling of the reaction mixture reagents and nucleic acid products than uncoupled RT-PCR (e.g., opening of the reaction tube for component or enzyme addition in between the two reaction steps), and is therefore less labor-intensive and time-consuming, and has reduced risk of contamination. Furthermore, coupled RT-PCR also requires less sample, making it especially suitable for applications where the sample amounts are limited (e.g., with FFPE, biopsy, or environmental samples).

[0007] Although single-enzyme coupled RT-PCR is easy to perform, the system is expensive because of the amount of DNA polymerase required. In addition, the single enzyme coupled RT-PCR method has been found to be less sensitive than uncoupled RT-PCR, and limited to polymerizing nucleic acids of less than one kilobase pair in length.

[0008] Some inherently thermostable DNA polymerases, e.g. Tth polymerase and Hawk Z05, can be induced to function as reverse transcriptases by modifying the buffer to include manganese rather than the typical magnesium (Myers and Gelfand 1991. Biochemistry 30:7661). Other variants of thermostable DNA polymerases, e.g. those of Thermus (US 5,455,170), Thermotoga and other thermophiles, have been modified by mutagenesis and directed evolution to polymerize DNA from RNA templates (Sauter and Marx 2006. Angew. Chem. Int. Ed. Engl. 45:7633; Kranaster et al. 2010. Biotechnol. J. 5:224; Blatter et al. 2013. Angew. Chem. Int. Ed. Engl. 52:11935). Intron encoded RTs from various thermophilic bacteria have been explored for their potential use in single enzyme RT-PCR (Zhao et al. 2018. RNA 24:183; Mohr et al. 2013. RNA 19:958). Alternatively, mutagenesis of archaeal family B DNA polymerases has resulted in functional proofreading thermostable RTs (Ellefson et al. 2016. Science 352:1590).

[0009] Single enzyme magnesium-dependent RT-PCR was enabled by PyroPhage® DNA polymerase. A 588 amino acid sequence was submitted as GenBank Acc. No. AFN99405.1 with the patent filings, i.e. US 8,093,030 and related patents, and presumptively comprises the PyroPhage DNA polymerase. This enzyme has both thermostable reverse transcriptase and DNA polymerase activities. This enzyme, as described in patents (US 8,093,030), proved difficult to manufacture consistently, did not have sufficient RT activity, and was not competitive with the two enzyme systems with regard to ease of use, sensitivity, versatility in target RNAs, time-to-result, functionality in detection using probes or overall reliability.

[0010] Overall, none of these alternative thermostable reverse transcriptase/polymerase enzymes has been sufficiently effective in RT-PCR. Consequently, coupled RT-PCR systems with two (or more) enzyme mixes based on Taq polymerase and a thermolabile retroviral RT continue to be the state of the art for the great majority of practitioners and generally show increased sensitivity over the single enzyme system, even when coupled in a single reaction mixture. This effect has been attributed to the higher efficiency of reverse transcriptase in comparison to the reverse transcriptase activity of DNA polymerases (Sellner and Turbett, BioTechniques 25(2):230-234 (1998)).

[0011] Although the two-enzyme coupled RT-PCR system is more sensitive than the single enzyme system, reverse transcriptase has been found to interfere directly with DNA polymerase during the replication of the cDNA, thus reducing the sensitivity and efficiency of this technique (Sellner et al., J. Virol. Methods 40:255-264 (1992)). In order to minimize the number of manual manipulations required for processing large numbers of samples, Sellner et al. attempted to design a system whereby all the reagents required for both reverse transcription and amplification can be added to one tube and a single, non-interrupted thermal cycling program can be performed. Whilst attempting to set up such a one-tube system with Taq polymerase and avian myeloblastosis virus RT, they noticed a substantial decrease in the sensitivity of detection of viral RNA and identified a direct interference of reverse transcriptase with Taq polymerase. A variety of solutions to overcome the inhibitory activity of reverse transcriptase on DNA polymerase have been tried, including: increasing the amount of template RNA, increasing the ratio of DNA polymerase to reverse transcriptase, adding modifier reagents that may reduce the inhibitory effect of reverse transcriptase on DNA polymerase (e.g., non-homologous tRNA, T4 gene 32 protein, sulphur or acetate-containing molecules), and heat-inactivation of the reverse transcriptase before the addition of DNA polymerase.

[0012] All of these modified RT-PCR methods have significant drawbacks, however. Increasing the amount of template RNA is not possible in cases where only limited amounts of sample are available. Individual optimization of the ratio of reverse transcriptase to DNA polymerase is not practicable for ready-to-use reagent kits for one-step RT-PCR. The net effect of currently proposed modifier reagents to relieve reverse transcriptase inhibition of DNA polymerization is controversial: positive effects due to these reagents are highly dependent on RNA template amounts and RNA composition, or may require specific reverse transcriptase-DNA polymerase combinations (Chandler et al., Appl. and Environm Microbiol. 64(2): 669-677 (1998)). Finally, heat inactivation of the reverse transcriptase before the addition of the DNA polymerase negates the advantages of the coupled RT-PCR and carries all the disadvantages of uncoupled RT-PCR systems discussed earlier. Even if a reverse transcriptase is heat inactivated, it still may confer an inhibitory effect on PCR, likely due to binding of heat-inactivated reverse transcriptase to the cDNA template.

[0013] Some improvements to reduce the inhibitory effect of reverse transcriptase on the activity of the polymerase have been made, including:

1) In US 2009/0137008 A1, Gong and Wang describe the reduction of the inhibitory effect of reverse transcriptase on DNA polymerase by proteins that bind dsDNA in a non-specific way, such as Sso7d, Sac7d, Sac7e or Sso7e, and by sulfonic acid and sulfonic acid salts.

2) In EP 1050587 B1, Missel et al. describe the reduction of the inhibitory effect of reverse transcriptase on DNA polymerase by homopolymeric nucleic acids.

3) In US Pat. No. 9,758,812, Fang and Missel describe the use of anionic polymers to improve the sensitivity of coupled one-step RT-PCR.

[0014] Although the methods described by Gong and Wang, Missel et al., and Fang and Missel, respectively, have each shown a significant reduction of the inhibitory effect of reverse transcriptase, there remains a need in the art for further improved specificity and sensitivity of RT-PCR through a more effective reduction of the inhibitory effect of reverse transcriptase.

[0015] The lower temperature reaction conditions required for optimal retroviral RT activity (Yasukawa et al., 2008. J. Biochem. 143:261) are another factor that can limit the efficiency of reverse transcription and efficacy of one-step RT-PCR in detecting certain sequences. This is especially true if the lower temperatures promote formation of unfavorable secondary structures such as hairpins, stem loops, and G quadruplexes that block primer binding and impede nascent strand synthesis on the RNA template (Malboeuf et al. 2001. BioTechniques 30:1074). For highly structured RNA targets, especially common in viral genomes, it would be advantageous to perform cDNA synthesis at higher temperatures so that RNA secondary structures are destabilized and non-specific primer binding is minimized. Additionally, highly thermostable reverse transcriptases would enable compatibility with monoclonal antibody (US Pat. No. 5,338,671) or chemical hot-start methods (US Pat. No. 5,773,258), such as those used for PCR amplification polymerases such as Taq DNA polymerase, to further improve the specificity and efficiency of one-step RT-PCR. Lastly, highly thermostable reverse transcriptases would enable integration of uracil DNA glycosylase-mediated amplicon carry-over decontamination methods (US Pat. No. 5,683,896) in one-step RT-PCR without the requirement for psychrophilic, heat-labile, uracil DNA glycosylases.

[0016] Because of the importance of RT-PCR applications, there is a need in the art for novel reverse transcriptases with high thermal stability and intrinsic inhibitor resistance that overcome the known drawbacks associated with a one-step RT-PCR system, in the form of a generalized ready-to-use composition that exhibits high specificity and sensitivity, requires a small amount of initial sample, reduces the amount of practitioner manipulation, minimizes the risks of contamination, minimizes the expense of reagents, and maximizes the amount of nucleic acid end product.

SUMMARY OF THE INVENTION

[0017] The present invention solves the aforementioned problem by providing for a polymerase comprising,

a. an N-terminal 5’-3’ nuclease domain,
   i. stemming from Taq polymerase or,
   ii. a polymerase sharing at least 95% amino acid sequence identity with the N-terminal 5’-3’ nuclease domain of Taq polymerase,

b. an adjacent and linked polymerase domain, stemming from a viral family A polymerase, wherein the polymerase domain stems preferably from,

   1. JGI20132J14458_100001622 (1607 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q627N, H751Q, Q752K, and V753K, or

   2. Ga0186926_122605 (1595 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q627N, H752Q, Q753K, and V754K, or

   3. Ga0080008_15802729 (1619 amino acids) or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q628N, H752Q, Q753K, and L754K, or

   4. Ga0079997_11796739 (1608 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto and is altered to comprise the following amino acid changes, Q627N, H752Q, Q753K, and I754K.
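The amino acid changes listed above use the conventional substitution notation, in which a code such as Q627N denotes the residue Q at position 627 replaced by N. A minimal Python sketch of how such codes could be parsed and applied to a sequence follows. The function names and the five-residue toy sequence are hypothetical; the actual reference sequences and their numbering are not reproduced here, so the example uses small positions purely for illustration.

```python
import re


def parse_substitution(code):
    """Split a code such as 'Q627N' into (wild_type, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitutions(sequence, codes):
    """Apply substitutions (1-based positions) to a one-letter protein sequence.

    Each code is checked against the ORIGINAL sequence, so a mismatch between
    the expected wild-type residue and the sequence raises an error instead of
    silently mutating the wrong position.
    """
    residues = list(sequence)
    for code in codes:
        wild_type, position, replacement = parse_substitution(code)
        index = position - 1
        if not 0 <= index < len(sequence):
            raise ValueError(f"{code}: position outside sequence")
        if sequence[index] != wild_type:
            raise ValueError(
                f"{code}: expected {wild_type}, found {sequence[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT one of the claimed polymerases.
    toy = "MQHQV"
    print(apply_substitutions(toy, ["Q2N", "H3Q", "Q4K", "V5K"]))  # MNQKK
```

The wild-type check matters here because the four homologs are numbered differently (for example H751Q in one and H752Q in the others), so a code written for one reference sequence would be rejected when applied to another.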

[0018] The term “functional fragment” refers to the minimum amino acid region and corresponding DNA coding sequence from the herein designated metagenomic viral polyproteins that when expressed in a suitable host in the context of suitable regulatory elements either singularly or with ancillary sequence elements, has detectable RNA-directed DNA polymerase activity.

[0019] Herein, the N-terminal 5’-3’ nuclease domain acts also as a processivity enhancing fusion tag for the present inventive construct. It is defined as (i) stemming from Taq polymerase or (ii) a polymerase sharing at least 95% amino acid sequence identity with the N-terminal 5’-3’ nuclease domain of Taq polymerase. As such it is not essential that this polypeptide acts as a nuclease within the inventive construct. Within the present inventive construct the inventors observe that the claimed domain acts similarly to Taq DNA polymerase, where additional interactions between the nuclease domain and the DNA template increase template affinity and improve processivity compared with the N-terminal nuclease deletion (Wang et al., 2004. Nucleic Acids Res. 32:1197; Merkens et al., 1995. Biochim. Biophys. Acta. 1264:243; Murali et al., 1998. Proc. Natl. Acad. Sci. U.S.A. 95:12562).

[0020] In an alternative embodiment the N-terminal 5’-3’ nuclease domain is RNase H-like, or from the RNase H superfamily, and stems preferably from an N-terminal 5’-3’ nuclease domain, i. stemming from Taq polymerase or, ii. a polymerase sharing at least 95% amino acid sequence identity with the N-terminal 5’-3’ nuclease domain of Taq polymerase.

[0021] In particular the new enzyme shows:
a. increased thermostability;
b. increased thermoreactivity;
c. increased resistance to reverse transcriptase inhibitors;
d. increased ability to reverse transcribe difficult templates;
e. increased speed;
f. increased processivity;
g. increased specificity; or
h. increased sensitivity.

[0022] Similar or equivalent sites of corresponding amino acid positions in reverse transcriptases from other species can be mutated to produce thermostable and/or thermoreactive reverse transcriptases as disclosed herein. For example, in some embodiments the present invention provides reverse transcriptases having at least 50% (e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, etc.) amino acid sequence identity to those SEQ IDs claimed herein.

[0023] The present invention is also directed to DNA molecules (preferably vectors) containing a gene or nucleic acid molecule encoding the mutant reverse transcriptases of the present invention and to host cells containing such DNA molecules. Any number of hosts may be used to express the gene or nucleic acid molecule of interest, including prokaryotic and eukaryotic cells. Preferably, prokaryotic cells are used to express the polymerases of the invention. The preferred prokaryotic host according to the present invention is E. coli.
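Paragraph [0022] relies on two operations: finding the corresponding position in a related sequence and computing percent amino acid sequence identity. The Python sketch below illustrates both on a pre-computed pairwise alignment with "-" as the gap character. It is only one possible reading: the text does not state which identity convention or alignment tool is intended, so the choice to count only gap-free columns in the denominator is an assumption, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither row has a gap.

    Assumption: gap-free columns form the denominator. Other conventions
    (full alignment length, shorter sequence length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def map_position(aligned_ref, aligned_other, ref_position):
    """Map a 1-based residue position in the reference to the other sequence.

    Returns None when the reference residue is aligned to a gap.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    other_count = 0
    for r, o in zip(aligned_ref, aligned_other):
        if r != "-":
            ref_count += 1
        if o != "-":
            other_count += 1
        if r != "-" and ref_count == ref_position:
            return other_count if o != "-" else None
    raise ValueError("ref_position is beyond the end of the reference")


if __name__ == "__main__":
    # Hypothetical toy alignment; the second sequence has one extra residue.
    ref = "MK-QHQV"
    other = "MKAQHQL"
    print(f"{percent_identity(ref, other):.1f}")  # 83.3
    print(map_position(ref, other, 3))  # 4
```

In the toy alignment, a single inserted residue shifts the equivalent of reference position 3 to position 4, which is analogous to the one-residue numbering offsets seen between the homologs (Q627N versus Q628N, H751Q versus H752Q).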

[0024] The invention also provides compositions and reaction mixtures for use in reverse transcription of nucleic acid molecules, comprising one or more mutant or modified reverse transcriptase enzymes or polypeptides as disclosed herein. Such compositions may further comprise one or more nucleotides, a suitable buffer, and/or one or more DNA polymerases. The compositions of the invention may also comprise one or more oligonucleotide primers or terminating agents (e.g., dideoxynucleotides). Such compositions may also comprise a stabilizing agent, such as glycerol or a surfactant. Such compositions may further comprise the use of hot start mechanisms to prevent or reduce unwanted polymerization products during nucleic acid synthesis.

[0025] The invention provides in certain embodiments, compositions that include one or more reverse transcriptases of the invention and one or more DNA polymerases for use in amplification reactions. Such compositions may further comprise one or more nucleotides and/or a buffer suitable for amplification. The compositions of the invention may also comprise one or more oligonucleotide primers. Such compositions may also comprise a stabilizing agent, such as glycerol or a surfactant. Such compositions may further comprise the use of one or more hot start mechanisms to prevent or reduce unwanted polymerization products during nucleic acid synthesis.

[0026] The invention also relates to certain polymerase domains and their uses:

OS-1622 (576 amino acids) SEQ ID NO. 24 is derived from Locus tag JGI20132J14458_100001622

OP-2605 (577 amino acids) SEQ ID NO. 25 is derived from Locus tag Ga0186926_122605

CS-2729 (577 amino acids) SEQ ID NO. 26 is derived from Locus tag Ga0080008_15802729

PS-6739 (577 amino acids) SEQ ID NO. 27 is derived from Locus tag Ga0079997_11796739

[0027] The invention further provides methods for synthesis of nucleic acid molecules using one or more mutant reverse transcriptase enzymes or polypeptides as disclosed herein. In particular, the invention is directed to methods for making one or more nucleic acid molecules, comprising mixing one or more nucleic acid templates (preferably one or more RNA templates and most preferably one or more messenger RNA templates) with one or more reverse transcriptases of the invention and incubating the mixture under conditions sufficient to make a first nucleic acid molecule or molecules complementary to all or a portion of the one or more nucleic acid templates. In some embodiments, the first nucleic acid molecule is a single-stranded cDNA. Nucleic acid templates suitable for reverse transcription according to this aspect of the invention include any nucleic acid molecule or population of nucleic acid molecules (preferably RNA and most preferably mRNA), particularly those derived from a cell or tissue. In some embodiments, cellular sources of nucleic acid templates include, but are not limited to, bacterial cells, fungal cells, plant cells and animal cells.

[0028] In certain embodiments, the invention provides methods for making one or more double-stranded nucleic acid molecules. Such methods comprise (a) mixing one or more nucleic acid templates (preferably RNA or mRNA, and more preferably a population of mRNA templates) with one or more reverse transcriptases of the invention; (b) incubating the mixture under conditions sufficient to make a first nucleic acid molecule or molecules complementary to all or a portion of the one or more templates; and (c) incubating the first nucleic acid molecule or molecules under conditions sufficient to make a second nucleic acid molecule or molecules complementary to all or a portion of the first nucleic acid molecule or molecules, thereby forming one or more double-stranded nucleic acid molecules comprising the first and second nucleic acid molecules. Such methods may include the use of one or more DNA polymerases as part of the process of making the one or more double- stranded nucleic acid molecules. The invention also concerns compositions useful for making such double-stranded nucleic acid molecules. Such compositions comprise one or more reverse transcriptases of the invention and optionally one or more DNA polymerases, a suitable buffer, one or more primers, and/or one or more nucleotides.

[0029] The invention also provides methods for amplifying a nucleic acid molecule. Such amplification methods comprise mixing the double-stranded nucleic acid molecule or molecules produced as described above with one or more DNA polymerases and incubating the mixture under conditions sufficient to amplify the double-stranded nucleic acid molecule. In a first preferred embodiment, the invention concerns a method for amplifying a nucleic acid molecule, the method comprising (a) mixing one or more nucleic acid templates (preferably one or more RNA or mRNA templates and more preferably a population of mRNA templates) with one or more reverse transcriptases of the invention and with one or more DNA polymerases and (b) incubating the mixture under conditions sufficient to amplify nucleic acid molecules complementary to all or a portion of the one or more templates.

[0030] The invention is also directed to methods for reverse transcription of one or more nucleic acid molecules comprising mixing one or more nucleic acid templates, which are preferably RNA or messenger RNA (mRNA) and more preferably a population of mRNA molecules, with one or more reverse transcriptases of the present invention and incubating the mixture under conditions sufficient to make a nucleic acid molecule or molecules complementary to all or a portion of the one or more templates. To make the nucleic acid molecule or molecules complementary to the one or more templates, a primer (e.g., an oligo(dT) primer) and one or more nucleotides are preferably used for nucleic acid synthesis in the 5' to 3' direction. Nucleic acid molecules suitable for reverse transcription according to this aspect of the invention include any nucleic acid molecule, particularly those derived from a prokaryotic or eukaryotic cell. Such cells may include normal cells, diseased cells, transformed cells, established cells, progenitor cells, precursor cells, fetal cells, embryonic cells, bacterial cells, yeast cells, animal cells (including human cells), avian cells, plant cells and the like, or tissue isolated from a plant or an animal (e.g., human, cow, pig, mouse, sheep, horse, monkey, canine, feline, rat, rabbit, bird, fish, insect, etc.). Nucleic acid molecules suitable for reverse transcription may also be isolated and/or obtained from viruses and/or virally infected cells.

[0031] The invention further provides methods for amplifying or sequencing a nucleic acid molecule comprising contacting the nucleic acid molecule with a reverse transcriptase of the present invention. In some embodiments, such methods comprise one or more polymerase chain reactions (PCRs). In some embodiments, a reverse transcription reaction is coupled to a PCR, such as in RT-PCR.

[0032] The present invention also provides kits for reverse transcription comprising the reverse transcriptase of the present invention in a packaged format. The kit for reverse transcription of the present invention can include, for example, the reverse transcriptase, any conventional constituent necessary for reverse transcription such as a nucleotide primer, at least one dNTP, and a reaction buffer, and optionally a DNA polymerase.

[0033] The invention is also directed to kits for use in the methods of the invention. Such kits can be used for making, sequencing or amplifying nucleic acid molecules (single- or double-stranded). The kits of the invention comprise a carrier, such as a box or carton, having in close confinement therein one or more containers, such as vials, tubes, bottles and the like. In certain embodiments of the kits of the invention, a first container contains one or more of the reverse transcriptase enzymes of the present invention. The kits of the invention may also comprise, in the same or different containers, one or more DNA polymerases (preferably thermostable DNA polymerases), one or more suitable buffers for nucleic acid synthesis and one or more nucleotides. Alternatively, the components of the kit may be divided into separate containers (e.g., one container for each enzyme and/or component). The kits of the invention also may comprise instructions or protocols for carrying out the methods of the invention. In preferred kits of the invention, the reverse transcriptases are mutated such that the temperature at which cDNA synthesis occurs is increased. In additional preferred kits of the invention, the enzymes (reverse transcriptases and/or DNA polymerases) in the containers are present at working concentrations.

[0034] The present invention also solves the problem by providing for a method for amplifying template nucleic acids comprising contacting the template nucleic acids with a polymerase according to the invention, preferably wherein the method is RT-PCR. That means the polymerases of the invention all have reverse transcriptase activity, as described in US5322770.

[0035] The term "reverse transcriptase" describes a class of polymerases characterized as RNA dependent DNA polymerases. All known reverse transcriptase enzymes require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA which can then be cloned into a vector for further manipulation.

[0036] The present invention also solves the problem by providing for a kit comprising a polymerase according to the invention, a vector encoding a polymerase according to the invention, or a transformed host cell comprising the vector according to the invention.

[0037] The problem is solved with a viral family A polymerase, or a portion thereof comprising one of the following mutations, selected from the group of:

a. Q627N or Q628N

b. H751Q or H752Q

c. Q752K or Q753K

d. V753K or V754K or L754K or I754K

or mutations in similar residues from locally aligned family A polymerases per the amino acid numbering of the Taq nuclease domain-linked polymerases as outlined above.

[0038] As used herein, the term “comprising” is to be construed as encompassing both “including” and “consisting of”, both meanings being specifically intended, and hence individually disclosed embodiments in accordance with the present invention.

[0039] Herein, and throughout the specification mutations within the amino acid sequence of a polymerase are written in the following form: (i) single letter amino acid as found in wild type polymerase, (ii) position of the change in the amino acid sequence of the polymerase and (iii) single letter amino acid as found in the altered polymerase. So, mutation of a Tyrosine residue in the wild type polymerase to a Valine residue in the altered polymerase at position 409 of the amino acid sequence would be written as Y409V. This is standard procedure in molecular biology.
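The notation described above can be illustrated with a minimal Python sketch. The helper names (`Mutation`, `parse_mutation`, `apply_mutation`) and the short toy sequence are hypothetical and are not part of the specification; the sketch only assumes the one-letter codes of the twenty standard amino acids and 1-based residue numbering.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, 20 standard residues


class Mutation(NamedTuple):
    wild_type: str  # (i) residue found in the wild type polymerase
    position: int   # (ii) 1-based position in the amino acid sequence
    altered: str    # (iii) residue found in the altered polymerase


_PATTERN = re.compile(rf"([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])")


def parse_mutation(text: str) -> Mutation:
    """Parse a string such as 'Y409V' into its three components."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid substitution: {text!r}")
    position = int(match.group(2))
    if position < 1:
        raise ValueError("positions are 1-based")
    return Mutation(match.group(1), position, match.group(3))


def apply_mutation(sequence: str, mutation: Mutation) -> str:
    """Return a copy of `sequence` carrying the substitution.

    Raises ValueError if the residue at the stated position does not
    match the wild type residue named in the mutation.
    """
    index = mutation.position - 1
    if index >= len(sequence) or sequence[index] != mutation.wild_type:
        raise ValueError(f"sequence does not have {mutation.wild_type} "
                         f"at position {mutation.position}")
    return sequence[:index] + mutation.altered + sequence[index + 1:]
```

For example, `parse_mutation("Y409V")` yields wild type residue Y, position 409 and altered residue V, matching the example given in this paragraph.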

[0040] The invention provides simplified and improved methods for the detection of RNA target molecules in a sample. These methods employ thermostable polymerases to catalyze reverse transcription, second strand cDNA synthesis, and, if desired, amplification by PCR. The methods of the present invention provide RNA reverse transcription and amplification with enhanced specificity and at higher temperatures than previous RNA cloning and diagnostic methods. These methods are adaptable for use in kits for laboratory or clinical analysis.

BRIEF DESCRIPTIONS OF DRAWINGS

Figure 1

[0041] Representation of the domain organization of full metagenomic viral gene products containing regions of family A polymerase homology. Core viral polymerase domains were isolated, then fused with the Taq polymerase 5'-3' nuclease domain at the N-terminus via a flexible linker. Polymerases were further engineered by altering a set of four amino acids for improvements in reverse transcription performance.

Figure 2

[0042] Fig. 2 illustrates the efficient reverse transcriptase activity of the engineered viral family A DNA polymerase in lysate-based RT-qPCR reactions using MS2 RNA template and 70 °C reaction temperature compared with the engineered, gene-shuffled M503 polymerase.

Figure 3

[0043] Fig. 3 illustrates reverse transcriptase efficiency of OP-2605 mutant library variants after heating at 80 °C for 5 minutes in lysate-based RT-qPCR reactions using MS2 RNA template. The differences in Cq value are reported relative to the parental OP-2605 polymerase, in which the absolute Cq value was 20.1. Library variants O15, O57, or O58 each generated lower Cq values for detection of MS2 RNA than the parental OP-2605 polymerase, indicative of improved sensitivity and corresponding efficiency of RNA conversion to 1st strand product.
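As an illustration of how a Cq difference relative to the parental polymerase may be read, the following Python sketch converts a ΔCq value into an approximate fold difference in detected template. It assumes ideal exponential amplification (a doubling per cycle, i.e. an efficiency of 1.0), which is a common simplification and is not stated in this specification; the function name and the example ΔCq value are hypothetical.

```python
def fold_change_from_delta_cq(delta_cq: float, efficiency: float = 1.0) -> float:
    """Approximate fold difference implied by a Cq difference.

    delta_cq   -- Cq(variant) minus Cq(parental); negative values mean
                  the variant was detected earlier than the parent.
    efficiency -- assumed per-cycle amplification efficiency
                  (1.0 corresponds to perfect doubling; an assumption).
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return (1.0 + efficiency) ** (-delta_cq)


# A hypothetical variant detected one cycle earlier than the parent
# (delta Cq = -1.0) corresponds to roughly a two-fold difference
# under the doubling assumption.
example_fold = fold_change_from_delta_cq(-1.0)
```

Under this assumption a lower Cq than the parental value of 20.1 corresponds to a fold difference greater than one; real reactions rarely reach perfect efficiency, so the result should be treated as an estimate.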

Figure 4

[0044] Figure 4 illustrates the thermal activity profile of the engineered viral variants as measured by the relative nucleotide polymerization rates.

Figure 5

[0045] Figure 5 illustrates the sensitivity and efficiency of detection of viral RNA by the engineered viral polymerase variants in probe-based, one-step RT-qPCR reactions.

Figure 6

[0046] Figure 6 illustrates the heparin resistance of the engineered viral polymerase variants compared with the engineered, gene-shuffled M503 polymerase in probe-based, one-step RT-qPCR reactions.

DETAILED DESCRIPTION OF THE INVENTION

[0047] The invention relates to numerous new polymerases, for use in reverse transcription, PCR, sequencing and RT-PCR.

[0048] The term “PCR” refers to polymerase chain reaction, which is a standard method in molecular biology for DNA amplification.

[0049] “RT-PCR” relates to reverse transcription polymerase chain reaction, a variant of PCR commonly used for the detection and quantification of RNA. RT-PCR comprises two steps, synthesis of complementary DNA (cDNA) from RNA by reverse transcription and amplification of the generated cDNA by PCR. Variants of RT-PCR include quantitative RT-PCR (RT-qPCR), real-time RT-PCR, digital RT-PCR (dRT-PCR) or digital droplet RT-PCR (ddRT-PCR).

[0050] “Methods of amplifying RNA without high temperature thermal cycling” as referred to herein, may be isothermal nucleic acid amplification technologies, such as loop-mediated amplification (LAMP), helicase dependent amplification (HDA) and recombinase polymerase amplification (RPA).

[0051] As used herein the term "cDNA" refers to a complementary DNA molecule synthesized using a ribonucleic acid strand (RNA) as a template. The RNA may be mRNA, tRNA, rRNA, or another form of RNA, such as viral RNA. The cDNA may be single-stranded, double-stranded or may be hydrogen-bonded to a complementary RNA molecule as in an RNA/cDNA hybrid. Such a hybrid molecule would result from, for example, reverse transcription of an RNA template using a DNA polymerase.

[0052] The present invention solves the aforementioned problem by providing for a polymerase comprising,

a. an N-terminal 5’-3’ nuclease domain,

i. stemming from Taq polymerase or,

ii. a polymerase sharing at least 95% amino acid sequence identity with the N-terminal 5’-3’ exonuclease domain of Taq polymerase,

b. an adjacent and linked polymerase domain, stemming from a viral family A polymerase, wherein the polymerase domain stems preferably from,

1. JGI20132J14458_100001622 (1607 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q627N, H751Q, Q752K, and V753K, or

2. Ga0186926_122605 (1595 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q627N, H752Q, Q753K, and V754K, or

3. Ga0080008_15802729 (1619 amino acids) or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q628N, H752Q, Q753K, and L754K, or

4. Ga0079997_11796739 (1608 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto and is altered to comprise the following amino acid changes, Q627N, H752Q, Q753K, and I754K.

[0053] The 5’-3’ nuclease domain may be from Taq.

[0054] Taq is commercially available as a recombinant product or purified as native Taq from Thermus aquaticus (Perkin Elmer-Cetus). Recombinant Taq is designated as rTaq and native Taq is designated as nTaq. Native Taq is purified from T. aquaticus.

[0055] The 5’-3’ nuclease domain may also be from Tth purified from T. thermophilus or recombinant Tth.

[0056] Other thermostable polymerases that have been reported in the literature will also find use in the practice of the methods for making the 5’-3’ nuclease domain. Examples of these include polymerases extracted from the thermophilic bacteria Bacillus stearothermophilus, Thermus aquaticus, T. flavus, T. lacteus, T. rubens, T. ruber, and T. thermophilus.

[0057] Such polymerases are useful in PCR but also in RT-PCR. The present invention for the first time discloses a highly useful polymerase that can reverse transcribe RNA into DNA and react efficiently at high temperatures.

[0058] The activity of the polymerases of the invention does not require the presence of manganese, so that the polymerases of the invention may be used in conventional magnesium-containing buffers. This compatibility with magnesium provides practical advantages in simplicity of reaction formulation and accuracy of synthesis, as is known in the art.

[0059] Preferably, in the polymerase according to the invention there is a peptide linker between the exonuclease domain and the polymerase domain and, optionally said peptide linker has the amino acid sequence according to SEQ ID NO. 19 (GGGGSGGGGS). In general, suitable linkers may be amino acid linkers comprising 5-15 amino acids, more preferably 7-12 amino acids, most preferably 9-11 amino acids. Alternatively, suitable linkers may be non-amino acid linkers.
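The domain arrangement and linker length preferences described above can be sketched in Python as follows. The linker sequence GGGGSGGGGS is taken from this paragraph (SEQ ID NO. 19); the function names and the placeholder domain strings are hypothetical and stand in for real domain sequences, which are not reproduced here.

```python
LINKER_SEQ_ID_19 = "GGGGSGGGGS"  # peptide linker named in the text


def linker_length_class(linker: str) -> str:
    """Classify a peptide linker by the length ranges given in the text."""
    n = len(linker)
    if 9 <= n <= 11:
        return "most preferred (9-11 amino acids)"
    if 7 <= n <= 12:
        return "more preferred (7-12 amino acids)"
    if 5 <= n <= 15:
        return "suitable (5-15 amino acids)"
    return "outside the stated ranges"


def build_fusion(nuclease_domain: str,
                 polymerase_domain: str,
                 linker: str = LINKER_SEQ_ID_19) -> str:
    """Join an N-terminal nuclease domain, a linker and a polymerase domain.

    The order (nuclease domain, linker, polymerase domain) follows the
    arrangement described in the specification; the inputs are plain
    one-letter amino acid strings.
    """
    return nuclease_domain + linker + polymerase_domain
```

With the ten-residue linker of SEQ ID NO. 19, `linker_length_class(LINKER_SEQ_ID_19)` falls in the most preferred range.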

[0060] Preferably, the polymerase domain is derived from a thermophilic viral family A polymerase. Other suitable polymerases include bacterial family A and non-thermophilic viral family A polymerases.

[0061] Preferably the exodomain of such a polymerase domain is inactivated. The 3’-5’ exonuclease (proofreading) activity was inactivated with an E to A mutation at residue 40 or 41 of the truncated enzyme. These would preferably be OS-1622 (577 amino acids), OP-2605 (578 amino acids), CS-2729 (578 amino acids) and PS-6739 (578 amino acids).

[0062] In some embodiments, the mutant enzymes claimed herein demonstrate increased reverse transcriptase activity that is at least 10% (e.g., 10%, 25%, 50%, 75%, 80%, 90%, 100%, 200%, etc.) more than wild type reverse transcriptase activity. In some embodiments, the mutant enzymes possess reverse transcriptase activity after 5 minutes at 60 °C that is at least 25% (e.g., 50%, 100%, 200%, etc.) of the reverse transcriptase activity of wild type reverse transcriptase after 5 minutes at 37 °C. In some embodiments, the mutant reverse transcriptases demonstrate one or more of the following properties: increased thermostability; increased thermoreactivity; increased resistance to reverse transcriptase inhibitors; increased ability to reverse transcribe difficult templates; increased speed/processivity; and increased specificity (e.g., decreased primer-less reverse transcription).

[0063] A native proofreading activity is inherent to the parent molecules used to derive the enzymes of this invention. To limit complications from this secondary activity such as degradation of primers, this proofreading exonuclease activity was disabled by mutagenesis in versions of the enzyme of this invention that are intended for analytic uses. Since this activity is beneficial in preparative use, this proofreading activity could be reconstituted by reversion of the proofreading exonuclease domain to the wild-type sequence, allowing the polymerase to excise mismatched bases and then insert the correctly matched base. A proofreading function coupled to high efficiency reverse transcription and inhibitor tolerance would enable high fidelity cDNA synthesis for improvements in applications such as RNA-seq and high accuracy RT-PCR.

[0064] Preferably, the polymerase domain is codon optimized for expression in E. coli. The purpose is to:

• Rebalance codon usage

• Decrease sequence complexity

• Avoid rare codons
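As a rough illustration of the "avoid rare codons" goal listed above, the following Python sketch scans a coding sequence for codons from a small set that is commonly cited as rare in E. coli. The set shown is an illustrative assumption and is not taken from this specification; the function name and the toy sequence are hypothetical, and real codon optimization would rely on a full codon usage table.

```python
# Illustrative assumption: codons often cited as rarely used in E. coli.
ASSUMED_RARE_CODONS = frozenset({"AGG", "AGA", "ATA", "CTA", "CGA", "CCC"})


def find_rare_codons(cds: str, rare=ASSUMED_RARE_CODONS):
    """Return (codon_number, codon) pairs for rare codons in a coding sequence.

    `cds` is read in frame from its first base; codon numbers are 1-based.
    """
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        if codon in rare:
            hits.append((start // 3 + 1, codon))
    return hits


# Toy sequence: ATG AGG CTG ATA, where codons 2 and 4 are in the assumed set.
toy_hits = find_rare_codons("ATGAGGCTGATA")
```

Each reported position could then be replaced with a synonymous codon that is used more frequently in the expression host.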

[0065] Most preferably, the polymerase is selected from the group of,

a. a polymerase (O15) as encoded by a nucleic acid according to SEQ ID NO. 9 or a nucleic acid that is at least 98% identical thereto,

b. a polymerase (O15) with the amino acid sequence according to SEQ ID NO: 10 or a polymerase that is at least 90% identical thereto,

c. a polymerase (O57) as encoded by a nucleic acid according to SEQ ID NO. 11 or a nucleic acid that is at least 98% identical thereto,

d. a polymerase (O57) with the amino acid sequence according to SEQ ID NO: 12 or a polymerase that is at least 90% identical thereto,

e. a polymerase (O58) as encoded by a nucleic acid according to SEQ ID NO. 13 or a nucleic acid that is at least 98% identical thereto, and

f. a polymerase (O58) with the amino acid sequence according to SEQ ID NO: 14 or a polymerase that is at least 90% identical thereto.

[0066] The invention also relates to certain polymerase domains and their uses:

OS-1622 (576 amino acids) SEQ ID NO. 24 is derived from Locus tag JGI20132J14458_100001622

OP-2605 (577 amino acids) SEQ ID NO. 25 is derived from Locus tag Ga0186926_122605

CS-2729 (577 amino acids) SEQ ID NO. 26 is derived from Locus tag Ga0080008_15802729

PS-6739 (577 amino acids) SEQ ID NO. 27 is derived from Locus tag Ga0079997_11796739

[0067] The invention relates therefore to a polymerase domain selected from the group of:

(a) OS-1622 (576 amino acids) SEQ ID NO. 24 is derived from Locus tag JGI20132J14458_100001622,

(b) OP-2605 (577 amino acids) SEQ ID NO. 25 is derived from Locus tag Ga0186926_122605,

(c) CS-2729 (577 amino acids) SEQ ID NO. 26 is derived from Locus tag Ga0080008_15802729, or

(d) PS-6739 (577 amino acids) SEQ ID NO. 27 is derived from Locus tag Ga0079997_11796739, or any polypeptide or functional fragment that shares more than 80%, 85%, 90%, 95% or 99% sequence identity with one of the above.

[0068] The invention relates to the use of such a polymerase domain for constructing a chimeric enzyme, preferably an enzyme with polymerase activity, more preferably with reverse transcriptase activity.

[0069] The invention relates to the use of one of the following metagenomic amino acid sequences for isolating a polymerase domain:

Locus tag JGI20132J14458_100001622 (1607 amino acids) SEQ ID NO. 20

Locus tag Ga0186926_122605 (1595 amino acids) SEQ ID NO. 21

Locus tag Ga0080008_15802729 (1619 amino acids) SEQ ID NO. 22

Locus tag Ga0079997_11796739 (1608 amino acids) SEQ ID NO. 23

[0070] Preferably, the invention relates also to the use of the regions (SEQ ID NOs. 20 to 23) and those that are 80%, 85%, 90% or more than 95% similar to these regions, for isolating a polymerase domain.

[0071] Thus, the present invention also provides for a polymerase comprising,

a. a polymerase domain, or functional fragment thereof with reverse transcriptase activity, stemming from a viral family A polymerase, wherein the polymerase domain stems preferably from,

1. OS-1622 (SEQ ID NO. 24), defined herein as a 576 amino acid region from amino acid positions 1032 to 1607 of the polyprotein reported in the Integrated Microbial Genomes & Microbiomes database (IMG/M: https://img.jgi.doe.gov/m) as Locus ID: JGI20132J14458_100001622, or a functional fragment that shares at least 95% amino acid sequence identity thereto, or

2. OP-2605 (SEQ ID NO. 25) defined herein as a 577 amino acid region from amino acid positions 1019 to 1595 of the polyprotein reported in the IMG/M database as Locus ID: Ga0186926_122605, or a functional fragment that shares at least 95% amino acid sequence identity thereto, or

3. CS-2729 (SEQ ID NO. 26) defined herein as a 577 amino acid region from amino acid positions 1043 to 1619 of the polyprotein reported in the IMG/M database as Locus ID: Ga0080008_15802729, or a functional fragment that shares at least 95% amino acid sequence identity thereto, or

4. PS-6739 (SEQ ID NO. 27), defined herein as a 577 amino acid region from amino acid positions 1032 to 1608 of the polyprotein reported in the IMG/M database as Locus ID: Ga0079997_11796739, or a functional fragment that shares at least 95% amino acid sequence identity thereto.

b. an adjacent and linked domain from the RNase H-like, or RNase H superfamily that stems preferably from an N-terminal 5’-3’ nuclease domain,

i. stemming from Taq polymerase or,

ii. a polymerase sharing at least 95% amino acid sequence identity with the N-terminal 5’-3’ nuclease domain of Taq polymerase,

c. amino acid alterations that comprise the following amino acid changes:

1. OS-1622 Taq nuclease domain fusion (with mutations) (SEQ ID NO. 5) Q627N, H751Q, Q752K, and V753K

2. OP-2605 Taq nuclease domain fusion (with mutations) (SEQ ID NO. 6) Q627N, H752Q, Q753K, and V754K

3. CS-2729 Taq nuclease domain fusion (with mutations) (SEQ ID NO. 7) Q628N, H752Q, Q753K, and L754K

4. PS-6739 Taq nuclease domain fusion (with mutations) (SEQ ID NO. 8) Q627N, H752Q, Q753K, and I754K.
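The region coordinates given in paragraph [0071] can be checked against the stated domain lengths with a short Python sketch. The coordinates and lengths are taken from the text; the dictionary layout and function names are hypothetical, and 1-based inclusive positions are assumed.

```python
# (start, end, stated_length) for each polymerase domain, as given in the text.
DOMAIN_REGIONS = {
    "OS-1622": (1032, 1607, 576),
    "OP-2605": (1019, 1595, 577),
    "CS-2729": (1043, 1619, 577),
    "PS-6739": (1032, 1608, 577),
}


def region_length(start: int, end: int) -> int:
    """Length of a region given 1-based inclusive coordinates."""
    return end - start + 1


def extract_region(polyprotein: str, start: int, end: int) -> str:
    """Slice a polyprotein sequence using 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(polyprotein)):
        raise ValueError("coordinates fall outside the sequence")
    return polyprotein[start - 1:end]


# Every stated length agrees with its coordinates under this convention.
consistent = all(region_length(start, end) == stated
                 for start, end, stated in DOMAIN_REGIONS.values())
```

In each case the region ends at the last residue of the polyprotein (1607, 1595, 1619 and 1608 amino acids, respectively), so the polymerase domain corresponds to the C-terminal portion of the metagenomic gene product.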

[0072] The invention relates to a polymerase comprising,

a. the amino acid sequence of

i. SEQ ID NO. 15 (OS-1622-Taq-wt) comprising the following additional amino acid changes, Q627N, H751Q, Q752K, and V753K,

ii. or an amino acid sequence at least 90%, preferably at least 95%, more preferably at least 98% identical thereto,

b. the amino acid sequence of

i. SEQ ID NO. 16 (OP-2605-Taq-wt) comprising the following additional amino acid changes, Q627N, H752Q, Q753K, and V754K,

ii. or an amino acid sequence at least 90%, preferably at least 95%, more preferably at least 98% identical thereto,

c. the amino acid sequence of

i. SEQ ID NO. 17 (CS-2729-Taq-wt) comprising the following additional amino acid changes, Q628N, H752Q, Q753K, and L754K,

ii. or an amino acid sequence at least 90%, preferably at least 95%, more preferably at least 98% identical thereto, or

d. the amino acid sequence of

i. SEQ ID NO. 18 (PS-6739-Taq-wt) comprising the following additional amino acid changes, Q627N, H752Q, Q753K, and I754K,

ii. or an amino acid sequence at least 90%, preferably at least 95%, more preferably at least 98% identical thereto.

[0073] The invention also relates to a method for amplifying template nucleic acids comprising contacting the template nucleic acids with a polymerase according to the invention, preferably wherein the method is reverse transcription PCR (RT-PCR).

[0074] Template nucleic acids according to the present invention may be any type of nucleic acids, such as RNA, DNA, or RNA:DNA hybrids. Template nucleic acids may either be artificially produced (e.g. by molecular or enzymatic manipulations or by synthesis) or may be a naturally occurring DNA or RNA. In some preferred embodiments, the template nucleic acids are RNA sequences, such as transcription products, RNA viruses, or rRNA. Advantageously, the method of the invention also enables amplification and detection/quantification of template nucleic acids, such as specific RNA target sequences, out of a complex mixture of target and non-target background RNA. For instance, the method of the invention allows amplification of an mRNA transcript from total human RNA or amplification of rRNA directly from bacterial cell lysate. In some embodiments, the method referred to herein is RT-PCR. RT-PCR may be quantitative RT-PCR (RT-qPCR), real-time RT-PCR, digital RT-PCR (dRT-PCR) or digital droplet RT-PCR (ddRT-PCR). In other embodiments, the method referred to herein is a method of amplifying RNA without high temperature thermal cycling, such as loop-mediated isothermal amplification (LAMP), helicase dependent amplification (HDA) and recombinase polymerase amplification (RPA).

[0075] In some embodiments, the method of the invention further comprises detecting and/or quantifying the amplified nucleic acids. Quantification/detection of amplified nucleic acids may be performed, e.g., using non-sequence-specific fluorescent dyes (e.g., SYBR® Green, EvaGreen®) that intercalate into double-stranded DNA molecules in a sequence non-specific manner, or sequence-specific DNA probes (e.g., oligonucleotides labelled with fluorescent reporters) that permit detection only after hybridization with the DNA targets, synthesis-dependent hydrolysis or after incorporation into PCR products.

[0076] In other particularly preferred embodiments, the generation of cDNA in step a) and the amplification of the generated cDNA in step b) are performed at isothermal conditions. Suitable temperatures may, for instance, be between 30-96 °C, preferably 55-95 °C, more preferably 55-75 °C, most preferably 55-65 °C.

[0077] In some embodiments, in the method of the invention, a polypeptide of the invention is used in combination with Taq DNA polymerase. In other embodiments, human serum albumin is added during amplification, preferably at a concentration of 1 mg/ml.

[0078] Preferably, the method comprises: a) generating cDNA using a polypeptide according to any one of claims 1 to 6, and b) amplifying the generated cDNA using a polypeptide according to any one of claims 1 to 6.

[0079] In some embodiments additional enzymes may be present in the reaction. These may be other polymerases, kinases, ligases, glycosylases, single-stranded binding proteins, RNase inhibitors, uracil-DNA glycosylases or the like.

[0080] The invention also relates to a kit comprising a polymerase according to the invention. In some embodiments, the invention relates to kits for amplifying template nucleic acids, wherein the kit comprises a polypeptide of the invention and a buffer. Optionally, the kit additionally comprises a DNA polymerase, oligonucleotide primers, salt solutions, buffer, or other additives. Buffers comprised in the kit may be conventional buffers containing magnesium. Suitable buffer solutions do not need to contain manganese.

[0081] As used herein, mutants, variants and derivatives refer to all permutations of a chemical species, which may exist or be produced, that still retain the definitive chemical activity of that chemical species. Examples include, but are not limited to compounds that may be detectably labelled or otherwise modified, thus altering the compound's chemical or physical characteristics.

[0082] In a preferred embodiment, the nucleic acid polymerase may be a DNA polymerase. The DNA polymerase may be any polymerase capable of replicating a DNA molecule. Preferably, the DNA polymerase is a thermostable polymerase useful in PCR. More preferably, the DNA polymerase is Taq, Tbr, Tth, Tih, Tfi, Tfl, Pwo, Kod, VENT, DEEPVENT, Tma, Tne, Bst, Pho, Sac, Sso, Poc, Pab, ES4 or mutants, variants and derivatives thereof having DNA polymerase activity.

[0083] Oligonucleotide primers may be any oligonucleotide of two or more nucleotides in length. Primers may be random primers, homopolymers, or primers specific to a target RNA template, e.g. a sequence specific primer.

[0084] Additional compositional embodiments comprise an anionic polymer and other reaction mixture components such as one or more nucleotides or derivatives thereof. Preferably the nucleotide is a deoxynucleotide triphosphate, dNTP, e.g. dATP, dCTP, dGTP, dTTP, dITP, dUTP, α-thio-dNTP, biotin-dUTP, fluorescein-dUTP, digoxigenin-dUTP.

[0085] Buffering agents, salt solutions and other additives of the present invention comprise those solutions useful in RT-PCR. Preferred buffering agents include e.g. TRIS, TRICINE, BIS-TRICINE, HEPES, MOPS, TES, TAPS, PIPES, CAPS. Preferred salt solutions include e.g. potassium chloride, potassium acetate, potassium sulphate, ammonium sulphate, ammonium chloride, ammonium acetate, magnesium chloride, magnesium acetate, magnesium sulphate, manganese chloride, manganese acetate, manganese sulphate, sodium chloride, sodium acetate, lithium chloride, and lithium acetate. Preferred additives include e.g. DMSO, glycerol, formamide, betaine, tetramethyl ammonium chloride, PEG, Tween 20, NP 40, ectoine, polyols, E. coli SSB protein, Phage T4 gene 32 protein, and serum albumin. Additional compositional embodiments comprise other components that have been shown to reduce the inhibitory effect of reverse transcriptase on DNA polymerase, e.g. homopolymeric nucleic acids as described in EP 1050587 B1.

[0086] Further embodiments of this invention relate to methods for generating nucleic acids from an RNA template and further nucleic acid replication. The method comprises: a) adding an RNA template to a reaction mixture comprising at least one reverse transcriptase and/or mutants, variants and derivatives thereof and at least one nucleic acid polymerase, and/or mutants, variants and derivatives thereof, and an anionic polymer that is not a nucleic acid, and one or more oligonucleotide primers, and b) incubating the reaction mixture under conditions sufficient to allow polymerization of a nucleic acid molecule complementary to a portion of the RNA template. In a preferred embodiment the method includes replication of the DNA molecule complementary to at least a portion of the RNA template. More preferably the method of DNA replication is polymerase chain reaction (PCR). Most preferably the method comprises coupled reverse transcriptase-polymerase chain reaction (RT-PCR).

[0087] The invention also relates to a vector encoding a polymerase according to the invention.

[0088] Preferably the vector is in a transformed host cell.

[0089] In some embodiments the invention relates to a viral family A polymerase, or a portion thereof comprising one of the following mutations/alterations, i.e. is an altered enzyme, selected from the group of:

a. Q627N or Q628N

b. H751Q or H752Q

c. Q752K or Q753K

d. V753K or V754K or L754K or I754K

or mutations in similar residues from locally aligned family A polymerases per the amino acid numbering of the Taq nuclease domain-linked polymerases as outlined above.

[0090] Herein, "altered polymerase enzyme" means that the polymerase has at least one amino acid change compared to the control polymerase enzyme, for example the family A polymerase. In general, this change will comprise the substitution of at least one amino acid for another. In certain instances, these changes will be conservative changes, to maintain the overall charge distribution of the protein. However, the invention is not limited to only conservative substitutions. Non-conservative substitutions are also envisaged in the present invention. Moreover, it is within the contemplation of the present invention that the modification in the polymerase sequence may be a deletion or addition of one or more amino acids from or to the protein, provided that the polymerase has improved activity (over e.g. the wildtype) with respect to reverse transcriptase activity, thermostability or inhibitor resistance as compared to a control polymerase enzyme, such as the wild type.

[0091] The altered polymerase will generally and preferably be an "isolated" or "purified" polypeptide. By "isolated polypeptide" is meant a polypeptide that is essentially free from contaminating cellular components, such as carbohydrates, lipids, nucleic acids or other proteinaceous impurities which may be associated with the polypeptide in nature. One may use a His-tag for purification, but other means may also be used. Preferably, at least the altered polymerase may be a "recombinant" polypeptide.

[0092] In these embodiments the ideal reaction is only reverse transcription and/or RT-PCR.

Preferably it is reverse transcription.

[0093] The present invention solves the aforementioned problem by providing for a method of making a polymerase comprising,

i) isolating an N-terminal 5’-3’ nuclease domain, stemming from Taq polymerase or, a polymerase sharing at least 95% amino acid sequence identity with the N-terminal 5’-3’ nuclease domain of Taq polymerase,

ii) linking thereto a polymerase domain, stemming from a viral family A polymerase, wherein the polymerase domain stems preferably from,

1. JGI20132J14458_100001622 (1607 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q627N, H751Q, Q752K, and V753K, or

2. Ga0186926_122605 (1595 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q627N, H752Q, Q753K, and V754K, or

3. Ga0080008_15802729 (1619 amino acids) or a functional fragment that shares at least 98% amino acid sequence identity thereto, and is altered to comprise the following amino acid changes, Q628N, H752Q, Q753K, and L754K, or

4. Ga0079997_11796739 (1608 amino acids), or a functional fragment that shares at least 98% amino acid sequence identity thereto and is altered to comprise the following amino acid changes, Q627N, H752Q, Q753K, and I754K.

[0094] In one embodiment the polymerase consists of only the viral family A polymerase domain and the mutations mentioned above.

[0095] The invention relates to a method for amplifying a target RNA molecule suspected of being present in a sample, the method comprising the steps of:

(a) treating said sample with a first primer, which primer is sufficiently complementary to said target RNA to hybridize therewith, and a thermostable DNA polymerase according to the invention having the claimed reverse transcriptase activity in the presence of all four deoxyribonucleoside triphosphates, in an appropriate buffer and at a temperature sufficient for said first primer to hybridize with said target RNA and said thermostable DNA polymerase to catalyze the polymerization of said deoxyribonucleoside triphosphates to provide cDNA complementary to said target RNA;

(b) treating said cDNA formed in step (a) to provide single-stranded cDNA;

(c) treating said single-stranded cDNA formed in step (b) with a second primer, wherein said second primer can hybridize to said single-stranded cDNA molecule and initiate synthesis of an extension product in the presence of the same polymerase according to the invention or another thermostable polymerase under appropriate conditions to produce a double-stranded cDNA molecule; and

(d) amplifying the double-stranded cDNA molecule of step (c) by a polymerase chain reaction.

[0096] Ideally, said RNA target is diagnostic of a genetic or infectious disease.

[0097] The invention relates to a method for preparing duplex cDNA from an RNA template that comprises the steps of:

(a) treating said RNA template with a first primer, which primer is sufficiently complementary to said RNA template to hybridize therewith, and a thermostable DNA polymerase according to the invention having reverse transcriptase activity in the presence of all four deoxyribonucleoside triphosphates, in an appropriate buffer and at a temperature sufficient for said first primer to hybridize with said RNA template and said thermostable DNA polymerase to catalyze the polymerization of said deoxyribonucleoside triphosphates to provide cDNA complementary to said target RNA; optionally

(b) treating said cDNA formed in step (a) to provide single-stranded cDNA;

(c) treating said single-stranded cDNA formed in step (b) with a second primer, wherein said second primer can hybridize to said single-stranded cDNA molecule and initiate synthesis of an extension product in the presence of said same polymerase or another thermostable polymerase under appropriate conditions to produce a double-stranded cDNA molecule.

[0098] Preferably the 3'-5' proofreading exonuclease activity of the polymerase is inactivated. In many analytical applications the 3'-5' proofreading exonuclease activity of the polymerase is not critical; however, there are applications for which it can be advantageous for the 3'-5' proofreading activity to be active, allowing for high-fidelity cDNA synthesis. Hence, in some embodiments the 3'-5' proofreading exonuclease activity is present.

[0099] The primer typically contains 10-30 nucleotides, although that exact number is not critical to the successful application of the method. Short primer molecules generally require cooler temperatures to form sufficiently stable hybrid complexes with the template.
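The relationship between primer length and hybridization temperature can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides (2 °C per A or T, 4 °C per G or C). This is a rough sketch only: the rule ignores salt and primer concentration, it is not stated in this application, and the primer strings below are hypothetical.

```python
def wallace_tm(primer):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


# Hypothetical primers of 10 and 20 nucleotides with the same base composition.
short_primer = "ACGTACGTAC"
long_primer = "ACGTACGTACACGTACGTAC"

print(len(short_primer), wallace_tm(short_primer))
print(len(long_primer), wallace_tm(long_primer))
```

Under this approximation the shorter primer has the lower estimated melting temperature, consistent with the statement that short primers need cooler temperatures to form stable hybrids.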

[0100] The present methods provide that the reverse transcription of the annealed primer-RNA template is catalyzed by the claimed polymerase, i.e. a thermostable polymerase according to the invention. As used herein, the term "thermostable polymerase" refers to an enzyme that is heat stable or heat resistant and catalyzes polymerization of deoxyribonucleotides to form primer extension products that are complementary to a nucleic acid strand. Thermostable polymerases useful herein are not irreversibly inactivated when subjected to elevated temperatures for the time necessary to effect destabilization of single-stranded nucleic acids.

[0101] The thermostable polymerases described herein are significantly more thermostable than commonly used retroviral RTs and are active at commonly used PCR extension temperatures at which single-stranded secondary structures would be destabilized.

[0102] Irreversible denaturation of the enzyme refers to substantial loss of enzyme activity. Preferably a thermostable DNA polymerase will not irreversibly denature at about 65°-75° C. under polymerization conditions.

[0103] Of course, it will be recognized that for the reverse transcription of mRNA, the template molecule is single-stranded and therefore, a high temperature denaturation step is unnecessary.

[0104] But high temperature reverse transcription is advantageous for reducing secondary structure in single-stranded mRNA molecules, potentially improving cDNA yield.

[0105] A first cycle of primer elongation provides a double-stranded template suitable for denaturation and amplification as referred to above.

[0106] The heating conditions will depend on the buffer, salt concentration, and nucleic acids being denatured. Temperatures for RNA destabilization typically range from 50°-80° C. for a time sufficient for denaturation to occur, which depends on the nucleic acid length, base content, and complementarity between single-strand sequences present in the sample, but is typically about 0.5 to 4 minutes.

[0107] The thermostable enzyme preferably has optimum activity at a temperature higher than about 40 °C, e.g., 65°-75 °C. At temperatures much above 42 °C, DNA and RNA dependent polymerases, other than thermostable DNA polymerases, are inactivated. Thus, they are inappropriate for catalyzing high temperature polymerization reactions utilizing a DNA or RNA template. Previous RNA amplification methods require incubation of the RNA/primer mixture in the presence of reverse transcriptase at 37°-42 °C prior to the initiation of an amplification reaction.

[0108] Hybridization of primer to template depends on salt concentration and composition and length of primer. Hybridization can occur at higher temperatures (e.g., 45°-70 °C), which are preferred when using a thermostable polymerase. Higher temperature optimums for the thermostable enzyme enable RNA transcription and subsequent amplification to proceed with greater specificity due to the selectivity of the primer hybridization process. Preferably, the optimum temperature for reverse transcription of RNA ranges from about 55°-75 °C, more preferably 65°-70 °C.

[0109] The methods provided have numerous applications, particularly in the field of molecular biology and medical diagnostics. The reverse transcriptase activity described provides a cDNA transcript from an RNA template. The methods provide production and amplification of DNA segments from an RNA molecule, wherein the RNA molecule is a member of a population of total RNA or is present in a small amount in a biological sample. Detection of a specific RNA molecule present in a sample is greatly facilitated by a thermostable DNA polymerase used in the methods described herein. A specific RNA molecule or a total population of RNA molecules can be amplified, quantitated, isolated, and, if desired, cloned and sequenced using a thermostable DNA polymerase as described herein.

[0110] The methods and compositions of the present invention are a vast improvement over prior methods of reverse transcribing RNA into a DNA product. These methods provide products for PCR amplification or perform the PCR directly in one tube. The invention provides more specific and, therefore, more accurate means for detection and characterization of specific ribonucleic acid sequences, such as those associated with infectious diseases, genetic disorders, or cellular disorders.

EXAMPLES

EXAMPLE 1

Domain structure of the full viral polyprotein

[0111] Four previously uncharacterized viral metagenomic gene product candidates were identified from the Joint Genome Institute Integrated Microbial Genomes and Microbiomes system as multidomain polyproteins.

Locus tag JGI20132J14458_100001622 (1607 amino acids) SEQ ID NO. 20

Locus tag Ga0186926_122605 (1595 amino acids) SEQ ID NO. 21

Locus tag Ga0080008_15802729 (1619 amino acids) SEQ ID NO. 22

Locus tag Ga0079997_11796739 (1608 amino acids) SEQ ID NO. 23

[0112] These were chosen by the inventors based on careful analysis including selection criteria such as (i) sampling location in environments in which thermophilic organisms would be expected to grow and (ii) the finding that regions of the polyprotein display protein family homology to known DNA polymerase family A proteins as determined using the Pfam database (Nucleic Acids Research (2019) doi: 10.1093/nar/gky995). The Pfam database is a large collection of protein families represented by multiple sequence alignments and hidden Markov models. Although the analysis of each of the full protein sequences revealed a large uncharacterized region at the N-terminal portion of the putative protein with a domain of unknown function, each also contained domains at the C-terminal portion with homology to DNA polymerase family A proteins and an associated domain with homology to Pol A 3'-5' proofreading exonuclease domains. This suggested to the inventors that these proteins may function in viral nucleic acid replication or repair and may possess thermoactive DNA polymerase and/or reverse transcriptase activities.

Truncation and protein engineering

[0113] We next sought to isolate an active polymerase region from the large putative viral protein by truncating the full protein according to the predicted Pfam structural and functional information.

[0114] The core polymerase sequences we isolated are as follows:

OS-1622 (576 amino acids) SEQ ID NO. 24 is derived from Locus tag JGI20132J14458_100001622

OP-2605 (577 amino acids) SEQ ID NO. 25 is derived from Locus tag Ga0186926_122605

CS-2729 (577 amino acids) SEQ ID NO. 26 is derived from Locus tag Ga0080008_15802729

PS-6739 (577 amino acids) SEQ ID NO. 27 is derived from Locus tag Ga0079997_11796739

[0115] Each of the candidate viral polymerase DNA sequences was codon optimized for expression in E. coli, and the corresponding synthetic gene fragments were constructed and assembled into an expression vector. Compared with the predicted wild-type amino acid sequence obtained from the previously identified viral genes, each polymerase was engineered in two ways: fusion with the Taq DNA polymerase 5'-3' nuclease domain via an intervening ten amino acid flexible linker with the sequence GGGGSGGGGS, and incorporation of four mutations in regions of the polymerase predicted to associate with template nucleic acid (Figure 1).

[0116] In addition, the 3'-5' exonuclease (proofreading) activity was inactivated with an E to A mutation at residue 40 or 41 of the truncated enzyme.

[0117] The viral polymerase domain was fused at the N-terminus with the 5’-3’ nuclease domain of Taq polymerase via a flexible linker.
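The fusion layout described above (Taq nuclease domain, then the GGGGSGGGGS linker, then the viral polymerase domain) can be sketched as simple string assembly. This is an illustration under stated assumptions: the two fragments are only the residues flanking the linker in SEQ ID NO. 5, not full domains, and the function names are hypothetical.

```python
LINKER = "GGGGSGGGGS"  # linker sequence as printed in the text


def build_fusion(nuclease_part, polymerase_part, linker=LINKER):
    """Join an N-terminal nuclease fragment to a polymerase fragment via the linker."""
    return nuclease_part + linker + polymerase_part


def split_fusion(fusion, linker=LINKER):
    """Recover the two parts of a fusion by locating the first linker occurrence."""
    left, found, right = fusion.partition(linker)
    if not found:
        raise ValueError("linker not found in sequence")
    return left, right


# Residues flanking the linker in SEQ ID NO. 5 (fragments only, not full domains).
nuclease_tail = "ERLEFGSLLHEFGLLES"
polymerase_head = "NIPKPILKPQPKALVEPVLCDSVDEIPTKF"

fusion = build_fusion(nuclease_tail, polymerase_head)
print(fusion)
print(split_fusion(fusion))
print(len(LINKER), "residues in the linker string")
```

Locating the linker in this way is only a convenience for marking the domain boundary in the listed fusion sequences; it assumes the linker string occurs once.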

[0118] The Taq fusions were then mutated as follows:

OS-1622-Taq-wt (Q627N, H751Q, Q752K, V753K)

OP-2605-Taq-wt (Q627N, H752Q, Q753K, V754K)

CS-2729-Taq-wt (Q628N, H752Q, Q753K, L754K)

PS-6739-Taq-wt (Q627N, H752Q, Q753K, I754K)

[0119] The OP-2605-Taq-mut sequence was then further altered by incorporating seven stabilizing mutations as described below.

EXAMPLE 2

[0120] Using sequence divergent thermostable viral family A DNA polymerases identified from hot spring metagenomic sampling studies (see above), we show that the combination of two protein engineering steps conferred robust, high-activity, inhibitor-resistant reverse transcription activity on the DNA polymerases in PCR-based RNA detection assays. The two modifications to the wild-type sequences were the N-terminal Taq nuclease fusion and the incorporation of four mutations in regions of the polymerase predicted to associate with template nucleic acid. Based on these findings, this protein engineering methodology may be generally applicable to improving on basal reverse transcription activity in a broad set of viral family A DNA polymerases.

[0121] The viral family A polymerases were selected from a database containing sequences from metagenomic sampling studies, the Joint Genome Institute Integrated Microbial Genomes and Microbiomes system (https://img.jgi.doe.gov/). Based on sampling locations in hot spring regions of Yellowstone National Park and similarity to known viral family A polymerases, a number of orthologs were selected (Table 1).

[0122] The C-terminal 576 or 577 amino acids of the larger putative viral gene corresponded to the polymerase domain and showed significant divergence from the gene shuffled M160 viral family A variant (WO 2019/211749), with amino acid identity ranging from 79 to 85 percent. In addition, these additional viral family A polymerases show divergence from each other, with pairwise amino acid percent identity ranging from 79 to 89 percent.
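Pairwise percent identity of the kind quoted above is computed from an alignment. The text does not say which alignment tool or which denominator was used, so the following is a minimal sketch under one common convention (identical residues divided by aligned, non-gap columns). The function name and the toy aligned strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gap columns under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


# Hypothetical pre-aligned toy sequences, NOT sequences from this document.
identity = percent_identity("MKV-LA", "MRVQLA")
print(identity)
```

Other conventions divide by the full alignment length or by the shorter sequence length, so values from different tools are not always directly comparable.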

[0123] Each of the candidate viral polymerase DNA sequences was codon optimized for expression in E. coli, and the corresponding synthetic gene fragments were constructed and assembled into an expression vector. Compared with the predicted wild-type amino acid sequence obtained from the previously identified viral genes, each polymerase was engineered in two ways: fusion with the Taq DNA polymerase 5'-3' nuclease domain via an intervening ten amino acid flexible linker with the sequence GGGGSGGGGS, and incorporation of four mutations in regions of the polymerase predicted to associate with template nucleic acid (Figure 1). After verification of the sequences of each of the nucleic acid constructs (SEQ ID NO 1-4), the engineered polymerases (SEQ ID NO 5-8) were overexpressed in BL21 cells. Overexpressed protein was not detected for CS-2729, but for the other three polymerases, soluble protein was produced, and stability was maintained after heating of lysate at 75 °C for 10 minutes to precipitate host E. coli protein and centrifugation to clarify lysate. Reverse transcriptase activity was tested from lysates in RT-qPCR reactions (20 µl) containing Taq polymerase and EvaGreen dye, targeting a 243-nucleotide region of the MS2 RNA genome (Figure 2). Incubation was at 70 °C for 1 min; followed by 94 °C for 30 s; followed by 40 cycles of 94 °C for 5 s and 70 °C for 20 s with fluorescence data collection during the anneal/extension step. The amplification fluorescence curves of the additional engineered viral family A polymerases were very similar to those of the engineered, gene shuffled M503 polymerase (WO 2019/211749), indicating highly efficient reverse transcriptase activity for all polymerases at the 70 °C reaction temperature in just one minute. In contrast, in reactions without reverse transcriptase-containing lysate and containing Taq polymerase only, amplification from the RNA template was late and inefficient as expected.
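The incubation program described above can be written down as data and its programmed hold time summed. This is a sketch only: the class and variable names are hypothetical, and instrument ramp times between temperatures are not included because the text does not give them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


# Program as stated in the text: 70 C for 1 min; 94 C for 30 s;
# then 40 cycles of 94 C for 5 s and 70 C for 20 s.
RT_STEP = Step(70, 60)
INITIAL_DENATURATION = Step(94, 30)
CYCLE = (Step(94, 5), Step(70, 20))
N_CYCLES = 40


def total_hold_seconds(pre_steps, cycle, n_cycles):
    """Sum programmed hold times only; ramp times are instrument dependent."""
    pre = sum(step.seconds for step in pre_steps)
    per_cycle = sum(step.seconds for step in cycle)
    return pre + n_cycles * per_cycle


total = total_hold_seconds((RT_STEP, INITIAL_DENATURATION), CYCLE, N_CYCLES)
print(total, "s of programmed hold time")
```

On these figures the reverse transcription step accounts for 60 s of roughly 18 minutes of programmed hold time.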

[0124] Whereas each engineered viral family A polymerase was stable in cell lysate after incubation at 75 °C for 10 minutes, some activity loss was observed after incubation at 80 °C for 5 minutes in reaction buffer. In order to improve the thermal stability of the engineered OP-2605 polymerase, seven amino acid positions were identified for combinatorial mutagenesis and variant screening for elevated reverse transcriptase activity after an 80 °C incubation. With a homology model of the OP-2605 polymerase using a well-studied KlenTaq structure as a template, thirteen stabilizing point mutations in total were predicted among the seven amino acid positions based on local amino acid environment. A variant mutant library was constructed in which each of the 48 possible combinations of these thirteen mutations could be tested at random. After screening a total of 64 E. coli lysates overexpressing the OP-2605 variants, it was found that 49 of these (76.6%) did not maintain efficient reverse transcriptase activity at 70 °C and so were discarded. The remaining 15 variants were tested for reverse transcriptase activity after incubation at 80 °C for 5 minutes (Figure 3). RT-qPCR reactions (20 µl) containing Taq polymerase and EvaGreen dye targeted a 243-nucleotide region of the MS2 RNA genome. Incubation was at 70 °C for 1 min; followed by 94 °C for 30 s; followed by 40 cycles of 94 °C for 5 s and 70 °C for 20 s with fluorescence data collection during the anneal/extension step. It was found that three engineered OP-2605 variants showed improved thermal stability as measured by the lower Cq values after heat treatment compared with the parental polymerase, indicating that they retained higher activity levels. The mutations introduced in the three improved variants identified from the mutant library screening are shown in Table 2.
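The screening funnel in the preceding paragraph can be checked with a few lines of arithmetic. All counts are taken from the text; the variable names are illustrative only.

```python
screened = 64      # E. coli lysates screened
discarded = 49     # lacked efficient RT activity at 70 C
improved = 3       # variants with improved stability after 80 C treatment

retained = screened - discarded
discard_pct = round(100 * discarded / screened, 1)
improved_pct = round(100 * improved / screened, 1)

print(retained, "variants carried forward")
print(discard_pct, "% discarded")
print(improved_pct, "% of screened lysates gave an improved variant")
```

The stated 76.6% and the 15 remaining variants are mutually consistent with 64 lysates screened.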

[0125] For further analysis of the enzymes, the three high activity engineered OP-2605 variants were then expressed in E. coli and purified by strong cation exchange and heparin spin-column chromatography as is known in the art. DNA polymerization activities of the variants were measured by determining the relative rates of nucleotide incorporation (Figure 4) using a primed M13 template. Reactions (20 µl) contained 20 mM Tris, pH 8.8, 10 mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4, 0.1% Triton X-100, 200 µM dNTPs, 1X SYBR Green I (Thermo Fisher), 7.5 µg/ml M13mp18 DNA, 0.25 µM each of a mixture of three primers 24-33 nt in size, and 0.1-1 ng of polymerase. Reactions were incubated at the indicated temperatures, fluorescence readings were taken every 15 seconds, and fluorescence initial slope values were calculated and compared. The temperature at which the activity was highest was set at 1 and other values were plotted relative to this number. As shown in Figure 4, each of the O15, O57, and O58 variants display peak activity from 65-70 °C.
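The initial-slope and normalization procedure described above can be sketched as follows. The text does not state how many readings define the initial slope or how the fit was done, so the four-point least-squares fit is an assumption, and the fluorescence traces are hypothetical values chosen only for illustration.

```python
def initial_slope(readings, interval_s=15.0, n_points=4):
    """Least-squares slope (fluorescence units per second) over the first readings.

    n_points is an assumption; the text only says readings were taken every 15 s.
    """
    ys = readings[:n_points]
    if len(ys) < 2:
        raise ValueError("need at least two readings")
    xs = [i * interval_s for i in range(len(ys))]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def relative_activity(slopes_by_temp):
    """Scale slopes so the highest one equals 1, as described in the text."""
    peak = max(slopes_by_temp.values())
    return {temp: slope / peak for temp, slope in slopes_by_temp.items()}


# Hypothetical fluorescence traces keyed by temperature in deg C (not measured data).
traces = {
    60: [100, 130, 160, 190],
    65: [100, 160, 220, 280],
    70: [100, 145, 190, 235],
}

slopes = {temp: initial_slope(trace) for temp, trace in traces.items()}
relative = relative_activity(slopes)
print(relative)
```

Normalizing to the peak makes curves from different enzyme preparations comparable in shape even when absolute fluorescence differs.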

[0126] To test the sensitivity of O15, O57, and O58 in detection of viral MS2 RNA, RT-qPCR reactions were performed using a dual-quenched FAM-labeled hydrolysis probe for amplification detection (Figure 5). Reactions (20 µl) contained Taq polymerase and targeted a 243-nucleotide region of the MS2 RNA genome. Incubation was at 70 °C for 1 min; followed by 94 °C for 30 s; followed by 40 cycles of 94 °C for 5 s and 70 °C for 20 s with fluorescence data collection during the anneal/extension step. It was found that all three of the engineered variants catalyzed high efficiency reverse transcription of the viral RNA in the 1-minute high temperature incubation step, supporting efficient and sensitive detection of the MS2 viral RNA. As few as 100 copies were detected, the smallest quantity tested, indicating a high degree of sensitivity and specificity.

[0127] The performance of nucleic acid amplification-based detection methods is often inhibited by the presence of inhibitors in target samples. One of these inhibitors, heparin, is commonly used as an anticoagulant and can copurify with nucleic acid samples derived from blood. To test the compatibility of the O15, O57, and O58 engineered variants with the detection of viral MS2 RNA in the presence of an inhibitor, RT-qPCR reactions were performed with increasing quantities of heparin and compared with the engineered, gene shuffled M503 polymerase (Figure 6). Reactions (20 µl) contained Taq polymerase and 1 x 10^6 copies of the MS2 RNA genome, and incubation was at 70 °C for 2 min; followed by 94 °C for 30 s; followed by 40 cycles of 94 °C for 5 s and 70 °C for 20 s. Of the three engineered variants, O57 displayed the greatest heparin resistance as indicated by the lowest Cq values at elevated heparin concentrations. In addition, the O57 variant displayed a significantly greater inhibitor resistance than the engineered, gene shuffled M503 polymerase, with Cq values 3.7-6.5 lower in the presence of greater than 1.25 ng/µl heparin.
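To give a sense of scale for the Cq differences quoted above, a Cq shift can be converted to an approximate fold difference in apparent template. This is a sketch under the idealized assumption of perfect doubling per cycle (100% amplification efficiency), which the text does not state; the function name is hypothetical.

```python
def fold_difference(delta_cq, efficiency=1.0):
    """Approximate fold difference for a Cq shift.

    Assumes each cycle multiplies product by (1 + efficiency);
    efficiency=1.0 means perfect doubling, an idealization.
    """
    return (1.0 + efficiency) ** delta_cq


low = fold_difference(3.7)
high = fold_difference(6.5)
print(round(low, 1), round(high, 1))
```

Under this assumption a Cq advantage of 3.7 to 6.5 cycles corresponds to roughly a 13-fold to 90-fold difference in apparent template; real efficiencies below 100% would give smaller factors.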

[0128] Table 1 shows the identification of potential thermophilic viral Family A DNA polymerases.

[0129] Metagenomic viral family A polymerases were identified from Yellowstone hot spring sampling studies. The protein product size corresponding to the total size of the putative viral gene is indicated in addition to the size of the aligned polymerase domain. The percent identity is relative to the gene shuffled M160 polymerase variant.

[0131] Table 2 shows OP-2605 stabilizing mutant sequences.

[0132] Table 2

[0133] Most astonishingly the new polymerases differ substantially from those previously developed; see WO 2019/211749 and EP1934339.

SEQUENCES LISTING

SEQ ID NO. 1 Codon optimized OS-1622 Taq fusion DNA sequence (with mutations) Length: 2,628, Type: DNA, Source: Synthetic

ATGCGTGGTATGCTTCCACTGTTTGAACCGAAAGGCCGTGTGCTGCTGGTTGAT

GGCCACCATCTGGCCTATCGTACCTTCCATGCGCTGAAAGGCCTGACGACCAG

CCGCGGCGAACCGGTGCAGGCGGTGTATGGCTTTGCGAAAAGCCTGCTGAAAG

CGCTGAAAGAAGATGGCGATGCGGTTATTGTGGTGTTTGATGCGAAAGCGCCG

AGCTTTCGTCATGAAGCGTATGGCGGCTATAAAGCGGGTCGTGCGCCGACCCC

GGAAGATTTTCCGCGTCAGCTGGCCCTGATTAAAGAACTGGTGGATCTGCTGG

GCCTGGCGCGTCTGGAAGTGCCGGGCTATGAAGCGGATGATGTGCTGGCCAGC

CTGGCCAAAAAAGCGGAAAAAGAAGGCTACGAAGTTCGTATTCTGACCGCCG

ATAAAGACCTGTATCAGCTGCTGTCTGATCGTATTCATGTGCTGCATCCTGAGG

GTTATCTGATTACCCCGGCGTGGCTGTGGGAAAAATATGGCCTGCGTCCGGAT

CAGTGGGCGGATTATCGTGCGCTGACCGGCGATGAAAGCGATAACCTGCCGGG

CGTGAAAGGCATTGGCGAAAAAACCGCGCGTAAACTGCTGGAAGAATGGGGC

AGCCTGGAAGCGCTGCTGAAAAACCTGGATCGTCTGAAACCGGCGATTCGTGA

AAAGATCTTAGCGCACATGGATGATCTGAAACTGAGCTGGGATCTGGCCAAAG

TGCGTACCGATCTGCCGCTGGAAGTGGATTTTGCGAAACGTCGTGAACCGGAT

CGTGAACGTCTGCGTGCGTTTCTGGAACGTCTGGAATTTGGCAGCCTGCTGCAT

GAATTTGGCCTGCTGGAAAGCGGTGGCGGCGGTTCTGGCGGTGGTGGCAGCAA

CATTCCCAAGCCGATCCTTAAACCACAACCTAAAGCACTTGTTGAACCGGTTCT

GTGCGACAGCGTCGATGAAATCCCCACAAAGTTTAACGAACCAATCTATTTCG

ATCTTGCAACCGACGGGGACCGCCCGGTGTTAGCATCCATCTACCAACCCCAC

TTTGAACGTAAGGTCTATTGTCTTAACTTATTAAAAGAGAAGCCTACTCGTTTT

AAGGAGTGGCTTCTGAAGTTCAGCGAGATTCGTGGCTGGGGTTTAGACTTCGA

TCTGCGCGCCTTAGGTTACACATACGAGCAGTTACGCGATAAAAAGATTGTGG

ACGTGCAGCTGGCTATCAAAGTCCAGCATCATGAACGCTTCAAGCAGAACGGT

ACTAAGGGTGAAGGCTTTCGTCTGGACGACGTGGCCCGCGATTTGTTAGGAAT

CGAGTACCCTATGGATAAGACCAAGATCCGCGAGACGTTTAAAAATAACATTT

TTCACTCATTTAGCAATGAGCAATTGTTGTATGCATCTCTTGACGCTTATATCC

CTCACCTGCTTTACGAACAATTAACGAGTTCAACGCTTAATTCGCTGGTTTACC

AGTTAGATCAGCAAGCACAGAAAATTGTGGTGGAAACAAGTCAGAATGGTAT

GCCGGTTAAATTAAAGGCTCTGGAAGAGGAAATCCATCGCTTGACGCAGCTTC

GTAACCAAATGCAAAAAGAAATTCCTTTTAACTACAATTCGCCTAAACAGACA

GCTAAATTCTTCCGTGTTGATTCCAGCAGTAAGGACGTTCTTATGGACCTGGCA

TTACAAGGTAATGAGATGGCGAAACGCGTTTTGGAAGCCCGCCAGGTCGAGAA

GAGCCTGGCCTTCGCTAAGGATCTTTATGACATCGCGAAACGCAGCGGAGGGC

GCGTTTATGGAAATTTCTTTACCACAACGGCGCCGAGTGGACGTATGAGTTGT

AGCGATATCAACCTTCAAAATATCCCTCGCCGCTTACGCCAATTCATTGGCTTT

GATACGGAAGATAAACGTCTTATTACGGCAGACTTTCCTCAAATCGAGCTGCG

CTTAGCGGGAGTCATCTGGAACGAGAGCGAGTTCATTGAAGCCTTTAAACAAG

GCATTGACCTTCATAAATTAACGGCGTCAATTCTGTTTGAGAAGAATATTGAG

GAAGTCGGGAAGGAGGAACGTCAGATTGGTAAATCGGCGAATTTTGGATTAAT

TTATGGAATTGCTCCTAAAGGTTTTGCTGAGTACTGTATTACGAACGGAATTAA

TATGACGGAAGAGCAGGCATACGAAATTGTACGCAAGTGGAAGAAATATTAT

ACTAAGATTGCGGAGCAGCAAAAAAAGGCTTATGAACGTTTCAAATATAACGA

GTACGTGGACAACGAAACATGGCTGAATCGCACCTACCGTGCATGGAAACCAC

AAGATTTGTTAAACTACCAGATCCAAGGATCTGGTGCTGAGTTGTTCAAGAAG

GCCATTGTCCTGCTGAAGGAGGCAAAACCGGATCTTAAGATCGTCAACTTGGT

ACACGATGAGATTGTTGTCGAGGCCGACTCTAAGGAAGCCCAAGACCTTGCCA

AGCTGATCAAAGAGAAGATGGAAGAAGCCTGGGACTGGTGTTTGGAAAAGGC

GGAGGAGTTCGGCAACCGTGTAGCCAAGATTAAACTTGAAGTAGAGCAGCCG

AACGTAGGGGATACATGGGAGAAATCG

SEQ ID NO. 2 Codon optimized OP-2605 Taq fusion DNA sequence (with mutations) Length: 2,631, Type: DNA, Source: Synthetic

ATGCGTGGTATGCTTCCACTGTTTGAACCGAAAGGCCGTGTGCTGCTGGTTGAT

GGCCACCATCTGGCCTATCGTACCTTCCATGCGCTGAAAGGCCTGACGACCAG

CCGCGGCGAACCGGTGCAGGCGGTGTATGGCTTTGCGAAAAGCCTGCTGAAAG

CGCTGAAAGAAGATGGCGATGCGGTTATTGTGGTGTTTGATGCGAAAGCGCCG

AGCTTTCGTCATGAAGCGTATGGCGGCTATAAAGCGGGTCGTGCGCCGACCCC

GGAAGATTTTCCGCGTCAGCTGGCCCTGATTAAAGAACTGGTGGATCTGCTGG

GCCTGGCGCGTCTGGAAGTGCCGGGCTATGAAGCGGATGATGTGCTGGCCAGC

CTGGCCAAAAAAGCGGAAAAAGAAGGCTACGAAGTTCGTATTCTGACCGCCG

ATAAAGACCTGTATCAGCTGCTGTCTGATCGTATTCATGTGCTGCATCCTGAGG

GTTATCTGATTACCCCGGCGTGGCTGTGGGAAAAATATGGCCTGCGTCCGGAT

CAGTGGGCGGATTATCGTGCGCTGACCGGCGATGAAAGCGATAACCTGCCGGG

CGTGAAAGGCATTGGCGAAAAAACCGCGCGTAAACTGCTGGAAGAATGGGGC

AGCCTGGAAGCGCTGCTGAAAAACCTGGATCGTCTGAAACCGGCGATTCGTGA

AAAGATCTTAGCGCACATGGATGATCTGAAACTGAGCTGGGATCTGGCCAAAG

TGCGTACCGATCTGCCGCTGGAAGTGGATTTTGCGAAACGTCGTGAACCGGAT

CGTGAACGTCTGCGTGCGTTTCTGGAACGTCTGGAATTTGGCAGCCTGCTGCAT

GAATTTGGCCTGCTGGAAAGCGGTGGCGGCGGTTCTGGCGGTGGTGGCAGCAA

TACTACTACATTAAGTGTGAAGCAGGAGGTAAAATCCCTTGTTAAACCGGTAG

TGTGCGATTCGATTGATAAAATTCCAGCAAAGTTCGATGAACCCGTTTATTTTG

ATCTTGCTACCGACAATGACAAGCCTGTTTTGGCCTCTATCTATCAATCTCATT

TTGGACATGACGTCTACTGCTTGAACTTATTAAAGGAGAAACCAGCCCGCCTG

AAAGATTGGTTGTTGAAATTCAGCGAGATTCGTGGCTGGGGTTTAGATTATGA

CTTGCGCGTTCTTGGCTATACTTATGAACAACTTAAAGACAAAAAAATTGTAG

ACGTACAACTTGCTATTAAGGTGCAACACTACGAACGTTTTCGCCAGAACGGA

GCGAAGGGCGAGGGTTTCAAGCTTGACGATGTCGCCCGCGACCTGTTGGGAAT

CGAATACCCCATGGACAAGACGAAAATCCGTACTACCTTCAAGCAAAATATGT

ATAATTCTTTTAATAAAGACCAGTTATTGTATGCCAGCCTGGATGCTTACATCC

CTCACTTGCTTTACGAGCAACTGAGTTCAAATACTTTGAACAGTTTGGTCTATC

AGCTGGACCAGCAAGTTCAAAAGATCGGCATCGAGACGTCACAACATGGTCTT

CCTGTCCGTCTGCAAGCATTGCAAGAAGAGATTGATAAGTTATCACAGATCAA

GAAACGCATTCAGAAAGAGATCCCATTCAATTATAACTCCCCTAAACAAACCA

CCCAGTACTTGGGCATCGATAGCTCCAGTAAGGACGTGTTGATGGACCTGGCG

TTAAAGGGCAACGAGTTAGCTAAGAAAATCCTTGAGGCTCGTCAAATTGAAAA

GGCTCTGACCTTCGCTAAAGATTTATACGATTTGGCGAAGCGTAATAACGGAC

GTATTTACGGTAACTTCTTTACTACTACCGCGCCATCTGGGCGTATGTCGTGTA

GCGACATCAACTTGCAAAACATTCCACGCAAGTTGCGTCCGTTCATTGGCTTTG

AAACTGAAGATAAGAAACTGATTACCGCTGATTTTCCCCAAATCGAATTGCGC

TTGGCTGGTGTAATCTGGAACGAACCAAAGTTTATTGAAGCCTTCAATCAAGG

AATTGACTTACACAAGTTGACAGCATCAATTCTGTTCGATAAGCGCTCGGTCG

ATGAGGTCAGTAAAGAAGAGCGCCAGATCGGGAAGTCTGCAAACTTTGGGTTG

ATCTATGGGATCTCCCCGAAAGGATTCGCTGAGTACTGCATCACTAATGGAAT

CAACATGACCGAAGAGATCGCATACGAGATCGTCAAGAAGTGGAAAAAATAT

TATACAAAAATCACTGAACAACAAAAGAAGGCGTATGAACGCTTCAAATACG

GGGAGTACGTCGATAACGAAACCTGGTTAAATCGTACCTATCGTGCCTATAAA

CCCCAGGACTTGTTGAACTACCAGATCCAGGGTTCTGGGGCTGAGCTGTTCAA

AAAAGCTATCATCCTGTTGAAAGAGGAGGAGCCAAGTGTTAAAATTGTCAACT

TGGTCCATGATGAAATCGTTGTTGAGGCTGATAGTAAAGATGCTCAGGACGTA

GCCAATTTAATTAAAGAAAAGATGGGGCAGGCCTGGGATTACTGCTTGGATAA

GGCCAAAGAATTCGGAAACCGCGTAGCGGAAATTAAGCTTGAAGTAGAAGAG

CCCAATGTCAGTGAAGTTTGGGAAAAGGGC

SEQ ID NO. 3 Codon optimized CS-2729 Taq fusion DNA sequence (with mutations) Length: 2,631, Type: DNA, Source: Synthetic

ATGCGTGGTATGCTTCCACTGTTTGAACCGAAAGGCCGTGTGCTGCTGGTTGAT

GGCCACCATCTGGCCTATCGTACCTTCCATGCGCTGAAAGGCCTGACGACCAG

CCGCGGCGAACCGGTGCAGGCGGTGTATGGCTTTGCGAAAAGCCTGCTGAAAG

CGCTGAAAGAAGATGGCGATGCGGTTATTGTGGTGTTTGATGCGAAAGCGCCG

AGCTTTCGTCATGAAGCGTATGGCGGCTATAAAGCGGGTCGTGCGCCGACCCC

GGAAGATTTTCCGCGTCAGCTGGCCCTGATTAAAGAACTGGTGGATCTGCTGG

GCCTGGCGCGTCTGGAAGTGCCGGGCTATGAAGCGGATGATGTGCTGGCCAGC

CTGGCCAAAAAAGCGGAAAAAGAAGGCTACGAAGTTCGTATTCTGACCGCCG

ATAAAGACCTGTATCAGCTGCTGTCTGATCGTATTCATGTGCTGCATCCTGAGG

GTTATCTGATTACCCCGGCGTGGCTGTGGGAAAAATATGGCCTGCGTCCGGAT

CAGTGGGCGGATTATCGTGCGCTGACCGGCGATGAAAGCGATAACCTGCCGGG

CGTGAAAGGCATTGGCGAAAAAACCGCGCGTAAACTGCTGGAAGAATGGGGC

AGCCTGGAAGCGCTGCTGAAAAACCTGGATCGTCTGAAACCGGCGATTCGTGA

AAAGATCTTAGCGCACATGGATGATCTGAAACTGAGCTGGGATCTGGCCAAAG

TGCGTACCGATCTGCCGCTGGAAGTGGATTTTGCGAAACGTCGTGAACCGGAT

CGTGAACGTCTGCGTGCGTTTCTGGAACGTCTGGAATTTGGCAGCCTGCTGCAT

GAATTTGGCCTGCTGGAAAGCGGTGGCGGCGGTTCTGGCGGTGGTGGCAGCAA

CACACCTTTCACAGTCAAAGTCAAGCCTGCCAACAAGTCGCTTGTAGACCCAA

TCTTATGTAATAGCATTGACGAGATTCCGGTGCGTTACGACGAGCCCGTGTATT

TCGACATCGCAACGGAGGAGGATAAGCCAGTCCTTGTTAGTGTGTATCAGCCG

CATTTTGGGAACAAGGTTTATTGCTTGAATTTGTTGCGTGAGAAACCTGCGCGC

TTCAAAGAGTGGTTTTTGAAATTTTCCGAAATCCGCGGATGGGGATTGGACTTC

GACTTGAAGATTCTGGGCTACACATACGAACAGCTTAAGAACAAAAAAATTGT

AGATGTACAGCTGGCAATCAAAGTTCAACATTATGAACGTTTCAAACAAGGAG

GAACCAAAGGCGAGGGCTTTCGCCTGGACGAGGTTGCACGCGACTTACTTGGT

ATCGAGTACCCCATGGACAAGAGTAAGATCCGTATGACGTTCCGCAACAATAT

GTTCTCTAGTTTCTCTTACGAACAGTTGCTGTACGCGTCTTTGGACGCCTATATC

CCCCACTTATTATATGAACGTTTGAGTTCTTCGACCTTAAACTCGCTTGTTTATC

AAATTGACCAAGAGGTACAGAAGATCGTCGTAGAGACGAGCCAGCATGGTAT

GCCTGTCAAATTACAGGCGTTAGAGGAGGAGATCCACCGTCTGTTACAAATTA

AAAACCAGATTCAAAAAGAGATTCCGTTCAATTATAACAGTCCGCAACAGACG

GCTAAGTTCTTCGGAGTTAACTCCTCTAGCAAAGACGTCTTGATGGACCTGGTA

CTGAAAGGGAATGAGATGGCGAAAAAGGTGTTGGAAGCCCGTCAAGTAGAAA

AGTCCTTAGCCTTCGCTAAGGATTTGTATGATCTGGCGAAGCGCTCGGGCGGA

CGCATTTATGGTAATTTCTTCACTACAACCGCTCCATCGGGGCGTATGTCTTGT

TCCGACATTAACTTACAGAATATTCCACGCCGCTTGCGCCAATTTATTGGGTTT

GAAACTGAAGATAAGAAACTGATTACGGCGGATTTCCCGCAGATCGAGTTACG

TTTAGCTGGGGTGATTTGGAACGAACCGGAATTCATTAACGCGTTCCGTAAGG

GTTTGGACTTGCATAAACTGACAGCTTCAATCCTTTTTGAGAAGAACATCGAG

GAGGTCAGCAAAGAAGAACGCCAAATCGGTAAATCTGCTAATTTCGGCTTGAT

CTACGGGATCTCTCCCCGCGGTTTCGCGGAGTACTGTATTAGTAATGGTATCAA

CATGACCGAGGAAATGGCCGTGGAGATTGTTCGCAAATGGAAAAAATTCTACC

GTAAGATTGCAGAGCAACAGAAGAAGGCGTATGAACGTTTCAAGTACGACGA

ATACGTTGATAATGAGACTTGGTTGAACCGCCCCTATCGTGCATATAAGCCGC

AAGACTTACTTAACTATCAGATTCAGGGCTCGGGAGCCGAGTTGTTTAAGAAG

GCAATTATCCTGATCAAAGAAGTACGTCCGGATTTAAAGCTGGTAAATCTTGT

ACATGACGAAATCGTAGCCGAAGCACTGACCGACGAAGCCGAGGATATTGCA

ATGTTAATTAAACAGAAGATGGAAGAAGCTTGGGATTATTGTCTTGAGAAGGC

CAAAGAATTCGGAAACAAGGTGAGCGAAATTAAATTGGATATTGAGAAGCCT

AACATCTCTCATGTATGGGAAAAAGAA

SEQ ID NO. 4 Codon optimized PS-6739 Taq fusion DNA sequence (with mutations) Length: 2,631, Type: DNA, Source: Synthetic

ATGCGTGGTATGCTTCCACTGTTTGAACCGAAAGGCCGTGTGCTGCTGGTTGAT

GGCCACCATCTGGCCTATCGTACCTTCCATGCGCTGAAAGGCCTGACGACCAG

CCGCGGCGAACCGGTGCAGGCGGTGTATGGCTTTGCGAAAAGCCTGCTGAAAG

CGCTGAAAGAAGATGGCGATGCGGTTATTGTGGTGTTTGATGCGAAAGCGCCG

AGCTTTCGTCATGAAGCGTATGGCGGCTATAAAGCGGGTCGTGCGCCGACCCC

GGAAGATTTTCCGCGTCAGCTGGCCCTGATTAAAGAACTGGTGGATCTGCTGG

GCCTGGCGCGTCTGGAAGTGCCGGGCTATGAAGCGGATGATGTGCTGGCCAGC

CTGGCCAAAAAAGCGGAAAAAGAAGGCTACGAAGTTCGTATTCTGACCGCCG

ATAAAGACCTGTATCAGCTGCTGTCTGATCGTATTCATGTGCTGCATCCTGAGG

GTTATCTGATTACCCCGGCGTGGCTGTGGGAAAAATATGGCCTGCGTCCGGAT

CAGTGGGCGGATTATCGTGCGCTGACCGGCGATGAAAGCGATAACCTGCCGGG

CGTGAAAGGCATTGGCGAAAAAACCGCGCGTAAACTGCTGGAAGAATGGGGC

AGCCTGGAAGCGCTGCTGAAAAACCTGGATCGTCTGAAACCGGCGATTCGTGA

AAAGATCTTAGCGCACATGGATGATCTGAAACTGAGCTGGGATCTGGCCAAAG

TGCGTACCGATCTGCCGCTGGAAGTGGATTTTGCGAAACGTCGTGAACCGGAT

CGTGAACGTCTGCGTGCGTTTCTGGAACGTCTGGAATTTGGCAGCCTGCTGCAT

GAATTTGGCCTGCTGGAAAGCGGTGGCGGCGGTTCTGGCGGTGGTGGCAGCAA

TATCCAGAAATCAATCCTTAAACCGCAGCCCAAAGCCTTAGTAGAACCCGTTT

TGTGCAACTCCATCGACGAAATTCCAGCAAAGTTTAATGAGCCAATTTATTTCG

ATTTGGCGACTGACGAAGACCGTCCGGTTTTGGCATCGATCTATCAACCGCATT

TTGAGCGCAAGGTGTATTGCCTGAACCTGCTTAAAGAGAAACCGACCCGCTTT

AAAGAGTGGTTGTTAAAGTTTAGTGAAATCCGCGGGTGGGGGTTAGATTTTGA

CCTGCGCGTCTTGGGATACACCTATGAGCAGTTGAAGGACAAAAAGATTGTCG

ATGTCCAATTAGCAATTAAAGTACAGCACTATGAGCGTTTCCGTCAAAATGGG

ACCAAAGGAGAAGGGTTCCGTCTGGATGACGTAGCCCGCGATCTGTTTGGCAT

CGAATATCCAATGGATAAGTCAAAAATCCGTACAACGTTTAAGCAAAACATGT

ACAATACATTCAGCGAGCAGCAGTTACTTTACGCCTCGTTAGACGCATACATTC

CTCATCTGTTATACGAGCAACTTTCCTCATCCACATTAAACAGCTTGGTTTATC

AGTTGGATCAAACGGCACAAAAGATCGTCGTCGAGACCTCTCAGCATGGAATG

CCTGTCAAACTTAAAGCCTTGGAAGAAGAGATCTATCGCTTGACCCAGTTACG

CAACCAAATGCAGAAGGAAATTCCGTTTAACTATAACTCCCCCAAGCAGACCG

CAAAATTTTTCGGCCTGGATAGTAGCAGCAAAGACGTATTGATGGACCTTGCC

CTTCAAGGGAACGAAATGGCTAAGAAAGTCCTTGAGGCACGCCAAATTGAAA

AATCCTTGACATTCGCTAAGGATCTTTACGACTTAGCAAAGAAGAGCGGAGGG

CGCATTTATGGGAACTTCTTTACTACGACTGCCCCTAGCGGACGCATGTCATGT

TCGGATATTAACCTGCAAAACATTCCTCGCCGTCTGCGCCAATTCATCGGGTTT

GACACGGAGGACAAGAAATTAATTACAGCAGACTTCCCGCAAATTGAATTGCG

CTTGGCTGGCGTAATCTGGAACGAGCCCAAATTTATCGAAGCCTTCCGCCAGG

GCATTGACTTGCATAAGCTTACTGCTAGTATTTTATTTGACAAACAATCTATTG

ACGAAGTGTCTAAAGAAGAGCGCCAAATCGGCAAAAGCGCGAATTTCGGCCT

GATTTACGGTATCAGCCCGCGTGGATTTGCCGAGCATTGCATCACTAACGGGA

TCAATATTACTGAAGAGCAGGCGTATGAGATCGTTAAAAAATGGAAGAAGTAC

TATACTAAGATTACCGAGCAACAGAAGAAAGCATATGAACGCTTCAAATATAA

TGAGTATGTCGACAACGAGACATGGCTGAACCGCACATATCGTGCATATAAGC

CACAAGATCTTTTAAACTATCAGATCCAGGGGAGCGGCGCAGAGTTATTCAAA

AAAGCGATTATCCTTTTGAAGCAAGAAGAGCCCTCCCTGAAGATTGTAAACTT

AGTACACGATGAAATTGTCGTGGAAGCTGATTCCAAGGATGCACAGGATCTGG

CGAAACTGATTAAGGAAAAGATGGAAGAAGCGTGGGATTGGTGCTTGGAAAA

GGCGGAGGAATTCGGGAACCGCGTCGCGAAGATCAAGTTAGAAGTCGAGGAA

CCCCACGTTGGGGAGGTCTGGGAGAAAGGC
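As a consistency check on the listing, the DNA lengths stated for SEQ ID NO. 1-3 should be three times the protein lengths stated for SEQ ID NO. 5-7, and the first line of SEQ ID NO. 1 should translate to the start of SEQ ID NO. 5. The following sketch uses the standard genetic code; the function and variable names are hypothetical, and only the lengths and the 54-nucleotide first line are taken from the listing.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1 after removing any whitespace."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First line of SEQ ID NO. 1 as printed in the listing (54 nucleotides).
first_line = "ATGCGTGGTATGCTTCCACTGTTTGAACCGAAAGGCCGTGTGCTGCTGGTTGAT"
peptide = translate(first_line)
print(peptide)

# (DNA length, protein length) as stated in the listing headers.
STATED_LENGTHS = {
    "OS-1622": (2628, 876),
    "OP-2605": (2631, 877),
    "CS-2729": (2631, 877),
}
for name, (nucleotides, residues) in STATED_LENGTHS.items():
    print(name, nucleotides == 3 * residues)
```

Because `translate` strips whitespace first, it also tolerates stray spaces of the kind that extraction can introduce into sequence lines.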

SEQ ID NO. 5 OS-1622 Taq nuclease domain fusion (with mutations) Length: 876, Type: Protein, Source: Expression from synthetic gene

OS-1622-Taq-mut

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNIPKPILKPQPKALVEPVLCDSVDEIPTKF

NEPIYFDLATDGDRPVLASIYQPHFERKVYCLNLLKEKPTRFKEWLLKFSEIRGWG

LDFDLRALGYTYEQLRDKKIVDVQLAIKVQHHERFKQNGTKGEGFRLDDVARDL

LGIEYPMDKTKIRETFKNNIFHSFSNEQLLYASLDAYIPHLLYEQLTSSTLNSLVYQL

DQQAQKIVVETSQNGMPVKLKALEEEIHRLTQLRNQMQKEIPFNYNSPKQTAKFF

RVDSSSKDVLMDLALQGNEMAKRVLEARQVEKSLAFAKDLYDIAKRSGGRVYG

NFFTTTAPSGRMSCSDINLQNIPRRLRQFIGFDTEDKRLITADFPQIELRLAGVIWNE

SEFIEAFKQGIDLHKLTASILFEKNIEEVGKEERQIGKSANFGLIYGIAPKGFAEYCIT

NGINMTEEQAYEIVRKWKKYYTKIAEQQKKAYERFKYNEYVDNETWLNRTYRA

WKPQDLLNYQIQGSGAELFKKAIVLLKEAKPDLKIVNLVHDEIVVEADSKEAQDL

AKLIKEKMEEAWDWCLEKAEEFGNRVAKIKLEVEQPNVGDTWEKS

SEQ ID NO. 6 OP-2605 Taq nuclease domain fusion (with mutations)

Length: 877, Type: Protein, Source: Expression from synthetic gene

OP-2605-Taq-mut

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNTTTLSVKQEVKSLVKPVVCDSIDKIPAK

FDEPVYFDLATDNDKPVLASIYQSHFGHDVYCLNLLKEKPARLKDWLLKFSEIRG

WGLDYDLRVLGYTYEQLKDKKIVDVQLAIKVQHYERFRQNGAKGEGFKLDDVA

RDLLGIEYPMDKTKIRTTFKQNMYNSFNKDQLLYASLDAYIPHLLYEQLSSNTLNS

LVYQLDQQVQKIGIETSQHGLPVRLQALQEEIDKLSQIKKRIQKEIPFNYNSPKQTT

QYLGIDSSSKDVLMDLALKGNELAKKILEARQIEKALTFAKDLYDLAKRNNGRIY

GNFFTTTAPSGRMSCSDINLQNIPRKLRPFIGFETEDKKLITADFPQIELRLAGVIWN

EPKFIEAFNQGIDLHKLTASILFDKRSVDEVSKEERQIGKSANFGLIYGISPKGFAEY

CITNGINMTEEIAYEIVKKWKKYYTKITEQQKKAYERFKYGEYVDNETWLNRTYR

AYKPQDLLNYQIQGSGAELFKKAIILLKEEEPSVKIVNLVHDEIVVEADSKDAQDV

ANLIKEKMGQAWDYCLDKAKEFGNRVAEIKLEVEEPNVSEVWEKG

SEQ ID NO. 7 CS-2729 Taq nuclease domain fusion (with mutations)

Length: 877, Type: Protein, Source: Expression from synthetic gene

CS-2729-Taq-mut

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNTPFTVKVKPANKSLVDPILCNSIDEIPVR

YDEPVYFDIATEEDKPVLVSVYQPHFGNKVYCLNLLREKPARFKEWFLKFSEIRG

WGLDFDLKILGYTYEQLKNKKIVDVQLAIKVQHYERFKQGGTKGEGFRLDEVAR

DLLGIEYPMDKSKIRMTFRNNMFSSFSYEQLLYASLDAYIPHLLYERLSSSTLNSLV

YQIDQEVQKIVVETSQHGMPVKLQALEEEIHRLLQIKNQIQKEIPFNYNSPQQTAKF

FGVNSSSKDVLMDLVLKGNEMAKKVLEARQVEKSLAFAKDLYDLAKRSGGRIYG

NFFTTTAPSGRMSCSDINLQNIPRRLRQFIGFETEDKKLITADFPQIELRLAGVIWNE

PEFINAFRKGLDLHKLTASILFEKNIEEVSKEERQIGKSANFGLIYGISPRGFAEYCIS

NGINMTEEMAVEIVRKWKKFYRKIAEQQKKAYERFKYDEYVDNETWLNRPYRA

YKPQDLLNYQIQGSGAELFKKAIILIKEVRPDLKLVNLVHDEIVAEALTDEAEDIAM

LIKQKMEEAWDYCLEKAKEFGNKVSEIKLDIEKPNISHVWEKE

SEQ ID NO. 8 PS-6739 Taq nuclease domain fusion (with mutations)

Length: 877, Type: Protein, Source: Expression from synthetic gene

PS-6739-Taq-mut

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNIQKSILKPQPKALVEPVLCNSIDEIPAKFN

EPIYFDLATDEDRPVLASIYQPHFERKVYCLNLLKEKPTRFKEWLLKFSEIRGWGL

DFDLRVLGYTYEQLKDKKIVDVQLAIKVQHYERFRQNGTKGEGFRLDDVARDLF

GIEYPMDKSKIRTTFKQNMYNTFSEQQLLYASLDAYIPHLLYEQLSSSTLNSLVYQ

LDQTAQKIVVETSQHGMPVKLKALEEEIYRLTQLRNQMQKEIPFNYNSPKQTAKFF

GLDSSSKDVLMDLALQGNEMAKKVLEARQIEKSLTFAKDLYDLAKKSGGRIYGN

FFTTTAPSGRMSCSDINLQNIPRRLRQFIGFDTEDKKLITADFPQIELRLAGVIWNEP

KFIEAFRQGIDLHKLTASILFDKQSIDEVSKEERQIGKSANFGLIYGISPRGFAEHCIT

NGINITEEQAYEIVKKWKKYYTKITEQQKKAYERFKYNEYVDNETWLNRTYRAY

KPQDLLNYQIQGSGAELFKKAIILLKQEEPSLKIVNLVHDEIVVEADSKDAQDLAK

LIKEKMEEAWDWCLEKAEEFGNRVAKIKLEVEEPHVGEVWEKG

SEQ ID NO. 9 Codon optimized O15 variant DNA sequence

Length: 2,631, Type: DNA, Source: Synthetic

ATGCGTGGTATGCTTCCACTGTTTGAACCGAAAGGCCGTGTGCTGCTGGTTGAT

GGCCACCATCTGGCCTATCGTACCTTCCATGCGCTGAAAGGCCTGACGACCAG

CCGCGGCGAACCGGTGCAGGCGGTGTATGGCTTTGCGAAAAGCCTGCTGAAAG

CGCTGAAAGAAGATGGCGATGCGGTTATTGTGGTGTTTGATGCGAAAGCGCCG

AGCTTTCGTCATGAAGCGTATGGCGGCTATAAAGCGGGTCGTGCGCCGACCCC

GGAAGATTTTCCGCGTCAGCTGGCCCTGATTAAAGAACTGGTGGATCTGCTGG

GCCTGGCGCGTCTGGAAGTGCCGGGCTATGAAGCGGATGATGTGCTGGCCAGC

CTGGCCAAAAAAGCGGAAAAAGAAGGCTACGAAGTTCGTATTCTGACCGCCG

ATAAAGACCTGTATCAGCTGCTGTCTGATCGTATTCATGTGCTGCATCCTGAGG

GTTATCTGATTACCCCGGCGTGGCTGTGGGAAAAATATGGCCTGCGTCCGGAT

CAGTGGGCGGATTATCGTGCGCTGACCGGCGATGAAAGCGATAACCTGCCGGG

CGTGAAAGGCATTGGCGAAAAAACCGCGCGTAAACTGCTGGAAGAATGGGGC

AGCCTGGAAGCGCTGCTGAAAAACCTGGATCGTCTGAAACCGGCGATTCGTGA

AAAGATCTTAGCGCACATGGATGATCTGAAACTGAGCTGGGATCTGGCCAAAG

TGCGTACCGATCTGCCGCTGGAAGTGGATTTTGCGAAACGTCGTGAACCGGAT

CGTGAACGTCTGCGTGCGTTTCTGGAACGTCTGGAATTTGGCAGCCTGCTGCAT

GAATTTGGCCTGCTGGAAAGCGGTGGCGGCGGTTCTGGCGGTGGTGGCAGCAA

TACTACTACATTAAGTGTGAAGCAGGAGGTAAAATCCCTTGTTAAACCGGTAG

TGTGCGATTCGATTGATAAAATTCCAGCAAAGTTCGATGAACCCGTTTATTTTG

ATCTTGCTACCGACAATGACAAGCCTGTTTTGGCCTCTATCTATCAATCTCATT

TTGGACATGACGTCTACTGCTTGAACTTATTAAAGGAGAAACCAGCCCGCCTG

AAAGATTGGTTGTTGAAATTCAGCGAGATTCGTGGCTGGGGTTTAGATTATGA

CTTGCGCGTTCTTGGCTATACTTATGAACAACTTAAAGACAAAAAAATTGTAG

ACGTACAACTTGCTATTAAGGTGCAACACTACGAACGTTTTCGCCAGAACGGA

GCGAAGGGCGAGGGTTTCAAGCTTGACGATGTCGCCCGCGACCTGTTGGGAAT

CGAATACCCCATGGACAAGACGAAAATCCGTACTACCTTCAAGCAAAATATGT

ATAATTCTTTTAATAAAGACCAGTTATTGTATGCCAGCCTGGATGCTTACATCC

CTCACTTGCTTTACGAGCAACTGAGTTCAAATACTTTGAACAGTTTGGTCTATC

AGCTGGACCAGCAAGTTCAAAAGATCGGCATCGAGACGTCACAACATGGTCTT

CCTGTCCGTCTGCAAGCATTGCAAGAAGAGATTGATAAGTTATCACAGATCAA

GAAACGCATTCAGAAAGAGATCCCATTCAATTATAACTCCCCTAAACAAACCA

CCCAGTACTTGGGCATCGATAGCTCCAGTAAGGACGTGTTGATGGACCTGGCG

TTAAAGGGCAACGAGTTAGCTAAGAAAATCCTTGAGGCTCGTCAAATTGAAAA

GGCTCTGACCTTCGCTAAAGAgTTATACGATTTGGCGAAGCGTAATAACGGAC

GTATTTACGGTAACTTCTTTACTACTACCGCGCCATCTGGGCGTATGTCGTGTA

GCGACATCAACTTGCAAAACATTCCACGCAAGTTGCGTCCGTTCATTGGCTTTG

AAACTGAAGATAAGcgtCTGATTACCGCTGATTTTCCCCAAATCGAATTGCGCTT

GGCTGGTGTAATCTGGAACGAAagtAAGTTTATTGAAGCCTTCAATCAAGGAAT

TGACTTACACAAGTTGACAGCATCAATTCTGTTCGgcAAGCGCTCGGTCGATGA

GGTCAGTAAAGAAGAGCGCCAGATCGGGAAGTCTGCAAACTTTGGGTTGATCT

ATGGGATCTCCCCGcgtGGATTCGCTGAGTACTGCATCACTAATGGAATCAACAT

GACCGAAGAGATCGCATACGAGATCGTCAAGAAGTGGAAAcgtTATTATACAAA

AATCACTGAACAACAAAAGAAGGCGTATGAACGCTTCAAATACGGGGAGTAC

GTCGATAACGAAACCTGGTTAgccCGTACCTATCGTGCCTATAAACCCCAGGAC

TTGTTGAACTACCAGATCCAGGGTTCTGGGGCTGAGCTGTTCAAAAAAGCTAT

CATCCTGTTGAAAGAGGAGGAGCCAAGTGTTAAAATTGTCAACTTGGTCCATG

ATGAAATCGTTGTTGAGGCTGATAGTAAAGATGCTCAGGACGTAGCCAATTTA

ATTAAAGAAAAGATGGGGCAGGCCTGGGATTACTGCTTGGATAAGGCCAAAG

AATTCGGAAACCGCGTAGCGGAAATTAAGCTTGAAGTAGAAGAGCCCAATGTC

AGTGAAGTTTGGGAAAAGGGC

SEQ ID NO. 10 Engineered O15 variant polymerase

Length: 877, Type: Protein, Source: Expression from synthetic gene

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNTTTLSVKQEVKSLVKPVVCDSIDKIPAK

FDEPVYFDLATDNDKPVLASIYQSHFGHDVYCLNLLKEKPARLKDWLLKFSEIRG

WGLDYDLRVLGYTYEQLKDKKIVDVQLAIKVQHYERFRQNGAKGEGFKLDDVA

RDLLGIEYPMDKTKIRTTFKQNMYNSFNKDQLLYASLDAYIPHLLYEQLSSNTLNS

LVYQLDQQVQKIGIETSQHGLPVRLQALQEEIDKLSQIKKRIQKEIPFNYNSPKQTT

QYLGIDSSSKDVLMDLALKGNELAKKILEARQIEKALTFAKELYDLAKRNNGRIY

GNFFTTTAPSGRMSCSDINLQNIPRKLRPFIGFETEDKRLITADFPQIELRLAGVIWN

ESKFIEAFNQGIDLHKLTASILFGKRSVDEVSKEERQIGKSANFGLIYGISPRGFAEY

CITNGINMTEEIAYEIVKKWKRYYTKITEQQKKAYERFKYGEYVDNETWLARTYR

AYKPQDLLNYQIQGSGAELFKKAIILLKEEEPSVKIVNLVHDEIVVEADSKDAQDV

ANLIKEKMGQAWDYCLDKAKEFGNRVAEIKLEVEEPNVSEVWEKG

SEQ ID NO. 11 Codon optimized O57 variant DNA sequence

Length: 2,631, Type: DNA, Source: Synthetic

ATGCGTGGTATGCTTCCACTGTTTGAACCGAAAGGCCGTGTGCTGCTGGTTGAT

GGCCACCATCTGGCCTATCGTACCTTCCATGCGCTGAAAGGCCTGACGACCAG

CCGCGGCGAACCGGTGCAGGCGGTGTATGGCTTTGCGAAAAGCCTGCTGAAAG

CGCTGAAAGAAGATGGCGATGCGGTTATTGTGGTGTTTGATGCGAAAGCGCCG

AGCTTTCGTCATGAAGCGTATGGCGGCTATAAAGCGGGTCGTGCGCCGACCCC

GGAAGATTTTCCGCGTCAGCTGGCCCTGATTAAAGAACTGGTGGATCTGCTGG

GCCTGGCGCGTCTGGAAGTGCCGGGCTATGAAGCGGATGATGTGCTGGCCAGC

CTGGCCAAAAAAGCGGAAAAAGAAGGCTACGAAGTTCGTATTCTGACCGCCG

ATAAAGACCTGTATCAGCTGCTGTCTGATCGTATTCATGTGCTGCATCCTGAGG

GTTATCTGATTACCCCGGCGTGGCTGTGGGAAAAATATGGCCTGCGTCCGGAT

CAGTGGGCGGATTATCGTGCGCTGACCGGCGATGAAAGCGATAACCTGCCGGG

CGTGAAAGGCATTGGCGAAAAAACCGCGCGTAAACTGCTGGAAGAATGGGGC

AGCCTGGAAGCGCTGCTGAAAAACCTGGATCGTCTGAAACCGGCGATTCGTGA

AAAGATCTTAGCGCACATGGATGATCTGAAACTGAGCTGGGATCTGGCCAAAG

TGCGTACCGATCTGCCGCTGGAAGTGGATTTTGCGAAACGTCGTGAACCGGAT

CGTGAACGTCTGCGTGCGTTTCTGGAACGTCTGGAATTTGGCAGCCTGCTGCAT

GAATTTGGCCTGCTGGAAAGCGGTGGCGGCGGTTCTGGCGGTGGTGGCAGCAA

TACTACTACATTAAGTGTGAAGCAGGAGGTAAAATCCCTTGTTAAACCGGTAG

TGTGCGATTCGATTGATAAAATTCCAGCAAAGTTCGATGAACCCGTTTATTTTG

ATCTTGCTACCGACAATGACAAGCCTGTTTTGGCCTCTATCTATCAATCTCATT

TTGGACATGACGTCTACTGCTTGAACTTATTAAAGGAGAAACCAGCCCGCCTG

AAAGATTGGTTGTTGAAATTCAGCGAGATTCGTGGCTGGGGTTTAGATTATGA

CTTGCGCGTTCTTGGCTATACTTATGAACAACTTAAAGACAAAAAAATTGTAG

ACGTACAACTTGCTATTAAGGTGCAACACTACGAACGTTTTCGCCAGAACGGA

GCGAAGGGCGAGGGTTTCAAGCTTGACGATGTCGCCCGCGACCTGTTGGGAAT

CGAATACCCCATGGACAAGACGAAAATCCGTACTACCTTCAAGCAAAATATGT

ATAATTCTTTTAATAAAGACCAGTTATTGTATGCCAGCCTGGATGCTTACATCC

CTCACTTGCTTTACGAGCAACTGAGTTCAAATACTTTGAACAGTTTGGTCTATC

AGCTGGACCAGCAAGTTCAAAAGATCGGCATCGAGACGTCACAACATGGTCTT

CCTGTCCGTCTGCAAGCATTGCAAGAAGAGATTGATAAGTTATCACAGATCAA

GAAACGCATTCAGAAAGAGATCCCATTCAATTATAACTCCCCTAAACAAACCA

CCCAGTACTTGGGCATCGATAGCTCCAGTAAGGACGTGTTGATGGACCTGGCG

TTAAAGGGCAACGAGTTAGCTAAGAAAATCCTTGAGGCTCGTCAAATTGAAAA

GGCTCTGACCTTCGCTAAAGtgTTATACGATTTGGCGAAGCGTAATAACGGACG

TATTTACGGTAACTTCTTTACTACTACCGCGCCATCTGGGCGTATGTCGTGTAG

CGACATCAACTTGCAAAACATTCCACGCAAGTTGCGTCCGTTCATTGGCTTTGA

AACTGAAGATAAGcgtCTGATTACCGCTGATTTTCCCCAAATCGAATTGCGCTTG

GCTGGTGTAATCTGGAACGAAaagAAGTTTATTGAAGCCTTCAATCAAGGAATT

GACTTACACAAGTTGACAGCATCAATTCTGTTCGgcAAGCGCTCGGTCGATGAG

GTCAGTAAAGAAGAGCGCCAGATCGGGAAGTCTGCAAACTTTGGGTTGATCTA

TGGGATCTCCCCGcgtGGATTCGCTGAGTACTGCATCACTAATGGAATCAACATG

ACCGAAGAGATCGCATACGAGATCGTCAAGAAGTGGAAAgcgTATTATACAAA

AATCACTGAACAACAAAAGAAGGCGTATGAACGCTTCAAATACGGGGAGTAC

GTCGATAACGAAACCTGGTTAgccCGTACCTATCGTGCCTATAAACCCCAGGAC

TTGTTGAACTACCAGATCCAGGGTTCTGGGGCTGAGCTGTTCAAAAAAGCTAT

CATCCTGTTGAAAGAGGAGGAGCCAAGTGTTAAAATTGTCAACTTGGTCCATG

ATGAAATCGTTGTTGAGGCTGATAGTAAAGATGCTCAGGACGTAGCCAATTTA

ATTAAAGAAAAGATGGGGCAGGCCTGGGATTACTGCTTGGATAAGGCCAAAG

AATTCGGAAACCGCGTAGCGGAAATTAAGCTTGAAGTAGAAGAGCCCAATGTC

AGTGAAGTTTGGGAAAAGGGC

SEQ ID NO. 12 Engineered O57 variant polymerase

Length: 877, Type: Protein, Source: Expression from synthetic gene

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNTTTLSVKQEVKSLVKPVVCDSIDKIPAK

FDEPVYFDLATDNDKPVLASIYQSHFGHDVYCLNLLKEKPARLKDWLLKFSEIRG

WGLDYDLRVLGYTYEQLKDKKIVDVQLAIKVQHYERFRQNGAKGEGFKLDDVA

RDLLGIEYPMDKTKIRTTFKQNMYNSFNKDQLLYASLDAYIPHLLYEQLSSNTLNS

LVYQLDQQVQKIGIETSQHGLPVRLQALQEEIDKLSQIKKRIQKEIPFNYNSPKQTT

QYLGIDSSSKDVLMDLALKGNELAKKILEARQIEKALTFAKVLYDLAKRNNGRIY

GNFFTTTAPSGRMSCSDINLQNIPRKLRPFIGFETEDKRLITADFPQIELRLAGVIWN

EKKFIEAFNQGIDLHKLTASILFGKRSVDEVSKEERQIGKSANFGLIYGISPRGFAEY

CITNGINMTEEIAYEIVKKWKAYYTKITEQQKKAYERFKYGEYVDNETWLARTYR

AYKPQDLLNYQIQGSGAELFKKAIILLKEEEPSVKIVNLVHDEIVVEADSKDAQDV

ANLIKEKMGQAWDYCLDKAKEFGNRVAEIKLEVEEPNVSEVWEKG

SEQ ID NO. 13 Codon optimized O58 variant DNA sequence

Length: 2,631, Type: DNA, Source: Synthetic

ATGCGTGGTATGCTTCCACTGTTTGAACCGAAAGGCCGTGTGCTGCTGGTTGAT

GGCCACCATCTGGCCTATCGTACCTTCCATGCGCTGAAAGGCCTGACGACCAG

CCGCGGCGAACCGGTGCAGGCGGTGTATGGCTTTGCGAAAAGCCTGCTGAAAG

CGCTGAAAGAAGATGGCGATGCGGTTATTGTGGTGTTTGATGCGAAAGCGCCG

AGCTTTCGTCATGAAGCGTATGGCGGCTATAAAGCGGGTCGTGCGCCGACCCC

GGAAGATTTTCCGCGTCAGCTGGCCCTGATTAAAGAACTGGTGGATCTGCTGG

GCCTGGCGCGTCTGGAAGTGCCGGGCTATGAAGCGGATGATGTGCTGGCCAGC

CTGGCCAAAAAAGCGGAAAAAGAAGGCTACGAAGTTCGTATTCTGACCGCCG

ATAAAGACCTGTATCAGCTGCTGTCTGATCGTATTCATGTGCTGCATCCTGAGG

GTTATCTGATTACCCCGGCGTGGCTGTGGGAAAAATATGGCCTGCGTCCGGAT

CAGTGGGCGGATTATCGTGCGCTGACCGGCGATGAAAGCGATAACCTGCCGGG

CGTGAAAGGCATTGGCGAAAAAACCGCGCGTAAACTGCTGGAAGAATGGGGC

AGCCTGGAAGCGCTGCTGAAAAACCTGGATCGTCTGAAACCGGCGATTCGTGA

AAAGATCTTAGCGCACATGGATGATCTGAAACTGAGCTGGGATCTGGCCAAAG

TGCGTACCGATCTGCCGCTGGAAGTGGATTTTGCGAAACGTCGTGAACCGGAT

CGTGAACGTCTGCGTGCGTTTCTGGAACGTCTGGAATTTGGCAGCCTGCTGCAT

GAATTTGGCCTGCTGGAAAGCGGTGGCGGCGGTTCTGGCGGTGGTGGCAGCAA

TACTACTACATTAAGTGTGAAGCAGGAGGTAAAATCCCTTGTTAAACCGGTAG

TGTGCGATTCGATTGATAAAATTCCAGCAAAGTTCGATGAACCCGTTTATTTTG

ATCTTGCTACCGACAATGACAAGCCTGTTTTGGCCTCTATCTATCAATCTCATT

TTGGACATGACGTCTACTGCTTGAACTTATTAAAGGAGAAACCAGCCCGCCTG

AAAGATTGGTTGTTGAAATTCAGCGAGATTCGTGGCTGGGGTTTAGATTATGA

CTTGCGCGTTCTTGGCTATACTTATGAACAACTTAAAGACAAAAAAATTGTAG

ACGTACAACTTGCTATTAAGGTGCAACACTACGAACGTTTTCGCCAGAACGGA

GCGAAGGGCGAGGGTTTCAAGCTTGACGATGTCGCCCGCGACCTGTTGGGAAT

CGAATACCCCATGGACAAGACGAAAATCCGTACTACCTTCAAGCAAAATATGT

ATAATTCTTTTAATAAAGACCAGTTATTGTATGCCAGCCTGGATGCTTACATCC

CTCACTTGCTTTACGAGCAACTGAGTTCAAATACTTTGAACAGTTTGGTCTATC

AGCTGGACCAGCAAGTTCAAAAGATCGGCATCGAGACGTCACAACATGGTCTT

CCTGTCCGTCTGCAAGCATTGCAAGAAGAGATTGATAAGTTATCACAGATCAA

GAAACGCATTCAGAAAGAGATCCCATTCAATTATAACTCCCCTAAACAAACCA

CCCAGTACTTGGGCATCGATAGCTCCAGTAAGGACGTGTTGATGGACCTGGCG

TTAAAGGGCAACGAGTTAGCTAAGAAAATCCTTGAGGCTCGTCAAATTGAAAA

GGCTCTGACCTTCGCTAAAGagTTATACGATTTGGCGAAGCGTAATAACGGACG

TATTTACGGTAACTTCTTTACTACTACCGCGCCATCTGGGCGTATGTCGTGTAG

CGACATCAACTTGCAAAACATTCCACGCAAGTTGCGTCCGTTCATTGGCTTTGA

AACTGAAGATAAGcgtCTGATTACCGCTGATTTTCCCCAAATCGAATTGCGCTTG

GCTGGTGTAATCTGGAACGAAaagAAGTTTATTGAAGCCTTCAATCAAGGAATT

GACTTACACAAGTTGACAGCATCAATTCTGTTCGaaAAGCGCTCGGTCGATGAG

GTCAGTAAAGAAGAGCGCCAGATCGGGAAGTCTGCAAACTTTGGGTTGATCTA

TGGGATCTCCCCGcgtGGATTCGCTGAGTACTGCATCACTAATGGAATCAACATG

ACCGAAGAGATCGCATACGAGATCGTCAAGAAGTGGAAAcgtTATTATACAAAA

ATCACTGAACAACAAAAGAAGGCGTATGAACGCTTCAAATACGGGGAGTACG

TCGATAACGAAACCTGGTTAgccCGTACCTATCGTGCCTATAAACCCCAGGACT

TGTTGAACTACCAGATCCAGGGTTCTGGGGCTGAGCTGTTCAAAAAAGCTATC

ATCCTGTTGAAAGAGGAGGAGCCAAGTGTTAAAATTGTCAACTTGGTCCATGA

TGAAATCGTTGTTGAGGCTGATAGTAAAGATGCTCAGGACGTAGCCAATTTAA

TTAAAGAAAAGATGGGGCAGGCCTGGGATTACTGCTTGGATAAGGCCAAAGA

ATTCGGAAACCGCGTAGCGGAAATTAAGCTTGAAGTAGAAGAGCCCAATGTCA

GTGAAGTTTGGGAAAAGGGC

SEQ ID NO. 14 Engineered O58 variant polymerase

Length: 877, Type: Protein, Source: Expression from synthetic gene

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNTTTLSVKQEVKSLVKPVVCDSIDKIPAK

FDEPVYFDLATDNDKPVLASIYQSHFGHDVYCLNLLKEKPARLKDWLLKFSEIRG

WGLDYDLRVLGYTYEQLKDKKIVDVQLAIKVQHYERFRQNGAKGEGFKLDDVA

RDLLGIEYPMDKTKIRTTFKQNMYNSFNKDQLLYASLDAYIPHLLYEQLSSNTLNS

LVYQLDQQVQKIGIETSQHGLPVRLQALQEEIDKLSQIKKRIQKEIPFNYNSPKQTT

QYLGIDSSSKDVLMDLALKGNELAKKILEARQIEKALTFAKELYDLAKRNNGRIY

GNFFTTTAPSGRMSCSDINLQNIPRKLRPFIGFETEDKRLITADFPQIELRLAGVIWN

EKKFIEAFNQGIDLHKLTASILFEKRSVDEVSKEERQIGKSANFGLIYGISPRGFAEY

CITNGINMTEEIAYEIVKKWKRYYTKITEQQKKAYERFKYGEYVDNETWLARTYR

AYKPQDLLNYQIQGSGAELFKKAIILLKEEEPSVKIVNLVHDEIVVEADSKDAQDV

ANLIKEKMGQAWDYCLDKAKEFGNRVAEIKLEVEEPNVSEVWEKG

SEQ ID NO. 15 OS-1622 Taq nuclease domain fusion (without mutation)

Length: 876, Type: Protein, Source: Expression from synthetic gene

OS-1622-Taq-wt

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNIPKPILKPQPKALVEPVLCDSVDEIPTKF

NEPIYFDLATDGDRPVLASIYQPHFERKVYCLNLLKEKPTRFKEWLLKFSEIRGWG

LDFDLRALGYTYEQLRDKKIVDVQLAIKVQHHERFKQNGTKGEGFRLDDVARDL

LGIEYPMDKTKIRETFKNNIFHSFSNEQLLYASLDAYIPHLLYEQLTSSTLNSLVYQL

DQQAQKIVVETSQNGMPVKLKALEEEIHRLTQLRNQMQKEIPFNYNSPKQTAKFF

RVDSSSKDVLMDLALQGNEMAKRVLEARQVEKSLAFAKDLYDIAKRSGGRVYG

NFFTTTAPSGRMSCSDINLQqIPRRLRQFIGFDTEDKRLITADFPQIELRLAGVIWNE

SEFIEAFKQGIDLHKLTASILFEKNIEEVGKEERQIGKSANFGLIYGIAPKGFAEYCIT

NGINMTEEQAYEIVRKWKKYYTKIAEQhqvAYERFKYNEYVDNETWLNRTYRAW

KPQDLLNYQIQGSGAELFKKAIVLLKEAKPDLKIVNLVHDEIVVEADSKEAQDLAK

LIKEKMEEAWDWCLEKAEEFGNRVAKIKLEVEQPNVGDTWEKS

SEQ ID NO. 16 OP-2605 Taq nuclease domain fusion (without mutation)

Length: 877, Type: Protein, Source: Expression from synthetic gene

OP-2605-Taq-wt

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNTTTLSVKQEVKSLVKPVVCDSIDKIPAK

FDEPVYFDLATDNDKPVLASIYQSHFGHDVYCLNLLKEKPARLKDWLLKFSEIRG

WGLDYDLRVLGYTYEQLKDKKIVDVQLAIKVQHYERFRQNGAKGEGFKLDDVA

RDLLGIEYPMDKTKIRTTFKQNMYNSFNKDQLLYASLDAYIPHLLYEQLSSNTLNS

LVYQLDQQVQKIGIETSQHGLPVRLQALQEEIDKLSQIKKRIQKEIPFNYNSPKQTT

QYLGIDSSSKDVLMDLALKGNELAKKILEARQIEKALTFAKDLYDLAKRNNGRIY

GNFFTTTAPSGRMSCSDINLQqIPRKLRPFIGFETEDKKLITADFPQIELRLAGVIWN

EPKFIEAFNQGIDLHKLTASILFDKRSVDEVSKEERQIGKSANFGLIYGISPKGFAEY

CITNGINMTEEIAYEIVKKWKKYYTKITEQhqvAYERFKYGEYVDNETWLNRTYRA

YKPQDLLNYQIQGSGAELFKKAIILLKEEEPSVKIVNLVHDEIVVEADSKDAQDVA

NLIKEKMGQAWDYCLDKAKEFGNRVAEIKLEVEEPNVSEVWEKG

SEQ ID NO. 17 CS-2729 Taq nuclease domain fusion (without mutation)

Length: 877, Type: Protein, Source: Expression from synthetic gene

CS-2729-Taq-wt

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNTPFTVKVKPANKSLVDPILCNSIDEIPVR

YDEPVYFDIATEEDKPVLVSVYQPHFGNKVYCLNLLREKPARFKEWFLKFSEIRG

WGLDFDLKILGYTYEQLKNKKIVDVQLAIKVQHYERFKQGGTKGEGFRLDEVAR

DLLGIEYPMDKSKIRMTFRNNMFSSFSYEQLLYASLDAYIPHLLYERLSSSTLNSLV

YQIDQEVQKIVVETSQHGMPVKLQALEEEIHRLLQIKNQIQKEIPFNYNSPQQTAKF

FGVNSSSKDVLMDLVLKGNEMAKKVLEARQVEKSLAFAKDLYDLAKRSGGRIYG

NFFTTTAPSGRMSCSDINLQqIPRRLRQFIGFETEDKKLITADFPQIELRLAGVIWNEP

EFINAFRKGLDLHKLTASILFEKNIEEVSKEERQIGKSANFGLIYGISPRGFAEYCISN

GINMTEEMAVEIVRKWKKFYRKIAEQhqlAYERFKYDEYVDNETWLNRPYRAYKP

QDLLNYQIQGSGAELFKKAIILIKEVRPDLKLVNLVHDEIVAEALTDEAEDIAMLIK

QKMEEAWDYCLEKAKEFGNKVSEIKLDIEKPNISHVWEKE

SEQ ID NO. 18 PS-6739 Taq nuclease domain fusion (without mutation)

Length: 877, Type: Protein, Source: Expression from synthetic gene

PS-6739-Taq-wt

MRGMLPLFEPKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGFAKSLLKA

LKEDGDAVIVVFDAKAPSFRHEAYGGYKAGRAPTPEDFPRQLALIKELVDLLGLA

RLEVPGYEADDVLASLAKKAEKEGYEVRILTADKDLYQLLSDRIHVLHPEGYLITP

AWLWEKYGLRPDQWADYRALTGDESDNLPGVKGIGEKTARKLLEEWGSLEALL

KNLDRLKPAIREKILAHMDDLKLSWDLAKVRTDLPLEVDFAKRREPDRERLRAFL

ERLEFGSLLHEFGLLESGGGGSGGGGSNIQKSILKPQPKALVEPVLCNSIDEIPAKFN

EPIYFDLATDEDRPVLASIYQPHFERKVYCLNLLKEKPTRFKEWLLKFSEIRGWGL

DFDLRVLGYTYEQLKDKKIVDVQLAIKVQHYERFRQNGTKGEGFRLDDVARDLF

GIEYPMDKSKIRTTFKQNMYNTFSEQQLLYASLDAYIPHLLYEQLSSSTLNSLVYQ

LDQTAQKIVVETSQHGMPVKLKALEEEIYRLTQLRNQMQKEIPFNYNSPKQTAKFF

GLDSSSKDVLMDLALQGNEMAKKVLEARQIEKSLTFAKDLYDLAKKSGGRIYGN

FFTTTAPSGRMSCSDINLQqIPRRLRQFIGFDTEDKKLITADFPQIELRLAGVIWNEP

KFIEAFRQGIDLHKLTASILFDKQSIDEVSKEERQIGKSANFGLIYGISPRGFAEHCIT

NGINITEEQAYEIVKKWKKYYTKITEQhqiAYERFKYNEYVDNETWLNRTYRAYKP

QDLLNYQIQGSGAELFKKAIILLKQEEPSLKIVNLVHDEIVVEADSKDAQDLAKLIK

EKMEEAWDWCLEKAEEFGNRVAKIKLEVEEPHVGEVWEKG

SEQ ID NO. 19 Linker

GGGGSGGGGS
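
As an illustration only, the position of the SEQ ID NO. 19 linker within the fusion proteins listed above can be located with a few lines of Python. The sketch below is not part of the application; the function name `split_at_linker` is hypothetical, and the fragment is one line copied from SEQ ID NO. 5 with extraction spaces removed.

```python
from typing import Tuple

LINKER = "GGGGSGGGGS"  # SEQ ID NO. 19


def split_at_linker(protein: str, linker: str = LINKER) -> Tuple[str, str]:
    """Split a protein string at the first occurrence of the linker.

    Whitespace is removed first, since the printed listing may contain
    stray spaces. Case is left untouched because lowercase letters in
    the listing may be meaningful.
    """
    cleaned = "".join(protein.split())
    upstream, found, downstream = cleaned.partition(linker)
    if not found:
        raise ValueError("linker not found in sequence")
    return upstream, downstream


# One line of SEQ ID NO. 5 that spans the junction (illustrative fragment).
fragment = "ERLEFGSLLHEFGLLESGGGGSGGGGSNIPKPILKPQPKALVEPVLCDSVDEIPTKF"
upstream, downstream = split_at_linker(fragment)
print(upstream)    # residues before the linker
print(downstream)  # residues after the linker
```

Under these assumptions, the part after the linker starts with the same residues as the core polymerase of SEQ ID NO. 24.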

SEQ ID NO. 20 Putative viral gene product. Locus tag JGI20132J14458_100001622

Length: 1607, Type: Protein, Source: Synthetic

MRSISFFELLVKIGLIVEDEYGYTFPDYVLVLTQTPEGIELKEIKDAFLRWNETNKE

KWVEEFEEYCKLARERNRYYLSLFAEKRNAQDFFKRTKVAIRIDIDEPLKLDQILEI

VNNRELLPIQPTHILRTIKGWHIFYITQDFIECDDKEILYMIHSYVEDLKSNLRKHAD

KIDHTYSIATRYSNEIYELREPYTKKELLEEMNKYYDTDILINGLPVKRREYSRIPIS

QISEGLALTLWNACPVIRSLEEKWETHTYNEWFILSWKYAFLYVLTQKEEYKQEF

LQKSKFWKGKVVIAPEQQFRNTLKWMLKDRETLPYFSCSFVHRRVVDADEKCKN

CQYARWIFDEYGERKLISNWFKDLFYLETRLEGFKVDEKRNLWVKEDTNEPVCEL

FRIEDVVLYNKPNRKEKYIKIFYRDKYEFIPYVLTASANTDFSEFIVLTFYNQQLFK

KLLTNYLTLFQLARGVREIDKAGYKYNDLKRKWDMVVANMDSFRAEDLNFYM

WSDRTNRLNYYIPIVNGSFEAWKNAYRRVVKAKDPIMLILLGHFISHITKEYFRDK

FVASSEPNVLIFLRGFTTTGKTTRLRIASALYGTPQVIQITETTTAKILREFGNIGMPL

PLDEFRMRKDKEEEIANMIYAIANEASKDTAYERFSPIQVPVVFSGEKNALAVEVL

CKNREGLYRRSIVLDVDELPKQKNTALVEFYTNEILPILKYNHGYIFKLIDFIENHV

DIEALAQYYKDVEILRNEFDKKRSKVLRGIVKSLDNHLKLIYASIHVFLEFLGLSDE

EKANVFVILEQYIRNVFAKFYDTLLPKEESKLNKIIDYLRDLADGLYNASNNPIKKT

TIRGLTIKKLIDIAGVQVPTTDIEPYLKLLFMKYYENKKAFVYLGSIFVEGRNPAWF

EGMVTREYERLTYIKEHHPEFYKSILEVFTELMLSIHGEAGLRRLHNLFVESFKFED

LKDFIDNNGGDNTPPDEDLPSGDDDDNTPPNDNLPPVEEFDYENKENEDNEEEDEL

EKHFTGEDGLSLPKRMNIPKPILKPQPKALVEPVLCDSVDEIPTKFNEPIYFDLETDG

DRPVLASIYQPHFERKVYCLNLLKEKPTRFKEWLLKFSEIRGWGLDFDLRALGYTY

EQLRDKKIVDVQLAIKVQHHERFKQNGTKGEGFRLDDVARDLLGIEYPMDKTKIR

ETFKNNIFHSFSNEQLLYASLDAYIPHLLYEQLTSSTLNSLVYQLDQQAQKIVVETS

QNGMPVKLKALEEEIHRLTQLRNQMQKEIPFNYNSPKQTAKFFRVDSSSKDVLMD

LALQGNEMAKRVLEARQVEKSLAFAKDLYDIAKRSGGRVYGNFFTTTAPSGRMS

CSDINLQQIPRRLRQFIGFDTEDKRLITADFPQIELRLAGVIWNESEFIEAFKQGIDLH

KLTASILFEKNIEEVGKEERQIGKSANFGLIYGIAPKGFAEYCITNGINMTEEQAYEI

VRKWKKYYTKIAEQHQVAYERFKYNEYVDNETWLNRTYRAWKPQDLLNYQIQG

SGAELFKKAIVLLKEAKPDLKIVNLVHDEIVVEADSKEAQDLAKLIKEKMEEAWD

WCLEKAEEFGNRVAKIKLEVEQPNVGDTWEKS

SEQ ID NO. 21 Putative viral gene product. Locus tag Ga0186926_122605

Length: 1595, Type: Protein, Source: Synthetic

MNKITFFDLFVKIGLVYENEKYGYTFNDYVLVLAETLEGVAVKEIRDAFLGFNEA

DKERWKKEFEEYCKVARERNRYFLSLFAEKRNSFDYFKRTKVAIRIDIDEPLKLEE

VLELVNNRDLIPIPPTHILRSVKGWHIFYITQDYIESVDREVLYFIHSYTEELKSLLRK

HADKVDHTYQIATRFSEEIYELREPYTKEKLFQAINDYYGVEIQINGLTVKRGQYG

KIPVAHLSEGVALTLWNACPVLRQLEERWENHTYDEWFLMSWKYAFLYALTQKE

EYKQEFLQKSKLWKGQVKTTPEQQFQYTLKWILKDRETLPYFSCSFVHKSVEGAE

EKCNSCQYARWMLDENGERRLISNWFKDLFYLETRLEGFKIDERKNVWVKEDTE

EPVCELFKIEDVVLYNKPNNKQKYIKIFYRDKYEFIPYVLTASANTDFSEFIVLTFY

NQQLFKKLLTNYLTLFQLARGVREIDKAGYKYNDLKKRWDTVVANVGAFRVED

LNFYMWNDRTSRLNYYIPVVNGSFEAWKDAYRRVVKAKDPILLILLGHFISHITKE

YFKDKFVASSEPNVLIFLRGFTTAGKTTRLRIASALYGTPQAIQITETTTAKILREFG

NIGTPLPLDEFRMRKDKEEEVANMIYAIANESAKDTAYERFNPIQVPVVFSGEKNA

LSVETLCKNRDGLYRRSIVLDIDEIPKQKNSSLVEFYTNKILPILKYHHGYIFKFIDFI

ENEVDIETVAERFKDVELLNEELNKKKSKVFRGIVKSLDNHLKMIIASLSVFLDFLN

LNEEEKADIYIALDHYIRNVLAKFYDTLLPKEEDKLSKIIDYLRDFADGLYNASNNP

IKKTTIKGLTTKKLIDVAGMQVPTTDIEPYLRLLFMKYYQSNRGYTYLGSIFVEGR

NPAWFESMIKIEYERLIHIKEQHPTYYKNALEVFVELMLSIHGELGLRRLYRIFVKT

YKFDDLKDFISDNNDDTPPDDNPPNGDDGDDDLPPDDSISPNGHYTEDPEEPHFEE

ETNSFSQNTTTLSVKQEVKSLVKPVVCDSIDKIPAKFDEPVYFDLETDNDKPVLASI

YQSHFGHDVYCLNLLKEKPARLKDWLLKFSEIRGWGLDYDLRVLGYTYEQLKDK

KIVDVQLAIKVQHYERFRQNGAKGEGFKLDDVARDLLGIEYPMDKTKIRTTFKQN

MYNSFNKDQLLYASLDAYIPHLLYEQLSSNTLNSLVYQLDQQVQKIGIETSQHGLP

VRLQALQEEIDKLSQIKKRIQKEIPFNYNSPKQTTQYLGIDSSSKDVLMDLALKGNE

LAKKILEARQIEKALTFAKDLYDLAKRNNGRIYGNFFTTTAPSGRMSCSDINLQQIP

RKLRPFIGFETEDKKLITADFPQIELRLAGVIWNEPKFIEAFNQGIDLHKLTASILFDK

RSVDEVSKEERQIGKSANFGLIYGISPKGFAEYCITNGINMTEEIAYEIVKKWKKYY

TKITEQHQVAYERFKYGEYVDNETWLNRTYRAYKPQDLLNYQIQGSGAELFKKAI

ILLKEEEPSVKIVNLVHDEIVVEADSKDAQDVANLIKEKMGQAWDYCLDKAKEFG

NRVAEIKLEVEEPNVSEVWEKG

SEQ ID NO. 22 Putative viral gene product. Locus tag Ga0080008_15802729

Length: 1619, Type: Protein, Source: Synthetic

MNRITFFDLFVKCGLIYDDEEYGYRFTPYVLVLAETVDGIGIKPITDLFFGFNETDR

ERWVKEFLSYCKEARERNRYYLSVFSERRNSFDFFKRTKAAIRIDIDEPLTLSEVIK

LVENKDLIPIQPTHVLRSVRGWHILYITKDFIENDEQNKNIFYLLHSYAEDLKSNLR

KYADKVDYTYQIATRFSEEIYELREPYEVKELIKAIEDYYSLDIEINGFKLKRRQFG

RIPISHISEGVALTLWNACPVLRRLEEKWEYHTYNEWFIMSWKYAFLYALTGKSE

YKEEFLNKSKLWKGVVKMTPEQQFEYTLKWVLKEKETLPYFSCSFVYKHVSEAE

EKCKECPYARWQEDEFGNKTLISSWFKELFYIESRLENFKIDEKRNLWVKADTNEP

ICELFKIEDVVLYNKPNKKERFIKIFYRNKYEFVPYVLTASANMDFSEFNVLTFYNQ

TLFKNLLINYLNLFQLSRGAREIDKAGYKYNRITKSWDKVVANLGNFRVEDLNFF

MWNDRTNELRYYIPVVNGSYEVWRETYKKVLLAKDPIMLIILGHFLSHITREYFKD

KFVSSNEPNVLIFLRGFTTSGKTTRLKIASALYGTPEVIQITETTTAKILREFGNIGMP

LPLDEFRMRKDKEEEVANMIYAIANEAAKDTAYERFNPISVPVVFSGEKNTLFVET

LAKNREGLYRRSIVLDVDEIPKPEREQLAEFYAREIYPVLRKNHGFIYKFIEFLENEA

DIDRLSELYQDVELLREEFDKRRSKVLRGIVRSLDNHLKMILASLHLFVDFIGLNDE

EKAEVYMCVEDYIKTKLVGFYETFLPKEEDKLTRIIDYLRDIIDGLYNAWKHPVNK

KTIKRLTINKLIEIAGVQAPTQDLEPYLKLLLMKYYPSNNTFTYVGSVFVEGRNYLS

DDYAKLETERLLFVKGRYPHLYQDILEVFVELMLIVHGEYGLSKLIKYMKKLGFT

DVMEYTIKHNITIHKFGDDEDDNPSPTSPPKNPPEISPQNNSSSTEITSTSEVDEDLV

NSFVGEEGFSSATLKTDTTKQQNQTNTPFTVKVKPANKSLVDPILCNSIDEIPVRYD

EPVYFDIETEEDKPVLVSVYQPHFGNKVYCLNLLREKPARFKEWFLKFSEIRGWGL

DFDLKILGYTYEQLKNKKIVDVQLAIKVQHYERFKQGGTKGEGFRLDEVARDLLG

IEYPMDKSKIRMTFRNNMFSSFSYEQLLYASLDAYIPHLLYERLSSSTLNSLVYQID

QEVQKIVVETSQHGMPVKLQALEEEIHRLLQIKNQIQKEIPFNYNSPQQTAKFFGV

NSSSKDVLMDLVLKGNEMAKKVLEARQVEKSLAFAKDLYDLAKRSGGRIYGNFF

TTTAPSGRMSCSDINLQQIPRRLRQFIGFETEDKKLITADFPQIELRLAGVIWNEPEFI

NAFRKGLDLHKLTASILFEKNIEEVSKEERQIGKSANFGLIYGISPRGFAEYCISNGI

NMTEEMAVEIVRKWKKFYRKIAEQHQLAYERFKYDEYVDNETWLNRPYRAYKP

QDLLNYQIQGSGAELFKKAIILIKEVRPDLKLVNLVHDEIVAEALTDEAEDIAMLIK

QKMEEAWDYCLEKAKEFGNKVSEIKLDIEKPNISHVWERE

SEQ ID NO. 23 Putative viral gene product. Locus tag Ga0079997_11796739

Length: 1608, Type: Protein, Source: Synthetic

MKSISFSELFVKIGLVSETDDGYTFNDYVLVLSQTPEGTVLKEIREAFLGFNETDKE

RWVKEFEEYCKEARERNRYYLSLFAEKRNSQDYLKRTKVAIRIDIDEPLKLEQVLE

IVNNGDLIPIPPTHLLRTIKGWHIFYITKDFIENEDKEVIYLIHSYTEELKTHLRKYAD

KIDHTYQIATRYSTEIYELREPYTKEELLKAINDYFGVEIQVNGLIVKRKDCSGVPV

SQLSEGLALTLWNACPVLRSLEERWETHTYHEWFILSWKHAFLYVLTQKEEYRQE

FLQKSKLWKGKVVITPEQQFQNTLKWMLKDRETLPYFSCSFVYKYVADAGEKCE

KCQYARWVFDENGERKLISNWFRDLFYLETRLEGFRVDEKRNLWVKEDTGEPVC

ELFKIEDVVLYNKPNRKEKYIKIFYRDKYEFIPYVLTASANTDFSEFIVLTFYNQQLF

KYLLNKYLTLFQLARGVREIDKAGYKYNDLKRKWDMVVANMGSFRAEDLNFYM

WNDRTNRLNYYIPIMNGSFETWKNTYRRVVKAKDPIMLLLLGHFISHITKEYFRDK

FVASSEPNVLIFLRGFTTAGKTTRLRIASALYGTPQVIQITETTTAKILREFGNIGMPL

PLDEFKMRKDKEEEVANMIYAIANEASKDTAYERFNPIQVPVVFSGEKNALSVEK

LCANREGLYRRSIVLDVDELPKQKNSALIDFYTSELLPILKYNHGYIFKLIDFIENNL

DIEALTQLYKDVEILKDEFDKRKSKALRGIVKSLDNHLKLIFASIHVFLEFLDLSEEE

KAEVFAILEEYIRNVLAKFYDTLLPKEENKLSKIVDYLRDLADGLYNASNNPIKKT

TIRGLTLKKLIDVAGVQVPTTDIEPYVKMLFMRYYESKKGYVYLGSIFVEGRNPA

WFEGMVAREYERLIYIKQHYPELYRSILEVFAELMLSIHGEAGLRRVHSIFVESFKF

DDLKDFLNNNNDDNTPPDDLPPNGGDDDDTPPDDLPPTEEFDYENEEDEEDEEEE

DELNEHFAGEDGLTTPKMMNIQKSILKPQPKALVEPVLCNSIDEIPAKFNEPIYFDL

ETDEDRPVLASIYQPHFERKVYCLNLLKEKPTRFKEWLLKFSEIRGWGLDFDLRVL

GYTYEQLKDKKIVDVQLAIKVQHYERFRQNGTKGEGFRLDDVARDLFGIEYPMD

KSKIRTTFKQNMYNTFSEQQLLYASLDAYIPHLLYEQLSSSTLNSLVYQLDQTAQK

IVVETSQHGMPVKLKALEEEIYRLTQLRNQMQKEIPFNYNSPKQTAKFFGLDSSSK

DVLMDLALQGNEMAKKVLEARQIEKSLTFAKDLYDLAKKSGGRIYGNFFTTTAPS

GRMSCSDINLQQIPRRLRQFIGFDTEDKKLITADFPQIELRLAGVIWNEPKFIEAFRQ

GIDLHKLTASILFDKQSIDEVSKEERQIGKSANFGLIYGISPRGFAEHCITNGINITEE

QAYEIVKKWKKYYTKITEQHQIAYERFKYNEYVDNETWLNRTYRAYKPQDLLNY

QIQGSGAELFKKAIILLKQEEPSLKIVNLVHDEIVVEADSKDAQDLAKLIKEKMEEA

WDWCLEKAEEFGNRVAKIKLEVEEPHVGEVWEKG

SEQ ID NO. 24 Core family A polymerase OS-1622

Length: 576, Type: Protein, Source: Synthetic

NIPKPILKPQPKALVEPVLCDSVDEIPTKFNEPIYFDLETDGDRPVLASIYQPHFERK

VYCLNLLKEKPTRFKEWLLKFSEIRGWGLDFDLRALGYTYEQLRDKKIVDVQLAI

KVQHHERFKQNGTKGEGFRLDDVARDLLGIEYPMDKTKIRETFKNNIFHSFSNEQL

LYASLDAYIPHLLYEQLTSSTLNSLVYQLDQQAQKIVVETSQNGMPVKLKALEEEI

HRLTQLRNQMQKEIPFNYNSPKQTAKFFRVDSSSKDVLMDLALQGNEMAKRVLE

ARQVEKSLAFAKDLYDIAKRSGGRVYGNFFTTTAPSGRMSCSDINLQQIPRRLRQF

IGFDTEDKRLITADFPQIELRLAGVIWNESEFIEAFKQGIDLHKLTASILFEKNIEEVG

KEERQIGKSANFGLIYGIAPKGFAEYCITNGINMTEEQAYEIVRKWKKYYTKIAEQ

HQVAYERFKYNEYVDNETWLNRTYRAWKPQDLLNYQIQGSGAELFKKAIVLLKE

AKPDLKIVNLVHDEIVVEADSKEAQDLAKLIKEKMEEAWDWCLEKAEEFGNRVA

KIKLEVEQPNVGDTWEKS

SEQ ID NO. 25 Core family A polymerase OP-2605

Length: 577, Type: Protein, Source: Synthetic

NTTTLSVKQEVKSLVKPVVCDSIDKIPAKFDEPVYFDLETDNDKPVLASIYQSHFG

HDVYCLNLLKEKPARLKDWLLKFSEIRGWGLDYDLRVLGYTYEQLKDKKIVDVQ

LAIKVQHYERFRQNGAKGEGFKLDDVARDLLGIEYPMDKTKIRTTFKQNMYNSFN

KDQLLYASLDAYIPHLLYEQLSSNTLNSLVYQLDQQVQKIGIETSQHGLPVRLQAL

QEEIDKLSQIKKRIQKEIPFNYNSPKQTTQYLGIDSSSKDVLMDLALKGNELAKKIL

EARQIEKALTFAKDLYDLAKRNNGRIYGNFFTTTAPSGRMSCSDINLQQIPRKLRPF

IGFETEDKKLITADFPQIELRLAGVIWNEPKFIEAFNQGIDLHKLTASILFDKRSVDE

VSKEERQIGKSANFGLIYGISPKGFAEYCITNGINMTEEIAYEIVKKWKKYYTKITE

QHQVAYERFKYGEYVDNETWLNRTYRAYKPQDLLNYQIQGSGAELFKKAIILLKE

EEPSVKIVNLVHDEIVVEADSKDAQDVANLIKEKMGQAWDYCLDKAKEFGNRVA

EIKLEVEEPNVSEVWEKG

SEQ ID NO. 26 Core family A polymerase CS-2729

Length: 577, Type: Protein, Source: Synthetic

NTPFTVKVKPANKSLVDPILCNSIDEIPVRYDEPVYFDIETEEDKPVLVSVYQPHFG

NKVYCLNLLREKPARFKEWFLKFSEIRGWGLDFDLKILGYTYEQLKNKKIVDVQL

AIKVQHYERFKQGGTKGEGFRLDEVARDLLGIEYPMDKSKIRMTFRNNMFSSFSY

EQLLYASLDAYIPHLLYERLSSSTLNSLVYQIDQEVQKIVVETSQHGMPVKLQALE

EEIHRLLQIKNQIQKEIPFNYNSPQQTAKFFGVNSSSKDVLMDLVLKGNEMAKKVL

EARQVEKSLAFAKDLYDLAKRSGGRIYGNFFTTTAPSGRMSCSDINLQQIPRRLRQ

FIGFETEDKKLITADFPQIELRLAGVIWNEPEFINAFRKGLDLHKLTASILFEKNIEEV

SKEERQIGKSANFGLIYGISPRGFAEYCISNGINMTEEMAVEIVRKWKKFYRKIAEQ

HQLAYERFKYDEYVDNETWLNRPYRAYKPQDLLNYQIQGSGAELFKKAIILIKEV

RPDLKLVNLVHDEIVAEALTDEAEDIAMLIKQKMEEAWDYCLEKAKEFGNKVSEI

KLDIEKPNISHVWERE

SEQ ID NO. 27 Core family A polymerase PS-6739

Length: 577, Type: Protein, Source: Synthetic

NIQKSILKPQPKALVEPVLCNSIDEIPAKFNEPIYFDLETDEDRPVLASIYQPHFERK

VYCLNLLKEKPTRFKEWLLKFSEIRGWGLDFDLRVLGYTYEQLKDKKIVDVQLAI

KVQHYERFRQNGTKGEGFRLDDVARDLFGIEYPMDKSKIRTTFKQNMYNTFSEQQ

LLYASLDAYIPHLLYEQLSSSTLNSLVYQLDQTAQKIVVETSQHGMPVKLKALEEE

IYRLTQLRNQMQKEIPFNYNSPKQTAKFFGLDSSSKDVLMDLALQGNEMAKKVLE

ARQIEKSLTFAKDLYDLAKKSGGRIYGNFFTTTAPSGRMSCSDINLQQIPRRLRQFI

GFDTEDKKLITADFPQIELRLAGVIWNEPKFIEAFRQGIDLHKLTASILFDKQSIDEV

SKEERQIGKSANFGLIYGISPRGFAEHCITNGINITEEQAYEIVKKWKKYYTKITEQH

QIAYERFKYNEYVDNETWLNRTYRAYKPQDLLNYQIQGSGAELFKKAIILLKQEEP

SLKIVNLVHDEIVVEADSKDAQDLAKLIKEKMEEAWDWCLEKAEEFGNRVAKIKL

EVEEPHVGEVWEKG
