Title:
A HIGHLY ERROR-PRONE ORTHOGONAL DNA REPLICATION SYSTEM FOR TARGETED CONTINUOUS EVOLUTION IN VIVO
Document Type and Number:
WIPO Patent Application WO/2019/079775
Kind Code:
A2
Abstract:
The invention provides compositions comprising highly error-prone polymerases and methods of using the polymerases for rapid evolution of a nucleic acid sequence within host cells. The invention further provides a versatile synthetic biology platform for manipulating DNA replication inside a cell. The invention also provides a mutually orthogonal replication system for manipulating and tuning DNA replication of multiple nucleic acid molecules using the error-prone polymerases.

Inventors:
LIU CHANG (US)
RAVIKUMAR ARJUN (US)
ARZUMANYAN GARRI (US)
JAVANPOUR ALEX (US)
Application Number:
PCT/US2018/056794
Publication Date:
April 25, 2019
Filing Date:
October 19, 2018
Assignee:
UNIV CALIFORNIA (US)
International Classes:
C12N15/85; C40B40/08
Attorney, Agent or Firm:
HARRIMAN, JD (US)
Claims:
CLAIMS

What Is Claimed Is:

1. A host cell comprising a mutant polymerase and a template molecule, wherein said mutant polymerase is capable of replicating said template molecule at a mutation rate that, in the case where said mutation rate is applied to the genomic DNA molecules of said host cell, leads to extinction of said host cell.

2. The host cell of claim 1, wherein the mutant polymerase is a DNA polymerase.

3. The host cell of claim 1, wherein the template molecule is a non-genomic nucleic acid molecule.

4. The host cell of claim 1, wherein the template molecule is a plasmid.

5. The host cell of claim 1, wherein the mutation rate of plasmid replication does not decrease after culturing of said host cell.

6. The host cell of claim 1, wherein the mutation rate of genomic DNA molecules of said host cell is unaffected by the mutant polymerase.

7. The host cell of claim 1, wherein the mutant polymerase replicates the plasmid at a mutation rate with an increase of a range of at least 1.05-fold to 100,000-fold compared to the mutation rate of the genome of said host cell.

8. The host cell of claim 1, wherein the mutant polymerase replicates the plasmid at a mutation rate with an increase of a range of at least 2-fold to 10,000-fold compared to the parental wild-type polymerase.

9. The host cell of claim 1, wherein the mutant polymerase has a mutation rate range of 1×10⁻⁹ substitution mutations per nucleotide to 1×10⁻⁵ substitution mutations per nucleotide.

10. The host cell of claim 1, wherein the mutant polymerase is a variant of TP-DNAP1 and further wherein the template molecule is a p1 plasmid.

11. The host cell of claim 1, wherein the mutant polymerase comprises a variant of the polypeptide sequence as set forth in SEQ ID NO:1.

12. The host cell of claim 1, wherein the mutant polymerase is a variant of TP-DNAP2 and further wherein the template molecule is a p2 plasmid.

13. The host cell of claim 1, wherein the mutant polymerase comprises a variant of the polypeptide sequence as set forth in SEQ ID NO:13.

14. The host cell of claim 1, wherein the mutant polymerase is encoded in a nucleic acid molecule.

15. The host cell of claim 1, wherein said host cell is a yeast cell.

16. A mutant polymerase, wherein the mutant polymerase is a variant of TP-DNAP1 comprising the amino acid sequence as set forth in SEQ ID NO:1 and said mutant polymerase replicates a p1 plasmid at a mutation rate with an increase of a range of at least 25-fold to 10,000-fold compared to the parental TP-DNAP1.

17. The mutant polymerase of claim 16, wherein said polymerase comprises at least two amino acid mutations relative to the parental polypeptide sequence.

18. The mutant polymerase of claim 16, wherein said polymerase has at least three amino acid mutations relative to the parental polypeptide sequence.

19. The mutant polymerase of claim 16, wherein said polymerase has at least four amino acid mutations relative to the parental polypeptide sequence.

20. The mutant polymerase of claim 16, wherein said polymerase has at least one mutation in a region selected from the group consisting of Exo I, Exo II, Exo III, pre-(S/T)Lx2h, (S/T)Lx2h, Motif A, Motif B, Motif C, pre-Motif B, Tx2G/AR and KxY.

21. The mutant polymerase of claim 20, wherein said polymerase comprises the polypeptide sequence as set forth in SEQ ID NO:1 further comprising at least one mutation at an amino acid residue selected from the group consisting of: I287, S289, N291, L295, Y296, E298, S305, I301, K302, T303, F304, I307, D308, N309, T310, I311, T312, Y313, Y316, I327, S330, D333, K344, T352, C354, F355, Y360, K365, N371, I372, C376, Y382, K384, V385, G387, R395, V405, V406, D407, G410, E411, L412, N413, I414, S415, I420, A421, G425, G426, H430, Y431, P439, N450, D460, G461, L473, L474, L477, N479, S481, K492, T493, H497, K511, S514, S533, I549, E550, C556, R557, N558, L561, S564, E569, A573, V574, E575, F578, L592, A599, N611, K612, E613, D614, F616, M618, E620, A621, L622, C627, V630, N631, C639, L640, K643, L645, A648, S649, F652, Y653, Q655, P656, R662, S664, D669, E670, I673, Y675, R677, T679, N681, R682, N683, N684, N687, R692, S693, H694, N695, K696, T698, E703, E704, S705, T706, I708, A709, N725, I729, S733, K769, N773, V774, I775, I777, I778, M779, S781, L782, W783, K785, A787, W790, V791, D804, W814, I824, Y829, S831, P833, M848, K849, I851, K857, E858, E861, C862, S865, D866, S869, F871, V872, H873, K874, V897, L900, L909, K934, S955, D958, K959, K962, K967 and F968.

22. The mutant polymerase of claim 20, wherein said polymerase comprises the polypeptide sequence as set forth in SEQ ID NO:1 further comprising at least two mutations at an amino acid residue selected from the group consisting of: N282, I287, S289, N291, L295, Y296, E298, S305, I301, K302, T303, F304, I307, D308, N309, T310, I311, T312, Y313, Y316, I327, S330, D333, K344, T352, C354, F355, Y360, K365, N371, I372, C376, Y382, K384, V385, G387, R395, V405, V406, D407, G410, E411, L412, N413, I414, S415, I420, A421, G425, G426, H430, Y431, P439, N449, N450, D460, G461, L473, L474, L477, N479, S481, K492, T493, H497, K511, S514, S533, I549, E550, C556, R557, N558, L561, S564, E569, A573, V574, E575, F578, L592, A599, N611, K612, E613, D614, F616, M618, E620, A621, L622, C627, V630, N631, C639, L640, K643, L645, A648, S649, F652, Y653, Q655, P656, R662, S664, D669, E670, I673, Y675, R677, T679, N681, R682, N683, N684, N687, R692, S693, H694, N695, K696, T698, E703, E704, S705, T706, I708, A709, N725, I729, S733, K769, N773, V774, I775, I777, I778, M779, S781, L782, W783, K785, A787, W790, V791, D804, W814, I824, Y829, S831, P833, M848, K849, I851, K857, E858, E861, C862, S865, D866, S869, F871, V872, H873, K874, V897, L900, L909, K934, S955, D958, K959, K962, K967 and F968.

23. The nucleic acid molecule of claim 14, wherein said nucleic acid molecule encodes a mutant polymerase of any of claims 20-22.

24. The nucleic acid molecule of claim 23, wherein the nucleic acid molecule comprises a nucleotide sequence selected from the group consisting of SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, and sequences having at least 90% identity thereto and encoding the same amino acid sequence.

25. A mutant polymerase, wherein the mutant polymerase is a variant of TP-DNAP2 comprising the amino acid sequence as set forth in SEQ ID NO: 13 and said mutant polymerase replicates a p2 plasmid at a mutation rate with an increase of a range of at least 2-fold to 14-fold compared to the parental TP-DNAP2.

26. The mutant polymerase of claim 25, wherein said polymerase comprises at least one amino acid mutation relative to the parental polypeptide sequence.

27. The mutant polymerase of claim 25, wherein said polymerase comprises at least two amino acid mutations relative to the parental polypeptide sequence.

28. The mutant polymerase of claim 25, wherein said polymerase has at least three amino acid mutations relative to the parental polypeptide sequence.

29. The mutant polymerase of claim 25, wherein said polymerase has at least four amino acid mutations relative to the parental polypeptide sequence.

30. The mutant polymerase of claim 25, wherein said polymerase has at least one mutation in a region selected from the group consisting of Exo I, Exo II, Exo III, pre-(S/T)Lx2h, (S/T)Lx2h, Motif A, Motif B, Motif C, pre-Motif B, Tx2G/AR and KxY.

31. The mutant polymerase of claim 30, wherein said polymerase comprises the polypeptide sequence as set forth in SEQ ID NO: 13 further comprising at least one mutation at an amino acid residue selected from the group consisting of: S370, Y424, L474 and F882.

32. The mutant polymerase of claim 20, wherein said polymerase comprises the polypeptide sequence as set forth in SEQ ID NO:1 further comprising at least two mutations at an amino acid residue selected from the group consisting of: S370, Y424, L474 and F882.

33. The nucleic acid molecule of claim 14, wherein said nucleic acid molecule encodes a mutant polymerase of any of claims 30-32.

34. The nucleic acid molecule of claim 33, wherein the nucleic acid molecule comprises a nucleotide sequence selected from the group consisting of SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, and sequences having at least 90% identity thereto and encoding the same amino acid sequence.

35. A method of generating a nucleic acid molecule comprising at least one mutation relative to a parental nucleic acid sequence, the method comprising contacting a template nucleic acid molecule with a mutant polymerase of any of claims 16-22 or 25-32, wherein the template nucleic acid molecule is replicated by the mutant polymerase.

36. The method of claim 35, wherein the template molecule is a non-genomic nucleic acid molecule.

37. The method of claim 36, wherein the mutant polymerase replicates the template nucleic acid molecule, but does not affect the mutation rate of genomic DNA molecules.

38. The method of claim 36, wherein the replication occurs within a host cell.

39. The method of claim 36, wherein the template molecule is a plasmid.

40. The method of claim 36, wherein the mutant polymerase is a variant of TP-DNAP1 and further wherein the template molecule is a p1 plasmid.

41. The method of claim 36, wherein the mutant polymerase is a variant of TP-DNAP2 and further wherein the template molecule is a p2 plasmid.

42. A method of generating a nucleic acid molecule comprising at least one mutation relative to a parental nucleic acid sequence, the method comprising culturing the host cell of any of claims 1-15 for a period of time sufficient for at least one cycle of DNA replication.

43. A method of specifically replicating at least one target nucleic acid molecule in the presence of one or more additional nucleic acid molecules, the method comprising contacting a target nucleic acid molecule with a polymerase specific for the replication of the target nucleic acid molecule, wherein the polymerase is orthogonal to the one or more additional nucleic acid molecules, such that the target nucleic acid molecule is replicated, but not the one or more additional nucleic acid molecules.

44. The method of claim 42, comprising a method of specifically replicating two different target nucleic acid molecules in the presence of a host genomic nucleic acid molecule, the method comprising:

contacting a first target nucleic acid molecule with a first polymerase specific for the replication of the first target nucleic acid molecule, wherein the first polymerase is orthogonal to the second target nucleic acid molecule and also to the host genomic nucleic acid molecule, and

contacting a second target nucleic acid molecule with a second polymerase specific for the replication of the second target nucleic acid molecule, wherein the second polymerase is orthogonal to the first target nucleic acid molecule and also to the host genomic nucleic acid molecule,

such that the first target nucleic acid molecule is replicated by the first polymerase, but not the second polymerase, and the second target nucleic acid molecule is replicated by the second polymerase, but not the first polymerase, further wherein neither the first nor the second polymerase replicate the host genomic nucleic acid molecule.

45. A polynucleotide library produced by mutating a nucleic acid molecule according to any of the methods of claims 35-48.

46. Cells produced by mutating a nucleic acid molecule according to any of the methods of claims 35-48.

Description:
A HIGHLY ERROR-PRONE ORTHOGONAL DNA REPLICATION SYSTEM FOR TARGETED CONTINUOUS EVOLUTION IN VIVO

This patent application claims priority to United States Provisional Patent Application Number 62/574,850 filed on Oct. 20, 2017 and Application Number 62/679,059 filed on June 1, 2018, which are incorporated by reference herein, in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant No. 1DP2GM119163-01 awarded by the National Institutes of Health (NIH); Grant No. MCB 1545158 awarded by the National Science Foundation (NSF); and Grant No. HR0011-15-2-0031 and Grant No. DARPA-14-49-AS-BRICS-FP-017 issued by the Defense Advanced Research Projects Agency (DARPA). The government has certain rights in the invention.

SEQUENCE LISTING

A sequence listing has been submitted with this invention and is incorporated by reference herein, in its entirety.

BACKGROUND OF THE SYSTEM

[0001] The subject matter described herein relates generally to orthogonal nucleic acid replication systems utilizing mutant polymerases to continuously mutate user-defined nucleic acids in vivo.

[0002] Living organisms are continuously evolving systems amenable to massive parallelization. But their laboratory application to the generation of new function from specific genes suffers from two constraints. First, organismal evolution is slow, as there is an inverse relationship between genome size and mutation rate. Second, natural evolution is untargeted, because mutations occur randomly across an organism's entire genome.

[0003] Historically, the field of laboratory evolution has sidestepped these constraints by implementing mutagenesis ex vivo, where mutation rate and the gene(s) subject to mutation can be controlled in a PCR. However, in this paradigm, evolution is neither continuous nor capable of massive parallelization, since ex vivo diversification requires labor-intensive DNA extraction, mutation, and transformation steps for each round of evolution.

[0004] There is thus a need in the art for novel orthogonal nucleic acid replication systems having altered capacity for generating mutations in vivo. The present invention addresses this unmet need in the art.

SUMMARY

[0005] Provided herein are embodiments of systems, apparatuses and methods for creating and using orthogonal nucleic acid replication systems utilizing mutant polymerases to continuously mutate user-defined genes in vivo. These orthogonal nucleic acid replication systems are separate from genomic nucleic acid replication systems and the mutant polymerases do not contribute to the generation of mutations in genomic nucleic acids. Rather these mutant polymerases generate mutations in plasmids during replication of said plasmids. Multiple, independent plasmids can each be replicated by their own dedicated mutant polymerase to create multiple, mutually orthogonal replication systems that operate together in vivo.

[0006] In one embodiment, the invention relates to a mutant polymerase, wherein said mutant polymerase comprises an altered mutation rate compared to a naturally occurring polymerase.

[0007] In one embodiment, the mutation rate is at least 4.2×10⁻⁶ mutations per nucleotide, which is the extinction threshold of the host cell (Saccharomyces cerevisiae), and which would result in the death of the host cell if used to generate mutations in genomic nucleic acids.

[0008] In one embodiment, the mutation rate is at least 1.64×10⁻⁷ mutations per nucleotide, which would result in an unstable host cell (Saccharomyces cerevisiae) population if used to generate mutations in genomic nucleic acids.

[0009] In one embodiment, the mutation rate is at least 1×10⁻⁵ mutations per nucleotide.

[0010] In one embodiment, the mutation rate is at least 1×10⁻⁸ mutations per nucleotide.

[0011] In one embodiment, the mutation rate is increased by at least 25-fold, at least 100-fold, at least 1000-fold or at least 10,000-fold compared to a naturally occurring polymerase.

[0012] In one embodiment, the method replicates and introduces mutations into a target nucleic acid molecule using a mutant polymerase with an altered mutation rate which is sustained for at least 90 generations of cell replication. In one embodiment, the polymerase is a DNA polymerase.
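The arithmetic behind the rate thresholds and fold increases recited in paragraphs [0007] to [0011] can be illustrated with a short Python sketch. The two thresholds are the values quoted above. The genome size of roughly 1.2×10⁷ base pairs, the 1,000-nucleotide target length and the reference rate of 1×10⁻⁹ are assumptions introduced only for illustration and are not taken from this disclosure.

```python
# Thresholds quoted in paragraphs [0007] and [0008], in mutations per nucleotide.
EXTINCTION_THRESHOLD = 4.2e-6
INSTABILITY_THRESHOLD = 1.64e-7

# Assumption for illustration only: approximate haploid S. cerevisiae genome size.
ASSUMED_GENOME_SIZE_BP = 12_000_000


def fold_increase(mutant_rate: float, reference_rate: float) -> float:
    """Return how many times higher mutant_rate is than reference_rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return mutant_rate / reference_rate


def classify_rate(rate: float) -> str:
    """Describe what a per-nucleotide rate would mean if applied to the genome."""
    if rate >= EXTINCTION_THRESHOLD:
        return "at or above extinction threshold"
    if rate >= INSTABILITY_THRESHOLD:
        return "at or above instability threshold"
    return "below both thresholds"


def expected_mutations(rate: float, length_bp: int) -> float:
    """Expected number of mutations per replication of a template of length_bp."""
    return rate * length_bp


if __name__ == "__main__":
    plasmid_rate = 1e-5          # rate recited in paragraph [0009]
    hypothetical_reference = 1e-9  # hypothetical reference rate, illustration only
    print(classify_rate(plasmid_rate))
    print(fold_increase(plasmid_rate, hypothetical_reference))
    print(expected_mutations(plasmid_rate, 1_000))
    print(expected_mutations(EXTINCTION_THRESHOLD, ASSUMED_GENOME_SIZE_BP))
```

Under the stated genome-size assumption, the extinction threshold corresponds to about 50 mutations per genome copy per replication, whereas the same order of rate confined to a 1,000-nucleotide plasmid-borne target yields about 0.01 mutations per replication of that target. This is the sense in which a rate that would be lethal for the genome remains usable on an orthogonal plasmid.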

[0013] In one embodiment, the mutant polymerase comprises at least one amino acid mutation relative to the parental polypeptide sequence.

[0014] In one embodiment, the polymerase comprises at least two amino acid mutations relative to the parental polypeptide sequence.

[0015] In one embodiment, the polymerase has at least three amino acid mutations relative to the parental polypeptide sequence.

[0016] In one embodiment, the polymerase has at least four amino acid mutations relative to the parental polypeptide sequence.

[0017] In one embodiment, the polymerase has at least one mutation in a region selected from the group consisting of Exo I (a.a. 352-362 in TP-DNAP1, a.a. 366-376 in TP-DNAP2), Exo II (a.a. 422-450 in TP-DNAP1, a.a. 416-430 in TP-DNAP2), Exo III (a.a. 550-563 in TP-DNAP1), pre-(S/T)Lx2h (a.a. 463-483 in TP-DNAP1, a.a. 464-481 in TP-DNAP2), (S/T)Lx2h (a.a. 488-493 in TP-DNAP1), Motif A (a.a. 640-650 in TP-DNAP1), Motif B (a.a. 776-787 in TP-DNAP1), Motif C (a.a. 862-871 in TP-DNAP1, a.a. 874-883 in TP-DNAP2), pre-Motif B (a.a. 748-759 in TP-DNAP1), Tx2G/AR (a.a. 840-846 in TP-DNAP1) and KxY (a.a. 914-917 in TP-DNAP1).
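The region boundaries listed in paragraph [0017] lend themselves to a simple lookup. The following Python sketch, offered only as an illustration, encodes the TP-DNAP1 coordinates given above and reports which region, if any, contains a substitution written in the usual wild-type residue, position, replacement notation (for example L640Y). The function names and the table layout are hypothetical and are not part of the disclosure.

```python
import re
from typing import Optional, Tuple

# TP-DNAP1 region boundaries (inclusive, 1-based) as listed in paragraph [0017].
TP_DNAP1_REGIONS = {
    "Exo I": (352, 362),
    "Exo II": (422, 450),
    "pre-(S/T)Lx2h": (463, 483),
    "(S/T)Lx2h": (488, 493),
    "Exo III": (550, 563),
    "Motif A": (640, 650),
    "pre-Motif B": (748, 759),
    "Motif B": (776, 787),
    "Tx2G/AR": (840, 846),
    "Motif C": (862, 871),
    "KxY": (914, 917),
}

# One-letter wild-type residue, position, one-letter replacement, e.g. "I777K".
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'I777K' into ('I', 777, 'K')."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def region_of(label: str) -> Optional[str]:
    """Return the name of the TP-DNAP1 region containing the mutated position."""
    _, position, _ = parse_mutation(label)
    for name, (start, end) in TP_DNAP1_REGIONS.items():
        if start <= position <= end:
            return name
    return None  # position lies outside every listed region


if __name__ == "__main__":
    for label in ("L640Y", "I777K", "C862I", "G410H"):
        print(label, "->", region_of(label))
```

A position that falls outside every listed range, such as 410, returns `None`, which reflects that the listed regions do not cover the whole open reading frame.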

[0018] In one embodiment, the mutant polymerase comprises the polypeptide sequence of TP-DNAP1 as set forth in SEQ ID NO:1 further comprising at least one mutation selected from the group consisting of: I287M, S289K, N291K, N291M, N291W, L295Q, Y296K, E298R, E298G, S305F, E298H, E298I, E298F, E298P, E298S, E298Y, E298V, I301A, I301C, I301E, I301L, I301K, I301M, I301S, I301Y, I301V, K302A, K302G, T303H, T303L, T303M, T303W, F304A, F304R, F304Y, S305N, S305G, I307L, D308Q, N309C, N309K, B UG, T310N, T310D, T310E, T310H, T310F, T310W, I311A, I311N, I311D, I311G, I311M, T312E, T312Q, Y313H, Y313M, Y313F, Y313W, Y313V, Y316R, I327Q, S330R, D333A, D333N, D333M, D333T, D333V, K344R, T352E, C354F, F355M, Y360F, K365N, N371A, N371C, N371M, I372C, C376I, Y382A, Y382V, K384D, V385W, G387H, R395P, V405I, V406C, V406Q, D407V, D407Q, D407A, D407C, D407G, D407H, D407M, D407F, D407T, G410H, G410Y, G410R, G410E, G410Q, G410I, G410L, G410K, G410M, G410F, G410S, G410T, G410W, E411A, E411R, E411N, E411D, E411G, E411L, E411K, E411S, E411T, L412V, N413E, N413H, N413T, N413I, I414H, I414F, S415T, S415V, I420Y, A421N, A421S, N423R, N423D, N423Q, N423E, N423W, N423Y, G425S, G426C, G426A, G426K, G426P, G426T, Y427A, Y427N, Y427D, Y427G, Y427H, Y427K, H430Q, Y431H, P439A, N450P, D460G, G461K, L473Y, L474F, L474W, L477I, L477T, L477V, N479P, S481F, K492T, K492V, T493P, H497N, K511H, S514R, S533I, I549L, E550D, C556G, C556T, R557A, R557C, N558S, L561I, S564A, E569A, E569K, E569P, A573T, V574F, V574L, E575L, F578I, L592M, A599S, N611Q, N611P, K612A, K612Q, K612S, K612V, E613D, D614L, F616E, M618V, E620D, E620R, A621K, A621S, L622I, C627V, V630K, V630Y, N631A, C639T, C639I, C639V, C639Y, L640A, L640N, L640D, L640G, L640K, L640F, L640W, L640Y, K643N, K643S, K643V, L645N, L645M, Y646A, Y646F, A648S, S649A, F652G, F652L, Y653R, Y653H, Y653L, Q655H, P656A, R662T, R662A, R662C, S664H, D669G, E670M, I673V, Y675H, Y675L, R677Q, R677L, R677M, R677S, T679A, T679R, T679E, T679Q, N681Q, N681F, R682D, R682P, N683A, N683H, N683T, N684D, N687G, R692C, R692K, R692F, R692I, R692W, R692V, S693R, S693Q, S693G, H694I, H694T, N695E, N695Q, N695H, N695F, N695S, K696C, K696G, K696M, K696S, K696T, K696Y, T698M, T698A, T698L, E703D, E703R, E703H, E703W, E704N, E704D, E704G, E704I, E704M, E704V, E704K, S705R, S705L, S705F, T706E, T706R, T706Q, T706G, T706P, T706W, I708A, I708E, I708L, I708M, I708T, I708V, I708Q, A709N, N725R, I729F, I729W, I729V, S733R, S733N, S733E, S733Q, K769G, N773Q, V774A, V774I, I775A, I777A, I777K, I777V, I778A, M779L, M779S, S781G, L782G, W783Y, K785R, K785S, A787V, W790L, W790P, V791H, D804M, W814N, I824V, I824E, Y829I, Y829F, S831T, P833C, M848N, M848D, M848H, M848P, K849H, I851L, K857S, E858H, E861R, C862I, S865G, D866N, S869T, F871I, F871Y, V872I, V872L, H873R, H873T, K874P, V897V, L900S, L909F, K934F, K934W, S955W, D958A, K959A, K959M, K962Q, K967C, F968C and F968T.

[0019] In one embodiment, the mutant polymerase comprises the polypeptide sequence of TP-DNAP1 as set forth in SEQ ID NO:1 further comprising at least two mutations selected from the group consisting of: I287M, S289K, N291K, N291M, N291W, L295Q, Y296K, E298R, E298G, S305F, E298H, E298I, E298F, E298P, E298S, E298Y, E298V, I301A, I301C, I301E, I301L, I301K, I301M, I301S, I301Y, I301V, K302A, K302G, T303H, T303L, T303M, T303W, F304A, F304R, F304Y, S305N, S305G, I307L, D308Q, N309C, N309K, B UG, T310N, T310D, T310E, T310H, T310F, T310W, I311A, I311N, I311D, I311M, T312E, T312Q, Y313H, Y313M, Y313F, Y313W, Y313V, Y316R, I327Q, S330R, D333A, D333N, D333M, D333T, D333V, K344R, T352E, C354F, F355M, Y360F, K365N, N371A, N371C, N371M, I372C, C376I, Y382A, Y382V, K384D, V385W, G387H, R395P, V405I, V406C, V406Q, D407V, D407Q, D407A, D407C, D407G, D407H, D407M, D407F, D407T, G410H, G410Y, G410R, G410E, G410Q, G410I, G410L, G410K, G410M, G410F, G410S, G410T, G410W, E411A, E411R, E411N, E411D, E411G, E411L, E411K, E411S, E411T, L412V, N413E, N413H, N413T, N413I, I414H, I414F, S415T, S415V, I420Y, A421N, A421S, N423R, N423D, N423Q, N423E, N423W, N423Y, G425S, G426C, G426A, G426K, G426P, G426T, Y427A, Y427N, Y427D, Y427G, Y427H, Y427K, H430Q, Y431H, P439A, N450P, D460G, G461K, L473Y, L474F, L474W, L477I, L477T, L477V, N479P, S481F, K492T, K492V, T493P, H497N, K511H, S514R, S533I, I549L, E550D, C556G, C556T, R557A, R557C, N558S, L561I, S564A, E569A, E569K, E569P, A573T, V574F, V574L, E575L, F578I, L592M, A599S, N611Q, N611P, K612A, K612Q, K612S, K612V, E613D, D614L, F616E, M618V, E620D, E620R, A621K, A621S, L622I, C627V, V630K, V630Y, N631A, C639T, C639I, C639V, C639Y, L640A, L640N, L640D, L640G, L640K, L640F, L640W, L640Y, K643N, K643S, K643V, L645N, L645M, Y646A, Y646F, A648S, S649A, F652G, F652L, Y653R, Y653H, Y653L, Q655H, P656A, R662T, R662A, R662C, S664H, D669G, E670M, I673V, Y675H, Y675L, R677Q, R677L, R677M, R677S, T679A, T679R, T679E, T679Q, N681Q, N681F, R682D, R682P, N683A, N683H, N683T, N684D, N687G, R692C, R692K, R692F, R692I, R692W, R692V, S693R, S693Q, S693G, H694I, H694T, N695E, N695Q, N695H, N695F, N695S, K696C, K696G, K696M, K696S, K696T, K696Y, T698M, T698A, T698L, E703D, E703R, E703H, E703W, E704N, E704D, E704G, E704I, E704M, E704V, E704K, S705R, S705L, S705F, T706E, T706R, T706Q, T706G, T706P, T706W, I708A, I708E, I708L, I708M, I708T, I708V, I708Q, A709N, N725R, I729F, I729W, I729V, S733R, S733N, S733E, S733Q, K769G, N773Q, V774A, V774I, I775A, I777A, I777K, I777V, I778A, M779L, M779S, S781G, L782G, W783Y, K785R, K785S, A787V, W790L, W790P, V791H, D804M, W814N, I824V, I824E, Y829I, Y829F, S831T, P833C, M848N, M848D, M848H, M848P, K849H, I851L, K857S, E858H, E861R, C862I, S865G, D866N, S869T, F871I, F871Y, V872I, V872L, H873R, H873T, K874P, V897V, L900S, L909F, K934F, K934W, S955W, D958A, K959A, K959M, K962Q, K967C, F968C and F968T.

[0020] In one embodiment, the polymerase comprises the polypeptide sequence as set forth in SEQ ID NO:1 further comprising at least one combination of mutations selected from the group consisting of a combination of G410H and N423Q; a combination of E411T and G426C; a combination of A599S and C639T; a combination of L622I and C639I; a combination of I777K and W814N; a combination of S781G, L782G and W783Y; a combination of K849H and K857S; a combination of S955W and K967C; a combination of L622I, C639I and I775A; a combination of L622I, C639I and I777A; a combination of L622I, C639I and I777K; a combination of L622I, C639I, I777K and W814N; a combination of L622I, C639I, S781G, L782G and W783Y; a combination of C639Y, L640A and I777A; a combination of L640A and I775A; a combination of L640A and I777A; a combination of L640A and I777K; a combination of L640A, I777K and W814N; a combination of L640A and M779L; a combination of L640N and I777A; a combination of L640G and I775A; a combination of L640G and I777A; a combination of L640G, I777K and W814N; a combination of L640Y and I777A; a combination of L640Y, I777K and W814N; a combination of L645M and I777K; a combination of L645M and M779L; a combination of F652L and I775A; a combination of Y653L and I775A; a combination of Y653L and I777A; a combination of Y653L and I777K; a combination of I775A and F871Y; a combination of I775A and L900S; a combination of I775A and L909F; a combination of I775A and K934W; a combination of I775A and F968T; a combination of I777A and F871Y; a combination of I777A and V872I; a combination of I777A and L900S; a combination of I777A and L909F; a combination of I777A and K934W; a combination of I777K, W790L and K934W; a combination of I777K and F871Y; a combination of I777K and V872I; a combination of I777K and L900S; a combination of I777K and L909F; a combination of I777K and K934W; a combination of I777K and F968C; a combination of I777K and F968T; a combination of M779L and K934W; a combination of M779L and F968C; a combination of V774I and F871Y; a combination of V774I and L900S; a combination of V774I and F968C; a combination of Y431H, L640Y, I777K and W814N; a combination of L474W, L640Y, I777K and W814N; a combination of V574F, I777K and L900S; and a combination of L477V, L640Y, I777K and W814N.
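How a combination of substitutions such as those listed in paragraph [0020] is applied to a parental polypeptide can be sketched in Python as follows. The sketch is illustrative only: SEQ ID NO:1 is not reproduced in this section, so the example uses a short, made-up toy sequence, and the function name `apply_mutations` is hypothetical. Each label is checked against the parental residue at the stated 1-based position before the replacement is made.

```python
import re
from typing import Iterable

# One-letter wild-type residue, 1-based position, one-letter replacement.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(parental: str, mutations: Iterable[str]) -> str:
    """Return a copy of `parental` carrying every substitution in `mutations`.

    Raises ValueError if a label is malformed, out of range, or names a
    wild-type residue that does not match the parental sequence.
    """
    residues = list(parental)
    for label in mutations:
        match = MUTATION_PATTERN.match(label)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3))
        if position < 1 or position > len(parental):
            raise ValueError(f"{label}: position outside the sequence")
        if parental[position - 1] != wild_type:
            raise ValueError(
                f"{label}: parental residue is {parental[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy_parental = "MKLIVWG"  # made-up toy sequence, not SEQ ID NO:1
    print(apply_mutations(toy_parental, ["L3A", "I4K"]))
```

Checking the wild-type letter before substituting guards against position offsets between numbering schemes, which matters when many single, double and higher-order combinations are built from one parental sequence.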

[0021] In one embodiment, the nucleic acid molecule comprises a nucleotide sequence selected from the group consisting of SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11, or sequences having at least 90% identity thereto and encoding the same amino acid sequence.
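The two conditions in paragraph [0021], at least 90% nucleotide identity and an unchanged encoded amino acid sequence, can be checked together as in the following Python sketch. It is a simplified illustration under stated assumptions: the sequences are taken to be already aligned and of equal length (a real comparison would first perform a sequence alignment), the standard genetic code is used, and the short example sequences are made up rather than taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_identity_and_coding(reference: str, candidate: str,
                              min_identity: float = 90.0) -> bool:
    """True if candidate is >= min_identity % identical and encodes the same protein."""
    return (percent_identity(reference, candidate) >= min_identity
            and translate(reference) == translate(candidate))


if __name__ == "__main__":
    reference = "ATGGCTAAAGGTCTGGAATTTCCGAGCTGA"    # made-up example
    synonymous = "ATGGCCAAAGGCCTCGAATTTCCGAGCTGA"   # three silent changes
    nonsynonymous = "ATGGCTGAAGGTCTGGAATTTCCGAGCTGA"  # one change, K to E
    print(meets_identity_and_coding(reference, synonymous))
    print(meets_identity_and_coding(reference, nonsynonymous))
```

The second example shows why both conditions are needed: a sequence can be well above 90% identical at the nucleotide level and still encode a different polypeptide.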

[0022] In one embodiment, the invention relates to a cell comprising a mutant TP-DNAP1 polymerase, wherein said mutant polymerase comprises an altered mutation rate as compared to a naturally occurring TP-DNAP1 polymerase. In one embodiment, the cell is a yeast cell. In one embodiment, the cell further comprises a template molecule for replication by the mutant polymerase. In one embodiment, the template molecule is a p1 plasmid.

[0023] In one embodiment, the invention relates to a cell comprising a nucleic acid molecule encoding a mutant TP-DNAP1 polymerase, wherein said mutant polymerase comprises an altered mutation rate as compared to a naturally occurring TP-DNAP1 polymerase. In one embodiment, the cell is a yeast cell. In one embodiment, the cell further comprises a template molecule for replication by the mutant polymerase. In one embodiment, the template molecule is a p1 plasmid.

[0024] In one embodiment, the mutant polymerase comprises at least one amino acid mutation relative to the parental polypeptide sequence. In one embodiment, the polymerase comprises the TP-DNAP2 polypeptide sequence as set forth in SEQ ID NO:13 further comprising at least one mutation at amino acid residue S370, Y424, F882 or L474.

[0025] In one embodiment, the polymerase comprises the polypeptide sequence as set forth in SEQ ID NO:13 further comprising at least one of a S370Q, S370P, S370R, S370E, S370K, S370L, Y424Q, Y424E, Y424K, Y424G, Y424R, L474D, L474A, L474V, F882A, F882V or F882R mutation.

[0026] In one embodiment, the invention relates to a nucleic acid molecule encoding a mutant TP-DNAP2 polymerase, wherein said mutant polymerase comprises an altered mutation rate as compared to a naturally occurring TP-DNAP2 polymerase. In one embodiment, the nucleic acid molecule comprises a nucleotide sequence of SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, or SEQ ID NO:46. In one embodiment, the nucleic acid molecule comprises a nucleotide sequence having at least 90% identity to a nucleotide sequence of SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, or SEQ ID NO:46.

[0027] In one embodiment, the invention relates to a cell comprising a mutant TP-DNAP2 polymerase, wherein said mutant polymerase comprises an altered mutation rate as compared to a naturally occurring TP-DNAP2 polymerase. In one embodiment, the cell is a yeast cell. In one embodiment, the cell further comprises a template molecule for replication by the mutant polymerase. In one embodiment, the template molecule is a p2 plasmid.

[0028] In one embodiment, the invention relates to a cell comprising a nucleic acid molecule encoding a mutant TP-DNAP2 polymerase, wherein said mutant polymerase comprises an altered mutation rate as compared to a naturally occurring TP-DNAP2 polymerase. In one embodiment, the cell is a yeast cell. In one embodiment, the cell further comprises a template molecule for replication by the mutant polymerase. In one embodiment, the template molecule is a p2 plasmid.

[0029] In one embodiment, the invention relates to a nucleic acid molecule encoding a mutant polymerase, wherein said mutant polymerase comprises an altered mutation rate compared to a naturally occurring polymerase.

[0030] In one embodiment, the invention relates to a cell comprising a mutant polymerase, wherein said mutant polymerase comprises an altered mutation rate compared to a naturally occurring polymerase, or a nucleic acid molecule encoding a mutant polymerase.

[0031] In one embodiment, the cell is a yeast cell.

[0032] In one embodiment, the cell further comprises a template molecule for replication by the mutant polymerase.

[0033] In one embodiment, the template molecule is a non-genomic nucleic acid molecule.

[0034] In one embodiment, the template molecule is a plasmid.

[0035] In one embodiment, the mutant polymerase is a mutant p1 polymerase and the template molecule is a p1 plasmid.

[0036] In one embodiment, the mutant polymerase is a mutant p2 polymerase and the template molecule is a p2 plasmid.

[0037] In one embodiment, the invention relates to a method of generating a nucleic acid molecule comprising at least one mutation relative to a parental nucleic acid sequence, the method comprising replicating a template nucleic acid molecule with a mutant polymerase, wherein said mutant polymerase comprises an altered mutation rate compared to a naturally occurring polymerase, wherein the template nucleic acid molecule is replicated by the mutant polymerase.

[0038] In one embodiment, the mutant polymerase replicates the template nucleic acid molecule, but is orthogonal to genomic nucleic acid molecules, such that the polymerase does not replicate the genomic nucleic acid molecules and does not contribute to the generation of mutations in the genomic nucleic acid molecules.

[0039] In one embodiment, the invention relates to a method of specifically replicating at least one target nucleic acid molecule in the presence of one or more additional nucleic acid molecules, the method comprising replicating a target nucleic acid molecule with a polymerase specific for the replication of the target nucleic acid molecule, wherein the polymerase is orthogonal to the one or more additional nucleic acid molecules, such that the target nucleic acid molecule is replicated, but not the one or more additional nucleic acid molecules.

[0040] In one embodiment, the method comprises specifically replicating at least two different target nucleic acid molecules in the presence of host genomic nucleic acid molecules.

[0041] In one embodiment, the method comprises the steps of a) replicating a first target nucleic acid molecule with a first polymerase specific for the replication of the first target nucleic acid molecule, wherein the first polymerase is orthogonal to the second target nucleic acid molecule and also to host genomic nucleic acid molecules, and b) replicating a second target nucleic acid molecule with a second polymerase specific for the replication of the second target nucleic acid molecule, wherein the second polymerase is orthogonal to the first target nucleic acid molecule and also to host genomic nucleic acid molecules, such that the first target nucleic acid molecule is replicated by the first polymerase, but not the second polymerase, and the second target nucleic acid molecule is replicated by the second polymerase, but not the first polymerase, further wherein neither the first nor the second polymerase replicate host genomic nucleic acid molecules.

BRIEF DESCRIPTION OF THE DRAWINGS

[0042] The following detailed description of preferred embodiments of the invention will be better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there are shown in the drawings embodiments which are presently preferred. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities of the embodiments shown in the drawings.

[0043] Figure 1 depicts a schematic diagram of the architecture of the orthogonal DNA replication system containing TP-DNAP1, TP-DNAP2, the p1 plasmid and the p2 plasmid.

[0044] Figure 2 depicts mutation rates of 65 TP-DNAP1 variants found from a homology study and a TP-DNAP1 library screen. Variants are ordered by amino acid position in the TP-DNAP1 open-reading frame. TP-DNAP1 substitution rates were measured with fluctuation tests using p1-encoded leu2 (Q180*).

[0045] Figure 3 depicts mutation rates of a representative panel of TP-DNAP1 variants and genomic substitution rates in the presence of highly error-prone variants. TP-DNAP1 substitution rates were measured with fluctuation tests using p1-encoded leu2 (Q180*). Open circles represent measurements from independent fluctuation tests, and bars denote median measurements. Genomic substitution rates were determined for strains harboring p1 and each TP-DNAP1 variant as well as for the TP-DNAP1 parent strain, AH22, which lacks p1 and TP-DNAP1. Genomic substitution rates were measured at the URA3 locus in large-scale fluctuation tests and are shown as individual measurements.

[0046] Figure 4 depicts a series of yeast genomic mutator strains spanning mutation rates from the w.t. rate to the extinction threshold (upper limit). TP-DNAP1's parent strain, AH22, was modified to express its genomic w.t. POL3 from a plasmid, and POL3 variants were introduced into w.t. or mismatch repair-deficient (Δmsh6) versions of this strain via plasmid shuffle. W.t. POL3 is retained in pre-plasmid shuffle plating controls. Genomic mutation rates were measured ~15 generations after plasmid shuffle. The projected mutation rate of the inviable pol3-01, Δmsh6 strain was calculated as the product of the mutational increases due to pol3-01 (58-fold) and Δmsh6 mutations (106-fold, averaged across genotypes). The proofreading-deficient pol3-01 allele encodes POL3 (D321A, E323A). T711A, Y808C, H879Y, and S968R are suppressor mutations that reduce the error rate of pol3-01.
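
By way of illustration only, the projection described for Figure 4 (a product of the two fold-increases) may be sketched as follows. The function and constant names are hypothetical and are not part of the disclosure. Because this passage gives no absolute baseline rate, the sketch computes only the combined fold-increase.

```python
# Illustrative sketch only; names are hypothetical, values are those quoted for Figure 4.
POL3_01_FOLD = 58    # fold increase attributed to pol3-01
MSH6_DEL_FOLD = 106  # fold increase attributed to the msh6 deletion (averaged across genotypes)


def projected_fold_increase(*folds: float) -> float:
    """Multiply individual fold-increases to obtain a combined fold-increase."""
    combined = 1.0
    for fold in folds:
        combined *= fold
    return combined


print(projected_fold_increase(POL3_01_FOLD, MSH6_DEL_FOLD))  # 6148.0
```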

[0047] Figure 5 depicts the mutational stability of viable genomic mutator strains versus TP-DNAP1. Strains harboring POL3 variants or TP-DNAP1-4-2 were passaged in triplicate for 82 or 90 generations, respectively. Afterwards, genomic or TP-DNAP1 substitution mutation rates were measured at the genomic CAN1 locus or with p1-encoded leu2 (Q180*), respectively.

[0048] Figure 6 depicts the architecture of TP-DNAP1, which consists of a fusion between the terminal protein, a 3'-5' proofreading exonuclease domain, and a DNA polymerization domain. Motifs responsible for fidelity in the exonuclease and proofreading domains are highlighted. A multiple sequence alignment between TP-DNAP1 and five closely related family B DNAPs is shown.

[0049] Figure 7 depicts a schematic diagram of the architecture of the mutually orthogonal DNA replication system.

[0050] Figure 8 depicts a chart showing that mutagenic TP-DNAP1 variants increase p1 mutation rate (left) by 380- and 870-fold, without any increase in p2 mutation rate (right).

[0051] Figure 9 depicts a chart showing that mutagenic TP-DNAP2 variants increase p2 mutation rate (right) by 16- and 29-fold, without a similar increase in p1 mutation rate (left).

DETAILED DESCRIPTION OF THE SYSTEM

[0052] Before the present subject matter is described in detail, it is to be understood that this disclosure is not limited to the particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the appended claims.

[0053] The present invention relates generally to modified polymerases having an altered error-rate with respect to a parental polymerase and uses of the polymerases to synthesize nucleic acid molecules with an increased number of mutations. In one embodiment, the modified polymerase has an increased error-rate with respect to the parental polymerase.

[0054] In one embodiment the modified polymerases of the invention function in vivo to increase the mutation rate during nucleic acid synthesis.

[0055] In one embodiment the modified polymerases of the invention have a mutation rate that is at least 4.2 × 10⁻⁶ mutations per nucleotide, which is the extinction threshold of the host cell (Saccharomyces cerevisiae), and which would result in the death of the host cell if used to generate errors in genomic nucleic acids. In one embodiment the modified polymerases of the invention have a mutation rate that is at least 1.64 × 10⁻⁷ mutations per nucleotide, which would result in an unstable host cell population if used to generate errors in genomic nucleic acids.

[0056] In one embodiment, the modified polymerase of the invention may be a DNA polymerase, an RNA polymerase, or a reverse transcriptase. In one embodiment, the polymerase is a DNA polymerase. In one embodiment, the parental polymerase may be from any organism, including, but not limited to, a mammal, a bacterium, and a yeast. In one embodiment, the parental polymerase is a Kluyveromyces lactis polymerase TP-DNAP1. In another embodiment, the parental polymerase is a Kluyveromyces lactis polymerase TP-DNAP2.

[0057] The mutant polymerases of this invention contain at least one mutation that affects the fidelity, or error-rate, of the polymerase. In one embodiment, the mutant polymerase of the invention comprises an amino acid sequence as set forth in SEQ ID NO: 1 further comprising at least one mutation selected from the mutations listed in Tables 2 and 3. In one embodiment, the mutant polymerase of the invention comprises an amino acid sequence as set forth in SEQ ID NO: 1 further comprising at least two, three, four or more than four mutations selected from the mutations listed in Tables 2 and 3. In one embodiment, the mutant polymerase of the invention comprises an amino acid sequence as set forth in SEQ ID NO: 13 further comprising at least one mutation selected from the mutations listed in Table 7. In one embodiment, the mutant polymerase of the invention comprises an amino acid sequence as set forth in SEQ ID NO: 13 further comprising at least two, three, four or more than four mutations selected from the mutations listed in Table 7.

[0058] In one embodiment, the invention relates to a mutant polymerase of the invention. In one embodiment, the invention relates to compositions comprising a mutant polymerase of the invention. In one embodiment, the invention relates to nucleic acid molecules encoding a mutant polymerase of the invention. In one embodiment, the invention relates to cells modified with a nucleic acid molecule encoding a mutant polymerase of the invention.

[0059] In one embodiment, a cell modified with a nucleic acid molecule encoding a mutant polymerase of the invention may be from any organism, including, but not limited to, a mammal, a bacterium, and a yeast. In one embodiment, the cell is a yeast cell.

[0060] In one embodiment, the invention relates to methods of using a mutant polymerase for synthesizing a nucleic acid molecule. In one embodiment, the methods comprise replicating a template nucleic acid molecule with a mutant polymerase of the invention. In one embodiment, the template nucleic acid molecule is a p1 plasmid. In one embodiment, the p1 plasmid comprises a target nucleic acid sequence that encodes a protein, peptide or RNA molecule. In one embodiment, the template nucleic acid molecule is a p2 plasmid. In one embodiment, the p2 plasmid comprises a target nucleic acid sequence that encodes a protein, peptide or RNA molecule. In one embodiment, methods comprise replicating and introducing mutations into a target nucleic acid molecule using a mutant polymerase with an altered mutation rate which is sustained for at least 90 generations of cell replication.

[0061] In one embodiment, the mutant polymerase of the invention replicates the template nucleic acid molecule, but is orthogonal to genomic nucleic acid molecules, such that the polymerase does not replicate the genomic nucleic acid molecules and does not contribute to the generation of mutations in the genomic nucleic acid molecules.

[0062] In one embodiment, the invention relates to methods of using two mutant polymerases to synthesize nucleic acid molecules in an orthogonal system. In one embodiment the methods comprise the steps of a) contacting a p1 plasmid with TP-DNAP1 for the replication of the p1 plasmid, wherein TP-DNAP1 is orthogonal to the p2 plasmid and also to host genomic nucleic acid molecules, and b) contacting a p2 plasmid with TP-DNAP2 for the replication of the p2 plasmid, wherein TP-DNAP2 is orthogonal to the p1 plasmid and also to host genomic nucleic acid molecules, such that the p1 plasmid is replicated by TP-DNAP1, but not TP-DNAP2, and the p2 plasmid is replicated by TP-DNAP2, but not TP-DNAP1, further wherein neither TP-DNAP1 nor TP-DNAP2 replicates host genomic nucleic acid molecules.

Definitions

[0063] The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element. "About" as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, is meant to encompass variations of ±20% or ±10%, more preferably ±5%, even more preferably ±1%, and still more preferably ±0.1% from the specified value, as such variations are appropriate to perform the disclosed methods.

[0064] As used herein, the terms "alteration," "defect," "variation," or "mutation" refer to a mutation in a gene in a cell that affects the function, activity, expression (transcription or translation) or conformation of the polypeptide that it encodes.

[0065] Mutations encompassed by the present invention can be any mutation of a gene in a cell that results in the enhancement or disruption of the function, activity, expression or conformation of the encoded polypeptide, including the complete absence of expression of the encoded protein and can include, for example, missense and nonsense mutations, insertions, deletions, frameshifts and premature terminations. Mutations may also be synonymous to the specific mutations in disclosed embodiments of the present invention, including amino acids that possess similar chemical characteristics to the disclosed mutations or their equivalents as known in the art.

[0066] The term "amino acid change" as used herein, refers to any mutation where the amino acid residue at a particular position in a sequence is different from that found at the corresponding location in the naturally occurring sequence. Such mutations can be nonsynonymous changes, conservative changes or non-conservative changes.

[0067] "Coding sequence" or "encoding nucleic acid" as used herein may refer to the nucleic acid (RNA or DNA molecule) that comprises a nucleotide sequence which encodes a sequence of amino acids. The coding sequence may further include initiation and termination signals operably linked to regulatory elements including a promoter and polyadenylation signal capable of directing expression in the one or more cells of a yeast cell or other eukaryotic cell wherein the nucleic acid is administered. The coding sequence may further include sequences that encode signal peptides.

[0068] The term "control" or "reference standard" describes a material comprising none, or a normal, low, or high level of one or more of the marker (or biomarker) expression products of one or more of the markers (or biomarkers) of the invention, such that the control or reference standard may serve as a comparator against which a sample can be compared.

[0069] "Increased mutation rate" refers to polymerases with mutational levels which are at least 10% or more, for example, 20%, 30%, 40%, or 50%, 60%, 70%, 80%, 90% higher or more, and/or at least 1.05 fold, 1.06 fold, 1.07 fold, 1.08 fold, 1.09 fold, 1.1 fold, 1.11 fold, 1.12 fold, 1.13 fold, 1.14 fold, 1.15 fold, 1.16 fold, 1.17 fold, 1.18 fold, 1.19 fold, 1.2 fold, 1.25 fold, 1.3 fold, 1.35 fold, 1.4 fold, 1.45 fold, 1.5 fold, 1.55 fold, 1.6 fold, 1.65 fold, 1.7 fold, 1.75 fold, 1.8 fold, 1.85 fold, 1.9 fold, 1.95 fold, 2 fold, 2.1 fold, 2.2 fold, 2.3 fold, 2.4 fold, 2.5 fold, 2.6 fold, 2.7 fold, 2.8 fold, 2.9 fold, 3 fold, 4 fold, 5 fold, 6 fold, 7 fold, 8 fold, 9 fold, 10 fold, 15 fold, 20 fold, 25 fold, 30 fold, 35 fold, 40 fold, 45 fold, 50 fold, 55 fold, 60 fold, 65 fold, 70 fold, 75 fold, 80 fold, 85 fold, 90 fold, 95 fold, 100 fold, 150 fold, 200 fold, 250 fold, 300 fold, 350 fold, 400 fold, 450 fold, 500 fold, 550 fold, 600 fold, 650 fold, 700 fold, 750 fold, 800 fold, 850 fold, 900 fold, 950 fold, 1000 fold, 1500 fold, 2000 fold, 2500 fold, 3000 fold, 3500 fold, 4000 fold, 5000 fold, 6000 fold, 7000 fold, 8000 fold, 9000 fold, or at least 10,000 fold higher or more, and any and all whole or partial increments there between than a control or naturally occurring polymerase.

[0070] "Encoding" refers to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other polymers and macromolecules in biological processes having either a defined sequence of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the biological properties resulting therefrom. Thus, a gene encodes a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system. Both the coding strand, the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings, and the non-coding strand, used as the template for transcription of a gene or cDNA, can be referred to as encoding the protein or other product of that gene or cDNA.

[0071] An "expression cassette" is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular polynucleotide sequence in a host cell. An expression cassette may be part of a plasmid, cellular genome, viral genome, or nucleic acid fragment. Typically, an expression cassette includes a polynucleotide to be transcribed, operably linked to a promoter.

[0072] The term "gene" refers to a nucleic acid (e.g., DNA) sequence that includes coding sequences necessary for the production of a polypeptide, precursor, or RNA (e.g., mRNA). The polypeptide may be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired activity or functional property (e.g., enzymatic activity, ligand binding, signal transduction, immunogenicity, etc.) of the full-length or fragment is retained. The term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 2 kb or more on either end such that the gene corresponds to the length of the full-length mRNA and 5' regulatory sequences which influence the transcriptional properties of the gene. Sequences located 5' of the coding region and present on the mRNA are referred to as 5'-untranslated sequences. The 5'-untranslated sequences usually contain the regulatory sequences. Sequences located 3' or downstream of the coding region and present on the mRNA are referred to as 3'-untranslated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

"Homologous" refers to the sequence similarity or sequence identity between two polypeptides or between two nucleic acid molecules. When a position in both of the two compared sequences is occupied by the same base or amino acid monomer subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then the molecules are homologous at that position. The percent of homology between two sequences is a function of the number of matching or homologous positions shared by the two sequences divided by the number of positions compared × 100. For example, if 6 of 10 of the positions in two sequences are matched or homologous then the two sequences are 60% homologous. By way of example, the DNA sequences ATTGCC and TATGGC share 50% homology. Generally, a comparison is made when two sequences are aligned to give maximum homology.
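
By way of illustration only, the percent homology calculation described above may be sketched as follows. The function name is hypothetical, and the sketch assumes the two sequences have already been aligned to equal length; it does not perform the alignment step that gives maximum homology.

```python
def percent_homology(seq_a: str, seq_b: str) -> float:
    """Matching positions divided by positions compared, times 100.

    Assumes the inputs are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions occupied by the same base or amino acid in both sequences.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# The example from the text: positions 3, 4 and 6 match, i.e. 3 of 6.
print(percent_homology("ATTGCC", "TATGGC"))  # 50.0
```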

[0073] "Isolated" means altered or removed from the natural state. For example, a nucleic acid or a polypeptide naturally present in a living animal is not "isolated," but the same nucleic acid or polypeptide partially or completely separated from the coexisting materials of its natural state is "isolated." An isolated nucleic acid or protein can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell.

[0074] An "isolated nucleic acid" refers to a nucleic acid segment or fragment which has been separated from sequences which flank it in a naturally occurring state, e.g., a DNA fragment which has been removed from the sequences which are normally adjacent to the fragment, e.g., the sequences adjacent to the fragment in a genome in which it naturally occurs. The term also applies to nucleic acids which have been substantially purified from other components which naturally accompany the nucleic acid, e.g., RNA or DNA or proteins, which naturally accompany it in the cell. The term therefore includes, for example, a recombinant DNA which is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., as a cDNA or a genomic or cDNA fragment produced by PCR or restriction enzyme digestion) independent of other sequences. It also includes a recombinant DNA which is part of a hybrid gene encoding additional polypeptide sequence.

[0075] "Measuring" or "measurement," or alternatively "detecting" or "detection," means assessing the presence, absence, quantity or amount (which can be an effective amount) of either a given substance within a clinical or subject-derived sample, including the derivation of qualitative or quantitative concentration levels of such substances, or otherwise evaluating the values or categorization of a subject's clinical parameters.

[0076] By the term "modulating," as used herein, is meant mediating a detectable increase or decrease in the activity and/or level of a mRNA, polypeptide, or a response in a subject compared with the activity and/or level of a mRNA, polypeptide or a response in the subject in the absence of a treatment or compound, and/or compared with the activity and/or level of a mRNA, polypeptide, or a response in an otherwise identical but untreated subject.

[0077] The term "non-conservative mutation" or "non-conservative change" as used herein applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, "non-conservative mutations" refers to those nucleic acid changes which do not encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to sequences which have different nucleotide sequences.

[0078] As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence where the alteration results in the substitution of an amino acid with a chemically dissimilar amino acid is a "non-conservative mutation".

[0079] In contrast, individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter a single amino acid or a small percentage of amino acids in the encoded sequence where the alteration results in the substitution of an amino acid with a chemically similar amino acid is a "conservative mutation".

[0080] Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are "synonymous" or "silent" variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.

[0081] The term "nucleic acid" or "polynucleotide" refers to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. The term also encompasses cDNA, RNA, DNA/RNA hybrid, antisense RNA, ribozyme, genomic DNA, synthetic forms, and mixed polymers, both sense and antisense strands, and may be chemically or biochemically modified to contain non-natural or derivatized, synthetic, or semi-synthetic nucleotide bases. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses synonymously modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs, and complementary sequences as well as the sequence explicitly indicated. For example, degenerate codon substitutions may be achieved by generating sequences in which one or more selected nucleotides that make up a codon is substituted without affecting the encoded amino acid residue (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene. An "oligonucleotide" or "polynucleotide" is a nucleic acid ranging from at least 2, preferably at least 8, 15 or 25 nucleotides in length, but may be up to 50, 100, 1000, or 5000 nucleotides long or a compound that specifically hybridizes to a polynucleotide. Polynucleotides include sequences of deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) or mimetics thereof which may be isolated from natural sources, recombinantly produced or artificially synthesized. A further example of a polynucleotide of the present invention may be a peptide nucleic acid (PNA). (See U.S. Pat. No. 6,156,501 which is hereby incorporated by reference in its entirety.) The invention also encompasses situations in which there is a nontraditional base pairing such as Hoogsteen base pairing which has been identified in certain tRNA molecules and postulated to exist in a triple helix. "Polynucleotide" and "oligonucleotide" are used interchangeably in this disclosure. It will be understood that when a nucleotide sequence is represented herein by a DNA sequence (e.g., A, T, G, and C), this also includes the corresponding RNA sequence (e.g., A, U, G, C) in which "U" replaces "T".
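
By way of illustration only, the silent (synonymous) codon variation and the DNA-to-RNA correspondence described above may be sketched as follows. The table is deliberately partial, covering only the three amino acids named in the text, and the names CODONS, to_rna and silent_alternatives are hypothetical and not part of the disclosure.

```python
# Partial codon table in the RNA alphabet; only the amino acids named in the text.
CODONS = {
    "Ala": ["GCA", "GCC", "GCG", "GCU"],
    "Met": ["AUG"],  # ordinarily the only codon for methionine
    "Trp": ["UGG"],  # ordinarily the only codon for tryptophan (TGG in DNA notation)
}


def to_rna(dna: str) -> str:
    """Return the RNA form of a DNA-notation sequence ("U" replaces "T")."""
    return dna.upper().replace("T", "U")


def silent_alternatives(codon: str) -> list[str]:
    """List the other codons in the partial table that encode the same amino acid."""
    rna = to_rna(codon)
    for codons in CODONS.values():
        if rna in codons:
            return [c for c in codons if c != rna]
    raise KeyError(f"codon {codon!r} is not in this partial table")


print(silent_alternatives("GCT"))  # ['GCA', 'GCC', 'GCG']
print(silent_alternatives("TGG"))  # []
```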

[0082] "Operably linked" as used herein may mean that expression of a gene is under the control of a promoter with which it is spatially connected. A promoter may be positioned 5' (upstream) or 3' (downstream) of a gene under its control. The distance between the promoter and a gene may be approximately the same as the distance between that promoter and the gene it controls in the gene from which the promoter is derived. As is known in the art, variation in this distance may be accommodated without loss of promoter function.

[0083] As used herein, the term "polymerase chain reaction" ("PCR") refers to the method of K. B. Mullis (U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, hereby incorporated by reference) for increasing the concentration of a segment of a target sequence in a mixture of DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one "cycle"; there can be numerous "cycles") to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the "polymerase chain reaction" (hereinafter "PCR"). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be "PCR amplified". As used herein, the terms "PCR product," "PCR fragment," "amplification product" or "amplicon" refer to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences.

[0084] "Promoter" as used herein may mean a synthetic or naturally-derived molecule which is capable of conferring, activating or enhancing expression of a nucleic acid in a cell. A promoter may comprise one or more specific transcriptional regulatory sequences to further enhance expression and/or to alter the spatial expression and/or temporal expression of same. A promoter may also comprise distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription. A promoter may be derived from sources including viruses, bacteria, fungi, plants, insects, and animals. A promoter may regulate the expression of a gene component constitutively, or differentially with respect to the cell, tissue or organ in which expression occurs or, with respect to the developmental stage at which expression occurs, or in response to external stimuli such as physiological stresses, pathogens, metal ions, or inducing agents. Representative examples of promoters include the bacteriophage T7 promoter, bacteriophage T3 promoter, SP6 promoter, lac operator-promoter, tac promoter, SV40 late promoter, SV40 early promoter, RSV-LTR promoter, and CMV IE promoter.

[0085] As used herein, the terms "peptide," "polypeptide," and "protein" are used interchangeably, and refer to a compound comprised of amino acid residues covalently linked by peptide bonds. A protein or peptide must contain at least two amino acids, and no limitation is placed on the maximum number of amino acids that can comprise a protein's or peptide's sequence. Polypeptides include any peptide or protein comprising two or more amino acids joined to each other by peptide bonds. As used herein, the term refers to both short chains, which also commonly are referred to in the art as peptides, oligopeptides and oligomers, for example, and to longer chains, which generally are referred to in the art as proteins, of which there are many types. "Polypeptides" include, for example, biologically active fragments, substantially homologous polypeptides, oligopeptides, homodimers, heterodimers, variants of polypeptides, modified polypeptides, derivatives, analogs, fusion proteins, among others. The polypeptides include natural peptides, recombinant peptides, synthetic peptides, or a combination thereof.

[0086] The term "regulating" as used herein can mean any method of altering the level or activity of a substrate. Non-limiting examples of regulating with regard to a protein include affecting expression (including transcription and/or translation), affecting folding, affecting degradation or protein turnover, and affecting localization of a protein. Non-limiting examples of regulating with regard to an enzyme further include affecting the enzymatic activity. "Regulator" refers to a molecule whose activity includes affecting the level or activity of a substrate. A regulator can be direct or indirect. A regulator can function to activate or inhibit or otherwise modulate its substrate.

[0087] A "reporter gene" encodes proteins that are readily detectable due to their biochemical characteristics, such as enzymatic activity or chemifluorescent features. One specific example of such a reporter is green fluorescent protein. Fluorescence generated from this protein can be detected with various commercially-available fluorescent detection systems. Other reporters can be detected by staining. The reporter can also be an enzyme that generates a detectable signal when contacted with an appropriate substrate. The reporter can be an enzyme that catalyzes the formation of a detectable product. Suitable enzymes include, but are not limited to, proteases, nucleases, lipases, phosphatases and hydrolases. The reporter can encode an enzyme whose substrates are substantially impermeable to eukaryotic plasma membranes, thus making it possible to tightly control signal formation. Specific examples of suitable reporter genes that encode enzymes include, but are not limited to, CAT (chloramphenicol acetyl transferase; Alton and Vapnek (1979) Nature 282: 864-869); luciferase (lux); β-galactosidase; LacZ; β-glucuronidase; and alkaline phosphatase (Toh, et al. (1980) Eur. J. Biochem. 182: 231-238; and Hall et al. (1983) J. Mol. Appl. Gen. 2: 101), each of which is incorporated by reference herein in its entirety. Other suitable reporters include those that encode for a particular epitope that can be detected with a labelled antibody that specifically recognizes the epitope.

[0088] The term "transfected" or "transformed" or "transduced" as used herein refers to a process by which exogenous nucleic acid is transferred or introduced into the host cell. A "transfected" or "transformed" or "transduced" cell is one which has been transfected, transformed or transduced with exogenous nucleic acid. The cell includes the primary subject cell and its progeny.

[0089] "Vector" as used herein may mean a nucleic acid sequence containing an origin of replication. A vector may be a plasmid, virus, bacterial artificial chromosome or yeast artificial chromosome. A vector may be a DNA or RNA vector. A vector may be either a self-replicating extrachromosomal vector or a vector which integrates into a host genome.

[0090] As used herein, the term "wild-type" refers to a gene or gene product isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the "normal" or "wild-type" form of the gene. In contrast, the term "modified" or "mutant" refers to a gene or gene product that displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics (including altered nucleic acid sequences) when compared to the wild-type gene or gene product.

[0091] Ranges: throughout this disclosure, various aspects of the invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.

Description

[0092] The present invention relates to methods and compositions for modulating the mutation rate at a sequence of interest. The methods and compositions are based on the generation of a library of polymerases that are designed to have altered mutational rates. In one embodiment, a polymerase with an increased mutation rate generates mutations in a replicated DNA molecule at a higher frequency than a naturally occurring polymerase. Consequently, described herein are compositions and methods for generating low-fidelity or error-prone polymerases, wherein the polymerase has a mutation rate that is higher than the mutation rate of a naturally occurring polymerase.

Compositions

[0093] This invention relates, in part, to mutant polymerases having at least one mutation that results in an altered mutation rate of the polymerase relative to an endogenous, wild-type polymerase. In one embodiment, the endogenous or wild-type polymerase is encoded by a parent polynucleotide.

Polymerases Used as Parent Polymerases for Mutations

[0094] In certain embodiments, the parent polynucleotide encodes a DNA polymerase. The parent polynucleotide can also encode other polymerases including, but not limited to, an RNA polymerase, or a reverse transcriptase. In one embodiment, the parent polynucleotide used in the method for generating an improved polymerase encodes a naturally occurring polymerase. In one embodiment, the parent polynucleotide used in the method for generating an improved polymerase encodes a synthetic or non-naturally occurring polymerase.

[0095] Parent polymerases that may be modified to contain mutations that increase the mutation rate of the polymerase include, but are not limited to, polymerases from organisms such as humans, yeast, bacteria, and viruses, including phage.

[0096] In certain embodiments, the parent polymerase can also be a T7 polymerase. In certain embodiments, the parent polymerase can be an endogenous low fidelity polymerase. Exemplary endogenous low fidelity polymerases include, but are not limited to, terminal deoxynucleotidyl transferase (TdT) and DNA polymerases β, ζ, κ, η, ι, λ, μ, and Rev1. In other embodiments, the parent polymerases can also be HIV RT and DNA Polymerase I.

[0097] Numerous genes encoding endogenous polymerases have been isolated and sequenced. This sequence information is available on publicly accessible sequence databases such as GENBANK. A large compilation of the amino acid sequences of polymerases from a wide range of organisms can also be found in publicly accessible sequence databases. This information may be used in designing various embodiments of mutant polymerases of the invention and polynucleotides encoding these enzymes.

[0098] Genes encoding a parent polymerase may be isolated using conventional cloning techniques in conjunction with publicly-available sequence information. Alternatively, many cloned polynucleotide sequences encoding polymerases have been deposited with publicly-accessible collection sites, e.g., the American Type Culture Collection deposit accession number ATCC 40336 is a phage clone of Taq DNA polymerase.

[0099] The parent polynucleotide can encode any polymerase known to those of skill in the art. In one embodiment, the parent polynucleotide encodes a DNA polymerase from K. lactis. In one embodiment, the parental polymerase has the amino acid sequence for TP-DNAP1 as set forth in SEQ ID NO:1. In one embodiment, the parental polymerase is encoded by a nucleotide sequence for TP-DNAP1 as set forth in SEQ ID NO:2. In one embodiment, the parental polymerase has an amino acid sequence for TP-DNAP2 as set forth in SEQ ID NO:13. In one embodiment, the parental polymerase is encoded by a nucleotide sequence for TP-DNAP2 as set forth in SEQ ID NO:12.

Mutations to Alter Mutation Rate

[00100] The mutant polymerases of this invention contain at least one mutation that affects the fidelity, or mutation rate, of the polymerase.

[00101] In various embodiments, the mutant polymerases of this invention have mutation rates that are at least 4.2 × 10⁻⁶ mutations per nucleotide, which is the extinction threshold of the host cell (Saccharomyces cerevisiae), and which would result in the death of the host cell if used to generate mutations in genomic nucleic acids.

[00102] In various embodiments, the mutant polymerases of this invention have mutation rates that are at least 1.64 × 10⁻⁷ mutations per nucleotide, which would result in an unstable host cell (Saccharomyces cerevisiae) population if used to generate mutations in genomic nucleic acids.
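
By way of illustration only, the two thresholds recited in paragraphs [00101] and [00102] can be expressed as a simple classification of a per-nucleotide mutation rate. The following is a minimal sketch: the threshold values are those stated above for Saccharomyces cerevisiae, whereas the function name and the returned labels are hypothetical and are not part of the disclosure.

```python
# Thresholds are the values stated in the text for S. cerevisiae;
# the function name and labels below are illustrative assumptions.
EXTINCTION_THRESHOLD = 4.2e-6    # mutations per nucleotide ([00101])
INSTABILITY_THRESHOLD = 1.64e-7  # mutations per nucleotide ([00102])


def classify_genomic_effect(rate_per_nucleotide: float) -> str:
    """Label what a given mutation rate would imply if applied to the genome."""
    if rate_per_nucleotide < 0:
        raise ValueError("mutation rate cannot be negative")
    if rate_per_nucleotide >= EXTINCTION_THRESHOLD:
        return "extinction"
    if rate_per_nucleotide >= INSTABILITY_THRESHOLD:
        return "unstable"
    return "below_thresholds"
```

A rate that is classified as "extinction" for genomic replication may still be usable when the polymerase replicates only a non-genomic template molecule, which is the situation contemplated herein.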

[00103] In various embodiments the mutant polymerases of this invention have an altered mutation rate which is sustained for at least 90 generations of cell replication.

[00104] In various embodiments, the mutant polymerase has a nucleotide sequence comprising at least one mutation relative to the nucleotide sequence of a parental polynucleotide. In one embodiment the mutant polymerase has at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more than 10 mutations relative to a parental polynucleotide.

[00105] In various embodiments, the mutant polymerase has at least one amino acid mutation relative to the amino acid sequence of a parental polypeptide. In one embodiment the mutant polymerase has at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more than 10 mutations relative to a parental polypeptide.

[00106] In one embodiment, the mutant polymerases of the invention, having at least one mutation that results in an increase in the mutation rate of the polymerase relative to an endogenous or wild-type polymerase, may further comprise at least one additional mutation. In one embodiment, an additional mutation does not result in an increase in the mutation rate of the polymerase relative to an endogenous or wild-type polymerase. In one embodiment, at least one additional mutation results in an additive or synergistic increase in the mutation rate of the polymerase relative to an endogenous or wild-type polymerase. In one embodiment the mutant polymerase has at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more than 10 mutations that result in an additive or synergistic increase in the mutation rate of the polymerase relative to an endogenous or wild-type polymerase.

[00107] Exemplary mutations that can be included in a mutant polymerase of the invention include, but are not limited to, the mutations set forth in Tables 2 and 3. Therefore, in various embodiments, the mutant polymerases of the invention have at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more than 10 mutations selected from the mutations set forth in Tables 2 and 3.

[00108] In one embodiment, at least one mutation is in a region selected from: Exo I (a.a. 352-362 in TP-DNAP1), Exo II (a.a. 422-450 in TP-DNAP1), Exo III (a.a. 550-563 in TP-DNAP1), pre-(S/T)Lx2h (a.a. 463-483 in TP-DNAP1), (S/T)Lx2h (a.a. 488-493 in TP-DNAP1), Motif A (a.a. 640-650 in TP-DNAP1), Motif B (a.a. 776-787 in TP-DNAP1), Motif C (a.a. 862-871 in TP-DNAP1), pre-Motif B (a.a. 748-759 in TP-DNAP1), Tx2G/AR (a.a. 840-846 in TP-DNAP1) and KxY (a.a. 914-917 in TP-DNAP1). These regions can be seen in Figure 6. In one embodiment, at least one mutation is outside of the Exo I, Exo II, Exo III, pre-(S/T)Lx2h, (S/T)Lx2h, Motif A, Motif B, Motif C, pre-Motif B, Tx2G/AR and KxY regions. In one embodiment, the mutant polymerases of the invention have at least 2 mutations. The at least two mutations can be in the same region, in different regions, a combination of inside a region and outside any region, or only outside of any region. The amino acid residues that comprise a specific region may vary depending on the parental polymerase; therefore, the foregoing amino acid residue designations, and amino acid designations throughout, should be understood as exemplary for a given parental polypeptide sequence. Analogous regions or residues in alternative parental polypeptides may be identified by methods known in the art, including through structural models, by homology to polymerases with known structures, or by experimental characterization.

[00109] Exemplary mutations that can be included in a polymerase of the invention include, but are not limited to, the mutations set forth in Tables 2 and 3. Therefore, in various embodiments, the mutant polymerases of the invention have at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more than 10 mutations selected from the mutations set forth in Tables 2 and 3.
In one embodiment a DNA polymerase of the invention comprises at least one of N282Y, T303H, G410H, E411T, N423D, N423Q, N423R, G426A, G426C, Y427A, Y427S, Y431H, L474W, L477V, V574F, A599S, L622I, C639I, C639T, L640A, L640G, L640N, L640Y, K643V, L645M, F652L, Y653L, D669Q, E704K, V774I, I775A, I777A, I777K, M779L, M779S, W814N, K849H, K857S, F871Y, V872I, L900S, L909F, K934W, S955W, K967C, F968C and F968T relative to the parent polypeptide sequence set forth in SEQ ID NO:1. In one embodiment a DNA polymerase of the invention comprises at least two of N282Y, T303H, G410H, E411T, N423D, N423Q, N423R, G426A, G426C, Y427A, Y427S, Y431H, L474W, L477V, V574F, A599S, L622I, C639I, C639T, L640A, L640G, L640N, L640Y, K643V, L645M, F652L, Y653L, D669Q, E704K, V774I, I775A, I777A, I777K, M779L, M779S, W814N, K849H, K857S, F871Y, V872I, L900S, L909F, K934W, S955W, K967C, F968C and F968T relative to the parent polypeptide sequence set forth in SEQ ID NO:1. In one embodiment a DNA polymerase of the invention comprises at least three of N282Y, T303H, G410H, E411T, N423D, N423Q, N423R, G426A, G426C, Y427A, Y427S, Y431H, L474W, L477V, V574F, A599S, L622I, C639I, C639T, L640A, L640G, L640N, L640Y, K643V, L645M, F652L, Y653L, D669Q, E704K, V774I, I775A, I777A, I777K, M779L, M779S, W814N, K849H, K857S, F871Y, V872I, L900S, L909F, K934W, S955W, K967C, F968C and F968T relative to the parent polypeptide sequence set forth in SEQ ID NO:1. Exemplary combinations of mutations that can be included in a mutant polymerase of the invention include, but are not limited to, a combination of L640Y, I777K, and W814N, a combination of L640Y, I777K, W814N and L477V, a combination of L640Y, I777K, W814N and Y431H, a combination of L640Y, I777K, W814N and L474W, and a combination of I777K, V574F and L900S, relative to the parent polypeptide sequence set forth in SEQ ID NO:1.
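
The substitution labels used above follow the conventional form of a wild-type residue, a 1-based position, and a replacement residue (for example, L640Y replaces leucine at position 640 with tyrosine). As a non-limiting sketch, such labels might be parsed and applied to a parent amino acid sequence as follows. All names in the code are hypothetical, and the short parent sequence used in the usage note is a toy example, not SEQ ID NO:1.

```python
import re
from typing import Iterable, NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # one-letter residue expected in the parent
    position: int   # 1-based residue number in the parent
    mutant: str     # one-letter replacement residue


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'L640Y' into its three components."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitutions(parent: str, labels: Iterable[str]) -> str:
    """Return the parent sequence with each labelled substitution applied.

    Each label is checked against the parent so that a label written for a
    different parental polypeptide is rejected rather than silently applied.
    """
    residues = list(parent)
    for label in labels:
        sub = parse_substitution(label)
        if not 1 <= sub.position <= len(residues):
            raise ValueError(f"{label}: position outside the parent sequence")
        if parent[sub.position - 1] != sub.wild_type:
            raise ValueError(f"{label}: parent residue does not match")
        residues[sub.position - 1] = sub.mutant
    return "".join(residues)
```

For instance, applying the hypothetical labels L3Y and N5D to the toy parent MKLVN yields MKYVD. The residue check reflects the statement in paragraph [00108] that residue designations are exemplary for a given parental polypeptide sequence.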
[00110] In one embodiment, the mutant error-prone polymerase is encoded by a nucleotide sequence set forth in SEQ ID NO:3 (encoding a mutant polymerase comprising an I777K mutation), SEQ ID NO:4 (encoding a mutant polymerase comprising an F871Y mutation), SEQ ID NO:5 (encoding a mutant polymerase comprising an N423D mutation), SEQ ID NO:6 (encoding a mutant polymerase comprising L640Y, I777K and W814N mutations), SEQ ID NO:7 (encoding a mutant polymerase comprising I777K and L900S mutations), SEQ ID NO:8 (encoding a mutant polymerase comprising Y431H, L640Y, I777K and W814N mutations), SEQ ID NO:9 (encoding a mutant polymerase comprising L474W, L640Y, I777K and W814N mutations), SEQ ID NO:10 (encoding a mutant polymerase comprising V574F, I777K, and L900S mutations) or SEQ ID NO:11 (encoding a mutant polymerase comprising L477V, L640Y, I777K and W814N mutations). In one embodiment, the mutant error-prone polymerase is encoded by a nucleotide sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% identity to a nucleotide sequence set forth in SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11. In one embodiment, a nucleotide sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% identity to a nucleotide sequence set forth in SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11 encodes the same amino acid sequence as encoded by SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, or SEQ ID NO:11.

[00111] In one embodiment, the mutant polymerase of the invention having an increased mutation rate has a mutation rate greater than

[00112] In one embodiment, the mutant polymerase of the invention has a mutation rate of at least 1.05 fold, 1.06 fold, 1.07 fold, 1.08 fold, 1.09 fold, 1.1 fold, 1.11 fold, 1.12 fold, 1.13 fold, 1.14 fold, 1.15 fold, 1.16 fold, 1.17 fold, 1.18 fold, 1.19 fold, 1.2 fold, 1.25 fold, 1.3 fold, 1.35 fold, 1.4 fold, 1.45 fold, 1.5 fold, 1.55 fold, 1.6 fold, 1.65 fold, 1.7 fold, 1.75 fold, 1.8 fold, 1.85 fold, 1.9 fold, 1.95 fold, 2 fold, 2.1 fold, 2.2 fold, 2.3 fold, 2.4 fold, 2.5 fold, 2.6 fold, 2.7 fold, 2.8 fold, 2.9 fold, 3 fold, 4 fold, 5 fold, 6 fold, 7 fold, 8 fold, 9 fold, 10 fold, 15 fold, 20 fold, 25 fold, 30 fold, 35 fold, 40 fold, 45 fold, 50 fold, 55 fold, 60 fold, 65 fold, 70 fold, 75 fold, 80 fold, 85 fold, 90 fold, 95 fold, 100 fold, 150 fold, 200 fold, 250 fold, 300 fold, 350 fold, 400 fold, 450 fold, 500 fold, 550 fold, 600 fold, 650 fold, 700 fold, 750 fold, 800 fold, 850 fold, 900 fold, 950 fold, 1000 fold, 1500 fold, 2000 fold, 2500 fold, 3000 fold, 3500 fold, 4000 fold, 5000 fold, 6000 fold, 7000 fold, 8000 fold, 9000 fold, or at least 10,000 fold greater than the mutation rate of an endogenous or wild-type polymerase.
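
The fold values recited above are ratios of the mutation rate of the mutant polymerase to that of an endogenous or wild-type polymerase. A minimal sketch of that comparison is given below. The function names are hypothetical, and any rates supplied to them are the user's own measurements; no particular wild-type rate is asserted here.

```python
def fold_increase(mutant_rate: float, reference_rate: float) -> float:
    """Ratio of a mutant polymerase's mutation rate to a reference rate.

    Both rates must be expressed in the same units (for example,
    substitutions per nucleotide per replication).
    """
    if reference_rate <= 0:
        raise ValueError("reference rate must be positive")
    if mutant_rate < 0:
        raise ValueError("mutant rate cannot be negative")
    return mutant_rate / reference_rate


def meets_fold_threshold(mutant_rate: float, reference_rate: float,
                         min_fold: float) -> bool:
    """True when the mutant rate is at least `min_fold` times the reference."""
    return fold_increase(mutant_rate, reference_rate) >= min_fold
```

Because the ratio is dimensionless, the comparison holds only when both rates were measured under the same assay and normalization.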

Additional Mutations

[00113] The mutant DNA polymerases of the invention can comprise numerous mutations in addition to those for increasing the mutation rate. These secondary mutations may be either inside or outside the Exo I, Exo II, Exo III, pre-(S/T)Lx2h, (S/T)Lx2h, Motif A, Motif B, Motif C, pre-Motif B, Tx2G/AR and KxY regions. In one embodiment, the secondary mutations may be those indicated in Tables 2 and 3, with respect to SEQ ID NO:1. Secondary mutations can be selected so as to confer some useful property on the mutant polymerase. For example, additional mutations may be introduced to increase thermostability, decrease thermostability, increase processivity, decrease processivity, decrease 3'-5' exonuclease activity, increase 3'-5' exonuclease activity, decrease 5'-3' exonuclease activity, increase 5'-3' exonuclease activity, or increase expression or stability of the polymerase.
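
Whether a given mutation falls inside or outside the named regions can be determined from the residue ranges recited in paragraph [00108] for TP-DNAP1. The following sketch encodes those ranges as a lookup table. The dictionary and function names are hypothetical, and the ranges apply only to TP-DNAP1; analogous regions in another parental polypeptide would need to be identified separately.

```python
from typing import Optional

# Inclusive residue ranges for TP-DNAP1 as recited in paragraph [00108].
TP_DNAP1_REGIONS = {
    "Exo I": (352, 362),
    "Exo II": (422, 450),
    "pre-(S/T)Lx2h": (463, 483),
    "(S/T)Lx2h": (488, 493),
    "Exo III": (550, 563),
    "Motif A": (640, 650),
    "pre-Motif B": (748, 759),
    "Motif B": (776, 787),
    "Tx2G/AR": (840, 846),
    "Motif C": (862, 871),
    "KxY": (914, 917),
}


def region_of(position: int) -> Optional[str]:
    """Return the named region containing a residue, or None if outside all."""
    for name, (start, end) in TP_DNAP1_REGIONS.items():
        if start <= position <= end:
            return name
    return None
```

Under these ranges, the exemplary combination L640Y, I777K and W814N comprises one mutation in Motif A (position 640), one in Motif B (position 777), and one outside every named region (position 814).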

[00114] In some embodiments, the mutant polymerases comprise one or more secondary mutations that reduce or eliminate 3'-5' exonuclease activity. Exonuclease activity allows newly-added bases to be removed from the primer strand and then added back by the polymerase.

[00115] In some embodiments, the mutant polymerases comprise one or more secondary mutations that do not affect the activity of the polymerase. Such mutations may be synonymous or conservative mutations.

[00116] Peptides

[00117] In one embodiment, the present invention comprises a peptide comprising a mutant polymerase with an altered mutation rate.

[00118] The peptide of the present invention may be made using chemical methods. For example, peptides can be synthesized by solid phase techniques (Roberge J Y et al. (1995) Science 269: 202-204), cleaved from the resin, and purified by preparative high performance liquid chromatography. Automated synthesis may be achieved, for example, using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the manufacturer.

[00119] The peptide may alternatively be made by recombinant means or by cleavage from a longer polypeptide. The composition of a peptide may be confirmed by amino acid analysis or sequencing.

[00120] The variants of the polypeptides according to the present invention may be (i) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code, (ii) one in which there are one or more modified amino acid residues, e.g., residues that are modified by the attachment of substituent groups, and/or (iii) fragments of the polypeptides and/or (iv) one in which the polypeptide is fused with another polypeptide, such as a leader or secretory sequence or a sequence which is employed for purification (for example, His-tag) or for detection (for example, Sv5 epitope tag). The fragments include polypeptides generated via proteolytic cleavage (including multi-site proteolysis) of an original sequence. Variants may be post-translationally or chemically modified. Such variants are deemed to be within the scope of those skilled in the art from the teaching herein.

[00121] As known in the art the "similarity" between two polypeptides is determined by comparing the amino acid sequence and its conserved amino acid substitutes of one polypeptide to a sequence of a second polypeptide. Variants are defined to include polypeptide sequences different from the original sequence, preferably different from the original sequence in less than 40% of residues per segment of interest, more preferably different from the original sequence in less than 25% of residues per segment of interest, more preferably different by less than 10% of residues per segment of interest, most preferably different from the original protein sequence in just a few residues per segment of interest and at the same time sufficiently homologous to the original sequence to preserve the functionality of the original sequence and/or the ability to bind to ubiquitin or to a ubiquitylated protein.
The present invention includes amino acid sequences that are at least 60%, 65%, 70%, 72%, 74%, 76%, 78%, 80%, 90%, or 95% similar or identical to the original amino acid sequence. The degree of identity between two polypeptides is determined using computer algorithms and methods that are widely known to persons skilled in the art. The identity between two amino acid sequences is preferably determined by using the BLASTP algorithm [BLAST Manual, Altschul, S., et al, NCBI NLM NIH Bethesda, Md. 20894, Altschul, S., et al, J. Mol. Biol. 215: 403-410 (1990)].
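
To illustrate what a percentage identity expresses, the following sketch computes identity over two sequences that have already been aligned to equal length. It is not an implementation of BLASTP, which also performs the alignment itself; the function name, the gap character, and the choice to count every alignment column other than gap-gap columns in the denominator are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences agree.

    Inputs must be pre-aligned to the same length. Columns where both
    sequences carry a gap are ignored; a gap opposite a residue counts
    as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

With the toy sequences MKLVN and MKYVN, four of five columns agree, giving 80% identity. Other conventions for the denominator (for example, the length of the shorter sequence) give different values, so the convention used should be stated alongside any reported percentage.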

[00122] The polypeptides of the invention can be post-translationally modified. For example, post-translational modifications that fall within the scope of the present invention include signal peptide cleavage, glycosylation, acetylation, isoprenylation, proteolysis, myristoylation, protein folding and proteolytic processing, etc. Some modifications or processing events require introduction of additional biological machinery. For example, processing events, such as signal peptide cleavage and core glycosylation, are examined by adding canine microsomal membranes or Xenopus egg extracts (U.S. Pat. No. 6,103,489) to a standard translation reaction.

[00123] The polypeptides of the invention may include unnatural amino acids formed by post-translational modification or by introducing unnatural amino acids during translation. A variety of approaches are available for introducing unnatural amino acids during protein translation. By way of example, special tRNAs, such as tRNAs which have suppressor properties (suppressor tRNAs), have been used in the process of site-directed non-native amino acid replacement (SNAAR). In SNAAR, a unique codon is required on the mRNA and the suppressor tRNA, acting to target a non-native amino acid to a unique site during the protein synthesis (described in WO90/05785). However, the suppressor tRNA must not be recognizable by the aminoacyl tRNA synthetases present in the protein translation system. In certain cases, a non-native amino acid can be formed after the tRNA molecule is aminoacylated using chemical reactions which specifically modify the native amino acid and do not significantly alter the functional activity of the aminoacylated tRNA. These reactions are referred to as post-aminoacylation modifications. For example, the epsilon-amino group of the lysine linked to its cognate tRNA (tRNALys) could be modified with an amine specific photoaffinity label.

[00124] The term "functionally equivalent" as used herein refers to a polypeptide according to the invention that retains at least an altered mutation rate relative to a parental polymerase.

Nucleic Acids

[00125] In one embodiment, the invention includes an isolated nucleic acid comprising a nucleotide sequence encoding a mutant polymerase of the invention.

[00126] The nucleotide sequences encoding a mutant polymerase can alternatively comprise sequence variations with respect to the original nucleotide sequences, for example, substitutions, insertions and/or deletions of one or more nucleotides, with the condition that the resulting polynucleotide encodes a polypeptide according to the invention. Therefore, the scope of the present invention includes nucleotide sequences that are substantially homologous to the nucleotide sequences recited herein and encodes a mutant polymerase of the invention.

[00127] In the sense used in this description, a nucleotide sequence is "substantially homologous" to any of the nucleotide sequences described herein when its nucleotide sequence has a degree of identity with respect to the nucleotide sequence of at least 60%, advantageously of at least 70%, preferably of at least 85%, and more preferably of at least 95%. A nucleotide sequence that is substantially homologous to a nucleotide sequence encoding a mutant polymerase can typically be isolated from a producer organism of the polypeptide of the invention based on the information contained in the nucleotide sequence by means of introducing synonymous, conservative or non-conservative substitutions, for example. Other examples of possible modifications include the insertion of one or more nucleotides in the sequence, the addition of one or more nucleotides in any of the ends of the sequence, or the deletion of one or more nucleotides in any end or inside the sequence. The degree of identity between two polynucleotides is determined using computer algorithms and methods that are widely known to persons skilled in the art. The identity between two nucleotide sequences is preferably determined by using the BLASTN algorithm [BLAST Manual, Altschul, S., et al, NCBI NLM NIH Bethesda, Md. 20894, Altschul, S., et al, J. Mol. Biol. 215: 403-410 (1990)].
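
A nucleotide sequence can differ from a recited sequence through synonymous substitutions and still encode the same polypeptide. The sketch below illustrates this by translating two toy coding sequences and comparing the results. It assumes the standard genetic code, which may not apply to every host or genetic element; the function names and the example sequences are hypothetical and are unrelated to the sequences of the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate an in-frame DNA coding sequence under the standard code."""
    dna = coding_sequence.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encode_same_polypeptide(seq_a: str, seq_b: str) -> bool:
    """True when two coding sequences translate to the same amino acids."""
    return translate(seq_a) == translate(seq_b)
```

For example, the toy sequences ATGAAACTGGTT and ATGAAGTTAGTC differ at four of twelve nucleotides yet both translate to MKLV, so nucleotide identity alone does not determine whether the encoded polypeptide is changed.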

[00128] In another aspect, the invention relates to a construct, comprising a nucleotide sequence encoding a mutant polymerase, or derivative thereof. In a particular embodiment, the construct is operatively bound to transcription, and optionally translation, control elements. The construct can incorporate an operatively bound regulatory sequence of the expression of the nucleotide sequence of the invention, thus forming an expression cassette.

[00129] A mutant polymerase may be prepared using recombinant DNA methods. Accordingly, nucleic acid molecules which encode a mutant polymerase may be incorporated in a known manner into an appropriate expression vector which ensures good expression of the mutant polymerase.

[00130] Therefore, in another aspect, the invention relates to a vector, comprising the nucleotide sequence of the invention or the construct of the invention. The choice of the vector will depend on the host cell in which it is to be subsequently introduced. In a particular embodiment, the vector of the invention is an expression vector. Suitable host cells include a wide variety of prokaryotic and eukaryotic host cells. In specific embodiments, the expression vector is selected from the group consisting of a viral vector, a bacterial vector, a yeast vector and a mammalian cell vector. Prokaryote- and/or eukaryote- vector based systems can be employed for use with the present invention to produce polynucleotides, or their cognate polypeptides. Many such systems are commercially and widely available.

[00131] Further, the expression vector may be provided to a cell in the form of a viral vector. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (2001), and in Ausubel et al. (1997), and in other virology and molecular biology manuals. Viruses, which are useful as vectors include, but are not limited to, retroviruses, adenoviruses, adeno-associated viruses, herpes viruses, and lentiviruses.

[00132] In general, a suitable vector contains an origin of replication functional in at least one organism, a promoter sequence, convenient restriction endonuclease sites, and one or more selectable markers. (See, e.g., WO 01/96584; WO 01/29058; and U.S. Pat. No. 6,326,193.) Vectors suitable for the insertion of the polynucleotides are vectors derived from expression vectors in prokaryotes such as pUC18, pUC19, Bluescript and the derivatives thereof, mp18, mp19, pBR322, pMB9, ColE1, pCR1, RP4, phages and "shuttle" vectors such as pSA3 and pAT28, expression vectors in yeasts such as vectors of the type of 2 micron plasmids, integration plasmids, YEP vectors, centromeric plasmids with autonomously replicating sequences and the like, expression vectors in insect cells such as vectors of the pAC series and of the pVL series, expression vectors in plants such as pIBI, pEarleyGate, pAVA, pCAMBIA, pGSA, pGWB, pMDC, pMY, pORE series and the like, and expression vectors in eukaryotic cells based on viral vectors (adenoviruses, adeno-associated viruses, retroviruses and, particularly, lentiviruses) as well as non-viral vectors such as pSilencer 4.1-CMV (Ambion), pcDNA3, pcDNA3.1/hyg, pHMCV/Zeo, pCR3.1, pEF1/His, pIND/GS, pRc/HCMV2, pSV40/Zeo2, pTRACER-HCMV, pUB6/V5-His, pVAX1, pZeoSV2, pCI, pSVL and PKSV-10, pBPV-1, pML2d and pTDT1.

[00133] By way of illustration, the vector in which the nucleic acid sequence is introduced can be a plasmid which is or is not integrated in the genome of a host cell when it is introduced in the cell. Illustrative, non-limiting examples of vectors in which the nucleotide sequence of the invention or the gene construct of the invention can be inserted include a tet-on inducible vector for expression in eukaryotic cells.

[00134] The vector may be obtained by conventional methods known by persons skilled in the art (Sambrook et al, "Molecular cloning, a Laboratory Manual", 2nd ed., Cold Spring Harbor Laboratory Press, N.Y., 1989 Vol 1-3). In a particular embodiment, the vector is a vector useful for transforming yeast cells.

[00135] The recombinant expression vectors may also contain nucleic acid molecules which encode a portion which provides increased expression of the recombinant mutant polymerase; increased solubility of the recombinant mutant polymerase; and/or aid in the purification of the recombinant mutant polymerase by acting as a ligand in affinity purification. For example, a proteolytic cleavage site may be inserted in the recombinant peptide to allow separation of the recombinant mutant polymerase from the fusion portion after purification of the fusion protein. Examples of fusion expression vectors include pGEX (Amrad Corp., Melbourne, Australia), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) which fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the recombinant protein.

[00136] Additional promoter elements, i.e., enhancers, regulate the frequency of transcriptional initiation. Typically, these are located in the region 30-110 bp upstream of the start site, although a number of promoters have recently been shown to contain functional elements downstream of the start site as well. The spacing between promoter elements frequently is flexible, so that promoter function is preserved when elements are inverted or moved relative to one another. In the thymidine kinase (tk) promoter, the spacing between promoter elements can be increased to 50 bp apart before activity begins to decline. Depending on the promoter, it appears that individual elements can function either co-operatively or independently to activate transcription.

[00137] A promoter may be one naturally associated with a gene or polynucleotide sequence, as may be obtained by isolating the 5' non-coding sequences located upstream of the coding segment and/or exon. Such a promoter can be referred to as "endogenous." Similarly, an enhancer may be one naturally associated with a polynucleotide sequence, located either downstream or upstream of that sequence. Alternatively, certain advantages will be gained by positioning the coding polynucleotide segment under the control of a recombinant or heterologous promoter, which refers to a promoter that is not normally associated with a polynucleotide sequence in its natural environment. A recombinant or heterologous enhancer refers also to an enhancer not normally associated with a polynucleotide sequence in its natural environment. Such promoters or enhancers may include promoters or enhancers of other genes, and promoters or enhancers isolated from any other prokaryote, virus, or eukaryote, and promoters or enhancers not "naturally occurring," i.e., containing different elements of different transcriptional regulatory regions, and/or mutations that alter expression. In addition to producing nucleic acid sequences of promoters and enhancers synthetically, sequences may be produced using recombinant cloning and/or nucleic acid amplification technology, including PCR™, in connection with the compositions disclosed herein (U.S. Patent 4,683,202, U.S. Patent 5,928,906). Furthermore, it is contemplated the control sequences that direct transcription and/or expression of sequences within non-nuclear organelles such as mitochondria, chloroplasts, and the like, can be employed as well.

[00138] Naturally, it will be important to employ a promoter and/or enhancer that effectively directs the expression of the DNA segment in the cell type, organelle, and organism chosen for expression. Those of skill in the art of molecular biology generally know how to use promoters, enhancers, and cell type combinations for protein expression, for example, see Sambrook et al. (2001). The promoters employed may be constitutive, tissue-specific, inducible, and/or useful under the appropriate conditions to direct high level expression of the introduced DNA segment, such as is advantageous in the large-scale production of recombinant proteins and/or peptides. The promoter may be heterologous or endogenous. A promoter sequence exemplified in the experimental examples presented herein is the immediate early cytomegalovirus (CMV) promoter sequence. This promoter sequence is a strong constitutive promoter sequence capable of driving high levels of expression of any polynucleotide sequence operatively linked thereto. However, other constitutive promoter sequences may also be used, including, but not limited to the simian virus 40 (SV40) early promoter, mouse mammary tumor virus (MMTV), human immunodeficiency virus (HIV) long terminal repeat (LTR) promoter, Moloney virus promoter, the avian leukemia virus promoter, Epstein-Barr virus immediate early promoter, Rous sarcoma virus promoter, as well as human gene promoters such as, but not limited to, the actin promoter, the myosin promoter, the hemoglobin promoter, and the muscle creatine promoter. Further, the invention should not be limited to the use of constitutive promoters. Inducible promoters are also contemplated as part of the invention. The use of an inducible promoter in the invention provides a molecular switch capable of turning on expression of the polynucleotide sequence which it is operatively linked when such expression is desired, or turning off the expression when expression is not desired. 
Examples of inducible promoters include, but are not limited to, a metallothionein promoter, a glucocorticoid promoter, a progesterone promoter, and a tetracycline promoter. Further, the invention includes the use of a tissue-specific promoter, which is active only in a desired tissue. Tissue-specific promoters are well known in the art and include, but are not limited to, the HER-2 promoter and the PSA-associated promoter sequences.

[00139] In a particular embodiment, the expression of the nucleic acid is externally controlled. In a more particular embodiment, the expression is externally controlled using the methionine repressible MET3 promoter.

[00140] The recombinant expression vectors may also contain a selectable marker gene which facilitates the selection of transformed or transfected host cells. Suitable selectable marker genes include the neo gene from Tn5, HIS3, LEU2, URA3, TRP1, MET15, and HIS4, as well as genes encoding proteins such as β-galactosidase, β-lactamase, chloramphenicol acetyltransferase, firefly luciferase, or an immunoglobulin or portion thereof such as the Fc portion of an immunoglobulin, preferably IgG. The selectable markers may be introduced on a separate vector from the nucleic acid of interest.

[00141] Reporter genes are used for identifying potentially transfected cells and for evaluating the functionality of regulatory sequences. Reporter genes that encode for easily assayable proteins are well known in the art. In general, a reporter gene is a gene that is not present in or expressed by the recipient organism or tissue and that encodes a protein whose expression is manifested by some easily detectable property, e.g., enzymatic activity. Expression of the reporter gene is assayed at a suitable time after the DNA has been introduced into the recipient cells.

[00142] Suitable reporter genes may include genes encoding luciferase, β-galactosidase, chloramphenicol acetyl transferase, secreted alkaline phosphatase, or the green fluorescent protein gene (see, e.g., Ui-Tei et al., 2000 FEBS Lett. 479:79-82). Suitable expression systems are well known and may be prepared using well known techniques or obtained commercially. Internal deletion constructs may be generated using unique internal restriction sites or by partial digestion of non-unique restriction sites. Constructs may then be transfected into cells that display high levels of siRNA polynucleotide and/or polypeptide expression. In general, the construct with the minimal 5' flanking region showing the highest level of expression of reporter gene is identified as the promoter. Such promoter regions may be linked to a reporter gene and used to evaluate agents for the ability to modulate promoter-driven transcription.

[00143] Recombinant expression vectors may be introduced into host cells to produce a recombinant cell. The cells can be prokaryotic or eukaryotic. The vector of the invention can be used to transform eukaryotic cells such as yeast cells (e.g., Saccharomyces cerevisiae cells), or mammalian cells (e.g., epithelial kidney 293 cells or U2OS cells), or prokaryotic cells (e.g., Escherichia coli or Bacillus subtilis). Nucleic acid can be introduced into a cell using conventional techniques such as lithium acetate transformation, cytoduction techniques, cell mating, calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofectin, electroporation or microinjection. Suitable methods for transforming and transfecting host cells may be found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and other laboratory textbooks.

[00144] For example, a mutant polymerase of the invention may be expressed in bacterial cells, insect cells (using baculovirus), yeast cells or mammalian cells. Other suitable host cells can be found in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1991).

Methods of Generating Mutant DNA Polymerases of the Invention

[00145] In one embodiment, the invention comprises methods for generating a polymerase having an altered mutation rate. In one embodiment, the method comprises the steps of: (a) providing a parent polynucleotide; (b) mutating the polynucleotide to generate a library of mutated polynucleotides; and (c) selecting from the library a mutated polynucleotide encoding a polymerase having an altered error-rate.

[00146] In one embodiment, the method comprises the steps of: (a) providing a parent polynucleotide; (b) mutating the parent polynucleotide at a residue predicted to affect the error-rate; and (c) selecting from the library a mutated polynucleotide encoding a polymerase having an altered error-rate.

[00147] Such polymerases can be generated by introducing mutations in specific residues which are identified as being in the appropriate region through structural models, by homology to polymerases with known structures, or by experimental characterization (e.g., site-directed mutagenesis). In some cases, the mutant polymerase has additional mutations, including but not limited to mutations that increase the mutation rate of the polymerase and mutations that decrease exonuclease activity.

[00148] The residues that affect the error-rate will vary depending on the particular polymerase and in some degree, will vary depending on the particular modified nucleotide. It will be appreciated by those of skill in the art that the mutations which confer the greatest mutation rates will vary depending on the particular modifications to the nucleotides, e.g., whether the modification alters the charge or interaction of a base, etc. Such mutations are usually, although not necessarily, substitution mutations. Several different amino acid residues may be substituted at a given position of a parent enzyme so as to give rise to mutations that increase the mutation rate of the polymerase.

[00149] The amino acid residues at a given residue position may be systematically varied so as to determine which amino acid substitutions are effective. In one embodiment, the mutations are non-conservative mutations.

[00150] A residue predicted to affect the error-rate can be a residue corresponding to a residue listed in Tables 2, 3, and 7. Corresponding residues between analogous proteins can be identified by any method known to those of skill in the art, including through structural models, by homology to polymerases with known structures, or by experimental characterization. In instances where large regions of homology can be found between polymerases, corresponding amino acid residues between different polymerases can be determined based on sequence homology. A large compilation of the amino acid sequences of polymerases from a wide range of organisms, and homology alignments between the sequences, can be found in Braithwaite and Ito, Nucl. Acids Res. 21(4):787-802 (1993) and is useful for such purposes.

[00151] The mutations described above can be generated using any method typically used by those of skill in the art to introduce mutations at specific residues. Such methods are well described in Sambrook et al, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Publications, Cold Spring Harbor, N.Y. (1982).

[00152] In one embodiment, the mutant polymerase of the invention has decreased exonuclease activity or completely lacks exonuclease activity. In one embodiment, the mutant polymerase retains strand displacement activity and processivity. In one embodiment, the mutant polymerase is capable of synthesizing nucleic acid molecules at a rate of at least 1 nt/sec; at least 10 nts/sec; at least 100 nts/sec or greater than at least 100 nts/sec.

Methods of Generating a Library of Mutants

[00153] Methods of generating a library of mutants are well known to those of skill in the art. In preferred embodiments, the polynucleotide is mutated via in vitro or in vivo recombination, site-directed mutagenesis, error-prone PCR, site-saturation mutagenesis, or gene shuffling recombination. In one embodiment, the parental polynucleotide is systematically mutated at specific amino acids. In one embodiment, the specific amino acids are in at least one region selected from Exo I, Exo II, Exo III, pre-(S/T)Lx2h, (S/T)Lx2h, Motif A, Motif B, Motif C, pre-Motif B, Tx2G/AR and KxY regions.

[00154] In other preferred embodiments, the polynucleotides are first mutated using a method which randomly introduces mutations, such as error-prone PCR; screened for desired activity; mutated using a method which introduces all possible mutations at the mutant amino acids which confer the desired activity, such as site-saturation mutagenesis; and then recombined or further mutated by methods such as the StEP (staggered extension process) method or other single-site or multi-site mutagenesis methods. Site-directed mutagenesis techniques are well known in the art as exemplified by U.S. Pat. Nos. 4,711,848; 4,873,192; 5,071,743; 5,284,760; 5,354,670; 5,556,747; Zoller and Smith, Nucleic Acids Res. 10:6487-6500 (1982), and Edelman et al. DNA 2:183 (1983). Detailed protocols for site-directed mutagenesis are also given in many general molecular biology textbooks such as Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press, Cold Spring Harbor (1989), and Ausubel et al., Current Protocols in Molecular Biology (current edition). Additionally, many textbooks on PCR (the polymerase chain reaction), such as Dieffenbach and Dveksler, PCR Primer: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1995), describe methods of using PCR to introduce mutations.
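As an illustration of what "all possible mutations" at selected positions means for library size, the following is a minimal Python sketch that enumerates single-residue site-saturation variants of a protein sequence and recombines chosen substitutions. It is not the procedure used in the examples; the toy sequence, the positions, and the function names `site_saturation_variants` and `combine` are hypothetical and chosen only for illustration.

```python
# Sketch only: enumerates protein-level variants; it does not design oligos or codons.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def site_saturation_variants(parent, positions):
    """Yield (label, sequence) for every single substitution at 1-based positions."""
    for pos in positions:
        wild_type = parent[pos - 1]
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the parental residue
            mutant = parent[:pos - 1] + aa + parent[pos:]
            yield f"{wild_type}{pos}{aa}", mutant


def combine(parent, substitutions):
    """Apply several labelled substitutions (e.g. ["K2A", "L4F"]) to the parent."""
    seq = list(parent)
    for label in substitutions:
        wild_type, pos, new = label[0], int(label[1:-1]), label[-1]
        if seq[pos - 1] != wild_type:
            raise ValueError(f"{label} does not match parent residue {seq[pos - 1]}")
        seq[pos - 1] = new
    return "".join(seq)


parent = "MKTLYV"  # hypothetical toy sequence, not a polymerase
library = dict(site_saturation_variants(parent, [2, 4]))
print(len(library))                      # 19 substitutions at each of 2 positions
print(combine(parent, ["K2A", "L4F"]))   # a two-site combinatorial variant
```

Each saturated position contributes 19 single-substitution variants, so a scanning library over every residue of a protein grows linearly with its length, while combinatorial libraries grow multiplicatively with the number of positions combined.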

[00155] In other embodiments, shuffling methods such as those described in U.S. Pat. No. 6,117,679, issued to Stemmer et al., are used to generate additional mutants from mutant polynucleotides with altered error-rates. In some cases, two polynucleotides encoding mutant versions of the same polymerase are shuffled. In other cases, a polynucleotide encoding one type of polymerase and a polynucleotide encoding a different polymerase with sufficient nucleotide homology to permit shuffling are shuffled. Gene shuffling utilizes naturally occurring nucleotide substitutions among family genes as the driving force for evolution (see Chang, C.-C., Chen, T. T., Cox, B. W., Dawes, G. N., Stemmer, W. P. C., Punnonen, J., and Patten, P. A. Evolution of a cytokine using DNA family shuffling. Nat. Biotechnol., 17, 793-797 (1999); Hansson, L. O., B-Grob, R., Massoud, T., and Mannervik, B. Evolution of differential substrate specificities in Mu class glutathione transferases probed by DNA shuffling. J. Mol. Biol., 287, 265-276 (1999); and Kikuchi, M., Ohnishi, K., and Harayama, S. An effective family shuffling method using single-stranded DNA. Gene, 243, 133-137 (2000)).

[00156] In certain embodiments, the present invention also relates to a method of repeated cycles of nucleic acid mutation, transformation and selection, which allow for the creation of mutant proteins having enhanced charge-switch nucleotide polymerase activity.

Selection of Mutants with Desired Activity

[00157] Polynucleotides with desired activity can easily be selected using standard methods. The error-rate of a polymerase can be detected using an orthogonal replication system as described below, PCR-based assays, or any other methods known to those of skill in the art. Other properties of polymerases including, but not limited to, replication activity, exonuclease activity, strand displacement activity and processivity can be measured using assays well known in the art.

Methods of Using Mutant Polymerases

[00158] In one embodiment, the invention comprises methods of using the error-prone polymerases of this invention in any assay, test, or method where it would be useful to have sequences containing one or more mutations. Due to their high error-rate, the polymerases of this invention have utility in any molecular biology applications where it would either be advantageous or necessary to generate one or more random mutations in a newly synthesized nucleic acid molecule. In particular, these polymerases would be useful in methods where generation of multiple nucleic acid molecules having random mutations is advantageous or necessary. Exemplary embodiments include, but are not limited to, methods investigating the effect of mutations on the level or activity of a protein or peptide.

[00159] More generally, the mutant polymerases of this invention can be substituted for the corresponding parent polymerase in most procedures that employ polymerases, particularly those where it would either be advantageous or necessary to generate one or more random mutations in a newly synthesized nucleic acid molecule. In one exemplary embodiment, the mutant polymerase of the invention is substituted for the wild type TP-DNAP1 in an orthogonal replication system as described in Ravikumar et al., Nat Chem Biol. 2014, 10(3):175-177; Arzumanyan et al., ACS Synth. Biol. 2018, 7, 1722-1729; and Ravikumar et al., 2018, bioRxiv, 313338, which are incorporated herein by reference.

[00160] In one embodiment, the invention comprises in vivo methods of using the error-prone polymerases of this invention to generate nucleic acid sequences comprising at least one mutation relative to a parental nucleic acid sequence. In one embodiment, the in vivo methods include contacting a mutant polymerase of the invention with a template nucleic acid molecule, wherein the contacting is performed in a cell. In one embodiment, the mutant DNA polymerase of the invention is capable of replicating the template nucleic acid molecule, but does not replicate cellular genomic nucleic acid molecules, including, but not limited to cellular DNA. Therefore, in one embodiment, the mutant polymerase of the invention and the template nucleic acid molecule for use in the methods of the invention comprise an orthogonal replication system for specific replication of the template DNA molecule by the mutant DNA polymerase of the invention. In one embodiment, the method comprises modifying a cell with a construct for expressing a mutant polymerase of the invention. In various embodiments, the cell is a bacterial cell, insect cell, yeast cell or mammalian cell. In one exemplary embodiment, the cell is a yeast cell.

[00161] In one embodiment, the cell further comprises a template nucleic acid molecule. A template nucleic acid molecule for use in the method of the invention can be specific for replication by the mutant polymerase. In one embodiment, a template nucleic acid molecule is an exogenous nucleic acid molecule for the cell expressing the mutant polymerase of the invention. In one embodiment, the template nucleic acid molecule is a plasmid. Plasmids for use in the methods of the invention include, but are not limited to, a p1 plasmid and a p2 plasmid.

[00162] Nucleic acid can be introduced into a cell using conventional techniques such as lithium acetate transformation, cytoduction techniques, cell mating, calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofectin, electroporation or microinjection. Suitable methods for transforming and transfecting host cells may be found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory press (1989)), and other laboratory textbooks.

[00163] In one embodiment, the method comprises incubating a mutant polymerase of the invention with a template nucleic acid molecule and one or more priming nucleic acid molecules under suitable polymerization conditions. In one embodiment, these conditions are provided by a reaction mixture containing ribonucleotide triphosphates (NTPs), deoxyribonucleotide triphosphates (dNTPs), dideoxyribonucleotide triphosphates (ddNTPs), or a combination thereof and a buffer containing a buffering agent, and optionally a divalent cation, and a monovalent cation.

[00164] Priming nucleic acid molecules, or "primers" generally refers to an oligonucleotide capable of acting as a point of initiation of nucleic acid synthesis when annealed to a nucleic acid template under conditions in which synthesis of a primer extension product is initiated, i.e., in the presence of nucleoside triphosphates and a polymerase in an appropriate buffer ("buffer" includes pH, ionic strength, cofactors, etc.) and at a suitable temperature. The primer typically contains 10-35 nucleotides, although the exact number is not critical to the successful application of the method. Short primer molecules generally require lower temperatures to form sufficiently stable hybrid complexes with the template.
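The relationship between primer length and hybridization temperature noted above can be sketched with a rough estimate. The following Python example uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a common rule of thumb for short oligonucleotides that is not taken from this disclosure; the function names and the example sequences are hypothetical, and the estimate ignores salt and primer concentration.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: a rule-of-thumb estimate for short oligos only, not a method
    described in the source text.
    """
    primer = primer.upper()
    if set(primer) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def primer_length_ok(primer, low=10, high=35):
    """Check the typical 10-35 nucleotide length range stated in the text."""
    return low <= len(primer) <= high


short = "ACGTACGTAC"             # hypothetical 10-nt primer
longer = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt primer
print(wallace_tm(short), wallace_tm(longer))
print(primer_length_ok(short), primer_length_ok("ACGT"))
```

With equal base composition, the shorter primer gives the lower estimate, consistent with the statement that short primers generally require lower temperatures to form stable hybrids.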

[00165] DNA polymerases require a divalent cation for catalytic activity. Exemplary divalent cations that can be included in a reaction mixture include, but are not limited to, Mn2+, Mg2+, or Co2+. The divalent cation can be supplied in the form of a salt such as MgCl2, Mg(OAc)2, MgSO4, MnCl2, Mn(OAc)2, or MnSO4. Usable cation concentrations in a Tris-HCl buffer are for MnCl2 from 0.5 to 7 mM, for example, between 0.5 and 2 mM, and for MgCl2 from 0.5 to 10 mM. Usable cation concentrations in a Bicine/KOAc buffer are from 1 to 20 mM for Mn(OAc)2, preferably between 2 and 5 mM. In one embodiment, the monovalent cation is supplied by the potassium, sodium, ammonium, or lithium salts of either chloride or acetate. For KCl, the concentration is between 1 and 200 mM, preferably the concentration is between 40 and 100 mM, although the optimum concentration may vary depending on the polymerase used in the reaction.

[00166] In one embodiment, deoxyribonucleotide triphosphates are added as solutions of the salts of dATP, dCTP, dGTP, dUTP, and dTTP, such as disodium or lithium salts. In one embodiment, a final concentration in the range of 1 μM to 2 mM each is suitable, and 100-600 μM is used, although the optimal concentration of the nucleotides may vary in the reverse transcription reaction depending on the total dNTP and divalent metal ion concentration, and on the buffer, salts, particular primers, and template. For longer products, i.e., greater than 1500 bp, 500 μM each dNTP and 2 mM MnCl2 may be preferred when using a Tris-HCl buffer.

[00167] In one embodiment, a suitable buffering agent is Tris-HCl, preferably pH 8.3, although the pH may be in the range 8.0-8.8. The Tris-HCl concentration is from 5-250 mM. In one embodiment, the Tris-HCl concentration is from 10-100 mM. In one embodiment, a buffering agent is Bicine-KOH, MOPS-KOH, or HEPES-KOH, with a pH in the range 7.8-8.7.

[00168] In one embodiment, EDTA less than 0.5 mM may be present in a reverse transcription reaction mix. Detergents such as Tween-20™ and Nonidet™ P-40 are present in the enzyme dilution buffers. In one embodiment, a final concentration of non-ionic detergent approximately 0.1% or less is appropriate, and will not interfere with polymerase activity. Similarly, glycerol is often present in enzyme preparations and is generally diluted to a concentration of 1-20% in the reaction mix. In one embodiment, a mineral oil overlay may be added to prevent evaporation but is not necessary.
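The concentration ranges given above for a Tris-HCl reaction mixture can be collected into a simple range check. The following Python sketch is illustrative only: the dictionary keys and the function name `out_of_range` are hypothetical, the ranges are the broad "usable" ranges stated in the text rather than the preferred ones, and the EDTA bound (stated as "less than 0.5 mM") is treated as inclusive for simplicity.

```python
# Usable ranges for a Tris-HCl reaction mix, as stated in the text (low, high).
# Key names are hypothetical labels chosen for this sketch.
RANGES = {
    "tris_hcl_mM": (5, 250),
    "pH": (8.0, 8.8),
    "MgCl2_mM": (0.5, 10),
    "MnCl2_mM": (0.5, 7),
    "KCl_mM": (1, 200),
    "dNTP_each_mM": (0.001, 2),   # 1 uM to 2 mM each
    "EDTA_mM": (0, 0.5),          # text says "less than 0.5 mM"; inclusive here
    "detergent_pct": (0, 0.1),    # non-ionic detergent, approximately 0.1% or less
    "glycerol_pct": (1, 20),
}


def out_of_range(mix):
    """Return the names of components whose value falls outside the stated range."""
    problems = []
    for name, value in mix.items():
        if name not in RANGES:
            raise KeyError(f"no range recorded for {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(name)
    return problems


# Hypothetical example mixes; values are for illustration, not recommendations.
ok_mix = {"tris_hcl_mM": 50, "pH": 8.3, "MgCl2_mM": 2, "KCl_mM": 50, "dNTP_each_mM": 0.2}
bad_mix = {"pH": 7.5, "MnCl2_mM": 9}
print(out_of_range(ok_mix))
print(out_of_range(bad_mix))
```

Components omitted from a mix are simply not checked, which matches the text's treatment of the divalent and monovalent cations as optional.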

Mutually Orthogonal Replication Systems

[00169] In one embodiment, the invention comprises a system comprising the orthogonal polymerases of this invention for use in any assay, test, or method where it would be useful to replicate a specific template nucleic acid molecule without altering or replicating other nucleic acid molecules which are present in the system. In one embodiment, the orthogonal replication system includes one or more orthogonal polymerases in a system for replicating one or more specific target nucleic acid molecules. In one embodiment, the orthogonal replication system includes two orthogonal polymerases in a system for replicating two specific target nucleic acid molecules (e.g., a dual orthogonal replication system). Figure 7 illustrates a dual orthogonal replication system which uses an orthogonal polymerase of this invention with an orthogonal plasmid.

[00170] In one embodiment, the method of the invention allows replication of a first template nucleic acid molecule with a first orthogonal polymerase having specificity for the first template and also replication of a second template nucleic acid molecule with a second orthogonal polymerase having specificity for the second template, wherein the first orthogonal polymerase does not replicate the second template, and the second orthogonal polymerase does not replicate the first template. Further, in one embodiment, the system includes genomic nucleic acid molecules and at least one polymerase specific for replication of genomic nucleic acid molecules (e.g., an endogenous polymerase), wherein the one or more orthogonal polymerases do not replicate the genomic nucleic acid molecule.
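The mutual orthogonality condition described above can be stated as a simple check: each polymerase replicates its own cognate template and nothing else. The following Python sketch models this with sets; the names `pol_A`, `pol_B`, `template_A`, `template_B`, and both function names are hypothetical placeholders, and the replication assignments are illustrative inputs rather than measured data.

```python
def cross_reactions(replicates, pairing):
    """List (polymerase, template) pairs where a polymerase replicates a non-cognate template."""
    off_target = []
    for pol, templates in replicates.items():
        for template in sorted(templates):
            if pairing.get(pol) != template:
                off_target.append((pol, template))
    return off_target


def is_mutually_orthogonal(replicates, pairing):
    """True if every polymerase replicates its cognate template and no other."""
    covers = all(
        cognate in replicates.get(pol, set()) for pol, cognate in pairing.items()
    )
    return covers and not cross_reactions(replicates, pairing)


# Intended polymerase/template pairs, including the host's own replication.
pairing = {"pol_A": "template_A", "pol_B": "template_B", "endogenous": "genome"}

# Hypothetical observed behavior of an ideal dual orthogonal system.
ideal = {"pol_A": {"template_A"}, "pol_B": {"template_B"}, "endogenous": {"genome"}}

# Hypothetical leaky system in which pol_A also copies template_B.
leaky = {"pol_A": {"template_A", "template_B"}, "pol_B": {"template_B"}, "endogenous": {"genome"}}

print(is_mutually_orthogonal(ideal, pairing))
print(cross_reactions(leaky, pairing))
```

Including the endogenous polymerase and the genome in the same pairing table captures the additional requirement that the orthogonal polymerases do not replicate genomic nucleic acid molecules.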

Use of the Orthogonal Replication Systems to Generate Mutations

[00171] In one embodiment, the method of the invention allows replication of a first template nucleic acid molecule with a first orthogonal polymerase having specificity for the first template and also replication of a second template nucleic acid molecule with a second orthogonal polymerase having specificity for the second template, wherein the first orthogonal polymerase does not replicate the second template, and the second orthogonal polymerase does not replicate the first template. Further, in one embodiment, the system includes genomic nucleic acid molecules and at least one polymerase specific for replication of genomic nucleic acid molecules (e.g., an endogenous polymerase), wherein the one or more orthogonal polymerases do not replicate the genomic nucleic acid molecules.

[00172] In various embodiments, the methods of the invention allow replication of a specific template nucleic acid with a polymerase having an altered (i.e., higher or lower) fidelity or error-rate without altering the fidelity of replication of another nucleic acid molecule that is present in the system. In one embodiment, the method of the invention allows replication of a first template nucleic acid molecule with a first mutant polymerase having altered fidelity and also replication of a second template nucleic acid molecule with a second mutant polymerase having altered fidelity, wherein the first mutant polymerase does not replicate the second template, and the second mutant polymerase does not replicate the first template. Further, in one embodiment, the system includes genomic nucleic acid molecules and at least one endogenous polymerase, wherein the one or more mutant polymerases do not replicate the genomic nucleic acid molecules. Due to their altered fidelity, the mutant polymerases of this invention have utility in any molecular biology applications where it would either be advantageous or necessary to generate one or more random mutations in a newly synthesized nucleic acid molecule. In particular, these polymerases would be useful in methods where generation of multiple nucleic acid molecules having random mutations is advantageous or necessary. Exemplary embodiments include, but are not limited to, methods investigating the effect of mutations on the level or activity of a protein or peptide.

[00173] In one embodiment, the invention comprises in vivo methods of using the error-prone polymerases of this invention to generate nucleic acid sequences comprising at least one mutation relative to a parental nucleic acid sequence. In one embodiment, the in vivo methods include contacting a mutant polymerase of the invention with a template nucleic acid molecule, wherein the contacting is performed in a cell. In one embodiment, the mutant DNA polymerase of the invention is capable of replicating the template nucleic acid molecule, but does not replicate cellular genomic nucleic acid molecules, including, but not limited to cellular DNA. Therefore, in one embodiment, the mutant polymerase of the invention and the template nucleic acid molecule for use in the methods of the invention comprise an orthogonal replication system for specific replication of the template DNA molecule by the mutant DNA polymerase of the invention.

[00174] In one embodiment, the method comprises modifying a cell with a construct for expressing a mutant polymerase of the invention. In various embodiments, the cell is a bacterial cell, insect cell, yeast cell or mammalian cell. In one exemplary embodiment, the cell is a yeast cell.

[00175] In one embodiment, the cell further comprises a template nucleic acid molecule. A template nucleic acid molecule for use in the method of the invention can be specific for replication by the mutant polymerase. In one embodiment, a template nucleic acid molecule is an exogenous nucleic acid molecule for the cell expressing the mutant polymerase of the invention.

[00176] Nucleic acid can be introduced into a cell using conventional techniques such as lithium acetate transformation, cytoduction techniques, cell mating, calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofectin, electroporation or microinjection. Suitable methods for transforming and transfecting host cells may be found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory press (1989)), and other laboratory textbooks.

[00177] In one embodiment, the method comprises incubating a mutant polymerase of the invention with a template nucleic acid molecule and one or more priming nucleic acid molecules under suitable polymerization conditions. In one embodiment, these conditions are provided by a reaction mixture containing ribonucleotide triphosphates (NTPs), deoxyribonucleotide triphosphates (dNTPs), dideoxyribonucleotide triphosphates (ddNTPs), or a combination thereof and a buffer containing a buffering agent, and optionally a divalent cation, and a monovalent cation.

[00178] Priming nucleic acid molecules, or "primers" generally refers to an oligonucleotide capable of acting as a point of initiation of nucleic acid synthesis when annealed to a nucleic acid template under conditions in which synthesis of a primer extension product is initiated, i.e., in the presence of nucleoside triphosphates and a polymerase in an appropriate buffer ("buffer" includes pH, ionic strength, cofactors, etc.) and at a suitable temperature. The primer typically contains 10-35 nucleotides, although the exact number is not critical to the successful application of the method. Short primer molecules generally require lower temperatures to form sufficiently stable hybrid complexes with the template.

[00179] DNA polymerases require a divalent cation for catalytic activity. Exemplary divalent cations that can be included in a reaction mixture include, but are not limited to, Mn2+, Mg2+, or Co2+. The divalent cation can be supplied in the form of a salt such as MgCl2, Mg(OAc)2, MgSO4, MnCl2, Mn(OAc)2, or MnSO4. Usable cation concentrations in a Tris-HCl buffer are for MnCl2 from 0.5 to 7 mM, for example, between 0.5 and 2 mM, and for MgCl2 from 0.5 to 10 mM. Usable cation concentrations in a Bicine/KOAc buffer are from 1 to 20 mM for Mn(OAc)2, preferably between 2 and 5 mM.

[00180] In one embodiment, the monovalent cation is supplied by the potassium, sodium, ammonium, or lithium salts of either chloride or acetate. For KCl, the concentration is between 1 and 200 mM, preferably the concentration is between 40 and 100 mM, although the optimum concentration may vary depending on the polymerase used in the reaction.

[00181] In one embodiment, deoxyribonucleotide triphosphates are added as solutions of the salts of dATP, dCTP, dGTP, dUTP, and dTTP, such as disodium or lithium salts. In one embodiment, a final concentration in the range of 1 μM to 2 mM each is suitable, and 100-600 μM is used, although the optimal concentration of the nucleotides may vary in the reverse transcription reaction depending on the total dNTP and divalent metal ion concentration, and on the buffer, salts, particular primers, and template. For longer products, i.e., greater than 1500 bp, 500 μM each dNTP and 2 mM MnCl2 may be preferred when using a Tris-HCl buffer.

[00182] In one embodiment, a suitable buffering agent is Tris-HCl, preferably pH 8.3, although the pH may be in the range 8.0-8.8. The Tris-HCl concentration is from 5-250 mM. In one embodiment, the Tris-HCl concentration is from 10-100 mM. In one embodiment, a buffering agent is Bicine-KOH, MOPS-KOH, or HEPES-KOH, with a pH in the range 7.8-8.7.

[00183] In one embodiment, EDTA less than 0.5 mM may be present in a reverse transcription reaction mix. Detergents such as Tween-20™ and Nonidet™ P-40 are present in the enzyme dilution buffers. In one embodiment, a final concentration of non-ionic detergent approximately 0.1% or less is appropriate, and will not interfere with polymerase activity. Similarly, glycerol is often present in enzyme preparations and is generally diluted to a concentration of 1-20% in the reaction mix. In one embodiment, a mineral oil overlay may be added to prevent evaporation but is not necessary.

Kits

[00184] As described above, mutant polymerases with altered error-rates have numerous molecular biology applications. Thus, the invention also provides kits comprising mutant polymerases. Such kits can comprise compositions comprising a mutant polymerase described herein together with readily available materials and reagents. In one embodiment, the kit of the invention comprises a cell modified to express a mutant polymerase of the invention. Kits preferably contain detailed instructions for how to perform the procedures for which the kits are adapted. A wide variety of kits can be prepared, depending on the intended user of the kit and the particular need of the user.

EXPERIMENTAL EXAMPLES

[00185] The invention is further described in detail by reference to the following experimental examples. These examples are provided for purposes of illustration only, and are not intended to be limiting unless otherwise specified. Thus, the invention should in no way be construed as being limited to the following examples, but rather, should be construed to encompass any and all variations which become evident as a result of the teaching provided herein.

[00186] Without further description, it is believed that one of ordinary skill in the art can, using the preceding description and the following illustrative examples, make and utilize the compounds of the present invention and practice the claimed methods. The following working examples therefore, specifically point out the preferred embodiments of the present invention, and are not to be construed as limiting in any way the remainder of the disclosure.

Example 1: A highly error-prone orthogonal replication system for rapid evolution of proteins and enzymes in yeast

[00187] The key focus of these experiments was to generate an orthogonal replication system with a polymerase having an increased mutation rate. Without being bound by a particular theory, it was hypothesized that the key determinant of mutation rate is the DNA polymerase (TP-DNAP1) that replicates the p1 plasmid (Figure 1). Therefore, the goal of these experiments was to find new TP-DNAP1 variants that have high mutation rates. Initial efforts yielded error-prone TP-DNAP1 mutants by transplanting known fidelity-reducing mutations from related family B DNA polymerases into TP-DNAP1. However, few transplanted mutations produced error-prone TP-DNAP1s that still retained activity. Therefore, current microarray DNA synthesis technologies were used to create an exhaustive scanning saturation mutagenesis library of TP-DNAP1 and screen for variants that have elevated mutation rates. Fidelity-reducing mutations identified from this screen were then grouped and select groups were cloned together to generate combinatorial libraries of TP-DNAP1. An additional screen was performed using this library, to identify TP-DNAP1 variants with further elevated mutation rates. Fidelity-reducing mutations identified from this screen were then combined with select fidelity-reducing mutations identified from the first, exhaustive screen, to generate another combinatorial library of TP-DNAP1. An additional screen was performed using this library, to identify TP-DNAP1 variants with even further elevated mutation rates.

[00188] The materials and methods employed in these experiments are now described.

Materials and Methods

Development of an Orthogonal Replication System

[00189] An extrachromosomal orthogonal error-prone replication system was developed in yeast. At its core, this system consists of a heterologous DNA polymerase/plasmid pair that is orthogonal to host replication such that the orthogonal DNA polymerase (DNAP) replicates only the orthogonal plasmid and not the host genome. By engineering the orthogonal DNAP to be error-prone, a cell that rapidly mutates only the orthogonal plasmid, which encodes only genes of interest, was obtained. This constitutes a general genetic system for the continuous, parallelizable, targeted, rapid evolution of user-defined genes in vivo, and a versatile synthetic biology platform for manipulating DNA replication inside a cell. The orthogonal replication system is described in detail in Ravikumar et al., Nat Chem Biol. 2014, 10(3): 175-177 and Arzumanyan et al., ACS Synth. Biol. 2018, 7, 1722-1729.

Mutational Screen

[00190] To clone the scanning saturation mutagenesis library of TP-DNAP1, a pool of ~19,000 oligonucleotides (130-200-mers) was obtained from Agilent Technologies and sub-cloned into an expression vector for TP-DNAP1. The oligo pool was designed as 29 sub-libraries, each covering a 25-50 variable amino acid region of the TP-DNAP1 open-reading frame and flanked by ~25 bp constant regions. The variable region consisted of a replacement of each amino acid in the w.t. sequence with 19 codons representing the 19 other amino acids. The mutagenic codons were chosen from a 20-codon genetic code with a maximal codon adaptation index for the S. cerevisiae genome. Constant regions were chosen for efficient PCR amplification. Each sub-library was PCR amplified and assembled into corresponding PCR-amplified plasmid 2 backbones by the Gibson method (Gibson et al., 2009). Assembled sub-libraries were transformed into E. coli at >30-fold coverage of theoretical diversity and plated on selective LB plates. After overnight growth at 37 °C, transformants were scraped from plates and resuspended in 0.9% NaCl for plasmid extraction. Control transformations containing only the plasmid 2 backbones were similarly treated, to verify a low frequency (<5%) of carry-over of the template plasmid. Plasmids were extracted from individual clones of two sub-libraries and subjected to analysis via agarose gel electrophoresis and Sanger sequencing.
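The arithmetic behind the library design above can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, and the 987-residue length used in the usage note is an assumption taken from the amino acid numbering used elsewhere in this document, not a stated design parameter of the oligo pool.

```python
import math


def saturation_library_size(n_positions: int, alternatives_per_position: int = 19) -> int:
    """Number of single-amino-acid variants when every position is replaced
    by each of the 19 other amino acids (one codon per amino acid)."""
    return n_positions * alternatives_per_position


def transformants_needed(diversity: int, fold_coverage: float) -> int:
    """Minimum number of transformants for a given fold coverage of diversity."""
    return math.ceil(diversity * fold_coverage)


# Assumption: a 987-residue open-reading frame, scanned at every position.
full_library = saturation_library_size(987)

# A sub-library covering 25-50 variable amino acids holds 475-950 variants.
smallest_sub = saturation_library_size(25)
largest_sub = saturation_library_size(50)

# Transformants required for 30-fold coverage of the largest sub-library.
needed = transformants_needed(largest_sub, 30)
```

Under that length assumption the full library holds 18,753 variants, which is consistent with the pool of ~19,000 oligonucleotides described above.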

[00191] To clone mutant TP-DNAP1 shuffling libraries, plasmids of 65 error-prone basis set mutants identified from a fidelity screen of the scanning saturation mutagenesis library were pooled and crossed by the Gibson method (Gibson et al., 2009). Since many basis set mutants carry mutations outside of strictly conserved motifs, the TP-DNAP1 open-reading frame was segmented into four regions to define broader boundaries for shuffling: the exonuclease domain (amino acids 1-596), motif A (amino acids 597-684), motif B (amino acids 685-819) and motif C (amino acids 820-987) (Fig. 6). To cross the 7 motif B basis set mutants with the 10 motif A and 8 motif C basis set mutants, the corresponding regions were PCR amplified from individual mutant TP-DNAP1 plasmids, and PCR amplicons from each region were pooled in equimolar ratios. Pooled fragments were assembled with a PCR-amplified backbone of the TP-DNAP1 expression vector, using the Gibson method (Gibson et al., 2009). Assembled libraries were transformed and extracted as described above. Shuffling libraries contained a large fraction of misassembled plasmids, as determined by agarose gel electrophoresis. The desired plasmid population was purified by gel extraction and re-transformation. Both transformation steps retained >100-fold coverage of theoretical library size. Plasmids were extracted from individual clones of the purified libraries and subjected to analysis via gel electrophoresis and Sanger sequencing. To cross mutants identified from a fidelity screen of the motif A, B, and C double mutant libraries with exonuclease basis set mutants, a new region was defined to cover the double mutants (amino acids 597-987), and a similar cloning procedure was followed.

[00192] All yeast transformations (including p1 integrations) were performed as described previously (Ravikumar et al., 2014). Genomic modifications were made using a CRISPR-Cas9 system for S. cerevisiae (Ryan et al., 2016).

[00193] We note the following protocol modifications for library transformations: (i) 10 μg of plasmid DNA was added for each library transformation; (ii) cells were incubated at 30 °C for 45 min with rotation at ~10 r.p.m. prior to heat shock; (iii) cells were resuspended in 0.9% NaCl after heat shock and a small portion was plated on selective synthetic complete (SC) medium to determine library size; (iv) the remaining resuspension was inoculated directly into 50 mL (per transformation) of selective SC media and grown to saturation at 30 °C.

[00194] Whole-cell DNA extractions followed the yeast DNA miniprep procedure described previously (Ravikumar et al., 2014). Cytoplasmic plasmid extraction followed the standard whole-cell yeast DNA extraction protocol with a few modifications: (1) cells were washed in 0.9% NaCl prior to treatment with Zymolyase (US Biological); (2) 200 μg/mL proteinase K (Fisher Scientific) was supplemented during SDS treatment for degradation of TP; (3) rotation at ~10 r.p.m. was used during Zymolyase treatment.

[00195] A list of 99 homologs to TP-DNAP1 (EMBL accession number: CAA25568.1) was generated via protein BLAST (Altschul et al., 1990) with default settings. A multiple sequence alignment of TP-DNAP1 to these homologs was performed using Clustal Omega (Sievers et al., 2014) and the resulting alignment was analyzed using Jalview (Waterhouse et al., 2009).

[00196] Amino acid mutations were selected based on three criteria. First, candidate positions should be flanked on both sides by residues with sequence alignment to >75% of homologs. Second, the TP-DNAP1 amino acid at a candidate position should be represented across >25% of homologs. Third, amino acids not present in TP-DNAP1 at a candidate position should be conserved across >25% of homologs. If these criteria were met, then amino acids identified from the third criterion were introduced at the candidate position in TP-DNAP1.
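The three selection criteria above can be expressed as a short filter over a multiple sequence alignment. The following is a minimal sketch, not the procedure actually used: the function name, the gap character and the toy alignment are hypothetical, and "sequence alignment to >75% of homologs" is interpreted here as "a non-gap residue in more than 75% of homologs", which is an assumption.

```python
from collections import Counter

GAP = "-"  # assumed gap character in the aligned sequences


def candidate_substitutions(reference, homologs, column,
                            flank_min=0.75, wt_min=0.25, alt_min=0.25):
    """Return amino acids to introduce at `column` of the aligned reference
    sequence, or an empty list if the three criteria are not all met."""
    n = len(homologs)

    def aligned_fraction(col):
        # Fraction of homologs with a residue (not a gap) in this column.
        return sum(h[col] != GAP for h in homologs) / n

    # Criterion 1: both flanking columns are aligned in >75% of homologs.
    if column == 0 or column == len(reference) - 1:
        return []
    if aligned_fraction(column - 1) <= flank_min or aligned_fraction(column + 1) <= flank_min:
        return []

    wt = reference[column]
    if wt == GAP:
        return []
    counts = Counter(h[column] for h in homologs if h[column] != GAP)

    # Criterion 2: the reference amino acid is found in >25% of homologs.
    if counts[wt] / n <= wt_min:
        return []

    # Criterion 3: alternative amino acids found in >25% of homologs.
    return sorted(aa for aa, c in counts.items() if aa != wt and c / n > alt_min)


# Toy alignment (hypothetical sequences, for illustration only).
toy_reference = "ALK"
toy_homologs = ["ALK", "ALK", "AIK", "AIK"]
toy_result = candidate_substitutions(toy_reference, toy_homologs, 1)
```

In the toy alignment, leucine and isoleucine each occupy the middle column in half of the homologs, so isoleucine is returned as the substitution to introduce.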

[00197] All three TP-DNAP1 libraries were screened through small-scale p1 fluctuation tests in a metastable S. cerevisiae strain, OR-Y24. OR-Y24 contains w.t. p1 and recombinant p1 that lacks w.t. TP-DNAP1, and instead encodes a standardized fluorescence reporter of p1 copy number (Table 1), and a disabled version of the LEU2 selection marker (leu2 (Q180*)). As described previously (Ravikumar et al., 2014), leu2 (Q180*) contains a C→T mutation at base 538 in LEU2 at a site permissive to all single point mutants that generate missense mutations. Reversion to functional LEU2 can be detected on medium lacking leucine.

Table 1: Calibration curve of qPCR-determined p1 copy number to p1-encoded mKate2 fluorescence.

[00198] Generally, to screen TP-DNAP1 mutants, OR-Y24 strains were transformed with TP-DNAP1 plasmids and the resulting yeast strains were passaged 3-4 times at 1:100 dilutions in SC-UH to fully cure w.t. p1. Cured strains were diluted 1:10,000 into selective SC media for fluctuation tests. Selective SC media used for p1 fluctuation tests lacked uracil, histidine and tryptophan, and was adjusted to pH 5.8 with NaOH (SC-UHW, pH 5.8). Absence of tryptophan and pH adjustment inhibited growth on reversion medium resulting from nonsense suppression of leu2 (Q180*). Dilutions were split into three 100 μL cultures and one 200 μL culture in 96-well trays, and cultures were grown to saturation for 2-2.5 days. Saturated 200 μL cultures were subjected to a copy number measurement, as described below. The remaining three replicates were washed and resuspended in 35 μL 0.9% NaCl. 10 μL was spot plated onto solid SC medium selective for LEU2 revertants. Solid SC medium used for p1 fluctuation tests lacked uracil, histidine, tryptophan and leucine and was adjusted to pH 5.8 with NaOH (SC-UHLW, pH 5.8). Plates were incubated at 30 °C for 5-6 days, and afterwards, the colony count was determined for each spot.

[00199] Small-scale fluctuation data were analyzed by the MSS maximum-likelihood estimator method (Foster et al., 2006). Measuring cell titers was infeasible due to the large number of strains, so the average number of cells per culture was assumed to remain constant. Relative phenotypic mutation rates were calculated by normalization to p1 copy number.

[00200] Prior to screening the TP-DNAP1 scanning saturation mutagenesis library, a functional purification was imposed in OR-Y24 to eliminate frame-shifted TP-DNAP1 variants, which were common due to errors in oligonucleotide synthesis. The pilot study confirmed that sub-libraries transformed into OR-Y24 are enriched for full-length TP-DNAP1 variants after two passages in SC media lacking uracil and histidine (SC-UH). To purify the entire scanning saturation mutagenesis library, the remaining 27 TP-DNAP1 plasmid sub-libraries were individually transformed into OR-Y24, and the resulting yeast sub-libraries were passaged twice at 1:100 dilutions in SC-UH. Passaged yeast sub-libraries were individually plated on solid SC medium lacking histidine. For each sub-library, 24 colonies were propagated in small cultures of SC-UH, in order to verify that >90% of clones robustly grow under selection for p1 replication. Afterwards, purified yeast sub-libraries were plated on solid media and colonies from each were individually inoculated into small cultures of SC-UH at ~1-fold coverage of theoretical sub-library diversity. This resulted in a total of 13,625 clones. (This does not include sub-libraries 10- 10, which correspond to the putative N-terminal TP of TP-DNAP1, which should not influence fidelity. These sub-libraries were cloned and purified, but omitted from the screen.) The arrayed clones were then cured of w.t. p1 and subjected to small-scale p1 fluctuation tests, as described above. Then, 376 clones with the highest relative phenotypic mutation rates were subjected to an additional small-scale p1 fluctuation test with six replicates. TP-DNAP1 expression vectors were isolated from 95 yeast clones with the highest relative phenotypic mutation rates and subjected to Sanger sequencing. These TP-DNAP1s were characterized with large-scale p1 fluctuation tests, as described below.

[00201] Large-scale fluctuation tests of p1-encoded leu2 (Q180*) were performed to precisely determine per-base substitution rates for individually cloned or isolated TP-DNAP1 variants. Large-scale p1 fluctuation tests were performed similarly to small-scale p1 fluctuation tests, with several modifications. First, large-scale p1 fluctuation tests were typically performed with 36-48 replicates. For highly error-prone TP-DNAP1s obtained from later rounds of screening, fewer replicates (3-16) were used, which is sufficient for similar precision (Foster et al., 2006). Second, p1 copy number was determined by the flow cytometry method, described below. Third, cell titers were measured for each fluctuation test to estimate the average number of cells per culture. Cell resuspensions were diluted and plated on solid SC-UH medium, and colony counts were determined after incubation at 30 °C for 2-3 days. Alternatively, cell resuspensions were diluted and subjected to an event-count measurement via flow cytometry. Fourth, inoculums of highly error-prone TP-DNAP1s occasionally contained pre-existing mutants, despite the 1:10,000 dilution, so mutant frequencies were estimated by plating pre-cultures on solid SC-UHLW, pH 5.8 medium. Plates were incubated for 2-3 days, in parallel with cultures grown for fluctuation tests. Pre-culture mutant titers were counted to estimate the number of replicates in the fluctuation test expected to contain pre-existing mutants (n). Revertants were counted from all replicates of the fluctuation tests, counts were sorted, and the n replicates with the highest counts were omitted from calculations.
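The final trimming step described above (sort the revertant counts, then omit the n highest) can be sketched as below. The function name and the example counts are hypothetical, and the estimate of n from pre-culture mutant titers is taken as an input rather than modeled.

```python
def drop_preexisting(revertant_counts, n_expected):
    """Sort revertant counts and omit the n_expected replicates with the
    highest counts, which are presumed to carry pre-existing mutants."""
    if n_expected < 0:
        raise ValueError("n_expected must be non-negative")
    ordered = sorted(revertant_counts)
    keep = max(0, len(ordered) - n_expected)
    return ordered[:keep]


# Hypothetical counts from six replicates, one expected to hold a
# pre-existing mutant (a "jackpot" culture).
example_counts = [3, 0, 12, 1, 250, 2]
trimmed = drop_preexisting(example_counts, 1)
```

The trimmed list is what would then be passed to the mutation-rate estimator.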

[00202] To calculate per-base substitution rates, fluctuation data were analyzed by the maximum likelihood method, implemented using newton.LD.plating in rSalvador 1.7 (Zheng, 2017). Phenotypic mutation rates were calculated by normalizing to the average number of cells per culture. Phenotypic mutation rates were divided by the measured p1 copy number and by the number of ways leu2 (Q180*) can revert to LEU2 (2.33 for the ochre codon) to yield per-base substitution rates. 95% confidence intervals were similarly scaled by these factors. All data related to large-scale p1 fluctuation tests are fully described in Table 2.

Table 2: Mutation-prone TP-DNAP1 Candidates
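The scaling described above, from an estimated number of mutations per culture to a per-base substitution rate, can be sketched as follows. The function names and the example numbers are hypothetical; the maximum-likelihood estimate itself (obtained with rSalvador in the text) is taken as an input here.

```python
import math


def per_base_substitution_rate(mutations_per_culture, cells_per_culture,
                               copy_number, reversion_paths=2.33):
    """Scale an estimated number of mutations per culture to a per-base rate.

    reversion_paths is the number of ways leu2 (Q180*) can revert to LEU2,
    given in the text as 2.33 for the ochre codon.
    """
    phenotypic_rate = mutations_per_culture / cells_per_culture
    return phenotypic_rate / copy_number / reversion_paths


def scale_interval(interval, cells_per_culture, copy_number, reversion_paths=2.33):
    """Apply the same scaling to the bounds of a confidence interval."""
    low, high = interval
    return (per_base_substitution_rate(low, cells_per_culture, copy_number, reversion_paths),
            per_base_substitution_rate(high, cells_per_culture, copy_number, reversion_paths))


# Hypothetical inputs: 2.33 mutations per culture, 1e6 cells, copy number 10.
example_rate = per_base_substitution_rate(2.33, 1e6, 10)
```

With these illustrative inputs the factors cancel to a rate of 1×10⁻⁷ substitutions per base.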

[00203] A calibration curve was established to correlate p1 copy number, determined via quantitative PCR (qPCR), with fluorescence of p1-encoded mKate2 (Table 1). Five TP-DNAP1 variants representing a wide range of copy numbers were transformed into OR-Y24 and passaged until w.t. p1 was displaced. The five OR-Y24 strains were grown to saturation, diluted 1:10,000 to mimic p1 fluctuation tests, and grown in triplicate 100 μL cultures and duplicate 40 mL cultures. 100 μL saturated cultures were subjected to fluorescence measurement of mKate2 (ex/em = 561 nm/620 nm, bandwidth = 15 nm) on a flow cytometer (Invitrogen Attune NxT). Whole-cell DNA extracts were prepared from 40 mL cultures and used as templates for qPCR measurement of p1-encoded leu2 (Q180*), as described previously (Ravikumar et al., 2014). p1-encoded LEU2 was PCR amplified with qPCR-Leu2F and qPCR-Leu2R, and genomic LEU3 was PCR amplified with qPCR-Leu3F and qPCR-Leu3R, using SYBR Green (Fisher Scientific) master mix. The correlation of mKate2 fluorescence and qPCR-determined p1 copy number had a strong linear fit across p1 copy numbers ranging from 9-90, and had low background (y = 0.048x + 0.206, r² = 0.954).

[00204] To assay p1 copy number for large-scale p1 fluctuation tests, additional replicates of OR-Y24 strains were grown from the 1:10,000 dilution in triplicate and saturated cultures were subjected to fluorescence measurement via flow cytometry. Fluorescence measurements were converted to copy number with the linear calibration curve.
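Converting a fluorescence reading to a copy number amounts to inverting the linear calibration curve (slope 0.048, intercept 0.206). The sketch below assumes that y in the fit is mKate2 fluorescence and x is p1 copy number; the text does not state the axis assignment explicitly, so this is an assumption, and the function names are hypothetical.

```python
import math

SLOPE = 0.048       # fluorescence units per p1 copy (assumed axis assignment)
INTERCEPT = 0.206   # background fluorescence at zero copies (assumed)
CALIBRATED_RANGE = (9, 90)  # copy numbers spanned by the calibration strains


def copy_number_from_fluorescence(fluorescence):
    """Invert y = SLOPE * x + INTERCEPT to recover copy number x."""
    return (fluorescence - INTERCEPT) / SLOPE


def in_calibrated_range(copy_number):
    """True if the estimate lies within the range covered by the fit."""
    low, high = CALIBRATED_RANGE
    return low <= copy_number <= high


# Round trip for a hypothetical strain with 50 copies of p1.
example_reading = SLOPE * 50 + INTERCEPT
example_copies = copy_number_from_fluorescence(example_reading)
```

Estimates outside the 9-90 range rely on extrapolating the fit and may be less reliable.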

[00205] To assay relative p1 copy number for small-scale p1 fluctuation tests, an additional 200 μL culture of each OR-Y24 strain was grown from the 1:10,000 dilution. 200 μL saturated cultures were subjected to an OD600 measurement and fluorescence measurement of mKate2 (ex/em = 588 nm/633 nm), using a microplate reader (TECAN Infinite M200 PRO). A linear relationship was assumed between p1 copy number and OD600-normalized mKate2 fluorescence. Copy numbers were calculated from normalization to a w.t. TP-DNAP1 control.

[00206] For a few large-scale p1 fluctuation tests, p1 copy number was assayed by the method of small-scale p1 fluctuation tests (i.e. microplate reader measurement). For these experiments, mutant TP-DNAP1 p1 copy numbers were calculated by normalization to a w.t. TP-DNAP1 control. The copy number of this control was assumed to be the average w.t. TP-DNAP1 copy number from all large-scale p1 fluctuation tests that used flow cytometry measurements.

[00207] From mKate2 measurements of the 13,625 clones screened with small-scale p1 fluctuation tests, 283 clones exhibiting high fluorescence were chosen for additional characterization. TP-DNAP1 plasmids were isolated from these strains and subjected to Sanger sequencing. 210 unique variants were identified and the corresponding plasmids were re-transformed into OR-Y24. Transformed strains were passaged in SC-UH until w.t. p1 was displaced, and subjected to p1 copy number measurement in triplicate. Four high-copy variants were directly subjected to qPCR measurements for validation. To test the suppressor activity of the G410H mutation, which yields increased p1 copy number, this amino acid change was added to several low-activity TP-DNAP1 variants. p1 copy number was similarly assayed for these strains.

[00208] From this screen, we found 95 promising error-prone TP-DNAP1 candidates (Table 2). After measuring their mutation rates more accurately through large-scale fluctuation tests, we identified 41 unique variants (Rd2 mutants) with error rates up to ~2×10⁻⁷ s.p.b. (Table 2). Unlike Rd1 mutants, Rd2 mutants retained high activity, and on average replicated p1 at only a 2-fold lower copy number than did wild type (w.t.) TP-DNAP1. Only 9 of the Rd2 hits contained mutations at positions considered in the homology-based library design that generated Rd1 hits, indicating that fidelity determinants of TP-DNAP1 can lie outside of the most-conserved regions of DNAPs. Incidentally, we also discovered 210 TP-DNAP1 variants that replicated p1 at a higher copy number than did w.t. TP-DNAP1 (Table 3), and added the mutation from one of these variants to several low-activity mutator TP-DNAP1s to confirm the generality of the activity-boosting phenotype (Table 4). (These variants were not included in subsequent experiments here, but should prove useful in future TP-DNAP1 engineering efforts.) Rd2 hits were combined with Rd1 hits to form a 65-member basis set of mutations that moderately increase the error rate of TP-DNAP1 (Figure 2).

Table 3: TP-DNAP1 Variants that Replicate p1 at a Higher Copy Number than w.t. TP-DNAP1

Table 4: Mutation G410H broadly increases activity of TP-DNAP1 variants

[00209] Using this basis set, we designed, cloned, and screened combinatorial libraries in order to find highly error-prone TP-DNAP1s. To limit combinatorial diversity in our designs, we grouped basis set mutations according to their proximity to DNAP motifs known to affect fidelity (i.e. the A and C motifs in the palm domain, the B motif in the fingers domain, and the Exo I, II, and III motifs in the exonuclease domain (Joyce and Steitz, 1994)) and cloned only inter-motif combinations. We expected that synergy between inter-motif mutations from different domains (e.g. motif A mutations with motif B mutations) would yield super-additive or super-multiplicative reductions in fidelity, as observed with RB69 DNAP and E. coli Pol I, respectively (Bebenek et al., 2001; Camps et al., 2003). We screened a library of motif B mutants crossed with motif A and C mutants and found 46 mutators (Table 2). The most error-prone of these 46 include three TP-DNAP1 variants (Rd3 mutants) with mutation rates of ~1×10⁻⁶ s.p.b., representing a ~400-fold increase over the w.t. TP-DNAP1 mutation rate and a ~10,000-fold increase over the yeast genomic mutation rate. We then crossed these Rd3 mutants with all of the exonuclease domain mutants from our basis set. After screening the resulting library, we obtained four hits (Rd4 mutants), including two highly error-prone variants, TP-DNAP1-4-1 (V574F, I777K, L900S) and TP-DNAP1-4-2 (L477V, L640Y, I777K, W814N), that replicate p1 at mutation rates of ~7×10⁻⁶ s.p.b. and ~1×10⁻⁵ s.p.b., respectively, and that both sustain a p1 copy number of ~5 (Figure 3; Table 2). Additional rounds of library design and screening should reach even higher error rates, but these two Rd4 mutants are already exceptionally error prone, so we ended our polymerase engineering effort at this point. As a practical guide, for facile generation of DNA libraries in vivo with TP-DNAP1-4-2, a 1 μL saturated yeast culture is theoretically sufficient for 1-fold coverage of all single mutants of a 1 kb gene and a 200 mL culture is sufficient for all double mutants. With mutational accumulation, highly diverse libraries can be generated in even smaller volumes: 1-fold coverage of all double mutants of a 1 kb gene can be achieved in a 650 μL culture with just 50 generations of propagation.
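The size of the mutant space referred to in the practical guide above can be counted directly. The sketch below counts only nucleotide-level single and double substitution variants of a gene of a given length; it does not model cell density, copy number or mutational supply, which the quoted culture volumes additionally depend on. The function names are hypothetical.

```python
import math


def n_single_mutants(gene_length_bp):
    """Each base can be changed to any of the 3 other bases."""
    return 3 * gene_length_bp


def n_double_mutants(gene_length_bp):
    """Choose 2 distinct positions, each with 3 possible substitutions."""
    return math.comb(gene_length_bp, 2) * 3 * 3


singles_1kb = n_single_mutants(1000)
doubles_1kb = n_double_mutants(1000)
```

For a 1 kb gene this gives 3,000 single mutants and 4,495,500 double mutants, which illustrates why covering all double mutants requires a far larger population than covering all single mutants.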

[00210] Fluctuation tests of the genomic URA3 gene were performed to determine genomic per-base substitution rates, as previously described (Ravikumar et al., 2014). Fluctuation data were analyzed by the maximum likelihood method, as described for large-scale p1 fluctuation tests. Phenotypic mutation rates were divided by the target size for loss of function of URA3 via base pair substitution (Lang et al., 2008), to yield per-base substitution rates. 95% confidence intervals were similarly scaled by these factors. All data related to genomic URA3 fluctuation tests are fully described in Table 5.

Table 5: Yeast genomic substitution mutation rates in the presence of error-prone TP-DNAP1 variants

[00211] We found that the high p1 mutation rates driven by error-prone TP-DNAP1s remained completely stable for the longest duration tested (90 generations; Table 2), and genomic mutation rates remained unchanged in the presence of p1 replication by the most error-prone TP-DNAP1, TP-DNAP1-4-2 (Figure 3; Table 5). Therefore, the mutant TP-DNAP1 of the invention can durably sustain in vivo mutagenesis with complete orthogonality (i.e. at least ~100,000-fold mutational targeting) to enable continuous evolution experiments.

Comparison of error-prone TP-DNAP1 variants to elevated genomic mutation rates

[00212] The mutant TP-DNAP1 of the invention can access and sustain mutation rates that untargeted genome mutagenesis cannot. There is a theoretically predicted and empirically observed inverse relationship between the length of an information-encoding polymer, such as a gene or genome, and the tolerable error rate of replication (Biebricher et al., 2006; Bull et al., 2007; Drake 1991; Nowak and Schuster, 1989). At sufficiently high mutation rates, essential genetic information is destroyed every generation, guaranteeing extinction, and even moderately elevated mutation rates can erode fitness (Herr et al., 2011; Wilke et al., 2001). Continuous directed evolution systems fundamentally work by targeting mutagenesis to desired genes in order to bypass the low error thresholds of large cellular genomes, but existing systems still elevate genome-wide mutation rates of cells or phages, falling short of a complete bypass (Badran and Liu, 2015; Camps et al., 2003; Crook et al., 2016; Esvelt et al., 2011; Fabret et al., 2000; Finney-Manchester and Maheshri, 2013; Halperin et al., 2018; Moore et al., 2018). Since the mutant TP-DNAP1 is fully orthogonal to genomic replication, it achieves the complete bypass of genomic error thresholds for genes of interest, which should result in the ability to run in vivo continuous evolution for indefinitely large numbers of generations at mutation rates that are exclusively limited by the thresholds of user-selected genes.

[00213] In order to demonstrate the limitations of genomic error thresholds on continuous evolution, we experimentally applied high mutation rates to the host genome. This was done by transplanting previously discovered mutations that increase the substitution mutation rate of POL3 (Herr et al., 2011), the primary yeast lagging strand DNAP, into w.t. or mismatch repair deficient (Δmsh6) versions of AH22, the parent of TP-DNAP1-containing strains. Mutator phenotypes, verified by fluctuation tests at a genomic locus, were accompanied by severe growth defects and led to immediate extinction in the case of pol3-01, Δmsh6 AH22 (Figure 4). In agreement with a previous estimate (Herr et al., 2011), the projected mutation rate imposed in this nonviable AH22 strain was 4.72×10⁻⁶ s.p.b., calculated from the individual contribution of pol3-01 and the average effect of MSH6 loss. This mutation rate is presumed to exceed the haploid yeast error-induced extinction threshold, thereby killing the host cell (Herr et al., 2011). Since replication of p1 by TP-DNAP1-4-2 occurs at a higher mutation rate than 4.72×10⁻⁶ s.p.b., we conclude that the mutant TP-DNAP1 can stably exceed categorical mutation rate limits on replicating cellular genomes.

[00214] We also asked whether viable genomic mutator strains could sustain mutagenesis. Four AH22 strains with mutation rates of 1.64×10⁻⁷-5.24×10⁻⁷ s.p.b. were propagated for 82 generations in triplicate, and afterwards, a clone from each was subjected to genomic mutation rate measurements via fluctuation tests (Figure 5). Across replicates, the mutation rate dropped an average of 284-fold (Figure 5), likely due to suppressor mutants that alleviate deleterious genome mutagenesis and overtake the population (Herr et al., 2011). In contrast, mutagenesis on p1 remained constant (Figure 5). This indicates that in durations relevant to directed evolution experiments, even moderate genome mutagenesis is unsustainable whereas continuous mutagenesis in the invention is sustainable.

[00215] To construct POL3 mutator strains, plasmids were transformed into AH22 encoding wild-type POL3 on a URA3 plasmid or Δmsh6 AH22 encoding wild-type POL3 on a URA3 plasmid. Transformants were expanded in selective SC medium and spot plated on selective SC medium or selective SC medium supplemented with 5-FOA (1 g/L) for plasmid shuffle of the plasmid encoding wild-type POL3 via URA3 counter-selection (Boeke et al., 1984).

[00216] Fluctuation tests of the genomic CAN1 gene were performed to determine genomic per-base substitution rates. To minimize propagation of POL3 mutator strains, fluctuation tests were performed directly on colonies from plasmid shuffle plates. For each strain, 48 colonies from plasmid shuffle plates were individually scraped and resuspended in 120 μL 0.9% NaCl. 10 μL from each resuspension was diluted and subjected to an event-count measurement via flow cytometry. This was done to identify colonies of similar cell count, because fluctuation tests are only appropriate when final population sizes for all replicates are similar. 24 resuspensions with similar event counts were used for fluctuation tests. 90 μL from the 24 resuspensions were mixed and plated onto SC medium lacking arginine and supplemented with 10X canavanine (0.6 g/L). 10 μL from four of the 24 resuspensions was pooled, diluted and titered on solid SC medium. Plates were incubated at 30 °C. Colonies were counted from titer plates after 2-4 days and from spot plates after 3-6 days. Based on titer counts, the average number of generations that occurred during colony formation was ~15.

[00217] To test the stability of mutator phenotypes, three colonies from each plasmid shuffle plate were passaged ten times at 1 : 100 dilutions (67 generations). A single clone from each final population was isolated and subject to CAN1 fluctuation tests. This was performed using a protocol similar to that described previously for URA3 fluctuation tests (Ravikumar et al., 2014), except cultures from CAN1 fluctuation tests were plated onto solid SC medium lacking arginine and supplemented with 10X canavanine (0.6 g/L).
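The generation count quoted for serial passaging follows from the dilution factor: a culture diluted 1:100 must double log2(100) ≈ 6.64 times to return to its previous density. The sketch below assumes each passage regrows to the same saturation density and, following the figure given above, rounds up to a whole number of generations; the function name is hypothetical.

```python
import math


def generations_from_passaging(n_passages, dilution_factor):
    """Cell doublings accumulated over serial passages, assuming each
    culture regrows to the same density before the next dilution."""
    return n_passages * math.log2(dilution_factor)


# Ten passages at 1:100 dilutions, rounded up to whole generations.
ten_passages = math.ceil(generations_from_passaging(10, 100))
```

The same calculation applies to any passaging regime, for example a single 1:10,000 dilution corresponds to roughly 13.3 doublings.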

[00218] Fluctuation data were analyzed as described above for URA3 fluctuation data, but with mutation frequency parameters for CAN1 (Lang et al., 2008). The Fenton approximation (Fenton, 1960) was used to calculate the predicted rate of the extinct mutator strain. All data related to genomic CAN1 fluctuation tests are fully described in Figure 4 and Figure 5.

Use of error-prone TP-DNAP1 for high-throughput evolution of drug resistance

[00219] Sustainable, continuous, and targeted mutagenesis with the mutant polymerases of the invention can be used to understand and predict drug resistance in high-throughput evolution experiments that abundantly sample adaptive trajectories and outcomes. PfDHFR resistance to the antimalarial drug pyrimethamine occurs in the wild primarily through four active site mutations (N51I, C59R, S108N, and I164L), but the broader resistance landscape remains largely unknown. Laboratory evolution and landscape-mapping studies have mostly been limited to the quadruple mutant fitness peak (qm-wild) and suggest that resistance reproducibly arises from the crucial S108N mutation, followed by step-wise paths to qm-wild (Chusacultanachai et al., 2002; Hankins et al., 2001; Japrung et al., 2007; Lozovsky et al., 2009; Sirawaraporn et al., 1997; Wooden et al., 1997). We asked whether high-throughput directed evolution of PfDHFR resistance to pyrimethamine would reveal a more complex landscape with additional fitness peaks, including ones that forgo S108N.

[00220] We used mutant TP-DNAP1 to evolve PfDHFR resistance to pyrimethamine in 90 independent 0.5 mL cultures. Based on a well-established yeast model of PfDHFR, we constructed transgenic yeast strains that lack endogenous DHFR and depend on p1-encoded PfDHFR. These strains acquired sensitivity to pyrimethamine and, in pilot studies, evolved resistance by accumulating mutations in PfDHFR. We found that resistance arose more commonly and successfully as the mutation rate of p1 was increased, suggesting that the mutant polymerases of the invention could indeed be used to drive rapid PfDHFR evolution. To perform a large-scale resistance evolution experiment, strain OR-Y8, which uses the most mutagenic TP-DNAP1 (TP-DNAP1-4-2) to replicate p1-encoded PfDHFR, was seeded into 90 independent 0.5 mL cultures containing pyrimethamine. Cultures were grown to saturation and uniformly passaged at 1:100 dilutions into media containing gradually increasing pyrimethamine concentrations chosen to maintain strong selection as populations adapted. After just 13 passages (i.e., 87 generations), 78 surviving populations adapted to media containing the maximum soluble concentration of pyrimethamine (3 mM). (Revival experiments showed that extinction of the other 12 replicates was stochastic and that they could also adapt given repeated chances (Table 6).) From Sanger sequencing analysis of bulk adapted populations, we identified 37 unique protein-coding mutations across all replicates and as many as six amino acid changes in a single population. A large fraction of these mutations are predicted to be adaptive. For example, ten of the 37 mutations have been previously reported to yield pyrimethamine resistance (Chusacultanachai et al., 2002; Hankins et al., 2001; Japrung et al., 2007; Tanaka et al., 1990). In addition to these 37 mutations, several mutations identified in the promoter region increased gene expression, and we hypothesize that some of the observed synonymous mutations in PfDHFR reduce translational suppression mediated by binding of PfDHFR to its own mRNA sequence (Zhang and Rathod, 2002).
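The relationship between serial passaging and elapsed generations in paragraph [00220] can be checked with a short calculation. The sketch below assumes that each culture regrows to its pre-dilution density after every passage, so that one 1:100 passage corresponds to log2(100) doublings; the function and variable names are illustrative and not part of the original disclosure.

```python
import math

def generations_per_passage(dilution_factor: float) -> float:
    """Doublings needed to regrow to the pre-dilution density
    after a 1:dilution_factor passage (assumes full regrowth)."""
    return math.log2(dilution_factor)

PASSAGES = 13            # number of passages, from the text
DILUTION = 100           # 1:100 passaging, from the text
SEEDED, ADAPTED = 90, 78 # cultures seeded and cultures that adapted, from the text

total_generations = PASSAGES * generations_per_passage(DILUTION)
extinct = SEEDED - ADAPTED

print(f"~{total_generations:.1f} generations over {PASSAGES} passages")
print(f"{extinct} of {SEEDED} cultures went extinct")
```

Under this assumption the estimate comes out slightly above 86 generations, in line with the approximately 87 generations stated above, and the 12 extinct replicates follow directly from 90 seeded and 78 adapted cultures.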

Table 6: Stochastic extinction in revival experiments of PfDHFR evolution

[00221] Adapted populations primarily converged on a region of the PfDHFR resistance landscape that contains previously unidentified S108N-based genotypes as fit as qm-wild. Across all replicates, we observed seven pervasive coding changes, including 737_738insA, which creates an adaptive C-terminal truncation. The two most common mutations, C59R and S108N, occur together in 62/78 adapted populations. Although these mutations are present in qm-wild, only one population accumulated a third mutation from the qm-wild peak. Instead, most populations diverged from qm-wild and acquired combinations of C50R, D54N, or Y57H in addition to C59R and S108N, indicating a new region in the PfDHFR resistance landscape with high fitness. To validate this, we fully mapped the resistance landscape of this region defined by C50R (10000), D54N (01000), Y57H (00100), C59R (00010), and S108N (00001) by constructing and measuring the MIC of all combinations of these five mutations. We found that this region is indeed highly fit and contains four alleles that have similar or higher pyrimethamine MICs than qm-wild (11110, 10111, 01111 and 11111). Since these alleles are close in genotype, differing by only one or two mutations, they approximate a fitness plateau. In two replicate lines of our evolution experiment, this plateau is reached via 01111. Although most replicates in our experiment do not reach this particular plateau, the 00111 intermediate was frequently accessed. In these instances, additional adaptive mutations were often acquired outside of the five-mutation landscape. For example, one of the replicates contains the previously reported C6Y resistance mutation, alongside Y57H, C59R, and S108N. Since 00111 by itself is almost as resistant as genotypes on the plateau, these populations likely achieved comparable fitness atop neighboring peaks in the wider landscape. 
Taken together, we conclude that our evolution experiments were able to rapidly identify previously unknown solutions to PfDHFR resistance.
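The five-bit genotype notation used in paragraph [00221] can be made concrete with a small helper. This is a minimal sketch of the encoding only (bit order C50R, D54N, Y57H, C59R, S108N, as defined above); it contains no MIC data, and the helper names are hypothetical.

```python
from itertools import combinations, product

# Bit order taken from the text: C50R (10000) ... S108N (00001)
MUTATIONS = ("C50R", "D54N", "Y57H", "C59R", "S108N")

def decode(genotype: str) -> list:
    """Translate a bit string such as '00111' into the mutations it carries."""
    return [m for bit, m in zip(genotype, MUTATIONS) if bit == "1"]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

# All combinations of the five mutations, i.e. the fully mapped landscape
all_genotypes = ["".join(bits) for bits in product("01", repeat=len(MUTATIONS))]

# The four alleles reported above as matching or exceeding qm-wild
plateau = ["11110", "10111", "01111", "11111"]
max_distance = max(hamming(a, b) for a, b in combinations(plateau, 2))

print(len(all_genotypes), "genotypes in the five-mutation landscape")
print("00111 =", decode("00111"))
print("largest pairwise distance on the plateau:", max_distance)
```

The landscape contains 2^5 = 32 genotypes, and the four plateau alleles differ from one another at no more than two positions, which is what the description of a fitness plateau relies on.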

[00222] Epistasis among mutations in S108N-based trajectories directs adaptation to 01111 and leads to the observed convergence of 00111 across replicate lines. Because S108N is a highly adaptive single mutant, 00001 rapidly and repeatedly fixed first in evolving populations, and blocked access to the 96/120 possible trajectories in this landscape that start with other first-step mutations. From 00001, access to the fitness plateau is constrained by negative epistasis between S108N and D54N, which is relieved and changes sign only when Y57H and C59R are both present. (We note that adapted populations in our evolution experiment containing high frequencies of D54N, C59R and S108N without Y57H typically carry other, potentially compensatory, promoter and coding mutations that take the place of Y57H.) As a result, just eight of the 24 possible paths from 00001 to the plateau avoid inactive PfDHFR intermediates. Adapting populations limited to these paths likely follow the greediest one. This explains why our experiment finds that evolution, particularly of 00111 and 01111, is largely repeatable.

[00223] Notably, 11110 lies on the fitness plateau without requiring S108N. Three populations in our experiment avoid mutation at S108 and can access this unique quadruple mutant. We attribute this to a rare clonal interference event where the 01100 double mutant arises and displaces a population that has nearly fixed 00001. One of these replicates additionally fixed C59R to reach 01110, the triple mutant with the highest MIC. Stronger selection for pyrimethamine resistance, if feasible, should also fix C50R and lead to 11110.
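The trajectory counts quoted in paragraph [00222] (96 of 120 trajectories blocked, 24 paths remaining from 00001) follow from counting the orderings in which the five mutations can be acquired one at a time. The sketch below only enumerates those orderings; it does not model fitness, and which eight paths avoid inactive intermediates depends on MIC data not reproduced here. The helper names are illustrative.

```python
from itertools import permutations

MUTATIONS = ("C50R", "D54N", "Y57H", "C59R", "S108N")  # bit order from the text

def genotypes_along(path):
    """List the bit-string genotypes visited when mutations fix in the given order."""
    state = ["0"] * len(MUTATIONS)
    visited = ["".join(state)]
    for mutation in path:
        state[MUTATIONS.index(mutation)] = "1"
        visited.append("".join(state))
    return visited

paths = list(permutations(MUTATIONS))                   # every order of acquisition
s108n_first = [p for p in paths if p[0] == "S108N"]     # paths that begin at 00001
other_first = [p for p in paths if p[0] != "S108N"]     # paths blocked once 00001 fixes

print(len(paths), len(s108n_first), len(other_first))
print(genotypes_along(("S108N", "C59R", "Y57H", "D54N", "C50R")))
```

The example path passes through 00111 and 01111, the two intermediates that the text reports as repeatedly observed.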

[00224] Since 11110 is suppressed by rapid fixation of S108N, weaker early selection or greater population structure (Salverda et al., 2017; Szendro et al., 2013) should allow alternative first-step mutations (e.g. Y57H, C59R) to fix and increase the chance of reaching 11110. Alternatively, random initial mutations created by neutral drift have been shown to direct drug resistance evolution along new trajectories (Salverda et al., 2011). We examined this latter possibility by repeating evolution from a variant of w.t. PfDHFR with a synonymous codon change at S108 (AGC→TCA) that prevents mutation to N through a single substitution. Twelve populations starting from this allele were evolved under the same pyrimethamine regimen described for the large-scale experiment. In this experiment, the ten surviving populations dramatically shifted towards a new, convergent outcome that avoids S108N and fixes D54N instead. Seven of these ten populations reached the 01100 double mutant that can subsequently access 11110. Since different pyrimethamine-resistant mutants should respond differently to other DHFR inhibitors, the existence of S108N-independent outcomes and the ability to steer the population towards these through weaker selection or neutral drift may have implications for drug schedule design. In the future, we aim to leverage the scalability of the mutant polymerases of the invention, by starting evolution from hundreds of neutral variants of PfDHFR, to capture the scope of trajectories that may be available from standing variation in natural P. falciparum populations and predict selection conditions that may prefer one trajectory over another. Here, we conclude that our large-scale evolution experiment is able to identify a rare path to pyrimethamine resistance that avoids the commonly observed S108N mutation that is crucial in natural PfDHFR resistance.

[00225] Several adaptive populations in our experiment access the broader landscape beyond 11111. As described above, in some replicates, 00111 serves as a stepping-stone to neighboring fitness peaks through additional mutations like C6Y. In other replicates, we find a suboptimal peak containing C59Y (10121) at which populations are occasionally trapped. In one replicate, D54N fixes with S108T and avoids negative epistasis with S108N. Future analysis will include less frequent candidate adaptive mutations that occur in multiple replicates (e.g. Y35H, I150T, D222N, L251S, T268A) or fix independently in time (e.g. M249I). However, our analysis of only the most common adaptive mutations and mutational paths has already uncovered new peaks in the landscape of PfDHFR-mediated drug resistance and provides examples of how epistasis results in evolutionary repeatability, and how the existence of greedy mutations such as S108N can render a highly adaptive outcome (11110) rare through early fixation. In other words, high-throughput directed evolution with the mutant polymerases of the invention enables the discovery of new fit regions of adaptive landscapes and thorough studies of molecular evolution at the level of a single protein.

Example 2: Mutually orthogonal DNA replication systems in vivo

[00226] The experiments presented herein demonstrate the development of an additional DNAP/plasmid pair which is orthogonal to genomic replication in S. cerevisiae and is also orthogonal to a previously developed DNAP/plasmid pair, such that the two DNAP/plasmid pairs are mutually orthogonal to each other. This solidifies two platforms for independently expanding the properties of DNA replication in vivo.

[00227] The two orthogonal replication systems are based on the cytoplasmically-localized pGKL1/pGKL2 (p1/p2) plasmids originating from Kluyveromyces lactis (Stark et al., 1990, Yeast 6:1-29; Klassen et al., 2007, Microbial Linear Plasmids pp. 187-226). Both p1 and p2 encode their own DNAPs, TP-DNAP1 and TP-DNAP2, respectively. It was previously demonstrated that engineered error-prone variants of TP-DNAP1 increase the mutation rate of p1 to ~10^-5 substitutions per base (s.p.b.) without affecting the genomic mutation rate of ~10^-10 s.p.b. in S. cerevisiae (Ravikumar et al., 2018, bioRxiv, 313338). It was also found that p1 replication strictly requires TP-DNAP1 (Ravikumar et al., 2014, Nature Chemical Biology, 10:175-177). This allowed the development of TP-DNAP1 and p1 as an orthogonal DNAP/plasmid pair such that engineered changes to TP-DNAP1 act only on p1 and not on the genome. However, it was not known whether orthogonality to genomic replication holds true for TP-DNAP2 and p2, nor whether the TP-DNAP1/p1 and TP-DNAP2/p2 pairs are mutually orthogonal. Here, error-prone TP-DNAP2s are developed, and the associated genetic techniques necessary to engineer the p2 plasmid and TP-DNAP2 are reported. Finally, the experiments provided demonstrate that the TP-DNAP2/p2 pair is orthogonal both to genomic replication and to p1 replication.

[00228] The existence of mutually orthogonal, genetically tractable replication systems is significant for three main reasons. First, the finding of two mutually orthogonal DNA replication systems should lay the foundation for novel applications in synthetic biology. For example, in vivo accelerated evolution (Ravikumar et al., 2018, bioRxiv, 313338) of different genes or sets of genes can now be carried out at two distinct custom mutation rates, which could be useful for evolving components in hierarchically organized signaling pathways. Another possibility includes using inducible error-prone orthogonal DNAPs to record multiple cellular events or external stimuli, where the number of mutational events in p1 and p2 would be independent readouts of the amount of exposure to two signals experienced by cells. In addition, the freedom to engineer two orthogonal DNAPs in vivo may enable propagation of different XNAs in living cells, whereas current efforts are limited to either using novel base-pairs recognized by host DNAPs or engineering DNAPs to synthesize XNA with novel backbones in vitro (Taylor et al., 2015, Nature, 518:427-430; Pinheiro et al., 2012, Science, 336:341-344; Malyshev et al., 2014, Nature, 509:385-388). Second is a practical consideration. A primary motivation for developing an orthogonal replication (OrthoRep) system was to achieve continuous rapid evolution of target genes in vivo at extreme mutation rates that the genome cannot withstand (Ravikumar et al., 2018, bioRxiv, 313338). This was achieved by making TP-DNAP1 highly error-prone so that it rapidly mutates the p1 plasmid but spares the genome. The mutual orthogonality demonstrated here ensures that any essential accessory genes encoded on p2 are also spared by error-prone TP-DNAP1s during directed evolution experiments of genes on p1 in OrthoRep. Third, the result provides in vivo evidence that initiation of p1 and p2 replication uses independent components. p1 and p2 both contain terminal proteins (TPs) linked to their 5' termini, which act as origins of replication, akin to other protein-primed DNA replication systems like those found in bacteriophage Φ29 and adenovirus (Rodriguez et al., J. Mol. Biol, 337:829-841; Mysiak et al., 2004, Nucleic Acids Res, 32:3913-3920). The lack of homology between the TPs of p1 and p2 suggested that TP-DNAP1 and TP-DNAP2 may use distinct molecular interactions for plasmid initiation. In addition, in vitro biochemical data show that TP-DNAP1 can initiate replication from p1's inverted terminal repeat (ITR), hypothesized to act in concert with p1's TP to form an origin of replication, but cannot initiate replication from p2's ITR. The identification of mutual orthogonality between p1 and p2 replication demonstrates that highly specific TP-DNAP interactions with cognate TPs and ITRs govern plasmid initiation, encouraging future studies on the mechanisms of protein-primed DNA replication and suggesting a potential approach for engineering additional orthogonal replication systems that operate concurrently in the same cell (Figure 7).

[00229] The materials and methods employed in these experiments are now described.

Materials and Methods

[00230] All oligonucleotide primers and synthesized gene fragments (gBlocks®) were purchased from IDT. Enzymes for PCR and cloning were obtained from NEB. All plasmids, including TP-DNAP2 libraries, were cloned using Gibson assembly with overlap regions of 20-30 bp. Vectors harboring homologous recombination cassettes for p2 integrations were cloned as previously described for p1 integration cassettes (Ravikumar et al., 2018, bioRxiv, 313338; Ravikumar et al., 2014, Nature Chemical Biology, 10:175-177).

[00231] Plasmid pGA55 was cloned as follows: three gene fragments constituting recoded TP-DNAP2 were assembled with the vector backbone of a yeast shuttle vector containing CEN6/ARS4 and HIS3 for propagation in yeast, and ColE1 and KanR for propagation in E. coli. The resulting vector was used for TP-DNAP2 complementation and generation of TP-DNAP2 libraries.

[00232] S. cerevisiae strain AR-Y292 served as the parent for all strains used in this study and contains the wild type pGKL1 and pGKL2 (or p1 and p2) linear plasmids. GA-Y021 and GA-Y069 were created from AR-Y292 by p2 integration methods described below. AR-Y436 is a derivative of AR-Y292 encoding a functional copy of URA3 at the endogenous genomic locus, for 5-FOA-based fluctuation tests of genomic mutation rates in the presence of mutagenic TP-DNAP2 variants.

[00233] Strains for testing mutual orthogonality were generated by transforming a panel of CEN6/ARS4 vectors encoding TP-DNAP1 or TP-DNAP2 variants into two base strains, AR-Y304 and GA-Y021. AR-Y304 (Ravikumar et al., 2014, Nature Chemical Biology, 10:175-177) contains recombinant p1 encoding mKate2, URA3 and leu2* without disturbing the native TP-DNAP1 ORF. Similarly, GA-Y021 encodes recombinant p2 that replaces ORF1 with mKate2, URA3 and leu2* without disturbing the native TP-DNAP2 ORF.

[00234] All yeast transformations were performed using the high-efficiency LiAc/SS carrier DNA/PEG method (Gietz and Schiestl, 2007, Nat Protoc, 2:31-34). For integrations into p2, 2-5 μg of plasmid containing the appropriate integration cassette was digested with ScaI, yielding a linearized cassette with blunt ends containing the genes of interest flanked by regions of homology to p2. The products of the digestion reaction were directly transformed into appropriate AR-Y292-derived strains harboring wild type p1 and p2, and plated on selective solid SC medium. Colonies appeared after 4-5 days of growth at 30°C. CEN6/ARS4 plasmids were also transformed with the LiAc/SS carrier DNA/PEG method, but with only 500-3000 ng of DNA for individual vectors and with at least 10 μg of plasmid DNA for TP-DNAP2 library transformations, to maintain 6-fold coverage.

[00235] p1, p2, and all derived linear plasmids were extracted using a modified version of the yeast DNA extraction protocol detailed in (Amberg et al., 2005, Methods in yeast genetics: a Cold Spring Harbor Laboratory course manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.). The modifications were as follows: (i) cells spun down from 40 mL of saturated culture were washed in 0.9% NaCl before treatment with Zymolyase (US Biological) to break up flocculated cells; (ii) 200 μg/mL proteinase K (Sigma) was supplemented during SDS treatment for degradation of TP; (iii) rotation at ~10 r.p.m. was used during Zymolyase and proteinase K treatments. This large-scale extraction protocol was used for preparing DNA for absolute quantification by qPCR. For qualitative analysis by agarose gel electrophoresis, this extraction protocol was scaled down to extract DNA from only 1.5 mL of saturated yeast culture.

[00236] To achieve active curing of the cytoplasmic p2 plasmid, the yeast Cas9 genomic modification system developed by Cate and coworkers was repurposed for cytoplasmic targeting (Ryan et al., 2014, Elife 3). The SV-40 nuclear localization signal and 8x HIS tag were removed from the pCAS plasmid (Addgene plasmid # 60847) to localize Cas9 to the cytoplasm where p2 (and p1) plasmids propagate. Appropriate 20 nt spacers were cloned into this vector to target different sites in p2. These modified pCAS vectors were transformed into the strains harboring p2 plasmids to be cut, and plated on solid selective SC medium containing 1 g/L monosodium glutamate (MSG) as the nitrogen source and supplemented with G418 (400 μg/mL). Colonies that appeared after incubation at 30°C for 2 days were inoculated into liquid selective SC medium with 1 g/L MSG and G418 (200 μg/mL) and passaged once at a 1:1000 dilution to cure the targeted p2 plasmid. The resulting cultures were then subjected to DNA extraction and analysis by gel electrophoresis to verify loss of the targeted p2 plasmid. To minimize potential toxicity due to Cas9 expression, final strains lacking the pCAS vector were isolated by passaging without G418 selection, and replica plating clones on solid medium with and without G418 to screen for loss of the pCAS vector.

[00237] TP-DNAP1 and TP-DNAP2 peptide sequences were aligned using protein BLAST (Altschul et al., 1997, Nucleic Acids Res, 25:3389-3402), and four candidate residues for library generation were chosen by two criteria. First, candidate TP-DNAP2 residues must match a residue in TP-DNAP1 known to affect fidelity, based on prior studies (Ravikumar et al., 2018, bioRxiv, 313338; Ravikumar et al., 2014, Nature Chemical Biology, 10:175-177). Second, at least 25% of the 20 neighboring residues must align. This analysis yielded positions S370, Y424, L474 and F882.

[00238] To clone the expression vector for wild type TP-DNAP2 (pGA55), TP-DNAP2 was codon optimized for expression in S. cerevisiae with GenScript's OptimumGene™ tool, and the recoded ORF was synthesized as three gene fragments, which were assembled downstream of the REV1 promoter in a CEN6/ARS4 vector containing selection markers HIS3 and KanR. The four TP-DNAP2 NNK libraries were cloned from pGA55 via Gibson assembly. First, a two-step PCR was performed to limit bias in the NNK incorporation that may result from annealing between the degenerate codon and the plasmid template. Linear PCR fragments of pGA55 were generated with 5' ends that terminate immediately 3' of the library codons and 3' ends in the plasmid backbone. These linear amplicons were purified, diluted to 40 ng/μL, and re-amplified in PCR reactions with a forward primer containing Gibson overlap regions and NNK overhangs at the corresponding library site in TP-DNAP2, and the same reverse primer used in the initial PCR. These PCR products were then purified and treated with DpnI for 6 hours at 37°C to digest any pGA55 plasmid carry-through. The second Gibson fragment was PCR amplified from pGA55 to include the vector backbone starting 5' in KanR and 3' leading up to, but not including, the library codon. For each library, 100 ng of corresponding PCR amplicons were combined in a 20 μL Gibson assembly reaction and incubated at 50°C for 1 hour. The assemblies were purified and concentrated in 12 μL of ddH2O. 5 μL of the purified assembly products were then transformed into electrocompetent Top10 cells and recovered at 37°C for 1 hour. Each transformation was plated at 1x, 10x and 100x dilutions on solid LB medium supplemented with kanamycin (50 μg/mL). After overnight incubation at 37°C, colony counts on the 10x and 100x plates were used to calculate the transformation efficiency. All transformations yielded more than 3200 transformants, corresponding to >100-fold coverage of each library in E. coli (each NNK library has a theoretical diversity of 32). The transformants from the 1x plates were then harvested by resuspension in 5 mL of sterile ddH2O. Library plasmid DNA was extracted from E. coli using the Zyppy™ Plasmid Miniprep Kit. Library quality was determined by verifying plasmid library sizes by agarose gel and Sanger sequencing of bulk library populations as well as 8 individual clones from each library.
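The theoretical diversity of 32 and the coverage figures quoted for the NNK libraries can be reproduced by enumerating the degenerate codon. The sketch below uses the standard IUPAC meanings (N = A, C, G or T; K = G or T); the `fold_coverage` helper is illustrative, and the 190-colony figure is the number of yeast library members screened per library elsewhere in this example.

```python
from itertools import product

N = "ACGT"   # IUPAC N: any base
K = "GT"     # IUPAC K: G or T

# Every codon an NNK site can encode
nnk_codons = ["".join(codon) for codon in product(N, N, K)]
diversity = len(nnk_codons)

def fold_coverage(clones: int, library_diversity: int) -> float:
    """Ratio of clones sampled to distinct library members."""
    return clones / library_diversity

print("NNK codons:", diversity)
print("E. coli coverage at 3200 transformants:", fold_coverage(3200, diversity))
print("Yeast coverage at 190 colonies:", round(fold_coverage(190, diversity), 1))
print("Stop codons present:", [c for c in nnk_codons if c in {"TAA", "TAG", "TGA"}])
```

Restricting the third position to G or T leaves TAG as the only stop codon in the library while still encoding all 20 amino acids, which is the usual reason for choosing NNK over NNN.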

[00239] For fluctuation tests used to measure p2 mutation rates, leu2* served as a marker for detecting mutational events. leu2* is a disabled version of LEU2, where Q180 is replaced with a TAA stop codon. Q180 is a permissive site where mutation to any codon other than TAG and TGA results in functional reversion to LEU2. These mutational events can be detected by plating scores of parallel cultures on medium lacking leucine and counting the number distribution of functional LEU2 mutants.

[00240] For TP-DNAP2 library screening, each library member was subjected to small-scale leu2* fluctuation tests with six replicates. 190 library members from each yeast library transformation were arrayed and inoculated into liquid SC medium lacking uracil and histidine, and passaged three times at 1:10,000 dilutions. mKate2 fluorescence was measured at every passage on a microplate reader (TECAN Infinite® 200 PRO, settings: λex = 588, λem = 633) to track p2 copy number stabilization. To perform fluctuation tests, each library member was diluted 1:10,000 into liquid SC medium buffered to pH 5.8 and lacking uracil, histidine, and tryptophan, and dilutions were split into six 100 μL replicates. Cultures were grown for 48 hours at 30°C to reach saturation. Saturated cultures were washed with 200 μL 0.9% NaCl to remove residual leucine and resuspended in 35 μL of 0.9% NaCl. 10 μL of this resuspension was spot plated onto solid SC medium buffered to pH 5.8 and lacking uracil, histidine, tryptophan and leucine. Spot plates were allowed to dry and incubated at 30°C. Colonies were counted after 5 days. Colony counts were used to calculate the expected number of LEU2 functional mutants (m), using the p0 method (Foster, 2006, Methods Enzymol, 409:195-213).
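The p0 method cited for the small-scale screen estimates m from the fraction of parallel cultures that produce no mutant colonies, using the Poisson zero term m = -ln(p0). The following is a minimal sketch of that estimator only; the colony counts are made up for illustration, and no correction for partial plating is applied here.

```python
import math

def p0_estimate(colony_counts):
    """Estimate the expected number of mutations per culture (m)
    from the fraction of cultures with zero mutant colonies."""
    n = len(colony_counts)
    zeros = sum(1 for count in colony_counts if count == 0)
    if zeros == 0:
        # p0 = 0 gives an undefined estimate; more replicates or a
        # different estimator would be needed in that case.
        raise ValueError("p0 method requires at least one culture with no mutants")
    return -math.log(zeros / n)

# Hypothetical colony counts from six replicate spots (not data from this study)
example_counts = [0, 2, 0, 1, 0, 5]
m = p0_estimate(example_counts)
print(f"m = {m:.3f}")
```

With only six replicates the estimate is coarse, which is consistent with the screen being used to rank candidates before reconfirmation with more replicates.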

[00241] To precisely measure p1 and p2 per-base substitution mutation rates, several modifications were made to the small-scale protocol. First, 36 replicates were performed for reconfirmation of candidate TP-DNAP2 mutators, and 48 replicates for genomic and mutual orthogonality experiments. Titers were also determined for each strain after spot plating by pooling the residual volume from 4 replicates and plating dilutions on YPD. The expected number of LEU2 functional mutants (m) was determined by the Ma-Sandri-Sarkar maximum likelihood estimator, as calculated by the FALCOR tool and corrected for partial plating (Foster, 2006, Methods Enzymol, 409:195-213; Sarkar et al., 1992, Genetica, 85:173-179; Hall et al., 2009, Bioinformatics, 25:1564-1565). The mean mKate2 fluorescence was determined from 50,000 event counts on an Attune™ NxT Flow Cytometer and converted to p2 copy number by using a calibration curve. To determine per-base substitution rates, the corrected m was normalized to the average cell titer, the p2 copy number, and the target size for functional leu2* reversion (2.33 bp). 95% confidence intervals were similarly scaled.
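The 2.33 bp target size and the normalization step can be illustrated as follows. The sketch assumes one plausible reading of the target size: each base of the TAA codon can mutate to three alternatives, and a position contributes the fraction of those substitutions that do not produce another stop codon. The normalization is written as a simple division of m by titer, copy number and target size, as the paragraph describes; the numeric inputs in the example are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reversion_target_size(stop_codon: str = "TAA") -> float:
    """Effective number of bases at which a single substitution
    converts the stop codon into a sense codon."""
    size = 0.0
    for pos in range(3):
        alternatives = [b for b in "ACGT" if b != stop_codon[pos]]
        reverting = sum(
            1 for b in alternatives
            if stop_codon[:pos] + b + stop_codon[pos + 1:] not in STOP_CODONS
        )
        size += reverting / len(alternatives)
    return size

def per_base_rate(m: float, cell_titer: float, copy_number: float, target_size: float) -> float:
    """Substitutions per base: m normalized to cells, template copies and target size."""
    return m / (cell_titer * copy_number * target_size)

target = reversion_target_size("TAA")
print(f"target size = {target:.2f} bp")

# Hypothetical inputs for illustration only (not values from this study)
rate = per_base_rate(m=1.0, cell_titer=1e7, copy_number=128, target_size=target)
print(f"example rate = {rate:.2e} s.p.b.")
```

Seven of the nine possible single substitutions in TAA yield a sense codon (all three at the first position, two each at the second and third), giving 7/3, or about 2.33 bp.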

[00242] Primers used for LEU2 qPCR:

[00243] forward (5'-GCTAATGTTTTGGCCTCTTC-3')

[00244] reverse (5'-ATTTAGGTGGGTTGGGTTCT-3')

[00245] Primers used for LEU3 qPCR:

[00246] forward (5'-CAGCAACTAAGGACAAGG-3')

[00247] reverse (5'-GGTCGTTAATGAGCTTCC-3')

[00248] Fluctuation tests using genomically encoded URA3 were performed in the presence of TP-DNAP2 variants to determine the genomic per-base substitution rates, similarly to previously described protocols (Lang and Murray, 2008, Genetics, 178:67-82). AR-Y292 derived strains harboring the appropriate TP-DNAP2 variant encoded on a CEN6/ARS4 vector (with a HIS3 marker) were grown in liquid SC medium lacking uracil and histidine until saturation. Each strain was diluted 1:5,000 into SC medium lacking histidine and aliquoted into 48 replicates of 200 μL each. Cultures were grown for 48 hours at 30°C to reach saturation. Saturated cultures were washed with 400 μL 0.9% NaCl and resuspended in 420 μL 0.9% NaCl. 400 μL from each replicate was spot plated on pre-dried solid SC medium lacking histidine and supplemented with 5-FOA (1 g/L). The residual 20 μL from six replicates were pooled, diluted, and plated on solid YPD medium to determine cell titers. Plates were allowed to dry before incubation at 30°C. Colonies were counted on titer plates after 2 days, and on spot plates after 5 days of growth. The expected number of mutants (m) was calculated using the MSS maximum likelihood estimator method via the FALCOR tool, and corrected for partial plating, as described above. To determine per-base substitution rates, the corrected m was normalized to the average cell titer, the URA3 copy number (1 in haploid yeast), and the target size for 5-FOA resistance via substitutions in URA3 (104 bp). 95% confidence intervals were similarly scaled.

[00249] A standard curve relating p2 copy number to mKate2 fluorescence was prepared by combining quantitative PCR with flow cytometry. During the 1:10,000 back dilution step of the leu2* fluctuation tests for the mutual orthogonality experiment, six strains with mKate2 encoded on p2 were diluted into liquid SC medium buffered to pH 5.8 and lacking uracil, histidine and tryptophan to yield 50 mL of saturated culture. After 48 hours of growth at 30°C, a small portion of each culture was diluted 1:100 in 0.9% NaCl and analyzed on a flow cytometer (Attune™ NxT Flow Cytometer, settings: λex = 561, λem = 620; gain = 550) to determine the mean red fluorescence from 50,000 counts. Genomic DNA and linear plasmids were extracted from the remaining 40 mL of each culture using the large-scale DNA extraction protocol detailed above to ensure complete and unbiased extraction of linear plasmids relative to genomic DNA. All extracts were diluted 4000-fold for use in two distinct qPCR reactions, one to quantify p2-encoded leu2* and the other to quantify the genomic copy of LEU3. Each 20 μL qPCR reaction consisted of 5 μL of template DNA, 2 μL forward primer (5 μM), 2 μL reverse primer (5 μM), 1 μL ddH2O, and 10 μL of Thermo Scientific™ Maxima SYBR Green/Fluorescein qPCR Master Mix (2X).

[00250] A standard curve for each primer set was prepared by performing qPCR on a dilution series of DNA extracted from F102-2 (25x, 125x, 625x, 3125x). Non-template controls with only ddH2O were included for each primer set to detect contamination. All qPCRs were performed in triplicate on the Roche LightCycler® 480 System using the following protocol:

[00251] qPCR:

1) 95°C for 10 minutes

2) 40x:

   95°C for 15 seconds
   60°C for 1 minute
   measurement

3) 95°C for 1 minute

4) 55°C for 1 minute

[00252] Primer melting curve: ramp up to 95°C at 0.11°C/s, with 5 measurements per °C.

[00253] Cycle threshold (Ct) values were determined by the LightCycler® 480 software (fit-points method, threshold = 1.75). Ct values from both standard curves were plotted against log([DNA]). The slope and y-intercept were calculated using linear regression. Each sample's average Ct values were converted into copy number values by using the following equation: copy number = 10^((sample Ct - y-intercept)/slope). The calculated leu2* copy number was divided by the LEU3 copy number to normalize to genomes extracted and account for variance in DNA extraction efficiency across samples.
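The standard-curve conversion in paragraph [00253] can be sketched in a few lines. The example fits Ct against log10 of relative DNA concentration by ordinary least squares, inverts the fit with the equation given above, and takes the leu2*/LEU3 ratio. The dilution series matches the one described for the standard curve, but every Ct value below is invented for illustration, base-10 logarithms are an assumption, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, y_intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantity_from_ct(ct, slope, intercept):
    """copy number = 10^((sample Ct - y-intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)

def normalized_copy_number(ct_leu2, curve_leu2, ct_leu3, curve_leu3):
    """Plasmid-borne leu2* quantity per genomic LEU3 quantity."""
    return quantity_from_ct(ct_leu2, *curve_leu2) / quantity_from_ct(ct_leu3, *curve_leu3)

# Dilution series from the text; Ct values are hypothetical
dilutions = [25, 125, 625, 3125]
log_conc = [math.log10(1 / d) for d in dilutions]
example_ct = [20.0 - 3.32 * x for x in log_conc]

slope, intercept = fit_line(log_conc, example_ct)
print(f"slope = {slope:.2f}, y-intercept = {intercept:.2f}")

# Using the same hypothetical curve for both primer sets, a sample whose
# leu2* Ct is lower than its LEU3 Ct carries more leu2* than LEU3 template.
ratio = normalized_copy_number(24.0, (slope, intercept), 30.64, (slope, intercept))
print(f"leu2*/LEU3 ratio = {ratio:.0f}")
```

In practice each primer set has its own fitted curve, as the text states, so the two `curve` arguments would generally differ.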

[00254] The results of the experiments are now described.

[00255] The strategy used for probing orthogonality of the TP-DNAP2/p2 DNAP/plasmid pair is based on engineering and using error-prone TP-DNAP1 and TP-DNAP2 variants to measure whether they increase mutation rates of genes on p1, p2, and/or the host genome. Without being bound by theory, it was hypothesized that if error-prone TP-DNAP1s only increase the mutation rate of p1 (but not of p2 and the host genome) and error-prone TP-DNAP2s only increase the mutation rate of p2 (but not of p1 and the host genome), then the results would indicate that TP-DNAP1/p1 and TP-DNAP2/p2 are mutually orthogonal DNA replication systems that are both orthogonal to genomic replication.

[00256] First, a reliable method for encoding and expressing user-defined genes on p2 was developed. To measure mutation rates of p2 replication, mKate2, URA3, and leu2* (mUL*) were encoded on p2. mKate2 would serve as a fluorescent reporter for copy number, URA3 as a selection marker, and leu2*, which contains a stop codon at a permissive site in LEU2 (Q180*), would serve as a reporter for substitution mutation rates in fluctuation tests that measure reversion to functional LEU2. Without being bound by theory, it was hypothesized that mUL* could be integrated onto p2 via in vivo homologous recombination, following similar procedures to those used for manipulating p1 (Ravikumar et al., 2018, bioRxiv, 313338; Ravikumar et al., 2014, Nature Chemical Biology, 10:175-177). A DNA cassette was constructed encoding mUL* flanked by regions homologous to p2 such that successful recombination would result in the replacement of the non-essential ORF1 found on wildtype (wt) p2 (Schaffrath et al., 1992, Curr Genet 21:357-363). After transformation of this cassette into S. cerevisiae strain AR-Y292 containing wt p1 and p2, several clones exhibiting uracil prototrophy and detectable fluorescence from mKate2 were isolated. Extraction of cytoplasmic plasmids from these clones confirmed presence of the recombinant p2-delORF1-mUL*, but only at low copy as confirmed by DNA gel electrophoresis and PCR with primers specific to p2-delORF1-mUL*. In contrast to similar cassette integrations into p1, passaging under selection for URA3 expression failed to cure the parental wt p2 plasmid and increase the copy number of p2-delORF1-mUL* to levels easily detectable by gel electrophoresis. Although p2-delORF1-mUL* encodes all the necessary genes for its own replication and was selected for through URA3, it is likely that the shorter size of wt p2 provided it with enough of a replicative advantage to be maintained.

[00257] Therefore, more active methods were employed to cure the parental wt p2 plasmid. A yeast CRISPR/Cas9 vector (Ryan et al., 2014, Elife 3) and three candidate sgRNAs to target ORF1, which is present in wt p2 but not in p2-delORF1-mUL*, were used. One of the three sgRNAs expressed in conjunction with a cytoplasmically-localized Cas9 achieved complete curing of p2 and a concomitant increase in the copy number of p2-delORF1-mUL*. This was evidenced by an increase in mKate2 fluorescent signal and a brighter p2-delORF1-mUL* DNA gel electrophoresis band. Curing of parental p2 to undetectable levels was confirmed by lack of PCR amplification with primers specific to p2, yielding strain GA-Y021.

[00258] Using GA-Y021, the mutation rate of p2 replication by wt TP-DNAP2, still encoded on p2-delORF1-mUL*, was measured with a previously described fluctuation test where the number distribution of functional LEU2 mutants is used to calculate mutation rate by the MSS method (Foster, 2006, Methods Enzymol, 409:195-213; Sarkar et al., 1992, Genetica, 85:173-179; Hall et al., 2009, Bioinformatics, 25:1564-1565). The mutation rate of p2-delORF1-mUL* replication was 5.96 x 10^-10 s.p.b. (95% C.I.: 3.57 x 10^-10 to 8.77 x 10^-10) with a copy number of 128 per cell (Table 7, Entry 1). This is similar to the wild type p1 mutation rate and copy number, which are 1.39 x 10^-9 and 124, respectively (Ravikumar et al., Nat Chem Biol. 2014, 10(3):175-177).

Table 7: Mutation-prone TP-DNAP2 Candidates

[00259] To facilitate the straightforward testing of error-prone TP-DNAP2s, it was demonstrated that p2 replication could be fully sustained by TP-DNAP2 encoded in trans, on a standard yeast nuclear plasmid, rather than in cis, on p2. First, TP-DNAP2, which is ORF2 of p2, was deleted by homologous recombination of a synthetic cassette encoding URA3. Since p2 is a multi-copy plasmid, the resulting strain (GA-Y069) harbored a mixture of the parental wt p2 and the recombinant p2 with ORF2 deleted (p2-delORF2-URA3), along with unaltered p1. In this strain, both the wt p2 and recombinant p2-delORF2-URA3 plasmids rely on TP-DNAP2 encoded on the parental wt p2 plasmid for replication. Thus, loss of wt p2 should disable replication of p2-delORF2-URA3. Indeed, when the parental p2 plasmid was cured by targeting ORF2 of wt p2 with Cas9, it was found that all p2-delORF2-URA3 was also lost and that the strain could no longer grow in the absence of uracil. Next, the same experiment was repeated in the presence of a codon-optimized TP-DNAP2 expressed in trans from a standard yeast CEN6/ARS4 nuclear plasmid (pGA55-reTP-DNAP2). After p2 was fully cured by Cas9, it was found that p2 was not present, p2-delORF2-URA3 remained, and this strain could grow in the absence of uracil. In addition, p1 was also maintained, indicating that the accessory genes encoded on p2-delORF2-URA3 necessary for replication of both p1 and p2 and transcription of TP-DNAP1 on p1 were still functional. Therefore, p2-derived plasmids can be replicated by TP-DNAP2 encoded on a standard nuclear plasmid, simplifying the characterization of p2 replication by error-prone TP-DNAP2 variants.

[00260] To identify error-prone TP-DNAP2s, a small library of TP-DNAP2s diversified at locations hypothesized to be responsible for DNAP fidelity was screened. An alignment between TP-DNAP1 and TP-DNAP2 revealed that S370, Y424, L474, and F882 in TP-DNAP2 were homologous to residues in TP-DNAP1 that were previously found to yield error-prone TP-DNAP1s when mutated (Ravikumar et al., 2018, bioRxiv, 313338; Ravikumar et al., 2014, Nature Chemical Biology, 10:175-177). Four distinct site-saturation mutagenesis libraries were generated, each diversifying S370, Y424, L474, or F882 in TP-DNAP2 encoded on pGA55-reTP-DNAP2. Due to unsuccessful attempts to generate a strain with a full deletion of p2-encoded TP-DNAP2 and simultaneous integration of mKate, URA3, and leu2*, each library was transformed into GA-Y021 for convenient screening. However, since GA-Y021 still encodes wt TP-DNAP2 on p2-delORF1-mUL*, p2 mutation rates measured in this format are the result of in tandem replication of p2 by wt TP-DNAP2 and each TP-DNAP2 variant encoded in trans. Six-fold coverage of each library was maintained by picking 190 yeast colonies and passaging under selection for URA3 to stabilize the copy number of p2-delORF1-mUL* in the presence of the newly introduced TP-DNAP2 variants. To screen for TP-DNAP2 variants with increased p2 mutation rate, each library member was subjected to a preliminary, small-scale leu2* fluctuation test with six replicates. Seventeen candidate mutators with the highest expected number of mutants, m, calculated by the p0 method were then chosen for reconfirmation. Reconfirmation consisted of extracting the CEN6/ARS4 plasmids encoding TP-DNAP2 variants, retransforming into a fresh GA-Y021 background, and repeating leu2* fluctuation tests with 36 replicates to determine p2 mutation rate with higher precision. Two error-prone TP-DNAP2 variants, S370Q and Y424Q, increased p2's substitution mutation rate by ~14- and ~13-fold, respectively (Table 7, Entries 2 and 3). These variants were not active enough to fully complement a deletion of the native TP-DNAP2 and sustain p2-delORF2-URA3 replication on their own, making the measured mutation rates an underestimate of their true per-base substitution rate. Despite this, these two error-prone TP-DNAP2 variants elevate p2 mutation rate to high enough levels for measuring orthogonality.
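The p0 method used in the preliminary screen can be sketched as follows. In its standard form, m is estimated as the negative natural logarithm of the fraction of replicate cultures that contain no mutants. The library member names and mutant counts below are hypothetical and are not data from this work. Treating a member with no mutant-free replicates as having an unbounded m is likewise an assumption made here for ranking purposes only.

```python
import math

def p0_estimate(mutant_counts):
    """Estimate m as -ln(p0), where p0 is the fraction of cultures with
    zero mutants. Returns math.inf if every culture contains mutants."""
    n = len(mutant_counts)
    zeros = sum(1 for c in mutant_counts if c == 0)
    if zeros == 0:
        return math.inf  # p0 = 0: m cannot be bounded from this data
    if zeros == n:
        return 0.0       # p0 = 1: no mutations detected
    return -math.log(zeros / n)

def rank_candidates(screen, top_n):
    """Return the top_n library members ordered by decreasing m."""
    scored = {name: p0_estimate(counts) for name, counts in screen.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

# Hypothetical six-replicate results for three library members.
screen = {
    "member_A": [0, 0, 0, 0, 0, 1],
    "member_B": [0, 0, 3, 1, 7, 2],
    "member_C": [0, 0, 0, 0, 0, 0],
}
top = rank_candidates(screen, top_n=2)
print(top)
```

With only six replicates, p0 can take just seven distinct values, which is consistent with using this step only as a coarse filter before reconfirmation with 36 replicates.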

[00261] To show that p2 replication is orthogonal to genomic replication, the mutation rate of the host genome was measured in the presence of error-prone TP-DNAP2 variants. Like p1 replication by TP-DNAP1, p2 replication by TP-DNAP2 occurs in the cytoplasm via a protein-primed mechanism, making it likely that p2 replication is orthogonal to host genome replication. To test this, CEN6/ARS4 vectors lacking TP-DNAP2, or encoding codon-optimized versions of TP-DNAP2, were transformed into AR-Y436. AR-Y436 contains wt p1 and p2, as well as an intact URA3 locus in the host genome. Genomic per-base substitution rates were determined via fluctuation tests based on the frequency of 5-FOA resistant clones arising from mutations in the genomic URA3 locus, as previously described (Ravikumar et al., 2014, Nature Chemical Biology, 10:175-177; Lang and Murray, 2008, Genetics, 178:67-82). The substitution-rich spectrum of mutations makes this assay ideal for detecting whether the elevated substitution rate of TP-DNAP2 S370Q or TP-DNAP2 Y424Q contributes to genomic mutation. No increase in the host genomic mutation rate was observed when the error-prone TP-DNAP2 variants were present (Table 8).
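For orientation, the conversion from a fluctuation-test estimate of m to a per-base substitution rate can be sketched as below. This is a simplified illustration under stated assumptions, not the exact calculation of the cited references: the function name, the parameter names, and the example values are all hypothetical, and the effective target size (the number of base positions at which a substitution yields the scored phenotype) must be supplied by the user.

```python
import math

def per_base_rate(m, cells_per_culture, target_size, copies_per_cell=1):
    """Approximate substitutions per base per replication.

    m                 expected mutations per culture (e.g. from MSS or p0)
    cells_per_culture final number of cells in each culture
    target_size       assumed number of positions that score as mutant
    copies_per_cell   1 for a genomic locus; plasmid copy number otherwise
    """
    return m / (cells_per_culture * copies_per_cell * target_size)

# Hypothetical example values, not measurements from this work.
rate = per_base_rate(m=2.0, cells_per_culture=1e7, target_size=100)
print(f"{rate:.2e} s.p.b.")
```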

Table 8: Genomic Mutation Rate Is Unaltered by Error-Prone TP-DNAP2

[00263] To test whether the replication mechanisms of p1 and p2 are mutually orthogonal in vivo, p1 and p2 mutation rates were measured in the presence of a panel of TP-DNAP1 and TP-DNAP2 variants with varying mutation rates. A change in p1 or p2 mutation rate induced by a noncognate TP-DNAP variant would therefore signal a degree of cross-replication between TP-DNAP1/p1 and TP-DNAP2/p2, if any. A panel of six polymerases was introduced on CEN6/ARS4 vectors into two separate base strains, AR-Y304 (p1 mutation rate reporter strain) and GA-Y021 (p2 mutation rate reporter strain), encoding mKate2, URA3, and leu2* on either p1 or p2, respectively. Included in this panel were TP-DNAP1 WT and two error-prone TP-DNAP1 variants found in previous screens. Also included were TP-DNAP2 WT and the error-prone TP-DNAP2 S370Q and TP-DNAP2 Y424Q variants found in this work. In AR-Y304 and GA-Y021, both p1 and p2 still encode their native wt TP-DNAPs. Any contribution to replication by a third TP-DNAP encoded in trans is monitored by detecting changes in linear plasmid mutation rate. Importantly, these experiments afford each DNAP an opportunity to replicate its native plasmid and lose its attached TP, becoming "spent" and perhaps more likely to replicate its noncognate linear plasmid through DNAP exchange (Klassen and Meinhardt, 2007, Linear Protein-Primed Replicating Plasmids in Eukaryotic Microbes, in Microbial Linear Plasmids, pp. 187-226. Springer, Berlin, Heidelberg). The presence of TP-DNAP1 WT or TP-DNAP2 WT in trans had no effect on the mutation rate of its noncognate linear plasmid. The mutagenic variant TP-DNAP1 I777K, L900S and a second TP-DNAP1 variant bearing W814N increased p1's mutation rate by 380- and 870-fold, respectively, but caused no statistically significant change in p2's mutation rate (Figure 8). Likewise, p1 mutation rate was unaltered in the presence of mutagenic TP-DNAP2 S370Q and TP-DNAP2 Y424Q, which increased p2 mutation rate by 29- and 16-fold, respectively (Figure 9). Thus, TP-DNAP1 replicates p1 with at least 870-fold specificity over p2, while TP-DNAP2 targets p2 with at least 29-fold specificity over p1. The level of mutual orthogonality measured here is limited by the error rates of the DNAPs used, especially those of the TP-DNAP2 variants. Future discovery of more mutagenic TP-DNAP1 or TP-DNAP2 variants may demonstrate even greater orthogonality between these two replication systems.
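The reasoning behind the stated specificity bounds can be made explicit with a short sketch. When a mutagenic variant raises the mutation rate of its cognate plasmid by some fold-factor without detectably changing the noncognate plasmid, that fold-factor serves as a lower bound on specificity, and the largest such factor for each polymerase family gives the tightest bound. The fold values below are those reported in the preceding paragraph; the dictionary layout and function name are illustrative assumptions.

```python
# Fold-increase in cognate plasmid mutation rate for variants that caused
# no statistically significant change on the noncognate plasmid.
cognate_fold_increase = {
    "TP-DNAP1": {"I777K, L900S": 380, "W814N-bearing variant": 870},
    "TP-DNAP2": {"S370Q": 29, "Y424Q": 16},
}

def specificity_lower_bounds(fold_table):
    """Tightest lower bound per polymerase: the largest cognate
    fold-increase observed without a noncognate effect."""
    return {pol: max(variants.values()) for pol, variants in fold_table.items()}

bounds = specificity_lower_bounds(cognate_fold_increase)
for pol, bound in bounds.items():
    print(f"{pol}: at least {bound}-fold specificity for its cognate plasmid")
```

Because each bound is set by the most mutagenic variant available, it reflects the sensitivity of the assay rather than the true degree of orthogonality.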

[00264] In summary, p1 replication by TP-DNAP1 and p2 replication by TP-DNAP2 are both orthogonal to genomic replication and to each other, resulting in two mutually orthogonal DNA replication systems in the same cell. This pair of orthogonal replication systems will enable the in vivo evolution of multiple genes at different elevated mutation rates, molecular recording of biological signals in two distinct DNA channels, and the establishment of additional mutually orthogonal DNAP/plasmid pairs by engineering new TPs.

[00265] While the embodiments are susceptible to various modifications and alternative forms, specific examples thereof have been shown in the drawings and are herein described in detail. It should be understood, however, that these embodiments are not to be limited to the particular form disclosed, but to the contrary, these embodiments are to cover all modifications, equivalents, and alternatives falling within the spirit of the disclosure. Furthermore, any features, functions, steps, or elements of the embodiments may be recited in or added to the claims, as well as negative limitations that define the inventive scope of the claims by features, functions, steps, or elements that are not within that scope.

[00266] Thus, a system for creating and using orthogonal DNA replication systems utilizing mutant DNA polymerases to generate continuous mutations in vivo has been described.