
Title:
EXPANSION OF THE GENETIC TOOLKIT FOR METABOLIC ENGINEERING OF CLOSTRIDIUM PASTEURIANUM: CHROMOSOMAL GENE DISRUPTION OF THE ENDOGENOUS CPAAI RESTRICTION ENZYME
Document Type and Number:
WIPO Patent Application WO/2016/061696
Kind Code:
A1
Abstract:
By this invention, for the first time, a method for high-efficiency generation of gene knockouts in the anaerobic bacterium Clostridium pasteurianum is provided. Clostridium pasteurianum is a bacterium of substantial industrial importance, due to its selectivity and high productivity of the biofuel and biochemical n-butanol, and its ability to grow on a wide variety of inexpensive substrates. Notable among the substrates that it can utilize as a sole source of carbon and energy is glycerine, which is produced in increasing quantities globally as a by-product of biodiesel processing. The industrial exploitation of Clostridium pasteurianum has previously been impeded by the lack of genetic engineering tools for this bacterium. This invention continues to expand the genetic engineering toolkit for this bacterium. Included in the invention is a means for protecting newly introduced DNA from degradation by a newly identified restriction enzyme within Clostridium pasteurianum, called CpaAII. This restriction enzyme represents a significant barrier in the generation of gene knockouts in Clostridium pasteurianum using established tools. Further included within the invention is a mutant strain of Clostridium pasteurianum lacking another restriction enzyme, CpaAI, which acts as another barrier for genetic transformation of Clostridium pasteurianum.

Inventors:
CHUNG DUANE (CA)
PYNE MICHAEL
MOO-YOUNG MURRAY
CHOU C PERRY
Application Number:
PCT/CA2015/051080
Publication Date:
April 28, 2016
Filing Date:
October 22, 2015
Assignee:
CHUNG DUANE (CA)
International Classes:
C12N1/21; C12N1/20; C12N9/22; C12N15/00; C12N15/09; C12N15/31; C12N15/55; C12N15/63; C12N15/74
Other References:
PYNE ME.: "Development of genetic tools for metabolic engineering of Clostridium pasteurianum''.", UWSPACE, 21 April 2014 (2014-04-21), Retrieved from the Internet [retrieved on 20151210]
DONG HJ ET AL.: "Engineering Clostridium strain to accept unmethylated DNA.", PLOS ONE, vol. 5 /2, 9 February 2010 (2010-02-09), pages e9038
CUI GZ ET AL.: "Targeted gene engineering in Clostridium cellulolyticum H10 without methylation.", J. MICROBIOL METHODS, vol. 89/3, June 2012 (2012-06-01), pages 201 - 208
HEAP JT ET AL.: "The ClosTron: A universal gene knock-out system for the genus Clostridium.", J MICROBIOL METHODS, vol. 70, 18 June 2007 (2007-06-18), pages 452 - 464, XP022212053, DOI: doi:10.1016/j.mimet.2007.05.021
PYNE ME ET AL.: "Expansion of the genetic toolkit for metabolic engineering of Clostridium pasteurianum: chromosomal gene disruption of the endogenous CpaAI restriction enzyme.", BIOTECHNOL BIOFUELS., vol. 7 /1, 19 November 2014 (2014-11-19), pages 163
Claims:
CLAIMS We claim:

1. A method for generating gene knockouts in the bacterium Clostridium pasteurianum.

2. A method of protecting DNA from a newly identified restriction enzyme in Clostridium pasteurianum, named CpaAII.

3. A bacterial cell lacking the restriction endonuclease CpaAI, wherein said bacterial cell is from Clostridium pasteurianum.

Description:
Title: Expansion of the genetic toolkit for metabolic engineering of Clostridium pasteurianum: Chromosomal gene disruption of the endogenous CpaAI restriction enzyme

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/067,424, filed Oct. 22, 2014, which is incorporated by reference in its entirety.

REFERENCES CITED OTHER REFERENCES

Aune TEV, Aachmann FL: Methodologies to increase the transformation efficiencies and the range of bacteria that can be transformed. Appl Microbiol Biotechnol 2010, 85:1301-1313.

Biebl H: Fermentation of glycerol by Clostridium pasteurianum - Batch and continuous culture studies. J Ind Microbiol Biotechnol 2001, 27:18-26.

Cui GZ, Hong W, Zhang J, Li WL, Feng YG, Liu YJ, Cui Q: Targeted gene engineering in Clostridium cellulolyticum H10 without methylation. J Microbiol Methods 2012, 89:201-208.

Dabrock B, Bahl H, Gottschalk G: Parameters affecting solvent production by Clostridium pasteurianum. Appl Environ Microbiol 1992, 58:1233-1239.

Dong HJ, Zhang YP, Dai ZJ, Li Y: Engineering Clostridium strain to accept unmethylated DNA. PLoS One 2010, 5:e9038.

Guss AM, Olson DG, Caiazza NC, Lynd LR: Dcm methylation is detrimental to plasmid transformation in Clostridium thermocellum. Biotechnol Biofuels 2012, 5:1-6.

Heap JT, Pennington OJ, Cartman ST, Carter GP, Minton NP: The ClosTron: A universal gene knock-out system for the genus Clostridium. J Microbiol Methods 2007, 70:452-464.

Heap JT, Pennington OJ, Cartman ST, Minton NP: A modular system for Clostridium shuttle plasmids. J Microbiol Methods 2009, 78:79-85.

Heap JT, Kuehne SA, Ehsaan M, Cartman ST, Cooksley CM, Scott JC, Minton NP: The ClosTron: Mutagenesis in Clostridium refined and streamlined. J Microbiol Methods 2010, 80:49-55.

Jang YS, Im JA, Choi SY, Lee JI, Lee SY: Metabolic engineering of Clostridium acetobutylicum for butyric acid production with high butyric acid selectivity. Metabolic Engineering 2014, 23:165-174.

Jensen TO, Kvist T, Mikkelsen MJ, Westermann P: Production of 1,3-PDO and butanol by a mutant strain of Clostridium pasteurianum with increased tolerance towards crude glycerol. AMB Express 2012, 2:44.

Kell DB, Peck MW, Rodger G, Morris JG: On the permeability to weak acids and bases of the cytoplasmic membrane of Clostridium pasteurianum. Biochem Biophys Res Commun 1981, 99:81-88.

Leang C, Ueki T, Nevin KP, Lovley DR: A genetic system for Clostridium ljungdahlii: A chassis for autotrophic production of biocommodities and a model homoacetogen. Appl Environ Microbiol 2013, 79:1102-1109.

Li YC, Tschaplinski TJ, Engle NL, Hamilton CY, Rodriguez M, Liao JC, Schadt CW, Guss AM, Yang YF, Graham DE: Combined inactivation of the Clostridium cellulolyticum lactate and malate dehydrogenase genes substantially increases ethanol yield from cellulose and switchgrass fermentations. Biotechnol Biofuels 2012, 5:2.

Malaviya A, Jang YS, Lee SY: Continuous butanol production with reduced byproducts formation from glycerol by a hyper producing mutant of Clostridium pasteurianum. Appl Microbiol Biotechnol 2012, 93:1485-1494.

Mermelstein LD, Welker NE, Bennett GN, Papoutsakis ET: Expression of cloned homologous fermentative genes in Clostridium acetobutylicum ATCC 824. Nat Biotech 1992, 10:190-195.

Papoutsakis ET: Engineering solventogenic Clostridia. Curr Opin Biotechnol 2008, 19:420-429.

Pyne ME, Moo-Young M, Chung DA, Chou CP: Development of an electrotransformation protocol for genetic manipulation of Clostridium pasteurianum. Biotechnol Biofuels 2013, 6:50.

Pyne ME, Bruder M, Moo-Young M, Chung DA, Chou CP: Technical guide for genetic advancement of underdeveloped and intractable Clostridium. Biotechnol Adv 2014a, 32:623-641.

Pyne ME, Utturkar S, Brown SD, Moo-Young M, Chung DA, Chou CP: Improved genome sequence of Clostridium pasteurianum strain ATCC 6013 (DSM 525) using a hybrid next-generation sequencing approach. Genome Announc 2014b, 2:e00790-14.

Richards DF, Linnett PE, Oultram JD, Young M: Restriction endonucleases in Clostridium pasteurianum ATCC 6013 and C. thermohydrosulfuricum DSM 568. J Gen Microbiol 1988, 134:3151-3157.

Roberts RJ, Belfort M, Bestor T, Bhagwat AS, Bickle TA, Bitinaite J, Blumenthal RM, Degtyarev SK, Dryden DTF, Dybvig K, et al: A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res 2003, 31:1805-1812.

Roberts RJ, Vincze T, Posfai J, Macelis D: REBASE - a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res 2010, 38:D234-D236.

Sambrook J, Fritsch EF, Maniatis T: Molecular cloning: A laboratory manual. 2nd edition. Cold Spring Harbor: Cold Spring Harbor Press; 1989.

Shao L, Hu S, Yang Y, Gu Y, Chen J, Jiang W, Yang S: Targeted gene disruption by use of a group II intron (targetron) vector in Clostridium acetobutylicum. Cell Res 2007, 17:963-965.

Taconi KA, Venkataramanan KP, Johnson DT: Growth and solvent production by Clostridium pasteurianum ATCC® 6013™ utilizing biodiesel-derived crude glycerol as the sole carbon source. Environ Prog & Sustainable Energy 2009, 28:100-110.

Wilkinson SR, Young M: Wide diversity of genome size among different strains of Clostridium acetobutylicum. J Gen Microbiol 1993, 139:1069-1076.

TECHNICAL FIELD

[0002] The present invention is directed to bacterial cells and methods for overcoming nucleic acid restriction enzymes within bacterial cells, making genetic modifications within bacterial cells, and methods and nucleic acids related thereto.

BACKGROUND

[0003] Environmental and economic concerns surrounding the consumption of finite petroleum-based resources have led to the initiation of a rival biofuel industry wherein cleaner and renewable sources of fuel are produced using biological catalysts. Large-scale biofuel production is currently unfeasible, however, predominantly due to feedstock cost and availability. Traditional feedstocks, such as corn and molasses, are readily fermentable, yet fluctuate drastically in price and compete with food supplies, rendering them unsustainable for biofuel production. Cellulosic biomass, on the other hand, is abundant and economical, yet suffers from several bioprocessing drawbacks, including a requirement for costly enzymatic hydrolysis and tedious chemical pretreatment of raw substrates. To be competitive with petrochemical processes, feedstocks for biofuel production must be cheap and abundant non-food resources that are readily fermented into value-added products. Crude glycerol, resulting from the production of biodiesel at a ratio of 1 kg of glycerol per 10 kg of biodiesel, is one of the few substrates satisfying these criteria. Owing to the immense expansion of global biodiesel industries in recent years, crude glycerol has become a desirable and economical feedstock. However, microorganisms that can effectively dissimilate and ferment glycerol remain predominantly untapped.

[0004] Clostridium pasteurianum is a mesophilic, strictly anaerobic, Gram-positive bacterium that possesses the metabolic capacity to ferment glycerol as a sole source of carbon and energy, yielding a mixture of gases (hydrogen and carbon dioxide), acids (acetic and butyric), and alcohols (ethanol, butanol, and 1,3-propanediol) (Biebl, 2001; Dabrock, 1992). Of these products, butanol is a promising biofuel due to its resemblance to traditional gasoline with respect to physicochemical and fuel combustion properties. Although several microorganisms can metabolize glycerol, C. pasteurianum is the only species that converts glycerol to butanol, producing up to 17 g l⁻¹ butanol (Biebl, 2001), with a maximum yield of 0.36 g g⁻¹ crude glycerol (Taconi, 2009). Biodiesel-derived glycerol requires only minor pretreatment to remove impurities and allows fermentation performance comparable to that of refined glycerol. The ability of C. pasteurianum to metabolize biodiesel-derived glycerol and its highly active butanol biosynthetic pathway make C. pasteurianum a bacterium of substantial biotechnological importance.

[0005] Several recent strategies have been employed in an attempt to alter the central metabolism of C. pasteurianum to enhance its productivity. Unfortunately, a lack of genetic tools has impeded metabolic engineering of C. pasteurianum, allowing only random chemical mutagenesis techniques (Jensen, et al., 2012; Malaviya, et al., 2012). It is clear that metabolic engineering will play a central role in the development of C. pasteurianum as an efficient industrial producer. To this end, an electroporation-mediated method of transformation was recently established (Pyne, et al., 2013), allowing gene transfer to C. pasteurianum with efficiencies up to 10⁴ transformants μg⁻¹ plasmid DNA. Such efficient plasmid transfer paves the way to rational metabolic engineering strategies, including gene disruption, knockdown, and overexpression techniques (Papoutsakis, 2008; Pyne, et al., 2014a), none of which have been explored using C. pasteurianum. Gene disruption offers the most robust avenue for altering the expression of a native chromosomal gene or metabolic pathway. In Clostridium, the preferred tool for gene disruption is the ClosTron system, which has been adapted from TargeTron technology in E. coli and exploits the retrohoming mechanism of bacterial group II introns (Heap, et al., 2010; Heap, et al., 2007; Shao, et al., 2007). Owing to the broad host range of group II introns, ClosTron-mediated gene disruptions have been performed in at least 11 species of Clostridium, leading to extensive metabolic engineering of solvent-producing Clostridia.

[0006] Following our initial report of gene transfer to C. pasteurianum, we have observed that electrotransformation efficiency varies drastically between certain shuttle vectors (Pyne, et al., 2013). Poor electrotransformation outcome was shown to be specific to vectors harboring lactococcal group II intron machinery. Restriction-modification (RM) systems are the most common cause of transformation recalcitrance in bacteria and potently inhibit plasmid transfer (Pyne, et al., 2014a; Aune, et al., 2010). C. pasteurianum ATCC 6013 produces at least two active RM systems, CpaI (5'-GATC-3') and CpaAI (5'-CGCG-3') (Richards, et al., 1988), whereby restriction can be blocked using Dam (CpaI) and M.FnuDII (CpaAI) methylation, respectively (Pyne, et al., 2013). Based on our preliminary observation, it is possible that C. pasteurianum expresses a third RM system that recognizes a specific nucleotide sequence within the clostridial gene disruption vectors. This invention discloses for the first time that group-II-intron-mediated chromosomal gene disruption in C. pasteurianum can be hindered by host restriction and low retrohoming efficiency. This invention further discloses several means by which these barriers can be overcome. The invention further discloses the successful derivation of the first gene disruption mutant of C. pasteurianum and provides a cell of C. pasteurianum containing this gene disruption. The invention disclosing means and methods for chromosomal gene disruption will promote future metabolic engineering of C. pasteurianum.
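As a rough illustration of how recognition sites for the two known RM systems can be located in a DNA sequence, the following minimal Python sketch scans a string for 5'-GATC-3' (CpaI) and 5'-CGCG-3' (CpaAI). The helper name find_sites is hypothetical, the example input is the CpaAII-anneal.S oligonucleotide from Table 2 (chosen only for convenience), and methylation status, which determines whether a site is actually cleaved, is not modeled. Both motifs are palindromic, so scanning one strand is sufficient.

```python
import re

# Recognition sequences as stated in the text (Richards, et al., 1988).
SITES = {"CpaI": "GATC", "CpaAI": "CGCG"}


def find_sites(seq, motif):
    """Return 0-based start positions of every hit, including overlapping ones."""
    # A zero-width lookahead lets overlapping matches (e.g. CGCGCG) be reported.
    return [m.start() for m in re.finditer(f"(?={motif})", seq.upper())]


# Illustrative input only: CpaAII-anneal.S from Table 2.
oligo = "CTAGAGTCGACGTCACGCGTCCAAGGAGATCTCCAGGCCTGCAGACATGCA"

hits = {name: find_sites(oligo, motif) for name, motif in SITES.items()}
for name, positions in hits.items():
    print(f"{name} ({SITES[name]}): {len(positions)} site(s) at {positions}")
```

The same scan could be applied to a full plasmid sequence to estimate how many sites would need protection by Dam or M.FnuDII methylation.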

SUMMARY OF THE INVENTION

[0007] The present invention provides protocols that enable a DNA restriction system in bacterial cells to be overcome. By overcoming the DNA restriction system, it becomes possible to apply tools for introducing genetic modification within the genomes of the bacterial cells. The present invention includes the bacterial cells containing genetic modifications.

[0008] In one preferred embodiment, the bacterial cells are of the anaerobic bacterium Clostridium pasteurianum. In another preferred embodiment, the invention describes a method whereby the sequence of recombinant DNA constructs can be altered, so as to avoid a restriction enzyme, which prevents efficient transformation of the bacterial cells. In another preferred embodiment, one or more methyltransferases and their amino acid and nucleotide sequences are disclosed, which can be applied to protect recombinant DNA constructs from degradation by restriction enzymes present within said bacterial cells. In another preferred embodiment, a means of making genetic modifications within the genome of said bacterial cells is disclosed. In another preferred embodiment, a bacterial cell containing a genetic modification is disclosed.

DESCRIPTION OF THE FIGURES

FIG. 1. Electrotransformation data demonstrating that C. pasteurianum restricts a 932 bp SacII-BstAPI fragment within the ltrA gene of pSY6catP. Only relevant vector regions corresponding to the intron components are shown and are depicted to scale. To allow alignment between constructs, deleted regions of vectors are represented as horizontal dashed lines and the ermB retrotransposition-activated marker (RAM; unshaded box) within plasmids pMTL007C-E2 and pMTL007C-E6 is shown above the Ll.ltrB intron. Shaded box: Ll.ltrB intron; unshaded box: ermB RAM; shaded arrow: ltrA; dashed line: Dcm recognition site; Pptb: ptb promoter (C. acetobutylicum); Pfdx: fdx promoter (C. sporogenes); ND: not detected. Relevant restriction endonuclease recognition sites corresponding to BstAPI (B), MfeI (M), NheI (N), and SacII (S) are abbreviated using a single letter.

FIG. 2. Electrotransformation data demonstrating that C. pasteurianum restricts a 334 bp region within the ltrA gene sequence of pSY6catP. The 932 bp SacII-BstAPI region of pSY6catP (see FIG. 1) is enlarged to better show relevant vector components. Point mutations within the ltrA coding sequence are depicted as vertical bands. Enlarged vector components are depicted to scale. Shaded box: Ll.ltrB intron; shaded arrow: ltrA; dashed line: Dcm recognition site; Pptb: ptb promoter (C. acetobutylicum); ND: not detected. Relevant restriction endonuclease recognition sites corresponding to AatII (A), BglII (B), EcoO109I (E), and SacII (S) are abbreviated using a single letter.

FIG. 3. Identification and verification of ΔcpaAIR mutant colonies of C. pasteurianum. A) Schematic diagram depicting primer annealing sites and expected PCR products of wild-type cells (left) and ΔcpaAI mutant cells (right). Insertion of the Ll.ltrB intron into the cpaAIR gene leads to a 915 bp increase in size of the full-length PCR product generated using primers flanking the 176a intron insertion site (REN.Rv + REN.Fw). Both 5' and 3' gene-intron junction PCR products can be detected in ΔcpaAIR mutant cells using primer sets REN.Rv + ltrB.Rv and ltrB.Fw + REN.Fw, respectively. B) Colony PCR screening of gene disruption enrichment colonies for presence of intron insertion by amplification of the full-length product. Lane 1: marker; lane 2: no template control; lane 3: wild-type, non-recombinant C. pasteurianum colony; lanes 4-10: gene disruption enrichment colonies; lanes 4, 5, 7, and 8: positive colonies; lanes 6, 9, and 10: negative colonies. C) Further genomic verification of a single positive ΔcpaAIR mutant colony by amplification of all three PCR products depicted in FIG. 3A (5' junction, 3' junction, and full product). A wild-type C. pasteurianum colony was included as a control for all three PCR primer sets. Lane 1: marker; lanes 2-4: wild-type colony; lanes 5-7: ΔcpaAIR mutant colony; lanes 2 and 5: 5' junction; lanes 3 and 6: 3' junction; lanes 4 and 7: full product.

FIG. 4. Electrotransformation results demonstrating successful electrotransformation of ΔcpaAIR gene disruption cells with M.FnuDII-unmethylated plasmid pMTL85141ermB. Wild-type cells (top row) and ΔcpaAIR gene disruption cells (bottom row) of C. pasteurianum were electroporated separately with M.FnuDII-unmethylated (left column) and M.FnuDII-methylated (right column) plasmid pMTL85141ermB. M.FnuDII methylation was achieved in vivo using an E. coli strain harboring pMTL85141ermB and pFnuDIIMKn. Varying volumes of electrotransformation outgrowth cell suspensions were plated to give approximately equal numbers of transformants between electrotransformations. Hence, the number of transformant colonies shown does not allow direct comparison of electrotransformation efficiency.

ABBREVIATIONS

IEP: intron-encoded protein; RAM: retrotransposition-activated marker; RM: restriction-modification; SMRT sequencing: single molecule real time sequencing; SOE PCR: splicing by overlap extension PCR

Table 1

Strains and plasmids employed in this study.

Strains or plasmids | Relevant characteristics (a) | Source or reference

Strains

Escherichia coli DH5α | F- endA1 glnV44 thi-1 recA1 relA1 gyrA96 deoR nupG φ80dlacZΔM15 Δ(lacZYA-argF)U169, hsdR17(rK- mK+), λ- | Lab stock
Escherichia coli ER1821 | F- endA1 glnV44 thi-1 relA1? e14-(mcrA-) rfbD1? spoT1? Δ(mcrC-mrr)114::IS10 | Lab stock; New England Biolabs
Clostridium pasteurianum ATCC 6013 | Wild-type | American Type Culture Collection
Clostridium pasteurianum ΔcpaAIR | Disruption mutant generated by inserting the Ll.ltrB intron into position 176a of the cpaAIR gene encoding the CpaAI RENase | This study

Plasmids

pFnuDIIMKn | M.FnuDII methyltransferase plasmid for methylation of E. coli-C. pasteurianum shuttle vectors (KmR; p15A ori) | (Pyne, et al., 2013)
pIMP1 | E. coli-Clostridium shuttle vector (ApR; ColE1 ori; EmR; repL ori) | (Mermelstein, et al., 1992)
pMTL85141 | E. coli-Clostridium shuttle vector (CmR/TmR; ColE1 ori; repL ori) | (Heap, et al., 2009)
pMTL85141ermB | E. coli-Clostridium shuttle vector (CmR/TmR; EmR; ColE1 ori; repL ori) | (Pyne, et al., 2013)
pSY6catP | E. coli-Clostridium shuttle vector expressing the Ll.ltrB intron and ltrA IEP from the C. acetobutylicum ptb promoter (ApR; CmR/TmR; ColE1 ori; repL ori) | (Pyne, et al., 2013)
pMTL007C-E2 | ClosTron vector expressing the Ll.ltrB intron with RAM and ltrA IEP from the C. sporogenes fdx promoter (CmR/TmR; ColE1 ori; repH ori; Em RAM) | (Heap, et al., 2010)
pFnuDIIMKn-CpaAIIM | M.FnuDII and M.CpaAII dual methyltransferase plasmid for methylation of E. coli-C. pasteurianum shuttle vectors (KmR; p15A ori) | This study
pMTL007C-E6 | repL derivative of pMTL007C-E2 | This study
pLtrB | ltrA-deletion derivative of pSY6catP | This study
pLtrA | Ll.ltrB-deletion derivative of pSY6catP | This study
pDelPptb | Derived by deleting the -35 and -10 signals of the C. acetobutylicum ptb promoter from plasmid pSY6catP | This study
pDel2dcm | Derived by mutating the two E. coli Dcm restriction recognition sites downstream of the ltrA coding sequence within plasmid pSY6catP | This study
pMB | Derived by replacing a 1,661 bp MfeI + BstAPI restriction fragment of pSY6catP with a 48 bp stuffer fragment | This study
pN | Derived by replacing a 688 bp NheI + MfeI restriction fragment of pSY6catP with a 48 bp stuffer fragment | This study
pNS | Derived by replacing a 1,434 bp NheI + SacII restriction fragment of pSY6catP with a 48 bp stuffer fragment | This study
pFrag1 | Derived by replacing a 1,332 bp BglII + EcoO109I restriction fragment of pSY6catP with a 589 bp ltrA region | This study
pFrag2 | Derived by replacing a 1,332 bp BglII + EcoO109I restriction fragment of pSY6catP with a 363 bp ltrA region | This study
pFrag3 | Derived by replacing a 1,332 bp BglII + EcoO109I restriction fragment of pSY6catP with a 574 bp ltrA region | This study
pSY334 | Derived by subcloning a 334 bp SacII + AatII fragment of the ltrA coding sequence into plasmid pMTL85141 | This study
pMut98 | pSY6catP derivative possessing 98 silent mutations in the ltrA coding sequence | This study
pMTLCP-E2 | Derived by subcloning a 1,427 bp MscI + AclI restriction fragment of pMut98 into plasmid pMTL007C-E2 | This study
pMTLCP-E6 | Derived by subcloning a 1,427 bp MscI + AclI restriction fragment of pMut98 into plasmid pMTL007C-E6 | This study
pDelCpaAII | Deletion of the unique CpaAII recognition site within pSY6catP by introducing three silent point mutations | This study
pCpaAII | Introduction of a unique CpaAII recognition site within pMTL85141 by introducing two point mutations | This study
pSYCP-cpaAIR | Derived by replacing the ptb promoter of pMut98 with a thl promoter and targeting the Ll.ltrB intron to position 176a of the cpaAIR gene | This study
pMTLCP-E2-cpaAIR | Targeting construct of plasmid pMTLCP-E2 for disruption of the cpaAIR gene at position 176a | This study
pMTLCP-E6-cpaAIR | Targeting construct of plasmid pMTLCP-E6 for disruption of the cpaAIR gene at position 176a | This study

(a) ApR: ampicillin resistant; CmR: chloramphenicol resistant; EmR: erythromycin resistant; KmR: kanamycin resistant; TmR: thiamphenicol resistant

Table 2

Oligonucleotides employed in this study.

Oligonucleotide | Sequence (5'-3') (a) | SEQ ID NO

ltrB.NheI.S | CTAGCGCTATATGCGTTGATGCAATTTCTATGCACTCGTAGTAGTCTGAGAAGGCATATG | 1
ltrB.BstAPI.AS | ATGCCTTCTCAGACTACTACGAGTGCATAGAAATTGCATCAACGCATATAGCG | 2
ltrA.XhoI.S | TTTCTACTCGAGGCGTTGATGCAATTTCTATGCACTC | 3
ltrA.BstAPI.AS | GGCATCAGAGCAGATTGTACTGAG | 4
del-Pptb.S | GGGGTTAATCATTTAACATAGATAATTAAATAGTAAAAGGGAGTGTCGAGATATCC | 5
del-Pptb.AS | TCGAGGATATCTCGACACTCCCTTTTACTATTTAATTATCTATGTTAAATGATTAACCCC | 6
MfeI/BstAPI.S | AATTGATTTAGTAATTTCTATAAGCAGGTTAGCTGTAAAACTAGCAGTAGCACGCATATG | 7
MfeI/BstAPI.AS | ATGCGTGCTACTGCTAGTTTTACAGCTAACCTGCTTATAGAAATTACTAAATC | 8
NheI/MfeI.S | CTAGCATTTAGTAATTTCTATAAGCAGGTTAGCTGTAAAACTAGCAGTAGCACC | 9
NheI/MfeI.AS | AATTGGTGCTACTGCTAGTTTTACAGCTAACCTGCTTATAGAAATTACTAAATG | 10
NheI/SacII.S | CTAGCATTTAGTAATTTCTATAAGCAGGTTAGCTGTAAAACTAGCAGTAGCACCCGC | 11
NheI/SacII.AS | GGGTGCTACTGCTAGTTTTACAGCTAACCTGCTTATAGAAATTACTAAATG | 12
frag1.BglII.S | GGGATATGATATACGAGTAAGGAGATCTGG | 13
frag1.EcoO109I.AS | AGTATTAGGCCCTGACGTCCCACATAATTCACAACATTTAGC | 14
frag2.BglII.S | AACAGGAGATCTGCTAAATGTTGTGAATTATGTGGGACGTC | 15
frag2.EcoO109I.AS | TACTCTAGGCCCTGGAGACCCCACACTACCATCG | 16
frag3.BglII.S | TCGCCAAGATCTCGATGGTAGTGTGGGGTCTCC | 17
frag3.EcoO109I.AS | GTGCCACCTGACGTCTAAGAAACC | 18
3'SOE.S | TGGGAAATGGCAATGATAGCGAAAC | 19
SOE.EcoO109I.AS | ATAGGCGTATCACGAGGCCCTTTC | 20
gBlock.BglII.S | CGAGTAAGGAGATCTGGAACGATAAAACG | 21
CpaAII-anneal.S | CTAGAGTCGACGTCACGCGTCCAAGGAGATCTCCAGGCCTGCAGACATGCA | 22
CpaAII-anneal.AS | AGCTTGCATGTCTGCAGGCCTGGAGATCTCCTTGGACGCGTGACGTCGACT | 23
SYCP.gBlock.S | GGAGGTCAATCTATGAAAATGCGATTAAGC | 24
SYCP.gBlock.AS | CTTTCGTTTCGTTCCCATAGGTTCTCC | 25
MTLCP.REN-HindIII.S | GTATTTAAAGCTTATAATTATCCTTAAATTTCTTAAAAGTGCGCC | 26
REN.Fw | CTACTTGAGGTCTAGGACTTCTATCT | 27
REN.Rv | ACAGATAGGCCATTAAAGGTATTCA | 28
ltrB.Fw | CCAACGCGTCGCCACGTAATAAAT | 29
ltrB.Rv | ATGGGAACGAAACGAAAGCGATGC | 30
M.CpaAII.Fw | TCATGTTTGACAGCTTATCATCGATAAGCTTTAATGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGAGGAGAAAAATAGTCTATGTTACACACAGAAATTAAGAGTAAAATTGATAAACTATGGAAC | 31
M.CpaAII.Rv | TAGGCTGCAAATGGAGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACAAATTACGCCTACTTCACCATTCCCTTTAACTCAGC | 32
S.CpaAII.Fw | TCATGTTTGACAGCTTATCATCGATAAGCTTTAATGCGGTAGTTTATCACAGTTAAATTGCTAACGCAGTCAGAGGAGAAAAATAGTCTATGGAAAATGTTAAGTTACCTGAAGGATGG | 33
S.CpaAII.Rv | TAGGCTGCAAATGGAGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATGGCATCGTAAAGAACAAATTACGCTTAGCTATACAATTCTCCATTAAAAGCTTTTTGCATC | 34

(a) Underline: relevant restriction endonuclease recognition sequences
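Several oligonucleotides in Table 2 are listed as sense/antisense (S/AS) pairs. As a minimal sketch of how such a pair can be checked for complementarity, the Python below tests whether two oligonucleotides form a perfect duplex once a fixed number of bases is left single-stranded at each 5' end. The function names and the 4-base overhang length are illustrative assumptions, not details stated in the source; the sequences are CpaAII-anneal.S and CpaAII-anneal.AS as listed in Table 2.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_overhangs(sense, antisense, overhang=4):
    """Return the two 5' overhangs if the oligos pair perfectly, else None.

    Assumes (hypothetically) that each oligo carries `overhang` unpaired
    bases at its 5' end and that the remaining bases form the duplex.
    """
    core_sense = sense[overhang:]
    core_antisense = revcomp(antisense)[:-overhang]
    if core_sense != core_antisense:
        return None
    return sense[:overhang], antisense[:overhang]


sense = "CTAGAGTCGACGTCACGCGTCCAAGGAGATCTCCAGGCCTGCAGACATGCA"
antisense = "AGCTTGCATGTCTGCAGGCCTGGAGATCTCCTTGGACGCGTGACGTCGACT"

result = annealed_overhangs(sense, antisense)
print(result)
```

For this pair the check succeeds with 5' overhangs CTAG and AGCT, which would be consistent with cloning into XbaI- and HindIII-type sticky ends, although the source does not state which enzymes were used. Pairs designed for other end geometries would need a different overhang model.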

Table 3

DNA and Amino Acid Sequences of the M.CpaAII Methylase and S.CpaAII Specificity Domain from Clostridium pasteurianum

Name Sequence

M.CpaAII DNA ATGTTACACACAGAAATTAAGAGTAAAATTGATAAACTATGGAACAAT SEQ ID NO: 35 sequence (5'-3'}; TTTTGGTCAGGGGGAATATCTAATCCTTTGACAGTAATAGAGCAGATA

corresponding to TCTTATTTATTGTTTATAAAGAGATTAGATGATATAGATACTACTAATG

base-pairs 926934- AAAAAATGGCCAATAGGATGGGTAAGCCATTTAAATCCTTATTTATGG

928484 from AATTAAGTGAAAAGTTAAAGAAAGAGAATAAAATAAAGGATGATTCTA

GenBank Accession ATTGGGAATTATTAAAGTGGAGTCAATTTAAGCATTTAGAAGTAGAAA

number AGATGTTTGAAGTAGTATCTCAAAAGGTATTTCCATTTATAAAATCTAT

JPGY01000008.1 GGGAGGAGAAGAATCGAGCTTTACAGCTGAAATGAAGGATGCTGTAT

TTATGATACCTAAGCCATCATTACTTCAAGAATCTGTAAGACTTATAGA

TGGTATAAATATG GATGATG CAG ATAC AAAAG G G G ATTT ATAT G AATA

TCTGTTATCAAAACTTGCCACATCAGGGGTAAATGGTCAGTTTAGAAC

ACCAAGACACATAATAAGGATGATGGTTGAGCTTATGGATCCATCAG

CTGAAGATAAGATATGTGACCCTGCCTGTGGTACAGCAGGTTTTTTAA

TTAATTCATTAGAATATATACTTGAAAAATATACAAGATCAGAAAGTGT

ATTTACAGATGAGGAAGGGAATATACATAACAGAATTGGAGATATGAT

GAGTCCAAAAGACTGGGATCATTTTAAGAAGGATATGTTCTACGGTTT

TGACTTTGATGCATCAATGGTCAGAATAGCTTCAATGAACTTAATGCT

TCATAGCATTGATGATCCAAATGTTAGACAAATGGATTCGTTATCAAA

AAGATATGAAGAAGAAAATAAATATACTTTGGTTTTAGCTAATCCTCCT

TTTAAAGGCAGTATAGATAAGGAAGATATTAATCCTTCTTTATTAAAAG

GTGCCAAAACTACTAAGACGGAATTATTATTTATAAAACTCATAAACA

GAATACTTGACTTAGGTGGAAGATGCGCAGTTATAGTACCAGATGGA

GTACTATTTG G GTC AACTAAG G C ACATAAG G ATATAAG AAAAAATTTA

GTAGAAGAAAATCAATTAGAAGGTGTAATATCAATGCCTTCAGGAGTA

TTTAAGCCCTATGCAGGAGTATCAACAGCAGTGCTTTTATTTACTAAA

GGTGGAGAAACAAATAAAGTATGGTTTTATGATATGGAAAGCGATGG .

ATACTCTCTAGATGATAAGAGAAATAAACTTGATACTGATGGAGATAT

TCCAG ATAT AATTG AAAAAT G G AG AG ATAT AAAG AAAG ATAATAC TTT

GGAACCTAGTAAAGAAGATAGATGGTTCTGGGTAGAAAAAGAAGAAA

TAGCAGGTAATGATTATGATTTATCAATAAATAAGTATAAAGAGATAGA

ATATGAAGAAGTTGTGTATGAAAAGCCAGAGGTTATATTGGAGAGGA

TTGAAATTTTGGAGAAATCTATTTTAGAGAATATAGCTGAGTTAAAGG

GAATGGTGAAGTAG

M.CpaAII amino acid MLHTEIKSKIDKLWNNFWSGGISNPLTVIEQISYLLFIKRLDDI DTTNEKMA sequence NRMGKPFKSLFMELSEKLKKENKIKDDSNWELLKWSQFKHLEVEKMFE

WSQ KVFPFIKSMGGEESSFTAEMKDAVFMIPKPSLLQESVRLIDGINMD

DADTKGDLYEYLLSKLATSGVNGQFRTPRHIIRMMVELMDPSAEDKICD

PACGTAG FLINSLEYILEKYTRSESVFTDEEGNIHNRIGDMMSPKDWDHF

KKDMFYGFDFDASMVRIAS NLMLHSIDDPNVRQMDSLS RYEEENKY

TLVLAN PPFKGSIDKEDINPSLLKGAKTTKTELLFI KLI N RI LDLGGRCAVIV

PDGVLFGSTKAHKDI RKNLVEENQLEGVISMPSGVFKPYAGVSTAVLLFT

KGGETNKVWFYDMESDGYSLDDKRNKLDTDGDIPDII EKWRDIKKDNTL

EPSKEDRWFWVEKEEIAGNDYDLSINKYKEI EYEEWYEKPEVILERIEIL

EKSiLENIAELKGMVK

S.CpaAII DNA ATGGAAAATGTTAAGTTACCTGAAGGATGGAGATTAGAAAAACTAAAA sequence (5'-3'); G ATTTATTAG AAG ATAAATAT G CAGG G G AATG G GGTGATG AAG ATAC corresponding to TGATAACAGTGGAATTTGTGTAATAAGAACTACTAATTTTACTAATAGT base-pairs 928486- G G AAG GCTAG ATTTTAG CAAT GTTGT AACTAG AAAAATAG ATAAAAAA 929724 from ATAATTGAAAAGAAAGAGTTAAAGTATGGAGATATTATTTTAGAAAAAT GenBank Accession CAGGCGGTAGTGAAAATCAACCGGTAGGTAGAGTAGTATACTTTGAT number AAATATGAAGATAATCATATTTGCAATAATTTTACACATGTATTAAGAA

J PGY01000008.1 TAT ATG AT AAAAAAG CTGTTT C AAAATAT G TT AT G TACTTTTTACTAG A

CATATATAAAAAAGGAATAACAGAATATTTTCAAAATAAAACAACAGGA

ATAAGAAATCTTCAAATGAAGAGATATTTAGAGTTAGATATAGTATTAC

CACCATTAGAAACTCAAAAGAAAATAGTTGATGTTTTAGAAAAGGCTG

AAAAGACTTTAGAAAAGAGAAAAGAAGCCATTAAGTTGTTAGATGAAT

TAGTCAAATCGAGATTTATCGAGATGTTTGGTGATCCAGCTACTAATT

CT AAG G G AT G G AT AG TT G AAG AG CTTTC AAAG T GTTTAATT AATATT G

AAAATGGAAAAAGCTTTGTATGTGAAAACTATAGTAGAAAAGGTGAAT

ATCCAGCTATATTAAAATTAAGTGCTGTAACATATGGAATATATAATTC

TTCTGAAAATAAAGCATTGACAGATGAAAATTTATTTGTAGAAAGTGTT

GAGGTGAAAAATGGAGATTTGTTATTTACAAGAAAAAACACTCCAGAA

TTAGTAGGAATGAGTGCATATGTTTATGAAACTTCTCCTAATTTAATGA

TGCCTGATTTAATATTTAGGTTTAATACTAATGAAAAATGTAACAAAGT

TTATTTGTGGAAATTAATTAATCACGATTTGTTTAGAGAAAATATTAAG

GCTTTATCTAATGGTTCTGCGAAGTCTATGTCTAATATATCAAAACAAA

GATTAATGAGTCTGAATATTCCAATACCACCACTAGAACTCCAAAACC

AATTTGCAGATTTCGTCAAGCAAGTCGACAAATTGAAATTTGAAATGC AAAAGAGTTTAAAAGAACTAGAGGACAACTTTAATTCTTTGATGCAAA AAGCTTTTAATGGAGAATTGTATAGCTAA

S.CpaAII amino acid MENVKLPEGWRLEKLKDLLEDKYAGEWGDEDTDNSGICVIRTTNFTNS SEQ ID NO: 38 sequence GRLDFSNWTRKIDKKIIEKKELKYGDIILEKSGGSENQPVGRWYFDKYE

DNHICNNFTHVLRIYDKKAVSKYVMYFLLDIYKKGITEYFQNKTTGIRNLQ MKRYLELDIVLPPLETQKKIVDVLEKAEKTLEKRKEAIKLLDELVKSRFIEM F G D PAT NS KGWIVE E LS KCL I N I E N G KS FVC E N Y S R KG E Y P Al L KLSAVTY

GIYNSSENKALTDENLFVESVEVKNGDLLFTRKNTPELVGMSAYVYETS PNLMMPDLIFRFNTNEKCN KVYLWKLINHDLFRENIKALSNGSAKSMSNI SKQRLMSLNIPIPPLELQNQFADFVKQVDKLKFEMQKSLKELEDNFNSLM QKAFNGELYS
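Because Table 3 pairs each DNA sequence with an amino acid sequence, the two can be cross-checked by translation. The following is a minimal Python sketch, assuming the standard genetic code (the table is built from the conventional TCAG ordering) and ignoring alternative bacterial start codons; the translate helper is hypothetical and only the first 15 codons of M.CpaAII and the first 10 codons of S.CpaAII from Table 3 are used as input. Whitespace is stripped first, which also tolerates sequences that were broken up by spaces during text extraction.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


m_cpaaii_start = "ATGTTACACACAGAAATTAAGAGTAAAATTGATAAACTATGGAAC"
s_cpaaii_start = "ATGGAAAATG TTAAGTTACC TGAAGGATGG"  # spaces are tolerated

print(translate(m_cpaaii_start))
print(translate(s_cpaaii_start))
```

Both fragments translate to the opening residues listed for the corresponding amino acid sequences in Table 3 (MLHTEIKSKIDKLWN and MENVKLPEGW).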

DETAILED DESCRIPTION OF INVENTION

Shuttle vectors harboring Ll.ltrB intron machinery hinder electrotransformation of C. pasteurianum

[0009] To attempt chromosomal gene disruption in C. pasteurianum, we first electrotransformed plasmid pSY6catP, which was developed for use in C. acetobutylicum (Pyne, et al., 2013; Shao, et al., 2007). This vector harbors the Ll.ltrB intron and its cognate intron-encoded protein (IEP) gene, ltrA, both of which are transcribed from the same C. acetobutylicum ptb promoter within a pIMP1 vector backbone. Electrotransformation efficiencies of 3.7 × 10⁴ and 3.7 × 10⁰ transformants μg⁻¹ plasmid DNA were obtained for pIMP1 and pSY6catP, respectively, indicating an inability of pSY6catP to transform C. pasteurianum (FIG. 1). As the only difference between pIMP1 and pSY6catP is the presence of the intron machinery, we also attempted to transfer the ClosTron plasmid pMTL007C-E2, which expresses the same intron elements within a different, pMTL007-based vector backbone. Like pSY6catP, pMTL007C-E2 also yielded a poor electrotransformation efficiency of only 1.9 × 10¹ transformants μg⁻¹ plasmid DNA (FIG. 1). Since pMTL007C-E2 possesses a different replication origin from pSY6catP (repH and repL, respectively), we constructed a repL derivative of pMTL007C-E2, named pMTL007C-E6, in order to allow direct comparison of the two repL-based intron-containing vectors, pMTL007C-E6 and pSY6catP. Like pMTL007C-E2 and pSY6catP, pMTL007C-E6 suffered from the same electrotransformation inhibition, generating only 3.2 × 10⁰ transformants μg⁻¹ plasmid DNA, whereas the control vector, pMTL85141, gave 1.5 × 10⁴ transformants μg⁻¹ plasmid DNA (FIG. 1). Taken together, these outcomes demonstrate that shuttle vectors carrying the Ll.ltrB intron machinery are inhibitory to electrotransformation of C. pasteurianum regardless of the vector backbone and replication origin employed.

Inability to electrotransform Ll.ltrB-containing vectors is due to the presence of the ltrA gene

[0010] Since the Ll.ltrB intron and its cognate IEP gene, ltrA, are expressed from the same ptb promoter, we aimed to express the intron components independently in order to determine which element is responsible for electrotransformation inhibition. We constructed plasmids pLtrB and pLtrA, which individually express the intron and ltrA gene, respectively, with the use of the constitutive C. acetobutylicum ptb promoter. Upon transfer to C. pasteurianum, pLtrA hindered electrotransformation, giving an efficiency of 1.4 × 10¹ transformants μg⁻¹ plasmid DNA, whereas pLtrB generated 4.6 × 10³ transformants μg⁻¹ plasmid DNA (FIG. 1), indicating that electrotransformation inhibition is specific to the ltrA region of pSY6catP. Since plasmid pLtrA expresses a functional LtrA IEP product possessing maturase, endonuclease, and reverse transcriptase activities, low electrotransformation efficiency could be the result of toxicity. To test this hypothesis, we constructed pDelPptb, in which the -35 and -10 signals of the ptb promoter within pSY6catP were deleted. The resulting plasmid should not produce intron RNA or a protein product corresponding to the IEP. Despite deletion of the promoter, pDelPptb failed to improve electrotransformation efficiency and generated only 2.1 × 10¹ transformants μg⁻¹ plasmid DNA (FIG. 1), implying that low electrotransformation efficiency was not associated with toxicity of the intron and IEP.

[0011] In light of these results, we speculated that a specific nucleotide sequence within the ltrA gene could be targeted by an uncharacterized RM system in C. pasteurianum. Plasmid pLtrA contains two E. coli Dcm (5'-CCWGG-3'; W = A or T) restriction recognition sites immediately downstream of the ltrA coding sequence, which are not found in the control vector, pIMP1. Since E. coli-C. pasteurianum shuttle vector preparations destined for C. pasteurianum electrotransformation are first methylated in a Dcm⁺ E. coli host strain (ER1821), all such plasmids would be methylated at both Dcm recognition sites (5'-CmCWGG-3'). Certain methylated Dcm sites have been shown to potently inhibit electrotransformation of C. thermocellum (Guss, et al., 2012) and C. ljungdahlii (Leang, et al., 2013). Therefore, we examined if Dcm methylation contained within the ltrA gene region was responsible for the decline in electrotransformation efficiency by constructing plasmid pDel2dcm, in which both Dcm recognition sites were mutated. Plasmid pDel2dcm failed to generate any detectable transformants (FIG. 1), indicating that Dcm methylation is not responsible for the reduced electrotransformation efficiency of pSY6catP.
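By way of illustration only, Dcm recognition sites of the kind discussed above can be located computationally. The following Python sketch scans a sequence for 5'-CCWGG-3' (W = A or T). The function name find_dcm_sites and the toy sequence are hypothetical and are not taken from pLtrA or any other vector described herein. Because 5'-CCWGG-3' is its own reverse complement, scanning a single strand is sufficient.

```python
import re

# Dcm recognition site: 5'-CCWGG-3', where W = A or T.
# A lookahead is used so that overlapping sites would also be reported.
DCM_SITE = re.compile(r"(?=CC[AT]GG)")


def find_dcm_sites(seq):
    """Return the 0-based start position of every 5'-CCWGG-3' site in seq."""
    return [m.start() for m in DCM_SITE.finditer(seq.upper())]


# Hypothetical toy sequence containing one CCAGG and one CCTGG site.
toy = "ATGCCAGGTTTACCTGGA"
print(find_dcm_sites(toy))  # [3, 12]
```

A vector sequence returning an empty list would, under these assumptions, carry no Dcm sites on either strand.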

C. pasteurianum restricts a 334 bp region of the ltrA gene region and restriction can be overcome by extensive codon modification

[0012] In a first attempt to identify any unknown restriction recognition sequence within plasmid pLtrA, three constructs were prepared by replacing various-sized restriction fragments in the ltrA gene and its downstream region with a 48 bp stuffer fragment. The sizes of these restriction fragments were 1,661 bp (pMB), 1,434 bp (pNS), and 688 bp (pNM) (FIG. 1). Upon electrotransformation of the three resulting vectors, only pMB gave an improved electrotransformation efficiency (8.8 × 10³ transformants μg⁻¹ plasmid DNA), whereas pNS and pNM yielded efficiencies of 2.8 × 10¹ and 4.2 × 10¹ transformants μg⁻¹ plasmid DNA, respectively (FIG. 1). This result indicates that the putative restriction endonuclease recognition sequence is contained within a 932 bp SacII-BstAPI restriction fragment of pSY6catP (corresponding to 493 bp of the ltrA coding region and 439 bp downstream of the ltrA gene).

[0013] To reduce the size of the putative vector region responsible for electrotransformation inhibition, we constructed another three vectors in which a 1,332 bp BglII-EcoO109I restriction fragment was replaced with one of the three different regions in the 932 bp SacII-BstAPI restriction fragment. Of the initial 932 bp region, plasmids pFrag1, pFrag2, and pFrag3 possess fragments of sizes 339 bp, 338 bp, and 305 bp, respectively (FIG. 2). Approximately 20-30 bp overlap was contained between PCR fragments to ensure the unknown restriction recognition site would be represented in its entirety. Of these three constructs, only pFrag1 showed a significant reduction in electrotransformation efficiency (7.1 × 10¹ transformants μg⁻¹ plasmid DNA) compared to the control vector (pIMP1; 3.7 × 10⁴ transformants μg⁻¹ plasmid DNA), whereas pFrag2 and pFrag3 yielded efficiencies of 3.5 × 10⁴ and 9.7 × 10³ transformants μg⁻¹ plasmid DNA, respectively (FIG. 2). This result reduced the inhibitory region of pSY6catP from 932 bp to 339 bp. To further verify that the 339 bp region of the ltrA coding sequence is responsible for inhibition of electrotransformation of C. pasteurianum, we aimed to insert the detrimental sequence into a control vector that is able to electrotransform C. pasteurianum at a high efficiency. Thus, we constructed plasmid pSY334 by subcloning 334 bp of the 339 bp inhibitory fragment from pSY6catP into pMTL85141 using SacII and AatII restriction sites. Plasmid pMTL85141 consistently electrotransforms C. pasteurianum with efficiencies on the order of 10³-10⁴ transformants μg⁻¹ plasmid DNA. As expected, pSY334 failed to generate any transformants upon several electrotransformation attempts (FIG. 2), suggesting potent activity of the uncharacterized RM system in C. pasteurianum.

[0014] To overcome the uncharacterized restriction barrier, we attempted to mutate the unknown restriction recognition sequence in the 334 bp inhibitory region of pSY6catP. Since this region is contained within the ltrA gene sequence, our efforts were limited to silent mutations, which would conserve the amino acid sequence and yield a functional ltrA gene product. Consequently, we silently mutated 98 bp spanning 83 codons within the 334 bp ltrA coding sequence, generating plasmid pMut98, with the greatest number of consecutive nucleotides left unaltered following mutagenesis being 5 bp. Since most RM recognition sequences are greater than or equal to 6 bp, we envisioned that the unknown restriction recognition site would be mutated within pMut98. As expected, pMut98 electrotransformed C. pasteurianum with an efficiency of 3.2 × 10³ transformants μg⁻¹ plasmid DNA, an increase of approximately three orders of magnitude compared to unmodified pSY6catP (FIG. 2). We then subcloned the mutated 334 bp ltrA coding region into plasmids pMTL007C-E2 and pMTL007C-E6, yielding pMTLCP-E2 and pMTLCP-E6, respectively. Similar to pMut98, codon modification led to significantly increased electrotransformation efficiencies for both pMTLCP-E2 (1.0 × 10³ transformants μg⁻¹ plasmid DNA) and pMTLCP-E6 (2.7 × 10³ transformants μg⁻¹ plasmid DNA), compared to 1.9 × 10¹ transformants μg⁻¹ plasmid DNA for pMTL007C-E2 and 3.2 × 10⁰ transformants μg⁻¹ plasmid DNA for pMTL007C-E6 (FIG. 1).
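The two design constraints described above (mutations must be silent, and no long stretch of nucleotides may remain unaltered) lend themselves to a simple computational check. The following Python sketch is offered as an illustration under stated assumptions: the function names translate and longest_unaltered_run and the 12 nt toy sequences are hypothetical and do not correspond to the actual ltrA or pMut98 sequences. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame DNA sequence; trailing partial codons are ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def longest_unaltered_run(original, mutated):
    """Length of the longest stretch of consecutive identical positions."""
    longest = run = 0
    for a, b in zip(original.upper(), mutated.upper()):
        run = run + 1 if a == b else 0
        longest = max(longest, run)
    return longest


# Hypothetical toy sequences (both encode Met-Lys-Gly-Leu).
original = "ATGAAAGGTCTG"
mutated = "ATGAAGGGCTTA"

print(translate(original) == translate(mutated))  # True: all changes are silent
print(longest_unaltered_run(original, mutated))   # 5
```

In this toy case the longest unaltered run is 5 nt, so any recognition sequence of 6 bp or longer spanning the region would be disrupted, which mirrors the reasoning applied to pMut98.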

Methylome analysis unveils a unique RM system in C. pasteurianum (CpaAII) that restricts a single 5'-AAGNNNNNCTCC-3' site within pSY6catP

[0015] Using single molecule real-time (SMRT) sequencing data, it is possible to probe an organism's unique DNA methylation profile, i.e. methylome. To determine a putative recognition sequence of the uncharacterized RM system, we utilized existing C. pasteurianum SMRT genome sequencing data generated using the RS II analyzer (Pyne, et al., 2014b). Due to current technical constraints in SMRT methylome analysis, only m6A residues could be detected, as identification of m5C residues requires more extensive sequence coverage. Surprisingly, methylome analysis revealed a previously unidentified methylation motif possessing a putative recognition sequence of 5'-AAGNNNNNCTCC-3', which we have designated CpaAII. A single CpaAII recognition site exists within the 334 bp of the ltrA gene shown to inhibit electrotransformation of C. pasteurianum. To further verify the CpaAII recognition sequence, we constructed plasmid pDelCpaAII, in which the unique CpaAII recognition site was mutated by introducing only three silent point mutations within the coding sequence of ltrA. Similar to pMut98, with 98 silent mutations and a high electrotransformation efficiency of 3.2 × 10³ transformants μg⁻¹ plasmid DNA, pDelCpaAII generated a 432-fold increase in electrotransformation efficiency (1.6 × 10³ transformants μg⁻¹ plasmid DNA) compared to unmodified pSY6catP (3.7 × 10⁰ transformants μg⁻¹ plasmid DNA). Conversely, we aimed to create a unique CpaAII recognition site by introducing only two point mutations within pMTL85141, which does not contain any CpaAII recognition sites and electrotransforms C. pasteurianum efficiently. The resulting plasmid, pCpaAII, failed to electrotransform C. pasteurianum, further verifying the existence of the new RM system.
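For illustration, a vector sequence can be screened for the proposed CpaAII motif on both strands, and the reported fold increase can be reproduced from the two efficiencies stated above. The following Python sketch is a minimal example under stated assumptions: the function name find_cpaaii_sites and the 16 nt toy sequence are hypothetical, and the motif is the putative 5'-AAGNNNNNCTCC-3' sequence only. Because the motif is asymmetric, its reverse complement (5'-GGAGNNNNNCTT-3') is searched separately.

```python
import re

# Putative CpaAII motif and its reverse complement (N = any base).
MOTIF_FWD = re.compile(r"(?=AAG[ACGT]{5}CTCC)")
MOTIF_REV = re.compile(r"(?=GGAG[ACGT]{5}CTT)")


def find_cpaaii_sites(seq):
    """Return sorted (0-based start, strand) tuples for each motif occurrence."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in MOTIF_FWD.finditer(seq)]
    hits += [(m.start(), "-") for m in MOTIF_REV.finditer(seq)]
    return sorted(hits)


# Hypothetical toy sequence carrying one site on the top strand.
toy = "TTAAGACGTACTCCAA"
print(find_cpaaii_sites(toy))  # [(2, '+')]

# Fold increase of pDelCpaAII over unmodified pSY6catP, from the stated values.
fold_increase = 1.6e3 / 3.7e0
print(round(fold_increase))  # 432
```

A vector for which this function returns an empty list would, on the assumption that the predicted motif is complete, be expected to escape CpaAII restriction.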

[0016] We analyzed all other E. coli-C. pasteurianum shuttle vectors utilized in this work and only pMTL007C-E2 and pMTL007C-E6, both harboring the lactococcal group II intron machinery, were found to possess a CpaAII site. The CpaAII site within these vectors is identical to that in pSY6catP, as both are located within the ltrA gene. Unexpectedly, both pMTL007C-E2 and pMTL007C-E6 were found to possess an additional CpaAII site, which was reconfirmed by Sanger sequencing, within the erythromycin (ermB) retrotransposition-activated-marker (RAM) region. To assess the effect of the additional CpaAII site on electrotransformation, we used plasmids pMTLCP-E2 and pMTLCP-E6, in which the detrimental CpaAII site within the ltrA coding sequence is mutated, yet the additional site within the ermB RAM is left unaltered, to electrotransform C. pasteurianum. As reported above, pMTLCP-E2 and pMTLCP-E6 generated relatively high electrotransformation efficiencies of 1.9 × 10³ transformants μg⁻¹ plasmid DNA and 2.7 × 10³ transformants μg⁻¹ plasmid DNA, respectively, suggesting that the additional CpaAII site is not subject to restriction by C. pasteurianum.

[0017] Vector methylation provided an alternative strategy to circumvent DNA digestion due to CpaAII. M.CpaAII is an N6 adenine-specific DNA methylase whose methylation activity protects against digestion by CpaAII. S.CpaAII is a protein encoding the specificity domain for M.CpaAII. M.CpaAII is encoded by locus tag CP6013_1663 (SEQ ID NO: 35) and S.CpaAII is encoded by locus tag CP6013_1664 (SEQ ID NO: 37) within contig 5 of the whole genome shotgun sequence of C. pasteurianum (GenBank Accession No. JPGY01000008.1 and Table 3). We cloned M.CpaAII and S.CpaAII into the methylation vector pFnuDIIMKn to yield the vector, pFnuDIIMKn-CpaAIIMS. Plasmids grown in E. coli strain ER1821 [pFnuDIIMKn-CpaAIIMS] harbouring this vector were simultaneously protected from digestion by both CpaAI and CpaAII restriction enzymes present within C. pasteurianum.

Generation of an intron-mediated gene disruption mutant of C. pasteurianum

[0018] Since plasmid pMut98 afforded a substantial improvement in electrotransformation efficiency compared to pSY6catP, we used it as a C. pasteurianum gene disruption vector. In addition, we replaced the C. acetobutylicum ptb promoter controlling intron transcription with the promoter from the C. pasteurianum thiolase gene. The resulting vector, pSYCP, was then used to target nucleotide position 176 within the antisense strand (176a) of the putative cpaAIR gene (corresponding to locus tag CP6013_2592), which was identified within the draft genome sequence of C. pasteurianum ATCC 6013 (GenBank accession: JPGY01000009.2; Pyne, et al., 2014b) and encodes the CpaAI Type II restriction endonuclease. Consistent with restriction analyses (Pyne, 2013; Richards, et al., 1988), REBASE (Roberts, et al., 2010) predicts a recognition sequence of 5'-CGCG-3' for the putative cpaAIR gene product. We selected the cpaAIR gene for gene disruption since the predicted 176a insertion site generated a high predicted insertional score (7.3), gene disruption is unlikely to be lethal, and the resulting mutant should prove useful for future genetic and metabolic engineering applications by abolishing the requirement for methylation of plasmid DNA prior to electrotransformation. The retargeted pSYCP-cpaAIR plasmid was electroporated into C. pasteurianum and transformants were selected using thiamphenicol. Transformant colonies were first screened for insertion of the intron within the cpaAIR coding sequence, resulting in an insertion of 915 bp, using two gene-specific primers flanking the predicted 176a intron insertion site (FIG. 3A). Of 28 screened colonies, all possessed the wild-type PCR product without the intron insertion. However, both gene-intron junctions could be detected in several pSYCP-cpaAIR transformant colonies using one gene-specific and one intron-specific primer (data not shown), signifying that successful intron insertion had occurred. The resulting mosaic colonies were presumed to be composed of a mixture of wild-type and intron-disrupted mutant cells, potentially due to low retrohoming efficiency (Jiri Perutka, personal communication). To enrich and isolate the mutant cells, mosaic colonies were subcultured in liquid 2×YTG medium containing thiamphenicol every 12 hours for a total of 5 days, or 10 transfers, followed by rescreening of the resulting colonies for the intron insertion using two gene-specific primers. Of seven colonies screened, four possessed the desired intron insertion (FIG. 3B). Finally, one such positive colony was selected and used for additional PCR verification by amplifying both gene-intron junctions to ensure proper intron insertion and orientation (FIG. 3C).

[0019] In another attempt at gene disruption, we constructed plasmids pMTLCP-E2-cpaAIR and pMTLCP-E6-cpaAIR, which possess an erythromycin RAM allowing direct selection of cells containing an intron insertion. Thiamphenicol-resistant transformants harboring pMTLCP-E2-cpaAIR or pMTLCP-E6-cpaAIR were restreaked onto erythromycin-containing 2×YTG agar plates to select for intron disruption mutants. Upon restreaking 24 transformant colonies, no erythromycin-resistant colonies were able to grow, signifying unsuccessful insertion of the RAM-containing intron into the cpaAIR gene of C. pasteurianum. Therefore, we recommend plasmid pSYCP, containing a markerless Ll.ltrB intron, with our developed enrichment method for performing chromosomal gene disruptions in C. pasteurianum.

Curing of intron donor plasmid pSYCP-cpaAIR

[0020] Prior to phenotypic characterization of the resulting ΔcpaAIR mutant, we attempted to cure the pSYCP-cpaAIR intron donor plasmid. Based on previous reports involving plasmid curing in Clostridium, we employed a method based on repeated subculturing in non-selective growth medium (Cui, et al., 2012; Li, et al., 2012). A single ΔcpaAIR mutant colony was grown in liquid 2×YTG medium and the seed culture was used to inoculate fresh growth medium every 12 hours for a total of three or seven successive transfers. The resulting cells were serially diluted and plated onto non-selective 2×YTG agar. Thirty-six colonies each from the three-transfer and seven-transfer methods were picked and restreaked onto both non-selective and thiamphenicol-containing 2×YTG agar plates. A total of 18 and 33 thiamphenicol-sensitive colonies, corresponding to curing efficiencies of 50% and 92%, were identified in the three- and seven-transfer approaches, respectively. Four thiamphenicol-sensitive colonies were subjected to further confirmation of plasmid curing based on their sensitivity to thiamphenicol and their inability to generate a plasmid-borne PCR product upon conducting colony PCR using primers frag3.BglII.S (SEQ ID NO: 17) + frag3.EcoO109I.AS (SEQ ID NO: 18) (data not shown).
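The curing efficiencies stated above follow directly from the colony counts. As a brief worked illustration in Python (the function name curing_efficiency is hypothetical; the counts are those reported in this paragraph):

```python
def curing_efficiency(sensitive_colonies, screened_colonies):
    """Percentage of screened colonies that have lost the plasmid."""
    return 100 * sensitive_colonies / screened_colonies


# 36 colonies were screened for each transfer regime.
three_transfer = curing_efficiency(18, 36)
seven_transfer = curing_efficiency(33, 36)

print(round(three_transfer))  # 50
print(round(seven_transfer))  # 92
```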

Characterization of the C. pasteurianum ΔcpaAIR gene disruption mutant

[0021] The resulting cpaAIR gene disruption mutant should not produce a functional CpaAI Type II restriction endonuclease and, therefore, should be efficiently transformed with plasmid DNA lacking methylation by the FnuDIIM methyltransferase (5'-m5CGCG-3') (Pyne, et al., 2013). To test this phenotype, we used the plasmid-cured ΔcpaAIR mutant strain to assess its capacity to be electrotransformed with both M.FnuDII-unmethylated and M.FnuDII-methylated plasmid substrates (FIG. 4). We first verified that M.FnuDII-unmethylated plasmid pMTL85141ermB, a dual erythromycin- and thiamphenicol-selectable derivative of pMTL85141 (Pyne, et al., 2013), fails to electrotransform wild-type C. pasteurianum (FIG. 4, top left panel) due to CpaAI restriction, whereas M.FnuDII-methylated pMTL85141ermB electrotransforms efficiently (FIG. 4, top right panel; electrotransformation efficiency of 1.1 × 10⁴ transformants μg⁻¹ plasmid DNA). We next attempted to electrotransform M.FnuDII-unmethylated plasmid pMTL85141ermB into ΔcpaAIR mutant cells and, as expected, pMTL85141ermB (FIG. 4, bottom left panel; electrotransformation efficiency of 9.6 × 10² transformants μg⁻¹ plasmid DNA) electrotransformed at a level comparable to M.FnuDII-methylated pMTL85141ermB (FIG. 4, bottom right panel; electrotransformation efficiency of 2.3 × 10³ transformants μg⁻¹ plasmid DNA). Successful electrotransformation of methylation-deficient plasmid DNA indicates inactivation of the cpaAIR Type II endonuclease in the ΔcpaAIR mutant strain. Note that electrotransformation efficiency of ΔcpaAIR cells was more than an order of magnitude lower than that of wild-type C. pasteurianum, which we have observed in two independent electrotransformation experiments. We are uncertain of the reason for the reduced electrotransformation efficiency of the ΔcpaAIR mutant. However, electrotransformation efficiency of the ΔcpaAIR mutant strain is still comparable to many other clostridial procedures, generating 10²-10³ transformants μg⁻¹ plasmid DNA.

[0022] In summary, we show that the ability to perform group-II-intron-mediated chromosomal gene disruptions in C. pasteurianum is limited by host restriction and low-level intron retrohoming efficiency, and overcoming these obstacles leads to successful derivation of C. pasteurianum mutant strains.

[0023] By assessing transformability of several plasmid deletion derivatives of pSY6catP (Figures 1 and 2) and probing the methylome of C. pasteurianum using SMRT genome sequencing data, we have concluded that the inability of intron-containing vectors for chromosomal gene disruption to efficiently electrotransform C. pasteurianum is due to restriction by a new RM system, designated as CpaAII. Based on methylation motifs predicted from methylome analysis, we propose a recognition sequence of 5'-AAGNNNNNCTCC-3' for the CpaAII RM system. The proposed CpaAII recognition sequence is indicative of Type I RM systems, which are widespread in bacteria and recognize nucleotide sequences composed of 4-8 degenerate (N) residues flanked by short (2-5 bp) defined sequences (Roberts, et al., 2010). Further, all Type I RM systems characterized to date encode m6A-specific methyltransferases (Roberts, et al., 2003), as we have proposed for CpaAII. Finally, a single Type I RM system is annotated in the draft genome sequence of C. pasteurianum ATCC 6013 (Pyne, et al., 2014b) and encodes the three host specificity domains, corresponding to restriction (hsdR; locus tag CP6013_1662), modification (hsdM; locus tag CP6013_1663), and specificity (hsdS; locus tag CP6013_1664), that are typical characteristics of Type I RM systems (Roberts, et al., 2003). The hsdM domain encodes a putative N6 adenine-specific DNA methylase, which we refer to as M.CpaAII (SEQ ID NO: 36). DNA methylation by M.CpaAII working together with the hsdS specificity domain, which we refer to as S.CpaAII (SEQ ID NO: 38), provides a generally applicable strategy to protect vectors against digestion by CpaAII, in cases where the introduction of point mutations to disrupt the recognition sequence of CpaAII may be inconvenient or impractical due to, e.g., the number of CpaAII sites present in a large DNA fragment or a change in an amino acid sequence caused by the point mutations.

[0024] All vector derivatives with a significantly reduced ability to electrotransform C. pasteurianum in this study were found to possess at least one CpaAII recognition sequence. In the case of pSY6catP, mutation of the single CpaAII recognition site within the ltrA gene sequence restored electrotransformation to a level comparable to control vectors lacking lactococcal group II intron elements. Conversely, only one of the two CpaAII restriction recognition sequences within ClosTron plasmids pMTL007C-E2 and pMTL007C-E6 was found to negatively affect electrotransformation efficiency of C. pasteurianum. A possible explanation for this outcome is that the actual CpaAII recognition sequence may differ slightly from the proposed motif predicted by methylome analysis in this study. It is likely that the actual CpaAII recognition sequence is more stringent than the predicted 5'-AAGNNNNNCTCC-3' motif, as only one of the two CpaAII recognition sequences within pMTL007C-E2 and pMTL007C-E6 was subject to restriction. For example, the recognition sequence of the Type I RM system from C. perfringens ATCC 13124, 5' -C AC T AAA-3' (R = A or G), is similar in structure to that of the predicted CpaAII enzyme (5'-AAGNNNNNCTCC-3'), yet contains a partially-degenerate 5 bp element at the 3' end, rather than a defined 4 bp element in the case of CpaAII. Increasing the coverage of SMRT sequencing data (our ongoing approach) could potentially resolve this apparent discrepancy.

[0025] Having overcome host restriction of shuttle vectors harboring the lactococcal ltrA gene, we used pMut98, which contains no CpaAII recognition sites, as the basis for deriving our chromosomal gene disruption vector. However, retrohoming efficiency of the intron proved to be too low to isolate a homogeneous gene disruption mutant directly from pSYCP-cpaAIR transformant colonies, as most colonies were found to represent a heterogeneous mixture of wild-type and mutant cells. Mixed-genotype transformant colonies have been described in other intron-based gene disruption studies. This outcome is not specific to the cpaAIR(176a) insertion site employed in this study, as we have observed mixed-genotype colonies using predicted target sites of three other chromosomal genes in C. pasteurianum (data not shown). As such, retrohoming efficiency of the Ll.ltrB intron appears to be lower in C. pasteurianum compared to other solventogenic Clostridia (Heap, et al., 2010; Heap, et al., 2007; Shao, et al., 2007), as gene disruption mutants are typically identified and isolated following transfer of the intron donor plasmid. The use of a selectable RAM inserted into domain IV of the intron (Heap, et al., 2010; Heap, et al., 2007) within vectors pMTL007C-E2/pMTLCP-E2 and pMTL007C-E6/pMTLCP-E6 was not able to enhance selection and isolation of true intron disruption colonies, presumably due to a further decrease in retrohoming efficiency resulting from the presence of cargo DNA within the intron sequence (Heap, et al., 2010). Thus, to isolate a homogeneous gene disruption colony, we employed an enrichment procedure by subculturing a heterogeneous transformant colony in selective liquid medium to promote intron insertion. Following our enrichment protocol, approximately half of the screened colonies possessed the desired intron insertion (FIG. 3), suggesting its effectiveness in increasing retrohoming efficiency of the Ll.ltrB intron in C. pasteurianum.

[0026] Plasmid curing has proven to be an essential and often laborious aspect of clostridial strain engineering efforts. Several clostridial host-vector systems have been shown to produce plasmids that exhibit strong segregational stability in the absence of selection. This has led to the development of strategies involving induction of plasmid instability through the use of negative-selectable markers and antisense RNA targeted to the plasmid replication protein. Here, we have shown that efficient plasmid curing in C. pasteurianum does not require artificial induction methods, as plasmids based on the common repL replication origin can be efficiently cured by subculturing cells in non-selective growth medium for a total of only three successive transfers (approximately two days). This outcome contrasts with other reported clostridial plasmid curing procedures, which often require up to seven successive transfers to cure replicative intron donor plasmids (Cui, et al., 2012).

[0027] The cpaAIR mutant strain developed in this study harbors neither plasmids nor antibiotic resistance markers, as it was constructed without the use of a RAM-containing intron, and should prove advantageous for future genetic and metabolic engineering efforts by abolishing the requirement for M.FnuDII methylation of shuttle vectors prior to electrotransformation. Note that shuttle vectors transforming ΔcpaAIR mutant cells still require CpaI methylation, which can be readily performed using Dam⁺ E. coli strains for plasmid propagation. Analogous restriction-negative mutants have also been produced in C. acetobutylicum (Dong, et al., 2010) and C. cellulolyticum (Cui, et al., 2012) through disruption of the genes encoding the Cac824I and CceI Type II RM systems, respectively. Our gene disruption system should prove to be broadly applicable to any non-essential gene within the genome of C. pasteurianum, provided that a viable intron insertion site can be identified. Since metabolic engineering approaches often involve disruption of multiple genes and metabolic pathways, strains with multiple markerless intron insertions can be envisioned using our strategy by employing iterative rounds of intron retargeting, electrotransformation, enrichment, and plasmid curing. In fact, a recent report has detailed intron-mediated disruption of up to five genes in C. acetobutylicum (Jang, et al., 2014). Since the ΔcpaAIR mutant constructed in this report is the first C. pasteurianum mutant strain obtained using group II intron technology, we recommend using the cpaAIR(176a) target site and plasmid pSYCP-cpaAIR as a control for future intron-mediated gene disruption studies. The cpaAIR(176a) insertion site generated a predicted insertional score of 7.3 using a Ll.ltrB insertion site prediction algorithm (TargeTronics, LLC; Austin, TX) and, therefore, we recommend selecting sites of equal or greater score for disruption of genes in C. pasteurianum using the Ll.ltrB group II intron. It is expected that the intron donor plasmid and associated gene disruption methodologies detailed herein will add to the expanding genetic toolkit available for C. pasteurianum and lead to rewarding metabolic engineering efforts involving this important biotechnological bacterium.

EXAMPLES

[0028] The following examples are provided by way of illustration and not by limitation.

Example 1

Bacterial strains, plasmids, and oligonucleotides

[0029] Bacterial strains and plasmids employed in this work are listed in Table 1 and oligonucleotide sequences are given in Table 2. E. coli DH5α was utilized for vector construction and cloning purposes and for methylation of E. coli-C. pasteurianum shuttle vectors destined for C. pasteurianum (Pyne, et al., 2013). Vectors pIMP1 (Mermelstein, et al., 1992) and pSY6 (Shao, et al., 2007) were kindly provided by Professor Terry Papoutsakis (University of Delaware; Newark, DE) and Professor Sheng Yang (Chinese Academy of Sciences; Shanghai, China). Plasmid pMTL85141 (Heap, et al., 2009) and the ClosTron vector, pMTL007C-E2 (Heap, et al., 2010), were kindly shared by Professor Nigel Minton (University of Nottingham; Nottingham, UK). Oligonucleotides and gBlocks were synthesized by Integrated DNA Technologies (IDT; Coralville, IA). Oligonucleotides were prepared at the 25 nmol scale using standard desalting. Custom gene synthesis was performed by Bio Basic Inc. (Markham, ON).

Example 2

Growth and maintenance conditions

[0030] E. coli strains were cultivated aerobically at 37 °C in lysogeny broth (LB) and recombinant derivatives were selected, when necessary, with ampicillin (100 μg ml⁻¹), chloramphenicol (30 μg ml⁻¹), or kanamycin (30 μg ml⁻¹). Antibiotic levels were reduced by half for selection of E. coli strains harboring two vectors. C. pasteurianum strains were grown anaerobically at 37 °C in 2×YTG medium (16 g l⁻¹ tryptone, 10 g l⁻¹ yeast extract, 5 g l⁻¹ glucose, and 4 g l⁻¹ sodium chloride, pH 6.3) within an anaerobic containment chamber (Plas-Labs; Lansing, MI) containing an atmosphere of 5% CO₂, 10% H₂, and 85% N₂. Strict anaerobic conditions were maintained and monitored through the use of a palladium catalyst fixed to the heater of the chamber, removal of oxygen from growth medium via autoclaving, and addition of resazurin (1 mg l⁻¹) to all solid and liquid media preparations. Recombinant C. pasteurianum strains were selected, when necessary, with 10 μg ml⁻¹ thiamphenicol or 25 μg ml⁻¹ erythromycin. Recombinant E. coli and C. pasteurianum were stored frozen in 15% glycerol at -80 °C (both species) or as sporulated colonies on solidified 2×YTG agar plates (C. pasteurianum).
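As a convenience, the 2×YTG recipe above can be scaled to an arbitrary batch volume. The following Python sketch is illustrative only: the dictionary YTG_2X_G_PER_L, the function name ytg_batch, and the 250 ml batch size are hypothetical choices, and only the per-litre concentrations are taken from the paragraph above. Resazurin (1 mg l⁻¹) and pH adjustment to 6.3 are not covered by the sketch.

```python
# Component concentrations of 2xYTG medium in grams per litre (from the text).
YTG_2X_G_PER_L = {
    "tryptone": 16.0,
    "yeast extract": 10.0,
    "glucose": 5.0,
    "sodium chloride": 4.0,
}


def ytg_batch(volume_ml):
    """Return grams of each component needed for a batch of volume_ml."""
    return {name: conc * volume_ml / 1000 for name, conc in YTG_2X_G_PER_L.items()}


# Example: a hypothetical 250 ml batch.
for component, grams in ytg_batch(250).items():
    print(f"{component}: {grams:.2f} g")
```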

Example 3

DNA isolation, manipulation, and electrotransformation

[0031] Plasmid DNA was extracted from E. coli and purified using an EZ-10 Spin Column Plasmid DNA Miniprep Kit (Bio Basic, Inc.; Markham, ON). Intact, high molecular weight C. pasteurianum genomic DNA was extracted from a 60 ml culture (OD600 0.5-0.7) by first washing cells in 40 ml of a buffer containing 25 mM potassium phosphate, pH 7.0, and 6 mM MgSO₄, followed by resuspension in 15 ml of the same buffer supplemented with 50% sucrose and 200 μg/ml lysozyme (Kell, et al., 1981; Wilkinson, et al., 1993). After anaerobic incubation at 37 °C for 45 minutes, genomic DNA was extracted from 1.0-5.0 ml samples of protoplast suspension using a DNeasy Blood and Tissue Kit from Qiagen (Valencia, CA). Due to the high nuclease content of Clostridia, hypertonic sucrose (50% w/v) was added to buffer ATL during cell lysis (Wilkinson and Young, 1993). Eluted genomic DNA was treated with 100 μg/ml RNase A prior to additional purification using a Genomic DNA Clean & Concentrator kit from Zymo Research (Irvine, CA).

[0032] DNA restriction fragments and PCR products were purified directly or from agarose gels using an EZ-10 Spin Column DNA Gel Kit (Bio Basic, Inc.; Markham, ON). Vector construction was carried out according to standard procedures (Sambrook, et al., 1989). Restriction enzymes, Standard Taq DNA Polymerase, Phusion High-Fidelity DNA Polymerase, and Quick Ligation Kit were purchased from New England Biolabs (Whitby, ON). All commercial enzymes and kits were used according to the manufacturer's instructions. Electrotransformation of C. pasteurianum was performed as previously described (Pyne, et al., 2013).

Example 4

Vector construction

[0033] Plasmid pMTL007C-E6 was constructed from pMTL007C-E2 by ligation of an 878 bp AscI + FseI digestion fragment of pMTL85141 containing the repL replication module with a 7,300 bp product of pMTL007C-E2 resulting from digestion with the same restriction enzymes.

[0034] pLtrB was constructed by digesting pSY6catP with NheI + BstAPI, extracting the resulting 6,131 bp fragment, and ligating it with complementary oligonucleotides ltrB.NheI.S (SEQ ID NO: 1) + ltrB.BstAPI.AS (SEQ ID NO: 2) that had been annealed to generate compatible NheI and BstAPI restriction ends. Complementary oligonucleotides were mixed in equimolar amounts, heated to 95 °C in a water bath, and allowed to anneal as the bath cooled after its power source was disconnected. For construction of pLtrA, the entire 3,426 bp Ll.ltrB-ltrA intron region was removed from pSY6catP using XhoI + BstAPI digestion and replaced with a 2,398 bp PCR product containing only the ltrA coding sequence, generated using primers ltrA.XhoI.S (SEQ ID NO: 3) + ltrA.BstAPI.AS (SEQ ID NO: 4). The PCR product was digested with XhoI + BstAPI and ligated with the 5,072 bp pSY6catP vector backbone to place ltrA under transcriptional control of the ptb promoter. Plasmid pDelPptb was derived from pSY6catP by replacing the 131 bp SmaI + XhoI digestion product containing the C. acetobutylicum ptb promoter with a 56 bp stuffer fragment lacking -35 and -10 promoter signals, derived by annealing oligonucleotides del-Pptb.S (SEQ ID NO: 5) + del-Pptb.AS (SEQ ID NO: 6). Ligation-proficient SmaI and XhoI restriction ends were generated upon successful annealing of the complementary oligonucleotides. pDel2dcm was derived from pSY6catP by replacing a 924 bp SacII + BstAPI restriction fragment with the same 924 bp sequence in which the two Dcm sites were disrupted by two single-base-pair mutations. The Dcm deletion fragment was synthesized by Bio Basic, Inc. (Markham, ON), digested with SacII + BstAPI, and ligated into the corresponding sites of pSY6catP. The three restriction fragment deletion constructs, pMB, pNM, and pNS, were prepared by digesting pSY6catP with MfeI + BstAPI, NheI + MfeI, and NheI + SacII, respectively, and annealing the resulting vector backbones with the respective annealed oligonucleotide pairs, MfeI/BstAPI.S (SEQ ID NO: 7) + MfeI/BstAPI.AS (SEQ ID NO: 8) (pMB), NheI/MfeI.S (SEQ ID NO: 9) + NheI/MfeI.AS (SEQ ID NO: 10) (pNM), and NheI/SacII.S (SEQ ID NO: 11) + NheI/SacII.AS (SEQ ID NO: 12) (pNS).
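
The annealed-oligonucleotide inserts used above only ligate if the two strands pair over their shared core and leave single-stranded ends that match the digested vector. The sketch below shows one possible way to check an oligonucleotide pair for this; it is an illustration, not part of the described method. The sequences are invented placeholders (not SEQ ID NOs: 1 to 12), the helper names are hypothetical, and perfect base pairing over the duplex is assumed.

```python
from difflib import SequenceMatcher

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def annealed_ends(sense: str, antisense: str) -> dict:
    """Find the duplex formed by two oligos (both written 5'->3') and
    report the unpaired bases left at each end of each strand."""
    top = sense.upper()
    # Rewrite the antisense strand in the sense orientation so that the
    # paired region appears as a common substring of the two strings.
    bottom = reverse_complement(antisense)
    m = SequenceMatcher(None, top, bottom, autojunk=False).find_longest_match(
        0, len(top), 0, len(bottom)
    )
    return {
        "duplex": top[m.a:m.a + m.size],
        "sense_5p_overhang": top[:m.a],
        "sense_3p_overhang": top[m.a + m.size:],
        # Convert back to the antisense strand's own 5'->3' reading.
        "antisense_5p_overhang": reverse_complement(bottom[m.b + m.size:]),
        "antisense_3p_overhang": reverse_complement(bottom[:m.b]),
    }


# Placeholder pair (hypothetical): a made-up 20 nt core with a CTAG 5' overhang
# (NheI-type) on the sense strand and a TCGA 5' overhang (XhoI-type) on the
# antisense strand.
core = "GCATTGACCGTTAGGCATCA"
sense_oligo = "CTAG" + core
antisense_oligo = "TCGA" + reverse_complement(core)
ends = annealed_ends(sense_oligo, antisense_oligo)
print(ends)
```

Because the duplex is located as the longest shared stretch, the same check applies to pairs that leave a 5' overhang at one end and a 3' overhang at the other; the reported overhangs can then be compared by eye with the ends produced by the enzymes used on the vector.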

[0035] To construct pFrag1, pFrag2, and pFrag3, a 1,332 bp BglII + EcoO109I restriction fragment was removed from pSY6catP and replaced with a 589 bp (primers frag1.BglII.S (SEQ ID NO: 13) + frag1.EcoO109I.AS (SEQ ID NO: 14)), 363 bp (primers frag2.BglII.S (SEQ ID NO: 15) + frag2.EcoO109I.AS (SEQ ID NO: 16)), or 574 bp (primers frag3.BglII.S (SEQ ID NO: 17) + frag3.EcoO109I.AS (SEQ ID NO: 18)) PCR product, respectively, corresponding to various portions of the ltrA coding region of pSY6catP. To construct pSY334, a 334 bp SacII + AatII fragment of the ltrA coding sequence was subcloned into the corresponding sites of pMTL85141. To mutate the unknown restriction recognition sequence within the inhibitory 334 bp region of the ltrA coding sequence, a 655 bp gBlock was synthesized possessing 98 silent mutations in which 83 codons were altered. A 731 bp PCR product containing the 3' ltrA coding sequence and downstream region and possessing a 25 bp overlap with the mutated gBlock was amplified using primers 3'SOE.S (SEQ ID NO: 19) + SOE.EcoO109I.AS (SEQ ID NO: 20). The PCR product was run on a 1.0% agarose gel, and the band was stabbed with a micropipette tip and used as template, along with 5 ng of the purified gBlock, in a splicing by overlap extension (SOE) PCR. The reaction was cycled for 10 cycles before primers gBlock.BglII.S (SEQ ID NO: 21) + SOE.EcoO109I.AS (SEQ ID NO: 20) were added, and then cycled for 25 additional cycles. The resulting product was digested with BglII + EcoO109I and ligated with pSY6catP that had been digested with the same restriction endonucleases to generate pMut98. Plasmids pMTLCP-E2 and pMTLCP-E6 were constructed by subcloning a 1,427 bp MscI + AclI restriction fragment of pMut98 into the corresponding sites of pMTL007C-E2 and pMTL007C-E6, respectively.
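
The SOE step above joins two fragments through their shared 25 bp end, so the fused product is shorter than the sum of the two inputs by the length of the overlap. The following minimal sketch illustrates that arithmetic with random placeholder sequences of the stated lengths (655 bp and 731 bp). The function name is hypothetical, the sequences are not the actual gBlock or PCR product, and any bases added or removed by the outer primers are ignored.

```python
import random


def soe_fuse(upstream: str, downstream: str, overlap: int) -> str:
    """Join two fragments whose ends share `overlap` identical bases,
    as the overlap-extension step does; raise if the ends do not match."""
    if overlap <= 0 or upstream[-overlap:] != downstream[:overlap]:
        raise ValueError("fragments do not share the stated overlap")
    return upstream + downstream[overlap:]


# Random placeholder sequences with the fragment lengths given in the text.
rng = random.Random(0)
gblock = "".join(rng.choice("ACGT") for _ in range(655))
# Build a 731 nt downstream fragment that starts with the last 25 nt of the gBlock.
downstream = gblock[-25:] + "".join(rng.choice("ACGT") for _ in range(731 - 25))

fused = soe_fuse(gblock, downstream, overlap=25)
print(len(fused))  # 655 + 731 - 25 = 1361
```

Under these assumptions the fused template is 1,361 nt long before digestion with BglII + EcoO109I.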

[0036] To mutate the CpaAII recognition site of pSY6catP using three silent point mutations, a gBlock was synthesized in which codons 521 (AGU→UCU; Ser) and 523 (GCU→GCC; Ala) were mutated within the ltrA coding sequence. In a manner similar to pMut98, the mutated gBlock was fused with the same 731 bp 3' ltrA product using SOE PCR. The resulting product was digested with BglII + EcoO109I and ligated with BglII- and EcoO109I-digested pSY6catP to yield plasmid pDelCpaAII. Conversely, a unique CpaAII restriction recognition sequence was generated within pMTL85141 by first annealing complementary oligonucleotides CpaAII-anneal.S (SEQ ID NO: 22) + CpaAII-anneal.AS (SEQ ID NO: 23). The resulting annealed product, possessing cohesive XbaI + HindIII ends, was ligated into the corresponding sites of pMTL85141 to give plasmid pCpaAII. To construct the dual methylating plasmid pFnuDIIMKn-CpaAIIMS, we amplified M.CpaAII from C. pasteurianum genomic DNA using primers M.CpaAII.Fw (SEQ ID NO: 31) and M.CpaAII.Rv (SEQ ID NO: 32), which were synthesized as Ultramer oligonucleotides by Integrated DNA Technologies. M.CpaAII.Fw (SEQ ID NO: 31) adds a bacterial promoter to the M.CpaAII coding sequence, while M.CpaAII.Rv (SEQ ID NO: 32) adds a terminator sequence. The resulting blunt-ended PCR product was cloned into ScaI-digested plasmid pFnuDIIMKn, so that the M.CpaAII coding sequence was in the same orientation as the M.FnuDII coding sequence, to yield the plasmid pFnuDIIMKn-CpaAIIM. Similarly, we amplified S.CpaAII from C. pasteurianum genomic DNA using primers S.CpaAII.Fw (SEQ ID NO: 33) and S.CpaAII.Rv (SEQ ID NO: 34) and cloned the resulting PCR product into Eco53kI-digested pFnuDIIMKn-CpaAIIM, in the same orientation as the M.FnuDII coding sequence, to yield the final methylating plasmid pFnuDIIMKn-CpaAIIMS.
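
The codon changes described above for the CpaAII site (AGU to UCU at codon 521 and GCU to GCC at codon 523) can be checked for silence and counted with a few lines of code. The sketch below assumes the standard genetic code and treats each codon in isolation; the function names are hypothetical and nothing here is taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Translate one codon; RNA (U) and DNA (T) spellings are both accepted."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def point_changes(wild_type: str, mutant: str) -> int:
    """Count the positions at which two codons differ."""
    return sum(a != b for a, b in zip(wild_type.upper(), mutant.upper()))


# Codon number -> (wild-type codon, mutated codon), as stated in the text.
changes = {521: ("AGU", "UCU"), 523: ("GCU", "GCC")}

total_changes = 0
all_silent = True
for position, (wt, mut) in changes.items():
    n = point_changes(wt, mut)
    silent = translate_codon(wt) == translate_codon(mut)
    total_changes += n
    all_silent = all_silent and silent
    print(position, wt, "->", mut, translate_codon(wt), "->", translate_codon(mut), n)

print(total_changes, all_silent)
```

The two codon changes account for two and one base substitutions respectively, which matches the three silent point mutations stated in the paragraph.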

[0037] Ll.ltrB intron design was performed using the computer algorithm developed by TargeTronics, LLC (Austin, TX). The insertion site with the highest predicted score for insertion into the antisense strand was selected, corresponding to nucleotide position 176 (score of 7.3) of the CpaAI restriction endonuclease gene, cpaAIR. Plasmid pMut98 was used as the basis for a C. pasteurianum TargeTron gene disruption vector. For retargeting pMut98, a 572 bp gBlock fragment was synthesized possessing mutations in the IBS, EBS2, and EBS1d intron regions corresponding to position 176a of the cpaAIR gene. The retargeted gBlock was designed with a constitutive C. pasteurianum thiolase promoter controlling transcription of the intron and ltrA gene. The gBlock fragment was PCR-amplified using primers SYCP.gBlock.S (SEQ ID NO: 24) + SYCP.gBlock.AS (SEQ ID NO: 25), digested with BamHI + BsrGI, and ligated into the corresponding sites of pMut98 to generate pSYCP-cpaAIR. To retarget the ClosTron vectors pMTLCP-E2 and pMTLCP-E6 to the cpaAIR gene of C. pasteurianum, primers MTLCP.REN-HindIII.S (SEQ ID NO: 26) and SYCP.gBlock.AS (SEQ ID NO: 25) were used to amplify the gBlock targeted to the CpaAI endonuclease. The resulting 384 bp PCR product was digested with HindIII + BsrGI and ligated into the corresponding sites of pMTLCP-E2 and pMTLCP-E6 to give pMTLCP-E2-cpaAIR and pMTLCP-E6-cpaAIR, respectively.

Example 5

SMRT genome sequencing and methylome analysis

[0038] SMRT sequencing was performed on intact, purified genomic DNA from C. pasteurianum according to a previous report (Pyne, et al., 2014b). SMRT reads were assembled by the Biosciences Division at Oak Ridge National Laboratory (Oak Ridge, TN) and the resulting assembly was used as a reference genome for methylome analysis, which was carried out by Pacific Biosciences (Menlo Park, CA).

Example 6

Group-II-intron-mediated gene disruption, enrichment, and screening

[0039] To isolate a chromosomal gene disruption mutant, plasmid pSYCP-cpaAIR was first electrotransformed into C. pasteurianum and transformants were selected using 10 μg ml⁻¹ thiamphenicol. Mosaic colonies containing both wild-type and intron insertion cells were identified with two separate PCRs using primers ltrB.Fw (SEQ ID NO: 29) + REN.Fw (SEQ ID NO: 27) for one chromosome-intron junction and primers ltrB.Rv (SEQ ID NO: 30) + REN.Rv (SEQ ID NO: 28) for the adjacent junction. One positive colony was selected for enrichment of the intron disruption by repeated subculturing in selective growth medium. Briefly, a sporulated colony was heat-shocked in 10 ml of 2×YTG medium, cooled on ice, and supplemented with 10 μg ml⁻¹ thiamphenicol. Following approximately 24 h of growth, 0.5 ml was used to inoculate a tube of 10 ml 2×YTG containing 10 μg ml⁻¹ thiamphenicol. This process was repeated every 12 hours for a total of 10 transfers, at which time serial dilutions were plated onto nonselective 2×YTG agar. To identify a homogeneous gene disruption colony, colony PCR was performed on enrichment colonies using two gene-specific primers flanking the intron insertion site (REN.Fw (SEQ ID NO: 27) + REN.Rv (SEQ ID NO: 28)).
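
The enrichment schedule above can be put in rough numerical terms. The sketch below estimates the dilution per transfer and the minimum number of generations of selective growth it implies. It rests on assumptions the text does not state: that the 0.5 ml inoculum is added to a full 10 ml of medium (10.5 ml final volume), that each culture regrows to the density of the one before it, and that all 10 transfers use this same ratio. The function name is hypothetical.

```python
import math


def serial_transfer_summary(inoculum_ml: float, medium_ml: float, transfers: int) -> dict:
    """Estimate dilution and regrowth for a repeated subculturing scheme."""
    # Each transfer dilutes the cells by (final volume) / (inoculum volume).
    fold_per_transfer = (inoculum_ml + medium_ml) / inoculum_ml
    # Regrowing to the previous density takes at least log2(dilution) doublings.
    generations_per_transfer = math.log2(fold_per_transfer)
    return {
        "fold_per_transfer": fold_per_transfer,
        "cumulative_fold": fold_per_transfer ** transfers,
        "min_generations": generations_per_transfer * transfers,
    }


summary = serial_transfer_summary(inoculum_ml=0.5, medium_ml=10.0, transfers=10)
print(summary)
```

With these inputs each transfer is a 21-fold dilution, so ten transfers correspond to roughly 44 doublings under thiamphenicol selection. The figure is only a lower bound on growth and says nothing about how quickly intron-insertion cells displace wild-type cells.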

Example 7

Curing ΔcpaAIR cells of plasmid pSYCP-cpaAIR

[0040] To cure ΔcpaAIR disruption cells of the pSYCP-cpaAIR intron donor plasmid, a single ΔcpaAIR disruption colony was heat-shocked in 10 ml 2×YTG medium without selection. Once exponential-phase growth was observed, 0.5 ml was used to inoculate a new tube of 10 ml 2×YTG. This process was repeated every 12 hours for a total of three transfers, at which time serial dilutions were plated onto nonselective 2×YTG agar. Colonies were screened for the absence of plasmid pSYCP-cpaAIR by restreaking onto both non-selective and selective (10 μg ml⁻¹ thiamphenicol) 2×YTG agar plates. For thiamphenicol-sensitive colonies, plasmid loss was further confirmed by lack of colony PCR amplification using primers Frag3.BglII.S (SEQ ID NO: 17) + Frag3.EcoO109I.AS (SEQ ID NO: 18) and absence of growth in liquid 2×YTG medium containing 10 μg ml⁻¹ thiamphenicol.