Title:
ASSEMBLING SYNTHETIC DNA CONSTRUCTS FROM NATURAL DNA
Document Type and Number:
WIPO Patent Application WO/2023/215399
Kind Code:
A1
Abstract:
The disclosure provides for methods of constructing synthetic chromosomes including the steps of providing host cells with an endogenous chromosome, transforming a cloning vector and a cloning cassette into the host cells, excising target genomic nucleic acids from the endogenous chromosome, recombining the excised target genomic nucleic acids with the cloning cassette via homologous recombination to form heterologous vectors comprising cloned sequences, extracting the heterologous vectors containing the cloned sequences from the host cells, digesting the heterologous vectors with a restriction endonuclease to release the cloned sequences from the heterologous vectors to provide released cloned sequences, and introducing the released cloned sequences, a centromere cassette, and a yeast artificial chromosome (YAC) or bacterial artificial chromosome (BAC) into a second host cell, such that the released cloned sequences, the centromere cassette, and the YAC or BAC recombine with one another via homologous recombination to produce the synthetic chromosome.

Inventors:
CORADINI ALESSANDRO VENEGA (US)
EHRENREICH IAN (US)
Application Number:
PCT/US2023/020865
Publication Date:
November 09, 2023
Filing Date:
May 03, 2023
Assignee:
UNIV SOUTHERN CALIFORNIA (US)
International Classes:
C12N5/10; C12N9/22; C12N15/09; C12N15/66; C12N1/14; C12N1/21
Foreign References:
US5643763A, 1997-07-01
US7521240B2, 2009-04-21
US20200063164A1, 2020-02-27
Attorney, Agent or Firm:
JUDD, Paul K. (US)
Claims:
What is claimed is:

1. A method for constructing a synthetic chromosome comprising the steps of: providing host cells comprising an endogenous chromosome; transforming a cloning vector and a cloning cassette into the host cells; excising target genomic nucleic acids from the endogenous chromosome; recombining the excised target genomic nucleic acids with the cloning cassette via homologous recombination to form heterologous vectors comprising cloned sequences; extracting the heterologous vectors containing the cloned sequences from the host cells; digesting the heterologous vectors with a restriction endonuclease to release the cloned sequences from the heterologous vectors to provide released cloned sequences; and introducing the released cloned sequences, a centromere cassette, and a yeast artificial chromosomes or bacterial artificial chromosomes into a second host cell, wherein the released cloned sequences, the centromere cassette, and the yeast artificial chromosomes or bacterial artificial chromosomes recombine with one another via homologous recombination to produce the synthetic chromosome.

2. The method of claim 1, wherein the host cells express a CRISPR endonuclease protein; and wherein the transformation step further comprises transforming into the host cells guide ribonucleic acids (gRNAs) that are complementary to regions flanking the target genomic nucleic acid sequences of the endogenous chromosome, wherein the CRISPR endonuclease protein and the gRNAs excise the target genomic nucleic acid sequences from the endogenous chromosome.

3. The method of claim 1, wherein the cloning cassette comprises two separate linear segments of nucleic acid, wherein a first segment of nucleic acid comprises a first hook region and a second segment of nucleic acid comprises a second hook region, wherein the first hook region comprises formula A-B-C, wherein:

A comprises a unique restriction endonuclease site and first homology region comprising a nucleic acid sequence complementary to first region of the cloning vector; B comprises a first adapter region comprising a nucleic acid sequence complementary to another adapter region; and

C comprises a first complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; the second hook region comprises formula D-E-F, wherein:

D comprises a second complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; E comprises a second adapter region comprising a nucleic acid sequence complementary to another adapter region; and

F comprises a unique restriction endonuclease site and a second homology region comprising a nucleic acid sequence complementary to second region of the cloning vector; and wherein the two separate pieces of nucleic acid are co-transformed into the host cells.

4. The method of claim 1, wherein the cloning vector comprises the cloning cassette, and the cloning cassette comprises a first hook region, a second hook region, and a multiple cloning site (MCS) disposed therebetween, wherein the first hook region comprises formula A-B-C, wherein:

A comprises a unique restriction endonuclease site;

B comprises a first adapter region comprising a nucleic acid sequence complementary to another adapter region; and

C comprises a first complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; the second hook region comprises formula D-E-F, wherein:

D comprises a second complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids;

E comprises a second adapter region comprising a nucleic acid sequence complementary to another adapter region; and

F comprises a unique restriction endonuclease site; and wherein prior to the transformation step, the cloning cassette is linearized by restriction endonuclease digestion using a restriction endonuclease that recognizes and cuts a nucleic acid sequence within the MCS.

5. The method of claim 3, wherein the recombining step further comprises the first complementary region and the second complementary region separately contacting and binding to complementary nucleic acid sequences flanking the excised target genomic nucleic acids; and the first homology region and the second homology region separately contacting and binding to the first region and second region of the cloning vector, respectively.

6. The method of claim 4, wherein during the digesting step, the restriction endonuclease recognizes and cuts unique restriction endonuclease sites A and F flanking the cloning cassette, thereby releasing the cloned sequences from the heterologous vectors to provide released cloned sequences.

7. The method of claim 5, wherein the released cloned sequences comprise structure B-CS-E, wherein CS comprises the cloned sequences.

8. The method of claim 6, wherein during the introducing step, the first adapter region and the second adapter region of a first released clone sequence are complementary to and bind to one of the first adapter sequence and the second adapter sequence of a second released cloned sequence to permit homologous recombination between the first released clone sequence and the second released clone sequence to produce the synthetic chromosome.

9. The method of claim 3, wherein the cloning cassette is about 100 to about 1000 base pairs in length, wherein each of the B and E moieties is about 25 base pairs to about 200 base pairs in length; each of the C and D moieties is about 50 base pairs to about 200 base pairs in length; and each of the A and F moieties is about 4 base pairs to about 50 base pairs in length.

10. The method of claim 1, wherein the transforming step further comprises introducing a repair template into the host cells, wherein the repair template comprises: a selectable marker; a first homology arm; and a second homology arm; wherein the first homology arm and the second homology arm are complementary to nucleic acid sequences of the endogenous chromosome that flank the target genomic nucleic acids to permit homologous recombination between the repair template and the endogenous chromosome, thereby repairing the endogenous chromosome.

11. The method of claim 1, wherein the cloning vectors further comprise a selectable marker, an origin of replication, a centromere sequence, a sop gene, an oriV sequence, an oriS sequence, a rep gene, or a combination thereof.

12. The method of claim 1, wherein the centromere cassette comprises an origin of replication, a centromere sequence, and an adapter arm flanking both the origin of replication and the centromere sequence, and optionally, a selectable marker; wherein the adapter arm is complementary to another adapter region of the cloning cassette.

13. The method of claim 1, wherein the yeast artificial chromosomes or bacterial artificial chromosomes comprises a selectable marker, and homology arms that are complementary to adapter regions of the cloning cassette.

14. The method of claim 1, wherein the synthetic chromosome comprises deletion of segments of nucleic acid as compared to the endogenous chromosome; or wherein the synthetic chromosome comprises a rearrangement of nucleic acid segments as compared to an arrangement of corresponding nucleic acid segments of the endogenous chromosome.

15. The method of claim 1, wherein the host cells are cells of at least two different genera or species, wherein the synthetic chromosome is a chimeric chromosome comprising nucleic acid sequences from the at least two different genera or species.

16. The method of claim 1, wherein the host cells are one or more of a human cell, a fungal cell, or a bacterial cell.

17. A recombinant cell having at least one synthetic chromosome produced according to the method of claim 1.

Description:
ASSEMBLING SYNTHETIC DNA CONSTRUCTS FROM NATURAL DNA

RELATED APPLICATIONS

This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/337,891, filed May 3, 2022, which is incorporated herein by reference.

GOVERNMENT SUPPORT

This invention was made with government support under Grant Number R35GM130381, awarded by the National Institutes of Health, and Grant Number 2124400, awarded by the National Science Foundation. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

It is now possible to answer fundamental questions in biology by synthesizing chromosomes. For example, a longstanding question has been what is the minimal set of genes required to produce a living cell? To answer this question, researchers used design-build-test cycles to synthesize a Mycoplasma mycoides chromosome that contains only 473 genes and is still capable of producing a free-living bacterium that replicates on lab timescales. Among the genes in this minimal set, 83% functioned in the expression and preservation of genetic information, the cell membrane, or cytosolic metabolism, while 17% had unknown functions. Identification of this minimal gene set demonstrates the potential for using chromosome synthesis to understand mechanisms underlying cellular life and its diversity.

To date, synthetic chromosomes have exclusively been generated de novo, through the progressive assembly of small synthetic fragments of DNA into larger molecules by a combination of in vitro and in vivo techniques. De novo synthesis is powerful because it allows the complete reprogramming of a chromosome’s sequence and structure. For example, de novo chromosome synthesis was used to generate an Escherichia coli strain in which all 18,214 instances of three codons were synonymously reprogrammed, resulting in a strain that utilizes only 61 codons. In another example, the Sc2.0 community is using de novo chromosome synthesis to generate a strain of the model budding yeast Saccharomyces cerevisiae in which all transposable elements have been eliminated and LoxP sites have been incorporated between genes, enabling the generation of random chromosome rearrangements by Cre recombinase.

The substantial amount of DNA fragment synthesis and assembly involved in de novo chromosome synthesis limits its use in biological research. Reductions in labor and reagent costs are needed to enable biologists to employ chromosome synthesis more widely. As such, there is a need for new methods of building synthetic chromosomes. Building synthetic chromosomes from cloned segments of natural DNA could be a relatively cheap and fast alternative to de novo chromosome synthesis. Such a method would enable the use of chromosome synthesis in research that does not require complete chromosome reprogramming. For example, projects that could be enabled include mapping the genetic basis of trait differences between individuals and species, probing the structural requirements of chromosomes, and streamlining chromosomes through the multiplex deletion of non-essential regions. The present disclosure satisfies these needs.

SUMMARY OF THE INVENTION

This disclosure introduces the CReATiNG (Cloning, Reprogramming, and Assembling Tiled Natural Genomic DNA) method for building synthetic chromosomes from natural components in S. cerevisiae. The first step of CReATiNG is the cloning of natural chromosome segments such that unique adapter sequences are appended to their termini, specifying how these molecules will recombine with each other later when they are assembled. The second step of CReATiNG is co-transforming cloned segments into cells and assembling them by homologous recombination in vivo. Synthetic chromosomes generated with CReATiNG can replace the native chromosomes in cells, making it possible to directly test their phenotypic effects. Here, we describe CReATiNG and demonstrate several of its use cases.

Accordingly, the disclosure provides methods for constructing one or more synthetic chromosomes as described herein. In one aspect, a method for constructing a synthetic chromosome comprises the steps of: providing one or more host cells comprising one or more endogenous chromosomes; transforming one or more cloning vectors and one or more cloning cassettes into the one or more host cells; excising one or more target genomic nucleic acids from the one or more endogenous chromosomes; recombining the excised target genomic nucleic acids with the one or more cloning cassettes via homologous recombination to form one or more heterologous vectors comprising one or more cloned sequences; extracting the one or more heterologous vectors containing the one or more cloned sequences from the host cells; digesting the one or more heterologous vectors with a restriction endonuclease to release the one or more cloned sequences from the heterologous vectors to provide one or more released cloned sequences; and introducing the one or more released cloned sequences, one or more centromere cassettes, and one or more yeast artificial chromosomes or bacterial artificial chromosomes into a second host cell, wherein the one or more released cloned sequences, the one or more centromere cassettes, and the one or more yeast artificial chromosomes or bacterial artificial chromosomes recombine with one another via homologous recombination to produce the synthetic chromosome.

In one aspect, the host cells express a Cas9 endonuclease protein; and wherein the transformation step further comprises transforming into the one or more host cells one or more guide ribonucleic acids (gRNAs) that are complementary to regions flanking the one or more target genomic nucleic acid sequences of the one or more endogenous chromosomes, wherein the Cas9 endonuclease protein and the gRNAs excise the one or more target genomic nucleic acid sequences from the one or more endogenous chromosomes. While embodiments associated with this disclosure exemplify the use of Cas9, other DNA-targeting Cas enzymes, such as Cas12a/Cpf1, are also compatible with CReATiNG.

In one aspect, each of the one or more cloning cassettes comprises two separate linear segments of nucleic acid, wherein a first segment of nucleic acid comprises a first hook region and a second segment of nucleic acid comprises a second hook region, wherein the first hook region comprises formula A-B-C, wherein: A comprises a unique restriction endonuclease site and a first homology region comprising a nucleic acid sequence complementary to a first region of the cloning vector; B comprises a first adapter region comprising a nucleic acid sequence complementary to another adapter region; and C comprises a first complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; the second hook region comprises formula D-E-F, wherein: D comprises a second complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; E comprises a second adapter region comprising a nucleic acid sequence complementary to another adapter region; and F comprises a unique restriction endonuclease site and a second homology region comprising a nucleic acid sequence complementary to a second region of the cloning vector; and wherein the two separate pieces of nucleic acid are co-transformed into the host cells.
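As a rough illustration of the hook layout, the sketch below checks the length of each moiety (A through F) against the ranges recited in claim 9. The `RANGES` table, the `check_hook` helper, and the placeholder sequences are hypothetical names introduced only for this example, and "about" is treated as a hard bound purely for simplicity.

```python
# Length ranges in base pairs, taken from claim 9.
# Assumption: "about" is treated as an exact bound in this sketch.
RANGES = {
    "A": (4, 50), "B": (25, 200), "C": (50, 200),
    "D": (50, 200), "E": (25, 200), "F": (4, 50),
}

def check_hook(parts):
    """Return the names of moieties whose length is outside the recited range."""
    return [
        name for name, seq in parts.items()
        if not RANGES[name][0] <= len(seq) <= RANGES[name][1]
    ]

# Placeholder sequences ("N" repeats), not real hook sequences.
first_hook = {"A": "N" * 18, "B": "N" * 60, "C": "N" * 100}
second_hook = {"D": "N" * 100, "E": "N" * 60, "F": "N" * 18}

print(check_hook(first_hook))   # []
print(check_hook(second_hook))  # []
```

A check of this kind only confirms lengths; it says nothing about whether a given adapter or homology sequence would actually recombine as intended.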

In another aspect, the one or more cloning vectors comprise the cloning cassette, and the cloning cassette comprises a first hook region, a second hook region, and a multiple cloning site (MCS) disposed therebetween, wherein the first hook region comprises formula A-B-C, wherein: A comprises a unique restriction endonuclease site; B comprises a first adapter region comprising a nucleic acid sequence complementary to another adapter region; and C comprises a first complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; the second hook region comprises formula D-E-F, wherein: D comprises a second complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; E comprises a second adapter region comprising a nucleic acid sequence complementary to another adapter region; and F comprises a unique restriction endonuclease site; and wherein prior to the transformation step, the cloning cassette is linearized by restriction endonuclease digestion using a restriction endonuclease that recognizes and cuts a nucleic acid sequence within the MCS.

In another aspect, the recombining step further comprises the first complementary region and the second complementary region separately contacting and binding to complementary nucleic acid sequences flanking the one or more excised target genomic nucleic acids; and the first homology region and the second homology region separately contacting and binding to the first region and second region of the one or more cloning vectors, respectively.

In another aspect, the transforming step further comprises introducing one or more repair templates into the host cells, wherein the one or more repair templates comprise: a selectable marker; a first homology arm; and a second homology arm; wherein the first homology arm and the second homology arm are complementary to nucleic acid sequences of the one or more endogenous chromosomes that flank the one or more target genomic nucleic acids to permit homologous recombination between the one or more repair templates and the one or more endogenous chromosomes, thereby repairing the endogenous chromosome.

In another aspect, during the digesting step, a restriction endonuclease recognizes and cuts unique restriction endonuclease sites A and F flanking the one or more cloning cassettes, thereby releasing the one or more cloned sequences from the one or more heterologous vectors to provide one or more released cloned sequences. In another aspect, the one or more released cloned sequences comprise structure B-CS-E, wherein CS comprises the one or more cloned sequences.

In another aspect, during the introducing step, the first adapter region and the second adapter region of a first released clone sequence are complementary to and bind to one of the first adapter sequence and the second adapter sequence of a second released cloned sequence to permit homologous recombination between the first released clone sequence and the second released clone sequence to produce the synthetic chromosome.
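The way adapters fix the order of assembly can be pictured with a small sketch. It is a deliberate simplification: each released sequence of structure B-CS-E is reduced to a pair of adapter labels, and two pieces are treated as joinable when the right-hand label of one equals the left-hand label of the next. The `Released` class, the `assembly_order` function, and the labels `a1` to `a4` are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Released:
    """A released cloned sequence reduced to its two adapter labels (B ... E)."""
    name: str
    left: str   # label standing in for adapter region B
    right: str  # label standing in for adapter region E

def assembly_order(pieces, start):
    """Chain pieces by matching each right adapter label to the next left label."""
    by_left = {}
    for piece in pieces:
        if piece.left in by_left:
            raise ValueError(f"adapter {piece.left!r} is not unique")
        by_left[piece.left] = piece
    order = []
    current = by_left[start]
    # Stop when no piece continues the chain, or when the chain closes on itself.
    while current is not None and current not in order:
        order.append(current)
        current = by_left.get(current.right)
    return [piece.name for piece in order]

pieces = [
    Released("segment 2", "a2", "a3"),
    Released("segment 1", "a1", "a2"),
    Released("segment 3", "a3", "a4"),
]
print(assembly_order(pieces, "a1"))  # ['segment 1', 'segment 2', 'segment 3']
```

Under this simplified model, changing which adapter labels are attached to a segment changes its position in the chain, which is the sense in which adapters "program" the assembly.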

In another aspect, some embodiments of methods of making synthetic chromosomes also may comprise the use of centromere cassettes to form the synthetic chromosome. The centromere cassette comprises an origin of replication, a centromere sequence, and an adapter arm flanking both the origin of replication and the centromere sequence, and optionally, a selectable marker; wherein the adapter arm is complementary to another adapter region of the cloning cassette.

In another aspect, the synthetic chromosome described herein may comprise deletion of segments of nucleic acid as compared to the endogenous chromosome; or wherein the synthetic chromosome comprises a rearrangement of nucleic acid segments as compared to an arrangement of corresponding nucleic acid segments of the endogenous chromosome.

In another aspect, the one or more host cells are cells of at least two different genera or species, wherein the synthetic chromosome is a chimeric chromosome comprising nucleic acid sequences from the at least two different genera or species. In another aspect, the one or more host cells are one or more of a human cell, a fungal cell, or a bacterial cell.

These and other features and advantages of this invention will be more fully understood from the following detailed description of the invention taken together with the accompanying claims. It is noted that the scope of the claims is defined by the recitations therein and not by the specific discussion of features and advantages set forth in the present description.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the specification and are included to further demonstrate certain embodiments or various aspects of the invention. In some instances, embodiments of the invention can be best understood by referring to the accompanying drawings in combination with the detailed description presented herein. The description and accompanying drawings may highlight a certain specific example, or a certain aspect of the invention. However, one skilled in the art will understand that portions of the example or aspect may be used in combination with other examples or aspects of the invention.

FIG. 1A-D. Synthesizing chromosomes from natural components in yeast using CReATiNG. a. The Bacterial Artificial Chromosome/Yeast Artificial Chromosome (BAC/YAC) vector (pASC1) used for cloning natural chromosome segments in vivo. Homology may be flanked by sequence adapters that program how a segment will assemble with others in later steps. b. A segment is cloned by co-transforming a linearized cloning vector, gRNAs targeting both sides of the segment, and a selectable repair template into a donor cell constitutively expressing Cas9. Cloned segments are then extracted from yeast and transferred to E. coli. c. Cloned segments are excised from the vector through restriction digestion with I-SceI. These molecules are then purified and co-transformed into a recipient yeast cell with a centromere cassette and a centromere-free version of the BAC/YAC cloning vector (pASC2). These molecules are assembled into a synthetic chromosome by homologous recombination in vivo, while the native chromosome is eliminated by centromere destabilization and counterselection. d. Oxford Nanopore Technologies sequencing confirmed the correct assembly of Saccharomyces paradoxus ChrI (white) and replacement of the native S. cerevisiae ChrI (black) in BY. The plot shows reads mapped to each chromosome using a reference genome including both BY and S. paradoxus ChrI.

FIG. 2A-F. Recombining chromosomes between strains and species using CReATiNG. a. Segments 1 through 3 were cloned from two additional strains, the BY (dark gray) and RM (light gray) strains of S. cerevisiae. The 27 possible syntenic combinations of the S. paradoxus (white), BY, and RM segments were then assembled in BY recipient cells and the native ChrI was eliminated. b. Representative validations of assembled chromosomes by Oxford Nanopore Technologies sequencing. In silico designs of each chromosome are shown with a subset of mapped reads plotted below them. c and d. Euploid strains carrying the synthetic chromosomes were phenotyped for doubling time in rich medium containing glucose at 30°C and 35°C, respectively. Means are shown for each strain, with error bars representing one standard deviation around the mean. e. The mean effect (horizontal line) of each segment across all genotypes at 30°C is shown, with dots representing the mean of a genotype across 12 replicates. f. The mean effect (horizontal line) of each segment across all genotypes at 35°C is shown, with dots representing the mean of a genotype across 12 replicates.

FIG. 3A-B. Restructuring ChrI with CReATiNG. a. Five possible restructured versions of BY ChrI were created by altering the adapters appended to each segment. Oxford Nanopore sequencing was used to confirm that the chromosome assemblies had the correct structure. In silico designs of each chromosome are shown. b. Growth analysis of the natural and five non-natural chromosome structures in rich medium containing glucose at 30°C. Means are shown for each strain, with error bars representing one standard deviation around the mean.

FIG. 4A-D. Multiplex gene deletion with CReATiNG. a. We attempted to delete 10 non-adjacent regions of the chromosome core and both subtelomeres from BY ChrI, totaling 39.9% of the chromosome. To do this, we cloned 11 segments of ChrI, assembled them, and performed native chromosome elimination. b. Oxford Nanopore Technologies sequencing of a colony confirmed results from PCR checks. The colony with the most deletions (nine regions) had the correct structure, but had retained the region containing SYN8. An in silico design of the chromosome is shown with a subset of mapped reads plotted below it. c. Growth rate analysis of different BY strains, including the unaltered reference strain (BY), a strain carrying a synthetic circular ChrI lacking subtelomeres (BY synthetic ChrI), a synthetic circular ChrI lacking nine core regions and both subtelomeres (multiple deletion ChrI), a synthetic circular ChrI lacking nine core regions, both subtelomeres, and SYN8 (multiple deletion ChrI syn8Δ), and the reference strain with SYN8 deleted (BY syn8Δ). Means are shown for each strain, with error bars representing one standard deviation around the mean. d. Recombination between the synthetic ChrI and the native ChrI resulted in the synthetic chromosome containing SYN8.

FIG. 5A-D. Reagents used for cloning in CReATiNG. a. DNA templates for producing gRNAs by in vitro transcription are generated by PCR. An oligonucleotide containing the tracrRNA is amplified using a tailed forward primer containing a 20 nt target sequence and a T7 promoter (SEQ ID NO: 164) (top sequence). The PCR reaction generates a dsDNA template that is transcribed in vitro into gRNAs (SEQ ID NO: 165) (bottom sequence) by T7 RNA polymerase. T7 promoter sequence (underlined); target sequence (bold); scaffold oligo and tracrRNA (italics). b. Map of the BAC/YAC vector pASC1. The vector contains a cloning site flanked by I-SceI sites where a cloning cassette is inserted using restriction digestion and ligation. c. Example of a cloning cassette design for capture of S. paradoxus segment 1. The cassette (SEQ ID NO: 166) contains segment-specific homology arms (underlined sequences) that are flanked by adapters (bold sequences) that program how cloned segments will recombine during chromosome assembly. Once the cloning cassette is added to pASC1, the vector is linearized by restriction digestion, exposing the homology arms. d. Map of the modified pRS316 vector (pMM_KanMX) used as a template for amplification of the KanMX repair template. Primers bind to the sites indicated as repair template priming sites.

FIG. 6A-D. The cloning step of CReATiNG has high efficiency. a. Map of the strategy for cloning the core of Saccharomyces paradoxus ChrI as three segments. b. Transformed cells are selected on SC plates lacking uracil and containing G418. c. Map of the pASC1 cloning vector containing segment 1 from Saccharomyces paradoxus ChrI. The black arrows indicate primer sites for PCR junction checks. d. Cloning efficiency was assessed using junction PCRs. Electrophoresis in a 1% agarose gel shows the expected DNA bands for both junctions in 100% (5 of 5) checked colonies.

FIG. 7A-C. Assembly of a synthetic S. paradoxus ChrI in BY. a. The three segments of S. paradoxus ChrI were liberated from the cloning vector by I-SceI digestion and separated from the vector in a 0.5% agarose gel. b. The three segments, the assembly vector (pASC2), and a centromere cassette were co-transformed into and assembled in BY (white). The native BY ChrI (black) was marked for elimination prior to assembly. c. Correct assembly of S. paradoxus ChrI and elimination of the native ChrI were initially confirmed by diagnostic PCRs targeting both chromosomes. The gel pictures on the left show junction PCR results for native BY ChrI before (upper) and after (bottom) native ChrI elimination. The gel pictures on the right show junction PCR results for the S. paradoxus ChrI before (upper) and after (bottom) native ChrI elimination.

FIG. 8A-B. S. paradoxus ChrI linearization using CRISPR/Cas9. a. We assembled chromosomes as circular molecules. To linearize the synthetic S. paradoxus ChrI, we targeted the junctions between the chromosome and the vector with CRISPR/Cas9. We also provided repair templates containing synthetic telomere seed sequences (TSS) and selectable markers (URA3 and NatMX) for the left and right arms, respectively. b. Plate images showing cells in which the circular S. paradoxus ChrI was converted to its linear form. Cells with the linear ChrI become histidine auxotrophs but are uracil prototrophs with resistance to nourseothricin.
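The figure of 27 syntenic combinations given for FIG. 2A follows from three segment positions, each filled from one of three donors (3 × 3 × 3 = 27). A minimal sketch of that enumeration is shown here; the short donor labels and the `syntenic_combinations` helper are hypothetical and serve only to make the count concrete.

```python
from itertools import product

# Hypothetical short labels for the three donors named in FIG. 2A.
DONORS = ("S. paradoxus", "BY", "RM")
SEGMENTS = (1, 2, 3)

def syntenic_combinations(donors=DONORS, segments=SEGMENTS):
    """Assign a donor to every segment position while keeping segment order fixed."""
    return [
        tuple(zip(segments, choice))
        for choice in product(donors, repeat=len(segments))
    ]

combos = syntenic_combinations()
print(len(combos))  # 27
print(combos[0])    # ((1, 'S. paradoxus'), (2, 'S. paradoxus'), (3, 'S. paradoxus'))
```

Because segment order is held constant, every combination is syntenic by construction; only the donor of each segment varies.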

DETAILED DESCRIPTION OF THE INVENTION

De novo chromosome synthesis is costly and time-consuming, limiting its use in research and biotechnology. Building synthetic chromosomes from natural components is an unexplored alternative with many potential applications. Here, we report CReATiNG (Cloning, Reprogramming, and Assembling Tiled Natural Genomic DNA), a method for constructing synthetic chromosomes from natural components in yeast. CReATiNG entails cloning segments of natural chromosomes and then programmably assembling them into synthetic chromosomes that can replace the native chromosomes in cells. We used CReATiNG to synthetically recombine chromosomes between strains and species, to modify chromosome structure, and to delete many linked, non-adjacent regions totaling 39% of a chromosome. The multiplex deletion experiment revealed that CReATiNG also enables recovery from flaws in synthetic chromosome design via recombination between a synthetic chromosome and its native counterpart. CReATiNG facilitates the application of chromosome synthesis to diverse biological problems.

Definitions.

The following definitions are included to provide a clear and consistent understanding of the specification and claims. Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present disclosure belongs. See, e.g., Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994); Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press (Cold Spring Harbor, N.Y. 1989).

The methods described herein may employ, unless otherwise indicated, conventional techniques and descriptions of molecular biology (including recombinant techniques), cell biology, biochemistry, and cellular engineering technology, all of which are within the skill of those who practice in the art. Such conventional techniques include oligonucleotide synthesis, hybridization and ligation of oligonucleotides, transformation and transduction of cells, engineering of recombination systems, creation of transgenic animals and plants, and human gene therapy. Specific illustrations of suitable techniques can be had by reference to the examples herein. However, equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV) (Green, et al., eds., 1999); Genetic Variation: A Laboratory Manual (Weiner, et al., eds., 2007); Sambrook and Russell, Condensed Protocols from Molecular Cloning: A Laboratory Manual (2006); and Sambrook and Russell, Molecular Cloning: A Laboratory Manual (2002) (all from Cold Spring Harbor Laboratory Press); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy, eds., Academic Press 1995); Immunology Methods Manual (Lefkovits ed., Academic Press 1997); Gene Therapy Techniques, Applications and Regulations From Laboratory to Clinic (Meager, ed., John Wiley & Sons 1999); M. Giacca, Gene Therapy (Springer 2010); Gene Therapy Protocols (LeDoux, ed., Springer 2008); Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, eds., John Wiley & Sons 1998); Mammalian Chromosome Engineering: Methods and Protocols (G. Hadlaczky, ed., Humana Press 2011); Essential Stem Cell Methods, (Lanza and Klimanskaya, eds., Academic Press 2011); Stem Cell Therapies: Opportunities for Ensuring the Quality and Safety of Clinical Offerings: Summary of a Joint Workshop (Board on Health Sciences Policy, National Academies Press 2014); Essentials of Stem Cell Biology, Third Ed., (Lanza and Atala, eds., Academic Press 2013); and Handbook of Stem Cells, (Atala and Lanza, eds., Academic Press 2012), all of which are herein incorporated by reference in their entirety for all purposes.

Before the present compositions, research tools and methods are described, it is to be understood that this invention is not limited to the specific methods, compositions, targets and uses described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular aspects only and is not intended to limit the scope of the present invention, which will be limited only by the appended claims.

References in the specification to "one embodiment", "an embodiment", etc., indicate that the embodiment described may include a particular aspect, feature, structure, moiety, or characteristic, but not every embodiment necessarily includes that aspect, feature, structure, moiety, or characteristic. Moreover, such phrases may, but do not necessarily, refer to the same embodiment referred to in other portions of the specification. Further, when a particular aspect, feature, structure, moiety, or characteristic is described in connection with an embodiment, it is within the knowledge of one skilled in the art to affect or connect such aspect, feature, structure, moiety, or characteristic with other embodiments, whether or not explicitly described.

The singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a compound" includes a plurality of such compounds, so that a compound X includes a plurality of compounds X. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for the use of exclusive terminology, such as "solely," "only," and the like, in connection with any element described herein, and/or the recitation of claim elements or use of "negative" limitations.

The term "and/or" means any one of the items, any combination of the items, or all of the items with which this term is associated. The phrases "one or more" and "at least one" are readily understood by one of skill in the art, particularly when read in the context of their usage. For example, either phrase can mean one, two, three, four, five, six, ten, 100, or any upper limit approximately 10, 100, or 1000 times higher than a recited lower limit. For example, one or more substituents on a phenyl ring refers to one to five substituents on the ring.

As will be understood by the skilled artisan, all numbers, including those expressing quantities of ingredients, properties such as molecular weight, reaction conditions, and so forth, are approximations and are understood as being optionally modified in all instances by the term "about." These values can vary depending upon the desired properties sought to be obtained by those skilled in the art utilizing the teachings of the descriptions herein. It is also understood that such values inherently contain variability necessarily resulting from the standard deviations found in their respective testing measurements. When values are expressed as approximations, by use of the antecedent "about," it will be understood that the particular value without the modifier "about" also forms a further aspect.

The terms "about" and "approximately" are used interchangeably. Both terms can refer to a variation of ± 5%, ± 10%, ± 20%, or ± 25% of the value specified. For example, "about 50" percent can in some embodiments carry a variation from 45 to 55 percent, or as otherwise defined by a particular claim. For integer ranges, the term "about" can include one or two integers greater than and/or less than a recited integer at each end of the range. Unless indicated otherwise herein, the terms "about" and "approximately" are intended to include values, e.g., weight percentages, proximate to the recited range that are equivalent in terms of the functionality of the individual ingredient, composition, or embodiment. The terms "about" and "approximately" can also modify the endpoints of a recited range as discussed above in this paragraph.
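By way of non-limiting illustration, the numerical effect of the term "about" at each of the variations recited above can be sketched as follows. The function name `about_interval` is hypothetical and is introduced here only for illustration; the sketch simply applies a symmetric percentage to a recited value.

```python
def about_interval(value, pct):
    """Return the (low, high) interval implied by 'about <value>' at +/- pct percent.

    Illustrative helper only; the name and signature are assumptions.
    """
    delta = abs(value) * pct / 100.0
    return (value - delta, value + delta)


if __name__ == "__main__":
    # The four variations recited in the definition of "about".
    for pct in (5, 10, 20, 25):
        low, high = about_interval(50, pct)
        print(f"about 50 at +/-{pct}%: {low:g} to {high:g}")
```

At a variation of ± 10%, the sketch reproduces the example given above, in which "about 50" percent spans 45 to 55 percent.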

As will be understood by one skilled in the art, for any and all purposes, particularly in terms of providing a written description, all ranges recited herein also encompass any and all possible subranges and combinations of sub-ranges thereof, as well as the individual values making up the range, particularly integer values. It is therefore understood that each unit between two particular units is also disclosed. For example, if 10 to 15 is disclosed, then 11, 12, 13, and 14 are also disclosed, individually, and as part of a range. A recited range (e.g., weight percentages or carbon groups) includes each specific value, integer, decimal, or identity within the range. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, or tenths. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art, all language such as "up to", "at least", "greater than", "less than", "more than", "or more", and the like, include the number recited and such terms refer to ranges that can be subsequently broken down into sub-ranges as discussed above. In the same manner, all ratios recited herein also include all sub-ratios falling within the broader ratio. Accordingly, specific values recited for radicals, substituents, and ranges, are for illustration only; they do not exclude other defined values or other values within defined ranges for radicals and substituents. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.

This disclosure provides ranges, limits, and deviations to variables such as volume, mass, percentages, ratios, etc. It is understood by an ordinary person skilled in the art that a range, such as “number 1” to “number 2”, implies a continuous range of numbers that includes the whole numbers and fractional numbers. For example, 1 to 10 means 1, 2, 3, 4, 5, ... 9, 10. It also means 1.0, 1.1, 1.2, 1.3, ..., 9.8, 9.9, 10.0, and also means 1.01, 1.02, 1.03, and so on. If the variable disclosed is a number less than “number 10”, it implies a continuous range that includes whole numbers and fractional numbers less than number 10, as discussed above. Similarly, if the variable disclosed is a number greater than “number 10”, it implies a continuous range that includes whole numbers and fractional numbers greater than number 10. These ranges can be modified by the term “about”, whose meaning has been described above.

The recitation of a), b), c), ... or i), ii), iii), or the like in a list of components or steps does not confer any particular order unless explicitly stated. One skilled in the art will also readily recognize that where members are grouped together in a common manner, such as in a Markush group, the invention encompasses not only the entire group listed as a whole, but each member of the group individually and all possible subgroups of the main group. Additionally, for all purposes, the invention encompasses not only the main group, but also the main group absent one or more of the group members. The invention therefore envisages the explicit exclusion of any one or more of members of a recited group. Accordingly, provisos may apply to any of the disclosed categories or embodiments whereby any one or more of the recited elements, species, or embodiments, may be excluded from such categories or embodiments, for example, for use in an explicit negative limitation.

The term "contacting" refers to the act of touching, making contact, or of bringing to immediate or close proximity, including at the cellular or molecular level, for example, to bring about a physiological reaction, a chemical reaction, or a physical change, e.g., in a solution, in a reaction mixture, in vitro, or in vivo.

The term “substantially” as used herein, is a broad term and is used in its ordinary sense, including, without limitation, being largely but not necessarily wholly that which is specified. For example, the term could refer to a numerical value that may not be 100% the full numerical value. The full numerical value may be less by about 1%, about 2%, about 3%, about 4%, about 5%, about 6%, about 7%, about 8%, about 9%, about 10%, about 15%, or about 20%.

Wherever the term “comprising” is used herein, options are contemplated wherein the terms “consisting of” or “consisting essentially of” are used instead. As used herein, “comprising” is synonymous with "including," "containing," or "characterized by," and is inclusive or open-ended and does not exclude additional, unrecited elements or method steps. As used herein, "consisting of" excludes any element, step, or ingredient not specified in the aspect element. As used herein, "consisting essentially of" does not exclude materials or steps that do not materially affect the basic and novel characteristics of the aspect. In each instance herein any of the terms "comprising", "consisting essentially of" and "consisting of" may be replaced with either of the other two terms. The disclosure illustratively described herein may be suitably practiced in the absence of any element or elements, limitation, or limitations not specifically disclosed herein.

The terms “polynucleotide” and “nucleic acid” are used interchangeably and mean at least two or more ribo- or deoxy-ribo nucleic acid base pairs (nucleotides) which are linked through a phosphoester bond or equivalent. The nucleic acid includes polynucleotide and polynucleoside. The nucleic acid includes a single molecule, a double molecule, a triple molecule, a circular molecule, or a linear molecule. Examples of the nucleic acid include, but are not limited to, RNA, DNA, cDNA, a genomic nucleic acid, a naturally existing nucleic acid, and a non-natural nucleic acid such as a synthetic nucleic acid. Short nucleic acids and polynucleotides (e.g., 10 to 20, 20 to 30, 30 to 50, 50 to 100 nucleotides) are commonly called “oligonucleotides” or “probes” of single-stranded or double-stranded DNA.

As used herein, “sequence identity” or “identity” in the context of two nucleic acid or polypeptide sequences makes reference to a specified percentage of residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window, as measured by sequence comparison algorithms or by visual inspection. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have “sequence similarity” or “similarity.” Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif.).

As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
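By way of non-limiting illustration, the calculation described above (matched positions divided by the total number of positions in the comparison window, multiplied by 100) can be sketched as follows. The function name `percent_identity`, the use of "-" as the gap character, and the choice to count gap columns as positions in the window are assumptions made for this sketch; the two input strings are assumed to be already optimally aligned and of equal length.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over a comparison window of two pre-aligned sequences.

    Assumptions for this sketch: both strings are already aligned, have equal
    length, and use `gap` to mark additions or deletions. Gap columns count
    toward the window length but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical aligned window of nine positions, one of which is a gap.
    print(percent_identity("ACGT-ACGT", "ACGTTACCT"))
```

In the hypothetical window shown, seven of nine positions carry the identical base in both sequences, so the value is 7/9 multiplied by 100. Other conventions for handling gap columns exist and would give a different denominator.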

The term “substantial identity” in the context of a peptide indicates that a peptide comprises a sequence with at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, or 94%, or even 95%, 96%, 97%, 98% or 99%, sequence identity to the reference sequence over a specified comparison window. In certain embodiments, optimal alignment is conducted using the homology alignment algorithm of Needleman and Wunsch (Needleman and Wunsch, JMB, 48, 443 (1970)). An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution. Thus, embodiments of the invention also provide nucleic acid molecules and peptides that are substantially identical to the nucleic acid molecules and peptides presented herein.
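By way of non-limiting illustration, a global alignment in the manner of Needleman and Wunsch can be sketched with a simple dynamic-programming table. The scoring values used here (match +1, mismatch -1, linear gap penalty -1) and the function name `needleman_wunsch` are assumptions chosen for brevity; a practical peptide alignment would typically use a substitution matrix and tuned gap penalties instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment sketch; returns (score, aligned_a, aligned_b).

    Scoring parameters are illustrative assumptions, not values taken from
    the disclosure.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the table: best of diagonal (match/mismatch), up (gap in b), left (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The aligned strings returned by the sketch have equal length, so they can be passed directly to a percent-identity calculation over the comparison window.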

For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.

A “chromosome” is a nucleic acid molecule - and associated proteins - that is capable of replication and segregation in a cell upon division of the cell. Typically, a chromosome contains a centromeric region, replication origins, telomeric regions and a region of nucleic acid between the centromeric and telomeric regions. An “acrocentric chromosome” refers to a chromosome with arms of unequal length.

“Synthetic chromosomes” (also referred to as “artificial chromosomes”) are nucleic acid molecules, typically DNA, that stably replicate and segregate alongside endogenous chromosomes in cells that have the capacity to accommodate and express heterologous genes. A “mammalian synthetic chromosome” refers to chromosomes that have an active mammalian centromere(s). A “human synthetic chromosome” refers to a chromosome that includes a centromere that functions in human cells and that preferably is produced in human cells. Some exemplary artificial chromosomes are described in, for example, U.S. Pat. Nos. 8,389,802; 7,521,240; 6,025,155; 6,077,697; and 5,891,691.

“Endogenous chromosomes” refers to chromosomes found in a cell prior to generation or introduction of a synthetic chromosome.

“Site-specific recombination” refers to recombination that is effected between two specific sites on a single nucleic acid molecule or between two different molecules and that requires the presence of an exogenous protein, such as an integrase or recombinase. Certain site-specific recombination systems can be used to specifically delete, invert, or insert DNA, with the precise event controlled by the orientation of the specific sites, the specific system and the presence of accessory proteins or factors.

The term “genetically engineered” may refer to any manipulation of a host cell's genome (e.g., by insertion or deletion of nucleic acids). The term “genetically edited” refers to a host cell whose genome has been edited by a CRISPR complex.

As used herein, the term “gene” refers to any segment of DNA associated with a biological function. Thus, genes include, but are not limited to, promoter sequences, terminator sequences, splice sites, polyubiquitination sites, intron sequences, coding sequences and/or the regulatory sequences required for their expression. Genes can also include non-expressed DNA segments that, for example, form recognition sequences for other proteins, such as a guide RNA. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.

As used herein, the term “homologous” or “homologue” or “ortholog” is known in the art and refers to related sequences that share a common ancestor or family member and are determined based on the degree of sequence identity. The terms “homology,” “homologous,” “substantially similar” and “corresponding substantially” are used interchangeably herein. They refer to nucleic acid fragments wherein changes in one or more nucleotide bases do not affect the ability of the nucleic acid fragment to mediate gene expression or produce a certain phenotype. These terms also refer to modifications of the nucleic acid fragments of the instant disclosure such as deletion or insertion of one or more nucleotides that do not substantially alter the functional properties of the resulting nucleic acid fragment relative to the initial, unmodified fragment. It is therefore understood, as those skilled in the art will appreciate, that the disclosure encompasses more than the specific exemplary sequences. These terms describe the relationship between a gene found in one species, subspecies, variety, cultivar or strain and the corresponding or equivalent gene in another species, subspecies, variety, cultivar or strain. For purposes of this disclosure, homologous sequences are compared. “Homologous sequences”, “homologues”, or “orthologs” are thought, believed, or known to be functionally related. A functional relationship may be indicated in any one of a number of ways, including, but not limited to: (a) degree of sequence identity and/or (b) the same or similar biological function. In some embodiments, both (a) and (b) are indicated. Homology can be determined using software programs readily available in the art, such as those discussed in Current Protocols in Molecular Biology (F. M. Ausubel et al., eds., 1987) Supplement 30, section 7.718, Table 7.71. 
Some alignment programs are MacVector (Oxford Molecular Ltd, Oxford, U.K.), ALIGN Plus (Scientific and Educational Software, Pennsylvania) and AlignX (Vector NTI, Invitrogen, Carlsbad, Calif.). Another alignment program is Sequencher (Gene Codes, Ann Arbor, Mich.), using default parameters.

The term “complement” or “complementary” as used herein means the complementary sequence to a nucleic acid according to standard Watson/Crick base pairing rules. A complement sequence can also be a sequence of RNA complementary to the DNA sequence or its complement sequence and can also be a cDNA, or a region of DNA may be complementary to another region of DNA. The term “substantially complementary” as used herein means that two sequences hybridize under stringent hybridization conditions. The skilled artisan will understand that substantially complementary sequences need not hybridize along their entire length. In particular, substantially complementary sequences comprise a contiguous sequence of bases that do not hybridize to a target or marker sequence, positioned 3' or 5' to a contiguous sequence of bases that hybridize under stringent hybridization conditions to a target or marker sequence.
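By way of non-limiting illustration, the complementary sequence under standard Watson/Crick base pairing rules, including an RNA complement of a DNA sequence, can be computed as sketched below. The function names are hypothetical, and the sketch handles only the unambiguous bases A, C, G and T.

```python
# Translation tables for standard Watson/Crick pairing (unambiguous bases only).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_RNA_COMPLEMENT = str.maketrans("ACGTacgt", "UGCAugca")


def complement(seq):
    """Base-by-base DNA complement, read in the same left-to-right order."""
    return seq.translate(_DNA_COMPLEMENT)


def reverse_complement(seq):
    """DNA strand that pairs with `seq`, written 5' to 3'."""
    return complement(seq)[::-1]


def rna_reverse_complement(seq):
    """RNA strand complementary to the DNA `seq`, written 5' to 3'."""
    return seq.translate(_RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGC"
    print(complement(dna), reverse_complement(dna), rna_reverse_complement(dna))
```

Reversing the complemented string reflects the antiparallel orientation of paired strands, so that both the input and the output are written 5' to 3'.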

“Hybridization” refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of a PCR reaction, or the enzymatic cleavage of a polynucleotide by a ribozyme.

Examples of stringent hybridization conditions include incubation temperatures of about 25° C. to about 37° C.; hybridization buffer concentrations of about 6xSSC to about 10xSSC; formamide concentrations of about 0% to about 25%; and wash solutions from about 4xSSC to about 8xSSC. Examples of moderate hybridization conditions include incubation temperatures of about 40° C. to about 50° C.; buffer concentrations of about 9xSSC to about 2xSSC; formamide concentrations of about 30% to about 50%; and wash solutions of about 5xSSC to about 2xSSC. Examples of high stringency conditions include incubation temperatures of about 55° C. to about 68° C.; buffer concentrations of about 1xSSC to about 0.1xSSC; formamide concentrations of about 55% to about 75%; and wash solutions of about 1xSSC, 0.1xSSC, or deionized water. In general, hybridization incubation times are from 5 minutes to 24 hours, with 1, 2, or more washing steps, and wash incubation times are about 1, 2, or 15 minutes. SSC is 0.15 M NaCl and 15 mM citrate buffer. It is understood that equivalents of SSC using other buffer systems can be employed.
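Because 1xSSC is defined above as 0.15 M NaCl and 15 mM citrate buffer, the composition of any other SSC strength follows by simple scaling. By way of non-limiting illustration, the arithmetic can be sketched as follows; the function name `ssc_composition` and the dictionary keys are hypothetical.

```python
def ssc_composition(fold):
    """Return NaCl (M) and citrate (mM) concentrations for an n-fold SSC buffer.

    Scales the 1x definition (0.15 M NaCl, 15 mM citrate) linearly.
    """
    if fold < 0:
        raise ValueError("fold concentration must be non-negative")
    return {"NaCl_M": 0.15 * fold, "citrate_mM": 15.0 * fold}


if __name__ == "__main__":
    # SSC strengths that appear in the hybridization conditions above.
    for fold in (10, 6, 2, 1, 0.1):
        comp = ssc_composition(fold)
        print(f"{fold:g}xSSC: {comp['NaCl_M']:.3f} M NaCl, "
              f"{comp['citrate_mM']:.1f} mM citrate")
```

For example, 6xSSC corresponds to 0.9 M NaCl and 90 mM citrate, and 0.1xSSC corresponds to 0.015 M NaCl and 1.5 mM citrate.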

As used herein, the term “at least a portion” or “fragment” of a nucleic acid or polypeptide means a portion having the minimal size characteristics of such sequences, or any larger fragment of the full-length molecule, up to and including the full-length molecule. A fragment of a polynucleotide of the disclosure may encode a biologically active portion of a genetic regulatory element. A biologically active portion of a genetic regulatory element can be prepared by isolating a portion of one of the polynucleotides of the disclosure that comprises the genetic regulatory element and assessing activity as described herein. Similarly, a portion of a polypeptide may be 4 amino acids, 5 amino acids, 6 amino acids, 7 amino acids, and so on, going up to the full-length polypeptide. The length of the portion to be used will depend on the particular application. A portion of a nucleic acid useful as a hybridization probe may be as short as 12 nucleotides; in some embodiments, it is 20 nucleotides. A portion of a polypeptide useful as an epitope may be as short as 4 amino acids. A portion of a polypeptide that performs the function of the full-length polypeptide would generally be longer than 4 amino acids.

For PCR amplifications of the polynucleotides disclosed herein, oligonucleotide primers can be designed for use in PCR reactions to amplify corresponding DNA sequences from cDNA or genomic DNA extracted from any organism of interest. Methods for designing PCR primers and PCR cloning are generally known in the art and are disclosed in Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual (3rd ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). See also Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. (1995) PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. (1999) PCR Methods Manual (Academic Press, New York). Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially mismatched primers, and the like.

Primers are short nucleic acids, for example DNA oligonucleotides at least about six nucleotides in length, and/or no longer than 10, 20, 50, 100 or 200 nucleotides in length, though in some embodiments they are longer. Primers may be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by PCR or other nucleic acid amplification methods known in the art.

Methods for preparing and using probes and primers are described, for example, in Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989), Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley-Intersciences (1987), and Innis et al., PCR Protocols, A Guide to Methods and Applications, 1990, Innis et al. (eds.), 21-27, Academic Press, Inc., San Diego, Calif. PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose, such as Primer (Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, Mass.), Primer3 (Version 2.6.1 © 2022, Whitehead Institute for Biomedical Research, Cambridge, Mass.), and Benchling primer wizard (© 2019, Benchling).

Probes and primers comprise at least ten nucleotides of a nucleic acid sequence, although a shorter nucleic acid (e.g., six nucleotides) may be used as a probe or primer if it specifically hybridizes under stringent conditions with a target nucleic acid by methods well known in the art. One of ordinary skill in the art will appreciate that the specificity of a particular probe or primer increases with its length. Thus, for example, a primer comprising 20 consecutive nucleotides of a sequence will anneal to a target sequence (for instance, contained within a genomic DNA library) with a higher specificity than a corresponding primer of only 15 nucleotides. To enhance specificity, longer probes and primers can be used, for example probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100 or more consecutive nucleotides from any region of a target.

As used herein, “promoter” refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. The promoter sequence may consist of proximal and more distal upstream elements, the latter elements often referred to as enhancers. Accordingly, an “enhancer” is a DNA sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter.
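Returning to the observation that the specificity of a probe or primer increases with its length, a rough, non-limiting estimate of chance matches can be sketched as follows. The sketch assumes a random sequence with equal and independent base frequencies, counts both strands, and uses an illustrative genome length of 12,000,000 base pairs; the function name and all of these values are assumptions rather than values taken from the disclosure.

```python
def expected_chance_sites(primer_length, genome_length, strands=2):
    """Expected number of exact chance matches of one primer in a random genome.

    Simplifying assumptions: each base is A, C, G or T with probability 1/4,
    positions are independent, and both strands are searched.
    """
    if primer_length <= 0 or genome_length < primer_length:
        raise ValueError("primer must be non-empty and fit within the genome")
    windows = genome_length - primer_length + 1
    return strands * windows / (4 ** primer_length)


if __name__ == "__main__":
    genome = 12_000_000  # illustrative genome length in base pairs (assumption)
    for n in (6, 10, 15, 20):
        print(n, expected_chance_sites(n, genome))
```

Under these assumptions each additional nucleotide reduces the expected number of chance matches roughly fourfold, so a 20-nucleotide primer is expected to match by chance about 1,000 times less often than a 15-nucleotide primer. Real genomes contain repeats and biased base composition, so the sketch indicates the trend only.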

As used herein, the term "expression" refers to the process by which a polypeptide or RNA molecule is produced based on the encoding sequence of a nucleic acid molecule, such as a gene. The process may include transcription, post-transcriptional control, post-transcriptional modification, translation, post-translational control, post-translational modification, or any combination thereof. An expressed nucleic acid molecule is typically operably linked to an expression control sequence (e.g., a promoter). Methods of expressing polypeptides and RNAs are known in the art.

The term “operably linked” means in this context the sequential arrangement of the promoter polynucleotide according to the disclosure with a further oligo or polynucleotide, resulting in transcription of said further polynucleotide. In some embodiments, the promoter sequences of the present disclosure are inserted just prior to a gene's 5'UTR, or open reading frame. In other embodiments, the operably linked promoter sequences and gene sequences of the present disclosure are separated by one or more linker nucleotides.

As used herein, the phrases “recombinant construct”, “expression construct”, “chimeric construct”, “construct”, “recombinant DNA construct” and “recombinant RNA” are used interchangeably herein. A recombinant construct comprises an artificial combination of nucleic acid fragments, e.g., regulatory and coding sequences that are not found together in nature. For example, a chimeric construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source but arranged in a manner different than that found in nature. Such construct may be used by itself or may be used in conjunction with a vector. If a vector is used, then the choice of vector is dependent upon the method that will be used to transform host cells as is well known to those skilled in the art. For example, a plasmid vector can be used. The skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells comprising any of the isolated nucleic acid fragments of the disclosure. The skilled artisan will also recognize that different independent transformation events will result in different levels and patterns of expression (Jones et al., (1985) EMBO J. 4:2411-2418; De Almeida et al., (1989) Mol. Gen. Genetics 218:78-86), and thus that multiple events must be screened in order to obtain lines displaying the desired expression level and pattern. Such screening may be accomplished by Southern analysis of DNA, Northern analysis of mRNA expression, immunoblotting analysis of protein expression, or phenotypic analysis, among others. Vectors can be plasmids, viruses, bacteriophages, pro-viruses, phagemids, transposons, artificial chromosomes, and the like, that replicate autonomously or can integrate into a chromosome of a host cell. A vector can also be a naked RNA polynucleotide, a naked DNA polynucleotide, a polynucleotide composed of both DNA and RNA within the same strand, a poly-lysine-conjugated DNA or RNA, a peptide-conjugated DNA or RNA, a liposome-conjugated DNA, or the like, that is not autonomously replicating. As used herein, the term “expression” refers to the production of a functional end product, e.g., an mRNA or a protein (precursor or mature).

A “vector” is a replicon, such as plasmid, phage, viral construct, cosmid, bacterial artificial chromosome, P-1 derived artificial chromosome, or yeast artificial chromosome to which another DNA segment may be attached. In some instances, a vector may be a chromosome such as in the case of an arm exchange from one endogenous chromosome engineered to comprise a recombination site to a synthetic chromosome. Vectors are used to transduce and express a DNA segment in a cell. A “bacterial artificial chromosome (BAC)” is a DNA construct capable of extrachromosomal replication and segregation in bacterial cells. A “yeast artificial chromosome (YAC)” is a DNA construct capable of extrachromosomal replication and segregation in yeast cells. A “mammalian synthetic chromosome (MAC)” refers to chromosomes that have an active mammalian centromere(s). A “human synthetic chromosome (HAC)” refers to a chromosome that includes a centromere that functions in human cells and that preferably is produced in human cells. For exemplary artificial chromosomes, see, e.g., U.S. Pat. Nos. 8,389,802; 7,521,240; 6,025,155; 6,077,697; 5,891,691; 5,869,294; 5,721,118; 5,712,134; 5,695,967; and 5,288,625.

As used herein, the term "expression vector" refers to a DNA construct containing a nucleic acid molecule that is operably linked to a suitable control sequence capable of effecting the expression of the nucleic acid molecule in a suitable host. Such control sequences include a promoter to effect transcription, an optional operator sequence to control such transcription, a sequence encoding suitable mRNA ribosome binding sites, and sequences which control termination of transcription and translation. The vector may be a plasmid, a phage particle, a virus, or simply a potential genomic insert. Once transformed into a suitable host, the vector may replicate and function independently of the host genome, or may, in some instances, integrate into the genome itself or deliver the polynucleotide contained in the vector into the genome without the vector sequence. Examples of certain expression vectors are described, for example, in U.S. Pat. No. 11,390,882.

The term “CRISPR RNA” or “crRNA” refers to the guide RNA strand responsible for hybridizing with target DNA sequences and recruiting CRISPR endonucleases. crRNAs may be naturally occurring or may be synthesized according to any known method of producing RNA. The terms crRNA and guide strand (gRNA) are equivalent and may be used interchangeably throughout this document.

The term “tracrRNA” refers to a small trans-encoded RNA. TracrRNA is complementary to and base pairs with crRNA to form a crRNA/tracrRNA hybrid, capable of recruiting CRISPR endonucleases to target sequences.

The term “guide sequence” or “spacer sequence” refers to the portion of a crRNA that is responsible for hybridizing with the target DNA. The term “protospacer” refers to the DNA sequence targeted by a crRNA or sgRNA guide strand. In some embodiments the protospacer sequence hybridizes with the crRNA or sgRNA guide (spacer) sequence of a CRISPR complex.

The term “seed region” refers to the critical portion of a crRNA's or guide RNA's guide sequence that is most susceptible to mismatches with its target. In some embodiments, a single mismatch in the seed region of a crRNA can render a CRISPR complex inactive at that binding site. In some embodiments, the seed regions for Cas9 endonucleases are located along the last 12 nucleotides of the 3' portion of the guide sequence. In some embodiments, the seed regions for Cpf1 endonucleases are located along the first 5 nucleotides of the 5' portion of the guide strand.
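The seed-region definition above can be illustrated with a short sketch. It is not part of the disclosed method: the function names are hypothetical, the guide is written in the DNA alphabet and compared position by position with a protospacer of equal length, and the 12-nucleotide (Cas9, 3' end) and 5-nucleotide (Cpf1, 5' end) windows are treated as fixed slices, which is a simplifying assumption.

```python
# Seed windows taken from the definitions above; fixed-length slices are an
# illustrative simplification, not a claim about any particular enzyme.
SEED_RULES = {
    "Cas9": ("3prime", 12),  # last 12 nt of the guide sequence
    "Cpf1": ("5prime", 5),   # first 5 nt of the guide sequence
}


def seed_window(length, nuclease):
    """Return (start, stop) indices of the seed region for a guide of this length."""
    end, size = SEED_RULES[nuclease]
    size = min(size, length)
    if end == "3prime":
        return (length - size, length)
    return (0, size)


def seed_mismatches(guide, protospacer, nuclease):
    """List 0-based positions inside the seed region where the two sequences differ."""
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must be the same length")
    start, stop = seed_window(len(guide), nuclease)
    return [
        i for i in range(start, stop)
        if guide[i].upper() != protospacer[i].upper()
    ]


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"
    target = "ACCTACGTACGTACGTACGT"  # differs from the guide at position 2 only
    print(seed_mismatches(guide, target, "Cas9"))  # position 2 is outside the Cas9 seed
    print(seed_mismatches(guide, target, "Cpf1"))  # position 2 is inside the Cpf1 seed
```

A guide whose only mismatches fall outside the returned window would, under this simplified model, be less likely to be inactivated at that site than one with a mismatch inside it.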

The term “Guide RNA” or “gRNA” as used herein refers to an RNA sequence or combination of sequences capable of recruiting a CRISPR endonuclease to a target sequence. Thus, as used herein, a guide RNA can be a natural or synthetic crRNA (e.g., for Cpf1), a natural or synthetic crRNA/tracrRNA hybrid (e.g., for Cas9), or a single-guide RNA (sgRNA).

The term “CRISPR landing site” as used herein refers to a DNA sequence capable of being targeted by a CRISPR complex. Thus, in some embodiments, a CRISPR landing site comprises a proximately placed protospacer/Protospacer Adjacent Motif (PAM) combination sequence that is capable of being cleaved by a CRISPR endonuclease complex.

The term “validated CRISPR landing site” refers to a CRISPR landing site for which there exists a guide RNA capable of inducing cleavage of said sequence. Thus, the term validated should be interpreted as meaning that the sequence has been previously shown to be cleavable by a CRISPR complex. Each “validated CRISPR landing site” will by definition confirm the existence of a tested guide RNA associated with the validation.

As used herein, the term “CRISPR complex” refers to a CRISPR endonuclease and guide RNA complex. The term CRISPR complex thus refers to a combination of CRISPR endonuclease and guide RNA capable of inducing a double stranded break at a CRISPR landing site.

As used herein, the term “directing sequence-specific binding” in the context of CRISPR complexes refers to a guide RNA's ability to recruit a CRISPR endonuclease to a CRISPR landing site.

As used herein, a “cassette” is a nucleic acid sequence, optionally encoding at least one selectable marker, that can be inserted into the genome of a cell, for instance a prokaryotic or eukaryotic cell, or into a plasmid or artificial chromosome. In some embodiments, the cassette is divided into one or more separate nucleic acid sequences.

As used herein, the terms “endonuclease” or “endonuclease enzyme” refer to a member or members of a classification of catalytic molecules that bind a recognition site encoded in a DNA molecule and cleave the DNA molecule at a precise location within or near the sequence.

As used herein, the terms “endonuclease recognition site”, “recognition site”, “cognate sequence” or “cognate sequences” refer to the minimal string of nucleotides required for a restriction enzyme to bind and cleave a DNA molecule or gene.

Embodiments of the Invention.

This disclosure provides for methods of engineering synthetic chromosomes and/or synthetic genomes, and host cells comprising the synthetic chromosomes, from other naturally occurring nucleic acid segments. In some embodiments, the methods described herein feature the use of certain cloning cassettes, cloning vectors, and in vivo cut-and-assembly cloning methods to construct the synthetic chromosomes.

In some embodiments, a method for constructing a synthetic chromosome comprises the steps of: providing host cells comprising an endogenous chromosome; transforming a cloning vector and a cloning cassette into the host cells; excising target genomic nucleic acids from the endogenous chromosome; recombining the excised target genomic nucleic acids with the cloning cassette via homologous recombination to form heterologous vectors comprising cloned sequences; extracting the heterologous vectors containing the cloned sequences from the host cells; digesting the heterologous vectors with a restriction endonuclease to release the cloned sequences from the heterologous vectors to provide released cloned sequences; and introducing the released cloned sequences, a centromere cassette, and a yeast artificial chromosome (YAC) or bacterial artificial chromosome (BAC) into a second host cell, wherein the released cloned sequences, the centromere cassette, and the YAC or BAC recombine with one another via homologous recombination to produce the synthetic chromosome.

In some embodiments, the cloning cassette may comprise a single segment of nucleic acid or multiple segments of nucleic acid (e.g., 2 separate segments of nucleic acid). These segments may be free, that is, not part of a cloning vector. In other embodiments, the cloning cassette may be part of (i.e., integrated into) a cloning vector.

In some embodiments where the cloning cassette comprises two separate segments of nucleic acid, a first segment of nucleic acid comprises a first hook region and a second segment of nucleic acid comprises a second hook region. Each of the hook regions includes an adapter sequence configured to be complementary to another adapter sequence on a different cloned segment, a region of the cloning vector, or a centromere cassette. Thus, each cloning cassette may have a different adapter sequence that specifies how these nucleic acid molecules can recombine with another nucleic acid molecule in the host cell. There is no limit to the number of sequences that may be used for the adapter regions. In some embodiments, each of the adapter regions is about 25 base pairs to about 400 base pairs in length, about 25 base pairs to about 350 base pairs in length, about 25 base pairs to about 300 base pairs in length, about 25 base pairs to about 250 base pairs in length, about 25 base pairs to about 200 base pairs in length, about 25 base pairs to about 150 base pairs in length, about 25 base pairs to about 100 base pairs in length, or about 25 base pairs to about 50 base pairs in length.

In some embodiments, each of the hook regions also includes a complementary region that is complementary to a region flanking the excised target genomic nucleic acids. Each complementary region is about 25 base pairs to about 200 base pairs in length, about 25 base pairs to about 175 base pairs in length, about 25 base pairs to about 150 base pairs in length, about 25 base pairs to about 125 base pairs in length, about 25 base pairs to about 100 base pairs in length, about 25 base pairs to about 75 base pairs in length, or about 25 base pairs to about 50 base pairs in length. In some embodiments, the complementary regions are about 5 base pairs to about 25 base pairs in length.

In some embodiments, each hook region includes an adapter sequence flanked by nucleic acid moieties comprising unique restriction endonuclease sites that may be used to excise the cloned sequences from a vector. In some embodiments, each of the unique restriction endonuclease sites also may comprise a homology region. The homology regions are complementary to regions of the cloning vector to permit recombination into the cloning vector. In some embodiments, the unique restriction endonuclease site and homology region together are about 4 base pairs to about 100 base pairs in length, about 4 base pairs to about 90 base pairs in length, about 4 base pairs to about 80 base pairs in length, about 4 base pairs to about 70 base pairs in length, about 4 base pairs to about 60 base pairs in length, about 4 base pairs to about 50 base pairs in length, about 4 base pairs to about 40 base pairs in length, about 4 base pairs to about 30 base pairs in length, or about 4 base pairs to about 20 base pairs in length. In some embodiments, where the cloning cassette is originally part of the cloning vector, the homology regions may be omitted, and the unique restriction site may comprise a sequence of the shortest endonuclease recognition sequence.

In some embodiments, a hook region may comprise a unique restriction endonuclease site and a homology region that is complementary to regions of the cloning vector. The homology regions permit recombination into the cloning vector while the unique restriction endonuclease site may be used to excise the cloned sequences from a vector.

In some embodiments, where the cloning cassette comprises two separate segments of nucleic acid, a first segment of nucleic acid comprises a first hook region and a second segment of nucleic acid comprises a second hook region, wherein the first hook region comprises formula A-B-C, wherein: A comprises a unique restriction endonuclease site and a first homology region comprising a nucleic acid sequence complementary to a first region of the cloning vector; B comprises a first adapter region comprising a nucleic acid sequence complementary to another adapter region; and C comprises a first complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; the second hook region comprises formula D-E-F, wherein: D comprises a second complementary region comprising a nucleic acid sequence complementary to a region flanking the excised target genomic nucleic acids; E comprises a second adapter region comprising a nucleic acid sequence complementary to another adapter region; and F comprises a unique restriction endonuclease site and a second homology region comprising a nucleic acid sequence complementary to a second region of the cloning vector; and wherein the two separate segments of nucleic acid are co-transformed into the host cells.
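The A-B-C and D-E-F layouts described above can be pictured as ordered concatenations of labeled elements. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, the sequences are short placeholders rather than real design values, and the NotI recognition sequence GCGGCCGC stands in for the unique restriction site purely as an example, since the text does not name an enzyme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HookRegion:
    """An ordered set of (label, sequence) elements, written 5' to 3'."""
    elements: tuple

    @property
    def formula(self):
        return "-".join(label for label, _seq in self.elements)

    @property
    def sequence(self):
        return "".join(seq for _label, seq in self.elements)


def make_first_hook(site_and_homology, adapter, complementary):
    # A: restriction site plus vector homology; B: adapter; C: target-flank complement
    return HookRegion((("A", site_and_homology), ("B", adapter), ("C", complementary)))


def make_second_hook(complementary, adapter, site_and_homology):
    # D: target-flank complement; E: adapter; F: restriction site plus vector homology
    return HookRegion((("D", complementary), ("E", adapter), ("F", site_and_homology)))


if __name__ == "__main__":
    site = "GCGGCCGC"  # placeholder unique site (example only)
    first = make_first_hook(site + "TTAA", "ACACAC", "GGATCC")
    second = make_second_hook("CCTAGG", "TGTGTG", "AATT" + site)
    print(first.formula, first.sequence)
    print(second.formula, second.sequence)
```

Keeping the labels alongside the sequences makes it easy to check that the restriction sites sit at the outer ends of the two hooks and the complementary regions face the excised target.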

When the cloning cassette is part of the cloning vector, the cloning cassette may comprise a first hook region, a second hook region, and a multiple cloning site (MCS) disposed therebetween, wherein the first hook region comprises formula A-B-C, wherein: A comprises a unique restriction endonuclease site; B comprises a first adapter region comprising a nucleic acid sequence complementary to at least one other adapter region; and C comprises a first complementary region comprising a nucleic acid sequence complementary to a region flanking the one or more excised target genomic nucleic acids; the second hook region comprises formula D-E-F, wherein: D comprises a second complementary region comprising a nucleic acid sequence complementary to a region flanking the one or more excised target genomic nucleic acids; E comprises a second adapter region comprising a nucleic acid sequence complementary to one other adapter region; and F comprises a unique restriction endonuclease site; and wherein prior to the transformation step, the cloning cassette is linearized by restriction endonuclease digestion using a restriction endonuclease that recognizes and cuts a nucleic acid sequence within the MCS. In some embodiments, a cloning cassette need not include an adapter region if one or both complementary regions of the cloning cassette share homology with another complementary region of a separate cloning cassette or with a region of the cloning vector. Further, in some embodiments, the adapter regions can include naturally occurring sequences, synthetic sequences, or a combination of both naturally occurring sequences and synthetic sequences.

In some embodiments, the cloning cassette is about 100 to about 1000 base pairs in length, about 100 to about 900 base pairs in length, about 100 to about 800 base pairs in length, about 100 to about 700 base pairs in length, about 100 to about 600 base pairs in length, about 100 to about 500 base pairs in length, about 100 to about 400 base pairs in length, about 100 to about 300 base pairs in length, or about 100 to about 200 base pairs in length.
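As a rough design aid, the broadest length ranges recited above for the adapter regions, complementary regions, restriction-site/homology regions, and the cloning cassette as a whole can be collected into a single check. This is only a sketch: the names are hypothetical, and treating each “about” range as a hard bound is an assumption made for illustration.

```python
# Broadest ranges recited in the text, in base pairs. The text also describes
# complementary regions of about 5 to about 25 bp in some embodiments; that
# alternative is not modeled here.
BROADEST_RANGES_BP = {
    "adapter": (25, 400),
    "complementary": (25, 200),
    "site_and_homology": (4, 100),
    "cassette": (100, 1000),
}


def out_of_range(lengths_bp):
    """Return {element: (length, low, high)} for every element outside its range."""
    problems = {}
    for name, length in lengths_bp.items():
        low, high = BROADEST_RANGES_BP[name]
        if not low <= length <= high:
            problems[name] = (length, low, high)
    return problems


if __name__ == "__main__":
    design = {"adapter": 40, "complementary": 250, "site_and_homology": 30, "cassette": 320}
    print(out_of_range(design))
```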

In some embodiments, after transforming a cloning vector and a cloning cassette into the host cells and then excising target genomic nucleic acids from the endogenous chromosome, the first complementary region and the second complementary region of the hook regions separately contact and bind to complementary nucleic acid sequences flanking the excised target genomic nucleic acids; and the first homology region and the second homology region separately contact and bind to a first region and a second region of the cloning vector, respectively.

In some embodiments, after the target genomic nucleic acid sequences are integrated into the cloning vector to form a heterologous vector comprising cloned sequences, a restriction endonuclease, in a digesting step, recognizes unique restriction endonuclease sites flanking the cloning cassette of the heterologous vector, thereby releasing the cloned sequences from the heterologous vectors to provide the released cloned sequences. In some embodiments, the released cloned sequences comprise the structure B-CS-E, wherein CS comprises the one or more cloned sequences.
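The digestion step that yields the B-CS-E structure can be sketched as a string operation on a linearized representation of the heterologous vector. The sketch below makes several loudly stated assumptions: the function names are hypothetical, the vector is treated as a linear string, the restriction site must occur exactly twice, and the whole recognition sequence is dropped from the released fragment, whereas a real enzyme cuts at a defined position within or near its site and leaves part of it on the fragment ends.

```python
def find_all(seq, site):
    """Return every start index of site in seq, including overlapping hits."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def release_cloned_sequence(vector_seq, site):
    """Return the sequence lying between the two occurrences of a unique site."""
    hits = find_all(vector_seq, site)
    if len(hits) != 2:
        raise ValueError(f"expected exactly 2 sites flanking the insert, found {len(hits)}")
    first, second = hits
    if second < first + len(site):
        raise ValueError("the two sites overlap; no insert lies between them")
    return vector_seq[first + len(site):second]


if __name__ == "__main__":
    site = "GCGGCCGC"          # placeholder unique restriction site (example only)
    b, cs, e = "AAAC", "TTTTGGGG", "CAAA"  # toy adapter, cloned sequence, adapter
    vector = "TTTT" + site + b + cs + e + site + "TATA"
    print(release_cloned_sequence(vector, site))
```

Requiring exactly two hits mirrors the requirement that the restriction sites be unique: an additional occurrence inside the cloned sequence would cut the insert itself.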

In some embodiments, after being transformed into a second host cell, the released cloned sequences, the centromere cassette, and the YAC or BAC recombine with one another via homologous recombination to produce the synthetic chromosome, wherein at least one of a first adapter region and a second adapter region of a first released cloned sequence is complementary to and binds to at least one of the first adapter region and the second adapter region of a second released cloned sequence to permit homologous recombination between the first released cloned sequence and the second released cloned sequence.

In some embodiments, after being transformed into a second host cell or recipient cell, the released cloned sequences, the centromere cassette, and the YAC or BAC recombine with one another via homologous recombination to produce the synthetic chromosome. In some embodiments, at least one of a first adapter region and a second adapter region of a first released clone sequence is complementary to and binds to at least one of the first adapter sequence and the second adapter sequence of a second released cloned sequence to permit homologous recombination between the first released clone sequence and the second released clone sequence to produce the synthetic chromosome, and/or at least one of a first adapter region and a second adapter region of a released clone sequence is complementary to and binds to a complementary region or an adapter region of the centromere cassette or the YAC/BAC to permit homologous recombination between the released clone sequence and one or more of the centromere cassette and cloning vector to produce the synthetic chromosome.
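Because each adapter region specifies which neighbor a released cloned sequence can recombine with, the intended linear order of the pieces follows from the adapter pairings. The sketch below illustrates that bookkeeping only. It assumes each piece is described by hypothetical adapter identifiers for its left and right ends, that two pieces join when the right adapter of one matches the left adapter of the next, and that the design forms a single linear chain; none of these names come from the source.

```python
def assembly_order(fragments):
    """Order fragments so that each right adapter matches the next left adapter.

    fragments maps a fragment name to a (left_adapter, right_adapter) pair.
    """
    by_left = {}
    for name, (left, _right) in fragments.items():
        if left in by_left:
            raise ValueError(f"adapter {left!r} is used as a left end twice")
        by_left[left] = name

    right_ends = {right for _left, right in fragments.values()}
    starts = [name for name, (left, _r) in fragments.items() if left not in right_ends]
    if len(starts) != 1:
        raise ValueError("expected exactly one fragment with an unmatched left end")

    order = [starts[0]]
    while True:
        next_name = by_left.get(fragments[order[-1]][1])
        if next_name is None:
            break
        if next_name in order:
            raise ValueError("adapter pairings form a cycle")
        order.append(next_name)

    if len(order) != len(fragments):
        raise ValueError("fragments do not form a single chain")
    return order


if __name__ == "__main__":
    pieces = {
        "segment_2": ("ad1", "ad2"),
        "centromere_cassette": ("ad2", "vector_right"),
        "segment_1": ("vector_left", "ad1"),
    }
    print(assembly_order(pieces))
```

A design in which two pieces share the same left adapter, or in which a piece has no partner, is rejected, which reflects the idea that each adapter pairing should be unambiguous.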

The methods described herein permit any number of target nucleic acid sequences to be excised from the host cell and inserted into a cloning cassette to form the cloned sequences. For example, the number of excised target nucleic acids can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more that can produce 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more cloned sequences. Once the resultant cloned sequences are introduced into the second host cell or recipient cell, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more of the cloned sequences and a centromere cassette may recombine to form the synthetic chromosome (See, for example, Fig. 1 and Fig. 4).

In some embodiments, the host cells are cells of at least two different genera or species, and the synthetic chromosome is a chimeric chromosome comprising nucleic acid sequences from the two different genera or species. In some embodiments, the host cells are one or more of a yeast cell, a mammalian cell, or a bacterial cell. The second host cell can be any cell as disclosed herein and may be the same or different than the host cells.

In some embodiments, the different species are yeast or fungal species, which may comprise a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, Yarrowia, Acremonium, Aspergillus, Aureobasidium, Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Piromyces, Schizophyllum, Thermoascus, Thielavia, Tolypocladium, or Trichoderma strain. In other embodiments, the fungal cell is one or more of Aspergillus aculeatus, Aspergillus awamori, Aspergillus foetidus, Aspergillus japonicus, Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Penicillium purpurogenum, Trichoderma harzianum, Trichoderma koningi, Trichoderma longibrachiatum, Trichoderma reesei, Trichoderma viride, Kluyveromyces lactis, Candida albicans, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Chrysosporium lucknowense, Fusarium sp., Fusarium gramineum, Fusarium venenatum, Physcomitrella patens, and Neurospora crassa. Preferably, the host cells are one or more of Saccharomyces spp. such as Saccharomyces arboricolus, Saccharomyces bayanus, Saccharomyces bulderi, Saccharomyces cariocanus, Saccharomyces cariocus, Saccharomyces cerevisiae, Saccharomyces cerevisiae var. boulardii, Saccharomyces chevalieri, Saccharomyces dairenensis, Saccharomyces ellipsoideus, Saccharomyces eubayanus, Saccharomyces exiguus, Saccharomyces florentinus, Saccharomyces fragilis, Saccharomyces kudriavzevii, Saccharomyces martiniae, Saccharomyces mikatae, Saccharomyces monacensis, Saccharomyces norbensis, Saccharomyces paradoxus, Saccharomyces pastorianus, Saccharomyces spencerorum, Saccharomyces turicensis, Saccharomyces unisporus, Saccharomyces uvarum, and Saccharomyces zonatus.

In certain embodiments, the mammalian cell is a murine cell. In certain embodiments, said mammalian cell is a bovine cell. In certain embodiments, said mammalian cell is a human cell. In certain embodiments, cells from other mammalian species may be used, including, but not limited to, equine, canine, porcine, or ovine sources, or rodent species such as rat. In some embodiments, the mammalian cell may be one of a retinal pigmented epithelial (RPE) cell, a hematopoietic cell, a red blood cell, a platelet, a pancreatic beta cell, a skin cell, a cardiomyocyte, a smooth muscle cell, an endothelial cell, a hepatocyte, a neuron, a glial cell, a skeletal muscle cell, or a vascular cell.

In some embodiments, the bacterial cell is one of Neisseria, Spirillum, Pasteurella, Brucella, Yersinia, Francisella, Haemophilus, Bordetella, Escherichia, Salmonella, Shigella, Klebsiella, Proteus, Vibrio, Pseudomonas, Bacteroides, Acetobacter, Aerobacter, Agrobacterium, Azotobacter, Spirilla, Serratia, Rhizobium, Chlamydia, Rickettsia, Treponema, Fusobacterium, Actinomyces, Bacillus, Clostridium, Corynebacterium, Erysipelothrix, Lactobacillus, Listeria, Mycobacterium, Myxococcus, Nocardia, Staphylococcus, Streptococcus, and Streptomyces.

The use of various host cells dictates the source of the cloned sequences (e.g., the excised target genomic nucleic acids), and ultimately, the composition of the synthetic chromosome. For example, one host cell may be a yeast cell, a second host cell may be a human cell, and a third host cell may be a bacterial cell. Thus, there is no limit to the number of host cells that may be used, and each host cell may be a cell of a different genus from another host cell. Thus, in some embodiments, the host cells may comprise cells of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more different genera or species. It follows then that a synthetic chromosome may be made by introducing the cloned sequences from any number of host cells into a recipient cell, such as a yeast cell, to construct the synthetic chromosome, resulting in a synthetic chromosome having, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more different sources of nucleic acid. Typically, the recipient cell or second host cell in which the synthetic chromosome is constructed through homologous recombination is a yeast cell such as a Saccharomyces spp.

In some embodiments, the host cells express, either constitutively or inducibly, a CRISPR endonuclease; and wherein the transformation step further comprises transforming one or more guide ribonucleic acids (gRNAs) into the one or more host cells that are complementary to regions flanking the one or more target genomic nucleic acid sequences of the endogenous chromosome, wherein the CRISPR endonuclease protein and the gRNAs excise the one or more target genomic nucleic acid sequences from the endogenous chromosome.

In some embodiments, the host cells express, either constitutively or inducibly, a Cas9 or Cas12a endonuclease protein; and wherein the transformation step further comprises transforming one or more guide ribonucleic acids (gRNAs) into the one or more host cells that are complementary to regions flanking the one or more target genomic nucleic acid sequences of the endogenous chromosome, wherein the Cas9 or Cas12a endonuclease protein and the gRNAs excise the one or more target genomic nucleic acid sequences from the endogenous chromosome.

In some embodiments, the host cell comprises a nucleic acid sequence encoding a CRISPR endonuclease that is integrated into the host cell chromosome. In other embodiments, the host cell includes a plasmid or other vector comprising a nucleic acid sequence encoding the CRISPR endonucleases. In still other embodiments, the host cell comprises one or more plasmids or other vectors comprising nucleic acid sequences that encode both the CRISPR endonucleases and one or more gRNAs.

Other known CRISPR endonucleases include, but are not limited to, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, Cas10, Csm2, Cmr5, Cas10, Csx11, Csx10, Csf1, Cas9, Csn2, Cas4, Cas12, Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f (Cas14, C2c10), Cas12g, Cas12h, Cas12i, Cas12k (C2c5), C2c4, C2c8, and C2c9. In some embodiments, the CRISPR endonuclease is Cas9 or Cas12a. The CRISPR endonuclease should be a nuclease targeting deoxyribonucleic acid. These enzymes are known; for example, the amino acid sequence of S. pyogenes Cas9 protein may be found in the SwissProt database under accession number Q99ZW2.

The CRISPR/Cas9 system is used to treat target genomic nucleic acid in order to generate double-stranded breaks near the target genomic nucleic acid. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and CRISPR-associated (Cas) endonucleases were originally discovered as adaptive immunity systems evolved by bacteria and archaea to protect against viral and plasmid invasion. Naturally occurring CRISPR/Cas systems in bacteria are composed of one or more Cas genes and one or more CRISPR arrays consisting of short palindromic repeats of base sequences separated by genome-targeting sequences acquired from previously encountered viruses and plasmids (called spacers). (Wiedenheft et al., Nature. 2012; 482:331; Bhaya et al., Annu. Rev. Genet. 2011; 45:231; and Terns et al., Curr. Opin. Microbiol. 2011; 14:321). Bacteria and archaea possessing one or more CRISPR loci respond to viral or plasmid challenge by integrating short fragments of foreign sequence (protospacers) into the host chromosome at the proximal end of the CRISPR array. Transcription of CRISPR loci generates a library of CRISPR-derived RNAs (crRNAs) containing sequences complementary to previously encountered invading nucleic acids (Haurwitz et al., Science. 2012:329; 1355; Gesner et al., Nat. Struct. Mol. Biol. 2001: 18:688; Jinek et al., Science. 2012:337; 816-21). Target recognition by crRNAs occurs through complementary base pairing with target DNA, which directs cleavage of foreign sequences by means of Cas proteins.

There are at least five main CRISPR system types (Type I, II, III, IV and V) and at least 16 distinct subtypes (Makarova et al., Nat. Rev. Microbiol. 2015; 13:722-736; Makarova et al., Nat. Rev. Microbiol. 18(2):67-83). CRISPR systems are also classified based on their effector proteins. Class 1 systems possess multi-subunit crRNA-effector complexes, whereas in class 2 systems all functions of the effector complex are carried out by a single protein (e.g., Cas9 or Cas12a/Cpf1). In some embodiments, the present disclosure teaches using type II and/or type V single-subunit effector systems. Thus, in some embodiments, the present disclosure teaches the use of class 2 CRISPR systems. In some embodiments, the CRISPR system used to form the synthetic chromosomes described herein comprises the Cas12a/Cpf1 system (Bijoya et al., Biomedical Journal, Vol. 1, Issue 1, 8-17, (2020)).

In some embodiments, the present disclosure teaches methods of gene editing using a Type II CRISPR system. In some embodiments, the present disclosure teaches Cas9 Type II CRISPR systems. Type II systems rely on i) a single endonuclease protein, ii) a transactivating crRNA (tracrRNA), and iii) a crRNA where a 20-nucleotide (nt) portion of the 5' end of the crRNA is complementary to a target nucleic acid. The region of a CRISPR crRNA strand that is complementary to its target DNA protospacer is herein referred to as the “guide sequence.”

Cas9 endonucleases produce blunt-end DNA breaks and are recruited to target DNA by a combination of crRNA and tracrRNA oligos, which tether the endonuclease via complementary hybridization of the RNA complex. In some embodiments, the crRNA and tracrRNA are combined into a single gRNA.

In some embodiments, DNA recognition by the crRNA/endonuclease complex requires additional complementary base pairing with a protospacer adjacent motif (PAM) (e.g., 5'-NGG-3') located in a 3' portion of the target DNA, downstream from the target protospacer (Jinek et al., Science. 2012:337; 816-821). In some embodiments, the PAM motif recognized by a Cas9 varies for different Cas9 proteins. An exemplary use of a CRISPR system is described in U.S. Pat. App. No. US 2019/0134227.
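The protospacer/PAM arrangement described here lends itself to a simple sequence scan. The following sketch lists candidate landing sites consisting of a 20-nucleotide protospacer immediately followed by a 5'-NGG-3' PAM. It is illustrative only: the function name is hypothetical, only the given strand is scanned (a practical search would also scan the reverse complement), and other PAMs recognized by different Cas9 proteins are not modeled.

```python
import re


def find_ngg_sites(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for each protospacer followed by an NGG PAM.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    toy = "A" * 20 + "TGG" + "CCCC"  # toy sequence with a single candidate site
    for start, protospacer, pam in find_ngg_sites(toy):
        print(start, protospacer, pam)
```

Candidates found this way would still need to be checked for uniqueness in the genome and, per the definitions above, validated with a tested guide RNA before being treated as validated CRISPR landing sites.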

In some embodiments, one or more repair templates are transformed into the host cell to permit repair of the endogenous chromosome after excision of the target genomic nucleic acids. For example, the one or more repair templates may comprise one or more selectable markers; a first homology arm; and a second homology arm; wherein the first homology arm and the second homology arm are complementary to nucleic acid sequences of the endogenous chromosome that flank the one or more target genomic nucleic acids to permit homologous recombination between the one or more repair templates and the endogenous chromosome, thereby repairing the endogenous chromosome.
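The repair-template layout (first homology arm, selectable marker, second homology arm) can be sketched with plain string slicing. The function names, the 0-based half-open coordinates, and the default arm length are all hypothetical choices for illustration; the text does not specify arm lengths, and the toy sequences below are not real markers.

```python
def build_repair_template(chromosome, target_start, target_end, marker, arm_len=50):
    """Return left_arm + marker + right_arm for the target chromosome[target_start:target_end]."""
    if not 0 <= target_start < target_end <= len(chromosome):
        raise ValueError("target coordinates fall outside the chromosome")
    if target_start < arm_len or len(chromosome) - target_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = chromosome[target_start - arm_len:target_start]
    right_arm = chromosome[target_end:target_end + arm_len]
    return left_arm + marker + right_arm


def repaired_chromosome(chromosome, target_start, target_end, marker):
    """Model the repaired locus: the excised target is replaced by the marker."""
    return chromosome[:target_start] + marker + chromosome[target_end:]


if __name__ == "__main__":
    chrom = "AAAAACCCCCGGGGGTTTTT"  # toy chromosome; target is positions 5..15
    marker = "ATGATG"               # toy stand-in for a selectable marker
    template = build_repair_template(chrom, 5, 15, marker, arm_len=5)
    print(template)
    print(repaired_chromosome(chrom, 5, 15, marker))
```

In this simplified model the repair template appears verbatim inside the repaired chromosome, which is the expected outcome when both homology arms recombine with the flanks of the excised region.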

Embodiments of the methods disclosed herein permit the formation of certain synthetic chromosomes that comprise one or more deletions of segments of nucleic acid as compared to the endogenous chromosome, or, in other embodiments, permit formation of a synthetic chromosome comprising a rearrangement of one or more nucleic acid segments as compared to an arrangement of corresponding nucleic acid segments of the endogenous chromosome. In other embodiments, recombinant cells may be produced with a synthetic chromosome made using a method as disclosed herein.

In some embodiments, the cloned segments are propagated in a library of cloned segments in cloning vectors or heterologous vectors. The library may be contained in, for example, bacterial cells (e.g., E. coli) for ease of use and manipulation. In some embodiments, the cloned segment of genomic nucleic acid is about 100 kb, about 95 kb, about 90 kb, about 85 kb, about 80 kb, about 75 kb, about 70 kb, about 65 kb, about 60 kb, about 55 kb, about 50 kb, about 45 kb, about 40 kb, about 35 kb, about 30 kb, about 25 kb, about 20 kb, about 15 kb, about 14 kb, about 13 kb, about 12 kb, about 11 kb, about 10 kb, about 9 kb, about 8 kb, about 7 kb, about 6 kb, about 5 kb, about 4 kb, about 3 kb, about 2 kb, about 1 kb, or about 0.5 kb. In some embodiments, the cloned sequences exclude the telomeric sequences or sub-telomeric sequences of nucleic acid. In other embodiments, the cloned sequences include the telomeric sequences or sub-telomeric sequences of nucleic acid. In other embodiments, the deleted segments of genomic nucleic acid are about 100 kb, about 95 kb, about 90 kb, about 85 kb, about 80 kb, about 75 kb, about 70 kb, about 65 kb, about 60 kb, about 55 kb, about 50 kb, about 45 kb, about 40 kb, about 35 kb, about 30 kb, about 25 kb, about 20 kb, about 15 kb, about 14 kb, about 13 kb, about 12 kb, about 11 kb, about 10 kb, about 9 kb, about 8 kb, about 7 kb, about 6 kb, about 5 kb, about 4 kb, about 3 kb, about 2 kb, about 1 kb, or about 0.5 kb.

In some embodiments, the synthetic chromosome includes a centromere cassette comprising an origin of replication. Origins of replication are regions of DNA from which DNA replication during the S phase of the cell cycle is primed. While yeast origins of replication, termed autonomously replicating sequences (ARS), are fully defined (Theis et al., Proc. Natl. Acad. Sci. USA 94: 10786-10791, 1997), there does not appear to be a specific corresponding origin of replication sequence in mammalian DNA (Grimes and Cooke, Human Molecular Genetics, 7(10): 1635-1640, 1998). There are, however, numerous regions of mammalian DNA that can function as origins of replication (Schlessinger and Nagaraja, Ann. Med., 30: 186-191, 1998; Dobbs et al., Nucleic Acids Res. 22:2479-89, 1994; and Aguinaga et al., Genomics 5:605-11, 1989). It is known that for every 100 kb of mammalian DNA sequence there is a sequence that will support replication, but in practice sequences as short as 20 kb can support replication on episomal vectors (Calos, Trends Genet. 12:463-466, 1996). This indicates that epigenetic mechanisms, such as CpG methylation patterning, likely play some role in replication of DNA (Rein et al., Mol. Cell. Biol. 17:416-426, 1997).

In some embodiments, the origin of replication of a disclosed synthetic chromosome (SC) can be any size that supports replication of the SC. One way of ensuring that the SC has a functional ori sequence is to require that the SC contain at least 5 kb of genomic DNA. In other embodiments, it contains at least 10 kb, 15 kb, 20 kb, 25 kb, 30 kb, 35 kb, 40 kb, 45 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, or 100 kb of genomic DNA. In general, any region of DNA could be used as an origin of replication. If there is replication of the SC, the origin of replication is functioning as desired.

The origin of replication of the SC can be obtained from any number of sources, including particularly any number of sources of eukaryotic DNA. By way of example, it can be any region of eukaryotic DNA that is not based on a repeat sequence, such as the alphoid DNA sequence. A native alphoid DNA sequence does not contain an origin of replication, because the repeat sequences are so small, for example about 170 base pairs, and can be repeated many times, so that there is not enough variation for an origin of replication sequence to be present. However, in many instances these regions, when they contain multiple alphoid DNA repeats, can function as origins of replication in eukaryotic cells, such as human cells (see, e.g., U.S. patent publication No. 2004/0245317). Also included in the SCs as described herein is a centromere region. It is understood that a centromere region broadly defines a functional stretch of nucleic acid that allows for segregation of the SC during the cell cycle and during mitosis. Thus, in some embodiments, a centromere cassette comprises an origin of replication, a centromere sequence, and an adapter arm flanking both the origin of replication and the centromere sequence, wherein the adapter arm is complementary to at least one other adapter region of the cloning cassette. Optionally, the centromere cassette also may include one or more selectable markers.

Various cloning vectors for use with embodiments of the invention are known in the art. Preferred cloning vectors include bacterial artificial chromosomes (BACs), which are DNA constructs capable of accommodating larger inserts than plasmids. BACs have the advantage of being able to capture large genomic segments of up to approximately 350 kilobases, thereby facilitating the cloning of entire genes and other nucleic acid sequences, including noncoding regions and regulatory elements. Hallmark features of BACs include, but are not limited to, a P1 or F origin of replication; an antibiotic resistance gene, which may be one of several genes conferring resistance to kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, or chloramphenicol, among others; a parA and/or parB sequence for partitioning F plasmid DNA to daughter cells during cell division; and a polylinker region. A suitable BAC vector backbone may accommodate an entire gene or other large locus, or a library of inserts generated from a fractionated genome.

Similar to BACs, yeast artificial chromosomes (YACs) may be used to accommodate relatively large payloads of polynucleotides. In some implementations, YACs may be preferable to BACs for the cloning of polynucleotides into a cell. Hallmark features of YACs may include, but are not limited to, an ARS sequence, which functions similarly to the bacterial origin of replication and allows the YAC to replicate autonomously and extrachromosomally; CEN sequences, which are the attachment points for kinetochore complexes and confer mitotic stability by allowing faithful segregation and maintenance of copy number during cell division; and a selectable marker, which may include, but is not limited to, URA3, LEU2, HIS3, TRP1, ADE2, LYS2, or MET15, an autoselection system such as URA3, FBA1, POT/TPI, or CDCx, or a dominant selectable marker gene including neomycin phosphotransferase (kan), hygromycin B phosphotransferase gene (hph), nourseothricin N-acetyltransferase (nat), phosphinothricin acetyltransferase (pat), phleomycin resistance (ble), acetamidase gene of Aspergillus nidulans (amdSYM), or thymidine kinase (tk). YACs may accommodate relatively large polynucleotide insert sizes up to 3000 kb in length (Dunnen, et al., Hum Mol Genet, 1(1): 19-28 (1992)). Various BACs and YACs are described, for example, in U.S. Pat. No. 11,583,556, 10,99,542.

Thus, in some embodiments, the one or more yeast artificial chromosomes or bacterial artificial chromosomes comprises one or more selectable markers, and one or more homology arms that are complementary to one or more adapter regions of the hook regions of the cloning cassette.

In some embodiments, the one or more cloning vectors are BACs/YACs, and may comprise one or more selectable markers, an origin of replication, a centromere sequence, one or more sop genes, an oriV sequence, an oriS sequence, one or more rep genes, or a combination thereof. Nucleic acids can be introduced into a host cell via any known means such as, but not limited to, conjugation, transformation, transduction, electroporation, or a combination thereof.

The following Examples are intended to illustrate the above invention and should not be construed to narrow its scope. One skilled in the art will readily recognize that the Examples suggest many other ways in which the invention could be practiced. It should be understood that numerous variations and modifications may be made while remaining within the scope of the invention.

EXAMPLE 1

Example 1. CReATiNG synthetic chromosomes from natural DNA

A system for in vivo cloning and reprogramming of natural DNA. CReATiNG involves cloning segments of natural chromosomes in yeast donor cells and then programmably assembling these segments into synthetic chromosomes in different recipient cells. To clone a target segment, we cotransform three reagents into cells that constitutively express Cas9 (Figs. 1A and B; Fig. 5; Table 1): 1) in vitro transcribed guide RNAs (gRNAs) that direct Cas9 to cut a segment on each side, excising it from a chromosome; 2) a linear Bacterial Artificial Chromosome/Yeast Artificial Chromosome (BAC/YAC) cloning vector flanked by homology to the ends of a segment, enabling capture of a segment in vivo by homologous recombination; and 3) a repair template comprised of a dominant drug marker flanked with homology arms that allow a cell to reconstitute its broken chromosome, resulting in replacement of a segment with a marker.

Prior to cloning a given target segment, we modify the cloning vector by introducing a ~500 bp cloning cassette that is synthesized de novo. A cloning cassette contains segment-specific homology arms separated by restriction sites and bounded by I-SceI sites, which are absent from the yeast nuclear genome. Between the I-SceI sites and homology arms, we also include 100 bp DNA sequences (adapters) that are not present in the S. cerevisiae genome. These adapters are used to program how segments will assemble later, as different segments with the same adapters will recombine in vivo. After transformation with the cloning reagents, cells containing successful cloning events can be isolated by selection on the markers in the cloning vector and the repair template. Cloned segments can then be extracted from yeast donor cells, transformed into E. coli for amplification, and extracted from E. coli for assembly in yeast recipient cells.

Initial assembly of a chromosome using CReATiNG. To prototype CReATiNG, we used Chromosome I (ChrI), a 230 kb chromosome containing 117 known or predicted protein-coding genes. ChrI is the smallest chromosome in the Saccharomyces genus and shows synteny between species. In silico, we divided S. paradoxus ChrI into three non-overlapping segments between 51 and 64 kb, which contained the entire chromosome except the centromere, subtelomeres, and telomeres (Fig. 6A). To assemble segments in their natural order and orientations, we used distinct adapters to specify the junctions between segments 1 and 2, segment 2 and the centromere cassette, and the centromere cassette and segment 3. We excluded subtelomeres from all subsequent work, as they are completely dispensable and highly variable across Saccharomyces strains and species. Telomeres were excluded because they are not amenable to cloning and we assemble chromosomes as circular, rather than linear, molecules.

To enable cloning of the S. paradoxus ChrI segments, we generated a Cas9-expressing version of the S. paradoxus CBS5829 strain. We then performed three transformations, each targeting a different segment (Tables 2 and 3). For each transformation, we checked five random colonies by amplifying each junction between a cloned segment and the vector (Figs. 6B-D; Table 4). 14 of 15 (93%) colonies showed successful cloning based on amplification of both junctions (Table 5). After transfer to and amplification in E. coli, the three S. paradoxus ChrI segments were extracted, liberated from the vector by I-SceI digestion, and purified (Fig. 7A).

Next, we assembled the segments into a circular chromosome by co-transforming them, the centromere cassette with appropriate adapters, and a centromere-free BAC/YAC into the BY4742 (BY) reference strain of S. cerevisiae (Fig. 1C). Following transformation, we selected recipient cells in which the five molecules had assembled. Of five colonies checked by PCR of junctions between assembled segments, four (80%) contained the complete assembly (Table 6). We then performed whole genome Oxford Nanopore sequencing of a single yeast clone and confirmed the presence of two copies of ChrI, one from BY and one from S. paradoxus (Fig. 1D).
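The way shared adapters program the order of the five co-transformed molecules can be illustrated with a minimal sketch. The adapter labels (A0 through A4) are hypothetical, as the text does not name individual adapters; each part is modeled only as a pair of end adapters, and parts that share an adapter are assumed to join by homologous recombination.

```python
# Each part is described by its (left adapter, right adapter).
# Adapter names A0..A4 are hypothetical labels chosen for illustration.
parts = {
    "segment1": ("A0", "A1"),
    "segment2": ("A1", "A2"),
    "centromere_cassette": ("A2", "A3"),
    "segment3": ("A3", "A4"),
    "BAC/YAC": ("A4", "A0"),
}


def assemble_circle(parts, start):
    """Follow matching adapters from `start` until the circle closes."""
    by_left = {left: name for name, (left, _) in parts.items()}
    if len(by_left) != len(parts):
        raise ValueError("each left adapter must be unique")
    order = [start]
    current = start
    while True:
        right = parts[current][1]
        nxt = by_left.get(right)
        if nxt is None:
            raise ValueError(f"no part begins with adapter {right}")
        if nxt == start:
            break  # the last part joins back to the first: circle closed
        if nxt in order:
            raise ValueError("adapters form a loop that skips the start part")
        order.append(nxt)
        current = nxt
    if len(order) != len(parts):
        raise ValueError("not all parts joined the circle")
    return order


order = assemble_circle(parts, "segment1")
print(" -> ".join(order))
```

Changing which adapters flank a segment changes the order returned, which is the same principle used later for restructured chromosomes.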

To produce a euploid strain containing only S. paradoxus ChrI, we conditionally destabilized and selected against BY ChrI (Fig. 7B). Chromosome elimination involves disrupting centromere function with an inducible GAL1 promoter and counter-selecting a URA3 marker on the chromosome using 5-FOA. We verified complete elimination of BY ChrI by PCR (Fig. 7C) and Oxford Nanopore Technologies sequencing (Fig. 1D). We also used the Oxford Nanopore data to confirm that the euploid strain with synthetic S. paradoxus ChrI had the expected sequence and structure genome-wide, with the exception of a single point mutation in a non-functional portion of the centromere cassette.

While the remaining work in this manuscript was conducted with circular chromosomes, some chromosome synthesis applications could require linear chromosomes. To confirm that it is possible to convert chromosomes synthesized by CReATiNG from circular to linear forms, we linearized the synthetic S. paradoxus ChrI. We used Cas9 to introduce double-strand breaks near each junction between the chromosome and the cloning vector. In addition to gRNAs, we co-transformed repair templates for both chromosome ends, each of which contained a synthetic telomere seed (Murray et al., Nature 305, 189-193 (1983)) and a distinct selectable marker. By selecting for the markers in both repair templates, we obtained cells containing a linear S. paradoxus ChrI (Fig. 8).

Recombination of chromosomes between strains and species. After confirming that CReATiNG can be used to build synthetic chromosomes that replace the native chromosomes in recipient cells, we explored potential applications. The first application was to synthetically recombine chromosomes between strains and species, which could aid efforts to study the genetic basis of heritable phenotypes. Relative to the crosses conventionally used to generate recombinants, the advantages of CReATiNG are that it does not require mating, meiosis, or natural synteny. Additionally, CReATiNG allows three or more parental chromosomes to recombine in a single assembly. The main constraint of CReATiNG for synthetically recombining chromosomes is that at present it cannot be applied genome wide.

To use CReATiNG to recombine chromosomes synthetically, we next cloned the three ChrI segments from BY and another S. cerevisiae strain, the vineyard isolate RM11-1a (RM). During cloning, we appended the same adapters that were used for S. paradoxus segments, making it possible to generate all possible syntenic combinations of the BY, RM, and S. paradoxus segments (Fig. 2A). As we had already generated a strain containing an entirely S. paradoxus ChrI, we individually assembled the 26 remaining possible chromosomes. These assemblies had efficiencies between 20 and 100% based on PCR examination of five colonies per transformation (Fig. 2B; Table 6). After elimination of native ChrI, a single successful assembly per design was then confirmed by Oxford Nanopore Technologies sequencing. In a few cases, cells with correct assemblies also possessed additional partial assemblies, which were eliminated by multiple rounds of streaking for single colonies followed by PCR confirmation of loss.
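The count of 26 remaining chromosomes follows from three donor backgrounds at each of three segment positions (3 × 3 × 3 = 27 syntenic designs, minus the all-S. paradoxus chromosome already built). A minimal sketch of that enumeration, using the short label "Spar" for S. paradoxus as an illustrative convention:

```python
from itertools import product

# Donor backgrounds available for each of the three ChrI segment positions.
donors = ["BY", "RM", "Spar"]

# Every syntenic design picks one donor for segment 1, 2, and 3 in order.
designs = list(product(donors, repeat=3))

# The fully S. paradoxus chromosome had already been assembled.
already_built = ("Spar", "Spar", "Spar")
remaining = [d for d in designs if d != already_built]

print(len(designs), "designs in total;", len(remaining), "left to assemble")
```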

Next, we measured the growth rates of these strains with recombinant chromosomes in two conditions: rich liquid medium at 30 and 35°C (Figs. 2C and D; Table 7). S. paradoxus is known to be more sensitive to high temperature than S. cerevisiae. This phenotypic difference was also present in the donor strains. Among the 27 strains with recombinant chromosomes, growth rates varied substantially in both conditions (one-way ANOVAs, p-values < 6.6 × 10^-11). In addition, the strain carrying a fully S. paradoxus ChrI exhibited the slowest growth at both temperatures, but the difference between it and other strains was more severe at higher temperature. The results corroborate a recent finding that genetic differences on ChrI contribute to variation in thermotolerance between the two species. The 27 strains with recombinant chromosomes also provide an opportunity to measure the individual and combined phenotypic effects of the three ChrI segments. We found that two segments had significant effects on growth at 30°C (one-way ANOVAs, segment 1 p-value = 0.008 and segment 2 p-value = 0.019). However, at 35°C, all three segments had highly significant effects (one-way ANOVAs, p-values < 2.8 × 10^-5; Figs. 2E and F). These results show that all three ChrI segments contribute to growth variation across temperatures, with segment 3 in particular showing a strong interaction with temperature. We also used the data to measure epistasis among the segments and detected significant pairwise and three-way genetic interactions (F tests, p-values < 3.6 × 10^-4; Table 8). Thus, CReATiNG is a useful tool for studying the contribution of genetic factors, including genotype-by-environment interactions and epistasis, to trait differences between strains and species.

Chromosome restructuring. CReATiNG can also be used to experimentally probe the structural rules underpinning chromosome organization, a topic relevant to genome function and evolution. Recent work suggests yeast can tolerate a diversity of chromosome structures, but most of these studies preserved the order of naturally linked genes. CReATiNG makes it possible to restructure the contents of a chromosome in specific non-natural configurations that are programmed using adapters. CReATiNG can be used to synthesize chromosomes with one or more inversions, duplications, deletions, or modifications to gene order.

To demonstrate how CReATiNG can be used in chromosome restructuring, we re-cloned segments 1 through 3 from BY. During this round of cloning, we modified the adapters appended to each segment, making it possible to assemble the segments in all possible orders without inverting any segment. Using these re-cloned segments, we designed five non-natural ChrI structures with the same content but different orders (i.e., 1-3-2, 2-3-1, 2-1-3, 3-1-2, and 3-2-1) (Fig. 3A). We produced euploid strains with each non-natural ChrI structure by assembling segments with appropriate adapters and then eliminating the native ChrI. Each assembly was verified by junction PCRs and Oxford Nanopore sequencing (Fig. 3A; Table 9).
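The five non-natural structures are simply the orderings of three segments other than the natural 1-2-3 order (3! = 6 orderings in total). A short sketch that enumerates them, with segments represented by their numbers:

```python
from itertools import permutations

natural = (1, 2, 3)

# All orderings of the three segments, without inverting any segment.
orders = list(permutations(natural))

# Drop the natural order to leave the non-natural structures.
non_natural = [o for o in orders if o != natural]
labels = ["-".join(str(s) for s in o) for o in non_natural]

print(labels)
```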

While all five strains possessing restructured versions of ChrI were viable, they also showed substantial phenotypic variation. The 2-3-1 configuration exhibited a 7% growth improvement relative to the natural 1-2-3 configuration, which had been generated earlier in the work on synthetic recombinants (Fig. 3B; Table 10). By contrast, the 3-1-2 and 1-3-2 configurations respectively showed growth reductions of 18% and 68% relative to the natural 1-2-3 configuration. This result shows that relocating segment 2 to the natural position of segment 3 significantly impedes growth. However, the degree of this impairment also depends on the locations of segments 1 and 3. These findings show how programmably restructuring chromosomes with CReATiNG can be used to identify non-natural chromosome configurations with phenotypic effects.

Multiplex gene deletion and chromosome streamlining. Another application of CReATiNG is highly multiplexed deletion, a task that remains challenging for conventional genome editing technologies. Multiplexed deletion could enable the generation of streamlined chromosomes in which many non-essential genetic elements have been eliminated. Such streamlining can facilitate the production of yeast strains with a substantially reduced gene complement, as well as the reorganization of functionally related genes into modules. CReATiNG simplifies multiplexed deletion: segments of a natural chromosome that should be retained can be cloned and assembled, cleanly deleting all intervening parts of a natural chromosome.

We designed a CReATiNG workflow to delete 10 regions of ChrI, summing to 30 kb and containing 18 non-adjacent genetic elements in aggregate (12 protein-coding genes, 2 tRNAs, 3 LTRs, and 1 LTR retrotransposon; Fig. 4A; Tables 11-13). We chose these elements because they are all annotated as non-essential and no synthetic lethal interactions have been reported among them. This design required cloning 11 segments, which ranged in size from 3.8 kb to 21 kb (Tables 14 and 15), and programming them with appropriate adapters for assembly. As a reminder, we also exclude subtelomeres from synthetic chromosomes, which in BY comprise 61.7 kb and 24 protein-coding genes, 1 pseudogene, and 2 LTRs (Table 16). In total, the multiplex deletion design reduced ChrI by 39.9% (91.7 kb) and eliminated 45 genetic elements.
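The totals quoted above can be checked with a short calculation. This sketch uses only figures stated in the text; the 230 kb ChrI size is the approximate value given earlier, so the percentage is approximate as well.

```python
CHRI_KB = 230.0          # approximate size of ChrI stated earlier in the text
core_deleted_kb = 30.0   # ten targeted core regions
subtelomere_kb = 61.7    # subtelomeres excluded from every synthetic ChrI

# Genetic elements removed, grouped as in the text.
core_elements = {"protein-coding genes": 12, "tRNAs": 2,
                 "LTRs": 3, "LTR retrotransposons": 1}
subtelomere_elements = {"protein-coding genes": 24, "pseudogenes": 1, "LTRs": 2}

total_removed_kb = core_deleted_kb + subtelomere_kb
percent_removed = 100 * total_removed_kb / CHRI_KB
total_elements = sum(core_elements.values()) + sum(subtelomere_elements.values())

print(f"{total_removed_kb:.1f} kb removed ({percent_removed:.1f}% of ChrI), "
      f"{total_elements} genetic elements eliminated")
```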

After cloning, we co-transformed the 11 segments into BY along with a centromere cassette and a linearized BAC/YAC vector. ADE1, a gene encoding an adenine biosynthesis enzyme whose loss of function causes colonies to accumulate red pigment, was present in one of the regions targeted for deletion. Thus, after native ChrI elimination, we picked 10 red colonies and checked each at all assembly junctions by PCR. While none of the red colonies possessed all deletions, a single red colony had nine of the 10 deletions, a finding confirmed by Oxford Nanopore Technologies sequencing (Fig. 4B). This strain with the multiplex deletion exhibited significantly slower growth than a strain with a synthetic BY ChrI lacking the deletions (212 and 94 minute doubling times, respectively; t-test, p-value = 6.1xl0 1 ; Fig. 4C).

Of the 10 core chromosome regions targeted for deletion, the only region retained in the colony with nine deletions was between segments 6 and 7. The single gene residing in this region is SYN8, which encodes a SNARE protein involved in vesicle fusion with membranes. Re-examination of the red colonies found that SYN8 was retained in all 10, suggesting that it genetically interacts with one or more of the other deleted elements. To test for such an interaction, we deleted SYN8 from the strain with the other deletions. In this context, deleting SYN8 increased doubling time to 394 minutes, an 86% increase over the multiple deletion strain and a 319% increase over the strain with a synthetic BY ChrI lacking the deletions (Fig. 4C; Table 17). By contrast, SYN8 deletion had no effect on BY (difference = 1 minute; t-test, p-value = 0.96; Fig. 4C).
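The percentage increases reported here follow directly from the doubling times given in the text (94, 212, and 394 minutes). A brief sketch of the arithmetic:

```python
def percent_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return 100 * (new - old) / old


# Doubling times in minutes, as reported in the text.
synthetic_by_chri = 94       # synthetic BY ChrI lacking the deletions
multiplex_deletion = 212     # strain carrying nine of the ten deletions
multiplex_plus_syn8 = 394    # same strain after also deleting SYN8

vs_multiplex = percent_increase(multiplex_plus_syn8, multiplex_deletion)
vs_synthetic = percent_increase(multiplex_plus_syn8, synthetic_by_chri)

print(f"{vs_multiplex:.0f}% over the multiple deletion strain")
print(f"{vs_synthetic:.0f}% over the synthetic BY ChrI strain")
```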

These results imply that multiplex deletion uncovered an unknown genetic interaction between SYN8 and other elements on ChrI, which converted the normally dispensable SYN8 into a quasi-essential gene. Such unknown genetic interactions represent a major obstacle for efforts to streamline chromosomes and genomes, as they will cause strains carrying synthetic chromosomes to show slow growth and poor tractability in the lab. However, the SYN8 example shows that synthetic chromosomes generated by CReATiNG can overcome such unknown interactions via recombination with native chromosomes prior to native chromosome elimination (Fig. 4D).

CReATiNG makes it possible to build synthetic chromosomes with diverse designs using natural components. Because CReATiNG employs cloned segments of natural chromosomes as opposed to small DNA fragments synthesized de novo, it is substantially cheaper and faster than de novo chromosome synthesis. For example, some of the final chromosomes completed for this disclosure went from in silico design to in vivo testing within a month and cost less than five hundred dollars to produce. Although some synthetic chromosome designs will require complete chromosome reprogramming, which is not possible with CReATiNG, many will not. Indeed, we have shown here that CReATiNG can be used to study a variety of fundamental questions in genetics, genomics, and evolution. Moreover, we unexpectedly found an additional benefit of CReATiNG, which is that it can allow cells to recover from unknown design flaws via recombination between a synthetic chromosome and its native counterpart.

Most of the work in this disclosure involved simple synthetic chromosome designs in which only three segments were assembled. However, we also demonstrated in the multiplex deletion experiment that CReATiNG can be used to build synthetic chromosomes with complex designs involving 10 segments or more. Using CReATiNG to make synthetic chromosomes with complex designs could lead to important biological discoveries. For example, while here we synthetically recombined chromosomes at only two sites, this number could be increased, potentially by a large amount, facilitating fine scale genetic mapping of heritable traits within and between species. Similarly, more complex modifications of chromosome structure could be used to identify natural design principles governing chromosome architecture. CReATiNG can also likely be used to delete larger numbers of linked, non-adjacent regions from chromosomes than explored here, which could facilitate the streamlining of the yeast genome.

CReATiNG has potential applications beyond those shown in this disclosure. CReATiNG can be paired with de novo chromosome synthesis to enable projects that might not otherwise be feasible. For example, chromosomes with Saccharomyces architecture but sequences from other non-Saccharomyces species could be synthesized de novo and then recombined with Saccharomyces chromosomes using CReATiNG. Such an experiment would facilitate study of the genetic basis of reproductive isolation and trait differences between phylogenetically distant organisms. In addition, CReATiNG and de novo chromosome synthesis could be employed in combination to efficiently relocate genes in the same pathways, complexes, or cellular processes to common genetic modules. Further yet, with some modifications, CReATiNG in yeast may be employed to produce synthetic chromosomes, large DNA constructs, or gene variant libraries that could then be transferred to other systems, such as mammalian cells. These diverse applications highlight how CReATiNG democratizes the use of chromosome synthesis in biological research.

Table 1: Sequences of vectors pASC1, pASC2, and the centromere cassette.

Table 2: Cloning cassettes for segments 1 through 3 in all experiments except multiplex deletion.

Each cassette contains a pair of upstream and downstream adapters flanking segment-specific homology arms separated by XhoI and AvrII sites used for vector linearization. The same cloning cassettes were used in BY and RM, while different cloning cassettes were used for S. paradoxus.

Upstream adapter (italicized); upstream homology arm (bold); downstream homology (underlined); and downstream adapter (italicized and underlined)

Table 3: gRNAs used to clone the three segments in the initial chromosome substitution, synthetic recombination, and restructuring experiments. Column 1 (‘gRNAs’) lists the gRNA IDs. The annotations S1, S2 and S3 represent segments 1, 2 and 3, respectively. The ‘up’ and ‘down’ annotations refer to the side of a segment targeted by a guide. Column 2 (‘sequence’) lists the target sequence of each gRNA.

Table 4: Primers used to confirm cloning of segments 1 through 3 for all experiments except multiplex deletion. Column 1 (‘Primers’) lists primer IDs. Column 2 (‘Sequence’) contains the nucleotide sequence of each primer.

Table 5: Cloning efficiency of chromosome I segments 1 through 3 from BY, RM, and S. paradoxus. Column 2 (‘Size’) lists the size of each target segment from all 3 different strains. Column 2 (‘Upstream junction check’) lists the number of positive PCR reactions from the upstream junction of a target segment and the capture vector. Column 3 (‘Downstream junction check’) lists the number of positive PCR reactions from the downstream junction of a target segment and the capture vector. Column 4 (‘Cloning efficiency’) lists the overall efficiency of segment cloning based on the rate of positive PCR reactions at both the upstream and downstream junctions.

Table 6: Efficiency of assembly of synthetic chimera ChrI in BY4742 and elimination of its native ChrI. Column 1 ('Strain') lists the ID of all the strains containing chimeric versions of ChrI. Columns 2-4 ('Segment1', 'Segment2' and 'Segment3') list the strain-specific segments associated with each chimera ChrI in distinct chimera strains. Column 5 ('Assembly efficiency') lists the efficiency of assembling the synthetic ChrI for each chimera cell based on PCR checking of 5 junctions across the assembled ChrI. Column 6 ('Native ChrI elimination efficiency') lists the efficiency of elimination of native BY ChrI based on PCR checking of segments covering the native ChrI.

Table 7: Phenotypic assay of all BY ChrI chimera strains growing on rich media at two temperatures, 30 and 35°C. Column 1 ('Strain') lists IDs of each chimera strain. Columns 2 through 4 ('Segment1', 'Segment2' and 'Segment3') list the strain-specific segments of each chimera strain. Column 5 ('Doubling time at 30°C (min)') lists the doubling time of each chimera strain growing on rich media at 30°C. Column 6 ('Doubling time at 35°C (min)') lists the doubling time of each chimera strain growing on rich media at 35°C.

Table 8: Full factorial ANOVA table for growth of chimera ChrI cells at 35°C. PVE (phenotypic variance explained). Interaction terms are denoted by ‘

Table 9: Efficiency of assembly of synthetic restructured ChrI in BY and elimination of its native ChrI. Column 1 ('Strain') lists the ID of all the strains containing restructured versions of ChrI. Columns 2-4 ('Position1', 'Position2' and 'Position3') list the order of assembled segments for each restructured version of chromosome I. Column 5 ('Assembly efficiency') lists the efficiency of assembling the restructured ChrI for each strain based on PCR checking of 5 junctions across the assembled ChrI. Column 6 ('Native ChrI elimination efficiency') lists the efficiency of elimination of native BY ChrI based on PCR checking of segments covering the native ChrI. A total of 12 replicates per strain were phenotyped in both conditions.

Table 10: Phenotypic assay of strains containing the restructured versions of BY ChrI. Column 1 ('Strain') lists IDs of each strain carrying a restructured version of ChrI. Columns 2-4 ('Position1', 'Position2' and 'Position3') list the order of assembled segments for each restructured version of chromosome I. Column 5 ('Doubling time at 30°C (min)') lists the doubling time of each restructured ChrI strain growing on rich media at 30°C. A total of 9 replicates per strain were phenotyped.

Table 11: Cloning cassettes for chromosome I segment capture in Saccharomyces cerevisiae BY4742 used for multiplex deletion. Each module contains a pair of upstream and downstream adaptors flanking segment-specific homology arms that are separated by a site containing XhoI and AvrII sites used for vector linearization. Column 1 ('Cloning module') lists the ID of each cloned region. Column 2 ('Sequence') contains the nucleotide sequence of each specific cloning module.

Upstream adaptor (italicized); upstream homology arm (bold); downstream homology (underlined); and downstream adaptor (italicized and underlined)

Table 12: Location and size of each target segment on ChrI for cloning or deletion during the multiplex deletion assay. Column 1 ('Segment') lists the IDs of all target segments. Column 2 ('Start') lists the start position of each segment relative to S. cerevisiae ChrI coordinates. Column 3 ('End') lists the end position for each segment. Column 4 ('Size(bp)') lists the size of each segment.

Table 13: Genetic elements removed from the ChrI multiple deletion strain. Column 1 ('Features') lists the standard ID of each genetic element removed from ChrI. Column 2 ('Systematic feature') lists the systematic feature of each genetic element removed from ChrI. Column 3 ('Feature type') gives the type of feature for each genetic element. Column 4 ('Coordinates') lists the start and end position of each removed element based on ChrI sequence coordinates. Column 5 ('Location') assigns each removed genetic feature to a deleted segment in the multiple deletion ChrI strain. A given deleted segment may contain one or more listed features.

Table 14: gRNAs used to clone segments for the multiplex deletion experiment. Column 1 ('gRNAs') lists the gRNA IDs. The ‘up’ and ‘down’ annotations refer to the side of a segment targeted by a guide. Column 2 ('sequence') lists the target sequence of each gRNA.

Table 15: Primers used to confirm capture of segments 1 through 11 in the multiple deletion assay. Column 1 (‘Primers’) lists primer IDs. Column 2 (‘Sequence’) contains the nucleotide sequence of each primer.

Table 16: Genetic elements present on both sub-telomeric regions in S. cerevisiae BY ChrI. Both sub-telomeres (left and right) and consequently their genetic elements were removed from all the synthetic ChrI constructs. Column 1 ('Features') lists the standard ID of each genetic element present on sub-telomeres. Column 2 ('Systematic feature') lists the systematic feature of each genetic element present on sub-telomeres. Column 3 ('Feature type') gives the type of feature for each genetic element. Column 4 ('Coordinates') lists the start and end position of each element based on ChrI sequence coordinates. Column 5 ('Location') lists whether a given feature is located on the left or right subtelomere of ChrI.

Table 17: Phenotypic assay of strains generated during the ChrI multiple deletion assay and their controls. Column 1 ('Strain') lists IDs of each strain. The 'ChrI multiple deletion strain' contains a version of ChrI where 15 out of the 16 targeted genetic elements were removed in a single yeast transformation. The 'ChrI multiple deletion strain syn8Δ' is the former 'ChrI multiple deletion strain' whose gene SYN8 was deleted through CRISPR/Cas9 cutting and replacement. The 'BY synthetic ChrI strain' contains a circular assembled version of ChrI without the sub-telomeres but with a preserved core region. The 'BY syn8Δ' strain is the parental wild-type BY4742 strain whose gene SYN8 was deleted by using CRISPR/Cas9 cutting and replacement. The 'BY4742 strain' is the wild-type parental S. cerevisiae strain. Column 2 ('Doubling time at 30°C (min)') lists the doubling time of each strain growing on rich media at 30°C.

Table 18: Sequences of vectors used for S. paradoxus ChrI linearization and SYN8 deletion.

Table 19: gRNAs used during S. paradoxus ChrI linearization and SYN8 deletion. Column 1 ('gRNAs') lists the gRNA IDs. The ‘up’ and ‘down’ annotations refer to the side of a segment targeted by a guide. Column 2 ('sequence') lists the target sequence of each gRNA.

Example 2. Material and Methods

Production of target-specific cloning vectors. To produce the BAC/YAC cloning vector pASC1, we performed a four-piece Gibson assembly (Gibson et al., Nat Methods 6, 343-5 (2009)) with the pCC1BAC CopyControl plasmid from Epicentre Biotechnologies, a portion of the pRS316 plasmid (Sikorski et al., Genetics 122, 19-27 (1989)) (ATCC #77145) that included ARSH4, CEN6, and URA3, and two DNA blocks containing I-SceI sites. To create the necessary homology for Gibson assembly, pCC1BAC and the portion of pRS316 were amplified by PCR with tailed primers. The two DNA blocks containing I-SceI sites were ordered from Twist Bioscience and also contain a multiple cloning site (MCS) that is used as a homology sequence between the two blocks. Correct assembly of all four pieces produced a vector with a multiple cloning site flanked by two I-SceI sites. To prepare this cloning vector for a specific target segment, we add a ~500 bp cloning cassette to pASC1 at the MCS. Each cloning cassette contains two ~150 bp homology arms flanked by ~100 bp adapters and separated by 30 bp of restriction sites that are used for vector linearization. Cloning cassettes were ordered from Twist Bioscience. Addition of a cloning cassette to pASC1 was done by restriction digestion and ligation. Equimolar amounts of pASC1 and a cloning cassette were digested with EcoRI and SphI and ligated using T4 DNA ligase. After addition of a cloning cassette, the vector was transformed into TransforMax EPI300 cells (LGC Biosearch Technologies) and high copy number was induced with Epicentre’s CopyControl system via a copy control induction solution (LGC Biosearch Technologies). Large quantities of vectors were then harvested by ZymoPURE II plasmid midiprep kit (Zymo Research).
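The cassette layout described above can be written out as a simple feature map. This is a sketch only: the lengths are the approximate values given in the text, the element order follows the description of adapters flanking the homology arms, and the coordinates are illustrative rather than taken from any actual cassette sequence.

```python
# Ordered elements of one cloning cassette with approximate lengths in bp.
cassette_layout = [
    ("upstream adapter", 100),
    ("upstream homology arm", 150),
    ("restriction sites (XhoI/AvrII)", 30),
    ("downstream homology arm", 150),
    ("downstream adapter", 100),
]

# Convert lengths into 1-based, inclusive coordinates along the cassette.
features = []
position = 0
for name, length in cassette_layout:
    features.append((name, position + 1, position + length))
    position += length

total_bp = position
for name, start, end in features:
    print(f"{start:>4}-{end:<4} {name}")
print(f"total: {total_bp} bp (about 500 bp)")
```

Linearizing the vector at the central restriction sites leaves one homology arm at each end of the linear molecule, which is what allows it to capture the excised segment.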

In vitro transcription of gRNAs. The gRNAs for all CRISPR/Cas9 cutting were produced by in vitro transcription (Kannan et al., Sci Rep 6, 30714 (2016)). For a given gRNA, we generated a dsDNA template by fusing two ssDNA oligonucleotides, one including the tracrRNA and the other the target-specific crRNA, using PCR. After PCR products were purified using the DNA Clean and Concentrator-5 kit, we combined 150 ng of purified PCR product with 10 ul of RiboMAX 2X buffer and 2 ul of T7 express enzyme from the T7 RiboMAX Express Large Scale RNA Production System (Promega). Water was used to bring each reaction to a total volume of 20 ul. In vitro transcription reactions were incubated at 37°C for >4 hrs. We then added 2 ul of DNase and incubated reactions for an additional 18 mins at 37°C. We cleaned gRNAs using the RNA Clean & Concentrator-5 kit (Zymo Research) and stored them at -20°C until use.
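The reaction set-up above reduces to simple volume arithmetic. The following Python sketch computes the template and water volumes for a 20 ul reaction; the function name ivt_reaction and the example template concentration of 50 ng/ul are hypothetical, while the 150 ng, 10 ul, 2 ul, and 20 ul values are taken from the text.

```python
def ivt_reaction(template_ng_per_ul, template_ng=150.0, buffer_ul=10.0,
                 enzyme_ul=2.0, total_ul=20.0):
    """Return the volumes (ul) of each component of one in vitro transcription reaction."""
    if template_ng_per_ul <= 0:
        raise ValueError("template concentration must be positive")
    template_ul = template_ng / template_ng_per_ul
    water_ul = total_ul - buffer_ul - enzyme_ul - template_ul
    if water_ul < 0:
        # The template is too dilute to fit 150 ng into the remaining volume.
        raise ValueError("template too dilute for the requested total volume")
    return {
        "template_ul": template_ul,
        "buffer_ul": buffer_ul,
        "enzyme_ul": enzyme_ul,
        "water_ul": water_ul,
    }


# Example with an assumed template concentration of 50 ng/ul.
recipe = ivt_reaction(50.0)
print(recipe)
```

Under these volumes, at most 8 ul remains for the template, so a purified PCR product below 18.75 ng/ul would need to be concentrated first.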

Cloning of natural genomic segments. We clone target segments by co-transforming a linear version of pASC1 that possesses homology arms matching the ends of a target segment, a repair template containing kan (KanMX), and gRNAs into a Saccharomyces strain that constitutively expresses Streptococcus pyogenes Cas9 either from a chromosomally-encoded construct or from a plasmid. Prior to transformation, the cloning vector is linearized by cutting between the homology arms using XhoI and AvrII. The repair template is produced through a PCR reaction using a modified pRS316 plasmid in which URA3 was replaced with KanMX as the template. The primers are designed to flank KanMX and contain 40 bp homology tails that match genomic sites adjacent to a target segment. Yeast cells were transformed with 200 ng of linearized vector, 200 ng of repair template, and 1 ug total of a mix of multiple gRNAs. Typically, we included six distinct gRNAs, three targeting each side of a segment. Cells were transferred to 2 mm electroporation cuvettes and electroporated at 2.5 kV, 200 Ω, and 25 uF. Transformants were recovered for 2-3 hours in YPDS, a 50:50 mix of YPD (2% glucose, 1% yeast extract, and 2% peptone) and 1 M sorbitol, and plated on SC Ura- plates containing G418 to select for the pASC1 vector and use of the repair template, respectively. After two days, transformants were checked by colony PCR at both junctions between pASC1 and a cloned segment.

Amplification of cloned segments. Cloned segments were extracted from yeast using the ZymoPURE II plasmid midiprep kit (Zymo Research) and the following steps. The yeast strain containing a cloned segment was inoculated into 50 ml of SC lacking uracil and containing G418, and grown for 12-16 hrs at 30°C on a shaking incubator. Cells were pelleted and washed twice using sterile water. Cells were then converted to spheroplasts by resuspension in 3 ml of lysis buffer (1 M sorbitol, 100 mM EDTA pH 8, and 14 mM β-mercaptoethanol) and 1 ml of lyticase (15 U/ul). This suspension was incubated at 37°C for 1 hr with shaking and inspected for spheroplasts every 15 mins. After the majority of cells had their cell wall removed (>75%), the spheroplasts were pelleted and washed with sterile water twice. The remaining steps proceeded according to the manufacturer's recommendations. To prevent DNA shearing, vortex steps were avoided, and wide bore pipette tips were employed. The plasmid containing the cloned segment was then transformed into EPI300 cells (LGC Biosearch Technologies) and transformants were selected by growth on LB chloramphenicol plates (30 ug/ml). Presence of the correct cloned segment was checked by junction PCRs and confirmed transformants were grown overnight at 37°C on a shaking incubator. High copy plasmid induction was done by transferring 4.5 ml of overnight culture to 45 ml of LB supplemented with chloramphenicol (30 ug/ml) and 50 ul of copy control induction solution (LGC Biosearch Technologies). Cells were grown for 5 hrs and plasmids were extracted using the midiprep kit described above, following the manufacturer's recommendations. The cloned segments were liberated from the cloning vector by digestion with I-SceI, followed by 0.5% agarose gel electrophoresis at 70 V for 90 mins, and purification with the Zymoclean Large Fragment DNA Recovery kit (Zymo Research).
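The repair-template primers described under "Cloning of natural genomic segments" carry 40 bp homology tails matching the genomic sites adjacent to a target segment. The Python sketch below shows one way such tailed primers could be composed from flanking genomic sequence and a marker-annealing sequence. The function names, the placeholder flanks, and the placeholder annealing sequences are hypothetical and are not the primers used in this disclosure.

```python
def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tailed_primers(upstream_flank, downstream_flank, fwd_anneal, rev_anneal,
                   tail_len=40):
    """Compose forward and reverse primers, each a homology tail plus an annealing region.

    upstream_flank / downstream_flank: top-strand genomic sequence immediately
    5' and 3' of the site to be repaired.
    fwd_anneal / rev_anneal: marker-annealing sequences, each written 5'->3'
    as the primer itself.
    """
    if len(upstream_flank) < tail_len or len(downstream_flank) < tail_len:
        raise ValueError("flanking sequence shorter than the requested tail")
    fwd = upstream_flank[-tail_len:].upper() + fwd_anneal.upper()
    # The reverse primer reads the bottom strand, so its tail is reverse-complemented.
    rev = reverse_complement(downstream_flank[:tail_len]) + rev_anneal.upper()
    return fwd, rev


# Placeholder inputs (hypothetical sequences, for illustration only).
upstream = "T" * 10 + "ACGT" * 10
downstream = "AAAC" * 10 + "G" * 10
fwd, rev = tailed_primers(upstream, downstream,
                          "GATTACAGATTACAGATTAC", "CATCATCATCATCATCATCA")
print(fwd)
print(rev)
```

Placing the tail at the 5' end leaves the 3' end of each primer free to anneal to the marker template during PCR.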

Construction of the centromere cassette. We generated the centromere cassette by adding kan (KanMX) and a loxP site to the pRS316 plasmid immediately after its CEN/ARS region. The pRS316 vector, the KanMX cassette, and a loxP site were amplified using primers with homology tails. The molecules were mixed in equimolar amounts and joined using Gibson Assembly Master Mix (New England Biolabs). The assemblies were then transformed into DH5α cells and the correct assembly was identified by PCR and Sanger sequencing. To generate the centromere cassettes used in synthetic chromosome assemblies, we amplified the centromere cassette region of the plasmid with tailed primers containing appropriate adapters.

Chromosome assembly. Synthetic chromosomes were assembled as circular molecules including segments, a centromere cassette with appropriate adapters, and a modified BAC/YAC vector named pASC2, which lacks CEN/ARS and contains HIS3 instead of URA3. A given assembly was performed by co-transforming >500 ng of each purified segment, 200 ng of the centromere cassette, and 200 ng of linearized pASC2 into BY. Transformation was performed using a standard PEG/LiAc method (Gietz et al., Nat. Protoc. 2, 31-34 (2007)), but extra care was taken when handling DNA solutions to avoid DNA shearing. All vortex steps were replaced by gentle manual shaking and pipetting with wide bore tips. Transformants containing assemblies were selected on SC lacking histidine and containing G418. Correct assemblies were identified by junction PCRs and confirmed by Oxford Nanopore sequencing.

Elimination of native chromosomes. We used CRISPR/Cas9 to place pGAL1 114 bp upstream of the centromere on the native ChrI in BY. In the presence of galactose, pGAL1 drives transcription through the centromere, destabilizing its function and resulting in native ChrI loss in some cells (Richardson et al., Science 355, 1040-1044 (2017); Hill et al., Mol. Cell. Biol. 7, 2397-2405 (1987)). We also marked the native ChrI with URA3, enabling selection for native ChrI loss. After assembly, cells possessing an assembly were selected on SC lacking histidine and containing G418, and then replicated onto plates that lacked histidine and contained galactose. Cells were grown for two days and then replicated onto SC plates lacking histidine that were supplemented with 5-FOA. Elimination of native ChrI was confirmed by diagnostic PCRs of sites that are present on the native ChrI but not on a synthetic ChrI.

Linearization of synthetic S. paradoxus ChrI. The BY strain containing the assembled S. paradoxus ChrI was first transformed with a modified version of the pML104 Cas9 plasmid, which had URA3 replaced with HIS3 (Table 18). To linearize the S. paradoxus ChrI, we used electroporation to co-transform two telomere cassettes and two pairs of in vitro transcribed gRNAs that target the ends of the pASC2 vector (Table 19). The left telomere cassette was generated by amplification of URA3 in the pRS316 vector using primers with tails that added a telomere seed sequence and homology to S. paradoxus ChrI (For: TGTGTGGTGTGTGGTGTGTGTGGGTGTGTGGTGTGTGGGTCTGTAAGCGGATGCCGG (SEQ ID NO: 158); Rev: CTCCTTACGCATCTGTGCGTACCCTTTAAAATCTCATTGGCTCGTGATTAATTTGTTCTGTGCTGCTGAATATTCATGC (SEQ ID NO: 159)). The right telomere cassette was generated by amplifying nat (NatMX) from a modified pRS316 plasmid, which had URA3 replaced with nat (NatMX) (For: TTACATATCCTCTACACCGAGCGCGTCGACCCGTCGAATGGTTTAGCTTGCCTTGTCCCC (SEQ ID NO: 160); Rev: GGCGGCGTTAGTATCGAATCCACCCACCACACACACCCACACACACCACACACCCACCCA (SEQ ID NO: 161)). Linearization occurs when the gRNAs create double strand breaks at both ends of pASC2, which are repaired by homologous recombination between S. paradoxus ChrI and the left and right telomere cassettes. Strains with a linear ChrI will have URA3 and nat (NatMX) and will lack the HIS3 present on pASC2. Linearization was confirmed phenotypically and by PCR of the linearization sites.

Deletion of SYN8. We used Cas9 to delete SYN8 from BY and the ChrI multiple deletion strain. Both strains were electroporated with the pML104 Cas9 vector that had URA3 replaced with HIS3. We used a 2 mm cuvette and 2.5 kV, 200 Ω, and 25 uF. Transformants were selected on SC His-. We then performed a second electroporation with 200 ng of a nat (NatMX) repair template with homology arms targeting the sites adjacent to the cut and 1 ug total of gRNAs targeting SYN8 (Table 19). We added the homology arms to nat (NatMX) by tailed PCR using the modified pRS316 vector as the template (For: GGGCTATAAAGTATATATAGATACAAATATATGATGAATCGTTTAGCTTGCCTTGTCCCC (SEQ ID NO: 162); Rev: GAATAAAATTTCCCAGCACGACTTTGATCACCCGAAAGGGGGCGGCGTTAGTATCGAATC (SEQ ID NO: 163)). Transformants were recovered for 2-3 hours in YPDS and plated on YPD plates containing 200 ug/ul of nourseothricin (Goldbio, USA).

Genomic DNA extraction. Strains were streaked from a -80°C freezer stock onto YPD plates containing G418 and allowed to grow for two days at 30°C. A single colony was then used to inoculate a 5 ml overnight culture in YPD containing G418, which was placed in a shaker at 30°C. The 5 ml overnight culture was then inoculated into 50 ml of YPD containing G418 in a 250 ml Erlenmeyer flask. The cells were shaken overnight at 30°C. Prior to extraction, these cultures were normalized to 7.0 x 10^9 cells in YPD. Cells were then placed into 50 ml Falcon tubes, centrifuged at 3000 x g for 10 mins at 4°C, and the supernatant was decanted. Cell pellets were gently washed with 10 ml of PBS and centrifuged again at 3000 x g for 10 mins at 4°C. The supernatant was then decanted, and a fresh 10 ml of PBS was added. This process was repeated twice. After the wash steps, 4 ml of Y1 buffer (1 M sorbitol, 100 mM EDTA pH 8.0, and 14 mM beta-mercaptoethanol, which was added immediately before use) was added to the cells along with 1 ml of lyticase (15 U/ul). The tubes were transferred back to the shaking incubator at 30°C for >1 hr, with hand inversion done every 10 mins to prevent the cells from settling. OD was checked periodically and once the cells had lost 80% of their OD660, spheroplasting was considered complete. Spheroplasts were centrifuged at 5000 x g for 10 mins at 4°C, and the supernatant was decanted and replaced with 5 ml of Qiagen G2 buffer with 15 ul of RNase A and 300 ul of Proteinase K. The solution was incubated at 55°C for >2 hrs. A Qiagen 100/G Genomic-tip was equilibrated with 4 ml of QBT buffer. While this was flowing through, the samples were centrifuged at 5000 x g for 10 mins at 4°C to spin down cell debris. The sample was passed through the column, collected, and passed through the column a second time. The column was washed with 2 x 7.5 ml of buffer QC, then eluted with 5 ml of buffer QF. The collected elution was passed through the column a second time, then 3.5 ml of molecular grade isopropanol was added along with 0.01x volumes of filter-sterilized sodium acetate. The samples were placed at -20°C to precipitate for 48 hrs. After precipitation, the samples were spun down at 21000 x g for 10 mins at 4°C to collect the DNA in a Lo-bind tube. The ethanol was decanted, and the remainder was allowed to evaporate for 1 min. An extraction buffer (EB) of 1,080 ul of 10 mM Tris-HCl, 1 mM EDTA pH 9.0, and 45 ul of Triton X-100 0.5% (v/v) was prepared fresh before each use. 200 ul of EB was added to the DNA. The DNA was then allowed to dissolve at 55°C for 30 mins followed by 2-3 hours at 30°C.
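The spheroplasting endpoint described above (loss of 80% of the starting OD660) can be expressed as a small check. This Python sketch is illustrative; the function name spheroplasting_complete and the example readings are hypothetical.

```python
def spheroplasting_complete(initial_od660, current_od660, required_loss=0.80):
    """Return True once the culture has lost at least `required_loss` of its starting OD660."""
    if initial_od660 <= 0:
        raise ValueError("initial OD660 must be positive")
    fractional_loss = (initial_od660 - current_od660) / initial_od660
    return fractional_loss >= required_loss


# Example readings (hypothetical): 75% loss is not yet complete, 85% loss is.
print(spheroplasting_complete(1.0, 0.25))
print(spheroplasting_complete(1.0, 0.15))
```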

Oxford Nanopore library preparation. For Oxford Nanopore library preparation, the SQK-LSK109 and EXP-NBD104 protocols were used with the following modifications. At the end of each bead clean up, DNA was eluted in EB at 37°C for 10 mins to allow for the recovery of larger molecules. The library was then barcoded and multiplexed with up to 10 samples. These multiplexes were sequenced on R9.4.1 chemistry, MIN-106 flow cells. The .fast5 archive was tarballed, gunzipped, and uploaded onto USC's Center for Advanced Research Computing (CARC) Discovery cluster. Basecalling was performed using Guppy (v. 6.0.1), which was contained in a Singularity container, using the configuration dna_r9.4.1_450bps_sup.cfg with 16 threads and a V100 GPU. The .fastq files were demultiplexed and trimmed of barcodes in Guppy using standard parameters.

Structural analysis of synthetic chromosomes. For each strain carrying a synthetic chromosome, we built a reference genome in silico using appropriate data from BY, RM, or S. paradoxus CBS5829. Because all assemblies were performed in BY, Chromosomes II-XVI, as well as the mitochondrial genome and 2-um plasmid, were always taken from the BY reference genome. The only part of the reference genomes that varied was ChrI. Reads were mapped against the appropriate reference using minimap2 (Li et al., Bioinformatics 34, 3094-3100 (2018)) (v. 2.24-r1122, -ax map-ont). Using samtools (Danecek et al., GigaScience 10, giab008 (2021); Bonfield et al., GigaScience 10, giab007 (2021)) (v1.15, htslib v1.15, view -bS), the .sam file was converted into a .bam file, sorted, and indexed. We then used bamtools (v2.5.2) to split the .bam file by chromosome. The .bam file for ChrI was extracted and used in both Nanocaller and Sniffles, with variant calls being made when both programs agreed upon a variant and the .bam files could be visually inspected to verify each call. Reads spanning adapters were extracted using samtools (v1.15, htslib v1.15, view) and checked by visual inspection.
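The mapping steps above can be sketched as an ordered list of command lines. The Python sketch below only builds the commands and does not run them. The function name build_mapping_commands and all file names are hypothetical. The -ax map-ont and view -bS options are those stated in the text, whereas the -o output options and the bamtools split -in/-reference arguments are assumptions about how the steps might be invoked, not the exact commands used here.

```python
def build_mapping_commands(reference_fasta, reads_fastq, prefix):
    """Return the map, convert, sort, index, and split steps as argument lists."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Map Nanopore reads against the strain-specific reference.
        ["minimap2", "-ax", "map-ont", "-o", sam, reference_fasta, reads_fastq],
        # Convert SAM to BAM.
        ["samtools", "view", "-bS", "-o", bam, sam],
        # Sort and index the alignments.
        ["samtools", "sort", "-o", sorted_bam, bam],
        ["samtools", "index", sorted_bam],
        # Split the sorted BAM into one file per reference sequence (chromosome).
        ["bamtools", "split", "-in", sorted_bam, "-reference"],
    ]


# Hypothetical file names for one strain.
commands = build_mapping_commands("BY_with_synthetic_ChrI.fa", "sample.fastq", "sample")
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be passed to subprocess.run(cmd, check=True) on a system where the tools are installed. Keeping the commands as data makes the order of the steps easy to inspect before anything is executed.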

Growth assays. Most phenotyping was done using liquid growth assays on a BioTek ELx808 96-well plate reader. A given strain was grown overnight in YPD at 30°C and 1.2 ul was then inoculated into 118.8 ul of YPD in the appropriate wells of a 96-well plate. A randomized block design was used to mitigate positional effects. The plates were incubated with shaking at 30°C or 35°C on the plate reader and OD600 was acquired every 15 mins until cultures reached stationary phase. The doubling time for each culture was calculated using PRECOG (PREsentation and Characterization Of Growth-data) (Fernandez-Ricaud et al., BMC Bioinformatics 17, 249 (2016)). We also performed dilution spot assays. Overnight cultures were grown and their ODs were measured. Cell aliquots were diluted in YPD to an OD of 1. We then performed 10-fold serial dilutions (1:10, 1:100, 1:1,000, 1:10,000). 3 ul of each dilution, including non-diluted overnight culture, were pinned onto appropriate plates and incubated at 30°C or 35°C. Plates were imaged using a BioRAD Gel Doc XR+ Molecular Imager at a standard size of 11.4 x 8.52 cm (width x length) with epi-illumination and an exposure time of 0.5 s. Images were saved as 600 dpi TIFFs.
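As a rough illustration of how a doubling time can be derived from OD600 readings taken every 15 mins, the Python sketch below fits a straight line to ln(OD) against time and reports ln(2) divided by the slope. This is a generic log-linear fit and is not the PRECOG algorithm used in this disclosure. The function name doubling_time and the synthetic readings are hypothetical, and the readings passed in are assumed to come from the exponential phase only.

```python
import math


def doubling_time(times_min, ods):
    """Estimate doubling time (minutes) from a least-squares fit of ln(OD) versus time."""
    if len(times_min) != len(ods) or len(times_min) < 2:
        raise ValueError("need at least two paired readings")
    logs = [math.log(od) for od in ods]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx  # growth rate per minute
    if slope <= 0:
        raise ValueError("no growth detected in the supplied window")
    return math.log(2) / slope


# Synthetic exponential-phase readings every 15 mins with a true doubling time of 90 mins.
times = [15 * i for i in range(9)]
ods = [0.05 * 2 ** (t / 90) for t in times]
dt = doubling_time(times, ods)
print(round(dt, 2))
```

In practice the exponential window has to be chosen before fitting, because lag and stationary-phase readings flatten the slope and inflate the estimate.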

While specific embodiments have been described above with reference to the disclosed embodiments and examples, such embodiments are only illustrative and do not limit the scope of the invention. Changes and modifications can be made in accordance with ordinary skill in the art without departing from the invention in its broader aspects as defined in the following claims.

All publications, patents, and patent documents are incorporated by reference herein, as though individually incorporated by reference. No limitations inconsistent with this disclosure are to be understood therefrom. The invention has been described with reference to various specific and preferred embodiments and techniques. However, it should be understood that many variations and modifications may be made while remaining within the spirit and scope of the invention.