Title:
KARYOCREATE (KARYOTYPE CRISPR ENGINEERED ANEUPLOIDY TECHNOLOGY)
Document Type and Number:
WIPO Patent Application WO/2024/055002
Kind Code:
A1
Abstract:
Provided is a fusion protein comprising a mutated kinetochore protein and dCas9. The fusion protein is used in conjunction with guide RNAs that target the fusion protein to a location of kinetochore assembly on a centromere such that the fusion protein interferes with chromosome segregation. Use of the fusion protein and the guide RNAs in cells results in the cells acquiring an aneuploidy karyotype. Expression vectors that encode the fusion proteins and/or the guide RNAs and their uses in the method of producing an aneuploidy karyotype are also provided.

Inventors:
DAVOLI TERESA (US)
BOSCO NAZARIO (US)
Application Number:
PCT/US2023/073784
Publication Date:
March 14, 2024
Filing Date:
September 08, 2023
Assignee:
UNIV NEW YORK (US)
International Classes:
C07K19/00; C07K14/47; C12N5/10; C12N9/22; C12N15/62
Foreign References:
US20040029197A1 2004-02-12
Other References:
KUHL LISA-MARIE, MAKRANTONI VASSO, RECKNAGEL SARAH, VAZE ANIMISH N, MARSTON ADELE L, VADER GERBEN: "A dCas9-Based System Identifies a Central Role for Ctf19 in Kinetochore-Derived Suppression of Meiotic Recombination", GENETICS, vol. 216, no. 2, 1 October 2020 (2020-10-01), pages 395 - 408, XP093150524, ISSN: 1943-2631, DOI: 10.1534/genetics.120.303384
TOVINI LAURA, JOHNSON SARAH C., ANDERSEN ALEXANDER M., SPIERINGS DIANA CAROLINA JOHANNA, WARDENAAR RENÉ, FOIJER FLORIS, MCCLELLAND: "Inducing Specific Chromosome Mis-Segregation in Human Cells", BIORXIV, 19 April 2022 (2022-04-19), pages 1 - 25, XP093150563, [retrieved on 20240411], DOI: 10.1101/2022.04.19.486691
BAJAJ RAKHI; BOLLEN MATHIEU; PETI WOLFGANG; PAGE REBECCA: "KNL1 Binding to PP1 and Microtubules Is Mutually Exclusive", STRUCTURE, ELSEVIER, AMSTERDAM, NL, vol. 26, no. 10, 9 August 2018 (2018-08-09), AMSTERDAM, NL , pages 1327, XP085495325, ISSN: 0969-2126, DOI: 10.1016/j.str.2018.06.013
MCVEY SHELBY L, OLSON MISCHA A, PAWLOWSKI WOJCIECH P, NANNAS NATALIE J: "Beyond editing, CRISPR/Cas9 for protein localization: an educational primer for use with “A dCas9-based system identifies a central role for Ctf19 in kinetochore-derived suppression of meiotic recombination”", GENETICS, vol. 222, no. 1, 30 August 2022 (2022-08-30), XP093150569, ISSN: 1943-2631, DOI: 10.1093/genetics/iyac109
BOSCO NAZARIO, GOLDBERG ALEAH, JOHNSON ADAM F, ZHAO XIN, MAYS JOSEPH C, CHENG PAN, BIANCHI JOY J, TOSCANI CECILIA, KATSNELSON LIZA: "KaryoCreate: a new CRISPR-based technology to generate chromosome-specific aneuploidy by targeting human centromeres", BIORXIV, 28 September 2022 (2022-09-28), pages 1 - 56, XP093150579, [retrieved on 20240411], DOI: 10.1101/2022.09.27.509580
Attorney, Agent or Firm:
WATT, Rachel et al. (US)
Claims:
What is claimed is:

1. A fusion protein comprising a mutated kinetochore protein and dCas9.

2. The fusion protein of claim 1, wherein the kinetochore protein comprises a segment of KNL1 protein, wherein the segment of the KNL1 protein comprises at least the first 86 N-terminal amino acids of the KNL1 protein, and wherein the first 86 N-terminal amino acids comprise a mutation of the sequence RVSF to AAAA, or S24A to S60A.

3. The fusion protein of claim 2, wherein the segment of KNL1 protein comprises the sequence

MDGVSSEANEENDNIERPVRRRHSSILKPPRSPLQDLRGGNETVQESNALRNKKNSRAAAAADTIKVFQTESHMKIVRKSEMEETE (SEQ ID NO: 1) or

MDGVSSEANEENDNIERPVRRRHASILKPPRSPLQDLRGGNETVQESNALRNKKNSRRVAFADTIKVFQTESHMKIVRKS (SEQ ID NO: 2).

4. A composition comprising the fusion protein of any one of claims 1-3.

5. The composition of claim 3, further comprising at least one guide RNA that targets the fusion protein to a location of kinetochore assembly on a centromere such that the fusion protein interferes with chromosome segregation.

6. A method comprising introducing into cells in vitro a fusion protein of any one of claims 1-3 and at least one guide RNA that targets the fusion protein to a location of kinetochore assembly on a centromere of a specific chromosome such that the fusion protein interferes with segregation of the chromosome, and allowing cell division in the presence of the fusion protein and the guide RNA such that cell division results in divided cells that comprise an aneuploidy karyotype.

7. The method of claim 6, wherein the aneuploidy karyotype comprises a gain of a chromosome.

8. The method of claim 6, wherein the aneuploidy karyotype comprises a loss of a chromosome.

9. The method of claim 6, wherein the aneuploidy karyotype is associated with a malignant cell phenotype.

10. An isolated population of cells which comprise an aneuploidy karyotype made by the method of claim 6.

11. The isolated population of cells of claim 10, wherein the aneuploidy karyotype comprises a loss of a chromosome.

12. The isolated population of cells of claim 10, wherein the aneuploidy karyotype comprises a gain of a chromosome.

13. The isolated population of cells of claim 10, wherein the aneuploidy karyotype is associated with a malignant cell phenotype.

14. A kit comprising a fusion protein of any one of claims 1-3 or an expression vector encoding the fusion protein, and optionally one or more guide RNAs that target the fusion protein to a location of kinetochore assembly on a centromere, or one or more polynucleotides that encode the one or more guide RNAs.

15. A method comprising selecting a guide RNA that targets a location of kinetochore assembly on a centromere of a specific chromosome, and introducing into cells a combination of the selected guide RNA and a fusion protein comprising a mutated kinetochore protein and dCas9, and allowing cell division in the presence of the selected guide RNA and the fusion protein such that divided cells comprise an aneuploidy karyotype.

16. An expression vector encoding a fusion protein of any one of claims 1-3.
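As an illustrative aid to reading claims 2 and 3, the following minimal Python sketch checks where the recited substitutions fall in SEQ ID NO: 1 and SEQ ID NO: 2. The sequences are copied from claim 3 with extraction whitespace removed; the helper names `residue` and `motif_start` are hypothetical and are not part of the application.

```python
# Sequences as recited in claim 3 (whitespace from text extraction removed).
SEQ_ID_NO_1 = (
    "MDGVSSEANEENDNIERPVRRRHSSILKPPRSPLQDLRGGNETVQESNALRNKKNSR"
    "AAAAADTIKVFQTESHMKIVRKSEMEETE"
)
SEQ_ID_NO_2 = (
    "MDGVSSEANEENDNIERPVRRRHASILKPPRSPLQDLRGGNETVQESNALRNKKNSR"
    "RVAFADTIKVFQTESHMKIVRKS"
)


def residue(seq: str, position: int) -> str:
    """Return the amino acid at a 1-based position (hypothetical helper)."""
    return seq[position - 1]


def motif_start(seq: str, motif: str) -> int:
    """Return the 1-based start of the first match, or 0 if the motif is absent."""
    return seq.find(motif) + 1


if __name__ == "__main__":
    # SEQ ID NO: 1 carries AAAA where the RVSF motif would sit (positions 58-61).
    print("SEQ ID NO: 1 length:", len(SEQ_ID_NO_1))
    print("SEQ ID NO: 1 residues 58-61:", SEQ_ID_NO_1[57:61])
    # SEQ ID NO: 2 carries alanine at positions 24 and 60 (S24A, S60A).
    print("SEQ ID NO: 2 residue 24:", residue(SEQ_ID_NO_2, 24))
    print("SEQ ID NO: 2 residue 60:", residue(SEQ_ID_NO_2, 60))
    print("SEQ ID NO: 2 RVAF starts at:", motif_start(SEQ_ID_NO_2, "RVAF"))
```

Position 60 is the serine within the RVSF motif, so the S60A substitution changes that motif to RVAF, while the RVSF-to-AAAA variant replaces all four residues of the motif.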

Description:
KaryoCreate (Karyotype CRISPR Engineered Aneuploidy Technology)

CROSS REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. provisional application no. 63/375,181, filed September 9, 2022, the entire disclosure of which is incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with government support under grant nos. 4R00CA212621-03 and R37CA248631, awarded by the National Institutes of Health. The government has certain rights in the invention.

SEQUENCE LISTING

The instant application contains a Sequence Listing, which is submitted in .xml format and is hereby incorporated by reference in its entirety. Said .xml file is named “KaryoCreate.xml”, was created on September 1, 2023, and is 519,038 bytes in size.

RELATED INFORMATION

Aneuploidy, i.e. chromosomal gains or losses, is rare in normal tissues 1-3 as it causes cellular stress phenotypes 4,5 . Despite its detrimental effect, aneuploidy is common in cancer, where specific chromosomes tend to be gained or lost more frequently than others 2-6 . We and others have proposed that recurrent patterns of aneuploidy are selected for in cancer to maximize oncogene dosage and minimize tumor-suppressor gene dosage 4,7 .

A challenge in studying aneuploidy is the lack of straightforward methods to generate cell models with a specific chromosome added or removed. Common methods to induce aneuploidy utilize chemical inhibition of mitotic proteins, e.g. MPS1, resulting in random chromosome missegregation 8,9 . Microcell-mediated chromosome transfer induces chromosome gains but this method is quite complicated 10,11 . Centromere inactivation of the Y chromosome can induce its missegregation 12,13 . Newer strategies to induce chromosome losses involve using CRISPR/Cas9 to eliminate all or part of chromosomes 5,14,15 . Other recently described methods use non-centromeric repeats to induce specific losses or, more rarely, gains of chromosomes 1 and 9 16,17 .

Human centromeres contain repetitive α-satellite DNA hierarchically organized in megabase-long arrays called higher-order repeats (HOR), a subset of which bind CENPA, a histone H3 variant critical to kinetochore function 18-21 . In humans, HORs are generally specific to individual chromosomes: 15 autosomes and the 2 sex chromosomes have unique centromeric arrays 19 and the rest can be grouped in two families based on centromere similarity (chromosomes 1, 5, 19 and chromosomes 13, 14, 21, 22). CENPA-bound centromeric sequences direct the kinetochore assembly which enables microtubule binding to mitotic chromosomes 22 . The KMN network (KNL1/MIS12 complex/NDC80 complex) is important in modulating kinetochore-microtubule attachments 23 . In mitosis, each sister kinetochore must be attached to opposite spindle poles to allow their equal and correct segregation 24 . Properly attached chromatids experience an inter-kinetochore mechanical tension required to satisfy the spindle assembly checkpoint (SAC) and allow progression into anaphase 24,25 . SAC activation triggers the activity of Aurora B kinase, which destabilizes kinetochore-microtubule attachments by phosphorylating different targets including NDC80 and KNL1 26,27 . Aurora B activity is counteracted by the action of PP1 phosphatase, recruited to the kinetochores through KNL1 28 . The balance between kinase and phosphatase activities determines the fate of the kinetochore-microtubule attachment and the timing of the metaphase-to-anaphase transition. In view of these complexities and the lack of previous methods to induce specific chromosome gains and to produce aneuploidy, there is an ongoing need to provide alternatives to the existing methods. The disclosure is pertinent to this need.

BRIEF SUMMARY

Aneuploidy, the presence of chromosome gains or losses, is a hallmark of cancer and congenital syndromes, such as Down Syndrome. The present disclosure provides compositions and methods for producing aneuploidy. The disclosure provides an approach to generating aneuploidy that is referred to herein as KaryoCreate (Karyotype CRISPR Engineered Aneuploidy Technology). KaryoCreate comprises a CRISPR/Cas9-based technology that uses gRNAs targeting chromosome-specific human centromeric repeats to direct a mutant KNL1/dCas9 construct that interferes with normal mitotic functions, generating chromosome-specific aneuploidy. Using this method, the disclosure demonstrated production of cell models of highly recurrent aneuploidies in human gastro-intestinal cancers and presents data supporting tumor-associated phenotypes occurring after chromosome 18q loss in colorectal cells. The disclosure thus includes a system that enables generation of chromosome-specific aneuploidies by co-expression of a single guide (sg)RNA targeting chromosome-specific CENPA-binding α-satellite repeats together with dCas9 fused to a mutant form of KNL1.

The disclosure includes unique and highly specific sgRNAs for 21 out of 24 human chromosomes. Further, 15 chromosomes out of 24 were validated by imaging and 10 out of 24 were validated by KaryoCreate. The disclosure may be adaptable for use with the remaining human chromosomes, and for use with cells from non-human animals. Expression of the sgRNAs with KNL1Mut-dCas9 leads to missegregation and induction of gains or losses of the targeted chromosome in cellular progeny with an average efficiency of 8% and 12% for gains and losses, respectively (up to 20%), tested and validated across 10 chromosomes. Using KaryoCreate in colon epithelial cells, we show that chromosome 18q loss, a frequent occurrence in gastrointestinal cancers, promotes resistance to TGFβ, likely due to synergistic hemizygous deletion of multiple genes. Thus, the disclosure provides a new technology to create and study chromosome missegregation and aneuploidy in the context of cancer and other conditions that are correlated with the presence of aneuploidy. In one non-limiting embodiment, engineered chromosome 18q loss using a described system promotes tumor-associated phenotypes in colon-derived cells.
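To put the stated average efficiencies (8% for gains, 12% for losses) in practical terms, the following is a rough worked calculation of how many single-cell clones might need to be screened to recover at least one clone with the desired event. It is a sketch only: it assumes each clone independently carries the event with the stated average probability and ignores any growth advantage or disadvantage of aneuploid cells. The function name `clones_to_screen` and the 95% confidence level are illustrative assumptions, not part of the disclosure.

```python
import math


def clones_to_screen(event_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - event_rate)**n >= confidence.

    Assumes independent clones that each carry the event with probability
    `event_rate` (a simplification made for this example).
    """
    if not 0.0 < event_rate < 1.0:
        raise ValueError("event_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - event_rate))


if __name__ == "__main__":
    for label, rate in (("gain", 0.08), ("loss", 0.12)):
        print(f"{label}: screen about {clones_to_screen(rate)} clones")
```

Under these assumptions, roughly 36 clones would be screened for a gain and 24 for a loss to reach a 95% chance of finding at least one positive clone.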

DESCRIPTION OF FIGURES

Figures 1A-1F. Prediction and validation of chromosome-specific sgRNAs targeting human α-satellite centromeric sequences. (A) Schematic representation of the computational prediction of chromosome-specific centromeric sgRNAs based on specificity score and predicted efficiency. (B) Idiogram of human karyotype reporting the number of sgRNAs predicted with specificity >99% and validated by imaging for each chromosome. (C) Left: Proliferation assay of centromeric sgRNAs in hCECs expressing Cas9 or empty vector (EV). sgRNAα-β refers to an sgRNA specific for chromosome α, where β is the sgRNA serial number. Percentage of live cells relative to EV determined 7 days after transduction by cell counting. Mean and S.D. (standard deviation) are from triplicates; p-values are from Wilcoxon test comparing each condition to NC (*=p<0.05); conditions with significant p-values are in red. Imaging validation is also indicated (see (D)). Right: Western blot showing Cas9 expression. (D) Top: Imaging validation of centromere targeting in hCEC clones (containing 3 copies of chr7 or chr13) expressing 3xmScarlet-dCas9 and the indicated sgRNAs. Representative images of interphase are shown (percentages of cells displaying the expected number of foci are in Table S1). Scale bars: 5 µm. Bottom: Low-pass WGS confirming specific aneuploidies in the two clones. (E) Imaging of hCECs (trisomic for chr7) expressing sgRNA7-1 or sgRNA18-4 showing colocalization of 3xmScarlet-dCas9 foci (red) and chromosome 7 or 18 centromeric FISH probes (green); FISH protocol was used after PFA fixation. Colocalization is quantified at right (mean and S.D. from triplicates). (F) Validation of additional sgRNAs as in (D).

Figures 2A-2H. KNL1Mut-dCas9 targeted to centromeres induces modest mitotic delay and chromosome missegregation. (A) Left: Maps of KNL1RVSF/AAAA-dCas9 and dCas9-KNL1RVSF/AAAA constructs. Right: Western blot showing the expression of the indicated constructs in hCECs. (B) Top: Time-lapse imaging of hCECs expressing H2B-GFP, KNL1Mut-dCas9, and the indicated sgRNA. Cells were analyzed for time spent in mitosis and for lagging chromosomes (quantified in C and D), and representative images are shown. Bottom: Analysis performed in H2B-GFP hCECs co-expressing 3xmScarlet-KNL1Mut-dCas9 and sgChr7-1, indicating specific chromosome missegregation. (C) Quantification of mitotic duration (time spent between metaphase and anaphase onset) of cells in (B) (mean and S.D. from triplicates; >25 dividing cells analyzed per condition). (D) Quantification as in (C) reporting % of mitoses showing lagging chromosomes. (E) Immunofluorescence (IF) analysis of mitotic HCT116 cells expressing KNL1Mut-dCas9 and sgChr7-1 or sgChr18-4 or sgNC stained as indicated. White arrows point to misaligned chromosomes. (F) Quantification of chromosome congression defects in (E) (mean and S.D. from triplicates). (G) Analysis of micronuclei in hCECs expressing KNL1Mut-dCas9 and sgChr7-1, sgChr18-4, or sgNC. The percentage of cells with micronuclei relative to EV was determined 7 days after transduction (mean and S.D. from triplicates; >50 cells per condition). (H) Representative images and quantification of chr18-containing micronuclei in cells treated as in (G), from triplicate experiments.

Figures 3A-3G. KNL1Mut-dCas9 is recruited to human centromeres and allows induction of chromosome-specific gains and losses. (A) KaryoCreate conceptualization: Chromosome specificity of human α-satellite centromeric sequences makes it possible to induce missegregation of a specific chromosome while leaving the others unaffected. (B) Western blot showing the expression of KaryoCreate constructs in hCECs, either through transient transfection with a constitutive promoter (pHAGE-CMV) or through infection with a doxycycline (Doxy)-inducible promoter (pIND20). (C) KaryoCreate experimental plan with transient KNL1Mut-dCas9 expression and (transient or constitutive) sgRNA expression; cells are harvested after 7-9 days for validation by FISH and can then be plated to create single-cell clones. (D) Representative FISH images using probes specific for chr7 or chr18 on hCECs showing gains and losses after KaryoCreate with the indicated sgRNAs. (E) Quantification of the experiment shown in (D) for chr7 (top) or chr18 (bottom); see also Table S2 for automated image quantification. Mean and S.D. from triplicates. Gain and loss are the first and second bars in each set of two bars, from left to right, respectively. (F) Representative metaphase spreads from hCECs treated as in (D) and analyzed by FISH using probes specific for chr7 and chr18 as indicated. (G) Quantification of FISH signals from (F) (mean and S.D. from triplicates). Gain and loss are the first and second bars in each set of two bars, from left to right, respectively.

Figure 4. KaryoCreate induces both arm-level and chromosome-level gains and losses across different human chromosomes. Heatmap depicting arm-level copy numbers inferred from scRNA-seq analysis in KaryoCreate experiments using the indicated sgRNAs. scRNA-seq was used to quantify the presence of chromosome- or arm-level gains or losses using a modified version of CopyKat (see Methods). Rows represent individual cells, columns represent chromosomes, and gains and losses are as indicated. ‘Higher expression of KNL1Mut-dCas9’ indicates that the cells were transduced with a larger amount of the construct (as in Fig. 8D). See also Table S3 for quantification of arm- and chromosome-level events.

Figures 5A-5G. Loss of 18q in colon cancer cells promotes resistance to TGFβ signaling. (A) Frequency of copy number alteration in colorectal cancer (TCGA) indicated as percentage of patients with gain or loss for each chromosome. (B) Kaplan-Meier survival analysis for colorectal cancer patients (TCGA) displaying or not displaying 18q loss (N=number). (C) Top: Shallow WGS analysis of single-cell-derived clones obtained by KaryoCreate using sgNC or sgChr18-4 performed on diploid hCECs to identify arm-level gains and losses. Each row represents a single clone. Bottom: Plots of copy number alterations from WGS of two representative clones treated with sgChr18-4. (D) Bulk RNA-seq showing differential expression analysis between clone 14 (18q loss) and clone 13 (diploid) using DESeq2 and GSEA (performed using the Hallmark gene sets); the top 7 pathways depleted in clone 14 are shown, including TGFβ signaling as the top depleted one. (E) Effects of TGFβ (20 ng/ml) on clone 13 and 14 growth monitored for 9 days. Cells were counted every 3 days in quadruplicates. p-value is from Wilcoxon test comparing the difference in cell number between treated and untreated clone 14 cultures versus the same difference calculated for clone 13 cultures. (F) Top 10 predicted tumor-suppressor genes (TSG) on 18q and their genomic locations. TSG were predicted based on the correlation between DNA and RNA levels, survival analysis, and TUSON-based q-value for the prediction of TSGs 4 (see Methods). (G) Western blot analysis for SMAD2, SMAD4, and GAPDH (as control) in clones 13 and 14. Quantification of SMAD2/SMAD4 levels after normalization against GAPDH.

Figures 6A-6E (related to Figs. 1A-1F). Prediction and validation of chromosome-specific sgRNAs targeting human α-satellite centromeric sequences. (A) Left: Proliferation assay on RPEs p21/Rb shRNA expressing Cas9 or empty vector (EV) transduced with lentiviral vectors expressing the indicated sgRNAs. The same number of cells were plated in 6-well plates and the percentage of live cells relative to EV was determined 7 days after transduction. Mean and S.D. from triplicates, p-values from Wilcoxon test (*=p<0.05). Imaging validation is also indicated in red. Right: Western blot showing Cas9 expression. (B) Left: Imaging of hCECs (47, +7) expressing 3xmScarlet-dCas9 and sgChr7-1 in the polyclonal population and in a derived clone (clone 8) with high 3xmScarlet-dCas9 expression. As compared to the polyclonal population, clone 8 contains a higher percentage of cells showing the expected foci. Average frequency of cells displaying foci is shown for the polyclonal and clonal populations (>100 cells counted; in triplicates). Right: Western blot analysis of the expression level of 3xmScarlet-dCas9 in the polyclonal population and clone 8. The percentage of cells showing foci was 45% in the hCEC polyclonal population transduced with 3xmScarlet-dCas9 and increased to 72% in clone 8. (C) Imaging of hCECs expressing 3xmScarlet-dCas9 and the indicated sgRNAs. Representative images of interphase cells are shown; the percentage of cells displaying foci is shown in Table S1. See also Fig. 1F. (D) Imaging of RPEs p21/Rb shRNA expressing 3xmScarlet-dCas9 fusion and the indicated sgRNAs. Representative images of interphase cells are shown. (E) Top: Correlation between the intensity of the signal of the 3xmScarlet-dCas9 foci (measured with ImageJ/Fiji) and the sgRNA activity score (Doench et al., 2016, 2014) of cells treated as in (C). Bottom: Correlation between the intensity of the signal of the 3xmScarlet-dCas9 foci and the number of predicted sgRNA binding sites on the specific centromere (based on the T2T genome assembly) of cells treated as in (C). Pearson correlation coefficients and corresponding p-values are shown.

Figures 7A-7F (related to Figs. 2A-2H). Analysis of KNL1Mut-dCas9 and other fusion proteins targeted to centromeres. (A) Maps of the dCas9, KNL1RVSF/AAAA-dCas9, KNL1S24A;S60A-dCas9, NDC80-CH1-dCas9, and NDC80-CH2-dCas9 constructs. The predicted function of each construct is indicated on the right. See text for details. (B) Western blot showing the expression of the indicated constructs in hCECs. (C) Western blot showing the expression of the indicated constructs, in which different mutated segments of KNL1 or NDC80 are fused to the N- or C-terminus of dCas9; see also (A). L: linker with amino acid sequence GGSGGGS (SEQ ID NO: 5). (D) Imaging of hCECs (47, +7) expressing 3xmScarlet-KNL1Mut-dCas9 and transduced with sgChr7-1 or sgChr18-4. (E) Proliferation rate of hCECs transduced with KNL1Mut-dCas9 and with the indicated sgRNAs. Mean and S.D. from triplicates are shown for each time point. (F) FISH imaging and quantification of micronuclei containing chromosome 7 or 13 in hCECs treated with KNL1Mut-dCas9 and the indicated sgRNA (as in Fig. 2G); quantification of micronuclei counts is shown below. Experiments were performed in duplicates, and for each replicate, at least 100 cells were scored.

Figures 8A-8H (related to Figs. 3A-3G). Analysis of KNL1Mut-dCas9 and other fusion proteins targeted to centromeres for the induction of chromosome-specific gains and losses. (A) KaryoCreate experiment in hCECs comparing the efficiency of different methods for delivering KNL1Mut-dCas9, as quantified by FISH. Methods: (1) transfection of pHAGE-KNL1Mut-dCas9, in which expression of KNL1Mut-dCas9 is driven by the CMV promoter; (2) lentiviral-mediated transduction with pIND20-KNL1Mut-dCas9, whereby the vector is integrated in the genome of the target cells and expression of KNL1Mut-dCas9 is driven by doxycycline treatment (1 µg/ml); (3) lentiviral-mediated transduction with pHAGE-DD-KNL1Mut-dCas9, whereby expression of KNL1Mut-dCas9 is driven by treatment with shield-1 to stabilize the protein. All cells were transduced with sgChr7-1, and FISH quantification of chr7 gains/losses is shown (mean and S.D. from triplicates). Gain and loss are the first and second bars in each set of two bars, from left to right, respectively. (B) KaryoCreate experiment comparing the efficiency of different constructs in inducing chromosome gains and losses. hCECs were transduced with sgChr7-1 and the indicated constructs. FISH quantification for chr7 gains/losses is shown (mean and S.D. from triplicates), along with the aneuploidy level (% of chr7 gains/losses) normalized to the expression level of each construct (as in Fig. 7B). Note that after normalization, the induction of aneuploidy is greatest for NDC80-CH2-dCas9 and is higher for KNL1S24A;S60A-dCas9 than for KNL1RVSF/AAAA-dCas9. (C) Left: Western blot analysis of the indicated constructs. Right: KaryoCreate experiment to compare the efficiency of different constructs in inducing chromosome gains and losses. hCECs were transduced with sgChr7-1 and the indicated constructs, and FISH quantification of chr7 gains/losses is shown. Gain and loss are the first and second bars in each set of two bars, from left to right, respectively. (D) Left: Western blot analysis of dCas9 expression in hCECs transduced with KNL1Mut-dCas9 using different amounts of virus (about 3 times more virus in the HIGH versus LOW sample, i.e. MOI of 6 for HIGH and 2 for LOW). The corresponding quantification (through ImageJ) is shown below. Right: FISH quantification of chr7 gains/losses in cells expressing KNL1Mut-dCas9 transduced with sgChr7-1 using different amounts of virus at 9-10 days after transduction (mean and S.D. from duplicates). (E) FISH quantification of chr7 gains/losses in hCECs transduced with KNL1Mut-dCas9 and with sgChr7-1 and/or sgChr7-3 (mean and S.D. from duplicates). (F) Single-cell sequencing quantification of chr9 gains/losses in hCECs transduced with KNL1Mut-dCas9 and with sgNC, sgChr9-3 and/or sgChr9-5 (mean and S.D. from technical duplicates). (G) Left: FACS sorting results for hCECs treated as in (D) using an MOI of 2 after sorting for low or high expression of the cell surface protein EPHB4, encoded by a gene on chr7. Right: FISH quantification of the % of chr7 gains or losses in each condition (N=100 nuclei; mean and S.D. from duplicates). *, p-value<0.05 (Welch two-sample t-test). Gain and loss are the first and second bars in each set of two bars, from left to right, respectively. (H) scRNA-seq analysis of chromosome or arm gains/losses (as in Fig. 4) in hCECs transduced with KNL1Mut-dCas9 (via infection with pIND20-KNL1Mut-dCas9 lentiviral vector) and sgChr7-1. Cells were treated with doxycycline for the indicated number of days to induce construct expression; experiment performed in duplicate.

Figures 9A-9I (related to Fig. 4). Analysis of KaryoCreate across chromosomes and conditions. (A) Analysis of hCEC clones with different aneuploidies by bulk WGS (top) and scRNA-seq (bottom). Arm-level copy number events were inferred from each method (see Methods) and the derived copy number profiles are shown for both methods. See also (B). (B) FISH and scRNA-seq analyses of hCEC clones with chr7 trisomy or more complex karyotypes; the percentage of aneuploid cells was quantified using both methods. Mean values from duplicates are shown. (C) A heatmap depicting gene copy numbers inferred from scRNA-seq analysis following KaryoCreate control experiments. hCECs were transduced either with empty vector or with KNL1Mut-dCas9 together with a negative control sgRNA (sgNC), and scRNA-seq was performed as in (B) to estimate % of gains and losses across chromosomes. (D) A heatmap depicting gene copy numbers inferred from scRNA-seq analysis following KaryoCreate. KaryoCreate for different individual chromosomes (or combination of chromosomes) was performed on RPEs. scRNA-seq was used to estimate the presence of chromosome- or arm-level gains or losses using a modified version of CopyKat. The median expression of genes across each chromosome arm is used to estimate the DNA copy number. The % of gains/losses for each arm (reported below each heatmap) is estimated by comparing the DNA copy number distribution of each experimental sample (chromosome-specific sgRNA) to that of the negative control (sgRNA NC; see also Methods). Heatmap rows represent individual cells, columns represent different chromosomes, and the color represents the copy number change (gain in red and loss in blue). (E) Average proportions (%) of whole-chromosome and arm-level gains/losses. The percentages of the indicated events were calculated as the average among the aneuploid cells generated using KaryoCreate for chromosomes 6, 7, 8, 9, 12, 16, and X (mean values from duplicates). (F) A heatmap depicting chromosome copy numbers inferred from scRNA-seq analysis following KaryoCreate. KaryoCreate was performed on hCECs using two sgRNAs targeting chromosome 7 (sgChr7-1) and 18 (sgChr18-4). scRNA-seq was used to estimate the presence of chromosome- or arm-level gains/losses using a modified version of CopyKat as in (D). Heatmap rows represent individual cells, columns represent different chromosomes, and the color represents the copy number change (gain in red and loss in blue). (G) Immunofluorescence (IF) assay showing DNA damage in HCT116 cells expressing KNL1Mut-dCas9 and sgNC, sgChr7-1, or sgChr18-4. IF was performed for γH2AX (green), CREST (red) to visualize centromeres, and DAPI (blue). Representative images are shown. (H) Quantification of experiment shown in (G). Left: number of DNA damage foci colocalizing with CREST in each cell, quantified and normalized to the total number of CREST foci in the cell. Right: total γH2AX signal per cell, quantified and normalized to the total DAPI signal. p-values are from Wilcoxon test. (I) Left: The total γH2AX signal per cell as determined by IF analysis of hCECs expressing KNL1Mut-dCas9 (pIND20 vector) and sgNC or sgChr7-1 for γH2AX (green) and DAPI (blue), quantified and normalized to the total DAPI signal. Right: Western blot analysis of KNL1Mut-dCas9 expression before or after treatment with doxycycline to induce construct expression. p-values are from Wilcoxon test.
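The arm-level estimate described for Figure 9D (median expression of the genes on an arm as a proxy for copy number, with gains and losses called by comparison to the negative control) can be illustrated with a minimal Python sketch. This is not the modified CopyKat pipeline referenced above. The function names, the input layout (one list of normalized expression values per cell for a single arm), and the cutoff of the control mean plus or minus `k` standard deviations are assumptions made for this example only.

```python
from statistics import mean, median, stdev


def arm_signal(gene_expression):
    """Median normalized expression across the genes of one arm in one cell."""
    return median(gene_expression)


def percent_gain_loss(sample_cells, control_cells, k=2.0):
    """Return (% cells called gain, % cells called loss) for one chromosome arm.

    A cell is called a gain (loss) when its arm signal lies more than k
    standard deviations above (below) the mean arm signal of the negative
    control cells. This cutoff rule is an assumption for illustration.
    """
    control_signals = [arm_signal(cell) for cell in control_cells]
    center = mean(control_signals)
    spread = stdev(control_signals)  # needs at least two control cells
    upper = center + k * spread
    lower = center - k * spread

    sample_signals = [arm_signal(cell) for cell in sample_cells]
    n_cells = len(sample_signals)
    gains = sum(signal > upper for signal in sample_signals)
    losses = sum(signal < lower for signal in sample_signals)
    return 100.0 * gains / n_cells, 100.0 * losses / n_cells


if __name__ == "__main__":
    # Toy values: each inner list holds expression of genes on one arm in one cell.
    control = [[0.9], [1.0], [1.1], [1.0]]
    sample = [[1.5, 1.4, 1.6], [1.0, 1.0, 1.0], [0.5, 0.4, 0.6], [1.0, 1.1, 0.9]]
    print(percent_gain_loss(sample, control))
```

Using the median rather than the mean keeps a few strongly induced or silenced genes from dominating the arm-level signal, which is one plausible reason for the choice described in the legend.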

Figures 10A-10H (related to Figs. 5A-5G). Dissection of the consequences of 18q loss in colorectal cancer. (A) Schematic of experimental plan to apply KaryoCreate across different chromosomes to derive single-cell clones with specific gains or losses. (B) Shallow WGS analysis of single-cell-derived clones obtained by KaryoCreate using sgNC or sgChr7-1 performed on diploid hCECs (as indicated). (C) Representative FISH images and copy number plots from WGS analysis of hCEC sgChr7-1 clone 23 (B) before or after 25 population doublings in culture. (D) Survival analysis (Kaplan-Meier curve) for colorectal cancer patients (TCGA - COADREAD) displaying or not displaying 18q loss, after exclusion of patients with SMAD4 point mutation. (E) Proliferation rates of the indicated hCEC clones 13 and 14 (18q loss) (as in Fig. 5E) after the overexpression of the indicated genes. Mean and S.D. are shown for triplicates; p-values are from Wilcoxon test (*=p<0.05). Proliferation rates for hCEC clones 10 and 5 (18 loss) with and without TGFβ are also shown. (F) Western blot showing SMAD2 and SMAD4 levels in hCEC clone 13 after overexpression of GFP, SMAD2, SMAD4, or SMAD2 + SMAD4. Related to Fig. 10E. (G) Proliferation rates of the indicated hCEC cell lines (clone 14 and hCEC transduced with dCas9 and a SMAD4 or NC sgRNA) when cultured in the presence of TGFβ (20 ng/ml) for 9 days; cells were counted every 3 days in triplicates. p-value is derived from the Wilcoxon test. Western blot showing SMAD4 levels in hCECs transduced with dCas9 and a SMAD4 or NC sgRNA. Related to Fig. 10G. (H) Western blot showing SMAD4 levels in hCECs transduced with dCas9 and a SMAD4 or NC sgRNA. Related to Fig. S5G.

DETAILED DESCRIPTION

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

Every numerical range given throughout this specification includes its upper and lower values, as well as every narrower numerical range that falls within it, as if such narrower numerical ranges were all expressly written herein.

As used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by the use of the antecedent “about,” it will be understood that the particular value forms another embodiment. The term “about” in relation to a numerical value is optional and means, for example, +/-10%.

This disclosure includes every amino acid sequence described herein and all nucleotide sequences encoding the amino acid sequences. Every sequence having from 80-99% similarity, inclusive, including all numbers and ranges of numbers there between, with the sequences provided here is included in the invention. All of the amino acid sequences described herein can include amino acid substitutions, such as conservative substitutions, that do not adversely affect the function of the protein that comprises the amino acid sequences. All amino acid sequences encoded by the described polynucleotides are expressly included within this disclosure. The disclosure includes all segments of described polynucleotides that contain open reading frames.

All sequences that are described by reference to a database are incorporated herein by reference as the sequences exist in the database as of the effective filing date of this application or patent. All sequences referred to in publications are incorporated herein by reference.

This disclosure provides compositions, methods, and systems referred to herein, as noted above, as KaryoCreate, a new method that combines CRISPR/Cas9 technology having chromosome specificity for human centromeric α-satellite repeats with interference with normal functions of the KMN network (in particular KNL1) to generate chromosome-specific aneuploidy. The described approach involves use of a fusion protein comprising a mutated kinetochore protein and dCas9.

In an embodiment, the kinetochore protein is KNL1 protein or a functional segment thereof. In embodiments, the KNL1 protein or the functional segment thereof comprises one or more mutations. In embodiments, the kinetochore protein comprises a segment of KNL1 protein, wherein the segment of the KNL1 protein comprises at least the first 86 N-terminal amino acids of the KNL1 protein, and wherein the first 86 N-terminal amino acids comprise a mutation of the sequence RVSF to AAAA, or S24A, or S60A, or a combination thereof.

The fusion protein may be modified to include a suitable nuclear localization signal.

In an embodiment, a KNL1 RVSF/AAAA-dCas9 fusion protein is used. In another embodiment, a KNL1 S24A;S60A-dCas9 fusion protein is used.

Any suitable linker sequence may be present between the KNL1 protein segment and the dCas9 segment. In an embodiment, a suitable linker comprises a GS sequence. In an embodiment, the linker has the sequence GGSGGGS (SEQ ID NO: 5).

In embodiments, the described fusion proteins have amino acid sequences that are encoded by the following DNA sequences:

KNL1 linker dCas9

KNL1 RVSF/AAAA-dCas9 (SEQ ID NO: 3)

ATGGATGGGGTGTCTTCAGAGGCTAATGAAGAAAATGACAATATAGAGAGACCTGTT AGAAGAC GGCATTCTTCAATATTGAAACCCCCAAGGAGTCCTCTTCAGGACCTCAGAGGTGGGAATG AAAC AGTTCAAGAGTCAAACGCGTTAAGGAATAAGAAAAACTCTCGTGCAGCCGCCGCTGCAGA TACT ATAAAGGTATTCCAGACGGAGTCTCATATGAAAATAGTGAGAAAGTCAGAAATGGAAGAA ACAG AA ggcggttccggcggagggtcg

GACAAGAAGTACAGCATCGGCCTGGCCATCGGCACCAACTCTGTGGGCTGGGCCGT GATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCG ACCGGCACAGCATCAAGAAGAACCTGATCGGCGCCCTGCTGTTCGACAGCGGAGAA ACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGA AGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGAC GACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCA CGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGT ACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGAC CTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCT GATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGC TGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTG GACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCT GATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGCAACCTGATTGCCC TGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCC AAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCA

GATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCA

TCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGC

GCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGC

TCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCA

AGAACGGCTACGCCGGCTACATCGATGGCGGAGCCAGCCAGGAAGAGTTCTACAA

GTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGC

TGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCC

CCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTT

ACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATC

CCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAG

AAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGC

GCCAGCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAA

CGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTACAACG

AGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGC

GGCGAGCAGAAAAAAGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGAC

CGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGG

AAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTG

CTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCT

GGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAAC

GGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGG

CGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGG

ACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAAC

AGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCA

GAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGG

CCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGA

GCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCA

GAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCG

GATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTG

GAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCG

GGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGG

ACGCTATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGATAACAAAGTGCTG

ACTCGGAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCG

TGAAGAAGATGAAGAACTACTGGCGCCAGCTGCTGAATGCCAAGCTGATTACCCAG

AGGAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATA

AGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTG

GCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAACGACAAACTGAT

CCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGG

ATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCC

TACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAG

CGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGA

GCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATG

AACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCT

GATCGAGACAAACGGCGAAACAGGCGAGATCGTGTGGGATAAGGGCCGGGACTTT

GCCACCGTGCGGAAAGTGCTGTCTATGCCCCAAGTGAATATCGTGAAAAAGACCGA

GGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGACA

AGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAG

CCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCA

AGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGC

TTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAA

GGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGA

AGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCC

CTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCT

CCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAACACTACCTG

GACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGC TAATCTGGACAAGGTGCTGAGCGCCTACAACAAGCACAGAGACAAGCCTATCAGAG

AGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCC

GCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGA

GGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGA

TCGACCTGTCTCAGCTGGGAGGCGACGCCTATCCCTATGACGTGCCCGATTATGCC

AGCCTGGGCAGCGGCTCCCCCAAGAAAAAACGCAAGGTGGAAGATCCTAAGAAAAA

GCGGAAAGTGGACGGCATTGGTAGTGGGAGCAACGGCAGCAGCGGATCCtga

The KNL1 RVSF/AAAA segment of the fusion protein sequence encoded by the DNA sequence above is:

MDGVSSEANEENDNIERPVRRRHSSILKPPRSPLQDLRGGNETVQESNALRNKKNSRAAAAADTIKVFQTESHMKIVRKSEMEETE (SEQ ID NO: 1)
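As an illustrative consistency check, and not as part of the described method, the KNL1 coding segment and the linker at the start of SEQ ID NO: 3 can be translated with the standard genetic code and compared with SEQ ID NO: 1 and SEQ ID NO: 5. The following Python sketch assumes the standard genetic code; the function and variable names are chosen for illustration only.

```python
# Standard genetic code; amino acids listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring whitespace and case."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# KNL1 RVSF/AAAA coding segment copied from the start of SEQ ID NO: 3
# (spacing as printed in the text; whitespace is removed before translation).
KNL1_RVSF_AAAA_DNA = (
    "ATGGATGGGGTGTCTTCAGAGGCTAATGAAGAAAATGACAATATAGAGAGACCTGTT AGAAGAC "
    "GGCATTCTTCAATATTGAAACCCCCAAGGAGTCCTCTTCAGGACCTCAGAGGTGGGAATG AAAC "
    "AGTTCAAGAGTCAAACGCGTTAAGGAATAAGAAAAACTCTCGTGCAGCCGCCGCTGCAGA TACT "
    "ATAAAGGTATTCCAGACGGAGTCTCATATGAAAATAGTGAGAAAGTCAGAAATGGAAGAA ACAG AA"
)
# Linker coding sequence printed in lower case after the KNL1 segment.
LINKER_DNA = "ggcggttccggcggagggtcg"

knl1_protein = translate(KNL1_RVSF_AAAA_DNA)
linker_protein = translate(LINKER_DNA)
print(knl1_protein)         # 86-residue KNL1 segment
print(linker_protein)       # linker peptide
print(knl1_protein[57:61])  # residues 58-61 (1-based), the position of the RVSF motif
```

In this sketch, residues 58-61 of the translated segment read AAAA, consistent with the RVSF to AAAA substitution described above.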

In an embodiment, the KNL1 S24A;S60A-dCas9 (SEQ ID NO: 4) fusion protein is encoded by the following DNA sequence:

KNL1 S24A;S60A-linker-dCas9 (SEQ ID NO: 4)

ATGGATGGGGTGTCTTCAGAGGCTAATGAAGAAAATGACAATATAGAGAGACCTGTT AGAAGAC GGCATGCCTCAATATTGAAACCCCCAAGGAGTCCTCTTCAGGACCTCAGAGGTGGGAATG AAA CAGTTCAAGAGTCAAACGCGTTAAGGAATAAGAAAAACTCTCGTCGAGTCGCCTTTGCAG ATAC TATAAAGGTATTCCAGACGGAGTCTCATATGAAAATAGTGAGAAAGTCA ggcggttccggcggagggtcg

GACAAGAAGTACAGCATCGGCCTGGCCATCGGCACCAACTCTGTGGGCTGGGCCGT GATCACCGACGAGTACAAGGTGCCCAGCAAGAAATTCAAGGTGCTGGGCAACACCG ACCGGCACAGCATCAAGAAGAACCTGATCGGCGCCCTGCTGTTCGACAGCGGAGAA ACAGCCGAGGCCACCCGGCTGAAGAGAACCGCCAGAAGAAGATACACCAGACGGA AGAACCGGATCTGCTATCTGCAAGAGATCTTCAGCAACGAGATGGCCAAGGTGGAC GACAGCTTCTTCCACAGACTGGAAGAGTCCTTCCTGGTGGAAGAGGATAAGAAGCA CGAGCGGCACCCCATCTTCGGCAACATCGTGGACGAGGTGGCCTACCACGAGAAGT ACCCCACCATCTACCACCTGAGAAAGAAACTGGTGGACAGCACCGACAAGGCCGAC CTGCGGCTGATCTATCTGGCCCTGGCCCACATGATCAAGTTCCGGGGCCACTTCCT GATCGAGGGCGACCTGAACCCCGACAACAGCGACGTGGACAAGCTGTTCATCCAGC TGGTGCAGACCTACAACCAGCTGTTCGAGGAAAACCCCATCAACGCCAGCGGCGTG GACGCCAAGGCCATCCTGTCTGCCAGACTGAGCAAGAGCAGACGGCTGGAAAATCT GATCGCCCAGCTGCCCGGCGAGAAGAAGAATGGCCTGTTCGGCAACCTGATTGCCC TGAGCCTGGGCCTGACCCCCAACTTCAAGAGCAACTTCGACCTGGCCGAGGATGCC

AAACTGCAGCTGAGCAAGGACACCTACGACGACGACCTGGACAACCTGCTGGCCCA GATCGGCGACCAGTACGCCGACCTGTTTCTGGCCGCCAAGAACCTGTCCGACGCCA TCCTGCTGAGCGACATCCTGAGAGTGAACACCGAGATCACCAAGGCCCCCCTGAGC GCCTCTATGATCAAGAGATACGACGAGCACCACCAGGACCTGACCCTGCTGAAAGC TCTCGTGCGGCAGCAGCTGCCTGAGAAGTACAAAGAGATTTTCTTCGACCAGAGCA AGAACGGCTACGCCGGCTACATCGATGGCGGAGCCAGCCAGGAAGAGTTCTACAA GTTCATCAAGCCCATCCTGGAAAAGATGGACGGCACCGAGGAACTGCTCGTGAAGC TGAACAGAGAGGACCTGCTGCGGAAGCAGCGGACCTTCGACAACGGCAGCATCCC CCACCAGATCCACCTGGGAGAGCTGCACGCCATTCTGCGGCGGCAGGAAGATTTTT ACCCATTCCTGAAGGACAACCGGGAAAAGATCGAGAAGATCCTGACCTTCCGCATC CCCTACTACGTGGGCCCTCTGGCCAGGGGAAACAGCAGATTCGCCTGGATGACCAG AAAGAGCGAGGAAACCATCACCCCCTGGAACTTCGAGGAAGTGGTGGACAAGGGC GCCAGCGCCCAGAGCTTCATCGAGCGGATGACCAACTTCGATAAGAACCTGCCCAA CGAGAAGGTGCTGCCCAAGCACAGCCTGCTGTACGAGTACTTCACCGTGTACAACG

AGCTGACCAAAGTGAAATACGTGACCGAGGGAATGAGAAAGCCCGCCTTCCTGAGC

GGCGAGCAGAAAAAAGCCATCGTGGACCTGCTGTTCAAGACCAACCGGAAAGTGAC

CGTGAAGCAGCTGAAAGAGGACTACTTCAAGAAAATCGAGTGCTTCGACTCCGTGG

AAATCTCCGGCGTGGAAGATCGGTTCAACGCCTCCCTGGGCACATACCACGATCTG

CTGAAAATTATCAAGGACAAGGACTTCCTGGACAATGAGGAAAACGAGGACATTCT

GGAAGATATCGTGCTGACCCTGACACTGTTTGAGGACAGAGAGATGATCGAGGAAC

GGCTGAAAACCTATGCCCACCTGTTCGACGACAAAGTGATGAAGCAGCTGAAGCGG

CGGAGATACACCGGCTGGGGCAGGCTGAGCCGGAAGCTGATCAACGGCATCCGGG

ACAAGCAGTCCGGCAAGACAATCCTGGATTTCCTGAAGTCCGACGGCTTCGCCAAC

AGAAACTTCATGCAGCTGATCCACGACGACAGCCTGACCTTTAAAGAGGACATCCA

GAAAGCCCAGGTGTCCGGCCAGGGCGATAGCCTGCACGAGCACATTGCCAATCTGG

CCGGCAGCCCCGCCATTAAGAAGGGCATCCTGCAGACAGTGAAGGTGGTGGACGA

GCTCGTGAAAGTGATGGGCCGGCACAAGCCCGAGAACATCGTGATCGAAATGGCCA

GAGAGAACCAGACCACCCAGAAGGGACAGAAGAACAGCCGCGAGAGAATGAAGCG

GATCGAAGAGGGCATCAAAGAGCTGGGCAGCCAGATCCTGAAAGAACACCCCGTG

GAAAACACCCAGCTGCAGAACGAGAAGCTGTACCTGTACTACCTGCAGAATGGGCG

GGATATGTACGTGGACCAGGAACTGGACATCAACCGGCTGTCCGACTACGATGTGG

ACGCTATCGTGCCTCAGAGCTTTCTGAAGGACGACTCCATCGATAACAAAGTGCTG

ACTCGGAGCGACAAGAACCGGGGCAAGAGCGACAACGTGCCCTCCGAAGAGGTCG

TGAAGAAGATGAAGAACTACTGGCGCCAGCTGCTGAATGCCAAGCTGATTACCCAG

AGGAAGTTCGACAATCTGACCAAGGCCGAGAGAGGCGGCCTGAGCGAACTGGATA

AGGCCGGCTTCATCAAGAGACAGCTGGTGGAAACCCGGCAGATCACAAAGCACGTG

GCACAGATCCTGGACTCCCGGATGAACACTAAGTACGACGAGAACGACAAACTGAT

CCGGGAAGTGAAAGTGATCACCCTGAAGTCCAAGCTGGTGTCCGATTTCCGGAAGG

ATTTCCAGTTTTACAAAGTGCGCGAGATCAACAACTACCACCACGCCCACGACGCC

TACCTGAACGCCGTCGTGGGAACCGCCCTGATCAAAAAGTACCCTAAGCTGGAAAG

CGAGTTCGTGTACGGCGACTACAAGGTGTACGACGTGCGGAAGATGATCGCCAAGA

GCGAGCAGGAAATCGGCAAGGCTACCGCCAAGTACTTCTTCTACAGCAACATCATG

AACTTTTTCAAGACCGAGATTACCCTGGCCAACGGCGAGATCCGGAAGCGGCCTCT

GATCGAGACAAACGGCGAAACAGGCGAGATCGTGTGGGATAAGGGCCGGGACTTT

GCCACCGTGCGGAAAGTGCTGTCTATGCCCCAAGTGAATATCGTGAAAAAGACCGA

GGTGCAGACAGGCGGCTTCAGCAAAGAGTCTATCCTGCCCAAGAGGAACAGCGACA

AGCTGATCGCCAGAAAGAAGGACTGGGACCCTAAGAAGTACGGCGGCTTCGACAG

CCCCACCGTGGCCTATTCTGTGCTGGTGGTGGCCAAAGTGGAAAAGGGCAAGTCCA

AGAAACTGAAGAGTGTGAAAGAGCTGCTGGGGATCACCATCATGGAAAGAAGCAGC

TTCGAGAAGAATCCCATCGACTTTCTGGAAGCCAAGGGCTACAAAGAAGTGAAAAA

GGACCTGATCATCAAGCTGCCTAAGTACTCCCTGTTCGAGCTGGAAAACGGCCGGA

AGAGAATGCTGGCCTCTGCCGGCGAACTGCAGAAGGGAAACGAACTGGCCCTGCC

CTCCAAATATGTGAACTTCCTGTACCTGGCCAGCCACTATGAGAAGCTGAAGGGCT

CCCCCGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAACACTACCTG

GACGAGATCATCGAGCAGATCAGCGAGTTCTCCAAGAGAGTGATCCTGGCCGACGC

TAATCTGGACAAGGTGCTGAGCGCCTACAACAAGCACAGAGACAAGCCTATCAGAG

AGCAGGCCGAGAATATCATCCACCTGTTTACCCTGACCAATCTGGGAGCCCCTGCC

GCCTTCAAGTACTTTGACACCACCATCGACCGGAAGAGGTACACCAGCACCAAAGA

GGTGCTGGACGCCACCCTGATCCACCAGAGCATCACCGGCCTGTACGAGACACGGA

TCGACCTGTCTCAGCTGGGAGGCGACGCCTATCCCTATGACGTGCCCGATTATGCC

AGCCTGGGCAGCGGCTCCCCCAAGAAAAAACGCAAGGTGGAAGATCCTAAGAAAAA

GCGGAAAGTGGACGGCATTGGTAGTGGGAGCAACGGCAGCAGCGGATCCtga

The KNL1 S24A;S60A segment of the fusion protein encoded by the DNA sequence above is:

MDGVSSEANEENDNIERPVRRRHASILKPPRSPLQDLRGGNETVQESNALRNKKNSRRVAFADTIKVFQTESHMKIVRKS (SEQ ID NO: 2)

The sequence of dCas9 is well known in the art. The sequence of the dCas9 used in this disclosure is evident from the DNA sequences described herein.
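For illustration only, the positions at which the two KNL1 segments (SEQ ID NO: 1 and SEQ ID NO: 2) differ can be listed with a short residue-by-residue comparison. The Python sketch below uses the two sequences as given in this disclosure; the helper name `differences` is hypothetical, and the comparison covers only the length shared by the two segments.

```python
# KNL1 segments as given in the text.
SEQ_ID_1 = (
    "MDGVSSEANEENDNIERPVRRRHSSILKPPRSPLQDLRGGNETVQESNALRNKKNSR"
    "AAAAADTIKVFQTESHMKIVRKSEMEETE"
)  # KNL1 RVSF/AAAA segment
SEQ_ID_2 = (
    "MDGVSSEANEENDNIERPVRRRHASILKPPRSPLQDLRGGNETVQESNALRNKKNSR"
    "RVAFADTIKVFQTESHMKIVRKS"
)  # KNL1 S24A;S60A segment


def differences(a, b):
    """Return (1-based position, residue in a, residue in b) wherever a and b differ.

    Only the overlapping length is compared, because zip() stops at the shorter input.
    """
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for position, res1, res2 in differences(SEQ_ID_1, SEQ_ID_2):
    print(position, res1, res2)
```

Position 24 carries S in SEQ ID NO: 1 and A in SEQ ID NO: 2 (S24A). Positions 58, 59, and 61 differ because SEQ ID NO: 1 carries AAAA where SEQ ID NO: 2 carries RVAF. Position 60 is A in both sequences, as expected when one segment carries S60A and the other carries the RVSF to AAAA substitution.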

The described fusion protein can be provided in a composition that is suitable for introducing the fusion protein into cells. The composition may include one or more guide RNAs, or the fusion protein may be introduced concurrently or sequentially into cells with one or more guide RNAs. The guide RNA targets the fusion protein to a location of kinetochore assembly on a centromere such that the fusion protein interferes with chromosome segregation.

The described fusion protein and the RNAs are used in a method to produce aneuploidy in eukaryotic cells. The method comprises introducing into cells a described fusion protein and at least one guide RNA that targets the fusion protein to a location of kinetochore assembly on a centromere of a specific chromosome such that the fusion protein interferes with segregation of the chromosome. The cells are then allowed to divide in the presence of the fusion protein and the guide RNA such that cell division results in divided cells that comprise an aneuploidy karyotype. In an embodiment the aneuploidy karyotype comprises a gain of a chromosome. In an embodiment, the aneuploidy karyotype comprises a loss of a chromosome. In an embodiment, the aneuploidy karyotype is associated with a malignant cell phenotype. The disclosure also provides an isolated population of cells made by the described methods, as well as cell lines with the engineered aneuploidy karyotypes.

The disclosure also provides a kit comprising the described fusion protein. The kit may include one or a plurality of guide RNAs that target the fusion protein to one or more locations of kinetochore assembly on a centromere of one or more chromosomes, one or more expression vectors that encode one or a plurality of guide RNAs, and/or an expression vector that encodes the described fusion protein, or the fusion protein itself. The components of the kit may be provided in one or more containers. The container(s) may contain reagents used to practice a method of the disclosure. The reagents may be provided in a ready-to-use buffer, or may be adapted for reconstitution in a suitable buffer, such as by lyophilization. The kits may include printed material that instructs a user how to use the kit contents in order to perform a described method. As such, the disclosure includes articles of manufacture that comprise one or more containers containing the described proteins and/or polynucleotides encoding the proteins, and printed material that describes contents and/or how to use the components in a described method. The disclosure also provides a method comprising selecting a guide RNA that targets a location of kinetochore assembly on a centromere of a specific chromosome, introducing into cells a combination of the selected guide RNA and a fusion protein comprising a mutated kinetochore protein and dCas9, and allowing cell division in the presence of the selected guide RNA and the fusion protein such that divided cells comprise an aneuploidy karyotype.

The described compositions, methods, and systems can be introduced into cells using a variety of approaches, such as by using mRNA, or a ribonucleoprotein (RNP) complex, or plasmids or other expression vectors, or combinations thereof. In embodiments, a viral vector can be used. In embodiments, a phagemid or modified bacteriophage can be used. The expression of the fusion protein may be driven by a promoter that is operably linked to the sequence coding the fusion protein. The promoter may be an inducible or constitutive promoter. Thus, in certain embodiments, such as by use of an inducible promoter, expression of the fusion protein and/or the guide RNA can be controlled such that the expression is transient.

Viral expression vectors may be used as naked polynucleotides, or may comprise viral particles. In embodiments, the expression vector comprises a modified viral polynucleotide, such as from an adenovirus, a herpesvirus, or a retrovirus, such as a lentiviral vector. In embodiments, a sequence encoding the described fusion protein and/or a guide RNA may be integrated into a chromosome of the same cell in which aneuploidy is induced.

In embodiments, one or more components of the described systems may be delivered to cells using, for example, a recombinant adeno-associated virus (rAAV) vector or a lentiviral vector. In embodiments, non-viral delivery systems may be used for introducing one or more of the components of the described system. Non-viral tools include hydrodynamic injection, electroporation, and microinjection. In embodiments, and as described further below, more than one guide RNA can be used. In embodiments, the disclosure includes combining pairs of centromeric sgRNAs for use in a single cell. The guide RNAs used in the disclosure may be fully processed, or subjected to a processing step before they are used.

The gRNA binding sequences are provided in Table S1 (SELECTED gRNAs) as DNA sequences. The disclosure expressly includes each DNA sequence in the form of RNA wherein each T is replaced by a U. This table contains all the gRNA binding sequences that were tested and indicates which gRNAs were validated by imaging through visualization of the centromeres. Furthermore, a subset of the gRNAs validated by imaging was also validated using scRNAseq and KaryoCreate as shown in Fig. 4. gRNAs normally are 20 bp long. In one embodiment, 19-bp, 18-bp, or 17-bp versions of the gRNAs (omitting the first one, two, or three base pairs) can be utilized to increase the proportion of whole-chromosome (versus chromosome-arm) events and gain events.
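By way of a non-limiting sketch, the conversion of a gRNA binding sequence from DNA to RNA form and the derivation of the 19-, 18-, and 17-nucleotide versions may be expressed as follows. The 20-nucleotide sequence used here is a hypothetical placeholder and is not taken from Table S1; the function names are likewise illustrative.

```python
def dna_to_rna(seq):
    """Return the RNA form of a DNA sequence, replacing each T with a U."""
    return seq.upper().replace("T", "U")


def truncated_versions(seq, max_trim=3):
    """Return shortened versions that omit the first 1..max_trim bases, keyed by length."""
    return {len(seq) - n: seq[n:] for n in range(1, max_trim + 1)}


# Hypothetical 20-nt binding sequence used purely as a placeholder.
example_dna = "ACGTACGTACGTACGTACGT"

print(dna_to_rna(example_dna))
for length, shortened in truncated_versions(example_dna).items():
    print(length, shortened)
```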

For 21 out of 24 chromosomes, we computationally predicted unique sgRNAs binding >400 times at the centromere with a specificity of 99%. Using KaryoCreate, we demonstrated the successful induction of chromosome-specific aneuploidy for 10 chromosomes tested. In principle, KaryoCreate can be used for 21 out of 24 chromosomes, with the exception of chromosomes sharing similar centromeric sequences such as acrocentric chromosomes. However, the disclosure demonstrates that induction of gains and losses for the remaining chromosomes is still possible by using sgRNAs targeting both the chromosome of interest and other chromosomes sharing centromeric sgRNA binding sites (instead of single chromosomes). Furthermore, the disclosure demonstrates production of two highly recurrent aneuploidies in human gastro-intestinal cancers (chromosome 7 gain and 18q loss), and provides data supporting tumor-associated phenotypes associated with chromosome 18q loss in colorectal cells, as discussed in the Examples below.

The following Examples are intended to illustrate but not limit the disclosure.

Example 1

Computational prediction of sgRNAs targeting chromosome-specific α-satellite centromeric repeats.

To design chromosome-specific centromeric sgRNAs, the genome assembly from the Telomere-to-Telomere (T2T) consortium 29 was referred to. For centromeres resolved in previous assemblies, we confirmed the sgRNA predictions from T2T using the hg38 reference genome 30, to reduce the risk of bias associated with a single assembly 31,32. To increase the likelihood of interfering with chromosome segregation, we focused the design on centromeric HORs found to bind to CENPA in chromatin immunoprecipitation (ChIP) experiments (defined as “Live”, or HOR L, by the T2T) 21,33. For any given chromosome, a preferred sgRNA has 1) high on-target specificity (i.e., it does not bind to centromeres on other chromosomes or to other genomic locations), 2) a high number of binding sites on the repetitive HOR L, and 3) high efficiency in tethering dCas9 to the DNA. For each chromosome, we started by identifying all possible Cas9 sgRNAs targeting its HOR L. We performed this analysis for all 24 human chromosomes (Tables S1, S2). Next, we determined two parameters that define the specificity of each sgRNA (both percentages, with 100% the best score): a chromosome specificity score, defined as the ratio of the number of binding sites on the target centromere to the total number of binding sites across all centromeres, and a centromere specificity score, defined as the ratio of the number of binding sites in centromeric regions to the number of sites across the whole genome. We also predicted the efficiency of each sgRNA based on GC content 34, sgRNA activity (see Methods), and total number of binding sites to the specific centromere (Fig. 1A).
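The two specificity scores defined above can be illustrated with the following Python sketch. The binding-site counts are invented for the example and do not come from Tables S1 or S2, and the function names are hypothetical.

```python
def chromosome_specificity(sites_per_centromere, target):
    """Percent of all centromeric binding sites that fall on the target centromere."""
    total = sum(sites_per_centromere.values())
    if total == 0:
        return 0.0
    return 100.0 * sites_per_centromere.get(target, 0) / total


def centromere_specificity(centromeric_sites, genome_wide_sites):
    """Percent of all genome-wide binding sites that fall in centromeric regions."""
    if genome_wide_sites == 0:
        return 0.0
    return 100.0 * centromeric_sites / genome_wide_sites


# Hypothetical counts for one candidate sgRNA intended for chromosome 7.
sites = {"chr7": 1980, "chr1": 20}   # binding sites per centromere (made up)
genome_wide = 2010                   # all binding sites in the genome (made up)

chrom_score = chromosome_specificity(sites, "chr7")
cen_score = centromere_specificity(sum(sites.values()), genome_wide)
print(f"chromosome specificity: {chrom_score:.1f}%")
print(f"centromere specificity: {cen_score:.1f}%")
```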

Using thresholds of 99% for both chromosome and centromere specificity scores, a GC content >40%, a minimum of 400 sgRNA binding sites, sgRNA activity 35,36 >0.1, and representation in hg38, we designed at least one sgRNA for 21 of the 24 human chromosomes (all except 21, 22, Y; Fig. 1B; Table S1), with 1590 binding sites per chromosome on average. Increasing the chromosome specificity score from 99% to 100% resulted in at least one sgRNA for 16 chromosomes.
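The selection thresholds stated above may be sketched as a simple filter. In the following Python example the candidate records, field names, and sequences are hypothetical. Treating the 99% and 400-site thresholds as inclusive is an assumption, as the text does not state whether the boundaries are included.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sequence: str
    chromosome_specificity: float  # percent
    centromere_specificity: float  # percent
    binding_sites: int
    activity: float                # predicted sgRNA activity score
    in_hg38: bool


def gc_content(seq):
    """Percent of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_thresholds(c):
    """Apply the thresholds from the text (inclusive boundaries are an assumption)."""
    return (
        c.chromosome_specificity >= 99.0
        and c.centromere_specificity >= 99.0
        and gc_content(c.sequence) > 40.0
        and c.binding_sites >= 400
        and c.activity > 0.1
        and c.in_hg38
    )


# Hypothetical candidates; names and values are made up for illustration.
candidates = [
    Candidate("sgA", "GGGGGCCCCCAAAAATTTTT", 99.5, 99.8, 1500, 0.4, True),
    Candidate("sgB", "AAAAATTTTTAAAAATTTTG", 100.0, 100.0, 900, 0.5, True),  # low GC
    Candidate("sgC", "GGGGGCCCCCAAAAATTTTT", 99.9, 99.9, 250, 0.6, True),    # too few sites
]
selected = [c.name for c in candidates if passes_thresholds(c)]
print(selected)
```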

Example 2

Experimental validation of sgRNAs targeting α-satellite centromeric repeats on 15 human chromosomes.

To assess the activity of the predicted sgRNAs, we co-expressed selected sgRNAs with Cas9 and monitored cell proliferation, since the presence of several double-strand breaks at the centromere is likely to decrease cell viability 37. We used hTERT TP53-/- human colonic epithelial cells (hCECs) 38 and hTERT TP53 WT retinal pigment epithelial cells (RPEs) expressing p21 (CDKN1A) and RB (RB1) shRNAs 39. We transduced Cas9-expressing RPEs and hCECs with a lentiviral vector expressing either a centromeric sgRNA or a negative control sgRNA (sgNC) that does not target the human genome 40. Hereafter we refer to each centromeric sgRNA as sgChrα-β, where α is the specific targeted chromosome and β is the serial number of the designed sgRNA.

We first tested 3 sgRNAs predicted for chromosomes 7 and 13, and 4 for chromosome 18. Compared to sgNC, hCECs and RPEs expressing sgChr7-1, sgChr7-3, sgChr13-3, or sgChr18-4 exhibited at least 50% reduction in proliferation, while the other sgRNAs did not result in significant differences (Fig. 1C; Fig. 6A). We selected the sgRNAs exhibiting the greatest reduction in proliferation for additional testing.

To confirm that the sgRNAs targeted the intended centromeres, we designed a dCas9-based imaging system comprising three mScarlet fluorescent molecules fused to the N-terminus of endonuclease-dead Cas9 (3xmScarlet-dCas9). To achieve consistently high expression, we FACS-sorted 3xmScarlet-dCas9-transduced hCECs for strong fluorescent signal. hCECs co-expressing 3xmScarlet-dCas9 and sgChr7-1, sgChr13-3, or sgChr18-4 (but not sgNC) showed bright nuclear foci (Fig. 1D). Notably, the sgRNAs that did not cause a decrease in proliferation in the presence of Cas9 failed to form foci (Fig. 1C and data not shown).

To further confirm the chromosome specificity of the sgRNAs, we used two independent approaches. We first utilized hCEC clones with aneuploidies previously identified through whole-genome sequencing (WGS)-based copy number analysis to verify whether the observed number of foci was consistent with the expected DNA copy number. We found that hCEC clones carrying three copies of chromosome 7 or 13 each showed three foci when transduced with sgChr7-1 or sgChr13-3, respectively (Fig. 1D; Fig. 6B). Transduction with sgRNAs targeting chromosomes present in two copies led to the formation of two foci per nucleus (Fig. 1D). Next, we confirmed that the 3xmScarlet-dCas9 foci localized at specific centromeres by fluorescence in-situ hybridization (FISH) using centromeric probes. We confirmed colocalization of FISH signals for both chromosomes 7 (sgChr7-1) and 18 (sgChr18-4) with mScarlet foci (Fig. 1E). Altogether, these experiments indicate that the computationally predicted sgRNAs can recruit dCas9 to the expected specific centromere.

We tested 75 additional sgRNAs in hCECs and confirmed the formation of the expected number of foci for 24 sgRNAs targeting 15 different chromosomes (2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 18, 19, X; Fig. 1F, Fig. 6C, Table S1). We also confirmed 4 sgRNAs in RPEs (Fig. 6D).

Altogether, we designed and validated 24 chromosome-specific sgRNAs targeting the centromeres of 15 different human chromosomes. Interestingly, the predicted sgRNA efficiency evaluated using a previously published algorithm 36 did not correlate with the ability of sgRNAs to form foci (r=0.2; p=0.5; Fig. 6E, top). Instead, for the sgRNAs that formed foci, there was a significant correlation between the intensity of the signal of the foci and the number of binding sites at the centromeres predicted based on the CHM13 genome reference (r=0.65, p=0.03; Fig. 6E, bottom).
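As a hedged illustration of how a correlation coefficient of the kind reported above could be computed, the following Python sketch calculates a Pearson correlation coefficient. The text does not state which correlation method was used, so the choice of Pearson is an assumption, and the data values are invented. The sketch does not compute a p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: predicted binding sites and foci intensity (arbitrary units).
binding_sites = [400, 800, 1200, 1600, 2000]
foci_intensity = [1.0, 1.8, 3.1, 3.9, 5.2]
print(round(pearson_r(binding_sites, foci_intensity), 3))
```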

Example 3

Centromeric targeting of KNL1 Mut-dCas9 induces modest mitotic delay and chromosome missegregation.

To induce chromosome missegregation, we built and tested four dCas9 fusion proteins to determine if they could disrupt kinetochore-microtubule attachments (Fig. 2A, Fig. 7A). KNL1 S24A;S60A-dCas9 and KNL1 RVSF/AAAA-dCas9 utilize the KNL1 N-terminal portion (amino acid (aa) 1-86) 28,41 and contain mutations with opposing effects in disrupting the cross-regulation between Aurora B and PP1 (Fig. 7A). KNL1 S24A;S60A was predicted to be always bound to PP1 as its mutated residues cannot be phosphorylated by Aurora B 41 (Fig. 7A); KNL1 RVSF/AAAA contains a mutation affecting the RVSF motif (aa 58-61), preventing it from interacting with PP1 and recruiting it to the centromere 28 (Fig. 7A). NDC80-CH1-dCas9 and NDC80-CH2-dCas9 were designed to render the interaction between kinetochores and microtubules hyperstable and refractory to Aurora B destabilization. These constructs contain one (NDC80-CH1) or two (NDC80-CH2) CH domains (aa 1-207), the region of NDC80 responsible for binding microtubules. CH domains normally contain 6 residues whose phosphorylation by Aurora B inhibits the interaction with microtubules; our constructs have all 6 residues mutated, preventing Aurora-B-mediated regulation 42 (Fig. 7A).

Western blot analysis showed that KNL1 RVSF/AAAA-dCas9 and KNL1 S24A;S60A-dCas9 expression levels were higher than those of NDC80-CH1-dCas9 and NDC80-CH2-dCas9 (Fig. 7B). For the KNL1 constructs, the N-terminal fusions were generally more stable than the C-terminal fusions (Fig. 7C, Fig. 2A). Given their higher protein expression and greater efficiency in inducing chromosome gains and losses compared to the other constructs, we focused on the KNL1 constructs, particularly KNL1 RVSF/AAAA-dCas9, also referred to herein as KNL1 Mut-dCas9.

To confirm centromeric localization of the fusion protein, we transduced hCECs expressing a fluorescently tagged version of KNL1 Mut-dCas9 (3xmScarlet-KNL1 Mut-dCas9) with centromeric sgRNAs, as described above. We observed the expected number of foci in the presence of sgChr7-1 and sgChr18-4 (Fig. 7D), indicating that fusing KNL1 Mut with dCas9 does not alter the ability of dCas9 to be recruited to centromeres. Next, using live-cell imaging, we examined the effect of KNL1 Mut-dCas9 on mitosis duration and chromosome segregation. hCECs constitutively expressing GFP-tagged histone H2B were transduced with KNL1 Mut-dCas9 or empty vector (EV) and with sgChr7-1, sgChr18-4, or sgNC. Cells expressing KNL1 Mut-dCas9 and either sgChr7-1 or sgChr18-4 progressed more slowly through mitosis than cells transduced with EV and either sgChr7-1 or sgChr18-4 (Fig. 2C): the average time spent in the metaphase-to-anaphase transition increased from 6 minutes to 9 or 10 minutes in the sgChr7-1 or sgChr18-4 condition, respectively (Fig. 2B, 2C). Nonetheless, cells transduced with sgChr7-1 or sgChr18-4 did not arrest in metaphase and completed mitosis, and their proliferation rate was only slightly and non-significantly lower than that of cells transduced with sgNC (Fig. 7E). The percentage of cell divisions with lagging chromosomes increased from <5% to 15% between EV+sgChr7-1 and KNL1 Mut-dCas9+sgChr7-1 and from 7% to 23% between EV+sgChr18-4 and KNL1 Mut-dCas9+sgChr18-4 (Fig. 2B, upper panel, 2D). Furthermore, live-cell imaging of cells expressing 3xmScarlet-KNL1 Mut-dCas9 and sgChr7-1, where mScarlet marks chromosome 7 as in Fig. 7D (polyclonal population), showed that about 80% of the lagging chromosomes observed during mitosis had red foci, consistent with chromosome-specific missegregation (Fig. 2B). In this experiment sgNC could not be used as a control as it did not cause foci formation.

To corroborate these data in a different cell line, we performed a similar experiment in the HCT116 (TP53 WT) colon cancer cell line, transducing the cells with KNL1 Mut-dCas9 and either sgNC, sgChr7-1, or sgChr18-4. Immunofluorescence for α-tubulin to visualize the mitotic spindle, CREST serum to visualize the centromeres, and DAPI to assess chromosome alignment showed that the percentage of mitoses with misaligned chromosomes increased from 12% in the sgNC samples to 32% and 35% in the sgChr7-1 and sgChr18-4 conditions, respectively (Fig. 2E, 2F).

Finally, we scored the fraction of KNL1 Mut-dCas9-expressing hCECs containing micronuclei (a well-known consequence of missegregation 43) 7-9 days after transduction with sgRNAs. The percentage of cells showing micronuclei increased from <2.5% for sgNC to 9% for sgChr7-1 and 14% for sgChr18-4 (Fig. 2G). Furthermore, FISH using a chr18 centromeric probe on cells co-expressing KNL1 Mut-dCas9 and sgChr18-4 showed that 85% of micronuclei had a FISH signal (Fig. 2H). We also confirmed this result for chromosomes 7 and 13 (Fig. 7F).

Altogether, these data indicate that tethering KNL1 Mut-dCas9 to the centromeres through chromosome-specific sgRNAs can induce chromosome misalignment, lagging chromosomes, modest mitotic delay, and formation of micronuclei containing the targeted chromosome without substantially affecting the rate of cell division.

Example 4

KaryoCreate allows induction of chromosome-specific gains and losses in human cells.

Having designed and validated chromosome-specific sgRNAs and dCas9-based constructs to induce chromosome missegregation, we next tested the capability of this system, designated “KaryoCreate” for Karyotype CRISPR Engineered Aneuploidy Technology, to generate specific aneuploidies in human cell lines (Fig. 3A). We reasoned that transient targeting of the dCas9-based construct to the centromere would generate chromosome gains and losses and allow isolation of stable aneuploid lines.

We first designed a system based on doxycycline-inducible expression of KNL1 Mut-dCas9 (constructed in the pIND20 vector 44) and constitutive sgRNA expression (pLentiGuide-Puro-FE, Fig. 3B, 3C; see Methods). We tested KaryoCreate in hCECs co-transduced with pIND20-KNL1 Mut-dCas9 or pIND20-GFP (control) and with sgNC, sgChr7-1, or sgChr18-4. Cells were treated with doxycycline for 7-9 days and analyzed by FISH. 95% of control cells (GFP with sgNC) showed two copies of chromosomes 7 and 18 (Fig. 3D, 3E). This percentage did not significantly change in cells expressing KNL1 Mut-dCas9 and sgNC, indicating that in the absence of a centromere-specific sgRNA, KNL1 Mut-dCas9 does not induce chromosome missegregation (Fig. 3D, 3E; see Table S2 for automated quantification). Compared to sgNC, sgChr7-1 expression in hCECs transduced with KNL1 Mut-dCas9 significantly increased the percentages of cells showing chromosome loss, i.e. <2 copies (from 3% to 16%; p=0.01), or gain, i.e. >2 copies (from 2.8% to 12.5%; p=0.03), of chromosome 7, but not loss or gain of chromosome 18 (3% versus 3.2%). We next tested sgChr18-4, finding significant increases in loss (from 2% to 17.5%; p=0.01) and gain (from 2.5% to 14%; p=0.02) of chromosome 18 but not chromosome 7 (Fig. 3D, 3E; see Table S2 for automated quantification). Furthermore, we obtained comparable results when we restricted the FISH analysis to metaphase spreads as opposed to nuclei (Fig. 3F, 3G).
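The classification of cells by FISH copy number used above (loss defined as <2 copies, gain as >2 copies) can be sketched as follows. The per-cell counts in this Python example are invented for illustration and are not data from the Examples; the function names are hypothetical.

```python
from collections import Counter


def classify(copies):
    """Classify one cell by the number of FISH foci for a given chromosome."""
    if copies < 2:
        return "loss"
    if copies > 2:
        return "gain"
    return "two copies"


def summarize(copy_counts):
    """Return the percentage of cells in each class."""
    tally = Counter(classify(c) for c in copy_counts)
    total = len(copy_counts)
    return {label: 100.0 * tally[label] / total for label in ("loss", "two copies", "gain")}


# Hypothetical foci counts for 100 cells (made up for illustration).
cells = [2] * 70 + [1] * 16 + [3] * 14
summary = summarize(cells)
for label, percent in summary.items():
    print(f"{label}: {percent:.1f}%")
```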

We also developed two additional KaryoCreate systems: one based on transient co-transfection of KNL1Mut-dCas9 driven by a constitutive promoter (pHAGE vector) and an sgRNA-expressing vector (pLentiGuide-Puro-FE), and another based on a degrader approach whereby KNL1Mut-dCas9 is fused to an FKBP-based degradation domain 45 and is stabilized only after treatment with the small molecule Shield-1 (see Methods). Overall, the three methods gave similar results (Fig. 8A).

We next analyzed the frequency of aneuploidy induced by other constructs generated for KaryoCreate (NDC80-CH1-dCas9 and NDC80-CH2-dCas9, described above; see Fig. 7A-7C), finding that the other fusion proteins induced aneuploidy with similar or lower efficiency than KNL1Mut-dCas9 (KNL1RVSF/AAAA-dCas9; Fig. 8B). KNL1S24A;S60A-dCas9 produced similar levels of induced aneuploidy to KNL1Mut-dCas9 (KNL1RVSF/AAAA-dCas9), while NDC80-CH1-dCas9 and NDC80-CH2-dCas9 showed lower but appreciable efficiency (see Fig. 7B). Notably, after normalization for the corresponding expression level (shown in Fig. 7B), KNL1S24A;S60A-dCas9 induced a higher level of aneuploidy than KNL1RVSF/AAAA-dCas9, while NDC80-CH1-dCas9 and NDC80-CH2-dCas9 showed the highest induction of aneuploidy (Fig. 8B). We measured aneuploidy induced by expression of dCas9 (with sgRNAs), finding this to be approximately 30% of the level induced by KNL1RVSF/AAAA-dCas9 (Fig. 8B). About 90% of the aneuploidy events induced by dCas9 were losses and 10% were gains, whereas for KNL1RVSF/AAAA-dCas9 and especially KNL1S24A;S60A-dCas9, 55-65% were losses (Fig. 8C). This indicates that the recruitment of dCas9 to centromeres alone at least partially inhibits normal centromere function, leading mainly to chromosome losses, and that the simultaneous expression of mutant forms of KNL1 (especially KNL1S24A;S60A-dCas9) has a significant additive effect on aneuploidy induction that is biased toward chromosome gains.

We evaluated which parameters and conditions affect KaryoCreate’s efficiency, focusing on KNL1Mut-dCas9 due to its higher absolute level of aneuploidy induction compared to other constructs. Higher levels of KNL1Mut-dCas9 expression induced greater aneuploidy: a 3-fold increase in KNL1Mut-dCas9 expression led to a 2-fold increase in gains or losses (Fig. 8D). Next, combining multiple sgRNAs targeting the same chromosome (sgChr7-1 + sgChr7-3 or sgChr9-3 + sgChr9-5) did not increase the percentage of aneuploid cells over that due to individual sgRNAs, despite the increase in predicted binding sites achieved by combining the sgRNAs (Fig. 8E, 8F). We also tested whether FACS sorting, based on a cell surface marker encoded on the target chromosome, could increase the percentage of cells with gains or losses. We sorted cells transduced with KNL1Mut-dCas9 and sgChr7-1 based on high (top 15%) or low (bottom 15%) expression of EPHB4, a gene on chromosome 7 encoding a cell surface ephrin receptor. The percentage of cells with chromosome 7 gain increased from 12% to 26% from unsorted to high-EPHB4 cells (Fig. 8G), and the percentage of cells with chromosome 7 loss increased from 8% to 16% from unsorted to low-EPHB4 cells. Finally, a time-course experiment showed that sustained KaryoCreate activity increased aneuploidy progressively after 1, 2, or 3 cell cycles (2, 4, and 6 days after doxycycline; Fig. 8H). Altogether, the results indicate that KaryoCreate can induce chromosome-specific aneuploidy.

Example 5

KaryoCreate allows induction of arm-level and chromosome-level gains and losses across human chromosomes.

FISH analyses showed that targeting chromosome 7 does not affect chromosome 18 and vice versa, but did not rule out erroneous targeting of other chromosomes. To extend analysis of KaryoCreate’s specificity across all chromosomes, we performed high-throughput single-cell RNA sequencing (scRNA-seq) to estimate genome-wide DNA copy number profiles across thousands of cells 46-48. To infer copy number, we used the mean expression of genes across each chromosome or arm as a proxy for DNA copy number and then estimated the percentage of gains and losses for each arm by comparing the DNA copy number distribution of each experimental sample to that of the control population (e.g. sgNC or untreated cells). To validate the inference of arm-level copy number through scRNA-seq, we compared scRNA-seq and bulk shallow WGS results for hCEC cell lines with specific gains and losses. Analysis of a trisomic chromosome 7 clone showed that the percentage of cells with chromosome 7 gain was 91% by FISH and 80% by scRNA-seq. Similarly, analysis of the more complex karyotype (+chr7, -chr18, +19p) showed that the percentage of cells with chromosome 7 gain was 88% by FISH and 76% by scRNA-seq, and that for chromosome 18 loss was 87% by FISH and 81% by scRNA-seq (Fig. 9A, 9B). scRNA-seq slightly underestimated aneuploidy, especially gains, likely because a change from 2 to 3 copies represents an increase in DNA and RNA of 33%, while loss of 1 copy from 2 copies corresponds to a decrease of 50%. Overall, the patterns of aneuploidy inferred by scRNA-seq recapitulated those revealed by bulk WGS, confirming the validity of scRNA-seq for analyzing genome-wide gains and losses in single cells.
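
The inference approach described above (mean expression per arm, compared against a control population) can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used for Fig. 4 or Fig. 9: the z-score cutoff, the function names, and the synthetic expression values are all hypothetical.

```python
import numpy as np

def arm_scores(expr, gene_arms, arm):
    """Per-cell mean expression over the genes annotated to one arm.

    expr: array of shape (cells, genes), already normalized.
    gene_arms: sequence of arm labels, one per gene column.
    """
    mask = np.array([a == arm for a in gene_arms])
    return expr[:, mask].mean(axis=1)

def call_gain_loss(sample_scores, control_scores, z_cutoff=2.0):
    """Percent of sample cells above/below the control distribution.

    The cutoff of 2 control standard deviations is an assumption.
    """
    mu = control_scores.mean()
    sd = control_scores.std(ddof=1)
    z = (sample_scores - mu) / sd
    pct_gain = 100.0 * float(np.mean(z > z_cutoff))
    pct_loss = 100.0 * float(np.mean(z < -z_cutoff))
    return pct_gain, pct_loss

# Synthetic data (illustrative only): 20 genes on each of two arms.
rng = np.random.default_rng(0)
gene_arms = ["7q"] * 20 + ["18q"] * 20
control = rng.normal(1.0, 0.2, size=(500, 40))
sample = rng.normal(1.0, 0.2, size=(1000, 40))
sample[:80, :20] *= 1.5      # 8% of cells carry 3 copies of 7q (ratio 3/2)
sample[80:200, 20:] *= 0.5   # 12% of cells carry 1 copy of 18q (ratio 1/2)

gain_7q, loss_7q = call_gain_loss(arm_scores(sample, gene_arms, "7q"),
                                  arm_scores(control, gene_arms, "7q"))
gain_18q, loss_18q = call_gain_loss(arm_scores(sample, gene_arms, "18q"),
                                    arm_scores(control, gene_arms, "18q"))
print(gain_7q, loss_7q, gain_18q, loss_18q)
```

With a fixed cutoff, a small fraction of unaltered cells is also called as gained or lost, which is why comparison against a matched control sample (such as sgNC) matters for estimating background.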

We performed scRNA-seq on diploid hCECs 7 days after KaryoCreate for chromosome 7 (sgChr7-1), chromosome 18 (sgChr18-4), and sgNC to estimate the frequency of induced aneuploidy (Fig. 4; pIND20 vector, expression level intermediate compared to those in Fig. 8D). For each sample, we estimated arm-level gains or losses for most chromosomes, except those with few (<20) genes detected on the p arm. First, we confirmed that the expression of KNL1Mut-dCas9 with the sgNC construct did not significantly induce aneuploidy compared to that in cells treated with the EV control (Fig. 4, Fig. 9C), as it led to very low percentages of gains and losses across chromosomes, averaging 0.9% for gains and 1.2% for losses. We confirmed the induction of chromosome-specific gains or losses after KaryoCreate, consistent with our FISH experiments (Fig. 3D, 3E). For example, scRNA-seq showed 10% gains and 17% losses for chromosome 18 (sgChr18-4) (Fig. 4, Table S3) and 9% gains and 11% losses for chromosome 7 (sgChr7-1) (Fig. 4, Table S3). scRNA-seq confirmed that KaryoCreate-induced aneuploidy was highly specific, with an average background level of nonspecific aneuploidy of 1% (Fig. 4, Table S3). Notably, the gains (0.9%) and losses (1.2%) observed in the sgNC sample across chromosomes are about 3 times lower than those observed by DNA FISH (3% for both gains and losses) (Fig. 3E), again suggesting that scRNA-seq underestimates aneuploidy, and especially gains, compared to FISH (Table S3).

We further tested KaryoCreate using sgRNAs targeting additional chromosomes, including 6, 8, 9, 12, 16, and X, that were previously confirmed to induce foci with mScarlet-dCas9 (Fig. 4; see also Fig. 1 and Fig. 6). We performed KaryoCreate with the diploid hCECs expressing KNL1Mut-dCas9 (pIND20) and analyzed the cells through scRNA-seq 7 days after doxycycline induction. In all cases, cells expressing the chromosome-specific sgRNAs showed more gains and losses of the targeted chromosome than those expressing sgNC. The chromosome-specific gains and losses differed among the chromosomes and ranged between 5% and 12% for gains (average across 10 chromosomes: 8%) and between 7% and 17% for losses (average across 10 chromosomes: 12%) (Fig. 4, Table S3). Notably, gains or losses of the non-targeted chromosomes never exceeded those in the sgNC control.

In agreement with our previous findings (Fig. 8D), the expression levels of the KNL1Mut-dCas9 construct correlated with the efficiency of KaryoCreate: a 3-fold increase in KNL1Mut-dCas9 expression (Fig. 8D) resulted in a 40-50% increase in both gains (from 9% to 16%) and losses (from 11% to 22%) (Fig. 4, compare sgChr7-1 and sgChr7-1 with high KNL1Mut-dCas9 expression). Furthermore, we successfully used KaryoCreate to induce multiple chromosomal gains or losses in the same cells, either by transducing cells simultaneously with multiple sgRNAs targeting different chromosomes (sgChr7-1 + sgChr18-4; 8% of cells had changes in both chromosomes 7 and 18; Fig. 9F) or by using a single sgRNA targeting multiple chromosomes (e.g. sgRNA 13-5, which targets both chromosomes 13 and 21 in hCECs; Fig. 4, Table S3). Finally, we obtained similar results using KaryoCreate in TP53 WT RPEs (Fig. 9D), suggesting that the method can be applied to different cell lines and in cells with an intact TP53 pathway.

Throughout the scRNA-seq analysis, we noted that in addition to whole-chromosome gains and losses, KaryoCreate also induced arm-level events, in which only one chromosomal arm (p or q) is gained or lost. Across the chromosomes tested, approximately 60% of aneuploidy events involved chromosome arms and 40% affected whole chromosomes (Fig. 9E). On average, there were 28% whole-chromosome losses, 17% whole-chromosome gains, 32% arm-level gains, and 23% arm-level losses (Fig. 9E, Table S3). Consistent with arm-level aneuploidy, we observed a modest increase in centromeric foci detected with the DNA damage marker γH2AX after expression of KNL1Mut-dCas9 and sgChr7-1 or sgChr18-4 (but not sgNC) for 10 days in HCT116 cells, in both interphase nuclei and mitotic cells; the average γH2AX signal intensity per cell, normalized to DAPI, also increased (Fig. 9G-9H and data not shown). In a time-course experiment, γH2AX signal had increased after 4 days of doxycycline treatment (approximately two cell cycles) but not after 2 days (approximately one cell cycle) (Fig. 9I). Notably, the ratio between arm-level and chromosome-level events also increased significantly after 4 (and 6) compared to 2 days of doxycycline treatment (Fig. 8H), indicating that DNA damage signal increases over prolonged binding of KNL1Mut-dCas9 to the centromere and proportionally to arm-level events (see Discussion).
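
The distinction between arm-level and whole-chromosome events drawn above can be expressed as a small decision rule over per-arm calls. The sketch below is illustrative only; the function name, the label strings, and the "complex" category for opposite calls on the two arms are assumptions, not categories defined in this disclosure.

```python
def classify_event(p_call, q_call):
    """Classify one chromosome in one cell from its p-arm and q-arm calls.

    Each call is "gain", "loss", or "neutral".
    """
    valid = {"gain", "loss", "neutral"}
    if p_call not in valid or q_call not in valid:
        raise ValueError("calls must be 'gain', 'loss', or 'neutral'")
    if p_call == q_call:
        # Both arms agree: either no event or a whole-chromosome event.
        return "none" if p_call == "neutral" else f"whole-chromosome {p_call}"
    if p_call == "neutral":
        return f"arm-level {q_call} (q)"
    if q_call == "neutral":
        return f"arm-level {p_call} (p)"
    # One arm gained and the other lost (hypothetical catch-all category).
    return "complex"

events = [classify_event(p, q) for p, q in
          [("loss", "loss"), ("neutral", "gain"), ("loss", "neutral"),
           ("gain", "loss"), ("neutral", "neutral")]]
print(events)
```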

Altogether these data show that KaryoCreate can generate chromosomal gains and losses across individual chromosomes as well as combinations of the human autosomes and sex chromosomes.

Example 6

18q loss in colon cells promotes resistance to TGFβ signaling likely due to haploinsufficiency of multiple genes.

We used KaryoCreate to model 18q loss and chromosome 7 gain, aneuploidy events frequently found in colorectal cancer. Chromosome 18q is lost in about 62% of colorectal cancers (TCGA Dataset; 49, Fig. 5A), and patients with 18q loss (n=136) show poorer survival than those without (n=86) (p=0.04, log-rank test, Fig. 5B). Chromosome 7 gain is present in 50% of patients (Fig. 5A).

To model these events, we performed KaryoCreate on hCECs using sgChr7-1, sgChr18-4, or sgNC as above (see also Methods). About 20 single-cell-derived clones were derived for each condition and their copy number profiles evaluated by WGS. After KaryoCreate, cells were seeded at low density and allowed to grow into colonies for 3-4 weeks, a longer time than in the experiments above (Fig. 4), during which cells likely experienced selective pressure for the ability to grow as single colonies (Fig. 10A).

Compared to clones derived from the sgNC control population, clones derived from sgChr7-1 showed an increase in chr7 gains from 0% (sgNC) to 22% but no losses (0% for both conditions) (Fig. 10B). Clones derived from sgChr18-4 showed an increase in chr18 losses from 0% (sgNC) to 30% but no gains (0% for both conditions) (Fig. 5C). This recapitulates the recurrent patterns observed in human tumors, where chromosome 18 is frequently lost but virtually never gained (2%), whereas chromosome 7 is frequently gained and almost never lost (0.3%). We did not observe aneuploidy of chromosomes not targeted by KaryoCreate except for 10q gain, which was present in ~20% of clones for all conditions, including sgNC, and was likely present in the initial population. Next, to test whether KaryoCreate clones can be stably propagated, we cultured a chromosome 7 trisomic clone (sgChr7-1 clone 23) for several weeks; we confirmed chromosome 7 gain by FISH and WGS analysis before and after 25 population doublings (Fig. 10C). We obtained similar results for sgChr18-4 clone 14.

Given the association of chromosome 18q loss with poor survival (Fig. 5B), we characterized the phenotypes of clones with or without this loss, starting from two clones derived from the KaryoCreate hCECs with sgChr18-4: one disomic control (clone 13) and one with 18q loss (clone 14). We performed bulk RNA sequencing analyses of each clone and conducted differential expression analysis using DESeq2 50. Gene-set enrichment analysis (GSEA) for cancer hallmarks showed that the top pathway downregulated in clone 14 compared to clone 13 was TGFβ signaling (enrichment score=-0.59; q-value=0.006), followed by cholesterol homeostasis, myogenesis, and bile acid metabolism (Fig. 5D). TGFβ (transforming growth factor beta) normally inhibits the proliferation of colon epithelial cells by promoting their differentiation; its inhibition through intestine niche factors such as Noggin is essential for the proliferation and expansion of colon epithelial cells 51. We tested the effect of TGFβ activation in our clones through an in vitro cell proliferation assay in which we cultured clones 13 and 14 in the presence of TGFβ (20 ng/ml) for 10 days. At day 9, TGFβ treatment had reduced cell growth by about 45% for the control clone 13 but <10% for clone 14 (Fig. 5E; p=0.02). Altogether, these data suggest that 18q deletion leads to decreased response to the growth-inhibitory signals derived from TGFβ treatment. We obtained similar results with an independent pair of clones, clone 10 (diploid) and clone 5 (lacking chromosome 18) (Fig. 10E).

Chromosome 18q harbors the tumor-suppressor gene SMAD4 (located on 18q21.2), encoding a transcription factor critical for mediating response to TGFβ signaling 52,53. In colorectal cancer, SMAD4 can be inactivated through point mutation (29% of patients) 54 or genomic loss (62% of patients); in 96% of cases of genomic loss, the deletion encompasses the entire chromosome arm. A previous study suggested that mutations may occur before chromosomal instability 54. Independently of the timing of SMAD4 mutations versus 18q loss, it is unknown whether the decreased survival in 18q loss patients (Fig. 5B) is a consequence of the complete loss of SMAD4 (due to co-occurring point mutation in the other allele) or is independent of SMAD4 mutation and possibly due to simultaneous loss of several tumor-suppressor genes on 18q, as previously suggested 55. To distinguish between these possibilities, we assessed the contribution of 18q loss to patient survival after excluding patients with point mutations in SMAD4: if 18q loss serves to abolish SMAD4 function through deletion of the wild-type allele when one copy of SMAD4 carries a point mutation, we would predict that 18q loss would lose its association with patient survival after patients with SMAD4 mutations are excluded. Instead, 18q loss remained a significant predictor of survival after SMAD4-mutated patients were removed, indicating that decreased survival could be a consequence of the deletion of several tumor-suppressor genes on 18q (Fig. 10D; p-value of 0.006, lower than in the analysis including all patients, see Fig. 5B).

To systematically predict tumor-suppressor genes located on 18q, we developed a score using three computational parameters based on the TCGA dataset: 1. correlation between DNA and RNA level of each gene across patients 56; 2. association of expression level of each gene with patients’ survival; 3. TUSON-based prediction of the likelihood for a gene to behave as a tumor-suppressor gene based on its pattern of point mutations 4. The top ten predicted genes were SMAD2, ADNP2, MBD1, ATP8B1, WDR7, MBD2, DYM, SMAD4, ZBTB7C, and LMAN1 (Fig. 5F). SMAD2, a paralogue of SMAD4 located on 18q21.1, is also a transcription factor acting downstream of TGFβ signaling 51,57. Thus, concomitant decreases in gene dosage of both SMAD4 and SMAD2 could synergistically mediate the unresponsiveness of cells to TGFβ signaling.
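
The text above names the three parameters but does not state how they were combined. As one possible illustration, the sketch below averages the rank of each gene under each parameter; this rank-averaging scheme, the function names, the placeholder gene labels, and the input values are all assumptions made for the example and should not be read as the scoring method actually used for Fig. 5F.

```python
def rank_scores(values):
    """Convert raw values to rank-based scores in (0, 1]; the largest gets 1."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = rank / n
    return scores

def composite_score(dna_rna_corr, survival_assoc, tuson_score):
    """Average the three per-parameter rank scores for each gene."""
    parts = [rank_scores(v) for v in (dna_rna_corr, survival_assoc, tuson_score)]
    return [sum(triple) / 3 for triple in zip(*parts)]

# Placeholder genes and made-up inputs (illustrative only, not TCGA values).
genes = ["GENE_A", "GENE_B", "GENE_C"]
scores = composite_score(
    dna_rna_corr=[0.8, 0.2, 0.5],     # parameter 1: DNA/RNA correlation
    survival_assoc=[2.0, 0.5, 1.0],   # parameter 2: survival association
    tuson_score=[0.9, 0.1, 0.3],      # parameter 3: TUSON-style likelihood
)
ranking = [g for _, g in sorted(zip(scores, genes), reverse=True)]
print(ranking)
```

Rank averaging is used here only because it puts parameters with different units on a common scale; other combinations (for example, weighted sums of z-scores) would be equally plausible.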

We tested the role of decreased dosage of SMAD2 and SMAD4 proteins in our clone containing 18q loss. We confirmed by both RNA-seq and Western blotting a decrease in both SMAD2 and SMAD4 in clone 14 compared to control clone 13 (Fig. 5G; SMAD4 log2FC: -0.78, p<0.0001; SMAD2 log2FC: -0.75, p<0.0001). Furthermore, overexpression of SMAD2 and SMAD4 in clone 14 decreased proliferation rate after TGFβ treatment to a level similar to clone 13 (Fig. 10E, 10F). To further test whether the increased resistance to TGFβ treatment after 18q loss was due to the synergistic effects of decreases in both SMAD2 and SMAD4 (as opposed to SMAD4 only), we derived hCECs with a ~50% decrease in SMAD4 protein level by CRISPR interference (Fig. 10G, 10H). In proliferation assays, cells with 18q loss (clone 14) were more resistant to TGFβ treatment than hCECs with decreased SMAD4 levels (Fig. 10G, 10H), indicating that 18q loss has a greater effect than a ~50% decrease in SMAD4 expression.

These computational and experimental data suggest that chromosome 18q loss, one of the most frequent events in gastro-intestinal cancers, is associated with poor survival and promotes resistance to TGFβ signaling, likely because of the synergistic effect of simultaneous deletion of haploinsufficient genes.

Discussion of Examples

Chromosome-specific centromeric sgRNAs

KaryoCreate includes the design of sgRNAs targeting chromosome-specific α-satellite DNA. Among 75 tested, we validated 24 sgRNAs specific for 16 different chromosomes (Fig. 1, Fig. 6, Table S1). Since centromere sequences vary across the human population, we designed sgRNAs using two genome assemblies (CHM13 and GRCh38) and tested them in different cell lines (hCECs, RPEs, and HCT116), increasing their likelihood of targeting conserved regions.

The disclosure demonstrates the design and use of sgRNAs to target human centromeres for most human chromosomes. Some chromosomes are not included because their centromeric sequences share high similarity across specific chromosome groups (e.g. acrocentric chromosomes), because the low GC content of their centromeric sequences likely decreases gRNA activity, or because they lack sufficient predicted binding sites (e.g. D21Z1, D15Z3, and D3Z1 in the CHM13 assembly have relatively small active centromere regions) 21,58. The efficiency of centromeric sgRNAs is not accurately predicted using algorithms for non-centromeric regions 35 (Fig. 6E). Using more than one sgRNA simultaneously did not improve aneuploidy induction (Fig. 8E, 8F). Because of the repetitive nature of centromeres, any pair of sgRNAs is predicted to bind multiple times and relatively close together, potentially inducing competition or interference among KNL1Mut-dCas9 molecules.

Comparison of KaryoCreate with similar technologies

Other strategies have been recently described to induce chromosome-specific aneuploidy targeting non-centromeric repeats and have been successful for chromosome 1 using a sub-telomeric repeat and chromosome 9 using a pericentromeric repeat 16,17. Tovini et al. used dCas9 fused to the kinetochore-nucleating domain of CENPT to form an ectopic kinetochore. Truong et al. tethered a plant kinesin to pull the chromatids towards one pole of the mitotic spindle, potentially generating a pseudo-dicentric chromosome, as suggested by the fact that most aneuploidies observed were of part of the targeted chromosome (chromosome 9). KaryoCreate is distinct in that it uses endogenous centromeric sequences to allow the generation of nearly any karyotype of interest. We found that cells progressed normally through the cell cycle with an expected brief delay in metaphase, likely due to attempts at correcting merotelic attachments 59,60. Also, in contrast to existing technologies, KaryoCreate can induce specific aneuploidies across several chromosomes or combinations thereof (Table S3). KaryoCreate also enables induction of aneuploidy not only in TP53-/- cells but also in TP53 WT cells such as HCT116 cells (Fig. 2E) and RPEs (Fig. 9D).

Targeting mutant kinetochore proteins to centromeric α-satellites to engineer chromosome-specific aneuploidy

Tethering of chimeric dCas9 with mutant forms of KNL1 or NDC80 to human centromeres induces chromosome- and arm-level gains and losses (Fig. 8B). Data in this disclosure suggest that dCas9 itself may induce low-frequency aneuploidy, possibly due to tethering of a bulky protein to the centromeric repeats 16,17,42. Remarkably, the expression of chimeric mutants of kinetochore proteins at centromeric regions induces about 3 times as many aneuploidy events compared to dCas9 alone, which may be due to the disruption of their proper kinetochore functions (Fig. 8B). We noted that different mutants show different efficiency of aneuploidy induction relative to their expression level (Fig. 7B, 8B). NDC80 mutants induced aneuploidy efficiently relative to their low expression level, suggesting a higher degree of kinetochore disruption compared to KNL1 fusion (Fig. 7B, 8B). Of the two chimeras containing KNL1 mutants, we predicted that KNL1S24A;S60A-dCas9 would result in a more efficient induction of chromosome gains and losses than KNL1RVSF/AAAA-dCas9, owing to a more efficient inhibition of Aurora-B-mediated error correction through recruitment of PP1 28,41. Although this was not the case in terms of absolute level of aneuploidy, KNL1S24A;S60A-dCas9 efficiency was higher when normalized for protein expression level (Fig. 8B).

Induction of arm-level gains and losses

About 55% of the aneuploidy events generated by KaryoCreate are arm-level events. In addition, we observed more losses (60%) than gains (40%) for both chromosome and arm events. Our data reveal a small fraction of centromeres positive for γH2AX upon aneuploidy induction with KaryoCreate (Fig. 9G-9I), especially upon prolonged centromere recruitment of KNL1Mut-dCas9 and proportionally to the ratio between arm-level and chromosome-level events (Fig. 8H). The mere recruitment of a bulky protein to the centromere may influence centromere function, as our data on the effect of dCas9 alone suggest (Fig. 8B) 18,31,61-63. When recruited to the highly repetitive centromeric regions, dCas9 may influence chromosome segregation through impaired replication or transcription affecting chromatin, transcripts, and R-loops and, in turn, centromere function 62-66.

Chromosome-specific aneuploidy as a driver of cancer hallmarks

We used KaryoCreate to induce missegregation of chromosomes 7 and 18, two of the chromosomes most frequently aneuploid in colorectal tumors. Among the single-cell-derived clones, chromosome 7 tended to be gained and chromosome 18 tended to be lost (Fig. 5C, Fig. 9B), indicating that the selective pressure acting during tumor evolution to shape recurrent patterns of aneuploidy may also act in vitro 4,7. In our analyses, 18q loss was a strong predictor of poor survival, consistent with previous studies 67,68; in addition, the association of 18q loss with survival was independent of SMAD4 point mutations. We showed that chr18q loss can promote resistance to TGFβ signaling in colon cells. While SMAD4 is a frequently mutated tumor-suppressor gene 54 on chr18q, the TGFβ resistance phenotype determined by 18q loss may be due not solely to its loss but to the cumulative effect of losing multiple tumor suppressors on the arm. In fact, a ~50% reduction in SMAD4 alone was not sufficient to recapitulate resistance to TGFβ signaling seen after 18q loss, and dosage increases in both SMAD4 and SMAD2 could rescue TGFβ resistance in 18q loss cells (Fig. 5E, Fig. 10E-10H). Thus, chromosome 18 loss may drive TGFβ resistance through hemizygous deletion of (at least) two haploinsufficient genes acting in the same pathway.

Previous studies have proposed that a single cancer-driver gene may confer the strong phenotypic effect of whole-chromosome gain or loss 69,70 . Other studies, including previous work on chromosome 18, have proposed that the selective advantage of aneuploidy is instead conferred by the cumulative effect of gene dosages of multiple genes 4,6,55,71 . The present data support this latter hypothesis. Altogether, these data suggest that 18q loss may drive tumor phenotypes in colorectal cancer through the cumulative loss of several tumor-suppressor genes located on the chromosome arm.

Cell lines

All cells were grown at 37°C with 5% CO2. hTERT TP53-/- human colonic epithelial cells (hCECs) 38 were cultured in a 4:1 mix of DMEM:Medium 199, supplemented with 2% FBS, 5 ng/mL EGF, 1 μg/mL hydrocortisone, 10 μg/mL insulin, 2 μg/mL transferrin, 5 nM sodium selenite, pen-strep, and L-glutamine. hTERT retinal pigment epithelial cells (RPEs) 39, either WT (Fig. 9D) or expressing p21 (CDKN1A) and RB (RB1) shRNAs (Fig. 6D), and human colorectal carcinoma-116 cells (HCT116s) were incubated in DMEM, supplemented with 10% FBS, pen-strep, and L-glutamine. For long-term storage, cells were cryopreserved at -80°C in 70% medium (according to cell line), 20% FBS, 10% DMSO. TP53 was knocked out in hCECs by transfection with a Cas9-containing plasmid (Addgene #42230) and pLentiGuide-Puro expressing the following sgRNA: GCATGGGCGGCATGAACCGG (SEQ ID NO: 6). Clones were derived and tested for the expression of TP53.

METHODS DETAILS

Cloning of KaryoCreate Constructs

Cas9 and dCas9 without ATG and without stop codon (for N-terminal and C-terminal tagging respectively) were cloned into D-TOPO vector (Thermo #K240020). Cloning of KNL1RVSF/AAAA-dCas9 was achieved by inserting KNL1 PCR product (aa 1-86, amplified from Addgene plasmid #45225 28) into XhoI-digested pENTR-dCas9 (no ATG) using Gibson assembly. The GGSGGGS (SEQ ID NO: 5) linker was added between KNL1 and dCas9. Cloning of KNL1S24A;S60A-dCas9 was achieved starting from KNL1RVSF/AAAA-dCas9 and inserting the appropriate mutations using Gibson assembly. Cloning of NDC80-CH1-dCas9 was achieved by Gibson assembly of NDC80 aa 1-207 (generously provided by Dr. Jennifer DeLuca) with BamHI-digested pENTR dCas9 (ATG). Cloning of NDC80-CH2-dCas9 was achieved in a similar way except that 2 CH domains were cloned in tandem separated by a linker (see also Fig. 7A).

To generate an inducible KNL1Mut-dCas9 construct, the FKBP12 degradation domain (DD, Banaszynski 2006 45) was first amplified from Degron-KI-donor backbone (Addgene #65483) and inserted at the N-terminus of the fusion protein sequence in pENTR-KNL1RVSF/AAAA-dCas9 using Gibson cloning. Gateway LR cloning was then used to yield the expression vector, pHAGE-DD-KNL1RVSF/AAAA-dCas9. pHAGE-3xmScarlet-dCas9 was generated by first assembling three mScarlets in series and inserting them into the BsaI-digested pAVIO vector by Golden Gate cloning. The assembled 3xmScarlet was then inserted into XhoI-digested pENTR-dCas9 using Gibson cloning to form pENTR-3xmScarlet-dCas9.

All pENTR vectors were cloned into specific pDEST vectors by LR reaction (Thermo #11791020) following the manufacturer’s instructions. pDEST vectors used in this study were pHAGE (blast resistance, CMV promoter) or pINDUCER20 (or pIND20, neomycin resistance, doxycycline inducible promoter) 44 .

Cloning of sgRNAs

We modified the scaffold sequence of pLentiGuide-Puro (Addgene #52963) by Gibson assembly to contain the A-U flip (F) and hairpin extension (E) described by Chen et al. 72 for improved sgRNA-dCas9 assembly, obtaining pLentiGuide-Puro-FE. sgRNAs were designed and cloned into this pLentiGuide-Puro-FE vector according to the Zhang Lab General Cloning Protocol 73 (also addgene.org/crispr/zhang/) (see also Table S1 for sgRNA sequences). To be suitable for cloning into BbsI-digested vectors, sense oligos were designed with a CACC 5’ overhang and antisense oligos were designed with an AAAC 5’ overhang. The sense and antisense oligos were annealed, phosphorylated, and ligated into either BbsI-digested pLentiGuide-Puro-FE for KaryoCreate and imaging purposes or pX330-U6-Chimeric_BB-CBh-hSpCas9 74 (Addgene #42230) for CRISPR/Cas9 editing applications. Sequences were confirmed by Sanger sequencing.
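
The oligo design rule described above (a CACC 5’ overhang on the sense oligo and an AAAC 5’ overhang on the antisense oligo) can be sketched in a few lines of Python. The helper names are hypothetical, and the example guide is the TP53 sgRNA listed under Cell lines (SEQ ID NO: 6). The sketch assumes the guide is supplied 5’ to 3’ without the PAM, and it does not add the extra 5’ G that some cloning protocols prepend for U6-driven transcription when a guide does not start with G.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def design_oligos(guide):
    """Build sense/antisense oligos with CACC and AAAC 5' overhangs."""
    guide = guide.upper()
    if not guide or set(guide) - set("ACGT"):
        raise ValueError("guide must contain only A, C, G, T")
    sense = "CACC" + guide
    antisense = "AAAC" + reverse_complement(guide)
    return sense, antisense

# TP53 sgRNA from the Cell lines section (SEQ ID NO: 6).
sense, antisense = design_oligos("GCATGGGCGGCATGAACCGG")
print(sense)      # CACCGCATGGGCGGCATGAACCGG
print(antisense)  # AAACCCGGTTCATGCCGCCCATGC
```

After annealing, the two oligos form a duplex whose 4-nt single-stranded ends are the CACC and AAAC overhangs described in the text.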

Lentivirus production and nucleofection

For transduction of cells, lentivirus was generated as follows: 1 million 293T cells were seeded in a 6-well plate 24 hours before transfection. The cells were transfected with a mixture of gene transfer plasmid (2 μg) and packaging plasmids including 0.6 μg ENV (VSV-G; Addgene #8454), 1 μg Packaging (pMDLg/pRRE; Addgene #12251), and 0.5 μg pRSV-REV (Addgene #12253) along with CaCl2 and 2x HBS or using Lipofectamine 3000 (Thermo #L3000075). The medium was changed 6 hours later and virus was collected 48 hours after transfection by filtering the medium through a 0.45-μm filter. Polybrene (1:1000) was added to filtered medium before infection.

Nucleofection of hCECs was carried out using the Amaxa Nucleofector II (Lonza), using the program optimized for the HCT116 cell line. Approximately 1 million cells suspended in 100 μL of electroporation buffer (80% 125 mM Na2HPO4·7H2O, 12.5 nM KCl, 20% 55 mM MgCl2) were subjected to electroporation in the presence of a vector and then immediately returned to normal medium.

KaryoCreate Experiments

The disclosure includes three representative approaches to perform the described KaryoCreate process. One difference between these methods is the way KNL1Mut-dCas9 and the sgRNA are expressed in the cell.

Representative methods to express KNL1Mut-dCas9:

A) KNL1Mut-dCas9 is expressed from a doxycycline-inducible promoter (pIND20-KNL1Mut-dCas9) through a viral vector constitutively integrated in the genome of the target cell. Cells are treated with doxycycline (1 μg/ul) for 7-9 days.

B) KNL1Mut-dCas9 is expressed from a constitutive promoter (pHAGE-KNL1Mut-dCas9; CMV promoter) through transient transfection.

C) KNL1Mut-dCas9 is expressed through a viral vector constitutively integrated in the genome of the target cell; the expression level of KNL1Mut-dCas9 is regulated through a degron (pHAGE-DD-KNL1Mut-dCas9; see above).

For the sgRNA, expression is mediated by the pLentiGuide-Puro-FE vector through infection or transient transfection. In this disclosure, unless otherwise specified, the sgRNA was introduced through infection. For a comparison of the three different methods, see Figure 8A.

Western blot analysis

Cells were harvested by trypsinization and lysed in 2x NuPAGE LDS buffer (Thermo #NP0007) at 10^6 cells in 100 μl of buffer. DNA was sheared using a 28 1/2-gauge insulin syringe and lysate was denatured by heating at 80°C for 10 min. Lysate equivalent to 10^5 cells was resolved by SDS/PAGE using a NuPAGE 4-12% Bis-Tris mini gel and transferred to a PVDF membrane (Bio-Rad #1704274). The membrane was then blocked in 5% milk in TBS with 0.1% Tween-20 (TBS-T) for 1 hour at room temperature. Afterward, the membrane was probed with Cas9 (Abcam #ab191468, 1:1000 dilution) and GAPDH (Santa Cruz #sc-47724, 1:10,000 or 1:100,000 dilution) or β-actin (Cell Signaling Technology #8844) primary antibodies and incubated in 1% milk in TBS at 4°C overnight. For SMAD2 and SMAD4 western blots, Abcam Ab40855 and Santa Cruz Biotechnology #Sc-7966 were used.

Subsequently, the membrane was washed three times with TBS-T and incubated with HRP-anti-Mouse secondary Ab (Abcam #ab205719, 1:1000 dilution) in 1% milk/TBS for 1 hour at room temperature. Signals were detected using an ECL system using 1:1 detection solution (Thermo Scientific #32209) after three 10-min washes in TBS-T. Images were acquired using a BIORAD transilluminator.

Fluorescence in situ hybridization (FISH)

For the analyses confirming centromeric localization of 3xmScarlet-dCas9 and localization of specific chromosomes within micronuclei, FISH was performed using an Empire Genomics chromosome 7 control probe (CHR07-10-GR) or chromosome 18 control probe (CHR18-10-GR) on PFA-fixed cells according to the manufacturer’s manual hybridization protocol.

FISH analysis was carried out on interphase nuclei and metaphase spreads prepared as follows: Cells at 70% confluence were harvested by trypsinization (after 3- to 4-hour treatment with 100 ng/mL colcemid (Roche #10295892001) for metaphase spreads), washed with PBS, suspended in 0.075 M KCl at 37°C, and fixed in methanol-acetic acid (3:1) at 4°C. Fixed cells were dropped onto glass slides and then allowed to air dry overnight.

The slides were next incubated with RNase solution (20 µg RNase A in 2x SSC) for one hour at 37°C in a dark moist chamber. Denaturing was performed using a 70% formamide solution (in 2x SSC) for 3 min at 80°C prior to hybridization. Biotinylated/digoxigeninated probes were obtained by nick translation from BAC DNA (RP11-22N19 for chromosome 7, RP11-76N11 for chromosome 13, and RP11-787K12 for chromosome 18 from the BACPAC Resource Center). 200 ng of each labeled probe, together with 8 µg Human Cot-I DNA (Thermo #15279011) and 3 µg Herring Sperm DNA (Thermo #15634017), were precipitated for 1 hour at -20°C in 1/10 volume of 3 M sodium acetate and 3 volumes of ethanol. The pelleted probe was washed with 70% ethanol, air dried, and resuspended in hybridization solution (50% deionized formamide, 10x dextran sulfate, 2x SSC). The hybridization solution containing the probes was then denatured at 80°C for 10 min and incubated at 37°C for 20 min to allow annealing of the Cot-I competitor DNA. The sealed hybridized slides were then incubated at 37°C in a dark moist chamber overnight. The following day, slides were washed in 1x SSC at 60°C (3 times, 5 min each) and incubated with a blocking solution (BSA, 2x SSC, 0.1% Tween-20) for 1 hour at 37°C in a moist chamber. Following blocking, the slides were incubated with detection solution containing BSA, 2x SSC, 0.1% Tween-20, FITC-conjugated avidin (Thermo #21221), and 10 µl Rhodamine-Anti-Digoxigenin (Sigma #11207750910) to detect the biotin and digoxigenin signals. Finally, slides were washed 3 times (5 min each) with 4x SSC and 0.1% Tween-20 solution at 42°C and then mounted with DAPI to stain DNA (Vector Laboratories #H-1200-10).

Images were acquired using an Invitrogen™ EVOS™ M700 imaging system or a Nikon TI Eclipse. The number of fluorescent signals was counted in 100 intact nuclei per slide. Adobe Photoshop was used to count the signals and correct the images.

Live-cell imaging

Cells were plated on 35-mm glass-bottom microwell dishes (MatTek P35G-1.5-14-C) 1 day prior to imaging. Imaging was performed at 37°C and 5% CO2 using an Andor Yokogawa CSU-X confocal spinning disc on a Nikon TI Eclipse microscope. Samples were exposed to 488-nm (30-ms) and 561-nm (100-ms) lasers and fluorescence was recorded with an sCMOS Prime95B camera (Photometrics). A 100x objective was used to acquire images at 0.9-µm steps (total range size = 9 µm) every 1 or 3 min as indicated in the figure legends. Image analysis was performed using ImageJ and formatting (cropping, contrast adjustment, labeling) was performed in Adobe Photoshop.

Chromosome misalignment staining

HCT116 cells were plated onto coverslips coated with 5 µg/ml fibronectin (Sigma-Aldrich) at 60-70% confluence and synchronized with 7.5 µM RO-3306 (Sigma-Aldrich) for 16 hours at 37°C. Cells were released from RO-3306 for 40 min and then treated with 10 µM MG-132 (Tocris) for 90 min at 37°C. Cells were then fixed with 4% paraformaldehyde for 12 min at room temperature and blocked in 5% BSA for 30 min. Samples were stained with the following antibodies for 90 min at room temperature: anti-α-Tubulin (Sigma-Aldrich #T9026, 1:1500 dilution) and anti-centromeric antibody (Antibodies Incorporated SKU 15-234, 1:100 dilution). Cy™3 AffiniPure (Jackson ImmunoResearch #715-165-150) and Alexa 647-labeled (Jackson ImmunoResearch #709-606-149) secondary antibodies were used at 1:400 for 45 min at room temperature. Coverslips were mounted using Mowiol. Cells were imaged using a Leica SP5 confocal microscope with a 63x objective. FIJI software was used for image analysis.

Low-pass whole-genome sequencing

Genomic DNA was extracted from trypsinized cells using 0.3 µg/µL Proteinase K (Qiagen #19131) in 10 mM Tris, pH 8.0, for 1 hour at 55°C and then heat inactivated at 70°C for 10 min. DNA was digested using NEBNext® dsDNA Fragmentase® (NEB #M0348S) for 25 min at 37°C and then subjected to magnetic DNA bead cleanup with Sera-Mag Select Beads (Cytiva #293430452), 2:1 bead/lysate ratio by volume. DNA libraries with an average library size of 320 bp were created using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (NEB #E7645L) according to the manufacturer’s instructions. Quantification was performed using a Qubit 2.0 fluorometer (Invitrogen #Q32866) and the Qubit dsDNA HS kit (Invitrogen #Q32854). Libraries were sequenced on an Illumina NextSeq 500 at a target depth of 4 million reads in either paired-end mode (2 x 36 cycles) or single-end mode (1 x 75 cycles).

RNA bulk sequencing

Clones were plated in 6-well plates 1 day before collection. On the day of collection, cells were checked for 70-90% confluency and normal morphology. Cells were washed twice with PBS and stored at -80°C immediately. RNA was purified for bulk sequencing using the Qiagen RNeasy Mini Kit (Qiagen #74106). RNA concentration and integrity were assessed using a 2100 BioAnalyzer (Agilent #G2939BA). Sequencing libraries were constructed using the TruSeq Stranded Total RNA Library Prep Gold (Illumina #20020598) with an input of 250 ng and 13 cycles of final amplification. Final libraries were quantified using High Sensitivity D1000 ScreenTape (Agilent #5067-5584) on a 2200 TapeStation (Agilent #G2964AA) and the Qubit 1x dsDNA HS Assay Kit (Invitrogen #Q32854). Samples were pooled in equimolar amounts and sequenced on an Illumina NovaSeq 6000 SP 100 Cycle Flow Cell v1.5 as paired-end 50 reads.

Clone derivation

hCECs were transduced with pHAGE-DD-KNL1Mut-dCas9 and a sgRNA vector, and DD-KNL1Mut-dCas9 was stabilized with 100 nM Shield-1 (CheminPharma #CIP-S1, 0.5 nM) for 9 days. Three days after Shield-1 treatment, 20-500 cells were plated per 15-cm plate and were incubated in normal culture conditions until colonies were visible (~2-3 weeks). Colonies were then picked by applying wax cylinders to the area surrounding each clone, trypsinizing the cells, and moving them to separate wells in 48-well plates for further expansion.

Single-cell RNA sequencing

scRNA-seq libraries were prepared using the 10x Chromium Single-Cell 3' v3 Gene Expression kit according to the manufacturer's instructions, including the manufacturer's protocol for cell surface protein (hashtag antibody) feature barcoding. Up to 10 TotalSeq-B hashtag antibodies (BioLegend) were used for multiplexing samples in each sequencing run.

Immunofluorescence for centromeric damage

Cells were grown on poly-L-lysine coverslips, fixed in 2% PFA (Sigma-Aldrich 8187081000) in 1x PBS, and washed three times in 1x PBS. Fixed cells were permeabilized with 1x PBS and 0.2% Triton (Sigma-Aldrich X100, 500 ml) for 5 min at room temperature and washed again before being blocked with PBS-0.1% Tween 20 (Sigma-Aldrich P1379, 500 ml) plus 5% BSA for 10 min. Cells were then incubated with primary antibodies, γH2AX (Sigma-Aldrich 05-636) diluted 1:200 and CREST (Antibodies Incorporated 15-234-0001). After 45 min, cells were washed three times with 1x PBS and 0.1% Tween 20 and then incubated with the secondary antibodies anti-Mouse Alexa 488 (Jackson ImmunoResearch 711-545-152) and anti-Human Alexa 647 (Jackson ImmunoResearch 109-605-044). After 30 min, cells were washed twice with 1x PBS and 0.1% Tween 20 and once with 1x PBS with DAPI (Sigma-Aldrich 28718-90-3) diluted 1:750 from a 0.5 mg/ml stock. After 5 min, cells were washed one last time with 1x PBS and mounted using ProLong Glass Antifade Mountant (Thermo Scientific P36980). Images were acquired using a Thunder Leica fluorescent microscope at 100x magnification with a 0.2 µm z-stack and then processed using FIJI-ImageJ 75 to obtain a maximum projection.

Quantification of centromeric damage

For each cell, the number of γH2AX and CREST colocalizing foci was scored using maximum projection images.

Quantification of the fluorescent mean intensity signal

FIJI software was used to select the area of each cell and measure the signal mean intensity of the maximum projection images.

Overexpression or downregulation of SMAD2 and SMAD4

To overexpress human SMAD2 and SMAD4, cDNA for each gene was cloned into pHAGE vectors. CRISPRi (CRISPR-inhibition) was used to downregulate SMAD4 expression by transducing dCas9 into the cells using a pHAGE-dCas9 vector together with a CRISPR-interference sgRNA (GGCAGCGGCGACGACGACCA (SEQ ID NO: 7)) from Gilbert et al 76 cloned into pLentiGuide-Puro-FE.

QUANTIFICATION AND STATISTICAL ANALYSIS

Replicates, statistical analyses and scale bars

For each experiment, we report in the figure legends the sample size and whether triplicates or duplicates were performed. Unless otherwise specified, triplicates or duplicates were biological, not technical; p-values are from the Wilcoxon test; and at least 50 nuclei or cells were analyzed in the FISH or IF experiments. Also, unless otherwise specified, the scale bars in the FISH and IF images represent 5 µm.

Computational sgRNA prediction

The CHM13 centromeric sequences and whole-genome reference were downloaded from the T2T Consortium (github.com/marbl/CHM13) 29 and the hg38 reference genome from the UCSC genome browser. For the CHM13 centromeric sequences, the HOR region with the classification “Live” or “HOR L” was selected. For each HOR L region, all possible SpCas9 sgRNA sites with a pattern comprising 20 nucleotides followed by NGG as PAM were searched. For each possible sgRNA, the numbers of binding sites in the centromeric HOR L regions of each chromosome and in the whole genome were counted. The number of sgRNA binding sites was also determined using the hg38 reference. The GC content for each sgRNA was also determined.
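The site search described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the disclosure: the function names are hypothetical, matching is exact (no mismatches tolerated), and scanning both strands is an assumption, since the text does not state strand handling.

```python
import re
from collections import Counter

# A 20-nt protospacer followed by an NGG PAM. The lookahead lets
# overlapping sites be reported as separate matches.
SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_protospacers(seq):
    """Count each 20-mer followed by NGG on both strands of seq.

    Assumption: both strands are scanned and only exact matches count.
    """
    seq = seq.upper()
    counts = Counter()
    for strand in (seq, revcomp(seq)):
        for match in SITE.finditer(strand):
            counts[match.group(1)] += 1
    return counts


def gc_content(guide):
    """Fraction of G or C bases in a guide sequence."""
    return (guide.count("G") + guide.count("C")) / len(guide)
```

Running `count_protospacers` once per centromeric HOR L region and once on the whole genome would give the per-chromosome and genome-wide binding-site counts that the scores below are built from.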

For each sgRNA, two scores were determined: the chromosome specificity score, defined as the ratio between the number of binding sites on the centromere (HOR L) of the target chromosome (chromosome that we intend to target) and the total number of sites across all centromeres (HOR L) (given as a fraction or as a percentage after multiplication by 100), and the centromere specificity score, defined as the ratio between the number of binding sites on the centromere (HOR L) of the target chromosome and the number of binding sites across the whole genome (given as a fraction or as a percentage after multiplication by 100).
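As a worked illustration of the two ratios, the sketch below computes both scores from binding-site counts. The function and argument names are hypothetical, and the counts in the usage note are invented for the example.

```python
def specificity_scores(sites_per_centromere, genome_sites, target):
    """Return (chromosome specificity, centromere specificity) as fractions.

    sites_per_centromere: chromosome -> binding sites in its HOR L region
    genome_sites: binding sites across the whole genome
    target: the chromosome the sgRNA is meant to target
    """
    on_target = sites_per_centromere.get(target, 0)
    all_centromeres = sum(sites_per_centromere.values())
    chrom_spec = on_target / all_centromeres if all_centromeres else 0.0
    centro_spec = on_target / genome_sites if genome_sites else 0.0
    return chrom_spec, centro_spec
```

For example, a guide with 90 sites on the chromosome 7 centromere, 10 on the chromosome 1 centromere, and 120 sites genome-wide would have a chromosome specificity of 0.9 (90%) and a centromere specificity of 0.75 (75%).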

The sgRNA efficiency was evaluated based on 3 parameters: 1) GC content, 2) total number of binding sites in the centromere of the target chromosome, and 3) sgRNA activity predicted from previous studies by Doench et al. 35,36. With that method, the sgRNA activity is calculated based on 72 genetic features 36, which include the presence of certain nucleotides at specific positions along the sgRNA and the GC content. For a particular guide s_j, the model weights for the features i will be w_ij and the intercept will be int. The activity f(s_j) is then given via logistic regression as:

Predicted sgRNA activity f(s_j) falls into the range [0,1], with 0 as the worst score and 1 as the best score. Since CHM13 is a female-derived (XX) cell line, all binding sites for chromosome Y were evaluated based on hg38. Predicted sgRNAs are listed in Table S1.
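Assuming the standard logistic-regression form with the symbols defined above, the expression can be written as follows. Here g(s_j) is a name introduced only for illustration to denote the linear predictor, and the sum is assumed to run over the features i present in guide s_j.

```latex
g(s_j) = \mathrm{int} + \sum_{i} w_{ij},
\qquad
f(s_j) = \frac{1}{1 + e^{-g(s_j)}}
```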

Automated image quantification of FISH foci

In addition to manual counting of FISH foci (shown in Fig. 3 and Fig. 8), an automated image quantification was also performed (Table S3). FISH counts were calculated automatically using an in-house-developed Python script, available publicly at github.com/davolilab/FISH-counting. Individual nuclei were segmented by applying an automatic threshold to the DAPI channel after smoothing and contrast enhancement. Thresholded objects were filtered for area and solidity to remove erroneously segmented regions. For probe detection within segmented nuclei, a white tophat filter was applied to remove small spurious regions, and then the “blob_log” function from the scikit-image package 77 was used to identify and count fluorescent spots. Since it was observed that some FISH probes were incorrectly counted twice, a distance cutoff was applied so that spots within a set (minimal) distance count as one spot. Then, the probe numbers were aggregated and the percentages for different spot counts were calculated. The script was run in a Python 3.7 environment; for more details, see the GitHub repository.
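The distance-cutoff and aggregation steps can be sketched as below. This is a simplified standard-library illustration, not the code in the repository: the function names are hypothetical, the greedy merge order is an assumption, and the spot coordinates are assumed to have been detected already.

```python
import math


def merge_close_spots(spots, min_distance):
    """Collapse spots closer than min_distance into one.

    spots: list of (row, col) coordinates inside one nucleus.
    Greedy rule (an assumption): a spot is kept only if it lies at least
    min_distance from every spot kept so far.
    """
    kept = []
    for y, x in spots:
        if all(math.hypot(y - ky, x - kx) >= min_distance for ky, kx in kept):
            kept.append((y, x))
    return kept


def spot_count_percentages(counts_per_nucleus):
    """Percentage of nuclei showing each spot count."""
    total = len(counts_per_nucleus)
    return {
        n: 100.0 * counts_per_nucleus.count(n) / total
        for n in sorted(set(counts_per_nucleus))
    }
```

With a cutoff of 3 pixels, the spots (10, 10) and (11, 10) count as one focus while (30, 30) remains separate, so that nucleus would be scored as having two foci.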

Quantification of foci intensity

The regions corresponding to the FISH foci were determined by the threshold function of Fiji. Then, the average intensity of each determined region was calculated as the representative of the brightness of the focus by Fiji (used in Fig 6E).

Low-pass whole-genome sequencing analysis

Low-pass (~0.1-0.5x) whole-genome sequencing reads of cells were aligned to the reference human genome hg38 using BWA-mem (v0.7.17; github.com/lh3/bwa/releases/tag/v0.7.17) 78, and duplicates were removed using GATK (Genome Analysis Toolkit, v4.1.7.0) (https://gatk.broadinstitute.org/hc/en-us) 79 with default parameters to generate analysis-ready BAM files. BAM files were processed by the R package CopywriteR (v1.18.0; https://github.com/PeeperLab/CopywriteR) 80 to call the arm-level copy numbers.

Bulk RNA-seq analysis pipeline

RNA sequencing reads were processed, quality controlled, aligned, and quantified using the Seq-N-Slide software (github.com/igordot/sns) 81. In brief, total RNA sequencing reads were trimmed using Trimmomatic (https://github.com/timflutre/trimmomatic) 82 and mapped to the GENCODE human genome hg38 by STAR (github.com/alexdobin/STAR) 83. featureCounts (github.com/byee4/featureCounts) 84 was used to quantify reads and generate a genes-sample counts matrix. Differential gene expression (DGE) analysis was completed with DESeq2 in R (bioconductor.org/packages/release/bioc/html/DESeq2.html) 50. Gene ranks from DGE were used for pathway analysis using the GSEA preranked utility (www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html) 85. Further plotting and statistical analyses were completed in R.

Single-cell RNA sequencing data pre-processing

The CellRanger v6.1 pipeline (10X Genomics) was used to process single-cell RNA sequencing data. CellRanger count was used to align sequences and generate gene expression matrices. Sequences were aligned to the pre-built GRCh38-2020-A human reference for CellRanger. Gene expression matrices were generated with each column representing a cell barcode and each row representing a gene or hashtag oligo (HTO) sequence.

To identify the sample of origin for each cell barcode, the HTO count data from each 10X Chromium experiment were demultiplexed using the Seurat v4.0.3 package for R v4.1 (https://github.com/satijalab/seurat) 86 . Cell barcodes that could be confidently assigned to a single sample were kept. Several quality control thresholds were applied uniquely to each dataset on total gene number, total UMI counts, and total HTO counts to remove low-quality cells and potential cell doublets. Cells were also discarded if their proportion of total gene counts that could be attributed to mitochondrial genes exceeded 10%.
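The per-cell filtering logic can be illustrated with the sketch below. It is not the Seurat code used for the analysis: the dictionary keys and function names are hypothetical, and expressing each dataset-specific threshold as a (low, high) range is an assumption made so that both low-quality cells and likely doublets can be excluded.

```python
def in_range(value, bounds):
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def passes_qc(cell, gene_bounds, umi_bounds, hto_bounds, max_mito_fraction=0.10):
    """Apply gene, UMI, HTO, and mitochondrial-fraction filters to one cell.

    cell: dict with n_genes, total_counts, mito_counts, and hto_counts.
    The bounds are set per dataset; only the 10% mitochondrial cutoff
    comes from the text.
    """
    if cell["total_counts"] == 0:
        return False
    mito_fraction = cell["mito_counts"] / cell["total_counts"]
    return (
        in_range(cell["n_genes"], gene_bounds)
        and in_range(cell["total_counts"], umi_bounds)
        and in_range(cell["hto_counts"], hto_bounds)
        and mito_fraction <= max_mito_fraction
    )
```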

Modified CopyKat analysis

A modified version of the CopyKat vl.0.5 (github.com/navinlabcode/copykat) 46 pipeline for R was used to generate a copy number alteration (SCNA) score for each chromosome arm in each cell. Hashtagged samples from the same cell line in each 10X Chromium dataset were grouped together for analysis. Each such group of samples contained a diploid control sample used to set the SCNA value baseline centered around 0. For each analysis, genes expressed in less than 5% of the cells, HLA genes, and cell-cycle genes were excluded. The log-Freeman-Tukey transformation was used to stabilize variance and dlmSmooth() was used to smooth outliers. The diploid control sample for each set was used to calculate a baseline expression level for each gene. This value was subtracted from the samples in the set, centering the control sample expression around 0. Genes expressed in less than 10% of cells were then excluded from further analysis. The original CopyKat pipeline splits the transcriptome into artificial segments based on similar expression, and calculates a SCNA value for each segment. Instead, we generated a SCNA value for each chromosome arm by calculating the mean gene expression for the genes on that arm.
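The arm-level step that replaces CopyKat's segmentation can be sketched as follows. This is an illustrative simplification with hypothetical names: it assumes the expression values have already been transformed and smoothed, and it shows only the baseline subtraction and the per-arm mean.

```python
from statistics import mean


def arm_scna(cell_expr, baseline, gene_arm):
    """Mean baseline-centered expression per chromosome arm for one cell.

    cell_expr: gene -> transformed, smoothed expression in this cell
    baseline:  gene -> baseline expression from the diploid control
    gene_arm:  gene -> chromosome arm label (e.g. "7p")
    Genes missing from baseline or gene_arm are skipped.
    """
    per_arm = {}
    for gene, value in cell_expr.items():
        if gene in baseline and gene in gene_arm:
            per_arm.setdefault(gene_arm[gene], []).append(value - baseline[gene])
    return {arm: mean(values) for arm, values in per_arm.items()}
```

Because the control baseline is subtracted gene by gene, a diploid control cell is expected to score near 0 on every arm, while a gained arm scores above 0 and a lost arm below 0.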

A single SCNA value for the entire chromosome 18 was calculated using genes on both the p and q arms of the chromosome instead of each arm individually, due to its relatively small size. SCNA values for chromosomes 13, 14, 15, 21, and 22 were calculated only using genes on their respective q arms. Gains or losses of a chromosome arm relative to the control sample (diploid) were called based on a threshold calculated from the control sample for each chromosome arm. The threshold is calculated as median ± (2.5 x MAD) where the median is calculated from the SCNA values for each arm in the control sample, and the median absolute deviation (MAD) is calculated by the mad() function from the stats R package. Gains (or losses) are then called for a chromosome arm if its SCNA value is above (or below) the threshold for its sample set.
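The thresholding rule can be sketched in Python as below. The function names are hypothetical; the scaling constant 1.4826 is the default used by the mad() function of the stats R package, so the sketch should match that function under its default settings.

```python
from statistics import median

MAD_CONSTANT = 1.4826  # default scaling constant of R's stats::mad()


def mad(values):
    """Scaled median absolute deviation, as R's mad() computes by default."""
    center = median(values)
    return MAD_CONSTANT * median(abs(v - center) for v in values)


def call_thresholds(control_values, k=2.5):
    """Lower and upper thresholds: median -/+ k * MAD of the control sample."""
    center = median(control_values)
    spread = k * mad(control_values)
    return center - spread, center + spread


def call_arm(value, lower, upper):
    """Classify one SCNA value against the control-derived thresholds."""
    if value > upper:
        return "gain"
    if value < lower:
        return "loss"
    return "neutral"
```

The thresholds are computed separately for each chromosome arm from the control cells of the same sample set, then applied to every cell in that set.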

CopyKat data visualization

Heatmaps were generated using the ComplexHeatmap v2.8 R package 87. Each row represents one cell, each column represents a chromosome arm, and each value is the corresponding SCNA score. Column widths were scaled to the number of genes on the arm. For the heatmaps, cells were clustered by row of the chromosome of interest. Bar graphs were generated using the ggplot2 v3.3.5 R package.

Survival analysis

For survival analysis, the disease-free interval (DFI) and related clinical data were downloaded from cBioPortal 88. Arm-level copy number was downloaded from TCGA Firehose Legacy (https://gdac.broadinstitute.org). For each patient, purity α, ploidy τ, and integer copy number q(x) data were downloaded from GDC (https://gdc.cancer.gov/about-data/publications/pancanatlas). Before the analysis, the arm-level copy number values R(x) were adjusted using the formula below:

To stratify patients by the presence or absence of 18q arm loss, an arm-level log2 ratio below -0.3 was regarded as an arm-level loss event. A log-rank test between the stratified patient groups was used to calculate the p-value, and the Kaplan-Meier method was used to plot survival curves. Patients for whom clinical survival information was unavailable were excluded from the analysis. In addition, a Cox proportional hazards (PH) regression model was used to calculate each gene’s hazard ratio (HR) between the top 50% and bottom 50% expression.
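The stratification rule and the Kaplan-Meier estimate can be illustrated with the short sketch below. The names are hypothetical, the code uses only the standard library, and it does not implement the log-rank test or the Cox model, which in practice come from a statistics package.

```python
def has_arm_loss(log2_ratio, threshold=-0.3):
    """Arm-level loss call: log2 ratio below the threshold."""
    return log2_ratio < threshold


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve
```

Patients would first be split with `has_arm_loss` on their 18q log2 ratio, and `kaplan_meier` would then be run on each group to obtain the two curves being compared.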

Gene rank score analysis

For each gene on chromosome 18, we calculated the DNA-RNA Spearman’s correlation (rho value) from the TCGA-COADREAD dataset. Genes with no or very low frequency of SCNA (-0.02 < DNA log2FC < 0.02 in >70% of the patients) were removed because for those genes very little or no variance at the DNA level is likely to influence the correlation value. The Cox proportional-hazards model was then applied to estimate the association between the expression level of each gene and patients’ survival. The TUSON algorithm for predicting the likelihood for a gene to behave as a tumor-suppressor gene (TSG) based on its pattern of point mutation was from Davoli et al. 4 and was applied to the latest available TCGA dataset of point mutations. A gene rank score was generated based on the rank sum of the following three parameters: DNA-RNA correlation, hazard ratio from Cox proportional hazards regression, and q-value from TUSON-based TSG prediction. In other words, for each gene, the (three) rank position values determined based on the three parameters listed above were summed.
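The rank-sum construction can be sketched as follows. The function names are hypothetical, and the ranking directions are assumptions, since the text does not state them: here the highest correlation, the lowest hazard ratio, and the lowest q-value each receive rank 1, so a smaller summed score marks a stronger candidate. Ties are broken by input order.

```python
def rank_positions(values, descending=False):
    """Map each gene to its rank position (1 = first) for one parameter."""
    ordered = sorted(values, key=values.get, reverse=descending)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def gene_rank_scores(rho, hazard_ratio, tuson_q):
    """Sum the three rank positions for each gene.

    rho:          gene -> DNA-RNA Spearman correlation (assumed: higher ranks first)
    hazard_ratio: gene -> Cox hazard ratio (assumed: lower ranks first)
    tuson_q:      gene -> TUSON q-value (assumed: lower ranks first)
    """
    by_rho = rank_positions(rho, descending=True)
    by_hr = rank_positions(hazard_ratio)
    by_q = rank_positions(tuson_q)
    return {gene: by_rho[gene] + by_hr[gene] + by_q[gene] for gene in rho}
```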

SUPPLEMENTARY MATERIAL

LEGENDS TO SUPPLEMENTARY TABLES

Table S1. Prediction of sgRNA for each chromosome with CHM13 genome, Related to Figure 1 (see also Methods).

Table S1 contains in the first tab the sgRNA prediction for 76 selected sgRNAs across all chromosomes except chromosome Y. This table contains the sgRNA sequence, chromosome location, binding sites for specific CHM13 chromosome centromere, total binding sites across all centromeres, chromosome specificity (ratio between the number of binding sites on the centromere of that chromosome and the total number of sites across all centromeres), centromere specificity (ratio between the number of binding sites on the centromere of that chromosome and the number of binding sites across the whole genome), binding sites across whole CHM13 genome and hg38 genome, activity score (Doench score) and validation results by imaging. The table contains predictions of sgRNAs for every single chromosome, as indicated.

Table S2. Acrocentric chromosome sgRNA prediction in CHM13 and hg38 genome, Related to Figure 4. This table contains the specific sgRNA across different acrocentric chromosomes and includes predicted binding sites across different chromosomes, total binding sites across all centromeres with hg38 genome, and total binding sites across the whole hg38 genome.

Table S3. Automated quantification of FISH foci after KaryoCreate, Related to Figure 3. This table contains the number of FISH foci quantified using an automated FISH counting script (see Methods) designed to score FISH signals in interphase cells.

TABLE S1

TABLE S1 - SELECTED gRNAs

TABLE S1-CHR1

TABLE S1-CHR2

TABLE S1-CHR3

TABLE S1-CHR4

TABLE S1-CHR5

TABLE S1-CHR6

TABLE S1-CHR7

TABLE S1-CHR8

TABLE S1-CHR9

TABLE S1-CHR10

TABLE S1-CHR11

TABLE S1-CHR12

TABLE S1-CHR13

TABLE S1-CHR14

TABLE S1-CHR15

TABLE S1-CHR16

TABLE S1-CHR17

TABLE S1-CHR18

TABLE S1-CHR19

TABLE S1-CHR20

TABLE S1-CHR21

TABLE S1-CHR22

TABLE S1-CHRX

TABLE S1-CHRY(HG38)

TABLE S2

TABLE S2

TABLE S3

TABLE S3A (1st Part)

TABLE S3A (2nd Part)

TABLE S3A (3rd Part)

TABLE S3B

TABLE S3B (1st Part)

TABLE S3B (2nd Part)

TABLE S3B (3rd Part)

TABLE S3C

REFERENCES - This reference listing is not an indication that any reference is material to patentability

1. Knouse, K.A., Wu, J., Whittaker, C.A., and Amon, A. (2014). Single cell sequencing reveals low levels of aneuploidy across mammalian tissues. Proc. Natl. Acad. Sci. U. S. A. 111, 13409-13414. 10.1073/pnas.1415287111.

2. Knouse, K.A., Davoli, T., Elledge, S.J., and Amon, A. (2017). Aneuploidy in Cancer: Seq-ing Answers to Old Questions. Annu. Rev. Cancer Biol. 1, 335-354. 10.1146/annurev-cancerbio-042616-072231.

3. Beroukhim, R., Mermel, C.H., Porter, D., Wei, G., Raychaudhuri, S., Donovan, J., Barretina, J., Boehm, J.S., Dobson, J., Urashima, M., et al. (2010). The landscape of somatic copy-number alteration across human cancers. Nature 463, 899-905. 10.1038/nature08822.

4. Davoli, T., Xu, A.W., Mengwasser, K.E., Sack, L.M., Yoon, J.C., Park, P.J., and Elledge, S.J. (2013). Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948-962. 10.1016/j.cell.2013.10.011.

5. Taylor, A.M., Shih, J., Ha, G., Gao, G.F., Zhang, X., Berger, A.C., Schumacher, S.E., Wang, C., Hu, H., Liu, J., et al. (2018). Genomic and Functional Approaches to Understanding Cancer Aneuploidy. Cancer Cell 33, 676-689.e3. 10.1016/j.ccell.2018.03.007.

6. William, W.N., Zhao, X., Bianchi, J.J., Lin, H.Y., Cheng, P., Lee, J.J., Carter, H., Alexandrov, L.B., Abraham, J.P., Spetzler, D.B., et al. (2021). Immune evasion in HPV- head and neck precancer-cancer transition is driven by an aneuploid switch involving chromosome 9p loss. Proc. Natl. Acad. Sci. U. S. A. 118, e2022655118. 10.1073/pnas.2022655118.

7. Watkins, T.B.K., Lim, E.L., Petkovic, M., Elizalde, S., Birkbak, N.J., Wilson, G.A., Moore, D.A., Gronroos, E., Rowan, A., Dewhurst, S.M., et al. (2020). Pervasive chromosomal instability and karyotype order in tumour evolution. Nature 587, 126-132. 10.1038/s41586-020-2698-6.

8. Santaguida, S., Tighe, A., D’Alise, A.M., Taylor, S.S., and Musacchio, A. (2010). Dissecting the role of MPS1 in chromosome biorientation and the spindle checkpoint through the small molecule inhibitor reversine. J. Cell Biol. 190, 73-87. 10.1083/jcb.201001036.

9. Hewitt, L., Tighe, A., Santaguida, S., White, A.M., Jones, C.D., Musacchio, A., Green, S., and Taylor, S.S. (2010). Sustained Mps1 activity is required in mitosis to recruit O-Mad2 to the Mad1-C-Mad2 core complex. J. Cell Biol. 190, 25-34. 10.1083/jcb.201002133.

10. Fournier, R.E. (1981). A general high-efficiency procedure for production of microcell hybrids. Proc. Natl. Acad. Sci. U. S. A. 78, 6349-6353. 10.1073/pnas.78.10.6349.

11. Stingele, S., Stoehr, G., Peplowska, K., Cox, J., Mann, M., and Storchova, Z. (2012). Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol. Syst. Biol. 8, 608. 10.1038/msb.2012.40.

12. Ly, P., Teitz, L.S., Kim, D.H., Shoshani, O., Skaletsky, H., Fachinetti, D., Page, D.C., and Cleveland, D.W. (2017). Selective Y centromere inactivation triggers chromosome shattering in micronuclei and repair by non-homologous end joining. Nat. Cell Biol. 19, 68-75. 10.1038/ncb3450.

13. Ly, P., Brunner, S.F., Shoshani, O., Kim, D.H., Lan, W., Pyntikova, T., Flanagan, A.M., Behjati, S., Page, D.C., Campbell, P.J., et al. (2019). Chromosome segregation errors generate a diverse spectrum of simple and complex genomic rearrangements. Nat. Genet. 51, 705-715. 10.1038/s41588-019-0360-8.

14. Rayner, E., Durin, M.-A., Thomas, R., Moralli, D., O’Cathail, S.M., Tomlinson, I., Green, C.M., and Lewis, A. (2019). CRISPR-Cas9 Causes Chromosomal Instability and Rearrangements in Cancer Cell Lines, Detectable by Cytogenetic Methods. CRISPR J. 2, 406-416. 10.1089/crispr.2019.0006.

15. Zuo, E., Huo, X., Yao, X., Hu, X., Sun, Y., Yin, J., He, B., Wang, X., Shi, L., Ping, J., et al. (2017). CRISPR/Cas9-mediated targeted chromosome elimination. Genome Biol. 18, 224. 10.1186/s13059-017-1354-4.

16. Tovini, L., Johnson, S.C., Andersen, A.M., Spierings, D.C.J., Wardenaar, R., Foijer, F., and McClelland, S.E. (2022). Inducing Specific Chromosome Mis-Segregation in Human Cells. EMBO J 42: e111559. 10.15252/embj.2022111559.

17. Truong, M.A., Cane-Gasull, P., Vries, S.G. de, Nijenhuis, W., Wardenaar, R., Kapitein, L.C., Foijer, F., and Lens, S.M.A. (2022). A motor-based approach to induce chromosome-specific mis-segregations in human cells. EMBO J 42: e111587. 10.15252/embj.2022111587.

18. Barra, V., and Fachinetti, D. (2018). The dark side of centromeres: types, causes and consequences of structural abnormalities implicating centromeric DNA. Nat. Commun. 9, 4340. 10.1038/s41467-018-06545-y.

19. Hayden, K.E. (2012). Human centromere genomics: now it’s personal. Chromosome Res. Int. J. Mol. Supramol. Evol. Asp. Chromosome Biol. 20, 621-633. 10.1007/s10577-012-9295-y.

20. Schueler, M.G., and Sullivan, B.A. (2006). Structural and functional dynamics of human centromeric chromatin. Annu. Rev. Genomics Hum. Genet. 7, 301-313. 10.1146/annurev.genom.7.080505.115613.

21. Altemose, N., Logsdon, G.A., Bzikadze, A.V., Sidhwani, P., Langley, S.A., Caldas, G.V., Hoyt, S.J., Uralsky, L., Ryabov, F.D., Shew, C.J., et al. (2022). Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178. 10.1126/science.abl4178.

22. Musacchio, A., and Desai, A. (2017). A Molecular View of Kinetochore Assembly and Function. Biology 6, E5. 10.3390/biology6010005.

23. Cheeseman, I.M. (2014). The kinetochore. Cold Spring Harb. Perspect. Biol. 6, a015826. 10.1101/cshperspect.a015826.

24. Musacchio, A. (2015). The Molecular Biology of Spindle Assembly Checkpoint Signaling Dynamics. Curr. Biol. CB 25, R1002-1018. 10.1016/j.cub.2015.08.051.

25. Stern, B.M., and Murray, A.W. (2001). Lack of tension at kinetochores activates the spindle checkpoint in budding yeast. Curr. Biol. CB 11, 1462-1467. 10.1016/s0960-9822(01)00451-1.

26. Liu, D., and Lampson, M.A. (2009). Regulation of kinetochore-microtubule attachments by Aurora B kinase. Biochem. Soc. Trans. 37.

27. Papini, D., Levasseur, M.D., and Higgins, J.M.G. (2021). The Aurora B gradient sustains kinetochore stability in anaphase. Cell Rep. 37, 109818. 10.1016/j.celrep.2021.109818.

28. Liu, D., Vleugel, M., Backer, C.B., Hori, T., Fukagawa, T., Cheeseman, I.M., and Lampson, M.A. (2010). Regulated targeting of protein phosphatase 1 to the outer kinetochore by KNL1 opposes Aurora B kinase. J. Cell Biol. 188, 809-820. 10.1083/jcb.201001006.

29. Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A.V., Mikheenko, A., Vollger, M.R., Altemose, N., Uralsky, L., Gershman, A., et al. (2022). The complete sequence of a human genome. Science 376, 44-53. 10.1126/science.abj6987.

30. Schneider, V.A., Graves-Lindsay, T., Howe, K., Bouk, N., Chen, H.-C., Kitts, P.A., Murphy, T.D., Pruitt, K.D., Thibaud-Nissen, F., Albracht, D., et al. (2017). Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849-864. 10.1101/gr.213611.116.

31. Sullivan, L.L., and Sullivan, B.A. (2020). Genomic and functional variation of human centromeres. Exp. Cell Res. 389, 111896. 10.1016/j.yexcr.2020.111896.

32. Willard, H.F. (1991). Evolution of alpha satellite. Curr. Opin. Genet. Dev. 1, 509-514. 10.1016/s0959-437x(05)80200-x.

33. Uralsky, L.I., Shepelev, V.A., Alexandrov, A.A., Yurov, Y.B., Rogaev, E.I., and Alexandrov, I.A. (2019). Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly. Data Brief 24, 103708. 10.1016/j.dib.2019.103708.

34. Wang, T., Wei, J.J., Sabatini, D.M., and Lander, E.S. (2014). Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80-84. 10.1126/science.1246981.

35. Doench, J.G., Fusi, N., Sullender, M., Hegde, M., Vaimberg, E.W., Donovan, K.F., Smith, I., Tothova, Z., Wilen, C., Orchard, R., et al. (2016). Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184-191. 10.1038/nbt.3437.

36. Doench, J.G., Hartenian, E., Graham, D.B., Tothova, Z., Hegde, M., Smith, I., Sullender, M., Ebert, B.L., Xavier, R.J., and Root, D.E. (2014). Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262-1267. 10.1038/nbt.3026.

37. Meyers, R.M., Bryan, J.G., McFarland, J.M., Weir, B.A., Sizemore, A.E., Xu, H., Dharia, N.V., Montgomery, P.G., Cowley, G.S., Pantel, S., et al. (2017). Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779-1784. 10.1038/ng.3984.

38. Ly, P., Eskiocak, U., Kim, S.B., Roig, A.I., Hight, S.K., Lulla, D.R., Zou, Y.S., Batten, K., Wright, W.E., and Shay, J.W. (2011). Characterization of aneuploid populations with trisomy 7 and 20 derived from diploid human colonic epithelial cells. Neoplasia N. Y. N 13, 348-357. 10.1593/neo.101580.

39. Maciejowski, J., Li, Y., Bosco, N., Campbell, P.J., and de Lange, T. (2015). Chromothripsis and Kataegis Induced by Telomere Crisis. Cell 163, 1641-1654. 10.1016/j.cell.2015.11.054.

40. Sanjana, N.E., Shalem, O., and Zhang, F. (2014). Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783-784. 10.1038/nmeth.3047.

41. Bajaj, R., Bollen, M., Peti, W ., and Page, R. (2018). KNL1 Binding to PPI and Microtubules Is Mutually Exclusive. Structure 26, 1327-1336. e4. 10.1016/j.str.2018.06.013.

42. DeLuca, J.G., Gall, W.E., Ciferri, C., Cimini, D., Musacchio, A., and Salmon, E.D. (2006). Kinetochore microtubule dynamics and attachment stability are regulated by Hec1. Cell 127, 969-982. 10.1016/j.cell.2006.09.047.

43. Hatch, E.M., Fischer, A.H., Deerinck, T.J., and Hetzer, M.W. (2013). Catastrophic nuclear envelope collapse in cancer cell micronuclei. Cell 154, 47-60. 10.1016/j.cell.2013.06.007.

44. Meerbrey, K.L., Hu, G., Kessler, J.D., Roarty, K., Li, M.Z., Fang, J.E., Herschkowitz, J.I., Burrows, A.E., Ciccia, A., Sun, T., et al. (2011). The pINDUCER lentiviral toolkit for inducible RNA interference in vitro and in vivo. Proc. Natl. Acad. Sci. U. S. A. 108, 3665-3670. 10.1073/pnas.1019736108.

45. Banaszynski, L.A., Chen, L.-C., Maynard-Smith, L.A., Ooi, A.G.L., and Wandless, T.J. (2006). A rapid, reversible, and tunable method to regulate protein function in living cells using synthetic small molecules. Cell 126, 995-1004. 10.1016/j.cell.2006.07.025.

46. Gao, R., Bai, S., Henderson, Y.C., Lin, Y., Schalck, A., Yan, Y., Kumar, T., Hu, M., Sei, E., Davis, A., et al. (2021). Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat. Biotechnol. 39, 599-608. 10.1038/s41587-020-00795-2.

47. Patel, A.P., Tirosh, I., Trombetta, J.J., Shalek, A.K., Gillespie, S.M., Wakimoto, H., Cahill, D.P., Nahed, B.V., Curry, W.T., Martuza, R.L., et al. (2014). Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396-1401. 10.1126/science.1254257.

48. Tirosh, I., Izar, B., Prakadan, S.M., Wadsworth, M.H., Treacy, D., Trombetta, J.J., Rotem, A., Rodman, C., Lian, C., Murphy, G., et al. (2016). Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189-196. 10.1126/science.aad0501.

49. The Cancer Genome Atlas Network (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330-337. 10.1038/nature11252.

50. Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550. 10.1186/s13059-014-0550-8.

51. Massague, J., Blain, S.W., and Lo, R.S. (2000). TGFbeta signaling in growth control, cancer, and heritable disorders. Cell 103, 295-309. 10.1016/s0092-8674(00)00121-5.

52. Drost, J., van Jaarsveld, R.H., Ponsioen, B., Zimberlin, C., van Boxtel, R., Buijs, A., Sachs, N., Overmeer, R.M., Offerhaus, G.J., Begthel, H., et al. (2015). Sequential cancer mutations in cultured human intestinal stem cells. Nature 521, 43-47. 10.1038/nature14415.

53. van de Wetering, M., Francies, H.E., Francis, J.M., Bounova, G., Iorio, F., Pronk, A., van Houdt, W., van Gorp, J., Taylor-Weiner, A., Kester, L., et al. (2015). Prospective derivation of a living organoid biobank of colorectal cancer patients. Cell 161, 933-945. 10.1016/j.cell.2015.03.053.

54. Woodford-Richens, K.L., Rowan, A.J., Gorman, P., Halford, S., Bicknell, D.C., Wasan, H.S., Roylance, R.R., Bodmer, W.F., and Tomlinson, I.P.M. (2001). SMAD4 mutations in colorectal cancer probably occur before chromosomal instability, but after divergence of the microsatellite instability pathway. Proc. Natl. Acad. Sci. 98, 9719-9723. 10.1073/pnas.171321498.

55. Thiagalingam, S., Lengauer, C., Leach, F.S., Schutte, M., Hahn, S.A., Overhauser, J., Willson, J.K., Markowitz, S., Hamilton, S.R., Kern, S.E., et al. (1996). Evaluation of candidate tumour suppressor genes on chromosome 18 in colorectal cancers. Nat. Genet. 13, 343-346. 10.1038/ng0796-343.

56. Cheng, P., Zhao, X., Katsnelson, L., Camacho-Hernandez, E.M., Mermerian, A., Mays, J.C., Lippman, S.M., Rosales-Alvarez, R.E., Moya, R., Shwetar, J., et al. (2022). Proteogenomic analysis of cancer aneuploidy and normal tissues reveals divergent modes of gene regulation across cellular pathways. eLife 11, e75227. 10.7554/eLife.75227.

57. Eppert, K., Scherer, S.W., Ozcelik, H., Pirone, R., Hoodless, P., Kim, H., Tsui, L.C., Bapat, B., Gallinger, S., Andrulis, I.L., et al. (1996). MADR2 maps to 18q21 and encodes a TGFbeta-regulated MAD-related protein that is functionally mutated in colorectal carcinoma. Cell 86, 543-552. 10.1016/s0092-8674(00)80128-2.

58. Dumont, M., Gamba, R., Gestraud, P., Klaasen, S., Worrall, J.T., De Vries, S.G., Boudreau, V., Salinas-Luypaert, C., Maddox, P.S., Lens, S.M., et al. (2020). Human chromosome-specific aneuploidy is influenced by DNA-dependent centromeric features. EMBO J. 39. 10.15252/embj.2019102924.

59. Cimini, D., Howell, B., Maddox, P., Khodjakov, A., Degrassi, F., and Salmon, E.D. (2001). Merotelic kinetochore orientation is a major mechanism of aneuploidy in mitotic mammalian tissue cells. J. Cell Biol. 153, 517-527. 10.1083/jcb.153.3.517.

60. Gregan, J., Polakova, S., Zhang, L., Tolic-Norrelykke, I.M., and Cimini, D. (2011). Merotelic kinetochore attachment: causes and effects. Trends Cell Biol. 21, 374-381. 10.1016/j.tcb.2011.01.003.

61. Whinn, K.S., Kaur, G., Lewis, J.S., Schauer, G.D., Mueller, S.H., Jergic, S., Maynard, H., Gan, Z.Y., Naganbabu, M., Bruchez, M.P., et al. (2019). Nuclease dead Cas9 is a programmable roadblock for DNA replication. Sci. Rep. 9, 13292. 10.1038/s41598-019-49837-z.

62. Giunta, S., Herve, S., White, R.R., Wilhelm, T., Dumont, M., Scelfo, A., Gamba, R., Wong, C.K., Rancati, G., Smogorzewska, A., et al. (2021). CENP-A chromatin prevents replication stress at centromeres to avoid structural aneuploidy. Proc. Natl. Acad. Sci. 118, e2015634118. 10.1073/pnas.2015634118.

63. Bury, L., Moodie, B., Ly, J., McKay, L.S., Miga, K.H., and Cheeseman, I.M. (2020). Alpha-satellite RNA transcripts are repressed by centromere-nucleolus associations. eLife 9, e59770. 10.7554/eLife.59770.

64. McNulty, S.M., Sullivan, L.L., and Sullivan, B.A. (2017). Human Centromeres Produce Chromosome-Specific and Array-Specific Alpha Satellite Transcripts that Are Complexed with CENP-A and CENP-C. Dev. Cell 42, 226-240.e6. 10.1016/j.devcel.2017.07.001.

65. Chan, F.L., Marshall, O.J., Saffery, R., Won Kim, B., Earle, E., Choo, K.H.A., and Wong, L.H. (2012). Active transcription and essential role of RNA polymerase II at the centromere during mitosis. Proc. Natl. Acad. Sci. 109, 1979-1984. 10.1073/pnas.1108705109.

66. Kabeche, L., Nguyen, H.D., Buisson, R., and Zou, L. (2018). A mitosis-specific and R loop-driven ATR pathway promotes faithful chromosome segregation. Science 359, 108-114. 10.1126/science.aan6490.

67. Sarli, L., Bottarelli, L., Bader, G., Iusco, D., Pizzi, S., Costi, R., D'Adda, T., Bertolani, M., Roncoroni, L., and Bordi, C. (2004). Association Between Recurrence of Sporadic Colorectal Cancer, High Level of Microsatellite Instability, and Loss of Heterozygosity at Chromosome 18q. Dis. Colon Rectum 47, 1467-1482. 10.1007/s10350-004-0628-6.

68. Tanaka, T., Watanabe, T., Kazama, Y., Tanaka, J., Kanazawa, T., Kazama, S., and Nagawa, H. (2006). Chromosome 18q deletion and Smad4 protein inactivation correlate with liver metastasis: a study matched for T- and N- classification. Br. J. Cancer 95, 1562-1567. 10.1038/sj.bjc.6603460.

69. McFadden, D.G., Papagiannakopoulos, T., Taylor-Weiner, A., Stewart, C., Carter, S.L., Cibulskis, K., Bhutkar, A., McKenna, A., Dooley, A., Vernon, A., et al. (2014). Genetic and clonal dissection of murine small cell lung carcinoma progression by genome sequencing. Cell 156, 1298-1311. 10.1016/j.cell.2014.02.031.

70. Trakala, M., Aggarwal, M., Sniffen, C., Zasadil, L., Carroll, A., Ma, D., Su, X.A., Wangsa, D., Meyer, A., Sieben, C.J., et al. (2021). Clonal selection of stable aneuploidies in progenitor cells drives high-prevalence tumorigenesis. Genes Dev. 35, 1079-1092. 10.1101/gad.348341.121.

71. Xue, W., Kitzing, T., Roessler, S., Zuber, J., Krasnitz, A., Schultz, N., Revill, K., Weissmueller, S., Rappaport, A.R., Simon, J., et al. (2012). A cluster of cooperating tumor-suppressor gene candidates in chromosomal deletions. Proc. Natl. Acad. Sci. U. S. A. 109, 8212-8217. 10.1073/pnas.1206062109.

72. Chen, B., Gilbert, L.A., Cimini, B.A., Schnitzbauer, J., Zhang, W., Li, G.-W., Park, J., Blackburn, E.H., Weissman, J.S., Qi, L.S., et al. (2013). Dynamic Imaging of Genomic Loci in Living Human Cells by an Optimized CRISPR/Cas System. Cell 155, 1479-1491. 10.1016/j.cell.2013.12.001.

73. Ran, F.A., Hsu, P.D., Wright, J., Agarwala, V., Scott, D.A., and Zhang, F. (2013). Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 8, 2281-2308. 10.1038/nprot.2013.143.

74. Cong, L., Ran, F.A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P.D., Wu, X., Jiang, W., Marraffini, L.A., et al. (2013). Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819-823. 10.1126/science.1231143.

75. Schindelin, J., Arganda-Carreras, I., Frise, E., Kaynig, V., Longair, M., Pietzsch, T., Preibisch, S., Rueden, C., Saalfeld, S., Schmid, B., et al. (2012). Fiji - an Open Source platform for biological image analysis. Nat. Methods 9, 676-682. 10.1038/nmeth.2019.

76. Gilbert, L.A., Horlbeck, M.A., Adamson, B., Villalta, J.E., Chen, Y., Whitehead, E.H., Guimaraes, C., Panning, B., Ploegh, H.L., Bassik, M.C., et al. (2014). Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647-661. 10.1016/j.cell.2014.09.029.

77. van der Walt, S., Schonberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T., and scikit-image contributors (2014). scikit-image: image processing in Python. PeerJ 2, e453. 10.7717/peerj.453.

78. Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma. Oxf. Engl. 25, 1754-1760. 10.1093/bioinformatics/btp324.

79. Van der Auwera, G.A. (2020). Genomics in the cloud: using Docker, GATK, and WDL in Terra. First edition (O’Reilly Media).

80. Kuilman, T., Velds, A., Kemper, K., Ranzani, M., Bombardelli, L., Hoogstraat, M., Nevedomskaya, E., Xu, G., de Ruiter, J., Lolkema, M.P., et al. (2015). CopywriteR: DNA copy number detection from off-target sequence data. Genome Biol. 16, 49. 10.1186/s13059-015-0617-1.

81. Dolgalev, I. (2022). Seq-N-Slide. 10.5281/ZENODO.5550459.

82. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinforma. Oxf. Engl. 30, 2114-2120. 10.1093/bioinformatics/btu170.

83. Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinforma. Oxf. Engl. 29, 15-21. 10.1093/bioinformatics/bts635.

84. Liao, Y., Smyth, G.K., and Shi, W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinforma. Oxf. Engl. 30, 923-930. 10.1093/bioinformatics/btt656.

85. Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102, 15545-15550. 10.1073/pnas.0506580102.

86. Hao, Y., Hao, S., Andersen-Nissen, E., Mauck, W.M., Zheng, S., Butler, A., Lee, M.J., Wilk, A.J., Darby, C., Zager, M., et al. (2021). Integrated analysis of multimodal single-cell data. Cell 184, 3573-3587.e29. 10.1016/j.cell.2021.04.048.

87. Gu, Z., Eils, R., and Schlesner, M. (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847-2849. 10.1093/bioinformatics/btw313.

88. Liu, J., Lichtenberg, T., Hoadley, K.A., Poisson, L.M., Lazar, A.J., Cherniack, A.D., Kovatich, A.J., Benz, C.C., Levine, D.A., Lee, A.V., et al. (2018). An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell 173, 400-416.e11. 10.1016/j.cell.2018.02.052.

While the disclosure has been particularly shown and described with reference to specific embodiments (some of which are preferred embodiments), it should be understood by those having skill in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present disclosure as disclosed herein.