


Title:
MOLECULAR CLONING METHOD AND VECTOR THEREFORE
Document Type and Number:
WIPO Patent Application WO/2023/089153
Kind Code:
A1
Abstract:
The invention relates to a method for providing a multiplexed RNA expression plasmid, comprising the steps of providing a plasmid backbone fragment comprising a 5' backbone end and a 3' backbone end; and providing a plurality of n insert fragments, wherein each insert comprises a sequence encoding a transcribed RNA sequence, and one of said insert fragments additionally encodes an insert selection marker under control of a promoter operable in a host cell. The fragments are inserted into the backbone by Gibson assembly. The product of the Gibson assembly is transformed into bacterial host cells, and the ligated plasmid products are isolated. The invention further relates to a library made by the method described in the invention.

Inventors:
YIN JIANG-AN (CH)
AGUZZI ADRIANO (CH)
FRICK LUKAS (CH)
Application Number:
PCT/EP2022/082524
Publication Date:
May 25, 2023
Filing Date:
November 20, 2022
Assignee:
UNIV ZUERICH (CH)
International Classes:
C12N15/10; C12N15/64; C12N15/66
Domestic Patent References:
WO2020086144A2 2020-04-30
Foreign References:
EP3045537A1 2016-07-20
EP21209377A 2021-11-19
Other References:
YIN JIANG-AN ET AL: "Robust and Versatile Arrayed Libraries for Human Genome-Wide CRISPR Activation, Deletion and Silencing", BIORXIV, 25 May 2022 (2022-05-25), XP093022872, Retrieved from the Internet [retrieved on 20230210], DOI: 10.1101/2022.05.25.493370
NICHOLAS S. MCCARTY ET AL: "Multiplexed CRISPR technologies for gene editing and transcriptional regulation", NATURE COMMUNICATIONS, vol. 11, no. 1, 9 March 2020 (2020-03-09), XP055736214, DOI: 10.1038/s41467-020-15053-x
A. M. KABADI ET AL: "Multiplex CRISPR/Cas9-based genome engineering from a single lentiviral vector", NUCLEIC ACIDS RESEARCH, vol. 42, no. 19, 13 August 2014 (2014-08-13), pages e147 - e147, XP055177310, ISSN: 0305-1048, DOI: 10.1093/nar/gku749
MARC ZUCKERMANN ET AL: "A novel cloning strategy for one-step assembly of multiplex CRISPR vectors", SCIENTIFIC REPORTS, vol. 8, no. 1, 30 November 2018 (2018-11-30), XP055742331, DOI: 10.1038/s41598-018-35727-3
GONÇALVES EMANUEL ET AL: "Minimal genome-wide human CRISPR-Cas9 library", GENOME BIOLOGY, vol. 22, no. 1, 21 January 2021 (2021-01-21), XP093000451, Retrieved from the Internet DOI: 10.1186/s13059-021-02268-4
JOANA A. VIDIGAL ET AL: "Rapid and efficient one-step generation of paired gRNA CRISPR-Cas9 libraries", NATURE COMMUNICATIONS, vol. 6, no. 1, 17 August 2015 (2015-08-17), XP055560017, DOI: 10.1038/ncomms9083
ADAMSON BRITT ET AL: "A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response", CELL, ELSEVIER, AMSTERDAM NL, vol. 167, no. 7, 15 December 2016 (2016-12-15), pages 1867, XP029850719, ISSN: 0092-8674, DOI: 10.1016/J.CELL.2016.11.048
ANOB M. CHAKRABARTI ET AL: "Target-Specific Precision of CRISPR-Mediated Genome Editing", MOLECULAR CELL, vol. 73, no. 4, 1 February 2019 (2019-02-01), AMSTERDAM, NL, pages 699 - 713.e6, XP055701322, ISSN: 1097-2765, DOI: 10.1016/j.molcel.2018.11.031
ANONYMOUS: "CRISPR 101: A Desktop Resource", CRISPR 101: A DESKTOP RESOURCE, 1 May 2017 (2017-05-01), pages 1 - 195, XP055689797, Retrieved from the Internet [retrieved on 20200428]
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 2012, COLD SPRING HARBOR LABORATORY PRESS
AUSUBEL ET AL.: "Short Protocols in Molecular Biology", 2002, JOHN WILEY & SONS, INC.
SMITH, WATERMAN, ADV. APPL. MATH., vol. 2, 1981, pages 482
NEEDLEMAN, WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443
PEARSON, LIPMAN, PROC. NAT. ACAD. SCI., vol. 85, 1988, pages 2444
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
GIBSON ET AL., NATURE METHODS, vol. 6, 2009, pages 343 - 345
CHAVEZ, A. ET AL., NATURE METHODS, vol. 12, pages 326 - 328
NUNEZ ET AL., CELL, vol. 184, 29 April 2021 (2021-04-29), pages 2503 - 2519
GIBSON ET AL., NAT METHODS, vol. 6, 2009, pages 343 - 345
METZAKOPIAN, E., STRONG, A., IYER, V., HODGKINS, A., TZELEPIS, K., ANTUNES, L., FRIEDRICH, M.J., KANG, Q., DAVIDSON, T., LAMBERTH, J. ET AL.: "Enhancing the genome editing toolbox: genome wide CRISPR arrayed libraries", SCI REP, vol. 7, 2017, pages 2244, XP055566642, DOI: 10.1038/s41598-017-01766-5
KOIKE-YUSA ET AL., NAT BIOTECHNOL, vol. 32, 2014, pages 267 - 273
SANSON ET AL., NAT COMMUN, vol. 9, 2018, pages 5416
HORLBECK ET AL., ELIFE, 2016, pages 5
HART ET AL., G3 (BETHESDA), vol. 7, 2017, pages 2719 - 2727
BRUNELLO ET AL., NAT BIOTECHNOL, vol. 34, 2016, pages 184 - 191
HANNA, DOENCH, NAT BIOTECHNOL, vol. 38, 2020, pages 813 - 823
FANTOM CONSORTIUM, FORREST, A.R.R., KAWAJI, H., REHLI, M., BAILLIE, J.K., DE HOON, M.J.L., HABERLE, V., LASSMANN, T., KULAKOVSKIY, I.V., LIZIO, M. ET AL.: "A promoter-level mammalian expression atlas", NATURE, vol. 507, 2014, pages 462 - 470
YATES, A.D., ACHUTHAN, P., AKANNI, W., ALLEN, J., ALLEN, J., ALVAREZ-JARRETA, J., AMODE, M.R., ARMEAN, I.M., AZOV, A.G., BENNETT, R. ET AL.: "Ensembl 2020", NUCLEIC ACIDS RES, vol. 48, 2020, pages D682 - D688
GLUSMAN, G., CABALLERO, J., MAULDIN, D.E., HOOD, L., ROACH, J.C.: "Kaviar: an accessible system for testing SNV novelty", BIOINFORMA. OXF. ENGL., vol. 27, 2011, pages 3216 - 3217
PEREZ, A.R., PRITYKIN, Y., VIDIGAL, J.A., CHHANGAWALA, S., ZAMPARO, L., LESLIE, C.S., VENTURA, A.: "GuideScan software for improved single and paired CRISPR guide RNA design", NAT. BIOTECHNOL., vol. 35, 2017, pages 347 - 349, XP055715900, DOI: 10.1038/nbt.3804
HSU, P.D., SCOTT, D.A., WEINSTEIN, J.A., RAN, F.A., KONERMANN, S., AGARWALA, V., LI, Y., FINE, E.J., WU, X., SHALEM, O. ET AL.: "DNA targeting specificity of RNA-guided Cas9 nucleases", NAT. BIOTECHNOL., vol. 31, 2013, pages 827 - 832, XP055219426, DOI: 10.1038/nbt.2647
CONCORDET, J.-P., HAEUSSLER, M.: "CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens", NUCLEIC ACIDS RES, vol. 46, 2018, pages W242 - W245
HAEUSSLER ET AL., GENOME BIOL, vol. 17, 2016, pages 148
GRAF, R., LI, X., CHU, V.T., RAJEWSKY, K.: "sgRNA Sequence Motifs Blocking Efficient CRISPR/Cas9-Mediated Gene Editing", CELL REP, vol. 26, 2019, pages 1098 - 1103
LAMBERT, S.A., JOLMA, A., CAMPITELLI, L.F., DAS, P.K., YIN, Y., ALBU, M., CHEN, X., TAIPALE, J., HUGHES, T.R., WEIRAUCH, M.T.: "The Human Transcription Factors", CELL, vol. 172, 2018, pages 650 - 665, XP085347128, DOI: 10.1016/j.cell.2018.01.029
BRASCHI, B., DENNY, P., GRAY, K., JONES, T., SEAL, R., TWEEDIE, S., YATES, B., BRUFORD, E.: "Genenames.org: the HGNC and VGNC resources in 2019", NUCLEIC ACIDS RES, vol. 47, 2019, pages D786 - D792
SMEDLEY ET AL., BMC GENOMICS, vol. 10, 25 March 2020 (2020-03-25), pages 22
UHLEN, M., FAGERBERG, L., HALLSTROM, B.M., LINDSKOG, C., OKSVOLD, P., MARDINOGLU, A., SIVERTSSON, A., KAMPF, C., SJOSTEDT, E., ASPLUND, A. ET AL.: "Proteomics. Tissue-based map of the human proteome", SCIENCE, vol. 347, 2015, pages 1260419, XP055393269, DOI: 10.1126/science.1260419
ABUGESSAISA, I., NOGUCHI, S., HASEGAWA, A., HARSHBARGER, J., KONDO, A., LIZIO, M., SEVERIN, J., CARNINCI, P., KAWAJI, H., KASUKAWA, T.: "FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies", SCI. DATA, vol. 4, 2017, pages 170107
HUBER, W., CAREY, V.J., GENTLEMAN, R., ANDERS, S., CARLSON, M., CARVALHO, B.S., BRAVO, H.C., DAVIS, S., GATTO, L., GIRKE, T. ET AL.: "Orchestrating high-throughput genomic analysis with Bioconductor", NAT. METHODS, vol. 12, 2015, pages 115 - 121
BIRNBOIM, H.C., DOLY, J.: "A rapid alkaline extraction procedure for screening recombinant plasmid DNA", NUCLEIC ACIDS RES, vol. 7, 1979, pages 1513 - 1523
KOIKE-YUSA, H., LI, Y., TAN, E.P., VELASCO-HERRERA, M. DEL C., YUSA, K.: "Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library", NAT BIOTECHNOL, vol. 32, 2014, pages 267 - 273, XP055115706, DOI: 10.1038/nbt.2800
DOENCH, J.G., FUSI, N., SULLENDER, M., HEGDE, M., VAIMBERG, E.W., DONOVAN, K.F., SMITH, I., TOTHOVA, Z., WILEN, C., ORCHARD, R. ET AL.: "Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9", NAT. BIOTECHNOL., vol. 34, 2016, pages 184 - 191
FRANKISH, A., DIEKHANS, M., FERREIRA, A.-M., JOHNSON, R., JUNGREIS, I., LOVELAND, J., MUDGE, J.M., SISU, C., WRIGHT, J., ARMSTRONG, J. ET AL.: "GENCODE reference annotation for the human and mouse genomes", NUCLEIC ACIDS RES, vol. 47, 2019, pages D766 - D773
GILBERT, L.A., HORLBECK, M.A., ADAMSON, B., VILLALTA, J.E., CHEN, Y., WHITEHEAD, E.H., GUIMARAES, C., PANNING, B., PLOEGH, H.L., BASSIK, M.C. ET AL.: "Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation", CELL, vol. 159, 2014, pages 647 - 661, XP055247644, DOI: 10.1016/j.cell.2014.09.029
HAEUSSLER, M., SCHONIG, K., ECKERT, H., ESCHSTRUTH, A., MIANNE, J., RENAUD, J.-B., SCHNEIDER-MAUNOURY, S., SHKUMATAVA, A., TEBOUL, L., KENT, J. ET AL.: "Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR", GENOME BIOL., vol. 17, 2016, pages 148, XP055551329, DOI: 10.1186/s13059-016-1012-2
HANNA, R.E., DOENCH, J.G.: "Design and analysis of CRISPR-Cas experiments", NAT. BIOTECHNOL., 2020
HART, T., TONG, A.H.Y., CHAN, K., VAN LEEUWEN, J., SEETHARAMAN, A., AREGGER, M., CHANDRASHEKHAR, M., HUSTEDT, N., SETH, S., NOONAN, A. ET AL.: "Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens", G3 BETHESDA MD, vol. 7, 2017, pages 2719 - 2727
HORLBECK, M.A., GILBERT, L.A., VILLALTA, J.E., ADAMSON, B., PAK, R.A., CHEN, Y., FIELDS, A.P., PARK, C.Y., CORN, J.E., KAMPMANN, M. ET AL.: "Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation", ELIFE, 2016, pages 5
R CORE TEAM: "R: A Language and Environment for Statistical Computing", 2020, R FOUNDATION FOR STATISTICAL COMPUTING
SANSON, K.R., HANNA, R.E., HEGDE, M., DONOVAN, K.F., STRAND, C., SULLENDER, M.E., VAIMBERG, E.W., GOODALE, A., ROOT, D.E., PICCIONI, F. ET AL.: "Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities", NAT. COMMUN., vol. 9, 2018, pages 5416
SMEDLEY, D., HAIDER, S., BALLESTER, B., HOLLAND, R., LONDON, D., THORISSON, G., KASPRZYK, A.: "BioMart--biological queries made easy", BMC GENOMICS, vol. 10, 2009, pages 22, XP021047935, DOI: 10.1186/1471-2164-10-22
UHLEN, M., KARLSSON, M.J., HOBER, A., SVENSSON, A.-S., SCHEFFEL, J., KOTOL, D., ZHONG, W., TEBANI, A., STRANDBERG, L., EDFORS, F. ET AL.: "The human secretome", SCI. SIGNAL., 2019, pages 12
Attorney, Agent or Firm:
JUNGHANS, Claas (DE)
Claims:

1. A method for providing a multiplexed RNA expression plasmid, comprising the steps of: a. providing a plasmid backbone fragment comprising a 5’ backbone end and a 3’ backbone end; b. providing a plurality of n insert fragments comprising a first insert, second insert, ... nth insert, wherein i. each insert comprises a sequence encoding a transcribed RNA sequence, ii. each insert comprises a 5’ insert end and a 3’ insert end; iii. wherein

• the 5’ 1st insert end overlaps with the 3’ backbone end, and

• the 3’ 1st insert end overlaps with the 5’ 2nd insert end, and

• the 3’ 2nd insert end overlaps with the 5’ 3rd insert end, and so on up to the 5’ nth insert end, and the 3’ nth insert end overlaps with the 5’ backbone end;

• and wherein each overlap has a length of 15 to 40 bp, particularly a length of 17 to 22 bp; iv. one of said inserts additionally encodes an insert selection marker under control of a promoter operable in a host cell; c. in a Gibson assembly step, contacting the plasmid backbone fragment and each of said plurality of insert fragments in the presence of the following enzyme activities: a 5’ exonuclease activity, a DNA-dependent DNA polymerase activity and a DNA ligase activity; d. transforming the product of said Gibson assembly step into bacterial host cells and propagating said host cells in a bacterial culture; e. isolating ligated plasmid products from said bacterial culture.

2. The method according to claim 1, wherein said plurality of n insert fragments consists of a first fragment, a second fragment and a third fragment.

3. The method according to claim 1 or 2, wherein said plurality of inserts encode four transcribed RNA sequences.

4. The method according to any one of the preceding claims, wherein said transcribed RNA sequences comprise an sgRNA and a tracrRNA.

5. The method according to claim 4, wherein each of said sgRNAs is specific for a sequence comprised in the same gene of said target cell.

6. The method according to any one of the preceding claims, wherein the plasmid backbone fragment is obtained by propagating a precursor plasmid comprising a backbone selection marker, isolating the precursor plasmid and excising the backbone selection marker, yielding the plasmid backbone fragment.

7. The method according to any one of the preceding claims, wherein the insert fragments are provided by synthesis of the insert sequence and subsequent PCR using a primer pair introducing sgRNAs and creating the overlaps.

8. The method according to any one of the preceding claims, wherein the plasmid backbone and the ligated plasmid product can be packaged into a retroviral vector.

9. A method for providing a library of multiplexed RNA expression plasmids, comprising conducting the method according to any one of claims 1 to 8 in multiple parallel reactions, wherein in each of said reactions, said transcribed RNA sequences are different.

10. The method according to claim 9, wherein each of said transcribed RNA sequences comprises an sgRNA, and for any one reaction, each of said sgRNAs is specific for a sequence in the same gene comprised in said target cell, and for each of said reactions, said gene is different.

11. The method according to claim 10, wherein at least 100, more particularly at least 1000 sgRNA sequences selected from SEQ ID NO 64 - 170943 are expressed.

12. A library of multiplexed RNA expression plasmids, obtained or obtainable by a method according to any one of claims 9 to 11.

13. A method for molecular cloning of an insert-containing plasmid from a parent plasmid, wherein a. the parent plasmid comprises a first selection marker flanked by recognition sequences susceptible to endonuclease action (cleavage) by a type II restriction enzyme; b. the parent plasmid is cut (digested) with the type II restriction enzyme, generating two fragments: i. a parent plasmid backbone fragment having a 5’ backbone (single-stranded overlapping) end and a 3’ backbone (single-stranded overlapping) end; ii. a first resistance marker fragment; c. an insert fragment containing a second selection marker, and comprising ends that anneal to (fit) the parent 5’ backbone end and 3’ backbone end (sticky ends), is ligated to the parent plasmid backbone fragment, wherein the insert fragment has a sequence adjacent to the ends that does not reconstitute the type II recognition site; d. the product of the ligation is transferred into cells under conditions where only cells containing the second selection marker can propagate, and the resulting cell suspension is worked up (prepped) to obtain the insert-containing plasmid.

Description:
Molecular Cloning Method and Vector therefore

This application claims the benefit of priority of European Application No. EP21209377, filed on 19 November 2021, the contents of which are incorporated herein by reference.

The present invention relates to a method for providing a library of multiplexed RNA expression plasmids encoding single guide (sg)RNA molecules with genome-wide coverage. Another aspect of the invention relates to a method of preparing a library of transfected cells, each member of the library of cells comprising a member of the library of plasmids, and to a library of transfected cells obtained by the method. The invention further relates to a library of expression plasmids.

Background of the Invention

RNA-interference- or mutagen-mediated screens have greatly improved our understanding of biology and human health and have transformed drug development. Recently developed CRISPR-mediated techniques have enormously expanded the genetic-screening toolkit and now allow for gene knockout (CRISPRo) and activation (CRISPRa). Furthermore, CRISPR-based screens have been shown to yield both overlapping and distinct hits compared with RNA-interference-based screens, and CRISPR-mediated gene perturbations are relatively more specific than RNA-interference methods. Genome-wide CRISPR-based gene perturbation libraries are therefore essential for the global identification of genes involved in biological processes, particularly in understanding disease.

Indeed, many genome-wide CRISPR-based pooled libraries, including CRISPRo and CRISPRa libraries, have been generated and have been very successful in screens for cell-based phenotypes, including cell survival, proliferation or sensitivity to insults, cellular function and gene expression. Yet these pooled libraries do not readily lend themselves to the study of cell-non-autonomous phenotypes, where a phenotype observed in one cell arises from a gene’s perturbation in another cell. Typical examples are protein secretion, the mitochondrial unfolded protein response (UPRmt, a conserved transcriptional response to mitochondrial stress), and glia-neuron interactions (which are widespread and crucial for brain function in health and disease). Furthermore, pooled libraries are of limited use for high-content screening approaches such as image-based observation of neurite extension/retraction, regardless of whether the underlying mechanisms are cell-autonomous or cell-non-autonomous.

The limited utility of pooled libraries can be circumvented with arrayed libraries. However, compared with the many available and rapidly advancing pooled CRISPR libraries, only limited resources (commercial or academic) of human CRISPR arrayed libraries are available, and their effectiveness remains to be ascertained. Moreover, several drawbacks limit the usefulness of these CRISPR arrayed libraries. First, synthetic-RNA-based libraries are expensive and difficult to use in non-transfectable cells. Second, these libraries, especially the lentiviral ones, mostly rely on a single gRNA, whose efficiency is relatively low and unpredictable. Third, single-gRNA-induced knockouts can be homozygous or heterozygous and, in most cases, heterozygous knockouts of protein-coding genes do not elicit functional consequences, leading to false-negative calls. Fourth, the single gRNA is under the control of a defined promoter, and any cell-type-specific reduction or silencing of that promoter’s activity greatly affects the effectiveness of these libraries. Fifth, existing libraries were designed with gRNA design algorithms based on the reference genome. However, frequent polymorphisms in human genomes can substantially diminish the effectiveness of gRNAs in manipulating patient-derived cells, for instance patient-derived induced pluripotent stem cells (iPSCs). A new generation of highly active, robust, generic and versatile CRISPR arrayed libraries is therefore in high demand, especially for studying phenotypes that cannot be addressed with empirically active pooled libraries.

CRISPR activation (CRISPRa) exploits endonuclease-inactivated versions of CRISPR effectors (dCas9), with transcriptional activators (e.g. VP64, VPR) either fused to dCas9 or recruited to the single guide RNAs (sgRNAs), to activate transcription of a gene of interest by targeting a specific sequence in that gene via the sgRNA. The use of four sgRNAs per gene (4sgRNA/gene) according to the invention acts synergistically relative to a single sgRNA and provides a highly efficient and robust method for gene activation. In principle, this approach is compatible with all CRISPR activation systems in which the transcriptional activators are fused or recruited to dCas9, such as dCas9-VP64, dCas9-VPR and dCas9-SunTag.

Based on the above-mentioned state of the art, the objective of the present invention is to provide means and methods for producing a multiplexed sgRNA-expressing library that allows a large number of genes in a cell to be targeted. This objective is attained by the subject-matter of the independent claims of the present specification, with further advantageous embodiments described in the dependent claims, examples, figures and general description of this specification.

Summary of the Invention

A first aspect of the invention relates to a method for providing a multiplexed RNA expression plasmid, comprising the steps of: a. providing a plasmid backbone fragment comprising a 5’ backbone end and a 3’ backbone end; b. providing a plurality of n insert fragments comprising a first insert, second insert, ... nth insert, wherein i. each insert comprises a sequence encoding a transcribed RNA sequence, and each transcribed RNA sequence in an insert differs from that of any other insert,

ii. each insert comprises a 5’ insert end and a 3’ insert end; iii. wherein

• the 5’ 1st insert end overlaps with the 3’ backbone end, and

• the 3’ 1st insert end overlaps with the 5’ 2nd insert end, and

• the 3’ 2nd insert end overlaps with the 5’ 3rd insert end, and so on up to the 5’ nth insert end, and the 3’ nth insert end overlaps with the 5’ backbone end;

• and wherein each overlap has a length of 15 to 40 bp, particularly a length of 17 to 22 bp; iv. one of said inserts additionally encodes an insert selection marker under control of a promoter operable in a bacterial host cell; c. in a Gibson assembly step, contacting the plasmid backbone fragment and each of said plurality of insert fragments in the presence of the following enzyme activities: a 5’ exonuclease activity, a DNA-dependent DNA polymerase activity and a DNA ligase activity, together with dNTPs and a divalent cation in a buffer conducive to said enzyme activities; d. transforming the product of said Gibson assembly step into bacterial host cells and propagating said host cells in a bacterial culture; e. isolating ligated plasmid products from said bacterial culture.

Another aspect of the invention relates to a method for providing a library of multiplexed RNA expression plasmids, comprising conducting the method according to the previously described aspect in multiple reactions in parallel, wherein in each of said reactions, said transcribed RNA sequences are different.

Yet another aspect of the invention relates to a library provided by the above method, particularly bearing the sequences described herein and in the sequence listing.

The inventors experimentally confirmed the superiority and robustness of four gRNAs per vector (4gRNA/vector) for the efficiency of CRISPR-mediated gene activation and ablation. They developed an automated liquid-phase plasmid assembly and cloning protocol, which assembles 4 gRNAs (driven by four different promoters) into a single vector and leverages an antibiotic resistance switch in the backbone of the 4gRNA vector to eliminate the need for colony picking, enabling automated, high-throughput liquid-phase cloning of libraries in a cost-effective manner. Updated gRNA design algorithms allow the selection of specific and generic gRNAs that avoid “hot-spot” polymorphism regions in human genomes, maximizing the synergy of the 4 gRNAs. The CRISPRa and CRISPRo libraries prepared by the inventors consist of 22,326 and 19,820 polyclonal plasmids, respectively, suitable for genome-wide screens of human protein-coding genes. Thus, the libraries provided according to the method of the invention represent next-generation improvements and provide powerful genome-wide protein-coding gene perturbation resources for the community.

Terms and definitions

For purposes of interpreting this specification, the following definitions will apply and whenever appropriate, terms used in the singular will also include the plural and vice versa. In the event that any definition set forth below conflicts with any document incorporated herein by reference, the definition set forth shall control.

The terms “comprising,” “having,” “containing,” and “including,” and other similar forms, and grammatical equivalents thereof, as used herein, are intended to be equivalent in meaning and to be open-ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. For example, an article “comprising” components A, B, and C can consist of (i.e., contain only) components A, B, and C, or can contain not only components A, B, and C but also one or more other components. As such, it is intended and understood that “comprises” and similar forms thereof, and grammatical equivalents thereof, include disclosure of embodiments of “consisting essentially of” or “consisting of.”

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.

Reference to “about” a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X.”

As used herein, including in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization techniques and biochemistry). Standard techniques are used for molecular, genetic, and biochemical methods (see generally, Sambrook et al., Molecular Cloning: A Laboratory Manual, 4th ed. (2012) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. and Ausubel et al., Short Protocols in Molecular Biology (2002) 5th Ed, John Wiley & Sons, Inc.) and chemical methods.

Sequences

Sequences similar or homologous (e.g., at least about 70% sequence identity) to the sequences disclosed herein are also part of the invention. In some embodiments, the sequence identity at the amino acid level can be about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher. At the nucleic acid level, the sequence identity can be about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher. Alternatively, substantial identity exists when the nucleic acid segments will hybridize under selective hybridization conditions (e.g., very high stringency hybridization conditions) to the complement of the strand. The nucleic acids may be present in whole cells, in a cell lysate, or in a partially purified or substantially pure form.

In the context of the present specification, the terms sequence identity and percentage of sequence identity refer to a single quantitative parameter representing the result of a sequence comparison determined by comparing two aligned sequences position by position. Methods for alignment of sequences for comparison are well known in the art. Alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the global alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Nat. Acad. Sci. 85:2444 (1988) or by computerized implementations of these algorithms, including, but not limited to: CLUSTAL, GAP, BESTFIT, BLAST, FASTA and TFASTA. Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information (http://blast.ncbi.nlm.nih.gov/).
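As an illustration of the position-by-position comparison described above, the following minimal Python sketch computes one optimal global alignment in the spirit of Needleman and Wunsch and derives a percent identity from it. The scoring values (match +1, mismatch -1, linear gap -2) and the convention of dividing identical positions by the full alignment length, gaps included, are assumptions chosen for illustration; they are not the BLAST default parameters given in this specification, and ties between equally scoring alignments are broken arbitrarily.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Return one optimal global alignment of `a` and `b` (linear gap penalty)."""
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap inserted in b
                score[i][j - 1] + gap,  # gap inserted in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap columns divided by the alignment length (gaps included)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expects two non-empty aligned sequences of equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)
```

For example, aligning ACGT with AGT yields ACGT over A-GT, i.e. three identical columns out of four, or 75% identity under this convention.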

One example for comparison of amino acid sequences is the BLASTP algorithm with the default settings: Expect threshold: 10; Word size: 3; Max matches in a query range: 0; Matrix: BLOSUM62; Gap Costs: Existence 11, Extension 1; Compositional adjustments: Conditional compositional score matrix adjustment. One example for comparison of nucleic acid sequences is the BLASTN algorithm with the default settings: Expect threshold: 10; Word size: 28; Max matches in a query range: 0; Match/Mismatch Scores: 1,-2; Gap costs: Linear. Unless stated otherwise, sequence identity values provided herein refer to the value obtained using the BLAST suite of programs (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) with the above-identified default parameters for protein and nucleic acid comparison, respectively.

Reference to identical sequences without specification of a percentage value implies 100% identical sequences (i.e. the same sequence).

General Molecular Biology: Nucleic Acid Sequences, Expression

The term gene refers to a polynucleotide containing at least one open reading frame (ORF) that is capable of encoding a particular polypeptide or protein after being transcribed and translated. A polynucleotide sequence can be used to identify larger fragments or full-length coding sequences of the gene with which it is associated. Methods of isolating larger fragment sequences are known to those of skill in the art.

The terms gene expression or expression, or alternatively the term gene product, may refer to either or both of the processes (and the products thereof) of generating nucleic acids (RNA) or generating a peptide or polypeptide, also referred to as transcription and translation, respectively, or to any of the intermediate processes that regulate the processing of genetic information to yield polypeptide products. The term gene expression may also be applied to the transcription and processing of an RNA gene product, for example a regulatory RNA or a structural (e.g. ribosomal) RNA. If an expressed polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell. Expression may be assayed at the level of transcription and translation, in other words mRNA and/or protein product.

The term sgRNA (single guide RNA) in the context of the present specification relates to an RNA molecule capable of sequence-specific repression of gene expression (or of sequence-specific regulation, such as CRISPRa-mediated activation) via the CRISPR (clustered regularly interspaced short palindromic repeats) mechanism.

The term nucleic acid expression vector in the context of the present specification relates to a plasmid, a viral genome or an RNA, which is used to transfect (in the case of a plasmid or an RNA) or transduce (in the case of a viral genome) a target cell with a certain gene of interest or, in the case of a transfected RNA construct, to translate the corresponding protein of interest from the transfected mRNA. For vectors operating at the level of transcription and subsequent translation, the gene of interest is under control of a promoter sequence that is operational inside the target cell; thus, the gene of interest is transcribed either constitutively, in response to a stimulus, or depending on the cell’s status. In certain embodiments, the viral genome is packaged into a capsid to become a viral vector, which is able to transduce the target cell.

Detailed Description of the Invention

A first aspect of the invention relates to a method for providing a multiplexed RNA expression plasmid. This method comprises the following steps.

Firstly, a plasmid backbone fragment comprising a 5’ backbone end and a 3’ backbone end is provided. Additionally, a plurality of n insert fragments is provided. These insert fragments consist of a first insert, a second insert and further consecutively numbered inserts up to the nth insert.

Each insert comprises a sequence encoding a transcribed RNA sequence, and each transcribed RNA sequence in an insert differs from that of any other insert. Each insert comprises a 5’ insert end and a 3’ insert end, referenced by the number of the insert (5’ 1st insert end, 5’ 2nd insert end, 5’ 3rd insert end, ... 5’ nth insert end). In this context, it is understood that the plasmid backbone fragment consists of double-stranded DNA, and thus any linear piece of dsDNA has two 5’-OH termini (which may be phosphorylated). For the purpose of designating ends and directions in the DNA, however, only one strand is considered, and the terms 5’ insert end and 3’ insert end refer to the termini of this strand.
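The single-strand naming convention can be made concrete with a short Python sketch. It assumes, for illustration only, that each fragment is stored as the sequence of its designated (top) strand written 5’→3’; the function names are hypothetical. The sketch shows that the opposite strand is simply the reverse complement, so the bottom strand’s 5’ terminus carries no information beyond the top strand’s 3’ terminus, which is why naming the ends of one strand suffices.

```python
from typing import Dict

# Translation table for Watson-Crick complements (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, itself written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def designated_ends(top: str, k: int) -> Dict[str, str]:
    """Terminal k nt at the 5' and 3' ends of the designated (top) strand."""
    return {"5_end": top[:k], "3_end": top[-k:]}


if __name__ == "__main__":
    top = "AACCGGTA"  # hypothetical fragment, top strand 5'->3'
    ends = designated_ends(top, 2)
    # The bottom strand's 5' terminus equals the reverse complement of the top 3' terminus
    assert reverse_complement(top)[:2] == reverse_complement(ends["3_end"])
    print(ends)
```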

The inserts are synthesized so that they can serve in a subsequent Gibson assembly step (see Gibson et al., Nature Methods volume 6, pages 343-345 (2009)). Accordingly, the 5’ 1st insert end overlaps with the 3’ plasmid backbone end, the 3’ 1st insert end overlaps with the 5’ 2nd insert end, the 3’ 2nd insert end overlaps with the 5’ 3rd insert end (if present), and so on: the 3’ end of each lower-numbered insert overlaps with the 5’ end of the next higher-numbered insert, until the 3’ nth insert end of the last insert overlaps with the 5’ backbone end. Each overlap has a length of 15 to 40 bp. In particular embodiments, the overlap has a length of 17 to 22 bp.
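The overlap rule can be checked in silico before any oligonucleotides are ordered. The following Python sketch is illustrative only and is not the inventors’ software: it assumes each fragment is given as a single-strand 5’→3’ string in assembly order (backbone first), takes each junction overlap to be the longest exact match between the 3’ end of one fragment and the 5’ end of the next, enforces the 15 to 40 bp window, and returns the circular product linearized at the start of the backbone. The function names are hypothetical.

```python
def terminal_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def junction_overlaps(fragments: list[str]) -> list[int]:
    """Overlap length at each junction of a circular assembly.

    fragments[0] is the backbone, followed by insert 1 ... insert n; the last
    junction closes the circle (3' nth insert end -> 5' backbone end).
    """
    return [
        terminal_overlap(fragments[i], fragments[(i + 1) % len(fragments)])
        for i in range(len(fragments))
    ]


def gibson_assemble(backbone: str, inserts: list[str],
                    min_len: int = 15, max_len: int = 40) -> str:
    """Return the seamless circular product, linearized at the backbone start."""
    fragments = [backbone] + inserts
    overlaps = junction_overlaps(fragments)
    for i, k in enumerate(overlaps):
        if not min_len <= k <= max_len:
            raise ValueError(f"junction {i}: overlap of {k} bp is outside {min_len}-{max_len} bp")
    # Each fragment contributes everything except the overlap it shares with the next one.
    return "".join(frag[:len(frag) - k] for frag, k in zip(fragments, overlaps))
```

Detecting overlaps as exact terminal matches is a simplification; it does not model the exonuclease chew-back or check for alternative annealing sites elsewhere in the fragments.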

One of said inserts additionally encodes an insert selection marker under control of a promoter operable in a bacterial host cell.

In a subsequent Gibson assembly step, the plasmid backbone fragment is contacted with each of said plurality of insert fragments in the presence of the following enzyme activities: a 5’ exonuclease activity, a DNA-dependent DNA polymerase activity and a DNA ligase activity, together with dNTPs and a divalent cation at concentrations conducive to said enzyme activities in an appropriate buffer.

The product of said Gibson assembly step is then introduced into bacterial host cells by transformation, said host cells are propagated in a bacterial culture, and the ligated plasmid products are isolated from said bacterial culture.

Each transcribed RNA sequence is under control of an RNA polymerase promoter sequence operable in a target cell, particularly a mammalian target cell. In certain embodiments, the promoter sequence is an RNA polymerase III specific promoter sequence. In certain embodiments, the promoter sequences controlling expression of the transcribed RNA sequences in a plasmid are all different within the same plasmid (and each plasmid library member carries the same promoters). In certain particular embodiments, the promoter sequences controlling expression of the different transcribed RNA sequences in a plasmid are selected from hU6, mU6, hH1 and h7SK.

In certain embodiments, the promoter controlling expression of the first transcribed sequence in the first fragment is contained on the backbone, adjacent to the ligation/insertion site of the first fragment. In certain particular embodiments thereof, the promoter controlling expression of the first transcribed sequence is selected from the hU6 and mU6 promoters. This allows the mU6 and hU6 promoters to control different transcribed sequences without the risk that the two interfere with each other during PCR generation of the fragments because of their sequence homology.

With this design, the homologous overlaps used for Gibson assembly always lie on distinct promoter sequences rather than on the very similar tracrRNA sequences, which avoids recombination between the four tracrRNAs. The design thus circumvents an important limitation of Gibson assembly, which otherwise impedes its use for assembling very similar sequences.
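Why the overlaps should sit on distinct promoter sequences can be made concrete with a uniqueness check: an overlap that also occurs elsewhere in the planned plasmid, for example inside one of the repeated tracrRNA sequences, gives the annealing step more than one place to pair. The sketch below is a hypothetical helper, not part of the described method; it assumes the planned plasmid is available as one uppercase circular top-strand sequence and flags any overlap that occurs more than once on either strand.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_circular(seq: str, motif: str) -> int:
    """Count occurrences of `motif` in a circular sequence, overlapping hits allowed."""
    extended = seq + seq[:len(motif) - 1]  # let matches wrap around the origin
    return sum(1 for i in range(len(seq)) if extended.startswith(motif, i))


def ambiguous_overlaps(plasmid: str, overlaps: list[str]) -> list[str]:
    """Return the overlaps that occur more than once across both strands."""
    flagged = []
    for ov in overlaps:
        hits = count_circular(plasmid, ov)
        if revcomp(ov) != ov:  # avoid double-counting a palindromic overlap
            hits += count_circular(plasmid, revcomp(ov))
        if hits > 1:
            flagged.append(ov)
    return flagged
```

An overlap placed in a promoter that appears once in the construct passes, whereas one placed in a sequence shared by all four cassettes is flagged.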

The selection marker is part of one of the inserts. In the particular example given herein, the selection marker expression cassette was synthesized and incorporated in insert 1 (see the workflow chart in the figures). In certain embodiments, three fragments are inserted; in other words, the plurality of n insert fragments consists of a first fragment, a second fragment and a third fragment.

In certain embodiments, said plurality of inserts encode four transcribed RNA sequences. In those embodiments where three fragments are inserted, one insert comprises two transcribed RNA sequences.

In certain embodiments, the transcribed RNA sequences each comprise a sgRNA and a tracrRNA.

In certain embodiments, each of said sgRNA is specific for a sequence comprised in the same gene of said target cell.

In certain embodiments, the plasmid backbone fragment is obtained by propagating a precursor plasmid comprising a backbone selection marker under backbone marker selection conditions, isolating the precursor plasmid, and excising the backbone selection marker, yielding the plasmid backbone fragment.

The insert selection marker is different from the backbone selection marker.

In certain embodiments, the insert fragments are provided by synthesis of the insert sequence and subsequent PCR using a primer pair introducing the sgRNAs and creating the overlaps.
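The primer layout implied by this step can be sketched as follows. In this hypothetical Python example, the layout and function names are assumptions; only the 20-nucleotide protospacer and the 15 to 40 bp overlap window come from the specification. The forward primer for an insert carries, 5’ to 3’, the 3’-terminal part of the promoter that ends the preceding fragment (creating the Gibson overlap), the protospacer, and a region annealing to the constant template. The reverse primer anneals to the 3’ end of the template and can carry an optional 5’ tail, for example when the last insert must also introduce the final protospacer and the overlap with the backbone.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_primer(upstream_promoter: str, protospacer: str, template: str,
                   overlap_len: int = 20, anneal_len: int = 20) -> str:
    """5'-[promoter 3' end | protospacer | template 5' end]-3' (hypothetical layout)."""
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    if not 15 <= overlap_len <= 40:
        raise ValueError("Gibson overlap should be 15-40 bp")
    return upstream_promoter[-overlap_len:] + protospacer + template[:anneal_len]


def reverse_primer(template: str, tail: str = "", anneal_len: int = 20) -> str:
    """Primer annealing to the template's 3' end; `tail` is sequence added downstream of it."""
    return revcomp(template[-anneal_len:] + tail)
```

Annealing lengths are fixed here for simplicity; a practical design would typically adjust them to reach a target melting temperature.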

In certain embodiments, the plasmid backbone and the ligated plasmid product can be packaged into a retroviral vector.

Libraries

Another aspect of the invention relates to a method for providing a library of multiplexed RNA expression plasmids, comprising conducting the method according to any one of the above aspects and embodiments of the invention in multiple reactions in parallel, wherein in each of said reactions, said transcribed RNA sequences are different.

The inventors prepared two libraries based on the method of the invention. A first library is designed for genome-wide activation of human protein-coding genes with dCas9-VPR or any other CRISPR activator that is compatible with canonical tracrRNAs (Chavez, A et al., Nature Methods 12, pages 326-328). A second library is designed for genome-wide knockout of these genes. The activation library can also be used to turn off gene expression with CRISPRoff (Nunez et al., Cell 184, Issue 9, 29 April 2021, pages 2503-2519).

In certain embodiments, each of said transcribed RNA sequences comprises a sgRNA, and for any one reaction, each of said sgRNA is specific for a sequence in the same gene comprised in said target cell, and for each of said reactions, said gene is different.

For CRISPRa, the inventors have on certain occasions constructed more than one set of RNA transcripts, generating more than one plasmid targeting a specific gene: the design according to the invention targets the main transcription start site (TSS), and some genes have more than one TSS. In certain embodiments, at least 100, more particularly at least 1000, sgRNA sequences selected from SEQ ID NO 64 - 170943 are expressed.

A further aspect of the invention relates to a library of multiplexed RNA expression plasmids, obtained or obtainable by a method according to any of the preceding aspects or embodiments of the invention.

Pooling equimolar amounts of the arrayed library plasmids can yield an evenly distributed pooled library, which was not possible before and can now be realized by the method of the invention. Statistically, such evenly distributed libraries can readily achieve true genome-wide coverage, which other existing pooled libraries cannot.

Conventional pooled libraries are generated by cloning a pool of sgRNAs into a backbone vector in a single tube. Because of variations in ligation efficiency and/or variable abundance of the synthesized sgRNA oligos, the prevalence of the ligated product of each sgRNA plasmid can vary across the genome. Making pooled libraries from arrayed libraries, such as the 4-sgRNA arrayed libraries presented herein, readily solves this issue: since each plasmid is cloned separately, an equal amount of each plasmid can be taken and pooled, yielding a pooled library in which every gene has the same prevalence.
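Pooling equimolar amounts requires converting mass concentrations into molar concentrations, because plasmids of different lengths contain different numbers of molecules per nanogram. The helper below is a minimal sketch, not part of the described method; it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation) and that each plasmid prep is characterized by its concentration in ng/µl and its length in bp. The function names are hypothetical.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one dsDNA base pair (assumption)


def fmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Molar concentration (fmol/µl) of a dsDNA prep from its mass concentration."""
    # ng -> g is a factor 1e-9 and mol -> fmol a factor 1e15, so the net factor is 1e6
    return conc_ng_per_ul * 1e6 / (length_bp * BP_MASS_G_PER_MOL)


def pooling_volumes(preps: dict[str, tuple[float, int]], fmol_each: float) -> dict[str, float]:
    """Volume (µl) to take from each prep so that every plasmid contributes `fmol_each`."""
    return {name: fmol_each / fmol_per_ul(conc, length)
            for name, (conc, length) in preps.items()}
```

With this conversion, a 10 kb plasmid at 100 ng/µl and a 5 kb plasmid at 50 ng/µl both contain about 15.4 fmol/µl, so 3.25 µl of each contributes 50 fmol to the pool.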

The backbone

The plasmid backbone fragment is obtained by type IIS restriction enzyme digestion of a backbone plasmid, which is characterized by bearing a first (backbone) antibiotic resistance selection gene enabling propagation and amplification of the plasmid in bacterial culture.

In certain embodiments, the backbone comprises as eukaryotic expression elements an LTR (long terminal repeat) sequence element, the CMV immediate early enhancer/promoter and one sgRNA promoter, in the arrangement depicted in Fig. 10 A and B.

After isolation and type IIS restriction enzyme digestion of the backbone plasmid, the fragment carrying the antibiotic A selection marker (in the example of the invention: ampicillin) is removed. The linear backbone is then subjected to Gibson assembly using a number n of inserts encoding different sgRNA-tracrRNA constructs that, within the same plasmid, target the same genetic sequence in a target cell. In the applications envisioned by the inventors, the target cell is a eukaryotic cell, particularly a mammalian cell. The method could, however, equally be used in other cells; the skilled artisan realizes that this requires replacing the eukaryotic (RNA Pol III) promoters for the transcribed sequences with adequate prokaryotic promoters.

The first sgRNA promoter is located at the 3’ backbone end and the last tracrRNA at the 5’ backbone end, so that both are accessible for the subsequent Gibson assembly with the inserts.

Type IIS restriction endonucleases recognize asymmetric sequences and cleave the DNA carrying these sequences not within the recognition site, as orthodox type II endonucleases do, but at a defined distance from it. DNA ends can be designed to be flanked by a type IIS restriction site such that digestion removes the enzyme recognition sites and generates complementary overhangs. Such ends can be ligated seamlessly, creating a junction that lacks both the original recognition site and any scar.

In certain embodiments, the type IIS restriction enzyme is BbsI.
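Cutting at a defined distance from the recognition site can be modelled in a few lines. The sketch below uses the BbsI specificity GAAGAC(2/6), i.e. cleavage 2 nt downstream of the site on the strand carrying GAAGAC and 6 nt downstream on the complementary strand, which leaves a 4-nt 5’ overhang. It is a simplified, hypothetical model: sequences are uppercase top-strand strings, sites spanning the origin of a circular plasmid are not handled, and the layout in which both sites lie within the excised fragment and cut outwards is assumed for illustration. In that layout, the fragment returned as the backbone no longer contains either recognition site.

```python
SITE = "GAAGAC"     # BbsI recognition sequence, cleaves GAAGAC(2/6)
SITE_RC = "GTCTTC"  # the same site as it reads on the top strand when it lies on the bottom strand


def find_all(seq: str, motif: str):
    """Yield every start index of `motif` in `seq`."""
    start = seq.find(motif)
    while start != -1:
        yield start
        start = seq.find(motif, start + 1)


def bbsi_cuts(seq: str) -> list[tuple[int, int, str]]:
    """(top-strand cut, bottom-strand cut, overhang) for each BbsI site.

    Cut positions are top-strand indices between nucleotides; the overhang is
    the 4-nt single-stranded region as read on the top strand.
    """
    cuts = []
    for i in find_all(seq, SITE):     # site on the top strand: cuts downstream (right)
        cuts.append((i + 8, i + 12))
    for j in find_all(seq, SITE_RC):  # site on the bottom strand: cuts upstream (left)
        cuts.append((j - 6, j - 2))
    result = []
    for top, bottom in sorted(cuts):
        if top < 0 or bottom > len(seq):
            raise ValueError("cut falls outside the sequence; origin-spanning sites are not handled")
        result.append((top, bottom, seq[top:bottom]))
    return result


def digest_circular(plasmid: str) -> list[str]:
    """Top-strand sequences of the fragments obtained by BbsI digestion of a circle."""
    positions = [top for top, _, _ in bbsi_cuts(plasmid)]
    if not positions:
        return [plasmid]
    pieces = [plasmid[a:b] for a, b in zip(positions, positions[1:])]
    pieces.append(plasmid[positions[-1]:] + plasmid[:positions[0]])  # piece spanning the origin
    return pieces
```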

One of the inserts contains an insert selection marker, in the case illustrated in the examples, an antibiotic resistance marker such as trimethoprim resistance. After Gibson assembly, the resultant plasmid carries the insert selection marker, and only the bacteria containing correctly assembled plasmids grow and amplify the desired plasmids. Wrongly ligated products, or any remaining backbone plasmid, will not be propagated under the conditions favouring the insert selection marker.

In certain embodiments, the ligated plasmid product is isolated after the Gibson assembly step.

In certain particular embodiments, the ligated plasmid product is not isolated after the Gibson assembly step. The product of the Gibson assembly step can be used directly in a transformation of bacterial cells that then produce a plasmid library member.

The ability to transform the ligated product directly, without a further purification step, is one of the key advantages of the method according to the invention: isolation or purification of the ligated product is not necessary.

In certain embodiments, the Gibson assembly products are subjected to transformation, and only bacteria transformed with the correctly assembled product grow, owing to the antibiotic switch; bacteria transformed with non-assembled or incorrectly assembled products do not grow. This switch of the selecting antibiotic is a key feature of the method disclosed herein, as it avoids colony picking, which is the common way of isolating correct plasmids in a classical cloning procedure.

Promoters of particular utility in the context of the present invention include the human and murine U6 promoters (hU6 and mU6, respectively), the human H1 promoter and the human 7SK promoter, all of which are RNA polymerase III promoters. Transcripts of Pol III genes, and of sequences under transcriptional control of such promoters, are not polyadenylated, which makes these promoters well suited for making small RNAs such as shRNA hairpins.

The selection marker in the embodiment laid out in the examples encodes a trimethoprim antibiotic resistance gene and is incorporated between two sgRNA expression cassettes (each cassette including a promoter, sgRNA, tracrRNA and transcription termination sequence). The selection marker gene cassette includes a bacterial promoter and the marker gene coding sequence.

In certain embodiments, the following architecture of inserts is chosen, where N is the number of transcribed RNA sequences and n = N-1 (n >= 2) is the number of inserts:

• the 1st insert comprises a sequence encoding the first transcribed RNA sequence, the first tracrRNA sequence and a transcription termination sequence, followed by the promoter of the second RNA;
• the 2nd insert comprises a sequence encoding the second transcribed RNA sequence, the second tracrRNA sequence and a transcription termination sequence, followed by the promoter of the third RNA;
• and so on, up to the (n-1)th insert, which comprises a sequence encoding the (n-1)th transcribed RNA sequence, the (n-1)th tracrRNA sequence and a transcription termination sequence, followed by the promoter of the nth RNA;
• the nth insert comprises a sequence encoding the nth transcribed RNA sequence, the nth tracrRNA sequence and a transcription termination sequence, followed by the promoter of the (n+1)th (i.e. Nth) RNA and the sequence of the (n+1)th (Nth) RNA.
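The insert architecture can be generated mechanically for any number N of transcribed RNAs. The Python sketch below is illustrative only: element names are placeholder labels rather than sequences, and placing the selection marker cassette after the first terminator (between the first and second sgRNA expression cassettes, as in the examples) is a parameter rather than a requirement. Consistent with the backbone described above, promoter 1 is assumed to sit at the 3’ backbone end and the last tracrRNA at the 5’ backbone end, so neither is listed. The function name is hypothetical.

```python
from typing import Optional


def insert_layout(n_rnas: int, marker_insert: Optional[int] = 1) -> list[list[str]]:
    """Element order of each insert for N transcribed RNAs (n = N - 1 inserts).

    Promoter 1 (3' backbone end) and the Nth tracrRNA (5' backbone end) are
    assumed to be on the backbone and are therefore not listed.
    """
    n_inserts = n_rnas - 1
    if n_inserts < 2:
        raise ValueError("the described architecture assumes n = N - 1 >= 2 inserts")
    layout = []
    for k in range(1, n_inserts + 1):
        parts = [f"RNA{k}", f"tracrRNA{k}", "terminator"]
        if k == marker_insert:
            parts.append("selection marker cassette")  # bacterial promoter + marker gene
        parts.append(f"promoter{k + 1}")
        if k == n_inserts:
            parts.append(f"RNA{k + 1}")  # the last insert also carries the Nth RNA
        layout.append(parts)
    return layout
```

For N = 4 this yields the three inserts used in the examples, with the last insert carrying both the third and the fourth transcribed RNA.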

Description of the Figures

Fig. 1 shows (A) increased gene activation by the 4-sgRNA strategy. (B) Gene activation for tested genes. (C)-(D) Gene ablation with the 4-sgRNA strategy. (E) Digestion of the pYJA5 vector. (F) Detection of recombination after transfection of 4-sgRNA plasmids into HEK293 cells.

Fig. 2 shows (A) the high-throughput cloning scheme. (B)-(E) Accuracy of sgRNA and tracrRNA region reads. (F) Viral titer of 4-sgRNA vectors packaged in lentivirus. (G) sgRNA delivery rate. (H) Gene-dependent activation.

Fig. 3 shows (A) general scheme of library generation. (B) Overlapping and non-overlapping binding of sgRNAs.

Fig. 4 shows (A) sgRNAs selected via the updated algorithm. (B) Coverage of human protein-coding genes. (C)-(D) sgRNAs selected for the libraries. (E)-(F) Targeting of generic regions. (G) Proportion of spaced sgRNAs. (H) Effect of CRISPRa sgRNAs on surrounding genes. (I) Off-site target effects.

Fig. 5 shows (A) 4-sgRNA-mediated gene knockout.

Fig. 6 shows contribution of mutations and deletions as well as contamination in plasmids.

Fig. 7 shows decrease of prevalence of correct plasmids due to sequence homologies between sgRNAs.

Fig. 8 shows (A)-(B) that 4-sgRNA combinations sharing any identical sub-sequence of 8 base pairs or more were avoided. (C)-(F) The characteristics of the 4-sgRNA library as a whole.

Fig. 9 shows the characteristics of the 4-sgRNA library as a whole.

Fig. 10 shows (A) Cloning scheme for generating a high-purity plasmid. (B) Scheme of antibiotic selection switch cloning method.

Fig. 11 shows (A) the empty backbone vector pYJA5 and (B) the 4-sgRNA vector.

Fig. 12 shows a zoom-in scheme of the 4-sgRNA expression cassettes and the selection marker.

Fig. 13 shows a scheme of the inserts and the precursor vector assembly.

Fig. 14 shows data illustrating the virus-producing capacity of 24 HEK293T monoclonal lines. Clone 5 was the most robust, with virus titers approximately 4- to 5-fold higher than those of most other lines.

Fig. 15 shows exemplary titers of lentivirus produced via the herein described methods in 384-well plates.

Fig. 16 shows the workflow of plasmid transfection for lentiviral production in 384-well plates.

Examples

Example 1: Superiority and Robustness of 4sgRNAs in Gene Activation and Ablation

To test the efficiency and robustness of the 4-sgRNA approach in CRISPR-mediated gene perturbation, the inventors cloned several 4-sgRNA-expressing plasmids and validated their efficiency for CRISPRa and CRISPRo in the human embryonic kidney cell line HEK293. For CRISPRa, three genes, ASCL1, NEUROD1 and CXCR4, which show low, moderate and high baseline expression, respectively, were tested. The inventors found that single sgRNAs showed quite variable efficiency in activating these genes, whereas the 4-sgRNA strategy dramatically and robustly increased the extent of gene activation (Figure 1A). This is consistent with the previous finding that three sgRNAs act more potently than single sgRNAs. To further test the robustness and efficacy of the 4-sgRNA approach for gene activation, the inventors tested genes that had been found difficult to activate in previous studies, and found strong and robust activation for all genes tested (Figure 1B). The 4-sgRNA strategy was also superior in the efficiency of gene ablation (Figure 1C and D). Furthermore, 4-sgRNA-mediated gene knockout resulted in deletions of the genomic region between the cut sites (Figure 5A), which increases the likelihood of loss of function of the target gene. Together, these results demonstrate the high efficacy and robustness of the inventors’ 4-sgRNA/gene strategy for both gene activation and ablation.

Example 2: Design of APPEAL Cloning Method

In a traditional cloning process, bacterial colony picking and validation are required, because the growth of undesired plasmid recombinants and of the original vector backbone greatly reduces the abundance of the desired plasmids. This hampers automation and reduces throughput when generating large-scale arrayed plasmid libraries. To circumvent these drawbacks, the inventors developed APPEAL (Automated-liquid-Phase-Plasmid-assEmbly-And-cLoning), which exploits an antibiotic selection switch between the starting vector and the final desired vector to generate high-purity plasmids (Figure 1E and 5B). This eliminates the need for colony picking and validation of individual clones. The inventors’ final vector includes four sgRNAs driven by four ubiquitously active RNA polymerase III promoters (namely the human U6, mouse U6, human H1 and human 7SK promoters). Further, the inventors’ 4-sgRNA vector includes puromycin and TagBFP selection, lentiviral packaging and PiggyBac (PB) transposon elements (Figure 1E and 5B). These features contribute to the high efficiency and low recombination propensity of the inventors’ libraries and make them very versatile resources.

The four sgRNAs are individually synthesized in the form of oligonucleotide primers containing the 20-nucleotide protospacer sequences and a constant region. In three separate polymerase chain reactions (PCRs), these primers are used to amplify three distinct constant fragments. The resulting three PCR amplicons share homologous overlapping ends (approximately 20 nucleotides in length) with each other and with the digested empty vector (pYJA5). The digested vector and the three amplicons were assembled together to form the final 4-sgRNA vector using the Gibson assembly cloning method (Gibson et al., 2009, Nat Methods 6, 343-345). In the cloning process, the antibiotic used for bacterial selection is switched from ampicillin (AmpR) for the starting vector (pYJA5) to trimethoprim (TmpR) for the final 4-sgRNA vector (4sgRNA-pYJA5). This is accomplished by removing the ampicillin resistance gene from the pYJA5 backbone (by restriction enzyme digestion at the two BbsI recognition sites) and incorporating the trimethoprim resistance gene, encoding dihydrofolate reductase, between the sgRNA1 and sgRNA2 cassettes of the 4-sgRNA vector (Figure 1E and 5B). Unlike previously established multiplexed sgRNA cloning methods, the inventors’ cloning method starts from BbsI digestion and purification of the starting vector, and none of the following steps requires gel or column purification. This makes APPEAL an ideal solution for cost-effective, large-scale, liquid-phase generation of libraries.

The inventors performed a proof-of-concept test of the APPEAL method. Digestion of the pYJA5 vector and the three PCR amplifications yielded the correct starting materials for Gibson assembly (Figure 1F). After transformation, in order to validate the method, the inventors spread the transformants onto agar plates and performed single-colony PCR with primers flanking the 4-sgRNA region of the vector (Figure 1F). All tested colonies showed 4-sgRNA expression cassettes of the correct size of 2.2 kilobases (Figure 1F). Further, no detectable recombination was observed after the 4-sgRNA plasmids were transfected into HEK293 cells (Figure 1G). This indicates that the APPEAL method is suitable for liquid-phase, colony-picking-free plasmid generation.

Example 3: Scale-up APPEAL to 384-Well Cloning

The inventors successfully scaled up the APPEAL method from single tubes to a high-throughput process (Figure 2A). Primer synthesis, PCR and Gibson assembly were performed in 384-well plates; transformation with the Gibson assembly products, bacterial cultivation and plasmid minipreparation were then performed in 96-deep-well plates. The final products can be stored as bacterial glycerol stocks, plasmids, lentiviral particles and transposons for downstream applications (Figure 2A).
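Moving from 384-well plates to 96-deep-well plates requires a fixed mapping between wells. The specification does not state which mapping was used; the sketch below assumes the common interleaved quadrant layout, in which each 2 x 2 block of a 384-well plate is distributed over four 96-well plates (as when a 96-channel head transfers from alternating rows and columns). The function name is hypothetical.

```python
ROWS_384 = "ABCDEFGHIJKLMNOP"  # 16 rows x 24 columns
ROWS_96 = "ABCDEFGH"           # 8 rows x 12 columns


def map_384_to_96(well: str) -> tuple[int, str]:
    """Map a 384-well position (e.g. 'B3') to (96-well plate 1-4, 96-well position)."""
    row = ROWS_384.index(well[0].upper())  # raises ValueError for an invalid row letter
    col = int(well[1:]) - 1
    if not 0 <= col < 24:
        raise ValueError(f"invalid column in {well!r}")
    quadrant = (row % 2) * 2 + (col % 2) + 1  # A1 -> 1, A2 -> 2, B1 -> 3, B2 -> 4
    return quadrant, f"{ROWS_96[row // 2]}{col // 2 + 1}"
```

Keeping the mapping in one function makes it straightforward to track each 4-sgRNA design from the assembly plate to its glycerol stock and miniprep position.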

To examine the quality of plasmids generated by the APPEAL method, the inventors cloned one 384-well plate of 4-sgRNA plasmids with APPEAL and subjected the plasmid pools to single-molecule long-read sequencing (Pacific Biosciences). The inventors amplified the miniprepped plasmids with a pair of primers flanking the 4-sgRNA expression cassettes and subjected the 2.2-kilobase amplicons to single-molecule real-time sequencing. Because this sequencing method produces long reads, it enables analysis of the plasmids at single-copy resolution. Since the mutation rate in the promoter regions of the 4-sgRNA cassettes was low (< 5%), and isolated mutations in these regions can usually be expected to be without functional consequences, the inventors focused on the protospacer and tracrRNA regions. Most wells of the 384-well plate had over 85% of reads with perfectly correct sgRNA and tracrRNA regions; the remaining wells showed at least 2-3 sgRNA regions that were 100% correct (Figure 2B-E). The inventors further analyzed the plasmids that were not 100% correct to understand the contribution of mutations and deletions, and found that both occurred in similar proportions (Figure 6). The inventors also examined the frequency of cross-contamination of plasmids between wells of the same plate and found that fewer than 1% of reads were contaminants (Figure 6). In summary, scaling the APPEAL method to a high-throughput format is feasible, and the quality is acceptable for generating genome-wide CRISPR arrayed libraries.
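The per-well metrics described above (fraction of perfectly correct reads, reads carrying errors, and reads originating from other wells) can be tallied once each long read has been reduced to the four sgRNA regions it carries. The sketch below assumes that this alignment step has already been done and that a read counts as correct only if all four regions match the design exactly; the 0.85 threshold mirrors the "over 85%" figure above, and the function name is hypothetical.

```python
from collections import Counter

Design = tuple[str, str, str, str]  # the four sgRNA regions of one 4-sgRNA plasmid


def summarize_well(reads: list[Design], expected: Design,
                   other_designs: set[Design], threshold: float = 0.85) -> dict:
    """Classify the reads of one well as correct, contaminant or error, and report fractions."""
    counts = Counter()
    for read in reads:
        if read == expected:
            counts["correct"] += 1
        elif read in other_designs:
            counts["contaminant"] += 1  # exact match to a plasmid designed for another well
        else:
            counts["error"] += 1        # mutation or deletion in at least one region
    total = len(reads)
    summary = {label: counts[label] / total for label in ("correct", "contaminant", "error")}
    summary["passes"] = summary["correct"] > threshold
    return summary
```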

Example 4: Versatility of the 4-sgRNA Vector

The inventors’ 4-sgRNA vector is amenable to lentiviral packaging, although the insertion of the 4-sgRNA expression cassettes considerably increases the size of the sequence to be packaged into lentiviral particles. The inventors packaged their 4-sgRNA vectors into lentiviruses and robustly obtained titers of >10^7 TU/ml in raw culture-medium supernatants from 24-well plates (Figure 2F). These viruses greatly increased the sgRNA delivery rate, as measured by the fraction of TagBFP-positive cells via flow cytometry, in non-transfectable cells, including the human lymphocyte-related cell lines THP-1 and ARH-77, the human neuroblastoma cell line GIMEN, the human glioblastoma cell line U251-MG and patient-derived iPSCs (Figure 2G). The inventors further examined the efficiency of gene activation in non-transfectable iPSC-derived neurons (iNeurons) using lentivirus-mediated co-delivery of dCas9-VPR and the 4-sgRNA vector, since they encountered difficulties in performing FACS on iNeurons owing to morphological changes after trypsinization. As shown in Figure 2H, gene activation is quite efficient, although its extent is gene-dependent. Together, these results further indicate the versatility and usefulness of the inventors’ 4-sgRNA vector.
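Functional titers in transducing units per ml are commonly estimated from the fraction of reporter-positive cells, here TagBFP-positive cells, after transduction with a known volume of virus. The specification does not describe its titration protocol, so the sketch below only illustrates this widely used convention; it assumes the fraction of positive cells is low enough that each positive cell reflects a single transduction event, and the function name is hypothetical.

```python
def functional_titer(cells_at_transduction: float, fraction_positive: float,
                     virus_volume_ul: float, dilution_factor: float = 1.0) -> float:
    """Transducing units per ml from the fraction of reporter-positive cells.

    Assumes a low fraction of positive cells, where each positive cell is taken
    to reflect a single transduction event.
    """
    if not 0.0 <= fraction_positive <= 1.0:
        raise ValueError("fraction_positive must be between 0 and 1")
    transducing_units = cells_at_transduction * fraction_positive
    return transducing_units * dilution_factor / (virus_volume_ul / 1000.0)  # µl -> ml
```

For example, 10^5 cells of which 10% are TagBFP-positive after adding 10 µl of undiluted virus correspond to about 10^6 TU/ml.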

Example 5: Updated Algorithms for Generic, Specific and Synergetic sgRNA Selection

To enable both gain-of-function and loss-of-function arrayed CRISPR screens, the inventors decided to generate both CRISPRa and CRISPRo arrayed libraries for human protein-coding genes. Recently developed sgRNA design algorithms have greatly improved the likelihood of obtaining active sgRNAs. The inventors decided to adopt sgRNAs featured in existing, widely used pooled libraries to generate the arrayed libraries, namely sgRNAs from Calabrese and hCRISPRa-v2 for the CRISPRa library, and from Brunello and TKOv3 for the CRISPRo library (Figure 3A). Common DNA polymorphisms in human genomes, such as single-nucleotide polymorphisms (SNPs), may reduce the effectiveness of sgRNAs if they affect the protospacer sequence or the protospacer adjacent motif (PAM). However, with the exception of TKOv3, most existing libraries did not consider DNA polymorphisms when selecting sgRNAs. This drawback may reduce the efficiency of screens performed with these libraries in patient-derived cells, for instance patient-derived iPSCs. Furthermore, the recently developed GuideScan algorithm (http://www.guidescan.com) has been shown to predict off-target effects of sgRNAs very well, showing a strong correlation with the unbiased genome-wide off-target assay GUIDE-Seq. Based on these data, a sgRNA with a GuideScan score exceeding 0.2 is considered relatively specific.

To refine the above two parameters and choose the top four sgRNAs for building the libraries, the inventors designed a custom sgRNA selection algorithm (Figure 3A). The efficacy of CRISPRa-mediated gene activation relies on sgRNAs targeting a relatively narrow window around the transcription start site (TSS) of a gene, and some genes have more than one TSS. To account for multiple TSSs, the hCRISPRa-v2 library targeted major TSSs separately if they were spaced significantly apart (> 1 kilobase), and designed 10 sgRNAs per TSS. For the human CRISPRa library, the inventors adopted the same strategy for separating major TSSs, and included additional TSSs defined by the hCRISPRa-v2 library. The inventors then imposed two filters for the selection of sgRNAs: 1) no common polymorphism (with a frequency exceeding 0.1%) should affect the 20-nucleotide protospacer sequence or the two G nucleotides of the PAM sequence, according to the aggregated data from over 10’000 full human genomes (http://db.systemsbiology.net/kaviar/); 2) the GuideScan score should exceed 0.2. Because some genes or TSSs did not have four sgRNAs fulfilling these requirements, the inventors supplemented the above-mentioned libraries with sgRNAs from the CRISPick web portal (https://portals.broadinstitute.org/gppx/crispick/public), which designs sgRNAs with the same algorithm that was used for the Calabrese and Brunello libraries for CRISPRa and CRISPRo, respectively. Finally, all possible combinations of four sgRNAs were ranked by their aggregated specificity score, and the top combination was chosen in order to minimize potential off-target effects (Figure 3A).
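The two filters and the final ranking step can be sketched as follows. This is an illustrative reconstruction, not the inventors’ pipeline: the variant annotation format is an assumption (positions 1 to 20 for the protospacer and 21 to 23 for the NGG PAM, so positions 22 and 23 are the two G nucleotides), and the aggregated specificity score of a combination is assumed to be the sum of the individual GuideScan scores. With such a purely additive score, the best combination is simply the four top-scoring guides; exhaustive enumeration only matters once pairwise constraints between guides are added.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional


@dataclass
class Candidate:
    name: str
    guidescan: float  # GuideScan specificity score
    # (position, allele frequency) of known variants; 1-20 = protospacer, 21-23 = NGG PAM
    variants: list[tuple[int, float]] = field(default_factory=list)


def passes_filters(c: Candidate, min_score: float = 0.2, max_af: float = 0.001) -> bool:
    """Filter 1: no variant above 0.1% frequency in the protospacer or the GG of the PAM.
    Filter 2: GuideScan score above 0.2."""
    if c.guidescan <= min_score:
        return False
    for pos, af in c.variants:
        if (1 <= pos <= 20 or pos in (22, 23)) and af > max_af:
            return False
    return True


def best_combination(cands: list[Candidate], k: int = 4) -> Optional[tuple[Candidate, ...]]:
    """Highest-ranking combination of k eligible guides, or None if too few pass."""
    eligible = [c for c in cands if passes_filters(c)]
    if len(eligible) < k:
        return None  # the described pipeline supplements guides (e.g. from CRISPick) here
    return max(combinations(eligible, k), key=lambda combo: sum(c.guidescan for c in combo))
```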

While choosing sgRNAs for the libraries, three additional points were considered besides common DNA polymorphisms and predicted off-target effects. First, because the libraries are generated by Gibson assembly, the prevalence of correct plasmids decreases if two or more sgRNAs share identical subsequences of 8 nucleotides or more, owing to recombination between the identical sequences among the four sgRNAs (Figure 7). Second, previous libraries are based on single sgRNAs, which were picked according to their ranking in predicted on-target efficacy and off-target effects. However, many sgRNAs chosen by these libraries for targeting a given gene localize to the same genomic region, differing by only a few nucleotides. This was especially common in CRISPRa libraries, owing to the relatively narrow target window around the TSS. The inventors investigated whether the proximity between the binding sites of the four sgRNAs might affect their activity, and whether overlapping or spaced sgRNAs should be preferred. The inventors tested six genes and compared 4-sgRNA combinations spaced by at least 50 nucleotides with combinations that did not meet this criterion. Interestingly, the use of four non-overlapping sgRNAs (spaced at least 50 nucleotides apart) resulted in significantly higher gene activation, which suggests that unhindered binding of the sgRNAs leads to a further synergistic effect (Figure 3B). Third, some sgRNAs in previously published libraries target additional genes besides the intended target gene. This can occur at additional perfect-match binding sites elsewhere in the genome (off-site unintended target genes), for example when paralogous genes share stretches of identical sequence. It can also occur at a single, intended binding site if more than one gene is potentially affected.
In the case of CRISPRa, this may occur if two genes located on opposite strands of the genome share a promoter region (bidirectional promoters). Typically, sgRNAs are annotated only with the intended gene, so many users of a library may not be aware of this issue, which can affect interpretation of the data. In the sgRNA selection pipeline, the inventors avoided sgRNAs with multiple perfect-match binding sites wherever possible (Figure 3A). For families of closely related or nearly identical genes, the inventors nevertheless created, for the sake of simplicity, a separate 4-sgRNA plasmid for each gene possessing its own unique Entrez gene identifier.
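The first two considerations translate into pairwise checks on a candidate set of four guides. In the sketch below, which is illustrative and not the inventors’ code, two guides are incompatible if they share any identical 8-nucleotide subsequence (sharing a longer identical stretch implies sharing an 8-mer) or if their binding positions are less than 50 nucleotides apart; representing each guide by a single genomic coordinate is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int = 8) -> set[str]:
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def compatible(guides: list[tuple[str, int]], k: int = 8, min_spacing: int = 50) -> bool:
    """guides: (protospacer, genomic position) pairs for one candidate 4-sgRNA set."""
    for (seq_a, pos_a), (seq_b, pos_b) in combinations(guides, 2):
        if kmers(seq_a, k) & kmers(seq_b, k):
            return False  # shared identical subsequence: recombination risk during assembly
        if abs(pos_a - pos_b) < min_spacing:
            return False  # binding sites closer than the minimum spacing
    return True
```

Such a check can be applied as a filter before ranking combinations, so that only sets satisfying both constraints compete on their specificity score.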

Example 6: Overview of the CRISPRa and CRISPRo Libraries

As shown in Figure 4A, the sgRNAs selected via the updated algorithm for the two libraries originated mainly from previously published libraries. Compared with the individual pooled libraries from which most of the sgRNAs were adopted, the genome-wide libraries show the most extensive coverage of human protein-coding genes (Figure 4B). Since the inventors preferred high-specificity sgRNAs (and avoided those with GuideScan scores below 0.2), the sgRNAs selected for the libraries showed higher predicted specificity (Figure 4C) without sacrificing predicted efficacy (Figure 4D). Importantly, both the CRISPRa and CRISPRo libraries show a significant improvement in targeting generic regions of the genome by avoiding genetic polymorphisms (Figure 4E and 4F). Notably, the sgRNA selection algorithm greatly increased the proportion of sgRNAs spaced at least 50 nucleotides apart compared with simply selecting the top four sgRNAs (Figure 4G), and thus potentially further increased the activity of the two libraries, as suggested by Figure 3B. Furthermore, the inventors were able to completely avoid 4-sgRNA combinations sharing any identical sub-sequence of 8 base pairs or more (Figures 8A and B), thereby minimizing the risk of recombination between sgRNAs for all genes. The inventors split the libraries into mutually exclusive sub-libraries (see below), based largely on the categories used by the pooled library hCRISPRa-v2.

Sub-library                             CRISPRa   CRISPRo
Transcription factors                      1634      1634
GPCRs                                       802       800
Secretome                                  1694      1692
Membrane Proteins (non-GPCR)               1760      1759
Kinases/Phosphatases/Drug Targets          2012      2012
Mitochondria/Trafficking/Motility          2052      2052
Stress/Proteostasis                        2699      2699
Cancer/Apoptosis                           2266      2265
Gene Expression (non-TF)                   1131      1132
Unassigned                                 3789      3775
Non-targeting 4-gRNA plasmid controls       116       116
Total genes targeted                      19839     19820
Total TSS/ORF targeted                    22326     19820
Total plasmids in the library             22442     19936
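The totals in the table are internally consistent, which a few lines of arithmetic confirm: the mutually exclusive sub-libraries add up to the number of genes targeted, and adding the 116 non-targeting controls to the number of TSSs/ORFs targeted gives the total number of plasmids per library. The values below are copied from the table; the variable names are illustrative.

```python
# (CRISPRa, CRISPRo) counts per sub-library, copied from the table
SUBLIBRARIES = {
    "Transcription factors": (1634, 1634),
    "GPCRs": (802, 800),
    "Secretome": (1694, 1692),
    "Membrane Proteins (non-GPCR)": (1760, 1759),
    "Kinases/Phosphatases/Drug Targets": (2012, 2012),
    "Mitochondria/Trafficking/Motility": (2052, 2052),
    "Stress/Proteostasis": (2699, 2699),
    "Cancer/Apoptosis": (2266, 2265),
    "Gene Expression (non-TF)": (1131, 1132),
    "Unassigned": (3789, 3775),
}
CONTROLS = (116, 116)              # non-targeting 4-gRNA plasmid controls
TSS_ORF_TARGETED = (22326, 19820)

genes_targeted = tuple(sum(counts[i] for counts in SUBLIBRARIES.values()) for i in range(2))
plasmids_total = tuple(t + c for t, c in zip(TSS_ORF_TARGETED, CONTROLS))
print(genes_targeted)  # (19839, 19820)
print(plasmids_total)  # (22442, 19936)
```

For the CRISPRa library, the number of TSSs targeted exceeds the number of genes because genes with more than one major TSS received more than one plasmid.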

As mentioned in the sgRNA design section above, the inventors tried to exclude sgRNAs with potential pleiotropic off-target effects. In some instances, owing to the structure of the human genome, targeting of additional, unintended genes was unavoidable. Indeed, the inventors observed that within a window of one kilobase surrounding the TSS, around 20% of CRISPRa sgRNAs targeted additional protein-coding or non-coding genes; this proportion was almost identical in all libraries examined, including the CRISPRa library (Figure 4H). If a sgRNA has more than one perfect-match target sequence in the genome, it can lead to targeting of unintended genes at a location remote from the intended gene (off-site targets). In the CRISPRo library, such unspecific sgRNAs were mostly excluded, whereas in the CRISPRa library the proportion of sgRNAs with off-site targets was comparable to that of the reference pooled libraries, perhaps owing to the narrow target window around the TSS, which limited the pool of eligible sgRNAs (Figure 4I). To ensure a fair comparison among libraries, only genes (and TSSs) common to all libraries were included; Figures 8C-F and 9 display the characteristics of the 4-sgRNA library as a whole.

Discussion

In this study, the inventors circumvented the relatively low and unpredictable efficiency of single sgRNAs in gene perturbation by combining four sgRNAs into a single vector. The newly developed APPEAL method for the cloning of 4-sgRNA vectors enabled the generation of human genome-wide CRISPRa and CRISPRo libraries in a short time frame and in a cost-effective manner. The customized sgRNA selection algorithm ensured that the libraries target specific and generic regions of the human genome. Thus, the inventors have generated next-generation, highly active and robust genome-wide arrayed libraries that can be easily adapted to many systems, and that should help meet a need insufficiently covered by existing resources.

Efficacy of CRISPR-Based Gene Perturbation

Efficacy is one of the most important parameters for any gene perturbation approach. In CRISPR-based approaches, most perturbations rely on single sgRNAs. Although algorithms for predicting sgRNA activity have improved, the efficacy of the single-sgRNA approach is still relatively low and unpredictable. Previous CRISPR libraries, especially pooled libraries, often include several sgRNAs per gene in separate vectors, which can, to some extent, circumvent the problem of unpredictable efficiency of gene perturbation. However, this does not overcome the relatively low efficacy of each individual sgRNA. More importantly, a weakly active single sgRNA may fail to elicit a functional consequence, which leads to false-negative calls when determining hit genes. These drawbacks hamper the goal of achieving a global understanding of a phenotype of interest, and may cause important druggable targets to be missed with existing CRISPR libraries in both pooled and arrayed formats. Furthermore, strategies that use several single-sgRNA vectors per gene increase the workload and cost of CRISPR screens, since the number of vectors far exceeds the number of targeted genes. The 4-sgRNA/vector approach greatly increases the efficiency and robustness of CRISPR-mediated gene activation and ablation. Thus, the 4-sgRNA-based, highly efficient and robust human arrayed libraries are next-generation resources for genome-wide CRISPR screens.

4-sgRNA Vector Strategy

The need for colony-picking and complex procedures in the traditional cloning approach makes it poorly suited to the construction of 4-sgRNA vectors. Furthermore, previously developed cloning strategies for multiplexed sgRNA expression from a single vector involve complicated steps, especially gel extraction of the desired DNA fragments between cloning procedures, which makes these methods impractical for large-scale projects. The newly developed APPEAL method avoids colony-picking and simplifies the cloning procedure, making the generation of 4-sgRNA/vector-based arrayed libraries feasible in an automated and highly cost-effective manner. To the inventors’ knowledge, the antibiotic switch-based cloning method APPEAL is the first method to enable plasmid construction without colony-picking. Moreover, in principle, this antibiotic switch-based cloning method can be adapted for the generation of any large-scale library, and even for molecular cloning in general. Thus, the inventors’ method may inform future approaches for the generation of large-scale plasmid libraries.

The delivery of multiplexed sgRNAs to the same cell may increase the likelihood of off-target effects. Here, the inventors developed their own updated sgRNA selection algorithm, which adopts the most specific combination of four sgRNAs from existing, well-validated libraries and resources to minimize the potential for off-target effects. Nevertheless, a further genome-wide, systematic empirical validation of the off-target effects of the 4-sgRNA vectors in the libraries would be valuable. Although the inventors do not yet have a full picture of the exact off-target effects of the libraries, this does not diminish their power and usefulness. In any phenotypic screening approach, subsequent validation of hit genes is required. The superior activity and robustness of the libraries in gene perturbation should allow significantly more hits to be identified than with existing libraries. Moreover, many orthogonal approaches exist for a second round of validation of hits identified with the libraries. Thus, combining the inventors’ libraries with orthogonal resources reduces the impact of possible off-target effects and enables genetic screens in a powerful and efficient manner.

Versatility of 4-sgRNA-Based Libraries

The 4-sgRNA vector includes basic elements such as puromycin resistance and TagBFP, and is compatible with lentiviral packaging and PiggyBac-mediated transposon integration. Thus, the inventors’ libraries can be used as plasmids in transfectable cells and as lentiviral particles in non-transfectable cells, such as primary cells and patient-derived iPSCs and their derivatives. Furthermore, the inventors’ vector employs four different promoters (the human U6, mouse U6, human H1 and human 7SK promoters), which are active in many cell types, to drive the expression of the four different sgRNAs. Besides minimizing the propensity for recombination between sgRNA expression cassettes, this design maximizes sgRNA activity across cell lines and minimizes the risk of reduced or absent sgRNA expression that arises when a single promoter (usually the human U6 promoter) drives all sgRNAs. Furthermore, the inventors’ in-house sgRNA selection algorithm ensures that the sgRNAs adopted for the libraries target the most generic regions of the human genome by avoiding genetic polymorphisms. This should contribute to the high efficiency of the inventors’ libraries, and make them exceptionally useful resources for identifying genetic modifiers (especially correctors) of a disease-related phenotype in patient-derived cells and organoids.

Validation of hit genes from pooled CRISPR screens, and orthogonal validation of hits from RNA-interference-based screens, involve the perturbation of individual genes, for which single-sgRNA-based methods have commonly been used. However, because of their unpredictability, single sgRNAs must be validated for efficiency before they can be employed for hit validation. The inventors’ libraries feature high activity and redundancy, and thus also provide very valuable resources for manipulating a selected set of genes. Although demultiplexing sgRNA expression vectors from a pooled library is very challenging, combining arrayed libraries into pooled libraries is relatively easy. Owing to the high efficacy of the 4-sgRNA vectors, pooling the arrayed libraries into pooled CRISPRa and CRISPRo libraries is an attractive option as well.

In summary, the inventors aimed to maximize the versatility and potential usefulness of the libraries, which can in principle benefit all research fields, including both academic and industry-based research.

Mutations and Recombinations in the 4-sgRNA Vector

The antibiotic-switch-based cloning method eliminated the need for colony-picking and enabled automated, high-throughput, liquid-phase cloning of the libraries. Consequently, the plasmids in each well of these libraries are polyclonal. As shown in Figure 2B-2E, the fraction of entirely correct 4-sgRNA plasmids in a given well is generally around 80%, and each of the four sgRNA regions was correct in ~90% of plasmids. The remaining clones were 4-sgRNA plasmids that carried some mutations, mainly in the sgRNA and tracrRNA regions. These mutations have two origins: a) mutations can be introduced during oligonucleotide synthesis; b) owing to the limited fidelity of Taq DNA ligase, some mismatches in the double-stranded DNA overhangs are tolerated, which may allow the ligation of mismatched DNA during Gibson assembly cloning. The mutation rate of the sgRNA cassettes is consistent with the finding that mutations occur in about 10% of plasmids generated with the Gibson assembly method.

The inventors used four different promoters to drive the expression of four sgRNAs with four variants of tracrRNAs to minimize the propensity for recombination between the sgRNA expression cassettes. Indeed, the inventors observed a low prevalence (<10%) of recombination between sgRNA expression cassettes. Importantly, the great majority of plasmids with a recombination retained two or three intact sgRNA sequences, and should thus remain functional. Furthermore, if multiple copies of the vector are delivered to a cell (i.e., with transfection or when using a high multiplicity of infection), the remaining correct clones will compensate for the small fraction of individual plasmids with a recombination or mutation. Indeed, the average percentage of entirely correct protospacer sequences for each of the four sgRNAs was greater than 90%.
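The compensation argument can be made quantitative under a simple model. This is a minimal Python sketch, and its assumptions are illustrative rather than the inventors' analysis. Each plasmid copy delivered to a cell is assumed to be drawn independently from the well's polyclonal pool, and a fraction `p_correct` of that pool is taken to be entirely correct (about 80% according to the sequencing data above). The model ignores the fact that partially correct plasmids often remain functional, so it is conservative.

```python
def prob_at_least_one_correct(p_correct: float, copies: int) -> float:
    """P(at least one of `copies` independently drawn plasmids is entirely correct).

    Assumes independent draws from the well's plasmid pool (binomial model).
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return 1.0 - (1.0 - p_correct) ** copies


for k in (1, 2, 3, 5):
    print(k, round(prob_at_least_one_correct(0.8, k), 4))
# 1 0.8
# 2 0.96
# 3 0.992
# 5 0.9997
```

With ~80% entirely correct plasmids, a cell that receives three copies would carry at least one fully correct 4-sgRNA plasmid with a probability of about 99%, under these assumptions.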

General Cloning Methods

In addition to the above, the present specification encompasses any method that adopts the antibiotic switch, or any other form of selection switch between the precursor plasmids and the final desired plasmids derived from the method, which enables the avoidance of colony picking in molecular cloning and which can be applied to any large-scale library construction or generation, and even to small-scale routine molecular cloning.

The inventors emphasize that the method of the invention can be generalized, in order to avoid any commercial adaptation of any method derived from the methods presented herein.

An alternative aspect of the invention relates to a method for molecular cloning of an insert-containing plasmid from a parent plasmid, wherein
a. the parent plasmid comprises a first selection marker flanked by recognition sequences susceptible to endonuclease action (digestion, cleavage) by a type II restriction enzyme;
b. the parent plasmid is cut (digested) with the type II restriction enzyme, generating two fragments:
i. a parent plasmid backbone (single-stranded overlapping) fragment having a 5’ backbone end and a 3’ backbone end (single-strand ends);
ii. a first resistance marker fragment;
c. an insert fragment containing a second selection marker, and comprising ends that anneal to (fit) the parent 5’ backbone end and 3’ backbone end (sticky ends), is ligated to the parent plasmid backbone fragment, wherein the insert fragment is flanked by a sequence adjacent to the ends that does not reconstitute the type II recognition site;
d. the product of the ligation is transferred into cells under conditions where only cells containing the second selection marker can propagate, and the resulting cell suspension is worked up (prepped) to obtain the insert-containing plasmid.
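The selection step (d) can be pictured as a filter over the plasmid species present after ligation. The following Python sketch is a toy model only. The product list and the marker labels are hypothetical, and the model does not capture ligation or transformation efficiencies. It assumes that the backbone carries no other marker selectable under the chosen condition. This is the case in pYJA5, where the original ampicillin-resistance cassette was removed, and in that vector the first and second markers correspond to ampicillin and trimethoprim resistance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """A plasmid species that may be present after ligation and transformation."""
    description: str
    markers: frozenset


# Hypothetical outcome set for the antibiotic-switch step. The backbone itself is
# assumed to carry no marker that is selectable under the chosen condition.
PRODUCTS = [
    Product("undigested or re-ligated parent plasmid", frozenset({"first_marker"})),
    Product("self-ligated backbone", frozenset()),
    Product("backbone + insert", frozenset({"second_marker"})),
]


def survivors(products, selection_marker):
    """Species that can propagate when only `selection_marker` is selected for."""
    return [p.description for p in products if selection_marker in p.markers]


print(survivors(PRODUCTS, "second_marker"))  # ['backbone + insert']
```

Selecting for the second marker leaves only the insert-containing plasmid. This is why no colony needs to be picked to separate it from the parent plasmid.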

When an insert is digested with restriction enzymes to yield single-strand ends that match the vector backbone, several issues hamper scaling the method up for high-throughput cloning: 1) if the insert is generated by PCR, the digested product requires an extra purification step, because otherwise the DNA polymerase, which usually has exonuclease activity, will degrade the single-strand ends; 2) if the insert is excised from an existing vector, purification is also required, because otherwise the insert can re-ligate into the vector it came from; this undermines the antibiotic switch principle, because plasmids other than the desired ones would then also grow under the selection condition for which the insert carries a marker; 3) in theory, a third option is genome-wide synthesis of these fragments followed by digestion with restriction enzymes; however, the cost would be substantial.

No selection marker switch method as provided herein has previously been published or used for large-scale library generation. Furthermore, the concept of eliminating colony picking from a molecular cloning procedure has not been published either.

The method according to this last aspect can be regarded as part of the unitary invention provided herein. One can enter the process with either the precursor plasmid or the backbone fragment from which the first selection marker has already been excised. For cloning from scratch, one would usually start from the precursor plasmid and prepare a large amount of digested backbone; later clonings can then begin from the excised backbone.

Example - Materials and Methods

DNA constructs

The DNA constructs used in the study, except for the 4-gRNA expression plasmids (whose construction is described separately below), include hCas9 (Addgene #41815), lentiCas9-Blast (Addgene #52962), SP-dCas9-VPR (Addgene #63798), pXPR_120 (lenti-dCas9-VPR-Blast, Addgene #96917), psPAX2 (Addgene #12260), VSV-G (Addgene #8454), and pYJA5.

The pYJA5 construct was created by modifying the lenti-PB vector (Metzakopian et al., 2017, Sci Rep 7, 2244) (a gift from Dr. Allan Bradley) in three steps. First, the DNA fragment flanked by the recognition sites for the restriction enzymes MluI and AgeI in the lenti-PB vector was replaced by a synthesized DNA fragment that included the human U6 promoter and the fourth variant of tracrRNA, as well as an ampicillin resistance gene (β-lactamase expression cassette). Two BbsI (type II restriction enzyme) recognition sites flanking the β-lactamase expression cassette were introduced into the new fragment, in order to facilitate the removal of the β-lactamase expression cassette. In a second step, the original ampicillin resistance (β-lactamase) expression cassette in the lenti-PB vector was removed between the two BspHI restriction enzyme recognition sites. After its removal, the insertion of 4-gRNA expression cassettes containing a trimethoprim resistance gene (dihydrofolate reductase) achieves antibiotic-switch-based cloning. Finally, all BsmBI recognition sites were mutated. Detailed sequences of the pYJA5 and 4-gRNA-pYJA5 constructs are shown below:
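Designing pYJA5 requires knowing where the BbsI and BsmBI recognition sites lie, on either strand and including sites that span the origin of a circular plasmid. The scan below is a minimal Python sketch. It uses the standard recognition sequences GAAGAC (BbsI) and CGTCTC (BsmBI). The toy sequence is hypothetical: the actual vector sequence (SEQ ID NO. 001) is not reproduced here, and the helper names are illustrative.

```python
# Recognition sequences (5'->3', top strand) of the two enzymes named above.
RECOGNITION_SITES = {"BbsI": "GAAGAC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str, circular: bool = False) -> list:
    """0-based top-strand start positions of `motif` on either strand."""
    seq, motif = seq.upper(), motif.upper()
    n = len(seq)
    # For a circular plasmid, wrap around so sites spanning the origin are found.
    search = seq + seq[: len(motif) - 1] if circular else seq
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = search.find(pattern)
        while start != -1:
            if start < n:
                hits.add(start)
            start = search.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical toy sequence: one BbsI site on each strand, no BsmBI site.
toy = "AAGAAGACTTTTTTGTCTTCAA"
for enzyme, motif in RECOGNITION_SITES.items():
    print(enzyme, find_sites(toy, motif))
# BbsI [2, 14]
# BsmBI []
```

For a design like the one described here, one would expect exactly two BbsI hits flanking the cassette to be excised and no BsmBI hits.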

Sequence of the backbone pYJA5 vector (SEQ ID NO. 001)

(see sequence protocol)

Sequence of 4-gRNA-pYJA5 (N20 indicates sgRNA sequence) (SEQ ID NO. 002):

(see sequence protocol)

Four sgRNA primer sequences, 5’-3’ (in sg1 primer Fwd, sg2 primer Fwd and sg3 primer Fwd, N20 is exactly the sgRNA sequence; in sg4 primer Rev, N20 is the reverse complement of the sgRNA sequence):
sg1 primer Fwd (SEQ ID NO. 003): ttgtggaaaggacgaaacaccGN20GTTTAAGAGCTAAGCTG
sg2 primer Fwd (SEQ ID NO. 004): cttggagaaaagccttgtttGN20GTTTGAGAGCTAAGCAGA
sg3 primer Fwd (SEQ ID NO. 005): gtatgagaccactctttcccGN20GTTTCAGAGCTAAGCACA
sg4 primer Rev (SEQ ID NO. 006): ATTTCTGCTGTAGCTCTGAAACN20Cgaggtacccaagcggc
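For an arrayed library, the four gene-specific primers have to be generated for every gene. The Python sketch below is one way to do this. It fills N20 in the three forward templates with the spacer and in the sg4 reverse template with the spacer's reverse complement, as the primer note specifies. The constant parts are copied from SEQ ID NOs. 003-006, keeping their printed upper/lower case. The function name and the example spacers are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


# (5' part, 3' part) of each primer from SEQ ID NOs. 003-006, with N20 removed.
FWD_PARTS = [
    ("ttgtggaaaggacgaaacaccG", "GTTTAAGAGCTAAGCTG"),   # sg1 primer Fwd
    ("cttggagaaaagccttgtttG", "GTTTGAGAGCTAAGCAGA"),   # sg2 primer Fwd
    ("gtatgagaccactctttcccG", "GTTTCAGAGCTAAGCACA"),   # sg3 primer Fwd
]
SG4_REV_PARTS = ("ATTTCTGCTGTAGCTCTGAAAC", "Cgaggtacccaagcggc")  # sg4 primer Rev


def build_primers(spacers):
    """Return [sg1 Fwd, sg2 Fwd, sg3 Fwd, sg4 Rev] (5'->3') for four 20-nt spacers."""
    if len(spacers) != 4:
        raise ValueError("expected exactly four spacers")
    spacers = [s.upper() for s in spacers]
    for s in spacers:
        if len(s) != 20 or set(s) - set("ACGT"):
            raise ValueError(f"not a 20-nt DNA spacer: {s!r}")
    # sg1-sg3: N20 is the spacer sequence itself.
    primers = [five + s + three for (five, three), s in zip(FWD_PARTS, spacers[:3])]
    # sg4 (reverse primer): N20 is the reverse complement of the spacer.
    five, three = SG4_REV_PARTS
    primers.append(five + reverse_complement(spacers[3]) + three)
    return primers


# Hypothetical spacers, for illustration only.
example = ["GACTTCCAGAACATGCTCAA", "TGGCAGATCCGTAAGACTGA",
           "CCTAGGTCAATCGGACTTAC", "AGCATTGCAGGTACCGATTC"]
for name, primer in zip(("sg1 Fwd", "sg2 Fwd", "sg3 Fwd", "sg4 Rev"), build_primers(example)):
    print(name, primer)
```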

Common primer sequences, 5’-3’: mU6 Rev (SEQ ID NO. 007): CAAACAAGGCTTTTCTCCAAGGG

M Rev (SEQ ID NO. 008): Cgggaaagagtggtctcataca

Constant template sequences

C1 sequence 5’-3’ (SEQ ID NO. 009):

(see sequence protocol)

M 5’-3’ (SEQ ID NO. 010):

(see sequence protocol)

C2s 5’-3’ (SEQ ID NO. 011):

(see sequence protocol)

Single gRNAs were cloned individually into the homemade pYJA4 vector via the previously established method (Koike-Yusa et al., 2014, Nat Biotechnol 32, 267-273).

In silico 4-gRNA library design

Pooling existing libraries

To provide a starting point for guide RNA selection, the inventors collected gRNAs from previously published and validated libraries and tools, each of which employed its own algorithm to select gRNAs with high predicted on-target efficacy. The inventors included the Calabrese (Sanson et al., 2018, Nat Commun 9, 5416) and hCRISPRa v2 (Horlbeck et al., 2016, Elife 5) libraries for CRISPRa, and the TKOv3 (Hart et al., 2017, G3 (Bethesda) 7, 2719-2727) and Brunello (Doench et al., 2016, Nat Biotechnol 34, 184-191; Sanson et al., ibid.) libraries for CRISPRo. The inventors complemented these source libraries with gRNAs from the CRISPick tool (formerly GPP sgRNA Designer) (Doench et al., 2016, ibid; Hanna and Doench, 2020, Nat Biotechnol 38, 813-823; Sanson et al., ibid), to ensure optimal coverage of difficult-to-target and newly annotated genes (the website was accessed in April 2020, following the update of 20 March 2020).

Gene definitions

Entrez gene identifiers were used to provide common gene definitions for gRNAs from all sources. If the source library did not provide Entrez identifiers, the official gene symbols were mapped to Entrez IDs, and the genomic location was used to disambiguate gene symbols when necessary. Genes that were not defined as protein-coding by NCBI or Ensembl were excluded (according to the annotation files ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz and ftp://ftp.ensembl.org/pub/release-99/tsv/homo_sapiens/Homo_sapiens.GRCh38.99.entrez.tsv.gz, both downloaded on 25 March 2020). The final libraries included 19839 protein-coding genes for CRISPRa, and 19819 for CRISPRo; the difference in gene counts arises from a small number of genes that are present for only one modality in the source libraries (for example, highly polymorphic genes related to adaptive immunity, such as the T Cell Receptor Alpha Locus (TRA) gene, are available for CRISPRa, but not CRISPRo).

TSS definitions

To ensure good coverage of alternative transcripts, and broad applicability of the CRISPRa library in multiple cell lines, the inventors adopted the alternative transcription start site (TSS) definitions from the hCRISPRa-v2 library (Horlbeck et al., 2016, ibid). The authors of this library used the FANTOM5 CAGE-Seq dataset (FANTOM Consortium and the RIKEN PMI and CLST (DGT) et al., 2014, Nature 507, 462-470), supplemented by Ensembl (Yates et al., 2020, Nucleic Acids Res. 48, D682-D688) transcript models, to define TSS positions; additional TSSs were targeted by their own set of gRNAs if the FANTOM5 scores indicated significant transcriptional activity, and if they were spaced more than one kilobase apart from the primary TSS. The inventors chose a separate set of four gRNAs for each TSS, treating multiple TSSs as if they were separate genes. To group gRNAs by TSS, the inventors mapped gRNAs from all sources (including the top five gRNAs from the GPP sgRNA Designer) to their genomic locations, and iterated through each gRNA, starting with the lowest genomic coordinate; a new TSS group was defined if the distance from one guide to the next exceeded 1000 base pairs. Additional TSSs were only targeted if a valid combination of four guides was available. Multiple TSSs were included for 2311 genes (using 4803 four-guide combinations), whereas a single TSS was targeted for the remaining 17528 genes.
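The TSS grouping rule amounts to a single pass over sorted coordinates. This is a minimal Python sketch, assuming that all gRNAs of one gene lie on the same chromosome and are represented by integer genomic coordinates. The function name is illustrative. A new group starts when the gap to the previous guide exceeds 1000 bp, so a gap of exactly 1000 bp stays within the same group.

```python
def group_by_tss(positions, max_gap=1000):
    """Split one gene's gRNA coordinates into putative TSS groups.

    Coordinates are visited from lowest to highest; a new group starts whenever
    the distance to the previous guide exceeds `max_gap` base pairs.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


# Hypothetical coordinates for one gene with two TSSs about 5 kb apart.
print(group_by_tss([5200, 100, 900, 350, 5000]))  # [[100, 350, 900], [5000, 5200]]
```

Each resulting group would then be treated as a separate target and kept only if a valid four-guide combination can be formed from it.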

Avoidance of genetic polymorphisms

For each gRNA, the inventors checked for overlaps with regions of frequent genetic polymorphism in human populations, in either the 20-nucleotide protospacer sequence or the two guanosine nucleotides of the protospacer adjacent motif (NGG). The inventors avoided gRNAs whose target region contained any genetic polymorphism with a frequency greater than 0.1%. Variant frequencies were derived from the Kaviar database (Glusman et al., 2011, Bioinforma. Oxf. Engl. 27, 3216-3217), which includes curated genomic data on single nucleotide variants, indels, and complex variants from over 77000 individuals (including over 13000 whole genomes). The dataset (only variants seen more than 3 times, version 160204-hg38) was downloaded on 7 August 2019. The polymorphism frequencies in the Kaviar database were generally similar to those from TOPMED, gnomAD, and the 1000 Genomes Project.
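The filter checks 22 reference positions per guide: the 20-nt protospacer plus the two G's of the NGG PAM. The ambiguous N is not checked. Below is a minimal Python sketch, which rests on several assumptions. Each variant is reduced to a single 0-based reference position with its frequency as a fraction, so indels and complex variants spanning several bases are simplified. `start` is taken to be the leftmost reference coordinate of the protospacer. Parsing of the Kaviar files is not shown.

```python
PROTOSPACER_LEN = 20
MAX_FREQ = 0.001  # 0.1%


def checked_positions(start: int, strand: str) -> set:
    """Reference positions of the protospacer plus the 'GG' of the NGG PAM."""
    protospacer = set(range(start, start + PROTOSPACER_LEN))
    if strand == "+":
        # + strand: 5'-[protospacer] N G G-3'; the GG follow the N.
        return protospacer | {start + PROTOSPACER_LEN + 1, start + PROTOSPACER_LEN + 2}
    if strand == "-":
        # - strand: on the reference the PAM reads C C N, immediately left of the protospacer.
        return protospacer | {start - 3, start - 2}
    raise ValueError("strand must be '+' or '-'")


def passes_polymorphism_filter(start, strand, variants, max_freq=MAX_FREQ):
    """False if any (position, frequency) variant in the checked region exceeds max_freq."""
    region = checked_positions(start, strand)
    return not any(pos in region and freq > max_freq for pos, freq in variants)


print(passes_polymorphism_filter(100, "+", [(120, 0.5)]))    # True: the N of the PAM is ignored
print(passes_polymorphism_filter(100, "+", [(121, 0.002)]))  # False: a G of the PAM, above 0.1%
```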

Specificity scores

In order to select a four-guide combination with minimal off-target effects, the inventors computed specificity scores for each gRNA from the source libraries. The inventors used the approach introduced by the authors of the GuideScan tool (Perez et al., 2017, Nat. Biotechnol. 35, 347-349): for each guide, potential off-target sites were weighted by their CFD (cutting frequency determination) scores (Doench et al., 2016, ibid), and the CFD scores were aggregated into a single score using the formula 1 / (1 + sum of the CFD scores of all off-target sites) (Hsu et al., 2013, Nat. Biotechnol. 31, 827-832). Because the pre-computed GuideScan Cas9 database does not contain all gRNAs (it excludes those with perfect-match or one-mismatch off-target sites in the reference genome), the inventors annotated gRNAs using both the GuideScan and CRISPOR (Concordet and Haeussler, 2018, Nucleic Acids Res. 46, W242-W245; Haeussler et al., 2016, Genome Biol. 17, 148) tools. Local installations of these tools were used, and the source code was downloaded in December 2020 (GuideScan version 2018-05-16, and CRISPOR version 4.97); the output of the local installations was confirmed to be identical to that of the web-based tools. When available, GuideScan specificity scores were used (considering up to three mismatches); otherwise, CRISPOR specificity scores were used (considering up to four mismatches). CRISPOR three-mismatch (3MM) and four-mismatch (4MM) specificity scores analogous to those from GuideScan were computed using the detailed output files listing each off-target site. GuideScan and CRISPOR specificity scores were highly correlated, but not identical, owing to slight differences in the number of off-target sites identified for the same sequence. When selecting gRNAs, the inventors avoided low-specificity guides with 3MM scores below 0.2; this cut-off was recently shown to have good predictive power for identifying gRNAs with significant off-target activity.
However, this criterion had to be relaxed in cases where all eligible gRNAs had specificity scores below 0.2, for example, when targeting genes present in multiple copies in the genome, or those belonging to large gene families with many closely related paralogs and pseudogenes. Finally, in order to choose among all eligible four-guide combinations, the inventors computed an aggregate specificity score, using the formula: 1 / (1 + sum of CFD scores from all four guides), and picked the combination with the highest score, indicating high predicted specificity.
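Both specificity formulas can be written as one-line Python functions, shown in the minimal sketch below. The inputs are lists of off-target CFD scores, which would in practice come from GuideScan or CRISPOR output, with the intended on-target site excluded. The aggregate score for a four-guide combination is written under an explicit assumption: the off-target CFD scores of all four guides are pooled into a single sum. The function names and example values are hypothetical.

```python
def specificity_score(offtarget_cfd):
    """1 / (1 + sum of the CFD scores of a guide's off-target sites)."""
    return 1.0 / (1.0 + sum(offtarget_cfd))


def aggregate_specificity(offtarget_cfd_per_guide):
    """Combination score; assumes the off-target CFD scores of all guides are pooled."""
    return 1.0 / (1.0 + sum(sum(cfd) for cfd in offtarget_cfd_per_guide))


MIN_3MM_SPECIFICITY = 0.2

guide = [0.5, 0.25, 0.25]  # hypothetical off-target CFD scores (up to 3 mismatches)
score = specificity_score(guide)
print(score, score >= MIN_3MM_SPECIFICITY)  # 0.5 True
```

A guide without any off-target site scores 1.0. The score falls to the 0.2 cut-off when the summed off-target CFD reaches 4.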

Guide RNA spacing

To allow all four gRNAs to bind without hindering one another, and thus to act synergistically, the inventors aimed to select four gRNAs whose “cut” locations were spaced at least 50 base pairs apart. However, for CRISPRa, target sequences should be located within a window of about 400 base pairs upstream of the TSS for optimal activity, which is reflected in the selection of gRNAs in the source libraries; thus, overlaps were unavoidable for some genes. For CRISPRo, on the other hand, overlaps were often inevitable when targeting genes with very short coding sequences. In those cases, the inventors nevertheless aimed to minimize the total number of overlaps between neighbouring guides. Furthermore, all four-guide combinations strictly adhered to another criterion: no two gRNAs were allowed to share identical sub-sequences of more than seven base pairs. This was done primarily to minimize recombination events between identical regions during Gibson assembly of the plasmid; however, it also enforced a minimal spacing of the four selected guides.
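The two spacing rules can be expressed as small Python helpers, as in the minimal sketch below. The shared-subsequence rule needs no full substring search: any identical stretch longer than 7 bp contains an identical 8-mer, so comparing 8-mer sets is sufficient. The sketch assumes the rule is applied to the 20-nt spacer sequences and that cut positions are integer coordinates. The function names are illustrative.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_long_subsequence(a: str, b: str, max_shared: int = 7) -> bool:
    """True if a and b share an identical substring longer than `max_shared` bp."""
    k = max_shared + 1  # any longer shared stretch contains a shared k-mer
    return bool(kmers(a.upper(), k) & kmers(b.upper(), k))


def combination_allowed(spacers) -> bool:
    """Hard criterion: no pair of guides shares a substring of more than 7 bp."""
    return not any(shares_long_subsequence(a, b) for a, b in combinations(spacers, 2))


def count_close_neighbours(cut_sites, min_spacing: int = 50) -> int:
    """Number of neighbouring guide pairs (ordered by cut position) < min_spacing apart."""
    cuts = sorted(cut_sites)
    return sum(1 for left, right in zip(cuts, cuts[1:]) if right - left < min_spacing)


print(count_close_neighbours([0, 40, 100, 130]))  # 2
```

The first check is a hard filter. The second returns a count that is to be minimized, which mirrors the way the two rules are treated in the text.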

Selection of four gRNAs
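The ranking procedure described in this section is a cascade of ordered criteria. Such a cascade maps naturally onto lexicographic comparison of tuples. The Python sketch below illustrates this and is not the inventors' code. The `Guide` fields are hypothetical, and the sub-checks of criterion 1 are assumed to be pre-computed into a single boolean. The aggregate score assumes that the off-target CFD scores of the four guides are pooled. If combinations remain tied on every criterion, the first one enumerated wins.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Guide:
    # Hypothetical per-guide annotations; field names are illustrative only.
    spacer: str                # 20-nt protospacer sequence
    cut: int                   # cut coordinate
    meets_minimal: bool        # criterion 1 (mapped NGG site, no SNP > 0.1%, 3MM >= 0.2, ...)
    perfect_matches: int       # criterion 2 (perfect-match sites in the genome)
    from_designer_tool: bool   # criterion 4 (GPP sgRNA Designer rather than a library)
    lower_tier: bool           # criterion 5 ("supplemental 5", or GPP rank outside top 10)
    offtarget_cfd: tuple = ()  # off-target CFD scores (criterion 6)


def shares_8mer(a: str, b: str) -> bool:
    seen = {a[i:i + 8] for i in range(len(a) - 7)}
    return any(b[i:i + 8] in seen for i in range(len(b) - 7))


def rank_key(combo):
    """Lexicographic key; smaller is better, so maximized terms are negated."""
    cuts = sorted(g.cut for g in combo)
    close_pairs = sum(1 for x, y in zip(cuts, cuts[1:]) if y - x < 50)
    aggregate = 1.0 / (1.0 + sum(sum(g.offtarget_cfd) for g in combo))
    return (
        -sum(g.meets_minimal for g in combo),         # 1) maximize
        -sum(g.perfect_matches == 1 for g in combo),  # 2) maximize
        close_pairs,                                  # 3) minimize
        sum(g.from_designer_tool for g in combo),     # 4) minimize
        sum(g.lower_tier for g in combo),             # 5) minimize
        -aggregate,                                   # 6) maximize
    )


def choose_four(guides):
    """Best-ranked valid four-guide combination, or None if none exists."""
    eligible = [g for g in guides if "TTTT" not in g.spacer.upper()]  # poly-T terminator
    valid = [
        combo for combo in combinations(eligible, 4)
        if not any(shares_8mer(a.spacer, b.spacer) for a, b in combinations(combo, 2))
    ]
    return min(valid, key=rank_key) if valid else None
```

Enumerating every combination is feasible because each gene or TSS typically has only a modest number of candidate guides. For n candidates there are n choose 4 combinations.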

After integration and annotation of gRNAs from the source libraries, the inventors selected the final combination of four gRNAs for each gene or TSS. First, gRNAs containing a stretch of four or more T nucleotides were excluded, since this sequence can induce termination of transcription. Next, all possible four-guide combinations for each gene were generated, and combinations that shared identical subsequences greater than seven base pairs in length were excluded. The potential combinations were then ranked using a list of criteria applied in order; if multiple combinations were tied in first place, the decision was made using the next criterion down the list. The criteria were as follows: 1) maximize the number of gRNAs (from zero to four) that fulfil certain minimal requirements, namely that the gRNA can be mapped to a defined genomic location in the reference genome with an N(GG) PAM, does not overlap frequent genetic polymorphisms (>0.1%), has a 3MM specificity score of at least 0.2, and, for CRISPRo only, conforms to the criteria of Graf et al. (Graf et al., 2019, Cell Rep. 26, 1098-1103.e3); 2) maximize the number of gRNAs with exactly one perfect-match location in the reference genome; 3) minimize the number of overlaps between two neighbouring gRNAs spaced fewer than 50 base pairs apart; 4) minimize the number of gRNAs derived from the GPP sgRNA Designer tool rather than from the previously published libraries; 5) for CRISPRa, minimize the number of gRNAs derived from the “supplemental 5” rather than the “top 5” gRNAs of the hCRISPRa-v2 library, and for CRISPRo, minimize the number of GPP-derived gRNAs ranked outside the top 10; and 6) maximize the aggregate specificity score of all four guides. The highest-ranked four-guide combination was chosen. Since the aggregate specificity score was the only quantitative criterion, it acted as a tiebreaker, and had the greatest impact on the choice of guides.

Sub-library allocation

To facilitate focussed screens of a subset of the genome, the inventors divided the entire set of protein-coding genes into mutually exclusive sub-libraries. Two of the inventors’ sub-libraries, Transcription Factors and Secretome, were based on recent publications that combined bioinformatics analyses with expert curation to arrive at a comprehensive list of genes in those categories (Lambert et al., 2018, Cell 172, 650-665; Uhlen et al., 2019, ibid). These lists were obtained from the publication’s supplemental data (for the secretome) or the authors’ website (for the transcription factors; humantfs.ccbr.utoronto.ca, database version 1.01). Ensembl gene IDs were translated to Entrez gene IDs, using HUGO gene symbols to disambiguate one-to-many mappings for a few genes. A third sub-library was based on a list of G-protein coupled receptors curated by the HUGO Gene Nomenclature Committee (HGNC) (Braschi et al., 2019, Nucleic Acids Res. 47, D786-D792) (https://www.genenames.org/cgi-bin/genegroup/download?id=139&type=branch, accessed on 11 March 2020). An additional seven thematic sub-libraries were adopted from the hCRISPRa-v2 library (Horlbeck et al., 2016, ibid): Membrane Proteins, Kinases/Phosphatases/Drug Targets, Mitochondria/Trafficking/Motility, Stress/Proteostasis, Cancer/Apoptosis, Gene Expression, and Unassigned. The first two of these thematic sub-libraries were updated to incorporate a small number of additional transmembrane receptors, transporters, kinases and phosphatases, using Gene Ontology terms (exported from BioMart (Smedley et al., 2009, BMC Genomics 10, 22) on 25 March 2020) and a list of membrane proteins provided by the Human Protein Atlas project (Uhlen et al., 2015, Science 347, 1260419) (https://www.proteinatlas.org/search/protein_class:Predicted+membrane+proteins, accessed on 11 March 2020).
If a gene belonged to multiple categories, it was assigned to the first sub-library (in the order in which they are listed in this section), and all remaining genes were added to the Unassigned sub-library.
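This first-match rule makes the sub-libraries mutually exclusive by construction. A minimal Python sketch is shown below. The priority list follows the order in which the sub-libraries are introduced in this section. Category memberships are assumed to be available as a dict from gene ID to a set of sub-library names. The gene IDs in the example are hypothetical.

```python
# Priority order in which the sub-libraries are listed in this section.
SUBLIBRARY_ORDER = [
    "Transcription Factors",
    "Secretome",
    "GPCRs",
    "Membrane Proteins",
    "Kinases/Phosphatases/Drug Targets",
    "Mitochondria/Trafficking/Motility",
    "Stress/Proteostasis",
    "Cancer/Apoptosis",
    "Gene Expression",
]


def allocate(genes, categories):
    """Assign each gene to its first matching sub-library, else to 'Unassigned'.

    categories: dict gene -> set of sub-library names the gene belongs to.
    """
    return {
        gene: next((name for name in SUBLIBRARY_ORDER if name in categories.get(gene, ())),
                   "Unassigned")
        for gene in genes
    }


print(allocate(["G1", "G2", "G3"], {"G1": {"GPCRs", "Secretome"}, "G2": {"Stress/Proteostasis"}}))
# {'G1': 'Secretome', 'G2': 'Stress/Proteostasis', 'G3': 'Unassigned'}
```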

Classification of unintended gene perturbations

Some gRNAs are expected to perturb other genes in addition to the intended target gene. In certain cases, this occurs at a different locus than the intended target site: for example, gene families of very close paralogs can often only be targeted with gRNAs that have multiple perfect-match binding sites in the genome. In most cases, however, this involves a single locus, the intended binding site, where a gRNA may perturb more than one gene. In the case of CRISPRa, the same promoter region is often shared by two genes located on opposite strands of the chromosome, so that their transcription start sites (TSSs) lie only a few hundred base pairs apart. In this case, guide RNAs that effectively activate one gene would inevitably also activate the other. As a guide for users of the library, and to aid the interpretation of hit genes, the inventors annotated gRNAs with a complete list of all genes they target. To summarize this phenomenon across the entire library, the inventors classified each gRNA as 1) only targeting the intended gene, 2) targeting unintended genes, but in a single location, or 3) targeting unintended genes at other locations. For this analysis, if two perfect-match gRNA binding sites had any target genes in common, they were considered to target unintended genes at the same location (which is especially relevant for gRNAs targeting the pseudo-autosomal region of chromosomes X and Y).
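The three-way classification can be sketched in Python as follows. Perfect-match binding sites are first merged into "locations" whenever they share any target gene, and the gRNA is then classified by the genes and locations that remain. This is a minimal sketch under stated assumptions: each binding site is represented by the set of gene IDs it targets, and sites that hit no annotated gene are ignored. The function and gene names are hypothetical.

```python
def merge_locations(site_genes):
    """Merge perfect-match binding sites that share any target gene into one location.

    site_genes: one iterable of target-gene IDs per binding site. Sites that hit
    no annotated gene are skipped (an assumption of this sketch).
    """
    locations = []
    for genes in site_genes:
        merged = set(genes)
        if not merged:
            continue
        overlapping = [loc for loc in locations if loc & merged]
        for loc in overlapping:
            locations.remove(loc)
            merged |= loc
        locations.append(merged)
    return locations


def classify_grna(intended_gene, site_genes):
    """1 = intended gene only; 2 = unintended genes, single location; 3 = other locations."""
    locations = merge_locations(site_genes)
    all_targets = set().union(*locations)
    if all_targets <= {intended_gene}:
        return 1
    if len(locations) == 1:
        return 2
    return 3


print(classify_grna("GENE_A", [{"GENE_A"}]))                # 1
print(classify_grna("GENE_A", [{"GENE_A", "GENE_B"}]))      # 2 (shared promoter region)
print(classify_grna("GENE_A", [{"GENE_A"}, {"GENE_C"}]))    # 3 (remote off-site target)
```

A gRNA hitting a pseudo-autosomal gene on both X and Y yields two sites with the same gene ID. These sites merge into one location, so the gRNA is not counted as having a remote off-site target.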

Annotation of unintended target genes

To annotate each gRNA with all its potential target genes, a database of TSS locations was constructed by merging the FANTOM5 dataset (lifted over to the hg38 genome (Abugessaisa et al., 2017, Sci. Data 4, 170107), version 3) with data from BioMart (Smedley et al., 2009, ibid) (exported on 25 March 2020), using Entrez gene IDs as a common identifier. Similarly, data on coding sequence (CDS) and exon locations were compiled from BioMart, the “TxDb.Hsapiens.UCSC.hg38.knownGene” Bioconductor package (version 3.10.0), and GENCODE (Frankish et al., 2019, Nucleic Acids Res. 47, D766-D773) annotation data (Release 33), and location data were merged using Entrez gene identifiers (if available) or Ensembl gene identifiers. Genes annotated as pseudogenes, or whose categorization was unclear, were excluded from further analysis. For CRISPRa, perfect-match sgRNA binding sites within a window of 1000 base pairs around TSSs were considered. For CRISPRo, sgRNA cut locations had to lie within the coding sequences (CDSs) of protein-coding genes, or within the exons of non-coding RNAs.

Annotation of predicted deletions

In the case of CRISPRo, when four gRNAs are active within the same cell, the multiple, closely spaced double-strand breaks commonly lead to the loss of a DNA segment between the gRNA cut locations. Thus, in addition to annotating individual gRNAs, the inventors also determined which genes are affected by the predicted deletion - the segment between the first and last cut site. The inventors also took deletions induced by (perfect-match) off-target binding sites into consideration. Because deletions may be less likely to occur if the cut sites are very far apart, the inventors imposed a maximum distance of one megabase between cut sites, so that multiple predicted deletions (or isolated cut positions) on the same chromosome were possible.
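The deletion prediction can be sketched in Python as clustering cut sites per chromosome. The clustering rule used here is an assumption, because the text leaves it open: a new predicted deletion starts whenever the gap to the previous cut site exceeds one megabase. Each cluster spans its first to its last cut, and a lone cut appears as a zero-length segment. Gene intervals are assumed to be inclusive (chrom, start, end) tuples. All names and coordinates are hypothetical.

```python
from collections import defaultdict

MAX_GAP = 1_000_000  # one megabase


def predicted_deletions(cut_sites, max_gap=MAX_GAP):
    """cut_sites: iterable of (chrom, position). Returns sorted (chrom, start, end) segments."""
    by_chrom = defaultdict(list)
    for chrom, pos in cut_sites:
        by_chrom[chrom].append(pos)
    segments = []
    for chrom, positions in sorted(by_chrom.items()):
        positions.sort()
        start = prev = positions[0]
        for pos in positions[1:]:
            if pos - prev > max_gap:  # too far apart: close the current segment
                segments.append((chrom, start, prev))
                start = pos
            prev = pos
        segments.append((chrom, start, prev))
    return segments


def affected_genes(segments, genes):
    """genes: dict name -> (chrom, start, end); names overlapping any segment."""
    return {
        name
        for chrom, s, e in segments
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start <= e and g_end >= s
    }


cuts = [("chr1", 100), ("chr1", 250), ("chr1", 400), ("chr1", 2_000_400), ("chr2", 10)]
print(predicted_deletions(cuts))
# [('chr1', 100, 400), ('chr1', 2000400, 2000400), ('chr2', 10, 10)]
```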

In silico comparison of CRISPR libraries

To compare the in silico characteristics of existing libraries and the 4sg library, the top four guides per gene were selected. Whereas the Brunello (Doench et al., 2016, ibid; Sanson et al., 2018, ibid) and TKOv3 (Hart et al., 2017, ibid) libraries were designed to contain four sgRNAs per gene, the Calabrese library (Sanson et al., 2018, ibid) was divided by the authors into Set A and Set B, each containing three sgRNAs per gene. To define the top four sgRNAs, the sgRNAs from Set A were supplemented with a randomly selected sgRNA from Set B. For the hCRISPRa v2 (Horlbeck et al., 2016, ibid) and GPP libraries, the four highest-ranked gRNAs were chosen (using the “Pick Order” column in the output from the GPP sgRNA designer tool). Since the libraries differed in the genes they covered, and since genes vary in the availability of potential gRNAs with high predicted activity and specificity, only genes present in all libraries were used for benchmarking. Furthermore, for genes for which the 4sg and hCRISPRa v2 libraries included more than one transcription start site (TSS), only the gRNAs targeting the main TSS were included; the main TSS was defined as the TSS with the highest score in the FANTOM5 dataset or, if data were unavailable for that gene, the most upstream TSS. To compare the expected number of gRNA binding sites affected by genetic polymorphisms, the frequencies of the most common polymorphism overlapping each sgRNA were summed across sgRNAs. This is a conservative estimate, since SNPs with frequencies below 0.1% were excluded. Furthermore, in the case of multiple single nucleotide polymorphisms (SNPs) overlapping with a gRNA, only the most frequent was considered. Because linkage disequilibrium between SNPs affecting the same sgRNA is highly likely, a precise estimation of the total probability of overlaps with polymorphisms would require access to the individual sequencing data underlying the SNP databases.
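The polymorphism summary used for the comparison is a short reduction over the guides, sketched here in Python. It assumes that each guide comes with a list of the frequencies, as fractions, of the variants overlapping its target region. Variants below 0.1% are dropped, and only the most frequent remaining variant of each guide contributes, which gives the conservative estimate described in the text. The function name and data are hypothetical.

```python
def summed_polymorphism_frequency(freqs_per_guide, min_freq=0.001):
    """Sum over guides of the frequency of the most common overlapping polymorphism.

    freqs_per_guide: one list of variant frequencies (fractions) per guide.
    Variants below `min_freq` (0.1%) are ignored.
    """
    total = 0.0
    for freqs in freqs_per_guide:
        kept = [f for f in freqs if f >= min_freq]
        if kept:
            total += max(kept)  # only the most frequent variant per guide counts
    return total


# Hypothetical frequencies for four guides: 0.2 + 0.01 contribute, 0.0005 is dropped.
print(round(summed_polymorphism_frequency([[0.05, 0.2], [0.0005], [], [0.01]]), 6))  # 0.21
```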

Software and code

The annotation and selection of gRNAs for the library design was performed using the R statistical programming environment (R Core Team, 2020), version 3.6.3, and the Bioconductor suite (Huber et al., 2015, Nat. Methods 12, 115-121), version 3.10. Source code is available upon request and will be made public upon final publication of the manuscript.

PacBio next-generation sequencing of libraries

Barcoding, amplification, and long-read sequencing

To assess the frequency of mutations, recombinations and deletions within the polyclonal population of 4sg plasmids, single-molecule long-read sequencing was performed. Plasmids were amplified by polymerase chain reaction (PCR) with barcoded primers that uniquely identified each targeted gene. In the inventors’ pilot sequencing run, this was achieved using a combination of 16 different forward primers and 24 different reverse primers (distinguishing the rows and columns of the 384-well plate, respectively). The amplified region was 2225 base pairs in length. It encompassed the entire 4sg expression cassette (containing all four promoters, guide RNAs and tracrRNA sequences, as well as the trimethoprim resistance element) and was flanked by two 10-bp paired barcode sequences. Amplicons from all wells were pooled, and single-molecule real-time (SMRT) sequencing data were generated on the PacBio Sequel instrument.
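With 16 forward and 24 reverse barcoded primers, each of the 16 × 24 = 384 wells receives a unique primer pair. The Python sketch below assumes that forward barcode i marks row i (A-P) and reverse barcode j marks column j (1-24). The actual index assignment is not given in the source, and the function names are illustrative.

```python
import string

ROWS = string.ascii_uppercase[:16]  # rows A-P -> forward barcodes 1-16
N_COLS = 24                         # columns 1-24 -> reverse barcodes 1-24


def well_to_barcodes(well):
    """Map a well name such as 'C07' to (forward_index, reverse_index)."""
    row, col = well[0].upper(), int(well[1:])
    if row not in ROWS or not 1 <= col <= N_COLS:
        raise ValueError(f"not a 384-well plate position: {well!r}")
    return ROWS.index(row) + 1, col


def barcodes_to_well(fwd, rev):
    """Inverse mapping, as used when assigning demultiplexed reads to wells."""
    return f"{ROWS[fwd - 1]}{rev:02d}"
```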

Processing of long-read sequencing data

Circular consensus sequencing (CCS) reads were generated with the PacBio SMRT Link software using default parameters. Only consensus reads with at least 5 full-pass subreads and an estimated read accuracy of at least 99.9% were retained. Barcode demultiplexing was also performed with SMRT Link. To minimize incorrect well assignments, reads with a barcode score below 60 or a score lead below 30 were omitted. Reads were further filtered using a custom script to ensure that each barcode met one of two criteria. Either the barcode sequence was present in full and entirely correct, or it was at least 8 bp long and flanked by an entirely correct 20-bp constant region (corresponding to the constant region of the primers used for PCR amplification). These additional steps proved necessary to retain only complete reads (containing both the forward and reverse primer sequences). They also excluded truncated reads whose terminal sequences were incorrectly interpreted as truncated barcodes. Finally, consensus reads with an average per-base Phred quality score below 85 were excluded (the highest achievable mean Phred quality score being 93). In the pilot sequencing run, 78,351 consensus reads remained after filtering, with an average of 204 reads per well (range 63-1098).
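The read-level filters above can be summarized as a single predicate. This Python sketch is illustrative only. The thresholds come from the text; the real pipeline used SMRT Link plus a custom script. The function names and argument layout are assumptions. So is the premise that truncation removes the outer end of a barcode, which means a partial barcode must match the end of the expected one.

```python
MIN_PASSES = 5
MIN_ACCURACY = 0.999
MIN_BC_SCORE = 60
MIN_BC_SCORE_LEAD = 30
MIN_PARTIAL_BC = 8
MIN_MEAN_PHRED = 85


def barcode_ok(observed_bc, expected_bc, observed_flank, expected_flank):
    """Full exact barcode, or >= 8 bp of it next to an exact 20-bp constant region."""
    if observed_bc == expected_bc:
        return True
    return (len(observed_bc) >= MIN_PARTIAL_BC
            and expected_bc.endswith(observed_bc)  # assumed truncation direction
            and observed_flank == expected_flank)


def keep_read(n_passes, accuracy, bc_score, bc_score_lead, barcodes_ok, phred):
    """Apply the CCS, demultiplexing, barcode and mean-quality filters in turn."""
    if n_passes < MIN_PASSES or accuracy < MIN_ACCURACY:
        return False
    if bc_score < MIN_BC_SCORE or bc_score_lead < MIN_BC_SCORE_LEAD:
        return False
    if not all(barcodes_ok):  # both barcodes of the read must pass
        return False
    return sum(phred) / len(phred) >= MIN_MEAN_PHRED
```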

Analysis of consensus reads

To quantify the percentage of correct guide RNA sequences and to identify contamination from other wells, each read was searched for the gRNA + tracrRNA sequences in the forward and reverse orientations, and all perfect matches were counted. To further characterize incorrect sequences, each consensus read was aligned to the corresponding barcoded reference sequence for its well. The alignment used the “pairwiseAlignment” function of the Biostrings R/Bioconductor package, version 2.54.0, with default parameters. The region corresponding to the gRNA + tracrRNA sequence of the reference was then extracted from the aligned read. Each sequence was classified as a) entirely correct, b) a contamination (if it perfectly matched a gRNA sequence from another well), c) a large deletion (if >50% of the aligned sequence consisted of gaps), or d) some other mutation.
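The four-way classification can be expressed as an ordered series of checks. This Python sketch assumes the region has already been extracted from a pairwise alignment (done with Biostrings in R in the source), with gaps written as '-'. The order of the checks and the input format are assumptions.

```python
def classify_region(aligned_region, expected, other_well_guides):
    """Classify the gRNA + tracrRNA region extracted from an aligned read.

    aligned_region: extracted read region, '-' marking alignment gaps.
    expected: reference gRNA + tracrRNA sequence of the well.
    other_well_guides: set of the corresponding sequences of other wells.
    """
    ungapped = aligned_region.replace("-", "")
    if ungapped == expected:
        return "correct"
    if ungapped in other_well_guides:
        return "contamination"
    if aligned_region.count("-") > 0.5 * len(aligned_region):
        return "large deletion"
    return "other mutation"
```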

APPEAL high-throughput generation of libraries

Oligo synthesis

Twenty-nucleotide sgRNA sequences were embedded in oligonucleotides with appended constant sequences. The oligonucleotides were synthesized in 384-well plates and purified by high-affinity purification (HAP) by Sangon Biotech (China). The oligonucleotide sequences are:

sgRNA1 (gRNA1 sequence, N20sg1): 5’-ttgtggaaaggacgaaacaccGN20sg1GTTTAAGAGCTAAGCTG-3’ (SEQ ID NO. 003)

sgRNA2 (gRNA2 sequence, N20sg2): 5’-cttggagaaaagccttgtttGN20sg2GTTTGAGAGCTAAGCAGA-3’ (SEQ ID NO. 004)

sgRNA3 (gRNA3 sequence, N20sg3): 5’-gtatgagaccactctttcccGN20sg3GTTTCAGAGCTAAGCACA-3’ (SEQ ID NO. 005)

sgRNA4 (reverse complement of the gRNA4 sequence, N20crsg4): 5’-ATTTCTGCTGTAGCTCTGAAACN20crsg4Cgaggtacccaagcggc-3’ (SEQ ID NO. 006)

The oligonucleotides were diluted with ultrapure water to a working concentration of 4 µM.
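Because the constant flanks are fixed, the four oligonucleotides for a gene can be generated from its four 20-nt spacers. The Python sketch below copies the flanks from SEQ ID NO. 003-006 and inserts the reverse complement of gRNA4, as the text specifies. The function names and the input validation are illustrative assumptions.

```python
# (name, 5' flank, 3' flank, insert reverse complement?) from SEQ ID NO. 003-006.
FLANKS = [
    ("sgRNA1", "ttgtggaaaggacgaaacaccG", "GTTTAAGAGCTAAGCTG", False),
    ("sgRNA2", "cttggagaaaagccttgtttG", "GTTTGAGAGCTAAGCAGA", False),
    ("sgRNA3", "gtatgagaccactctttcccG", "GTTTCAGAGCTAAGCACA", False),
    ("sgRNA4", "ATTTCTGCTGTAGCTCTGAAAC", "Cgaggtacccaagcggc", True),
]
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def build_oligos(spacers):
    """Return {oligo name: 5'->3' sequence} for a gene's four 20-nt spacers."""
    if len(spacers) != 4:
        raise ValueError("expected four spacers")
    oligos = {}
    for spacer, (name, left, right, use_rc) in zip(spacers, FLANKS):
        spacer = spacer.upper()
        if len(spacer) != 20 or set(spacer) - set("ACGT"):
            raise ValueError(f"invalid 20-nt spacer: {spacer!r}")
        core = reverse_complement(spacer) if use_rc else spacer
        oligos[name] = left + core + right
    return oligos
```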

Three-fragment polymerase chain reactions (PCRs)

PCRs were performed in 384-well plates in a total volume of 10 µL per well.

The C1 fragment (amplicon size 761 bp) PCR mix was prepared as follows (volume per well | reagent):

0.2 µL | C1 fragment (1 ng/µL)

0.2 µL | mU6 Rev primer (10 µM; common primer, sequence attached)

2 µL | 5X HF buffer

0.2 µL | dNTPs (10 mM)

0.1 µL | Phusion High-Fidelity DNA polymerase

6.8 µL | ddH2O

9.5 µL of the mix were aliquoted into each well of the 384-well plate, and 0.5 µL of sgRNA1 primer (4 µM) was added to each well and mixed.

The M fragment (amplicon size 360 bp) PCR mix was prepared as follows (volume per well | reagent):

0.2 µL | M fragment (1 ng/µL)

0.2 µL | M Rev primer (10 µM; common primer, sequence attached)

2 µL | 5X HF buffer

0.2 µL | dNTPs (10 mM)

0.1 µL | Phusion High-Fidelity DNA polymerase

6.8 µL | ddH2O

9.5 µL of the mix were aliquoted into each well of the 384-well plate, and then 0.5 µL of sgRNA2 primer (4 µM) was added to each well and mixed.

The C2s fragment (amplicon size 422 bp) PCR mix was prepared as follows (volume per well | reagent):

0.2 µL | C2s fragment (1 ng/µL)

2 µL | 5X HF buffer

0.2 µL | dNTPs (10 mM)

0.1 µL | Phusion High-Fidelity DNA polymerase

6.5 µL | ddH2O

9 µL of the mix were aliquoted into each well of the 384-well plate. Then 0.5 µL of sgRNA3 primer (4 µM) and 0.5 µL of sgRNA4 primer (4 µM) were added to each well and mixed.
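The three recipes can be checked and scaled programmatically. In this Python sketch the per-well volumes come from the text. The 10% overage used when scaling to a full 384-well plate is an assumption, not a value from the source. The check confirms that each mix plus its oligo primer(s) adds up to the 10 µL reaction. It also confirms that 0.5 µL of a 4 µM oligo gives 0.2 µM in the final reaction.

```python
# Per-well master-mix volumes in µL, copied from the recipes above.
MASTER_MIX_UL = {
    "C1": {"fragment": 0.2, "rev_primer": 0.2, "hf_buffer_5x": 2.0,
           "dntps_10mM": 0.2, "phusion": 0.1, "water": 6.8},
    "M": {"fragment": 0.2, "rev_primer": 0.2, "hf_buffer_5x": 2.0,
          "dntps_10mM": 0.2, "phusion": 0.1, "water": 6.8},
    "C2s": {"fragment": 0.2, "hf_buffer_5x": 2.0,
            "dntps_10mM": 0.2, "phusion": 0.1, "water": 6.5},
}
# Oligo primer volumes (4 µM) added per well after aliquoting the mix.
OLIGO_UL = {"C1": 0.5, "M": 0.5, "C2s": 0.5 + 0.5}
WELLS = 384


def reaction_volume(fragment):
    """Per-well mix volume plus added oligos; should equal 10 µL."""
    return sum(MASTER_MIX_UL[fragment].values()) + OLIGO_UL[fragment]


def plate_mix(fragment, overage=0.10):
    """Scale the per-well recipe to one 384-well plate plus an assumed overage."""
    factor = WELLS * (1 + overage)
    return {k: round(v * factor, 1) for k, v in MASTER_MIX_UL[fragment].items()}


def final_conc(stock, added_ul, total_ul=10.0):
    """C1*V1 = C2*V2: concentration after dilution into the reaction."""
    return stock * added_ul / total_ul
```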

The Integra ViaFlo 384-well pipetting system was used for all 384-well liquid handling. All PCR plates were sealed tightly, centrifuged at 2000 rpm for 2 minutes, and placed in thermocyclers with the following program: lid preheated to 99 °C; initial denaturation at 98 °C for 30 seconds; 36 cycles of 98 °C for 10 seconds, 60 °C for 30 seconds and 72 °C for 25 seconds; final extension at 72 °C for 5 minutes; cooldown to 20 °C. All PCR products were then diluted with 9 µL of ultrapure water for subsequent Gibson assembly. The success of the PCR on each plate was confirmed by agarose gel electrophoresis of several randomly selected samples from the plate.

Gibson assembly

The three PCR-amplified fragments were assembled into the pYJA5 vector by Gibson assembly in a 384-well plate, with the following reaction mix (volume | reagent):

2 µL | C1 amplified fragment (estimated around 16 ng/µL)

1 µL | M amplified fragment (estimated around 16 ng/µL)

1 µL | C2s amplified fragment (estimated around 20 ng/µL)

1 µL | BbsI-digested, purified pYJA5 vector, diluted to 120 ng/µL

5 µL | 2x homemade HiFi Gibson master mix

The mix was incubated in the thermocycler at 50 °C for 1 hour, and then used for transformation of competent cells or stored immediately at -20 °C.
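For a Gibson assembly, the molar ratio of the pieces is what matters. The Python sketch below converts the amounts in the reaction above into picomoles, using the common approximation of 650 g/mol per base pair. The pYJA5 vector length is not stated in this section, so it is left as a parameter, and any value supplied for it is hypothetical.

```python
AVG_BP_MASS = 650  # g/mol per base pair of dsDNA (common approximation)

# (volume µL, estimated concentration ng/µL, length bp) from the reaction mix.
INSERTS = {
    "C1": (2, 16, 761),
    "M": (1, 16, 360),
    "C2s": (1, 20, 422),
}


def pmol(ng, bp, bp_mass=AVG_BP_MASS):
    """Convert a dsDNA mass in ng to pmol."""
    return ng * 1000 / (bp * bp_mass)


def gibson_pmol(vector_ng, vector_bp):
    """pmol of each fragment and of the vector; vector_bp must be supplied."""
    amounts = {name: pmol(vol * conc, bp) for name, (vol, conc, bp) in INSERTS.items()}
    amounts["vector"] = pmol(vector_ng, vector_bp)
    return amounts
```

For any vector length, the call returns roughly 0.065 pmol for C1, 0.068 pmol for M and 0.073 pmol for C2s.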

Transformation and bacterial storage

Transformation was carried out in 96-well deep-well plates (2.3 mL, Axygen P-DW-20-C) in a cold room. 5 µL per well of Gibson mix from the 384-well plate was transferred into four 96-well plates and spun down to the bottom of each well. 50 µL per well of homemade competent cells (NEB stable competent cells) were dispensed and mixed twice with the Gibson mix. The plates were then kept on ice for 30 minutes. Heat shock was performed for 30 seconds at 42 °C by placing the plate in a water bath, after which the plates were placed back on ice for 5 minutes. 300 µL of homemade SOC medium (0.5% yeast extract, 2% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, 20 mM glucose) were then added to each well. The plates were incubated for 1 hour at 37 °C with shaking at 900 rpm in a thermo-shaker. Then, 900 µL per well of Terrific Broth (TB) medium (https://openwetware.org/wiki/Terrific_Broth) containing 15 µg/mL trimethoprim and 15 µg/mL tetracycline was added to the transformation mix. The plates were then incubated at 30 °C with shaking at 900 rpm for 40-48 hours.

Bacteria were then stored at -80 °C in 16.7% (v/v) glycerol (final concentration) in both 96-well plates (300 µL final storage volume) and 384-well plates (150 µL final storage volume).
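The 16.7% (v/v) final concentration corresponds to roughly one part of a 50% glycerol stock to two parts of culture. The stock concentration is an assumption, since the source gives only the final value. The Python sketch applies C1V1 = C2V2 to the two storage volumes; the function name is illustrative.

```python
def glycerol_stock_ul(final_ul, final_pct=16.7, stock_pct=50.0):
    """Glycerol stock and culture volumes (µL) giving final_pct in final_ul."""
    if not 0 < final_pct < stock_pct:
        raise ValueError("final concentration must be below the stock concentration")
    stock = final_ul * final_pct / stock_pct  # C1V1 = C2V2
    return stock, final_ul - stock
```

With these assumptions, a 300 µL 96-well stock needs about 100 µL of 50% glycerol, and a 150 µL 384-well stock needs about 50 µL.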

Magnetic-bead-based 96-well plasmid miniprep

Immediately before the bacteria were stored, 50 µl of the bacterial culture transformed with the Gibson assembly product was transferred into 1.2 mL of TB medium (containing 15 µg/mL trimethoprim and 15 µg/mL tetracycline, in a 96-well deep-well plate). The culture was grown at 30 °C with shaking at 900 rpm for 40-48 hours. The bacteria were then subjected to an in-house magnetic-bead-based plasmid miniprep procedure adapted from the canonical plasmid miniprep protocol (Birnboim and Doly, 1979). Briefly, the bacteria were pelleted at 4000 rpm for 10 min, resuspended in 200 µl of P1 buffer [50 mM glucose, 10 mM EDTA, 25 mM Tris (pH 8.0)] and lysed in 200 µl of P2 buffer [0.2 M NaOH, 1% SDS (w/v)]. The lysate was neutralized with 200 µl of P3 buffer (3 M KOAc, pH 6.0) and centrifuged at 4000 rpm for 10 min at 4 °C. Then 400 µl of the supernatant was transferred into a new deep-well plate, mixed with 1000 µl of cold absolute ethanol and centrifuged at 4000 rpm for 10 min at 4 °C. The supernatant was discarded, and the plasmid pellet was dissolved in 50 µl of ddH2O. Next, 75 µl of beads buffer [2.5 M NaCl, 10 mM Tris base, 1 mM EDTA, 3.36 mM HCl, 20% (w/v) PEG8000, 0.05% (w/v) Tween 20] was added to the plasmids. This was followed by 50 µl of SpeedBeads™ magnetic carboxylate-modified particles (GE Healthcare 65152105050250, 1:50 dilution in beads buffer). The mixture was then incubated for 5 min on a magnetic rack to separate the beads from the supernatant. The beads were then washed twice with 70% ethanol and dried in a water bath (65 °C). Plasmid DNA was eluted from the beads with 150 µl of sterile Tris-EDTA buffer [1 mM EDTA, 10 mM Tris-HCl (pH 8.0)] at 65 °C for 10 min and transferred to a new low-profile 96-well plate. To verify that the full cloning procedure was correct, plasmids from two wells of each 96-well plate were subjected to Sanger sequencing.

Cell culture, transfection, transduction, and flow cytometry

All cells were cultured at 37 °C and 5% CO2 in the appropriate growth medium. Transfection was performed with Lipofectamine 3000 (Thermo Fisher Scientific) at a cell density of 80-90%. Each well of a 24-well plate received 0.25 µg of gRNA plasmid and 0.25 µg of Cas9 or dCas9-VPR plasmid. For lentiviral transduction, a multiplicity of infection of ~1-2 was used. 3 days after infection, the cells were subjected to flow cytometry or to RNA extraction for real-time quantitative PCR. To validate gene knockout efficiency, cells were cultured for around 7 days under puromycin selection before genomic DNA extraction. Flow cytometry was performed on a BD FACSCanto II or an LSRFortessa™ Cell Analyzer at the core facility of the University of Zurich.

Real-time quantitative PCR

Total RNA from HEK293 cells or iNeurons was isolated with TRIzol Reagent (Thermo Fisher Scientific) according to the manufacturer’s instructions. 600 ng of RNA was reverse-transcribed into cDNA with the QuantiTect Reverse Transcription Kit (Qiagen). Real-time quantitative PCR was performed with SYBR Green (Roche) according to the manufacturer’s instructions, using the primer sets listed below for each gene. GAPDH, ACTB and HMBS were used as internal controls.
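The source does not state how relative expression was computed from the Ct values. This sketch assumes a common choice, the 2^-ΔΔCt method, with the three internal controls combined by averaging their Ct values. Averaging Ct values is equivalent to taking the geometric mean of the reference genes' relative quantities. A minimal Python sketch, with illustrative function names:

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of the reference genes (e.g. GAPDH, ACTB, HMBS)."""
    return target_ct - mean(reference_cts)


def fold_change(target_ct, reference_cts, ctrl_target_ct, ctrl_reference_cts):
    """Relative expression by the 2^-ddCt method, assuming ~100% PCR efficiency."""
    ddct = delta_ct(target_ct, reference_cts) - delta_ct(ctrl_target_ct, ctrl_reference_cts)
    return 2 ** (-ddct)
```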

Gene | Forward primer sequence 5'-3' | Reverse primer sequence 5'-3'

ACTB | CATGTACGTTGCTATCCAGGC (SEQ ID NO. 012) | CTCCTTAATGTCACGCACGAT (SEQ ID NO. 013)

CXCR4 | ACTACACCGAGGAAATGGGCT (SEQ ID NO. 014) | CCCACAATGCCAGTTAAGAAGA (SEQ ID NO. 015)

NEUROD1 | GGATGACGATCAAAAGCCCAA (SEQ ID NO. 016) | GCGTCTTAGAATAGCAAGGCA (SEQ ID NO. 017)

LINC00925 | AATTGTCCTGTGAAGTGAAG (SEQ ID NO. 018) | TTCCTCTGTCTCCATTGTCA (SEQ ID NO. 019)

TINCR | GGTCTGGGCTCCCAGGTGGA (SEQ ID NO. 020) | TGTCAGGGACTGGGGCTCC (SEQ ID NO. 021)

POU5F1 | TATTTGGGAAGGTATTCAGC (SEQ ID NO. 022) | CTTACACATGTTCTTGAAGC (SEQ ID NO. 023)

KLF4 | CTGGCGGGAGGAGCTCTCC (SEQ ID NO. 024) | CGGCTCCGCCGCTCTCCA (SEQ ID NO. 025)

LIN28A | GGGATGGATATATGAAGTAAGG (SEQ ID NO. 026) | TAGCTACCATGACACTATTAAT (SEQ ID NO. 027)

IL1R2 | GCTGCCAGAAGCTGCCG (SEQ ID NO. 028) | CTCAGGGCTACAGGCTCCC (SEQ ID NO. 029)

EGFR | GTTTGCCAAGGCACGAGTAA (SEQ ID NO. 030) | GAGAAAATGATCTTCAAAAGTGCCC (SEQ ID NO. 031)

HBG1 | AATGTGGAAGATGCTGGAGG (SEQ ID NO. 032) | GCCAAAGCTGTCAAAGAACC (SEQ ID NO. 033)

MYOD1 | GCTCCAACTGCTCCGAC (SEQ ID NO. 034) | TGCTGGACAGGCAGTCTA (SEQ ID NO. 035)

ZFP42 | CAGGTGTTTGCTGAAGACAG (SEQ ID NO. 036) | GTTGCTGGCTCATGTTTTCC (SEQ ID NO. 037)

LINC00028 | CACTCCCACACCCCAAC (SEQ ID NO. 038) | TGCCAACTTTCCACAGCTAG (SEQ ID NO. 039)

LINC00514 | AGAAGTGGTTTGGGCGC (SEQ ID NO. 040) | CCGTTTCCATTGTTGTATCCTG (SEQ ID NO. 041)

ASCL1 | CGCGGCCAACAAGAAGATG (SEQ ID NO. 042) | CGACGAGTAGGATGAGACCG (SEQ ID NO. 043)

GAPDH | ATGATCTTGAGGCTGTTG (SEQ ID NO. 044) | CTCAGACACCATGGGGAA (SEQ ID NO. 045)

AGER | TGGATGAAGGATGGTGTGCC (SEQ ID NO. 046) | CACAGCTGTAGGTTCCCTGG (SEQ ID NO. 047)

APOE | CTGCTCAGCTCCCAGGTC (SEQ ID NO. 048) | TTGTTCCTCCAGTTCCGATT (SEQ ID NO. 049)

F2R | CCGCAGGCCAGAATCAAAAG (SEQ ID NO. 050) | ACAAAGAGTGTCAGCCAGGAG (SEQ ID NO. 051)

HES7 CCCCAAGATGCTCAAGCCGGGTTCCGGAGGTTCTGGTC

(SEQ ID NO. 052) (SEQ ID NO. 053)

LPAR4 | GGCGGTATTTCAGCCTCTTT (SEQ ID NO. 054) | AGCAGGTGGTGGTTGCATTG (SEQ ID NO. 055)

ARG1 | TCCCGATGTGCCAGGATTCT (SEQ ID NO. 056) | ACGTCTCTCAAGCCAATATA (SEQ ID NO. 057)

APP | CTCGTCACGTGTTCAATATG (SEQ ID NO. 058) | GGGTGTGCTGTCTGTCCTTC (SEQ ID NO. 059)

HMBS | GAAGGATGGGCAACTGTACC (SEQ ID NO. 060) | ATGGTAGCCTGCATGGTCTC (SEQ ID NO. 061)

PRNP | GTGCACGACTGCGTCAAT (SEQ ID NO. 062) | CCTTCCTCATCCCACTATCAGG (SEQ ID NO. 063)

Surveyor detection and quantification of gene editing efficiency

HEK293 cells were seeded in 24-well plates at a density of 4.0 × 10^5 cells per well. 24 hours later, cells at ~90% confluency were co-transfected with lentiCas9-Blast (Addgene, 52962, 250 ng per well) and sgRNA plasmids (250 ng per well). Transfection used Lipofectamine 3000 transfection reagent (Thermo Fisher Scientific, L3000015) according to the manufacturer’s instructions. 72 hours after transfection, cells were selected with puromycin (1 µg/ml) for 3 days. They were then harvested for genomic DNA isolation using the DNeasy Blood & Tissue Kit (Qiagen, 69506). Genomic DNA was used as the template for PCR amplification of the targeted genomic region using Phusion High-Fidelity DNA polymerase (New England Biolabs, M0530S). Each 50 µl PCR reaction contained approximately 150 ng genomic DNA, 0.5 µl Phusion DNA polymerase, 5 µM forward/reverse primers, 10 mM dNTPs and 10 µl 5× Phusion HF buffer. Reactions were run with the following program: initial denaturation at 98 °C for 30 seconds; 37 cycles of 98 °C for 10 seconds, 60 °C for 30 seconds and 72 °C for 30 seconds per kb; final extension at 72 °C for 10 min.

For samples transfected with 4 sgRNAs, prominent mutant DNA bands were visible in the gel image of the PCR amplicons after electrophoresis. Band intensities were quantified using ImageJ software (NIH). Gene editing efficiency was defined as the integrated intensity of the mutant DNA bands as a percentage of that of all DNA bands.

For samples transfected with 1 sgRNA, gene editing efficiency was determined using the Surveyor mutation detection kit (Integrated DNA Technologies, 706020). Briefly, to generate heteroduplex DNA fragments, two sets of PCR amplicons described above were mixed in equal proportions (4 µl per sample in a 10 µl reaction volume): those from the 1-sgRNA sample and those from the corresponding non-targeting control sgRNA sample. The mixture was annealed under the following temperature conditions: 95 °C for 10 min; 95 °C to 85 °C at 3 °C/sec; 85 °C for 1 min; 85 °C to 75 °C at 0.3 °C/sec; 75 °C for 1 min; 75 °C to 65 °C at 0.3 °C/sec; 65 °C for 1 min; 65 °C to 55 °C at 0.3 °C/sec; 55 °C for 1 min; 55 °C to 45 °C at 0.3 °C/sec; 45 °C for 1 min; 45 °C to 35 °C at 0.3 °C/sec; 35 °C for 1 min; 35 °C to 25 °C at 0.3 °C/sec; and 25 °C for 1 min. The heteroduplex DNA was then treated at 42 °C for 30 minutes with the MgCl2 (1 µl), Surveyor enhancer (1 µl) and Surveyor nuclease (4 µl) provided in the kit. It was then separated on a 2% agarose gel. The percentage of gene editing was calculated with the formula (1 - (1 - b/(a + b))^0.5) × 100, where “a” is the intensity of the uncleaved band and “b” is the integrated intensity of all cleavage bands.
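Both quantification rules in this section reduce to short formulas on band intensities measured in ImageJ. A minimal Python sketch that implements the two formulas exactly as stated; the function names are illustrative.

```python
from math import sqrt


def band_efficiency(mutant_intensity, total_intensity):
    """4-sgRNA samples: mutant band intensity as a percentage of all bands."""
    return 100 * mutant_intensity / total_intensity


def surveyor_efficiency(a, b):
    """1-sgRNA samples: (1 - (1 - b/(a + b))**0.5) * 100.

    a: intensity of the uncleaved band; b: summed intensity of all cleavage bands.
    """
    fraction_cleaved = b / (a + b)
    return (1 - sqrt(1 - fraction_cleaved)) * 100
```

For example, an uncleaved band of 75 and cleavage bands totalling 25 (arbitrary units) give about 13.4% editing.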

To quantify gene editing efficiency more precisely, PCR amplicons from both 1-sgRNA and 4-sgRNA genetic perturbations were additionally subjected to next-generation sequencing (NGS). First, barcodes were appended to the primers used for the PCR amplification described above. Four forward barcodes were designed, corresponding to four biological replicates. Six reverse barcodes were designed, corresponding to six genetic perturbations: non-targeting sgRNA, sgRNA1, sgRNA2, sgRNA3, sgRNA4 and 4-sgRNA. Second, the barcoded primers and the genomic DNA described above were used for PCR amplification. The PCR products were then purified by gel extraction using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel, 40609.250). Third, the PCR amplicons were pooled to a total of 614 ng. The pool contained equimolar amounts of the non-targeting sgRNA samples across all genes analysed and equal amounts of all genetic perturbations for each gene.
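Four forward and six reverse barcodes give 4 × 6 = 24 combinations per gene, one for each replicate-perturbation pair. Only the counts and the perturbation names come from the text. In this Python sketch, the order in which barcodes are assigned to replicates and perturbations is a hypothetical choice.

```python
from itertools import product

REPLICATES = [1, 2, 3, 4]  # forward barcodes F1-F4 (assumed order)
PERTURBATIONS = ["non-targeting", "sgRNA1", "sgRNA2", "sgRNA3", "sgRNA4", "4-sgRNA"]  # reverse R1-R6


def barcode_design():
    """Map each (forward, reverse) barcode index pair to (replicate, perturbation)."""
    return {
        (f, r): (REPLICATES[f - 1], PERTURBATIONS[r - 1])
        for f, r in product(range(1, len(REPLICATES) + 1),
                            range(1, len(PERTURBATIONS) + 1))
    }
```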

Lentiviral packaging - general

In short, HEK293T cells were grown to 80-90% confluency in DMEM + 10% FBS on poly-D-lysine-coated 24-well plates. For lentivirus production, they were transfected using Lipofectamine 3000 with three plasmids (transfer plasmid, PAX-2 and VSV-G; ratio 5:3:2). After 6 hours or overnight incubation, the medium was changed to virus harvesting medium (DMEM + 10% FBS + 1% BSA). The supernatant containing the lentiviral particles was harvested 48-72 hours after the change to virus harvesting medium. Suspended cells and cellular debris were pelleted by centrifugation at 1500 rpm for 5 min. The cleared supernatant was then titrated and stored at -80 °C.
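Once a total DNA mass per well is chosen, the 5:3:2 ratio gives the amount of each plasmid. The Python sketch below assumes this is a mass ratio and takes the total as an input. The source specifies neither the ratio type nor a total mass for this general protocol.

```python
def split_by_ratio(total_ng, ratio=(5, 3, 2)):
    """Split a total DNA mass (ng) between transfer plasmid, PAX-2 and VSV-G."""
    parts = sum(ratio)
    return tuple(total_ng * r / parts for r in ratio)
```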

For titration of the lentiviral particles, equal numbers of HEK293T cells were grown in 24-well plates and infected by adding small volumes (V) of the viral supernatant described above (e.g. 3 µL). A representative batch of cells was used to determine the cell number at the time of infection (N). 72 hours after infection, the cells were harvested and analysed by flow cytometry to quantify the fraction of infected (BFP-positive) cells. The fraction of positive cells (P, expressed as a proportion) was then used to calculate the titre (T) of the virus according to the formula T = (P × N)/V.
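A worked version of the titre formula, as a minimal Python sketch. P is entered as a fraction, so T is in transducing units per µL of supernatant. The example numbers in the test are hypothetical. Because one cell can be infected more than once, this formula is generally considered most reliable when only a small fraction of cells is BFP positive.

```python
def titre(fraction_positive, cells_at_infection, volume_ul):
    """Transducing units per µL: T = (P * N) / V, with P as a fraction (0-1)."""
    if not 0 <= fraction_positive <= 1:
        raise ValueError("P must be a fraction between 0 and 1")
    return fraction_positive * cells_at_infection / volume_ul
```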

Lentiviral packaging - detailed

Generation of a monoclonal HEK293T cell line: HEK293T cells at low passage number (before passage 8) were serially diluted to obtain ~10 single cells per 10-cm Petri dish; six dishes were prepared. 10 days later, 24 monoclonal lines with clear borders, separate from nearby clones, were picked and expanded individually in 6-well plates. After reaching 80% confluency, each clonal line was frozen at -80 °C as a seed stock. All 24 lines were then tested in a 24-well plate for their capacity to produce lentiviruses. Each was transfected with 400 ng of 4sgRNA NT plasmid, 225 ng of PAX-2 (Addgene #12260) and 150 ng of VSV-G (Addgene #8454). After several rounds of validation, clone 5 showed the best performance of all the clones (see results, Fig. 14).

Protocol for 384-well high-titer lentiviral production (see also Fig. 16):

Maintenance of cell culture | HEK293T clone 5 cells are maintained at no more than 80% confluency, up to a maximum of passage 20. Two T150 flasks are needed to seed cells for lentiviral production.

Cell seeding in 384-well cell culture plates | 7.5-8 × 10^3 cells are seeded per well of a 384-well cell culture plate in 50 µl of DMEM + 10% FBS. Seeding uses the Integra ViaFlo with 384-channel 125 µl tips (ViaFlo settings: 32 mm; speed up: 8; speed down: 3). Seeding 10 plates requires 280 ml of cell suspension at this density (mixed in a reservoir), taking the dead volume into account.
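The seeding numbers above can be cross-checked. 7.5-8 × 10^3 cells in 50 µl corresponds to 1.5-1.6 × 10^5 cells/ml. Ten plates need at least 10 × 384 × 50 µl = 192 ml before dead volume, compared with the 280 ml stated. The minimal Python sketch below reproduces this arithmetic; the function name is illustrative.

```python
WELLS_PER_PLATE = 384
WELL_VOLUME_UL = 50


def seeding_plan(cells_per_well, plates, total_ml=None):
    """Return (cells/mL, minimum suspension volume in mL, total cells needed)."""
    density_per_ml = cells_per_well / WELL_VOLUME_UL * 1000
    min_ml = plates * WELLS_PER_PLATE * WELL_VOLUME_UL / 1000
    volume_ml = total_ml if total_ml is not None else min_ml
    return density_per_ml, min_ml, density_per_ml * volume_ml
```

For 8 × 10^3 cells per well and 280 ml of suspension, `seeding_plan(8000, 10, 280)` gives 1.6 × 10^5 cells/ml, a 192 ml minimum and about 4.5 × 10^7 cells in total.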

Virus packaging | 24 hours after cell seeding, the HEK293T clone 5 cells reach 80-90% confluency and are ready for transfection, the core step of the protocol. The transfection reagents for each 384-well plasmid-set plate are prepared in a larger volume to allow triplicate transfection.

Reagents are prepared as follows for three 384-well plates of plasmid sets:

The transfection step is time-sensitive (it should be completed within 1 hour), and reagents should be measured precisely.

Transfection: Prepare Tube 1 and dispense 38 µl/well into a 384-deep-well plate using a multichannel pipette (Figure 16: 1a; plate with mix 1); dispense 10.1 µl/well from the plate with mix 1 into each of three 384-well PCR plates with the ViaFlo (12.5 µl tips); prepare Tube 2 and dispense 46 µl/well into a 384-deep-well plate using a multichannel pipette (Figure 16: 2a-b); in parallel, add 72 ng (2.4 µl of a 30 ng/µl dilution) of transfer plasmid (from the library in 384-well format) to the plate with the mix from Tube 1 and mix for 10 cycles (Figure 16: 3; speed up/down: 8); incubate for 5 minutes at room temperature;

Add 12.5 µl/well from the plate with the Tube 2 mix to the plates with the Tube 1 mix to obtain 25 µl/well of transfection mix, and mix for 20 cycles (Figure 16: 4; speed up/down: 8); keep these tips in their box, as they are reused for the transfection step; incubate for 20 minutes at room temperature;

During the incubation period, remove 36 µl of medium from the cell culture plate (ViaFlo setting: 31.5 mm; speed up: 3); centrifuge the plates with transfection mix before use (1500 rpm, 2 minutes); transfect the cells by adding 8 µl of transfection mix (ViaFlo setting: 16 mm; speed down: 3) to each replicate cell plate, for a total of 3 replicates of each plasmid set; incubate at 37 °C in the virus incubator for 6-12 hours.
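The volumes in these transfection steps can be checked for consistency. All volumes below are taken from the text. This Python sketch assumes that each well of each PCR plate supplies the 8 µl needed for three replicate cell plates, which is one reading of the steps. It confirms three points: 10.1 + 2.4 + 12.5 µl gives the stated 25 µl of transfection mix; 2.4 µl of a 30 ng/µl dilution gives 72 ng of transfer plasmid; and every dispensing step leaves some dead volume.

```python
TUBE1_PER_WELL = 38.0      # µL per well in the Tube 1 deep-well plate
TUBE2_PER_WELL = 46.0      # µL per well in the Tube 2 deep-well plate
MIX1_TO_PCR = 10.1         # µL Tube 1 mix per well of each PCR plate
PLASMID_UL = 2.4           # µL transfer plasmid per well
PLASMID_NG_PER_UL = 30.0   # ng/µL plasmid dilution
MIX2_TO_PCR = 12.5         # µL Tube 2 mix per well of each PCR plate
PCR_PLATES = 3
MIX_PER_CELL_PLATE = 8.0   # µL transfection mix per well of cells
REPLICATES = 3             # assumed to be served from one PCR-plate well


def transfection_budget():
    """Per-well volumes (µL) and plasmid mass (ng) implied by the steps."""
    mix_per_pcr_well = MIX1_TO_PCR + PLASMID_UL + MIX2_TO_PCR
    return {
        "transfection_mix_per_pcr_well": mix_per_pcr_well,
        "plasmid_ng": PLASMID_UL * PLASMID_NG_PER_UL,
        "tube1_spare": TUBE1_PER_WELL - PCR_PLATES * MIX1_TO_PCR,
        "tube2_spare": TUBE2_PER_WELL - PCR_PLATES * MIX2_TO_PCR,
        "mix_spare": mix_per_pcr_well - REPLICATES * MIX_PER_CELL_PLATE,
    }
```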

Addition of virus harvesting medium | 6 hours after transfection, or on the morning of the following day, virus harvesting medium is added to each well of transfected cells without removing the transfection mix. Briefly, the virus harvesting medium is warmed to 37 °C for 30 min and poured into a reservoir immediately before use. Add 44 µl of virus harvesting medium to the transfected cells (ViaFlo settings: 37 mm; speed up: 8; speed down: 2). Keep the tips close to the well wall while dispensing so as not to detach any cells.

Virus collection | 48-60 hours after addition of the virus harvesting medium, collect 45 µl of virus-containing supernatant from each replicate transfected plate and pool the three replicates of the same plasmid set in a 384-deep-well plate; mix for 20 cycles (ViaFlo setting: speed up/down 8); centrifuge at 1500 rpm for 2 min to pellet cells; aliquot 15 µl of virus supernatant into each of six PCR plates and 20 µl into another PCR plate for making the pooled library. The PCR plates containing virus are tightly sealed with aluminium covers and stored at -80 °C. The remaining virus in the deep-well plate is stored and used to transduce wild-type HEK293T cells. This is followed by lentiviral titration in 384-well plates with a high-throughput 384-well flow cytometer (Figure 15).
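A volume budget for one well shows how much supernatant is available at each stage. It uses the volumes given in the seeding, transfection, medium-addition and collection steps, which serve as the defaults in this Python sketch. Evaporation and the volume of the cell layer are ignored, which is an assumption.

```python
def well_volume_ul(seeded=50, removed=36, transfection_mix=8, harvest_medium=44):
    """Volume in each well of cells just before virus collection (µL)."""
    return seeded - removed + transfection_mix + harvest_medium


def pooled_virus_budget(collected_per_replicate=45, replicates=3,
                        aliquot_ul=15, aliquot_plates=6, library_ul=20):
    """(pooled, aliquoted, remaining) volumes per well of the pooled plate (µL)."""
    pooled = collected_per_replicate * replicates
    aliquoted = aliquot_ul * aliquot_plates + library_ul
    return pooled, aliquoted, pooled - aliquoted
```

With the stated volumes, each well holds 66 µl at harvest, of which 45 µl is collected. Pooling three replicates gives 135 µl. After the 6 × 15 µl aliquots and the 20 µl library aliquot, about 25 µl per well remains for titration.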

Appendix: Medium Section

Cell Culture Medium: DMEM + 10% FBS

Virus Harvesting Medium: remove 66 ml from a 500 ml bottle of DMEM, then add 59 ml of FBS, 6.82 g of BSA powder and 6.8 ml of penicillin/streptomycin.

References (Materials and Methods section)

Birnboim, H.C., and Doly, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7, 1513-1523.

Koike-Yusa, H., Li, Y., Tan, E.P., Velasco-Herrera Mdel, C., and Yusa, K. (2014). Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267-273.

Metzakopian, E., Strong, A., Iyer, V., Hodgkins, A., Tzelepis, K., Antunes, L., Friedrich, M.J., Kang, Q., Davidson, T., Lamberth, J., et al. (2017). Enhancing the genome editing toolbox: genome wide CRISPR arrayed libraries. Sci. Rep. 7, 2244.

Abugessaisa, I., Noguchi, S., Hasegawa, A., Harshbarger, J., Kondo, A., Lizio, M., Severin, J., Carninci, P., Kawaji, H., and Kasukawa, T. (2017). FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies. Sci. Data 4, 170107.

Braschi, B., Denny, P., Gray, K., Jones, T., Seal, R., Tweedie, S., Yates, B., and Bruford, E. (2019). Genenames.org: the HGNC and VGNC resources in 2019. Nucleic Acids Res. 47, D786-D792.

Concordet, J.-P., and Haeussler, M. (2018). CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242-W245.

Doench, J.G., Fusi, N., Sullender, M., Hegde, M., Vaimberg, E.W., Donovan, K.F., Smith, I., Tothova, Z., Wilen, C., Orchard, R., et al. (2016). Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184-191.

FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest, A.R.R., Kawaji, H., Rehli, M., Baillie, J.K., de Hoon, M.J.L., Haberle, V., Lassmann, T., Kulakovskiy, I.V., Lizio, M., et al. (2014). A promoter-level mammalian expression atlas. Nature 507, 462-470.

Frankish, A., Diekhans, M., Ferreira, A.-M., Johnson, R., Jungreis, I., Loveland, J., Mudge, J.M., Sisu, C., Wright, J., Armstrong, J., et al. (2019). GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766-D773.

Gilbert, L.A., Horlbeck, M.A., Adamson, B., Villalta, J.E., Chen, Y., Whitehead, E.H., Guimaraes, C., Panning, B., Ploegh, H.L., Bassik, M.C., et al. (2014). Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647-661.

Glusman, G., Caballero, J., Mauldin, D.E., Hood, L., and Roach, J.C. (2011). Kaviar: an accessible system for testing SNV novelty. Bioinforma. Oxf. Engl. 27, 3216-3217.

Graf, R., Li, X., Chu, V.T., and Rajewsky, K. (2019). sgRNA Sequence Motifs Blocking Efficient CRISPR/Cas9-Mediated Gene Editing. Cell Rep. 26, 1098-1103.e3.

Haeussler, M., Schonig, K., Eckert, H., Eschstruth, A., Mianne, J., Renaud, J.-B., Schneider-Maunoury, S., Shkumatava, A., Teboul, L., Kent, J., et al. (2016). Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol. 17, 148.

Hanna, R.E., and Doench, J.G. (2020). Design and analysis of CRISPR-Cas experiments. Nat. Biotechnol.

Hart, T., Tong, A.H.Y., Chan, K., Van Leeuwen, J., Seetharaman, A., Aregger, M., Chandrashekhar, M., Hustedt, N., Seth, S., Noonan, A., et al. (2017). Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens. G3 Bethesda Md 7, 2719-2727.

Horlbeck, M.A., Gilbert, L.A., Villalta, J.E., Adamson, B., Pak, R.A., Chen, Y., Fields, A.P., Park, C.Y., Corn, J.E., Kampmann, M., et al. (2016). Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation. eLife 5.

Hsu, P.D., Scott, D.A., Weinstein, J.A., Ran, F.A., Konermann, S., Agarwala, V., Li, Y., Fine, E.J., Wu, X., Shalem, O., et al. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827-832.

Huber, W., Carey, V.J., Gentleman, R., Anders, S., Carlson, M., Carvalho, B.S., Bravo, H.C., Davis, S., Gatto, L., Girke, T., et al. (2015). Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115-121.

Lambert, S.A., Jolma, A., Campitelli, L.F., Das, P.K., Yin, Y., Albu, M., Chen, X., Taipale, J., Hughes, T.R., and Weirauch, M.T. (2018). The Human Transcription Factors. Cell 172, 650-665.

Perez, A.R., Pritykin, Y., Vidigal, J.A., Chhangawala, S., Zamparo, L., Leslie, C.S., and Ventura, A. (2017). GuideScan software for improved single and paired CRISPR guide RNA design. Nat. Biotechnol. 35, 347-349.

R Core Team (2020). R: A Language and Environment for Statistical Computing (Vienna, Austria: R Foundation for Statistical Computing).

Sanson, K.R., Hanna, R.E., Hegde, M., Donovan, K.F., Strand, C., Sullender, M.E., Vaimberg, E.W., Goodale, A., Root, D.E., Piccioni, F., et al. (2018). Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities. Nat. Commun. 9, 5416.

Smedley, D., Haider, S., Ballester, B., Holland, R., London, D., Thorisson, G., and Kasprzyk, A. (2009). BioMart-biological queries made easy. BMC Genomics 10, 22.

Uhlen, M., Fagerberg, L., Hallstrom, B.M., Lindskog, C., Oksvold, P., Mardinoglu, A., Sivertsson, A., Kampf, C., Sjostedt, E., Asplund, A., et al. (2015). Proteomics. Tissue-based map of the human proteome. Science 347, 1260419.

Uhlen, M., Karlsson, M.J., Hober, A., Svensson, A.-S., Scheffel, J., Kotol, D., Zhong, W., Tebani, A., Strandberg, L., Edfors, F., et al. (2019). The human secretome. Sci. Signal. 12.

Yates, A.D., Achuthan, P., Akanni, W., Allen, J., Allen, J., Alvarez-Jarreta, J., Amode, M.R., Armean, I.M., Azov, A.G., Bennett, R., et al. (2020). Ensembl 2020. Nucleic Acids Res. 48, D682-D688.