Title:
SITE-SPECIFIC DNA BREAK-INDUCED GENOME EDITING USING ENGINEERED NUCLEASES
Document Type and Number:
WIPO Patent Application WO/2015/115903
Kind Code:
A1
Abstract:
The invention relates to methods for modifying a target sequence in a genome of a cell by homologous recombination, to constructs used therein and to therapeutic and non-therapeutic applications thereof.

Inventors:
GONÇALVES MANUEL ANTÓNIO FARIA VIOLA (NL)
HOLKERS MAARTEN (NL)
Application Number:
PCT/NL2015/050072
Publication Date:
August 06, 2015
Filing Date:
February 03, 2015
Assignee:
ACADEMISCH ZIEKENHUIS LEIDEN (NL)
International Classes:
C12N15/09
Other References:
HOLKERS MAARTEN ET AL: "Differential integrity of TALE nuclease genes following adenoviral and lentiviral vector gene transfer into human cells", NUCLEIC ACIDS RESEARCH, vol. 41, no. 5, March 2013 (2013-03-01), XP002726563
MERKERT SYLVIA ET AL: "Efficient Designer Nuclease-Based Homologous Recombination Enables Direct PCR Screening for Footprintless Targeted Human Pluripotent Stem Cells", STEM CELL REPORTS, vol. 2, no. 1, January 2014 (2014-01-01), pages 107 - 118, XP002726591
SCHAACK JEROME ET AL: "Characterization of a replication-incompetent adenovirus type 5 mutant deleted for the preterminal protein gene", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 93, no. 25, 1996, pages 14686 - 14691, XP002726564, ISSN: 0027-8424
MAGGIO IGNAZIO ET AL: "Adenoviral vector delivery of RNA-guided CRISPR/Cas9 nuclease complexes induces targeted mutagenesis in a diverse array of human cells", SCIENTIFIC REPORTS, vol. 4, May 2014 (2014-05-01), XP002726565
MILLER, D.G. ET AL.: "Adeno-associated virus vectors integrate at chromosome breakage sites", NAT. GENET., vol. 36, 2004, pages 767 - 773
PAPAPETROU, E.P. ET AL.: "Genomic safe harbors permit high β-globin transgene expression in thalassemia induced pluripotent stem cells", NAT. BIOTECHNOL., vol. 29, 2011, pages 73 - 78
TSAI, H.H. ET AL.: "Terminal proteins of Streptomyces chromosome can target DNA into eukaryotic nuclei", NUCLEIC ACIDS RES., vol. 36, 2008, pages E62
MENCIA, M. ET AL.: "Terminal protein-primed amplification of heterologous DNA with a minimal replication system based on phage Phi29", PROC. NATL. ACAD. SCI. USA, vol. 108, 2011, pages 18655 - 18660
FALLAUX, F.J. ET AL.: "New helper cells and matched early region 1-deleted adenovirus vectors prevent generation of replication-competent adenoviruses", HUM. GENE THER., vol. 9, 1998, pages 1909 - 1917
SCHIEDNER, G. ET AL.: "Efficient transformation of primary human amniocytes by E1 functions of Ad5: generation of new cell lines for adenoviral vector production", HUM GENE THER., vol. 11, 2000, pages 2105 - 2116
HAVENGA, M.J. ET AL.: "Serum-free transient protein production system based on adenoviral vector and PER.C6 technology: high yield and preserved bioactivity", BIOTECHNOL. BIOENG., vol. 100, 2008, pages 273 - 283
SILVA, G. ET AL.: "Meganucleases and other tools for targeted genome engineering: perspectives and challenges for gene therapy", CURR. GENE THER., vol. 11, 2011, pages 11 - 27
GAJ, T. ET AL.: "ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering", TRENDS BIOTECHNOL., vol. 31, 2013, pages 397 - 405
LOMBARDO, A. ET AL.: "Gene editing in human stem cells using zinc finger nucleases and integrase-defective lentiviral vector delivery", NAT. BIOTECHNOL., vol. 25, 2007, pages 1298 - 1306
GABRIEL, R. ET AL.: "An unbiased genome-wide analysis of zinc-finger nuclease specificity", NAT. BIOTECHNOL., vol. 29, 2011, pages 816 - 823
REDREJO-RODRIGUEZ, M. ET AL.: "Functional eukaryotic nuclear localization signals are widespread in terminal proteins of bacteriophages", PROC. NATL. ACAD. SCI. USA, vol. 109, 2012, pages 18482 - 18487
GONCALVES, M.A.F.V.; VAN DER VELDE, I.; KNAAN-SHANZER, S.; VALERIO, D.; DE VRIES, A.A.F.: "Stable transduction of large DNA by high-capacity adeno-associated virus/adenovirus hybrid vectors", VIROLOGY, vol. 321, 2004, pages 287 - 296
CUDRE-MAUROUX, C. ET AL.: "Lentivector-mediated transfer of Bmi-1 and telomerase in muscle satellite cells yields a Duchenne myoblast cell line with long-term genotypic and phenotypic stability", HUM. GENE THER., vol. 14, 2003, pages 1525 - 1533
GONCALVES, M.A.F.V. ET AL.: "Transcription factor rational design improves directed differentiation of human mesenchymal stem cells into skeletal myocytes", MOL. THER., vol. 19, 2011, pages 1331 - 1341
COLUCCIO, A. ET AL.: "Targeted gene addition in human epithelial stem cells by zinc-finger nuclease-mediated homologous recombination", MOL. THER., vol. 21, 2013, pages 1695 - 1704
JANSSEN, J.M.; LIU, J.; SKOKAN, J.; GONCALVES, M.A.F.V.; DE VRIES, A.A.F.: "Development of an AdEasy-based system to produce first- and second-generation adenoviral vectors with tropism for CAR- or CD46-positive cells", J. GENE MED., vol. 15, 2013, pages 1 - 11
MALI, P. ET AL.: "RNA-guided human genome engineering via Cas9", SCIENCE, vol. 339, 2013, pages 823 - 826
HOLKERS, M. ET AL.: "Differential integrity of TALE nuclease genes following adenoviral and lentiviral vector gene transfer into human cells", NUCLEIC ACIDS RES., vol. 41, 2013, pages E63
HOLKERS, M.; CATHOMEN, T.; GONCALVES, M.A.F.V.: "Construction and characterization of adenoviral vectors for the delivery of TALENs into human cells", METHODS
PELASCINI, L.P. ET AL.: "Histone deacetylase inhibition rescues gene knockout levels achieved with integrase-defective lentiviral vectors encoding zinc-finger nucleases", HUM. GENE THER. METHODS, vol. 24, 2013, pages 399 - 411
PELASCINI, L.P.L.; JANSSEN, J.M.; GONCALVES, M.A.F.V.: "Histone deacetylase inhibition activates transgene expression from integration-defective lentiviral vectors in dividing and non-dividing cells", HUM. GENE THER., vol. 24, 2013, pages 78 - 96
PELASCINI, L.P.L.; GONCALVES, M.A.F.V.: "Lentiviral Vectors Encoding Zinc-Finger Nucleases Specific for the Model Target Locus HPRT1", METHODS MOL. BIOL., 2013
BRIELMEIER, M. ET AL.: "Improving stable transfection efficiency: antioxidants dramatically improve the outgrowth of clones under dominant marker selection", NUCLEIC ACIDS RES., vol. 26, 1998, pages 2082 - 2085
VAN NIEROP, G.P.; DE VRIES, A.A.F.; HOLKERS, M.; VRIJSEN, K.R; GONCALVES, M.A.F.V.: "Stimulation of homology-directed gene targeting at an endogenous human locus by a nicking endonuclease", NUCLEIC ACIDS RES., vol. 37, 2009, pages 5725 - 5736
SZUHAI, K.; TANKE HJ.: "COBRA: combined binary ratio labeling of nucleic-acid probes for multi-color fluorescence in situ hybridization karyotyping", NAT. PROTOC., vol. 1, 2006, pages 264 - 275
GONCALVES, M.A.F.V. ET AL.: "Targeted chromosomal insertion of large DNA into the human genome by a fiber-modified high-capacity adenovirus-based vector system", PLOS ONE, vol. 3, 2008, pages E3084
HOLKERS, M. ET AL.: "Nonspaced inverted DNA repeats are preferential targets for homology-directed gene repair in mammalian cells", NUCLEIC ACIDS RES., vol. 40, 2012, pages 1984 - 1999
WANISCH, K.; YANEZ-MUNOZ, R.J.: "Integration-deficient lentiviral vectors: a slow coming of age", MOL. THER., vol. 17, 2009, pages 1316 - 1332
LOMBARDO, A. ET AL.: "Site-specific integration and tailoring of cassette design for sustainable gene transfer", NAT. METHODS, vol. 8, 2011, pages 861 - 869
BENABDALLAH, B.F. ET AL.: "Targeted gene addition of microdystrophin in mice skeletal muscle via human myoblast transplantation", MOL. THER. NUCLEIC ACIDS, vol. 2, 2013, pages E68
HOHER, T. ET AL.: "Highly efficient zinc finger nuclease-mediated disruption of an eGFP transgene in keratinocyte stem cells without impairment of stem cell properties", STEM CELL REV., vol. 8, 2012, pages 426 - 434
TERNS, R.M.; TERNS, M.P.: "CRISPR-based technologies: prokaryotic defense weapons repurposed", TRENDS GENET., vol. 30, 2014, pages 111 - 118
FU, Y. ET AL.: "High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells", NAT. BIOTECHNOL., vol. 31, 2013, pages 822 - 826
HSU, P.D. ET AL.: "DNA targeting specificity of RNA-guided Cas9 nucleases", NAT. BIOTECHNOL., vol. 31, 2013, pages 827 - 832
CRADICK, T.J.; FINE, E.J.; ANTICO, C.J.; BAO, G.: "CRISPR/Cas9 systems targeting β-globin and CCR5 genes have substantial off-target activity", NUCLEIC ACIDS RES., vol. 41, 2013, pages 9584 - 9592
MILLER, D.G.; RUTLEDGE, E.A.; RUSSELL D.W.: "Chromosomal effects of adeno-associated virus vector integration", NAT. GENET., vol. 30, 2002, pages 147 - 148
CHADEUF, G.; CIRON, C.; MOULLIER, P.; SALVETTI, A.: "Evidence for encapsidation of prokaryotic sequences during recombinant adeno-associated virus production and their in vivo persistence after vector delivery", MOL. THER., vol. 12, 2005, pages 744 - 753
Attorney, Agent or Firm:
JANSEN, C.M. (Johan de Wittlaan 7, JR Den Haag, NL)
Claims:

1. A method for modifying a target sequence in a genome of a cell comprising:

- inducing a DNA break at a site of interest in said target sequence;

- introducing into said cell a linear replication incompetent construct comprising:

i) a first nucleic acid sequence which is homologous to a first region of said target sequence,

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence,

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences and

iv) a molecule attached to at least one terminus of said construct,

whereby said first region is located on one side of said site of interest and said second region is located on the other side of said site of interest.

2. Method according to claim 1, wherein said molecule is a (poly)peptide.

3. Method according to claim 1 or 2, wherein said construct is a linear replication incompetent viral vector, preferably a double-stranded linear replication incompetent viral vector.

4. Method according to claim 3, wherein said viral vector is selected from the group consisting of an adenoviral vector, a herpes viral vector, an adeno-associated viral vector, a retroviral vector, a vaccinia viral vector and a bacteriophage vector, such as Phi29, Bam 35, Nf, PRD1 or Cp-1.

5. Method according to claim 1, further comprising introducing into said cell an endonuclease capable of inducing said DNA break or a construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing said DNA break.

6. A set of constructs, comprising:

- a first construct comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence;

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

iv) a molecule attached to at least one terminus of said construct,

said first construct being a linear replication incompetent construct, and

- a second construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing a double-strand break in a genome of a cell.

7. A linear replication incompetent construct comprising:

- a targeting region comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence; and

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

- a molecule attached to at least one terminus of said construct.

8. A construct according to claim 7, further comprising a nucleic acid sequence encoding an endonuclease capable of inducing a DNA break in said target sequence.

9. Set of constructs according to claim 6 or construct according to claim 7 or 8, wherein said linear replication incompetent construct is a viral vector, preferably a double-stranded linear replication incompetent viral vector, more preferably selected from the group consisting of an adenoviral vector, a herpes viral vector, an adeno-associated viral vector, a retroviral vector, a vaccinia viral vector and a bacteriophage vector, such as Phi29, Bam 35, Nf, PRD1 or Cp-1.

10. A virus particle, preferably an adenovirus particle, comprising a linear replication incompetent construct according to claim 7 or 8.

11. Method according to any one of claims 1-5 wherein said target sequence comprises a mutation, for treating or preventing a genetic disease in an individual, preferably a disease selected from the group consisting of hemophilia B, hemophilia A, Duchenne muscular dystrophy, cystic fibrosis, thalassemia, sickle cell anemia, X-linked severe combined immunodeficiency (SCID), ADA-SCID, Wiskott-Aldrich syndrome, epidermolysis bullosa dystrophica, epidermolysis bullosa junctional, RAG-1 deficiency SCID, RAG-2 deficiency SCID, metachromatic leukodystrophy, limb-girdle muscular dystrophy, type 2C, limb-girdle muscular dystrophy, type 2A, X-linked chronic granulomatous disease, and glycogen storage disease II.

12. Method according to any one of claims 1-5 for introducing an exogenous nucleic acid sequence of interest in a genome of a cell.

13. A set of constructs, a construct or a virus particle according to any one of claims 6-10 for use in a method of modifying a target sequence in a genome of a cell.

14. A set of constructs, a construct or a virus particle according to any one of claims 6-10 for use in a method of homologous recombination wherein the ratio of legitimate versus illegitimate integration of donor nucleic acid is more than 10, preferably more than 100.

15. A set of constructs, construct or virus particle according to any one of claims 6-10 for use in a method of treating or preventing a disease, preferably a genetic disease, more preferably a genetic disease selected from the group consisting of hemophilia B, hemophilia A, Duchenne muscular dystrophy, cystic fibrosis, thalassemia, sickle cell anemia, X-linked severe combined immunodeficiency (SCID), ADA-SCID, Wiskott-Aldrich syndrome, epidermolysis bullosa dystrophica, epidermolysis bullosa junctional, RAG-1 deficiency SCID, RAG-2 deficiency SCID, metachromatic leukodystrophy, limb-girdle muscular dystrophy, type 2C, limb-girdle muscular dystrophy, type 2A, X-linked chronic granulomatous disease, and glycogen storage disease II.

16. A population of cells comprising cells having a donor nucleic acid sequence in their genome, wherein the ratio of cells having legitimate integration of said donor nucleic acid in their genome versus cells having illegitimate integration of said donor nucleic acid in their genome is more than 10, preferably more than 100.

Description:
Title: Site-specific DNA break-induced genome editing using engineered nucleases

The invention relates to the fields of molecular biology, genetic engineering and gene therapy, in particular DNA break-induced genetic engineering using engineered nucleases.

Homologous recombination (HR) is used by eukaryotes for, e.g., proper chromosomal segregation during meiotic division and repair of harmful double-strand DNA breaks. HR is generally defined as an exchange of homologous segments between two DNA molecules. HR provides methods for genetically modifying chromosomal DNA sequences by introducing e.g. small mutations or exogenous nucleotide sequences. By incorporating in its genome recombinant DNA via HR, a microorganism or a eukaryotic cell is transformed with an exogenous DNA sequence. The center of the exogenous nucleotide sequence contains the desired DNA sequence, which is flanked by segments of homology with the cell's chromosomal DNA. The exogenous DNA is introduced into the cell and recombines into the cell's DNA.

The exchange of genetic information between target loci and exogenous donor DNA through error-free HR has become a solidly established strategy to manipulate prokaryote and eukaryote genomes. In most biotechnological applications, however, cell selection protocols are either not straightforward or not feasible for isolating the few cells that become genetically modified. In human somatic cells, for instance, typical frequencies of spontaneous HR-mediated gene targeting range from 10⁻⁶ to 10⁻⁸, with the vast majority (often >10-fold extra) of the exogenous DNA being found randomly integrated throughout host cell chromosomes. The recent development and refinement of sequence-specific nucleases has increased the odds for retrieving cells with specific allelic alterations. Indeed, generation of double-stranded DNA breaks (DSBs) at predefined chromosomal positions, together with the introduction of donor DNA containing sequences similar to those flanking a genomic target sequence at which a site-specific DSB is induced, can increase gene targeting by several orders of magnitude. In gene therapy, inserting transcriptional units into specific genomic positions (i.e. so-called safe harbors) or directly repairing faulty genes within their native context is a desirable goal to circumvent de-regulated host and therapeutic gene expression and tumorigenesis.

A problem with the present HR-based genome editing strategies is the number of unwarranted and unpredictable illegitimate recombination events that occur concomitantly with the, often fewer, precise and targeted DNA modifying events. In these illegitimate recombination-derived events, such as those derived from the engagement of the error-prone non-homologous end-joining DNA repair pathway, the exogenous donor DNA molecule that was provided into the cell is inserted at a different (i.e. off-target) chromosomal position, or is introduced in the wrong way into the target site of the cell's genome. Illegitimate recombination is not desired because it negates the safety advantage associated with the precision of the HR event.

Therefore, there is a need for improved HR-based genome engineering methods that lead to a higher specificity and accuracy of the DNA editing process. The specificity concerns the relative frequencies of on-target versus off-target chromosomal DNA insertions, whilst the accuracy relates to the structure or arrangement of site-specifically integrated exogenous DNA. Low-accuracy gene targeting includes the chromosomal integration of exogenous DNA copies in tandem (i.e. concatemers) as well as the unpredictable incorporation of construct-derived sequences at the target DNA site (e.g. virus and prokaryotic-cell sequences present in viral and non-viral gene delivery vehicles, respectively). Clearly, site-specific and accurate gene addition and gene repair is desired.

It is an object of the present invention to provide methods for HR that allow for specific and accurate gene repair and/or insertion. It is a further object to provide constructs for use in such HR methods.

The present inventors found that the nature of the HR vector influences parameters underlying reliable genome editing. It was shown that introducing donor DNA using a capped linear replication incompetent construct reduces the ratio of illegitimate to legitimate recombination. It also reduces the incidence of incorrect DNA inserts. In particular it was found that such a capped construct provides for exceptional levels of site-specific and accurate DNA editing. It was further found that this exceptional specificity and accuracy stems from the embedding of donor DNA sequences within capped vectors.

The present inventors assessed the specificity and the accuracy of exogenous DNA insertion using the standard viral vector for the delivery of donor DNA into mammalian cells using HR, the integrase-defective lentiviral vector (IDLV), as well as the specificity and the accuracy of exogenous DNA insertion resulting from widely used non-viral vector systems, i.e., both linear and circular donor DNA plasmids. In addition, the effect of using three different types of designer sequence-specific nucleases to induce double-stranded DNA breaks was evaluated. As is demonstrated in the Examples, exposing human cells to IDLV-containing donor DNA (Fig. 1, upper panel) and double-stranded DNA-break inducing zinc-finger nucleases (ZFNs) results in at least 45.5% of the cells exhibiting illegitimate recombination (Fig. 2a). In these cells, the donor DNA was not inserted at the specific target site where the endonuclease induced a double-stranded DNA-break. If transcription activator-like effector nucleases (TALENs) targeting the native AAVS1 safe harbor locus are used instead of ZFNs, the percentage of off-target illegitimate recombination with donor DNA delivered via IDLV (Fig. 1, lower panel) is reduced to 13.4% (Fig. 2d). However, the accuracy of donor DNA insertion was low, as evidenced by the detection of DNA inserts lacking exogenous DNA junctions at one or both ("telomeric" and/or "centromeric") termini of the insert and the occurrence of inserts consisting of multiple copies of the exogenous DNA (head-to-tail concatemers). In the latter cases, the insert was present at the target site, but not in a correct form, arising therefore from illegitimate recombination as opposed to error-free HR events between the exogenous and the target site DNA (Fig. 2c, Fig. 2d, Fig. 3a and Fig. 4).

Next to IDLVs, recombinant adeno-associated viral vectors (rAAV) are the most commonly used viral vector for delivery of donor DNA in HR methods. However, it is also known in the art that the use of rAAV as delivery vehicle for exogenous DNA results in illegitimate recombination at off-target sites containing, e.g. spontaneous, double-stranded DNA breaks [1].

Illegitimate recombination causing both the occurrence of donor DNA insertion at a chromosomal site other than the target site and insertion of an incorrect donor DNA sequence at the target site, are disadvantageous and unpredictable events. Disadvantages include disruption of open reading frames (ORFs) in the genomic sequence, inserts that do not restore endogenous ORFs or do not yield sufficient transgene expression and unpredictable and/or excess expression levels of transgenes. Improper repair of DSBs may lead to chromosomal aberrations such as translocations, deletions, inversions, amplifications or to mutations. All these events may contribute to cell dysfunction, cell death, or tumor formation.

As demonstrated in Example 2, the present inventors found that if capped vectors are used for delivering exogenous DNA, specificity and accuracy are greatly improved. Analysis of 110 randomly selected myoblast clones (Fig. 5c) revealed that no illegitimate recombination had occurred at all. All clones contained the exogenous DNA insert at the AAVS1 target site (Fig. 5d and Fig. 5e). Gene targeting in protein-capped adenoviral vector-modified cells was also confirmed by Southern blot analyses (Fig. 6). In addition, in all of 83 randomly selected HeLa cell clones the inserts were correctly integrated as evidenced by HR-derived exogenous DNA-target site junctions at both termini (Fig. 7d and Fig. 7e). Head-to-tail concatemers were detected neither in adenoviral vector-modified myoblasts (Fig. 5f) nor in adenoviral vector-modified HeLa cell populations (Fig. 5c). Hence, the present invention demonstrates that using such a capped vector, an HR specificity and accuracy of 100% can be achieved. The specific and accurate adenoviral vector-mediated gene targeting process stems from using capped linear donor DNA (Fig. 8 and Fig. 9). The high specificity and accuracy are observed when either TALENs or the RNA-guided nuclease (RGN) system is used for introduction of DSBs.
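The clone counts reported above can be expressed as the two figures of merit used in this description. The following is only an illustrative sketch: the class and method names are hypothetical, and treating "legitimate" as "on-target with HR-derived junctions at both termini" is one reading of the text, not a formula given in the application.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneSurvey:
    """Counts from a panel of randomly selected clones (hypothetical structure)."""
    legitimate: int    # on-target insert with HR-derived junctions at both termini
    illegitimate: int  # off-target insert, or on-target insert with an incorrect structure

    @property
    def total(self) -> int:
        return self.legitimate + self.illegitimate

    def legitimate_fraction(self) -> float:
        """Fraction of analysed clones carrying a legitimate insert."""
        if self.total == 0:
            raise ValueError("no clones analysed")
        return self.legitimate / self.total

    def legit_to_illegit_ratio(self) -> float:
        """Ratio of legitimate to illegitimate clones; infinite when none are illegitimate."""
        if self.total == 0:
            raise ValueError("no clones analysed")
        if self.illegitimate == 0:
            return math.inf
        return self.legitimate / self.illegitimate


# Counts as described in the text: 110 myoblast clones and 83 HeLa clones,
# none of which showed illegitimate recombination.
myoblasts = CloneSurvey(legitimate=110, illegitimate=0)
hela = CloneSurvey(legitimate=83, illegitimate=0)
print(myoblasts.legitimate_fraction())  # 1.0
print(hela.legit_to_illegit_ratio())    # inf
```

With zero illegitimate clones the ratio is unbounded, so a threshold such as "more than 10" or "more than 100" is trivially met in such a panel.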

The Examples further demonstrate that exposing target cells to TALENs and non-viral circular or free-ended (non-capped) donor DNA plasmids led to cells displaying broad distributions of transgene expression levels, as determined by flow cytometric screening of randomly selected eGFP+ clones, independently of the topology of the targeting DNA (circular or linearized non-capped plasmids). Such broad distribution indicates significant levels of off-target and/or inaccurate donor DNA integration (Fig. 11b and Fig. 11c). In contrast, stably transduced cells generated by delivering RGN complexes and protein-capped AdV donor DNA displayed a remarkably narrow range of transgene expression levels (Fig. 11b and Fig. 11c), in line with high specificity and accuracy (Fig. 11d).

The reduction in the frequency of illegitimate recombination obtained in accordance with the invention is important at least at two levels: (i) reducing the frequency of chromosomal off-target or random insertion events of the donor DNA and (ii) preventing or minimizing the formation of ORF-disruptive forms of the inserted exogenous DNA (e.g. donor-target site DNA junction(s) arising from illegitimate recombination as opposed to error-free HR events) or also ORF-disruptive concatemeric forms ("footprints") that lead to heterogeneous and/or unpredictable expression levels in stably-transduced cell populations.

The invention therefore provides a method for modifying a target sequence in a genome of a cell comprising:

- inducing a DNA break at a site of interest in said target sequence;

- introducing into said cell a linear replication incompetent construct comprising:

i) a first nucleic acid sequence which is homologous to a first region of said target sequence,

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence,

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences and

iv) a molecule attached to at least one terminus of said construct,

whereby said first region is located on one side of said site of interest and said second region is located on the other side of said site of interest.
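The geometric condition in this definition (one homologous region on each side of the site of interest, with an optional donor sequence in between) can be sketched in code. This is a minimal illustration under stated assumptions: `Region`, `arms_flank_site` and `edit_target` are hypothetical names, coordinates are 0-based half-open intervals on the target sequence, and `edit_target` models only an idealised, error-free HR outcome on one strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Half-open interval [start, end) on the target sequence (0-based)."""
    start: int
    end: int

    def __post_init__(self):
        if not 0 <= self.start < self.end:
            raise ValueError("region must satisfy 0 <= start < end")


def arms_flank_site(first: Region, second: Region, break_pos: int) -> bool:
    """True if one region lies entirely on each side of the break.

    break_pos is the index of the first base after the break, so a region
    ending at break_pos lies wholly on the left side.
    """
    first_left, first_right = first.end <= break_pos, first.start >= break_pos
    second_left, second_right = second.end <= break_pos, second.start >= break_pos
    return (first_left and second_right) or (first_right and second_left)


def edit_target(target: str, first: Region, second: Region, donor: str = "") -> str:
    """Idealised HR product: the stretch between the two regions is replaced by the donor."""
    left, right = sorted((first, second), key=lambda r: r.start)
    if left.end > right.start:
        raise ValueError("homologous regions overlap")
    return target[:left.end] + donor + target[right.start:]


target = "AAAACCCCGGGGTTTT"
first, second = Region(0, 6), Region(10, 16)
print(arms_flank_site(first, second, break_pos=8))    # True
print(edit_target(target, first, second, donor="TAG"))  # AAAACCTAGGGTTTT
```

Calling `edit_target` with an empty donor models the case in which the optional donor sequence is absent, so that only the stretch between the two regions is removed.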

The invention further provides a linear replication incompetent construct comprising:

- a targeting region comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence; and

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

- a molecule attached to at least one terminus of said construct.

The invention further provides a set of constructs, comprising:

- a first construct comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence;

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

iv) a molecule attached to at least one terminus of said construct,

said first construct being a linear replication incompetent construct, and

- a second construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing a DNA break in a genome of a cell.

The invention further provides the use of a construct according to the invention for modifying a target sequence in a genome of a cell.

The invention further provides the use of a set of constructs according to the invention for modifying a target sequence in a genome of a cell.

A "target sequence" as used herein refers to a sequence in the genome of a cell of which the genomic sequence is to be modified. A "site of interest" as used herein refers to a genomic location in which a DNA break is to be introduced. Said DNA break can be a single-stranded or a double-stranded DNA break; in a preferred embodiment it is a double-stranded DNA break. The site of interest is also referred to as target site and preferably comprises a recognition site for an endonuclease which induces a DNA break at the target site.

A preferred target sequence for modification in accordance with a method of the invention is a safe harbor locus. Preferred, but non-limiting target loci are AAVS1, CCR5, CCR2, the ROSA26 locus, DMD21, FUT8 and SH6 as well as house-keeping genes such as HPRT1, GAPDH and DHFR. The term "safe harbor locus" as used herein refers to a locus that is generally accepted to allow safe and stable insertion and expression of a transgene. Such genomic safe harbors can be in intragenic as well as extragenic regions in a genome of a cell. Insertion of a (trans)gene into a safe harbor locus essentially has no impact on the expression of adjacent genes in the genome of the target cell and leads to a predictable level of transgene expression in said cell. A preferred safe harbor locus is a genomic locus that fulfils the following criteria: i) a distance of > 50 kb from the 5' terminus of any gene, ii) a distance of > 300 kb from cancer-related genes, iii) a distance of > 300 bp from any microRNA, iv) the locus is located outside a gene transcription unit, and v) the locus is located outside an ultra-conserved region. These criteria are described in ref. [2]. Use of a safe harbor as target sequence is particularly suitable for targeted gene addition in the present invention, i.e. methods wherein a specific exogenous DNA molecule such as a gene is inserted in the target cell's genome.
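The five criteria listed above lend themselves to a simple checklist. The sketch below is illustrative only: the `LocusAnnotation` structure and its field names are hypothetical, the distances are assumed to have been computed beforehand from a genome annotation, and the thresholds are copied as stated in criteria i) to iii) of the preceding paragraph.

```python
from dataclasses import dataclass
from typing import List

KB = 1_000  # base pairs per kilobase


@dataclass(frozen=True)
class LocusAnnotation:
    """Pre-computed annotation of a candidate locus (hypothetical structure)."""
    dist_to_gene_5prime_bp: int       # distance to the nearest 5' gene terminus
    dist_to_cancer_gene_bp: int       # distance to the nearest cancer-related gene
    dist_to_microrna_bp: int          # distance to the nearest microRNA
    inside_transcription_unit: bool
    inside_ultraconserved_region: bool


def failed_criteria(locus: LocusAnnotation) -> List[str]:
    """Return the labels (i to v) of the criteria that the locus does not fulfil."""
    failed = []
    if not locus.dist_to_gene_5prime_bp > 50 * KB:   # i) > 50 kb
        failed.append("i")
    if not locus.dist_to_cancer_gene_bp > 300 * KB:  # ii) > 300 kb
        failed.append("ii")
    if not locus.dist_to_microrna_bp > 300:          # iii) > 300 bp, as stated in the text
        failed.append("iii")
    if locus.inside_transcription_unit:              # iv) outside a transcription unit
        failed.append("iv")
    if locus.inside_ultraconserved_region:           # v) outside an ultra-conserved region
        failed.append("v")
    return failed


# Made-up annotation values, for illustration only.
candidate = LocusAnnotation(60 * KB, 400 * KB, 500, False, False)
print(failed_criteria(candidate))  # []
```

Returning the list of failed criteria, rather than a single boolean, makes it visible why a candidate locus was rejected.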

Other preferred target sequences for modification in accordance with a method of the invention are genes underlying a genetic disease. As used herein, the term "genetic disease" or "genetic disorder" refers to a pathological condition that is directly or indirectly caused by or associated with at least one genetic mutation. As used herein, a mutation refers to a nucleotide change, such as a single or multiple nucleotide replacement, deletion or insertion, in a nucleotide sequence. Genomic DNA that contains a mutation has a nucleotide sequence that is different in sequence from that of the corresponding wild-type genomic DNA. Methods of the invention wherein the target sequence is a gene underlying a genetic disease are particularly useful for gene repair, which is discussed in more detail herein below.

The target sequence can be any gene or genomic sequence of a cell, such as a cell of a microorganism, plant or an animal. A method of modifying a target sequence in a genome of a cell according to the invention can be performed in vitro, in vivo or ex vivo. By "ex vivo" as used herein is meant that a method of the invention is performed in cells that have been removed from the body of an individual.

"Modifying" as used herein is also referred to as "altering" and means a replacement of one or more nucleotides with one or more other nucleotides, the insertion of one or more nucleotides, and/or the deletion of one or more nucleotides, or a combination thereof, within the target sequence in the genome of a cell. A modification of a genomic sequence is preferably a replacement of one or more nucleotides within a gene. The term "gene" as used herein refers to a nucleic acid sequence that undergoes transcription and that includes coding sequences necessary for the production of a protein, (poly)peptide or RNA. A gene may encode a particular protein or (poly)peptide, or code for an RNA sequence that is of interest in itself, such as an antisense inhibitor. Replacement of one or more nucleotides within a gene is particularly suitable for gene repair, preferably for gene repair of a genetic mutation, such as a genetic mutation that is directly or indirectly associated with a genetic disease. Another preferred modification is the in situ editing of the sequence of an endogenous target gene to alter or to expand the function of said gene, e.g. (i) in-frame addition of a heterologous sequence coding for a tag such as a fluorescent (poly)peptide for the live-cell spatio-temporal tracing and quantification of endogenous gene expression and (ii) changing the biochemical properties of the product encoded by said target gene. Another preferred modification is insertion of one or more nucleotides, preferably of a gene. Such application is particularly suitable for targeted gene addition, preferably for methods wherein a specific exogenous DNA molecule such as a gene or an artificially designed transcriptional unit is inserted in the target cell's genome in order to achieve expression of said exogenous DNA molecule.

As used herein, the term "transcriptional unit" refers to a nucleic acid molecule comprising a sequence of at least a promoter, a protein- and/or RNA-coding sequence and a transcription termination signal such as a polyadenylation signal.

The term "construct", as used herein, refers to an artificially constructed segment of nucleic acid, such as a vector or plasmid. A circular plasmid can be linearized by treatment with a suitable restriction enzyme based on the nucleotide sequence of the plasmid. Non-limiting examples of suitable linear constructs are viral vectors based on adenoviruses, herpes viruses, adeno-associated viruses, retroviruses, vaccinia virus and bacteriophages, preferably bacteriophages that replicate via a protein-primed DNA replication mechanism, such as Phi29, Bam35, Nf, PRD1 or Cp-1. Non-limiting examples of suitable non-viral constructs are in vitro-capped (synthetic or recombinant) linear nucleic acid molecules, including in vitro-capped linear nucleic acid molecules generated by terminal protein-primed DNA replication, and protein-capped linear plasmids such as recombinant linear plasmids derived from Streptomyces such as described in ref. [3]. Generation of linear nucleic acid molecules by terminal protein-primed DNA replication involves the use of a protein as primer for DNA synthesis. The protein is generally named terminal protein (TP) and becomes covalently attached at the 5' termini of the DNA. Such constructs and preparation methods are particularly suitable to prepare capped constructs in accordance with the present invention. The terminal protein-primed DNA replication is described in detail in ref. [4], which is incorporated herein by reference. A preferred linear replication incompetent construct is a double-stranded linear replication incompetent construct. A preferred linear vector is a linear replication incompetent viral vector, more preferably selected from the group consisting of an adenoviral vector, a herpes viral vector, an adeno-associated viral vector, a retroviral vector, a vaccinia viral vector and a bacteriophage viral vector, such as those based on Phi29, Bam35, Nf, PRD1 or Cp-1.
A more preferred linear vector is a double-stranded linear replication incompetent viral vector.

A preferred viral vector is an adenoviral vector. Adenoviruses are non-enveloped DNA viruses. The adenovirus genome is a linear double-stranded DNA molecule of approximately 36 kilobases (kb). The packaged adenoviral DNA molecule has a 55 kiloDalton (kDa) terminal protein covalently bound to the 5' terminus of each strand. During infection, the adenoviral genes are expressed in two phases: the early phase, which is the period up to viral DNA replication, and the late phase, which is the period during which viral DNA replication occurs. The early gene products are expressed during the early phase. Functions of these early genes include the preparation of the host cell for synthesis of viral structural proteins. The early gene products are encoded by regions E1, E2, E3 and E4 in the adenoviral genome. Late gene products are expressed in addition to the early gene products during the late phase, during which nucleic acid and protein synthesis of the host cell are turned off. The host cell thus becomes committed to the production of adenoviral DNA and adenoviral proteins. Advantages of adenoviral vectors include their ability to infect both dividing and non-dividing cells, accommodation of up to 38 kb of foreign DNA and the fact that adenoviral DNA does not normally integrate into the host cell genome. As a result of the latter, the transferred gene effect will be transient because the adenoviral and foreign DNA will be lost with continued division of host/target cells. Adenovirus-based vectors are being widely investigated for use in vaccination protocols and as anti-cancer gene therapies. Such adenoviral vectors are based on recombinant adenoviruses that are either replication-incompetent or replication-competent.

Replication-incompetent adenoviral vectors have a number of characteristics that are disadvantageous for use in therapy. For instance, the generation of replication-incompetent adenoviral vectors requires the use of a complementing cell line that provides the deleted protein or proteins in trans. Replication-competent adenoviral vectors lyse host cells that are infected by the vector. For use as anti-cancer agents, replication-competent adenoviral vectors are advantageous because replication and spreading of the viral vector throughout the tumor occurs. Similarly, replication and spreading of the adenoviral vector for expression of transgenes is advantageous in order to achieve high amounts of expressed foreign protein. Tumor-specific promoters can be used that cause the virus to replicate and consequently exert the cytotoxic effect of the therapeutic protein specifically in tumorous tissue. The use of replication-competent adenoviral vectors then serves to deliver the vector carrying the therapeutic gene to as many cells as possible, while the expression of the therapeutic protein is restricted to cells in which the tumor-specific promoter is active.

A linear construct used in accordance with the invention is preferably replication incompetent. As used herein "replication incompetent", also called replication defective or non-replicating, refers to a construct that is not capable of replicating itself. The replication incompetent construct is preferably a vector, more preferably a viral vector, that as a result of gene deletions in its genome is incapable of replication by itself. Replication incompetent adenoviruses typically lack one or more of the early region genes. An adenoviral vector can for instance be made replication incompetent by deletions in the early-region 1 E1A and E1B genes, collectively referred to as E1, of the adenoviral genome. In addition, it is advantageous to make recombinant adenoviruses which are mutated in the E2 region because the E2A protein may induce an immune response and because it plays a key role in the switch to the synthesis of late adenovirus proteins and in the viral DNA replication process. In addition, the E3 genes may also be deleted because they are not essential for virus replication in cultured packaging cells.

A preferred linear replication incompetent construct used in accordance with the invention is a linear replication incompetent adenoviral vector having the E1A and E1B genes deleted and optionally the E2 and/or the E3 genes deleted as well. Particularly preferred is a so-called "minimal" adenoviral vector, which is also referred to as "gutless", "gutted", high-capacity, gene-deleted or helper-dependent adenoviral vector. From the parental wild-type adenovirus genome, such vector comprises exclusively the cis-acting inverted terminal repeats (ITR) and the packaging signal.

Replication incompetent viral vectors, preferably adenoviral vectors, can be produced in packaging cells. "Packaging cell" as used herein refers to a cell that expresses in trans the genes necessary for production of infectious viruses that are lacking in the viral backbone. The proteins required in trans are preferably expressed from genetic elements that are integrated into the genome of the packaging cell. The genetic elements typically comprise the coding region for the viral proteins. The genetic elements preferably have essentially no overlapping nucleic acid sequences with the replication incompetent linear viral vector. This prevents the generation of replication competent viruses due to homologous DNA sequences present in the vector and in the packaging cells. Examples of packaging cells for the production of replication incompetent constructs are 293, HepG2, CHO, BHK, Sf9, Sf21, 293T-derived ΦNX, BTI-Tn-5B1-4, COS, NIH/3T3, Vero, CV1, NS0, PER.C6 [5], N52.E6 [6] and PER.E2A [7] cells. Packaging cells specifically designed to reduce or essentially prevent homologous recombination between the vector and packaging cell sequences are, for instance, PER.C6, N52.E6 and PER.E2A.

Also provided by the invention is a virus particle comprising a linear replication incompetent construct according to the invention. Said viral particle preferably comprises a linear replication incompetent viral vector comprising a molecule, preferably a peptide, polypeptide or protein, more preferably a peptide, polypeptide or protein of between 1 and 100 kDa, attached to one or both termini of the said linear construct, e.g. the 3' terminus and/or the 5' terminus, preferably to the 5' terminus of each strand. Preferably said virus particle is a double-stranded DNA-containing virus particle, more preferably an adenovirus particle. Such virus particle is particularly suitable for use in a method in accordance with the invention, e.g. for modifying a target sequence in the genome of a cell, for treating or preventing a genetic disease in an individual and/or for introducing an exogenous nucleic acid sequence of interest in the genome of a cell. The term "exogenous nucleic acid sequence" refers to a nucleic acid fragment that is introduced into a cell, preferably into the genome of a cell. An exogenous nucleic acid sequence as used herein can be either a nucleic acid sequence that is not normally present in said cell or that is essentially identical to a nucleic acid sequence that is endogenous to said cell. Further provided is a kit of parts comprising a virus particle comprising a linear replication incompetent construct according to the invention and instructions for use. A linear replication incompetent construct used in a method of the invention comprises a first nucleic acid sequence which is homologous to a first region of said target sequence and a second nucleic acid sequence which is homologous to a second region of said target sequence. These first and second nucleic acid sequences are herein also referred to as "first and second homologous nucleic acid sequences". 
As used herein, a nucleic acid sequence preferably comprises a chain of nucleotides, more preferably DNA and/or RNA, most preferably DNA. The first and second regions of the target sequence are herein also referred to as "first and second homologous regions". "Homologous" as used herein refers to a nucleic acid sequence with enough identity in nucleotides between the first or second nucleic acid sequence and the first or second regions of the target sequence, respectively, to enable HR between sequences. Preferably, homologous sequences have at least 95% sequence identity, more preferably at least 97% sequence identity, more preferably at least 98% sequence identity and more preferably at least 99% sequence identity. A most preferred first nucleic acid sequence is identical to a first region of the target sequence and a most preferred second nucleic acid sequence is identical to a second region of the target sequence. The percentage of identity of a nucleic acid sequence, or the term "% sequence identity", is defined herein as the percentage of residues in a nucleic acid sequence that is identical with the residues in a reference sequence after aligning the two sequences and without introducing gaps. Methods and computer programs for the alignment are well known in the art, for example "Align 2".
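The definition of "% sequence identity" given above (identical residues counted after alignment, without introducing gaps) can be illustrated with a minimal Python sketch. The function names `percent_identity` and `is_homologous` are hypothetical, and the sketch assumes the two sequences have already been aligned and are of equal length; it is not a replacement for an alignment program such as the one named in the text.

```python
def percent_identity(query: str, reference: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences.

    Assumption: alignment has already been done, so position i of the query
    corresponds to position i of the reference and no gaps are needed.
    """
    if len(query) != len(reference):
        raise ValueError("ungapped comparison needs sequences of equal length")
    if not query:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def is_homologous(query: str, reference: str, threshold: float = 95.0) -> bool:
    """Apply a minimum identity threshold (95% is the lowest preferred value in the text)."""
    return percent_identity(query, reference) >= threshold


# One mismatch in ten positions gives 90% identity, below the 95% threshold.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(is_homologous("ACGTACGTAC", "ACGTACGTAA"))     # False
```

Raising `threshold` to 97, 98 or 99 corresponds to the increasingly preferred identity levels listed above.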

Hence, the first and second homologous nucleic acid sequences enable HR to occur between the linear replication incompetent construct and the target sequence in a genome of a cell. If a donor nucleic acid is present, the first and second homologous nucleic acid sequences are located on opposite sides of the donor nucleic acid sequence, which means that the first homologous nucleic acid sequence is located on one side of the donor nucleic acid sequence and the second homologous nucleic acid sequence is located on the other side of the donor nucleic acid sequence.

The first and second homologous regions of the target sequence are preferably located on opposite sides of the site of interest in the target sequence, which means that the first homologous region is located on one side of the site of interest and the second homologous region is located on the other or opposite side of the site of interest, or the first and second homologous regions of the target sequence contain the site of interest. If located on opposite sides of the site of interest, the first and second homologous regions are preferably located adjacent to the site of interest in said target sequence and thus to the target site at which the single-stranded or double-stranded DNA break occurs. This is because the HR frequency decreases with increasing distance between the site-specific DNA break and the homologous nucleic acid sequences. As used herein, a region "adjacent" to a specific sequence or site of interest refers to nucleic acid present near or next to the specific sequence or site of interest at which the nuclease-induced DNA break occurs. If located on opposite sides of the site of interest, the first and second homologous regions are preferably located within 10 base pairs (bp) from the site of interest, preferably within 7 bp from the site of interest, more preferably within 5 bp from the site of interest, more preferably within 4 bp from the site of interest, more preferably within 3 bp from the site of interest, more preferably within 2 bp from the site of interest, more preferably within 1 bp from the site of interest. Most preferably, said first and second homologous regions are located directly adjacent to the site of interest, which means that no nucleotides are present between the homologous regions and the site of interest, or the first and second homologous regions of the target sequence contain the site of interest. Said site of interest preferably only contains the nucleotides between which a DNA break is induced.

The total region of homology, i.e. the total amount of nucleic acids of the first and second homologous nucleic acid sequences, comprises preferably at least 100 base pairs (bp). For instance the first homologous nucleic acid sequence comprises at least 50 bp and the second homologous nucleic acid sequence also comprises at least 50 bp. However, it is not necessary that the first and second nucleic acid sequences comprise an identical number of bp. More preferably the total region of homology is at least 150 bp, more preferably at least 200 bp, more preferably at least 300 bp, more preferably at least 400 bp, more preferably at least 500 bp. There is no upper limit to the total length of the total region of homology. However, preferred total lengths of the total region of homology are between 500 bp and 20 kb, more preferably between 500 bp and 15 kb.
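The placement and length preferences described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions: the helper names `flanking_regions` and `meets_minimum_homology`, the 0-based `break_index` convention and the example sequences are all hypothetical and are not taken from the application.

```python
from typing import Tuple


def flanking_regions(target: str, break_index: int, arm_len: int,
                     gap: int = 0) -> Tuple[str, str]:
    """Return the (first, second) homologous regions on either side of a break.

    Assumptions: the break lies between target[break_index - 1] and
    target[break_index]; `gap` is the number of bp left between each region
    and the break (0 means directly adjacent, the most preferred case).
    """
    if arm_len <= 0 or gap < 0:
        raise ValueError("arm_len must be positive and gap non-negative")
    left_end = break_index - gap
    right_start = break_index + gap
    if left_end - arm_len < 0 or right_start + arm_len > len(target):
        raise ValueError("regions fall outside the target sequence")
    first = target[left_end - arm_len:left_end]
    second = target[right_start:right_start + arm_len]
    return first, second


def meets_minimum_homology(first: str, second: str, minimum_bp: int = 100) -> bool:
    """Check the total region of homology against a minimum (100 bp in the text)."""
    return len(first) + len(second) >= minimum_bp


# Two 50 bp regions directly adjacent to a break at position 60 of a toy target.
toy_target = "A" * 60 + "C" * 60
first, second = flanking_regions(toy_target, break_index=60, arm_len=50)
print(len(first) + len(second))                # 100
print(meets_minimum_homology(first, second))   # True
```

The two regions need not be of equal length; calling the helper separately with different `arm_len` values for each side would model that case.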

A linear replication incompetent construct used in a method of the invention optionally further comprises a donor nucleic acid sequence that is located between the first and second homologous nucleic acid sequences. The donor nucleic acid sequence is preferably flanked by the first and second homologous nucleic acid sequences such that the linear replication incompetent construct used in a method of the invention comprises the first nucleic acid sequence homologous to a first region of the target sequence, followed by a donor nucleic acid sequence which is followed by the second nucleic acid sequence that is homologous to a second region in the target sequence. As used herein, "donor nucleic acid sequence" refers to a nucleic acid sequence that is to be inserted, or that has been inserted, into the target sequence via the HR process. The donor nucleic acid sequence typically comprises the modification that is to be introduced into the target sequence. The donor nucleic acid sequence is inserted into the target sequence in the genome of the cell. A preferred donor nucleic acid sequence can be essentially identical to part of the target sequence to be modified, with the exception of one or more nucleotides that are different from a nucleotide at the same position in the sequence of the target sequence. HR with such essentially identical donor nucleic acid results in the introduction of one or more point mutations, which is for instance useful in gene repair. Another preferred example is a donor nucleic acid sequence comprising the nucleic acid sequence of a gene and/or of an artificially designed transcriptional unit that is to be added to the target sequence. The replacement of a target sequence with that of a donor sequence is yet another preferred embodiment.
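The arrangement described above (first homologous sequence, optional donor, second homologous sequence) and its effect on a target can be pictured with a deliberately simplified in silico model. The class `HRConstruct` and the function `simulate_hr` are hypothetical names; the model assumes exact matches between the homologous sequences and the target, whereas the text only requires sufficient identity, and it says nothing about the biology of the HR process itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HRConstruct:
    """Toy representation of the targeting region of a construct."""
    first_arm: str      # homologous to the first region of the target
    second_arm: str     # homologous to the second region of the target
    donor: str = ""     # optional donor sequence between the two arms

    def sequence(self) -> str:
        # Order stated in the text: first arm, donor, second arm.
        return self.first_arm + self.donor + self.second_arm


def simulate_hr(target: str, construct: HRConstruct) -> str:
    """Replace whatever lies between the two homologous regions with the donor.

    Assumption: each arm occurs as an exact substring of the target, with the
    second region downstream of the first.
    """
    start = target.find(construct.first_arm)
    if start < 0:
        raise ValueError("first homologous region not found in target")
    left_end = start + len(construct.first_arm)
    right_start = target.find(construct.second_arm, left_end)
    if right_start < 0:
        raise ValueError("second homologous region not found downstream of the first")
    return target[:left_end] + construct.donor + target[right_start:]


# Insertion: arms directly flank the break, donor "TAG" is added between them.
print(simulate_hr("AAACCCGGGTTT", HRConstruct("AAACCC", "GGGTTT", donor="TAG")))
# AAACCCTAGGGGTTT

# Deletion: no donor, so the "ATAT" between the homologous regions is removed.
print(simulate_hr("AAACCCATATGGGTTT", HRConstruct("AAACCC", "GGGTTT")))
# AAACCCGGGTTT
```

A point mutation for gene repair would be modelled the same way, with a donor that differs from the replaced stretch at a single position.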

A donor nucleic acid sequence need not be present in a construct used in accordance with the present invention. This is for instance the case if the purpose of a HR event is to remove the target sequence or part of the target sequence without introducing a donor nucleic acid sequence. An example of such application is deletion of a gene or part thereof from the genome of a cell.

A linear replication incompetent construct used in a method of the invention further comprises a molecule attached to at least one terminus of said construct. A molecule can be any molecule that is attached to the terminus of a linear replication incompetent construct according to the invention. Preferred but non-limiting examples of such molecules are inorganic or organic molecules including, but not limited to, a protein, a peptide or polypeptide, a carbohydrate, a polysaccharide, a glycoprotein, a lipid, a hormone, a drug and a nanoparticle such as a quantum dot. As used herein, the term "terminus" refers to the region including the end of a strand of a nucleic acid molecule and may include the final nucleotide and up to 5 adjacent nucleotides. The term "5' terminus" refers to the region including the end of the strand of a nucleic acid molecule that has the fifth carbon in the sugar-ring of the deoxyribose or ribose at its terminus. This region may include the final nucleotide of the nucleic acid strand and up to 5 adjacent nucleotides. The term "3' terminus" refers to the region including the end of the strand of a nucleic acid molecule which terminates at the hydroxyl group of the third carbon in the sugar-ring of the deoxyribose or ribose at its terminus. This region may include the final nucleotide of the nucleic acid strand and up to 5 adjacent nucleotides. A molecule is preferably attached to one or more of the terminal five nucleotides of the 3' and/or 5' terminus or termini, more preferably to one or more of the terminal four nucleotides. Most preferably said molecule, preferably a (poly)peptide or protein, is attached to the terminal nucleotide of the 3' and/or 5' terminus or termini.
Without being bound by theory, it is believed that a linear HR construct with a molecule attached to at least one terminus thereof, preferably to both termini, blocks DNA-DNA interactions involving illegitimate recombination, such as between the construct and DNA sequences, such as genomic DNA, other vector DNA and cytoplasmatic DNA or between multiple copies of the HR construct. Such construct with a molecule attached to at least one 5' or 3' terminus is also referred to as a capped construct or a protein-capped construct if the molecules are peptides, polypeptides or proteins. A capped construct or protein-capped construct preferably comprises a molecule or protein attached to both 5' termini, or to both 3' termini, of a double-stranded nucleic acid molecule. Further preferred are capped constructs that comprise molecules or proteins attached to all termini, i.e. the two 5' and the two 3' termini of a double-stranded nucleic acid molecule. Fig. 9 provides an overview of generic arrangements of possible terminally-capped linear nucleic acid constructs in accordance with the invention. By blocking such DNA-DNA interactions, illegitimate recombination events including inaccurate recombination such as concatemerization (i.e. multi-copy end-to-end assembly of exogenous DNA) are minimized. As demonstrated in the Examples, the present inventors found that the occurrence of illegitimate recombination can be reduced and HR accuracy is greatly increased if a protein-capped HR construct is used. The superiority of the use of a protein-capped HR construct is shown over the most commonly used viral (IDLV) and non-capped non-viral vectors. Further, the reduced illegitimate recombination and increased HR accuracy appeared to be independent of the system used to introduce DSBs, as the effects were seen both with TALENs and with RGN complexes. In the Examples an adenoviral vector was used as an HR construct. The adenovirus contains a 55 kDa terminal protein that is covalently bound to the 5' ends of the linear double-stranded adenovirus genome. The TP is attached to the 5' termini of the genome by a phosphodiester bond. The 87 kDa precursor of the TP covalently binds to the first deoxycytidine nucleotide residue (dCMP) of a newly synthesized adenoviral DNA chain. The protein-bound dCMP then functions as a primer for DNA synthesis. A preferred molecule is a peptide, polypeptide or protein. As used herein the terms "peptide", "polypeptide" and "protein" refer to proteinaceous molecules that comprise multiple amino acids. The terms "peptide", "polypeptide" and "protein" are herein collectively referred to as "(poly)peptide". Such peptide, polypeptide or protein preferably has a size of between 1 and 100 kDa, more preferably of between 10 and 80 kDa. Preferred, but non-limiting, examples of suitable molecules that can be used in accordance with the invention are Biotin, optionally attached to a binding partner of Biotin, such as Avidin, Streptavidin or Neutravidin, terminal proteins and precursor terminal proteins of adenoviruses, e.g. human adenovirus 2, 17, 5, 50, 35 or 26, phage terminal proteins, such as Bacillus phage protein phi29, Bacillus phage protein Nf, Enterobacteria phage protein PRD1, Streptococcus phage protein Cp-1 and Actinomyces phage protein Av-1, and Streptomyces terminal proteins, such as S. lividans TpgL, S. coelicolor TpgC, S. avermitilis TpgAl, S. rochei TpgRM and S. scabiei TpgL. Amino acid sequences of suitable adenoviral, phage or Streptomyces terminal proteins and precursor terminal proteins are shown in Fig. 10. A preferred molecule is an adenoviral, phage or Streptomyces terminal protein or precursor terminal protein or a protein having at least 70% sequence identity thereto, preferably at least 80%, more preferably at least 90%, most preferably at least 95% sequence identity thereto. More preferably, a molecule has an amino acid sequence that has at least 70% sequence identity to a sequence from Fig. 10, preferably at least 80%, more preferably at least 90%, most preferably at least 95% sequence identity thereto.

A molecule, preferably a peptide, polypeptide or protein, is preferably attached to a 5' terminus or a 3' terminus or both of the construct by a covalent or non-covalent bond, preferably to a 3' and/or 5' terminal nucleotide of said construct. Preferably a linear replication incompetent construct used in accordance with the invention comprises at least two molecules, preferably (poly)peptides, one at each 5' or 3' terminus of said construct. Further preferred is a linear replication incompetent construct comprising four molecules, preferably (poly)peptides, whereby two molecules are attached to the 5' termini and two molecules are attached to the 3' termini of a double-stranded nucleic acid construct. Said molecule is preferably attached to at least one 5' or 3' terminus of each strand of a double-stranded linear replication incompetent construct, preferably a double-stranded linear replication incompetent viral vector. Thus, a preferred linear replication incompetent construct for use in a method of the invention is a, preferably double-stranded, linear replication incompetent construct comprising at least two molecules attached to the 5' and/or 3' terminus of each strand. More preferably, said construct is a linear replication incompetent viral vector, preferably double-stranded, comprising two (poly)peptides attached to the 5' terminus of each strand, preferably the 55 kDa adenoviral proteins. A particularly preferred construct is a linear replication incompetent adenoviral vector, preferably double-stranded, comprising two, preferably inert, peptides, polypeptides or proteins attached to the 5' terminus of each strand, wherein said (poly)peptides preferably are the 55 kDa adenoviral terminal protein or a protein similar thereto.

The molecule is preferably an inert molecule when present in the target cell. As used herein "inert molecule" refers to a molecule that has little or no ability to react with other molecules, e.g. that, when attached to the terminus of a construct in accordance with the invention, does not have biological functionality in the cell into which the linear construct is introduced other than blocking DNA-DNA illegitimate recombination.

Linear replication incompetent constructs comprising one or more molecules attached to at least one of their termini can be obtained by producing in cell lines and bacterial strains viral vectors whose parental viruses replicate via a protein-primed DNA replication mechanism such as adenoviruses and bacteriophage Phi29.

Linear replication incompetent constructs comprising one or more molecules attached to at least one of their termini can also be obtained by using in vitro and in bacteria DNA replication systems such as those based on the phage Phi29 and the Streptomyces replicons, respectively (see, for example, refs. [3,4]). Linear replication incompetent constructs containing molecules attached to at least one of their termini can further be generated by in vitro capping of synthetic and/or recombinant nucleic acids. This can involve the direct chemical conjugation of molecules (e.g. polypeptides) to linear replication incompetent constructs or, alternatively, first to oligonucleotide intermediates. The resulting modified oligonucleotides can subsequently be ligated to an "acceptor" linear replication incompetent construct. A non-limiting example of a modified oligonucleotide is provided by a 5' and/or 3' Biotin-modified oligonucleotide. The ligation of the modified oligonucleotides to "acceptor" donor DNA molecules can entail Watson-Crick hybridizations and/or DNA ligase treatments. The modified oligonucleotides can be used directly in the form of single-stranded or, after hybridization, double-stranded short linear replication incompetent donor DNA molecules.

The in vitro capping of synthetic and/or recombinant nucleic acids can also involve PCR amplification of a linear replication incompetent construct template by using as primers the aforementioned modified oligonucleotides, such as the 5' and/or 3' Biotin-modified oligonucleotide. For instance, Biotin-modified primers can be used to generate Biotin-capped linear replication incompetent constructs by PCR amplification of a donor DNA molecule of choice. The resulting products can be further modified, e.g. to introduce a bulkier "inert" moiety and/or an "effector" motif such as a nuclear localization signal. Such modification can comprise the covalent binding of Biotin to a protein of interest by chemical conjugation or the strong non-covalent coupling of Biotin to Biotin-binding partners such as unmodified or fusion protein-modified Avidin, Streptavidin or Neutravidin.

The terminally-capped linear nucleic acid constructs harboring the donor DNA resulting from the above-described procedures can be delivered into target cells by methods known in the art, which include, but are not limited to, viral vector particle-mediated transduction, microinjection, gene-gun or by standard nucleic acid transfection methods such as those based on calcium phosphate precipitation, liposomes, polycations, magnetofection and electroporation.

The invention provides a method for modifying a target sequence in a genome of a cell comprising:

- inducing a DNA break at a site of interest in said target sequence, preferably a double-stranded DNA break;

- introducing into said cell a double-stranded linear replication incompetent construct, preferably a viral vector, comprising

i) a first nucleic acid sequence which is homologous to a first region of said target sequence,

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence,

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences and

iv) a (poly)peptide attached to the 5' terminus of each strand of said double-stranded construct, preferably of said double-stranded viral vector, whereby said first region is located on one side of said site of interest and said second region is located on the other side of said site of interest.

Also provided is a double-stranded linear replication incompetent construct, preferably a viral vector, comprising

- a targeting region comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence; and

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

- a (poly)peptide attached to the 5' terminus of each strand of said double-stranded construct, preferably of said double-stranded viral vector.

Also provided is a set of constructs comprising

- a first double-stranded construct comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence;

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

iv) a (poly)peptide attached to the 5' terminus of each strand of said double-stranded construct, preferably of said double-stranded viral vector,

said first construct being a double-stranded linear replication incompetent construct, preferably a viral vector, and

- a second construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing a site-specific DNA break in a genome of a cell.

A method of the invention comprises inducing a DNA break at a site of interest in the target sequence. Said DNA break is preferably a single-stranded or double-stranded DNA break, more preferably a double-stranded DNA break. A single-stranded or double-stranded break is induced in a site of interest of a target sequence. The site of interest preferably comprises a genetic mutation or is located in a safe harbor locus. A single-stranded or double-stranded DNA break is preferably induced by a sequence-specific endonuclease that recognizes large DNA recognition target sites of approximately 12 to 60 bp. For instance, a typical ZFN monomer may have a recognition site of 12 bp, i.e. 4 zinc-fingers binding to 3 nucleotides each. In the case of TALEN dimers the total target size is typically around 2 x 17 bp plus a spacer of 12 or 13 bp. As another example, homing endonucleases have target sites of 18-50 bp. For instance, the I-SceI homing endonuclease and engineered derivatives have a target site of 18 bp. As yet another example, the RNA-guided nucleases (RGNs) based on the CRISPR/Cas adaptive immune systems of bacteria have a target site of about 20 bp. The sequence-specific endonuclease specifically binds to the site of interest. As used herein, the term "specifically binds to the site of interest and introduces a single-stranded or double-stranded DNA break within the target sequence" means that the endonuclease is designed such that it binds to the particular sequence of a site of interest or near a site of interest and preferably does not bind to other sequences located in the genome. In a preferred embodiment said endonuclease essentially does not bind to such other sequences located in the genome. Provided is a method of the invention comprising introducing into said cell an endonuclease capable of inducing said single-stranded or double-stranded break.
Said endonuclease is for instance introduced into target cells directly as unmodified protein, or as protein modified with a protein transduction domain, or by linking it to structural component(s) of a gene delivery system. Alternatively, said endonuclease is for instance introduced into target cells as DNA or mRNA nucleic acid constructs encoding it. Further provided is thus a method of the invention comprising introducing into said cell a construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing said site-specific DNA break. Also provided is a construct according to the invention comprising a nucleic acid sequence encoding an endonuclease capable of inducing a single-stranded or double-stranded break, preferably a double-stranded DNA break, in the target sequence in a genome of a cell. Any nuclease able to cleave a genome at a specific position and induce a single- or double-stranded DNA break can be used in the present invention. Non-limiting examples of nucleases that can be used in accordance with the present invention are homing endonucleases, which are also referred to as meganucleases, Transcription Activator-Like (TAL) effector nucleases (TALENs), zinc-finger nucleases (ZFNs) and RNA-guided nucleases (RGNs) based on CRISPR/Cas adaptive immune systems of prokaryotes [8,9].
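The recognition-site sizes quoted above follow from simple arithmetic, which can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the estimate of chance matches assumes a genome of uniformly random, independent bases, which real genomes are not.

```python
def zfn_monomer_site_bp(n_fingers: int = 4, bp_per_finger: int = 3) -> int:
    """Recognition site of one ZFN monomer: each zinc finger binds 3 nucleotides."""
    return n_fingers * bp_per_finger


def talen_pair_site_bp(half_site_bp: int = 17, spacer_bp: int = 12) -> int:
    """Total target size of a TALEN dimer: two half-sites plus the spacer."""
    return 2 * half_site_bp + spacer_bp


def expected_random_matches(site_bp: int, genome_bp: int) -> float:
    """Expected chance occurrences of one specific site on one strand.

    ASSUMPTION (not from the description): bases are uniformly random and
    independent, so a given site of length n occurs with probability 4**-n
    at each position.
    """
    return genome_bp / 4 ** site_bp


print(zfn_monomer_site_bp())             # 12
print(talen_pair_site_bp(spacer_bp=12))  # 46
print(talen_pair_site_bp(spacer_bp=13))  # 47
```

Under the stated random-sequence assumption, longer recognition sites are expected to occur less often by chance, which is consistent with the preference for endonucleases that recognize large target sites.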

Meganucleases are endonucleases having a large nucleotide recognition site. Meganucleases are also called rare-cutting or very rare-cutting endonucleases. They have a high specificity for their nucleic acid target and can cleave a predefined chromosomal target sequence of choice. Meganucleases are found in eukaryotes, bacteria and archaea. Wild-type meganucleases such as I-SceI, I-CreI and I-DmoI, which recognize and cleave their specific native DNA target sequences, as well as custom-engineered meganucleases, which recognize and cleave the novel DNA target sequences to which they were engineered to bind, can be used in accordance with the invention [8]. Suitable meganucleases that can be used in accordance with the invention are I-SceI, PI-SceI, I-CeuI, I-CreI, I-ChuI, I-CsmI, PI-TliI, PI-MtuI, I-SceII, I-SceIII, PI-CivI, PI-CtrI, PI-AaeI, PI-BsuI, PI-DhaI, I-DmoI, PI-DraI, PI-FacI, PI-PhoI, I-MsoI, PI-MavI, PI-MchI, PI-MfuI, PI-MflI, PI-MgoI, PI-MinI, PI-MkaI, PI-MleI, PI-MmaI, PI-MshI, PI-MsmI, PI-MthI, PI-MtuI, PI-NpuI, PI-PfuI, PI-RmaI, PI-SpbI, PI-SspI, PI-TagI, PI-ThyI, PI-TkI and PI-TspI, preferably I-CreI, I-ChuI, I-DmoI, I-CreI, I-CsmI, PI-SceI, or PI-PfuI, and derivatives thereof.

TALENs comprise a TALE (Transcription Activator-Like Effector) DNA-binding domain fused to a nuclease domain (e.g. that of the type IIS restriction enzyme FokI). TALEs are naturally-occurring transcription factors that bind specific sequences within gene promoters. These proteins are found in phytopathogenic bacteria of the genus Xanthomonas, and their role is to induce expression of specific host plant genes for enhanced virulence. TALEs contain a central region of tandem direct repeats that are responsible for sequence-specific DNA binding. Most repeats are composed of 34 amino acids, with the only distinguishing feature among different repeats being a hypervariable polymorphism at positions 12 and 13 called the "repeat-variable di-residue" (RVD). Each individual RVD dictates the binding of the repeat to a single nucleotide. This direct one-repeat to one-nucleotide relationship establishes a simple rule underlying TALE-DNA interactions. TALENs operate in pairs of two monomers [9]. The directional binding of each TALEN monomer to its respective half-target site induces dimerization of the FokI portions, resulting in site-specific DNA cleavage. Therefore, TALENs can be engineered to recognize and cleave a DNA target of choice with high specificity.
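The one-repeat to one-nucleotide rule described above lends itself to a simple lookup. The sketch below is illustrative only: the RVD-to-nucleotide table uses the commonly cited assignments (NI for A, HD for C, NG for T, NN for G), which are not listed in this description and are included here as an assumption; NN is also reported to bind A. The function names are hypothetical.

```python
# ASSUMPTION: commonly cited RVD assignments, not taken from this description.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}
BASE_TO_RVD = {base: rvd for rvd, base in RVD_TO_BASE.items()}


def rvds_to_target(rvds):
    """Translate an ordered list of RVDs into the DNA half-site they would bind."""
    try:
        return "".join(RVD_TO_BASE[rvd] for rvd in rvds)
    except KeyError as exc:
        raise ValueError(f"unknown RVD: {exc.args[0]}") from None


def target_to_rvds(half_site):
    """Choose one RVD per nucleotide for a desired half-site (one repeat per base)."""
    try:
        return [BASE_TO_RVD[base] for base in half_site.upper()]
    except KeyError as exc:
        raise ValueError(f"unsupported nucleotide: {exc.args[0]}") from None


print(rvds_to_target(["NI", "HD", "NG", "NN"]))  # ACTG
print(target_to_rvds("ACTG"))                    # ['NI', 'HD', 'NG', 'NN']
```

A TALEN pair would need two such arrays, one per half-target site, with the second designed against the opposite strand.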

ZFNs are yet another class of artificial endonucleases that can be designed to bind to a predefined genomic target site and thus induce a DNA break at this specific site [9]. ZFNs are artificial enzymes generated by fusing an array of zinc finger DNA-binding domains to a nuclease DNA-cleaving domain (e.g. that of FokI). Like TALENs, ZFNs work as dimers inducing single- or double-stranded DNA breaks at predefined target sequences of choice.

RGNs are RNA-dependent DNA nucleases based on type II clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated (Cas) adaptive immune systems of prokaryotes. For instance, the type II CRISPR/Cas9 system from Streptococcus pyogenes has been recently adapted to work as sequence-specific RGNs in mammalian cells [9]. These RGN systems comprise transfecting cells with RNA Pol-II and RNA Pol-III expression cassettes encoding, respectively, the Cas9 nuclease and a chimeric single guide RNA (sgRNA). The sequence-specific sgRNA module is engineered by fusing sequence-tailored CRISPR RNAs (crRNAs) to trans-acting CRISPR RNA (tracrRNA) scaffolds. Thus, RGN target site specificity is governed by RNA-DNA hybridizations, as opposed to protein-DNA interactions. This feature makes CRISPR/Cas-based RGN systems easy to engineer allowing also multiplexing i.e. targeting simultaneously multiple sequences within a target cell.
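Because RGN specificity is governed by RNA-DNA hybridization, candidate target sites can be enumerated directly from sequence. The following is a minimal sketch, assuming a 20-nt target immediately followed by an NGG protospacer adjacent motif (PAM), which is the generally known requirement of the S. pyogenes Cas9 but is not stated in this description. The function names are hypothetical and the scan ignores off-target considerations.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidate_sites(seq: str, length: int = 20):
    """List (strand, start, target) for every target followed by an NGG PAM.

    ASSUMPTION: NGG PAM directly 3' of the target (S. pyogenes Cas9).
    'start' is the 0-based offset within the scanned strand, so minus-strand
    offsets refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})[ACGT]GG)")
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits


example = "A" * 20 + "TGG"  # toy sequence, not a real locus
print(find_candidate_sites(example))
```

Multiplexing, as mentioned above, would correspond to selecting several such targets and expressing one sgRNA per target.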

Preferably, a method of the invention comprises introducing into the cell nucleases capable of inducing the single-stranded or double-stranded DNA break, preferably double-stranded DNA break. Nucleases can be introduced into a target cell using both transgenic and transgene-free strategies. The nucleases can for instance be introduced into the target cells directly as unmodified proteins or as proteins modified with so-called protein transduction domains (PTDs), a.k.a. cell-penetrating peptides (CPPs), of which RQIKIWFQNRRMKWKK (Antennapedia) and GRKKRRQRRRPPQ (HIV Tat) are but two examples. As another example, nucleases can be introduced in a target cell by linking or fusing them to structural components of gene delivery vehicles such as viral vectors or by delivering them packaged in vector particles in the form of "ready-for-expression" mRNA templates. Alternatively, these nucleases are introduced into the target cells in the form of recombinant constructs comprising nucleic acid sequences encoding them (i.e. DNA or mRNA). Such DNA constructs preferably comprise a coding sequence that encodes an endonuclease which is preferably coupled to an inducible promoter. The term "inducible promoter", as is used herein, refers to a promoter of which the expression can be regulated. Inducible promoters are known to the skilled person. Suitable inducible promoters depend on the type of cell that is targeted by HR in accordance with the invention, and are known to the skilled person. Examples of suitable inducible promoters include, but are not limited to, RAmy3D and CaMV promoters for expression in plant cells, polyhedrin, p10, IE-0, PCNA, OpIE2, OpIE1, and Actin 5c promoters for expression in insect cells, and beta-actin promoter, immunoglobulin promoter, 5S RNA promoter, promoter/enhancer elements derived from the human genes EEF1A1, UBC and PGK1 (a.k.a. EF1α, ubiquitin C and PGK, respectively) and virus-derived promoters such as cytomegalovirus (CMV), Rous sarcoma virus (RSV), and Simian virus 40 (SV40) promoters for expression in mammalian cells.

A preferred nuclease used in accordance with the invention is a Transcription Activator-Like (TAL) effector nuclease (TALEN). As demonstrated in Example 2, introduction of DSBs in the target sequence with the use of a TALEN in combination with a capped DNA targeting construct results in particularly specific and accurate HR. Provided is therefore a method for modifying a target sequence in a genome of a cell comprising:

- inducing a DNA break, preferably a double-stranded DNA break, at a site of interest in said target sequence by introducing into said cell an endonuclease capable of inducing said DNA break or a construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing said DNA break, wherein said endonuclease is a TALEN;

- introducing into said cell a linear replication incompetent construct, preferably an adenoviral vector, comprising i) a first nucleic acid sequence which is homologous to a first region of said target sequence,

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence,

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences and

iv) a molecule, preferably a (poly)peptide, attached to at least one terminus of said construct, whereby said first region is located on one side of said site of interest and said second region is located on the other side of said site of interest. Preferably a construct comprising a nucleic acid sequence encoding a TALEN is introduced into said cell. Preferably said linear replication incompetent construct comprises a (poly)peptide attached to the 5' terminus of each strand of a double-stranded viral, preferably adenoviral, construct.

Also provided is a set of constructs comprising

- a first linear replication incompetent construct comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence;

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

iv) a molecule, preferably a (poly)peptide, attached to at least one terminus of said construct,

said first construct preferably being a double-stranded adenoviral vector, and

- a second construct comprising a nucleic acid sequence encoding a TALEN.

Also provided is a linear replication incompetent construct, preferably a viral construct, more preferably an adenoviral construct, comprising:

- a targeting region comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence; and

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences;

- a molecule, preferably a (poly)peptide, attached to at least one terminus of said construct, and

- a nucleic acid sequence encoding an endonuclease capable of inducing a DNA break in said target sequence, wherein said endonuclease is a Transcription Activator-Like (TAL) effector nuclease (TALEN).

Another preferred nuclease used in accordance with the invention is an RNA-guided nuclease (RGN), such as the Cas9 nuclease, which is used in combination with a single guide RNA (sgRNA) addressing the RGN to a target site in the genome of a cell. As demonstrated in Example 4, introduction of DSBs in the target sequence with the use of Cas9 and a single guide RNA (sgRNA) in combination with a capped DNA targeting construct results in particularly specific and accurate HR. Provided is therefore a method for modifying a target sequence in a genome of a cell comprising:

- inducing a DNA break, preferably a double-stranded DNA break, at a site of interest in said target sequence by introducing into said cell an endonuclease capable of inducing said DNA break or a construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing said DNA break, wherein said endonuclease is a RGN, preferably Cas9;

- introducing into said cell a linear replication incompetent construct, preferably a viral vector, more preferably an adenoviral vector, comprising

i) a first nucleic acid sequence which is homologous to a first region of said target sequence,

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence,

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences and

iv) a molecule, preferably a (poly)peptide, attached to at least one terminus of said construct, whereby said first region is located on one side of said site of interest and said second region is located on the other side of said site of interest. Preferably a construct comprising a nucleic acid sequence encoding an RGN, preferably Cas9, and a construct comprising a nucleic acid sequence encoding a sgRNA addressing the RGN, preferably Cas9, to the target site in the human genome, are introduced into said cell. Alternatively, a single construct comprising a nucleic acid sequence encoding an RGN, preferably Cas9, and a nucleic acid sequence encoding a sgRNA addressing the RGN, preferably Cas9, to the target site in the genome of a cell, is introduced into said cell. An example of a sgRNA target sequence is that in the AAVS1 safe harbor locus shown in Fig. 11a.

Preferably said linear replication incompetent construct comprises a (poly)peptide attached to the 5' terminus of each strand of a double-stranded viral, preferably adenoviral, construct.

Also provided is a set of constructs comprising

- a first linear replication incompetent construct comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence;

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

iv) a molecule, preferably a (poly)peptide, attached to at least one terminus of said construct,

said first construct preferably being a double-stranded viral vector, more preferably an adenoviral vector,

- a second construct comprising a nucleic acid sequence encoding an RGN, preferably Cas9, and

- a third construct comprising a nucleic acid sequence encoding a sgRNA addressing the RGN, preferably Cas9, to the target sequence in the genome of said cell.

The RGN and sgRNA are preferably introduced into said cell by transfecting cells with RNA Pol-II and RNA Pol-III expression cassettes encoding, respectively, the RGN and a chimeric single guide RNA (sgRNA). Hence, said second construct preferably is an RNA Pol-II expression cassette encoding the RGN and said third construct is preferably an RNA Pol-III expression cassette encoding a chimeric single guide RNA (sgRNA).

Also provided is a set of constructs comprising

- a first linear replication incompetent construct comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence;

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences; and

iv) a molecule, preferably a (poly)peptide, attached to at least one terminus of said construct,

said first construct preferably being a double-stranded viral vector, more preferably an adenoviral vector,

- a second construct comprising a nucleic acid sequence encoding an RGN, preferably Cas9, and a nucleic acid sequence encoding a sgRNA addressing the RGN, preferably Cas9, to the target sequence in the genome of said cell.

Also provided is a linear replication incompetent construct, preferably a viral construct, more preferably an adenoviral construct, comprising:

- a targeting region comprising:

i) a first nucleic acid sequence which is homologous to a first region of a target sequence in a genome of a cell;

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence; and

iii) optionally a donor nucleic acid sequence located between said first and second nucleic acid sequences;

- a molecule, preferably a (poly)peptide, attached to at least one terminus of said construct, and

- a nucleic acid sequence encoding an endonuclease capable of inducing a DNA break in said target sequence, wherein said endonuclease is an RNA-guided nuclease (RGN). Said construct further optionally comprises a nucleic acid sequence encoding a single guide RNA (sgRNA).

The construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing the single-stranded or double-stranded DNA break is preferably a vector, such as a non-viral vector in the form of a linear or circular plasmid or a minicircle, or a viral vector based on adenoviruses, herpes viruses, adeno-associated viruses, retroviruses, vaccinia virus, SV40, baculoviruses, alphaviruses, herpes simplex viruses, poxviruses and bacteriophages. A method of the invention may comprise introducing into a cell two separate constructs, i.e. one construct comprising a nucleic acid sequence encoding an endonuclease capable of inducing a site-specific break, preferably a double-stranded DNA break, and one linear replication incompetent construct comprising regions of homology, a donor nucleic acid and a molecule attached to at least one terminus thereof. Alternatively, a method of the invention comprises introducing into a cell a single construct, i.e. a linear replication incompetent construct as defined herein, comprising a nucleic acid sequence encoding an endonuclease capable of inducing a site-specific break, preferably a double-stranded DNA break, and regions of homology, a donor nucleic acid and a molecule attached to at least one terminus thereof. Said endonuclease is preferably a TALEN or an RGN.

A linear replication incompetent construct according to the invention and, optionally, a construct comprising a nucleic acid sequence encoding an endonuclease can be introduced in a cell by various methods known in the art. Such methods are known to a skilled person, and include, but are not limited to, electroporation, calcium phosphate-mediated transfection and lipofection for non-viral constructs, or transduction of cells with viral vectors containing the constructs. In the Examples a method to introduce constructs into the cells by using viral vectors is provided under the heading "transduction experiments". Alternatively, the nucleases can be introduced into the target cells in the form of mRNA or directly as unmodified proteins or as proteins containing so-called supercharged protein transduction domains (PTDs), a.k.a. cell-penetrating peptides (CPPs), of which RQIKIWFQNRRMKWKK (Antennapedia) and GRKKRRQRRRPPQ (HIV Tat) are but two examples. Said nuclease is preferably a TALEN or an RGN.

The nucleic acid sequence of a linear replication incompetent construct according to the invention used for HR is preferably present in a HR cassette. The HR cassette comprises the first nucleic acid sequence, second nucleic acid sequence and optionally a donor nucleic acid sequence as defined herein. The cassette preferably further comprises a selection marker, and/or a nucleic acid sequence encoding an endonuclease that is capable of inducing site-specific breaks. Said endonuclease is preferably a TALEN or an RGN. A person skilled in the art will be able to select the appropriate promoter for the particular endonuclease encoding sequence. The HR cassette can further comprise a marker sequence, such as a positive or negative selection marker, which can be used to identify cells which have undergone HR.

Regulatory sequences that may be present in a HR cassette include, but are not limited to, RNA polymerase II promoters (e.g. from the HSV-1, SV40, RSV and CMV viral genomes and from the mammalian genes HPRT1, PGK1, EEF1A1 and UBC, from chimeric elements such as the CAG promoter and from artificially designed regulatory elements such as the doxycycline-controllable promoter consisting of a minimal CMV promoter [mCMV] fused to 7 copies of the E. coli tetO sequence in tandem) and polyadenylation signals (e.g. from the HSV-1, SV40, RSV, CMV and AAV viral genomes and from the mammalian genes HPRT1, PGK1, EEF1A1, UBC, β-globin, α-actin, β-actin and growth hormone) for expressing proteins, and RNA polymerase III promoters (e.g. U6, H1 and 7SK) and terminator sequences (e.g. TTTTT) for expressing short RNAs (e.g. microRNAs [miRNAs], short-hairpin RNAs [shRNAs] and single-guide RNAs [sgRNAs]). Non-limiting examples of selection marker genes ("selectable markers") include those that confer resistance to Geneticin, Zeocin, Puromycin and Hygromycin as well as the genes HPRT1, DHFR, HSV-1 thymidine kinase, inosine monophosphate dehydrogenase 2 (IMPDH2) variants and the P140K mutant of the methylguanine methyltransferase gene (MGMT P140K), or a gene compatible with FACS- and MACS-based selection/isolation such as the tNGFR (truncated nerve growth factor receptor). Other elements that can be included within expression units present in linear replication incompetent constructs are those encoding heterologous tags (e.g. FLAG, His, HA and HaloTag® tags and fluorophore-containing [poly]peptides), internal ribosomal entry sites (e.g. from the encephalomyocarditis virus, Plautia stali intestine virus, Cricket paralysis virus, classical swine fever virus, picornaviruses and hepatitis A and hepatitis C viruses), miRNA target sequences for tissue-specific gene expression and "self-cleaving" peptides (e.g. the P2A, T2A, E2A, and F2A peptides from the porcine teschovirus-1, Thosea asigna virus, equine rhinitis A virus and foot-and-mouth disease virus, respectively). Still further examples of genetic elements that may be present in a HR cassette are those which work at an epigenetic level by favoring an open chromatin structure for the exogenous DNA and by blocking its interaction with neighboring endogenous genes, e.g. Scaffold/Matrix Attachment Regions (S/MARs), such as those from the IFNB1 and immunoglobulin-κ light chain (Igkc) loci and the human 1-68 MAR, insulator elements such as the cHS4, and Locus Control Regions (LCRs) such as that of the human beta-globin gene.

Methods and the use of constructs or (nucleic acid-containing) virus particles according to the invention allow for a specific, accurate and controlled generation of cells having a donor nucleic acid sequence integrated into a specific site in the genome or having a deletion of genomic sequences for both therapeutic and non-therapeutic applications. Such methods, constructs and (nucleic acid-containing) virus particles are particularly suitable for gene therapy, gene targeting and/or gene repair. Gene therapy as used herein refers to the use of nucleic acid sequences, in particular DNA, as a pharmaceutical agent to treat disease, in particular genetic diseases, viral diseases or tumors. Use of methods and/or constructs of the invention for gene therapy involves for instance introducing a nucleic acid sequence that encodes a functional gene to repair a mutated gene, correcting a genetic mutation, or introducing a nucleic acid sequence that encodes a therapeutic protein.

As described hereinbefore and demonstrated in the examples, use of a method of the invention greatly reduces the ratio of illegitimate to legitimate recombination, i.e. greatly reduces the incidence of off-target and incorrect or multi-copy exogenous DNA inserts. As used herein, the term "legitimate recombination" refers to the integration of a donor nucleic acid sequence as defined herein in the target sequence in the genome of the targeted cell in the correct arrangement. Conversely, "illegitimate recombination" refers to the integration of a donor nucleic acid sequence as defined herein at a location in the genome of the cell other than within the target sequence or integration of said donor nucleic acid sequence in the target sequence of the genome in an incorrect structure or arrangement. An incorrect structure or arrangement of a donor nucleic acid sequence includes for instance the chromosomal integration of the donor nucleic acid sequence in multiple copies in tandem (i.e. concatemers) and/or the unpredictable incorporation of unwarranted viral- or prokaryotic-derived sequences at the target site.

Provided is therefore the use of a construct, set of constructs or virus particle according to the invention for HR-mediated genome engineering wherein the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence is more than 10. Said virus particle is preferably a nucleic acid-containing virus particle. As used herein, "homologous recombination" refers to the recombination of a first nucleic acid sequence with a target nucleic acid sequence in a genome of a cell with which the first nucleic acid molecule has homology, preferably a sequence identity of at least 90%, more preferably of at least 95%, more preferably of at least 98%, more preferably of at least 99%. Said sequence identity may be 100%. Said HR preferably comprises a method for modifying a target sequence in a genome of a cell according to the invention. "The ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence" as used herein is defined as:

the number of cells wherein legitimate recombination occurred / the number of cells wherein illegitimate recombination occurred.

This ratio thus represents:

the number of cells wherein the donor nucleic acid is integrated in the target sequence in the genome of the targeted cell in the correct arrangement / the number of cells wherein the donor nucleic acid is integrated at a location in the genome of the cell other than within the target sequence or wherein said donor nucleic acid sequence is integrated in the target sequence of the genome in an incorrect structure and/or arrangement.

A method of the invention therefore preferably comprises modifying a target sequence in a genome of at least 10 cells, and/or introducing an exogenous nucleic acid sequence in the genome of at least 10 cells, more preferably of at least 100 cells, more preferably at least 10^3 cells, more preferably at least 10^4 cells. It will be understood by the skilled person that reference to "a cell" as used herein includes both references to a single cell as well as to more than one cell. Preferably, the use of a construct, set of constructs or virus particle according to the invention for a method of HR-mediated genome engineering is provided wherein in said method the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence is more than 10 and wherein said method comprises engineering the genome of at least 10 cells, more preferably of at least 100 cells, more preferably at least 10^3 cells, more preferably at least 10^4 cells.
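The ratio defined above can be computed from per-clone classifications, as in the following sketch. The category labels, the function name and the clone counts are hypothetical and serve only to illustrate the definition; they are not data from the Examples.

```python
from collections import Counter

# Hypothetical labels for the outcome assigned to each characterized clone.
LEGITIMATE = {"on_target_correct"}
ILLEGITIMATE = {"off_target", "on_target_incorrect"}


def integration_ratio(clone_calls):
    """Cells with legitimate integration divided by cells with illegitimate integration."""
    counts = Counter(clone_calls)
    unknown = set(counts) - LEGITIMATE - ILLEGITIMATE
    if unknown:
        raise ValueError(f"unrecognized classification(s): {sorted(unknown)}")
    legit = sum(counts[label] for label in LEGITIMATE)
    illegit = sum(counts[label] for label in ILLEGITIMATE)
    if illegit == 0:
        # No illegitimate events observed: the ratio is unbounded for this sample.
        return float("inf")
    return legit / illegit


# Made-up example: 48 clones, 46 correct, 1 off-target, 1 concatemeric insert.
calls = ["on_target_correct"] * 46 + ["off_target", "on_target_incorrect"]
ratio = integration_ratio(calls)
print(ratio, ratio > 10)  # 23.0 True
```

With few screened clones, such a ratio is only a coarse estimate, which is one reason the description refers to modifying larger numbers of cells.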

Provided is a method for modifying a target sequence in a genome of a cell comprising:

- inducing a chromosomal DNA break, preferably a single-stranded or double- stranded DNA break, more preferably a double-stranded DNA break, at a site of interest in said target sequence;

- introducing into said cell a linear replication incompetent construct comprising i) a first nucleic acid sequence which is homologous to a first region of said target sequence,

ii) a second nucleic acid sequence which is homologous to a second region of said target sequence,

iii) a donor nucleic acid sequence located between said first and second nucleic acid sequences and

iv) a molecule attached to at least one terminus of said construct, whereby said first region is located on one side of said site of interest and said second region is located on the other side of said site of interest and wherein the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence is more than 10. Said method preferably comprises modifying a target sequence in a genome of at least 10 cells, more preferably of at least 100 cells, more preferably at least 10^3 cells, more preferably at least 10^4 cells.

The ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence can for instance be determined by isolating and screening individual randomly-selected cellular clones genetically modified with the donor DNA. The isolation of these clones can be done by limiting dilution or can be assisted by using cell sorting methodologies such as Fluorescence-Activated Cell Sorting and Magnetic-Activated Cell Sorting combined with fluorescent reporter proteins (e.g. enhanced green fluorescent protein) and cell-surface recombinant proteins (e.g. truncated nerve growth factor receptor), respectively (see, for example, refs. [10, 11]). Cellular clones can also be isolated by deploying marker gene/small-molecule drug combinations. Characterization of chromosomally inserted donor DNA in the various randomly-selected cellular clones (e.g. genomic position, copy number, and structure/arrangement) can be carried out by a multitude of complementary techniques known in the art which include: Southern blot analysis, PCR, inverse PCR and DNA sequencing of donor-chromosomal DNA junctions (centromeric- and telomeric-oriented).

In addition, the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence can also be determined by direct analysis of a target cell population of interest by using for instance Fluorescence In Situ Hybridization and digital PCR. The latter technique is particularly suitable for the absolute quantification of on-target versus off-target DNA insertion events directly in a whole target cell population. These data can be complemented by determining the structure and types of DNA inserts in the target cell population by Ion Torrent sequencing of PCR amplicons spanning the target site. Further, whole-genome sequencing (WGS) techniques can be deployed to obtain nucleotide-level resolution of the genetic modifications introduced, at a genome-wide scale, in target cells. The same methods can be used for determining whether or not a donor nucleic acid sequence has been introduced in the genome of a cell and for determining the percentage of cells having a donor nucleic acid sequence introduced in their genome in a population of cells.

Also provided is a method for HR comprising providing a cell with a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention, wherein the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence is more than 10.

Further provided is the use of a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for HR wherein the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence is more than 10.

Further provided is the use of a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for the preparation of a pharmaceutical composition for HR wherein the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence is more than 10.

Further provided is a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for use in HR wherein the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence is more than 10. Preferably, a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for use in a method of HR is provided wherein in said method the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence is more than 10 and wherein said method comprises modifying a target sequence in a genome of at least 10 cells, more preferably of at least 100 cells, more preferably at least 10^3 cells, more preferably at least 10^4 cells.

Preferably the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence in accordance with a method or use of the invention is more than 20, more preferably more than 50, more preferably more than 100, more preferably more than 1000, more preferably more than 10,000. Said HR-based genome editing preferably comprises a method for modifying a target sequence in a genome of a cell according to the invention. A method or use of the invention therefore preferably comprises modifying a target sequence and/or introducing a donor or exogenous nucleic acid sequence in a genome of at least 10 cells, more preferably of at least 100 cells, more preferably at least 10^3 cells, more preferably at least 10^4 cells. A method or use of the invention is particularly suitable for treating and/or preventing a genetic disease and/or for gene repair and/or for gene modification. Further provided is a method for modifying a target sequence in a genome of a cell according to the invention wherein said target sequence comprises a mutation, for treating or preventing a genetic disease in an individual. As used herein, an "individual" refers to a human or an animal, preferably an animal that can be affected by a genetic disease, such as humans, non-human primates, rodents, ovines, bovines, canines, felines, ruminants and other mammals, birds, insects, fish and reptiles, and other vertebrates. Preferably said individual is a human. As used herein "treating a genetic disease" includes counteracting, inhibiting or curing a genetic disease and/or alleviating, inhibiting or abolishing symptoms resulting from a genetic disease. In treatment or prevention of a genetic disease, the target sequence for modification in accordance with a method of the invention is a gene underlying a genetic disease, said gene having one or more mutations.
The genetic disease may be the result of a point mutation in a gene in an individual or due to a frame-shift, deletion, duplication or recombination of said gene. A single-gene or monogenetic disease is caused by mutation(s) in a single gene. Genetic diseases may also be polygenic, meaning they are associated with the effects of one or more mutation in multiple genes. Methods of the invention are particularly useful for gene repair wherein the target sequence is a gene underlying a genetic disease. "Gene repair" as used herein refers to a modification of a gene in the genome of a target cell to restore the correct function of said gene that comprises one or more mutations and as a result thereof has reduced or lost function or has gained an unwanted function.

Genetic diseases that can be treated or prevented with such method include, but are not limited to, diseases selected from the group consisting of hemophilia B, hemophilia A, Duchenne muscular dystrophy, cystic fibrosis, thalassemia, sickle cell anemia, X-linked severe combined immunodeficiency (SCID), ADA-SCID, Wiskott-Aldrich syndrome, epidermolysis bullosa dystrophica, epidermolysis bullosa junctional, RAG-1 deficiency SCID, RAG-2 deficiency SCID, metachromatic leukodystrophy, limb-girdle muscular dystrophy (type 2C), limb-girdle muscular dystrophy (type 2A), X-linked chronic granulomatous disease, and glycogen storage disease II. The genes that are modified to repair a mutation in said gene with a method of the invention for treatment or prevention are the following: F9 (hemophilia B); F8 (hemophilia A); DMD (Duchenne muscular dystrophy); CFTR (cystic fibrosis); beta-globin (thalassemia); hemoglobin/HBB (sickle cell anemia); IL2RG (X-linked severe combined immunodeficiency [SCID]); ADA (ADA-SCID); WAS (Wiskott-Aldrich syndrome); COL7A1 (epidermolysis bullosa dystrophica); LAMC2, LAMB3, COL17A1 or ITGB4 (epidermolysis bullosa junctional); RAG-1 (RAG-1 deficiency SCID); RAG-2 (RAG-2 deficiency SCID); ARSA (metachromatic leukodystrophy); SGCG (limb-girdle muscular dystrophy, type 2C); CAPN3 (limb-girdle muscular dystrophy, type 2A); CYBB (X-linked chronic granulomatous disease); GAA (glycogen storage disease II).

Also provided is therefore a method for modifying a target sequence in a genome of a cell according to the invention for gene repair.

Also provided is a method for treatment or prevention of a genetic disease comprising administering to an individual in need thereof a therapeutic amount of a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention.

Further provided is the use of a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for modifying a target sequence in a genome of a cell for treatment or prevention of a genetic disease.

Further provided is the use of a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for the preparation of a pharmaceutical composition for modifying a target sequence in a genome of a cell and/or for treatment or prevention of a genetic disease.

Further provided is a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for use in modifying a target sequence in a genome of a cell. Also provided is a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for use in the treatment or prevention of a genetic disease.

In a method or use of the invention for treatment or prevention of a genetic disease the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence in accordance with a method or use of the invention is preferably more than 10, preferably more than 20, more preferably more than 50, more preferably more than 100, more preferably more than 1000, more preferably more than 10,000.

A method or use of the invention is particularly suitable for targeted gene addition. Targeted gene addition has many applications in both gene therapy and cell engineering. The invention therefore further provides a method for modifying a target sequence in a genome of a cell for targeted gene addition and/or for introducing an exogenous nucleic acid sequence of interest into the genome of said cell. Preferably, targeted gene addition comprises introducing a donor nucleic acid sequence, preferably DNA that encodes a protein, a therapeutic protein or a non-coding RNA molecule. In a method for targeted gene addition, said target sequence for modification preferably is a (genomic) safe harbor locus. Non-limiting examples of such safe harbor loci are AAVS1, CCR5, CCR2, the ROSA26 locus, FUT8, DMD21, SH6 and house-keeping genes such as HPRT1, GAPDH and DHFR. A preferred (extragenic) safe harbor locus is a genomic locus that fulfils the following criteria: i) a distance of >50 kb from the 5' terminus of any gene, ii) a distance of >300 kb from cancer-related genes, iii) a distance of >300 bp from any microRNA, iv) the locus is located outside a gene transcription unit, and v) the locus is located outside an ultra-conserved region [2].
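By way of illustration only, the five criteria for a preferred (extragenic) safe harbor locus listed above can be expressed as a simple screening check. The following Python sketch is not part of the claimed method; the record type `LocusAnnotation`, its field names and the function names are hypothetical, and the distances are assumed to have been computed beforehand from a genome annotation. The default thresholds are taken as stated in the text and can be overridden.

```python
from dataclasses import dataclass


@dataclass
class LocusAnnotation:
    """Hypothetical annotation record for one candidate locus (illustrative names)."""
    bp_to_nearest_gene_5prime: int      # distance to the 5' terminus of the nearest gene
    bp_to_nearest_cancer_gene: int      # distance to the nearest cancer-related gene
    bp_to_nearest_microrna: int         # distance to the nearest microRNA
    inside_transcription_unit: bool     # True if the locus lies within a gene transcription unit
    inside_ultraconserved_region: bool  # True if the locus lies within an ultra-conserved region


def safe_harbor_checks(locus, min_gene_bp=50_000, min_cancer_gene_bp=300_000,
                       min_microrna_bp=300):
    """Evaluate criteria i) to v); default thresholds follow the text as written."""
    return {
        "i": locus.bp_to_nearest_gene_5prime > min_gene_bp,
        "ii": locus.bp_to_nearest_cancer_gene > min_cancer_gene_bp,
        "iii": locus.bp_to_nearest_microrna > min_microrna_bp,
        "iv": not locus.inside_transcription_unit,
        "v": not locus.inside_ultraconserved_region,
    }


def is_extragenic_safe_harbor(locus, **thresholds):
    """A locus qualifies only if all five criteria are fulfilled."""
    return all(safe_harbor_checks(locus, **thresholds).values())


# Made-up example values, for demonstration only.
candidate = LocusAnnotation(80_000, 450_000, 12_000, False, False)
print(safe_harbor_checks(candidate))
print(is_extragenic_safe_harbor(candidate))
```

Returning the per-criterion results in addition to the overall verdict makes it possible to see which criterion a rejected locus fails.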

Also provided is a method for targeted gene addition comprising administering to a cell a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention. Further provided is a method for targeted gene addition in an individual, preferably in an individual's cells, comprising administering to said individual, preferably to said individual's cells, a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention. Such method preferably comprises a method for modifying a target sequence in the genome of a cell according to the invention.

Further provided is the use of a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for modifying a target sequence in a genome of a cell for targeted gene addition.

Further provided is the use of a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for the preparation of a pharmaceutical composition for targeted gene addition.

Further provided is a construct, set of constructs or (nucleic acid-containing) virus particle according to the invention for use in targeted gene addition.

In a method or use of the invention for targeted gene addition the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence in accordance with a method or use of the invention is preferably more than 10, preferably more than 20, more preferably more than 50, more preferably more than 100, more preferably more than 1000, more preferably more than 10,000.

Methods for modifying a target sequence in the genome of a cell according to the invention and constructs or sets of constructs according to the invention are also particularly suitable for research purposes. Examples of such applications include the study of gene function, which is for instance achieved by targeted gene inactivation by deleting or adding one or more nucleotides of a gene of interest or by overexpressing a gene of interest by gene addition, e.g. insertion of an exogenous gene into the genome of a cell, for instance at a safe harbor locus. Examples of modifications of the target genome for research purposes include, but are not limited to, the introduction of mutations into a gene or the deletion of one or more parts of a gene or of an entire gene or regulatory elements of a gene for analysis of the effect on gene function, or the modification of mutations or polymorphisms in a gene to determine which mutation or polymorphism is causative of a physiological or biochemical effect(s) or phenotype(s). Methods for modifying a target sequence in the genome of a cell according to the invention and constructs or sets of constructs according to the invention are also particularly suitable for engineering cell lines for biotechnology applications, such as for instance for enhanced production of proteins, metabolites and vaccines. Methods for modifying a target sequence in the genome of a cell according to the invention and constructs or sets of constructs according to the invention are also particularly suitable for cell engineering to create synthetic gene circuits, modify metabolic pathways and reprogram cell phenotypes such as by controlling cellular differentiation pathways. Methods for modifying a target sequence in the genome of a cell according to the invention and constructs or sets of constructs according to the invention are also particularly suitable for the generation of transgenic prokaryotes (e.g. Escherichia coli, Bacillus subtilis, Mycoplasma genitalium, Synechocystis and Pseudomonas), transgenic protists (e.g. Chlamydomonas, Dictyostelium discoideum, Tetrahymena and Plasmodium), transgenic fungi (e.g. Saccharomyces cerevisiae, Neurospora crassa and Schizosaccharomyces pombe), transgenic plants (e.g. Arabidopsis, sorghum, rice, Lotus japonicus and Brachypodium) and transgenic animals (e.g. Drosophila, C. elegans, zebrafish, mice, rats, cows, pigs and non-human primates).

The invention further provides a method for modifying a target sequence in a genome of a cell, the method comprising providing a construct according to the invention or a set of constructs according to the invention to said cell.

The invention further provides a cell, comprising a genomic modification that is produced by a method of the invention. In said cell the ratio of legitimate versus illegitimate integration of a donor nucleic acid sequence as defined herein is more than 10, preferably more than 20, more preferably more than 50, more preferably more than 100, more preferably more than 1000, more preferably more than 10,000. Most preferably said cell comprises only the modification that was present on the linear replication incompetent construct and that was introduced following recombination of said construct into the target sequence, such as an insertion of one or more base pairs into the target sequence or a deletion of one or more base pairs from the target sequence.

Now that the present invention provides HR-based genome engineering methods that lead to a higher specificity and accuracy of the DNA editing process, populations of cells can be obtained that have a higher level of legitimate integration of donor nucleic acid in their genome as compared to methods known in the art. Provided is therefore a population of cells comprising cells having a donor nucleic acid sequence in their genome, wherein the ratio of cells having legitimate integration of said donor nucleic acid in their genome versus cells having illegitimate integration of said donor nucleic acid in their genome is more than 10. Said donor nucleic acid sequence is introduced in said genome. Preferably the ratio of cells having legitimate integration of said donor nucleic acid in the genome versus cells having illegitimate integration of said donor nucleic acid in the genome is more than 20, more preferably more than 50, more preferably more than 100, more preferably more than 1000, more preferably more than 10,000. Hence, up to approximately 9% (1 in 11) of the cells of a cell population of the invention may have illegitimate integration of donor nucleic acid in their genome. Therefore, a population of cells according to the invention is provided comprising cells having a donor nucleic acid sequence introduced in their genome, wherein between 0.01% and 9% of said cells have illegitimate integration of said donor nucleic acid sequence in their genome, preferably between 0.01% and 5% of said cells, more preferably between 0.01% and 3% of said cells, more preferably between 0.01% and 1% of said cells, most preferably between 0.01% and 0.1% of said cells.
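The relation between the ratio thresholds and the percentages given above can be illustrated with a short calculation. The following Python sketch assumes, for illustration only, that every donor-containing cell is counted as having either legitimate or illegitimate integration, so that a legitimate:illegitimate ratio r corresponds to an illegitimate fraction of 1/(1 + r); the function name is hypothetical. Under this assumption a ratio of 10 gives 1/11, i.e. approximately 9%, and a ratio of 10,000 gives approximately 0.01%.

```python
def illegitimate_fraction(ratio):
    """Fraction of donor-containing cells with illegitimate integration,
    assuming each such cell is scored as either legitimate or illegitimate."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / (1.0 + ratio)


# Ratio thresholds named in the description.
for ratio in (10, 20, 50, 100, 1000, 10000):
    percent = 100 * illegitimate_fraction(ratio)
    print(f"ratio {ratio}: {percent:.2f}% illegitimate integration")
```

Because the ratio is stated as "more than" each threshold, each computed percentage is an upper bound for the corresponding embodiment.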

A population of cells of the invention is for instance a population of cells obtained directly after performing an HR-based genome engineering method, preferably a method of the invention. Preferably at least 2% of said cells in said population have said donor nucleic acid sequence in their genome, more preferably at least 3% of said cells, more preferably at least 4% of said cells, more preferably at least 5% of said cells, more preferably at least 7% of said cells, more preferably at least 10% of said cells. It is also possible to isolate, from a population of cells obtained directly after performing an HR-based genome engineering method of the invention, cells that have a donor nucleic acid sequence in their genome. It is therefore possible to obtain a population of cells according to the invention wherein up to 100% of the cells have said donor nucleic acid sequence in their genome, preferably up to 100% of the cells of the population have legitimate integration of said donor nucleic acid sequence in their genome. The methods of the invention allow for rapidly enriching for the desired genome-modified cell population. The precision and accuracy of integration of donor nucleic acid of a method of the invention reduce the dependency on time-consuming screening of genetically modified populations to identify properly targeted cells. The need for isolating, identifying and expanding monoclonal cell populations when using previously known methods, which generally yield a higher number of cells having illegitimate integration of donor nucleic acid in their genome, is by-passed.

A population of cells according to the invention is preferably a polyclonal cell population. The term "polyclonal population of cells" refers to a population of cells that contains cells with a donor nucleic acid sequence integrated into their genome, originating from more than one, preferably more than 10, more preferably more than 100 independent HR events.

The donor nucleic acid sequence is preferably introduced in the genome of cells of a population of cells using a method of the invention. Further, said donor nucleic acid sequence is preferably introduced into the cells of said population of cells using a linear replication incompetent construct according to the invention. Cells of said population that contain the donor nucleic acid sequence in their genome may therefore comprise a residual linear replication incompetent construct according to the invention or part thereof. A population of cells according to the invention is thus provided comprising cells having a donor nucleic acid sequence in their genome, wherein at least 1% of said cells comprise a linear replication incompetent construct according to the invention or part thereof, preferably at least 2% of said cells, more preferably at least 5% of said cells, more preferably at least 10% of said cells. Said part for instance comprises the molecule attached to at least one terminus of said construct, and optionally further comprises said terminus. Up to 100% of said cells having a donor nucleic acid sequence in their genome may contain said linear replication incompetent construct according to the invention or part thereof. Said linear replication incompetent construct is preferably a linear replication incompetent viral vector, more preferably a double-stranded linear replication incompetent viral vector, more preferably a vector selected from the group consisting of an adenoviral vector, a herpes viral vector, an adeno-associated viral vector, a retroviral vector, a vaccinia viral vector and a bacteriophage vector, such as Phi29, Bam35, Nf, PRD1 or Cp-1, most preferably an adenoviral vector.

A population of cells according to the invention is for instance an ex vivo population of cells. Alternatively, a population of cells according to the invention is an in vitro population of cells. In yet another embodiment a population of cells according to the invention is an in vivo population of cells.

A population of cells according to the invention is particularly suitable for use in transplantation into an individual. Provided is therefore a population of cells for use in a method of treatment or prevention of a disease whereby said cells are transplanted into said individual. Said individual is preferably a human. Said disease is preferably a genetic disease, such as a disease selected from the group consisting of hemophilia B, hemophilia A, Duchenne muscular dystrophy, cystic fibrosis, thalassemia, sickle cell anemia, X-linked severe combined immunodeficiency (SCID), ADA-SCID, Wiskott-Aldrich syndrome, epidermolysis bullosa dystrophica, epidermolysis bullosa junctional, RAG-1 deficiency SCID, RAG-2 deficiency SCID, metachromatic leukodystrophy, limb-girdle muscular dystrophy (type 2C), limb-girdle muscular dystrophy (type 2A), X-linked chronic granulomatous disease, and glycogen storage disease II. Preferably the cells of said population are derived from said individual before a donor nucleic acid sequence is introduced in the genome of the cells. Said population of cells that is transplanted into said individual is preferably enriched for cells containing the donor nucleic acid sequence in their genome after performing an HR-based method of the invention. Hence, preferably at least 25% of said cells in said population have said donor nucleic acid sequence in their genome, more preferably at least 50% of said cells, more preferably at least 70% of said cells, more preferably at least 80% of said cells, more preferably at least 90% of said cells in said population have said donor nucleic acid sequence in their genome. In a particularly preferred embodiment essentially all cells of said population of cells contain said donor nucleic acid sequence in their genome. Also provided is a method of treating or preventing a disease, preferably a genetic disease, comprising administering to an individual in need thereof a population of cells according to the invention. Also provided is the use of a population of cells according to the invention for the preparation of a medicament for the treatment or prevention of a disease, preferably a genetic disease, in an individual. Said individual is preferably a human.

A population of cells according to the invention is further particularly suitable for expressing a protein or (poly)peptide encoded by the donor nucleic acid sequence, in particular if it is essential to be able to determine the exact location of the encoding nucleic acid that is introduced in the genome of the cells. Cells of a population of cells according to the invention are particularly suitable for such application because it is possible to exactly and with high accuracy determine the location in the genome of the cells where the donor nucleic acid is introduced and the donor nucleic acid is introduced with high accuracy and specificity. Provided is therefore the use of a population of cells according to the invention, wherein the donor nucleic acid encodes a protein, polypeptide or peptide of interest, for expressing said protein, polypeptide or peptide encoded by said donor nucleic acid.

A population of cells according to the invention is further particularly suitable for generating isogenic cellular substrates that genetically differ among each other exclusively in a well-defined genomic region or in a few nucleotides or in a single nucleotide. Such isogenic cells, which may include induced pluripotent stem cells (iPS), can, for instance, serve as model systems to study disease phenotypes in vitro and to screen libraries of small-molecule drugs to isolate compounds that revert or ameliorate said disease phenotypes. Provided is therefore the use of a population of cells according to the invention for generating isogenic cellular substrates that genetically differ among each other exclusively in a well-defined genomic region or in a few nucleotides or in a single nucleotide. A few nucleotides for instance refers to up to 100 nucleotides, preferably up to 50, more preferably up to 25, more preferably up to 10, such as 2, 3, 4, 5, 6, 7, 8, 9 or 10 nucleotides.

Also provided is a pharmaceutical composition comprising a construct according to the invention, a set of constructs according to the invention and/or a (nucleic acid-containing) virus particle according to the invention. Said pharmaceutical composition preferably further comprises a suitable pharmaceutical carrier, diluent and/or excipient. A pharmaceutical composition according to the invention is preferably suitable for human use.

The invention further provides a method for producing a cell comprising a modified genome, preferably at least one modified gene, the method comprising providing a construct according to the invention or a set of constructs according to the invention to said cell, and selecting a cell in which the genome has been modified at the target sequence and that functionally expresses a recombined selection marker. Said method preferably comprises inducing an inducible promoter for expression of the endonuclease, thereby inducing a site-specific DNA break in the target sequence.

Features may be described herein as part of the same or separate aspects or embodiments of the present invention for the purpose of clarity and a concise description. It will be appreciated by the skilled person that the scope of the invention may include embodiments having combinations of all or some of the features described herein as part of the same or separate embodiments.

The invention will be explained in more detail in the following, non-limiting examples.

Brief description of the drawings

Figure 1. Diagram of the gene targeting strategies for the recombinant eGFP allele and the native AAVS1 safe harbor locus whose sequence is embedded within that of PPP1R12C. Illustrations of designer ZFN and TALEN proteins composed of sequence-tailored zinc finger arrays and transcription activator-like repeats, respectively, fused to the nuclease domain of FokI. The ZFN and TALEN monomers are depicted in relation to their cognate half target sites (upper case). These bipartite sequences frame the spacer elements (lower case) whose sequences are cleaved upon the local assembly of nuclease pairs and ensuing dimerization-dependent FokI catalysis. EF1α promoter, human EEF1A1 regulatory sequences; SUR, SV40 5'UTR (SU) and R region (R) from HTLV-1; eGFP, reporter-encoding ORF; GHpA, human GH-1 polyadenylation signal; Ψ, HIV-1 packaging signal; white and grey boxes, 5' and 3' retroviral long terminal repeat sequences, respectively. IDLV.donor eGFP and IDLV.donor S1 genomes contain HR substrates consisting of a reporter expression unit flanked by sequences identical to those bracketing the ZFN and TALEN target sites, respectively. The expression unit in IDLV.donor eGFP consists of the hybrid CAG promoter, the FP635 (a.k.a. Katushka) ORF and the bovine GH-1 polyadenylation signal, whereas that in IDLV.donor S1 comprises the human PGK-1 promoter, the eGFP ORF and the bovine GH-1 polyadenylation signal.

Figure 2. Gene targeting of IDLV-delivered donor DNA by using ZFNs and TALENs. (a) IDLV gene targeting in indicator H27 cells. Target cells were transduced with IDLV.donor eGFP alone (-) or together with LV.ZFN-1 eGFP and LV.ZFN-2 eGFP (+). Flow cytometry was performed at 35 days post-infection. H27 cells genetically modified through eGFP-targeted exogenous DNA integration or via NHEJ-mediated eGFP disruption plus off-target exogenous DNA insertion acquire an eGFP-/FP635+ phenotype (red bar), whereas those subjected exclusively to off-target insertion events become marked as eGFP+/FP635+ (orange bars). Representative fluorescence micrographs corresponding to the differentially marked H27 populations are shown next to their respective pictograms. (b) IDLV gene targeting in myoblasts by using TALENs. Target cells were co-transduced with IDLV.donor S1, AdV.TALEN-L S1 and AdV.TALEN-R S1 (+/L/R). Cultures exposed to IDLV.donor eGFP alone (+/-) or mixed with AdV.TALEN-L S1 (+/L) served as controls. Flow cytometry was carried out at 27 days post-infection. Plots correspond to mean ± s.d. (n=3; P < 0.0001). (c) PCR analyses of myoblast clones targeting HR-derived junctions between exogenous and native AAVS1 chromosomal DNA. Upper panel, diagram of PCR assays diagnostic for HR-derived "telomeric" and "centromeric" IDLV DNA-AAVS1 junctions (jT and jC, respectively) and head-to-tail IDLV DNA concatemers (jH-T). Lower panel, PCR screening of eGFP+ myoblast clones by using the PCR assays depicted in the upper panel. Open arrowheads, IDLV integrants lacking HR-derived junctions at either one or both termini; solid arrows, integrants comprising head-to-tail concatemers. (d) Cumulative molecular characterization of eGFP+ myoblast clones (shown in Fig. 2c and Fig. 3a) isolated from cultures exposed to the AAVS1-specific TALEN pair and IDLV.donor S1 DNA (Fig. 2b, right-hand bar).

Figure 3. Molecular characterization of myoblasts genetically modified by using TALENs and IDLV donor DNA. (a) PCR analyses of independent eGFP+ myoblast clones derived from cultures exposed to AAVS1-specific TALENs and IDLV.donor S1 DNA. The presented panel complements that of Fig. 2c with the aggregate data being plotted in Fig. 2d. Marker, GeneRuler DNA Ladder Mix molecular weight marker (Thermo Scientific). Open arrowheads, IDLV DNA integrants lacking HR-derived junctions either at one or both termini; solid arrows, IDLV DNA integrants comprising head-to-tail (H-T) concatemers. (b) PCR analyses of long-term eGFP+ myoblast sorted populations that were originally transduced with IDLV.donor S1 in the absence of TALEN gene expression (IDLV+ TALEN-). The genomic DNA (gDNA) from these long-term cultures was subjected to PCR amplifications to detect "telomeric" and "centromeric" HR-derived junctions between AAVS1 and exogenous DNA (jT and jC, respectively). Nuclease-free water and genomic DNA of unmodified myoblasts served as negative control samples, whilst genomic DNA extracted from an eGFP+ myoblast clone targeted at AAVS1 provided for positive controls. Marker, GeneRuler DNA Ladder Mix molecular weight marker.

Figure 4. Schematic representation of exogenous DNA forms found in myoblasts genetically modified by deploying TALENs and IDLV donor DNA. Overview and nomenclature of IDLV.donor S1 DNA arrangements identified by PCR screening of eGFP+ myoblast clones retrieved from populations co-transduced with AdV.TALEN-L S1 and AdV.TALEN-R S1. For an explanation of the diagrams and symbols see the legends of Fig. 1 and Fig. 2c. This analysis showed that, in addition to integrants displaying bona fide HR-derived "telomeric" and "centromeric" AAVS1-exogenous DNA junctions (class I), there were also those lacking either one (classes IIa and IIb) or both (class III) of these junctions. Moreover, clones consistent with the aforementioned classes and displaying, in addition, a direct repeat concatemeric arrangement could also be identified (classes I^c, IIa^c, IIb^c and III^c).

Figure 5. Gene targeting in human myoblasts with AdV-delivered donor DNA and chromosomal site-specific DSBs. (a) AdV gene targeting in myoblasts by deploying AAVS1-specific TALENs. Target cells were co-transduced with AdV.Δ2.donor S1, AdV.TALEN-L S1 and AdV.TALEN-R S1 (+/L/R). Cultures exposed exclusively to AdV.Δ2.donor S1 (+/-) or to AdV.Δ2.donor S1 plus AdV.TALEN-L S1 (+/L) served as negative controls. Flow cytometry was performed at 45 days post-transduction. Plots correspond to mean ± s.e.m. (P < 0.0001). (b) Coefficient of variation (CV) values of eGFP+ populations exposed to AAVS1-specific TALENs and HR substrates transferred via AdV.Δ2.donor S1 versus IDLV.donor S1. Clones containing an AAVS1-targeted donor S1 DNA copy (n=10) and eGFP+ populations with randomly inserted IDLV.donor S1 DNA served as controls (left and right bars, respectively). (c) Distribution of MFI values of eGFP+ clones retrieved from cultures stably transduced by using TALENs and IDLV.donor S1 (grey bars) or by deploying TALENs and AdV.Δ2.donor S1 (black bars). Inset, boxplot of the cumulative MFI values corresponding to both series of myoblast clones analyzed. (d) PCR screening of eGFP+ clones to detect HR-derived exogenous DNA-AAVS1 junctions. Water and mock-transduced myoblasts served as negative controls, whereas IDLV.donor S1-targeted myoblast clone 4 and plasmid pSh.AAVS1.eGFP (shuttle) provided for positive controls. (e) Cumulative molecular characterization of randomly selected eGFP+ myoblasts isolated from cultures co-transduced with AdV.Δ2.donor S1, AdV.TALEN-L S1 and AdV.TALEN-R S1. (f) Probing by PCR the assembly of head-to-tail AdV DNA concatemers in transduced myoblasts. Upper panel, in vitro-generated head-to-tail AdV DNA junctions (jH-T). L-ITR and R-ITR, "left" and "right" AdV ITR, respectively; half arrows, primers; horizontal bar, head-to-tail-specific amplicon. Lower panel, PCR analyses of amplicons resulting from in vitro-generated head-to-tail templates and genomic DNA from eGFP+ myoblast pools genetically modified by using AAVS1-specific TALENs and AdV.Δ2.donor S1 DNA.

Figure 6. Southern blot analyses of human myoblasts stably transduced following AdV-mediated delivery of TALENs and donor DNA. NcoI-digested genomic DNA was resolved through agarose gels, blotted and incubated with an AAVS1-specific probe (short horizontal black bar). Left-hand panel, characteristic autoradiogram showing myoblast clones displaying the typical chromosomal integration pattern of AdV-delivered donor DNA resulting from HR events at AAVS1 (solid arrowhead). Middle panel, autoradiogram displaying the DNA from myoblast clone 10 subjected to bi-allelic HR-mediated gene targeting at AAVS1 loci (vertical open arrow). Right-hand panel, autoradiogram exhibiting the DNA corresponding to myoblast clone 58 whose restriction pattern is consistent with an additional, HR-independent, insertion event presumably brought about by the incorporation of AdV.Δ2.donor S1 backbone sequences (horizontal open arrow). Lanes M, GeneRuler DNA Ladder Mix molecular weight marker. Lane C, genomic DNA from unmodified human myoblasts.

Figure 7. Nuclease-mediated gene targeting of AdV donor DNA in genetically unstable human cervix carcinoma HeLa cells. (a) Stable transduction of HeLa cells following TALEN-induced DSB formation at AAVS1 and donor DNA delivery by AdV.Δ1.donor S1. Target cells were transduced with AdV.Δ1.donor S1 alone or with AdV.Δ1.donor S1 mixed with AdV.TALEN-L S1 and AdV.TALEN-R S1 (L/R). Flow cytometry was done at 24 days post-infection. Plotted data correspond to mean ± s.d. (P < 0.0001). (b) Representative flow cytometry dot plots corresponding to long-term HeLa cell cultures exposed to AdV.Δ1.donor S1 and AdV.TALEN-R S1 (R) or to AdV.Δ1.donor S1 plus a combination of AdV.TALEN-L S1 and AdV.TALEN-R S1 (L/R). Owing to a bi-cistronic expression unit, cells transduced with TALEN-encoding AdVs become tagged with DsRedEx2.1. The resulting dual-color dot plots highlight the selective maintenance of the eGFP-containing donor DNA in long-term cultures. (c) Probing by PCR the assembly of head-to-tail AdV DNA concatemers in stably transduced HeLa cells. For a description of the elements shown in the upper panel see the legend of Fig. 5f. Lower panel, agarose gel electrophoresis of PCR mixtures resulting from amplifications on in vitro-generated head-to-tail AdV DNA and on genomic DNA isolated from eGFP+ HeLa cell populations genetically modified by using AAVS1-specific TALENs and AdV.Δ1.donor S1 DNA (gDNA donor S1+ pool). (d) Cumulative data corresponding to the molecular characterization of randomly selected eGFP+ HeLa cells isolated from cultures co-transduced with AdV.Δ1.donor S1, AdV.TALEN-L S1 and AdV.TALEN-R S1 (Fig. 7e). jT and jC, "telomeric" and "centromeric" junctions, respectively. (e) PCR screening of eGFP+ HeLa cell clones to identify HR-derived junctions between exogenous and native chromosomal target DNA. Cellular DNA from mock-transduced HeLa cells, eGFP+ HeLa cell populations from which the clones were isolated (pool) and from eGFP + and donor DNA H27 cells served as controls.

Figure 8. Nuclease-mediated gene targeting of donor DNA associated with or segregated from AdV genomes. (a) Testing the effect of donor DNA eviction from AdV genomes on the frequency of HR-driven gene targeting. Upper panel, experimental strategy for the TALEN-mediated excision of HR templates from AdV DNA in transduced cells. AdV.Δ2.donor S1/T-TS, AdV carrying donor S1 DNA framed by target sequences for the AAVS1-specific TALENs (solid vertical arrowheads); open oval, terminal protein (TP) covalently attached to the 5' termini of AdV DNA; Ψ, AdV packaging signal; exo., exogenous DNA; HR, homologous recombination. Lower panel, PCR screening of eGFP+ HeLa cell clones to detect DNA junctions formed by HR events between AdV.Δ2.donor S1/T-TS DNA and the "centromeric" side of the AAVS1 locus. Open arrowheads, clones harboring integrants lacking HR-derived "centromeric" junctions. (b) Model for high-fidelity genome editing based on site-specific chromosomal DSBs and AdV-delivered donor DNA. Left-hand panel, viral vector genomes with free ends are readily sensed in transduced cells as broken DNA. After being co-opted by illegitimate recombination pathways, these genomes can be rerouted to off-target sites suffering spontaneous DSBs. The precision of the genome editing process can be further compromised by the chromosomal insertion of concatemeric vector DNA forms. Right-hand panel, conversely, protein-capped AdV genomes make interactions between donor templates and off-target DSBs less probable, with the pairing of, and single-strand DNA invasion at, the shared endogenous and exogenous DNA sequences conferring the very high target site specificity of AdV-delivered donor DNA.

Figure 9. Generic arrangements of possible terminally-capped linear nucleic acid constructs in accordance with the invention. The oval indicates the capping molecule; the arrow indicates linear constructs with capping moieties at both termini.

Figure 10. Examples of terminal proteins and of precursor terminal proteins from adenoviruses, phage terminal proteins and Streptomyces terminal proteins that can be used as capping molecules in accordance with the invention, and amino acid sequences thereof. Potential nuclear localization signal (NLS) sequences in phage terminal proteins are highlighted as indicated in ref. [12]. Potential NLS sequences in Streptomyces terminal proteins are highlighted as indicated in ref. [3].

Figure 11. Nuclease-induced gene targeting following AdV- versus plasmid-mediated delivery of HR substrates. (a) TALEN and RGN target sites at AAVS1. Recognition sequences for the TALEN-L S1:TALEN-R S1 and Cas9:gRNA S1 nuclease complexes drawn in relation to the PPP1R12C locus in which they are embedded. The target sites of the TALEN pair and the RNA-guided nuclease are shown in upper case and boxed, respectively. The protospacer adjacent motif (PAM) is shaded. Open vertical arrowheads indicate the Cas9 cleavage site. (b) MFI distribution of randomly selected eGFP+ HeLa cell clones genetically modified by Cas9:gRNA S1 and AdV.Δ2.donor S1 (TP-capped) or by TALEN-L S1:TALEN-R S1 (L/R) and either pAdV.donor S1 (supercoiled), PacI-linearized pAdV.donor S1 (in vitro-linearized) or pAdV.donor S1/T-TS (in vivo-linearized). (c) Boxplot of cumulative MFI values corresponding to the eGFP+ HeLa cell clone series presented in panel (b). (d) Frequencies of eGFP+ clones lacking the AAVS1-donor DNA "telomeric" junction. These cumulative data were obtained by PCR screening of the HeLa cell clones depicted in Figs. 15, 16, 17 and 7e.

Figure 12. Genotyping assay for validating AdV.Cas9 and AdV.gRNA S1 in HeLa cells. PCR products spanning the AAVS1 target region (Fig. 11a) were amplified from genomic DNA of HeLa cells transduced with AdV.Cas9 alone at an MOI of 300 TCID50/cell (Cas9) or with 1:1 mixtures of AdV.Cas9 and AdV.gRNA S1 at a total MOI of 60, 120, 180, 240 and 300 TCID50/cell. After amplicon denaturation/re-annealing, base pair mismatches (indels) resulting from NHEJ-mediated DSB repair were detected by T7 endonuclease I (+T7EI) digestions. Amplicons not exposed to T7EI (-T7EI) served as negative controls. Solid and open arrowheads indicate the positions of, respectively, undigested and T7EI-digested DNA fragments.

Figure 13. Designer nuclease-induced genetic modification of target cells following delivery of HR substrates in plasmid versus AdV DNA. HeLa cells were either co-transduced with AdV.Cas9, AdV.gRNA S1 and AdV.Δ2.donor S1 (first panel) or were co-transfected with the plasmid pair encoding the AAVS1-specific TALENs plus PacI-linearized pAdV.donor S1, covalently-closed pAdV.donor S1/T-TS or covalently-closed pAdV.donor S1 (second, third and fourth panel, respectively). Parallel HeLa cell cultures that were co-transduced with AdV.Cas9 and AdV.Δ2.donor S1 or that were co-transfected with the TALEN-L S1-encoding expression construct mixed with each of the three donor plasmid types served as negative controls. The frequency of eGFP+ cells present in the various long-term HeLa cell cultures was determined by flow cytometry at 23 days after transgene delivery and is indicated within each dot plot. Pictograms of the various types of donor DNA templates deployed in these experiments are drawn next to their respective dot plots. Thin parallel lines, double-stranded DNA; Ψ, AdV packaging signal; horizontal black, white and grey bars, donor S1 DNA composed of AAVS1 targeting sequences framing the eGFP-encoding transgene (exo.); TP, terminal protein covalently attached at the 5' termini of the AdV DNA; vertical arrowheads, position of the recognition sequences for the restriction enzyme PacI and the designer nuclease dimer complex TALEN-L S1:TALEN-R S1; ori and KanR, prokaryotic origin of replication and kanamycin-resistance gene, respectively.

Figure 14. Determining transgene expression profiles by flow cytometric analyses of genetically modified HeLa cell clones. The various eGFP+ HeLa cell clones were generated by using the AAVS1-specific TALENs plus covalently-closed pAdV.donor S1/T-TS (first graph), PacI-linearized pAdV.donor S1 (second graph) or covalently-closed pAdV.donor S1 (third graph) or were made instead by deploying the AAVS1-specific RGN complex plus protein-capped AdV.Δ2.donor S1 DNA (fourth graph). Plotting the relationship between the CV and the MFI parameters for each individual clone (solid circles) highlights the higher homogeneity of transgene expression found in the clonal set modified with AdV donor DNA.

Figure 15. Gene targeting analyses of HeLa cells genetically modified by using TALENs and free-ended linear donor plasmids. PCR screening for detecting HR-mediated targeting of donor DNA delivered in an in cellula linearized plasmid was carried out on DNA from eGFP+ HeLa cell clones isolated from cultures subjected to the transfer of AAVS1-specific TALENs and pAdV.donor S1/T-TS. PCR amplifications targeting eGFP served as internal controls for the presence and integrity of DNA templates. Vertical arrowheads indicate clones lacking the targeted exogenous DNA-AAVS1 junction. Nuclease-free water and genomic DNA of mock-transduced HeLa cells served as negative controls, whilst genomic DNA extracted from an AAVS1-targeted eGFP+ HeLa cell clone served as positive control. Marker, GeneRuler DNA Ladder Mix molecular weight marker. These data are presented as a red bar in the graph of Fig. 11d.

Figure 16. Gene targeting analyses of HeLa cells genetically modified by deploying TALENs and covalently-closed circular donor plasmids. PCR screening for detecting HR-mediated targeting of donor DNA delivered in a supercoiled plasmid was performed on DNA from eGFP+ HeLa cell clones isolated from cultures exposed to AAVS1-specific TALENs and pAdV.donor S1. PCR amplifications targeting eGFP served as internal controls for the presence and integrity of DNA templates. Vertical arrowheads mark clones lacking the targeted exogenous DNA-AAVS1 junction. Nuclease-free water and genomic DNA of mock-transduced HeLa cells served as negative controls, whilst DNA extracted from an AAVS1-targeted eGFP+ HeLa cell clone served as positive control. Marker, GeneRuler DNA Ladder Mix molecular weight marker. These data are presented as an orange bar in the graph of Fig. 11d.

Figure 17. Gene targeting analyses of HeLa cells genetically modified by using the CRISPR/Cas9 system and protein-capped linear AdV DNA. PCR screening for detecting HR-mediated targeting of donor DNA delivered in protein-capped AdV genomes was performed on DNA from eGFP+ HeLa cell clones grown from cultures exposed to Cas9:gRNA S1 and AdV.Δ2.donor S1. PCR amplifications targeting eGFP served as internal controls for the presence and integrity of DNA templates. Vertical arrowheads indicate clones negative for the targeted exogenous DNA-AAVS1 junction. Nuclease-free water and genomic DNA of mock-transduced HeLa cells served as negative controls, whilst DNA extracted from an AAVS1-targeted eGFP+ HeLa cell clone served as positive control. Marker, GeneRuler DNA Ladder Mix molecular weight marker. These data are presented as a green bar in the graph of Fig. 11d.

Figure 18. Prokaryotic DNA status of cell populations genome-edited by combining engineered nucleases with AdV or plasmid HR templates. KanR-specific PCR analysis of genomic DNA extracted from eGFP+ sorted HeLa cells initially transduced with AdV.Δ2.donor S1 (AdV DNA) or transfected with pAdV.donor S1 (supercoiled), PacI-linearized pAdV.donor S1 (linear in vitro) or TALEN pair-susceptible pAdV.donor S1/T-TS (linear in vivo). Targeted DSBs were induced in AdV-transduced and plasmid-transfected cells by using AAVS1-specific RGN (Cas9:gRNA S1) and TALEN (TALEN S1 [L/R]) complexes, respectively. Plasmid pAdV.donor S1 served as positive control (+), whereas genomic DNA from parental HeLa cells (-) and nuclease-free water served as negative controls. The integrity of the various DNA templates was controlled for by carrying out parallel eGFP-specific PCR amplifications. Lane M, GeneRuler DNA Ladder Mix molecular weight marker.

Figure 19. Testing TALEN-mediated release of donor DNA from AdV genomes in transduced cells. (a) Characterization of control AdV.Δ2.donor S1/FRT and test AdV.donor S1/T-TS DNA by restriction fragment length analysis. Left-hand panel, generic structures of AdV.Δ2.donor S1/FRT and AdV.Δ2.donor S1/T-TS recombinant genomes. FRT and T-TS, target sites for the yeast site-specific FLP recombinase and the AAVS1-specific TALEN pair, respectively; open box, exogenous DNA flanked by AAVS1-targeting sequences (black and grey bars); Ψ, AdV packaging signal; open oval, 5' covalently-attached AdV terminal protein. Central and right-hand panels, in silico and actual electrophoretic mobility pattern of restriction fragments upon treatment of control and test vector DNA with XbaI and MluI, respectively. The in silico XbaI and MluI restriction patterns, made by using the Gene Construction Kit (version 2.5), aided in establishing the integrity of vector DNA templates. Marker, GeneRuler DNA Ladder Mix molecular weight marker. (b) Experimental design. Cellular DNA was extracted from HeLa cells co-transduced with AdV.Δ2.donor S1/FRT and FLPe-encoding vector hcAd.FLPe.F50 and from HeLa cells transduced with a combination of AdV.Δ2.donor S1/T-TS, AdV.Δ2.TALEN-L S1 and AdV.Δ2.TALEN-R S1 (L/R). PCR amplifications with "outward-facing" primers (half arrows) on linear templates will result in specific products only upon donor DNA excision and circularization. From this experimental setup it follows that direct FLPe-induced excision and circularization of donor DNA from AdV.Δ2.donor S1/FRT provides in vivo-generated positive control templates. Detection of a similarly sized PCR species diagnostic for NHEJ-mediated circularization of linear donor DNA serves as a surrogate indicator for TALEN-dependent excision of donor DNA from AdV.donor S1/T-TS genomes. The size and position (in kb) of these PCR species are indicated on the right side of the agarose gel.

Examples

Methods

Cells. HeLa cells (American Type Culture Collection) and their eGFP-positive H27 clone derivative [13] were cultured in Dulbecco's modified Eagle's medium (DMEM; Invitrogen) supplemented with 5% fetal bovine serum (FBS; Invitrogen). The 293T lentiviral vector producer cells were maintained in DMEM containing 10% FBS, whereas the AdV packaging cell lines PER.C6 [5] and PER.E2A [7] were cultured in DMEM supplemented with 10% FBS and 10 mM MgCl2 (Sigma-Aldrich) in the absence and in the presence of 250 μg/ml of Geneticin (Invitrogen), respectively. These cell types were kept in a humidified atmosphere containing 10% CO2. The origin of and the culture conditions for the human myoblasts have been detailed elsewhere [14, 15].

Recombinant DNA. The complete and annotated DNA sequences of lentiviral vector transfer plasmids pLV.donor eGFP and pLV.donor S1 can be retrieved via GenBank accession numbers KF419293 and KF419294, respectively. The AdV shuttle plasmid pSh.AAVS1.eGFP S1/T-TS was constructed by inserting, upstream and downstream of the donor S1 DNA module in pSh.AAVS1.eGFP [16], two annealed oligodeoxyribonucleotides containing bipartite target sequences for the AAVS1-specific TALENs (T-TS). A similar approach, based on inserting into pSh.AAVS1.eGFP a direct repeat of FRT sites in place of the T-TS sequences, was pursued in parallel. These maneuvers resulted in pSh.AAVS1.eGFP S1/FRT. The AdV molecular clones pAd.ΔE1.donor S1.F50 and pAd.ΔE1ΔE2A.donor S1.F50 were assembled by HR in E. coli strains [17] BJ5183 pAdEasy-1.50 and BJ5183 pAdEasy-2.50, respectively, transformed with MssI-treated pSh.AAVS1.eGFP. The AdV molecular clones pAd.ΔE1ΔE2A.donor S1/T-TS.F50 and pAd.ΔE1ΔE2A.donor S1/FRT.F50 were built by HR following the transformation of the latter cells with MssI-digested plasmids pSh.AAVS1.eGFP S1/T-TS and pSh.AAVS1.eGFP S1/FRT, respectively.

The AdV shuttle plasmid AT25_pAdV.PGK.Cas9 contains a humanized ORF encoding the S. pyogenes nuclease Cas9 under the transcriptional control of the PGK-1 promoter and the SV40 polyadenylation signal, whereas the AdV shuttle plasmid E43_pAdV.U6.gRNA S1 encodes a U6 promoter-driven single guide RNA (herein referred to as gRNA S1) targeting the Cas9 protein to the human AAVS1 locus. The human codon-optimized Cas9 ORF and the RNA PolIII-dependent gRNA S1 expression unit have been published elsewhere [18] and were isolated from constructs hCas9 (Addgene plasmid 41815) and gRNA_AAVS1-T2 (Addgene plasmid 41818), respectively. Next, the AdV molecular clones AD05_pAdV.ΔE1ΔE2A.PGK.Cas9.F50 and AD09_pAdV.ΔE1ΔE2A.U6.gRNA S1.F50 were generated by HR in BJ5183 pAdEasy-2.50 cells [17] following their transformation with MssI-treated AT25_pAdV.PGK.Cas9 and E43_pAdV.U6.gRNA S1, respectively.

Production and titration of AdV vectors. The generation of the fiber-modified E1-deleted AdVs Ad.ΔE1.TALEN-L S1.F50 and Ad.ΔE1.TALEN-R S1.F50 (herein referred to as AdV.TALEN-L S1 and AdV.TALEN-R S1, respectively) has been detailed elsewhere [19]. The same applies to the fiber-modified E1- and E2A-deleted AdVs Ad.ΔE1ΔE2A.TALEN-L S1.F50 and Ad.ΔE1ΔE2A.TALEN-R S1.F50 (herein named AdV.Δ2.TALEN-L S1 and AdV.Δ2.TALEN-R S1, respectively [19]). The production of the fiber-modified E1-deleted AdV AdV.Δ1.donor S1 and of its E1- plus E2A-deleted derivative AdV.Δ2.donor S1 was initiated by transfecting PER.C6 and PER.E2A cells with PacI-linearized AL25_pAd.ΔE1.donor S1.F50 and AL27_pAd.ΔE1ΔE2A.donor S1.F50, respectively. The production of the fiber-modified, eGFP-encoding donor AdVs AdV.Δ2.donor S1/T-TS and AdV.Δ2.donor S1/FRT was started by transfecting PER.E2A cells with PacI-treated pAd.ΔE1ΔE2A.donor S1/T-TS.F50 and pAd.ΔE1ΔE2A.donor S1/FRT.F50, respectively. The E1- plus E2A-complementing PER.E2A cells were also used for rescuing and propagating the fiber-modified AdVs AdV.Cas9 and AdV.gRNA S1 following their transfection with the PacI-linearized molecular clones AD05_pAdV.ΔE1ΔE2A.PGK.Cas9.F50 and AD09_pAdV.ΔE1ΔE2A.U6.gRNA S1.F50, respectively.

The DNA transfection-mediated rescue of AdV particles in packaging cell lines as well as their subsequent propagation and purification were performed essentially as described previously [17, 19]. The isolation and restriction fragment length analysis procedures applied to AdV.Δ2.donor S1/T-TS and AdV.Δ2.donor S1/FRT DNA have been detailed elsewhere [17]. The titers of the various reporter-encoding AdV stocks, expressed in terms of transducing units (TU) per ml, were determined through limiting dilutions on HeLa indicator cells seeded at a density of 8 × 10^4 cells per well of 24-well plates. At 3 days post-transduction, frequencies of reporter-positive cells were measured by reporter-directed flow cytometry. The titers of the reporter-negative AdV preparations were established by TCID50 assays in complementing cells and by fluorometric quantification of genome-containing vector particles (VP) per ml as described elsewhere [17, 20].

Production and titration of lentiviral vectors. The generation of the vesicular stomatitis virus glycoprotein G (VSV-G)-pseudotyped lentiviral vectors LV.ZFN-1 eGFP and LV.ZFN-2 eGFP has been described before [21]. The generation of the VSV-G-pseudotyped vectors IDLV.donor eGFP and IDLV.donor S1 was carried out by transient transfections of 293T cells with transfer plasmids AP45_pLV.donor eGFP and AQ25_pLV.donor S1, respectively, together with packaging construct [22] AM16_psPAX2.IN D116N and pseudotyping construct pLP/VSVG (Invitrogen) as detailed elsewhere [22, 23]. The physical particle titers of lentiviral vector preparations were determined by using the RETRO-TEK HIV-1 p24 ELISA kit as specified by the manufacturer (Gentaur Molecular Products). Titers of these vector stocks expressed in terms of TU/ml were derived by using a conversion factor of 2500 TU per ng of HIV-1 p24gag protein.
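The conversion from a physical p24 titer to a functional titer described above is a single multiplication. The following minimal sketch illustrates it; the function name and the ELISA reading of 400 ng/ml are hypothetical and serve only as an example, whereas the factor of 2500 TU per ng is the one stated in the text.

```python
# Conversion factor stated in the text: 2500 TU per ng of HIV-1 p24gag.
TU_PER_NG_P24 = 2500


def p24_to_tu_per_ml(p24_ng_per_ml: float) -> float:
    """Convert a p24 ELISA titer (ng/ml) into an estimated functional titer (TU/ml)."""
    return p24_ng_per_ml * TU_PER_NG_P24


# Hypothetical ELISA reading, for illustration only (not a value from the document).
example_p24_ng_per_ml = 400.0
example_titer = p24_to_tu_per_ml(example_p24_ng_per_ml)
print(f"{example_p24_ng_per_ml} ng/ml p24 corresponds to {example_titer:.2e} TU/ml")
```

Such a derived titer is an estimate, since it assumes the same ratio of transducing to physical particles for every vector stock.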

Transduction experiments. The vector transductions on cultures of H27 indicator cells were carried out as follows. Eighty thousand cells were seeded in wells of 24-well plates (Greiner Bio-One). The next day, IDLV.donor eGFP was added at a multiplicity of infection (MOI) of 45 TU/cell together with the LV.ZFN-1 eGFP and LV.ZFN-2 eGFP vectors, each applied at an MOI of 8 TU/cell. Parallel H27 cultures that were either untreated or were incubated exclusively with IDLV.donor eGFP at an MOI of 45 TU/cell served as controls. Next, after an extensive 5-week sub-culturing period, the frequencies of reporter-positive and reporter-negative H27 cell populations were monitored and quantified by dual-color fluorescence microscopy and flow cytometry, respectively. Long-term transduction experiments on human myoblasts were initiated by seeding 2 × 10^5 cells per well of 24-well plates. The next day, the cells were incubated with AdV.TALEN-L S1 (2.5 TU/cell) and AdV.TALEN-R S1 (2.5 TU/cell) mixed with IDLV.donor S1 (10 TU/cell) or with AdV.Δ2.donor S1 (10 TU/cell). The frequencies of IDLV.donor S1- and AdV.Δ2.donor S1-modified myoblasts were determined at 27 and 45 days post-transduction, respectively. Long-term transduction experiments on HeLa cells were started by seeding 8 × 10^4 cells per well of 24-well plates. After an overnight incubation period, the cells were exposed to AdV.TALEN-L S1 (3 TU/cell) and AdV.TALEN-R S1 (3 TU/cell) together with AdV.Δ1.donor S1 (6 TU/cell). The frequencies of stably transduced cells were measured by flow cytometry at 24 days post-transduction. To investigate the effect of TALEN-mediated donor DNA excision on the rate of gene targeting, 8 × 10^4 HeLa cells, seeded one day before, were transduced with AdV.Δ2.TALEN-L S1 (1.5 TU/cell) and AdV.Δ2.TALEN-R S1 (1.5 TU/cell) together with AdV.Δ2.donor S1/T-TS (3 TU/cell). The frequencies of stably transduced cells were established at 24 days post-transduction by flow cytometry. The eGFP-positive cell populations present in the various long-term cultures were sorted, after which over 250 individual clones were randomly selected for expansion and molecular analyses.
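The vector doses above are given as MOIs in TU/cell, so the amount of each stock to add follows from the cell number and the stock titer. The sketch below works this out for the HeLa co-transduction (3, 3 and 6 TU/cell on 8 × 10^4 seeded cells). It rests on two assumptions that are not in the text: the cell number at transduction is taken to equal the number seeded, and the stock titer of 1 × 10^9 TU/ml is purely hypothetical.

```python
import math


def vector_dose(cells: int, moi_tu_per_cell: float, titer_tu_per_ml: float) -> tuple[float, float]:
    """Return (total TU required, stock volume in microlitres) for one vector."""
    total_tu = cells * moi_tu_per_cell
    volume_ul = total_tu / titer_tu_per_ml * 1000.0  # ml -> microlitres
    return total_tu, volume_ul


# From the text: 8 x 10^4 HeLa cells seeded per well; MOIs of 3, 3 and 6 TU/cell.
HELA_CELLS_PER_WELL = 80_000
HELA_MOIS = {"AdV.TALEN-L": 3.0, "AdV.TALEN-R": 3.0, "AdV.donor": 6.0}

# Hypothetical stock titer, for illustration only (not a value from the document).
ASSUMED_TITER_TU_PER_ML = 1e9

doses = {
    name: vector_dose(HELA_CELLS_PER_WELL, moi, ASSUMED_TITER_TU_PER_ML)
    for name, moi in HELA_MOIS.items()
}

for name, (total_tu, volume_ul) in doses.items():
    print(f"{name}: {total_tu:.0f} TU, {volume_ul:.2f} microlitres of stock")
```

Because each stock has its own titer in practice, the single assumed titer would be replaced by a per-vector value.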

Flow cytometry and light microscopy. Transgene expression parameters (i.e. frequencies of reporter-positive and reporter-negative target cells, mean fluorescence intensities and coefficients of variation) were determined by using a BD LSR II flow cytometer (BD Biosciences). Data were analyzed with the aid of BD FACSDiva 6.1.3 software (BD Biosciences). Mock-transduced target cells were used to set background fluorescence levels. Typically, 10,000 viable single cells were analyzed per sample. The light microscopic analyses were carried out with an IX51 inverted fluorescence microscope equipped with an XC30 Peltier-cooled digital color camera (Olympus). The images were processed with the aid of Cell^F 3.4 imaging software (Olympus).

Functional validation of RGN complexes delivered into target cells by AdVs. Fifty thousand HeLa cells were transduced with AdV.Cas9 alone (300 TCID50/cell) or with 1:1 mixtures of AdV.Cas9 and AdV.gRNA S1 applied at total vector doses of 60, 120, 180, 240 and 300 TCID50/cell. At three days post-transduction, chromosomal DNA was extracted from the target cells by using the DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer's instructions. Next, AAVS1-specific PCR amplifications and T7 endonuclease I-based genotyping assays were carried out essentially as previously described [20].

Designer nuclease-mediated gene targeting of linear and supercoiled plasmid donor DNA. HeLa cells were seeded at a density of 6.5 × 10^4 cells per well of 24-well plates. The next day, the cells were transfected with DNA mixtures consisting of 100 ng of 1383.pVAX.AAVS1.TALEN.L-94 [19], 100 ng of 1384.pVAX.AAVS1.TALEN.R-95 [19] and 200 ng of AAVS1-targeting AdV donor DNA plasmids. The targeting constructs were pAdV.donor S1, PacI-linearized pAdV.donor S1 and pAdV.donor S1/T-TS. Controls were provided by transfecting HeLa cells with 200 ng of 1383.pVAX.AAVS1.TALEN.L-94 mixed together with 200 ng of pAdV.donor S1, PacI-linearized pAdV.donor S1 or pAdV.donor S1/T-TS. The completion of the PacI digestions was confirmed by agarose gel electrophoresis and ethidium bromide staining. Each of the plasmid mixtures was diluted in 50 μl of 150 mM NaCl and received 1.32 μl of a 1 mg/ml polyethylenimine solution under vigorous shaking for about 10 sec. After a 20-min incubation period at room temperature, the resulting polycation-DNA complexes were directly added into the culture medium. After 7 hours, the transfection mixtures were removed and fresh culture medium was added. The resulting cell populations were subsequently sub-cultured for 3 weeks, after which cells stably expressing eGFP in these populations were individually sorted by flow cytometry into wells of 96-well plates. Viable single cell-derived clones corresponding to the various experimental settings were randomly selected for transgene expression and integration status analysis.
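The transfection recipe above implies a fixed mass ratio of polyethylenimine (PEI) to DNA, which the following sketch works out from the stated quantities. The variable names are illustrative; all numbers are taken from the text.

```python
import math

# DNA per test transfection (ng): two TALEN plasmids plus one donor plasmid.
test_dna_ng = 100 + 100 + 200
# DNA per control transfection (ng): one TALEN plasmid plus one donor plasmid.
control_dna_ng = 200 + 200

# PEI added per mixture: 1.32 microlitres of a 1 mg/ml solution.
pei_volume_ul = 1.32
pei_conc_ug_per_ul = 1.0  # 1 mg/ml equals 1 microgram per microlitre
pei_ng = pei_volume_ul * pei_conc_ug_per_ul * 1000.0

# Mass ratio of PEI to DNA (w/w).
pei_to_dna_ratio = pei_ng / test_dna_ng

print(f"DNA per well: {test_dna_ng} ng (test), {control_dna_ng} ng (control)")
print(f"PEI per well: {pei_ng:.0f} ng, PEI:DNA mass ratio of about {pei_to_dna_ratio:.1f}:1")
```

Test and control mixtures both contain 400 ng of DNA, so the PEI:DNA ratio is the same across the experimental settings.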

Gene targeting of AdV donor DNA by using the RNA-guided nuclease Cas9. HeLa cells were seeded at a density of 8.0 × 10^4 cells per well of 24-well plates. The following day, the cells were transduced with AdV.Cas9 (150 TCID50/cell), AdV.gRNA S1 (50 TCID50/cell) and AdV.Δ2.donor S1 (10 TU/cell). To serve as negative controls, HeLa cells were either mock-transduced or were transduced with AdV.Cas9 (150 TCID50/cell) and AdV.Δ2.donor S1 (10 TU/cell). Starting at 3 days post-transduction, mock- and vector-transduced HeLa cells were sub-cultured twice per week. At 17 days post-transduction, cells stably expressing eGFP were individually sorted by flow cytometry into wells of 96-well plates. Viable single cell-derived clones isolated from cultures initially exposed to AdV.Cas9, AdV.gRNA S1 and AdV.Δ2.donor S1 were randomly selected for transgene expression and integration status analysis.

Cell sorting and clonal expansion. Flow cytometry-assisted cell sorting was done after the removal of donor DNA-containing episomes from long-term HeLa and human myoblast cultures. The eGFP-positive cells were collected in 1:1 mixtures of regular medium containing 2x penicillin/streptomycin (Invitrogen) and FBS. Next, the sorted cells were individually seeded in wells of 96-well plates (Greiner Bio-One) at a density of 0.3 cells per well in their respective medium supplemented with 50 μM of α-thioglycerol (Sigma-Aldrich) and 20 nM of bathocuproine disulphonate (Sigma-Aldrich) to increase cloning efficiency [24]. Finally, over 250 individual clones were randomly selected for expansion and molecular analyses.
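Seeding at an average of 0.3 cells per well is a limiting-dilution step. Its rationale can be illustrated with a short calculation, assuming that the number of cells per well follows a Poisson distribution and that every seeded cell grows out. Both are modelling assumptions that the text does not state.

```python
import math

MEAN_CELLS_PER_WELL = 0.3  # seeding density stated in the text
WELLS_PER_PLATE = 96


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a well under a Poisson model."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


p_empty = poisson_pmf(0, MEAN_CELLS_PER_WELL)
p_single = poisson_pmf(1, MEAN_CELLS_PER_WELL)
p_multiple = 1.0 - p_empty - p_single

# Among wells that received at least one cell, the fraction holding exactly one.
p_single_given_occupied = p_single / (1.0 - p_empty)

print(f"empty wells:       {p_empty:.3f}")
print(f"single-cell wells: {p_single:.3f}")
print(f"multi-cell wells:  {p_multiple:.3f}")
print(f"clonal fraction among occupied wells: {p_single_given_occupied:.3f}")
print(f"expected single-cell wells per plate: {WELLS_PER_PLATE * p_single:.1f}")
```

Under these assumptions most occupied wells hold a single cell, at the cost of leaving the majority of wells empty.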

Genomic DNA extraction. Genomic DNA was extracted from cell populations and clones essentially as described before [25]. In brief, the cells were collected and incubated overnight at 55°C in 500 μl of lysis buffer (100 mM Tris-HCl [pH 8.5], 5 mM ethylenediaminetetraacetic acid [EDTA], 0.2% sodium dodecyl sulfate and 200 mM NaCl) supplemented with freshly added Proteinase K (Thermo Scientific) at a final concentration of 100 ng/ml. The cell lysates were extracted twice with a buffer-saturated phenol:chloroform:isoamyl alcohol mixture (25:24:1) and once with chloroform. Next, the genomic DNA was precipitated by the addition of 2.5 volumes of absolute ethanol and 0.5 volumes of 7.5 M ammonium acetate (pH 5.5). After washing with 70% ethanol, the DNA pellets were air-dried and dissolved in 100 μl of Tris-EDTA buffer (10 mM Tris [pH 8.0] and 1 mM EDTA) supplemented with RNase A (Thermo Scientific) at a final concentration of 100 μg/ml. The genomic DNA of eGFP-positive clones derived from cultures transfected with plasmid DNA as well as that derived from cultures co-transduced with AdV.Cas9, AdV.gRNA S1 and AdV.Δ2.donor S1 was extracted by using the DNeasy Blood & Tissue Kit (Qiagen) according to the protocol provided by the manufacturer.

PCR analyses of IDLV and AdV gene targeting events. The composition of the PCR mixtures and the cycling parameters utilized for the amplification of HR-derived AAVS1-donor DNA junctions are specified in Table 1 and Table 2, respectively.

Statistics. Data sets were analyzed by using the GraphPad Prism 5 software package and evaluated for significance by applying unpaired two-tailed Student's t-tests (P < 0.05 considered significant).

Southern blot analyses. Genomic DNA was isolated from individual clones according to the above-described protocol. Next, 10-μg DNA samples were digested overnight with NcoI (Thermo Scientific) and were resolved through a 0.8% agarose gel in 1x Tris-acetate-EDTA buffer. The DNA was transferred by capillary action onto an Amersham Hybond-XL membrane (GE Healthcare Life Sciences) using standard Southern blot techniques. The 635-bp DNA probe specific for the "centromeric" AAVS1-derived arm of the targeting donor DNA was obtained by digestion of plasmid pSh.AAVS1.eGFP with XmaJI (Thermo Scientific) followed by preparative agarose gel electrophoresis. The probe was radiolabeled with [α-32P]dATP (GE Healthcare Life Sciences) by using the DecaLabel DNA labeling Kit following the manufacturer's instructions (Thermo Scientific). Prior to its deployment, the radiolabeled probe was separated from unincorporated dNTPs through size-exclusion chromatography in Sephadex-50 columns (GE Healthcare Life Sciences). A Storm 820 Phosphoimager (Amersham Biosciences) was used for the detection of the probe-hybridized DNA. The images were acquired by using the Storm Scanner Control 5.03 software and were processed with the aid of ImageQuant Tools 3.0 software (both from Amersham Biosciences).

COBRA-FISH karyotyping. The COBRA-FISH karyotyping of HeLa target cells was done according to a published protocol [26].

Analysis of in vivo excision of AdV donor DNA. The experiments designed to assess TALEN-mediated excision of donor S1 DNA from AdV backbones were performed as follows. Eighty thousand HeLa cells were seeded in wells of 24-well plates and the following day they were co-transduced with AdV.Δ2.TALEN-L S1 (1.5 TU/cell), AdV.Δ2.TALEN-R S1 (1.5 TU/cell) and AdV.Δ2.donor S1/T-TS (3 TU/cell) or were co-transduced with hcAd.FLPe.F50 [27] (2 gene switch-activating units/cell) and AdV.Δ2.donor S1/FRT (3 TU/cell). Control samples were provided by parallel HeLa cell cultures exposed exclusively to AdV.Δ2.donor S1/T-TS (3 TU/cell) or to AdV.Δ2.donor S1/FRT (3 TU/cell). At 72 hours post-transduction, extrachromosomal DNA was isolated essentially as described previously [28], after which 2-μl DNA samples were subjected to PCR. The PCR mixtures consisted of 0.4 μM of primer #997 (5'-GCACTGAAACCCTCAGTCCTAGG-3'), 0.4 μM of primer #998 (5'-CGGCGTTGGTGGAGTCC-3'), 0.1 mM of each dNTP (Invitrogen), 1 mM MgCl2 (Promega), 1x Colorless GoTaq Flexi Buffer (Promega) and 2.5 U of GoTaq Flexi DNA polymerase (Promega). Next, 50-μl PCR mixtures were subjected to an initial 2-min denaturation period at 95°C, followed by 30 cycles of 30 sec at 95°C, 30 sec at 62°C and 45 sec at 72°C. The reactions were terminated by a final extension period of 5 min at 72°C. The detection of the resulting PCR products was performed by conventional agarose gel electrophoresis.

Table 1 Primer pairs and composition of PCR mixtures used for the detection of AAVS1-donor S1 DNA junctions

Table 2 PCR cycling parameters used for the detection of AAVS1-donor S1 DNA junctions.

target | initial denaturation | denaturation | annealing | elongation | # cycles | final elongation
jT-IDLV | 95 °C, 5 min | 95 °C, 30 sec | 66 °C, 30 sec | 72 °C, 60 sec | 35 | 72 °C, 5 min
jT-AdV | 95 °C, 5 min | 95 °C, 30 sec | 63.5 °C, 30 sec | 72 °C, 100 sec | 40 | 72 °C, 5 min
jC-IDLV | 95 °C, 5 min | 95 °C, 30 sec | 61 °C, 30 sec | 72 °C, 120 sec | 40 | 72 °C, 5 min
jC-AdV | 95 °C, 5 min | 95 °C, 30 sec | 61 °C, 30 sec | 72 °C, 120 sec | 40 | 72 °C, 5 min
eGFP | 95 °C, 5 min | 95 °C, 30 sec | 62 °C, 30 sec | 72 °C, 30 sec | 35 | 72 °C, 5 min
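The cycling parameters of Table 2 can be captured in a small data structure, which makes it straightforward to compare the programs, for instance by their total programmed hold time. The sketch below is illustrative only: the class and field names are hypothetical, and the computed times exclude temperature ramping, which depends on the thermocycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """One row of Table 2; durations in seconds, annealing temperature in degrees C."""
    initial_denaturation_s: int
    denaturation_s: int
    annealing_s: int
    elongation_s: int
    cycles: int
    final_elongation_s: int
    annealing_temp_c: float

    def hold_seconds(self) -> int:
        """Total programmed hold time, excluding ramping between steps."""
        per_cycle = self.denaturation_s + self.annealing_s + self.elongation_s
        return self.initial_denaturation_s + self.cycles * per_cycle + self.final_elongation_s


# Values transcribed from Table 2.
TABLE_2 = {
    "jT-IDLV": CyclingProgram(300, 30, 30, 60, 35, 300, 66.0),
    "jT-AdV": CyclingProgram(300, 30, 30, 100, 40, 300, 63.5),
    "jC-IDLV": CyclingProgram(300, 30, 30, 120, 40, 300, 61.0),
    "jC-AdV": CyclingProgram(300, 30, 30, 120, 40, 300, 61.0),
    "eGFP": CyclingProgram(300, 30, 30, 30, 35, 300, 62.0),
}

for target, program in TABLE_2.items():
    minutes = program.hold_seconds() / 60
    print(f"{target}: {program.cycles} cycles, {minutes:.1f} min of programmed holds")
```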

Results

Example 1. Gene targeting with IDLV donor DNA and engineered nucleases is inaccurate

Integrase-defective lentiviral vectors (IDLVs) differ from their chromosomally integrating counterparts by bearing non-pleiotropic mutations in the catalytic pocket of their integrase moiety [29]. IDLVs represent one of the most commonly used viral vectors for the delivery of HR substrates into human cells [10, 11, 16, 30, 31]. Therefore, we started by investigating the specificity and the accuracy of DSB-induced chromosomal insertion of IDLV donor DNA. To this end, the eGFP expression unit in HeLa-derived H27 cells and the AAVS1 safe harbor locus in human myoblasts were targeted by zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), respectively (Fig. 1). The ZFNs [32] and TALENs [19] were introduced into target cells through lentiviral and adenoviral vectors (AdVs), respectively, together with the corresponding target site-matched HR substrates in IDLV.donor eGFP and IDLV.donor S1 (Fig. 1). Human cells exposed exclusively to donor DNA-containing IDLVs served to establish background levels of exogenous DNA genomic integration. After extensive sub-culturing to eliminate episomal vector DNA, stably transduced cells were identified through live-cell fluorescence microscopy and flow cytometry. Results depicted in Figs. 2a and 2b show a clear nuclease-dependent increase in the frequencies of stably transduced cells in both experimental systems. Quantification of the eGFP+/FP635+ double-positive H27 cell fraction in the total FP635+ population revealed that at least 45.4% of the stably transduced cells underwent off-target vector DNA integration (Fig. 2a). The frequency of H27 cells with randomly inserted donor DNA is, however, likely to be higher, considering that the eGFP-/FP635+ population (54.6%) consists of cells containing an eGFP-targeted FP635 cassette and cells harboring eGFP disrupted by NHEJ plus the FP635 cassette inserted at an off-target site (Fig. 2a).

Hitherto, IDLV donor DNA targeting experiments have invariably deployed ZFN technology. However, recent data suggest that TALENs are more specific than obligate heterodimeric ZFN pairs. Thus, to gauge the relative frequencies of on-target versus off-target IDLV donor DNA integration at an endogenous safe harbor locus using TALENs, we randomly selected single cell-derived eGFP+ myoblasts (n=104) from cultures initially exposed to the AdV pair AdV.TALEN-L S1 and AdV.TALEN-R S1 (Fig. 2b). PCR screening with primers designed to yield amplicons diagnostic for HR-derived junctions between foreign and native target DNA (Fig. 2c and Fig. 3a) revealed that 86.6% of the eGFP+ cells underwent homology-directed chromosomal insertion of the exogenous DNA (Figs. 2c, 2d and Fig. 3a). The resulting AAVS1-donor DNA junctions represented events involving the "telomeric" (6.7%), the "centromeric" (3.9%) or both ends of the targeting template (76.0%) (Fig. 2d). Further characterization of IDLV donor DNA integrants revealed high frequencies of head-to-tail (H-T) concatemeric forms not only in the non-targeted but also in the three targeted clonal fractions (38.5%) (Figs. 2c, 2d and Fig. 3a). Importantly, PCR analyses of eGFP+ cells sorted from cultures that were not exposed to TALENs did not yield amplicons diagnostic for homology-directed gene addition (Fig. 3b). Taken together, and in line with the above-described data (Fig. 2a) as well as with other recent studies deploying ZFNs, our results on TALEN-induced IDLV donor DNA targeting indicate that a significant fraction of incoming templates integrates randomly into host cell chromosomes, presumably at spontaneous DSBs through a non-canonical (i.e. integrase-independent) process. Moreover, due to their concatemeric structure, a significant proportion of the targeted IDLV donor DNA harbors unwanted HIV-derived cis-acting elements. These tandem repeats are expected to neither restore endogenous ORFs nor yield homogeneous transgene expression levels in the context of gene repair and gene replacement strategies, respectively. An overview of the prevalent structures that can be acquired by chromosomally integrated IDLV donor DNA is depicted in Fig. 4.
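As a quick consistency check on the clonal fractions reported above, the following Python sketch adds up the three junction categories and derives the remaining non-targeted fraction. It uses only the percentages stated in the text; the variable names are illustrative.

```python
# Junction categories among the 104 eGFP+ myoblast clones (percentages from Fig. 2d).
junction_pct = {
    "telomeric junction only": 6.7,
    "centromeric junction only": 3.9,
    "both junctions": 76.0,
}

# Clones with at least one HR-derived junction; rounded to one decimal place
# to avoid floating-point noise in the sum.
targeted_pct = round(sum(junction_pct.values()), 1)

# Clones without any HR-derived junction, i.e. the non-targeted fraction.
non_targeted_pct = round(100.0 - targeted_pct, 1)

print(f"targeted: {targeted_pct}%  non-targeted: {non_targeted_pct}%")
```

The three categories add up to the 86.6% of clones reported as having undergone homology-directed insertion, leaving 13.4% of the eGFP+ clones without a detectable HR-derived junction.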

Example 2. Gene targeting with protein-capped AdV donor DNA following site-specific DSBs is accurate

Together with IDLVs, recombinant adeno-associated viral (rAAV) vectors constitute the most commonly used types of viral vectors for the delivery of HR substrates into mammalian cells. Analogously to those of IDLVs, free-ended rAAV genomes become inserted at sporadic genomic DSBs after being co-opted by illegitimate recombination pathways involved in the repair of chromosomal DNA breaks. Thus, we subsequently asked whether donor DNA delivered in the context of protein-capped AdV genomes instead, firstly, displays a less promiscuous chromosomal DNA integration profile and, secondly, yields a more precise chromosomal insertion pattern of the exogenous DNA than that resulting from using free-ended vector templates. To compare with and build upon the above-described IDLV gene targeting experiments at the native AAVS1 locus, we generated AdV.Δ2.donor S1 to introduce the AAVS1-matched HR substrates into myoblasts exposed and not exposed to the AAVS1-specific TALENs. Flow cytometric analyses of the resulting long-term cultures showed a significant TALEN-dependent increase in the frequencies of stably transduced cells (Fig. 5a), with not only TALEN-induced but also residual exogenous DNA chromosomal integration rates being lower than those measured in their IDLV.donor S1-transduced counterparts (Fig. 2b). The degree of the TALEN-dependent stimulatory effect was nonetheless similar to that observed in cultures of IDLV.donor S1-transduced myoblasts (Fig. 2b). Interestingly, we noticed that co-transducing myoblasts with AdV.Δ2.donor S1 and TALEN-encoding AdVs resulted in eGFP+ populations whose narrow distribution of transgene expression levels (Fig. 5b, 2nd bar) approached those of clones harboring AAVS1-targeted donor S1 DNA (Fig. 5b, 1st bar) and departed from those of IDLV.donor S1-modified populations (Fig. 5b, 3rd and 4th bars). These results could be readily replicated at the clonal level by comparing the mean fluorescence intensities corresponding to the panels of eGFP+ myoblasts randomly selected from TALEN-treated cultures transduced with IDLV.donor S1 or with AdV.Δ2.donor S1 (Fig. 5c). Collectively, these data indicate scarce chromosomal positional effects on transgene activity in AdV.Δ2.donor S1-modified populations, possibly resulting from a preponderance of site-specific over random genomic DNA insertion events. Indeed, all of the eGFP+ myoblast clones isolated from cultures co-transduced with AdV.Δ2.donor S1 and TALEN-encoding AdVs (n=110) were shown to have AAVS1-foreign DNA junctions resulting from HR events at both termini (Fig. 5d and 5e). To further probe the precision of AdV-mediated gene targeting, we set up a PCR assay to detect head-to-tail AdV DNA concatemers. This assay failed to produce any discernible head-to-tail-specific PCR species on genomic DNA from eGFP+ myoblasts sorted from cultures co-transduced with AdV.Δ2.donor S1 and TALEN-encoding AdVs (Fig. 5f). Besides confirming this notable level of target site specificity, Southern blot analyses of AdV.Δ2.donor S1-modified clones revealed a clone that underwent bi-allelic gene targeting and a clone that, in addition to the typical AAVS1-targeted donor DNA, contained an integrant whose origin is consistent with an HR-independent DNA integration event (Fig. 6).
Taken together, these data demonstrate that nuclease-mediated gene targeting of AdV donor DNA displays an exquisite level of specificity and accuracy when compared to that of IDLV donor DNA. Next, we performed AdV-based gene targeting experiments in HeLa cells. These cells display a high degree of genetic instability, providing, as a result, a more stringent cellular model system in which to evaluate HR-mediated gene targeting amidst a presumably high frequency of spontaneous chromosomal DSBs.

Consistent with the experiments carried out in myoblasts (Fig. 5a), results depicted in Figs. 7a and 7b show a robust TALEN-dependent increase in the frequency of stably transduced HeLa cells. Also in agreement with previous data (Fig. 5f), PCR specific for head-to-tail AdV DNA concatemers did not reveal the presence of this direct repeat arrangement in eGFP+ populations purified from HeLa cell cultures co-transduced with AdV.Δ2.donor S1 and TALEN-encoding AdVs (Fig. 7c).

Importantly, all the clones randomly selected from these eGFP+ HeLa cell populations (n=83) were genetically modified through homology-directed gene targeting at AAVS1 (Fig. 7d and 7e).

Example 3. Gene targeting with uncapped AdV donor DNA following site-specific DSBs is inaccurate

Next, we sought to confirm donor DNA embedding in protein-capped AdV genomes as a key determinant conferring high specificity and fidelity to the AdV-based gene targeting process. To this end, we generated AdV.Δ2.donor S1/T-TS and used it together with the AdV pair AdV.Δ2.TALEN-L S1 and AdV.Δ2.TALEN-R S1 to transduce HeLa cells. This new vector has recognition sequences (T-TS) for the AAVS1-specific TALENs flanking its donor DNA payload (Fig. 8a). In striking contrast to the results obtained with the T-TS-negative AdV (Fig. 7d and 7e), PCR screening of genomic DNA from thirty eGFP+ HeLa cell clones expanded from cultures exposed to TALENs and AdV.Δ2.donor S1/T-TS DNA revealed that seven of these lines lacked HR-derived AAVS1-exogenous DNA "centromeric" junctions (Fig. 8a). Control experiments established the release of donor S1 DNA from AdV.Δ2.donor S1/T-TS in transduced cells, demonstrating for the first time the susceptibility of AdV DNA to TALEN catalytic activity (Fig. 19). Taken together, these results demonstrate that HR substrates delivered in the context of AdV DNA lead to more precise gene targeting events than those resulting from using free-ended viral vector genomes (Fig. 8b). Hence, these data suggest a crucial role of the donor DNA structure in achieving high rates of accurate homology-directed gene targeting following DSB formation at a target chromosomal locus.

Example 4. AdV gene targeting is compatible with the RNA-guided CRISPR/Cas9 nuclease system

Towards expanding the utility of AdV gene targeting, we investigated its compatibility with the versatile RNA-guided nuclease (RGN) system derived from clustered regularly interspaced short palindromic repeats (CRISPR)-associated Cas9 loci linked to adaptive immunity in prokaryotes [33]. To this end, we deployed a vector pair composed of AdV.Cas9 and AdV.gRNA S1. This vector pair encodes Cas9 and a single guide RNA (gRNA S1) addressing the Cas9 nuclease to a position in the human genome overlapping with the target site of the AAVS1-specific TALENs (Fig. 11a). Co-transduction of HeLa cells with AdV.Cas9 and AdV.gRNA S1 resulted in robust and dose-dependent DSB formation at AAVS1 (Fig. 12). Next, gene targeting experiments were initiated by co-transducing HeLa cells with AdV.Δ2.donor S1, AdV.Cas9 and AdV.gRNA S1. HeLa cells co-transduced exclusively with AdV.Δ2.donor S1 and AdV.Cas9 served as negative controls. In addition, we sought to determine genome modification endpoints resulting from gene targeting experiments using conventional non-viral vectors in the form of donor DNA plasmids with circular and linear topologies. In these experiments, to maximize the target site specificity of plasmid-based gene targeting approaches, we deployed the highly specific TALENs. The free-ended non-viral vector HR templates were generated in vitro and in cellula by restriction enzyme- and TALEN-induced DNA cleavage, respectively. Of note, the non-prokaryotic DNA portions of these plasmid molecules are isogenic to those present in protein-capped AdV.Δ2.donor S1 genomes. Thus, in these non-viral vector gene targeting experiments, HeLa cells were co-transfected with expression constructs encoding the AAVS1-specific TALENs mixed with pAdV.donor S1 (supercoiled), PacI-linearized pAdV.donor S1 (in vitro linearized) or TALEN-cleavable pAdV.donor S1/T-TS (in vivo linearized).
HeLa cells co-transfected exclusively with the TALEN-L S1 expression construct and each of the donor DNA plasmid types served as negative controls. In line with previous data, the percentages of eGFP+ cells in populations initially exposed to TALEN-L S1:TALEN-R S1 dimers and AdV.Δ2.donor S1 (Figs. 7a and 7b) were comparable to those measured in cultures exposed to Cas9:gRNA S1 complexes and AdV.Δ2.donor S1 (Fig. 13, first panel). Moreover, in the various experimental settings, the frequencies of genetically modified cells were significantly higher in cultures that had been subjected to site-specific DSBs, indicating nuclease-dependent gene targeting (Fig. 13). Crucially, stably transduced HeLa cells generated by co-delivering RGN complexes and AdV.Δ2.donor S1 DNA displayed a remarkably narrow range of transgene expression levels, as determined by flow cytometric screening of randomly selected eGFP+ clones (Figs. 11b and 11c, bottom line and right box, respectively). These results are reminiscent of those obtained by flow cytometric analyses of eGFP+ clones derived from myoblast cultures receiving TALENs and AdV donor DNA (Fig. 5c). In contrast, target cells exposed to TALENs and donor DNA plasmids led to eGFP+ clonal sets displaying significantly broader distributions of transgene expression levels, independently of the topology of the input targeting DNA (Figs. 11b and 11c). The higher homogeneity of transgene expression among the AdV-modified cells is also apparent when correlating the coefficient of variation (CV) and mean fluorescence intensity (MFI) values for each individual clone (Fig. 14).
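For readers unfamiliar with the two per-clone statistics mentioned above, the following Python sketch shows one common way to compute them from per-cell fluorescence readings. It is a minimal illustration under stated assumptions: the CV is taken as the sample standard deviation divided by the mean, expressed as a percentage, and the intensity values are invented placeholders rather than data from these experiments. The source does not specify how the flow cytometry software derived its values.

```python
import statistics


def clone_summary(intensities):
    """Return (MFI, CV in percent) for one clone's per-cell fluorescence values.

    MFI is computed here as the arithmetic mean; CV as sample standard
    deviation divided by the mean, times 100. Both definitions are assumptions.
    """
    mfi = statistics.mean(intensities)
    cv_percent = 100.0 * statistics.stdev(intensities) / mfi
    return mfi, cv_percent


# HYPOTHETICAL per-cell intensities (arbitrary units), for illustration only.
narrow_clone = [95.0, 100.0, 105.0, 98.0, 102.0]
broad_clone = [20.0, 60.0, 100.0, 140.0, 180.0]

for name, values in (("narrow", narrow_clone), ("broad", broad_clone)):
    mfi, cv = clone_summary(values)
    print(f"{name}: MFI = {mfi:.1f}, CV = {cv:.1f}%")
```

Two clones can share the same MFI while differing widely in CV, which is why plotting both values per clone helps separate homogeneous from heterogeneous transgene expression.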

Collectively, these results suggest that incorporating targeting DNA in AdV genomes rather than in conventional linear or circular recombinant plasmids contributes to diminishing off-target exogenous DNA insertions and, as a corollary, chromosomal position effects on transgene activity. Indeed, PCR analyses of donor DNA-AAVS1 junctions established a higher target site specificity of AdV over non-viral vector HR substrates even when using the presumably more promiscuous [34-36], yet more versatile, RGN system [33] (Fig. 11d and Figs. 15, 16 and 17). Of note, these data also represent, to the best of our knowledge, the first demonstration of the utility of the CRISPR/Cas9 system for viral vector-mediated exogenous DNA targeting.

There are precedents for the delivery and persistence of bacterial DNA in mammalian cells exposed to both non-viral and viral vectors [37, 38]. Besides their unpredictable structures, these prokaryotic DNA "footprints" are also undesirable due to their immunostimulatory and methylation-prone nucleotide patterns (e.g. CpG motifs). The probing for bacterial DNA in cell populations genetically modified by AdV- and plasmid-based gene targeting led to the detection of KanR DNA exclusively in the latter case (Fig. 18). This finding provides an additional rationale for using the "scarless" genome editing approach based on designer nucleases and AdV donor DNA.

References

1. Miller, D.G. et al. Adeno-associated virus vectors integrate at chromosome breakage sites. Nat. Genet. 36, 767-773 (2004).

2. Papapetrou, E.P. et al. Genomic safe harbors permit high β-globin transgene expression in thalassemia induced pluripotent stem cells. Nat. Biotechnol. 29, 73-78 (2011).

3. Tsai, H.H. et al. Terminal proteins of Streptomyces chromosome can target DNA into eukaryotic nuclei. Nucleic Acids Res. 36, e62 (2008).

4. Mencia, M. et al. Terminal protein-primed amplification of heterologous DNA with a minimal replication system based on phage Phi29. Proc. Natl. Acad. Sci. USA 108, 18655-18660 (2011).

5. Fallaux, F.J. et al. New helper cells and matched early region 1-deleted adenovirus vectors prevent generation of replication-competent adenoviruses. Hum. Gene Ther. 9, 1909-1917 (1998).

6. Schiedner, G. et al. Efficient transformation of primary human amniocytes by E1 functions of Ad5: generation of new cell lines for adenoviral vector production. Hum. Gene Ther. 11, 2105-2116 (2000).

7. Havenga, M.J. et al. Serum-free transient protein production system based on adenoviral vector and PER.C6 technology: high yield and preserved bioactivity. Biotechnol. Bioeng. 100, 273-283 (2008).

8. Silva, G. et al. Meganucleases and other tools for targeted genome engineering: perspectives and challenges for gene therapy. Curr. Gene Ther. 11, 11-27 (2011).

9. Gaj, T. et al. ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends Biotechnol. 31, 397-405 (2013).

10. Lombardo, A. et al. Gene editing in human stem cells using zinc finger nucleases and integrase-defective lentiviral vector delivery. Nat. Biotechnol. 25, 1298-1306 (2007).

11. Gabriel, R. et al. An unbiased genome-wide analysis of zinc-finger nuclease specificity. Nat. Biotechnol. 29, 816-823 (2011).

12. Redrejo-Rodriguez, M. et al. Functional eukaryotic nuclear localization signals are widespread in terminal proteins of bacteriophages. Proc. Natl. Acad. Sci. USA 109, 18482-18487 (2012).

13. Goncalves, M.A.F.V., van der Velde, I., Knaan-Shanzer, S., Valerio, D. & de Vries, A.A.F. Stable transduction of large DNA by high-capacity adeno-associated virus/adenovirus hybrid vectors. Virology 321, 287-296 (2004).

14. Cudre-Mauroux, C. et al. Lentivector-mediated transfer of Bmi-1 and telomerase in muscle satellite cells yields a Duchenne myoblast cell line with long-term genotypic and phenotypic stability. Hum. Gene Ther. 14, 1525-1533 (2003).

15. Goncalves, M.A.F.V. et al. Transcription factor rational design improves directed differentiation of human mesenchymal stem cells into skeletal myocytes. Mol. Ther. 19, 1331-1341 (2011).

16. Coluccio, A. et al. Targeted gene addition in human epithelial stem cells by zinc-finger nuclease-mediated homologous recombination. Mol. Ther. 21, 1695-1704 (2013).

17. Janssen, J.M., Liu, J., Skokan, J., Goncalves, M.A.F.V. & de Vries, A.A.F. Development of an AdEasy-based system to produce first- and second-generation adenoviral vectors with tropism for CAR- or CD46-positive cells. J. Gene Med. 15, 1-11 (2013).

18. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013).

19. Holkers, M. et al. Differential integrity of TALE nuclease genes following adenoviral and lentiviral vector gene transfer into human cells. Nucleic Acids Res. 41, e63 (2013).

20. Holkers, M., Cathomen, T. & Goncalves, M.A.F.V. Construction and characterization of adenoviral vectors for the delivery of TALENs into human cells. Methods, DOI 10.1016/j.ymeth.2014.02.017.

21. Pelascini, L.P. et al. Histone deacetylase inhibition rescues gene knockout levels achieved with integrase-defective lentiviral vectors encoding zinc-finger nucleases. Hum. Gene Ther. Methods 24, 399-411 (2013).

22. Pelascini, L.P.L., Janssen, J.M. & Goncalves, M.A.F.V. Histone deacetylase inhibition activates transgene expression from integration-defective lentiviral vectors in dividing and non-dividing cells. Hum. Gene Ther. 24, 78-96 (2013).

23. Pelascini, L.P.L. & Goncalves, M.A.F.V. Lentiviral Vectors Encoding Zinc-Finger Nucleases Specific for the Model Target Locus HPRT1. Methods Mol. Biol. In press (2013).

24. Brielmeier, M. et al. Improving stable transfection efficiency: antioxidants dramatically improve the outgrowth of clones under dominant marker selection. Nucleic Acids Res. 26, 2082-2085 (1998).

25. van Nierop, G.P., de Vries, A.A.F., Holkers, M., Vrijsen, K.R. & Goncalves, M.A.F.V. Stimulation of homology-directed gene targeting at an endogenous human locus by a nicking endonuclease. Nucleic Acids Res. 37, 5725-5736 (2009).

26. Szuhai, K. & Tanke, H.J. COBRA: combined binary ratio labeling of nucleic-acid probes for multi-color fluorescence in situ hybridization karyotyping. Nat. Protoc. 1, 264-275 (2006).

27. Goncalves, M.A.F.V. et al. Targeted chromosomal insertion of large DNA into the human genome by a fiber-modified high-capacity adenovirus-based vector system. PLoS ONE 3, e3084 (2008).

28. Holkers, M. et al. Nonspaced inverted DNA repeats are preferential targets for homology-directed gene repair in mammalian cells. Nucleic Acids Res. 40, 1984-1999 (2012).

29. Wanisch, K. & Yanez-Munoz, R.J. Integration-deficient lentiviral vectors: a slow coming of age. Mol. Ther. 17, 1316-1332 (2009).

30. Lombardo, A. et al. Site-specific integration and tailoring of cassette design for sustainable gene transfer. Nat. Methods 8, 861-869 (2011).

31. Benabdallah, B.F. et al. Targeted gene addition of microdystrophin in mice skeletal muscle via human myoblast transplantation. Mol. Ther. Nucleic Acids 2, e68 (2013).

32. Hoher, T. et al. Highly efficient zinc finger nuclease-mediated disruption of an eGFP transgene in keratinocyte stem cells without impairment of stem cell properties. Stem Cell Rev. 8, 426-434 (2012).

33. Terns, R.M. & Terns, M.P. CRISPR-based technologies: prokaryotic defense weapons repurposed. Trends Genet. 30, 111-118 (2014).

34. Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat. Biotechnol. 31, 822-826 (2013).

35. Hsu, P.D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827-832 (2013).

36. Cradick, T.J., Fine, E.J., Antico, C.J. & Bao, G. CRISPR/Cas9 systems targeting β-globin and CCR5 genes have substantial off-target activity. Nucleic Acids Res. 41, 9584-9592 (2013).

37. Miller, D.G., Rutledge, E.A. & Russell, D.W. Chromosomal effects of adeno-associated virus vector integration. Nat. Genet. 30, 147-148 (2002).

38. Chadeuf, G., Ciron, C., Moullier, P. & Salvetti, A. Evidence for encapsidation of prokaryotic sequences during recombinant adeno-associated virus production and their in vivo persistence after vector delivery. Mol. Ther. 12, 744-753 (2005).