Title:
NON-VIRAL TRANSGENESIS
Document Type and Number:
WIPO Patent Application WO/2021/022308
Kind Code:
A2
Abstract:
Provided herein are new compositions and methods for use in introducing transgenes into cells. The compositions are non-viral but achieve levels of transgene integration comparable to those obtained with viral-mediated methods, and can be used for targeted integration of a transgene at a specific genomic locus.

Inventors:
NI CHIH-WEN (US)
CHIANG CHANG-YING (US)
Application Number:
PCT/US2020/070344
Publication Date:
February 04, 2021
Filing Date:
July 31, 2020
Assignee:
UNIV CALIFORNIA (US)
International Classes:
C12N15/86
Attorney, Agent or Firm:
BRENNAN, Sean (US)
Claims:
CLAIMS

What is claimed is:

1. A polynucleotide comprising:

(a) one or more selection markers, wherein the selection markers are flanked by

(b) first and second att sites, wherein the att sites are flanked by

(c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and the first and second truncated retroviral LTRs are flanked by

(d) recognition sites for a restriction enzyme, wherein cleavage of the recognition sites generates blunt ends; and

(e) first and second 5'-ACTG-3' sequences, present at or near the termini of the polynucleotide.

2. The polynucleotide of claim 1, wherein the polynucleotide is double-stranded.

3. The polynucleotide of claim 1, wherein the selection marker is one or more of chloramphenicol resistance and the ccdB locus.

4. The polynucleotide of claim 1, wherein the first and second att sites are attR sites.

5. The polynucleotide of claim 4, wherein the first attR site is attR4 and the second attR site is attR3.

6. The polynucleotide of claim 1, wherein the retroviral LTRs are LTRs from a lentivirus.

7. The polynucleotide of claim 6, wherein the lentivirus is a human immunodeficiency virus (HIV).

8. The polynucleotide of claim 7, wherein the HIV is HIV-1.

9. The polynucleotide of claim 6, wherein the first truncated LTR comprises R and U5 sequence elements.

10. The polynucleotide of claim 6, wherein the second truncated LTR comprises dU3, R and U5 sequence elements.

11. The polynucleotide of claim 9, wherein the first truncated LTR sequence comprises:

12. The polynucleotide of claim 10, wherein the second truncated LTR sequence comprises:

13. The polynucleotide of claim 1, wherein the restriction enzyme is selected from the group consisting of PmeI, ScaI and BstZ17I.

14. The polynucleotide of claim 1, wherein the first and second 5'-ACTG-3' sequences are present at the termini of the polynucleotide.

15. The polynucleotide of claim 1, wherein the first and second 5'-ACTG-3' sequences are present one base pair inside the termini of the polynucleotide.

16. The polynucleotide of claim 1, wherein the first and second 5'-ACTG-3' sequences are present two base pairs inside the termini of the polynucleotide.

17. A polynucleotide comprising:

(a) sequences encoding chloramphenicol resistance and the ccdB locus, wherein the sequences are flanked by

(b) an upstream attR4 site and a downstream attR3 site, wherein the att sites are flanked by

(c) a 5' dLTR sequence comprising R and U5 sequence elements upstream of the attR4 site and a 3' dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attR3 site, wherein the 5' and 3' dLTR sequences are flanked by

(d) recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.

18. The polynucleotide of claim 17, wherein the 5' dLTR sequence comprises SEQ ID NO:4, and the 3' dLTR sequence comprises SEQ ID NO:5.

19. A polynucleotide having at least 90% sequence homology to the polynucleotide of claim 18.

20. A polynucleotide that is complementary to the polynucleotide of claim 18.

21. A polynucleotide that hybridizes under stringent conditions to the polynucleotide of claim 18.

22. The polynucleotide of claim 1, wherein:

(a) the polynucleotide further comprises a transgene disposed between the first and second truncated retroviral LTRs; and

(b) the polynucleotide does not contain a selection marker.

23. The polynucleotide of claim 22, wherein the first and second att sites are attP sites.

24. The polynucleotide of claim 22, wherein the first attP site is attP4 and the second attP site is attP3.

25. A polynucleotide comprising:

(a) sequences encoding a transgene, wherein the sequences encoding a transgene are flanked by

(b) an upstream attP4 site and a downstream attP3 site, wherein the att sites are flanked by

(c) a 5' dLTR sequence comprising R and U5 sequence elements upstream of the attR4 site and a 3' dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attR3 site, wherein the 5' and 3' dLTR sequences are flanked by

(d) recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.

26. A nucleic acid vector comprising the polynucleotide of claim 1.

27. A nucleic acid vector comprising the polynucleotide of claim 17.

28. The vector of claim 26, wherein:

(a) the vector further comprises a transgene disposed between the first and second truncated retroviral LTRs; and

(b) the vector does not contain a selection marker disposed between the first and second truncated retroviral LTRs.

29. A nucleic acid vector comprising:

(a) sequences encoding a transgene, wherein the sequences encoding a transgene are flanked by

(b) an upstream attP4 site and a downstream attP3 site, wherein the att sites are flanked by

(c) a 5' dLTR sequence comprising R and U5 sequence elements upstream of the attR4 site and a 3' dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attR3 site, wherein the 5' and 3' dLTR sequences are flanked by

(d) recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.

30. The vector of claim 28, wherein the vector is cleaved with a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.

31. The vector of claim 29, wherein the vector is cleaved with a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.

32. A combination comprising:

(a) the polynucleotide of claim 22; and

(b) a plasmid containing sequences encoding a retroviral integrase.

33. A combination comprising:

(a) the polynucleotide of claim 25; and

(b) a plasmid containing sequences encoding a retroviral integrase.

34. A combination comprising:

(a) the vector of claim 30; and

(b) a plasmid containing sequences encoding a retroviral integrase.

35. A combination comprising:

(a) the vector of claim 31; and

(b) a plasmid containing sequences encoding a retroviral integrase.

36. A combination comprising:

(a) the polynucleotide of claim 22; and

(b) mRNA encoding a retroviral integrase.

37. A combination comprising:

(a) the polynucleotide of claim 25; and

(b) mRNA encoding a retroviral integrase.

38. A combination comprising:

(a) the vector of claim 30; and

(b) mRNA encoding a retroviral integrase.

39. A combination comprising:

(a) the vector of claim 31; and

(b) mRNA encoding a retroviral integrase.

40. The combination of any of claims 32-39, wherein the retroviral integrase is from a lentivirus.

41. The combination of claim 40, wherein the lentivirus is human immunodeficiency virus (HIV).

42. The combination of claim 41, wherein the HIV is HIV-1.

43. The combination of any of claims 32-42, wherein the integrase comprises an additional nuclear localization signal (NLS) not present in the naturally-occurring integrase protein.

44. The combination of claim 43, wherein the additional NLS is selected from the group consisting of the SV40 NLS, the c-myc NLS, the HIV Vpr NLS and the hnRNP A1 NLS.

45. A method for inserting a transgene into the genome of a cell, the method comprising contacting the cell with the combination of any of claims 32-44.

46. The combination of any of claims 32-44, further comprising:

(a) a polynucleotide encoding a fusion between dCas9 and psipla; and

(b) a guide RNA comprising:

(i) a hairpin sequence that binds to Cas9 or dCas9, and

(ii) a sequence complementary to a target sequence.

47. A method for targeted insertion of a transgene into the genome of a cell, the method comprising contacting the cell with the combination of claim 46.

48. The method of either of claims 45 or 47, wherein contact is by transfection.

49. The method of either of claims 45 or 47, wherein contact is by injection.

50. A plasmid comprising:

(a) a first recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I;

(b) the sequence 5'-ACTG-3'

3'-TGAC-5';

(c) a first truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:4 that is interior to the first recognition site and the 5'-ACTG-3' sequence;

(d) an attR4 site that is interior to the first truncated LTR sequence;

(e) the ccdB locus;

(f) an attR3 site that is exterior to the ccdB locus;

(g) a second truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:5 that is exterior to the attR3 site;

(h) the sequence 5'-CAGT-3'

3'-GTCA-5'; and

(i) a second recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I, wherein the second recognition site is the same as the first recognition site; and

wherein the 5'-CAGT-3' sequence and the second recognition site are exterior to the second truncated LTR sequence.

51. The plasmid of claim 50, wherein the 5'-ACTG-3' sequence overlaps with the first recognition site and the 5'-CAGT-3' sequence overlaps with the second recognition site.

52. A plasmid comprising:

(a) a first recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I;

(b) the sequence 5'-ACTG-3'

3'-TGAC-5';

(c) a first truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:4 that is interior to the first recognition site and the 5'-ACTG-3' sequence;

(d) an attP4 site that is interior to the first truncated LTR sequence;

(e) a transgene;

(f) an attP3 site that is exterior to the transgene;

(g) a second truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:5 that is exterior to the attP3 site;

(h) the sequence 5'-CAGT-3' and

(i) a second recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I, wherein the second recognition site is the same as the first recognition site; and

wherein the 5'-CAGT-3' sequence and the second recognition site are exterior to the second truncated LTR sequence.

53. The plasmid of claim 52, wherein the 5'-ACTG-3' sequence overlaps with the first recognition site and the 5'-CAGT-3' sequence overlaps with the second recognition site.

Description:
NON-VIRAL TRANSGENESIS

[0001] This application claims the benefit of United States Provisional Patent Application No. 62/881,822, filed August 1, 2019, the entire disclosure (including text, drawings, and photographs) of which is incorporated by reference herein, in its entirety, for all purposes.

SEQUENCE LISTING

[0002] The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on July 29, 2020, is named M2-PCT_SL.txt and is 54,500 bytes in size.

FIELD

[0003] The present disclosure is in the field of transgenesis. New compositions for use in inserting a transgene into a cell, and methods utilizing said new compositions, are provided herein.

BACKGROUND

[0004] Methods for insertion of exogenous genes (transgenes) into cells are increasingly important in the fields of genetic research and gene therapy. Although a number of methods for introducing transgenes into cells exist, all are beset with problems of one sort or another. Transfection methods (i.e., simply contacting cells with naked DNA or a DNA conjugate) have a low efficiency and often result in the exogenous sequences undergoing rearrangement in the recipient cell.

[0005] Viral vectors, including adenovirus, adeno-associated virus (AAV), retrovirus, foamy virus, herpesvirus, and poxvirus vectors, have also been used for inserting transgenes into cells. Viral transgenesis is more efficient than simple transfection, and can provide stable transgenesis if the virally-introduced transgene is integrated into the recipient cell genome, or maintained in the recipient cell as an episome. However, viral vectors require modification of the viral genome so that replication is blocked or inefficient, which, in turn, requires that the debilitated vector virus be propagated in the presence of a helper virus (which supplies, in trans, the functions missing in the vector virus), requiring complicated culture systems.

[0006] An additional drawback associated with the use of viral vectors is the limitation on the size of the transgene that can be inserted into a viral vector, since even vector viruses must retain a certain amount of viral sequence to work effectively as a delivery vehicle, and most viruses are unable to package DNA molecules any larger than about 110% of viral genome size.

[0007] Another problem with the use of viral vectors in gene therapy is the ability of the capsid proteins of the vector virus to induce an immune response, which can destroy or damage the vector before the transgene is stably introduced into the recipient cell.

[0008] One class of viral vectors is retroviruses. Retroviruses (which include the genus of lentiviruses) have a single-stranded RNA genome. A repeated sequence (R) is present at the extreme 5' and 3' ends of the retroviral genome. Immediately interior to the R sequence, at the 5' end of viral RNA, is a sequence known as U5. Immediately interior to the R sequence, at the 3' end of viral RNA, is a sequence known as U3. A schematic diagram of a generic retroviral RNA genome, showing the location of the R, U5 and U3 sequences, is shown in Figure 1.

[0009] During the retroviral infectious cycle, the RNA genome is copied into a single-stranded DNA molecule (by a process of reverse transcription, catalyzed by the reverse transcriptase enzyme, product of the viral pol gene). The single-stranded DNA product of reverse transcription is then copied (again by reverse transcriptase) to form a double-stranded viral DNA molecule. Due to the nature of the copying processes (e.g., requirements for primers), the U3 sequence becomes appended to the 5' end of the double-stranded viral DNA genome (exterior to the R sequence); and the U5 sequence is appended to the 3' end of the double-stranded viral DNA genome (exterior to the R sequence), forming identical long terminal repeat (LTR) sequences at the termini of the double-stranded DNA genome. A schematic diagram of a generic retroviral double-stranded DNA genome, showing the location of the LTRs, and their constituent R, U5 and U3 sequences, is shown in Figure 2.

[0010] Following conversion of the single-stranded RNA genome to a double-stranded DNA genome, the double-stranded DNA genome, flanked by its LTRs, is inserted into the host cell genome. This insertion reaction is catalyzed by the viral integrase protein (also a product of the pol gene), and requires a double-stranded, blunt-ended DNA molecule, with the inverted terminal repeat sequence 5'-ACTG-3' (for HIV-1) as a substrate. The integrase protein removes the terminal TG residues on each strand, generating a double-stranded DNA molecule with a two-nucleotide 5' overhang (5'-AC-3') at each end. This molecule serves as a substrate for strand transfer by the int protein and is integrated into the host cell genome.

[0011] Retrovirus genomes are generally 8 kb or more in length and because, in most cases, all viral structural genes can be removed and replaced with exogenous sequences, retroviral vectors have a high capacity, requiring only that the transgene be flanked by viral LTRs to facilitate integration. However, the efficiency of stable transgenesis using retroviruses is comparatively low, and most retroviruses (excepting lentiviruses) are unable to infect non-dividing cells. Furthermore, when retrovirus vectors are used in gene therapy applications, retroviral capsid proteins can trigger immune responses.
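The end-processing step described in paragraph [0010] can be illustrated with a minimal Python sketch. It is an illustration only, not a model of the enzyme: the function names `revcomp` and `process_ends` and the toy duplex are hypothetical, both strands are written 5' to 3', and the sketch simply trims two nucleotides from the 3' end of each strand of a blunt duplex that carries 5'-ACTG-3' at both termini.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def process_ends(top: str) -> dict:
    """Trim two 3'-terminal nucleotides from each strand of a blunt duplex.

    `top` is the top strand, 5'->3'. The bottom strand is derived as its
    reverse complement, so the duplex is blunt by construction. The
    dinucleotide removed from each strand is 5'-GT-3', i.e. "TG" when
    read from the 3' terminus inward.
    """
    bottom = revcomp(top)
    if not (top.startswith("ACTG") and bottom.startswith("ACTG")):
        raise ValueError("duplex does not carry 5'-ACTG-3' at both termini")
    return {
        "top": top[:-2],               # top strand after trimming
        "bottom": bottom[:-2],         # bottom strand after trimming
        "overhang_left": top[:2],      # unpaired 5' nucleotides, top strand
        "overhang_right": bottom[:2],  # unpaired 5' nucleotides, bottom strand
    }


# Toy substrate (hypothetical sequence): ACTG ... CAGT on the top strand.
result = process_ends("ACTG" + "GGATTC" + "CAGT")
print(result)
```

In this toy model each processed end carries the unpaired 5'-AC-3' dinucleotide described in the text.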

[0012] For the reasons discussed above, there remains a need for transgenesis systems which have the benefits of viral vectors, such as high efficiency of genomic integration, but that do not suffer from the drawbacks associated with viral vectors, such as limited capacity and immunogenicity.

SUMMARY

[0013] Disclosed herein are nucleic acid compositions, and methods for their manufacture and use, that promote highly efficient insertion of transgenes, at levels commonly achieved with viral vectors, but without the use of virus particles. The compositions include transgene cassettes, which have a linear double-stranded DNA structure that resembles a retroviral pre-integration substrate, characterized by blunt ends, a terminal 5'-ACTG-3' sequence and truncated retroviral long terminal repeat (LTR) sequences. Nucleic acid vectors (insertion vectors) comprising transgene cassettes are also provided.

[0014] Transgene cassettes can be released from an insertion vector (e.g., a double-stranded circular plasmid DNA molecule) by cleavage with a restriction enzyme that generates blunt ends. Insertion vectors comprise one or more pairs of att sites, optionally with a negative selection marker disposed therebetween, for convenient insertion of transgenes using Gateway cloning methods. Exterior to the att sites, insertion cassettes contain truncated retroviral long terminal repeat (LTR) sequences, a 5'-ACTG-3' sequence and recognition sites for a blunt end-generating restriction enzyme.
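Release of a cassette from a circular vector by blunt-end cleavage can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the plasmid is a short made-up sequence, the function names `cut_positions` and `linear_fragments` are hypothetical, and the recognition sequences used (PmeI GTTTAAAC, ScaI AGTACT, BstZ17I GTATAC, each cut at its center to leave blunt ends) are taken from standard enzyme references rather than from this document.

```python
# Palindromic recognition sites of three blunt cutters; each is assumed
# to be cleaved exactly in the middle of the site.
BLUNT_CUTTERS = {
    "PmeI": "GTTTAAAC",
    "ScaI": "AGTACT",
    "BstZ17I": "GTATAC",
}


def cut_positions(plasmid: str, site: str) -> list:
    """Return sorted cut coordinates for `site` on a circular sequence."""
    n = len(plasmid)
    # Append the first len(site)-1 bases so sites spanning the origin are found.
    doubled = plasmid + plasmid[: len(site) - 1]
    half = len(site) // 2
    positions = []
    start = doubled.find(site)
    while start != -1 and start < n:
        positions.append((start + half) % n)
        start = doubled.find(site, start + 1)
    return sorted(positions)


def linear_fragments(plasmid: str, site: str) -> list:
    """Return the linear fragments produced by complete digestion.

    An uncut plasmid yields an empty list, since it stays circular.
    """
    cuts = cut_positions(plasmid, site)
    fragments = []
    for i, cut in enumerate(cuts):
        nxt = cuts[(i + 1) % len(cuts)]
        if nxt > cut:
            fragments.append(plasmid[cut:nxt])
        else:  # fragment wraps around the origin of the circle
            fragments.append(plasmid[cut:] + plasmid[:nxt])
    return fragments


# Toy vector: backbone + ScaI site + G...C insert + ScaI site + backbone.
toy = "TTGGCC" + "AGTACT" + "GCATTCGAC" + "AGTACT" + "GGAACC"
for fragment in linear_fragments(toy, BLUNT_CUTTERS["ScaI"]):
    print(fragment)
```

Because ScaI cuts AGT^ACT, placing a G immediately after the first site and a C immediately before the second site makes the released fragment begin with ACTG and end with CAGT, which is one way the terminal sequence can overlap the recognition site.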

[0015] Integration of a transgene into the genome of a cell is accomplished by contacting the cell with a transgene cassette and a source of retroviral integrase (e.g., DNA or mRNA encoding a retroviral integrase (int) enzyme). The integrase protein recognizes the transgene cassette as a substrate for integration, and integrates the transgene cassette into the genome of the recipient cell.

[0016] Accordingly, in certain embodiments, provided herein is a polynucleotide (i.e., a transgene cassette) comprising: (a) one or more selection markers, wherein the selection markers are flanked by (b) first and second att sites, wherein the att sites are flanked by (c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and wherein the first and second truncated retroviral LTRs are flanked by recognition sites for a restriction enzyme, wherein cleavage of the recognition sites generates blunt ends, and wherein the sequence 5'-ACTG-3' is present at or near the termini of the polynucleotide.
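The nested "flanked by" arrangement of paragraph [0016] can be expressed as a small consistency check. The following Python sketch is a hypothetical data model, not part of the disclosure: the element labels (`restriction_site`, `ltr`, `att`, `marker`, `transgene`) and the function name `is_nested_cassette` are assumptions made for illustration, and the terminal 5'-ACTG-3' sequences are left out of the model because they may overlap the recognition sites.

```python
# Flanking elements listed from the outside of the cassette inward.
OUTER_TO_INNER = ("restriction_site", "ltr", "att")
PAYLOAD_KINDS = {"marker", "transgene"}


def is_nested_cassette(kinds: list) -> bool:
    """Check that a left-to-right list of element kinds is symmetrically nested.

    The expected layout is:
    restriction_site, ltr, att, <payload...>, att, ltr, restriction_site
    """
    depth = len(OUTER_TO_INNER)
    if len(kinds) < 2 * depth + 1:
        return False  # no room for a payload between the att sites
    left = tuple(kinds[:depth])
    right = tuple(reversed(kinds[-depth:]))
    payload = kinds[depth:-depth]
    return (
        left == OUTER_TO_INNER
        and right == OUTER_TO_INNER
        and all(kind in PAYLOAD_KINDS for kind in payload)
    )


# Layout with two selection markers between the att sites.
layout = ["restriction_site", "ltr", "att",
          "marker", "marker",
          "att", "ltr", "restriction_site"]
print(is_nested_cassette(layout))
```

A list in which an LTR sits inside the att sites, or in which the payload is missing, fails the check.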

[0017] In certain embodiments, the polynucleotide described in the preceding paragraph is a double-stranded DNA molecule. In additional embodiments, the polynucleotide is single-stranded DNA or RNA.

[0018] Selection markers can be positive selection markers (i.e., the presence of the marker promotes cell viability in the presence of a selective agent) or negative selection markers (e.g., a marker that is inhibitory to cell viability so that cells survive when the marker is removed or replaced by exogenous sequences). Exemplary positive selection markers include those encoding resistance to antibiotics such as, for example, penicillin, ampicillin, tetracycline and chloramphenicol. Exemplary negative selection markers include the DNA gyrase inhibitor ccdB.

[0019] In certain embodiments, the att sites present in the transgene cassette are attR sites. In further embodiments, the first att site is attR4 and the second att site is attR3. In additional embodiments, the att sites are attL sites, attP sites or attB sites. Mutants and variants of att sites such as, for example, attP3, attP4, attR1, attR2, attR3, attR4, attL1, attL2, attL3 and attL4 are known in the art.

[0020] Truncated retroviral LTR sequences can be obtained from the genome of any retrovirus, as known in the art. In certain embodiments, the retrovirus is a lentivirus and the transgene cassette contains truncated lentiviral LTRs. In additional embodiments, the lentivirus is HIV, and the transgene cassette contains truncated HIV LTRs. In further embodiments, the lentivirus is HIV-1, and the transgene cassette contains truncated HIV-1 LTRs.

[0021] In certain embodiments, a truncated retroviral LTR is one in which one or more transcriptional regulatory sequences, normally present in the U3 region, are removed. Accordingly, certain truncated LTRs contain deleted U3 (dU3), R and U5 sequences. In additional embodiments of a truncated retroviral LTR, all U3 sequences are removed. Accordingly, certain truncated LTRs contain R and U5 sequences, but no U3 sequences. In certain embodiments, the first truncated LTR comprises R and U5 sequence elements and the second truncated LTR comprises dU3, R and U5 sequence elements. In additional embodiments, the first truncated LTR comprises the nucleotide sequence:

[0022] In additional embodiments, the second truncated LTR sequence comprises the nucleotide sequence:

[0023] In further embodiments, the first truncated LTR comprises the nucleotide sequence:

and the second truncated LTR sequence comprises the nucleotide sequence:

[0024] The termini of the transgene cassette comprise recognition sites for a restriction enzyme whose cleavage results in production of blunt ends. In certain embodiments, the recognition sites comprise six or more nucleotide pairs (i.e., six, seven, eight, nine, ten, twelve or more nucleotide pairs). The longer the recognition site, the less likely it is that the restriction enzyme that recognizes that site will also recognize a site in the transgene insert (thereby destroying the integrity of the transgene). Generally both recognition sites will be recognized by the same restriction enzyme, but it is also possible to have recognition sites for different restriction enzymes at each end of the cassette, as long as both enzymes generate blunt ends after cleavage. In certain embodiments, the recognition sites are the same at both ends of the cassette and are recognized by a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.
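The relationship between recognition-site length and the chance of an unwanted internal site can be made concrete with a rough estimate. The Python sketch below assumes a random insert with equal base frequencies, which real transgenes only approximate; the function name `expected_sites` and the 10,000-bp insert length are hypothetical choices for illustration.

```python
def expected_sites(insert_length: int, site_length: int) -> float:
    """Expected occurrences of one fixed site in a random insert.

    Assumes each position is independently A, C, G or T with
    probability 1/4, so a given site of length k matches any one
    window with probability (1/4) ** k.
    """
    windows = max(insert_length - site_length + 1, 0)
    return windows / 4 ** site_length


# Compare a six-base and an eight-base site in a 10,000-bp insert.
for k in (6, 8):
    print(k, round(expected_sites(10_000, k), 3))
```

Under these assumptions each additional pair in the recognition site lowers the expected number of internal sites roughly fourfold, so an eight-base site is expected about sixteen times less often than a six-base site.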

[0025] Transgene cassettes also contain the sequence 5'-ACTG-3' at or near the termini of the polynucleotide. In certain embodiments, the sequence 5'-ACTG-3' is present exactly at the termini of the transgene cassette, such that the transgene cassette terminates in blunt ends having the sequence

[0026] In other embodiments, one additional nucleotide pair is present, outside the sequence 5'-ACTG-3', at the termini of the transgene cassette. In additional embodiments, two additional nucleotide pairs are present, outside the sequence 5'-ACTG-3', at the termini of the transgene cassette. In further embodiments, three, four or five additional nucleotide pairs are present, outside the sequence 5'-ACTG-3', at the termini of the transgene cassette.
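Whether 5'-ACTG-3' lies exactly at a terminus, or one to five nucleotide pairs inside it, can be checked with a short Python sketch. The function name `actg_offsets` and the example sequences are hypothetical; the far end of the cassette is examined on the complementary strand, where the top-strand ...CAGT-3' reads as 5'-ACTG-3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def actg_offsets(top: str, max_offset: int = 5) -> tuple:
    """Return (left, right) offsets of 5'-ACTG-3' from each terminus.

    An offset of 0 means the motif sits exactly at the terminus; an
    offset of k means k additional nucleotide pairs lie outside it.
    None means no motif was found within `max_offset` pairs of that end.
    """
    def offset(strand: str):
        for k in range(max_offset + 1):
            if strand[k:k + 4] == "ACTG":
                return k
        return None

    return offset(top), offset(revcomp(top))


print(actg_offsets("ACTGCATTCGACAGT"))    # motif at both termini
print(actg_offsets("GACTGCATTCGACAGTC"))  # one extra pair at each end
```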

[0027] In certain embodiments, provided herein is a transgene cassette comprising (a) sequences encoding chloramphenicol resistance and the ccdB locus, wherein the sequences encoding chloramphenicol resistance and the ccdB locus are flanked by (b) an upstream attR4 site and a downstream attR3 site, wherein the att sites are flanked by (c) a 5' dLTR sequence comprising R and U5 sequence elements upstream of the attR4 site and a 3' dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attR3 site, wherein the 5' and 3' dLTR sequences are flanked by (d) recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I and wherein all or part of the sequence 5'-ACTG-3' is present within or near the recognition site for the restriction enzyme.

[0028] In certain embodiments of the transgene cassette described in the preceding paragraph, the 5' dLTR sequence comprises SEQ ID NO:4, and the 3' dLTR sequence comprises SEQ ID NO:5.

[0029] In additional embodiments, polynucleotides whose nucleotide sequences are homologous to that of the transgene cassette are provided. The nucleotide sequences of the homologous polynucleotides are at least 50% homologous, at least 60% homologous, at least 70% homologous, at least 75% homologous, at least 80% homologous, at least 85% homologous, at least 90% homologous, at least 95% homologous, at least 96% homologous, at least 97% homologous, at least 98% homologous, or at least 99% homologous to the sequence of the transgene cassettes described herein. Such homologous polynucleotides can be DNA or RNA and can be single-stranded or double-stranded.

[0030] In additional embodiments, polynucleotides having nucleotide sequences complementary to the sequence of either strand of the transgene cassette are provided. Such polynucleotides can be DNA or RNA. In further embodiments, this disclosure provides polynucleotides that hybridize under stringent conditions to a transgene cassette as disclosed herein.

[0031] Also provided are nucleic acid vectors (e.g., plasmid vectors) comprising a transgene cassette as disclosed herein; i.e., transgene vectors. Accordingly, in certain embodiments, provided herein is a plasmid comprising: (a) one or more selection markers, wherein the selection markers are flanked by (b) first and second att sites, wherein the att sites are flanked by (c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and wherein the first and second truncated retroviral LTRs are flanked by recognition sites for a restriction enzyme, wherein cleavage of the recognition sites generates blunt ends and wherein all or part of the sequence 5'-ACTG-3' is present within or near the recognition site for the restriction enzyme.

[0032] In additional embodiments, provided herein is a plasmid comprising (a) sequences encoding chloramphenicol resistance and the ccdB locus, wherein the sequences encoding chloramphenicol resistance and the ccdB locus are flanked by (b) an upstream attR4 site and a downstream attR3 site, wherein the att sites are flanked by (c) a 5' dLTR sequence comprising R and U5 sequence elements upstream of the attR4 site and a 3' dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attR3 site, wherein the 5' and 3' dLTR sequences are flanked by (d) first and second 5'-ACTG-3' sequences, wherein all or part of the first and second 5'-ACTG-3' sequences are within or near (e) recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I.

[0033] Also provided are plasmid vectors comprising a transgene cassette and a transgene. In certain embodiments, the transgene is located between the att sites of the transgene cassette, having been inserted by Gateway cloning methodology, and optionally replacing one or more selection markers that were present between the att sites prior to insertion of the transgene. In certain embodiments, att sites present in the transgene vector (e.g., attR4 and attR3) are converted into different att sites (e.g., attP4 and attP3) in the process of transgene insertion. Transgenes are introduced by one-way, two-way or three-way Gateway cloning, as known in the art. See, for example, Hartley et al. (2000) Genome Research 10:1788-1795.

[0034] Any sequence, coding or noncoding, can serve as a transgene. For example, a transgene can encode a detectable moiety; e.g., a fluorescent protein, such as green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), red fluorescent protein, yellow fluorescent protein, tdTomato and the like. A transgene can also encode an enzymatic activity (e.g., β-galactosidase, β-glucuronidase, luciferase, or an oxidoreductase). A transgene can also encode a therapeutic protein, such as globin or a coagulation factor.

[0035] Accordingly, in certain embodiments, provided herein is a polynucleotide comprising: (a) a transgene, wherein the transgene is flanked by (b) first and second att sites, wherein the att sites are flanked by (c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and wherein the first and second truncated retroviral LTRs are flanked by recognition sites for a restriction enzyme, wherein cleavage of the recognition sites generates blunt ends; the polynucleotide further comprising the sequence 5'-ACTG-3' at or near its termini (i.e., at the termini of the polynucleotide, or within one, two, three, four or five nucleotide pairs of the termini of the polynucleotide); and optionally wherein a selection marker is not present between the two att sites. In certain embodiments, the 5' dLTR sequence comprises SEQ ID NO:4, and the 3' dLTR sequence comprises SEQ ID NO:5. In further embodiments, this polynucleotide is present in a plasmid. In additional embodiments, this polynucleotide is a linear, double-stranded DNA molecule.

[0036] In additional embodiments, provided herein is a polynucleotide comprising (a) a transgene, wherein the transgene is flanked by (b) an upstream attP4 site and a downstream attP3 site, wherein the att sites are flanked by (c) a 5' dLTR sequence comprising R and U5 sequence elements upstream of the attP4 site and a 3' dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attP3 site, wherein the 5' and 3' dLTR sequences are flanked by recognition sites for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I, wherein the sequence 5'-ACTG-3' is present within or near the recognition site for the restriction enzyme, and optionally wherein a selection marker is not present between the attP4 and attP3 sites. In certain embodiments, the 5' dLTR sequence comprises SEQ ID NO:4, and the 3' dLTR sequence comprises SEQ ID NO:5. In further embodiments, this polynucleotide is present in a plasmid. In additional embodiments, this polynucleotide is a linear, double-stranded DNA molecule.

[0037] In certain embodiments, the compositions disclosed herein comprise a plurality of DNA molecules resulting from cleavage of a plasmid with a restriction enzyme that generates blunt ends, wherein the plasmid comprises a transgene-containing transgene cassette. In additional embodiments, the restriction enzyme is selected from the group consisting of PmeI, ScaI and BstZ17I.

[0038] Accordingly, in certain embodiments, provided herein is a plurality of DNA molecules, one of which comprises: (a) a transgene, wherein the transgene is flanked by (b) first and second att sites, wherein the att sites are flanked by (c) first and second truncated retroviral long terminal repeats (LTRs), wherein the first truncated LTR is upstream of the first att site, the second truncated LTR is downstream of the second att site, and wherein the first and second truncated retroviral LTRs are flanked by (d) partial restriction enzyme recognition sites generated by cleavage with a restriction enzyme that generates blunt ends; and further comprising all or part of the sequence 5'-ACTG-3' at or near its termini (i.e., at its terminus, or within one, two, three, four or five nucleotide pairs of its terminus). In further embodiments, the restriction enzyme is selected from the group consisting of PmeI, ScaI and BstZ17I. In certain embodiments, the 5' dLTR sequence comprises SEQ ID NO:4, and the 3' dLTR sequence comprises SEQ ID NO:5.

[0039] In additional embodiments, this disclosure provides a plurality of DNA molecules, one of which comprises (a) a transgene, wherein the transgene is flanked by (b) an upstream attP4 site and a downstream attP3 site, wherein the att sites are flanked by (c) a 5' dLTR sequence comprising R and U5 sequence elements upstream of the attP4 site and a 3' dLTR sequence comprising dU3, R and U5 sequence elements downstream of the attP3 site, wherein the 5' and 3' dLTR sequences are flanked by (d) partial restriction enzyme recognition sites generated by cleavage with a restriction enzyme that generates blunt ends; and further comprising all or part of the sequence 5'-ACTG-3' at or near its termini (i.e., at its terminus, or within one, two, three, four or five nucleotide pairs of its terminus). In further embodiments, the restriction enzyme is selected from the group consisting of PmeI, ScaI and BstZ17I. In certain embodiments, the 5' dLTR sequence comprises SEQ ID NO:4, and the 3' dLTR sequence comprises SEQ ID NO:5.

[0040] Also provided are nucleic acids (double-stranded DNA, single-stranded DNA and/or RNA) encoding a retroviral integrase protein. If the integrase-encoding nucleic acid is DNA, it can be present in a DNA vector (e.g., a plasmid) in either double-stranded or single-stranded form. The integrase can further comprise one or more nuclear localization signals (NLS) in addition to the endogenous integrase NLS.

[0041] Also provided are combinations of a nucleic acid (DNA or RNA) encoding a retroviral (e.g., lentiviral; e.g., HIV; e.g., HIV-1) integrase and a transgene-containing transgene cassette (as described above). Further provided are combinations of a nucleic acid (DNA or RNA) encoding a retroviral (e.g., HIV; e.g., HIV-1) integrase and a plurality of DNA molecules (e.g., linear double-stranded DNA molecules) comprising a transgene-containing transgene cassette as described above. For use in methods for targeted integration of a transgene, any of the combinations described previously in this paragraph can further comprise a polynucleotide encoding a fusion between dCas9 and psip1a (or a polypeptide comprising a fusion between dCas9 and psip1a); and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.

[0042] Additionally provided herein are methods for introducing a transgene into the genome of a cell, wherein the methods comprise contacting the cell with a combination of a transgene-containing transgene cassette and a nucleic acid encoding a retroviral integrase protein. Contacting can be by, for example, transfection, electroporation, injection or any other method of introducing nucleic acids into a cell. Transgene-containing transgene cassettes have been described above and can be one of a plurality of the products of digestion of a plasmid with a blunt end-generating restriction enzyme. Alternatively, a transgene-containing transgene cassette can be an isolated DNA (or RNA) molecule.

[0043] The integrase-encoding nucleic acid can be DNA or mRNA. The retroviral integrase protein can be from any retrovirus. In certain embodiments, the retrovirus is a lentivirus. In additional embodiments, the lentivirus is HIV. In further embodiments, the HIV is HIV-1.

[0044] In certain embodiments, provided herein is a plasmid comprising (a) a first recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I; (b) the sequence 5'-ACTG-3'; (c) a first truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:4 that is interior to the first recognition site and the 5'-ACTG-3' sequence; (d) an attR4 site that is interior to the first truncated LTR sequence; (e) the ccdB locus; (f) an attR3 site that is exterior to the ccdB locus; (g) a second truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:5 that is exterior to the attR3 site; (h) the sequence 5'-CAGT-3'; and (i) a second recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I, wherein the second recognition site is the same as the first recognition site; and wherein the 5'-CAGT-3' sequence and the second recognition site are exterior to the second truncated LTR sequence. In certain embodiments, the 5'-ACTG-3' sequence overlaps with the first recognition site and the 5'-CAGT-3' sequence overlaps with the second recognition site.

[0045] In additional embodiments, provided herein is a plasmid comprising, in sequence (a) a first recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I; (b) the sequence 5'-ACTG-3'; (c) a first truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:4 that is interior to the first recognition site and the 5'-ACTG-3' sequence; (d) an attP4 site that is interior to the first truncated LTR sequence; (e) a transgene; (f) an attP3 site that is exterior to the transgene; (g) a second truncated long terminal repeat (LTR) sequence comprising SEQ ID NO:5 that is exterior to the attP3 site; (h) the sequence 5'-CAGT-3'; and (i) a second recognition site for a restriction enzyme selected from the group consisting of PmeI, ScaI and BstZ17I, wherein the second recognition site is the same as the first recognition site and wherein the 5'-CAGT-3' sequence and the second recognition site are exterior to the second truncated LTR sequence. In certain embodiments, the 5'-ACTG-3' sequence overlaps with the first recognition site and the 5'-CAGT-3' sequence overlaps with the second recognition site.
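
The overlap between the 5'-ACTG-3' / 5'-CAGT-3' sequences and a blunt-cutting recognition site can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the molecule is treated as linear rather than circular, the backbone and insert sequences are invented placeholders (not SEQ ID NO:4 or SEQ ID NO:5), and the names BLUNT_CUTTERS, blunt_digest and reverse_complement are hypothetical. The recognition sequences and cut positions used for PmeI (GTTT^AAAC), ScaI (AGT^ACT) and BstZ17I (GTA^TAC) are the standard published ones.

```python
# Recognition site and cut offset (number of bases before the cut) for three
# blunt-cutting enzymes. All three sites are palindromic, so scanning the top
# strand alone finds every site.
BLUNT_CUTTERS = {
    "PmeI": ("GTTTAAAC", 4),
    "ScaI": ("AGTACT", 3),
    "BstZ17I": ("GTATAC", 3),
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def blunt_digest(seq, enzyme):
    """Cut a linear top-strand sequence at every site for the given enzyme."""
    site, offset = BLUNT_CUTTERS[enzyme]
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Toy molecule (placeholder sequences): the ScaI site AGTACT followed by G
# contains ACTG, and CAGT followed by ACT contains a second ScaI site.
toy_insert = "TTTTGGGGCCCC"
toy_molecule = "GGCC" + "AGTACTG" + toy_insert + "CAGTACT" + "CCGG"

fragments = blunt_digest(toy_molecule, "ScaI")
cassette = fragments[1]
print(fragments)
print(cassette[:4], reverse_complement(cassette[-4:]))
```

In this toy case the middle fragment begins with ACTG on the top strand and ends with CAGT, whose reverse complement is again ACTG, so each terminus of the released fragment presents the same four-nucleotide sequence when read 5' to 3' from its end.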

[0046] In additional embodiments, methods and compositions for targeted integration of transgenes are provided. The methods utilize a fusion protein in which psip1a (LEDGF/p75) amino acid sequences are joined to amino acid sequences of dCas9, optionally through a flexible linker such as (GGS)5. Nucleic acids (i.e., polynucleotides) encoding these fusion proteins are also provided. Also utilized in methods for targeted integration is a guide RNA comprising a portion that is complementary to a target genomic sequence and a portion comprising an RNA hairpin that binds to dCas9. The guide RNA tethers the fusion protein to the target genomic sequence (via its interaction with dCas9) and the psip1a portion of the fusion protein binds to a preintegration complex comprising integrase protein and a transgene cassette.

[0047] Accordingly, also provided are combinations of a nucleic acid (DNA or RNA) encoding a retroviral (e.g., lentiviral; e.g., HIV; e.g., HIV-1) integrase, a transgene-containing transgene cassette (as described above), a nucleic acid encoding a fusion between dCas9 and psip1a; and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.

[0048] Additional embodiments provide combinations of a nucleic acid (DNA or RNA) encoding a retroviral (e.g., HIV; e.g., HIV-1) integrase, a plurality of DNA molecules (e.g., linear double-stranded DNA molecules) comprising a transgene-containing transgene cassette (as described above), a nucleic acid encoding a fusion between dCas9 and psip1a; and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.

[0049] The disclosure also provides methods for targeted insertion of a transgene into the genome of a cell, the method comprising contacting the cell with a combination comprising a nucleic acid (DNA or RNA) encoding a retroviral (e.g., lentiviral; e.g., HIV; e.g., HIV-1) integrase, a transgene-containing transgene cassette (as described above), a nucleic acid encoding a fusion between dCas9 and psip1a; and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.

[0050] In additional embodiments, the disclosure provides methods for targeted insertion of a transgene into the genome of a cell, the method comprising contacting the cell with a combination comprising a nucleic acid (DNA or RNA) encoding a retroviral (e.g., HIV; e.g., HIV-1) integrase, a plurality of DNA molecules (e.g., linear double-stranded DNA molecules) comprising a transgene-containing transgene cassette (as described above), a nucleic acid encoding a fusion between dCas9 and psip1a; and a guide RNA comprising (i) a hairpin sequence that binds to Cas9 or dCas9, and (ii) a sequence complementary to a genomic target sequence.

BRIEF DESCRIPTION OF THE DRAWINGS

[0051] Figure 1 is a schematic diagram of a retroviral single-stranded RNA genome, focusing on the terminal sequences. R signifies a repeated sequence present at both termini of viral RNA. U5 is a noncoding sequence unique to the 5' end of viral RNA. U3 is a noncoding sequence unique to the 3' end of viral RNA. The remainder of the viral genome (containing gag, pol, env and other genes) is represented by the horizontal line.

[0052] Figure 2 is a schematic diagram of a retroviral double-stranded DNA genome, focusing on the terminal sequences. R signifies a repeated sequence present at both termini of viral RNA. U5 is a noncoding sequence unique to the 5' end of viral RNA. U3 is a noncoding sequence unique to the 3' end of viral RNA. The remainder of the viral genome (containing gag, pol, env and other genes) is represented by the horizontal lines. The long terminal repeat (LTR) regions of the double-stranded genome are indicated.

[0053] Figure 3 shows a schematic diagram (not to scale) of an exemplary transgene cassette. RE indicates a recognition site for a restriction enzyme that generates a blunt-ended cleavage product (e.g., PmeI, ScaI or BstZ17I). IR represents the inverted repeat sequence 5'-ACTG-3'. 5' dLTR represents the truncated LTR sequence shown in Figure 9B. 3' dLTR represents the truncated LTR sequence shown in Figure 10B. att represents an att site. INS represents a transgene. The RE and IR sites may overlap each other.

[0054] Figure 4 is a schematic diagram of a transgene vector. The top row of the diagram shows regions of the HIV-1 LTR (dU3, U3, R and U5) relevant to construction of the vector and also shows certain restriction sites that can be used in the vector. The middle row shows the structures of the ends of the transgene cassette: the light-colored box represents one of the three restriction sites shown in the top row, and the darker boxes represent portions of the LTR present in the 5' dLTR and 3' dLTR sequences. The sequence 5'-ACTG-3' is present between the restriction site and each dLTR sequence. The bottom row shows a diagram of a Gateway-compatible vector containing the dLTRs shown in the middle row, along with 5' entry sequences, middle entry sequences, and 3' entry sequences for insertion of transgenes and regulatory sequences.

[0055] Figure 5 is a schematic diagram (not to scale) illustrating construction of the dU3 sequence. The U3 sequence is arbitrarily divided into 3 regions: A, B and C. In the dU3 sequence, internal sequences represented by B have been deleted.

[0056] Figure 6 is a schematic diagram of an exemplary transgene cassette, focusing on the 5' dLTR and 3' dLTR sequences. R signifies a repeated sequence present at both termini of viral RNA. U5 is a noncoding sequence unique to the 5' end of viral RNA. dU3 is a deleted U3 sequence. The remainder of the cassette is represented by the horizontal lines.

[0057] Figure 7 shows the nucleotide sequence of the HIV-1 long terminal repeat (SEQ ID NO: 1). The sequence of the R region is underlined. Sequences upstream of the R region constitute the U3 region. Sequences downstream of the R region constitute the U5 region.

[0058] Figure 8A shows the nucleotide sequence of the HIV-1 U3 region (SEQ ID NO:2). Underlining indicates the portions of the U3 region that are retained in the deleted U3 (dU3) sequence. Figure 8B shows the nucleotide sequence of dU3 (SEQ ID NO:3).

[0059] Figure 9A shows the nucleotide sequence of the HIV-1 LTR (SEQ ID NO: 1). The R region is underlined, and the sequences present in 5' dLTR are shaded. Figure 9B shows the nucleotide sequence of 5' dLTR (SEQ ID NO: 4).

[0060] Figure 10A shows the nucleotide sequence of the HIV-1 LTR (SEQ ID NO: 1). The R region is underlined, and the sequences present in 3' dLTR are shaded. Figure 10B shows the nucleotide sequence of 3' dLTR (SEQ ID NO: 5).

[0061] Figure 11 is a schematic diagram of a transgene (pLTR) vector, showing the locations of 5' dLTR and 3' dLTR sequences and other features of the vector. Abbreviations: “cmR” refers to sequences encoding resistance to chloramphenicol; “ccdB” refers to sequences encoding a DNA gyrase inhibitor lethal to E. coli; “f1(+) ori” refers to the replication origin for the + strand of f1 bacteriophage; “AmpR” refers to sequences encoding resistance to ampicillin; “ColE1 origin” refers to the replication origin for the ColE1 plasmid; “5'...83” refers to the 5' dLTR sequence; “3' L...319” refers to the 3' dLTR sequence. Recognition sites for the BstZ17I restriction enzyme are also shown.

[0062] Figure 12 is a schematic diagram showing portions of the pLTR vector (shown in Figure 11) in greater detail. “attR3” and “attR4” refer to sites at which recombination will occur with other att sites in the presence of bacteriophage λ recombination proteins.

[0063] Figure 13 shows schematic diagrams of the nucleic acids used for zebrafish injection and an outline of the experimental plan. “5' dLTR” and “3' dLTR” are truncated HIV-1 LTR sequences as described elsewhere herein. CMV indicates the cytomegalovirus early promoter. EGFP indicates sequences encoding enhanced green fluorescent protein. pA indicates the bovine growth hormone polyadenylation signal.

[0064] Figure 14 shows a quantitative analysis of EGFP expression in zebrafish that developed from embryos that had been injected with integrase mRNA and a transgene cassette encoding enhanced green fluorescent protein. Analysis was conducted at two concentrations of each nucleic acid: a low dose of 12.5 ng/µl integrase mRNA and 12.5 ng/µl EGFP transgene cassette (second pair of bars from left), and a high dose of 25 ng/µl integrase mRNA and 25 ng/µl EGFP transgene cassette (fourth pair of bars from left). Control samples were injected with 12.5 ng/µl (left-most pair of bars) or 25 ng/µl (third pair of bars from left) EGFP transgene cassette only (i.e., in the absence of integrase mRNA).

[0065] Fish were sorted into five groups depending on degree of expression of the transgene (Group 0: no expression, through Group 4: highest level of expression), and results are expressed as the percentage of total individuals examined that fell into each group. For each pair of bars, white coloring indicates the percentage of fish in Group 0; light stippling indicates the percentage of fish in Group 1; heavy stippling indicates the percentage of fish in Group 2; dark shading indicates the percentage of fish in Group 3; and black indicates the percentage of fish in Group 4.

[0066] Figure 15 shows a quantitative analysis of EGFP expression in zebrafish in which an EGFP transgene was introduced using a Tol2-mediated transposition system (right-most pair of bars). Results from a control experiment which did not include Tol2 mRNA are shown in the left-most pair of bars. The percentage of fish in each group (Group 0 through Group 4) is indicated by shading, as in Figure 14.

[0067] Figure 16 shows a quantitative analysis of EGFP expression in zebrafish in which an EGFP transgene was introduced using I-SceI meganuclease-mediated integration (right-most pair of bars). Results from a control experiment which did not include the I-SceI meganuclease are shown in the left-most pair of bars. The percentage of fish in each group (Group 0 through Group 4) is indicated by shading, as in Figure 14.

[0068] Figure 17 shows a quantitative analysis of EGFP expression in zebrafish that developed from embryos that had been injected with integrase mRNA and a transgene cassette containing sequences encoding enhanced green fluorescent protein under the transcriptional control of the endothelial-specific Fli1ep enhancer. Analysis was conducted at two concentrations of nucleic acid: a low dose of 12.5 ng/µl integrase mRNA and 12.5 ng/µl EGFP transgene cassette (second pair of bars from left), and a high dose of 25 ng/µl integrase mRNA and 25 ng/µl EGFP transgene cassette (fourth pair of bars from left). Control samples were injected with 12.5 ng/µl (left-most pair of bars) or 25 ng/µl (third pair of bars from left) EGFP transgene cassette only (i.e., in the absence of integrase mRNA).

[0069] Fish were sorted into five groups depending on degree of expression of the transgene, and results are expressed as the percentage of total individuals examined that fell into each group. The percentage of fish in each group (Group 0 through Group 4) is indicated by shading, as in Figure 14.

[0070] Figure 18 shows schematic diagrams of the nucleic acids used for transfection of cultured cells and an outline of the experimental plan. “CMV” indicates the cytomegalovirus early promoter. “Integrase” indicates sequences encoding the HIV-1 integrase protein. “2A-tomato” indicates sequences encoding a red fluorescent protein. “5' dLTR” and “3' dLTR” are truncated HIV-1 LTR sequences as described elsewhere herein. EGFP indicates sequences encoding enhanced green fluorescent protein. pA indicates the bovine growth hormone polyadenylation signal.

[0071] Figure 19 shows representative fluorescent micrographic images of cultured cells from two cell lines (A549 and PANC-1) that had been transfected with a transgene cassette encoding EGFP. The upper panels (“Control”) show images of cells transfected with an EGFP-encoding transgene cassette and a 2A-tomato-encoding vector. The lower panels (“Integrase”) show images of cells transfected with an EGFP-encoding transgene cassette and a vector encoding HIV-1 integrase and 2A-tomato. Fluorescence is indicative of stable integration of the transgene into the cellular genome.

[0072] Figure 20 shows results of measurement of the percentage of cells exhibiting green fluorescence, which is indicative of stable integration of an EGFP-encoding transgene. The right-most pair of bars shows results obtained with cells transfected with an EGFP-encoding transgene cassette and a plasmid encoding HIV-1 integrase. The left-most pair of bars shows results obtained with cells transfected with an EGFP-encoding transgene cassette and a control plasmid lacking integrase-encoding sequences. The left-most bar in each pair shows results for A549 cells; the right-most bar in each pair shows results for PANC-1 cells.

[0073] Figure 21 shows percentage of zebrafish stably expressing a tdTomato transgene after injection of embryos with tdTomato transgene cassettes terminating in ScaI ends (left-most pair of bars), BstZ17I ends (second pair of bars from left), PmeI ends (third pair of bars from left) or ends generated by double digestion with ApaI and MM (right-most pair of bars). The sequence in and adjacent to the recognition site for each enzyme, or enzyme pair, is shown below each pair of bars.

[0074] For each pair of bars, the right-most bar (indicated by “+” beneath the graph) shows percentage of individuals stably expressing red fluorescence after co-injection of tdTomato-containing transgene cassette and integrase mRNA; the left-most bar (indicated by “−” beneath the graph) shows results of control injections of tdTomato-containing transgene cassette only. Fish were sorted into groups depending on their degree of red fluorescence: fish in Group 1 (indicated by light shading) exhibited partial fluorescence in heart; and fish in Group 2 (indicated by darker shading) exhibited full fluorescence in heart.

[0075] Figure 22 is a schematic diagram illustrating the method used for targeted integration. A dCas9/LEDGF (psip1a) fusion protein is recruited to the target sequence by a sgRNA having a portion complementary to the target sequence and a hairpin portion that binds dCas9. LEDGF in turn binds to the pre-integration complex (comprising integrase bound at both termini of the transgene cassette, on right of diagram), thereby tethering the pre-integration complex to the target sequence and directing integration at the target sequence.

[0076] Figure 23 is a schematic diagram of the pCS-NLS-dCas9-(GGS)5-zpsip1a vector.

[0077] Figure 24 is a schematic diagram of the pCS-zpsip1a-(GGS)5-dCas9-NLS vector.

[0078] Figure 25 is a schematic diagram of the pLTRB-CMV-tdTomato vector.

[0079] Figure 26 shows Z-stack fluorescent confocal images of zebrafish embryos at 5 hours post-fertilization, showing green fluorescence (left), red fluorescence (center) and merged fluorescence (right). Several red cells (arrow) are visible in the merged image.

[0080] Figure 27 shows the percentage of embryos exhibiting positive fluorescence (i.e., in groups 2, 3 or 4) after co-injection of a transgene cassette and mRNA encoding HIV-1 integrase protein or variants thereof. The transgene cassette, containing sequences encoding EGFP under the transcriptional control of an endothelium-specific enhancer (pFli1ep:EGFP-pA), was co-injected with sequences encoding wild-type HIV-1 integrase (WT, left-most bar); sequences encoding an integrase variant containing a c-myc NLS appended to the N-terminus (5'NLS c-myc, center bar) or sequences encoding an integrase variant containing a c-myc NLS appended to the C-terminus (3'NLS c-myc, right-most bar). Fish were sorted into groups as shown, with Group 2 showing the lowest degree of fluorescence, and Group 4 showing the highest degree of fluorescence.

DETAILED DESCRIPTION

[0081] Practice of the present disclosure employs, unless otherwise indicated, standard methods and conventional techniques in the fields of cell biology, molecular biology, biochemistry, cell culture, recombinant DNA and related fields as are within the skill of the art. Such techniques are described in the literature and thereby available to those of skill in the art. See, for example, Alberts, B. et al., “Molecular Biology of the Cell,” 6th edition, Garland Science, New York, NY, 2015; Watson et al., “Molecular Biology of the Gene,” 7th edition, Pearson, London, 2014; Lodish et al., “Molecular Cell Biology,” 8th edition, W.H. Freeman, New York, NY, 2016; Voet, D. et al., “Fundamentals of Biochemistry: Life at the Molecular Level,” 5th edition, John Wiley & Sons, Hoboken, NJ, 2016; Sambrook, J. et al., “Molecular Cloning: A Laboratory Manual,” 3rd edition, Cold Spring Harbor Laboratory Press, 2001; Ausubel, F. et al., “Current Protocols in Molecular Biology,” John Wiley & Sons, New York, 1987 and periodic updates; Freshney, R.I., “Culture of Animal Cells: A Manual of Basic Technique,” 4th edition, John Wiley & Sons, Somerset, NJ, 2000; and the series “Methods in Enzymology,” Academic Press, San Diego, CA.

I. Definitions

[0082] A “transgene vector,” or “pLTR vector,” as disclosed herein, is a DNA plasmid vector which, when cleaved by an appropriate restriction enzyme, generates a DNA molecule that resembles the substrate for integration of a retroviral DNA genome. Transgene vectors are characterized by sequences that facilitate introduction of an exogenous gene (e.g., att sites), flanked by truncated retroviral long terminal repeat (LTR) sequences, which are in turn flanked by the sequence 5'-ACTG-3', which in turn overlaps with, or is flanked by, recognition sites for a restriction enzyme whose cleavage generates blunt ends and whose recognition sequence optionally contains six or more nucleotides. A transgene vector that is suitable for insertion of a transgene, but which does not comprise a transgene, is denoted an “insertion vector.”

[0083] A “transgene” is any DNA sequence inserted into a transgene vector as described herein. A transgene will often be a sequence encoding a protein, but can also be, e.g., a regulatory sequence (e.g., promoter, enhancer) or a sequence encoding a regulatory RNA, such as an antisense RNA or a siRNA.

[0084] A “transgene cassette” refers to a nucleic acid (e.g., DNA) molecule comprising a transgene (or one or more selection markers) flanked by sequences promoting recombination (e.g., att sites), which recombination-promoting sequences are in turn flanked by truncated LTR sequences, which truncated LTR sequences are in turn flanked by 5'-ACTG-3' sequences, which 5'-ACTG-3' sequences in turn overlap with, or are flanked by, recognition sequences for a restriction enzyme that, upon cleavage, generates blunt ends. A transgene cassette can be a portion of a transgene vector, wherein the transgene vector contains additional sequences such as, for example, replication origins, transcriptional regulatory sequences and additional selection markers. A transgene cassette can also be an isolated DNA molecule resulting from cleavage of a transgene vector with a blunt end-generating restriction enzyme as described herein. A transgene cassette may or may not comprise a transgene; if a transgene cassette comprises a transgene, it is denoted a “transgene-containing transgene cassette.”

[0085] The terms “interior” (or “internal”) and “exterior” (or “external”) refer to relative location within a transgene cassette or transgene vector. Taking the transgene (or the selection marker(s) present in the vector before insertion of the transgene) as the center, a first element being “interior to” a second element means that the first element is closer to the transgene (or selection marker) than is the second element. Alternatively, a first element being “exterior to” a second element means that the second element is closer to the transgene (or selection marker) than is the first element.
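
The interior/exterior relationship can be expressed as a comparison of distances from the central element. The following Python sketch is illustrative only: the element labels and their left-to-right order are a simplified, hypothetical rendering of an insertion vector with ccdB as the central selection marker, and the function names are not taken from the disclosure.

```python
# Hypothetical left-to-right layout of an insertion vector; labels are
# placeholders chosen for this example.
LAYOUT = [
    "RE site 1", "ACTG", "5' dLTR", "attR4",
    "ccdB",
    "attR3", "3' dLTR", "CAGT", "RE site 2",
]
CENTER = "ccdB"


def distance_from_center(element, layout=LAYOUT, center=CENTER):
    """Number of elements separating `element` from the central element."""
    return abs(layout.index(element) - layout.index(center))


def is_interior_to(first, second, layout=LAYOUT, center=CENTER):
    """True if `first` lies closer to the center than `second` does."""
    return (distance_from_center(first, layout, center)
            < distance_from_center(second, layout, center))


def is_exterior_to(first, second, layout=LAYOUT, center=CENTER):
    """True if `second` lies closer to the center than `first` does."""
    return is_interior_to(second, first, layout, center)


print(is_interior_to("attR4", "5' dLTR"))    # attR4 is nearer to ccdB
print(is_exterior_to("3' dLTR", "attR3"))    # 3' dLTR is farther from ccdB
```

Counting elements rather than nucleotides is a simplification; a version working on real coordinates would compare nucleotide positions instead of list indices.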

[0086] An “integrase vector,” as disclosed herein, is a DNA plasmid vector containing sequences encoding a retroviral or lentiviral integrase protein. An integrase vector can also contain control sequences that regulate expression of the integrase protein. Such control sequences can be, for example, promoters for in vitro transcription, such as, for example, an SP6 promoter or a T7 promoter or the like; or a promoter (optionally in operative linkage with an enhancer) able to function in a eukaryotic cell. Such promoters and enhancers are known in the art. Sites specifying transcription termination and polyadenylation can also be present.

[0087] A restriction enzyme recognition site (or recognition sequence) is a DNA sequence to which a restriction enzyme binds in the process of DNA cleavage by the restriction enzyme. For most restriction enzymes, their recognition site is also the site at which the restriction enzyme cleaves DNA. However, certain restriction enzymes (e.g., FokI) cleave at a site that is distinct from the sequence at which they bind.

[0088] Cleavage of DNA by a restriction enzyme generates two DNA ends at the site of cleavage. If the terminal nucleotide of those ends is base-paired, the ends are denoted “blunt ends.” If one or more of the 5'-terminal nucleotides are not base-paired, the ends are said to have a 5' extension or a 5'-overhang. If one or more of the 3'-terminal nucleotides are not base-paired, the ends are said to have a 3' extension or a 3'-overhang. 5'- and 3'-overhangs can consist of one, two, three, four or more unpaired nucleotides.
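
The three kinds of ends defined above follow from where an enzyme cuts each strand. As a minimal sketch, assume both cut positions are expressed as offsets along the top strand (the number of top-strand bases to the left of the cut); the function name end_type is hypothetical. EcoRI (G^AATTC) and KpnI (GGTAC^C) are not part of this disclosure and are used here only as familiar examples of 5'- and 3'-overhang enzymes.

```python
def end_type(top_cut, bottom_cut):
    """Classify the ends produced by one cleavage event.

    top_cut and bottom_cut are the cut positions on the top and bottom
    strands, both measured as offsets along the top strand.
    Returns a (description, overhang_length) tuple.
    """
    if top_cut == bottom_cut:
        return ("blunt", 0)
    if top_cut < bottom_cut:
        # The strands are cut so that each new end keeps unpaired bases
        # at its 5' terminus.
        return ("5' overhang", bottom_cut - top_cut)
    # Otherwise the unpaired bases sit at the 3' terminus of each new end.
    return ("3' overhang", top_cut - bottom_cut)


print(end_type(3, 3))  # ScaI, AGT^ACT on both strands
print(end_type(4, 4))  # PmeI, GTTT^AAAC on both strands
print(end_type(1, 5))  # EcoRI, G^AATTC
print(end_type(5, 1))  # KpnI, GGTAC^C
```

Only the first two cases yield ends of the kind required for the transgene cassettes described herein.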

II. Homology and Identity of Nucleic Acids

[0089] “Homology” or “identity” or “similarity” as used herein refers to the relationship between two nucleic acid molecules based on an alignment of their nucleotide sequences. Homology and identity can each be determined by comparing a position in each sequence which may be aligned for purposes of comparison. For example, a “reference sequence” can be compared with a “test sequence.” When a position in the reference sequence is occupied by the same nucleotide at an equivalent position in the test sequence, then the molecules are identical at that position; when the equivalent position is occupied by a similar nucleotide residue (e.g., similar in steric and/or electronic nature, and/or in its hydrogen-bonding properties), then the molecules can be referred to as homologous (similar) at that position. The relatedness of two sequences, when expressed as a percentage of homology/similarity or identity, is a function of the number of identical or similar nucleotides at positions shared by the sequences being compared. In comparing two sequences, the absence of nucleotide residues, or presence of extra residues, in one sequence as compared to the other, also decreases the identity and homology/similarity.

[0090] As used herein, the term “identity” refers to the percentage of identical nucleotide residues at corresponding positions in two or more sequences when the sequences are aligned to maximize sequence matching, i.e., taking into account gaps and insertions. Identity can be readily calculated by known methods, including but not limited to those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carrillo, H., and Lipman, D., SIAM J. Applied Math., 48:1073 (1988). Methods to determine identity are designed to give the highest degree of match between the sequences tested. Moreover, methods to determine identity are codified in publicly available computer programs. Computer program methods to determine identity between two sequences include, but are not limited to, the GCG program package (Devereux et al. (1984) Nucleic Acids Research 12:387), BLASTP, BLASTN, and FASTA (Altschul et al. (1990) J. Molec. Biol. 215:403-410; Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402). The BLASTX program is publicly available from NCBI and other sources. See, e.g., BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, Md. 20894; Altschul et al. (1990) J. Mol. Biol. 215:403-410. The well-known Smith-Waterman algorithm can also be used to determine identity.

[0091] For sequence comparison, typically one sequence acts as a reference sequence, to which one or more test sequences are compared. Sequences are generally aligned for maximum correspondence over a designated region, e.g., a region at least about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65 or more nucleotides in length, and the region can be as long as the full length of the reference nucleotide sequence. When using a sequence comparison algorithm, test and reference sequences are input into a computer program, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
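
The final step of such a comparison, computing percent identity once an alignment is in hand, can be sketched in a few lines of Python. This is a simplified illustration and not the calculation performed by any particular program: it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it divides by the full alignment length, which is only one of several conventions in use. The function name percent_identity and the example sequences are hypothetical.

```python
def percent_identity(reference, test, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Gap positions count against identity because the denominator is the
    full alignment length (an assumed convention for this sketch).
    """
    if len(reference) != len(test):
        raise ValueError("aligned sequences must have the same length")
    if not reference:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for r, t in zip(reference, test) if r == t and r != gap
    )
    return 100.0 * matches / len(reference)


# One substitution in eight aligned positions
print(percent_identity("ACTGACTG", "ACTGTCTG"))
# One missing residue (gap) in the test sequence
print(percent_identity("ACTGACTG", "ACT-ACTG"))
```

Both examples give 87.5, showing that a missing residue lowers identity in the same way as a substitution under this convention.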

[0092] Examples of algorithms that are suitable for determining percent sequence identity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1990) J. Mol. Biol. 215:403-410 and Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information at www.ncbi.nlm.nih.gov (visited July 22, 2019). Further exemplary algorithms include ClustalW (Higgins et al. (1994) Nucleic Acids Res. 22:4673-4680), available at www.ebi.ac.uk/Tools/clustalw/index.html (visited July 22, 2019).

[0093] Sequence identity between two nucleic acids can also be described in terms of annealing, reassociation, or hybridization of two polynucleotides to each other, mediated by base-pairing. Hybridization between polynucleotides proceeds according to well-known and art-recognized base-pairing properties, such that adenine base-pairs with thymine or uracil, and guanine base-pairs with cytosine. The property of a nucleotide that allows it to base-pair with a second nucleotide is called complementarity. Thus, adenine is complementary to both thymine and uracil, and vice versa; similarly, guanine is complementary to cytosine and vice versa. An oligonucleotide or polynucleotide which is complementary along its entire length with a target sequence is said to be perfectly complementary, perfectly matched, or fully complementary to the target sequence, and vice versa. Two polynucleotides can have related sequences, wherein the majority of bases in the two sequences are complementary, but one or more bases are noncomplementary, or mismatched. In such a case, the sequences can be said to be substantially complementary to one another. If two polynucleotide sequences are such that they are complementary at all nucleotide positions except one, the sequences have a single nucleotide mismatch with respect to each other.
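
The base-pairing rules just described can be checked mechanically. The Python sketch below is illustrative only: it assumes two strands of equal length, both written 5' to 3' and paired in antiparallel orientation with no bulges or gaps, and the names PAIRS, count_mismatches and describe_complementarity are hypothetical.

```python
# Allowed Watson-Crick pairs: A with T or U, and G with C.
PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def count_mismatches(strand, other):
    """Count non-complementary positions between two antiparallel strands.

    Both strands are written 5' to 3', so `other` is reversed before the
    position-by-position comparison.
    """
    if len(strand) != len(other):
        raise ValueError("strands must have the same length")
    return sum(
        1 for a, b in zip(strand, reversed(other)) if (a, b) not in PAIRS
    )


def describe_complementarity(strand, other):
    """Describe the relationship using the terms defined in the text."""
    mismatches = count_mismatches(strand, other)
    if mismatches == 0:
        return "perfectly complementary"
    if mismatches == 1:
        return "single nucleotide mismatch"
    return f"{mismatches} mismatches"


print(describe_complementarity("ACTG", "CAGT"))  # DNA:DNA
print(describe_complementarity("ACUG", "CAGT"))  # RNA:DNA, U pairs with A
print(describe_complementarity("ACTG", "CAGA"))  # one mismatched position
```

The sketch does not decide when a pair of sequences is "substantially complementary", since the text sets no numerical threshold for that term.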

[0094] Conditions for hybridization are well-known to those of skill in the art and can be varied within relatively wide limits. Hybridization stringency refers to the degree to which hybridization conditions disfavor the formation of hybrids containing mismatched nucleotides, thereby promoting the formation of perfectly matched hybrids or hybrids containing fewer mismatches; higher stringency is correlated with a lower tolerance for mismatched hybrids. Factors that affect the stringency of hybridization include, but are not limited to, temperature, pH, ionic strength, and concentration of organic solvents such as formamide and dimethylsulfoxide. As is well known to those of skill in the art, hybridization stringency is increased by higher temperatures, lower ionic strengths, and higher concentrations of denaturing solvents such as formamide. See, for example, Ausubel et al., supra; Sambrook et al., supra; M. A. Innis et al. (eds.) PCR Protocols, Academic Press, San Diego, 1990; B. D. Hames et al. (eds.) Nucleic Acid Hybridisation: A Practical Approach, IRL Press, Oxford, 1985; and van Ness et al. (1991) Nucleic Acids Res. 19:5143-5151.

[0095] Thus, in the formation of hybrids (duplexes) between two polynucleotides, the polynucleotides are incubated together in solution under conditions of temperature, ionic strength, pH, etc., that are favorable to hybridization, i.e., under hybridization conditions. Hybridization conditions are chosen, in some circumstances, to favor hybridization between two nucleic acids having perfectly-matched sequences, as compared to a pair of nucleic acids having one or more mismatches in the hybridizing sequence. In other circumstances, hybridization conditions are chosen to allow hybridization between mismatched sequences, favoring hybridization between nucleic acids having fewer mismatches.

[0096] The degree of hybridization between two polynucleotides, also known as hybridization strength, is determined by methods that are well-known in the art. A preferred method is to determine the melting temperature (Tm) of the hybrid duplex. This is accomplished, for example, by subjecting a duplex in solution to gradually increasing temperature and monitoring the denaturation of the duplex, for example, by absorbance of ultraviolet light, which increases with the unstacking of base pairs that accompanies denaturation. Tm is generally defined as the temperature midpoint of the transition in ultraviolet absorbance that accompanies denaturation. Alternatively, if Tm values are known, a hybridization temperature (at fixed ionic strength, pH and solvent concentration) can be chosen that is below the Tm of the desired duplex and above the Tm of an undesired duplex. In this case, determination of the degree of hybridization is accomplished simply by testing for the presence of duplex polynucleotide. Adsorption to hydroxyapatite can also be used to distinguish single-stranded nucleic acids from double-stranded nucleic acids.
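By way of non-limiting illustration, the midpoint definition of Tm given above can be sketched in Python as a linear interpolation over a melting curve. The function name and the readings are hypothetical, and the sketch assumes a single transition in which absorbance does not decrease as temperature rises.

```python
def melting_temperature(temperatures, absorbances):
    """Estimate Tm as the temperature at the midpoint of the absorbance rise.

    The midpoint is taken halfway between the lowest and highest
    absorbance readings, and the crossing temperature is found by
    linear interpolation between the two bracketing measurements.
    """
    if len(temperatures) != len(absorbances) or len(temperatures) < 2:
        raise ValueError("need two or more paired readings")
    half = (min(absorbances) + max(absorbances)) / 2.0
    points = list(zip(temperatures, absorbances))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 <= half <= a1 and a1 != a0:
            return t0 + (half - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("no absorbance transition found in the readings")


# Hypothetical readings (temperature in degrees C, relative absorbance).
temps = [40, 50, 60, 70, 80, 90]
a260 = [0.5, 0.5, 0.625, 0.875, 1.0, 1.0]
print(melting_temperature(temps, a260))
```

Given Tm estimates for a desired and an undesired duplex, a hybridization temperature between the two values can then be selected as described above.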

[0097] Hybridization conditions are selected following standard methods in the art. See, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y. For example, hybridization reactions can be conducted under stringent conditions. An example of stringent hybridization conditions is hybridization at 50°C or higher in 0.1x SSC (15 mM sodium chloride/1.5 mM sodium citrate). Another example of stringent hybridization conditions is overnight incubation at 42°C in a solution of 50% formamide, 5x SSC (0.75 M NaCl, 75 mM trisodium citrate) and 50 mM sodium phosphate (pH 7.6), followed by washing in 0.1x SSC at about 65°C. Optionally, one or more of 5x Denhardt’s solution, 10% dextran sulfate, and/or 20 mg/ml heterologous nucleic acid (e.g., yeast tRNA, denatured, sheared salmon sperm DNA) can be included in a hybridization reaction. Stringent hybridization conditions are hybridization conditions that are at least as stringent as the above representative conditions, where conditions are considered to be at least as stringent if they are at least about 80% as stringent, typically at least 90% as stringent as the above specific stringent conditions.
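By way of non-limiting illustration, the salt concentrations quoted above for 0.1x and 5x SSC follow from scaling a 1x SSC stock of 150 mM sodium chloride and 15 mM sodium citrate, as the following Python sketch shows. The dictionary and function names are assumptions made for this sketch.

```python
# 1x SSC, in mM, consistent with the 0.1x and 5x values quoted in the text.
SSC_1X_MM = {"sodium chloride": 150.0, "sodium citrate": 15.0}


def ssc_composition(fold: float) -> dict:
    """Return the millimolar composition of SSC at the given fold strength."""
    if fold <= 0:
        raise ValueError("fold strength must be positive")
    return {salt: round(conc * fold, 6) for salt, conc in SSC_1X_MM.items()}


for fold in (0.1, 5):
    print(f"{fold}x SSC:", ssc_composition(fold))
```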

[0098] The term “substantially identical” is used herein to refer to a first nucleic acid sequence that contains a sufficient or minimum number of nucleotides that are identical to aligned nucleotides in a second nucleic acid sequence such that the first and second nucleotide sequences possess a common functional property (e.g., enhancing the expression, stability or transport of mRNA).

[0099] The term “homology” describes a mathematically based comparison of sequence similarities which is used to identify sequences with similar functions or motifs. A reference nucleotide sequence (e.g., a sequence as disclosed herein) is used as a “query sequence” to perform a search against public databases to, for example, identify other family members, related sequences or homologues. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul et al. (1990) J. Mol. Biol. 215:403-410. BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to a reference nucleotide sequence. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402. When utilizing the BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and BLAST) can be used (see ncbi.nlm.nih.gov).

[0100] Nucleic acids and polynucleotides of the present disclosure encompass those having a nucleotide sequence that is at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.9% or 100% identical to any of SEQ ID NOs: 1-5.

[0101] Nucleotide analogues are known in the art. Accordingly, nucleic acids (i.e., SEQ ID NOs: 1-5) comprising nucleotide analogues are also encompassed by the present disclosure.

III. Transgene vectors and transgene cassettes

[0102] Transgene vectors are based on Gateway destination vectors and are designed so that, after insertion of transgene sequences, cleavage of the vector with an appropriate restriction enzyme generates a DNA molecule resembling a retroviral pre-integration substrate. Thus, a transgene vector contains a transgene cassette comprising one or more pairs of att sites to facilitate insertion of the transgene by Gateway cloning methods. The att sites are flanked externally by truncated retroviral (e.g., lentiviral) LTR sequences (denoted 5' dLTR and 3' dLTR herein) which, in turn, are flanked (externally) by the inverted repeat sequence 5'-ACTG-3'. The 5'-ACTG-3' sequences are flanked, in turn, by recognition sites for a restriction enzyme whose cleavage generates blunt-ended products. In certain embodiments, the 5'-ACTG-3' sequences overlap with the recognition site for the blunt end-generating restriction enzyme. In certain embodiments, the recognition sites are six nucleotide pairs or greater in length. A schematic diagram of a transgene cassette is shown in Figure 3. A transgene cassette can be part of a DNA vector (e.g., a circular plasmid) or can exist as a linear, double-stranded DNA molecule. A schematic diagram of a transgene vector, designed for insertion of a transgene and/or regulatory elements by Gateway cloning, is shown in Figure 4.
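By way of non-limiting illustration, the nesting of elements described above can be written out in Python as an ordered list, reading from one end of the cassette to the other. The element labels and the function name are assumptions made for this sketch; in embodiments where the 5'-ACTG-3' sequence overlaps the recognition site, the two outermost entries on each side would share nucleotides.

```python
def cassette_layout(payload: str = "selection marker(s) or transgene") -> list:
    """Return the elements of a transgene cassette in 5'-to-3' order.

    Only a single pair of att sites is shown; vectors with two or
    three pairs would carry further att sites around the payload.
    """
    return [
        "blunt-cutter recognition site",
        "5'-ACTG-3' inverted repeat",
        "5' dLTR",
        "att site",
        payload,
        "att site",
        "3' dLTR",
        "5'-ACTG-3' inverted repeat",
        "blunt-cutter recognition site",
    ]


for position, element in enumerate(cassette_layout(), start=1):
    print(position, element)
```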

[0103] In certain embodiments of a transgene vector, one or more selection markers are located between the att sites, to allow for selection of vectors containing an inserted transgene. The selection marker can be a negative selection marker (e.g., the ccdB gene) that causes cell death or blocks cell growth, so that replacement of the negative selection marker by transgene sequences allows survival of cells harboring a transgene-containing vector. Selection markers are known in the art and include, for example, β-lactamase, ccdB, dihydrofolate reductase (DHFR), glutamine synthetase (GS), puromycin-N-acetyl transferase, hygromycin phosphotransferase, aminoglycoside-3'-phosphotransferase, ble; and sequences encoding resistance to ampicillin, tetracycline, kanamycin, chloramphenicol, G418, gentamycin and neomycin.

A. Restriction enzyme recognition sites

[0104] Integration of the retroviral double-stranded DNA genome requires a blunt-ended genome, terminating in the inverted repeat sequence 5'-ACTG-3', as a substrate for retroviral integrase activity. Accordingly, for transgene integration according to the present invention, the transgene is present on a blunt-ended DNA molecule; hence the restriction enzyme recognition sites that flank the transgene cassette are sites whose cleavage results in production of a blunt end (i.e., recognition sites for a blunt end-generating restriction enzyme) and whose recognition site contains all or part of the sequence 5'-ACTG-3'.

[0105] In addition, to avoid the possibility of cleavage within the transgene itself, it is preferable that the recognition site contain six nucleotide pairs or more; e.g., six nucleotide pairs, seven nucleotide pairs, eight nucleotide pairs, nine nucleotide pairs, ten nucleotide pairs, eleven nucleotide pairs, twelve nucleotide pairs or more. However, depending on the size and nucleotide sequence of the transgene, blunt end-generating restriction enzymes whose recognition sites contain four or five nucleotide pairs can also be used.
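By way of non-limiting illustration, the preference for longer recognition sites can be examined in Python in two ways: by scanning a candidate transgene for internal copies of a site, and by estimating how often a site of a given length would occur by chance. The function names and the toy transgene are hypothetical, and the chance estimate assumes a random sequence with equal base frequencies, which real transgenes will not match exactly.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def internal_sites(transgene: str, site: str) -> list:
    """0-based start positions at which the site occurs on either strand."""
    seq, site = transgene.upper(), site.upper()
    targets = {site, reverse_complement(site)}
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] in targets]


def expected_chance_sites(transgene_length: int, site_length: int) -> float:
    """Expected number of chance occurrences of one specific site per strand."""
    windows = max(transgene_length - site_length + 1, 0)
    return windows / 4 ** site_length


# AGTACT is the ScaI recognition sequence; the transgene is a toy example.
print(internal_sites("ATGAGTACTGGA", "AGTACT"))
for n in (4, 6, 8):
    print(n, round(expected_chance_sites(3000, n), 3))
```

Under these assumptions each additional nucleotide pair in the recognition site reduces the expected number of chance occurrences fourfold, which is why shorter sites are practical mainly for small transgenes that have been checked for internal sites.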

[0106] Exemplary restriction enzymes for use in the methods described herein, that produce blunt ends and whose recognition sequences contain all or part of the sequence 5'-ACTG-3', include ScaI, PmeI and BstZ17I, whose recognition sequences are shown in Table 1.

[0107] Additional restriction enzyme recognition sequences suitable for use in the transgene vectors described herein include those whose cleavage generates blunt ends terminating in the sequence 5'-ACTG-3', or in which the sequence 5'-ACTG-3' is within 1, 2, 3, 4 or 5 base pairs of a blunt-ended terminus. In addition, restriction enzymes generating 5'-overhanging ends which can be repaired by a DNA polymerase to generate (1) a blunt end terminating in the sequence 5'-ACTG-3'; or (2) a blunt end in which the sequence 5'-ACTG-3' is within 1, 2, 3, 4 or 5 base pairs of the blunt-ended terminus, can also be used. Furthermore, restriction enzymes generating 3'-overhanging ends which can be processed by a protein having 3'-specific, single-stranded exonuclease activity (e.g., S1 nuclease, mung bean nuclease, E. coli exonuclease I, E. coli exonuclease X, E. coli DNA polymerase I, E. coli DNA polymerase II, E. coli DNA polymerase III, E. coli exonuclease T), to generate (1) a blunt end terminating in the sequence 5'-ACTG-3'; or (2) a blunt end in which the sequence 5'-ACTG-3' is within 1, 2, 3, 4 or 5 base pairs of the blunt-ended terminus, can also be used.
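By way of non-limiting illustration, the criterion that 5'-ACTG-3' lie at, or within 1 to 5 base pairs of, each blunt terminus can be checked in Python for a candidate linear fragment. The function names and the example fragments are hypothetical; the fragment is supplied as its top strand, and the opposite terminus is examined on the reverse complement.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_actg_offsets(fragment: str, max_offset: int = 5):
    """Distance in base pairs from each 5' terminus to the nearest ACTG.

    Returns a (left, right) tuple. An entry of 0 means the strand begins
    with ACTG; None means no ACTG starts within max_offset base pairs.
    """
    def offset(strand: str):
        index = strand.find("ACTG")
        return index if 0 <= index <= max_offset else None

    top = fragment.upper()
    return offset(top), offset(reverse_complement(top))


print(terminal_actg_offsets("ACTGGAAGGGCTAATTCCCAGT"))
print(terminal_actg_offsets("GTACTGTTTTCAGTAC"))
print(terminal_actg_offsets("GGGGGGGGACTG"))
```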

B. Inverted repeat sequence

[0108] For integration of a double-stranded viral DNA genome into a host cell chromosome, the blunt-ended inverted repeat sequence 5'-ACTG-3' is required at the termini of the double-stranded viral DNA genome. The 3'-processing activity of the viral integrase (int) protein removes the terminal GT dinucleotide, leaving a 5' extension of the dinucleotide AC at both ends of the DNA molecule, which allows the molecule to serve as a substrate for strand transfer (i.e., integration).

[0109] Accordingly, the transgene vectors disclosed herein contain, at both ends of the transgene cassette, the inverted repeat (IR) sequence 5'-ACTG-3'. This 5'-ACTG-3' sequence can be part of the blunt end-generating restriction enzyme recognition site (as discussed in the previous section) or can overlap, either fully or partially, with the recognition site.
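By way of non-limiting illustration, the 3'-processing step described above can be modeled in Python on a blunt-ended duplex whose two strands each begin with 5'-ACTG-3' and therefore each end in the GT dinucleotide. The function names and the example sequence are hypothetical, and the sketch models only removal of the terminal dinucleotide, not the chemistry of the integrase reaction.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def three_prime_process(top_strand: str) -> dict:
    """Remove the 3'-terminal GT from both strands of a blunt duplex.

    Each strand is written 5'->3'. After trimming, the first two
    nucleotides of the partner strand are left unpaired and form the
    5' extension at that end of the molecule.
    """
    top = top_strand.upper()
    bottom = reverse_complement(top)
    for strand in (top, bottom):
        if not strand.endswith("GT"):
            raise ValueError("each strand must end in the GT dinucleotide")
    return {
        "top": top[:-2],
        "bottom": bottom[:-2],
        "five_prime_extensions": (top[:2], bottom[:2]),
    }


processed = three_prime_process("ACTGGAAGGGCTAATTCCCAGT")
print(processed["five_prime_extensions"])
```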

C. Truncated LTRs

[0110] The termini of retroviral and lentiviral genomes consist of identical long terminal repeat (LTR) sequences. A typical LTR contains three sequence elements: U5, a sequence unique to the 5' end of the RNA genome; U3, a sequence unique to the 3' end of the RNA genome; and R, a sequence contained at both the 5' and 3' ends of the RNA genome external to the U5 and U3 sequences. A generalized structure of a retroviral RNA genome, focusing on the terminal sequences, is shown in Figure 1.

[0111] During the infective cycle, the single-stranded RNA genome is converted to a double-stranded DNA molecule. Due to the nature of the reverse transcription reaction, certain terminal genomic sequences are duplicated and transferred to the other end of the genome, generating long terminal repeat (LTR) sequences, as shown schematically in Figure 2.

[0112] The LTR-containing double-stranded DNA genome is the substrate for integration; however, not all LTR sequences are required for integration of viral double-stranded DNA. In particular, many, if not all, of the approximately 50 transcriptional regulatory elements present in the U3 region are unnecessary for integration. Accordingly, in the transgene vectors and transgene cassettes disclosed herein, not all U3 sequences are present in the truncated LTRs (dLTRs). In particular, the 5' dLTR does not contain any U3 sequences, consisting of R and U5 sequences; and the 3' dLTR contains an internally deleted U3 (dU3) region (that retains only the Sp1 and GATA-3 binding sites) along with R and U5 sequences. Figure 5 shows a schematic diagram of how U3 sequences were deleted to construct a dU3 sequence. A schematic diagram of the dLTR sequences of the transgene vectors and transgene cassettes is shown in Figure 6.
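By way of non-limiting illustration, the composition of the two truncated LTRs can be sketched in Python by joining the retained segments in the U3-R-U5 order of a DNA LTR. The function name and the short placeholder strings are hypothetical and do not reproduce the actual HIV-1 sequences shown in the Figures.

```python
def build_dltrs(retained_u3_segments, r_region: str, u5_region: str) -> dict:
    """Assemble the 5' and 3' dLTRs from their component regions.

    retained_u3_segments lists, in order, the portions of U3 that are
    kept after the internal deletion (for example, the segments that
    carry the Sp1 and GATA-3 binding sites).
    """
    du3 = "".join(retained_u3_segments)
    return {
        "5' dLTR": r_region + u5_region,        # no U3 sequences
        "3' dLTR": du3 + r_region + u5_region,  # dU3 + R + U5
    }


# Placeholder strings stand in for the real regions.
dltrs = build_dltrs(["GG", "CC"], r_region="AAA", u5_region="TTT")
for name, sequence in dltrs.items():
    print(name, sequence)
```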

[0113] The derivation of the 5' dLTR and 3' dLTR is shown in more detail in Figures 7-10. Figure 7 shows the nucleotide sequence of the wild-type HIV-1 LTR, indicating the U3, R and U5 regions. Figure 8A shows the sequence of the U3 region, indicating sequences which are deleted (no underlining) and sequences which are retained (underlined) in dU3. Figure 8B shows the nucleotide sequence of dU3. Figure 9A shows the nucleotide sequence of the HIV-1 LTR and indicates the sequences present in 5' dLTR. Figure 9B shows the nucleotide sequence of the 5' dLTR, which contains R and U5 sequences. Figure 10A shows the nucleotide sequence of the HIV-1 LTR and indicates the sequences present in 3' dLTR. Figure 10B shows the nucleotide sequence of the 3' dLTR, which contains dU3, R and U5 sequences.

D. att sites

[0114] Transgene vectors are designed for rapid and simple insertion of transgenes using the Gateway cloning system. See, for example, Hartley et al., supra. Accordingly, the transgene vectors disclosed herein, based on Gateway destination vectors, contain one or more pairs of att sites.

[0115] att sites are DNA sequences involved in the integration of the bacteriophage λ genome into, and its excision from, the E. coli chromosome. The bacteriophage contains two sequences denoted attP, which, in the presence of a recombinase protein, recombine with a pair of bacterial sequences known as attB sites. The result of the recombination reaction is an E. coli genome containing an integrated λ genome, in which the integrated λ genome is flanked by hybrid att sites denoted attL and attR. Excision of an integrated λ genome is catalyzed by the xis protein, resulting in the regeneration of the attP sites in the phage genome and regeneration of the attB sites in the bacterial genome.

[0116] In a vector with a single pair of att sites, one att site lies just interior to the 5' dLTR sequence, and the other att site lies just interior to the 3' dLTR sequence. In certain embodiments, transgene vectors contain two pairs of att sites. In additional embodiments, transgene vectors contain three pairs of att sites: a first pair of att sites for 5' entry clones; a second pair of att sites for middle entry clones and a third pair of att sites for 3' entry clones as described, for example, by Kwan et al. (2007) Devel. Dynamics 236:3088-3099. Exemplary pairs of att sites include:

attP1 and attP2

attP3 and attP4

IV. Nucleic acids encoding retroviral integrase

[0117] Retroviral integrase proteins are encoded by a portion of the retroviral pol gene, near its 3' end. Integrase proteins comprise approximately 300 to 400 amino acids and include three domains that are joined by linkers of varying length. The N-terminal domain includes two pairs of zinc-chelating histidine and cysteine residues (the HHCC motif) in which a bound Zn2+ ion stabilizes a helix-turn-helix structure. The catalytic core domain is characterized by three acidic amino acids: two aspartic acid residues and a glutamic acid residue (the DDE motif) with the second aspartic acid and the glutamic acid being separated by approximately 35 residues. The DDE motif is also involved in metal ion chelation. Also within the central region of HIV-1 integrase is a non-canonical nuclear localization signal (NLS), having the amino acid sequence IIGQVRDQAEHLK (SEQ ID NO: 12), which is in part responsible for the ability of HIV to infect non-dividing cells. The C-terminal domain of integrase proteins is the least well-conserved but contains β-strand barrels resembling that found in the SH3 domain and includes determinants for DNA binding and multimerization (retroviral integrases are active only as multimers: a dimer is capable of 3'-end processing, but a tetramer is required for strand transfer and integration). Certain retroviral integrases also contain an N-terminal extension.
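By way of non-limiting illustration, the spacing feature of the DDE motif described above (an aspartic acid followed, approximately 35 residues later, by a glutamic acid) can be searched for in a protein sequence with the Python sketch below. The function name, the tolerance and the toy sequence are assumptions made for this sketch; such a scan yields candidates only, and many unrelated proteins will contain matching pairs by chance.

```python
def find_d_e_pairs(protein: str, spacing: int = 35, tolerance: int = 3) -> list:
    """List (D position, E position) pairs separated by about `spacing` residues.

    Positions are 1-based. `spacing` counts the residues lying between
    the aspartic acid and the glutamic acid, and `tolerance` allows that
    count to vary in either direction.
    """
    protein = protein.upper()
    pairs = []
    for i, residue in enumerate(protein):
        if residue != "D":
            continue
        for gap in range(spacing - tolerance, spacing + tolerance + 1):
            j = i + gap + 1
            if j < len(protein) and protein[j] == "E":
                pairs.append((i + 1, j + 1))
    return pairs


# Toy sequence: one D, 35 alanines, then one E.
toy = "M" + "D" + "A" * 35 + "E" + "K"
print(find_d_e_pairs(toy))
```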

[0118] A nucleic acid comprising sequences encoding a polypeptide having retroviral integrase activity can be, for example, an mRNA molecule. Such mRNA molecules can be generated, for example, by in vitro transcription of a DNA molecule having appropriate transcriptional control sequences such as, for example, a bacteriophage T7 promoter or a bacteriophage SP6 promoter. Transcription termination can be regulated by the presence of a transcriptional terminator sequence, or an RNA molecule can be generated as the result of run-off transcription from a linear DNA template. Optionally, such integrase mRNAs contain translational regulatory sequences; e.g., a Kozak sequence or an internal ribosome entry site (IRES).

[0119] Alternatively, sequences encoding polypeptides having retroviral integrase activity are present in a DNA molecule, for example, a plasmid. In these cases, promoter and enhancer sequences, additional transcriptional regulatory sequences such as transcription termination signals and polyadenylation signals, insulators and translational regulatory sequences (such as Kozak sequences and internal ribosome entry sites) can also be present in the plasmid. See also Masuda (2011) Frontiers in Microbiology 2: 1-5 (Article 210).

[0120] In additional embodiments, the disclosure provides integrase proteins (and nucleic acids encoding them) that have been engineered to contain one or more additional nuclear localization signals. For example, in addition to the endogenous NLS present in HIV-1 integrase, NLS sequences from SV40 (PKKKRKV, SEQ ID NO: 13), c-myc (PAAKRVKLD, SEQ ID NO: 14), the HIV Vpr protein (RRTRNGASKS, SEQ ID NO: 15) and hnRNPA1 (SSNFGPMLGGNRFFRSSPY, SEQ ID NO: 16) are introduced at the N-terminus and/or the C-terminus of the integrase protein. In certain embodiments, a linker sequence is present between the integrase protein and the exogenous nuclear localization signal(s) at the N- and/or C-terminus. Since different nuclear localization signals are recognized by different importin proteins (e.g., the HIV integrase NLS is recognized by importin α3 and the HIV Vpr NLS is recognized by importin α1, while other NLS sequences are recognized by importin β), integrase proteins containing multiple different nuclear localization signals will accumulate at higher levels in cell nuclei, thereby increasing integration efficiency.
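By way of non-limiting illustration, the addition of exogenous nuclear localization signals to the termini of an integrase polypeptide can be sketched in Python as simple sequence concatenation. The function name, the glycine-serine linker and the placeholder integrase string are hypothetical; the NLS sequences are those recited above, and a practical design would also have to account for the initiator methionine and codon usage.

```python
# NLS amino acid sequences recited in the text.
NLS_SEQUENCES = {
    "SV40": "PKKKRKV",
    "c-myc": "PAAKRVKLD",
    "HIV Vpr": "RRTRNGASKS",
}


def add_nls(integrase: str, n_terminal=(), c_terminal=(), linker: str = "GGS") -> str:
    """Return an integrase sequence fused to the named NLS sequences.

    NLS names are looked up in NLS_SEQUENCES and joined to the integrase,
    and to one another, through the linker (a hypothetical choice here).
    """
    parts = [NLS_SEQUENCES[name] for name in n_terminal]
    parts.append(integrase)
    parts.extend(NLS_SEQUENCES[name] for name in c_terminal)
    return linker.join(parts)


# "MAAAA" is a placeholder, not an integrase sequence.
fusion = add_nls("MAAAA", n_terminal=["SV40"], c_terminal=["c-myc"])
print(fusion)
```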

V. Regulatory elements

[0121] The transgene cassettes and transgene vectors disclosed herein are Gateway-compatible; accordingly, it is straightforward to include not only coding sequences, but also 5' and 3' regulatory sequences, such as, for example, enhancers, promoters, transcription termination sites, polyadenylation signals and translation initiation sites, using two-way or three-way Gateway cloning protocols. Accordingly, transgene-containing transgene cassettes, and integrated transgenes obtained by the methods described herein, can contain transcriptional and translational regulatory sequences to control the expression (e.g., temporal expression and/or regional expression) of the integrated transgene. Certain regulatory sequences, known in the art, can also provide constitutive expression of a transgene (e.g., actin promoter, CMV promoter, 3-GPDH promoter, ribosomal promoters). Transcriptional regulatory sequences include, for instance, promoters, enhancers, polyadenylation signals and insulators.

[0122] Promoters active in eukaryotic cells are known in the art and include, for example, viral promoters (e.g., SV40 early promoter, SV40 late promoter, cytomegalovirus major immediate early (MIE) promoter, herpes simplex virus thymidine kinase (HSV-TK) promoter), EF1-alpha (translation elongation factor-1 α subunit) promoter, Ubc (ubiquitin C) promoter, PGK (phosphoglycerate kinase) promoter, actin promoter and others. See also Boshart et al., GenBank Accession No. K03104; Uetsuki et al. (1989) J. Biol. Chem. 264:5791-5798; Schorpp et al. (1996) Nucleic Acids Res. 24:1787-1788; Hamaguchi et al. (2000) J. Virology 74:10778-10784; and Dreos et al. (2013) Nucleic Acids Res. 41(D1):D157-D164. Tissue-specific promoters, such as the cMLC2 promoter, which specifies transcription in myocardial cells, can also be used.

[0123] Enhancer elements, and their nucleotide sequences, are known in the art. Certain enhancers can be used to direct tissue-specific expression of genes (e.g., transgenes) to which they are operatively linked. For example, the Fli1EP enhancer directs transcription to endothelial cells.

[0124] Polyadenylation signals, and their nucleotide sequences, are known in the art. Generally, a polyadenylation signal is present downstream, in the transcriptional sense, of the transgene. Polyadenylation signals that are active in eukaryotic cells include, but are not limited to, the SV40 polyadenylation signal, the bovine growth hormone (BGH) gene polyadenylation signal and the herpes simplex virus thymidine kinase gene polyadenylation signal. The polyadenylation signal directs 3' end cleavage of pre-mRNA, polyadenylation of the pre-mRNA at the cleavage site and termination of transcription downstream of the polyadenylation signal. A core sequence AAUAAA is generally present in the polyadenylation signal. See also Cole et al. (1985) Mol. Cell. Biol. 5:2104-2113.
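By way of non-limiting illustration, the AAUAAA core sequence mentioned above can be located in a candidate 3' regulatory region with the following Python sketch. The function name and the example sequence are hypothetical; a DNA sequence is accepted and read as its RNA equivalent, and presence of the core sequence alone does not establish a functional polyadenylation signal.

```python
def find_polya_core(sequence: str) -> list:
    """Return 0-based start positions of AAUAAA in the sense strand.

    DNA input is converted to RNA (T -> U) before searching, and
    overlapping occurrences are reported.
    """
    rna = sequence.upper().replace("T", "U")
    positions = []
    start = rna.find("AAUAAA")
    while start != -1:
        positions.append(start)
        start = rna.find("AAUAAA", start + 1)
    return positions


print(find_polya_core("GGCAATAAAGGC"))
```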

[0125] In further embodiments, the vectors and transgene cassettes disclosed herein contain an insulator element, also known as a matrix attachment region (MAR) or scaffold attachment region (SAR). MAR and SAR sequences act, inter alia, to insulate the chromatin structure of adjacent sequences. Thus, in a stably transformed cell, in which heterologous sequences are chromosomally integrated, an insulator sequence can prevent repression of transcription of a transgene that has integrated into a region of the cellular genome having a repressive chromatin structure. Accordingly, inclusion of one or more insulator sequences in a vector can facilitate expression of a transgene from the vector in stably-transformed cells.

[0126] Exemplary insulator elements include those from the human interferon beta gene (IBM), the chicken (G. gallus) lysozyme gene 5' matrix attachment region (CLM), the human interferon alpha-2 gene (IAM), the mouse S4 MAR/SAR and the human X29 MAR/SAR. The insulator can be located at any location within the vector or the cassette. In certain embodiments, insulator elements are located within the transgene cassette upstream (in the transcriptional sense) of a promoter. In additional embodiments, insulator elements are present at both ends of a transgene.

[0127] In certain embodiments, the vectors also include, within an expression cassette (as defined above), a post-transcriptional regulatory element (PRE). In certain embodiments, the post-transcriptional regulatory element is a cis-acting element that promotes mRNA stability. In other embodiments, the post-transcriptional regulatory element is a cis-acting element that promotes transport of RNA from the nucleus to the cytoplasm. Exemplary PREs include the human hepatitis B virus PRE (HPRE) and the woodchuck hepatitis virus post-transcriptional regulatory element (WPRE). See, e.g., U.S. Patent No. 6,136,597; Huang & Liang (1993) Mol. Cell. Biol. 13:7476-7486; Huang & Yen (1994) J. Virol. 68:3193-3199; Donello et al. (1996) J. Virol. 70:4345-4351; and Donello et al. (1998) J. Virology 72:5085-5092. Sub-elements of the HPRE (α element and β element) and WPRE (α element, β element and γ element) have been identified. Accordingly, chimeric PREs containing mixtures of HPRE and WPRE sub-elements are also contemplated for use in the compositions disclosed herein.

[0128] Additional post-transcriptional regulatory elements include, but are not limited to, the 5'-untranslated region of the human Hsp70 gene, the SP163 sequence from the vascular endothelial growth factor (VEGF) gene, the tripartite leader sequence associated with adenovirus late mRNAs and the first intron of the human cytomegalovirus immediate early gene. See, for example, Mariati et al. (2010) Protein Expression and Purification 69:9-15.

[0129] A transgene can comprise an intron which, in certain instances, can increase production of mRNA from an integrated transgene. Exemplary introns that can be used include the human β-globin intron and the first intron of the human cytomegalovirus major immediate early (MIE) gene, also known as “intron A.”

[0130] Vectors containing a transgene cassette can contain a replication origin that functions in prokaryotic cells. Replication origins that function in prokaryotic cells are known in the art and include, but are not limited to, the oriC origin of E. coli; plasmid origins such as, for example, the pSC101 origin, the pBR322 origin (rep) and the pUC origin; and viral (i.e., bacteriophage) replication origins (e.g., the f1 replication origin). Methods for identifying prokaryotic replication origins are provided, for example, in Sernova & Gelfand (2008) Brief. Bioinformatics 9(5):376-391.

VI. Selection Markers

[0131] Selection markers, both positive and negative, are known in the art. An exemplary selection marker that functions in eukaryotic cells is the glutamine synthetase (GS) gene; selection is applied by culturing cells in medium lacking glutamine or medium containing methionine sulfoximine. Another exemplary selection marker that functions in eukaryotic cells is the gene encoding resistance to neomycin (neo); selection is applied by culturing cells in medium containing neomycin or G418. An exemplary gene encoding neomycin resistance is the Tn5 neo gene. Additional selection markers include sequences encoding dihydrofolate reductase (DHFR, imparts resistance to methotrexate), puromycin-N-acetyl transferase (provides resistance to puromycin), hygromycin kinase (provides resistance to hygromycin B), hygromycin phosphotransferase, aminoglycoside-3'-phosphotransferase, ble, and genes encoding resistance to zeocin. Yet additional selection markers that function in eukaryotic cells are known in the art. Selective agents that can be used in the methods disclosed herein are known in the art and include, but are not limited to, G418, methotrexate, neomycin, geneticin, puromycin, bleomycin, Zeocin, blasticidin, hygromycin, methionine sulfoximine and L-glutamine. Any of the sequences encoding a selection marker as described above can be operatively linked to a promoter and/or a polyadenylation signal.

[0132] The vectors disclosed herein can also contain one or more selection markers that function in prokaryotic cells. Selection markers that function in prokaryotic cells are known in the art and include, for example, sequences that encode polypeptides conferring resistance to a selective agent such as, for example, ampicillin, kanamycin, chloramphenicol, or tetracycline. An example of a polypeptide conferring resistance to ampicillin (and other beta-lactam antibiotics) is the beta-lactamase (bla) enzyme. Kanamycin resistance can result from activity of the neomycin phosphotransferase gene; and chloramphenicol resistance is mediated by chloramphenicol acetyl transferase.

[0133] Negative selection markers that are active in prokaryotic cells include the ccdB gene, which encodes a DNA gyrase inhibitor.

[0134] The vectors disclosed herein can be any nucleic acid vector known in the art. Exemplary vectors include plasmids, cosmids, bacterial artificial chromosomes (BACs) and viral vectors.

VII. Transgenes

[0135] Any sequence, coding or noncoding, can serve as a transgene. For example, a transgene can encode a detectable moiety; e.g., a fluorescent protein, such as green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), red fluorescent protein, yellow fluorescent protein, tdTomato, luciferase and the like. A transgene can also encode an enzymatic activity (e.g., β-galactosidase, β-glucuronidase, luciferase and the like). A transgene can also encode a therapeutic protein, such as globin, a coagulation factor, or a therapeutic antibody.

[0136] A transgene can encode, for example, a recombinant protein, a fusion protein, an antibody, a cytokine, a hormone, an enzyme or a clotting factor. Exemplary antibodies include monoclonal antibodies, single chain antibodies, bispecific antibodies, and antibody conjugates.

[0137] Exemplary transgenes include those encoding therapeutic proteins, e.g., hormones (such as, for example, growth hormone), cytokines (e.g., erythropoietin), antibodies, monoclonal antibodies (e.g., rituximab), antibody conjugates, fusion proteins (e.g., IgG-fusion proteins), interleukins, CD proteins, MHC proteins, enzymes and clotting factors.

[0138] Exemplary cytokines include, but are not limited to, erythropoietin, granulocyte colony-stimulating factor (G-CSF), filgrastim, and PEGfilgrastim.

[0139] Exemplary hormones include, but are not limited to, human growth hormone, luteinizing hormone (Luveris), and epoetin (Procrit).

[0140] Insertion of a transgene into a transgene vector is conducted using standard Gateway cloning procedures, which results in conversion of the att sites present in the transgene vector into different att sites in the transgene-containing transgene vector. For example, in certain embodiments, attR sites (e.g., attR4 and attR3) present in a transgene vector are converted to attP sites (e.g., attP4 and attP3) in the process of inserting a transgene into the vector. Depending on the method of inserting transgene sequences, multiple att sites can be present in a transgene-containing transgene vector. For example, a transgene-containing transgene vector constructed by three-way Gateway cloning will comprise four att sites.

VIII. Methods for transgenesis

[0141] The compositions disclosed herein can be used for convenient, high-efficiency, non-viral insertion of a transgene into the genome of a cell, by contacting the cell with a combination comprising (1) a transgene-containing transgene cassette and (2) a nucleic acid comprising sequences encoding a polypeptide having retroviral integrase activity. A transgene-containing transgene cassette can be an isolated, double-stranded DNA molecule or it can be one of a plurality of DNA molecules generated by digestion of a transgene-containing transgene vector with a restriction enzyme. Contact can be by any method known in the art, including transfection, injection, electroporation, biolistic delivery, protoplast fusion, polyethylene glycol (PEG)-mediated methods, polyethyleneimine (PEI)-mediated methods, DEAE-dextran-mediated methods, calcium phosphate co-precipitation, and lipid-based particles (e.g., lipofection).

[0142] The methods and compositions described herein achieve high-efficiency transgene integration. In certain embodiments, at least 5% of cells exposed to a transgene undergo stable integration of the transgene into the genome (i.e., 5% efficiency of integration). In additional embodiments, the efficiency of integration is greater than 10%, greater than 15%, greater than 20%, greater than 25%, greater than 30%, greater than 35%, greater than 40%, greater than 45%, greater than 50%, greater than 55%, greater than 60%, greater than 65%, greater than 70%, greater than 75%, greater than 80%, greater than 85%, greater than 90%, greater than 95%, or greater than 98%.

[0143] The cell can be any type of cell, including eukaryotic, prokaryotic or archaeal. Exemplary eukaryotic cells include fungal cells (e.g., Trichoderma sp., Pichia pastoris, Schizosaccharomyces pombe and Saccharomyces cerevisiae), plant cells (e.g., Arabidopsis cells and tobacco BY2 cells), insect cells (e.g., Sf9, Sf21, and Drosophila S2 cells), vertebrate cells, teleost cells (e.g., Danio sp., such as Danio rerio (zebrafish)), mammalian cells, primate cells and human cells. The transgene-containing transgene cassette can be an isolated and/or purified nucleic acid or can be part of a collection of nucleic acid molecules resulting from restriction enzyme digestion of a larger DNA molecule, e.g., a plasmid.

[0144] Cultured mammalian cell lines, useful for expression of recombinant polypeptides, include Chinese hamster ovary (CHO) cells, human embryonic kidney (HEK) cells, virally transformed HEK cells (e.g., HEK293 cells), NS0 cells, SP2/0 cells, CV-1 cells, baby hamster kidney (BHK) cells, 3T3 cells, Jurkat cells, HeLa cells, COS cells, PER.C6 cells, CAP® cells, CAP-T® cells (the latter two cell lines being commercially available from Cevec Pharmaceuticals, Cologne, Germany) and cancer cell lines such as A549 and PANC-1. A number of derivatives of CHO cells are also available such as, for example, CHO-DXB11, CHO-DG-44, CHO-K1 and CHO-S. Derivatives of any of the cells described herein obtained, for example, by mutagenesis, selection, gene knock-out, targeted integration (e.g., CRISPR/Cas9; zinc finger nucleases) or cloning, are also provided. Mammalian primary cells can also be used. Myeloma and hybridoma cells can also be used.

[0145] Nucleic acids comprising sequences encoding retroviral integrase activity, for use in these methods, are described elsewhere herein.

IX. Additional embodiments

[0146] Each retrovirus encodes its own integrase protein, has unique LTR sequences and has a unique 5' terminal sequence of its double-stranded DNA pre-integration intermediate. Accordingly, the present disclosure provides additional transgene vectors and transgene cassettes containing dLTR sequences and 5'-terminal inverted repeat sequences of a retrovirus other than HIV-1, and methods in which such transgene vectors and transgene cassettes are used in conjunction with nucleic acids encoding an integrase protein from the virus used to provide the dLTR and inverted repeat sequences.

X. Targeted Integration

[0147] For certain applications, it is desirable to insert a transgene(s) at a specific location in the genome of the target cell or target organism. Targeted integration is achieved by taking advantage of elements of the CRISPR-Cas9 targeting system. The Cas9 protein is an RNA-guided DNA endonuclease that cleaves DNA sequences that are complementary to a guide RNA. Guide RNAs can be synthesized to be complementary to any DNA sequence of choice, and are thereby able to target the Cas9 endonuclease to any DNA sequence of choice (i.e., a genomic DNA sequence complementary to the targeting portion of the sequence of the guide RNA). Moreover, mutants of Cas9 that lack endonuclease activity (so-called "dead Cas9" or dCas9) can be fused to functional domains (such as transcriptional activation domains and transcriptional repression domains) to target the activity of these domains to particular genomic sequences (e.g., promoters).

[0148] dCas9 is a catalytically inactive mutant of the Streptococcus pyogenes Cas9 protein that lacks endonuclease activity. The dCas9 protein remains capable of binding to DNA/RNA duplexes and therefore can be targeted to a particular chromosomal sequence using a guide RNA of appropriate nucleotide sequence.

[0149] The amino acid sequence of S. pyogenes dCas9 is:

[0150] Lens epithelium-derived growth factor (LEDGF/p75), also known as psip1a, PC4 or SFRS1-interacting protein, is a host factor that participates in integration of the HIV genome into a host chromosome. The C-terminal portion of this protein contains an integrase-binding domain, which interacts with lentiviral integrase proteins and with other cellular proteins. The psip1a protein also binds to chromosomal DNA, thereby tethering integrase to chromosomal DNA at the integration site.

[0151] The amino acid sequence of zebrafish psip1a is:

[0152] For targeted integration using the transgene vectors disclosed herein, the transgene vector and integrase-encoding nucleic acid are supplemented with a nucleic acid (e.g., DNA, RNA) encoding a fusion between dCas9 and the psip1a (LEDGF) protein, in conjunction with a guide RNA whose targeting region is complementary to the genomic sequence at which integration is desired. The guide RNA targets the dCas9 portion of the fusion protein to the target genomic sequence, while the psip1a portion of the fusion protein interacts with integrase to tether the integrase/transgene cassette pre-integration complex to the target genomic sequence, thereby facilitating integration at the target genomic sequence.

A schematic diagram illustrating this method is shown in Figure 22.

[0153] Accordingly, in certain embodiments for targeted integration of a transgene, the following constituents are introduced into the target cell:

(1) single guide RNA (sgRNA) with a sequence complementary to the target genomic sequence and a hairpin sequence that binds dCas9,

(2) a dCas9-psip1a fusion protein, or mRNA encoding a dCas9-psip1a fusion protein,

(3) mRNA encoding an integrase, and

(4) a transgene cassette.

[0154] In additional embodiments, sequences encoding the dCas9-psip1a fusion protein are present on a DNA molecule (e.g., a plasmid) and are under the transcriptional and translational control of elements that are active in the target cell.

[0155] In additional embodiments, sequences encoding the integrase protein are present on a DNA molecule (e.g., a plasmid) and are under the transcriptional and translational control of elements that are active in the target cell.

[0156] The foregoing methods for targeted integration rely on binding of the psip1a portion of the psip1a-dCas9 fusion protein to integrase molecules that are present at both ends of the transgene cassette in a preintegration complex. However, endogenous psip1a (already present in the cell) can compete with binding of the psip1a-dCas9 fusion protein to the integrase proteins present in the preintegration complex. Accordingly, in certain embodiments, the psip1a-dCas9 fusion protein is overexpressed in target cells, for example, by injecting RNA encoding the psip1a-dCas9 fusion protein at a molar excess to integrase RNA, by injecting a quantity of RNA encoding the psip1a-dCas9 fusion protein that will produce a molar excess of psip1a-dCas9 fusion protein to endogenous psip1a, or by introducing an expression vector containing sequences encoding the psip1a-dCas9 fusion protein (instead of RNA encoding the psip1a-dCas9 fusion protein) in which the sequences encoding the psip1a-dCas9 fusion protein are under the transcriptional control of sequences that express, or can be induced to express, the psip1a-dCas9-encoding sequence at high levels. In additional embodiments, inhibition of expression of endogenous psip1a, for example, by blocking splicing of psip1a pre-mRNA with morpholino compounds, can also be used to enhance the efficiency of targeted integration.

[0157] Translational control elements (e.g., Kozak sequences or the like) which are active at high levels in the host cell can also be included in vectors for overexpression of the psip1a-dCas9 fusion protein.

EXAMPLES

Example 1: Construction of transgene vectors

[0158] Transgene plasmids (pLTR vectors) were constructed by modifying the Gateway cloning destination vector pminiTol2 R4R3 (Addgene #40970, see also Kwan et al. (2007) Devel. Dynamics 236:3088-3099), which contains an attR4/attR3 gateway cassette flanked by Tol2 transposon sequences.

[0159] Briefly, the upstream and downstream miniTol2 sequences were replaced by two truncated HIV-1 LTR sequences. The upstream miniTol2 sequence was replaced with sequences containing the R and U5 sequences of the HIV-1 LTR (5'-dLTR; template from Addgene #14883). The downstream miniTol2 sequence was replaced with sequences containing dU3, R and U5 sequences of the HIV-1 LTR (3'-dLTR; template from Addgene #19319).

[0160] For sequence replacement, DNA molecules were constructed that contained the replacement sequence (5' dLTR or 3' dLTR) with the sequence 5'-ACTG-3' appended to the 5' end of the replacement sequence, and terminating in a recognition site for a blunt end-generating restriction enzyme (e.g., ScaI, PmeI or BstZ17I). Replacement DNA molecules were amplified by PCR, using Addgene 14883 and 19319 as templates, using Platinum™ Taq DNA Polymerase High Fidelity (Invitrogen). The amplification products were then inserted into the pminiTol2R4R3 vector. 5' dLTR-containing PCR products were ligated into NdeI/XhoI-digested pminiTol2R4R3. 3' dLTR-containing PCR products were ligated into ApaI /IASG/II-digested pminiTol2R4R3.

[0161] A schematic diagram of the vector is shown in Figure 11. A more detailed map of the transgene cassette portion of the vector is provided in Figure 12. The vector shown in Figure 11 has recognition sites for the blunt end-generating restriction enzyme BstZ17I external to the truncated LTR (i.e., 5' dLTR and 3' dLTR) sequences. Two additional vectors have been constructed: one having PmeI sites at these locations and the other having ScaI sites at these locations.

[0162] Transgenes, and optionally regulatory sequences, are inserted into the transgene vector using standard gateway cloning methods. One-way, two-way, or three-way insertions can be used, depending on the nature of the transgene and associated (e.g., regulatory) sequences. See, e.g., Hartley et al., supra for additional details of methods for one-way, two-way and three-way insertions.

[0163] Plasmids were amplified in One Shot® TOP10 E. coli cells (Invitrogen, Carlsbad, CA) and purified using a PureLink® Quick Plasmid Miniprep Kit (Invitrogen) for subsequent microinjection, transfection, or production of mRNA by in vitro transcription.

Example 2: Construction of integrase vectors

[0164] The pCS2-integrase and pCS2-integrase-2A-tdTomato overexpression vectors were constructed using standard gateway cloning protocols with pCSDest2 (Addgene #22424), p3E-2a-tdTomato (Addgene #67707) and pME-integrase. pME-integrase was generated by conducting a standard gateway BP reaction using wild-type HIV-1 integrase in pET15b (Addgene #61668) as a template for PCR. A Kozak sequence was present in the vector for regulation of translation of the integrase sequences. All constructs were verified by DNA sequencing.

[0165] The p5E-CMV/SP6 plasmid (a 5' entry gateway clone containing the CMV promoter) was obtained from Dr. Nathan Lawson. p5E-cmlc2 was obtained from a zebrafish Tol2 kit generated by Dr. Chien Chi-Bin (Kwan, K.M. et al. (2007) Dev Dyn 236:3088-3099). cmlc2 is a promoter that specifies transcription in the heart.

Example 3: Stable integration of a transgene in Zebrafish

[0166] This example shows that co-injection of an EGFP-expressing transgene cassette and integrase-encoding mRNA, into zebrafish embryos, results in high-efficiency, stable transfection.

[0167] Adult zebrafish were housed in an Aquaneering (San Diego, CA) zebrafish housing system at 28 °C on a 14-hour light/10-hour dark cycle. Single-pair crosses were used to generate fertilized embryos for microinjection to test for stable genomic integration of transgenes. After analysis, selected embryos were incubated in egg water at 28 °C for up to 6 days post-fertilization (dpf) before being raised in the main system.

[0168] A transgene cassette comprising sequences encoding enhanced green fluorescent protein (EGFP) under the control of a CMV promoter (pLTR-CMV-EGFP) was constructed by inserting a CMV promoter, EGFP cDNA and a BGH polyadenylation signal into the vector described in Example 1 using a 3-way (i.e., 5' entry (CMV promoter), middle entry (EGFP) and 3' entry (polyadenylation signal)) gateway insertion. See Figure 13.

[0169] Integrase-encoding mRNA was generated using a mMESSAGE mMACHINE® SP6 Transcription Kit (Invitrogen) with pCS2-Integrase, linearized with NotI, as a template. RNA was purified by phenol/chloroform extraction and ethanol precipitation.

[0170] One-cell zebrafish embryos were co-injected with the EGFP transgene cassette and the integrase mRNA, as shown schematically in Figure 13. Microinjection was performed as described (Kawakami, K. (2007) Genome Biol 8 Suppl 1:S7; Thermes, V. et al. (2002) Mech Dev 118:91-98). Embryos at the one-cell stage were injected with a high dose (25 ng/µl each of DNA and RNA) or a low dose (12.5 ng/µl each of DNA and RNA) in a volume of 0.5 nl per embryo.
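The injected quantities follow directly from the stated concentrations and the 0.5 nl injection volume. The short Python sketch below works through that arithmetic; the function and variable names are illustrative and are not part of the disclosure.

```python
# Injected mass per embryo = concentration x injection volume.
# 1 ng/µl is the same as 1 pg/nl, so (ng/µl) x (nl) gives pg directly.

def injected_mass_pg(conc_ng_per_ul: float, volume_nl: float) -> float:
    """Return the mass (pg) of one nucleic acid delivered to one embryo."""
    return conc_ng_per_ul * volume_nl

INJECTION_VOLUME_NL = 0.5  # injection volume stated in the text

high_dose_pg = injected_mass_pg(25.0, INJECTION_VOLUME_NL)
low_dose_pg = injected_mass_pg(12.5, INJECTION_VOLUME_NL)

print(f"high dose: {high_dose_pg} pg each of DNA and RNA per embryo")
print(f"low dose:  {low_dose_pg} pg each of DNA and RNA per embryo")
```

Under these figures the high dose corresponds to 12.5 pg, and the low dose to 6.25 pg, of each nucleic acid per embryo.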

[0171] The injected embryos were analyzed for the expression of the EGFP transgene at 6 days post-fertilization (dpf). For fluorescence analysis, live embryos were placed in egg water containing 1x tricaine. Fluorescence images were acquired using a Leica M165 FC stereo microscope. Injected embryos were categorized in five different groups (Group 0 through Group 4) based on the degree of GFP expression, with Group 0 showing no EGFP fluorescence and Group 4 showing the highest amount of EGFP fluorescence. Groups 2-4 represent successful genome integration with strong transgene expression and a high potential for germ line transmission in F1 fish. Group 0 and Group 1 represent fish in which no integration occurred (Group 0) or a very small amount of integration occurred (Group 1).

[0172] A comparison of integration levels using two different doses of injected nucleic acid (a high dose of 25 ng/µl each of mRNA and DNA or a low dose of 12.5 ng/µl each) was performed, and the results were quantified. As shown in Figure 14, stable integration (i.e., generation of fish in Groups 2, 3 and 4) was obtained in 55% of embryos injected at the high dose, and in 38% of embryos injected at the low dose. When these results are compared with those obtained from embryos in control experiments injected with only the transgene cassette (Figure 14, first and third pairs of bars), it is clear that the HIV-1 integrase greatly facilitates the integration rate. Accordingly, the methods disclosed herein are capable of achieving stable transgenesis in zebrafish with very high efficiency.
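The percentage reported as stable integration is the share of scored embryos falling into Groups 2, 3 and 4. A minimal Python sketch of that tally follows; the embryo scores are hypothetical values chosen for illustration and are not data from Figure 14.

```python
from collections import Counter

STABLE_GROUPS = {2, 3, 4}  # groups scored as stable integration in the text

def stable_integration_percent(scores):
    """Percentage of scored embryos that fall into Groups 2-4."""
    if not scores:
        raise ValueError("no embryos scored")
    counts = Counter(scores)
    stable = sum(counts[group] for group in STABLE_GROUPS)
    return 100.0 * stable / len(scores)

# Hypothetical group assignments for 20 embryos (illustration only).
example_scores = [0] * 7 + [1] * 5 + [2] * 4 + [3] * 3 + [4] * 1
print(stable_integration_percent(example_scores))
```

With these invented scores, 8 of 20 embryos fall into Groups 2-4.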

Example 4: Comparison with other methods of zebrafish transgenesis

[0173] Existing methods for construction of transgenic zebrafish (and other organisms) without using viral vectors include (1) Tol2-mediated transgenesis and (2) meganuclease (e.g., I-SceI)-mediated transgenesis. Accordingly, the methods described herein were compared to these two methods of performing transgenesis in zebrafish. Figure 15 shows that Tol2-mediated integration resulted in 62% stable transgenesis (i.e., 62% of fish that developed from treated embryos fell into Groups 2, 3 and 4); and Figure 16 shows that I-SceI-mediated integration resulted in 20% stable transgenesis (i.e., 20% of fish that developed from treated embryos fell into Groups 2, 3 and 4). These results were consistent with those obtained previously (Kawakami et al. (2007) Genome Biol. 8 Suppl 1:S7; Thermes et al. (2002) Mech. Devel. 118:91-98). Thus, the efficiency of transgenesis obtained with the methods disclosed herein (up to 55%) is much higher than that obtained using the I-SceI method, and comparable to that obtained using Tol2-mediated transgenesis.

Moreover, the methods disclosed herein do not suffer from the disadvantage, encountered with Tol2-mediated transgenesis, of mobilization of the integrated transgene in the presence of the Tol2 transposon. These results indicate that the efficiency of transgenesis obtained with the methods disclosed herein is better than or similar to current methods.

Example 5: Tissue-specific transgene expression

[0174] To test for the ability to direct tissue-specific expression of a transgene introduced by the methods disclosed herein, a transgene cassette containing sequences encoding EGFP under the control of the Fli1ep enhancer (which directs transcription in endothelial cells) was constructed and denoted pLTR-Fli1ep:EGFP-pA. The p5E-fli1ep plasmid, containing the Fli1ep enhancer, was obtained from Dr. Nathan Lawson.

[0175] As in Example 3, fish that developed from injected embryos were grouped into five categories based on the degree of EGFP expression (negative expression: Group 0, low expression: Group 1 and increasing degrees of positive expression: Groups 2, 3 and 4).

Fluorescent images of zebrafish that developed from embryos that had been injected with integrase mRNA and a transgene cassette containing sequences encoding enhanced green fluorescent protein under the transcriptional control of the endothelial-specific Fli1ep enhancer showed that, in Groups 2, 3 and 4, EGFP expression was primarily restricted to the vasculature. In addition, the levels of stable transgene integration were 57% in fish injected with 25 ng/µl and 27% in fish injected with 12.5 ng/µl (Figure 17), similar to the levels observed in Example 3 using an enhancerless construct. These results demonstrate that the methods disclosed herein provide the ability for regional, spatial and tissue-specific control of stable transgene expression.

[0176] In additional experiments using the catalytically-deficient integrase mutants D116A and E152A, a much lower integration efficiency (approximately 10%) was obtained; and all integrants were in Group 2 (i.e., low level of integration). These results indicate that, although a certain amount of integration can occur in the absence of integrase activity, high levels of integration depend on functional integrase.

Example 6: Stable transgenesis in cultured cells

[0177] This example shows that high levels of stable integration are obtained following co-transfection, into cultured human cells, of (1) a transgene cassette containing EGFP-encoding sequences under the transcriptional control of a CMV promoter and (2) a plasmid encoding HIV-1 integrase under the transcriptional control of a CMV promoter (pCS2-Integrase-2A-tdTomato). The transgene cassette was obtained by cleavage of the pLTR-CMV-EGFP plasmid (described in Example 3) with BstZ17I. The design of the experiment is shown schematically in Figure 18.

[0178] Two human epithelial cancer lines, A549 and PANC-1, were used in these experiments. Human lung cancer cell line A549 was acquired from ATCC (#CCL-185) and maintained in F12 medium supplemented with 10% fetal bovine serum at 37°C in a humidified atmosphere of 5% CO2/95% air in the presence of antibiotics. The human pancreatic cancer line PANC-1 was obtained from Sigma (#87092802) and maintained in DMEM with 10% fetal bovine serum at 37°C in a humidified atmosphere of 5% CO2/95% air in the presence of antibiotics.

[0179] Transfection was conducted using Lipofectamine® 3000 (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. Briefly, one day before transfection, cells were seeded at a density of 2 × 10^5 cells/well in a 12-well plate. After 24 hours, the cells were rinsed with phosphate-buffered saline (PBS). Each group was transfected with a mixture of 1 µg of BstZ17I-digested pLTR-CMV-EGFP and 1 µg of pCS2-Integrase-2A-tdTomato, using a Lipofectamine®-P3000 mixture in Opti-MEM for 4 hours, after which an equal volume of complete medium was added. In control experiments, cells were transfected with the EGFP transgene cassette and a plasmid that lacked sequences encoding integrase (pCS2-2ATomato-pA).

[0180] One day after transfection, the cells were subcultured and analyzed by flow cytometry to determine the number of cells that received both DNA molecules. Single-cell suspensions of the samples were prepared by trypsinization, and the fluorescence intensity of each sample was evaluated on an LSR II flow cytometer (BD Biosciences, San Jose, CA). For each analysis, at least 10,000 events were recorded. Green (GFP) and red (tdTomato) fluorescent signals were used as indicators for successful co-transfection of transgene and integrase plasmid, respectively, and the percentages of double-positive events (both red and green fluorescence) were calculated using FACSDiva software (BD Biosciences).

Untransfected cells served as a negative control.

[0181] Seven days after transfection (approximately three passages), at which time only stable transfectants persist, the degree of integration was determined by fluorescence imaging using a Leica M165 FC stereomicroscope. At least four images were taken in random locations of the dish for each experimental group. Representative images are shown in Figure 19, with green fluorescence (shown as white in the figure) indicating stable integration of the EGFP transgene cassette.

[0182] To quantify the percentage of the cells with positive GFP expression, all images were analyzed and processed consistently using ImageJ by adjusting the threshold and counting the positive pixels.

[0183] Quantified results were averaged and normalized to the transfection efficiency. Figure 20 shows the results of the quantitative analysis, which indicate that 42% of A549 cells, and 41% of PANC-1 cells, that received both the EGFP transgene cassette and the integrase plasmid expressed EGFP, compared to 12% of A549 cells, and 13% of PANC-1 cells, that received the transgene cassette and a plasmid that did not express integrase (pCS2-2ATomato-pA).
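The quantification described above (thresholding, counting positive pixels, averaging over images, and normalizing to transfection efficiency) can be sketched in Python as follows. This is a rough illustration only: the text does not state the exact normalization, so dividing the mean positive-pixel fraction by the co-transfection efficiency is an assumption, and the pixel values, threshold and efficiency shown are hypothetical.

```python
def positive_pixel_fraction(image, threshold):
    """Fraction of pixels whose intensity exceeds the threshold."""
    pixels = [value for row in image for value in row]
    if not pixels:
        raise ValueError("empty image")
    return sum(1 for value in pixels if value > threshold) / len(pixels)

def normalized_percent(images, threshold, transfection_efficiency):
    """Average the positive fraction over all images, then divide by the
    co-transfection efficiency (assumed normalization; a fraction in (0, 1])."""
    if not images:
        raise ValueError("no images supplied")
    if not 0 < transfection_efficiency <= 1:
        raise ValueError("transfection efficiency must be in (0, 1]")
    fractions = [positive_pixel_fraction(img, threshold) for img in images]
    mean_fraction = sum(fractions) / len(fractions)
    return 100.0 * mean_fraction / transfection_efficiency

# Hypothetical 2x4 "images" of 8-bit green-channel intensities.
images = [
    [[10, 200, 15, 12], [220, 8, 9, 11]],  # 2 of 8 pixels above threshold
    [[5, 7, 180, 6], [4, 3, 2, 1]],        # 1 of 8 pixels above threshold
]
result = normalized_percent(images, threshold=100, transfection_efficiency=0.5)
print(result)
```

A pixel fraction is only a proxy for the fraction of positive cells, so a value obtained this way depends on cell density and on the threshold chosen.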

Example 7: Effect of end structure on integration efficiency

[0184] As noted elsewhere herein, retroviral integrases require a linear double-stranded DNA molecule, containing the terminal inverted repeat sequence 5'-ACTG-3', as a substrate for end processing and strand transfer (i.e., integration). In this example, the effect, on integration efficiency, of the location of the 5'-ACTG-3' sequence (the IR sequence), with respect to the termini of the transgene cassette, was tested. To this end, four versions of a transgene vector containing sequences encoding the red fluorescent protein tdTomato, under the transcriptional control of the cardiac-specific cMLC2 promoter and the BGH polyadenylation site, were generated. Each had a different end structure external to the IR sequences. Cleavage of the transgene vector with ScaI generated perfect 5'-ACTG-3' blunt ends on the resulting transgene DNA cassette, while cleavage with BstZ17I generated a transgene cassette with one additional terminal nucleotide exterior to the IR sequence (5'-TACTG-3') and cleavage with PmeI generated a transgene cassette with two extra nucleotides exterior to the IR sequence (5'-AAACTG-3'). Double digestion with MluI and ApaI generated ends with 4-nucleotide overhangs exterior to the IR sequence.
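The three blunt end structures described above can be reproduced from the recognition sequences and cut positions of the enzymes (ScaI, AGT^ACT; BstZ17I, GTA^TAC; PmeI, GTTT^AAAC). The Python sketch below cuts a junction and counts the nucleotides left exterior to the IR sequence. The junction sequences are hypothetical constructs built for illustration (the filler and the `NNNN` placeholder are invented) and are not the actual vector sequences.

```python
# Recognition sites and the offset of the blunt cut within each site.
BLUNT_CUTTERS = {
    "ScaI": ("AGTACT", 3),     # AGT^ACT
    "BstZ17I": ("GTATAC", 3),  # GTA^TAC
    "PmeI": ("GTTTAAAC", 4),   # GTTT^AAAC
}

IR = "ACTG"  # terminal inverted repeat sequence named in the text

def left_end_after_cut(sequence: str, enzyme: str) -> str:
    """Return the sequence downstream of the first blunt cut by `enzyme`."""
    site, offset = BLUNT_CUTTERS[enzyme]
    position = sequence.find(site)
    if position < 0:
        raise ValueError(f"no {enzyme} site found")
    return sequence[position + offset:]

def extra_nucleotides(sequence: str, enzyme: str) -> int:
    """Number of nucleotides left exterior to the IR after cutting."""
    end = left_end_after_cut(sequence, enzyme)
    index = end.find(IR)
    if index < 0:
        raise ValueError("IR sequence not found in the cut product")
    return index

# Hypothetical junctions: filler, recognition site, remainder of the IR,
# and a placeholder (NNNN) standing in for the 5' dLTR.
junctions = {
    "ScaI": "CCGG" + "AGTACT" + "G" + "NNNN",
    "BstZ17I": "CCGG" + "GTATAC" + "TG" + "NNNN",
    "PmeI": "CCGG" + "GTTTAAAC" + "TG" + "NNNN",
}
for enzyme, seq in junctions.items():
    end = left_end_after_cut(seq, enzyme)
    print(enzyme, end[:6], extra_nucleotides(seq, enzyme))
```

In this sketch the ScaI site overlaps the IR so that cutting leaves no exterior nucleotide, whereas the BstZ17I and PmeI sites leave one and two exterior nucleotides, respectively, in agreement with the end structures stated in the text.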

[0185] One-cell embryos were injected with 12.5 ng/µl of integrase-encoding mRNA and 12.5 ng/µl of each of four different tdTomato-encoding transgene cassettes. Fish developing from injected embryos were analyzed for red fluorescence at 6 days post-fertilization (dpf) and categorized into three groups: Group 0 (no fluorescence); Group 1 (partial fluorescence in heart) and Group 2 (full fluorescence in heart). The percentage of embryos in Groups 1 and 2 (i.e., percentage of embryos in which the transgene was stably integrated) is shown in Figure 21. As can be seen, there were no significant differences in integration efficiency among transgene cassettes terminating in ScaI ends, BstZ17I ends and PmeI ends. Thus, the presence of one or two extra nucleotides, external to the IR sequence, does not affect integration efficiency. In contrast, if the transgene cassette possessed ends having 4-nucleotide overhangs (generated by double digestion with MluI (5'-CGCG overhang) and ApaI (3'-CCGG overhang)) external to the 5'-ACTG-3' IR sequence, integrase-dependent integration was totally abolished (Figure 21), suggesting that the integrase cannot perform 3' processing or strand transfer on such a substrate. These results indicate that the terminal sequence and structure of the transgene cassette is important for high-efficiency integration, but that a certain amount of variability in the location of the IR sequence is tolerated.

[0186] In additional experiments, the contribution of the LTR sequences that are present in the transgene cassette was investigated. The following results were obtained:

(a) transgenes whose expression was directed by an endothelium-specific enhancer, flanked on both ends with a 21-nucleotide U3 sequence that included a 5'-ACTG-3' blunt-ended sequence (i.e., no dLTR sequences), integrated efficiently in the presence of integrase; however, integration was non-specific;

(b) transgenes with a single downstream 3'-dLTR (i.e., no upstream 5' dLTR) integrated with higher efficiency than transgenes flanked by both a 5'-dLTR and a 3'-dLTR;

(c) transgenes with a single upstream 5'-dLTR (i.e., no downstream 3' dLTR) integrated with lower efficiency than transgenes flanked by both a 5'-dLTR and a 3'-dLTR.

Statistical analysis

[0187] All assays were carried out in triplicate or more. Data were expressed as a mean or stacked mean with standard deviation (SD). Student's t-test was used to compare the means between groups to determine statistical significance, with a p value < 0.05 considered statistically significant.
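As an illustration of the comparison described above, the following Python sketch computes a two-sample Student's t statistic for two groups of triplicate measurements. It assumes an unpaired test with pooled variance, which the text does not specify; the measurements are hypothetical, and significance is judged against the tabulated two-tailed critical value for 4 degrees of freedom rather than by computing an exact p value.

```python
from math import sqrt
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (equal variances assumed) and its
    degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df

# Two-tailed critical value of t at p = 0.05 for 4 degrees of freedom
# (two groups of three replicates), from a standard t table.
T_CRITICAL_DF4 = 2.776

# Hypothetical triplicate measurements (percent EGFP-positive cells);
# these are not data from the text.
with_integrase = [40.0, 42.0, 44.0]
without_integrase = [11.0, 12.0, 13.0]

t_value, df = student_t(with_integrase, without_integrase)
significant = df == 4 and abs(t_value) > T_CRITICAL_DF4
print(f"t = {t_value:.2f}, df = {df}, significant at p < 0.05: {significant}")
```

For group sizes other than three replicates each, the critical value would have to be replaced by the one for the corresponding degrees of freedom.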

Example 8: Vectors encoding dCas9-psip1a fusions

[0188] A vector encoding a fusion between LEDGF (psip1a) and dCas9 was constructed as follows. Sequences encoding zebrafish psip1a (zpsip1a) cDNA were cloned from zebrafish DNA and inserted by gateway cloning into the pME entry vector. Cas9 sequences were obtained as a KpnI/NheI fragment produced by double digestion of the dCas9 plasmid #100091 (Addgene, Watertown, MA). The psip1a sequence, the Cas9 sequence, linearized pCS expression vector (Miyoshi et al. (1998) J. Virol. 72:8150-8157), a nuclear localization sequence (NLS) and sequences encoding (GGS)5 (SEQ ID NO: 17) linkers were joined by Gibson assembly (Gibson et al. (2009) Nature Methods 6:343-345) to generate two fusions: one in which dCas9 sequences are upstream of psip1a sequences; the other in which dCas9 sequences are downstream of psip1a sequences. Schematically, the two fusions have the following structures:

[0189] The nucleotide sequence of the pCS-NLS-dCas9-(GGS)5-zpsip1a vector is:

[0190] Vector backbone sequences are represented by uppercase letters. Underlined segments of the sequence are as follows:

35-51: SP6 promoter

64-78: β-globin translational leader sequence

94-114: nuclear localization sequence from SV40 large T-antigen

139-4242: dCas9

4243-4287: (GGS)5 linker (SEQ ID NO: 17) (not underlined)

4294-5535: zebrafish psip1a

5573-5768: SV40 polyadenylation signal

7020-7880: Amp R gene.

A map of this vector is shown in Figure 23.
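As a consistency check on the annotated coordinates listed above, the following Python sketch computes the length of each segment and, for the coding segments, the corresponding number of codons. Treating the coordinates as 1-based and inclusive, and treating each listed coding segment as a whole number of codons, are assumptions made for the sketch.

```python
# Annotated coordinates (assumed 1-based, inclusive) from the feature list.
FEATURES = {
    "SP6 promoter": (35, 51),
    "SV40 NLS": (94, 114),
    "dCas9": (139, 4242),
    "(GGS)5 linker": (4243, 4287),
    "zebrafish psip1a": (4294, 5535),
}
CODING = {"SV40 NLS", "dCas9", "(GGS)5 linker", "zebrafish psip1a"}

def length_nt(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1

for name, (start, end) in FEATURES.items():
    nt = length_nt(start, end)
    if name in CODING:
        codons, remainder = divmod(nt, 3)
        print(f"{name}: {nt} nt = {codons} codons (remainder {remainder})")
    else:
        print(f"{name}: {nt} nt")
```

On these coordinates the linker spans 45 nucleotides, i.e., 15 codons or five GGS repeats, and the dCas9 segment spans 4104 nucleotides, i.e., 1368 codons.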

[0191] The nucleotide sequence of the pCS-zpsip1a-(GGS)5-dCas9-NLS vector is:

[0192] Vector backbone sequences are represented by uppercase letters. Underlined segments of the sequence are as follows:

35-51: SP6 promoter

64-78: β-globin translational leader sequence

91-1329: zebrafish psip1a

1330-1374: (GGS)5 linker (SEQ ID NO: 17) (not underlined)

1381-5484: dCas9

5539-5559: nuclear localization sequence from SV40 large T-antigen

5609-5804: SV40 polyadenylation signal

7056-7916: Amp R gene.

A map of this vector is shown in Figure 24.

[0193] Additional vectors are constructed with different linker sequences between the Cas9-encoding and psip1a-encoding sequences. In these constructs, the (GGS)5 linker (SEQ ID NO: 17) is replaced by the more rigid (EAAAK)n linker (in which n = 1-4) (SEQ ID NO: 18) or the flexible (GGGGS)n linker (in which n = 1-4) (SEQ ID NO: 19).

Example 9: pLTRB-CMV-tdTomato transgene vector

[0194] This plasmid was constructed by gateway cloning using p5E-CMV, pME-tdTomato, and the two-way Gateway cloning vector pLTRB-R4R2. The nucleotide sequence of this vector is:

[0195] Underlined segments of the sequence are as follows:

714-1679: β-lactamase promoter and coding sequence

2928-3107: truncated HIV-1 LTR containing R and U5 sequences (SEQ ID NO:4)

3175-3195: attR4 sequences

3226-4203: CMV IE94 promoter

4238-4254: SP6 promoter

4257-4279: attB1 sequences

4280-5706: td-Tomato

5710-5734: attB2 sequences

5750-5884: SV40 polyadenylation signal

6034-6267: truncated HIV-1 LTR containing dU3, R and U5 sequences (SEQ ID NO:5).

A map of this vector is shown in Figure 25.

Example 10: Targeted integration in zebrafish embryos

[0196] This example describes targeted integration of a td-Tomato transgene in zebrafish. Transgenic zebrafish embryos (pTol2-CMV:EGFP-pA) that contained an integrated EGFP gene were constructed by Tol2-mediated transgenesis as described in Example 4. One-cell embryos obtained from adult zebrafish containing an exogenous EGFP gene that had been introduced by Tol2-mediated transgenesis of embryos (as described in Example 4) were used as target organisms. For each experiment, approximately 200 embryos were injected with a mixture of:

(a) 6.25 pg/embryo of a transgene cassette containing a td-Tomato coding region (as described in Example 9),

(b) 6.25 pg/embryo of integrase-encoding RNA (prepared as described in Example 3),

(c) 6.25 pg/embryo of RNA encoding the psip1a-Cas9 fusion protein (as described in Example 8), prepared by in vitro transcription with SP6 RNA polymerase, and

(d) 6.25 pg/embryo of guide RNA complementary to a portion of the EGFP coding region having the sequence:

in which the GGG sequence at the 3' end is the protospacer adjacent motif (PAM) sequence.

[0197] Because the target embryos are transgenic for EGFP, they exhibit green fluorescence. However, if the td-Tomato-encoding transgene cassette is integrated at the target sequence, the EGFP gene will be disrupted and the cell will exhibit red fluorescence, due to the integrated td-Tomato transgene.
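The relationship between a guide RNA target and its PAM, noted above for the EGFP guide, can be illustrated with a short Python sketch that lists every 20-nucleotide protospacer immediately followed by an NGG motif on a given strand. The 20-nucleotide protospacer length and the NGG motif are the commonly used S. pyogenes Cas9 conventions and are assumed here; the example sequence is hypothetical and is not the EGFP target sequence referred to in the text.

```python
import re

def find_spcas9_targets(sequence: str, protospacer_len: int = 20):
    """Return (protospacer, PAM, start) tuples for every site on the given
    strand where a protospacer is immediately followed by an NGG PAM."""
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.group(1), m.group(2), m.start()) for m in pattern.finditer(sequence)]

# Hypothetical 30-nt sequence for illustration only.
example = "ATGCATGCATGCATGCATGCATGGGTACCA"
targets = find_spcas9_targets(example)
for protospacer, pam, start in targets:
    print(start, protospacer, pam)
```

Only the strand supplied is scanned; candidate sites on the opposite strand would be found by passing the reverse complement.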

[0198] Injected embryos were cultured in egg water (60 µg/ml Instant Ocean® sea salt) at 28.5°C. Five hours after injection, embryos were analyzed by confocal fluorescence microscopy. The results, shown in Figure 26, indicate that several cells emitted red fluorescence, indicative of targeted integration of the td-Tomato transgene into the target site in the EGFP gene in those cells.

Example 11: Test System

[0199] Transgenic zebrafish (made, e.g., by I-SceI-mediated methods, Tol2-mediated methods, or the methods disclosed herein) containing an integrated EGFP gene (or any other gene providing a fluorescent readout) are selected in which a single exogenous EGFP gene is integrated at a locus that does not contain a coding region or regulatory element. This is achieved, for example, by outcrossing transgenic fish until a strain is obtained that contains a single EGFP insertion in a non-coding, non-regulatory region (confirmed, e.g., by determining the DNA sequence of the insertion site). Such a strain is used as a test system, e.g., for optimizing the methods and compositions disclosed herein. For example, targeted integration, into the EGFP sequences of such strains, of transgene cassettes containing sequences encoding a non-green fluorescent molecule, such as td-Tomato, results in loss of green fluorescence and acquisition of red fluorescence.

Example 12: Integrase proteins with additional nuclear localization signals

[0200] This example provides results of an experiment to determine the effect of additional NLS sequences, in the integrase protein, on the efficiency of integration. The pFli1ep:EGFP-pA transgene cassette (see Example 5) was co-injected into one-cell embryos with mRNA encoding one of three different integrase proteins: wild-type HIV-1 integrase, HIV-1 integrase with a c-myc NLS attached to the N-terminus, and HIV-1 integrase with a c-myc NLS attached to the C-terminus.

[0201] Six days post-fertilization, embryos were analyzed by confocal fluorescence microscopy and sorted into Groups (0 through 4) as described in Examples 3 and 5. The results, shown in Figure 27, indicate that the presence of the c-myc NLS at the N-terminus of the integrase protein increases the efficiency of integration.