Title:
CHO CELL MODIFICATION
Document Type and Number:
WIPO Patent Application WO/2022/123242
Kind Code:
A1
Abstract:
The present disclosure relates to the modification of Chinese hamster ovary (CHO) cells at specific genomic loci that bestows the capability of receiving and expressing exogenous nucleic acid, for an extended period, as well as to methods of modifying the genome of a CHO cell, such that an exogenous nucleic acid sequence can be introduced, maintained and expressed by the modified CHO cell, for an extended period.

Inventors:
ROSSER SUSAN J (GB)
KLEINJAN DIRK-JAN (GB)
Application Number:
PCT/GB2021/053210
Publication Date:
June 16, 2022
Filing Date:
December 08, 2021
Assignee:
UNIV COURT UNIV OF EDINBURGH (GB)
International Classes:
C12N5/16; C12N15/90
Domestic Patent References:
WO2012138887A1 2012-10-11
WO2017180669A1 2017-10-19
WO2018150269A1 2018-08-23
Other References:
HAMAKER NATHANIEL K ET AL: "Site-specific integration ushers in a new era of precise CHO cell line engineering", CURRENT OPINION IN CHEMICAL ENGINEERING, vol. 22, 1 December 2018 (2018-12-01), Netherlands, pages 152 - 160, XP055898826, ISSN: 2211-3398, DOI: 10.1016/j.coche.2018.09.011
"NCBI", Database accession no. GCF_003668045.3
ROMANOVA N., NOLL T.: "Engineered and Natural Promoters and Chromatin-Modifying Elements for Recombinant Protein Expression in CHO Cells", BIOTECHNOLOGY JOURNAL, vol. 13, 2018, pages 1700232
KAWAKAMI K.: "Tol2: a versatile gene transfer vector in vertebrates", GENOME BIOL, vol. 8, 2007, pages S7, XP008156743, DOI: 10.1186/gb-2007-8-s1-s7
SZANISZLO P, ROSE WA, WANG N, REECE LM, TSULAIA TV, HANANIA EG, ELFERINK CJ, LEARY JF: "Scanning cytometry with a LEAP: laser-enabled analysis and processing of live cells in situ", CYTOMETRY A., vol. 69, no. 7, July 2006 (2006-07-01), pages 641 - 51, XP055845256, DOI: 10.1002/cyto.a.20291
SRIRANGAN K., LOIGNON M., DUROCHER Y.: "The use of site-specific recombination and cassette exchange technologies for monoclonal antibody production in Chinese Hamster Ovary cells: retrospective analysis and future directions", CRITICAL REVIEWS IN BIOTECHNOLOGY, vol. 40, no. 6, 2020, pages 833 - 851
HELLEN C.U., SARNOW P.: "Internal ribosome entry sites in eukaryotic mRNA molecules", GENES DEV., vol. 15, 2001, pages 1593 - 1612, XP002332841, DOI: 10.1101/gad.891101
Attorney, Agent or Firm:
CHAPMAN, Paul (GB)
Claims:
CLAIMS

1. A modified CHO cell comprising a modified genome, wherein the modified genome comprises a landing pad for exogenous nucleic acid insertion and expression, wherein the landing pad is located at any position within the interval approximately 190,595 bp upstream of the Lemd2 gene start codon to 2188 bp downstream of the LemD2 stop codon; any position within the interval from 11,469 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or any position within the interval from 25902 bp upstream of the Mrpl4 start codon - 7002 bp downstream of the Mrpl4 stop codon.

2. The modified CHO cell according to claim 1, wherein the landing pad is located within the genome at any position within the interval approximately 52,027 bp upstream of the LemD2 start codon - 1937 bp downstream of the LemD2 stop codon; 11,306 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or 24994 bp upstream of the Mrpl4 start codon - 7161 bp downstream of the Mrpl4 stop codon.

3. The modified CHO cell according to either of claims 1 or 2 wherein the landing pad is not inserted within an exon of the identified genes.

4. The modified CHO cell according to any preceding claim, wherein the landing pad is inserted upstream of LemD2.

5. The modified CHO cell according to any of claims 1 - 3, wherein the landing pad is inserted within an intron of Cdk2ap2, such as intron 1 of the Cdk2ap2 gene.

6. The modified CHO cell according to any of claims 1 - 3, wherein the landing pad is inserted within an intron of the Mrpl4 gene, such as within intron 4 of the Mrpl4 gene.

7. The modified CHO cell according to any preceding claim, wherein the landing pad is located within the CHO cell genome at the position 40,471 bp of NCBI accession number NW_003614972.1, 188,703 bp of NCBI accession number NW_003614797.1, or 1,214,962 bp of NCBI accession number NW_003613752.1 ± 500bp, 250bp, 100bp, 50bp of the CHO-K1 genome, or at a corresponding location within a genome of another CHO cell strain.

8. A modified CHO cell according to any preceding claim, further comprising an exogenous nucleic acid sequence, inserted within the landing pad, through a method of cassette exchange, such as RMCE.

9. A landing pad construct comprising in a 5’ - 3’ direction: a first homologous sequence which is homologous with a first portion of sequence approximately 1) 190,595 bp upstream of the Lemd2 gene start codon - 2188 bp downstream of the LemD2 stop codon; 2) 11,469 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or 3) 25902 bp upstream of the Mrpl4 start codon - 7002 bp downstream of the Mrpl4 stop codon; a first insulator site; a first promoter sequence, a first recombinase site for a first recombinase; an antibiotic resistance gene sequence; an internal ribosome entry site; a reporter gene; a second recombinase site for the first recombinase; a second insulator site; and a second homologous sequence which is homologous with a second portion, 3’ of the first homologous sequence, of sequence approximately 1) 190,595 bp upstream of the Lemd2 gene start codon - 2188 bp downstream of the LemD2 stop codon; 2) 11,469 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or 3) 25902 bp upstream of the Mrpl4 start codon - 7002 bp downstream of the Mrpl4 stop codon.

10. The landing pad construct according to claim 9 wherein the first and second portions of homologous sequence are 52,027 bp upstream of the LemD2 start codon - 1937 bp downstream of the LemD2 stop codon; 11,306 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or 24994 bp upstream of the Mrpl4 start codon - 7161 bp downstream of the Mrpl4 stop codon.

11. The landing pad construct according to claim 9 or 10 wherein the first and second portions of homologous sequence are not inserted within an exon of the identified genes.

12. The landing pad construct according to claims 9 or 10 wherein the first and second portions of homologous sequence are located upstream of LemD2.

13. The landing pad construct according to claims 9 or 10 wherein the first and second portions of homologous sequence are located within an intron of Cdk2ap2, such as intron 1 of the Cdk2ap2 gene.

14. The landing pad construct according to claim 9 or 10 wherein the first and second portions of homologous sequence are located within an intron of the Mrpl4 gene, such as within intron 4 of the Mrpl4 gene.

15. The landing pad construct according to claims 9 - 14 wherein the first portion of homologous sequence is 5’ of the second portion of sequence.

16. The landing pad construct according to any of claims 9 - 15 wherein the first and second homologous sequences may be approximately 250bp - 2Kb in length, such as 500bp - 1.5kb, or 0.7 - 1 kb in length.

17. Use of a modified CHO cell, or landing pad construct according to any preceding claim for stable, long-term expression of exogenous nucleic acid in a CHO cell.

18. Use according to claim 10, wherein the exogenous nucleic acid is maintained for at least 25 generations of the modified CHO cell.

Description:
CHO cell modification

Field of the disclosure

The present disclosure relates to the modification of Chinese hamster ovary (CHO) cells at specific genomic loci that bestows the capability of receiving and expressing exogenous nucleic acid, for an extended period, as well as to methods of modifying the genome of a CHO cell, such that an exogenous nucleic acid sequence can be introduced, maintained and expressed by the modified CHO cell, for an extended period.

Background of the disclosure

CHO cells have become the cell of choice for expression of recombinant proteins and antibodies, as they can produce significant levels of protein/antibody and allow post-translational modification of such proteins/antibodies making them suitable for use in humans.

However, by a conventional random integration strategy, development of a high-expressing and stable recombinant CHO cell line has always been a difficult task due to the heterogenic insertion and a resulting requirement of multiple rounds of selection. Site-specific integration of transgenes into CHO cell hot spots is a strategy to overcome these challenges since it can generate isogenic cell lines with consistent productivity and stability. A general review and identification of particular hot spots in the CHO genome is described in Hamaker and Lee (2018). There is, however, a desire to identify further hot spots, as the existing hot spots may not always allow for expression of a recombinant protein of choice.

In the genome of eukaryotes, the wrapping of DNA around nucleosomes to form chromatin enables the packaging of the large genome within the nucleus. At many sites in the genome, this chromatin packaging imposes a repressive transcriptional environment, which leads to silencing of integrated transgenes, while at other locations the chromatin exists in a more ‘open’ configuration supportive of high-level transcription. Modulation of chromatin (tightness of packaging) between open and repressed states can be introduced through epigenetic modifications on the histone proteins. The identification of genomic locations (i.e. hot spots) that support high-level gene expression which remain stable over time is therefore of great value for economically viable protein production. Random integration of DNA constructs with a reporter gene can be used as an approach to identify such regions. Continuous analysis of clones over many generations can reveal those cells where a high level of expression is maintained over time. Once a stable, highly expressing genomic site has been identified, the ability to re-insert many subsequent constructs into this precise genomic location will be highly advantageous. This can be achieved by creating a Landing Pad.

Summary of the disclosure

The present disclosure is based in part on the identification of 3 new hot-spot sites in a CHO genome. These hot spots allow exogenous nucleic acid to be inserted into the CHO cell genome at the hot spot location and expressed by the CHO cell, particularly for an extended period and over many cell generations.

In a first teaching, there is provided a modified CHO cell comprising a modified genome, wherein the modified genome comprises a landing pad for exogenous nucleic acid insertion and expression, wherein the landing pad is located at any position in the interval from approximately 190,595 bp upstream of the Lemd2 gene start codon to 2188 bp downstream of the LemD2 stop codon; in the interval from 11,469 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or in the interval from 25902 bp upstream of the Mrpl4 start codon - 7002 bp downstream of the Mrpl4 stop codon.

In one embodiment the landing pad is located within the genome at any position within the region approximately 52,027 bp upstream of the LemD2 start codon - 1937 bp downstream of the LemD2 stop codon; 11,306 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or 24994 bp upstream of the Mrpl4 start codon - 7161 bp downstream of the Mrpl4 stop codon.

190,595 bp upstream of the LemD2 start codon refers to the region until the stop codon, and end of elevated mRNA levels, of the next gene (Grm4) identified in NCBI accession number GCF_003668045.3. 2188 bp downstream of the LemD2 stop codon, refers to the region until the start codon of the next gene (LOC118239606) identified in NCBI accession number GCF_003668045.3. 1937 bp downstream of the LemD2 stop codon, refers to the region before the start of elevated mRNA levels for the next gene (LOC118239606) identified in NCBI accession number GCF_003668045.3.

11,469 bp upstream of the Cdk2ap2 start codon, refers to the region until the stop codon of the next gene (Cabp2) identified in NCBI accession number GCF_003668045.3. 11,306 bp upstream of the Cdk2ap2 start codon, refers to the region until the end of elevated mRNA levels, of the next gene (Cabp2) identified in NCBI accession number GCF_003668045.3. 2585 bp downstream of the Cdk2ap2 stop codon, refers to the region until the start codon of the next gene (Pitpnm1) identified in NCBI accession number GCF_003668045.3. 1440 bp downstream of the Cdk2ap2 stop codon, refers to the region until the start of elevated mRNA levels for the next gene (Pitpnm1) identified in NCBI accession number GCF_003668045.3.

25902 bp upstream of the Mrpl4 start codon, refers to the region until the start of elevated mRNA levels for the next gene (S1pr2) identified in NCBI accession number GCF_003668045.3. 24994 bp upstream of the Mrpl4 start codon, refers to the region until the end of the noncoding RNA XR004769565.1 identified in NCBI accession number GCF_003668045.3. 7161 bp downstream of the Mrpl4 stop codon, refers to the region until the start codon of the next gene (Icam1) identified in NCBI accession number GCF_003668045.3. 7002 bp downstream of the Mrpl4 stop codon, refers to the region until the start of elevated mRNA levels for the next gene (Icam1) identified in NCBI accession number GCF_003668045.3.
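The interval definitions above can be read as a simple coordinate test. The following is a minimal sketch only: the upstream and downstream distances are the broad values stated in the first teaching, while the class name, the function name and the start/stop codon coordinates passed in are hypothetical, since this text does not list gene coordinates or strand orientation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneInterval:
    """Interval defined relative to a gene's start and stop codons."""
    gene: str
    upstream_bp: int    # bp upstream of the start codon
    downstream_bp: int  # bp downstream of the stop codon


# Broad intervals as stated in the first teaching.
BROAD_INTERVALS = {
    "Lemd2": GeneInterval("Lemd2", 190_595, 2_188),
    "Cdk2ap2": GeneInterval("Cdk2ap2", 11_469, 2_585),
    "Mrpl4": GeneInterval("Mrpl4", 25_902, 7_002),
}


def in_interval(position, start_codon, stop_codon, interval, plus_strand=True):
    """Return True if `position` lies inside the interval.

    `start_codon` and `stop_codon` are scaffold coordinates supplied by the
    caller (hypothetical inputs; they are not given in this text).
    """
    if plus_strand:
        low = start_codon - interval.upstream_bp
        high = stop_codon + interval.downstream_bp
    else:
        # On the minus strand, "upstream" corresponds to higher coordinates.
        low = stop_codon - interval.downstream_bp
        high = start_codon + interval.upstream_bp
    return low <= position <= high
```

For example, with a made-up plus-strand gene whose start codon is at 100,000 and stop codon at 105,000, `in_interval(90_000, 100_000, 105_000, BROAD_INTERVALS["Cdk2ap2"])` evaluates the position against the window 88,531 to 107,585.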

In one embodiment, the landing pad is not inserted within an exon of the identified genes.

In one embodiment, the landing pad is inserted upstream of LemD2.

In one embodiment, the landing pad is inserted within an intron of Cdk2ap2, such as intron 1 of the Cdk2ap2 gene.

In one embodiment, the landing pad is inserted within an intron of the Mrpl4 gene, such as within intron 4 of the Mrpl4 gene.

In one embodiment the landing pad is located within the CHO cell genome at the position 40,471 bp of NCBI accession number NW_003614972.1 (4948 bp upstream of the LemD2 start codon), 188,703 bp of NCBI accession number NW_003614797.1, or 1,214,962 bp of NCBI accession number NW_003613752.1, or ± 500bp, 250bp, 100bp, 50bp of the CHO-K1 genome, or at a corresponding location within a genome of another CHO cell strain.
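As an illustration of the stated positional tolerances, the sketch below reports the narrowest of the ± 50, 100, 250 or 500 bp windows that contains a candidate position on one of the three named CHO-K1 scaffolds. The scaffold accessions and positions are taken from the text; the function name and the choice to return `None` outside ± 500 bp are assumptions made for this example.

```python
# Insertion sites stated in the disclosure (CHO-K1 scaffold -> position in bp).
SITES = {
    "NW_003614972.1": 40_471,
    "NW_003614797.1": 188_703,
    "NW_003613752.1": 1_214_962,
}
TOLERANCES_BP = (50, 100, 250, 500)  # narrowest first


def tightest_tolerance(scaffold, position):
    """Return the smallest stated tolerance (bp) that contains `position`.

    Returns None if the scaffold is not one of the three listed, or if the
    position is more than 500 bp from the stated site.
    """
    site = SITES.get(scaffold)
    if site is None:
        return None
    offset = abs(position - site)
    for tolerance in TOLERANCES_BP:
        if offset <= tolerance:
            return tolerance
    return None
```

A corresponding location in another CHO strain would first need to be mapped onto these scaffold coordinates, which this sketch does not attempt.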

In some embodiments the CHO cell is a CHO K1 cell, a CHO K1 SV cell, a DG44 cell, a DXB11 cell, a CHO K1S cell, a CHO K1 M cell, or any other derivative thereof.

The modified CHO cells of the present disclosure may be obtained by a method of random landing pad integration within a population of CHO cells, followed by selection of individual CHO cell clones, which display stable integration of the landing pad and expression of a suitable marker gene. Suitable modified CHO cells identified in this manner, can then be used for site-specific integration (SSI) of exogenous nucleic acid, where the exogenous nucleic acid is introduced into the genome of the modified CHO cell by a method of recombinase-mediated cassette exchange (RMCE), using appropriate recombination sites within the landing pad inserted within the modified CHO cell. Thereafter, expression of the inserted exogenous nucleic acid is enabled through use of a suitable promoter element and other regulatory sequences, which are present in the landing pad. Such a technique is well known to the skilled addressee and is described, for example, in Hamaker and Lee (2018) and in more detail herein.

Thus, in a further teaching of the present disclosure there is provided a modified CHO cell in accordance with the first teaching and associated embodiments, further comprising an exogenous nucleic acid sequence (typically a DNA sequence), inserted within the landing pad, through a method of cassette exchange, such as RMCE.

As used herein, the term “cassette” is a type of mobile genetic element that comprises a nucleic acid and at least one recombination site. In some embodiments, the cassette comprises at least two recombination sites. In some embodiments, the cassette comprises a reporter or a selection marker. In some embodiments, a cassette is exchanged through RMCE.
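The cassette exchange described above can be pictured as swapping whatever lies between two recombination sites in the genome for whatever lies between the matching sites on a donor. The following is a deliberately simplified sketch in which constructs are lists of element labels; the function name, the labels and the example constructs are hypothetical, and the sketch ignores the fact that some recombinases convert their sites during the reaction.

```python
def rmce(genome, donor, site_5, site_3):
    """Exchange the cassette between `site_5` and `site_3`.

    `genome` and `donor` are lists of element labels. The elements lying
    between the two sites in `genome` are replaced by the elements lying
    between the same two sites in `donor`. Raises ValueError if either
    site is missing from either list.
    """
    def site_indices(elements):
        i = elements.index(site_5)
        j = elements.index(site_3, i + 1)
        return i, j

    g5, g3 = site_indices(genome)
    d5, d3 = site_indices(donor)
    return genome[: g5 + 1] + donor[d5 + 1 : d3] + genome[g3:]


# Toy example: a landing pad carrying a selection marker and reporter is
# exchanged for a donor cassette carrying a gene of interest.
landing_pad = ["5' flank", "site_A", "neoR", "IRES", "GFP", "site_B", "3' flank"]
donor_vector = ["site_A", "gene_of_interest", "site_B"]
exchanged = rmce(landing_pad, donor_vector, "site_A", "site_B")
```

The flanking genomic context is untouched by the exchange, which is why a site that has been shown to support stable expression can be reused for subsequent constructs.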

As used herein, the term “exogenous” indicates that a nucleic acid sequence does not originate from the particular CHO cell and is introduced into the CHO cell by traditional DNA delivery methods, e.g., by transfection, electroporation, or transformation methods. An “exogenous” nucleotide sequence can have an “endogenous” counterpart in the CHO cell that is similar or identical to the exogenous nucleic acid, but the “exogenous” sequence is distinguished by being introduced into the CHO cell.

As referred to herein, a “reporter” is a gene whose expression confers a phenotype upon a cell that can be easily identified and measured. In some embodiments, the reporter gene comprises a fluorescent protein gene.

Suitable reporters include green fluorescent protein (GFP), enhanced GFP (eGFP), a synthetic GFP, a yellow fluorescent protein (YFP), an enhanced YFP (eYFP), a cyan fluorescent protein (CFP), mPlum, mCherry, tdTomato, mStrawberry, J-red, DsRed-monomer, mOrange, mKO, mCitrine, Venus, YPet, Emerald6, CyPet, mCFPm, a Cerulean reporter, and a T-Sapphire reporter.

As referred to herein, the term “selection marker” refers to the use of a gene, which encodes an enzymatic activity that confers the ability to grow in medium lacking what would otherwise be an essential nutrient; in addition, a selection gene may confer resistance to an antibiotic or drug upon the cell in which the selection gene is expressed. A selection gene may be used to confer a particular phenotype upon a host cell. When a host cell must express a selection gene to grow in selective medium, the gene is said to be a positive selection gene. Selection genes can also be used to select against host cells containing a particular gene; selection genes used in this manner are referred to as negative selection genes.

Examples of selection markers include an aminoglycoside phosphotransferase (APH) (e.g., hygromycin phosphotransferase (HYG), neomycin and G418 APH), dihydrofolate reductase (DHFR), thymidine kinase (TK), glutamine synthetase (GS), asparagine synthetase, tryptophan synthetase (indole), histidinol dehydrogenase (histidinol D), and genes encoding resistance to puromycin, blasticidin, bleomycin, phleomycin, chloramphenicol, Zeocin, and mycophenolic acid.

As referred to herein, the term “landing pad” refers to a nucleic acid sequence comprising at least one recombination target site integrated into a host cell’s genome. In some embodiments, a landing pad comprises two or more recombination target sites integrated into a host cell’s genome. In some embodiments, the cell comprises one or more, such as 2, 3, 4 or more landing pads. In some embodiments, landing pads are integrated at up to 1, 2, or 3 distinct genomic loci.

A landing pad may further comprise regulatory elements, which are capable of controlling some aspect of the expression of nucleic acid sequences. This may include a promoter, promoter sequence, or promoter region and this refers to a DNA regulatory region/sequence capable of binding RNA polymerase and involved in initiating transcription of a downstream coding or non-coding sequence. In some examples of the present disclosure, the promoter sequence includes the transcription initiation site and extends upstream to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. In some embodiments, the promoter sequence includes a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain “TATA” boxes and “CAT” boxes. Various promoters, including inducible promoters, may be used to drive the gene expression, e.g., in the CHO cell or vectors of the present disclosure. In some embodiments, the promoter is not a leaky promoter, i.e., the promoter is not constitutively expressing any one of the gene products as described herein.

The promoter may be a heterologous promoter, which is derived from a different species than the gene to which it is operably linked. In some embodiments, the heterologous promoter may be derived from a prokaryotic system. In some embodiments, the heterologous promoter may be derived from a eukaryotic system. Examples of suitable promoters include CMV promoter, EF1 alpha promoter, and PGK promoter.

In order to effect/control expression, a promoter, promoter sequence, or promoter region is operably linked to the nucleic acid to be expressed. For example, a promoter is operably linked to a nucleic acid if the promoter acts to modulate the transcription of the nucleic acid. In certain embodiments, an operably linked promoter is located upstream of the coding sequence and can be adjacent to it.

Prior to integrating into the CHO cell genome, the landing pad may be provided in a suitable vector, which is introduced to the CHO cell to be modified. A “vector” is a replicon, such as a plasmid, phage, virus, or cosmid, to which another DNA segment, such as a landing pad of the present disclosure, may be attached, to bring about the replication of the attached DNA segment in a cell. “Vector” includes episomal (e.g., plasmids) and non-episomal vectors. In some embodiments of the present disclosure the vector is an episomal vector, which is removed/lost from a population of CHO cells after a number of cellular generations, e.g., by asymmetric partitioning. Vectors may be introduced into the desired host cells by well-known methods, including, but not limited to, transfection, transduction, cell fusion, and lipofection. Vectors can comprise various regulatory elements including promoters.

The landing pad may further comprise one or more, such as two, barrier insulator sequences, which serve to protect the exogenous nucleic acid and its expression. Conveniently, two (optionally identical) insulator sequences may be provided adjacent to the location within the landing pad, where the exogenous nucleic acid is to be inserted. Examples of such insulator sequences include cHS4, UCOE, MAR and STAR (see, for example, Romanova N. and Noll T. Engineered and Natural Promoters and Chromatin-Modifying Elements for Recombinant Protein Expression in CHO Cells. Biotechnology Journal 13, 1700232 (2018)).

In an embodiment, the landing pad may be introduced into the CHO cell through use of a suitable transposase system, such as the Tol2 transposase system known in the art (Kawakami, K. Tol2: a versatile gene transfer vector in vertebrates. Genome Biol 8, S7 (2007)). The Tol2 transposase system contains two vectors, such as E. coli derived plasmids. One vector, referred to as the helper plasmid, encodes the transposase. The other vector, referred to as the transposon plasmid, contains two inverted terminal repeats (ITRs) which flank the landing pad sequence to be transposed.

Transfected CHO cells may initially be selected by making use of selection marker expression. However, in order to identify and select for clones, which display the highest levels of expression, cells may further be selected on the basis of reporter expression. In one embodiment, transfected cells, which comprise a suitably integrated landing pad may be initially selected on the basis of being able to survive when grown in the presence of a selection marker, such as an antibiotic. Such cells may then be further analysed in order to identify which cells display the highest level of expression of a reporter gene within the landing pad. In one embodiment, cells which display the highest levels of reporter expression are identified and selected for growing on, using a technique of laser-enabled analysis and processing (LEAP), which allows unwanted cell colonies to be ablated, such that a single, high level reporter expressing clone can be isolated and retained/grown on (Szaniszlo P, Rose WA, Wang N, Reece LM, Tsulaia TV, Hanania EG, Elferink CJ, Leary JF. Scanning cytometry with a LEAP: laser-enabled analysis and processing of live cells in situ. Cytometry A. 2006 Jul;69(7):641 - 51).
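The two-stage selection described above (survival under selection first, then highest reporter expression) can be sketched as a filter followed by a ranking. This is an illustrative sketch only: the `Clone` record, the field names and the signal values are hypothetical, and the numbers are toy values rather than measurements from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    clone_id: str
    survives_selection: bool  # e.g. grows in the presence of the antibiotic
    reporter_signal: float    # arbitrary fluorescence units (toy values)


def pick_top_clone(clones):
    """Return the surviving clone with the highest reporter signal.

    Clones that do not survive selection are discarded first, mirroring
    the order of the two selection steps. Returns None if none survive.
    """
    survivors = [c for c in clones if c.survives_selection]
    if not survivors:
        return None
    return max(survivors, key=lambda c: c.reporter_signal)


# Hypothetical example population.
population = [
    Clone("A", True, 120.0),
    Clone("B", True, 480.0),
    Clone("C", False, 900.0),  # bright but does not survive selection
    Clone("D", True, 310.0),
]
best = pick_top_clone(population)
```

In practice the remaining colonies would be ablated, for instance by LEAP as described above, so that only the chosen clone is grown on.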

If more than one landing pad integrates into a single genome, this may or may not be desirable depending on circumstances. If appropriate, a CHO cell, which includes more than one landing pad can be used to generate CHO cells which comprise only a single landing pad. This may be achieved by determining the nucleic acid sequence immediately adjacent to one of said more than one landing pads which have initially been inserted and then generating a further landing pad which comprises flanking sequences which are homologous with sequence adjacent to one of said more than one landing pad. Such a landing pad, with homologous flanking sequences can be integrated into the genome of a non-modified CHO cell using conventional homologous recombination or homology-directed repair techniques known in the art. This may involve the use of an exogenous nuclease to promote the targeted integration. The exogenous nuclease may be, for example, a zinc finger nuclease (ZFN), a ZFN dimer, a transcription activator-like effector nuclease (TALEN), a TAL effector domain fusion protein, an RNA-guided DNA endonuclease, an engineered meganuclease, or a clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) endonuclease.

As used herein, the term “flanking” refers to a nucleotide sequence that is located at either the 5’ or 3’ end, or both ends, of a landing pad sequence. The flanking nucleotide sequence can be adjacent to or at a defined distance from the landing pad sequence. There is no specific limit of the length of a flanking nucleotide sequence. For example, a flanking sequence can be a few base pairs or a few thousand base pairs.

Once a landing pad has been inserted into a CHO cell genome, the exogenous nucleic acid may be inserted through site-specific integration. “Site-specific integration” refers to integration of a nucleic acid sequence into a chromosome at a specific site, in this case within the inserted landing pad. Site-specific integration is effected by rearrangement of two DNA partner molecules by specific enzymes performing recombination at their cognate pairs of sequences or target sites. Site-specific recombination, in contrast to homologous recombination, requires no DNA homology between partner DNA molecules, is RecA-independent, and does not involve DNA replication at any stage. In some embodiments, site-specific recombination uses a site-specific recombinase system to achieve site-specific integration of an exogenous nucleic acid in the modified CHO cell. A recombinase system typically consists of three elements: two specific DNA sequences (recombination recognition sequence (RRS)) and a specific enzyme (recombinase). The recombinase catalyzes a recombination reaction between the specific recombination sites. A recombinase enzyme, or recombinase, is an enzyme that catalyzes recombination in site-specific recombination. In some embodiments of the disclosure, the recombinase used for site-specific recombination is derived from a non-mammalian system. In some embodiments, the recombinase is derived from bacteria, bacteriophage, or yeast.

Typically, the inserted landing pad and the exogenous nucleic acid each comprise one or more RRSs. In certain embodiments, the inserted landing pad and the exogenous nucleic acid each comprise at least two RRSs. The RRS can be recognized by a recombinase, for example, a Cre recombinase, an FLP recombinase, a Bxb1 integrase, or φC31 integrase. The RRS can be selected from the group consisting of a LoxP sequence, a LoxP L3 sequence, a LoxP 2L sequence, a LoxFas sequence, a Lox511 sequence, a Lox2272 sequence, a Lox2372 sequence, a Lox5171 sequence, a Loxm2 sequence, a Lox71 sequence, a Lox66 sequence - all of which are recognised by Cre; a FRT sequence, which is recognised by FLP; a Bxb1 attP sequence and a Bxb1 attB sequence, which are recognised by Bxb1; and a φC31 attP sequence and a φC31 attB sequence, which are recognised by φC31 integrase. More details are provided in Srirangan K., Loignon M. and Durocher Y. The use of site-specific recombination and cassette exchange technologies for monoclonal antibody production in Chinese Hamster Ovary cells: retrospective analysis and future directions. Critical Reviews in Biotechnology. 2020, 40, 6, 833-851, to which the skilled reader is directed.
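The pairing of recognition sequences with recombinases listed above lends itself to a lookup table. The sketch below encodes that list and reports which enzymes a given set of sites would require; the dictionary and function names are hypothetical, and "phiC31" is used as an ASCII spelling of φC31.

```python
# Recombination recognition sequences (RRS) mapped to the recombinase that
# recognises each, following the list in the text.
RRS_TO_RECOMBINASE = {
    **{name: "Cre" for name in (
        "LoxP", "LoxP L3", "LoxP 2L", "LoxFas", "Lox511", "Lox2272",
        "Lox2372", "Lox5171", "Loxm2", "Lox71", "Lox66",
    )},
    "FRT": "FLP",
    "Bxb1 attP": "Bxb1 integrase",
    "Bxb1 attB": "Bxb1 integrase",
    "phiC31 attP": "phiC31 integrase",
    "phiC31 attB": "phiC31 integrase",
}


def recombinases_needed(sites):
    """Return a sorted list of the recombinases required for `sites`.

    Raises KeyError if a site is not in the table.
    """
    return sorted({RRS_TO_RECOMBINASE[site] for site in sites})
```

This table records only which enzyme recognises which site. Whether two particular sites can recombine with each other (for example, heterospecific lox variants) is outside the scope of the sketch.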

The presently disclosed subject matter also provides methods for the targeted integration of an exogenous nucleic acid into a CHO cell to facilitate the expression of a protein or polypeptide of interest. In certain embodiments, such methods comprise the targeted integration of an exogenous nucleic acid into a host cell via recombinase-mediated recombination or the use of an exogenous nuclease to promote targeted integration; and use of a landing pad, which has been integrated into the CHO cell genome at a specific location as described herein. The exogenous nucleic acid, following integration into the CHO cell genome, may be expressed and is stable over an extended period and many cell generations.

The present disclosure further provides landing pads suitable for use in the methods and generating modified CHO cells as described herein. In one embodiment there is taught a landing pad construct comprising in a 5’ - 3’ direction: a first inverted repeat (IR) transposon sequence; a first promoter sequence; a first recombinase site for a first recombinase; a first recombinase site for a second recombinase; an antibiotic resistance gene sequence; a first insulator site; a first recombinase site for a third recombinase; a reporter gene sequence and promoter on the complementary strand; a second recombinase site for the third recombinase; a second insulator site; a second recombinase site for the second recombinase; a second recombinase site for the first recombinase; and a second IR transposon sequence.

In an embodiment, the first and second IR transposon sequences are Tol2 IR transposon sequences. In an embodiment, the first promoter sequence is a strong promoter, such as an SV40 promoter. In an embodiment, the first recombinase sites are FRT sequences, for FLP recombinase. In an embodiment, the second recombinase sites are BxB1 sites for BxB1 recombinase. In an embodiment, the antibiotic resistance gene is a neomycin resistance gene. In an embodiment, the first and second insulator sites are 2 cHS4 insulator sequences. In an embodiment, the third recombinase sites are φC31 integration sites for φC31 serine integrase. In an embodiment the reporter gene and its promoter is a GFP gene and its promoter.

An exemplary landing pad sequence in accordance with the above is identified in SEQ ID NO: 1.

In a further embodiment there is taught a landing pad construct comprising in a 5’ - 3’ direction: a first homologous sequence which is homologous with a first portion of sequence approximately 190,595 bp upstream of the Lemd2 gene start codon - 2188 bp downstream of the LemD2 stop codon; 11,469 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or 25902 bp upstream of the Mrpl4 start codon - 7002 bp downstream of the Mrpl4 stop codon; a first insulator site; a first promoter sequence, a first recombinase site for a first recombinase; an antibiotic resistance gene sequence; an internal ribosome entry site (Hellen C.U. and Sarnow, P. (2001) Internal ribosome entry sites in eukaryotic mRNA molecules. Genes Dev., 15, 1593-1612); a reporter gene; a second recombinase site for the first recombinase; a second insulator site; and a second homologous sequence which is homologous with a second portion of sequence, 3’ of the first homologous sequence, approximately 190,595 bp upstream of the Lemd2 gene start codon - 2188 bp downstream of the LemD2 stop codon; 11,469 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or 25902 bp upstream of the Mrpl4 start codon - 7002 bp downstream of the Mrpl4 stop codon.

It is to be appreciated that the first portion of homologous sequence is typically 5’ of the second portion of homologous sequence. In an embodiment, the first and second portions of homologous sequence are positioned approximately 52,027 bp upstream of the LemD2 start codon - 1937 bp downstream of the LemD2 stop codon; 11,306 bp upstream of the Cdk2ap2 start codon - 2585 bp downstream of the Cdk2ap2 stop codon; or 24994 bp upstream of the Mrpl4 start codon - 7161 bp downstream of the Mrpl4 stop codon. In one embodiment, the homologous sequence is not found within an exon of the identified genes.

In one embodiment, the homologous sequence is found upstream of LemD2.

In one embodiment, the homologous sequence is found within an intron of Cdk2ap2, such as intron 1 of the Cdk2ap2 gene.

In one embodiment, the homologous sequence is found within an intron of the Mrpl4 gene, such as within intron 4 of the Mrpl4 gene.

The first and second homologous sequences may be approximately 250 bp - 2 kb in length, such as 500 bp - 1.5 kb, or 0.7 - 1 kb in length. In an embodiment, the recombinase sites are BxB1 sites for BxB1 recombinase. In an embodiment, the antibiotic resistance gene is a neomycin resistance gene. In an embodiment, the first and second insulator sites are 2cHS4 insulator sequences. In an embodiment, the reporter gene is a GFP gene.

Modified CHO cells of the present disclosure, which comprise a landing pad and the inserted exogenous nucleic acid, are capable of long-term stable expression of the inserted exogenous nucleic acid. Conveniently, the exogenous nucleic acid encodes a protein and it is the protein which is expressed (and optionally isolated) by the modified CHO cell of the present invention. In some embodiments, the protein is a therapeutic protein. Exemplary therapeutic proteins include antibodies, such as monoclonal antibodies and bi-specific monoclonal antibodies, as well as well-known active fragments thereof and/or fusion proteins comprising antibody fragments. Other therapeutic proteins include blood-clotting factors, anticoagulants, growth factors, hormones, interferons, interleukins, cytokines, bone morphogenetic proteins and enzymes, for example. For the expression of antibodies and antibody fragments, expression of one or both of a light chain and heavy chain sequence may be achieved.

As used herein, the term “expression” refers to transcription and/or translation. In certain embodiments, the level of transcription of a desired product can be determined based on the amount of corresponding mRNA that is present. For example, mRNA transcribed from a sequence of interest can be quantitated by PCR or by Northern hybridization. In certain embodiments, protein encoded by a sequence of interest can be quantitated by various methods, e.g. by ELISA, by assaying for the biological activity of the protein, or by employing assays that are independent of such activity, such as Western blotting or radioimmunoassay, using antibodies that recognize and bind to the protein. By long-term stable expression, we mean that modified CHO cells in accordance with the teaching herein are capable of expressing an exogenous protein and being maintained for at least 25 generations, such as 40, 50, 75 generations or more.

The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. The present disclosure also contemplates other embodiments “comprising,” “consisting of,” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not. Additionally, reference to an embodiment is not to be construed as limiting and one or more embodiments as described herein may be combined with one or more other embodiments, unless context otherwise dictates.

All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. The materials, methods, and examples disclosed herein are illustrative only and not intended to be limiting.

Detailed description of the disclosure

The present disclosure will now be further described by way of example and with reference to the Figures, which show:

Figure 1 shows a schematic representation of the landing pad construct that was used for random integration into the CHO-K1 genome and which enabled the identification of the three genomic locations disclosed herein. The full sequence of this landing pad construct can be found in the supplementary information;

Figure 2 shows a schematic representation of a landing pad cassette for CRISPR/Cas9-targeted HDR-mediated knock-in at Cdk2ap2 or Mrpl4. Abbreviations: HR - homology region; 2cHS4 - two copies of the chicken β-globin insulator; SV40p - simian virus 40 early enhancer and promoter; IRES - internal ribosome entry site; GFP - Green Fluorescent Protein;

Figure 3 shows a graph of GFP fluorescence as measured in triplicate using the Fluostar Omega plate reader (BMG Labtech) and standardised to 1 x 10^6 cells after background subtraction (CHO-K1 without landing pad) prior to each passage. In total, 17 passages were performed representing about 130 generations;

Figure 4 shows a schematic representation of the Herceptin cassette that was used to recombine into the landing pad and to investigate long term protein expression stability in cell line B4. The sequence of this cassette can be found in the supplementary information;

Figure 5 is a Western blot showing expression of Herceptin in CHO culture media, detected by Licor IRDye® 680RD Goat anti-Human IgG (detects both LC + HC); and

Figure 6 is a Western blot showing IgG (Herceptin) detected in cell growth media. Passage number is indicated, as is the molecular weight of the herceptin heavy chain (HC) and light chain (LC). Passage 16 (p16) corresponds to approximately 120 cell divisions.

Figure 7 shows stable expression of GFP at the Cdk2ap2 locus, over 77 generations. CHO-K1 WT control is shown in green while other peaks represent three biological replicates for GFP containing cell lines. Y axis represents the number of cells counted, while the X axis represents the fluorescence observed from each cell line.

Figure 8 shows stable expression of GFP at the Mrpl4 locus, over 77 generations. CHO-K1 WT control is shown in green while other peaks represent three biological replicates for GFP containing cell lines. Y axis represents the number of cells counted, while the X axis represents the fluorescence observed from each cell line.

Figure 9 shows the region surrounding LemD2 identified in NCBI accession number GCF_003668045.3. Scaffold accession number NW_023276806.1. Hashed regions within CDS are exons while the regions in between are introns. Grey regions within mRNA indicate sections with elevated RNA levels in frame with CDS.

Figure 10 shows the region surrounding Cdk2ap2 identified in NCBI accession number GCF_003668045.3. Scaffold accession number NC_048596.1. Hashed regions within CDS are exons while the regions in between are introns. Grey regions within mRNA indicate sections with elevated RNA levels in frame with CDS.

Figure 11 shows the region surrounding Mrpl4 identified in NCBI accession number GCF_003668045.3. Scaffold accession number NC_048597.1. Hashed regions within CDS are exons while the regions in between are introns. Grey regions within mRNA indicate sections with elevated RNA levels in frame with CDS. Black arrow indicates non-coding RNA (ncRNA).

Methods

Random landing pad integration

For identification of suitable ‘open chromatin’ sites in the genome, a construct was created that would serve in the first instance as a reporter for the level of transcription and subsequently as a landing pad suitable for RMCE through its recombination sites (Figure 1 and SEQ ID NO: 1). To shield the landing pad from the potentially adverse effects of encroaching repressive heterochromatin, flanking double cHS4 insulator sites were included. The construct was embedded within inverted repeats to create a Tol2 transposon. Using lipofection, CHO-K1 cells were co-transfected with the landing pad construct and a Tol2 transposase expression plasmid to achieve random transposition into the genome.

Following antibiotic selection with G418 (geneticin) for 14 days, a pool of clones was obtained. Next, 96-well plates were seeded at an average of 5 cells/well and incubated for two days to allow colony formation. The laser enabled analysis and processing (LEAP) system was used to ablate unwanted cell colonies from each well, such that a single clone per well, selected based on the strongest GFP fluorescence, was retained.
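The seeding density above implies that most wells start with more than one cell, which is why ablation down to a single clone per well is needed. The following is a minimal sketch of that reasoning, assuming (this is an assumption, not stated in the disclosure) that the number of cells landing in each well follows a Poisson distribution with a mean of 5.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a well under a Poisson model (assumed)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

MEAN_CELLS_PER_WELL = 5.0  # average seeding density stated in the text

p_empty = poisson_pmf(0, MEAN_CELLS_PER_WELL)
p_single = poisson_pmf(1, MEAN_CELLS_PER_WELL)
p_multiple = 1.0 - p_empty - p_single  # wells needing ablation to reach one clone

print(f"empty wells:    {p_empty:.3f}")
print(f"single cell:    {p_single:.3f}")
print(f"multiple cells: {p_multiple:.3f}")
```

Under this assumed model, the large majority of wells contain several founder cells, so selecting one colony per well by fluorescence is what makes each retained line clonal.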

Site-specific Landing pad construction

Once a number of ‘open’ genomic sites were identified, we explored their suitability as landing pad sites by directed targeting of these genomic regions with a different landing pad construct (see Figure 2). This allowed us to separate the two landing pad sites of the G6 landing pad cell line into individual landing pad clones. Targeting was enabled by use of CRISPR/Cas9 (see below). Targeting constructs containing homology arms to the identified genomic sites were assembled using the modified Extensible Mammalian Modular Assembly (EMMA) platform by Edinburgh Genome Foundry using the process described in Martella et al., 2016. New modules were ordered from Invitrogen GeneArt® (Thermofisher Scientific, UK) and domesticated to be functional parts of the platform. Homology regions (HR) of ~0.7-1 kb length were either amplified by polymerase chain reaction (PCR) from the genomic DNA of an adherent CHO-K1 culture, so as to contain appropriate overhangs for domestication, or designed using the CHOgenome database (http://www.chogenome.org/) and subsequently ordered from Invitrogen GeneArt® (Thermofisher Scientific, UK). The assemblies were analysed by fragment analysis using Fragment Analyzer (Agilent), restriction enzyme digestion, and Sanger sequencing (Edinburgh Genomics, Dundee Sequencing and/or Macrogen).

Targeted integration at Cdk2ap2 and Mrpl4 with CRISPR-Cas9

Single guide RNAs (sgRNAs) were designed using the Benchling platform (https://benchling.com). The selection of sgRNAs was based on the Cricetulus griseus genome targeting scores. The scores were calculated using algorithms developed by Doench, Fusi et al., 2016 (on-target), as well as Hsu et al., 2013 (off-target). Chemically modified Alt-R® CRISPR-Cas9 sgRNAs and Alt-R® S.p. HiFi Cas9 Nuclease V3 (Integrated DNA Technologies, Belgium, Cat. No. 1081061) were used (Table 1).

Table 1 - Single guide RNA sequences.
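As an illustration of the first step of sgRNA design described above, the following minimal sketch enumerates candidate 20-nt protospacers followed by an NGG PAM (the PAM recognised by S. pyogenes Cas9) on one strand of a sequence. It is not the Benchling workflow and does not implement the on-target or off-target scoring algorithms cited above; the function names and the toy sequence are hypothetical.

```python
import re

def find_spcas9_targets(seq: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every NGG PAM site on the given strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

def reverse_complement(seq: str) -> str:
    """Reverse complement, so the opposite strand can be scanned the same way."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Toy sequence for illustration only (not taken from the CHO genome).
toy = "ACGTACGTACGTACGTACGTAGGTT"
for start, protospacer, pam in find_spcas9_targets(toy):
    print(start, protospacer, pam)
```

Candidates found in this way would still need to be ranked by on-target and off-target scores against the Cricetulus griseus genome before selection.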

Integrant copy number determination

In order to determine the landing pad copy number within each cell line, quantitative PCR (qPCR) was performed using genomic DNA isolated from the relevant clonal cell line. We compared the abundance of the neomycin resistance gene (contained within the landing pad) relative to two housekeeping genes, β-actin and B2M, which are known to be present in the CHO-K1 genome.

Integration site identification

A modified inverse PCR method was used to determine integration sites. In short, after stable clone isolation, genomic DNA from the newly established clonal cell lines was isolated and fragmented using the NlaIII restriction enzyme. These fragments were circularised by intramolecular ligation at low concentration and used for inverse PCR amplification with primers located in the known part of the sequences. The resulting amplification products were analysed by Sanger DNA sequencing. Sequences extending from the known landing pad construct sequence into the CHO genome were analysed with BLAST against the CHO-K1 reference genome, thereby identifying the integration location.
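The logic of the digestion step can be illustrated in silico. The sketch below cuts a sequence after each NlaIII recognition site (CATG) and returns the genomic flank that sits next to a known construct end within the same fragment, which is the portion that inverse PCR and sequencing would recover. The sequences and function names are hypothetical toy values; circularisation, amplification and the BLAST search are not modelled.

```python
def digest(seq: str, site: str = "CATG"):
    """Split seq immediately after every occurrence of the recognition site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + len(site)
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, cut)
    if start < len(seq):
        fragments.append(seq[start:])  # trailing piece with no further site
    return fragments

def flank_from_junction(fragments, construct_end: str):
    """Return the sequence adjoining the known construct end in its fragment."""
    for frag in fragments:
        idx = frag.find(construct_end)
        if idx != -1:
            return frag[idx + len(construct_end):]
    return None

# Toy example: hypothetical construct end followed by unknown genomic sequence.
construct_end = "AACCGGTT"
sample = "TTTCATG" + "GG" + construct_end + "ACGTTAGC" + "CATG" + "AAAA"
fragments = digest(sample)
print(fragments)
print(flank_from_junction(fragments, construct_end))
```

The recovered flank extends only as far as the next restriction site, so the enzyme's cutting frequency bounds how much genomic sequence is available for the database search.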

Targeted integration of Herceptin

Herceptin (a human IgG monoclonal antibody) was chosen for proof-of-concept integration into the landing pad by recombinase-mediated cassette exchange (RMCE), replacing the GFP reporter module, to investigate suitability for therapeutic protein expression and secretion. The B4 cell line, which contains a single landing pad, was co-transfected with a Herceptin expression cassette together with an expression construct for φC31 integrase to effect cassette exchange on the φC31 recombination sequences contained in the landing pad. Successful recombinants were selected for by puromycin resistance. A similar approach, in this case using the BxB1 recombinase, was used to recombine a Herceptin expression construct into a landing pad cell line generated by CRISPR-assisted gene targeting into the Cdk2ap2 locus.

Results

Landing pad transfection and cell line isolation

The landing pad/reporter construct which was used to identify these sites was inserted into the CHO-K1 genome randomly following co-transfection of pCBT2+ and pCMV-Tol2. The landing pad region of pCBT2+ was composed of the following in 5’-3’ direction (Figure 1): Tol2 IR transposon sequence; SV40 promoter; Flp recombinase site; BxB1 recombinase site; Neomycin resistance gene; 2cHS4 insulator site; φC31 serine integrase site; GFP and promoter on complementary strand; φC31 serine integrase site; 2cHS4 insulator site; Flp recombinase site; Tol2 IR transposon sequence.

Subsequent to transfection and isolation of individual clones, those which produced the strongest levels of GFP fluorescence were taken forward for further analysis of expression stability. Among these clones were the B4 and G6 cell lines, which exhibited consistently high levels of GFP expression. qPCR indicated that the B4 cell line had a single integration of the landing pad, with an observed relative copy number of 1.27 against β-actin and 1.13 against B2M. The G6 cell line, on the other hand, was found to have two copies of the landing pad integrated, with an observed relative copy number of 2.85 against β-actin and 2.38 against B2M.
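The disclosure does not state how the relative copy numbers were computed from the qPCR data. One common approach, shown here only as a sketch under stated assumptions, is the delta-Ct method, which assumes equal and near-perfect amplification efficiency for the target and the reference gene. The Ct values below are hypothetical and are not data from the disclosure.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         efficiency: float = 2.0) -> float:
    """Delta-Ct estimate of target abundance relative to a reference gene.

    Assumes both amplicons amplify with the same per-cycle efficiency
    (2.0 means perfect doubling). A lower Ct means more starting template.
    """
    return efficiency ** (ct_reference - ct_target)

# Hypothetical Ct values for illustration only.
print(relative_copy_number(ct_target=25.0, ct_reference=25.0))  # equal abundance
print(relative_copy_number(ct_target=24.0, ct_reference=25.0))  # one cycle earlier
```

Comparing the target against two independent reference genes, as was done here with β-actin and B2M, gives a check that the estimate does not depend on a single reference amplicon.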

Subsequently, inverse PCR was utilised to identify the location of these integration sites. In the B4 cell line this location was found to be at 40,471 bp of the CHO-K1 reference genome, NCBI Reference Sequence: NW_003614972.1 (referred to hereafter as Lemd2 - see Figure 9 for genomic location of Lemd2). In the G6 cell line the locations were found to be 188,703 bp of the CHO-K1 reference genome, NCBI Reference Sequence: NW_003614797.1 (referred to hereafter as Cdk2ap2 - see Figure 10 for genomic location of Cdk2ap2) and 1,214,962 bp of the CHO-K1 reference genome, NCBI Reference Sequence: NW_003613752.1 (referred to hereafter as Mrpl4 - see Figure 11 for genomic location of Mrpl4). The genomic coordinates listed above are according to the CHO-K1 genome sequence accessed via NCBI (https://www.ncbi.nlm.nih.gov/assembly/GCF_000223135.1).

Stability studies

Stability studies were carried out to determine the effectiveness of each genomic location for stable protein expression. As the B4 clone had been shown to contain a single landing pad, this cell line was utilised further to explore stability of gene expression at the genomic location upstream of Lemd2. However, since the G6 cell line contained two landing pad copies, additional cell lines were prepared with single landing pad integrations at each genomic location (Cdk2ap2 and Mrpl4) using CRISPR-Cas9. To that end, a simplified, alternative landing pad construct was made, outlined in Figure 2, with specific homology arms to the regions flanking Cdk2ap2 or Mrpl4.

Lemd2

GFP expression was analysed for about 130 generations (Figure 3). During this time, the B4 cell-line was cultured with and without antibiotic (G418) selection in order to find out if it was necessary for stability of reporter expression to keep cells under selective pressure. Measurements were carried out in a 12-well plate using a standard plate reader. The results show stable expression of GFP at Lemd2 for approximately 3 months. Furthermore, no statistical difference was observed between culture in the presence or absence of G418 demonstrating the stability of the genetic location for continued gene expression.

Herceptin was successfully integrated at the landing pad (using the construct shown in Figure 4 and RMCE using φC31 serine integrase) and is highly expressed and secreted into the media in almost all of the subclones isolated from the puromycin resistant pool. To ensure herceptin production remained stable over many cell generations, its expression was followed over time in a similar assay to GFP. Cells expressing Herceptin (i.e. both the heavy and light chains of Herceptin) were grown for approximately 120 cell divisions, over 16 passages. Samples were taken after each passage. The cells were cultured in the presence and absence of puromycin to show that expression of the integrated herceptin genetic construct was also stable during production. The results (Figure 6) confirm both that the genetic construct is stable in the absence of puromycin and that herceptin production is stable throughout the production process. This is further confirmation that Lemd2 is a suitable genomic location for long-term, stable protein expression within CHO-K1.

Cdk2ap2

GFP expression stability was analysed following isolation of the single integrant at Cdk2ap2. The results show that after 77 generations (~2 months) all replicate clones had retained GFP expression, in comparison to the CHO-K1 WT cell line (Figure 7).

Mrpl4

In a similar manner to Cdk2ap2, the Mrpl4 site was analysed as a single integration site to validate its use for stable protein expression. The GFP stability analysis shows that after 77 generations (~2 months) all replicate clones had retained GFP expression, in comparison to the CHO-K1 WT cell line (Figure 8).

Conclusion

The invention being disclosed is the identification of three genomic locations where, upon insertion of a landing pad, stable protein expression was observed over multiple generations (>77 generations, which equates to more than 2 months based on a doubling time of 20 hours) in a CHO-K1 genetic background. The location of these genomic insertions was determined by inverse polymerase chain reaction (PCR) and Sanger sequencing to be: 40,471 bp (Lemd2); 188,703 bp (Cdk2ap2); 1,214,962 bp (Mrpl4). The genomic coordinates listed above are according to the accepted CHO-K1 genome sequence accessed via NCBI (https://www.ncbi.nlm.nih.gov/assembly/GCF_000223135.1). The Lemd2 site was identified in a CHO-K1 cell line derivative, known as B4, while the Cdk2ap2 and Mrpl4 sites were simultaneously found within a different CHO-K1 cell line derivative, known as G6.
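The conversion from generations to culture time quoted above can be checked with simple arithmetic. The sketch below uses the stated 20-hour doubling time; the function name is illustrative only, and the calculation ignores any lag phase after passaging.

```python
HOURS_PER_DAY = 24

def culture_days(generations: float, doubling_time_h: float = 20.0) -> float:
    """Days of continuous culture needed for the given number of doublings."""
    return generations * doubling_time_h / HOURS_PER_DAY

# 77 generations at a 20-hour doubling time
print(round(culture_days(77), 1))  # 64.2 days, i.e. a little over 2 months
```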

Stable expression at all sites was observed for GFP. Furthermore, using recombinase sites flanking the GFP reporter within the landing pad, the antibody Herceptin (IgG) was incorporated into the Lemd2 and Cdk2ap2 genomic locations and found to allow stable protein expression for >120 generations.