Title:
MULTIPLE COPY GENE CONSTRUCTS AND METHODS FOR THE RAPID GENERATION OF RECOMBINANT PROTEIN PRODUCTION CELL LINES
Document Type and Number:
WIPO Patent Application WO/2018/234223
Kind Code:
A1
Abstract:
The present invention is directed to multiple-copy gene constructs comprising multiple gene constructs for use in recombinant protein-producing cells, wherein each gene copy in the multiple gene constructs encodes the identical protein of interest but varies from the other genes in the multiple copy gene construct by at least 0.5 %, preferably at least 5 % in its DNA sequence. The multiple-copy gene constructs may comprise more than one set of multiple gene constructs, wherein each set of gene constructs encodes the identical protein of interest but different sets of gene constructs encode different proteins, e.g. multiunit proteins such as light and heavy chains of an antibody. In further aspects the invention also relates to corresponding vectors, cells, and methods for using the multiple-copy gene construct as well as a kit of parts for use in such methods.

Inventors:
BENENSON YAAKOV (CH)
ALTAMURA RAFFAELE (CH)
DOSHI JITEN (CH)
Application Number:
PCT/EP2018/066078
Publication Date:
December 27, 2018
Filing Date:
June 18, 2018
Assignee:
ETH ZUERICH (CH)
International Classes:
C12N15/67; C12N15/90
Domestic Patent References:
WO2005123915A1, 2005-12-29
Other References:
KUVSHINOV VIKTOR ET AL: "Double recoverable block of function - a molecular control of transgene flow with enhanced reliability", ENVIRONMENTAL BIOSAFETY RESE, ISBR. EDP SCIENCES, XX, vol. 4, no. 2, 1 April 2005 (2005-04-01), pages 103 - 112, XP008169576, ISSN: 1635-7922, [retrieved on 20110314], DOI: 10.1051/EBR:2005015
HASEGAWA K ET AL: "Insulators prevent transcriptional interference between two promoters in a double gene construct for transgenesis", FEBS LETT, ELSEVIER, AMSTERDAM, NL, vol. 520, no. 1-3, 5 June 2002 (2002-06-05), pages 47 - 52, XP004361052, ISSN: 0014-5793, DOI: 10.1016/S0014-5793(02)02761-8
HACKER DAVID L ET AL: "Recombinant protein production from stable mammalian cell lines and pools", CURRENT OPINION IN STRUCTURAL BIOLOGY, ELSEVIER LTD, GB, vol. 38, 17 June 2016 (2016-06-17), pages 129 - 136, XP029707729, ISSN: 0959-440X, DOI: 10.1016/J.SBI.2016.06.005
NATURE BIOTECHNOLOGY, vol. 22, 2004, pages 1393 - 1398
BIOTECHNOLOGY AND BIOENGINEERING, vol. 108, 2011, pages 2434 - 2446
LEE, SCIENTIFIC REPORTS, vol. 5, 2015, pages 8572
BELL LABS TECHNICAL JOURNAL, vol. 29.2, 1950, pages 147 - 160
NUCLEIC ACIDS RESEARCH, vol. 41, 2013, pages 6
METHODS IN ENZYMOLOGY, vol. 350, 2002, pages 248
APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, vol. 72, 2006, pages 211
LAU; SUN, BIOTECHNOL ADV, vol. 27, 2009, pages 1015 - 22
JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 269.47, 1994, pages 29831 - 29837
NATURE BIOTECHNOLOGY, vol. 33, 2015, pages 198 - 203
HRUBY, D. E.: "Vaccinia virus vectors: new strategies for producing recombinant vaccines", CLINICAL MICROBIOLOGY REVIEWS, vol. 3, no. 2, 1990, pages 153 - 170
METHODS IN ENZYMOLOGY, vol. 350, 2002, pages 248
WESTERS ET AL., MOL. CELL. RES., vol. 1694, no. 1-3, 2004, pages 299 - 310
LAU; SUN, BIOTECHNOL ADV, 18 May 2009 (2009-05-18)
NUCLEIC ACIDS RESEARCH, vol. 41, no. 6, 2013
PHARMACEUTICALS, vol. 6, no. 5, 2013, pages 579 - 603
NATURE PROTOCOLS, vol. 5.8, 2010, pages 1379 - 1395
Attorney, Agent or Firm:
KASCHE, André (CH)
Claims:

1. Multiple copy gene construct, comprising

(i) multiple gene constructs, wherein each gene construct comprises

a. a gene of interest encoding a protein of interest,

b. regulatory sequences for the expression of the gene of interest,

c. optionally at least one insulator sequence separating the gene constructs;

(ii) optionally at least one integration site, preferably one or more recognition sites for a site-specific recombinase, and

(iii) optionally at least one marker and/or selection gene indicating positive integration of the multiple copy gene construct in a cell,

wherein each gene of interest in each gene construct in the multiple copy gene construct encodes the identical protein of interest but varies from the other genes of interest in the multiple copy gene construct by at least 0.5 %, preferably at least 5 %, more preferably at least 10 %, most preferably at least 15 % of its nucleic acid sequence.

2. Multiple copy gene construct according to claim 1, wherein in addition to the gene of interest at least one further element of each gene construct in the multiple gene constructs, preferably regulatory elements and/or insulator sequences, varies from the other further element in the multiple gene construct by at least 0.5 %, preferably at least 5 %, more preferably at least 10 %, most preferably at least 15 % of its nucleic acid sequence.

3. Multiple copy gene construct according to claim 1 or 2, comprising more than one set of multiple gene constructs, wherein one set of multiple gene constructs encodes one identical protein of interest, and wherein further set(s) of multiple gene constructs encode one or more different proteins of interest.

4. Multiple copy gene construct according to any of claims 1 to 3, wherein each mRNA encoded by each gene of interest has a folding energy for the first 100 RNA nucleotides that is at least 10 % higher, preferably at least 15 % higher, more preferably at least 30 % higher, most preferably at least 45 % higher than the folding energy of the first 100 wild type RNA nucleotides encoded by the wild type gene of interest.

5. Multiple copy gene construct according to any of claims 1 to 4, wherein the multiple gene construct does not comprise any stop codons and/or protein or RNA binding sites interfering with the expression of the protein of interest in a cell.

6. Multiple copy gene construct according to any of claims 1 to 5, wherein the number of gene constructs comprising the gene of interest is at least 2, preferably at least 5, more preferably at least 10.

7. Vector comprising a multiple copy gene construct according to any of claims 1 to 6, preferably a vector selected from the group consisting of a plasmid DNA vector, bacterial artificial chromosome (BAC) vector, yeast artificial chromosome (YAC) vector, viral vector, episomal vector, baculovirus vector, lentivirus vector, adenovirus vector, vaccinia vector, retroviral vector, yeast and bacterial episomal vector.

8. Cell comprising a multiple copy gene construct according to any of claims 1 to 6 and/or a vector according to claim 7, preferably a cell selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae, Pichia pastoris cells, bacterial E. coli cells, B. subtilis cells, plant cells, preferably Nicotiana tabacum or Physcomitrella patens cells, insect cells, preferably Sf9 insect cells, mammalian cells, preferably an epithelial cell, an embryonic kidney cell, a fibroblast cell, a Chinese hamster ovary (CHO) cell, an HEK-293 cell, an HEK293T cell, a BHK cell, an NIH-3T3 cell, and COS mammalian cells.

9. A method for producing a recombinant, preferably mammalian cell for recombinant protein production, comprising the steps of

(i) producing a multiple copy gene construct according to any of claims 1 to 6,

(ii) introducing the multiple copy gene construct of step (i) into the genome of a host cell, preferably by targeted integration of the multiple copy gene construct at a predetermined locus or loci,

(iii) optionally selecting cells that express the protein of interest.

10. Recombinant, preferably mammalian cell comprising a multiple copy gene construct according to any of claims 1 to 6, and produced by a method according to claim 9.

11. A method for producing a recombinant protein of interest, comprising the step of culturing a recombinant cell according to claim 10.

12. Kit of parts comprising

(i) a storage medium comprising a computer program for generating nucleic acid sequences encoding the identical protein of interest, wherein each of the generated nucleic acid sequences varies from the other generated nucleic acid sequences by at least 0.5 %, preferably at least 5 %, more preferably at least 10 %, most preferably at least 15 % of its nucleic acid sequence identity,

(ii) means for producing a multiple copy gene construct according to any of claims 1 to 6 comprising at least two, preferably at least 5, more preferably at least 10 nucleic acid molecules having nucleic acid sequences generated by the computer program of step (i),

(iii) optionally a vector for receiving the multiple copy gene construct of step (ii), preferably comprising means for the targeted integration of the multiple copy gene construct into a host cell of interest,

(iv) optionally means for the integration, preferably targeted integration of the multiple copy gene construct produced in step (ii) into a host cell of interest,

(v) optionally means for identifying a marker indicative of the integration of the multiple copy gene construct into the host cell of interest.

13. Kit of parts according to claim 12, further comprising sequence-varied regulatory elements and/or insulator sequences for use in the production of gene constructs of the multiple gene construct(s), preferably sequence-varied regulatory elements selected from a promoter, 3'-UTR, 5'-UTR, and polyA sequence, for producing the multiple copy gene construct, wherein the sequence of the sequence-varied regulatory elements and/or sequence-varied insulators varies without loss of function and expression of the protein of interest by at least 0.5 %, preferably at least 5 %, more preferably at least 10 %, most preferably at least 15 % of its nucleic acid sequence identity.

14. Kit of parts according to claim 12 or 13, wherein each mRNA encoded by each gene of interest for producing the multiple copy gene construct in step (ii) has a folding energy for the first 100 RNA nucleotides that is at least 10 % higher, preferably at least 15 % higher, more preferably at least 30 % higher, most preferably at least 45 % higher than the folding energy of the first 100 wild type RNA nucleotides encoded by the wild type gene of interest.

Description:
Multiple copy gene constructs and methods for the rapid generation of recombinant protein production cell lines

The present invention is directed to multiple-copy gene constructs comprising multiple gene constructs for use in recombinant protein-producing cells, wherein each gene copy in the multiple gene constructs encodes the identical protein of interest but varies from the other genes in the multiple copy gene construct by at least 0.5%, preferably at least 5 % in its DNA sequence. The multiple-copy gene constructs may comprise more than one set of multiple gene constructs, wherein each set of gene constructs encodes the identical protein of interest but different sets of gene constructs encode different proteins, e.g. multiunit proteins such as light and heavy chains of an antibody. In further aspects the invention also relates to corresponding vectors, cells, and methods for using the multiple-copy gene construct as well as a kit of parts for use in such methods.

Background

The manufacturing of recombinant proteins is a growing sector in the pharmaceutical industry, with sales of over 140 billion dollars in the period 2010-2013 alone. Continuous efforts are being made to streamline the whole process with the aim of making drug development more efficient and cost-effective.

The current workflow for cell line development for recombinant protein production typically requires (1) construction of an expression vector containing the gene of interest, its regulatory sequences and a selection marker with its own regulatory sequences; (2) transfection of the expression vector into the host cell line and subsequent random integration into the genome of a fraction of cells; the host cells normally require a selection marker for survival, thus facilitating selection; (3) application of a selection gradient which inhibits the activity of the selection marker, e.g. increasing concentrations of drugs such as MTX or MSX, resulting in co-amplification of the gene of interest and the selection marker; (4) generation of single-cell derived cell lines from survivor cells and screening these for the expression of the protein of interest; (5) selection of promising single-cell derived (monoclonal) lines with high productivity and appropriate growth characteristics, expansion of these and further evaluation (see Fig. 1 below for a schematic diagram of these steps).

However, the current methodology for engineering mammalian cell lines for recombinant protein production often suffers from clonal instability, clonal heterogeneity and transgene silencing. These issues are intrinsically related to the current methods employed for generating production cell lines, which are reliant on random integration of the transgene of interest and on the use of drugs for transgene amplification, and result in lengthy and labor-intensive procedures for screening and ultimately selecting clonal lines for production.

For example, when host cells, e.g. mammalian cells such as CHO cells, are transfected with a recombinant gene of interest together with its regulatory sequences and a selectable marker, this leads to the random integration of the plasmid(s) containing these genetic elements into the genome of a small fraction of host cells. These cells are then selected for via the selectable marker, e.g. the dihydrofolate reductase (DHFR) gene, one of the most common selectable markers, which encodes an enzyme required for purine metabolism and can be used for selection on a dhfr-/- genetic background. Such a selection strategy is routinely coupled to a drug-based gene-amplification scheme to improve recombinant protein expression. Methotrexate (MTX), for instance, is a drug that inhibits DHFR activity. Exposing cells to increasing concentrations of MTX for a period of 2-3 weeks normally results in a small pool of survivor cells containing several copies of the gene of interest. As a consequence of cell-to-cell variability in the site(s) of integration and the effect of amplification drugs on chromosomal structure and integrity, variation in recombinant protein expression levels among different cell clones may exceed two orders of magnitude (Nature Biotechnology 22, 1393-1398 (2004)). Clonal instability arises principally because of a reduction in the copy number of the gene of interest and silencing via methylation of the promoter controlling the expression of the gene of interest (Biotechnology and Bioengineering, 108: 2434-2446 (2011)). Moreover, clones may appear phenotypically stable under the effect of drugs such as MTX, but then exhibit a sharp decline in productivity when the selective pressure is relieved. Therefore, the identification of high-producer clones becomes a laborious and time-consuming process, which normally requires the screening of hundreds or thousands of candidate clonal lines.

Hence, there is a need for means and methods that improve clonal stability, reduce or even bypass altogether the need for screening individual clones and do not require drug-based gene amplification cycles.

One approach uses insulating elements with the aim of preventing transgene silencing due to positional effects (i.e. integration of the transgene into sites of low transcriptional activity). A number of genetic elements have been discovered that have the potential to stabilize gene expression. Among these are stabilizing and anti-repressor elements (STARs), expression augmenting sequence elements (EASEs) and matrix attachment regions (MARs). Despite the appeal of insulating transgenes regardless of their site of integration, reports on the use of such expression-enhancing elements have shown mixed results (Lonza - Cell Line Development and Engineering Workshop, Prague, 2008). Another approach is the targeted integration of transgenes to prevent clonal heterogeneity. Variability in yield and quality attributes of the recombinant protein of interest among different clonal cell lines has long been attributed to positional effects arising from random integration. Recent efforts have moved in the direction of establishing a platform for site-specific (targeted) integration into producer lines. Lee et al. made use of CRISPR/Cas9 to integrate a fluorescent marker into four loci of CHO cells and reported a significant increase in the homogeneity of expression among targeted clones as compared with random integrants (Lee et al., Scientific Reports, 5: 8572 (2015)). Nevertheless, a few random integrants showed significantly higher expression than the targeted population, possibly due to the integration of multiple copies of the transgene.

In a further option for speeding up clone isolation, the limiting dilution (necessary in order to obtain clonal lines) has been replaced with clone isolation in semi-solid medium (ClonaCell™ Medium, StemCell Technologies). Concurrently, individual colonies growing in semi-solid medium can be evaluated with respect to protein productivity using fluorescently tagged antibodies specific to the protein of interest, which accumulates around the colonies. ClonePix FL (ThermoFisher) is an instrument developed to automate the imaging, selection and picking workflow, thereby shortening evaluation and isolation time. While the ClonePix technology does shorten the time required to select high-producers, the clones chosen may not exhibit the stability required to enter the production process. While multiple rounds of cloning and isolation from semi-solid medium may aid in selecting stable clones, this may considerably lengthen the process.

It is the objective of the present invention to provide means and methods for producing cell clones with stably integrated transgenes for the production of recombinant proteins of interest in an efficient and time-saving manner. It is a further objective to provide means and methods for bypassing at least partially, preferably altogether, the need for screening individual clones.

In a first aspect, the present invention is directed to a multiple copy gene construct, comprising

(i) multiple gene constructs, wherein each gene construct comprises

a. a gene of interest encoding a protein of interest,

b. regulatory sequences for the expression of the gene of interest,

c. optionally at least one insulator sequence separating the gene constructs;

(ii) optionally at least one integration site, preferably one or more recognition sites for a site-specific recombinase, and

(iii) optionally at least one marker and/or selection gene indicating positive integration of the multiple-copy gene construct in a cell,

wherein each gene of interest in each gene construct in the multiple copy gene construct encodes the identical protein of interest but varies from the other genes of interest in the multiple copy gene construct by at least 0.5 %, preferably at least 5 %, more preferably at least 10 %, most preferably at least 15 % of its nucleic acid sequence.

Of course, the multiple copy gene construct of the invention can also comprise more than one set of multiple gene constructs, for example, for producing different proteins at the same time, for producing multiunit proteins, for example, for producing light and heavy chains of an antibody, or functional fragments and derivatives of antibodies in the same multiple copy gene construct. Therefore, in a preferred embodiment, the present invention also relates to a multiple copy gene construct, comprising more than one set of multiple gene constructs, wherein one set of multiple gene constructs encodes one identical protein of interest, and wherein further set(s) of multiple gene constructs encode one or more different proteins of interest each.

The advantage of the multiple copy gene construct according to the present invention is that - although all gene constructs relate to the same gene, and hence express the identical protein - every gene of interest in all gene constructs is "unique". Such uniqueness of each gene of interest is advantageous because homologous elements have a tendency to trigger unwanted recombination between each other.

A multiple copy gene construct according to the present invention is any polynucleic acid comprising more than one, preferably at least two gene constructs, each of which comprises all the information necessary for expressing the gene of interest in said gene construct, including required regulatory sequences for the expression of the gene of interest, e.g. a promoter, start codon, stop codon, UTR(s), poly-adenylation encoding sequences to terminate the RNA message, etc. The choice and assembly of regulatory sequences depends on the vector, plasmid, host cell and/or personal preference of the skilled person.

The assembly of the unique, i.e. varied gene constructs into a multi-copy construct according to the invention can be achieved by conventional recombinant technology, e.g. in a single step using homology-based recombination in yeast, giving rise to a multiple copy gene construct for use in, e.g., bio-manufacturing. Optionally, a number of extra elements can be included in the multiple copy gene construct in order to facilitate integration into a desired target cell line (for example, recombination sites, see below), as well as a marker and/or selection gene indicating positive integration of the multiple copy gene construct, e.g. a fluorescent marker or a selection gene (see below) that is helpful for selecting those cells where the construct has been successfully integrated.

A schematic representation of a part of an exemplary multiple copy gene construct comprising multiple gene constructs, each comprising the varied gene of interest, corresponding regulatory sequences, e.g. promoter, each gene construct separated from each other by insulator and adapter sequences, is provided in Figure 3.

The gene of interest and the corresponding protein of interest for expression by the multiple gene constructs in the multiple copy gene construct of the present invention can be any protein for recombinant production and it is not limited to any specific origin, nature or function of a protein.

Once a protein of interest is decided upon, a library of coding sequences, i.e. genes varied in nucleic acid identity but encoding the identical protein, can be obtained by conventional means, e.g. by introducing variation at 'wobble bases' in the genetic sequence. For example, one can make use of Python scripts in order to ensure that all the coding nucleic acid sequences in the library are sufficiently unique, i.e. each nucleic acid sequence differs from any other sequence in the library by a minimum threshold value for nucleic acid sequence identity (e.g. at least 0.5, 5, 10, 15 or 20 % difference in sequence identity) (see Fig. 2 for an illustration of the varied nucleic acid sequence for encoding the identical amino acid sequence).
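The scripts themselves are not reproduced in this document. The following is a minimal sketch of how such a library could be generated, assuming simple rejection sampling over synonymous codons of the standard genetic code; the function names (build_library, percent_difference, etc.), the example peptide and the 15 % threshold used in the usage lines are illustrative assumptions, not the inventors' implementation.

```python
import random
from itertools import combinations

# Standard genetic code: amino acid (one-letter) -> synonymous DNA codons.
SYNONYMOUS_CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"], "D": ["GAT", "GAC"],
    "C": ["TGT", "TGC"], "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"], "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"], "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"], "M": ["ATG"], "F": ["TTT", "TTC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"], "Y": ["TAT", "TAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "*": ["TAA", "TAG", "TGA"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS_CODONS.items() for c in codons}


def translate(dna):
    """Translate a coding DNA sequence (length divisible by 3) into protein."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_difference(a, b):
    """Percentage of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def random_coding_sequence(protein, rng):
    """Pick one synonymous codon at random for every amino acid."""
    return "".join(rng.choice(SYNONYMOUS_CODONS[aa]) for aa in protein)


def build_library(protein, size, min_percent, seed=0, max_tries=10000):
    """Collect `size` coding sequences that pairwise differ by >= min_percent."""
    rng = random.Random(seed)
    library = []
    for _ in range(max_tries):
        candidate = random_coding_sequence(protein, rng)
        # Accept the candidate only if it is far enough from every member.
        if all(percent_difference(candidate, s) >= min_percent for s in library):
            library.append(candidate)
            if len(library) == size:
                return library
    raise RuntimeError("could not reach the requested library size")


# Usage with a hypothetical 22-residue peptide.
peptide = "MKTAYIAKQRQISFVKSHFSRQ"
library = build_library(peptide, size=5, min_percent=15.0, seed=1)
for a, b in combinations(library, 2):
    assert percent_difference(a, b) >= 15.0
assert all(translate(s) == peptide for s in library)
```

Rejection sampling is chosen here only for brevity. For long genes or high thresholds a directed search, or additional filters such as codon usage of the host cell, would likely be needed.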

In a preferred embodiment, the multiple copy gene construct of the present invention is one, wherein in addition to the gene of interest at least one further element of each gene construct in the multiple gene constructs, preferably regulatory elements and/or insulator sequences, varies from the other further element in the multiple gene construct by at least 0.5 %, preferably at least 5 %, more preferably at least 10 %, most preferably at least 15 % of its nucleic acid sequence. For example, in addition to the gene of interest encoding the protein of interest in each gene construct, further elements of the gene construct, e.g. selected from the group consisting of promoter, insulator, 5'-UTR, 3'-UTR, and polyA, can be varied to the extent that their function is not impaired and the gene of interest is still expressed.

The percentage of variation of the unique genes of interest and optionally also the further elements of the gene construct in each multiple gene construct of the multiple copy gene construct of the present invention is preferably calculated according to the pairwise comparison between the DNA sequences of the genes of interest and optionally the further elements of the gene constructs, respectively.

In general, any method for calculating sequence divergence as understood by those skilled in the art can be used for determining the percentage of variation between the unique genes of interest in each gene construct. The same method can be applied for determining sequence variation between unique further elements in each gene construct of the multiple gene construct.

For example, for two nucleic acid sequences of the same length, the variation can be calculated using the "Hamming distance" metric between the sequences, which is the number of positions at which the corresponding DNA bases are different between the two sequences. The Hamming distance is known in the art as a metric used in information theory to determine how many substitutions are required to change one string into another. In other words, given two strings of equal length n, the Hamming distance measures the number of positions at which corresponding symbols are different. As an example, the Hamming distance between the strings GAAT and GACG is equal to 2. The Hamming distance can be divided by the length of the strings n - in this example n = 4 - to give the percentage dissimilarity between the strings being compared (for this example: 2/4 = 0.5 or 50 % dissimilarity). This method is preferably used herein to determine the dissimilarity between gene coding sequences. The Hamming distance was introduced by R.W. Hamming in the context of coding theory (Bell Labs Technical Journal 29.2 (1950): 147-160) and has been subsequently widely used to compare DNA sequences. The metric is particularly suited to the comparison of DNA sequences encoding the same protein because such sequences generally have equal length.
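The calculation described above can be sketched in a few lines of Python. The function names are illustrative assumptions; the example reproduces the GAAT/GACG case from the text.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def percent_dissimilarity(a, b):
    """Hamming distance divided by the sequence length n, in percent."""
    if not a:
        raise ValueError("sequences must not be empty")
    return 100.0 * hamming_distance(a, b) / len(a)


print(hamming_distance("GAAT", "GACG"))       # 2
print(percent_dissimilarity("GAAT", "GACG"))  # 50.0
```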

For sequences of different lengths, other commonly known distance metrics can be used such as Damerau-Levenshtein distance (see for example, D. Gusfield, Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1997.). Damerau-Levenshtein distance is the minimum number of operations (consisting of insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one sequence into the other. For further common and suitable methods for determining percentage sequence variation, reference is made to the document "Distance Measures for Sequences" of Sandeep Hosangadi, which document is accessible via link https://arxiv.org/ftp/arxiv/papers/1208/1208.5713.pdf, which document is hereby included in toto by reference.
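As an illustration of an edit distance for sequences of unequal length, the sketch below implements the restricted variant of the Damerau-Levenshtein distance (often called optimal string alignment), in which no substring is edited more than once. It can give a larger value than the unrestricted distance for some inputs. Normalising by the length of the longer sequence to obtain a percentage is an assumption made here for illustration and is not prescribed by the text.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def percent_variation(a, b):
    """Edit distance relative to the longer sequence, in percent (assumed convention)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 100.0 * osa_distance(a, b) / longest


print(osa_distance("ACGT", "AGCT"))  # 1 (one transposition)
print(osa_distance("ACGT", "ACG"))   # 1 (one deletion)
```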

Furthermore, it was found that the absence of strong RNA secondary structure(s) around the ATG start codon of a gene of interest can enhance RNA translation and hence protein expression. In one embodiment for practicing the present invention, the multiple copy gene construct is one, wherein each mRNA encoded by each gene of interest has a folding energy for the first 100 RNA nucleotides that is at least 10 % higher, preferably at least 15 % higher, more preferably at least 30 % higher, most preferably at least 45 % higher than the folding energy of the first 100 wild type RNA nucleotides encoded by the wild type gene of interest. In another embodiment the multiple copy gene construct according to the present invention is one, wherein the folding energy of the first 100 nucleotides of each mRNA encoded by each gene of interest is higher than -2000 kcal/mol, preferably higher than -1700 kcal/mol. The folding energy for RNA sequences can be determined and calculated according to conventional methods, preferably by the algorithm cited in Nucleic Acids Research, 2013, Vol. 41, No. 6.
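Because folding energies are typically negative, the percentage criterion can be read in more than one way. The following minimal sketch assumes one possible reading: "higher" means less negative, and the percentage is taken relative to the magnitude of the wild type energy. The function name and the example energies are hypothetical. The energies themselves would have to be obtained from an RNA folding tool and are not computed here.

```python
def percent_higher(variant_energy, wild_type_energy):
    """Relative increase of the variant folding energy over the wild type, in percent.

    Assumption: the increase is normalised by the absolute wild type energy,
    so a less negative (weaker) structure yields a positive value.
    """
    if wild_type_energy == 0:
        raise ValueError("wild type folding energy must be non-zero")
    return 100.0 * (variant_energy - wild_type_energy) / abs(wild_type_energy)


# Hypothetical folding energies (kcal/mol) for the first 100 nucleotides.
wild_type = -40.0
variant = -30.0
increase = percent_higher(variant, wild_type)
print(increase)          # 25.0
print(increase >= 15.0)  # True: meets the "at least 15 % higher" level
print(increase >= 30.0)  # False: does not meet the "at least 30 % higher" level
```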

As mentioned above, the choice of regulatory sequences for the gene constructs in the multiple copy gene construct of the present invention will vary with the skilled person's preference, the plasmid, vector, host cell, selection method, etc. Exemplary promoters for use in the present invention are selected from the group consisting of the Pichia pastoris GAP promoter, AUG1 promoter, FLD1 promoter and AOX1 promoter (see for example Pichia Expression Kit Instruction Manual, Invitrogen Corporation, Carlsbad, Calif.), the Saccharomyces cerevisiae GAL1, ADH1, GAP, ADH2, MET25, GPD, CUP1 or TEF promoter (see for example Methods in Enzymology, 350, 248, 2002), the Baculovirus polyhedrin, p10 or ie1 promoter (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif., and Novagen Insect Cell Expression Manual, Merck Chemicals Ltd., Nottingham, UK), the Lentivirus CMV, UbC, EF1a, or MSCV promoter (see for example System Biosciences, Mountain View, CA, USA), the Adenovirus CMV promoter (see for example ViraPower Adenoviral Expression System, Life Technologies, Carlsbad, CA, USA), the Simian virus 40 promoter SV40, the E. coli U, araBAD, rhaPBAD, tetA, lac, trc, tac or pL promoter (see Applied Microbiology and Biotechnology, 72, 211, 2006), the B. subtilis vegI, vegII, σA, Pgrac, Pglv, manP or P43 promoter (see Applied Microbiology and Biotechnology, 72, 211, 2006), the plant CaMV35S, ocs, nos, Adh-1, Tet promoters (see e.g. Lau and Sun, Biotechnol Adv. 2009, 27, 1015-22) or inducible promoters for mammalian cells as described in Sambrook and Russell (2001).

Most preferably, the promoters for use in the present invention are based on the human EF1a core promoter architecture (see Journal of Biological Chemistry 269.47 (1994): 29831-29837). For example, a preferred library of promoters based on the human EF1a core promoter architecture can be designed as follows: core promoter sequences such as a TATA box, initiator element and EFP binding sites are preserved, while surrounding regions consist of a number of transcription factor binding sites and intervening spacers, and are unique in each promoter.

Polyadenylation signals can be any that are suitable for the plasmid, vector, and/or host cell. In a preferred embodiment poly-adenylation signals can be built around three essential regions of the Rabbit beta-globin poly-A sequence. Insulators are nucleic acid sequences that insulate the gene construct including the gene of interest and the corresponding regulatory sequences for protein expression from the surrounding chromatin. Preferably, insulators for gene constructs for use in the present invention are located upstream of the promoter. Preferred insulators are CTCF-based insulators, for example, as described in Nature Biotechnology 33, 198-203 (2015).

The multiple copy gene construct of the present invention may optionally comprise one or more integration sites, preferably for a site-specific recombinase including tyrosine recombinases, tyrosine integrases, serine resolvases/invertases and serine integrases. Preferred recognition sites / site-specific recombinases for use in the present invention are selected from the group consisting of Cre, Dre, Flp, KD, B2, B3, lambda (λ), HK002, HP1, lambda-delta (λδ), ParA, Tn3, Gin, φC31, Bxb1, R4, φBT1, Wb, TP901, TG1, SPBC, R4, RV, MRU, φ370, A118, φC1 and φK38.

Any suitable marker and/or selection gene(s) may be used for constructing the multiple copy gene construct of the present invention. Preferably, the at least one marker and/or selection gene for use in the present invention is selected from the group consisting of mRNA, non-coding RNA, microRNA and (poly)peptides, preferably fluorescent proteins, more preferably mCitrine, green fluorescent protein (GFP), mCherry and DsRed, cell surface proteins, toxic proteins, antibiotic resistance proteins, apoptotic proteins, transcriptional regulators, immune-modulators and site-specific recombinases. Preferably, the marker and/or selection gene allows for easy detection, e.g. by optical methods such as microscopy, UV-VIS-light detection, binding assays, morphological changes of the biosensor cells, cell motility, etc.

When constructing the multiple copy gene construct of the invention it is also preferred that the resulting multiple copy gene construct does not comprise any stop codons and/or protein or RNA binding sites interfering with the expression of the protein of interest in a cell.
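One part of this preference, the absence of stop codons that would truncate the protein of interest, can be checked computationally. The following is a minimal sketch, assuming the coding sequence is supplied as a DNA string in frame and using the standard stop codons; the function name `premature_stop_positions` is hypothetical and not part of the described invention. It does not address protein or RNA binding sites.

```python
# Assumption: standard genetic code stop codons, DNA alphabet, sequence in frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_positions(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stop codons before the last codon."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # The final codon is allowed to be a stop codon, so it is excluded.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
```

An empty list indicates that no in-frame stop codon occurs before the terminal codon of the sequence.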

With regard to the number of gene constructs comprising the varied gene of interest in the multiple copy gene construct of the present invention, it is preferred that the number of gene constructs is at least 2, preferably at least 5, more preferably at least 10.

In a further aspect, the present invention relates to a vector comprising a multiple copy gene construct according to the invention, preferably a vector selected from the group consisting of a plasmid DNA vector, bacterial artificial chromosome (BAC) vector, yeast artificial chromosome (YAC) vector, viral or episomal vector, baculovirus vector, lentivirus vector, adenovirus vector, vaccinia vector, retroviral vector, yeast and bacterial episomal vector. The selection of a suitable vector and expression control sequences as well as vector construction are within the ordinary skill in the art. Preferably, the viral vector is a lentivirus vector (see for example System Biosciences, Mountain View, CA, USA), adenovirus vector (see for example ViraPower Adenoviral Expression System, Life Technologies, Carlsbad, CA, USA), baculovirus vector such as bacmid (or see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.), the pcDNA, pVITRO, pSV and pCMV series of plasmid vectors, vaccinia and retroviral vectors (see for example Hruby, D. E. (1990). Vaccinia virus vectors: new strategies for producing recombinant vaccines. Clinical Microbiology Reviews, 3(2), 153-170.), bacterial vector pGEX and pET (or see for example Novagen, Darmstadt, Germany) or yeast vector pPIC (or see for example ATCC Manassas, Virginia). Vector construction, including the operable linkage of a coding sequence with a promoter and other expression control sequences, is within the ordinary skill in the art.

The multiple copy gene construct can operate in any suitable cell, preferably a eukaryotic cell, more preferably in a vertebrate, a mammalian, an insect, a worm, a yeast or a prokaryotic cell.

In another aspect the present invention relates to a cell comprising a multiple copy gene construct according to the invention and/or a vector according to the invention.

Preferred host cells for producing the polypeptide of the invention are selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae (see for example Methods in Enzymology, 350, 248, 2002), Pichia pastoris cells (see for example Pichia Expression Kit Instruction Manual, Invitrogen Corporation, Carlsbad, Calif.), bacterial cells, preferably E. coli cells (BL21(DE3), K-12 and derivatives) (see for example Applied Microbiology and Biotechnology, 72, 211, 2006) or B. subtilis cells (1012 wild type, 168 Marburg or WB800N) (see for example Westers et al., (2004) Mol. Cell. Res. Volume 1694, Issues 1-3 P: 299-310), plant cells, preferably Nicotiana tabacum or Physcomitrella patens (see e.g. Lau and Sun, Biotechnol Adv. 2009 May 18. [electronic publication ahead of print]), NIH-3T3 mammalian cells (see for example Sambrook and Russell, 2001), Human Embryonic Kidney 293 cells (HEK 293, adherent or in suspension, also large T antigen transformed HEK 293T cells), Chinese hamster ovary (CHO) cells, COS cells, and insect cells, preferably sf9 insect cells (see for example Bac-to-Bac Expression Kit Handbook, Invitrogen Corporation, Carlsbad, Calif.).

Most preferably, the cell for practicing the present invention is selected from the group consisting of yeast cells, preferably Saccharomyces cerevisiae, Pichia pastoris cells, bacterial E. coli cells, B. subtilis cells, plant cells, preferably Nicotiana tabacum or Physcomitrella patens cells, insect cells, preferably sf9 insect cells, mammalian cells, preferably an epithelial cell, an embryonic kidney cell, a fibroblast cell, a Chinese hamster ovary (CHO) cell, an HEK-293 cell, an HEK293T cell, a BHK cell, an NIH-3T3 cell, and COS mammalian cells.

In another aspect, the present invention is directed to a method for producing a recombinant, preferably mammalian cell for recombinant protein production, comprising the steps of

(i) producing a multiple gene construct according to the present invention,

(ii) introducing the multiple gene construct of step (i) into the genome of a host cell, preferably by targeted integration of the multiple gene construct at a predetermined locus or loci,

(iii) optionally selecting cells that express the protein of interest.

In the above method of the present invention, the definitions and preferred embodiments taught above for the multiple gene construct of the present invention apply mutatis mutandis.

An exemplary and non-limiting workflow for a method according to the present invention is described below.

1) Construction of an integration vector, designed to contain multiple but varied copies of the gene of interest (or genes of interest, if there is more than one set of multiple gene constructs in the multiple copy gene construct) encoding the identical polyamino acid sequence together with their respective regulatory sequences. An individual gene copy with its regulatory sequence is defined as a gene construct or gene cassette. The integration vector used harbors multiple gene cassettes, i.e. multiple gene constructs, but all such cassettes are "genetically unique" (see below), despite expressing the identical protein;

2) Targeted integration (i.e. at a known locus/loci) of the multiple copy gene construct into the genome of a host cell line;

3) The multiple copy gene construct includes a fluorescent marker, which allows the identification and selection of cells in which integration at the target site has occurred. The cell population selected in this way shows high protein productivity, and it can be expanded and further evaluated for large-scale production.

In a further aspect, the present invention is directed to a recombinant, preferably mammalian cell comprising a multiple copy gene construct according to the present invention, and produced by any method as defined above.

In a further preferred embodiment, the present invention is directed to a method for producing a recombinant protein of interest (or proteins of interest if more than one set of multiple gene constructs is present in the multiple copy gene construct, e.g. multi subunit proteins, e.g. heavy and light chain of antibodies), comprising the step of culturing a recombinant cell according to the present invention.

In another aspect, the present invention is directed to a kit of parts comprising (i) a storage medium comprising a computer program for generating nucleic acid sequences encoding the identical protein of interest (or proteins of interest if more than one set of multiple gene constructs is present in the multiple copy gene construct, e.g. multi subunit proteins, e.g. heavy and light chain of antibodies), wherein each of the generated nucleic acid sequences varies from the other generated nucleic acid sequences by at least 0.5 %, preferably at least 5 %, more preferably at least 10 %, most preferably at least 15 % of its nucleic acid sequence identity,

(ii) means for producing a multiple copy gene construct according to the present invention comprising at least two, preferably at least 5, more preferably at least 10 nucleic acid molecules having nucleic acid sequences generated by the computer program of step (i),

(iii) optionally a vector for receiving the multiple gene construct of step (ii), preferably comprising means for the targeted integration of the multiple gene construct into a host cell of interest,

(iv) optionally means for the integration, preferably targeted integration of the multiple gene construct produced in step (ii) into a host cell of interest,

(v) optionally means for identifying a marker indicative of the integration of the multiple gene construct into the host cell of interest.
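The kind of computer program referred to in item (i) can be illustrated with a minimal sketch. It assumes the standard genetic code, picks synonymous codons uniformly at random (ignoring host codon usage, which a practical tool would likely take into account), and measures variation as the percentage of differing nucleotide positions between equal-length sequences. All function names and the example sequence are hypothetical and are not taken from the application.

```python
import random
from itertools import combinations, product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
AA_TO_CODONS: dict[str, list[str]] = {}
for _codon, _aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(_aa, []).append(_codon)


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def percent_difference(a: str, b: str) -> float:
    """Percentage of nucleotide positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def synonymous_variant(cds: str, rng: random.Random) -> str:
    """Replace every codon by a randomly chosen codon for the same amino acid."""
    return "".join(
        rng.choice(AA_TO_CODONS[CODON_TO_AA[cds[i:i + 3]]])
        for i in range(0, len(cds), 3)
    )


def generate_variants(cds: str, n: int, min_diff_pct: float,
                      seed: int = 0, max_tries: int = 10_000) -> list[str]:
    """Collect n variants that pairwise differ by at least min_diff_pct percent."""
    rng = random.Random(seed)
    accepted: list[str] = []
    for _ in range(max_tries):
        candidate = synonymous_variant(cds, rng)
        if all(percent_difference(candidate, v) >= min_diff_pct for v in accepted):
            accepted.append(candidate)
            if len(accepted) == n:
                return accepted
    raise RuntimeError("could not find enough sufficiently different variants")


# Hypothetical example sequence encoding M-A-L-S-R-G followed by a stop codon.
example_cds = "ATGGCTCTGAGCCGTGGTTAA"
variants = generate_variants(example_cds, n=3, min_diff_pct=5.0)
```

Every accepted variant translates to the same amino acid sequence as the input, while each pair of variants meets the chosen minimum difference.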

In a preferred embodiment, the kit of parts further comprises means for producing at least one multiple copy gene construct, wherein not only the genes of interest encoding the protein(s) of interest are varied in nucleic acid sequence but also at least one further element of the multiple gene construct(s), preferably at least one regulatory element or insulator sequence, more preferably a promoter, 3'-UTR, 5'-UTR, polyA, and/or insulator sequence is varied in nucleic acid sequence without loss of function and expression of the protein of interest.

Percentage variation of the at least one further element of the multiple gene construct(s) is preferably at least 0.5 %, preferably at least 5 %, more preferably at least 10 %, most preferably at least 15 % of its nucleic acid sequence identity.

In a preferred embodiment, the kit of parts is one, wherein each mRNA encoded by each gene of interest for producing the multiple copy gene construct in step (ii) has a folding energy for the first 100 RNA nucleotides that is at least 10 % higher, preferably at least 15 % higher, more preferably at least 30 % higher, most preferably at least 45 % higher than the folding energy of the first 100 wild type RNA nucleotides encoded by the wild type gene of interest. In a further embodiment, the kit of parts is one, wherein each mRNA encoded by each gene of interest for producing the multiple copy gene construct in step (ii) has a folding energy for the first 100 RNA nucleotides that is higher than -2000 kcal/mol, preferably higher than -1700 kcal/mol. The folding energy for RNA sequences can be determined and calculated according to conventional methods, preferably by the algorithm cited in Nucleic Acids Research, 2013, Vol. 41, No. 6.
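The percentage criterion can be made concrete with a short sketch. It assumes that folding energies are negative free-energy values obtained separately from an RNA folding tool, and that "higher" means less negative, with the increase expressed relative to the magnitude of the wild type value. This reading is an assumption for illustration only, and the function names are hypothetical.

```python
def first_100_nt(mrna: str) -> str:
    """Return the first 100 nucleotides of an mRNA sequence (fewer if it is shorter)."""
    return mrna[:100]


def relative_increase_pct(variant_energy: float, wild_type_energy: float) -> float:
    """Increase of the variant folding energy over wild type, in percent of |wild type|.

    Assumption: both energies are in the same unit and computed for the
    first 100 nucleotides by the same external folding algorithm.
    """
    if wild_type_energy == 0:
        raise ValueError("wild type folding energy must be non-zero")
    return 100.0 * (variant_energy - wild_type_energy) / abs(wild_type_energy)


def meets_folding_criterion(variant_energy: float, wild_type_energy: float,
                            min_increase_pct: float = 10.0) -> bool:
    """Check whether a variant is at least min_increase_pct percent higher than wild type."""
    return relative_increase_pct(variant_energy, wild_type_energy) >= min_increase_pct
```

Under this reading, a variant whose first 100 nucleotides fold with an energy of -25 against a wild type value of -30 (hypothetical numbers, same unit) is about 16.7 % higher. It would therefore satisfy the 10 % and 15 % thresholds but not the 30 % threshold.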

In the following the invention will be illustrated by figures and examples relating to specific embodiments of the invention, none of which are to be interpreted as limiting the scope of the invention beyond the scope of the appended claims.

Figures

Fig. 1 is a schematic flow chart depicting the typical workflow currently used for cell line development for recombinant protein manufacturing (adapted from Pharmaceuticals 2013, 6(5), 579-603); starting with the transfection of host cells with expression vectors comprising the gene of interest and regulatory sequences as well as a selection marker, leading to some host cells with randomly integrated genes of interest, which are selected using the selection marker, followed by single cell cloning or limiting dilution selection; the so selected clones are subjected to preliminary clone evaluation, followed by expansion of selected clones before they are ready for cell banking, further evaluation or protein production of the protein of interest.

Fig. 2 is a representative example of a set of sequences encoding the same polypeptide but differing in alternative bases at wobble positions leading to alternative nucleic acid triplets encoding the same amino acid. The so-called wobble positions are marked with dotted lines.
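The positions highlighted in such a figure can be located programmatically. The following is a minimal sketch, assuming equal-length, in-frame coding sequences; the function names and example sequences are hypothetical. It lists the nucleotide positions at which the sequences disagree and the subset of those positions that fall on the third base of a codon.

```python
def variable_positions(seqs: list[str]) -> list[int]:
    """Return 0-based nucleotide positions at which the sequences are not all identical."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def wobble_positions(seqs: list[str]) -> list[int]:
    """Return the variable positions that lie on the third base of a codon."""
    return [i for i in variable_positions(seqs) if i % 3 == 2]


# Hypothetical sequences, each encoding Ala-Gly with different third bases.
ala_gly = ["GCTGGA", "GCCGGT", "GCAGGA"]
# Two leucine codons that differ at the first base rather than the third.
leu = ["CTG", "TTG"]
```

Not every synonymous change is a third-base change, as the leucine pair shows, so the two functions can return different results.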

Fig. 3 is a schematic drawing of a section of a representative but non-limiting multiple-copy gene construct comprising a row of gene constructs, each gene construct including regulatory sequences, i.e. promoter, 5'-UTR and 3'-UTR (UTR = untranslated region), and an adapter and insulator sequence separating each gene construct or forming part of each gene construct. Each of the coding sequences (CDS) is unique and encodes the identical amino acid sequence but differs from all remaining coding sequences in that it varies by at least 0.5 %, 5 % or 15 %, preferably by at least 20 %, of its codons. Of course, there may be one or more sets of coding sequences (CDS), each set encoding the identical amino acid sequence. For example, different sets can encode different parts of the same protein, e.g. an antibody with different chains.
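Variation expressed as a percentage of codons, rather than of nucleotides, can be computed as in the following minimal sketch. It assumes equal-length, in-frame coding sequences; the function names are hypothetical.

```python
from itertools import combinations


def codon_difference_pct(cds_a: str, cds_b: str) -> float:
    """Percentage of codon positions at which two coding sequences use different codons."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    codons_a = [cds_a[i:i + 3] for i in range(0, len(cds_a), 3)]
    codons_b = [cds_b[i:i + 3] for i in range(0, len(cds_b), 3)]
    differing = sum(a != b for a, b in zip(codons_a, codons_b))
    return 100.0 * differing / len(codons_a)


def all_pairs_differ(cds_list: list[str], min_pct: float) -> bool:
    """Check that every pair of coding sequences differs in at least min_pct percent of codons."""
    return all(
        codon_difference_pct(a, b) >= min_pct
        for a, b in combinations(cds_list, 2)
    )
```

A single nucleotide change counts as one differing codon, so the codon-level percentage is generally larger than the nucleotide-level percentage for the same pair of sequences.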

Fig. 4 A to C are column graphs showing the reporter fluorescence for generated libraries of (A) poly-A signals, (B) promoters and (C) coding sequences for the fluorescent protein mCitrine, which functions as representative but non-limiting protein of interest and which simplifies expression quantification via FACS. The figures show the validation of the individual components needed to construct gene cassettes for producing multiple-copy gene constructs for use in the present invention. The promoters and poly-A constructs can be re-used in different protein production scenarios. Fig. (A) shows the expression of a fluorescent reporter cloned upstream of a library of polyA sequences that were built around the Rabbit beta-globin signal, (B) shows fluorescence from a library of degenerate mCitrine-encoding sequences and (C) illustrates expression of a fluorescent reporter cloned downstream of a library of promoters built around the EF1a promoter architecture. In Fig. A the 5th bar, in Fig. B the first bar, and in Fig. C the first bars indicate wild type sequences (RgG pA, mCitrine, EF1a promoter). All constructs were measured after transfection in CHO cells.

Fig. 5 shows reporter expression from 11 gene cassettes. Each cassette comprises an insulator, a promoter, a protein coding sequence (e.g. the fluorescent reporter mCitrine) and a poly-adenylation signal. All these components are designed to be unique sequence-wise, while preserving function.

Fig. 6 shows three different bioproduction units, each expressing a reporter with approximately the same fluorescence intensity, that are assembled via yeast assembly on a shuttle vector. An approximately linear increase in fluorescence is observed in the multi-copy construct with respect to a vector containing a single cassette.

Fig. 7 shows an exemplary workflow of the lentiviral-based screening approach. CHO-K1 cells are infected with lentivirus at optimal multiplicity of infection. Following infection, the positive cells, indicated by the presence of fluorescence, are sorted and cultured for a period of time. The sorting process is repeated to remove cells that lose fluorescence over time. At certain time points, genomic DNA is extracted for next generation sequencing (NGS) analysis by the nrLAM-PCR technique (see Nature Protocols 5.8 (2010):1379-1395).