Title:
READING FRAME INDEPENDENT EPITOPE TAGGING
Document Type and Number:
WIPO Patent Application WO/1998/026094
Kind Code:
A1
Abstract:
Oligonucleotide sequences comprising a repeating nucleotide sequence encoding a circularly permuted epitope tag, and vectors comprising the oligonucleotide sequences. Methods for using the sequences to tag proteins. Antibodies specific for the epitopes. Methods for detecting and purifying proteins.

Inventors:
JARVIK JONATHAN W (US)
Application Number:
PCT/US1997/022472
Publication Date:
June 18, 1998
Filing Date:
December 09, 1997
Assignee:
JARVIK JONATHAN W (US)
International Classes:
C07K14/415; C07K16/00; C12N15/10; C12N15/62; C12P21/02; (IPC1-7): C12Q1/68; C12P21/06; C12P19/34; C12N15/00; C12N1/20; C12N5/06; C12N5/16; C12N5/10; C12N5/02; C12N5/00; A61K38/00; C07H21/04
Foreign References:
US5652128A1997-07-29
US5612180A1997-03-18
Other References:
BIOTECHNIQUES, 1994, Vol. 17, No. 3, SURDEJ P. et al., "Strategy for Epitope Tagging the Protein-Coding Region of Any Gene", pages 560-565.
THE PLANT CELL, December 1995, Vol. 7, DEWITT N.D. et al., "Immunocytological Localization of an Epitope-Tagged Plasma Membrane Proton Pump (H+-ATPase) in Phloem Companion Cells", pages 2053-2067.
PROTEIN SCIENCE, August 1992, Vol. 1, No. 8, SCOTT M.R. et al., "Chimeric Prion Protein Expression in Cultured Cells and Transgenic Mice", pages 986-997.
METHODS IN ENZYMOLOGY, 1991, Vol. 185, UHLEN M. et al., "Gene Fusions for Purpose of Expression: An Introduction", pages 129-143.
See also references of EP 0961838A4
Attorney, Agent or Firm:
Brotman, Harris F. (7911 Herschel Avenue Suite 30, La Jolla CA, US)
Claims:
What is claimed is:
1. An oligonucleotide comprising a nucleotide sequence, said nucleotide sequence encoding an epitope and adapted for insertion into a target nucleotide sequence and for expression in a host cell, said nucleotide sequence encoding said epitope independently of the reading frame of said nucleotide sequence.
2. The oligonucleotide of claim 1 wherein said nucleotide sequence encoding said epitope has the formula (S)n wherein S is a sequence of nucleotides whose number is not evenly divisible by 3, and n is an integer equal to or greater than the number of nucleotides in S, with the proviso that upon insertion into said target sequence, said nucleotide sequence does not encode a stop codon in any reading frame.
3. The oligonucleotide of claim 1 wherein one or more flanking sequences flank said nucleotide sequence such that the reading frame encoded by said target nucleotide sequence is not broken downstream of said oligonucleotide when said oligonucleotide is inserted into said target nucleotide sequence.
4. The oligonucleotide of claim 1 wherein the nucleotide sequence is selected from the group of sequences which encode an antisense strand wherein said antisense strand encodes an epitope independent of the reading frame of the antisense strand.
5. A probe having a nucleotide sequence sufficiently complementary to a nucleotide sequence which codes for an epitope independently of the reading frame of said nucleotide sequence.
6. A fusion polypeptide comprising a native protein to which is attached an epitope, said epitope comprising a sequence of amino acids encoded by an oligonucleotide, said oligonucleotide comprising a nucleotide sequence adapted for insertion into a target nucleotide sequence and for expression in a host cell, said nucleotide sequence encoding said epitope independently of the reading frame of said nucleotide sequence.
7. The fusion polypeptide of claim 6 wherein said nucleotide sequence has the formula (S)n wherein S is a sequence of nucleotides whose number is not evenly divisible by 3, and n is an integer equal to or greater than the number of nucleotides in S, with the proviso that upon insertion into said target sequence, said nucleotide sequence does not encode a stop codon in any reading frame.
8. The fusion polypeptide of claim 6, wherein one or more flanking sequences flank said nucleotide sequence such that the reading frame encoded by said target nucleotide sequence is not broken downstream of said oligonucleotide when said oligonucleotide is inserted into said target nucleotide sequence.
9. A DNA construct comprising a nucleotide sequence which codes for an epitope independently of the reading frame of said nucleotide sequence.
10. The DNA construct of claim 9 which further codes for a fusion polypeptide, said fusion polypeptide comprising a native polypeptide fused to said epitope.
11. The DNA construct of claim 9 wherein said nucleotide sequence coding for said epitope has the formula (S)n wherein S is a sequence of nucleotides whose number is not evenly divisible by 3, and n is an integer equal to or greater than the number of nucleotides in S, with the proviso that said nucleotide sequence does not encode a stop codon in any reading frame.
12. The DNA construct of claim 10 capable of being transcribed to yield mRNA, the mRNA capable of being translated to yield a fusion polypeptide which comprises said epitope.
13. The DNA construct of claim 10 capable of being transcribed to yield mRNA, the mRNA being capable of being translated to yield a fusion polypeptide which displays the antigenicity of said epitope.
14. The DNA construct of claim 10 wherein said fusion polypeptide is distinguishable from the endogenous production of said native polypeptide by the essential absence of the ability of said native polypeptide to specifically bind to an antibody specific for said epitope or by the absence of said native polypeptide to display the antigenicity of said universal epitope.
15. The DNA construct of claim 9 comprising a vector selected from the group of vectors consisting of bacterial plasmids, bacterial transposons, bacterial viruses, eucaryotic transposons and eucaryotic viruses.
16. An epitope comprising a sequence of amino acids encoded by a nucleotide sequence independently of the reading frame of the nucleotide sequence.
17. The epitope of claim 16 wherein said nucleotide sequence has the formula (S)n wherein S is a sequence of nucleotides whose number is not evenly divisible by 3, and n is an integer equal to or greater than the number of nucleotides in S.
18. A vector comprising the DNA construct of claim 9 incorporated into a plasmid capable of transforming host cells, said plasmid selected from the group of plasmids consisting of those with both a drug resistance marker and a replication origin.
19. A vector comprising the DNA construct of claim 9 incorporated into a virus capable of transforming host cells, said virus selected from the group of viruses consisting of lambda, P22, M13, f1, adenovirus, Epstein-Barr virus, herpes virus, baculovirus, SV40, MoMLV, MoMSV, ALV, and their derivatives.
20. A vector comprising the DNA construct of claim 9 incorporated into a transposon capable of transforming host cells, said transposon selected from the group consisting of Tn10, Tn5, Tn3, Ty1, P element and their derivatives.
21. A host cell transformed with the vector of claim 18.
22. A host cell transformed with the vector of claim 19.
23. A host cell transformed with the vector of claim 20.
24. An animal transformed with the vector of claim 18.
25. An animal transformed with the vector of claim 19.
26. An animal transformed with the vector of claim 20.
27. A plant transformed with the vector of claim 18.
28. A plant transformed with the vector of claim 19.
29. A plant transformed with the vector of claim 20.
30. Antibodies specific for an epitope, including the immunologically reactive fusion polypeptides and fragments thereof comprising said epitope, said epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence.
31. The antibodies of claim 30 wherein said antibodies are polyclonal antibodies.
32. The antibodies of claim 30 wherein said antibodies are monoclonal antibodies.
33. A hybridoma or immortalized cell line which secretes monoclonal antibodies specific for an epitope, said epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence.
34. A method for producing polyclonal antibodies specific for an epitope, said epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence, which method comprises administering a sufficient amount of an antigen comprising said epitope to an animal and after a sufficient period of time collecting said polyclonal antibodies from said animal.
35. A method to produce monoclonal antibodies specific for an epitope, said epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence, which method comprises culturing a hybridoma or immortalized cell line and recovering the monoclonal antibodies, said hybridoma or immortalized cell line secreting monoclonal antibodies specific for an epitope, said epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence.
36. A reagent specific for an epitope, including the immunologically reactive fusion polypeptides and fragments thereof comprising said epitope, said epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence.
37. A method for epitope tagging a native polypeptide comprising the steps of: (a) attaching an oligonucleotide to the coding sequence of a native polypeptide to produce a tagged gene coding for a fusion polypeptide, said oligonucleotide comprising a nucleotide sequence, said nucleotide sequence encoding said epitope independently of the reading frame of said nucleotide sequence; and (b) introducing said tagged gene into an expression system under conditions sufficient for transcription of said tagged gene to yield mRNA, the mRNA being capable of being translated to yield said fusion polypeptide, said fusion polypeptide being distinguishable from the native polypeptide by the absence of antigenicity of the native polypeptide to an antibody specific for said epitope.
38. A method for purifying a polypeptide, comprising the steps of: (a) tagging a target sequence which encodes a polypeptide with a nucleotide sequence which encodes an epitope independent of the reading frame of the nucleotide sequence to produce a tagged target sequence which encodes a fusion polypeptide; (b) expressing said tagged target sequence in an expression system to produce said fusion polypeptide; and (c) purifying said fusion polypeptide.
39. A method for detecting a polypeptide, comprising the steps of: (a) tagging a target sequence which encodes a polypeptide with a nucleotide sequence which encodes an epitope independent of the reading frame of the nucleotide sequence to produce a tagged target sequence which encodes a fusion polypeptide; (b) expressing said tagged target sequence in an expression system to produce said fusion polypeptide; and (c) contacting said expression system with a sufficient amount of an antibody or reagent which are specific for said epitope under conditions which produce a detectable signal indicating a reaction between the fusion polypeptide and antibody or reagent.
40. A method for tagging genes, transcripts and proteins in a eukaryotic cell comprising the steps of: (a) introducing into an intron within a gene a DNA sequence including a first nucleotide sequence, an acceptor site for RNA splicing, a second nucleotide sequence, and a donor site for RNA splicing, the first nucleotide sequence being necessary for splice acceptor function, and the second nucleotide sequence encoding an epitope recognized by an antibody, other reagent or molecule; and (b) promoting expression of the gene in a eukaryotic cell to produce a protein product, the protein product comprising a peptide encoded by the second nucleotide sequence as part of its primary structure.
41. The method of claim 40 wherein said second nucleotide sequence encodes an epitope independently of the reading frame of said second nucleotide sequence.
42. A kit for epitope tagging which comprises antibodies or other reagents specific for an epitope, said epitope encoded by an oligonucleotide which comprises a nucleotide sequence which encodes an epitope and which is adapted for insertion into a target nucleotide sequence and for expression in a host cell, said nucleotide sequence encoding said epitope independently of the reading frame of said nucleotide sequence.
43. The kit of claim 42 further comprising an oligonucleotide which comprises a nucleotide sequence which encodes an epitope and which is adapted for insertion into a target nucleotide sequence and for expression in a host cell, said nucleotide sequence encoding said epitope independently of the reading frame of said nucleotide sequence.
44. The kit of claim 42 further comprising a probe having a nucleotide sequence sufficiently complementary to a nucleotide sequence which codes for said epitope.
45. The kit of claim 42 further comprising a DNA construct comprising a nucleotide sequence which codes for an epitope independently of the reading frame of said nucleotide sequence.
46. The kit of claim 45 wherein said DNA construct further codes for a fusion polypeptide, said fusion polypeptide comprising a native polypeptide fused to said epitope.
47. The kit of claim 45 further comprising a vector suitable for incorporating said DNA construct.
48. The kit of claim 45 further comprising antibodies specific for said epitope.
49. The kit of claim 45 further comprising a probe having a nucleotide sequence sufficiently complementary to said nucleotide sequence which codes for said epitope.
Description:
READING FRAME INDEPENDENT EPITOPE TAGGING

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to epitope tagging, in particular, to improved epitope tags, the nucleotide sequences that encode them, methods for using the nucleotide sequences and tags, and resulting cellular and multicellular products.

2. Background Art

The publications and other reference materials referred to herein to describe the background of the invention and to provide additional detail regarding its practice are hereby incorporated by reference. For convenience, the reference materials are numerically referenced and grouped in the appended bibliography.

Epitope tagging is a recombinant DNA method for introducing immunoreactive peptides into the products of cloned genes (1-7). In particular, a DNA sequence encoding a sequence of amino acids that comprises a continuous epitope is inserted into the coding sequence of a cloned gene with the result that when the gene is expressed the protein of the gene is tagged with the epitope. The protein can then be detected and/or purified by virtue of its interaction with an antibody specific to the epitope. Epitope tags are typically 5-20 amino acids in length. Nucleotide sequences encoding the epitope are produced either by cloning appropriate portions of natural genes or by synthesizing a polynucleotide that encodes the epitope.

Epitope tagging is widely used for detecting, characterizing, and purifying proteins. The technique offers several advantages over alternative methods of detecting and purifying proteins. The small size of the epitope tag, which is usually 5-20 amino acids in length, generally has no effect on the biological function of the tagged protein.

This contrasts with many larger fusion protein products, in which the activity or function of the fusion protein has been affected by the longer peptide label. Epitope tagging also offers tremendous time savings over the traditional method of producing an antibody to the specific protein being studied.

Epitope tagging involves adding a unique "epitope tag" peptide sequence to the protein of interest by recombinant DNA techniques, creating a fusion protein. The resulting tagged protein can then be detected by and purified with an antibody specific for the epitope tag.

Epitope tagging methods have been used in a wide variety of applications, including western blot analysis, immunoprecipitation, immunofluorescence, and immunoaffinity purification of tagged proteins.

Epitope tagging was first described in 1984 by Munro and Pelham (1). A cDNA encoding the Drosophila melanogaster heat shock protein hsp70 was tagged at the 3' end of the coding sequence with a short oligonucleotide tag encoding either nine or fourteen amino acids of the peptide Substance P. After transfection of monkey COS cells, the tagged protein was detected using an anti-substance P monoclonal antibody. Since the initial report of Munro and Pelham, hundreds of investigations using epitope tagging have been reported in the scientific literature. Epitope tagging products and kits, which include various combinations of peptides, polynucleotides, and antibodies, are currently sold by a number of companies, including Boehringer-Mannheim, Indianapolis, IN; Berkeley Antibody Company, Berkeley, CA; MBL International Corporation, Watertown, MA; Novagen, Madison, WI; IBI, West Haven, CT; and Life Technologies, Gaithersburg, MD.

To epitope tag a protein by conventional means, one begins with two DNA molecules: (1) a polynucleotide which is cloned in a plasmid vector and which includes a sequence of nucleotides encoding the protein as well as regulatory sequences (i.e. promoter, translation start, etc.) needed to express the protein; and (2) an oligonucleotide encoding the epitope with which the protein is to be tagged. The oligonucleotide is designed to encode, in one of its reading frames, an epitope recognized by a known antibody. One chooses a site in the polynucleotide's protein coding sequence for insertion of the oligonucleotide. The site may be at or near the 3' or the 5' end of the coding sequence, or somewhere in between the 3' and 5' ends. The insertion site for the oligonucleotide is typically a unique restriction site. The plasmid is linearized with the restriction endonuclease, and the oligonucleotide is ligated into the site. The tagged gene is then introduced into living cells. Epitope-tagged protein, which is subsequently expressed from the tagged gene, is detected and/or purified by immunochemical means.

Using conventional epitope tagging techniques, hundreds of different proteins have been epitope-tagged with numerous distinct peptides, including the ten amino acid c-myc epitope (EQKLISEEDL) derived from the human c-myc protein (8); the nine amino acid HA-epitope (YPYDVPDYA) derived from influenza virus hemagglutinin (9, 10); the eight amino acid FLAG epitope (DYKDDDDK) derived from bacteriophage T7 (Castrucci et al., 1992. J. Virology 66:4647-4653); and the twelve amino acid epsilon-tag epitope (KGFSYFGEDLMP) derived from protein kinase C epsilon (Olah et al., 1994. Anal. Biochem. 221: 94-102). Indeed, there appears to be no practical limit to the number of possible epitope tags that can exist. Essentially any peptide can be used as an immunogen to raise antibodies that will recognize that same peptide when it is present within or at the termini of a protein (11, 12).

It is common practice in molecular biology to obtain antibodies that recognize the protein product of a cloned and sequenced gene by (1) synthesizing a peptide, typically ten to twenty amino acids in length, that corresponds to a portion of the protein, (2) immunizing an animal with the peptide, and (3) using the resulting antiserum to immunodetect or immunopurify the protein in which the peptide is situated. An example of this approach can be found in Sawin (15). A particularly relevant example can be found in Sugii et al. (13). Here, 23 overlapping peptides that cover the entire amino acid sequence of bovine conglutinin were synthesized and used individually as peptide epitopes to immunize rabbits. Every serum showed cross-reactivity with the complete conglutinin protein.

A problem with conventional epitope tagging involves a limited probability of successfully tagging the protein. Despite researchers' best efforts, not every insertion into a host polynucleotide of an oligonucleotide encoding an epitope tag is achieved in a reading frame which allows expression of the intended epitope. The probability of success using a conventional method depends, in part, on how much is known about the polynucleotide before the construction is commenced. If the nucleotide sequence is known, and if, therefore, the reading frame at the target restriction site is known, then an oligonucleotide with the epitope encoded in the correct reading frame can be chosen. In this case, the probability that a given insertion event will be the desired one is one in two for the reason that the orientation of the oligonucleotide with respect to the polynucleotide cannot be controlled by the experimenter, and only one of the two orientations will serve.

If, on the other hand, the reading frame at the target restriction site is not known (as is frequently the case), then the probability of success drops to one in six because the reading frame will only be correct for one site out of three. The reading frame problem could be dealt with by using three different DNA fragments, each of which encodes the epitope tag in a different reading frame (16). However, that involves production of multiple constructs to assure finding the one of interest, which is an inefficient process.
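The reading frame obstacle can be illustrated with a minimal Python sketch. It translates a conventional tag oligonucleotide in each of its three forward reading frames. The FLAG peptide DYKDDDDK is taken from the text above; the particular codons chosen for `FLAG_OLIGO`, and the helper name `translate`, are assumptions made only for illustration and are not taken from the application.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna from offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


# Hypothetical codon choice encoding the FLAG epitope DYKDDDDK in frame 0.
FLAG_OLIGO = "GATTACAAGGATGACGACGATAAG"

frames = [translate(FLAG_OLIGO, f) for f in range(3)]
for f, peptide in enumerate(frames):
    status = "epitope present" if "DYKDDDDK" in peptide else "epitope absent"
    print(f"frame {f}: {peptide} ({status})")
```

With this codon choice only frame 0 yields the epitope, and one of the shifted frames runs into a stop codon, which is why a conventional tag needs one construct per reading frame.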

Accordingly, for known epitope tagging procedures to be effective, the added DNA must be (1) in the appropriate orientation, and (2) in the correct reading frame. There are thus two obstacles inherent in conventional epitope tagging: an orientation obstacle and a reading frame obstacle.

The reading frame obstacle can only be avoided if the reading frame around the target restriction site is known. Otherwise, three different DNA fragments, each of which encodes the epitope tag in a different reading frame must be used. In particular, if the insertion into the coding sequence is at a random or arbitrarily selected site, e.g. at a unique restriction site, then for a given epitope-encoding oligonucleotide, the maximum likelihood that it is possible to successfully epitope-tag the gene product by insertion of the oligonucleotide at that site is only one in three (due to the reading frame obstacle). The experimenter is forced to isolate multiple insertions at the target site and test them individually in order to find the one of interest. The test may be arduous. For example, if the gene of interest is to be assayed in transgenic animals, it would be necessary to make numerous transgenic constructs and examine them individually.

In summary, when the reading frame of the target restriction site is not known, the likelihood that a particular insertion will successfully tag the protein is only one in six (due to the reading frame obstacle and the orientation obstacle). In other words, five attempts out of six will fail, and for two target sites out of three the experimenter is destined to fail with a given oligonucleotide, however many insertions are screened.
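The one-in-two and one-in-six figures follow from counting equally likely outcomes, as the short Python sketch below shows. It assumes, as the text does implicitly, that the two orientations are equally likely and that an unknown site is equally likely to fall in any of the three reading frames; the function name `success_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

ORIENTATIONS = ("forward", "reverse")
FRAME_OFFSETS = (0, 1, 2)  # offset 0 means the tag is in frame with the gene


def success_probability(frame_known: bool) -> Fraction:
    """Chance that one insertion of a conventional tag expresses the epitope."""
    if frame_known:
        # The oligonucleotide was designed for the site's frame,
        # so only the orientation remains uncontrolled.
        outcomes = [(o, 0) for o in ORIENTATIONS]
    else:
        # Both the orientation and the frame offset are uncontrolled.
        outcomes = list(product(ORIENTATIONS, FRAME_OFFSETS))
    good = [o for o in outcomes if o == ("forward", 0)]
    return Fraction(len(good), len(outcomes))


print(success_probability(frame_known=True))   # 1/2
print(success_probability(frame_known=False))  # 1/6
```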

DISCLOSURE OF THE INVENTION

The present invention overcomes the problem of inefficient epitope tagging.

In one aspect, the present invention is directed to compositions of oligonucleotide sequences, and to methods of using them to more efficiently epitope tag proteins.

The invention is based on an oligonucleotide sequence comprising a repeating nucleotide sequence which encodes a repeating circularly permuted amino-acid sequence epitope. Regardless of the reading frame, the oligonucleotide sequence of the invention enables one to tag a protein with the same epitope from all three possible reading frames of the nucleotide sequence. This allows the present invention to overcome the inefficiency of epitope tagging caused by the reading frame obstacle, and, using certain embodiments of the invention, overcome the orientation obstacle as well.
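The effect of a repeating unit whose length is not divisible by 3 can be checked with a small Python sketch. The four-nucleotide unit `ACGG` and the repeat count of 6 are hypothetical values chosen only for illustration; they are not sequences disclosed in the application. Because 4 and 3 share no common factor, each reading frame steps through the same cycle of codons and differs only in where the cycle starts.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna from offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def is_window_of_repeat(peptide: str, repeat_unit: str) -> bool:
    """True if peptide is a contiguous stretch of repeat_unit repeated end to end."""
    copies = len(peptide) // len(repeat_unit) + 2
    return peptide in repeat_unit * copies


UNIT = "ACGG"   # hypothetical S, length 4 (not divisible by 3)
N = 6           # hypothetical n, at least the number of nucleotides in S
tag = UNIT * N

frames = [translate(tag, f) for f in range(3)]
for f, peptide in enumerate(frames):
    print(f"frame {f}: {peptide}")
```

Every frame of this example yields a stretch of the repeating peptide TDGR, so an antibody raised against the repeat would, in principle, have its epitope available whichever frame the insertion falls into.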

The invention is directed to:

1. Oligonucleotides. A major aspect of the invention is directed to an oligonucleotide which comprises a nucleotide sequence that encodes an epitope. The nucleotide sequence encodes the epitope independently of the reading frame of the nucleotide sequence. The oligonucleotide is adapted for insertion into a target nucleotide sequence and for expression in a host cell. In a preferred embodiment, the nucleotide sequence encoding the epitope is a repeating sequence which has the formula (S)n wherein S is a sequence of nucleotides whose number is not evenly divisible by 3, and n is an integer equal to or greater than the number of nucleotides in S. Such an oligonucleotide is here defined as a "universal oligonucleotide." The claimed oligonucleotide sequences do not include those which upon insertion into the target sequence encode a stop codon in any reading frame. A version of the claimed oligonucleotide has sequences that flank the repeating nucleotide sequence and which allow insertion of the oligonucleotide into the target sequence such that the reading frame encoded by the target nucleotide sequence is not broken downstream of the oligonucleotide when the oligonucleotide is inserted into said target nucleotide sequence.
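The conditions on a universal oligonucleotide can be expressed as a simple screening function, sketched below in Python. All names (`has_stop`, `check_design`) and example sequences are hypothetical. The sketch makes two simplifying assumptions that go beyond the text: it treats "the downstream reading frame is not broken" as "the total inserted length is a multiple of 3", and it scans only the insert itself, not the codons formed at the junctions with a particular target sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_stop(dna: str) -> bool:
    """True if any triplet, at any offset (hence in any frame), is a stop codon."""
    return any(dna[i:i + 3] in STOP_CODONS for i in range(len(dna) - 2))


def check_design(unit: str, n: int, flank5: str = "", flank3: str = "") -> dict:
    """Screen a candidate repeat (unit)n with optional flanking sequences."""
    insert = flank5 + unit * n + flank3
    return {
        "unit_length_not_multiple_of_3": len(unit) % 3 != 0,
        "enough_repeats": n >= len(unit),
        "no_stop_in_any_frame": not has_stop(insert),
        # Assumption: an insert whose length is a multiple of 3 leaves
        # the downstream frame of the target sequence unchanged.
        "downstream_frame_preserved": len(insert) % 3 == 0,
    }


for label, kwargs in {
    "ACGG x 6": dict(unit="ACGG", n=6),
    "ACGG x 4": dict(unit="ACGG", n=4),
    "ACGG x 4 + GC flank": dict(unit="ACGG", n=4, flank3="GC"),
    "TAGC x 4": dict(unit="TAGC", n=4),
}.items():
    print(label, check_design(**kwargs))
```

In this sketch, flanking nucleotides are a way to pad a repeat whose total length is not a multiple of 3, which is one plausible reading of the role of the flanking sequences described above.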

2. DNA Constructs. The invention includes a DNA construct which comprises a nucleotide sequence which codes for an epitope independently of the reading frame of said nucleotide sequence. In one aspect, the DNA construct codes for a fusion polypeptide, which comprises a native polypeptide fused to the epitope. When expressed, the fusion polypeptide is distinguishable from the native polypeptide by the absence of the ability of the native polypeptide to specifically bind to an antibody or other reagent specific for the universal epitope or by the absence of the native polypeptide to display the antigenicity of the universal epitope.

3. Vectors. Another aspect of the invention is a vector which comprises the DNA construct of the invention incorporated into a plasmid that is capable of stably transforming host cells. The vector can be incorporated into a virus or a transposon capable of transforming host cells.

4. Probes. The invention is further directed to probes which have a nucleotide sequence sufficiently complementary to a nucleotide sequence which codes for an epitope independently of the reading frame of said nucleotide sequence.

5. Epitopes. Another aspect of the invention is directed to an epitope which comprises a sequence of amino acids which is encoded by a nucleotide sequence independently of the reading frame of the nucleotide sequence. The nucleotide sequence has the formula (S)n wherein S is a sequence of nucleotides whose number is not evenly divisible by 3, and n is an integer equal to or greater than the number of nucleotides in S. Such an epitope is here defined as a "universal epitope."

6. Fusion Polypeptides. In a further aspect of the invention, a fusion polypeptide is claimed. A fusion polypeptide of the invention comprises a native protein that comprises a universal epitope which is reactive with an antibody or other reagent specific for the universal epitope. The epitope comprises a sequence of amino acids encoded by the universal oligonucleotide of the invention. That is to say, the oligonucleotide comprises a nucleotide sequence adapted for insertion into a target nucleotide sequence and for expression in a host cell, the nucleotide sequence encoding the epitope independently of the reading frame of said nucleotide sequence.

7. Transformed Cellular and Multicellular Products. In yet another aspect, the invention provides either a host cell, an animal, or a plant transformed with one of the vectors of the invention.

8. Antibodies, Hybridomas, and Methods for Making. An additional feature of the invention is directed to antibodies that are specific for an epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence.

The antibodies are further reactive with immunologically reactive fusion polypeptides comprising the epitope, and fragments thereof comprising the epitope. The antibodies of the invention may be polyclonal or monoclonal.

Yet another aspect of the invention is a hybridoma or immortalized cell line which secretes monoclonal antibodies specific for an epitope which is encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence.

Methods for producing the polyclonal or monoclonal antibodies of the invention are provided by the invention. The method for producing polyclonal antibodies specific for an epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence involves administering a sufficient amount of an antigen comprising said epitope to an animal and after a sufficient period of time collecting said polyclonal antibodies from said animal. The method for producing monoclonal antibodies specific for an epitope encoded by a nucleotide sequence independently of the reading frame of said nucleotide sequence comprises culturing a hybridoma or immortalized cell line of the invention and recovering the monoclonal antibodies.

9. Other Reagents Specific to Universal Epitopes, and Methods for Making. An additional feature of the invention is directed to non-antibody reagents that bind specifically to universal epitopes. A number of methods, often called combinatorial methods, are known in the art to identify such reagents. Peptide reagents can be identified and produced, for example, using phage display (17, 18), random peptide display in bacteria (19, 20), or Selectide approaches (20, 21). DNA or RNA molecules can be identified and produced, for example, using the SELEX approach (22, 23).

10. Methods for Epitope Tagging and Production of Fusion Proteins.

The invention is also directed to a method for epitope tagging a native polypeptide to produce a fusion protein or polypeptide. The method involves attaching an oligonucleotide to the coding sequence of a native polypeptide to produce a tagged gene coding for a fusion polypeptide which comprises an epitope, which is coded for by the oligonucleotide, which itself comprises a nucleotide sequence which encodes the epitope independently of the reading frame of said nucleotide sequence. A further step of the method introduces the tagged gene into an expression system under conditions sufficient for transcription of the tagged gene to yield mRNA, and conditions sufficient for the mRNA to be translated to yield the fusion polypeptide. The fusion polypeptide is distinguishable from the native polypeptide by the absence of antigenicity of the native polypeptide to an antibody specific for the epitope.

The invention is also directed to another method of epitope tagging which involves tagging genes, transcripts and proteins in a eukaryotic cell by a single event. This method comprises a step of introducing into an intron within a gene a DNA sequence including a first nucleotide sequence, an acceptor site for RNA splicing, a second nucleotide sequence, and a donor site for RNA splicing. The first nucleotide sequence is necessary for splice acceptor function. The second nucleotide sequence encodes an epitope recognized by an antibody, other reagent or molecule. A further step promotes expression of the gene in a eukaryotic cell to produce a protein product, which comprises a peptide epitope encoded by the second nucleotide sequence as part of its primary structure. A unique aspect of the invention is directed to the second nucleotide sequence, which encodes an epitope independently of the reading frame of the second nucleotide sequence.

11. Method for Purifying a Polypeptide. Another aspect of the invention is directed to a method for purifying a polypeptide. This method involves tagging a target sequence which encodes a polypeptide with a nucleotide sequence which encodes an epitope independent of the reading frame of the nucleotide sequence to produce a tagged target sequence which encodes a fusion polypeptide. The tagged target sequence is expressed in an expression system to produce the fusion polypeptide, which is then purified.

12. Method to Detect a Polypeptide. The invention is directed to a method for detecting a polypeptide. A first step of this method involves tagging a target sequence which encodes a polypeptide with a nucleotide sequence which encodes an epitope independent of the reading frame of the nucleotide sequence to produce a tagged target sequence which encodes a fusion polypeptide. It is understood that the fusion polypeptide comprises a universal epitope of the invention. The tagged target sequence is then expressed in an expression system to produce said fusion polypeptide. The expression system is then contacted with a sufficient amount of an antibody or reagent which is specific for the epitope under conditions which produce a detectable signal that indicates a reaction between the fusion polypeptide and antibody or reagent, thereby indicating the presence of the polypeptide of interest.

13. Kits for Epitope Tagging. A kit for epitope tagging is provided by the invention. The kit comprises antibodies or other reagents specific for the epitope or specific for a fusion protein comprising the epitope. Further embodiments of the kit additionally comprise an oligonucleotide or DNA construct which comprises a nucleotide sequence which encodes an epitope and which is adapted for insertion into a target nucleotide sequence and for expression in a host cell. The nucleotide sequence encodes the epitope independently of the reading frame of the nucleotide sequence. Other embodiments of the kit are directed to additional elements such as probes sufficiently complementary to a nucleotide sequence which codes for the epitope. One embodiment of the kit comprises a DNA construct which codes for a fusion polypeptide which comprises a native polypeptide fused to the epitope. Still another version of the kit comprises a vector suitable for incorporating the DNA construct.

It is an object of the present invention that the claimed oligonucleotides, epitopes, fusion proteins, DNA constructs, vectors, probes, antibodies, transformed cellular and multicellular organisms, and methods for making and using them provide a set of robust tools which are more efficient than existing ones for analyzing and dissecting complex biological processes and systems. The present invention achieves this object in part by discovering new genes, determining the size and abundance of proteins produced by newly discovered genes, tracking the movement of proteins within cell membranes, monitoring receptor binding and internalization of exogenous proteins, identifying the components of functional protein complexes, purifying proteins, discovering the function of proteins, and in particular, proteins that are unstable, are difficult to purify, or share epitopes with a number of other proteins.

The above-discussed and many other features and attendant advantages of the present invention will become better understood by reference to the following detailed description of the invention taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1(a) and (b) show the coding sequence encoding the Chlamydomonas protein RSP3 before (a) and after (b) the insertion of the 37-mer oligonucleotide GAT CAC AGA CAG ACA GAC AGA CAG ACA GAC AGG GAT C.

Figures 2(a) and (b) show the untagged and tagged amino acid sequence of the RSP3 protein encoded by the coding sequences shown, respectively, in Figures 1(a) and (b).

Figure 3 is a list of all possible peptides resulting from translation of a sense strand made up of all possible repeating four-nucleotide sequences. Sequences that include nonsense codons are excluded.

Figure 4 shows the reactivity of a monoclonal antibody made against the peptide (PHHTT)3 to a GST fusion protein containing the (PHHTT)3 sequence.

MODES OF CARRYING OUT THE INVENTION

General Description and Definitions

The practice of the present invention will employ, unless otherwise indicated, conventional biochemistry, immunology, molecular biology and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Maniatis et al., Molecular Cloning: A Laboratory Manual (1982); DNA Cloning: A Practical Approach, vols. I & II (D. Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed., 1984); Nucleic Acid Hybridization (B. Hames and S. Higgins, eds., 1985); Transcription and Translation (B. Hames and S. Higgins, eds., 1984); Animal Cell Culture (R. Freshney, ed., 1986); Perbal, A Practical Guide to Molecular Cloning (1984); Peptide Antigens, A Practical Approach, ed. G.B. Wisdom (1994), Oxford University Press, NY, NY; Immunological Recognition of Peptides in Medicine and Biology, eds. N.D. Zegers, W.J.A. Boersma, and E. Claassen (1995), CRC Press, Boca Raton, FL; and Molecular Biology and Biotechnology, ed. Robert A. Meyers (1995), VCH Publishers, New York, NY.

The following terminology will be used in accordance with the definitions set out below in describing the present invention.

As used herein, the term "epitope" means that portion of a recombinant or non-recombinant protein that is recognized by a particular antibody species or recognized by another molecule that interacts specifically with the protein.

The term "expression system" is well understood in the art to mean either an in vitro system or cellular or multicellular organism capable of transcribing and translating nucleotide sequences to produce polypeptides.

As used herein, the term "tagging" or "tagging a target sequence" refers to introducing by recombinant methods one or more nucleotide sequences encoding a peptide epitope into a polypeptide-encoding gene, i.e. a target sequence so that the gene expresses a fusion polypeptide which comprises the peptide epitope.

The term "fusion polypeptide" or "fusion protein" refers to a polypeptide which has been tagged with a peptide epitope. The amino acid sequence of the fusion protein comprises the peptide epitope amino acid sequence, which epitope may be a universal epitope if it was encoded by a nucleotide sequence which encodes the peptide epitope independently of the reading frame of the nucleotide sequence.

The twenty amino acids with abbreviations and messenger RNA code designations are as follows:

TTT phe F    TCT ser S    TAT tyr Y    TGT cys C
TTC phe F    TCC ser S    TAC tyr Y    TGC cys C
TTA leu L    TCA ser S    TAA OCH Z    TGA OPA Z
TTG leu L    TCG ser S    TAG AMB Z    TGG trp W
CTT leu L    CCT pro P    CAT his H    CGT arg R
CTC leu L    CCC pro P    CAC his H    CGC arg R
CTA leu L    CCA pro P    CAA gln Q    CGA arg R
CTG leu L    CCG pro P    CAG gln Q    CGG arg R
ATT ile I    ACT thr T    AAT asn N    AGT ser S
ATC ile I    ACC thr T    AAC asn N    AGC ser S
ATA ile I    ACA thr T    AAA lys K    AGA arg R
ATG met M    ACG thr T    AAG lys K    AGG arg R
GTT val V    GCT ala A    GAT asp D    GGT gly G
GTC val V    GCC ala A    GAC asp D    GGC gly G
GTA val V    GCA ala A    GAA glu E    GGA gly G
GTG val V    GCG ala A    GAG glu E    GGG gly G

The present invention overcomes the previously mentioned "reading frame obstacle" by providing a DNA construct for epitope-tagging irrespective of reading frame, which makes the construction of appropriately tagged genes three times as efficient as is otherwise possible with conventional methods of epitope tagging. In other words, a single DNA construct within the scope of the present invention enables one to tag a protein with the same epitope from all three possible reading frames of the nucleotide sequence encoding the epitope.

The invention provides an oligonucleotide which comprises a nucleotide sequence that encodes an epitope. The nucleotide sequence encodes the epitope independently of the reading frame of the nucleotide sequence. The oligonucleotide is adapted for insertion into a target nucleotide sequence, and is also adapted for expression in a host cell.

The oligonucleotide of the invention comprises a nucleotide sequence of the form (S)n where S is a sequence of nucleotides whose number is not evenly divisible by 3 and n is an integer equal to or greater than the number of nucleotides in S. The sequence S is chosen such that any oligonucleotide having sequence (S)n does not include stop codons. In practice, the oligonucleotide is inserted into the coding sequence of a cloned gene in such a way that the reading frame 3' to the inserted oligonucleotide is maintained and the gene is expressed when inserted into living host cells. As a result, an epitope-tagged (i.e. peptide-tagged) protein is produced in the cell.
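The two constraints placed on S above (a length not divisible by 3, and no stop codon anywhere in the repeat) can be checked mechanically. The following is a minimal Python sketch, not part of the original disclosure; the function name is hypothetical and the check assumes the standard stop codons TAA, TAG and TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nonsense codons (assumed)

def is_frame_independent_unit(unit: str) -> bool:
    """Return True if the repeat (unit)n satisfies both constraints on S."""
    unit = unit.upper()
    if len(unit) % 3 == 0:
        # With a length divisible by 3, each frame would encode a different,
        # unrelated peptide instead of circular permutations of one peptide.
        return False
    # Three copies of the unit expose every codon that any frame can contain.
    repeated = unit * 3
    codons = {repeated[i:i + 3] for i in range(len(unit))}
    return codons.isdisjoint(STOP_CODONS)

for candidate in ("ACAG", "GTCCA", "TAGC", "ACG"):
    print(candidate, is_frame_independent_unit(candidate))
```

In this sketch ACAG and GTCCA pass, TAGC fails because the repeat contains TAG, and ACG fails because its length is divisible by 3.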

A peptide epitope is encoded in each of the three reading frames of the oligonucleotide (S)n (here defined as Peptide 1, Peptide 2, and Peptide 3) and is known from the sequence of codons inherent in the three reading frames of the linear sequence of nucleotides in the oligonucleotide (S)n. These peptides are related to one another by a simple circular permutation of the same peptide, whose length is the same number as the number of nucleotides in S. In a method of the invention, that same peptide is used to immunize an animal in order to make polyclonal or monoclonal antibodies specific for that same peptide, and specific for the related, circularly permuted peptide epitopes. In that manner, an antibody specific to a peptide that is common to Peptides 1, 2, and 3 is chosen.

That antibody is used to immunolocalize and/or immunopurify the epitope-tagged (i.e. peptide-tagged) protein present in, or derived from, the living host cells in which the tagged gene is expressed.

For example, take the case where S is the four-nucleotide sequence ACAG and n is 7. In this case, the oligonucleotide of the invention comprises the nucleotide sequence (ACAG)7, i.e. a 7-repeat of the sequence ACAG, or ACAGACAGACAGACAGACAGACAGACAG. The invention provides adaptations to the oligonucleotide which allow it to be inserted into a target sequence and which adapt the nucleotide sequence for expression in a host cell as follows: An oligonucleotide is synthesized consisting of the (ACAG)7 sequence surrounded by one or more flanking sequences such as a few additional nucleotides that provide flanking restriction sites and that assure that the oligonucleotide will not break the reading frame when inserted into a corresponding restriction site in the target gene (i.e. assuring that the insert in the gene will be 3N nucleotides in length). An example is a 37-mer GATCACAGACAGACAGACAGACAGACAGACAGGGATC that contains the (ACAG)7 sequence flanked or surrounded by MboI (GATC) sites and a G at position 33.

The 37-mer oligonucleotide is inserted into an MboI site within the coding sequence of a cloned gene or cDNA. For example, the target could be a cDNA including the coding sequence encoding the Chlamydomonas protein RSP3 (25) shown in Figure 1(a). This sequence contains a single MboI site, shown in bold type in the figure. When the 37-mer is inserted at the MboI site, the result is the sequence shown in Figure 1(b).

The tagged gene is then introduced into living cells using the DNA constructs or DNA vectors of the invention. The amino acid sequences of an untagged and tagged RSP3 protein are shown, respectively, in Figures 2(a) and (b).

Hypothetical translation of the (ACAG)7 sequence in reading frame 0 (i.e. beginning with the first nucleotide) yields the amino acid sequence TDRQTDRQT. In reading frame 1 (beginning with the second nucleotide), it yields the sequence QTDRQTDRQ. In reading frame 2 (beginning with the third nucleotide) it yields the sequence RQTDRQTD. It will be understood that the epitope of the invention can be any one of the repeating amino-acid sequences encoded independently of the reading frame of the sequence (S)n, where here S has 4 nucleotides and n is 7, and in particular (ACAG)7. Each of these amino acid sequences is related to the other sequences by a circular permutation of a repeating tetrapeptide. Three circularly permuted hexapeptides are common to all three sequences: TDRQTD, QTDRQT, and RQTDRQ, any of which is an epitope of the invention.
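The three-frame translation of (ACAG)7 and the hexapeptides shared by all three frames can be reproduced with a short script. This is an illustrative Python sketch only; the helper names are hypothetical, and the codon table is the standard genetic code built in T, C, A, G order.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame`, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))

def common_peptides(peptides, k: int) -> set:
    """Return the k-residue peptides that occur in every sequence."""
    windows = [{p[i:i + k] for i in range(len(p) - k + 1)} for p in peptides]
    return set.intersection(*windows)

oligo = "ACAG" * 7
frames = [translate(oligo, f) for f in range(3)]
print(frames)                              # peptides for reading frames 0, 1 and 2
print(sorted(common_peptides(frames, 6)))  # hexapeptides present in all frames
```

Changing the window size k in `common_peptides` gives the shared peptides of other lengths in the same way.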

Likewise, the pentapeptides QTDRQ, TDRQT, DRQTD, and RQTDR are all epitopes of the invention. According to the method of the invention for producing antibodies (polyclonal or monoclonal), one of the circularly permuted peptides is chosen, and is used as an immunogen for injecting an animal to produce an antibody recognizing the peptide. For example, a mouse monoclonal recognizing QTDRQT is produced. Techniques within the skill of the art of immunology for making polyclonal and monoclonal antibodies are explained fully in the literature. See Current Protocols in Immunology, eds. Coligan et al., John Wiley and Sons, publ. (1996); Antibodies, A Laboratory Manual, Harlow and Lane, Cold Spring Harbor Laboratory Press (1988), and reference number 14. For example, proteins from the cells containing the tagged gene are separated by SDS gel electrophoresis. The proteins are transferred to nitrocellulose and probed with antibody.

Epitope-tagged protein is visualized using alkaline phosphatase-conjugated anti-mouse IgG secondary antibody.

It is important to emphasize that the repeating oligonucleotide sequences and peptide sequences of the invention constitute only a minuscule fraction of all possible oligonucleotides or peptides of equal size. For example, as listed in Figure 3, when S equals 4 nucleotides and n equals 15 (giving a sixty-nucleotide oligonucleotide and a twenty amino-acid peptide), there are exactly 208 sequences of the invention (DNA or protein) possible. The number 208 is arrived at as follows. There can exist 256 (4^4) repeating four-nucleotide sequences. Of these, 48 include nonsense codons. (The number 48 is arrived at by summing the fraction of nonsense codons (3/64) over the four repeating codons in the oligonucleotide and multiplying by 256.) 48 is subtracted from 256 to give 208. Similarly, when S equals 5 nucleotides and n equals 12, there are exactly 1024 - 240 = 784 sequences of the invention. In dramatic contrast, the number of possible sixty-nucleotide sequences equals 4^60, and the number of possible twenty amino acid peptides is 20^20. Both of these numbers are truly astronomical, making it extremely unlikely that any of the oligonucleotides or peptides of the invention even exist in the natural world.
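The counts 208 and 784 can be confirmed by brute-force enumeration. The sketch below is an illustration rather than the method used in the original text: it counts every repeat unit of a given length whose repeat contains no stop codon in any frame, treating rotations of the same unit as distinct sequences, as the 256 and 1024 totals above do. The function name is hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nonsense codons (assumed)

def count_stop_free_units(length: int) -> int:
    """Count repeat units of `length` nucleotides whose repeat has no stop codon."""
    count = 0
    for letters in product("ACGT", repeat=length):
        repeated = "".join(letters) * 3
        # Every codon that can appear in any frame of the repeat starts
        # within the first copy of the unit.
        codons = {repeated[i:i + 3] for i in range(length)}
        if codons.isdisjoint(STOP_CODONS):
            count += 1
    return count

print(count_stop_free_units(4), "of", 4 ** 4)
print(count_stop_free_units(5), "of", 4 ** 5)
print("sixty-nucleotide sequences:", 4 ** 60)
print("twenty-residue peptides:", 20 ** 20)
```

The enumeration agrees with the arithmetic in the text because no unit of length 4 or 5 can contain two stop codons at once: every stop codon starts with T and has A or G in its other two positions, so two stop codons cannot overlap within such a short repeat.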

In a further elaboration of the invention, the choice of an oligonucleotide of the form (S)n is restricted to those cases where the oligonucleotide sequence, in the antisense orientation, also lacks nonsense codons. All such antisense oligonucleotides, like all sense oligonucleotides, encode in each reading frame peptide epitopes that are related to each other by a simple circular permutation of a repeating peptide sequence. Two antibodies of the invention, one to a peptide epitope present in each "forward peptide" and one to a peptide epitope present in each "reverse peptide", are used to detect and/or purify the tagged protein. Here both the reading frame obstacle and the orientation obstacle are overcome, and so the probability of successful epitope tagging is fully 100%.

An example is the sequence (GTCCA)9, which encodes the repeating pentapeptide VQSSP. In its three reading frames, the sequence encodes the three related peptides shown below. One of the several common peptides encoded in each reading frame is VQSSPVQSSPV.

GTC CAG TCC AGT CCA GTC CAG TCC AGT CCA GTC CAG TCC AGT CCA

Frame 0: V Q S S P V Q S S P V Q S S P
Frame 1: S S P V Q S S P V Q S S P V
Frame 2: P V Q S S P V Q S S P V Q S

In the reverse orientation, the sequence is (TGGAC)9, which encodes the repeating pentapeptide WTGLD. In its three reading frames, the sequence encodes the three related peptides shown below. One of the several common peptides encoded in each reading frame is WTGLDWTGLDW.

TGG ACT GGA CTG GAC TGG ACT GGA CTG GAC TGG ACT GGA CTG GAC

Frame 0: W T G L D W T G L D W T G L D
Frame 1: G L D W T G L D W T G L D W
Frame 2: D W T G L D W T G L D W T G

Using two antibodies or other reagents, one recognizing the sequence VQSSPVQSSPV and one the sequence WTGLDWTGLDW, the protein encoded by a gene tagged with the sequence is recognized irrespective of the reading frame or orientation of the inserted oligonucleotide. Accordingly, the oligonucleotide of the invention includes those nucleotide sequences that also encode a second amino acid sequence epitope on the antisense strand.
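The forward and reverse peptides of the (GTCCA)9 example can be derived by taking the reverse complement and translating both strands in all three frames. The following Python sketch is illustrative only; the helper names are hypothetical and the standard genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq: str) -> str:
    """Return the antisense strand written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame`, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))

forward = "GTCCA" * 9
reverse = reverse_complement(forward)

forward_peptides = [translate(forward, f) for f in range(3)]
reverse_peptides = [translate(reverse, f) for f in range(3)]

for label, peptides in (("forward", forward_peptides), ("reverse", reverse_peptides)):
    for frame, peptide in enumerate(peptides):
        print(label, frame, peptide)

# An orientation-independent tag needs stop-free peptides on both strands.
both_strands_open = all("*" not in p for p in forward_peptides + reverse_peptides)
print("stop-free in all six frames:", both_strands_open)
```

An antibody pair chosen this way would cover all six frame and orientation combinations, which is the basis for the 100% figure stated above.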

In some cases the universal oligonucleotide is palindromic and so the forward and reverse oligonucleotides are the same, as are the forward and reverse peptides. In these cases the protein encoded by a gene tagged with the sequence is recognized by a single antibody or other specific reagent irrespective of the reading frame or orientation of the inserted oligonucleotide.

An example is the palindromic sequence (ACGT)9, which encodes the repeating tetrapeptide TYVR. In its three reading frames, the sequence encodes the three related peptides shown below, in which one of the several common peptides is TYVRTYVRT.

ACG TAC GTA CGT ACG TAC GTA CGT ACG TAC GTA CGT

Frame 0: T Y V R T Y V R T Y V R
Frame 1: R T Y V R T Y V R T Y
Frame 2: V R T Y V R T Y V R T

Because the oligonucleotide sequence is palindromic, it encodes the identical peptide in reverse orientation. Using a single antibody or other specific reagent recognizing the sequence TYVRTYVRT, the protein encoded by a gene tagged with the sequence is recognized irrespective of the reading frame or orientation of the inserted oligonucleotide.
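The palindromic case can be checked the same way: a repeat is palindromic in the sense used here when it equals its own reverse complement, so a single set of peptides serves both orientations. A minimal Python sketch, with hypothetical helper names and the standard genetic code assumed:

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq: str) -> str:
    """Return the antisense strand written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame`, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))

oligo = "ACGT" * 9  # the 36-nucleotide repeat used in the example above
is_palindromic = reverse_complement(oligo) == oligo
peptides = [translate(oligo, f) for f in range(3)]

print("palindromic:", is_palindromic)
print(peptides)
print("TYVRTYVRT in every frame:", all("TYVRTYVRT" in p for p in peptides))
```

By contrast, `reverse_complement("GTCCA" * 9)` differs from the forward sequence, which is why that example requires two antibodies.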

The scope of the present invention is, in part, illustrated in Figure 3, which is a list of peptide epitopes of the invention which would result from translation of an oligonucleotide of the invention comprising repeating four-nucleotide sequences, i.e. S=4 nucleotides, inserted in the sense strand of a target sequence.

In order that the invention described herein may be more fully understood, the following examples are set forth. It should be understood that these examples are for illustrative purposes only and are not to be construed as limiting the scope of this invention in any manner.

EXAMPLE 1

1. Generation of Polyclonal Mouse Sera Against the Peptide (PHHTT)3

A Multiple Antigen Peptide (MAP) carrying the sequence (PHHTT)3 was synthesized using standard procedures (Tam and Shao. 1993. Current Protocols in Immunology, Suppl. 7: 9.6.1-9.6.18). A 1 mg/ml solution of the peptide in 0.1 M Sodium Bicarbonate was prepared and stored at -80 degrees C. Mice were immunized with 100 micrograms of the peptide in Freund's complete adjuvant and boosted with 100 micrograms of the peptide in Freund's incomplete adjuvant on days 21, 49 and 77 post-immunization and bled on day 82. Subsequent boosts were given two to three weeks after the first bleed, and blood samples were taken five days after each boost. Sera were prepared from whole blood by standard methods and immunoreactivity against the immunogen was assayed by ELISA in 96 well plates using standard methods. The blank values in the assay were 0.13 per well. The data, examples of which are shown in Table 1 below, demonstrated distinct immunoreaction to the (PHHTT)3 peptide by all four mice that were immunized.

Table 1: Immunoreactivity of mouse sera to the (PHHTT)3 immunogen.

           Serum - bleed at day 112    Serum - bleed at day 80
           1:1000      1:2000          1:1000      1:2000
Mouse 1    3.00        2.14            0.94        0.26
Mouse 2    2.43        0.99            1.24        0.34
Mouse 3    2.05        0.70            1.72        0.55
Mouse 4    2.98        2.74            1.68        0.37

2. Generation of Polyclonal Mouse Sera Against the Peptide (PHLTS)3

A Multiple Antigen Peptide (MAP) carrying the sequence (PHLTS)3 was synthesized using standard procedures (Tam and Shao. 1993. Current Protocols in Immunology, Suppl. 7: 9.6.1-9.6.18). A 1 mg/ml solution of the peptide in 0.1 M Sodium Bicarbonate was prepared and stored at -80 degrees C. Mice were immunized with 100 micrograms of the peptide in Freund's complete adjuvant and boosted with 100 micrograms of the peptide in Freund's incomplete adjuvant on days 21, 49 and 77 post-immunization and bled on day 82. Subsequent boosts were given two to three weeks after the first bleed, and blood samples were taken five days after each boost. Sera were prepared from whole blood by standard methods and immunoreactivity against the immunogen was assayed by ELISA in 96 well plates using standard methods. Representative data are shown in Table 2 below. The blank values in the assay were 0.13 per well. Although the (PHLTS)3 peptide was less immunogenic than the (PHHTT)3 peptide, distinct immunoreaction to the (PHLTS)3 peptide was observed for each of the five mice that were immunized.

Table 2: Immunoreactivity of mouse sera to the (PHLTS)3 immunogen.

           Serum - bleed at day 112    Serum - bleed at day 80
           1:250       1:500           1:250       1:500
Mouse 1    0.42        0.20            0.35        0.16
Mouse 2    1.11        0.66            0.68        0.29
Mouse 3    1.39        0.81            0.65        0.28
Mouse 4    1.94        1.18            1.56        0.91
Mouse 5    1.23        0.81            0.47        0.24

3. Generation of Monoclonal Antibodies Against the (PHHTT)3 Peptide

A splenectomy was performed on mouse 4 of Table 1, and hybridomas were generated and cultured using standard methods (Antibodies, A Laboratory Manual, Harlow and Lane, Cold Spring Harbor Laboratory Press (1988)). Five clones secreting reactive immunoglobulins were identified and cultured.

4. Production, Detection, and Analysis of Immunoreactive GST-Fusion Proteins Expressing the (PHHTT)3 and (PHLTS)3 Peptides

To test reactivity of antisera to proteins which were epitope tagged according to the method of the invention, GST (glutathione-S-transferase) fusion proteins containing the (PHHTT)3 and (PHLTS)3 peptides were prepared as follows.

To produce a fusion polypeptide with a PHHTT tag, a DNA oligonucleotide of the invention was produced which had the 91 nucleotide sequence:

GGATCCAAGATCTGGTACCCCACACCACACCACACCACACCA
CACCACACCACACCACACCACACCACACCACACCACAAGATCTGAATTC

It was synthesized by standard methods, cut with the restriction enzymes BamHI and EcoRI, and cloned into the vector pGEX-2T (Pharmacia, Piscataway, NJ) that had been cut with the same two enzymes, thus producing a vector of the invention. The tagged vector was transformed into E. coli DH5alpha cells, and transformants, i.e. host cells transformed by the vector, were analyzed by standard methods to confirm that they contained the expected recombinant plasmid. Based on the known sequence of the pGEX-2T plasmid (Smith and Johnson. 1988. Gene 67:31-40) it was expected that the insert into the target GST gene would lead to the introduction of the peptide KIWYPTPHHTTPHHTTPHHTTPHHKI within the GST protein.

To produce a fusion polypeptide with a PHLTS tag, the 90 nucleotide DNA sequence:

GGATCCAGATCTGGTACCCCTCACCTCACCTCACCTCACCTCACCTCACCTCACCTCACCTCACCTCACCTCACCTCAAGATCTGAATTC

was synthesized by standard methods, cut with the restriction enzymes BamHI and EcoRI, and cloned into the vector pGEX-2T (Pharmacia, Piscataway, NJ) that had been cut with the same two enzymes. The tagged vector was transformed into E. coli DH5alpha cells and transformants were analyzed by standard methods to confirm that they contained the expected recombinant plasmid. Based on the known sequence of the pGEX-2T plasmid (Smith and Johnson. 1988. Gene 67:31-40) it was expected that the insert would lead to the introduction of the peptide RSGTPHLTSPHLTSPHLTSPHLTSRS within the GST protein.
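The peptides expected from the two oligonucleotides can be predicted by translating each insert in frame. The Python sketch below is illustrative and rests on two labeled assumptions: that the GGATCC site at the start of each oligonucleotide is read in frame as GGA TCC in the vector, and that each oligonucleotide can be written as its flanking sequences plus twelve copies of the five-nucleotide repeat (which reproduces the 91- and 90-nucleotide lengths stated above). The helper name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_until_stop(seq: str) -> str:
    """Translate from the first nucleotide, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)

# Assumption: flanks plus twelve repeats, as read off the sequences in the text.
phhtt_oligo = "GGATCCAAGATCTGGTACC" + "CCACA" * 12 + "AGATCTGAATTC"
phlts_oligo = "GGATCCAGATCTGGTACC" + "CCTCA" * 12 + "AGATCTGAATTC"

phhtt_peptide = translate_until_stop(phhtt_oligo)
phlts_peptide = translate_until_stop(phlts_oligo)

print(len(phhtt_oligo), phhtt_peptide)
print(len(phlts_oligo), phlts_peptide)
```

Under these assumptions the first two residues of each translation come from the GGATCC site itself, and the peptides quoted in the text appear immediately after them.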

Cultures, each 150 ml, of cells containing the tagged pGEX-2T plasmids were grown to mid-log phase and induced with IPTG (3 mM) following standard procedures. After 120 minutes, cells were concentrated by centrifugation.

5 microliters of 5X SDS sample buffer was added to 20 microliters of concentrated cell suspension; the mixture was boiled for 5 minutes and clarified by a ten-minute centrifugation at 5,000 rpm. 1 microliter samples were loaded onto precast 12.5% acrylamide Pharmacia Phastgels with 6% acrylamide stackers and subjected to SDS gel electrophoresis.

Proteins were transferred to PVDF membranes using standard methods. The membranes were blocked with 3% gelatin for 60 minutes and then probed with immune or control sera (1:40 dilution) for 2 hours at room temperature. Reactive antibodies were visualized by standard methods using goat anti-mouse IgG linked to horseradish peroxidase. Each of the nine mouse sera listed in Tables 1 and 2 showed specific reactivity to the appropriate fusion protein, but not to the other fusion protein or to the non-tagged GST protein. Several monoclonal antibodies also showed strong and specific reactivity. An example is shown in Figure 4.

EXAMPLE 2

Alternative Method for Epitope Tagging

The present invention incorporates by reference U.S. Patent Application Serial No. 08/000,619, which is directed to a method whereby a molecular tag is put on a eukaryotic gene, transcript and protein in a single recombinational event. The protein or epitope tag takes the form of a unique peptide that can be recognized by an antibody or other specific reagent. The transcript tag takes the form of the sequence of nucleotides encoding the peptide that can be recognized by a specific polynucleotide probe, and the gene tag takes the form of a larger sequence of nucleotides that includes the peptide-encoding sequence and other associated nucleotide sequences. The DNA which is used for insertion into a target sequence is structured such that when it is inserted into an intron within a gene it creates two hybrid introns separated by a new exon encoding the protein tag. A unique and improved feature of the present invention is directed to the exon, which comprises the oligonucleotide of the present invention encoding an epitope regardless of the reading frame of the exon. The method allows one to identify new proteins or protein-containing structures, and to readily identify and analyze the genes encoding those proteins.

In particular, the present invention is directed to a method of epitope tagging which involves tagging genes, transcripts and proteins in a eukaryotic cell.

This method comprises a step of introducing into an intron within a gene a DNA sequence including a first nucleotide sequence, an acceptor site for RNA splicing, a second nucleotide sequence, and a donor site for RNA splicing. The first nucleotide sequence is necessary for splice acceptor function. The second nucleotide sequence, which becomes a "guest exon" when inserted in a target gene, encodes an epitope recognized by an antibody, other reagent or molecule. A further step promotes expression of the gene in a eukaryotic cell to produce a protein product, which comprises a peptide epitope encoded by the second nucleotide sequence as part of its primary structure. A unique aspect of the invention is directed to the second nucleotide sequence, which encodes an epitope independently of the reading frame of the second nucleotide sequence, for example (PHHTT)3.

EXAMPLE 3

Probes

An aspect of the present invention is directed to a probe which has a nucleotide sequence that is sufficiently complementary to an oligonucleotide which comprises a nucleotide sequence which codes for an epitope independently of the reading frame of the nucleotide sequence. As generally understood in the art, a probe is a nucleotide sequence, generally but not necessarily DNA, that is used to detect its homologous location on a target sequence, which may be a chromosome. Probe construction and use are matters of standard technique well known in the literature and incorporated by reference herein.

Probes of the present invention, for example the sequence (TGTGG)12 that hybridizes specifically to the sequence (CCACA)12 that encodes the (PHHTT)3 epitope tag, are used to detect the presence of the epitope tag by hybridization using standard methods or are used as primers to PCR-amplify sequences lying between two tags or between a tag and a known sequence in a target gene.

EXAMPLE 4

Vectors and Transformed Host Cells, Animals and Plants

The invention provides a recombinant vector which comprises a DNA construct of the invention. As described above, the oligonucleotide of the invention can be inserted into a gene cloned in a specific vector, for example a bacterial plasmid such as pBR322 and its derivatives or the pUC series of plasmids and their derivatives such as pUC118, or a bacterial transposon such as Tn10, Tn5 or Tn3 and their derivatives, or a bacterial virus such as lambda, M13, P22, f1 and their derivatives, or a eucaryotic transposon such as Ty-1 or P-element and their derivatives, or a eucaryotic virus such as Epstein-Barr virus, herpes virus, baculovirus, adenovirus or SV-40 and their derivatives, or a retrovirus such as MoMLV, MoMSV, ALV and their derivatives, that allows replication and transfer of the oligonucleotide to a host or from one host to another. The vector of the invention is used for introducing the DNA construct of the invention to a host cell, and is useful for producing an aspect of the invention directed to transformed or transgenic cells, animals and plants. Vector construction and use are well known in the scientific literature, which is referenced herein. Techniques are also well known for modifying vectors to accommodate the oligonucleotide of the invention inserted into a target gene for delivery of the target gene into cells.

It is understood that the vectors of the invention are useful for producing animals or plants in which all or a portion of the organism's cells contain a vector of the invention. A transgenic organism is an animal or plant that carries a foreign gene integrated into its genetic material. It is understood that the foreign gene of the invention is a gene that has been tagged by the oligonucleotide of the invention using methods described herein, and which gene is detectable in the transgenic organism using the probe of the invention, or by detecting the polypeptide expression of the tagged gene using the antibodies or other reagents of the invention which are specific for the epitope-tagged polypeptide.

EXAMPLE 5

Method for Purifying a Polypeptide

The invention is directed to a method for purifying a polypeptide. A first step involves tagging a target sequence which encodes a polypeptide with a nucleotide sequence which encodes an epitope independent of the reading frame of the nucleotide sequence to produce a tagged target sequence which encodes a fusion polypeptide. A typical technique for tagging a target sequence is described herein in Example 1. In a subsequent step, the tagged target sequence is expressed in an expression system to produce the fusion polypeptide. Using techniques well known in the art (and referenced herein) for purifying polypeptides, the fusion polypeptide is substantially purified. A technique preferred by the invention for purifying a fusion polypeptide involves immunoaffinity chromatography (IAC) (25), which employs antibodies specific for the universal epitope or for the fusion polypeptide which comprises the universal epitope. IAC is a powerful separation procedure for the purification of peptide epitopes or fusion polypeptides which comprise a universal epitope. The technique relies upon the immunological specificity of an antibody specific for a universal epitope, that is, the antibody's specific recognition and binding of the epitope, which occurs even in complex mixtures of diverse macromolecules.

IAC is a type of adsorption chromatography. Using IAC, one or more fusion proteins created by the method of the invention in a complex mixture to be separated interact with insoluble particles (the matrix) comprising the chromatographic medium, which is usually packed into a chromatographic column. Unadsorbed components in the mixture remain in the mobile liquid phase, which can then easily be separated from the matrix. In one form of IAC, an antibody which is specific for the universal epitope contained in the fusion protein is immobilized onto the insoluble chromatographic matrix. The corresponding soluble fusion polypeptide in the mixture to be resolved can be specifically adsorbed to the substituted matrix following immunological recognition and binding, and the non-bound moieties (the contaminants) are then simply washed away. The complex between the insoluble immunoadsorbent and antigen is subsequently dissociated and the purified antigen fusion protein obtained.

EXAMPLE 6 Method to Detect a Polypeptide

The invention is directed to a method for detecting a polypeptide, which involves the step of tagging a target sequence which encodes a polypeptide with a nucleotide sequence which encodes an epitope independently of the reading frame of the nucleotide sequence, to produce a tagged target sequence which encodes a fusion polypeptide. The tagged target sequence is expressed in an expression system to produce said fusion polypeptide. The expression system is contacted with a sufficient amount of an antibody or reagent which is specific for the epitope, under conditions which produce a detectable signal indicating a reaction between the fusion polypeptide and the antibody or reagent. Immunoassay methods, which are well known and referenced herein, are used in the present method for detecting a fusion polypeptide. The immunoassay methods employ antibodies specific for the universal epitope or for the fusion polypeptide which comprises the universal epitope. The technique relies upon the immunological specificity of the antibody for a universal epitope, that is, upon the antibody's specific recognition and binding of the epitope, which occurs even in complex mixtures of diverse macromolecules. Competitive assays, two-site (sandwich) assays, immunoblotting, and immunocytochemistry are immunoassay methods used in the present method, either for detecting fusion polypeptides in complex mixtures or, by means of immunohistocytochemical methods, for detecting the expression and location of fusion polypeptides in cells or in multicellular structures.
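By way of illustration only, the reading-frame independence of the tag relied upon in the tagging step of Examples 5 and 6 may be sketched computationally. The sketch below assumes a repeating nucleotide unit whose length is not a multiple of three and which contains no stop codon in any frame. The repeat unit `HYPOTHETICAL_UNIT` and the helper names are assumptions made for this illustration and are not sequences or procedures disclosed herein. Under those assumptions, translating the repeated unit in each of the three forward reading frames yields the same peptide, circularly permuted.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame, n_codons):
    """Translate n_codons codons of dna, starting at offset frame (0, 1 or 2)."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]]
        for i in range(frame, frame + 3 * n_codons, 3)
    )


def frame_peptides(unit, copies=4):
    """Translate a tandem repeat of unit in all three forward frames.

    One full period (len(unit) codons) is translated per frame; four
    copies of the unit supply enough nucleotides for every frame.
    """
    if len(unit) % 3 == 0:
        raise ValueError("unit length must not be a multiple of three")
    dna = unit * copies
    return [translate(dna, frame, len(unit)) for frame in range(3)]


def is_rotation(a, b):
    """Return True if b is a circular permutation of a."""
    return len(a) == len(b) and b in a + a


# Hypothetical 10-nucleotide repeat unit, chosen for illustration only
# (not taken from the specification); it has no stop codon in any frame.
HYPOTHETICAL_UNIT = "GCAGAACCTG"

peptides = frame_peptides(HYPOTHETICAL_UNIT)
for frame, peptide in enumerate(peptides):
    print(frame, peptide, is_rotation(peptides[0], peptide))
```

Because the unit length shares no factor with three, every codon of the repeat is visited in every frame, so an antibody that recognizes the repeated peptide would be expected to recognize the translation product whichever frame the insertion happens to fall in.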

It is intended that all matter contained in the above description or shown in the accompanying drawings shall be interpreted in an illustrative and not limiting sense.

BIBLIOGRAPHY

1. Munro, S. and Pelham, H.R.B., 1984, EMBO Journal 3:3087-3093.

2. Wilson, I.A., et al., 1984, Cell 37:767-778.

3. Field, J., et al., 1988, Molec. Cell. Biol. 8(5): 2159-2165.

4. Munro, S. and Pelham, H.R.B., 1986, Cell 46:291-300.

5. Reisdorf, P., et al., 1993, Current Genetics 23:181-183.

6. Pati, U.K., 1992, Gene 114:285-288.

7. Surdej, P. and Jacobs-Lorena, M., 1994, BioTechniques 17(3):560-565.

8. Evan, G.I., et al., 1985, Mol. Cell. Biol. 5:3610-3616.

9. Field, J., et al., 1988, Molec. Cell. Biol. 8(5):2159-2165.

10. Wilson, I.A., et al., 1984, Cell 37:767-778.

11. Peptide Antigens, A Practical Approach, ed. G.B. Wisdom (1994), Oxford University Press, NY, NY.

12. Immunological Recognition of Peptides in Medicine and Biology, eds. N.D. Zegers, W.J.A. Boersma, and E. Claassen (1995), CRC Press, Boca Raton, FL.

13. Sugii et al. (1994)

14. Posnett, D.N. and J.P. Tam in Methods in Immunology, V. 176:146.

15. Sawin et al. 1992. J. Cell Science 101: 303-313.

16. Surdej and Jacobs-Lorena. 1994. Biotechniques 17: 560-565.

17. Ku and Schultz. 1995. Proc. Nat. Acad. Sci. USA 92:6552-6556.

18. O'Neil and Hoess. 1995. Curr. Opin. Struct. Biol. 5: 443-449.

19. Lu et al., 1995. Bio/Technology 13: 366-372.

20. Lebl et al., 1995. Biopolymers 37: 177-198.

21. Sepetov et al. 1995. Proc. Nat. Acad. Sci. USA 92: 5426-5430.

22. Klug and Famulok. 1994. Mol. Biol. Rep. 20: 97-107.

23. Nieuwlandt et al. 1995. Biochemistry 34: 5651-5659.

24. Williams et al. 1989. J. Cell Biol. 109: 235-245.

25. Jack, G.W., Mol. Biotechnol. 1:59-86 (1994).