

Title:
METHOD OF PRODUCING PROTEIN FRAGMENTS
Document Type and Number:
WIPO Patent Application WO/1983/004262
Kind Code:
A1
Abstract:
Cloning vectors are described that contain indicator genes, such as the beta-galactosidase gene, which have been inactivated by a frameshift. If a gene fragment with an open reading frame is inserted into such a vector, the indicator gene may be reactivated. DNA from uncharacterized organisms can be fragmented, and protein-coding fragments can be easily identified by using the vectors. Polypeptides produced from the identified fragments have a variety of uses, including vaccines. The method allows for the simple production of a variety of valuable products, including polynucleotides, cells that produce useful polypeptides, vaccination agents and other polypeptides, antibodies, and assay kits.

Inventors:
ROSBASH MICHAEL M (US)
Application Number:
PCT/US1983/000797
Publication Date:
December 08, 1983
Filing Date:
May 24, 1983
Assignee:
UNIV BRANDEIS (US)
International Classes:
C12N15/62; C12N15/64; G01N33/531; G01N33/68; (IPC1-7): C12N15/00; A61K45/02; C07C103/52; C07H21/04; C12N1/00; C12P21/00; G01N33/54
Foreign References:
GB2071671A (1981-09-23)
EP0011560A2 (1980-05-28)
Other References:
Cell, Vol. 20, June 1980 L. GUARENTE et al.: "Improved Methods for Maximizing Expression of a Cloned Gene : a Bacterium that Synthesizes Rabbit-beta-Globin", pages 543-553, see pages 549-552 (cited in the Application)
Science, Vol. 209, 19 September 1980 L. GUARENTE et al.: "A Technique for Expressing Eukaryotic Genes in Bacteria", pages 1428-1430, see pages 1428 and 1429
Gene, Vol. 19, 1982 J. VIEIRA et al.: "The pUC Plasmids, an M13mp7-Derived System for Insertion Mutagenesis and Sequencing with Synthetic Universal Primers", pages 259-268, see the entire document
Claims:
1. A cloning vector which contains a gene which has been modified by the addition or deletion of one or more nucleotides, said gene in the unmodified state being capable of coding for an indicator polypeptide, wherein said addition or deletion occurs between a first region of said gene which codes for a translational initiation site and a second region of said gene which codes for a polypeptide sequence which is necessary for said indicator polypeptide to have one or more indicator activities.
2. A cloning vector which can be modified to cause it to code for an indicator polypeptide by inserting into said cloning vector a polynucleotide which codes for a different polypeptide, and which does not contain a translational initiation site.
3. A cloning vector of Claims 1 or 2 wherein said indicator polypeptide has β-galactosidase activity.
4. A cloning vector of Claims 1 or 2 which may be cleaved by a first endonuclease such that a single cleavage occurs, said cleavage occurring in the N-terminal portion of said gene.
5. A cloning vector of Claim 4 which may be cleaved by a second endonuclease in two locations which are in proximity to and which bracket the cleavage site of said first endonuclease.
6. A method of identifying cells which contain DNA which contains exogenous gene segments, comprising the following steps: a. creating polynucleotide fragments; b. creating a plurality of plasmids by means of inserting said fragments into cloning vectors, into the N-terminal portion of at least one cloning vector gene which in the wild type codes for an indicator polypeptide which is capable of indicator activity despite the insertion of a polypeptide into the N-terminal portion of said indicator polypeptide, wherein said gene has been inactivated by means of a frameshift addition or deletion in the portion of said gene which codes for the N-terminal portion of said indicator polypeptide; c. inserting one or more of said plasmids into cells which contain a low level of said indicator activity; d. culturing said cells; e. identifying one or more indicator cells which contain a relatively high level of said indicator activity.
7. A cell which is derived from a cell identified by the method of Claim 6.
8. A method of creating DNA, comprising the following steps: a. creating polynucleotide fragments; b. creating a plurality of plasmids by means of inserting said fragments into cloning vectors, into the N-terminal portion of at least one cloning vector gene which in the wild type codes for an indicator polypeptide which is capable of indicator activity despite the insertion of a polypeptide into the N-terminal portion of said indicator polypeptide, wherein said gene has been inactivated by means of a frameshift addition or deletion in the portion of said gene which codes for the N-terminal portion of said indicator polypeptide; c. inserting one or more of said plasmids into cells which contain a low level of said indicator activity; d. culturing said cells; e. identifying one or more indicator cells which contain a relatively high level of said indicator activity; f. removing one or more of said plasmids from one or more of said indicator cells; and g. removing from one or more of said plasmids a polynucleotide segment which contains a polynucleotide fragment which was inserted into said plasmid.
9. A DNA molecule derived from an open reading frame DNA segment identified by the method of Claim 8.
10. A cell which contains a segment of DNA derived from an open reading frame DNA segment which was identified by the method of Claim 8.
11. A DNA segment which codes for a fusion protein comprising a polypeptide segment inserted into an indicator polypeptide in a manner which does not destroy the indicator activity of said indicator polypeptide.
12. A fusion protein comprising a polypeptide segment inserted into an indicator polypeptide in a manner which does not destroy the indicator activity of said indicator polypeptide.
13. A method of creating polypeptides, comprising the following steps: a. creating polynucleotide fragments; b. creating a plurality of plasmids by means of inserting said fragments into cloning vectors, into the N-terminal portion of at least one cloning vector gene which in the wild type codes for an indicator polypeptide which is capable of indicator activity despite the insertion of a polypeptide into the N-terminal portion of said indicator polypeptide, wherein said gene has been inactivated by means of a frameshift addition or deletion in the portion of said gene which codes for the N-terminal portion of said indicator polypeptide; c. inserting said plasmids into cells which contain a low level of said indicator activity; d. culturing said cells; e. identifying one or more indicator cells which contain a relatively high level of said indicator activity; and f. isolating polypeptides which are coded for by plasmids in, or derived from, said indicator cells.
14. A method of synthesizing polypeptides, comprising: a. identifying open reading frame segments of DNA by the method of Claim 6; b. determining the nucleotide sequence of said DNA; c. determining the polypeptide sequence that is coded for by said nucleotide sequence; d. synthesizing said polypeptide sequence or a portion thereof.
15. A method of creating antibodies capable of binding to polypeptides created by a cell or microorganism, comprising the following steps: a. creating polynucleotide fragments from the nucleic acid of said cell or microorganism; b. creating a plurality of plasmids by means of inserting said fragments into cloning vectors, into the N-terminal portion of at least one cloning vector gene which in the wild type codes for an indicator polypeptide which is capable of indicator activity despite the insertion of a polypeptide into the N-terminal portion of said indicator polypeptide, wherein said gene has been inactivated by means of a frameshift addition or deletion in the portion of said gene which codes for the N-terminal portion of said indicator polypeptide; c. inserting said plasmids into cells which contain a low level of said indicator activity; d. culturing said cells; e. identifying one or more indicator cells which contain a relatively high level of said indicator activity; f. isolating polypeptides which are coded for by plasmids in, or derived from, said indicator cells; g. administering said polypeptides, or fragments thereof, to an animal or cell in a manner such that said polypeptides, or fragments thereof, are capable of acting as antigens which induce an immune response in said animal or cell which results in the production by said animal or cell of antibodies which are capable of binding to said polypeptides or fragments thereof; and h. selecting, from said antibodies, one or more antibodies that are capable of binding to one or more polypeptides that were coded for by one or more of said polynucleotide fragments.
17. A method for assaying a fluid to determine whether it contains a microorganism, comprising the following steps: a. creating antibodies capable of binding to polypeptides created by said microorganism, by the method of Claim 15; b. contacting said antibodies with said fluid; and c. determining whether said antibodies bind to microorganisms in said fluid.
18. A method for assaying a fluid to determine whether it contains antibodies which bind to a cell or microorganism, comprising the following steps: a. creating polypeptides which are coded for by DNA of said cell or microorganism, by the method of Claim 13; b. contacting said polypeptides, or fragments thereof, with said fluid; and c. determining whether said polypeptides or fragments thereof are bound by antibodies in said fluid.
19. An assay kit containing polypeptides which are created by the method of Claim 13.
20. An assay kit containing antibodies which are created by the method of Claim 15.
Description:

Technical Field

This invention is in the fields of biology, medicine, and genetic chemistry.

Background Art

Proteins are created by a complex process that may be roughly divided into three stages. Those stages are commonly designated as transcription, processing, and translation, and they are summarized very briefly below.

The process of transcription involves the creation of a strand of RNA from a DNA template [1]. The DNA double helix separates temporarily into two strands, and an enzyme called "RNA polymerase" travels along one strand of DNA (the template strand), synthesizing RNA in the 5' to 3' direction. As it moves, the polymerase identifies each exposed base on the DNA strand. The four bases are adenine, thymine, cytosine, and guanine (designated as A, T, C, and G, respectively). The polymerase creates a strand of RNA with corresponding bases as follows:

DNA template    Transcribed in mRNA
A               U (uracil)
T               A
C               G
G               C
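The base-pairing rule in the table above can be sketched in Python. This is a minimal illustration, assuming the template strand is written 3' to 5' so that each template base maps directly to the RNA base opposite it (the same convention as the TAC-to-AUG example in the Best Mode section). The function names are hypothetical; the second helper accepts a template written 5' to 3' by reversing it first.

```python
# Base-pairing rule used by RNA polymerase: template DNA base -> RNA base.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_3to5(template: str) -> str:
    """Transcribe a DNA template written 3'->5'; the result reads 5'->3'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template.upper())


def transcribe_5to3(template: str) -> str:
    """Transcribe a DNA template written 5'->3' by reversing it first."""
    return transcribe_3to5(template[::-1])


if __name__ == "__main__":
    print(transcribe_3to5("TAC"))  # AUG
    print(transcribe_5to3("CAT"))  # AUG
```

Both calls describe the same template strand, once written in each direction, and both yield the AUG start codon.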

The length of the RNA strand created by the polymerase is controlled by transcriptional initiation and termination sites and by other factors which are not entirely understood.

The strand of RNA thus created is called the "primary transcript." In eukaryotic cells, it is processed or "edited" before it leaves the nucleus [2]. This is accomplished by a complex series of steps, some of which are not completely understood. One step of this process involves RNA splicing. Briefly, certain segments are removed from the RNA strand, and the remaining segments are reconnected into a shorter strand which is then suitable for translation into protein. An "intron" refers to a portion of a gene which is transcribed into RNA but is deleted during processing; an intron is not translated into protein. An "exon" refers to a portion of a gene which is conserved throughout the RNA processing stage. After the intron-complementary segments have been deleted, the exon-complementary segments are reconnected to each other to form mRNA. The edited RNA leaves the nucleus and is translated into protein.
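The splicing step described above amounts to deleting the intron segments and joining the exons in order. The sketch below assumes the intron boundaries are already known and supplied as hypothetical half-open index ranges; in a real cell the boundaries are set by the splicing machinery, not by coordinates, and the function name is illustrative only.

```python
from typing import List, Tuple


def splice(primary_transcript: str, introns: List[Tuple[int, int]]) -> str:
    """Remove introns (half-open [start, end) ranges) and join the exons in order."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end < start:
            raise ValueError("introns must be valid, non-overlapping ranges")
        exons.append(primary_transcript[position:start])  # exon before this intron
        position = end  # skip over the intron
    exons.append(primary_transcript[position:])  # final exon
    return "".join(exons)


# Hypothetical transcript: exon AUGGCC, intron GUAAGU, exon CUGUAA.
print(splice("AUGGCCGUAAGUCUGUAA", [(6, 12)]))  # AUGGCCCUGUAA
```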

The process of translation [3] involves the sequential connection of amino acids by means of peptide bonds, as shown in the following example reaction:

$$\mathrm{H_2N{-}CHR_1{-}COOH} + \mathrm{H_2N{-}CHR_2{-}COOH} \longrightarrow \mathrm{H_2N{-}CHR_1{-}CO{-}NH{-}CHR_2{-}COOH} + \mathrm{H_2O}$$

where $R_1$ and $R_2$ represent organic moieties. Only 20 different moieties are commonly used to create the proteins in all living matter. Every protein or other polypeptide has an end which may be called an "amino terminus," an "N-terminus," or a "left-hand end." The other end of the polypeptide chain may be called the "carboxyl terminus," the "C-terminus," or the "right-hand end."


As used herein, the N-terminus of a polypeptide refers to either of the following: a. a methionine or formyl methionine molecule which is coded for by an AUG start codon, or b. an amino group or derivative thereof which is at the end of a polypeptide.

If a protein is divided into two portions, they may be regarded as the "N-terminal portion" (which contains the N-terminus) and the "C-terminal portion." Every indicator polypeptide (as defined below) contains one or more amino acids or amino acid sequences which are essential for the proper functioning of the polypeptide. Such amino acids are referred to herein as an "indicator sequence." As used herein, the "N-terminal portion" of a polypeptide refers to one or more amino acids which are located between the N-terminus and an indicator sequence of the polypeptide. Similarly, the "N-terminal portion" of a gene refers to a polynucleotide sequence which codes for an N-terminal portion of a polypeptide.


The sequence of amino acids in a protein is very important to the proper functioning of the protein. The amino acid sequence is based upon the genetic code, which is essentially the same for all types of cells. The bases in a strand of mRNA are divided into groups of three; each group of three is called a "codon." Using the four bases of RNA, there are 64 possible codon sequences. Each sequence corresponds either to a single type of amino acid or, as described below, to a termination signal. The entire genetic code has been identified [4]; for example, the sequence UCA (specified in the 5' to 3' direction) codes for the amino acid serine. More than one codon sequence may code for a single amino acid; for example, four different codons (CCU, CCC, CCA, and CCG) code for proline.

One codon, AUG, is commonly called the "start codon" or a "translational initiation site." When a strand of mRNA passes through a ribosome, the process of peptide bonding does not commence until an AUG sequence is recognized. At that point, a methionine molecule (which is an amino acid) becomes the N-terminus of the protein that is being created. The next three nucleotides of the mRNA are read together as the next codon, and the corresponding amino acid is connected by a peptide bond to the methionine. The AUG start codon determines the "reading frame" or "register" of the mRNA; in other words, the AUG start sequence determines how the nucleotides will be categorized into groups of three, each group of three serving as a codon.
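The codon rules in the two preceding paragraphs can be sketched as a lookup table plus a reader that starts at an AUG. This is a simplified model, not a description of ribosome behavior: it assumes translation begins at the first AUG in the string (real start-site selection also depends on the surrounding sequence), it uses one-letter amino acid codes (S for serine, P for proline, M for methionine, * for a stop codon), and the helper names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG. "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list:
    """All codons that code for a given one-letter amino acid (synonymous codons)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def translate_from_start(mrna: str) -> str:
    """Find the first AUG, then read successive codons until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no translational initiation site
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # the AUG fixes the reading frame
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(CODON_TABLE["UCA"])                       # S
print(codons_for("P"))                          # ['CCU', 'CCC', 'CCA', 'CCG']
print(translate_from_start("GGAUGUCACCUUAAGG"))  # MSP
```

The two leading G bases are skipped because reading only begins at AUG; translation then stops at UAA.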

Three of the 64 codon sequences are termination signals, commonly called "stop" codons. Those three codon sequences are UAA, UAG, and UGA. When a ribosome is translating a strand of mRNA, it continues adding amino acid molecules to the protein until it reaches a stop codon. It then releases the protein, and the amino acid which was coded for by the preceding codon becomes the C-terminus of the protein. A segment of DNA which does not contain any stop codons in a given reading frame is referred to herein as an "open reading frame." Stop codons do not occur within the coding regions of exon segments in the proper reading frame; otherwise, they would cause a premature truncation of the protein being translated, which might impair or destroy the functioning of the protein. The only in-frame stop codon in the coding exons is the one at the 3' end of the coding sequence, which terminates translation.
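The definition of an "open reading frame" above can be checked mechanically by scanning a segment for stop codons, one frame at a time. A minimal sketch with hypothetical function names, working on RNA letters (for DNA, the stop codons would be written TAA, TAG, and TGA):

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def has_stop_in_frame(rna: str, frame: int) -> bool:
    """True if any complete codon in the given frame (0, 1, or 2) is a stop codon."""
    return any(rna[i:i + 3] in STOP_CODONS for i in range(frame, len(rna) - 2, 3))


def open_frames(rna: str) -> list:
    """Frames in which the segment contains no stop codon, i.e. is an open reading frame."""
    return [frame for frame in range(3) if not has_stop_in_frame(rna, frame)]


print(open_frames("GAUCAUGUAGCA"))  # [0, 2]
```

For the normal RNA of the deletion example that follows, frames 0 and 2 are open, while frame 1 (AUC/AUG/UAG) runs into the UAG stop codon.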

Stop codons may be created by various types of mutation. For example, if a gene suffers a deletion of a nucleotide, the translation of the sequence downstream of the deletion can be totally altered, as indicated in the following example deletion:

Normal protein:    Asp  His  Val  Ala
Normal RNA:        GAU/ CAU/ GUA/ GCA/
Altered RNA:       GAU/ AUG/ UAG/ CA
Altered protein:   Asp  Met  term

Stop codons are presumed to occur randomly within non-coding DNA. They are also presumed to be created randomly when a gene is mutated or translated in the wrong frame.
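The deletion example can be reproduced with a small translation routine. This sketch assumes the standard genetic code and reads complete codons from the first base, returning three-letter names to match the example ("term" marks the stop codon); the function name is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
THREE_LETTER = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr term".split(),
))


def translate_in_frame(rna: str) -> list:
    """Translate complete codons from position 0, stopping after the first stop codon."""
    residues = []
    for i in range(0, len(rna) - 2, 3):
        name = THREE_LETTER[CODON_TABLE[rna[i:i + 3]]]
        residues.append(name)
        if name == "term":
            break
    return residues


normal = "GAUCAUGUAGCA"
altered = normal[:3] + normal[4:]  # delete the C at position 4 (1-based)
print(translate_in_frame(normal))   # ['Asp', 'His', 'Val', 'Ala']
print(translate_in_frame(altered))  # ['Asp', 'Met', 'term']
```

A single deleted base shifts every downstream codon, so the altered RNA encounters UAG after only two residues.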


Beta-galactosidase

In general, enzymes are biochemically active proteins which catalyze specific biochemical reactions. For example, the enzyme β-galactosidase (β-G) cleaves lactose (which is a disaccharide, i.e., a sugar molecule with two carbohydrate rings) into two monosaccharides, galactose and glucose, as shown in the following formula:

$$\text{lactose} + \mathrm{H_2O} \xrightarrow{\beta\text{-G}} \text{galactose} + \text{glucose}$$

The enzyme β-G occurs in the commonly used bacterium Escherichia coli. It is coded for by the β-G gene, commonly called the lac-z gene.

Mutant strains of E. coli which are lac-z⁻, i.e., in which the β-G gene is not functional, are commonly used. Such cells can be grown easily using nutrients, such as glucose, which do not require metabolism by the β-G enzyme.

It is simple and convenient to distinguish lac-z⁺ cells from lac-z⁻ cells by growing the cells on indicator plates which are commercially available, such as "lactose MacConkey" indicator plates. Both z⁺ and z⁻ cells can grow on such plates. Colonies of cells that have high levels of β-G activity (more than about 1000 units/cell) turn red. Colonies that have low levels of β-G activity are colorless.


Indicator plates may also contain one or more selective agents, for example an antibiotic such as ampicillin. An indicator plate which contains ampicillin allows for the growth of E. coli cells which contain the enzyme β-lactamase. This provides a convenient method of selecting cells which contain plasmids that code for β-lactamase.
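The screening logic described above (ampicillin selects for β-lactamase, and the indicator color reports β-G activity) can be modeled as a small decision function. This is a toy sketch: the 1000-unit threshold is taken from the text and treated as a sharp cutoff, which real plates are not, and the class, function, and colonies in the example are hypothetical.

```python
from dataclasses import dataclass

# Threshold quoted in the text for a red colony on lactose MacConkey plates (toy cutoff).
RED_THRESHOLD_UNITS = 1000


@dataclass
class Colony:
    name: str
    has_beta_lactamase: bool  # e.g. carries a plasmid with the beta-lactamase gene
    beta_gal_units: float     # beta-galactosidase activity per cell


def plate_outcome(colony: Colony, ampicillin: bool = True) -> str:
    """Return 'no growth', 'red', or 'colorless' for a colony on an indicator plate."""
    if ampicillin and not colony.has_beta_lactamase:
        return "no growth"  # the selective agent kills cells lacking beta-lactamase
    return "red" if colony.beta_gal_units > RED_THRESHOLD_UNITS else "colorless"


for c in [Colony("no plasmid", False, 5),
          Colony("vector only", True, 5),
          Colony("activated insert", True, 3000)]:
    print(c.name, "->", plate_outcome(c))
```

Combining selection and indication on one plate means only plasmid-bearing cells grow, and among those only the β-G-positive ones turn red.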

The β-G enzyme often remains functional even if the N-terminus is removed and replaced by another polypeptide. It has been shown that an N-terminal fragment which contains up to about 40 amino acids can be replaced by a wide variety of other polypeptides without seriously affecting the β-G activity of the remaining C-terminal fragment of the β-G enzyme [5]. This allows for the creation of a fusion protein with β-G activity by means of creating a hybrid or "chimeric" gene using recombinant DNA techniques or other genetic methods.

Gene Modification

Genetic recombination has been studied intensively in recent years. Various methods have been described in numerous articles [6] and several patents [7]. Several techniques of particular interest to this invention are described below; however, these are not the only techniques which can be utilized in the method of this invention. Other techniques which are known to those skilled in the art, or which are hereafter discovered, may be used to perform the methods of this invention.


A double-helical strand of DNA can be cut ("cleaved" or "opened") by an enzyme called an "endonuclease" (i.e., an enzyme which is capable of breaking the phosphodiester bonds in both strands of a DNA double helix). Such nucleases often recognize a specific sequence of base pairs. For example, two such nucleases cut DNA in the following manner:

The nuclease designated "SmaI" creates a "flush" end; the nuclease designated "BamHI" creates a "sticky" or "cohesive" end. A wide variety of endonucleases are known to exist, most of which have been correlated with a sequence of base pairs (usually from four to six base pairs in a specific sequence) that serve as cleavage sites.
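How a recognition sequence and its cut position determine the fragments can be sketched as follows. The recognition sequences and top-strand cut offsets used here (SmaI, CCC^GGG; BamHI, G^GATCC) are the standard ones for these enzymes rather than values given in the text. Only the top strand is modeled, so the 4-base BamHI overhang on the bottom strand is not shown, and the example sequence and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Recognition sequence and top-strand cut offset for the two enzymes named in the text.
# SmaI: CCC^GGG (flush end); BamHI: G^GATCC (sticky end with a 5'-GATC overhang).
ENZYMES: Dict[str, Tuple[str, int]] = {
    "SmaI": ("CCCGGG", 3),
    "BamHI": ("GGATCC", 1),
}


def cut_positions(dna: str, enzyme: str) -> List[int]:
    """Top-strand cut positions for every (possibly overlapping) recognition site."""
    site, offset = ENZYMES[enzyme]
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + offset)
        start = dna.find(site, start + 1)
    return positions


def digest_linear(dna: str, enzyme: str) -> List[str]:
    """Top-strand fragments of a linear molecule after complete digestion."""
    cuts = [0] + cut_positions(dna, enzyme) + [len(dna)]
    return [dna[a:b] for a, b in zip(cuts, cuts[1:])]


example = "AAGGATCCTTCCCGGGAA"  # hypothetical sequence with one site for each enzyme
print(digest_linear(example, "BamHI"))  # ['AAG', 'GATCCTTCCCGGGAA']
print(digest_linear(example, "SmaI"))   # ['AAGGATCCTTCCC', 'GGGAA']
```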

If desired, it is possible to convert a sticky end into a flush end in at least two different ways. One way is to add nucleotides to the cleaved DNA with DNA polymerase, which is able to supply the "missing" bases, as shown in the following example:
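The fill-in reaction can be sketched on an aligned two-line picture of a DNA end. Assumptions: the top string is read 5' to 3' and the bottom 3' to 5', unpaired positions are spaces, and, as with DNA polymerase, only recessed 3' ends are extended, so a 5' overhang is filled in but a 3' overhang is left alone. The BamHI ends in the example and the function name are hypothetical.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def fill_in(top: str, bottom: str) -> Tuple[str, str]:
    """Extend recessed 3' ends opposite 5' overhangs (top 5'->3', bottom 3'->5', aligned)."""
    if len(top) != len(bottom):
        raise ValueError("strands must be aligned to the same width")
    top_list, bottom_list = list(top), list(bottom)
    # The top strand's 3' end is on the right: fill trailing gaps opposite bottom bases.
    i = len(top_list) - 1
    while i >= 0 and top_list[i] == " " and bottom_list[i] != " ":
        top_list[i] = COMPLEMENT[bottom_list[i]]
        i -= 1
    # The bottom strand's 3' end is on the left: fill leading gaps opposite top bases.
    j = 0
    while j < len(bottom_list) and bottom_list[j] == " " and top_list[j] != " ":
        bottom_list[j] = COMPLEMENT[top_list[j]]
        j += 1
    return "".join(top_list), "".join(bottom_list)


# Right-hand fragment of a BamHI cut: the top strand carries a 5'-GATC overhang.
print(fill_in("GATCCAA", "    GTT"))  # ('GATCCAA', 'CTAGGTT')
```

After fill-in, both strands have the same length, so the former sticky end is now flush.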


Another method is to contact the cleaved DNA with an "adapter," which is a sequence of nucleotides in a prearranged order. For example, if the adapter sequence GATCCCCGGG is mixed in solution, it will anneal to itself to form the following double-helical fragment:

5'-GATCCCCGGG-3'
3'-    GGGCCCCTAG-5'

The fragment will ligate with the sticky ends created by the nuclease BamHI to create the following sequence:

5'-...GGATCCCCGGGGATCC...-3'
3'-...CCTAGGGGCCCCTAGG...-5'
      BamHI  SmaI  BamHI

This sequence contains a SmaI cleavage site which is surrounded or "bracketed" by BamHI cleavage sites in both directions.
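The adapter construction can be checked in a few lines. This sketch uses the adapter sequence GATCCCCGGG from the text; the vector sequence and function names are hypothetical, and only the top strand of the ligation product is tracked. It confirms two points made above: the last six adapter bases (CCCGGG) are their own reverse complement, which lets the adapter anneal to itself, and inserting the adapter at a BamHI cut yields a SmaI site flanked by two BamHI sites.

```python
ADAPTER = "GATCCCCGGG"  # adapter sequence given in the text
BAMHI_SITE, SMAI_SITE = "GGATCC", "CCCGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, site: str) -> list:
    """Start indices of every (possibly overlapping) occurrence of site."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def insert_adapter_at_bamhi(vector_top: str) -> str:
    """Top strand after ligating the adapter into the first BamHI cut (G^GATCC)."""
    cut = vector_top.index(BAMHI_SITE) + 1
    return vector_top[:cut] + ADAPTER + vector_top[cut:]


product_top = insert_adapter_at_bamhi("TTGGATCCAA")  # hypothetical vector
print(product_top)                         # TTGGATCCCCGGGGATCCAA
print(find_all(product_top, BAMHI_SITE))   # [2, 12]
print(find_all(product_top, SMAI_SITE))    # [7]
```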

Several plasmids with modified β-G genes have been created. For example, L. Guarente et al. have reported [8] three plasmids (pLG200, pLG300, and pLG400) with carboxyl-terminal regions of lac z genes fused to portions of lac i genes, with a HindIII cleavage site near the 5' end of the lac i gene. The modified β-G genes are devoid of a start codon. If a gene fragment with a start codon and an open reading frame is inserted into the HindIII site, the chimeric gene can be translated into protein with β-G activity. The purpose of this work was to evaluate the effects of nucleotide sequences that preceded the β-G gene. However, the reported plasmids would not be suitable for the methods of the present invention.


Vaccines

There are at least four different types of proteins: enzymes, antibodies, receptors, and structural proteins. As discussed above, enzymes catalyze certain biochemical reactions. By contrast, the primary purpose of a structural protein is to surround and protect something else, or to preserve a spatial configuration. Structural proteins usually exist at or near the surface of a cell or virus. For example, a typical virus comprises a strand of RNA or DNA which is encapsulated within a shell formed by one or more structural proteins. When a mammal is infected with virus particles, the mammalian body responds with a complex process that is not completely understood, referred to as the immune response. This process involves antibodies, which are complex proteins that attach to the structural proteins in the viral coating. In this manner, antibodies can impede and inactivate viral particles. Any substance which induces the formation of antibodies when injected into a mammal is referred to as "immunogenic."

Vaccination is a process used to help people and animals resist viral infection. In a typical vaccination, a person or animal is injected with a foreign substance which contains a protein. In most cases, the vaccination protein is in the form of an entire virus particle which has been treated by heat, chemicals, or other methods to render the virus non-pathogenic. Although the virus particle is no longer pathogenic, the particle contains surface proteins which are identical or very similar to the surface proteins of the pathogenic viral particles. The host animal responds to the non-pathogenic virus particle by means of an immune response, which involves the creation of antibodies that bind to the surface proteins of the non-pathogenic virus. Those antibodies are also capable of binding to the surface proteins of the pathogenic viruses.

The usefulness of a vaccination is due primarily to a biochemical process called an "anamnestic" response, derived from Greek words meaning "not forgotten." The antibodies that were created in response to the vaccination are decomposed within a period of days or weeks after they were created. However, once the body has "learned" how to recognize and respond to a foreign protein, it is capable of responding to the same protein much more rapidly and effectively if a second infection occurs.

A substantial amount of research has been devoted to creating effective vaccinating agents composed solely of protein [9]. A mammalian body may be able to recognize certain types of exogenous proteins (i.e., protein that was created by a different type of animal) and to generate antibodies that will bind to foreign proteins.

It is believed that a single protein molecule can usually be bound by a variety of antibodies, which are presumed to act upon different (possibly overlapping) regions of the protein. It is also believed that a single protein molecule can act as an immunogen causing the creation of a variety of antibodies. It is further believed that a protein fragment (i.e., a polypeptide) can act as an immunogen if the polypeptide is sufficiently long. This has led to attempts to synthesize protein fragments for possible use as antigens [10]. However, such attempts suffer from several limitations.

1. Before the polypeptide may be synthesized, its amino acid sequence must be determined.

2. The complexity of the synthesis increases as the length of the polypeptide increases.

3. Due to the complex folding of most proteins, it is difficult to determine which amino acids in a protein are exposed to antibodies and therefore which amino acid sequences will have the best immunogenic effect.

Disclosure of the Invention

This invention relates to a cloning vector with an inactivated gene. In its unmodified "wild-type" form, the gene codes for an indicator polypeptide, such as β-galactosidase (β-G). The gene in the cloning vector has a cleavage site which allows for the insertion of a DNA fragment into the gene. The gene has also been inactivated by an addition or deletion which causes a frameshift. The gene may be activated if a DNA fragment with the following characteristics is inserted into the cleavage site: a. an open reading frame, i.e., the absence of stop codons in the frame in which the fragment is read; and b. the proper number of nucleotides to cause a correcting frameshift. The first condition virtually ensures that the only types of lengthy DNA fragments that can activate the gene are exon segments (i.e., DNA segments that are transcribed into mRNA which is translated into protein) that are being read in the proper register. The second condition is presumed to be satisfied randomly with a probability of 1/3.
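The two conditions can be made quantitative. The sketch below assumes, as in the pMR100 embodiment described later, that the vector carries a 10-nucleotide frameshifting insertion, so a fragment restores the frame only if 10 plus its length is a multiple of 3 (a length of the form 3x - 1). It also assumes uniformly random sequence, as the Background section presumes, so each codon is a stop codon with probability 3/64. The function names are hypothetical.

```python
from fractions import Fraction

VECTOR_SHIFT = 10  # nucleotides inserted into the beta-G N-terminal region (pMR100)


def restores_frame(fragment_length: int, vector_shift: int = VECTOR_SHIFT) -> bool:
    """A fragment corrects the frameshift if the total insertion is a multiple of 3."""
    return (vector_shift + fragment_length) % 3 == 0


def fraction_restoring(max_length: int) -> Fraction:
    """Fraction of fragment lengths 1..max_length that restore the frame."""
    hits = sum(restores_frame(n) for n in range(1, max_length + 1))
    return Fraction(hits, max_length)


def prob_no_stop(n_codons: int) -> float:
    """Chance that n random codons contain no stop codon (3 of the 64 codons are stops)."""
    return (61 / 64) ** n_codons


print([restores_frame(n) for n in (80, 89, 104)])  # [True, True, True]
print(fraction_restoring(300))                     # 1/3
```

Under the random-sequence assumption, a 30-codon segment read in one frame is open with probability of only about 0.24, and a 100-codon segment with probability below 0.01, consistent with the statement that lengthy open fragments are very likely to be coding sequence.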

A large number of polynucleotide fragments may be created by fragmenting the genetic material of an organism of interest, such as a virus. The fragmentation may be performed using known techniques, such as sonication or shearing. If desired, the resulting fragments may be processed by electrophoresis, centrifugation, etc. Using techniques known to those skilled in the art, the DNA fragments may be inserted into numerous cloning vectors at the desired location in the inactivated gene. The resulting plasmids may be inserted into suitable cells, using known techniques. The cells are cultured in a manner which allows for the identification of cells which contain activated genes, i.e., genes which code for one or more indicator polypeptides. For example, if the indicator polypeptide is β-G, the cells may be grown on indicator plates which cause lac-z⁺ colonies to exhibit a red color, while lac-z⁻ colonies are white. The lac-z⁺ cells are very likely to contain plasmids which contain inserted DNA segments that are of great interest.

The cells themselves are of great interest. They may be utilized for several purposes, including: a. creating proteins (such as vaccines and other antigens) that are coded for by the organism of interest; b. creating antibodies which bind to said proteins and to the organism of interest; c. identifying the amino acid sequence of such proteins; d. identifying the nucleotide sequence and reading frame of the DNA of the organism of interest.

Brief Description of the Drawings

Figure 1 illustrates the creation of pMR2, an intermediate cloning vector.

Figure 2 illustrates the creation of pMR100, the cloning vector of this invention.

Best Mode of Carrying Out the Invention

One preferred embodiment of this invention utilizes a cloning vector designated as pMR100. This vector contains a lac operon with the following characteristics:

1. functional sites for promotion, translation initiation, ribosome binding, and translation termination;

2. a TAC nucleotide sequence which is transcribed into an AUG start codon;

3. a C-terminal region of the β-G gene which has been inactivated by a frameshift caused by the insertion of a nucleotide segment (comprising ten nucleotides) into the N-terminal region of the β-G gene; and

4. a nucleotide sequence within the N-terminal region of the β-G gene which allows for a single SmaI cleavage site bracketed by two BamHI cleavage sites, as shown below:

Cloning vector pMR100 also contains the following characteristics:

1. a gene which codes for β-lactamase, thereby providing resistance to certain antibiotics such as ampicillin;

2. the absence of SmaI cleavage sites other than the site within the inactivated β-G gene.

DNA fragments were created by cleaving a plasmid, pBR322, with a restriction endonuclease, HaeIII. Each plasmid was cleaved into 22 fragments, ranging in length from 7 nucleotides to 587 nucleotides. These fragments were contacted with pMR100 cloning vectors which had been cleaved by SmaI, and with DNA ligase to cause the cleaved phosphodiester bonds to be reconnected. This created plasmids with inserted DNA, as well as religated vectors with no inserted DNA. The religated vectors with no inserted DNA remained lac-z⁻.
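Because pBR322 is circular, a complete digest gives one fragment per cleavage site. A minimal sketch of that bookkeeping, assuming the standard HaeIII recognition sequence GG^CC (not stated in the text) and a hypothetical toy sequence in place of the real pBR322 sequence, which is not reproduced here; the function name is illustrative only.

```python
from typing import List

HAEIII_SITE, HAEIII_OFFSET = "GGCC", 2  # HaeIII cuts GG^CC, leaving flush ends


def digest_circular(plasmid: str, site: str = HAEIII_SITE,
                    offset: int = HAEIII_OFFSET) -> List[int]:
    """Fragment lengths from complete digestion of a circular molecule.

    n sites in a circle give n fragments; with no site the circle stays intact.
    """
    n = len(plasmid)
    wrapped = plasmid + plasmid[: len(site) - 1]  # catch sites spanning the origin
    cuts = sorted({(i + offset) % n for i in range(n) if wrapped.startswith(site, i)})
    if not cuts:
        return [n]
    # Distance from each cut to the next one around the circle (a single cut gives n).
    return [(cuts[(k + 1) % len(cuts)] - cuts[k]) % n or n for k in range(len(cuts))]


print(digest_circular("AAGGCCTTGGCCAAAA"))  # [6, 10]
print(digest_circular("CCAAAAAAGG"))        # [10]  (one site spanning the origin)
```

Applied to the full pBR322 sequence, this kind of count is how the 22 HaeIII fragments reported above could be checked.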

The mixture of plasmids and religated vectors was then contacted with E. coli cells that are devoid of lac-z gene DNA. The cells were treated with calcium chloride to promote the uptake of plasmids and religated vectors. The transformed cells were then cultured to promote the replication of plasmids within the cells, causing numerous copies to be present in most cells. The resulting cells were then seeded onto lactose MacConkey indicator plates at relatively low densities. A small fraction of the seeded cells grew into colonies that were red, indicating that they contained β-G activity. Based upon the nucleotide sequence of the pBR322 plasmid, it was predicted that only 3 of the 22 HaeIII fragments were capable of activating the β-G gene in the pMR100 cloning vectors. Those three fragments all have (3x-1) nucleotides, where x is an integer; the actual fragment sizes were 80, 89, and 104 bases. In addition, none of the three fragments contains a stop codon or creates a stop codon when inserted into a SmaI cleavage site.

The colonies were analyzed to compare the experimental results with the predicted results. The results agreed perfectly; all three predicted fragments were found in red colonies, and no DNA fragments other than the three predicted fragments were found in red colonies.
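The prediction step can be sketched with a toy construct. Everything about the construct is hypothetical (it is not the pMR100 sequence): an in-frame upstream region beginning with ATG is followed by the 10-nucleotide adapter GATCCCCGGG, and the blunt SmaI cut lies seven bases into the adapter. A fragment is counted as activating only if the total insertion restores the frame and no in-frame stop codon appears at the junctions or within the fragment. A blunt fragment can ligate in either orientation, so each orientation would have to be checked separately; the function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Hypothetical toy construct, not the real pMR100 sequence.
UPSTREAM = "ATGACC"      # in-frame region from the start codon to the adapter
ADAPTER = "GATCCCCGGG"   # 10-nt frameshifting adapter described in the text
SMAI_CUT = 7             # CCC^GGG cut position inside the adapter


def is_activated(fragment: str) -> bool:
    """Would inserting this blunt fragment at the SmaI site restore an open beta-G frame?"""
    insertion = ADAPTER[:SMAI_CUT] + fragment + ADAPTER[SMAI_CUT:]
    if len(insertion) % 3 != 0:
        return False  # downstream beta-G codons stay out of frame
    orf = UPSTREAM + insertion
    codons = (orf[i:i + 3] for i in range(0, len(orf), 3))
    return not any(codon in STOP_CODONS for codon in codons)  # no stop at junctions or insert


print(is_activated(""))       # False: religated vector, stays lac-z negative
print(is_activated("AA"))     # True: 3x - 1 length, no stop codon
print(is_activated("GGTAA"))  # False: right length, but creates an in-frame TAA
print(is_activated("GGG"))    # False: wrong length
```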

The cloning vector and the methods of this invention can be utilized to create a wide variety of very useful products, including:

1. genetic material, such as plasmids and segments of DNA. Such genetic material can be removed from the cells that were used to identify it, and processed in a variety of ways. For example, the entire plasmid may be useful for insertion into other types of cells. Alternatively, the exogenous DNA segment may be removed from the plasmid, for example, by use of a restriction endonuclease which cleaves the plasmid at two sites that bracket the insertion cleavage site. Either the plasmid or the extracted DNA segment may be treated by genetic recombination techniques that are currently known or hereafter discovered.

2. cells which contain DNA segments of interest.

Of course, only a few cells are likely to be directly identified by the methods of this invention. However, such cells can be cultured into billions of cells, each of which will contain the same valuable genetic material, with very limited exceptions.

Any cell which is a descendant of, or is otherwise derived from, a cell which is identified by the method of this invention is within the scope of this invention.

3. polypeptides, such as antigenic proteins which can serve a variety of purposes, such as vaccination agents or agents for the production of polyclonal or monoclonal antibodies. Such polypeptides may be expressed by, or be capable of being removed from, the cells identified by the methods of this invention, or by any cells which descend from or are otherwise derived from cells identified by this invention.

4. antibodies which are capable of binding to the polypeptides that can be created by this invention.

5. assay kits utilizing the polypeptides or antibodies mentioned above.

As used herein, the term "indicator polypeptide" includes any protein, protein fragment, or other polypeptide which has activity or other characteristics that allow cells which produce such polypeptides to be distinguished from (1) cells which do not produce such polypeptides, or (2) cells which produce polypeptides which for any reason do not have the same characteristics as the indicator polypeptides. For example, an indicator polypeptide may comprise an enzyme which causes a color change or other chemical reaction when contacted with selected chemicals, an enzyme which increases the survivability of a cell in the presence of a selective agent, or an amino acid sequence which modifies the rate of transport, expression, or degradation of the polypeptide.


As used herein, the term "fragment" refers to a polynucleotide or polypeptide sequence that has been dissociated from the surrounding DNA or protein, respectively. By comparison, the term "segment" refers to any polynucleotide or polypeptide sequence, regardless of whether it has been dissociated from the surrounding DNA or protein.

As used herein, DNA includes complementary DNA (cDNA) which has been prepared from a template of RNA or single stranded DNA.

As used herein, a "cloning vector" is a DNA molecule which contains, inter alia, genetic information which ensures its own replication when transferred into a cell. Examples of such vectors commonly used are the DNA of bacteriophages, and plasmids (i.e., non-chromosomal loops of DNA that exist inside cells). For convenience, the term "plasmid" is used hereafter in a slightly different sense; it includes any cloning vector that contains a segment of inserted DNA.

An essential step in several of the claims comprises creating nucleotide fragments. There are several techniques known to those skilled in the art for performing this step. Such techniques include:

1. the fragmentation of DNA by means such as sonication, shearing, or contact with endonucleases or substances which cause bonds between nucleotides to break;

2. the creation of double-stranded DNA from a template comprising RNA or single-stranded DNA, by means such as contact with reverse transcriptase or DNA polymerase(s). Such steps might be performed either before or after the RNA or ssDNA is fragmented.

3. the creation of bonds between nucleotides, oligonucleotides, or polynucleotides, or their analogs, by enzymatic or non-enzymatic means, with or without the use of templates.
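As an in silico counterpart to fragmentation by endonucleases (item 1), the Python sketch below locates every recognition site on the top strand of a linear sequence and returns the resulting fragments. The function name and the `cut_offset` convention are illustrative assumptions; the sketch tracks only top-strand cut positions and does not represent the single-stranded overhangs left by staggered cutters.

```python
import re


def digest_linear(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut a linear top-strand sequence at every occurrence of `site`.

    `cut_offset` is the cut position within the site: 2 for Hae-III (GG^CC)
    or 1 for Bam-HI (G^GATCC).
    """
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


print(digest_linear("AAGGCCTTGGCCAA", "GGCC", 2))  # ['AAGG', 'CCTTGG', 'CCAA']
```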

Example 1: Construction of pMR100 Cloning Vector

The plasmid pLG400 (obtained from L. Guarente, Harvard Univ., described by L. Guarente et al., Cell 20:543, 1980) was digested with endonucleases Sma-I and Hind-III (Bethesda Research Laboratories (BRL), Bethesda, MD). The larger of the two resulting DNA fragments (approximately 8.8 kb long) was isolated by agarose gel electrophoresis and electroeluted in 0.05× buffer comprising 90 mM Tris, 90 mM boric acid, and 4 mM EDTA. The eluted DNA was bound to DEAE-Sepharose and eluted with 1 M NaCl, 10 mM Tris, 1 mM EDTA buffer. The fragment was precipitated in 95% ethanol, washed in 70% ethanol, and dried by lyophilization.

The plasmid pKB252 (obtained from L. Guarente, described by K. Backman et al., PNAS 73:4177, 1976) was digested by endonucleases Hae-III and Hind-III (obtained from BRL). The largest fragment (about 1.1 kb) was purified in the same manner as above. The two purified fragments were ligated together using T4 ligase (from BRL) overnight at 16°C, as diagrammed in Figure 1.
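The ligation above works because each purified fragment carries one blunt end (from Sma-I or Hae-III) and one Hind-III end. The Python sketch below is an illustrative model, not part of the described method: it treats an end as blunt or as a 5' overhang and calls two ends compatible when both are blunt or their overhangs are reverse complements. The `End` class and the enzyme constants are hypothetical names, and 3' overhangs are not modeled.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class End:
    enzyme: str
    overhang: str  # 5' overhang read 5'->3'; "" for a blunt end


def can_ligate(a: End, b: End) -> bool:
    """True if both ends are blunt or their 5' overhangs are complementary."""
    # "" == reverse_complement("") covers the blunt-to-blunt case.
    return a.overhang.upper() == reverse_complement(b.overhang)


SMA_I = End("Sma-I", "")            # CCC^GGG, blunt
HAE_III = End("Hae-III", "")        # GG^CC, blunt
HIND_III = End("Hind-III", "AGCT")  # A^AGCTT
BAM_HI = End("Bam-HI", "GATC")      # G^GATCC

print(can_ligate(SMA_I, HAE_III), can_ligate(HIND_III, HIND_III),
      can_ligate(SMA_I, HIND_III))  # True True False
```

Under this model the only heterologous joins available are blunt-to-blunt (Sma-I to Hae-III) and Hind-III to Hind-III, which closes the two fragments into a single circle.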

The ligated mixture was used to transform a strain of E. coli bacteria designated LG90. This bacterium is sensitive to ampicillin, and it does not contain a lac operon. The transformation was promoted by a standard technique using 50 mM calcium chloride.


A clonal colony that was lac-z⁺ (red) on MacConkey indicator plates (Difco, Detroit, MI) was selected and cultured. Plasmids were removed from the cells by lysing the cells with detergent, followed by several purification steps. They were analyzed by digestion with various endonucleases; the results confirmed the structure indicated in Figure 1. These plasmids were designated pMR2.


The plasmid pMR2 was linearized by Bam-HI (New England Biolabs, Beverly, MA), extracted in phenol-chloroform, precipitated in 95% ethanol, washed in 70% ethanol, and lyophilized. The resulting linear DNA was incubated for 3 hours at 22°C in the presence of T4 ligase and a 50-fold molar excess of a synthetic oligonucleotide with the following sequence: GATCCCCGGG. This results in a Sma-I cleavage site bracketed by two proximate Bam-HI cleavage sites. This 10-nucleotide insertion causes a frameshift (3x + 1, with x = 3) which inactivates the C-terminal portion of the lac-z gene. The resulting plasmid was designated cloning vector pMR100. It was used to transform LG90 cells, which tested lac-z⁻ on MacConkey indicator plates.
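The linker arithmetic can be checked on a toy sequence. The oligonucleotide GATCCCCGGG has a palindromic CCCGGG core, so two copies anneal into a duplex with a 5'-GATC overhang at each end that matches the Bam-HI overhang. The Python sketch below models a single-copy insertion (it ignores the linker concatemers that a molar excess could also produce); the function name and the toy flanking sequence are hypothetical.

```python
BAMHI_SITE = "GGATCC"      # cut G^GATCC, leaving a 5'-GATC overhang
SMAI_SITE = "CCCGGG"       # cut CCC^GGG, blunt
LINKER = "GATCCCCGGG"      # synthetic oligonucleotide from Example 1


def insert_linker_at_bamhi(seq: str, linker: str = LINKER) -> str:
    """Cut at the first Bam-HI site and ligate in one linker duplex.

    The duplex's overhangs fill the GATC gaps, so the top strand gains
    exactly the linker sequence at the cut position.
    """
    seq = seq.upper()
    pos = seq.find(BAMHI_SITE)
    if pos < 0:
        raise ValueError("no Bam-HI site in sequence")
    cut = pos + 1  # G^GATCC
    return seq[:cut] + linker + seq[cut:]


before = "AAGGATCCTT"  # hypothetical toy context around the Bam-HI site
after = insert_linker_at_bamhi(before)
print(after)                               # AAGGATCCCCGGGGATCCTT
print((len(after) - len(before)) % 3)      # 1, i.e. a (3x + 1) frameshift
```

The result reads Bam-HI, Sma-I, Bam-HI in that order, matching the arrangement described above.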

Example 2: Preparation of DNA Fragments

The plasmid ptkSVB was obtained from D. Livingston, Harvard University. The plasmid was digested with endonuclease Hind-III (from BRL). The smaller of the two resultant fragments was gel purified as described in Example 1. A 1169 bp segment was fragmented by sonication using a Heat Systems-Ultrasonics sonifier, setting 6, for 180 seconds, in 5 ml of 0.2 M NaCl, 10 mM Tris, 1 mM EDTA. The sonicated DNA was concentrated to 0.4 ml using a DEAE-Sepharose column as described in Example 1. This DNA was precipitated and washed in ethanol and lyophilized. The DNA was then treated with the nuclease BAL-31 (obtained from BRL) using .01 units/mg DNA for 5 minutes at 30°C in a volume of 10 μl per μg DNA. The DNA was then phenol extracted, ethanol precipitated, resuspended in 10 mM Tris, 1 mM EDTA, and size-fractionated on 10% acrylamide-TBE gels. Fragments of the desired size class (approximately 350-500 bp) were selected using adjacent DNA fragment markers on the same gel and electroblotted to DEAE-cellulose paper in 20 mM Tris, 1 mM EDTA for 2 hours at 100 mA. The size-fractionated DNA was eluted using 1 N NaCl-TE, precipitated and washed in ethanol, dried, and resuspended in TE.
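To illustrate why size fractionation is needed after sonication, the Python sketch below simulates random breakage of many copies of the 1169 bp segment and keeps only fragments in the 350-500 bp band. It rests on stated assumptions that are not in the original: breaks fall at uniformly random positions, each molecule receives exactly two breaks, and BAL-31 end trimming is ignored. The function names are hypothetical.

```python
import random


def random_fragment(length: int, n_breaks: int, rng: random.Random) -> list[int]:
    """Lengths of the pieces left by n_breaks breaks at distinct random positions."""
    breaks = sorted(rng.sample(range(1, length), n_breaks))
    bounds = [0] + breaks + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def size_select(fragments: list[int], low: int = 350, high: int = 500) -> list[int]:
    """Keep only fragments inside the gel band [low, high], in bp."""
    return [f for f in fragments if low <= f <= high]


rng = random.Random(0)
pool: list[int] = []
for _ in range(1000):  # many copies of the segment are sheared in parallel
    pool.extend(random_fragment(1169, 2, rng))
band = size_select(pool)
print(f"{len(band)} of {len(pool)} fragments fall in the 350-500 bp band")
```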

Example 3: Insertion of DNA Fragments Into Cloning Vector

The plasmid pMR100 was linearized by digestion with endonuclease Sma-I (from BRL) for 2 hours at 37°C. The 5' terminal phosphate groups were removed from the Sma-I ends to prevent religation, using calf intestinal alkaline phosphatase (laboratory stock), digesting at 37°C for 30 minutes. The phosphatased, linearized vector was phenol extracted, ethanol precipitated, and resuspended in 10 mM Tris, 1 mM EDTA at a final concentration of 0.5 μg/μl. The vector was incubated with ptkSVB fragments, prepared as described above, in a ratio of 500 ng vector to 200 ng fragments, in a 10 μl volume using unit T4 ligase overnight at 16°C.
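The 500 ng : 200 ng mass ratio corresponds to a much larger molar excess of fragments, because the fragments are far shorter than the vector. The Python sketch below makes that conversion with the common approximation of about 660 g/mol per base pair. The lengths are assumptions, not values stated in the example: about 9.9 kb for pMR100 (the ~8.8 kb and ~1.1 kb pieces of Example 1 plus the 10 nt linker) and 425 bp, the midpoint of the 350-500 bp size class.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def pmol_dsdna(ng: float, length_bp: float) -> float:
    """Picomoles of double-stranded DNA molecules in `ng` nanograms."""
    # ng * 1e-9 g / (length_bp * 660 g/mol) * 1e12 pmol/mol
    return ng * 1e3 / (length_bp * AVG_BP_MASS)


VECTOR_BP = 9_910  # assumed: ~8.8 kb + ~1.1 kb (Example 1) + 10 nt linker
INSERT_BP = 425    # assumed: midpoint of the 350-500 bp size class

vector_pmol = pmol_dsdna(500, VECTOR_BP)
insert_pmol = pmol_dsdna(200, INSERT_BP)
print(f"vector {vector_pmol:.3f} pmol, fragments {insert_pmol:.3f} pmol, "
      f"fragment:vector about {insert_pmol / vector_pmol:.1f}:1")
```

Under these assumptions the fragments are present in roughly nine-fold molar excess over the vector.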

Example 4: Transformation of Recipient Cells

A 2 μl aliquot of the ligation mixture described in Example 3 was put into an Eppendorf tube, resuspended in 200 μl of 10 mM Tris, 1 mM EDTA, 0.2 M NaCl, and phenol extracted once. The aqueous layer containing the plasmids was mixed with 10 μg of yeast tRNA carrier (from Gibco, NY), precipitated and washed in ethanol, dried, and resuspended in 5 μl of 10 mM Tris, 10 mM MgCl2, 10 mM CaCl2. This mixture was used to transform 0.05 ml of LG90 cells using the 100 mM CaCl2 transformation method of Dagert and Ehrlich (Gene 6:23, 1979). The transformants were plated on MacConkey agar plates containing 25 μg/ml ampicillin (Sigma). Colonies of lac-z⁺ transformants were selected for further characterization after 24 hours of incubation at 37°C.

Example 5: Characterization of Plasmid DNA From lac-z⁺ Clones

Lac-z⁺ clones from Example 4 were subcultured in 1 ml L broth containing 50 μg/ml ampicillin. Plasmid DNA was extracted from the cells using the method described in Nucleic Acids Research 5:298, 1981. The DNA precipitate was resuspended in 25 μl of 10 mM Tris, 1 mM EDTA. Five μl of the DNA was digested with .2 units of BamHI in a 20 μl reaction volume for 60 minutes at 37°C. The digested DNA was treated with 1 μg/μl RNase (laboratory stock) for 5 minutes at 37°C and analyzed on 10% acrylamide-TBE gels, run at 300 V for about 90 minutes. The gels were stained in ethidium bromide for 20 minutes and photographed. The size of the DNA insert was compared to size standards run on the same gel. With one possible exception (which may be due to a previously undetected cleavage site), the results of over 50 experiments have correlated well with predicted results.
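Insert sizing by Bam-HI digestion can be predicted from a plasmid sequence. The Python sketch below, an illustration with a hypothetical function name and toy sequences, lists the fragment lengths produced by cutting a circular sequence at every recognition site, including a site that spans the arbitrary start of the sequence string. If the linker enters the Bam-HI site as a single copy, the Bam-HI fragment of an insert-free pMR100 would be the 10 bp linker stretch, so a blunt insert of n bp at the Sma-I site would yield a Bam-HI fragment of about n + 10 bp; this is an inference from the construction in Example 1, not a measured result.

```python
import re


def circular_fragment_sizes(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from cutting a circular sequence at every `site`."""
    seq = seq.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    wrapped = seq + seq[: len(site) - 1]
    pattern = f"(?={re.escape(site.upper())})"
    cuts = sorted({(m.start() + cut_offset) % n
                   for m in re.finditer(pattern, wrapped)})
    if not cuts:
        return [n]  # uncut circle
    # A single cut simply linearizes the circle (length n).
    return [((cuts[(i + 1) % len(cuts)] - c) % n) or n
            for i, c in enumerate(cuts)]


empty = "AAGGATCCCCGGGGATCC" + "T" * 30           # toy pMR100-like circle
with_insert = empty[:10] + "A" * 80 + empty[10:]  # 80 bp at the Sma-I cut (CCC^GGG)
print(sorted(circular_fragment_sizes(empty, "GGATCC", 1)))        # [10, 38]
print(sorted(circular_fragment_sizes(with_insert, "GGATCC", 1)))  # [38, 90]
```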

Example 6: Characterization of the Fusion Proteins

Selected lac-z⁺ clones were subcultured into 3 ml L broth with 50 μg/ml ampicillin and incubated at 37°C overnight. The cells were then centrifuged at 10,000 g for 5 minutes, and the supernatant was discarded. 150 μl of 1.2× Laemmli sample buffer was added to each pellet. Each pellet was resuspended by mixing and heated to 98°C for 3 minutes. The protein samples were then drawn into a 22 gauge needle 5 or 6 times to reduce sample viscosity.


The samples were analyzed by electrophoresis on 7.5% acrylamide SDS Laemmli gels, run at 200 V for approximately 3 hours. After electrophoresis, the gel was stained in 0.1% Coomassie blue for 60 minutes and destained in 10% acetic acid-50% methanol for about 3 hours. The gel was then allowed to swell in 10% acetic acid and then dried using vacuum and heat. The size of the fusion protein was compared to that of the protein produced in cells containing pMR100 with no inserted DNA.
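A rough expectation for the gel comparison can be computed from the insert length. The Python sketch below assumes an average residue mass of about 110 Da (a common approximation) and estimates the mass added relative to a hypothetical in-frame parent protein lacking both the 10 nt linker and the insert (for example, the pMR2 product); that reference point is an assumption, since the example compares against pMR100 without an insert. Function names are hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # rough mean mass of one amino-acid residue


def residues_added(insert_bp: int, linker_bp: int = 10) -> int:
    """Codons added in frame by the linker plus the insert."""
    total = insert_bp + linker_bp
    if total % 3:
        raise ValueError("insert does not restore the reading frame")
    return total // 3


def added_mass_kda(insert_bp: int, linker_bp: int = 10) -> float:
    """Approximate extra mass (kDa) relative to the in-frame parent protein."""
    return residues_added(insert_bp, linker_bp) * AVG_RESIDUE_DA / 1000.0


for bp in (80, 89, 104):
    print(bp, residues_added(bp), round(added_mass_kda(bp), 1))
```

On the same assumptions, inserts from the 350-500 bp size class of Example 2 would add roughly 13 to 19 kDa.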


Industrial Applicability

This invention has industrial applicability in the analysis of numerous microorganisms which have industrial uses, such as yeast used in brewing and bacteria used in sewage treatment.

Equivalents

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, numerous equivalents to the specific procedures described herein. Such equivalents are considered to be within the scope of this invention, and are covered by the following claims.


References

1. See, e.g., A. L. Lehninger, Biochemistry, 2nd edition, p. 89 et seq. (Worth Publishers, New York City, 1975); L. Stryer, Biochemistry, 2nd edition, p. 597 et seq. (W. H. Freeman & Co., San Francisco, 1981).

2. See, e.g., Stryer, supra note 1, at p. 702 et seq.

3. See, e.g., Lehninger, supra note 1, at p. 929 et seq.; Stryer, supra note 1, at p. 619 et seq.

4. See Lehninger, supra note 1, at p. 962; Stryer, supra note 1, at p. 629; U.S. Patent 4,322,499 (Baxter et al., 1982), column 3.

5. E. Brickman et al., J. Bacteriol. 139: 13-18 (1979).

6. See, e.g., L. Guarente et al., Cell 20: 543-553 (1980).

7. See, e.g., U.S. Patent 4,237,224 (Cohen et al., 1980); U.S. Patent 4,322,499 (Baxter et al., 1982).

8. L. Guarente et al., supra note 6, at p. 549 et seq.

9. See, e.g., H. L. Bachrach et al., J. Immunol. 115(6): 1636-1641 (1975); R. H. Meloen, J. Gen. Virol. 45: 761-763 (1979); M. Breindl, Virology 46: 962-964 (1971).

10. See Science 213: 623-628 (7 August 1981).