


Title:
EXPRESSION OF HETEROLOGOUS POLYPEPTIDES IN HALOBACTERIA
Document Type and Number:
WIPO Patent Application WO/1994/021789
Kind Code:
A1
Abstract:
This invention relates to the preparation and use of expression systems capable of producing heterologous polypeptides in halobacterial hosts.

Inventors:
TURNER GEORGE J
BETLACH MARY C
Application Number:
PCT/US1994/002388
Publication Date:
September 29, 1994
Filing Date:
February 28, 1994
Assignee:
UNIV CALIFORNIA (US)
International Classes:
C12N15/09; C07K14/215; C07K14/705; C12N1/21; C12N9/10; C12N15/74; C12P21/02; C12R1/01; C12R1/19; (IPC1-7): C12N15/11; C12N1/21; C12N15/00; C12P21/00
Other References:
U. BLASEIO ET AL.: "Transformation of Halobacterium halobium: Development of vectors and investigation of gas vesicle synthesis", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 87, September 1990 (1990-09-01), pages 6772 - 6776
PATENT ABSTRACTS OF JAPAN, unexamined applications, c field, vol. 8, no. 126, issued 1994, June 13 THE PATENT OFFICE JAPANESE GOVERNMENT, page 30 C 228; & JP,A,59-36 700 (MITSUBISHI KASEI KOGYO).
Claims:
CLAIMS
1. An expression vector useful for the production of heterologous polypeptide in a halobacterial host comprising: a) transcription and translation regulatory DNA for operable expression of DNA in the 3' position of said regulatory DNA; b) DNA encoding a heterologous polypeptide 3' of said regulatory DNA; and c) DNA encoding transcription and translation stop signals 3' of said heterologous DNA.
2. The vector according to Claim 1 having DNA encoding replication and selection capability for said halobacterial host.
3. The vector according to Claim 1 containing additional DNA encoding the presequence of bacteriorhodopsin gene between said regulatory DNA and said DNA encoding said heterologous polypeptide for expression of a fusion polypeptide of said presequence with said heterologous polypeptide.
4. The vector according to Claim 1 or 3 containing additional DNA encoding a C-terminal sequence of the bacteriorhodopsin gene and DNA operably encoding a unique protease cleavage site and a restriction site, in optional order, between said C-terminal sequence and said DNA encoding said heterologous polypeptide, said additional DNA being 3' of said DNA encoding said heterologous polypeptide.
5. The vector according to Claim 1 wherein said transcription and translation regulatory sequences and said transcription and translation stop signals are those of the bacteriorhodopsin gene.
6. The vector according to Claim 4 wherein said transcription and translation regulatory sequences and said transcription and translation stop signals are those of the bacteriorhodopsin gene.
7. The vector according to Claim 1 or 2 wherein said heterologous polypeptide is Type HM1 human muscarinic acetylcholine receptor.
8. The vector according to Claim 4 wherein said heterologous polypeptide is the catalytic subunit of aspartate transcarbamylase from Escherichia coli.
9. The vector according to Claim 5 wherein said heterologous polypeptide is the catalytic subunit of aspartate transcarbamylase from Escherichia coli.
10. A halobacterial host transformed with a vector according to Claim 1.
11. A halobacterial host transformed with a vector according to Claim 4.
12. A halobacterial host transformed with a vector according to Claim 5.
13. A method for producing a heterologous polypeptide in a halobacterial host comprising: a) obtaining the elements necessary for operable expression comprising: i) transcription and translation regulatory DNA for operable expression of DNA in the 3' position of said regulatory DNA; ii) DNA encoding a heterologous polypeptide 3' of said regulatory DNA; and iii) DNA encoding transcription and translation stop signals 3' of said heterologous DNA; b) operably assembling the elements of a); c) transforming a halobacterial host with said assembled elements; d) causing expression of said DNA encoding the heterologous polypeptide; e) isolating said heterologous polypeptide; and f) optionally further purifying said heterologous polypeptide.
14. The method according to Claim 13 wherein said elements include additional DNA encoding a C-terminal sequence of the bacteriorhodopsin gene and DNA operably encoding a unique protease cleavage site and a restriction site, in optional order, between said C-terminal sequence and said DNA encoding said heterologous polypeptide, said additional DNA being 3' of said DNA encoding said heterologous polypeptide.
15. The method according to Claim 13 wherein said transcription and translation regulatory sequences and said transcription and translation stop signals are those of the bacteriorhodopsin gene.
16. A method according to Claim 13 wherein said heterologous polypeptide is the catalytic subunit of aspartate transcarbamylase from Escherichia coli.
17. A method for producing a heterologous polypeptide in a halobacterial host comprising: a) causing expression of DNA encoding said heterologous polypeptide within an operable expression vector transformed into said halobacterium; b) isolating said heterologous polypeptide; and c) optionally further purifying said heterologous polypeptide.
18. A method for producing a heterologous polypeptide in a halobacterial host comprising: a) causing expression of DNA encoding said heterologous polypeptide and a C-terminal sequence of the bacteriorhodopsin gene within an operable expression vector transformed into a halobacterium, said C-terminal sequence being 3' of said DNA encoding the heterologous polypeptide; b) separating the membrane of said halobacterium after expression of said DNA encoding a heterologous polypeptide and bacteriorhodopsin C-terminal region; c) isolating said heterologous polypeptide; and d) optionally further purifying said heterologous polypeptide.
19. The method according to Claim 15 wherein said DNA further comprises additional DNA operably encoding a unique protease cleavage site between said DNA encoding said heterologous polypeptide and said DNA encoding said bacteriorhodopsin C-terminal sequence, said additional DNA being 3' of said DNA encoding said heterologous polypeptide.
Description:
EXPRESSION OF HETEROLOGOUS POLYPEPTIDES IN HALOBACTERIA

Field of the Invention

The present invention is directed to the preparation and use of a halobacterial expression system that is capable of producing soluble and transmembrane heterologous polypeptides that are not endogenous to said halobacterium.

Background of the Invention

Halobacteria are found in nature in evaporating salt water ponds under conditions of intense light and low oxygen saturation. They contain distinctive brightly colored pigments such as the orange-red pigment, bacterioruberin, or patches of "purple membrane". Halobacteria belong to a phylogenetically distinct group of prokaryotic organisms - the "archaebacteria" (Archaea) - that are as distantly related to the eubacteria as they are to the eukaryotes. Archaebacteria possess some attributes in common with the eukaryotes and the eubacteria, as well as characteristics that are uniquely archaeal. For example, the archaebacteria possess a eukaryotic-like transcription apparatus with a 7-12 subunit RNA polymerase which is immunologically related to eukaryotic RNA polymerase (1) and promoter structures are similar to those of RNA Pol II (2). In contrast, the archaebacteria have prokaryotic cellular morphology and 23S, 16S and 5S rRNAs with the genes encoding the rRNAs arranged into eubacterial-like operons (3). Notably, the archaebacteria are unique in their membrane composition.

Bacteriorhodopsin (BR) is found as the sole protein in specialized crystalline patches of the "purple membrane" in halobacteria. Synthesis of BR is induced by high light intensity and low oxygen tension and the patches of purple membrane can constitute up to 50% of the archaebacterium Halobacterium halobium cell surface area.

BR consists of a complex of one protein (bacterio-opsin) along with the chromophore retinal in a 1:1 stoichiometric ratio (4). This complex is embedded in the lipid matrix as seven transmembrane hydrophobic α-helices in a trimeric configuration (5). Retinal is covalently attached at lysine at position 216 approximately one-third of the way across the transmembraneous region of one of the α-helices (6). The complex of bacterio-opsin with retinal was named bacteriorhodopsin (BR). The so-called bop gene encodes the light-driven proton pump bacteriorhodopsin (BR) in H. halobium.

There has been some reported research on expression of endogenous polypeptides in halobacteria (7, 8 and 9).

Summary of the Invention

The present invention is directed to the preparation and use of an expression system for heterologous polypeptide production in a halobacterial host.

In a first aspect, such systems in their broadest context would include transcription and translation regulatory DNA, DNA encoding a heterologous polypeptide that is not endogenous to the halobacterial host and DNA encoding transcription and translation stop signals.

Preferably such systems would include DNA encoding the pre-sequence of bacteriorhodopsin such that the polypeptide which is expressed is attached to the pre-sequence, thus allowing the heterologous polypeptide to be properly targeted to the membrane and either inserted into or secreted across the membrane. Yet another preferred embodiment of the present invention uses the transcription and translation regulatory sequences and the translation and transcription stop sequences of the bacteriorhodopsin gene, either in the presence or absence of the bacteriorhodopsin pre-sequence. The use of the regulatory and stop sequences of the bacteriorhodopsin gene serves to allow high level expression of the heterologous polypeptide sequence.

In a second aspect, the present invention is also directed to utilizing the C-terminal domain of the bacteriorhodopsin polypeptide in order to enhance the separation of the mature heterologous polypeptide from the membrane of the halobacterial host following expression. In a preferred embodiment of this aspect, DNA encoding a unique protease site is introduced between said C-terminal sequence and the DNA encoding the heterologous polypeptide.

In a preferred embodiment of this aspect, high levels of expression of the heterologous polypeptide linked to the C-terminal region of bacteriorhodopsin are achieved by using DNA encoding the transcription and translation regulatory and stop sequences of the bacteriorhodopsin gene. A further preferred embodiment of the invention is directed to the use of the bacteriorhodopsin pre-sequence to enhance expression of the heterologous polypeptide linked to the C-terminal region of bacteriorhodopsin.

The invention is directed to such systems in all their equivalent aspects, including expression vectors, halobacterial hosts transformed with such vectors and methods for producing, isolating and optionally further purifying heterologous polypeptides using such expression vectors.

Detailed Description

The present invention has been described herein by disclosing the preferred embodiments and best mode. It will be understood, however, that having detailed the method first used by the present inventors to produce the heterologous polypeptide expression system in halobacterium, it will be apparent to those skilled in the art that one could make modifications within the general skill of the art to produce expression systems that differ in one or more ways from that originally described.

A) Brief Description of the Drawings

Figure 1 is a restriction map of the PstI/BamHI fragment containing the bacteriorhodopsin gene and about 400 bp of upstream sequences from Halobacterium halobium strain R1.

Figure 2 shows the nucleic acid sequence (SEQ ID NO:1) of the PstI/BamHI construct of Figure 1 containing the bacteriorhodopsin gene and about 400 bp of upstream sequences from Halobacterium halobium strain R1. Also shown is the amino acid sequence (SEQ ID NO:2) of the BR protein translation product.

Figure 3 shows the restriction map of pUBP2.

Figure 4 is a map of the secondary structure of the mature BR protein (SEQ ID NO:3).

Figure 5 is a restriction map of the PstI/BamHI fragment containing BR regulatory sequences and the gene for human muscarinic acetylcholine receptor (Type HM1) in pENDs-OM1.

Figure 6 shows the nucleic acid sequence (SEQ ID NO:6) of the PstI/BamHI fragment of Figure 5 containing the gene for human muscarinic acetylcholine receptor (Type HM1) of pENDs-OM1. Also shown is the amino acid sequence (SEQ ID NO:7) of HM1.

Figure 7 is a restriction map of the PstI/BamHI fragment containing the BR regulatory sequences and gene for human muscarinic acetylcholine receptor (Type HM1) in pENDs-OM2.

Figure 8 shows the nucleic acid sequence (SEQ ID NO:8) of the PstI/BamHI fragment of Figure 7 containing the gene for human muscarinic acetylcholine receptor (Type HM1) which lacks the i3 domain. The amino acid sequence (SEQ ID NO:9) of HM1 having a deleted i3 domain is shown.

Figure 9 is a restriction map of the PstI/BamHI fragment containing the BR regulatory sequences and the rat serotonin receptor (Type 1C) gene.

Figure 10 shows the nucleic acid sequence (SEQ ID NO:10) of the PstI/BamHI construct of Figure 9 containing the rat serotonin receptor gene and the amino acid sequence (SEQ ID NO:11) of the rat serotonin receptor.

Figure 11 is a Southern blot of DNA isolated from H. halobium Bop deficient strain L33 transformed with pUBP2 containing the rat serotonin receptor (Type 1C) gene. Lanes 1-10, 12-19, 21-24 and 27 contained DNA from strain L33 transformed with pUBP2 containing the PstI/BamHI fragment of Figs. 9 and 10 (SEQ ID NO:10). Lanes 11 and 25 are positive controls which contained purified plasmid DNA (i.e. pUBP2 containing serotonin receptor gene). Lane 29 contained DNA from strain L33. The arrow indicates the location of the PstI/BamHI fragment corresponding to serotonin DNA.

Figure 12 shows a Northern blot of total RNA isolated from H. halobium Bop deficient strain L33 transformed with pUBP2 containing the rat serotonin receptor gene. Lanes 2 and 5 contain RNA from wild type strain L33 transformed with the 1.2 kb PstI/BamHI fragment containing the bop gene in pUBP2 as a control. Lanes 1, 3 and 4 contain DNA from L33 transformed with the rat serotonin receptor gene. The 1.85 kb PstI/BamHI fragment of Figs. 9 and 10 was used as probe. The arrow shows the location of the rat serotonin receptor RNA.

Figure 13 is a restriction map of the PstI/BamHI fragment containing BR regulatory sequences and the human thrombin receptor gene.

Figure 14 shows the nucleic acid sequence (SEQ ID NO:12) of the PstI/BamHI fragment of Figure 13 containing the human thrombin receptor gene and the amino acid sequence (SEQ ID NO:13) of the human thrombin receptor.

Figure 15 shows the restriction maps of pβgbop, pEK17, pBATC, p1.2KbBop and pBRAT.

Figure 16 shows a restriction map of the PstI/BamHI fragment containing BR regulatory sequences, the bacterio-opsin gene and the gene encoding the Escherichia coli catalytic subunit of aspartate transcarbamylase.

Figure 17 shows the nucleic acid sequence (SEQ ID NO:14) of the PstI/BamHI fragment of Figure 16 containing the bacterio-opsin and the E. coli aspartate transcarbamylase genes and the amino acid sequence (SEQ ID NO:15) of the BR E. coli aspartate transcarbamylase fusion protein.

Figure 18 shows a Western blot of H. halobium transformed with pBRAT. Blots were probed with antibodies to the catalytic subunit of aspartate transcarbamylase. Lane 2 contains E. coli aspartate transcarbamylase. Lanes 6-9 and 11 contain protein from H. halobium transformed with pBRAT. The arrow in lane 8 indicates the position of the bacteriorhodopsin/aspartate transcarbamylase (BR/ATCase) fusion protein.

Figure 19 shows the localization of expression of the bacteriorhodopsin/aspartate transcarbamylase (BR/ATCase) fusion protein to the purple halobacterial cell membranes. Washed H. halobium whole cell membranes fractionated on sucrose density gradients (A) were electrophoresed on SDS-polyacrylamide gels and stained with Coomassie blue (B). Lanes in (B) contained the following protein samples: Molecular weight markers (lane 1); unfractionated total membranes from H. halobium strain L33 transformed with pBRAT (lane 2); purple membrane from H. halobium strain L33 transformed with a 1.2 Kb PstI/BamHI fragment containing the bop gene (lane 3) or with a 9 Kb genomic DNA fragment containing the bop gene (lane 4); total membranes from H. halobium strain L33 (lane 5); purple membrane from wild-type H. halobium strain R1 (lane 6); purple membrane of H. halobium strain L33 transformed with pBRAT (lanes 7-9).

B) Definitions

The term "expression vector" herein has a functional definition and includes vectors capable of expressing DNA sequences contained therein, where such sequences are operably linked to other sequences capable of effecting their expression. In the present specification, "vector" and "plasmid" are used interchangeably as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors capable of equivalent functions and which are or become known in the art.

By the term "operable" herein, and grammatical equivalents, is meant that the respective DNA sequences are operational and work for their intended purposes.

The term "heterologous polypeptide" herein refers to presently known or unknown polypeptides not endogenous to the host cell, or if endogenous to the host cell, are obtainable herein in amounts not achievable in native state. Included within the definition are the halobacterial non-retinal binding proteins. Examples of heterologous polypeptides include, but are not restricted to, polypeptides from eukaryotes, eubacteria, archaebacteria, synthetic polypeptides and polypeptides containing bioequivalent amino acid analogs. Further included are other members of the 7-transmembrane crossing family such as muscarinic acetylcholine receptor, serotonin receptor, thrombin receptor, β-adrenergic receptor, and the like. Heterologous polypeptides also include membrane proteins, for example, cystic fibrosis transmembrane conductance regulator, and soluble proteins, such as various enzymes (e.g. proteases and aspartate transcarbamylase). Each is used in accord with their known or determined function biologically and is adapted for such in accord with procedures generally known in the art.

By the term "DNA encoding heterologous polypeptide" is meant a DNA sequence coding for a polypeptide that is not endogenous to the host wherein it is expressed. Because of the high GC content (i.e. about 58-68%) of the genome of halobacteria, it is preferred that the DNA sequence encoding the heterologous polypeptide be in this range, although sequences with higher and lower GC content than that usually found in halobacteria can be used. For example, we have been successful in expressing Escherichia coli aspartate transcarbamylase, having a GC content of about 50%, as a fusion protein to the C-terminus of BR.
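The GC-content preference stated above can be checked numerically before a gene is chosen for expression. The following is a minimal Python sketch, not part of the specification; the function names and the toy sequence are hypothetical, and only the 58-68% range is taken from the text.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def in_preferred_range(seq: str, low: float = 58.0, high: float = 68.0) -> bool:
    """True if the GC content lies in the preferred range quoted in the text."""
    return low <= gc_content(seq) <= high


# Hypothetical toy sequence for illustration only, not from the specification.
example = "GGGCCCAATT"
print(gc_content(example), in_preferred_range(example))
```

A sequence outside the range is not excluded by the text; the check only flags whether a candidate gene sits in the preferred window.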

The term "transcription and translation regulatory DNA" and equivalents, in its broadest sense refers to a DNA sequence responsible for the dual transcription and translation elements of expression. In a preferred embodiment the regulatory DNA is that of the bacteriorhodopsin gene (from -364 to +41 relative to the RNA start site, Fig. 2 (SEQ ID NO:1)).

In an alternative embodiment, the regulatory DNA contains about 4000 bp of sequences (from about -4000 to +41) upstream of the bop gene and includes three other genes of the bop gene cluster, which include brp (13), bat (14) and blp (Gropp & Betlach, manuscript in preparation). Some or all of these genes may be regulatory genes.

By the term "transcription and translation stop signals" and equivalents, in its broadest scope is meant DNA which functions to terminate transcription and translation, respectively. It is preferred that the transcription and translation stop signals be those of the bacteriorhodopsin gene.

By the term "pre-sequence of bacteriorhodopsin gene" herein is meant a sequence of about 13 amino acids required to target bacteriorhodopsin to the membrane. The 13 amino acid pre-sequence is encoded by nucleotides +3 to +41 relative to the RNA start site depicted in Fig. 2 (SEQ ID NO:1).
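The coordinates quoted for the pre-sequence are consistent with its stated length: nucleotides +3 to +41 inclusive span 39 bases, which is exactly 13 codons. A short Python sketch of that arithmetic follows. It assumes the usual numbering convention in which there is no position 0 (the specification does not state this), and the function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length in nucleotides of an inclusive span given in +/- coordinates.

    Assumes there is no position 0 (-1 is followed directly by +1);
    this convention is an assumption, not stated in the specification.
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if end < start:
        raise ValueError("end must not precede start")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # skip the non-existent position 0
    return length


presequence_nt = span_length(3, 41)
codons, leftover = divmod(presequence_nt, 3)
print(presequence_nt, codons, leftover)
```

Under the same assumption, the regulatory region quoted as -364 to +41 would span 405 bases, in line with the "about 400 bp" of upstream sequence mentioned for Figure 1.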

By the term "halobacterium host" is meant strains belonging to Halobacterium, including species of extreme and moderate halophiles having a wild-type genotype. Examples of the extreme halophilic species having a wild-type genotype include Halobacterium saccharovorium (ATCC 29252), Halobacterium California (ATCC 38799), Halobacterium halobium (CCM 2090) and Halobacterium valismortis (ATCC 29-715). Wild-type moderate halophiles are exemplified by the species Halobacterium mediterannei (ATCC 33500). It may be preferred that the halobacterial host species is bacteriorhodopsin deficient. Bacteriorhodopsin deficient species are either wild-type, such as H. volcanii, or mutants, such as L33 (15), S9Flx3 (16), IV-8 (17) and IV-14 (17). Bacteriorhodopsin deficient mutants derived from strains which express purple membrane constitutively, such as L33, or inducibly, are useful for different applications. Depending on the nature of the upstream regulatory regions in the expression vector construct, inducible strains permit regulated expression whereas constitutive strains do not.

The term "restriction site" herein refers to a DNA sequence recognizable by an endonuclease as a site of DNA cleavage.

By the term "C-terminal sequence", "C-terminal region" and equivalents, is meant the polypeptide sequence at the C-terminus of bacteriorhodopsin; See Fig. 4.

By the term "unique protease site" is meant an amino acid sequence recognizable by a protease as the site of cleavage of the polypeptide wherein it is disposed and which is absent from the heterologous polypeptide expression product. In a preferred embodiment, the protease site (Ile-Glu-Gly-Arg) (SEQ ID NO:4) of factor Xa is used in view of the rarity of this sequence.
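The definition above requires that the protease site be absent from the heterologous polypeptide itself. A minimal Python sketch of that check follows, using the one-letter code IEGR for Ile-Glu-Gly-Arg; the function names and the test sequences are hypothetical.

```python
FACTOR_XA_SITE = "IEGR"  # Ile-Glu-Gly-Arg in one-letter amino acid code


def count_site(protein: str, site: str = FACTOR_XA_SITE) -> int:
    """Count occurrences (including overlapping ones) of a site in a protein."""
    protein = protein.upper()
    width = len(site)
    return sum(
        1
        for i in range(len(protein) - width + 1)
        if protein[i:i + width] == site
    )


def site_is_usable(heterologous_protein: str, site: str = FACTOR_XA_SITE) -> bool:
    """True if the site does not occur in the heterologous polypeptide,
    so that an engineered copy would be the only cleavage position."""
    return count_site(heterologous_protein, site) == 0


# Hypothetical toy sequences for illustration only.
print(site_is_usable("MKTAYIAKQR"), site_is_usable("MKIEGRLL"))
```

This sketch only tests for an exact match to the four-residue motif; it does not model protease specificity beyond that.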

C) Examples

1. Cloning the DNA sequence encoding the heterologous polypeptide into a halobacterial expression vector

i. Constructs for expression of membrane proteins

All constructions are assembled using standard molecular techniques (12) including PCR. Expression vectors can be prepared in a variety of conventional ways. Although others may be used, a preferred halobacterial cloning vector to be adapted into an expression vector is plasmid pUBP2 (Fig. 3) described by Blaseio et al. (7). The plasmid may be isolated using conventional techniques. For example, the plasmid may be purified using caesium chloride-ethidium bromide density gradients, electrophoresis from an agarose gel onto a dialysis membrane, use of commercially available chromatography columns for the separation of plasmids, such as Magic Minipreps DNA Purification System (Promega Corp., Madison, WI), etc.

The expression vectors which will be employed will normally include a marker which allows for selection of cells into which the DNA has been integrated, as against cells which have not integrated the DNA construct. An example of commonly used selection markers is antibiotic resistance. Two markers are available for selection of halobacteria, including resistance to novobiocin (8) and mevinolin (7). It is preferred that the marker used be that for mevinolin resistance; mevinolin is an HMG-CoA reductase inhibitor (7). This marker is present in the preferred cloning plasmid pUBP2 (Fig. 3).

To facilitate insertion of DNA sequences, plasmids will contain polylinker sequences containing various restriction sites. Several examples of polylinkers are known and available (12). A typical polylinker is polylinker 1 (12.3) (Fig. 3) which contains restriction sites for HindIII, SphI, MluI, XhoI, PstI, SalI, XbaI, BamHI, HindIII, XbaI and KpnI. Another typical polylinker is polylinker 2 (3.70) (Fig. 3) with restriction sites for SphI, EcoR5, SstI, SmaI and EcoRI.

The DNA sequence encoding the heterologous polypeptide is inserted such that it is placed downstream from a transcription and translation regulatory region containing a promoter and a ribosome binding site using standard techniques. It is preferred that the promoter used is inducible, allowing controlled expression of the heterologous polypeptide product. In a preferred embodiment of the invention, the transcription and translation regulatory sequences of the bacteriorhodopsin gene will be used. The bacteriorhodopsin gene may be isolated from the genome of halobacteria using appropriate restriction enzymes. Transcription and translation regulatory sequences of the bacteriorhodopsin gene are located in the region of -365 to +41 relative to the RNA start site of the bacteriorhodopsin sequence depicted in Fig. 2 (SEQ ID NO:1).

To effect appropriate termination of heterologous polypeptide synthesis, DNA sequences encoding transcription and translation stop signals are placed downstream of the inserted DNA sequence encoding the heterologous polypeptide sequence using well known techniques (12). Preferably, the sequences downstream of the bacteriorhodopsin gene (Fig. 2) (SEQ ID NO:1), which include the translational stop codon (TGA) followed by ~80 bp that include the transcriptional termination signal, are employed as stop signals.

Where it is advantageous to produce a heterologous transmembrane polypeptide which is targeted to the halobacterial membrane, DNA encoding the heterologous polypeptide is ligated downstream of DNA encoding the pre-sequence of BR.
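The 5' to 3' order of elements described in the preceding paragraphs (regulatory DNA, optional BR pre-sequence, heterologous coding DNA, stop signals) can be expressed as a simple ordering check. The Python sketch below is illustrative only: the element labels and the function name are hypothetical, and variants that add the BR C-terminal region or a protease site are not modeled.

```python
# Hypothetical labels for the cassette elements, listed 5' to 3'.
ORDER = ("regulatory", "presequence", "heterologous", "stop")
REQUIRED = ("regulatory", "heterologous", "stop")  # presequence is optional


def is_valid_cassette(elements) -> bool:
    """Check that required elements are present, unique and in 5'->3' order."""
    elements = list(elements)
    if any(e not in ORDER for e in elements):
        return False  # unknown element label
    if len(set(elements)) != len(elements):
        return False  # an element appears more than once
    if any(r not in elements for r in REQUIRED):
        return False  # a required element is missing
    ranks = [ORDER.index(e) for e in elements]
    return ranks == sorted(ranks)


print(is_valid_cassette(["regulatory", "presequence", "heterologous", "stop"]))
print(is_valid_cassette(["heterologous", "regulatory", "stop"]))
```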

The heterologous gene of interest may be cloned into the E. coli plasmid, pUC19 (20), along with BR regulatory sequences such that all cloned sequences will reside on a DNA fragment containing two unique restriction sites (choice of PstI, BamHI, SmaI). More specifically, the heterologous gene is ligated such that it is in frame with the BR pre-sequence, downstream of the bacteriorhodopsin regulatory sequences/promoter and upstream of the bacteriorhodopsin transcriptional and translational termination sequences. A specific unique protease site may be engineered into some constructions between the BR pre-sequence and the heterologous gene. A 1.2 kbp fragment containing the bop gene and ~370 bp of upstream sequences was isolated from H. halobium strain R1 DNA using PCR and cloned into the PstI/BamHI sites of pUC19 (denoted p1.2Kbbop) (Fig. 15B). Two endogenous AlwNI sites were removed from the cloned 1.2 kbp fragment: i) one site located 165 bp upstream of the bop gene start codon (SEQ ID NO:1) was removed by generating a G→T point mutation using the Kunkel method (29), and ii) the second AlwNI site located 7 bp upstream of the bop gene stop codon was removed using the Transformer Site-Directed Mutagenesis kit (Clontech Laboratories, Inc., Palo Alto, CA). Subsequently, a ~400 bp PstI/AlwNI fragment (denoted "bop 5' fragment") containing the bop upstream sequences, DNA encoding the BR presequence and the first four (extrahelical) residues of BR was isolated by PCR from the mutated 1.2 kbp fragment. Concurrently, a ~100 bp NotI/BamHI fragment (denoted "bop 3' fragment") containing DNA encoding six C-terminal residues of BR, the BR stop codon and the transcriptional termination sequences of BR (up to 44 bp downstream of the stop codon) was obtained from the 1.2 kbp bop gene fragment by preparative digestion and purification (Prep-A-Gene, Bio-Rad, Richmond, CA).

In addition, an endogenous AlwNI site located in pUC19 (position 1217) was removed using the Clontech Transformer kit and the mutated pUC19 was preparatively digested with PstI/BamHI and preparatively purified (denoted "vector fragment"). The three fragments (i.e., "bop 5' fragment", "bop 3' fragment" and "vector fragment") were ligated with DNA fragments containing various heterologous genes engineered to be in frame with the BR presequence and extrahelical residues and to contain a single AlwNI site at the 5' terminus of the fragment and a single NotI site at the 3' terminus of the fragment as described below. In all of the heterologous genes, endogenous AlwNI, NotI, BamHI and PstI sites were first removed (if necessary) to facilitate the construction. Once the heterologous gene was cloned along with the BR 5' and 3' regulatory sequences into pUC19, this intermediate construct (denoted "pENDs") was preparatively digested with PstI/BamHI.

Subsequently, the PstI/BamHI restriction fragment containing the heterologous gene with the regulatory sequences of BR was preparatively isolated away from pUC19 sequences by agarose gel electrophoresis, purified using Prep-A-Gene (Bio-Rad, Richmond, CA) and cloned into the E. coli/H. halobium shuttle vector, pUBP2 (7). pUBP2 carries the pBR322 replicon and ampicillin resistance marker, the halobacterial plasmid pHH1 origin of replication and a mevinolin resistance marker. Mevinolin resistance is encoded by an up-promoter mutation of the HMG-CoA reductase gene.

The construction was verified by restriction mapping and nucleotide sequencing across the junctions between 5' and 3' BR regulatory sequences and the heterologous gene.
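The construction strategy above depends on knowing where endogenous AlwNI, NotI, BamHI and PstI sites occur in a gene so that they can be removed first. The Python sketch below scans a sequence for those sites; it is an illustration rather than the inventors' procedure. The recognition sequences are the standard ones for these enzymes (AlwNI: CAGNNNCTG, NotI: GCGGCCGC, BamHI: GGATCC, PstI: CTGCAG), while the function names and the test sequence are hypothetical. All four sites are palindromic, so scanning one strand is sufficient.

```python
import re

# Standard recognition sequences; N stands for any base.
SITES = {
    "AlwNI": "CAGNNNCTG",
    "NotI": "GCGGCCGC",
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
}


def site_regex(site: str) -> "re.Pattern[str]":
    """Compile a lookahead pattern so overlapping sites are also reported."""
    return re.compile("(?=(" + site.replace("N", "[ACGT]") + "))")


def find_sites(seq: str) -> dict:
    """Return 1-based start positions of each recognition site in seq."""
    seq = seq.upper()
    return {
        name: [m.start() + 1 for m in site_regex(pattern).finditer(seq)]
        for name, pattern in SITES.items()
    }


# Hypothetical toy sequence containing one AlwNI and one BamHI site.
print(find_sites("AACAGGCGCTGTTGGATCC"))
```

A gene for which every list comes back empty would need no site removal before the AlwNI and NotI cloning sites are introduced at its termini.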

a. Human muscarinic acetylcholine receptor (Type HM1)

Two different constructs were made with this gene. The first (denoted pENDs-OM1) contained the entire gene whereas the second (denoted pENDs-OM2) lacked the large internal cytoplasmic loop (i.e., i3) which is thought to be involved in signaling. Prior to the generation of the constructions described below, two endogenous AlwNI sites and one endogenous PstI site were removed from human muscarinic acetylcholine receptor (denoted HM1) cloned in pGEM3 (Promega Corp., Madison, WI) using either the Clontech Transformer kit or the Kunkel method (29). The positions of the removed sites are shown in Fig. 6 (SEQ ID NO:6).

pENDs-OM1 was generated as follows. First, the HM1 gene was isolated by PCR from pGEM3/HM1 so as to contain an AlwNI site at the 5' terminus and a NotI site at the 3' terminus of the PCR fragment. This PCR fragment was ligated to the "bop 5' fragment", "bop 3' fragment" and "vector fragment" described above and transformed into E. coli. The resultant plasmid was named pENDs-OM1. pENDs-OM1 contains the methionine start codon of HM1 located 4 codons downstream from the BR 5' sequences. Nine extra base pairs generated by introduction of the AlwNI site encode 3 extra residues (i.e., gln, ala, leu) located in frame between the BR 5' sequences and the start codon of the HM1 gene. At the 3' terminus of the gene, the HM1 stop codon precedes the BR stop codon by 48 bp. From pENDs-OM1, the BR regulatory sequences with the HM1 gene were transferred to pUBP2 on a PstI/BamHI fragment (Fig. 5 and Fig. 6, SEQ ID NO:6) as described above.

pENDs-OM2 was generated in a similar manner as its sibling construct. First, however, deletions of the i3 domain were introduced after digestion of the HM1 gene at the unique StuI restriction site (position 712 relative to the start codon of the HM1 gene, SEQ ID NO:6), followed by digestion with the exonuclease Bal-31 for varying times at 4°C. The blunt-ended product was self-ligated to yield mutants with deletions of varying size within the i3 domain. One of these was chosen for further study which lacked amino acid residues 231 through 357 of HM1 (SEQ ID NO:7). DNA from this mutant was used to generate a PCR fragment containing the HM1 gene (less i3 loop) with a 5' AlwNI site and a 3' NotI site. This PCR fragment was identical to the fragment described above except for the lack of the i3 loop and was used to generate pENDs-OM2 in a similar manner to the pENDs-OM1 construct. The sequence of the PstI/BamHI fragment containing the BR regulatory sequences and the HM1 gene (less i3 loop) is shown in Fig. 8 (SEQ ID NO:8).
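Two pieces of arithmetic in this passage can be made explicit. First, the nine base pairs introduced with the AlwNI site encode three residues; if those nine base pairs are the site itself (CAGNNNCTG) read in frame, which the text implies but does not spell out, the outer codons CAG and CTG give Gln and Leu, and any GCN middle codon gives Ala. Second, deleting residues 231 through 357 removes 127 residues, that is 381 bp, which is a multiple of three and therefore keeps the reading frame. The Python sketch below illustrates both points; the middle codon GCG is an assumed example, the codon table is deliberately partial, and the function names are hypothetical.

```python
# Partial codon table: only the codons needed for this illustration.
PARTIAL_CODON_TABLE = {"CAG": "Gln", "GCG": "Ala", "CTG": "Leu"}


def translate(dna: str) -> list:
    """Translate an in-frame DNA string using the partial table above."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [PARTIAL_CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def deleted_residues(first: int, last: int) -> int:
    """Number of residues removed by deleting first..last inclusive."""
    return last - first + 1


# AlwNI site CAGNNNCTG with an assumed middle codon GCG (any GCN encodes Ala).
linker = "CAGGCGCTG"
print(translate(linker))

n = deleted_residues(231, 357)
print(n, n * 3, (n * 3) % 3 == 0)
```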

b. Rat serotonin receptor (Type 1C)

The rat serotonin receptor gene (denoted "Ser"), cloned as a 3 kb EcoRI cDNA fragment on the plasmid pSR1c (27), was used as a basis for the following constructions. The Ser gene contains no endogenous AlwNI, NotI, BamHI or PstI sites and was adapted for expression in H. halobium as follows. AlwNI and NotI cloning sites were introduced within the 5' coding and 3' noncoding regions of the Ser gene, respectively. In addition, DNA encoding a poly-aspartic acid peptide was placed in frame upstream of the Ser gene and downstream of the AlwNI site. Translation of this sequence generates a peptide epitope useful for subsequent detection of expressed protein (31). This fragment was isolated and ligated to the "bop 5' fragment", "bop 3' fragment", and "vector fragment" described above and transformed into E. coli. The resultant plasmid was named pENDs-Ser and contains the 36th codon of the rat serotonin receptor gene preceded by DNA encoding the peptide epitope and BR 5' sequences. Nine extra base pairs generated by the construction and encoding 3 extra residues (i.e., gln, ala, leu) are located in frame between the BR 5' sequences and the epitope sequences. At the 3' terminus of the gene, the Ser stop codon precedes the BR stop codon by 18 bp. Following the construction of pENDs-Ser, the BR regulatory sequences with the Ser gene were transferred to pUBP2 on a PstI/BamHI fragment (Fig.9 and Fig.10, SEQ ID NO:10).

c. Human thrombin receptor

A clone of the human thrombin receptor gene (denoted "Thromb") (33) was used as a basis for the following constructions. Four endogenous DNA restriction sites were removed from the gene using the Kunkel method (29). These included three AlwNI sites (291, 945, and 1038) and one PstI site (537). Positions are given relative to the first base of the start codon of the gene. "pENDs-Thromb" was generated as follows. An AlwNI/NotI fragment containing the gene was generated using oligonucleotide-directed insertion mutagenesis and PCR. Included on this fragment were additional nucleotide sequences encoding short peptides for use in the detection and purification of the expressed protein. The AlwNI/NotI fragment containing the gene along with epitope-encoding sequences was ligated to the "bop 5' fragment", "bop 3' fragment" and "vector fragment" described above and transformed into E. coli. The resultant plasmid was named pENDs-Thromb. In pENDs-Thromb, thirty-three extra base pairs generated by the construction and encoding eleven extra amino acids are located in frame between the BR 5' sequences and the Thromb sequences. Twenty-seven of these extra base pairs encode a poly-aspartic acid peptide sequence which, when translated, generates a peptide epitope useful for detection of expressed protein (31). At the 3' terminus of the gene, six histidine codons have been inserted upstream of the Thromb stop codon. These histidine codons are intended to aid in the affinity purification of expressed protein (26). The Thromb stop codon precedes the BR stop codon by 18 bp.

The BR regulatory sequences with the human thrombin receptor gene may be transferred into pUBP2 on a PstI/BamHI fragment (Fig.13 and Fig.14, SEQ ID NO:12) as described above.

ii. Constructs for expression of soluble proteins

Where it is desired that heterologous soluble polypeptide be released extracellularly into the culture medium following expression, the DNA sequence encoding the heterologous polypeptide may be ligated to DNA encoding the pre-sequence of bacteriorhodopsin (Fig.2 (SEQ ID NO:1), from +3 to +41 relative to the RNA start site) using techniques well known to those skilled in the art (12).

Where it is advantageous to produce a heterologous soluble polypeptide that is targeted, following expression, to the halobacterial membrane, DNA encoding the heterologous polypeptide is ligated downstream of the DNA encoding the C-terminal region (Fig.2 and Fig.4 (SEQ ID NOs:1 and 3)) of bacteriorhodopsin or to fragments thereof.

To facilitate subsequent purification of the heterologous polypeptide product, a DNA sequence encoding a unique protease site is engineered between DNA encoding the bacteriorhodopsin C-terminal region and DNA encoding the heterologous polypeptide. Sequences encoding unique protease cleavage sites are known and include, for example, those for subtilisin, thrombin, enterokinase, and Factor Xa. In a preferred embodiment, a DNA sequence encoding the amino acid sequence Ile-Glu-Gly-Arg (SEQ ID NO:4) is used to encode a unique protease site which is recognized by Factor Xa.

Design of the soluble protein expression vector and the methods used are similar to those described above for membrane proteins. However, soluble proteins are expressed as in-frame fusions to the C-terminal region of BR. Thus, these fusion proteins will have a membranous domain (i.e., BR or portions thereof) and a soluble domain (i.e., heterologous polypeptide). The heterologous gene is cloned at the C-terminus of BR, between the bacteriorhodopsin gene and the downstream transcriptional/translational termination sequences of BR. In addition, a unique protease site is engineered between BR and the heterologous gene to facilitate subsequent purification of the protein. The final construct is cloned into the E. coli/H. halobium shuttle vector, pUBP2 (7).

a. E. coli Aspartate Transcarbamylase (catalytic subunit)

The catalytic subunit of Aspartate Transcarbamylase (denoted ATCase), a soluble protein, has been fused to the C-terminus of BR as follows. The bop gene containing plasmid, pβgbop (32), was digested at the unique NotI site located near the 3' terminus of the bop gene (see Figure 15A). Subsequently, this NotI site was filled in to create a blunt site (12). The resulting DNA was digested with SphI to generate two fragments, a large fragment (denoted fragment 1) containing the vector along with the N-terminus of the bop gene and a small fragment containing internal bop gene sequences. Fragment 1 was isolated and purified. A second aliquot of pβgbop was digested with SphI/HaeII and a 217 bp fragment (denoted fragment 2) containing an internal portion of the bop gene was isolated and purified (Figure 15A). The structural gene for the E. coli catalytic subunit of aspartate transcarbamylase was isolated from pEK17 (Fig.15A) (30). An 845 bp MseI/NruI fragment (denoted fragment 3) which contains all but the first 18 bp of the gene encoding ATCase was isolated and purified.

A synthetic fragment of DNA (denoted fragment 4) was constructed by annealing two complementary oligonucleotides and used to connect the bop and ATCase genes. The synthetic fragment was engineered to contain a HaeII site at the 5' terminus, an MseI site at the 3' terminus and an internal NruI site. Also included were nucleotides encoding: i) a unique protease site (i.e., blood clotting Factor Xa) and ii) ATCase amino acids 6 and 7 (relative to the ATCase start codon) (Fig.17, SEQ ID NO:14).

All four DNA fragments were ligated together and used to transform E. coli strain D1210 (28) with selection for ampicillin resistance. Positive clones were identified by colony filter hybridization using the 32P-radiolabeled, random-primed (25) ATCase MseI/NruI fragment as probe. Positive clones were verified by restriction mapping and nucleotide sequencing. One positive clone was chosen and denoted pBATC (Figure 15A).

Subsequently, the bop-ATCase fusion construct was adapted for H. halobium expression as follows. A fragment spanning the region from the internal SphI site of the bop gene at the 5' terminus to the ATCase translational stop codon at the 3' terminus, inclusive, was isolated from pBATC by PCR (see Figure 15B). In addition, the oligonucleotide used to construct the 3' terminus of this PCR fragment was designed to be complementary to bop sequences downstream of the transcriptional termination sequences and to include a unique BamHI site to facilitate subsequent cloning steps. The resultant PCR fragment was digested with SphI/BamHI, purified and used in the following construction.

The plasmid p1.2Kbbop, containing the bop gene and upstream sequences cloned in pUC19 (described above), was digested with SphI/BamHI to yield two fragments, a large one containing the vector and the majority of the bop gene, and a 358 bp fragment containing the C-terminal half of the bop gene (Fig.15B). The larger of these two fragments was isolated, purified and ligated to the SphI/BamHI bop-ATCase PCR fragment. A positive clone was isolated and confirmed by restriction mapping and nucleotide sequencing. This clone was digested with PstI/BamHI and a fragment containing DNA encoding the BR/ATCase fusion along with bop upstream regulatory sequences (Fig.16) was cloned into the E. coli/H. halobium shuttle vector pUBP2. The resultant construct was named pBRAT (Fig.15B). The nucleotide sequence (SEQ ID NO:14) and the translated amino acid sequence (SEQ ID NO:15) of this PstI/BamHI fragment are shown in Fig.17.

2. Transformation of Halobacterium halobium

The PstI/BamHI fragments of the pENDs-Ser (Fig.9 and Fig.10, SEQ ID NO:10) and pBRAT (Fig.15B, Fig.16 and Fig.17, SEQ ID NO:14) constructs containing the heterologous genes with the BR regulatory sequences were isolated and purified. Subsequently, these fragments were cloned into the E. coli/H. halobium shuttle vector pUBP2 (7) and transformed into H. halobium Bop-deficient strain L33 as described (24).

Preferably, plasmids may be introduced into halobacteria using the polyethylene glycol (PEG) method (10, 11). Transformed halobacterial cells are then grown in culture in an appropriate nutrient medium sufficient to maintain the growth of halobacterial cells (7, 8).

H. halobium is prone to cell lysis during transformation procedures (7). Since surfactants are known to promote halobacterial lysis (21), all media and glassware used were soap-free. Transformation was performed according to Blaseio (7) and Cline (11) with modifications. Initially, cells were subcultured several times in soap-free complex (YET) medium. Subsequently, cells were subcultured to an OD of about 0.01 and grown at 40°C until the early to mid-logarithmic stage of growth (OD of 0.4 to 0.6). All succeeding manipulations were performed at room temperature. The culture was removed from the waterbath shaker and incubated without agitation for 4 h to overnight, followed by centrifugation of 2 ml of culture at 1000 x g for 15 min. The supernatant was carefully removed with a pipette and the interior of the centrifuge tube dried with absorbent tissue. The cell pellet was resuspended in 1/10 volume of spheroplasting solution (11), followed by addition of 1/100 volume of 0.5 M EDTA in spheroplasting solution (11) and incubation for 2 min. One μg of DNA in 10 μl of spheroplasting solution was then added to the spheroplasted cells along with an equal volume of 60% PEG 600 (un-recrystallized) in spheroplasting solution. The combined solutions were gently but thoroughly mixed and then incubated for 20 min. Ten ml of 15% sucrose in complex (YET) medium was added, followed by incubation overnight with no agitation at 42°C. The following day, cells were centrifuged at 3000 x g for 15 min and resuspended in 300 μl of 15% sucrose in complex (YET) medium. This solution was plated on solid complex (YET) selection medium.
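The reagent volumes in the spheroplasting steps follow from simple fractions of the culture volume, as the sketch below illustrates. Two assumptions are made that the text does not state explicitly: the "1/10 volume" and "1/100 volume" are taken relative to the 2 ml of culture that was centrifuged, and the "equal volume" of 60% PEG 600 is taken to equal the total volume already in the tube (cells, EDTA and DNA). The function name is illustrative.

```python
def transformation_volumes_ul(culture_ml=2.0, dna_ul=10.0):
    """Reagent volumes (in microliters) for one transformation.

    Assumptions (not stated explicitly in the protocol):
      - fractional volumes refer to the original culture volume;
      - "equal volume" of PEG means equal to the tube contents so far.
    """
    culture_ul = culture_ml * 1000.0
    spheroplast_ul = culture_ul / 10.0    # 1/10 volume spheroplasting solution
    edta_ul = culture_ul / 100.0          # 1/100 volume of 0.5 M EDTA
    peg_ul = spheroplast_ul + edta_ul + dna_ul  # equal volume of 60% PEG 600
    return {
        "spheroplasting_solution": spheroplast_ul,
        "edta": edta_ul,
        "dna": dna_ul,
        "peg600_60pct": peg_ul,
    }

volumes = transformation_volumes_ul()
for reagent, microliters in volumes.items():
    print(f"{reagent}: {microliters:.0f} ul")
```

Under these assumptions, mixing equal volumes halves the PEG concentration, giving roughly 30% PEG 600 during the 20 min incubation.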

3. Analysis of transformants, expression of the heterologous polypeptide and assays for expression

To establish that halobacterial cells have been successfully transformed, various techniques may be employed. Where the expression vector used to transform the halobacteria contains a dominant selectable marker, transformed cells can be selected by growing in the appropriate selection medium such that growth of halobacterial cells not harboring the recombinant plasmid is inhibited. For example, where a plasmid containing the mevinolin resistance marker is used, halobacterial cells which harbor this plasmid may be selected by growing on solid nutrient medium containing mevinolin at a concentration in the range of 5 to 25 μM. Further, the plasmid may be isolated using standard techniques (12), restricted and used. The polymerase chain reaction, gel electrophoresis, restriction analysis, Southern, Northern, and Western blots, sequencing, or the like, may all be employed with advantage.

Depending upon the particular construct and the halobacterial background strain which have been employed for expression of the heterologous polypeptides, one may have constitutive or inducible expression of the heterologous polypeptide product. In the case of constitutive expression, the product will be continuously formed as the cells grow. By contrast, for inducible expression, one may provide for induction when the cells reach a predetermined cell density.

Where inducible promoters have been engineered into the expression vector containing the heterologous polypeptide DNA sequence, transcription may be induced using appropriate inducers under such conditions of concentration and duration as to effect induction of transcription. For example, if the regulatory sequences of the bacteriorhodopsin gene are used, transcription can be induced by low oxygen tension and high light intensity (18, 19), which are known to induce high level expression of BR. Low oxygen tensions are achieved in various ways, such as by flushing culture flasks with oxygen-free nitrogen and sealing them, or by permitting cultures to reach the stationary phase of growth, in which oxygen limitation occurs naturally (18). High light intensity of greater than about 100 mW/cm² can be achieved using various light sources and apparatus as described (18, 19).

H. halobium transformed with pUBP2 containing the pBRAT PstI/BamHI fragment and with pUBP2 containing the pENDs-Ser PstI/BamHI fragment was plated on solid complex (YET) medium containing 25 μM mevinolin. Plates were incubated for one to two weeks at 42°C to permit growth of transformants. Plasmid DNA was isolated from individual transformants using the Magic Minipreps DNA Purification System (Promega Corp., Madison, WI). Southern analysis was used to verify the presence of the heterologous gene on pUBP2. Southern blot analysis using the AlwNI/NotI fragment containing the serotonin receptor gene as probe indicated the presence of serotonin receptor gene sequences in all assayed transformants (Figure 11). Total RNA was isolated from individual transformants using the RNAzol procedure (Cinna Biotech) and subjected to Northern analysis (18). Northern blot analysis revealed that transcription of Ser gene sequences had occurred (Figure 12). Western analysis using both BR and ATCase antibodies demonstrated that the BR/ATCase fusion was expressed and localized to halobacterial membranes (Figure 18). Washed halobacterial whole cell membranes were fractionated on sucrose gradients (Fig.19A) and aliquots were subjected to SDS PAGE (Fig.19B). A band corresponding to the predicted molecular weight of the fusion protein (i.e., ~60 kDa, see Fig.19B), derived from a purple fraction, was observed. These data verify expression of the fusion and indicate that the BR portion of the fusion is folded correctly in the halobacterial membrane. The presence of the BR chromophore (extinction coefficient of 63,000; 31) affords an estimate of 5 mg/liter of fusion protein expression.
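An estimate of this kind follows from the Beer-Lambert relation, A = ε·c·l, combined with the molecular weight of the fusion. The sketch below shows the arithmetic only. It assumes the extinction coefficient is in units of M⁻¹ cm⁻¹ (the text gives no units), a 1 cm path length, one chromophore per fusion molecule, and the ~60 kDa molecular weight quoted above; the absorbance value in the example is hypothetical and was chosen simply to reproduce a figure of about 5 mg/liter.

```python
def fusion_yield_mg_per_l(absorbance, epsilon=63000.0, path_cm=1.0, mw_da=60000.0):
    """Estimate protein concentration from chromophore absorbance.

    Beer-Lambert: A = epsilon * c * l, so c = A / (epsilon * l) in mol/L.
    Assumes epsilon in M^-1 cm^-1 and one chromophore per fusion protein.
    """
    molar = absorbance / (epsilon * path_cm)  # mol/L
    return molar * mw_da * 1000.0             # g/L converted to mg/L

# Hypothetical absorbance, expressed per liter of original culture.
example_absorbance = 0.00525
print(round(fusion_yield_mg_per_l(example_absorbance), 2))
```

In practice the absorbance would be measured on a concentrated membrane fraction and scaled back to the culture volume, so the measured value would be correspondingly larger.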

Transformants testing positive in Southern and Northern analyses are subjected to Western analysis if specific antibodies to the heterologous protein are available. If antibodies are not available, DNA encoding an epitope known to be antigenic may be engineered into the expression vector construction to aid in detection of expression. An example of such an epitope is the sequence encoding Glu-Glu-Glu-Glu-Tyr-Met-Pro-Met-Glu (SEQ ID NO:5) (22).

Alternatively, expression of the heterologous protein may be assayed functionally; for example, ligand binding assays for receptors, and assays for enzymic activity for soluble proteins using appropriate substrates.

4. Purification of heterologous polypeptides

Production of the heterologous polypeptide may be stopped in a variety of ways. Where the heterologous polypeptide is released into the medium, it may be isolated in a soluble or insoluble form using physical (e.g., mechanical or thermal) or chemical treatments. Treatments employed may include freezing (<0°C), heating, hydrodynamic shearing, drying, selective filtration or precipitation by addition of acid, base, salts or organic solvents.

Where the expressed heterologous polypeptide resides in the membrane or in the cytoplasm, cells are harvested to separate them from the culture medium. Various techniques may be used for harvesting, desirably using centrifugation. The supernatant may then be discarded and the cell pellet washed with an appropriate buffered aqueous medium to remove any residual culture medium components. Typically the buffered medium will be at a temperature in the range of about 1 to 10°C, more usually 4°C.

The cells may be lysed by any convenient means, such as freezing, mechanical disruption, use of hypotonic solutions (23), and the like. The resulting dispersion of disrupted cells is then treated by such means as to substantially separate cell membranes from soluble proteins and other contaminants. Several techniques may be employed to advantage for isolating membranes, including differential centrifugation, density gradient centrifugation, and the like. This membrane isolation separates the fusion protein from the bulk of the soluble proteins.

Heterologous polypeptides are purified according to procedures dependent on their individual properties and those of BR. Where the expressed soluble heterologous polypeptide is fused at the C-terminal region of BR, advantage may be taken of the likelihood that the BR domain will anchor the fusion protein in the membrane.

Where the heterologous polypeptide is expressed as a fusion polypeptide linked to the C-terminal region, or fragment thereof, of the bacteriorhodopsin gene with a unique protease site between said heterologous polypeptide and C-terminal region, the heterologous polypeptide may be isolated by incubating the halobacterial membranes with an appropriate unique protease to effect substantially complete cleavage at the protease cleavage site. For example, where the heterologous polypeptide is linked to the bacteriorhodopsin C-terminal region through the amino acid sequence Ile-Glu-Gly-Arg (SEQ ID NO:4), cell membranes are incubated with Factor Xa under conditions recommended by the manufacturers. Factor Xa is dissolved in redistilled water to a final protein concentration of 1 mg/ml. The fusion protein to be cleaved is dissolved in 100 mM NaCl, 50 mM Tris-HCl, 1 mM CaCl2, pH 8.0. To increase the solubility of the substrate, urea or acetonitrile can be added up to a final concentration of 1 M and 10% (v/v), respectively, without significant inhibition of the enzyme activity. The recommended amount of enzyme is 1/200 to 1/10 of the substrate by weight. Incubation should be carried out at 4°C to 25°C for 1-18 h. The optimum cleavage conditions have to be determined for each fusion protein. The release of the desired polypeptide from the fusion protein is influenced by the adjacent amino acid sequences at the cleavage site, the size of the two fused polypeptide components, and the accessibility of the cleavage site. Protease treatment is followed by standard purification protocols to remove the minor unique protease component.
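The enzyme-to-substrate ratio above translates into a volume range of Factor Xa stock for a given amount of fusion protein. The following sketch works through that arithmetic using the 1 mg/ml stock concentration and the 1/200 to 1/10 weight ratio stated in the text; the function name and the 2 mg substrate amount in the example are illustrative assumptions.

```python
def factor_xa_volume_range_ul(substrate_mg, stock_mg_per_ml=1.0):
    """Return (min, max) volume of Factor Xa stock in microliters.

    Uses the recommended enzyme amount of 1/200 to 1/10 of the
    substrate by weight and a stock dissolved at 1 mg/ml.
    """
    low_mg = substrate_mg / 200.0
    high_mg = substrate_mg / 10.0

    def to_microliters(enzyme_mg):
        return enzyme_mg / stock_mg_per_ml * 1000.0

    return to_microliters(low_mg), to_microliters(high_mg)

# Hypothetical digest of 2 mg of fusion protein.
low_ul, high_ul = factor_xa_volume_range_ul(2.0)
print(f"{low_ul:.0f} to {high_ul:.0f} ul of 1 mg/ml Factor Xa")
```

Since the optimum conditions have to be determined for each fusion protein, a range like this would serve only as the starting point for a small titration.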

If further purification of the heterologous polypeptide is desired, antibodies specific for the heterologous polypeptide, ligand affinity, electrophoresis, chromatography, zonal centrifugation, and the like, may be employed to advantage. The product may then be dried by any convenient means, such as freeze drying, spray drying, and the like, or alternatively suspended in an appropriate buffered aqueous solution. The heterologous polypeptide product is then ready for use.

5. Bioassays

The heterologous polypeptides may be assayed using protocols dependent on their individual properties. For example, receptors are assayed using ligand binding assays. Soluble proteins having enzyme activity are assayed using appropriate substrates.

Bibliography

For the sake of convenience, various documents referenced in the body of the present specification are grouped in the following bibliography by number that corresponds to the parenthetical number of that reference in the text. Each of these documents is hereby expressly incorporated by reference.

1. Gropp Syst. Appl. Microbiol. 7, 95 (1986).

2. Zillig Eur. J. Biochem. 173, 473 (1988).

3. Dennis J. Bacteriol. 168, 471 (1986).

4. Oesterhelt Proc. Natl. Acad. Sci. USA 70, 2853 (1971).

5. Henderson Annu. Rev. Biophys. Bioenerg. 6, 87 (1977).

6. Katre Proc. Natl. Acad. Sci. USA 78, 4068 (1981).

7. Blaseio Proc. Natl. Acad. Sci. USA 87, 6772 (1990).

8. Holmes J. Bacteriol. 172, 756 (1990).

9. Ni Gene 90, 169 (1990).

10. Charlebois Proc. Natl. Acad. Sci. USA 84, 8530 (1987).

11. Cline J. Bacteriol. 169, 1341 (1987).

12. Maniatis "Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Laboratory, CSH, N.Y. (1989).

13. Betlach Nucl. Acids Res. 12, 7949 (1984).

14. Leong J. Bacteriol. 170, 4903 (1988).

15. Wagner FEBS Letters 131, 341 (1983).

16. Spudich Proc. Natl. Acad. Sci. USA 79, 4398 (1982).

17. Pfeifer J. Bacteriol. 145, 375 (1981).

18. Shand J. Bacteriol. 173, 4692 (1991).

19. Betlach, In: "Protocols for Archael Research", Robb & Das Sarma (Eds.), Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., in press (1993).

20. Yanisch-Perron Gene 33, 103 (1985).

21. Kamekura Appl. Environmental Microbiol. 54, 990 (1988).

22. Grussenmeyer Proc. Natl. Acad. Sci. USA 82, 7952 (1985).

23. Turner Biochemistry 32, 1332 (1993).

24. Cline Can. J. Microbiol. 35, 148 (1989).

25. Feinberg Analytical Biochemistry 132, 6 (1983).

26. Hoffmann Nucleic Acids Research 19, 6337 (1991).

27. Julius Science 241, 558 (1988).

28. Kuhn Gene 44, 253 (1986).

29. Kunkel Methods Enzymol. 154, 367 (1987).

30. Nowlan J. Biol. Chem. 260, 14712 (1985).

31. Power Gene 113, 95 (1992).

32. Shand Biochemistry 30, 3082 (1991).

33. Vu Cell 64, 1057 (1991).

Concluding Remarks

The foregoing description details specific methods that can be employed to practice the present invention. Having detailed specific methods initially used to construct and use vectors for the expression, isolation, detection and further purification of heterologous polypeptides in halobacteria, those skilled in the art will know how to devise alternative reliable methods for arriving at the same and equivalent systems described herein. The foregoing should not be construed as limiting the overall scope hereof; rather, the ambit of the present invention is to be governed only by the lawful interpretation of the appended claims.

The Halobacterium strains referred to above were deposited with the American Type Culture Collection, located at 12301 Parklawn Drive, Rockville, Maryland 20852-1776. The dates of the deposits were ATCC 29252 - February 3, 1976; ATCC 38799 - September 13, 1979; ATCC 29715 - September 19, 1977 and ATCC 33500 - February 23, 1981.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: TURNER, George J. BETLACH, Mary C.

(ii) TITLE OF INVENTION: EXPRESSION OF HETEROLOGOUS POLYPEPTIDES IN HALOBACTERIA

(iii) NUMBER OF SEQUENCES: 15

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Robert Berliner

(B) STREET: 201 North Figueroa Street

(C) CITY: Los Angeles

(D) STATE: California

(E) COUNTRY: USA

(F) ZIP: 90012

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: PCT

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Berliner, Robert

(B) REGISTRATION NUMBER: 20,121

(C) REFERENCE/DOCKET NUMBER: 5555-206-PCT

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (213) 977-1001

(B) TELEFAX: (213) 977-1003

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1254 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 376..414

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin pre-sequence."

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 376..1161

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3..8

(D) OTHER INFORMATION: /note= "PstI site."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1245..1250

(D) OTHER INFORMATION: /note= "BamHI site."

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 374

(D) OTHER INFORMATION: /note= "RNA start site."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 9..414

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin transcriptional and translational regulatory sequences are located in this region."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATCTGCAGGA TGGGTGCAAC CGTGAAGTCC GTCACGGCTG CGTCACGACA GGAGCCGACC 60

AGCGACACCC AGAAGGTGCG AACGGTTGAG TGCCGCAACG ATCACGAGTT TTTCGTGCGC 120

TTCGAGTGGT AACACGCGTG CACGCATCGA CTTCACCGCG GGTGTTTCGA CGCCAGCCGG 180

CCGTTGAACC AGCAGGCAGC GGGCATTTCA CAGCCGCTGT GGCCCACACA CTCGGTGGGG 240

TGCGCTATTT TGGTATGGTT TGGAATCCGC GTGTCGGCTC CGTGTCTGAC GGTTCATCGG 300

TCTAAATTCC GTCACGAGCG TACCATACTG ATTGGGTCGT AGAGTTACAC ACATATCCTC 360

GTTAGGTACT GTTGC ATG TTG GAG TTA TTG CCA ACA GCA GTG GAG GGG GTA 411 Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val 1 5 10

TCG CAG GCC CAG ATC ACC GGA CGT CCG GAG TGG ATC TGG CTA GCG CTC 459 Ser Gln Ala Gln Ile Thr Gly Arg Pro Glu Trp Ile Trp Leu Ala Leu 15 20 25

GGT ACG GCG CTA ATG GGA CTC GGG ACG CTC TAT TTC CTC GTG AAA GGG 507 Gly Thr Ala Leu Met Gly Leu Gly Thr Leu Tyr Phe Leu Val Lys Gly 30 35 40

ATG GGC GTC TCG GAC CCA GAT GCA AAG AAA TTC TAC GCC ATC ACG ACG 555 Met Gly Val Ser Asp Pro Asp Ala Lys Lys Phe Tyr Ala Ile Thr Thr 45 50 55 60

CTC GTC CCA GCC ATC GCG TTC ACG ATG TAC CTC TCG ATG CTG CTG GGG 603 Leu Val Pro Ala Ile Ala Phe Thr Met Tyr Leu Ser Met Leu Leu Gly 65 70 75

TAT GGC CTC ACA ATG GTA CCG TTC GGT GGG GAG CAG AAC CCC ATC TAC 651 Tyr Gly Leu Thr Met Val Pro Phe Gly Gly Glu Gln Asn Pro Ile Tyr 80 85 90

TGG GCG CGG TAC GCT GAC TGG CTG TTC ACC ACG CCG CTG TTG TTG TTA 699 Trp Ala Arg Tyr Ala Asp Trp Leu Phe Thr Thr Pro Leu Leu Leu Leu 95 100 105

GAC CTC GCG TTG CTC GTT GAC GCG GAT CAG GGA ACG ATC CTT GCG CTC 747 Asp Leu Ala Leu Leu Val Asp Ala Asp Gln Gly Thr Ile Leu Ala Leu 110 115 120

GTC GGT GCC GAC GGC ATC ATG ATC GGG ACC GGC CTG GTC GGC GCA CTG 795 Val Gly Ala Asp Gly Ile Met Ile Gly Thr Gly Leu Val Gly Ala Leu 125 130 135 140

ACG AAG GTC TAC TCG TAC CGC TTC GTG TGG TGG GCG ATC AGC ACC GCA 843 Thr Lys Val Tyr Ser Tyr Arg Phe Val Trp Trp Ala Ile Ser Thr Ala 145 150 155

GCG ATG CTG TAC ATC CTG TAC GTG CTG TTC TTC GGG TTC ACC TCG AAG 891 Ala Met Leu Tyr Ile Leu Tyr Val Leu Phe Phe Gly Phe Thr Ser Lys 160 165 170

GCC GAA AGC ATG CGC CCC GAG GTC GCA TCC ACG TTC AAA GTA CTG CGT 939 Ala Glu Ser Met Arg Pro Glu Val Ala Ser Thr Phe Lys Val Leu Arg 175 180 185

AAC GTT ACC GTT GTG TTG TGG TCC GCG TAT CCC GTC GTG TGG CTG ATC 987 Asn Val Thr Val Val Leu Trp Ser Ala Tyr Pro Val Val Trp Leu Ile 190 195 200

GGC AGC GAA GGT GCG GGA ATC GTG CCG CTG AAC ATC GAG ACG CTG CTG 1035 Gly Ser Glu Gly Ala Gly Ile Val Pro Leu Asn Ile Glu Thr Leu Leu 205 210 215 220

TTC ATG GTG CTT GAC GTG AGC GCG AAG GTC GGC TTC GGG CTC ATC CTC 1083 Phe Met Val Leu Asp Val Ser Ala Lys Val Gly Phe Gly Leu Ile Leu 225 230 235

CTG CGC AGT CGT GCG ATC TTC GGC GAA GCC GAA GCG CCG GAG CCG TCC 1131 Leu Arg Ser Arg Ala Ile Phe Gly Glu Ala Glu Ala Pro Glu Pro Ser 240 245 250

GCC GGC GAC GGC GCG GCC GCG ACC AGC GAC TGATCGCACA CGCAGGACAG 1181 Ala Gly Asp Gly Ala Ala Ala Thr Ser Asp 255 260

CCCCACAACC GGCGCGGCTG TGTTCAACGA CACACGATGA GTCCCCCACT CGGTCTTGTA 1241

CTCGGATCCT TTT 1254
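The restriction-site features annotated for this sequence can be cross-checked against the sequence itself with a short scan. The sketch below is illustrative only: the helper name is hypothetical, just the first 60 bases and the final 13 bases of the sequence are included (with the position of the first base passed in explicitly), and it assumes 1-based, inclusive coordinates as used in the feature table.

```python
import re

def find_sites(seq, site, first_pos=1):
    """Return 1-based inclusive (start, end) positions of a recognition site.

    'N' in the site matches any base; a lookahead lets overlapping
    occurrences be reported. first_pos is the sequence coordinate of seq[0].
    """
    pattern = "(?=(" + site.replace("N", "[ACGT]") + "))"
    return [
        (m.start() + first_pos, m.start() + first_pos + len(site) - 1)
        for m in re.finditer(pattern, seq)
    ]

# Bases 1..60 and 1242..1254 of the sequence listed above.
head = "ATCTGCAGGATGGGTGCAACCGTGAAGTCCGTCACGGCTGCGTCACGACAGGAGCCGACC"
tail = "CTCGGATCCTTTT"

print(find_sites(head, "CTGCAG"))                  # PstI
print(find_sites(tail, "GGATCC", first_pos=1242))  # BamHI
```

The same helper accepts degenerate sites such as AlwNI (CAGNNNCTG), which is how one might confirm that an endogenous site has been removed from a gene before cloning.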

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 262 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val Ser Gln Ala Gln 1 5 10 15

Ile Thr Gly Arg Pro Glu Trp Ile Trp Leu Ala Leu Gly Thr Ala Leu 20 25 30

Met Gly Leu Gly Thr Leu Tyr Phe Leu Val Lys Gly Met Gly Val Ser 35 40 45

Asp Pro Asp Ala Lys Lys Phe Tyr Ala Ile Thr Thr Leu Val Pro Ala 50 55 60

Ile Ala Phe Thr Met Tyr Leu Ser Met Leu Leu Gly Tyr Gly Leu Thr 65 70 75 80

Met Val Pro Phe Gly Gly Glu Gln Asn Pro Ile Tyr Trp Ala Arg Tyr 85 90 95

Ala Asp Trp Leu Phe Thr Thr Pro Leu Leu Leu Leu Asp Leu Ala Leu 100 105 110

Leu Val Asp Ala Asp Gln Gly Thr Ile Leu Ala Leu Val Gly Ala Asp 115 120 125

Gly Ile Met Ile Gly Thr Gly Leu Val Gly Ala Leu Thr Lys Val Tyr 130 135 140

Ser Tyr Arg Phe Val Trp Trp Ala Ile Ser Thr Ala Ala Met Leu Tyr 145 150 155 160

Ile Leu Tyr Val Leu Phe Phe Gly Phe Thr Ser Lys Ala Glu Ser Met 165 170 175

Arg Pro Glu Val Ala Ser Thr Phe Lys Val Leu Arg Asn Val Thr Val 180 185 190

Val Leu Trp Ser Ala Tyr Pro Val Val Trp Leu Ile Gly Ser Glu Gly 195 200 205

Ala Gly Ile Val Pro Leu Asn Ile Glu Thr Leu Leu Phe Met Val Leu 210 215 220

Asp Val Ser Ala Lys Val Gly Phe Gly Leu Ile Leu Leu Arg Ser Arg 225 230 235 240

Ala Ile Phe Gly Glu Ala Glu Ala Pro Glu Pro Ser Ala Gly Asp Gly 245 250 255

Ala Ala Ala Thr Ser Asp 260

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 248 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 225..248

(D) OTHER INFORMATION: /note= "Cytoplasmic C-terminal region of bacteriorhodopsin."

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 1

(D) OTHER INFORMATION: /note= "Pyroglutamate."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Xaa Ala Gln Ile Thr Gly Arg Pro Glu Trp Ile Trp Leu Ala Leu Gly 1 5 10 15

Thr Ala Leu Met Gly Leu Gly Thr Leu Tyr Phe Leu Val Lys Gly Met 20 25 30

Gly Val Ser Asp Pro Asp Ala Lys Lys Phe Tyr Ala Ile Thr Thr Leu 35 40 45

Val Pro Ala Ile Ala Phe Thr Met Tyr Leu Ser Met Leu Leu Gly Tyr 50 55 60

Gly Leu Thr Met Val Pro Phe Gly Gly Glu Gln Asn Pro Ile Tyr Trp 65 70 75 80

Ala Arg Tyr Ala Asp Trp Leu Phe Thr Thr Pro Leu Leu Leu Leu Asp 85 90 95

Leu Ala Leu Leu Val Asp Ala Asp Gln Gly Thr Ile Leu Ala Leu Val 100 105 110

Gly Ala Asp Gly Ile Met Ile Gly Thr Gly Leu Val Gly Ala Leu Thr 115 120 125

Lys Val Tyr Ser Tyr Arg Phe Val Trp Trp Ala Ile Ser Thr Ala Ala 130 135 140

Met Leu Tyr Ile Leu Tyr Val Leu Phe Phe Gly Phe Thr Ser Lys Ala 145 150 155 160

Glu Ser Met Arg Pro Glu Val Ala Ser Thr Phe Lys Val Leu Arg Asn 165 170 175

Val Thr Val Val Leu Trp Ser Ala Tyr Pro Val Val Trp Leu Ile Gly 180 185 190

Ser Glu Gly Ala Gly Ile Val Pro Leu Asn Ile Glu Thr Leu Leu Phe 195 200 205

Met Val Leu Asp Val Ser Ala Lys Val Gly Phe Gly Leu Ile Leu Leu 210 215 220

Arg Ser Arg Ala Ile Phe Gly Glu Ala Glu Ala Pro Glu Pro Ser Ala 225 230 235 240

Gly Asp Gly Ala Ala Ala Thr Ser 245

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Ile Glu Gly Arg 1

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Glu Glu Glu Glu Tyr Met Pro Met Glu 1 5

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1956 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 376..1812

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 376..414

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin pre-sequence."

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 1864..1866

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin stop codon."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(213, "")

(D) OTHER INFORMATION: /note= "G to T mutation removes AlwNI restriction site."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 427..435

(D) OTHER INFORMATION: /note= "AlwNI cloning site."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(930, "")

(D) OTHER INFORMATION: /note= "G to A mutation removes AlwNI restriction site."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(1179, "")

(D) OTHER INFORMATION: /note= "T to A mutation removes AlwNI site."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(1245, "")

(D) OTHER INFORMATION: /note= "G to A mutation removes PstI restriction site."

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 374

(D) OTHER INFORMATION: /note= "RNA start site."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(1863, "")

(D) OTHER INFORMATION: /note= "C to T mutation removes AlwNI restriction site."

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 1813..1815

(D) OTHER INFORMATION: /note= "Muscarinic "OM1" stop codon."

991 OST sn sλη BLV ^V u i jas naη OJd β JV JL|1 L^Λ uas 9i|d ^l βj V dsv θu,d t78 9W 3391933V139V 91333399313V 9193313113V13933V9111 on 9cτ oετ 21 jas aLI naη naη naη usv 1 H L^Λ ΘS ^LV usv ΘS B LV L^Λ --^1 sv 96Z 39V 31V 3139139131W 91V 3193313391W 39V 339919 IVl 3V9

021 9TI Oil naη LV naη d.χ naη dsv so BLV naη LJ A " L9 naη LV dji SLH X " L9

LU 9133399139913133V919113991393V 3999131399913V3399

90T 00T 96 law naη naη Jλj_ jqi jqi J/CJ. naη usv IΘH ^S θMd uqi 9 ΘLI 9 LI

669 91V 3139133V193V 33V IVl 3133W 91V 33131133V 19931V 31V

06 98 08 naη dsv ^LV sλ " o B LV naη jas naη naη ai)d J j. usv usv L e Λ J M1 sλη T 9 3133V913919133991339V 9139133113V13W 1W 319 V3V 9W

9Z OZ 99 naη nL9 JMI USV L B Λ sΛ " η ai|d Jas aLl naη LBΛ naη naη usv A ~ L9 Ji|i ε09 3139V993V 3W 3199W 31113131V 313 V199139133W 399 V3V

09 99 09 Sl7

LBΛ JL|1 BLV naη jas naη naη /LS JLJI JL|1 ΘLI 9 Θ I θu.d <?LV L<?Λ 999 919 V3V 339 V1393191331339993V 33V 31V 9991LV 311339919 o* ε oε

UL9 dJi O d X " L9 sλη X " L9 OJd BLV naη ^Λ u i ΘLI usv OJd JΘS L&Λ Z09 W39913331999W V99 V33 V3991331933V 31V 3W 33339V 319

92 02 91

BLV OJd O d BLV JΘS JMI uv }3W nθη LV UL9 ΘLI ULS BLV UL9 ΘS 69t7 139133 V33339 V3113V 3W 91V 9139399V331V 9V33399V3931

01 9 I

L<?Λ ^L9 nL9 L^Λ ^LV ^ l OJd naη naη nL9 naη ;aw III? V199999V9919 V39 V3V V33911 V1L 9V991191V 3911913V199V1L9

09ε 31331V1V3V 3V3V1L9V9V 193199911V 913V1V33V1939V93V3193311VW131

00ε 9931V311993V91319193313993191939331W99111991V199111LV139391

0i72 9999199313 V3V3V3339919139331V3 V3111V399939V399V39V 33W911933

081 99339V3393 V931U919993933V3113 V931V393V3919393V3W 19919V9311

02Ϊ 39391931111L9V93V31V 93W3933919V91L993W 939199W9V 333V3V939V

09 33V9339V99 V3V93V3193913993V3193319W91933W3919991 V99V39131V

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CGC ACA CCC CGC CGC GCA GCT CTG ATG ATC GGC CTG GCC TGG CTG GTT 891 Arg Thr Pro Arg Arg Ala Ala Leu Met Ile Gly Leu Ala Trp Leu Val 160 165 170

TCC TTT GTG CTC TGG GCC CCA GCC ATC CTC TTC TGG CAA TAC CTG GTA 939 Ser Phe Val Leu Trp Ala Pro Ala Ile Leu Phe Trp Gln Tyr Leu Val 175 180 185

GGG GAG CGG ACG ATG CTA GCT GGG CAG TGC TAC ATC CAG TTC CTC TCC 987 Gly Glu Arg Thr Met Leu Ala Gly Gln Cys Tyr Ile Gln Phe Leu Ser 190 195 200

CAG CCC ATC ATC ACC TTT GGC ACA GCC ATG GCT GCC TTC TAC CTC CCT 1035

Gln Pro Ile Ile Thr Phe Gly Thr Ala Met Ala Ala Phe Tyr Leu Pro

205 210 215 220

GTC ACA GTC ATG TGC ACG CTC TAC TGG CGC ATC TAC CGG GAG ACA GAG 1083

Val Thr Val Met Cys Thr Leu Tyr Trp Arg Ile Tyr Arg Glu Thr Glu 225 230 235

AAC CGA GCA CGG GAG CTG GCA GCC CTT CAG GGC TCC GAG ACG CCA GGC 1131

Asn Arg Ala Arg Glu Leu Ala Ala Leu Gln Gly Ser Glu Thr Pro Gly

240 245 250

AAA GGG GGT GGC AGC AGC AGC AGC TCA GAG AGG TCT CAG CCA GGG GCA 1179 Lys Gly Gly Gly Ser Ser Ser Ser Ser Glu Arg Ser Gln Pro Gly Ala 255 260 265

GAG GGC TCA CCA GAG ACT CCT CCA GGC CGC TGC TGT CGC TGC TGC CGG 1227

Glu Gly Ser Pro Glu Thr Pro Pro Gly Arg Cys Cys Arg Cys Cys Arg 270 275 280

GCC CCA AGG CTG CTG CAA GCC TAC AGC TGG AAG GAA GAA GAG GAA GAG 1275

Ala Pro Arg Leu Leu Gln Ala Tyr Ser Trp Lys Glu Glu Glu Glu Glu

285 290 295 300

GAC GAA GGC TCC ATG GAG TCC CTC ACA TCC TCA GAG GGA GAG GAG CCT 1323

Asp Glu Gly Ser Met Glu Ser Leu Thr Ser Ser Glu Gly Glu Glu Pro

305 310 315

GGC TCC GAA GTG GTG ATC AAG ATG CCA ATG GTG GAC CCC GAG GCA CAG 1371 Gly Ser Glu Val Val Ile Lys Met Pro Met Val Asp Pro Glu Ala Gln 320 325 330

GCC CCC ACC AAG CAG CCC CCA CGG AGC TCC CCA AAT ACA GTC AAG AGG 1419

Ala Pro Thr Lys Gln Pro Pro Arg Ser Ser Pro Asn Thr Val Lys Arg 335 340 345

CCG ACT AAG AAA GGG CGT GAT CGA GCT GGC AAG GGC CAG AAG CCC CGT 1467

Pro Thr Lys Lys Gly Arg Asp Arg Ala Gly Lys Gly Gln Lys Pro Arg 350 355 360

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 479 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val Ser Gln Ala Gln 1 5 10 15

Ile Gln Ala Leu Met Asn Thr Ser Ala Pro Pro Ala Val Ser Pro Asn 20 25 30

9961 1LLL 331V99313V 19113199313V3333319V 91V93V3V3V

2T6T 93W31L91913993939933W3V33339 V3V99V393V 3V3931V91L V939V33V93

298T 9339939V3333V3313331 V39133131333319V1V91391 W3393

9 P 0Z17 9917

JΘS ojd Jqi β JV LH LBΛ ΛΘS ^L9 0J d Bv sλη oj aLI sλη β JV di ε08I 33133313V 3933V39193313991333939W 33331V 9W 393991

99ZI on £ oεt?

BLV sλη usv sλo naη BLV λl sλo θW OJd usv ΘLI ^ l UΘS USV LBΛ Z0ZI 339 VW 3W 391313 V393V139191V 3333W 31V 33V 39V 3W 319

Jλi sλo nθη sλo dsv

699T 3V1391913 1913V9

0117 9017 0017 sλη sλo θq i JΘS L e Λ nθη LBΛ IΘW ΘLI USV Jλi ojd jqi dJi jqi TT9I 9W 39131133V 33191991391991V 31V 3W 3V1933 V3V 99133V

nθη BLV LV sλη sλη ε99T 313 1399399W 9W

naη U L9 Π 9 sλη 9I9T 913 9V3 9V9 9W V99

Ile Thr Val Leu Ala Pro Gly Lys Gly Pro Trp Gln Val Ala Phe Ile 35 40 45

Gly Ile Thr Thr Gly Leu Leu Ser Leu Ala Thr Val Thr Gly Asn Leu 50 55 60

Leu Val Leu Ile Ser Phe Lys Val Asn Thr Glu Leu Lys Thr Val Asn 65 70 75 80

Asn Tyr Phe Leu Leu Ser Leu Ala Cys Ala Asp Leu Ile Ile Gly Thr 85 90 95

Phe Ser Met Asn Leu Tyr Thr Thr Tyr Leu Leu Met Gly His Trp Ala 100 105 110

Leu Gly Thr Leu Ala Cys Asp Leu Trp Leu Ala Leu Asp Tyr Val Ala 115 120 125

Ser Asn Ala Ser Val Met Asn Leu Leu Leu Ile Ser Phe Asp Arg Tyr 130 135 140

Phe Ser Val Thr Arg Pro Leu Ser Tyr Arg Ala Lys Arg Thr Pro Arg 145 150 155 160

Arg Ala Ala Leu Met Ile Gly Leu Ala Trp Leu Val Ser Phe Val Leu 165 170 175

Trp Ala Pro Ala Ile Leu Phe Trp Gln Tyr Leu Val Gly Glu Arg Thr 180 185 190

Met Leu Ala Gly Gln Cys Tyr Ile Gln Phe Leu Ser Gln Pro Ile Ile 195 200 205

Thr Phe Gly Thr Ala Met Ala Ala Phe Tyr Leu Pro Val Thr Val Met 210 215 220

Cys Thr Leu Tyr Trp Arg Ile Tyr Arg Glu Thr Glu Asn Arg Ala Arg 225 230 235 240

Glu Leu Ala Ala Leu Gln Gly Ser Glu Thr Pro Gly Lys Gly Gly Gly 245 250 255

Ser Ser Ser Ser Ser Glu Arg Ser Gln Pro Gly Ala Glu Gly Ser Pro 260 265 270

Glu Thr Pro Pro Gly Arg Cys Cys Arg Cys Cys Arg Ala Pro Arg Leu 275 280 285

Leu Gln Ala Tyr Ser Trp Lys Glu Glu Glu Glu Glu Asp Glu Gly Ser 290 295 300

Met Glu Ser Leu Thr Ser Ser Glu Gly Glu Glu Pro Gly Ser Glu Val 305 310 315 320

Val Ile Lys Met Pro Met Val Asp Pro Glu Ala Gln Ala Pro Thr Lys 325 330 335

Gln Pro Pro Arg Ser Ser Pro Asn Thr Val Lys Arg Pro Thr Lys Lys 340 345 350

Gly Arg Asp Arg Ala Gly Lys Gly Gln Lys Pro Arg Gly Lys Glu Gln 355 360 365

Leu Ala Lys Arg Lys Thr Phe Ser Leu Val Lys Glu Lys Lys Ala Ala 370 375 380

Arg Thr Leu Ser Ala Ile Leu Leu Ala Phe Ile Leu Thr Trp Thr Pro 385 390 395 400

Tyr Asn Ile Met Val Leu Val Ser Thr Phe Cys Lys Asp Cys Val Pro 405 410 415

Glu Thr Leu Trp Glu Leu Gly Tyr Trp Leu Cys Tyr Val Asn Ser Thr 420 425 430

Ile Asn Pro Met Cys Tyr Ala Leu Cys Asn Lys Ala Phe Arg Asp Thr 435 440 445

Phe Arg Leu Leu Leu Cys Arg Trp Asp Lys Arg Arg Trp Arg Lys Ile 450 455 460

Pro Lys Arg Pro Gly Ser Val His Arg Thr Pro Ser Arg Gln Cys 465 470 475

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1581 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

"Bacteriorhodopsin

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 1489..1491

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin stop codon."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(213, "")

(D) OTHER INFORMATION: /note= "G to T mutation removes AlwNI restriction site."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 427..435

(D) OTHER INFORMATION: /note= "AlwNI cloning site."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(930, "")

(D) OTHER INFORMATION: /note= "G to A mutation removes AlwNI site."

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 374

(D) OTHER INFORMATION: /note= "RNA start site."

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 1438..1440

(D) OTHER INFORMATION: /note= "Muscarinic stop codon."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(1488, "")

(D) OTHER INFORMATION: /note= "C to T mutation removes AlwNI restriction site."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

ATCTGCAGGA TGGGTGCAAC CGTGAAGTCC GTCACGGCTG CGTCACGACA GGAGCCGACC 60

AGCGACACCC AGAAGGTGCG AACGGTTGAG TGCCGCAACG ATCACGAGTT TTTCGTGCGC 120

TTCGAGTGGT AACACGCGTG CACGCATCGA CTTCACCGCG GGTGTTTCGA CGCCAGCCGG 180

CCGTTGAACC AGCAGGCAGC GGGCATTTCA CATCCGCTGT GGCCCACACA CTCGGTGGGG 240

TGCGCTATTT TGGTATGGTT TGGAATCCGC GTGTCGGCTC CGTGTCTGAC GGTTCATCGG 300

TCTAAATTCC GTCACGAGCG TACCATACTG ATTGGGTCGT AGAGTTACAC ACATATCCTC 360

GTTAGGTACT GTTGC ATG TTG GAG TTA TTG CCA ACA GCA GTG GAG GGG GTA 411 Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val 1 5 10

TCG CAG GCC CAG ATC CAG GCG CTG ATG AAC ACT TCA GCC CCA CCT GCT 459 Ser Gln Ala Gln Ile Gln Ala Leu Met Asn Thr Ser Ala Pro Pro Ala 15 20 25

GTC AGC CCC AAC ATC ACC GTC CTG GCA CCA GGA AAG GGT CCC TGG CAA 507 Val Ser Pro Asn Ile Thr Val Leu Ala Pro Gly Lys Gly Pro Trp Gln 30 35 40

GTG GCC TTC ATT GGG ATC ACC ACG GGC CTC CTG TCG CTA GCC ACA GTG 555 Val Ala Phe Ile Gly Ile Thr Thr Gly Leu Leu Ser Leu Ala Thr Val 45 50 55 60

ACA GGC AAC CTG CTG GTA CTC ATC TCT TTC AAG GTC AAC ACG GAG CTC 603 Thr Gly Asn Leu Leu Val Leu Ile Ser Phe Lys Val Asn Thr Glu Leu 65 70 75

AAG ACA GTC AAT AAC TAC TTC CTG CTG AGC CTG GCC TGT GCT GAC CTC 651 Lys Thr Val Asn Asn Tyr Phe Leu Leu Ser Leu Ala Cys Ala Asp Leu 80 85 90

ATC ATC GGT ACC TTC TCC ATG AAC CTC TAT ACC ACG TAC CTG CTC ATG 699 Ile Ile Gly Thr Phe Ser Met Asn Leu Tyr Thr Thr Tyr Leu Leu Met 95 100 105

GGC CAC TGG GCT CTG GGC ACG CTG GCT TGT GAC CTC TGG CTG GCC CTG 747 Gly His Trp Ala Leu Gly Thr Leu Ala Cys Asp Leu Trp Leu Ala Leu 110 115 120

GAC TAT GTG GCC AGC AAT GCC TCC GTC ATG AAT CTG CTG CTC ATC AGC 795 Asp Tyr Val Ala Ser Asn Ala Ser Val Met Asn Leu Leu Leu Ile Ser 125 130 135 140

TTT GAC CGC TAC TTC TCC GTG ACT CGG CCC CTG AGC TAC CGT GCC AAG 843 Phe Asp Arg Tyr Phe Ser Val Thr Arg Pro Leu Ser Tyr Arg Ala Lys 145 150 155

CGC ACA CCC CGC CGC GCA GCT CTG ATG ATC GGC CTG GCC TGG CTG GTT 891 Arg Thr Pro Arg Arg Ala Ala Leu Met Ile Gly Leu Ala Trp Leu Val 160 165 170

TCC TTT GTG CTC TGG GCC CCA GCC ATC CTC TTC TGG CAA TAC CTG GTA 939 Ser Phe Val Leu Trp Ala Pro Ala Ile Leu Phe Trp Gln Tyr Leu Val 175 180 185

GGG GAG CGG ACG ATG CTA GCT GGG CAG TGC TAC ATC CAG TTC CTC TCC 987 Gly Glu Arg Thr Met Leu Ala Gly Gln Cys Tyr Ile Gln Phe Leu Ser 190 195 200

CAG CCC ATC ATC ACC TTT GGC ACA GCC ATG GCT GCC TTC TAC CTC CCT 1035 Gln Pro Ile Ile Thr Phe Gly Thr Ala Met Ala Ala Phe Tyr Leu Pro 205 210 215 220

GTC ACA GTC ATG TGC ACG CTC TAC TGG CGC ATC TAC CGG GAG ACA GAG 1083 Val Thr Val Met Cys Thr Leu Tyr Trp Arg Ile Tyr Arg Glu Thr Glu 225 230 235

AAC CGA GCA CGG GAG CTG GCA GCC CTT CAG GGC TCC GAG ACG CCA GGC 1131 Asn Arg Ala Arg Glu Leu Ala Ala Leu Gln Gly Ser Glu Thr Pro Gly 240 245 250

AAA AAG GAG AAG AAG GCG GCT CGG ACC CTG AGT GCC ATC CTC CTG GCC 1179 Lys Lys Glu Lys Lys Ala Ala Arg Thr Leu Ser Ala Ile Leu Leu Ala 255 260 265

TTC ATC CTC ACC TGG ACA CCG TAC AAC ATC ATG GTG CTG GTG TCC ACC 1227 Phe Ile Leu Thr Trp Thr Pro Tyr Asn Ile Met Val Leu Val Ser Thr 270 275 280

TTC TGC AAG GAC TGT GTT CCC GAG ACC CTG TGG GAG CTG GGC TAC TGG 1275 Phe Cys Lys Asp Cys Val Pro Glu Thr Leu Trp Glu Leu Gly Tyr Trp 285 290 295 300

CTG TGC TAC GTC AAC AGC ACC ATC AAC CCC ATG TGC TAC GCA CTC TGC 1323 Leu Cys Tyr Val Asn Ser Thr Ile Asn Pro Met Cys Tyr Ala Leu Cys 305 310 315

AAC AAA GCC TTC CGG GAC ACC TTT CGC CTG CTG CTT TGC CGC TGG GAC 1371 Asn Lys Ala Phe Arg Asp Thr Phe Arg Leu Leu Leu Cys Arg Trp Asp 320 325 330

AAG AGA CGC TGG CGC AAG ATC CCC AAG CGC CCT GGC TCC GTG CAC CGC 1419 Lys Arg Arg Trp Arg Lys Ile Pro Lys Arg Pro Gly Ser Val His Arg 335 340 345

ACT CCC TCC CGC CAA TGC TGATAGTCCC CTCTCCTGCA TCCCTCCACC 1467

Thr Pro Ser Arg Gln Cys 350

CCAGCGGCCG CGACCAGCGA TTGATCGCAC ACGCAGGACA GCCCCACAAC CGGCGCGGCT 1527

GTGTTCAACG ACACACGATG AGTCCCCCAC TCGGTCTTGT ACTCGGATCC TTTT 1581

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 354 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val Ser Gln Ala Gln 1 5 10 15

Ile Gln Ala Leu Met Asn Thr Ser Ala Pro Pro Ala Val Ser Pro Asn 20 25 30

Ile Thr Val Leu Ala Pro Gly Lys Gly Pro Trp Gln Val Ala Phe Ile 35 40 45

Gly Ile Thr Thr Gly Leu Leu Ser Leu Ala Thr Val Thr Gly Asn Leu 50 55 60

Leu Val Leu Ile Ser Phe Lys Val Asn Thr Glu Leu Lys Thr Val Asn 65 70 75 80

Asn Tyr Phe Leu Leu Ser Leu Ala Cys Ala Asp Leu Ile Ile Gly Thr 85 90 95

Phe Ser Met Asn Leu Tyr Thr Thr Tyr Leu Leu Met Gly His Trp Ala 100 105 110

Leu Gly Thr Leu Ala Cys Asp Leu Trp Leu Ala Leu Asp Tyr Val Ala 115 120 125

Ser Asn Ala Ser Val Met Asn Leu Leu Leu Ile Ser Phe Asp Arg Tyr 130 135 140

Phe Ser Val Thr Arg Pro Leu Ser Tyr Arg Ala Lys Arg Thr Pro Arg 145 150 155 160

Arg Ala Ala Leu Met Ile Gly Leu Ala Trp Leu Val Ser Phe Val Leu 165 170 175

Trp Ala Pro Ala Ile Leu Phe Trp Gln Tyr Leu Val Gly Glu Arg Thr 180 185 190

Met Leu Ala Gly Gln Cys Tyr Ile Gln Phe Leu Ser Gln Pro Ile Ile 195 200 205

Thr Phe Gly Thr Ala Met Ala Ala Phe Tyr Leu Pro Val Thr Val Met 210 215 220

Cys Thr Leu Tyr Trp Arg Ile Tyr Arg Glu Thr Glu Asn Arg Ala Arg 225 230 235 240

Glu Leu Ala Ala Leu Gln Gly Ser Glu Thr Pro Gly Lys Lys Glu Lys 245 250 255

Lys Ala Ala Arg Thr Leu Ser Ala Ile Leu Leu Ala Phe Ile Leu Thr 260 265 270

Trp Thr Pro Tyr Asn Ile Met Val Leu Val Ser Thr Phe Cys Lys Asp 275 280 285

Cys Val Pro Glu Thr Leu Trp Glu Leu Gly Tyr Trp Leu Cys Tyr Val 290 295 300

Asn Ser Thr Ile Asn Pro Met Cys Tyr Ala Leu Cys Asn Lys Ala Phe 305 310 315 320

Arg Asp Thr Phe Arg Leu Leu Leu Cys Arg Trp Asp Lys Arg Arg Trp 325 330 335

Arg Lys Ile Pro Lys Arg Pro Gly Ser Val His Arg Thr Pro Ser Arg 340 345 350

Gln Cys
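The paired nucleotide and protein listings can be cross-checked by translating a stretch of the coding strand. The following is a minimal sketch, assuming the standard genetic code applies to these reading frames; the helper name `translate` and the lookup tables are illustrative and not part of the listing. It is shown on the first twelve codons of the coding sequence of SEQ ID NO:8 (positions 376..411).

```python
# Sketch: translate a codon run into three-letter residues.
# Assumption: standard genetic code; names here are hypothetical helpers.

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "***",  # stop codon
}


def translate(dna: str) -> list[str]:
    """Translate DNA (whitespace ignored) codon by codon, from its first base."""
    clean = "".join(dna.split()).upper()
    usable = len(clean) - len(clean) % 3  # drop a trailing partial codon
    return [ONE_TO_THREE[CODON_TABLE[clean[i:i + 3]]] for i in range(0, usable, 3)]


if __name__ == "__main__":
    # Positions 376..411 of SEQ ID NO:8 as printed in the listing.
    start = "ATG TTG GAG TTA TTG CCA ACA GCA GTG GAG GGG GTA"
    print(" ".join(translate(start)))
```

The same helper can be pointed at any other in-frame codon run of the listings to compare against the printed residues.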

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1848 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 376..414

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin pre-sequence."

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 1756..1758

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin stop codon."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 517..591

(D) OTHER INFORMATION: /note= "Helix I of rat serotonin receptor protein (Type 1C)."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 625..690

(D) OTHER INFORMATION: /note= "Helix II of rat serotonin receptor protein (Type 1C)."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 736..807

(D) OTHER INFORMATION: /note= "Helix III of rat serotonin receptor protein (Type 1C)."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 868..939

(D) OTHER INFORMATION: /note= "Helix IV of rat serotonin receptor protein (Type 1C)."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 997..1059

(D) OTHER INFORMATION: /note= "Helix V of rat serotonin receptor protein (Type 1C)."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1297..1362

(D) OTHER INFORMATION: /note= "Helix VI of rat serotonin receptor protein (Type 1C)."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1411..1476

(D) OTHER INFORMATION: /note= "Helix VII of rat serotonin receptor protein (Type 1C)."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(213, "")

(D) OTHER INFORMATION: /note= "G to A mutation removes AlwNI restriction site."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1732..1734

(D) OTHER INFORMATION: /note= "Codon encoding the C-terminal amino acid of the rat serotonin receptor protein (Type 1C)."

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 374

(D) OTHER INFORMATION: /note= "RNA start site."

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 376..1734

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 1735..1737

(D) OTHER INFORMATION: /note= "Serotonin stop codon."

(ix) FEATURE:

(A) NAME/KEY: repeat_region

(B) LOCATION: 436..452

(D) OTHER INFORMATION: /note= "Sequence encoding polyaspartic acid."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(1755, "")

(D) OTHER INFORMATION: /note= "C to T mutation removes AlwNI restriction site."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

ATCTGCAGGA TGGGTGCAAC CGTGAAGTCC GTCACGGCTG CGTCACGACA GGAGCCGACC 60

AGCGACACCC AGAAGGTGCG AACGGTTGAG TGCCGCAACG ATCACGAGTT TTTCGTGCGC 120

TTCGAGTGGT AACACGCGTG CACGCATCGA CTTCACCGCG GGTGTTTCGA CGCCAGCCGG 180

CCGTTGAACC AGCAGGCAGC GGGCATTTCA CATCCGCTGT GGCCCACACA CTCGGTGGGG 240

TGCGCTATTT TGGTATGGTT TGGAATCCGC GTGTCGGCTC CGTGTCTGAC GGTTCATCGG 300

TCTAAATTCC GTCACGAGCG TACCATACTG ATTGGGTCGT AGAGTTACAC ACATATCCTC 360

GTTAGGTACT GTTGC ATG TTG GAG TTA TTG CCA ACA GCA GTG GAG GGG GTA 411 Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val 1 5 10

TCG CAG GCC CAG ATC CAG GCG CTG GAC TAC AAG GAC GAT GAT GAC GTC 459 Ser Gln Ala Gln Ile Gln Ala Leu Asp Tyr Lys Asp Asp Asp Asp Val 15 20 25

GAC ACT TTT AAT TCC TCC GAT GGT GGA CGC TTG TTT CAA TTC CCG GAC 507 Asp Thr Phe Asn Ser Ser Asp Gly Gly Arg Leu Phe Gln Phe Pro Asp 30 35 40

GGG GTA CAA AAC TGG CCA GCA CTT TCA ATC GTC GTG ATT ATA ATC ATG 555 Gly Val Gln Asn Trp Pro Ala Leu Ser Ile Val Val Ile Ile Ile Met 45 50 55 60

ACA ATA GGG GGC AAC ATT CTT GTT ATC ATG GCA GTA AGC ATG GAG AAG 603 Thr Ile Gly Gly Asn Ile Leu Val Ile Met Ala Val Ser Met Glu Lys 65 70 75

AAA CTG CAC AAT GCA ACC AAT TAC TTC TTA ATG TCC CTA GCC ATT GCT 651

Lys Leu His Asn Ala Thr Asn Tyr Phe Leu Met Ser Leu Ala Ile Ala

80 85 90

GAT ATG CTG GTG GGA CTA CTT GTC ATG CCC CTG TCC CTG CTT GCT ATT 699

Asp Met Leu Val Gly Leu Leu Val Met Pro Leu Ser Leu Leu Ala Ile

95 100 105

CTT TAT GAT TAT GTC TGG CCT TTA CCT AGA TAT TTG TGC CCC GTC TGG 747

Leu Tyr Asp Tyr Val Trp Pro Leu Pro Arg Tyr Leu Cys Pro Val Trp

110 115 120

ATT TCA CTA GAT GTG CTA TCA ACT GCG TCC ATC ATG CAC CTC TGC 795

Ile Ser Leu Asp Val Leu Phe Ser Thr Ala Ser Ile Met His Leu Cys 125 130 135 140

GCC ATA TCG CTG GAC CGG TAT GTA GCA ATA CGT AAT CCT ATT GAG CAT 843 Ala Ile Ser Leu Asp Arg Tyr Val Ala Ile Arg Asn Pro Ile Glu His 145 150 155

AGC CGG TTC AAT TCG CGG ACT AAG GCC ATC ATG AAG ATT GCC ATC GTT 891 Ser Arg Phe Asn Ser Arg Thr Lys Ala Ile Met Lys Ile Ala Ile Val 160 165 170

TGG GCA ATA TCA ATA GGA GTT TCA GTT CCT ATC CCT GTG ATT GGA CTG 939 Trp Ala Ile Ser Ile Gly Val Ser Val Pro Ile Pro Val Ile Gly Leu 175 180 185

AGG GAC GAA AGC AAA GTG TTC GTG AAT AAC ACC ACG TGC GTG CTC AAT 987 Arg Asp Glu Ser Lys Val Phe Val Asn Asn Thr Thr Cys Val Leu Asn 190 195 200

GAC CCC AAC TTC GTT CTC ATC GGG TCC TTC GTG GCA TTC TTC ATC CCG 1035 Asp Pro Asn Phe Val Leu Ile Gly Ser Phe Val Ala Phe Phe Ile Pro 205 210 215 220

TTG ACG ATT ATG GTG ATC ACC TAC TTC TTA ACG ATC TAC GTC CTG CGC 1083 Leu Thr Ile Met Val Ile Thr Tyr Phe Leu Thr Ile Tyr Val Leu Arg 225 230 235

CGT CAA ACT CTG ATG TTA CTT CGA GGT CAC ACC GAG GAG GAA CTG GCT 1131 Arg Gln Thr Leu Met Leu Leu Arg Gly His Thr Glu Glu Glu Leu Ala 240 245 250

AAT ATG AGC CTG AAC TTT CTG AAC TGC TGC TGC AAG AAG AAT GGT GGT 1179 Asn Met Ser Leu Asn Phe Leu Asn Cys Cys Cys Lys Lys Asn Gly Gly 255 260 265

GAG GAA GAG AAC GCT CCG AAC CCT AAT CCA GAT CAG AAA CCA CGT CGA 1227 Glu Glu Glu Asn Ala Pro Asn Pro Asn Pro Asp Gln Lys Pro Arg Arg 270 275 280

81781 1111331V99313V 19113199313V3333319V

17I8T 91V93V3V3V 93W31L91913993939933W3V33339 V3V99V393V 3V3931V911

L Λ JΘS J S 179ZI V939V33V939339939V9191919V 19V usv J S OJd usv LBΛ OJd nθη nL9 nθη usv L9 LBΛ L9 IΘW nL9 ΘLI ZOZI IW 1313333W 319 V339139V9 Vll 3W 9V99199V391V 9V9 VIV

λL9 OJd nL9 j i SLH 6991 3991339V9 33V 1V3

Olfr 0V 0017 β JV Jλi ΘLI usv L Λ usv θη nL9 BJV X " L9 JΘS nθη BLV JMl BLV BLV IT9T 993 IVl 11V 3W 119 IW 3139V999V 99913191113913V 339139

LBΛ °Jd OJ sλη sλη dsv OJd sλη JXJ . dsv 8991 119 1331339W WV 3V9 V339W IVl 1V9

9191

j qi Jλi LBΛ naη LBΛ Z9H 13V 3V1919913 919

aqd L p Λ usv sλη λL9 6IH 111919 IW 9W 999

sλo nθη Ji ;aw ΘLI izεi 19111311993191331V IW 33V 31V 31111193339199191V 31V

nθη L sλη sλη nL9 ε2εt 913111919111311 V1911V 399113319 VW 331139 VW 9W W9

00ε 962 062 982 usv usv ΘLI BLV UL9 η.θW jqi λL96JV OJd BJV sλη nL9 sλη sλη sλη

9Z2T 3W 0W 31V 139 W391V 33V 399 V9V 3331939W W9 WV 9W 9W

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 453 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val Ser Gln Ala Gln 1 5 10 15

Ile Gln Ala Leu Asp Tyr Lys Asp Asp Asp Asp Val Asp Thr Phe Asn 20 25 30

Ser Ser Asp Gly Gly Arg Leu Phe Gln Phe Pro Asp Gly Val Gln Asn 35 40 45

Trp Pro Ala Leu Ser Ile Val Val Ile Ile Ile Met Thr Ile Gly Gly 50 55 60

Asn Ile Leu Val Ile Met Ala Val Ser Met Glu Lys Lys Leu His Asn 65 70 75 80

Ala Thr Asn Tyr Phe Leu Met Ser Leu Ala Ile Ala Asp Met Leu Val 85 90 95

Gly Leu Leu Val Met Pro Leu Ser Leu Leu Ala Ile Leu Tyr Asp Tyr 100 105 110

Val Trp Pro Leu Pro Arg Tyr Leu Cys Pro Val Trp Ile Ser Leu Asp 115 120 125

Val Leu Phe Ser Thr Ala Ser Ile Met His Leu Cys Ala Ile Ser Leu 130 135 140

Asp Arg Tyr Val Ala Ile Arg Asn Pro Ile Glu His Ser Arg Phe Asn 145 150 155 160

Ser Arg Thr Lys Ala Ile Met Lys Ile Ala Ile Val Trp Ala Ile Ser 165 170 175

Ile Gly Val Ser Val Pro Ile Pro Val Ile Gly Leu Arg Asp Glu Ser 180 185 190

Lys Val Phe Val Asn Asn Thr Thr Cys Val Leu Asn Asp Pro Asn Phe 195 200 205

Val Leu Ile Gly Ser Phe Val Ala Phe Phe Ile Pro Leu Thr Ile Met 210 215 220

Val Ile Thr Tyr Phe Leu Thr Ile Tyr Val Leu Arg Arg Gln Thr Leu 225 230 235 240

Met Leu Leu Arg Gly His Thr Glu Glu Glu Leu Ala Asn Met Ser Leu 245 250 255

Asn Phe Leu Asn Cys Cys Cys Lys Lys Asn Gly Gly Glu Glu Glu Asn 260 265 270

Ala Pro Asn Pro Asn Pro Asp Gln Lys Pro Arg Arg Lys Lys Lys Glu 275 280 285

Lys Arg Pro Arg Gly Thr Met Gln Ala Ile Asn Asn Glu Lys Lys Ala 290 295 300

Ser Lys Val Leu Gly Ile Val Phe Phe Val Phe Leu Ile Met Trp Cys 305 310 315 320

Pro Phe Phe Ile Thr Asn Ile Leu Ser Val Leu Cys Gly Lys Ala Cys 325 330 335

Asn Gln Lys Leu Met Glu Lys Leu Leu Asn Val Phe Val Trp Ile Gly 340 345 350

Tyr Val Cys Ser Gly Ile Asn Pro Leu Val Tyr Thr Leu Phe Asn Lys 355 360 365

Ile Tyr Arg Arg Ala Phe Ser Lys Tyr Leu Arg Cys Asp Tyr Lys Pro 370 375 380

Asp Lys Lys Pro Pro Val Arg Gln Ile Pro Arg Val Ala Ala Thr Ala 385 390 395 400

Leu Ser Gly Arg Glu Leu Asn Val Asn Ile Tyr Arg His Thr Asn Glu 405 410 415

Arg Val Ala Arg Lys Ala Asn Asp Pro Glu Pro Gly Ile Glu Met Gln 420 425 430

Val Glu Asn Leu Glu Leu Pro Val Asn Pro Ser Asn Val Val Ser Glu 435 440 445

Arg Ile Ser Ser Val 450

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1764 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: repeat_region

(B) LOCATION: 436..452

(D) OTHER INFORMATION: /note= "Sequence encoding polyaspartic acid."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 463..465

(D) OTHER INFORMATION: /note= "Codon encoding the N-terminal amino acid of the human thrombin receptor protein."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1630..1632

(D) OTHER INFORMATION: /note= "Codon encoding the C-terminal amino acid of the human thrombin receptor protein."

(ix) FEATURE:

(A) NAME/KEY: repeat_region

(B) LOCATION: 1633..1650

(D) OTHER INFORMATION: /note= "Sequence encoding polyhistidine."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 648..656

(D) OTHER INFORMATION: /note= "Deleted AlwNI restriction site."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 893..898

(D) OTHER INFORMATION: /note= "Deleted PstI restriction site."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1301..1309

(D) OTHER INFORMATION: /note= "Deleted AlwNI restriction site."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1394..1402

(D) OTHER INFORMATION: /note= "Deleted AlwNI restriction site."

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 374

(D) OTHER INFORMATION: /note= "RNA start site."

(ix) FEATURE:

(A) NAME/KEY: mutation

(B) LOCATION: replace(1671, "")

(D) OTHER INFORMATION: /note= "C to T mutation removes AlwNI site."

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 376..1650

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 376..414

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin pre-sequence."

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 1672..1674

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin stop codon."

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 1651..1653

(D) OTHER INFORMATION: /note= "Thrombin stop codon."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

ATCTGCAGGA TGGGTGCAAC CGTGAAGTCC GTCACGGCTG CGTCACGACA GGAGCCGACC 60

AGCGACACCC AGAAGGTGCG AACGGTTGAG TGCCGCAACG ATCACGAGTT TTTCGTGCGC 120

TTCGAGTGGT AACACGCGTG CACGCATCGA CTTCACCGCG GGTGTTTCGA CGCCAGCCGG 180

CCGTTGAACC AGCAGGCAGC GGGCATTTCA CATCCGCTGT GGCCCACACA CTCGGTGGGG 240

TGCGCTATTT TGGTATGGTT TGGAATCCGC GTGTCGGCTC CGTGTCTGAC GGTTCATCGG 300

TCTAAATTCC GTCACGAGCG TACCATACTG ATTGGGTCGT AGAGTTACAC ACATATCCTC 360

GTTAGGTACT GTTGC ATG TTG GAG TTA TTG CCA ACA GCA GTG GAG GGG GTA 411 Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val 1 5 10

022 9T2 012 902

ΘLI LV nθη sλo JM1 θqd JΘS BLV β JV ^L9 nθη j qi BJV dJi JΘS nθη

9εθT 31V 33991319113V 31133113999V V9991313V 193991331313

002 96T 061

JΘS UL9 }ΘH OJd Jλi LBΛ LB BLV θη θqd β JV dsv ΘLI JΘS 3LI LBΛ Z86 3319V391V 333 IVl 919919139913 LL 9933V911V 39V VIV 319

981 081 9ZI jqi η.aπ naη naη ΘLI JΘS BLV Jλi q.θw usv sλo J^l Θ d LV B LV JM1 6ε6; V3V 91V 31391131V 1313393V191V 3W 1913V1 LL V39 V3913V

) OZT 991 09T

LBΛ θqd βJV sλo nθη nL9 JΘS " L9 ΘMd UL9 di dsv JΘS λ " L9 JΘS ΘMd T68 319311393191911 W91319991119V39911V919V 399331 LL

991 091 K jλi Jλi JΘS ΘLI sλη aqd OJd naη LBΛ JΘS L Λ ΘMd nθη LBΛ dsv BLV 8178 3V1 IVl 39V 31V 9W 1113333139191319191119139191V9 V39

96Z

02T 911 011 sλη nθη ΘLI θqd L Λ LBΛ ΘLI BLV IθW ΘLI USV nθη OJd nθη JΘS LB Z17Z VW 91331V 31191911931V 33991V 31V 3W V13 V3331339V 319

90T 001 96

LBΛ θqd LBΛ ^L9 JM1 J^l LBΛ JΘS OJd LBΛ θqd nθη jqi naη dJi JΘS 669 V19111919 V9933V 3V1919131 V33319 LL 313 V3V 913991331

06 98 08

JΘS Jqi naη j λi λL9 JΘS LV dsv L9 JΘS ΘLI ΘMd BLV OJd nθη UL9 199 39V 33V 911 IVl V993313391V9 W9 V3131V 311 V39133113 W3

9Z OZ 99 sλη UL9 nθη OJd JΘS JΘS sλη usv ΘLI JΘS L B Λ nθη BJV Jλi nL9 jqi 809 VW W311313319V 39V WV IW 31V 331319 Vll V9V 3V1 W913V

999

dsv usv o d usv v Z09 1V9 IW 3333W 99V 313113 LL V319933331V9 Vll 33V 3393V9

92 02 9T

L&Λ dsv dsv dsv dsv sλη λi dsv nθη BLV UL9 ΘLI UL9 BLV UL9 JΘS 6917 3193V91V91V93V99W 3V13V99139399V331V 9V33399V3931

Jλi ΘLI

0991 9339939V91 3V1 VIV

JΘS usv BLV IΘW nθη 1191 39V 3W IW 9133W 19V 13139133V 1V991V VW 19V V3991V 911

UL9 JΘS OJd dsv J3 S S n L9 sλη sλo sλo 891 9V399919V 39V 3W IVl 19V 39V 3331V933119V W9 WV 391391

9T9I Vll 31V 19V 3V13193V199V 9V33919V91313311393V1 IVl 3V1

ΘLI θη OJd dsv JΘS LBΛ sλo LB nθη Z9VI 11V V133333V9 39V 319191319 313 vε ovε 9εε nθη j λi LV θqd -^i BLV BLV nL9 Jqi Jqi JΘ JM1 SLH JΘS nθη θqd 6IVI 3133V13391113V13391399V9 V3V 33V 33113V 3V3131113311

JΘS Jλi sλo ΘLI ΘLI IZεi V313V1 39111V 31V

9iε oiε 9oε θqd aLl sλo ΘMd L Λ BLV BLV JΘS nθη θqd nθη BLV BJV JΘS sλη sλη 8281 31131V 391311119139139 V319133119111399933319W 9W

9Z2T

082 9Z2 0Z2

Jλi sλo LBΛ JM1 ΘS ΘLI ΘLI nθη OJd LBΛ Θ d ΘMd ΘMd LB BLV JΘS Z22I IVl 19131993V 33111V 31V 913933919111111311319139131

992 092 992 θqd BLV JΘS ΘMd Jλi Jλi BLV λi λi λL9 ΠL9 nθη nθη jqi nig USV

6ZIT 311339 V313113V13V1339 IVl 3V1399 W931391333V W9 IW

092 9V2 0V2 nθη LBΛ d v LH sλo qi j qi ΘLI usv nθη X " L9 OJd L Λ UL9 ΘLI JM1 lεil 3139191V91V319133V 13V 31V 3W 3139993339199V331V 33V 82 082 922

UL9 nL9 sλη naη L Λ θη OJd L LBΛ λL9 BLV ΘLI LV nθη BLV dJi C80T WO 9V99W 313319313133919 V19999 V3931V 339911139991

CGACCAGCGA TTGATCGCAC ACGCAGGACA GCCCCACAAC CGGCGCGGCT GTGTTCAACG 1720

ACACACGATG AGTCCCCCAC TCGGTCTTGT ACTCGGATCC TTTT 1764

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 425 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val Ser Gln Ala Gln 1 5 10 15

Ile Gln Ala Leu Asp Tyr Lys Asp Asp Asp Asp Val Asp Ala Thr Leu 20 25 30

Asp Pro Arg Ser Phe Leu Leu Arg Asn Pro Asn Asp Lys Tyr Glu Pro 35 40 45

Phe Trp Glu Asp Glu Glu Lys Asn Glu Ser Gly Leu Thr Glu Tyr Arg 50 55 60

Leu Val Ser Ile Asn Lys Ser Ser Pro Leu Gln Lys Gln Leu Pro Ala 65 70 75 80

Phe Ile Ser Glu Asp Ala Ser Gly Tyr Leu Thr Ser Ser Trp Leu Thr 85 90 95

Leu Phe Val Pro Ser Val Tyr Thr Gly Val Phe Val Val Ser Leu Pro 100 105 110

Leu Asn Ile Met Ala Ile Val Val Phe Ile Leu Lys Met Lys Val Lys 115 120 125

Lys Pro Ala Val Val Tyr Met Leu His Leu Ala Thr Ala Asp Val Leu 130 135 140

Phe Val Ser Val Leu Pro Phe Lys Ile Ser Tyr Tyr Phe Ser Gly Ser 145 150 155 160

Asp Trp Gln Phe Gly Ser Glu Leu Cys Arg Phe Val Thr Ala Ala Phe 165 170 175

Tyr Cys Asn Met Tyr Ala Ser Ile Leu Leu Met Thr Val Ile Ser Ile 180 185 190

Asp Arg Phe Leu Ala Val Val Tyr Pro Met Gln Ser Leu Ser Trp Arg 195 200 205

Thr Leu Gly Arg Ala Ser Phe Thr Cys Leu Ala Ile Trp Ala Leu Ala 210 215 220

Ile Ala Gly Val Val Pro Leu Val Leu Lys Glu Gln Thr Ile Gln Val 225 230 235 240

Pro Gly Leu Asn Ile Thr Thr Cys His Asp Val Leu Asn Glu Thr Leu 245 250 255

Leu Glu Gly Tyr Tyr Ala Tyr Tyr Phe Ser Ala Phe Ser Ala Val Phe 260 265 270

Phe Phe Val Pro Leu Ile Ile Ser Thr Val Cys Tyr Val Ser Ile Ile 275 280 285

Arg Cys Leu Ser Ser Ser Ala Val Ala Asn Arg Ser Lys Lys Ser Arg 290 295 300

Ala Leu Phe Leu Ser Ala Ala Val Phe Cys Ile Phe Ile Ile Cys Phe 305 310 315 320

Gly Pro Thr Asn Val Leu Leu Ile Ala His Tyr Ser Phe Leu Ser His 325 330 335

Thr Ser Thr Thr Glu Ala Ala Tyr Phe Ala Tyr Leu Leu Cys Val Cys 340 345 350

Val Ser Ser Ile Ser Ser Cys Ile Asp Pro Leu Ile Tyr Tyr Tyr Ala 355 360 365

Ser Ser Glu Cys Gln Arg Tyr Val Tyr Ser Ile Leu Cys Cys Lys Glu 370 375 380

Ser Ser Asp Pro Ser Ser Tyr Asn Ser Ser Gly Gln Leu Met Ala Ser 385 390 395 400

Lys Met Asp Thr Cys Ser Ser Asn Leu Asn Asn Ser Ile Tyr Lys Lys 405 410 415

Leu Leu Thr His His His His His His 420 425
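The CDS coordinates in the feature tables can be checked against the lengths of the paired protein listings, since an inclusive range start..end spans (end - start + 1) / 3 codons. The sketch below is illustrative only: the function names are hypothetical, the CDS ranges are those printed for SEQ ID NO:6, NO:10 and NO:12, and the residue counts are taken from the residue numbering of SEQ ID NO:7 (479) and the stated lengths of SEQ ID NO:11 (453) and SEQ ID NO:13 (425).

```python
# Sketch: compare CDS feature ranges with protein lengths.
# Names are hypothetical; coordinates are copied from the listings.

def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive, 1-based range such as 376..1812."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}..{end} is not a whole number of codons")
    return length // 3


# (label, CDS start, CDS end, residues in the paired protein listing)
LISTED = [
    ("SEQ ID NO:6 / NO:7", 376, 1812, 479),
    ("SEQ ID NO:10 / NO:11", 376, 1734, 453),
    ("SEQ ID NO:12 / NO:13", 376, 1650, 425),
]


def check_all(entries=LISTED) -> dict[str, bool]:
    """Map each label to True when the CDS range matches the protein length."""
    return {label: codon_count(s, e) == n for label, s, e, n in entries}


if __name__ == "__main__":
    for label, ok in check_all().items():
        print(label, "consistent" if ok else "MISMATCH")
```

A range that is not a multiple of three raises `ValueError`, which is a convenient way to catch a mis-scanned coordinate.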

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2147 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 378..380

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin start codon."

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 378..416

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin pre-sequence."

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 378.. 054

(ix) FEATURE:

(A) NAME/KEY: misc feature

(B) LOCATION: 417.7419

(D) OTHER INFORMATION: /note- "Codon encoding N-terminal amino acid of mature bacteriorhodopsin."

(ix) FEATURE:

(A) NAME/KEY: misc feature

(B) LOCATION: 1122..1124

(D) OTHER INFORMATION: /note- "Codon encoding amino acid number 236 of bacteriorhodopsin."

(ix) FEATURE:

(A) NAME/KEY: misc feature

(B) LOCATION: 11377.1139

(D) OTHER INFORMATION: /note- "Codon encoding amino acid number 6 of the catalytic subunit of E. coli Aspartate Transcarbamylase."

(ix) FEATURE:

(A) NAME/KEY: misc feature

(B) LOCATION: 1125..1178

(D) OTHER INFORMATION: /note- "Synthetic DNA fragment."

(ix) FEATURE:

(A) NAME/KEY: misc feature

(B) LOCATION: 1125..1136

(D) OTHER INFORMATION: /note- "Sequence encoding Factor Xa proteolytic site."

(ix) FEATURE:

(A) NAME/KEY: misc feature

(B) LOCATION: 2037..2039

(D) OTHER INFORMATION: /note- "Codon encoding amino acid number 306 of E. coli Aspartate Transcarbamylase."


(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 2040..2054

(D) OTHER INFORMATION: /note= "Sequence encoding bacteriorhodopsin C-terminal amino acid numbers 245 through 249."

(ix) FEATURE:

(A) NAME/KEY: terminator

(B) LOCATION: 2055..2057

(D) OTHER INFORMATION: /note= "Bacteriorhodopsin stop codon."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

TAATCTGCAG GATGGGTGCA ACCGTGAAGT CCGTCACGGC TGCGTCACGA CAGGAGCCGA 60

CCAGCGACAC CCAGAAGGTG CGAACGGTTG AGTGCCGCAA CGATCACGAG TTTTTCGTGC 120

GCTTCGAGTG GTAACACGCG TGCACGCATC GACTTCACCG CGGGTGTTTC GACGCCAGCC 180

GGCCGTTGAA CCAGCAGGCA GCGGGCATTT CACAGCCGCT GTGGCCCACA CACTCGGTGG 240

GGTGCGCTAT TTTGGTATGG TTTGGAATCC GCGTGTCGGC TCCGTGTCTG ACGGTTCATC 300

GGTCTAAATT CCGTCACGAG CGTACCATAC TGATTGGGTC GTAGAGTTAC ACACATATCC 360

TCGTTAGGTA CTGTTGC ATG TTG GAG TTA TTG CCA ACA GCA GTG GAG GGG 410 Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly 1 5 10

GTA TCG CAG GCC CAG ATC ACC GGA CGT CCG GAG TGG ATC TGG CTA GCG 458 Val Ser Gln Ala Gln Ile Thr Gly Arg Pro Glu Trp Ile Trp Leu Ala 15 20 25

CTC GGT ACG GCG CTA ATG GGA CTC GGG ACG CTC TAT TTC CTC GTG AAA 506 Leu Gly Thr Ala Leu Met Gly Leu Gly Thr Leu Tyr Phe Leu Val Lys 30 35 40

GGG ATG GGC GTC TCG GAC CCA GAT GCA AAG AAA TTC TAC GCC ATC ACG 554 Gly Met Gly Val Ser Asp Pro Asp Ala Lys Lys Phe Tyr Ala Ile Thr 45 50 55

ACG CTC GTC CCA GCC ATC GCG TTC ACG ATG TAC CTC TCG ATG CTG CTG 602 Thr Leu Val Pro Ala Ile Ala Phe Thr Met Tyr Leu Ser Met Leu Leu 60 65 70 75

GGG TAT GGC CTC ACA ATG GTA CCG TTC GGT GGG GAG CAG AAC CCC ATC 650 Gly Tyr Gly Leu Thr Met Val Pro Phe Gly Gly Glu Gln Asn Pro Ile 80 85 90


TAC TGG GCG CGG TAC GCT GAC TGG CTG TTC ACC ACG CCG CTG TTG TTG 698 Tyr Trp Ala Arg Tyr Ala Asp Trp Leu Phe Thr Thr Pro Leu Leu Leu 95 100 105

TTA GAC CTC GCG TTG CTC GTT GAC GCG GAT CAG GGA ACG ATC CTT GCG 746 Leu Asp Leu Ala Leu Leu Val Asp Ala Asp Gln Gly Thr Ile Leu Ala 110 115 120

CTC GTC GGT GCC GAC GGC ATC ATG ATC GGG ACC GGC CTG GTC GGC GCA 794 Leu Val Gly Ala Asp Gly Ile Met Ile Gly Thr Gly Leu Val Gly Ala 125 130 135

CTG ACG AAG GTC TAC TCG TAC CGC TTC GTG TGG TGG GCG ATC AGC ACC 842 Leu Thr Lys Val Tyr Ser Tyr Arg Phe Val Trp Trp Ala Ile Ser Thr 140 145 150 155

GCA GCG ATG CTG TAC ATC CTG TAC GTG CTG TTC TTC GGG TTC ACC TCG 890 Ala Ala Met Leu Tyr Ile Leu Tyr Val Leu Phe Phe Gly Phe Thr Ser 160 165 170

AAG GCC GAA AGC ATG CGC CCC GAG GTC GCA TCC ACG TTC AAA GTA CTG 938 Lys Ala Glu Ser Met Arg Pro Glu Val Ala Ser Thr Phe Lys Val Leu 175 180 185

CGT AAC GTT ACC GTT GTG TTG TGG TCC GCG TAT CCC GTC GTG TGG CTG 986 Arg Asn Val Thr Val Val Leu Trp Ser Ala Tyr Pro Val Val Trp Leu 190 195 200

ATC GGC AGC GAA GGT GCG GGA ATC GTG CCG CTG AAC ATC GAG ACG CTG 1034 Ile Gly Ser Glu Gly Ala Gly Ile Val Pro Leu Asn Ile Glu Thr Leu 205 210 215

CTG TTC ATG GTG CTT GAC GTG AGC GCG AAG GTC GGC TTC GGG CTC ATC 1082 Leu Phe Met Val Leu Asp Val Ser Ala Lys Val Gly Phe Gly Leu Ile 220 225 230 235

CTC CTG CGC AGT CGT GCG ATC TTC GGC GAA GCC GAA GCG CCG ATC GAA 1130 Leu Leu Arg Ser Arg Ala Ile Phe Gly Glu Ala Glu Ala Pro Ile Glu 240 245 250

GGT CGT CAG AAA CAT ATC ATT TCC ATA AAC GAC CTT AGT CGC GAT GAC 1178 Gly Arg Gln Lys His Ile Ile Ser Ile Asn Asp Leu Ser Arg Asp Asp 255 260 265

CTT AAT CTG GTG CTG GCG ACA GCG GCG AAA CTG AAA GCA AAC CCG CAA 1226 Leu Asn Leu Val Leu Ala Thr Ala Ala Lys Leu Lys Ala Asn Pro Gln 270 275 280

CCA GAG CTG TTG AAG CAC AAA GTC ATT GCC AGC TGT TTC TTC GAA GCC 1274 Pro Glu Leu Leu Lys His Lys Val Ile Ala Ser Cys Phe Phe Glu Ala 285 290 295


[illegible] 1322 Ser Thr Arg Thr Arg Leu Ser Phe Gln Thr Ser Met His Arg Leu Gly 300 305 310 315

GCC AGC GTG GTG GGC TTC TCC GAC AGC [illegible] AAA 1370 Ala Ser Val Val Gly Phe Ser Asp Ser Ala Asn Thr Ser Leu Gly Lys 320 325 330

AAA GGC GAA ACG CTT GCC GAT ACC ATT TCA GTT ATC AGC ACT TAC GTC 1418 Lys Gly Glu Thr Leu Ala Asp Thr Ile Ser Val Ile Ser Thr Tyr Val 335 340 345

GAT GCG [illegible] CGC CTG GCC 1466 Asp Ala Ile Val Met Arg His Pro Gln Glu Gly Ala Ala Arg Leu Ala 350 355 360

ACC GAG TTT TCC GGC AAT GTA CCG GTA CTG AAT GCC GGT GAT GGC TCC 1514 Thr Glu Phe Ser Gly Asn Val Pro Val Leu Asn Ala Gly Asp Gly Ser 365 370 375

AAC CAA CAT CCG ACG CAA ACC TTG CTG GAC TTA TTC ACT ATT CAG GAA 1562 Asn Gln His Pro Thr Gln Thr Leu Leu Asp Leu Phe Thr Ile Gln Glu 380 385 390 395

ACC CAG GGG CGT CTG GAC AAT CTC CAC GTC GCA ATG GTT GGT GAC CTG 1610 Thr Gln Gly Arg Leu Asp Asn Leu His Val Ala Met Val Gly Asp Leu 400 405 410

AAA TAT GGT CGC ACC GTT CAC TCC CTG ACT CAG GCG TTA GCT AAG TTC 1658 Lys Tyr Gly Arg Thr Val His Ser Leu Thr Gln Ala Leu Ala Lys Phe 415 420 425

GAC GGC AAC CGT TTT TAC TTC ATC GCG CCG GAC GCG CTG GCA ATG CCG 1706 Asp Gly Asn Arg Phe Tyr Phe Ile Ala Pro Asp Ala Leu Ala Met Pro 430 435 440

CAA TAC ATT CTG GAT ATG CTC GAT GAA AAA GGG ATC GCA TGG AGT CTG 1754 Gln Tyr Ile Leu Asp Met Leu Asp Glu Lys Gly Ile Ala Trp Ser Leu 445 450 455

CAC AGC TCT ATT GAA GAA GTG ATG GTG GAA GTA GAC ATC CTG TAC ATG 1802 His Ser Ser Ile Glu Glu Val Met Val Glu Val Asp Ile Leu Tyr Met 460 465 470 475

ACC CGC GTG CAA AAA GAG CGT CTG GAC CCG TCC GAG TAC GCC AAC GTG 1850 Thr Arg Val Gln Lys Glu Arg Leu Asp Pro Ser Glu Tyr Ala Asn Val 480 485 490

AAA GCG CAG TTT GTT CTT CGC GCC AGT GAT CTC CAC AAC GCC AAA GCC 1898 Lys Ala Gln Phe Val Leu Arg Ala Ser Asp Leu His Asn Ala Lys Ala 495 500 505


AAT ATG AAA GTG CTG CAT CCG TTG CCG CGT GTT GAT GAG ATT GCG ACG 1946 Asn Met Lys Val Leu His Pro Leu Pro Arg Val Asp Glu Ile Ala Thr 510 515 520

GAT GTT GAT AAA ACG CCA CAC GCC TGG TAC TTC CAG CAG GCA GGC AAC 1994 Asp Val Asp Lys Thr Pro His Ala Trp Tyr Phe Gln Gln Ala Gly Asn 525 530 535

GGG ATT TTC GCT CTG CAA GCG TTA CTG GCA CTG GTT CTG AAT CGG GCC 2042 Gly Ile Phe Ala Leu Gln Ala Leu Leu Ala Leu Val Leu Asn Arg Ala 540 545 550 555

GCG ACC AGC GAC TGATCGCACA CGCAGGACAG CCCCACAACC GGCGCGGCTG 2094 Ala Thr Ser Asp

TGTTCAACGA CACACGATGA GTCCCCCACT CGGTCTTGTA CTCGGATCCT TTT 2147

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 559 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

Met Leu Glu Leu Leu Pro Thr Ala Val Glu Gly Val Ser Gln Ala Gln 1 5 10 15

Ile Thr Gly Arg Pro Glu Trp Ile Trp Leu Ala Leu Gly Thr Ala Leu 20 25 30

Met Gly Leu Gly Thr Leu Tyr Phe Leu Val Lys Gly Met Gly Val Ser 35 40 45

Asp Pro Asp Ala Lys Lys Phe Tyr Ala Ile Thr Thr Leu Val Pro Ala 50 55 60

Ile Ala Phe Thr Met Tyr Leu Ser Met Leu Leu Gly Tyr Gly Leu Thr 65 70 75 80

Met Val Pro Phe Gly Gly Glu Gln Asn Pro Ile Tyr Trp Ala Arg Tyr 85 90 95

Ala Asp Trp Leu Phe Thr Thr Pro Leu Leu Leu Leu Asp Leu Ala Leu 100 105 110

Leu Val Asp Ala Asp Gln Gly Thr Ile Leu Ala Leu Val Gly Ala Asp 115 120 125

Gly Ile Met Ile Gly Thr Gly Leu Val Gly Ala Leu Thr Lys Val Tyr 130 135 140

Ser Tyr Arg Phe Val Trp Trp Ala Ile Ser Thr Ala Ala Met Leu Tyr 145 150 155 160

Ile Leu Tyr Val Leu Phe Phe Gly Phe Thr Ser Lys Ala Glu Ser Met 165 170 175

Arg Pro Glu Val Ala Ser Thr Phe Lys Val Leu Arg Asn Val Thr Val 180 185 190

Val Leu Trp Ser Ala Tyr Pro Val Val Trp Leu Ile Gly Ser Glu Gly 195 200 205

Ala Gly Ile Val Pro Leu Asn Ile Glu Thr Leu Leu Phe Met Val Leu 210 215 220

Asp Val Ser Ala Lys Val Gly Phe Gly Leu Ile Leu Leu Arg Ser Arg 225 230 235 240

Ala Ile Phe Gly Glu Ala Glu Ala Pro Ile Glu Gly Arg Gln Lys His 245 250 255

Ile Ile Ser Ile Asn Asp Leu Ser Arg Asp Asp Leu Asn Leu Val Leu 260 265 270

Ala Thr Ala Ala Lys Leu Lys Ala Asn Pro Gln Pro Glu Leu Leu Lys 275 280 285

His Lys Val Ile Ala Ser Cys Phe Phe Glu Ala Ser Thr Arg Thr Arg 290 295 300

Leu Ser Phe Gln Thr Ser Met His Arg Leu Gly Ala Ser Val Val Gly 305 310 315 320

Phe Ser Asp Ser Ala Asn Thr Ser Leu Gly Lys Lys Gly Glu Thr Leu 325 330 335

Ala Asp Thr Ile Ser Val Ile Ser Thr Tyr Val Asp Ala Ile Val Met 340 345 350

Arg His Pro Gln Glu Gly Ala Ala Arg Leu Ala Thr Glu Phe Ser Gly 355 360 365

Asn Val Pro Val Leu Asn Ala Gly Asp Gly Ser Asn Gln His Pro Thr 370 375 380

Gln Thr Leu Leu Asp Leu Phe Thr Ile Gln Glu Thr Gln Gly Arg Leu 385 390 395 400

Asp Asn Leu His Val Ala Met Val Gly Asp Leu Lys Tyr Gly Arg Thr 405 410 415