Title:
NOVEL SYNTHETIC PROTEIN STRUCTURAL TEMPLATES FOR THE GENERATION, SCREENING AND EVOLUTION OF FUNCTIONAL MOLECULAR SURFACES
Document Type and Number:
WIPO Patent Application WO/1997/045538
Kind Code:
A1
Abstract:
The present invention relates generally to protein molecules with novel binding or catalytic properties. More specifically, the invention relates to the production of libraries of peptide sequences in the framework of a structural template derived from Pleckstrin-Homology (PH) domains and the identification of such sequences that possess the desired properties of binding macromolecular or small molecule ligands, including the transition states of chemical reactions. The invention also relates to the provision of small molecules, derived from the so-obtained peptide sequences that possess ligand binding properties comparable to those of the peptides in the context of the structural framework.

Inventors:
STEIPE BORIS (DE)
BRUHN HEIKE (DE)
FUNK MARTIN (DE)
HENKEL THOMAS (DE)
Application Number:
PCT/EP1997/002840
Publication Date:
December 04, 1997
Filing Date:
May 30, 1997
Assignee:
MEDIGENE AG (DE)
STEIPE BORIS (DE)
BRUHN HEIKE (DE)
FUNK MARTIN (DE)
HENKEL THOMAS (DE)
International Classes:
C12N15/09; A61K31/70; A61K31/7088; A61K31/711; A61K38/00; A61K38/17; A61K39/00; A61K48/00; C07K14/47; C12N1/19; C12N1/21; C12N5/10; C12N15/10; C12N15/62; C12N15/81; C12P21/02; C12Q1/02; C40B40/02; G06F17/30; C12R1/19; (IPC1-7): C12N15/10; C12N15/62; C12N15/81; C12N1/19; C07K14/47; A61K38/17; A61K48/00
Domestic Patent References:
WO1993008278A1 1993-04-29
WO1994002502A1 1994-02-03
WO1991005058A1 1991-04-18
WO1992002536A1 1992-02-20
WO1993003172A1 1993-02-18
WO1991012328A1 1991-08-22
Other References:
M.G. CULL ET AL.: "Screening for receptor ligands using large libraries of peptides linked to the C terminus of the lac repressor", PROC. NATL. ACAD. SCI., vol. 89, no. 5, 1992, NATL. ACAD. SCI.,WASHINGTON,DC,US;, pages 1865 - 1869, XP002043736
SCHATZ P J: "CONSTRUCTION AND SCREENING OF BIOLOGICAL PEPTIDE LIBRARIES", CURRENT OPINION IN BIOTECHNOLOGY, vol. 5, no. 5, 1 January 1994 (1994-01-01), pages 487 - 494, XP000564300
SCHATZ P J: "USE OF PEPTIDE LIBRARIES TO MAP THE SUBSTRATE SPECIFICITY OF A PEPTIDE-MODIFYING ENZYME: A 13 RESIDUE CONSENSUS PEPTIDE SPECIFIES BIOTINYLATION IN ESCHERICHIA COLI", BIO/TECHNOLOGY, vol. 11, October 1993 (1993-10-01), pages 1138 - 1143, XP000606894
LEVENS D ET AL: "NOVEL METHOD FOR IDENTIFYING SEQUENCE-SPECIFIC DNA-BINDING PROTEINS", MOLECULAR AND CELLULAR BIOLOGY, vol. 5, no. 9, 1 September 1985 (1985-09-01), pages 2307 - 2315, XP000562760
LU Z ET AL: "EXPRESSION OF THIOREDOXIN RANDOM PEPTIDE LIBRARIES ON THE ESCHERICHIA COLI CELL SURFACE AS FUNCTIONAL FUSIONS TO FLAGELLIN: A SYSTEM DESIGNED FOR EXPLORING PROTEIN-PROTEIN INTERACTIONS", BIO/TECHNOLOGY, vol. 13, April 1995 (1995-04-01), pages 366 - 372, XP002033346
N. S. VISPO ET AL.: "Hybrid rop-pIII proteins for the display of constrained peptides in filamentous phage capsid", ANNALES DE BIOLOGIE CLINIQUE, vol. 50, 1993, ELSEVIER, PARIS, FR, pages 917 - 922, XP002043759
L.C. MATTHEAKIS ET AL.: "An in vitro polysome display system for identifying ligands from very large peptide libraries", PROC. NATL. ACAD. SCI., vol. 91, September 1994 (1994-09-01), NATL. ACAD. SCI.,WASHINGTON,DC,US;, pages 9022 - 9026, XP002043737
K.M. FERGUSON ET AL.: "Scratching the surface with the PH domain", NATURE STRUCTURAL BIOLOGY, vol. 2, no. 9, September 1995 (1995-09-01), WASHINGTON, DC,US;, pages 715 - 718, XP002043738
T.J. GIBSON ET AL.: "PH domain: the first anniversary", TRENDS IN BIOCHEMICAL SCIENCES, vol. 19, no. 9, September 1994 (1994-09-01), ELSEVIER, AMSTERDAM, NL, pages 349 - 353, XP002043739
HASLAM R J ET AL: "PLECKSTRIN DOMAIN HOMOLOGY", NATURE, vol. 363, 27 May 1993 (1993-05-27), pages 309/310, XP002030407
Claims:
CLAIMS
1. A nucleic acid vector molecule, characterized in that it comprises a first non-natural nucleic acid sequence encoding a polypeptide, wherein the topology of the tertiary fold of said polypeptide is homologous to that of a Pleckstrin-Homology (PH) domain, characterized in that said non-natural nucleic acid sequence, as compared to corresponding natural nucleic acid sequences, (a) comprises, in at least one of the nucleotide sequences encoding a loop of the PH domain, at least one altered nucleotide, wherein said at least one altered nucleotide confers an alteration in the corresponding amino acid and said alteration(s) in said corresponding amino acid(s) confer(s) a gain of function; (b) comprises, in at least one of the nucleotide sequences encoding a loop of the PH domain or the regions directly adjacent thereto, a sequence representing one or more endonuclease restriction sites; or/and (c) confers to the encoded polypeptide an improved stability, improved folding properties, an improved resistance to oxidation, aggregation or an improved protease resistance, facilitated or improved purification, altered properties of immunogenicity, and/or the capacity for being localized in specific cellular compartments.
2. The vector molecule according to claim 1, characterized in that said loop is loop AB, CD or FG.
3. The vector molecule according to claim 1 or 2, characterized in that said PH domain is the cytohesin 1 or cytohesin 2 domain.
4. The vector molecule according to any one of claims 1 to 3, characterized in that it additionally comprises, adjacent to the 5'- or 3'-end of said first non-natural nucleic acid sequence, at least one further nucleic acid sequence encoding an extension of the N- or C-terminus of the corresponding polypeptide that facilitates or improves the purification or detection of said polypeptide, or aids in targeting said polypeptide to a specific location.
5. A recombinant nucleic acid vector molecule comprising (a) the vector molecule according to any one of claims 1 to 4 or a corresponding vector wherein said first non-natural nucleic acid sequence is replaced by the corresponding natural nucleic acid sequence; and/or (b) at least one second nucleic acid sequence encoding one or more amino acid sequences wherein said one or more amino acid sequences form at least a part of at least one loop, preferably loop AB, CD and/or FG, of said polypeptide encoded by said first or corresponding natural nucleic acid sequence, said one or more amino acid sequences not being the one that naturally occurs in said natural nucleic acid sequences, wherein the structural integrity of the polypeptide encoded by said first or corresponding natural nucleic acid sequence is maintained by the polypeptide encoded by said first or corresponding natural and second nucleic acid sequences.
6. The (recombinant) vector molecule according to any one of claims 1 to 5, characterized in that said nucleic acid is DNA.
7. The recombinant vector molecule according to claim 5 or 6, characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence replace at least a part of at least one loop, preferably loop AB, CD and/or FG.
8. The recombinant vector molecule according to claim 5 or 6, characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence are inserted into at least one loop, preferably loop AB, CD and/or FG.
9. The recombinant vector molecule according to any one of claims 5 to 8, characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence form or contribute to the forming of a continuous epitope.
10. The recombinant vector molecule according to any one of claims 5 to 8, characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence form or contribute to the forming of a discontinuous epitope.
11. The recombinant vector molecule according to any one of claims 5 to 10, characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence comprise amino acid sequences that represent or contribute to the representation of a lead structure, a (poly)peptide with ligand binding capabilities, a (poly)peptide with substrate-modifying capabilities, or a (poly)peptide capable of interacting with a catalyst or reactant.
12. A library of recombinant vector molecules comprising a plurality of recombinant vector molecules according to any one of claims 5 to 11.
13. A host transformed with the recombinant vector molecule according to any one of claims 5 to 11.
14. The host according to claim 13, characterized in that it is E. coli, a yeast cell, preferably S. cerevisiae, an insect cell or a mammalian cell.
15. A plurality of hosts according to claim 13 or 14, transformed with a library of claim 12.
16. A (poly)peptide encoded by the vector molecule according to any one of claims 1 to 11 or produced by the host according to claim 13 or 14 or the plurality of hosts of claim 15 or being chemically synthesized and having the same amino acid sequence as the (poly)peptide encoded by the vector molecule according to any one of claims 1 to 11 or an amino acid sequence comprising compounds which do not naturally occur in (poly)peptide chains, preferably stereoisomers, D-amino acids, proteinous amino acids or α-amino acids but being otherwise identical to the above-recited (poly)peptides.
17. A library comprising (poly)peptides according to claim 16.
18. A small molecular weight ligand derived from the (poly)peptide according to claim 16.
19. The small molecular weight ligand according to claim 18, characterized in that it is conformationally constrained and is preferably a cyclopeptide.
20. A method for the generation of a library according to claim 12, characterized in that it comprises the insertion of said second nucleic acid sequences encoding a plurality of amino acid sequences into said first or corresponding natural nucleic acid sequences.
21. The method according to claim 20, characterized in that said insertion is effected by ligation of said second nucleic acid sequences into restriction enzyme digested first or corresponding natural nucleic acid sequences.
22. A method of screening a library according to claim 12 comprising (a) expressing the (poly)peptides encoded by the recombinant vector molecules according to any one of claims 5 to 11; and (b) screening said (poly)peptides for the desired properties encoded by said second nucleic acid sequences.
23. A method of evolving or screening the library according to claim 12 comprising (a) expressing the (poly)peptides encoded by the recombinant vector molecules according to any one of claims 5 to 11, wherein a part of said (poly)peptides confers a selective advantage upon the transformed host according to claim 13 or 14; and (b) screening the host population for said selective advantage.
24. The method according to claim 22 or 23, characterized in that it employs steps that physically couple the genetic information of said second nucleic acid sequences with the expressed and screened (poly)peptides.
25. The method according to any one of claims 22 to 24, characterized in that said expression in step (a) is effected by phage display techniques.
26. The method according to claims 22 to 25, characterized in that said screening is effected via techniques that mediate the coupling of genetic information and expressed polypeptide by DNA-binding domains fused to said polypeptides encoded by said second nucleic acid sequences, wherein said DNA-binding domains bind to said vector molecules according to any one of claims 5 to 11, or wherein said screening is effected by polysome display techniques or by interaction trap techniques.
27. The method according to claim 26 wherein said DNA-binding domain is the tet or lac repressor molecule.
28. A method of screening a library according to claim 17 comprising screening the (poly)peptides contained in said library for a desired property reflected by the amino acids corresponding to said second nucleic acid sequences.
29. Kit comprising the vector molecule according to any one of claims 1 to 4 and/or a nucleic acid vector molecule, characterized in that it comprises a first non-natural nucleic acid sequence encoding a polypeptide, wherein the topology of the tertiary fold of said polypeptide is homologous to that of a Pleckstrin-Homology (PH) domain, characterized in that said non-natural nucleic acid sequence, as compared to corresponding natural nucleic acid sequences, (a) comprises, in at least one of the nucleotide sequences encoding a loop of the PH domain, at least one altered nucleotide; (b) comprises, in at least one of the nucleotide sequences encoding a loop of the PH domain or the regions directly adjacent thereto, a sequence representing one or more endonuclease restriction sites; or/and (c) confers to the encoded polypeptide an improved stability, improved folding properties, an improved resistance to oxidation, aggregation or an improved protease resistance, facilitated or improved purification, altered properties of immunogenicity, and/or the capacity for being localized in specific cellular compartments.
30. A pharmaceutical composition or vaccine comprising the (poly)peptide according to claim 16, or the ligand according to claim 17 or 18.
31. Use of the vector molecule according to any one of claims 1 to 4 or the recombinant vector molecule according to any one of claims 5 to 11 for gene therapy.
Description:
Novel synthetic protein structural templates for the generation, screening and evolution of functional molecular surfaces.

Field of the Invention

The present invention relates generally to protein molecules with novel binding or catalytic properties. More specifically, the invention relates to the production of libraries of peptide sequences in the framework of a structural template derived from Pleckstrin-Homology (PH) domains and the identification of such sequences that possess the desired properties of binding macromolecular or small molecule ligands, including the transition states of chemical reactions. The invention also relates to the provision of small molecules, derived from the so-obtained peptide sequences, that possess ligand binding properties comparable to those of the peptides in the context of the structural framework.

Background of the Related Art

Generation of novel surface epitopes

Proteins are complex macromolecules that are assembled in the cell as linear heterocopolymers of amino acids. Their precise composition essentially is encoded in the genetic sequence, and the protein sequence uniquely and robustly specifies the three-dimensional fold of the protein. It is this fold, the spatial organization of molecular shape, hydrophobic and hydrophilic areas and electrostatic fields, which is at the basis of the specificity of molecular interactions that ultimately determines the function of the proteins. The underlying principle is one of molecular recognition by complementarity of interacting surfaces. Since the shape and nature of protein surfaces is determined by the three-dimensional structure of the protein, which in turn is determined by its amino acid sequence, changes in this sequence can cause structural changes which can provide novel surfaces, complementary to proteins, other macromolecules or small molecules. In the case that such surfaces are complementary to the transition states of chemical reactions, the free energy of the transition state is lowered by binding of the substrate to such a surface and the reaction is catalyzed by the surface.

Since the number of possible sequence changes is combinatorially large and the effect of each single change is context dependent and has enthalpic and entropic components, neither of which can be precisely modeled, the rational engineering of such novel surfaces is at present not possible. As an alternative access to such surfaces, the art has employed techniques of random mutagenesis, combined with techniques of screening for successful variations, to emulate the process of natural selection.
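
To put the combinatorial argument in numbers, the following minimal Python sketch counts the distinct amino acid sequences available for a fully randomized stretch of a given length. It assumes the 20 standard proteinogenic amino acids at every position; the function name and the example lengths are illustrative and are not taken from the application.

```python
# Illustrative only: counts sequence variants for a fully randomized stretch.
# Assumption: each position may hold any of the 20 standard amino acids.

NUM_AMINO_ACIDS = 20


def sequence_space(num_positions: int) -> int:
    """Return the number of distinct amino acid sequences of the given length."""
    if num_positions < 0:
        raise ValueError("num_positions must be non-negative")
    return NUM_AMINO_ACIDS ** num_positions


if __name__ == "__main__":
    # Example lengths are arbitrary and chosen only to show the growth rate.
    for n in (1, 5, 10):
        print(f"{n:2d} randomized positions -> {sequence_space(n):.3e} sequences")
```

Because the count grows exponentially with the number of randomized positions, exhaustive rational evaluation quickly becomes impractical, which is consistent with the turn to random mutagenesis and screening described above.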

In particular examples, a natural paradigm exists in the immunoglobulins, where random, somatic mutations of variable domains improve antigen affinity, which in turn is translated into a proliferative stimulus for the B-cell expressing the mutated antibodies. The art has employed in vivo systems, such as peptide libraries cloned as continuous epitopes into Rop [Vispo NS, 1993] or thioredoxin [McCoy J, 1992], peptides displayed on the surface of replication competent phages such as filamentous phages [Hoess RH, 1993] or bacteria [Little M, 1993]. The art has also employed in vitro systems that physically couple the genetic information with the expressed peptide sequence through DNA binding functions such as fusions of peptides to the C-terminus of the lac-repressor [Cull MG, 1992] or the display of peptides attached to polysomes [Mattheakis LC, 1994]. While each of these methods has specific advantages, each also has drawbacks limiting their applicability in principle and in practice. The efficient generation of novel surface epitopes with high affinity remains a serious challenge. There is a continuing requirement for versatile structural templates which may be used in in vivo as well as in vitro screening systems, which must thus not require the oxidation of disulfide bridges for functional expression, which allow the display of continuous and discontinuous peptide epitopes, which anchor such peptides in a context of antiparallel beta strand secondary structure to facilitate the use of such peptides as lead substances for the synthesis of cyclopeptides and similar conformationally constrained small molecules, which fold independently and are highly stable, which can be efficiently expressed recombinantly and which can be efficiently refolded. Yet, none of the structural templates currently used in the art possesses all these properties. Therefore, in the art of providing novel molecular effector surfaces, there remains a need for novel structural templates which combine the above desirable properties. To this end the inventors have investigated the use of suitable derivatives of Pleckstrin Homology (PH) domains.

Naturally occurring PH domains.

PH domains are proteins of a sequence length of approximately 110 amino acids. They were first described as an internal sequence homology at the N-terminus and C-terminus of Pleckstrin [Haslam RJ, 1993; Mayer BJ, 1993] and subsequently more than ninety such domains were identified by sensitive sequence comparison searches, predominantly as components of proteins of eukaryotic cellular signaling pathways [Gibson TJ, 1994], and aligned. As is usually the case in protein families of low sequence similarity, sequence insertions and deletions relative to the core alignment can be found in all loops that connect elements of secondary structure. The structures of the PH domains of pleckstrin [Yoon HS, 1994], beta-spectrin [Macias MJ, 1994; Zhang P, 1995], dynamin [Downing AK, 1994; Ferguson KM, 1994; Timm D, 1994] and phospholipase C-delta 1 [Ferguson KM, 1995] have been determined with X-ray crystallographic as well as with nuclear magnetic resonance experiments, after recombinant expression of the native domain sequences or native domain sequences with additional non-native extensions of the N- or C-terminus for facilitated purification. The overall structure of the PH domain is that of a seven-stranded, antiparallel, bent beta-sheet, covered on one end by a C-terminal helix [Figure 1]. While the physiological function of PH domains is not precisely known, and may indeed vary from protein to protein, many PH domains appear to bind inositol bis- and trisphosphates with high affinity. Such binding may either effect the localization of the PH domain to phospholipid membrane surfaces, cause an allosteric effect due to second messenger binding, or both. It has been suggested that a surface epitope of PH domains, bordered by the loops AB, CD and FG, or an epitope bordered by the loops CD and EF may be responsible for such binding. Significantly, in the two structures known of an inositolphosphate/PH complex [Ferguson KM, 1995; Hyvonen M, 1995], the ligand is seen to bind in either one or the other of the different epitopes of the PH domain. And it has been observed of the three PH domains whose structure is known that: "The surfaces of the three domains do not reveal any common patch or pocket that might lead one to propose a likely common binding site." [Ferguson KM, 1995].

Modifications of sequences of naturally occurring PH domains.

The N-terminal PH domain of Pleckstrin has been mutated in its framework region to define residues important for the binding of inositolphosphates. Mutations of residues adjacent to loop AB resulted in a reduction in binding affinity towards phosphatidylinositol 4,5-bisphosphate [Harlan JE, 1995]. The PH domain of beta-adrenergic receptor kinase was mutated in the helix C-terminal to loop FG to investigate binding to G-beta-gamma subunits and inositol phosphates. These mutations proved deleterious to the G-beta-gamma interactions and phosphatidylinositol 4,5-bisphosphate binding. This was taken by the authors to imply yet another structurally distinct functional surface epitope [Touhara K, 1995]. In fact, the demonstration that the flanking sequence C-terminal to PH domains is important for G-beta-gamma binding [Mahadevan D, 1995; Touhara K, 1994; Tsukada S, 1994; Wang DS, 1994] suggests that for structural reasons the epitopes mentioned above cannot be the primary site of protein-protein interactions of the domain. In several publications on the domain structure, the authors speculated on possible functional aspects, especially ligand interaction sites, that might become apparent from structural analogies. In this context, structural analogies have been noted to exist between PH domains and the calycin protein family [Yoon HS, 1994], FK506 binding protein [Macias MJ, 1994], as well as the antigen combining site of the immunoglobulins, streptavidin, a domain of beef liver catalase and the spectrin SH3 domain [Ferguson KM, 1994]. The structural analogies would suggest preferred interaction sites on various surface epitopes of the PH domain. But none of the proteins mentioned above shares the exact folding topology with the PH domains, so that homology, and thus a conserved ligand binding architecture, cannot be assumed. In summary, it is not obvious, especially given the low degree of inter-domain sequence homology, whether any demonstrated or implied functional site of ligand binding or protein-protein interactions may be generalized.

To simplify purification, a hemagglutinin epitope has been used as a C-terminal extension of the PH domain of the Ras exchange factor Son-of-sevenless to enable the antibody affinity preparation of the recombinant protein [McCollam L, 1995]. The PH domains themselves remained unmodified.

A His-tag epitope has been used as a C-terminal extension of the PH domains of Dbl, Sos 1, IRS-1 and beta-ARK to enable the immobilized-metal-affinity chromatographic purification of the recombinant protein [Mahadevan D, 1995]. The PH domains themselves remained unmodified.

Screening for a PH domain ligand.

A particularly useful and versatile system for the in vivo screening of protein-protein interactions employs two fusion proteins respectively comprising DNA binding and transcription activating domains, as exemplified in the interaction trap [Fields S, 1989] and two-hybrid [Estojak J, 1995] systems. It has been reported that natural PH domains have been used in an interaction trap system to identify possible proteinaceous ligands [Ferguson KM, 1994]. Without specifying details, the authors state that no ligands were obtained and that no binding to a proteinaceous ligand could be demonstrated, which could be interpreted to mean either that the PH domain does not bind to proteins or that it was not functional in this screening system.

Summary of the invention

The present invention provides a structural framework for the display of continuous and discontinuous surface epitopes in a context of antiparallel beta strand secondary structure which does not require the oxidation of disulfide bridges for functional expression and which folds independently, is highly stable, can be efficiently expressed recombinantly and efficiently refolded. In its principal embodiment, we describe the use of modified sequences of PH domains that may include restriction sites suitable for the efficient cloning of random oligonucleotide libraries into the structural context of the domain through methods of directional cloning or PCR. The sequence changes may code for amino acid changes that facilitate the handling of the domain, for example by improving the domain stability, improving the folding properties of the domain, improving its resistance to oxidation or aggregation, improving its protease resistance, facilitating or improving the purification of the protein, altering its properties of immunogenicity, or modifying the protein's capacity for being localized in specific cellular compartments. Additionally, the above functions may be achieved through the addition of N-terminal or C-terminal extensions to the protein sequence. In such proteins three surface loops may be modified singly or in combination to provide novel continuous or discontinuous surface epitopes. We describe methods to screen a library or libraries of sequences for functionality using this structural framework, as well as methods to use cycles of screening and modification for the evolution and further improvement of functional sequences. The sequences so obtained may be used as lead structures for small molecular weight compounds and we describe an approach to the preparation of small molecular weight ligands derived from such sequences, which are conformationally constrained in a way comparable to their conformational constraint in the original context of the protein structure.
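
The idea of alternating screening and modification over several cycles can be pictured with a toy simulation. The Python sketch below is not the procedure of the application: the scoring function (number of positions matching a hypothetical target motif) merely stands in for an experimental screening readout, and all names, the mutation scheme and the parameters are assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def score(seq: str, target: str) -> int:
    """Hypothetical stand-in for a screening readout: positions matching target."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a randomly chosen amino acid."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def evolve(library, target, cycles, keep, rng):
    """Alternate screening (keep the best `keep` sequences) and modification."""
    size = len(library)
    best_per_cycle = []
    for _ in range(cycles):
        # Screening step: rank the library and retain the best sequences.
        ranked = sorted(library, key=lambda s: score(s, target), reverse=True)
        selected = ranked[:keep]
        best_per_cycle.append(score(selected[0], target))
        # Modification step: keep the selected sequences unchanged and refill
        # the library with single-point mutants of randomly chosen survivors.
        library = list(selected)
        while len(library) < size:
            library.append(mutate(rng.choice(selected), rng))
    return library, best_per_cycle


if __name__ == "__main__":
    rng = random.Random(0)
    start = ["".join(rng.choice(AMINO_ACIDS) for _ in range(6)) for _ in range(50)]
    _, history = evolve(start, target="ACDEFG", cycles=10, keep=10, rng=rng)
    print("best score per cycle:", history)
```

Because the selected sequences are carried over unchanged, the best score in this sketch can never decrease from one cycle to the next; a real screening system would of course replace `score` with an experimental measurement.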

Detailed Description of the Invention

The present invention relates to a nucleic acid vector molecule, characterized in that it comprises a first non-natural nucleic acid sequence encoding a polypeptide, wherein the topology of the tertiary fold of said polypeptide is homologous to that of a Pleckstrin-Homology (PH) domain, characterized in that said non-natural nucleic acid sequence, as compared to corresponding natural nucleic acid sequences,

(a) comprises, in at least one of the nucleotide sequences encoding a loop of the PH domain, at least one altered nucleotide, wherein said at least one altered nucleotide confers an alteration in the corresponding amino acid and said alteration(s) in said corresponding amino acid(s) confer(s) a gain of function;

(b) comprises, in at least one of the nucleotide sequences encoding a loop of the PH-domain or the regions directly adjacent thereto, a sequence representing one or more endonuclease restriction sites; or/and

(c) confers to the encoded polypeptide an improved stability, improved folding properties, an improved resistance to oxidation, aggregation or an improved protease resistance, facilitated or improved purification, altered properties of immunogenicity, and/or the capacity for being localized in specific cellular compartments.

In a preferred embodiment the vector molecule of the present invention is characterized in that said loop is loop AB, CD or FG.

In a further preferred embodiment the vector molecule of the present invention is characterized in that said PH domain is the cytohesin 1 or cytohesin 2 domain.

In a further preferred embodiment the vector molecule of the present invention is characterized in that it additionally comprises, adjacent to the 5'- or 3'-end of said first non-natural nucleic acid sequence, at least one further nucleic acid sequence encoding an extension of the N- or C-terminus of the corresponding polypeptide that facilitates or improves the purification or detection of said polypeptide, or aids in targeting said polypeptide to a specific location.

The invention further relates to a recombinant nucleic acid vector molecule comprising

(a) the vector molecule according to the present invention or a corresponding vector wherein said first non-natural nucleic acid sequence is replaced by the corresponding natural nucleic acid sequence; and

(b) at least one second nucleic acid sequence encoding one or more amino acid sequences wherein said one or more amino acid sequences form at least a part of at least one loop, preferably loop AB, CD and/or FG, of said polypeptide encoded by said first or corresponding natural nucleic acid sequence, said one or more amino acid sequences not being the one that naturally occurs in said natural nucleic acid sequences, wherein the structural integrity of the polypeptide encoded by said first or corresponding natural nucleic acid sequence is maintained by the polypeptide encoded by said first or corresponding natural and second nucleic acid sequences.

In a preferred embodiment the (recombinant) vector molecule of the present invention is characterized in that said nucleic acid is DNA.

In a most preferred embodiment the recombinant vector molecule of the invention is characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence replace at least a part of at least one loop, preferably loop AB, CD and/or FG.

In a further most preferred embodiment the recombinant vector molecule of the invention is characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence are inserted into at least one loop, preferably loop AB, CD and/or FG.

In a further most preferred embodiment the recombinant vector molecule of the invention is characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence form or contribute to the forming of a continuous epitope.

In a further most preferred embodiment the recombinant vector molecule of the invention is characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence form or contribute to the forming of a discontinuous epitope.

In a further most preferred embodiment the recombinant vector molecule of the invention is characterized in that said one or more amino acid sequences encoded by said at least one second nucleic acid sequence comprise amino acid sequences that represent or contribute to the representation of a lead structure, a (poly)peptide with ligand binding capabilities, a (poly)peptide with substrate-modifying capabilities, or a (poly)peptide capable of interacting with a catalyst or reactant.

The present invention also relates to a library of recombinant vector molecules comprising a plurality of recombinant vector molecules according to the present invention.

The present invention furthermore relates to a host transformed with the recombinant vector molecule according to the present invention.

In a preferred embodiment the host of the present invention is characterized in that it is E. coli, a yeast cell, preferably S. cerevisiae, an insect cell or a mammalian cell.

Further the present invention relates to a plurality of hosts according to the present invention, transformed with a library of the present invention.

The present invention furthermore relates to a (poly)peptide encoded by the vector molecule according to the present invention or produced by the host according to the present invention or the plurality of hosts of the present invention. Said (poly)peptides may also be chemically synthesized and have the same amino acid sequence as the (poly)peptide encoded by the vector molecule of the invention. Said (poly)peptide may further have an amino acid sequence that comprises compounds that do not naturally occur in (poly)peptide chains, preferably stereoisomers, D-amino acids, proteinous amino acids or α-amino acids, but is otherwise identical to the above-recited (poly)peptides.

Additionally, the present invention relates to a library comprising said (poly) peptides . In particular, said library comprises said chemically synthesized (poly) peptides, which may or may not have non-naturally occurring compounds contained therein.

The chemical synthesis of the (poly)peptides may, for example, be effected by utilizing fragment condensation techniques or one step procedures which result in whole (poly)peptide chains. The non-natural peptide compounds may be molecules that are susceptible to techniques of peptide synthesis. The libraries referred to above may be effected by methods of combinatorial chemistry. The construction may be effected by split synthesis or by utilizing building blocks consisting of fixed peptide sequences and sequences consisting of said libraries of continuous sequence epitopes.

In a further embodiment a small molecular weight ligand is derived from the (poly)peptide according to the present invention.

In a preferred embodiment the small molecular weight ligand according to the present invention is characterized in that it is conformationally constrained and is preferably a cyclopeptide.

The present invention additionally relates to a method for the generation of a library according to the present invention, characterized in that it comprises the insertion of said second nucleic acid sequences encoding a plurality of amino acid sequences into said first or corresponding natural nucleic acid sequences.

In a preferred embodiment the method according to the present invention is characterized in that said insertion is effected by ligation of said second nucleic acid sequences into restriction enzyme digested first or corresponding natural nucleic acid sequences.

The invention also relates to a method of screening a library according to the present invention comprising

(a) expressing the polypeptides encoded by the recombinant vector molecules according to the present invention; and

(b) screening said (poly)peptides for the desired properties encoded by said second nucleic acid sequences.

The present invention also relates to a method of evolving or screening the library according to the present invention comprising

(a) expressing the polypeptides encoded by the recombinant vector molecules according to the present invention, wherein a part of said polypeptides confers a selective advantage upon the transformed host according to the present invention; and

(b) screening the host population for said selective advantage.

In a preferred embodiment, said method of the invention is characterized in that it employs steps that physically couple the genetic information of said second nucleic acid sequences with the expressed and screened (poly)peptides.

In a preferred embodiment the method according to the present invention is characterized in that said expression in step (a) is effected via phage display techniques.

In general, it may be preferable to display the expressed polypeptides or the coupling of genetic information of the second nucleic acid sequences and the expressed polypeptides on the surface of host cells, for example, utilizing outer membrane proteins, subunits of surface appendages or secreted proteins.

In a further preferred embodiment the method according to the invention is characterized in that said screening is effected via techniques that mediate the coupling of genetic information and expressed polypeptide by DNA-binding domains fused to said polypeptides encoded by said second nucleic acid sequences, wherein said DNA-binding domains bind to said vector molecules according to the invention, or wherein said screening is effected by polysome display techniques or by interaction trap techniques.

Particularly preferred is a method wherein said DNA-binding domain is the tet repressor molecule or the lac repressor molecule.

The present invention further relates to a kit comprising the vector molecule according to the present invention and/or a nucleic acid vector molecule, characterized in that it comprises a first non-natural nucleic acid sequence encoding a polypeptide, wherein the topology of the tertiary fold of said polypeptide is homologous to that of a Pleckstrin-Homology (PH) domain, characterized in that said non-natural nucleic acid sequence, as compared to corresponding natural nucleic acid sequences,

(a) comprises, in at least one of the nucleotide sequences encoding a loop of the PH domain, at least one altered nucleotide;

(b) comprises, in at least one of the nucleotide sequences encoding a loop of the PH domain or the regions directly adjacent thereto, a sequence representing one or more endonuclease restriction sites; or/and

(c) confers to the encoded polypeptide an improved stability, improved folding properties, an improved resistance to oxidation, aggregation or an improved protease resistance, facilitated or improved purification, altered properties of immunogenicity, and/or the capacity for being localized in specific cellular compartments.

The present invention also relates to a pharmaceutical composition or vaccine comprising the (poly)peptide according to the present invention or a ligand derived therefrom.

The present invention also relates to the use of the vector molecule according to the present invention or the recombinant vector molecule according to the present invention for gene therapy.

Detailed Description of the Invention

In the description that follows, a number of terms used in recombinant DNA and protein engineering technology are utilized. In order to provide a consistent understanding of the specification and claims, the following definitions are provided.

Library. As used herein, a library is a plurality of sequences. Generally this plurality of sequences will be part of a sequence comprising common parts, represented by all or almost all molecules of the library, and unique parts, represented only by a single molecule of the library or represented only rarely.

Homology. As used herein, two sequences are considered to be homologous if they are recognizably derived from a common ancestor sequence.

Loop. As used herein, a loop is the sequence of amino acids bridging two elements of secondary structure. The loops of Pleckstrin Homology domains are defined in Figures 1 and 2.

Framework. As used herein, the framework of a protein consists of those residues that are not part of the loop sequence.

Functional. As used herein, the term functional is used to indicate a function of a protein, acquired or significantly enhanced as a result of the methods of the invention. Typically this function will be an (improved) binding of a ligand.

Function of a protein or protein domain as used herein indicates a detectable effect of the interaction of the protein or protein domain with another molecule. Typically, function is intended to mean the binding of a ligand.

Gain of function as used herein is an acquisition of a novel interaction or an increased effect of an existing interaction of the protein or protein domain, in contrast to a loss of function such as the loss of a phosphorylation site.

Domain. A domain is a structurally defined subunit, usually an autonomously folding polypeptide. Frequently, domains are observed as functional modules in larger proteins.

Restriction sites. As used herein, a restriction site is a nucleotide sequence which will be recognized as a cleavage site by a restriction endonuclease.

Fusion proteins. A fusion protein is a hybrid protein, constructed to contain sequences from at least two different proteins or peptides.

Surface epitopes. As used herein, a surface epitope is an amino acid or a set of amino acids, which is a subset of a protein and which contributes to a contiguous part of the solvent accessible surface of that protein.

Continuous epitope. As used herein, a continuous epitope is a surface epitope formed by amino acids placed sequentially in the primary structure of a protein.

Discontinuous epitope. As used herein, a discontinuous epitope is a surface epitope formed by two or more (nonsequential) continuous epitopes.

Reporter gene. As used herein, a reporter gene is a gene which will confer upon a host a detectable phenotype, if a certain condition within said host is present. Typically a reporter gene can be actively expressed as a result of a function possessed by a representative of an epitope library, such as tight binding to a ligand.

Displayed. As used herein, a molecule or part of a molecule is displayed if it contributes to the accessible surface of a molecule in such a way that interactions with other molecules are possible.

Evolution of functional epitope libraries. As used herein, functional epitope libraries are evolved if such a library is subjected to conditions in which sequence changes occur in the components of the library or in copies thereof and representatives of said library are selected that possess an improved function.

Phenotype. As used herein, a phenotype is any detectable change in appearance, function or behaviour of a host cell. A phenotype can be the color of a colony, the growth characteristic of a cell, the capacity of a cell to catalyze a chemical reaction or to bind compounds that can be detected by physical or other means, with high affinity and thus concentrate them in the cell or its immediate surrounding. Examples of these and other detectable phenotypes are well known to one of skill in the art.

The methods and compositions of the present invention permit the utilization of a novel structural principle for the display of continuous and discontinuous surface epitopes in a protein. This protein structural framework does not require the oxidation of disulfide bridges for functional expression. The novel surface epitopes are displayed in a context of antiparallel beta strand secondary structure particularly suitable for the construction of small molecular compounds based on functional sequences of modified PH domains.

Modifications to the vector sequence.

We describe a nucleic acid vector molecule which is modified to allow the generation of such proteins with genetic engineering methods. These modifications may consist of nucleotide sequence changes which facilitate the generation of novel proteins in the structural framework provided by this invention, while retaining the desirable characteristics of the protein template. Specifically, such modifications may introduce new restriction sites or delete existing restriction sites in the nucleotide sequence of the protein, with the aim of providing unique restriction sites adjacent to the regions of the sequence which are to contribute to novel surface epitopes. In one embodiment of the invention, an oligonucleotide primer with a stretch of randomized nucleotides, and restriction sites near the 3' and 5' end, is converted into a double stranded oligonucleotide after annealing with a complementary primer. This oligonucleotide is then cleaved with the appropriate restriction endonucleases and ligated into the prepared vector molecule, where the randomized nucleotides encode a plurality of amino acid sequences inserted in frame into the structural template molecule.
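The layout of such a randomized oligonucleotide can be illustrated with a minimal sketch. The choice of BamHI and EcoRI as flanking sites, the NNB codon scheme, the cassette length of six codons and all function names are assumptions made for illustration only and are not taken from the specification.

```python
import random

# Standard IUPAC codes for the bases used in this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "B": "CGT"}

# Hypothetical flanking restriction sites, chosen only for illustration.
SITE_5 = "GGATCC"  # BamHI recognition sequence
SITE_3 = "GAATTC"  # EcoRI recognition sequence


def degenerate_cassette(n_codons, codon="NNB"):
    """Return the degenerate template: site, randomized codons, site."""
    return SITE_5 + codon * n_codons + SITE_3


def sample_member(template, rng):
    """Draw one concrete sequence from the degenerate template."""
    return "".join(rng.choice(IUPAC[base]) for base in template)


rng = random.Random(0)
template = degenerate_cassette(6)
member = sample_member(template, rng)
print(template)
print(member)
```

Because both flanking sites are six nucleotides long and the randomized stretch is a whole number of codons, the insert keeps the reading frame in this toy layout. A real design would also have to check the sampled sequences for accidental internal copies of the chosen sites.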

Modifications to the amino acid sequence.

Modifications to the nucleotide sequence of the structural framework may change the protein amino acid sequence and introduce properties desirable for technical and other reasons. It may be advantageous to increase the stability of the modified PH domain framework beyond that which is found in naturally occurring domains. Some of the peptide sequences intended to form new surface epitopes in the structural context of the modified PH domain framework may prove to be destabilizing and this may be compensated by increased levels of stability of the framework. Additionally, some uses of novel domains may require stability towards denaturing agents and/or high temperatures. Again, such uses may be facilitated by the increase of folding stability. Methods to achieve this goal are known in the art and include the introduction of consensus sequences from an alignment of closely related natural variants [Steipe B, 1994], the stabilization of the helical dipole through incorporation of suitable charged residues [Nicholson H, 1988], the provision of N-cap or C-cap helix structural residues [Serrano L, 1989], the incorporation of proline residues in suitable positions of the framework [Nicholson H, 1992], the incorporation of amino acids with increased alpha-helix or beta-strand forming propensities in suitable positions of the structural framework [Blaber M, 1993, Smith CK, 1994] and the substitution of amino acids conserved for functional reasons in the PH domain sequence by amino acids which may be more appropriate in the respective structural context - a procedure that has been successfully applied to stabilize lysozyme [Shoichet BK, 1995].

Such engineering to improve the thermodynamic stability of the modified PH domain framework may at the same time improve the folding properties of the domain.

Such improved folding properties may also be achieved through other mutations and may be desirable for technological reasons. Specifically, the removal of slow folding steps, which may lead to the accumulation of aggregation-prone folding intermediates, may reduce the propensity for the domain to aggregate during refolding. Slow folding steps have been reported for proteins that require the isomerization of a peptide-proline bond from the energetically favored trans-conformation into a cis-conformation which may be necessary for structural reasons. Substitution of such sequences by other residues may significantly accelerate the folding of the domain - a procedure that has been successfully applied to ribonuclease A [Schultz DA, 1992]. An improved resistance of the modified PH domain framework to irreversible modifications may be desirable for technical or other reasons and may be achieved through partial or complete substitution of the sulfur containing amino acids cysteine and methionine, which are sensitive to oxidative damage, or of asparagine residues, which are susceptible to deamidation and conversion to iso-aspartic acid, each with suitable other amino acids.

An improved resistance to oxidation of disulfide linkages within or between modified PH domains may be desirable for technical or other reasons and may be achieved through the partial or complete replacement of cysteine residues with suitable other amino acids such as serine, alanine or valine.

An improved resistance of modified PH domains towards aggregation and an increased solubility may be desirable for technical or other reasons. For example, this may improve the shelf life of compositions comprising modified PH domains, improve production processes including modified PH domains and the production of the domains themselves and improve the usefulness of modified PH domains in pharmaceutical compositions. This can be achieved through changes to the modified PH domain framework sequence, for example through the substitution of hydrophobic residues by other amino acids - a procedure that has been successfully applied to a respiratory syncytial virus major glycoprotein fragment [Murby M, 1995]. In another example, an improved resistance towards aggregation during refolding or afterwards and an increased solubility of modified PH domains may be achieved through provision of glycosylation sites, phosphorylation sites and/or charged residues, through mutagenesis of appropriate surface residues.

An improved resistance towards proteolytic cleavage and degradation of modified PH domains may be desirable for technical reasons and increase the usefulness of the domains in vivo and in vitro. For many proteases, specific recognition sites are known and may be changed in the sequence of modified PH domains to reduce their sensitivity towards such proteases. An increased susceptibility towards limited proteolytic cleavage by specific proteases, or towards chemical cleavage, may be desirable to obtain peptides from the expressed domains. For example, protease recognition sites such as that of activated Factor X, Ile-Glu-Gly-Arg, or thrombin, Leu-Val-Pro-Arg-Gly-Ser, or the amino acid methionine may be engineered to occur in the amino acid sequence on both sides adjacent to a loop region. Thus peptides comprising the sequence between these residues may be obtained by cleavage of the modified PH domains with the appropriate protease, or by treatment with cyanogen bromide under conditions well known to one of skill in the art, whichever may be appropriate for the chosen modification.
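The release of a loop peptide by cleavage at engineered flanking sites can be sketched as follows. The example sequence, the function name and the table layout are hypothetical. The cut positions (after the Arg of Ile-Glu-Gly-Arg, between Arg and Gly of Leu-Val-Pro-Arg-Gly-Ser, and on the C-terminal side of methionine for cyanogen bromide) reflect general biochemical knowledge rather than statements in the specification.

```python
# Recognition motif (one-letter code) and the number of motif residues that
# remain on the N-terminal fragment after cleavage.
CLEAVAGE_RULES = {
    "factor_xa": ("IEGR", 4),    # cut after Arg
    "thrombin": ("LVPRGS", 4),   # cut between Arg and Gly
    "cnbr": ("M", 1),            # cut after Met
}


def cleave(sequence, agent):
    """Split a one-letter amino acid sequence at every site of the agent."""
    motif, offset = CLEAVAGE_RULES[agent]
    fragments, start = [], 0
    pos = sequence.find(motif)
    while pos != -1:
        cut = pos + offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(motif, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical layout: framework, site, loop, site, framework.
domain = "GKTAY" + "IEGR" + "WSHPQFEK" + "IEGR" + "LLNDV"
fragments = cleave(domain, "factor_xa")
print(fragments)
```

In this toy layout the middle fragment carries the loop sequence followed by the second recognition motif, which shows that the released peptide is not identical to the loop alone when the protease cuts on the C-terminal side of its site.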

The use of affinity chromatographic techniques for a facilitated purification of modified PH domains may be desirable for technical reasons. For example, this may be achieved through the introduction of a surface epitope comprising histidine residues to coordinate bivalent metal cations which can subsequently be complexed by a suitable affinity matrix. Such a procedure has been successfully applied to retinol binding protein [Muller HN, 1994].

The immunogenicity of modified PH domains intended for in vivo use or comprising pharmaceutical compositions may be reduced through the substitution of sequence epitopes foreign to the species in which the modified PH domains are to be used, with such sequence epitopes native to the species, or with residues which may be less immunogenic for other reasons. For this purpose the de novo design of sequences, or the construction of hybrid PH domains comprising sequences of at least one other naturally occurring protein or proteins, may be employed. Such a procedure has been successfully applied to antibody variable domains [Roguska MA, 1994].

Many proteins are transported into specific cellular compartments and the use of modified PH domains may require such specific transport. This may be effected by the introduction of appropriate sequence signals. Examples include nuclear localization signals such as the adenoviral sequence Lys-Arg-Pro-Arg-Pro, or retention signals for the endoplasmic reticulum such as the C-terminal sequence Lys-Asp-Glu-Leu.

N- and C-terminal fusions.

Many of the technological advantages described above, and others, may be realized through N- or C-terminal fusions with sequences conferring upon the modified PH domain framework the desired properties.

An increase in the stability of the modified PH domain framework beyond that which is found in naturally occurring domains may be achieved through N- or C-terminal fusion with sequences comprising residues which provide stabilizing interactions to the framework. Such fusion of N- or C-terminal sequences to the modified PH domain framework sequence may at the same time, or independently, improve the folding properties of the domain. For example an improved resistance towards aggregation during refolding or afterwards and an increased solubility of modified PH domains may be achieved through provision of N- or C-terminally fused sequence extensions comprising charged residues, phosphorylation sites and/or glycosylation sites.

An improved resistance towards proteolytic cleavage and degradation of modified PH domains may be achieved through N- or C-terminal fusion with protease inhibitory sequences or protease inhibitors. Affinity chromatographic techniques for a facilitated purification of modified PH domains may be employed after fusion with N- or C-terminal sequences comprising histidine residues [Lindner P, 1992], epitopes recognized by specific antibodies [Evan GI, 1985], epitopes binding streptavidin [Schmidt TGM, 1994], the ribonuclease S-peptide [Kim JS, 1993], stretches of charged residues, fusion proteins such as for example Glutathione S-transferase [Smith DB, 1988], or the use of other techniques known to one of skill in the art.

The transport into specific cellular compartments of modified PH domains may be effected by N- or C-terminal fusion with appropriate sequence signals such as the adenoviral nuclear localization sequence Lys-Arg-Pro-Arg-Pro, or the C-terminal endoplasmic retention signal sequence Lys-Asp-Glu-Leu. Other applications of modified PH domains may require binding the domains to specific targets. This can be achieved through N- or C-terminal fusions with peptides or proteins specifically binding such targets such as antibodies or DNA binding domains. Other applications of modified PH domains may require binding functional peptides or proteins, such as enzymes or inhibitors, to specific targets via the affinity provided by the modified PH domain itself and this can be achieved through N- or C-terminal fusions of the functional peptides or proteins to an appropriately constructed or selected modified PH domain.

Modifications leading to functional modified PH domains.

In a preferred embodiment of the invention, epitope modifications may be performed in surface loops AB, CD and FG of a suitable modified PH domain, singly or in any combination, such epitope modifications retaining the desirable properties of the original domain such as folding independently, being highly stable, being suitable for efficient recombinant expression and efficiently refolding.

Such epitope modifications may provide conformationally constrained continuous sequence epitopes. Continuous sequence epitopes can provide ligands with dissociation constants of micromolar magnitude or less, and their appropriate conformational constraint may improve the dissociation constant by a large amount. Such conformational constraint is provided by the stable anchoring of surface loops AB, CD, EF and FG in antiparallel beta-strands of the modified PH domain framework.

Much greater binding affinity than that of continuous sequence epitopes may be provided by discontinuous epitopes. This may significantly increase the number of interactions between the surface epitopes of the modified PH domain and its ligand. Discontinuous epitopes may provide dissociation constants on the order of nanomolar magnitude or less and the possibility of constructing or otherwise obtaining large discontinuous surface epitopes with high affinity to a ligand is an important aspect of the invention. Such discontinuous epitopes may comprise more than one of the surface loops AB, CD, EF and FG of the modified PH domain in any combination, preferably all three of the surface loops AB, CD, and FG.

Construction of libraries

A library or libraries of continuous sequence epitopes in modified PH domains may be constructed in the following manner. Utilizing techniques of site-directed mutagenesis known to one of skill in the art, the natural or modified nucleotide sequence for the PH domain may be altered in such a way as to introduce suitable, preferably singular, restriction sites flanking both sides of the sequence coding for surface loops AB, CD or FG. Alternatively, restriction sites in the vector or coding sequence may be deleted using said methods of mutagenesis, to make a restriction site occurring more than once in the natural sequence singular. Alternatively, a single restriction site directly adjacent to, or within, said surface loop sequences may be introduced or made singular. After the introduction of a suitable endonuclease restriction site or sites into the nucleotide sequence encoding the modified domain, that sequence may be cleaved to generate compatible ends for subsequent ligation of a randomized oligonucleotide. Alternatively, the sequence of the randomized oligonucleotide itself may be introduced through site-directed mutagenesis, preferably, but not necessarily, at the same time deleting an existing natural or previously introduced non-natural, singular restriction site to allow restriction selection of the mutant genotype. Said oligonucleotide is designed to comprise the same or compatible restriction endonuclease sites and to additionally comprise a stretch of nucleotides that have been synthesized to provide degenerate sequences, i.e. such sequences in which each position is not characterized by a specific nucleotide but can randomly be any of a number of alternatives with a probability determined by the exact conditions of synthesis. For example a degenerate codon of the sequence NNB will specify 48 different codons, all not ending in A. The use of such randomized codons will reduce the probability of introducing a stop codon from 0.047 (or one in 21 codons) to 0.021 (or one in 48 codons); thus the number of random positions that may be generated while not letting the probability of introducing a stop codon rise above 0.5 is increased from ~14 to ~33. To prepare said randomized oligonucleotide for ligation into the appropriately prepared domain nucleotide sequence, a second oligonucleotide, complementary in sequence to the 5' end of said randomized oligonucleotide, is annealed to said oligonucleotide and the complementary strand is generated through the action of a suitable polymerase.

The efficient ligation of the randomized oligonucleotide sequences into the domain vector sequence is then effected through techniques well known to one of skill in the art, ultimately producing a library of sequences of modified PH domains. Alternatively, the library may be constructed by gene synthesis of the entire domain or parts thereof wherein the randomized segments are complemented by action of a suitable polymerase. A suitable host can then be transformed with such libraries of sequences, or the library of sequences can be used directly or after amplification by the polymerase chain reaction, for further manipulation of the nucleotide sequence.
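The stop codon figures quoted for NNB codons can be checked with a short calculation. The sketch below assumes the standard genetic code with the three stop codons TAA, TAG and TGA and assumes that every nucleotide allowed at a degenerate position is incorporated with equal probability; the function names are illustrative.

```python
from itertools import product
from math import log

STOPS = {"TAA", "TAG", "TGA"}          # stop codons of the standard code
IUPAC = {"N": "ACGT", "B": "CGT"}      # B excludes A


def expand(scheme):
    """List every codon specified by a three-letter degenerate scheme."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in scheme))]


def stop_probability(scheme):
    """Fraction of codons in the scheme that are stop codons."""
    codons = expand(scheme)
    return sum(c in STOPS for c in codons) / len(codons)


def max_positions(p_stop, limit=0.5):
    """Number n of codons at which 1 - (1 - p_stop)**n reaches the limit."""
    return log(1 - limit) / log(1 - p_stop)


p_nnn = stop_probability("NNN")   # 3 stops in 64 codons
p_nnb = stop_probability("NNB")   # 1 stop (TAG) in 48 codons
print(len(expand("NNB")), round(p_nnn, 3), round(p_nnb, 3))
print(round(max_positions(p_nnn), 1), round(max_positions(p_nnb), 1))
```

Under these assumptions the thresholds come out at roughly 14.4 codons for fully random NNN codons and roughly 32.9 for NNB codons, in line with the approximate values of 14 and 33 given in the text.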

Libraries of sequences of discontinuous epitopes may be generated according to the above procedure by simultaneously ligating more than one randomized oligonucleotide sequence into a vector sequence suitably prepared at more than one site, in a multi-fragment ligation. Preferably though, the generation of libraries of discontinuous epitope sequences will be performed via the combination of libraries of randomized continuous epitopes using cloning techniques well known to one of skill in the art.

Synthetic PH domains generated by chemical synthesis

Another completely different approach for the construction of libraries of continuous or discontinuous sequence epitopes in modified PH domains could take advantage of total chemical synthesis of the protein. Utilizing techniques known to one of skill in the art, several peptides consisting of consecutive sequence parts of the modified PH domain could be chemically synthesized. These peptides may be condensed by suitable chemical methods resulting in a non-interrupted, full-length polypeptide chain, e.g. [Nyfeler R, 1994]. Alternatively it may be possible to synthesize the whole polypeptide sequence in a one-step peptide synthesis according to known techniques. For both said methods the short sequence length of a PH domain with approximately 110 amino acids is of special advantage or even a requirement for efficiency.

The said chemically synthesized polypeptide chain has to fold to its naturally occurring three dimensional structure. A favourable feature for a polypeptide chain to fold spontaneously under physiological conditions is the lack of disulfide linkages, because the incorrect connection of free cysteine residues often results in non-folded, aggregating proteins. The naturally occurring PH domains do not possess any disulfide linkages because their rigid structure of antiparallel β-strands does not need a stabilization by disulfide bridges. In this context the special technical interest of a modified PH domain in which the naturally occurring free cysteine residues are replaced as described before is to be mentioned.

In order to create large libraries of non-natural continuous or discontinuous sequence epitopes these said methods of total chemical protein synthesis are preferably supplemented by techniques of combinatorial peptide chemistry. Synthesis units of fixed sequences and units containing libraries of continuous sequence epitopes may be combined. Such a library may e.g. be obtained via split synthesis, a repeated combining and splitting of the solid phase beads after each amino acid coupling step known to one of skill in the art. A related method may be the one-bead-one-compound concept recognized by Lam et al. [Lam KS, 1991] which is based on the fact that combinatorial libraries prepared via split synthesis approach contain single beads displaying only one type of compound. This method allows the rapid synthesis of large libraries which are spatially separable and therefore can be screened concurrently but independently. Alternative methods concerning combinatorial peptide chemistry produce spatially addressable combinatorial libraries where the position of a molecule identifies its composition. The utilization of a sophisticated chemistry, e.g. photolithographic techniques [Fodor SPA, 1991] permits the synthesis of thousands of different sequences on small surface areas. The rapidly developing field of combinatorial chemistry will shortly present further applicable and practicable techniques for constructing such said combined libraries.
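The split synthesis principle, and why it leads to one compound per bead, can be illustrated with a small simulation. The bead count, the three building blocks and the number of cycles are arbitrary assumptions, and the function name is hypothetical.

```python
import random


def split_and_pool(n_beads, building_blocks, n_cycles, rng):
    """Simulate split synthesis; each bead is tracked as one growing sequence."""
    beads = [""] * n_beads
    k = len(building_blocks)
    for _ in range(n_cycles):
        rng.shuffle(beads)                          # pool and mix all beads
        portions = [beads[i::k] for i in range(k)]  # split into k portions
        # Couple one building block to every bead of a portion, then recombine.
        beads = [
            seq + block
            for block, portion in zip(building_blocks, portions)
            for seq in portion
        ]
    return beads


rng = random.Random(0)
beads = split_and_pool(n_beads=300, building_blocks="AGL", n_cycles=3, rng=rng)
print(len(beads), len(set(beads)))
```

Every bead passes through exactly one reaction vessel per cycle, so it carries a single sequence of length three, while the pool as a whole can contain up to 3 x 3 x 3 = 27 different sequences.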

Besides their high complexity the synthetic libraries provide the advantage of incorporation of non-natural amino acids into the peptide chain, e.g. D-amino acids, stereoisomers of natural amino acids or any other compound which is susceptible to the coupling chemistry.

Screening for increased affinities to a ligand.

A combinatorial library of surface epitopes, constructed with an appropriately large plurality of sequences of modified PH domains, will contain among these sequences those that possess an increased affinity towards a desired ligand. In order to identify such sequences, methods are described that allow discrimination between sequences binding a certain ligand and those that do not. In general such methods and procedures rely on the physical coupling of the expressed protein with its nucleic acid vector molecule, for example through the non-covalent binding of the expressed protein sequence with its vector, the identification of functional sequences within a suitable host cell that carries the vector sequence, or through the display of the expressed protein on the surface of the suitable host cell or virus carrying the encoding nucleic acid sequence. As an example for the non-covalent binding of an expressed protein sequence from the library to the nucleic acid vector molecule encoding its sequence, the library is prepared using a vector molecule that comprises a modified PH domain sequence, which is fused to the N- or C-terminus of a peptide or protein that can bind a specific DNA sequence with high affinity and a slow kinetic off-rate, such as for example the lac repressor.

As an example for the strategy of combining molecules from a surface epitope library of modified PH domains with the encoding vector sequence in a suitable host cell, the use of such modified PH domains in an interaction trap system is described. It is known and has been well documented in the literature that transcriptional activation can be effected through separable functions of DNA binding and transcriptional activator domains in the cell. A method of detecting protein-protein interactions makes use of the separability of these functions, in that one of the interacting proteins is fused to an appropriate DNA binding domain, while the second protein is fused to an appropriate transcriptional activator domain. In the case that these two fusion proteins do interact, the transcriptional activator is brought into the vicinity of the DNA binding site, which is located on a nucleic acid sequence, so that transcriptional activation leads to a detectable phenotype of the host cell.

In its principal embodiment, this procedure, as applied to screening libraries of sequences of modified PH domains, may entail creating a library of vector molecules in such a way that the nucleic acid sequences encoding the library of modified PH domains are fused to the 5' end of a second nucleic acid sequence encoding a transcriptional activator domain, for example amino acids 768-881 of the yeast GAL4 protein. A suitable host is modified genetically to contain within its chromosomal DNA, or on a separate plasmid, a constitutively expressed fusion protein consisting of a DNA binding domain, such as amino acids 1-147 of yeast GAL4, or the DNA binding domain of the bacterial protein LexA, fused to a sequence encoding the target protein for which it is desired to find modified PH domain sequences from the library that bind to said protein. On a chromosome of the host cell, or on a second vector molecule, a reporter gene is located which can be bound with high affinity by the DNA binding domain of the fusion protein mentioned above, and which is followed by a sequence encoding an enzyme, other protein, peptide or sequence functional when transcribed into RNA, the expression of which confers upon the host cell a certain phenotype. Such phenotype may be a property of the host cell which may become evident upon inspection, such as for example a color change effected through the enzymatic action of beta-galactosidase, or may confer upon the host cell a growth advantage under certain conditions of limiting nutrients such as the LEU or Ura locus of yeast cells, or confer upon the cell resistance against compounds normally toxic to the cell, such as the enzyme aminoglycoside-phosphotransferase conferring resistance against the gentamycin derivative antibiotic G418.

After a number of host cells have been transformed with vectors from the library of sequences of modified PH domains, there may be some expressed protein sequences which interact with high affinity with the desired target molecule. In those cells, the modified PH domain carrying the functional surface epitope will non-covalently attach to the desired ligand molecule, which itself is bound to the DNA sequence upstream of the reporter gene via the fused DNA binding domain; thus the transcriptional activator domain which is fused to the modified PH domain will activate the transcription of the reporter gene and the detectable phenotype is expressed.

In order to ensure a high specificity of the library of sequences to be screened, the actual screening step can be preceded by a purification step, in which a host is used that has been genetically prepared in exactly the same way as the host used in the screening, but lacking the actual protein ligand against which affinity is to be screened, and replacing the detectable phenotype with one which inhibits growth. In that manner, elements from the library which provide the transcriptional activation via non-specific interactions, or those that interfere with cellular functions and thus by themselves inhibit growth, are depleted from the library. After recovery of the remaining vectors from the transformed host cells, this purified library may be used as described above.

A third example of screening modified PH domains employs phage display technology. The aim of the method is to provide fusion proteins comprising an N-terminal modified PH domain fused to the gene III or gene VIII product of M13, f1 or similar filamentous, single-stranded DNA phages. This fusion protein is either encoded in the phage genome, or in a phagemid vector. In the first case, phages contain many copies of the desired protein; in the latter case, after infection of the host cell with helper phage, few, or single, copies of the fusion protein can be found displayed on phage hull particles that have packaged the phagemid single stranded vector. In both cases, the genetic information encoding a modified PH domain is contained in the phage particles and the fusion protein is displayed on the surface of said phages.

Both methods may be employed in the following way. A library of vector sequences with randomized PH domain surface epitopes is constructed and the PH domain is cleaved from said vectors with the appropriate restriction endonucleases. A phage genome or phagemid vector is prepared by mutagenesis to contain compatible restriction sites, in frame at the 3' end of the DNA sequence encoding the gene III protein. Double stranded RF-DNA is prepared from the phage, or the phagemid propagated, and the DNA is cleaved with the appropriate restriction endonucleases to provide compatible ends for the subsequent ligation of the library of modified PH domain sequence fragments. The resulting RF-DNA vectors or phagemids are transformed into suitable E. coli hosts, which are grown to allow production of recombinant phages in the case of using a phage genome, or which are infected with helper phages in the case of using a phagemid. In both cases the single-stranded DNA molecule encoding the modified PH domain will be encapsulated with an assembly of products of gene III and gene VIII and fusion proteins comprising the modified PH domain itself in the resultant phage particle. After centrifugation of the culture, the supernatant containing the phage particles can be used directly, or the particles can be concentrated via ultrafiltration or precipitation. These particles are incubated with the desired ligand molecule, preferably covalently linked to a column matrix. After washing the matrix to remove phage particles that are not tightly bound to the ligand, remaining particles with ligand affinity may be eluted selectively with the desired ligand in soluble form. This eluate is enriched in phages containing such fusion molecules that bind specifically to the desired target molecule. It is subsequently used to infect (or transform, in the case of phagemids) E. coli cells, from which large amounts of DNA encoding the modified PH domains can be recovered after propagation and either used for deducing the nucleotide sequence of the functional domains, or used for one or more further cycles of enriching the pool of functional sequences.

Evolution of functional sequence libraries.

As a method for obtaining very high affinities of domains selected from libraries of modified PH domains, it is not efficient to attempt to search in extremely large libraries; rather, it is desirable to isolate pluralities of low to intermediate affinity binding domains and evolve them in a stepwise fashion via repeated cycles of modification and screening. It is especially advantageous in this context to utilize systems which allow the selection of a phenotype conferred upon the host through binding domains. Novel combinations of amino acids in the structural context of a modified PH domain, with the aim of providing a further increased affinity towards the desired ligand, can be obtained as described in the three examples below, which exemplify efficient approaches to the further modification of preexisting binding domains for improved functionality. With this approach of evolving and optimizing a preexisting functional sequence, problems of finding suitable representatives in combinatorially complex, large sequence spaces can be reduced to a manageable size.
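The scale argument made here can be illustrated with a short calculation. The following is only a sketch and not part of the described method: the numbers of randomized positions and the library size of 2 x 10^6 clones are illustrative assumptions, and the function names are hypothetical.

```python
def sequence_space(n_positions, alphabet_size=20):
    """Number of distinct amino acid sequences over n randomized positions."""
    return alphabet_size ** n_positions


def max_coverage(library_size, n_positions):
    """Upper bound on the fraction of sequence space a library can sample.

    Assumes every clone is unique, so real coverage is lower.
    """
    return min(1.0, library_size / sequence_space(n_positions))


if __name__ == "__main__":
    library_size = 2_000_000  # assumed number of independent clones
    for n in (4, 6, 8, 12):   # assumed numbers of randomized positions
        print(n, sequence_space(n), f"{max_coverage(library_size, n):.2e}")
```

Under these assumptions a library of this size can at best saturate four fully randomized positions, while eight positions are sampled at well below 0.01 %, which is why stepwise evolution of intermediate binders is preferred over a single exhaustive search.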

In one example, a plurality of sequences comprising modifications in more than one epitope is cleaved with appropriate restriction endonucleases and the resultant fragments, each individually expected to contribute towards binding the ligand in some way, are mixed and stochastically religated to provide new and potentially even more functional epitopes from combinations of sequence epitopes.

In another example, a plurality of sequences of modified PH domains, each with demonstrated ligand binding capabilities, can be amplified with the polymerase chain reaction under conditions that will introduce a high percentage of errors into the amplified sequence, thus effecting near-random point mutations over a desired sequence range, which can either extend over a single epitope, a combination of epitopes or over the whole domain.
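The effect of such error-prone amplification on a sequence can be sketched in silico. The following is a simplified illustration only: it assumes a uniform, position-independent substitution rate, whereas real polymerases show biased mutation spectra; the function name, the error rate and the example sequence range are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, error_rate, start=0, end=None, rng=None):
    """Return a copy of seq with random base substitutions in [start, end).

    seq is expected to contain only upper-case A, C, G and T.
    Each position in the range is replaced by a different base with
    probability error_rate; positions outside the range are untouched.
    """
    rng = rng or random.Random()
    end = len(seq) if end is None else end
    out = list(seq)
    for i in range(start, end):
        if rng.random() < error_rate:
            out[i] = rng.choice([b for b in BASES if b != out[i]])
    return "".join(out)


if __name__ == "__main__":
    parent = "GCTCCGGCAGCTGCTAAACAGGAAGCTGCA"  # arbitrary example sequence
    rng = random.Random(1)
    for _ in range(3):
        # mutate only an assumed "epitope" range, positions 6 to 17
        print(mutate(parent, error_rate=0.1, start=6, end=18, rng=rng))
```

Restricting the `start` and `end` arguments corresponds to limiting mutagenesis to a single epitope, while the defaults cover the whole domain.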

In a third example, individual loop regions are excised from the sequence of the pluralities of modified PH domains with demonstrated ligand binding capabilities, and are subsequently replaced with libraries of oligonucleotides comprising randomized sequence.

Production of functional proteins.

The present invention is also useful for the production of large amounts of proteins comprising continuous or discontinuous epitopes previously identified as possessing a specific function in a screening assay. For this purpose, the nucleic acid sequence encoding the domain is cloned downstream of a strong promoter in a suitable bacterial expression vector, an example being the T7 promoter in the vector pPHCY1L1, and the expressed material is recovered from the host after induction using standard techniques well known to one of skill in the art.

Following production of said modified PH domains, the protein can either be used to perform a certain function by itself, or a continuous peptide epitope can be cleaved from the protein, utilizing specific protease cleavage sites as specified above. After cleavage, the peptide may be purified from the proteases and the rest of the structural template utilizing techniques well known to those of skill in the art, for example HPLC.

Use of functional epitopes as lead structures

Screened or evolved, functional, modified PH domains may be utilized in the design of lead structures for the generation of small molecule ligands. For this purpose single continuous epitopes may be synthesized chemically or prepared as described above and subjected to targeted or combinatorial chemical modifications with the aim of providing an even further improved function, increased functionality, or desired properties, such as increased permeability into the cellular cytoplasm, preferred localization in specific organismic or cellular subcompartments, increased bioavailability, increased stability after peroral ingestion of the compound, increased or decreased clearance from body fluids, combination with toxins, which results in a targeted delivery of such toxins, or combination with radiochemicals or spin labels to allow targeting of these markers in the course of diagnostic procedures.

Strategy for generating conformationally constrained small molecules.

For very high affinity binding and specificity, it may be advantageous to use such peptides which are conformationally constrained in a manner similar to their constraint in the structural template in which they were found to be functional. Such conformational constraint will reduce the conformational entropy of said peptides and thus provide an increased free energy of binding, by destabilizing the unbound state relative to the bound state. The invention lends itself especially well to the design of conformationally constrained functional peptides, since the anchor points of functional peptides can be assumed to remain well integrated into the structural context of the PH domain framework. Since the anchor points lie in two hydrogen bonded antiparallel beta-strands, the cyclization of the peptide via a beta-turn mimetic will ensure a conformational constraint highly similar to that of the original structure from which said peptide was screened or evolved. Such cyclizations are well known to one of skill in the art.
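How a gain in binding free energy translates into affinity follows from the standard relation between free energy and the equilibrium constant, and can be sketched as below. This is an illustration only: the function name and the example free energy values are assumptions and are not measurements reported for any constrained peptide.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def affinity_gain(ddg_binding_kj, temp_k=298.15):
    """Fold improvement in the association constant for a given gain in
    binding free energy (kJ/mol, positive = binding becomes more favourable).

    Uses K = exp(-dG / RT), so a change of ddG scales K by exp(ddG / RT).
    """
    return math.exp(ddg_binding_kj / (R_KJ * temp_k))


if __name__ == "__main__":
    for ddg in (2.0, 5.7, 11.4):  # assumed example values in kJ/mol
        print(ddg, round(affinity_gain(ddg), 1))
```

At 25 degrees C a gain of RT ln 10, roughly 5.7 kJ/mol, corresponds to a tenfold increase in affinity, so even a modest reduction of the entropic penalty through cyclization can be significant.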

Pharmaceutical compositions and gene therapy.

All the compositions described above may be parts of and/or active principles in pharmaceutical compositions.

They may also be parts of and/or active principles in the preparation of vaccines.

The compositions of the invention, by their nature of being principally derived through methods of genetic engineering, lend themselves especially well to applications of gene therapy. For instance, modified PH domains may be encoded in an expressible form on a suitable vector for gene therapy, and may be expressed in targeted cells whereupon a functional modified PH domain may become active.

The figures show:

Figure 1 illustrates the folding topology which defines a PH domain. Secondary structural elements are shown as arrows (beta strand) or helix (alpha helix) and loop AB connecting beta-strands A and B, loop CD connecting beta-strands C and D and loop FG connecting beta-strands F and G are labeled. The N and C terminus of the domain are also labeled.

Figure 2 illustrates a sequence alignment of the structurally characterized PH domains of rat dynamin (Dyn1_Rat) and human spectrin (Spcnmr_Hum), as well as the PH domains of human beta-adrenergic receptor kinase (Ark1_Human), human Vav protein (Ph-Hvav), human cytohesin 1 (Cyth1_Huma), human cytohesin 2 (Cyth2_Huma) and human Ras protein (Ph-Rasa-Hu). The exact location and extent of the secondary structure elements relative to the alignment is indicated by underscoring the residues with the character "=" and the strands and helix are labeled with "Strand" (abbreviated "S.") and "Helix" respectively. The location and extent of the loops AB, CD, EF and FG relative to the alignment is indicated by underscoring the residues with the character "*" and the loops are labeled with "Loop".

Figure 3 shows the position and approximate relative size of the genetic elements on plasmid pPHCY1.

Figure 4 illustrates the DNA sequence (SEQ ID NO: 11) of the expression plasmid pPHCY1L1 for the expression of the modified PH domain sequence PHCY1L1 described in Example 1, and the amino acid sequence (SEQ ID NO: 12) of the modified PH domain sequence PHCY1L1 contained therein.

Figure 5 illustrates the DNA sequence (SEQ ID NO: 13) of the random oligonucleotide primer used for the generation of the library of peptides inserted into the PHCY1L1 domain described in Example 1 .

Figure 6 illustrates the DNA sequences (SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7 and SEQ ID NO: 9) of the in-frame mutants with which stability measurements were performed as described in Example 1. The sequences listed here begin with the nucleotide corresponding to nucleotide 139 of the plasmid sequence (SEQ ID NO: 11), codon 12 of the PH domain sequence.

Figure 7 summarizes the stability measurements for progenitor and mutant PH domains. Delta G is the free energy of folding of the domains in PBS at 25 degrees C, given in kJ/mol.

Figure 8 illustrates the DNA sequences of the oligonucleotides for the replacement of cysteine residues as described in Example 4. The oligonucleotides complement the coding strand of the expressed domain and are thus designated "reverse". The mutations were C32A (SEQ ID NO: 17), C66 (SEQ ID NO: 18), C82A (SEQ ID NO: 19) and C114A (SEQ ID NO: 20), where the first letter denotes the wild type amino acid in the one letter code, the number denotes the position in the sequence and the last letter denotes the replacement choice.

Figure 9 shows the position and approximate relative size of the genetic elements on plasmid pTLP2.

Figure 10 illustrates the relative fluorescence of green fluorescent protein in a suspension of bacterial cells, plotted against the concentration of anhydrotetracycline in the culture medium. The titration of both the wildtype tet-repressor (filled circles) and the tet-repressor-PH-domain fusion protein with anhydrotetracycline results in equal levels of induction of fluorescence. This demonstrates that the wildtype repressor and the fusion protein have the same affinities for binding the inducer as well as for binding the operator site.

Figure 11 shows the sequence of the fixed reverse primer, complementary to the coding strand, used in the randomization experiment described in Example 6. It also shows the sequence and target base compositions for the randomized oligonucleotides used in the same example. Looplrm-ls is designed to randomize Loop AB and Loop2rm-s is designed to randomize Loop CD.

Figure 12 shows the comparison of loop sequences from the randomized library described in Example 6. The positions of the nucleotides refer to SEQ ID 21, the positions of the corresponding amino acids refer to SEQ ID 22. The mutated nucleotides are underlined, the corresponding amino acids are in italics.

Figure 13 illustrates the nucleotide and amino acid sequence of the CR6 peptides inserted in three loop sequences of the PHCY1 synthetic PH domain. Restriction enzyme sites are shown in bold. Underlined sequences represent the oligonucleotides used for the PCR reactions. Underlined amino acid sequences represent the CR6 peptides.

Figure 14 shows the expression pattern of synthetic PH domain/B42 transactivator domain fusion proteins in yeast cells. Whole protein of two independent clones carrying a synthetic PH domain without the CR6 protein (pJGPHwt, lanes 1,2 and 5,6) and two independent clones carrying the CR6 peptide inserted in loop AB of a synthetic PH domain (pJGPHCR6, lanes 3,4 and 7,8) were analysed for expression of the fusion protein under repressed conditions (lanes 1-4) and induced conditions (lanes 5-8). After SDS PAGE and Western blotting, proteins were detected with a monoclonal antibody raised against the HA-epitope fused to the B42 domain. Arrows mark the prominent band running with the expected molecular weight for the respective fusion protein.

Figure 15 shows a comparison of beta-galactosidase activities from wt and mutant synthetic PH domains, in which the CR6 peptide was inserted into loop sequences. Beta-galactosidase activities were determined from four independently isolated clones from each construct. Given are the mean values, which showed a standard deviation between 5 and 15%.

The following examples illustrate the invention.

Example 1: High level expression and folding stability measurement of a synthetic PH Domain

A cDNA clone of the human cytohesin 1 protein (formerly known as sec7 homology protein, GenBank ID: Humsec7hom), derived from human natural-killer T lymphocytes, was obtained by standard procedures. The PH domain from this cDNA was amplified in a PCR reaction, utilizing primers to generate a synthetic PH domain comprising a start codon and non-natural N-terminal amino acid for the generation of an Nco I restriction site, a non-natural glycine and hexahistidine tag on the C-terminus forming a BstE II restriction site, a stop codon and an EcoR I restriction site. The resultant synthetic PH domain was cloned into the expression vector pRSet5d (GenBank X54205) under control of the T7 promoter to obtain plasmid pPHCY1 (Figure 3, 4) (SEQ ID NO: 13). A protocol for preparative scale expression and purification was used in the following way: 2 L of culture medium was inoculated with an overnight culture of E. coli BL21(DE3), transformed with the appropriate mutant plasmids. The cells were induced through addition of 1 mM IPTG at an OD(600 nm) of approximately 0.6 and grown for a further 3 h, after which the cells were pelleted by centrifugation. Cell pellets were washed once with PBS (phosphate buffered saline) and resuspended in approximately 2 mL/g cell wet weight of buffer A (50 mM Tris-HCl, pH 8.0, 1 M NaCl) to which 0.01 mg/mL lysozyme and 0.005 mL of a 10 mg/mL stock of DNAse/RNAse was added. The cells were incubated for 20 minutes in the cold room and subsequently lysed by sonication. The suspension was centrifuged at an RCF of approximately 47000 for 20 minutes. Protein was purified in a single step from this extract by immobilized metal affinity chromatography on a 2 mL Pharmacia chelating Sepharose bed, preequilibrated with buffer A. The supernatant from the centrifugation was applied by gravity flow, the column was washed with 10 mL of 20 mM imidazole in buffer A and the protein was eluted with 8 mL of 300 mM imidazole in buffer A. The eluted protein was dialyzed against PBS containing 1 mM DTT and brought to 5 mM DTT, 0.02% sodium azide, before storage at 4 degrees C. The yield of the isolated synthetic PH domain was 25 mg of purified product per L of bacterial culture, equivalent to 20% of the whole soluble cell protein.

For the determination of folding stability, 1.15 microM samples of protein were equilibrated in a final volume of 0.6 mL of PBS with varying amounts of GdmHCl (guanidinium chloride), at least overnight. The fraction of unfolded and folded protein, as a function of the concentration of the denaturant GdmHCl, was determined through observation of a large fluorescence change upon unfolding, most likely due to the change in solvent accessibility of the single tryptophan residue of the protein. Free energy values of folding were calculated assuming a linear relationship of the free energy of folding on the denaturant concentration, and applying a pre- and post-unfolding baseline correction in a non-linear least-squares fit of the observed fluorescence to the denaturant concentration, assuming a two-state model of folding. The stability was found to be -39 ± 1.6 kJ/mol, which is surprisingly high for an isolated protein domain. This makes the protein a good starting point for random mutagenesis experiments and for further optimizations of a suitable rigid backbone and of properties for biotechnological handling.
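The two-state model with linear dependence on denaturant and sloping baselines, as described in the preceding paragraph, can be written down as a minimal sketch. This is not the fitting procedure actually used: only the model function is shown, the fit itself is omitted, and the m-value, the baseline parameters and all function names are illustrative assumptions. Only the stability of -39 kJ/mol is taken from the text.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def fraction_unfolded(denaturant_m, dg_fold_water, m_value, temp_k=298.15):
    """Two-state fraction of unfolded protein at a given denaturant molarity.

    dg_fold_water: free energy of folding without denaturant
                   (kJ/mol, negative = stable)
    m_value:       assumed linear dependence of that free energy on the
                   denaturant concentration (kJ/mol per M)
    """
    dg_fold = dg_fold_water + m_value * denaturant_m
    # K_unfold = [U]/[N] = exp(dG_fold / RT); f_u = K / (1 + K), in logistic form
    return 1.0 / (1.0 + math.exp(-dg_fold / (R_KJ * temp_k)))


def observed_fluorescence(denaturant_m, dg_fold_water, m_value,
                          native_baseline, unfolded_baseline, temp_k=298.15):
    """Signal as a population-weighted mix of two linear baselines.

    Each baseline is an (intercept, slope) pair in arbitrary units.
    """
    f_u = fraction_unfolded(denaturant_m, dg_fold_water, m_value, temp_k)
    native = native_baseline[0] + native_baseline[1] * denaturant_m
    unfolded = unfolded_baseline[0] + unfolded_baseline[1] * denaturant_m
    return (1.0 - f_u) * native + f_u * unfolded


if __name__ == "__main__":
    dg_water = -39.0  # kJ/mol, value reported in the text
    m_value = 13.0    # kJ/mol per M, purely illustrative assumption
    for conc in (0.0, 2.0, 3.0, 4.0, 6.0):
        print(conc, round(fraction_unfolded(conc, dg_water, m_value), 4))
```

In an actual analysis, `observed_fluorescence` would be passed to a non-linear least-squares routine with the free energy, the m-value and the four baseline parameters as free variables; the midpoint of the transition lies where the free energy of folding crosses zero.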

In summary, with this example we have demonstrated that a synthetic PH domain based on the sequence of the human protein cytohesin 1 can be expressed to high levels in a bacterial host, folds stably, and can be purified in a single step from a bacterial extract via an affinity tag genetically fused to the C-terminus.

Example 2: A library of sequences in loop AB of a synthetic PH domain based on the cytohesin 1 sequence

To verify the suitability of the framework of this synthetic PH domain for random mutagenesis, and to prepare a first embodiment of the invention, site directed mutagenesis was used to generate a library of domain sequences using oligonucleotide CylllRMas (SEQ ID NO: 15) (Figure 5). The oligonucleotide was designed to generate randomized sequences in loop AB, two new restriction sites 5' and 3' of the coding sequence, and to eliminate the unique restriction site of BsmB I. Phagemid single strand DNA was prepared from pPHCY1 in the dut- ung- strain CJ236 [Kunkel TA, 1985] (BioRad Laboratories, München, Germany) following procedures well known to one of skill in the art. 100 fmol single strand DNA and 2 pmol of phosphorylated oligonucleotide were used. After transformation into E. coli XL1-Blue (Stratagene GmbH, Heidelberg, Germany), 2000 clones were obtained. These were combined, grown in LB medium and plasmid DNA was prepared. An aliquot of the DNA was digested with the restriction endonuclease BsmB I to eliminate wild type DNA and competent E. coli BL21(DE3) (Novagen Inc., Madison, USA) were transformed. Again, 2000 clones were obtained. Of these clones, 30 were picked at random and plasmid DNA was prepared from these 30 clones. A restriction digest demonstrated the presence of a mutant genotype in 22 of these 30 clones.

After sequencing of 18 of these 22 clones on a fluorescent dideoxy sequencing machine using standard protocols, 8 in-frame mutant sequences were identified. One sequence contained an altered restriction site but wild-type loop sequence, and 9 other sequences contained frameshift errors in the primer region, probably due to synthesis problems in the commercially obtained primer. All clones were grown in 2 mL cultures of LB and expression was induced through addition of 1 mM IPTG to the culture medium when the cells had reached an OD(600 nm) of approximately 0.6. Three hours later, the cells were pelleted through centrifugation, resuspended in 100 microliters of phosphate buffered saline and opened by sonication. The insoluble fraction was pelleted by centrifugation. Pellets and soluble supernatant were analyzed separately by SDS PAGE. In all lanes of the soluble protein fraction from clones containing mutant sequences in frame, a prominent band was visible running with the expected molecular weight of the synthetic, mutated PH domain. No such band was visible in any of the frameshifted sequences. The band corresponded to approximately 20% of the soluble cellular protein in all cases in which it was observed. No corresponding band was seen in any of the samples of the insoluble fraction. Whole cells from all clones were analyzed via SDS PAGE and Western blot with a monoclonal anti-His tag antibody (Dianova GmbH, Hamburg, Germany). After detection with alkaline phosphatase conjugated anti-mouse antibody, the expected band corresponding to the synthetic PH domain was seen in all lanes corresponding to the clones in which a prominent band had already been seen in the soluble cellular extract. No band was seen in any of the other lanes. These experiments demonstrate that all those clones possessing in-frame random sequence mutations in loop AB were expressed at levels corresponding to those of the wildtype progenitor sequence. Further, no inclusion bodies were visible, in spite of the high expression levels driven by the T7 polymerase system. Together, expressibility and solubility are indicative of the structural integrity of the PH domains carrying randomized mutant sequences.

Five clones from those containing in-frame mutant sequences were chosen for further analysis [Figure 6]. The purification and stability measurement of these mutants was performed according to the protocol given in Example 1. The results of the stability measurements are summarized in Figure 7. It is obvious that the randomization is as complete as can be expected: only one sequence shares one position with the wild type and only one sequence shares one position with another mutant. Still, all mutants have comparable stabilities, deviating only by a small amount from the wild-type sequence. The average free energy of folding is -32.4 (± 3.3) kJ/mol and even that sequence which has lost the largest amount of free energy of folding (SEQ ID NO: 4) falls well into the range of values reported for stably folding proteins (> 20 kJ/mol).

In summary, with this example we have successfully randomized a loop sequence of a PH domain and demonstrated that such randomizations indeed do not compromise the structural integrity and folding stability of the progenitor domain. While these mutant sequences were not selected on the basis of any functional property, now that the fact has been established that such mutations can be performed in principle, it is obvious that functional sequences can be derived from large libraries of sequences, prepared according to the methods given above and others known to one of skill in the art.

Example 3: Creation of a stabilized synthetic PH-domain free of cysteine residues

In order to prevent any undesirable formation of intra- or inter-domain disulfide linkages in our modified PH domain, and to further improve resistance towards oxidative damage, we have replaced all 4 cysteines present in the natural progenitor sequence with other amino acids. The oligonucleotides used for the mutations are illustrated in Figure 8. All mutations were performed according to a standard protocol for oligonucleotide directed mutagenesis and confirmed by dideoxy sequencing. The resulting protein SEQ ID 22 can no longer form any inter- or intra-domain disulfide linkages. Through judicious choice of replacements, the novel sequence shows a further increased thermodynamic stability of -41.6 ± 1.8 kJ/mol, measured according to the protocol given in Example 1.

In summary, with this example we have successfully introduced further mutations into the sequence of a synthetic PH domain and demonstrated that such mutations need not lead to destabilized protein. The resultant protein cannot form any disulfide linkages which might inactivate the protein or lead to aggregation, and thus it possesses properties more desirable than those of the progenitor domain for requirements of biotechnological procedures and for the stability of compositions comprising such domains. Yet it is more stable than the progenitor protein.

Example 4: Functional integrity of a fusion protein of tet-repressor and a synthetic PH domain

To demonstrate the folding, stability and functionality as well as the easy biotechnological handling of a fusion protein comprising a synthetic PH domain and a DNA-binding domain, the synthetic PH domain of Example 3 was genetically fused to the C-terminus of the tet-repressor class D sequence. This fusion protein allows site specific binding of a protein comprising a synthetic PH domain to a DNA sequence with high affinity. The sequences of the tet-repressor coded by the plasmid pASK75 [Skerra A, 1994] and of the synthetic PH domain described in Example 3 were amplified in a PCR reaction. The utilized primers created an overlapping linker sequence GCT CCG GCA GCT GCT AAA CAG GAA GCT GCA CCG GCT GCA coding for the peptide APAAAKQEAAPAA. This linker connected the last amino acid of the tet-repressor directly to the first amino acid of the PH domain. An NspV site was introduced into pASK75 by a standard site directed mutagenesis procedure, 3' of the tet-repressor sequence, and the amplification product was cloned into pASK75 using the NsiI site inside the tet-repressor and the NspV site 3' of the PH domain. In the resulting plasmid pTLP2 [Figure 9] the fusion protein is expressed from a polycistronic message, constitutively transcribed from the beta-lactamase promoter. To monitor the function of the tet-repressor, the gene for the green fluorescent protein including a ribosome binding site has been XbaI/HindIII cloned downstream of the tet-operator/tet-promoter region, replacing the ompA sequence and the strep tag of pASK75. This plasmid (SEQ ID 23) was used to transform competent E. coli JM109 cells (Stratagene GmbH, Heidelberg, Germany).
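The correspondence between the linker DNA sequence and the encoded peptide stated above can be checked with a short translation routine. The following is a sketch using the standard genetic code; the function and variable names are hypothetical and the routine is not part of the described cloning procedure.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered by first, second, third base in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid code.

    Whitespace is ignored; '*' marks a stop codon.
    """
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Linker sequence as given in the text
LINKER_DNA = "GCT CCG GCA GCT GCT AAA CAG GAA GCT GCA CCG GCT GCA"

if __name__ == "__main__":
    print(translate(LINKER_DNA))  # prints APAAAKQEAAPAA
```

The 39 nucleotides translate to the 13-residue peptide APAAAKQEAAPAA, in agreement with the linker peptide given in the text.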

To demonstrate the functionality and DNA sequence specific binding of the fusion protein, E. coli JM109 cells were grown to an OD(600 nm) of 0.6 and induced with different concentrations of anhydrotetracycline. After 3 hours of expression, 1 mL aliquots of cells were harvested by centrifugation, washed once with phosphate buffered saline and resuspended in 2 mL of PBS. The cell suspensions were adjusted to an OD(600 nm) of 0.5 and were submitted to fluorescence analysis at 22°C using an excitation wavelength of 395±2.5 nm and an emission wavelength of 510±2.5 nm.

Figure 10 shows the comparison of the relative fluorescence of cultures of E. coli JM109 expressing wildtype tet-repressor and cultures expressing the fusion protein, as a function of the concentration of the inducer anhydrotetracycline in the culture medium. The presence of anhydrotetracycline in the cell culture medium results in the dissociation of the complex of the tet-repressor-PH-domain from the tet-repressor DNA binding site, which induces the expression of the reporter gene green fluorescent protein, which in turn can be quantified by fluorescence measurements. The two titration curves are nearly identical, demonstrating the functional integrity of the fusion protein in spite of the presence of the PH domain.

In summary, with this example we have successfully demonstrated the utility of the fusion protein to bind synthetic PH domains directly to the plasmid encoding their genetic information, via a genetic fusion to the tet-repressor and a tet-operator binding site on the plasmid DNA. In this manner genetic information from a library of sequences and their respective expressed phenotypes can be physically coupled, for use in procedures for the screening of novel molecular surfaces. While this screening strategy is analogous to a system for screening which has been successfully employed by others, based on the lac-repressor protein and the lac-operator DNA sequence [Cull MG, 1992], our synthetic PH domains have a unique potential beyond the system described in the above reference, in that they may provide a much larger molecular interaction surface and they possess the potential of using discontinuous epitopes for screening.

Example 5: Synthesis of a large complex library of synthetic PH domains with randomized discontinuous surface epitopes

To demonstrate the utility of PH domains to present randomized libraries of discontinuous surface epitopes, in a first experiment the loops AB and CD were randomized at the same time. Oligonucleotides were synthesized on an Eppendorf Ecosyn D300 synthesizer to comprise a part of the coding PH domain sequence flanking a randomized portion in the regions coding for loop amino acids. Each loop was separately randomized by performing a PCR reaction using an oligonucleotide coding for a randomized sequence as forward primer, in combination with a fixed reverse primer complementary to the 3' end of the PH domain sequence. The resulting products were restricted by the single cutting restriction endonucleases AflII and AccI (loop AB) and AccI and XhoI (loop CD), respectively, creating compatible cohesive ends. Fragments of the expected size were gel purified and ligated, resulting in a non-interrupted PH domain sequence corresponding to bp 3153 - 3482 of SEQ ID No. 21. The ligation products were amplified by PCR using the randomized forward primer of loop AB and the fixed reverse primer. The sequences and target compositions of the synthesized oligonucleotides are given in Figure 11.

The resulting PCR product (500 ng) was restricted with the enzymes AflII and XhoI, gel purified and ligated to the compatibly restricted and purified vector (3.5 microgram). The ligation product was dialyzed and used for electroporation of competent E. coli BL21(DE3).

The resulting library consists of 2*10^6 individual clones which are assumed to be independent. Figure 12 shows the loop sequences of 9 arbitrarily chosen clones. The observed amino acid composition at the randomized positions closely matches that expected from the target nucleotide composition of the oligonucleotides.

In summary, with this example we have successfully constructed a large complex library of PH domain sequences that are randomized in two loops at the same time, and have demonstrated that such randomizations need not compromise the structural integrity and folding stability of the progenitor domain. While these mutant sequences were not selected on the basis of any functional property, it has now been established that such mutations can be performed in principle, and it is clear that functional sequences can be derived from large libraries of sequences prepared according to the methods given above and others known to one of skill in the art.

Example 6: Generation of a synthetic PH domain capable of interacting with another protein in a yeast interaction trap

To show that novel functional properties, such as the capacity to interact with a target protein, can indeed be generated in the context of a synthetic PH domain, we have constructed a synthetic PH domain that possesses a moiety of the viral transcription factor Epstein-Barr Nuclear Antigen 2 (EBNA-2). EBNA-2 has been described to interact with the C-promoter binding factor CBF [Henkel T, 1994; Ling P, 1995], and this interaction is critically dependent on the CR6 peptide of EBNA-2, the sequence GAPSGPPWWPPIGAG.

In order to display the CR6 peptide in the structural context of a synthetic PH domain, the synthetic PH domain from plasmid pPHCYCl was amplified in a PCR reaction, utilizing primers that generate a synthetic PH domain comprising an EcoR I and a Spe I restriction site and a start codon, as well as a stop codon and a Sal I restriction site [primers PHT and PHB, Figure 13]. Said domain was cloned into compatible sites of the yeast expression vector pJG4-5 (Ausubel et al.) under control of the GAL1 promoter, as a C-terminal fusion to the B42 transactivation domain and to a hemagglutinin epitope, to obtain plasmid pJGPHwt.

In order to insert the CR6 peptide [Ling P, 1995] between amino acids 15 and 18 of loop AB of the said domain, two independent PCR reactions were performed. One utilized the 5'-primer for the said domain, with the EcoR I restriction site and a start codon, and an internal primer designed to generate a Bam HI restriction site, to code for an additional serine and for the first amino acids of the CR6 peptide, and to provide an Nco I restriction site [Figure 13]. The other utilized the 3'-primer for the said domain, with the stop codon and Sal I restriction site, and an internal primer designed to provide an Nco I restriction site and to code for the last amino acids of the CR6 peptide and an additional tryptophan [primers LITW and LIBW, Figure 13]. After digestion of the isolated PCR fragments with either EcoR I and Nco I or Nco I and Sal I, both resulting fragments were ligated simultaneously to the EcoR I/Xho I cleaved vector pJG4-5 to obtain the plasmid pJG-PHCR6 (Figure 13).

Plasmids pJG-PHwt and pJG-PHCR6 were transformed into the yeast strain EGY48 (Ausubel et al.), and whole cells from two independent clones of each construct, grown either under repressed conditions in the presence of glucose or under induced conditions in the presence of galactose, were analyzed via SDS-PAGE and Western blot with anti-HA tag antibody (Boehringer Mannheim, Germany). All methods were performed using standard protocols (Sambrook et al.).

After detection with alkaline phosphatase-conjugated anti-mouse antibody, no expression of the PHwt/B42 or PHCR6/B42 fusion protein could be detected when cells were grown under non-induced conditions (Figure 14). Under induced conditions, a prominent band was visible, running at the expected molecular weight for the respective fusion proteins (Figure 14). Clones expressed the PHCR6 fusion protein at a level very similar to that of the PHwt fusion protein.

Using a PCR approach as described above [primers PHT, PHB, CR6TL1, CR6BL1, CR6TL2, CR6BL2, CR6TL3, CR6BL3 in Figure 13], the CR6 peptide without any additional amino acids was introduced between amino acids 16 and 17 of loop AB, between amino acids 41 and 42 of loop CD, or between amino acids 88 and 89 of loop EF to obtain the vectors pJGPHCR6.L1, pJGPHCR6.L2 or pJGPHCR6.L3 (Figure 13).
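The three loop insertions can be pictured at the protein sequence level. The following Python sketch is illustrative only. It assumes that the residue numbering used above corresponds to the numbering of SEQ ID NO: 22, which may not hold for the domain as expressed in yeast, and the helper name `insert_after` is hypothetical.

```python
# Synthetic PH domain in one-letter code, transcribed from SEQ ID NO: 22
# (129 residues, 16 residues per chunk as in the sequence listing).
PH_DOMAIN = (
    "MANPDREGWLLKLGGG"
    "RVKTWKRRWFILTDNA"
    "LYYFEYTTDKEPRGII"
    "PLENLSIREVEDSKKP"
    "NVFELYIPDNKDQVIK"
    "AAKTEADGQVVEGNHT"
    "VYRISAPTPEEKEEWI"
    "KAIKAAISRDGHHHHH"
    "H"
)

# CR6 peptide of EBNA-2 as given in the text.
CR6 = "GAPSGPPWWPPIGAG"

# Assumed mapping of constructs to the residue after which CR6 is inserted.
INSERTION_SITES = {"L1 (loop AB)": 16, "L2 (loop CD)": 41, "L3 (loop EF)": 88}


def insert_after(sequence, position, peptide):
    """Insert `peptide` after the 1-based residue `position` of `sequence`."""
    if not 0 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    return sequence[:position] + peptide + sequence[position:]


variants = {
    name: insert_after(PH_DOMAIN, pos, CR6)
    for name, pos in INSERTION_SITES.items()
}

for name, pos in INSERTION_SITES.items():
    # Show three flanking residues on each side of the insertion point.
    left, right = PH_DOMAIN[pos - 3:pos], PH_DOMAIN[pos:pos + 3]
    print(f"{name}: ...{left}[{CR6}]{right}...")
```

Plain string slicing is enough here because the CR6 peptide is inserted without additional amino acids in these three constructs.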

The 500 amino acid open reading frame of CBF [Amakawa R, 1993; Henkel T, 1994] was cloned into the expression vector pSHl (Ausubel et al.), where it is genetically fused to the DNA binding domain of lex-A, resulting in vector pSH-CBF. Vector pSH-CBF was cotransformed with the vector pJG4-5 (negative control) or the corresponding pJG-PH vectors into the yeast strain EGY48. Four independent clones of each pJG-PH construct were grown under inducing conditions in the presence of galactose and the beta-galactosidase activity was determined according to standard procedures.
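The text does not state which beta-galactosidase assay was used. As one common convention, activity in liquid culture assays is often reported in Miller units; the following Python sketch assumes that convention, and every reading in it is a made-up placeholder, not data from this example. The function name `miller_units` and its parameters are hypothetical.

```python
from statistics import mean


def miller_units(od420, od550, od600, minutes, volume_ml):
    """Miller units: 1000 * (OD420 - 1.75 * OD550) / (t * V * OD600).

    od420/od550 are read from the reaction, od600 from the culture,
    minutes is the reaction time, volume_ml the culture volume assayed.
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


# Placeholder readings for four independent clones of one construct
# (hypothetical numbers, for illustration only).
readings = [
    {"od420": 0.42, "od550": 0.02, "od600": 0.55},
    {"od420": 0.39, "od550": 0.02, "od600": 0.52},
    {"od420": 0.45, "od550": 0.03, "od600": 0.58},
    {"od420": 0.40, "od550": 0.02, "od600": 0.50},
]

units = [miller_units(minutes=10, volume_ml=0.1, **r) for r in readings]
print(f"mean activity over {len(units)} clones: {mean(units):.1f} Miller units")
```

Averaging over the independent clones of each construct gives a single value per construct that can be compared against the pJG4-5 negative control.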

Only fusion proteins carrying a CR6 peptide in one of the loops of the PH domain showed an interaction with the CBF protein in an interaction trap assay (Figure 15). Insertion of the CR6 peptide into loop EF leads to the strongest protein/protein interaction, as deduced from the highest beta-galactosidase activity.

In summary, with this example we have shown that the PH domain can be expressed to high levels in yeast cells as a fusion protein with the B42 transactivation domain, even when foreign peptide sequences are inserted into the original domain sequence. We have further shown that such a foreign sequence can retain properties of its progenitor protein (in our example, the capacity to interact with the CBF protein) and that such properties are then conferred on the synthetic PH domain containing said foreign sequence. Finally, we have shown that the interaction of a synthetic PH domain and a target protein can be detected in a yeast interaction trap system. Thus synthetic PH domains are indeed suitable for the interaction trap based screening of peptide/protein and of protein/protein interactions.

1. [Amakawa R et al (1993) Human Jk recombination signal binding protein gene (IGKJRB): comparison with its mouse homologue, Genomics, 17:306-315]
xxx. [Ausubel MF et al (1992) Current Protocols in Molecular Biology, John Wiley & Sons, New York]
2. [Blaber M et al (1993) Structural basis of amino acid alpha helix propensity, Science, 260:1637-1640]
3. [Cull MG et al (1992) Screening for receptor ligands using large libraries of peptides linked to the C-terminus of the Lac Repressor, PNAS, 89:1865-1869]
4. [Downing AK et al (1994) Three-dimensional solution structure of the pleckstrin homology domain from dynamin, Curr. Biol., 4:884-891]
5. [Estojak J et al (1995) Correlation of two-hybrid affinity data with in vitro measurements, MCB, 15:5820-5829]
6. [Evan GI et al (1985) Isolation of Monoclonal Antibodies Specific for Human c-myc Proto-Oncogene Product, MCB, 5:3610-3616]
7. [Ferguson KM et al (1994) Crystal Structure at 2.2 Å Resolution of the Pleckstrin Homology Domain from Human Dynamin, Cell, 79:199-209]
8. [Ferguson KM et al (1995) Structure of the High Affinity Complex of Inositol Trisphosphate with a Phospholipase C Pleckstrin Homology Domain, Cell, 83:1037-1046]
9. [Ferguson KM et al (1995) Scratching the surface with the PH domain, Nature Struct. Biol., 2:715-718]
10. [Fields S and Song O (1989) A novel genetic system to detect protein-protein interactions, Nature, 340:245-246]
xx. [Fodor SPA et al (1991) Light-directed, spatially addressable parallel chemical synthesis, Science, 251:767-773]
11. [Gibson TJ et al (1994) PH domain: the first anniversary, TIBS, 19:349-353]
12. [Harlan JE et al (1995) Structural Characterization of the Interaction between a Pleckstrin Homology Domain and Phosphatidylinositol 4,5-Bisphosphate, Biochemistry, 34:9859-9864]
13. [Haslam RJ et al (1993) Pleckstrin domain homology, Nature, 363:309-310]
15. [Henkel T et al (1994) Mediation of EBV EBNA2 Transactivation by Recombination Signal-Binding Protein Jk, Science, 265:92-95]
16. [Hoess RH (1993) Phage display of peptides and protein domains, Curr. Opin. Struct. Biol., 2:572-579]
17. [Hyvönen M et al (1995) Structure of the binding site for inositol phosphates in a PH domain, EMBO J., 14:4676-4685]
18. [Kim JS and Raines RT (1993) Ribonuclease S-peptide as a carrier in fusion proteins, Protein Sci., 2:348-356]
19. [Kunkel TA (1985) Rapid and efficient site-specific mutagenesis without phenotypic selection, PNAS, 82:488-492]
xx. [Lam KS et al (1991) A new type of synthetic peptide library for identifying ligand binding activity, Nature, 354:82-84]
20. [Lindner P et al (1992) Purification of Native Proteins from the Cytoplasm and Periplasm of Escherichia coli Using IMAC and Histidine Tails: A Comparison of Proteins and Protocols, Methods, 4:41-56]
21. [Ling P and Hayward S (1995) Contribution of conserved Amino Acids in Mediating the Interaction between EBNA2 and CBF1/RBPJk, J. Virol., 69:1944-1950]
22. [Little M et al (1993) Bacterial Surface Presentation of Proteins and Peptides - An Alternative to Phage Technology, TiBtech, 11:3-5]
23. [Macias MJ et al (1994) Structure of the pleckstrin homology domain from b-spectrin, Nature, 369]
24. [Mahadevan D et al (1995) Structural Studies of the PH domains of Dbl, Sos 1, IRS-1, and bARK1 and Their Differential Binding to Gbg Subunits, Biochemistry, 31:9111-9117]
25. [Mattheakis LC et al (1994) An in vitro polysome system for identifying ligands from very large peptide libraries, PNAS, 91:9022-9026]
26. [Mayer BJ et al (1993) A putative modular domain present in diverse signalling proteins, Cell, 73:629-630]
27. [McCollam L et al (1995) Functional roles for the Pleckstrin and Dbl Homology Regions in the Ras Exchange Factor Son-of-sevenless, J.B.C., 270:15954-15957]
28. [McCoy J and Lavallie ER (28 July 1992) WO patent 94/02502]
29. [Müller HN and Skerra A (1994) Grafting of a High-Affinity Zn(II)-Binding Site on the b-Barrel of Retinol-Binding Protein Results in Enhanced Folding Stability and Enables Simplified Purification, Biochemistry, 33:14126-14135]
30. [Murby M et al (1995) Hydrophobicity engineering to increase solubility and stability of a recombinant protein from respiratory syncytial virus, Eur. J. Biochem., 230:38-44]
31. [Nicholson H et al (1988) Enhanced protein thermostability from designed mutations that interact with a-helix dipoles, Nature, 336:651-656]
32. [Nicholson H et al (1992) Analysis of the Effectiveness of Proline Substitutions and Glycine Replacements in Increasing the Stability of Phage T4 Lysozyme, Biopolymers, 32]
xx. [Nyfeler H (1994) Peptide synthesis via fragment condensation, Methods Mol. Biol., 35:303-316]
33. [Roguska MA et al (1994) Humanization of murine monoclonal antibodies through variable domain resurfacing, PNAS, 91:969-973]
xx. [Sambrook J et al (1989) Molecular cloning, Cold Spring Harbour Press, New York]
34. [Schmidt TGM and Skerra A (1994) One-step affinity purification of bacterially produced proteins by means of the "Strep tag" and immobilized recombinant core streptavidin, J. Chrom. A, 676:337-345]
35. [Schultz DA et al (1992) Cis proline mutants of ribonuclease A. II. Elimination of the slow-folding forms by mutation, Protein Sci., 1:917-924]
36. [Serrano L and Fersht AR (1989) Capping and a-helix stability, Nature, 342:296-299]
37. [Shoichet BK et al (1995) A relationship between protein stability and protein function, PNAS, 92:452-456]
38. [Skerra A (1994) Use of the tetracycline promoter for the tightly regulated production of a murine antibody fragment in Escherichia coli, Gene, 151:131-135]
39. [Smith CK et al (1994) A Thermodynamic Scale for the b-Sheet Forming Tendencies of the Amino Acids, Biochemistry, 33:5510-5517]
40. [Smith DB and Johnson KS (1988) Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione S-transferase, Gene, 67:31-40]
41. [Steipe B et al (1994) Sequence Statistics Reliably Predict Stabilizing Mutations in a Protein Domain, J.M.B., 240:188-192]
42. [Timm D et al (1994) Crystal structure of the pleckstrin homology domain from dynamin, Nature Struct. Biol., 1:782-788]
43. [Touhara K et al (1994) Binding of G protein bg-subunits to pleckstrin homology domains, JBC, 269:10217-10220]
44. [Touhara K et al (1995) Mutational Analysis of the Pleckstrin Homology Domain of the b-Adrenergic Receptor Kinase, J.B.C., 270:17000-17005]
45. [Tsukada S et al (1994) Binding of bg subunits of heterotrimeric G proteins to the PH domain of Bruton tyrosine kinase, PNAS, 91:11256-11260]
46. [Vispo NS et al (1993) Hybrid Rop-pIII Proteins for the Display of Constrained Peptides on Filamentous Phage Capsids, Ann. Biol. Clin., 51:917-922]
47. [Wang DS et al (1994) Binding of PH-domains of b-adrenergic receptor kinase and b-spectrin to WD40/b-transducin repeat containing regions of the b-subunit of trimeric G-proteins, BBRC, 203:29-35]
48. [Yoon HS et al (1994) Solution Structure of a Pleckstrin Homology Domain, Nature, 369:675-677]
49. [Zhang P et al (1995) Solution structure of the pleckstrin homology domain of Drosophila b-spectrin, Structure, 2:1185-1195]

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: Boris Steipe

(B) STREET: Unterbrunnerstr. 10

(C) CITY: Gauting

(E) COUNTRY: DE

(F) POSTAL CODE (ZIP): 82131

(G) TELEPHONE: +49 89 8508941

(A) NAME: MediGene GmbH

(B) STREET: Lochhamer Str. 11

(C) CITY: Martinsried / Muenchen

(E) COUNTRY: DE

(F) POSTAL CODE (ZIP): 82152

(G) TELEPHONE: +49 89 8956320

(ii) TITLE OF INVENTION: Novel synthetic protein structural templates for the generation, screening and evolution of functional molecular surfaces

(iii) NUMBER OF SEQUENCES: 72

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..33

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

AAG CTC GGA GCC AAC AAC CTG TTT ACC TGG AAG 33

Lys Leu Gly Ala Asn Asn Leu Phe Thr Trp Lys 1 5 10

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Lys Leu Gly Ala Asn Asn Leu Phe Thr Trp Lys 1 5 10
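The relation between a nucleic acid entry and its paired protein entry in this listing can be checked by translating the coding sequence. The following Python sketch assumes the standard genetic code; the function name `translate` is a hypothetical helper, and the sequences are transcribed from SEQ ID NO: 1 and SEQ ID NO: 2.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence (spaces allowed) into one-letter code."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


SEQ_ID_1 = "AAG CTC GGA GCC AAC AAC CTG TTT ACC TGG AAG"
SEQ_ID_2 = "KLGANNLFTWK"  # Lys Leu Gly Ala Asn Asn Leu Phe Thr Trp Lys

protein = translate(SEQ_ID_1)
print(protein, protein == SEQ_ID_2)
```

The same check applies to the other 33 base pair entries of the listing, which differ only in the five codons between the constant Lys-Leu-Gly and Thr-Trp-Lys flanks.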

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..33

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AAG CTC GGA CAT GCT AGG GAG TTG ACC TGG AAG 33

Lys Leu Gly His Ala Arg Glu Leu Thr Trp Lys 15 20

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Lys Leu Gly His Ala Arg Glu Leu Thr Trp Lys 1 5 10

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..33

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

AAG CTC GGA TCG CCC CCC AAT CTT ACC TGG AAG 33

Lys Leu Gly Ser Pro Pro Asn Leu Thr Trp Lys 15 20

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Lys Leu Gly Ser Pro Pro Asn Leu Thr Trp Lys 1 5 10

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..33

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

AAG CTC GGA CGT CCC CTT CTT CAC ACC TGG AAG 33

Lys Leu Gly Arg Pro Leu Leu His Thr Trp Lys 15 20

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 11 amino acids (B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Lys Leu Gly Arg Pro Leu Leu His Thr Trp Lys 1 5 10

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..33

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

AAG CTC GGA AAC ATG AAT TAC CCC ACC TGG AAG 33

Lys Leu Gly Asn Met Asn Tyr Pro Thr Trp Lys 15 20

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Lys Leu Gly Asn Met Asn Tyr Pro Thr Trp Lys 1 5 10

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..33

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

AAG CTC GGA GGT GGC AGG GTA AAG ACC TGG AAG 33

Lys Leu Gly Gly Gly Arg Val Lys Thr Trp Lys

15 20

(2) INFORMATION FOR SEQ ID NO: 12

(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 11 amino acids (B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Lys Leu Gly Gly Gly Arg Val Lys Thr Trp Lys 1 5 10

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3160 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "DNA Vector, partially synthetic"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 106..495

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

CGTATCCGAT CTCGATCCCG CGAAATTAAT ACGACTCACT ATAGGGAGAC CACAACGGTT 60

TCCCTCTAGA AATAATTTTG TTTAACTTTA AGAAGGAGAT ATACC ATG GCG AAT 114

Met Ala Asn

CCA GAC CGA GAA GGC TGG CTA CTT AAG CTC GGA GGT GGC AGG GTA AAG 162

Pro Asp Arg Glu Gly Trp Leu Leu Lys Leu Gly Gly Gly Arg Val Lys

15 20 25 30

ACC TGG AAG AGG CGC TGG TTC ATT CTG ACT GAC AAC TGC CTT TAC TAC 210

Thr Trp Lys Arg Arg Trp Phe Ile Leu Thr Asp Asn Cys Leu Tyr Tyr

35 40 45

TTT GAG TAT ACC ACG GAT AAG GAG CCC CGT GGA ATC ATC CCT TTA GAG 258

Phe Glu Tyr Thr Thr Asp Lys Glu Pro Arg Gly Ile Ile Pro Leu Glu 50 55 60

AAT CTG AGT ATC CGG GAA GTG GAG GAC TCC AAA AAA CCA AAC TGC TTT 306

Asn Leu Ser Ile Arg Glu Val Glu Asp Ser Lys Lys Pro Asn Cys Phe 65 70 75

GAG CTT TAT ATC CCC GAC AAT AAA GAC CAA GTT ATC AAG GCC TGC AAG 354

Glu Leu Tyr Ile Pro Asp Asn Lys Asp Gln Val Ile Lys Ala Cys Lys 80 85 90

ACC GAG GCT GAC GGG CGG GTG GTG GAG GGG AAC CAC ACT GTT TAC CGG 402

Thr Glu Ala Asp Gly Arg Val Val Glu Gly Asn His Thr Val Tyr Arg 95 100 105 110

ATC TCA GCT CCG ACG CCC GAG GAG AAG GAG GAG TGG ATT AAG TGC ATT 450

Ile Ser Ala Pro Thr Pro Glu Glu Lys Glu Glu Trp Ile Lys Cys Ile

115 120 125

AAA GCA GCC ATC AGC AGG GAC GGT CAC CAC CAT CAC CAT CAC TAG 495

Lys Ala Ala Ile Ser Arg Asp Gly His His His His His His * 130 135 140

TAAGAATTCG AAGCTTGATC CGGCTGCTAA CAAAGCCCGA AAGGAAGCTG AGTTGGCTGC 555

TGCCACCGCT GAGCAATAAC TAGCATAACC CCTTGGGGCC TCTAAACGGG TCTTGAGGGG 615

TTTTTTGCTG AAAGGAGGAA CTATATCCGG ATCTGGCGTA ATAGCGAAGA GGCCCGCACC 675

GATCGCCCTT CCCAACAGTT GCGCAGCCTG AATGGCGAAT GGGACGCGCC CTGTAGCGGC 735

GCATTAAGCG CGGCGGGTGT GGTGGTTACG CGCAGCGTGA CCGCTACACT TGCCAGCGCC 795

CTAGCGCCCG CTCCTTTCGC TTTCTTCCCT TCCTTTCTCG CCACGTTCGC CGGCTTTCCC 855

CGTCAAGCTC TAAATCGGGG GCTCCCTTTA GGGTTCCGAT TTAGTGCTTT ACGGCACCTC 915

GACCCCAAAA AACTTGATTA GGGTGATGGT TCACGTAGTG GGCCATCGCC CTGATAGACG 975

GTTTTTCGCC CTTTGACGTT GGAGTCCACG TTCTTTAATA GTGGACTCTT GTTCCAAACT 1035

GGAACAACAC TCAACCCTAT CTCGGTCTAT TCTTTTGATT TATAAGGGAT TTTGCCGATT 1095

TCGGCCTATT GGTTAAAAAA TGAGCTGATT TAACAAAAAT TTAACGCGAA TTTTAACAAA 1155

ATATTAACGC TTACAATTTA GGTGGCACTT TTCGGGGAAA TGTGCGCGGA ACCCCTATTT 1215

GTTTATTTTT CTAAATACAT TCAAATATGT ATCCGCTCAT GAGACAATAA CCCTGATAAA 1275

TGCTTCAATA ATATTGAAAA AGGAAGAGTA TGAGTATTCA ACATTTCCGT GTCGCCCTTA 1335

TTCCCTTTTT TGCGGCATTT TGCCTTCCTG TTTTTGCTCA CCCAGAAACG CTGGTGAAAG 1395

TAAAAGATGC TGAAGATCAG TTGGGTGCAC GAGTGGGTTA CATCGAACTG GATCTCAACA 1455

GCGGTAAGAT CCTTGAGAGT TTTCGCCCCG AAGAACGTTT TCCAATGATG AGCACTTTTA 1515

AAGTTCTGCT ATGTGGCGCG GTATTATCCC GTATTGACGC CGGGCAAGAG CAACTCGGTC 1575

GCCGCATACA CTATTCTCAG AATGACTTGG TTGAGTACTC ACCAGTCACA GAAAAGCATC 1635

TTACGGATGG CATGACAGTA AGAGAATTAT GCAGTGCTGC CATAACCATG AGTGATAACA 1695

CTGCGGCCAA CTTACTTCTG ACAACGATCG GAGGACCGAA GGAGCTAACC GCTTTTTTGC 1755

ACAACATGGG GGATCATGTA ACTCGCCTTG ATCGTTGGGA ACCGGAGCTG AATGAAGCCA 1815

TACCAAACGA CGAGCGTGAC ACCACGATGC CTGTAGCAAT GGCAACAACG TTGCGCAAAC 1875

TATTAACTGG CGAACTACTT ACTCTAGCTT CCCGGCAACA ATTAATAGAC TGGATGGAGG 1935

CGGATAAAGT TGCAGGACCA CTTCTGCGCT CGGCCCTTCC GGCTGGCTGG TTTATTGCTG 1995

ATAAATCTGG AGCCGGTGAG CGTGGGTCTC GCGGTATCAT TGCAGCACTG GGGCCAGATG 2055

GTAAGCCCTC CCGTATCGTA GTTATCTACA CGACGGGGAG TCAGGCAACT ATGGATGAAC 2115

GAAATAGACA GATCGCTGAG ATAGGTGCCT CACTGATTAA GCATTGGTAA CTGTCAGACC 2175

AAGTTTACTC ATATATACTT TAGATTGATT TAAAACTTCA TTTTTAATTT AAAAGGATCT 2235

AGGTGAAGAT CCTTTTTGAT AATCTCATGA CCAAAATCCC TTAACGTGAG TTTTCGTTCC 2295

ACTGAGCGTC AGACCCCGTA GAAAAGATCA AAGGATCTTC TTGAGATCCT TTTTTTCTGC 2355

GCGTAATCTG CTGCTTGCAA ACAAAAAAAC CACCGCTACC AGCGGTGGTT TGTTTGCCGG 2415

ATCAAGAGCT ACCAACTCTT TTTCCGAAGG TAACTGGCTT CAGCAGAGCG CAGATACCAA 2475

ATACTGTCCT TCTAGTGTAG CCGTAGTTAG GCCACCACTT CAAGAACTCT GTAGCACCGC 2535

CTACATACCT CGCTCTGCTA ATCCTGTTAC CAGTGGCTGC TGCCAGTGGC GATAAGTCGT 2595

GTCTTACCGG GTTGGACTCA AGACGATAGT TACCGGATAA GGCGCAGCGG TCGGGCTGAA 2655

CGGGGGGTTC GTGCACACAG CCCAGCTTGG AGCGAACGAC CTACACCGAA CTGAGATACC 2715

TACAGCGTGA GCTATGAGAA AGCGCCACGC TTCCCGAAGG GAGAAAGGCG GACAGGTATC 2775

CGGTAAGCGG CAGGGTCGGA ACAGGAGAGC GCACGAGGGA GCTTCCAGGG GGAAACGCCT 2835

GGTATCTTTA TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA TTTTTGTGAT 2895

GCTCGTCAGG GGGGCGGAGC CTATGGAAAA ACGCCAGCAA CGCGGCCTTT TTACGGTTCC 2955

TGGCCTTTTG CTGGCCTTTT GCTCACATGT TCTTTCCTGC GTTATCCCCT GATTCTGTGG 3015

ATAACCGTAT TACCGCCTTT GAGTGAGCTG ATACCGCTCG CCGCAGCCGA ACGACCGAGC 3075

GCAGCGAGTC AGTGAGCGAG GAAGCGGAAG AGCGCCCAAT ACGCAAACCG CCTCTCCCCG 3135

CGCGTTGGCC GATTCATTAA TGCAG 3160

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 129 amino acids

(B) TYPE: amino acid (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14

Met Ala Asn Pro Asp Arg Glu Gly Trp Leu Leu Lys Leu Gly Gly Gly 1 5 10 15

Arg Val Lys Thr Trp Lys Arg Arg Trp Phe Ile Leu Thr Asp Asn Cys 20 25 30

Leu Tyr Tyr Phe Glu Tyr Thr Thr Asp Lys Glu Pro Arg Gly Ile Ile 35 40 45

Pro Leu Glu Asn Leu Ser Ile Arg Glu Val Glu Asp Ser Lys Lys Pro 50 55 60

Asn Cys Phe Glu Leu Tyr Ile Pro Asp Asn Lys Asp Gln Val Ile Lys 65 70 75 80

Ala Cys Lys Thr Glu Ala Asp Gly Arg Val Val Glu Gly Asn His Thr 85 90 95

Val Tyr Arg Ile Ser Ala Pro Thr Pro Glu Glu Lys Glu Glu Trp Ile 100 105 110

Lys Cys Ile Lys Ala Ala Ile Ser Arg Asp Gly His His His His His 115 120 125

His * 130
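The CDS feature location of a nucleic acid entry and the length of the paired protein entry can be cross-checked with simple arithmetic. The following Python sketch uses the values stated for SEQ ID NO: 13 (CDS at 106..495) and SEQ ID NO: 14 (129 amino acids); the function name `cds_summary` is hypothetical, and the assumption that the CDS span includes the terminal stop codon is read off the listing, where position 495 closes the TAG codon.

```python
def cds_summary(start, end):
    """Return (nucleotides, codons) for an inclusive 1-based CDS location."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("CDS span must be a positive multiple of three")
    return length, length // 3


nucleotides, codons = cds_summary(106, 495)

# Assumption: the last codon of the span is the stop codon, so the
# translated protein is one residue shorter than the codon count.
residues = codons - 1

print(f"{nucleotides} nt, {codons} codons, {residues} residues plus stop")
```

A mismatch between the two numbers would point to a transcription error in either the feature table or the sequence itself.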

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 57 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

GAACCAGCGC CTCTTCCAGG TVNNVNNVNN VNNVNNTCCG AGCTTAAGTA GCCAGCC 57
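The degenerate positions in this oligonucleotide are easier to read on the complementary strand. The following Python sketch reverse-complements SEQ ID NO: 15 using the IUPAC ambiguity codes (V = A/C/G, B = C/G/T, N = any base) and splits the result into codons. The reading frame is an assumption, chosen by alignment with the coding sequence of SEQ ID NO: 13, and the function name `reverse_complement` is a hypothetical helper.

```python
# Complements of the IUPAC nucleotide codes, including ambiguity codes.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "S": "S", "W": "W", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    seq = seq.replace(" ", "").upper()
    return "".join(COMPLEMENT[base] for base in reversed(seq))


SEQ_ID_15 = "GAACCAGCGC CTCTTCCAGG TVNNVNNVNN VNNVNNTCCG AGCTTAAGTA GCCAGCC"

coding = reverse_complement(SEQ_ID_15)

# Assumed frame: the strand starts on a codon boundary (GGC TGG CTA ...).
codons = [coding[i:i + 3] for i in range(0, len(coding), 3)]
print(" ".join(codons))
```

On the coding strand the five randomized codons read NNB, flanked by the constant codons that also appear around the loop in SEQ ID NO: 13.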

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

TGACCGTCCC TCGAGATGGC TGCTTTAATA GCCTTAATCC ACTCC 45

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CTCAAAGTAG TACAGAGCGT TGTCAGTCAG 30

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

ATTGTCGGGG ATGTACAGCT CAAACACGTT TGGTTTTTTG 40

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

AGCCTCGGTC TTCGCGGCCT TGATAAC 27

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

TGACCGTCCC TCGAGATGGC TGCTTTAATA GCCTTAATCC ACTCC 45

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3153 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "DNA Vector, partially synthetic"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION:99..485

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21

GATCTCGATC CCGCGAAATT AATACGACTC ACTATAGGGA GACCACAACG GTTTCCCTCT 60

AGAAATAATT TTGTTTAACT TTAAGAAGGA GATATACC ATG GCG AAT CCA GAC 113

Met Ala Asn Pro Asp 135

CGA GAA GGC TGG CTA CTT AAG CTC GGA GGT GGC AGG GTA AAG ACC TGG 161

Arg Glu Gly Trp Leu Leu Lys Leu Gly Gly Gly Arg Val Lys Thr Trp 140 145 150

AAG AGG CGC TGG TTC ATT CTG ACT GAC AAC GCT CTG TAC TAC TTT GAG 209

Lys Arg Arg Trp Phe Ile Leu Thr Asp Asn Ala Leu Tyr Tyr Phe Glu 155 160 165

TAT ACC ACG GAT AAG GAG CCC CGT GGA ATC ATC CCT TTA GAG AAT CTG 257

Tyr Thr Thr Asp Lys Glu Pro Arg Gly Ile Ile Pro Leu Glu Asn Leu

170 175 180

AGT ATC CGG GAA GTG GAG GAC TCC AAA AAA CCA AAC GTG TTT GAG CTG 305

Ser Ile Arg Glu Val Glu Asp Ser Lys Lys Pro Asn Val Phe Glu Leu

185 190 195

TAC ATC CCC GAC AAT AAA GAC CAA GTT ATC AAG GCC GCG AAG ACC GAG 353

Tyr Ile Pro Asp Asn Lys Asp Gln Val Ile Lys Ala Ala Lys Thr Glu 200 205 210 215

GCT GAC GGT CAG GTG GTG GAG GGG AAC CAC ACT GTT TAC CGG ATC TCA 401

Ala Asp Gly Gln Val Val Glu Gly Asn His Thr Val Tyr Arg Ile Ser 220 225 230

GCT CCG ACG CCC GAG GAG AAG GAG GAG TGG ATT AAG GCT ATT AAA GCA 449

Ala Pro Thr Pro Glu Glu Lys Glu Glu Trp Ile Lys Ala Ile Lys Ala

235 240 245

GCC ATC TCG AGG GAC GGT CAC CAC CAT CAC CAT CAC TAGTAAGAAT 495

Ala Ile Ser Arg Asp Gly His His His His His His

250 255

TCGAAGCTTG ATCCGGCTGC TAACAAAGCC CGAAAGGAAG CTGAGTTGGC TGCTGCCACC 555

GCTGAGCAAT AACTAGCATA ACCCCTTGGG GCCTCTAAAC GGGTCTTGAG GGGTTTTTTG 615

CTGAAAGGAG GAACTATATC CGGATCTGGC GTAATAGCGA AGAGGCCCGC ACCGATCGCC 675

CTTCCCAACA GTTGCGCAGC CTGAATGGCG AATGGGACGC GCCCTGTAGC GGCGCATTAA 735

GCGCGGCGGG TGTGGTGGTT ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC 795

CCGCTCCTTT CGCTTTCTTC CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG 855

CTCTAAATCG GGGGCTCCCT TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA 915

AAAAACTTGA TTAGGGTGAT GGTTCACGTA GTGGGCCATC GCCCTGATAG ACGGTTTTTC 975

GCCCTTTGAC GTTGGAGTCC ACGTTCTTTA ATAGTGGACT CTTGTTCCAA ACTGGAACAA 1035

CACTCAACCC TATCTCGGTC TATTCTTTTG ATTTATAAGG GATTTTGCCG ATTTCGGCCT 1095

ATTGGTTAAA AAATGAGCTG ATTTAACAAA AATTTAACGC GAATTTTAAC AAAATATTAA 1155

CGCTTACAAT TTAGGTGGCA CTTTTCGGGG AAATGTGCGC GGAACCCCTA TTTGTTTATT 1215

TTTCTAAATA CATTCAAATA TGTATCCGCT CATGAGACAA TAACCCTGAT AAATGCTTCA 1275

ATAATATTGA AAAAGGAAGA GTATGAGTAT TCAACATTTC CGTGTCGCCC TTATTCCCTT 1335

TTTTGCGGCA TTTTGCCTTC CTGTTTTTGC TCACCCAGAA ACGCTGGTGA AAGTAAAAGA 1395

TGCTGAAGAT CAGTTGGGTG CACGAGTGGG TTACATCGAA CTGGATCTCA ACAGCGGTAA 1455

GATCCTTGAG AGTTTTCGCC CCGAAGAACG TTTTCCAATG ATGAGCACTT TTAAAGTTCT 1515

GCTATGTGGC GCGGTATTAT CCCGTATTGA CGCCGGGCAA GAGCAACTCG GTCGCCGCAT 1575

ACACTATTCT CAGAATGACT TGGTTGAGTA CTCACCAGTC ACAGAAAAGC ATCTTACGGA 1635

TGGCATGACA GTAAGAGAAT TATGCAGTGC TGCCATAACC ATGAGTGATA ACACTGCGGC 1695

CAACTTACTT CTGACAACGA TCGGAGGACC GAAGGAGCTA ACCGCTTTTT TGCACAACAT 1755

GGGGGATCAT GTAACTCGCC TTGATCGTTG GGAACCGGAG CTGAATGAAG CCATACCAAA 1815

CGACGAGCGT GACACCACGA TGCCTGTAGC AATGGCAACA ACGTTGCGCA AACTATTAAC 1875

TGGCGAACTA CTTACTCTAG CTTCCCGGCA ACAATTAATA GACTGGATGG AGGCGGATAA 1935

AGTTGCAGGA CCACTTCTGC GCTCGGCCCT TCCGGCTGGC TGGTTTATTG CTGATAAATC 1995

TGGAGCCGGT GAGCGTGGGT CTCGCGGTAT CATTGCAGCA CTGGGGCCAG ATGGTAAGCC 2055

CTCCCGTATC GTAGTTATCT ACACGACGGG GAGTCAGGCA ACTATGGATG AACGAAATAG 2115

ACAGATCGCT GAGATAGGTG CCTCACTGAT TAAGCATTGG TAACTGTCAG ACCAAGTTTA 2175

CTCATATATA CTTTAGATTG ATTTAAAACT TCATTTTTAA TTTAAAAGGA TCTAGGTGAA 2235

GATCCTTTTT GATAATCTCA TGACCAAAAT CCCTTAACGT GAGTTTTCGT TCCACTGAGC 2295

GTCAGACCCC GTAGAAAAGA TCAAAGGATC TTCTTGAGAT CCTTTTTTTC TGCGCGTAAT 2355

CTGCTGCTTG CAAACAAAAA AACCACCGCT ACCAGCGGTG GTTTGTTTGC CGGATCAAGA 2415

GCTACCAACT CTTTTTCCGA AGGTAACTGG CTTCAGCAGA GCGCAGATAC CAAATACTGT 2475

CCTTCTAGTG TAGCCGTAGT TAGGCCACCA CTTCAAGAAC TCTGTAGCAC CGCCTACATA 2535

CCTCGCTCTG CTAATCCTGT TACCAGTGGC TGCTGCCAGT GGCGATAAGT CGTGTCTTAC 2595

CGGGTTGGAC TCAAGACGAT AGTTACCGGA TAAGGCGCAG CGGTCGGGCT GAACGGGGGG 2655

TTCGTGCACA CAGCCCAGCT TGGAGCGAAC GACCTACACC GAACTGAGAT ACCTACAGCG 2715

TGAGCTATGA GAAAGCGCCA CGCTTCCCGA AGGGAGAAAG GCGGACAGGT ATCCGGTAAG 2775

CGGCAGGGTC GGAACAGGAG AGCGCACGAG GGAGCTTCCA GGGGGAAACG CCTGGTATCT 2835

TTATAGTCCT GTCGGGTTTC GCCACCTCTG ACTTGAGCGT CGATTTTTGT GATGCTCGTC 2895

AGGGGGGCGG AGCCTATGGA AAAACGCCAG CAACGCGGCC TTTTTACGGT TCCTGGCCTT 2955

TTGCTGGCCT TTTGCTCACA TGTTCTTTCC TGCGTTATCC CCTGATTCTG TGGATAACCG 3015

TATTACCGCC TTTGAGTGAG CTGATACCGC TCGCCGCAGC CGAACGACCG AGCGCAGCGA 3075

GTCAGTGAGC GAGGAAGCGG AAGAGCGCCC AATACGCAAA CCGCCTCTCC CCGCGCGTTG 3135

GCCGATTCAT TAATGCAG 3153

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 129 amino acids

(B) TYPE: amino acid (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Met Ala Asn Pro Asp Arg Glu Gly Trp Leu Leu Lys Leu Gly Gly Gly 1 5 10 15

Arg Val Lys Thr Trp Lys Arg Arg Trp Phe Ile Leu Thr Asp Asn Ala 20 25 30

Leu Tyr Tyr Phe Glu Tyr Thr Thr Asp Lys Glu Pro Arg Gly Ile Ile 35 40 45

Pro Leu Glu Asn Leu Ser Ile Arg Glu Val Glu Asp Ser Lys Lys Pro 50 55 60

Asn Val Phe Glu Leu Tyr Ile Pro Asp Asn Lys Asp Gln Val Ile Lys 65 70 75 80

Ala Ala Lys Thr Glu Ala Asp Gly Gln Val Val Glu Gly Asn His Thr

85 90 95

Val Tyr Arg Ile Ser Ala Pro Thr Pro Glu Glu Lys Glu Glu Trp Ile 100 105 110

Lys Ala Ile Lys Ala Ala Ile Ser Arg Asp Gly His His His His His 115 120 125

His

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 4330 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "DNA Vector, partially synthetic"

(iii) HYPOTHETICAL: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 2462..3511

(ix) FEATURE:

(A) NAME/KEY: unsure

(B) LOCATION: 122..127

(D) OTHER INFORMATION: /note= "Eco RI restriction site appears to be absent."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CCATCGAATG GCCAGATGAT TAATTCCTAA TTTTTGTTGA CACTCTATCA TTGATAGAGT 60

TATTTTACCA CTCCCTATCA GTGATAGAGA AAAGTGAAAT GAATAGTTCG ACAAAAATCT 120

AGAATTCATT AAAGAGGAGA AATTAACTCC ATGGTGAGCA AGGGCGAGGA GCTGTTCACC 180

GGGGTGGTGC CCATCCTGGT CGAGCTGGAC GGCGACGTAA ACGGCCACAA GTTCAGCGTG 240

TCCGGCGAGG GCGAGGGCGA TGCCACCTAC GGCAAGCTGA CCCTGAAGTT CATCTGCACC 300

ACCGGCAAGC TGCCCGTGCC CTGGCCCACC CTCGTGACCA CCTTCAGCTA CGGCGTGCAG 360

TGCTTCAGCC GCTACCCCGA CCACATGAAG CAGCACGACT TCTTCAAGTC CGCCATGCCC 420

GAAGGCTACG TCCAGGAGCG CACCATCTTC TTCAAGGACG ACGGCAACTA CAAGACCCGC 480

GCCGAGGTGA AGTTCGAGGG CGACACCCTG GTGAACCGCA TCGAGCTGAA GGGCATCGAC 540

TTCAAGGAGG ACGGCAACAT CCTGGGGCAC AAGCTGGAGT ACAACTACAA CAGCCACAAC 600

GTCTATATCA TGGCCGACAA GCAGAAGAAC GGCATCAAGG TGAACTTCAA GATCCGCCAC 660

AACATCGAGG ACGGCAGCGT GCAGCTCGCC GACCACTACC AGCAGAACAC CCCCATCGGC 720

GACGGCCCCG TGCTGCTGCC CGACAACCAC TACCTGAGCA CCCAGTCCGC CCTGAGCAAA 780

GACCCCAACG AGAAGCGCGA TCACATGGTC CTGCTGGAGT TCGTGACCGC CGCCGGGATC 840

ACTCACGGCA TGGACGAGCT GTACAAGTAA AGCGGCCGGC TGCAGGCAGC GCTTGGCGTC 900

ACCCGCAGTT CGGTGGTTAA TAAGCTTGAC CTGTGAAGTG AAAAATGGCG CACATTGTGC 960

GACATTTTTT TTGTCTGCCG TTTACCGCTA CTGCGTCACG GATCTCCACG CGCCCTGTAG 1020

CGGCGCATTA AGCGCGGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG 1080

CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT 1140

TCCCCGTCAA GCTCTAAATC GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA 1200

CCTCGACCCC AAAAAACTTG ATTAGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA 1260

GACGGTTTTT CGCCCTTTGA CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA 1320

AACTGGAACA ACACTCAACC CTATCTCGGT CTATTCTTTT GATTTATAAG GGATTTTGCC 1380

GATTTCGGCC TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG CGAATTTTAA 1440

CAAAATATTA ACGCTTACAA TTTCAGGTGG CACTTTTCGG GGAAATGTGC GCGGAACCCC 1500

TATTTGTTTA TTTTTCTAAA TACATTCAAA TATGTATCCG CTCATGAGAC AATAACCCTG 1560

ATAAATGCTT CAATAATATT GAAAAAGGAA GAGTATGAGT ATTCAACATT TCCGTGTCGC 1620

CCTTATTCCC TTTTTTGCGG CATTTTGCCT TCCTGTTTTT GCTCACCCAG AAACGCTGGT 1680

GAAAGTAAAA GATGCTGAAG ATCAGTTGGG TGCACGAGTG GGTTACATCG AACTGGATCT 1740

CAACAGCGGT AAGATCCTTG AGAGTTTTCG CCCCGAAGAA CGTTTTCCAA TGATGAGCAC 1800

TTTTAAAGTT CTGCTATGTG GCGCGGTATT ATCCCGTATT GACGCCGGGC AAGAGCAACT 1860

CGGTCGCCGC ATACACTATT CTCAGAATGA CTTGGTTGAG TACTCACCAG TCACAGAAAA 1920

GCATCTTACG GATGGCATGA CAGTAAGAGA ATTATGCAGT GCTGCCATAA CCATGAGTGA 1960

TAACACTGCG GCCAACTTAC TTCTGACAAC GATCGGAGGA CCGAAGGAGC TAACCGCTTT 2040

TTTGCACAAC ATGGGGGATC ATGTAACTCG CCTTGATCGT TGGGAACCGG AGCTGAATGA 2100

AGCCATACCA AACGACGAGC GTGACACCAC GATGCCTGTA GCAATGGCAA CAACGTTGCG 2160

CAAACTATTA ACTGGCGAAC TACTTACTCT AGCTTCCCGG CAACAATTGA TAGACTGGAT 2220

GGAGGCGGAT AAAGTTGCAG GACCACTTCT GCGCTCGGCC CTTCCGGCTG GCTGGTTTAT 2280

TGCTGATAAA TCTGGAGCCG GTGAGCGTGG CTCTCGCGGT ATCATTGCAG CACTGGGGCC 2340

AGATGGTAAG CCCTCCCGTA TCGTAGTTAT CTACACGACG GGGAGTCAGG CAACTATGGA 2400

TGAACGAAAT AGACAGATCG CTGAGATAGG TGCCTCACTG ATTAAGCATT GGTAGGAATT 2460

A ATG ATG TCT CGT TTA GAT AAA AGT AAA GTG ATT AAC AGC GCA TTA 2506

Met Met Ser Arg Leu Asp Lys Ser Lys Val Ile Asn Ser Ala Leu 130 135 140

GAG CTG CTT AAT GAG GTC GGA ATC GAA GGT TTA ACA ACC CGT AAA CTC 2554

Glu Leu Leu Asn Glu Val Gly Ile Glu Gly Leu Thr Thr Arg Lys Leu 145 150 155 160

GCC CAG AAG CTA GGT GTA GAG CAG CCT ACA TTG TAT TGG CAT GTA AAA 2602

Ala Gln Lys Leu Gly Val Glu Gln Pro Thr Leu Tyr Trp His Val Lys 165 170 175

AAT AAG CGG GCT TTG CTC GAC GCC TTA GCC ATT GAG ATG TTA GAT AGG 2650

Asn Lys Arg Ala Leu Leu Asp Ala Leu Ala Ile Glu Met Leu Asp Arg 180 185 190

CAC CAT ACT CAC TTT TGC CCT TTA GAA GGG GAA AGC TGG CAA GAT TTT 2698

His His Thr His Phe Cys Pro Leu Glu Gly Glu Ser Trp Gln Asp Phe 195 200 205

TTA CGT AAT AAC GCT AAA AGT TTT AGA TGT GCT TTA CTA AGT CAT CGC 2746

Leu Arg Asn Asn Ala Lys Ser Phe Arg Cys Ala Leu Leu Ser His Arg 210 215 220

GAT GGA GCA AAA GTA CAT TTA GGT ACA CGG CCT ACA GAA AAA CAG TAT 2794

Asp Gly Ala Lys Val His Leu Gly Thr Arg Pro Thr Glu Lys Gln Tyr 225 230 235 240

GAA ACT CTC GAA AAT CAA TTA GCC TTT TTA TGC CAA CAA GGT TTT TCA 2842

Glu Thr Leu Glu Asn Gln Leu Ala Phe Leu Cys Gln Gln Gly Phe Ser 245 250 255

CTA GAG AAT GCA TTA TAT GCA CTC AGC GCA GTG GGG CAT TTT ACT TTA 2890

Leu Glu Asn Ala Leu Tyr Ala Leu Ser Ala Val Gly His Phe Thr Leu 260 265 270

GGT TGC GTA TTG GAA GAT CAA GAG CAT CAA GTC GCT AAA GAA GAA AGG 2938

Gly Cys Val Leu Glu Asp Gln Glu His Gln Val Ala Lys Glu Glu Arg 275 280 285

GAA ACA CCT ACT ACT GAT AGT ATG CCG CCA TTA TTA CGA CAA GCT ATC 2986

Glu Thr Pro Thr Thr Asp Ser Met Pro Pro Leu Leu Arg Gln Ala Ile 290 295 300

GAA TTA TTT GAT CAC CAA GGT GCA GAG CCA GCC TTC TTA TTC GGC CTT 3034

Glu Leu Phe Asp His Gln Gly Ala Glu Pro Ala Phe Leu Phe Gly Leu 305 310 315 320

GAA TTG ATC ATA TGC GGA TTA GAA AAA CAA CTT AAA TGT GAA AGT GGG 3082

Glu Leu Ile Ile Cys Gly Leu Glu Lys Gln Leu Lys Cys Glu Ser Gly 325 330 335

TCT GCT CCG GCA GCT GCT AAA CAG GAA GCT GCA CCG GCT GCA GCG AAT 3130

Ser Ala Pro Ala Ala Ala Lys Gln Glu Ala Ala Pro Ala Ala Ala Asn 340 345 350

CCA GAC CGA GAA GGC TGG CTA CTT AAG CTC GGA GGT GGC AGG GTA AAG 3178

Pro Asp Arg Glu Gly Trp Leu Leu Lys Leu Gly Gly Gly Arg Val Lys 355 360 365

ACC TGG AAG AGG CGC TGG TTC ATT CTG ACT GAC AAC GCT CTG TAC TAC 3226

Thr Trp Lys Arg Arg Trp Phe Ile Leu Thr Asp Asn Ala Leu Tyr Tyr 370 375 380

TTT GAG TAT ACC ACG GAT AAG GAG CCC CGT GGA ATC ATC CCT TTA GAG 3274

Phe Glu Tyr Thr Thr Asp Lys Glu Pro Arg Gly Ile Ile Pro Leu Glu 385 390 395 400

AAT CTG AGT ATC CGG GAA GTG GAG GAC TCC AAA AAA CCA AAC GTG TTT 3322

Asn Leu Ser Ile Arg Glu Val Glu Asp Ser Lys Lys Pro Asn Val Phe 405 410 415

GAG CTG TAC ATC CCC GAC AAT AAA GAC CAA GTT ATC AAG GCC GCG AAG 3370

Glu Leu Tyr Ile Pro Asp Asn Lys Asp Gln Val Ile Lys Ala Ala Lys 420 425 430

ACC GAG GCT GAC GGG CGG GTG GTG GAG GGG AAC CAC ACT GTT TAC CGG 3418

Thr Glu Ala Asp Gly Arg Val Val Glu Gly Asn His Thr Val Tyr Arg 435 440 445

ATC TCA GCT CCG ACG CCC GAG GAG AAG GAG GAG TGG ATT AAG GCT ATT 3466

Ile Ser Ala Pro Thr Pro Glu Glu Lys Glu Glu Trp Ile Lys Ala Ile 450 455 460

AAA GCA GCC ATC TCG AGG GAC GGT CAC CAC CAT CAC CAT CAC TAG 3511

Lys Ala Ala Ile Ser Arg Asp Gly His His His His His His * 465 470 475

TAAGAATTCG AAGCAGCATA ACCTTTTTCC GTGATGGTAA CTTCACTAGT TTAAAAGGAT 3571

CTAGGTGAAG ATCCTTTTTG ATAATCTCAT GACCAAAATC CCTTAACGTG AGTTTTCGTT 3631

CCACTGAGCG TCAGACCCCG TAGAAAAGAT CAAAGGATCT TCTTGAGATC CTTTTTTTCT 3691

GCGCGTAATC TGCTGCTTGC AAACAAAAAA ACCACCGCTA CCAGCGGTGG TTTGTTTGCC 3751

GGATCAAGAG CTACCAACTC TTTTTCCGAA GGTAACTGGC TTCAGCAGAG CGCAGATACC 3811

AAATACTGTC CTTCTAGTGT AGCCGTAGTT AGGCCACCAC TTCAAGAACT CTGTAGCACC 3871

GCCTACATAC CTCGCTCTGC TAATCCTGTT ACCAGTGGCT GCTGCCAGTG GCGATAAGTC 3931

GTGTCTTACC GGGTTGGACT CAAGACGATA GTTACCGGAT AAGGCGCAGC GGTCGGGCTG 3991

AACGGGGGGT TCGTGCACAC AGCCCAGCTT GGAGCGAACG ACCTACACCG AACTGAGATA 4051

CCTACAGCGT GAGCTATGAG AAAGCGCCAC GCTTCCCGAA GGGAGAAAGG CGGACAGGTA 4111

TCCGGTAAGC GGCAGGGTCG GAACAGGAGA GCGCACGAGG GAGCTTCCAG GGGGAAACGC 4171

CTGGTATCTT TATAGTCCTG TCGGGTTTCG CCACCTCTGA CTTGAGCGTC GATTTTTGTG 4231

ATGCTCGTCA GGGGGGCGGA GCCTATGGAA AAACGCCAGC AACGCGGCCT TTTTACGGTT 4291

CCTGGCCTTT TGCTGGCCTT TTGCTCACAT GACCCGACA 4330

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 349 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Met Met Ser Arg Leu Asp Lys Ser Lys Val Ile Asn Ser Ala Leu Glu 1 5 10 15

Leu Leu Asn Glu Val Gly Ile Glu Gly Leu Thr Thr Arg Lys Leu Ala 20 25 30

Gln Lys Leu Gly Val Glu Gln Pro Thr Leu Tyr Trp His Val Lys Asn 35 40 45

Lys Arg Ala Leu Leu Asp Ala Leu Ala Ile Glu Met Leu Asp Arg His 50 55 60

His Thr His Phe Cys Pro Leu Glu Gly Glu Ser Trp Gln Asp Phe Leu 65 70 75 80

Arg Asn Asn Ala Lys Ser Phe Arg Cys Ala Leu Leu Ser His Arg Asp 85 90 95

Gly Ala Lys Val His Leu Gly Thr Arg Pro Thr Glu Lys Gln Tyr Glu 100 105 110

Thr Leu Glu Asn Gln Leu Ala Phe Leu Cys Gln Gln Gly Phe Ser Leu 115 120 125

Glu Asn Ala Leu Tyr Ala Leu Ser Ala Val Gly His Phe Thr Leu Gly 130 135 140

Cys Val Leu Glu Asp Gln Glu His Gln Val Ala Lys Glu Glu Arg Glu 145 150 155 160

Thr Pro Thr Thr Asp Ser Met Pro Pro Leu Leu Arg Gln Ala Ile Glu 165 170 175

Leu Phe Asp His Gln Gly Ala Glu Pro Ala Phe Leu Phe Gly Leu Glu 180 185 190

Leu Ile Ile Cys Gly Leu Glu Lys Gln Leu Lys Cys Glu Ser Gly Ser 195 200 205

Ala Pro Ala Ala Ala Lys Gln Glu Ala Ala Pro Ala Ala Ala Asn Pro 210 215 220

Asp Arg Glu Gly Trp Leu Leu Lys Leu Gly Gly Gly Arg Val Lys Thr 225 230 235 240

Trp Lys Arg Arg Trp Phe Ile Leu Thr Asp Asn Ala Leu Tyr Tyr Phe 245 250 255

Glu Tyr Thr Thr Asp Lys Glu Pro Arg Gly Ile Ile Pro Leu Glu Asn 260 265 270

Leu Ser Ile Arg Glu Val Glu Asp Ser Lys Lys Pro Asn Val Phe Glu 275 280 285

Leu Tyr Ile Pro Asp Asn Lys Asp Gln Val Ile Lys Ala Ala Lys Thr 290 295 300

Glu Ala Asp Gly Arg Val Val Glu Gly Asn His Thr Val Tyr Arg Ile 305 310 315 320

Ser Ala Pro Thr Pro Glu Glu Lys Glu Glu Trp Ile Lys Ala Ile Lys 325 330 335

Ala Ala Ile Ser Arg Asp Gly His His His His His His * 340 345 350

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 61 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

TGGCTACTTA AGNNNNNNNN NNNNNNNNNN NNNNNNNNNA AGAGGCGCTG GTTCATTCTG 60

61

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 56 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(iii) HYPOTHETICAL: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

CTTTGAGTAT ACCNNNNNNN NNNNNNNNNN NNNNATCATC CCTTTAGAGA ATCTGA 56
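Reader's note, not part of the filed listing: SEQ ID NO: 25 and SEQ ID NO: 26 each contain a run of N (unspecified nucleotide) positions between fixed flanking bases. The Python sketch below, using the hypothetical helper name `n_run`, locates that run and reports its length. SEQ ID NO: 25 is entered with only the 60 bases that are legible in the text above.

```python
import re

# Sequences copied from the listing above; the spaces between 10-base groups are removed.
# Assumption: SEQ ID NO: 25 is entered with only the 60 bases legible in this text.
SEQ_25 = "TGGCTACTTA AGNNNNNNNN NNNNNNNNNN NNNNNNNNNA AGAGGCGCTG GTTCATTCTG".replace(" ", "")
SEQ_26 = "CTTTGAGTAT ACCNNNNNNN NNNNNNNNNN NNNNATCATC CCTTTAGAGA ATCTGA".replace(" ", "")


def n_run(seq: str) -> tuple[int, int]:
    """Return (0-based start, length) of the first run of N positions in seq."""
    match = re.search(r"N+", seq)
    if match is None:
        raise ValueError("sequence contains no N positions")
    start, end = match.span()
    return start, end - start


for name, seq in (("SEQ ID NO: 25", SEQ_25), ("SEQ ID NO: 26", SEQ_26)):
    start, length = n_run(seq)
    # A run whose length is a multiple of 3 spans a whole number of codons.
    print(name, "N run starts at base", start + 1, "length", length, "codons", length / 3)
```

Under these assumptions the runs are 27 and 21 positions long, that is nine and seven codons, the same lengths as the 27-base-pair and 21-base-pair CDS entries that follow in this listing.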

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

CGCGAATTCA CTAGTATAAT GGCGAATCCA GACCGAGAA 39

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

GTCGACGTCG ACTAGTGATG GTGATGGTGG TG 32

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

CCATGGTGGC CACCACATAG GAGCAGGAAT CGTAAAGACC TGGAAGAGGC GC 52

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

CCATGGTGGT CCTGATGGTG CTCCGGATCC ACCTCCGAGC TTAAGTAG 48

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 60 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

CCATCCGGGC CCCCATGGTG GCCACCAATA GGCGCCAGGG TAAAGACCTG GAAGAGGCGC 60

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 53 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

CCACCATGGG GGCCCGGATG GTGCTCCTGC TAGGCCACCT CCGAGCTTAA GTA 53

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 57 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

CCATCCGGGC CCCCATGGTG GCCACCAATA GGCGCCAAGG AGCCCCGTGG AATCATC 57

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

CCACCATGGG GGCCCGGATG GTGCTCCTGC TAGATCCGTG GTATACTCAA AGTA 54

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 57 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

CCATCCGGGC CCCCATGGTG GCCACCAATA GGCGCCCAGG TGGTGGAGGG GAACCAC 57

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

CCACCATGGG GGCCCGGATG GTGCTCCTGC TAGACCGTCA GCCTCGGTCT TCGC 54

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

AAG GAG GGT GAT CGC GTG AAG ACT AGG 27

Lys Glu Gly Asp Arg Val Lys Thr Arg 355

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

Lys Glu Gly Asp Arg Val Lys Thr Arg 1 5
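Reader's note, not part of the filed listing: the peptide of SEQ ID NO: 38 is the translation of the coding sequence given as SEQ ID NO: 37. A minimal Python sketch of that check follows, assuming the standard genetic code; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one-letter amino acids listed in TCAG codon order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",
}


def translate(codons: str) -> str:
    """Translate space-separated DNA codons into three-letter residues."""
    return " ".join(THREE_LETTER[CODON_TABLE[c]] for c in codons.split())


# Codons copied from SEQ ID NO: 37 above.
print(translate("AAG GAG GGT GAT CGC GTG AAG ACT AGG"))
```

The same helper can be applied to any of the other short CDS entries in this listing to compare a coding oligonucleotide with its paired peptide entry.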

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

CTG GGT GGT AGT CGG ATC CAG ACG AGG 27

Leu Gly Gly Ser Arg Ile Gln Thr Arg 10 15

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

Leu Gly Gly Ser Arg Ile Gln Thr Arg 1 5

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

CTC GGT GGT GGT CGC GTG AAG ACG TGG 27

Leu Gly Gly Gly Arg Val Lys Thr Trp 10 15

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

Leu Gly Gly Gly Arg Val Lys Thr Trp 1 5

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

ATC GGT GGT GGT CGC TTG AAG ACG TGG 27

Ile Gly Gly Gly Arg Leu Lys Thr Trp 10 15

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

Ile Gly Gly Gly Arg Leu Lys Thr Trp 1 5

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

CTG GGT TGT GAG CGG GTC GAG ACT TGG 27

Leu Gly Cys Glu Arg Val Glu Thr Trp 10 15

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

Leu Gly Cys Glu Arg Val Glu Thr Trp 1 5

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

ATG GGT GGT GGT AGC GTG GAG ACT TGG 27

Met Gly Gly Gly Ser Val Glu Thr Trp 10 15

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Met Gly Gly Gly Ser Val Glu Thr Trp 1 5

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

CTC AGT GGT GGT CGC GTC AAT AGT TGG 27

Leu Ser Gly Gly Arg Val Asn Ser Trp 10 15

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

Leu Ser Gly Gly Arg Val Asn Ser Trp 1 5

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

CTC GCT GGG GGT AAG GTG AAT ACG TGG 27

Leu Ala Gly Gly Lys Val Asn Thr Trp 10 15

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

Leu Ala Gly Gly Lys Val Asn Thr Trp 1 5

(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..27

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

CTC GGT AGT GGG AGG GTC AAG ACG TGG 27

Leu Gly Ser Gly Arg Val Lys Thr Trp 10 15

(2) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

Leu Gly Ser Gly Arg Val Lys Thr Trp 1 5

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

AAG GAC AAG GAG CCG CGC GGT 21

Lys Asp Lys Glu Pro Arg Gly 10 15

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

Lys Asp Lys Glu Pro Arg Gly 1 5

(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

AAT GAC AAG GAG GCG CGC GGT 21

Asn Asp Lys Glu Ala Arg Gly 10

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

Asn Asp Lys Glu Ala Arg Gly 1 5

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

ACT GAC AAG GAG CCC CGG GGT 21

Thr Asp Lys Glu Pro Arg Gly 10

(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

Thr Asp Lys Glu Pro Arg Gly 1 5

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

AAG AAC AAG GAG CCG CGC AGT 21

Lys Asn Lys Glu Pro Arg Ser 10

(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:

Lys Asn Lys Glu Pro Arg Ser 1 5

(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:

ACG GAC AAG CAG CCG CGG GGT 21

Thr Asp Lys Gln Pro Arg Gly 10

(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

Thr Asp Lys Gln Pro Arg Gly 1 5

(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

ACG GAC ACG CAC CCA AGG GGT 21

Thr Asp Thr His Pro Arg Gly 10

(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:

Thr Asp Thr His Pro Arg Gly 1 5

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

ACG GAC AAG CAG CCG CGG GGT 21

Thr Asp Lys Gln Pro Arg Gly 10

(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

Thr Asp Lys Gln Pro Arg Gly 1 5

(2) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:

GGT GAC AAG GAG CCG CGC GAT 21

Gly Asp Lys Glu Pro Arg Asp 10

(2) INFORMATION FOR SEQ ID NO: 70:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:

Gly Asp Lys Glu Pro Arg Asp 1 5

(2) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "synthetic oligonucleotide"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

ACG GAT AAG GAG CCC CGC GGT 21

Thr Asp Lys Glu Pro Arg Gly 10

(2) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:

Thr Asp Lys Glu Pro Arg Gly 1 5