

Title:
TWIN-ARGININE TRANSLOCATION (TAT) STREPTOMYCES SIGNAL SEQUENCES
Document Type and Number:
WIPO Patent Application WO/2007/071996
Kind Code:
A3
Abstract:
Described herein are novel Tat signal polypeptides and methods for using the Tat signal polypeptides for producing heterologous polypeptides. A novel reporter assay for testing the biological activity of the secreted proteins is also described.

Inventors:
PALMER TRACY (GB)
WIDDICK DAVID ANDREW (GB)
Application Number:
PCT/GB2006/004816
Publication Date:
November 08, 2007
Filing Date:
December 20, 2006
Assignee:
UNIV EAST ANGLIA (GB)
PALMER TRACY (GB)
WIDDICK DAVID ANDREW (GB)
International Classes:
C07K14/19
Other References:
BENDTSEN JANNICK DYRLOV ET AL: "Prediction of twin-arginine signal peptides", BMC BIOINFORMATICS, BIOMED CENTRAL, LONDON, GB, vol. 6, no. 1, 2 July 2005 (2005-07-02), pages 167, XP021000762, ISSN: 1471-2105
PALMER T ET AL: "Export of complex cofactor-containing proteins by the bacterial Tat pathway", TRENDS IN MICROBIOLOGY, ELSEVIER SCIENCE LTD., KIDLINGTON, GB, vol. 13, no. 4, April 2005 (2005-04-01), pages 175 - 180, XP004842094, ISSN: 0966-842X
DILKS KIERAN ET AL: "Prokaryotic utilization of the twin-arginine translocation pathway: a genomic survey.", JOURNAL OF BACTERIOLOGY FEB 2003, vol. 185, no. 4, February 2003 (2003-02-01), pages 1478 - 1483, XP002445523, ISSN: 0021-9193
Attorney, Agent or Firm:
KREMER, Simon et al. (York House, 23 Kingsway, London Greater London WC2B 6HP, GB)
Claims:
What is claimed is:

1. An isolated TAT signal peptide comprising the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid M, H, A, P, K, R, N, T, G, S, D, Q or E; X+2 is amino acid A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is Q, I, L, V, M or F, and wherein said motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide.

2. An isolated TAT signal peptide comprising the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid H, A, P, K, R, N, T, G, S, D, Q, E or L; X+2 is A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is T, G or A, and wherein the motif is within the first 35 N-terminal residues of the amino acid sequence of the polypeptide.

3. The TAT signal peptide of Claim 2, wherein when X-1 is H then X+4 is A.

4. The TAT signal peptide of Claim 2, wherein when X-1 is L then X+4 is G.

5. An isolated TAT signal peptide comprising the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X-1 is M, H, A, P, K, R, N, T, G, S, D, Q, or E; X+2 is a polar amino acid residue; and X+3 and X+4 are non-polar amino acid residues, and wherein the motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide.

6. An isolated variant of the TAT signal peptide of Claim 5.

7. An isolated TAT signal peptide comprising the amino acid sequence of the signal peptide of proteins SCO2286 (SEQ ID NO: 218), SCO3790long (SEQ ID NO: 227), SCO6580long (SEQ ID NO: 241), SCO1590 (SEQ ID NO: 211), SCO1824 (SEQ ID NO: 213), SCO6580short (SEQ ID NO: 182), or SCO3790short (SEQ ID NO: 122).

8. An isolated polynucleotide comprising a polynucleotide sequence encoding a signal peptide of Claims 1, 2, 5, or 6.

9. The isolated polynucleotide of Claim 8, wherein said nucleotide sequence encoding a TAT signal polypeptide is operably linked to a second nucleotide sequence encoding a heterologous polypeptide.

10. An expression vector comprising a first nucleotide sequence encoding a TAT signal polypeptide of Claims 1, 2, or 5 operably linked to a second nucleotide sequence encoding a heterologous polypeptide.

11. A fusion polypeptide comprising a TAT signal peptide of Claim 1, 2, or 5 and a heterologous polypeptide.

12. The fusion polypeptide of Claim 11, wherein said TAT signal peptide is the secretory leader sequence of polypeptides that are naturally expressed by Streptomyces.

13. The fusion polypeptide of Claim 11, wherein said heterologous polypeptide is an enzyme, a growth factor or a hormone.

14. The fusion polypeptide of Claim 13, wherein said enzyme is a protease, a carbohydrase, an isomerase, a glucoamylase, a kinase, an amidase, an esterase, or an oxidase.

15. The fusion polypeptide of Claim 11, wherein said heterologous polypeptide is not naturally associated with a secretion signal peptide.

16. A bacterial host cell genetically transformed with an expression vector of Claim 10.

17. A method for producing a heterologous polypeptide comprising:

(a) culturing host cells comprising an expression vector comprising a first nucleotide sequence encoding a TAT signal peptide operatively linked to a second nucleotide sequence encoding a heterologous polypeptide, wherein said TAT signal peptide comprises the sequence motif

(X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid M, H, A, P, K, R, N, T, G, S, D, Q or E; X+2 is amino acid A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is Q, I, L, V, M or F, and wherein said motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide; and

(b) producing said heterologous polypeptide.

18. A method for producing a heterologous polypeptide comprising:

(a) culturing host cells comprising an expression vector comprising a first nucleotide sequence encoding a TAT signal peptide operatively linked to a second nucleotide sequence encoding a heterologous polypeptide, wherein said TAT signal peptide comprises the sequence motif

(X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid H, A, P, K, R, N, T, G, S, D, Q, E or L; X+2 is A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is T, G or A, and wherein the motif is within the first 35 N-terminal residues of the amino acid sequence of the polypeptide; and

(b) producing said heterologous polypeptide.

19. The method of Claims 17 or 18, wherein said step of producing comprises recovering said heterologous polypeptide from the culture medium.

20. The method of Claim 17 or 18, wherein said host cells are prokaryotic cells.

21. The method of Claim 17 or 18, wherein said host cells are Streptomyces bacterial cells.

22. The method of Claim 17 or 18, wherein said host cells are S. coelicolor or S. lividans bacterial cells.

23. The method of Claim 17 or 18, wherein said heterologous polypeptide is an enzyme, a growth factor or a hormone.

Description:

TWIN-ARGININE TRANSLOCATION (TAT) STREPTOMYCES SIGNAL SEQUENCES

FIELD OF THE INVENTION

This invention relates to Twin-arginine translocation (Tat) signal peptides, which have been identified in the cell wall associated fraction of the gram-positive soil bacterium Streptomyces coelicolor. The invention further relates to fusion polypeptides comprising a TAT signal peptide and a heterologous polypeptide.

BACKGROUND

In prokaryotes two pathways for protein translocation across the cytoplasmic membrane have been recognized. In most bacteria the general secretory (Sec) pathway is the best-characterized route for protein export. Proteins exported by this pathway are translocated across the membrane in an unfolded state through a membrane-embedded translocon to which they are targeted by cleavable N-terminal signal peptides (Mori et al., (2001) Trends in Microbiology 9:494-500). More recently a second general export pathway has been described, which is designated the twin-arginine translocation (Tat) pathway, and reference is made to US 2002/0110860; WO 03/079007; Berks, B. C. (1996) Mol. Microbiol. 22:393-404 and Tjalsma et al., (2000) Microbiol. & Molecul. Bio. Reviews 64:515-547. Unlike the Sec system, the Tat system is involved in the transport of pre-folded protein substrates (Thomas et al., (2001) Mol. Microbiol. 39:47-53). Proteins are targeted to the Tat pathway by possession of N-terminal tripartite signal peptides. The signal peptides include a conserved twin-arginine motif in the N-region of the Tat signal peptide. The motif has been defined as R-R-x-φ-φ, where φ represents a hydrophobic amino acid. In E. coli the Tat pathway comprises the three membrane proteins TatA, TatB and TatC. A fourth protein, TatE, forms a minor component of the Tat machinery and has a similar function to TatA. Studies by several groups suggest that the major role of the Tat pathway is in the translocation of redox proteins that integrate their cofactors in the cytoplasm. Other more recent studies indicate that the Tat pathway may play a broader role in protein secretion (Ochsner et al., (2001) Proc. Natl. Acad. Sci. USA 99:8312-8317). Because of the ability to secrete pre-folded protein substrates, the Tat pathway represents a significant mechanism for secreting a high level of heterologous proteins.

Estimates of Tat substrates in organisms other than Bacillus subtilis and E. coli have been based predominantly on in silico analysis of genome sequences using programs trained to recognize specific features of Tat targeting sequences. While these programs are useful tools for identifying candidate Tat substrates encoded within bacterial genomes, experimental verification of the in silico predicted sequences, other than phenotype analysis, has been lacking. Many of these predicted Tat substrates are in microorganisms of the Streptomyces genus such as S. coelicolor and S. avermitilis. Streptomyces are gram-positive spore-forming microorganisms which produce a range of diverse and important secondary metabolites, including many commercially available antibiotics. Streptomyces are important in the field of biotechnology because they are prolific protein secretors. Prior in silico predictions estimate that between 145 and 189 proteins from S. coelicolor may be Tat dependent. The present invention demonstrates the verification and importance of the Tat secretory pathway in Streptomyces and further is directed to Tat signal peptides which comprise a motif outside of the previously identified Tat signal peptide motif. Further, the Tat signal peptides according to the invention may be useful in the secretion of heterologous proteins in Streptomyces.
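As an illustration of the consensus motif R-R-x-φ-φ mentioned above, the following is a minimal sketch of a scan for that pattern in a one-letter amino acid sequence. It is not one of the prediction programs referred to in the text. The residue set used for φ (`HYDROPHOBIC`), the function name and the example sequences are assumptions made for illustration only.

```python
import re

# Assumption: residues treated as "hydrophobic" (phi); this set is illustrative,
# the source text does not enumerate it.
HYDROPHOBIC = "AVILMFWY"

# Lookahead so that overlapping matches are also reported.
CONSENSUS = re.compile(rf"(?=RR.[{HYDROPHOBIC}][{HYDROPHOBIC}])")


def find_consensus_motifs(seq):
    """Return 0-based start positions of every R-R-x-phi-phi match in seq."""
    return [m.start() for m in CONSENSUS.finditer(seq.upper())]


# Hypothetical sequences, not taken from the patent.
print(find_consensus_motifs("MSRRKFLAAS"))
print(find_consensus_motifs("MKKTAAS"))
```

A scan of this kind only flags candidates; as the background notes, candidate Tat substrates still require experimental verification.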

SUMMARY OF THE INVENTION

Provided herein are novel Tat signal peptides and methods of using the novel Tat signal peptides for producing heterologous polypeptides in a bacterial host cell as described in the appended claims.

In a first aspect, the invention is directed to novel Twin Arginine Translocation (TAT) signal peptides.

In one aspect, the invention is directed to an isolated signal peptide comprising the TAT signal sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid M, H, A, P, K, R, N, T, G, S, D, Q or E; X+2 is amino acid A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is Q, I, L, V, M or F, and wherein said motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide.

In another aspect, the invention is directed to an isolated TAT signal peptide comprising the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid H, A, P, K, R, N, T, G, S, D, Q, E or L; X+2 is A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is T, G or A, and wherein the motif is within the first 35 N-terminal residues of the amino acid sequence of the polypeptide. In some embodiments, the TAT signal peptide comprises a sequence motif (X-1)RR(X+2)(X+3)(X+4) that is within the first 35 N-terminal residues, wherein when X-1 is H then X+4 is A. In other embodiments, the TAT signal peptide comprises a sequence motif (X-1)RR(X+2)(X+3)(X+4) that is within the first 35 N-terminal residues, wherein when X-1 is L then X+4 is G.

In a further aspect, the invention is directed to an isolated TAT signal peptide comprising the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X-1 is M, H, A, P, K, R, N, T, G, S, D, Q, or E; X+2 is a polar amino acid residue; and X+3 and X+4 are non-polar amino acid residues, and wherein the motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide. In some embodiments, the TAT signal polypeptide is a variant of the polypeptide having the TAT motif that is not within the first 35 N-terminal residues of the amino acid sequence of the signal peptide.

Other embodiments of the invention comprise TAT signal peptides comprising the amino acid sequences of TAT signal peptides of proteins SCO2286 (SEQ ID NO: 218), SCO3790long (SEQ ID NO: 227), SCO6580long (SEQ ID NO: 241), SCO1590 (SEQ ID NO: 211), SCO1824 (SEQ ID NO: 213), SCO6580short (SEQ ID NO: 182), or SCO3790short (SEQ ID NO: 122).
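The two position-dependent motif definitions given in the preceding aspects can be sketched as a simple sequence scan. This is a hedged illustration, not a method disclosed in the specification. The allowed residues at each position are copied from the text. The function name, the example sequences and the reading of "within the first 35 N-terminal residues" as "all six motif residues lie in positions 1 to 35" are assumptions.

```python
import re

# Allowed residues per position, copied from the motif definitions in the text.
X_PLUS2 = "APKRNTGSDQE"
X_PLUS3 = "IWFLVYMCHAPNT"
OUTSIDE = re.compile(f"[MHAPKRNTGSDQE]RR[{X_PLUS2}][{X_PLUS3}][QILVMF]")
WITHIN = re.compile(f"[HAPKRNTGSDQEL]RR[{X_PLUS2}][{X_PLUS3}][TGA]")


def classify_motifs(seq, window=35):
    """Return (start, hexamer, location) for each motif hit; start is 0-based.

    Assumption: a motif counts as "within" only if all six residues fall
    inside the first `window` residues; otherwise it is treated as "outside".
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 5):
        hexamer = seq[start:start + 6]
        inside = start + 6 <= window
        if inside and WITHIN.fullmatch(hexamer):
            hits.append((start, hexamer, "within"))
        elif not inside and OUTSIDE.fullmatch(hexamer):
            hits.append((start, hexamer, "outside"))
    return hits


# Hypothetical sequences, not taken from the sequence listing.
print(classify_motifs("MSRRSFTAAA"))
print(classify_motifs("G" * 40 + "ARRSFLAAA"))
```

The conditional embodiments (X-1 is H with X+4 is A, or X-1 is L with X+4 is G) are not enforced here; they could be added as a further filter on the "within" hits.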

In another aspect, the invention is directed to an isolated polynucleotide sequence comprising a polynucleotide sequence encoding a TAT signal peptide of the invention.

In some embodiments, the isolated polynucleotide sequence is a nucleic acid molecule comprising a first nucleotide sequence encoding a TAT signal sequence encompassed by the invention operably linked to a second nucleotide sequence encoding a heterologous polypeptide. In other embodiments, the invention is directed to a nucleic acid molecule comprising a first nucleotide sequence encoding a TAT signal sequence encompassed by the invention operably linked to a second nucleotide sequence encoding a homologous polypeptide. In a further aspect, the invention is directed to an expression vector comprising a first nucleotide sequence encoding a TAT signal sequence encompassed by the invention operably linked to a second nucleotide sequence encoding a heterologous polypeptide. In one embodiment of this aspect, the invention is directed to a bacterial host cell that is genetically transformed with the recombinant expression vector encompassed by the invention.

In another aspect, the invention is directed to a fusion polypeptide comprising a TAT secretion signal peptide encompassed by the invention and a heterologous polypeptide. In an embodiment of this aspect, the fusion polypeptide comprises a TAT signal peptide that is naturally expressed in Streptomyces. In another embodiment of this aspect, the heterologous polypeptide is an enzyme, growth factor or hormone. In yet another embodiment, the enzyme may be a protease, a carbohydrase, such as amylases, cellulases, xylanases, and lipases; an isomerase, such as racemases, epimerases, tautomerases, or mutases; a transferase; a glucoamylase; a kinase, an amidase, an esterase, or an oxidase. In other embodiments of this aspect, the heterologous polypeptide is not naturally associated with any secretion signal peptide.

In yet another aspect, the invention is directed to a method for producing a heterologous polypeptide comprising culturing host cells in culture medium under conditions suitable for producing said polypeptide, said host cells comprising an expression vector comprising a first nucleotide sequence encoding a TAT signal peptide encompassed by the invention operatively linked to a second nucleotide sequence encoding a heterologous polypeptide, and producing said heterologous polypeptide. In some embodiments, the method uses a TAT signal peptide of the invention that comprises the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid M, H, A, P, K, R, N, T, G, S, D, Q or E; X+2 is amino acid A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is Q, I, L, V, M or F, and wherein said motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide.

In other embodiments, the heterologous polypeptide that is produced by the method of the invention includes a TAT signal peptide that comprises the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid H, A, P, K, R, N, T, G, S, D, Q, E or L; X+2 is A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is T, G or A, and wherein the motif is within the first 35 N-terminal residues of the amino acid sequence of the polypeptide. In some embodiments of this aspect, the step of producing the heterologous polypeptide comprises recovering the polypeptide from the culture medium. In other embodiments of this aspect, the host cell is a prokaryotic cell. In other embodiments, the host cell is a Streptomyces bacterial cell. In yet other embodiments, the host cell is an S. coelicolor or an S. lividans bacterial cell. In further embodiments of this aspect, the method of the invention produces a heterologous polypeptide that can be an enzyme, a growth factor or a hormone.

In yet another aspect, the invention is directed to a method for secreting a heterologous protein from a host microorganism comprising operably ligating a nucleic acid sequence encoding the heterologous protein to a TAT signal sequence encompassed by the invention and inserting the ligated pair into an expression vector in a host microorganism, expressing the heterologous protein under the control of the TAT signal and secreting the heterologous protein from the microorganism by the TAT expression pathway. In some embodiments of this aspect, the expressed heterologous protein is secreted in its correctly folded active form. In some embodiments of this aspect, the host organism is a Streptomyces strain, for example an S. lividans strain.

In another aspect, the invention provides for a method of identifying TAT signal peptides of polypeptides secreted in a microorganism comprising identifying a TAT signal peptide of a secreted polypeptide and validating the ability of said signal peptide to secrete a biologically functional polypeptide. In some aspects, testing the validity of the TAT signal peptide comprises expressing a fusion polypeptide in a host microorganism, comprising the TAT signal sequence operably linked to a heterologous polypeptide, and testing the biological activity of the expressed heterologous polypeptide. In some embodiments, the heterologous polypeptide is agarase.

Additional aspects and embodiments of the invention are set forth in the detailed discussion, examples and figures which follow, which are intended for illustrative purposes only and are not intended in any way to limit the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 A-B. The tatC mutant of S. coelicolor has pleiotropic phenotypes. Colonies of the S. coelicolor wild type (left hand plate) and ΔtatC strain TP4 (right hand plate) cultured on either MS (A) or R5 (B) media.

FIG. 2 A-B illustrates a 2-dimensional gel analysis of extracellular protein fractions isolated from the S. coelicolor M145 wild type (A) and a ΔtatC mutant derivative (B). Strains were cultured on R5, and proteins associated with the cell wall were separated in the first dimension by isoelectric focusing (pH gradient 4-7) and in the second dimension by SDS PAGE. Protein spots that are circled represent proteins that migrated at specific positions in the extracellular fractions from the wild type strain but were absent from the corresponding position in extracellular fractions of the ΔtatC strain. These protein spots were identified by in-gel tryptic digest followed by mass spectrometry, and identities of the proteins are indicated by the SCO number.

FIG. 3 A-D illustrates a 2-dimensional gel analysis of extracellular fractions isolated from strains grown on Complete medium (CM) (A and B) and Mannitol Soya (MS) (C and D). Proteins from the wild type S. coelicolor M145 strain are shown in A and C, and proteins from the tatC mutant strains are shown in panels B and D. Protein spots that are circled represent proteins that migrated at specific positions in the extracellular fractions from the wild type strain but were absent from the corresponding position in extracellular fractions of the ΔtatC strain. These protein spots were identified by in-gel tryptic digest followed by mass spectrometry, and identities of the proteins are indicated by the SCO number.

FIG. 4 A-C. Agarase is a Tat substrate. In A) the agarase signal peptide is shown, with the consecutive arginine residues of the twin-arginine motif highlighted in bold and underlined. In B) S. coelicolor strains M145 (WT) and TP1 (ΔtatC::ApraR) were grown on MM-C minimal medium for 5 days and stained with Lugol solution. In C) strains M145 (WT) and TP4 (ΔtatC) harboring either pIJ6902 or pIJ6902-dagA in single copy were grown on MM containing glucose, apramycin and thiostrepton for 72 hours prior to staining with Lugol solution.

FIG. 5 illustrates the export of agarase activity mediated by S. coelicolor signal peptides, some of which are encompassed by the instant invention. The signal peptide is fused to the mature sequence of agarase. The y-axis shows the signal peptides from a range of cell wall associated S. coelicolor proteins (listed in Tables 3 and 4), and the chart gives the % agarase activity of the secreted protein for each fusion protein when compared with agarase including its native signal peptide (set at 100%). Each construct carries the same native agarase promoter and ribosome-binding site and therefore the activity is a measure of the efficiency of export directed by each particular signal peptide. The assay was verified using negative controls: signal peptides that do not possess twin-arginines in the signal peptide and were strongly detected in the extracellular fractions of both the wild type (M145) and ΔtatC mutant strain by MudPIT analysis. None of these signal peptides mediated extracellular agarase activity. The signal peptides from the following proteins were also tested and were found to be negative in this assay: SCO0432, SCO0474, SCO0494, SCO0930, SCP1230, SCO1396, SCO1824, ACO1948, ACO1968, SCO2226, SCO2383, SCO2446, SCO2591, SCO2637, SCO2786, SACO2821, SCO3456, SCO4010, SCO4142, SCO4152, SCO4884, SCO4885, SCO5074, SCO5113, SCO5447, SCO5461, SCO6009, SCO6644, SCO6738, and SCO7399. The symbol * annotates that two versions of the annotated signal peptides (designated long and short), representing two alternative start sites, were tested (see Tables 3 and 4).
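The normalisation described for FIG. 5, in which activity is expressed as a percentage of that obtained with the native agarase signal peptide (set at 100%), can be sketched as follows. The function name, the construct labels and all numeric values are hypothetical and are not data from the specification.

```python
def relative_activity(raw_activity, reference="DagA"):
    """Express each construct's activity as a percentage of the reference construct."""
    ref_value = raw_activity[reference]
    if ref_value <= 0:
        raise ValueError("reference activity must be positive")
    return {name: 100.0 * value / ref_value for name, value in raw_activity.items()}


# Hypothetical raw readings in arbitrary units, for illustration only.
readings = {"DagA": 2.0, "SP_example": 0.5, "negative_control": 0.0}
print(relative_activity(readings))
```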

FIG. 6 illustrates the Tat-dependent export of agarase mediated by S. coelicolor signal peptides. For each plate, the signal peptide of each designated SCO protein, fused in frame with the mature sequence of agarase, was expressed in two S. lividans tat+ strains (1326, lower left on each plate, or 10-164, lower right of each plate), or in the 10-164 isogenic tatC strain (top of each plate). Strains were cultured on minimal medium containing 0.5% glucose for 5 days and stained with Lugol solution. The top left plate (DagA) corresponds to agarase expressed from an identical construct with its native signal peptide.

Note that although expression of agarase in S. coelicolor is highly inhibited by glucose, heterologous expression of agarase in S. lividans, which is dagA-, is not (Servin-Gonzalez L, Jensen MR, White J, Bibb M (1994) Microbiology 140:2555-2565).

DETAILED DESCRIPTION OF THE INVENTION

The invention will now be described in detail by way of reference only using the following definitions and examples. All patents and publications, including all sequences disclosed within such patents and publications, referred to herein are expressly incorporated by reference. Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York (1994), and Hale & Marham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, NY (1991) provide one of skill with a general dictionary of many of the terms used in this invention. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.

Practitioners are particularly directed to Sambrook et al., 1989, and Ausubel FM et al., 1993, for definitions and terms of the art. It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary. The headings provided herein are not limitations of the various aspects or embodiments of the invention which can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole.

The term "polypeptide" as used herein refers to a compound made up of a single chain of amino acid residues linked by peptide bonds. The term "protein" as used herein may be synonymous with the term "polypeptide" or may refer, in addition, to a complex of two or more polypeptides.

The term "fusion polypeptide" or "Tat fusion polypeptide" as used herein refers to a Tat signal peptide linked to the protein of interest. It is understood that a protein of interest refers to a heterologous protein that is operably linked to the Tat signal peptide of the invention.

A "signal peptide" as used herein refers to an amino-terminal extension on a protein to be secreted. Nearly all secreted proteins use an amino-terminal protein extension which plays a crucial role in the targeting to and translocation of precursor proteins across the membrane and which is proteolytically removed by a signal peptidase during or immediately following membrane transfer.

A "Tat signal peptide" refers to an N-terminally extended sequence which includes two consecutive arginine residues and which functions in the secretion of proteins in a prefolded conformation. A "Tat signal peptide" may be interchangeably referred to as "Tat peptide" or "Tat polypeptide".

A "tagged Tat fusion polypeptide" herein refers to a fusion polypeptide, which comprises a Tat signal peptide and a heterologous peptide, to which a tag sequence can be linked and used to identify transformants and/or to facilitate the purification of recombinant Tat fusion polypeptides. As used herein, a "protein of interest" or "polypeptide of interest" refers to the protein to be expressed and secreted by the host cell. The protein of interest may be any protein which up until now has been considered for expression in prokaryotes. The protein of interest may be either homologous or heterologous to the host. In the first case over expression should be read as expression above normal levels in said host. In the latter case basically any expression is of course over expression.

As used herein, the term "heterologous protein" refers to a protein or polypeptide that does not naturally occur in a host cell. Examples of heterologous proteins include enzymes such as hydrolases including proteases, cellulases, amylases, other carbohydrases, and lipases; isomerases such as racemases, epimerases, tautomerases, or mutases; transferases, kinases and phosphatases. The heterologous gene may encode therapeutically significant proteins or peptides, such as growth factors, cytokines, ligands, receptors and inhibitors, as well as vaccines and antibodies. The gene may encode commercially important industrial proteins or peptides, such as proteases, carbohydrases such as amylases and glucoamylases, cellulases, oxidases and lipases. The gene of interest may be a naturally occurring gene, a mutated gene or a synthetic gene.

The term "homologous protein" refers to a protein or polypeptide native or naturally occurring in a host cell. The invention includes host cells producing the homologous protein via recombinant DNA technology. The present invention encompasses a host cell having a deletion or interruption of the nucleic acid encoding the naturally occurring homologous protein, such as a protease, and having nucleic acid encoding the homologous protein re-introduced in a recombinant form. In another embodiment, the host cell produces the homologous protein.

The term "polynucleotide" or "nucleic acid molecule" includes RNA, DNA and cDNA molecules. As used herein, the term refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule and thus includes double- and single-stranded DNA and RNA. It also includes known types of modifications, for example, labels which are known in the art, methylation, "caps", substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), those containing pendant moieties, such as, for example proteins (including for e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those containing chelates (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids, etc.), as well as unmodified forms of the polynucleotide. Generally, nucleic acid segments provided by this invention may be assembled from fragments of the genome and short oligonucleotide linkers, or from a series of oligonucleotides, or from individual nucleotides, to provide a synthetic nucleic acid which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon, or a eukaryotic gene.

A "heterologous" nucleic acid construct or sequence has a portion of the sequence which is not native to the cell in which it is expressed. Heterologous, with respect to a control sequence refers to a control sequence (i.e. promoter or enhancer) that does not function in nature to regulate the same gene the expression of which it is currently regulating. Generally, heterologous nucleic acid sequences are not endogenous to the cell or part of the genome in which they are present, and have been added to the cell, by infection, transfection, microinjection, electroporation, or the like. A "heterologous" nucleic acid construct may contain a control sequence/DNA coding sequence combination that is the same as, or different from a control sequence/DNA coding sequence combination found in the native cell.

As used herein, the term "vector" refers to a nucleic acid construct designed for transfer between different host cells. An "expression vector" refers to a vector that has the ability to incorporate and express heterologous DNA fragments in a foreign cell. Many prokaryotic and eukaryotic expression vectors are commercially available. Selection of appropriate expression vectors is within the knowledge of those having skill in the art. Accordingly, an "expression cassette" or "expression vector" is a nucleic acid construct generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a target cell. The recombinant expression cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant expression cassette portion of an expression vector includes, among other sequences, a nucleic acid sequence to be transcribed and a promoter. As used herein, the term "plasmid" refers to a circular double-stranded (ds) DNA construct used as a cloning vector, and which forms an extrachromosomal self-replicating genetic element in many bacteria and some eukaryotes.

As used herein, the term "selectable marker-encoding nucleotide sequence" refers to a nucleotide sequence which is capable of expression in mammalian cells and where expression of the selectable marker confers to cells containing the expressed gene the ability to grow in the presence of a corresponding selective agent.

As used herein, the term "promoter" refers to a nucleic acid sequence that functions to direct transcription of a downstream gene. The promoter will generally be appropriate to the host cell in which the target gene is being expressed. The promoter together with other transcriptional and translational regulatory nucleic acid sequences (also termed "control sequences") are necessary to express a given gene. In general, the transcriptional and translational regulatory sequences include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. "Chimeric gene" or "heterologous nucleic acid construct", as defined herein refers to a non-native gene (i.e., one that has been introduced into a host) that may be composed of parts of different genes, including regulatory elements. A chimeric gene construct for transformation of a host cell is typically composed of a transcriptional regulatory region (promoter) operably linked to a heterologous protein coding sequence, or, in a selectable marker chimeric gene, to a selectable marker gene encoding a protein conferring antibiotic resistance to transformed cells. A typical chimeric gene of the present invention, for transformation into a host cell, includes a transcriptional regulatory region that is constitutive or inducible, a signal peptide coding sequence, a protein coding sequence, and a terminator sequence. A chimeric gene construct may also include a second DNA sequence encoding a signal peptide if secretion of the target protein is desired.

A nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA encoding a secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, "operably linked" means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, the synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice.

As used herein, the term "gene" means the segment of DNA involved in producing a polypeptide chain, that may or may not include regions preceding and following the coding region, e.g. 5' untranslated (5' UTR) or "leader" sequences and 3' UTR or "trailer" sequences, as well as intervening sequences (introns) between individual coding segments (exons).

As used herein, "recombinant" includes reference to a cell or vector, that has been modified by the introduction of a heterologous nucleic acid sequence or that the cell is derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found in identical form within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under expressed or not expressed at all as a result of deliberate human intervention.

As used herein, the terms "transformed", "stably transformed" or "transgenic" with reference to a cell means the cell has a non-native (heterologous) nucleic acid sequence integrated into its genome or as an episomal plasmid that is maintained through two or more generations.

As used herein, the term "expression" refers to the process by which a polypeptide is produced based on the nucleic acid sequence of a gene. The process includes both transcription and translation. The term "introduced" in the context of inserting a nucleic acid sequence into a cell, means "transfection", or "transformation" or "transduction" and includes reference to the incorporation of a nucleic acid sequence into a eukaryotic or prokaryotic cell where the nucleic acid sequence may be incorporated into the genome of the cell (for example, chromosome, plasmid, plastid, or mitochondrial DNA), converted into an autonomous replicon, or transiently expressed (for example, transfected mRNA).

The terms "isolated" or "purified" as used herein refer to a nucleic acid or polypeptide that is removed from at least one component with which it is naturally associated.

As used herein, "substantially equivalent" can refer both to nucleotide and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between the reference and subject sequences. Typically, such a substantially equivalent sequence varies from one of those listed herein by no more than about 20% (i.e., the number of individual residue substitutions, additions, and/or deletions in a substantially equivalent sequence, as compared to the corresponding reference sequence, divided by the total number of residues in the substantially equivalent sequence is about 0.2 or less). Such a sequence is said to have 80% sequence identity to the listed sequence. In one embodiment, a substantially equivalent, e.g., mutant, sequence of the invention varies from a listed sequence by no more than 10% (90% sequence identity); in a variation of this embodiment, by no more than 5% (95% sequence identity); and in a further variation of this embodiment, by no more than 2% (98% sequence identity). Substantially equivalent, e.g., mutant, amino acid sequences according to the invention generally have at least 95% sequence identity with a listed amino acid sequence, whereas substantially equivalent nucleotide sequence of the invention can have lower percent sequence identities, taking into account, for example, the redundancy or degeneracy of the genetic code. For the purposes of the present invention, sequences having substantially equivalent biological activity and substantially equivalent expression characteristics are considered substantially equivalent. As used herein, the term "activity" or "biological activity" refers to an activity associated with a particular protein, such as enzymatic activity. Biological activity refers to any activity that would normally be attributed to that protein by one skilled in the art.
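As a rough illustration of the percentage rule above, the following Python sketch counts residue changes and divides by the number of residues in the substantially equivalent (variant) sequence. It assumes that the change count is the minimum edit (Levenshtein) distance between the two sequences, a choice the text does not specify, and all function names are hypothetical. It says nothing about whether biological activity is retained.

```python
def edit_distance(reference: str, variant: str) -> int:
    """Minimum number of single-residue substitutions, additions and deletions."""
    previous = list(range(len(variant) + 1))
    for i, ref_res in enumerate(reference, start=1):
        current = [i]
        for j, var_res in enumerate(variant, start=1):
            cost = 0 if ref_res == var_res else 1
            current.append(min(previous[j] + 1,          # residue deleted
                               current[j - 1] + 1,       # residue added
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def variation_fraction(reference: str, variant: str) -> float:
    """Changes divided by the total number of residues in the variant."""
    if not variant:
        raise ValueError("variant sequence must not be empty")
    return edit_distance(reference, variant) / len(variant)


def is_substantially_equivalent(reference: str, variant: str,
                                max_fraction: float = 0.2) -> bool:
    """Assumed reading of the 'no more than about 20%' threshold."""
    return variation_fraction(reference, variant) <= max_fraction
```

Passing `max_fraction=0.1`, `0.05` or `0.02` would correspond to the 90%, 95% and 98% identity embodiments.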

The recognized single letter codes for amino acid residues are used consistently herein: alanine (A); arginine (R); asparagine (N); aspartic acid (D); cysteine (C); glutamic acid (E); glutamine (Q); glycine (G); histidine (H); isoleucine (I); leucine (L); lysine (K); methionine (M); phenylalanine (F); proline (P); serine (S); threonine (T); tryptophan (W); tyrosine (Y); and valine (V).

The terms "TATFIND" and "TATFIND1.2" refer to a Tat substrate recognition program developed to detect putative Tat substrates in bacterial genomes. In general, the program is based on the position and sequences of the Tat motif (WO 03/079007). The motif as disclosed in WO 03/079007 is within the first 35 amino acids of a protein sequence, (X⁻¹)R⁰R⁺¹(X⁺²)(X⁺³)(X⁺⁴), wherein X⁻¹ is M, H, A, P, K, R, N, T, G, S, D, Q or E; R⁰R⁺¹ represent the twin arginines; X⁺² is A, P, K, R, N, T, G, S, D, Q or E; X⁺³ is I, W, F, L, V, Y, M, C, H, A, P, N or T (positively charged residues being excluded from this position) and X⁺⁴ is Q, I, L, V, M or F. The term "TatP" refers to a Tat substrate recognition program that recognizes the same Tat motif as TATFIND. The TatP program uses a neural network in part, together with the rule-based classification used in the TATFIND program; reference is made to Bendtsen, J. D. et al., (2005), BMC Bioinformatics 6:167 - 175.

The present invention provides novel gram-positive microorganism secretion factors and methods that can be used in microorganisms to provide protein secretion and the production of proteins in secreted form.

Tat Nucleic Acids and Amino Acids

The invention is based on the discovery of novel Tat signal peptides.

The invention comprises isolated Tat signal peptides that include, but are not limited to, a polypeptide comprising the amino acid sequence set forth as SEQ ID NO: 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56, 59, 62, 65, 68, 71, 74, 77, 80, 83, 86, 89, 92, 95, 98, 101, 104, 107, 110, 113, 116, 119, 122, 125, 128, 131, 134, 137, 140, 143, 146, 149, 152, 155, 158, 161, 164, 167, 170, 173, 176, 179, 182, 185, 188, 191, 194, 197, 200, 203, and 204-253. The Tat signal peptides include polypeptides comprising an amino acid sequence encoded by any one nucleotide sequence generated by amplifying a nucleic acid using polymerase chain reaction (PCR) primer pairs having SEQ ID NO: 15 and 16, 18 and 19, 21 and 22, 24 and 25, 27 and 28, 30 and 31, 33 and 34, 36 and 37, 39 and 40, 42 and 43, 45 and 46, 48 and 49, 51 and 52, 54 and 55, 57 and 58, 60 and 61, 63 and 64, 66 and 67, 69 and 70, 72 and 73, 75 and 76, 78 and 79, 81 and 82, 84 and 85, 87 and 88, 90 and 91, 93 and 94, 96 and 97, 99 and 100, 102 and 103, 105 and 106, 108 and 109, 111 and 112, 114 and 115, 117 and 118, 120 and 121, 123 and 124, 126 and 127, 129 and 130, 132 and 133, 135 and 136, 138 and 139, 141 and 142, 144 and 145, 147 and 148, 150 and 151, 153 and 154, 156 and 157, 159 and 160, 162 and 163, 165 and 166, 168 and 169, 171 and 172, 174 and 175, 177 and 178, 180 and 181, 183 and 184, 186 and 187, 189 and 190, 192 and 193, 195 and 196, 198 and 199, and 201 and 202.

The present invention is directed to certain novel Tat signal peptides that comprise sequences which include the motif (X⁻¹)RR(X⁺²)(X⁺³)(X⁺⁴), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X⁻¹ is M, H, A, P, K, R, N, T, G, S, D, Q, or E; X⁺² is A, P, K, R, N, T, G, S, D, Q or E; X⁺³ is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X⁺⁴ is Q, I, L, V, M or F, and wherein the motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide. Tat signal sequences of the invention include the motif (X⁻¹)RR(X⁺²)(X⁺³)(X⁺⁴), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X⁻¹ is M, H, A, P, K, R, N, T, G, S, D, Q, or E; X⁺² is a polar amino acid residue; and X⁺³ and X⁺⁴ are non-polar amino acid residues, and wherein the motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide.

In the present context, non-polar amino acids are alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; and polar amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, glutamine, arginine, lysine, histidine, aspartic acid and glutamic acid.

Some preferred Tat signal peptides encompassed by the invention include, but are not limited to, the Tat signal peptides of proteins SCO2286 - MTPANHQAPTSAPSPAPSQSSHAPELRAAARSLGRRRFLTVTGAAAALAFAVNLPAAGTASAA (SEQ ID NO: 218) and SCO3790 long - MRKLLPLIGTPSGSHPGGRSAMTCRFRCGDACFHEVPNTSSNEYVGDVIAGALSRRSMMRAAAWTVAAAGAGAVGVAGAPSAQAA (SEQ ID NO: 227).

Another Tat signal peptide comprises the Tat signal peptide of protein SCO6580 long - MTPFTDSSRTDAGTDPSADGPGESLRRALGVNRRRFLSTCTAVAAGAVAAPVFGASPALAH (SEQ ID NO: 241).
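The motif definition above can be turned into a simple sequence scan. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that a motif is "not within the first 35 N-terminal residues" when the six-residue motif ends after residue 35, an interpretation the text does not spell out. The residue sets are copied from the definition of X⁻¹, X⁺², X⁺³ and X⁺⁴ given above.

```python
import re

# Residue sets taken from the motif definition in the text.
X_MINUS_1 = "MHAPKRNTGSDQE"
X_PLUS_2 = "APKRNTGSDQE"
X_PLUS_3 = "IWFLVYMCHAPNT"
X_PLUS_4 = "QILVMF"

# A lookahead is used so that overlapping candidate motifs are all reported.
TAT_MOTIF = re.compile(
    f"(?=([{X_MINUS_1}]RR[{X_PLUS_2}][{X_PLUS_3}][{X_PLUS_4}]))"
)


def find_tat_motifs(sequence: str):
    """Return (1-based start position, motif) for every match."""
    cleaned = "".join(sequence.split()).upper()  # drop line-break whitespace
    return [(m.start() + 1, m.group(1)) for m in TAT_MOTIF.finditer(cleaned)]


def motif_outside_window(sequence: str, window: int = 35) -> bool:
    """True if at least one motif is found and every motif ends after `window`.

    Assumed interpretation of 'not within the first 35 N-terminal residues'.
    """
    hits = find_tat_motifs(sequence)
    return bool(hits) and all(start + 5 > window for start, _ in hits)
```

Applied to the SCO2286 sequence quoted above, this sketch reports a single hit, GRRRFL, starting at residue 34 and therefore extending beyond residue 35.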

In addition, the invention contemplates isolated variants of the Tat signal peptides of the invention. In some embodiments, the variant of a Tat signal peptide is encoded by an alternative start site that leads to the translation of a shorter Tat signal peptide. Variant Tat signal peptides include, but are not limited to, the Tat signal peptide of protein SCO3790, designated SCO3790short (SEQ ID NO: 122), and the Tat signal peptide of protein SCO6580, designated SCO6580 short (SEQ ID NO: 182).

The invention also contemplates Tat signal peptides that comprise the motif (X⁻¹)RR(X⁺²)(X⁺³)(X⁺⁴), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X⁻¹ is H, A, P, K, R, N, T, G, S, D, Q, E or L; X⁺² is A, P, K, R, N, T, G, S, D, Q or E; X⁺³ is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X⁺⁴ is T, G or A, and wherein the motif is within the first 35 N-terminal residues of the amino acid sequence of the polypeptide. In some embodiments when X⁻¹ is S then X⁺⁴ will be T; in other embodiments when X⁻¹ is H then X⁺⁴ will be A, and in other embodiments when X⁻¹ is L then X⁺⁴ will be G. The primary amino acid sequence of some of the Tat signal peptides dictates that the amino acid residues at positions X⁺³ and X⁺⁴ are non-polar amino acid residues. Thus, the invention comprises Tat signal sequences that include the motif (X⁻¹)RR(X⁺²)(X⁺³)(X⁺⁴), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X⁻¹ is H, A, P, K, R, N, T, G, S, D, Q, E or L; X⁺² is A, P, K, R, N, T, G, S, D, Q or E; and X⁺³ and X⁺⁴ are non-polar amino acid residues.

Preferred Tat signal peptides include, but are not limited to, the Tat signal sequence of protein SCO1590 - MGGVSRRAFTVAALSAFTLVPEASAA (SEQ ID NO: 211) and the Tat signal sequence of protein SCO1824 - MTAPLSRHRRALAIPAGLAVAASLAFLPGTPAAATPAAEAA (SEQ ID NO: 213).
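For this second motif class, where X⁺⁴ is T, G or A and the motif lies within the first 35 N-terminal residues, a similar hedged sketch can report which X⁻¹ and X⁺⁴ residues occur together. The function name and the returned dictionary layout are hypothetical, and "within the first 35 residues" is assumed here to mean that the whole six-residue motif ends at or before residue 35.

```python
import re

# Residue sets for the second motif class, as listed in the text.
CLASS2_MOTIF = re.compile(
    r"(?=([HAPKRNTGSDQEL])RR([APKRNTGSDQE])([IWFLVYMCHAPNT])([TGA]))"
)


def class2_hits(sequence: str, window: int = 35):
    """List motifs of the second class that end within the first `window` residues."""
    cleaned = "".join(sequence.split()).upper()  # drop line-break whitespace
    hits = []
    for match in CLASS2_MOTIF.finditer(cleaned):
        start = match.start() + 1   # 1-based position of X-1
        end = start + 5             # 1-based position of X+4
        if end <= window:           # assumed reading of "within the first 35"
            hits.append({
                "start": start,
                "x_minus_1": match.group(1),
                "x_plus_4": match.group(4),
            })
    return hits
```

For the SCO1590 sequence quoted above the sketch finds X⁻¹ = S with X⁺⁴ = T, and for SCO1824 it finds X⁻¹ = H with X⁺⁴ = A, which matches the pairings described in the text.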

The invention also provides biologically active variants of any of the amino acid sequences set forth as SEQ ID NO: 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56, 59, 62, 65, 68, 71, 74, 77, 80, 83, 86, 89, 92, 95, 98, 101, 104, 107, 110, 113, 116, 119, 122, 125, 128, 131, 134, 137, 140, 143, 146, 149, 152, 155, 158, 161, 164, 167, 170, 173, 176, 179, 182, 185, 188, 191, 194, 197, 200, 203, and 204-253, and substantial equivalents thereof. Substantially equivalent Tat signal peptides have at least about 80%, or 85%, more typically at least about 90%, 91%, 92%, 93%, or 94% and even more typically at least about 95%, 96%, 97%, 98% or 99% amino acid identity, and retain biological activity. The biological activity of a signal peptide refers to the ability of the signal peptide to translocate the polypeptide to the extracellular space. Fragments of the Tat peptides of the present invention which are capable of exhibiting biological activity are also encompassed by the present invention. The term "variant" when made in reference to a polypeptide refers to any polypeptide differing from naturally occurring polypeptides by amino acid insertions, deletions, and substitutions, or combinations thereof, which can be created using, e.g., recombinant DNA techniques. A variant of a polypeptide can also refer to a shortened version of a polypeptide that is generated by use of an alternative start codon as a translation initiation site. The alternative codon can be an in-frame or an out-of-frame start codon. Examples of variants of polypeptides that are translated from alternative start codons include but are not limited to the TAT signal polypeptides of SEQ ID NO: 122 and SEQ ID NO: 182.

Guidance in determining which amino acid residues may be replaced, added or deleted without abolishing activities of interest, may be found by comparing the sequence of the particular polypeptide with that of homologous peptides and minimizing the number of amino acid sequence changes made in regions of high homology (conserved regions) or by replacing amino acids with consensus sequence. Preferably, amino acid "substitutions" are the result of replacing one amino acid with another amino acid having similar structural and/or chemical properties, i.e., conservative amino acid replacements. "Conservative" amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues involved. For example, nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine; positively charged (basic) amino acids include arginine, lysine, and histidine; and negatively charged (acidic) amino acids include aspartic acid and glutamic acid. "Insertions" or "deletions" are typically in the range of about 1 to 5 amino acids. The variation allowed may be experimentally determined by systematically making insertions, deletions, or substitutions of amino acids in a polypeptide molecule using recombinant DNA techniques and assaying the resulting recombinant variants for activity.

Where alteration of function is desired, insertions, deletions or non-conservative alterations can be engineered to produce altered polypeptides. Such alterations can, for example, alter one or more of the biological functions or biochemical characteristics of the polypeptides of the invention. For example, such alterations may change polypeptide characteristics such as ligand-binding affinities, interchain affinities, or degradation/turnover rate. Further, such alterations can be selected so as to generate polypeptides that are better suited for expression, scale up and the like in the host cells chosen for expression. The present invention also provides isolated Tat peptides encoded by the nucleic acids/polynucleotides of the present invention or by degenerate variants of the nucleic acids of the invention. By "degenerate variants" is intended nucleotide fragments that differ from a nucleic acid of the invention by nucleotide sequence but, due to the degeneracy of the genetic code, encode an identical polypeptide sequence.
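The notion of a degenerate variant can be illustrated with a short Python sketch that translates two nucleotide sequences with the standard genetic code and compares the encoded peptides. The function names are hypothetical, and the two DNA strings in the usage note are made-up examples that were chosen to encode the hexapeptide SRRAFT; they are not sequences from this specification.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence read in frame from its first base."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def are_degenerate_variants(dna_a: str, dna_b: str) -> bool:
    """Different nucleotide sequences that encode an identical polypeptide."""
    return dna_a.upper() != dna_b.upper() and translate(dna_a) == translate(dna_b)
```

For example, the illustrative sequences `TCCCGCCGCGCCTTCACC` and `AGCCGGCGTGCGTTTACG` differ at several positions yet both translate to SRRAFT, so the sketch treats them as degenerate variants of one another.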

The invention provides for fusion or chimeric polypeptides. As used herein, a fusion protein or fusion polypeptide comprises a Tat signal peptide operatively linked to a polypeptide/protein of interest. Within the fusion protein, the term "operatively linked" is intended to indicate that the Tat signal polypeptide and the polypeptide of interest are fused in-frame with one another. The Tat signal peptides are fused to the N-terminal end of the peptide of interest. Polypeptides of interest include homologous or heterologous polypeptides. Polypeptides of interest include full-length polypeptides that are naturally synthesized with a signal peptide, the mature form of the full-length polypeptides, and polypeptides that naturally lack a signal peptide. The fusion polypeptides thus comprise a Tat signal sequence that includes the motif (X⁻¹)RR(X⁺²)(X⁺³)(X⁺⁴), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X⁻¹ is M, H, A, P, K, R, N, T, G, S, D, Q, or E; X⁺² is A, P, K, R, N, T, G, S, D, Q or E; X⁺³ is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X⁺⁴ is Q, I, L, V, M or F, and wherein the motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide. In other embodiments Tat signal sequences comprised by the fusion polypeptide can include the motif (X⁻¹)RR(X⁺²)(X⁺³)(X⁺⁴), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X⁻¹ is M, H, A, P, K, R, N, T, G, S, D, Q, or E; X⁺² is a polar amino acid residue; and X⁺³ and X⁺⁴ are non-polar amino acid residues, and wherein the motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide.

As recited herein, in the present context, non-polar amino acids are alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; and polar amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, glutamine, arginine, lysine, histidine, aspartic acid and glutamic acid.

Some preferred fusion polypeptides comprise Tat signal peptides that include, but are not limited to, the Tat signal peptides of proteins SCO2286 - MTPANHQAPTSAPSPAPSQSSHAPELRAAARSLGRRRFLTVTGAAAALAFAVNLPAAGTASAA (SEQ ID NO: 218) and SCO3790 long - MRKLLPLIGTPSGSHPGGRSAMTCRFRCGDACFHEVPNTSSNEYVGDVIAGALSRRSMMRAAAWTVAAAGAGAVGVAGAPSAQAA (SEQ ID NO: 227).

Another Tat signal peptide comprises the Tat signal peptide of protein SCO6580 long - MTPFTDSSRTDAGTDPSADGPGESLRRALGVNRRRFLSTCTAVAAGAVAAPVFGASPALAH (SEQ ID NO: 241).

In addition, the fusion polypeptides of the invention comprise variants of the novel Tat signal peptides. In some embodiments, the fusion polypeptide comprises a variant of a Tat signal peptide that is encoded by an alternative start site that leads to the translation of a shorter Tat signal peptide. Variant Tat signal peptides include, but are not limited to, the Tat signal peptide of protein SCO3790, designated SCO3790short (SEQ ID NO: 122), and the Tat signal peptide of protein SCO6580, designated SCO6580 short (SEQ ID NO: 182).

The invention also contemplates fusion polypeptides that comprise Tat signal peptides that include the motif (X⁻¹)RR(X⁺²)(X⁺³)(X⁺⁴), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X⁻¹ is H, A, P, K, R, N, T, G, S, D, Q, E or L; X⁺² is A, P, K, R, N, T, G, S, D, Q or E; X⁺³ is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X⁺⁴ is T, G or A, and wherein the motif is within the first 35 N-terminal residues of the amino acid sequence of the polypeptide. In some embodiments when X⁻¹ is S then X⁺⁴ will be T; in other embodiments when X⁻¹ is H then X⁺⁴ will be A, and in other embodiments when X⁻¹ is L then X⁺⁴ will be G. The primary amino acid sequence of some of the Tat signal peptides dictates that the amino acid residues at positions X⁺³ and X⁺⁴ are non-polar amino acid residues. Thus, the invention comprises Tat signal sequences that include the motif (X⁻¹)RR(X⁺²)(X⁺³)(X⁺⁴), wherein RR represents two adjacent arginine residues and X designates positions restricted to other selected amino acids: X⁻¹ is H, A, P, K, R, N, T, G, S, D, Q, E or L; X⁺² is A, P, K, R, N, T, G, S, D, Q or E; and X⁺³ and X⁺⁴ are non-polar amino acid residues.

Preferred fusion polypeptides comprise Tat signal peptides that include, but are not limited to, the Tat signal sequence of protein SCO1590 - MGGVSRRAFTVAALSAFTLVPEASAA (SEQ ID NO: 211) and the Tat signal sequence of protein SCO1824 - MTAPLSRHRRALAIPAGLAVAASLAFLPGTPAAATPAAEAA (SEQ ID NO: 213).

In some embodiments, the fusion polypeptide comprises a Tat signal peptide that is the secretory leader sequence of polypeptides that are naturally expressed by Streptomyces, and that is operably linked to a heterologous polypeptide or protein of interest. In some embodiments, the fusion polypeptide comprises a Tat signal peptide and a heterologous polypeptide such as an enzyme, a growth factor or a hormone. Enzymes include but are not limited to a protease, a carbohydrase, such as amylases, cellulases, xylanases, and lipases; an isomerase, such as racemases, epimerases, tautomerases, or mutases; a transferase; a glucoamylase; a kinase, an amidase, an esterase, or an oxidase. Thus, the protein of interest may be an enzyme such as a carbohydrase, such as an α-amylase, an alkaline α-amylase, a β-amylase, a cellulase, a dextranase, an α-glucosidase, an α-galactosidase, a glucoamylase, a hemicellulase, a pentosanase, a xylanase, an invertase, a lactase, a naringinase, a pectinase or a pullulanase; a protease such as an acid protease, an alkali protease, bromelain, ficin, a neutral protease, papain, pepsin, a peptidase, rennet, rennin, chymosin, subtilisin, thermolysin, an aspartic proteinase, or trypsin; a lipase or esterase, such as a triglyceridase, a phospholipase, a pregastric esterase, a phosphatase, a phytase, an amidase, an iminoacylase, a glutaminase, a lysozyme, or a penicillin acylase; an isomerase such as glucose isomerase; an oxidoreductase, e.g., an amino acid oxidase, a catalase, a chloroperoxidase, a glucose oxidase, a hydroxysteroid dehydrogenase or a peroxidase; a lyase such as an acetolactate decarboxylase, an aspartic β-decarboxylase, a fumarase or a histidase; a transferase such as cyclodextrin glycosyltransferase; or a ligase, for example. In particular embodiments, the protein may be an aminopeptidase, a carboxypeptidase, a chitinase, a cutinase, a deoxyribonuclease, an α-galactosidase, a β-galactosidase, a β-glucosidase, a laccase, a mannosidase, a mutanase, a pectinolytic enzyme, a polyphenoloxidase, ribonuclease or transglutaminase, for example. The enzyme may be a wild-type enzyme or a variant of a wild-type enzyme. The enzyme may also be a hybrid enzyme, which comprises at least two fragments from different enzymes, for example a catalytic domain of one enzyme and a starch binding domain of a different enzyme, or two fragments each comprising part of the catalytic domain of the enzymes. In some embodiments, the fusion polypeptide of the invention comprises a Tat signal peptide, as recited herein, and an enzyme that is a protease, a carbohydrase, an isomerase, a glucoamylase, a kinase, an amidase, an esterase, or an oxidase. In other embodiments, the fusion polypeptide comprises a Tat signal peptide and a heterologous polypeptide that is not naturally associated with a secretion signal peptide. In other embodiments, the fusion polypeptide comprises a heterologous polypeptide that may be a therapeutic protein (i.e., a protein having a therapeutic biological activity).
Examples of suitable therapeutic proteins include: erythropoietin, cytokines such as interferon-α, interferon-β, interferon-γ, interferon-o, and granulocyte-CSF, GM-CSF, coagulation factors such as factor VIII, factor IX, and human protein C, antithrombin III, thrombin, soluble IgE receptor α-chain, IgG, IgG fragments, IgG fusions, IgM, IgA, interleukins, urokinase, chymase, and urea trypsin inhibitor, IGF-binding protein, epidermal growth factor, growth hormone-releasing factor, annexin V fusion protein, angiostatin, vascular endothelial growth factor-2, myeloid progenitor inhibitory factor-1, osteoprotegerin, α-1-antitrypsin, α-feto proteins, DNase II, kringle 3 of human plasminogen, glucocerebrosidase, TNF binding protein 1, follicle stimulating hormone, cytotoxic T lymphocyte associated antigen 4-Ig, transmembrane activator and calcium modulator and cyclophilin ligand, soluble TNF receptor Fc fusion, glucagon like protein 1 and IL-2 receptor agonist. Antibody proteins, e.g., monoclonal antibodies that may be humanized, are of particular interest.

The invention encompasses the polynucleotides that encode the fusion polypeptides. Thus, in some embodiments, the invention is directed to isolated polynucleotides that comprise a polynucleotide sequence encoding a Tat signal peptide, as recited above, that is operably linked to a second nucleotide sequence encoding a heterologous polypeptide.

The Tat polynucleotides that encode the Tat polypeptides of the invention, as recited above, include the sequence information of the nucleic acid sequences generated by PCR using the primer pairs having SEQ ID NO: 15 and 16, 18 and 19, 21 and 22, 24 and 25, 27 and 28, 30 and 31, 33 and 34, 36 and 37, 39 and 40, 42 and 43, 45 and 46, 48 and 49, 51 and 52, 54 and 55, 57 and 58, 60 and 61, 63 and 64, 66 and 67, 69 and 70, 72 and 73, 75 and 76, 78 and 79, 81 and 82, 84 and 85, 87 and 88, 90 and 91, 93 and 94, 96 and 97, 99 and 100, 102 and 103, 105 and 106, 108 and 109, 111 and 112, 114 and 115, 117 and 118, 120 and 121, 123 and 124, 126 and 127, 129 and 130, 132 and 133, 135 and 136, 138 and 139, 141 and 142, 144 and 145, 147 and 148, 150 and 151, 153 and 154, 156 and 157, 159 and 160, 162 and 163, 165 and 166, 168 and 169, 171 and 172, 174 and 175, 177 and 178, 180 and 181, 183 and 184, 186 and 187, 189 and 190, 192 and 193, 195 and 196, 198 and 199, and 201 and 202. The primer pairs are given in Table 2. The invention also encompasses polynucleotides that are generated by degenerate variants of the primer pairs recited herein.

The polynucleotides of the present invention also include, but are not limited to, a polynucleotide comprising the protein coding sequence of the Tat signal peptides having SEQ ID NO: 254-312. The polynucleotides of the present invention also include polynucleotides that hybridize under stringent conditions to the complement of any of the polynucleotides of the invention, a polynucleotide encoding any one of the Tat signal peptides of SEQ ID NO: 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56, 59, 62, 65, 68, 71, 74, 77, 80, 83, 86, 89, 92, 95, 98, 101, 104, 107, 110, 113, 116, 119, 122, 125, 128, 131, 134, 137, 140, 143, 146, 149, 152, 155, 158, 161, 164, 167, 170, 173, 176, 179, 182, 185, 188, 191, 194, 197, 200, 203, and 204-253, a polynucleotide that is an allelic variant of any polynucleotide recited above, a polynucleotide which is a species homolog of any of the polypeptides recited above, or a polynucleotide that encodes a polypeptide comprising any one of the motifs described above. An allelic variant denotes any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in polymorphism within populations. Gene mutations can be silent (no change in the encoded polypeptide) or may encode polypeptides having altered amino acid sequences. An allelic variant of a polypeptide is a polypeptide encoded by an allelic variant of a gene.

The invention also encompasses polynucleotide fragments of the nucleic acid sequences of the invention. In addition to the use of these fragments for PCR, the polynucleotide fragments can be used in various hybridization procedures or microarray procedures to identify or amplify identical or related parts of mRNA or DNA molecules from which the Tat signal peptides of the invention can be derived.

The polynucleotides of the invention additionally include the complement of any of the polynucleotides recited above.
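As a small illustration of what the complement of a polynucleotide means in practice, the following Python sketch computes the base-by-base complement and the reverse complement (the complementary strand read 5' to 3') of a DNA sequence. The function names are hypothetical and only unambiguous DNA bases are handled.

```python
# Watson-Crick pairing for unambiguous DNA bases, preserving letter case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna: str) -> str:
    """Base-by-base complement, in the same left-to-right order."""
    return dna.translate(_COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complementary strand written 5' to 3'."""
    return complement(dna)[::-1]
```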

The invention also provides polynucleotides including nucleotide sequences that are substantially equivalent to the Tat polynucleotides recited above. Polynucleotides according to the invention can have, e.g., at least about 65%, at least about 70%, at least about 75%, at least about 80%, more typically at least about 90%, and even more typically at least about 95%, sequence identity to a polynucleotide recited above. The invention also provides the complement of such polynucleotides. The polynucleotide can be DNA (genomic, cDNA, amplified, or synthetic) or RNA. Methods and algorithms for obtaining such polynucleotides are well known to those of skill in the art and can include, for example, methods for determining hybridization conditions which can routinely isolate polynucleotides of the desired sequence identities.

A Tat fusion polypeptide of the invention can be produced by standard recombinant DNA techniques. For example, DNA fragments coding for the different polypeptide sequences are ligated together in-frame in accordance with conventional techniques, e.g., by employing blunt-ended or stagger-ended termini for ligation, restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another embodiment, the fusion gene can be synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers that give rise to complementary overhangs between two consecutive gene fragments that can subsequently be annealed and reamplified to generate a chimeric gene sequence (see, e.g., Ausubel, et al. (eds.) CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, 1992). A Tat signal peptide-encoding nucleic acid can be cloned into an expression vector such that the fusion moiety, e.g. the polypeptide of interest, is linked in-frame to the Tat signal peptide. An expression vector comprising a Tat signal peptide-encoding polynucleotide can be any vector capable of expressing the polynucleotide encoding the Tat signal peptide or Tat fusion polypeptide in a selected host organism, and the choice of vector will depend on the host cell into which the expression vector is introduced. Thus, the invention encompasses expression vectors that comprise a first nucleotide sequence encoding a Tat signal peptide, as recited herein, operably linked to a second nucleotide sequence encoding a heterologous polypeptide.
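The requirement that the signal peptide and the polypeptide of interest be linked in-frame can be checked computationally before a construct is assembled. The sketch below is illustrative only: the function names are hypothetical, the example sequences in the usage note are invented placeholders rather than sequences of the invention, and the check assumes the standard stop codons TAA, TAG and TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna: str) -> list:
    """Split a DNA string into complete codons, ignoring any trailing 1-2 nt."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def fuse_in_frame(signal_cds: str, mature_cds: str) -> str:
    """Join a signal peptide coding sequence to a mature coding sequence.

    Raises ValueError if the junction would shift the reading frame or if a
    stop codon appears before the final codon of the fusion.
    """
    if len(signal_cds) % 3 != 0:
        raise ValueError("signal peptide CDS length is not a multiple of 3")
    if len(mature_cds) % 3 != 0:
        raise ValueError("mature CDS length is not a multiple of 3")
    fused = signal_cds.upper() + mature_cds.upper()
    # Every codon except the last must be a sense codon.
    if any(c in STOP_CODONS for c in codons(fused)[:-1]):
        raise ValueError("premature stop codon in the fused reading frame")
    return fused
```

Called as fuse_in_frame("ATGAGCCGTCGT", "GCGTTTCTGTAA"), the sketch returns the 24-nucleotide fusion, whereas a signal sequence truncated by one nucleotide is rejected because the downstream reading frame would be shifted.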

The expression vector typically includes the components of a cloning vector, such as, for example, an element that permits autonomous replication of the vector in the selected host organism and one or more phenotypically detectable markers for selection purposes. The expression vector normally comprises control nucleotide sequences encoding a promoter, operator, ribosome binding site, translation initiation signal and, optionally, a repressor gene or one or more activator genes. For expression under the direction of control sequences, the nucleic acid sequence encoding the modified enzyme is operably linked to the control sequences in a proper manner with respect to expression.

Preferably, a polynucleotide in a vector is operably linked to a control sequence that is capable of providing for the expression of the coding sequence by the host cell, i.e. the vector is an expression vector. The control sequences may be modified, for example, by the addition of further transcriptional regulatory elements to make the level of transcription directed by the control sequences more responsive to transcriptional modulators. The control sequences may in particular comprise promoters.

In the vector, the nucleic acid sequence encoding the Tat signal peptide or the Tat signal peptide fusion polypeptide is operably combined with a suitable promoter sequence. The promoter can be any DNA sequence having transcription activity in the host organism of choice and can be derived from genes that are homologous or heterologous to the host organism. Examples of suitable promoters for directing the transcription of the modified nucleotide sequence, such as modified enzyme nucleic acids, in a bacterial host include the promoter of the Streptomyces coelicolor agarase gene (dagA), the promoters of the Bacillus licheniformis α-amylase gene (amyL), the aprE promoter of Bacillus subtilis, the promoters of the Bacillus stearothermophilus maltogenic amylase gene (amyM), the promoters of the Bacillus amyloliquefaciens α-amylase gene (amyQ), the promoters of the Bacillus subtilis xylA and xylB genes, and Lactococcus sp.-derived promoters, including the P170 promoter.

The Tat fusion polypeptide may, in addition, comprise a tag sequence that is fused to the C-terminus of the Tat fusion polypeptide to generate a tagged Tat fusion polypeptide. Such tag sequences can be used to identify transformants and/or to facilitate the purification of recombinant Tat fusion polypeptides. For example, the Tat fusion polypeptide may be expressed to contain a tag such as those of maltose binding protein (MBP), glutathione-S-transferase (GST) or thioredoxin (TRX), or a His tag. Kits for expression and purification of such fusion proteins are commercially available from New England BioLabs (Beverly, Mass.), Pharmacia (Piscataway, N.J.) and Invitrogen, respectively. The Tat fusion polypeptide can also be tagged with an epitope and subsequently purified by using a specific antibody directed to such epitope. One such epitope (FLAG®) is commercially available from Kodak (New Haven, Conn.). Another tag that can be used in the invention is the c-myc tag, as is described in the examples.

This invention further provides expression vectors comprising at least a fragment of the polynucleotides set forth above and host cells or organisms transformed with these expression vectors. Useful vectors include plasmids, cosmids, lambda phage derivatives, phagemids, and the like, that are well known in the art. Accordingly, the invention also provides a vector including a polynucleotide of the invention and a host cell containing the polynucleotide. In general, the vector contains an origin of replication functional in at least one organism, convenient restriction endonuclease sites, and a selectable marker for the host cell.

The present invention further includes novel expression vectors comprising promoter elements operatively linked to polynucleotide sequences encoding a Tat fusion protein/polypeptide that comprises a Tat signal peptide of the invention and a protein of interest. The invention encompasses plasmids pTDW92, pTDW73, pTDW121, pTDW102, pTDW119, pTDW74, pTDW75, pTDW48, pTDW76, pTDW103, pTDW118, pTDW77, pTDW49, pTDW51, pTDW52, pTDW78, pTDW91, pTDW79, pTDW104, pTDW90, pTDW89, pTDW88, pTDW80, pTDW53, pTDW106, pTDW81, pTDW82, pTDW83, pTDW84, pTDW61, pTDW56, pTDW62, pTDW47, pTDW63, pTDW93, pTDW107, pTDW60, pTDW64, pTDW72, pTDW87, pTDW86, pTDW108, pTDW50, pTDW109, pTDW67, pTDW57, pTDW65, pTDW66, pTDW54, pTDW58, pTDW85, pTDW116, pTDW68, pTDW110, pTDW111, pTDW113, pTDW94, pTDW69, pTDW70, pTDW114, pTDW71, pTDW95, and pTDW115. The construction of the various expression vectors is disclosed in the examples.

Expression Systems

In some preferred embodiments, the expression host strain for heterologously expressed protein will be a Streptomyces strain. As used herein, the genus Streptomyces includes all members known to those skilled in the art, including but not limited to S. coelicolor and S. lividans. In some embodiments, the expression host strain is S. coelicolor. In other embodiments, the expression host strain is S. lividans. The genetic elements and tools required for heterologous expression of proteins are known in the art for Streptomyces, including expression vectors, promoters, as well as fermentation protocols (Gilbert et al., (1995) Crit. Rev. Biotechnology 15:13 - 39). Other examples of suitable bacterial host organisms are gram positive bacterial species such as Bacillaceae including Bacillus subtilis, Bacillus licheniformis, Bacillus lentus, Bacillus brevis, Bacillus stearothermophilus, Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus coagulans, Bacillus lautus, Bacillus megaterium and Bacillus thuringiensis, and other Streptomyces species such as Streptomyces murinus, S. rubiginosus, S. griseus, S. avermitilis, lactic acid bacterial species including Lactococcus spp. such as Lactococcus lactis, Lactobacillus spp. including Lactobacillus reuteri, Leuconostoc spp., Pediococcus spp. and Streptococcus spp. In general a DNA sequence encoding a Tat signal sequence according to the invention is linked to the 5' end of the polynucleotide encoding the protein of interest, such that the signal sequence directs the secretion of the polypeptide sequence via the Tat pathway. Reference is made to Hopwood et al., (1985) Genetic Manipulation of Streptomyces: Laboratory Manual, The John Innes Foundation, Norwich, UK; Fernandez-Abalos et al., (2003) Microbiol. 149:1623 - 1632; Connell, N.D. (2001) Curr. Opin. Biotechnol. 5:446 - 449; Binnie et al., (1997) Trends Biotechnol, 8:315 - 320; Van Mellaert (1993) FEMS Microbiol. Lett 114:121 - 128; and Baltz R., Chapter 6 "Gene Expression in Recombinant Streptomyces" in Gene Expression in Recombinant Microorganisms, Ed. Alan Smith, 1995, NY, M. Dekker.

Cell Cultures

The present invention further provides host cells genetically transformed with the vectors of the invention, which may be, for example, an expression vector that contains the Tat signal polynucleotides and/or the Tat fusion polynucleotides of the invention. Thus, the invention encompasses a bacterial host cell transformed with an expression vector, as recited above. The vector may be, for example, in the form of a plasmid, a viral particle, a phage etc. The host cells and transformed cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying Tat signal and/or Tat fusion polynucleotides. The specific culture conditions, such as temperature, pH and the like will be apparent to those skilled in the art. In addition, preferred culture conditions may be found in the scientific literature such as Hopwood (2000) Practical Streptomyces Genetics, John Innes Foundation, Norwich UK; Harwood et al., (1990) Molecular Biological Methods for Bacillus, John Wiley; and from the American Type Culture Collection (ATCC). The present invention still further provides host cells genetically engineered to express the polynucleotides of the invention, wherein such polynucleotides are in operative association with a regulatory sequence heterologous to the host cell which drives expression of the polynucleotides in the cell.

The host cell can be a higher eukaryotic host cell, such as a mammalian cell, or a lower eukaryotic host cell, such as a yeast cell. Preferably, the host cell is a prokaryotic cell, such as a bacterial cell. The bacterial host cells may be from gram positive or gram negative bacteria. Preferably, the invention encompasses host cells from gram positive bacteria. Gram positive cells that may act as suitable host cells for expression of the Tat signal peptides and/or Tat fusion polypeptides include, for example, the Streptomyces, Bacillus and Lactococcus species recited herein. For industrial purposes the bacteria generally used are Bacillus subtilis and S. lividans. Preferably, the polypeptides of the invention are expressed in S. lividans cells. If the protein is made in bacteria, it may be necessary to modify the protein produced therein, for example by phosphorylation or glycosylation of the appropriate sites, in order to obtain a biologically active protein. Such covalent attachments may be accomplished using known chemical or enzymatic methods.

Identification of Transformants

Although the presence/absence of marker gene expression suggests that a gene of interest is also present, its presence and expression should be confirmed. For example, if the nucleic acid encoding a heterologous protein, such as agarase, is inserted within a marker gene sequence, recombinant cells containing the insert can be identified by the absence of marker gene function. Alternatively, a marker gene can be placed in tandem with nucleic acid encoding the secretion factor under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the secretion factor as well.

Alternatively, host cells which contain the coding sequence for a secretion factor and express the protein may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridization and protein bioassay or immunoassay techniques which include membrane-based, solution-based, or chip-based technologies for the detection and/or quantification of the nucleic acid or protein.

Secretion Assays

Means for determining the levels of secretion of a heterologous or homologous protein in a gram-positive host cell and detecting secreted proteins include the use of either polyclonal or monoclonal antibodies specific for the protein. Examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and fluorescence-activated cell sorting (FACS). These and other assays are described, among other places, in Hampton R et al (1990, Serological Methods, a Laboratory Manual, APS Press, St Paul MN) and Maddox DE et al (1983, J Exp Med 158:1211).

A wide variety of labels and conjugation techniques are known by those skilled in the art and can be used in various nucleic and amino acid assays. Means for producing labeled hybridization or PCR probes for detecting specific polynucleotide sequences include oligo labeling, nick translation, end-labeling or PCR amplification using a labeled nucleotide. Alternatively, the nucleotide sequence, or any portion of it, may be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3 or SP6 and labeled nucleotides.

A number of companies such as Pharmacia Biotech (Piscataway NJ), Promega (Madison WI), and US Biochemical Corp (Cleveland OH) supply commercial kits and protocols for these procedures. Suitable reporter molecules or labels include radionuclides, enzymes, and fluorescent, chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, magnetic particles and the like. Patents teaching the use of such labels include US Patents 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241. Also, recombinant immunoglobulins may be produced as shown in US Patent No. 4,816,567, incorporated herein by reference. An enzyme reporter assay for identifying a secreted polypeptide is described herein. While the present invention describes an agarase reporter assay for determining the presence of secreted polypeptides, it is understood that any assay that uses any one of the reporters described above can be used to identify the secreted polypeptides. The enzymatic activity of a secreted TAT fusion polypeptide can be determined by contacting the secreted polypeptide with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate. In addition, verifying that the correct enzyme or other reporter molecule is secreted can be accomplished by performing mass spectrometry of the secreted protein.

Purification of Proteins

Host cells transformed with polynucleotide sequences encoding heterologous or homologous protein may be cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture. "Recovering" a polypeptide from a culture medium refers to collecting the polypeptide in the culture medium into which it was secreted by the host cell. The polypeptide can also be recovered from a lysate prepared from the host cells and further purified. A secreted polypeptide may be recovered from the cell wall fraction prepared according to methods known in the art. One skilled in the art can readily follow known methods for isolating polypeptides and proteins in order to obtain one of the isolated polypeptides or proteins of the present invention. These include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and immuno-affinity chromatography. See, e.g., Scopes, Protein Purification: Principles and Practice, Springer-Verlag (1994); Sambrook, et al., in Molecular Cloning: A Laboratory Manual; Ausubel et al., Current Protocols in Molecular Biology. The protein produced by a recombinant host cell comprising a secretion factor of the present invention will be secreted into the culture media.

The invention also encompasses fusion polypeptides that comprise a Tat signal peptide, a heterologous peptide and a polypeptide domain that will facilitate purification of soluble proteins (Kroll DJ et al (1993) DNA Cell Biol 12:441-53). Such purification facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals (Porath J (1992) Protein Expr Purif 3:263-281), protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAG extension/affinity purification system (Immunex Corp, Seattle WA). The inclusion of a cleavable linker sequence, such as a Factor Xa or enterokinase cleavage site (Invitrogen, San Diego CA), between the purification domain and the heterologous protein can be used to facilitate purification.

In yet another aspect, the invention is directed to a method for producing a heterologous polypeptide comprising culturing host cells in culture medium under conditions suitable for producing the heterologous polypeptide, wherein the host cells contain an expression vector that comprises a first nucleotide sequence encoding a TAT signal peptide encompassed by the invention operatively linked to a second nucleotide sequence encoding a heterologous polypeptide, and producing the heterologous polypeptide. In some embodiments, the method uses a TAT signal peptide of the invention that comprises the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid M, H, A, P, K, R, N, T, G, S, D, Q or E; X+2 is amino acid A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is Q, I, L, V, M or F, and wherein the motif is not within the first 35 N-terminal residues of the amino acid sequence of the polypeptide.

In other embodiments, the heterologous polypeptide that is produced by the method of the invention includes a TAT signal peptide that comprises the sequence motif (X-1)RR(X+2)(X+3)(X+4), wherein R is arginine, X-1 is amino acid H, A, P, K, R, N, T, G, S, D, Q, E or L; X+2 is A, P, K, R, N, T, G, S, D, Q or E; X+3 is I, W, F, L, V, Y, M, C, H, A, P, N or T; and X+4 is T, G or A, and wherein the motif is within the first 35 N-terminal residues of the amino acid sequence of the polypeptide. In some embodiments of this aspect, the step of producing the heterologous polypeptide comprises recovering the polypeptide from the culture medium.
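The two sequence motifs recited above lend themselves to a simple pattern search. The following sketch is illustrative only: the names MOTIF_QILVMF, MOTIF_TGA, find_motif and within_first_n are hypothetical, the example sequences in the usage note are invented, and the sketch assumes that "within the first 35 N-terminal residues" means that the entire six-residue motif lies within residues 1 to 35.

```python
import re

# Residue classes taken from the two motif definitions recited in the text.
# A lookahead is used so that overlapping candidate matches are all reported.
MOTIF_QILVMF = re.compile(
    r"(?=([MHAPKRNTGSDQE]RR[APKRNTGSDQE][IWFLVYMCHAPNT][QILVMF]))"
)
MOTIF_TGA = re.compile(
    r"(?=([HAPKRNTGSDQEL]RR[APKRNTGSDQE][IWFLVYMCHAPNT][TGA]))"
)


def find_motif(protein: str, pattern) -> list:
    """Return (0-based start, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]


def within_first_n(start: int, n: int = 35, motif_len: int = 6) -> bool:
    """True if the whole motif lies inside the first n residues (assumption)."""
    return start + motif_len <= n
```

For the invented sequence MSRRAFLKGA, find_motif with MOTIF_QILVMF reports a single hit, SRRAFL, starting at index 1, and within_first_n(1) is true; whether such a hit satisfies a given embodiment depends on which positional condition that embodiment recites.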

In other embodiments of this aspect, the host cell is a prokaryotic cell. In other embodiments, the host cell is a Streptomyces bacterial cell. In yet other embodiments, the host cell is an S. coelicolor or an S. lividans bacterial cell. In further embodiments of this aspect, the method of the invention produces a heterologous polypeptide that can be an enzyme, a growth factor or a hormone.

It is contemplated that any signal sequence that directs a nascent polypeptide into a secretory pathway may be used in the present invention. It is to be understood that any one of the newly identified signal sequences is encompassed by the invention. The signal peptides that are specifically contemplated by the instant invention include the Tat-dependent signal peptides. Specific examples include but are not limited to the Tat signal peptides listed in Tables 2, 3, and 6.

EXPERIMENTAL

The following preparations and examples are given to enable those skilled in the art to more clearly understand and practice the present invention. They should not be considered as limiting the scope and/or spirit of the invention, but merely as being illustrative and representative thereof.

In the experimental disclosure which follows, the following abbreviations apply: eq (equivalents); M (Molar); μM (micromolar); N (Normal); mol (moles); mmol (millimoles); μmol (micromoles); nmol (nanomoles); g (grams); mg (milligrams); kg (kilograms); μg (micrograms); L (liters); ml (milliliters); μl (microliters); cm (centimeters); mm (millimeters); μm (micrometers); nm (nanometers); °C (degrees Centigrade); h (hours); min (minutes); sec (seconds); msec (milliseconds); TLC (thin layer chromatography); TY, tryptone/yeast extract; Ap, ampicillin; DTT, dithiothreitol; Em, erythromycin; HPDM, high phosphate defined medium; IPG, immobilized pH gradient; IPTG, isopropyl-β-D-thiogalactopyranoside; Km, kanamycin; LPDM, low phosphate defined medium; MM, minimal medium; OD, optical density; PAGE, polyacrylamide gel electrophoresis; PCR, polymerase chain reaction; Sp, spectinomycin; SSM, Schaeffer's sporulation medium; 2D, two-dimensional.

Example 1

Strains and Growth Conditions

Derivatives of S. coelicolor strain M145 (Bentley et al., (2002) Nature 417:141-147; Kieser T. et al., (2000) Practical Streptomyces Genetics, The John Innes Foundation, Norfolk, UK) and S. lividans 1326 were used throughout the examples. Strains were cultured on standard laboratory growth media: minimal medium (Kieser et al., (2000) Practical Streptomyces Genetics, The John Innes Foundation, Norfolk, UK), complete medium (CM) (Hopwood DA (1967) Bacteriol Rev 31:373-403), mannitol soya flour (MS) (Hobbs et al., (1989) Appl Microbiol Biotechnol 31:272-277) or R5 (Bierman et al., (1992) Gene 116:43-49) for growth on plates, and tryptone soya broth (TSB) and yeast extract-malt extract (YEME) for liquid growth (Hopwood DA, supra). Antibiotics were used at a final concentration of 50 μg/ml unless otherwise stated.

Example 2

Construction of TAT mutants and gross phenotypic analysis of ΔtatC mutant S. coelicolor strains

To determine the impact of Tat-mediated protein export in S. coelicolor, marked mutations in each of the tatA, tatB and tatC genes and unmarked in-frame deletions of tatB and tatC were constructed.

S. coelicolor tat deletion strains were prepared as follows. Antibiotic-marked and unmarked in-frame deletions were constructed using the method of Datsenko and Wanner (Datsenko et al., (2000) Proc. Natl. Acad. Sci. USA 97:6640 - 6645) as modified by Gust et al. ((2003) Proc. Natl. Acad. Sci. USA 100:1541 - 1546). Cosmid 141, which carries the tatA and tatC genes, and cosmid P8, which carries the tatB gene, were mutated with apramycin replacement cassettes prepared for the tatA, tatB, and tatC genes by using the primers listed in Table 1. The mutated cosmids were transferred by mating into S. coelicolor M145, and single-crossover recombinants were selected on MS medium containing apramycin and kanamycin (Berks et al., (2003) Adv Microbiol Physiol 47:187-254). Double-crossover recombinants were subsequently selected by several rounds of growth on nonselective media followed by selection for colonies that were apramycin-resistant and kanamycin-sensitive. Loss of the tat genes in each strain was confirmed by PCR, and the strains were designated TP1 (ΔtatC::ApraR), TP2 (ΔtatA::ApraR), and TP3 (ΔtatB::ApraR). Markerless strains were constructed by the method of Gust et al. (2003, supra), loss of the apramycin-resistance cassette was confirmed by PCR, and the strains were designated TP4 (ΔtatC) and TP5 (ΔtatB).

The apramycin resistance cassette from pIJ773 (Gust et al., (2003) Proc. Natl. Acad. Sci. USA 100:1541 - 1546) was amplified using the tatA, tatB or tatC primers, which resulted in PCR fragments in which the final 39 bp at each end corresponded to the flanking regions of the target genes. The PCR products were transformed by electroporation into strain BW25113/pIJ790 harboring either cosmid 141, which carries the tatA and tatC genes, or P8, which carries the tatB gene, and transformants were selected for apramycin resistance. The mutated cosmids were extracted, transformed into E. coli strain ET12567 carrying helper plasmid pUZ8002 and mated into S. coelicolor M145. Double crossovers were selected by several rounds of growth on non-selective media followed by selection for colonies that were apramycin resistant and kanamycin sensitive (Gust et al. (2003) supra). The loss of the tat genes in each strain was confirmed by PCR and the strains were designated TP1 (ΔtatC::ApraR), TP2 (ΔtatA::ApraR) and TP3 (ΔtatB::ApraR). Marker-less strains were constructed by transforming either cosmid 141 harboring the ΔtatC::ApraR allele or cosmid P8 harboring the ΔtatB::ApraR allele into E. coli strain DH5α carrying plasmid pBT340, inducing expression of the FLP recombinase by growth at 42°C and subsequently testing for colonies that were apramycin sensitive. The cosmids carrying the unmarked alleles were then transformed into E. coli strain ET12567/pUZ8002 and mated into the appropriate S. coelicolor ApraR-marked tat deletion strain with selection for kanamycin resistance. Kanamycin resistant colonies were grown for several rounds without selection and colonies that were sensitive to both apramycin and kanamycin were identified. The loss of the apramycin resistance cassette was confirmed by PCR and the strains were designated TP4 (ΔtatC) and TP5 (ΔtatB). The gross phenotypes of all of the S. coelicolor tat mutants were similar to those observed for the tatB and tatC mutants of S. lividans (Schaerlaekens et al., (2001) J Bacteriol 183:6727-6732; Schaerlaekens et al., (2004) Microbiology 150:21-31; and Figure 1).
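The design of long targeting primers carrying 39-nucleotide extensions that match the regions flanking a target gene can be sketched as follows. This is a hypothetical illustration rather than the procedure actually used: the function names, coordinates and cassette priming sequences are placeholders supplied by the caller, and the sketch assumes that the homology arms are taken from immediately outside the region to be replaced. The primers actually used are those listed in Table 1.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def targeting_primers(region, gene_start, gene_end,
                      fwd_priming, rev_priming, homology=39):
    """Build a primer pair with flank-matching 5' extensions.

    region      -- DNA containing the target gene plus its flanks (top strand)
    gene_start  -- 0-based index of the first nucleotide to be replaced
    gene_end    -- 0-based index one past the last nucleotide to be replaced
    fwd_priming -- 3' part of the forward primer that anneals to the cassette
    rev_priming -- 3' part of the reverse primer that anneals to the cassette
    """
    if gene_start < homology or gene_end + homology > len(region):
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = region[gene_start - homology:gene_start]
    downstream = region[gene_end:gene_end + homology]
    forward = upstream + fwd_priming
    # The reverse primer reads the bottom strand, so its arm is reverse-complemented.
    reverse = reverse_complement(downstream) + rev_priming
    return forward, reverse
```

With this layout, each end of the amplified cassette carries 39 bp of sequence identical to one flank of the target gene, which is what allows the cassette to replace the gene by recombination in the cosmid.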

When cultured on solid MS, the S. coelicolor tat mutant strains formed very small colonies that appeared to hypersporulate (Figure 1A). Conversely, on the hyperosmotic solid R5 medium (Figure 1B), tat mutants failed to sporulate and produced very little of the blue-pigmented antibiotic actinorhodin. In TSB liquid medium, the tat mutants grew in a very dispersed manner compared with M145, which normally grows as pellets. Consistent with the pronounced developmental defect noted on R5 agar, tat mutant strains failed to grow in liquid YEME medium (which contains 34% sucrose) unless sucrose was excluded. S. coelicolor tat mutant mycelia were markedly more fragile than M145 mycelia and were particularly prone to lysis on shaking.

Therefore, to study the extracellular proteins of these fragile strains, it was necessary to grow the cells on solid media only and focus analysis on the exported proteins associated with the cell wall.

TABLE 1 Primer pairs used for generating tat mutants and agarase constructs

Example 3

Construction of reporter plasmids

The primer sequences used to assemble a reporter construct based on the promoter and structural gene for agarase, dagA, are listed in Table 1. Initially a PCR product (amplified with primers Agarase-F and AgaraseLeadnde) covering the dagA promoter was digested with HindIII and EcoRI and cloned into similarly digested pBluescriptII (Stratagene). A second PCR product, corresponding to the aadA streptomycin resistance gene of pIJ778 plus its promoter (amplified using primers AadAFnde and AadARbam with pIJ778 as the template), was digested with NdeI and BamHI and cloned into the above construct that had been pre-digested with the same enzymes. A third PCR fragment (amplified with primers AgaraseLEADBAM and AgaraseRxba) covering the region encoding the DagA mature protein sequence was digested with BamHI and XbaI and cloned into this to give plasmid pTDW45. Plasmid pTDW45 carries a fragment that corresponds to the dagA promoter and the region of dagA encoding the mature protein sequence, separated by the aadA gene and flanked by BglII sites. The BglII fragment from pTDW45 was cloned into pSET152 (Bierman et al., (1992) Gene 116:43 - 49) that had been previously digested with BamHI to give pTDW46. This shuttle vector can replicate in E. coli and be mated into S. coelicolor, where it integrates into the chromosome site-specifically. The aadA gene can be removed from this plasmid by digestion with NdeI and BamHI and replaced with DNA fragments encoding different signal peptides. The oligonucleotide primers, the signal peptides they amplify and the resulting derivatives of pTDW46 carrying these signal peptides are listed in Table 2. Plasmid pIJ6902dagA carries a C-terminally myc epitope-tagged derivative of agarase under control of the thiostrepton-inducible promoter PtipA. The agarase gene was amplified using primers kddagaf and kddagamyc (Table 1), digested with NdeI and BamHI and cloned into similarly digested pIJ6902 (Huang et al., (2005) Mol Microbiol 58:1276-1287). The pIJ6902dagA construct (PtipA-dagAmyc allele) was transferred to the φC31 site on the chromosome of M145 and TP4 as described in Bierman et al., supra.
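Because the signal peptide fragments are swapped in as NdeI-BamHI fragments, it can be useful to confirm that a candidate insert contains no internal copies of the sites used for cloning. The following is a minimal sketch under that assumption; the function names and the test sequences in the usage note are hypothetical, and only the standard recognition sequences of the six enzymes named in this example are included. BglII (AGATCT) and BamHI (GGATCC) leave the same GATC overhang, which is why a BglII fragment can be ligated into a BamHI-digested vector.

```python
# Standard recognition sequences for the enzymes named in this example.
# All six sites are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "HindIII": "AAGCTT",
    "EcoRI": "GAATTC",
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "BglII": "AGATCT",
}


def find_sites(dna: str, enzymes=RECOGNITION_SITES) -> dict:
    """Map each enzyme name to the 0-based positions of its site in dna."""
    dna = dna.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start)
            start = dna.find(site, start + 1)  # step by 1 to catch overlaps
        hits[name] = positions
    return hits


def has_internal_site(insert: str, enzyme_names) -> bool:
    """True if the insert contains a site for any of the named enzymes."""
    hits = find_sites(insert)
    return any(hits[name] for name in enzyme_names)
```

An insert for which has_internal_site(insert, ["NdeI", "BamHI"]) is true would be cut internally during preparation of the fragment, so the amplified region or the choice of enzymes would need to be reconsidered.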

TABLE 2 Primer pairs for amplifying Tat signal peptides

* The SCO number for each protein whose signal peptide was cloned in frame with mature agarase is given.

** The signal peptide coding sequence that was amplified for each secreted protein is shown; the arrow indicates the most likely position of signal peptide cleavage as determined by SignalP.

*** The plasmid designation of the final construct carrying the particular signal peptide-agarase fusion is given.

Example 4

Identification of Tat-dependent proteins by 2D-gel electrophoresis (2-DGE)

To identify proteins secreted via the TAT pathway, extracellular proteins derived from the S. coelicolor parent strain (M145) and the tatC mutant strain were analyzed by two-dimensional gel electrophoresis (2-DGE).

Proteins were prepared for 2D gel analysis or MudPIT by following the methods of Hesketh et al., (2002) Mol. Microbiol. 46:917 - 932; Yu et al., (2003) Anal. Chem. 75:6023 - 6028; and Perkins et al., (1999) Electrophoresis 20:3551 - 3567. S. coelicolor strains M145 or TP1 (ΔtatC) were grown by inoculating 10^6 spores onto sterile cellophane disks placed on the surface of complete medium (CM), R5 or mannitol soya (MS) media. After incubation for 48 hours at 30°C the biomass was scraped from the disks, dispersed in 5 M LiCl solution and left on ice for 30 minutes. The suspensions were vortexed for 2 - 3 minutes and the biomass removed by centrifugation (1800 g for 5 min) followed by passage through a 0.45 μm filter. Proteins were precipitated from the LiCl solution by addition of trichloroacetic acid to a final concentration of 20%, incubated on ice for 30 min and centrifuged at 1800 g for 15 min. After centrifugation, the mixture had settled into two phases. The upper phase was removed and discarded and water was added to the lower phase to adjust it back to the original volume. The solution turned cloudy so it was again centrifuged at full speed for 15 min, after which the precipitated protein formed a pellet. The pellet was washed three times with -20°C acetone and then air-dried.
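The volume of trichloroacetic acid (TCA) stock needed to reach a 20% final concentration follows from a simple mass balance, because the added stock also increases the total volume. The sketch below is illustrative only; the function name is hypothetical and the 100% (w/v) stock concentration used in the usage note is an assumption, since the source does not state which stock was used.

```python
def tca_volume_to_add(sample_ml: float, final_pct: float, stock_pct: float) -> float:
    """Volume of stock (ml) to add so the mixture reaches final_pct.

    Solves  stock_pct * v = final_pct * (sample_ml + v)  for v, i.e.
    v = sample_ml * final_pct / (stock_pct - final_pct).
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_ml * final_pct / (stock_pct - final_pct)
```

Under the assumed 100% (w/v) stock, tca_volume_to_add(10.0, 20.0, 100.0) gives 2.5, i.e. 2.5 ml of stock added to 10 ml of sample yields 12.5 ml at 20%.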

For 2D gel analysis, the protein samples were resuspended in IEF sample buffer (8 M urea, 0.5% CHAPS, 0.2% DTT, 0.5% IPG buffer pH 4 - 7 (Amersham Biosciences), 0.002% bromophenol blue) and protein concentration was determined using the Bio-Rad DC protein assay. The proteins in the samples were separated in the first dimension by their differing isoelectric points. Samples of 500 - 1000 μg of protein were loaded onto Amersham Biosciences 18 cm Immobiline DryStrip pH 4 - 7 or pH 6 - 11 isoelectric focusing gels and resolved by electrophoresis for 33 kVh using an Amersham Biosciences Ettan IPGphor isoelectric focusing unit. Focused strips were treated as described by Hesketh et al., (2002) supra and then separated in the second dimension by SDS-PAGE using Amersham Biosciences DALT 12.5% gels. The gels were stained using a colloidal Coomassie Blue stain and scanned with a ProXPRESS Proteomic Imaging System scanner for later comparison. Protein spots of interest were subsequently excised from the gel, digested with trypsin and identified by MALDI-TOF peptide mass fingerprint analysis as described previously (Hesketh et al., (2002) supra).
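Peptide mass fingerprinting compares the masses of the tryptic peptides observed for a spot with those predicted from candidate protein sequences. The first step of that prediction, an in silico trypsin digest, can be sketched as follows. This is a simplified illustration, not the software used in the work described: it applies the common rule that trypsin cleaves after K or R except when the next residue is P, ignores missed cleavages, and uses a hypothetical function name and an invented sequence in the usage note.

```python
def trypsin_digest(protein: str) -> list:
    """Split a protein into peptides, cleaving after K or R unless followed by P."""
    protein = protein.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides
```

For the invented sequence MKRPAGKLR the sketch yields the peptides MK, RPAGK and LR; the R at position 3 is not cleaved because it is followed by proline.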

Typical 2D gel electrophoretograms for the cell wall-associated proteome of these strains cultured on R5 medium are shown in Figure 2 and for MS and CM media in Figure 3. Significant differences in the staining patterns were observed between the two strains, and proteins that were present in M145 but absent from the tatC strain clearly represented candidates for Tat substrates. Putative Tat-targeted proteins were identified by MALDI-TOF mass spectrometry (after in-gel digestion with trypsin); several of them are marked in Figs. 2 and 3.

A total of 98 proteins that migrated with unique positions were identified in the M145 washes. In all, 37 of these proteins had identifiable N-terminal signal peptides, and 34 of those signal peptides contained RR dipeptides (Table 3). Cross-referencing with the predicted Tat substrates encoded by the S. coelicolor genome and identified by the programs TatP and TATFIND (Rose et al., (2002) Mol Microbiol 45:943-950; Bendtsen et al., (2005) BMC Bioinformatics 6:167-175) suggested that 21 of this group of 37 had plausible Tat-targeting signal peptides.
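The cross-referencing step described above amounts to intersecting the set of experimentally observed proteins with the sets of predicted Tat substrates. The sketch below is hypothetical: the function name is invented, the identifiers in the usage note are placeholders rather than real SCO numbers, and the sketch assumes that a protein counts as plausible if either prediction program reports it, which the text does not specify.

```python
def plausible_tat_substrates(observed, tatp_hits, tatfind_hits) -> list:
    """Observed proteins predicted as Tat substrates by at least one program.

    Assumption: a hit from either program is sufficient; requiring both
    would use '&' instead of '|' when combining the prediction sets.
    """
    predicted = set(tatp_hits) | set(tatfind_hits)
    return sorted(set(observed) & predicted)
```

For example, with placeholder identifiers, plausible_tat_substrates(["P1", "P2", "P3"], ["P2", "P9"], ["P3"]) returns ["P2", "P3"].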

In addition to the 37 putative exported proteins, the remaining 61 proteins unique to the tat+ strain included ribosomal subunit proteins and proteins of the thioredoxin pathway (Table 5). The identification of such obviously cytoplasmic proteins undoubtedly reflected contamination of the cell wall washes with cytoplasmic proteins from lysed bacteria. Cytoplasmic protein contamination in this type of analysis is almost inevitable given the S. coelicolor life cycle, which involves altruistic lysis of the mycelium to release nutrients for the continued growth of the aerial parts of the colony (Manteca et al., (2006) Res Microbiol 157:143-152). Moreover, such cytoplasmic protein contamination has been reported during analyses of extracellular proteins from B. subtilis (Antelmann et al., (2001) Genome Res 11:1483-1502) and Mycobacterium tuberculosis (Rosenkrands et al., (2000) Electrophoresis 21:935-948). Indeed, cell lysis, as well as substrate modification (see below) and up-regulation of Sec substrates, also may contribute to the presence of additional extracellular proteins in the ΔtatC strain that are absent in M145 (Table 4). Several of the exported proteins were detected as multiple spots in the M145 sample, possibly as a result of posttranslational modification or proteolysis, and it was not uncommon for one or more of the additional spots for a particular protein to be absent from cell wall fractions of the ΔtatC strain. Therefore, because several putative extracellular proteases are predicted TAT substrates (Dilks et al., (2003) J Bacteriol 185:1478-1483), the lack of a particular protein spot in the 2-DGE analysis of the ΔtatC strain might be indicative of the lack of postexport protein modification rather than a lack of TAT translocation per se.

TABLE 3

Exported proteins observed in the cell wall-associated fraction of the wild type (M145) strain identified by 2-DGE and/or MudPIT

Proteins are listed by SCO number and the signal peptide sequences are given in the right column. Twin-arginine dipeptides, where present, are shown in bold. Where multiple arginine dipeptides are present, only the most plausible one is marked. Method of observation indicates which technique was used to identify a particular protein (2D is two-dimensional gel electrophoresis; MudPIT is multidimensional protein identification technology).

There is some ambiguity about the assigned start codons for SCO3790 and SCO6580. For each of these proteins there is an alternative GTG start codon within the predicted signal peptide coding sequence. These are shown as valine residues that are highlighted in bold underline. Signal peptides initiating at these alternative start codons were tested in the agarase export assay (Fig. 5) and in each case were shown to mediate higher levels of agarase export than the full-length signal peptides.

* These signal peptides have been shown to mediate Tat-dependent export when fused to the mature sequence of S. lividans xylanase C (Li H, Jacques PE, Ghinet MG, Brzezinski R, Morosoli R (2005) Microbiology 151: 2189-2198).

† These proteins have also been identified in the extracellular fraction of S. coelicolor grown in liquid medium (Kim DW, Chater K, Lee KJ, Hesketh A (2005) J Bacteriol 187: 2957-2966).

# These proteins are only detected by TatP and TatFind when truncated in silico at the N-terminus to the underlined residue.

The annotation TatFind and/or TatP provided for SCO numbers indicates whether a particular protein has been identified as a putative Tat substrate by either of the two prediction programs TatFind 1.4 or TatP 1.0 (information regarding the programs can be found on the world wide web at signalfind.org and cbs.dtu.dk/services/TatP-1.0, respectively).

TABLE 4

Exported proteins observed in the cell wall-associated fraction of the tatC mutant strain identified by 2-DGE and/or MudPIT

The table lists proteins by SCO number and the signal peptide sequences are given in the right column. Method of observation indicates which technique was used to identify a particular protein (2D is two-dimensional gel electrophoresis; MudPIT is multidimensional protein identification technology). Bracketed after MudPIT is indicated 'tatC' if the protein was found in the cell-wall fraction of the tatC mutant strain only, and 'mutual' if it was found in the cell-wall fractions of both the M145 and the tatC mutant strains. The annotation TatFind and/or TatP provided for SCO numbers indicates whether a particular protein has been identified as a putative Tat substrate by either of the two prediction programs TatFind or TatP.

This protein migrated in a unique position on two-dimensional gel analysis of the extracellular fraction from the M145 strain. However, it was detected in the tatC mutant strain by MudPIT analysis.

† These proteins have also been identified in the extracellular fraction of S. coelicolor grown in liquid medium (Kim et al., (2005) J Bacteriol 187: 2957-2966).

Example 5

Identification of Tat-dependent proteins by Multidimensional Protein Identification Technology (MudPIT)

In parallel to the traditional 2-D gel electrophoresis approach, the cell wall proteome of the S. coelicolor strains was analyzed by multidimensional protein identification technology (MudPIT), a sensitive modern technique used to separate and identify even low-abundance proteins from complex mixtures. Samples of precipitated protein (1 mg), prepared as described above, were dissolved by addition of 100 μl of 100 mM Tris-HCl, pH 8.0; 200 μl of 0.4% RapiGest™ (Waters Ltd, Elstree, Herts, UK) in 20 mM Tris-HCl, pH 8.0; 10 μl of 40 mM EDTA in water; and 40 μl of 45 mM DTT in 100 mM Tris-HCl, pH 8.0. Once the pellets had dissolved, the samples were heated in a water bath at 90°C for 30 min and then cooled to room temperature. 40 μl of 100 mM iodoacetamide in 100 mM Tris-HCl, pH 8.0 was added and the sample was incubated in the dark for 15 min. 10 μg of trypsin (modified sequence grade porcine, Promega, UK) in 10 μl of 20 mM Tris-HCl, pH 8.0 was added to each sample, and the samples were then incubated at 37°C. After 16 h a further 10 μg of trypsin was added and the samples were incubated for a further 8 h. The RapiGest was denatured prior to MudPIT mass spectrometry by the addition of 40 μl of 500 mM HCl to give a concentration between 30 - 50 mM (pH < 2), followed by incubation at 37°C for 45 min. The cloudy samples containing hydrolyzed detergent were centrifuged at 13,000 rpm for 10 min in a Biofuge™ (Heraeus) and the supernatants were carefully removed for chromatographic separation.
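As a rough plausibility check on the acidification step, the nominal HCl concentration after adding 40 μl of 500 mM HCl can be estimated from the reagent volumes listed above. This is only a sketch: it assumes the volumes are simply additive, ignores the volume of the protein pellet, ignores neutralization by the Tris buffer, and assumes the second trypsin addition had the same 10 μl volume as the first (the text does not state it).

```python
# Nominal final HCl concentration after acidifying the tryptic digest.
# Volumes are in microlitres and are read from the protocol text.
# ASSUMPTIONS (not stated in the source): volumes are additive, the pellet
# volume is negligible, and the second trypsin addition is also 10 ul.
volumes_ul = {
    "Tris-HCl, 100 mM": 100,
    "RapiGest, 0.4%": 200,
    "EDTA, 40 mM": 10,
    "DTT, 45 mM": 40,
    "iodoacetamide, 100 mM": 40,
    "trypsin, first addition": 10,
    "trypsin, second addition (assumed volume)": 10,
}
hcl_stock_mm = 500
hcl_volume_ul = 40


def final_concentration_mm(stock_mm, added_ul, existing_ul):
    """Simple dilution: C_final = C_stock * V_added / (V_existing + V_added)."""
    return stock_mm * added_ul / (existing_ul + added_ul)


digest_volume_ul = sum(volumes_ul.values())
hcl_final_mm = final_concentration_mm(hcl_stock_mm, hcl_volume_ul, digest_volume_ul)
print(f"digest volume: {digest_volume_ul} ul, nominal HCl: {hcl_final_mm:.1f} mM")
```

Under these assumptions the estimate falls inside the 30 - 50 mM window quoted in the protocol.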

Samples were loaded onto a biphasic column comprising a strong cation exchange (SCX) phase and a reverse phase. Peptides were eluted stepwise from the SCX phase onto the reverse phase by using increasing concentrations of salt. A reverse-phase gradient was then generated and peptides were eluted into a Q-ToF2 mass spectrometer (Micromass, Manchester, U.K.). The data from each reverse-phase gradient were combined and searched by using MASCOT (Matrix Science, London, U.K.; Perkins et al., Electrophoresis 20:3551-3567 (1999)). The details for the analysis of the samples are as follows.

For chromatographic separation, a 75 μm PicoFrit capillary (New Objective, Inc.) was packed first with 90 mm of Symmetry C18 300 Å reverse-phase material (Waters, Ltd, UK) followed by 30 mm of PartiSphere strong cation exchange material (Whatman) using a pressure packing device. The resulting biphasic microcapillary column was equilibrated to 5% acetonitrile/0.1% formic acid. After loading the sample, the column was mounted on the Z-spray ion source of a Q-ToF2 mass spectrometer (Micromass, Manchester, UK) and in line with a capillary HPLC system (CapLC, Waters Ltd., UK). A fully automated 9-step chromatography run was carried out, with the mass spectrometer operating in data-dependent mode during each reverse-phase elution. The three buffer solutions used for chromatography were 0.1% formic acid (buffer A), 100% acetonitrile/0.1% formic acid (buffer B) and 500 mM ammonium acetate/0.1% formic acid (buffer C). Elution was performed using increasing concentrations of buffer C followed by a reverse-phase gradient. Buffer C concentrations were 0, 10, 25, 50, 75, 100, 200, 300 and 500 mM. The reverse-phase gradients consisted of the following profile: t = 0 min, 5% B; t = 80 min, 40% B; t = 90 min, 80% B.
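The 9-step run described above can be written down as a small schedule: nine salt steps, each followed by the same reverse-phase gradient. The sketch below is illustrative only. The function and variable names are hypothetical, and it assumes that %B changes linearly between the three time points given in the text, which the source does not state explicitly.

```python
# Salt steps (mM ammonium acetate) and reverse-phase profile from the text.
SALT_STEPS_MM = [0, 10, 25, 50, 75, 100, 200, 300, 500]
GRADIENT = [(0, 5.0), (80, 40.0), (90, 80.0)]  # (time in min, % buffer B)


def percent_b(t_min, profile=GRADIENT):
    """Return %B at time t_min.

    ASSUMPTION: linear interpolation between the listed time points;
    values are clamped outside the listed range.
    """
    if t_min <= profile[0][0]:
        return profile[0][1]
    for (t0, b0), (t1, b1) in zip(profile, profile[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return profile[-1][1]


def run_schedule():
    """One (step number, salt concentration) pair per chromatography step."""
    return list(enumerate(SALT_STEPS_MM, start=1))


for step, salt_mm in run_schedule():
    print(f"step {step}: {salt_mm} mM salt, then gradient "
          f"{percent_b(0)}% -> {percent_b(80)}% -> {percent_b(90)}% B")
```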

The eluted peptides were detected in MS mode in order to select precursor ions for collision-induced dissociation (MS/MS). Every 1.2 s the three most intense signals were subjected to MS/MS analysis. The resulting MS/MS data were deconvoluted using MaxEnt3 (Micromass) and converted, using ProteinLynx (Micromass), into a text file listing the mass, intensity and charge state of the parent ions and the mass and intensity of the associated fragment ions. These data were searched using the MASCOT search tool (Matrix Science Ltd, London) against SCODB (the S. coelicolor protein database) using appropriate parameters. The sole fixed modification was carboxyamidomethyl (Cys) and the only variable modification was oxidation (Met). The enzyme was selected as trypsin and the maximum number of missed cleavages was set as 3. Peptide Mw tolerance and MS/MS fragment ion tolerance were set as ± 0.25 Da.
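The two selection rules in this paragraph, picking the three most intense survey-scan signals for MS/MS and accepting masses within ± 0.25 Da, can be illustrated with a short sketch. It is not the instrument or MASCOT logic: the function names are hypothetical and the peak list is made-up example data.

```python
def select_precursors(survey_scan, top_n=3):
    """Return the top_n most intense (m/z, intensity) peaks of one survey scan."""
    return sorted(survey_scan, key=lambda peak: peak[1], reverse=True)[:top_n]


def within_tolerance(observed_da, theoretical_da, tol_da=0.25):
    """True if an observed mass lies within +/- tol_da of the theoretical mass."""
    return abs(observed_da - theoretical_da) <= tol_da


# Hypothetical survey scan: (m/z, intensity) pairs invented for illustration.
scan = [(421.76, 120.0), (523.28, 980.0), (611.83, 450.0),
        (702.35, 75.0), (845.41, 610.0)]
chosen = select_precursors(scan)
print(chosen)
print(within_tolerance(1000.20, 1000.00), within_tolerance(1000.30, 1000.00))
```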

The criteria for protein identification were based on the manufacturer's definitions (Matrix Science, Ltd.). Candidate peptides with probability-based Mowse scores exceeding threshold (p < 0.05) indicated a significant homology and were referred to as "hits". Protein scores were derived from peptide ion scores as a non-probabilistic basis for ranking proteins.

MudPIT analysis of the M145 and ΔtatC strains grown on CM medium identified 308 and 279 individual proteins from the cell wall washes, respectively, of which 188 were common to both samples (Tables 3-5). Of the 120 remaining proteins exclusively present in the M145 cell wall sample, 20 corresponded to proteins bearing plausible N-terminal signal peptides that contained two consecutive arginines in their sequences (including 11 that had been identified by 2-DGE analysis), strongly suggesting that this group represents the Tat substrates (Table 3).
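The protein counts in this paragraph are related by simple set arithmetic, which can be checked as follows. The last line pools the RR-containing candidates from 2-DGE (34) and MudPIT (20, of which 11 overlap). The result matches the 43 candidates quoted in the following Examples, but reading that figure as the union of the two lists is our interpretation rather than something the text states.

```python
# Counts quoted in the text for the CM-medium MudPIT experiment.
m145_total = 308   # proteins identified in M145 cell wall washes
tatc_total = 279   # proteins identified in tatC mutant cell wall washes
shared = 188       # proteins common to both samples

only_m145 = m145_total - shared                    # unique to M145
only_tatc = tatc_total - shared                    # unique to the tatC mutant
distinct_proteins = m145_total + tatc_total - shared

# ASSUMPTION: the candidate pool is the union of the two methods' lists.
rr_by_2dge = 34
rr_by_mudpit = 20
rr_by_both = 11
pooled_candidates = rr_by_2dge + rr_by_mudpit - rr_by_both

print(only_m145, only_tatc, distinct_proteins, pooled_candidates)
```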

TABLE 5

Cytoplasmic proteins observed in the extracellular fraction of S. coelicolor identified by 2-DGE and/or MudPIT

The table lists proteins by SCO number. Method of observation indicates which technique was used to identify a particular protein. 2D is two-dimensional gel electrophoresis; MudPIT is multidimensional protein identification technology. Bracketed after MudPIT is indicated 'tatC' if the protein was found in the cell-wall fraction of the tatC mutant strain only, 'M145' if the protein was found in the cell-wall fraction of the M145 strain only and 'mutual' if it was found in the cell-wall fractions of both the M145 and the tatC mutant strains.

† These proteins have also been identified in the extracellular fraction of S. coelicolor grown in liquid medium (Kim DW, Chater K, Lee KJ, Hesketh A (2005) J Bacteriol 187: 2957-2966).

Example 6 A Tat Transport Assay Based on Agarase

To identify bona-fide Tat-targeting signal peptides associated with the group of 43 proteins identified by the proteomic assays 2-DGE and MudPIT, a reporter-based assay for Tat transport was designed.

Sec-dependent signal peptides are not normally recognized by the Tat machinery, and Tat-targeted proteins usually are folded and, therefore, are not normally compatible with Sec-dependent export. Thus, a reporter-based assay for Tat transport was designed to address directly whether the group of 43 putative Tat substrates were indeed synthesized with bona fide Tat signal peptides.

Embedding of S. coelicolor colonies into agar is a phenomenon associated with secretion of the enzyme agarase, which degrades agar to smaller oligosaccharides. As the agar is broken down around the colony, the colony sinks down into the medium. Agarase is encoded by the dagA gene, and the protein product bears an N-terminal signal peptide containing an apparent twin-arginine motif, MVNRRDLIKWSAVALGAGAGLAGPAPAAHAIAD (SEQ ID NO: 313) (Figure 4A). To determine whether agarase activity could be used in a reporter assay to test the ability of a signal peptide to secrete a protein via the Tat pathway, Applicants tested whether agarase is a Tat-dependent substrate.

a) First, extracellular agarase activity of wild type S. coelicolor M145 was compared to that in a tatC mutant, TP1 (ΔtatC::ApraR).

S. coelicolor strains M145 and TP1 were grown on MM-C minimal medium for 5 days and were stained with Lugol solution.

It was seen that colonies of the S. coelicolor tat mutants failed to sink into solid media, suggesting that DagA is a Tat substrate. Subsequent staining of the agar-containing plates with Lugol, an iodine-based reagent, showed a clear halo corresponding to the degradation of agar around the wild type strain (M145), while no zone of clearing was observed around colonies of the tat mutant strains (Fig. 4B).

These data provide strong visual evidence that the tat mutants have no significant extracellular agarase activity.

b) Second, to ensure that agarase was expressed in the tat mutants, the dagA gene was placed under control of the tipA thiostrepton-inducible promoter and incorporated onto the chromosome in a single copy at the φC31 attachment site. After integration of the construct into S. coelicolor strains M145 (wild type) and TP4 (ΔtatC), each then harboring pIJ6902dagA in a single copy, thiostrepton was added to induce expression of agarase. Transcription of dagA in S. coelicolor is highly regulated and reported to be repressed by glucose in the growth medium (Servín-González et al., (1994) Microbiology 140:2555-2565). The bacteria were grown on minimal medium containing 0.5% glucose, apramycin and thiostrepton for 3 days before being stained with Lugol solution. As shown in Fig. 4C, agarase activity was produced in the wild type strain M145, while no active agarase was secreted in the TP4 (ΔtatC) background.

These data demonstrate that S. coelicolor DagA is a Tat-dependent extracellular protein.

c) Third, to exploit agarase as a screen for Tat-dependent protein export, the ability of the DagA mature protein to be targeted via alternative export routes was tested.

The TAT targeting signal peptide of DagA was swapped for the Sec-targeting signal peptides of three S. coelicolor proteins: SCO3053, SCO5660, and SCO6199, and the agarase activity determined as described above.

No extracellular agarase activity could be detected when the mature portion of the enzyme was targeted to the Sec machinery (Fig. 5; see also Fig. 6), most likely as a result of folding of agarase before export.

Taken together, these data show that DagA can be used as a reporter exclusively for TAT-mediated protein secretion.
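For readers who want to locate the twin-arginine dipeptide in a signal peptide such as the DagA sequence quoted in this Example (SEQ ID NO: 313), a minimal sketch is given below. It only searches for consecutive arginines; it is not a reimplementation of TatP or TATFIND, and the function name is hypothetical.

```python
# DagA signal peptide as quoted in the text (SEQ ID NO: 313).
DAGA_SIGNAL_PEPTIDE = "MVNRRDLIKWSAVALGAGAGLAGPAPAAHAIAD"


def find_twin_arginines(seq):
    """Return 1-based positions of the first R of every 'RR' dipeptide."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "RR"]


positions = find_twin_arginines(DAGA_SIGNAL_PEPTIDE)
print(len(DAGA_SIGNAL_PEPTIDE), positions)
```

Overlapping dipeptides are reported separately, so a run of three arginines yields two positions. Choosing "the most plausible" dipeptide, as done for Table 3, would need additional criteria that this sketch does not attempt.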

Example 7 Identification of bona fide Tat-exported proteins and signal peptides using the Tat Transport Assay Based on Agarase.

The two proteomic techniques above identified a total of 43 putative proteins that could have been transported by the Tat pathway (Table 3). To identify whether these proteins contained bona fide Tat-targeting signal peptides, the reporter-transport assay based on agarase activity was used. Fusions of the signal peptides of each of the putative Tat-targeted proteins to mature agarase were constructed, expressed in a non-agarase-producing host strain (Streptomyces lividans 1326) and assessed for the production of agarase activity using the Lugol-based plate test. Tat dependency of Streptomyces coelicolor agarase was demonstrated by growing strains M145 and TP4 harboring either pIJ6902 or pIJ6902dagA on minimal medium containing glucose, which strongly represses expression of the native agarase. Approximately 10^4 spores of each strain were streaked or spotted onto minimal medium containing glucose, additionally supplemented with apramycin and thiostrepton (to induce expression of the myc-tagged agarase), grown at 37°C for 72 hours and then stained with Lugol solution (Sigma) for 45 min.

Plasmids encoding signal peptide-agarase fusion proteins (listed in Table 2) were mated into the S. lividans tat+ strains 1326 and 10-164 (as 1326, msiK-), and the 10-164 isogenic tatC mutant (Faury et al., (2004) Biochim Biophys Acta 1699:155-162), none of which possesses native agarase. Plates containing mature spores of the resultant strains, along with S. coelicolor M145 and TP4, were used to inoculate, with a dissecting needle, MM-C minimal medium (10 g agar, 1 g (NH4)2SO4, 0.5 g K2HPO4, 0.2 g FeSO4·7H2O in 1 L) lacking a carbon source other than agar. The inoculated plates were grown for 5 days at 30°C and then stained with Lugol solution (Sigma) for 45 min before photography on a light box.

Relative activities were estimated by determining the diameter of the zones of clearing using the measure tool of the image manipulation program GIMP (open source software distributed under a GNU general public license at <www.gimp.org>). One sample was included in every batch of results as a standard to which the diameters of the zones of clearing were adjusted. Each zone of clearing was measured twice, with the two measurements at right angles to each other. The mean of the two measurements was taken and used to calculate R^2 (R = radius) as an estimate of the concentration of agarase. All measurements were then expressed as a percentage of the activity of pTDW47, which is the reporter pTDW46 with the native agarase signal peptide reintroduced. Nine measurements were made for each data point and the standard error of the mean was calculated for each point.

(a) Bona fide signal peptides:

In all, 25 of the 43 signal peptides clearly directed Tat-dependent secretion of DagA, consistent with their being bona fide Tat signal peptides (Fig. 5 and Table 6; a representative sample of these was also tested in a S. lividans tatC strain (Faury et al., supra), further demonstrating their Tat dependence; Fig. 6). Of these 25, TATFIND predicted 20 and TatP predicted 18 to be Tat signal peptides. Both prediction programs failed to identify two of the Tat-active signal peptides (SCO2286 and SCO3790) because of their unusual length; however, in silico N-terminal truncation of these signal peptides restored the ability of both programs to recognize them as Tat-targeting signal peptides. Moreover, it is clear when comparing the results from the agarase assays (Figs. 5 and 6) that secretion efficiency varies significantly with signal peptide primary structure. Thus, comparisons of heterologous protein translocation by using only a single Tat or Sec signal peptide, as recently performed for hTNFalpha and hIL10 in S. lividans (Schaerlaekens et al., (2004) J Biotechnol 112:279-288), may not reflect the true potential of each pathway for protein transport.

The remaining 18 signal peptides of the 43 that contained consecutive arginine residues could not mediate secretion of active agarase, and of these, TATFIND predicted 0 and TatP predicted only 1 to be a genuine Tat-targeting signal peptide (Table 3). These 18 signal peptides therefore were considered not to be Tat-targeting signals, and the corresponding passenger proteins are presumably not detected in the cell wall fraction of the ΔtatC strain because of pleiotropic effects.

Of the exported proteins identified in cell wall analyses of the ΔtatC strain (Table 4), two (SCO0736 and SCO7677) were predicted by both the TATFIND and TatP programs to contain Tat-targeting signal peptides. These signal peptides were subjected to the agarase test, and both conferred extracellular agarase activity to the S. lividans host strain, bringing the total of verified Tat-targeting signal peptides to 27 (Fig. 5). The apparent presence of these final two proteins in the ΔtatC strain cell wall might have been due to minor contamination of washes because of cell lysis.

Finally, rare examples of Tat-dependent signal peptides lacking consecutive arginine residues have been reported (e.g., Ignatova et al., (2002) 291:146-149). A total of seven exported proteins were identified in the cell wall fraction of M145 that did not contain obvious twin-arginine motifs in their signal peptides (Table 3). The agarase test was applied to six of these seven signal peptides, but none was found to direct export of active agarase.
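The semiquantitative read-out used in this Example (two perpendicular diameters per zone of clearing, their mean converted to R^2, and the result expressed as a percentage of the pTDW47 standard, with a standard error over replicate measurements) can be sketched as below. The function names and the example diameters are hypothetical. The sketch also assumes that each replicate is converted to a percentage before averaging and omits the batch-to-batch adjustment against the included standard, neither of which the text specifies in detail.

```python
from math import sqrt
from statistics import mean, stdev


def agarase_estimate(d1, d2):
    """R^2 computed from two perpendicular diameters of one zone of clearing."""
    radius = mean([d1, d2]) / 2
    return radius ** 2


def relative_activity(sample_pairs, standard_pairs):
    """Percent activity relative to the standard and its standard error.

    ASSUMPTION: each replicate is expressed as a percentage of the mean
    standard value first, then the mean and SEM are taken over replicates.
    """
    standard = mean(agarase_estimate(*pair) for pair in standard_pairs)
    values = [100 * agarase_estimate(*pair) / standard for pair in sample_pairs]
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return mean(values), sem


# Hypothetical diameters (arbitrary image units), nine replicates each.
standard_zones = [(20, 20)] * 9
sample_zones = [(10, 10)] * 9
print(relative_activity(sample_zones, standard_zones))
```

Because the estimate scales with R^2, halving the zone diameter corresponds to roughly one quarter of the standard activity in this scheme.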

(b) TAT-exported proteins:

Taken together, Applicants have unambiguously identified 27 Tat-dependent proteins in S. coelicolor. This number represents 30% of all of the exported proteins that we detected in the cell wall fraction of the tat+ strain and clearly demonstrates that the Tat pathway is a major protein translocation route. The agarase reporter assay developed here and used to validate possible Tat-targeting signal peptides is particularly powerful because it is not only facile and rapid but also semiquantitative and, therefore, provides a ready measure of transport efficiency for a particular Tat-targeting signal peptide. It is also anticipated that the agarase reporter system will facilitate the exploitation of the Tat pathway for heterologous protein production.

The identified Tat-exported proteins listed in Table 6 represent a broad spectrum of functional classes: several are predicted to be involved in phosphate and carbohydrate metabolism, nutrient transport, and lipid metabolism. However, unlike in the well characterized E. coli system, only 3 of the 27 Tat substrates detected here are likely to be cofactor-containing and, remarkably, 2 of these (SCO6272 and SCO6281) are associated with a type I modular polyketide synthase gene cluster, indicating a role in secondary metabolism, and therefore would not be expected a priori to be exported. Substrates of the twin-arginine translocation pathway have hitherto not been found associated with a secondary metabolite gene cluster. Other notable Tat substrates identified here include two proteins involved in peptidoglycan metabolism, SCO0736 and SCO1172, the latter being a probable cell wall amidase. Considering the connectivity between nonexport of amidases and the integrity of the cell envelope in E. coli tat mutants (Ize et al., (2003) Mol Microbiol 48:1183-1193), the identification of SCO1172 as a Tat substrate may well account for the fragility of the Streptomyces tat mutants. Moreover, as with the E. coli model, it may perhaps be possible to "rescue" the fragile phenotype of the Streptomyces tat mutants by increased expression of a Sec-dependent amidase. This would likely enable a further proteomic analysis of the "Tat secretome" in this organism and the assignment of Tat-targeted proteins not associated with the cell wall.

A major surprise was that 4 of the 27 Tat substrates are annotated as lipoproteins (Bendtsen et al., (2005) BMC Bioinformatics 6:167-175) (Table 6). Again, these in vivo data demonstrate the Tat dependence of putative lipoproteins and imply that class I and class II cellular signal peptidases can recognize and cleave Tat-targeting signal peptides. Interestingly, an outer membrane-localized dimethyl sulfoxide reductase, a Tat substrate that is also strongly predicted to be a lipoprotein, has been described in Shewanella oneidensis (Gralnick et al., (2006) Proc Natl Acad Sci USA 103:4669-4674).

Overall, these data demonstrate that the Tat pathway can export a diverse range of proteins and further reinforce the contention that the Tat pathway is a truly general protein export system in this organism. The ability of the Streptomyces Tat pathway to export proteins requiring various ante- and possibly posttranslocation modifications, as well as two of the largest single Tat substrates ever reported (SCO6198 and SCO6457, at 116 and 146 kDa, respectively), underscores the potential of the Tat system in Streptomycetes for heterologous protein production.
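The tallies reported in this Example can be cross-checked with a few lines of arithmetic. The "fraction recovered" values below are simply the share of the 25 assay-positive signal peptides that each program flagged before any in silico truncation. They are our derived figures, not values stated in the text, and the variable names are illustrative.

```python
# Counts quoted in Example 7.
candidates_with_rr = 43          # RR-containing signal peptides tested
active_in_assay = 25             # directed Tat-dependent agarase secretion
extra_verified = 2               # SCO0736 and SCO7677 from the tatC table

inactive_in_assay = candidates_with_rr - active_in_assay
verified_total = active_in_assay + extra_verified

# Predictions among the 25 assay-positive signal peptides.
predicted_among_active = {"TATFIND": 20, "TatP": 18}
fraction_recovered = {
    program: count / active_in_assay
    for program, count in predicted_among_active.items()
}

print(inactive_in_assay, verified_total, fraction_recovered)
```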

TABLE 6 Tat-dependent proteins identified in S. coelicolor

* Cofactor containing. † Putative lipoproteins.




 