ESSENTIAL BACTERIAL GENES AND THEIR USE

Title:

ESSENTIAL BACTERIAL GENES AND THEIR USE

Document Type and Number:

WIPO Patent Application WO/1999/033871

Kind Code:

A2

Abstract:

Disclosed are 23 genes, termed 'GEP' genes, found in Streptococcus pneumoniae, which are located within operons that are essential for survival. Also disclosed is a related essential gene found in Bacillus subtilis. These genes and the polypeptides that they encode, as well as homologs thereof, can be used to identify antibacterial agents for treating bacterial infections such as streptococcal pneumonia.

Inventors:

YOUNGMAN PHILIP
FRITZ CHRISTIAN
MURPHY CHRISTOPHER
GUZMAN LUZ-MARIA

Application Number:

PCT/US1998/027918

Publication Date:

July 08, 1999

Filing Date:

December 30, 1998

Assignee:

MILLENNIUM PHARM INC (US)

International Classes:

C07K14/315; G01N33/50; C07K14/32; C07K16/12; C12N1/19; C12N1/21; C12N15/09; C12P21/08; C12Q1/68; G01N33/15; (IPC1-7): C07K14/00

Domestic Patent References:

WO1997042210A1	1997-11-13
WO1998018931A2	1998-05-07

Other References:

ZHANG Y -B ET AL: "Analysis of a Streptococcus pneumoniae gene encoding signal peptidase I and overproduction of the enzyme" GENE: AN INTERNATIONAL JOURNAL ON GENES AND GENOMES, vol. 194, no. 2, 31 July 1997 (1997-07-31), page 249-255 XP004093351 ISSN: 0378-1119
DANIEL R A ET AL: "A COMPLEX FOUR-GENE OPERON CONTAINING ESSENTIAL CELL DIVISION GENE PBPB IN BACILLUS SUBTILIS" JOURNAL OF BACTERIOLOGY, vol. 178, no. 8, 1 April 1996 (1996-04-01), pages 2343-2350, XP002070525 ISSN: 0021-9193
GUIDOLIN A ET AL: "NUCLEOTIDE SEQUENCE ANALYSIS OF GENES ESSENTIAL FOR CAPSULAR POLYSACCHARIDE BIOSYNTHESIS IN STREPTOCOCCUS PNEUMONIAE TYPE 19F" INFECTION AND IMMUNITY, vol. 62, no. 12, 1 December 1994 (1994-12-01), pages 5384-5396, XP002015452 ISSN: 0019-9567

Attorney, Agent or Firm:

Fasse, Peter J. (MA, US)
Ellison, Eldora L. (N.W. Washington, DC, US)

Claims:

What is claimed is:

1.

An isolated operon comprising a nucleotide sequence, or an allelic variant or homolog of the nucleotide sequence, encoding: a gep103 polypeptide comprising the amino acid sequence of SEQ ID NO: 1, as depicted in Fig. 1; a gep1119 polypeptide comprising the amino acid sequence of SEQ ID NO: 4, as depicted in Fig. 2; a gep1122 polypeptide comprising the amino acid sequence of SEQ ID NO: 7, as depicted in Fig. 3; a gep1315 polypeptide comprising the amino acid sequence of SEQ ID NO: 10, as depicted in Fig. 4; a gep1493 polypeptide comprising the amino acid sequence of SEQ ID NO: 13, as depicted in Fig. 5; a gep1507 polypeptide comprising the amino acid sequence of SEQ ID NO: 16, as depicted in Fig. 6; a gep1511 polypeptide comprising the amino acid sequence of SEQ ID NO: 19, as depicted in Fig. 7; a gep1518 polypeptide comprising the amino acid sequence of SEQ ID NO: 22, as depicted in Fig. 8; a gep1546 polypeptide comprising the amino acid sequence of SEQ ID NO: 25, as depicted in Fig. 9; a gep1551 polypeptide comprising the amino acid sequence of SEQ ID NO: 28, as depicted in Fig. 10; a gep1561 polypeptide comprising the amino acid sequence of SEQ ID NO: 31, as depicted in Fig. 11; a gep1580 polypeptide comprising the amino acid sequence of SEQ ID NO: 34, as depicted in Fig. 12; a gep1713 polypeptide comprising the amino acid sequence of SEQ ID NO: 37, as depicted in Fig. 13; a gep222 polypeptide comprising the amino acid sequence of SEQ ID NO: 40, as depicted in Fig. 14; a gep2283 polypeptide comprising the amino acid sequence of SEQ ID NO: 43, as depicted in Fig. 15; a gep273 polypeptide comprising the amino acid sequence of SEQ ID NO: 46, as depicted in Fig. 16; a gep286 polypeptide comprising the amino acid sequence of SEQ ID NO: 49, as depicted in Fig. 17; a gep311 polypeptide comprising the amino acid sequence of SEQ ID NO: 52, as depicted in Fig. 18; a gep3262 polypeptide comprising the amino acid sequence of SEQ ID NO: 55, as depicted in Fig. 19; a gep3387 polypeptide comprising the amino acid sequence of SEQ ID NO: 58, as depicted in Fig. 20; a gep47 polypeptide comprising the amino acid sequence of SEQ ID NO: 61, as depicted in Fig. 21; a gep61 polypeptide comprising the amino acid sequence of SEQ ID NO: 64, as depicted in Fig. 22; or a gep76 polypeptide comprising the amino acid sequence of SEQ ID NO: 67, as depicted in Fig. 23.

2.

An isolated nucleic acid molecule comprising a nucleic acid sequence selected from the group consisting of:
(1) an operon comprising the sequence of SEQ ID NO: 2, as depicted in Fig. 1, or degenerate variants thereof; (2) an operon comprising the sequence of SEQ ID NO: 2, or degenerate variants thereof, wherein T is replaced by U; (3) nucleic acids complementary to (1) and (2); (4) fragments of (1), (2), and (3) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 1;
(5) an operon comprising the sequence of SEQ ID NO: 5, as depicted in Fig. 2, or degenerate variants thereof; (6) an operon comprising the sequence of SEQ ID NO: 5, or degenerate variants thereof, wherein T is replaced by U; (7) nucleic acids complementary to (5) and (6); (8) fragments of (5), (6), and (7) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 4;
(9) an operon comprising the sequence of SEQ ID NO: 8, as depicted in Fig. 3, or degenerate variants thereof; (10) an operon comprising the sequence of SEQ ID NO: 8, or degenerate variants thereof, wherein T is replaced by U; (11) nucleic acids complementary to (9) and (10); (12) fragments of (9), (10), and (11) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 7;
(13) an operon comprising the sequence of SEQ ID NO: 11, as depicted in Fig. 4, or degenerate variants thereof; (14) an operon comprising the sequence of SEQ ID NO: 11, or degenerate variants thereof, wherein T is replaced by U; (15) nucleic acids complementary to (13) and (14); (16) fragments of (13), (14), and (15) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 10;
(17) an operon comprising the sequence of SEQ ID NO: 14, as depicted in Fig. 5, or degenerate variants thereof; (18) an operon comprising the sequence of SEQ ID NO: 14, or degenerate variants thereof, wherein T is replaced by U; (19) nucleic acids complementary to (17) and (18); (20) fragments of (17), (18), and (19) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 13;
(21) an operon comprising the sequence of SEQ ID NO: 17, as depicted in Fig. 6, or degenerate variants thereof; (22) an operon comprising the sequence of SEQ ID NO: 17, or degenerate variants thereof, wherein T is replaced by U; (23) nucleic acids complementary to (21) and (22); (24) fragments of (21), (22), and (23) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 16;
(25) an operon comprising the sequence of SEQ ID NO: 20, as depicted in Fig. 7, or degenerate variants thereof; (26) an operon comprising the sequence of SEQ ID NO: 20, or degenerate variants thereof, wherein T is replaced by U; (27) nucleic acids complementary to (25) and (26); (28) fragments of (25), (26), and (27) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 19;
(29) an operon comprising the sequence of SEQ ID NO: 23, as depicted in Fig. 8, or degenerate variants thereof; (30) an operon comprising the sequence of SEQ ID NO: 23, or degenerate variants thereof, wherein T is replaced by U; (31) nucleic acids complementary to (29) and (30); (32) fragments of (29), (30), and (31) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 22;
(33) an operon comprising the sequence of SEQ ID NO: 26, as depicted in Fig. 9, or degenerate variants thereof; (34) an operon comprising the sequence of SEQ ID NO: 26, or degenerate variants thereof, wherein T is replaced by U; (35) nucleic acids complementary to (33) and (34); (36) fragments of (33), (34), and (35) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 25;
(37) an operon comprising the sequence of SEQ ID NO: 29, as depicted in Fig. 10, or degenerate variants thereof; (38) an operon comprising the sequence of SEQ ID NO: 29, or degenerate variants thereof, wherein T is replaced by U; (39) nucleic acids complementary to (37) and (38); (40) fragments of (37), (38), and (39) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 28;
(41) an operon comprising the sequence of SEQ ID NO: 32, as depicted in Fig. 11, or degenerate variants thereof; (42) an operon comprising the sequence of SEQ ID NO: 32, or degenerate variants thereof, wherein T is replaced by U; (43) nucleic acids complementary to (41) and (42); (44) fragments of (41), (42), and (43) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 31;
(45) an operon comprising the sequence of SEQ ID NO: 35, as depicted in Fig. 12, or degenerate variants thereof; (46) an operon comprising the sequence of SEQ ID NO: 35, or degenerate variants thereof, wherein T is replaced by U; (47) nucleic acids complementary to (45) and (46); (48) fragments of (45), (46), and (47) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 34;
(49) an operon comprising the sequence of SEQ ID NO: 38, as depicted in Fig. 13, or degenerate variants thereof; (50) an operon comprising the sequence of SEQ ID NO: 38, or degenerate variants thereof, wherein T is replaced by U; (51) nucleic acids complementary to (49) and (50); (52) fragments of (49), (50), and (51) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 37;
(53) an operon comprising the sequence of SEQ ID NO: 41, as depicted in Fig. 14, or degenerate variants thereof; (54) an operon comprising the sequence of SEQ ID NO: 41, or degenerate variants thereof, wherein T is replaced by U; (55) nucleic acids complementary to (53) and (54); (56) fragments of (53), (54), and (55) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 40;
(57) an operon comprising the sequence of SEQ ID NO: 44, as depicted in Fig. 15, or degenerate variants thereof; (58) an operon comprising the sequence of SEQ ID NO: 44, or degenerate variants thereof, wherein T is replaced by U; (59) nucleic acids complementary to (57) and (58); (60) fragments of (57), (58), and (59) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 43;
(61) an operon comprising the sequence of SEQ ID NO: 47, as depicted in Fig. 16, or degenerate variants thereof; (62) an operon comprising the sequence of SEQ ID NO: 47, or degenerate variants thereof, wherein T is replaced by U; (63) nucleic acids complementary to (61) and (62); (64) fragments of (61), (62), and (63) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 46;
(65) an operon comprising the sequence of SEQ ID NO: 50, as depicted in Fig. 17, or degenerate variants thereof; (66) an operon comprising the sequence of SEQ ID NO: 50, or degenerate variants thereof, wherein T is replaced by U; (67) nucleic acids complementary to (65) and (66); (68) fragments of (65), (66), and (67) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 49;
(69) an operon comprising the sequence of SEQ ID NO: 53, as depicted in Fig. 18, or degenerate variants thereof; (70) an operon comprising the sequence of SEQ ID NO: 53, or degenerate variants thereof, wherein T is replaced by U; (71) nucleic acids complementary to (69) and (70); (72) fragments of (69), (70), and (71) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 52;
(73) an operon comprising the sequence of SEQ ID NO: 56, as depicted in Fig. 19, or degenerate variants thereof; (74) an operon comprising the sequence of SEQ ID NO: 56, or degenerate variants thereof, wherein T is replaced by U; (75) nucleic acids complementary to (73) and (74); (76) fragments of (73), (74), and (75) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 55;
(77) an operon comprising the sequence of SEQ ID NO: 59, as depicted in Fig. 20, or degenerate variants thereof; (78) an operon comprising the sequence of SEQ ID NO: 59, or degenerate variants thereof, wherein T is replaced by U; (79) nucleic acids complementary to (77) and (78); (80) fragments of (77), (78), and (79) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 58;
(81) an operon comprising the sequence of SEQ ID NO: 62, as depicted in Fig. 21, or degenerate variants thereof; (82) an operon comprising the sequence of SEQ ID NO: 62, or degenerate variants thereof, wherein T is replaced by U; (83) nucleic acids complementary to (81) and (82); (84) fragments of (81), (82), and (83) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 61;
(85) an operon comprising the sequence of SEQ ID NO: 65, as depicted in Fig. 22, or degenerate variants thereof; (86) an operon comprising the sequence of SEQ ID NO: 65, or degenerate variants thereof, wherein T is replaced by U; (87) nucleic acids complementary to (85) and (86); (88) fragments of (85), (86), and (87) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 64;
(89) an operon comprising the sequence of SEQ ID NO: 68, as depicted in Fig. 23, or degenerate variants thereof; (90) an operon comprising the sequence of SEQ ID NO: 68, or degenerate variants thereof, wherein T is replaced by U; (91) nucleic acids complementary to (89) and (90); and (92) fragments of (89), (90), and (91) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 67.

3.

An isolated operon from Streptococcus comprising a nucleotide sequence that is at least 85% identical to a nucleotide sequence selected from the group consisting of SEQ ID NO: 2; SEQ ID NO: 5; SEQ ID NO: 8; SEQ ID NO: 11; SEQ ID NO: 14; SEQ ID NO: 17; SEQ ID NO: 20; SEQ ID NO: 23; SEQ ID NO: 26; SEQ ID NO: 29; SEQ ID NO: 32; SEQ ID NO: 35; SEQ ID NO: 38; SEQ ID NO: 41; SEQ ID NO: 44; SEQ ID NO: 47; SEQ ID NO: 50; SEQ ID NO: 53; SEQ ID NO: 56; SEQ ID NO: 59; SEQ ID NO: 62; SEQ ID NO: 65; and SEQ ID NO: 68.

5.

An isolated nucleic acid molecule that is at least 15 base pairs in length and hybridizes under stringent conditions to a nucleotide sequence selected from the group consisting of SEQ ID NO: 2; SEQ ID NO: 5; SEQ ID NO: 8; SEQ ID NO: 11; SEQ ID NO: 14; SEQ ID NO: 17; SEQ ID NO: 20; SEQ ID NO: 23; SEQ ID NO: 26; SEQ ID NO: 29; SEQ ID NO: 32; SEQ ID NO: 35; SEQ ID NO: 38; SEQ ID NO: 41; SEQ ID NO: 44; SEQ ID NO: 47; SEQ ID NO: 50; SEQ ID NO: 53; SEQ ID NO: 56; SEQ ID NO: 59; SEQ ID NO: 62; SEQ ID NO: 65; and SEQ ID NO: 68.

6.	A vector comprising an operon of claim 1.

7.	A vector comprising a nucleic acid molecule of claim 2.

8.	An expression vector comprising an operon of claim 1 operably linked to a nucleotide sequence regulatory element that controls expression of said operon.

9.	An expression vector comprising a nucleic acid molecule of claim 2, wherein said nucleic acid molecule is operably linked to a nucleotide sequence regulatory element that controls expression of said nucleic acid.

10.	A host cell comprising an exogenously introduced operon of claim 1.

11.	A host cell comprising an exogenously introduced nucleic acid molecule of claim 2.

12.	A host cell of claim 9, wherein the cell is a yeast or bacterium.

13.	A host cell of claim 10, wherein the cell is a yeast or bacterium.

14.	A genetically engineered host cell comprising an operon of claim 1 operably linked to a heterologous nucleotide sequence regulatory element that controls expression of the operon in the host cell.

15.	A host cell of claim 13, wherein the cell is a yeast or bacterium.

16.	A genetically engineered host cell comprising a nucleic acid molecule of claim 2 operably linked to a nucleotide sequence regulatory element that controls expression of the nucleic acid in the host cell.

17.	A host cell of claim 15, wherein the cell is a yeast or bacterium.

18.

An isolated operon comprising a nucleotide sequence encoding a polypeptide comprising an amino acid sequence selected from the group consisting of : the amino acid sequence of SEQ ID NO: 1, as depicted in Fig. 1; the amino acid sequence of SEQ ID NO: 4, as depicted in Fig. 2; the amino acid sequence of SEQ ID NO: 7, as depicted in Fig. 3; the amino acid sequence of SEQ ID NO: 10, as depicted in Fig. 4; the amino acid sequence of SEQ ID NO: 13, as depicted in Fig. 5; the amino acid sequence of SEQ ID NO: 16, as depicted in Fig. 6; the amino acid sequence of SEQ ID NO: 19, as depicted in Fig. 7; the amino acid sequence of SEQ ID NO: 22, as depicted in Fig. 8; the amino acid sequence of SEQ ID NO: 25, as depicted in Fig. 9; the amino acid sequence of SEQ ID NO: 28, as depicted in Fig. 10; the amino acid sequence of SEQ ID NO: 31, as depicted in Fig. 11; the amino acid sequence of SEQ ID NO: 34, as depicted in Fig. 12; the amino acid sequence of SEQ ID NO: 37, as depicted in Fig. 13; the amino acid sequence of SEQ ID NO: 40, as depicted in Fig. 14; the amino acid sequence of SEQ ID NO: 43, as depicted in Fig. 15; the amino acid sequence of SEQ ID NO: 46, as depicted in Fig. 16; the amino acid sequence of SEQ ID NO: 49, as depicted in Fig. 17; the amino acid sequence of SEQ ID NO: 52, as depicted in Fig. 18; the amino acid sequence of SEQ ID NO: 55, as depicted in Fig. 19; the amino acid sequence of SEQ ID NO: 58, as depicted in Fig. 20; the amino acid sequence of SEQ ID NO: 61, as depicted in Fig. 21; the amino acid sequence of SEQ ID NO : 64, as depicted in Fig. 22; and the amino acid sequence of SEQ ID NO : 67, as depicted in Fig. 23.

19.	An isolated polypeptide encoded by a nucleic acid located within an operon comprising a nucleic acid sequence selected from the group consisting of SEQ ID NO: 44, 47, 50, 53, 56, 59, 62, 65, and 68.

20.	An isolated polypeptide, said polypeptide being encoded by an operon of claim 1.

21.	An isolated polypeptide, said polypeptide being encoded by a nucleic acid molecule of claim 2.

22.	An isolated polypeptide, said polypeptide being encoded by an operon of claim 3.

23.

A method for identifying an antibacterial agent, the method comprising: (a) contacting a test compound with a polypeptide, or a homolog of a polypeptide, encoded by a nucleic acid sequence located within an operon comprising a GEP gene selected from the group consisting of gep103, gep1119, gep1122, gep1315, gep1493, gep1507, gep1511, gep1518, gep1546, gep1551, gep1561, gep1580, gep1713, gep222, gep2283, gep273, gep286, gep311, gep3262, gep3387, gep47, gep61, and gep76; and (b) detecting binding of the test compound to the polypeptide, wherein binding indicates that the test compound is an antibacterial agent.

24.

The method of claim 22, further comprising: (c) determining whether a test compound that binds to the polypeptide inhibits growth of bacteria, relative to growth of bacteria cultured in the absence of a test compound that binds to the polypeptide, wherein inhibition of growth indicates that the test compound is an antibacterial agent.

25.	The method of claim 22, wherein the polypeptide is selected from the group consisting of gep103, gep1119, gep1122, gep1315, gep1493, gep1507, gep1511, gep1518, gep1546, gep1551, gep1561, gep1580, gep1713, gep222, gep2283, gep273, gep286, gep311, gep3262, gep3387, gep47, gep61, and gep76.

26.	The method of claim 22, wherein the test compound is immobilized on a substrate, and binding of the test compound to the polypeptide is detected as immobilization of the polypeptide on the immobilized test compound.

27.	The method of claim 25, wherein immobilization of the polypeptide on the test compound is detected in an immunoassay with an antibody that specifically binds to the polypeptide.

28.	The method of claim 22, wherein the test compound is selected from the group consisting of polypeptides and small molecules.

29.

The method of claim 22, wherein: (a) the polypeptide is provided as a fusion protein comprising the polypeptide fused to (i) a transcription activation domain of a transcription factor or (ii) a DNA-binding domain of a transcription factor; and (b) the test compound is a polypeptide that is provided as a fusion protein comprising the test polypeptide fused to (i) a transcription activation domain of a transcription factor or (ii) a DNA-binding domain of a transcription factor, to interact with the fusion protein; and (c) binding of the test compound to the polypeptide is detected as reconstitution of a transcription factor.

30.	An antibody that specifically binds to a GEP polypeptide of claim 19.

31.	An antibody of claim 29, wherein the antibody is a monoclonal antibody.

32.

A method for identifying an antibacterial agent, the method comprising: (a) contacting a polypeptide encoded by a nucleic acid located within an operon comprising a GEP gene with a test compound; (b) detecting a decrease in function of the polypeptide contacted with the test compound; and (c) determining whether a test compound that decreases function of a contacted polypeptide inhibits growth of bacteria, relative to growth of bacteria cultured in the absence of a test compound that decreases function of a contacted polypeptide, wherein inhibition of growth indicates that the test compound is an antibacterial agent.

33.	The method of claim 31, wherein the polypeptide is selected from the group consisting of gep103, gep1119, gep1122, gep1315, gep1493, gep1507, gep1511, gep1518, gep1546, gep1551, gep1561, gep1580, gep1713, gep222, gep2283, gep273, gep286, gep311, gep3262, gep3387, gep47, gep61, and gep76.

34.	The method of claim 31, wherein the test compound is selected from the group consisting of polypeptides and small molecules.

35.

A method for identifying an antibacterial agent, the method comprising: (a) contacting a nucleic acid comprising an operon containing a gene encoding a GEP polypeptide with a test compound, wherein the GEP polypeptide is selected from the group consisting of gep103, gep1119, gep1122, gep1315, gep1493, gep1507, gep1511, gep1518, gep1546, gep1551, gep1561, gep1580, gep1713, gep222, gep2283, gep273, gep286, gep311, gep3262, gep3387, gep47, gep61, and gep76; and (b) detecting binding of the test compound to the nucleic acid, wherein binding indicates that the test compound is an antibacterial agent.

36.

The method of claim 34, further comprising: (c) determining whether a test compound that binds to the nucleic acid inhibits growth of bacteria, relative to growth of bacteria cultured in the absence of the test compound that binds to the nucleic acid, wherein inhibition of growth indicates that the test compound is an antibacterial agent.

37.	The method of claim 34, wherein the test compound is selected from the group consisting of polypeptides and small molecules.

38.

An isolated nucleic acid or an allelic variant thereof encoding: a gep1493 polypeptide comprising the amino acid sequence of SEQ ID NO: 13, as depicted in Fig. 5; a gep1507 polypeptide comprising the amino acid sequence of SEQ ID NO: 16, as depicted in Fig. 6; a gep1546 polypeptide comprising the amino acid sequence of SEQ ID NO: 25, as depicted in Fig. 9; a gep273 polypeptide comprising the amino acid sequence of SEQ ID NO: 46, as depicted in Fig. 16; a gep286 polypeptide comprising the amino acid sequence of SEQ ID NO: 49, as depicted in Fig. 17; or a gep76 polypeptide comprising the amino acid sequence of SEQ ID NO: 67, as depicted in Fig. 23.

39.

An isolated nucleic acid comprising a sequence selected from the group consisting of: (1) SEQ ID NO: 14, as depicted in Fig. 5, or degenerate variants thereof; (2) SEQ ID NO: 14, or degenerate variants thereof, wherein T is replaced by U; (3) nucleic acids complementary to (1) and (2); (4) fragments of (1), (2), and (3) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 13; (5) SEQ ID NO: 17, as depicted in Fig. 6, or degenerate variants thereof; (6) SEQ ID NO: 17, or degenerate variants thereof, wherein T is replaced by U; (7) nucleic acids complementary to (5) and (6); (8) fragments of (5), (6), and (7) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 16; (9) SEQ ID NO: 26, as depicted in Fig. 9, or degenerate variants thereof; (10) SEQ ID NO: 26, or degenerate variants thereof, wherein T is replaced by U; (11) nucleic acids complementary to (9) and (10); (12) fragments of (9), (10), and (11) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 25; (13) SEQ ID NO: 47, as depicted in Fig. 16, or degenerate variants thereof; (14) SEQ ID NO: 47, or degenerate variants thereof, wherein T is replaced by U; (15) nucleic acids complementary to (13) and (14); (16) fragments of (13), (14), and (15) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 46; (17) SEQ ID NO: 50, as depicted in Fig. 17, or degenerate variants thereof; (18) SEQ ID NO: 50, or degenerate variants thereof, wherein T is replaced by U; (19) nucleic acids complementary to (17) and (18); (20) fragments of (17), (18), and (19) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 49; (21) SEQ ID NO: 68, as depicted in Fig. 23, or degenerate variants thereof; (22) SEQ ID NO: 68, or degenerate variants thereof, wherein T is replaced by U; (23) nucleic acids complementary to (21) and (22); and (24) fragments of (21), (22), and (23) that are at least 15 base pairs in length and which hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 67.

40.

A method for identifying an antibacterial agent, the method comprising: (a) contacting a test compound with a polypeptide, or a homolog of a polypeptide, encoded by a nucleic acid sequence located within an operon comprising a ByneS gene; and (b) detecting binding of the test compound to the polypeptide, wherein binding indicates that the test compound is an antibacterial agent.

41.

The method of claim 39, further comprising: (c) determining whether a test compound that binds to the polypeptide inhibits growth of bacteria, relative to growth of bacteria cultured in the absence of a test compound that binds to the polypeptide, wherein inhibition of growth indicates that the test compound is an antibacterial agent.

Description:

ESSENTIAL BACTERIAL GENES AND THEIR USE

Background of the Invention

The invention relates to essential bacterial genes and their use in identifying antibacterial agents.

Bacterial infections may be cutaneous, subcutaneous, or systemic. Opportunistic bacterial infections proliferate, especially in patients afflicted with AIDS or other diseases that compromise the immune system. The bacterium Streptococcus pneumoniae typically infects the respiratory tract and can cause lobar pneumonia, as well as meningitis, sinusitis, and other infections.

Summary of the Invention

The invention is based on the discovery of 23 genes in the bacterium Streptococcus pneumoniae, and a related gene in the bacterium Bacillus subtilis, that are located within operons that are essential for survival. These 23 Streptococcus genes are referred to herein as "GEP genes" (which stands for general essential protein); for convenience, the polypeptides encoded by these genes are referred to herein as "GEP polypeptides." Each GEP gene is located within an operon that contains a gene that is essential for survival of Streptococcus pneumoniae; the essential gene can be the GEP gene or another gene located within the same operon. Bacterial operons contain several genes that are related, e.g., with respect to function or biochemical pathway. Transcription of an operon leads to the production of a single transcript in which multiple coding regions are linked. Thus, an operon containing one or more essential genes can be considered an "essential operon," since disruption of expression of one gene located within the operon will interfere with expression of the other genes in the operon. Each coding region of the transcript is separately translated into an individual polypeptide by ribosomes that initiate translation at multiple points along the transcript. Having identified one gene in the operon, one can readily identify and sequence the other genes located within the operon.
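The "essential operon" reasoning above can be illustrated with a minimal Python sketch. It is offered only as an illustration, not as a method disclosed herein: the class name, function names, operon name, and gene names are all hypothetical, and the model assumes, as stated above, that disrupting any one gene interferes with expression of the other genes in the same operon.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Operon:
    """A hypothetical operon: a name plus its genes in transcription order."""
    name: str
    genes: Tuple[str, ...]


def is_essential_operon(operon: Operon, essential_genes: Set[str]) -> bool:
    """An operon is treated as essential if at least one of its genes is essential."""
    return any(gene in essential_genes for gene in operon.genes)


def disruption_is_lethal(operon: Operon, disrupted_gene: str,
                         essential_genes: Set[str]) -> bool:
    """Predict lethality of disrupting one gene under the stated assumption.

    Assumption (from the text): disrupting one gene interferes with the
    other genes of the operon, so the outcome depends on the whole operon,
    not only on the disrupted gene.
    """
    if disrupted_gene not in operon.genes:
        raise ValueError(f"{disrupted_gene} is not part of operon {operon.name}")
    return is_essential_operon(operon, essential_genes)


# Hypothetical example: the GEP-like gene itself is not essential,
# but a neighbouring gene in the same operon is.
example = Operon(name="opX", genes=("geneA", "gepExample", "geneC"))
essential = {"geneC"}
print(disruption_is_lethal(example, "gepExample", essential))
```

In this toy model the gene "gepExample" is reported as a lethal disruption target even though only "geneC" is essential, which mirrors the statement that the essential gene can be the GEP gene or another gene in the same operon.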

The genes encoding the GEP polypeptides are useful molecular tools for identifying similar genes in pathogenic microorganisms, such as pathogenic strains of Bacillus. In addition, the operons containing genes encoding GEP polypeptides, and the polypeptides encoded by such operons, are useful targets for identifying compounds that are inhibitors of the pathogens in which the GEP polypeptides are expressed. Such inhibitors inhibit bacterial growth by being bacteriostatic (e.g., inhibiting reproduction or cell division) or by being bacteriocidal (i.e., by causing cell death).

The invention, therefore, features an isolated polypeptide encoded by a nucleic acid located within an operon encoding a GEP polypeptide, termed gep103, having the amino acid sequence set forth in SEQ ID NO: 1, or conservative variations thereof. An isolated operon comprising a nucleic acid encoding gep103 also is included within the invention. In addition, the invention includes an isolated nucleic acid of (a) an operon comprising the sequence of SEQ ID NO: 2, as depicted in Fig. 1, or degenerate variants thereof; (b) an operon comprising the sequence of SEQ ID NO: 2, or degenerate variants thereof, wherein T is replaced by U; (c) nucleic acids complementary to (a) and (b); and (d) fragments of (a), (b), and (c) that are at least 15 base pairs in length and that hybridize under stringent conditions to genomic DNA encoding the polypeptide of SEQ ID NO: 1.

As described above for gep103, other nucleic acids and polypeptides encoded by nucleic acids located within operons encoding GEP polypeptides are included within the invention, including: (a) operons comprising the nucleic acids represented by the SEQ ID NOs. listed below, as depicted in the Figures listed below, or degenerate variants thereof; (b) operons comprising the nucleic acids represented by the SEQ ID NOs. listed below, wherein T is replaced by U; (c) nucleic acids complementary to (a) and (b); and (d) fragments of (a), (b), and (c) that are at least 15 base pairs in length and that hybridize under stringent conditions to genomic DNA encoding the polypeptides represented by the SEQ ID NOs. listed below.
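The sequence-level relationships in items (b) through (d) can be sketched in Python as follows. This is an illustrative sketch only: the toy sequence is invented and is not any SEQ ID NO of this application, the function names are hypothetical, and the code merely enumerates candidate fragments by length. It does not model degenerate variants or hybridization under stringent conditions, which would have to be established experimentally.

```python
# Complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_rna(dna: str) -> str:
    """Return the sequence with every T replaced by U, as in item (b)."""
    return dna.upper().replace("T", "U")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3', as in item (c)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def fragments(dna: str, min_length: int = 15):
    """Yield every contiguous fragment of at least min_length bases, as in item (d)."""
    seq = dna.upper()
    for start in range(len(seq) - min_length + 1):
        for end in range(start + min_length, len(seq) + 1):
            yield seq[start:end]


# Invented 20-base toy sequence, used only to exercise the functions.
toy = "ATGGCTAGCTAGGATCCGTA"
print(to_rna(toy))
print(reverse_complement(toy))
print(sum(1 for _ in fragments(toy)))
```

For a real operon sequence the number of fragments grows quadratically with length, so in practice one would filter candidates rather than enumerate them all.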

Table 1: GEP nucleic acids and polypeptides

GEP Nucleic Acid    Figure    SEQ ID No. of        SEQ ID No. of the     SEQ ID No. of the
or Polypeptide      No.       the Amino Acid       Coding Strand of      Non-coding Strand of
                              Sequence             the Nucleic Acid      the Nucleic Acid
                                                   Sequence              Sequence
gep103              1         1                    2                     3
gep1119             2         4                    5                     6
gep1122             3         7                    8                     9
gep1315             4         10                   11                    12
gep1493             5         13                   14                    15
gep1507             6         16                   17                    18
gep1511             7         19                   20                    21
gep1518             8         22                   23                    24
gep1546             9         25                   26                    27
gep1551             10        28                   29                    30
gep1561             11        31                   32                    33
gep1580             12        34                   35                    36
gep1713             13        37                   38                    39
gep222              14        40                   41                    42
gep2283             15        43                   44                    45
gep273              16        46                   47                    48
gep286              17        49                   50                    51
gep311              18        52                   53                    54
gep3262             19        55                   56                    57
gep3387             20        58                   59                    60
gep47               21        61                   62                    63
gep61               22        64                   65                    66
gep76               23        67                   68                    69

The invention also includes allelic variants (i.e., genes encoding isozymes) of the genes located within operons encoding the GEP polypeptides listed above.

For example, the invention includes a gene that encodes a GEP polypeptide but which gene includes one or more point mutations, deletions, promoter variants, or splice site variants, provided that the resulting GEP polypeptide functions as a GEP polypeptide (e.g., as determined in a conventional complementation assay).

Identification of these GEP genes and the determination that they are located within operons containing an essential gene allows homologs of the GEP genes to be found in other strains of Streptococcus. Also, orthologs of these genes can be identified in other species (e.g., Bacillus sp.). While "homologs" are structurally similar genes contained within a species, "orthologs" are functionally equivalent genes from other species (within or outside of a given genus, e.g., from Bacillus subtilis or E. coli). Such homologs and orthologs are expected to be located within operons that are essential for survival. Such homologous and orthologous genes and polypeptides can be used to identify compounds that inhibit the growth of the host organism (e.g., compounds that are bactericidal or bacteriostatic against pathogenic strains of the organism).

Homologous and orthologous genes and polypeptides that are essential for survival can serve as targets for identifying a broad spectrum of antibacterial agents.

An ortholog of gep1493, termed B-yneS, has been identified in B. subtilis and is essential for survival of B. subtilis. The amino acid sequence (SEQ ID NO: 70), coding sequence (SEQ ID NO: 71), and non-coding sequence (SEQ ID NO: 72) of B-yneS are set forth in Fig. 24. As with the other polypeptides and genes disclosed herein, the B-yneS polypeptide and gene can be used in the methods described herein to identify antibacterial agents.

The term gep103 polypeptide or gene as used herein is intended to include the polypeptide and gene set forth in Fig. 1 herein, as well as homologs of the sequences set forth in Fig. 1. Also encompassed by the term gep103 gene are degenerate variants of the nucleic acid sequence set forth in Fig. 1 (SEQ ID NO: 2).

Degenerate variants of a nucleic acid sequence exist because of the degeneracy of the genetic code; thus, those sequences that vary from the sequence represented by SEQ ID NO: 2, but which nonetheless encode a gep103 polypeptide, are included within the invention. Likewise, because of the similarity in the structures of amino acids, conservative variations (as described herein) can be made in the amino acid sequence of the gep103 polypeptide while retaining the function of the polypeptide (e.g., as determined in a conventional complementation assay). Other gep103 polypeptides and genes identified in additional Streptococcus strains may be such conservative variations or degenerate variants of the particular gep103 polypeptide and nucleic acid set forth in Fig. 1 (SEQ ID NOs: 1 and 2, respectively). The gep103 polypeptide and gene share at least 80%, e.g., 90%, sequence identity with SEQ ID NOs: 1 and 2, respectively. Regardless of the percent sequence identity between the gep103 sequence and the sequence represented by SEQ ID NOs: 1 and 2, the gep103 genes and polypeptides encompassed by the invention are able to complement for the lack of gep103 function (e.g., in a temperature-sensitive mutant) in a standard complementation assay. Additional gep103 genes that are identified and cloned from additional Streptococcus strains, and pathogenic strains in particular, can be used to produce gep103 polypeptides for use in the various methods described herein, e.g., for identifying antibacterial agents. Likewise, the terms gep1119, gep1122, gep1315, gep1493, gep1507, gep1511, gep1518, gep1546, gep1551, gep1561, gep1580, gep1713, gep222, gep2283, gep273, gep286, gep311, gep3262, gep3387, gep47, gep61, and gep76 encompass homologs, conservative variations, and degenerate variants of the sequences depicted in Figs. 2-23, respectively. Such homologs, conservative variations, and degenerate variants also are included within the invention.
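The notion of a degenerate variant can be illustrated with a minimal Python sketch. The two nucleotide sequences below are made up for illustration and are not taken from any SEQ ID NO; the codon table is deliberately partial and covers only the codons used here. The sketch simply checks that two different coding sequences translate to the same peptide.

```python
# Partial standard genetic code (DNA codons); only the codons used below.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def is_degenerate_variant(reference: str, candidate: str) -> bool:
    """True if the candidate differs in nucleotides but encodes the same peptide."""
    return reference != candidate and translate(reference) == translate(candidate)


# Hypothetical sequences, chosen only to show synonymous codon changes.
reference = "ATGGCTCTTAAATAA"  # Met-Ala-Leu-Lys-stop
variant = "ATGGCCTTGAAGTGA"    # same peptide, different codons

print(translate(reference), translate(variant))
print(is_degenerate_variant(reference, variant))
```

A complete implementation would use the full 64-codon table; the partial table keeps the example short.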

Since the various GEP genes described herein have been identified and shown to be located within operons that are essential for survival, the GEP genes and polypeptides encoded by nucleic acid sequences located within operons containing GEP genes and their homologs and orthologs can be used to identify antibacterial agents. More specifically, the polypeptides encoded by nucleic acid sequences located within operons containing GEP genes can be used, separately or together, in assays to identify test compounds that bind to these polypeptides. Such test compounds are expected to be antibacterial agents, in contrast to compounds that do not bind to these GEP polypeptides. As described herein, any of a variety of art-known methods can be used to assay for binding of test compounds to the polypeptides. The invention includes, for example, a method for identifying an antibacterial agent where the method entails: (a) contacting a polypeptide encoded by a nucleic acid sequence located within an operon containing a GEP gene, or homolog or ortholog thereof, with a test compound; (b) detecting binding of the test compound to the polypeptide or homolog or ortholog; and (c) determining whether a test compound that binds to the polypeptide or homolog or ortholog inhibits growth of bacteria, relative to growth of bacteria cultured in the absence of the test compound that binds to the polypeptide or homolog or ortholog, as an indication that the test compound is an antibacterial agent.
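The decision logic of steps (a) through (c) can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields, the compound names, the growth readings, and the 50% inhibition threshold are all hypothetical and are not specified in this description.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    compound: str
    binds_polypeptide: bool         # outcome of step (b), the binding assay
    growth_with_compound: float     # step (c): e.g., optical density with compound
    growth_without_compound: float  # step (c): control culture without compound


def is_candidate_antibacterial(result: ScreenResult, min_inhibition: float = 0.5) -> bool:
    """A compound qualifies only if it binds AND inhibits growth relative to control.

    min_inhibition is an assumed cutoff chosen for illustration.
    """
    if not result.binds_polypeptide:
        return False
    if result.growth_without_compound <= 0:
        raise ValueError("control culture must show growth")
    inhibition = 1.0 - result.growth_with_compound / result.growth_without_compound
    return inhibition >= min_inhibition


# Hypothetical screening data.
results = [
    ScreenResult("compound_A", True, 0.10, 0.80),   # binds, strong inhibition
    ScreenResult("compound_B", True, 0.78, 0.80),   # binds, no real inhibition
    ScreenResult("compound_C", False, 0.05, 0.80),  # does not bind the polypeptide
]
hits = [r.compound for r in results if is_candidate_antibacterial(r)]
print(hits)
```

Keeping the binding check ahead of the growth check mirrors the order of the method: only compounds that bind in step (b) are evaluated for growth inhibition in step (c).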

In various embodiments, the GEP polypeptide is derived from a non-pathogenic or pathogenic Streptococcus strain, such as Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus agalactiae, Streptococcus endocarditis, Streptococcus faecium, Streptococcus sanguis, Streptococcus viridans, and Streptococcus hemolyticus. Suitable orthologs of the Streptococcus GEP genes can be derived from the bacterium Bacillus subtilis. The test compound can be immobilized on a substrate, and binding of the test compound to the polypeptide or homolog or ortholog can be detected as immobilization of the polypeptide or homolog or ortholog on the immobilized test compound, e.g., in an immunoassay with an antibody that specifically binds to the polypeptide.

If desired, the test compound can be a test polypeptide (e.g., a polypeptide having a random or predetermined amino acid sequence; or a naturally-occurring or synthetic polypeptide). Alternatively, the test compound can be a nucleic acid, such as a DNA or RNA molecule. In addition, small organic molecules can be tested. The test compound can be a naturally-occurring compound or it can be synthetically produced, if desired. Synthetic libraries, chemical libraries, and the like can be screened to identify compounds that bind to the polypeptides. More generally, binding of test compounds to the polypeptide or homolog or ortholog can be detected either in vitro or in vivo. Regardless of the source of the test compound, the polypeptides described herein can be used to identify compounds that are bactericidal or bacteriostatic to a variety of pathogenic or non-pathogenic strains.

In an exemplary method, binding of a test compound to a polypeptide encoded by a nucleic acid located within an operon containing a GEP gene can be detected in a conventional two-hybrid system for detecting protein/protein interactions (e.g., in yeast or mammalian cells). Generally, in such a method, (a) the polypeptide encoded by a nucleic acid located within an operon containing a GEP gene is provided as a fusion protein that includes the polypeptide fused to (i) a transcription activation domain of a transcription factor or (ii) a DNA-binding domain of a transcription factor; (b) the test polypeptide is provided as a fusion protein that includes the test polypeptide fused to (i) a transcription activation domain of a transcription factor or (ii) a DNA-binding domain of a transcription factor; and (c) binding of the test polypeptide to the polypeptide is detected as reconstitution of a transcription factor. Homologs and orthologs of the GEP polypeptides can be used in similar methods. Reconstitution of the transcription factor can be detected, for example, by detecting transcription of a gene that is operably linked to a DNA sequence bound by the DNA-binding domain of the reconstituted transcription factor (see, for example, White, 1996, Proc. Natl. Acad. Sci. 93: 10001-10003 and references cited therein, and Vidal et al., 1996, Proc. Natl. Acad. Sci. 93: 10315-10320).

In an alternative method, an isolated operon containing a nucleic acid molecule encoding a GEP polypeptide is used to identify a compound that decreases the expression of a GEP polypeptide in vivo. Such compounds can be used as antibacterial agents. To discover such compounds, cells that express a GEP polypeptide are cultured and exposed to a test compound (or a mixture of test compounds), and the level of expression or activity is compared with the level of GEP polypeptide expression or activity in cells that are otherwise identical but that have not been exposed to the test compound(s). Many standard quantitative assays of gene expression can be utilized in this aspect of the invention.

To identify compounds that modulate expression of a GEP polypeptide (or homologous or orthologous sequence), the test compound(s) can be added at varying concentrations to the culture medium of cells that express a GEP polypeptide (or homolog or ortholog), as described herein. Such test compounds can include small molecules (typically, non-protein, non-polysaccharide chemical entities), polypeptides, and nucleic acids. The expression of the GEP polypeptide is then measured, for example, by Northern blot, PCR analysis, or RNase protection analyses using a nucleic acid molecule of the invention as a probe. The level of expression in the presence of the test molecule, compared with the level of expression in its absence, will indicate whether or not the test molecule alters the expression of the GEP polypeptide. Because the GEP polypeptides are expressed from operons that are essential for survival, test compounds that inhibit the expression and/or function of the GEP polypeptide will inhibit growth of the cells or kill the cells.
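The comparison of expression levels in the presence and absence of a test compound can be sketched as a simple ratio, computed at each tested concentration. The following Python example is a minimal illustration only: the signal values are hypothetical band intensities in arbitrary units, and the function names are not part of this description.

```python
from statistics import mean


def relative_expression(treated, untreated):
    """Ratio of mean signal in treated cells to mean signal in untreated cells."""
    baseline = mean(untreated)
    if baseline <= 0:
        raise ValueError("untreated signal must be positive")
    return mean(treated) / baseline


def dose_response(signals_by_concentration, untreated):
    """Map each test-compound concentration to its relative expression level."""
    return {
        concentration: relative_expression(signals, untreated)
        for concentration, signals in sorted(signals_by_concentration.items())
    }


# Hypothetical replicate readings (arbitrary units) from quantified blots.
untreated = [100.0, 96.0, 104.0]
treated = {
    1.0: [90.0, 95.0, 85.0],
    10.0: [48.0, 52.0, 50.0],
    100.0: [9.0, 11.0, 10.0],
}

profile = dose_response(treated, untreated)
for concentration, ratio in profile.items():
    print(f"{concentration:>6} (assumed units): relative expression {ratio:.2f}")
```

A ratio well below 1 at increasing concentrations would suggest that the test compound decreases expression; what counts as a meaningful change depends on the assay and is not defined here.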

Compounds that modulate the expression of the polypeptides of the invention can be identified by carrying out the assays described herein and then measuring the levels of the GEP polypeptides expressed in the cells, e. g., by performing a Western blot analysis using antibodies that bind to a GEP polypeptide.

The invention further features methods of identifying from a large group of mutants those strains that have conditional lethal mutations. In general, the gene and corresponding gene product are subsequently identified, although the strains themselves can be used in screening or diagnostic assays. The mechanism(s) of action for the identified genes and gene products provide a rational basis for the design of antibacterial therapeutic agents. These antibacterial agents reduce the action of the gene product in a wild type strain, and therefore are useful in treating a subject with that type, or a similarly susceptible type, of infection by administering the agent to the subject in a pharmaceutically effective amount.

Reduction in the action of the gene product includes competitive inhibition of the gene product for the active site of an enzyme or receptor; non-competitive inhibition; disrupting an intracellular cascade path which requires the gene product; binding to the gene product itself, before or after post-translational processing; and acting as a gene product mimetic, thereby down-regulating the activity.

Therapeutic agents include monoclonal antibodies raised against the gene product.

Furthermore, the presence of the gene sequence in certain cells (e. g., a pathogenic bacterium of the same genus or similar species), and the absence or divergence of the sequence in host cells can be determined, if desired. Therapeutic agents directed toward genes or gene products that are not present in the host have several advantages, including fewer side effects, and lower overall dosage.

The invention includes pharmaceutical formulations that include a pharmaceutically acceptable excipient and an antibacterial agent identified using the methods described herein. In particular, the invention includes pharmaceutical formulations that contain antibacterial agents that inhibit the growth of, or kill, pathogenic Streptococcus strains. Such pharmaceutical formulations can be used for treating a Streptococcus infection in an organism. Such a method entails administering to the organism a therapeutically effective amount of the pharmaceutical formulation. In particular, such pharmaceutical formulations can be used to treat streptococcal pneumonia in mammals such as humans and domesticated mammals (e.g., cows, pigs, dogs, and cats), and in plants. The efficacy of such antibacterial agents in humans can be estimated in an animal model system well known to those of skill in the art (e.g., mouse and rabbit model systems).

Also included within the invention are polyclonal and monoclonal antibodies that specifically bind to the various GEP polypeptides described herein (e. g., gep103). Such antibodies can facilitate detection of GEP polypeptides in various Streptococcus strains. These antibodies also are useful for detecting binding of a test compound to GEP polypeptides (e. g., using the assays described herein). In addition, monoclonal antibodies that bind to GEP polypeptides are themselves adequate antibacterial agents when administered to a mammal, as such monoclonal antibodies are expected to impede one or more functions of GEP polypeptides.

As used herein, "nucleic acids" encompass both RNA and DNA, including genomic DNA and synthetic (e.g., chemically synthesized) DNA. The nucleic acid can be double-stranded or single-stranded. Where single-stranded, the nucleic acid may be a sense strand or an antisense strand. The nucleic acid may be synthesized using oligonucleotide analogs or derivatives (e.g., inosine or phosphorothioate nucleotides). Such oligonucleotides can be used, for example, to prepare nucleic acids that have altered base-pairing abilities or increased resistance to nucleases.
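The relationship between a sense strand, its complementary (antisense) strand, and the corresponding RNA in which T is replaced by U can be shown with a short Python sketch. The sequence used is a made-up example and does not correspond to any SEQ ID NO.

```python
# Watson-Crick base pairing for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Return the RNA version of a DNA sequence (T replaced by U)."""
    return dna.upper().replace("T", "U")


# Hypothetical coding-strand fragment.
sense = "ATGGCTCTTAAA"
antisense = reverse_complement(sense)

print(antisense)
print(to_rna(sense))
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check when working with coding and non-coding strands.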

An "isolated nucleic acid" is a DNA or RNA that is not immediately contiguous with both of the coding sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived. Thus, in one embodiment, an isolated nucleic acid includes some or all of the 5' non-coding (e.g., promoter) sequences that are immediately contiguous to the coding sequence. The term therefore includes, for example, a recombinant DNA that is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other sequences. It also includes a recombinant DNA that is part of a hybrid gene encoding an additional polypeptide sequence. The term "isolated" can refer to a nucleic acid or polypeptide that is substantially free of cellular material, viral material, or culture medium (when produced by recombinant DNA techniques), or chemical precursors or other chemicals (when chemically synthesized). Moreover, an "isolated nucleic acid fragment" is a nucleic acid fragment that is not naturally occurring as a fragment and would not be found in the natural state. As used herein, the term "isolated nucleic acid molecule" includes an operon containing a contiguous cluster of linked sequences. "Isolated operons" are those operons that are not naturally occurring and which are not associated with the sequences by which they are normally surrounded in a bacterial genome.

A nucleic acid sequence that is "substantially identical" to a GEP nucleotide sequence is at least 80% (e.g., 85%) identical to the nucleotide sequence of the nucleic acid sequences represented by the SEQ ID NOs. listed in Table 1, as depicted in Figs. 1-23. For purposes of comparison of nucleic acids, the length of the reference nucleic acid sequence will generally be at least 40 nucleotides, e.g., at least 60 nucleotides or more. Sequence identity can be measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, WI 53705).

The GEP polypeptides useful in practicing the invention include, but are not limited to, recombinant polypeptides and natural polypeptides. Also useful in the invention are nucleic acid sequences that encode forms of GEP polypeptides in which naturally occurring amino acid sequences are altered or deleted. Preferred nucleic acids encode polypeptides that are soluble under normal physiological conditions. Also within the invention are nucleic acids encoding fusion proteins in which a portion of a GEP polypeptide is fused to an unrelated polypeptide (e.g., a marker polypeptide or a fusion partner) to create a fusion protein. For example, the polypeptide can be fused to a hexa-histidine tag to facilitate purification of bacterially expressed polypeptides, or to a hemagglutinin tag to facilitate purification of polypeptides expressed in eukaryotic cells. The invention also includes, for example, isolated polypeptides (and the nucleic acids that encode these polypeptides) that include a first portion and a second portion; the first portion includes, e.g., a GEP polypeptide, and the second portion includes an immunoglobulin constant (Fc) region or a detectable marker.

The fusion partner can be, for example, a polypeptide which facilitates secretion, e.g., a secretory sequence. Such a fused polypeptide is typically referred to as a preprotein. The secretory sequence can be cleaved by the host cell to form the mature protein. Also within the invention are nucleic acids that encode a GEP polypeptide fused to a polypeptide sequence to produce an inactive preprotein. Preproteins can be converted into the active form of the protein by removal of the inactivating sequence.

The invention also includes nucleic acids that hybridize, e.g., under stringent hybridization conditions (as defined herein) to all or a portion of the nucleotide sequences represented by the SEQ ID NOs. listed in Table 1, or their complements. The hybridizing portion of the hybridizing nucleic acids is typically at least 15 (e.g., 20, 30, or 50) nucleotides in length. The hybridizing portion of the hybridizing nucleic acid is at least 80%, e.g., at least 95%, or at least 98%, identical to the sequence of a portion or all of a nucleic acid encoding a GEP polypeptide or its complement. Hybridizing nucleic acids of the type described herein can be used as a cloning probe, a primer (e.g., a PCR primer), or a diagnostic probe. Nucleic acids that hybridize to the nucleotide sequences represented by the SEQ ID NOs. listed in Table 1 are considered "antisense oligonucleotides." Also included within the invention are ribozymes that inhibit the function of operons containing the GEP genes of the invention, as determined, for example, in a complementation assay.
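The length and identity criteria stated above for the hybridizing portion can be checked computationally. The Python sketch below is a rough illustration only: it compares a probe against every ungapped window of a target sequence and reports the best percent identity. It does not model hybridization conditions or thermodynamics, and the sequences and function names are hypothetical.

```python
def best_window_identity(probe: str, target: str) -> float:
    """Best ungapped percent identity of the probe against any window of the target."""
    if not probe or len(probe) > len(target):
        raise ValueError("probe must be non-empty and no longer than the target")
    best_matches = 0
    for start in range(len(target) - len(probe) + 1):
        window = target[start:start + len(probe)]
        matches = sum(p == t for p, t in zip(probe, window))
        best_matches = max(best_matches, matches)
    return 100.0 * best_matches / len(probe)


def meets_probe_criteria(probe: str, target: str,
                         min_length: int = 15, min_identity: float = 80.0) -> bool:
    """Apply the 'at least 15 nucleotides' and 'at least 80% identical' criteria."""
    return len(probe) >= min_length and best_window_identity(probe, target) >= min_identity


# Hypothetical target and probes.
target = "ATGGCTCTTAAAGGTACCGATTACGGATCC"
exact_probe = target[5:25]                 # 20 nt, identical to part of the target
mismatched = list(exact_probe)
mismatched[2] = "A" if mismatched[2] != "A" else "C"    # introduce mismatch 1
mismatched[11] = "A" if mismatched[11] != "A" else "C"  # introduce mismatch 2
mismatched_probe = "".join(mismatched)     # 18 of 20 positions still match
short_probe = target[0:10]                 # identical but only 10 nt long

print(best_window_identity(exact_probe, target))
print(meets_probe_criteria(mismatched_probe, target))
print(meets_probe_criteria(short_probe, target))
```

To test a probe against the complement of a sequence, the same check can be run on the reverse-complemented target.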

Also useful in the invention are various cells, e.g., transformed host cells, that contain a GEP nucleic acid described herein. A "transformed cell" is a cell into which (or into an ancestor of which) has been introduced, by means of recombinant DNA techniques, a nucleic acid encoding a GEP polypeptide. Both prokaryotic and eukaryotic cells are included, e.g., bacteria, Streptococcus, Bacillus, and the like.

Also useful in the invention are genetic constructs (e.g., vectors and plasmids) that include a nucleic acid of the invention which is operably linked to a transcription and/or translation sequence to enable expression, e.g., expression vectors. By "operably linked" is meant that a selected nucleic acid, e.g., a DNA molecule encoding a GEP polypeptide, is positioned adjacent to one or more sequence elements, e.g., a promoter, which directs transcription and/or translation of the sequence such that the sequence elements can control transcription and/or translation of the selected nucleic acid.

The invention also features purified or isolated polypeptides encoded by nucleic acids located within operons containing GEP genes, as listed in Table 1.

As used herein, both "protein" and "polypeptide" mean any chain of amino acids, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation). Thus, the terms gep103 polypeptide, gep1119 polypeptide, gep1122 polypeptide, gep1315 polypeptide, gep1493 polypeptide, gep1507 polypeptide, gep1511 polypeptide, gep1518 polypeptide, gep1546 polypeptide, gep1551 polypeptide, gep1561 polypeptide, gep1580 polypeptide, gep1713 polypeptide, gep222 polypeptide, gep2283 polypeptide, gep273 polypeptide, gep286 polypeptide, gep311 polypeptide, gep3262 polypeptide, gep3387 polypeptide, gep47 polypeptide, gep61 polypeptide, and gep76 polypeptide include full-length, naturally occurring gep103, gep1119, gep1122, gep1315, gep1493, gep1507, gep1511, gep1518, gep1546, gep1551, gep1561, gep1580, gep1713, gep222, gep2283, gep273, gep286, gep311, gep3262, gep3387, gep47, gep61, and gep76 proteins, respectively, as well as recombinantly or synthetically produced polypeptides that correspond to the full-length, naturally occurring proteins, or to a portion of the naturally occurring or synthetic polypeptide.

A "purified" or "isolated" compound is a composition that is at least 60% by weight the compound of interest, e.g., a GEP polypeptide or antibody. Preferably the preparation is at least 75% (e.g., at least 90% or 99%) by weight the compound of interest. Purity can be measured by any appropriate standard method, e.g., column chromatography, polyacrylamide gel electrophoresis, or HPLC analysis.

Preferred GEP polypeptides include a sequence substantially identical to all or a portion of a naturally occurring GEP polypeptide, e.g., including all or a portion of the sequences shown in Figs. 1-23. Polypeptides "substantially identical" to the GEP polypeptide sequences described herein have an amino acid sequence that is at least 80% (e.g., 85%, 90%, 95%, or 99%) identical to the amino acid sequence of the GEP polypeptides represented by the SEQ ID NOs. listed in Table 1. For purposes of comparison, the length of the reference GEP polypeptide sequence will generally be at least 16 amino acids, e.g., at least 20 or 25 amino acids.

In the case of polypeptide sequences that are less than 100% identical to a reference sequence, the non-identical positions are preferably, but not necessarily, conservative substitutions for the reference sequence. Conservative substitutions typically include substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine and glutamine; serine and threonine; lysine and arginine; and phenylalanine and tyrosine.
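The substitution groups listed above can be encoded directly, as in the following Python sketch, which classifies each aligned position of two equal-length sequences as identical, conservative, or non-conservative. The one-letter amino acid codes are standard; the example sequences and function names are hypothetical.

```python
# Conservative substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"G", "A"},       # glycine, alanine
    {"V", "I", "L"},  # valine, isoleucine, leucine
    {"D", "E"},       # aspartic acid, glutamic acid
    {"N", "Q"},       # asparagine, glutamine
    {"S", "T"},       # serine, threonine
    {"K", "R"},       # lysine, arginine
    {"F", "Y"},       # phenylalanine, tyrosine
]


def is_conservative(a: str, b: str) -> bool:
    """True if a and b differ but belong to the same substitution group."""
    if a == b:
        return False  # identical residues are not substitutions
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_positions(reference: str, variant: str) -> dict:
    """Count identical, conservative, and non-conservative positions."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (ungapped comparison)")
    counts = {"identical": 0, "conservative": 0, "non_conservative": 0}
    for ref_residue, var_residue in zip(reference, variant):
        if ref_residue == var_residue:
            counts["identical"] += 1
        elif is_conservative(ref_residue, var_residue):
            counts["conservative"] += 1
        else:
            counts["non_conservative"] += 1
    return counts


# Hypothetical five-residue example: G->A, V->I, D=D, K->R, F->W.
print(classify_positions("GVDKF", "AIDRW"))
```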

Where a particular polypeptide is said to have a specific percent identity to a reference polypeptide of a defined length, the percent identity is relative to the reference polypeptide. Thus, a polypeptide that is 50% identical to a reference polypeptide that is 100 amino acids long can be a 50 amino acid polypeptide that is completely identical to a 50 amino acid long portion of the reference polypeptide. It also might be a 100 amino acid long polypeptide which is 50% identical to the reference polypeptide over its entire length. Of course, other polypeptides also will meet the same criteria.
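The two cases just described can be reproduced numerically. The Python sketch below computes percent identity relative to the length of the reference, using a simple ungapped, position-by-position comparison at a given offset. This is a simplification for illustration; sequence analysis software typically uses gapped alignment. The sequences are made up.

```python
def percent_identity_to_reference(reference: str, candidate: str, offset: int = 0) -> float:
    """Percent identity of candidate to reference, relative to the reference length.

    The candidate is placed, ungapped, at the given offset within the reference.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    if offset < 0 or offset + len(candidate) > len(reference):
        raise ValueError("candidate must lie within the reference")
    matches = sum(
        1 for i, residue in enumerate(candidate) if reference[offset + i] == residue
    )
    return 100.0 * matches / len(reference)


# Hypothetical 100-residue reference polypeptide.
reference = "ACDEFGHIKL" * 10

# Case 1: a 50-residue polypeptide identical to a 50-residue portion of the reference.
fragment = reference[25:75]

# Case 2: a 100-residue polypeptide matching the reference at every other position.
half_matching = "".join(
    residue if i % 2 == 0 else "W" for i, residue in enumerate(reference)
)

print(percent_identity_to_reference(reference, fragment, offset=25))
print(percent_identity_to_reference(reference, half_matching))
```

Both cases yield 50% because the denominator is always the length of the reference polypeptide.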

The invention also features purified or isolated antibodies that specifically bind to a GEP polypeptide. By "specifically binds" is meant that an antibody recognizes and binds to a particular antigen, e.g., a GEP polypeptide, but does not substantially recognize and bind to other molecules in a sample, e.g., a biological sample that naturally includes a GEP polypeptide.

In another aspect, the invention features a method for detecting a GEP polypeptide in a sample. This method includes: obtaining a sample suspected of containing a GEP polypeptide; contacting the sample with an antibody that specifically binds to a GEP polypeptide under conditions that allow the formation of complexes of an antibody and the GEP polypeptide; and detecting the complexes, if any, as an indication of the presence of a GEP polypeptide in the sample.

Also encompassed by the invention is a method of obtaining a gene related to (i.e., a functional homolog or ortholog of) a GEP gene. Such a method entails obtaining a labeled probe that includes an isolated nucleic acid which encodes all or a portion of a GEP nucleic acid, or a homolog or ortholog thereof; screening a nucleic acid fragment library with the labeled probe under conditions that allow hybridization of the probe to nucleic acid fragments in the library, thereby forming nucleic acid duplexes; isolating labeled duplexes, if any; and preparing a full-length gene sequence from the nucleic acid fragments in any labeled duplex to obtain a gene related to the GEP gene.

The invention offers several advantages. For example, the methods for identifying antibacterial agents can be configured for high throughput screening of numerous candidate antibacterial agents.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described herein. All publications, patent applications, patents, and other references mentioned herein are incorporated herein by reference in their entirety. In the case of a conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative and are not intended to limit the scope of the invention, which is defined by the claims.

Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.

Brief Description of the Drawings

Fig. 1 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep103 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 1, 2, and 3, respectively).

Fig. 2 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1119 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 4, 5, and 6, respectively).

Fig. 3 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1122 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 7, 8, and 9, respectively).

Fig. 4 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1315 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 10, 11, and 12, respectively).

Fig. 5 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1493 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 13, 14, and 15, respectively).

Fig. 6 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1507 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 16, 17, and 18, respectively).

Fig. 7 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1511 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 19, 20, and 21, respectively).

Fig. 8 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1518 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 22, 23, and 24, respectively).

Fig. 9 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1546 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 25, 26, and 27, respectively).

Fig. 10 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1551 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 28, 29, and 30, respectively).

Fig. 11 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1561 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 31, 32, and 33, respectively).

Fig. 12 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1580 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 34, 35, and 36, respectively).

Fig. 13 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep1713 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 37, 38, and 39, respectively).

Fig. 14 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep222 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 40, 41, and 42, respectively).

Fig. 15 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep2283 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 43, 44, and 45, respectively).

Fig. 16 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep273 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 46, 47, and 48, respectively).

Fig. 17 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep286 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 49, 50, and 51, respectively).

Fig. 18 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep311 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 52, 53, and 54, respectively).

Fig. 19 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep3262 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 55, 56, and 57, respectively).

Fig. 20 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep3387 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 58, 59, and 60, respectively).

Fig. 21 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep47 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 61, 62, and 63, respectively).

Fig. 22 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep61 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 64, 65, and 66, respectively).

Fig. 23 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the gep76 polypeptide and gene from a Streptococcus pneumoniae strain (SEQ ID NOs: 67, 68, and 69, respectively).

Fig. 24 is a representation of the amino acid and coding strand and non-coding strand nucleic acid sequences of the B-yneS polypeptide and gene from a Bacillus subtilis strain (SEQ ID NOs: 70, 71, and 72, respectively).

Fig. 25 is a schematic representation of the PCR strategy used to produce DNA molecules used for targeted deletions of essential genes in Streptococcus pneumoniae.

Fig. 26 is a schematic representation of the strategy used to produce targeted deletions of essential genes in Streptococcus pneumoniae.

Detailed Description of the Invention

Identifying Streptococcus Genes in Essential Operons

As shown by the experiments described below, each of the GEP genes is located within an operon that is essential for survival of Streptococcus pneumoniae.

Streptococcus pneumoniae is available from the ATCC. To identify genes located within essential operons, mutants of Streptococcus pneumoniae were produced. In general, mutagenesis of Streptococcus pneumoniae can be accomplished using any of various art-known methods.

In general, and for the examples set forth below, genes located within essential Streptococcus pneumoniae operons can be identified using genes from a Streptococcus pneumoniae RX1 genomic library, which was produced using standard methods (see Kim et al., Nucl. Acids. Res. 20: 1083-1085 (1992) and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, (John Wiley & Sons, NY)). Genes in this Streptococcus library were disrupted using a shuttle mutagenesis approach with the transposon TnPho-A. Each disrupted gene then was tested to determine whether it was located within an operon that is essential for survival of Streptococcus pneumoniae. In this method, 2 ml of LB broth supplemented with chloramphenicol (10 µg/ml), MgSO4 (10 mM), and maltose (0.2%) were inoculated with 50 µl of the Streptococcus pneumoniae RX1 plasmid library. The culture was grown at 37°C while shaking until the OD650 of the culture reached 0.8 (approximately 2 hours). A 1 ml aliquot of TnPho-A-containing phage (10'pfu/ml) was added to 1 ml of the Streptococcus culture, producing a ratio of approximately 10 phage to 1 cell. The phage and cells were incubated at 37°C for 30 minutes. A 4 ml aliquot of LB broth, warmed to 37°C, then was added to the phage/cell mixture, and the mixture was incubated at 37°C, while shaking, for 1 hour. The cells then were pelleted by centrifuging them at 3500 rpm in a Beckman tabletop centrifuge for 5 minutes.

The pelleted cells then were resuspended in 800 µl of LB broth, and a 200 µl aliquot of cells was plated onto each of four petri plates containing LB agar supplemented with chloramphenicol (10 µg/ml), kanamycin (50 µg/ml), and erythromycin (300 µg/ml). The plates then were incubated overnight at 37°C, and the number of colonies appearing on the plates was counted. Approximately 18,000 colonies then were pooled and used to inoculate 50 ml of LB broth, which was incubated overnight at 37°C. Plasmid DNA from the culture then was extracted using a Qiagen MIDI Prep Kit; other art-known extraction methods can be substituted.

The concentration of the extracted DNA was measured, and 100 ng of the DNA was transformed, by electroporation, into E. coli DH10B cells (Gibco BRL).

A 1 ml aliquot of SOC broth then was added to the transformed cells, and the cells were incubated at 37°C for 1 hour before being pelleted by centrifugation at 3500 rpm for 5 minutes. The cells then were resuspended in 200 µl of LB broth, and aliquots of 2, 20, and 50 µl were plated onto petri plates containing LB agar and antibiotics as described above. After incubating the plates overnight at 37°C, 93 colonies were picked and used, individually, to inoculate 1.25 ml of Terrific broth supplemented with chloramphenicol (10 µg/ml), kanamycin (50 µg/ml), and erythromycin (300 µg/ml). The cultures were incubated at 37°C for approximately 20 hours, while shaking. The DNA from each culture then was extracted, using a conventional alkaline lysis miniprep method.

The extracted DNA samples then were used, individually, to transform Streptococcus pneumoniae cells in a 96-well microtitre format. The transposon promotes insertion of the mutagenized gene into the bacterial chromosome. Non-transforming clones indicate that the mutation was within an operon containing an essential gene.

The non-transforming clones then were grown in 50 ml of Terrific broth supplemented with chloramphenicol (10 µg/ml), kanamycin (50 µg/ml), and erythromycin (300 µg/ml). DNA from these clones was extracted and retransformed into Streptococcus pneumoniae and plated on petri dishes to confirm that they were non-transforming. The genes located within essential operons then were sequenced, using primers that hybridize to sequences of the transposon. The sequences of the primers were: 5'GCAGCCCGGTTTTCCAGAACAGG3' (SEQ ID NO: 73) and 5'GATTTAGCCCAGTCGGCCGCACG3' (SEQ ID NO: 74).

In an alternative method, which also was used, the transposon Tn10 was used to disrupt genes in a Streptococcus pneumoniae fosmid library, which was produced using standard methods. A 50 ml aliquot of TBMM broth supplemented with chloramphenicol (10 µg/ml), MgSO4 (10 mM), and maltose (0.2%) was inoculated with a single fosmid colony from the fosmid library, and the culture was grown overnight at 37°C. The cells then were pelleted and resuspended in 5 ml of LB broth supplemented with chloramphenicol (10 µg/ml), MgSO4 (10 mM), and maltose (0.2%). A 100 µl aliquot of the cells then was mixed with 100 µl of Tn10 phage lysate (10^10 pfu/ml), and the mixture was incubated at room temperature for 15 minutes and then incubated at 37°C for 15 minutes.

A 5 ml aliquot of LB broth supplemented with IPTG (1 mM) and sodium citrate (50 mM) and warmed to 37°C then was added to the cell/phage mixture.

After incubating the cell/phage mixture at 37°C, while shaking, the cells were pelleted and resuspended in 800 µl of LB broth. The cells then were plated onto 4 plates of LB agar supplemented with chloramphenicol (10 µg/ml) and erythromycin (300 µg/ml). After incubating the cells overnight at 37°C, at least 10,000 of the resulting colonies were used to inoculate 50 ml of LB broth. DNA then was extracted and quantified using standard methods, and 100 ng of DNA were used to transform E. coli DH10B cells (Gibco BRL) via electroporation. After adding 1 ml of SOC broth to the cells, the cells were incubated at 37°C for 1 hour. The cells then were pelleted and suspended in 200 µl LB broth, and aliquots of 2, 20, and 50 µl were plated onto LB agar supplemented with chloramphenicol (10 µg/ml), kanamycin (50 µg/ml), and erythromycin (300 µg/ml). The plates then were incubated overnight at 37°C, and 93 colonies were picked and used to inoculate 1.25 ml of Terrific broth supplemented with chloramphenicol (10 µg/ml), kanamycin (50 µg/ml), and erythromycin (300 µg/ml). These cultures were incubated for approximately 20 hours, while shaking, and the DNA was isolated using a standard miniprep method. The extracted DNA then was used to transform Streptococcus pneumoniae, and the genes located within essential operons were sequenced as described above. The sequences of the primers used for sequencing were: 5'CCGCCATTCTTTGCTGTTTCG3' (SEQ ID NO: 75) and 5'TTACACGTTACTAAAGGGAATG3' (SEQ ID NO: 76).

Identification of the gep1493, gep1507, gep1546, gep273, gep286, and gep76 Genes as Essential Genes

As shown by the experiments described below, the gep1493, gep1507, gep1546, gep273, gep286, and gep76 genes each have been shown to be essential for survival of Streptococcus pneumoniae. Each of the gep1493, gep1507, gep1546, gep273, gep286, and gep76 genes has been identified as essential by creating a targeted deletion of each gene, separately, in Streptococcus pneumoniae.

Each of the gep1493, gep1507, gep1546, gep273, gep286, and gep76 genes was, separately, replaced with a nucleic acid sequence conferring resistance to the antibiotic erythromycin (an "erm" gene). Other genetic markers can be used in lieu of this particular antibiotic resistance marker. Polymerase chain reaction (PCR) amplification was used to make a targeted deletion in the Streptococcus genomic DNA, as shown in Fig. 25. Several PCR reactions were used to produce the DNA molecules needed to carry out targeted deletion of the genes of interest. First, using primers 5 and 6, an erm gene was amplified from pIL252 from B. subtilis (available from the Bacillus Genetic Stock Center, Columbus, OH). Primer 5 consists of 21 nucleotides that are identical to the promoter region of the erm gene and complementary to Sequence A. Primer 5 has the sequence 5'GTG TTC GTG CTG ACT TGC ACC3' (SEQ ID NO: 77). Primer 6 consists of 21 nucleotides that are complementary to the 3' end of the erm gene. Primer 6 has the sequence 5'GAA TTA TTT CCT CCC GTT AAA3' (SEQ ID NO: 78). PCR amplification of the erm gene was carried out under the following conditions: 30 cycles of 94°C for 1 minute, 55°C for 1 minute, and 72°C for 1.5 minutes, followed by one cycle of 72°C for 10 minutes.

In the second and third PCR reactions, sequences flanking the gene of interest were amplified and produced as hybrid DNA molecules that also contained a portion of the erm gene. The second reaction produced a double-stranded DNA molecule (termed "Left Flanking Molecule") that includes sequences upstream of the 5' end of the gene of interest and the first 21 nucleotides of the erm gene. As shown in Fig. 25, this reaction utilized primer 1, which is 21 nucleotides in length and identical to a sequence that is located approximately 500 bp upstream of the translation start site of the gene of interest. Primers 1 and 2 are gene-specific and include the sequences 5'CTC CGT GAA GTC CAC CTG AT3' (SEQ ID NO: 79) and 5'GGT GCA AGT CAG CAC GAA CAC GCG ACA TAG GTT CCA GTT AGG3' (SEQ ID NO: 80), respectively, for gep1493. Primer 2 is 42 nucleotides in length, with 21 of the nucleotides at the 3' end of the primer being complementary to the 5' end of the sense strand of the gene of interest. The 21 nucleotides at the 5' end of the primer are identical to Sequence A and are therefore complementary to the 5' end of the erm gene. Thus, PCR amplification using primers 1 and 2 produced the left flanking DNA molecule, which is a hybrid DNA molecule containing a sequence located upstream of the gene of interest and 21 base pairs of the erm gene, as shown in Fig. 25.

The third PCR reaction was similar to the second reaction, but produced the right flanking DNA molecule, shown in Fig. 25. The right flanking DNA molecule contains 21 base pairs of the 3' end of the erm gene, a 21 base pair portion of the 3' end of the gene of interest, and sequences downstream of the gene of interest.

This right flanking DNA molecule was produced with gene-specific primers 3 and 4. For gep1493, primers 3 and 4 included the sequences 5'TTT AAC GGG AGG AAA TAA TTC CCA TAT CGT GGC TCC TGA AT3' (SEQ ID NO: 81) and 5'TAA AGC CCT CAT GTC GAA CC3' (SEQ ID NO: 82), respectively. Primer 3 is 42 nucleotides in length; the 21 nucleotides at the 5' end of Primer 3 are identical to Sequence B and therefore are identical to the 3' end of the erm gene. The 21 nucleotides at the 3' end of Primer 3 are identical to the 3' end of the gene of interest. Primer 4 is 21 nucleotides in length and is complementary to a sequence located approximately 500 bp downstream of the gene of interest. As discussed above, primers 1-4 are gene-specific, and the sequences disclosed above were used for gep1493. Gene-specific primers were used to identify the other essential genes described herein, as shown in Table 2.

TABLE 2: Primers Used in Identifying Essential Genes

Gene      Primer     Sequence (5' to 3')                           SEQ ID NO
gep1493   Primer 1   CTCCGTGAAGTCCACCTGAT                          79
gep1493   Primer 2   GGTGCAAGTCAGCACGAACACTGCTCGCGTAGATTGATTTG     80
gep1493   Primer 3   TTTAACGGGAGGAAATAATTCGGGGATTGAACCTAACCCAT     81
gep1493   Primer 4   TTGGCAAGAAGGCAGAGAAT                          82
gep1507   Primer 1   GCATGAGAAACCCAGTCTCC                          83
gep1507   Primer 2   GGTGCAAGTCAGCACGAACACGCGACATAGGTTCCAGTTAGG    84
gep1507   Primer 3   TTTAACGGGAGGAAATAATTCCCATATCGTGGCTCCTGAAT     85
gep1507   Primer 4   TAAAGCCCTCATGTCGAACC                          86
gep1546   Primer 1   CAGTGACGATACAGATGAAGAA                        87
gep1546   Primer 2   GGTGCAAGTCAGCACGAACACGATGCTGGCTTCGTTGAGTG     88
gep1546   Primer 3   TTTAACGGGAGGAAATAATTCGTCGCGACTCCTAGCCATAC     89
gep1546   Primer 4   CCAGCAAAGGAAAACCGATA                          90
gep273    Primer 1   GGTCAGTGACAGCAGCAGAT                          91
gep273    Primer 2   GGTGCAAGTCAGCACGAACACGGCCTTGGAAAAAAGACCAT     92
gep273    Primer 3   TTTAACGGGAGGAAATAATTCCCGCTTAAATTCTGCCAATC     93
gep273    Primer 4   CCCATAACCGTATCACCTGG                          94
gep286    Primer 1   CGGAACGGCTATGAAAAAAA                          95
gep286    Primer 2   GGTGCAAGTCAGCACGAACACACGACGAAAGGCAACCATAC     96
gep286    Primer 3   TTTAACGGGAGGAAATAATTCTGGTATGGGGGTTGATGAAG     97
gep286    Primer 4   TCGCCCTACTTTTCGTATGC                          98
gep76     Primer 1   AGCGATATTAGTGCGGGAGA                          99
gep76     Primer 2   GGTGCAAGTCAGCACGAACACCAGCAATTTTGTCATCAGTCG    100
gep76     Primer 3   TTTAACGGGAGGAAATAATTCCTGGGGTAATGGAGCACAGT     101
gep76     Primer 4   GGGATTGTCACGGTAAAACC                          102
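The relationship between the erm primers and the hybrid primers can be checked in silico. The following is a minimal sketch, not part of the disclosed method: it takes primers 5 and 6 and the primer 2 and primer 3 sequences exactly as printed in the text above (spaces removed), and tests whether the first 21 nucleotides of each hybrid primer are the reverse complement of the corresponding erm primer. The function names are illustrative assumptions.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


# Sequences as printed in the text above, with spaces removed.
PRIMER_5 = "GTGTTCGTGCTGACTTGCACC"  # SEQ ID NO: 77, erm forward primer
PRIMER_6 = "GAATTATTTCCTCCCGTTAAA"  # SEQ ID NO: 78, erm reverse primer
PRIMER_2 = "GGTGCAAGTCAGCACGAACACGCGACATAGGTTCCAGTTAGG"  # hybrid, left flank
PRIMER_3 = "TTTAACGGGAGGAAATAATTCCCATATCGTGGCTCCTGAAT"  # hybrid, right flank

TAIL_LENGTH = 21  # length of the erm-derived tail described in the text


def tail_matches(hybrid_primer: str, erm_primer: str, tail: int = TAIL_LENGTH) -> bool:
    """True if the 5' tail of the hybrid primer is the reverse complement
    of the erm primer, so the two PCR products share a 21-bp overlap."""
    return hybrid_primer[:tail] == reverse_complement(erm_primer)


print("primer 2 tail pairs with primer 5:", tail_matches(PRIMER_2, PRIMER_5))
print("primer 3 tail pairs with primer 6:", tail_matches(PRIMER_3, PRIMER_6))
```

The same check can be applied to any row of Table 2 by substituting the Primer 2 and Primer 3 sequences for the gene of interest.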

PCR amplification of the left and right flanking DNA molecules was carried out, separately, in 50 µl reaction mixtures containing: 1 µl Streptococcus pneumoniae (RX1) DNA (0.25 µg), 2.5 µl Primer 1 or Primer 4 (10 pmol/µl), 2.5 µl Primer 2 or Primer 3 (20 pmol/µl), 1.2 µl of a dNTP mixture (10 mM each), 37 µl H2O, 0.7 µl Taq polymerase (5 U/µl), and 5 µl 10x Taq polymerase buffer (10 mM Tris, 50 mM KCl, 2.5 mM MgCl2). The left and right flanking DNA molecules were amplified using the following PCR cycling program: 95°C for 2 minutes; 72°C for 1 minute; 94°C for 30 seconds; 49°C for 30 seconds; 72°C for 1 minute; repeating the 94°C, 49°C, and 72°C incubations 30 times; 72°C for 10 minutes and then stopping the reactions. A 15 µl aliquot of each reaction mixture then was electrophoresed through a 1.2% low melting point agarose gel in TAE buffer and then stained with ethidium bromide. Fragments containing the amplified left and right flanking DNA molecules were excised from the gel and purified using the QIAQUICK(TM) gel extraction kit (Qiagen, Inc.). Other art-known methods for amplifying and isolating DNA can be substituted. The flanking left and right DNA fragments were eluted into 30 µl TE buffer at pH 8.0.
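As a bookkeeping aid, the reaction recipe above can be tabulated and checked against the nominal 50 µl volume. The sketch below is illustrative only; it assumes the component volumes and stock concentrations as read from the recipe, and it treats pmol/µl as numerically equal to µM. The variable and function names are assumptions introduced for this example.

```python
# Component volumes (in µl) as read from the flanking-PCR recipe above.
volumes_ul = {
    "genomic DNA": 1.0,
    "Primer 1 or Primer 4": 2.5,
    "Primer 2 or Primer 3": 2.5,
    "dNTP mixture": 1.2,
    "H2O": 37.0,
    "Taq polymerase": 0.7,
    "10x buffer": 5.0,
}
NOMINAL_VOLUME_UL = 50.0


def total_volume(volumes: dict) -> float:
    """Sum of all component volumes, rounded to 0.01 µl."""
    return round(sum(volumes.values()), 2)


def final_concentration(stock: float, volume_ul: float,
                        total_ul: float = NOMINAL_VOLUME_UL) -> float:
    """Dilute a stock concentration into the nominal reaction volume.
    The result has the same unit as the stock."""
    return stock * volume_ul / total_ul


total = total_volume(volumes_ul)
outer_primer_um = final_concentration(10.0, 2.5)   # 10 pmol/µl stock, in µM
hybrid_primer_um = final_concentration(20.0, 2.5)  # 20 pmol/µl stock, in µM
dntp_mm = final_concentration(10.0, 1.2)           # 10 mM each stock, in mM
taq_units = 5.0 * 0.7                              # 5 U/µl stock, in units

print(f"total volume: {total} µl (nominal {NOMINAL_VOLUME_UL} µl)")
print(f"Primer 1 or 4: {outer_primer_um:.2f} µM")
print(f"Primer 2 or 3: {hybrid_primer_um:.2f} µM")
print(f"each dNTP: {dntp_mm:.2f} mM")
print(f"Taq polymerase: {taq_units:.1f} U")
```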

The amplified erm gene and left and right flanking DNA molecules were then fused together to produce the fusion product, as shown in Fig. 25. The fusion PCR reaction was carried out in a volume of 50 µl containing: 2 µl of each of the left and right flanking DNA molecules and the erm gene PCR product; 5 µl of 10x buffer; 2.5 µl of Primer 1 (10 pmol/µl); 2.5 µl of Primer 4 (10 pmol/µl); 1.2 µl dNTP mix (10 mM each); 32 µl H2O; and 0.7 µl Taq polymerase. The PCR reaction was carried out using the following cycling program: 95°C for 2 minutes; 72°C for 1 minute; 94°C for 30 seconds; 48°C for 30 seconds; 72°C for 3 minutes; repeating the 94°C, 48°C, and 72°C incubations 25 times; 72°C for 10 minutes. After the reaction was stopped, a 12 µl aliquot of the reaction mixture was electrophoresed through an agarose gel to confirm the presence of a final product of approximately 2 kb.
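The joining of the three PCR products through their shared 21-bp ends can be pictured with a short in silico model. This is a sketch under stated assumptions, not the actual sequences: SEQ_A is primer 5 and SEQ_B is the 5' tail of primer 3 as printed above, while the flanking and internal erm sequences are invented placeholders chosen only to keep the example short.

```python
SEQ_A = "GTGTTCGTGCTGACTTGCACC"  # primer 5 (SEQ ID NO: 77); 5' end of the erm product
SEQ_B = "TTTAACGGGAGGAAATAATTC"  # 5' tail of primer 3; 3' end of the erm product


def fuse(upstream: str, downstream: str, overlap: int = 21) -> str:
    """Join two fragments whose ends share an identical overlap,
    keeping a single copy of the shared sequence."""
    if upstream[-overlap:] != downstream[:overlap]:
        raise ValueError("fragments do not share the expected overlap")
    return upstream + downstream[overlap:]


# Placeholder fragments (hypothetical sequences, for illustration only).
left_flank = "ATGCATGCAT" * 3 + SEQ_A      # upstream region + first 21 nt of erm
erm_product = SEQ_A + "CAGT" * 10 + SEQ_B  # stand-in for the erm PCR product
right_flank = SEQ_B + "GGCCTTAAGG" * 3     # last 21 nt of erm + downstream region

fusion_product = fuse(fuse(left_flank, erm_product), right_flank)
print(len(fusion_product), "nt fusion product")
```

In the actual reaction, primers 1 and 4 then amplify the full-length fusion product; the model above covers only the overlap-directed joining.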

A 5 µl aliquot of the fusion product was used to transform S. pneumoniae grown on a medium containing erythromycin in accordance with standard techniques. As shown in Fig. 26, the fusion product and the S. pneumoniae genome undergo a homologous recombination event so that the erm gene replaces the chromosomal copy of the gene of interest, thereby creating a gene knockout.

Disruption of an essential gene results in no growth on a medium containing erythromycin. Using this gene knockout method, the gep1493, gep1507, gep1546, gep273, gep286, and gep76 genes were each identified as being essential for survival.

Identification of Homologs and Orthologs of GEP Polypeptides

Having shown that the various GEP genes are essential or located within operons that are essential for survival of Streptococcus, it can be expected that homologs and orthologs of the polypeptides encoded by these genes, when present in other organisms, for example B. subtilis, are essential or located within operons that are essential for survival of that organism as well, and therefore are useful targets for identifying antibacterial agents. Using the sequences of the GEP polypeptides identified in Streptococcus, homologs and orthologs of these polypeptides can be identified in other organisms. For example, the coding sequences of the GEP nucleic acids can be used to search the GenBank database of nucleotide sequences to identify homologs or orthologs that are expressed from essential operons in other organisms. Sequence comparisons can be performed using the Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol., 215: 403-410, 1990). The percent sequence identity shared by the GEP polypeptides and their homologs or orthologs can be determined using the GAP program from the Genetics Computer Group (GCG) Wisconsin Sequence Analysis Package (Wisconsin Package Version 9.0, GCG; Madison, WI). The following parameters are suitable: gap creation penalty, 12 (protein) 50 (DNA); gap extension penalty, 4 (protein) 3 (DNA). Typically, the GEP polypeptides and their homologs share at least 25% (e.g., at least 40%) sequence identity. Typically, the DNA sequences encoding GEP polypeptides and their homologs share at least 35% (e.g., at least 45%) sequence identity. To confirm that the homologs or orthologs of the GEP polypeptides are expressed from operons that are essential for survival of bacteria, the operon encoding each of the homologs or orthologs can be, separately, deleted from the genome of the host organism.
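Once an alignment has been produced, the percent identity comparison against the 25% and 40% thresholds is simple arithmetic. The sketch below is not the GAP program; it assumes two sequences that have already been aligned (with "-" as the gap character) and uses one common convention, matches divided by the number of alignment columns, which may differ from the convention GAP reports. The example sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the
    same residue. Columns that are gaps in both sequences are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_threshold(identity_pct: float, threshold_pct: float = 25.0) -> bool:
    """True if the identity reaches the stated minimum (25% for polypeptides)."""
    return identity_pct >= threshold_pct


# Hypothetical pre-aligned polypeptide fragments, for illustration only.
identity = percent_identity("MKT-LLV", "MRTALLI")
print(f"{identity:.1f}% identity; at least 25%: {meets_threshold(identity)}; "
      f"at least 40%: {meets_threshold(identity, 40.0)}")
```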

Identification of Essential Operons in Additional Streptococcus Strains

Now that the various GEP genes have been identified as being located within operons that are essential for survival, these genes, or fragments thereof, can be used to detect homologous or orthologous genes in other organisms. In particular, these genes can be used to analyze various pathogenic and non-pathogenic strains of bacteria. Fragments of a nucleic acid (DNA or RNA) encoding a GEP polypeptide or homolog or ortholog (or sequences complementary thereto) can be used as probes in conventional nucleic acid hybridization assays of pathogenic bacteria. For example, nucleic acid probes (which typically are 8-30, or usually 15-20, nucleotides in length) can be used to detect GEP genes or homologs or orthologs thereof in art-known molecular biology methods, such as Southern blotting, Northern blotting, dot or slot blotting, PCR amplification methods, colony hybridization methods, and the like. Typically, an oligonucleotide probe based on the nucleic acid sequences described herein, or fragments thereof, is labeled and used to screen a genomic library constructed from mRNA obtained from a Streptococcus or bacterial strain of interest. A suitable method of labeling involves using polynucleotide kinase to add 32P-labeled ATP to the oligonucleotide used as the probe. This method is well known in the art, as are several other suitable methods (e.g., biotinylation and enzyme labeling).

Hybridization of the oligonucleotide probe to the library, or other nucleic acid sample, typically is performed under stringent to highly stringent conditions.

Nucleic acid duplex or hybrid stability is expressed as the melting temperature, or Tm, which is the temperature at which a probe dissociates from a target DNA. This melting temperature is used to define the required stringency conditions. If sequences are to be identified that are related and substantially identical to the probe, rather than identical, then it is useful to first establish the lowest temperature at which only homologous hybridization occurs with a particular concentration of salt (e.g., SSC or SSPE). Then, assuming that 1% mismatching results in a 1°C decrease in the Tm, the temperature of the final wash in the hybridization reaction is reduced accordingly (for example, if sequences having > 95% identity with the probe are sought, the final wash temperature is decreased by 5°C). In practice, the change in Tm can be between 0.5° and 1.5°C per 1% mismatch.
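The wash temperature adjustment described above is a linear correction and can be written out as follows. This is a minimal sketch: the 65°C reference temperature is a hypothetical value chosen for illustration (the text does not specify one), and the function name and parameters are assumptions.

```python
def final_wash_temperature(reference_temp_c: float,
                           min_identity_pct: float,
                           drop_per_pct_mismatch: float = 1.0) -> float:
    """Lower the final wash temperature in proportion to the tolerated
    mismatch, assuming a fixed Tm decrease per 1% mismatch."""
    if not 0.0 < min_identity_pct <= 100.0:
        raise ValueError("identity must be in the range (0, 100]")
    mismatch_pct = 100.0 - min_identity_pct
    return reference_temp_c - mismatch_pct * drop_per_pct_mismatch


REFERENCE_C = 65.0  # hypothetical temperature at which only homologous hybridization occurs

# Sequences with at least 95% identity, using the 1°C per 1% assumption.
print(final_wash_temperature(REFERENCE_C, 95.0))

# The same target using the practical range of 0.5°C to 1.5°C per 1% mismatch.
print(final_wash_temperature(REFERENCE_C, 95.0, 0.5),
      final_wash_temperature(REFERENCE_C, 95.0, 1.5))
```

With the default assumption, a 95% identity target lowers the wash temperature by 5°C, in agreement with the example given in the text.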

As used herein, highly stringent conditions refer to hybridization at 68°C in 5x SSC/5x Denhardt's solution/1.0% SDS, and washing in 0.2x SSC/0.1% SDS at 42°C. Stringent conditions refer to washing in 3x SSC at 42°C. The parameters of salt concentration and temperature can be varied to achieve the optimal level of identity between the probe and the target nucleic acid. Additional guidance regarding such conditions is readily available in the art, for example, in Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, (John Wiley & Sons, N.Y.) at Unit 2.10.

In one approach, libraries constructed from pathogenic or non-pathogenic Streptococcus or bacterial strains can be screened. For example, such strains can be screened for expression of GEP genes by Northern blot analysis. Upon detection of transcripts of the GEP genes or homologs or orthologs thereof, libraries can be constructed from RNA isolated from the appropriate strain, utilizing standard techniques well known to those of skill in the art. Alternatively, a total genomic DNA library can be screened using a GEP gene probe (or a probe directed to a homolog or ortholog thereof).

New gene sequences can be isolated, for example, by performing PCR using two degenerate oligonucleotide primer pools designed on the basis of nucleotide sequences within the GEP genes, or their homologs or orthologs, as depicted herein. The template for the reaction can be DNA obtained from strains known or suspected to express a GEP allele or an allele of a homolog or ortholog thereof.

The PCR product can be subcloned and sequenced to ensure that the amplified sequences represent the sequences of a new GEP nucleic acid sequence, or a sequence of a homolog or ortholog thereof.
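A degenerate primer pool is the set of all unambiguous oligonucleotides encoded by a primer written with IUPAC ambiguity codes. The sketch below enumerates such a pool; the primer shown is a hypothetical example and is not derived from any GEP sequence.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """List every unambiguous sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical degenerate primer: N is any base, R is A or G.
pool = expand_degenerate("ATGGCNAAR")
print(len(pool), "primers in the pool")
for sequence in pool:
    print(sequence)
```

The pool size is the product of the degeneracy at each position, which is a practical guide when choosing regions on which to base degenerate primers.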

Synthesis of the various GEP polypeptides or their homologs or orthologs (or an antigenic fragment thereof) for use as antigens, or for other purposes, can readily be accomplished using any of the various art-known techniques. For example, a polypeptide or homolog or ortholog thereof, or an antigenic fragment(s), can be synthesized chemically in vitro, or enzymatically (e.g., by in vitro transcription and translation). Alternatively, the gene can be expressed in, and the polypeptide purified from, a cell (e.g., a cultured cell) by using any of the numerous, available gene expression systems. For example, the polypeptide antigen can be produced in a prokaryotic host (e.g., E. coli or B. subtilis) or in eukaryotic cells, such as yeast cells or insect cells (e.g., by using a baculovirus-based expression vector).

Proteins and polypeptides can also be produced in plant cells, if desired. For plant cells, viral expression vectors (e.g., cauliflower mosaic virus and tobacco mosaic virus) and plasmid expression vectors (e.g., Ti plasmid) are suitable. Such cells are available from a wide range of sources (e.g., the American Type Culture Collection, Rockville, MD; also, see, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1994). The optimal methods of transformation or transfection and the choice of expression vehicle will depend on the host system selected. Transformation and transfection methods are described, e.g., in Ausubel et al., supra; expression vehicles may be chosen from those provided, e.g., in Cloning Vectors: A Laboratory Manual (P. H. Pouwels et al., 1985, Supp. 1987). The host cells harboring the expression vehicle can be cultured in conventional nutrient media, adapted as needed for activation of a chosen gene, repression of a chosen gene, selection of transformants, or amplification of a chosen gene.

If desired, GEP polypeptides or their homologs or orthologs can be produced as fusion proteins. For example, the expression vector pUR278 (Ruther et al., EMBO J., 2: 1791, 1983) can be used to create lacZ fusion proteins. The art-known pGEX vectors can be used to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble and can be easily purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione. The pGEX vectors are designed to include thrombin or factor Xa protease cleavage sites so that the cloned target gene product can be released from the GST moiety.

In an exemplary insect cell expression system, a baculovirus such as Autographa californica nuclear polyhedrosis virus (AcNPV), which grows in Spodoptera frugiperda cells, can be used as a vector to express foreign genes. A coding sequence encoding a GEP polypeptide or homolog or ortholog can be cloned into a non-essential region (for example the polyhedrin gene) of the viral genome and placed under control of a promoter, e.g., the polyhedrin promoter or an exogenous promoter. Successful insertion of a gene encoding a GEP polypeptide or homolog or ortholog can result in inactivation of the polyhedrin gene and production of non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat encoded by the polyhedrin gene). These recombinant viruses are then used to infect insect cells (e.g., Spodoptera frugiperda cells) in which the inserted gene is expressed (see, e.g., Smith et al., J. Virol., 46: 584, 1983; Smith, U.S. Patent No. 4,215,051).

In mammalian host cells, a number of viral-based expression systems can be utilized. When an adenovirus is used as an expression vector, the nucleic acid sequence encoding the GEP polypeptide or homolog or ortholog can be ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and tripartite leader sequence. This chimeric gene can then be inserted into the adenovirus genome by in vitro or in vivo recombination. Insertion into a non-essential region of the viral genome (e.g., region E1 or E3) will result in a recombinant virus that is viable and capable of expressing an essential gene product in infected hosts (see, e.g., Logan, Proc. Natl. Acad. Sci. USA, 81: 3655, 1984).

Specific initiation signals may be required for efficient translation of inserted nucleic acid sequences. These signals include the ATG initiation codon and adjacent sequences. In general, exogenous translational control signals, including, perhaps, the ATG initiation codon, should be provided. Furthermore, the initiation codon must be in phase with the reading frame of the desired coding sequence to ensure translation of the entire sequence. These exogenous translational control signals and initiation codons can be of a variety of origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer elements, or transcription terminators (Bittner et al., Methods in Enzymol., 153: 516, 1987).
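The requirement that the initiation codon be in phase with the coding sequence amounts to a check that the two positions differ by a multiple of three nucleotides. The following sketch illustrates that check only; it does not look for intervening stop codons or for the adjacent sequences that influence initiation, and the construct shown is a hypothetical example.

```python
def in_phase(start_codon_index: int, coding_index: int) -> bool:
    """True if a start codon at start_codon_index is in the same reading
    frame as a coding sequence that begins at coding_index."""
    return (coding_index - start_codon_index) % 3 == 0


def first_atg_in_phase(construct: str, coding_index: int):
    """Return the index of the first ATG at or upstream of coding_index that
    is in phase with the coding sequence, or None if there is none."""
    construct = construct.upper()
    for index in range(coding_index + 1):
        if construct[index:index + 3] == "ATG" and in_phase(index, coding_index):
            return index
    return None


# Hypothetical construct: an exogenous ATG at index 2, insert beginning at index 8.
construct = "GGATGAAACCCGGG"
print(first_atg_in_phase(construct, 8))  # insert joined in frame
print(first_atg_in_phase(construct, 9))  # insert shifted by one nucleotide
```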

The GEP polypeptides and homologs and orthologs can be expressed individually or as fusions with a heterologous polypeptide, such as a signal sequence or other polypeptide having a specific cleavage site at the N- and/or C-terminus of the protein or polypeptide. The heterologous signal sequence selected should be one that is recognized and processed, i.e., cleaved by a signal peptidase, by the host cell in which the fusion protein is expressed.

A host cell can be chosen that modulates the expression of the inserted sequences, or modifies and processes the gene product in a specific, desired fashion. Such modifications and processing (e. g., cleavage) of protein products may facilitate optimal functioning of the protein. Various host cells have characteristic and specific mechanisms for post-translational processing and modification of proteins and gene products. Appropriate cell lines or host systems familiar to those of skill in the art of molecular biology can be chosen to ensure the correct modification and processing of the foreign protein expressed. To this end, eukaryotic host cells that possess the cellular machinery for proper processing of the primary transcript, and phosphorylation of the gene product can be used.

Such mammalian host cells include, but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and choroid plexus cell lines.

If desired, the GEP polypeptide or homolog or ortholog thereof can be produced by a stably-transfected mammalian cell line. A number of vectors suitable for stable transfection of mammalian cells are available to the public, see, e.g., Pouwels et al. (supra); methods for constructing such cell lines are also publicly known, e.g., in Ausubel et al. (supra). In one example, DNA encoding the protein is cloned into an expression vector that includes the dihydrofolate reductase (DHFR) gene. Integration of the plasmid and, therefore, the GEP polypeptide-encoding gene into the host cell chromosome is selected for by including 0.01-300 µM methotrexate in the cell culture medium (as described in Ausubel et al., supra). This dominant selection can be accomplished in most cell types.

Recombinant protein expression can be increased by DHFR-mediated amplification of the transfected gene. Methods for selecting cell lines bearing gene amplifications are described in Ausubel et al. (supra); such methods generally involve extended culture in medium containing gradually increasing levels of methotrexate. DHFR-containing expression vectors commonly used for this purpose include pCVSEII-DHFR and pAdD26SV(A) (described in Ausubel et al., supra).

A number of other selection systems can be used, including but not limited to, herpes simplex virus thymidine kinase genes, hypoxanthine-guanine phosphoribosyl-transferase genes, and adenine phosphoribosyltransferase genes, which can be employed in tk-, hgprt-, or aprt- cells, respectively. In addition, gpt, which confers resistance to mycophenolic acid (Mulligan et al., Proc. Natl. Acad. Sci. USA, 78: 2072, 1981); neo, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin et al., J. Mol. Biol., 150: 1, 1981); and hygro, which confers resistance to hygromycin (Santerre et al., Gene, 30: 147, 1981), can be used.

Alternatively, any fusion protein can be readily purified by utilizing an antibody or other molecule that specifically binds to the fusion protein being expressed. For example, a system described in Janknecht et al., Proc. Natl. Acad. Sci. USA, 88: 8972 (1981), allows for the ready purification of non-denatured fusion proteins expressed in human cell lines. In this system, the gene of interest is subcloned into a vaccinia recombination plasmid such that the gene's open reading frame is translationally fused to an amino-terminal tag consisting of six histidine residues. Extracts from cells infected with recombinant vaccinia virus are loaded onto Ni2+ nitriloacetic acid-agarose columns, and histidine-tagged proteins are selectively eluted with imidazole-containing buffers.

Alternatively, a GEP polypeptide or homolog or ortholog, or a portion thereof, can be fused to an immunoglobulin Fc domain. Such a fusion protein can be readily purified using a protein A column, for example. Moreover, such fusion proteins permit the production of a chimeric form of a GEP polypeptide or homolog or ortholog having increased stability in vivo.

Once the recombinant GEP polypeptide (or homolog or ortholog) is expressed, it can be isolated (i.e., purified). Secreted forms of the polypeptides can be isolated from cell culture media, while non-secreted forms must be isolated from the host cells. Polypeptides can be isolated by affinity chromatography. For example, an anti-gep103 antibody (e.g., produced as described herein) can be attached to a column and used to isolate the protein. Lysis and fractionation of cells harboring the protein prior to affinity chromatography can be performed by standard methods (see, e.g., Ausubel et al., supra). Alternatively, a fusion protein can be constructed and used to isolate a GEP polypeptide (e.g., a gep103-maltose binding fusion protein, a gep103-β-galactosidase fusion protein, or a gep103-trpE fusion protein; see, e.g., Ausubel et al., supra; New England Biolabs Catalog, Beverly, MA). The recombinant protein can, if desired, be further purified, e.g., by high performance liquid chromatography using standard techniques (see, e.g., Fisher, Laboratory Techniques In Biochemistry And Molecular Biology, eds., Work and Burdon, Elsevier, 1980).

Given the amino acid sequences described herein, polypeptides useful in practicing the invention, particularly fragments of GEP polypeptides, can be produced by standard chemical synthesis (e.g., by the methods described in Solid Phase Peptide Synthesis, 2nd ed., The Pierce Chemical Co., Rockford, IL, 1984) and used as antigens, for example.

Antibodies

The GEP polypeptides (or antigenic fragments or analogs of such polypeptides) can be used to raise antibodies useful in the invention, and such polypeptides can be produced by recombinant or peptide synthetic techniques (see, e.g., Solid Phase Peptide Synthesis, supra; Ausubel et al., supra). Likewise, antibodies can be raised against the GEP homologs and orthologs. In general, the polypeptides can be coupled to a carrier protein, such as KLH, as described in Ausubel et al., supra, mixed with an adjuvant, and injected into a host mammal.

Antibodies can be purified, for example, by affinity chromatography methods in which the polypeptide antigen is immobilized on a resin.

In particular, various host animals can be immunized by injection of a polypeptide of interest. Examples of suitable host animals include rabbits, mice, guinea pigs, and rats. Various adjuvants can be used to increase the immunological response, depending on the host species, including but not limited to Freund's adjuvant (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, BCG (bacille Calmette-Guerin) and Corynebacterium parvum. Polyclonal antibodies are heterogeneous populations of antibody molecules derived from the sera of the immunized animals.

Antibodies useful in the invention include monoclonal antibodies, polyclonal antibodies, humanized or chimeric antibodies, single chain antibodies, Fab fragments, F(ab')2 fragments, and molecules produced using a Fab expression library.

Monoclonal antibodies (mAbs), which are homogeneous populations of antibodies to a particular antigen, can be prepared using the GEP polypeptides or homologs or orthologs thereof and standard hybridoma technology (see, e. g., Kohler et al., Nature, 256: 495,1975; Kohler et al., Eur. J. Immunol., 6: 511,1976; Kohler et al., Eur. J. Immunol., 6: 292,1976; Hammerling et al., In Monoclonal Antibodies and T Cell Hybridomas, Elsevier, NY, 1981; Ausubel et al., supra).

In particular, monoclonal antibodies can be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture, such as those described in Kohler et al., Nature, 256: 495,1975, and U.S. Patent No. 4,376,110; the human B-cell hybridoma technique (Kosbor et al., Immunology Today, 4: 72,1983; Cole et al., Proc. Natl. Acad. Sci. USA, 80: 2026, 1983); and the EBV-hybridoma technique (Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96,1983). Such antibodies can be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD, and any subclass thereof. The hybridomas producing the mAbs of this invention can be cultivated in vitro or in vivo.

Once produced, polyclonal or monoclonal antibodies are tested for specific recognition of a GEP polypeptide or homolog or ortholog thereof in an immunoassay, such as a Western blot or immunoprecipitation analysis using standard techniques, e.g., as described in Ausubel et al., supra. Antibodies that specifically bind to the GEP polypeptides, or conservative variants and homologs or orthologs thereof, are useful in the invention. For example, such antibodies can be used in an immunoassay to detect a GEP polypeptide in pathogenic or non-pathogenic strains of bacteria.

Preferably, antibodies of the invention are produced using fragments of the GEP polypeptides that appear likely to be antigenic, by criteria such as high frequency of charged residues. In one specific example, such fragments are generated by standard techniques of PCR, and are then cloned into the pGEX expression vector (Ausubel et al., supra). Fusion proteins are expressed in E. coli and purified using a glutathione agarose affinity matrix as described in Ausubel, et al., supra.
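By way of illustration only, the criterion of a high frequency of charged residues can be sketched as a sliding-window scan over an amino acid sequence. The choice of D, E, K, and R as the charged residues, the window length of 15, and the function names are assumptions made for this sketch and are not taken from the disclosure.

```python
# Residues counted as charged in this sketch (an assumption; histidine is
# often treated as borderline and is left out here).
CHARGED = set("DEKR")


def charged_fraction(fragment: str) -> float:
    """Return the fraction of residues in `fragment` that are in CHARGED."""
    if not fragment:
        raise ValueError("fragment must be non-empty")
    return sum(1 for aa in fragment.upper() if aa in CHARGED) / len(fragment)


def rank_fragments(sequence: str, window: int = 15, top: int = 3):
    """Return up to `top` (start, fragment, fraction) tuples, highest fraction first.

    Windows slide one residue at a time; on ties the earlier window is kept first.
    """
    if window <= 0 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scored = []
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        scored.append((start, fragment, charged_fraction(fragment)))
    # sorted() is stable, so equal scores stay in order of position.
    return sorted(scored, key=lambda item: -item[2])[:top]
```

A fragment ranked this way would still need to be cloned, expressed, and tested as described above; the score is only a rough first filter.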

If desired, several (e. g., two or three) fusions can be generated for each protein, and each fusion can be injected into at least two rabbits. Antisera can be raised by injections in a series, typically including at least three booster injections.

Typically, the antisera are checked for their ability to immunoprecipitate a recombinant GEP polypeptide or homolog or ortholog, or unrelated control proteins, such as glucocorticoid receptor, chloramphenicol acetyltransferase, or luciferase.

Techniques developed for the production of "chimeric antibodies" (Morrison et al., Proc. Natl. Acad. Sci., 81: 6851,1984; Neuberger et al., Nature, 312: 604, 1984; Takeda et al., Nature, 314: 452,1984) can be used to splice the genes from a mouse antibody molecule of appropriate antigen specificity together with genes from a human antibody molecule of appropriate biological activity. A chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a murine mAb and a human immunoglobulin constant region.

Alternatively, techniques described for the production of single chain antibodies (U.S. Patents 4,946,778 and 4,704,692) can be adapted to produce single chain antibodies against a GEP polypeptide or homolog or ortholog. Single chain antibodies are formed by linking the heavy and light chain fragments of the Fv region via an amino acid bridge, resulting in a single chain polypeptide.

Antibody fragments that recognize and bind to specific epitopes can be generated by known techniques. For example, such fragments can include but are not limited to F(ab')2 fragments, which can be produced by pepsin digestion of the antibody molecule, and Fab fragments, which can be generated by reducing the disulfide bridges of F(ab')2 fragments. Alternatively, Fab expression libraries can be constructed (Huse et al., Science, 246: 1275,1989) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity.

Polyclonal and monoclonal antibodies that specifically bind to GEP polypeptides or homologs or orthologs can be used, for example, to detect expression of a GEP gene or homolog or ortholog in another strain of bacteria.

For example, a GEP polypeptide can be readily detected in conventional immunoassays of bacterial cells or extracts. Examples of suitable assays include, without limitation, Western blotting, ELISAs, radioimmune assays, and the like.

Assay for Antibacterial Agents

The invention provides a method for identifying an antibacterial agent(s).

Although the inventors are not bound by any particular theory as to the biological mechanism involved, the new antibacterial agents are thought to inhibit specifically (1) the function of a polypeptide(s) encoded by a nucleic acid located within an operon containing a GEP gene, or (2) expression of a gene located within an operon containing a GEP gene, or homologs or orthologs thereof. Screening for antibacterial agents can be rapidly accomplished by identifying those compounds (e.g., polypeptides or small molecules) that specifically bind to a polypeptide encoded by a nucleic acid located within an operon containing a GEP gene. A homolog or ortholog of a GEP polypeptide can be substituted for the GEP polypeptide in the methods summarized herein.

Specific binding of a test compound to a polypeptide can be detected, for example, in vitro by reversibly or irreversibly immobilizing the test compound(s) on a substrate, e.g., the surface of a well of a 96-well polystyrene microtitre plate. Methods for immobilizing polypeptides and other small molecules are well known in the art. For example, the microtitre plates can be coated with a polypeptide encoded by a nucleic acid located within an operon containing a GEP gene (e.g., a GEP polypeptide or a combination of GEP polypeptides and/or homologs and/or orthologs) by adding the polypeptide(s) in a solution (typically, at a concentration of 0.05 to 1 mg/ml in a volume of 1-100 µl) to each well, and incubating the plates at room temperature to 37°C for 0.1 to 36 hours. Polypeptides that are not bound to the plate can be removed by shaking the excess solution from the plate, and then washing the plate (once or repeatedly) with water or a buffer. Typically, the polypeptide, homolog, or ortholog is contained in water or a buffer. The plate is then washed with a buffer that lacks the bound polypeptide. To block the free protein-binding sites, the plates are treated with a protein that is unrelated to the bound polypeptide. For example, 300 µl of bovine serum albumin (BSA) at a concentration of 2 mg/ml in Tris-HCl is suitable. Suitable substrates include those that contain a defined cross-linking chemistry (e.g., plastic substrates, such as polystyrene, styrene, or polypropylene substrates from Corning Costar Corp. (Cambridge, MA)). If desired, a beaded particle, e.g., beaded agarose or beaded sepharose, can be used as the substrate.
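As a worked illustration of the quantities involved, the mass of protein delivered to each well follows directly from concentration and volume. The sketch below assumes that the coating volume of 1-100 and the blocking volume of 300 are both expressed in microliters; the function name is hypothetical.

```python
def mass_per_well_ug(concentration_mg_per_ml: float, volume_ul: float) -> float:
    """Return the mass of protein added to one well, in micrograms.

    1 mg/ml is numerically equal to 1 ug/ul, so the mass in micrograms is
    simply concentration (mg/ml) multiplied by volume (ul).
    """
    if concentration_mg_per_ml < 0 or volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    return concentration_mg_per_ml * volume_ul


# Extremes of the coating range given in the text.
coating_min_ug = mass_per_well_ug(0.05, 1)    # lowest concentration, smallest volume
coating_max_ug = mass_per_well_ug(1.0, 100)   # highest concentration, largest volume

# Blocking step: BSA at 2 mg/ml, 300 ul per well.
bsa_block_ug = mass_per_well_ug(2.0, 300)
```

Under these assumptions the coating step delivers between 0.05 and 100 micrograms of polypeptide per well, and the blocking step delivers 600 micrograms of BSA.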

Binding of the test compound to the new polypeptides (or homologs or orthologs thereof) can be detected by any of a variety of art-known methods. For example, an antibody that specifically binds to a GEP polypeptide can be used in an immunoassay. If desired, the antibody can be labeled (e. g., fluorescently or with a radioisotope) and detected directly (see, e. g., West and McMahon, J. Cell Biol. 74: 264,1977). Alternatively, a second antibody can be used for detection (e. g., a labeled antibody that binds to the Fc portion of an anti-GEP103 antibody).

In an alternative detection method, the GEP polypeptide is labeled, and the label is detected (e.g., by labeling a GEP polypeptide with a radioisotope, fluorophore, chromophore, or the like). In still another method, the GEP polypeptide is produced as a fusion protein with a protein that can be detected optically, e.g., green fluorescent protein (which can be detected under UV light). In an alternative method, the polypeptide (e.g., gep103) can be produced as a fusion protein with an enzyme having a detectable enzymatic activity, such as horseradish peroxidase, alkaline phosphatase, β-galactosidase, or glucose oxidase. Genes encoding all of these enzymes have been cloned and are readily available for use by those of skill in the art. If desired, the fusion protein can include an antigen, and such an antigen can be detected and measured with a polyclonal or monoclonal antibody using conventional methods. Suitable antigens include enzymes (e.g., horseradish peroxidase, alkaline phosphatase, and β-galactosidase) and non-enzymatic polypeptides (e.g., serum proteins, such as BSA and globulins, and milk proteins, such as caseins).

In various in vivo methods for identifying polypeptides that bind to GEP polypeptides, the conventional two-hybrid assays of protein/protein interactions can be used (see e.g., Chien et al., Proc. Natl. Acad. Sci. USA, 88: 9578,1991; Fields et al., U.S. Pat. No. 5,283,173; Fields and Song, Nature, 340: 245,1989; Le Douarin et al., Nucleic Acids Research, 23: 876,1995; Vidal et al., Proc. Natl. Acad. Sci. USA, 93: 10315-10320,1996; and White, Proc. Natl. Acad. Sci. USA, 93: 10001-10003,1996). Kits for practicing various two-hybrid methods are commercially available (e.g., from Clontech; Palo Alto, CA).

Generally, the two-hybrid methods involve in vivo reconstitution of two separable domains of a transcription factor. The DNA binding domain (DB) of the transcription factor is required for recognition of a chosen promoter. The activation domain (AD) is required for contacting other components of the host cell's transcriptional machinery. The transcription factor is reconstituted through the use of hybrid proteins. One hybrid is composed of the AD and a first protein of interest. The second hybrid is composed of the DB and a second protein of interest.

Useful reporter genes are those that are operably linked to a promoter which is specifically recognized by the DB. Typically, the two-hybrid system employs the yeast Saccharomyces cerevisiae and reporter genes, the expression of which can be selected under appropriate conditions. Other eukaryotic cells, including mammalian and insect cells, can be used, if desired. The two-hybrid system provides a convenient method for cloning a gene encoding a polypeptide (i.e., a candidate antibacterial agent) that binds to a second, preselected polypeptide (e.g., gep103). Typically, though not necessarily, a DNA library is constructed such that randomly generated sequences are fused to the AD, and the protein of interest (e.g., gep103) is fused to the DB.

In such two-hybrid methods, two fusion proteins are produced. One fusion protein contains the GEP polypeptide (or homolog or ortholog thereof) fused to either a transactivator domain or DNA binding domain of a transcription factor (e. g., of Gal4). The other fusion protein contains a test polypeptide fused to either the DNA binding domain or a transactivator domain of a transcription factor. Once brought together in a single cell (e. g., a yeast cell or mammalian cell), one of the fusion proteins contains the transactivator domain and the other fusion protein contains the DNA binding domain. Therefore, binding of the GEP polypeptide to the test polypeptide (i. e., candidate antibacterial agent) reconstitutes the transcription factor. Reconstitution of the transcription factor can be detected by detecting expression of a gene (i. e., a reporter gene) that is operably linked to a DNA sequence that is bound by the DNA binding domain of the transcription factor.
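The logic of the readout described above can be summarized in a small model: the reporter is expressed only when one fusion carries the activation domain, the other carries the DNA binding domain, and the two fused polypeptides bind one another. The sketch below is a simplified abstraction, not a simulation of the biology; the class, the function, and the `binds` predicate are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fusion:
    domain: str   # "AD" (activation domain) or "DB" (DNA binding domain)
    protein: str  # name of the fused polypeptide of interest


def reporter_expressed(first: Fusion, second: Fusion,
                       binds: Callable[[str, str], bool]) -> bool:
    """Return True when the two fusions would reconstitute the transcription factor.

    `binds(a, b)` is a caller-supplied predicate standing in for the physical
    interaction between the two fused polypeptides (an assumption of this model).
    """
    # Both halves of the transcription factor must be present, one per fusion.
    if {first.domain, second.domain} != {"AD", "DB"}:
        return False
    # The halves are brought together only if the fused polypeptides bind.
    return bool(binds(first.protein, second.protein))


# Hypothetical example: "candidateA" is assumed to bind gep103.
known_pairs = {frozenset({"gep103", "candidateA"})}


def example_binds(a: str, b: str) -> bool:
    return frozenset({a, b}) in known_pairs


hit = reporter_expressed(Fusion("DB", "gep103"), Fusion("AD", "candidateA"), example_binds)
```

In this model a library clone fused to the AD scores as a hit only when paired with the DB fusion of the preselected polypeptide, which mirrors the typical library arrangement described above.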

The methods described above can be used for high throughput screening of numerous test compounds to identify candidate antibacterial (or anti-bacterial) agents. Having identified a test compound as a candidate antibacterial agent, the candidate can, if desired, be further tested for inhibition of bacterial growth in vitro or in vivo (e.g., using an animal model system, such as a rodent model).

Using other, art-known variations of such methods, one can test the ability of a nucleic acid (e. g., DNA or RNA) used as the test compound to bind to a polypeptide encoded by a nucleic acid sequence located within an operon containing a GEP gene or homolog or ortholog thereof.

In vitro, further testing can be accomplished by means known to those in the art such as an enzyme inhibition assay or a whole-cell bacterial growth inhibition assay. For example, an agar dilution assay identifies a substance that inhibits bacterial growth. Microtiter plates are prepared with serial dilutions of the test compound, a given amount of growth substrate is added to the preparation, and a preparation of Streptococcus cells is provided. Inhibition of growth is determined, for example, by observing changes in optical densities of the bacterial cultures.
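A serial dilution of the kind referred to above can be laid out as follows. The twofold dilution factor, the starting concentration, and the number of wells are assumptions chosen for illustration; the text does not specify them.

```python
from typing import List


def serial_dilution(start_conc: float, factor: float = 2.0, steps: int = 8) -> List[float]:
    """Return the concentrations of a serial dilution, highest first.

    Each successive well holds the previous concentration divided by `factor`.
    """
    if start_conc <= 0:
        raise ValueError("start_conc must be positive")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [start_conc / factor ** i for i in range(steps)]


# Hypothetical series starting at 64 (in any concentration unit), diluted twofold.
series = serial_dilution(64.0, factor=2.0, steps=4)
```

Each concentration in the series would be paired with the same inoculum and growth substrate, so that differences in optical density can be attributed to the test compound.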

Inhibition of bacterial growth is demonstrated, for example, by comparing (in the presence and absence of a test compound) the rate of growth or the absolute growth of bacterial cells. Inhibition includes a reduction of one of the above measurements by at least 20% (e. g., at least 25%, 30%, 40%, 50%, 75%, 80%, or 90%).
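The comparison described above can be expressed as a percent reduction relative to the untreated control, with the 20% figure from the text as the default threshold. The sketch below assumes that growth is summarized as a single number per culture (for example, an optical density reading); the function names are hypothetical.

```python
def percent_inhibition(control: float, treated: float) -> float:
    """Return the percent reduction in growth relative to the untreated control."""
    if control <= 0:
        raise ValueError("control growth must be positive")
    return (control - treated) / control * 100.0


def is_inhibitory(control: float, treated: float, threshold_pct: float = 20.0) -> bool:
    """Return True when growth is reduced by at least `threshold_pct` percent."""
    return percent_inhibition(control, treated) >= threshold_pct


# Hypothetical readings: the control culture reaches 1.0 and the treated culture 0.5.
reduction = percent_inhibition(1.0, 0.5)
```

The same calculation applies whether the measurement is a growth rate or an absolute amount of growth, provided the control and treated values are measured in the same way.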

Rodent (e. g., murine) and rabbit animal models of streptococcal infections are known to those of skill in the art, and such animal model systems are accepted for screening antibacterial agents as an indication of their therapeutic efficacy in human patients. In a typical in vivo assay, an animal is infected with a pathogenic Streptococcus strain, e. g., by inhalation of Streptococcus pneumoniae, and conventional methods and criteria are used to diagnose the mammal as being afflicted with streptococcal pneumonia. The candidate antibacterial agent then is administered to the mammal at a dosage of 1-100 mg/kg of body weight, and the mammal is monitored for signs of amelioration of disease. Alternatively, the test compound can be administered to the mammal prior to infecting the mammal with Streptococcus, and the ability of the treated mammal to resist infection is measured.

Of course, the results obtained in the presence of the test compound should be compared with results in control animals, which are not treated with the test compound. Administration of candidate antibacterial agent to the mammal can be carried out as described below, for example.

Pharmaceutical Formulations

Treatment includes administering a pharmaceutically effective amount of a composition containing an antibacterial agent to a subject in need of such treatment, thereby inhibiting bacterial growth in the subject. Such a composition typically contains from about 0.1 to 90% by weight (such as 1 to 20% or 1 to 10%) of an antibacterial agent of the invention in a pharmaceutically acceptable carrier.

Solid formulations of the compositions for oral administration may contain suitable carriers or excipients, such as corn starch, gelatin, lactose, acacia, sucrose, microcrystalline cellulose, kaolin, mannitol, dicalcium phosphate, calcium carbonate, sodium chloride, or alginic acid. Disintegrators that can be used include, without limitation, microcrystalline cellulose, corn starch, sodium starch glycolate and alginic acid. Tablet binders that may be used include acacia, methylcellulose, sodium carboxymethylcellulose, polyvinylpyrrolidone (Povidone), hydroxypropyl methylcellulose, sucrose, starch, and ethylcellulose. Lubricants that may be used include magnesium stearates, stearic acid, silicone fluid, talc, waxes, oils, and colloidal silica.

Liquid formulations of the compositions for oral administration prepared in water or other aqueous vehicles may contain various suspending agents such as methylcellulose, alginates, tragacanth, pectin, kelgin, carrageenan, acacia, polyvinylpyrrolidone, and polyvinyl alcohol. The liquid formulations may also include solutions, emulsions, syrups and elixirs containing, together with the active compound (s), wetting agents, sweeteners, and coloring and flavoring agents.

Various liquid and powder formulations can be prepared by conventional methods for inhalation into the lungs of the mammal to be treated.

Injectable formulations of the compositions may contain various carriers such as vegetable oils, dimethylacetamide, dimethylformamide, ethyl lactate, ethyl carbonate, isopropyl myristate, ethanol, and polyols (glycerol, propylene glycol, liquid polyethylene glycol, and the like). For intravenous injections, water soluble versions of the compounds may be administered by the drip method, whereby a pharmaceutical formulation containing the antibacterial agent and a physiologically acceptable excipient is infused. Physiologically acceptable excipients may include, for example, 5% dextrose, 0.9% saline, Ringer's solution or other suitable excipients. For intramuscular preparations, a sterile formulation of a suitable soluble salt form of the compounds can be dissolved and administered in a pharmaceutical excipient such as Water-for-Injection, 0.9% saline, or 5% glucose solution. A suitable insoluble form of the compound may be prepared and administered as a suspension in an aqueous base or a pharmaceutically acceptable oil base, such as an ester of a long chain fatty acid (e.g., ethyl oleate).

A topical semi-solid ointment formulation typically contains a concentration of the active ingredient from about 1 to 20%, e.g., 5 to 10%, in a carrier such as a pharmaceutical cream base. Various formulations for topical use include drops, tinctures, lotions, creams, solutions, and ointments containing the active ingredient and various supports and vehicles.

The optimal percentage of the antibacterial agent in each pharmaceutical formulation varies according to the formulation itself and the therapeutic effect desired in the specific pathologies and correlated therapeutic regimens. Appropriate dosages of the antibacterial agents can readily be determined by those of ordinary skill in the art of medicine by monitoring the mammal for signs of disease amelioration or inhibition, and increasing or decreasing the dosage and/or frequency of treatment as desired. The optimal amount of the antibacterial compound used for treatment of conditions caused by or contributed to by bacterial infection may depend upon the manner of administration, the age and the body weight of the subject and the condition of the subject to be treated. Generally, the antibacterial compound is administered at a dosage of 1 to 100 mg/kg of body weight, and typically at a dosage of 1 to 10 mg/kg of body weight.
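As a simple illustration of the weight-based ranges stated above, the corresponding amounts for a given body weight can be computed as follows. This is arithmetic only; the constant and function names are hypothetical, and the 70 kg body weight is an assumed example value.

```python
from typing import Tuple

# Ranges taken from the text, in mg per kg of body weight.
GENERAL_RANGE_MG_PER_KG = (1.0, 100.0)   # "generally"
TYPICAL_RANGE_MG_PER_KG = (1.0, 10.0)    # "typically"


def dose_range_mg(body_weight_kg: float,
                  range_mg_per_kg: Tuple[float, float] = GENERAL_RANGE_MG_PER_KG
                  ) -> Tuple[float, float]:
    """Return the (low, high) amount in mg for a subject of the given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body_weight_kg must be positive")
    low, high = range_mg_per_kg
    return (low * body_weight_kg, high * body_weight_kg)


# Assumed example: a 70 kg subject.
general_mg = dose_range_mg(70.0)
typical_mg = dose_range_mg(70.0, TYPICAL_RANGE_MG_PER_KG)
```

For the assumed 70 kg subject, the general range corresponds to 70 to 7000 mg and the typical range to 70 to 700 mg; the appropriate amount within any such range remains a matter for those of ordinary skill in the art of medicine, as noted above.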

Example

Using the transposon-based mutagenesis methods described above, the Streptococcus pneumoniae genome was mutagenized, and 23 genes were identified as being located within operons that are essential for survival of Streptococcus pneumoniae. These genes are listed in Table 1, above, and their nucleic acid and amino acid sequences are represented by SEQ ID NOs: 1-69, as shown in Figs. 1-23.

Now that each of these genes is known to be located within an operon that is essential for survival of Streptococcus, the polypeptides encoded by nucleic acids located within those operons can be used to identify antibacterial agents by using the assays described herein. Other art-known assays to detect interactions of test compounds with proteins, or to detect inhibition of bacterial growth also can be used with the nucleic acids located within operons containing the GEP genes, and gene products and homologs or orthologs thereof.

Other Embodiments

The invention also features fragments, variants, analogs, and derivatives of the GEP polypeptides described above that retain one or more of the biological activities of the GEP polypeptides, e.g., as determined in a complementation assay.

Also included within the invention are naturally-occurring and non-naturally-occurring allelic variants. Compared with the naturally-occurring GEP gene sequences depicted in Figs. 1-23, the nucleic acid sequence encoding allelic variants may have a substitution, deletion, or addition of one or more nucleotides. The preferred allelic variants are functionally equivalent to a GEP polypeptide, e.g., as determined in a complementation assay.

It is to be understood that, while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
