Title:
ORGANELLE TARGETING SEQUENCES
Document Type and Number:
WIPO Patent Application WO/2000/012732
Kind Code:
A2
Abstract:
Compositions and methods are provided for modulating the subcellular localization of proteins in a cell. Compositions include nucleotide and amino acid sequences of transit peptide sequences from maize. Such sequences find utility in the enhanced or modified localization of protein to a plastid or compartment thereof.

Inventors:
BENSEN ROBERT J (US)
Application Number:
PCT/US1999/018955
Publication Date:
March 09, 2000
Filing Date:
August 25, 1999
Assignee:
PIONEER HI BRED INT (US)
BENSEN ROBERT J (US)
International Classes:
C07K14/415; C12N15/82; (IPC1-7): C12N15/82; A01H5/00; C07K14/415; C12N5/10; C12N15/62
Domestic Patent References:
WO1997004114A21997-02-06
WO1998020144A21998-05-14
WO1991003561A11991-03-21
WO2000005387A12000-02-03
WO2000005353A22000-02-03
Foreign References:
US5639952A1997-06-17
EP0508909A11992-10-14
EP0337899A11989-10-18
Other References:
DELLA-CIOPPA, G. ET AL.: "Targeting a herbicide-resistant enzyme from Escherichia coli to chloroplasts of higher plants." BIO/TECHNOLOGY, vol. 5, June 1987 (1987-06), pages 579-84, XP002140206
KAVANAGH T A ET AL: "TARGETING A FOREIGN PROTEIN TO CHLOROPLASTS USING FUSIONS TO THE TRANSIT PEPTIDE OF A CHLOROPHYLL A/B PROTEIN" MOLECULAR AND GENERAL GENETICS,DE,SPRINGER VERLAG, BERLIN, vol. 215, no. 1, 1 December 1988 (1988-12-01), pages 38-45, XP000027953 ISSN: 0026-8925
DATABASE EMBL - EMEST2 [Online] Entry/Acc.no. AA661454, 14 November 1997 (1997-11-14) BAYSDORFER C.: "zEST00799 Maize Leaf, Stratagene #937005 Zea mays cDNA clone csuh00799 5' end similar to plastid ribosomal protein CL9, mRNA sequence." XP002140210
GANTT, J.S.: "Nucleotide sequences of cDNAs encoding four complete nuclear-encoded plastid ribosomal proteins." CURRENT GENETICS, vol. 14, 1988, pages 519-28, XP000914620
DATABASE NCBI - DBEST [Online] Acc.no. 1738932, 5 June 1998 (1998-06-05) WEN, T.J. ET AL.: "MEST6-D3.TW1412.Seq ISUM2 Zea mays cDNA clone MEST6-D3 5', mRNA sequence." retrieved from HTTP://WWW.NCBI.NLM.NIH.GOV/IRX/CGI-BIN/BI RX_DOC?DBEST+1641583 XP002140211
SMOOKER, P.M. ET AL.: "Ribosomal protein L35: identification in spinach chloroplasts and isolation of a cDNA clone encoding its cytoplasmic precursor." BIOCHEMISTRY, vol. 29, 1990, pages 9733-6, XP002140207
DATABASE NCBI - DBEST [Online] Acc.no. 1716328, 26 May 1998 (1998-05-26) WEN, T.J. ET AL.: "MEST2-D1.TW1412.Seq ISUM2 Zea mays cDNA clone MEST2-D1 5', mRNA sequence." retrieved from HTTP://WWW.NCBI.NLM.NIH.GOV/IRX/CGI-BIN/BI RX_DOC?DBEST+1621781 XP002140212
SMEEKENS, S. ET AL.: "Import into chloroplasts of a yeast mitochondrial protein directed by ferredoxin and plastocyanin transit peptides." PLANT MOLECULAR BIOLOGY, vol. 9, 1987, pages 377-88, XP002140208
DATABASE NCBI - DBEST [Online] Id: 1290373, Acc.no. C72774, 22 September 1997 (1997-09-22) SASAKI, T.: "Rice cDNA, partial sequence (E2219_1A)." retrieved from HTTP://WWW.NCBI.NLM.NIH.GOV/IRX/CGI-BIN/BI RX_DOC?DBEST+1216681 XP002140213
GÖRLACH, J. ET AL.: "Differential expression of tomato (Lycopersicon esculentum L.) genes encoding shikimate pathway isoenzymes. II. Chorismate synthase." PLANT MOLECULAR BIOLOGY, vol. 23, 1993, pages 707-16, XP002140209
KRUSE, E. ET AL.: "Coproporphyrinogen III oxidase from barley and tobacco - sequence analysis and initial expression studies." PLANTA, vol. 196, 1995, pages 796-803, XP000920807
VON HEIJNE, G. ET AL.: "Domain structure of mitochondrial and chloroplast targeting peptides" EUROPEAN JOURNAL OF BIOCHEMISTRY, vol. 180, 1989, pages 535-45, XP000877203 cited in the application
ARCHER, E.K. ET AL.: "Current views on chloroplast protein import and hypotheses on the origin of the transport mechanism." JOURNAL OF BIOENERGETICS AND BIOMEMBRANES, vol. 22, 1990, pages 789-810, XP000877205 cited in the application
LEROUX, B. ET AL.: "Engineering herbicide resistance in tobacco plants by expression of a bromoxynil specific nitrilase." BULLETIN DE LA SOCIETE BOTANIQUE DE FRANCE, vol. 137, no. 3/4, 1990, pages 65-78, XP000884652
Attorney, Agent or Firm:
Spruill, Murray W. (NC, US)
Claims:
THAT WHICH IS CLAIMED
1. A method of modulating the subcellular localization of a protein of interest in a plant or plant cell, said method comprising transforming said plant or plant cell with an expression cassette comprising a promoter operably linked to a nucleotide sequence encoding a transit peptide operably linked to a nucleotide sequence encoding a protein of interest, wherein said transit peptide directs the protein of interest to a plant plastid and said transit peptide is selected from the group consisting of : a) a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in one of the SEQ ID NOS: 2,4,6,8,10,12,14, 16,18,20, or 22; b) a nucleic acid molecule comprising a sequence set forth in one of the SEQ ID NOS: 1,3,5,7,9,11,13,15,17,19, or 21; c) a nucleic acid molecule hybridizing under stringent conditions to the sequences of a) or b).
2. The method of claim 1, wherein said plant plastid is selected from the group comprising a chloroplast, amyloplast, chromoplast, and leucoplast.
3. The method of claim 1, wherein said promoter is a constitutive promoter.
4. The method of claim 1, wherein said promoter is a tissue-specific promoter.
5. The method of claim 1, wherein said protein of interest imparts herbicide resistance.
6. The method of claim 5, wherein said protein of interest is 5-enolpyruvylshikimate-3-phosphate synthase.
7. An isolated nucleic acid molecule comprising a nucleotide sequence selected from the group consisting of : a) a nucleotide sequence encoding a polypeptide comprising the amino acid sequence set forth in one of SEQ ID NOS: 2,4,6,8,10,12,14,16, 18,20, or 22; b) a nucleotide sequence comprising the sequence set forth in SEQ ID NOS: 15,17,19, or 21; c) a nucleotide sequence hybridizing under stringent conditions to a nucleotide sequence of a) or b).
8. An expression cassette comprising a promoter operably linked to a sequence encoding a transit peptide operably linked to a gene of interest, wherein said sequence encoding a transit peptide is selected from the group consisting of : a) a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in one of the SEQ ID NOS: 2,4,6,8,10,12,14, 16,18,20, or 22; b) a nucleic acid sequence comprising a sequence set forth in one of the SEQ ID NOS: 15,17,19, or 21; c) a nucleic acid sequence hybridizing under stringent conditions to a sequence of a) or b).
9. A vector comprising the expression cassette of claim 8.
10. A transformed plant having stably incorporated in its genome an expression cassette comprising the following operably linked elements; a promoter, a coding sequence for a protein of interest, and a nucleotide sequence encoding a transit peptide, wherein said nucleotide sequence encoding a transit peptide is selected from the group consisting of : a) a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in one of the SEQ ID NOS: 2,4,6,8,10,12,14, 16,18,20, or 22; b) a nucleic acid molecule comprising a sequence set forth in one of the SEQ ID NOS: 15,17,19, or 21; c) a nucleic acid sequence hybridizing under stringent conditions to the sequences of a) or b).
11. The plant of claim 10, wherein said plant is a dicot.
12. The plant of claim 10, wherein said plant is a monocot.
13. The plant of claim 12, wherein said monocot is maize.
14. Seed of the plant of claim 10.
15. A transformed plant cell having stably incorporated in its genome an expression cassette comprising the following operably linked elements: a promoter, a coding sequence for a protein of interest, and a nucleotide sequence encoding a transit peptide, wherein said sequence encoding the transit peptide is selected from the group consisting of: a) a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in one of the SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, or 22; b) a nucleic acid molecule comprising a sequence set forth in one of the SEQ ID NOS: 15, 17, 19, or 21; c) a nucleic acid molecule hybridizing under stringent conditions to the sequences of a) or b).
16. An isolated polypeptide selected from the group consisting of : a) a polypeptide comprising an amino acid sequence set forth in SEQ ID NOS: 2,4,6,8,10,12,14,16,18,20, or 22; b) a polypeptide encoded by a nucleotide sequence comprising the sequence set forth in SEQ ID NOS: 1,3,5,7,9,11,13,15,17,19, or 21; c) a polypeptide encoded by a nucleotide sequence that hybridizes under stringent conditions to a nucleotide sequence comprising the sequence set forth in SEQ ID NOS: 1,3,5,7,9,11,13,15,17,19, or 21.
Description:
ORGANELLE TARGETING SEQUENCES

FIELD OF THE INVENTION

The invention is drawn to the genetic modification of plants, particularly to the targeting of proteins to cellular organelles.

BACKGROUND OF THE INVENTION

Plastids are a class of plant organelles derived from proplastids and include chloroplasts, leucoplasts, amyloplasts, and chromoplasts. The plastids are major sites of biosynthesis in plants. In addition to photosynthesis in the chloroplast, plastids are also sites of lipid biosynthesis, nitrate reduction to ammonium, and starch storage. Although plastids contain their own circular genome, most of the proteins localized to the plastids are encoded by the nuclear genome and are imported into the organelle from the cytoplasm.

The mechanism of protein import into the plastids has been most extensively studied in the chloroplast. The chloroplast is a complex cellular organelle composed of three membranes: the inner envelope membrane, the outer envelope membrane, and the thylakoid membrane. The membranes together enclose three aqueous compartments termed the intermediate space, the stroma, and the thylakoid lumen.

Proteins imported from the cytosol generally contain, at their amino terminus, short sequences referred to as "transit peptides" that are responsible for post-translational targeting of the protein to the chloroplast. The import process is initiated by binding of precursor proteins to the chloroplast surface, followed by translocation of the precursor protein across the chloroplast envelope membranes. The transit peptide is typically an expendable part of the protein, and upon translocation into the chloroplast the transit peptide sequence is cleaved from the precursor protein. Further sub-organellar sorting of the modified precursor takes place as appropriate.

Genes reported to have naturally encoded transit peptide sequences at their N-terminus include the chloroplast small subunit of ribulose-1,5-bisphosphate carboxylase (Rubisco), de Castro Silva Filho et al. (1996) Plant Mol. Biol. 30: 769-780; Schnell, D. J. et al. (1991) J. Biol. Chem. 266 (5): 3335-3342; 5-(enolpyruvyl)shikimate-3-phosphate synthase (EPSPS), Archer et al. (1990) J. Bioenerg. and Biomemb. 22 (6): 789-810; tryptophan synthase, Zhao, J. et al. (1995) J. Biol. Chem. 270 (11): 6081-6087; plastocyanin, Lawrence et al. (1997) J. Biol. Chem. 272 (33): 20357-20363; chorismate synthase, Schmidt et al. (1993) J. Biol. Chem. 268 (36): 27477-27457; and the light harvesting chlorophyll a/b binding protein (LHBP), Lamppa et al. (1988) J. Biol. Chem. 263: 14996-14999.

Statistical analysis of transit peptides that direct protein localization to chloroplasts has revealed a sequence profile for transit peptides with the following characteristics. In the central region, the peptides typically contain an exceptionally high content of basic and hydroxylated amino acids, such as serine and threonine.

In addition, there is a near absence of negatively charged amino acids such as aspartic acid, glutamic acid, asparagine, and glutamine. The amino-terminal region is devoid of charged amino acids and lacks turn promoting amino acids such as glycine and proline. The carboxy terminal domain is high in arginine and has a capacity for forming an amphipathic beta sheet secondary structure. The length of the transit peptide is variable, commonly between 50 and 120 amino acids. In addition, there is a well conserved cleavage site (V/I) X (C/A) A. Often one or more arginines are found some 5 to 10 residues upstream of this cleavage site.

Exceptions to the transit peptide signals described above are known. See von Heijne et al. (1989) Eur. J. Biochem 180: 535-545.
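The sequence profile described above can be turned into a rough screening aid. The following is a minimal sketch, not a validated predictor: the function name `profile_transit_peptide`, the choice of statistics, and the example sequence in the usage note are illustrative assumptions, and only the cleavage-site consensus (V/I) X (C/A) A is taken from the text.

```python
import re

# Cleavage-site consensus quoted in the text: (V/I)-X-(C/A)-A, X = any residue.
# A zero-width lookahead is used so that overlapping matches are all reported.
CLEAVAGE_MOTIF = re.compile(r"(?=[VI].[CA]A)")


def profile_transit_peptide(seq):
    """Return simple composition statistics for a candidate transit peptide.

    `seq` is a one-letter amino acid string. The statistics are descriptive
    only; no threshold is implied by the source text.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)

    def count(residues):
        return sum(seq.count(r) for r in residues)

    return {
        "length": n,
        "hydroxylated_fraction": count("ST") / n,  # serine + threonine
        "basic_fraction": count("KR") / n,         # lysine + arginine
        "acidic_fraction": count("DE") / n,        # aspartate + glutamate
        # 0-based start positions of every (V/I) X (C/A) A occurrence
        "cleavage_motif_starts": [m.start() for m in CLEAVAGE_MOTIF.finditer(seq)],
    }
```

For a made-up peptide such as `"MASSTSSRRVQCAAT"`, the function reports the motif `VQCA` starting at position 9 and an acidic fraction of zero. Because exceptions to the profile are known, a result of this kind is at most a hint and does not establish targeting activity.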

Because proteins containing transit peptides are localized to the chloroplast with a high degree of specificity (Boutry et al. (1987) Nature 328: 340-342; de Boer et al. (1991) Biochim. Biophys. Acta 1071: 221-253), transit peptide sequences prove useful in recombinant DNA technology. For example, transit peptide sequences may be inserted into an expression cassette and serve to guide the expressed protein to the chloroplast. In plants, transit peptide signals have been useful in the localization of proteins responsible for herbicide or antibacterial resistance to the chloroplast.

Although transit peptides have been described, only a few have been utilized successfully in attempts to target chimeric molecules to chloroplasts. Thus, there is a need for additional DNA sequences that encode transit peptides for use in future genetic engineering projects that require specific targeting of foreign proteins to chloroplasts.

SUMMARY OF THE INVENTION

The present invention provides methods and compositions for the subcellular localization of proteins. Specifically, the invention provides a means to direct the localization of a protein to a plant cell organelle, more particularly to a plant plastid. Compositions of the present invention include the nucleotide and amino acid sequences of novel plastid targeting sequences, hereinafter referred to as transit peptides. Such sequences find utility in the enhanced or modified localization of proteins to a plastid or compartment thereof.

Further compositions of the invention include expression cassettes and transformation vectors comprising the isolated nucleotide sequences of the transit peptides. Also provided are transgenic plants, plant cells, and plant tissue that express proteins that have been localized to a plastid using the transit peptides of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 schematically illustrates a plasmid vector comprising a ubiquitin promoter operably linked to a transit peptide sequence of the invention operably linked to a gene of interest.

DETAILED DESCRIPTION OF THE INVENTION

Compositions and methods are provided for modulating the subcellular localization of proteins. Specifically, the compositions of the invention include maize transit peptide sequences that find use in modulating the cellular localization of a protein of interest. In particular, the transit peptides of the invention find use in the localization of proteins to plant organelles, particularly to plastids and compartments thereof.

By "plastid" is intended a class of plant cell organelles comprising proplastids, leucoplasts, amyloplasts, chromoplasts, and chloroplasts. By "plastid or compartment thereof" is intended any plastid structure, membrane or compartment of a plastid. For example, when referring to a chloroplast, "a compartment thereof" encompasses the intermediate envelope space, the stroma, the lumen, the outer envelope, the inner envelope and the thylakoid membrane.

A signal or targeting sequence is a structural peptide domain required for targeting of a given polypeptide to a subcellular organelle or subcellular compartment, or for secretion from the cell. The transport of a protein of interest to a subcellular compartment is accomplished by operably linking the nucleotide sequence encoding a signal sequence to the 5' and/or 3' region of the gene encoding the protein of interest. During protein synthesis and processing, the targeting sequence influences where the protein of interest is ultimately compartmentalized.

By "transit peptide" is intended a polypeptide that directs the transport of a nuclear encoded protein to a plastid or a compartment thereof. Typically, the transit peptide sequence is located at the amino-terminus of a polypeptide. However, the transit peptide may also be located at either the C-terminus or internally in the polypeptide.

The maize sequences provided by the present invention include a maize transit peptide having homology to the maize light harvesting chlorophyll a/b binding protein (SEQ ID NOS: 1 and 2). The present invention also provides a transit peptide having homology to the maize ribulose bisphosphate carboxylase/oxygenase protein (SEQ ID NOS: 3 and 4).

Also provided are maize transit peptide sequences that share homology to transit peptide sequences of various non-maize gene products including EPSP synthase (SEQ ID NOS: 5 and 6), tryptophan synthase component (SEQ ID NOS: 7 and 8), ribosomal protein L35 (SEQ ID NOS: 9 and 10), plastid ribosomal protein CL9 (SEQ ID NOS: 11 and 12), plastocyanin (SEQ ID NOS: 13 and 14), 3-dehydroquinate synthase (SEQ ID NOS: 15 and 16), plastid ribosomal protein CL15 (SEQ ID NOS: 17 and 18), chorismate synthase (SEQ ID NOS: 19 and 20), and coproporphyrinogen oxidase (SEQ ID NOS: 21 and 22).

In particular, the present invention provides for isolated nucleic acid molecules comprising nucleotide sequences encoding the amino acid sequences shown in SEQ ID NOS: and 22. Further provided are polypeptides having an amino acid sequence encoded by a nucleic acid molecule described herein, for example those set forth in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and 21, and fragments and variants thereof.

The invention encompasses isolated or substantially purified nucleic acid or polypeptide compositions. An "isolated" or "purified" nucleic acid molecule or protein, or biologically active portion thereof, is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. Preferably, an "isolated" nucleic acid is free of sequences (preferably protein encoding sequences) that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleotide sequences that naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. A polypeptide that is substantially free of cellular material includes preparations of polypeptides having less than about 30%, 20%, 10%, or 5% (by dry weight) of contaminating polypeptides. When the polypeptide of the invention or biologically active portion thereof is recombinantly produced, preferably culture medium represents less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors or non-protein-of-interest chemicals.

Fragments and variants of the disclosed nucleotide sequences and the polypeptides encoded thereby are also encompassed by the present invention. By "fragment" is intended a portion of the nucleotide sequence or a portion of the amino acid sequence and hence polypeptide encoded thereby. Fragments of a nucleotide sequence may encode polypeptide fragments that retain the biological activity of the native polypeptide and hence facilitate the transport of a nuclear encoded protein to a plastid or a compartment thereof. Alternatively, fragments of a nucleotide sequence that are useful as hybridization probes generally do not encode fragment polypeptides retaining biological activity. Thus, fragments of a nucleotide sequence may range from at least about 20 nucleotides, about 50 nucleotides, about 100 nucleotides, and up to the full-length nucleotide sequence encoding the proteins of the invention.

A fragment of a transit peptide nucleotide sequence that encodes a biologically active portion of a transit peptide of the invention will encode at least 50, 60, 70, 80, 90, or 100 contiguous amino acids, or up to the total number of amino acids present in a full-length transit peptide of the invention (for example, 135, 117, 144, 152, 150, 152, 145, 132, 134, 166, and 107 amino acids for SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, and 22, respectively).

Fragments of a transit peptide nucleotide sequence that are useful as hybridization probes or PCR primers generally need not encode a biologically active portion of a transit peptide.

Thus, a fragment of a transit peptide nucleotide sequence may encode a biologically active portion of a transit peptide, or it may be a fragment that can be used as a hybridization probe or PCR primer using methods disclosed below. A biologically active portion of a transit peptide can be prepared by isolating a portion of one of the transit peptide nucleotide sequences of the invention, expressing the encoded portion of the transit peptide (e.g., by recombinant expression in vitro), and assessing the activity of the encoded portion of the transit peptide. Nucleic acid molecules that are fragments of a transit peptide nucleotide sequence comprise at least 16, 20, 50, 75, 100, 150, 200, 250, 300, 350, or 400 nucleotides, or up to the number of nucleotides present in a full-length transit peptide nucleotide sequence disclosed herein (for example, 407, 426, 455, 459, 560, 461, 437, 463, 448, 528, and 427 nucleotides for SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and 21, respectively).

By "variants" is intended substantially similar sequences. For nucleotide sequences, conservative variants include those sequences that, because of the degeneracy of the genetic code, encode the amino acid sequence of one of the transit peptides of the invention. Naturally occurring variants such as these can be identified with the use of well-known molecular biology techniques, as, for example, with polymerase chain reaction (PCR) and hybridization techniques as outlined below. Variant nucleotide sequences also include synthetically derived nucleotide sequences, such as those generated, for example, by using site-directed mutagenesis but which still encode a transit peptide or protein of the invention.

Generally, nucleotide sequence variants of the invention will have at least 40%, 50%, 60%, 70%, generally 80%, preferably 85%, 90%, up to 95% or 98% sequence identity to their respective native nucleotide sequences.
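The identity thresholds above presuppose a way of scoring two sequences. The following minimal sketch assumes the two sequences have already been aligned to equal length with `-` marking gaps, and it counts identity only over columns where neither sequence has a gap. That denominator is one common convention among several and is an assumption here, since the source does not specify an alignment program or scoring scheme. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are ignored, so the
    result is identity over aligned (gap-free) positions only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        raise ValueError("no gap-free aligned positions to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)
```

With this convention, `percent_identity("ATGGCC", "ATGACC")` is five matches out of six positions, or about 83.3%, which would satisfy the 80% level but not the 85% level mentioned in the text.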

By "variant" polypeptide is intended a polypeptide derived from the native polypeptide by deletion (so-called truncation) or addition of one or more amino acids to the N-terminal and/or C-terminal end of the native polypeptide; deletion or addition of one or more amino acids at one or more sites in the native polypeptide; or substitution of one or more amino acids at one or more sites in the native protein. Such variants may result from, for example, genetic polymorphism or from human manipulation.

The transit polypeptides of the invention may be altered in various ways including amino acid substitutions, deletions, truncations, and insertions. Methods for such manipulations are generally known in the art. For example, amino acid sequence variants of the transit peptide can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations are well known in the art. See, for example, Kunkel (1985) Proc. Natl. Acad. Sci. USA 82: 488-492; Kunkel et al. (1987) Methods in Enzymol. 154: 367-382; U.S. Patent No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein. Guidance as to appropriate amino acid substitutions that do not affect biological activity of the protein of interest may be found in the model of Dayhoff et al. (1978) Atlas of Protein Sequence and Structure (Natl. Biomed. Res. Found., Washington, D.C.), herein incorporated by reference. Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be preferred.

Thus, the nucleotide sequences of the invention include both the naturally occurring sequences as well as mutant forms. Likewise, the polypeptides of the invention encompass both naturally occurring polypeptides as well as variations and modified forms thereof. Such variants will continue to possess the desired ability to facilitate the transport of a nuclear encoded protein to a plastid or compartment thereof. Obviously, the mutations that will be made in the DNA encoding the variant must not place the sequence out of reading frame and preferably will not create complementary regions that could produce secondary mRNA structure. See EP Patent Application Publication No. 75,444.
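The reading-frame requirement can be checked mechanically before a construct is built. The sketch below is illustrative only and assumes an N-terminal fusion in which the transit peptide coding segment sits directly 5' of the coding sequence of the protein of interest, with no linker. The function name `fusion_in_frame` and the toy sequences in the usage note are hypothetical, and the check does not address mRNA secondary structure.

```python
# Standard stop codons in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fusion_in_frame(transit_cds, gene_cds):
    """Return True if the fused coding sequence keeps the downstream gene in frame.

    Two conditions are tested:
      1. the transit peptide segment is a whole number of codons, and
      2. no stop codon occurs in frame before the final codon of the fusion.
    """
    transit_cds = transit_cds.upper()
    gene_cds = gene_cds.upper()
    if len(transit_cds) % 3 != 0:
        return False  # would shift the reading frame of the downstream gene
    fused = transit_cds + gene_cds
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [fused[i:i + 3] for i in range(0, len(fused) - 2, 3)]
    # A premature in-frame stop would truncate the fusion protein.
    return not any(codon in STOP_CODONS for codon in codons[:-1])
```

For example, `fusion_in_frame("ATGGCT", "ATGAAATAA")` returns `True`, while a five-nucleotide transit segment such as `"ATGGC"` or a segment carrying its own stop codon such as `"ATGTAA"` returns `False`.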

The deletions, insertions, and substitutions of the polypeptide sequences encompassed herein are not expected to produce radical changes in the characteristics of the polypeptide. However, when it is difficult to predict the exact effect of the substitution, deletion, or insertion in advance of doing so, one skilled in the art will appreciate that the effect will be evaluated by routine screening assays. That is, the activity can be evaluated by the ability of the isolated sequences to target and deliver a reporter protein to a plastid or compartment thereof. See, for example, de Castro Silva Filho et al. (1996) Plant Mol. Biol. 30: 769-780, herein incorporated by reference.

Variant nucleotide sequences and polypeptides also encompass sequences and polypeptides derived from a mutagenic and recombinogenic procedure such as DNA shuffling. With such a procedure, one or more different transit peptide sequences can be manipulated to create a new transit peptide sequence possessing the desired properties. In this manner, libraries of recombinant polynucleotides are generated from a population of related sequence polynucleotides comprising sequence regions that have substantial sequence identity and can be homologously recombined in vitro or in vivo. For example, using this approach, sequence motifs encoding a domain of interest may be shuffled between the transit peptide sequences of the invention and other known signal sequences or transit peptide sequences to obtain a new nucleotide sequence coding for a transit peptide with an improved property of interest, such as an increased Km or an increased efficiency and/or specificity of plastid targeting. Strategies for such DNA shuffling are known in the art. See, for example, Stemmer (1994) Proc. Natl. Acad. Sci. USA 91: 10747-10751; Stemmer (1994) Nature 370: 389-391; Crameri et al. (1997) Nature Biotech. 15: 436-438; Moore et al. (1997) J. Mol. Biol. 272: 336-347; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA 94: 4504-4509; Crameri et al. (1998) Nature 391: 288-291; and U. S. Patent Nos. 5,605,793 and 5,837,458.

The nucleotide sequences of the invention can be used to isolate corresponding sequences from other organisms, particularly other plants, more particularly other monocots. In this manner, methods such as PCR, hybridization, and the like can be used to identify such sequences based on their sequence homology to the sequences set forth herein. Sequences isolated based on their sequence identity to the entire transit peptide sequences set forth herein or to fragments thereof are encompassed by the present invention.

In a PCR approach, oligonucleotide primers can be designed for use in PCR reactions to amplify corresponding DNA sequences from cDNA or genomic DNA extracted from any plant of interest. Methods for designing PCR primers and PCR cloning are generally known in the art and are disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York). See also Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. (1995) PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. (1999) PCR Methods Manual (Academic Press, New York). Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially-mismatched primers, and the like.

In hybridization techniques, all or part of a known nucleotide sequence is used as a probe that selectively hybridizes to other corresponding nucleotide sequences present in a population of cloned genomic DNA fragments or cDNA fragments (i.e., genomic or cDNA libraries) from a chosen organism. The hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments, or other oligonucleotides, and may be labeled with a detectable group such as 32P, or any other detectable marker. Thus, for example, probes for hybridization can be made by labeling synthetic oligonucleotides based on the transit peptide sequences of the invention. Methods for preparation of probes for hybridization and for construction of cDNA and genomic libraries are generally known in the art and are disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York).

For example, the entire nucleotide sequence encoding a transit peptide disclosed herein, or one or more portions thereof, may be used as a probe capable of specifically hybridizing to corresponding transit peptide sequences and messenger RNAs. To achieve specific hybridization under a variety of conditions, such probes include sequences that are unique among transit peptide sequences and are preferably at least about 10 nucleotides in length, and most preferably at least about 20 nucleotides in length. Such probes may be used to amplify corresponding transit peptide sequences from a chosen plant by PCR. This technique may be used to isolate additional coding sequences from a desired plant or as a diagnostic assay to determine the presence of coding sequences in a plant. Hybridization techniques include hybridization screening of plated DNA libraries (either plaques or colonies; see, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York)).

Hybridization of such sequences may be carried out under stringent conditions. By "stringent conditions" or "stringent hybridization conditions" is intended conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences that are 100% complementary to the probe can be identified (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, preferably less than 500 nucleotides in length.

Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37°C, and a wash in 1X to 2X SSC (20X SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55°C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37°C, and a wash in 0.5X to 1X SSC at 55 to 60°C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.1X SSC at 60 to 65°C.

Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution.

For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl (1984) Anal. Biochem. 138: 267-284: Tm = 81.5°C + 16.6 (log M) + 0.41 (% GC) - 0.61 (% form) - 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. Tm is reduced by about 1°C for each 1% of mismatching; thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with >90% identity are sought, the Tm can be decreased 10°C. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4°C lower than the thermal melting point (Tm); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10°C lower than the thermal melting point (Tm); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20°C lower than the thermal melting point (Tm).
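By way of illustration only, the Meinkoth and Wahl approximation and the rule of about 1°C per 1% of mismatching can be sketched in Python. The function names, parameter names, and example values below are hypothetical and form no part of the disclosure; the log M term is assumed to be a base-10 logarithm.

```python
import math


def approximate_tm(molarity, pct_gc, pct_formamide, length_bp):
    """Approximate Tm (deg C) of a DNA-DNA hybrid.

    Tm = 81.5 + 16.6 (log M) + 0.41 (% GC) - 0.61 (% form) - 500/L
    Assumption: the log term is base 10.
    """
    if molarity <= 0 or length_bp <= 0:
        raise ValueError("molarity and length_bp must be positive")
    return (81.5
            + 16.6 * math.log10(molarity)
            + 0.41 * pct_gc
            - 0.61 * pct_formamide
            - 500.0 / length_bp)


def tm_for_identity(tm_perfect, pct_identity):
    """Lower the perfect-match Tm by about 1 deg C per 1% of mismatching."""
    return tm_perfect - (100.0 - pct_identity)


# Hypothetical example: 500 bp hybrid, 50% GC, 1 M monovalent cation,
# 50% formamide in the hybridization solution.
tm = approximate_tm(molarity=1.0, pct_gc=50.0, pct_formamide=50.0, length_bp=500)
tm_90 = tm_for_identity(tm, pct_identity=90.0)
print(round(tm, 1), round(tm_90, 1))
```

A wash temperature for a given stringency class may then be chosen by subtracting the desired offset (for example, 5°C) from the value returned.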

Using the equation, hybridization and wash compositions, and desired Tm, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a Tm of less than 45°C (aqueous solution) or 32°C (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, Part I, Chapter 2 (Elsevier, New York); and Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing and Wiley-Interscience, New York). See Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York).

In general, sequences that encode a transit peptide and hybridize to the nucleic acid sequences encoding a transit peptide disclosed herein will be at least 40% to 50% homologous, about 60% to 70% homologous, and even about 80%, 85%, 90%, 95% to 98% homologous or more with the disclosed sequences. That is, the sequence similarity of sequences may range, sharing at least about 40% to 50%, about 60% to 70%, and even about 80%, 85%, 90%, 95% to 98% sequence similarity.

The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: (a) "reference sequence", (b) "comparison window", (c) "sequence identity", (d) "percentage of sequence identity", and (e) "substantial identity".

(a) As used herein, "reference sequence" is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.

(b) As used herein, "comparison window" makes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that, to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence, a gap penalty is typically introduced and is subtracted from the number of matches.

Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith et al. (1981) Adv. Appl. Math. 2: 482; by the homology alignment algorithm of Needleman et al. (1970) J. Mol. Biol. 48: 443; by the search for similarity method of Pearson et al. (1988) Proc. Natl. Acad. Sci. 85: 2444; by computerized implementations of these algorithms, including, but not limited to: CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, California; GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Drive, Madison, Wisconsin, USA; the CLUSTAL program is well described by Higgins et al. (1988) Gene 73: 237-244; Higgins et al. (1989) CABIOS 5: 151-153; Corpet et al. (1988) Nucleic Acids Res. 16: 10881-90; Huang et al. (1992) Computer Applications in the Biosciences 8: 155-65, and Pearson et al. (1994) Meth. Mol. Biol. 24: 307-331; preferred computer alignment methods also include the BLASTP, BLASTN, and BLASTX algorithms (see Altschul et al. (1990) J. Mol. Biol. 215: 403-410). Alignment is also often performed by inspection and manual alignment. Sequence alignments are performed using the default parameters of the alignment programs.

(c) As used herein, "sequence identity" or "identity" in the context of two nucleic acid or polypeptide sequences makes reference to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have "sequence similarity" or "similarity". Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, California).
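The partial-credit scoring of conservative substitutions described above may be illustrated by the following Python sketch. It is not the PC/GENE implementation: the grouping of amino acids, the partial score of 0.5, and all names are assumptions chosen for illustration, and the two sequences are assumed to be already aligned without gaps.

```python
# Hypothetical grouping of amino acids with similar chemical properties.
CONSERVATIVE_GROUPS = [
    set("KRH"),   # positively charged
    set("DE"),    # negatively charged
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("ST"),    # hydroxyl-containing
    set("NQ"),    # amide-containing
]


def is_conservative(a, b):
    """True if two different residues fall in the same assumed group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def identity_and_similarity(seq_a, seq_b, partial_score=0.5):
    """Return (percent identity, percent similarity) for two aligned peptides.

    Identical residue: score 1. Non-conservative substitution: score 0.
    Conservative substitution: partial_score, a value between 0 and 1.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, non-empty")
    identical = 0
    score = 0.0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            identical += 1
            score += 1.0
        elif is_conservative(a, b):
            score += partial_score
    n = len(seq_a)
    return 100.0 * identical / n, 100.0 * score / n


# Hypothetical peptides differing by two conservative substitutions (K/R, L/I).
pct_identity, pct_similarity = identity_and_similarity("MKVLA", "MRVIA")
print(pct_identity, pct_similarity)
```

In this sketch the adjusted value is reported separately as similarity, so that the unadjusted identity remains available.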

(d) As used herein, "percentage of sequence identity" means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
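The calculation recited in (d) can be sketched as follows, assuming the two sequences have already been optimally aligned by one of the programs noted above and that gaps are written as "-". The function name and the example sequences are hypothetical.

```python
def percent_sequence_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are counted, divided by the total number of positions
    in the window (gap positions included), and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)


# Hypothetical 10-position window with one gap and one mismatch.
pid = percent_sequence_identity("ACGT-ACGTA", "ACGTTACCTA")
print(pid)
```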

(e) (i) The term "substantial identity" of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 70% sequence identity, preferably at least 80%, more preferably at least 90%, and most preferably at least 95%, compared to a reference sequence using one of the alignment programs described using standard parameters. One of skill in the art will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning, and the like.

Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 60%, more preferably at least 70%, 80%, 90%, and most preferably at least 95%.

Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other under stringent conditions. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. However, stringent conditions encompass temperatures in the range of about 1°C to about 20°C lower than the Tm, depending upon the desired degree of stringency as otherwise qualified herein. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. One indication that two nucleic acid sequences are substantially identical is when the polypeptide encoded by the first nucleic acid is immunologically cross-reactive with the polypeptide encoded by the second nucleic acid.

(e) (ii) The term "substantial identity" in the context of a peptide indicates that a peptide comprises a sequence with at least 70% sequence identity to a reference sequence, preferably 80%, more preferably 85%, most preferably at least 90% or 95% sequence identity to the reference sequence over a specified comparison window. Preferably, optimal alignment is conducted using the homology alignment algorithm of Needleman et al. (1970) J. Mol. Biol. 48: 443.

An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution. Peptides that are "substantially similar" share sequences as noted above except that residue positions that are not identical may differ by conservative amino acid changes.

As described in detail below, the nucleotide sequences of the present invention may be operably linked to the nucleotide sequences encoding a protein of interest and thereby modulate the cellular localization of the protein. By "modulate" is intended any increase in the concentration of said protein in a plastid or compartment thereof beyond that which occurs in the absence of the transit peptide. Additionally, "modulate" refers to an increased rate of importation of said protein into the plastid as compared to the rate of importation in the absence of the transit peptide sequence.

The nucleotide sequences of the invention are provided in expression cassettes for expression in the plant of interest. The cassette will comprise a transcriptional initiation and translational termination sequence functional in plants operably linked to a nucleic acid sequence encoding a transit peptide of the invention, operably linked to a nucleotide encoding a protein of interest. The cassette may contain at least one additional sequence to be cotransformed into the organism. Alternatively, the additional sequences can be provided on another expression cassette.

"Operably linked" refers to a functional linkage between a promoter and a second sequence, wherein the promoter initiates and mediates transcription of DNA sequences corresponding to the second sequence. "Operably linked" also refers to a functional linkage between two or more distinct nucleotide sequences such that the nucleic acid sequences being linked are contiguous and, where necessary to join two protein coding regions, contiguous and in the same reading frame.

Operably linking the transit peptide-coding sequences with the nucleotide sequences encoding a protein of interest may require the manipulation of one or more of the DNA sequences. For example, a convenient restriction site or a linker sequence that acts as a non-specific spacer that may permit better recognition of the amino-terminal transit sequence may be introduced.
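The requirement that joined coding regions be contiguous and in the same reading frame may be illustrated with a short Python check. This is a sketch only: the sequences shown are arbitrary placeholders and not sequences of the invention, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split a sequence into complete codons, ignoring any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def fuse_in_frame(transit_cds, linker, protein_cds):
    """Join transit peptide, optional linker, and protein coding sequences.

    Raises ValueError if the upstream portion would shift the reading frame
    of the protein of interest or contains a premature stop codon.
    """
    upstream = (transit_cds + linker).upper()
    if len(upstream) % 3 != 0:
        raise ValueError("transit peptide plus linker is not a multiple of 3")
    if any(codon in STOP_CODONS for codon in split_codons(upstream)):
        raise ValueError("stop codon upstream of the protein of interest")
    return upstream + protein_cds.upper()


# Placeholder sequences: a 2-codon "transit" segment, a 1-codon linker,
# and a 3-codon "protein" segment ending in a stop codon.
fusion = fuse_in_frame("ATGGCT", "GGA", "ATGAAATAA")
print(fusion)
```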

Expression of the coding sequences of a protein of interest operably linked to sequences of the transit peptide produces a hybrid polypeptide, or so-called fusion protein. By "hybrid" polypeptide is intended that the coding sequence for the transit peptide is foreign to the coding sequence for the protein of interest, and hence, the two coding sequences are not natively expressed as a single polypeptide in the plant cell.

It is recognized that in addition to the transit peptide of the present invention, additional amino acids may be fused to the protein of interest to further influence the fate of the protein. Techniques for making fusion proteins recombinantly are well known in the art.

The transcriptional initiation region, the promoter, may be native or analogous or foreign or heterologous to the plant host. Additionally, the promoter may be a natural sequence or alternatively a synthetic sequence. By foreign is intended that the transcriptional initiation region is not found in the native plant into which the transcriptional initiation region is introduced.

While it may be preferable to express the sequences using heterologous promoters, the native promoter sequences of either the gene of interest or the transit peptide sequence may be used. Such constructs would change expression levels of the gene of interest in the plant or plant cell. Thus, the phenotype of the plant or plant cell is altered.

It is recognized that a variety of promoters will be useful in the invention, the choice of which will depend in part upon the desired level of expression of the protein of interest. It is recognized that the levels of expression can be controlled to modulate the levels of expression in the plant cell. Constitutive and tissue specific promoters are of particular interest. Such constitutive promoters include, for example, the core promoter of the Rsyn7 (copending U.S. Application Serial No. 08/661,601); the core CaMV 35S promoter (Odell et al. (1985) Nature 313: 810-812); rice actin (McElroy et al. (1990) Plant Cell 2: 163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12: 619-632 and Christensen et al. (1992) Plant Mol. Biol. 18: 675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81: 581-588); MAS (Velten et al. (1984) EMBO J. 3: 2723-2730); ALS promoter (U.S. Application Serial No. 08/409,297), and the like. Other constitutive promoters include, for example, U.S. Patent Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; and 5,608,142.

Tissue-specific promoters can be utilized to target enhanced expression within a particular plant tissue. Tissue-specific promoters include those described in Yamamoto et al. (1997) Plant J. 12(2): 255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7): 792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3): 337-343; Russell et al. (1997) Transgenic Res. 6(2): 157-168; Rinehart et al. (1996) Plant Physiol. 112(3): 1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2): 525-535; Canevascini et al. (1996) Plant Physiol. 112(2): 513-524; Yamamoto et al. (1994) Plant Cell Physiol. 35(5): 773-778; Lam (1994) Results Probl. Cell Differ. 20: 181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6): 1129-1138; Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20): 9586-9590; and Guevara-Garcia et al. (1993) Plant J. 4(3): 495-505. Such promoters can be modified, if necessary, for weak expression.

Leaf-specific promoters are known in the art. See, for example, Yamamoto et al. (1997) Plant J. 12(2): 255-265; Kwon et al. (1994) Plant Physiol. 105: 357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35(5): 773-778; Gotor et al. (1993) Plant J. 3: 509-18; Orozco et al. (1993) Plant Mol. Biol. 23(6): 1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20): 9586-9590.

The termination region may be native with the transcriptional initiation region, may be native with the DNA sequence of interest, or may be derived from another source. Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262: 141-144; Proudfoot (1991) Cell 64: 671-674; Sanfacon et al. (1991) Genes Dev. 5: 141-149; Mogen et al. (1990) Plant Cell 2: 1261-1272; Munroe et al. (1990) Gene 91: 151-158; Ballas et al. (1989) Nucleic Acids Res. 17: 7891-7903; Joshi et al. (1987) Nucleic Acids Res. 15: 9627-9639.

The expression cassette contains a plurality of restriction sites to insert both the gene of interest and the transit peptide sequence 3' of the designated promoter.

The transit peptide sequences of the invention may be operably linked to the gene of interest at the 3' terminus, 5' terminus or internally. Preferably, the sequences of the invention will be placed at the 5' end. The nucleic acids included in the expression cassette may be optimized for expression in a plastid or compartment thereof to account for differences in codon usage between the plant nucleus and this organelle. In this manner, the nucleic acid sequences may be synthesized using chloroplast-preferred codons. See, for example, U.S. Patent No. 5,380,831, herein incorporated by reference.

Preferably the gene of interest encoding a protein to be localized to a plastid or compartment thereof is linked to the transit peptide nucleic acid sequence in such a way that upon translation and import into a plastid or compartment thereof the transit peptide is cleaved from the protein of interest.

Methods for preparing transit peptide chimeras are known in the art and are described in the following publications and issued patents. See de Castro Silva Filho et al. (1996) Plant Mol. Biol. 30: 769-780; Pilon et al. (1995) J. Biol. Chem. 270(8): 3882-3893; U.S. Patent No. 5,633,444; U.S. Patent No. 5,498,544.

The gene of interest may be native or analogous or foreign or heterologous to the plant host. By foreign is intended that the gene of interest is not found in the native plant into which it is introduced. The gene of interest may also be nuclear encoded or plastid encoded. Generally, the proteins selected for targeting to the plastids are heterologous to the transformed cell and nuclear encoded.

Genes of interest include, for example, any protein whose localization in the plastid will modify agronomically important traits such as oil, starch, and protein content. Other modified traits include herbicide, disease, and insect resistance.

Specific genes of interest may include, but are not limited to, the small subunit of ribulose bisphosphate carboxylase (Rubisco), Schnell et al. (1991) J. Biol. Chem. 266(5): 3335-3342; ferredoxin, Pilon et al. (1995) J. Biol. Chem. 270(8): 3882-3893; light harvesting chlorophyll a/b binding protein, Lamppa et al. (1988) J. Biol. Chem. 263: 14996-14999; Rieske FeS protein, Madueno et al. (1994) J. Biol. Chem. 269(26): 17458-17463; plastocyanin, Lawrence et al. (1997) J. Biol. Chem. 272(33): 20357-20363; Bt1 protein, Li et al. (1992) J. Biol. Chem. 267(26): 18999-19004; forms of dihydropteroate synthase (DHPS), U.S. Patent No.

acetyl CoA carboxylase, U.S. Patent No. 5,498,544; superoxide dismutase, U.S. Patent No. 5,538,878; and 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), Archer et al. (1990) J. Bioenerg. Biomem. 22(6): 789-810; U.S. Patent No. 5,188,642.

Of particular interest are genes encoding proteins involved in herbicide resistance. In a preferred embodiment, the herbicide resistance is imparted by 5-enolpyruvylshikimate-3-phosphate synthase. It is recognized that fragments and variants of the various proteins of interest may also be used with the transit peptide sequences of the invention. For example, the 5-enolpyruvylshikimate-3-phosphate synthase may be altered such that the protein product is less sensitive to herbicide inhibition. See, for example, U.S. Patent No. 5,188,642.

Alternatively, the gene of interest may be a reporter gene. Reporter genes are generally known in the art. The reporter gene used should not be expressed endogenously. Ideally the reporter gene will exhibit low background activity and should not interfere with plant biochemical and physiological activities. The products expressed by the reporter gene should be stable and readily detectable. It is important that reporter gene expression can be assayed by a method that is non-destructive, quantitative, sensitive, easy to perform, and inexpensive. Examples of suitable reporter genes known in the art can be found in, for example, Jefferson et al. (1991) in Plant Molecular Biology Manual (Gelvin et al. eds.) pp. 1-33, Kluwer Academic Publishers; DeWet et al. (1987) Mol. Cell. Biol. 7: 725-737; Goff et al. (1990) EMBO J. 9: 2517-2522; Kain et al. (1995) BioTechniques 19: 650-655; Chiu et al. (1996) Current Biology 6: 325-330.

The transit peptide sequences of the invention may be native or analogous or foreign or heterologous to either the host plant or to the gene of interest. By foreign is intended that the transit peptide sequence is not found in the native host plant or is not naturally encoded by the gene of interest. Furthermore, the DNA sequence encoding the transit peptide may be chemically synthesized either wholly or in part from the known sequence of the transit peptide.

Where appropriate, the gene(s) of interest and the transit peptide sequences of the invention may be optimized for increased expression in the transformed plant. That is, the genes can be synthesized using plant-preferred codons for improved expression. See, for example, Campbell and Gowri (1990) Plant Physiol. 92: 1-11 for a discussion of host-preferred codon usage. Methods are available in the art for synthesizing plant-preferred genes. See, for example, U.S. Patent Nos.

436,391, and Murray et al. (1989) Nucleic Acids Res. 17: 477-498, herein incorporated by reference.

Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression. The G-C content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell. When possible, the sequence is modified to avoid predicted hairpin secondary mRNA structures.
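As a non-limiting illustration of the sequence checks described above, the following Python sketch computes the G-C content of a coding sequence and reports positions of a candidate motif. The use of AATAAA as an example of a spurious polyadenylation signal, the example sequence, and the function names are assumptions made for illustration only.

```python
def gc_percent(seq):
    """Percentage of G and C nucleotides in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return 100.0 * sum(1 for base in seq if base in "GC") / len(seq)


def find_motif(seq, motif):
    """Return 0-based start positions of every occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


# Placeholder sequence containing one AATAAA hexamer (assumed motif).
example = "ATGGCCAATAAAGGC"
gc = gc_percent(example)
hits = find_motif(example, "AATAAA")
print(round(gc, 2), hits)
```

The value returned by `gc_percent` could be compared with the average computed in the same way over known genes expressed in the host cell.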

The expression cassettes may additionally contain 5' leader sequences in the expression cassette construct. Such leader sequences can act to enhance translation. Translation leaders are known in the art and include: picornavirus leaders, for example, EMCV leader (Encephalomyocarditis 5' noncoding region) (Elroy-Stein et al. (1989) PNAS USA 86: 6126-6130); potyvirus leaders, for example, TEV leader (Tobacco Etch Virus) (Allison et al. (1986) Virology 154: 9-20) and MDMV leader (Maize Dwarf Mosaic Virus); human immunoglobulin heavy-chain binding protein (BiP) (Macejak et al. (1991) Nature 353: 90-94); untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling et al. (1987) Nature 325: 622-625); tobacco mosaic virus leader (TMV) (Gallie et al. (1989) in Molecular Biology of RNA, ed. Cech (Liss, New York), pp. 237-256); and maize chlorotic mottle virus leader (MCMV) (Lommel et al. (1991) Virology 81: 382-385). See also, Della-Cioppa et al. (1987) Plant Physiol. 84: 965-968. Other methods known to enhance translation can also be utilized, for example, introns, and the like.

In preparing the expression cassette, the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.

Generally, the expression cassette will comprise a selectable marker gene for the selection of transformed cells. Selectable marker genes are utilized for the selection of transformed cells or tissues. Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT), as well as genes conferring resistance to herbicidal compounds, such as glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D). See generally, Yarranton (1992) Curr. Opin. Biotech. 3: 506-511; Christopherson et al. (1992) Proc. Natl. Acad. Sci. USA 89: 6314-6318; Yao et al. (1992) Cell 71: 63-72; Reznikoff (1992) Mol. Microbiol. 6: 2419-2422; Barkley et al. (1980) in The Operon, pp. 177-220; Hu et al. (1987) Cell 48: 555-566; Brown et al. (1987) Cell 49: 603-612; Figge et al. (1988) Cell 52: 713-722; Deuschle et al. (1989) Proc. Natl. Acad. Sci. USA 86: 5400-5404; Fuerst et al. (1989) Proc. Natl. Acad. Sci. USA 86: 2549-2553; Deuschle et al. (1990) Science 248: 480-483; Gossen (1993) Ph.D. Thesis, University of Heidelberg; Reines et al. (1993) Proc. Natl. Acad. Sci. USA 90: 1917-1921; Labow et al. (1990) Mol. Cell. Biol. 10: 3343-3356; Zambretti et al. (1992) Proc. Natl. Acad. Sci. USA 89: 3952-3956; Baim et al. (1991) Proc. Natl. Acad. Sci. USA 88: 5072-5076; Wyborski et al. (1991) Nucleic Acids Res. 19: 4647-4653; Hillen and Wissman (1989) Topics Mol. Struc. Biol. 10: 143-162; Degenkolb et al. (1991) Antimicrob. Agents Chemother. 35: 1591-1595; Kleinschmidt et al. (1988) Biochemistry 27: 1094-1104; Bonin (1993) Ph.D. Thesis, University of Heidelberg; Gossen et al. (1992) Proc. Natl. Acad. Sci. USA 89: 5547-5551; Oliva et al. (1992) Antimicrob. Agents Chemother. 36: 913-919; Hlavka et al. (1985) Handbook of Experimental Pharmacology, Vol. 78 (Springer-Verlag, Berlin); Gill et al. (1988) Nature 334: 721-724. Such disclosures are herein incorporated by reference.

The above list of selectable marker genes is not meant to be limiting. Any selectable marker gene can be used in the present invention.

The present invention also relates to the introduction of the transformation constructs into plant protoplasts, calli, tissues, or organ explants and the regeneration of transformed plants expressing the recombinant constructs of the invention.

The expression cassette sequences of the present invention may be used for transformation of any plant species, including, but not limited to, corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatus), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oats, barley, vegetables, ornamentals, and conifers.

Vegetables include tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo). Ornamentals include azalea (Rhododendron spp.), hydrangea (Macrophylla hydrangea), hibiscus (Hibiscus rosasanensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum. Conifers that may be employed in practicing the present invention include, for example, pines such as loblolly pine (Pinus taeda), slash pine (Pinus elliotii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), and Monterey pine (Pinus radiata); Douglas-fir (Pseudotsuga); Western hemlock (Tsuga canadensis); Sitka spruce (Picea glauca); redwood (Sequoia sempervirens); true firs such as silver fir (Abies amabilis) and balsam fir (Abies balsamea); and cedars such as Western red cedar (Thuja plicata) and Alaska yellow-cedar (Chamaecyparis nootkatensis). Preferably, plants of the present invention are crop plants (for example, corn, alfalfa, sunflower, canola, soybean, cotton, peanut, sorghum, wheat, tobacco, etc.), more preferably corn and soybean plants, yet more preferably corn plants.

Transformation protocols as well as protocols for introducing nucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing nucleotide sequences into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al. (1986) Biotechniques 4: 320-334), electroporation (Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83: 5602-5606), Agrobacterium-mediated transformation (Townsend et al., U.S. Patent No. 5,563,055), direct gene transfer (Paszkowski et al. (1984) EMBO J. 3: 2717-2722), and ballistic particle acceleration (see, for example, Sanford et al., U.S. Patent No. 4,945,050; Tomes et al. (1995) "Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment," in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et al. (1988) Biotechnology 6: 923-926). Also see Weissinger et al. (1988) Ann. Rev. Genet. 22: 421-477; Sanford et al. (1987) Particulate Science and Technology 5: 27-37 (onion); Christou et al. (1988) Plant Physiol. 87: 671-674 (soybean); McCabe et al. (1988) Bio/Technology 6: 923-926 (soybean); Finer and McMullen (1991) In Vitro Cell Dev. Biol. 27P: 175-182 (soybean); Singh et al. (1998) Theor. Appl. Genet. 96: 319-324 (soybean); Datta et al. (1990) Biotechnology 8: 736-740 (rice); Klein et al. (1988) Proc. Natl. Acad. Sci. USA 85: 4305-4309 (maize); Klein et al. (1988) Biotechnology 6: 559-563 (maize); Tomes, U.S. Patent No. 5,240,855; Buising et al., U.S. Patent Nos. 5,322,783 and Tomes et al. (1995) "Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment," in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg (Springer-Verlag, Berlin) (maize); Klein et al. (1988) Plant Physiol. 91: 440-444 (maize); Fromm et al. (1990) Biotechnology 8: 833-839 (maize); Hooykaas-Van Slogteren et al. (1984) Nature (London) 311: 763-764; Bowen et al., U.S. Patent No. 5,736,369 (cereals); Bytebier et al. (1987) Proc. Natl. Acad. Sci. USA 84: 5345-5349 (Liliaceae); De Wet et al. (1985) in The Experimental Manipulation of Ovule Tissues, ed. Chapman et al. (Longman, New York), pp. 197-209 (pollen); Kaeppler et al. (1990) Plant Cell Reports 9: 415-418 and Kaeppler et al. (1992) Theor. Appl. Genet. 84: 560-566 (whisker-mediated transformation); D'Halluin et al. (1992) Plant Cell 4: 1495-1505 (electroporation); Li et al. (1993) Plant Cell Reports 12: 250-255 and Christou and Ford (1995) Annals of Botany 75: 407-413 (rice); Osjoda et al. (1996) Nature Biotechnology 14: 745-750 (maize via Agrobacterium tumefaciens); all of which are herein incorporated by reference.

The modified plant cells may be grown into plants in accordance with conventional ways. See, for example, McCormick et al. (1986) Plant Cell Reports 5: 81-84. These plants may then be grown, and either pollinated with the same transformed strain or different strains, and the resulting hybrid having the desired phenotypic characteristic identified. Two or more generations may be grown to ensure that the subject phenotypic characteristic is stably maintained and inherited, and then seeds harvested to ensure the desired phenotype or other property has been achieved.

Assays to determine the efficiency by which the isolated transit peptide sequences of the invention target a protein of interest to a plastid are known. A reporter gene such as β-glucuronidase (GUS), chloramphenicol acetyl transferase (CAT), or green fluorescent protein (GFP) is operably linked to the transit peptide sequence. This fusion is placed behind the control of a suitable promoter, ligated into a transformation vector, and transformed into a plant or plant cell. Following an adequate period of time for expression and localization into the plastid, the plastid fraction is extracted and reporter activity assayed. The ability of the isolated sequences to target and deliver the reporter protein to the plastid will be compared to other known transit peptide sequences. See de Castro Silva Filho et al. (1996) Plant Mol. Biol. 30: 769-780. Protein import can also be verified in vitro through the addition of proteases to the isolated plastid fraction. Proteins which were successfully imported into the plastid are resistant to the externally added proteases whereas proteins that remain in the cytosol are susceptible to digestion.
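By way of illustration only, the comparison described above can be reduced to simple ratios. The following Python sketch assumes that reporter activity has been measured in arbitrary units for the plastid fraction and for the total extract, and that band intensities are available before and after protease treatment. The function names and all numerical values are hypothetical and are not taken from the specification.

```python
def targeting_efficiency(plastid_activity, total_activity):
    """Fraction of total reporter activity recovered in the plastid fraction."""
    if total_activity <= 0:
        raise ValueError("total_activity must be positive")
    if not 0 <= plastid_activity <= total_activity:
        raise ValueError("plastid_activity must lie between 0 and total_activity")
    return plastid_activity / total_activity


def relative_efficiency(candidate_efficiency, reference_efficiency):
    """Efficiency of a candidate transit peptide relative to a known one."""
    if reference_efficiency <= 0:
        raise ValueError("reference_efficiency must be positive")
    return candidate_efficiency / reference_efficiency


def protease_protected_fraction(signal_before, signal_after):
    """Fraction of reporter signal that survives externally added protease."""
    if signal_before <= 0:
        raise ValueError("signal_before must be positive")
    return signal_after / signal_before


# Hypothetical readings in arbitrary units (assumptions, not measured data).
candidate = targeting_efficiency(plastid_activity=30.0, total_activity=40.0)
reference = targeting_efficiency(plastid_activity=20.0, total_activity=40.0)
print(candidate, reference, relative_efficiency(candidate, reference))
print(protease_protected_fraction(signal_before=100.0, signal_after=80.0))
```

A relative efficiency above 1 would indicate, under these assumptions, that the candidate sequence delivers more reporter to the plastid than the reference transit peptide.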

The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

Example 1
Generating an expression cassette for the expression of GFP-transit peptide chimeric proteins

The Green Fluorescent Protein (GFP) gene described by Prasher et al. (1992) Gene 111: 229-233, is modified for expression in maize. The modified sequence, GFPm, is derived from a back translation of the GFP protein sequence using maize preferred codons and is shown in SEQ ID NO: 23. Sequence analysis is performed using the Wisconsin Sequence Analysis Package from Genetics Computer Group, Madison, WI. The nucleotide sequence is assembled from a series of synthetic oligonucleotides. Cloning sites within the GFPm include a 5' flanking BamHI restriction site, an AflIII site at the start codon, a 3' flanking HpaI site or a BglII site converting the stop codon to an isoleucine.
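The back translation step can be sketched in Python as follows. This is a minimal illustration, not the procedure used to derive SEQ ID NO: 23. The codon table below is an assumption: it simply picks one G/C-ending synonymous codon per amino acid, and it should be replaced with an actual maize codon usage table before any real use. The example peptide is arbitrary.

```python
# ASSUMPTION: illustrative codon choices only (one G/C-ending codon per
# residue). This is NOT the table used to generate SEQ ID NO: 23.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def back_translate(protein):
    """Return a DNA sequence encoding `protein`, one preferred codon per residue."""
    return "".join(PREFERRED_CODON[residue] for residue in protein.upper())


def translate_preferred(dna):
    """Decode a sequence built by back_translate (handles only table codons)."""
    codon_to_residue = {codon: aa for aa, codon in PREFERRED_CODON.items()}
    return "".join(codon_to_residue[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = "MSKGEELFTG*"  # arbitrary example peptide ending in a stop
dna = back_translate(peptide)
print(dna)
print(translate_preferred(dna) == peptide)
```

Because every residue maps to exactly one codon, the round trip through `translate_preferred` recovers the input peptide, which is a convenient sanity check before oligonucleotides are designed.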

Amino terminal and carboxy terminal fusions of transit peptide sequences to the modified green fluorescent protein (GFPm) are created using synthetic oligonucleotides encoding the transit peptide sequence flanked by appropriate restriction sites that allow in-frame fusions with GFPm.
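Whether a given junction preserves the reading frame can be checked computationally before oligonucleotides are ordered. The sketch below is illustrative only: the function name and the short test sequences are hypothetical, and the only facts relied upon are the standard genetic code and the BamHI recognition sequence GGATCC.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from position 0; trailing bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def check_in_frame_fusion(transit_cds, junction, reporter_cds):
    """Return a list of problems; an empty list means the fusion reads through."""
    problems = []
    upstream = transit_cds + junction
    if len(upstream) % 3 != 0:
        problems.append("transit peptide plus junction is not a multiple of 3 nt")
    protein = translate(upstream + reporter_cds)
    if not protein.startswith("M"):
        problems.append("fusion does not begin with a start codon")
    if "*" in protein[:-1]:
        problems.append("premature stop codon in the fusion")
    if not protein.endswith("*"):
        problems.append("no in-frame stop codon at the end of the fusion")
    return problems


# Hypothetical toy sequences (assumptions for illustration only).
transit = "ATGGCCTCC"      # encodes M-A-S, no stop codon
bamhi = "GGATCC"           # BamHI site, reads G-S in this frame
reporter = "ATGAGCAAGTGA"  # encodes M-S-K followed by a stop
print(check_in_frame_fusion(transit, bamhi, reporter))        # in frame
print(check_in_frame_fusion(transit, bamhi + "A", reporter))  # frameshifted
```

The second call adds a single base to the junction and therefore reports a frame problem, which is the kind of error the restriction-site design is meant to avoid.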

The ubiquitin promoter is inserted upstream of the sequence encoding the GFPm fusion protein. Also engineered into the expression cassette are an intron and a PinII termination sequence.

Example 2
Transformation and Regeneration of Transgenic Plants with GFP Screening

Immature maize embryos from greenhouse donor plants are bombarded with a plasmid containing a transit peptide sequence cloned into the flanking restriction sites of GFPm to create an in-frame fusion. These sequences are operably linked to a ubiquitin promoter (Figure 1). Also contained on this plasmid is the selectable marker gene, PAT (Wohlleben et al. (1988) Gene 70: 25-37), that confers resistance to the herbicide Bialaphos. Transformation is performed as follows. All media recipes are in the Appendix.

Preparation of Target Tissue

The ears are surface sterilized in 30% Clorox bleach plus 0.5% Micro detergent for 20 minutes, and rinsed two times with sterile water. The immature embryos are excised and placed embryo axis side down (scutellum side up), 25 embryos per plate, on 560Y medium for 4 hours and then aligned within the 2.5-cm target zone in preparation for bombardment.

Preparation of DNA

A plasmid vector comprising a transit peptide sequence cloned into restriction sites resulting in an in-frame fusion with GFPm, and operably linked to a ubiquitin promoter, and containing a PAT selectable marker is precipitated onto 1.1 µm (average diameter) tungsten pellets using a CaCl2 precipitation procedure as follows:

100 µl prepared tungsten particles in water
10 µl (1 µg) DNA in Tris-EDTA buffer (1 µg total)
100 µl 2.5 M CaCl2
10 µl 0.1 M spermidine

Each reagent is added sequentially to the tungsten particle suspension, while maintained on the multitube vortexer. The final mixture is sonicated briefly and allowed to incubate under constant vortexing for 10 minutes. After the precipitation period, the tubes are centrifuged briefly, liquid removed, washed with 500 µl 100% ethanol, and centrifuged for 30 seconds. Again the liquid is removed, and 105 µl 100% ethanol is added to the final tungsten particle pellet.

For particle gun bombardment, the tungsten/DNA particles are briefly sonicated and 10 µl spotted onto the center of each macrocarrier and allowed to dry about 2 minutes before bombardment.

Particle Gun Treatment

The sample plates are bombarded at level #4 in particle gun #HE34-1 or #HE34-2. All samples receive a single shot at 650 PSI, with a total of ten aliquots taken from each tube of prepared particles/DNA.
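The volumes given under Preparation of DNA imply an approximate DNA load per shot. The Python sketch below works through that arithmetic. It assumes, for illustration only, that all of the DNA is recovered on the particles, that the suspension is uniform, and that no ethanol evaporates. The function names are hypothetical.

```python
def dna_per_shot_ug(total_dna_ug=1.0, final_volume_ul=105.0, aliquot_ul=10.0):
    """DNA delivered per macrocarrier, assuming complete and uniform binding."""
    if final_volume_ul <= 0 or aliquot_ul <= 0:
        raise ValueError("volumes must be positive")
    return total_dna_ug * aliquot_ul / final_volume_ul


def leftover_volume_ul(final_volume_ul=105.0, aliquot_ul=10.0, n_aliquots=10):
    """Suspension remaining in the tube after all aliquots are spotted."""
    used = aliquot_ul * n_aliquots
    if used > final_volume_ul:
        raise ValueError("not enough suspension for the requested aliquots")
    return final_volume_ul - used


print(round(dna_per_shot_ug(), 3))  # micrograms of DNA per shot
print(leftover_volume_ul())         # microliters left in the tube
```

Under these assumptions each shot carries a little under 0.1 µg of DNA, and the extra 5 µl in the final resuspension provides a small margin over the ten 10 µl aliquots.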

Subsequent Treatment

Following bombardment, the embryos are kept on 560Y medium for 2 days, then transferred to 560R selection medium containing 3 mg/liter Bialaphos, and subcultured every 2 weeks. After approximately 10 weeks of selection, selection-resistant callus clones are transferred to 288J medium to initiate plant regeneration. Following somatic embryo maturation (2-4 weeks), well-developed somatic embryos are transferred to medium for germination and transferred to the lighted culture room. Approximately 7-10 days later, developing plantlets are transferred to 272V hormone-free medium in tubes for 7-10 days until plantlets are well established. Plants are then transferred to inserts in flats (equivalent to 2.5" pot) containing potting soil and grown for 1 week in a growth chamber, subsequently grown an additional 1-2 weeks in the greenhouse, then transferred to classic 600 pots (1.6 gallon) and grown to maturity. Screening for GFP expression is carried out at each transfer using a Xenon and/or Mercury light source with the appropriate filters for GFP visualization.
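For planning purposes, the durations stated above can be summed to estimate the time from bombardment to final potting. The Python sketch below is an approximation only: it treats "approximately 10 weeks" as exactly 70 days, converts weeks to days, and uses step labels that are shorthand introduced here for illustration.

```python
# (step label, minimum days, maximum days); labels are illustrative shorthand.
STEPS = [
    ("560Y medium after bombardment", 2, 2),
    ("560R selection, approximately 10 weeks", 70, 70),
    ("somatic embryo maturation", 14, 28),
    ("germination before transfer to tubes", 7, 10),
    ("272V hormone-free medium in tubes", 7, 10),
    ("flats in growth chamber", 7, 7),
    ("greenhouse before final potting", 7, 14),
]


def total_days(steps):
    """Return (minimum, maximum) total days across all steps."""
    shortest = sum(low for _, low, _ in steps)
    longest = sum(high for _, _, high in steps)
    return shortest, longest


shortest, longest = total_days(STEPS)
print(f"{shortest}-{longest} days from bombardment to final potting")
```

On these assumptions the schedule spans roughly 16 to 20 weeks, most of which is the selection period on 560R medium.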

Once GFP expressing colonies are identified they are monitored regularly for new growth and expression using the Xenon light source. Plant cells containing GFP are regenerated by transferring the callus to 288 medium containing MS salts, 1 mg/L IAA, 0.5 mg/L zeatin and 4% sucrose. The callus is placed in the light. As plantlets develop they are transferred to tubes containing 272K, hormone-free MS medium and 3% sucrose. The percentage of green fluorescent colonies that regenerated into whole plants can be determined.

The ability of the transit peptide to target GFP to the plastid is determined in stable transgenic maize cells using epifluorescent microscopy and image enhancement software. Samples of calli from the transformed maize plants are fixed in FAA and are examined with UV filters to visualize GFP localization in the plastid.

APPENDIX

272 V

Ingredient                        Amount     Unit
D-I H2O                           950.000    ml
MS Salts (GIBCO 11117-074)        4.300      g
Myo-Inositol                      0.100      g
MS Vitamins Stock Solution ##     5.000      ml
Sucrose                           40.000     g
Bacto-Agar @                      6.000      g

Directions:
@ = Add after bringing up to volume
Dissolve ingredients in polished D-I H2O in sequence
Adjust to pH 5.6
Bring up to volume with polished D-I H2O after adjusting pH
Sterilize and cool to 60°C.

## = Dissolve 0.100 g of Nicotinic Acid; 0.020 g of Thiamine.HCl; 0.100 g of Pyridoxine.HCl; and 0.400 g of Glycine in 875.00 ml of polished D-I H2O in sequence. Bring up to volume with polished D-I H2O. Make in 400 ml portions.

Thiamine.HCl and Pyridoxine.HCl are in Dark Desiccator. Store for one month, unless contamination or precipitation occurs, then make fresh stock.

Total Volume (L) = 1.00

288 J

Ingredient                        Amount     Unit
D-I H2O                           950.000    ml
MS Salts                          4.300      g
Myo-Inositol                      0.100      g
MS Vitamins Stock Solution ##     5.000      ml
Zeatin                            1.000      ml
Sucrose                           60.000     g
Gelrite @                         3.000      g
Indoleacetic Acid 0.5 mg/ml       2.000      ml
0.1 mM Abscisic Acid              1.000      ml
Bialaphos 1 mg/ml                 3.000      ml

Directions:
@ = Add after bringing up to volume
Dissolve ingredients in polished D-I H2O in sequence
Adjust to pH 5.6
Bring up to volume with polished D-I H2O after adjusting pH
Sterilize and cool to 60°C.

Add 3.5 g/L of Gelrite for cell biology.

## = Dissolve 0.100 g of Nicotinic Acid; 0.020 g of Thiamine.HCl; 0.100 g of Pyridoxine.HCl; and 0.400 g of Glycine in 875.00 ml of polished D-I H2O in sequence. Bring up to volume with polished D-I H2O. Make in 400 ml portions.

Thiamine.HCl and Pyridoxine.HCl are in Dark Desiccator. Store for one month, unless contamination or precipitation occurs, then make fresh stock.

Total Volume (L) = 1.00

560 R

Ingredient                                    Amount     Unit
D-I Water                                     950.000    ml
CHU (N6) Basal Salts (SIGMA C-1416)           4.000      g
Eriksson's Vitamin Mix (1000X SIGMA-1511)     1.000      ml
Thiamine.HCl 0.4 mg/ml                        1.250      ml
Sucrose                                       30.000     g
2,4-D 0.5 mg/ml                               4.000      ml
Gelrite                                       3.000      g
Silver Nitrate 2 mg/ml #                      0.425      ml
Bialaphos 1 mg/ml #                           3.000      ml

Directions:
@ = Add after bringing up to volume
# = Add after sterilizing and cooling to temp.
Dissolve ingredients in D-I H2O in sequence
Adjust to pH 5.8 with KOH
Bring up to volume with D-I H2O
Sterilize and cool to room temp.
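The 560 R recipe above is stated per liter. The Python sketch below scales it to other batch sizes and computes the final concentration contributed by a stock solution. The data structure and function names are hypothetical conveniences. The amounts are transcribed from the recipe, and the calculation ignores the small volume contributed by the stocks themselves.

```python
# Amounts per 1.00 L, transcribed from the 560 R recipe in this Appendix.
RECIPE_560R = [
    ("D-I water (initial)", 950.0, "ml"),
    ("CHU (N6) basal salts", 4.0, "g"),
    ("Eriksson's vitamin mix (1000X)", 1.0, "ml"),
    ("Thiamine.HCl 0.4 mg/ml", 1.25, "ml"),
    ("Sucrose", 30.0, "g"),
    ("2,4-D 0.5 mg/ml", 4.0, "ml"),
    ("Gelrite", 3.0, "g"),
    ("Silver nitrate 2 mg/ml", 0.425, "ml"),
    ("Bialaphos 1 mg/ml", 3.0, "ml"),
]


def scale_recipe(recipe, final_volume_l, base_volume_l=1.0):
    """Scale every ingredient linearly to a new final volume."""
    if final_volume_l <= 0 or base_volume_l <= 0:
        raise ValueError("volumes must be positive")
    factor = final_volume_l / base_volume_l
    return [(name, amount * factor, unit) for name, amount, unit in recipe]


def final_concentration_mg_per_l(stock_mg_per_ml, stock_volume_ml, total_volume_l=1.0):
    """Final concentration contributed by a stock, ignoring its own volume."""
    return stock_mg_per_ml * stock_volume_ml / total_volume_l


for name, amount, unit in scale_recipe(RECIPE_560R, final_volume_l=0.5):
    print(f"{name}: {amount:g} {unit}")

# 3.000 ml of a 1 mg/ml Bialaphos stock in 1 L
print(final_concentration_mg_per_l(1.0, 3.0))
```

The Bialaphos figure works out to 3 mg/liter, which agrees with the concentration stated for 560R selection medium in Example 2.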

Total Volume (L) = 1.00

560 Y

Ingredient                                    Amount     Unit
D-I Water                                     950.000    ml
CHU (N6) Basal Salts (SIGMA C-1416)                      g
Eriksson's Vitamin Mix (1000X SIGMA-1511)                ml
Thiamine.HCl                                  1.250      ml
Sucrose                                       120.000    g
2,4-D                                         2.000      ml
L-Proline                                     2.880      g
Gelrite                                       2.000      g
Silver Nitrate 2 mg/ml                        4.250      ml

Directions:
@ = Add after bringing up to volume
# = Add after sterilizing and cooling to temp.
Dissolve ingredients in D-I H2O in sequence
Adjust to pH 5.8 with KOH
Bring up to volume with D-I H2O
Sterilize and cool to room temp.

** Autoclave less time because of increased sucrose **

Total Volume (L) = 1.00

All publications and patent applications mentioned in the specification are indicative of the level of those skilled in the art to which this invention pertains.

All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.