Title:
IMPROVED PRODUCTION OF TERPENOIDS USING ENZYMES ANCHORED TO LIPID DROPLET SURFACE PROTEINS
Document Type and Number:
WIPO Patent Application WO/2020/033705
Kind Code:
A2
Abstract:
Methods and expression systems are described herein that are useful for production of terpenes and terpenoids.

Inventors:
HAMBERGER BJORN (US)
SADRE RADIN
BENNING CHRISTOPH (US)
BIBIK JACOB DAVID (US)
Application Number:
PCT/US2019/045730
Publication Date:
February 13, 2020
Filing Date:
August 08, 2019
Assignee:
UNIV MICHIGAN STATE (US)
HAMBERGER BJORN (US)
SADRE RADIN (US)
BENNING CHRISTOPH (US)
BIBIK JACOB DAVID (US)
International Classes:
A01H6/82
Attorney, Agent or Firm:
PERDOK, Monique M. et al. (US)
Claims:
What is claimed:

1. A fusion protein comprising a lipid droplet surface protein linked in-frame to one or more of the following fusion partners: a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase.

2. The fusion protein of claim 1, wherein the lipid droplet surface protein has a sequence with at least 95% sequence identity to SEQ ID NO:1, or a truncated sequence with at least 95% sequence identity to a sequence consisting of at least 70 contiguous amino acids of SEQ ID NO:1.

3. The fusion protein of claim 1, wherein the fusion partner comprises a polypeptide with at least 95% sequence identity to SEQ ID NO:3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 52, 53, 54, 55, 56, 59, 61, 63, 64, 65, 67, 68, 97, 99, 101, 104, 105, 107, 108, 110, or 111.

4. An expression system comprising at least one expression vector comprising a first nucleic acid segment encoding a lipid droplet surface protein and at least one second nucleic acid segment encoding one or more of the following proteins: monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase, wherein the first nucleic segment, the at least one second nucleic acid segment, or a combination thereof are operably linked to a heterologous promoter.

5. The expression system of claim 4, wherein the lipid droplet surface protein has a sequence with at least 90% sequence identity to SEQ ID NO:1 or a truncated sequence with at least 95% sequence identity to a sequence consisting of at least 70 contiguous amino acids of SEQ ID NO:1.

6. The expression system of claim 4, wherein first nucleic acid segment encoding a lipid droplet surface protein is linked in frame with at least one second nucleic acid segment encoding at least one of the proteins, such that the expression system expresses a fusion protein comprising the lipid droplet surface protein and at least one of the proteins.

7. The expression system of claim 4, comprising two or more expression cassettes or two or more expression vectors, a first expression cassette or expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding a fusion protein comprising a lipid droplet surface protein linked in-frame to one or more of the following fusion partners: a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase; and a second expression cassette or expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding one or more of the following proteins: a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase.

8. The expression system of claim 4, further comprising at least one expression cassette or at least one expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding an enzyme selected from geranylgeranyl diphosphate synthase (GGDPS), 1-deoxy-D-xylulose 5-phosphate synthase (DXS), abietadiene synthase (ABS), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), farnesyl diphosphate synthase (FDPS), cytochrome P450, NADPH-dependent cytochrome P450 reductase (CPR), each nucleic acid segment encoding an enzyme optionally linked in frame to a lipid droplet surface protein.

9. The expression system of claim 4, further comprising an expression cassette comprising a promoter operably linked to a nucleic acid encoding a WRI1 transcription factor.

10. The expression system of claim 4, further comprising an expression cassette comprising a promoter operably linked to a nucleic acid encoding a lipid droplet surface protein.

11. The expression system of claim 4, wherein an encoded plastid targeting region or an encoded hydrophobic region is removed from the nucleic acid segment encoding the one or more of the proteins.

12. The expression system of claim 4, further comprising an encoded plastid targeting region or an encoded hydrophobic region linked in frame with nucleic acid segment encoding the one or more of the proteins.

13. The expression system of claim 4, wherein one or more of the proteins has an amino acid sequence at least 95% sequence identity to SEQ ID NO:3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 52, 53, 54, 55, 56, 59, 61, 63, 64, 65, 67, 68, 97, 99, 101, 104, 105, 107, 108, 110, or 111.

14. The expression system of claim 4, wherein the first nucleic acid segment or at least one of the second nucleic acid segments is codon-optimized for expression in plastid or in a host cell.

15. The expression system of claim 4, wherein at least one of the heterologous promoters is active in plant plastids.

16. A host cell, host tissue, host seed, or host plant comprising the expression system of claim 4.

17. The host cell, host tissue, host seed, or a host plant of claim 16, which is an oilseed, camelina, canola, castor bean, corn, flax, lupin, peanut, potato, safflower, soybean, sunflower, cottonseed, oil firewood tree, rapeseed, rutabaga, sorghum, walnut, or nut species.

18. The host cell, host tissue, host seed, or a host plant of claim 16, which is a Nicotiana benthamiana, Nicotiana tabacum, Nicotiana rustica, Nicotiana excelsior, or Nicotiana excelsiana species.

19. A method comprising:

(a) incubating or cultivating one or more host cells, host tissues, host seeds, or host plants, each comprising an expression system comprising at least one expression vector comprising a first nucleic acid segment encoding a lipid droplet surface protein and at least one second nucleic acid segment encoding one or more of the following proteins: monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase, wherein the first nucleic segment, the at least one second nucleic acid segment, or a combination thereof are operably linked to a heterologous promoter; and

(b) isolating lipids from the host cell, host tissue, host seed, or host plant.

20. The method of claim 19, wherein the lipid droplet surface protein has a sequence with at least 90% sequence identity to SEQ ID NO:1 or a truncated sequence with at least 95% sequence identity to a sequence consisting of at least 70 contiguous amino acids of SEQ ID NO:1.

21. The method of claim 19, wherein first nucleic acid segment encoding a lipid droplet surface protein is linked in frame with at least one second nucleic acid segment encoding at least one of the proteins, such that the expression system expresses a fusion protein comprising the lipid droplet surface protein and at least one of the proteins.

22. The method of claim 19, comprising two or more expression cassettes or two or more expression vectors, a first expression cassette or expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding a fusion protein comprising a lipid droplet surface protein linked in-frame to one or more of the following fusion partners: a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase; and a second expression cassette or expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding one or more of the following proteins: a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase.

23. The method of claim 19, further comprising at least one expression cassette or at least one expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding an enzyme selected from geranylgeranyl diphosphate synthase (GGDPS), 1-deoxy-D-xylulose 5-phosphate synthase (DXS), abietadiene synthase (ABS), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), farnesyl diphosphate synthase (FDPS), cytochrome P450, NADPH-dependent cytochrome P450 reductase (CPR), each nucleic acid segment encoding an enzyme optionally linked in frame to a lipid droplet surface protein.

24. The method of claim 19, further comprising an expression cassette comprising a promoter operably linked to a nucleic acid encoding a WRI1 transcription factor.

25. The method of claim 19, further comprising an expression cassette comprising a promoter operably linked to a nucleic acid encoding a lipid droplet surface protein.

26. The method of claim 19, wherein an encoded plastid targeting region or an encoded hydrophobic region is removed from the nucleic acid segment encoding the one or more of the proteins.

27. The method of claim 19, further comprising an encoded plastid targeting region or an encoded hydrophobic region linked in frame with nucleic acid segment encoding the one or more of the proteins.

28. The method of claim 19, wherein one or more of the proteins has an amino acid sequence at least 95% sequence identity to SEQ ID NO:3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 52, 53, 54, 55, 56, 59, 61, 63, 64, 65, 67, 68, 97, 99, 101, 104, 105, 107, 108, 110, or 111.

29. The method of claim 19, wherein the first nucleic acid segment or at least one of the second nucleic acid segments is codon-optimized for expression in plastid or in a host cell.

30. The method of claim 19, wherein at least one of the heterologous promoters is active in plant plastids.

31. The method of claim 19, wherein the lipids isolated from one or more host cells, host tissues, host seeds, or host plants comprise one or more types of monoterpene, diterpene, sesquiterpene, sesterterpene, triterpene, tetraterpene, polyterpene, or a mixture thereof.

32. The method of claim 19, wherein after incubation or cultivation, one or more host cells, host tissues, host seeds, or host plants has at least 300 micrograms terpenoids per gram fresh weight or at least 0.03% fresh weight monoterpene, diterpene, sesquiterpene, sesterterpene, triterpene, tetraterpene, polyterpene, or a mixture thereof.
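
Several of the claims above recite percent sequence identity thresholds (for example, at least 90% or 95% identity to SEQ ID NO:1). The following is a minimal illustrative sketch, not part of the claims, of how such a threshold might be checked for two sequences that have already been aligned. The function names, the use of "-" as a gap character, and the choice of the full alignment length as the denominator are assumptions made for illustration only; the source does not specify how identity is calculated.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity = (columns with the same residue in both sequences)
    divided by (total alignment columns). Gap characters ("-") never count
    as matches. Other denominators are in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 95.0) -> bool:
    """True if the two aligned sequences reach the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 20-residue example sequences (not taken from any SEQ ID NO).
reference = "ACDEFGHIKLACDEFGHIKL"
one_change = "ACDEFGHIKLACDEFGHIKV"   # one substitution
two_changes = "ACDEFGHIKVACDEFGHIKV"  # two substitutions

identity_one = percent_identity(reference, one_change)
identity_two = percent_identity(reference, two_changes)
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the final counting step.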

Description:
Improved Production of Terpenoids using Enzymes Anchored to Lipid Droplet Surface Proteins

This application claims benefit of priority to the filing date of U.S. Provisional Application Ser. No. 62/716,076, filed August 8, 2018, the contents of which are specifically incorporated herein by reference in their entirety.

Government Funding

This invention was made with government support under DE-FC02-07ER64494 and under DE-SC0018409 awarded by the U.S. Department of Energy. The government has certain rights in the invention.

Background

Plant-derived terpenoids have a wide range of commercial and industrial uses. Examples of uses for terpenoids include specialty fuels, agrochemicals, fragrances, nutraceuticals and pharmaceuticals. However, currently available methods for petrochemical synthesis, extraction, and purification of terpenoids from the native plant sources have limited economic sustainability. For example, terpenoid biotechnology in photosynthetic tissues has remained challenging at least in part because any engineered pathways must compete for precursors with highly networked native pathways and their associated regulatory mechanisms.

Summary

Described herein are methods and expression systems that provide high yields of terpenoids and related compounds in cells having terpene synthases and other enzymes anchored to cellular lipid droplets. The methods enhance precursor flux through targeting of enzymes that can synthesize terpene precursors to native and non-native compartments to provide for increased terpenoid production. By producing lipophilic products (e.g., terpenoids) at the surface or within the lipid droplet, the anchored terpenoid biosynthetic enzymes facilitate sequestration of terpenoid products within the lipid droplets. The methods can efficiently produce industrially relevant terpenoids in photosynthetic tissues. For example, in some experiments yields of terpenoids of more than 300 micrograms terpenoids per gram fresh weight (0.03% fresh weight) can be obtained.

Fusion proteins are described herein including those that have a lipid droplet surface protein linked in-frame to one or more of the following fusion partners: a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase.
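
The yield figure quoted in this Summary equates 300 micrograms of terpenoids per gram fresh weight with 0.03% fresh weight. As an illustrative check of that arithmetic, the sketch below converts between the two units; the function names are hypothetical and are not taken from the source.

```python
MICROGRAMS_PER_GRAM = 1_000_000  # 1 g = 1,000,000 micrograms


def ug_per_g_to_percent(ug_per_g: float) -> float:
    """Convert micrograms of product per gram fresh weight to percent of fresh weight."""
    mass_fraction = ug_per_g / MICROGRAMS_PER_GRAM
    return mass_fraction * 100.0


def percent_to_ug_per_g(percent: float) -> float:
    """Convert percent of fresh weight to micrograms of product per gram fresh weight."""
    return percent / 100.0 * MICROGRAMS_PER_GRAM


# The threshold stated in the Summary: 300 micrograms per gram fresh weight.
summary_yield_percent = ug_per_g_to_percent(300.0)
```

Because one gram is one million micrograms, a value in micrograms per gram is divided by 10,000 to obtain percent by mass.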

Expression systems are also described herein that include at least one expression vector having a first nucleic acid segment encoding a lipid droplet surface protein and at least one second nucleic acid segment encoding one or more of the following proteins: monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase, wherein the first nucleic segment, the at least one second nucleic acid segment, or a combination thereof are operably linked to a heterologous promoter.

Methods are also described herein. For example, such a method can include: (a) incubating or cultivating one or more host cells, host tissues, host seeds, or host plants, each comprising an expression system comprising at least one expression vector comprising a first nucleic acid segment encoding a lipid droplet surface protein and at least one second nucleic acid segment encoding one or more of the following proteins: monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), or patchoulol synthase, wherein the first nucleic segment, the at least one second nucleic acid segment, or a combination thereof are operably linked to a heterologous promoter; and (b) isolating lipids from the host cell, host tissue, host seed, or host plant.

For example, one of the methods described herein involves (a) incubating a population of host cells comprising an expression system that includes at least one expression cassette having a heterologous promoter operably linked to a nucleic acid segment encoding a fusion protein that includes a lipid droplet surface protein (LDSP) linked in-frame to a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, or a polyterpene synthase; and (b) isolating lipids from the population of host cells. The expression system can also include an expression cassette comprising a promoter operably linked to a nucleic acid encoding a WRI1 transcription factor. In addition, the expression system can include expression cassettes that can express geranylgeranyl diphosphate synthase (GGDPS) enzymes, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), farnesyl diphosphate synthase (FDPS), cytochromes P450, cytochrome P450 reductase, other terpenoid synthesizing enzymes, and combinations thereof.

In some cases, methods of producing terpenes and/or terpenoids can include, for example, (a) incubating a population of host cells comprising an expression system that includes: (i) an expression cassette (or expression vector) having a heterologous promoter operably linked to a nucleic acid segment encoding a geranylgeranyl diphosphate synthase (GGDPS) enzyme, (ii) an expression cassette (or expression vector) having a heterologous promoter that is active in plant plastids operably linked to a nucleic acid segment encoding a 1-deoxy-D-xylulose 5-phosphate synthase (DXS) enzyme, (iii) an expression cassette (or expression vector) having a heterologous promoter operably linked to a nucleic acid segment encoding an abietadiene synthase (ABS) enzyme, or (iv) a combination thereof; and (b) isolating lipids from the population of host cells. In addition, the expression system can include expression cassettes that can express 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), farnesyl diphosphate synthase (FDPS), cytochromes P450, cytochrome P450 reductase, other terpenoid synthesizing enzymes, and combinations thereof.

In some cases, methods of producing terpenes and/or terpenoids can include, for example, (a) incubating a population of host cells comprising an expression system that includes: (i) at least one expression cassette (or expression vector) having a heterologous promoter that is operably linked to a nucleic acid segment encoding a 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR) enzyme; (ii) at least one expression cassette (or expression vector) having a heterologous promoter that is operably linked to a nucleic acid segment encoding a geranylgeranyl diphosphate synthase (GGDPS) enzyme; (iii) at least one expression cassette (or expression vector) having a heterologous promoter that is operably linked to a nucleic acid segment encoding an abietadiene synthase (ABS) enzyme; or (iv) a combination thereof; and (b) isolating lipids from the population of host cells. In addition, the expression system can include expression cassettes that can express 1-deoxy-D-xylulose 5-phosphate synthase (DXS), farnesyl diphosphate synthase (FDPS), cytochrome P450, cytochrome P450 reductase, other terpenoid synthesizing enzymes, and combinations thereof.

Description of the Figures

FIG. 1A-1C illustrate engineered lipid droplet triacylglycerol (TAG) and patchoulol production in N. benthamiana leaves. FIG. 1A illustrates that triacylglycerol accumulation is increased through expression of Arabidopsis thaliana WRINKLED1 (producing AtWRI1(1-397) protein, which has a deletion of the C-terminal region) and enhanced through co-expression of a Nannochloropsis oceanica lipid droplet surface protein (NoLDSP). FIG. 1B illustrates patchoulol production that was engineered to occur in the cytosol in the absence and presence of AtWRI1(1-397) and NoLDSP. FIG. 1C illustrates patchoulol production that was engineered in the plastid in the absence and presence of AtWRI1(1-397) and NoLDSP. To enhance farnesyl diphosphate (FDP) availability for patchoulol production, a cytosolic, de-regulated 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR) from Euphorbia lathyris (ElHMGR(159-582), missing residues 1-158), a plastid-localized Plectranthus barbatus (Coleus forskohlii) 1-deoxy-D-xylulose 5-phosphate synthase (PbDXS, CfDXS, plastid), and an Arabidopsis thaliana farnesyl diphosphate synthase (AtFDPS) (localized in the cytosol or plastid) were expressed in transient assays. The different construct combinations are indicated below each bar (·, was included; -, was not included) and in the schematic diagram next to each graph. Average levels with standard deviation (SD) (n=6) and SD (n=8) for TAG and patchoulol, respectively, are shown. Statistically significant differences are indicated in the bars identified by the letters a-e (P<0.05). MEV pathway, mevalonic acid pathway; MEP pathway (2-C-methyl-D-erythritol 4-phosphate pathway), methylerythritol 4-phosphate pathway; LD, lipid droplet.

FIG. 2A-2F illustrate engineered diterpenoid production in Nicotiana benthamiana leaves. FIG. 2A illustrates production of diterpenoids (abietadiene and its isomers) in the plastids of N. benthamiana leaves, where Abies grandis abietadiene synthase (AgABS) was expressed with a variety of different enzymes. FIG. 2B illustrates production of diterpenoids (abietadiene and its isomers) in the plastids of N. benthamiana leaves when Abies grandis abietadiene synthase (AgABS) was expressed with a variety of different enzymes and/or a truncated WRINKLED1 (WRI1) and/or a Nannochloropsis oceanica lipid droplet surface protein (NoLDSP). FIG. 2C illustrates production of diterpenoids (abietadiene and its isomers) in the cytosol of N. benthamiana leaves when cytosolic Abies grandis abietadiene synthase (AgABS) is expressed with a variety of enzymes and/or truncated WRINKLED1 (WRI1) and/or a Nannochloropsis oceanica lipid droplet surface protein (NoLDSP). To enhance GGDP availability for diterpenoid production in FIGs. 2A-2C, truncated 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR) from Euphorbia lathyris (ElHMGR(159-582), expressed in the cytosol), 1-deoxy-D-xylulose 5-phosphate synthase from Plectranthus barbatus (also called Coleus forskohlii) (PbDXS, expressed in plastids), and distinct geranylgeranyl diphosphate synthases (GGDPSs) (cytosol or plastid) were included in transient assays. The protein combinations are indicated below each bar (black circle, was included; minus, was not included) and in the scheme next to each graph. The production of diterpenoids was engineered in the plastid (FIG. 2A-2B) and in the cytosol (FIG. 2C) in the absence and presence of AtWRI1(1-397) and NoLDSP. Average diterpenoid levels with SD (n=4), SD (n=8) and SD (n=6) are shown in FIGs. 2A, 2B, and 2C, respectively. Statistically significant differences are indicated by letters a-f (P<0.05). MEV pathway, mevalonic acid pathway; MEP pathway, methylerythritol 4-phosphate pathway; LD, lipid droplet.

FIG. 2D-2E illustrate that diterpenoids were sequestered in isolated lipid droplet fractions. FIG. 2D shows floating lipid droplet layers after gradient centrifugation of isolated lipid droplet fractions from N. benthamiana leaves expressing either plastid:AgABS alone or in combination with AtWRI1(1-397) and NoLDSP (with and without YFP-tag). FIG. 2E graphically illustrates diterpenoid content in the isolated lipid droplet fractions with the bars representing average values and SD for three biological replicates (n=3). Statistically significant differences are indicated by the letters a-c (P<0.05). FIG. 2F illustrates that expression of (YFP)-tagged Nannochloropsis oceanica lipid droplet surface protein (LDSP), LDSP-fused ABS(85-868) protein, LDSP-fused CYP720B4(30-483) protein, and LDSP-fused CaCPR(70-708) protein promotes clustering of small lipid droplets in N. benthamiana leaves engineered for triacylglycerol accumulation. In the LDSP-fused ABS(85-868) protein (LD:AgABS(85-868)), the LDSP replaces the transit peptide (residues 1-84) of the ABS enzyme to provide a cytosolic version of the ABS enzyme. The LDSP-fused CYP720B4(30-483) protein (LD:PsCYP720B4(30-483)) is the cytochrome P450 [...] 70-708 is cytochrome P450 reductase (CaCPR) from Camptotheca acuminata without residues 1-69. Confocal laser scanning microscopy merged images are shown for N. benthamiana leaves (yellow, YFP signal; red, chlorophyll fluorescence; scale bar 2 µm).

FIG. 3A-3B illustrate triacylglycerol (TAG) yield in N. benthamiana leaves engineered for the co-production of terpenoids and lipid droplets. FIG. 3A illustrates the impact of engineering patchoulol production on the amounts of lipids (TAG) in N. benthamiana leaves that express a P. cablin patchoulol synthase in the cytosol or plastids (plastid:PcPAS) in addition to other enzymes. FIG. 3B illustrates the impact of engineering diterpenoid production in either plastids or in the cytosol on the amounts of lipids (TAG) produced in N. benthamiana leaves that express a variety of enzymes in addition to Abies grandis abietadiene synthase (AgABS), which can synthesize diterpenes. TAG accumulation was initiated through ectopic expression of WRINKLED1 (AtWRI1(1-397)) and further enhanced through co-expression of NoLDSP. The different construct combinations are indicated below each bar (·, was included; -, was not included). Average TAG levels with SD (n=6) are shown. Statistically significant differences are indicated by a-d (P<0.05).

FIG. 4 illustrates localization of heterologously-expressed yellow fluorescent protein (YFP)-tagged fusion proteins including YFP-tagged Nannochloropsis oceanica lipid droplet surface protein (LDSP), YFP-tagged LDSP-fused AgABS(85-868) (LD:AgABS(85-868), missing residues 1-84), YFP-tagged LDSP-fused CYP720B4 protein (LD:PsCYP720B4(30-483), missing residues 1-29), and YFP-tagged LDSP-fused CPR protein (LD:CaCPR(70-708), missing residues 1-69). The AgABS(85-868) protein was truncated to remove the plastid targeting sequence while the PsCYP720B4(30-483) and CaCPR(70-708) proteins were truncated to remove the membrane anchoring domain. Note that AtWRI1(1-397) was co-produced and leaf samples were stained with Nile red to visualize neutral lipids in lipid droplets. This experiment was replicated twice. Confocal laser scanning microscopy images are shown (the lighter signal is yellow produced by YFP fluorescence; the darker signal is red produced by chlorophyll fluorescence; scale bar 10 µm). The expressed YFP-proteins are indicated in each line. LD, lipid droplet. Channels: YFP yellow fluorescent protein (scale bar 20 µm), NR Nile red (scale bar 20 µm), YFP NR, enlarged merge YFP and NR (scale bar 5 µm).
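
The truncated constructs named in these figure descriptions follow a 1-based, inclusive residue numbering: AgABS(85-868) retains residues 85 through 868 and is missing residues 1-84, and CaCPR(70-708) is missing residues 1-69. The sketch below illustrates that convention on a placeholder sequence. The function names and the placeholder sequence are hypothetical and do not correspond to any SEQ ID NO in the application.

```python
def truncate_residues(sequence: str, first: int, last: int) -> str:
    """Return residues first..last of a protein sequence (1-based, inclusive)."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence first - 1 and last.
    return sequence[first - 1:last]


def residues_removed_from_n_terminus(first: int) -> int:
    """Number of N-terminal residues removed when a construct starts at `first`."""
    return first - 1


# Placeholder 868-residue sequence standing in for a full-length enzyme.
full_length = "M" + "A" * 867

cytosolic_version = truncate_residues(full_length, 85, 868)
removed_count = residues_removed_from_n_terminus(85)
```

Under this convention a construct written as X(a-b) has b - a + 1 residues, so the placeholder above yields a 784-residue fragment with 84 N-terminal residues removed.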

FIG. 5A-5D illustrate that lipid droplets are useful engineering platforms for the production of functionalized diterpenoids. FIG. 5A graphically illustrates diterpenoid and diterpenoid acid production when the following terpenoid biosynthesis enzymes were targeted to lipid droplets as fusion proteins with Nannochloropsis oceanica lipid droplet surface protein (LD): LD:PsCYP720B4(30-483) and LD:CaCPR(70-708), and different combinations with other enzymes were also expressed as indicated below each bar (black circle, was included; minus, was not included). FIG. 5B graphically illustrates diterpenoid and diterpenoid acid production when the following terpenoid biosynthesis enzymes were targeted to lipid droplets as fusion proteins with Nannochloropsis oceanica lipid droplet surface protein (LD): LD:PsCYP720B4(30-483) and LD:CaCPR(70-708), and different combinations with other enzymes were also expressed as indicated below each bar (black circle, was included; minus, was not included). FIG. 5C graphically illustrates diterpenoid and diterpenoid acid production when the following terpenoid biosynthesis enzymes were targeted to lipid droplets as fusion proteins with Nannochloropsis oceanica lipid droplet surface protein (LD): LD:AgABS(85-868), LD:PsCYP720B4(30-483), and LD:CaCPR(70-708), and different combinations with other enzymes were also expressed as indicated below each bar (black circle, was included; minus, was not included). As shown, production of native or modified AgABS led to accumulation of diterpenoids, and when native or modified PsCYP720B4 was co-produced, conversion of diterpenoids to diterpenoid acids was also observed. For FIGs. 5A-5C, data were analyzed by Shapiro-Wilk, Brown-Forsythe ANOVA (diterpenoids P < 0.0184, P < 0.0001, P < 0.0001; diterpenoid acids P < 0.0001, P < 0.0001, P < 0.0001) and Welch ANOVA (diterpenoids P < 0.0509, P 0.0002, P < 0.0001; diterpenoid acids P < 0.0001, P < 0.0001, P 0.0002) followed by t-tests (unpaired, two-tailed, Welch correction). Results are presented as individual biological replicates and bars representing average levels with SD (N indicated below each bar). Statistically significant differences are indicated by a-d based on t-tests (P < 0.05). The experiments relating to FIGs. 5A-5C were replicated twice. FIG. 5D schematically illustrates the conversion of abietadiene to abietic acid when LD:AgABS(85-868) (NoLDSP-AgABS), LD:PsCYP720B4(30-483) (NoLDSP-PsCYP) and LD:CaCPR(70-708) (NoLDSP-CaCPR) were produced. LD, lipid droplet; e-, electron from NADPH.

FIG. 6 illustrates LC/MS analysis of extracts from N. benthamiana leaves producing AtWRI1(1-397) with NoLDSP, ElHMGR(159-582), cytosol:MtGGDPS, LD:AgABS(85-868), and ER:PsCYP720B4. Extracted ion chromatograms for m/z 301.217 are shown in acquisition function 1 (0 V) and function 2 (20-80 V). Compounds 1-4 were subjected to MS/MS analysis. The elution order and MS/MS data were consistent with compounds 1-3 and compound 4 being formate adducts of tetrahexosyl diterpenoid acid isomers and trihexosyl diterpenoid acid, respectively (see FIGs. 7-8).

FIG. 7 illustrates LC/MS/MS analysis of tetrahexosyl diterpenoid acid isomers in N. benthamiana leaf extracts where the leaves transiently expressed AtWRI1(1-397) with NoLDSP, ElHMGR(159-582), cytosol:MtGGDPS, LD:AgABS(85-868), and ER:PsCYP720B4. Accurate masses and MS/MS spectra of compounds 1-3 are consistent with formate adducts of tetrahexosyl diterpenoid acid isomers [M+formate] m/z 995.4 (fragments: [M-formate] m/z 949.4, [M-formate-partial loss of dihexosyl] m/z 667.3 and [M-formate-tetrahexosyl] m/z 301.2).

FIG. 8 illustrates LC/MS/MS analysis of a trihexosyl diterpenoid acid (compound 4) in N. benthamiana leaf extracts where the leaves transiently expressed AtWRI1(1-397) with NoLDSP, ElHMGR(159-582), cytosol:MtGGDPS, LD:AgABS(85-868), and ER:PsCYP720B4. Elemental composition and MS/MS spectrum of compound 4 are consistent with a formate adduct of trihexosyl diterpenoid acid [M+formate] m/z 833.3 (fragments: [M-formate] m/z 787.4, [M-formate-dihexosyl] m/z 463.3 and [M-formate-trihexosyl] m/z 301.2).
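The fragment series reported for FIGs. 7-8 can be cross-checked with simple mass arithmetic. The sketch below is illustrative only: it assumes each hexosyl unit adds one anhydrohexose residue (C6H10O5) and that the formate adduct differs from the deprotonated ion by one formic acid. Those two monoisotopic masses are standard values and are not stated in the source. The function name is hypothetical.

```python
# Monoisotopic masses in Da; standard values assumed here, not taken from the source.
HEXOSE_RESIDUE = 162.0528  # C6H10O5, one anhydrohexose unit
FORMIC_ACID = 46.0055      # HCOOH, difference between [M+formate]- and [M-H]-
AGLYCONE_ION = 301.217     # m/z of the diterpenoid acid ion reported for FIG. 6


def glycoside_ions(n_hexoses):
    """Return (deprotonated ion, formate adduct) m/z for n attached hexoses."""
    deprotonated = AGLYCONE_ION + n_hexoses * HEXOSE_RESIDUE
    return deprotonated, deprotonated + FORMIC_ACID


for n, label in [(4, "tetrahexosyl"), (3, "trihexosyl")]:
    ion, adduct = glycoside_ions(n)
    print(f"{label}: [M-formate] {ion:.1f}, [M+formate] {adduct:.1f}")
```

Under these assumptions the computed values fall within about 0.1 m/z of the ions listed for compounds 1-4, which is in line with the stated assignments.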

FIG. 9 is a schematic diagram illustrating lipid droplet scaffolding of squalene biosynthesis enzymes farnesyl diphosphate synthase (FPPS) and squalene synthase (SQS), the final two steps of squalene biosynthesis. Lipid droplet formation is induced by expression of AtWRI1(1-397) and by expression of variations of NoLDSP alone or as LDSP-fusions with either FPPS or SQS.

FIG. 10 graphically illustrates casbene levels generated during a screen of 1-deoxy-D-xylulose 5-phosphate synthase (DXS) and DXS alternatives that were co-expressed with Coleus forskohlii GGPPS (CfGGPPS) and a casbene synthase (CasS). Vertical bars represent upper and lower value limits. The interquartile range between the first and third quartiles is represented by the box. The middle horizontal bar represents the median value and the red cross represents the average value.

FIG. 11 graphically illustrates results of screening squalene synthases for optimal activity. The graph shows squalene yields as determined by GC-FID for various squalene synthases, where the relative yields are reported as the ratio of squalene to the internal standard, n-hexacosane. As illustrated, a Mortierella alpina squalene synthase with 17 amino acids truncated from the C-terminus had the highest squalene synthase activity.

FIG. 12 graphically illustrates results of screening of farnesyl diphosphate synthase (FPPS) candidates to optimize squalene synthesis. The graph shows squalene yields as determined by GC-FID for various farnesyl diphosphate synthases, where the relative yields are reported as the ratio of squalene to an internal standard.

FIG. 13A-13B graphically illustrate that linkage of lipid droplet surface protein to enzymes involved in squalene biosynthesis can improve squalene accumulation. FIG. 13A shows that expression of squalene synthase fused to lipid droplet surface protein can improve squalene synthesis compared to when squalene synthase is in soluble (non-fused) form. FIG. 13B shows that fusion of squalene synthase or FPPS to lipid droplet surface protein can improve squalene accumulation.

FIG. 14 illustrates improved capacity of the lipid droplet scaffolding platform by providing contributions from the MEP pathway and the plastidial squalene biosynthesis pathway.

FIG. 15 illustrates Agrobacterium-mediated transient expression of lipid droplet surface protein fusions in leaves of poplar NM6, performed to expand LD scaffolding to new species. Top row: images of wild type, non-infiltrated poplar leaves. Middle row: images of a leaf transiently expressing the eYFP-NoLDSP fusion gene from the pEAQ vector. Bottom row: images of a leaf transiently expressing AtWRI1(1-397) linked to eYFP-NoLDSP by the "self-cleaving" LP4/2A hybrid linker, which is cleaved during translation to form the two separate protein products. Punctae shown in bottom row images indicate formation of lipid droplets in leaves of poplar NM6.

Detailed Description

Described herein are methods for high-yield synthesis of lipid compounds, including terpenes, terpenoids, steroids and biofuels (oils) in engineered lipid droplet-accumulating plant cells. For example, the systems and methods described herein can facilitate production of products such as terpenoids, carotenoids, withanolides, ubiquinones, dolichols, sterols, and biofuels. To do this, one or more of the enzymes that synthesize such products can be fused to a lipid droplet surface protein (LDSP), or a portion thereof. Such a LDSP-synthetic enzyme fusion protein is anchored on lipid droplet organelles within host cells. As the anchored synthetic enzymes make their hydrophobic, and sometimes volatile, products, these products accumulate in the lipid droplets. Hence, hydrophobic and volatile products are sequestered in a hydrophobic environment where they do not injure the cell. Instead, the hydrophobic and volatile products remain solubilized within the lipid droplets (rather than being lost by vaporization). In addition, the concentration of hydrophobic and volatile products within the lipid droplets facilitates their separation and purification away from other cellular materials. For example, lipids useful as biofuels (e.g. squalene and related compounds) can be made in commercially relevant plant species where the lipids are concentrated within lipid droplets that can readily be isolated from plant materials. To optimize such production, the availability of precursors for such terpenoid products can also be enhanced by engineering the cells to also express de-regulated, robust enzymes from the mevalonic acid (MEV) pathway or the methylerythritol 4-phosphate (MEP) pathway. The enzymes can be expressed or transported into the same intracellular compartments or into intracellular compartments that optimize terpenoid synthesis.

Lipid Droplet Surface Protein (LDSP)

As illustrated herein, fusion of synthetic enzymes with lipid droplet surface protein (LDSP), or a portion thereof, can increase manufacture of various terpenoid products. Hence, the LDSP or a portion thereof can be linked in frame with a fusion partner such as a terpene synthase. The LDSP can localize and stabilize fusion partner enzymes within or at the surface of lipid droplets. The lipid droplets can absorb and concentrate / sequester lipophilic products such as terpenoids.

Cytosolic lipid droplets are dynamic organelles typically found in seeds as reservoirs for physiological energy and carbon in the form of triacylglycerol (oil) to fuel germination. They are derived from the endoplasmic reticulum (ER) where newly synthesized triacylglycerol accumulates in lens-like structures between the leaflets of the membrane bilayer. After growing in size, the lipid droplets can bud off from the outer membrane of the endoplasmic reticulum.

A mature lipid droplet is typically composed of a hydrophobic core of triacylglycerol surrounded by a phospholipid monolayer and coated with lipid droplet associated proteins such as oleosins involved in the biogenesis and function of the organelle. These oleosins contain surface-oriented amphipathic N- and C-termini essential to efficiently emulsify lipids and a conserved hydrophobic central domain anchoring the oleosins onto the surface of lipid droplets. One type of lipid droplet associated protein is a lipid droplet surface protein.

An amino acid sequence for the full-length Nannochloropsis oceanica lipid droplet surface protein (NoLDSP, JQ268559.1) is shown below as SEQ ID NO:1. Such an LDSP polypeptide can be fused to enzymes such as those involved in the synthesis of terpenes and terpenoids. When a LDSP polypeptide is fused to another protein or enzyme, (LD) or LD is used with the protein or enzyme name.

A nucleic acid sequence for the full-length N. oceanica lipid droplet surface protein (NoLDSP, JQ268559.1) is shown below as SEQ ID NO:2.

Expression cassettes and expression vectors can have a nucleic acid segment that includes a segment with SEQ ID NO:2 and/or a segment encoding an LDSP protein with SEQ ID NO: 1.

The LDSP can have one or more deletions, insertions, replacements, or substitutions without loss of LDSP activities. Such LDSP activities include localizing and stabilizing enzymes within or at the surface of lipid droplets. The LDSP can have, for example, at least 60%, or at least 70%, or at least 80%, or at least 90%, or at least 95%, or at least 97%, or at least 98%, or at least 99% sequence identity to a sequence described herein.
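The identity thresholds listed above can be made concrete with a small calculation. The following is a minimal sketch, assuming the two sequences have already been aligned (gaps written as "-") and that identity is counted over aligned columns. The source does not specify an alignment tool or a denominator convention, so both choices, and the function names, are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=95.0):
    """True if the pair reaches the chosen identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# First ten residues of SEQ ID NO:5 and SEQ ID NO:7 as listed in this document.
print(percent_identity("MASCGAIGSS", "MASCGAIRSS"))
```

The ten-residue fragments differ at one position, so the fragment pair scores 90% under this convention. Full-length comparisons would normally be run on a proper pairwise alignment.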

The systems and methods described herein are useful for synthesizing terpenes, terpenoids, and compounds made from terpenes and terpenoids. A variety of enzymes useful for making such compounds can be used in native or modified forms and are described hereinbelow. Many of the enzymes are part of the mevalonate pathway or the mevalonic acid pathway.

Mevalonate (MEV) Pathway

The mevalonate pathway, also known as the isoprenoid pathway or HMG-CoA reductase pathway, is an essential metabolic pathway present in eukaryotes, archaea, and some bacteria. The pathway produces the two five-carbon building blocks for terpenes (isoprenoids): isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP).

Isoprenoids are a diverse class of over 30,000 biomolecules such as cholesterol, heme, vitamin K, coenzyme Q10, steroid hormones and molecules used in processes as diverse as protein prenylation, cell membrane maintenance, the synthesis of hormones, protein anchoring and N-glycosylation.

The mevalonate pathway is shown below, beginning with acetyl-CoA and ending with the production of IPP and DMAPP.

The MEV pathway starts with the condensation of two molecules of acetyl-CoA (3) by acetyl-coenzyme A acetyl transferase to form acetoacetyl-CoA (4). Further condensation with a third molecule of acetyl-CoA by HMG-CoA synthase produces 3-hydroxy-3-methyl-glutaryl-CoA (HMG-CoA, 5), which is then reduced by HMG-CoA reductase (HMGR) to give mevalonic acid (6). Following two consecutive phosphorylation steps catalyzed by mevalonic acid kinase (MVK) and phosphomevalonate kinase (PMK), the resulting mevalonate-5-diphosphate (8) is converted to isopentenyl pyrophosphate (1) in an ATP-coupled decarboxylation reaction catalyzed by mevalonate-5-diphosphate decarboxylase (MPD). Grochowski et al. (J. Bacteriol. 188:3192-3198 (2006)) identified an enzyme from Methanocaldococcus jannaschii capable of phosphorylating isopentenyl phosphate (9) to isopentenyl pyrophosphate (1). A modified MEV pathway was thus proposed in which mevalonate-5-phosphate (7) is decarboxylated to 9 and then phosphorylated by isopentenyl phosphate kinase (IPK) to form isopentenyl pyrophosphate (1). However, the proposed phosphomevalonate decarboxylase (PMD, 7 → 9 conversion) has yet to be identified.

While the plastidic MEP pathway (described below) results in the synthesis of both IPP and DMAPP, the cytosol-localized mevalonate pathway produces only IPP. IPP can be isomerized to DMAPP by isopentenyl diphosphate isomerase (IDI), a divalent metal ion-requiring enzyme found in all living organisms.
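The canonical MEV steps described above can be written as an ordered table of substrate, enzyme, and product, which makes it easy to see which enzymes lie between any two intermediates. This is only an illustrative sketch: the data structure and the helper name are hypothetical, and the modified archaeal route is not included.

```python
# Canonical MEV steps as (substrate, enzyme, product), in the order given in the text.
MEV_STEPS = [
    ("acetyl-CoA", "acetyl-CoA acetyltransferase", "acetoacetyl-CoA"),
    ("acetoacetyl-CoA", "HMG-CoA synthase", "HMG-CoA"),
    ("HMG-CoA", "HMG-CoA reductase (HMGR)", "mevalonic acid"),
    ("mevalonic acid", "mevalonic acid kinase (MVK)", "mevalonate-5-phosphate"),
    ("mevalonate-5-phosphate", "phosphomevalonate kinase (PMK)",
     "mevalonate-5-diphosphate"),
    ("mevalonate-5-diphosphate",
     "mevalonate-5-diphosphate decarboxylase (MPD)", "IPP"),
    ("IPP", "isopentenyl diphosphate isomerase (IDI)", "DMAPP"),
]


def enzymes_between(start, end, steps=MEV_STEPS):
    """List the enzymes needed to get from `start` to `end` along the linear route."""
    route, current = [], start
    while current != end:
        step = next((s for s in steps if s[0] == current), None)
        if step is None:
            raise ValueError(f"no step leads from {current!r} toward {end!r}")
        route.append(step[1])
        current = step[2]
    return route


print(enzymes_between("HMG-CoA", "IPP"))
```

Listing the route from HMG-CoA to IPP returns HMGR first, the enzyme the text later describes as rate-limiting.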

Methylerythritol Phosphate (MEP) Pathway

For decades, the mevalonic acid pathway was thought to be the only IPP and DMAPP biosynthetic pathway. However, the incompatibility of many isotopic labeling results relating to the MEV pathway had been puzzling. Efforts to resolve such discrepancies eventually led to the discovery of the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, also known as the 1-deoxy-D-xylulose 5-phosphate (DXP) or non-mevalonate pathway.

In plants, the MEP pathway is active in plastids. Reactions proceeding by the MEP pathway are shown below.

The MEP pathway is initiated with a thiamin diphosphate-dependent condensation between D-glyceraldehyde 3-phosphate (11) and pyruvate (10) by 1-deoxy-D-xylulose 5-phosphate synthase (DXS) to produce 1-deoxy-D-xylulose 5-phosphate (DXP, 12), which is then reductively isomerized to methylerythritol phosphate (13) by DXP reducto-isomerase (DXR/IspC). Subsequent coupling between methylerythritol phosphate (13) and cytidine 5'-triphosphate (CTP) is catalyzed by CDP-ME synthetase (IspD) and produces methylerythritol cytidyl diphosphate (CDP-ME, 14). An ATP-dependent enzyme (IspE) phosphorylates the C2 hydroxyl group of 14, and the resulting 4-diphosphocytidyl-2-C-methyl-D-erythritol-2-phosphate (CDP-MEP, 15) is cyclized by 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (IspF) to 2-C-methyl-D-erythritol-2,4-cyclodiphosphate (MEcPP, 16). 1-Hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (IspG) catalyzes the ring-opening of the cyclic pyrophosphate and the C3-reductive dehydration of MEcPP (16) to form 4-hydroxy-3-methyl-butenyl 1-diphosphate (HMBPP, 17). The final step of the MEP pathway is catalyzed by 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (IspH) and converts HMBPP (17) to both IPP (1) and DMAPP (2). Thus, unlike the MEV pathway, IPP:DMAPP isomerase (IDI) is not essential in many MEP pathway utilizing organisms. Any of the enzymes of the MEV and MEP pathways can be employed in the systems and methods described herein.

Enzymes

A variety of enzymes can be used to make terpenoids. In some cases, fusion of those enzymes to lipid droplet surface proteins can increase lipid and terpenoid production within host cells and host plants. For example, sequestration of a desired product in lipid droplets can increase production of a product and facilitate isolation of that product. Such sequestration of a product can be optimized by fusing or linking enzymes in the final steps of synthesizing the product to a lipid droplet surface protein. Enzymes that provide precursors for the final product may not, in some cases, need to be fused or linked to a lipid droplet surface protein. For example, if the desired product is patchoulol or squalene, fusion of patchoulol synthase or squalene synthase, respectively, to a lipid droplet surface protein can help sequester the patchoulol or squalene within lipid droplets. Use of lipid droplets to collect desirable products can also prevent modification of the products into undesired side products, because the lipid droplets can shield the products from modification by other cellular enzymes. As described above, in plants the C5-building blocks for terpenoids, dimethylallyl diphosphate (DMADP) and isopentenyl diphosphate (IDP), are synthesized by two compartmentalized pathways. The mevalonic acid pathway converts acetyl-CoA by enzyme activities located in the cytosol, endoplasmic reticulum and peroxisomes, providing precursors for a wide range of terpenoids with diverse functions such as in growth and development, defense and protein prenylation. The enzyme 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR) catalyzes the rate-limiting step in the mevalonic acid pathway. As illustrated herein, N-terminal truncation of HMGR to its catalytic domain can improve the flux of precursors into terpenoid biosynthesis.

In the plastid, the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway uses pyruvate and D-glyceraldehyde 3-phosphate to provide precursors for the biosynthesis of terpenoids related to development, photosynthesis and defense against biotic and abiotic stresses. The enzyme 1-deoxy-D-xylulose 5-phosphate synthase (DXS) is rate-limiting in the MEP pathway. Constitutive overproduction of DXS can enhance terpenoid production in some plant species tested. For example, when DXS is expressed in plastids, DXS overexpression can improve production of sesquiterpenes via a sesquiterpene-synthesizing enzyme, especially when farnesyl diphosphate synthase (FDPS) is also produced in plastids to provide farnesyl pyrophosphate building blocks.

Head-to-tail condensation of DMADP and IDP affords linear isoprenyl diphosphates, such as farnesyl diphosphate (FDP, C15) or geranylgeranyl diphosphate (GGDP, C20), catalyzed by farnesyl diphosphate synthase (FDPS) and geranylgeranyl diphosphate synthase (GGDPS), respectively. In Nicotiana benthamiana, both DXS and GGDPS were required to enhance terpenoid synthesis. Cytosolic sesquiterpene synthases and plastidial diterpene synthases convert FDP and GGDP, respectively, into typically cyclic terpenoid scaffolds, contributing to the enormous structural diversity among terpenoids in the plant kingdom. Such terpenoid scaffolds often undergo further stereo- and regio-selective functionalization catalyzed by ER membrane-bound monooxygenases, such as cytochromes P450 (CYPs), which utilize electrons provided by co-localized NADPH-dependent cytochrome P450 reductases (CPRs). Terpenoid biotechnology in photosynthetic tissues has remained challenging because the engineered pathways must compete for precursors with highly networked native pathways (and their associated regulatory mechanisms).
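The C15 and C20 chain lengths mentioned above follow directly from counting C5 units: one DMADP starter plus successive IDP additions. The short sketch below illustrates that arithmetic. The function name is hypothetical, and the IDP counts used (two for FDP, three for GGDP) are standard biochemistry rather than values stated in the source.

```python
C5_UNIT = 5  # carbons in one DMADP or IDP building block


def prenyl_chain_carbons(n_idp):
    """Carbons in the linear prenyl diphosphate built from one DMADP plus n_idp IDP units."""
    if n_idp < 1:
        raise ValueError("at least one IDP unit is needed for a condensation")
    return C5_UNIT * (1 + n_idp)


# FDP requires two IDP additions to DMADP; GGDP requires three.
for name, n_idp in [("FDP", 2), ("GGDP", 3)]:
    print(f"{name}: C{prenyl_chain_carbons(n_idp)}")
```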

Examples of enzymes that can produce useful precursors and/or facilitate terpene synthesis include Plectranthus barbatus (Coleus forskohlii) 1-deoxy-D-xylulose 5-phosphate synthase (PbDXS), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR) from Euphorbia lathyris (ElHMGR or a truncated ElHMGR(159-582)), geranylgeranyl diphosphate synthase (GGDPS), farnesyl diphosphate synthase (FDPS), or combinations thereof. As illustrated herein, a type I enzyme such as the Methanothermobacter thermautotrophicus GGDPS (MtGGDPS, type I) can be a robust alternative to type II GGDPS enzymes that can increase precursor availability for diterpenoid synthesis and circumvent potential negative feedback (see, e.g., FIGs. 2A-2B). The methods and expression systems described herein are useful for manufacture of terpenes, diterpenes, sesquiterpenes, triterpenoids, and combinations thereof. For example, the methods and expression systems described herein are also useful for manufacture of FDPS-dependent sesquiterpenoids, triterpenoids, or combinations thereof.

The highest accumulation of an example target sesquiterpenoid was achieved through compartmentation of the biosynthetic pathway in the plastid instead of the cytosol (FIG. 1C). Diterpenoid pathways were engineered in the plastid (PbDXS + plastid:MtGGDPS + plastid:AgABS) or in the cytosol/lipid droplets (ElHMGR(159-582) + cytosol:MtGGDPS + LD:AgABS(85-868)) with equal success, yielding a high content of target diterpenoids in vegetative tissue and demonstrating the practicability of the chosen approaches (FIGs. 2 and 5).

Sequences of some of the enzymes useful for making precursors for terpene / terpenoid synthesis and other useful products are provided herein.

For example, a 1-deoxy-D-xylulose-5-phosphate synthase (EC 2.2.1.7; DXS) can facilitate synthesis of precursors for a variety of terpenes. Such a DXS enzyme can catalyze the following reaction:


pyruvate + D-glyceraldehyde 3-phosphate → 1-deoxy-D-xylulose 5-phosphate + CO2
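The DXS reaction above can be checked for atom balance. The following sketch is for illustration only; it assumes the neutral molecular formulas of each compound, which are standard chemistry and are not given in the source, and the helper names are hypothetical.

```python
from collections import Counter

# Neutral molecular formulas (assumed standard values, not stated in the source).
FORMULAS = {
    "pyruvate": {"C": 3, "H": 4, "O": 3},
    "D-glyceraldehyde 3-phosphate": {"C": 3, "H": 7, "O": 6, "P": 1},
    "1-deoxy-D-xylulose 5-phosphate": {"C": 5, "H": 11, "O": 7, "P": 1},
    "CO2": {"C": 1, "O": 2},
}


def total_atoms(names):
    """Sum element counts over a list of compound names."""
    total = Counter()
    for name in names:
        total.update(FORMULAS[name])
    return total


def is_balanced(substrates, products):
    """True if both sides of the reaction contain the same atoms."""
    return total_atoms(substrates) == total_atoms(products)


substrates = ["pyruvate", "D-glyceraldehyde 3-phosphate"]
products = ["1-deoxy-D-xylulose 5-phosphate", "CO2"]
print(is_balanced(substrates, products), dict(total_atoms(substrates)))
```

With these formulas both sides carry six carbons, which reflects the loss of one carbon from the two C3 substrates as CO2 to give the C5 product.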

One example of a useful DXS enzyme is a Plectranthus barbatus (Coleus forskohlii) 1-deoxy-D-xylulose 5-phosphate synthase (PbDXS; accession MH363713), which can have the following amino acid sequence (SEQ ID NO:3).

An example of a nucleotide sequence that encodes the Plectranthus barbatus (Coleus forskohlii) 1-deoxy-D-xylulose 5-phosphate synthase (PbDXS) enzyme with SEQ ID NO:3 is shown below as SEQ ID NO:4.

TCAACACACC CGATGACAAA ATCATCTGGG ATGTTGGACA TCAGGCGTAT CCACACAAAA TCTTGACAGG GAGGAGGTCC AGAATGCACA CCATCCGACA GACTTTCGGG CTTGCAGGGT TCCCCAAGAG GGATGAGAGC CCGCACGACG CCTTCGGAGC TGGTCACAGC TCCACTAGTA TTTCAGCTGG TCTAGGGATG GCGGTGGGGA GGGACTTGCT GCAGAAGAAC AACCACGTGA TCTCGGTGAT CGGCGACGGG GCCATGACAG CGGGGCAGGC ATACGAGGCC TTGAACAATG CAGGATTTCT TGATTCCAAT CTGATCATCG TGTTGAACGA CAACAAACAA GTGTCCCTGC CTACAGCCAC AGTCGACGGC CCTGCTCCTC CCGTCGGAGC CTTGAGCAAA GCCCTCACCA AGCTGCAAGC AAGCAGGAAG TTCCGGCAGC TACGAGAAGC AGCAAAAGGC ATGACTAAGC AGATGGGAAA CCAAGCACAC GAAATTGCAT CCAAGGTAGA CACTTACGTT AAAGGAATGA TGGGGAAACC AGGCGCCTCC CTCTTCGAGG AGCTCGGGAT TTATTACATC GGCCCTGTAG ATGGACATAA CATCGAAGAT CTTGTCTATA TTTTCAAGAA AGTTAAGGAG ATGCCTGCGC CCGGCCCTGT TCTTATTCAC ATCATCACCG AGAAGGGCAA AGGCTACCCT CCAGCTGAAG TTGCTGCTGA CAAAATGCAT GGTGTGGTGA AGTTTGATCC AACAACGGGG AAACAGATGA AGGTGAAAGC GAAGACTCAA TCATACACCC AATACTTCGC GGAGTCTCTG GTTGCAGAAG CAGAGCAGGA CGAGAAAGTG GTGGCGATCC ACGCGGCCAT GGGAGGCGGA ACGGGGCTGA ACATCTTCCA GAAACGGTTT CCCGACCGAT GTTTCGATGT CGGGATAGCC GAGCAGCATG CAGTCACCTT CGCCGCGGGT CTTGCAACGG AAGGCCTCAA GCCCTTCTGC ACAATCTACT CTTCCTTCCT GCAGCGAGGC TATGATCAGG TGGTGCACGA TGTGGATCTT CAGAAACTCC CGGTGAGATT CATGATGGAC AGAGCTGGAC TGGTGGGAGC TGACGGCCCA ACCCATTGCG GCGCCTTCGA CACCACCTAC ATGGCCTGCC TGCCCAACAT GGTGGTCATG GCTCCCTCAG ATGAGGCTGA GCTCATGCAC ATGGTCGCCA CCGCCGCCGT CATTGATGAT CGCCCTAGCT GCGTTAGGTA CCCTAGAGGA AACGGTATAG GGGTGCCCCT CCCTCCAAAC AACAAAGGAA TTCCATTAGA GGTTGGGAAG GGAAGGATTT TGAAAGAGGG TAACCGAGTT GCCATTCTAG GCTTCGGAAC TATCGTGCAA AACTGTCTAG CAGCAGCCCA ACTTCTTCAA GAACACGGCA TATCCGTGAG CGTAGCCGAT GCGAGATTCT GCAAGCCTCT GGATGGAGAT CTGATCAAGA ATCTTGTGAA GGAGCACGAA GTTCTCATCA CTGTGGAAGA GGGATCCATT GGAGGATTCA GTGCACATGT CTCTCATTTC TTGTCCCTCA ATGGACTCCT CGACGGCAAT CTTAAGTGGA GGCCTATGGT GCTCCCAGAT AGGTACATTG ATCATGGAGC ATACCCTGAT CAGATTGAGG AAGCAGGGCT GAGCTCAAAG CATATTGCAG GAACTGTTTT GTCACTTATT GGTGGAGGGA AAGACAGTCT TCATTTGATC AACATG

A Plectranthus barbatus 1-deoxy-D-xylulose 5-phosphate synthase (PbDXS) protein with SEQ ID NO:3 was used in experiments described in the Examples. The PbDXS nucleotide sequence used in the experiments (SEQ ID NO:3) described herein significantly differed from the previously published sequence (Gnanasekaran et al. J. Biol. Eng. 9, 24 (2015)).

DXS enzymes with sequences that are not identical to SEQ ID NO:3 can also be used. For example, a variant Plectranthus barbatus 1-deoxy-D-xylulose 5-phosphate synthase (PbDXS) protein (NCBI accession number KP889115.1) is shown below as SEQ ID NO:5.

1 MASCGAIGSS FLPLLHSDES SLLSRPTAAL HIKKQKFSVG 41 AALYQDNTND WPSGEGLTR QKPRTLSFTG EKPSTPILDT 81 INYPIHMKNL SVEELEILAD ELREEIVYTV SKTGGHLSSS 121 LGVSELTVAL HHVFNTPDDK IIWDVGHQAY PHKILTGRRS 161 RMHTIRQTFG LAGFPKRDES PHDAFGAGHS STSISAGLGM 201 AVGRDLLQKN NHVISVIGDG AMTAGQAYEA MNNAGFLDSN 241 LIIVLNDNKQ VSLPTATVDG PAPPVGALSK ALTKLQASRK 281 FRQLREAAKG MTKQMGNQAH EIASKVDTYV KGMMGKPGAS 321 LFEELGIYYI GPVDGHNIED LVYIFKKVKE MPAPGPVLIH 361 IITEKGKGYP PAEVAADKMH GWKFDPTTG KQMKVKTKTQ 401 SYTQYFAESL VAEAEQDEKV VAIHAAMGGG TGLNIFQKRF 441 PDRCFDVGIA EQHAVTFAAG LATEGLKPFC TIYSSFLQRG 481 YDQWHDVDL QKLPVRFMMD RAGLVGADGP THCGAFDTTY 521 MACLPNMWM APSDEAELMH MVATAAVIDD RPSCVRYPRG 561 NGIGVPLPPN NKGIPLEVGK GRILKEGNRV AILGFGTIVQ 601 NCLAAAQLLQ EHGISVSVAD ARFCKPLDGD LIKNLVKEHE 641 VLITVEEGSI GGFSAHVSHF LSLNGLLDGN LKWRPMVLPD 681 RYIDHGAYPD QIEEAGLSSK HIAGTVLSLI GGGKDSLHLI 721 NM

A cDNA sequence for Plectranthus barbatus 1-deoxy-D-xylulose 5-phosphate synthase (PbDXS) with SEQ ID NO:5 is shown below as SEQ ID NO:6.

1 ATGGCGTCTT GTGGAGCTAT CGGGAGTAGT TTCTTGCCAC

41 TGCTCCATTC CGACGAGTCA AGCTTGTTAT CTCGGCCCAC

81 TGCTGCTCTT CACATCAAGA AGCAGAAGTT TTCTGTGGGA

121 GCTGCTCTGT ACCAGGATAA CACGAACGAT GTCGTTCCGA

161 GTGGAGAGGG TCTGACGAGG CAGAAACCAA GAACTCTGAG

201 TTTCACGGGA GAGAAGCCTT CAACTCCAAT TTTGGATACC

241 ATCAACTATC CAATCCACAT GAAGAATCTG TCCGTGGAGG

281 AACTGGAGAT ATTGGCCGAT GAACTGAGGG AGGAGATAGT

321 TTACACGGTG TCGAAAACGG GAGGGCATTT GAGCTCAAGC

361 TTGGGTGTAT CAGAGCTCAC CGTTGCACTG CATCATGTAT

401 TCAACACACC CGATGACAAA ATCATCTGGG ATGTTGGACA

441 TCAGGCGTAT CCACACAAAA TCTTGACAGG GAGGAGGTCC

481 AGAATGCACA CCATCCGACA GACTTTCGGG CTTGCAGGGT

521 TCCCCAAGAG GGATGAGAGC CCGCACGACG CGTTCGGAGC

561 TGGTCACAGC TCCACTAGTA TTTCAGCTGG TCTAGGGATG

601 GCGGTGGGGA GGGACTTGCT ACAGAAGAAC AACCACGTGA

641 TCTCGGTGAT CGGAGACGGA GCCATGACAG CGGGGCAGGC

681 ATACGAGGCC ATGAACAATG CAGGATTTCT TGATTCCAAT

721 CTGATCATCG TGTTGAACGA CAACAAACAA GTGTCCCTGC

761 CTACAGCCAC CGTCGACGGC CCTGCTCCTC CCGTCGGAGC

801 CTTGAGCAAA GCCCTCACCA AGCTGCAAGC AAGCAGGAAG

841 TTCCGGCAGC TACGAGAAGC AGCAAAAGGC ATGACTAAGC

881 AGATGGGAAA CCAAGCACAC GAAATTGCAT CCAAGGTAGA

921 CACTTACGTT AAAGGAATGA TGGGGAAACC AGGCGCCTCC

961 CTCTTCGAGG AGCTCGGGAT TTATTACATC GGCCCTGTAG

1001 ATGGACATAA CATCGAAGAT CTTGTCTATA TTTTCAAGAA

1041 AGTTAAGGAG ATGCCTGCGC CCGGCCCTGT TCTTATTCAC

1081 ATCATCACCG AGAAGGGCAA AGGCTACCCT CCAGCTGAAG

1121 TTGCTGCTGA CAAAATGCAT GGTGTGGTGA AGTTTGATCC

1161 AACAACGGGG AAACAGATGA AGGTGAAAAC GAAGACTCAA

1201 TCATACACCC AATACTTCGC GGAGTCTCTG GTTGCAGAAG

1241 CAGAGCAGGA CGAGAAAGTG GTGGCGATCC ACGCGGCGAT

1281 GGGAGGCGGA ACGGGGCTGA ACATCTTCCA GAAACGGTTT

1321 CCCGACCGAT GTTTCGATGT CGGGATAGCC GAGCAGCATG

1361 CAGTCACCTT CGCCGCGGGT CTTGCAACGG AAGGCCTCAA

1401 GCCCTTCTGC ACAATCTACT CTTCCTTCCT GCAGCGAGGT

1441 TATGATCAGG TGGTGCACGA TGTGGATCTT CAGAAACTCC

1481 CGGTGAGATT CATGATGGAC AGAGCTGGAC TTGTGGGAGC

1521 TGACGGCCCA ACCCATTGCG GCGCCTTCGA CACCACCTAC

1561 ATGGCCTGCC TGCCCAACAT GGTCGTCATG GCTCCCTCCG

1601 ATGAGGCTGA GCTCATGCAC ATGGTCGCCA CTGCCGCTGT

1641 CATTGATGAT CGCCCTAGCT GCGTTAGGTA CCCTAGAGGA

1681 AACGGTATAG GGGTGCCCCT CCCTCCAAAC AATAAAGGAA

1721 TTCCATTAGA GGTTGGGAAG GGAAGGATTT TGAAAGAGGG

1761 TAACCGAGTT GCCATTCTAG GGTTGGGAAG TATCGTGCAA

1801 AACTGTCTAG CAGCAGCCCA ACTTCTTCAA GAACACGGCA

1841 TATCCGTGAG CGTAGCCGAT GCGAGATTCT GCAAGCCTCT

1881 GGATGGAGAT CTGATCAAGA ATCTTGTGAA GGAGCACGAA

1921 GTTCTCATCA CTGTGGAAGA GGGATCCATT GGAGGATTCA

1961 GTGCACATGT CTCTCATTTC TTGTCCCTCA ATGGACTCCT

2041 CGACGGCAAT CTTAAGTGGA GGCCTATGGT GCTCCCAGAT

2081 AGGTACATTG ATCATGGAGC ATACCCTGAT CAGATTGAGG

2121 AAGCAGGGCT GAGCTCAAAG CATATTGCAG GAACTGTTTT

2161 GTCACTTATT GGTGGAGGGA AAGACAGTCT TCATTTGATC

2201 AACATGTAA
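As a quick consistency check on the listing above, the start and end of the SEQ ID NO:6 cDNA can be translated and compared with the corresponding residues of SEQ ID NO:5. This is a minimal sketch that assumes the standard genetic code; the table construction and function name are illustrative and are not part of the source.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; spaces are ignored, '*' marks a stop."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First listed row (positions 1-40) and last listed row of SEQ ID NO:6.
print(translate("ATGGCGTCTT GTGGAGCTAT CGGGAGTAGT TTCTTGCCAC"))
print(translate("AACATGTAA"))
```

The first row translates to the same residues that open SEQ ID NO:5 (MASCGAIGSS FLP), and the final row ends in NM followed by a stop codon, matching the end of the protein listing.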

A comparison of the SEQ ID NO:3 and SEQ ID NO:5 Plectranthus barbatus 1-deoxy-D-xylulose 5-phosphate synthase (PbDXS) proteins is shown below, illustrating that these two DXS proteins have at least 99.3% sequence identity.

Another 1-deoxy-D-xylulose 5-phosphate synthase enzyme that can be used as a fusion partner with LDSP is the Isodon rubescens DXS protein (NCBI accession number AMM72794.1), shown below as SEQ ID NO:7.

1 MASCGAIRSS FLPLLHSDDS SLLSRTAAAL PIKKQKFSVG 41 AALQQDNSND VAANGESLTR QKPRALSFTG EKPSTPILDT 81 INYPNHMKNL SVEELERLAD ELREE IVYSV SKTGGHLSSS 121 LGVSELTVAL HHVFNTPDDK I IWDVGHQAY PHKILTGRRS 161 RMNTIRQTFG LAGFPKRDES AHDAFGAGHS STSI SAGLGM 201 AVGRDLLKKN NHVISVIGDG AMTAGQAYEA LNNAGFLDSN 241 LIWLNDNKQ VSLPTATVDG PAPPVGALSK ALTRLQASRK 281 FRQLREAAKG MTKQMGNQAH EVASKVDTYV KGMMGKPGAS 321 LFEELGIYYI GPVDGHSMED LVYIFQKVKE MPAPGPVLIH 361 I ITEKGKGYP PAEVAADKMH GWKFDPTTG KQMKTKTKTQ 401 SYTQYFAESL VAEAEQDEKV VAIHAAMGGG TGLNIFQKRF 44 1 PERCFDVGIA EQHAVTFAAG LATEGLKPFC TIYSSFLQRG 481 YDQWHDVDL QKLPVRFMMD RAGLVGADGP THCGAFDTTY 521 MACLPNMVVM AP SDEAELMH MVATAGVIDD RPSCVRYPRG 561 NGI GVPLPPN NKGNPLEIGK GRILKEGSRV AILGFGTIVQ 601 NCLAAAQLLQ EHGISVSVAD ARFCKPLDGD LIKKLVKEHE 641 VLI TVEEGS I GGFSAHVSHF LSLNGLLDGN LKWRPMVLPD 681 RYIDHGAYPD QIEEAGLSSK HIAGTVLSLI GGGKDSLHLI 721 NM

A cDNA sequence that encodes the Isodon rubescens DXS protein with SEQ ID NO:7 is available as NCBI accession number KT831764.1, shown below as SEQ ID NO:8.

1 ATGGCATCTT GTGGAGCTAT CAGGAGCAGT TTCCTGCCAT

41 TGCTCCATTC TGACGATTCT AGCTTGTTAT CCCGCACTGC

81 TGCTGCTCTT CCCATCAAAA AGCAAAAGTT CTCTGTGGGA

121 GCAGCTCTTC AACAGGATAA CAGCAACGAT GTGGCGGCGA

161 ATGGAGAGAG TCTCACGAGG CAGAAGCCAA GAGCTCTCAG

201 TTTTACGGGA GAAAAGCCTT CAACTCCAAT TTTGGATACT

241 ATTAACTATC CAAACCACAT GAAAAATCTT TCCGTCGAGG

281 AACTAGAGAG ATTGGCTGAT GAATTGAGGG AAGAGATAGT

321 TTACTCGGTG TCCAAAACGG GAGGGCATTT AAGTTCAAGC

361 CTAGGTGTAT CAGAGCTCAC AGTTGCACTT CATCATGTAT

401 TCAACACACC TGATGATAAA ATCATTTGGG ATGTCGGACA

441 TCAGGCGTAT CCACACAAAA TCTTGACGGG GAGGAGGTCA

481 AGAATGAACA CGATTCGACA GACTTTCGGG TTAGCCGGGT

521 TCCCCAAGAG GGATGAGAGC GCGCACGATG CGTTTGGAGC

561 TGGTCACAGT TCAACTAGCA TTTCAGCTGG TCTAGGGATG

601 GCGGTGGGGA GGGACTTGCT AAAGAAGAAC AACCACGTCA

641 TATCAGTGAT CGGAGATGGG GCCATGACAG CCGGACAGGC

681 ATATGAGGCT TTGAACAATG CAGGATTCCT GGACTCCAAT

721 CTCATCGTCG TCTTGAACGA CAACAAGCAA GTGTCCCTGC

761 CCACTGCCAC CGTCGACGGC CCTGCTCCCC CCGTTGGAGC

801 CCTCAGCAAA GCCCTCACCA GACTGCAAGC CAGCAGAAAA

841 TTCCGCCAGC TCCGTGAAGC AGCTAAAGGC ATGACTAAGC

881 AGATGGGAAA CCAAGCCCAC GAAGTTGCAT CAAAGGTGGA

921 CACTTATGTG AAGGGAATGA TGGGGAAACC CGGCGCCTCC

961 CTCTTCGAGG AGCTTGGGAT TTATTACATC GGCCCTGTAG

1001 ATGGCCACAG TATGGAAGAT CTTGTCTATA TTTTCCAGAA

1041 AGTTAAGGAG ATGCCGGCGC CTGGACCTGT TCTCATTCAC

1081 ATCATAACCG AGAAGGGCAA AGGCTATCCT CCTGCTGAAG

Another enzyme that is useful for making precursors for terpene / terpenoid production is a geranylgeranyl diphosphate synthase (GGDPS; EC 2.5.1.29). This enzyme is at a branch point in the mevalonate pathway, and catalyzes the synthesis of geranylgeranyl diphosphate (GGPP, shown below) from dimethylallyl diphosphate and isopentenyl diphosphate.

A variety of different GGDPS enzymes can be used in the methods and expression systems described herein. One example of such a GGDPS enzyme is a Methanothermobacter thermautotrophicus (MtGGDPS) enzyme, which is a cytosolic protein. The Methanothermobacter thermautotrophicus (MtGGDPS) enzyme can have the following sequence (SEQ ID NO:9).

1 MMEVMDILRK YSEMADERIR ESISDITPET LLRASEHLIT

41 AGGKKIRPSL ALLSSEAVGG DPGDAAGVAA AIELIHTFSL

81 IHDDIMDDDE IRRGEPAVHV LWGEPMAILA GDVLFSKAFE

121 AVIRNGDSEM VKEALAVVVD SCVKICEGQA LDMGFEERLD

161 VTEEEYMEMI YKKTAALIAA ATKAGAIMGG GSPQEIAALE

201 DYGRCIGLAF QIHDDYLDVV SDEESLGKPV GSDIAEGKMT

241 LMVVKALERA SEKDRERLIS ILGSGDEKLV AEAIEIFERY

281 GATEYAHAVA LDHVRMAKER LEVLEESDAR EALAMIADFV

321 LEREH

An optimized cDNA sequence for this Methanothermobacter thermautotrophicus GGDPS (MtGGDPS) with SEQ ID NO:9 is shown below as SEQ ID NO:10.

ATGATGGAGG TAATGGACAT ACTCCGAAAG TATTCAGAAA TGGCAGATGA GAGGATCCGA GAGTCTATAA GTGATATTAC TCCTGAAACG CTGCTTAGAG CATCAGAGCA CCTGATAACA GCCGGAGGCA AGAAAATCAG GCCGAGCCTT GCTCTCTTAT CCAGCGAAGC TGTGGGCGGG GACCCCGGAG ACGCTGCTGG AGTCGCCGCC GCAATAGAGT TGATACATAC ATTCTCCTTA ATACATGATG ATATCATGGA CGATGACGAG ATCAGGAGGG GTGAGCCAGC CGTCCATGTC TTGTGGGGTG AGCCGATGGC TATTCTCGCA GGTGACGTCT TGTTTAGTAA GGCTTTTGAG GCCGTAATTA GAAATGGGGA TTCAGAGATG GTCAAAGAAG CCCTTGCTGT TGTGGTGGAT TCATGTGTCA AGATATGCGA GGGTCAAGCT CTTGACATGG GTTTCGAAGA GCGACTGGAC GTAACCGAGG AAGAGTATAT GGAGATGATA TATAAAAAAA CTGCAGCATT GATTGCTGCT GCTACAAAGG CAGGAGCCAT CATGGGTGGC GGATCACCCC AGGAAATCGC AGCTCTTGAA GACTATGGGA GATGTATTGG GTTGGCATTT CAAATCCACG ACGACTATTT AGATGTAGTT TCTGATGAGG AAAGTCTGGG AAAGCCCGTT GGGTCTGACA TAGCAGAAGG CAAGATGACA CTGATGGTCG TCAAAGCCTT AGAGAGAGCT TCTGAAAAAG ATAGGGAGAG GTTGATCTCT ATACTCGGGA GTGGCGACGA GAAGCTTGTG GCCGAAGCCA TCGAAATTTT CGAACGATAC GGAGCAACTG AATATGCTCA CGCCGTGGCC CTGGATCATG TGCGTATGGC TAAGGAGCGT TTGGAAGTCC TCGAAGAGTC CGATGCCAGG GAAGCTTTAG CCATGATTGC AGATTTTGTG TTAGAGCGTG AACACTAA
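Each cDNA listing in this section is paired with the amino acid sequence it encodes, so the pairing can be checked by translation. The following is a sketch only, assuming the standard genetic code and a reading frame that begins at the first nucleotide supplied; the names `CODON_TABLE` and `translate` are illustrative and do not come from the disclosure.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    seq = "".join(cdna.split()).upper()  # drop the spaces between blocks
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 30 nucleotides of the optimized cDNA printed above.
print(translate("ATGATGGAGG TAATGGACAT ACTCCGAAAG"))  # MMEVMDILRK
```

The sketch raises a `KeyError` on any character other than A, C, G, or T, which makes extraction damage in a listing visible rather than silently mistranslated.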

Another example of a GGDPS enzyme that can be used is a Euphorbia peplus GGDPS1 (EpGGDPS1; accession no. MH363711) enzyme, which can increase precursor availability for diterpenoid synthesis. Such a Euphorbia peplus GGDPS1 (EpGGDPS1) enzyme can have the following amino acid sequence (SEQ ID NO:11).

MAFSATFSSC DYSLLLKKSS VNGLKNHPKV PFSGQHFKLM KANFTTRALT VSKSSAVQQP PLTAADSQGS NSNTIPLPPF AFDEYMKTKA KSVNKALDDA IPIQHPIKIH ESMRYSLLAG GKRVRPVLCI AACELVGGDE AAAMPSACAM EMIHTMSLIH DDLPCMDNDD LRRGKPTNHI KYGEETAILA GDALLSFSFE HVARATKNVS PDRMIRVIGE LGSAVGSEGL VAGQIVDIDS EGKEVSLSDL EYIHIHKTAK LLEAAVVCGA IVGGADDESV ERMRKYARCI GLLFQVVDDI LDVTKSSEEL GKTAGKDLAT DKATYPKLLG IDEARKLAAK LVEQANQELA YFDAAKAAPL YHFANYIASR QN

A nucleotide sequence encoding the Euphorbia peplus GGDPS1 enzyme with SEQ ID NO:11 is shown below as SEQ ID NO:12.

ATGGCCTTCT CCGCGACATT TTCCAGCTGC GACTACTCAC TTCTTTTAAA AAAATCATCC GTCAATGGCC TCAAAAACCA CCCGAAAGTT CCATTTTCTG GTCAACACTT CAAGTTAATG AAAGCCAACT TCACCACCCG TGCCCTGACC GTTTCCAAAT CCTCCGCGGT GCAGCAACCA CCGCTCACTG CGGCGGATTC TCAAGGATCA AATTCCAATA CTATCCCTCT TCCTCCATTC GCATTCGACG AATACATGAA AACCAAGGCT AAAAGCGTCA ACAAAGCATT AGACGACGCT ATTCCGATTC AACATCCGAT CAAAATCCAT GAATCCATGA GATACTCTCT CCTCGCCGGC GGCAAGCGTG TCCGGCCAGT TTTATGTATA GCTGCTTGTG AACTAGTCGG AGGAGACGAA GCAGCAGCTA TGCCGTCAGC ATGTGCTATG GAAATGATCC ATACCATGTC ATTAATCCAC GACGATCTTC CTTGTATGGA CAACGACGAT CTTCGTCGCG GAAAACCAAC AAACCACATA AAATACGGGG AAGAAACCGC CATTCTTGCC GGCGATGCAC TCCTTTCATT TTCCTTTGAA CACGTAGCTA GGGCAACAAA AAACGTTTCC CCGGACCGGA TGATCCGAGT CATAGGGGAG CTAGGTTCAG CTGTGGGTTC GGAAGGTTTA GTCGCGGGAC AAATCGTGGA CATCGATAGC GAGGGGAAGG AAGTGAGTTT AAGTGATTTG GAGTATATTC ATATTCATAA GACGGCTAAG CTTTTGGAAG CAGCCGTCGT GTGTGGTGCG ATAGTCGGTG GCGCCGACGA TGAAAGTGTG GAGAGAATGA GGAAATATGC TAGATGTATA GGCCTATTGT TCCAAGTTGT GGATGATATA TTAGATGTGA CAAAGTCATC GGAGGAGCTC GGGAAGACCG CGGGGAAAGA TTTAGCGACG GATAAAGCGA CGTATCCGAA GTTGTTGGGG ATTGACGAGG CGAGGAAACT TGCAGCTAAA TTGGTGGAGC AAGCTAATCA AGAACTTGCT TATTTTGATG CTGCTAAGGC TGCTCCGTTA TATCATTTTG CTAATTATAT TGCTAGTAGG CAAAATTGA

Another example of a GGDPS enzyme that can be used is a Euphorbia peplus GGDPS2 (EpGGDPS2; accession no. MH363712) enzyme, which can have the following amino acid sequence (SEQ ID NO:13).

MNSMNLGSWL NTSSIFNQST RSRSPPLKSF SIRLPRHKPR FISSIMTKEE ETLTQKPQFD FKSYMLQKAA SIHQALDAAV SIKEPAKIHE SMRYSLLAGG KRVRPALCLA ACELVGGNDS QAMPAACAVE MVHTMSLIHD DLPCMDNDDL RRGKPTNHIV FGEDVAVLAG DALLSFAFEH IAVATVNVSP ERIVRAIGEL ASAIGAEGLV AGQVVDIACE KACDVGLETL EFIHVHKTAK LLECAVVLGA ILGGGKDDEI EKLRKYARGI GLLFQVVDDI LDVTKSSEEL GKTAGKDLVA DKVTYPKLLG IEKSREFAEK LNREAQQQLS EFDVEKAAPL IALANYIAYR QN

A nucleotide sequence encoding the Euphorbia peplus GGDPS2 enzyme with SEQ ID NO:13 is shown below as SEQ ID NO:14.

ATGAACTCCA TGAATTTGGG TTCATGGCTC AACACTTCTT CAATCTTCAA CCAATCTACC AGATCCAGAT CCCCGCCATT AAAATCCTTC TCAATTCGTC TTCCCCGTCA CAAACCCAGA TTCATTTCTT CAATTATGAC CAAAGAAGAA GAAACCCTAA CCCAAAAACC CCAATTTGAT TTCAAATCTT ACATGCTCCA AAAAGCTGCT TCCATTCATC AAGCTCTAGA CGCCGCCGTT TCGATCAAAG AACCCGCTAA AATCCATGAA TCCATGCGGT ATTCCCTCTT AGCCGGCGGG AAAAGAGTCC GGCCAGCGTT ATGTTTAGCC GCGTGTGAGC TCGTCGGCGG GAACGATTCT CAGGCGATGC CGGCGGCTTG CGCGGTGGAA ATGGTCCACA CGATGTCTCT TATTCACGAT GATCTCCCCT GTATGGATAA CGATGATCTA CGCCGCGGAA AACCCACGAA CCATATCGTG TTCGGGGAAG ACGTGGCGGT TCTCGCTGGG GATGCGTTGC TCTCGTTCGC ATTCGAGCAC ATTGCGGTTG CTACGGTGAA TGTGTCACCG GAGAGGATTG TCCGGGCCAT CGGGGAATTA GCCAGCGCGA TTGGGGCAGA AGGGTTAGTT GCTGGACAAG TGGTTGATAT AGCTTGTGAG AAAGCTTGTG ATGTGGGATT AGAAACGTTG GAGTTCATTC ATGTTCACAA AACGGCGAAA TTGCTGGAAT GCGCTGTCGT ATTGGGGGCA ATATTAGGGG GAGGAAAGGA TGATGAGATT GAGAAGTTGA GGAAATATGC AAGAGGAATA GGGTTGTTGT TTCAAGTAGT GGATGATATT TTAGATGTCA CAAAATCATC GGAAGAGTTG GGGAAAACTG CAGGGAAAGA TTTGGTGGCG GATAAGGTAA CATACCCTAA ACTTTTAGGG ATTGAAAAAT CAAGGGAATT TGCTGAGAAA TTGAATAGGG AAGCTCAACA ACAGTTGAGT GAGTTTGATG TGGAAAAGGC AGCTCCTTTG ATTGCTTTGG CTAATTATAT TGCTTATAGG CAGAATTGA

Another example of a GGDPS enzyme that can be used is a Sulfolobus acidocaldarius GGDPS enzyme, which is a cytosolic protein. The Sulfolobus acidocaldarius GGDPS enzyme can have the following amino acid sequence (SEQ ID NO:15).

MSYFDNYFNE IVNSVNDIIK SYISGDVPKL YEASYHLFTS GGKRLRPLIL TISSDLFGGQ RERAYYAGAA IEVLHTFTLV HDDIMDQDNI RRGLPTVHVK YGLPLAILAG DLLHAKAFQL LTQALRGLPS ETIIKAFDIF TRSIIIISEG QAVDMEFEDR IDIKEQEYLD MISRKTAALF SASSSIGALI AGANDNDVRL MSDFGTNLGI AFQIVDDILG LTADEKELGK PVFSDIREGK KTILVIKTLE LCKEDEKKIV LKALGNKSAS KEELMSSADI IKKYSLDYAY NLAEKYYKNA IDSLNQVSSK SDIPGKALKY LAEFTIRRRK

A codon optimized nucleotide sequence encoding the Sulfolobus acidocaldarius GGDPS (SaGGDPS) enzyme with SEQ ID NO:15 is shown below as SEQ ID NO:16.

ATGAGTTATT TTGACAACTA CTTCAATGAA ATAGTCAACA GCGTCAATGA TATAATCAAA TCCTACATCA GTGGAGACGT GCCAAAACTC TACGAAGCAT CATACCACCT GTTCACATCT GGAGGAAAAC GATTGAGACC CTTGATATTA ACCATAAGTA GCGACCTCTT TGGGGGCCAG AGAGAAAGAG CATATTACGC TGGAGCAGCT ATCGAGGTGT TACATACATT CACCTTGGTG CATGATGACA TTATGGATCA GGACAATATA AGGCGAGGTT TACCGACTGT GCATGTGAAA TACGGTCTGC CGCTGGCTAT TCTGGCCGGC GATTTACTCC ATGCCAAGGC CTTCCAGTTG CTCACCCAGG CACTCCGTGG ACTGCCCAGC GAGACAATTA TCAAAGCCTT TGACATTTTC ACGAGATCCA TAATAATTAT TTCCGAGGGC CAAGCTGTCG ATATGGAATT TGAAGATAGG ATAGATATTA AAGAGCAGGA ATATCTCGAC ATGATTAGCC GAAAAACCGC TGCTCTCTTC AGTGCCTCTA GCTCCATCGG CGCTTTAATC GCCGGCGCAA ACGATAATGA CGTCAGACTT ATGTCTGATT TCGGGACTAA TCTCGGCATC GCCTTTCAGA TCGTAGACGA TATTCTTGGT CTGACTGCAG ATGAAAAGGA GCTTGGGAAG CCGGTGTTCT CCGACATCCG TGAAGGTAAA AAGACGATCT TGGTCATCAA GACGCTGGAA CTTTGCAAAG AAGATGAGAA GAAGATCGTG CTCAAGGCCT TAGGCAACAA GAGCGCCAGT AAGGAGGAGC TCATGTCTAG TGCTGATATC ATTAAAAAGT ACAGCCTTGA CTACGCCTAT AACCTCGCAG AGAAATACTA TAAGAACGCT ATCGATTCTT TAAACCAAGT CAGCTCTAAG AGCGATATCC CTGGTAAAGC ACTGAAGTAT CTCGCTGAAT TTACAATAAG GAGACGTAAG TAA

Another example of a GGDPS enzyme that can be used is a Mortierella elongata GGDPS (MeGGDPS), which is a cytosolic protein. The Mortierella elongata GGDPS enzyme can have the following amino acid sequence (SEQ ID NO:17).

MAIPSIYPTD HDEAALLEPY TYICSNPGKE MRTELIEAFN IWIKVPPQEL AIITKVVKML HTSSLLVDDI EDDSILRRGE PVAHKIFGVP ATINCANYVY FLALAELSKI SNPKMLTIFT EELLCLHRGQ GMELLWRDSL TCPTEEEYIA MVNDKTGGLL RLAVKLMQAA SDSTVDYVPM VELIGIHFQI RDDYLNLQSS QYSANKGFCE DLTEGKFSYP IIHSIRAAPN SRKLLNILKQ KPKDHELKVY AVSLMNATKT FEYCRQQLTL YEERARAEVR RLGGNARLEK IIDRLSIPDP DSADAEKDVV PMFVATSTAG GAAK

A codon optimized nucleotide sequence encoding the Mortierella elongata GGDPS enzyme with SEQ ID NO:17 is shown below as SEQ ID NO:18.

ATGGCTATAC CTTCTATTTA CCCTACGGAT CACGATGAAG CTGCCCTTCT GGAGCCGTAC ACGTATATAT GCAGTAATCC GGGAAAGGAG ATGAGGACCG AGTTAATAGA AGCCTTTAAT ATCTGGATCA AAGTGCCCCC TCAGGAGTTG GCAATCATCA CAAAGGTCGT TAAGATGTTA CATACAAGCT CACTCTTGGT AGATGACATT GAAGATGATA GTATTCTCCG TCGAGGCGAG CCAGTTGCAC ACAAAATATT CGGTGTTCCG GCAACTATAA ACTGTGCTAA TTATGTTTAC TTCCTCGCCT TAGCTGAATT GTCTAAGATA TCTAATCCAA AAATGCTTAC GATATTTACC GAAGAGCTTC TTTGCCTTCA TAGGGGACAA GGCATGGAGC TCCTTTGGCG TGATAGCTTA ACGTGCCCGA CCGAGGAAGA GTATATAGCT ATGGTGAACG ATAAAACTGG AGGCCTTCTT AGACTGGCCG TTAAGCTCAT GCAGGCAGCT AGTGACTCTA CCGTAGACTA CGTCCCAATG GTGGAACTCA TTGGCATTCA TTTTCAAATA AGGGACGATT ACTTAAACCT TCAGAGTTCT CAGTACAGTG CAAACAAAGG TTTTTGCGAG GACCTGACTG AGGGCAAGTT TTCCTATCCG ATTATTCACT CCATAAGGGC AGCACCTAAT AGTCGAAAGT TGTTGAACAT CTTGAAGCAG AAACCTAAAG ATCATGAACT CAAGGTTTAT GCCGTGTCAT TAATGAACGC TACGAAAACA TTTGAGTATT GTAGGCAGCA GCTGACCCTT TACGAGGAAC GTGCCCGAGC AGAAGTGAGG CGTTTGGGAG GGAATGCTAG GCTCGAAAAA ATCATCGACA GACTCTCTAT TCCAGACCCC GACAGCGCAG ATGCAGAGAA GGACGTGGTT CCTATGTTCG TTGCAACGTC AACTGCTGGT GGAGCTGCAA AGTAA

Some tests indicated that a plastid-targeted form of Mortierella elongata GGDPS was not particularly active for terpenoid synthesis. Hence, in some cases the GGDPS enzyme is not a plastid-targeted form of Mortierella elongata GGDPS.

Another example of a GGDPS enzyme that can be used is a Tolypothrix sp. PCC 7601 geranylgeranyl diphosphate synthase (TsGGDPS). The Tolypothrix sp. PCC 7601 GGDPS enzyme can have the following amino acid sequence (SEQ ID NO:19).

MVATDKFKKM PETATFNLSA YLKERQQLCE TALDQALPVS YPEKIYESMR YSLLAGGKRV RPILCLATSE MMGGTIEMAM PTACAVEMIH TMSLIHDDLP AMDNDDYRRG KLTNHKVYGE DIAILAGDGL LAYAFEFVAI ATPLTVPRDR VLQVVARLAR ALGAAGLVGG QVVDLESEGK TDTSLETLNY IHNHKTAALL EACVVCGGIL AGASVEDVQR LTRYAQNIGL AFQIVDDILD ITATQEQLGK TAGKDLKAQK VTYPSLWGIE ESRVKAEQLI EAACAELDVF GEKAQPLKAI AHFIISRNH

A genomic nucleotide sequence encoding the Tolypothrix sp. PCC 7601 GGDPS enzyme with SEQ ID NO:19 is shown below as SEQ ID NO:20.

ATGGTAGCAA CTGATAAGTT TAAAAAGATG CCAGAGACAG CCACGTTTAA CCTATCAGCG TATCTCAAAG AGCGTCAACA GCTTTGTGAA ACTGCTTTGG ATCAAGCGCT TCCCGTTTCC TATCCAGAGA AGATTTACGA GTCGATGCGC TATTCTCTCT TAGCTGGTGG CAAACGTGTG CGTCCTATCC TGTGCCTTGC TACCAGTGAA ATGATGGGCG GCACAATCGA AATGGCAATG CCAACAGCTT GTGCGGTGGA AATGATCCAC ACAATGTCAT TAATTCATGA TGATTTGCCA GCGATGGATA ATGACGATTA CCGTCGGGGT AAGCTGACAA ACCACAAGGT TTATGGCGAA GATATCGCGA TTTTAGCTGG CGATGGTTTG TTGGCCTATG CTTTTGAATT TGTTGCGATC GCCACCCCTT TAACTGTCCC TAGAGATAGA GTATTGCAGG TAGTAGCGCG TCTTGCTCGG GCATTAGGGG CTGCTGGCTT GGTTGGGGGC CAAGTAGTGG ATCTAGAATC AGAAGGTAAA ACAGATACTT CCCTAGAGAC TCTGAATTAC ATTCATAACC ACAAAACAGC TGCCCTTTTG GAAGCTTGTG TTGTTTGTGG TGGTATTTTA GCGGGAGCAT CTGTTGAAGA TGTACAAAGA CTAACTCGGT ATGCTCAGAA TATTGGTCTG GCATTCCAAA TTGTTGATGA TATTTTAGAT ATCACCGCTA CTCAAGAACA ATTAGGCAAA ACTGCTGGCA AGGATTTGAA AGCGCAGAAA GTTACTTATC CCAGCCTGTG GGGAATTGAA GAATCTCGCG TTAAAGCCGA ACAACTCATT GAAGCAGCAT GTGCGGAATT AGACGTATTT GGAGAAAAAG CACAACCTTT AAAAGCGATC GCTCATTTTA TTATCAGCCG CAATCACTAA

Another enzyme that can be used in the methods described herein is 3-hydroxy-3-methyl-glutaryl-coenzyme A reductase (HMG-CoA reductase or HMGR), which is an NADH-dependent enzyme (EC 1.1.1.88) or in some cases an NADPH-dependent enzyme (EC 1.1.1.34) that is rate-controlling in the mevalonate pathway, the metabolic pathway that produces cholesterol and other isoprenoids. HMG-CoA reductase converts HMG-CoA to mevalonic acid.

Such HMG-CoA reductase enzymes are useful for sesquiterpenoid synthesis.

One example of an HMG-CoA reductase that can be used is a Euphorbia lathyris hydroxymethylglutaryl coenzyme A reductase (ElHMGR), for example, with accession number JQ694150.1, and with the sequence shown below (SEQ ID NO:21).

1 MDSTRPESKL RRPIRRISDE VDHHGRCLSP PPKASDALPL 41 PLYLTNAVFF TLFFSVAYYL LHRWRDKIRN STPLHVVTLS 81 EIAAIVSLIA SFIYLLGFFG IDFVQSFIAR ASHDTWDLDD 121 ADRNYLIDGD HRLVTCSPAK ISPINSLPPK MSSPPEPIIS 161 PLASEEDEEI VKSVVNGTIP SYSLESKLGD CKRAAEIRRE 201 ALQRMMGRSL EGLPVEGFDY ESILGQCCEM PVGYVQIPVG 241 IAGPLLLDGQ EYSVPMATTE GCLVASTNRG CKAIHLSGGA 281 SSVLLKDGMT RAPVVRFASA MRAADLKFFL ENPENFDSLS 321 IAFNRSSRFA KLQSIQCSIA GKNLYMRFTC STGDAMGMNM 361 VSKGVQNVLD FLQSDFPDMD VIGISGNFCS DKKPAAVNWI 401 QGRGKSVVCE AIIKEEVVKK VLKSSVASLV ELNMLKNLTG 441 SAIAGALGGF NAHAGNIVSA IFIATGQDPA QNVESSHCIT 481 MMEAVNDGKD LHISVTMPSI EVGTVGGGTQ LASQSACLNL 521 LGVKGASKES PGANSRLLAT IVAGSVLAGE LSLMSAIAAG 561 QLVRSHMKYN RSSKDVTKFA SS

A nucleic acid sequence for a full-length E. lathyris HMGR (ElHMGR159-582, JQ694150.1; SEQ ID NO:21) is shown below as SEQ ID NO:22.

1 ACGCATAAAC ACATTCAAAC AGCTACTCTT CCAGCTCTTC

41 CTTTTTTCCC CCATTTCCAC TTCCATTATT TTATCCCCCC

81 TTTTTTCTCT CTTCTTCTCG ATTCATCCAT GGATTCCACT

121 CGGCCGGAAT CCAAACTCCG GCGACCGATC CGCCGCATCT

161 CGGACGAGGT TGACCACCAC GGCCGCTGTC TCTCTCCGCC

201 TCCTAAAGCC TCCGATGCTC TCCCTCTCCC GTTGTATTTA

241 ACCAATGCGG TTTTCTTTAC TCTCTTTTTC TCCGTCGCGT

281 ACTATCTTCT CCACCGGTGG AGAGATAAGA TCCGTAATTC

321 TACTCCTCTT CATGTCGTTA CTCTCTCTGA AATTGCCGCC

361 ATTGTTTCTC TCATTGCGTC TTTCATCTAC CTGCTTGGAT

401 TCTTCGGGAT TGATTTCGTT CAGTCTTTCA TTGCACGCGC

441 TTCTCATGAC ACGTGGGACC TTGATGATGC GGATCGTAAC

481 TACCTCATTG ATGGAGATCA CCGTCTCGTT ACTTGCTCTC

521 CTGCGAAGAT TTCTCCGATT AATTCTCTTC CTCCTAAAAT

561 GTCTTCCCCG CCGGAACCGA TTATTTCGCC TCTGGCATCC

601 GAGGAGGATG AGGAAATTGT TAAATCTGTT GTTAATGGAA

641 CGATTCCTTC GTATTCGTTG GAATCGAAGC TTGGGGATTG

681 TAAAAGAGCG GCTGAGATTC GACGGGAGGC TTTGCAGAGA

721 ATGATGGGGA GGTCGTTGGA GGGTTTACCT GTTGAAGGAT

761 TCGATTATGA GTCGATTTTA GGTCAGTGCT GTGAAATGCC

801 TGTTGGTTAT GTGCAGATTC CGGTTGGAAT TGCTGGGCCG

841 TTGCTGCTAG ACGGGCAAGA GTACTCTGTT CCGATGGCGA

881 CCACCGAGGG TTGTTTGGTT GCTAGCACTA ATAGAGGGTG

921 TAAAGCGATC CATTTGTCAG GTGGTGCTAG TAGTGTCTTG

961 TTGAAGGATG GCATGACTAG AGCTCCCGTT GTTCGATTCG

1001 CCTCGGCCAT GAGGGCCGCG GATTTGAAGT TTTTCTTAGA

1041 GAATCCTGAG AATTTCGATA GCTTGTCCAT CGCTTTCAAT

1081 AGGTCCAGTA GATTTGCAAA GCTCCAAAGC ATACAATGTT

1121 CTATTGCTGG AAAGAATCTA TATATGAGAT TCACCTGCAG

1161 CACTGGTGAT GCAATGGGGA TGAACATGGT TTCCAAAGGG

1201 GTTCAAAACG TTCTTGACTT CCTTCAAAGT GATTTCCCTG

1241 ACATGGATGT TATTGGCATC TCAGGAAATT TTTGTTCGGA

1281 CAAGAAGCCA GCTGCTGTGA ACTGGATTCA AGGGCGAGGC

1321 AAATCGGTTG TTTGCGAGGC AATTATCAAG GAAGAGGTGG

1361 TGAAGAAGGT ATTGAAATCA AGTGTTGCTT CACTAGTAGA

1401 GCTGAACATG CTCAAGAATC TTACTGGTTC AGCTATTGCT

1441 GGAGCTCTTG GTGGATTCAA TGCACATGCT GGCAACATAG

1481 TCTCTGCAAT TTTCATTGCC ACTGGCCAGG ATCCAGCCCA

1521 GAATGTTGAG AGTTCTCATT GCATCACCAT GATGGAAGCT

1561 GTCAATGATG GAAAAGATCT CCACATCTCT GTAACCATGC

1601 CTTCAATCGA GGTAGGAACA GTTGGAGGAG GGACACAACT

1641 AGCATCCCAA TCAGCATGTC TGAACCTACT CGGTGTAAAA

1681 GGAGCAAGTA AAGAATCACC AGGAGCAAAC TCAAGGCTCC

1721 TAGCCACAAT AGTAGCTGGT TCAGTCCTAG CTGGTGAACT

1761 CTCCCTAATG TCAGCCATAG CAGCAGGACA ACTAGTCCGG

1801 AGCCACATGA AGTACAACAG ATCCAGCAAA GATGTAACCA

1841 AATTTGCATC ATCTTAATCA AAACTGGTTC ACAATAATAA

1881 AAGCGTCCGA ACCAAACCTC ATAGACAGAG AGCCAGATAG 1921 ACAGAGCCAG AAAGAGAAAG GGGAAGAAAA TGGAAGAAGA 1961 AGACTGTACT GTAGGGTACC TACCCCATGT GAGTTTTTTT 2001 ATTTTTTTTC AAAGCTTTTA ATAGCTGTAA AGTTGCTTAA 2041 TCATATGGAG AGAAGAAAGA AGAATTAGGT ACACAAAACT 2081 TTTGAAAATC TCCATTTTCT TACCCCAAAT TTGAGAAGTG 2121 GGTGTACTGT ATTAGTATGT TGGTGAGCAC ATGTGAGCAA 2161 AAAAGGTCCC CACTATCTAC TACCTAGTGT TTTTTGTGTA 2201 TGTTTGTGTC CTAATTTATT TGTTAATGTT TAGTTGCTTT 2241 CTTTCTTCTA TTTTTTGCAT ACATATGTTG TGTACACTTG 2281 TTTTTGTGTT TGAACTTACC TGGGGCTGAC ATGTGACACG 2321 TGGCGTGATA TTGTTTGTTG TTGATTTCCT TTTTTTTT

A truncated ElHMGR159-582 polypeptide can also be used and is particularly useful because it is a feedback-insensitive form of ElHMGR. Such a truncated ElHMGR159-582 enzyme is shown below as SEQ ID NO:23.

MISPLASEED EEIVKSVVNG TIPSYSLESK LGDCKRAAEI RREALQRMMG RSLEGLPVEG FDYESILGQC CEMPVGYVQI PVGIAGPLLL DGQEYSVPMA TTEGCLVAST NRGCKAIHLS GGASSVLLKD GMTRAPVVRF ASAMRAADLK FFLENPENFD SLSIAFNRSS RFAKLQSIQC SIAGKNLYMR FTCSTGDAMG MNMVSKGVQN VLDFLQSDFP DMDVIGISGN FCSDKKPAAV NWIQGRGKSV VCEAIIKEEV VKKVLKSSVA SLVELNMLKN LTGSAIAGAL GGFNAHAGNI VSAIFIATGQ DPAQNVESSH CITMMEAVND GKDLHISVTM PSIEVGTVGG GTQLASQSAC LNLLGVKGAS KESPGANSRL LATIVAGSVL AGELSLMSAI AAGQLVRSHM KYNRSSKDVT KFASS

Note that a methionine was added to the N-terminus of this ElHMGR159-582 polypeptide to facilitate expression. A nucleotide sequence for the ElHMGR159-582 polypeptide with SEQ ID NO:23 is shown below with the added ATG (SEQ ID NO:24).

1 ATGATTTCGC CTCTGGCATC CGAGGAGGAT GAGGAAATTG 41 TTAAATCTGT TGTTAATGGA ACGATTCCTT CGTATTCGTT 81 GGAATCGAAG CTTGGGGATT GTAAAAGAGC GGCTGAGATT 121 CGACGGGAGG CTTTGCAGAG AATGATGGGG AGGTCGTTGG 161 AGGGTTTACC TGTTGAAGGA TTCGATTATG AGTCGATTTT 201 AGGTCAGTGC TGTGAAATGC CTGTTGGTTA TGTGCAGATT 241 CCGGTTGGAA TTGCTGGGCC GTTGCTGCTA GACGGGCAAG 281 AGTACTCTGT TCCGATGGCG ACCACCGAGG GTTGTTTGGT 321 TGCTAGCACT AATAGAGGGT GTAAAGCGAT CCATTTGTCA 361 GGTGGTGCTA GTAGTGTCTT GTTGAAGGAT GGCATGACTA 401 GAGCTCCCGT TGTTCGATTC GCCTCGGCCA TGAGGGCCGC 44 1 GGATTTGAAG TTTTTCTTAG AGAATCCTGA GAATTTCGAT 481 AGCTTGTCCA TCGCTTTCAA TAGGTCCAGT AGATTTGCAA 521 AGCTCCAAAG CATACAATGT TCTATTGCTG GAAAGAATCT 561 ATATATGAGA TTCACCTGCA GCACTGGTGA TGCAATGGGG 601 ATGAACATGG TTTCCAAAGG GGTTCAAAAC GTTCTTGACT

641 TCCTTCAAAG TGATTTCCCT GACATGGATG TTATTGGCAT

681 CTCAGGAAAT TTTTGTTCGG ACAAGAAGCC AGCTGCTGTG

721 AACTGGATTC AAGGGCGAGG CAAATCGGTT GTTTGCGAGG

761 CAATTATCAA GGAAGAGGTG GTGAAGAAGG TATTGAAATC

801 AAGTGTTGCT TCACTAGTAG AGCTGAACAT GCTCAAGAAT

841 CTTACTGGTT CAGCTATTGC TGGAGCTCTT GGTGGATTCA

881 ATGCACATGC TGGCAACATA GTCTCTGCAA TTTTCATTGC

921 CACTGGCCAG GATCCAGCCC AGAATGTTGA GAGTTCTCAT

961 TGCATCACCA TGATGGAAGC TGTCAATGAT GGAAAAGATC

1001 TCCACATCTC TGTAACCATG CCTTCAATCG AGGTAGGAAC

1041 AGTTGGAGGA GGGACACAAC TAGCATCCCA ATCAGCATGT

1081 CTGAACCTAC TCGGTGTAAA AGGAGCAAGT AAAGAATCAC

1121 CAGGAGCAAA CTCAAGGCTC CTAGCCACAA TAGTAGCTGG

1161 TTCAGTCCTA GCTGGTGAAC TCTCCCTAAT GTCAGCCATA

1201 GCAGCAGGAC AACTAGTCCG GAGCCACATG AAGTACAACA

1241 GATCCAGCAA AGATGTAACC AAATTTGCAT CATCTTAA
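The relationship between the full-length and truncated HMGR polypeptides can be illustrated with a short sketch: take residues 159 onward from the full-length sequence and prepend a methionine. The helper name `truncate_with_start_met` is hypothetical, and the 20-residue window is copied from residues 151-170 of the full-length listing as printed in this document, so the example is illustrative rather than a description of how the construct was actually made.

```python
def truncate_with_start_met(full_length: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive) with Met prepended."""
    if not 1 <= first <= last <= len(full_length):
        raise ValueError("residue range lies outside the sequence")
    return "M" + full_length[first - 1:last]


# Residues 151-170 of the full-length sequence (SEQ ID NO:21) as printed.
window_151_170 = "MSSPPEPIISPLASEEDEEI"

# Residue 159 of the full-length protein is position 9 within this window.
print(truncate_with_start_met(window_151_170, 9, 20))  # MISPLASEEDEEI
```

The output matches the first thirteen residues of the truncated polypeptide listing, which begins MISPLASEED EEI.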

Another enzyme that is useful for making precursors for terpene / terpenoid production is a farnesyl diphosphate synthase, which makes precursors for the biosynthesis of essential isoprenoids such as carotenoids, withanolides, ubiquinones, dolichols, and sterols, among others. Farnesyl diphosphate synthase makes farnesyl diphosphate, shown below.

One example of a farnesyl diphosphate synthase that can be used is from Arabidopsis thaliana. An example of an Arabidopsis thaliana farnesyl diphosphate synthase sequence is shown below (accession AAB49290.1, SEQ ID NO:25).

1 MSVSCCCRNL GKTIKKAIPS HHLHLRSLGG SLYRRRIQSS 41 SMETDLKSTF LNVYSVLKSD LLHDPSFEFT NESRLWVDRM 81 LDYNVRGGKL NRGLSVVDSF KLLKQGNDLT EQEVFLSCAL 121 GWCIEWLQAY FLVLDDIMDN SVTRRGQPCW FRVPQVGMVA 161 INDGILLRNH IHRILKKHFR DKPYYVDLVD LFNEVELQTA 201 CGQMIDLITT FEGEKDLAKY SLSIHRRIVQ YKTAYYSFYL 241 PVACALLMAG ENLENHIDVK NVLVDMGIYF QVQDDYLDCF 281 ADPETLGKIG TDIEDFKCSW LVVKALERCS EEQTKILYEN 321 YGKPDPSNVA KVKDLYKELD LEGVFMEYES KSYEKLTGAI 361 EGHQSKAIQA VLKSFLAKIY KRQK

A nucleotide sequence encoding the Arabidopsis thaliana farnesyl diphosphate synthase with SEQ ID NO:25 is shown below as SEQ ID NO:26.

1 GGCGTTTTCG GGAGAAGAAG GAGGAATATG AGTGTGAGTT

41 GTTGTTGTAG GAATCTGGGC AAGACAATAA AAAAGGCAAT

81 ACCTTCACAT CATTTGCATC TGAGAAGTCT TGGTGGGAGT

121 CTCTATCGTC GTCGTATCCA AAGCTCTTCA ATGGAGACCG

161 ATCTCAAGTC AACCTTTCTC AACGTTTATT CTGTTCTCAA

201 GTCTGACCTT CTTCATGACC CTTCCTTCGA ATTCACCAAT

241 GAATCTCGTC TCTGGGTTGA TCGGATGCTG GACTACAATG

281 TACGTGGAGG GAAACTCAAT CGGGGTCTCT CTGTTGTTGA

321 CAGTTTCAAA CTTTTGAAGC AAGGCAATGA TTTGACTGAG

361 CAAGAGGTTT TCCTCTCTTG TGCTCTCGGT TGGTGCATTG

401 AATGGCTCCA AGCTTATTTC CTTGTGCTTG ATGATATTAT

441 GGATAACTCT GTCACTCGCC GTGGTCAACC TTGCTGGTTC

481 AGAGTTCCTC AGGTTGGTAT GGTTGCCATC AATGATGGGA

521 TTCTACTTCG CAATCACATC CACAGGATTC TCAAAAAGCA

561 TTTCCGTGAT AAGCCTTACT ATGTTGACCT TGTTGATTTG

601 TTTAATGAGG TTGAGTTGCA AACAGCTTGT GGCCAGATGA

641 TAGATTTGAT CACCACCTTT GAAGGAGAAA AGGATTTGGC

681 CAAGTACTCA TTGTCAATCC ACCGTCGTAT TGTCCAGTAC

721 AAAACGGCTT ATTACTCATT TTATCTCCCT GTTGCTTGTG

761 CGTTGCTTAT GGCGGGCGAA AATTTGGAAA ACCATATTGA

801 CGTGAAAAAT GTTCTTGTTG ACATGGGAAT CTACTTCCAA

841 GTGCAGGATG ATTATCTGGA TTGTTTTGCT GATCCCGAGA

881 CGCTTGGCAA GATAGGAACA GATATAGAAG ATTTCAAATG

921 CTCGTGGTTG GTGGTTAAGG CATTAGAGCG CTGCAGCGAA

961 GAACAAACTA AGATATTATA TGAGAACTAT GGTAAACCCG

1001 ACCCATCGAA CGTTGCTAAA GTGAAGGATC TCTACAAAGA

1041 GCTGGATCTT GAGGGAGTTT TCATGGAGTA TGAGAGCAAA

1081 AGCTACGAGA AGCTGACTGG AGCGATTGAG GGACACCAAA

1121 GTAAAGCAAT CCAAGCAGTG CTAAAATCCT TCTTGGCTAA

1161 GATCTACAAG AGGCAGAAGT AGTAGAGACA GACAAACATA

1201 AGTCTCAGCC CTCAAAAATT TCCTGTTATG TCTTTGATTC

1241 TTGGTTGGTG ATTTGTGTAA TTCTGTTAAG TGCTCTGATT

1281 TTCAGGGGGA ATAATAAACC TGCCTCACTT TTATTCTTGT

1321 GTTACAATTG TATTTGTTTC ATGACTATGA TCTTCTTCTT

1361 TCATCAGTTA TATGAATTTG AGATTCTTGT TGGTTG

Another amino acid sequence for a full-length cytosolic A. thaliana farnesyl diphosphate synthase (cytosol:AtFDPS, NM_117823.4; SEQ ID NO:27) is shown below.

1 MADLKSTFLD VYSVLKSDLL QDPSFEFTHE SRQWLERMLD 41 YNVRGGKLNR GLSVVDSYKL LKQGQDLTEK ETFLSCALGW 81 CIEWLQAYFL VLDDIMDNSV TRRGQPCWFR KPKVGMIAIN 121 DGILLRNHIH RILKKHFREM PYYVDLVDLF NEVEFQTACG 161 QMIDLITTFD GEKDLSKYSL QIHRRIVEYK TAYYSFYLPV 201 ACALLMAGEN LENHTDVKTV LVDMGIYFQV QDDYLDCFAD 241 PETLGKIGTD IEDFKCSWLV VKALERCSEE QTKILYENYG 281 KAEPSNVAKV KALYKELDLE GAFMEYEKES YEKLTKLIEA 321 HQSKAIQAVL KSFLAKIYKR QK

A nucleic acid sequence for a full-length cytosolic A. thaliana FDPS (cytosol:AtFDPS, NM_117823.4; SEQ ID NO:28) is shown below.

A variety of enzymes can be used in the methods described herein including enzymes that can synthesize terpene precursors, monoterpenes, diterpenes, triterpenes, sesquiterpenes, and combinations thereof. The terpene synthases can be monoterpene synthases, diterpene synthases, sesquiterpene synthases, sesterterpene synthases, triterpene synthases, tetraterpene synthases, polyterpene synthases, or combinations thereof. Such terpene synthases can be fused to LDSP polypeptides.

For example, one enzyme that can be fused to LDSP is an Abies grandis abietadiene synthase enzyme (EC 4.2.3.18), which catalyzes the conversion of GGDP via CPP, a carbocation, and a tertiary allylic alcohol to form a mixture of four products, where abietadiene is the main product.
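A fusion of an enzyme to an LDSP polypeptide requires the two coding sequences to share one reading frame. The sketch below shows one way such a check could be expressed, under stated assumptions: the function name `fuse_in_frame`, the choice of which partner is upstream, the optional linker, and the short example sequences are all hypothetical and do not come from the disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream_cds: str, downstream_cds: str, linker: str = "") -> str:
    """Join two coding sequences so the downstream partner stays in frame.

    A terminal stop codon on the upstream partner is removed so translation
    continues into the linker and the downstream partner. Internal stop
    codons are not checked in this sketch.
    """
    upstream, link, downstream = (
        part.upper() for part in (upstream_cds, linker, downstream_cds)
    )
    for part in (upstream, link, downstream):
        if len(part) % 3 != 0:
            raise ValueError("each part must be a whole number of codons")
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    return upstream + link + downstream


# Hypothetical toy sequences, not taken from the sequence listings.
fusion = fuse_in_frame("ATGGCTTAA", "ATGAAATGA", linker="GGATCC")
print(fusion)  # ATGGCTGGATCCATGAAATGA
```

Requiring every part to be a whole number of codons is a conservative design choice: it rejects inputs that would shift the frame instead of trying to repair them.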

An amino acid sequence for an A. grandis abietadiene synthase (U50768.1) is shown below as SEQ ID NO:31.

1 MAMPSSSLSS QIPTAAHHLT ANAQSIPHFS TTLNAGSSAS

41 KRRSLYLRWG KGSNKIIACV GEGGATSVPY QSAEKNDSLS

81 SSTLVKREFP PGFWKDDLID SLTSSHKVAA SDEKRIETLI

121 SEIKNMFRCM GYGETNPSAY DTAWVARIPA VDGSDNPHFP

161 ETVEWILQNQ LKDGSWGEGF YFLAYDRILA TLACIITLTL

201 WRTGETQVQK GIEFFRTQAG KMEDEADSHR PSGFEIVFPA

241 MLKEAKILGL DLPYDLPFLK QIIEKREAKL KRIPTDVLYA

281 LPTTLLYSLE GLQEIVDWQK IMKLQSKDGS FLSSPASTAA

321 VFMRTGNKKC LDFLNFVLKK FGNHVPCHYP LDLFERLWAV

361 DTVERLGIDR HFKEEIKEAL DYVYSHWDER GIGWARENPV

401 PDIDDTAMGL RILRLHGYNV SSDVLKTFRD ENGEFFCFLG

441 QTQRGVTDML NVNRCSHVSF PGETIMEEAK LCTERYLRNA

481 LENVDAFDKW AFKKNIRGEV EYALKYPWHK SMPRLEARSY

521 IENYGPDDVW LGKTVYMMPY ISNEKYLELA KLDFNKVQSI

561 HQTELQDLRR WWKSSGFTDL NFTRERVTEI YFSPASFIFE

601 PEFSKCREVY TKTSNFTVIL DDLYDAHGSL DDLKLFTESV

641 KRWDLSLVDQ MPQQMKICFV GFYNTFNDIA KEGRERQGRD

681 VLGYIQNVWK VQLEAYTKEA EWSEAKYVPS FNEYIENASV

721 SIALGTVVLI SALFTGEVLT DEVLSKIDRE SRFLQLMGLT

761 GRLVNDTKTY QAERGQGEVA SAIQCYMKDH PKISEEEALQ

801 HVYSVMENAL EELNREFVNN KIPDIYKRLV FETARIMQLF

841 YMQGDGLTLS HDMEIKEHVK NCLFQPVA

A nucleic acid sequence for the A. grandis abietadiene synthase (U50768.1; SEQ ID NO:31) is shown below as SEQ ID NO:32.

1 AGATGGCCAT GCCTTCCTCT TCATTGTCAT CACAGATTCC 41 CACTGCTGCT CATCATCTAA CTGCTAACGC ACAATCCATT 81 CCGCATTTCT CCACGACGCT GAATGCTGGA AGCAGTGCTA 121 GCAAACGGAG AAGCTTGTAC CTACGATGGG GTAAAGGTTC 161 AAACAAGATC ATTGCCTGTG TTGGAGAAGG TGGTGCAACC 201 TCTGTTCCTT ATCAGTCTGC TGAAAAGAAT GATTCGCTTT 241 CTTCTTCTAC ATTGGTGAAA CGAGAATTTC CTCCAGGATT 281 TTGGAAGGAT GATCTTATCG ATTCTCTAAC GTCATCTCAC 321 AAGGTTGCAG CATCAGACGA GAAGCGTATC GAGACATTAA 361 TATCCGAGAT TAAGAATATG TTTAGATGTA TGGGCTATGG 401 CGAAACGAAT CCCTCTGCAT ATGACACTGC TTGGGTAGCA 441 AGGATTCCAG CAGTTGATGG CTCTGACAAC CCTCACTTTC 481 CTGAGACGGT TGAATGGATT CTTCAAAATC AGTTGAAAGA 521 TGGGTCTTGG GGTGAAGGAT TCTACTTCTT GGCATATGAC 561 AGAATACTGG CTACACTTGC ATGTATTATT ACCCTTACCC 601 TCTGGCGTAC TGGGGAGACA CAAGTACAGA AAGGTATTGA 641 ATTCTTCAGG ACACAAGCTG GAAAGATGGA AGATGAAGCT 681 GATAGTCATA GGCCAAGTGG ATTTGAAATA GTATTTCCTG 721 CAATGCTAAA GGAAGCTAAA ATCTTAGGCT TGGATCTGCC 761 TTACGATTTG CCATTCCTGA AACAAATCAT CGAAAAGCGG 801 GAGGCTAAGC TTAAAAGGAT TCCCACTGAT GTTCTCTATG 841 CCCTTCCAAC AACGTTATTG TATTCTTTGG AAGGTTTACA 881 AGAAATAGTA GACTGGCAGA AAATAATGAA ACTTCAATCC 921 AAGGATGGAT CATTTCTCAG CTCTCCGGCA TCTACAGCGG 961 CTGTATTCAT GCGTACAGGG AACAAAAAGT GCTTGGATTT 1001 CTTGAACTTT GTCTTGAAGA AATTCGGAAA CCATGTGCCT 1041 TGTCACTATC CGCTTGATCT ATTTGAACGT TTGTGGGCGG 1081 TTGATACAGT TGAGCGGCTA GGTATCGATC GTCATTTCAA 1121 AGAGGAGATC AAGGAAGCAT TGGATTATGT TTACAGCCAT 1161 TGGGACGAAA GAGGCATTGG ATGGGCGAGA GAGAATCCTG 1201 TTCCTGATAT TGATGATACA GCCATGGGCC TTCGAATCTT 1241 GAGATTACAT GGATACAATG TATCCTCAGA TGTTTTAAAA 1281 ACATTTAGAG ATGAGAATGG GGAGTTCTTT TGCTTCTTGG 1321 GTCAAACACA GAGAGGAGTT ACAGACATGT TAAACGTCAA 1361 TCGTTGTTCA CATGTTTCAT TTCCGGGAGA AACGATCATG 1401 GAAGAAGCAA AACTCTGTAC CGAAAGGTAT CTGAGGAATG 1441 CTCTGGAAAA TGTGGATGCC TTTGACAAAT GGGCTTTTAA 1481 AAAGAATATT CGGGGAGAGG TAGAGTATGC ACTCAAATAT 1521 CCCTGGCATA AGAGTATGCC AAGGTTGGAG GCTAGAAGCT 1561 ATATTGAAAA CTATGGGCCA GATGATGTGT GGCTTGGAAA 1601 AACTGTATAT ATGATGCCAT ACATTTCGAA TGAAAAGTAT 1641 TTAGAACTAG 
CGAAACTGGA CTTCAATAAG GTGCAGTCTA 1681 TACACCAAAC AGAGCTTCAA GATCTTCGAA GGTGGTGGAA 1721 ATCATCCGGT TTCACGGATC TGAATTTCAC TCGTGAGCGT 1761 GTGACGGAAA TATATTTCTC ACCGGCATCC TTTATCTTTG 1801 AGCCCGAGTT TTCTAAGTGC AGAGAGGTTT ATACAAAAAC 1841 TTCCAATTTC ACTGTTATTT TAGATGATCT TTATGACGCC 1881 CATGGATCTT TAGACGATCT TAAGTTGTTC ACAGAATCAG 1921 TCAAAAGATG GGATCTATCA CTAGTGGACC AAATGCCACA 1961 ACAAATGAAA ATATGTTTTG TGGGTTTCTA CAATACTTTT 2001 AATGATATAG CAAAAGAAGG ACGTGAGAGG CAAGGGCGCG 2041 ATGTGCTAGG CTACATTCAA AATGTTTGGA AAGTCCAACT 2081 TGAAGCTTAC ACGAAAGAAG CAGAATGGTC TGAAGCTAAA 2121 TATGTGCCAT CCTTCAATGA ATACATAGAG AATGCGAGTG 2161 TGTCAATAGC ATTGGGAACA GTCGTTCTCA TTAGTGCTCT 2201 TTTCACTGGG GAGGTTCTTA CAGATGAAGT ACTCTCCAAA 2241 ATTGATCGCG AATCTAGATT TCTTCAACTC ATGGGCTTAA 2281 CAGGGCGTTT GGTGAATGAC ACCAAAACTT ATCAGGCAGA 2321 GAGAGGTCAA GGTGAGGTGG CTTCTGCCAT ACAATGTTAT 2361 ATGAAGGACC ATCCTAAAAT CTCTGAAGAA GAAGCTCTAC 2401 AACATGTCTA TAGTGTCATG GAAAATGCCC TCGAAGAGTT 2441 GAATAGGGAG TTTGTGAATA ACAAAATACC GGATATTTAC 2481 AAAAGACTGG TTTTTGAAAC TGCAAGAATA ATGCAACTCT 2521 TTTATATGCA AGGGGATGGT TTGACACTAT CACATGATAT 2561 GGAAATTAAA GAGCATGTCA AAAATTGCCT CTTCCAACCA 2601 GTTGCCTAGA TTAAATTATT CAGTTAAAGG CCCTCATGGT 2641 ATTGTGTTAA CATTATAATA ACAGATGCTC AAAAGCTTTG 2681 AGCGGTATTT GTTAAGGCTA TCTTTGTTTG TTTGTTTGTT 2721 TACTGCCAAC CAAAAAGCGT TCCTAAACCT TTGAAGACAT 2761 TTCCATCCAA GAGATGGAGT CTACATTTTA TTTATGAGAT 2801 TGAATTATTT CAAGAGAATA TACTACATAT ATTTAAAAGT 2841 AAAAAAAAAA AAAAAAAAAA A

However, a truncated Abies grandis abietadiene synthase enzyme that is missing the first 84 amino acids (AgABS85-868) can be used for cytosolic expression of the enzyme (cytosol:AgABS85-868). A sequence for this cytosol:AgABS85-868 enzyme is shown below as SEQ ID NO:33.

VKREFPPGFW KDDLIDSLTS SHKVAASDEK RIETLISEIK NMFRCMGYGE TNPSAYDTAW VARIPAVDGS DNPHFPETVE WILQNQLKDG SWGEGFYFLA YDRILATLAC IITLTLWRTG ETQVQKGIEF FRTQAGKMED EADSHRPSGF EIVFPAMLKE AKILGLDLPY DLPFLKQIIE KREAKLKRIP TDVLYALPTT LLYSLEGLQE IVDWQKIMKL QSKDGSFLSS PASTAAVFMR TGNKKCLDFL NFVLKKFGNH VPCHYPLDLF ERLWAVDTVE RLGIDRHFKE EIKEALDYVY SHWDERGIGW ARENPVPDID DTAMGLRILR LHGYNVSSDV LKTFRDENGE FFCFLGQTQR GVTDMLNVNR CSHVSFPGET IMEEAKLCTE RYLRNALENV DAFDKWAFKK NIRGEVEYAL KYPWHKSMPR LEARSYIENY GPDDVWLGKT VYMMPYISNE KYLELAKLDF NKVQSIHQTE LQDLRRWWKS SGFTDLNFTR ERVTEIYFSP ASFIFEPEFS KCREVYTKTS NFTVILDDLY DAHGSLDDLK LFTESVKRWD LSLVDQMPQQ MKICFVGFYN TFNDIAKEGR ERQGRDVLGY IQNVWKVQLE AYTKEAEWSE AKYVPSFNEY IENASVSIAL GTVVLISALF TGEVLTDEVL SKIDRESRFL QLMGLTGRLV NDTKTYQAER GQGEVASAIQ CYMKDHPKIS EEEALQHVYS VMENALEELN REFVNNKIPD IYKRLVFETA RIMQLFYMQG DGLTLSHDME IKEHVKNCLF QPVA

A nucleotide sequence for this cytosol:AgABS85-868 enzyme with SEQ ID NO:33 is shown below as SEQ ID NO:34.

GTGAAACGAG AATTTCCTCC AGGATTTTGG AAGGATGATC TTATCGATTC TCTAACGTCA TCTCACAAGG TTGCAGCATC AGACGAGAAG CGTATCGAGA CATTAATATC CGAGATTAAG

AATATGTTTA GATGTATGGG CTATGGCGAA ACGAATCCCT CTGCATATGA CACTGCTTGG GTAGCAAGGA TTCCAGCAGT TGATGGCTCT GACAACCCTC ACTTTCCTGA GACGGTTGAA TGGATTCTTC AAAATCAGTT GAAAGATGGG TCTTGGGGTG AAGGATTCTA CTTCTTGGCA TATGACAGAA TACTGGCTAC ACTTGCATGT ATTATTACCC TTACCCTCTG GCGTACTGGG GAGACACAAG TACAGAAAGG TATTGAATTC TTCAGGACAC AAGCTGGAAA GATGGAAGAT GAAGCTGATA GTCATAGGCC AAGTGGATTT GAAATAGTAT TTCCTGCAAT GCTAAAGGAA GCTAAAATCT TAGGCTTGGA TCTGCCTTAC GATTTGCCAT TCCTGAAACA AATCATCGAA AAGCGGGAGG CTAAGCTTAA AAGGATTCCC ACTGATGTTC TCTATGCCCT TCCAACAACG TTATTGTATT CTTTGGAAGG TTTACAAGAA ATAGTAGACT GGCAGAAAAT AATGAAACTT CAATCCAAGG ATGGATCATT TCTCAGCTCT CCGGCATCTA CAGCGGCTGT ATTCATGCGT ACAGGGAACA AAAAGTGCTT GGATTTCTTG AACTTTGTCT TGAAGAAATT CGGAAACCAT GTGCCTTGTC ACTATCCGCT TGATCTATTT GAACGTTTGT GGGCGGTTGA TACAGTTGAG CGGCTAGGTA TCGATCGTCA TTTCAAAGAG GAGATCAAGG AAGCATTGGA TTATGTTTAC AGCCATTGGG ACGAAAGAGG CATTGGATGG GCGAGAGAGA ATCCTGTTCC TGATATTGAT GATACAGCCA TGGGCCTTCG AATCTTGAGA TTACATGGAT ACAATGTATC CTCAGATGTT TTAAAAACAT TTAGAGATGA GAATGGGGAG TTCTTTTGCT TCTTGGGTCA AACACAGAGA GGAGTTACAG ACATGTTAAA CGTCAATCGT TGTTCACATG TTTCATTTCC GGGAGAAACG ATCATGGAAG AAGCAAAACT CTGTACCGAA AGGTATCTGA GGAATGCTCT GGAAAATGTG GATGCCTTTG ACAAATGGGC TTTTAAAAAG AATATTCGGG GAGAGGTAGA GTATGCACTC AAATATCCCT GGCATAAGAG TATGCCAAGG TTGGAGGCTA GAAGCTATAT TGAAAACTAT GGGCCAGATG ATGTGTGGCT TGGAAAAACT GTATATATGA TGCCATACAT TTCGAATGAA AAGTATTTAG AACTAGCGAA ACTGGACTTC AATAAGGTGC AGTCTATACA CCAAACAGAG CTTCAAGATC TTCGAAGGTG GTGGAAATCA TCCGGTTTCA CGGATCTGAA TTTCACTCGT GAGCGTGTGA CGGAAATATA TTTCTCACCG GCATCCTTTA TCTTTGAGCC CGAGTTTTCT AAGTGCAGAG AGGTTTATAC AAAAACTTCC AATTTCACTG TTATTTTAGA TGATCTTTAT GACGCCCATG GATCTTTAGA CGATCTTAAG TTGTTCACAG AATCAGTCAA AAGATGGGAT CTATCACTAG TGGACCAAAT GCCACAACAA ATGAAAATAT GTTTTGTGGG TTTCTACAAT ACTTTTAATG ATATAGCAAA AGAAGGACGT GAGAGGCAAG GGCGCGATGT GCTAGGCTAC ATTCAAAATG TTTGGAAAGT CCAACTTGAA GCTTACACGA AAGAAGCAGA ATGGTCTGAA GCTAAATATG TGCCATCCTT CAATGAATAC ATAGAGAATG CGAGTGTGTC AATAGCATTG GGAACAGTCG 
TTCTCATTAG TGCTCTTTTC ACTGGGGAGG TTCTTACAGA TGAAGTACTC TCCAAAATTG ATCGCGAATC TAGATTTCTT CAACTCATGG GCTTAACAGG GCGTTTGGTG AATGACACCA AAACTTATCA GGCAGAGAGA GGTCAAGGTG AGGTGGCTTC TGCCATACAA TGTTATATGA AGGACCATCC TAAAATCTCT GAAGAAGAAG CTCTACAACA TGTCTATAGT GTCATGGAAA ATGCCCTCGA AGAGTTGAAT AGGGAGTTTG TGAATAACAA AATACCGGAT ATTTACAAAA GACTGGTTTT TGAAACTGCA AGAATAATGC AACTCTTTTA TATGCAAGGG GATGGTTTGA CACTATCACA TGATATGGAA ATTAAAGAGC ATGTCAAAAA TTGCCTCTTC CAACCAGTTG CC
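Removing the first 84 amino acids corresponds to a fixed offset in the coding sequence, three nucleotides per residue. The sketch below works through that arithmetic. It assumes, from the full-length cDNA listing as printed in this document, that the start codon begins at nucleotide 3; the function name `residue_to_nt` is illustrative only.

```python
def residue_to_nt(residue: int, cds_start: int) -> tuple[int, int]:
    """Return the 1-based nucleotide span of the codon for a given residue.

    cds_start is the 1-based position of the A of the start codon.
    """
    if residue < 1 or cds_start < 1:
        raise ValueError("positions are 1-based and must be positive")
    first = cds_start + 3 * (residue - 1)
    return first, first + 2


# Residue 85 is the first residue kept in the truncated enzyme.
print(residue_to_nt(85, cds_start=3))  # (255, 257)
```

Under that assumption, nucleotides 255 to 257 of the full-length listing read GTG, which agrees with the first codon of the truncated nucleotide sequence above.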

Another enzyme that can be used in the methods is a cytochrome P450 (CYP720B4) enzyme, which can convert abietadiene and several isomers to the corresponding diterpene resin acids. One example of a cytochrome P450 that can be used is a Picea sitchensis CYP720B4, which is expressed in the endoplasmic reticulum (ER:PsCYP720B4). Such a Picea sitchensis CYP720B4, for example, can have accession number HM245403.1 and the following amino acid sequence (SEQ ID NO:35).

1 MAPMADQISL LLVVFTVAVA LLHLIHRWWN IQRGPKMSNK 41 EVHLPPGSTG WPLIGETFSY YRSMTSNHPR KFIDDREKRY 81 DSDIFISHLF GGRTVVSADP QFNKFVLQNE GRFFQAQYPK 121 ALKALIGNYG LLSVHGDLQR KLHGIAVNLL RFERLKVDFM 161 EEIQNLVHST LDRWADMKEI SLQNECHQMV LNLMAKQLLD 201 LSPSKETSDI CELFVDYTNA VIAIPIKIPG STYAKGLKAR 241 ELLIKKISEM IKERRNHPEV VHNDLLTKLV EEGLISDEII 281 CDFILFLLFA GHETSSRAMT FAIKFLTYCP KALKQMKEEH 321 DAILKSKGGH KKLNWDDYKS MAFTQCVINE TLRLGNFGPG 361 VFREAKEDTK VKDCLIPKGW VVFAFLTATH LHEKFHNEAL 401 TFNPWRWQLD KDVPDDSLFS PFGGGARLCP GSHLAKLELS 441 LFLHIFITRF SWEARADDRT SYFPLPYLTK GFPISLHGRV 481 ENE

This endoplasmic Picea sitchensis CYP720B4 (PsCYP720B4, HM245403.1; SEQ ID NO:35) can be encoded by the following cDNA sequence (SEQ ID NO:36).

1 ATGGCGCCCA TGGCAGACCA AATATCATTA CTGTTGGTGG 41 TGTTCACGGT AGCGGTGGCG CTCCTCCACC TTATTCACAG 81 GTGGTGGAAT ATCCAGAGAG GCCCAAAAAT GAGTAATAAG 121 GAGGTTCATC TGCCTCCTGG GTCGACTGGA TGGCCGCTTA 161 TTGGCGAAAC CTTCAGTTAT TATCGCTCCA TGACCAGCAA 201 TCATCCCAGG AAATTCATCG ACGACAGAGA GAAAAGATAT 241 GATTCGGACA TTTTCATATC TCATCTATTT GGAGGCCGGA 281 CGGTTGTATC AGCGGATCCC CAGTTCAACA AGTTTGTTCT 321 ACAAAACGAG GGGAGATTCT TTCAAGCCCA ATACCCAAAG 361 GCACTGAAGG CTTTGATAGG CAACTACGGG CTGCTCTCTG 401 TGCATGGAGA TCTCCAGAGA AAGCTCCACG GAATAGCTGT 441 GAATTTGCTG AGGTTTGAGA GACTGAAAGT CGATTTCATG 481 GAGGAGATAC AGAATCTCGT GCACTCCACG TTGGATAGAT 521 GGGCAGATAT GAAGGAAATT TCTCTGCAGA ATGAATGTCA 561 CCAGATGGTT CTCAACTTGA TGGCCAAACA ACTGCTGGAT 601 TTATCTCCTT CCAAAGAGAC GAGTGATATT TGCGAGCTAT 641 TCGTTGACTA TACCAATGCA GTGATTGCCA TTCCCATCAA

681 AATCCCAGGT TCCACCTATG CAAAGGGGCT TAAGGCAAGG

721 GAGCTTCTCA TAAAAAAGAT TTCAGAAATG ATAAAAGAGA

761 GAAGGAATCA TCCTGAAGTT GTTCATAATG ATTTGTTAAC

801 TAAACTTGTG GAAGAGGGGC TCATTTCAGA TGAAATTATT

841 TGTGATTTTA TTTTATTTTT ACTTTTTGCT GGACATGAGA

881 CTTCCTCTAG AGCCATGACA TTTGCTATCA AGTTTCTTAC

921 CTATTGCCCC AAGGCATTGA AGCAAATGAA GGAAGAGCAT

961 GATGCTATAT TAAAATCAAA GGGAGGTCAT AAGAAACTTA

1001 ATTGGGATGA CTACAAATCA ATGGCATTCA CTCAATGTGT

1041 TATAAATGAA ACACTTCGAT TAGGTAACTT TGGTCCAGGG

1081 GTGTTTAGAG AAGCTAAAGA AGACACTAAA GTAAAAGATT

1121 GTCTCATTCC AAAAGGATGG GTGGTATTTG CTTTTCTGAC

1161 TGCAACACAT CTACATGAAA AGTTTCATAA TGAAGCTCTT

1201 ACTTTTAACC CATGGCGATG GCAATTGGAT AAAGATGTAC

1241 CAGATGATAG TTTGTTTTCA CCTTTTGGAG GTGGAGCTAG

1281 GCTTTGTCCA GGATCTCATC TAGCTAAACT TGAATTGTCA

1321 CTTTTTCTTC ACATATTTAT CACAAGATTC AGTTGGGAAG

1361 CGCGTGCAGA TGATCGTACC TCATATTTTC CATTACCTTA

1401 TTTAACTAAA GGCTTTCCCA TTAGCCTTCA TGGTAGAGTA

1441 GAGAATGAAT AA

To target terpenoid synthesis to the lipid droplets, a truncated CYP720B4 lacking the membrane-binding domain was produced that is missing amino acids 1-29 and that is expressed in the cytosol (cytosol:CYP720B4(30-483)). This truncated CYP720B4 can be a fusion partner with LDSP. A sequence for such a truncated Picea sitchensis CYP720B4 is shown below as SEQ ID NO:37.

NIQRGPKMSN KEVHLPPGST GWPLIGETFS YYRSMTSNHP RKFIDDREKR YDSDIFISHL FGGRTWSAD PQFNKFVLQN EGRFFQAQYP KALKALIGNY GLLSVHGDLQ RKLHGIAVNL LRFERLKVDF MEEIQNLVHS TLDRWADMKE ISLQNECHQM VLNLMAKQLL DLSPSKETSD ICELFVDYTN AVIAIPIKIP GSTYAKGLKA RELLIKKISE MIKERRNHPE WHNDLLTKL VEEGLISDEI ICDFILFLLF AGHETSSRAM TFAIKFLTYC PKALKQMKEE HDAILKSKGG HKKLNWDDYK SMAFTQCVIN ETLRLGNFGP GVFREAKEDT KVKDCLIPKG WWFAFLTAT HLHEKFHNEA LTFNPWRWQL DKDVPDDSLF SPFGGGARLC PGSHLAKLEL SLFLHIFITR FSWEARADDR TSYFPLPYLT KGFPISLHGR VENE

This truncated PsCYP720B4(30-483) polypeptide can have a methionine at its N-terminus. This truncated cytosolic Picea sitchensis CYP720B4 (PsCYP720B4) can be encoded by the following cDNA sequence (SEQ ID NO:38).

AATATCCAGA GAGGCCCAAA AATGAGTAAT AAGGAGGTTC ATCTGCCTCC TGGGTCGACT GGATGGCCGC TTATTGGCGA AACCTTCAGT TATTATCGCT CCATGACCAG CAATCATCCC AGGAAATTCA TCGACGACAG AGAGAAAAGA TATGATTCGG ACATTTTCAT ATCTCATCTA TTTGGAGGCC GGACGGTTGT ATCAGCGGAT CCCCAGTTCA ACAAGTTTGT TCTACAAAAC GAGGGGAGAT TCTTTCAAGC CCAATACCCA AAGGCACTGA AGGCTTTGAT AGGCAACTAC GGGCTGCTCT CTGTGCATGG AGATCTCCAG AGAAAGCTCC ACGGAATAGC TGTGAATTTG CTGAGGTTTG AGAGACTGAA AGTCGATTTC ATGGAGGAGA TACAGAATCT CGTGCACTCC ACGTTGGATA GATGGGCAGA TATGAAGGAA ATTTCTCTGC AGAATGAATG TCACCAGATG GTTCTCAACT TGATGGCCAA ACAACTGCTG GATTTATCTC CTTCCAAAGA GACGAGTGAT ATTTGCGAGC TATTCGTTGA CTATACCAAT GCAGTGATTG CCATTCCCAT CAAAATCCCA GGTTCCACCT ATGCAAAGGG GCTTAAGGCA AGGGAGCTTC TCATAAAAAA GATTTCAGAA ATGATAAAAG AGAGAAGGAA TCATCCTGAA GTTGTTCATA ATGATTTGTT AACTAAACTT GTGGAAGAGG GGCTCATTTC AGATGAAATT ATTTGTGATT TTATTTTATT TTTACTTTTT GCTGGACATG AGACTTCCTC TAGAGCCATG ACATTTGCTA TCAAGTTTCT TACCTATTGC CCCAAGGCAT TGAAGCAAAT GAAGGAAGAG CATGATGCTA TATTAAAATC AAAGGGAGGT CATAAGAAAC TTAATTGGGA TGACTACAAA TCAATGGCAT TCACTCAATG TGTTATAAAT GAAACACTTC GATTAGGTAA CTTTGGTCCA GGGGTGTTTA GAGAAGCTAA AGAAGACACT AAAGTAAAAG ATTGTCTCAT TCCAAAAGGA TGGGTGGTAT TTGCTTTTCT GACTGCAACA CATCTACATG AAAAGTTTCA TAATGAAGCT CTTACTTTTA ACCCATGGCG ATGGCAATTG GATAAAGATG TACCAGATGA TAGTTTGTTT TCACCTTTTG GAGGTGGAGC TAGGCTTTGT CCAGGATCTC ATCTAGCTAA ACTTGAATTG TCACTTTTTC TTCACATATT TATCACAAGA TTCAGTTGGG AAGCGCGTGC AGATGATCGT ACCTCATATT TTCCATTACC TTATTTAACT AAAGGCTTTC CCATTAGCCT TCATGGTAGA GTAGAGAATG AATAA This cDNA with SEQ ID NO:38, which encodes a truncated Picea sitchensis CYP720B4 (PsCYP720B4), can have an ATG at the 5’ end.
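The N-terminal truncation described above (removing residues 1-29 and optionally adding a start methionine or a 5' ATG) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the example uses just the first 120 nucleotides of SEQ ID NO:36 together with the 40 residues those nucleotides encode under the standard genetic code, not the full-length sequences.

```python
def truncate_n_terminus(protein: str, first_kept: int, add_met: bool = True) -> str:
    """Keep residues first_kept..end (1-based numbering); optionally add an N-terminal Met."""
    if not 1 <= first_kept <= len(protein):
        raise ValueError("first_kept must lie within the sequence")
    truncated = protein[first_kept - 1:]
    # Only add Met when the truncated sequence does not already start with one.
    if add_met and not truncated.startswith("M"):
        truncated = "M" + truncated
    return truncated


def truncate_cdna_5prime(cdna: str, first_kept_codon: int, add_atg: bool = True) -> str:
    """Drop the codons before first_kept_codon (1-based); optionally add a 5' ATG."""
    seq = "".join(cdna.split()).upper()  # remove the spacing used in the listing
    truncated = seq[3 * (first_kept_codon - 1):]
    if add_atg and not truncated.startswith("ATG"):
        truncated = "ATG" + truncated
    return truncated


# First 120 nt of SEQ ID NO:36 and the 40 residues they encode (residues 1-40).
CDNA_PREFIX = ("ATGGCGCCCA TGGCAGACCA AATATCATTA CTGTTGGTGG "
               "TGTTCACGGT AGCGGTGGCG CTCCTCCACC TTATTCACAG "
               "GTGGTGGAAT ATCCAGAGAG GCCCAAAAAT GAGTAATAAG")
PROTEIN_PREFIX = "MAPMADQISLLLVVFTVAVALLHLIHRWWNIQRGPKMSNK"

protein_30_onward = truncate_n_terminus(PROTEIN_PREFIX, 30)
cdna_30_onward = truncate_cdna_5prime(CDNA_PREFIX, 30)
print(protein_30_onward)
print(cdna_30_onward)
```

With these inputs the truncated fragments begin at residue 30 (N) and at the codon AAT, which agrees with the starts of SEQ ID NO:37 and SEQ ID NO:38, each preceded here by the optional Met or ATG.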

To facilitate the catalytic activity of the cytochrome P450, a cytochrome P450 reductase can also be expressed. One example of a cytochrome P450 reductase that can be used is a Camptotheca acuminata cytochrome P450 reductase (CaCPR), for example with accession number KP162177.1 and the following amino acid sequence (SEQ ID NO:39).

1 MQSSSVKVST FDLMSAILRG RSMDQTNVSF ESGESPALAM 41 LIENRELVMI LTTSVAVLIG CFWLLWRRS SGKSGKVTEP 81 PKPLMVKTEP EPEVDDGKKK VSIFYGTQTG TAEGFAKALA 121 EEAKVRYEKA SFKVIDLDDY AADDEEYEEK LKKETLTFFF 161 LATYGDGEPT DNAARFYKWF MEGKERGDWL KNLHYGVFGL 201 GNRQYEHFNR IAKWDDTIA EQGGKRLIPV GLGDDDQCIE 241 DDFAAWRELL WPELDQLLQD EDGTTVATPY TAAVLEYRW 281 FHDSPDASLL DKSFSKSNGH AVHDAQHPCR ANVAVRRELH 321 TPASDRSCTH LEFDISGTGL VYETGDHVGV YCENLIEWE 361 EAEMLLGLSP DTFFSIHTDK EDGTPLSGSS LPPPFPPCTL 401 RRALTQYADL LSSPKKSSLL ALAAHCSDPS EADRLRHLAS 441 PSGKDEYAQW WASQRSLLE VMAEFPSAKP PIGAFFAGVA 481 PRLQPRYYSI SSSPRMAPSR IHVTCALVFE KTPVGRIHKG 521 VCSTWMKNAV PLDESRDCSWAPIFVRQSNF KLPADTKVPV 561 LMIGPGTGLA PFRGFLQERL ALKEAGAELG PAILFFGCRN 601 RQMDYIYEDE LNNFVETGAL SELIVAFSRE GPKKEYVQHK 641 MMEKASDIWN MISQEGYIYV CGDAKGMARD VHRTLHTIVQ 681 EQGSLDSSKT ESMVKNLQMN GRYLRDVW

A nucleotide sequence that encodes the Camptotheca acuminata cytochrome P450 reductase with SEQ ID NO:39 is shown below as SEQ ID NO:40.

1 AGTCTCTGCA ACCATAACCA TAACCAGAAC CAGAACCAGG

41 AAGCCAGAGG CTCTCTTTTC TTTCTCTCTC TCTCATTACC

81 AATTCTCCGG TAATTTTCTA GCCGGCCACA GGACCTTTAT

121 TTTTTTCCCG GTAAGATGCA ATCGAGTTCG GTTAAGGTGT

161 CGACGTTTGA TTTGATGTCA GCGATTTTGA GGGGGAGGAG

201 TATGGATCAG ACCAACGTGT CGTTCGAATC CGGCGAGTCT

241 CCGGCGTTGG CGATGTTGAT CGAGAATCGG GAGCTGGTGA

281 TGATCCTGAC GACGTCTGTG GCGGTGTTGA TAGGGTGTTT

321 TGTAGTGTTG TTGTGGCGGA GATCGTCAGG AAAGTCGGGG

361 AAAGTGACAG AACCTCCGAA GCCGCTGATG GTGAAGACTG

401 AGCCGGAGCC GGAAGTTGAT GACGGCAAGA AGAAGGTTTC

441 TATCTTCTAT GGCACGCAGA CCGGTACCGC CGAAGGTTTC

481 GCAAAGGCAC TCGCCGAGGA AGCAAAAGTG AGATACGAAA

521 AGGCGTCATT TAAAGTGATA GATTTGGATG ATTATGCCGC

561 CGACGATGAA GAATACGAAG AGAAATTGAA GAAAGAAACT

601 TTAACATTTT TCTTCTTAGC TACATACGGA GATGGAGAAC

641 CAACTGACAA TGCCGCCAGA TTCTACAAAT GGTTTATGGA

681 GGGAAAAGAG AGAGGGGACT GGCTTAAGAA TCTCCATTAC

721 GGAGTATTTG GTCTCGGCAA CAGGCAGTAT GAGCATTTCA

761 ACAGGATTGC AAAGGTGGTG GATGATACCA TTGCCGAGCA

801 GGGTGGGAAG CGCCTCATTC CTGTGGGCCT TGGAGATGAT

841 GATCAATGCA TTGAAGATGA TTTTGCTGCA TGGCGGGAGT

881 TATTGTGGCC CGAGTTGGAT CAGTTGCTTC AAGATGAAGA

921 TGGCACAACT GTTGCTACTC CTTACACTGC CGCTGTATTG

961 GAATATCGTG TTGTATTCCA TGACAGCCCA GATGCATCAT

1001 TACTGGACAA GAGCTTCAGT AAGTCAAATG GTCATGCTGT

1041 TCATGATGCT CAACATCCAT GCAGAGCTAA CGTGGCTGTG

1081 AGAAGGGAGC TTCACACTCC CGCATCTGAT CGTTCTTGCA

1121 CTCATCTGGA ATTTGATATT TCTGGCACTG GACTTGTATA 1161 TGAAACTGGG GACCATGTTG GTGTGTATTG TGAGAATTTA 1201 ATTGAAGTTG TGGAGGAGGC AGAAATGTTA TTAGGTTTAT 1241 CACCAGATAC CTTTTTCTCC ATTCACACTG ATAAGGAGGA 1281 TGGCACACCA CTTAGTGGAA GCTCCTTGCC ACCTCCTTTC 1321 CCCCCCTGTA CTTTAAGAAG AGCGCTGACT CAATATGCAG 1361 ATCTTTTGAG TTCTCCCAAA AAGTCCTCTT TGCTTGCTCT 1401 AGCAGCTCAT TGTTCTGATC CAAGTGAAGC TGATCGATTA 1441 AGACACCTTG CATCTCCTTC TGGAAAGGAT GAATATGCAC 1481 AGTGGGTAGT TGCAAGTCAG AGAAGTCTCC TTGAGGTCAT 1521 GGCAGAATTT CCATCAGCAA AGCCCCCGAT TGGAGCTTTC 1561 TTTGCCGGAG TTGCCCCACG TCTGCAACCC AGATACTATT 1601 CAATTTCATC CTCCCCAAGG ATGGCACCAT CTAGAATCCA 1641 CGTTACTTGT GCATTAGTTT TTGAGAAAAC ACCTGTAGGA 1681 CGGATTCACA AGGGTGTGTG TTCAACTTGG ATGAAGAATG 1721 CTGTGCCACT AGATGAGAGC CGTGATTGCA GCTGGGCACC 1761 TATTTTTGTT AGGCAATCTA ACTTCAAACT TCCTGCTGAT 1801 ACTAAAGTAC CTGTTTTAAT GATTGGACCT GGCACAGGAT 1841 TGGCTCCTTT TAGGGGTTTC CTGCAGGAAA GATTGGCTCT 1881 GAAAGAAGCT GGAGCAGAAC TTGGACCTGC CATACTATTT 1921 TTTGGATGCA GGAATCGTCA AATGGATTAC ATTTATGAGG 1961 ATGAGCTGAA CAACTTTGTT GAAACTGGTG CACTCTCTGA 2001 GCTTATTGTC GCTTTCTCAC GCGAGGGACC CAAAAAGGAA 2041 TATGTGCAAC ATAAGATGAT GGAGAAAGCG TCGGATATCT 2081 GGAACATGAT TTCTCAGGAA GGATATATAT ATGTATGTGG 2121 TGACGCCAAA GGCATGGCGA GGGATGTCCA CAGAACACTA 2161 CACACTATTG TGCAAGAGCA GGGATCTCTA GACAGCTCCA 2201 AGACTGAAAG CATGGTGAAG AATCTGCAAA TGAATGGAAG 2241 GTATTTGCGT GATGTGTGGT GATTAGTACC CTCAAGTTAA 2281 CCCATCATAA AGTTGGGGCA AATGAAAGAA AATTATGTAA 2321 TTTATACTGG CCGAGGCCAA ATTGCCGGGG ATAAAAGAAA 2361 GCATGCAGCA AGGCAAAGTG AGAAGATTAC TCACCTTCGC 2401 TGCCAATTCT TAATAGTGAT CAGTTCTGTG ATTCTTTTTA 2441 CTCTTCTTGT GCGAAGGATT TTTTGGTTCA TGTAATTTAT 2481 ATATATATAC ACACAATATG TTGTAGTTAT AATAGCAGTA 2521 ATTGGGAGGC ATTTTTACTG GACTTTCTCT CTGTAATTTT 2561 ACTCTAATGA GCAGATAAGT TAATTGATTC TGGACAAAAA 2601 AAAAAA

A truncated Camptotheca acuminata cytochrome P450 reductase, which is expressed in the cytosol, can be used. Such a truncated cytochrome P450 reductase can have the N-terminal amino acids 1-69 missing and, for example, can be referred to as CaCPR(70-708) when the cytochrome P450 reductase is from Camptotheca acuminata. A sequence for this truncated Camptotheca acuminata cytochrome P450 reductase (CaCPR(70-708)) is shown below as SEQ ID NO:41.

SSGKSGKVTE PPKPLMVKTE PEPEVDDGKK KVSIFYGTQT GTAEGFAKAL AEEAKVRYEK ASFKVIDLDD YAADDEEYEE KLKKETLTFF FLATYGDGEP TDNAARFYKW FMEGKERGDW LKNLHYGVFG LGNRQYEHFN RIAKWDDTI AEQGGKRLIP VGLGDDDQCI EDDFAAWREL LWPELDQLLQ DEDGTTVATP YTAAVLEYRV VFHDSPDASL LDKSFSKSNG HAVHDAQHPC RANVAVRREL HTPASDRSCT HLEFDI SGTG LVYETGDHVG VYCENLIEW EEAEMLLGLS PDTFFS IHTD KEDGTPLSGS SLPPPFPPCT LRRALTQYAD LLSSPKKSSL LALAAHCSDP SEADRLRHLA SP SGKDEYAQ WWASQRSLL EVMAEFPSAK PPI GAFFAGV APRLQPRYYS I SSSPRMAPS RIHVTCALVF EKTPVGRI HK GVCSTWMKNA VPLDESRDCS WAPIFVRQSN FKLPADTKVP VLMIGPGTGL APFRGFLQER LALKEAGAEL GPAILFFGCR NRQMDYIYED ELNNFVETGA LSELIVAFSR EGPKKEYVQH KMMEKASDIW NMISQEGYIY VCGDAKGMAR DVHRTLHT IV QEQGSLDSSK TESMVKNLQM NGRYLRDVW This truncated Camptotheca acuminata cytochrome P450 reductase (GzCPR 70 708 ) polypeptide can have a methionine at its N-terminus, and it can be encoded by the following cDNA sequence (SEQ ID NO:42).

TCGTCAGGAA AGTCGGGGAA AGTGACAGAA CCTCCGAAGC CGCTGATGGT GAAGACTGAG CCGGAGCCGG AAGTTGATGA CGGCAAGAAG AAGGTTTCTA TCTTCTATGG CACGCAGACC GGTACCGCCG AAGGTTTCGC AAAGGCACTC GCCGAGGAAG CAAAAGTGAG ATACGAAAAG GCGTCATTTA AAGTGATAGA TTTGGATGAT TATGCCGCCG ACGATGAAGA ATACGAAGAG AAATTGAAGA AAGAAACTTT AACATTTTTC TTCTTAGCTA CATACGGAGA TGGAGAACCA ACTGACAATG CCGCCAGATT CTACAAATGG TTTATGGAGG GAAAAGAGAG AGGGGACTGG CTTAAGAATC TCCATTACGG AGTATTTGGT CTCGGCAACA GGCAGTATGA GCATTTCAAC AGGATTGCAA AGGTGGTGGA TGATACCATT GCCGAGCAGG GTGGGAAGCG CCTCATTCCT GTGGGCCTTG GAGATGATGA TCAATGCATT GAAGATGATT TTGCTGCATG GCGGGAGTTA TTGTGGCCCG AGTTGGATCA GTTGCTTCAA GATGAAGATG GCACAACTGT TGCTACTCCT TACACTGCCG CTGTATTGGA ATATCGTGTT GTATTCCATG ACAGCCCAGA TGCATCATTA CTGGACAAGA GCTTCAGTAA GTCAAATGGT CATGCTGTTC ATGATGCTCA ACATCCATGC AGAGCTAACG TGGCTGTGAG AAGGGAGCTT CACACTCCCG CATCTGATCG TTCTTGCACT CATCTGGAAT TTGATATTTC TGGCACTGGA CTTGTATATG AAACTGGGGA CCATGTTGGT GTGTATTGTG AGAATTTAAT TGAAGTTGTG GAGGAGGCAG AAATGTTATT AGGTTTATCA CCAGATACCT TTTTCTCCAT TCACACTGAT AAGGAGGATG GCACACCACT TAGTGGAAGC TCCTTGCCAC CTCCTTTCCC CCCCTGTACT TTAAGAAGAG CGCTGACTCA ATATGCAGAT CTTTTGAGTT CTCCCAAAAA GTCCTCTTTG CTTGCTCTAG CAGCTCATTG TTCTGATCCA AGTGAAGCTG ATCGATTAAG ACACCTTGCA TCTCCTTCTG GAAAGGATGA ATATGCACAG TGGGTAGTTG CAAGTCAGAG AAGTCTCCTT GAGGTCATGG CAGAATTTCC ATCAGCAAAG CCCCCGATTG GAGCTTTCTT TGCCGGAGTT GCCCCACGTC TGCAACCCAG ATACTATTCA ATTTCATCCT CCCCAAGGAT GGCACCATCT AGAATCCACG TTACTTGTGC ATTAGTTTTT GAGAAAACAC CTGTAGGACG GATTCACAAG GGTGTGTGTT CAACTTGGAT GAAGAATGCT GTGCCACTAG ATGAGAGCCG TGATTGCAGC TGGGCACCTA TTTTTGTTAG GCAATCTAAC TTCAAACTTC CTGCTGATAC TAAAGTACCT GTTTTAATGA TTGGACCTGG CACAGGATTG GCTCCTTTTA GGGGTTTCCT GCAGGAAAGA TTGGCTCTGA AAGAAGCTGG AGCAGAACTT GGACCTGCCA TACTATTTTT TGGATGCAGG AATCGTCAAA TGGATTACAT TTATGAGGAT GAGCTGAACA ACTTTGTTGA AACTGGTGCA CTCTCTGAGC TTATTGTCGC TTTCTCACGC GAGGGACCCA AAAAGGAATA TGTGCAACAT AAGATGATGG AGAAAGCGTC GGATATCTGG AACATGATTT CTCAGGAAGG ATATATATAT GTATGTGGTG ACGCCAAAGG CATGGCGAGG GATGTCCACA 
GAACACTACA CACTATTGTG CAAGAGCAGG GATCTCTAGA CAGCTCCAAG ACTGAAAGCA TGGTGAAGAA TCTGCAAATG AATGGAAGGT ATTTGCGTGA TGTGTGGTGA
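A quick consistency check between a cDNA listing and the polypeptide it is said to encode is to translate the cDNA with the standard genetic code. The sketch below is illustrative only: the helper name `translate` is hypothetical, the standard codon table is assumed, and only the first and last 30 nucleotides of SEQ ID NO:42 are used.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cdna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    seq = "".join(cdna.split()).upper()  # remove the spacing used in the listing
    residues = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First and last 30 nt of SEQ ID NO:42, as listed.
first_30 = "TCGTCAGGAA AGTCGGGGAA AGTGACAGAA"
last_30 = "AATGGAAGGT ATTTGCGTGA TGTGTGGTGA"

print(translate(first_30))
print(translate(last_30))
```

The two fragments translate to the first ten and last nine residues of SEQ ID NO:41 (SSGKSGKVTE and NGRYLRDVW), and the final codon TGA is a stop codon.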

An amino acid sequence for a cytosolic P. cablin patchoulol synthase (cytosol:PcPAS, AY508730; SEQ ID NO:43) is shown below.

1 MELYAQSVGV GAASRPLANF HPCVWGDKFI VYNPQSCQAG 41 EREEAEELKV ELKRELKEAS DNYMRQLKMV DAIQRLGIDY 81 LFVEDVDEAL KNLFEMFDAF CKNNHDMHAT ALSFRLLRQH 121 GYRVSCEVFE KFKDGKDGFK VPNEDGAVAV LEFFEATHLR 161 VHGEDVLDNA FDFTRNYLES VYATLNDPTA KQVHNALNEF 201 SFRRGLPRVE ARKYISIYEQ YASHHKGLLK LAKLDFNLVQ 241 ALHRRELSED SRWWKTLQVP TKLSFVRDRL VESYFWASGS 281 YFEPNYSVAR MILAKGLAVL SLMDDVYDAY GTFEELQMFT 321 DAIERWDASC LDKLPDYMKI VYKALLDVFE EVDEELIKLG 361 APYRAYYGKE AMKYAARAYM EEAQWREQKH KPTTKEYMKL 401 ATKTCGYITL IILSCLGVEE GIVTKEAFDW VFSRPPFIEA 441 TLI IARLVND ITGHEFEKKR EHVRTAVECY MEEHKVGKQE 481 WSEFYNQME SAWKDINEGF LRPVEFPIPL LYLILNSVRT 521 LEVIYKEGDS YTHVGPAMQN IIKQLYLHPV PY

A nucleic acid sequence for a cytosolic P. cablin patchoulol synthase (cytosol:PcPAS, AY508730; SEQ ID NO:44) is shown below.

1 ATGGAGTTGT ATGCCCAAAG TGTTGGAGTG GGTGCTGCTT 41 CTCGTCCTCT TGCGAATTTT CATCCATGTG TGTGGGGAGA 81 CAAATTCATT GTCTACAACC CACAATCATG CCAGGCTGGA 121 GAGAGAGAAG AGGCTGAGGA GCTGAAAGTG GAGCTGAAAA 161 GAGAGCTGAA GGAAGCATCA GACAACTACA TGCGGCAACT 201 GAAAATGGTG GATGCAATAC AACGATTAGG CATTGACTAT 241 CTTTTTGTGG AAGATGTTGA TGAAGCTTTG AAGAATCTGT 281 TTGAAATGTT TGATGCTTTC TGCAAGAATA ATCATGACAT 321 GCACGCCACT GCTCTCAGCT TTCGCCTTCT CAGACAACAT 361 GGATACAGAG TTTCATGTGA AGTTTTTGAA AAGTTTAAGG

401 ATGGCAAAGA TGGATTTAAG GTTCCAAATG AGGATGGAGC

441 GGTTGCAGTC CTTGAATTCT TCGAAGCCAC GCATCTCAGA

481 GTCCATGGAG AAGACGTCCT TGATAATGCT TTTGACTTCA

521 CTAGGAACTA CTTGGAATCA GTCTATGCAA CTTTGAACGA

561 TCCAACCGCG AAACAAGTCC ACAACGCATT GAATGAGTTC

601 TCTTTTCGAA GAGGATTGCC ACGCGTGGAA GCAAGGAAGT

641 ACATATCAAT CTACGAGCAA TACGCATCTC ATCACAAAGG

681 CTTGCTCAAA CTTGCTAAGC TGGATTTCAA CTTGGTACAA

721 GCTTTGCACA GAAGGGAGCT GAGTGAAGAT TCTAGGTGGT

761 GGAAGACTTT ACAAGTGCCC ACAAAGCTAT CATTCGTTAG

801 AGATCGATTG GTGGAGTCCT ACTTCTGGGC TTCGGGATCT

841 TATTTCGAAC CGAATTATTC GGTAGCTAGG ATGATTTTAG

881 CAAAAGGGCT GGCTGTATTA TCTCTTATGG ATGATGTGTA

921 TGATGCATAT GGTACTTTTG AGGAATTACA AATGTTCACA

961 GATGCAATCG AAAGGTGGGA TGCTTCATGT TTAGATAAAC

1001 TTCCAGATTA CATGAAAATA GTATACAAGG CCCTTTTGGA

1041 TGTGTTTGAG GAAGTTGACG AGGAGTTGAT CAAGCTAGGC

1081 GCACCATATC GAGCCTACTA TGGAAAAGAA GCCATGAAAT

1121 ACGCCGCGAG AGCTTACATG GAAGAGGCCC AATGGAGGGA

1161 GCAAAAGCAC AAACCCACAA CCAAGGAGTA TATGAAGCTG

1201 GCAACCAAGA CATGTGGCTA CATAACTCTA ATAATATTAT

1241 CATGTCTTGG AGTGGAAGAG GGCATTGTGA CCAAAGAAGC

1281 CTTCGATTGG GTGTTCTCCC GACCTCCTTT CATCGAGGCT

1321 ACATTAATCA TTGCCAGGCT CGTCAATGAT ATTACAGGAC

1361 ACGAGTTTGA GAAAAAACGA GAGCACGTTC GCACTGCAGT

1401 AGAATGCTAC ATGGAAGAGC ACAAAGTGGG GAAGCAAGAG

1441 GTGGTGTCTG AATTCTACAA CCAAATGGAG TCAGCATGGA

1481 AGGACATTAA TGAGGGGTTC CTCAGACCAG TTGAATTTCC

1521 AATCCCTCTA CTTTATCTTA TTCTCAATTC AGTCCGAACA

1561 CTTGAGGTTA TTTACAAAGA GGGCGATTCG TATACACACG

1601 TGGGTCCTGC AATGCAAAAC ATCATCAAGC AGTTGTACCT

1641 TCACCCTGTT CCATATTAA

An example of a Picea abies FPPS (PaFPPS) sequence is shown below as SEQ ID NO:45 (NCBI accession no. ACA21460.1).

1 MASNGIVDVK TKFEEIYLEL KAQILNDPAF DYTEDARQWV 41 EKMLDYTVPG GKLNRGLSVI DSYRLLKAGK EISEDEVFLG 81 CVLGWCIEWL QAYFLILDDI MDSSHTRRGQ PCWFRLPKVG 121 LIAVNDGILL RNHICRILKK HFRTKPYYVD LLDLFNEVEF 161 QTASGQLLDL ITTHEGATDL SKYKMPTYVR IVQYKTAYYS 201 FYLPVACALV MAGENLDNHV DVKNILVEMG TYFQVQDDYL 241 DCFGDPEVIG KIGTDIEDFK CSWLWQALE RANESQLQRL 281 YANYGKKDPS CVAEVKAVYR DLGLQDVFLE YERTSHKELI 321 SSIEAQENES LQLVLKSFLG KIYKRQK

A cDNA encoding the Picea abies FPPS (PaFPPS) with SEQ ID NO:45 is shown below as SEQ ID NO:46.

1 ATGGCTTCAA ACGGCATCGT CGACGTGAAA ACCAAGTTTG

41 AGGAAATCTA TCTTGAGCTT AAGGCTCAGA TTCTGAACGA

81 TCCTGCCTTC GATTACACCG AAGACGCCCG TCAATGGGTC

121 GAGAAGATGC TGGACTACAC GGTGCCCGGA GGAAAGCTGA

161 ACCGCGGTCT GTCTGTAATA GACAGCTACA GGCTATTGAA

201 AGCAGGAAAG GAAATATCAG AAGATGAAGT CTTTCTTGGA

241 TGTGTGCTTG GCTGGTGTAT TGAATGGCTT CAAGCATATT

281 TCCTCATATT AGATGACATC ATGGACAGCT CTCACACTAG

321 GCGTGGACAA CCTTGTTGGT TCAGATTACC TAAGGTTGGC

361 TTAATTGCTG TTAATGATGG AATATTGCTT CGTAACCACA

401 TATGCAGAAT TCTGAAAAAG CATTTTCGCA CTAAGCCTTA

441 CTATGTGGAT CTCCTTGATT TATTCAATGA GGTTGAGTTT

481 CAAACAGCTA GTGGACAGTT GCTGGACCTT ATCACTACTC

521 ATGAAGGAGC AACTGACCTT TCAAAGTACA AAATGCCAAC

561 TTATGTTCGT ATAGTTCAAT ACAAGACTGC CTACTATTCA

601 TTCTATCTGC CGGTTGCCTG TGCACTGGTA ATGGCAGGGG

641 AAAATTTAGA TAATCACGTA GATGTCAAGA ATATTTTAGT

681 CGAAATGGGA ACCTATTTTC AAGTACAGGA TGATTATCTT

721 GATTGCTTTG GTGATCCAGA AGTGATTGGG AAGATTGGAA

761 CTGATATCGA AGACTTCAAG TGCTCTTGGT TGGTGGTGCA

801 AGCCCTTGAA CGGGCAAATG AGAGCCAACT TCAACGATTA

841 TATGCCAATT ATGGAAAGAA AGATCCTTCT TGTGTTGCAG

881 AAGTGAAGGC TGTATATAGG GATCTTGGAC TTCAGGATGT

921 TTTTCTGGAA TACGAGCGTA CTAGTCACAA GGAGCTCATT

961 TCTTCCATCG AGGCTCAGGA GAATGAATCT TTGCAGCTTG

1001 TTCTGAAGTC CTTCCTAGGG AAGATATACA AGCGACAGAA

1041 GTAA

An example of a Gallus gallus FPPS (GgFPPS) polypeptide sequence is shown below as SEQ ID NO:47 (NCBI accession no. XP_015154133.1).

1 MSADGAKRTA AEREREEFVG FFPQIVRDLT EDGIGHPEVG 41 DAVARLKEVL QYNAPGGKCN RGLTVVAAYR ELSGPGQKDA 81 ESLRCALAVG WCIELFQAFF LVADDIMDQS LTRRGQLCWY 121 KKEGVGLDAI NDSFLLESSV YRVLKKYCGQ RPYYVHLLEL 161 FLQTAYQTEL GQMLDLITAP VSKVDLSHFS EERYKAIVKY 201 KTAFYSFYLP VAAAMYMVGI DSKEEHENAK AILLEMGEYF 241 QIQDDYLDCF GDPALTGKVG TDIQDNKCSW LWQCLQRVT 281 PEQRQLLEDN YGRKEPEKVA KVKELYEAVG MRAAFQQYEE 321 SSYRRLQELI EKHSNRLPKE IFLGLAQKIY KRQK

A cDNA encoding the Gallus gallus FPPS (GgFPPS) with SEQ ID NO:47 is shown below as SEQ ID NO:48.

1 AGAATGCCCC GCGCGGCGCC GGGCGGAGCG CACGGAAAGG

41 TCGCGGGGCA AAAAGCGGCG CTGAGCGGAC GGGGCCGAAC

81 GCGTCGGGGT CGCCATGAGC GCGGATGGGG CGAAGCGGAC

121 GGCGGCCGAG AGGGAGAGGG AGGAGTTCGT GGGGTTCTTC

161 CCGCAGATCG TCCGCGATCT GACCGAGGAC GGCATCGGAC

201 ACCCGGAGGT GGGCGACGCT GTGGCGCGGC TGAAGGAGGT

241 GCTGCAATAC AACGCTCCCG GTGGGAAATG CAACCGTGGG

281 CTGACGGTGG TGGCTGCGTA CCGGGAGCTG TCGGGGCCGG

321 GGCAGAAGGA TGCTGAGAGC CTGCGGTGCG CGCTGGCCGT

361 GGGTTGGTGC ATCGAGTTGT TCCAGGCCTT CTTCCTGGTG

401 GCTGATGATA TCATGGATCA GTCCCTCACG CGCCGGGGGC

441 AGCTGTGTTG GTATAAGAAG GAGGGGGTCG GTTTGGATGC

481 CATCAACGAC TCCTTCCTCC TCGAGTCCTC TGTGTACAGA

521 GTGCTGAAGA AGTACTGCGG GCAGCGGCCG TATTACGTGC

561 ATCTGTTGGA GCTCTTCCTG CAGACCGCCT ACCAGACTGA

601 GCTCGGGCAG ATGCTGGACC TCATCACAGC TCCCGTCTCC

641 AAAGTGGATT TGAGTCACTT CAGCGAGGAG AGGTACAAAG

681 CCATCGTTAA GTACAAGACT GCCTTCTACT CCTTCTACCT

721 ACCCGTGGCT GCTGCCATGT ATATGGTTGG GATCGACAGT

761 AAGGAAGAAC ACGAGAATGC CAAAGCCATC CTGCTGGAGA

801 TGGGGGAATA CTTCCAGATC CAGGATGATT ACCTGGACTG

841 CTTTGGGGAC CCGGCGCTCA CGGGGAAGGT GGGCACCGAC

881 ATCCAGGACA ATAAATGCAG CTGGCTCGTG GTGCAGTGCC

921 TGCAGCGCGT CACGCCGGAG CAGCGGCAGC TCCTGGAGGA

961 CAACTACGGC CGTAAGGAGC CCGAGAAGGT GGCGAAGGTG

1001 AAGGAGCTGT ATGAGGCCGT GGGGATGAGG GCTGCGTTCC

1041 AGCAGTACGA GGAGAGCAGC TACCGGCGCC TGCAGGAACT

1081 GATAGAGAAG CACTCGAACC GCCTCCCGAA GGAGATCTTC

1121 CTCGGCCTGG CACAGAAGAT CTACAAACGC CAGAAATGAG

1161 GGGTGGGGGC GGCAGCGGCT CTGTGCTTCG CGCTGTGTTG

1201 GGTGGCTTCG CAGCCCCGGA CCCGGTGCTC CCCCCACCCG

1241 TTATCCCCGG AGATGCGGGG GGGGGGCGGT GCGGGGCGCG

1281 CATCCATCGG TGCCGTCAGA CTGTGTGTCA ATAAACGTTA

1321 ATTTATTGCC

An Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A protein sequence is shown below as SEQ ID NO:49.

1 MASSMLSSAT MVASPAQATM VAPFNGLKSS AAFPATRKAN 41 NDITSITSNG GRVNCMQVWP PIGKKKFETL SYLPDLTDSE 81 LAKEVDYLIR NKWIPCVEFE LEHGFVYREH GNSPGYYDGR 121 YWTMWKLPLF GCTDSAQVLK EVEECKKEYP NAFIRIIGFD 161 NTRQVQCISF IAYKPPSFTG

A nucleotide sequence for the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A (NM_105379.4) is shown below as SEQ ID NO:50.

1 CCAAGGTAAA AAAAAGGTAT GAAAGCTCTA TAGTAAGTAA

41 AATATAAATT CCCCATAAGG AAAGGGCCAA GTCCACCAGG

81 CAAGTAAAAT GAGCAAGCAC CACTCCACCA TCACACAATT

121 TCACTCATAG ATAACGATAA GATTCATGGA ATTATCTTCC

161 ACGTGGCATT ATTCCAGCGG TTCAAGCCGA TAAGGGTCTC

201 AACACCTCTC CTTAGGCCTT TGTGGCCGTT ACCAAGTAAA

241 ATTAACCTCA CACATATCCA CACTCAAAAT CCAACGGTGT

281 AGATCCTAGT CCACTTGAAT CTCATGTATC CTAGACCCTC

321 CGATCACTCC AAAGCTTGTT CTCATTGTTG TTATCATTAT

361 ATATAGATGA CCAAAGCACT AGACCAAACC TCAGTCACAC

401 AAAGAGTAAA GAAGAACAAT GGCTTCCTCT ATGCTCTCTT

441 CCGCTACTAT GGTTGCCTCT CCGGCTCAGG CCACTATGGT

481 CGCTCCTTTC AACGGACTTA AGTCCTCCGC TGCCTTCCCA

521 GCCACCCGCA AGGCTAACAA CGACATTACT TCCATCACAA

561 GCAACGGCGG AAGAGTTAAC TGCATGCAGG TGTGGCCTCC

601 GATTGGAAAG AAGAAGTTTG AGACTCTCTC TTACCTTCCT

641 GACCTTACCG ATTCCGAATT GGCTAAGGAA GTTGACTACC

681 TTATCCGCAA CAAGTGGATT CCTTGTGTTG AATTCGAGTT

721 GGAGCACGGA TTTGTGTACC GTGAGCACGG TAACTCACCC

761 GGATACTATG ATGGACGGTA CTGGACAATG TGGAAGCTTC

801 CCTTGTTCGG TTGCACCGAC TCCGCTCAAG TGTTGAAGGA

841 AGTGGAAGAG TGCAAGAAGG AGTACCCCAA TGCCTTCATT

881 AGGATCATCG GATTCGACAA CACCCGTCAA GTCCAGTGCA

921 TCAGTTTCAT TGCCTACAAG CCACCAAGCT TCACCGGTTA

961 ATTTCCCTTT GCTTTTGTGT AAACCTCAAA ACTTTATCCC

1001 CCATCTTTGA TTTTATCCCT TGTTTTTCTG CTTTTTTCTT

1041 CTTTCTTGGG TTTTAATTTC CGGACTTAAC GTTTGTTTTC

1081 CGGTTTGCGA GACATATTCT ATCGGATTCT CAACTGTCTG

1121 ATGAAATAAA TATGTAATGT TCTATAAGTC TTTCAATTTG

1161 ATATGCATAT CAACAAAAAG AAAATAGGAC AATGCGGCTA

1201 CAAATATGAA ATTTACAAGT TTAAGAACCA TGAGTCGCTA

1241 AAGAAATCAT TAAGAAAATT AGTTTCAC

In some cases, a portion of the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A protein can be used as a chloroplast transit peptide to relocalize cytosolic proteins to the chloroplast, for example, an Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A peptide with SEQ ID NO:101 (shown below).

1 MASSMLSSAT MVASPAQATM VAPFNGLKSS AAFPATRKAN 41 NDITSITSNG GRVN

A nucleic acid segment that encodes the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A peptide with SEQ ID NO:101 is shown below as SEQ ID NO:102.

1 ATGGCTTCCT CTATGCTCTC TTCCGCTACT ATGGTTGCCT

41 CTCCGGCTCA GGCCACTATG GTCGCTCCTT TCAACGGACT

81 TAAGTCCTCC GCTGCCTTCC CAGCCACCCG CAAGGCTAAC

121 AACGACATTA CTTCCATCAC AAGCAACGGC GGAAGAGTTA

161 AC

The enzyme and protein sequences shown herein can have one or more deletions, insertions, replacements, or substitutions without loss of their enzymatic activities. Such enzymatic activities include the synthesis of terpenes / terpenoids. The terpene synthase enzymes can have, for example, at least 60%, or at least 70%, or at least 80%, or at least 90%, or at least 95%, or at least 97%, or at least 98%, or at least 99% sequence identity to a sequence described herein.
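As an illustration of how a percent sequence identity threshold might be checked, the sketch below counts identical positions between two sequences that have already been aligned. It rests on assumptions not stated in the text: the function names are hypothetical, gaps are written as "-", and identity is computed over all alignment columns except those that are gaps in both sequences. Other conventions for the denominator exist and give different values. The example compares the first 40 residues listed for SEQ ID NO:52 and SEQ ID NO:57, which align without gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True when the identity is at least the given percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# First 40 residues as listed for SEQ ID NO:52 and SEQ ID NO:57.
ah_sqs_fragment = "MGSLGAILKHPDEFYPLLKLKMAVKEAEKQIPSESHWGFC"
el_sqs_fragment = "MGSLGAILKHPDDFYPLLKLKMAAKHAEKQIPAQPHWGFC"

identity = percent_identity(ah_sqs_fragment, el_sqs_fragment)
print(identity)
print(meets_identity(ah_sqs_fragment, el_sqs_fragment, 80.0))
```

For these two fragments, 34 of the 40 positions are identical, so the pair passes an 80% threshold but not a 90% threshold. Full-length comparisons would first need an alignment step, which this sketch does not perform.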

In some cases, the enzymes and proteins described herein are naturally expressed in the cytosol, but it can be desirable to express some of these enzymes and/or proteins in plastids or other subcellular locations.

In some cases, it is useful to target enzymes and/or proteins to the plastid. To do this, a nucleic acid segment encoding the enzyme or protein can be fused in-frame to a segment encoding a plastid targeting sequence, so that the plastid targeting sequence is at the N-terminus of the encoded protein. For example, a plastid targeting sequence of the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A (NM_105379.4; SEQ ID NO:49 or 101) can be used.

For example, wild type ElHMGR, AtWRI1(1-397) (transcription factor), NoLDSP (lipid droplet surface protein), SaGGDPS, MtGGDPS, TsGGDPS, MeGGDPS, AtFDPS and PcPAS are cytosolic proteins. However, in some cases it can be useful to target these enzymes and/or proteins to the plastid. Hence, SaGGDPS, MtGGDPS, TsGGDPS, MeGGDPS, AtFDPS and PcPAS can be targeted to plastids by fusing each of their N-termini to the plastid targeting sequence of the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A (NM_105379.4; SEQ ID NO:49 or 101).
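The N-terminal fusion of a plastid targeting sequence to a cytosolic protein amounts to joining the two polypeptide sequences, as the sketch below illustrates. The transit peptide is SEQ ID NO:101 as listed. Everything else is an assumption made for illustration: the function name is hypothetical, the `drop_initial_met` option (whether the start methionine of the fused protein is kept) is not specified in the text, and the cargo is only the first 20 residues listed for PcPAS (SEQ ID NO:43).

```python
# SEQ ID NO:101: transit peptide portion of A. thaliana RuBisCO small chain 1A.
RBCS1A_TRANSIT_PEPTIDE = "MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVN"


def add_plastid_targeting(protein: str,
                          transit_peptide: str = RBCS1A_TRANSIT_PEPTIDE,
                          drop_initial_met: bool = False) -> str:
    """Place the transit peptide at the N-terminus of the given protein.

    drop_initial_met is an assumed option: when True, a leading Met on the
    cargo protein is removed before the two sequences are joined.
    """
    cargo = protein
    if drop_initial_met and cargo.startswith("M"):
        cargo = cargo[1:]
    return transit_peptide + cargo


# First 20 residues listed for PcPAS (SEQ ID NO:43), used as a short stand-in cargo.
pcpas_fragment = "MELYAQSVGVGAASRPLANF"

plastid_fusion = add_plastid_targeting(pcpas_fragment)
print(plastid_fusion)
print(len(plastid_fusion))
```

At the nucleic acid level, the same construct would correspond to joining SEQ ID NO:102 in-frame to the coding sequence of the cargo protein.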

Some proteins / enzymes are naturally targeted to plastids, but in some cases, it can be useful to target them to the cytosol. This can be done in some cases by removing a natural plastid targeting sequence. For example, native PbDXS (CfDXS) and AgABS (plastid:AgABS) each have a plastid targeting sequence at their N-terminus. To target AgABS to the cytosol, for example, the plastid targeting sequence can be removed (e.g., cytosol:AgABS(85-868), in which residues 1-84 were removed).

Similarly, native PsCYP720B4 and native CaCPR are naturally localized at the endoplasmic reticulum (ER; e.g., ER:PsCYP720B4 and ER:CaCPR, respectively). To target PsCYP720B4 to the cytosol, the hydrophobic region that includes amino acids 1-29 was removed (cytosol:PsCYP720B4(30-483)). To target PsCYP720B4 and CaCPR to lipid droplets, hydrophobic regions were removed, and the truncated proteins were fused to NoLDSP (LD:PsCYP720B4(30-483) and LD:CaCPR(70-708), respectively). Hence, the enzymes and proteins described herein can have sequences that are modified (compared to wild type) to include a plastid targeting sequence or an LDSP. In some cases, the enzymes and proteins described herein can have sequences that are modified (compared to wild type) by removal of plastid targeting segments or hydrophobic regions.

Squalene Synthases

A variety of squalene synthase enzymes can be used in the methods described herein to synthesize squalene and compounds derived from squalene. Squalene is useful as a component in numerous formulations and it is a biochemical precursor to a family of steroids. Squalene synthases can be used in the expression systems and methods described herein in native or modified form. For example, in some cases, the squalene synthases can be modified by removal of a plastidial targeting sequence or a hydrophobic region. In addition, the native or modified forms of squalene synthases can be fused to a lipid droplet surface protein (LDSP). For example, the LDSP protein can replace the truncated segments of a squalene synthase.

Examples of squalene synthases that can be used include those from Amaranthus hybridus, Botryococcus braunii, Euphorbia lathyris, Ganoderma lucidum, and Mortierella alpina.

For example, an Amaranthus hybridus squalene synthase (AhSQS) sequence is shown below as SEQ ID NO:51 (also as NCBI accession no. BAW27654.1).

In some cases, the Amaranthus hybridus squalene synthase can have a C-terminal truncation of about 30-50 amino acids. For example, the Amaranthus hybridus squalene synthase sequence with SEQ ID NO:51 can have a 41-amino acid C-terminal truncation (AhSQS CΔ41), with a sequence such as that shown below (SEQ ID NO:52).

1 MGSLGAILKH PDEFYPLLKL KMAVKEAEKQ IPSESHWGFC 41 YSMLHKVSRS FALVIQQLGT ELRNAVCVFY LVLRALDTVE 81 DDTSIATDVK LPILKAFYQH IYDREWHFSC GTKHYKVLMD 121 EFHQVSTAFL ELERGYQLAI EDITKRMGAG MAKFICQEVE 161 TVSDYDEYCH YVAGLVGLGL SKLFHNAGLE DLASDDLSNS 201 MGLFLQKTNI IRDYLEDINE IPKCRMFWPR EIWSKYVNKL 241 EDLKYEENSV KAVQCLNDMV TNALLHVEDC LKYMSALRDH 281 AIFRFCAIPQ IMAIGTLALC YNNVEVFRGV VKMRRGLTAR 321 VIDKTDSMPD VYGAFYDFAC MIKPKVDKND PNAMKTLSRI 361 DAIEKICRDS GTLN
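A C-terminal truncation of this kind simply removes a fixed number of residues from the end of the polypeptide. The sketch below is illustrative only: the function name is hypothetical, and the input is a made-up 12-residue placeholder rather than any sequence from this document.

```python
def truncate_c_terminus(protein: str, n_removed: int) -> str:
    """Remove the last n_removed residues from a protein sequence."""
    if not 0 < n_removed < len(protein):
        raise ValueError("n_removed must be positive and shorter than the sequence")
    return protein[:-n_removed]


# Hypothetical placeholder sequence: 8 retained residues followed by a
# 4-residue C-terminal tail that is removed.
placeholder = "MKTAYIAK" + "LLIV"

truncated = truncate_c_terminus(placeholder, 4)
print(truncated)
print(len(placeholder) - len(truncated))
```

Applied to a full-length squalene synthase, a value of `n_removed` of 41 would correspond to a CΔ41 variant.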

In another example, a Botryococcus braunii squalene synthase can be used, for example, with the following sequence (SEQ ID NO:53; NCBI accession no. AAF20201.1).

1 MGMLRWGVES LQNPDELIPV LRMIYADKFG KIKPKDEDRG 41 FCYEILNLVS RSFAIVIQQL PAQLRDPVCI FYLVLRALDT 81 VEDDMKIAAT TKIPLLRDFY EKISDRSFRM TAGDQKDYIR 121 LLDQYPKVTS VFLKLTPREQ EIIADITKRM GNGMADFVHK 161 GVPDTVGDYD LYCHYVAGW GLGLSQLFVA SGLQSPSLTR 201 SEDLSNHMGL FLQKTNIIRD YFEDINELPA PRMFWPREIW 241 GKYANNLAEF KDPANKAAAM CCLNEMVTDA LRHAVYCLQY 281 MSMIEDPQIF NFCAIPQTMA FGTLSLCYNN YTIFTGPKAA 321 VKLRRGTTAK LMYTSNNMFA MYRHFLNFAE KLEVRCNTET 361 SEDPSVTTTL EHLHKIKAAC KAGLARTKDD TFDELRSRLL 401 ALTGGSFYLA WTYNFLDLRG PGDLPTFLSV TQHWWSILIF 441 LISIAVFFIP SRPSPRPTLS A

A nucleotide sequence encoding the Botryococcus braunii squalene synthase with SEQ ID NO:53 is shown below as SEQ ID NO:54 (NCBI accession no. AF205791.1).

1 AACAGCAACA AGTCCTCTGC GTCAGGCAAA ACGTCCGTTT 41 GTATGGCTTG GCGCTTGAAA GCTGGTGGGG ATAAACGTCA 81 AAAGAAAGAA GCTCTGTTCG GGTTCACGGG TGTCGTTTAG 121 TACTTTCCCC TACGACATTG TCAGCCTTGG CTCATCGCAA 161 TCCAACCAAA TATGGGGATG CTTCGCTGGG GAGTGGAGTC 201 TTTGCAGAAT CCAGATGAAT TAATCCCGGT CTTGAGGATG 241 ATTTATGCTG ATAAGTTTGG AAAGATCAAG CCAAAGGACG 281 AAGACCGGGG CTTCTGCTAT GAAATTTTAA ACCTTGTTTC 321 AAGAAGTTTT GCAATCGTCA TCCAACAGCT CCCTGCACAG 361 CTGAGGGACC CAGTCTGCAT ATTTTACCTT GTACTACGCG 401 CCCTGGACAC AGTCGAAGAT GATATGAAAA TTGCAGCAAC 441 CACCAAGATT CCCTTGCTGC GTGACTTTTA TGAGAAAATT 481 TCTGACAGGT CATTCCGCAT GACGGCCGGA GATCAAAAAG 521 ACTACATCAG GCTGTTGGAT CAGTACCCCA AAGTGACAAG 561 CGTTTTCTTG AAATTGACCC CCCGTGAACA AGAGATAATT

601 GCAGACATTA CAAAGCGGAT GGGGAATGGA ATGGCTGACT

641 TCGTGCATAA GGGTGTTCCC GACACAGTGG GGGACTACGA

681 CCTTTACTGC CACTATGTTG CTGGGGTGGT GGGTCTCGGG

721 CTTTCCCAGT TGTTCGTTGC GAGTGGACTA CAGTCACCCT

761 CTTTGACCCG CAGTGAAGAC CTTTCCAATC ACATGGGCCT

801 CTTCCTTCAG AAGACCAACA TCATCCGCGA CTACTTTGAG

841 GACATCAATG AGCTGCCTGC CCCCCGGATG TTCTGGCCCA

881 GAGAGATCTG GGGCAAGTAT GCGAACAACC TCGCTGAGTT

921 CAAAGACCCG GCCAACAAGG CGGCTGCAAT GTGCTGCCTC

961 AACGAGATGG TCACAGATGC ATTGAGGCAC GCGGTGTACT

1001 GCCTGCAGTA CATGTCCATG ATTGAGGATC CGCAGATCTT

1041 CAACTTCTGT GCCATCCCTC AGACCATGGC CTTCGGCACC

1081 CTGTCTTTGT GTTACAACAA CTACACTATC TTCACAGGGC

1121 CCAAAGCGGC TGTGAAGCTG CGTAGGGGCA CCACTGCCAA

1161 GCTGATGTAC ACCTCTAACA ATATGTTTGC GATGTACCGT

1201 CATTTCCTCA ACTTCGCAGA GAAGCTGGAA GTCAGATGCA

1241 ACACCGAGAC CAGCGAGGAT CCCAGCGTGA CCACCACTCT

1281 GGAACACCTG CATAAGATCA AAGCTGCCTG CAAGGCTGGG

1321 CTGGCACGCA CAAAAGATGA CACCTTTGAC GAATTGAGGA

1361 GCAGGTTGTT AGCGCTGACG GGAGGCAGCT TCTACCTCGC

1401 CTGGACCTAC AATTTCCTAG ACCTTCGAGG CCCGGGAGAC

1441 CTGCCCACCT TCTTATCTGT AACCCAACAT TGGTGGTCTA

1481 TTCTGATCTT CCTCATTTCG ATTGCCGTCT TCTTTATTCC

1521 GTCGAGGCCC TCACCTAGAC CCACACTCAG CGCCTAATCC

1561 TTTGGCTCTC GTCAATTCCG GAGTCCCCCA TTGTTGTCAG

1601 CACTTGGGGA ATTTCGTGGT CTTCTTGACC ACACTCTTGT

1641 CTCTGGCAGA GGTCAAGGAC ACTGTCAGGG ACAAGTGAGT

1681 ATTCTGACCC CCCCCCCCCC CCCCCTCTGC TCCTTTCACC

1721 ACCCCTCCCT ATCATCTGGG GCAAAGCTTG GGAATGGGCC

1761 CGTCCCCCTG TTGTCCCGCT CAGATGCAAA GTTTGGGTTA

1801 TGTAACTGGG TTGAACGGCT CGGGGCGGTT TGAAGCTGTC

1841 CCTTGTTGGA GATGGAAAAT TGCAGGGCCC GGGGGGGTTA

1881 ACTGGACACG CTCTTCCGTC CCGCAGTGTC CTTCTGGCTT

1921 TATTCTGCCG TGGATGCTGT GAACCCGCCC CCTCTCTGGG

1961 CCGGCTCAAT ATACAAGTAT TAGTTTCGGT GTTTGTGTCA

2001 ATCCTTTCTC ACAACTTCCC TGTTCGTTGG ACTGGAGACG

2041 CACCCTTAGG TCCTTTGATT GGGAATGCGG CCCCTTTGGG

2081 TCTTTAGGCT CTCGGGTAGT CTAGTTTGCA ATTGTTGCAT

2121 GGGCGCGGCT TTGCACAGAC GCCTGGACCT TCATTGAGAC

2161 ACGTTTCGGA AAACTCGACA GTTTTGAGGT AACCTGCTCG

2201 TGGGCCTCGG TGTGTCTGGA GGTGTCAGGG GCCTGTGCTC

2241 CCTGCTGGGA TGTTCCCGCT TTGCTGTAAA AAGTCGGACG

2281 TTTGTTATCC TTTGCGGGGG TTCATCTTTG AGTGGGCCCT

2321 GCTTCTCTGC CCGTGTGATG TAATGGTTTG TATTGGATAG

2361 GTATGTTGCC TTATCTCGTG TATGGAATTC GTATGGTACT

2401 TGCAGTATTC AGGAGACTTG AGTAACGACA TCGAGGACAG

2441 GTAACAAGCG CTCCGATTAT GTGCTCTGTT ACACCCGACT

2481 TCCAAAGATT TATGCGAGGT CCTGGGGAAC GCAGATTTGA

2521 CATTGGAGAG CCCCAATTGG CCGTGGCAAT CTGTAGAATG

2561 TCAAAAGAGA AAACAGGAAA TCAGGTTTTA AAGTCCGTGC

In some cases, the Botryococcus braunii squalene synthase can have a C-terminal truncation, for example, of about 40-85 amino acids. Such a C-terminally truncated Botryococcus braunii squalene synthase can have 40 amino acids removed from the C-terminus, and the following sequence (SEQ ID NO:55) (also called BbSQS CΔ40).

Another C-terminally truncated Botryococcus braunii squalene synthase can have 83 amino acids removed from the C-terminus, and the following sequence (SEQ ID NO:56) (also called BbSQS CΔ83).

In another example, a Euphorbia lathyris squalene synthase can be used, for example, with the following sequence (SEQ ID NO:57; UNIPROT accession no. A0A0A6ZA44_9ROSI).

1 MGSLGAILKH PDDFYPLLKL KMAAKHAEKQ IPAQPHWGFC

41 YSMLHKVSRS FSLVIQQLGT ELRDAVCIFY LVLRALDTVE

81 DDTSIPTDVK VPILIAFHKH IYDPEWHFSC GTKEYKVLMD

121 QIHHLSTAFL ELGKSYQEAI EDITKKMGAG MAKFICKEVE

161 TVDDYDEYCH YVAGLVGLGL SKLFDASGFE DLAPDDLSNS

201 MGLFLQKTNI IRDYLEDINE IPKSRMFWPR QIWSKYVNKL

241 EDLKYEENSV KAVQCLNDMV TNALIHMDDC LKYMSALRDP

281 AIFRFCAIPQ IMAIGTLALC YNNVEVFRGV VKMRRGLTAK

321 VIDRTRTMAD VYRAFFDFSC MMKSKVDRND PNAEKTLNRL

361 EAVQKTCKES GLLNKRRSYI NESKPYNSTM VILLMIVLAI

401 ILAYLSKRAN

A nucleotide sequence encoding the Euphorbia lathyris squalene synthase with SEQ ID NO:57 is shown below as SEQ ID NO:58 (NCBI accession no. JQ694152.1).

1 GAACCTTGTG GCGTGCAGAG AGAGACAGAG AGAGACAGAG

41 ATTGTTGAAT CTCTATTTAA TTCATAGTAG CCTCATTGGA

81 CTCAATCCGT CGTTTTCGTT TCCATCTCCT TTAAAAACCA

121 GTCGATCGTT TCTCCTCAAT TTCGACTTCA ACTCTTTCTT

161 TCGCTTATTC ATTTGGTTTT TCAAGGGATC TGAGGATAAT

201 GGGGAGTTTG GGAGCAATTC TGAAGCATCC GGATGATTTT

241 TACCCGCTTT TGAAGCTGAA AATGGCTGCT AAACATGCTG

281 AGAAGCAGAT CCCAGCACAA CCTCACTGGG GTTTCTGTTA

321 CTCCATGCTT CATAAGGTCT CTCGTAGCTT TTCTCTTGTC

361 ATTCAACAGC TTGGCACTGA GCTCCGTGAC GCTGTTTGTA

401 TATTCTATTT GGTTCTTCGA GCCCTTGATA CTGTTGAGGA

441 TGATACAAGC ATCCCTACAG ATGTGAAAGT GCCGATCTTG

481 ATAGCTTTTC AGAAGCAGAT ATACGATCCT GAATGGCATT

521 TTTCTTGTGG TACTAAGGAA TATAAAGTTC TCATGGACCA

561 GATTCATCAT CTTTCAACTG CTTTTCTTGA GCTTGGGAAA

601 AGTTATCAGG AGGCAATCGA GGATATCACG AAAAAAATGG

641 GTGCAGGAAT GGCTAAATTC ATATGCAAAG AGGTGGAAAC

681 AGTTGATGAC TACGATGAAT ATTGCCATTA TGTTGCAGGA

721 CTTGTTGGAC TAGGTCTTTC CAAGCTTTTT GATGCCTCTG

761 GATTTGAAGA TTTGGCACCA GATGACCTTT CCAACTCGAT

801 GGGGTTATTT CTCCAGAAAA CAAACATTAT CCGGGATTAT

841 TTGGAGGATA TAAATGAGAT ACCTAAGTCA CGCATGTTTT

881 GGCCTCGCCA GATCTGGAGT AAATATGTTA ATAAACTTGA

921 GGACTTGAAA TATGAAGAAA ACTCAGTCAA GGCAGTGCAA

961 TGCTTGAATG ATATGGTTAC TAATGCTTTG ATACATATGG

1001 ATGATTGCTT GAAATACATG TCGGCACTAC GAGATCCTGC

1041 TATATTTCGT TTTTGTGCCA TCCCTCAGAT TATGGCAATT

1081 GGAACCCTAG CATTGTGCTA CAACAACGTT GAAGTATTTA

1121 GAGGTGTAGT GAAGATGAGG CGTGGTCTTA CTGCAAAGGT

1161 CATTGACAGA ACAAGGACCA TGGCAGATGT CTATCGGGCC

1201 TTCTTTGACT TCTCATGTAT GATGAAATCC AAGGTTGACA

1241 GGAATGATCC AAATGCAGAA AAGACATTGA ACAGGCTGGA

1281 AGCAGTGCAA AAAACTTGCA AGGAGTCTGG GCTGCTAAAC

1321 AAAAGGAGAT CTTACATAAA TGAGAGCAAG CCATATAATT

1361 CTACTATGGT TATTCTACTG ATGATTGTAT TGGCAATCAT

1401 TTTGGCTTAT CTGAGCAAAC GGGCCAACTA ACTAGTGTAA

1441 CTTCTGTTAA GTAATCAGTT GAGGATTTGA ATCCGGTTAT

1481 CGTGAAACCG GGTTATTGCA GGATGTCTAC TTCTGTGAAC

1521 AATTTCTGCA GATGGATGGC TAGCTAGCAA TGAAGGTGCT

1561 TGCTGGACTT GTTCCAGGAG AGTTGTGAAT TTGATGTTTC

1601 AGTATATAGT GTAGTGCCAT AACAATGTTT GTGTCCAATG

1641 TGCCACTAAT GTGATCATAT TAGTGTTTTG TTCTCGTGGG

1681 TTGTTATTAT ACTCCTTAAT TATGGAATTG AAGCAATATC

1721 TTGAAGGATC TTCTGAATAT CTTGATTCAA GTCGCTGTTA

1761 TTCACATC

In some cases, the Euphorbia lathyris squalene synthase can have a C-terminal truncation, for example, of about 20-50 amino acids. Such a C-terminal truncation of a Euphorbia lathyris squalene synthase can have 36 amino acids truncated from the C-terminus, and the following sequence (SEQ ID NO:59) (also called ElSQS CA36).

1 MGSLGAILKH PDDFYPLLKL KMAAKHAEKQ IPAQPHWGFC

41 YSMLHKVSRS FSLVIQQLGT ELRDAVCIFY LVLRALDTVE

81 DDTSIPTDVK VPILIAFHKH IYDPEWHFSC GTKEYKVLMD

121 QIHHLSTAFL ELGKSYQEAI EDITKKMGAG MAKFICKEVE

161 TVDDYDEYCH YVAGLVGLGL SKLFDASGFE DLAPDDLSNS

201 MGLFLQKTNI IRDYLEDINE IPKSRMFWPR QIWSKYVNKL

241 EDLKYEENSV KAVQCLNDMV TNALIHMDDC LKYMSALRDP

281 AIFRFCAIPQ IMAIGTLALC YNNVEVFRGV VKMRRGLTAK

321 VIDRTRTMAD VYRAFFDFSC MMKSKVDRND PNAEKTLNRL

361 EAVQKTCKES GLLN
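The relationship between the full-length Euphorbia lathyris sequence (SEQ ID NO:57) and its truncated form (SEQ ID NO:59) can be illustrated with a short Python sketch. The helper names `parse_numbered_block` and `truncate_c_terminus` are hypothetical and are introduced only for this illustration; the sequence text is copied from the SEQ ID NO:57 listing above, and the position labels and spaces are treated as layout only.

```python
import re

# SEQ ID NO:57 as printed above; labels and spaces are layout, not sequence.
SEQ_ID_NO_57_BLOCK = """
1 MGSLGAILKH PDDFYPLLKL KMAAKHAEKQ IPAQPHWGFC
41 YSMLHKVSRS FSLVIQQLGT ELRDAVCIFY LVLRALDTVE
81 DDTSIPTDVK VPILIAFHKH IYDPEWHFSC GTKEYKVLMD
121 QIHHLSTAFL ELGKSYQEAI EDITKKMGAG MAKFICKEVE
161 TVDDYDEYCH YVAGLVGLGL SKLFDASGFE DLAPDDLSNS
201 MGLFLQKTNI IRDYLEDINE IPKSRMFWPR QIWSKYVNKL
241 EDLKYEENSV KAVQCLNDMV TNALIHMDDC LKYMSALRDP
281 AIFRFCAIPQ IMAIGTLALC YNNVEVFRGV VKMRRGLTAK
321 VIDRTRTMAD VYRAFFDFSC MMKSKVDRND PNAEKTLNRL
361 EAVQKTCKES GLLNKRRSYI NESKPYNSTM VILLMIVLAI
401 ILAYLSKRAN
"""


def parse_numbered_block(block: str) -> str:
    """Drop position labels and whitespace from a printed sequence block."""
    return re.sub(r"[^A-Za-z]", "", block).upper()


def truncate_c_terminus(protein: str, n_removed: int) -> str:
    """Return the protein with n_removed residues cut from its C-terminus."""
    if not 0 <= n_removed < len(protein):
        raise ValueError("n_removed must be between 0 and len(protein) - 1")
    return protein[: len(protein) - n_removed]


full = parse_numbered_block(SEQ_ID_NO_57_BLOCK)
truncated = truncate_c_terminus(full, 36)  # 36 residues, as for SEQ ID NO:59
print(len(full), len(truncated), truncated[-14:])
```

Under these assumptions the truncated sequence ends with the residues GLLN, in agreement with the last row of the SEQ ID NO:59 listing.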

In another example, a Ganoderma lucidum squalene synthase can be used, for example, with the following sequence (SEQ ID NO:61; NCBI accession no. ABF57213.1).

1 MGATSMLTLL LTHPFEFRVL IQYKLWHEPK RDITQVSEHP

41 TSGWDRPTMR RCWEFLDQTS RSFSGVIKEV EGDLARVICL

81 FYLVLRGLDT IEDDMTLPDE KKQPILRQFH KLAVKPGWTF

121 DECGPKEKDR QLLVEWTVVS EELNRLDACY RDIIIDIAEK

161 MQTGMADYAH KAATTNSIYI GTVDEYNLYC HYVAGLVGEG

201 LTRFWAASGK EAEWLGDQLE LTNAMGLMLQ KTNIIRDFRE

241 DAEERRFFWP REIWGRDAYG KAVGRANGFR EMHELYERGN

281 EKQALWVQSG MVVDVLGHAT DSLDYLRLLT KQSIFCFCAI

321 PQTMAMATLS LCFMNYDMFH NHIKIRRAEA ASLIMRSTNP

361 RDVAYIFRDY ARKMHARALP EDPSFLRLSV ACGKIEQWCE

401 RHYPSFVRLQ QVSGGGIVFD PSDARTKVVE AAQARDNELA

441 REKRLAELRD KTGKLERKLR WSQAPSS

A nucleotide sequence encoding the Ganoderma lucidum squalene synthase with SEQ ID NO:61 is shown below as SEQ ID NO:62 (NCBI accession no. DQ494674.1).

1 ATGGGCGCGA CGTCTATGCT CACCCTCCTC CTCACACACC

41 CCTTCGAGTT CCGCGTCCTC ATCCAATACA AGCTCTGGCA

81 CGAACCAAAA CGCGACATTA CCCAAGTCTC CGAGCACCCG

121 ACTTCAGGAT GGGACCGCCC TACTATGCGA CGGTGTTGGG

161 AGTTCCTTGA CCAGACCAGC CGGAGTTTCT CTGGGGTCAT

201 CAAGGAAGTG GAGGGTGATT TAGCAAGAGT GATCTGCTTA

241 TTCTACCTGG TGCTACGAGG CCTGGACACG ATCGAAGATG

281 ACATGACGCT TCCTGACGAG AAAAAACAAC CCATACTCCG

321 ACAATTCCAC AAACTCGCCG TGAAGCCCGG TTGGACATTC

361 GACGAGTGTG GACCCAAAGA AAAGGACAGG CAACTCCTCG

401 TCGAGTGGAC AGTTGTCAGC GAAGAGCTCA ACCGTCTCGA

441 CGCATGCTAC CGCGATATTA TTATCGACAT TGCGGAAAAG

481 ATGCAGACCG GGATGGCCGA CTACGCGCAT AAAGCAGCGA

521 CCACGAATTC GATTTACATC GGAACCGTCG ACGAGTACAA

561 CCTCTACTGC CACTACGTCG CCGGCCTCGT CGGCGAGGGC

601 CTCACGCGCT TCTGGGCCGC GTCCGGCAAG GAGGCGGAAT

641 GGCTGGGGGA CCAGCTCGAG CTGACGAACG CGATGGGCCT

681 CATGCTGCAG AAGACGAACA TTATCCGTGA CTTCCGCGAG

721 GACGCCGAGG AGCGCCGCTT CTTCTGGCCG CGCGAGATCT

761 GGGGGCGCGA CGCATACGGC AAGGCCGTCG GCCGCGCGAA

801 CGGGTTCCGC GAGATGCACG AGCTGTACGA GCGGGGCAAC

841 GAGAAGCAGG CGCTGTGGGT GCAGAGCGGG ATGGTCGTTG

881 ACGTGCTCGG GCACGCTACA GACTCGCTCG ACTATCTCCG

921 CCTACTCACG AAGCAGAGCA TCTTCTGCTT CTGTGCGATC

961 CCACAAACGA TGGCCATGGC CACCCTCAGC TTGTGCTTCA

1001 TGAACTACGA CATGTTCCAC AACCATATCA AGATCCGCAG

1041 GGCTGAGGCT GCCTCGCTTA TTATGCGGTC AACGAACCCC

1081 CGCGACGTCG CATACATTTT CCGCGACTAC GCGCGCAAGA

1121 TGCACGCCCG CGCGCTGCCC GAGGACCCCT CCTTCCTCCG

1161 CCTCTCCGTC GCGTGCGGCA AGATCGAGCA GTGGTGCGAG

1201 CGCCACTACC CCTCCTTTGT CCGCCTCCAG CAGGTCTCGG

1241 GTGGGGGCAT CGTGTTCGAC CCGAGCGACG CGCGCACCAA

1281 GGTCGTCGAG GCCGCGCAGG CCCGCGACAA CGAGCTCGCG

1321 CGCGAGAAGC GCCTGGCCGA GCTCCGTGAC AAGACTGGAA

1361 AGCTTGAGCG CAAGCTGCGG TGGAGTCAAG CCCCATCGAG

1401 CTGA
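As an illustration of how a listed coding sequence relates to the encoded protein, the following Python sketch translates the first and last codons of SEQ ID NO:62 with the standard genetic code. The function name `translate` is hypothetical, and the two nucleotide fragments are copied from rows 1 and 1381-1404 of the listing above; the sketch assumes that the reading frame starts at position 1.

```python
import itertools

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()  # remove the layout spaces
    return "".join(
        CODON_TABLE[dna[i : i + 3]] for i in range(0, len(dna) - 2, 3)
    )


# First 30 nucleotides of SEQ ID NO:62 (row 1 of the listing).
n_terminal = translate("ATGGGCGCGA CGTCTATGCT CACCCTCCTC")
# Last 24 nucleotides of SEQ ID NO:62 (positions 1381-1404).
c_terminal = translate("TGGAGTCAAG CCCCATCGAG CTGA")
print(n_terminal, c_terminal)
```

The two translated fragments correspond to the first ten residues (MGATSMLTLL) and the last seven residues (WSQAPSS, followed by a stop codon) of SEQ ID NO:61.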

In some cases, the Ganoderma lucidum squalene synthase can have a C-terminal truncation, for example, of about 20-80 amino acids. Such a Ganoderma lucidum squalene synthase can, for example, have 61 amino acids truncated from the C-terminus, to have the following sequence (SEQ ID NO:63) (also called GlSQS CA61).

1 MGATSMLTLL LTHPFEFRVL IQYKLWHEPK RDITQVSEHP

In another example, a Ganoderma lucidum squalene synthase can, for example, have 30 amino acids truncated from the C-terminus, and the following sequence (SEQ ID NO:64) (also called GlSQS CA30).

In another example, a Mortierella alpina squalene synthase can be used, for example, with the following sequence (SEQ ID NO:65; NCBI accession no. ALA40031.1).

A nucleotide sequence encoding the Mortierella alpina squalene synthase with SEQ ID NO:65 is shown below as SEQ ID NO:66 (NCBI accession no. KT318395.1).

81 CCAACACGAC TACAGCAACG ATAAAACCAG GCAGCGCCTC

121 TACCACCACT TGAACATGAC CTCGCGTAGT TTCTCAGCGG

161 TCATCCAGGA TCTGGACGAG GAACTGAAGG ATGCGATTTG

201 CTTGTTCTAC CTCGTCCTTC GTGGACTCGA TACCATTGAG

241 GACGATATGA CGATTGATTT GGACACCAAG TTGCCATATC

281 TGAGGACGTT CCACGAAATC ATCTACCAGA AGGGATGGAC

321 CTTTACGAAG AATGGTCCTA ACGAAAAAGA CCGCCAGTTG

361 CTGGTTGAGT TTGACGCCAT CATCGAGGGA TTCTTGCAAC

401 TAAAGCCAGC GTATCAAACC ATCATTGCCG ACATCACTAA

441 ACGCATGGGC AATGGAATGG CTCACTACGC CACTGCAGGA

481 ATTCACGTTG AGACTAATGC TGATTATGAC GAATACTGCC

521 ATTACGTCGC GGGCCTTGTT GGTCTGGGAT TGAGCGAGAT

561 GTTCAGCGCC TGTGGATTTG AATCGCCTTT GGTAGCCGAG

601 AGAAAAGACC TCTCAAACTC GATGGGTCTG TTTCTCCAAA

641 AGACCAACAT CGCACGCGAT TATCTCGAGG ATCTGCGCGA

681 CAATCGCCGT TTCTGGCCAA AGGAGATCTG GGGCCAGTAT

721 GCGGAAACGA TGGAGGACCT AGTCAAGCCC GAGAACAAGG

761 AGAAGGCTCT GCAGTGTCTG AGCCACATGA TCGTCAACGC

801 CATGGAGCAC ATCCGAGATG TCCTCGAGTA CCTTAGTATG

841 ATCAAGAACC CGTCCTGCTT TAAGTTCTGT GCGATTCCCC

881 AGGTTATGGC CATGGCGACT TTGAACCTCC TCCACTCCAA

921 CTACAAGGTT TTTACGCACG AGAATATCAA AATCCGCAAG

961 GGCGAGACAG TGTGGCTGAT GAAGGAGTCA GACAGCATGG

1001 ACAAGGTGGC AGCCATCTTC CGACTTTATG CGCGCCAGAT

1041 CAACAACAAG TCAAACTCTC TGGACCCCCA CTTTGTTGAC

1081 ATCGGTGTCA TTTGCGGCGA GATTGAGCAG ATCTGTGTTG

1121 GAAGGTTCCC AGGATCCACG ATTGAGATGA AGCGCATGCA

1161 AGCTGGAGTG CTGGGCGGCA AAACCGGAAC CGTGCTTGCT

1201 GCAGCTGCGG CTGTTGCAGG AGCTGTTGTT ATCAACAATG

1241 CGCTCGCATA A

In some cases, the Mortierella alpina squalene synthase can have a C-terminal truncation, for example, of about 10-40 amino acids. Such a Mortierella alpina squalene synthase can, for example, have 37 amino acids truncated from the C-terminus, to have the following sequence (SEQ ID NO:67) (also called MaSQS CA37).

1 MASAILASLL HPSEVLALVQ YKLSPKTQHD YSNDKTRQRL

41 YHHLNMTSRS FSAVIQDLDE ELKDAICLFY LVLRGLDTIE

81 DDMTIDLDTK LPYLRTFHEI IYQKGWTFTK NGPNEKDRQL

121 LVEFDAIIEG FLQLKPAYQT IIADITKRMG NGMAHYATAG

161 IHVETNADYD EYCHYVAGLV GLGLSEMFSA CGFESPLVAE

201 RKDLSNSMGL FLQKTNIARD YLEDLRDNRR FWPKEIWGQY

241 AETMEDLVKP ENKEKALQCL SHMIVNAMEH IRDVLEYLSM

281 IKNPSCFKFC AIPQVMAMAT LNLLHSNYKV FTHENIKIRK

321 GETVWLMKES DSMDKVAAIF RLYARQINNK SNSLDPHFVD

361 IGVICGEIEQ ICVGRFPGS

In another example, a Mortierella alpina squalene synthase can, for example, have 17 amino acids truncated from the C-terminus, and the following sequence (SEQ ID NO:68) (also called MaSQS CA17).

Hence, a variety of native and modified squalene synthases can be used in the expression systems, cells, and methods described herein.

WRINKLED 1 (WRI1)

WRINKLED 1 (WRI1) is a member of the AP2/EREBP family of transcription factors and a master regulator of fatty acid biosynthesis in seeds. Because WRI1 is a transcription factor, it is generally expressed in the cytosol and not expressed as a fusion partner with a lipid droplet surface protein. However, ectopic production of WRI1 in vegetative tissues promotes fatty acid synthesis in plastids and, indirectly, triacylglycerol accumulation in lipid droplets.

As illustrated herein, increased WRI1 expression can increase the synthesis of proteins involved in oil synthesis. The data provided herein also show that co-expression of WRI1 with ectopic lipid biosynthesis enzymes and a lipid droplet associated protein can improve terpene and terpenoid production.

Plants can be generated as described herein to include WRINKLED 1 nucleic acids that encode WRINKLED transcription factors. Plants are especially desirable when the WRINKLED 1 nucleic acids are operably linked to control sequences capable of WRINKLED 1 expression in a multitude of plant tissues, or in selected tissues and during selected parts of the plant life cycle to optimize the synthesis of oil and terpenoids. Such control sequences are typically heterologous to the coding region of the WRINKLED 1 nucleic acids. One example of an amino acid sequence for a WRINKLED 1 (WRI1) sequence from Arabidopsis thaliana is available as accession number AAP80382.1 (GI:32364685) and is reproduced below as SEQ ID NO:69.

1 MKKRLTTSTC SSSPSSSVSS STTTSSPIQS EAPRPKRAKR

41 AKKSSPSGDK SHNPTSPAST RRSSIYRGVT RHRWTGRFEA

81 HLWDKSSWNS IQNKKGKQVY LGAYDSEEAA AHTYDLAALK

121 YWGPDTILNF PAETYTKELE EMQRVTKEEY LASLRRQSSG

161 FSRGVSKYRG VARHHHNGRW EARIGRVFGN KYLYLGTYNT

201 QEEAAAAYDM AAIEYRGANA VTNFDISNYI DRLKKKGVFP

241 FPVNQANHQE GILVEAKQEV ETREAKEEPR EEVKQQYVEE

281 PPQEEEEKEE EKAEQQEAEI VGYSEEAAVV NCCIDSSTIM

321 EMDRCGDNNE LAWNFCMMDT GFSPFLTDQN LANENPIEYP

361 ELFNELAFED NIDFMFDDGK HECLNLENLD CCVVGRESPP

401 SSSSPLSCLS TDSASSTTTT TTSVSCNYLV

A nucleic acid sequence for the above Arabidopsis thaliana WRI1 protein is available as accession number AY254038.2 (GI:51859605), and is reproduced below as SEQ ID NO:70.

1 AAACCACTCT GCTTCCTCTT CCTCTGAGAA ATCAAATCAC

41 TCACACTCCA AAAAAAAATC TAAACTTTCT CAGAGTTTAA

81 TGAAGAAGCG CTTAACCACT TCCACTTGTT CTTCTTCTCC

121 ATCTTCCTCT GTTTCTTCTT CTACTACTAC TTCCTCTCCT

161 ATTCAGTCGG AGGCTCCAAG GCCTAAACGA GCCAAAAGGG

201 CTAAGAAATC TTCTCCTTCT GGTGATAAAT CTCATAACCC

241 GACAAGCCCT GCTTCTACCC GACGCAGCTC TATCTACAGA

281 GGAGTCACTA GACATAGATG GACTGGGAGA TTCGAGGCTC

321 ATCTTTGGGA CAAAAGCTCT TGGAATTCGA TTCAGAACAA

361 GAAAGGCAAA CAAGTTTATC TGGGAGCATA TGACAGTGAA

401 GAAGCAGCAG CACATACGTA CGATCTGGCT GCTCTCAAGT

441 ACTGGGGACC CGACACCATC TTGAATTTTC CGGCAGAGAC

481 GTACACAAAG GAATTGGAAG AAATGCAGAG AGTGACAAAG

521 GAAGAATATT TGGCTTCTCT CCGCCGCCAG AGCAGTGGTT

561 TCTCCAGAGG CGTCTCTAAA TATCGCGGCG TCGCTAGGCA

601 TCACCACAAC GGAAGATGGG AGGCTCGGAT CGGAAGAGTG

641 TTTGGGAACA AGTACTTGTA CCTCGGCACC TATAATACGC

681 AGGAGGAAGC TGCTGCAGCA TATGACATGG CTGCGATTGA

721 GTATCGAGGC GCAAACGCGG TTACTAATTT CGACATTAGT

761 AATTACATTG ACCGGTTAAA GAAGAAAGGT GTTTTCCCGT

801 TCCCTGTGAA CCAAGCTAAC CATCAAGAGG GTATTCTTGT

841 TGAAGCCAAA CAAGAAGTTG AAACGAGAGA AGCGAAGGAA

881 GAGCCTAGAG AAGAAGTGAA ACAACAGTAC GTGGAAGAAC

921 CACCGCAAGA AGAAGAAGAG AAGGAAGAAG AGAAAGCAGA

961 GCAACAAGAA GCAGAGATTG TAGGATATTC AGAAGAAGCA

1001 GCAGTGGTCA ATTGCTGCAT AGACTCTTCA ACCATAATGG

1041 AAATGGATCG TTGTGGGGAC AACAATGAGC TGGCTTGGAA

1081 CTTCTGTATG ATGGATACAG GGTTTTCTCC GTTTTTGACT

1121 GATCAGAATC TCGCGAATGA GAATCCCATA GAGTATCCGG

1161 AGCTATTCAA TGAGTTAGCA TTTGAGGACA ACATCGACTT

1201 CATGTTCGAT GATGGGAAGC ACGAGTGCTT GAACTTGGAA

1241 AATCTGGATT GTTGCGTGGT GGGAAGAGAG AGCCCACCCT

1281 CTTCTTCTTC ACCATTGTCT TGCTTATCTA CTGACTCTGC

1321 TTCATCAACA ACAACAACAA CAACCTCGGT TTCTTGTAAC

1361 TATTTGGTCT GAGAGAGAGA GCTTTGCCTT CTAGTTTGAA

1401 TTTCTATTTC TTCCGCTTCT TCTTCTTTTT TTTCTTTTGT

1441 TGGGTTCTGC TTAGGGTTTG TATTTCAGTT TCAGGGCTTG

1481 TTCGTTGGTT CTGAATAATC AATGTCTTTG CCCCTTTTCT

1521 AATGGGTACC TGAAGGGCGA

Yields of triacylglycerol and terpenoids can further be increased by removal of an intrinsically disordered C-terminal region of Arabidopsis thaliana WRI1. For example, use of a truncated WRI1 protein with amino acids 1-397 (AtWRI1 (1-397)) can increase the WRI1 protein stability and increase the amounts of oils and terpenoids produced by plants and plant cells.

The A. thaliana WRINKLED 1 (AtWRI1 1-397; SEQ ID NO:29) amino acid sequence is shown below.

1 MKKRLTTSTC SSSPSSSVSS STTTSSPIQS EAPRPKRAKR

41 AKKSSPSGDK SHNPTSPAST RRSSIYRGVT RHRWTGRFEA

81 HLWDKSSWNS IQNKKGKQVY LGAYDSEEAA AHTYDLAALK

121 YWGPDTILNF PAETYTKELE EMQRVTKEEY LASLRRQSSG

161 FSRGVSKYRG VARHHHNGRW EARIGRVFGN KYLYLGTYNT

201 QEEAAAAYDM AAIEYRGANA VTNFDISNYI DRLKKKGVFP

241 FPVNQANHQE GILVEAKQEV ETREAKEEPR EEVKQQYVEE

281 PPQEEEEKEE EKAEQQEAEI VGYSEEAAVV NCCIDSSTIM

321 EMDRCGDNNE LAWNFCMMDT GFSPFLTDQN LANENPIEYP

361 ELFNELAFED NIDFMFDDGK HECLNLENLD CCVVGRESPP

401 SSSSPLSCLS TDSASSTTTT TTSVSCNYLV

The A. thaliana WRINKLED 1 (AtWRI1 1-397; SEQ ID NO:30) nucleotide sequence is shown below.

1 AAACCACTCT GCTTCCTCTT CCTCTGAGAA ATCAAATCAC

41 TCACACTCCA AAAAAAAATC TAAACTTTCT CAGAGTTTAA

81 TGAAGAAGCG CTTAACCACT TCCACTTGTT CTTCTTCTCC

121 ATCTTCCTCT GTTTCTTCTT CTACTACTAC TTCCTCTCCT

161 ATTCAGTCGG AGGCTCCAAG GCCTAAACGA GCCAAAAGGG

201 CTAAGAAATC TTCTCCTTCT GGTGATAAAT CTCATAACCC

241 GACAAGCCCT GCTTCTACCC GACGCAGCTC TATCTACAGA

281 GGAGTCACTA GACATAGATG GACTGGGAGA TTCGAGGCTC

321 ATCTTTGGGA CAAAAGCTCT TGGAATTCGA TTCAGAACAA

361 GAAAGGCAAA CAAGTTTATC TGGGAGCATA TGACAGTGAA

401 GAAGCAGCAG CACATACGTA CGATCTGGCT GCTCTCAAGT

441 ACTGGGGACC CGACACCATC TTGAATTTTC CGGCAGAGAC

481 GTACACAAAG GAATTGGAAG AAATGCAGAG AGTGACAAAG

521 GAAGAATATT TGGCTTCTCT CCGCCGCCAG AGCAGTGGTT

561 TCTCCAGAGG CGTCTCTAAA TATCGCGGCG TCGCTAGGCA

601 TCACCACAAC GGAAGATGGG AGGCTCGGAT CGGAAGAGTG

641 TTTGGGAACA AGTACTTGTA CCTCGGCACC TATAATACGC

681 AGGAGGAAGC TGCTGCAGCA TATGACATGG CTGCGATTGA

721 GTATCGAGGC GCAAACGCGG TTACTAATTT CGACATTAGT

761 AATTACATTG ACCGGTTAAA GAAGAAAGGT GTTTTCCCGT

801 TCCCTGTGAA CCAAGCTAAC CATCAAGAGG GTATTCTTGT

841 TGAAGCCAAA CAAGAAGTTG AAACGAGAGA AGCGAAGGAA

881 GAGCCTAGAG AAGAAGTGAA ACAACAGTAC GTGGAAGAAC

921 CACCGCAAGA AGAAGAAGAG AAGGAAGAAG AGAAAGCAGA

961 GCAACAAGAA GCAGAGATTG TAGGATATTC AGAAGAAGCA

1001 GCAGTGGTCA ATTGCTGCAT AGACTCTTCA ACCATAATGG

1041 AAATGGATCG TTGTGGGGAC AACAATGAGC TGGCTTGGAA

1081 CTTCTGTATG ATGGATACAG GGTTTTCTCC GTTTTTGACT

1121 GATCAGAATC TCGCGAATGA GAATCCCATA GAGTATCCGG

1161 AGCTATTCAA TGAGTTAGCA TTTGAGGACA ACATCGACTT

1201 CATGTTCGAT GATGGGAAGC ACGAGTGCTT GAACTTGGAA

1241 AATCTGGATT GTTGCGTGGT GGGAAGAGAG AGCCCACCCT

1281 CTTCTTCTTC ACCATTGTCT TGCTTATCTA CTGACTCTGC

1321 TTCATCAACA ACAACAACAA CAACCTCGGT TTCTTGTAAC

1361 TATTTGGTCT GAGAGAGAGA GCTTTGCCTT CTAGTTTGAA

1401 TTTCTATTTC TTCCGCTTCT TCTTCTTTTT TTTCTTTTGT

1441 TGGGTTCTGC TTAGGGTTTG TATTTCAGTT TCAGGGCTTG

1481 TTCGTTGGTT CTGAATAATC AATGTCTTTG CCCCTTTTCT

1521 AATGGGTACC TGAAGGGCGA

Other types of WRI1 proteins (e.g., with different sequences) can also be used, such as any of the WRI1 proteins and sequences therefor that are described hereinbelow and in published US Patent Application US 2017/0002371 (which is incorporated by reference herein in its entirety).

For example, the WRI1 protein has a PEST domain, which has an amino acid sequence enriched in proline (P), glutamic acid (E), serine (S), and threonine (T) and which is associated with intrinsically disordered regions (IDRs). Removal of the C-terminal PEST domain from WRI1 or use of mutations in such C-terminal PEST domains results in more stable WRI1 transcription factors and increased oil biosynthesis by plants expressing such deleted or mutated WRINKLED transcription factors.

The Arabidopsis thaliana protein with SEQ ID NO:69 can have C-terminal deletions or mutations, for example in the following PEST sequence (SEQ ID NO:71).

396 RESPP SSSSPLSCLS TDSASSTTTT TTSVSCNYLV

For example, expression of a C-terminally truncated Arabidopsis thaliana WRI1 protein or an Arabidopsis thaliana WRI1 protein with at least four mutations at any of positions 398, 401, 402, 407, 415, 416, 420, 421, 422, and/or 423 increases the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, a mutant WRI1 protein can be used in the systems and methods described herein that includes a substitution, insertion, or deletion in any of the X residues of the following sequence (SEQ ID NO:72):

396 REXPP XXSSPLXCLS TDSAXXTTTX XXXVSCNYLV

For example, at least four of the X residues in the SEQ ID NO:72 sequence can be a substitution, insertion, or deletion compared to the wild type sequence (SEQ ID NO:71). The X residues are not acidic amino acids; for example, the X residues are not aspartic acid or glutamic acid. However, the X residue can be a small amino acid or a hydrophobic amino acid. For example, the X residues can each separately be alanine, glycine, valine, leucine, isoleucine, methionine, or any mixture thereof. As illustrated herein, WRI1 proteins with an alanine instead of a serine or a threonine at each of positions 398, 401, 402, and 407 have increased stability and, when expressed in plant cells, the cells produce more triacylglycerols than do wild type plants that do not express such a mutant WRI1 protein.
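The alanine substitutions at positions 398, 401, 402, and 407 can be illustrated with a short Python sketch that operates on the SEQ ID NO:71 fragment (residues 396-430 of SEQ ID NO:69). The names `substitute` and `count_changes` are hypothetical helpers introduced only for this illustration, and the sketch covers substitutions only, not insertions or deletions.

```python
# SEQ ID NO:71 with layout spaces removed; its first residue is position 396.
WILD_TYPE = "RESPPSSSSPLSCLSTDSASSTTTTTTSVSCNYLV"
FIRST_POSITION = 396
# Positions shown as X in SEQ ID NO:72.
VARIABLE_POSITIONS = (398, 401, 402, 407, 415, 416, 420, 421, 422, 423)
# Replacement residues named in the text: A, G, V, L, I, M (no D or E).
ALLOWED_RESIDUES = set("AGVLIM")


def substitute(fragment: str, positions, new_residue: str = "A") -> str:
    """Replace the residues at the given full-length positions."""
    if new_residue not in ALLOWED_RESIDUES:
        raise ValueError(f"{new_residue!r} is not an allowed replacement")
    residues = list(fragment)
    for position in positions:
        if position not in VARIABLE_POSITIONS:
            raise ValueError(f"position {position} is not an X position")
        residues[position - FIRST_POSITION] = new_residue
    return "".join(residues)


def count_changes(wild_type: str, mutant: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(wild_type, mutant))


mutant = substitute(WILD_TYPE, (398, 401, 402, 407))
print(mutant, count_changes(WILD_TYPE, mutant))
```

With these assumptions the mutant fragment differs from the wild type sequence at four positions, which meets the "at least four of the X residues" condition stated for SEQ ID NO:72.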

Another aspect of the invention is a mutant WRI1 protein with a truncation at the C terminus of at least 5, or at least 7, or at least 10, or at least 13, or at least 15, or at least 17, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 45 amino acids. For example, such deletions can be within the SEQ ID NO:50 portion of the WRI1 protein. Such mutant WRI1 proteins can be expressed in plant tissues to increase the oil / fatty acid / TAG content of those tissues.

Other types of WRI1 proteins also have utility for increasing the oil / fatty acid / TAG content of lipid droplets within plant tissues.

For example, an amino acid sequence for a WRI1 sequence from Brassica napus is available as accession number ADO16346.1 (GI:308193634). This Brassica napus WRINKLED 1 sequence is reproduced below as SEQ ID NO:73.

1 MKRPLTTSPS TSSSTSSSAC ILPTQPETPR PKRAKRAKKS

41 SIPTDVKPQN PTSPASTRRS SIYRGVTRHR WTGRYEAHLW

81 DKSSWNSIQN KKGKQVYLGA YDSEEAAAHT YDLAALKYWG

121 PDTILNFPAE TYTKELEEMQ RCTKEEYLAS LRRQSSGFSR

161 GVSKYRGVAR HHHNGRWEAR IGRVFGNKYL YLGTYNTQEE

201 AAAAYDMAAI EYRGANAVTN FDISNYIDRL KKKGVFPFPV

241 SQANHQEAVL AEAKQEVEAK EEPTEEVKQC VEKEEPQEAK

281 EEKTEKKQQQ QEVEEAVVTC CIDSSESNEL AWDFCMMDSG

321 FAPFLTDSNL SSENPIEYPE LFNEMGFEDN IDFMFEEGKQ

361 DCLSLENLDC CDGVVVVGRE SPTSLSSSPL SCLSTDSASS

401 TTTTTITSVS CNYSV

A nucleic acid sequence for the above Brassica napus WRI1 protein is available as accession number HM370542.1 (GI:308193633), and is reproduced below as SEQ ID NO:74.

1 ATGAAGAGAC CCTTAACCAC TTCTCCTTCT ACCTCCTCTT

41 CTACTTCTTC TTCGGCTTGT ATACTTCCGA CTCAACCAGA

81 GACTCCAAGG CCCAAACGAG CCAAAAGGGC TAAGAAATCT

121 TCTATTCCTA CTGATGTTAA ACCACAGAAT CCCACCAGTC

161 CTGCCTCCAC CAGACGCAGC TCTATCTACA GAGGAGTCAC

201 TAGACATAGA TGGACAGGGA GATACGAGGC TCATCTATGG

241 GACAAAAGCT CGTGGAATTC GATTCAGAAC AAGAAAGGCA

281 AACAAGTTTA TCTGGGAGCA TATGACAGCG AGGAAGCAGC

321 AGCGCATACG TACGATCTAG CTGCTCTCAA GTACTGGGGT

361 CCCGACACCA TCTTGAACTT TCCGGCTGAG ACGTACACAA

401 AGGAGTTGGA GGAGATGCAG AGATGTACAA AGGAAGAGTA

441 TTTGGCTTCT CTCCGCCGCC AGAGCAGTGG TTTCTCTAGA

481 GGCGTCTCTA AATATCGCGG CGTCGCCAGG CATCACCATA

521 ACGGAAGATG GGAAGCTAGG ATTGGAAGGG TGTTTGGAAA

561 CAAGTACTTG TACCTCGGCA CTTATAATAC GCAGGAGGAA

601 GCTGCAGCTG CATATGACAT GGCGGCTATA GAGTACAGAG

641 GCGCAAACGC AGTGACCAAC TTCGACATTA GTAACTACAT

681 CGACCGGTTA AAGAAAAAAG GTGTCTTCCC ATTCCCTGTG

721 AGCCAAGCCA ATCATCAAGA AGCTGTTCTT GCTGAAGCCA

761 AACAAGAAGT GGAAGCTAAA GAAGAGCCTA CAGAAGAAGT

801 GAAGCAGTGT GTCGAAAAAG AAGAACCGCA AGAAGCTAAA

841 GAAGAGAAGA CTGAGAAAAA ACAACAACAA CAAGAAGTGG

881 AGGAGGCGGT GGTCACTTGC TGCATTGATT CTTCGGAGAG

921 CAATGAGCTG GCTTGGGACT TCTGTATGAT GGATTCAGGG

961 TTTGCTCCGT TTTTGACGGA TTCAAATCTC TCGAGTGAGA

1001 ATCCCATTGA GTATCCTGAG CTTTTCAATG AGATGGGGTT

1041 TGAGGATAAC ATTGACTTCA TGTTCGAGGA AGGGAAGCAA

1081 GACTGCTTGA GCTTGGAGAA TCTGGATTGT TGCGATGGTG

1121 TTGTTGTGGT GGGAAGAGAG AGCCCAACTT CATTGTCGTC

1161 TTCACCGTTG TCTTGCTTGT CTACTGACTC TGCTTCATCA

1201 ACAACAACAA CAACAATAAC CTCTGTTTCT TGTAACTATT

1241 CTGTCTGA

Expression of a C-terminally truncated Brassica napus WRI1 protein or a Brassica napus WRI1 protein with a mutation (e.g., substitution, insertion, or deletion) at four or more of positions 381, 383, 384, 386, 387, 388, 391, 399, 400, 401, 402, 403, 404, 405, 407, or 408 can increase the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, a mutant WRI1 protein can be used in the systems and methods described herein that includes a mutation (substitution, insertion, or deletion) in the following sequence (SEQ ID NO:75):

379 RE SPTSLSSSPL SCLSTDSASS TTTTTITSVS CNYSV

For example, expression of a C-terminally truncated Brassica napus WRI1 protein or a Brassica napus WRI1 protein with at least four mutations (substitution, insertion, or deletion) at any of positions 381, 383, 384, 386, 387, 388, 391, 399, 400, 401, 402, 403, 404, 405, 407, and/or 408 can increase the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, a mutant WRI1 protein can be used that includes the following sequence (SEQ ID NO:76):

RE XPXXLXXXPL XCLSTDSAXX XXXXXIXXVS CNYSV

where at least four of the X residues in the SEQ ID NO:76 sequence are a substitution, insertion, or deletion compared to the wild type sequence (SEQ ID NO:75). The X residues are not acidic amino acids such as aspartic acid or glutamic acid. However, the X residue can be a small amino acid or a hydrophobic amino acid. For example, the X residues can each separately be alanine, glycine, valine, leucine, isoleucine, methionine, or any mixture thereof.

Another aspect of the invention is a mutant WRI1 protein with a truncation at the C terminus of the SEQ ID NO:69 (or the SEQ ID NO:73) sequence of at least 4, or at least 5, or at least 7, or at least 10, or at least 13, or at least 15, or at least 17, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 45 amino acids. Such mutant WRI1 proteins can be expressed in plant tissues to increase the oil / fatty acid / TAG content of those tissues.

Another example of an amino acid sequence for a WRINKLED 1 (WRI1) sequence from Brassica napus is available as accession number ABD16282.1 (GI:87042570), and is reproduced below as SEQ ID NO:77.

1 MKRPLTTSPS SSSSTSSSAC ILPTQSETPR PKRAKRAKKS

41 SLRSDVKPQN PTSPASTRRS SIYRGVTRHR WTGRYEAHLW

81 DKSSWNSIQN KKGKQVYLGA YDSEEAAAHT YDLAALKYWG

121 PNTILNFPVE TYTKELEEMQ RCTKEEYLAS LRRQSSGFSR

161 GVSKYRGVAR HHHNGRWEAR IGRVFGNKYL YLGTYNTQEE

201 AAAAYDMAAI EYRGANAVTN FDIGNYIDRL KKKGVFPFPV

241 SQANHQEAVL AETKQEVEAK EEPTEEVKQC VEKEEAKEEK

281 TEKKQQQEVE EAVITCCIDS SESNELAWDF CMMDSGFAPF

321 LTDSNLSSEN PIEYPELFNE MGFEDNIDFM FEEGKQDCLS

361 LENLDCCDGV VVVGRESPTS LSSSPLSCLS TDSASSTTTT

401 ATTVTSVSWN YSV

A nucleic acid sequence for the above Brassica napus WRI1 protein is available as accession number DQ370141.1 (GI:87042569), and is reproduced below as SEQ ID NO:78.

1 ATGAAGAGAC CCTTAACCAC TTCTCCTTCT TCCTCCTCTT

41 CTACTTCTTC TTCGGCCTGT ATACTTCCGA CTCAATCAGA

81 GACTCCAAGG CCCAAACGAG CCAAAAGGGC TAAGAAATCT

121 TCTCTGCGTT CTGATGTTAA ACCACAGAAT CCCACCAGTC

161 CTGCCTCCAC CAGACGCAGC TCTATCTACA GAGGAGTCAC

201 TAGACATAGA TGGACAGGGA GATACGAAGC TCATCTATGG

241 GACAAAAGCT CGTGGAATTC GATTCAGAAC AAGAAAGGCA

281 AACAAGTTTA TCTGGGAGCA TATGACAGCG AGGAAGCAGC

321 AGCACATACG TACGATCTAG CTGCTCTCAA GTACTGGGGT

361 CCCAACACCA TCTTGAACTT TCCGGTTGAG ACGTACACAA

401 AGGAGCTGGA GGAGATGCAG AGATGTACAA AGGAAGAGTA

441 TTTGGCTTCT CTCCGCCGCC AGAGCAGTGG TTTCTCTAGA

481 GGCGTCTCTA AATATCGCGG CGTCGCCAGG CATCACCATA

521 ATGGAAGATG GGAAGCTCGG ATTGGAAGGG TGTTTGGAAA

561 CAAGTACTTG TACCTCGGCA CCTATAATAC GCAGGAGGAA

601 GCTGCAGCTG CATATGACAT GGCGGCTATA GAGTACAGAG

641 GTGCAAACGC AGTGACCAAC TTCGACATTG GTAACTACAT

681 CGACCGGTTA AAGAAAAAAG GTGTCTTCCC GTTCCCCGTG

721 AGCCAAGCTA ATCATCAAGA AGCTGTTCTT GCTGAAACCA

761 AACAAGAAGT GGAAGCTAAA GAAGAGCCTA CAGAAGAAGT

801 GAAGCAGTGT GTCGAAAAAG AAGAAGCTAA AGAAGAGAAG

841 ACTGAGAAAA AACAACAACA AGAAGTGGAG GAGGCGGTGA

881 TCACTTGCTG CATTGATTCT TCAGAGAGCA ATGAGCTGGC

921 TTGGGACTTC TGTATGATGG ATTCAGGGTT TGCTCCGTTT

961 TTGACTGATT CAAATCTCTC GAGTGAGAAT CCCATTGAGT

1001 ATCCTGAGCT TTTCAATGAG ATGGGTTTTG AGGATAACAT

1041 TGACTTCATG TTCGAGGAAG GGAAGCAAGA CTGCTTGAGC

1081 TTGGAGAATC TTGATTGTTG CGATGGTGTT GTTGTGGTGG

1121 GAAGAGAGAG CCCAACTTCA TTGTCGTCTT CTCCGTTGTC

1161 CTGCTTGTCT ACTGACTCTG CTTCATCAAC AACAACAACA

1201 GCAACAACAG TAACCTCTGT TTCTTGGAAC TATTCTGTCT

1241 GA

Expression of a C-terminally truncated Brassica napus WRI1 protein or a Brassica napus WRI1 protein with a mutation at four or more of positions 381, 383, 384, 385, 387, 388, 391, 394, 399, 400, 401, 402, 403, 404, 406, 407, 409, and/or 410 can increase the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, a mutant WRI1 protein can be used that includes a mutation (substitution, insertion, or deletion) in the following sequence (SEQ ID NO:79):

For example, expression of a C-terminally truncated Brassica napus WRI1 protein or a Brassica napus WRI1 protein with at least four mutations at any of positions 381, 383, 384, 385, 387, 388, 391, 394, 399, 400, 401, 402, 403, 404, 406, 407, 409, and/or 410 can increase the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, a mutant WRI1 protein can be used that includes the following sequence (SEQ ID NO: 80):

379 RE XPXXLXSSPL XCLXTDSAXX XXXXAXXVXX VSWN

where at least four of the X residues in the SEQ ID NO:80 sequence are a substitution, insertion, or deletion compared to the wild type sequence (SEQ ID NO:79). The X residues are not acidic amino acids such as aspartic acid or glutamic acid. However, the X residue can be a small amino acid or a hydrophobic amino acid. For example, the X residues can each separately be alanine, glycine, valine, leucine, isoleucine, methionine, or any mixture thereof.

In some cases, a mutant WRI1 protein can be used in the systems and methods that has a truncation at the C terminus of the SEQ ID NO:73 (or from the SEQ ID NO:77) sequence of at least 4, or at least 5, or at least 7, or at least 10, or at least 13, or at least 15, or at least 17, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 45 amino acids. Such mutant WRI1 proteins can be expressed in plant tissues to increase the oil / fatty acid / TAG content of those tissues.

Other Brassica napus amino acid and cDNA WRINKLED 1 (WRI1) sequences are available as accession numbers ABD72476.1 (GI:89357185) and DQ402050.1 (GI:89357184), respectively.

An example of an amino acid sequence for a WRINKLED 1 (WRI1) sequence from Zea mays is available as accession number ACG32367.1 (GI:195621074) and reproduced below as SEQ ID NO:81.

1 MERSQRQSPP PPSPSSSSSS VSADTVLVPP GKRRRAATAK

41 AGAEPNKRIR KDPAAAAAGK RSSVYRGVTR HRWTGRFEAH

81 LWDKHCLAAL HNKKKGRQVY LGAYDSEEAA ARAYDLAALK

121 YWGPETLLNF PVEDYSSEMP EMEAVSREEY LASLRRRSSG

161 FSRGVSKYRG VARHHHNGRW EARIGRVFGN KYLYLGTFDT

201 QEEAAKAYDL AAIEYRGVNA VTNFDISCYL DHPLFLAQLQ

241 QEPQVVPALN QEPQPDQSET GTTEQEPESS EAKTPDGSAE

281 PDENAVPDDT AEPLSTVDDS IEEGLWSPCM DYELDTMSRP

321 NFGSSINLSE WFADADFDCN IGCLFDGCSA ADEGSKDGVG

361 LADFSLFEAG DVQLKDVLSD MEEGIQPPAM ISVCN

A nucleic acid sequence for the above Zea mays WRI1 protein sequence is available as accession number EU960249.1 (GI:195621073), and is reproduced below as SEQ ID NO:82.

1 CTCCCCCGCC TCGCCGCCAG TCAGATTCAC CACCGGCTCC

41 CCTGCACAAC CGCGTCCGCG CTGCACCACC ACCGTTCATC

81 GAGGAGGAGG GGGGACGGAG ACCACGGACA TGGAGAGATC

121 TCAACGGCAG TCTCCTCCGC CACCGTCGCC GTCCTCCTCC

161 TCGTCCTCCG TCTCCGCGGA CACCGTCCTC GTCCCTCCCG

201 GAAAGAGGCG GAGGGCGGCG ACGGCCAAGG CCGGCGCCGA

241 GCCTAATAAG AGGATCCGCA AGGACCCCGC CGCCGCCGCC

281 GCGGGGAAGA GGAGCTCCGT CTACAGGGGA GTCACCAGGC

321 ACAGGTGGAC GGGCAGGTTC GAGGCGCATC TCTGGGACAA

361 GCACTGCCTC GCCGCGCTCC ACAACAAGAA GAAAGGCAGG

401 CAAGTCTACC TGGGGGCGTA TGACAGCGAG GAGGCAGCTG

441 CTCGTGCCTA TGACCTCGCA GCTCTCAAGT ACTGGGGTCC

481 TGAGACTCTG CTCAACTTCC CTGTGGAGGA TTACTCCAGC

521 GAGATGCCGG AGATGGAGGC CGTTTCCCGG GAGGAGTACC

561 TGGCCTCCCT CCGCCGCAGG AGCAGCGGCT TCTCCAGGGG

601 CGTCTCCAAG TACAGAGGCG TCGCCAGGCA TCACCACAAC

641 GGGAGGTGGG AGGCACGGAT TGGGCGAGTC TTTGGGAACA

681 AGTACCTCTA CTTGGGAACA TTTGACACTC AAGAAGAGGC

721 AGCCAAGGCC TATGACCTTG CGGCCATTGA ATACCGTGGC

761 GTCAATGCTG TAACCAACTT CGACATCAGC TGCTACCTGG

801 ACCACCCGCT GTTCCTGGCA CAGCTCCAAC AGGAGCCACA

841 GGTGGTGCCG GCACTCAACC AAGAACCTCA ACCTGATCAG

881 AGCGAAACCG GAACTACAGA GCAAGAGCCG GAGTCAAGCG

921 AAGCCAAGAC ACCGGATGGC AGTGCAGAAC CCGATGAGAA

961 CGCGGTGCCT GACGACACCG CGGAGCCCCT CAGCACAGTC

1001 GACGACAGCA TCGAAGAGGG CTTGTGGAGC CCTTGCATGG

1041 ATTACGAGCT AGACACCATG TCGAGACCAA ACTTTGGCAG

1081 CTCAATCAAT CTGAGCGAGT GGTTCGCTGA CGCAGACTTC

1121 GACTGCAACA TCGGGTGCCT GTTCGATGGG TGTTCTGCGG

1161 CTGACGAAGG AAGCAAGGAT GGTGTAGGTC TGGCAGATTT

Expression of an internally deleted Zea mays WRI1 protein or a Zea mays WRI1 protein with a mutation at four or more of amino acid positions 358, 360, 362, 363, 369, 370, 374, 378, 395, 395, 400, 407, 416, 418, and/or 419 can increase the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, one aspect of the invention is a mutant WRI1 protein that includes a mutation (substitution, insertion, or deletion) in the following sequence (SEQ ID NO:83):

For example, expression of an internally deleted Zea mays WRI1 protein or a Zea mays WRI1 protein with a mutation at four or more of the following positions 358, 360, 362, 363, 369, 370, 374, 378, 395, 395, 400, 407, 416, 418, and/or 419 can increase the content of triacylglycerol in plant tissues. Hence, another aspect of the invention is a mutant WRI1 protein that includes a mutation (substitution, insertion, or deletion) in the following sequence (SEQ ID NO:84):

where at least four of the X residues in tire SEQ ID NO:84 sequence is a substitution, insertion, or deletion compared to the wild type sequence (SEQ 1D NO:83). The X residues are not acidic amino acids such as aspartic acid or glutamic acid. However, the X residue can be a small amino acid or a hydrophobic amino acid. For example, the X residues can each separately be alanine, glycine, valine, leucine, isoleucine, methionine, or any mixture thereof. A mutant WRI1 protein with a deletion within the SEQ ID NO:83 portion of the WRI1 protein of at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 10, or at least 13, or at least 15, or at least 17, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 45 amino acids. Such mutant WRI1 proteins can be expressed in plant tissues to increase the oil / fatty acid / TAG content of those tissues.

Another example of an amino acid sequence for a WRINKLED 1 (WRI1) sequence from Zea mays is available as accession number NP_001131733.1 (GI:212721372) and reproduced below as SEQ ID NO:85.

1 MTMERSQPQH QQSPPSPSSS SSCVSADTVL VPPGKRRRRA

41 ATAKANKRAR KDPSDPPPAA GKRSSVYRGV TRHRWTGRFE

81 AHLWDKHCLA ALHNKKKGRQ VYLGAYDGEE AAARAYDLAA

121 LKYWGPEALL NFPVEDYSSE MPEMEAASRE EYLASLRRRS

161 SGFSRGVSKY RGVARHHHNG RWEARIGRVL GNKYLYLGTF

201 DTQEEAAKAY DLAAIEYRGA NAVTNFDISC YLDHPLFLAQ

241 LQQEQPQVVP ALDQEPQADQ REPETTAQEP VSSQAKTPAD

281 DNAEPDDIAE PLITVDNSVE ESLWSPCMDY ELDTMSRSNF

321 GSSINLSEWF TDADFDSDLG CLFDGRSAVD GGSKGGVGVA

361 DFSLFEAGDG QLKDVLSDME EGIQPPTIIS VCN

A nucleic acid sequence for the above Zea mays WRI1 protein sequence is available as accession number NM_001138261.1 (GI:212721371), and is reproduced below as SEQ ID NO:86.

1 CGTTCATGCA TGACCATGGA GAGATCTCAA CCGCAGCACC

41 AGCAGTCTCC TCCGTCGCCG TCGTCCTCCT CGTCCTGCGT

81 CTCCGCGGAC ACCGTCCTCG TCCCTCCGGG AAAGAGGCGG

121 CGGAGGGCGG CGACAGCCAA GGCCAATAAG AGGGCCCGCA

161 AGGACCCCTC TGATCCTCCT CCCGCCGCCG GGAAGAGGAG

201 CTCCGTATAC AGAGGAGTCA CCAGGCACAG GTGGACGGGC

241 AGGTTCGAGG CGCATCTCTG GGACAAGCAC TGCCTCGCCG

281 CGCTCCACAA CAAGAAGAAA GGCAGGCAAG TCTATCTGGG

321 GGCGTACGAC GGCGAGGAGG CAGCGGCTCG TGCCTATGAC

361 CTTGCAGCTC TCAAGTACTG GGGTCCTGAG GCTCTGCTCA

401 ACTTCCCTGT GGAGGATTAC TCCAGCGAGA TGCCGGAGAT

441 GGAGGCAGCG TCCCGGGAGG AGTACCTGGC CTCCCTCCGC

481 CGCAGGAGCA GCGGCTTCTC CAGGGGGGTC TCCAAGTACA

521 GAGGCGTCGC CAGGCATCAC CACAACGGGA GATGGGAGGC

561 ACGGATCGGG CGAGTTTTAG GGAACAAGTA CCTCTACTTG

601 GGAACATTCG ACACTCAAGA AGAGGCAGCC AAGGCCTATG

641 ATCTTGCGGC CATCGAATAC CGAGGTGCCA ATGCTGTAAC

681 CAACTTCGAC ATCAGCTGCT ACCTGGACCA CCCACTGTTC

721 CTGGCGCAGC TCCAGCAGGA GCAGCCACAG GTGGTGCCAG

761 CGCTCGACCA AGAACCTCAG GCTGATCAGA GAGAACCTGA

801 AACCACAGCC CAAGAGCCTG TGTCAAGCCA AGCCAAGACA

841 CCGGCGGATG ACAATGCAGA GCCTGATGAC ATCGCGGAGC

881 CCCTCATCAC GGTCGACAAC AGCGTCGAGG AGAGCTTATG

921 GAGTCCTTGC ATGGATTATG AGCTAGACAC CATGTCGAGA

961 TCTAACTTTG GCAGCTCGAT CAACCTGAGC GAGTGGTTCA

1001 CTGACGCAGA CTTCGACAGC GACTTGGGAT GCCTGTTCGA

1041 CGGGCGCTCT GCAGTTGATG GAGGAAGCAA GGGTGGCGTA

1081 GGTGTGGCGG ATTTCAGTTT GTTTGAAGCA GGTGATGGTC

1121 AGCTGAAGGA TGTTCTTTCG GATATGGAAG AGGGGATACA

1161 ACCTCCAACG ATAATCAGTG TGTGCAATTG ATTCTGAGAC

1201 CTATGCGTGG CGTGCGACAA GTGTCCTGTC TTTGGGTATA

1241 CTTGGTTTGT CCAATGCCAC GGTGCCACTG CTGCGAGTCA

1281 GCTGAACTTC TTGTAGAAAG CACATGGCAG CTTGGCATTA

1321 GACAAGTGTG TTGGTGTTCC TTAATTCTTT GGATATGCTT

1361 TAGGCATTGA CTAACCTTAA GGGTTCGTCA CTGTCTCGCT

1401 TAGCTTAGAT TAGACTAATC ACATCCTTGA ATCTGAAGTA

1441 GTTGTGCAGT ATCACAGTTT CACATGGCAA TTCTGCCAAT

1481 GCAGCATAGA TTTGTTCGTT TGAACAGCTG TAACTGTAAC

1521 CCTATAGCTC CAGATTAAGG AACAGTTTGT TTTTCATCCA

1561 T
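By way of illustration only, the relationship between the nucleic acid listing (SEQ ID NO:86) and the protein listing (SEQ ID NO:85) can be checked by translating the start of the open reading frame. The sketch below uses the standard genetic code and assumes that the reading frame begins at the ATG at position 10 of SEQ ID NO:86; the function name `translate` is a hypothetical helper introduced for this example.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int = 0) -> str:
    """Translate DNA from a 0-based start index, stopping at the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 40 nucleotides of SEQ ID NO:86 as listed above, with spaces removed.
seq86_head = "CGTTCATGCATGACCATGGAGAGATCTCAACCGCAGCACC"

# Assumption: the open reading frame starts at 1-based position 10 (index 9).
peptide = translate(seq86_head, start=9)
print(peptide)
```

The translated residues can then be compared by eye with the first line of SEQ ID NO:85.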

Expression of an internally deleted Zea mays WRI1 protein or a Zea mays WRI1 protein with a mutation at four or more of positions 265, 266, 272, 273, 277, 294, 298, 302, 305, 314, and/or 316 can increase the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, a mutant WRI1 protein can be used in the systems and methods described herein that includes a mutation (substitution, insertion, or deletion) in the following sequence (SEQ ID NO:87):

261 REPETTAQEP VSSQAKTPAD

281 DNAEPDDIAE PLITVDNSVE ESLWSPCMDY ELDTMSR

For example, expression of an internally deleted Zea mays WRI1 protein or a Zea mays WRI1 protein with a mutation at four or more of positions 265, 266, 272, 273, 277, 294, 298, 302, 305, 314, and/or 316 can increase the content of triacylglycerol in plant tissues. Hence, a mutant WRI1 protein can be used that includes the following sequence (SEQ ID NO:88):

261 REPEXXAQEP VXXQAKXPAD

281 DNAEPDDIAE PLIXVDNXVE EXLWXPCMDY ELDXMXR

where at least four of the X residues in the SEQ ID NO:88 sequence are each a substitution, insertion, or deletion compared to the wild type sequence (SEQ ID NO:87). The X residues are not acidic amino acids such as aspartic acid or glutamic acid. However, the X residue can be a small amino acid or a hydrophobic amino acid. For example, the X residues can each separately be alanine, glycine, valine, leucine, isoleucine, methionine, or any mixture thereof.

Another aspect of the invention is a mutant WRI1 protein with a deletion within the SEQ ID NO:85 or SEQ ID NO:88 portion of the WRI1 protein of at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 10, or at least 13, or at least 15, or at least 17, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 45 amino acids.
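As a non-limiting sketch, the mapping from the numbered positions to the X residues of SEQ ID NO:88 can be reproduced from the wild type segment (SEQ ID NO:87). The helper `substitute` and the choice of alanine at the first four positions are assumptions made for this example; they are not a construct described in this application.

```python
# Wild type segment (SEQ ID NO:87): residues 261-317 of the Zea mays WRI1 protein.
SEGMENT_START = 261
WILD_TYPE = "REPETTAQEPVSSQAKTPAD" "DNAEPDDIAEPLITVDNSVEESLWSPCMDYELDTMSR"

# Positions listed in the text for this segment (1-based, full-length numbering).
POSITIONS = [265, 266, 272, 273, 277, 294, 298, 302, 305, 314, 316]


def substitute(segment: str, positions, residue: str, start: int = SEGMENT_START) -> str:
    """Return a copy of `segment` with `residue` placed at each numbered position."""
    residues = list(segment)
    for pos in positions:
        residues[pos - start] = residue
    return "".join(residues)


# Masking every listed position with X gives the SEQ ID NO:88 pattern.
masked = substitute(WILD_TYPE, POSITIONS, "X")

# Hypothetical variant: alanine at four of the listed positions.
alanine_variant = substitute(WILD_TYPE, POSITIONS[:4], "A")

print(masked)
print(alanine_variant)
```

In this segment, every listed position holds a serine or threonine in the wild type sequence, which can be confirmed by indexing `WILD_TYPE` at each position.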

An example of an amino acid sequence for a WRINKLED 1 (WRI1) sequence from Elaeis guineensis (oil palm) is available as accession number XP_010922928.1 (GI:743789536) and reproduced below as SEQ ID NO:89.

1 MTLMKNSPPS TPLPPISPSS SASPSSYAPL SSPNMIPLNK

41 CKKSKPKHKK AKNSDESSRR RSSIYRGVTR HRGTGRYEAH

81 LWDKHWQHPV QNKKGRQVYL GAFTDELDAA RAHDLAALKL

121 WGPETILNFP VEMYREEYKE MQTMSKEEVL ASVRRRSNGF

161 ARGTSKYRGV ARHHKNGRWE ARLSQDVGCK YIYLGTYATQ

201 EEAAQAYDLA ALVHKGPNIV TNFASSVYKH RLQPFMQLLV

241 KPETEPAQED LGVLQMEATE TIDQTMPNYD LPEISWTFDI

281 DHDLGAYPLL DVPIEDDQHD ILNDLNFEGN IEHLFEEFET

321 FGGNESGSDG FSASKGA

A nucleic acid sequence for the above Elaeis guineensis WRI1 protein sequence is available as accession number XM_010924626.1 (GI:743789535), and is reproduced below as SEQ ID NO:90.

1 AGAGAGAGAG AGATTCCAAC ACAGGGCAGC TGAGATTGAG

41 CACAAGGCGC CGTGGAAACC ACGAGTTCCA TTGGCAACAT

81 GGGAAACCTG GTGGCCAAGT GTAGAGCTCT CTCACACAAA

121 CCCATGCGGC CAACTTGCAG ACCCTCGAGT CATTTGGACT

161 CTTCCAAGCT CACCAGCCGT AGGGTTTTTT GACAAGAGGG

201 ACCTCCAGTA AACGTTAAAC AAACTCGCAG CTCCCACCTT

241 TGGATCCATT CCATCGCTTC AACGGTGGGT TAGAAGCCTC

281 CGCGCCAAAT GCACGAGTGC TCAACAGCAC GCTCCCCTAA

321 TTTTTCTCTC TCCACCTCCT CACTTCTCTA TATATAATCC

361 TCTCTTTGGT GAACCACCAT CAACCAAACC AACGGTATAG

401 TATACGTAGG AAATAATCCC TTTCTAGAAC ATGACTCTCA

441 TGAAGAAATC TCCTCCCTCT ACTCCTCTCC CACCAATATC

481 GCCTTCCTCT TCCGCTTCAC CATCCAGCTA TGCACCCCTT

521 TCTTCTCCTA ATATGATCCC TCTTAACAAG TGCAAGAAGT

561 CGAAGCCAAA ACATAAGAAA GCTAAGAACT CAGATGAAAG

601 CAGTAGGAGA AGAAGCTCTA TCTACAGAGG AGTCACGAGG

641 CACCGAGGGA CTGGGAGATA TGAAGCTCAC CTGTGGGACA

681 AGCACTGGCA GCATCCGGTC CAGAACAAGA AAGGCAGGCA

721 AGTTTACTTG GGAGCCTTTA CTGATGAGTT GGACGCAGCA

761 CGAGCTCATG ACTTGGCTGC CCTTAAGCTC TGGGGTCCAG

801 AGACAATTTT AAACTTCCCT GTGGAAATGT ATAGAGAAGA

841 GTACAAGGAG ATGCAAACCA TGTCAAAGGA AGAGGTGCTG

881 GCTTCGGTTA GGCGCAGGAG CAACGGCTTT GCCAGGGGTA

921 CCTCTAAGTA CCGTGGGGTG GCCAGGCATC ACAAAAACGG

961 CCGGTGGGAG GCCAGGCTTA GCCAGGACGT TGGCTGCAAG

1001 TACATCTACT TGGGAACATA CGCAACTCAA GAGGAGGCTG

1041 CCCAAGCTTA TGATTTAGCT GCTCTAGTAC ACAAAGGGCC

1081 AAATATAGTG ACCAACTTTG CTAGCAGTGT CTATAAGCAT

1121 CGCCTACAGC CATTCATGCA GCTATTAGTG AAGCCTGAGA

1161 CGGAGCCAGC ACAAGAAGAC CTGGGGGTTA TGCAAATGGA

1201 AGCAACCGAG ACAATCGATC AGACCATGCC AAATTACGAC

1241 CTGCCGGAGA TCTCATGGAC CTTCGACATA GACCATGACT

1281 TAGGTGCATA TCCTCTCCTT GATGTCCCAA TTGAGGATGA

1321 TCAACATGAC ATCTTGAATG ATCTCAATTT CGAGGGGAAC

1361 ATTGAGCACC TCTTTGAAGA GTTTGAGACC TTCGGAGGCA

1401 ATGAGAGTGG AAGTGATGGT TTCAGTGCAA GCAAAGGTGC

1441 CTAGCAGAGG AAAGTGGTTT GAAGATGGAG GACATGGCAT

1481 CTAAAGCGAA CTGAGCCTCC TGGCCTCTTC AAAGTAGTGT

1521 CTGCTTTTTA GAAATCTTGG TGGGTCGATT TGAGTTAGGA

1561 GCCCGATACT TCTATCAGGG GATATGTTTA GCTACAATTC

1601 TAGTTTTTTT TTCTTTTTTT TTTTTCAGCC GGAAGTCTGG

1641 TACTTCTGTT GAATATTATG ATGTGCTTCT TGCTTAGTTG

1681 TTCCTGTTCT TCTCCCTTTT AGAGTTCAGC ATATTTATGT

1721 TTTGATGTAA TGGGGAATGT TGGCAGACAG CTTGATATAT

1761 GGTTATTTCA TTCTCCATTA AA

Expression of an internally deleted Elaeis guineensis WRI1 protein or an Elaeis guineensis WRI1 protein with a mutation at four or more of the following positions 244, 259, 261, 265, 275, and/or 277 can increase the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, in some cases a mutant WRI1 protein is used that includes a mutation (e.g., a substitution, insertion, or deletion) in the following sequence (SEQ ID NO:91):

241 KPETEPAQED LGVLQMEATE TIDQTMPNYD LPEISWTFDI DH

For example, expression of an internally deleted Elaeis guineensis WRI1 protein or an Elaeis guineensis WRI1 protein with a mutation at four or more of positions 244, 259, 261, 265, 275, and/or 277 can increase the content of triacylglycerol in plant tissues. Hence, in some cases a mutant WRI1 protein can be used that includes the following sequence (SEQ ID NO:92):

241 KPEXEPAQED LGVLQMEAXE XIDQXMPNYD LPEIXWXFDI DH

where at least four of the X residues in the SEQ ID NO:92 sequence are each a substitution, insertion, or deletion compared to the wild type sequence (SEQ ID NO:91). The X residues are not acidic amino acids such as aspartic acid or glutamic acid. However, the X residue can be a small amino acid or a hydrophobic amino acid. For example, the X residues can each separately be alanine, glycine, valine, leucine, isoleucine, methionine, or any mixture thereof.

Another aspect of the invention is a mutant WRI1 protein with a deletion within the SEQ ID NO:89 or SEQ ID NO:91 portion of the WRI1 protein of at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 10, or at least 13, or at least 15, or at least 17, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 45 amino acids. Such mutant WRI1 proteins can be expressed in plant tissues to increase the oil / fatty acid / TAG content of those tissues.
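For illustration, the two conditions stated for the X residues (at least four of the listed positions changed, and no acidic residue introduced) can be expressed as a simple check on a candidate Elaeis guineensis segment. This is a sketch under simplifying assumptions: it handles substitutions only (not insertions or deletions), and the names `mutate` and `meets_x_residue_conditions` are hypothetical helpers introduced here.

```python
# Wild type segment (SEQ ID NO:91): residues 241-282 of the Elaeis guineensis WRI1 protein.
SEGMENT_START = 241
WILD_TYPE = "KPETEPAQEDLGVLQMEATETIDQTMPNYDLPEISWTFDIDH"
X_POSITIONS = (244, 259, 261, 265, 275, 277)
ACIDIC = set("DE")  # aspartic acid and glutamic acid


def mutate(segment: str, changes: dict) -> str:
    """Apply {position: residue} substitutions using full-length numbering."""
    residues = list(segment)
    for pos, residue in changes.items():
        residues[pos - SEGMENT_START] = residue
    return "".join(residues)


def meets_x_residue_conditions(variant: str) -> bool:
    """True if >= 4 listed positions are substituted and none became acidic."""
    if len(variant) != len(WILD_TYPE):
        raise ValueError("this sketch only handles substitutions")
    changed = [
        p for p in X_POSITIONS
        if variant[p - SEGMENT_START] != WILD_TYPE[p - SEGMENT_START]
    ]
    has_acidic = any(variant[p - SEGMENT_START] in ACIDIC for p in changed)
    return len(changed) >= 4 and not has_acidic


# Hypothetical candidates used only to exercise the check.
four_alanines = mutate(WILD_TYPE, {244: "A", 259: "A", 261: "A", 265: "A"})
two_alanines = mutate(WILD_TYPE, {244: "A", 259: "A"})
with_aspartate = mutate(WILD_TYPE, {244: "D", 259: "A", 261: "A", 265: "A"})

print(meets_x_residue_conditions(four_alanines))
print(meets_x_residue_conditions(two_alanines))
print(meets_x_residue_conditions(with_aspartate))
```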

An example of an amino acid sequence for a WRINKLED1 (WRI1) sequence from Glycine max (soybean) is available as accession number XP_006596987.1 (GI:571513961) and reproduced below as SEQ ID NO:93.

1 MKRSPASSCS SSTSSVGFEA PIEKRRPKHP RRNNLKSQKC

41 KQNQTTTGGR RSSIYRGVTR HRWTGRFEAH LWDKSSWNNI

81 QSKKGRQGAY DTEESAARTY DLAALKYWGK DATLNFPIET

121 YTKELEEMDK VSREEYLASL RRQSSGFSRG LSKYRGVARH

161 HHNGRWEARI GRVCGNKYLY LGTYKTQEEA AVAYDMAAIE

201 YRGVNAVTNF DISNYMDKIK KKNDQTQQQQ TEAQTETVPN

241 SSDSEEVEVE QQTTTITTPP PSENLHMPPQ QHQVQYTPHV

281 SPREEESSSL ITIMDHVLEQ DLPWSFMYTG LSQFQDPNLA

321 FCKGDDDLVG MFDSAGFEED IDFLFSTQPG DETESDVNNM

361 SAVLDSVECG DTNGAGGSMM HVDNKQKIVS FASSPSSTTT

401 VSCDYALDL

A nucleic acid sequence for the above Glycine max WRI1 protein sequence is available as accession number XM_006596924.1 (GI:571513960), and is reproduced below as SEQ ID NO:94.

1 AGTGTTGCTC AAATTCAAGC CACTTAATTA GCCATGGTTG

41 ATTGATCAAG TTAAATTCCA ACCCAAGGTT AAATCATTAC

81 TCCCTTCTCA TCCTTCCCAA CCCCAACCCC CAGAAATATT

121 ACAGATTCAA TTGCTTAATT AAATACTATT TTCCCCTCCT

161 TCTATAATAC CCTCCAAAAT CTTTTTCCTT CTTCATTCTC

201 CCTTTCTCTA TGTTTTGGCA AACCACTTTA GGTAACCAGA

241 TTACTACTAC TATTGCTTCA TATACAAAGA TGCTATCGTA

281 AAAAAGAGAG AAACTTGGGA AGTGGGAACA CATTCAAAAT

321 CCTTGTTTTT CTTTTTGGTC TAATTTTTCA TCTCAAAACA

361 CACACCCATT GAGTATTTTT CATTTTTTTG TTCTTTTGGG

401 ACAAAAAAGG TGGGTGTTGT TGGCATTATT GAAGATAGAG

441 GCCCCCAAAA TGAAGAGGTC TCCAGCATCT TCTTGTTCAT

481 CATCTACTTC CTCTGTTGGG TTTGAAGCTC CCATTGAAAA

521 AAGAAGGCCT AAGCATCCAA GGAGGAATAA TTTGAAGTCA

561 CAAAAATGCA AGCAGAACCA AACCACCACT GGTGGCAGAA

601 GAAGCTCTAT CTATAGAGGA GTTACAAGGC ATAGGTGGAC

641 AGGGAGGTTT GAAGCTCACC TATGGGATAA GAGCTCTTGG

681 AACAACATTC AGAGCAAGAA GGGTCGACAA GGGGCATATG

721 ATACTGAAGA ATCTGCAGCC CGTACCTATG ACCTTGCAGC

761 CCTTAAATAC TGGGGAAAAG ATGCAACCCT GAATTTCCCG

801 ATAGAAACTT ATACCAAGGA GCTCGAGGAA ATGGACAAGG

841 TTTCAAGAGA AGAATATTTG GCTTCTTTGC GGCGCCAAAG

881 CAGTGGCTTT TCTAGAGGCC TGTCTAAGTA CCGTGGGGTT

921 GCTAGGCATC ATCATAATGG TCGCTGGGAA GCACGAATTG

961 GAAGAGTATG CGGAAACAAG TACCTCTACT TGGGGACATA

1001 TAAAACTCAA GAGGAGGCAG CAGTGGCATA TGACATGGCA

1041 GCAATAGAGT ACCGTGGAGT CAATGCAGTG ACCAATTTTG

1081 ACATAAGCAA CTACATGGAC AAAATAAAGA AGAAAAATGA

1121 CCAAACCCAA CAACAACAAA CAGAAGCACA AACGGAAACA

1161 GTTCCTAACT CCTCTGACTC TGAAGAAGTA GAAGTAGAAC

1201 AACAGACAAC AACAATAACC ACACCACCCC CATCTGAAAA

1241 TCTGCACATG CCACCACAGC AGCACCAAGT TCAATACACC

1281 CCCCATGTCT CTCCAAGGGA AGAAGAATCA TCATCACTGA

1321 TCACAATTAT GGACCATGTG CTTGAGCAGG ATCTGCCATG

1361 GAGCTTCATG TACACTGGCT TGTCTCAGTT TCAAGATCCA

1401 AACTTGGCTT TCTGCAAAGG TGATGATGAC TTGGTGGGCA

1441 TGTTTGATAG TGCAGGGTTT GAGGAAGACA TTGATTTTCT

1481 GTTCAGCACT CAACCTGGTG ATGAGACTGA GAGTGATGTC

1521 AACAATATGA GCGCAGTTTT GGATAGTGTT GAGTGTGGAG

1561 ACACAAATGG GGCTGGTGGA AGCATGATGC ATGTGGATAA

1601 CAAGCAGAAG ATAGTATCAT TTGCTTCTTC ACCATCATCT

1641 ACAACTACAG TTTCTTGTGA CTATGCTCTA GATCTATGAT

1681 CTCTTCAGAA GGGTGATGGA TGACCTACAT GGAATGGAAC

1721 CTTGTGTAGA TTATTATTGG GTTTGTTATG CATGTTGTTG

1761 GGGTTTGTTG TGATAGGTTG GTGGATGGGT GTGACTTGTG

1801 AAAATGTTCA TTGGTTTTAG GATTTTCCTT TCATCCATAC

1841 TCCGTTGTCG AAAGAAGAAA ATGTTCATTT TAGACTTGGA

1881 TTTTAGTATA AAAAAAAAGG AGAAAAAACC AAAAATGTGA

1921 TTTGGGTGCA AACAATGTTT TGTTTTTCTT TTTACTTTTG

1961 GGGTAAGGAG ATGAAGAGAG GGGAAATTTA AACCATTCCT

2001 ATTCTTGGGG GATAATGCAG TATAAATTAA GATCAGACTG

2041 TTTTTAGCAT ATGGAGTGCA AACTGCAAAG GCCAAGTTTC

2081 CTTTGTTTAA ACAATTTAGG CTTTCTTTTC CTTTGCCTAT

2121 TTTTTTTTTA TTTTTTTTTT TGTATTGGGG CATAGCAGTT

2161 AGTGTTGTGT TGAGATCTGA AATCTGATCT CTGGTTTGGT

2201 TTGTTC

Expression of an internally deleted Glycine max WRI1 protein or a Glycine max WRI1 protein with a mutation at four or more of the following positions 353, 355, 361, 366, 372, 378, 390, 393, 394, 396, 397, 398, 399, 400 and/or 402 can increase the content of triacylglycerol in plant tissues such as leaves and seeds. Hence, one aspect of the invention is a mutant WRI1 protein that includes a mutation (e.g., a substitution, insertion, or deletion) in the following sequence (SEQ ID NO:95):

For example, expression of an internally deleted Glycine max WRI1 protein or a Glycine max WRI1 protein with a mutation at four or more of positions 353, 355, 361, 366, 372, 378, 390, 393, 394, 396, 397, 398, 399, 400 and/or 402 can increase the content of triacylglycerol in plant tissues. Hence, a mutant WRI1 protein can be used that includes the following sequence (SEQ ID NO:96):

where at least four of the X residues in the SEQ ID NO:96 sequence are each a substitution, insertion, or deletion compared to the wild type sequence (SEQ ID NO:95). The X residues are not acidic amino acids such as aspartic acid or glutamic acid. However, the X residue can be a small amino acid or a hydrophobic amino acid. For example, the X residues can each separately be alanine, glycine, valine, leucine, isoleucine, methionine, or any mixture thereof.

In some cases, a mutant WRI1 protein has a deletion within the SEQ ID NO:93 portion of the WRI1 protein of at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 10, or at least 13, or at least 15, or at least 17, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 45 amino acids. Such mutant WRI1 proteins can be expressed in plant tissues.

Expression of Proteins

Also described herein are expression systems that include at least one expression cassette (e.g., expression vectors or transgenes) that encodes one or more of the enzymes described herein, transcription factor(s) described herein, LDSP-protein fusion(s) described herein, or combinations thereof. For example, the expression systems can also include one or more expression cassettes encoding LDSP, monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (IspF), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase, abietadiene synthase (ABS), farnesyl pyrophosphate synthase (FPPS), or squalene synthase (SQS), LDSP-protein fusions, or enzymes that facilitate production of terpene precursors or building blocks.

Nucleic acids encoding the proteins can have sequence modifications. For example, nucleic acid sequences described herein can be modified to express enzymes and transcription factors that have modifications. For example, most amino acids can be encoded by more than one codon. When an amino acid is encoded by more than one codon, the codons are referred to as degenerate codons. A listing of degenerate codons is provided in Table 1A below.
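By way of example, the degeneracy of the standard genetic code can be enumerated programmatically. The sketch below is not a reproduction of Table 1A; it simply groups the 64 codons of the standard code by the amino acid (or stop signal, shown as `*`) that each encodes. The variable name `codons_for` is introduced for this example.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group codons by the one-letter amino acid they encode.
codons_for = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    codons_for[aa].append("".join(codon))

for aa in sorted(codons_for):
    print(aa, len(codons_for[aa]), " ".join(codons_for[aa]))
```

Amino acids with more than one entry in `codons_for` are those encoded by degenerate codons; methionine and tryptophan each have a single codon in the standard code.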

Different organisms may translate different codons more or less efficiently (e.g., because they have different ratios of tRNAs) than other organisms. Hence, when some amino acids can be encoded by several codons, a nucleic acid segment can be designed to optimize the efficiency of expression of an enzyme by using codons that are preferred by an organism of interest. For example, the nucleotide coding regions of the enzymes described herein can be codon optimized for expression in various plant species. Such enzymes can be expressed in a variety of host cells, including, for example, Nicotiana benthamiana, Nicotiana tabacum, Nicotiana rustica, Nicotiana excelsior, and Nicotiana excelsiana.
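A minimal sketch of codon selection follows, assuming a table of one preferred codon per amino acid. The `PREFERRED` table is hypothetical and is not measured codon usage for any Nicotiana species or other host; in practice such preferences would be taken from codon usage data for the organism of interest. The sketch back-translates the first ten residues of SEQ ID NO:85 and confirms that the recoded sequence still encodes the same peptide.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferred codon per amino acid (illustrative values only).
PREFERRED = {
    "A": "GCT", "R": "AGA", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAG", "G": "GGA", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAG", "M": "ATG", "F": "TTT", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}


def back_translate(protein: str, stop: str = "TAA") -> str:
    """Encode a protein with the preferred codons and append a stop codon."""
    return "".join(PREFERRED[aa] for aa in protein) + stop


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


peptide = "MTMERSQPQH"  # first ten residues of SEQ ID NO:85
recoded = back_translate(peptide)
print(recoded)
print(translate(recoded) == peptide)
```

Choosing a single codon per amino acid is the simplest scheme; it does not balance codon usage or screen for undesirable sequence motifs.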

An optimized nucleic acid can have less than 98%, less than 97%, less than 95%, or less than 94%, or less than 93%, or less than 92%, or less than 91%, or less than 90%, or less than 89%, or less than 88%, or less than 85%, or less than 83%, or less than 80%, or less than 75% nucleic acid sequence identity to a corresponding non-optimized (e.g., a non-optimized parental or wild type enzyme nucleic acid) sequence.
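As an illustrative sketch, percent nucleic acid sequence identity between an optimized sequence and its parental sequence can be computed position by position when the two sequences have the same length and are already aligned. Both example sequences below are hypothetical; sequences of different length would first require an alignment step, which this sketch does not perform.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical parental sequence and a synonymous recoding (both encode Met-Ala-Leu).
parent = "ATGGCTTTA"
optimized = "ATGGCATTG"

identity = percent_identity(parent, optimized)
print(f"{identity:.1f}% identity")
```

Here two of nine positions differ, so the recoded sequence falls below several of the identity thresholds listed above while encoding the same amino acids.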

In some cases, LDSP or enzymes can have conservative changes such as one or more deletions, insertions, replacements, or substitutions that have no significant effect on the activities of the enzymes. Examples of conservative substitutions are provided below in Table 1B.

The nucleic acids described herein can also be modified to improve or alter the functional properties of the encoded enzymes. Deletions, insertions, or substitutions can be generated by a variety of methods such as, but not limited to, random mutagenesis and/or site-specific recombination-mediated methods. The mutations can range in size from one or two nucleotides to hundreds of nucleotides (or any value there between). Deletions, insertions, and/or substitutions are created at a desired location in a nucleic acid encoding the enzyme(s).

Nucleic acids encoding one or more enzyme(s) can have one or more nucleotide deletions, insertions, replacements, or substitutions. For example, the nucleic acids encoding one or more enzyme(s) can have less than 95%, or less than 94.8%, or less than 94.5%, or less than 94%, or less than 93.8%, or less than 94.50% nucleic acid sequence identity to a corresponding parental or wild-type sequence. In some cases, the nucleic acids encoding one or more enzyme(s) can have, for example, at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90% sequence identity to a corresponding parental or wild-type sequence. Examples of amino acid sequences for parental LDSP and unmodified proteins include amino acid sequences with SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 52, 53, 54, 55, 56, 59, 61, 63, 64, 65, 67, 68, 69, 71, 72, 73, 75, 76, 77, 79, 80, 81, 83, 84, 85, 87, 89, 91, 92, 93, 95, 96, 97, 99, 101, 104, 105, 107, 108, 110, or 111. Examples of nucleic acid sequences include SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 66, 70, 74, 78, 82, 87, 88, 90, 94, 98, 100, 102, 103, 106, or 109. Any of these amino acid or nucleic acid sequences can, for example, have or encode enzyme sequences with less than 99%, less than 98%, less than 97%, less than 96%, less than 95%, less than 94.8%, less than 94.5%, less than 94%, less than 93.8%, less than 93.5%, less than 93%, less than 92%, less than 91%, or less than 90% sequence identity to a corresponding parental or wild-type sequence.

Also provided are nucleic acid molecules (polynucleotide molecules) that can include a nucleic acid segment encoding an enzyme with a sequence that is optimized for expression in at least one selected host organism or host cell. Optimized sequences include sequences which are codon optimized, i.e., codons which are employed more frequently in one organism relative to another organism. In some cases, the balance of codon usage is such that the most frequently used codon is not used to exhaustion. Other modifications can include addition or modification of Kozak sequences and/or introns, and/or to remove undesirable sequences, for instance, potential transcription factor binding sites.

The LDSP, enzymes and LDSP-protein fusions described herein can be expressed from an expression cassette and/or an expression vector. Such an expression cassette can include a nucleic acid segment that encodes at least one LDSP, enzyme, or LDSP-protein fusion operably linked to a promoter to drive expression of one or more LDSP, enzyme, or LDSP-protein fusion. Convenient vectors or expression systems can be used to express such LDSP, enzymes and LDSP-protein fusions. In some instances, the nucleic acid segment encoding one or more LDSP, enzyme, or LDSP-protein fusion is operably linked to a promoter and/or a transcription termination sequence. The promoter and/or the termination sequence can be heterologous to the nucleic acid segment that encodes the LDSP, enzyme, or LDSP-protein fusion. Expression cassettes can have a promoter operably linked to a heterologous open reading frame encoding a LDSP, enzyme, or LDSP-protein fusion. The invention therefore provides expression cassettes or vectors useful for expressing one or more LDSP, enzyme, or LDSP-protein fusion.

Constructs, e.g., expression cassettes, and vectors comprising the isolated nucleic acid molecule, e.g., with optimized nucleic acid sequence, as well as kits comprising the isolated nucleic acid molecule, construct or vector are also provided. Techniques of molecular biology, microbiology, and recombinant DNA technology which are within the skill of the art can be employed to make and use the enzymes, expression systems, and terpene products described herein. Such techniques are available in the literature. See, e.g., Sambrook, Fritsch & Maniatis, Molecular Cloning: A Laboratory Manual, Second Edition (1989); DNA Cloning, Vols. I and II (D. N. Glover ed. 1985); Oligonucleotide Synthesis (M. J. Gait ed. 1984); Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); Animal Cell Culture (R. K. Freshney ed. 1986); Immobilized Cells and Enzymes (IRL Press, 1986); Perbal, B., A Practical Guide to Molecular Cloning (1984); the series Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.); Current Protocols In Molecular Biology (John Wiley & Sons, Inc), Current Protocols In Protein Science (John Wiley & Sons, Inc), Current Protocols In Microbiology (John Wiley & Sons, Inc), Current Protocols In Nucleic Acid Chemistry (John Wiley & Sons, Inc), and Handbook of Experimental Immunology, Vols. I-IV (D. M. Weir and C. C. Blackwell eds., 1986, Blackwell Scientific Publications).

The expression systems can be introduced into a variety of host cells, host tissues, seeds (e.g., "host seeds"), and host plants.

Examples of host cells, host tissues, host seeds and plants that may be improved by these methods (e.g., by incorporation of nucleic acids and expression systems) include but are not limited to those useful for production of oils such as oilseeds, camelina, canola, castor bean, corn, flax, lupins, peanut, potatoes, safflower, soybean, sunflower, cottonseed, oil firewood trees, rapeseed, rutabaga, sorghum, walnut, and various nut species. Other types of host cells, host tissues, host seeds and plants that can be used include fiber-containing plants, trees, flax, grains (maize, wheat, barley, oats, rice, sorghum, millet and rye), grasses (switchgrass, prairie grass, wheat grass, sudangrass, sorghum, straw-producing plants), softwood, hardwood and other woody plants (e.g., poplar, pine, and eucalyptus), oil (oilseeds, camelina, canola, castor bean, lupins, potatoes, soybean, sunflower, cottonseed, oil firewood trees, rapeseed, rutabaga, sorghum), starch plants (wheat, potatoes, lupins, sunflower and cottonseed), and forage plants (alfalfa, clover and fescue). In some embodiments the plant is a gymnosperm. Examples of plants useful for pulp and paper production include most pine species such as loblolly pine, Jack pine, Southern pine, Radiata pine, spruce, Douglas fir and others. Hardwoods that can be modified as described herein include aspen, poplar, eucalyptus, and others. Plants useful for making biofuels and ethanol include corn, grasses (e.g., miscanthus, switchgrass, and the like), as well as trees such as poplar, aspen, pine, oak, maple, walnut, rubber tree, willow, and the like. Plants useful for generating forage include legumes such as alfalfa, as well as forage grasses such as bromegrass, and bluestem. In some cases, the plant is a Brassicaceae or other Solanaceae species. In some embodiments, the plant is not a species of Arabidopsis, for example, in some embodiments, the plant is not Arabidopsis thaliana.

Modified plants that contain nucleic acids encoding one or more LDSP, enzyme, and/or LDSP-protein fusion within their somatic and/or germ cells are described herein. Such genetic modification can be accomplished by available procedures. For example, one of skill in the art can prepare an expression cassette or expression vector that can express one or more encoded LDSP, enzyme, and/or LDSP-protein fusion. Plant cells can be transformed by the expression cassette or expression vector, and whole plants (and their seeds) can be generated from the plant cells that were successfully transformed with one or more LDSP, enzyme, and/or LDSP-protein fusion nucleic acids. Some procedures for making such genetically modified plants and their seeds are described below.

Promoters: The nucleic acids encoding one or more LDSP, enzyme, and/or LDSP-protein fusion can be operably linked to a promoter, which provides for expression of mRNA from the nucleic acids. The promoter is typically a promoter functional in plants and can be a promoter functional during plant growth and development. A nucleic acid segment encoding one or more LDSP, enzyme, and/or LDSP-protein fusion is operably linked to the promoter when it is located downstream from the promoter. The combination of a coding region for an enzyme operably linked to a promoter forms an expression cassette, which can optionally include other elements as well.
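The arrangement of an expression cassette (promoter, then a downstream coding region, then an optional termination sequence) can be sketched as a simple data structure. The class names and all three sequences below are hypothetical placeholders introduced for this example; they are not real promoter, coding, or terminator sequences from this application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    name: str
    sequence: str


@dataclass(frozen=True)
class ExpressionCassette:
    promoter: Element
    coding_region: Element
    terminator: Optional[Element] = None

    @property
    def coding_start(self) -> int:
        """0-based index where the coding region begins, directly downstream of the promoter."""
        return len(self.promoter.sequence)

    def assemble(self) -> str:
        """Concatenate the elements in 5' to 3' order."""
        parts = [self.promoter, self.coding_region]
        if self.terminator is not None:
            parts.append(self.terminator)
        return "".join(part.sequence for part in parts)


# Placeholder sequences for illustration only.
cassette = ExpressionCassette(
    promoter=Element("placeholder promoter", "TATAAATAGG"),
    coding_region=Element("placeholder ORF", "ATGACCATGGAGTGA"),
    terminator=Element("placeholder terminator", "AATAAA"),
)

assembled = cassette.assemble()
print(assembled)
print(assembled[cassette.coding_start:cassette.coding_start + 3])
```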

Promoter regions are typically found in the flanking DNA upstream from the coding sequence in both prokaryotic and eukaryotic cells. A promoter sequence provides for regulation of transcription of the downstream gene sequence and typically includes from about 50 to about 2,000 nucleotide base pairs. Promoter sequences also contain regulatory sequences such as enhancer sequences that can influence the level of gene expression. Some isolated promoter sequences can provide for gene expression of heterologous DNAs, that is a DNA different from the native or homologous DNA. Promoter sequences are also known to be strong or weak, or inducible. A strong promoter provides for a high level of gene expression, whereas a weak promoter provides for a very low level of gene expression. An inducible promoter is a promoter that provides for turning on and off gene expression in response to an exogenously added agent, or to an environmental or developmental stimulus. For example, a bacterial promoter such as the promoter can be induced to varying levels of gene expression depending on the level of isopropyl-beta-D-thiogalactoside added to the transformed cells. Promoters can also provide for tissue specific or developmental regulation. An isolated promoter sequence that is a strong promoter for heterologous DNAs is advantageous because it provides for a sufficient level of gene expression for easy detection and selection of transformed cells and provides for a high level of gene expression when desired.

Expression cassettes generally include, but are not limited to, examples of plant promoters such as the CaMV 35S promoter (Odell et al., Nature. 313:810-812 (1985)), or others such as CaMV 19S (Lawton et al., Plant Molecular Biology. 9:315-324 (1987)), nos (Ebert et al., Proc. Natl. Acad. Sci. USA. 84:5745-5749 (1987)), Adh1 (Walker et al., Proc. Natl. Acad. Sci. USA. 84:6624-6628 (1987)), sucrose synthase (Yang et al., Proc. Natl. Acad. Sci. USA. 87:4144-4148 (1990)), α-tubulin, ubiquitin, actin (Wang et al., Mol. Cell. Biol. 12:3399 (1992)), cab (Sullivan et al., Mol. Gen. Genet. 215:431 (1989)), PEPCase (Hudspeth et al., Plant Molecular Biology. 12:579-589 (1989)) or those associated with the R gene complex (Chandler et al., The Plant Cell. 1:1175-1183 (1989)). Further suitable promoters include a CYP71D16 trichome-specific promoter and the CBTS (cembratrienol synthase) promoter, cauliflower mosaic virus promoter, the Z10 promoter from a gene encoding a 10 kD zein protein, a Z27 promoter from a gene encoding a 27 kD zein protein, the plastid rRNA-operon (rrn) promoter, inducible promoters, such as the light inducible promoter derived from the pea rbcS gene (Coruzzi et al., EMBO J. 3:1671 (1984)), RUBISCO-SSU light inducible promoter (SSU) from tobacco and the actin promoter from rice (McElroy et al., The Plant Cell. 2:163-171 (1990)). Other promoters that are useful can also be employed.

Examples of leaf-specific promoters include the promoter from the Populus ribulose-1,5-bisphosphate carboxylase small subunit gene (Wang et al. Plant Molec Biol Reporter 31(1): 120-127 (2013)), the promoter from the Brachypodium distachyon sedoheptulose-1,7-bisphosphatase (SBPase-p) gene (Alotaibi et al. Plants 7(2): 27 (2018)), the fructose-1,6-bisphosphate aldolase (FBPA-p) gene from Brachypodium distachyon (Alotaibi et al. Plants 7(2): 27 (2018)), and the photosystem-II promoter (CAB2-p) of the rice (Oryza sativa L.) light-harvest chlorophyll a/b binding protein (CAB) (Song et al. J Am Soc Hort Sci 132(4): 551-556 (2007)). Additional promoters that can be used include those available in expression databases, see for example, website bar.utoronto.ca/eplant/ which includes poplar or heterologous promoters from Arabidopsis (for example from AT2G26020 / PDF1.2b or AT5G44420 / LCR77).

Alternatively, novel tissue specific promoter sequences may be employed. cDNA clones from a particular tissue can be isolated and those clones which are expressed specifically in that tissue can be identified, for example, using Northern blotting. Preferably, the gene isolated is not present in a high copy number but is relatively abundant in specific tissues. The promoter and control elements of corresponding genomic clones can then be localized using techniques well known to those of skill in the art.

Plant plastid originated promoters can also be used, for example, to improve expression in plastids, for example, a rice clp promoter, or tobacco rrn promoter. Chloroplast-specific promoters can also be utilized for targeting the foreign protein expression into chloroplasts. For example, the 16S ribosomal RNA promoter (Prrn), like the psbA and atpA gene promoters, can be used for chloroplast transformation.

A nucleic acid encoding one or more LDSP, enzyme, and/or LDSP-protein fusion can be combined with the promoter by standard methods to yield an expression cassette, for example, as described in Sambrook et al. (MOLECULAR CLONING: A LABORATORY MANUAL, Second Edition (Cold Spring Harbor, NY: Cold Spring Harbor Press (1989)); MOLECULAR CLONING: A LABORATORY MANUAL, Third Edition (Cold Spring Harbor, NY: Cold Spring Harbor Press (2000))). Briefly, a plasmid containing a promoter such as the 35S CaMV promoter or the CYP71D16 trichome-specific promoter can be constructed as described in Jefferson (Plant Molecular Biology Reporter 5:387-405 (1987)) or obtained from Clontech Lab in Palo Alto, California (e.g., pBI121 or pBI221). Typically, these plasmids are constructed to have multiple cloning sites having specificity for different restriction enzymes downstream from the promoter. The nucleic acid sequence encoding one or more LDSP, enzyme, and/or LDSP-protein fusion can be subcloned downstream from the promoter using restriction enzymes and positioned to ensure that the DNA is inserted in proper orientation with respect to the promoter so that the DNA can be expressed as sense RNA. Once the nucleic acid segment encoding the one or more LDSP, enzyme, and/or LDSP-protein fusion is operably linked to a promoter, the expression cassette so formed can be subcloned into a plasmid or other vector (e.g., an expression vector).
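The orientation requirement described above can be illustrated in silico. The following is only a minimal sketch: the promoter, coding, and terminator strings are hypothetical placeholders (not sequences from this application), and the helper names `is_sense_orf` and `build_cassette` are introduced here purely for illustration. It checks that an insert reads as a complete open reading frame in the sense direction and, if the insert was supplied in antisense orientation, flips it before joining it downstream of the promoter.

```python
# Hypothetical placeholder sequences; real promoter, coding, and terminator
# sequences would come from the chosen plasmid and the selected coding region.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_sense_orf(cds: str) -> bool:
    """True if cds starts with ATG, ends with a stop codon, and has no
    in-frame stop codon before the final one."""
    cds = cds.upper()
    if len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return codons[-1] in STOP_CODONS and not any(
        codon in STOP_CODONS for codon in codons[:-1]
    )


def build_cassette(promoter: str, insert: str, terminator: str) -> str:
    """Join promoter, insert, and terminator with the insert in sense
    orientation, reverse-complementing the insert if necessary."""
    if is_sense_orf(insert):
        cds = insert.upper()
    elif is_sense_orf(reverse_complement(insert)):
        cds = reverse_complement(insert)
    else:
        raise ValueError("insert has no complete ORF in either orientation")
    return promoter.upper() + cds + terminator.upper()


promoter = "TTGACATATAAT"       # placeholder, not a real 35S CaMV sequence
cds = "ATGGCTTCTAAAGGTTAA"      # placeholder ORF
terminator = "AATAAA"           # placeholder polyadenylation signal

# The insert is deliberately supplied in antisense orientation.
cassette = build_cassette(promoter, reverse_complement(cds), terminator)
```

In practice the orientation is fixed by the choice of restriction sites; a check of this kind is only a sanity test on the designed construct before cloning.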

In some embodiments, a cDNA clone encoding a LDSP, enzyme, and/or LDSP-protein fusion is isolated from selected plant tissues, or a nucleic acid encoding a wild type, mutant or modified enzyme is prepared by available methods or as described herein. For example, the nucleic acid encoding the enzyme can be any nucleic acid with a coding region that hybridizes to SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 66, 70, 74, 78, 82, 87, 88, 90, 94, 98, 100, 102, 103, 106, or 109, and that encodes a protein with LDSP-anchoring activity and/or enzyme activity. Using restriction endonucleases, the entire coding sequence for the LDSP, enzyme, and/or LDSP-protein fusion is subcloned downstream of the promoter in a 5' to 3' sense orientation.

Targeting Sequences: Additionally, expression cassettes can be constructed and employed to target the nucleic acids encoding one or more LDSP, enzyme, and/or LDSP-protein fusion to an intracellular compartment within plant cells or to direct an encoded protein to the extracellular environment. This can generally be achieved by joining a DNA sequence encoding a LDSP, transit or signal peptide sequence to the coding sequence of the nucleic acid encoding the enzyme. The resultant transit, or signal, peptide can transport the protein to a particular intracellular, or extracellular destination, and can then be co-translationally or post-translationally removed.

Transit peptides act by facilitating the transport of proteins through intracellular membranes, e.g., vacuole, vesicle, plastid and mitochondrial membranes, whereas signal peptides direct proteins through the extracellular membrane. By facilitating transport of the protein into compartments inside or outside the cell, these sequences can increase the accumulation of a particular gene product in a particular location. For example, see U.S. Patent No. 5,258,300. For example, in some cases it may be desirable to localize the enzymes to lipid droplets.

The best complement of LDSP/transit peptides/secretion peptide/signal peptides can be empirically ascertained. The choices can range from using the native secretion signals akin to the enzyme candidates to be transgenically expressed, to transit peptides from proteins known to be localized into plant organelles such as trichome plastids in general.

For example, transit peptides can be selected from proteins that have a relatively high titer in the trichomes. Examples include, but are not limited to, transit peptides from a terpenoid cyclase (e.g. cembratrienol cyclase), the LTP1 protein, the Chlorophyll a-b binding protein 40, Phylloplanin, Glycine-rich Protein (GRP), Cytochrome P450 (CYP71D16), all from Nicotiana sp., alongside the RUBISCO (ribulose bisphosphate carboxylase) small subunit protein from both Arabidopsis and Nicotiana sp.

3' Sequences: When the expression cassette is to be introduced into a plant cell, the expression cassette can also optionally include 3' untranslated plant regulatory DNA sequences that act as a signal to terminate transcription and allow for the polyadenylation of the resultant mRNA. The 3' untranslated regulatory DNA sequence can include from about 300 to 1,000 nucleotide base pairs and can contain plant transcriptional and translational termination sequences. For example, 3' elements that can be used include those derived from the nopaline synthase gene of Agrobacterium tumefaciens (Bevan et al., Nucleic Acid Research. 11:369-385 (1983)), or the terminator sequences for the T7 transcript from the octopine synthase gene of Agrobacterium tumefaciens, and/or the 3' end of the protease inhibitor I or II genes from potato or tomato. Other 3' elements known to those of skill in the art can also be employed. These 3' untranslated regulatory sequences can be obtained as described in An (Methods in Enzymology. 153:292 (1987)). Many such 3' untranslated regulatory sequences are already present in plasmids available from commercial sources such as Clontech, Palo Alto, California. The 3' untranslated regulatory sequences can be operably linked to the 3' terminus of the nucleic acids encoding the LDSP or enzyme.

Selectable and Screenable Marker Sequences: To improve identification of transformants, a selectable or screenable marker gene can be employed with the expressible nucleic acids encoding the LDSP and/or enzyme(s). "Marker genes" are genes that impart a distinct phenotype to cells expressing the marker gene and thus allow such transformed cells to be distinguished from cells that do not have the marker. Such genes may encode either a selectable or a screenable marker, depending on whether the marker confers a trait which one can 'select' for by chemical means, i.e., through the use of a selective agent (e.g., a herbicide, antibiotic, or the like), or whether it is simply a trait that one can identify through observation or testing, i.e., by 'screening' (e.g., the R-locus trait). Of course, many examples of suitable marker genes are available and can be employed in the practice of the invention.

Included within the terms 'selectable or screenable marker genes' are also genes which encode a "secretable marker" whose secretion can be detected as a means of identifying or selecting for transformed cells. Examples include markers which encode a secretable antigen that can be identified by antibody interaction, or secretable enzymes that can be detected by their catalytic activity. Secretable proteins fall into a number of classes, including small, diffusible proteins detectable, e.g., by ELISA; and proteins that are inserted or trapped in the cell wall (e.g., proteins that include a leader sequence such as that found in the expression unit of extensin or tobacco PR-S).

With regard to selectable secretable markers, the use of an expression system that encodes a polypeptide that becomes sequestered in the cell wall, where the polypeptide includes a unique epitope may be advantageous. Such a cell wall antigen can employ an epitope sequence that would provide low background in plant tissue, a promoter-leader sequence that imparts efficient expression and targeting across the plasma membrane, and that can produce protein that is bound in the cell wall and yet is accessible to antibodies. A normally secreted cell wall protein modified to include a unique epitope would satisfy such requirements.

Examples of protein markers suitable for modification in this manner include extensin or hydroxyproline rich glycoprotein (HPRG). For example, the maize HPRG (Stiefel et al., The Plant Cell. 2:785-793 (1990)) is well characterized in terms of molecular biology, expression, and protein structure and therefore can readily be employed. However, any one of a variety of extensins and/or glycine-rich cell wall proteins (Keller et al., EMBO J. 8:1309-1314 (1989)) could be modified by the addition of an antigenic site to create a screenable marker.

Selectable markers for use in connection with the present invention can include, but are not limited to, a neo gene (Potrykus et al., Mol. Gen. Genet. 199:183-188 (1985)) which codes for kanamycin resistance and can be selected for using kanamycin or G418; a bar gene which codes for bialaphos resistance; a gene which encodes an altered EPSP synthase protein (Hinchee et al., Bio/Technology. 6:915-922 (1988)) thus conferring glyphosate resistance; a nitrilase gene such as bxn from Klebsiella ozaenae which confers resistance to bromoxynil (Stalker et al., Science. 242:419-423 (1988)); a mutant acetolactate synthase gene (ALS) which confers resistance to imidazolinone, sulfonylurea or other ALS-inhibiting chemicals (European Patent Application 154,204 (1985)); a methotrexate-resistant DHFR gene (Thillet et al., J. Biol. Chem. 263:12500-12508 (1988)); a dalapon dehalogenase gene that confers resistance to the herbicide dalapon; or a mutated anthranilate synthase gene that confers resistance to 5-methyl tryptophan. Where a mutant EPSP synthase gene is employed, additional benefit may be realized through the incorporation of a suitable chloroplast transit peptide, CTP (European Patent Application 0 218 571 (1987)).

An illustrative embodiment of a selectable marker gene capable of being used in systems to select transformants is a gene that encodes the enzyme phosphinothricin acetyltransferase, such as the bar gene from Streptomyces hygroscopicus or the pat gene from Streptomyces viridochromogenes (U.S. Patent No. 5,550,318). The enzyme phosphinothricin acetyltransferase (PAT) inactivates the active ingredient in the herbicide bialaphos, phosphinothricin (PPT). PPT inhibits glutamine synthetase (Murakami et al., Mol. Gen. Genet. 205:42-50 (1986); Twell et al., Plant Physiol. 91:1270-1274 (1989)), causing rapid accumulation of ammonia and cell death.

Screenable markers that may be employed include, but are not limited to, a β-glucuronidase or uidA gene (GUS) that encodes an enzyme for which various chromogenic substrates are known; an R-locus gene, which encodes a product that regulates the production of anthocyanin pigments (red color) in plant tissues (Dellaporta et al., In: Chromosome Structure and Function: Impact of New Concepts, 18th Stadler Genetics Symposium, J.P. Gustafson and R. Appels, eds. (New York: Plenum Press) pp. 263-282 (1988)); a β-lactamase gene (Sutcliffe, Proc. Natl. Acad. Sci. USA. 75:3737-3741 (1978)), which encodes an enzyme for which various chromogenic substrates are known (e.g., PADAC, a chromogenic cephalosporin); a xylE gene (Zukowsky et al., Proc. Natl. Acad. Sci. USA. 80:1101 (1983)) which encodes a catechol dioxygenase that can convert chromogenic catechols; an α-amylase gene (Ikuta et al., Bio/technology 8:241-242 (1990)); a tyrosinase gene (Katz et al., J. Gen. Microbiol. 129:2703-2714 (1983)) which encodes an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone which in turn condenses to form the easily detectable compound melanin; a β-galactosidase gene, which encodes an enzyme for which there are chromogenic substrates; a luciferase (lux) gene (Ow et al., Science. 234:856-859 (1986)), which allows for bioluminescence detection; or an aequorin gene (Prasher et al., Biochem. Biophys. Res. Comm. 126:1259-1268 (1985)), which may be employed in calcium-sensitive bioluminescence detection, or a green or yellow fluorescent protein gene (Niedz et al., Plant Cell Reports. 14:403 (1995)).

Another screenable marker contemplated for use is firefly luciferase, encoded by the lux gene. The presence of the lux gene in transformed cells may be detected using, for example, X-ray film, scintillation counting, fluorescent spectrophotometry, low-light video cameras, photon counting cameras or multiwell luminometry. It is also envisioned that this system may be developed for population screening for bioluminescence, such as on tissue culture plates, or even for whole plant screening.

Other Optional Sequences: An expression cassette of the invention can also include plasmid DNA. Plasmid vectors include additional DNA sequences that provide for easy selection, amplification, and transformation of the expression cassette in prokaryotic and eukaryotic cells, e.g., pUC-derived vectors such as pUC8, pUC9, pUC18, pUC19, pUC23, pUC119, and pUC120, pSK-derived vectors, pGEM-derived vectors, pSP-derived vectors, or pBS-derived vectors. The additional DNA sequences can include origins of replication to provide for autonomous replication of the vector, additional selectable marker genes, for example, encoding antibiotic or herbicide resistance, unique multiple cloning sites providing for multiple sites to insert DNA sequences or genes encoded in the expression cassette and sequences that enhance transformation of prokaryotic and eukaryotic cells.

Another vector that is useful for expression in both plant and prokaryotic cells is the binary Ti plasmid (as disclosed in Schilperoort et al., U.S. Patent No. 4,940,838) as exemplified by vector pGA582. This binary Ti plasmid vector has been previously characterized by An (Methods in Enzymology. 153:292 (1987)) and is available from Dr. An. This binary Ti vector can be replicated in prokaryotic bacteria such as E. coli and Agrobacterium. The Agrobacterium plasmid vectors can be used to transfer the expression cassette to dicot plant cells, and under certain conditions to monocot cells, such as rice cells. The binary Ti vectors can include the nopaline T-DNA right and left borders to provide for efficient plant cell transformation, a selectable marker gene, unique multiple cloning sites in the T border regions, the colE1 origin of replication and a wide host range replicon. The binary Ti vectors carrying an expression cassette of the invention can be used to transform both prokaryotic and eukaryotic cells but are usually used to transform dicot plant cells.

DNA Delivery of the DNA Molecules into Host Cells: Methods described herein can include introducing nucleic acids encoding LDSP and/or enzymes, such as a preselected cDNA encoding the selected LDSP and/or enzyme, into a recipient cell to create a transformed cell. In some instances, the frequency of occurrence of cells taking up exogenous (foreign) DNA may be low. Moreover, it is most likely that not all recipient cells receiving DNA segments or sequences will result in a transformed cell wherein the DNA is stably integrated into the plant genome and/or expressed. Some recipient cells may provide only initial and transient gene expression. However, certain cells from virtually any dicot or monocot species may be stably transformed, and these cells regenerated into transgenic plants, through the application of the techniques disclosed herein.

Another aspect of the invention is a plant or plant cell that can produce terpenes, diterpenes and terpenoids, wherein the plant has introduced nucleic acid sequence(s) encoding one or more enzymes. The plant or plant cell can be a monocotyledon or a dicotyledon.

Another aspect of the invention includes plant cells (e.g., embryonic cells or other cell lines) that can regenerate fertile transgenic plants and/or seeds. The cells can be derived from either monocotyledons or dicotyledons. In some embodiments, the plant or cell is a monocotyledon plant or cell. In some embodiments, the plant or cell is a dicotyledon plant or cell. For example, the plant or cell can be a tobacco plant or cell. The cell(s) may be in a suspension cell culture or may be in an intact plant part, such as an immature embryo, or in a specialized plant tissue, such as callus, such as Type I or Type II callus.

Transformation of plant cells can be conducted by any one of a number of methods available in the art. Examples are: Transformation by direct DNA transfer into plant cells by electroporation (U.S. Patent No. 5,384,253 and U.S. Patent No. 5,472,869, Dekeyser et al., The Plant Cell. 2:591-602 (1990)); direct DNA transfer to plant cells by PEG precipitation (Hayashimoto et al., Plant Physiol. 93:857-863 (1990)); direct DNA transfer to plant cells by microprojectile bombardment (McCabe et al., Bio/Technology. 6:923-926 (1988); Gordon-Kamm et al., The Plant Cell. 2:603-618 (1990); U.S. Patent No. 5,489,520; U.S. Patent No. 5,538,877; and U.S. Patent No. 5,538,880) and DNA transfer to plant cells via infection with Agrobacterium. Methods such as microprojectile bombardment or electroporation can be carried out with "naked" DNA where the expression cassette may be simply carried on any E. coli-derived plasmid cloning vector. In the case of viral vectors, it is desirable that the system retain replication functions, but lack the functions for disease induction.

One method for dicot transformation, for example, involves infection of plant cells with Agrobacterium tumefaciens using the leaf-disk protocol (Horsch et al., Science 227:1229-1231 (1985)). Methods for transformation of monocotyledonous plants utilizing Agrobacterium tumefaciens have been described by Hiei et al. (European Patent 0 604 662, 1994) and Saito et al. (European Patent 0 672 752, 1995).

Monocot cells such as various grasses or dicot cells such as tobacco can be transformed via microprojectile bombardment of embryogenic callus tissue or immature embryos, or by electroporation following partial enzymatic degradation of the cell wall with a pectinase-containing enzyme (U.S. Patent No. 5,384,253; and U.S. Patent No. 5,472,869). For example, embryogenic cell lines derived from immature embryos can be transformed by accelerated particle treatment as described by Gordon-Kamm et al. (The Plant Cell. 2:603-618 (1990)) or U.S. Patent No. 5,489,520; U.S. Patent No. 5,538,877 and U.S. Patent No. 5,538,880, cited above. Excised immature embryos can also be used as the target for transformation prior to tissue culture induction, selection and regeneration as described in U.S. application Serial No. 08/112,245 and PCT publication WO 95/06128.

The choice of plant tissue source for transformation may depend on the nature of the host plant and the transformation protocol. As illustrated herein, leaves were used in some transient expression experiments. Useful tissue sources include callus, suspension culture cells, protoplasts, leaf segments, stem segments, tassels, pollen, embryos, hypocotyls, tuber segments, meristematic regions, and the like. The tissue source is selected and transformed so that it retains the ability to regenerate whole, fertile plants following transformation, i.e., contains totipotent cells.

The transformation is carried out under conditions directed to the plant tissue of choice. The plant cells or tissue are exposed to the DNA or RNA encoding enzymes for an effective period of time. This may range from a less than one second pulse of electricity for electroporation to a 2-day to 3-day co-cultivation in the presence of plasmid-bearing Agrobacterium cells. Buffers and media used will also vary with the plant tissue source and transformation protocol. Many transformation protocols employ a feeder layer of suspended culture cells (tobacco, for example) on the surface of solid media plates, separated by a sterile filter paper disk from the plant cells or tissues being transformed.

In some cases, plastid expression is desired. Transformation of plastids can be achieved by use of expression cassettes or expression vectors that include one or more of the following: delivery of expression cassettes or expression vectors across cell membranes and intracellular plastid membranes, one or more regions of homology with plastid DNA, enzyme nucleotide sequences optimized for plastid expression, one or more selectable markers for plastid transformation, segregation of genomic copies of the expression cassette within a plastid, or a combination thereof. Particle bombardment can be used for plastid transformation, but other methods can also be used. For example, polyethylene glycol (PEG) treatment of protoplasts has been used to transform plastids.

Electroporation: Where one wishes to introduce DNA by means of electroporation, it is contemplated that the method of Krzyzek et al. (U.S. Patent No. 5,384,253) may be advantageous. In this method, certain cell wall-degrading enzymes, such as pectin-degrading enzymes, are employed to render the target recipient cells more susceptible to transformation by electroporation than untreated cells. Alternatively, recipient cells can be made more susceptible to transformation, by mechanical wounding.

To effect transformation by electroporation, one may employ either friable tissues such as suspension cell cultures or embryogenic callus, or alternatively, one may transform immature embryos or other organized tissues directly. The cell walls of the preselected cells or organs can be partially degraded by exposing them to pectin-degrading enzymes (pectinases or pectolyases) or mechanically wounding them in a controlled manner. Such cells would then be receptive to DNA uptake by electroporation, which may be carried out at this stage, and transformed cells then identified by a suitable selection or screening protocol dependent on the nature of the newly incorporated DNA.

Microprojectile Bombardment: A further advantageous method for delivering transforming DNA segments to plant cells is microprojectile bombardment. In this method, microparticles may be coated with DNA and delivered into cells by a propelling force. Exemplary particles include those comprised of tungsten, gold, platinum, and the like.

In some cases, expression cassette / expression vector nucleic acids can be precipitated onto metal particles for DNA delivery using microprojectile bombardment. However, in some instances DNA precipitation onto metal particles would not be necessary for DNA delivery to a recipient cell using microprojectile bombardment. In an illustrative embodiment, non-embryogenic cells were bombarded with intact cells of the bacteria E. coli or Agrobacterium tumefaciens containing plasmids with either the β-glucuronidase or bar gene engineered for expression in selected plant cells. Bacteria were inactivated by ethanol dehydration prior to bombardment. A low level of transient expression of the β-glucuronidase gene was observed 24-48 hours following DNA delivery. In addition, stable transformants containing the bar gene were recovered following bombardment with either E. coli or Agrobacterium tumefaciens cells. It is contemplated that particles may contain DNA rather than be coated with DNA. Hence it is proposed that particles may increase the level of DNA delivery but are not, in and of themselves, necessary to introduce DNA into plant cells.

An advantage of microprojectile bombardment, in addition to its being an effective means of reproducibly stably transforming monocots, is that it requires neither the isolation of protoplasts (Christou et al., PNAS. 84:3962-3966 (1987)) nor the formation of partially degraded cells, and no susceptibility to Agrobacterium infection is required. An illustrative embodiment of a method for delivering DNA into maize cells by acceleration is a Biolistics Particle Delivery System, which can be used to propel particles coated with DNA or cells through a screen, such as a stainless steel or Nytex screen, onto a filter surface covered with maize cells cultured in suspension (Gordon-Kamm et al., The Plant Cell. 2:603-618 (1990)). The screen disperses the particles so that they are not delivered to the recipient cells in large aggregates. It is believed that a screen intervening between the projectile apparatus and the cells to be bombarded reduces the size of projectile aggregates and may contribute to a higher frequency of transformation by reducing the damage inflicted on recipient cells by an aggregated projectile.

For bombardment, cells in suspension are preferably concentrated on filters or solid culture medium. Alternatively, immature embryos or other target cells may be arranged on solid culture medium. The cells to be bombarded are positioned at an appropriate distance below the microprojectile stopping plate. If desired, one or more screens are also positioned between the acceleration device and the cells to be bombarded. Through the use of techniques set forth herein, one may obtain up to 1000 or more foci of cells transiently expressing a marker gene. The number of cells in a focus which express the exogenous gene product 48 hours post-bombardment often ranges from about 1 to 10 and averages about 1 to 3.

In bombardment transformation, one may optimize the prebombardment culturing conditions and the bombardment parameters to yield the maximum numbers of stable transformants. Both the physical and biological parameters for bombardment can influence transformation frequency. Physical factors are those that involve manipulating the DNA/microprojectile precipitate or those that affect the path and velocity of either the macro- or microprojectiles. Biological factors include all steps involved in manipulation of cells before and immediately after bombardment, the osmotic adjustment of target cells to help alleviate the trauma associated with the bombardment, and also the nature of the transforming DNA, such as linearized DNA or intact supercoiled plasmid DNA.

One may wish to adjust various bombardment parameters in small scale studies to fully optimize the conditions and/or to adjust physical parameters such as gap distance, flight distance, tissue distance, and helium pressure. One may also minimize the trauma reduction factors (TRFs) by modifying conditions which influence the physiological state of the recipient cells and which may therefore, influence transformation and integration efficiencies. For example, the osmotic state, tissue hydration and the subculture stage or cell cycle of the recipient cells may be adjusted for optimum transformation. Execution of such routine adjustments will be known to those of skill in the art.
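As one way to organize such small-scale studies, the physical parameters named above can be enumerated as a full factorial grid and the condition yielding the most stable transformants picked afterwards. This is only a sketch: every numeric value below is a hypothetical placeholder rather than a setting taken from this disclosure, and the function names are introduced here for illustration.

```python
from itertools import product

# Hypothetical placeholder levels for a small pilot study; real values
# depend on the delivery device, tissue, and protocol in use.
grid = {
    "gap_distance_mm": [3, 6, 9],
    "flight_distance_mm": [6, 10],
    "tissue_distance_cm": [6, 9, 12],
    "helium_pressure_psi": [650, 1100, 1350],
}


def enumerate_conditions(parameter_grid):
    """Return every combination of parameter levels as a list of dicts."""
    names = list(parameter_grid)
    levels = (parameter_grid[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*levels)]


def best_condition(results):
    """Given (condition, stable_transformant_count) pairs, return the
    condition with the highest count."""
    return max(results, key=lambda pair: pair[1])[0]


conditions = enumerate_conditions(grid)
```

With three, two, three, and three levels, the grid holds 54 conditions; a fractional design could reduce that number when tissue is limiting.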

Selection: An exemplary embodiment of methods for identifying transformed cells involves exposing the bombarded cultures to a selective agent, such as a metabolic inhibitor, an antibiotic, or the like. Cells which have been transformed and have stably integrated a marker gene conferring resistance to the selective agent used, will grow and divide in culture. Sensitive cells will not be amenable to further culturing.

To use the bar-bialaphos or the EPSPS-glyphosate selective system, bombarded tissue is cultured for about 0-28 days on nonselective medium and subsequently transferred to medium containing from about 1-3 mg/l bialaphos or about 1-3 mM glyphosate, as appropriate. While ranges of about 1-3 mg/l bialaphos or about 1-3 mM glyphosate can be employed, it is proposed that ranges of at least about 0.1-50 mg/l bialaphos or at least about 0.1-50 mM glyphosate will find utility in the practice of the invention. Tissue can be placed on any porous, inert, solid or semi-solid support for bombardment, including but not limited to filters and solid culture medium. Bialaphos and glyphosate are provided as examples of agents suitable for selection of transformants, but the technique of this invention is not limited to them.
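Preparing selective medium at a concentration in these ranges is a simple dilution calculation (C1·V1 = C2·V2). The sketch below is illustrative only: the 1000 mg/l stock concentration and the 500 ml batch size are assumptions, not values from this disclosure, and the function names are hypothetical. The range check uses the 1-3 and 0.1-50 figures stated above.

```python
def stock_volume_ml(target_conc, stock_conc, final_volume_ml):
    """Volume of stock (ml) needed so the final medium reaches target_conc.

    target_conc and stock_conc must share units (e.g. mg/l or mM).
    """
    if target_conc <= 0 or final_volume_ml <= 0:
        raise ValueError("concentration and volume must be positive")
    if stock_conc <= target_conc:
        raise ValueError("stock must be more concentrated than the target")
    return target_conc * final_volume_ml / stock_conc


def classify_concentration(conc, typical=(1.0, 3.0), broad=(0.1, 50.0)):
    """Label a concentration against the typical (1-3) and broader
    (0.1-50) ranges given in the text."""
    if typical[0] <= conc <= typical[1]:
        return "typical"
    if broad[0] <= conc <= broad[1]:
        return "broad"
    return "outside"


# Assumed example: 3 mg/l bialaphos in 500 ml medium from a 1000 mg/l stock.
volume = stock_volume_ml(3.0, 1000.0, 500.0)
```

The same helper applies to glyphosate when both concentrations are given in mM.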

The enzyme luciferase is also useful as a screenable marker in the context of the present invention. In the presence of the substrate luciferin, cells expressing luciferase emit light which can be detected on photographic or X-ray film, in a luminometer (or liquid scintillation counter), by devices that enhance night vision, or by a highly light sensitive video camera, such as a photon counting camera. All of these assays are nondestructive and transformed cells may be cultured further following identification. The photon counting camera is especially valuable as it allows one to identify specific cells or groups of cells which are expressing luciferase and manipulate those in real time.

It is further contemplated that combinations of screenable and selectable markers may be useful for identification of transformed cells. For example, selection with a growth inhibiting compound, such as bialaphos or glyphosate at concentrations that provide 100% inhibition followed by screening of growing tissue for expression of a screenable marker gene such as luciferase would allow one to recover transformants from cell or tissue types that are not amenable to selection alone.

Regeneration and Seed Production: Cells that survive the exposure to the selective agent, or cells that have been scored positive in a screening assay, are cultured in media that supports regeneration of plants. One example of a growth regulator that can be used for such purposes is dicamba or 2,4-D. However, other growth regulators may be employed, including NAA, NAA + 2,4-D or perhaps even picloram. Media improvement in these and like ways can facilitate the growth of cells at specific developmental stages. Tissue can be maintained on a basic media with growth regulators until sufficient tissue is available to begin plant regeneration efforts, or following repeated rounds of manual selection, until the morphology of the tissue is suitable for regeneration, at least two weeks, then transferred to media conducive to maturation of embryoids. Cultures are typically transferred every two weeks on this medium. Shoot development signals the time to transfer to medium lacking growth regulators.

The transformed cells, identified by selection or screening and cultured in an appropriate medium that supports regeneration, can then be allowed to mature into plants. Developing plantlets are transferred to soilless plant growth mix, and hardened, e.g., in an environmentally controlled chamber at about 85% relative humidity, about 600 ppm CO2, and at about 25-250 microeinsteins/(sec·m²) of light. Plants can be matured either in a growth chamber or greenhouse. Plants are regenerated from about 6 weeks to 10 months after a transformant is identified, depending on the initial tissue. During regeneration, cells are grown on solid media in tissue culture vessels. Illustrative embodiments of such vessels are petri dishes and Plant Con™. Regenerating plants can be grown at about 19 °C to 28 °C. After the regenerating plants have reached the stage of shoot and root development, they may be transferred to a greenhouse for further growth and testing.

Mature plants are then obtained from cell lines that are known to express the trait. In some embodiments, the regenerated plants are self-pollinated. In addition, pollen obtained from the regenerated plants can be crossed to seed grown plants of agronomically important inbred lines. In some cases, pollen from plants of these inbred lines is used to pollinate regenerated plants. The trait is genetically characterized by evaluating the segregation of the trait in first and later generation progeny. The heritability and expression in plants of traits selected in tissue culture are of particular importance if the traits are to be commercially useful.
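Evaluating segregation in progeny is commonly done with a chi-square goodness-of-fit test. The sketch below assumes, purely for illustration, a single dominant transgene locus in a self-pollinated hemizygous plant, which predicts a 3:1 ratio of progeny with and without the trait; the progeny counts are hypothetical. The critical value 3.841 is the standard chi-square threshold for one degree of freedom at the 5% level.

```python
CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_segregation(with_trait, without_trait, ratio=(3, 1)):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = with_trait + without_trait
    if total <= 0:
        raise ValueError("at least one progeny plant is required")
    expected_with = total * ratio[0] / sum(ratio)
    expected_without = total * ratio[1] / sum(ratio)
    return ((with_trait - expected_with) ** 2 / expected_with
            + (without_trait - expected_without) ** 2 / expected_without)


def fits_ratio(with_trait, without_trait, ratio=(3, 1)):
    """True if the counts do not differ significantly from the ratio."""
    statistic = chi_square_segregation(with_trait, without_trait, ratio)
    return statistic < CRITICAL_5PCT_DF1


# Hypothetical counts: 78 progeny with the trait, 22 without.
statistic = chi_square_segregation(78, 22)
```

Other expected ratios, such as 15:1 for two unlinked insertions, can be tested by passing a different `ratio`.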

Regenerated plants can be repeatedly crossed to inbred plants to introgress the nucleic acids encoding an enzyme into the genome of the inbred plants. This process is referred to as backcross conversion. When a sufficient number of crosses to the recurrent inbred parent have been completed in order to produce a product of the backcross conversion process that is substantially isogenic with the recurrent inbred parent except for the presence of the introduced nucleic acids, the plant is self-pollinated at least once in order to produce a homozygous backcross converted inbred containing the nucleic acids encoding the enzyme(s). Progeny of these plants are true breeding.
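How many crosses count as "sufficient" can be estimated from the expected share of the recurrent parent genome, which under simple Mendelian expectations is 1 - (1/2)^(n+1) after the initial cross plus n backcrosses. The sketch below is a back-of-the-envelope aid, not a breeding protocol: it ignores linkage drag and marker-assisted selection, and the 99% target is an assumed example.

```python
from fractions import Fraction


def recurrent_fraction(n_backcrosses: int) -> Fraction:
    """Expected recurrent-parent genome share after the initial cross
    followed by n_backcrosses backcrosses to the recurrent parent."""
    if n_backcrosses < 0:
        raise ValueError("number of backcrosses cannot be negative")
    return 1 - Fraction(1, 2) ** (n_backcrosses + 1)


def backcrosses_needed(target: Fraction) -> int:
    """Smallest number of backcrosses whose expected recurrent-parent
    share reaches the target (0 <= target < 1)."""
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while recurrent_fraction(n) < target:
        n += 1
    return n


# Assumed example target: at least 99% recurrent-parent genome.
needed = backcrosses_needed(Fraction(99, 100))
```

Under these assumptions the first backcross generation averages 3/4 recurrent-parent genome, and each further backcross halves the remaining donor share.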

Alternatively, seed from transformed plants regenerated from transformed tissue cultures is grown in the field and self-pollinated to generate true breeding plants. Seed from the fertile transgenic plants can then be evaluated for the presence and/or expression of the enzyme(s). Transgenic plant and/or seed tissue can be analyzed for enzyme expression using methods such as SDS polyacrylamide gel electrophoresis, Western blot, liquid chromatography (e.g., HPLC) or other means of detecting an enzyme product (e.g., a terpene, diterpene, terpenoid, or a combination thereof).

Once a transgenic seed expressing the enzyme(s) and producing one or more terpenes, diterpenes, and/or terpenoids in the plant is identified, the seed can be used to develop true breeding plants. The true breeding plants are used to develop a line of plants expressing terpenes, diterpenes, and/or terpenoids in various plant tissues (e.g., in leaves, bracts, and/or trichomes) while still maintaining other desirable functional agronomic traits. Adding the trait of terpene, diterpene, and/or terpenoid production can be accomplished by back-crossing with selected desirable functional agronomic trait(s) and with plants that do not exhibit such traits and studying the pattern of inheritance in segregating generations. Those plants expressing the target trait(s) in a dominant fashion are preferably selected. Back-crossing is carried out by crossing the original fertile transgenic plants with a plant from an inbred line exhibiting desirable functional agronomic characteristics while not necessarily expressing the trait of terpene, diterpene, and/or terpenoid production in the plant. The resulting progeny can then be crossed back to the parent that expresses the terpenes, diterpenes, and/or terpenoids. The progeny from this cross will also segregate so that some of the progeny carry the trait and some do not. This back-crossing is repeated until the goal of acquiring an inbred line with the desirable functional agronomic traits, and with production of terpenes, diterpenes, and/or terpenoids within various tissues of the plant is achieved. The enzymes can be expressed in a dominant fashion.

Subsequent to back-crossing, the new transgenic plants can be evaluated for synthesis of terpenes, diterpenes, and/or terpenoids in selected plant lines. This can be done, for example, by gas chromatography, mass spectroscopy, or NMR analysis of whole plant cell walls (Kim, H., and Ralph, J. Solution-state 2D NMR of ball-milled plant cell wall gels in DMSO-d6/pyridine-d5. (2010) Org. Biomol. Chem. 8(3), 576-591; Yelle, D. J., Ralph, J., and Frihart, C. R. Characterization of non-derivatized plant cell walls using high-resolution solution-state NMR spectroscopy. (2008) Magn. Reson. Chem. 46(6), 508-517; Kim, H., Ralph, J., and Akiyama, T. Solution-state 2D NMR of Ball-milled Plant Cell Wall Gels in DMSO-d6. (2008) BioEnergy Research 1(1), 56-66; Lu, F., and Ralph, J. Non-degradative dissolution and acetylation of ball-milled plant cell walls; high-resolution solution-state NMR. (2003) Plant J 35(4), 535-544). The new transgenic plants can also be evaluated for a battery of functional agronomic characteristics such as lodging, yield, resistance to disease, resistance to insect pests, drought resistance, and/or herbicide resistance.

Determination of Stably Transformed Plant Tissues: To confirm the presence of the nucleic acids encoding terpene synthesizing enzymes in the regenerating plants, or seeds or progeny derived from the regenerated plant, a variety of assays may be performed. Such assays include, for example, molecular biological assays, such as Southern and Northern blotting and PCR; and biochemical assays, such as detecting the presence of enzyme products, for example, by enzyme assays or by immunological assays (ELISAs and Western blots). Various plant parts can be assayed, such as trichomes, leaves, bracts, seeds or roots. In some cases, the phenotype of the whole regenerated plant can be analyzed.

Whereas DNA analysis techniques may be conducted using DNA isolated from any part of a plant, RNA may only be expressed in particular cells or tissue types and so RNA for analysis can be obtained from those tissues. PCR techniques may also be used for detection and quantification of RNA produced from introduced nucleic acids. PCR can also be used to reverse transcribe RNA into DNA, using enzymes such as reverse transcriptase, and then this DNA can be amplified through the use of conventional PCR techniques. Further information about the nature of the RNA product may be obtained by Northern blotting. This technique will demonstrate the presence of an RNA species and give information about the integrity of that RNA. The presence or absence of an RNA species can also be determined using dot or slot blot Northern hybridizations. These techniques are modifications of Northern blotting and also demonstrate the presence or absence of an RNA species.

While Southern blotting may be used to detect the nucleic acid encoding the enzyme(s) in question, it may not provide information as to whether the preselected DNA segment is being expressed. Expression may be evaluated by specifically identifying the protein products of the introduced nucleic acids or evaluating the phenotypic changes brought about by their expression.

Assays for the production and identification of specific proteins may make use of physical-chemical, structural, functional, or other properties of the proteins. Unique physical-chemical or structural properties allow the proteins to be separated and identified by electrophoretic procedures, such as, native or denaturing gel electrophoresis or isoelectric focusing, or by chromatographic techniques such as ion exchange, liquid chromatography or gel exclusion chromatography. The unique structures of individual proteins offer opportunities for use of specific antibodies to detect their presence in formats such as an ELISA assay. Combinations of approaches may be employed with even greater specificity such as Western blotting in which antibodies are used to locate individual gene products that have been separated by electrophoretic techniques. Additional techniques may be employed to absolutely confirm the identity of the enzyme such as evaluation by amino acid sequencing following purification. Other procedures may be additionally used.

The expression of a gene product can also be determined by evaluating the phenotypic results of its expression. These assays also may take many forms including but not limited to analyzing changes in the chemical composition, morphology, or physiological properties of the plant. Chemical composition may be altered by expression of preselected DNA segments encoding storage proteins which change amino acid composition and may be detected by amino acid analysis.

Hosts

Terpenes, including diterpenes and terpenoids, can be made in a variety of host organisms. As used herein, a “host” means a cell, tissue or organism capable of replication. The host can have an expression cassette or expression vector that can include a nucleic acid segment encoding an enzyme that is involved in the biosynthesis of terpenes.

The term “host cell”, as used herein, refers to any prokaryotic or eukaryotic cell that can be transformed with an expression cassette or vector carrying the nucleic acid segment encoding one or more LDSP, enzyme, LDSP-protein fusion, or a combination thereof that is involved in the biosynthesis of one or more terpenes. The host cells can, for example, be a plant, bacterial, insect, or yeast cell. Expression cassettes encoding biosynthetic enzymes can be incorporated or transferred into a host cell to facilitate manufacture of the enzymes described herein or the terpene, diterpene, or terpenoid products of those enzymes.

For example, the enzymes, terpenes, diterpenes, and terpenoids can be made in plants or plant cells. The terpenes, diterpenes, and terpenoids can, for example, be made and extracted from whole plants, plant parts, plant cells, or a combination thereof. Enzymes can also be made, for example, in insect, plant, or fungal (e.g., yeast) cells.

Examples of host cells include, without limitation, tobacco cells such as Nicotiana benthamiana, Nicotiana tabacum, Nicotiana rustica, Nicotiana excelsior, and Nicotiana excelsiana cells; cells of the genus Escherichia such as the species Escherichia coli; cells of the genus Clostridium such as the species Clostridium ljungdahlii, Clostridium autoethanogenum or Clostridium kluyveri; cells of the genus Corynebacterium such as the species Corynebacterium glutamicum; cells of the genus Cupriavidus such as the species Cupriavidus necator or Cupriavidus metallidurans; cells of the genus Pseudomonas such as the species Pseudomonas fluorescens, Pseudomonas putida or Pseudomonas oleovorans; cells of the genus Delftia such as the species Delftia acidovorans; cells of the genus Bacillus such as the species Bacillus subtilis; cells of the genus Lactobacillus such as the species Lactobacillus delbrueckii; or cells of the genus Lactococcus such as the species Lactococcus lactis.

“Host cells” can further include, without limitation, those from yeast and other fungi, as well as, for example, insect cells. Examples of suitable eukaryotic host cells include yeasts and fungi from the genus Aspergillus such as Aspergillus niger; from the genus Saccharomyces such as Saccharomyces cerevisiae; from the genus Candida such as C. tropicalis, C. albicans, C. cloacae, C. guilliermondii, C. intermedia, C. maltosa, C. parapsilosis, and C. zeylanoides; from the genus Pichia (or Komagataella) such as Pichia pastoris; from the genus Yarrowia such as Yarrowia lipolytica; from the genus Issatchenkia such as Issatchenkia orientalis; from the genus Debaryomyces such as Debaryomyces hansenii; from the genus Arxula such as Arxula adeninivorans; from the genus Kluyveromyces such as Kluyveromyces lactis; or from the genera Exophiala, Mucor, Trichoderma, Cladosporium, Phanerochaete, Cladophialophora, Paecilomyces, Scedosporium, and Ophiostoma.

The host cells can have organelles that facilitate manufacture or storage of the terpenes, diterpenes, and terpenoids. Such organelles can include lipid droplets. During and after production of the terpenes, diterpenes, and terpenoids, these organelles can be isolated as a semi-pure source of the terpenes, diterpenes, and terpenoids.

As illustrated herein, terpenoid yields obtained using the methods described herein demonstrate the versatility of the transient N. benthamiana system as a platform to produce terpenoids at industrial scales in economically relevant biomass crops.

Methods

Methods are described herein that are useful for synthesizing terpenes. The methods can involve incubating cells or tissues having at least one heterologous expression cassette or expression vector that can express any of the enzymes and/or proteins described herein.

For example, one method can involve (a) incubating a population of host cells or host tissue comprising any of the expression systems, enzymes, lipid droplet, and/or fusion proteins described herein; and (b) isolating lipids from the population of host cells or the host tissue. In some cases, the host cells or the host tissue can be in a plant, in which case the incubating step is a cultivating step where the plant is cultivated in an environment suitable for plant growth.

Another example of a method can involve (a) incubating a population of host cells or a host tissue, or cultivating a host seed or a host plant, where the population of host cells, the host tissue, host seed, or cells of the host plant has an expression system having at least one expression cassette having a heterologous promoter operably linked to a nucleic acid segment encoding a fusion protein comprising a lipid droplet surface protein linked in-frame to one or more fusion partners such as a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), patchoulol synthase, or WRI1 protein; and (b) isolating lipids from the population of host cells, the host plant’s cells, or the host tissue. In some cases, a combination of enzymes, transcription factors, and lipid droplet proteins can be expressed in host cells, host plant, or host tissues. For example, high diterpenoid yields were obtained when cells or tissues were engineered to co-express DXS, GGDPS (MtGGDPS, TsGGDPS, or EpGGDPS2), and AgABS and these enzymes were targeted to plastids by fusion to a plastid-targeting peptide (see FIGs. 2A-2B, and 3B). Added expression of AtWRI1(1-397) did not significantly affect diterpenoid production. Hence, it can be useful to use cells or tissues in such methods when the cells or tissues produce enzymes DXS, GGDPS, and ABS in plastids with or without expression of the WRI1 transcription factor.

In another example, high diterpenoid yields were obtained when each of the following was expressed in the cytosol: HMGR(159-582), MtGGDPS, and AgABS(85-868) (FIG. 2C and FIG. 3B). Added expression of AtWRI1(1-397) and NoLDSP did not significantly affect diterpenoid production.

In another example, high diterpenoid yields were obtained when cells or tissues were engineered to co-produce cytosolic HMGR (e.g., cytosol:HMGR(159-582)), cytosolic GGDPS (e.g., cytosol:MtGGDPS), LDSP-fused ABS (e.g., LD:AgABS(85-868)), and WRI1 (FIG. 5).

To produce other types of terpenes and terpenoids, different types of enzymes can be used. For example, for production of functionalized diterpenoids in lipid droplets the following combinations of enzymes can be used: WRI1, LDSP, DXS (plastid), GGDPS (plastid), ABS (plastid), and either CYP (ER) or [CYP (LD) and CPR (LD)] (see, e.g., FIG. 5). Note that ER means that the enzyme or protein is localized in the endoplasmic reticulum, while LD means that the enzyme or protein is targeted to lipid droplets (e.g., because the enzyme or protein is fused to LDSP).

In another example, the following combinations of enzymes can be used to produce functionalized diterpenoids that are sequestered within or on lipid droplets: WRI1, LDSP, HMGR (cytosol), GGDPS (cytosol), ABS (cytosol), and CYP (ER) (see, e.g., FIG. 5).

In another example, the following combinations of enzymes can be used to produce functionalized diterpenoids in lipid droplets: WRI1, HMGR (cytosol), GGDPS (cytosol), ABS (LD), CYP (LD) and CPR (LD).
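The enzyme combinations listed in the preceding examples can be summarized as a small lookup table, which may help when comparing which components are targeted to which compartment. This is a sketch for illustration only: the dictionary name, the strategy keys, and the helper function are hypothetical labels, and the compartment assignments simply restate the combinations given in the text.

```python
# Illustrative sketch: STRATEGIES, its keys, and components_in are hypothetical
# labels. None marks a component for which the text names no compartment.
STRATEGIES = {
    "plastid_pathway_cyp_er": {
        "WRI1": None, "LDSP": None,
        "DXS": "plastid", "GGDPS": "plastid", "ABS": "plastid",
        "CYP": "ER",
    },
    "plastid_pathway_cyp_ld": {
        "WRI1": None, "LDSP": None,
        "DXS": "plastid", "GGDPS": "plastid", "ABS": "plastid",
        "CYP": "LD", "CPR": "LD",
    },
    "cytosolic_pathway_cyp_er": {
        "WRI1": None, "LDSP": None,
        "HMGR": "cytosol", "GGDPS": "cytosol", "ABS": "cytosol",
        "CYP": "ER",
    },
    "cytosolic_pathway_ld_anchored": {
        "WRI1": None,
        "HMGR": "cytosol", "GGDPS": "cytosol",
        "ABS": "LD", "CYP": "LD", "CPR": "LD",
    },
}


def components_in(strategy_name, compartment):
    """Return the sorted components of a strategy assigned to a compartment."""
    strategy = STRATEGIES[strategy_name]
    return sorted(name for name, where in strategy.items() if where == compartment)


if __name__ == "__main__":
    for name in STRATEGIES:
        print(name, "LD-targeted:", components_in(name, "LD"))
```

Components listed under LD are those that would be expressed as LDSP fusions in the corresponding combination.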

Definitions

As used herein, “isolated” means a nucleic acid, polypeptide, or product has been removed from its natural or native cell. Thus, the nucleic acid, polypeptide, or product can be physically isolated from the cell, or the nucleic acid or polypeptide can be present or maintained in another cell where it is not naturally present or synthesized. The isolated nucleic acid, the isolated polypeptide, or the isolated product can also be a nucleic acid, protein, or product that is modified but has been introduced into a cell where it is or was naturally present. Thus, a modified isolated nucleic acid or an isolated polypeptide expressed from a modified isolated nucleic acid can be present in a cell along with a wild type copy of the (unmodified) natural nucleic acid and along with wild type copies of the (natural) polypeptide.

As used herein, a “native” nucleic acid or polypeptide means a DNA, RNA, amino acid sequence or segment thereof that has not been manipulated in vivo or in vitro, i.e., has not been isolated, purified, amplified, mutated, and/or modified.

The term “transgenic” when used in reference to a plant or leaf or vegetative tissue or seed, for example a “transgenic plant,” “transgenic leaf,” “transgenic vegetative tissue,” “transgenic seed,” or a “transgenic host cell,” refers to a plant or leaf or tissue or seed that contains at least one heterologous or foreign gene in one or more of its cells. The term “transgenic plant material” refers broadly to a plant, a plant structure, a plant tissue, a plant seed or a plant cell that contains at least one heterologous gene in one or more of its cells.

The term “transgene” refers to a foreign gene that is placed into an organism or host cell by the process of transfection. The term “foreign nucleic acid” refers to any nucleic acid (e.g., encoding a promoter or coding region) that is introduced into the genome of an organism or tissue of an organism or a host cell by experimental manipulations, such as those described herein, and may include nucleic acid sequences found in that organism so long as the introduced gene does not reside in the same location as does the naturally occurring gene.

The term “host cell” refers to any cell capable of replicating and/or transcribing and/or translating a heterologous nucleic acid. Thus, a “host cell” refers to any eukaryotic or prokaryotic cell (e.g., plant cells, algal cells, bacterial cells, yeast cells, E. coli, insect cells, etc.), whether located in vitro or in vivo. For example, a host cell may be located in a transgenic plant or located in a plant part or part of a plant tissue or in cell culture.

As used herein, the term “wild-type” when made in reference to a gene refers to a functional gene common throughout an outbred population. As used herein, the term “wild-type” when made in reference to a gene product refers to a functional gene product common throughout an outbred population. A functional wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene.

As used herein, the term “plant” is used in its broadest sense. It includes, but is not limited to, any species of grass (fodder, ornamental or decorative), crop or cereal, fodder or forage, fruit or vegetable, fruit plant or vegetable plant, herb plant, woody plant, flower plant or tree. It is not meant to limit a plant to any particular structure. It also refers to a unicellular plant (e.g., microalga) and a plurality of plant cells that are largely differentiated into a colony (e.g., volvox) or a structure that is present at any stage of a plant's development. Such structures include, but are not limited to, a seed, a tiller, a sprig, a stolon, a plug, a rhizome, a shoot, a stem, a leaf, a flower petal, a fruit, et cetera.

The term “plant tissue” includes differentiated and undifferentiated tissues of plants including those present in roots, shoots, leaves, pollen, seeds and tumors, as well as cells in culture (e.g., single cells, protoplasts, embryos, callus, etc.). Plant tissue may be in planta, in organ culture, tissue culture, or cell culture.

As used herein, the term “plant part” refers to a plant structure or a plant tissue, for example, pollen, an ovule, a tissue, a pod, a seed, a leaf and a cell. Plant parts may comprise one or more of a tiller, plug, rhizome, sprig, stolon, meristem, crown, and the like. In some instances, the plant part can include vegetative tissues of the plant.

Vegetative tissues or vegetative plant parts do not include plant seeds, and instead include non-seed tissues or parts of a plant. The vegetative tissues can include reproductive tissues of a plant, but not the mature seeds.

The term “seed” refers to a ripened ovule, consisting of the embryo and a casing.

The term “propagation” refers to the process of producing new plants, either by vegetative means involving the rooting or grafting of pieces of a plant, or by sowing seeds. The terms “vegetative propagation” and “asexual reproduction” refer to the ability of plants to reproduce without sexual reproduction, by producing new plants from existing vegetative structures that are clones, i.e., plants that are identical in all attributes to the mother plant and to one another. For example, the division of a clump, rooting of proliferations, or cutting of mature crowns can produce a new plant.

The term “heterologous” when used in reference to a nucleic acid refers to a nucleic acid that has been manipulated in some way. For example, a heterologous nucleic acid includes a nucleic acid from one species introduced into another species. A heterologous nucleic acid also includes a nucleic acid native to an organism that has been altered in some way (e.g., mutated, added in multiple copies, linked to a non-native promoter or enhancer sequence, etc.). Heterologous nucleic acids can include cDNA forms of a nucleic acid; the cDNA may be expressed in either a sense (to produce mRNA) or anti-sense orientation (to produce an anti-sense RNA transcript that is complementary to the mRNA transcript). For example, heterologous nucleic acids can be distinguished from endogenous plant nucleic acids in that the heterologous nucleic acids are typically joined to nucleic acids comprising regulatory elements such as promoters that are not found naturally associated with the natural gene for the protein encoded by the heterologous gene. Heterologous nucleic acids can also be distinguished from endogenous plant nucleic acids in that the heterologous nucleic acids are in an unnatural chromosomal location or are associated with portions of the chromosome not found in nature (e.g., the heterologous nucleic acids are expressed in tissues where the gene is not normally expressed).

The term “expression” when used in reference to a nucleic acid sequence, such as a gene, refers to the process of converting genetic information encoded in a gene into RNA (e.g., mRNA, rRNA, tRNA, or snRNA) through “transcription” of the gene (i.e., via the enzymatic action of an RNA polymerase), and into protein where applicable (as when a gene encodes a protein), through “translation” of mRNA. Gene expression can be regulated at many stages in the process. “Up-regulation” or “activation” refers to regulation that increases the production of gene expression products (i.e., RNA or protein), while “down-regulation” or “repression” refers to regulation that decreases production. Molecules (e.g., transcription factors) that are involved in up-regulation or down-regulation are often called “activators” and “repressors,” respectively.

The terms “in operable combination,” “in operable order,” and “operably linked” refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the transcription of a coding region (e.g., gene) and/or the synthesis of a desired protein molecule is produced. The term also refers to the linkage of amino acid sequences in such a manner so that a functional protein is produced.

Transcriptional control signals in eukaryotes comprise “promoter” and “enhancer” elements. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription (see, e.g., Maniatis, et al. (1987) Science 236:1237; herein incorporated by reference). Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect, mammalian and plant cells. Promoter and enhancer elements have also been isolated from viruses and analogous control elements, such as promoters, are also found in prokaryotes. The selection of a particular promoter and enhancer depends on the cell type used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types (for review, see Maniatis, et al. (1987), supra; herein incorporated by reference).

The terms “promoter element,” “promoter,” or “promoter sequence” refer to a DNA sequence that is located at the 5' end of the coding region of a DNA polymer. The location of most promoters known in nature is 5' to the transcribed region. The promoter functions as a switch, activating the expression of a gene. If the gene is activated, it is said to be transcribed, or is participating in transcription. Transcription involves the synthesis of mRNA from the gene. The promoter, therefore, serves as a transcriptional regulatory element and also provides a site for initiation of transcription of the gene into mRNA.

The term “regulatory region” refers to a gene's 5' transcribed but untranslated regions, located immediately downstream from the promoter and ending just prior to the translational start of the gene.

The term “promoter region” refers to the region immediately upstream of the coding region of a DNA polymer and is typically between about 500 bp and 4 kb in length and is preferably about 1 to 1.5 kb in length. Promoters may be tissue specific or cell specific.

The term “tissue specific” as it applies to a promoter refers to a promoter that is capable of directing selective expression of a nucleic acid of interest to a specific type of tissue (e.g., vegetative tissues) in the relative absence of expression of the same nucleic acid of interest in a different type of tissue (e.g., seeds). Tissue specificity of a promoter may be evaluated by, for example, operably linking a reporter gene and/or a reporter gene expressing a reporter molecule, to the promoter sequence to generate a reporter construct, introducing the reporter construct into the genome of a plant such that the reporter construct is integrated into every tissue of the resulting transgenic plant, and detecting the expression of the reporter gene (e.g., detecting mRNA, protein, or the activity of a protein encoded by the reporter gene) in different tissues of the transgenic plant. The detection of a greater level of expression of the reporter gene in one or more tissues relative to the level of expression of the reporter gene in other tissues shows that the promoter is specific for the tissues in which greater levels of expression are detected.

The term “cell type specific” as applied to a promoter refers to a promoter that is capable of directing selective expression of a nucleic acid of interest in a specific type of cell in the relative absence of expression of the same nucleic acid of interest in a different type of cell within the same tissue. The term “cell type specific” when applied to a promoter also means a promoter capable of promoting selective expression of a nucleotide sequence of interest in a region within a single tissue. Cell type specificity of a promoter may be assessed using methods well known in the art, e.g., immunohistochemical staining. Briefly, tissue sections are embedded in paraffin, and paraffin sections are reacted with a primary antibody that is specific for the polypeptide product encoded by the nucleic acid of interest whose expression is controlled by the promoter. A labeled (e.g., peroxidase conjugated) secondary antibody that is specific for the primary antibody is allowed to bind to the sectioned tissue and specific binding detected (e.g., with avidin/biotin) by microscopy.

Promoters may be “constitutive” or “inducible.” The term “constitutive” when made in reference to a promoter means that the promoter is capable of directing transcription of an operably linked nucleic acid in the absence of a stimulus (e.g., heat shock, chemicals, light, etc.). Typically, constitutive promoters are capable of directing expression of a transgene in substantially any cell and any tissue. Exemplary constitutive plant promoters include, but are not limited to, Cauliflower Mosaic Virus (CaMV 35S; see e.g., U.S. Pat. No. 5,352,605, incorporated herein by reference), mannopine synthase, octopine synthase (ocs), superpromoter (see e.g., WO 95/14098; herein incorporated by reference), and ubi3 promoters (see e.g., Garbarino and Belknap, Plant Mol. Biol. 24:119-127 (1994); herein incorporated by reference). Such promoters have been used successfully to direct the expression of heterologous nucleic acid sequences in transformed plant tissue.

In contrast, an “inducible” promoter is one that is capable of directing a level of transcription of an operably linked nucleic acid in the presence of a stimulus (e.g., heat shock, chemicals, light, etc.) that is different from the level of transcription of the operably linked nucleic acid in the absence of the stimulus.

The term “vector” refers to nucleic acid molecules that transfer DNA segment(s). Transfer can be into a cell, cell to cell, et cetera. The term “vehicle” is sometimes used interchangeably with “vector.” The vector can, for example, be a plasmid, but the vector need not be a plasmid.

As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Also, as used herein, “and/or” refers to, and encompasses, any and all possible combinations of one or more of the associated listed items. Unless otherwise defined, all terms, including technical and scientific terms used in the description, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

The term “about”, as used herein, can allow for a degree of variability in a value or range, for example, within 10%, within 5%, or within 1% of a stated value or of a stated limit of a range.
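The tolerance in this definition can be expressed as a simple relative comparison. The sketch below is illustrative only: the function name and the choice of 10% as the default tolerance are assumptions, and the definition equally permits 5% or 1%.

```python
# Illustrative sketch: is_about and its default tolerance are assumptions.

def is_about(value, stated, tolerance=0.10):
    """True when value lies within the given relative tolerance of the stated value."""
    return abs(value - stated) <= tolerance * abs(stated)


if __name__ == "__main__":
    # Hypothetical check against a stated temperature of about 25 degrees.
    print(is_about(27, 25))
    print(is_about(27, 25, tolerance=0.05))
```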

The term “enzyme” or “enzymes”, as used herein, refers to a protein catalyst capable of catalyzing a reaction. Herein, the term does not mean only an isolated enzyme, but also includes a host cell expressing that enzyme. Accordingly, the conversion of A to B by enzyme C should also be construed to encompass the conversion of A to B by a host cell expressing enzyme C.

The terms “identical” or percent “identity”, as used herein, in the context of two or more nucleic acids, or two or more polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of nucleotides or amino acid residues that are the same (e.g., 75% identity, 80% identity, 85% identity, 90% identity, 95% identity, 97% identity, 98% identity, 99% identity, or 100% identity in pairwise comparison). Sequence identity can be determined by comparison and/or alignment of sequences for maximum correspondence over a comparison window, or over a designated region as measured using a sequence comparison algorithm, or by manual alignment and visual inspection. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the results by 100 to yield the percentage of sequence identity. A “reference sequence” is a defined sequence used as a basis for a sequence comparison; a reference sequence may be a subset of a larger sequence.

As used herein the term “terpene” includes any type of terpene or terpenoid, including for example any monoterpene, diterpene, sesquiterpene, sesterterpene, triterpene, tetraterpene, polyterpene, and any mixture thereof.
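The percentage calculation described in this definition can be sketched as follows for two sequences that have already been aligned over the comparison window. The function name is hypothetical, the alignment step itself is not shown, and treating a gap character ("-") as a non-matching position is an assumption made for this illustration.

```python
# Illustrative sketch: percent_identity is a hypothetical name, and the inputs
# are assumed to be pre-aligned sequences of equal length ("-" marks a gap).

def percent_identity(aligned_a, aligned_b):
    """Matched positions divided by window length, multiplied by 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # First ten residues of SEQ ID NO:43 against a copy with one substitution.
    print(percent_identity("MELYAQSVGV", "MELYAQTVGV"))
```

Under this convention, a sequence with at least 95% identity to a 100-position window could differ at no more than five positions.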

The following non-limiting Examples describe some procedures that can be performed to facilitate making and using the invention.

Example 1: Materials and Methods

This Example describes some of the materials and methods used in the development of the invention.

Generation of constructs for transient expression studies in N. benthamiana

The open reading frames encoding truncated A. thaliana WRINKLED1 (AtWRI1(1-397), AY254038.2) and full-length N. oceanica lipid droplet surface protein (NoLDSP, JQ268559.1) were amplified from existing cDNAs.

The coding sequences for truncated cytosolic E. lathyris HMGR (ElHMGR(159-582), JQ694150.1), cytosolic A. thaliana FDPS (cytosol:AtFDPS, NM_117823.4), cytosolic P. cablin patchoulol synthase (cytosol:PcPAS, AY508730), plastidic A. grandis abietadiene synthase (plastid:AgABS, U50768.1), and plastidic P. barbatus DXS (PbDXS) were amplified from cDNAs derived from total RNA of the host organisms.

An amino acid sequence for a cytosolic P. cablin patchoulol synthase (cytosohPcPAS, AY508730; SEQ ID NO:43) is shown below.

1 MELYAQSVGV GAASRPLANF HPCVWGDKFI VYNPQSCQAG

41 EREEAEELKV ELKRELKEAS DNYMRQLKMV DAIQRLGIDY

81 LFVEDVDEAL KNLFEMFDAF CKNNHDMHAT ALSFRLLRQH

121 GYRVSCEVFE KFKDGKDGFK VPNEDGAVAV LEFFEATHLR

161 VHGEDVLDNA FDFTRNYLES VYATLNDPTA KQVHNALNEF

201 SFRRGLPRVE ARKYISIYEQ YASHHKGLLK LAKLDFNLVQ

241 ALHRRELSED SRWWKTLQVP TKLSFVRDRL VESYFWASGS

281 YFEPNYSVAR MILAKGLAVL SLMDDVYDAY GTFEELQMFT

321 DAIERWDASC LDKLPDYMKI VYKALLDVFE EVDEELIKLG

361 APYRAYYGKE AMKYAARAYM EEAQWREQKH KPTTKEYMKL

401 ATKTCGYITL IILSCLGVEE GIVTKEAFDW VFSRPPFIEA

441 TLIIARLVND ITGHEFEKKR EHVRTAVECY MEEHKVGKQE

481 VVSEFYNQME SAWKDINEGF LRPVEFPIPL LYLILNSVRT

521 LEVIYKEGDS YTHVGPAMQN IIKQLYLHPV PY

A nucleic acid sequence for a cytosolic P. cablin patchoulol synthase (cytosol:PcPAS, AY508730; SEQ ID NO:44) is shown below.

1 ATGGAGTTGT ATGCCCAAAG TGTTGGAGTG GGTGCTGCTT

41 CTCGTCCTCT TGCGAATTTT CATCCATGTG TGTGGGGAGA

81 CAAATTCATT GTCTACAACC CACAATCATG CCAGGCTGGA

121 GAGAGAGAAG AGGCTGAGGA GCTGAAAGTG GAGCTGAAAA

161 GAGAGCTGAA GGAAGCATCA GACAACTACA TGCGGCAACT

201 GAAAATGGTG GATGCAATAC AACGATTAGG CATTGACTAT

241 CTTTTTGTGG AAGATGTTGA TGAAGCTTTG AAGAATCTGT

281 TTGAAATGTT TGATGCTTTC TGCAAGAATA ATCATGACAT

321 GCACGCCACT GCTCTCAGCT TTCGCCTTCT CAGACAACAT

361 GGATACAGAG TTTCATGTGA AGTTTTTGAA AAGTTTAAGG

401 ATGGCAAAGA TGGATTTAAG GTTCCAAATG AGGATGGAGC

441 GGTTGCAGTC CTTGAATTCT TCGAAGCCAC GCATCTCAGA

481 GTCCATGGAG AAGACGTCCT TGATAATGCT TTTGACTTCA

521 CTAGGAACTA CTTGGAATCA GTCTATGCAA CTTTGAACGA

561 TCCAACCGCG AAACAAGTCC ACAACGCATT GAATGAGTTC

601 TCTTTTCGAA GAGGATTGCC ACGCGTGGAA GCAAGGAAGT

641 ACATATCAAT CTACGAGCAA TACGCATCTC ATCACAAAGG

681 CTTGCTCAAA CTTGCTAAGC TGGATTTCAA CTTGGTACAA

721 GCTTTGCACA GAAGGGAGCT GAGTGAAGAT TCTAGGTGGT

761 GGAAGACTTT ACAAGTGCCC ACAAAGCTAT CATTCGTTAG

801 AGATCGATTG GTGGAGTCCT ACTTCTGGGC TTCGGGATCT

841 TATTTCGAAC CGAATTATTC GGTAGCTAGG ATGATTTTAG

881 CAAAAGGGCT GGCTGTATTA TCTCTTATGG ATGATGTGTA

921 TGATGCATAT GGTACTTTTG AGGAATTACA AATGTTCACA

961 GATGCAATCG AAAGGTGGGA TGCTTCATGT TTAGATAAAC

1001 TTCCAGATTA CATGAAAATA GTATACAAGG CCCTTTTGGA

1041 TGTGTTTGAG GAAGTTGACG AGGAGTTGAT CAAGCTAGGC

1081 GCACCATATC GAGCCTACTA TGGAAAAGAA GCCATGAAAT

1121 ACGCCGCGAG AGCTTACATG GAAGAGGCCC AATGGAGGGA

1161 GCAAAAGCAC AAACCCACAA CCAAGGAGTA TATGAAGCTG

1201 GCAACCAAGA CATGTGGCTA CATAACTCTA ATAATATTAT

1241 CATGTCTTGG AGTGGAAGAG GGCATTGTGA CCAAAGAAGC

1281 CTTCGATTGG GTGTTCTCCC GACCTCCTTT CATCGAGGCT

1321 ACATTAATCA TTGCCAGGCT CGTCAATGAT ATTACAGGAC

1361 ACGAGTTTGA GAAAAAACGA GAGCACGTTC GCACTGCAGT

1401 AGAATGCTAC ATGGAAGAGC ACAAAGTGGG GAAGCAAGAG

1441 GTGGTGTCTG AATTCTACAA CCAAATGGAG TCAGCATGGA

1481 AGGACATTAA TGAGGGGTTC CTCAGACCAG TTGAATTTCC

1521 AATCCCTCTA CTTTATCTTA TTCTCAATTC AGTCCGAACA

1561 CTTGAGGTTA TTTACAAAGA GGGCGATTCG TATACACACG

1601 TGGGTCCTGC AATGCAAAAC ATCATCAAGC AGTTGTACCT

1641 TCACCCTGTT CCATATTAA

The open reading frame encoding a truncated C. acuminata CPR (CaCPR(70-708), KP162177) lacking the N-terminal membrane anchor domain was synthesized. Codon-optimized open reading frames were synthesized for the type I GGDPSs from S. acidocaldarius (SaGGDPS, D28748.1) and M. thermautotrophicus (MtGGDPS, AE000666.1).

A putative M. elongata AG77 MeGGDPS (type III) was identified through mining of transcriptome data and a codon-optimized open reading frame was synthesized (Supplemental Data). Two putative type II GGDPSs, EpGGDPS1 and EpGGDPS2, were identified through mining of E. peplus transcriptome data and amplified from leaf cDNA. A putative type II GGDPS was identified in the genome of Tolypothrix sp. PCC 7601 (TsGGDPS) and the coding sequence was amplified from genomic DNA. To target SaGGDPS, MtGGDPS, TsGGDPS, MeGGDPS, AtFDPS and PcPAS to the plastid, the sequences were fused at their N-terminus to the plastid targeting sequence of the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A (NM_105379.4). This Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A protein is shown below as SEQ ID NO:49.

1 MASSMLSSAT MVASPAQATM VAPFNGLKSS AAFPATRKAN

41 NDITSITSNG GRVNCMQVWP PIGKKKFETL SYLPDLTDSE

81 LAKEVDYLIR NKWIPCVEFE LEHGFVYREH GNSPGYYDGR

121 YWTMWKLPLF GCTDSAQVLK EVEECKKEYP NAFIRIIGFD

161 NTRQVQCISF IAYKPPSFTG

A nucleotide sequence for the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A (NM_105379.4) is shown below as SEQ ID NO:50.

1 CCAAGGTAAA AAAAAGGTAT GAAAGCTCTA TAGTAAGTAA

41 AATATAAATT CCCCATAAGG AAAGGGCCAA GTCCACCAGG

81 CAAGTAAAAT GAGCAAGCAC CACTCCACCA TCACACAATT

121 TCACTCATAG ATAACGATAA GATTCATGGA ATTATCTTCC

161 ACGTGGCATT ATTCCAGCGG TTCAAGCCGA TAAGGGTCTC

201 AACACCTCTC CTTAGGCCTT TGTGGCCGTT ACCAAGTAAA

241 ATTAACCTCA CACATATCCA CACTCAAAAT CCAACGGTGT

281 AGATCCTAGT CCACTTGAAT CTCATGTATC CTAGACCCTC

321 CGATCACTCC AAAGCTTGTT CTCATTGTTG TTATCATTAT

361 ATATAGATGA CCAAAGCACT AGACCAAACC TCAGTCACAC

401 AAAGAGTAAA GAAGAACAAT GGCTTCCTCT ATGCTCTCTT

441 CCGCTACTAT GGTTGCCTCT CCGGCTCAGG CCACTATGGT

481 CGCTCCTTTC AACGGACTTA AGTCCTCCGC TGCCTTCCCA

521 GCCACCCGCA AGGCTAACAA CGACATTACT TCCATCACAA

561 GCAACGGCGG AAGAGTTAAC TGCATGCAGG TGTGGCCTCC

601 GATTGGAAAG AAGAAGTTTG AGACTCTCTC TTACCTTCCT

641 GACCTTACCG ATTCCGAATT GGCTAAGGAA GTTGACTACC

681 TTATCCGCAA CAAGTGGATT CCTTGTGTTG AATTCGAGTT

721 GGAGCACGGA TTTGTGTACC GTGAGCACGG TAACTCACCC

761 GGATACTATG ATGGACGGTA CTGGACAATG TGGAAGCTTC

801 CCTTGTTCGG TTGCACCGAC TCCGCTCAAG TGTTGAAGGA

841 AGTGGAAGAG TGCAAGAAGG AGTACCCCAA TGCCTTCATT

881 AGGATCATCG GATTCGACAA CACCCGTCAA GTCCAGTGCA

921 TCAGTTTCAT TGCCTACAAG CCACCAAGCT TCACCGGTTA

961 ATTTCCCTTT GCTTTTGTGT AAACCTCAAA ACTTTATCCC

1001 CCATCTTTGA TTTTATCCCT TGTTTTTCTG CTTTTTTCTT

1041 CTTTCTTGGG TTTTAATTTC CGGACTTAAC GTTTGTTTTC

1081 CGGTTTGCGA GACATATTCT ATCGGATTCT CAACTGTCTG

1121 ATGAAATAAA TATGTAATGT TCTATAAGTC TTTCAATTTG

1161 ATATGCATAT CAACAAAAAG AAAATAGGAC AATGCGGCTA

1201 CAAATATGAA ATTTACAAGT TTAAGAACCA TGAGTCGCTA

1241 AAGAAATCAT TAAGAAAATT AGTTTCAC

In some cases, a portion of the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A protein was used as a chloroplast transit peptide to relocalize cytosolic proteins to the chloroplast. Such an Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A peptide can have SEQ ID NO:101 (shown below).

1 MASSMLSSAT MVASPAQATM VAPFNGLKSS AAFPATRKAN

41 NDITSITSNG GRVN

A nucleic acid segment that encodes the Arabidopsis thaliana ribulose bisphosphate carboxylase small chain 1A peptide with SEQ ID NO: 101 is shown below as SEQ ID NO: 102.

1 ATGGCTTCCT CTATGCTCTC TTCCGCTACT ATGGTTGCCT

41 CTCCGGCTCA GGCCACTATG GTCGCTCCTT TCAACGGACT

81 TAAGTCCTCC GCTGCCTTCC CAGCCACCCG CAAGGCTAAC

121 AACGACATTA CTTCCATCAC AAGCAACGGC GGAAGAGTTA

161 AC
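The correspondence between the transit peptide (SEQ ID NO:101) and its coding segment (SEQ ID NO:102) can be checked with a short translation sketch. This is an illustrative Python example only, assuming the standard genetic code; the helper names `clean` and `translate` are hypothetical and are not part of the disclosure. The two sequences are copied from the listings above, with the position numbers left in and stripped by the helper.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumption:
# the standard nuclear code applies; "*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def clean(seq: str) -> str:
    """Remove position numbers and whitespace from a sequence listing."""
    return "".join(ch for ch in seq.upper() if ch.isalpha())


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1, ignoring any trailing partial codon."""
    dna = clean(dna)
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


SEQ_ID_NO_101 = """
1 MASSMLSSAT MVASPAQATM VAPFNGLKSS AAFPATRKAN
41 NDITSITSNG GRVN
"""

SEQ_ID_NO_102 = """
1 ATGGCTTCCT CTATGCTCTC TTCCGCTACT ATGGTTGCCT
41 CTCCGGCTCA GGCCACTATG GTCGCTCCTT TCAACGGACT
81 TAAGTCCTCC GCTGCCTTCC CAGCCACCCG CAAGGCTAAC
121 AACGACATTA CTTCCATCAC AAGCAACGGC GGAAGAGTTA
161 AC
"""

peptide = translate(SEQ_ID_NO_102)
matches_listing = peptide == clean(SEQ_ID_NO_101)
```

The same helpers could be applied to the longer listings, for example to compare the translation of SEQ ID NO:44 with SEQ ID NO:43.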

Examples of plastid-targeted proteins are referred to as plastid:SaGGDPS, plastid:MtGGDPS, plastid:TsGGDPS, plastid:MeGGDPS, plastid:AtFDPS and plastid:PcPAS.

The coding sequences of A. grandis abietadiene synthase (SEQ ID NO:31) and P. sitchensis CYP720B4 (ER:PsCYP720B4; SEQ ID NO:35) were truncated to target the enzymes to the cytosol, in this study referred to as cytosol:AgABS(85-868) (SEQ ID NO:33) and cytosol:PsCYP720B4(30-483) (SEQ ID NO:37), respectively.

For lipid droplet targeting, truncated A. grandis abietadiene synthase, P. sitchensis CYP720B4 and C. acuminata CPR were each fused to either the N-terminus or the C-terminus of N. oceanica lipid droplet surface protein, resulting in LD:AgABS(85-868), LD:PsCYP720B4(30-483) and LD:CaCPR(70-708), respectively (FIG. 4). The full-length and modified coding sequences were verified by sequencing, inserted into pENTR4 (Invitrogen), and subsequently transferred into the Gateway vectors pEarleygate 100 and pEarleygate 104 (N-terminal YFP-tag), each under control of a 35S promoter for strong constitutive expression (Earley et al. Plant J. 45, 616-629 (2006)). These constructs were introduced into A. tumefaciens LBA4404 for transient expression studies in Nicotiana benthamiana.

Agrobacterium-mediated transient expression in N. benthamiana leaves

Transformants of A. tumefaciens LBA4404 carrying selected binary vectors were grown overnight at 28°C in Luria-Bertani medium containing 50 µg/mL rifampicin and 50 µg/mL kanamycin. Prior to infiltration into N. benthamiana leaves, the A. tumefaciens cells were sedimented by centrifugation at 3800 x g for 10 min, washed, resuspended in infiltration buffer (10 mM MES-KOH pH 5.7, 10 mM MgCl2, 200 mM acetosyringone) to an optical density at 600 nm (OD600) of 0.8 and incubated for approximately 30 min at 30°C. To test various gene combinations, equal volumes of the selected bacterial suspensions were mixed and infiltrated into N. benthamiana leaves using a syringe without a needle. A. tumefaciens LBA4404 carrying the tomato bushy stunt virus gene P19 (Voinnet et al. Proc. Natl. Acad. Sci. 96, 14147-14152 (1999); Voinnet et al. Proc. Natl. Acad. Sci. 112, E4812 (2015)) was included in all infiltrations to suppress RNA silencing in N. benthamiana. The N. benthamiana plants were grown for 3.5 to 4 weeks in soil at 25°C under a 12-hour photoperiod at 150 µmol m⁻² s⁻¹. After infiltration, the plants were grown for 4 additional days in the growth chamber. Samples from the infiltrated leaves were subsequently analyzed for terpenoid or triacylglycerol content.

Lipid Analysis

Triacylglycerol analyses were performed essentially as described by Yang et al. (Plant Physiol. 169, 1836-1847 (2015)) with minor modifications. For each sample, one N. benthamiana leaf was freshly harvested and total lipids were extracted with 4 mL chloroform/methanol/formic acid (10:20:1, by volume). Ten micrograms tri-17:0 TAG (Sigma) was added as internal standard to each sample.

Statistical Analyses

Statistical analyses were conducted using two-tailed unpaired Student’s t-tests. A P-value of < 0.05 was considered statistically significant.

Terpenoid analyses in N. benthamiana leaves

For each sample, one leaf disc (~100 mg fresh weight) was incubated with 1 mL hexane containing 2 mg/mL 1-eicosene (internal standard, TCI America) on a shaker for 15 min at room temperature prior to incubation in the dark for 16 hours at room temperature. The reaction products were separated and analyzed by GC-MS using an Agilent 7890A GC system coupled to an Agilent 5975C MS detector.

Chromatography was performed with an Agilent VF-5ms column (40 m x 0.25 mm x 0.25 µm) at 1.2 mL/min helium flow. The injection volume was 1 µL in splitless mode at an injector temperature of 250°C. The following oven program was used (run time 18.74 min): 1 min isothermal at 40°C, 40°C per minute to 180°C, 2 min isothermal at 180°C, 15°C per minute to 300°C, 1 min isothermal at 300°C, 100°C per minute to 325°C and 3 minutes isothermal at 325°C. The mass spectrometer was operated at 70 eV electron ionization mode, a solvent delay of 3 minutes, ion source temperature at 230°C, and quadrupole temperature at 150°C. Mass spectra were recorded from m/z 30 to 600. Terpenoid products were identified based on retention times, mass spectra published in relevant literature and through comparison with the NIST Mass Spectral Library v17 (National Institute of Standards and Technology, USA). Quantitation of diterpenoid products as well as patchoulol was based on 1-eicosene standard curves. The extracted ion chromatograms for each target compound were integrated, and compounds were quantified using QuanLynx tool (Waters) with a mass window allowance of 0.2 and a signal-to-noise ratio greater than or equal to 10. All calculated peak areas were normalized to the peak area for the internal standard 1-eicosene and tissue fresh weight.
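As a plausibility check on the oven program, the hold and ramp segments can be summed in a few lines of Python. This is a minimal sketch; the step encoding and the function name `run_time_minutes` are hypothetical conventions chosen for illustration, not instrument software. Under these assumptions the segments sum to 18.75 min, which is close to the stated run time of 18.74 min; the small difference may reflect rounding or instrument reporting.

```python
# Oven program from the text, encoded as hold and ramp steps (hypothetical encoding):
#   ("hold", minutes) or ("ramp", rate in deg C per minute, target temperature in deg C)
OVEN_PROGRAM = [
    ("hold", 1.0),           # 1 min isothermal at 40 C
    ("ramp", 40.0, 180.0),   # 40 C per minute to 180 C
    ("hold", 2.0),           # 2 min isothermal at 180 C
    ("ramp", 15.0, 300.0),   # 15 C per minute to 300 C
    ("hold", 1.0),           # 1 min isothermal at 300 C
    ("ramp", 100.0, 325.0),  # 100 C per minute to 325 C
    ("hold", 3.0),           # 3 min isothermal at 325 C
]


def run_time_minutes(start_temp_c: float, program) -> float:
    """Sum the hold times and the time spent in each linear temperature ramp."""
    temp = start_temp_c
    total = 0.0
    for step in program:
        if step[0] == "hold":
            total += step[1]
        elif step[0] == "ramp":
            _, rate, target = step
            total += abs(target - temp) / rate
            temp = target
        else:
            raise ValueError(f"unknown step type: {step[0]!r}")
    return total


total_minutes = run_time_minutes(40.0, OVEN_PROGRAM)
```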

Diterpenoid resin acids and glycosylated derivatives were analyzed by UHPLC/MS/MS to confirm accurate masses and fragments. For each sample, one leaf disc (~100 mg fresh weight) was incubated with 1 mL methanol containing 1.25 µM telmisartan (internal standard, Toronto Research Chemicals) in the dark for 16 h at room temperature. A 10-µL volume of each extract was subsequently analyzed using a 31-min gradient elution method on an Acquity BEH C18 UHPLC column (2.1 x 100 mm, 1.7 µm, Waters) with mobile phases consisting of 0.15% formic acid in water (solvent A) and acetonitrile (solvent B). The method involved a 31-minute gradient employing 1% B at 0.00 to 1 min, linear gradient to 99% B at 28.00 min, with a hold until 30 min, followed by a return to 1% B and a hold from 30.10 to 31 minutes. The flow rate was 0.3 mL/min and the column temperature was 40°C. The mass spectrometer (Xevo G2-XS QTOF, Waters) was equipped with an electrospray ionization source and operated in negative-ion mode. Source parameters were as follows: capillary voltage 2500 V, cone voltage 40 V, desolvation temperature 300°C, source temperature 100°C, cone gas flow 50 L/h, and desolvation gas flow 600 L/h. Mass spectrum acquisition was performed in negative ion mode over m/z 50 to 1500 with scan time of 0.2 seconds using a collision energy ramp 20 to 80 V.
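The gradient described above can be expressed as a piecewise-linear function of time, which may help when reproducing the method on other software. The following Python sketch is illustrative only; the breakpoint table and the function name `percent_b` are hypothetical, and the return from 99% B to 1% B between 30.00 and 30.10 min is assumed here to be linear, which the text does not state.

```python
# (time in minutes, percent solvent B) breakpoints taken from the method description.
# Assumption: the return to 1% B between 30.00 and 30.10 min is a linear segment.
GRADIENT = [
    (0.0, 1.0),
    (1.0, 1.0),
    (28.0, 99.0),
    (30.0, 99.0),
    (30.1, 1.0),
    (31.0, 1.0),
]


def percent_b(t_min: float) -> float:
    """Percent solvent B at time t_min by linear interpolation between breakpoints."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the 31-minute method")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable for a contiguous breakpoint table")


def percent_a(t_min: float) -> float:
    """Percent solvent A (0.15% formic acid in water) is the remainder."""
    return 100.0 - percent_b(t_min)


# Midpoint of the linear ramp from 1% B (1 min) to 99% B (28 min).
midpoint_b = percent_b(14.5)
```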

Isolation of lipid droplets

Lipid droplets were isolated as previously described with minor adjustments (Ding, Y. et al. Nat. Protoc. 8: 43 (2012)). For each sample, 1 g infiltrated N. benthamiana leaf tissue was ground with mortar and pestle in 20 mL ice-cold buffer A (20 mM tricine, 250 mM sucrose, 0.2 mM phenylmethylsulfonyl fluoride, pH 7.8). The homogenate was filtered through Miracloth (Calbiochem) and centrifuged in a 50-mL tube at 3,400 x g for 10 min at 4°C to remove cell debris. From each tube, 10 mL supernatant was collected and transferred to a 15-mL tube. The supernatant fraction was then overlaid with 3 mL buffer B (20 mM HEPES, 100 mM KCl, 2 mM MgCl2, pH 7.4) and centrifuged for 1 hour at 5,000 x g. After centrifugation, 2 mL from the top of each gradient containing floating lipid droplets were collected. For terpenoid analysis, each lipid droplet fraction was extracted with 1 mL hexane containing 2 µg/mL 1-eicosene (internal standard, TCI America) prior to GC-MS analysis.

Confocal imaging

For lipid droplet visualization, freshly harvested leaf samples were stained with Nile red as described by Sanjaya et al. (Plant Biotechnol. J. 9, 874-883 (2011)). Imaging of Nile red, chlorophyll and enhanced yellow fluorescent protein (EYFP) fluorescence was conducted with a confocal laser scanning microscope FluoView FV1000 (Olympus) at excitation 559 nm/emission 570-630 nm, excitation 559 nm/emission 655-755 nm and excitation 515 nm/emission 527 nm, respectively. Images were processed using the FV10-ASW 3.0 microscopy software (Olympus).

Example 2: Expression of a microalgal lipid droplet surface protein increases WRINKLED1-initiated triacylglycerol accumulation

To assess the impact of NoLDSP on AtWRI1(1-397)-initiated triacylglycerol accumulation, leaves of N. benthamiana were infiltrated with Agrobacterium tumefaciens suspensions for transient production of AtWRI1(1-397) alone or in combination with a lipid droplet surface protein (NoLDSP) encoding cDNA from the microalga Nannochloropsis oceanica (AtWRI1(1-397) + NoLDSP). NoLDSP possesses a hydrophobic central region that likely mediates the anchoring on lipid droplets.

In leaves producing AtWRI1(1-397) or AtWRI1(1-397) with NoLDSP, the triacylglycerol level was at least 3-fold higher and about 12-fold higher, respectively, than in control leaves without AtWRI1(1-397) (FIG. 1A).

These results clearly demonstrated the beneficial impact of the microalgal NoLDSP on lipid droplet accumulation. NoLDSP had no negative impact on triacylglycerol production and enhanced the accumulation of lipid droplets in infiltrated N. benthamiana leaves.

Example 3: Engineered sesquiterpenoid production in the cytosol and plastids

Different engineering strategies were then tested for the production of sesquiterpenoids using patchoulol as a model compound. Like many other sesquiterpenoids, patchoulol is volatile. Previous work has shown that engineered production of patchoulol in transgenic lines of N. tabacum resulted in significant losses from volatile emission (Wu et al. Nat. Biotechnol. 24, 1441-1447 (2006)). In the experiments described here, losses from atmospheric terpenoid emission were not recorded because the engineering strategies were designed to sequester target terpenoids in lipid droplets in the plant biomass.

Transient production of cytosolic Pogostemon cablin patchoulol synthase (cytosol:PcPAS) led to formation of a single low-level product, patchoulol, which was not detected in wild-type control plants (FIG. 1B).

To enhance the precursor availability for sesquiterpenoid synthesis, feedback-insensitive forms of Euphorbia lathyris HMGR (ElHMGR(159-582)) and A. thaliana FDPS (cytosol:AtFDPS) were included in the transient assays. Some reports indicate that E. lathyris accumulates high levels of triterpenoids and their esters (Skrukrud et al. in The Metabolism, Structure, and Function of Plant Lipids (eds. Paul K. Stumpf, J. Brian Mudd, & W. David Nes) 115-118 (Springer New York, 1987)), suggesting that its HMGR could be a robust enzyme for sesquiterpenoid production in N. benthamiana. The selection of the A. thaliana FDPS was based on its relatively high thermal stability (Keim et al. PloS One 7, e49109 (2012)).

The patchoulol content in N. benthamiana leaves producing ElHMGR(159-582) with cytosol:AtFDPS and cytosol:PcPAS was at least 5-fold higher than in leaves with cytosol:PcPAS alone, which is consistent with enhanced precursor flux. However, co-engineering of patchoulol and triacylglycerol synthesis impaired cytosolic terpenoid accumulation, independent of whether precursor availability was increased or not (FIG. 1B).

A previous study demonstrated that re-direction of PcPAS and avian FDPS to the plastid increased the retained patchoulol levels in leaves of stable transgenic N. tabacum lines up to approximately 30 µg patchoulol per gram fresh weight (Wu et al. Nat. Biotechnol. 24, 1441-1447 (2006)). This approach was modified to further examine engineering strategies for the co-production of patchoulol and lipid droplets in N. benthamiana leaves.

Targeting of patchoulol synthase to plastids (plastid:PcPAS) led to accumulation of approximately 0.5 µg patchoulol per gram fresh weight (FIG. 1C). To increase the precursor flux in the plastids, P. barbatus DXS (PbDXS) and plastid-targeted AtFDPS (plastid:AtFDPS) were combined with plastid:PcPAS in the assays. This strategy resulted in a 60-fold increase in the level of patchoulol (FIG. 1C). Synthetic lipid droplet accumulation impaired patchoulol production in leaves in the absence of PbDXS and plastid:AtFDPS, when precursor synthesis was not co-engineered (FIG. 1C). The negative impact on patchoulol synthesis was rescued when plastid:AtFDPS or PbDXS with plastid:AtFDPS were included in the assay.

Leaves transiently producing PbDXS with plastid:AtFDPS, plastid:PcPAS, AtWRI1(1-397), and NoLDSP yielded the highest patchoulol level retained in leaves, up to about 45 µg patchoulol per gram fresh weight, on average 90-fold and 1.5-fold higher than in leaves producing plastid:PcPAS alone and leaves producing PbDXS with plastid:AtFDPS and plastid:PcPAS, respectively.

Example 4: Diterpenoid scaffold production in plastids and cytosol

Strategies for diterpenoid production in the N. benthamiana system were examined using the Abies grandis abietadiene synthase (AgABS) as diterpene synthase. This bifunctional enzyme has class II and class I terpene synthase activity and catalyzes both the bicyclization of GGDP to a (+)-copalyl diphosphate intermediate and the subsequent secondary cyclization and further rearrangement.

Transient production of the native plastidial A. grandis abietadiene synthase (plastid:AgABS) resulted in the accumulation of abietadiene (abieta-7,13-diene), levopimaradiene (abieta-8(14),12-diene), neoabietadiene (abieta-8(14),13(15)-diene) and, as a minor product, palustradiene (abieta-8,13-diene). These diterpenoids were not detected in wild-type control leaves of N. benthamiana.

Sole production of plastid:AgABS yielded about 40 µg diterpenoids per gram fresh weight (FIG. 2A). To enhance the production of diterpenoids, plastid:AgABS was co-produced in different combinations with PbDXS and a plastid GGDPS.

GGDPSs are differentiated into three types (type I-III) according to their amino acid sequences around the first aspartate-rich motif. These three types differ in their mechanism of determining product chain-length (Noike et al. J. Biosci. Bioeng. 107, 235-239 (2009); Chang et al. J. Biol. Chem. 281, 14991-15000 (2006)). Plant GGDPSs are type II enzymes that are regulated on gene expression, transcript and protein level (Xu et al. BMC Genomics 11, 246-246 (2010); Zhou et al. Proc. Natl. Acad. Sci. 114, 6866-6871 (2017); Ruiz-Sola et al. New Phytol. 209, 252-264 (2016)).

The inventors hypothesized that inclusion of distantly related type I and type III GGDPSs or a cyanobacterial type II GGDPS may bypass potential regulatory steps that can limit diterpenoid production in N. benthamiana. Six GGDPSs were selected based on GenBank and BLAST searches as well as analysis of transcriptome data: a GGDPS from the archaeon Sulfolobus acidocaldarius (SaGGDPS, type I) and five predicted GGDPSs from the archaeon Methanothermobacter thermautotrophicus (MtGGDPS, type I), the cyanobacterium Tolypothrix sp. PCC 7601 (TsGGDPS, type II), the plant Euphorbia peplus (EpGGDPS1 and EpGGDPS2, type II), and the fungus Mortierella elongata AG77 (MeGGDPS, type III). The sequences of SaGGDPS, MtGGDPS, and MeGGDPS enzymes share only 24%, 25% and 17% amino acid identities with EpGGDPS1, respectively, whereas TsGGDPS and EpGGDPS2 share 48% and 58% identities with EpGGDPS1, respectively. For transient assays in N. benthamiana, the coding sequences for the bacterial and fungal GGDPSs were codon-optimized (except for TsGGDPS) and modified to target the enzymes to the plastids, referred to as plastid:SaGGDPS, plastid:MtGGDPS, plastid:TsGGDPS, and plastid:MeGGDPS. Co-production of PbDXS with plastid:AgABS or plastid:GGDPS with plastid:AgABS was insufficient to increase the diterpenoid content in N. benthamiana leaves more than 2-fold compared to the diterpenoid level in plastid:AgABS-producing leaves (FIG. 2A).

In contrast, co-production of PbDXS with GGDPS and plastid:AgABS enhanced diterpenoid production up to 6.5-fold (compared to leaves producing plastid:AgABS). Significant differences in diterpenoid yields were obtained depending on which GGDPS was included, apparently unrelated to a specific type of GGDPS (FIG. 2A). The highest diterpenoid levels were in N. benthamiana leaves co-producing PbDXS with plastid:AgABS and plastid:MtGGDPS (type I), plastid:TsGGDPS (type II), or EpGGDPS2 (type II), with similar yields among these combinations (FIG. 2A).

Diterpenoid accumulation was further evaluated in the presence of lipid droplets. Co-production of plastid:AgABS with AtWRI1(1-397) had no significant impact on the diterpenoid level compared to control leaves producing plastid:AgABS alone. However, in leaves producing plastid:AgABS with AtWRI1(1-397) and NoLDSP, the diterpenoid content was increased 2-fold (FIG. 2B). Similarly, co-production of plastid:MtGGDPS with plastid:AgABS, AtWRI1(1-397) and NoLDSP increased the diterpenoid level 2.5-fold compared to plastid:MtGGDPS with plastid:AgABS-producing leaves.

These results indicated that the increased abundance of lipid droplets was beneficial for, and contributed to, the accumulation of diterpenoid products.

Sequestration of the lipophilic diterpenoids into lipid droplets may have helped to circumvent negative feedback regulatory mechanisms and served as a “pull force” in diterpenoid production.

In fact, isolated lipid droplet fractions from leaves producing plastid:AgABS with AtWRI1(1-397) and plastid:AgABS with AtWRI1(1-397) and NoLDSP contained at least 35-fold and 420-fold more diterpenoids, respectively, than control fractions from leaves with plastid:AgABS, consistent with the sequestration of diterpenoids in lipid droplets (FIG. 2D-2E). NoLDSP promotes clustering of small lipid droplets (FIG. 2F). The localization of yellow fluorescent fusion protein-tagged NoLDSP (YFP-NoLDSP) in clustered lipid droplets was observed by confocal laser scanning microscopy on a collected lipid droplet fraction.

Co-production of PbDXS and plastid:MtGGDPS together with plastid:AgABS in the transient assays yielded the highest diterpenoid level, independent of whether AtWRI1(1-397) was included for lipid droplet synthesis (FIG. 2B). In contrast, co-production of PbDXS with plastid:MtGGDPS and plastid:AgABS together with AtWRI1(1-397) and NoLDSP resulted in a significant reduction of the diterpenoid level (compared to leaves producing PbDXS with plastid:MtGGDPS and plastid:AgABS).

When A. grandis abietadiene synthase was targeted to the cytosol (cytosol:AgABS(85-868)), leaves accumulated approximately 0.2 µg diterpenoids per gram fresh weight and addition of precursor pathway genes enhanced diterpenoid synthesis (FIG. 2C). Co-production of cytosol:AgABS(85-868) together with ElHMGR(159-582) and cytosolic M. thermautotrophicus GGDPS (cytosol:MtGGDPS) increased the diterpenoid yield more than 400-fold (relative to cytosol:AgABS(85-868)-containing leaves) and, thus, close to the highest diterpenoid yield achieved with plastid engineering approaches (FIGs. 2B-2C).

Moreover, these data indicated that lipid droplets enhanced terpenoid accumulation when cytosol:AgABS(85-868) was co-produced with AtWRI1(1-397) or AtWRI1(1-397) with NoLDSP (FIG. 2C). Under these conditions, terpenoid production was increased up to approximately 3-fold, which is consistent with diterpenoids being sequestered in lipid droplets.

When ElHMGR(159-582) with cytosol:MtGGDPS, cytosol:AgABS(85-868), AtWRI1(1-397) and NoLDSP were co-produced, no additive effects of lipid droplet engineering on terpenoid yield were detected (relative to ElHMGR(159-582) with cytosol:MtGGDPS and cytosol:AgABS(85-868)) (FIG. 2C).

Example 5: Triacylglycerol analysis of N. benthamiana leaves engineered for terpenoid and lipid droplet production

To examine a potential impact of terpenoid engineering on triacylglycerol yield, the established approaches for low-yield or high-yield terpenoid synthesis combined with lipid droplet production were further tested. Four days after infiltration of N. benthamiana with A. tumefaciens carrying the various expression constructs, the leaves were subjected to triacylglycerol analysis. Leaves co-engineered for lipid droplet and high-yield patchoulol production in the cytosol contained approximately 50% less triacylglycerol than leaves producing just AtWRI1(1-397) with NoLDSP (FIG. 3A). A significant decrease in the triacylglycerol level was also detected when leaves were engineered for cytosol-targeted high-yield production of diterpenoids (compared to leaves producing AtWRI1(1-397) with NoLDSP) (FIG. 3B). When lipid droplet production was combined with a plastid-targeted approach for high-yield terpenoid synthesis, no negative impact on triacylglycerol accumulation was observed compared to control plants (FIGs. 3A-3B).

In the cytosol, low-yield production of diterpenoids had no impact on triacylglycerol yield, and low-yield production of sesquiterpenoids likewise had little or no significant impact on triacylglycerol yield. High-yield production of sesquiterpenoids and diterpenoids in the cytosol led to approximately 50% less triacylglycerol.

Under certain conditions, terpenoid production may compete with triacylglycerol biosynthesis for carbon from the plastid. The different triacylglycerol yields in cytosolic approaches (low yield vs. high yield) suggest regulatory mechanisms may exist to control the partitioning of carbon between plastid and cytosol. As both FDP and GGDP serve as prenyl donors for protein prenylation in the cytosol, protein prenylation may be involved in these regulatory networks. Alterations in the cytosolic levels of FDP and GGDP may have indirectly contributed to the decrease in triacylglycerol yields.

Example 6: Targeting diterpenoid and diterpenoid acid production to lipid droplets

This Example describes experiments designed to determine whether lipid droplets in the cytosol can be used as a platform to anchor biosynthetic pathways for the production of functionalized diterpenoids. The proof-of-concept experiments included use of Picea sitchensis cytochrome P450 PsCYP720B4 (ER:PsCYP720B4) that can convert abietadiene and several isomers to the corresponding diterpene resin acids as well as a modified A. grandis abietadiene synthase.

To target terpenoid synthesis to lipid droplets, A. grandis abietadiene synthase lacking the N-terminal plastid targeting sequence (cytosol:AgABS(85-868)) and truncated PsCYP720B4 lacking the N-terminal membrane-binding domain (cytosol:PsCYP720B4(30-483)) were produced as C-terminal and N-terminal NoLDSP-fusion proteins, respectively. The NoLDSP-fusion proteins are herein referred to as LD:AgABS(85-868) and LD:PsCYP720B4(30-483).

Inclusion of cytochrome P450 reductases (CPRs) can help drive metabolic fluxes in cytochrome P450 (CYP)-mediated production of high-value target compounds in non-native hosts and synthetic compartments. Camptotheca acuminata CPR (cytosol:CaCPR(70-708)) was included in the experiments as a NoLDSP-fusion protein to co-localize the CaCPR and PsCYP720B4 activities on lipid droplets and facilitate the CYP-catalyzed production of functionalized terpenoids. As the C-terminus of CPRs is pivotal for catalytic activity and not suitable for modifications, the predicted N-terminal hydrophobic domain of native CaCPR was replaced by NoLDSP to produce the fusion protein LD:CaCPR(70-708).

To determine the localization in planta, the NoLDSP-fusion proteins were each produced as yellow fluorescent protein (YFP)-tagged proteins together with AtWRI1(1-397) for lipid droplet production. The YFP-signals in infiltrated leaves were subsequently compared to the signals obtained for YFP-tagged NoLDSP, which indicated that all three YFP-tagged NoLDSP-fusion proteins were targeted to the surface of the lipid droplets (FIG. 4). It is noteworthy that production of the YFP-tagged NoLDSP and NoLDSP-fusion proteins promoted clustering of small lipid droplets in planta and in isolated lipid droplet fractions (FIG. 4, FIG. 2D-2F). As confirmed for NoLDSP, the clustering of small lipid droplets was independent of the presence or absence of the YFP-tag (FIG. 2F).

To compare different engineering approaches, the A. grandis abietadiene synthase was produced as plastid:AgABS (native), cytosol:AgABS(85-868), or LD:AgABS(85-868), each alone or combined with ER:PsCYP720B4 (native), cytosol:PsCYP720B4(30-483), or LD:PsCYP720B4(30-483) with LD:CaCPR(70-708) (FIG. 5). Note that these assays also included either PbDXS with plastid:MtGGDPS, or ElHMGR(159-582) with cytosol:MtGGDPS to increase the precursor flux, and AtWRI1(1-397) to initiate lipid droplet accumulation. NoLDSP was included in those assays that lacked any NoLDSP-fusion proteins.

Compared to the assays with plastidAgABS, use of cytosolAgABS(85-868) and LD:AgABS(85-868) resulted in similar diterpenoid yield. When native or modified A. gmndis abietadiene synthase was co-produced with native or modified P. sitchensis PsCYP720B4, the leaves accumulated diteipene resin acids in free and glycosylated forms (FIGs. 6-8).

The glycosyl modifications of the diterpenoid acids are likely the result of intrinsic defense/detoxification mechanisms in N. benthamiana. Incubation of leaf extracts with Viscozyme® L resulted in the hydrolysis of the glycosylated diterpenoid acids to free diterpenoid resin acids, which allowed determination of the level of total diterpenoid acids produced in infiltrated leaves.

To facilitate the comparison between the different engineering strategies, the levels of diterpenoids and total diterpenoid acids were quantified for each infiltrated leaf (FIG. 5). Co-production of plastid:AgABS with ER:PsCYP720B4, cytosol:PsCYP720B4(30-483), or LD:PsCYP720B4(30-483) decreased the diterpenoid level (compared to controls with plastid:AgABS) and resulted in the accumulation of diterpenoid acids, consistent with diterpenoids being converted to diterpenoid acids. The level of diterpenoid acids was about 4-fold higher in transient assays with plastid:AgABS and ER:PsCYP720B4, and about 3-fold higher in assays with plastid:AgABS, LD:PsCYP720B4(30-483), and LD:CaCPR(70-708), than in assays including cytosol:PsCYP720B4(30-483). The highest diterpenoid acid yield in transient assays with cytosol:AgABS(85-868) was achieved in combination with ER:PsCYP720B4, which was at least 2-fold or at least 3-fold higher than with cytosol:AgABS(85-868) and LD:PsCYP720B4(30-483) with LD:CaCPR(70-708), respectively (FIG. 5). In transient assays with LD:AgABS(85-868), the diterpenoid acid level was 2-fold higher in assays with ER:PsCYP720B4 than in assays with either cytosol:PsCYP720B4(30-483) or LD:PsCYP720B4(30-483) with LD:CaCPR(70-708) (FIG. 5).

Example 7: Screening DXS variants

1-Deoxy-D-xylulose 5-phosphate synthase (DXS) catalyzes the entry step of the plastidial 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway. DXS variants were screened to increase availability of IPP/DMAPP for terpene biosynthesis.

Candidate DXS enzymes and DXS alternatives were introduced into Nicotiana benthamiana by Agrobacterium-mediated transformation for transient expression together with a Coleus forskohlii GGPPS (CfGGPPS) and a casbene synthase (CasS) recently discovered by the inventors (unpublished). Casbene yield was used as a proxy for DXS activity to evaluate DXS candidates for improving flux through the MEP pathway.

Three DXS enzymes were screened: Coleus forskohlii DXS (CfDXS), Populus trichocarpa DXS (PtDXS), and PtDXS with two point mutations (PtDXS A147G:A352G) intended to reduce feedback inhibition by IPP/DMAPP. Additionally, two genes from E. coli (ribB and yajO) were screened, as they provide a route to DXP, the first compound in the MEP pathway, via different substrates. These enzymes were also screened as fusions to DXP reductase (DXR), the next step in the MEP pathway.
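The double-mutant notation A147G:A352G can be read as two single-residue substitutions. The following Python helper is a minimal sketch of how such a notation might be applied to a protein sequence. It assumes 1-based residue numbering and the common `<original><position><replacement>` convention. The function name and the toy sequence are hypothetical placeholders; the toy sequence is not the PtDXS sequence.

```python
import re


def apply_point_mutations(sequence: str, mutations: str) -> str:
    """Apply colon-separated substitutions such as 'A147G:A352G'.

    Positions are assumed to be 1-based. Each substitution is checked
    against the residue actually present before it is applied.
    """
    residues = list(sequence)
    for mutation in mutations.split(":"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"Unrecognized mutation: {mutation!r}")
        original, replacement = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # convert to 0-based index
        if index < 0 or index >= len(residues):
            raise ValueError(f"Position out of range in {mutation!r}")
        if residues[index] != original:
            raise ValueError(
                f"{mutation!r} expects {original} but found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


# Toy placeholder sequence (hypothetical, for illustration only).
toy_sequence = "MASAG"
toy_mutant = apply_point_mutations(toy_sequence, "A2G:A4G")
print(toy_mutant)
```

Checking the expected original residue before substituting guards against off-by-one numbering, which is a frequent source of error when a transit peptide has been removed or added.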

Ratios of the product, casbene, were measured by GC-FID, compared to the internal standard ledol (IS), to determine the relative yields of casbene.
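The relative yield described above is a simple ratio of the analyte peak area to the internal standard peak area. The sketch below illustrates that arithmetic in Python. The function name and the peak-area values are hypothetical placeholders, not measurements from these experiments.

```python
def relative_yield(analyte_peak_area: float, internal_standard_peak_area: float) -> float:
    """Return the analyte peak area normalized to the internal standard."""
    if internal_standard_peak_area <= 0:
        raise ValueError("Internal standard peak area must be positive")
    return analyte_peak_area / internal_standard_peak_area


# Placeholder (analyte area, internal standard area) pairs, for illustration only.
placeholder_peak_areas = {
    "sample_a": (1500.0, 500.0),
    "sample_b": (600.0, 400.0),
}

relative_yields = {
    name: relative_yield(analyte, standard)
    for name, (analyte, standard) in placeholder_peak_areas.items()
}
print(relative_yields)
```

Normalizing to an internal standard added in equal amounts to every sample compensates for differences in extraction and injection volume, so the ratios can be compared across samples even without absolute quantification.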

As shown in FIG. 10, the most casbene was produced by the Coleus forskohlii DXS (CfDXS) and the Populus trichocarpa DXS (PtDXS).

Example 8: Screening squalene synthase (SQS) candidates

Squalene synthase (SQS) candidates were screened to identify highly active enzymes. Candidates that can increase squalene yields can be integrated into the lipid droplet scaffolding platform.

The squalene synthases evaluated included those from Amaranthus hybridus, Botryococcus braunii, Euphorbia lathyris, Ganoderma lucidum, and Mortierella alpina. All SQS candidates are natively ER-bound but were modified to target them to plastids to reduce interference from the native, cytosolic N. benthamiana SQS. The following SQS candidates with truncations to remove the endoplasmic reticulum (ER) targeting peptide were evaluated: Amaranthus hybridus SQS with a 41-amino acid C-terminal truncation (AhSQS CA41), Botryococcus braunii SQS with an 83-amino acid C-terminal truncation (BbSQS CA83), Botryococcus braunii SQS with a 40-amino acid C-terminal truncation (BbSQS CA40), Euphorbia lathyris SQS with a 36-amino acid C-terminal truncation (ElSQS CA36), Ganoderma lucidum SQS with a 61-amino acid C-terminal truncation (GlSQS CA61), Ganoderma lucidum SQS with a 30-amino acid C-terminal truncation (GlSQS CA30), Mortierella alpina SQS with a 37-amino acid C-terminal truncation (MaSQS CA37), and Mortierella alpina SQS with a 17-amino acid C-terminal truncation (MaSQS CA17). Candidates were co-expressed with CfDXS and plastid-targeted Arabidopsis thaliana farnesyl diphosphate synthase (AtFPPS) to provide the squalene precursor, farnesyl diphosphate (FPP).
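Each candidate above is defined by the number of residues removed from the C-terminus. As a minimal sketch, a C-terminal truncation of a protein sequence could be generated in Python as follows. The function name and the toy sequence are hypothetical placeholders and do not correspond to any of the SQS candidates.

```python
def truncate_c_terminus(sequence: str, residues_removed: int) -> str:
    """Return the sequence with the given number of C-terminal residues removed."""
    if not 0 < residues_removed < len(sequence):
        raise ValueError("residues_removed must be between 1 and len(sequence) - 1")
    return sequence[:-residues_removed]


# Toy 15-residue placeholder sequence (hypothetical, for illustration only).
toy_full_length = "MSTNPKPQRKTKRNT"
toy_truncated = truncate_c_terminus(toy_full_length, 5)
print(toy_truncated, len(toy_truncated))
```

Testing more than one truncation length per enzyme, as was done for the B. braunii, G. lucidum, and M. alpina candidates, makes it possible to compare how much of the C-terminal region can be removed while retaining activity.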

FIG. 11 shows the squalene yields as determined by GC-FID, where the relative yields are reported as the ratio of squalene to the internal standard, n-hexacosane. As shown, a Mortierella alpina squalene synthase with 17 amino acids truncated from the C-terminus had the highest squalene synthase activity. Such a truncated Mortierella alpina squalene synthase can have the following sequence (SEQ ID NO:68) (also called MaSQS CA17).

1 MASAILASLL HPSEVLALVQ YKLSPKTQHD YSNDKTRQRL

41 YHHLNMTSRS FSAVIQDLDE ELKDAICLFY LVLRGLDTIE

81 DDMTIDLDTK LPYLRTFHEI IYQKGWTFTK NGPNEKDRQL

121 LVEFDAIIEG FLQLKPAYQT IIADITKRMG NGMAHYATAG

161 IHVETNADYD EYCHYVAGLV GLGLSEMFSA CGFESPLVAE

201 RKDLSNSMGL FLQKTNIARD YLEDLRDNRR FWPKEIWGQY

241 AETMEDLVKP ENKEKALQCL SHMIVNAMEH IRDVLEYLSM

281 IKNPSCFKFC AIPQVMAMAT LNLLHSNYKV FTHENIKIRK

321 GETVWLMKES DSMDKVAAIF RLYARQINNK SNSLDPHFVD

361 IGVICGEIEQ ICVGRFPGST IEMKRMQAGV LGGKTGTVL

Hence squalene synthases from various species can be evaluated or modified and then evaluated to optimize production of squalene.
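Sequence listings such as the one above are printed in numbered rows of 10-residue blocks. The following Python sketch shows one way to join such a listing into a contiguous sequence while checking that each row's leading number matches the running residue count, which can reveal rows where characters were lost in transcription. The function name is hypothetical; the example input is the first two rows of SEQ ID NO:68 as printed above.

```python
def parse_numbered_listing(listing: str) -> str:
    """Join a numbered, space-blocked sequence listing into one string.

    Each row is expected to start with the 1-based position of its first
    residue. A mismatch with the running count raises ValueError.
    """
    sequence = ""
    for row in listing.splitlines():
        tokens = row.split()
        if not tokens:
            continue  # skip blank rows
        if tokens[0].isdigit():
            expected_start = int(tokens[0])
            if expected_start != len(sequence) + 1:
                raise ValueError(
                    f"Row starting at {expected_start} follows "
                    f"{len(sequence)} residues"
                )
            tokens = tokens[1:]
        sequence += "".join(tokens)
    return sequence


first_rows = """
1 MASAILASLL HPSEVLALVQ YKLSPKTQHD YSNDKTRQRL

41 YHHLNMTSRS FSAVIQDLDE ELKDAICLFY LVLRGLDTIE
"""
joined = parse_numbered_listing(first_rows)
print(len(joined))
```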

Example 9: Screening of farnesyl diphosphate synthase (FPPS) candidates

This Example describes screening of farnesyl diphosphate synthase (FPPS) candidates to increase yields of squalene prior to integration into the lipid droplet scaffolding platform.

Three FPPS candidates were evaluated: Arabidopsis thaliana FPPS (AtFPPS), Picea abies FPPS (PaFPPS), and Gallus gallus FPPS (GgFPPS). An example of a Picea abies FPPS (PaFPPS) sequence is shown below as SEQ ID NO:97 (NCBI accession no. ACA21460.1).

1 MASNGIVDVK TKFEEIYLEL KAQILNDPAF DYTEDARQWV

41 EKMLDYTVPG GKLNRGLSVI DSYRLLKAGK EISEDEVFLG

81 CVLGWCIEWL QAYFLILDDI MDSSHTRRGQ PCWFRLPKVG

121 LIAVNDGILL RNHICRILKK HFRTKPYYVD LLDLFNEVEF

161 QTASGQLLDL ITTHEGATDL SKYKMPTYVR IVQYKTAYYS

201 FYLPVACALV MAGENLDNHV DVKNILVEMG TYFQVQDDYL

241 DCFGDPEVIG KIGTDIEDFK CSWLVVQALE RANESQLQRL

281 YANYGKKDPS CVAEVKAVYR DLGLQDVFLE YERTSHKELI

321 SSIEAQENES LQLVLKSFLG KIYKRQK

A cDNA encoding the Picea abies FPPS (PaFPPS) with SEQ ID NO:97 is shown below as SEQ ID NO:98.

1 ATGGCTTCAA ACGGCATCGT CGACGTGAAA ACCAAGTTTG

41 AGGAAATCTA TCTTGAGCTT AAGGCTCAGA TTCTGAACGA

81 TCCTGCCTTC GATTACACCG AAGACGCCCG TCAATGGGTC

121 GAGAAGATGC TGGACTACAC GGTGCCCGGA GGAAAGCTGA

161 ACCGCGGTCT GTCTGTAATA GACAGCTACA GGCTATTGAA

201 AGCAGGAAAG GAAATATCAG AAGATGAAGT CTTTCTTGGA

241 TGTGTGCTTG GCTGGTGTAT TGAATGGCTT CAAGCATATT

281 TCCTCATATT AGATGACATC ATGGACAGCT CTCACACTAG

321 GCGTGGACAA CCTTGTTGGT TCAGATTACC TAAGGTTGGC

361 TTAATTGCTG TTAATGATGG AATATTGCTT CGTAACCACA

401 TATGCAGAAT TCTGAAAAAG CATTTTCGCA CTAAGCCTTA

441 CTATGTGGAT CTCCTTGATT TATTCAATGA GGTTGAGTTT

481 CAAACAGCTA GTGGACAGTT GCTGGACCTT ATCACTACTC

521 ATGAAGGAGC AACTGACCTT TCAAAGTACA AAATGCCAAC

561 TTATGTTCGT ATAGTTCAAT ACAAGACTGC CTACTATTCA

601 TTCTATCTGC CGGTTGCCTG TGCACTGGTA ATGGCAGGGG

641 AAAATTTAGA TAATCACGTA GATGTCAAGA ATATTTTAGT

681 CGAAATGGGA ACCTATTTTC AAGTACAGGA TGATTATCTT

721 GATTGCTTTG GTGATCCAGA AGTGATTGGG AAGATTGGAA

761 CTGATATCGA AGACTTCAAG TGCTCTTGGT TGGTGGTGCA

801 AGCCCTTGAA CGGGCAAATG AGAGCCAACT TCAACGATTA

841 TATGCCAATT ATGGAAAGAA AGATCCTTCT TGTGTTGCAG

881 AAGTGAAGGC TGTATATAGG GATCTTGGAC TTCAGGATGT

921 TTTTCTGGAA TACGAGCGTA CTAGTCACAA GGAGCTCATT

961 TCTTCCATCG AGGCTCAGGA GAATGAATCT TTGCAGCTTG

1001 TTCTGAAGTC CTTCCTAGGG AAGATATACA AGCGACAGAA

1041 GTAA
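The correspondence between a cDNA and the polypeptide it encodes can be checked by translating the cDNA with the standard genetic code. The Python sketch below is an illustration under that assumption; the function name is hypothetical. It translates the first row of the cDNA listing above and a 30-nucleotide stretch spanning positions 781-810, and compares the results with the corresponding residues of the PaFPPS polypeptide.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate a cDNA in frame 1, stopping at the first stop codon.

    Whitespace between blocks is ignored and trailing partial codons
    are dropped.
    """
    cdna = "".join(cdna.split()).upper()
    protein = []
    for start in range(0, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE[cdna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


first_row = "ATGGCTTCAA ACGGCATCGT CGACGTGAAA ACCAAGTTTG"
print(translate(first_row))

# Nucleotides 781-810 of the listing, which encode residues 261-270.
internal_stretch = "TGCTCTTGGTTGGTGGTGCAAGCCCTTGAA"
print(translate(internal_stretch))
```

A check of this kind is also useful for proofreading printed listings, because a residue block that disagrees with the translated cDNA points to a transcription error in one of the two.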

An example of a Gallus gallus FPPS (GgFPPS) polypeptide sequence is shown below as SEQ ID NO:99 (NCBI accession no. XP_015154133.1).

1 MSADGAKRTA AEREREEFVG FFPQIVRDLT EDGIGHPEVG

41 DAVARLKEVL QYNAPGGKCN RGLTVVAAYR ELSGPGQKDA

81 ESLRCALAVG WCIELFQAFF LVADDIMDQS LTRRGQLCWY

121 KKEGVGLDAI NDSFLLESSV YRVLKKYCGQ RPYYVHLLEL

161 FLQTAYQTEL GQMLDLITAP VSKVDLSHFS EERYKAIVKY

201 KTAFYSFYLP VAAAMYMVGI DSKEEHENAK AILLEMGEYF

241 QIQDDYLDCF GDPALTGKVG TDIQDNKCSW LVVQCLQRVT

281 PEQRQLLEDN YGRKEPEKVA KVKELYEAVG MRAAFQQYEE

321 SSYRRLQELI EKHSNRLPKE IFLGLAQKIY KRQK

A cDNA encoding the Gallus gallus FPPS (GgFPPS) with SEQ ID NO:99 is shown below as SEQ ID NO:100.

1 AGAATGCCCC GCGCGGCGCC GGGCGGAGCG CACGGAAAGG

These farnesyl diphosphate synthases are natively cytosolic but were modified to be targeted to plastids.

The plastid-targeted farnesyl diphosphate synthases were co-expressed with CfDXS and MaSQS CA17, and squalene yields were measured by GC-FID.

The squalene yields are reported in FIG. 12 as a ratio to the internal standard, n-hexacosane. As shown in FIG. 12, in this experiment, an Arabidopsis thaliana FPPS provided the highest squalene production.

Example 10: Linking SQS and/or FPPS to lipid droplet surface proteins improves squalene yields

This Example illustrates that linkage of lipid droplet surface protein to enzymes can optimize production of lipophilic products.

In a first experiment, AtFPPS and MaSQS CA17 were transiently expressed in Nicotiana benthamiana in cytosolic or soluble form, or in fusion with lipid droplet surface protein. LDSP fusions were to the C-terminal ends of AtFPPS and MaSQS CA17. All constructs except the empty vector were co-expressed with an N-terminally truncated Euphorbia lathyris HMG-CoA reductase (ElHMGR(159-582)) to increase flux through the cytosolic MVA pathway, thereby increasing IPP/DMAPP availability. AtWRI1(1-397), lipid droplet surface protein (not fused to an enzyme), or a combination thereof was also expressed in some assays.

Table 2 summarizes the amounts of squalene that accumulated in cells expressing various constructs and combinations of proteins.

These data are graphically illustrated in FIG. 13A, demonstrating that, in this experiment, the combination yielding the highest levels of squalene included expression of AtWRI1(1-397), MaSQS CA17-NoLDSP, ElHMGR(159-582), and AtFPPS.

In a second experiment, NoLDSP was fused to either the C-terminus of MaSQS CA17 or the N-terminus of AtFPPS, or NoLDSP was linked to both MaSQS CA17 and AtFPPS to form a single fusion of all three proteins with NoLDSP in between.

AtWRI1(1-397) was expressed in samples indicated with “LD”, alongside either NoLDSP alone or NoLDSP fused to AtFPPS and MaSQS CA17 as indicated. All samples were co-expressed with ElHMGR(159-582) except for the empty vector.

Table 3 summarizes the amounts of squalene that accumulated in cells expressing various constructs and combinations of proteins.

These data are graphically illustrated in FIG. 13B, showing that cellular accumulation of squalene was improved by linkage of either of the two final enzymes in the squalene pathway to lipid droplet surface protein. Squalene accumulation was, however, comparable regardless of which of the two final enzymes was fused with lipid droplet surface protein. The methods and expression systems described herein can readily be adapted to optimize squalene and triterpene biosynthesis. Linkage of enzymes in the squalene biosynthesis pathway to lipid droplet surface protein increased squalene accumulation compared to the amounts of squalene that accumulated in Nicotiana benthamiana cells when such enzymes were expressed in soluble, non-fused form.

Example 11: Improved capacity of the lipid droplet scaffolding platform

This Example illustrates that contributions from the MEP pathway with plastidial expression and use of enzyme fusions to lipid droplet surface protein can further boost squalene biosynthesis.

The contributions of plastidial IPP/DMAPP or the MEP pathway were evaluated while using the following expression systems.

A “Cytosol SQS-LD Scaffold” system included a lipid droplet surface protein fused to the MaSQS CA17 squalene synthase (MaSQS CA17-NoLDSP). AtWRI1(1-397), ElHMGR(159-582), and AtFPPS were expressed with the Cytosol SQS-LD Scaffold.

A “Plastid Pathway” system involved use of components of a plastid-targeted squalene pathway consisting of CfDXS, plastidial AtFPPS, and plastidial MaSQS CA17. Additionally, CfDXS alone was co-expressed with the SQS-LD scaffold.

Table 4 summarizes the amounts of squalene that accumulated in cells expressing various constructs and combinations of proteins.

These data are graphically illustrated in FIG. 14, illustrating that increased plastidial IPP/DMAPP availability when using the cytosolic LD scaffolding platform can influence and increase accumulation of terpenes.

Example 12: LDSP-Fusions Increase Lipid Accumulation in Poplar Leaves

This Example illustrates that expression of lipid droplet surface protein fusions provides accumulation of lipid droplets within poplar leaves. AtWRI1(1-397) was linked to eYFP-NoLDSP by the “self-cleaving” LP4/2A hybrid linker. This AtWRI1(1-397)-eYFP-NoLDSP fusion or an eYFP-NoLDSP fusion was expressed in poplar NM6 leaves by Agrobacterium-mediated transient expression.

FIG. 15 shows images of wild type, non-infiltrated poplar leaves (top row). The middle row in FIG. 15 shows images of leaves transiently expressing the eYFP-NoLDSP fusion gene from a pEAQ vector, while the bottom row images show leaves transiently expressing AtWRI1(1-397) linked to eYFP-NoLDSP by the “self-cleaving” LP4/2A hybrid linker, which is cleaved during translation to form the two separate protein products.

Puncta are present in the bottom row images of FIG. 15, indicating formation of lipid droplets in leaves of poplar NM6.

Example 13: Constructs and Vectors

This Example describes some of the constructs and vectors that have been made and used in the development of the systems and methods described herein. The pEAQ vectors (see, e.g., Sainsbury et al., Plant Biotechnology Journal 7: 682-693 (2009)) were used as a basis for these constructs and expression vectors.

Table 5 describes the proteins and/or fusion proteins encoded within several pEAQ-ht or pEAQ vectors.

As indicated, an additional cloning site was inserted into a pEAQ vector to facilitate expression of more than one protein or fusion protein. The LP4/2Av1 linker, which undergoes cleavage during translation, was used in some cases. For example, a soluble ElHMGR(159-582) was linked to an AtFPPS via the LP4/2Av1 linker and the AtFPPS was linked to MaSQS CA17 via a LP4/2Av2 linker, allowing these three proteins to be expressed together and then separated as they were translated.
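The separation of linked proteins during translation can be pictured as splitting a polyprotein at each linker. The Python sketch below is an illustration only. It assumes that the 2A portion of the linker behaves like commonly described 2A peptides, in which separation occurs between the glycine and the proline of an NPGP motif; that assumption, the function name, and the choice of motif are not stated in this document. The example input is a short excerpt of the linker region as it appears in the encoded polypeptides listed in this Example.

```python
import re


def split_at_2a_sites(polyprotein: str) -> list:
    """Split a polyprotein between the G and P of each NPGP motif.

    Assumption: the upstream product keeps '...NPG' at its C-terminus and
    the downstream product starts with 'P'. Any further processing of the
    linker remnant is not modeled here.
    """
    return re.split(r"(?<=NPG)(?=P)", polyprotein)


# Short excerpt spanning one linker, followed by the start of the next protein.
linker_excerpt = "ADEVATQLLNFDLLKLAGDVESNPGPMGKG"
products = split_at_2a_sites(linker_excerpt)
print(products)
```

Under this assumption, a construct with two linkers yields three separate products, which matches the three-protein arrangement described above.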

An example of a sequence for the pld1hfs2-peaq-ld-sq plasmid is shown below as SEQ ID NO:103.

cctgtggttggcatgcacatacaaatggacgaacggataaaccttttcacgcccttt taaatatccgattattctaataaacgctcttttctcttaggtttacccgccaatata tcctgtcaaacactgatagtttgtgaaccatcacccaaatcaagttttttggggtcg aggtgccgtaaagcactaaatcggaaccctaaagggagcccccgatttagagcttga cggggaaagccggcgaacgtggcgagaaaggaagggaagaaagcgaaaggagcgggc gccattcaggctgcgcaactgttgggaagggcgatcggtgcgggcctcttcgctatt acgccagctggcgaaagggggatgtgctgcaaggcgattaagttgggtaacgccagg gttttcccagtcacgacgttgtaaaacgacggccagtgaattgttaattaagaattc gagctccaccgcggaaacctcctcggattccattgcccagctatctgtcactttatt gagaagatagtggaaaaggaaggtggctcctacaaatgccatcattgcgataaagga aaggccatcgttgaagatgcctctgccgacagtggtcccaaagatggacccccaccc acgaggagcatcgtggaaaaagaagacgttccaaccacgtcttcaaagcaagtggat tgatgtgatatctccactgacgtaagggatgacgcacaatcccactatccttcgcaa gacccttcctctatataaggaagttcatttcatttggagaggtattaaaatcttaat aggttttgataaaagcgaacgtggggaaacccgaaccaaaccttcttctaaactctc tctcatctctcttaaagcaaacttctctcttgtctttcttgcgtgagcgatcttcaa cgttgtcagatcgtgcttcggcaccagtacaacgttttctttcactgaagcgaaatc aaagatctctttgtggacacgtagtgcggcgccattaaataacgtgtacttgtccta ttcttgtcggtgtggtcttgggaaaagaaagcttgctggaggctgctgttcagcccc atacattacttgttacgattctgctgactttcggcgggtgcaatatctctacttctg cttgacgaggtattgttgcctgtacttctttcttcttcttcttgctgattggttcta taagaaatctagtattttctttgaaacagagttttcccgtggttttcgaacttggag aaagattgttaagcttctgtatattctgcccaaattcgcgATGAAGAAGCGCTTAAC CACTTCCACTTGTTCTTCTTCTCCATCTTCCTCTGTTTCTTCTTCTACTACTACTTC CTCTCCTATTCAGTCGGAGGCTCCAAGGCCTAAACGAGCCAAAAGGGCTAAGAAATC TTCTCCTTCTGGTGATAAATCTCATAACCCGACAAGCCCTGCTTCTACCCGACGCAG CTCTATCTACAGAGGAGTCACTAGACATAGATGGACTGGGAGATTCGAGGCTCATCT TTGGGACAAAAGCTCTTGGAATTCGATTCAGAACAAGAAAGGCAAACAAGTTTATCT GGGAGCATATGACAGTGAAGAAGCAGCAGCACATACGTACGATCTGGCTGCTCTCAA GTACTGGGGACCCGACACCATCTTGAATTTTCCGGCAGAGACGTACACAAAGGAATT GGAAGAAATGCAGAGAGTGACAAAGGAAGAATATTTGGCTTCTCTCCGCCGCCAGAG CAGTGGTTTCTCCAGAGGCGTCTCTAAATATCGCGGCGTCGCTAGGCATCACCACAA CGGAAGATGGGAGGCTCGGATCGGAAGAGTGTTTGGGAACAAGTACTTGTACCTCGG CACCTATAATACGCAGGAGGAAGCTGCTGCAGCATATGACATGGCTGCGATTGAGTA 
TCGAGGCGCAAACGCGGTTACTAATTTCGACATTAGTAATTACATTGACCGGTTAAA GAAGAAAGGTGTTTTCCCGTTCCCTGTGAACCAAGCTAACCATCAAGAGGGTATTCT TGTTGAAGCCAAACAAGAAGTTGAAACGAGAGAAGCGAAGGAAGAGCCTAGAGAAGA AGTGAAACAACAGTACGTGGAAGAACCACCGCAAGAAGAAGAAGAGAAGGAAGAAGA GAAAGCAGAGCAACAAGAAGCAGAGATTGTAGGATATTCAGAAGAAGCAGCAGTGGT CAATTGCTGCATAGACTCTTCAACCATAATGGAAATGGATCGTTGTGGGGACAACAA TGAGCTGGCTTGGAACTTCTGTATGATGGATACAGGGTTTTCTCCGTTTTTGACTGA TCAGAATCTCGCGAATGAGAATCCCATAGAGTATCCGGAGCTATTCAATGAGTTAGC ATTTGAGGACAACATCGACTTCATGTTCGATGATGGGAAGCACGAGTGCTTGAACTT GGAAAATCTGGATTGTTGCGTGGTGGGAAGAGAGTCAAATGCAGCAGACGAAGTTGC TACTCAACTTTTGAATTTTGACTTGCTGAAGTTGGCTGGTGATGTTGAGTCAAACCC TGGACCTATGGGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGA GCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGA TGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGT GCCCTGGCCCACCCTCGTGACCACCTTCGGCTACGGCCTGCAGTGCTTCGCCCGCTA CCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGT CCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGT GAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAA GGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGT CTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCA CAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCAT CGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCTACCAGTCCGCCCT GAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGC CGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTCCGGACTCAGATCTCGAGC TCAAGCTTCGAATTCTGCAGTCGACGGTACCGCGGGCCCGGGATCATCAACAAGTTT GTACAAAAAAGCAGGCTCCACCATGGCCGGCCCCATCATGACCTCTGCGCCCTCCGC GACCACGCCCACGGGCAAGACAATGCCGTTCAAGCAGCCTTTCAAGACTGTGGCCAC GCTGTCCGCCAAGACTGGCAACATTACCAAGCCCATCGACCCTGCCATCTCCAAGAC CATTGACTTCGTCTACAATGGTTACTCGACGGTCAAGACCAAGGTTGACAAGGCCCC TAAGGTAAACCCCTACCTGCTCATTGCCGGCGGCCTCGTCCTCTCGTGCATCATCTC CATGTGCCTGCTCGTCCCGGCCGTGATCTTCTTCCCCGTCACCATCTTCCTGGGTGT CGCTACGTCGTTTGCGCTCATTGCATTGGCCCCCGTGGCTTTTGTGTTCGGGTGGAT CCTGATCTCCTCTGCTCCGATCCAGGATAAGGTGGTGGTGCCCGCCTTGGACAAGGT GCTGGCCAATAAGAAGGTGGCGAAGTTCCTCCTCAAGGAGTAAtcgaggcctttaac 
tctggtttcattaaattttctttagtttgaatttactgttattcggtgtgcatttct atgtttggtgagcggttttctgtgctcagagtgtgtttattttatgtaatttaattt ctttgtgagctcctgtttagcaggtcgtcccttcagcaaggacacaaaaagatttta attttattaaaaaaaaaaaaaaaaaagaccgggaattcgatatcaagcttatcgacc tgcagatcgttcaaacatttggcaataaagtttcttaagattgaatcctgttgccgg tcttgcgatgattatcatataatttctgttgaattacgttaagcatgtaataattaa catgtaatgcatgacgttatttatgagatgggtttttatgattagagtcccgcaatt atacatttaatacgcgatagaaaacaaaatatagcgcgcaaactaggataaattatc gcgcgcggtgtcatctatgttactagatctctagagtctcaagcttggcgcgccagc ttggcgtaatcatggtcatagctgttgcgattaagaattcgagctcggtacccccct actccaaaaatgtcaaagatacagtctcagaagaccaaagggctattgagacttttc aacaaagggtaatttcgggaaacctcctcggattccattgcccagctatctgtcact tcatcgaaaggacagtagaaaaggaaggtggctcctacaaatgccatcattgcgata aaggaaaggctatcattcaagatgcctctgccgacagtggtcccaaagatggacccc cacccacgaggagcatcgtggaaaaagaagacgttccaaccacgtcttcaaagcaag tggattgatgtgacatctccactgacgtaagggatgacgcacaatcccactatcctt cgcaagacccttcctctatataaggaagttcatttcatttggagaggacagcccaag cttcgactctagaggatccccttaaatcgatATTTATGATTTCGCCTCTGGCATCCG AGGAGGATGAGGAAATTGTTAAATCTGTTGTTAATGGAACGATTCCTTCGTATTCGT TGGAATCGAAGCTTGGGGATTGTAAAAGAGCGGCTGAGATTCGACGGGAGGCTTTGC AGAGAATGATGGGGAGGTCGTTGGAGGGTTTACCTGTTGAAGGATTCGATTATGAGT CGATTTTAGGTCAGTGCTGTGAAATGCCTGTTGGTTATGTGCAGATTCCGGTTGGAA TTGCTGGGCCGTTGCTGCTAGACGGGCAAGAGTACTCTGTTCCGATGGCGACCACCG AGGGTTGTTTGGTTGCTAGCACTAATAGAGGGTGTAAAGCGATCCATTTGTCAGGTG GTGCTAGTAGTGTCTTGTTGAAGGATGGCATGACTAGAGCTCCCGTTGTTCGATTCG CCTCGGCCATGAGGGCCGCGGATTTGAAGTTTTTCTTAGAGAATCCTGAGAATTTCG ATAGCTTGTCCATCGCTTTCAATAGGTCCAGTAGATTTGCAAAGCTCCAAAGCATAC AATGTTCTATTGCTGGAAAGAATCTATATATGAGATTCACCTGCAGCACTGGTGATG CAATGGGGATGAACATGGTTTCCAAAGGGGTTCAAAACGTTCTTGACTTCCTTCAAA GTGATTTCCCTGACATGGATGTTATTGGCATCTCAGGAAATTTTTGTTCGGACAAGA AGCCAGCTGCTGTGAACTGGATTCAAGGGCGAGGCAAATCGGTTGTTTGCGAGGCAA TTATCAAGGAAGAGGTGGTGAAGAAGGTATTGAAATCAAGTGTTGCTTCACTAGTAG AGCTGAACATGCTCAAGAATCTTACTGGTTCAGCTATTGCTGGAGCTCTTGGTGGAT TCAATGCACATGCTGGCAACATAGTCTCTGCAATTTTCATTGCCACTGGCCAGGATC 
CAGCCCAGAATGTTGAGAGTTCTCATTGCATCACCATGATGGAAGCTGTCAATGATG GAAAAGATCTCCACATCTCTGTAACCATGCCTTCAATCGAGGTAGGAACAGTTGGAG GAGGGACACAACTAGCATCCCAATCAGCATGTCTGAACCTACTCGGTGTAAAAGGAG CAAGTAAAGAATCACCAGGAGCAAACTCAAGGCTCCTAGCCACAATAGTAGCTGGTT CAGTCCTAGCTGGTGAACTCTCCCTAATGTCAGCCATAGCAGCAGGACAACTAGTCC GGAGCCACATGAAGTACAACAGATCCAGCAAAGATGTAACCAAATTTGCATCATCTT CAAATGCAGCAGACGAAGTTGCTACTCAACTTTTGAATTTTGACTTGCTGAAGTTGG CTGGTGATGTTGAGTCAAACCCTGGACCTATGGCGGATCTGAAATCAACCTTCCTCG ACGTTTACTCTGTTCTCAAGTCTGATCTGCTTCAAGATCCTTCCTTTGAATTCACCC ACGAATCTCGTCAATGGCTTGAACGGATGCTTGACTACAATGTACGCGGAGGGAAGC TAAATCGTGGTCTCTCTGTGGTTGATAGCTACAAGCTGTTGAAGCAAGGTCAAGACT TGACGGAGAAAGAGACTTTCCTCTCATGTGCTCTTGGTTGGTGCATTGAATGGCTTC AAGCTTATTTCCTTGTGCTTGATGACATCATGGACAACTCTGTCACACGCCGTGGCC AGCCTTGTTGGTTTAGAAAGCCAAAGGTTGGTATGATTGCCATTAACGATGGGATTC TACTTCGCAATCATATCCACAGGATTCTCAAAAAGCACTTCAGGGAAATGCCTTACT ATGTTGACCTCGTTGATTTGTTTAACGAGGTAGAGTTTCAAACAGCTTGCGGCCAGA TGATTGATTTGATCACCACCTTTGATGGAGAAAAAGATTTGTCTAAGTACTCCTTGC AAATCCATCGGCGTATTGTTGAGTACAAAACAGCTTATTACTCATTTTATCTTCCTG TTGCTTGCGCATTGCTCATGGCGGGAGAAAATTTGGAAAACCATACTGATGTGAAGA CTGTTCTTGTTGACATGGGAATTTACTTTCAAGTACAGGATGATTATCTGGACTGTT TTGCTGATCCTGAGACACTTGGCAAGATAGGGACAGACATAGAAGATTTCAAATGCT CCTGGTTGGTAGTTAAGGCATTGGAACGCTGCAGTGAAGAACAAACTAAGATACTAT ACGAGAACTATGGTAAAGCCGAACCATCAAACGTTGCTAAGGTGAAAGCTCTCTACA AAGAGCTTGATCTCGAGGGAGCGTTCATGGAATATGAGAAGGAAAGCTATGAGAAGC TGACAAAGTTGATCGAAGCTCACCAGAGTAAAGCAATTCAAGCAGTGCTAAAATCTT TCTTGGCTAAGATCTACAAGAGGCAGAAGAAATCCTCATCTAACGCTGCTGATGAGG TGGCAACACAGTTGCTGAACTTCGATCTTTTGAAACTTGCAGGAGACGTGGAATCTA ATCCAGGCCCAATGGCCAGTGCTATTCTTGCTTCATTACTCCACCCATCAGAAGTGT TGGCACTTGTGCAGTACAAGCTTTCACCCAAAACCCAGCATGATTACTCTAACGACA AAACTAGGCAAAGACTTTATCATCATCTTAATATGACTTCCCGATCCTTCTCTGCCG TCATACAGGACCTTGATGAAGAGTTAAAGGATGCTATATGCTTATTCTATCTGGTGC TGAGAGGCTTAGATACTATAGAAGACGACATGACCATCGACCTTGACACTAAATTGC CTTACCTTCGTACGTTCCACGAAATCATATACCAGAAAGGCTGGACTTTCACTAAGA ACGGCCCAAATGAAAAAGATAGGCAATTACTGGTAGAATTTGACGCCATCATAGAGG 
GCTTCCTTCAATTGAAGCCAGCCTATCAGACTATCATTGCCGATATAACCAAACGTA TGGGGAACGGAATGGCACACTACGCTACGGCAGGGATACATGTTGAGACCAACGCAG ACTACGACGAGTACTGCCACTATGTCGCTGGTTTGGTGGGGCTGGGTCTCTCTGAAA TGTTTTCCGCATGTGGGTTCGAAAGTCCTCTTGTGGCAGAAAGAAAAGACCTTAGCA ACAGCATGGGACTTTTCCTTCAGAAGACGAACATTGCACGTGATTATCTTGAAGACC TCAGAGACAATCGTCGATTTTGGCCCAAGGAAATATGGGGGCAGTATGCTGAGACTA TGGAGGACTTGGTAAAGCCCGAAAATAAAGAAAAGGCCCTCCAATGCCTCTCCCATA TGATCGTCAATGCAATGGAGCATATCAGAGACGTTTTGGAGTATCTCTCTATGATAA AGAATCCGAGCTGCTTCAAATTTTGTGCTATTCCACAAGTCATGGCTATGGCCACAT TAAACCTGCTTCATTCCAACTACAAAGTGTTCACGCATGAGAATATCAAGATCCGTA AAGGTGAGACAGTGTGGCTTATGAAAGAAAGTGACAGTATGGACAAGGTAGCTGCTA TCTTTAGGTTGTACGCCCGACAAATTAACAACAAGTCCAACTCTCTTGATCCCCATT TTGTGGATATAGGGGTGATTTGCGGTGAGATCGAGCAAATTTGCGTAGGAAGGTTCC CTGGCTCCACAATAGAAATGAAGCGAATGCAGGCTGGAGTCTTAGGGGGGAAAACTG GAACGGTCCTGTAATCAGCAATTGggggagctcgaattcgctgaaatcaccagtctc tctctacaaatctatctctctctattttctccataaataatgtgtgagtagtttccc gataagggaaattagggttcttatagggtttcgctcatgtgttgagcatataagaaa cccttagtatgtatttgtatttgtaaaatacttctatcaataaaatttctaattcct aaaaccaaaatccagtactaaaatccagatctcctaaagtccctatagatctttgtc gtgaatataaaccagacacgagacgactaaacctggagcccagacgccgttcgaagc tagaagtaccgcttaggcaggaggccgttagggaaaagatgctaaggcagggttggt tacgttgactcccccgtaggtttggtttaaatatgatgaagtggacggaaggaagga ggaagacaaggaaggataaggttgcaggccctgtgcaaggtaagaagatggaaattt gatagaggtacgctactatacttatactatacgctaagggaatgcttgtatttatac cctataccccctaataaccccttatcaatttaagaaataatccgcataagcccccgc ttaaaaattggtatcagagccatgaataggtctatgaccaaaactcaagaggataaa acctcaccaaaatacgaaagagttcttaactctaaagataaaagatggcgcgtggcc ggcctacagtatgagcggagaattaagggagtcacgttatgacccccgccgatgacg cgggacaagccgttttacgtttggaactgacagaaccgcaacgttgaaggagccact cagccgcgggtttctggagtttaatgagctaagcacatacgtcagaaaccattattg cgcgttcaaaagtcgcctaaggtcactatcagctagcaaatatttcttgtcaaaaat gctccactgacgttccataaattcccctcggtatccaattagagtctcatattcact ctcaatccaaataatctgcaccggatctggatcgtttcgcatgattgaacaagatgg attgcacgcaggttctccggccgcttgggtggagaggctattcggctatgactgggc 
acaacagacaatcggctgctctgatgccgccgtgttccggctgtcagcgcaggggcg cccggttctttttgtcaagaccgacctgtccggtgccctgaatgaactgcaggacga ggcagcgcggctatcgtggctggccacgacgggcgttccttgcgcagctgtgctcga cgttgtcactgaagcgggaagggactggctgctattgggcgaagtgccggggcagga tctcctgtcatctcaccttgetcctgccgagaaagtatcoatcatggctgatgcaat gcggcggctgcatacgcttgatccggctacctgcccattcgaccaccaagcgaaaca tcgcatcgagcgagcacgtactcggatggaagccggtcttgtcgatcaggatgatct ggacgaagagcatcaggggctcgcgccagccgaactgttcgccaggctcaaggcgcg catgcccgacggcgatgatctcgtcgtgacccatggcgatgcctgcttgccgaatat catggtggaaaatggccgcttttctggattcatcgactgtggccggctgggtgtggc ggaccgctatcaggacatagcgttggctacccgtgatattgctgaagagcttggcgg cgaatgggctgaccgcttcctcgtgctttacggtatcgccgctcccgattcgcagcg catcgccttctatcgccttcttgacgagttcttctgagcgggactctggggttcgaa atgaccgaccaagcgacgcccaacctgccatcacgagatttcgattccaccgccgcc ttctatgaaaggttgggcttcggaatcgttttccgggacgccggctggatgatcctc cagcgcggggatctcatgctggagttcttcgcccacgggatctctgcggaacaggcg gtcgaaggtgccgatatcattacgacagcaacggccgacaagcacaacgccacgatc ctgagcgacaatatgatcgcggcgtccacatcaacggcgtcggcggcgactgcccag gcaagaccgagatgcaccgcgatatcttgctgcgttcggatattttcgtggagttcc cgccacagacccggatgatccccgatcgttcaaacatttggcaataaagtttcttaa gattgaatcctgttgccggtcttgcgatgattatcatataatttctgttgaattacg ttaagcatgtaataattaacatgtaatgcatgacgttatttatgagatgggttttta tgattagagtcccgcaattatacatttaatacgcgatagaaaacaaaatatagcgcg caaactaggataaattatcgcgcgcggtgtcatctatgttactagatcgggactgta ggccggccctcactggtgaaaagaaaaaccaccccagtacattaaaaacgtccgcaa tgtgttattaagttgtctaagcgtcaatttgtttacaccacaatatatcctgccacc agccagccaacagctccccgaccggcagctcggcacaaaatcaccactcgatacagg cagcccatcagtccgggacggcgtcagcgggagagccgttgtaaggcggcagacttt gctcatgttaccgatgctattcggaagaacggcaactaagctgccgggtttgaaaca cggatgatctcgcggagggtagcatgttgattgtaacgatgacagagcgttgctgcc tgtgatcaaatatcatctccctcgcagagatccgaattatcagccttcttattcatt tctcgcttaaccgtgacagagtagacaggctgtctcgcggccgaggggcgcagcccc tgggggggatgggaggcccgcgttagcgggccgggagggttcgagaagggggggcac cccccttcggcgtgcgcggtcacgcgcacagggcgcagccctggttaaaaacaaggt 
ttataaatattggtttaaaagcaggttaaaagacaggttagcggtggccgaaaaacg ggcggaaacccttgcaaatgctggattttctgcctgtggacagcccctcaaatgtca ataggtgcgcccctcatctgtcagcactctgcccctcaagtgtcaaggatcgcgccc ctcatctgtcagtagtcgcgcccctcaagtgtcaataccgcagggcacttatcccca ggcttgtccacatcatctgtgggaaactcgcgtaaaatcaggcgttttcgccgattt gcgaggctggccagctccacgtcgccggccgaaatcgagcctgcccctcatctgtca acgccgcgccgggtgagtcggcccctcaagtgtcaacgtccgcccctcatctgtcag tgagggccaagttttccgcgaggtatccacaacgccggcggccgcggtgtctcgcac acggcttcgacggcgtttctggcgcgtttgcagggccatagacggccgccagcccag cggcgagggcaaccagcccggtgagcgtcggaaaggcgctcggtcttgccttgctcg tcggtgatgtacactagtcgctggctgctgaacccccagccggaactgaccccacaa ggccctagcgtttgcaatgcaccaggtcatcattgacccaggcgtgttccaccaggc cgctgcctcgcaactcttcgcaggcttcgccgacctgctcgcgccacttcttcacgc gggtggaatccgatccgcacatgaggcggaaggtttccagcttgagcgggtacggct cccggtgcgagctgaaatagtcgaacatccgtcgggccgtcggcgacagcttgcggt acttctcccatatgaatttcgtgtagtggtcgccagcaaacagcacgacgatttcct cgtcgatcaggacctggcaacgggacgttttcttgccacggtccaggacgcggaagc ggtgcagcagcgacaccgattccaggtgcccaacgcggtcggacgtgaagcccatcg ccgtcgcctgtaggcgcgacaggcattcctcggccttcgtgtaataccggccattga tcgaccagcccaggtcctggcaaagctcgtagaacgtgaaggtgatcggctcgccga taggggtgcgcttcgcgtactccaacacctgctgccacaccagttcgtcatcgtcgg cccgcagctcgacgccggtgtaggtgatcttcacgtccttgttgacgtggaaaatga ccttgttttgcagcgcctcgcgcgggattttcttgttgcgcgtggtgaacagggcag agcgggccgtgtcgtttggcatcgctcgcatcgtgtccggccacggcgcaatatcga acaaggaaagctgcatttccttgatctgctgcttcgtgtgtttcagcaacgcggcct gcttggcctcgctgacctgttttgccaggtcctcgccggcggtttttcgcttcttgg tcgtcatagttcctcgcgtgtcgatggtcatcgacttcgccaaacctgccgcctcct gttcgagacgacgcgaacgctccacggcggccgatggcgcgggcagggcagggggag ccagttgcacgctgtcgcgctcgatcttggccgtagcttgctggaccatcgagccga cggactggaaggtttcgcggggcgcacgcatgacggtgcggcttgcgatggtttcgg catcctcggcggaaaaccccgcgtcgatcagttcttgcctgtatgccttccggtcaa acgtccgattcattcaccctccttgcgggattgccccgactcacgccggggcaatgt gcccttattcctgatttgacccgcctggtgccttggtgtccagataatccaccttat cggcaatgaagtcggtcccgtagaccgtctggccgtccttctcgtacttggtattcc 
gaatcttgccctgcacgaataccagcgaccccttgcccaaatacttgccgtgggcct cggcctgagagccaaaacacttgatgcggaagaagtcggtgcgctcctgcttgtcgc cggcatcgttgcgccacatctaggtactaaaacaattcatccagtaaaatataatat tttattttctcccaatcaggcttgatccccagtaagtcaaaaaatagctcgacatac tgttcttccccgatatcctccctgatcgaccggacgcagaaggcaatgtcataccac ttgtccgccctgccgcttctcccaagatcaataaagccacttactttgccatctttc acaaagatgttgctgtctcccaggtcgccgtgggaaaagacaagttcctcttcgggc ttttccgtctttaaaaaatcatacagctcgcgcggatctttaaatggagtgtcttct tcccagttttcgcaatccacatcggccagatcgttattcagtaagtaatccaattcg gctaagcggctgtctaagctattcgtatagggacaatccgatatgtcgatggagtga aagagcctgatgcactccgcatacagctcgataatcttttcagggctttgttcatct tcatactcttccgagcaaaggacgccatcggcctcactcatgagcagattgctccag ccatcatgccgttcaaagtgcaggacctttggaacaggcagctttccttccagccat agcatcatgtccttttcccgttccacatcataggtggtccctttataccggctgtcc gtcatttttaaatataggttttcattttctcccaccagcttatataccttagcagga gacattccttccgtatcttttacgcagcggtatttttcgatcagttttttcaattcc ggtgatattctcattttagccatttattatttccttcctcttttctacagtatttaa agataccccaagaagctaattataacaagacgaactccaattcactgttccttgcat tctaaaaccttaaataccagaaaacagctttttcaaagttgttttcaaagttggcgt ataacatagtatcgacggagccgattttgaaaccacaattatgggtgatgctgccaa cttactgatttagtgtatgatggtgtttttgaggtgctccagtggcttctgtttcta tcagctgtccctcctgttcagctactgacggggtggtgcgtaacggcaaaagcaccg ccggacatcagcgctatctctgctctcactgccgtaaaacatggcaactgcagttca cttacaccgcttctcaacccggtacgcaccagaaaatcattgatatggccatgaatg gcgttggatgccgggcaacagcccgcattatgggcgttggcctcaacacgattttac gtcacttaaaaaactcaggccgcagtcggtaactatgcggtgtgaaataccgcacag atgcgtaaggagaaaataccgcatcaggcgctcttccgcttcctcgctcactgactc gctgcgctcggtcgttcggctgcggcgagcggtatcagctcactcaaaggcggtaat acggttatccacagaatcaggggataacgcaggaaagaacatgtgagcaaaaggcca gcaaaaggccaggaaccgtaaaaaggccgcgttgctggcgtttttccataggctccg cccccctgacgagcatcacaaaaatcgacgctcaagtcagaggtggcgaaacccgac aggactataaagataccaggcgtttccccctggaagctccctcgtgcgctctcctgt tccgaccctgccgcttaccggatacctgtccgcctttctcccttcgggaagcgtggc gctttctcatagctcacgctgtaggtatctcagttcggtgtaggtcgttcgctccaa 
gctgggctgtgtgcacgaaccccccgttcagcccgaccgctgcgccttatccggtaa ctatcgtcttgagtccaacccggtaagacacgacttatcgccactggcagcaggtaa cctcgcgcatacagccgggcagtgacgtcatcgtctgcgcggaaatggacgggcccc cggcgccagatctggggaac

The pld1hfs2-peaq-ld-sq plasmid encodes the following polypeptide in multi-cloning site 1 (SEQ ID NO:104).

MKKRLTTSTC SSSPSSSVSS STTTSSPIQS EAPRPKRAKR AKKSSPSGDK SHNPTSPAST RRSSIYRGVT RHRWTGRFEA HLWDKSSWNS IQNKKGKQVY LGAYDSEEAA AHTYDLAALK YWGPDTILNF PAETYTKELE EMQRVTKEEY LASLRRQSSG FSRGVSKYRG VARHHHNGRW EARIGRVFGN KYLYLGTYNT QEEAAAAYDM AAIEYRGANA VTNFDISNYI DRLKKKGVFP FPVNQANHQE GILVEAKQEV ETREAKEEPR EEVKQQYVEE PPQEEEEKEE EKAEQQEAEI VGYSEEAAVV NCCIDSSTIM EMDRCGDNNE LAWNFCMMDT GFSPFLTDQN LANENPIEYP ELFNELAFED NIDFMFDDGK HECLNLENLD CCVVGRESNA ADEVATQLLN FDLLKLAGDV ESNPGPMGKG EELFTGVVPI LVELDGDVNG HKFSVSGEGE GDATYGKLTL KFICTTGKLP VPWPTLVTTF GYGLQCFARY PDHMKQHDFF KSAMPEGYVQ ERTIFFKDDG NYKTRAEVKF EGDTLVNRIE LKGIDFKEDG NILGHKLEYN YNSHNVYIMA DKQKNGIKVN FKIRHNIEDG SVQLADHYQQ NTPIGDGPVL LPDNHYLSYQ SALSKDPNEK RDHMVLLEFV TAAGITLGMD ELYKSGLRSR AQASNSAVDG TAGPGSSTSL YKKAGSTMAG PIMTSAPSAT TPTGKTMPFK QPFKTVATLS AKTGNITKPI DPAISKTIDF VYNGYSTVKT KVDKAPKVNP YLLIAGGLVL SCIISMCLLV PAVIFFPVTI FLGVATSFAL IALAPVAFVF GWILISSAPI QDKVVVPALD KVLANKKVAK FLLKE

The pld1hfs2-peaq-ld-sq plasmid encodes the following in site 2 (SEQ ID NO:105).

MISPLASEED EEIVKSVVNG TIPSYSLESK LGDCKRAAEI RREALQRMMG RSLEGLPVEG FDYESILGQC CEMPVGYVQI PVGIAGPLLL DGQEYSVPMA TTEGCLVAST NRGCKAIHLS GGASSVLLKD GMTRAPVVRF ASAMRAADLK FFLENPENFD SLSIAFNRSS RFAKLQSIQC SIAGKNLYMR FTCSTGDAMG MNMVSKGVQN VLDFLQSDFP DMDVIGISGN FCSDKKPAAV NWIQGRGKSV VCEAIIKEEV VKKVLKSSVA SLVELNMLKN LTGSAIAGAL GGFNAHAGNI VSAIFIATGQ DPAQNVESSH CITMMEAVND GKDLHISVTM PSIEVGTVGG GTQLASQSAC LNLLGVKGAS KESPGANSRL LATIVAGSVL AGELSLMSAI AAGQLVRSHM KYNRSSKDVT KFASSSNAAD EVATQLLNFD LLKLAGDVES NPGPMADLKS TFLDVYSVLK SDLLQDPSFE FTHESRQWLE RMLDYNVRGG KLNRGLSVVD SYKLLKQGQD LTEKETFLSC ALGWCIEWLQ AYFLVLDDIM DNSVTRRGQP CWFRKPKVGM IAINDGILLR NHIHRILKKH FREMPYYVDL VDLFNEVEFQ TACGQMIDLI TTFDGEKDLS KYSLQIHRRI VEYKTAYYSF YLPVACALLM AGENLENHTD VKTVLVDMGI YFQVQDDYLD CFADPETLGK IGTDIEDFKC SWLVVKALER CSEEQTKILY ENYGKAEPSN VAKVKALYKE LDLEGAFMEY EKESYEKLTK LIEAHQSKAI QAVLKSFLAK IYKRQKKSSS NAADEVATQL LNFDLLKLAG DVESNPGPMA SAILASLLHP SEVLALVQYK LSPKTQHDYS NDKTRQRLYH HLNMTSRSFS AVIQDLDEEL KDAICLFYLV LRGLDTIEDD MTIDLDTKLP YLRTFHEIIY QKGWTFTKNG PNEKDRQLLV EFDAIIEGFL QLKPAYQTII ADITKRMGNG MAHYATAGIH VETNADYDEY CHYVAGLVGL GLSEMFSACG FESPLVAERK DLSNSMGLFL QKTNIARDYL EDLRDNRRFW PKEIWGQYAE TMEDLVKPEN KEKALQCLSH MIVNAMEHIR DVLEYLSMIK NPSCFKFCAI PQVMAMATLN LLHSNYKVFT HENIKIRKGE TVWLMKESDS MDKVAAIFRL YARQINNKSN SLDPHFVDIG VICGEIEQIC VGRFPGSTIE MKRMQAGVLG GKTGTVL

The pldsl hf2-peaq_wril lvl sqs-ldspmcsljhmgrlvl fppsmcs2 plasmid has the following sequence (SEQ ID NO:106).

cctgtggttggcatgcacatacaaatggacgaacggataaaccttttcacgcccttt taaatatccgattattctaataaacgctcttttctcttaggtttacccgccaatata tcctgtcaaacactgatagtttgtgaaccatcacccaaatcaagttttttggggtcg aggtgccgtaaagcactaaatcggaaccctaaagggagcccccgatttagagcttga cggggaaagccggcgaacgtggcgagaaaggaagggaagaaagcgaaaggagcgggc gccattcaggctgcgcaactgttgggaagggcgatcggtgcgggcctcttcgctatt acgccagctggcgaaagggggatgtgctgcaaggcgattaagttgggtaacgccagg gttttcccagtcacgacgttgtaaaacgacggccagtgaattgttaattaagaattc gagctccaccgcggaaacctcctcggattccattgcccagctatctgtcactttatt gagaagatagtggaaaaggaaggtggctcctacaaatgccatcattgcgataaagga aaggccatcgttgaagatgcctctgccgacagtggtcccaaagatggacccccaccc acgaggagcatcgtggaaaaagaagacgttccaaccacgtcttcaaagcaagtggat tgatgtgatatctccactgacgtaagggatgacgcacaatcccactatccttcgcaa gacccttcctctatataaggaagttcatttcatttggagaggtattaaaatcttaat aggttttgataaaagcgaacgtggggaaacccgaaccaaaccttcttctaaactctc tctcatctctcttaaagcaaacttctctcttgtctttcttgcgtgagcgatcttcaa cgttgtcagatcgtgcttcggcaccagtacaacgttttctttcactgaagcgaaatc aaagatctctttgtggacacgtagtgcggcgccattaaataacgtgtacttgtccta ttcttgtcggtgtggtcttgggaaaagaaagcttgctggaggctgctgttcagcccc atacattacttgttacgattctgctgactttcggcgggtgcaatatctctacttctg cttgacgaggtattgttgcctgtacttctttcttcttcttcttgctgattggttcta taagaaatctagtattttctttgaaacagagttttcccgtggttttcgaacttggag aaagattgttaagcttctgtatattctgcccaaattcgcgATGAAGAAGCGCTTAAC CACTTCCACTTGTTCTTCTTCTCCATCTTCCTCTGTTTCTTCTTCTACTACTACTTC CTCTCCTATTCAGTCGGAGGCTCCAAGGCCTAAACGAGCCAAAAGGGCTAAGAAATC TTCTCCTTCTGGTGATAAATCTCATAACCCGACAAGCCCTGCTTCTACCCGACGCAG CTCTATCTACAGAGGAGTCACTAGACATAGATGGACTGGGAGATTCGAGGCTCATCT TTGGGACAAAAGCTCTTGGAATTCGATTCAGAACAAGAAAGGCAAACAAGTTTATCT GGGAGCATATGACAGTGAAGAAGCAGCAGCACATACGTACGATCTGGCTGCTCTCAA GTACTGGGGACCCGACACCATCTTGAATTTTCCGGCAGAGACGTACACAAAGGAATT GGAAGAAATGCAGAGAGTGACAAAGGAAGAATATTTGGCTTCTCTCCGCCGCCAGAG CAGTGGTTTCTCCAGAGGCGTCTCTAAATATCGCGGCGTCGCTAGGCATCACCACAA CGGAAGATGGGAGGCTCGGATCGGAAGAGTGTTTGGGAACAAGTACTTGTACCTCGG CACCTATAATACGCAGGAGGAAGCTGCTGCAGCATATGACATGGCTGCGATTGAGTA 
TCGAGGCGCAAACGCGGTTACTAATTTCGACATTAGTAATTACATTGACCGGTTAAA GAAGAAAGGTGTTTTCCCGTTCCCTGTGAACCAAGCTAACCATCAAGAGGGTATTCT TGTTGAAGCCAAACAAGAAGTTGAAACGAGAGAAGCGAAGGAAGAGCCTAGAGAAGA AGTGAAACAACAGTACGTGGAAGAACCACCGCAAGAAGAAGAAGAGAAGGAAGAAGA GAAAGCAGAGCAACAAGAAGCAGAGATTGTAGGATATTCAGAAGAAGCAGCAGTGGT CAATTGCTGCATAGACTCTTCAACCATAATGGAAATGGATCGTTGTGGGGACAACAA TGAGCTGGCTTGGAACTTCTGTATGATGGATACAGGGTTTTCTCCGTTTTTGACTGA TCAGAATCTCGCGAATGAGAATCCCATAGAGTATCCGGAGCTATTCAATGAGTTAGC ATTTGAGGACAACATCGACTTCATGTTCGATGATGGGAAGCACGAGTGCTTGAACTT GGAAAATCTGGATTGTTGCGTGGTGGGAAGAGAGTCAAATGCAGCAGACGAAGTTGC TACTCAACTTTTGAATTTTGACTTGCTGAAGTTGGCTGGTGATGTTGAGTCAAACCC TGGACCTATGGCCAGTGCTATTCTTGCTTCATTACTCCACCCATCAGAAGTGTTGGC ACTTGTGCAGTACAAGCTTTCACCCAAAACCCAGCATGATTACTCTAACGACAAAAC TAGGCAAAGACTTTATCATCATCTTAATATGACTTCCCGATCCTTCTCTGCCGTCAT ACAGGACCTTGATGAAGAGTTAAAGGATGCTATATGCTTATTCTATCTGGTGCTGAG AGGCTTAGATACTATAGAAGACGACATGACCATCGACCTTGACACTAAATTGCCTTA CCTTCGTACGTTCCACGAAATCATATACCAGAAAGGCTGGACTTTCACTAAGAACGG CCCAAATGAAAAAGATAGGCAATTACTGGTAGAATTTGACGCCATCATAGAGGGCTT CCTTCAATTGAAGCCAGCCTATCAGACTATCATTGCCGATATAACCAAACGTATGGG GAACGGAATGGCACACTACGCTACGGCAGGGATACATGTTGAGACCAACGCAGACTA CGACGAGTACTGCCACTATGTCGCTGGTTTGGTGGGGCTGGGTCTCTCTGAAATGTT TTCCGCATGTGGGTTCGAAAGTCCTCTTGTGGCAGAAAGAAAAGACCTTAGCAACAG CATGGGACTTTTCCTTCAGAAGACGAACATTGCACGTGATTATCTTGAAGACCTCAG AGACAATCGTCGATTTTGGCCCAAGGAAATATGGGGGCAGTATGCTGAGACTATGGA GGACTTGGTAAAGCCCGAAAATAAAGAAAAGGCCCTCCAATGCCTCTCCCATATGAT CGTCAATGCAATGGAGCATATCAGAGACGTTTTGGAGTATCTCTCTATGATAAAGAA TCCGAGCTGCTTCAAATTTTGTGCTATTCCACAAGTCATGGCTATGGCCACATTAAA CCTGCTTCATTCCAACTACAAAGTGTTCACGCATGAGAATATCAAGATCCGTAAAGG TGAGACAGTGTGGCTTATGAAAGAAAGTGACAGTATGGACAAGGTAGCTGCTATCTT TAGGTTGTACGCCCGACAAATTAACAACAAGTCCAACTCTCTTGATCCCCATTTTGT GGATATAGGGGTGATTTGCGGTGAGATCGAGCAAATTTGCGTAGGAAGGTTCCCTGG CTCCACAATAGAAATGAAGCGAATGCAGGCTGGAGTCTTAGGGGGGAAAACTGGAAC GGTCCTGATGGCCGGCCCCATCATGACCTCTGCGCCCTCCGCGACCACGCCCACGGG CAAGACAATGCCGTTCAAGCAGCCTTTCAAGACTGTGGCCACGCTGTCCGCCAAGAC 
TGGCAACATTACCAAGCCCATCGACCCTGCCATCTCCAAGACCATTGACTTCGTCTA CAATGGTTACTCGACGGTCAAGACCAAGGTTGACAAGGCCCCTAAGGTAAACCCCTA CCTGCTCATTGCCGGCGGCCTCGTCCTCTCGTGCATCATCTCCATGTGCCTGCTCGT CCCGGCCGTGATCTTCTTCCCCGTCACCATCTTCCTGGGTGTCGCTACGTCGTTTGC GCTCATTGCATTGGCCCCCGTGGCTTTTGTGTTCGGGTGGATCCTGATCTCCTCTGC TCCGATCCAGGATAAGGTGGTGGTGCCCGCCTTGGACAAGGTGCTGGCCAATAAGAA GGTGGCGAAGTTCCTCCTCAAGGAGTAAtcgaggcctttaactctggtttcattaaa ttttctttagtttgaatttactgttattcggtgtgcatttctatgtttggtgagcgg ttttctgtgctcagagtgtgtttattttatgtaatttaatttctttgtgagctcctg tttagcaggtcgtcccttcagcaaggacacaaaaagattttaattttattaaaaaaa aaaaaaaaaaagaccgggaattcgatatcaagcttatcgacctgcagatcgttcaaa catttggcaataaagtttcttaagattgaatcctgttgccggtcttgcgatgattat catataatttctgttgaattacgttaagcatgtaataattaacatgtaatgcatgac gttatttatgagatgggtttttatgattagagtcccgcaattatacatttaatacgc gatagaaaacaaaatatagcgcgcaaactaggataaattatcgcgcgcggtgtcatc tatgttactagatctctagagtctcaagcttggcgcgccagcttggcgtaatcatgg tcatagctgttgcgattaagaattcgagctcggtacccccctactccaaaaatgtca aagatacagtctcagaagaccaaagggctattgagacttttcaacaaagggtaattt cgggaaacctcctcggattccattgcccagctatctgtcacttcatcgaaaggacag tagaaaaggaaggtggctcctacaaatgccatcattgcgataaaggaaaggctatca ttcaagatgcctctgccgacagtggtcccaaagatggacccccacccacgaggagca tcgtggaaaaagaagacgttccaaccacgtcttcaaagcaagtggattgatgtgaca tctccactgacgtaagggatgacgcacaatcccactatccttcgcaagacccttcct ctatataaggaagttcatttcatttggagaggacagcccaagcttcgactctagagg atCCCCttaaatcgatATTTATGATTTCGCCTCTGGCATCCGAGGAGGATGAGGAAA TTGTTAAATCTGTTGTTAATGGAACGATTCCTTCGTATTCGTTGGAATCGAAGCTTG GGGATTGTAAAAGAGCGGCTGAGATTCGACGGGAGGCTTTGCAGAGAATGATGGGGA GGTCGTTGGAGGGTTTACCTGTTGAAGGATTCGATTATGAGTCGATTTTAGGTCAGT GCTGTGAAATGCCTGTTGGTTATGTGCAGATTCCGGTTGGAATTGCTGGGCCGTTGC TGCTAGACGGGCAAGAGTACTCTGTTCCGATGGCGACCACCGAGGGTTGTTTGGTTG CTAGCACTAATAGAGGGTGTAAAGCGATCCATTTGTCAGGTGGTGCTAGTAGTGTCT TGTTGAAGGATGGCATGACTAGAGCTCCCGTTGTTCGATTCGCCTCGGCCATGAGGG CCGCGGATTTGAAGTTTTTCTTAGAGAATCCTGAGAATTTCGATAGCTTGTCCATCG CTTTCAATAGGTCCAGTAGATTTGCAAAGCTCCAAAGCATACAATGTTCTATTGCTG 
GAAAGAATCTATATATGAGATTCACCTGCAGCACTGGTGATGCAATGGGGATGAACA TGGTTTCCAAAGGGGTTCAAAACGTTCTTGACTTCCTTCAAAGTGATTTCCCTGACA TGGATGTTATTGGCATCTCAGGAAATTTTTGTTCGGACAAGAAGCCAGCTGCTGTGA ACTGGATTCAAGGGCGAGGCAAATCGGTTGTTTGCGAGGCAATTATCAAGGAAGAGG TGGTGAAGAAGGTATTGAAATCAAGTGTTGCTTCACTAGTAGAGCTGAACATGCTCA AGAATCTTACTGGTTCAGCTATTGCTGGAGCTCTTGGTGGATTCAATGCACATGCTG GCAACATAGTCTCTGCAATTTTCATTGCCACTGGCCAGGATCCAGCCCAGAATGTTG AGAGTTCTCATTGCATCACCATGATGGAAGCTGTCAATGATGGAAAAGATCTCCACA TCTCTGTAACCATGCCTTCAATCGAGGTAGGAACAGTTGGAGGAGGGACACAACTAG CATCCCAATCAGCATGTCTGAACCTACTCGGTGTAAAAGGAGCAAGTAAAGAATCAC CAGGAGCAAACTCAAGGCTCCTAGCCACAATAGTAGCTGGTTCAGTCCTAGCTGGTG AACTCTCCCTAATGTCAGCCATAGCAGCAGGACAACTAGTCCGGAGCCACATGAAGT ACAACAGATCCAGCAAAGATGTAACCAAATTTGCATCATCTTCAAATGCAGCAGACG AAGTTGCTACTCAACTTTTGAATTTTGACTTGCTGAAGTTGGCTGGTGATGTTGAGT CAAACCCTGGACCTATGGCGGATCTGAAATCAACCTTCCTCGACGTTTACTCTGTTC TCAAGTCTGATCTGCTTCAAGATCCTTCCTTTGAATTCACCCACGAATCTCGTCAAT GGCTTGAACGGATGCTTGACTACAATGTACGCGGAGGGAAGCTAAATCGTGGTCTCT CTGTGGTTGATAGCTACAAGCTGTTGAAGCAAGGTCAAGACTTGACGGAGAAAGAGA CTTTCCTCTCATGTGCTCTTGGTTGGTGCATTGAATGGCTTCAAGCTTATTTCCTTG TGCTTGATGACATCATGGACAACTCTGTCACACGCCGTGGCCAGCCTTGTTGGTTTA GAAAGCCAAAGGTTGGTATGATTGCCATTAACGATGGGATTCTACTTCGCAATCATA TCCACAGGATTCTCAAAAAGCACTTCAGGGAAATGCCTTACTATGTTGACCTCGTTG ATTTGTTTAACGAGGTAGAGTTTCAAACAGCTTGCGGCCAGATGATTGATTTGATCA CCACCTTTGATGGAGAAAAAGATTTGTCTAAGTACTCCTTGCAAATCCATCGGCGTA TTGTTGAGTACAAAACAGCTTATTACTCATTTTATCTTCCTGTTGCTTGCGCATTGC TCATGGCGGGAGAAAATTTGGAAAACCATACTGATGTGAAGACTGTTCTTGTTGACA TGGGAATTTACTTTCAAGTACAGGATGATTATCTGGACTGTTTTGCTGATCCTGAGA CACTTGGCAAGATAGGGACAGACATAGAAGATTTCAAATGCTCCTGGTTGGTAGTTA AGGCATTGGAACGCTGCAGTGAAGAACAAACTAAGATACTATACGAGAACTATGGTA AAGCCGAACCATCAAACGTTGCTAAGGTGAAAGCTCTCTACAAAGAGCTTGATCTCG AGGGAGCGTTCATGGAATATGAGAAGGAAAGCTATGAGAAGCTGACAAAGTTGATCG AAGCTCACCAGAGTAAAGCAATTCAAGCAGTGCTAAAATCTTTCTTGGCTAAGATCT ACAAGAGGCAGAAGTAAAAATCCTCAGCAATTGggggagctcgaattcgctgaaatc accagtctctctctacaaatctatctctctctattttctccataaataatgtgtgag 
tagtttcccgataagggaaattagggttcttatagggtttcgctcatgtgttgagca tataagaaacccttagtatgtatttgtatttgtaaaatacttctatcaataaaattt ctaattcctaaaaccaaaatccagtactaaaatccagatctcctaaagtccctatag atctttgtcgtgaatataaaccagacacgagacgactaaacctggagcccagacgcc gttcgaagctagaagtaccgcttaggcaggaggccgttagggaaaagatgctaaggc agggttggttacgttgactcccccgtaggtttggtttaaatatgatgaagtggacgg aaggaaggaggaagacaaggaaggataaggttgcaggccctgtgcaaggtaagaaga tggaaatttgatagaggtacgctactatacttatactatacgctaagggaatgcttg tatttataccctataccccctaataaccccttatcaatttaagaaataatccgcata agcccccgcttaaaaattggtatcagagccatgaataggtctatgaccaaaactcaa gaggataaaacctcaccaaaatacgaaagagttcttaactctaaagataaaagatgg cgcgtggccggcctacagtatgagcggagaattaagggagtcacgttatgacccccg ccgatgacgcgggacaagccgttttacgtttggaactgacagaaccgcaacgttgaa ggagccactcagccgcgggtttctggagtttaatgagctaagcacatacgtcagaaa ccattattgcgcgttcaaaagtcgcctaaggtcactatcagctagcaaatatttctt gtcaaaaatgctccactgacgttccataaattcccctcggtatccaattagagtctc atattcactctcaatccaaataatctgcaccggatctggatcgtttcgcatgattga acaagatggattgcacgcaggttctccggccgcttgggtggagaggctattcggcta tgactgggcacaacagacaatcggctgctctgatgccgccgtgttccggctgtcagc gcaggggcgcccggttctttttgtcaagaccgacctgtccggtgccctgaatgaact gcaggacgaggcagcgcggctatcgtggctggccacgacgggcgttccttgcgcagc tgtgctcgacgttgtcactgaagcgggaagggactggctgctattgggcgaagtgcc ggggcaggatctcctgtcatctcaccttgctcctgccgagaaagtatccatcatggc tgatgcaatgcggcggctgcatacgcttgatccggctacctgcccattcgaccacca agcgaaacatcgcatcgagcgagcacgtactcggatggaagccggtcttgtcgatca ggatgatctggacgaagagcatcaggggctcgcgccagccgaactgttcgccaggct caaggcgcgcatgcccgacggcgatgatctcgtcgtgacccatggcgatgcctgctt gccgaatatcatggtggaaaatggccgcttttctggattcatcgactgtggccggct gggtgtggcggaccgctatcaggacatagcgttggctacccgtgatattgctgaaga gcttggcggcgaatgggctgaccgcttcctcgtgctttacggtatcgccgctcccga ttcgcagcgcatcgccttctatcgccttcttgacgagttcttctgagcgggactctg gggttcgaaatgaccgaccaagcgacgcccaacctgccatcacgagatttcgattcc accgccgccttctatgaaaggttgggcttcggaatcgttttccgggacgccggctgg atgatcctccagcgcggggatctcatgctggagttcttcgcccacgggatctctgcg 
gaacaggcggtcgaaggtgccgatatcattacgacagcaacggccgacaagcacaac gccacgatcctgagcgacaatatgatcgcggcgtccacatcaacggcgtcggcggcg actgcccaggcaagaccgagatgcaccgcgatatcttgctgcgttcggatattttcg tggagttcccgccacagacccggatgatccccgatcgttcaaacatttggcaataaa gtttcttaagattgaatcctgttgccggtcttgcgatgattatcatataatttctgt tgaattacgttaagcatgtaataattaacatgtaatgcatgacgttatttatgagat gggtttttatgattagagtcccgcaattatacatttaatacgcgatagaaaacaaaa tatagcgcgcaaactaggataaattatcgcgcgcggtgtcatctatgttactagate gggactgtaggccggccctcactggtgaaaagaaaaaccaccccagtacattaaaaa cgtccgcaatgtgttattaagttgtctaagcgtcaatttgtttacaccacaatatat cctgccaccagccagccaacagctccccgaccggcagctcggcacaaaatcaccact cgatacaggcagcccatcagtccgggacggcgtcagcgggagagccgttgtaaggcg gcagactttgctcatgttaccgatgctattcggaagaacggcaactaagctgccggg tttgaaacacggatgatctcgcggagggtagcatgttgattgtaacgatgacagagc gttgctgcctgtgatcaaatatcatctccctcgcagagatccgaattatcagccttc ttattcatttctcgcttaaccgtgacagagtagacaggctgtctcgcggccgagggg cgcagcccctgggggggatgggaggcccgcgttagcgggccgggagggttcgagaag ggggggcaccccccttcggcgtgcgcggtcacgcgcacagggcgcagccctggttaa aaacaaggtttataaatattggtttaaaagcaggttaaaagacaggttagcggtggc cgaaaaacgggcggaaacccttgcaaatgctggattttctgcctgtggacagcccct caaatgtcaataggtgcgcccctcatctgtcagcactctgcccctcaagtgtcaagg atcgcgcccctcatctgtcagtagtcgcgcccctcaagtgtcaataccgcagggcac ttatccccaggcttgtccacatcatctgtgggaaactcgcgtaaaatcaggcgtttt cgccgatttgcgaggctggccagctccacgtcgccggccgaaatcgagcctgcccct catctgtcaacgccgcgccgggtgagtcggcccctcaagtgtcaacgtccgcccctc atctgtcagtgagggccaagttttccgcgaggtatccacaacgccggcggccgcggt gtctcgcacacggcttcgacggcgtttctggcgcgtttgcagggccatagacggccg ccagcccagcggcgagggcaaccagcccggtgagcgtcggaaaggcgctcggtcttg ccttgctcgtcggtgatgtacactagtcgctggctgctgaacccccagccggaactg accccacaaggccctagcgtttgcaatgcaccaggtcatcattgacccaggcgtgtt ccaccaggccgctgcctcgcaactcttcgcaggcttcgccgacctgctcgcgccact tcttcacgcgggtggaatccgatccgcacatgaggcggaaggtttccagcttgagcg ggtacggctcccggtgcgagctgaaatagtcgaacatccgtcgggccgtcggcgaca gcttgcggtacttctcccatatgaatttcgtgtagtggtcgccagcaaacagcacga 
cgatttcctcgtcgatcaggacctggcaacgggacgttttcttgccacggtccagga cgcggaagcggtgcagcagcgacaccgattccaggtgcccaacgcggtcggacgtga agcccatcgccgtcgcctgtaggcgcgacaggcattcctcggccttcgtgtaatacc ggccattgatcgaccagcccaggtcctggcaaagctcgtagaacgtgaaggtgatcg gctcgccgataggggtgcgcttcgcgtactccaacacctgctgccacaccagttcgt catcgtcggcccgcagctcgacgccggtgtaggtgatcttcacgtccttgttgacgt ggaaaatgaccttgttttgcagcgcctcgcgcgggattttcttgttgcgcgtggtga acagggcagagcgggccgtgtcgtttggcatcgctcgcatcgtgtccggccacggcg caatatcgaacaaggaaagctgcatttccttgatctgctgcttcgtgtgtttcagca acgcggcctgcttggcctcgctgacctgttttgccaggtcctcgccggcggtttttc gcttcttggtcgtcatagttcctcgcgtgtcgatggtcatcgacttcgccaaacctg ccgcctcctgttcgagacgacgcgaacgctccacggcggccgatggcgcgggcaggg cagggggagccagttgcacgctgtcgcgctcgatcttggccgtagcttgctggacca tcgagccgacggactggaaggtttcgcggggcgcacgcatgacggtgcggcttgcga tggtttcggcatcctcggcggaaaaccccgcgtcgatcagttcttgcctgtatgcct tccggtcaaacgtccgattcattcaccctccttgcgggattgccccgactcacgccg gggcaatgtgcccttattcctgatttgacccgcctggtgccttggtgtccagataat ccaccttatcggcaatgaagtcggtcccgtagaccgtctggccgtccttctcgtact tggtattccgaatcttgccctgcacgaataccagcgaccccttgcccaaatacttgc cgtgggcctcggcctgagagccaaaacacttgatgcggaagaagtcggtgcgctcct gcttgtcgccggcatcgttgcgccacatctaggtactaaaacaattcatccagtaaa atataatattttattttctcccaatcaggcttgatccccagtaagtcaaaaaatagc tcgacatactgttcttccccgatatcctccctgatcgaccggacgcagaaggcaatg tcataccacttgtccgccctgccgcttctcccaagatcaataaagccacttactttg ccatctttcacaaagatgttgctgtctcccaggtcgccgtgggaaaagacaagttcc tcttcgggcttttccgtctttaaaaaatcatacagctcgcgcggatctttaaatgga gtgtcttcttcccagttttcgcaatccacatcggccagatcgttattcagtaagtaa tccaattcggctaagcggctgtctaagctattcgtatagggacaatccgatatgtcg atggagtgaaagagcctgatgcactccgcatacagctcgataatcttttcagggctt tgttcatcttcatactcttccgagcaaaggacgccatcggcctcactcatgagcaga ttgctccagccatcatgccgttcaaagtgcaggacctttggaacaggcagctttcct tccagccatagcatcatgtccttttcccgttccacatcataggtggtccctttatac cggctgtccgtcatttttaaatataggttttcattttctcccaccagcttatatacc ttagcaggagacattccttccgt atcttttacgcagcggtatttttcgatcagtttt 
ttcaattccggtgatattctcattttagccatttattatttccttcctcttttctac agtatttaaagataccccaagaagctaattataacaagacgaactccaattcactgt tccttgcattctaaaaccttaaataccagaaaacagctttttcaaagttgttttcaa agttggcgtataacatagtatcgacggagccgattttgaaaccacaattatgggtga tgctgccaacttactgatttagtgtatgatggtgtttttgaggtgctccagtggctt ctgtttctatcagctgtccctcctgttcagctactgacggggtggtgcgtaacggca aaagcaccgccggacatcagcgctatctctgctctcactgccgtaaaacatggcaac tgcagttcacttacaccgcttctcaacccggtacgcaccagaaaatcattgatatgg ccatgaatggcgttggatgccgggcaacagcccgcattatgggcgttggcctcaaca cgattttacgtcacttaaaaaactcaggccgcagtcggtaactatgcggtgtgaaat accgcacagatgcgtaaggagaaaataccgcatcaggcgctcttccgcttcctcgct cactgactcgctgcgctcggtcgttcggctgcggcgagcggtatcagctcactcaaa ggcggtaatacggttatccacagaatcaggggataacgcaggaaagaacatgtgagc aaaaggccagcaaaaggccaggaaccgtaaaaaggccgcgttgctggcgtttttcca

The plds 1 hf2-peaq_wri 1 lv 1 sqs-ldspmcs l_hmgrlv 1 fppsmcs2 plasmid encodes the following in multi-cloning site within site 1 (SEQ ID NO:107).

The plds 1 hf2-peaq_wril lvl sqs-ldspmcs l_hmgrlvl fppsmcs2 plasmid encodes the following in site 2 (SEQ ID NO: 108).

LNLLGVKGAS KESPGANSRL LATIVAGSVL AGELSLMSAI AAGQLVRSHM KYNRSSKDVT KFASSSNAAD EVATQLLNFD LLKLAGDVES NPGPMADLKS TFLDVYSVLK SDLLQDPSFE FTHESRQWLE RMLDYNVRGG KLNRGLSVVD SYKLLKQGQD LTEKETFLSC ALGWCIEWLQ AYFLVLDDIM DNSVTRRGQP CWFRKPKVGM IAINDGILLR NHIHRILKKH FREMPYYVDL VDLFNEVEFQ TACGQMIDLI TTFDGEKDLS KYSLQIHRRI VEYKTAYYSF YLPVACALLM AGENLENHTD VKTVLVDMGI YFQVQDDYLD CFADPETLGK IGTDIEDFKC SWLVVKALER CSEEQTKILY ENYGKAEPSN VAKVKALYKE LDLEGAFMEY EKESYEKLTK LIEAHQSKAI QAVLKSFLAK IYKRQK

The pwh 1 slf2-peaq_wri 1 lv 1 hmgrmcs l_sqs-ldsp-fppsmcs2 plasmid has the following sequence (SEQ ID NO: 109).

cctgtggttggcatgcacatacaaatggacgaacggataaaccttttcacgcccttt taaatatccgattattctaataaacgctcttttctcttaggtttacccgccaatata tcctgtcaaacactgatagtttgtgaaccatcacccaaatcaagttttttggggtcg aggtgccgtaaagcactaaatcggaaccctaaagggagcccccgatttagagcttga cggggaaagccggcgaacgtggcgagaaaggaagggaagaaagcgaaaggagcgggc gccattcaggctgcgcaactgttgggaagggcgatcggtgcgggcctcttcgctatt acgccagctggcgaaagggggatgtgctgcaaggcgattaagttgggtaacgccagg gttttcccagtcacgacgttgtaaaacgacggccagtgaattgttaattaagaattc gagctccaccgcggaaacctcctcggattccattgcccagctatctgtcactttatt gagaagatagtggaaaaggaaggtggctcctacaaatgccatcattgcgataaagga aaggccatcgttgaagatgcctctgccgacagtggtcccaaagatggacccccaccc acgaggagcatcgtggaaaaagaagacgttccaaccacgtcttcaaagcaagtggat tgatgtgatatctccactgacgtaagggatgacgcacaatcccactatccttcgcaa gacccttcctctatataaggaagttcatttcatttggagaggtattaaaatcttaat aggttttgataaaagcgaacgtggggaaacccgaaccaaaccttcttctaaactctc tctcatctctcttaaagcaaacttctctcttgtctttcttgcgtgagcgatcttcaa cgttgtcagatcgtgcttcggcaccagtacaacgttttctttcactgaagcgaaatc aaagatctctttgtggacacgtagtgcggcgccattaaataacgtgtacttgtccta ttcttgtcggtgtggtcttgggaaaagaaagcttgctggaggctgctgttcagcccc atacattacttgttacgattctgctgactttcggcgggtgcaatatctctacttctg cttgacgaggtattgttgcctgtacttctttcttcttcttcttgctgattggttcta taagaaatctagtattttctttgaaacagagttttcccgtggttttcgaacttggag aaagattgttaagcttctgtatattctgcccaaattcgcgATGAAGAAGCGCTTAAC

CACTTCCACTTGTTCTTCTTCTCCATCTTCCTCTGTTTCTTCTTCTACTACTACTTC CTCTCCTATTCAGTCGGAGGCTCCAAGGCCTAAACGAGCCAAAAGGGCTAAGAAATC TTCTCCTTCTGGTGATAAATCTCATAACCCGACAAGCCCTGCTTCTACCCGACGCAG CTCTATCTACAGAGGAGTCACTAGACATAGATGGACTGGGAGATTCGAGGCTCATCT TTGGGACAAAAGCTCTTGGAATTCGATTCAGAACAAGAAAGGCAAACAAGTTTATCT GGGAGCATATGACAGTGAAGAAGCAGCAGCACATACGTACGATCTGGCTGCTCTCAA GTACTGGGGACCCGACACCATCTTGAATTTTCCGGCAGAGACGTACACAAAGGAATT GGAAGAAATGCAGAGAGTGACAAAGGAAGAATATTTGGCTTCTCTCCGCCGCCAGAG CAGTGGTTTCTCCAGAGGCGTCTCTAAATATCGCGGCGTCGCTAGGCATCACCACAA CGGAAGATGGGAGGCTCGGATCGGAAGAGTGTTTGGGAACAAGTACTTGTACCTCGG CACCTATAATACGCAGGAGGAAGCTGCTGCAGCATATGACATGGCTGCGATTGAGTA
TCGAGGCGCAAACGCGGTTACTAATTTCGACATTAGTAATTACATTGACCGGTTAAA GAAGAAAGGTGTTTTCCCGTTCCCTGTGAACCAAGCTAACCATCAAGAGGGTATTCT TGTTGAAGCCAAACAAGAAGTTGAAACGAGAGAAGCGAAGGAAGAGCCTAGAGAAGA AGTGAAACAACAGTACGTGGAAGAACCACCGCAAGAAGAAGAAGAGAAGGAAGAAGA GAAAGCAGAGCAACAAGAAGCAGAGATTGTAGGATATTCAGAAGAAGCAGCAGTGGT CAATTGCTGCATAGACTCTTCAACCATAATGGAAATGGATCGTTGTGGGGACAACAA TGAGCTGGCTTGGAACTTCTGTATGATGGATACAGGGTTTTCTCCGTTTTTGACTGA TCAGAATCTCGCGAATGAGAATCCCATAGAGTATCCGGAGCTATTCAATGAGTTAGC ATTTGAGGACAACATCGACTTCATGTTCGATGATGGGAAGCACGAGTGCTTGAACTT GGAAAATCTGGATTGTTGCGTGGTGGGAAGAGAGTCAAATGCAGCAGACGAAGTTGC TACTCAACTTTTGAATTTTGACTTGCTGAAGTTGGCTGGTGATGTTGAGTCAAACCC TGGACCTATGATTTCGCCTCTGGCATCCGAGGAGGATGAGGAAATTGTTAAATCTGT TGTTAATGGAACGATTCCTTCGTATTCGTTGGAATCGAAGCTTGGGGATTGTAAAAG AGCGGCTGAGATTCGACGGGAGGCTTTGCAGAGAATGATGGGGAGGTCGTTGGAGGG TTTACCTGTTGAAGGATTCGATTATGAGTCGATTTTAGGTCAGTGCTGTGAAATGCC TGTTGGTTATGTGCAGATTCCGGTTGGAATTGCTGGGCCGTTGCTGCTAGACGGGCA AGAGTACTCTGTTCCGATGGCGACCACCGAGGGTTGTTTGGTTGCTAGCACTAATAG AGGGTGTAAAGCGATCCATTTGTCAGGTGGTGCTAGTAGTGTCTTGTTGAAGGATGG CATGACTAGAGCTCCCGTTGTTCGATTCGCCTCGGCCATGAGGGCCGCGGATTTGAA GTTTTTCTTAGAGAATCCTGAGAATTTCGATAGCTTGTCCATCGCTTTCAATAGG TC CAGTAGATTTGCAAAGCTCCAAAGCATACAATGTTCTATTGCTGGAAAGAATCTATA TATGAGATTCACCTGCAGCACTGGTGATGCAATGGGGATGAACATGGTTTCCAAAGG GGTTCAAAACGTTCTTGACTTCCTTCAAAGTGATTTCCCTGACATGGATGTTATTGG CATCTCAGGAAATTTTTGTTCGGACAAGAAGCCAGCTGCTGTGAACTGGATTCAAGG GCGAGGCAAATCGGTTGTTTGCGAGGCAATTATCAAGGAAGAGGTGGTGAAGAAGGT ATTGAAATCAAGTGTTGCTTCACTAGTAGAGCTGAACATGCTCAAGAATCTTACTGG TTCAGCTATTGCTGGAGCTCTTGGTGGATTCAATGCACATGCTGGCAACATAGTCTC TGCAATTTTCATTGCCACTGGCCAGGATCCAGCCCAGAATGTTGAGAGTTCTCATTG CATCACCATGATGGAAGCTGTCAATGATGGAAAAGATCTCCACATCTCTGTAACCAT GCCTTCAATCGAGGTAGGAACAGTTGGAGGAGGGACACAACTAGCATCCCAATCAGC ATGTCTGAACCTACTCGGTGTAAAAGGAGCAAGTAAAGAATCACCAGGAGCAAACTC AAGGCTCCTAGCCACAATAGTAGCTGGTTCAGTCCTAGCTGGTGAACTCTCCCTAAT GTCAGCCATAGCAGCAGGACAACTAGTCCGGAGCCACATGAAGTACAACAGATCCAG CAAAGATGTAACCAAATTTGCATCATCTTAAtcgaggcctttaactctggtttcatt 
aaattttctttagtttgaatttactgttattcggtgtgcatttctatgtttggtgag cggttttctgtgctcagagtgtgtttattttatgtaatttaatttctttgtgagctc ctgtttagcaggtcgtcccttcagcaaggacacaaaaagattttaattttattaaaa aaaaaaaaaaaaaagaccgggaattcgatatcaagcttatcgacctgcagatcgttc aaacatttggcaataaagtttcttaagattgaatcctgttgccggtcttgcgatgat tatcatataatttctgttgaattacgttaagcatgtaataattaacatgtaatgcat gacgttatttatgagatgggtttttatgattagagtcccgcaattatacatttaata cgcgatagaaaacaaaatatagcgcgcaaactaggataaattatcgcgcgcggtgtc atctatgttactagatctctagagtctcaagcttggcgcgccagcttggcgtaatca tggtcatagctgttgcgattaagaattcgagctcggtacccccctactccaaaaatg tcaaagatacagtctcagaagaccaaagggctattgagacttttcaacaaagggtaa tttcgggaaacctcctcggattccattgcccagctatctgtcacttcatcgaaagga cagtagaaaaggaaggtggctcctacaaatgccatcattgcgataaaggaaaggcta tcattcaagatgcctctgccgacagtggtcccaaagatggacccccacccacgagga gcatcgtggaaaaagaagacgttccaaccacgtcttcaaagcaagtggattgatgtg acatctccactgacgtaagggatgacgcacaatcccactatccttcgcaagaccctt cctctatataaggaagttcattt catttggagaggacagcccaagcttcgactctag aggatccccttaaatcgatATTTATGGCCAGTGCTATTCTTGCTTCATTACTCCACC CATCAGAAGTGTTGGCACTTGTGCAGTACAAGCTTTCACCCAAAACCCAGCATGATT ACTCTAACGACAAAACTAGGCAAAGACTTTATCATCATCTTAATATGACTTCCCGAT CCTTCTCTGCCGTCATACAGGACCTTGATGAAGAGTTAAAGGATGCTATATGCTTAT TCTATCTGGTGCTGAGAGGCTTAGATACTATAGAAGACGACATGACCATCGACCTTG ACACTAAATTGCCTTACCTTCGTACGTTCCACGAAATCATATACCAGAAAGGCTGGA CTTTCACTAAGAACGGCCCAAATGAAAAAGATAGGCAATTACTGGTAGAATTTGACG CCATCATAGAGGGCTTCCTTCAATTGAAGCCAGCCTATCAGACTATCATTGCCGATA TAACCAAACGTATGGGGAACGGAATGGCACACTACGCTACGGCAGGGATACATGTTG AGACCAACGCAGACTACGACGAGTACTGCCACTATGTCGCTGGTTTGGTGGGGCTGG GTCTCTCTGAAATGTTTTCCGCATGTGGGTTCGAAAGTCCTCTTGTGGCAGAAAGAA AAGACCTTAGCAACAGCATGGGACTTTTCCTTCAGAAGACGAACATTGCACGTGATT ATCTTGAAGACCTCAGAGACAATCGTCGATTTTGGCCCAAGGAAATATGGGGGCAGT ATGCTGAGACTATGGAGGACTTGGTAAAGCCCGAAAATAAAGAAAAGGCCCTCCAAT GCCTCTCCCATATGATCGTCAATGCAATGGAGCATATCAGAGACGTTTTGGAGTATC TCTCTATGATAAAGAATCCGAGCTGCTTCAAATTTTGTGCTATTCCACAAGTCATGG CTATGGCCACATTAAACCTGCTTCATTCCAACTACAAAGTGTTCACGCATGAGAATA 
TCAAGATCCGTAAAGGTGAGACAGTGTGGCTTATGAAAGAAAGTGACAGTATGGACA AGGTAGCTGCTATCTTTAGGTTGTACGCCCGACAAATTAACAACAAGTCCAACTCTC TTGATCCCCATTTTGTGGATATAGGGGTGATTTGCGGTGAGATCGAGCAAATTTGCG TAGGAAGGTTCCCTGGCTCCACAATAGAAATGAAGCGAATGCAGGCTGGAGTCTTAG GGGGGAAAACTGGAACGGTCCTGATGGCCGGCCCCATCATGACCTCTGCGCCCTCCG CGACCACGCCCACGGGCAAGACAATGCCGTTCAAGCAGCCTTTCAAGACTGTGGCCA CGCTGTCCGCCAAGACTGGCAACATTACCAAGCCCATCGACCCTGCCATCTCCAAGA CCATTGACTTCGTCTACAATGGTTACTCGACGGTCAAGACCAAGGTTGACAAGGCCC CTAAGGTAAACCCCTACCTGCTCATTGCCGGCGGCCTCGTCCTCTCGTGCATCATCT CCATGTGCCTGCTCGTCCCGGCCGTGATCTTCTTCCCCGTCACCATCTTCCTGGGTG TCGCTACGTCGTTTGCGCTCATTGCATTGGCCCCCGTGGCTTTTGTGTTCGGGTGGA TCCTGATCTCCTCTGCTCCGATCCAGGATAAGGTGGTGGTGCCCGCCTTGGACAAGG TGCTGGCCAATAAGAAGGTGGCGAAGTTCCTCCTCAAGGAGATGGCGGATCTGAAAT CAACCTTCCTCGACGTTTACTCTGTTCTCAAGTCTGATCTGCTTCAAGATCCTTCCT TTGAATTCACCCACGAATCTCGTCAATGGCTTGAACGGATGCTTGACTACAATGTAC GCGGAGGGAAGCTAAATCGTGGTCTCTCTGTGGTTGATAGCTACAAGCTGTTGAAGC AAGGTCAAGACTTGACGGAGAAAGAGACTTTCCTCTCATGTGCTCTTGGTTGGTGCA TTGAATGGCTTCAAGCTTATTTCCTTGTGCTTGATGACATCATGGACAACTCTGTCA CACGCCGTGGCCAGCCTTGTTGGTTTAGAAAGCCAAAGGTTGGTATGATTGCCATTA ACGATGGGATTCTACTTCGCAATCATATCCACAGGATTCTCAAAAAGCACTTCAGGG AAATGCCTTACTATGTTGACCTCGTTGATTTGTTTAACGAGGTAGAGTTTCAAACAG CTTGCGGCCAGATGATTGATTTGATCACCACCTTTGATGGAGAAAAAGATTTGTCTA AGTACTCCTTGCAAATCCATCGGCGTATTGTTGAGTACAAAACAGCTTATTACTCAT TTTATCTTCCTGTTGCTTGCGCATTGCTCATGGCGGGAGAAAATTTGGAAAACCATA CTGATGTGAAGACTGTTCTTGTTGACATGGGAATTTACTTTCAAGTACAGGATGATT ATCTGGACTGTTTTGCTGATCCTGAGACACTTGGCAAGATAGGGACAGACATAGAAG ATTTCAAATGCTCCTGGTTGGTAGTTAAGGCATTGGAACGCTGCAGTGAAGAACAAA CTAAGATACTATACGAGAACTATGGTAAAGCCGAACCATCAAACGTTGCTAAGGTGA AAGCTCTCTACAAAGAGCTTGATCTCGAGGGAGCGTTCATGGAATATGAGAAGGAAA GCTATGAGAAGCTGACAAAGTTGATCGAAGCTCACCAGAGTAAAGCAATTCAAGCAG TGCTAAAATCTTTCTTGGCTAAGATCTACAAGAGGCAGAAGTAAAAATCCTCAGCAA TTGggggagctcgaattcgctgaaatcaccagtctctctctacaaatctatctctct ctattttctccataaataatgtgtgagtagtttcccgataagggaaattagggttct tatagggtttcgctcatgtgttgagcatataagaaacccttagtatgtatttgtatt 
tgtaaaatacttctatcaataaaatttctaattcctaaaaccaaaatccagtactaa aatccagatctcctaaagtccctatagatctttgtcgtgaatataaaccagacacga gacgactaaacctggagcccagacgccgttcgaagctagaagtaccgcttaggcagg aggccgttagggaaaagatgctaaggcagggttggttacgttgactcccccgtaggt ttggtttaaatatgatgaagtggacggaaggaaggaggaagacaaggaaggataagg ttgcaggccctgtgcaaggtaagaagatggaaatttgatagaggtacgctactatac ttatactatacgctaagggaatgcttgtatttataccctataccccctaataacccc ttatcaatttaagaaataatccgcataagcccccgcttaaaaattggtatcagagcc atgaataggtctatgaccaaaactcaagaggataaaacctcaccaaaatacgaaaga gttcttaactctaaagataaaagatggcgcgtggccggcctacagtatgagcggaga attaagggagtcacgttatgacccccgccgatgacgcgggacaagccgttttacgtt tggaactgacagaaccgcaacgttgaaggagccactcagccgcgggtttctggagtt taatgagctaagcacatacgtcagaaaccattattgcgcgttcaaaagtcgcctaag gtcactatcagctagcaaatatttcttgtcaaaaatgctccactgacgttccataaa ttcccctcggtatccaattagagtctcatattcactctcaatccaaataatctgcac cggatctggatcgtttcgcatgattgaacaagatggattgcacgcaggttctccggc cgcttgggtggagaggctattcggctatgactgggcacaacagacaatcggctgctc tgatgccgccgtgttccggctgtcagcgcaggggcgcccggttctttttgtcaagac cgacctgtccggtgccctgaatgaactgcaggacgaggcagcgcggctatcgtggct ggccacgacgggcgttccttgcgcagctgtgctcgacgttgtcactgaagcgggaag ggactggctgctattgggcgaagtgccggggcaggatctcctgtcatctcaccttgc tcctgccgagaaagtatccatcatggctgatgcaatgcggcggctgcatacgcttga tccggctacctgcccattcgaccaccaagcgaaacatcgcatcgagcgagcacgtac tcggatggaagccggtcttgtcgatcaggatgatctggacgaagagcatcaggggct cgcgccagccgaactgttcgccaggctcaaggcgcgcatgcccgacggcgatgatct cgtcgtgacccatggcgatgcctgcttgccgaatatcatggtggaaaatggccgctt ttctggattcatcgactgtggccggctgggtgtggcggaccgctatcaggacatagc gttggctacccgtgatattgctgaagagcttggcggcgaatgggctgaccgcttcct cgtgctttacggtatcgccgctcccgattcgcagcgcatcgccttctatcgccttct tgacgagttcttctgagcgggactctggggttcgaaatgaccgaccaagcgacgccc aacctgccatcacgagatttcgattccaccgccgccttctatgaaaggttgggcttc ggaatcgttttccgggacgccggctggatgatcctccagcgcggggatctcatgctg gagttcttcgcccacgggatctctgcggaacaggcggtcgaaggtgccgatatcatt acgacagcaacggccgacaagcacaacgccacgatcctgagcgacaatatgatcgcg 
gcgtccacatcaacggcgtcggcggcgactgcccaggcaagaccgagatgcaccgcg atatcttgctgcgttcggatattttcgtggagttcccgccacagacccggatgatcc ccgatcgttcaaacatttggcaataaagtttcttaagattgaatcctgttgccggtc ttgcgatgattatcatataatttctgttgaattacgttaagcatgtaataattaaca tgtaatgcatgacgttatttatgagatgggtttttatgattagagtcccgcaattat acatttaatacgcgatagaaaacaaaatatagcgcgcaaactaggataaattatcgc gcgcggtgtcatctatgttactagatcgggactgtaggccggccctcactggtgaaa agaaaaaccaccccagtacattaaaaacgtccgcaatgtgttattaagttgtctaag cgtcaatttgtttacaccacaatatatcctgccaccagccagccaacagctccccga ccggcagctcggcacaaaatcaccactcgatacaggcagcccatcagtccgggacgg cgtcagcgggagagccgttgtaaggcggcagactttgctcatgttaccgatgctatt cggaagaacggcaactaagctgccgggtttgaaacacggatgatctcgcggagggta gcatgttgattgtaacgatgacagagcgttgctgcctgtgatcaaatatcatctccc tcgcagagatccgaattatcagccttcttattcatttctcgcttaaccgtgacagag tagacaggctgtctcgcggccgaggggcgcagcccctgggggggatgggaggcccgc gttagcgggccgggagggttcgagaagggggggcaccccccttcggcgtgcgcggtc acgcgcacagggcgcagccctggttaaaaacaaggtttataaatattggtttaaaag caggttaaaagacaggttagcggtggccgaaaaacgggcggaaacccttgcaaatgc tggattttctgcctgtggacagcccctcaaatgtcaataggtgcgcccctcatctgt cagcactctgcccctcaagtgtcaaggatcgcgcccctcatctgtcagtagtcgcgc ccctcaagtgtcaataccgcagggcacttatccccaggcttgtccacatcatctgtg ggaaactcgcgtaaaatcaggcgttttcgccgatttgcgaggctggccagctccacg tcgccggccgaaatcgagcctgcccctcatctgtcaacgccgcgccgggtgagtcgg cccctcaagtgtcaacgtccgcccctcatctgtcagtgagggccaagttttccgcga ggtatccacaacgccggcggccgcggtgtctcgcacacggcttcgacggcgtttctg gcgcgtttgcagggccatagacggccgccagcccagcggcgagggcaaccagcccgg tgagcgtcggaaaggcgctcggtcttgccttgctcgtcggtgatgtacactagtcgc tggctgctgaacccccagccggaactgaccccacaaggccctagcgtttgcaatgca ccaggtcatcattgacccaggcgtgttccaccaggccgctgcctcgcaactcttcgc aggcttcgccgacctgctcgcgccacttcttcacgcgggtggaatccgatccgcaca tgaggcggaaggtttccagcttgagcgggtacggctcccggtgcgagctgaaatagt cgaacatccgtcgggccgtcggcgacagcttgcggtacttctcccatatgaatttcg tgtagtggtcgccagcaaacagcacgacgatttcctcgtcgatcaggacctggcaac gggacgttttcttgccacggtccaggacgcggaagcggtgcagcagcgacaccgatt 
ccaggtgcccaacgcggtcggacgtgaagcccatcgccgtcgcctgtaggcgcgaca ggcattcctcggccttcgtgtaataccggccattgatcgaccagcccaggtcctggc aaagctcgtagaacgtgaaggtgatcggctcgccgataggggtgcgcttcgcgtact ccaacacctgctgccacaccagttcgtcatcgtcggcccgcagctcgacgccggtgt aggtgatcttcacgtccttgttgacgtggaaaatgaccttgttttgcagcgcctcgc gcgggattttcttgttgcgcgtggtgaacagggcagagcgggccgtgtcgtttggca tcgctcgcatcgtgtccggccacggcgcaatatcgaacaaggaaagctgcatttcct tgatctgctgcttcgtgtgtttcagcaacgcggcctgcttggcctcgctgacctgtt ttgccaggtcctcgccggcggtttttcgcttcttggtcgtcatagttcctcgcgtgt cgatggtcatcgacttcgccaaacctgccgcctcctgttcgagacgacgcgaacgct ccacggcggccgatggcgcgggcagggcagggggagccagttgcacgctgtcgcgct cgatcttggccgtagcttgctggaccatcgagccgacggactggaaggtttcgcggg gcgcacgcatgacggtgcggcttgcgatggtttcggcatcctcggcggaaaaccccg cgtcgatcagttcttgcctgtatgccttccggtcaaacgtccgattcattcaccctc cttgcgggattgccccgactcacgccggggcaatgtgcccttattcctgatttgacc cgcctggtgccttggtgtccagataatccaccttatcggcaatgaagtcggtcccgt agaccgtctggccgtccttctcgtacttggtattccgaatcttgccctgcacgaata ccagcgaccccttgcccaaatacttgccgtgggcctcggcctgagagccaaaacact tgatgcggaagaagtcggtgcgctcctgcttgtcgccggcatcgttgcgccacatct aggtactaaaacaattcatccagtaaaatataatattttattttctcccaatcaggc ttgatccccagtaagtcaaaaaatagctcgacatactgttcttccccgatatcctcc ctgatcgaccggacgcagaaggcaatgtcataccacttgtccgccctgccgcttctc ccaagatcaataaagccacttactttgccatctttcacaaagatgttgctgtctccc aggtcgccgtgggaaaagacaagttcctcttcgggcttttccgtctttaaaaaatca tacagctcgcgcggatctttaaatggagtgtcttcttcccagttttcgcaatccaca tcggccagatcgttattcagtaagtaatccaattcggctaagcggctgtctaagcta ttcgtatagggacaatccgatatgtcgatggagtgaaagagcctgatgcactccgca tacagctcgataatcttttcagggctttgttcatcttcatactcttccgagcaaagg acgccatcggcctcactcatgagcagattgctccagccatcatgccgttcaaagtgc aggacctttggaacaggcagctttccttccagccatagcatcatgtccttttcccgt tccacatcataggtggtccctttataccggctgtccgtcatttttaaatataggttt tcattttctcccaccagcttatataccttagcaggagacattccttccgtatctttt acgcagcggtatttttcgatcagttttttcaattccggtgatattctcattttagcc atttattatttccttcctcttttctacagtatttaaagataccccaagaagctaatt 
ataacaagacgaactccaattcactgttccttgcattctaaaaccttaaataccaga aaacagctttttcaaagttgttttcaaagttggcgtataacatagtatcgacggagc cgattttgaaaccacaattatgggtgatgctgccaacttactgatttagtgtatgat ggtgtttttgaggtgctccagtggcttctgtttctatcagctgtccctcctgttcag ctactgacggggtggtgcgtaacggcaaaagcaccgccggacatcagcgctatctct gctctcactgccgtaaaacatggcaactgcagttcacttacaccgcttctcaacccg gtacgcaccagaaaatcattgatatggccatgaatggcgttggatgccgggcaacag cccgcattatgggcgttggcctcaacacgattttacgtcacttaaaaaactcaggcc gcagtcggtaactatgcggtgtgaaataccgcacagatgcgtaaggagaaaataccg catcaggcgctcttccgcttcctcgctcactgactcgctgcgctcggtcgttcggct gcggcgagcggtatcagctcactcaaaggcggtaatacggttatccacagaatcagg ggataacgcaggaaagaacatgtgagcaaaaggccagcaaaaggccaggaaccgtaa aaaggccgcgttgctggcgtttttccataggctccgcccccctgacgagcatcacaa aaatcgacgctcaagtcagaggtggcgaaacccgacaggactataaagataccaggc gtttccccctggaagctccctcgtgcgctctcctgttccgaccctgccgcttaccgg atacctgtccgcctttctcccttcgggaagcgtggcgctttctcatagctcacgctg taggtatctcagttcggtgtaggtcgttcgctccaagctgggctgtgtgcacgaacc ccccgttcagcccgaccgctgcgccttatccggtaactatcgtcttgagtccaaccc ggtaagacacgacttatcgccactggcagcaggtaacctcgcgcatacagccgggca gtgacgtcatcgtctgcgcggaaatggacgggcccccggcgccagatctggggaac

The pwhl slf2-peaq_wril lvlhmgrmcs l_sqs-ldsp-fppsmcs2 plasmid encodes the following in multi-cloning site within site 1 (SEQ ID NO:110).

MKKRLTTSTC SSSPSSSVSS STTTSSPIQS EAPRPKRAKR AKKSSPSGDK SHNPTSPAST RRSSIYRGVT RHRWTGRFEA HLWDKSSWNS IQNKKGKQVY LGAYDSEEAA AHTYDLAALK YWGPDTILNF PAETYTKELE EMQRVTKEEY LASLRRQSSG FSRGVSKYRG VARHHHNGRW EARIGRVFGN KYLYLGTYNT QEEAAAAYDM AAIEYRGANA VTNFDISNYI DRLKKKGVFP FPVNQANHQE GILVEAKQEV ETREAKEEPR EEVKQQYVEE PPQEEEEKEE EKAEQQEAEI VGYSEEAAVV NCCIDSSTIM EMDRCGDNNE LAWNFCMMDT GFSPFLTDQN LANENPIEYP ELFNELAFED NIDFMFDDGK HECLNLENLD CCVVGRESNA ADEVATQLLN FDLLKLAGDV ESNPGPMISP LASEEDEEIV KSVVNGTIPS YSLESKLGDC KRAAEIRREA LQRMMGRSLE GLPVEGFDYE SILGQCCEMP VGYVQIPVGI AGPLLLDGQE YSVPMATTEG CLVASTNRGC KAIHLSGGAS SVLLKDGMTR APVVRFASAM RAADLKFFLE NPENFDSLSI AFNRSSRFAK LQSIQCSIAG KNLYMRFTCS TGDAMGMNMV SKGVQNVLDF LQSDFPDMDV IGISGNFCSD KKPAAVNWIQ GRGKSVVCEA IIKEEVVKKV LKSSVASLVE LNMLKNLTGS AIAGALGGFN AHAGNIVSAI FIATGQDPAQ NVESSHCITM MEAVNDGKDL HISVTMPSIE VGTVGGGTQL ASQSACLNLL GVKGASKESP GANSRLLATI VAGSVLAGEL SLMSAIAAGQ LVRSHMKYNR SSKDVTKFAS S

The pwhl slf2-peaq_wril lv 1 hmgrmcs l_sqs-ldsp-fppsmcs2 plasmid encodes the following in multi-cloning site within site 2 (SEQ ID NO:111).

MASAILASLL HPSEVLALVQ YKLSPKTQHD YSNDKTRQRL YHHLNMTSRS FSAVIQDLDE ELKDAICLFY LVLRGLDTIE DDMTIDLDTK LPYLRTFHEI IYQKGWTFTK NGPNEKDRQL LVEFDAIIEG FLQLKPAYQT IIADITKRMG NGMAHYATAG IHVETNADYD EYCHYVAGLV GLGLSEMFSA CGFESPLVAE RKDLSNSMGL FLQKTNIARD YLEDLRDNRR FWPKEIWGQY AETMEDLVKP ENKEKALQCL SHMIVNAMEH IRDVLEYLSM IKNPSCFKFC AIPQVMAMAT LNLLHSNYKV FTHENIKIRK GETVWLMKES DSMDKVAAIF RLYARQINNK SNSLDPHFVD IGVICGEIEQ ICVGRFPGST IEMKRMQAGV LGGKTGTVLM AGPIMTSAPS ATTPTGKTMP FKQPFKTVAT LSAKTGNITK PIDPAISKTI DFVYNGYSTV KTKVDKAPKV NPYLLIAGGL VLSCIISMCL LVPAVIFFPV TIFLGVATSF ALIALAPVAF VFGWILISSA PIQDKVVVPA LDKVLANKKV AKFLLKEMAD LKSTFLDVYS VLKSDLLQDP SFEFTHESRQ WLERMLDYNV RGGKLNRGLS VVDSYKLLKQ GQDLTEKETF LSCALGWCIE WLQAYFLVLD DIMDNSVTRR GQPCWFRKPK VGMIAINDGI LLRNHIHRIL KKHFREMPYY VDLVDLFNEV EFQTACGQMI DLITTFDGEK DLSKYSLQIH RRIVEYKTAY YSFYLPVACA LLMAGENLEN HTDVKTVLVD MGIYFQVQDD YLDCFADPET LGKIGTDIED FKCSWLVVKA LERCSEEQTK ILYENYGKAE PSNVAKVKAL YKELDLEGAF MEYEKESYEK LTKLIEAHQS KAIQAVLKSF LAKIYKRQK

References

1. Chapman, K. D. & Ohlrogge, J. B. Compartmentation of triacylglycerol accumulation in plants. J. Biol. Chem. 287, 2288-2294 (2012).

2. Li, M. et al. Purification and structural characterization of the central hydrophobic domain of oleosin. J. Biol. Chem. 277, 37888-37895 (2002).

3. Zale, J. et al. Metabolic engineering of sugarcane to accumulate energy-dense triacylglycerols in vegetative biomass. Plant Biotechnol. J. 14, 661-669 (2016).

4. Yang, Y. et al. Ectopic expression of WRI1 affects fatty acid homeostasis in Brachypodium distachyon vegetative tissues. Plant Physiol. 169, 1836-1847 (2015).

5. Du, Z. Y. & Benning, C. Triacylglycerol accumulation in photosynthetic cells in plants and algae. Subcell. Biochem. 86, 179-205 (2016).

6. Cernac, A. & Benning, C. WRINKLED1 encodes an AP2/EREB domain protein involved in the control of storage compound biosynthesis in Arabidopsis. Plant J. 40, 575-585 (2004).

7. Maeo, K. et al. An AP2-type transcription factor, WRINKLED1, of Arabidopsis thaliana binds to the AW-box sequence conserved among proximal upstream regions of genes involved in fatty acid synthesis. Plant J. 60, 476-487 (2009).

8. Sanjaya, Durrett, T. P., Weise, S. E. & Benning, C. Increasing the energy density of vegetative tissues by diverting carbon from starch to oil biosynthesis in transgenic Arabidopsis. Plant Biotechnol. J. 9, 874-883 (2011).

9. Vanhercke, T. et al. Metabolic engineering of biomass for high energy density: oilseed-like triacylglycerol yields from plant leaves. Plant Biotechnol. J. 12, 231-239 (2014).

10. Grimberg, A., Carlsson, A. S., Marttila, S., Bhalerao, R. & Hofvander, P. Transcriptional transitions in Nicotiana benthamiana leaves upon induction of oil synthesis by WRINKLED1 homologs from diverse species and tissues. BMC Plant Biol. 15, 192 (2015).

11. Ma, W. et al. Deletion of a C-terminal intrinsically disordered region of WRINKLED1 affects its stability and enhances oil accumulation in Arabidopsis. Plant J. 83, 864-874 (2015).

12. Fan, J., Yan, C., Zhang, X. & Xu, C. Dual role for phospholipid:diacylglycerol acyltransferase: enhancing fatty acid synthesis and diverting fatty acids from membrane lipids to triacylglycerol in Arabidopsis leaves. Plant Cell 25, 3506-3518 (2013).

13. Lange, B. M. & Ahkami, A. Metabolic engineering of plant monoterpenes, sesquiterpenes and diterpenes-current status and future opportunities. Plant Biotechnol. J. 11, 169-196 (2013).

14. Augustin, J. M., Higashi, Y., Feng, X. & Kutchan, T. M. Production of mono- and sesquiterpenes in Camelina sativa oilseed. Planta 242, 693-708 (2015).

15. Reed, J. et al. A translational synthetic biology platform for rapid access to gram-scale quantities of novel drug-like molecules. Metab. Eng. 42, 185-193 (2017).

16. Wu, S. et al. Redirection of cytosolic or plastidic isoprenoid precursors elevates terpene production in plants. Nat. Biotechnol. 24, 1441-1447 (2006).

17. Pateraki, I. et al. Manoyl oxide (13R), the biosynthetic precursor of forskolin, is synthesized in specialized root cork cells in Coleus forskohlii. Plant Physiol. 164, 1222-1236 (2014).

18. Liao, P., Hemmerlin, A., Bach, T. J. & Chye, M. L. The potential of the mevalonate pathway for enhanced isoprenoid production. Biotechnol. Adv. 34, 697-713 (2016).

19. Frank, A. & Groll, M. The Methylerythritol Phosphate Pathway to Isoprenoids. Chem. Rev. 117, 5675-5703 (2017).

20. Banerjee, A. & Sharkey, T. D. Methylerythritol 4-phosphate (MEP) pathway metabolic regulation. Nat. Prod. Rep. 31, 1043-1055 (2014).

21. Chappell, J., Wolf, F., Proulx, J., Cuellar, R. & Saunders, C. Is the reaction catalyzed by 3-hydroxy-3-methylglutaryl coenzyme A reductase a rate-limiting step for isoprenoid biosynthesis in plants? Plant Physiol. 109, 1337-1343 (1995).

22. Estevez, J. M., Cantero, A., Reindl, A., Reichler, S. & Leon, P. 1-Deoxy-D-xylulose-5-phosphate synthase, a limiting enzyme for plastidic isoprenoid biosynthesis in plants. J. Biol. Chem. 276, 22901-22909 (2001).

23. Bruckner, K. & Tissier, A. High-level diterpene production by transient expression in Nicotiana benthamiana. Plant Methods 9, 46 (2013).

24. Vieler, A., Brubaker, S. B., Vick, B. & Benning, C. A lipid droplet protein of Nannochloropsis with functions partially analogous to plant oleosins. Plant Physiol. 158, 1562-1569 (2012).

25. Skrukrud, C. L., Taylor, S. E., Hawkins, D. R. & Calvin, M. in The Metabolism, Structure, and Function of Plant Lipids (eds. Paul K. Stumpf, J. Brian Mudd, & W. David Nes) 115-118 (Springer New York, 1987).

26. Keim, V. et al. Characterization of Arabidopsis FPS isozymes and FPS gene expression analysis provide insight into the biosynthesis of isoprenoid precursors in seeds. PLoS One 7, e49109 (2012).

27. Vogel, B. S., Wildung, M. R., Vogel, G. & Croteau, R. Abietadiene synthase from grand fir (Abies grandis): cDNA isolation, characterization, and bacterial expression of a bifunctional diterpene cyclase involved in resin acid biosynthesis. J. Biol. Chem. 271, 23262-23268 (1996).

28. Peters, R. J. et al. Abietadiene synthase from grand fir (Abies grandis): characterization and mechanism of action of the "pseudomature" recombinant enzyme. Biochem. 39, 15592-15602 (2000).

29. Keeling, C. I., Madilao, L. L., Zerbe, P., Dullat, H. K. & Bohlmann, J. The primary diterpene synthase products of Picea abies levopimaradiene/abietadiene synthase (PaLAS) are epimers of a thermally unstable diterpenol. J. Biol. Chem. 286, 21145-21153 (2011).

30. Noike, M., Katagiri, T., Nakayama, T., Nishino, T. & Hemmi, H. Effect of mutagenesis at the region upstream from the G(Q/E) motif of three types of geranylgeranyl diphosphate synthase on product chain-length. J. Biosci. Bioeng. 107, 235-239 (2009).

31. Chang, T. H., Guo, R. T., Ko, T. P., Wang, A. H. & Liang, P. H. Crystal structure of type-III geranylgeranyl pyrophosphate synthase from Saccharomyces cerevisiae and the mechanism of product chain length determination. J. Biol. Chem. 281, 14991-15000 (2006).

32. Xu, Q. et al. Discovery and comparative profiling of microRNAs in a sweet orange red-flesh mutant and its wild type. BMC Genomics 11, 246-246 (2010).

33. Zhou, F. et al. A recruiting protein of geranylgeranyl diphosphate synthase controls metabolic flux toward chlorophyll biosynthesis in rice. Proc. Natl. Acad. Sci. 114, 6866-6871 (2017).

34. Ruiz-Sola, M. A. et al. Arabidopsis GERANYLGERANYL DIPHOSPHATE SYNTHASE 11 is a hub isozyme required for the production of most photosynthesis-related isoprenoids. New Phytol. 209, 252-264 (2016).

35. Hamberger, B., Ohnishi, T., Hamberger, B., Seguin, A. & Bohlmann, J. Evolution of diterpene metabolism: Sitka spruce CYP720B4 catalyzes multiple oxidations in resin acid biosynthesis of conifer defense against insects. Plant Physiol. 157, 1677-1695 (2011).

36. Dong, L., Jongedijk, E., Bouwmeester, H. & Van Der Krol, A. Monoterpene biosynthesis potential of plant subcellular compartments. New Phytol. 209, 679-690 (2016).

37. van Herpen, T. W. et al. Nicotiana benthamiana as a production platform for artemisinin precursors. PLoS One 5, e14222 (2010).

38. Gnanasekaran, T. et al. Heterologous expression of the isopimaric acid pathway in Nicotiana benthamiana and the effect of N-terminal modifications of the involved cytochrome P450 enzyme. J. Biol. Eng. 9, 24 (2015).

39. Jagalski, V. et al. Biophysical study of resin acid effects on phospholipid membrane structure and properties. Biochim. Biophys. Acta 1858, 2827-2838 (2016).

40. Delatte, T. L. et al. Engineering storage capacity for volatile sesquiterpenes in Nicotiana benthamiana leaves. Plant Biotechnol. J. (2018) Epub ahead of print.

41. Zhao, C. et al. Co-compartmentation of terpene biosynthesis and storage via synthetic droplet. ACS Synth. Biol. 7, 774-781 (2018).

42. Tissier, A., Morgan, J. A. & Dudareva, N. Plant Volatiles: Going 'in' but not 'out' of trichome cavities. Trends Plant Sci. 22, 930-938 (2017).

43. Uehling, J. et al. Comparative genomics of Mortierella elongata and its bacterial endosymbiont Mycoavidus cysteinexigens. Environ. Microbiol. 19, 2964-2983 (2017).

44. Xiao, M. et al. Transcriptome analysis based on next-generation sequencing of non-model plants producing specialized metabolites of biotechnological interest. J. Biotechnol. 166, 122-134 (2013).

45. Yerrapragada, S. et al. Extreme sensory complexity encoded in the 10-megabase draft genome sequence of the chromatically acclimating cyanobacterium Tolypothrix sp. PCC 7601. Genome Announc. 3, e00355-15 (2015).

46. Earley, K. W. et al. Gateway-compatible vectors for plant functional genomics and proteomics. Plant J. 45, 616-629 (2006).

47. Voinnet, O., Pinto, Y. M. & Baulcombe, D. C. Suppression of gene silencing: a general strategy used by diverse DNA and RNA viruses of plants. Proc. Natl. Acad. Sci. 96, 14147-14152 (1999).

48. Voinnet, O., Pinto, Y. M. & Baulcombe, D. C. Correction for Voinnet et al., Suppression of gene silencing: A general strategy used by diverse DNA and RNA viruses of plants. Proc. Natl. Acad. Sci. 112, E4812 (2015).

49. Ding, Y. et al. Isolating lipid droplets from multiple species. Nat. Protoc. 8, 43 (2012).

All patents and publications referenced or mentioned herein are indicative of the levels of skill of those skilled in the art to which the invention pertains, and each such referenced patent or publication is hereby specifically incorporated by reference to the same extent as if it had been incorporated by reference in its entirety individually or set forth herein in its entirety. Applicants reserve the right to physically incorporate into this specification any and all materials and information from any such cited patents or publications.

The following statements are intended to describe and summarize various features of the invention according to the foregoing description provided in the specification and figures.

Statements:

1. A fusion protein comprising a lipid droplet surface protein linked in-frame to one or more fusion partners comprising a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), patchoulol synthase, or WRI1 protein.

2. The fusion protein of statement 1, wherein the lipid droplet surface protein has a sequence with at least 90% sequence identity to SEQ ID NO:1, or a truncated sequence with at least 90% sequence identity to a sequence consisting of less than 120 contiguous amino acids, or less than 110 contiguous amino acids, or less than 105 contiguous amino acids, or less than 100 contiguous amino acids, or less than 95 contiguous amino acids, or less than 90 contiguous amino acids, or less than 85 contiguous amino acids, or less than 80 contiguous amino acids, or less than 75 contiguous amino acids of SEQ ID NO:1.

3. The fusion protein of statement 1 or 2, wherein the fusion partner is a polypeptide with at least 95% sequence identity to a sequence comprising SEQ ID NO:3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 52, 53, 54, 55, 56, 59, 61, 63, 64, 65, 67, 68, 69, 71, 72, 73, 75, 76, 77, 79, 80, 81, 83, 84, 85, 87, 89, 91, 92, 93, 95, 96, 97, 99, 101, 104, 105, 107, 108, 110, or 111.

4. An expression system comprising at least one expression cassette (or expression vector) having a heterologous promoter operably linked to a nucleic acid segment encoding a lipid droplet surface protein and another expression cassette (or expression vector) comprising a heterologous promoter operably linked to a nucleic acid segment encoding one or more of the following proteins:

monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), patchoulol synthase, or WRI1 protein.

5. An expression system comprising at least one expression cassette (or expression vector) having a heterologous promoter operably linked to a nucleic acid segment encoding a fusion protein, the fusion protein comprising a lipid droplet surface protein linked in-frame to one or more fusion partners comprising a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), patchoulol synthase, or WRI1 protein.

6. The expression system of statement 4 or 5, further comprising at least one expression cassette (or expression vector), each having a heterologous promoter operably linked to a nucleic acid segment encoding a protein selected from geranylgeranyl diphosphate synthase (GGDPS), farnesylpyrophosphate synthase (FPPS), 1-deoxy-D-xylulose 5-phosphate synthase (DXS), abietadiene synthase (ABS), cytochrome P450, cytochrome P450 reductase, mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), isopentenyl diphosphate isomerase (IDI), ribulose bisphosphate carboxylase, or WRI1 protein.

7. The expression system of statement 4, 5 or 6, wherein the fusion protein and protein are encoded by separate expression cassettes (or expression vectors).

8. The expression system of statement 4-6 or 7, wherein the fusion protein and each protein are encoded within one expression cassette (or expression vector), wherein expression of the fusion protein and at least one protein is from one promoter that drives expression of the fusion protein and the at least one protein.

9. An expression system comprising a first expression cassette or first expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding a WRINKLED (WRI1) transcription factor, and a second expression cassette or second expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding a lipid droplet surface protein (LDSP).

10. The expression system of statement 9, further comprising an expression cassette or expression vector comprising a heterologous promoter operably linked to a nucleic acid segment encoding an abietadiene synthase (ABS).

11. An expression system comprising at least one expression cassette or expression vector comprising one or more nucleic acid segments, each nucleic acid segment encoding one or more of the following proteins: a HMG-CoA reductase (HMGR), farnesylpyrophosphate synthase (FPPS), patchoulol synthase, or a combination thereof, wherein a heterologous promoter is operably linked to each of the nucleic acid segments encoding a protein.

12. An expression system comprising at least one expression cassette or expression vector comprising one or more nucleic acid segments, each nucleic acid segment encoding one or more of the following proteins: 1-deoxy-D-xylulose 5-phosphate synthase (DXS), farnesylpyrophosphate synthase (FPPS), patchoulol synthase, lipid droplet surface protein (LDSP), WRINKLED, or a combination thereof, wherein a heterologous promoter is operably linked to each of the nucleic acid segments encoding a protein.

13. An expression system comprising at least one expression cassette or expression vector comprising one or more nucleic acid segments, each nucleic acid segment encoding one or more of the following proteins: 1-deoxy-D-xylulose 5-phosphate synthase (DXS), geranylgeranyl diphosphate synthase (GGDPS), abietadiene synthase (ABS), or a combination thereof, wherein a heterologous promoter is operably linked to each of the nucleic acid segments encoding a protein.

14. An expression system comprising at least one expression cassette or expression vector comprising one or more nucleic acid segments, each nucleic acid segment encoding one or more of the following proteins: HMG-CoA reductase (HMGR), geranylgeranyl diphosphate synthase (GGDPS), abietadiene synthase (ABS), or a combination thereof, wherein a heterologous promoter is operably linked to each of the nucleic acid segments encoding a protein.

15. The expression system of statement 11-14, further comprising an expression cassette or expression vector comprising one or more nucleic acid segments encoding at least one of the following proteins: cytochrome P450, cytochrome P450 reductase, or a combination thereof, wherein optionally one or more nucleic acid segments encoding the cytochrome P450, cytochrome P450 reductase, or both are linked in-frame to a nucleic acid segment encoding a lipid droplet surface protein.

16. The expression system of statement 4-14 or 15, wherein the fusion partner or the at least one protein is linked in-frame to a plastid targeting segment.

17. The expression system of statement 4-14 or 15, wherein the fusion partner or the protein is not linked in-frame to a plastid targeting segment.

18. The expression system of statement 4-16 or 17, wherein a plastid targeting region or a hydrophobic region is removed from the nucleic acid segment encoding the one or more protein.

19. The expression system of statement 4-17 or 18, further comprising an expression cassette comprising a promoter operably linked to a nucleic acid encoding a WRI1 transcription factor.

20. The expression system of statement 4-18 or 19, further comprising an expression cassette (or expression vector) comprising a promoter operably linked to a nucleic acid encoding a lipid droplet surface protein.

21. The expression system of statement 4-19 or 20, wherein the fusion partner or protein has at least 90% sequence identity to a sequence comprising SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 52, 53, 54, 55, 56, 59, 61, 63, 64, 65, 67, 68, 69, 71, 72, 73, 75, 76, 77, 79, 80, 81, 83, 84, 85, 87, 89, 91, 92, 93, 95, 96, 97, 99, 101, 104, 105, 107, 108, 110, or 111.

22. The expression system of statement 4-20 or 21, wherein the nucleic acid segment is codon-optimized for expression in a plastid or in a host cell.

23. The expression system of statement 4-21 or 22, wherein one or more of the heterologous promoters is active in plant plastids.

24. A host cell, host tissue, host seed, or a host plant comprising the expression system of statement 4-22 or 23.

25. The host cell, host tissue, host seed, or a host plant of statement 24, each comprising insect cells, plant cells, fungal cells, insect tissues, plant tissues, or fungal tissues.

26. The host cell, host tissue, host seed, or a host plant of statement 24 or 25, which is an oil-producing plant species.

27. The host cell, host tissue, host seed, or a host plant of statement 24, 25 or 26, which is an oilseed, camelina, canola, castor bean, corn, flax, lupin, peanut, potato, safflower, soybean, sunflower, cottonseed, oil firewood tree, rape seed, rutabaga, sorghum, walnut, or nut species.

28. The host cell, host tissue, host seed, or a host plant of statement 24, 25 or 26, which is a Nicotiana benthamiana, Nicotiana tabacum, Nicotiana rustica, Nicotiana excelsior, or Nicotiana excelsiana species.

29. The host cell, host tissue, host seed, or a host plant of statement 24-26 or 27, which is not a Nicotiana benthamiana species.

30. A method comprising (a) incubating a population of host cells or a host tissue comprising an expression system of statement 4-22 or 23; and (b) isolating lipids from the population of host cells or the host tissue.

31. The method of statement 30 comprising (a) incubating a population of host cells or a host tissue comprising an expression system that includes at least one expression cassette having a heterologous promoter operably linked to a nucleic acid segment encoding a fusion protein comprising a lipid droplet surface protein linked in-frame to one or more fusion partners comprising a monoterpene synthase, diterpene synthase, sesquiterpene synthase, sesterterpene synthase, triterpene synthase, tetraterpene synthase, polyterpene synthase, transcription factor, cytochrome P450, cytochrome P450 reductase, 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate-reducto-isomerase, cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), geranylgeranyl diphosphate synthase (GGDPS), HMG-CoA synthase, HMG-CoA reductase (HMGR), mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), isopentenyl diphosphate isomerase (IDI), abietadiene synthase (ABS), farnesylpyrophosphate synthase (FPPS), ribulose bisphosphate carboxylase, squalene synthase (SQS), patchoulol synthase, or WRI1 protein; and (b) isolating lipids from the population of host cells or the host tissue.

32. The method of statement 30 or 31, wherein the population of host cells or the host tissue is within a plant.

33. The method of statement 30, 31 or 32, wherein the population of host cells or the host tissue is within a plant and the incubating comprises cultivating the plant or a seed of the plant.

34. A method comprising (a) cultivating a plant or a seed, the plant or the seed comprising an expression system of statement 4-22 or 23 to generate a plant comprising lipid droplets within the plant's cells; and (b) isolating lipids from the plant or the plant's cells.

35. The method of statement 30-33 or 34, wherein the population of host cells, or the host tissue, or the cells of the plant further comprise at least one expression cassette (or expression vector), each having a heterologous promoter operably linked to a nucleic acid segment encoding a protein selected from geranylgeranyl diphosphate synthase (GGDPS), farnesylpyrophosphate synthase (FPPS), 1-deoxy-D-xylulose 5-phosphate synthase (DXS), abietadiene synthase (ABS), cytochrome P450, cytochrome P450 reductase, mevalonic acid kinase (MVK), phosphomevalonate kinase (PMK), mevalonate-5-diphosphate decarboxylase (MPD), cytidine 5'-diphosphate-methylerythritol (CDP-ME) synthetase (IspD), 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase (IspF), isopentenyl diphosphate isomerase (IDI), ribulose bisphosphate carboxylase, or WRI1 protein.

36. The method of statement 30-34 or 35, wherein each fusion protein or protein is encoded by a separate expression cassette (or expression vector).

37. The method of statement 30-34 or 35, wherein at least two fusion proteins or proteins are encoded in a single expression vector.

38. The method of statement 30-36 or 37, wherein the population of host cells or the host tissue further comprises a heterologous expression cassette (or expression vector) comprising a promoter operably linked to a nucleic acid encoding a WRI1 transcription factor.

39. The method of statement 30-37 or 38, wherein the population of host cells or the host tissue further comprises a heterologous expression cassette (or expression vector) comprising a promoter operably linked to a nucleic acid encoding a lipid droplet surface protein.

40. The method of statement 30-38 or 39, wherein a segment encoding a plastid targeting region or a hydrophobic region is removed from the nucleic acid segment encoding the one or more fusion partner or protein.

41. The method of statement 30-39 or 40, wherein one or more nucleic acid segments encoding the fusion protein or the protein are codon-optimized for expression in plant plastids or in a host cell.

42. The method of statement 30-40 or 41, wherein the expression system comprises an expression cassette comprising a promoter operably linked to a nucleic acid segment encoding an enzyme with at least 90% sequence identity to a sequence comprising SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 52, 53, 54, 55, 56, 59, 61, 63, 64, 65, 67, 68, 69, 71, 72, 73, 75, 76, 77, 79, 80, 81, 83, 84, 85, 87, 89, 91, 92, 93, 95, 96, 97, 99, 101, 104, 105, 107, 108, 110, or 111.

43. The method of statement 30-41 or 42, wherein the lipids isolated from the population of host cells comprise one or more types of terpene.

44. The method of statement 30-42 or 43, further comprising isolating terpenes from the lipids isolated from the population of host cells or tissues.

45. The method of statement 30-43 or 44, wherein the lipids isolated from the population of host cells comprise one or more types of monoterpene, diterpene, sesquiterpene, sesterterpene, triterpene, tetraterpene, polyterpene, or a mixture thereof.

46. The method of statement 30-44 or 45, wherein after incubation, the host cells or tissues have at least 0.05%, at least 0.1%, at least 0.2%, at least 0.25%, or at least 0.3% fresh weight monoterpene, diterpene, sesquiterpene, sesterterpene, triterpene, tetraterpene, polyterpene, or a mixture thereof.
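
Several of the statements above set thresholds in percent sequence identity, including against a truncated stretch of contiguous amino acids. The following is a minimal sketch of how such a percentage could be computed for an ungapped comparison. This section does not prescribe an alignment method, so the ungapped sliding-window comparison, the function names, and the toy sequences are all assumptions made for illustration and are not the applicants' method.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_window_identity(candidate: str, reference: str) -> float:
    """Highest ungapped identity of candidate against any contiguous
    stretch of reference that has the same length as candidate."""
    n = len(candidate)
    if n == 0 or n > len(reference):
        raise ValueError("candidate must be non-empty and no longer than reference")
    return max(
        percent_identity(candidate, reference[i:i + n])
        for i in range(len(reference) - n + 1)
    )


# Hypothetical toy sequences, not taken from any SEQ ID NO.
reference = "MADEAKLSTV"
variant = "MADEAKLSTA"   # differs from reference at the last position only
fragment = "DEAK"        # a contiguous stretch of reference

print(percent_identity(variant, reference))       # 9 of 10 positions match
print(best_window_identity(fragment, reference))  # fragment occurs exactly
```

In practice, identity between sequences of different lengths or with insertions and deletions is usually computed from a gapped alignment produced by a dedicated tool, and the resulting value depends on the alignment parameters chosen.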

The specific methods, devices and compositions described herein are representative of preferred embodiments and are exemplary and not intended as limitations on the scope of the invention. Other objects, aspects, and embodiments will occur to those skilled in the art upon consideration of this specification, and are encompassed within the spirit of the invention as defined by the scope of the claims. It will be readily apparent to one skilled in the art that varying substitutions and modifications may be made to the invention disclosed herein without departing from the scope and spirit of the invention.

The invention illustratively described herein suitably may be practiced in the absence of any element or elements, or limitation or limitations, which is not specifically disclosed herein as essential. The methods and processes illustratively described herein suitably may be practiced in differing orders of steps, and the methods and processes are not necessarily restricted to the orders of steps indicated herein or in the claims.

Under no circumstances may the patent be interpreted to be limited to the specific examples or embodiments or methods specifically disclosed herein. Under no circumstances may the patent be interpreted to be limited by any statement made by any Examiner or any other official or employee of the Patent and Trademark Office unless such statement is specifically and without qualification or reservation expressly adopted in a responsive writing by Applicants.

The terms and expressions that have been employed are used as terms of description and not of limitation, and there is no intent in the use of such terms and expressions to exclude any equivalent of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention as claimed. Thus, it will be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention as defined by the appended claims and statements of the invention.

The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.