Title:
NOVEL DITERPENE SYNTHASES AND THEIR USE FOR PRODUCTION OF DITERPENES
Document Type and Number:
WIPO Patent Application WO/2018/022654
Kind Code:
A1
Abstract:
Diterpene synthases and methods of their use are described herein.

Inventors:
ZERBE PHILIPP (US)
MAFU SIBONGILE (US)
Application Number:
PCT/US2017/043786
Publication Date:
February 01, 2018
Filing Date:
July 25, 2017
Assignee:
UNIV CALIFORNIA (US)
International Classes:
C12N15/70; A61K31/015; C07C13/36; C07C13/39; C07K14/415; C12N9/88; C12N15/85; C12P5/00
Domestic Patent References:
WO2009044336A9 2009-07-09
WO2013110673A1 2013-08-01
Foreign References:
US20140371311A1 2014-12-18
US6887895B2 2005-05-03
Other References:
BYUN-MCKAY, A ET AL.: "Wound-Induced Terpene Synthase Gene Expression in Sitka Spruce That Exhibit Resistance or Susceptibility to Attack by the White Pine Weevil", PLANT PHYSIOLOGY, vol. 140, no. 3, March 2006 (2006-03-01), pages 1009 - 1021, XP055459865
KEELING, CI ET AL.: "Transcriptome mining, functional characterization, and phylogeny of a large terpene synthase gene family in spruce (Picea spp.)", BMC PLANT BIOLOGY, vol. 11, no. 43, 7 March 2011 (2011-03-07), XP021098386
ZERBE, P ET AL.: "Plant diterpene synthases: exploring modularity and metabolic diversity for bioengineering", TRENDS IN BIOTECHNOLOGY, vol. 33, no. 7, 20 May 2015 (2015-05-20), pages 419 - 428, XP029175823
MAFU, S ET AL.: "Biosynthesis of the microtubule-destabilizing diterpene pseudolaric acid B from golden larch involves an unusual diterpene synthase", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE U.S.A., vol. 114, no. 5, 31 January 2017 (2017-01-31), pages 974 - 979, XP055459855
DATABASE GenBank [O] 11 January 2017 (2017-01-11), ZERBE, P ET AL., XP055459859, Database accession no. APT40486.1
Attorney, Agent or Firm:
BASTIAN, Kevin L. et al. (US)
Claims:
WHAT IS CLAIMED IS:

1. An isolated nucleic acid molecule, comprising a sequence of nucleotides encoding a diterpene synthase polypeptide selected from among: a) a polypeptide whose sequence is set forth in SEQ ID NO:1; b) a polypeptide encoded by a nucleotide sequence set forth in GenBank accession number KU685114; c) an active fragment of the polypeptide of a) or b); and d) a polypeptide having a sequence of amino acids that has at least 95% sequence identity with a polypeptide of a), b), or c), wherein: the encoded polypeptide or active fragment catalyzes the formation of sibongilene from geranylgeranyl diphosphate (GGPP).

2. The isolated nucleic acid of claim 1, wherein the isolated nucleic acid is cDNA.

3. The isolated nucleic acid of claim 1, wherein the isolated nucleic acid encodes the diterpene synthase polypeptide whose sequence is set forth in SEQ ID NO:1.

4. The isolated nucleic acid of claim 1, wherein the isolated nucleic acid encodes the diterpene synthase polypeptide encoded by a nucleotide sequence set forth in GenBank accession number KU685114.

5. A vector comprising the isolated nucleic acid of claim 1.

6. A host cell comprising an isolated nucleic acid molecule, comprising a sequence of nucleotides encoding a diterpene synthase polypeptide selected from among: a) a polypeptide whose sequence is set forth in SEQ ID NO:1; b) a polypeptide encoded by a nucleotide sequence set forth in GenBank accession number KU685114; c) an active fragment of the polypeptide of a) or b); and d) a polypeptide having a sequence of amino acids that has at least 95% sequence identity with a polypeptide of a), b), or c), wherein: the encoded polypeptide or active fragment catalyzes the formation of sibongilene from geranylgeranyl diphosphate (GGPP), wherein the encoded diterpene synthase polypeptide is heterologous to the host cell.

7. The host cell of claim 6 that is a prokaryotic host cell.

8. The host cell of claim 6 that is a eukaryotic host cell.

9. The host cell of claim 7 that is an E. coli cell.

10. The host cell of claim 8, wherein the host cell is selected from the group consisting of a fungal, plant, insect, or amphibian host cell.

11. The host cell of claim 8, wherein the host cell is an animal cell.

12. The host cell of claim 8, wherein the host cell is a yeast cell.

13. The host cell of claim 6, wherein the host cell produces a 7,5-fused bicyclic diterpene.

14. The host cell of claim 13, wherein the host cell produces sibongilene.

15. A method of producing a 7,5-fused bicyclic diterpene, comprising: i) contacting (E,E,E)-geranylgeranyl diphosphate (GGPP) with a diterpene synthase polypeptide encoded by the nucleic acid molecule of claim 1 under conditions effective to produce the cis-7,5-fused bicyclic diterpene, wherein: contacting is effected with an isolated diterpene synthase polypeptide, or contacting is effected in a host cell comprising the nucleic acid molecule, and the nucleic acid molecule is heterologous to the host cell; and ii) optionally, isolating the 7,5-fused bicyclic diterpene produced in step i).

16. The method of claim 15, wherein the 7,5-fused bicyclic diterpene is sibongilene.

17. The method of claim 15, wherein the method further comprises isolating the 7,5-fused bicyclic diterpene.

18. The method of claim 15, wherein the method further comprises converting the 7,5-fused bicyclic diterpene to a pseudolaric acid.

19. The method of claim 15, wherein the method further comprises converting the 7,5-fused bicyclic diterpene to pseudolaric acid B.

20. The method of claim 19, wherein the method further comprises isolating the pseudolaric acid B.

21. A pharmaceutical composition comprising sibongilene and a pharmaceutical excipient.

22. A composition comprising isolated sibongilene.

23. A host cell, host cell lysate, or host cell conditioned medium comprising sibongilene, wherein the sibongilene is heterologous to the host cell.

Description:
PATENT

Attorney Docket No. 081906-2229lOPC-1054616

Client Ref. No. UC Case: 2015-775

NOVEL DITERPENE SYNTHASES AND THEIR USE FOR PRODUCTION OF DITERPENES

CROSS-REFERENCES TO RELATED APPLICATIONS

[0001] The present invention claims benefit of priority to US Provisional Application No. 62/367,264, filed July 27, 2016, which is incorporated by reference for all purposes.

BRIEF SUMMARY OF INVENTION

[0002] The diversity of small molecules produced via plant diterpene metabolism offers a plethora of known and potentially novel therapeutics. Among these, the microtubule-destabilizing activity of pseudolaric acid B (PAB) holds promise for new anticancer agents. PAB is found, perhaps uniquely, in the roots of the coniferous tree golden larch (Pseudolarix amabilis, Pxa).

[0003] In one aspect, the present invention provides an isolated nucleic acid molecule, comprising a sequence of nucleotides encoding a diterpene synthase polypeptide selected from among: a) a polypeptide whose sequence is set forth in SEQ ID NO:1 or is at least 85%, 90%, 95%, or 99% identical to the sequence set forth in SEQ ID NO:1; b) a polypeptide encoded by a nucleotide sequence set forth in GenBank accession number KU685114, or at least 85%, 90%, 95%, or 99% identical to the sequence set forth in GenBank accession number KU685114; c) an active fragment of the polypeptide of a) or b); and d) a polypeptide having a sequence of amino acids that has at least 95% sequence identity with a polypeptide of a), b), or c), wherein: the encoded polypeptide or active fragment catalyzes the formation of sibongilene from geranylgeranyl diphosphate (GGPP).

[0004] In some embodiments, the isolated nucleic acid is cDNA. In some embodiments, the isolated nucleic acid encodes the diterpene synthase polypeptide whose sequence is set forth in SEQ ID NO:1. In some embodiments, the isolated nucleic acid encodes the diterpene synthase polypeptide encoded by a nucleotide sequence set forth in GenBank accession number KU685114.

[0005] In another aspect, the present invention provides a vector or a host cell, or a host cell lysate, comprising one of the foregoing isolated nucleic acids. In another aspect, the present invention provides a host cell comprising an isolated nucleic acid molecule, comprising a sequence of nucleotides encoding a diterpene synthase polypeptide selected from among: a) a polypeptide whose sequence is set forth in SEQ ID NO:1; b) a polypeptide encoded by a nucleotide sequence set forth in GenBank accession number KU685114; c) an active fragment of the polypeptide of a) or b); and d) a polypeptide having a sequence of amino acids that has at least 95% sequence identity with a polypeptide of a), b), or c), wherein: the encoded polypeptide or active fragment catalyzes the formation of sibongilene from geranylgeranyl diphosphate (GGPP), wherein the encoded diterpene synthase polypeptide is heterologous to the host cell. In some embodiments, the host cell is a prokaryotic host cell. In some embodiments, the host cell is a eukaryotic host cell. In some embodiments, the host cell is an E. coli cell. In some embodiments, the host cell is selected from the group consisting of a fungal, plant, insect, or amphibian host cell. In some embodiments, the host cell is an animal cell. In some embodiments, the host cell is a yeast cell. In some embodiments, the host cell produces a 7,5-fused bicyclic diterpene (e.g., a heterologous 7,5-fused bicyclic diterpene). In some embodiments, the host cell produces sibongilene.

[0006] In another aspect, the present invention provides a method of producing a 7,5-fused bicyclic diterpene, comprising: i) contacting (E,E,E)-geranylgeranyl diphosphate (GGPP) with a diterpene synthase polypeptide encoded by the nucleic acid molecule of claim 1 under conditions effective to produce the cis-7,5-fused bicyclic diterpene, wherein: contacting is effected with an isolated diterpene synthase polypeptide, or contacting is effected in a host cell comprising the nucleic acid molecule, and the nucleic acid molecule is heterologous to the host cell; and ii) optionally, isolating the 7,5-fused bicyclic diterpene produced in step i). In some embodiments, the 7,5-fused bicyclic diterpene is sibongilene. In some embodiments, the method further comprises isolating the 7,5-fused bicyclic diterpene. In some embodiments, the method further comprises converting the 7,5-fused bicyclic diterpene to a pseudolaric acid. In some embodiments, the method further comprises converting the 7,5-fused bicyclic diterpene to pseudolaric acid B. In some embodiments, the method further comprises isolating the pseudolaric acid B.

[0007] In another aspect, the present invention provides a pharmaceutical composition comprising sibongilene and a pharmaceutical excipient. In another aspect, the present invention provides isolated sibongilene or a composition comprising isolated sibongilene. In another aspect, the present invention provides a host cell, host cell lysate, or host cell conditioned medium comprising sibongilene, wherein the sibongilene is heterologous to the host cell.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] Fig. 1. Annotation of TPS genes in the root-specific transcriptome of P. amabilis (E-value threshold 1E-50). Relative transcript abundance is based on FPKM values obtained via mapping of Illumina reads against assembled TPS reads. P, partial transcript; F, full length transcript.

[0009] Fig. 2. Maximum likelihood tree illustrating the phylogenetic relationship of PxaTPSS with members of the gymnosperm TPS-d clade. Bootstrap support of >80% (1000 repetitions) is highlighted. Tree rooted with Physcomitrella patens ent-kaurene synthase. Abbreviations and accession numbers are detailed in Table 2.

[0010] Fig. 3. Functional characterization of PxaTPSS and PxaTPSS. GC/MS analysis of products resulting from Agrobacterium-mediated transient expression of PxaTPSS [A] and PxaTPSS [B] in Nicotiana benthamiana. Controls represent expression of the silencing suppressor protein p19 alone. Results are depicted as individual spectra. [C] Predicted structure and stereochemistry of sibongilene.

[0011] Fig. 4. [A] Right panel: Homology model of PxaTPSS based on the structure of A. grandis α-BIS (2S) illustrating the typical diTPS 3-domain structure comprised of a γ- (magenta), β- (cyan) and α-domain (blue). Left panel: Stereo view of the PxaTPSS (blue) class I active site with GGPP (yellow) docked in the cavity; Mg2+ (cyan). Active site residues with impact on catalysis are depicted as compared to Taxus brevifolia TXS (orange), A. grandis BIS (green), and A. grandis abietadiene synthase (purple). [B] Protein alignment of select PxaTPSS residues with known gymnosperm TPSs. [C] Scaled extracted ion chromatograms (EIC; m/z 216) comparing sibongilene formation of PxaTPSS variants to the wild-type (WT).

[0012] Fig. 5. Proposed mechanism of the PxaTPSS-catalyzed reaction based on quantum chemical calculations. After cleavage of the diphosphate group of GGPP, the initial carbocation (A) forms via 1,6-cyclization. Single step 1,2-alkyl shift and 6,10-cyclization afford a second carbocation (B) with the characteristic 5,7-fused bicyclic structure. Deprotonation of (B) yields sibongilene.

[0013] Fig. 6. Biosynthesis of specialized diterpenes in gymnosperms. Products of geranylgeranyl diphosphate (GGPP) produced by the indicated diTPS enzymes and further functional modifications are illustrated.

[0014] Fig. 7. Illustrates data produced by in vitro functional assays of PxaTPSS and PxaTPSS. GC-MS analysis of reaction products obtained from in vitro enzyme assays of recombinant PxaTPSS with farnesyl diphosphate (FPP) as a substrate [A], and PxaTPSS with GGPP as a substrate [B]. [C] Activity assays of PxaTPSS with geranyl diphosphate (GPP) or farnesyl diphosphate (FPP) as substrate resulted exclusively in the corresponding dephosphorylated substrates, demonstrating no conversion by PxaTPSS. GC/MS traces are illustrated as total ion chromatograms (TIC).

[0015] Fig. 8. NMR and quantum chemical analysis of pseudolarene. Illustrated is the structure of the PxaTPSS reaction product verified by NMR analysis in comparison to computational chemical calculations of 1H [A] and 13C [B] chemical shifts. Experimental values are presented on top and computational data are at bottom in parentheses.

[0016] Fig. 9. Protein sequence similarity matrix of the class I active site of known gymnosperm diTPS and BIS enzymes. The class I active site is here defined as spanning from the start of helix A to the C-terminus, residues 520-846. Similarity is given in % sequence identity.

[0017] Fig. 10. Sibongilene formation in engineered yeast. Shown are GC/MS total ion chromatograms (TIC) of the PxaTPSS production after co-expression with the yeast GGPP synthase (BTS1) in the engineered yeast (S. cerevisiae) strain AM94. Sibongilene was purified on silica matrix.

DEFINITIONS

[0018] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of skill in the art to which the invention(s) belong. All patents, patent applications, published applications and publications, GenBank sequences, databases, websites and other published materials referred to throughout the entire disclosure herein, unless noted otherwise, are incorporated by reference in their entirety. In the event that there are a plurality of definitions for terms herein, those in this section prevail. Where reference is made to a URL or other such identifier or address, it is understood that such identifiers can change and particular information on the internet may not be permanent, but equivalent information can be found by searching the internet. Reference thereto evidences the availability and public dissemination of such information.

[0019] As used herein, a diterpenoid is an unsaturated hydrocarbon based on the isoprene unit (C5H8), and having a core structure of the general formula C5xH8x. A diterpene contains a 20 carbon atom core structure, and hence is made up of four isoprene units. A diterpenoid also is a type of diterpene. A diterpenoid can derive from geranylgeranyl pyrophosphate (GGPP). Diterpenoids include diterpene olefins (e.g., sibongilene), diterpene acids (e.g., pseudolaric acids, such as pseudolaric acid B), and diterpene alcohols.
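The general formula in this definition can be checked with simple arithmetic. The following is a minimal sketch in Python; the function name `terpene_core_formula` is hypothetical and is not part of the disclosure.

```python
def terpene_core_formula(isoprene_units: int) -> str:
    """Return the core hydrocarbon formula C5xH8x for x isoprene (C5H8) units."""
    if isoprene_units < 1:
        raise ValueError("at least one isoprene unit is required")
    carbons = 5 * isoprene_units
    hydrogens = 8 * isoprene_units
    return f"C{carbons}H{hydrogens}"


# A diterpene core is built from four isoprene units (x = 4).
print(terpene_core_formula(4))  # C20H32
```

With x = 4 the sketch gives the 20-carbon core stated in the definition.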

[0020] As used herein, “diterpene synthase” or “diTPS” refers to a monofunctional diterpene synthase that is capable of synthesizing a diterpene olefin by sequential cycloisomerisation of the substrate (E,E,E)-geranylgeranyl diphosphate (GGPP).

[0021] As used herein, monofunctional class I diTPS refers to a monofunctional synthase that contains a class I active site that has a Mg2+ coordinating DDxxD motif and NSE/DTE motifs.
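As an illustration of how a DDxxD motif might be located in a protein sequence, the following Python sketch scans for two aspartates, any two residues, and a third aspartate. It is only a sketch under stated assumptions: the helper name `find_ddxxd` and the toy sequence are hypothetical, the toy sequence is not taken from SEQ ID NO:1, and the NSE/DTE motif is not covered because its pattern is not spelled out in this text.

```python
import re

# DDxxD: two aspartates, any two residues, then an aspartate.
# The lookahead lets overlapping occurrences be reported as well.
DDXXD = re.compile(r"(?=(DD..D))")


def find_ddxxd(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each DDxxD occurrence."""
    return [(m.start() + 1, m.group(1)) for m in DDXXD.finditer(protein.upper())]


# Hypothetical toy sequence for illustration only.
toy = "MSLAVDDIYDTHGTK"
print(find_ddxxd(toy))  # [(6, 'DDIYD')]
```

A pattern match of this kind only flags candidate sites; it says nothing about whether the site is catalytically active.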

[0022] As used herein, an active fragment of a synthase polypeptide refers to a contiguous sequence of amino acids of a synthase polypeptide that exhibits synthase activity (e.g., 5,7-fused bicyclic synthase activity such as sibongilene synthase activity), but that does not include the full sequence of the synthase polypeptide. For purposes herein, the active fragment typically includes the class I site that has a DDxxD motif, and more typically a DDxxD motif and NSE/DTE motifs. The active fragment generally contains at least 300, 400, 500, 600, 700, 800 or more amino acid residues.

[0023] As used herein, “sibongilene synthase activity” refers to a synthase polypeptide or an active fragment of a synthase polypeptide that catalyzes the formation of sibongilene from geranylgeranyl diphosphate (GGPP).

[0024] As used herein, a pseudomature polypeptide with reference to a synthase refers to a polypeptide that lacks one or more amino acid residues from the N-terminus of the preprotein, and typically lacks at least 10, 20, 26, 30, 40, 50, 60, 70, 80, 90 or more N-terminal amino acid residues. Typically, a pseudomature polypeptide lacks the plastidial transit peptide. For example, with reference to PxaTPSS, the plastidial transit polypeptide corresponds to amino acid residues 1-28 of SEQ ID NO:1. Hence, a pseudomature PxaTPSS polypeptide lacks at least 28, 30, 40, 50, 55, 60, 65, 70, 75, 80, 90 or more N-terminal amino acid residues of the preprotein set forth in SEQ ID NO:1. In some embodiments, the pseudomature PxaTPSS polypeptide has the sequence set forth in SEQ ID NO:3.
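The N-terminal truncation described in this definition amounts to dropping a prefix of the preprotein sequence. A minimal Python sketch follows; the function name `pseudomature` and the placeholder sequence are hypothetical, and the placeholder is not SEQ ID NO:1.

```python
def pseudomature(preprotein: str, n_removed: int = 28) -> str:
    """Drop the first n_removed N-terminal residues of a preprotein sequence."""
    if not 0 < n_removed < len(preprotein):
        raise ValueError("n_removed must be between 1 and len(preprotein) - 1")
    return preprotein[n_removed:]


# Hypothetical placeholder: 28 'X' residues stand in for the transit peptide
# (residues 1-28), followed by a made-up downstream region.
placeholder = "X" * 28 + "MDELSTAV"
print(pseudomature(placeholder))  # MDELSTAV
```

The default of 28 mirrors the transit peptide length given in the text; larger values correspond to the longer truncations also listed there.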

[0025] As used herein, sibongilene is the compound having the following structure or a mixture of isomers thereof:

[0026] As used herein, corresponding residues refers to residues that occur at aligned loci. Related or variant polypeptides are aligned by any method known to those of skill in the art. Such methods typically maximize matches, and include methods such as manual alignments and those produced by the numerous alignment programs available (for example, BLASTP) and others known to those of skill in the art. By aligning the sequences of polypeptides, one skilled in the art can identify corresponding residues, using conserved and identical amino acid residues as guides. Corresponding positions also can be based on structural alignments, for example by using computer simulated alignments of protein structure.

[0027] As used herein, nucleic acids or nucleic acid molecules include DNA, RNA and analogs thereof, including peptide nucleic acids (PNA) and mixtures thereof. Nucleic acids can be single or double-stranded.

[0028] As used herein, the term polynucleotide means a single- or double-stranded polymer of deoxyribonucleotides or ribonucleotide bases read from the 5' to the 3' end. Polynucleotides include RNA and DNA, and can be isolated from natural sources, synthesized in vitro, or prepared from a combination of natural and synthetic molecules. The length of a polynucleotide molecule is given herein in terms of nucleotides (abbreviated “nt”) or base pairs (abbreviated “bp”). The term nucleotides is used for single- and double-stranded molecules where the context permits. When the term is applied to double-stranded molecules it is used to denote overall length and will be understood to be equivalent to the term base pairs. It will be recognized by those skilled in the art that the two strands of a double-stranded polynucleotide can differ slightly in length and that the ends thereof can be staggered; thus, not all nucleotides within a double-stranded polynucleotide molecule may be paired. Such unpaired ends will, in general, not exceed 20 nucleotides in length.

[0029] As used herein, a peptide refers to a polypeptide that is greater than or equal to 2 amino acids in length, and less than or equal to 40 amino acids in length.

[0030] As used herein, the amino acids which occur in the various sequences of amino acids provided herein are identified according to their known, three-letter or one-letter abbreviations. The nucleotides which occur in the various nucleic acid fragments are designated with the standard single-letter designations used routinely in the art.

[0031] As used herein, an “amino acid” is an organic compound containing an amino group and a carboxylic acid group. A polypeptide contains two or more amino acids. For purposes herein, amino acids include the twenty naturally-occurring amino acids, non-natural amino acids and amino acid analogs (i.e., amino acids wherein the α-carbon has a side chain).

[0032] As used herein, “amino acid residue” refers to an amino acid formed upon chemical digestion (hydrolysis) of a polypeptide at its peptide linkages. The amino acid residues described herein are presumed to be in the “L” isomeric form. Residues in the “D” isomeric form, which are so designated, can be substituted for any L-amino acid residue as long as the desired functional property is retained by the polypeptide. NH2 refers to the free amino group present at the amino terminus of a polypeptide. COOH refers to the free carboxy group present at the carboxyl terminus of a polypeptide.

[0033] All amino acid residue sequences represented herein by formulae have a left to right orientation in the conventional direction of amino-terminus to carboxyl-terminus. In addition, the phrase “amino acid residue” is defined to include the twenty naturally occurring and proteinogenic amino acids and modified and unusual amino acids, such as those referred to in 37 C.F.R. §§1.821-1.822, and incorporated herein by reference. Furthermore, it should be noted that a dash at the beginning or end of an amino acid residue sequence indicates a peptide bond to a further sequence of one or more amino acid residues, to an amino-terminal group such as NH2 or to a carboxyl-terminal group such as COOH.

[0034] As used herein, “naturally occurring amino acids” refers to the 20 L-amino acids that occur in polypeptides.

[0035] As used herein, “non-natural amino acid” refers to an organic compound containing an amino group and a carboxylic acid group that is not one of the naturally-occurring amino acids. Non-naturally occurring amino acids thus include, for example, amino acids or analogs of amino acids other than the 20 naturally-occurring amino acids and include, but are not limited to, the D-isostereomers of amino acids. Exemplary non-natural amino acids are known to those of skill in the art.

[0036] As used herein, modification is in reference to modification of the primary sequence of amino acids of a polypeptide or a sequence of nucleotides in a nucleic acid molecule and includes deletions, insertions, and replacements and rearrangements of amino acids and nucleotides. Modifications can be made by making conservative amino acid replacements and also non-conservative amino acid substitutions as well as by insertions and other such changes in primary sequence. Modifications also can include post-translational modifications or other changes to the molecule that can occur due to conjugation or linkage, directly or indirectly, to another moiety, but when such modifications are contemplated they are referred to as post-translational modifications or conjugates or other such term as appropriate. Methods of modifying a polypeptide are routine to those of skill in the art, and can be performed by standard methods, such as site directed mutations, amplification methods, and gene shuffling methods.

[0037] As used herein, amino acid replacements or substitutions contemplated include, but are not limited to, conservative substitutions, including, but not limited to, those set forth in Table 1. Suitable conservative substitutions of amino acids are known to those of skill in the art and can be made generally without altering the conformation or activity of the polypeptide. Those of skill in this art recognize that, in general, single amino acid substitutions in non-essential regions of a polypeptide do not substantially alter biological activity (see, e.g., Watson et al. Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. co., p. 224). Conservative amino acid substitutions are made, for example, in accordance with those set forth in Table 1 as follows:

[0038] Other conservative substitutions also are permissible and can be determined empirically or in accord with known conservative substitutions.

[0039] As used herein, a DNA construct is a single or double stranded, linear or circular DNA molecule that contains segments of DNA combined and juxtaposed in a manner not found in nature. DNA constructs exist as a result of human manipulation, and include clones and other copies of manipulated molecules.

[0040] As used herein, a DNA segment is a portion of a larger DNA molecule having specified attributes. For example, a DNA segment encoding a specified polypeptide is a portion of a longer DNA molecule, such as a plasmid or plasmid fragment, which, when read from the 5' to 3' direction, encodes the sequence of amino acids of the specified polypeptide.

[0041] As used herein, “primary sequence” refers to the sequence of amino acid residues in a polypeptide.

[0042] As used herein, “similarity” between two proteins or nucleic acids refers to the relatedness between the sequence of amino acids of the proteins or the nucleotide sequences of the nucleic acids. Similarity can be based on the degree of identity and/or homology of sequences of residues and the residues contained therein. Methods for assessing the degree of similarity between proteins or nucleic acids are known to those of skill in the art. For example, in one method of assessing sequence similarity, two amino acid or nucleotide sequences are aligned in a manner that yields a maximal level of identity between the sequences. “Identity” refers to the extent to which the amino acid or nucleotide sequences are invariant. Alignment of amino acid sequences, and to some extent nucleotide sequences, also can take into account conservative differences and/or frequent substitutions in amino acids (or nucleotides). Conservative differences are those that preserve the physico-chemical properties of the residues involved. Alignments can be global (alignment of the compared sequences over the entire length of the sequences and including all residues) or local (the alignment of a portion of the sequences that includes only the most similar region or regions).

[0043] As used herein, “sequence identity” refers to the number of identical or similar amino acids or nucleotide bases in a comparison between a test and a reference polypeptide or polynucleotide. Sequence identity can be determined by sequence alignment of nucleic acid or protein sequences to identify regions of similarity or identity. For purposes herein, sequence identity is generally determined by alignment to identify identical residues. The alignment can be local or global. Matches, mismatches and gaps can be identified between compared sequences. Gaps are null amino acids or nucleotides inserted between the residues of aligned sequences so that identical or similar characters are aligned. Generally, there can be internal and terminal gaps. Sequence identity can be determined by taking into account gaps as the number of identical residues/length of the shortest sequence × 100. When using gap penalties, sequence identity can be determined with no penalty for end gaps (e.g., terminal gaps are not penalized). Alternatively, sequence identity can be determined without taking into account gaps as the number of identical positions/length of the total aligned sequence × 100.
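The two ratios described in this definition can be sketched in Python as follows. This is an illustrative sketch only, assuming the two sequences have already been aligned to equal length with `-` marking gaps; the function name `percent_identity`, the dictionary keys, and the example sequences are hypothetical, and this is not the BLAST computation.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> dict[str, float]:
    """Percent identity of two pre-aligned, equal-length sequences under two denominators."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same non-gap residue.
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Length of the shorter sequence once gap characters are removed.
    shortest = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shortest == 0:
        raise ValueError("each sequence must contain at least one residue")
    return {
        "per_shortest_sequence": 100.0 * identical / shortest,
        "per_aligned_length": 100.0 * identical / len(aligned_a),
    }


# Hypothetical 10-column alignment: 8 identical columns, one gap, one mismatch.
result = percent_identity("ACDEFGHIKL", "ACDE-GHIRL")
print({key: round(value, 1) for key, value in result.items()})
# {'per_shortest_sequence': 88.9, 'per_aligned_length': 80.0}
```

The example shows why the chosen denominator matters: the same alignment scores 88.9% against the shorter sequence but 80.0% against the full aligned length.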

[0044] Amino acid sequence similarity or identity can be computed by using the BLASTP and TBLASTN programs which employ the BLAST (basic local alignment search tool) 2.0 algorithm. Techniques for computing amino acid sequence similarity or identity are well known to those skilled in the art, and the use of the BLAST algorithm is described in ALTSCHUL et al. (1990), J Mol Biol 215: 403-410 and ALTSCHUL et al. (1997), Nucleic Acids Res. 25: 3389-3402.

[0045] As used herein, the term “identity” represents a comparison between a test and a reference polypeptide or polynucleotide. In one non-limiting example, “at least 90% identical to” refers to percent identities from 90 to 100% relative to the reference polypeptides. Identity at a level of 90% or more is indicative of the fact that, assuming for exemplification purposes a test and reference polypeptide length of 100 amino acids are compared, no more than 10% (i.e., 10 out of 100) of amino acids in the test polypeptide differ from those of the reference polypeptide. Similar comparisons can be made between test and reference polynucleotides. Such differences can be represented as point mutations randomly distributed over the entire length of an amino acid sequence or they can be clustered in one or more locations of varying length up to the maximum allowable, e.g., 10/100 amino acid difference (approximately 90% identity). Differences also can be due to deletions or truncations of amino acid or nucleotide residues. Differences are defined as nucleic acid or amino acid substitutions, insertions or deletions. Depending on the length of the compared sequences, at the level of homologies or identities above about 85-90%, the result is reasonably independent of the program and gap parameters set; such high levels of identity can be assessed readily, often without relying on software.
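The worked example in this paragraph (at most 10 differences in 100 residues at 90% identity) generalizes to other lengths and thresholds. A minimal Python sketch, using integer arithmetic and a hypothetical function name `max_differences`; the 500-residue length is an arbitrary illustration and is not a length stated in the disclosure.

```python
def max_differences(length: int, min_identity_pct: int) -> int:
    """Largest number of differing residues that still meets a whole-number identity threshold."""
    if length < 1 or not 0 <= min_identity_pct <= 100:
        raise ValueError("length must be positive and the threshold between 0 and 100")
    # Integer floor division avoids floating-point rounding at the boundary.
    return length * (100 - min_identity_pct) // 100


print(max_differences(100, 90))  # 10
print(max_differences(500, 95))  # 25
```

The first call reproduces the 10-out-of-100 case from the text; the second shows the same rule applied to a 95% threshold.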

[0046] As used herein, a substantially similar sequence is an amino acid sequence that differs from a reference sequence only by one or more conservative substitutions. Such a sequence can, for example, be functionally homologous to another substantially similar sequence. A person of skill in the art will appreciate which aspects of the individual amino acids in a peptide provided herein can be substituted.

[0047] As used herein, an aligned sequence refers to the use of homology (similarity and/or identity) to align corresponding positions in a sequence of nucleotides or amino acids. Typically, two or more sequences that are related by about 50% or more identity are aligned. An aligned set of sequences refers to 2 or more sequences that are aligned at corresponding positions and can include aligning sequences derived from RNAs, such as ESTs and other cDNAs, aligned with genomic DNA sequence.

[0048] As used herein, substantially pure means sufficiently homogeneous to appear free of readily detectable impurities as determined by standard methods of analysis, such as thin layer chromatography (TLC), gel electrophoresis and high performance liquid chromatography (HPLC), used by those of skill in the art to assess such purity, or sufficiently pure such that further purification would not detectably alter the physical and chemical properties, such as enzymatic and biological activities, of the substance. Methods for purification of the compounds to produce substantially chemically pure compounds are known to those of skill in the art. A substantially chemically pure compound can, however, be a mixture of stereoisomers or isomers. In such instances, further purification might increase the specific activity of the compound.

[0049] As used herein, an isolated or purified polypeptide or protein or biologically-active portion thereof is substantially free of cellular material or other contaminating proteins from the cell or tissue from which the protein is derived, or substantially free from chemical precursors or other chemicals when chemically synthesized. Preparations can be determined to be substantially free if they appear free of readily detectable impurities as determined by standard methods of analysis, such as thin layer chromatography (TLC), gel electrophoresis and high performance liquid chromatography (HPLC), used by those of skill in the art to assess such purity, or sufficiently pure such that further purification would not detectably alter the physical and chemical properties, such as proteolytic and biological activities, of the substance. Methods for purification of the compounds to produce substantially chemically pure compounds are known to those of skill in the art. A substantially chemically pure compound, however, can be a mixture of stereoisomers. In such instances, further purification might increase the specific activity of the compound.

[0050] As used herein, substantially free of cellular material includes preparations of diTPSs or diterpene products in which the synthase or product is separated from cellular components of the cells from which it is isolated or produced. In one embodiment, the term substantially free of cellular material includes preparations having less than about, or less than, 30%, 20%, 10%, 5% or less (by dry weight) of material other than the diTPS or diterpene product, including cell culture medium. When the synthase is recombinantly produced, it also is substantially free of culture medium, i.e., culture medium represents less than about, or at, 20%, 10% or 5% of the volume of the synthase protein preparation.

[0051] As used herein, the term substantially free of chemical precursors or other chemicals includes preparations of synthase proteins or diterpene products that are separated from chemical precursors or other chemicals that are involved in the synthesis thereof. The term includes preparations of synthase proteins or diterpene products having less than about, or less than, 30% (by dry weight), 20%, 10%, 5% or less of chemical precursors or non-synthase chemicals or components. As described herein, the present invention can provide isolated sibongilene, or a composition containing isolated sibongilene, wherein, e.g., the isolated sibongilene is substantially free of chemical precursors or other chemicals. Similarly, the present invention can provide isolated PxaTPS8 polypeptide, or an active fragment thereof, wherein, e.g., the isolated PxaTPS8 polypeptide, or active fragment thereof, is substantially free of chemical precursors or other chemicals.

[0052] As used herein, synthetic, with reference to, for example, a synthetic nucleic acid molecule or a synthetic gene or a synthetic peptide, refers to a nucleic acid molecule or polypeptide molecule that is produced by recombinant methods and/or by chemical synthesis methods.

[0053] As used herein, production by recombinant methods using recombinant DNA methods refers to the use of the well-known methods of molecular biology for expressing proteins encoded by cloned DNA.

[0054] As used herein, vector (or plasmid) refers to discrete DNA elements that are used to introduce heterologous nucleic acid into cells for either expression or replication thereof. The vectors typically remain episomal, but can be designed to effect integration of a gene or portion thereof into a chromosome of the genome. Also contemplated are vectors that are artificial chromosomes, such as bacterial artificial chromosomes, yeast artificial chromosomes and mammalian artificial chromosomes. Selection and use of such vehicles are well known to those of skill in the art.

[0055] As used herein, expression refers to the process by which nucleic acid is transcribed into mRNA and translated into peptides, polypeptides, or proteins. If the nucleic acid is derived from genomic DNA, expression can, if an appropriate eukaryotic host cell or organism is selected, include processing, such as splicing of the mRNA.

[0056] As used herein, an expression vector includes vectors capable of expressing DNA that is operatively linked with regulatory sequences, such as promoter regions, that are capable of effecting expression of such DNA fragments. Such additional segments can include promoter and terminator sequences, and optionally can include one or more origins of replication, one or more selectable markers, an enhancer, a polyadenylation signal, and the like. Expression vectors are generally derived from plasmid or viral DNA, or can contain elements of both. Thus, an expression vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage, recombinant virus or other vector that, upon introduction into an appropriate host cell, results in expression of the cloned DNA. Appropriate expression vectors are well known to those of skill in the art and include those that are replicable in eukaryotic cells and/or prokaryotic cells and those that remain episomal or those which integrate into the host cell genome.

[0057] As used herein, vector also includes "virus vectors" or "viral vectors." Viral vectors are engineered viruses that are operatively linked to exogenous genes to transfer (as vehicles or shuttles) the exogenous genes into cells. Viral vectors include, but are not limited to, adenoviral vectors, retroviral vectors and vaccinia virus vectors.

[0058] As used herein, operably or operatively linked when referring to DNA segments means that the segments are arranged so that they function in concert for their intended purposes, e.g., transcription initiates downstream of the promoter and upstream of any transcribed sequences. The promoter is usually the domain to which the transcriptional machinery binds to initiate transcription, which then proceeds through the coding segment to the terminator.

[0059] As used herein, the term assessing or determining includes quantitative and qualitative determination in the sense of obtaining an absolute value for the activity of a product, and also of obtaining an index, ratio, percentage, visual or other value indicative of the level of the activity. Assessment can be direct or indirect.

[0060] As used herein, recitation that a polypeptide “consists essentially” of a recited sequence of amino acids means that only the recited portion, or a fragment thereof, of the full-length polypeptide is present. The polypeptide can optionally, and generally will, include additional amino acids from another source or can be inserted into another polypeptide.

[0061] As used herein, the term “heterologous” in reference to two different components refers to the two different components that are not found together in nature. For example, a diterpenoid such as sibongilene that is heterologous to a host cell, or lysate or conditioned medium thereof, in which it is found refers to a host cell, or lysate or conditioned medium thereof, that does not naturally produce or contain sibongilene. Similarly, a nucleic acid that is heterologous to a promoter to which it is operably linked is not found in nature to be operably linked to such promoter.

[0062] “Pharmaceutically acceptable excipient” refers to a substance that aids the administration of an active agent to and absorption by a subject. Pharmaceutical excipients useful in the present invention include, but are not limited to, carriers, binders, fillers, disintegrants, lubricants, coatings, sweeteners, flavors and colors. One of skill in the art will recognize that other pharmaceutical excipients are useful in the present invention.

[0063] As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to a polypeptide comprising “an amino acid replacement” includes polypeptides with one or a plurality of amino acid replacements.

[0064] As used herein, ranges and amounts can be expressed as “about” a particular value or range. About also includes the exact amount. Hence “about 5%” means “about 5%” and also “5%.”

[0065] As used herein, “optional” or “optionally” means that the subsequently described event or circumstance does or does not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not. For example, an optional step of isolating a diterpenoid (e.g. pseudolaric acid B) means that the diterpenoid (e.g. pseudolaric acid B) is isolated or is not isolated.

[0066] As used herein, the abbreviations for any protective groups, amino acids and other compounds are, unless indicated otherwise, in accord with their common usage, recognized abbreviations, or the IUPAC-IUB Commission on Biochemical Nomenclature (see, (1972) Biochem. 11:1726).

[0067] For clarity of disclosure, and not by way of limitation, the detailed description is divided into the subsections that follow.

DETAILED DESCRIPTION OF THE INVENTION

I. Introduction

[0068] Diterpenes play essential roles in plant biology and serve as important bioproducts and therapeutics, including the anticancer drug Taxol®. Enzymes of the diterpene synthase family produce the many core structural scaffolds that form the foundation of the large diversity of biologically active diterpenes. This paper describes the discovery and mechanism of a novel diterpene synthase, sibongilene synthase, from the golden larch tree, Pseudolarix amabilis. The enzyme catalyzes the first committed reaction in the biosynthesis of pseudolaric acid B, a complex diterpene with potential anticancer activity.

[0069] Drawing on centuries of knowledge of ethnomedicinal plants from around the world, plant natural products, also known as secondary or specialized metabolites, remain a major yet largely untapped source for drug discovery (1). The diterpenes are one of the most diverse classes of plant secondary metabolites, including some with known functions critical to plant fitness and survival (2). These various biological activities of diterpenes also form the basis of their use as modern therapeutics, such as the anticancer drug taxol (paclitaxel) and the cAMP-activating compound forskolin (3,4).

[0070] Golden larch (Pseudolarix amabilis, Pinaceae) is a deciduous gymnosperm tree renowned as one of the 50 fundamental herbs in traditional Chinese medicine. Golden larch produces a set of diterpenes with unique chemical structures among the metabolite class, the pseudolaric acids (5) (Fig. 6). The major bioactive ingredient, pseudolaric acid B (PAB), has demonstrated antitumor properties against a broad range of cancer types (6-9). Similar to the widely used chemotherapeutics taxol and vinblastine, PAB binds to microtubules and has anti-proliferative activity (9). Specifically, PAB inhibits microtubule polymerization (9), and its activity has been shown to circumvent multidrug resistance (6,8).

[0071] Development of PAB as an anticancer drug is limited by supply, which depends on isolation from golden larch roots or may be achieved through multistep chemical synthesis (5,10). Knowledge of the genes and enzymes of PAB biosynthesis in golden larch would provide the resources needed to enable enzymatic biomanufacture. On this premise, we recently established a gene discovery strategy for diterpene metabolism in non-model plants, which is informed by metabolite and transcriptome profiling (11). Target genes include the family of diterpene synthases (diTPSs), which catalyze the carbocation-driven cyclization and rearrangement of the central precursor geranylgeranyl diphosphate (GGPP) into various diterpene scaffolds as the bedrock for diterpene structural diversity (2,12). The large number of diTPSs and their many different specific functions are the product of evolutionary diversification that involved repeated events of gene duplication and neo-functionalization (2,13). Given their shared ancestry, known plant diTPSs of different species and functions are structurally conserved with variations of three α-helical domains and two distinct active sites (N-terminal class II and C-terminal class I). Domain architecture, as well as the presence and contour of these active sites, define the catalytic specificity of a given diTPS (14-16). In gymnosperms, most diTPSs of secondary metabolism are bifunctional class I/II enzymes, which contain both functional active sites and form a variety of labdane-type diterpenes (17-22) (Fig. 6). Gymnosperms also evolved monofunctional class I diTPSs with roles in secondary metabolism, including enzymes of labdane biosynthesis in pines (Pinus) (22) and taxadiene synthases that form the precursor for taxol and other taxoids in yew (Taxus) (16,23). Based on the structure of pseudolaric acids (Fig. 6), we hypothesized that a monofunctional class I diTPS may catalyze the first committed step in the biosynthesis of PAB.

[0072] Described herein is the discovery of the first committed step in PAB biosynthesis and the underlying reaction mechanism of a novel diterpene synthase (diTPS). Analysis of the golden larch root transcriptome revealed a large TPS family, including the unusual monofunctional class I diTPS PxaTPS8, which converts geranylgeranyl diphosphate into a previously unknown 7,5-fused bicyclic diterpene, coined here sibongilene. The structure of sibongilene was elucidated by NMR combined with quantum chemical validation. Although PxaTPS8 adopts the typical 3-domain structure of diTPSs, sequence phylogeny places the enzyme with 2-domain TPSs of mono- and sesqui-terpene biosynthesis, implying an expansive evolutionary divergence. Site-directed mutagenesis of PxaTPS8 revealed unique catalytic residues, which together with quantum chemical calculations suggested a novel carbocation-driven reaction mechanism en route to the 5,7-trans-fused bicyclic sibongilene scaffold, expanding the known diterpene structural landscape. PxaTPS8 expression in microbial and plant hosts provided proof-of-concept systems for metabolic engineering and production of sibongilene, pseudolaric acids, such as pseudolaric acid B, and related diterpenoids.

II. Monofunctional Class I Diterpene Polypeptides and Diterpenoid Products

[0073] The present disclosure relates to one, or more than one, diterpene synthase (diTPS) nucleic acid molecule and one, or more than one, diTPS polypeptide. The one, or more than one, diTPS polypeptides can be a class I diTPS. More specifically, the one or more than one diTPS polypeptides can be a monofunctional class I diTPS. The diTPS can therefore contain a class I active site that has a DDxxD motif. The present disclosure provides a nucleic acid containing a nucleotide sequence encoding diterpene synthase (diTPS), for example, PxaTPS8 diTPS (SEQ ID NO:2, encoding the polypeptide SEQ ID NO: 1). The nucleotide sequence encoding diTPS can be operatively linked to a regulatory region active in a host.

[0074] Also provided herein are variants of any of the nucleic acid sequences provided herein exhibiting substantially the same properties as the sequences provided herein. By this it is meant that nucleic acid sequences need not be identical to the sequence disclosed herein. Variations can be attributable to single or multiple base substitutions, deletions, or insertions or local mutations involving one or more nucleotides not substantially detracting from the properties of the nucleic acid sequence as encoding an enzyme having the activity of the diTPS as provided herein.

[0075] One, or more than one, nucleic acid encoding a diTPS is provided. The nucleic acid encoding a diTPS, such as is used in any of the described embodiments herein, can contain a nucleotide sequence that is at least 50% identical to SEQ ID NO: 2, to a portion thereof that encodes an active fragment that exhibits diTPS activity, or to the complement thereof. For example, the nucleic acid contains a nucleotide sequence that is at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, such as generally at least 95% or at least 98% identical to SEQ ID NO: 2, a portion thereof that encodes an active fragment that exhibits diTPS activity, or the complement thereof. The present disclosure provides nucleic acid sequences encoding a polypeptide having a sequence selected from SEQ ID NO: 1, an active fragment thereof, or sequences substantially identical thereto. For example, the provided nucleic acid sequence can encode a pseudomature form of SEQ ID NO: 1 (e.g., SEQ ID NO:3), or an active fragment thereof. The one, or more than one, nucleic acid can contain the sequence set forth in SEQ ID NO: 2, or a portion thereof that encodes an active fragment that exhibits diTPS activity, combinations thereof, or sequences substantially similar thereto. The sequence of the nucleic acid can be changed, for example, to account for codon preference in a particular host cell. In particular examples, the nucleic acid encoding a diTPS contains a nucleotide sequence set forth in SEQ ID NO: 2, or a portion thereof that encodes an active fragment, or the complement thereof. In other examples, the nucleic acid encoding a diTPS is set forth in SEQ ID NO: 2, or a portion thereof that encodes an active fragment, or the complement thereof.

[0076] Also provided are one, or more than one, diTPS polypeptides. The polypeptide having a diTPS activity, such as intended for use in aspects of the methods provided herein, is a polypeptide having an amino acid sequence that is at least 50% identical to SEQ ID NO: 1, or an active fragment thereof that exhibits a diTPS activity. Such polypeptides include pseudomature forms lacking the transit peptide. For example, among polypeptides provided herein are any that have an amino acid sequence that is at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, such as generally at least 95% or at least 98% identical, or identical, to SEQ ID NO: 1 or SEQ ID NO:3 or an active fragment thereof that exhibits diTPS activity. The one, or more than one, diTPS polypeptides can contain the sequence set forth in SEQ ID NO: 1 or SEQ ID NO:3, or an active fragment thereof that exhibits diTPS activity, or sequences having at least about 80-100% sequence similarity thereto, including any percent similarity within these ranges, such as at least, or greater than, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% sequence similarity thereto. The present disclosure provides nucleic acid sequences encoding a polypeptide having a sequence selected from SEQ ID NO: 1 or SEQ ID NO:3, an active fragment thereof that exhibits diTPS activity, or sequences substantially identical thereto. In examples herein, the polypeptide contains the sequence of amino acids set forth in SEQ ID NO: 1 or SEQ ID NO:3, or an active fragment thereof that exhibits diTPS activity. In other examples, the amino acid sequence for a polypeptide provided herein is set forth in SEQ ID NO: 1 or SEQ ID NO:3, or an active fragment thereof that exhibits diTPS activity. Also provided herein are pseudomature forms of SEQ ID NO:1 lacking the transit peptide, such as the polypeptide set forth in SEQ ID NO:3.

Methods of Producing Diterpenoids

[0077] diTPSs are useful enzymes for the metabolic engineering of bioproducts and biofuels in yeast and E. coli (Bohlmann et al. (2008) Plant J., 54:656-669; Peralta-Yahya et al. (2011) Nat. Commun., 2:483). US Patent Application 2011/0041218 discloses a method for the production of sclareol, a compound useful in the fields of perfumery and flavoring. US Patent Application 2008/0281135 discloses a method for producing terpenes of interest in plants having glandular trichomes. The plants contain a sequence encoding a heterologous terpene synthase under the control of a promoter permitting it to be specifically expressed in the trichomes. Moreover, the pathway for producing endogenous diterpenes is blocked in the trichomes of the plants, to increase the flow in the heterologous pathway. WO 2008/007031 discloses a protein having a syn-copalyl-8-ol diphosphate synthase activity, the nucleotide sequence encoding said protein, as well as a vector and a transgenic non-human organism containing the nucleic acid.

[0078] Provided herein are methods of producing diterpenoids, such as one or more pseudolaric acids, or their precursors (e.g., sibongilene), in vitro or in vivo using the monofunctional class I diTPSs provided herein. Depending on the diTPS used, and the one or more functionalization methods used, the diterpenoids that can be produced by the present methods are, for example, sibongilene, pseudolaric acid A, pseudolaric acid B, pseudolaric acid C, or a derivative thereof. In some cases, a diterpenoid such as sibongilene is produced in vivo or in vitro using a diTPS described herein, and the diterpenoid is optionally isolated and further derivatized by means of enzymatic or synthetic organic chemical derivatization.

[0079] In one example, the method for producing diterpenoids is carried out in vitro. In this case, (E,E,E)-geranylgeranyl diphosphate (GGPP) is contacted with at least one polypeptide having a diterpene synthase (diTPS) activity under conditions effective to produce diterpenoids. In performing the methods, GGPP can be added to a suspension or solution containing a diterpene synthase polypeptide, which is then incubated at optimal temperature, for example between 15 and 40° C., such as between 25 and 35° C., or at 30° C. The produced diterpenoid can optionally be isolated by methods known in the art. For example, after incubation, the one or more than one diterpene produced can be isolated from the incubated solution by standard isolation procedures, such as solvent extraction and distillation, optionally after removal of polypeptides from the solution. For example, extraction can be effected with pentane, diethyl ether, methyl tertiary butyl ether or other organic solvent. Production and quantification of the amount of the diterpene product (e.g. any one or more of sibongilene, pseudolaric acid A, pseudolaric acid B, pseudolaric acid C, or a derivative thereof) can then be determined using any method known in the art, such as column chromatography, for example liquid chromatography (e.g. LC-MS or HPLC) or gas chromatography (e.g. GC-MS), using an internal standard. For detection of diphosphate intermediates, reaction products can be dephosphorylated prior to extraction by incubation with alkaline phosphatase.

[0080] In another example, the method for producing diterpenoids is carried out in vivo. In this case, the method involves introducing into a host capable of producing GGPP a nucleotide sequence encoding a diterpene synthase (diTPS) operatively linked with a regulatory region active in the host, and growing that host under conditions that permit the expression of the nucleic acid, thereby producing the diterpenoids. Any host cell can be used for expressing the diTPS, such as any host cell described herein. For example, the host cell can be a eukaryotic or prokaryotic host cell that produces GGPP or is modified to produce GGPP. Exemplary of host cells are bacterial host cells (e.g. E. coli) or fungal host cells (e.g. yeast). In such an example, it is possible to carry out the method in vivo without previously isolating the polypeptide. The reaction occurs directly within the organism or cell transformed to express said nucleic acid. The diterpene product (e.g. any one or more of sibongilene, pseudolaric acid A, pseudolaric acid B, pseudolaric acid C, or a derivative thereof) then can be extracted from the cell culture medium with an organic solvent and subsequently isolated or purified by any known method, such as column chromatography, such as liquid chromatography (e.g. LC-MS or HPLC) or gas chromatography (e.g. GC-MS), and the amount and purity of the recovered product are assessed.
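By way of illustration only, quantification of a recovered product against an internal standard can be sketched as a linear calibration of peak-area ratio versus known amount. All numbers below are hypothetical, the function names `fit_line` and `quantify` are introduced for this sketch, and a real assay would also need to check linearity and detector response over the working range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(analyte_area, internal_std_area, slope, intercept):
    """Convert a sample's peak areas into an amount using the calibration line."""
    # Normalizing to the internal standard corrects for injection and extraction losses.
    ratio = analyte_area / internal_std_area
    return (ratio - intercept) / slope


# Hypothetical calibration: known amounts of authentic standard (micrograms)
# and the measured analyte / internal-standard peak-area ratios.
amounts = [1.0, 2.0, 4.0, 8.0]
area_ratios = [0.5, 1.0, 2.0, 4.0]
slope, intercept = fit_line(amounts, area_ratios)

# Hypothetical sample: analyte peak area 150, internal-standard peak area 100.
print(quantify(150.0, 100.0, slope, intercept))
```

With these made-up values the calibration line has a slope of 0.5 and passes through the origin, so an area ratio of 1.5 corresponds to 3 micrograms of product.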

[0081] The quantity of diterpene produced, such as for example sibongilene, pseudolaric acid A, pseudolaric acid B, pseudolaric acid C, or a derivative thereof, can be determined by any known standard chromatographic technique useful for separating and analyzing organic compounds. For example, production can be assayed by any known chromatographic technique useful for the detection and quantification of hydrocarbons, including, but not limited to, gas chromatography mass spectrometry (GC-MS), gas chromatography using a flame ionization detector (GC-FID), capillary GC-MS, liquid chromatography mass spectrometry (LC-MS), high performance liquid chromatography (HPLC) and column chromatography. Typically, these techniques are carried out in the presence of known internal standards which are used to quantify the amount of the terpene produced. For example, diterpenes can be identified by comparison of retention times and mass spectra to those of authentic standards for the particular diterpene in gas chromatography with mass spectrometry detection. In other examples, quantification can be achieved by gas chromatography with flame ionization detection based upon calibration curves with known amounts of authentic standards and normalization to the peak area of an internal standard. These chromatographic techniques allow for the identification of any terpene or diterpene present in the organic layer.

Nucleic Acid and Encoded diTPS Polypeptides

[0082] Provided herein are nucleic acid molecules encoding a sibongilene synthase polypeptide or an active fragment thereof, including pseudomature forms lacking the plastidial transit peptide, and the encoded polypeptides. The polypeptide or active fragment thereof catalyzes the formation of sibongilene from geranylgeranyl diphosphate (GGPP). The polypeptide having such activity, such as intended for use in aspects of the methods provided herein, is a polypeptide having an amino acid sequence that is at least 50% identical to SEQ ID NO: 1 or an active fragment thereof.

[0083] The PxaTPS8 or active fragment thereof provided herein is a diTPS that is monofunctional and contains a class I active site that has DDxxD and NSE/DTE motifs, and three conserved active site arginines R558, R560, R736. The class I active site encompasses, for example, residues corresponding to residues 594-603, IDDIYDTYGT, as set forth in SEQ ID NO: 1. The DDxxD motif corresponds to amino acid residues DDIYD 595-599 as set forth in SEQ ID NO: 1. The NSE/DTE motif corresponds to GDMNAYKID (residues 740-747 of SEQ ID NO: 1). In one example, a diTPS provided herein is a sibongilene synthase polypeptide or active fragment thereof that contains a Y564F, S696I, S696V, G697S or Ala701C mutation with reference to SEQ ID NO: 1.
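The residue numbering above can be checked with a short Python sketch that locates the DDxxD motif in the quoted ten-residue stretch and parses single-letter mutation labels. The helper names `find_ddxxd` and `parse_mutation` are assumptions of this sketch, and only the fragment quoted in the text is used, not the full SEQ ID NO: 1.

```python
import re

FRAGMENT = "IDDIYDTYGT"  # residues 594-603 of SEQ ID NO: 1, as quoted in the text
FRAGMENT_START = 594     # position of the first fragment residue in SEQ ID NO: 1


def find_ddxxd(seq, start=1):
    """Return (first residue, last residue, motif) for each DDxxD occurrence."""
    # The lookahead lets overlapping occurrences be reported as well.
    return [(m.start() + start, m.start() + start + 4, m.group(1))
            for m in re.finditer(r"(?=(DD..D))", seq)]


def parse_mutation(label):
    """Split a label such as 'Y564F' into (original residue, position, replacement)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if m is None:
        raise ValueError(f"not a single-letter mutation label: {label!r}")
    return m.group(1), int(m.group(2)), m.group(3)


print(find_ddxxd(FRAGMENT, FRAGMENT_START))  # [(595, 599, 'DDIYD')]
for label in ("Y564F", "S696I", "S696V", "G697S"):
    print(parse_mutation(label))
```

The parser deliberately rejects labels written with three-letter codes, such as Ala701C, rather than guessing at a conversion.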

[0084] Provided herein are nucleic acid molecules that encode a polypeptide having a sequence that is at least 50% identical to SEQ ID NO: 1 or that has a sequence set forth in SEQ ID NO: 1 or sequences substantially identical thereto, or an active fragment thereof. The nucleic acid encoding a diTPS that is a sibongilene synthase, such as is used in any of the described methods herein, can contain a nucleotide sequence that is at least 50% identical to SEQ ID NO: 2 or a portion thereof that encodes an active fragment having sibongilene synthase activity, or to the complement thereof. For example, the nucleic acid contains a nucleotide sequence that is at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, such as generally at least 95% or at least 98% identical to SEQ ID NO: 2, or a portion thereof that encodes an active fragment having sibongilene synthase activity, or the complement thereof. The sequence of the nucleic acid can be changed, for example, to account for codon preference in a particular host cell.
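As a hedged sketch of what accounting for codon preference can look like, the following Python back-translates a peptide using one preferred codon per amino acid and then translates the result with the standard genetic code to confirm that the encoded peptide is unchanged. The `PREFERRED` table is a placeholder chosen for illustration, not a measured codon-usage table for any host, and the example peptide is the ten-residue stretch IDDIYDTYGT quoted in paragraph [0083].

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Placeholder preferences (one codon per residue); replace with host-specific data.
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(peptide):
    """Encode a peptide with the preferred codon for each residue."""
    return "".join(PREFERRED[aa] for aa in peptide)


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = "IDDIYDTYGT"
recoded = back_translate(peptide)
print(recoded)
print(translate(recoded) == peptide)  # True: the encoded peptide is unchanged
```

Practical codon optimization usually weighs further factors, such as mRNA structure and unwanted restriction sites, which this sketch ignores.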

[0085] Furthermore, the one or more than one diTPS polypeptides can contain modifications in active site residues as disclosed in FIG. 4B (see also Example 1). For example, the diTPS polypeptide that is a sibongilene synthase polypeptide or active fragment thereof can contain the sequence as set forth in SEQ ID NO: 1, an active fragment thereof (e.g., a pseudomature form as set forth in SEQ ID NO:3), or a sequence substantially identical thereto, wherein the amino acid or a combination of amino acids selected from the amino acids disclosed in FIG. 4B is replaced with another amino acid. Corresponding amino acid residues in various orthologs or homologs or synthetic variants can be identified by one of skill in the art in other sequence forms of sibongilene synthase polypeptide by alignment of residues with SEQ ID NO: 1.

Methods of Producing or Generating Diterpene Synthases, Vectors & Host Cells

[0086] Provided herein are polynucleotides encoding any of the diTPSs provided herein, and the encoded diTPS polypeptides. As described herein, the nucleic acids and encoded polypeptides are derived from Golden larch (Pseudolarix amabilis, Pinaceae). The polypeptide or the nucleic acid can be used in any of the methods provided herein for producing a diterpenoid. Also provided herein are vectors and hosts containing the diTPS that can be used for producing diterpenoids.

[0087] The diTPS to be used in methods provided herein also can be generated synthetically. Standard reference works setting forth the general principles of peptide synthesis technology and methods known to those of skill in the art include, for example: Chan et al., Fmoc Solid Phase Peptide Synthesis, Oxford University Press, Oxford, United Kingdom, 2005; Peptide and Protein Drug Analysis, ed. Reid, R., Marcel Dekker, Inc., 2000; Epitope Mapping, ed. Westwood et al., Oxford University Press, Oxford, United Kingdom, 2000; Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 2001; and Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and John Wiley & Sons, NY, 1994.

[0088] Also provided is a diTPS kit. The kit can contain one or more diTPS nucleic acid molecules. The kit can contain one or more diTPS polypeptides. The kit can contain a synthetic diTPS gene. The kit can contain a vector containing one or more diTPS nucleic acids. The kit can contain a host cell capable of expressing one or more than one diTPS polypeptide.

Isolation of Nucleic Acid Encoding Diterpene Synthases

[0089] The one or more than one polynucleotide sequences encoding the diTPS as provided herein can be prepared by any method known by the person skilled in the art. For example, the polynucleotide sequence encoding a diTPS can be amplified from a cDNA template by polymerase chain reaction with specific primers. In such an example, the codons of the cDNA can be chosen to favor the expression of said protein in the desired expression system. In other examples, nucleic acids encoding diterpene synthases, including any of the diTPSs provided herein, can be cloned or isolated using any available methods known in the art for cloning and isolating nucleic acid molecules. Such methods include PCR amplification of nucleic acids and screening of libraries, including nucleic acid hybridization screening. In some examples, methods for amplification of nucleic acids can be used to isolate nucleic acid molecules encoding a diTPS polypeptide, including, for example, polymerase chain reaction (PCR) methods. A nucleic acid-containing material can be used as a starting material from which a diTPS-encoding nucleic acid molecule can be isolated. For example, DNA and mRNA preparations from Golden larch (Pseudolarix amabilis, Pinaceae) can be used to obtain diterpene synthase genes.

[0090] Nucleic acid libraries also can be used as a source of starting material. Primers can be designed to amplify a diterpene synthase-encoding molecule, such as a diTPS-encoding molecule. For example, primers can be designed based on known nucleic acid sequences encoding a diterpene synthase, such as a class I monofunctional diterpene synthase, such as set forth in SEQ ID NO:2. Nucleic acid molecules generated by amplification can be sequenced and confirmed to encode a diTPS polypeptide.
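By way of illustration only, the simplest form of primer design from a known coding sequence can be sketched as follows: the forward primer is taken from the 5' end of the template, and the reverse primer is the reverse complement of its 3' end. The template below is a hypothetical toy sequence (not SEQ ID NO:2), the 18-nucleotide primer length is an arbitrary choice, and the melting temperature uses the Wallace rule, which is only a rough estimate for short oligonucleotides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough melting temperature in deg C: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def end_primers(template, length=18):
    """Forward primer from the 5' end, reverse primer from the 3' end."""
    t = template.upper()
    if len(t) < 2 * length:
        raise ValueError("template is too short for two non-overlapping primers")
    return t[:length], reverse_complement(t[-length:])


# Hypothetical toy open reading frame used only to exercise the functions.
template = "ATGGCTCTGAGCACCGTTGATCCGAAAGGCTACCTGGAATTCTAA"
forward, reverse = end_primers(template)
for name, primer in (("forward", forward), ("reverse", reverse)):
    print(name, primer, wallace_tm(primer))
```

Real primer design would additionally screen for secondary structure, primer dimers, and specificity against the source transcriptome, and would often add restriction sites for cloning.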

[0091] Additional nucleotide sequences can be joined to a diTPS-encoding nucleic acid molecule, including linker sequences containing restriction endonuclease sites for the purpose of cloning the synthetic gene into a vector, for example, a protein expression vector or a vector designed for the amplification of the core protein coding DNA sequences.
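The idea of adding linker sequences that carry restriction sites can be sketched as below. The choice of NdeI (CATATG) and XhoI (CTCGAG) is an assumption made for illustration and is not specified in this disclosure; the function name is likewise hypothetical. The sketch only flanks the insert and rejects inserts that already contain either site on the given strand, since an internal site would be cut during cloning.

```python
# Recognition sequences of two commonly used restriction enzymes.
# Their use here is an illustrative assumption, not part of the disclosure.
NDEI_SITE = "CATATG"
XHOI_SITE = "CTCGAG"


def add_cloning_sites(insert: str,
                      five_prime_site: str = NDEI_SITE,
                      three_prime_site: str = XHOI_SITE) -> str:
    """Flank an insert with restriction sites after checking for internal sites."""
    insert = insert.upper()
    for site in (five_prime_site, three_prime_site):
        if site in insert:
            raise ValueError(f"insert already contains the site {site}")
    return five_prime_site + insert + three_prime_site


if __name__ == "__main__":
    # Made-up three-codon insert, used only to show the call.
    print(add_cloning_sites("ATGAAATAA"))
```

Both example sites are palindromic, so checking one strand is sufficient for them; a non-palindromic site would also need to be checked against the reverse complement of the insert.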

Furthermore, additional nucleotide sequences specifying functional DNA elements can be operatively linked to a diTPS-encoding nucleic acid molecule. Still further, nucleic acid encoding other moieties or domains also can be included so that the resulting synthase is a fusion protein. For example, nucleic acids encoding other enzymes, such as a GGPP synthase, or protein purification tags, such as His or Flag tags, can be included.

Vectors and Cells

[0092] The disclosure also relates, in part, to vectors containing such sequences, transformed cells, cell lines, and transgenic organisms. For recombinant expression of one or more of the diterpene synthase polypeptides provided herein, including diTPS polypeptides, the nucleic acid containing all or a portion of the nucleotide sequence encoding the synthase can be inserted into an appropriate expression vector, i.e., a vector that contains the necessary elements for the transcription and translation of the inserted protein coding sequence.

Depending upon the expression system used, the necessary transcriptional and translational signals also can be supplied by the native promoter for a diTPS gene, and/or their flanking regions. For example, vectors containing a polynucleotide sequence encoding a diTPS are provided herein. The vector can be obtained and introduced in a host cell by well-known recombinant DNA and genetic engineering techniques. In some examples, a vector can contain the gene encoding a GGPP synthase, such as the gene encoding the GGPP synthase PxaTPS8 from Golden larch (Pseudolarix amabilis, Pinaceae) (SEQ ID NO:2).

[0093] The disclosure also provides a prokaryotic or eukaryotic host cell which is modified by a polynucleotide or a vector as provided herein. The host cell can be prokaryotic, such as bacterial, or eukaryotic, such as fungal (e.g., yeast), plant, archaea, insect, amphibian or animal cell. The host cell can contain a diTPS vector, a synthetic diTPS gene, and/or diTPS nucleic acid. The host cell can be any cell that is capable of being transformed by the vector, synthetic gene, and/or nucleic acid. The host cell can also be any cell that is capable of expressing the diTPS polypeptide. The host cell can be incubated under conditions that allow expression of the diTPS polypeptide.

[0094] Several of these organisms do not produce GGPP naturally. To be suitable to carry out the method of the invention, these organisms may need to be transformed with one or more sequences, such as a sequence encoding a GGPP synthase, that result in production of the precursor, GGPP. They can be so transformed either before the modification with the nucleic acid described according to any of the above embodiments, or simultaneously with a nucleotide sequence encoding diTPS, or a vector containing a nucleotide sequence encoding diTPS. Alternatively, in particular examples, the cells are yeast, such as Saccharomyces cerevisiae, that express an acyclic pyrophosphate terpene precursor, such as GGPP. The cells are used to produce a diterpene synthase, such as a diTPS polypeptide, by growing the above-described cells under conditions whereby the encoded diTPS is expressed by the cell. In some instances, the expressed synthase is purified. In other instances, the expressed synthase, such as a sibongilene synthase, converts GGPP to one or more terpenes (e.g., sibongilene) in the host cell.

[0095] Any method known to those of skill in the art for the insertion of DNA fragments into a vector can be used to construct expression vectors containing a chimeric gene containing appropriate transcriptional/translational control signals and protein coding sequences. These methods can include in vitro recombinant DNA and synthetic techniques and in vivo recombinants (genetic recombination). Expression of nucleic acid sequences encoding a diTPS polypeptide, or a fragment thereof, can be regulated by a second nucleic acid sequence so that the genes or fragments thereof are expressed in a host transformed with the recombinant DNA molecule(s). For example, expression of the proteins can be controlled by any promoter/enhancer known in the art. In a specific embodiment, the promoter is not native to the genes for a diTPS protein. Promoters that can be used include but are not limited to prokaryotic, yeast, mammalian and plant promoters. The type of promoter depends upon the expression system used, described in more detail below.

[0096] In a specific embodiment, a vector is used that contains a promoter operably linked to nucleic acids encoding a diTPS polypeptide, or a fragment thereof, one or more origins of replication, and optionally, one or more selectable markers (e.g., an antibiotic resistance gene). Vectors and systems for expression of diTPS polypeptides are described, including, for example, the pET28b(+) vector.

Expression Systems

[0097] Diterpene synthases, including diTPS polypeptides provided herein, can be produced by any methods known in the art for protein production including in vitro and in vivo methods such as, for example, the introduction of nucleic acid molecules encoding the diterpene synthase (e.g., sibongilene synthase) into a host cell or host plant for in vivo production or expression from nucleic acid molecules encoding the diterpene synthase (e.g., sibongilene synthase) in vitro. Diterpene synthases such as sibongilene synthase polypeptides can be expressed in any organism suitable to produce the required amounts and forms of a synthase polypeptide. Expression hosts include prokaryotic and eukaryotic organisms such as E. coli, yeast, plants, insect cells, mammalian cells, including human cell lines and transgenic animals. Expression hosts can differ in their protein production levels as well as the types of post-translational modifications that are present on the expressed proteins. The choice of expression host can be made based on these and other factors, such as regulatory and safety considerations, production costs and the need and methods for purification.

[0098] Isolated higher eukaryotic cells, for example cell culture, can also be used, instead of complete organisms, as hosts to carry out the method provided herein in vivo. Suitable eukaryotic cells can be any non-human cell, but are generally plant cells. Representative examples of a plant host cell include for example plants that naturally produce high amounts of terpenes. The plant can be selected from the family of Pinaceae, Funariaceae, Solanaceae, Poaceae, Brassicaceae, Fabaceae, Malvaceae, Asteraceae or Lamiaceae. For example, the plant is selected from the genera Picea (spruce), Pinus (pine), Abies (fir), Physcomitrella, Funariaceae, Nicotiana, Solanum, Sorghum, Arabidopsis, Brassica (rape), Medicago (alfalfa), Gossypium (cotton), Artemisia, Salvia and Mentha. Preferably, the plant belongs to the species of Nicotiana tabacum, Nicotiana benthamiana or Physcomitrella patens.

Additional plants and plant cells include, for example, citrus, corn, rice, algae, and lemna. In other examples, the eukaryotic cells are yeast cells. Representative examples of a yeast host cell include those from the Saccharomyces genus (e.g., Saccharomyces cerevisiae) and Pichia genus (e.g., Pichia pastoris). In some examples, insect cells such as Drosophila cells and lepidopteran cells are used for the expression of a diTPS provided herein. Eukaryotic cells for expression also include mammalian cell lines such as Chinese hamster ovary (CHO) cells or baby hamster kidney (BHK) cells.

[0099] Eukaryotic expression hosts also include production in transgenic animals, for example, including production in serum, milk and eggs. There are several methods known in the art for the creation of transgenic host organisms or cells such as plants, fungi, prokaryotes, or cultures of higher eukaryotic cells. Appropriate cloning and expression vectors for use with bacterial, fungal, yeast, plant and mammalian cellular hosts are described, for example, in Pouwels et al., Cloning Vectors: A Laboratory Manual, 1985, Elsevier, New York and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd edition, 1989, Cold Spring Harbor Laboratory Press. Cloning and expression vectors for higher plants and/or plant cells in particular are available to the skilled person. See for example Schardl et al. (1987) Gene 61:1-11.

[0100] Methods for transforming host organisms or cells to harbor transgenic nucleic acids are familiar to the skilled person. For the creation of transgenic plants, for example, current methods include: electroporation of plant protoplasts, liposome-mediated transformation, Agrobacterium-mediated transformation, polyethylene-glycol-mediated transformation, particle bombardment, microinjection of plant cells, and transformation using viruses.

[0101] Many expression vectors are available and known to those of skill in the art for the expression of a diterpene synthase, such as a diTPS provided herein. Exemplary expression vectors are pET expression vectors, such as pET28b(+). The choice of expression vector is influenced by the choice of host expression system. Such selection is well within the level of skill of the skilled artisan. In general, expression vectors can include transcriptional promoters and optionally enhancers, translational signals, and transcriptional and translational termination signals. Expression vectors that are used for stable transformation typically have a selectable marker which allows selection and maintenance of the transformed cells. In some cases, an origin of replication can be used to amplify the copy number of the vectors in the cells.

[0102] Diterpene synthases, including diTPS polypeptides, also can be utilized or expressed as protein fusions. For example, a fusion can be generated to add additional functionality to a polypeptide. Examples of fusion proteins include, but are not limited to, fusions of a signal sequence, a tag such as for localization, e.g., a His6 tag or a myc tag, or a tag for purification, for example, a GST fusion, GFP fusion or CBP fusion, and a sequence for directing protein secretion and/or membrane association. In other examples, diterpene synthases such as diTPS polypeptides provided herein can be fused to GGPP synthase (see, e.g., Brodelius et al. (2002) Eur. J. Biochem. 269:3570-3579).

[0103] Methods of production of diterpene synthase polypeptides, including sibongilene synthase polypeptides, can include co-expression of an acyclic pyrophosphate terpene precursor, such as GGPP, in the host cell. In some instances, the host cell naturally expresses GGPP. Such a cell can be modified to express greater quantities of GGPP (see, e.g., U.S. Pat. Nos. 6,531,303, 6,689,593, 7,838,279 and 7,842,497). In other instances, a host cell that does not naturally produce GGPP is modified genetically to produce GGPP.

Prokaryotic Cells

[0104] Prokaryotes, especially E. coli, provide a system for producing large amounts of the diTPS polypeptides provided herein. Transformation of E. coli is a simple and rapid technique well known to those of skill in the art. Representative examples of a bacterial host cell include, but are not limited to, E. coli strains such as for example E. coli BL21DE3-C41 (Miroux and Walker (1996) J Mol Biol 260:289-298). Exemplary expression vectors for transformation of E. coli cells include, for example, the pGEX expression vectors, the pQE expression vectors, and the pET expression vectors (see U.S. Pat. No. 4,952,496; available from Novagen, Madison, Wis.; see also literature published by Novagen describing the system). Such plasmids include pET11a, which contains the T7lac promoter, T7 terminator, the inducible E. coli lac operator, and the lac repressor gene; pET12a-c, which contains the T7 promoter, T7 terminator, and the E. coli ompT secretion signal; pET15b and pET19b (Novagen, Madison, Wis.), which contain a His-Tag™ leader sequence for use in purification with a His column and a thrombin cleavage site that permits cleavage following purification over the column, the T7-lac promoter region and the T7 terminator; pET28b (Novagen, Madison, Wis.), which contains a His-Tag™ leader sequence for use in purification with a His column and a thrombin cleavage site that permits cleavage following purification over the column, the T7-lac promoter region and the T7 terminator; and the pJET vectors (Thermo Scientific), such as the pJET1.2 vector, which contains a lethal gene that is disrupted by ligation of a DNA insert into the cloning site and a T7 promoter for in vitro transcription.

[0105] Expression vectors for E. coli can contain inducible promoters that are useful for inducing high levels of protein expression and for expressing proteins that exhibit some toxicity to the host cells. Exemplary prokaryotic promoters include, for example, the β-lactamase promoter (Jay et al. (1981) Proc. Natl. Acad. Sci. USA 78:5543) and the tac promoter (DeBoer et al. (1983) Proc. Natl. Acad. Sci. USA 80:21-25); see also "Useful Proteins from Recombinant Bacteria" in Scientific American 242:79-94 (1980). Examples of inducible promoters include the lac promoter, the trp promoter, the hybrid tac promoter, the T7 and SP6 RNA promoters and the temperature regulated λPL promoter.

[0106] Diterpene synthases, including diTPS polypeptides provided herein, can be expressed in the cytoplasmic environment of E. coli. The cytoplasm is a reducing environment and for some molecules, this can result in the formation of insoluble inclusion bodies. Reducing agents such as dithiothreitol and β-mercaptoethanol and denaturants (e.g., guanidine-HCl and urea) can be used to resolubilize the proteins. An alternative approach is the expression of diTPS polypeptides in the periplasmic space of bacteria, which provides an oxidizing environment and chaperonin-like and disulfide isomerases leading to the production of soluble protein. Typically, a leader sequence is fused to the protein to be expressed which directs the protein to the periplasm. The leader is then removed by signal peptidases inside the periplasm. Examples of periplasmic-targeting leader sequences include the pelB leader from the pectate lyase gene and the leader derived from the alkaline phosphatase gene. In some cases, periplasmic expression allows leakage of the expressed protein into the culture medium. The secretion of proteins allows quick and simple purification from the culture supernatant. Proteins that are not secreted can be obtained from the periplasm by osmotic lysis. Similar to cytoplasmic expression, in some cases proteins can become insoluble and denaturants and reducing agents can be used to facilitate solubilization and refolding. Temperature of induction and growth also can influence expression levels and solubility. Typically, temperatures between 25° C. and 37° C. are used. Mutations also can be used to increase solubility of expressed proteins. Typically, bacteria produce aglycosylated proteins.

Yeast Cells

[0107] Yeast systems, such as, but not limited to, those from the Saccharomyces genus (e.g., Saccharomyces cerevisiae), Schizosaccharomyces pombe, Yarrowia lipolytica, Kluyveromyces lactis, and Pichia pastoris can be used to express the diterpene synthases, such as the diTPS polypeptides, provided herein. Yeast expression systems also can be used to produce diterpenes whose reactions are catalyzed by the synthases. Yeast can be transformed with episomal replicating vectors or by stable chromosomal integration by homologous recombination. In some examples, inducible promoters are used to regulate gene expression. Exemplary promoter sequences for expression of diTPS polypeptides in yeast include, among others, promoters for metallothionein, 3-phosphoglycerate kinase (Hitzeman et al. (1980) J. Biol. Chem. 255:2073), or other glycolytic enzymes (Hess et al. (1968) J. Adv. Enzyme Reg. 7:149; and Holland et al. (1978) Biochem. 17:4900), such as enolase, glyceraldehyde phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose isomerase, and glucokinase.

[0108] Other suitable vectors and promoters for use in yeast expression are further described in Hitzeman, EPA-73,657 or in Fleer et al. (1991) Gene, 107:285-195; and van den Berg et al. (1990) Bio/Technology, 8:135-139. Another alternative includes, but is not limited to, the glucose-repressible ADH2 promoter described by Russell et al. (J. Biol. Chem. 258:2674, 1982) and Beier et al. (Nature 300:724, 1982), or a modified ADH1 promoter. Shuttle vectors replicable in yeast and E. coli can be constructed by, for example, inserting DNA sequences from pBR322 for selection and replication in E. coli (AmpR gene and origin of replication) into a yeast vector.

[0109] Yeast expression vectors can include a selectable marker such as LEU2, TRP1, HIS3, and URA3 for selection and maintenance of the transformed DNA. Proteins expressed in yeast are often soluble and co-expression with chaperonins, such as Bip and protein disulfide isomerase, can improve expression levels and solubility. Additionally, proteins expressed in yeast can be directed for secretion using secretion signal peptide fusions such as the yeast mating type alpha-factor secretion signal from Saccharomyces cerevisiae and fusions with yeast cell surface proteins such as the Aga2p mating adhesion receptor or the Arxula adeninivorans glucoamylase. A protease cleavage site (e.g., the Kex-2 protease) can be engineered to remove the fused sequences from the polypeptides as they exit the secretion pathway.

[0110] Yeast naturally express the required proteins, including GGPP synthase (BTS1; which can produce GGPP) for the mevalonate-dependent isoprenoid biosynthetic pathway. Thus, expression of the diterpene synthases, including diTPS polypeptides provided herein, in yeast cells can result in the production of diterpenes, such as sibongilene from GGPP. Exemplary yeast cells for the expression of terpene synthases, including diTPS polypeptides, include yeast modified to express increased levels of FPP and/or GGPP. For example, yeast cells can be modified to produce less squalene synthase or less active squalene synthase (e.g., erg9 mutants; see, e.g., U.S. Pat. Nos. 6,531,303 and 6,689,593). This results in accumulation of FPP in the host cell at higher levels compared to wild type yeast cells, which in turn can result in increased yields of GGPP and diterpenes (e.g., sibongilene, pseudolaric acid A, pseudolaric acid B, pseudolaric acid C, or a derivative thereof). In another example, yeast cells can be modified to produce more GGPP synthase by introduction of a GGPP synthase gene, such as BTS1 from S. cerevisiae, crtE from Erwinia uredovora, crtE from Xanthophyllomyces dendrorhous, al-3 from Neurospora crassa or ggs from Gibberella fujikuroi (see U.S. Pat. No. 7,842,497). In some examples, the native GGPP gene in such yeast can be deleted. Other modifications that enable increased production of GGPP in yeast include, for example, but are not limited to, modifications that increase production of acetyl CoA, inactivate genes that encode enzymes that use FPP and GPP as substrate and overexpress HMG-CoA reductases, as described in U.S. Pat. No. 7,842,497. Exemplary modified yeast cells include, but are not limited to, modified Saccharomyces cerevisiae strains CALI5-1 (ura3, leu2, his3, trp1, Δerg9::HIS3, HMG2cat/TRP1::rDNA, dpp1, sue), ALX7-95 (ura3, his3, trp1, Δerg9::HIS3, HMG2cat/TRP1::rDNA, dpp1, sue), ALX11-30 (ura3, trp1, erg9def25, HMG2cat/TRP1::rDNA, dpp1, sue), which are known and described in one or more of U.S. Pat. Nos. 6,531,303, 6,689,593, 7,838,279, 7,842,497, and published U.S. Pat. Application Serial Nos. 20040249219 and 20110189717.

Plants and Plant Cells

[0111] Transgenic plant cells and plants can be used for the expression of diterpene synthases, including diTPS polypeptides provided herein. Expression constructs are typically transferred to plants using direct DNA transfer such as microprojectile bombardment and PEG-mediated transfer into protoplasts, and with Agrobacterium-mediated transformation. Expression vectors can include promoter and enhancer sequences, transcriptional termination elements, and translational control elements. Expression vectors and transformation techniques are usually divided between dicot hosts, such as Arabidopsis and tobacco, and monocot hosts, such as corn and rice. Examples of plant promoters used for expression include the cauliflower mosaic virus promoter, the nopaline synthase promoter, the ribose bisphosphate carboxylase promoter and the ubiquitin and UBQ3 promoters. Selectable markers such as hygromycin, phosphomannose isomerase and neomycin phosphotransferase are often used to facilitate selection and maintenance of transformed cells. Transformed plant cells can be maintained in culture as cells, aggregates (callus tissue) or regenerated into whole plants. Transgenic plant cells also can include algae engineered to produce proteins (see, for example, Mayfield et al. (2003) Proc Natl Acad Sci USA 100:438-442). Transformed plants include, for example, plants selected from the genera Picea (spruce), Pinus (pine), Abies (fir), Physcomitrella, Funariaceae, Nicotiana, Solanum, Sorghum, Arabidopsis, Medicago (alfalfa), Gossypium (cotton), Brassica (rape), Artemisia, Salvia and Mentha. In some examples, the plant belongs to the species of Nicotiana tabacum, Nicotiana benthamiana or Physcomitrella patens, and is transformed with vectors that overexpress a diTPS and optionally a geranylgeranyl diphosphate synthase, such as described in U.S. Pat. Pub. No. 20090123984 and U.S. Pat. No. 7,906,710.

Insects and Insect Cells

[0112] Insects and insect cells, particularly a baculovirus expression system, can be used for expressing diterpene synthases, including diTPS polypeptides provided herein (see, for example, Muneta et al. (2003) J. Vet. Med. Sci. 65(2):219-223). Insect cells and insect larvae, including expression in the haemolymph, express high levels of protein and are capable of most of the post-translational modifications used by higher eukaryotes. Baculoviruses have a restrictive host range which improves the safety and reduces regulatory concerns of eukaryotic expression. Typically, expression vectors use a promoter such as the polyhedrin promoter of baculovirus for high level expression. Commonly used baculovirus systems include baculoviruses such as Autographa californica nuclear polyhedrosis virus (AcNPV), and the Bombyx mori nuclear polyhedrosis virus (BmNPV) and an insect cell line such as Sf9 derived from Spodoptera frugiperda, Pseudaletia unipuncta (A7S) and Danaus plexippus (DpN1). For high level expression, the nucleotide sequence of the molecule to be expressed is fused immediately downstream of the polyhedrin initiation codon of the virus. Mammalian secretion signals are accurately processed in insect cells and can be used to secrete the expressed protein into the culture medium. In addition, the cell lines Pseudaletia unipuncta (A7S) and Danaus plexippus (DpN1) produce proteins with glycosylation patterns similar to mammalian cell systems.

[0113] An alternative expression system in insect cells is the use of stably transformed cells. Cell lines such as the Schneider 2 (S2) and Kc cells (Drosophila melanogaster) and C7 cells (Aedes albopictus) can be used for expression. The Drosophila metallothionein promoter can be used to induce high levels of expression in the presence of heavy metal induction with cadmium or copper. Expression vectors are typically maintained by the use of selectable markers such as neomycin and hygromycin.

Mammalian Expression

[0114] Mammalian expression systems can be used to express diterpene synthases, including diTPS polypeptides provided herein, and also can be used to produce diterpenes whose reactions are catalyzed by the synthases. Expression constructs can be transferred to mammalian cells by viral infection such as adenovirus or by direct DNA transfer such as liposomes, calcium phosphate, DEAE-dextran and by physical means such as electroporation and microinjection. Expression vectors for mammalian cells typically include an mRNA cap site, a TATA box, a translational initiation sequence (Kozak consensus sequence) and polyadenylation elements. Such vectors often include transcriptional promoter-enhancers for high level expression, for example the SV40 promoter-enhancer, the human cytomegalovirus (CMV) promoter, and the long terminal repeat of Rous sarcoma virus (RSV). These promoter-enhancers are active in many cell types. Tissue and cell-type promoters and enhancer regions also can be used for expression. Exemplary promoter/enhancer regions include, but are not limited to, those from genes such as elastase 1, insulin, immunoglobulin, mouse mammary tumor virus, albumin, alpha-fetoprotein, alpha 1-antitrypsin, beta-globin, myelin basic protein, myosin light chain-2 and gonadotropic releasing hormone gene control. Selectable markers can be used to select for and maintain cells with the expression construct. Examples of selectable marker genes include, but are not limited to, hygromycin B phosphotransferase, adenosine deaminase, xanthine-guanine phosphoribosyl transferase, aminoglycoside phosphotransferase, dihydrofolate reductase and thymidine kinase. Fusion with cell surface signaling molecules such as TCR-ζ and FcεRI-γ can direct expression of the proteins in an active state on the cell surface.

[0115] Many cell lines are available for mammalian expression including mouse, rat, human, monkey, chicken and hamster cells. Exemplary cell lines include, but are not limited to, BHK (i.e., BHK-21 cells), 293-F, CHO, CHO Express (CHOX; Excellgene), Balb/3T3, HeLa, MT2, mouse NS0 (non-secreting) and other myeloma cell lines, hybridoma and heterohybridoma cell lines, lymphocytes, fibroblasts, Sp2/0, COS, NIH3T3, HEK293, 293S, 293T, 2B8, and HKB cells. Cell lines also are available adapted to serum-free media, which facilitates purification of secreted proteins from the cell culture media. One such example is the serum free EBNA-1 cell line (Pham et al. (2003) Biotechnol. Bioeng. 84:332-342).

Purification

[0116] Also provided is a method of producing the diTPS polypeptide. The diTPS polypeptide can be purified using standard chromatographic techniques.

[0117] The polypeptide to be used when the method is carried out in vitro can be obtained by extraction from any organism expressing it, using standard protein or enzyme extraction technologies. If the host organism is a unicellular organism or cell releasing the polypeptide of the invention into the culture medium, the polypeptide can simply be collected from the culture medium, for example by centrifugation, optionally followed by washing steps and resuspension in suitable buffer solutions. If the organism or cell accumulates the polypeptide within its cells, the polypeptide can be obtained by disruption or lysis of the cells and further extraction of the polypeptide from the cell lysate.

[0118] Methods for purification of diterpene synthases, such as diTPS polypeptides, from host cells depend on the chosen host cells and expression systems. For secreted molecules, proteins are generally purified from the culture media after removing the cells. For intracellular expression, cells can be lysed and the proteins purified from the extract. When transgenic organisms such as transgenic plants and animals are used for expression, tissues or organs can be used as starting material to make a lysed cell extract. Additionally, transgenic animal production can include the production of polypeptides in milk or eggs, which can be collected, and if necessary the proteins can be extracted and further purified using standard methods in the art.

[0119] Diterpene synthases, including diTPS polypeptides provided herein, can be purified using standard protein purification techniques known in the art including, but not limited to, SDS-PAGE, size fractionation and size exclusion chromatography, ammonium sulfate precipitation, chelate chromatography and ionic exchange chromatography. Expression constructs also can be engineered to add an affinity tag such as a myc epitope, GST fusion or His6 to a protein, which can then be affinity purified with myc antibody, glutathione resin, and Ni-resin, respectively. Purity can be assessed by any method known in the art including gel electrophoresis and staining and spectrophotometric techniques. The polypeptides, either in an isolated form or together with other proteins, for example in a crude protein extract obtained from cultured cells or microorganisms, can then be suspended in a buffer solution at optimal pH. If adequate, salts, DTT, BSA and other kinds of enzymatic co-factors can be added in order to optimize enzyme activity.

Fusion Proteins

[0120] Fusion proteins containing a diterpene synthase, including diTPS polypeptides, and one or more other polypeptides also are provided. Linkage of a diterpene synthase polypeptide with another polypeptide can be effected directly or indirectly via a linker. In one example, linkage can be by chemical linkage, such as via heterobifunctional agents or thiol linkages or other such linkages. Fusion also can be effected by recombinant means. Fusion of a diterpene synthase, such as a diTPS polypeptide, e.g., a sibongilene synthase, to another polypeptide can be to the N- or C-terminus of the diTPS polypeptide.

[0121] A fusion protein can be produced by standard recombinant techniques. For example, DNA fragments coding for the different polypeptide sequences can be ligated together in-frame in accordance with conventional techniques, e.g., by employing blunt-ended or stagger-ended termini for ligation, restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another embodiment, the fusion gene can be synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers that give rise to complementary overhangs between two consecutive gene fragments that can subsequently be annealed and reamplified to generate a chimeric gene sequence (see, e.g., Ausubel et al. (eds.) Current Protocols in Molecular Biology, John Wiley & Sons, 1992). Moreover, many expression vectors are commercially available that already encode a fusion moiety (e.g., a GST polypeptide). For example, a PxaTPS8 polypeptide-encoding nucleic acid can be cloned into such an expression vector such that nucleic acid encoding PxaTPS8 is linked in-frame to a polypeptide encoding a protein purification tag, such as a His tag. In another example, a nucleic acid molecule encoding a diTPS polypeptide can be linked in-frame to a polypeptide encoding a GGPP synthase. The diTPS polypeptide and additional polypeptide can be linked directly, without a linker, or alternatively, linked indirectly in-frame with a linker.
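The in-frame requirement described above can be checked with a short sketch like the following. It joins an upstream coding sequence, an optional linker, and a downstream coding sequence, removes a terminal stop codon from the upstream part, and rejects constructs whose parts are not whole codons or that contain an internal stop. The function name and the example sequences are hypothetical; they are not sequences from this disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream: str, downstream: str, linker: str = "") -> str:
    """Join two coding sequences (with an optional linker) in one reading frame."""
    upstream, linker, downstream = (
        part.upper() for part in (upstream, linker, downstream)
    )
    # A stop codon at the end of the upstream part would terminate
    # translation before the fusion partner, so drop it.
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    # Every part must be a whole number of codons to preserve the frame.
    for name, part in (("upstream", upstream), ("linker", linker),
                       ("downstream", downstream)):
        if len(part) % 3 != 0:
            raise ValueError(f"{name} length is not a multiple of 3")
    fused = upstream + linker + downstream
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    # Only the final codon is allowed to be a stop.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("fusion contains an internal stop codon")
    return fused


if __name__ == "__main__":
    his_tag = "ATG" + "CAC" * 6   # start codon followed by six His codons
    linker = "GGCAGC"             # Gly-Ser, an illustrative short linker
    partner = "ATGAAATAA"         # made-up three-codon stand-in for a synthase
    print(fuse_in_frame(his_tag, partner, linker))
```

Passing an empty linker corresponds to the direct linkage described above, and a linker whose length is a multiple of three corresponds to indirect in-frame linkage.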

III. Formulations

[0122] The compositions of the present invention can be prepared in a wide variety of oral, parenteral and topical dosage forms. Oral preparations include tablets, pills, powder, dragees, capsules, liquids, lozenges, cachets, gels, syrups, slurries, suspensions, etc., suitable for ingestion by the patient. The compositions of the present invention can also be administered by injection, that is, intravenously, intramuscularly, intracutaneously, subcutaneously, intraduodenally, or intraperitoneally. Also, the compositions described herein can be administered by inhalation, for example, intranasally. Additionally, the compositions of the present invention can be administered transdermally. The compositions of this invention can also be administered by intraocular, intravaginal, and intrarectal routes including suppositories, insufflation, powders and aerosol formulations (for examples of steroid inhalants, see Rohatagi, J. Clin. Pharmacol. 35:1187-1193, 1995; Tjwa, Ann. Allergy Asthma Immunol. 75:107-111, 1995). Accordingly, the present invention also provides pharmaceutical compositions including a pharmaceutically acceptable carrier or excipient and the anti-inflammatory glucocorticosteroid and/or the GR modulator of Formula I.

[0123] For preparing pharmaceutical compositions from the compounds of the present invention, pharmaceutically acceptable carriers can be either solid or liquid. Solid form preparations include powders, tablets, pills, capsules, cachets, suppositories, and dispersible granules. A solid carrier can be one or more substances, which may also act as diluents, flavoring agents, binders, preservatives, tablet disintegrating agents, or an encapsulating material. Details on techniques for formulation and administration are well described in the scientific and patent literature, see, e.g., the latest edition of Remington's Pharmaceutical Sciences, Mack Publishing Co, Easton PA ("Remington's").

[0124] In powders, the carrier is a finely divided solid, which is in a mixture with the finely divided active component. In tablets, the active component is mixed with the carrier having the necessary binding properties in suitable proportions and compacted in the shape and size desired. The powders and tablets preferably contain from 5% or 10% to 70% of the anti-inflammatory glucocorticosteroid and/or the GR modulator of Formula I.

[0125] Suitable solid excipients include, but are not limited to, magnesium carbonate; magnesium stearate; talc; pectin; dextrin; starch; tragacanth; a low melting wax; cocoa butter; carbohydrates; sugars including, but not limited to, lactose, sucrose, mannitol, or sorbitol, starch from corn, wheat, rice, potato, or other plants; cellulose such as methyl cellulose, hydroxypropylmethyl-cellulose, or sodium carboxymethylcellulose; and gums including arabic and tragacanth; as well as proteins including, but not limited to, gelatin and collagen. If desired, disintegrating or solubilizing agents may be added, such as the cross-linked polyvinyl pyrrolidone, agar, alginic acid, or a salt thereof, such as sodium alginate.

[0126] Dragee cores are provided with suitable coatings such as concentrated sugar solutions, which may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene glycol, and/or titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures. Dyestuffs or pigments may be added to the tablets or dragee coatings for product identification or to characterize the quantity of active compound (i.e., dosage).

Pharmaceutical preparations of the invention can also be used orally using, for example, push-fit capsules made of gelatin, as well as soft, sealed capsules made of gelatin and a coating such as glycerol or sorbitol. Push-fit capsules can contain the anti-inflammatory glucocorticosteroid and/or the GR modulator of Formula I mixed with a filler or binders such as lactose or starches, lubricants such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the anti-inflammatory glucocorticosteroid and/or the GR modulator of Formula I may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycol with or without stabilizers.

[0127] For preparing suppositories, a low melting wax, such as a mixture of fatty acid glycerides or cocoa butter, is first melted and the anti-inflammatory glucocorticosteroid and/or the GR modulator of Formula I are dispersed homogeneously therein, as by stirring. The molten homogeneous mixture is then poured into convenient sized molds, allowed to cool, and thereby to solidify.

[0128] Liquid form preparations include solutions, suspensions, and emulsions, for example, water or water/propylene glycol solutions. For parenteral injection, liquid preparations can be formulated in solution in aqueous polyethylene glycol solution.

[0129] Aqueous solutions suitable for oral use can be prepared by dissolving the anti-inflammatory glucocorticosteroid and/or the GR modulator of Formula I in water and adding suitable colorants, flavors, stabilizers, and thickening agents as desired. Aqueous suspensions suitable for oral use can be made by dispersing the finely divided active component in water with viscous material, such as natural or synthetic gums, resins, methylcellulose, sodium carboxymethylcellulose, hydroxypropylmethylcellulose, sodium alginate, polyvinylpyrrolidone, gum tragacanth and gum acacia, and dispersing or wetting agents such as a naturally occurring phosphatide (e.g., lecithin), a condensation product of an alkylene oxide with a fatty acid (e.g., polyoxyethylene stearate), a condensation product of ethylene oxide with a long chain aliphatic alcohol (e.g., heptadecaethylene oxycetanol), a condensation product of ethylene oxide with a partial ester derived from a fatty acid and a hexitol (e.g., polyoxyethylene sorbitol mono-oleate), or a condensation product of ethylene oxide with a partial ester derived from fatty acid and a hexitol anhydride (e.g., polyoxyethylene sorbitan mono-oleate). The aqueous suspension can also contain one or more preservatives such as ethyl or n-propyl p-hydroxybenzoate, one or more coloring agents, one or more flavoring agents and one or more sweetening agents, such as sucrose, aspartame or saccharin. Formulations can be adjusted for osmolarity.

[0130] Also included are solid form preparations, which are intended to be converted, shortly before use, to liquid form preparations for oral administration. Such liquid forms include solutions, suspensions, and emulsions. These preparations may contain, in addition to the active component, colorants, flavors, stabilizers, buffers, artificial and natural sweeteners, dispersants, thickeners, solubilizing agents, and the like.

[0131] Oil suspensions can be formulated by suspending the anti-inflammatory glucocorticosteroid and/or the GR modulator of Formula I in a vegetable oil, such as arachis oil, olive oil, sesame oil or coconut oil, or in a mineral oil such as liquid paraffin; or a mixture of these. The oil suspensions can contain a thickening agent, such as beeswax, hard paraffin or cetyl alcohol. Sweetening agents can be added to provide a palatable oral preparation, such as glycerol, sorbitol or sucrose. These formulations can be preserved by the addition of an antioxidant such as ascorbic acid. As an example of an injectable oil vehicle, see Minto, J. Pharmacol. Exp. Ther. 281:93-102, 1997. The pharmaceutical formulations of the invention can also be in the form of oil-in-water emulsions. The oily phase can be a vegetable oil or a mineral oil, described above, or a mixture of these. Suitable emulsifying agents include naturally-occurring gums, such as gum acacia and gum tragacanth, naturally occurring phosphatides, such as soybean lecithin, esters or partial esters derived from fatty acids and hexitol anhydrides, such as sorbitan mono-oleate, and condensation products of these partial esters with ethylene oxide, such as polyoxyethylene sorbitan mono-oleate. The emulsion can also contain sweetening agents and flavoring agents, as in the formulation of syrups and elixirs. Such formulations can also contain a demulcent, a preservative, or a coloring agent.

[0132] The compositions of the present invention can also be delivered as microspheres for slow release in the body. For example, microspheres can be formulated for administration via intradermal injection of drug-containing microspheres, which slowly release subcutaneously (see Rao, J. Biomater Sci. Polym. Ed. 7:623-645, 1995); as biodegradable and injectable gel formulations (see, e.g., Gao Pharm. Res. 12:857-863, 1995); or, as microspheres for oral administration (see, e.g., Eyles, J. Pharm. Pharmacol. 49:669-674, 1997). Both transdermal and intradermal routes afford constant delivery for weeks or months.

[0133] In another embodiment, the compositions of the present invention can be formulated for parenteral administration, such as intravenous (IV) administration or administration into a body cavity or lumen of an organ. The formulations for administration will commonly comprise a solution of the compositions of the present invention dissolved in a pharmaceutically acceptable carrier. Among the acceptable vehicles and solvents that can be employed are water and Ringer's solution, an isotonic sodium chloride. In addition, sterile fixed oils can conventionally be employed as a solvent or suspending medium. For this purpose any bland fixed oil can be employed including synthetic mono- or diglycerides. In addition, fatty acids such as oleic acid can likewise be used in the preparation of injectables. These solutions are sterile and generally free of undesirable matter. These formulations may be sterilized by conventional, well known sterilization techniques. The formulations may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions such as pH adjusting and buffering agents, toxicity adjusting agents, e.g., sodium acetate, sodium chloride, potassium chloride, calcium chloride, sodium lactate and the like. The concentration of the compositions of the present invention in these formulations can vary widely, and will be selected primarily based on fluid volumes, viscosities, body weight, and the like, in accordance with the particular mode of administration selected and the patient's needs. For IV administration, the formulation can be a sterile injectable preparation, such as a sterile injectable aqueous or oleaginous suspension. This suspension can be formulated according to the known art using those suitable dispersing or wetting agents and suspending agents. The sterile injectable preparation can also be a sterile injectable solution or suspension in a nontoxic parenterally-acceptable diluent or solvent, such as a solution of 1,3-butanediol.

[0134] In another embodiment, the formulations of the compositions of the present invention can be delivered by the use of liposomes which fuse with the cellular membrane or are endocytosed, i.e., by employing ligands attached to the liposome, or attached directly to the oligonucleotide, that bind to surface membrane protein receptors of the cell resulting in endocytosis. By using liposomes, particularly where the liposome surface carries ligands specific for target cells, or are otherwise preferentially directed to a specific organ, one can focus the delivery of the compositions of the present invention into the target cells in vivo. (See, e.g., Al-Muhammed, J. Microencapsul. 13:293-306, 1996; Chonn, Curr. Opin. Biotechnol. 6:698-708, 1995; Ostro, Am. J. Hosp. Pharm. 46:1576-1587, 1989).

EXAMPLES

Example 1: Discovery and Characterization of Novel Diterpene Synthase

Materials and Methods

[0135] Plant material. One-year-old P. amabilis saplings were obtained from the Camellia Forest nursery (www.camforest.com). Seeds of Nicotiana benthamiana were collected from mature plants. Plants were grown in Conviron TCR120 growth chambers (www.conviron.com) under a photoperiod of 16 h, 60% relative humidity, 100 µmol m-2 s-1 light intensity, and a day/night temperature cycle of 21/18°C.

[0136] Gene discovery and cDNA cloning. The root-specific transcriptome of P. amabilis was described previously (11). Transcripts of interest were selected by querying the transcriptomes against a curated TPS database and subsequent phylogenetic analysis. Relative transcript abundance was calculated by mapping adapter-trimmed Illumina reads against the assembled transcripts using BWA version 0.5.9-r16. Reads were mapped as paired with 350 bp maximum insert size. Selected cDNAs were amplified from total RNA with gene-specific oligonucleotides (Table 3) and ligated into the pJET vector (www.Clontech.com) for sequence verification.

[0137] In vitro enzyme analysis. A truncated form, lacking the plastidial transit peptide, of PxaTPS8 (Δ26) and the full-length PxaTPS5 cDNA were subcloned into the pET28b vector (www.emdmillipore.com) and expressed in E. coli BL21DE3-C41 cells, followed by Ni2+-affinity purification and enzyme assays as previously described (11). For activity assays, 50 µg of recombinant protein and 15 µM of GPP, FPP or GGPP (www.sigma.com) were mixed in 50 mM HEPES (pH 7.2), 7.5 mM MgCl2, 5% (v/v) glycerol, 5 mM DTT, and incubated at 30°C for 1 h. Products were extracted with 500 µl pentane and analyzed by GC/MS.
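As a bench aid, the final assay concentrations listed above can be converted into pipetting volumes using C1·V1 = C2·V2. The following is a minimal sketch only: the 100 µl reaction volume and every stock concentration are hypothetical values chosen for illustration and are not stated in the protocol; only the final concentrations are taken from the text.

```python
# Final concentrations are from the assay description above.
# Stock concentrations and the reaction volume are ASSUMED for illustration.
FINAL = {
    "HEPES pH 7.2 (mM)": 50.0,
    "MgCl2 (mM)": 7.5,
    "glycerol (% v/v)": 5.0,
    "DTT (mM)": 5.0,
    "prenyl diphosphate substrate (uM)": 15.0,
}
STOCK = {  # hypothetical stock solutions, same units as FINAL
    "HEPES pH 7.2 (mM)": 1000.0,
    "MgCl2 (mM)": 1000.0,
    "glycerol (% v/v)": 50.0,
    "DTT (mM)": 1000.0,
    "prenyl diphosphate substrate (uM)": 1000.0,
}


def stock_volumes(final, stock, reaction_ul):
    """Return the volume (ul) of each stock needed, using C1*V1 = C2*V2."""
    volumes = {}
    for name, c_final in final.items():
        c_stock = stock[name]
        if c_stock <= c_final:
            raise ValueError(f"stock for {name} is not more concentrated than the target")
        volumes[name] = c_final * reaction_ul / c_stock
    return volumes


def remaining_volume(volumes, reaction_ul):
    """Volume (ul) left for the protein solution plus water."""
    rest = reaction_ul - sum(volumes.values())
    if rest < 0:
        raise ValueError("stock volumes exceed the reaction volume")
    return rest
```

With these assumed stocks, `stock_volumes(FINAL, STOCK, 100.0)` gives the volume of each component, and `remaining_volume` gives what is left for the 50 µg of enzyme and water.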

[0138] Expression in Nicotiana benthamiana. Full-length constructs of PxaTPS5 and PxaTPS8 were cloned into the pLIFE33 vector and transformed into A. tumefaciens strain GV3101 as previously described (11). Cultures containing the PxaTPS5 or PxaTPS8 constructs were mixed with one culture volume of the RNA silencing suppressor construct p19 (42), and pressure infiltrated into the abaxial side of the leaves of 6-week-old N. benthamiana plants. Expression of p19 only served as a control. After five days, metabolites were extracted from a single infected leaf and analyzed by GC/MS.

[0139] Sibongilene formation in yeast. The truncated PxaTPS8(Δ26) was cloned into multiple cloning site 1 of the pESC-HIS:BTS1 plasmid for co-expression with the yeast endogenous GGPP synthase BTS1 (34). Plasmids were transformed into the S. cerevisiae strain AM94 (35) and cells grown in 1 L selective dropout medium (-His, -Leu, with 2% dextrose) at 30°C and 250 rpm until an OD600 of ~0.6. Cells were transferred into 1 L of YEP medium with 2% galactose for induction. After 41 h, the diterpene product was extracted from harvested cells by vortexing with glass beads in 5 ml of diethyl ether and separated on a silica matrix using 95:5 (v/v) hexane:ethyl acetate.

[0140] GC/MS analysis. Terpenes were analyzed on an Agilent 7890B GC interfaced with a 5977 Extractor XL MSD at 70 eV and 1.2 ml min-1 He flow, using a HP5-ms column (30 m, 250 µm i.d., 0.25 µm film). GC parameters: 40-50°C for 1-2 min, 10-20°C min-1 to 300°C, hold 3 min; pulsed splitless injection at 250°C.

[0141] NMR analysis. Sibongilene was produced for NMR analysis using an E. coli system engineered for terpene formation (28). PxaTPS8 expression was conducted in 2 L Terrific Broth (TB, pH 7.0) medium at 16°C, with induction using 1 mM IPTG and 2 mM MgCl2. After 72 h, the sibongilene was extracted with 1 L hexane and purified on silica matrix (70-230 mesh size) using hexane. Proton, 13C and HSQC spectra were acquired in deuterated chloroform on a Bruker 800 MHz Avance III spectrometer equipped with a Bruker 5 mm CPTCI cryoprobe.

[0142] Quantum chemical calculations. NMR calculations were performed with Gaussian 09 using the B3LYP method (43-47). Single point energies and molecular geometries were calculated using the 6-31+G(d,p) basis set. Initial conformational searches were completed using Spartan 10 (48) and the Merck Molecular Force Field (MMFF94) (49). NMR shielding tensors were calculated with the Gauge-Independent Atomic Orbital (GIAO) method, with a 6-311+G(2d,p) basis set and the SMD continuum solvent model (50) for chloroform. Scaling factors for chemical shift predictions derive from the Chemical Shift Repository (cheshirenmr.info) and were used as described there. Probable correctness of the structure assignment was assessed via DP4 analysis (31). Mechanistic calculations were carried out at the mPW1PW91/6-31+G(d,p)//B3LYP/6-31+G(d,p) level (44-47,51), validated for carbocation reactions (29,32,37). Intrinsic reaction coordinate (IRC) calculations (52) verified the observed transition state structure to be connected to A and B. Structures were visualized in CYLview (www.cylview.org). This report is part 13 of our series on computational studies on diterpene-forming carbocation rearrangements (for part 12 see reference 53).
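The linear scaling of computed shieldings mentioned above can be sketched as follows. This assumes the usual convention in which slope and intercept come from a regression of computed isotropic shieldings against experimental shifts (shielding = slope × shift + intercept), so that the predicted shift is (intercept − shielding)/(−slope). The function name and the numbers in the usage note are placeholders, not the scaling factors actually used in this work.

```python
def scaled_shift(shielding_ppm, slope, intercept):
    """Convert a computed isotropic shielding (ppm) into a predicted chemical
    shift (ppm) by empirical linear scaling.

    Assumes slope and intercept were fitted as
        shielding = slope * shift + intercept
    for the same level of theory; the values must be supplied by the user.
    """
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (intercept - shielding_ppm) / (-slope)


def scale_all(shieldings_ppm, slope, intercept):
    """Apply the same scaling factors to every nucleus of one type (1H or 13C)."""
    return [scaled_shift(s, slope, intercept) for s in shieldings_ppm]
```

For example, with a placeholder slope of -1.0 and intercept of 30.0, a shielding of 25.0 ppm maps to a shift of 5.0 ppm. Separate slope and intercept pairs would be needed for 1H and 13C.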

[0143] Phylogenetic analysis. Protein sequence alignments were performed using clustalW2 and curated with Gblocks (54). A maximum likelihood phylogenetic tree was generated in PhyML (55) with 1000 bootstrap repetitions. Abbreviations and accession numbers for proteins used in the phylogenetic analysis are provided in Table 2.

[0144] Homology modeling and site-directed mutagenesis. Homology models of PxaTPS5 and PxaTPS8 were generated using SWISS-MODEL based on the crystal structure of Abies grandis α-bisabolene synthase (24, 3EAS) with stereochemical validation using Ramachandran plots. GGPP was docked in the active site using Molegro Virtual Docker (56). Protein variants were generated using site-specific sense and anti-sense oligonucleotides (Table 3) and the pET28b:PxaTPS8 construct as template. DpnI treatment removed template plasmids. All protein variants were sequence verified prior to analysis.

[0145] Accession numbers. Nucleotide sequences described in this study have been deposited in the National Center for Biotechnology Information (NCBI) GenBank/EBI Data Bank with the following accession numbers: PxaTPS5 (KU685114) and PxaTPS8 (KU685114).

Results

[0146] TPS in the golden larch root transcriptome. Pseudolaric acids are almost exclusively found in the roots of golden larch (5). We had previously developed a root-specific transcriptome resource (11), in which we found a total of 16 candidate TPS genes (Fig. 1). Of the five diTPSs (PxaTPS4, 10, 12, 15 and 16), with highest database matches to known diTPSs, four represented partial sequences putatively annotated as ent-kaurene synthase or class I/II diTPSs. The full length class I/II diTPS PxaTPS4 was biochemically confirmed as a levopimaradiene/abietadiene synthase (11).

[0147] None of the identified diTPS candidates matched the predicted class I diTPS hypothesized to be involved in PAB biosynthesis. We therefore investigated the remaining TPSs, whose top database matches were either mono- or sesqui-TPSs. The sequences of four transcripts (PxaTPS5-8) of particular interest resembled most closely 3-domain gymnosperm E-α-bisabolene synthases (BISs), a sesqui-TPS. Of these, PxaTPS8 was given priority for further characterization, due to its high transcript abundance in roots as identified by RNAseq-based transcript mapping (Fig. 1).

[0148] Sequence phylogeny places the 3-domain PxaTPS8 into an unusual position of the gymnosperm TPS family. The four BIS-like candidates identified in the golden larch root transcriptome resembled typical 3-domain class I TPSs, lacking a class II active site and featuring the DDxxD and NSE/DTE class I catalytic motifs (24). A 3-domain structure is typical for gymnosperm TPSs of the TPS-d3 clade. As expected, our sequence phylogeny placed golden larch PxaTPS5, 6 and 7 closely with known BISs from grand fir (Abies grandis), Douglas fir (Pseudotsuga menziesii), and Norway spruce (Picea abies) within the gymnosperm TPS-d3 clade (Fig. 2). Surprisingly, despite its 3-domain architecture, PxaTPS8 emerged outside the TPS-d3 clade as a distant branch at the base of the TPS-d1 clade, which contains known 2-domain mono- and sesqui-TPSs. The phylogenetic placement of a 3-domain TPS with the TPS-d1 clade is unprecedented, and suggested an unusual evolutionary path of sequence divergence leading to PxaTPS8 in golden larch. The unusual pairing of a 3-domain structure and TPS-d1 clade association prohibited a functional prediction of PxaTPS8 based on similarity with any known gymnosperm mono-, sesqui- or diTPS. However, these features also made it a prime candidate for a potentially unusual function.

[0149] PxaTPS8 produces a novel diterpene. We performed in vitro enzyme assays with PxaTPS8 expressed in E. coli and in vivo assays using transient Agrobacterium-mediated expression in Nicotiana benthamiana. For comparison we also functionally characterized PxaTPS5, which represents a TPS-d3 BIS-like enzyme (Fig. 2). In vivo activity assays verified PxaTPS5 as an α-BIS, as based on comparison to product reference mass spectra of known α-BISs from grand fir and Norway spruce (20,27) (Fig. 3A). In contrast, in vivo assays of PxaTPS8 revealed as a single product a previously unknown diterpene with a fragmentation pattern featuring dominant ions of m/z 93(100), 121(25), 147(31), and 216(57) (Fig. 3B). In vitro assays using affinity-purified recombinant proteins confirmed the activity of PxaTPS5 and PxaTPS8 (Fig. 7). PxaTPS8 was active only with GGPP as a substrate, while no product formation was detected in assays with geranyl diphosphate (GPP) or farnesyl diphosphate (FPP) (Fig. 7).

[0150] Structural elucidation of the product of PxaTPS8 as sibongilene. Use of an engineered E. coli expression system (28) yielded sufficient amounts and purity of the PxaTPS8 product to enable 1D and 2D nuclear magnetic resonance (NMR) analysis, which identified a novel diterpene structure, termed sibongilene (Fig. 3C). In addition, we performed quantum chemical calculations of 1H and 13C NMR chemical shifts for several sibongilene isomers and all possible diastereomers of each (29,30) (48 structures total; Fig. 8). Mean absolute deviations (MAD) from experimental 1H and 13C NMR data well within accepted ranges for correct structural assignments (30) were found for the isomer of sibongilene shown in Fig. 3C (as low as 0.24 ppm for 1H and 2.7 ppm for 13C). To further verify our structural proposal, we performed DP4 statistical analyses (31) of computed and experimental NMR data, which indicated a >95% probability (combined 1H and 13C) for the depicted isomer to be correct (Fig. 8). While we are confident in the structural connectivity shown, the relative stereochemical assignment remains to be confirmed, due to the highly complex conformational sampling for the various possible diastereomers of sibongilene. Further work is underway to address this issue (for sibongilene and other flexible natural products) and will be reported in due course. These studies demonstrate PxaTPS8 as a new diTPS expanding the catalytic diversity of the enzyme family, due to its capacity for transforming GGPP into a 5,7-trans-fused bicyclic scaffold that represents the characteristic core structure of pseudolaric acids (5).
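The mean absolute deviation used above to compare candidate structures can be illustrated with a short sketch. It assumes that computed and experimental shifts have already been paired nucleus by nucleus; the function names are illustrative and no real sibongilene data are included.

```python
def mean_absolute_deviation(computed_ppm, experimental_ppm):
    """MAD (ppm) between paired computed and experimental chemical shifts.

    Both sequences must list the same nuclei in the same order.
    """
    if len(computed_ppm) != len(experimental_ppm):
        raise ValueError("shift lists must have the same length")
    if not computed_ppm:
        raise ValueError("shift lists must not be empty")
    total = sum(abs(c - e) for c, e in zip(computed_ppm, experimental_ppm))
    return total / len(computed_ppm)


def rank_candidates(candidates, experimental_ppm):
    """Sort candidate structures by increasing MAD.

    `candidates` maps a structure label to its list of computed shifts.
    Returns a list of (label, mad) tuples, best match first.
    """
    scored = [
        (label, mean_absolute_deviation(shifts, experimental_ppm))
        for label, shifts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])
```

A low MAD on its own does not prove an assignment; in the text it is used together with the DP4 probability, which this sketch does not implement.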

[0151] Active site determinants of PxaTPS8 function. We conducted homology modeling of PxaTPS8 based on the crystal structure of grand fir α-BIS (AgBIS) (24) and molecular docking of GGPP into the class I active site cavity to probe catalytic residues that determine the enzyme's unique activity (Fig. 4). As common features, residues of the Mg2+-coordinating DDxxD and NSE/DTE motifs and three arginines (R558, R560, R736) previously shown as essential for substrate binding and catalysis in AgBIS and other class I TPSs are conserved in PxaTPS8. Consistently, alanine substitution of R558 abolished catalytic activity (Fig. 4). However, the low level of overall protein sequence identity of 32-38% between the class I active sites of PxaTPS8 and previously characterized gymnosperm TPSs illustrated a large degree of evolutionary divergence (Fig. 9). Notably, this included substitutions at active site positions known to impact class I TPS product specificity (24-26) (Fig. 4). Specifically, PxaTPS8 A701, located at the hinge region on helix G, and Y564, positioned at the back of the active site cavity in direct proximity to the hydrocarbon tail of the GGPP substrate, appeared to be unique compared to known gymnosperm TPSs (Fig. 4). In addition, S696 and G697 are located at the hinge region, which has been highlighted as a 'hot spot' for directing product outcome in gymnosperm TPSs (24-26). Substitution of PxaTPS8 S696, G697 and A701 for different functional residues prominent in other gymnosperm TPSs led to a decrease (S696I, S696V, G697S and A701C) or complete loss (A701L) of activity (Fig. 4). None of the mutations resulted in changes in product or substrate specificity. In addition, substitution of Y564 for Thr or Val resulted in a loss of function, while exchange to Phe merely entailed a modest decrease in activity. This illustrates the importance of the aromatic ring in controlling sibongilene formation, likely via carbocation stabilization to enable alkyl migration prior to deprotonation of the carbocation. These results support a function of S696, G697, A701 and Y564 in PxaTPS8 catalysis.

[0152] Quantum chemical calculations reveal a unique reaction mechanism in sibongilene formation. We used quantum chemical calculations (mPW1PW91/6-31+G(d,p)//B3LYP/6-31+G(d,p); 30,32) to assess the viability of carbocation cyclization and rearrangement mechanisms for sibongilene formation (see Fig. 8 for computed geometries and additional details; a single diastereomer is discussed here, but similar results were found for other diastereomers). According to these calculations, the first carbocation intermediate (A) results from initial 1,6-cyclization (Fig. 5). The subsequent 1,2-alkyl shift and 6,10-cyclization occur in a single step (33) to form a carbocation (B) with the characteristic 5,7-fused bicyclic scaffold. The product of this step has a calculated energy 3.4 kcal/mol higher than A, and is formed with an activation energy of 13.6 kcal/mol. Deprotonation of B would form the final product (sibongilene).
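To give a feel for what a barrier of this size means kinetically, the Eyring equation, k = (kB·T/h)·exp(−ΔG‡/RT), can be evaluated as in the sketch below. This is an illustration only: it treats the computed activation energy as if it were a free energy of activation and assumes 298.15 K and a transmission coefficient of 1, none of which is stated in the text. The reverse barrier follows from the reported numbers (13.6 − 3.4 = 10.2 kcal/mol).

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)

FORWARD_BARRIER = 13.6   # A -> B activation energy from the text, kcal/mol
B_MINUS_A = 3.4          # energy of B relative to A from the text, kcal/mol
REVERSE_BARRIER = FORWARD_BARRIER - B_MINUS_A  # B -> A, derived


def eyring_rate(barrier_kcal_mol, temperature_k=298.15):
    """First-order rate constant (s^-1) from the Eyring equation.

    ASSUMPTION: the barrier is used as a free energy of activation and the
    transmission coefficient is 1.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kcal_mol / (R_KCAL * temperature_k))
```

Under these assumptions the forward step comes out on the order of hundreds of events per second at room temperature, which is consistent with a reaction that proceeds readily once intermediate A is formed.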

[0153] Formation of sibongilene in engineered yeast. To develop a proof-of-concept production platform for sibongilene, we co-transformed PxaTPS8 with the yeast (Saccharomyces cerevisiae) GGPP synthase BTS1 (34) in the engineered yeast strain AM94 that provides elevated terpene precursor yield (35). Sibongilene was abundant solely in cell pellets after induction with galactose for 41 h, yielding a product amount of 1 mg per L culture (Fig. 10). Dephosphorylated GGPP and squalene as major by-products after diethyl ether extraction could be readily removed by simple chromatography on silica matrix to afford sibongilene in greater than 90% purity. These findings outline a promising foundation for a microbial production system for sibongilene to enable efficient discovery of downstream pathway components and related diterpene structures.

Discussion

[0154] The vast chemical space of diterpene natural products, which originate from the natural variation of diTPS enzymes, provides a rich repertoire of known and potentially new pharmaceutical lead compounds. Exploring diterpene chemical diversity continues to be important as drug discovery of recent decades has been falling short of meeting the demand for new and improved therapies (36). Currently, only a few plant-derived diterpene pharmaceuticals, such as taxol and forskolin, are available at industrially relevant scale (2,4), due to the often low abundance of diterpenes in the natural source material or uneconomic chemical synthesis.

[0155] Pseudolaric acids from the traditional Chinese medicinal tree golden larch have been recognized for their chemotherapeutic potential and prevention of multidrug resistance (6-9). The discovery of golden larch PxaTPS8 presents a unique new catalyst among the diTPS portfolio with applications for biotechnological production of pseudolaric acids.

[0156] Biochemical and quantum chemical mechanistic insights into the PxaTPS8-facilitated formation of sibongilene substantially expand our knowledge of the mechanistic and evolutionary underpinnings of diterpene chemical diversity. The conversion of GGPP to sibongilene (Fig. 5) exemplifies an exceedingly short mechanism that underscores nature's ability to take advantage of inherent carbocation reactivity to generate a structurally complex product without much worry of diversion to other products that may arise if more discrete intermediates were involved (32,33,37). This hypothesis is further supported by the observation that substitution of select active site residues known to alter product specificity in other diTPSs only entailed a reduced activity but no functional change in PxaTPS8 (Fig. 5).

[0157] Prior to this work, knowledge of secondary diterpene metabolism in gymnosperms included the bifunctional class I/II diTPSs involved in the biosynthesis of labdane diterpenes and derived diterpene resin acids of conifer chemical defense (25). In addition, only a few monofunctional class I diTPSs involved in diterpene resin acid biosynthesis in pine and taxane formation in species of yew had been described (2,4; Fig. 6). All of these represent 3-domain enzymes of the TPS-d3 clade (Fig. 2). Surprisingly, PxaTPS8, which marks a diTPS catalyzing a new reaction mechanism en route to a unique diterpene, sibongilene, was identified as the first 3-domain diTPS at the base of the TPS-d1 clade (Fig. 2 & 3). The 3-domain structure of PxaTPS8 suggests a common origin with BIS enzymes. It appears that BIS-like genes in golden larch, of which there are at least four different members, may have undergone more substantial evolution with regard to gene number and functions compared to the corresponding, apparently single copy BIS genes in other gymnosperm species (20,27,38,39). This diversification may have resulted in the unique pseudolaric acid biosynthesis in the golden larch tree as the sole species of its genus. The sibongilene scaffold may be unique to diterpenes of golden larch, but structurally similar sphenolobane and tormesane diterpenes have been described in liverworts of the genus Anastrophyllum (40) and the eudicot Halimium viscosum (Cistaceae; 41). However, diTPSs of these species are not known.

[0158] Considering the low abundance of PAB in the roots of golden larch (11), metabolic engineering of PxaTPS8 into a microbial or plant host paves the way to sibongilene production for semi-synthesis of PAB and related compounds. Such a recombinant system also provides a superb platform for accelerating the discovery of downstream enzymes of PAB biosynthesis.

References

1. De Luca V, Salim V, Atsumi SM, Yu F (2012) Mining the biodiversity of plants: a revolution in the making. Science 336(6089):1658-61.

2. Zerbe P, Bohlmann J (2015) Plant diterpene synthases: exploring modularity and metabolic diversity for bioengineering. Trends Biotechnol 33(7):419-28.

3. Tholl D (2015) Biosynthesis and biological functions of terpenoids in plants. Adv Biochem Eng Biotechnol 148:63-10.

4. Bohlmann J, Keeling CI (2008) Terpenoid biomaterials. Plant J 54(4):656-69.

5. Chiu P, Leung LT, Ko BCB (2010) Pseudolaric acids: isolation, bioactivity and synthetic studies. Nat Prod Rep 27(7):1066-83.

6. Wong VKW et al. (2005) Pseudolaric acid B, a novel microtubule-destabilizing agent that circumvents multidrug resistance phenotype and exhibits antitumor activity in vivo. Clin Cancer Res Off J Am Assoc Cancer Res 11(16):6002-11.

7. Li M, Hong L (2015) Pseudolaric acid B exerts antitumor activity via suppression of the Akt signaling pathway in HeLa cervical cancer cells. Mol Med Rep 12(2):2021-6.

8. Sun Q, Li Y (2014) The inhibitory effect of pseudolaric acid B on gastric cancer and multidrug resistance via Cox-2/PKC-α/P-gp pathway. PloS One 9(9):e107830.

9. Sarkar T et al. (2012) Interaction of pseudolaric acid B with the colchicine site of tubulin. Biochem Pharmacol 84(4):444-50.

10. Trost BM, Waser J, Meyer A (2008) Total synthesis of (-)-pseudolaric acid B. J Am Chem Soc 130(48):16424-34.

11. Zerbe P et al. (2013) Gene discovery of modular diterpene metabolism in nonmodel systems. Plant Physiol 162(2):1073-91.

12. Chen F, Tholl D, Bohlmann J, Pichersky E (2011) The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant J 66(1):212-29.

13. Peters RJ (2010) Two rings in them all: the labdane-related diterpenoids. Nat Prod Rep 27(11):1521-30.

14. Zhou K et al. (2012) Insights into diterpene cyclization from structure of bifunctional abietadiene synthase from Abies grandis. J Biol Chem 287(9):6840-50.

15. Koksal M, .lin Y, Coates RM, Croteau R, Christianson DW (2011) Taxadiene synthase structure and evolution of modular architecture in terpene biosynthesis. Naliire 769(7328):! 16-20. 16. Gao Y, Honzatko RB, Peters R.1 (2012) Terpenoid synthase structures: a so far incomplete view of complex catalysis. Nat Prod Rep 29(10):1153-75.

17. Schepmann HG, Pang J, Matsuda SP (2001) Cloning and characterization of Ginkgo biloba levopimaradiene synthase which catalyzes the first committed step in ginkgolide biosynthesis. Arch Biochem Biophys 392(2):263-9.

18. Zerbe P et al. (2012) Bifunctional cis-abienol synthase from Abies balsamea discovered by transcriptome sequencing and its implications for diterpenoid fragrance production. J Biol Chem 287(15):12121-31.

19. Peters RJ et al. (2000) Abietadiene synthase from grand fir (Abies grandis): characterization and mechanism of action of the "pseudomature" recombinant enzyme. Biochemistry 39(50):15592-602.

20. Martin DM, Fäldt J, Bohlmann J (2004) Functional characterization of nine Norway Spruce TPS genes and evolution of gymnosperm terpene synthases of the TPS-d subfamily. Plant Physiol 135(4):1908-27.

21. Keeling CI, Madilao LL, Zerbe P, Dullat HK, Bohlmann J (2011) The primary diterpene synthase products of Picea abies levopimaradiene/abietadiene synthase (PaLAS) are epimers of a thermally unstable diterpenol. J Biol Chem 286(24):21145-53.

22. Hall DE et al. (2013) Evolution of conifer diterpene synthases: diterpene resin acid biosynthesis in lodgepole pine and jack pine involves monofunctional and bifunctional diterpene synthases. Plant Physiol 161(2):600-16.

23. Wildung MR, Croteau R (1996) A cDNA clone for taxadiene synthase, the diterpene cyclase that catalyzes the committed step of taxol biosynthesis. J Biol Chem 271(16):9201-4.

24. McAndrew RP et al. (2011) Structure of a three-domain sesquiterpene synthase: a prospective target for advanced biofuels production. Structure 19(12):1876-84.

25. Keeling CI, Weisshaar S, Lin RPC, Bohlmann J (2008) Functional plasticity of paralogous diterpene synthases involved in conifer defense. Proc Natl Acad Sci 105(3):1085-90.

26. Peters RJ, Croteau RB (2002) Abietadiene synthase catalysis: mutational analysis of a prenyl diphosphate ionization-initiated cyclization and rearrangement. Proc Natl Acad Sci 99(2):580-4.

27. Bohlmann J, Crock J, Jetter R, Croteau R (1998) Terpenoid-based defenses in conifers: cDNA cloning, characterization, and functional expression of wound-inducible (E)-α-bisabolene synthase from grand fir (Abies grandis). Proc Natl Acad Sci 95(12):6756-61.

28. Morrone D et al. (2010) Increasing diterpene yield with a modular metabolic engineering system in E. coli: comparison of MEV and MEP isoprenoid precursor pathway engineering. Appl Microbiol Biotechnol 85(6):1893-906.

29. Vaughan MM et al. (2013) Formation of the unusual semivolatile diterpene rhizathalene by the Arabidopsis class I terpene synthase TPS08 in the root stele is involved in defense against belowground herbivory. The Plant Cell 25:1108-25.

30. Lodewyk MW, Siebert MR, Tantillo DJ (2012) Computational prediction of 1H and 13C chemical shifts: A useful tool for natural product, mechanistic, and synthetic organic chemistry. Chem Rev 112(3):1839-62.

31. Smith SG, Goodman JM (2010) Assigning stereochemistry to single diastereomers by GIAO NMR calculation: The DP4 probability. J Am Chem Soc 132:12946-59.

32. Tantillo DJ (2013) Walking in the woods with quantum chemistry - applications of quantum chemical calculations in natural products research. Nat Prod Rep 30(8):1079-86.

33. Tantillo DJ (2010) The carbocation continuum in terpene biosynthesis - where are the secondary cations? Chem Soc Rev 39(8):2847-54.

34. Ro DK, Bohlmann J (2006) Diterpene resin acid biosynthesis in loblolly pine (Pinus taeda): functional characterization of abietadiene-levopimaradiene synthase (PtTPS-LAS) cDNA and subcellular targeting of PtTPS-LAS and abietadienol/abietadienal oxidase (PtAO, CYP720B1). Phytochemistry 67(15):1572-8.

35. Ignea C et al. (2015) Efficient diterpene production in yeast by engineering Erg20p into a geranylgeranyl diphosphate synthase. Metab Eng 27:65-7.

36. Scannell JW, Blanckley A, Boldon H, Warrington B (2012) Diagnosing the decline in pharmaceutical R&D efficiency. Nat Rev Drug Discov 11:191-200.

37. Tantillo DJ (2011) Biosynthesis via carbocations: theoretical studies on terpene formation. Nat Prod Rep 28(6):1035-53.

38. Parveen I et al. (2015) Investigating sesquiterpene biosynthesis in Ginkgo biloba: molecular cloning and functional characterization of (E,E)-farnesol and α-bisabolene synthases. Plant Mol Biol 89(4-5):451-62.

39. Huber DP et al. (2005) Characterization of four terpene synthase cDNAs from methyl jasmonate-induced Douglas-fir, Pseudotsuga menziesii. Phytochemistry 66(12):1427-39.

40. Buchanan MS, Connolly JD, Rycroft DS (1996) Sphenolobane diterpenoids of the liverwort Anastrophyllum donnianum. Phytochemistry 43(6):1297-301.

41. Urones JG, Marcos IS, Garrido MD (1990) Tormesane derivatives of Halimium viscosum. Phytochemistry 29(10):3243-46.

42. Voinnet O, Rivas S, Mestre P, Baulcombe D (2003) An enhanced transient expression system in plants based on suppression of gene silencing by the p19 protein of tomato bushy stunt virus. Plant J 33(5):949-56.

43. Frisch MJ et al. (2009) Gaussian 09, Revision B.01, Gaussian, Inc., Wallingford CT.

44. Becke AD (1993) Density-functional thermochemistry. III. The role of exact exchange. J Chem Phys 98:5648.

45. Becke AD (1993) A new mixing of Hartree-Fock and local density-functional theories. J Chem Phys 98:1372.

46. Lee C, Yang W, Parr RG (1988) Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys Rev B 37(2):785-9.

47. Tirado-Rives J, Jorgensen WL (2008) Performance of B3LYP density functional methods for a large set of organic molecules. J Chem Theory Comput 4:297-306.

48. Shao Y et al. (2006) Advances in methods and algorithms in a modern quantum chemistry program package. Phys Chem Chem Phys 8:3172-91.

49. Halgren TA (1996) Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J Comp Chem 17:490-519.

50. Marenich AV, Cramer CJ, Truhlar DG (2009) Universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions. J Phys Chem B 113:6378-96.

51. Matsuda SPT, Wilson WK, Xiong Q (2006) Mechanistic insights into triterpene synthesis from quantum mechanical calculations. Detection of systematic errors in B3LYP cyclization energies. Org Biomol Chem 4:530-43.

52. Maeda S et al. (2015) Intrinsic reaction coordinate: calculation, bifurcation, and automated search. Int J Quantum Chem 115:258-69.

53. Potter KC et al. (2016) Blocking Deprotonation with Retention of Aromaticity in a Plant ent-Copalyl Diphosphate Synthase Leads to Product Rearrangement. Angew Chem Int Ed Engl 55(2):634-8.

54. Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56(4):564-77.

55. Guindon S et al. (2009) Estimating maximum likelihood phylogenies with PhyML. Methods Mol Biol 537:113-37.

56. Thomsen R, Christensen MH (2006) MolDock: a new technique for high-accuracy molecular docking. J Med Chem 49(11):3315-21.

INFORMAL SEQUENCE LISTING

[0159] SEQ ID NO:1: Sequence of PxaTPS8 polypeptide

MSRFTSATHGLNLSIKMPISVSQVPSIRSNTSKYELQKLRSTGRSVLQTRRQLAIINMTK