

Title:
BIOSYNTHETIC GENES AND POLYPEPTIDES
Document Type and Number:
WIPO Patent Application WO/2020/249698
Kind Code:
A1
Abstract:
The present invention relates to newly characterised plant genes and polypeptides which have utility in engineering or modifying limonoid or proto-limonoid production in host cells. The invention further relates to systems, methods and products employing the same.

Inventors:
OSBOURN ANNE (GB)
HODGSON HANNAH (GB)
Application Number:
PCT/EP2020/066241
Publication Date:
December 17, 2020
Filing Date:
June 11, 2020
Assignee:
PLANT BIOSCIENCE LTD (GB)
International Classes:
C07K14/415; A01N65/36; C07D493/18; C12N9/02; C12N9/88; C12N15/81; C12N15/82
Domestic Patent References:
WO2009087391A1 (2009-07-16)
WO2007135480A1 (2007-11-29)
WO1995034668A2 (1995-12-21)
WO1992001047A1 (1992-01-23)
Foreign References:
JP2005052009A (2005-03-03)
EP0194809A1 (1986-09-17)
US5231020A (1993-07-27)
Other References:
AVINASH PANDREKA ET AL.: "De novo Sequencing and Analysis of Transcriptome from Azadirachta indica to Characterize the Genes Involved in Limonoid Biosynthesis", PHD THESIS, 1 May 2018 (2018-05-01), pages 217 - 219, 227-, XP055718318, Retrieved from the Internet [retrieved on 20200727]
DATABASE UniProt [online] 3 April 2013 (2013-04-03), "RecName: Full=Terpene cyclase/mutase family member {ECO:0000256|RuleBase:RU362003}; EC=5.4.99.- {ECO:0000256|RuleBase:RU362003};", XP055726771, retrieved from EBI accession no. UNIPROT:L7WI23 Database accession no. L7WI23
DATABASE UniProt [online] 3 September 2014 (2014-09-03), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:KDO78504.1};", XP055726772, retrieved from EBI accession no. UNIPROT:A0A067GFT7 Database accession no. A0A067GFT7
DATABASE NCBI [online] ANONYMOUS: "Premnaspirodiene oxygenase-like [Citrus sinensis]", XP055719411, Database accession no. XP_006469495
DATABASE UniProt [online] 3 September 2014 (2014-09-03), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:KDO44741.1};", XP055726773, retrieved from EBI accession no. UNIPROT:A0A067E1K2 Database accession no. A0A067E1K2
DATABASE UniProt [online] 28 February 2018 (2018-02-28), "SubName: Full=Uncharacterized protein {ECO:0000313|EMBL:GAY33000.1};", XP055726774, retrieved from EBI accession no. UNIPROT:A0A2H5MYA2 Database accession no. A0A2H5MYA2
HODGSON, HANNAH ET AL.: "Identification of key enzymes responsible for protolimonoid biosynthesis in plants: Opening the door to azadirachtin production", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 116, no. 34, 2019, pages 17096 - 17104
WEISSBACH, WEISSBACH: "Molecular Cloning: a Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS
ZHANG ET AL., THE PLANT CELL, vol. 4, 1992, pages 1575 - 1588
GUERINEAU, MULLINEAUX: "Plant Molecular Biology Labfax", 1993, SCIENTIFIC PUBLISHERS, article "Plant transformation and expression vectors", pages: 121 - 148
FRISCH, D. A., L. W. HARRIS-HALLER ET AL.: "Complete Sequence of the binary vector Bin 19", PLANT MOLECULAR BIOLOGY, vol. 27, 1995, pages 405 - 409, XP000654452, DOI: 10.1007/BF00020193
MARSHALL, HODGSON, NATURE BIOTECHNOLOGY, vol. 16, 1998, pages 177 - 180
HALDRUP ET AL., PLANT MOLECULAR BIOLOGY, vol. 37, 1998, pages 287 - 296
GROTEWOLD ET AL.: "Engineering Secondary Metabolites in Maize Cells by Ectopic Expression of Transcription Factors", PLANT CELL, vol. 10, 1998, pages 721 - 740, XP002145082, DOI: 10.1105/tpc.10.5.721
VASIL ET AL.: "Laboratory Procedures and Their Applications", vol. I, II and III, 1984, ACADEMIC PRESS, article "Cell Culture and Somatic Cell Genetics of Plants"
SMITH ET AL., NATURE, vol. 334, 1988, pages 724 - 726
ENGLISH ET AL., THE PLANT CELL, vol. 8, 1996, pages 179 - 188
BOURQUE, PLANT SCIENCE, vol. 105, 1995, pages 125 - 149
FLAVELL, PNAS USA, vol. 91, 1994, pages 3490 - 3496
VAN DER KROL ET AL., THE PLANT CELL, vol. 2, 1990, pages 279 - 289
ANGELL, BAULCOMBE, THE EMBO JOURNAL, vol. 16, no. 12, 1997, pages 3675 - 3684
VOINNET, BAULCOMBE, NATURE, vol. 389, 1997, pages 553
FIRE A. ET AL., NATURE, vol. 391, 1998
FIRE, TRENDS GENET., vol. 15, 1999, pages 358 - 363
SHARP, GENES DEV., vol. 15, 2001, pages 485 - 490
HAMMOND ET AL., NATURE REV. GENET., vol. 2, 2001, pages 1110 - 1119
TUSCHL, CHEM. BIOCHEM., vol. 2, 2001, pages 239 - 245
ZAMORE P.D., NATURE STRUCTURAL BIOLOGY, vol. 8, no. 9, 2001, pages 746 - 750
SCHWAB ET AL., PLANT CELL, vol. 18, 2006, pages 1121 - 1133
ELENA, CLAUDIA ET AL.: "Expression of codon optimized genes in microbial systems: current industrial applications and perspectives", FRONTIERS IN MICROBIOLOGY, vol. 5, 2014, XP002765948, DOI: 10.3389/fmicb.2014.00021
SU R: "Triterpenoids from the fruits of Phellodendron chinense: The stereostructure of niloticin", CHEMICAL AND PHARMACEUTICAL BULLETIN, vol. 38, no. 6, 1990, pages 1616 - 1619
ARMITAGE ET AL., NATURE, vol. 357, 1992, pages 80 - 82
ENGLER, C. ET AL.: "A golden gate modular cloning toolbox for plants", ACS SYNTH BIOL, vol. 3, no. 11, 2014, pages 839 - 43
MORGAN ED: "Azadirachtin, a scientific gold mine", BIOORGANIC & MEDICINAL CHEMISTRY, vol. 17, no. 12, 2009, pages 4096 - 4105, XP026152224, DOI: 10.1016/j.bmc.2008.11.081
TAN Q-G, LUO X-D: "Meliaceous limonoids: chemistry and biological activities", CHEMICAL REVIEWS, vol. 111, no. 11, 2011, pages 7437 - 7522
ROY A, SARAF S: "Limonoids: Overview of significant bioactive triterpenes distributed in plants kingdom", BIOLOGICAL & PHARMACEUTICAL BULLETIN, vol. 29, no. 2, 2006, pages 191 - 201, XP008103058, DOI: 10.1248/bpb.29.191
ZHANG YY, XU H: "Recent progress in the chemistry and biology of limonoids", RSC ADVANCES, vol. 7, no. 56, 2017, pages 35191 - 35220
HASEGAWA S, MIYAKE M: "Biochemistry and biological functions of citrus limonoids", FOOD REVIEWS INTERNATIONAL, vol. 12, no. 4, 1996, pages 413 - 435
VEITCH GE ET AL.: "Synthesis of azadirachtin: a long but successful journey", ANGEW CHEM INT ED ENGL, vol. 46, no. 40, 2007, pages 7629 - 32
YAMASHITA S ET AL.: "Total synthesis of limonin", ANGEWANDTE CHEMIE INTERNATIONAL EDITION, vol. 54, no. 29, 2015, pages 8538 - 8541
GUALDANI R, CAVALLUZZI MM, LENTINI G, HABTEMARIAM S: "The chemistry and pharmacology of citrus limonoids", MOLECULES, vol. 21, no. 11, 2016, pages 1530
AKHILA A, SRIVASTAVA M, RANI K: "Production of radioactive azadirachtin in the seed kernels of Azadirachta indica (the Indian neem tree)", NATURAL PRODUCT LETTERS, vol. 11, no. 1, 1996, pages 107 - 110
AARTHY T ET AL.: "Tracing the biosynthetic origin of limonoids and their functional groups through stable isotope labelling and inhibition in neem tree (Azadirachta indica) cell suspension", BMC PLANT BIOLOGY, vol. 18, no. 1, 2018, pages 230
THIMMAPPA R, GEISLER K, LOUVEAU T, O'MAILLE P, OSBOURN A: "Triterpene biosynthesis in plants", ANNUAL REVIEW OF PLANT BIOLOGY, vol. 65, 2014, pages 225 - 57
PANDREKA A ET AL.: "Triterpenoid profiling and functional characterization of the initial genes involved in isoprenoid biosynthesis in neem (Azadirachta indica)", BMC PLANT BIOLOGY, vol. 15, no. 1, 2015, pages 214
WANG FS ET AL.: "Identification of putative genes involved in limonoids biosynthesis in citrus by comparative transcriptomic analysis", FRONTIERS IN PLANT SCIENCE, vol. 8, no. 1, 2017, pages 782
NARNOLIYA LK, RAJAKANI R, SANGWAN NS, GUPTA V, SANGWAN RS: "Comparative transcripts profiling of fruit mesocarp and endocarp relevant to secondary metabolism by suppression subtractive hybridization in Azadirachta indica (neem)", MOLECULAR BIOLOGY REPORTS, vol. 41, no. 5, 2014, pages 3147 - 3162
RAJAKANI R, NARNOLIYA L, SANGWAN NS, SANGWAN RS, GUPTA V: "Subtractive transcriptomes of fruit and leaf reveal differential representation of transcripts in Azadirachta indica", TREE GENETICS & GENOMES, vol. 10, no. 5, 2014, pages 1331 - 1351, XP035392453, DOI: 10.1007/s11295-014-0764-7
WANG S, ZHANG H, LI X, ZHANG J: "Gene expression profiling analysis reveals a crucial gene regulating metabolism in adventitious roots of neem (Azadirachta indica)", RSC ADVANCES, vol. 6, no. 115, 2016, pages 114889 - 114898
BHAMBHANI S ET AL.: "Genes encoding members of 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR) gene family from Azadirachta indica and correlation with azadirachtin biosynthesis", ACTA PHYSIOL. PLANT., vol. 39, no. 1, 2017, pages 65
KITA M ET AL.: "Molecular cloning and characterization of a novel gene encoding limonoid UDP-glucosyltransferase in Citrus", FEBS LETTERS, vol. 469, no. 2-3, 2000, pages 173 - 178, XP004261071, DOI: 10.1016/S0014-5793(00)01275-8
KRISHNAN NM ET AL.: "A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica", BMC GENOMICS, vol. 13, no. 1, 2012, pages 464, XP021120003, DOI: 10.1186/1471-2164-13-464
KRISHNAN NMJAIN PGUPTA SHARIHARAN AKPANDA B: "An improved genome assembly of Azadirachta indica A. Juss", G3: GENES, GENOMES, GENETICS, vol. 6, no. 7, 2016, pages 1835 - 1840
KURAVADI NA ET AL.: "Comprehensive analyses of genomes, transcriptomes and metabolites of neem tree", PEERJ, vol. 3, 2015, pages e1066
KRISHNAN NM ET AL.: "De novo sequencing and assembly of Azadirachta indica fruit transcriptome", CURRENT SCIENCE, vol. 101, no. 12, 2011, pages 1553 - 1561
WANG Y ET AL.: "Comparative analysis of the terpenoid biosynthesis pathway in Azadirachta indica and Melia azedarach by RNA-seq", SPRINGERPLUS, vol. 5, no. 1, 2016, pages 1 - 9
BHAMBHANI S ET AL.: "Transcriptome and metabolite analyses in Azadirachta indica: identification of genes involved in biosynthesis of bioactive triterpenoids", SCIENTIFIC REPORTS, vol. 7, no. 1, 2017, pages 5043
XU Q ET AL.: "The draft genome of sweet orange (Citrus sinensis)", NATURE GENETICS, vol. 45, no. 1, 2012, pages 59 - 66
RACOLTA S, JUHL PB, SIRIM D, PLEISS J: "The triterpene cyclase protein family: a systematic analysis", PROTEINS: STRUCTURE, FUNCTION, AND BIOINFORMATICS, vol. 80, no. 8, 2012, pages 2009 - 2019
EBIZUKA Y, KATSUBE Y, TSUTSUMI T, KUSHIRO T, SHIBUYA M: "Functional genomics approach to the study of triterpene biosynthesis", PURE AND APPLIED CHEMISTRY, vol. 75, no. 2-3, 2003, pages 369 - 374
MORLACCHI P ET AL.: "Product profile of PEN3: the last unexamined oxidosqualene cyclase in Arabidopsis thaliana", ORGANIC LETTERS, vol. 11, no. 12, 2009, pages 2627 - 2630
NELSON DR: "Cytochrome P450 Protocols", 2004, HUMANA PRESS, article "Cytochrome P450 nomenclature", pages: 1 - 10
KOENEN EJ, CLARKSON JJ, PENNINGTON TD, CHATROU LW: "Recently evolved diversity and convergent radiations of rainforest mahoganies (Meliaceae) shed new light on the origins of rainforest hyperdiversity", NEW PHYTOLOGIST, vol. 207, no. 2, 2015, pages 327 - 39
EKONG DEU, IBIYEMI SA, OLAGBEMI EO: "The meliacins (limonoids). biosynthesis of nimbolide in the leaves of Azadirachta indica", JOURNAL OF THE CHEMICAL SOCIETY D: CHEMICAL COMMUNICATIONS, vol. 18, 1971, pages 1117 - 1118
CAMACHO C ET AL.: "BLAST+: architecture and applications", BMC BIOINFORMATICS, vol. 10, no. 1, 2009, pages 421, XP055111342, DOI: 10.1186/1471-2105-10-421
BAK S ET AL.: "The Arabidopsis Book", vol. 9, 2011, article "Cytochromes p450", pages: e0144 - e0144
STEPHENSON MJ, REED J, BROUWER B, OSBOURN A: "Transient expression in Nicotiana benthamiana leaves for triterpene production at a preparative scale", JOVE, vol. 138, 2018, pages e58169
REED J ET AL.: "A translational synthetic biology platform for rapid access to gram-scale quantities of novel drug-like molecules", METABOLIC ENGINEERING, vol. 42, no. 1, 2017, pages 185 - 193, XP085136198, DOI: 10.1016/j.ymben.2017.06.012
ZHAO W-Y ET AL.: "New tirucallane triterpenoids from Picrasma quassioides with their potential antiproliferative activities on hepatoma cells", BIOORGANIC CHEMISTRY, vol. 84, no. 1, 2019, pages 309 - 318
NAKANISHI T, INADA A, LAVIE D: "A new tirucallane-type triterpenoid derivative, lipomelianol from fruits of Melia toosendan Sieb. et Zucc", CHEMICAL AND PHARMACEUTICAL BULLETIN, vol. 34, no. 1, 1986, pages 100 - 104
BEVAN C, EKONG D, HALSALL T, TOFT P: "West African timbers. Part XX. The structure of turraeanthin, an oxygenated tetracyclic triterpene monoacetate", JOURNAL OF THE CHEMICAL SOCIETY C: ORGANIC, 1967, pages 820 - 828
POLONSKY J, VARON Z, RABANAL RM, JACQUEMIN H: "21,20-anhydromelianone and melianone from Simarouba amara (Simaroubaceae); carbon-13 NMR spectral analysis of Δ7-tirucallol-type triterpenes", ISRAEL JOURNAL OF CHEMISTRY, vol. 16, no. 1, 1977, pages 16 - 19
YUAN C-M ET AL.: "Bioactive limonoid and triterpenoid constituents of Turraea pubescens", JOURNAL OF NATURAL PRODUCTS, vol. 76, no. 6, 2013, pages 1166 - 1174
GRABHERR MG ET AL.: "Full-length transcriptome assembly from RNA-Seq data without a reference genome", NATURE BIOTECHNOLOGY, vol. 29, no. 7, 2011, pages 644 - U 130, XP055689113, DOI: 10.1038/nbt.1883
HAAS BJ ET AL.: "De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis", NATURE PROTOCOLS, vol. 8, no. 8, 2013, pages 1494 - 1512, XP055454988, DOI: 10.1038/nprot.2013.084
STANKE M, MORGENSTERN B: "AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints", NUCLEIC ACIDS RESEARCH, vol. 33, 2005, pages W465 - W467
EDGAR R: "MUSCLE: multiple sequence alignment with high accuracy and high throughput", NUCLEIC ACIDS RESEARCH, vol. 32, no. 5, 2004, pages 1792 - 1797, XP008137003, DOI: 10.1093/nar/gkh340
KUSHIRO T, SHIBUYA M, MASUDA K, EBIZUKA Y: "Mutational studies on triterpene synthases: engineering lupeol synthase into β-amyrin synthase", JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, vol. 122, no. 29, 2000, pages 6816 - 6824
LIVAK KJ, SCHMITTGEN TD: "Analysis of relative gene expression data using real-time quantitative PCR and the 2^-ΔΔCT method", METHODS, vol. 25, no. 4, 2001, pages 402 - 408
SAINSBURY F, THUENEMANN EC, LOMONOSSOFF GP: "pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants", PLANT BIOTECHNOLOGY JOURNAL, vol. 7, no. 7, 2009, pages 682 - 693
LAVIE D, JAIN MK, SHPAN-GABRIELITH SR: "A locust phagorepellent from two melia species", CHEMICAL COMMUNICATIONS (LONDON), vol. 1967, no. 18, 1967, pages 910 - 911
SAXENA N, KUMAR Y: "Chemistry of azadirachtin and other bioactive isoprenoids from neem", 2008, INTERNATIONAL PUBLISHING HOUSE PVT. LTD., article "Spearman rank correlation coefficient", pages: 502 - 505
PAAL C: "Ueber die Derivate des Acetophenonacetessigesters und des Acetonylacetessigesters", BERICHTE DER DEUTSCHEN CHEMISCHEN GESELLSCHAFT, vol. 17, no. 2, pages 2756 - 2767
KNORR L: "Synthese von Furfuranderivaten aus dem Diacetbernsteinsäureester", BERICHTE DER DEUTSCHEN CHEMISCHEN GESELLSCHAFT, vol. 17, no. 2, pages 2863 - 2870
SIDDIQUI S, MAHMOOD T, SIDDIQUI BS, FAIZI S: "Isolation of a triterpenoid from Azadirachta indica", PHYTOCHEMISTRY, vol. 25, no. 9, 1986, pages 2183 - 2185
PURUSHOTHAMAN KK, DURAISWAMY K, CONNOLLY JD, RYCROFT DS: "Triterpenoids from Walsura piscidia", PHYTOCHEMISTRY, vol. 24, no. 10, 1985, pages 2349 - 2355, XP026647169, DOI: 10.1016/S0031-9422(00)83040-X
AYAFOR JF, SONDENGAM BL, CONNOLLY JD, RYCROFT DS, OKOGUN JI: "Tetranortriterpenoids and related compounds, part 26. tecleanin, a possible precursor of limonin, and other new tetranortriterpenoids from Teclea grandifolia Engl. (Rutaceae)", JOURNAL OF THE CHEMICAL SOCIETY, PERKIN TRANSACTIONS, vol. 1, no. 1, 1981, pages 1750 - 1753
HASEGAWA S, HERMAN Z, ORME E, OU P: "Biosynthesis of limonoids in citrus: sites and translocation", PHYTOCHEMISTRY, vol. 25, no. 12, 1986, pages 2783 - 2785
OU P, HASEGAWA S, HERMAN Z, FONG CH: "Limonoid biosynthesis in the stem of Citrus limon", PHYTOCHEMISTRY, vol. 27, no. 1, 1988, pages 115 - 118
HASEGAWA S, HERMAN Z: "Biosynthesis of obacunone from nomilin in Citrus limon", PHYTOCHEMISTRY, vol. 24, no. 9, 1985, pages 1973 - 1974, XP026617030, DOI: 10.1016/S0031-9422(00)83102-7
PRICE M, DEHAL P, ARKIN A: "FastTree 2 - approximately maximum-likelihood trees for large alignments", PLOS ONE, vol. 5, no. 3, 2010, pages e9490
LETUNIC I, BORK P: "Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees", NUCLEIC ACIDS RESEARCH, vol. 44, no. W1, 2016, pages W242 - W245
MACKENZIE DJ, MCLEAN MA, MUKERJI S, GREEN M: "Improved RNA extraction from woody plants for the detection of viral pathogens by reverse transcription-polymerase chain reaction", PLANT DISEASE, vol. 81, no. 2, 1997, pages 222 - 226, XP001100054
LANGMEAD B, TRAPNELL C, POP M, SALZBERG SL: "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome", GENOME BIOL, vol. 10, no. 1, 2009, pages R25, XP021053573, DOI: 10.1186/gb-2009-10-3-r25
LI B, DEWEY CN: "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome", BMC BIOINFORMATICS, vol. 12, no. 1, 2011, pages 323, XP021104619, DOI: 10.1186/1471-2105-12-323
ROBINSON M, MCCARTHY D, SMYTH G: "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data", BIOINFORMATICS, vol. 26, no. 1, 2010, pages 139 - 140
LOVE M, HUBER W, ANDERS S: "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2", GENOME BIOLOGY, vol. 15, no. 12, 2014, pages 550, XP021210395, DOI: 10.1186/s13059-014-0550-8
PEARSON K: "Notes on regression and inheritance in the case of two parents", PROCEEDINGS OF THE ROYAL SOCIETY OF LONDON, vol. 58, no. 1, 1895, pages 240 - 242
ZHAO S, GUO Y, SHENG Q, SHYR Y: "Heatmap3: an improved heatmap package with more powerful and convenient features", BMC BIOINFORMATICS, vol. 15, no. 10, 2014, pages 16
STEPHENSON MJ, REED J, BROUWER B, OSBOURN A: "Transient expression in Nicotiana benthamiana leaves for triterpene production at a preparative scale", JOVE, vol. 138, no. 1, 2018, pages e58169
BAK S ET AL.: "Cytochromes P450", THE ARABIDOPSIS BOOK, vol. 9, no. 1, 2011, pages e0144 - e0144
SIEVERS F ET AL.: "Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega", MOLECULAR SYSTEMS BIOLOGY, vol. 7, no. 1, 2011, pages 539
MITTAPELLI SR, MARYADA SK, KHAREEDU VR, VUDEM DR: "Structural organization, classification and phylogenetic relationship of cytochrome P450 genes in Citrus clementina and Citrus sinensis", TREE GENETICS & GENOMES, vol. 10, no. 2, 2014, pages 399 - 409
GRAY AL, BHANDARI P, WATERMAN PG: "New protolimonoids from the fruits of Phellodendron chinense", PHYTOCHEMISTRY, vol. 27, no. 6, 1988, pages 1805 - 1808, XP026631050, DOI: 10.1016/0031-9422(88)80448-5
NAKANISHI T, INADA A, LAVIE D: "A new tirucallane-type triterpenoid derivative, lipomelianol from fruits of Melia toosendan Sieb. et Zucc", CHEMICAL AND PHARMACEUTICAL BULLETIN, vol. 34, no. 1, 1986, pages 100 - 104
MULHOLLAND DA, KOTSOS M, MAHOMED HA, TAYLOR DAH: "Triterpenoids from Owenia cepiodora", PHYTOCHEMISTRY, vol. 49, no. 8, 1998, pages 2457 - 2460, XP004290460, DOI: 10.1016/S0031-9422(98)00307-0
WATTANAPIROMSAKUL C, WATERMAN PG: "Flavanone, triterpene and chromene derivatives from the stems of Paramignya griffithii", PHYTOCHEMISTRY, vol. 55, no. 3, 2000, pages 269 - 273, XP004291643, DOI: 10.1016/S0031-9422(00)00311-3
FO ER, FERNANDES JB, VIEIRA PC, DA SILVA MFDGF: "Isolation of secoisolariciresinol diesters from stems of Simaba cuneata", PHYTOCHEMISTRY, vol. 31, no. 6, 1992, pages 2115 - 2116, XP026633436, DOI: 10.1016/0031-9422(92)80374-N
POLONSKY J, BASKEVITCH-VARON Z, DAS BC: "Triterpenes tetracycliques du Simarouba amara", PHYTOCHEMISTRY, vol. 15, no. 2, 1976, pages 337 - 339
LUO X-D, WU S-H, MA Y-B, WU D-G: "Tirucallane triterpenoids from Dysoxylum hainanense", PHYTOCHEMISTRY, vol. 54, no. 8, 2000, pages 801 - 805, XP004291578, DOI: 10.1016/S0031-9422(00)00172-2
LIU H, HEILMANN J, RALI T, STICHER O: "New tirucallane-type triterpenes from Dysoxylum variabile", JOURNAL OF NATURAL PRODUCTS, vol. 64, no. 2, 2001, pages 159 - 163
KUMAR V, NIYAZ NMM, WICKRAMARATNE DBM, BALASUBRAMANIAM S: "Tirucallane derivatives from Paramignya monophylla fruits", PHYTOCHEMISTRY, vol. 30, no. 4, 1991, pages 1231 - 1233, XP027190981
JAYAKUMAR G, AJITHA BAI MD, FUJIMOTO Y: "Beddomeilactone: a new triterpene from Dysoxylum beddomei", NATURAL PRODUCT RESEARCH, vol. 18, no. 4, 2004, pages 329 - 334
GU J: "Chemical components of Dysoxylum densiflorum", NATURAL PRODUCTS AND BIOPROSPECTING, vol. 3, no. 2, 2013, pages 66 - 69
GROSVENOR SNJ, MASCOLL K, MCLEAN S, REYNOLDS WF, TINTO WF: "Tirucallane, apotirucallane, and octanorapotirucallane triterpenes of Simarouba amara", JOURNAL OF NATURAL PRODUCTS, vol. 69, no. 9, 2006, pages 1315 - 1318
MOHAMAD K: "Tirucallane triterpenes from Dysoxylum macranthum", PHYTOCHEMISTRY, vol. 52, no. 8, 1999, pages 1461 - 1468, XP004291153, DOI: 10.1016/S0031-9422(99)00455-0
ORISADIPE AT, ADESOMOJU AA, D'AMBROSIO M, GUERRIERO A, OKOGUN JI: "Tirucallane triterpenes from the leaf extract of Entandrophragma angolense", PHYTOCHEMISTRY, vol. 66, no. 19, 2005, pages 2324 - 2328, XP005096140, DOI: 10.1016/j.phytochem.2005.07.017
CHEN J: "Cytotoxic triterpenoids from Azadirachta indica", PLANTA MEDICA, vol. 77, no. 16, 2011, pages 1844 - 1847
RAGASA CY: "Glabretal-type triterpenoids from Dysoxylum mollissimum", PHYTOCHEMISTRY LETTERS, vol. 6, no. 4, 2013, pages 514 - 518
INADA A, KONISHI M, MURATA H, NAKANISHI T: "Structures of a new limonoid and a new triterpenoid derivative from pericarps of Trichilia connaroides", JOURNAL OF NATURAL PRODUCTS, vol. 57, no. 10, 1994, pages 1446 - 1449
VIEIRA JI ET AL.: "Hirtinone, a novel cycloartane-type triterpene and other compounds from Trichilia hirta L. (Meliaceae)", MOLECULES, vol. 18, no. 3, 2013, pages 2589 - 2597
RODRIGUES VF, CARMO HM, BRAZ RF, MATHIAS L, VIEIRA I: "Two new terpenoids from Trichilia quadrijuga (Meliaceae)", NATURAL PRODUCT COMMUNICATIONS, vol. 5, no. 2, 2010, pages 179 - 184
HARDING WW, JACOBS H, LEWIS PA, MCLEAN S, REYNOLDS WF: "Cycloartanes, protolimonoids, a pregnane and a new ergostane from Trichilia reticulata", NATURAL PRODUCT LETTERS, vol. 15, no. 4, 2001, pages 253 - 260
KETWARU P, KLASS J, TINTO WF, MCLEAN S, REYNOLDS WF: "Pregnane steroids from Trichilia schomburgkii", JOURNAL OF NATURAL PRODUCTS, vol. 56, no. 3, 1993, pages 430 - 431
TINTO WF, JAGESSAR PK, KETWARU P, REYNOLDS WF, MCLEAN S: "Constituents of Trichilia schomburgkii", JOURNAL OF NATURAL PRODUCTS, vol. 54, no. 4, 1991, pages 972 - 977
WANG G-C ET AL.: "Limonoids and triterpenoids as 11β-HSD1 inhibitors from Walsura robusta", JOURNAL OF NATURAL PRODUCTS, vol. 79, no. 4, 2016, pages 899 - 906
LIU J-Q ET AL.: "Limonoids from the leaves of Toona ciliata var. yunnanensis", PHYTOCHEMISTRY, vol. 76, no. 1, 2012, pages 141 - 149
KISHI K, YOSHIKAWA K, ARIHARA S: "Limonoids and protolimonoids from the fruits of Phellodendron amurense", PHYTOCHEMISTRY, vol. 31, no. 4, 1992, pages 1335 - 1338, XP028087504, DOI: 10.1016/0031-9422(92)80285-M
ITOKAWA H, KISHI E, MORITA H, TAKEYA K: "Cytotoxic quassinoids and tirucallane-type triterpenes from the woods of Eurycoma longifolia", CHEMICAL & PHARMACEUTICAL BULLETIN, vol. 40, no. 4, 1992, pages 1053 - 1055
SARAIVA RDCG, PINTO AC, NUNOMURA SM, POHLIT AM: "Triterpenes and a canthinone alkaloid from the stems of Simaba polyphylla (Cavalcante) WW Thomas (Simaroubaceae)", QUIMICA NOVA, vol. 29, no. 2, 2006, pages 264 - 268
ESIMONE CO ET AL.: "Potential anti-respiratory syncytial virus lead compounds from Aglaia species", DIE PHARMAZIE - AN INTERNATIONAL JOURNAL OF PHARMACEUTICAL SCIENCES, vol. 63, no. 10, 2008, pages 768 - 773
BENOSMAN A: "Tirucallane triterpenes from the stem bark of Aglaia leucophylla", PHYTOCHEMISTRY, vol. 40, no. 5, 1995, pages 1485 - 1487
IRUNGU BN: "Antiplasmodial and cytotoxic activities of the constituents of Turraea robusta and Turraea nilotica", JOURNAL OF ETHNOPHARMACOLOGY, vol. 174, 2015, pages 419 - 425
WANG J-R: "Protolimonoids and norlimonoids from the stem bark of Toona ciliata var. pubescens", ORGANIC & BIOMOLECULAR CHEMISTRY, vol. 9, no. 22, 2011, pages 7685 - 7696
AHSAN M, ARMSTRONG JA, GRAY AL, WATERMAN PG: "Boronialatenolide: a novel pentanortriterpene from the aerial parts of Boronia alata (Rutaceae)", AUSTRALIAN JOURNAL OF CHEMISTRY, vol. 47, no. 9, 1994, pages 1783 - 1787
AHSAN M, ARMSTRONG JA, GRAY AL, WATERMAN PG: "Terpenoids, alkaloids and coumarins from Boronia inornata and Boronia gracilipes", PHYTOCHEMISTRY, vol. 38, no. 5, 1995, pages 1275 - 1278
REEGAN AD, GANDHI MR, PAULRAJ MG, BALAKRISHNA K, IGNACIMUTHU S: "Effect of niloticin, a protolimonoid isolated from Limonia acidissima L. (Rutaceae) on the immature stages of dengue vector Aedes aegypti L. (Diptera: Culicidae)", ACTA TROPICA, vol. 139, no. 1, 2014, pages 67 - 76
LIEN TP, KAMPERDICK C, SCHMIDT J, ADAM G, SUNG T: "Apotirucallane triterpenoids from Luvunga sarmentosa (Rutaceae)", PHYTOCHEMISTRY, vol. 60, no. 7, 2002, pages 747 - 754, XP004371695, DOI: 10.1016/S0031-9422(02)00156-5
KIPLIMO J, ISLAM S, KOORBANALLY N: "Ring A, D-SECO limonoids and flavonoid from the Kenyan Vepris uguenensis Engl. and their antioxidant activity", PLANTA MEDICA, vol. 78, no. 11, 2012, pages PI111
HONG Z-L ET AL.: "Tetracyclic triterpenoids and terpenylated coumarins from the bark of Ailanthus altissima ('tree of heaven')", PHYTOCHEMISTRY, vol. 86, no. 1, 2013, pages 159 - 167
GRIECO PA, HADDAD J, PINEIRO-NUNEZ MM, HUFFMAN JC: "Quassinoids from the twigs and thorns of Castela polyandra", PHYTOCHEMISTRY, vol. 50, no. 4, 1999, pages 637 - 645, XP004290852, DOI: 10.1016/S0031-9422(98)00589-5
WANG J, ZHANG Y, LUO J, KONG L: "Complete 1H and 13C NMR data assignment of protolimonoids from the stem barks of Aphanamixis grandifolia", MAGNETIC RESONANCE IN CHEMISTRY, vol. 49, no. 7, 2011, pages 450 - 457
ZHANG X-Y ET AL.: "Tirucallane-type alkaloids from the bark of Dysoxylum laxiracemosum", JOURNAL OF NATURAL PRODUCTS, vol. 73, no. 8, 2010, pages 1385 - 1388
HUANG HL ET AL.: "Tirucallane-type triterpenoids from Dysoxylum lenticellatum", JOURNAL OF NATURAL PRODUCTS, vol. 74, no. 10, 2011, pages 2235 - 2242
HAYASIDA W, OLIVEIRA L, FERREIRA A, LIMA M: "Ergostane steroids, tirucallane and apotirucallane triterpenes from Guarea convergens", CHEMISTRY OF NATURAL COMPOUNDS, vol. 53, no. 2, 2017, pages 312 - 317, XP036221474, DOI: 10.1007/s10600-017-1977-4
JIMENEZ A: "Limonoids from Swietenia humilis and Guarea grandiflora (Meliaceae)", PHYTOCHEMISTRY, vol. 49, no. 7, 1998, pages 1981 - 1988
MIGUITA CH ET AL.: "3β-O-tigloylmelianol from Guarea kunthiana: a new potential agent to control Rhipicephalus (Boophilus) microplus, a cattle tick of veterinary significance", MOLECULES, vol. 20, no. 1, 2015, pages 111
NTALLI NG: "Cytotoxic tirucallane triterpenoids from Melia azedarach fruits", MOLECULES, vol. 15, no. 9, 2010, pages 5866 - 5877
HAN JLIN WXU RWANG WZHAO S: "Studies on the chemical constituents of Melia azedarach L", ACTA PHARMACEUTICA SINICA, vol. 26, no. 6, 1991, pages 426 - 429
COOMBES PH, MULHOLLAND DA, RANDRIANARIVELOJOSIA M: "Mexicanolide limonoids from the Madagascan Meliaceae Quivisia papinae", PHYTOCHEMISTRY, vol. 66, no. 10, 2005, pages 1100 - 1107, XP004967657, DOI: 10.1016/j.phytochem.2005.03.002
KAUR R, ARORA S: "Chemical constituents and biological activities of Chukrasia tabularis A. Juss.-A review", JOURNAL OF MEDICINAL PLANTS RESEARCH, vol. 3, no. 4, 2009, pages 196 - 216
BASAK S, ISLAM A: "DP Melianone from Swietenia mahagoni", J. INDIAN CHEM. SOC., vol. 47, no. 5, 1970, pages 501 - 503
BIAVATTI MW: "Chemistry and bioactivity of Raulinoa echinata Cowan, an endemic Brazilian Rutaceae species", PHYTOMEDICINE, vol. 8, no. 2, 2001, pages 121 - 124, XP004957212, DOI: 10.1078/0944-7113-00016
LIU J, YANG S-P, NI G, GU Y-C, YUE J-M: "Triterpenoids from Aglaia odorata var. microphyllina", J. ASIAN NAT. PROD. RES., vol. 14, no. 10, 2012, pages 929 - 939
BEVAN C, EKONG D, HALSALL T, TOFT P: "West African timbers. Part XX. The structure of turraeanthin, an oxygenated tetracyclic triterpene monoacetate", JOURNAL OF THE CHEMICAL SOCIETY C: ORGANIC, 1967, pages 820 - 828
K. GEISLER, R. K. HUGHES, F. SAINSBURY, G. P. LOMONOSSOFF, M. REJZEK, S. FAIRHURST, C.-E. OLSEN, M. S. MOTAWIA, R. E. MELTON, A. M. HEMMINGS ET AL.: "Biochemical analysis of a multifunctional cytochrome P450 (CYP51) enzyme required for synthesis of antimicrobial triterpenes in plants", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 110, no. 35, 2013, pages E3360 - E3367
R. J. GREBENOK, T. E. OHNMEISS, A. YAMAMOTO, E. D. HUNTLEY, D. W. GALBRAITH, D. DELLA PENNA: "Isolation and characterization of an Arabidopsis thaliana C-8,7 sterol isomerase: functional and structural similarities to mammalian C-8,7 sterol isomerase/emopamil-binding protein", PLANT MOLECULAR BIOLOGY, vol. 38, no. 5, 1998, pages 807 - 815, XP002558802, DOI: 10.1023/A:1006028623875
A. RAHIER, S. PIERRE, G. RIVEILL, F. KARST: "Identification of essential amino acid residues in a sterol 8,7-isomerase from Zea mays reveals functional homology and diversity with the isomerases of animal and fungal origin", BIOCHEMICAL JOURNAL, vol. 414, no. 2, 2008, pages 247 - 259
A. C. HUANG, T. JIANG, Y.-X. LIU, Y.-C. BAI, J. REED, B. QU, A. GOOSSENS, H.-W. NUTZMANN, Y. BAI, A. OSBOURN: "A specialized metabolic network selectively modulates Arabidopsis root microbiota", SCIENCE, vol. 364, no. 6440, 2019, pages eaau6389
A. BAYER, X. MA, J. STOCKIGT: "Acetyltransfer in natural product biosynthesis: functional cloning and molecular analysis of vinorine synthase", BIOORGANIC & MEDICINAL CHEMISTRY, vol. 12, no. 10, 2004, pages 2787 - 2795
S. T. MUGFORD, X. QI, S. BAKHT, L. HILL, E. WEGEL, R. K. HUGHES, K. PAPADOPOULOU, R. MELTON, M. PHILO, F. SAINSBURY ET AL.: "A serine carboxypeptidase-like acyltransferase is required for synthesis of antimicrobial compounds and disease resistance in oats", THE PLANT CELL, vol. 21, no. 8, 2009, pages 2473 - 2484
M. GIOLAI, P. PAAJANEN, W. VERWEIJ, L. PERCIVAL-ALWYN, D. BAKER, K. WITEK, F. JUPE, G. BRYAN, I. HEIN, J. D. JONES ET AL.: "Targeted capture and sequencing of gene-sized DNA molecules", BIOTECHNIQUES, vol. 61, no. 6, 2016, pages 315 - 322
A. HALLAB: "PhD thesis", 2015, UNIVERSITATS-UND LANDESBIBLIOTHEK BONN, article "Protein Function Prediction Using Phylogenomics, Domain Architecture Analysis, Data Integration, and Lexical Scoring"
T. Z. BERARDINI, L. REISER, D. LI, Y. MEZHERITSKY, R. MULLER, E. STRAIT, E. HUALA: "The Arabidopsis information resource: making and mining the 'gold standard' annotated reference plant genome", GENESIS, vol. 53, no. 8, 2015, pages 474 - 485
U. CONSORTIUM: "UniProt: a hub for protein information", NUCLEIC ACIDS RESEARCH, vol. 43, no. D1, 2014, pages D204 - D212
P. JONES, D. BINNS, H.-Y. CHANG, M. FRASER, W. LI, C. MCANULLA, H. MCWILLIAM, J. MASLEN, A. MITCHELL, G. NUKA ET AL.: "InterProScan 5: genome-scale protein function classification", BIOINFORMATICS, vol. 30, no. 9, 2014, pages 1236 - 1240
S. ANDREWS, F. KRUEGER, A. SEGONDS-PICHON, L. BIGGINS, C. KRUEGER, S. WINGETT, FASTQC: "RNA-Seq analysis workshop course material", vol. 29, 2013, WEILL CORNELL MEDICAL COLLEGE, article "STAR: ultrafast universal RNA-Seq aligner", pages: 15 - 21
H. LI, B. HANDSAKER, A. WYSOKER, T. FENNELL, J. RUAN, N. HOMER, G. MARTH, G. ABECASIS, R. DURBIN: "The sequence alignment/map format and SAMtools", BIOINFORMATICS, vol. 25, no. 16, 2009, pages 2078 - 2079, XP055229864, DOI: 10.1093/bioinformatics/btp352
Y. LIAO, G. K. SMYTH, W. SHI: "The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote", NUCLEIC ACIDS RESEARCH, vol. 41, no. 10, 2013, pages e108 - e108
M. I. LOVE, W. HUBER, S. ANDERS: "Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2", GENOME BIOLOGY, vol. 15, no. 12, 2014, pages 550
M. D. ROBINSON, D. J. MCCARTHY, G. K. SMYTH: "edgeR: a bioconductor package for differential expression analysis of digital gene expression data", BIOINFORMATICS, vol. 26, no. 1, 2010, pages 139 - 140
Q. LIU, B. KHAKIMOV, P. D. CARDENAS, F. COZZI, C. E. OLSEN, K. R. JENSEN, T. P. HAUSER, S. BAK: "The cytochrome P450 CYP72A552 is key to production of hederagenin-based saponins that mediate plant defense against herbivores", NEW PHYTOLOGIST, vol. 222, no. 3, 2019, pages 1599 - 1609
S. ZHAO, Y. GUO, Q. SHENG, Y. SHYR: "Heatmap3: an improved heatmap package with more powerful and convenient features", BMC BIOINFORMATICS, vol. 15, no. S10, 2014, pages 16
Attorney, Agent or Firm:
MEWBURN ELLIS LLP (GB)
Claims:

1 A method of converting a host from a phenotype whereby the host is unable to carry out melianol biosynthesis from 2,3-oxidosqualene (OS) to a phenotype whereby the host is able to carry out said melianol biosynthesis,

which method comprises the step of expressing a heterologous nucleic acid within the host or one or more cells thereof, following an earlier step of introducing the nucleic acid into the host or an ancestor of either,

wherein the heterologous nucleic acid comprises a plurality of nucleotide sequences each of which encodes a polypeptide which in combination have said melianol biosynthesis activity.

2 A method as claimed in claim 1 wherein the nucleic acid encodes two or three of the following polypeptides

(i) a tirucalla-7,24-dien-3b-ol synthase (“TDS”) for cyclisation of OS to tirucalla-7,24-dien-3b-ol;

(ii) an enzyme capable of oxidising tirucalla-7,24-dien-3b-ol or an oxidised derivative thereof at the C-23 position to a secondary alcohol and introducing an epoxide at the C24-C25 alkene (“C-23 oxidase/C23-C24 epoxidase”);

(iii) an enzyme capable of oxidising tirucalla-7,24-dien-3b-ol or an oxidised derivative thereof at the C-21 position to an alcohol or aldehyde (“C-21 oxidase”)

wherein each of the polypeptides is optionally obtained from Meliaceae or Rutaceae families.

3 A method as claimed in claim 2 wherein the nucleic acid encodes at least a C-23 oxidase/C23-C24 epoxidase and a C-21 oxidase, and optionally these are CYP450 enzymes, which are optionally CYP71 enzymes.

4 A method as claimed in claim 3 wherein the TDS, C-23 oxidase/C23-C24 epoxidase, and C-21 oxidase are selected from

the respective polypeptides in Tables 1 or 2,

or substantially homologous variants or fragments of any of said polypeptides in Tables 1 or 2,

or are encoded by the respective polynucleotides in Tables 1 or 2,

or substantially homologous variants or fragments of any of said polynucleotides in Tables 1 or 2.

5 A method as claimed in claim 4 wherein the polypeptides are selected from the list consisting of:

(i) the TDS shown in SEQ ID: No 2, 4 or 6;

(ii) the C-23 oxidase/C23-C24 epoxidase shown in SEQ ID: No 8;

(iii) the C-21 oxidase shown in SEC ID: No 14;

or substantially homologous variants or fragments of any of said polypeptides.

6 A method as claimed in any one of claims 1 to 5 for converting a host to a phenotype whereby the host is able to carry out said synthesis of a melianol-derivative,

wherein the heterologous nucleic acid comprises a plurality of nucleotide sequences each of which encodes a polypeptide, wherein the polypeptides in combination have said melianol-derivative biosynthesis activity.

7 A method as claimed in claim 6 wherein the melianol-derivative is selected from: a limonoid; 7,8-epoxymelianol; melianol B; dehydrogenated melianol B, which is optionally melianone B; acetylated melianol B, which is optionally acetoxy-melianol B.

8 A method as claimed in claim 6 or claim 7 wherein the nucleic acid encodes one, two, three or four of the following polypeptides

(i) an enzyme capable of oxidising melianol or an oxidised derivative thereof at C-7 and introducing an epoxide at C7-C8 (“C7-C8 epoxidase”);

(ii) an enzyme capable of converting the product obtained by exposing melianol to a C7-C8 epoxidase, to melianol B (“7,8-epoxymelianol isomerase”);

(iii) an enzyme capable of dehydrogenating melianol B (“melianol B dehydrogenase”);

(iv) an enzyme capable of acetylating melianol B (“melianol B acetyltransferase”);

wherein each of the polypeptides is optionally obtained from Meliaceae or Rutaceae families.

9 A method as claimed in claim 8 wherein the nucleic acid encodes at least the C7-C8 epoxidase and 7,8-epoxymelianol isomerase, and the melianol-derivative is a limonoid.

10 A method as claimed in claim 9 wherein the C7-C8 epoxidase, 7,8-epoxymelianol isomerase, melianol B dehydrogenase, and melianol B acetyltransferase are selected from the respective polypeptides in Table 1,

or substantially homologous variants or fragments of any of said polypeptides in Table 1,

or are encoded by the respective polynucleotides in Table 1,

or substantially homologous variants or fragments of any of said polynucleotides in Table 1.

11 A method as claimed in claim 10 wherein the polypeptides are selected from the list consisting of:

(i) the C7-C8 epoxidase shown in SEQ ID: No 34;

(ii) the 7,8-epoxymelianol isomerase shown in SEQ ID: No 37;

(iii) the melianol B dehydrogenase shown in SEQ ID: No 39;

(iv) the melianol B acetyltransferase shown in SEQ ID: No 41;

or substantially homologous variants or fragments of any of said polypeptides.

12 A method as claimed in any one of claims 1 to 11 wherein the nucleic acid further encodes one or more of the following polypeptides:

(i) an HMG-CoA reductase (HMGR);

(ii) a squalene synthase (SQS),

wherein the HMGR or SQS are optionally selected from the respective polypeptides in Table 3 or substantially homologous variants or fragments of any of said polypeptides, or are encoded by the respective polynucleotides in Table 3, or substantially homologous variants or fragments of any of said polynucleotides.

13 A method as claimed in any one of claims 1 to 12 wherein the nucleotide sequences are present on two or more different nucleic acid molecules.

14 A method as claimed in claim 13 wherein the nucleic acid molecules are introduced by co-infiltration of a plurality of Agrobacterium tumefaciens strains each carrying one or more of the nucleic acid molecules.

15 A method as claimed in claim 14 wherein the nucleic acid molecules are transient expression vectors.

16 A method as claimed in claim 15 wherein each of the transient expression vectors comprises an expression cassette comprising:

(i) a promoter, operably linked to

(ii) an enhancer sequence derived from the RNA-2 genome segment of a bipartite RNA virus, in which a target initiation site in the RNA-2 genome segment has been mutated;

(iii) a nucleotide sequence encoding one of the polypeptides which in combination have said melianol biosynthesis activity;

(iv) a terminator sequence; and optionally

(v) a 3’ UTR located upstream of said terminator sequence.
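By way of illustration only, the order of elements recited for the expression cassette can be expressed as a simple check. The following is a minimal, hypothetical Python sketch: the element labels ("promoter", "enhancer", "cds", "3utr", "terminator") and the function name are placeholders, and only element order is checked, not sequence content or the mutated initiation site. Because a 3' UTR lies by definition 3' of the coding sequence, the sketch also requires it to follow the coding sequence.

```python
from typing import List

# Hypothetical labels for the cassette elements recited in the claim.
REQUIRED_ORDER = ["promoter", "enhancer", "cds", "terminator"]
OPTIONAL_UTR = "3utr"


def cassette_order_ok(elements: List[str]) -> bool:
    """Check element order only (not sequence content).

    Required: promoter -> enhancer -> coding sequence (cds) -> terminator,
    each exactly once. Optional: a single 3' UTR, which must lie after the
    coding sequence and upstream of the terminator.
    """
    core = [e for e in elements if e != OPTIONAL_UTR]
    if core != REQUIRED_ORDER:
        return False
    utr_count = elements.count(OPTIONAL_UTR)
    if utr_count == 0:
        return True
    if utr_count > 1:
        return False
    utr = elements.index(OPTIONAL_UTR)
    return elements.index("cds") < utr < elements.index("terminator")


print(cassette_order_ok(["promoter", "enhancer", "cds", "3utr", "terminator"]))  # True
print(cassette_order_ok(["promoter", "enhancer", "cds", "terminator", "3utr"]))  # False
```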

17 A method as claimed in any one of claims 1 to 16 wherein the host is a plant which is converted such as to have a modified flavour or insect resistance as a result of the biosynthesis of melianol or a melianol derivative or other downstream product thereof, which downstream product is optionally shown in Figure 1 or Figure 19, and which is optionally azadirachtin.

18 A host cell containing or transformed with a heterologous nucleic acid which comprises a plurality of nucleotide sequences each of which encodes a polypeptide which in combination have melianol or melianol derivative biosynthesis activity,

wherein expression of said nucleic acid imparts on the transformed host the ability to carry out melianol or melianol derivative biosynthesis, wherein the host cell is optionally obtainable by the method of any one of claims 1 to 17.

19 A process for producing the host cell of claim 18 by co-infiltrating a plurality of recombinant constructs comprising said nucleic acid into the cell for transient expression thereof.

20 A process for producing the host cell of claim 18 by transforming a cell with heterologous nucleic acid by introducing said nucleic acid into the cell via a vector and causing or allowing recombination between the vector and the cell genome to introduce the nucleic acid into the genome.

21 A method for producing a transgenic plant, which method comprises the steps of:

(a) performing a process as claimed in claim 20 wherein the host cell is a plant cell,

(b) regenerating a plant from the transformed plant cell.

22 A transgenic plant which is obtainable by the method of claim 21, or which is a clone, or selfed or hybrid progeny or other descendant of said transgenic plant,

wherein expression of said heterologous nucleic acid imparts an increased ability to carry out melianol or melianol derivative synthesis compared to a wild-type plant otherwise corresponding to said transgenic plant.

23 An isolated nucleic acid molecule which nucleic acid comprises a melianol- or melianol derivative- biosynthetic nucleotide sequence which:

(i) encodes all or part of polypeptide SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 34, 37, 39 or 41;

(ii) encodes a variant polypeptide which is a homologous variant of any of these SEQ ID Nos, which shares at least about 60% identity with said SEQ ID NO,

which polypeptide in each case has the respective activity of said SEQ ID NO. shown in Table 1.
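Percent identity thresholds such as the "at least about 60%" above depend on how identity is computed, which this section does not define. The following minimal Python sketch assumes a pre-computed pairwise alignment given as two equal-length gapped strings, and takes identity as identical residues divided by the number of aligned columns in which neither sequence has a gap. This convention is an assumption; dividing instead by alignment length or by the length of the reference SEQ ID NO gives different values. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both inputs are rows of the same alignment, with gaps written as '-'.
    This is one common convention; others divide by alignment length or
    by the length of the reference sequence and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 60.0) -> bool:
    """True if the computed identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("MKTAYIAK", "MKTAHIAR"))  # 75.0
```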

24 A nucleic acid as claimed in claim 23 wherein the nucleotide sequence is selected from SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 33, 35, 36, 38 or 40 or the genomic equivalent thereof.

25 A nucleic acid as claimed in claim 23 wherein the nucleotide sequence consists of an allelic or other homologous or orthologous variant of the nucleotide sequence of claim 24.

26 A nucleic acid as claimed in claim 25 wherein the nucleotide sequence is derived or obtained from Meliaceae or Rutaceae families, optionally from the species Azadirachta indica, Melia azedarach or Citrus sinensis.

27 A nucleic acid as claimed in claim 17 wherein the melianol-biosynthetic nucleotide sequence encodes a derivative of the amino acid sequence shown in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 34, 37, 39 or 41 by way of addition, insertion, deletion or substitution of one or more amino acids.

28 A process for producing a nucleic acid as claimed in claim 27 comprising the step of modifying a nucleic acid as claimed in any one of claims 24 to 26.

29 A method for identifying or cloning a nucleic acid as claimed in claim 25 or claim 26, which method employs all or part of a nucleic acid as claimed in claim 24 or the complement thereof.

30 A method as claimed in claim 29, which method comprises the steps of:

(a) providing a preparation of nucleic acid from a plant cell;

(b) providing a nucleic acid molecule which is a probe, said nucleic acid molecule having a sequence, which sequence is present in a nucleotide sequence of claim 24, or the complement of either;

(c) contacting nucleic acid in said preparation with said nucleic acid molecule under conditions for hybridisation; and,

(d) identifying nucleic acid in said preparation which hybridises with said nucleic acid molecule.

31 A method as claimed in claim 29, which method comprises the steps of:

(a) providing a preparation of nucleic acid from a plant cell;

(b) providing a pair of nucleic acid molecule primers suitable for PCR, at least one of said primers being a sequence of at least about 16-24 nucleotides in length, which sequence is present in a nucleotide sequence of claim 24, or the complement of either;

(c) contacting nucleic acid in said preparation with said primers under conditions for performance of PCR; and,

(d) performing PCR and determining the presence or absence of an amplified PCR product.
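The primer criteria of step (b), a length of about 16-24 nucleotides and a sequence present in the target nucleotide sequence or its complement, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the complementary strand is taken as the reverse complement read 5' to 3', matching is exact, no melting-temperature or specificity checks are made, and the template and function names are hypothetical.

```python
# Complementary bases for an A/C/G/T DNA alphabet (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_matches(primer: str, template: str, min_len: int = 16, max_len: int = 24) -> bool:
    """True if the primer has an acceptable length and occurs exactly in the
    template or in its reverse complement."""
    primer, template = primer.upper(), template.upper()
    if not min_len <= len(primer) <= max_len:
        return False
    return primer in template or primer in reverse_complement(template)


template = "ATGGCTAGCAAGGAGGAACTGTTCACCGGCGTGGTGCCCATCCTGTAA"  # hypothetical
forward = template[:20]                       # read directly from the template
reverse = reverse_complement(template[-20:])  # read from the complementary strand
print(primer_matches(forward, template), primer_matches(reverse, template))  # True True
```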

32 A method for identifying a nucleic acid as claimed in claim 25, which method employs all or part of the nucleotide sequence of a nucleic acid as claimed in claim 24 or the complement thereof as query sequence to interrogate a database of plant genomic sequences, and identifying the target nucleic acid as claimed in claim 25 based on sequence similarity and clustering of the target nucleic acid with the melianol-biosynthetic nucleotide sequences.

33 A recombinant vector which comprises the nucleic acid of any one of claims 23 to 27.

34 A vector as claimed in claim 33 wherein the nucleic acid is operably linked to a promoter for transcription in a host cell, wherein the promoter is optionally an inducible promoter.

35 A vector as claimed in claim 33 or claim 34 which is a plant vector or a microbial vector.

36 A vector as claimed in claim 35 wherein the vector comprises an expression cassette comprising:

(i) a promoter, operably linked to

(ii) an enhancer sequence derived from the RNA-2 genome segment of a bipartite RNA virus, in which a target initiation site in the RNA-2 genome segment has been mutated;

(iii) the melianol-biosynthetic nucleotide sequence;

(iv) a terminator sequence; and optionally

(v) a 3’ UTR located upstream of said terminator sequence.

37 A method which comprises the step of introducing the vector of any one of claims 33 to 36 into a host cell, and optionally causing or allowing recombination between the vector and the host cell genome such as to transform the host cell.

38 A host cell containing or transformed with a heterologous nucleic acid as claimed in any one of claims 23 to 27 or a vector according to any one of claims 33 to 36.

39 A host cell as claimed in claim 38 which is microbial, optionally a yeast cell.

40 A host cell which is a plant cell having a heterologous nucleic acid as claimed in any one of claims 23 to 27 within its chromosome.

41 A method for producing a transgenic plant, which method comprises the steps of:

(a) performing a method as claimed in claim 37 wherein the host cell is a plant cell,

(b) regenerating a plant from the transformed plant cell.

42 A transgenic plant which is obtainable by the method of claim 41, or which is a clone, or selfed or hybrid progeny or other descendant of said transgenic plant, which in each case includes a heterologous nucleic acid of any one of claims 23 to 27.

43 A plant having a heterologous nucleic acid as claimed in any one of claims 23 to 26 or vector of any one of claims 33 to 36 in one or more of its cells.

44 A leaf, stem, or edible portion or propagule from a plant as claimed in claim 42 or claim 43, which in either case includes a heterologous nucleic acid of any one of claims 23 to 26.

45 A host cell of any one of claims 38 to 40 or a plant of claim 42 or claim 43 or a leaf, stem, or edible portion or propagule of claim 44 which comprises nucleic acid encoding all of the following polypeptides:

(i) a TDS;

(ii) a C-23 oxidase/C23-C24 epoxidase;

(iii) a C-21 oxidase;

and optionally

(iv) a C7-C8 epoxidase;

(v) a 7,8-epoxymelianol isomerase;

and further optionally

(vi) a melianol B dehydrogenase; and/or

(vii) a melianol B acetyltransferase;

wherein at least one of said nucleic acids is heterologous to said host cell, plant, leaf, stem, or edible portion or propagule, and

wherein said host cell, plant, leaf, stem, or edible portion or propagule is capable of melianol or melianol derivative biosynthesis activity.

46 A host cell of any one of claims 38 to 40 or a plant of claim 42 or claim 43 or a leaf, stem, or edible portion or propagule of claim 44 which comprises a heterologous nucleic acid encoding all of the following polypeptides:

(i) the TDS shown in SEQ ID: No 2, 4 or 6;

(ii) the C-23 oxidase/C23-C24 epoxidase shown in SEQ ID: No 8, 10 or 12;

(iii) the C-21 oxidase shown in SEQ ID: No 14, 16 or 18;

and optionally:

(iv) the C7-C8 epoxidase shown in SEQ ID: No 34;

(v) the 7,8-epoxymelianol isomerase shown in SEQ ID: No 37;

and further optionally:

(vi) the melianol B dehydrogenase shown in SEQ ID: No 39;

(vii) the melianol B acetyltransferase shown in SEQ ID: No 41;

or encoding substantially homologous variants or fragments of any of said polypeptides.

47 An isolated, and optionally recombinant, polypeptide which is encoded by the melianol- or melianol derivative- biosynthetic nucleotide sequence of any one of claims 23 to 27.

48 Use of a polypeptide of claim 47 in an in vivo or in vitro method of synthesising melianol or a melianol derivative.

49 A method of making the polypeptide of claim 47, which method comprises the step of expressing the polypeptide from a nucleic acid of any one of claims 23 to 27 in a host cell.

50 A method for influencing or affecting melianol or melianol derivative biosynthesis in a host, the method comprising the step of:

(i) causing or allowing expression of a heterologous nucleic acid as claimed in any one of claims 23 to 27 within the cells of the host, following an earlier step of introducing the nucleic acid into a cell of the host or an ancestor thereof, or

(ii) introducing a silencing agent capable of silencing expression of a nucleotide sequence as described in claim 24 or claim 25 into a cell of the host or an ancestor thereof.

51 A method of producing a product which is melianol or a melianol derivative or other downstream product thereof in a host, which is optionally a plant, which method comprises performing a method as claimed in any one of claims 1 to 17, 37, or 50, and optionally isolating the product from the host.

52 A method of producing a product which is melianol or a melianol derivative or other downstream product thereof in a heterologous host, which method comprises culturing a host cell as claimed in any one of claims 38 to 40, or claim 45 or claim 46, and purifying the product therefrom.

53 A method of producing a product which is melianol or a melianol derivative or other downstream product thereof in a heterologous host, which method comprises growing a plant as claimed in claim 22, claim 42 or claim 43, and then harvesting it and purifying the product therefrom.

54 Use of melianol or a melianol derivative or other downstream product thereof obtained by the method of any one of claims 51 to 53 in the preparation of a limonoid, which is optionally azadirachtin.

55 A method, plant, vector, host cell, leaf, stem, or edible portion or propagule, as claimed in any one of claims 17, 21, 22, 30-32, 35, 40-46 or 53 wherein the plant is a crop plant or a moss.

56 Double-stranded RNA which comprises an RNA sequence equivalent to part of a nucleotide sequence as described in claim 24.

57 Double-stranded RNA as claimed in claim 56 which is an siRNA duplex consisting of between 20 and 25 base pairs.
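A minimal Python sketch of how candidate 20-25 nucleotide RNA sequences equivalent to part of a DNA nucleotide sequence could be enumerated is given below. It is illustrative only: the function name and input sequence are hypothetical, the 21-nucleotide default window is an arbitrary choice within the 20-25 bp range, only the sense strand is returned (the antisense strand would be its reverse complement), and no siRNA design rules such as GC content or off-target filtering are applied.

```python
def sirna_sense_windows(dna: str, length: int = 21, step: int = 1) -> list:
    """Return overlapping RNA windows (sense strand) of `length` nucleotides
    taken from a DNA sequence; T is replaced by U."""
    if not 20 <= length <= 25:
        raise ValueError("window length must be between 20 and 25 nucleotides")
    rna = dna.upper().replace("T", "U")
    return [rna[i:i + length] for i in range(0, len(rna) - length + 1, step)]


windows = sirna_sense_windows("ATGGCTAGCAAGGAGGAACTGTTCACC")  # hypothetical input
print(len(windows), windows[0])  # 7 AUGGCUAGCAAGGAGGAACUG
```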

58 An antibody which specifically binds the polypeptide of claim 47.

Description:
Biosynthetic genes and polypeptides

Technical field

The present invention relates generally to genes and polypeptides which have utility in engineering or modifying limonoid or proto-limonoid production in host cells. The invention further relates to systems, methods and products employing the same.

Background art

Limonoids are a diverse group of natural products made by plants belonging to the Meliaceae (Mahogany) and Rutaceae (Citrus) families.

Some limonoids have been heralded as bee-friendly, degradable “natural” insecticides. Limonoids also contribute to bitterness in citrus fruits and have important pharmaceutical properties. The best known limonoid insecticide is azadirachtin, produced by the neem tree (Azadirachta indica) of the Meliaceae family.

Despite intensive investigation of limonoids over the last half century, the route of limonoid biosynthesis remains unknown.

The basic limonoid scaffold has 26 carbon atoms (C26). Limonoids are classified as tetranor-triterpenes because their prototypical structure is a tetracyclic triterpene scaffold (C30) which has lost four carbons during furan ring formation (1) (Fig. 1).

The immediate precursors to limonoids (i.e. the C30 tetracyclic triterpenes preceding the loss of four carbons) are known as protolimonoids. Limonoids are heavily oxygenated and can exist either as simple ring-intact structures or as highly modified seco-ring derivatives (2) (Fig. 1). Limonoid production is largely confined to specific families within the Sapindales order (Meliaceae, Rutaceae, and to a lesser extent the Simaroubaceae) (3, 4).

Rutaceae limonoids have historically been studied because they are partially responsible for bitterness in citrus fruit. They have also been reported to have important medicinal activities (e.g. anti-cancer, anti-HIV) and so are of interest as potential pharmaceuticals. Around 50 limonoid aglycones have been reported from the Rutaceae, primarily with seco-A,D-ring structures (3-5) (Fig. 1).

In contrast, the Meliaceae are known to produce around 1500 structurally diverse limonoids, of which the seco-C-ring limonoids are predominant (2, 4). Seco-C-ring limonoids (e.g. salannin and azadirachtin; Fig. 1) are generally regarded as particular to the Meliaceae and are of interest because of their anti-insect activity (2).

Azadirachtin (isolated from A. indica) is particularly renowned because of its potent insect antifeedant activity and other features that make it suitable for crop protection, such as systemic uptake, degradability, and low toxicity to mammals, birds, fish and beneficial insects (1). Azadirachtin has a highly complex structure (Fig. 1). Although the total chemical synthesis of this limonoid was reported in 2007, this represented the culmination of a 22-year endeavour (6) involving 71 steps, with a total yield of 0.00015%. Chemical synthesis of azadirachtin is therefore not practical for production on an industrial scale. Similarly, chemical synthesis of Rutaceae limonoids such as limonin [achieved in 35 steps from geraniol (7)] is also unlikely to be commercially viable. Therefore, at present the use of seco-C-ring Meliaceae limonoids for crop protection relies on extraction of A. indica seeds (1). Similarly, the potential health benefits of Rutaceae limonoids remain restricted to dietary consumption (8).
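The figures quoted for the azadirachtin synthesis imply a modest average yield per step. As a back-of-envelope Python sketch, assuming (as a simplification; real step yields vary widely) that the 0.00015% overall yield is the product of 71 equal per-step yields, the geometric-mean step yield is:

```python
overall_yield = 0.00015 / 100  # 0.00015 % expressed as a fraction
steps = 71

# Geometric-mean yield per step, assuming every step has the same yield.
per_step_yield = overall_yield ** (1 / steps)
print(f"average yield per step: {per_step_yield:.1%}")  # average yield per step: 82.8%
```

Even an average of roughly 83% per step compounds over 71 steps to a vanishingly small overall yield, which is why step count weighs so heavily against industrial-scale chemical synthesis.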

The involvement of the mevalonate (MVA) pathway in limonoid biosynthesis has been demonstrated by feeding experiments with [¹⁴C]-mevalonate and [¹³C]-glucose in A. indica plants and cell cultures, respectively (9, 10). The MVA pathway supplies the generic triterpene precursor 2,3-oxidosqualene, which can be cyclised to a variety of different triterpene scaffolds, a process initiated and controlled by enzymes known as oxidosqualene cyclases (OSCs) (11).

Two OSC sequences have previously been identified in A. indica, but neither of the products of these genes has been functionally characterised (12).

Additionally, an OSC has been identified in Citrus grandis and implicated in limonoid biosynthesis by virus-induced gene silencing (13). However, the C. grandis OSC is a close homolog of characterised lanosterol synthases (11), which makes its involvement in limonoid biosynthesis unlikely.

Thus, although speculation of potential biosynthetic routes is possible based on limonoid and protolimonoid structures (Fig. 1), the nature of the triterpene scaffold implicated in limonoid biosynthesis remains unknown.

The predicted route of limonoid biosynthesis beyond initial triterpene scaffold generation remains entirely speculative. In triterpene biosynthesis, oxidosqualene cyclisation is commonly followed by oxidation, performed by cytochrome P450s (CYPs) (11). Several CYP sequences identified in A. indica and C. grandis have been implicated in limonoid biosynthesis based on expression profiling, in silico docking modelling, and phylogenetic analysis (12-17). However, these CYPs have not been functionally characterised and predictions of their activity are problematic without an understanding of the nature of the triterpene scaffold that they would act on.

The only limonoid biosynthetic enzyme whose function has been confirmed by recombinant expression is a limonoid UDP-glucosyltransferase from Citrus unshiu, which produces limonin-17-β-D-glucopyranoside (18).

The lack of enzymatic characterisation is not due to an absence of genetic information, as sequence data from limonoid-producing species of the Meliaceae (12, 14, 15, 19-24) and Rutaceae (http://www.citrusgenomedb.org/) (25) families are available. Rather, the lack of characterisation appears to reflect significant complexity and uncertainty surrounding the biosynthetic pathway.

JP2005052009 reports the cloning of a tirucalla-7,24-dien-3b-ol synthase, apparently from the tree of heaven (Ailanthus altissima, family Simaroubaceae). A. altissima is known to produce quassinoids but is not known to produce true limonoids.

Multifunctional OSC genes are known from Arabidopsis thaliana (AtLUP5, AtPEN3). The encoded enzymes may produce tirucalla-7,24-dien-3b-ol as part of their product profile (11, 27, 28).

However, given the significant challenges of chemical synthesis, and the lack of detailed information in relation to the limonoid pathway, it can be seen that characterisation of the enzymes involved in limonoid biosynthesis would provide a contribution to the art.

Disclosure of the invention

The present inventors have investigated three diverse limonoid-producing species (A. indica, Melia azedarach and Citrus sinensis) to elucidate the early steps in limonoid biosynthesis.

They have identified an oxidosqualene cyclase able to produce the potential 30-carbon triterpene scaffold precursor tirucalla-7,24-dien-3b-ol from each of the three species (termed herein AiOSC1, MaOSC1 and CsOSC1, from the limonoid-producing plant species A. indica, M. azedarach and C. sinensis, respectively).

Although the sequences of the three previously reported uncharacterised putative OSCs (two from A. indica and one from C. grandis) (12, 13) have not been deposited in publicly available databases, they appear to be phylogenetically distinct from the three tirucalla-7,24-dien-3b-ol synthases characterised here.

The presently characterised synthases are also clearly distinct from AtLUP5, the multifunctional OSC from A. thaliana, although another previously characterised multifunctional OSC from A. thaliana (AtPEN3) is located in a neighbouring subclade to the presently characterised synthases.

The present inventors have further identified co-expressed cytochrome P450 enzymes from M. azedarach (MaCYP71CD2 and MaCYP71BQ5), as well as orthologs or homologs of these from A. indica and C. sinensis (see Table S4), that are capable of three oxidations of tirucalla-7,24-dien-3b-ol, resulting in spontaneous hemiacetal ring formation and the production of the protolimonoid melianol.

The pathway shown in Fig. 5D is the proposed pathway for melianol biosynthesis in M. azedarach, showing the synthesis of tirucalla-7,24-dien-3b-ol (1) and ultimately the protolimonoid melianol (4). This biosynthetic scheme appears to be conserved across the Meliaceae and Rutaceae families.

The present work is believed to represent the first characterisation of protolimonoid biosynthetic enzymes from any plant species.

Furthermore the present inventors have successfully engineered the melianol biosynthetic pathway into heterologous organisms which are not otherwise melianol producers.

After the presently claimed priority date, some of this disclosure was published (Hodgson, Hannah, et al. "Identification of key enzymes responsible for protolimonoid biosynthesis in plants: Opening the door to azadirachtin production." Proceedings of the National Academy of Sciences 116.34 (2019): 17096-17104).

The present inventors have further identified limonoid biosynthetic genes for steps downstream of melianol. Candidate genes were selected from a novel genome assembly, based on their annotation and shared expression with melianol biosynthetic genes. A subset of genes was assessed for activity by co-expression with melianol biosynthetic genes in N. benthamiana. This approach has led to the identification of four diverse melianol-modifying enzymes (see Figure 19).

MaCYP88A108 is a homolog of AiCYP88A108, which was co-expressed with AiOSC1 in A. indica (Figure 3, Table S4).

MaISOM1 was identified as a potential sterol isomerase, inferred to be important during limonoid scaffold rearrangement.

Importantly, in combination these enzymes have been shown to structurally rearrange the internal scaffold of melianol to form a melianol derivative with a true limonoid internal scaffold.

The present inventors have further identified “tailoring enzymes”: a short-chain dehydrogenase/reductase (MaSDR1) and an acyltransferase (MaBAHD1). These are characterised as modifying melianol-type scaffolds by dehydrogenation and acetylation, respectively.

The methods and materials described herein can be used, inter alia, to produce recombinant host organisms (for example plants or microorganisms) which can produce limonoids or proto-limonoids even if they are not naturally produced by the wild-type host. The disclosure herein provides the means to engineer plants with enhanced insect resistance and produce high value limonoids for pharmaceutical and other applications by expression in heterologous hosts.

Thus in one aspect of the invention there is provided a method of converting a host from a phenotype whereby the host is unable to carry out melianol biosynthesis from 2,3-oxidosqualene (OS) to a phenotype whereby the host is able to carry out said melianol biosynthesis,

which method comprises the step of expressing a heterologous nucleic acid within the host or one or more cells thereof, following an earlier step of introducing the nucleic acid into the host or an ancestor of either,

wherein the heterologous nucleic acid comprises a plurality of nucleotide sequences each of which encodes a polypeptide which in combination have said melianol biosynthesis activity.

As explained below, in the methods described herein, one or more of the recited activities may be provided by enzymes or other polypeptides native to the host, provided at least one (e.g. one, two or three) is provided by heterologous nucleic acid of the invention. Hosts of the invention are therefore “non-naturally occurring”.

By way of example, using the illustrative scheme of Figure 5D, these polypeptides (enzymes) could comprise any one or more of the following:

(i) a specific or multifunctional tirucalla-7,24-dien-3b-ol synthase (TDS) for cyclisation of OS to tirucalla-7,24-dien-3b-ol;

(ii) an enzyme capable of oxidising tirucalla-7,24-dien-3b-ol or an oxidised derivative thereof at the C-23 position to a secondary alcohol and introducing an epoxide at the C24-C25 alkene (“C-23 oxidase/C23-C24 epoxidase”);

(iii) an enzyme capable of oxidising tirucalla-7,24-dien-3b-ol or an oxidised derivative thereof at the C-21 position to an alcohol or aldehyde (“C-21 oxidase”)

By way of non-limiting example:

Enzyme (ii) may be an enzyme capable of oxidising tirucalla-7,24-dien-3b-ol to dihydroniloticin.

Enzyme (iii) may be an enzyme capable of oxidising dihydroniloticin to tirucalla-7-ene-24,25-epoxy-3b,21,23-triol or 21-oxotirucalla-7-ene-24,25-epoxy-3b,23-diol.

Preferably the heterologous nucleic acid comprises all of (i) to (iii), but if one or more of those activities is present natively in the host, then that activity need not be provided heterologously.

In one embodiment at least enzymes (ii) and (iii) are provided as part of the method.
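The logic of supplying heterologously only those activities that the host lacks can be sketched in a few lines of Python. This is a hypothetical illustration: it assumes the linear three-step route described above (following the illustrative scheme of Figure 5D), assumes the host already supplies 2,3-oxidosqualene, reuses the activity labels from the text, and uses function names invented for the example.

```python
# Illustrative melianol route (cf. Figure 5D): (activity, substrate, product).
MELIANOL_ROUTE = [
    ("TDS", "2,3-oxidosqualene", "tirucalla-7,24-dien-3b-ol"),
    ("C-23 oxidase/C23-C24 epoxidase", "tirucalla-7,24-dien-3b-ol", "dihydroniloticin"),
    # C-21 oxidation is followed by spontaneous hemiacetal ring formation.
    ("C-21 oxidase", "dihydroniloticin", "melianol"),
]


def activities_to_supply(native: set) -> list:
    """Route activities absent from the host, in pathway order; these are the
    ones to be provided by heterologous nucleic acid."""
    return [activity for activity, _, _ in MELIANOL_ROUTE if activity not in native]


def can_make_melianol(native: set, heterologous: set) -> bool:
    """True if every step of the route is covered by native or introduced
    activities (assumes the host already supplies 2,3-oxidosqualene)."""
    available = native | heterologous
    return all(activity in available for activity, _, _ in MELIANOL_ROUTE)


# Example: a host with a native TDS only needs enzymes (ii) and (iii).
needed = activities_to_supply({"TDS"})
print(needed)  # ['C-23 oxidase/C23-C24 epoxidase', 'C-21 oxidase']
print(can_make_melianol({"TDS"}, set(needed)))  # True
```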

As explained hereinafter, it appears that the combination of introduction into tirucalla-7,24-dien-3b-ol of a secondary alcohol at C23, epoxidation of the C24-25 alkene and oxidation at C21 together causes spontaneous hemiacetal ring formation by nucleophilic attack of C21 to give melianol.

The formation of melianol suggests that the oxidation at C21 may produce an aldehyde at this position, since this would allow the formation of the melianol hemiacetal ring. Indeed melianol exists as an epimeric mixture in solution (37), with the hemiacetal ring opening and reforming with two different stereochemistries at C21.

In certain embodiments the C-23 oxidase/C23-C24 epoxidase and C-21 oxidase are CYP450 enzymes, which are optionally CYP71 enzymes.

Preferred genes or polypeptides (TDS, C-23 oxidase/C23-C24 epoxidase, and C-21 oxidase) for use in the practice of the invention are shown in Tables 1 or 2 herein (or the Sequence Annex) or are substantially homologous variants or fragments of these, having the requisite biological activity shown in the Tables, or in the Figures or Examples.

For example, TDS activity may be provided by any of the OSC1 sequences in Table 1, the A. altissima tirucalla-7,24-dien-3b-ol synthase of JP2005052009, or the A. thaliana AtLUP5 or AtPEN3 sequences of Table 2, or substantially homologous variants or fragments of these, having the requisite biological activity.

In preferred embodiments, one, two, or all three of the respective polypeptides are selected from the sequences listed in Table 1.

In one embodiment the polypeptides are selected from the list consisting of:

(i) the TDS shown in SEQ ID: No 2, 4 or 6;

(ii) the C-23 oxidase/C23-C24 epoxidase shown in SEQ ID: No , 10 or 12;

(iii) the C-21 oxidase shown in SEQ ID: No 14, 16 or 18;

or substantially homologous variants or fragments of any of said polypeptides.
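The rule stated above, that an activity already present natively in the host need not be supplied heterologously, amounts to a set difference over the three core pathway steps. The following Python sketch illustrates it under stated assumptions: the step labels and the example host profile are hypothetical inputs chosen for illustration, and the C-21 step is modelled as yielding melianol directly, on the assumption (discussed below) that hemiacetal formation is spontaneous.

```python
# Ordered core steps (enzyme label, substrate, product) for enzymes (i) to (iii).
# Labels are illustrative only; the C-21 step is shown yielding melianol because
# the oxidised intermediate is believed to cyclise spontaneously.
CORE_PATHWAY = [
    ("TDS", "2,3-oxidosqualene", "tirucalla-7,24-dien-3β-ol"),
    ("C-23 oxidase/C23-C24 epoxidase", "tirucalla-7,24-dien-3β-ol", "dihydroniloticin"),
    ("C-21 oxidase", "dihydroniloticin", "melianol"),
]


def activities_to_supply(native_activities: set[str]) -> list[str]:
    """Return, in pathway order, the activities the host does not have natively."""
    return [enzyme for enzyme, _substrate, _product in CORE_PATHWAY
            if enzyme not in native_activities]


def pathway_is_connected() -> bool:
    """Check that each step's product is the substrate of the next step."""
    return all(CORE_PATHWAY[i][2] == CORE_PATHWAY[i + 1][1]
               for i in range(len(CORE_PATHWAY) - 1))


if __name__ == "__main__":
    # Hypothetical host that already has TDS activity.
    print(activities_to_supply({"TDS"}))
    # -> ['C-23 oxidase/C23-C24 epoxidase', 'C-21 oxidase']
```

With an empty native set, all three activities are returned, matching the preferred case in which (i) to (iii) are all provided.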

The invention also provides a method of converting a host to a phenotype whereby the host is able to carry out said synthesis of a melianol-derivative, wherein the heterologous nucleic acid comprises a plurality of nucleotide sequences each of which encodes a polypeptide, wherein the polypeptides in combination have said melianol-derivative biosynthesis activity. Examples of melianol-derivatives include “true” limonoids; 7,8-epoxymelianol; melianol B; dehydrogenated melianol B, which is optionally melianone B; and acetylated melianol B, which is optionally acetoxy-melianol B.

By way of example, using the illustrative scheme of Figure 19, these polypeptides (enzymes) could comprise any one, two, three or all four of the following:

(i) an enzyme capable of oxidising melianol or an oxidised derivative thereof at C-7 and introducing an epoxide at C7-C8 (“C7-C8 epoxidase”);

(ii) an enzyme capable of converting the product obtained by exposing melianol to a C7-C8 epoxidase to melianol B (“7,8-epoxymelianol isomerase”);

(iii) an enzyme capable of dehydrogenating melianol B (“melianol B dehydrogenase”);

(iv) an enzyme capable of acetylating melianol B (“melianol B acetyltransferase”);

wherein each of the polypeptides is optionally obtained from the Meliaceae or Rutaceae families.

Preferably the nucleic acid encodes at least the C7-C8 epoxidase and the 7,8-epoxymelianol isomerase, and the melianol-derivative is a limonoid.
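Because the dehydrogenase and the acetyltransferase act on melianol B, they only contribute once the epoxidase and the isomerase have produced it, which is why those two enzymes form the preferred minimum. A breadth-first search over a simplified reading of steps (i) to (iv) makes this dependency explicit. This is a sketch, not a reproduction of Figure 19: the enzyme labels are illustrative strings, and melianone B and acetoxy-melianol B stand in for the dehydrogenated and acetylated products.

```python
from collections import deque

# (enzyme label, substrate, product) edges, simplified from steps (i) to (iv).
DERIVATIVE_STEPS = [
    ("C7-C8 epoxidase", "melianol", "7,8-epoxymelianol"),
    ("7,8-epoxymelianol isomerase", "7,8-epoxymelianol", "melianol B"),
    ("melianol B dehydrogenase", "melianol B", "melianone B"),
    ("melianol B acetyltransferase", "melianol B", "acetoxy-melianol B"),
]


def reachable_products(expressed: set[str], start: str = "melianol") -> set[str]:
    """Breadth-first search over steps whose enzyme is expressed in the host."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for enzyme, substrate, product in DERIVATIVE_STEPS:
            if enzyme in expressed and substrate == current and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


if __name__ == "__main__":
    minimum = {"C7-C8 epoxidase", "7,8-epoxymelianol isomerase"}
    print(sorted(reachable_products(minimum)))
    # -> ['7,8-epoxymelianol', 'melianol', 'melianol B']
```

Expressing only the dehydrogenase and acetyltransferase leaves melianol unchanged in this model, since neither enzyme has a substrate until melianol B is formed.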

Preferred genes or polypeptides (C7-C8 epoxidase, 7,8-epoxymelianol isomerase, melianol B dehydrogenase, and melianol B acetyltransferase) for use in the practice of the invention are shown in Table 1 (or in the Sequence Annex) or are substantially homologous variants or fragments of these, having the requisite biological activity described in Table 1 or in the corresponding Figures or Examples.

In one embodiment the polypeptides are selected from the list consisting of:

(i) the C7-C8 epoxidase shown in SEQ ID: No 34;

(ii) the 7,8-epoxymelianol isomerase shown in SEQ ID: No 37;

(iii) the melianol B dehydrogenase shown in SEQ ID: No 39;

(iv) the melianol B acetyltransferase shown in SEQ ID: No 41;

or substantially homologous variants or fragments of any of said polypeptides.

For brevity, in the context of the present invention, and in particular the methods and uses described herein, the nucleotide and polypeptide sequences of any of Tables 1 and 2 may be referred to herein as “melianol-biosynthesis (modifying) sequences” or “M-B sequences”, e.g. M-B genes and M-B polypeptides.

Variants

In addition to use of these M-B genes (and polypeptides) the invention encompasses use of variants of these genes (and polypeptides).

A “variant” M-B nucleic acid or M-B polypeptide molecule shares homology with, or is identical to, all or part of the M-B genes or polypeptides discussed herein.

A variant polypeptide shares the relevant biological activity of the native M-B polypeptide. A variant nucleic acid encodes the relevant variant polypeptide.

In this context the “biological activity” of the M-B polypeptide is the ability to catalyse the respective reaction shown in Fig. 5D or 19 and described above (e.g. the cyclase, oxidase, epoxidase, isomerase, dehydrogenase or acetyltransferase activity) and/or the activity set out in the respective Table, e.g. Table 1, 2 or 3. The relevant biological activities may be assayed in vitro based on the reactions shown in Fig. 5D. Alternatively, they can be assayed by activity in vivo as described in the Examples, i.e. by introduction of a plurality of heterologous constructs to generate melianol, which can be assayed by LC-MS or the like.

Table 4 shows pairwise comparisons of some of the enzymes described herein, obtained using Clustal Omega (version 1.2.4, accessed through https://www.ebi.ac.uk; see ref 20).

Variants of the sequences disclosed herein preferably share at least 50%, 55%, 56%, 57%, 58%, 59%, 60%, 65%, 70% or 80% identity, most preferably at least about 90%, 95%, 96%, 97%, 98% or 99% identity. Such variants may be referred to herein as “substantially homologous”.
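Percent identity can be defined in more than one way, and alignment tools differ in the denominator they use. As a minimal sketch, assuming a pairwise alignment is already available (for example exported from Clustal Omega), one common convention counts identical residues over the columns where neither sequence has a gap; the 50% default below is simply the lowest threshold listed above, and the function names are hypothetical. Clustal Omega's own reported values may be computed differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        identical += x == y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def is_substantially_homologous(aligned_a: str, aligned_b: str,
                                threshold: float = 50.0) -> bool:
    """True if identity meets the chosen threshold (50% is the lowest listed)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy aligned fragments: 5 identical residues over 6 ungapped columns.
    print(round(percent_identity("MKT-LLV", "MKSALLV"), 1))  # -> 83.3
```

Raising the threshold argument to 90.0 or higher reproduces the "most preferably" tiers.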

Preferred variants may be:

(i) Naturally occurring nucleic acids such as alleles (which will include polymorphisms or mutations at one or more bases) or pseudoalleles (which may occur at closely linked loci to the M-B genes of the invention). Also included are paralogues, isogenes, or other homologous genes belonging to the same families as the M-B genes of the invention. Also included are orthologues or homologues from other plant species.

Homology may be at the nucleotide sequence and/or amino acid sequence level, as discussed below.

(ii) Artificial nucleic acids, which can be prepared by the skilled person in the light of the present disclosure. Such derivatives may be prepared, for instance, by site directed or random mutagenesis, or by direct synthesis. Preferably the variant nucleic acid is generated either directly or indirectly (e.g. via one or more amplification or replication steps) from an original nucleic acid having all or part of the sequence of a M-B gene of the invention.

Also included are nucleic acids corresponding to those above, but which have been extended at the 3' or 5' terminus.

The term “M-B variant nucleic acid” as used herein encompasses all of these possibilities. When used in the context of polypeptides or proteins it indicates the encoded expression product of the variant nucleic acid.

In each case, the preferred melianol-biosynthesis modifying nucleic acids are any of SEQ ID Nos 1, 3, 5, 7, 9, 11, 13, 15, 17, 33, 35, 36, 38 or 40 or substantially homologous variants thereof.

The preferred melianol-biosynthesis modifying polypeptides are any of SEQ ID Nos 2, 4, 6, 8, 10, 12, 14, 16, 18, 34, 37, 39 or 41 or substantially homologous variants thereof. Other preferred melianol-biosynthesis modifying nucleic acids for use in the invention include SEQ ID No 19, or substantially homologous variants or fragments thereof. Other preferred melianol-biosynthesis modifying polypeptides are polypeptides encoded by any of these sequences or variants or fragments.

Supplementary genes

In embodiments of the invention, in addition to the M-B genes and variant nucleic acids of the invention described herein, it may be preferable to introduce additional genes which may affect flux of melianol-biosynthesis.

For example, mevalonate (MVA) is an important intermediate in triterpenoid synthesis. It may therefore be desirable to introduce and express rate-limiting MVA pathway genes in the host, to maximise yields of downstream products.

HMG-CoA reductase (HMGR) is believed to be a rate-limiting enzyme in the MVA pathway.

The use of a recombinant feedback-insensitive truncated form of HMGR (tHMGR) has been demonstrated to increase triterpene (β-amyrin) content upon transient expression in N. benthamiana [35].

Thus one embodiment of the invention comprises the use of a heterologous HMGR (e.g. a feedback-insensitive HMGR) along with the M-B genes described herein. Examples of HMGR encoding or polypeptide sequences include SEQ ID Nos 21 to 24, or variants or fragments of these. Variants may be homologues, alleles, or artificial derivatives etc. as discussed in relation to M-B genes or polypeptides as described above. For example an HMGR native to the host being utilised may be preferred - for example a yeast HMGR in a yeast host, and so on. HMGR genes are known in the art and may be selected, as appropriate in the light of the present disclosure.

It has also been reported that squalene synthase (SQS) catalyses a potential rate-limiting step [35].

Thus one embodiment of the invention comprises the use of a heterologous SQS along with the M-B genes and optionally HMGR described herein.

Examples of SQS encoding or polypeptide sequences include SEQ ID Nos 25 to 26, or variants or fragments of these. Variants may be homologues, alleles, or artificial derivatives etc. as discussed in relation to M-B genes or polypeptides as described above. For example an SQS native to the host being utilised may be preferred - for example a yeast SQS in a yeast host, and so on. SQS genes are known in the art and may be selected, as appropriate in the light of the present disclosure.

When using certain hosts (for example yeasts) it may be desirable to introduce additional genes to improve the flux of melianol-biosynthesis. Examples may include one or more plant cytochrome P450 reductases (CPRs) to serve as the redox partner to the introduced P450s. Thus one embodiment of the invention comprises the use of a heterologous cytochrome P450 reductase such as AtATR2 (Arabidopsis thaliana cytochrome P450 reductase 2) along with the M-B genes described herein.

Examples of AtATR2 encoding or polypeptide sequences include SEQ ID Nos 27 to 28, or variants or fragments of these. Variants may be homologues, alleles, or artificial derivatives etc. as discussed in relation to M-B genes or polypeptides as described above.

Thus in one embodiment the nucleic acid utilised in the invention further encodes one or more of the following polypeptides:

(i) an HMG-CoA reductase (HMGR);

(ii) a squalene synthase (SQS);

wherein the HMGR or SQS are optionally selected from the respective polypeptides in Table 3 or substantially homologous variants or fragments of any of said polypeptides, or are encoded by the respective polynucleotides in Table 3, or substantially homologous variants or fragments of any of said polynucleotides.

It will be understood by those skilled in the art, in the light of the present disclosure, that additional genes may be utilised in the practice of the invention, to provide additional activities and/or improve expression or activity. These include those expressing co-factor or helper proteins, or other factors.

For brevity, unless context demands otherwise, any of these nucleic acid sequences (the “M-B genes of the invention” and “M-B variant nucleic acids”, plus other genes affecting M-B synthesis or secondary modifications to melianol) may be referred to herein as “M-B nucleic acid” or “melianol-biosynthesis modifying nucleic acid”.

Likewise the encoded polypeptides may be referred to herein as “M-B polypeptides” or “melianol-biosynthesis modifying polypeptides”.

It will be appreciated that where these generic terms are used in relation to any aspect or embodiment, the meaning or disclosure will be taken to apply mutatis mutandis to any of these sequences individually.

Vectors

As one aspect of the invention there is disclosed a method employing the co-infiltration of a plurality of Agrobacterium tumefaciens strains, each carrying one or more of the M-B nucleic acids discussed above, for concerted expression thereof in a biosynthetic pathway discussed above.

In some embodiments at least 2 or 3 different Agrobacterium tumefaciens strains are co-infiltrated, e.g. each carrying a M-B nucleic acid.
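One common way to prepare a co-infiltration mix is to dilute each strain's culture so that every strain reaches the same chosen density in the final volume, i.e. solving C1 × V1 = C2 × V2 for each culture volume. The sketch below shows that arithmetic only. The use of OD600 as the density measure, the target value, the final volume and the strain names are all hypothetical, since no densities are specified in this description.

```python
def coinfiltration_volumes(stock_od: dict[str, float],
                           target_od_each: float,
                           final_volume_ml: float) -> tuple[dict[str, float], float]:
    """Return (culture volume per strain in mL, buffer volume in mL).

    Each strain is diluted so its density in the final mix equals
    target_od_each (C1 * V1 = C2 * V2, solved for V1 per strain).
    """
    volumes = {}
    for strain, od in stock_od.items():
        if od <= 0:
            raise ValueError(f"stock density for {strain!r} must be positive")
        volumes[strain] = target_od_each * final_volume_ml / od
    buffer_ml = final_volume_ml - sum(volumes.values())
    if buffer_ml < 0:
        raise ValueError("stock cultures too dilute for the requested mix")
    return volumes, buffer_ml


if __name__ == "__main__":
    # Hypothetical stock densities for three strains, each carrying one M-B gene.
    stocks = {"strain_TDS": 1.0, "strain_C23": 0.8, "strain_C21": 2.0}
    vols, buf = coinfiltration_volumes(stocks, target_od_each=0.2, final_volume_ml=10.0)
    for name, v in vols.items():
        print(f"{name}: {v:.2f} mL")
    print(f"buffer: {buf:.2f} mL")
```

With these illustrative numbers the mix uses 2.00, 2.50 and 1.00 mL of the three cultures plus 4.50 mL of buffer; the check on a negative buffer volume flags stocks that are too dilute to reach the target.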

The genes may be expressed from transient expression vectors. A preferred expression system utilises the so-called 'Hyper-Translatable' Cowpea Mosaic Virus ('CPMV-HT') system, described in WO2009/087391, the disclosure of which is specifically incorporated herein in support of the embodiments using the CPMV-HT system, for example vectors based on pEAQ-HT expression plasmids.

Thus the vectors (typically binary vectors) for use in the present invention will typically comprise an expression cassette comprising:

(i) a promoter, operably linked to

(ii) an enhancer sequence derived from the RNA-2 genome segment of a bipartite RNA virus, in which a target initiation site in the RNA-2 genome segment has been mutated;

(iii) a M-B nucleic acid sequence as described above;

(iv) a terminator sequence; and optionally

(v) a 3’ UTR located upstream of said terminator sequence.
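Read in order, elements (i) to (v) describe a 5'-to-3' arrangement: promoter, enhancer, M-B coding sequence, optional 3' UTR, terminator. Assuming the listing order is the physical order, a small validator can check a proposed cassette design. This is a sketch only; the element names are hypothetical labels, not identifiers from any vector design toolkit.

```python
from enum import IntEnum


class CassetteElement(IntEnum):
    """Cassette elements, numbered in their assumed 5'->3' order."""
    PROMOTER = 1
    ENHANCER = 2      # mutated RNA-2-derived enhancer, element (ii)
    MB_SEQUENCE = 3   # the M-B nucleic acid, element (iii)
    UTR3 = 4          # optional 3' UTR, element (v)
    TERMINATOR = 5    # element (iv)


REQUIRED = {CassetteElement.PROMOTER, CassetteElement.ENHANCER,
            CassetteElement.MB_SEQUENCE, CassetteElement.TERMINATOR}


def is_valid_cassette(elements: list[CassetteElement]) -> bool:
    """Each element at most once, all required present, in 5'->3' order."""
    if len(set(elements)) != len(elements):
        return False  # duplicated element
    if not REQUIRED.issubset(elements):
        return False  # a required element is missing
    return elements == sorted(elements)  # order must match the numbering


if __name__ == "__main__":
    E = CassetteElement
    print(is_valid_cassette([E.PROMOTER, E.ENHANCER, E.MB_SEQUENCE, E.UTR3, E.TERMINATOR]))  # -> True
```

Encoding the order in the enum values keeps the ordering rule in one place, so the 3' UTR is automatically required to sit upstream of the terminator.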

Further examples of vectors and expression systems useful in the practice of the invention are described in more detail hereinafter.

Hosts

In aspects of the invention a host may be converted from a phenotype whereby the host is unable to carry out effective melianol biosynthesis from OS to a phenotype whereby the host is able to carry out said melianol-biosynthesis, such that melianol can be recovered therefrom or utilised in vivo to synthesize downstream products.

Since the melianol precursor (2,3-oxidosqualene) is ubiquitous in higher plants due to its role in sterol biosynthesis, the present invention has wide applicability in plant hosts. As discussed herein, additional activities may be employed when practising the invention in microorganisms.

Example hosts include plants such as Nicotiana benthamiana and microorganisms such as yeast. These are discussed in more detail below.

The invention may comprise transforming the host with heterologous nucleic acid as described above by introducing the M-B nucleic acid into the host cell via a vector and causing or allowing recombination between the vector and the host cell genome to introduce a nucleic acid according to the present invention into the genome.

In another aspect of the invention there is provided a host cell transformed with a heterologous nucleic acid which comprises a plurality of nucleotide sequences each of which encodes a polypeptide which in combination have said melianol-biosynthesis activity,

wherein expression of said nucleic acid imparts to the transformed host the ability to carry out melianol-biosynthesis from OS, or improves said ability in the host. As explained above, one or more of the recited activities may be provided by enzymes or other polypeptides native to the host, provided at least one (e.g. one, two or three) nucleotide sequences are provided by heterologous M-B nucleic acid of the invention. Hosts of the invention are therefore non-naturally occurring.

The invention further encompasses a host cell transformed with nucleic acid or a vector as described above (e.g. comprising the melianol-biosynthesis modifying nucleotide sequences) especially a plant or a microbial cell. In the transgenic host cell (i.e. transgenic for the nucleic acid in question) the transgene may be on an extra-genomic vector or incorporated, preferably stably, into the genome. There may be more than one heterologous nucleotide sequence per haploid genome.

The methods and materials described herein can be used, inter alia, to generate stable crop-plants that accumulate melianol. Examples of plants include row crops such as sunflower, potato, canola, dry bean, field pea, flax, safflower, buckwheat, cotton, maize, soybeans, and sugar beets. Major crop-plants such as corn, wheat, oilseed rape and rice may also be preferred hosts.

Plants which include a plant cell according to the invention are also provided.

Production of products

The methods described above may be used to generate melianol in a heterologous host. The melianol will generally be non-naturally occurring in the host species into which the M-B nucleic acids are introduced.

Limonoids from the plants or methods of the invention may be isolated and commercially exploited.

The methods above may form part of, or be one step in, a method of producing downstream limonoids such as azadirachtin in a host. The method may comprise the steps of culturing the host (where it is a microorganism) or growing the host (where it is a plant), then harvesting it and purifying the melianol or a downstream product or derivative (e.g. azadirachtin) therefrom. The product thus produced forms a further aspect of the present invention. The utility of limonoids is described above.

Alternatively, melianol may be recovered to allow for further chemical synthesis of limonoids or limonoid-based compounds such as pharmaceuticals.

Novel genes of the invention

In support of the present invention, the present inventors have newly characterised or identified sequences from Meliaceae or Rutaceae families which are believed to be involved in the synthesis of limonoids (see SEQ. ID: Nos 1-18; 33-41).

In preferred embodiments, the methods of the present invention will include the use of one or more of these newly characterised M-B nucleic acids of the invention (e.g. one, two, or three such M-B nucleic acids) optionally in conjunction with the manipulation of other genes affecting melianol biosynthesis known in the art. These newly characterised M-B sequences from Meliaceae or Rutaceae families (SEQ. ID: Nos 1-18; 33-41) form aspects of the invention in their own right, as do derived variants and materials of these sequences, and methods of using them.

Some aspects and embodiments of the present invention will now be described in more detail.

Detailed description of the invention

The present inventors utilised a variety of genome and transcriptome approaches with Meliaceae and Rutaceae species to begin to elucidate the biosynthetic pathway to structurally complex and important limonoids such as azadirachtin. Phylogenetic analysis, gene expression analysis and metabolite profiling have been used to identify OSCs from Melia azedarach and Citrus sinensis, and CYPs from M. azedarach. Functional characterisation of candidate genes by heterologous expression in Saccharomyces cerevisiae or transient expression in Nicotiana benthamiana has led to the identification of three enzymes from M. azedarach that together are capable of biosynthesis of the C30 protolimonoid melianol. Identification of the corresponding three C. sinensis homologs supports the notion of conserved initial biosynthesis of limonoids in Meliaceae and Rutaceae species.

In different embodiments, the present invention provides means for manipulation of total levels of limonoids or protolimonoids such as melianol in host cells such as microorganisms or plants.

In one aspect of the present invention, the M-B modifying nucleic acid described above is in the form of a recombinant and preferably replicable vector.

“Vector” is defined to include, inter alia, any plasmid, cosmid, phage or Agrobacterium binary vector in double- or single-stranded, linear or circular form, which may or may not be self-transmissible or mobilizable, and which can transform a prokaryotic or eukaryotic host either by integration into the cellular genome or by existing extrachromosomally (e.g. an autonomously replicating plasmid with an origin of replication).

As is well known to those skilled in the art, a “binary vector” system includes (a) border sequences which permit the transfer of a desired nucleotide sequence into a plant cell genome; and (b) the desired nucleotide sequence itself, which will generally comprise an expression cassette of (i) a plant-active promoter, operably linked to (ii) the target sequence and/or enhancer as appropriate. The desired nucleotide sequence is situated between the border sequences and is capable of being inserted into a plant genome under appropriate conditions. The binary vector system will generally require other sequences (derived from A. tumefaciens) to effect the integration. Generally this may be achieved by use of so-called "agro-infiltration", which uses Agrobacterium-mediated transient transformation. Briefly, this technique is based on the property of Agrobacterium tumefaciens to transfer a portion of its DNA ("T-DNA") into a host cell, where it may become integrated into nuclear DNA. The T-DNA is defined by left and right border sequences which are around 21-23 nucleotides in length. The infiltration may be achieved e.g. by syringe (in leaves) or vacuum (whole plants). In the present invention the border sequences will generally be included around the desired nucleotide sequence (the T-DNA), with the one or more vectors being introduced into the plant material by agro-infiltration.
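In silico, the region expected to be transferred can be located by finding the border sequences that flank the expression cassette. The sketch below assumes a linear text representation in which the left border precedes the right border; it ignores wrap-around in circular plasmids and the details of border nicking, and it uses short toy border strings, not real border sequences. The function name is hypothetical.

```python
def extract_t_dna(plasmid: str, left_border: str, right_border: str) -> str:
    """Return the sequence strictly between the first left border and the
    next right border downstream of it (case-insensitive, linear input)."""
    seq = plasmid.upper()
    lb, rb = left_border.upper(), right_border.upper()
    lb_pos = seq.find(lb)
    if lb_pos == -1:
        raise ValueError("left border not found")
    start = lb_pos + len(lb)
    rb_pos = seq.find(rb, start)
    if rb_pos == -1:
        raise ValueError("right border not found downstream of left border")
    return seq[start:rb_pos]


if __name__ == "__main__":
    # Toy placeholder borders; the real border sequences are longer (about 21-23 nt).
    LB_TOY = "GTTTACAC"
    RB_TOY = "CACTTTAA"
    cassette = "ATGGCTGCTTAG"
    plasmid = "ccgg" + LB_TOY + cassette + RB_TOY + "ggcc"
    print(extract_t_dna(plasmid, LB_TOY, RB_TOY))  # -> ATGGCTGCTTAG
```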

Generally speaking, those skilled in the art are well able to construct vectors and design protocols for recombinant gene expression. Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate. For further details see, for example, Molecular Cloning: a Laboratory Manual: 2nd edition, Sambrook et al., 1989, Cold Spring Harbor Laboratory Press or Current Protocols in Molecular Biology, Second Edition, Ausubel et al. eds., John Wiley & Sons, 1992.

Specifically included are shuttle vectors, by which is meant a DNA vehicle capable, naturally or by design, of replication in two different host organisms, which may be selected from actinomycetes and related species, bacteria and eukaryotic (e.g. higher plant, moss, yeast or fungal) cells.

A vector including nucleic acid according to the present invention need not include a promoter or other regulatory sequence, particularly if the vector is to be used to introduce the nucleic acid into cells for recombination into the genome.

Preferably the nucleic acid in the vector is under the control of, and operably linked to, an appropriate promoter or other regulatory elements for transcription in a host cell such as a microbial, e.g. yeast and bacterial, or plant cell. The vector may be a bi-functional expression vector which functions in multiple hosts. In the case of genomic DNA, this may contain its own promoter or other regulatory elements (optionally in combination with a heterologous enhancer, such as the 35S enhancer discussed in the Examples below). The advantage of using a native promoter is that this may avoid pleiotropic responses. In the case of cDNA this may be under the control of an appropriate promoter or other regulatory elements for expression in the host cell.

By "promoter" is meant a sequence of nucleotides from which transcription may be initiated of DNA operably linked downstream (i.e. in the 3' direction on the sense strand of double-stranded DNA).

"Operably linked" means joined as part of the same nucleic acid molecule, suitably positioned and oriented for transcription to be initiated from the promoter. DNA operably linked to a promoter is "under transcriptional initiation regulation" of the promoter.

In a preferred embodiment, the promoter is an inducible promoter. The term "inducible" as applied to a promoter is well understood by those skilled in the art. In essence, expression under the control of an inducible promoter is "switched on" or increased in response to an applied stimulus. The nature of the stimulus varies between promoters. Some inducible promoters cause little or undetectable levels of expression (or no expression) in the absence of the appropriate stimulus. Other inducible promoters cause detectable constitutive expression in the absence of the stimulus. Whatever the level of expression is in the absence of the stimulus, expression from any inducible promoter is increased in the presence of the correct stimulus.

Thus nucleic acid according to the invention may be placed under the control of an externally inducible gene promoter to place expression under the control of the user. An advantage of introduction of a heterologous gene into a plant cell, particularly when the cell is comprised in a plant, is the ability to place expression of the gene under the control of a promoter of choice, in order to be able to influence gene expression, and therefore melianol biosynthesis, according to preference.

Furthermore, mutants and derivatives of the wild-type gene, e.g. with higher or lower activity than wild-type, may be used in place of the endogenous gene.

Thus this aspect of the invention provides a gene construct, preferably a replicable vector, comprising a promoter (optionally inducible) operably linked to a nucleotide sequence provided by the present invention, such as the melianol-biosynthesis modifying gene, most preferably one of the M-B nucleic acids which are described herein, or a derivative thereof.

Particularly of interest in the present context are nucleic acid constructs which operate as plant vectors. Specific procedures and vectors previously used with wide success upon plants are described by Guerineau and Mullineaux (1993) (Plant transformation and expression vectors. In: Plant Molecular Biology Labfax (Croy RRD ed.) Oxford, BIOS Scientific Publishers, pp 121-148). Suitable vectors may include plant viral-derived vectors (see e.g. EP-A-194809).

Preferably the vectors of the present invention which are for use in plants comprise border sequences which permit the transfer and integration of the expression cassette into the plant genome. Preferably the construct is a plant binary vector. Preferably the binary transformation vector is based on pPZP (Hajdukiewicz, et al. 1994). Other example constructs include pBin19 (see Frisch, D. A., L. W. Harris-Haller, et al. (1995). “Complete Sequence of the binary vector Bin 19.” Plant Molecular Biology 27: 405-409).

Suitable promoters which operate in plants include the Cauliflower Mosaic Virus 35S (CaMV 35S) promoter. Other examples are disclosed at page 120 of Lindsey & Jones (1989) “Plant Biotechnology in Agriculture” Pub. OU Press, Milton Keynes, UK. The promoter may be selected to include one or more sequence motifs or elements conferring developmental and/or tissue-specific regulatory control of expression. Inducible plant promoters include the ethanol-induced promoter of Caddick et al. (1998) Nature Biotechnology 16: 177-180. If desired, selectable genetic markers may be included in the construct, such as those that confer selectable phenotypes such as resistance to antibiotics or herbicides (e.g. kanamycin, hygromycin, phosphinotricin, chlorsulfuron, methotrexate, gentamycin, spectinomycin, imidazolinones and glyphosate). A positive selection system such as that described by Haldrup et al. (1998) Plant Molecular Biology 37, 287-296 may be used to make constructs that do not rely on antibiotics.

As explained above, a preferred vector is a 'CPMV-HT' vector as described in WO2009/087391. The Examples below demonstrate the use of these pEAQ-HT expression plasmids.

These vectors (typically binary vectors) for use in the present invention will typically comprise an expression cassette comprising:

(i) a promoter, operably linked to

(ii) an enhancer sequence derived from the RNA-2 genome segment of a bipartite RNA virus, in which a target initiation site in the RNA-2 genome segment has been mutated;

(iii) a M-B nucleic acid sequence as described above;

(iv) a terminator sequence; and optionally

(v) a 3’ UTR located upstream of said terminator sequence.

“Enhancer” sequences (or enhancer elements), as referred to herein, are sequences derived from (or sharing homology with) the RNA-2 genome segment of a bipartite RNA virus, such as a comovirus, in which a target initiation site has been mutated. Such sequences can enhance downstream expression of a heterologous ORF to which they are attached. Without limitation, it is believed that such sequences, when present in transcribed RNA, can enhance translation of a heterologous ORF to which they are attached.

A “target initiation site”, as referred to herein, is the initiation site (start codon) in a wild-type RNA-2 genome segment of a bipartite virus (e.g. a comovirus) from which the enhancer sequence in question is derived, which serves as the initiation site for the production (translation) of the longer of two carboxy-coterminal proteins encoded by the wild-type RNA-2 genome segment.

Typically the RNA virus will be a comovirus as described hereinbefore.

Most preferred vectors are the pEAQ vectors of WO2009/087391, which permit direct cloning by use of a polylinker between the 5’ leader and 3’ UTRs of an expression cassette including a translational enhancer of the invention, positioned on a T-DNA which also contains a suppressor of gene silencing and an NPTII cassette.

The presence of a suppressor of gene silencing in such gene expression systems is preferred but not essential. Suppressors of gene silencing are known in the art and described in WO/2007/135480. They include HcPro from Potato virus Y, HC-Pro from TEV, P19 from TBSV, rgsCam, B2 protein from FHV, the small coat protein of CPMV, and coat protein from TCV. A preferred suppressor when producing stable transgenic plants is the P19 suppressor incorporating an R43W mutation.

The present invention also provides methods comprising introduction of such a construct into a plant cell or a microbial (e.g. bacterial, yeast or fungal) cell and/or induction of expression of a construct within a plant cell, by application of a suitable stimulus e.g. an effective exogenous inducer.

As an alternative to microorganisms, cell suspension cultures of engineered limonoid-producing plant species, including also the moss Physcomitrella patens, may be cultured in fermentation tanks (see e.g. Grotewold et al., Engineering Secondary Metabolites in Maize Cells by Ectopic Expression of Transcription Factors, Plant Cell, 10, 721-740, 1998).

In a further aspect of the invention, there is disclosed a host cell containing a heterologous construct according to the present invention, especially a plant or a microbial cell.

The discussion of host cells above in relation to reconstitution of melianol biosynthesis in heterologous organisms applies mutatis mutandis here.

Thus a further aspect of the present invention provides a method of transforming a plant cell involving introduction of a construct as described above into a plant cell and causing or allowing recombination between the vector and the plant cell genome to introduce a nucleic acid according to the present invention into the genome.

The invention further encompasses a host cell transformed with nucleic acid or a vector according to the present invention (e.g. comprising the melianol-biosynthesis modifying nucleotide sequence) especially a plant or a microbial cell. In the transgenic plant cell (i.e. transgenic for the nucleic acid in question) the transgene may be on an extra-genomic vector or incorporated, preferably stably, into the genome. There may be more than one heterologous nucleotide sequence per haploid genome.

Yeast has seen extensive employment as a triterpene-producing host and is therefore potentially well adapted for melianol biosynthesis.

Therefore in one embodiment, the host is a yeast. For such hosts, it may be desirable to introduce additional genes to improve the flux of melianol production as described above. Examples may include one or more plant cytochrome P450 reductases (CPRs) to serve as the redox partner to the introduced P450s, as well as an HMGR.

Plants, which include a plant cell transformed as described above, form a further aspect of the invention. If desired, following transformation of a plant cell, a plant may be regenerated, e.g. from single cells, callus tissue or leaf discs, as is standard in the art. Almost any plant can be entirely regenerated from cells, tissues and organs of the plant.

Available techniques are reviewed in Vasil et al., Cell Culture and Somatic Cell Genetics of Plants, Vol I, II and III, Laboratory Procedures and Their Applications, Academic Press, 1984, and Weissbach and Weissbach, Methods for Plant Molecular Biology, Academic Press, 1989.

In addition to the regenerated plant, the present invention embraces all of the following: a clone of such a plant, seed, selfed or hybrid progeny and descendants (e.g. F1 and F2 descendants). The invention also provides a plant propagule from such plants, that is any part which may be used in reproduction or propagation, sexual or asexual, including cuttings, seed and so on. It also provides any part of these plants (e.g. leaf, stem, dried or ground product, edible portion etc.), which in all cases include the plant cell or heterologous melianol-biosynthesis modifying DNA described above.

The present invention also encompasses the expression product of any of the coding melianol-biosynthesis modifying nucleic acid sequences disclosed, and methods of making the expression product by expression from encoding nucleic acid therefor under suitable conditions, which may be in suitable host cells.

As described below, plant backgrounds such as those above may be natural or transgenic e.g. for one or more other genes relating to melianol biosynthesis, or otherwise affecting that phenotype or trait.

In modifying the host phenotypes, the M-B nucleic acids described herein may be used in combination with any other gene, such as transgenes affecting the rate or yield of melianol, or its modification, or any other phenotypic trait or desirable property.

By use of a combination of genes, plants or microorganisms (e.g. bacteria, yeasts or fungi) can be tailored to enhance production of desirable precursors, or reduce undesirable metabolism.

As an alternative, down-regulation of genes in the host may be desired e.g. to reduce undesirable metabolism or fluxes which might impact on M-B yield.

Such down-regulation may be achieved by methods known in the art, for example using anti-sense technology.

In using anti-sense genes or partial gene sequences to down-regulate gene expression, a nucleotide sequence is placed under the control of a promoter in a "reverse orientation" such that transcription yields RNA which is complementary to normal mRNA transcribed from the "sense" strand of the target gene. See, for example, Rothstein et al., 1987; Smith et al., (1988) Nature 334, 724-726; Zhang et al., (1992) The Plant Cell 4, 1575-1588; English et al., (1996) The Plant Cell 8, 179-188. Antisense technology is also reviewed in Bourque, (1995), Plant Science 105, 125-149, and Flavell, (1994) PNAS USA 91, 3490-3496.
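The effect of cloning a sequence in "reverse orientation" can be illustrated in a few lines of Python. This is a minimal sketch that treats transcription as a string operation: because the promoter now reads the opposite strand, the transcript is the reverse complement of the sense mRNA. The function names and the example sequence are hypothetical, and ambiguity codes, introns and partial or mismatched antisense constructs are not handled.

```python
# Complement table for unambiguous DNA bases (A, C, G, T only).
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def sense_mrna(coding_dna: str) -> str:
    """mRNA from the gene in its normal orientation (coding strand, T -> U)."""
    return coding_dna.upper().replace("T", "U")


def antisense_transcript(coding_dna: str) -> str:
    """Transcript obtained when the same sequence is cloned in reverse
    orientation behind the promoter: the reverse complement, written as RNA."""
    return reverse_complement(coding_dna).replace("T", "U")


def fully_complementary(rna_a: str, rna_b: str) -> bool:
    """True if the two RNAs can pair antiparallel along their whole length."""
    return len(rna_a) == len(rna_b) and all(
        RNA_PAIR[x] == y for x, y in zip(rna_a, reversed(rna_b))
    )


target = "ATGGCTTCAGGA"  # hypothetical fragment of a target gene
print(sense_mrna(target))            # AUGGCUUCAGGA
print(antisense_transcript(target))  # UCCUGAAGCCAU
print(fully_complementary(sense_mrna(target), antisense_transcript(target)))  # True
```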

An alternative to anti-sense is to use a copy of all or part of the target gene inserted in sense, that is the same, orientation as the target gene, to achieve reduction in expression of the target gene by co-suppression. See, for example, van der Krol et al., (1990) The Plant Cell 2, 291-299; Napoli et al., (1990) The Plant Cell 2, 279-289; Zhang et al., (1992) The Plant Cell 4, 1575-1588, and US-A-5,231,020. Further refinements of the gene silencing or co-suppression technology may be found in WO95/34668 (Biosource); Angell & Baulcombe (1997) The EMBO Journal 16, 12: 3675-3684; and Voinnet & Baulcombe (1997) Nature 389: 553.

Double-stranded RNA (dsRNA) has been found to be even more effective in gene silencing than either sense or antisense strands alone (Fire A. et al., Nature, Vol 391 (1998)). dsRNA-mediated silencing is gene-specific and is often termed RNA interference (RNAi) (see also Fire (1999) Trends Genet. 15: 358-363; Sharp (2001) Genes Dev. 15: 485-490; Hammond et al. (2001) Nature Rev. Genet. 2: 1110-1119; and Tuschl (2001) ChemBioChem 2: 239-245).

RNA interference is a two-step process. First, dsRNA is cleaved within the cell to yield short interfering RNAs (siRNAs) of about 21-23 nt in length, with 5' terminal phosphates and short (~2 nt) 3' overhangs. Second, the siRNAs specifically target the corresponding mRNA sequence for destruction (Zamore P.D. Nature Structural Biology, 8, 9, 746-750 (2001)).
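The duplex geometry described above can be sketched in Python. The sketch assumes an idealised Dicer product cut from a perfectly base-paired dsRNA copy of the target mRNA, with 21-nt strands, a 19-bp paired region and a 2-nt 3' overhang on each strand (within the stated ~21-23 nt range). The function name and example sequence are hypothetical, and real Dicer cleavage positions and guide-strand selection are not modelled.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(rna: str) -> str:
    return "".join(RNA_PAIR[b] for b in reversed(rna))


def sirna_duplex(mrna: str, start: int, duplex_bp: int = 19, overhang: int = 2):
    """Idealised siRNA cut from a perfect dsRNA copy of `mrna`.

    Passenger (sense) strand: mrna[start : start + duplex_bp + overhang].
    Guide (antisense) strand: reverse complement of the same-length window
    shifted `overhang` nt towards the 5' end of the mRNA. Each strand is
    duplex_bp + overhang nt long and carries an `overhang`-nt 3' overhang.
    """
    length = duplex_bp + overhang
    if start < overhang or start + length > len(mrna):
        raise ValueError("window does not fit inside the mRNA")
    passenger = mrna[start:start + length]
    guide = reverse_complement_rna(mrna[start - overhang:start + duplex_bp])
    return passenger, guide


mrna = "GGAUGGCUUCAGGAACCUUGGCAAUCGGAA"  # hypothetical 30-nt target region
passenger, guide = sirna_duplex(mrna, start=2)
print(passenger, len(passenger))
print(guide, len(guide))
```

The 19 paired positions are the first 19 nt of the passenger strand and the last 19 nt of the reverse complement of the guide; the remaining 2 nt at the 3' end of each strand form the overhangs.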

Another methodology known in the art for down-regulation of target sequences is the use of “microRNA” (miRNA), e.g. as described by Schwab et al. 2006, Plant Cell 18, 1121-1133. This technology employs artificial miRNAs, which may be encoded by stem-loop precursors incorporating suitable oligonucleotide sequences; such sequences can be generated using well-defined rules in the light of the disclosure herein.

Thus in one aspect the invention provides a method for influencing or affecting limonoid or protolimonoid biosynthesis in a host, which method comprises any of the following steps of:

(i) causing or allowing transcription from a nucleic acid comprising the complement sequence of a M-B nucleotide sequence such as to reduce the respective encoded polypeptide activity by an antisense mechanism;

(ii) causing or allowing transcription from a nucleic acid encoding a stem-loop precursor comprising 20-25 nucleotides, optionally including one or more mismatches, of a M-B nucleotide sequence such as to reduce the respective encoded polypeptide activity by an miRNA mechanism;

(iii) causing or allowing transcription from nucleic acid encoding double-stranded RNA corresponding to 20-25 nucleotides, optionally including one or more mismatches, of a M-B nucleotide sequence such as to reduce the respective encoded polypeptide activity by an siRNA mechanism.

The methods of the present invention embrace both the in vitro and in vivo production, or manipulation, of one or more limonoids. For example, M-B polypeptides may be employed in fermentation via expression in microorganisms such as E. coli, yeast and filamentous fungi. In one embodiment, one or more newly characterised M-B sequences of the present invention may be used in these organisms in conjunction with one or more other biosynthetic genes.

In vivo methods are described extensively above, and generally involve the step of causing or allowing the transcription of, and then translation from, a recombinant nucleic acid molecule encoding the M-B polypeptides.

In other aspects of the invention, the M-B polypeptides (enzymes) may be used in vitro, for example in isolated, purified, or semi-purified form. Optionally they may be the product of expression of a recombinant nucleic acid molecule.

Newly characterised sequences from Meliaceae or Rutaceae families

As noted above, in support of the present invention, the inventors have identified genes from the Meliaceae or Rutaceae families which are believed to encode polypeptides which affect melianol biosynthesis (see SEQ ID Nos 1-18 and 33-41 in Table 1).

In certain aspects of the present invention, the M-B nucleic acid is derived from the Meliaceae or Rutaceae families (SEQ ID Nos 1-18 and 33-41).

The above newly characterised limonoid or protolimonoid biosynthetic genes from Meliaceae or Rutaceae families thus form aspects of the present invention in their own right.

In a further aspect of the present invention there are disclosed nucleic acids which are variants of the M-B nucleic acid derived from Meliaceae or Rutaceae families discussed above.

Such variants, as with the native M-B genes discussed herein, may be used to alter the limonoid (e.g. melianol or melianol derivative) content of a plant, as assessed by the methods disclosed herein. For instance a variant nucleic acid may include a sequence encoding a variant M-B polypeptide sharing the relevant biological activity of the native M-B polypeptide, as discussed above. Examples include variants of any of SEQ ID Nos 2, 4, 6, 8, 10, 12, 14, 16 or 18.

Derivatives

Described herein are methods of producing a derivative nucleic acid comprising the step of modifying any of the M-B genes of the present invention disclosed above, particularly the M-B sequences from Meliaceae or Rutaceae families such as A. indica, M. azedarach or C. sinensis. Changes may be desirable for a number of reasons. For instance they may introduce or remove restriction endonuclease sites or alter codon usage. This may be particularly desirable where the M-B genes are to be expressed in alternative hosts e.g. microbial hosts such as yeast. Methods of codon optimizing genes for this purpose are known in the art (see e.g. Elena, Claudia, et al. "Expression of codon optimized genes in microbial systems: current industrial applications and perspectives." Frontiers in microbiology 5 (2014)). Thus sequences described herein including codon modifications to maximise yeast expression represent specific embodiments of the invention.
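Codon modification of this kind can be sketched as a simple "one preferred codon per amino acid" substitution. This Python sketch is illustrative only: the PREFERRED table is a hypothetical placeholder, not measured yeast codon usage, and practical optimisation tools also weigh codon-usage distributions, GC content, repeats and restriction sites. It does show the key constraint, namely that the encoded polypeptide is unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}

# Hypothetical "preferred codon" per amino acid, for illustration only.
# A real design would derive this from codon-usage data for the chosen host.
PREFERRED = {"A": "GCT", "G": "GGT", "L": "TTG", "R": "AGA", "S": "TCT", "K": "AAG"}


def translate(cds: str) -> str:
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimise_codons(cds: str, preferred: dict = PREFERRED) -> str:
    """Swap each codon for the listed synonymous codon, if there is one."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        new = preferred.get(aa, codon)
        if CODON_TABLE[new] != aa:  # guard against a mistyped table entry
            raise ValueError(f"{new} does not encode {aa}")
        out.append(new)
    return "".join(out)


original = "ATGCTGCGCGGCAAATAA"  # hypothetical ORF: M L R G K stop
optimised = optimise_codons(original)
print(optimised)                                    # ATGTTGAGAGGTAAGTAA
print(translate(original) == translate(optimised))  # True
```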

Alternatively changes to a sequence may produce a derivative by way of one or more (e.g. several) of addition, insertion, deletion or substitution of one or more nucleotides in the nucleic acid, leading to the addition, insertion, deletion or substitution of one or more (e.g. several) amino acids in the encoded polypeptide.

Such changes may modify sites which are required for post-translational modification, such as cleavage sites in the encoded polypeptide or motifs in the encoded polypeptide for phosphorylation, etc. Leader or other targeting sequences (e.g. membrane or Golgi locating sequences) may be added to the expressed protein to determine its location following expression if it is desired to isolate it from a microbial system.

Other desirable changes may be made by random or site-directed mutagenesis in order to alter the activity (e.g. specificity) or stability of the encoded polypeptide. Changes may be by way of conservative variation, i.e. substitution of one hydrophobic residue such as isoleucine, valine, leucine or methionine for another, or the substitution of one polar residue for another, such as arginine for lysine, glutamic for aspartic acid, or glutamine for asparagine. As is well known to those skilled in the art, altering the primary structure of a polypeptide by a conservative substitution may not significantly alter the activity of that peptide, because the side chain of the amino acid which is inserted into the sequence may be able to form similar bonds and contacts as the side chain of the amino acid which has been substituted out. This is so even when the substitution is in a region which is critical in determining the peptide's conformation. Also included are variants having non-conservative substitutions. As is well known to those skilled in the art, substitutions to regions of a peptide which are not critical in determining its conformation may not greatly affect its activity, because they do not greatly alter the peptide's three-dimensional structure. In regions which are critical in determining the peptide's conformation or activity, such changes may confer advantageous properties on the polypeptide. Indeed, changes such as those described above may confer slightly advantageous properties on the peptide, e.g. altered stability or specificity.
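The distinction between conservative and non-conservative substitutions can be illustrated by comparing an aligned native sequence with a variant. In this minimal Python sketch the conservative groups are limited to the examples given in the text (I/V/L/M, R/K, E/D, Q/N); any other change is labelled non-conservative, which is a simplification, and the sequences are hypothetical.

```python
# Interchangeable groups taken only from the examples in the text.
CONSERVATIVE_GROUPS = [set("IVLM"), set("RK"), set("ED"), set("QN")]


def is_conservative(a: str, b: str) -> bool:
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(native: str, variant: str):
    """List (position, native residue, variant residue, kind) for each change.

    Both sequences must already be aligned without gaps (equal length);
    positions are 1-based.
    """
    if len(native) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    changes = []
    for pos, (a, b) in enumerate(zip(native, variant), start=1):
        if a != b:
            kind = "conservative" if is_conservative(a, b) else "non-conservative"
            changes.append((pos, a, b, kind))
    return changes


for change in classify_substitutions("MKLVEQ", "MRIVDW"):  # hypothetical pair
    print(change)
```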

Fragments

The present invention may utilise fragments of the polypeptides encoded by the M-B genes of the present invention disclosed above, particularly the M-B sequences from the Meliaceae or Rutaceae families such as A. indica, M. azedarach or C. sinensis. Thus the present invention provides for the production and use of fragments of the full-length M-B polypeptides of the invention disclosed herein, especially active portions thereof. An “active portion” of a polypeptide means a peptide which is shorter than said full-length polypeptide, but which retains its essential biological activity.

A “fragment” of a polypeptide means a stretch of amino acid residues of at least about five to seven contiguous amino acids, often at least about seven to nine contiguous amino acids, typically at least about nine to 13 contiguous amino acids and, most preferably, at least about 20 to 30 or more contiguous amino acids.
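The fragment definition above can be expressed as a simple check. This is a sketch assuming that a fragment is any contiguous stretch of the full-length sequence at least min_len residues long, with 5 used as the lower bound of the "about five to seven" range; the sequences are hypothetical and no test of biological activity is implied.

```python
def is_fragment(peptide: str, full_length: str, min_len: int = 5) -> bool:
    """True if `peptide` is a contiguous stretch of `full_length` of at
    least `min_len` residues."""
    return len(peptide) >= min_len and peptide in full_length


def all_fragments(full_length: str, length: int) -> list:
    """Every contiguous window of the given length, from N- to C-terminus."""
    return [full_length[i:i + length] for i in range(len(full_length) - length + 1)]


protein = "MKLVEQGA"  # hypothetical sequence
print(is_fragment("KLVEQ", protein))  # True
print(is_fragment("KLEQG", protein))  # False: not contiguous in the protein
print(all_fragments(protein, 5))
```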

Fragments of the polypeptides may include one or more epitopes useful for raising antibodies to a portion of any of the amino acid sequences disclosed herein.

Preferred epitopes are those to which antibodies are able to bind specifically, which may be taken to be binding a polypeptide or fragment thereof of the invention with an affinity which is at least about 1000x that of other polypeptides.
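The 1000x affinity criterion can be restated in terms of dissociation constants. The sketch below assumes that affinity is taken as 1/Kd, so a 1000-fold higher affinity for the target corresponds to a Kd for every other polypeptide that is at least 1000 times the Kd for the target; the function name and Kd values are hypothetical.

```python
from typing import Iterable


def meets_specificity(kd_target: float, kd_others: Iterable[float],
                      fold: float = 1000.0) -> bool:
    """Affinity taken as 1/Kd: 'at least `fold` times the affinity for the
    target' means Kd(other) / Kd(target) >= fold for every other polypeptide.
    All Kd values must be in the same units."""
    if kd_target <= 0:
        raise ValueError("Kd must be positive")
    return all(kd_other / kd_target >= fold for kd_other in kd_others)


# Hypothetical Kd values in nM, with 0.5 nM for the target polypeptide.
print(meets_specificity(0.5, [800.0, 2500.0]))  # 1600x and 5000x -> True
print(meets_specificity(0.5, [200.0]))          # 400x -> False
```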

For brevity, any of these M-B sequences from the Meliaceae or Rutaceae families (A. indica, M. azedarach or C. sinensis), or variants thereof (e.g. derivatives such as fragments), may be referred to as “MRM-B sequences (or nucleic acids, or polypeptides)”. These MRM-B polypeptides, and nucleic acids encoding them, form one aspect of the invention.

It will be appreciated that where this term is used generally, it also applies to any of these sequences individually.

Thus in one aspect of the invention, there is disclosed isolated nucleic acid encoding any of these polypeptides (SEQ ID Nos 2, 4, 6, 8, 10, 12, 14, 16, 18, 34, 37, 39 or 41).

Preferably this nucleic acid may have the sequence of SEQ ID No 1, 3, 5, 7, 9, 11, 13, 15, 17, 33, 35, 36, 38 or 40. Other nucleic acids of the invention include those which are degeneratively equivalent to these, or homologous variants (e.g. derivatives) of these.
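Degenerative equivalence can be checked by translating both sequences with the genetic code. This Python sketch assumes the standard nuclear genetic code, an in-frame coding sequence starting at the first base, and DNA written with T; the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def translate(cds: str) -> str:
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def degeneratively_equivalent(seq_a: str, seq_b: str) -> bool:
    """True if both coding sequences encode the same polypeptide."""
    return translate(seq_a) == translate(seq_b)


print(degeneratively_equivalent("ATGCTGCGC", "ATGTTAAGA"))  # True  (M-L-R)
print(degeneratively_equivalent("ATGCTGCGC", "ATGCTGAAA"))  # False (M-L-K)
```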

Aspects of the invention further embrace isolated nucleic acid comprising a sequence which is complementary to any of those discussed hereinafter.

Use of a MRM-B sequence to catalyse its respective biological activity (e.g. as described in Fig. 5D or Fig. 19) forms another aspect of the invention.

Thus the invention further provides a method of influencing or affecting limonoid e.g. melianol biosynthesis in a host such as a plant, the method including causing or allowing transcription of a heterologous MRM-B nucleic acid as discussed above within the cells of the plant. The step may be preceded by the earlier step of introduction of the MRM-B nucleic acid into a cell of the plant or an ancestor thereof.

Such methods will usually form a part of, possibly one step in, a method of producing a limonoid or protolimonoid, e.g. melianol, in a host such as a plant. Preferably the method will employ a M-B modifying polypeptide of the present invention (e.g. in Table 1) or derivative thereof, as described above, or nucleic acid encoding either.

In a further embodiment, there are provided antibodies raised to MRM-B polypeptides or peptides of the invention.

Some aspects of the invention as they relate to heterologous reconstitution of the biosynthetic pathways discussed above will now be described in more detail.

“Nucleic acid” according to the present invention may include cDNA, RNA, genomic DNA and modified nucleic acids or nucleic acid analogs (e.g. peptide nucleic acid). Where a DNA sequence is specified, e.g. with reference to a figure, unless context requires otherwise the RNA equivalent, with U substituted for T where it occurs, is encompassed. Nucleic acid molecules according to the present invention may be provided isolated and/or purified from their natural environment, in substantially pure or homogeneous form, or free or substantially free of other nucleic acids of the species of origin, and double or single stranded. Where used herein, the term “isolated” encompasses all of these possibilities. The nucleic acid molecules may be wholly or partially synthetic. In particular they may be recombinant in that nucleic acid sequences which are not found together in nature (do not run contiguously) have been ligated or otherwise combined artificially. Nucleic acids may comprise, consist, or consist essentially of, any of the sequences discussed hereinafter.

The term "heterologous" is used broadly herein to indicate that the gene/sequence of nucleotides in question (e.g. encoding melianol-biosynthesis modifying polypeptides) has been introduced into said cells of the host or an ancestor thereof using genetic engineering, i.e. by human intervention. Nucleic acid heterologous to a host cell will be non-naturally occurring in cells of that type, variety or species. Thus the heterologous nucleic acid may comprise a coding sequence of or derived from a particular type of plant cell or species or variety of plant, placed within the context of a plant cell of a different type or species or variety of plant. A further possibility is for a nucleic acid sequence to be placed within a cell in which it or a homologue is found naturally, but wherein the nucleic acid sequence is linked and/or adjacent to nucleic acid which does not occur naturally within the cell, or cells of that type or species or variety of plant, such as operably linked to one or more regulatory sequences, such as a promoter sequence, for control of expression.

“Transformed” in this context means that the nucleotide sequences of the heterologous nucleic acid alter one or more of the cell’s characteristics, and hence its phenotype, e.g. with respect to biosynthesis of a limonoid or protolimonoid such as melianol or a melianol derivative. Such transformation may be transient or stable. As noted above, the term “protolimonoid” refers to immediate precursors of limonoids, formed by oxidation of the tirucalla-7,24-dien-3β-ol scaffold by suitable CYP enzymes (see Figure 1).

“Unable to carry out melianol biosynthesis” means that the host, prior to the conversion, does not, or is not believed to, naturally produce detectable or recoverable levels of melianol under normal metabolic circumstances of that host.

The nucleotide sequence information provided herein may be used to design probes and primers for probing or amplification. An oligonucleotide for use in probing or PCR may be about 30 or fewer nucleotides in length (e.g. 18, 21 or 24). Generally, specific primers are upwards of 14 nucleotides in length. For optimum specificity and cost effectiveness, primers of 16-24 nucleotides in length may be preferred. Those skilled in the art are well versed in the design of primers for use in processes such as PCR. If required, probing can be done with entire restriction fragments of the gene disclosed herein, which may be hundreds or even thousands of nucleotides in length. Small variations may be introduced into the sequence to produce ‘consensus’ or ‘degenerate’ primers if required.
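Degenerate primers of the kind mentioned above are usually written with IUPAC ambiguity codes. The Python sketch below expands such a primer into the individual sequences it represents and checks the 16-24 nt length range suggested in the text; the primer sequences are hypothetical examples.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """All concrete sequences represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in the degenerate primer pool."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def preferred_length(primer: str, low: int = 16, high: int = 24) -> bool:
    return low <= len(primer) <= high


primer = "ATGGCNGARTTYAARCCNGG"  # hypothetical 20-nt degenerate primer
print(degeneracy(primer), preferred_length(primer))  # 128 True
```

Keeping degeneracy low (for example by placing ambiguity codes at codon third positions only where needed) limits how many mismatched variants compete in the pool.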

Probing may employ the standard Southern blotting technique. For instance, DNA may be extracted from cells and digested with different restriction enzymes. Restriction fragments may then be separated by electrophoresis on an agarose gel, before denaturation and transfer to a nitrocellulose filter. Labelled probe may be hybridised to the single-stranded DNA fragments on the filter and binding determined. DNA for probing may be prepared from RNA preparations from cells. Probing may optionally be done by means of so-called ‘nucleic acid chips’ (see Marshall & Hodgson (1998) Nature Biotechnology 16: 27-31, for a review).

In one embodiment, a variant encoding a melianol-biosynthesis modifying polypeptide in accordance with the present invention is obtainable by means of a method which includes:

(a) providing a preparation of nucleic acid, e.g. from plant cells. Test nucleic acid may be provided from a cell as genomic DNA, cDNA or RNA, or a mixture of any of these, preferably as a library in a suitable vector. If genomic DNA is used the probe may be used to identify untranscribed regions of the gene (e.g. promoters etc.), such as are described hereinafter,

(b) providing a nucleic acid molecule which is a probe or primer as discussed above,

(c) contacting nucleic acid in said preparation with said nucleic acid molecule under conditions for hybridisation of said nucleic acid molecule to any said gene or homologue in said preparation, and,

(d) identifying said gene or homologue if present by its hybridisation with said nucleic acid molecule. Binding of a probe to target nucleic acid (e.g. DNA) may be measured using any of a variety of techniques at the disposal of those skilled in the art. For instance, probes may be radioactively, fluorescently or enzymatically labelled. Other methods not employing labelling of probe include amplification using PCR (see below), RNase cleavage and allele-specific oligonucleotide probing. The identification of successful hybridisation is followed by isolation of the nucleic acid which has hybridised, which may involve one or more steps of PCR or amplification of a vector in a suitable host.

Preliminary experiments may be performed by hybridising under low stringency conditions. For probing, preferred conditions are those which are stringent enough for there to be a simple pattern with a small number of hybridisations identified as positive which can be investigated further. For example, hybridizations may be performed according to the method of Sambrook et al. (below) using a hybridization solution comprising: 5X SSC (wherein 1X ‘SSC’ = 0.15 M sodium chloride; 0.015 M sodium citrate; pH 7), 5X Denhardt’s reagent, 0.5-10% SDS, 100 µg/ml denatured, fragmented salmon sperm DNA, 0.05% sodium pyrophosphate and up to 50% formamide. Hybridization is carried out at 37-42°C for at least six hours. Following hybridization, filters are washed as follows: (1) 5 minutes at room temperature in 2X SSC and 1% SDS; (2) 15 minutes at room temperature in 2X SSC and 0.1% SDS; (3) 30 minutes to 1 hour at 37°C in 1X SSC and 1% SDS; (4) 2 hours at 42-65°C in 1X SSC and 1% SDS, changing the solution every 30 minutes.

One common formula for calculating the stringency conditions required to achieve hybridization between nucleic acid molecules of a specified sequence homology is (Sambrook et al., 1989):

$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\,\mathrm{G{+}C}) - 0.63\,(\%\,\mathrm{formamide}) - \frac{600}{n}$$

where n is the number of base pairs in the duplex.

As an illustration of the above formula, using [Na+] = 0.368 M and 50% formamide, with a GC content of 42% and an average probe size of 200 bases, the Tm is 57°C.
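The formula can be checked against this worked example with a short Python function. It is a direct transcription of the equation above, with [Na+] in mol/L; it does not model mismatches or other salt or probe-length corrections.

```python
import math


def hybrid_tm(na_molar: float, gc_percent: float, formamide_percent: float,
              duplex_bp: int) -> float:
    """Tm in deg C from the formula above; [Na+] in mol/L, duplex length in bp."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / duplex_bp)


tm = hybrid_tm(na_molar=0.368, gc_percent=42, formamide_percent=50, duplex_bp=200)
print(round(tm, 1))  # 57.0
```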

The Tm of a DNA duplex decreases by 1-1.5°C with every 1% decrease in homology. Thus, targets with greater than about 75% sequence identity would be observed using a hybridization temperature of 42°C. Such a sequence would be considered substantially homologous to the nucleic acid sequence of the present invention.

It is well known in the art to increase stringency of hybridisation gradually until only a few positive clones remain. Other suitable conditions include, e.g. for detection of sequences that are about 80-90% identical, hybridization overnight at 42°C in 0.25 M Na₂HPO₄, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 55°C in 0.1X SSC, 0.1% SDS. For detection of sequences that are greater than about 90% identical, suitable conditions include hybridization overnight at 65°C in 0.25 M Na₂HPO₄, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 60°C in 0.1X SSC, 0.1% SDS.

In a further embodiment, hybridization of a nucleic acid molecule to a variant may be determined or identified indirectly, e.g. using a nucleic acid amplification reaction, particularly the polymerase chain reaction (PCR). PCR requires the use of two primers to specifically amplify target nucleic acid, so preferably two nucleic acid molecules with sequences characteristic of a M-B gene of the present invention are employed. Using RACE PCR, only one such primer may be needed (see "PCR protocols; A Guide to Methods and Applications", Eds. Innis et al, Academic Press, New York, (1990)).

Thus a method involving use of PCR in obtaining nucleic acid according to the present invention may include:

(a) providing a preparation of plant nucleic acid, e.g. from a seed or other appropriate tissue or organ,

(b) providing a pair of nucleic acid molecule primers useful in (i.e. suitable for) PCR, at least one of said primers being a primer according to the present invention as discussed above,

(c) contacting nucleic acid in said preparation with said primers under conditions for performance of PCR,

(d) performing PCR and determining the presence or absence of an amplified PCR product. The presence of an amplified PCR product may indicate identification of a variant.
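The PCR-based identification steps can be mimicked in silico. This is a simplified Python sketch assuming exact primer matches on a single linear template, with the forward primer matching the top strand and the reverse primer annealing to the top strand downstream of it; mismatch tolerance, melting temperatures and amplification efficiency are not modelled, and the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the predicted amplicon (top-strand sequence) or None.

    The forward primer must occur exactly in the top strand; the reverse
    primer anneals to the top strand, so its reverse complement must occur
    downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rev_site = template.find(reverse_complement(reverse), start + len(forward))
    if rev_site == -1:
        return None
    return template[start:rev_site + len(reverse)]


template = "GGGATGGCTTCAGGAACCTTGGCAATCGGAACCC"  # hypothetical template
product = in_silico_pcr(template, "ATGGCTTCAG", "TTCCGATTGC")
print(product, len(product) if product else 0)
```

A None result corresponds to the "absence of an amplified PCR product" outcome in step (d).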

In all cases above, if need be, clones or fragments identified in the search can be extended. For instance if it is suspected that they are incomplete, the original DNA source (e.g. a clone library, mRNA preparation etc.) can be revisited to isolate missing portions e.g. using sequences, probes or primers based on that portion which has already been obtained to identify other clones containing overlapping sequence.
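Extending a partial clone with an overlapping clone can be illustrated by a simple suffix-prefix merge. This Python sketch assumes error-free sequences on the same strand and an exact overlap of at least min_overlap bases (a hypothetical threshold); real assemblies must tolerate sequencing errors and consider both strands.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 20):
    """Join `left` and `right` on the longest suffix/prefix overlap of at
    least `min_overlap` bases; return None if there is no such overlap."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


# Hypothetical partial clone and an overlapping clone from the same library.
print(merge_overlapping("AAACCCGGGTTT", "GGGTTTACGT", min_overlap=4))
```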

Purified protein (polypeptide, enzyme), or a fragment, mutant, derivative or variant thereof, e.g. produced recombinantly by expression from encoding nucleic acid therefor, forms one aspect of the invention.

Such purified polypeptides may be used to raise antibodies employing techniques which are standard in the art. Antibodies and polypeptides comprising antigen binding fragments of antibodies may be used in identifying homologues from other species as discussed further below.

Methods of producing antibodies include immunising a mammal (e.g. human, mouse, rat, rabbit, horse, goat, sheep or monkey) with the protein or a fragment thereof. Antibodies may be obtained from immunised animals using any of a variety of techniques known in the art, and might be screened, preferably using binding of antibody to antigen of interest. For instance, Western blotting techniques or immunoprecipitation may be used (Armitage et al, 1992, Nature 357: 80-82).

Antibodies may be polyclonal or monoclonal.

As an alternative or supplement to immunising a mammal, antibodies with appropriate binding specificity may be obtained from a recombinantly produced library of expressed immunoglobulin variable domains, e.g. using lambda bacteriophage or filamentous bacteriophage which display functional immunoglobulin binding domains on their surfaces; for instance see WO92/01047.

Antibodies raised to a polypeptide or peptide can be used in the identification and/or isolation of homologous polypeptides, and then the encoding genes.

Antibodies may be modified in a number of ways. Indeed the term “antibody” should be construed as covering any specific binding substance having a binding domain with the required specificity. Thus, this term covers antibody fragments, derivatives, functional equivalents and homologues of antibodies, including any polypeptide comprising an immunoglobulin binding domain, whether natural or synthetic.

A number of patents and publications are cited herein in order to more fully describe and disclose the invention and the state of the art to which the invention pertains. Each of these references is incorporated herein by reference in its entirety into the present disclosure, to the same extent as if each individual reference was specifically and individually indicated to be incorporated by reference.

Throughout this specification, including the claims which follow, unless the context requires otherwise, the word “comprise,” and variations such as “comprises” and “comprising,” will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.

It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a pharmaceutical carrier” includes mixtures of two or more such carriers, and the like.

Ranges are often expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by the use of the antecedent “about,” it will be understood that the particular value forms another embodiment.

Any sub-titles herein are included for convenience only, and are not to be construed as limiting the disclosure in any way.

The invention will now be further described with reference to the following non-limiting Figures and Examples. Other embodiments of the invention will occur to those skilled in the art in the light of these.

The disclosure of all references cited herein, inasmuch as it may be used by those skilled in the art to carry out the invention, is hereby specifically incorporated herein by cross-reference.

Figures

Figure 1. Hypothetical route of limonoid biosynthesis.

Predictions of the major biosynthetic steps required for the biosynthesis of limonoids. The triterpene precursor 2,3-oxidosqualene is proposed to be cyclised to an unconfirmed tetracyclic triterpene scaffold. The structure of ring-intact limonoids implicates a tetracyclic triterpene precursor of either the euphane (20R) or tirucallane (20S) type. Retrosynthetic discrimination between these two side chains is impossible based on limonoid structures, because the formation of the furan ring eradicates any remnants of the precursor’s C20 stereochemistry. However, predictions can be made based on the immediate precursors of limonoids (protolimonoids); for instance, the C20 carbon of melianol (4) has been assigned (although not yet confirmed by X-ray crystallography) the S configuration, which implies a tirucallane precursor (49). Further, the C7-8 alkene of certain protolimonoids suggests that the most likely triterpene precursor is in fact tirucalla-7,24-dien-3β-ol (1), as indicated by the retrosynthetic arrow (*), rather than tirucallol itself. Biosynthesis of limonoids from triterpene scaffolds is predicted to occur through protolimonoid structures such as melianol (4) and requires two major biosynthetic steps: scaffold rearrangements, and furan ring formation accompanied by loss of four carbons. Scaffold rearrangement is proposed to be initiated by epoxidation of the C7 double bond (C7-8 epoxide), and furan ring formation could feasibly be initiated through oxidation and cyclisation of the C20 tail (melianol (4)). The diversity of protolimonoid structures isolated has led to different predictions of the order of these two events (2, 50). The hemiacetal side chains of isolated protolimonoids such as melianol (4) suggest a Paal-Knorr-like (51, 52) route to the furan ring of limonoids. Isolation of nimbocinone from A. indica (Fig. 6) (53), a feasible degradation product of this route (**), further supports conversion of protolimonoids to ring-intact limonoids through this mechanism. The ring-intact limonoid 7-deacetylazadirone has been isolated from both Meliaceae (54) and Rutaceae (55) species. Numerous further chemical transformations are required for the formation of seco-ring limonoid derivatives. In the Rutaceae, radioactive [14C]-labelling experiments in C. limon (lemon) have helped to delineate the late stages of the pathway, proving that nomilin can be biosynthesised in the stem and converted into other limonoids such as limonin and obacunone (Fig. 6) elsewhere in the plant (56-58).

Figure 2. Identification and characterisation of oxidosqualene cyclases (OSCs) from limonoid producing species.

(A) Phylogenetic tree of candidate OSCs from Azadirachta indica (blue), Melia azedarach (green) and Citrus sinensis (orange). Functionally characterised OSCs from other plant species (11) are included, with the two previously characterised tirucalla-7,24-dien-3β-ol synthases from Arabidopsis thaliana (AtLUP5, AtPEN3) highlighted (yellow). Human and prokaryotic OSC sequences used as an outgroup are represented by the grey triangle. Candidate OSCs chosen for further analysis are indicated (circles). The phylogenetic tree was constructed with FastTree v2.1.7 (59) and formatted using iTOL (60). Local support values from the FastTree Shimodaira-Hasegawa (SH) test (between 0.6 and 1.0) are indicated at nodes, and the scale bar depicts the estimated number of amino acid substitutions per site. (B) GC-MS total ion chromatograms of derivatised extracts from yeast strains expressing candidate OSCs. Traces for the empty vector (pYES2) and strains expressing the candidates AiOSC1 (blue), MaOSC1 (green), CsOSC1 (orange) and the previously characterised AtLUP5 (yellow) are shown. (C) GC-MS mass spectra of TMS-tirucalla-7,24-dien-3β-ol (1). (D) Confirmation of the structure of the cyclization product generated by AiOSC1 as tirucalla-7,24-dien-3β-ol (1) by NMR (Table S3).

Figure 3. Expression patterns of AiOSC1 and other co-expressed genes in A. indica.

A heatmap of a subset of differentially expressed (p<0.05) genes with expression patterns similar to that of AiOSC1 (blue circle) across flower, root, fruit and leaf tissues of A. indica is shown. Raw RNA-seq reads (19, 22) were aligned to a Trinity-assembled transcriptome of the same dataset. Read counts were normalised to library size and log2-transformed. Values depicted are scaled by row (gene) to emphasise differences across tissues. The Pfam identifier for the relevant predicted gene is included next to the contig number. Genes with no structural (Augustus) or functional (Pfam) annotations have been excluded. Cytochrome P450 (CYP) candidates AiCYP71BQ5, AiCYP72A721 and AiCYP88A108 (blue triangles) are indicated, with the latter two being considered gene fragments (< 300 amino acids).
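The normalisation described in this legend can be sketched as counts-per-million scaling, a log2 transform and row-wise scaling. The Python sketch below assumes a pseudocount of 1 before the log transform and z-score row scaling (subtract the row mean, divide by the row standard deviation); neither detail is stated in the legend, and the contig names and counts are hypothetical.

```python
import math
from statistics import mean, stdev


def log2_cpm(counts: dict, pseudocount: float = 1.0) -> dict:
    """counts: gene -> raw read counts, one value per library (same order).
    Library size is taken as the total count of each library over all genes."""
    n_libs = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_libs)]
    return {
        gene: [math.log2(row[j] / lib_sizes[j] * 1e6 + pseudocount)
               for j in range(n_libs)]
        for gene, row in counts.items()
    }


def scale_rows(matrix: dict) -> dict:
    """Z-score each gene across libraries (row mean 0, row SD 1)."""
    scaled = {}
    for gene, row in matrix.items():
        mu, sd = mean(row), stdev(row)
        scaled[gene] = [(x - mu) / sd if sd > 0 else 0.0 for x in row]
    return scaled


# Hypothetical counts for flower, root, fruit and leaf libraries.
counts = {"contig_1": [10, 20, 30, 40], "contig_2": [90, 80, 70, 60]}
for gene, row in scale_rows(log2_cpm(counts)).items():
    print(gene, [round(x, 2) for x in row])
```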

Figure 4. Accumulation of melianol and salannin and expression of MaOSC1, MaCYP71CD2 and MaCYP71BQ5 in Melia azedarach.

(A) Estimated concentrations (mg/g DW; mean ± SE, n = 4) of the protolimonoid melianol (4) and the seco-C-ring limonoid salannin in extracts from M. azedarach leaf, root and petiole tissue. (B) Normalised expression of MaOSC1, MaCYP71CD2 and MaCYP71BQ5 relative to Maβ-actin in RNA from leaves, roots and petioles of M. azedarach, determined by qRT-PCR. Relative expression levels were calculated using the ΔΔCq method (47) (mean ± SE, n = 4). T-test significance values are indicated: not significant (NS), p ≤ 0.05 (*), p ≤ 0.01 (**) or p ≤ 0.001 (***).
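The ΔΔCq calculation behind panel B can be illustrated with the standard 2^-ΔΔCq formula, which assumes close to 100% amplification efficiency for both the target and reference assays. The sketch is illustrative only. The Cq values are hypothetical, and using leaf as the calibrator tissue is an assumption. The p-value thresholds simply mirror the significance labels in the legend.

```python
def relative_expression(cq_target: float, cq_ref: float,
                        cq_target_cal: float, cq_ref_cal: float,
                        efficiency: float = 2.0) -> float:
    """Relative expression by the 2^-ΔΔCq method.

    cq_target/cq_ref: Cq of the gene of interest and of the reference gene in the sample.
    cq_target_cal/cq_ref_cal: the same two Cq values in the calibrator sample.
    efficiency=2.0 corresponds to perfect doubling in each cycle.
    """
    delta_sample = cq_target - cq_ref          # ΔCq in the sample
    delta_cal = cq_target_cal - cq_ref_cal     # ΔCq in the calibrator
    ddcq = delta_sample - delta_cal            # ΔΔCq
    return efficiency ** (-ddcq)


def significance_label(p: float) -> str:
    """Map a t-test p-value to the NS/*/**/*** labels used in the legend."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "NS"


# Hypothetical petiole sample calibrated against leaf: ΔΔCq = (22-18) - (25-18) = -3.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
print(significance_label(0.004))  # **
```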

Figure 5. Identification and functional analysis of cytochrome P450 enzymes capable of melianol biosynthesis.

(A) A subset of a larger phylogenetic tree showing the CYP71 family. Candidate cytochrome P450s (CYPs) from M. azedarach (green) and previously identified CYPs from A. thaliana (http://www.p450.kvl.dk) and Cucumis sativus (http://drnelson.uthsc.edu/cytochromeP450.html) (black) are included.

Candidate CYPs selected for cloning (Table S4) were identified either by homology to the A. indica candidate CYPs co-expressed with AiOSC1 (triangle) or by their occurrence in a unique CYP71 subclade lacking close homologs from A. thaliana or C. sativus (squares). The phylogenetic tree was constructed with FastTree v2.1.7 (59) and formatted using iTOL (60). Local support values from the FastTree Shimodaira-Hasegawa (SH) test (between 0.6 and 1.0) are indicated at nodes, and the scale bar depicts the estimated number of amino acid substitutions per site. (B) GC-MS total ion chromatograms and (C) LC-MS ESI extracted ion chromatograms of triterpene extracts from agroinfiltrated Nicotiana benthamiana leaves expressing A. indica and M. azedarach candidate genes in the pEAQ-HT-DEST1 vector. The peak marked with an asterisk is an endogenous N. benthamiana peak, not tirucalla-7,24-dien-3β-ol (1) (Fig. 9). (D) Proposed pathway of melianol (4) biosynthesis in M. azedarach; NMR confirmation of all structures can be found in Tables S3 and S5-S7.

Fig. 6. Structures of additional protolimonoids and family-specific limonoids referred to in this work

Fig. 7. Mass spectra of TMS-dihydroniloticin (2) and TMS-tirucalla-7,24-dien-21,3β-diol (3) generated by GC-MS.

(A) Mass spectra corresponding to the GC-MS total ion chromatograms depicted in Fig. 5B, of derivatised triterpene extracts from agroinfiltrated Nicotiana benthamiana leaves expressing candidate genes in pEAQ-HT-DEST1. (B) Structures of TMS-dihydroniloticin (2) and TMS-tirucalla-7,24-dien-21,3β-diol (3).

Fig. 8. Mass spectra of dihydroniloticin (2) and melianol (4) generated by (+)-UHPLC-IT-TOF ESI.

(A) Mass spectra corresponding to the LC-MS extracted ion chromatograms depicted in Fig. 5C, of triterpene extracts from agroinfiltrated Nicotiana benthamiana leaves expressing candidate Meliaceae genes in pEAQ-HT-DEST1. Observed adducts are listed. (B) Structures and exact masses of dihydroniloticin (2) and melianol (4).

Fig. 9. Mass spectra of co-eluting peak (*) compared to tirucalla-7,24-dien-3β-ol (1).

(A) GC-MS total ion chromatograms of extracts from N. benthamiana expressing AiOSC1 and control. Tirucalla-7,24-dien-3β-ol (1) (black) and the co-eluting peak (red asterisk) are indicated. (B) Mass spectra generated by GC-MS of tirucalla-7,24-dien-3β-ol (1) and the co-eluting peak (red).

Fig. 10. Expression patterns of differentially expressed candidate genes from M. azedarach (Elv1).

Genes were selected based on annotation from a larger subset of differentially expressed genes identified as co-expressed by hierarchical clustering. Genes are labelled with their Elv1 identifier and a given name (based on human-readable annotation). The melianol biosynthetic genes (MaOSC1, MaCYP71CD2 and MaCYP71BQ5) are included for comparison (bold). Read counts used for hierarchical clustering were normalised by library size and log2-transformed. The heatmap was constructed with Heatmap3 v1.1.1 (99), with scaling performed by row (gene) to emphasise the pattern of expression.

Fig. 11. Expression of MaCYP88A108 in N. benthamiana.

(A) UHPLC-IT-TOF-generated EICs of extracts of N. benthamiana leaves expressing the melianol biosynthetic genes (AiOSC1, MaCYP71CD2 and MaCYP71BQ5) with and without MaCYP88A108. The UHPLC-IT-TOF ‘limonoid’ gradient was used. The EICs shown are for the following adducts: [melianol+Na]+ = 495.3433 (red) and [melianol+O+Na]+ = 511.3384 (blue). Melianol peaks are labelled along with newly identified peaks (5-11). Mass spectra of the newly identified peaks are provided in Figure 12. (B) Predictions of structures, exact masses and mechanisms of formation of the oxidised melianol peaks (8-11). The structure 8-methyl-melian-14,15-ene-3,7-diol is here termed melianol B for simplicity.

Fig. 12. Mass spectra of oxidised melianol peaks (5-11).

Mass spectra corresponding to the UHPLC-IT-TOF analysis depicted in Figure 11, of triterpene extracts from agroinfiltrated N. benthamiana leaves expressing candidate genes in pEAQ-HT-DEST1. (A) Mass spectra of peaks (5-7). These peaks are present when the melianol biosynthetic genes (AiOSC1, MaCYP71CD2 and MaCYP71BQ5) alone are expressed in N. benthamiana and are thought to result from endogenous N. benthamiana modification. (B) Mass spectra of peaks (8-11), which are present when the melianol biosynthetic genes are expressed with MaCYP88A108. UHPLC-IT-TOF was performed using the ‘limonoid’ method.

Fig. 13. Expression of MaISOM1 in N. benthamiana.

(A) UHPLC-IT-TOF-generated EIC of extracts from N. benthamiana leaves expressing the melianol biosynthetic genes (AiOSC1, MaCYP71CD2 and MaCYP71BQ5) and MaCYP88A108, with and without MaISOM1. The UHPLC-IT-TOF ‘limonoid’ gradient was used. The EIC trace shown is for the [melianol+O+Na]+ = 511.3369 adduct (blue), and the corresponding peaks are labelled (8-11). Mass spectra are available in Figure 14A. (B) Structure, exact mass and predicted mechanism of formation of melianol B (10) (8-methyl-melian-14,15-ene-3,7-diol).

Fig. 14. Mass spectra of newly identified peaks (10, 12-20).

Mass spectra corresponding to UHPLC-IT-TOF analysis of N. benthamiana leaves expressing candidate genes in pEAQ-HT-DEST1. (A) Mass spectra for peak (10), produced by MaISOM1 (Figure 13); peak (12), produced endogenously in N. benthamiana; peak (13), produced by MaSDR1; and peaks (14-16), produced by MaBAHD1 (Figure 16). (B) Mass spectra for peaks (17-20), produced by co-expression of AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108, MaISOM1, MaSDR1 and MaBAHD1 (Figure 18). UHPLC-IT-TOF analysis was performed using the ‘limonoid’ method.

Fig. 15. Combinatorial transient expression of MaISOM1 in N. benthamiana.

UHPLC-IT-TOF-generated EICs of methanol extracts from agroinfiltrated N. benthamiana leaves expressing MaISOM1 in combination with the melianol biosynthetic genes (AiOSC1, MaCYP71CD2 and MaCYP71BQ5), with and without MaCYP88A108. EICs displayed are for the adducts [melianol+Na]+ = 495.3489 (red) and [melianol+O+Na]+ = 511.3411 (blue). UHPLC-IT-TOF analysis was performed using the ‘limonoid’ method.

Fig. 16. Expression of MaSDR1 and MaBAHD1 in N. benthamiana.

UHPLC-IT-TOF EICs of extracts from N. benthamiana leaves expressing the melianol B biosynthetic genes (AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108 and MaISOM1) with and without MaSDR1 (A), and with and without MaBAHD1 (B). The UHPLC-IT-TOF ‘limonoid’ gradient was used. The EICs shown are for the following adducts: [melianol+O+Na]+ = 511.3369 (blue), [melianol+O-2H+Na]+ = 509.3254 (pink) and [melianol+O+CH2CO+Na]+ = 553.3525 (purple). Peaks are labelled as follows: melianol B (10); dehydrogenated melianol B (12-13); melianol B acetate (14-16). The mass spectra for the newly identified peaks are provided in Figure 14. (C) Predictions of structures, exact masses and mechanisms of formation of the new peaks (9-16).

Fig. 17. Combinatorial transient expression of MaSDR1 and MaBAHD1 in N. benthamiana.

UHPLC-IT-TOF-generated EICs of methanol extracts from agroinfiltrated N. benthamiana leaves expressing MaSDR1 and MaBAHD1 in combination with the melianol biosynthetic genes (AiOSC1, MaCYP71CD2 and MaCYP71BQ5). EICs displayed are for the following adducts: [melianol+Na]+ = 495.3440 (red), [melianol-2H+Na]+ = 493.3291 (pink) and [melianol+CH2CO+Na]+ = 537.3563 (purple). Mass spectra of new peaks (highlighted with grey arrows) are given. UHPLC-IT-TOF analysis was performed using the ‘limonoid’ method. Predicted structures for labelled peaks are also provided (with exact mass and sodium adduct).

Fig. 18. Transient expression of melianol B biosynthetic genes in combination with MaBAHD1 and MaSDR1 in N. benthamiana.

(A) UHPLC-IT-TOF-generated EICs of methanol extracts from agroinfiltrated N. benthamiana leaves expressing the melianol B biosynthetic genes (AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108 and MaISOM1) with and without both tailoring genes (MaSDR1 and MaBAHD1). EICs displayed are for the adducts [melianol+O+Na]+ = 511.3369 (blue) and [melianol+O-2H+CH2CO+Na]+ = 551.3358 (orange). Newly identified peaks are labelled (17-20), and their mass spectra are provided in Figure 14B. (B) A selection of postulated structures that the new peaks (17-20) could represent, based on their mass spectra.

Fig. 19. Current understanding of the limonoid biosynthetic pathway.

Pathway showing the steps of protolimonoid biosynthesis characterised in this work, along with the predicted major biosynthetic transformations required to reach azadirachtin. For each step, modifications are shown in red. The functions of all enzymes in the protolimonoid biosynthetic pathway have been confirmed by NMR, with the exception of MaSDR1 and MaBAHD1 (indicated by (*)), whose functions have been predicted from MS data.

Examples

Example 1 - Characterisation of OSCs catalysing the formation of tirucalla-7,24-dien-3β-ol from three limonoid-producing species.

Four sequence resources were used to search for candidate genes implicated in limonoid biosynthesis: a Citrus sinensis var. Valencia genome annotation downloaded from NCBI (25); two A. indica transcriptomes that we assembled from raw RNAseq data downloaded from NCBI-SRA using the Trinity de novo assembler (12, 19, 22); and a Melia azedarach transcriptome that we assembled using the same method (23) (Table S1). The protein sequences of 83 previously characterised OSCs (11) were used as BLAST+ queries. Hits were filtered based on predicted protein sequence length and the presence of a conserved triterpene synthase motif (26). Phylogenetic analysis of the 10 candidate OSCs identified showed that three grouped with a conserved clade containing characterised cycloartenol synthases and a fourth grouped with lanosterol synthases (Fig. 2A). These OSCs are therefore likely to function in sterol biosynthesis rather than in limonoid biosynthesis (11). The remaining candidates fell into other, more diverse triterpene OSC clades and so were deemed more likely to have roles in specialised metabolism. Three of these were from C. sinensis, one from A. indica and another from M. azedarach.
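The length and motif filter applied to the BLAST+ hits can be sketched as below. This is a hedged illustration, not the exact criteria used. The text does not give the length thresholds or name the motif. The 650-850 residue window is therefore an assumption, and so is the use of DCTAE (a motif widely reported as conserved in triterpene synthases) as the conserved motif.

```python
import re

# Assumed length window (residues); plant OSCs are typically around 750 aa long.
MIN_LEN, MAX_LEN = 650, 850
# Assumed motif: DCTAE is widely reported as conserved in triterpene synthases.
MOTIF = re.compile("DCTAE")


def filter_candidates(proteins: dict[str, str]) -> list[str]:
    """Return IDs of predicted proteins that pass both the length and motif filters."""
    keep = []
    for seq_id, seq in proteins.items():
        seq = seq.upper().rstrip("*")  # drop a trailing stop symbol if present
        if MIN_LEN <= len(seq) <= MAX_LEN and MOTIF.search(seq):
            keep.append(seq_id)
    return keep


# Toy sequences: only the first is long enough and carries the motif.
toy = {
    "full_length": "M" + "A" * 700 + "DCTAE" + "A" * 50,
    "fragment": "MDCTAE" + "A" * 100,
    "no_motif": "M" + "A" * 749,
}
print(filter_candidates(toy))  # ['full_length']
```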

One of the C. sinensis OSCs formed a tight subclade [subclade 1, Shimodaira-Hasegawa (SH) local support value of 1] with the latter two Meliaceae candidates (Fig. 2A), making these three the most promising candidates for limonoid biosynthesis.

Functional characterisation of these three OSC candidates (named AiOSC1, MaOSC1 and CsOSC1) (Fig. 2A) was performed by expression in Saccharomyces cerevisiae strain GIL77 (Table S2). GC-MS analysis of extracts from yeast strains expressing each of the three OSCs detected a single major product, with the same retention time in all three cases. This product was tentatively identified as tirucalla-7,24-dien-3β-ol (1) in all three cases, based on its mass spectrum. This identification was supported by comparing the retention time and mass spectrum with those of the products of the multifunctional OSC AtLUP5 from Arabidopsis thaliana, which produces tirucalla-7,24-dien-3β-ol (1) as part of its product profile (Fig. 2B, C) (11, 27, 28). We next isolated and purified the AiOSC1 product and confirmed its structure as tirucalla-7,24-dien-3β-ol (1) by NMR (Fig. 2D; Table S3).
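The tentative GC-MS identification rests on comparing retention times and mass spectra. One common way to score spectral similarity is the cosine (dot-product) score, sketched below. The text does not state which scoring method, if any, was used. The retention-time tolerance, the score threshold and the peak lists are hypothetical values chosen purely for illustration.

```python
import math

Spectrum = dict[int, float]  # nominal m/z -> relative intensity


def cosine_score(a: Spectrum, b: Spectrum) -> float:
    """Cosine similarity of two centroided spectra (1.0 = identical pattern)."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistent_with(query: Spectrum, rt_query: float,
                    reference: Spectrum, rt_reference: float,
                    rt_tol: float = 0.1, min_score: float = 0.9) -> bool:
    """True if retention times agree within rt_tol (min) and the spectral score >= min_score."""
    return (abs(rt_query - rt_reference) <= rt_tol
            and cosine_score(query, reference) >= min_score)


# Hypothetical peak lists, not real spectra of any compound in this work.
product = {100: 20.0, 200: 100.0, 300: 45.0}
standard = {100: 22.0, 200: 100.0, 300: 40.0}
print(round(cosine_score(product, standard), 3))
print(consistent_with(product, 18.52, standard, 18.50))  # True
```

A high score only shows that two spectra are consistent with each other. For this reason the structure of the AiOSC1 product was confirmed by NMR.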

The two other phylogenetically distinct OSCs from C. sinensis (CsOSC2 and CsOSC3, indicated in Fig. 2A) were also expressed in yeast. They were found to make different products, which were identified as β-amyrin and lupeol by comparison with standards (results not shown; Table S2).

The sequences of three previously reported, uncharacterised putative OSCs (two from A. indica and one from C. grandis) (12, 13) have not been deposited in publicly available databases. Nevertheless, they appear to be phylogenetically distinct from the three tirucalla-7,24-dien-3β-ol synthases characterised here (based on phylogeny and the closest reported BLAST hits). They are also clearly distinct from AtLUP5. However, another previously characterised multifunctional OSC from A. thaliana that produces tirucalla-7,24-dien-3β-ol, AtPEN3 (11, 27, 28), is located in a neighbouring subclade in the tree (Fig. 2A).

Example 2 - Expression of AiOSC1 and MaOSC1 in limonoid-accumulating tissues

Differential gene expression analysis and hierarchical clustering were performed using available A. indica RNAseq data (19, 22). AiOSC1 showed the highest expression in the fruit (Fig. 3). This is consistent with a previous report of high levels of the ring-intact limonoids azadiradione and epoxyazadiradione (Fig. 6) in this organ of A. indica (12). Other genes that were highly co-expressed with AiOSC1 included three predicted CYP sequences. These were named by the Cytochrome P450 Nomenclature Committee following established convention (29) as AiCYP71BQ5, AiCYP72A721 and AiCYP88A108 (Fig. 3). These co-expressed CYPs are therefore potential candidates for oxidation of the tirucalla-7,24-dien-3β-ol (1) scaffold produced by AiOSC1. Unlike in A. indica, the spatial occurrence of limonoids within other Meliaceae species has not been investigated. M. azedarach, a close relative of A. indica (30), is the second most prolific limonoid-producing species, with 109 limonoid structures reported. These include seco-C-ring limonoids of the azadirachtin and meliacarpin classes (2, 4). We therefore investigated the levels of melianol and salannin in extracts from the leaves, roots and petioles of young (~12 months old) M. azedarach plants.

Azadirachtin was not detected in our analyses, consistent with an earlier investigation (23). Melianol accumulation was significantly higher in extracts from the petiole than in those from root and leaf tissue, whilst salannin accumulation was highest in the roots (Fig. 4A). The relative expression levels of MaOSC1 were significantly higher in the tissues with the highest accumulation of melianol and salannin (the petiole and root, respectively) (Fig. 4B).

AiOSC1 and MaOSC1 are expressed in the limonoid-accumulating tissues of A. indica and M. azedarach, respectively. Together with the lack of an alternative candidate OSC shared between these limonoid-producing species, this suggests that these OSCs may catalyse the first step in limonoid biosynthesis. The characterisation of these two OSCs, along with CsOSC1, as tirucalla-7,24-dien-3β-ol synthases supports the hypothesis that tirucalla-7,24-dien-3β-ol (1) is the triterpene precursor of limonoids in these species.

Our finding contrasts with an earlier A. indica study (31), which reported greater relative incorporation of [³H]-labelled euphol into the seco-C-ring limonoid nimbolide (Fig. 6) than of the tirucallanes or other euphanes. However, that study employed inconsistent methods of precursor labelling and used wet leaf weight in calculations of relative incorporation, so its results may be unreliable.

Example 3 - Identification of two cytochrome P450 enzymes from Melia azedarach that together convert tirucalla-7,24-dien-3β-ol to melianol.

To identify candidate CYPs that may be capable of oxidising the tirucalla-7,24-dien-3β-ol scaffold, the protein sequences of 235 A. thaliana CYPs (downloaded from http://www.p450.kvl.dk) were used as BLAST+ (32) queries against the Trinity-assembled M. azedarach transcriptome. The 103 candidate M. azedarach CYPs were compared phylogenetically with the CYPs of A. thaliana and of Cucumis sativus (http://drnelson.uthsc.edu/cytochromeP450.html). C. sativus was included as an additional dicotyledonous species for which the full genomic complement of CYPs has been assigned. Most of the M. azedarach candidates were dispersed throughout the tree, but a discrete subclade of seven M. azedarach CYPs (SH local support value of 1) was phylogenetically distinct from all A. thaliana and C. sativus candidates (Fig. 5A). This subclade sits within the largest CYP family in plants (CYP71), which is known for taxon-specific subfamily blooms and includes CYPs with characterised roles in secondary metabolism (33). Further, the subclade includes a candidate CYP (MaCYP71BQ5) that is an ortholog of AiCYP71BQ5, which is co-expressed with AiOSC1 in A. indica (Fig. 3). We selected a total of nine M. azedarach candidate CYPs (Table S4) for further analysis. These were the seven from the CYP71 subclade and a further two that were homologous to the two other A. indica CYPs co-expressed with AiOSC1. Two of the candidates, MaCYP71CD2 and MaCYP71BQ5, share an expression pattern similar to that of MaOSC1 (Fig. 4B).
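Collecting the M. azedarach transcripts hit by the A. thaliana CYP queries can be sketched as below. The text states none of the following, so all three are assumptions: that the search was run as tblastn (protein queries against a nucleotide transcriptome), that results were written in the BLAST+ default 12-column tabular output (-outfmt 6), and an e-value cut-off of 1e-10. The hit lines are invented.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text: str, max_evalue: float = 1e-10) -> dict[str, float]:
    """Best bit score per subject transcript among hits passing the e-value cut-off."""
    best: dict[str, float] = {}
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue
        subject, score = row["sseqid"], float(row["bitscore"])
        if score > best.get(subject, float("-inf")):
            best[subject] = score
    return best


# Invented hits: two queries hit tx1; a weak hit on tx2 fails the default cut-off.
rows = [
    ["q1", "tx1", "48.1", "490", "250", "4", "1", "490", "10", "1480", "1e-120", "410"],
    ["q2", "tx1", "45.3", "480", "260", "6", "5", "484", "22", "1460", "1e-100", "350"],
    ["q3", "tx2", "30.2", "120", "80", "3", "40", "160", "5", "365", "0.002", "35.8"],
]
example = "\n".join("\t".join(r) for r in rows) + "\n"
print(best_hits(example))  # {'tx1': 410.0}
```

The transcripts retained in this way would then be translated and placed in a tree alongside the reference CYPs, as described above.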

To determine whether the candidate CYPs can modify the tirucalla-7,24-dien-3β-ol scaffold (1), functional analysis was performed by transient co-expression with AiOSC1 in Nicotiana benthamiana (34, 35). Expression of AiOSC1 gave the expected product, tirucalla-7,24-dien-3β-ol (Fig. 5B), consistent with its earlier expression in yeast (Fig. 2B, C). Co-expression of AiOSC1 and MaCYP71CD2 resulted in consumption of tirucalla-7,24-dien-3β-ol (1) and generation of a new product (2). This product had a derivatised mass of 602.6 by GC-MS and an adduct of 481.365 by LC-MS (Fig. 5B, C; Figs. 7-8). This suggested that MaCYP71CD2 may have performed two oxidations of tirucalla-7,24-dien-3β-ol. These could feasibly include a hydroxylation and the conversion of an alkene to either an epoxide or a ketone. In contrast, co-expression of AiOSC1 and MaCYP71BQ5 resulted in partial consumption of tirucalla-7,24-dien-3β-ol (1) and production of a new product (3) (Fig. 5B). This new product (3) had a derivatised mass of 498.4 as determined by GC-MS (Fig. 7), which could suggest a single hydroxylation of tirucalla-7,24-dien-3β-ol. When AiOSC1 was co-expressed with both MaCYP71CD2 and MaCYP71BQ5, tirucalla-7,24-dien-3β-ol was completely consumed and a new product (4) was observed by LC-MS, with an adduct of 495.344 (Fig. 5C, Fig. 8). The absence of detectable (2) and (3) suggests that these CYPs work sequentially to form (4). MaCYP71CD2 may act first, given that it consumes tirucalla-7,24-dien-3β-ol (1) more efficiently than MaCYP71BQ5 does (Fig. 5B).

We next carried out large-scale co-expression, purified (2), (3) and (4), and determined their structures by NMR (Fig. 5D; Tables S5-S7). Our structural analysis revealed that MaCYP71CD2 introduces a secondary alcohol at C23 and an epoxide at the C24-25 alkene of tirucalla-7,24-dien-3β-ol. The product is the previously isolated compound dihydroniloticin (2), a postulated protolimonoid. Dihydroniloticin and its C3 ketone, niloticin, have previously been isolated on multiple occasions from Meliaceae, Rutaceae and Simaroubaceae species. Several structures with C23 oxidation only (lacking the C24 epoxide) have also been reported; however, niloticin-type structures with both the epoxidation and the C23 oxidation are far more common. MaCYP71BQ5 introduces a primary alcohol at C21 of (1) to form the previously isolated compound tirucalla-7,24-dien-21,3β-diol (3), another postulated protolimonoid. Tirucalla-7,24-dien-21,3β-diol (3) has been isolated only once, from an obscure member of the Simaroubaceae family (36). In contrast, a related structure with an aldehyde at C21, 3-oxotirucalla-7,24-dien-21-al, has been isolated a total of four times from members of the Meliaceae, Rutaceae and Simaroubaceae families. The mass adduct of the product of co-expression of AiOSC1, MaCYP71CD2 and MaCYP71BQ5 (4) is two mass units lower than expected for the predicted product of MaCYP71CD2 and MaCYP71BQ5 acting together (tirucalla-7-ene-24,25-epoxy-3β,21,23-triol). NMR analysis instead identified (4) as the protolimonoid melianol, in which C21 forms a hemiacetal ring with the C23 hydroxy group. Melianol exists as an epimeric mixture in solution (37), with the hemiacetal ring opening and reforming with two different stereochemistries at C21. Similar epimeric mixtures have been reported for other protolimonoids containing a hemiacetal ring, such as turraeanthin (38) and melianone (39). Although melianol (4) has been isolated only eight times, its C3 ketone, melianone, has been isolated from a total of 18 species across the Meliaceae, Rutaceae and Simaroubaceae families.
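The mass argument above can be checked by calculating [M+Na]+ values from molecular formulae. The text gives no formulae, so the sketch assumes C30H48O4 for melianol and C30H50O4 for the predicted triol. It uses standard monoisotopic masses and subtracts one electron mass for the singly charged cation.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956, "Na": 22.9897692809}
ELECTRON = 0.00054857990946


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a simple formula such as 'C30H48O4'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def sodium_adduct_mz(formula: str) -> float:
    """m/z of the singly charged [M+Na]+ ion."""
    return monoisotopic_mass(formula) + MONOISOTOPIC["Na"] - ELECTRON


print(f"melianol  C30H48O4: {sodium_adduct_mz('C30H48O4'):.4f}")  # 495.3445
print(f"triol     C30H50O4: {sodium_adduct_mz('C30H50O4'):.4f}")  # 497.3601
```

The observed adduct (495.344) matches the assumed melianol formula rather than the triol. Assuming the formula C30H50O3 for dihydroniloticin (2), the same function returns 481.3652, in line with the 481.365 adduct reported above.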

The products generated by MaCYP71CD2 and MaCYP71BQ5 have previously been isolated from members of all three limonoid-producing plant families. This suggests that melianol biosynthesis could represent the initial stage of limonoid biosynthesis across the Sapindales order. Consistent with this, close homologs of MaCYP71CD2 and MaCYP71BQ5 are present in A. indica and C. sinensis (Table 4).

The pathway shown in Fig. 5D is the proposed pathway for melianol biosynthesis in M. azedarach and C. sinensis, based on the evidence presented here. Our work provides the first example of the functional characterisation of biosynthetic enzymes involved in protolimonoid biosynthesis. Together, MaCYP71CD2 and MaCYP71BQ5 can catalyse the three oxygenations of tirucalla-7,24-dien-3β-ol required to induce hemiacetal ring formation, thereby affording the protolimonoid melianol. Enzymes that initiate ring formation on the side chain of tirucalla-7,24-dien-3β-ol may feasibly be performing the first steps of furan ring formation (Fig. 1). Their identification could therefore imply that the subsequent conversion of protolimonoids to limonoids begins with furan ring formation (Fig. 1). The isolation of the protolimonoid 7-deacetylbruceajavanin B (41), which possesses a hemiacetal ring and typical limonoid internal scaffold rearrangements (Fig. 6), runs counter to this. It could nevertheless still support formation of a hemiacetal ring before scaffold rearrangements.

In further experiments, the anti-insect activity conferred by the characterised melianol biosynthetic genes was assessed. These genes were transiently expressed in N. benthamiana, and the effects of their expression on feeding by the tobacco hornworm (Manduca sexta) were evaluated. Individually, these genes had no significant effect on feeding compared with the negative control. However, co-expression of the genes required for the production of melianol (AiOSC1, MaCYP71CD2 and MaCYP71BQ5) caused a small but significant reduction in M. sexta feeding, confirming the utility of these genes in modifying the phenotype of host cells.

Example 4 - Identification of enzymes capable of modifying the melianol scaffold

A subset of 18,151 differentially expressed genes (P < 0.05) was extracted from the 31,048 genes annotated in a newly generated M. azedarach genome (Elv1) (unpublished). Hierarchical clustering was then performed to identify which of these differentially expressed genes shared expression patterns with the characterised melianol biosynthetic genes (MaOSC1, MaCYP71CD2 and MaCYP71BQ5), resulting in a list of 283 candidate genes. This list was searched manually to identify genes annotated with biosynthetic functions of interest. The post-melianol steps in limonoid biosynthesis could proceed by a number of routes, so a broad range of biosynthetic enzyme classes was considered, including, but not limited to, hydrolases, oxygenases, dehydratases and isomerases. In total, 26 candidates were selected for further analysis, all of which showed strong co-expression with the melianol biosynthetic genes (Figure 10).
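Hierarchical clustering was used here to find genes sharing the expression pattern of the melianol biosynthetic genes. A simpler stand-in, shown only to illustrate the idea of guilt-by-association, ranks genes by their mean Pearson correlation with a set of bait genes. This is not the clustering procedure used. The correlation threshold, the tissue layout and the log2 expression values are all hypothetical.

```python
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either profile is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def coexpressed(profiles: dict[str, list[float]], baits: list[str],
                min_r: float = 0.9) -> dict[str, float]:
    """Non-bait genes whose mean correlation with the baits is at least min_r, best first."""
    hits = {}
    for gene, profile in profiles.items():
        if gene in baits:
            continue
        r = statistics.mean(pearson(profile, profiles[b]) for b in baits)
        if r >= min_r:
            hits[gene] = r
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical log2 expression in leaf, root and petiole (two replicates each).
profiles = {
    "MaOSC1": [1.0, 1.2, 5.0, 5.3, 9.0, 8.8],
    "MaCYP71CD2": [2.0, 2.1, 6.2, 5.9, 10.1, 9.7],
    "candidate_a": [1.5, 1.4, 5.6, 5.5, 9.4, 9.6],
    "candidate_b": [9.0, 8.7, 5.1, 5.2, 1.1, 0.9],
}
print(coexpressed(profiles, ["MaOSC1", "MaCYP71CD2"]))
```

Unlike hierarchical clustering, this ranking needs an explicit threshold and ignores structure among the candidates themselves, which is one reason a clustering approach may have been preferred.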

Expression of the melianol biosynthetic genes and of the selected candidates is highest in the petiole tissues of M. azedarach and lowest in the leaves (Figure 10). This agrees with previous profiling of this species by metabolite analysis and qRT-PCR (Figure 4).

The candidate genes selected by differential expression analysis were cloned into pEAQ-HT-DEST1 vectors for functional characterisation by transient expression in N. benthamiana. Expression of different combinations of these genes together with the melianol biosynthetic genes revealed that four candidate genes had activity on melianol-type scaffolds; they are therefore likely limonoid biosynthetic genes. Two of them (MaCYP88A108 and MaISOM1) converted the internal scaffold of melianol from a protolimonoid type to a limonoid type, forming ‘melianol B’. The other two (MaSDR1 and MaBAHD1) acted as tailoring enzymes that further decorate melianol-type scaffolds.

Example 5 - Characterisation of enzymes required to convert melianol to melianol B

MaCYP88A108 activity

MaCYP88A108 was originally identified as a homolog of AiCYP88A108, which was co-expressed with AiOSC1 in A. indica (Figure 3, Table S4). It was reselected as a candidate here because the newly generated M. azedarach genome revealed an extended 5’ sequence (141 bp) within the coding sequence of this gene.

Expression of the melianol biosynthetic genes in the absence of MaCYP88A108 results in the production of melianol (4). It also yields three additional peaks (5-7) with a mass of 511.3384, equivalent to the sodium adduct of melianol with an additional oxygen (Figure 11A). These are believed to result from modification of the melianol scaffold by endogenous N. benthamiana enzymes. However, when MaCYP88A108 is expressed in combination with these genes, melianol (4) accumulation falls by ~80% and the endogenous peaks (5-7) almost completely disappear (Figure 11A). This is accompanied by the accumulation of four new peaks (8-11). All four have the same mass, equivalent to the sodium adduct of melianol with an additional oxygen (Figure 11A). Both copies of MaCYP88A108 (from the Ma1 transcriptome assembly and from the new genome) gave the same results.

The production of peaks (8-11), all with a mass suggesting the addition of one oxygen without the loss of hydrogen, could be explained by either the addition of a hydroxyl group to a carbon or the conversion of an alkene to an epoxide.
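The reasoning used to interpret these masses (one added oxygen with no hydrogen lost, as opposed to a dehydrogenation or an acetylation) amounts to matching an observed mass difference against candidate modifications. A minimal sketch follows. It assumes a 10 ppm tolerance, which the text does not state, and covers only the modifications discussed in this document.

```python
# Monoisotopic mass changes (u) for the modifications discussed in the text.
H = 1.00782503207
C = 12.0
O = 15.99491461956
SHIFTS = {
    "+O (hydroxylation or epoxidation)": O,
    "-2H (dehydrogenation)": -2 * H,
    "+O-2H": O - 2 * H,
    "+CH2CO (acetylation)": 2 * C + 2 * H + O,
}


def explain_shift(parent_mz: float, observed_mz: float, tol_ppm: float = 10.0) -> list[str]:
    """Modifications whose mass change matches observed - parent within tol_ppm of observed."""
    delta = observed_mz - parent_mz
    tol = observed_mz * tol_ppm * 1e-6
    return [label for label, shift in SHIFTS.items() if abs(delta - shift) <= tol]


# [melianol+Na]+ and [melianol+O+Na]+ values reported for Figure 11.
print(explain_shift(495.3433, 511.3384))  # ['+O (hydroxylation or epoxidation)']
```

The ambiguity noted above remains: a +O shift alone cannot distinguish a hydroxylation from an epoxidation, so structural methods such as NMR are still needed.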

MaCYP88A108 could in principle introduce a hydroxyl group at multiple positions on the melianol scaffold, but this is unlikely. First, no peaks with multiple oxidations were identified. Second, previously characterised triterpene biosynthetic CYPs that are capable of multiple oxidations, such as MaCYP71CD2 and AsCYP51H10 from A. strigosa (80), oxidise both positions simultaneously rather than oxidising positions individually.

We therefore conclude that melianol (4) is oxidised at only one position, and that instability of the resulting structure can lead to rearrangements that produce multiple isomers of the same mass. This is consistent with the predicted next steps in the pathway (Figure 1), in which MaCYP88A108 is capable of oxidising the C7 alkene of melianol to form an epoxide (7,8-epoxymelianol), whose sodium adduct would have a mass of 511.3384 (Figure 11B). Subsequent epoxide opening would form a carbocation intermediate. This intermediate could rearrange into a number of different structures with a mass of 511.3384, which would explain the multiple peaks (8-11). On this premise, one of these peaks (8-11) represents 7,8-epoxymelianol and the others are rearranged structures. Alternatively, if the epoxide is unstable, all four peaks (8-11) represent carbocation-derived rearranged structures (Figure 11B).

MaISOM1 activity

MaISOM1 is annotated in Elv1 as a sterol-8,7-isomerase (IPR007905, IPR033118). It was selected as a candidate because protolimonoids are structurally similar to sterols and because limonoid scaffold rearrangement may require an isomerase function. The sequence amplified using the MaISOM1 primers was not identical to the predicted sequence, as it contained an unspliced intron. This intron is assumed to be spliced out when the gene is expressed in heterologous expression systems such as N. benthamiana.
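Spotting an unspliced intron in an amplicon can be illustrated by comparing the amplified sequence with the predicted coding sequence. The sketch below assumes the two differ by exactly one contiguous insertion and checks for the canonical GT...AG splice-site dinucleotides. It is a toy illustration with invented sequences; a real analysis would more likely use a spliced aligner. Because the shared prefix is extended greedily, the reported boundaries can shift by a few bases when the intron ends resemble the flanking exon sequence.

```python
from typing import Optional


def find_insertion(cds: str, amplicon: str) -> Optional[tuple[int, int]]:
    """Return (start, end) of a single extra segment in amplicon relative to cds, else None."""
    if len(amplicon) <= len(cds):
        return None
    # Length of the shared prefix.
    p = 0
    while p < len(cds) and cds[p] == amplicon[p]:
        p += 1
    # Length of the shared suffix, limited so that it cannot overlap the prefix.
    s = 0
    while s < len(cds) - p and cds[-1 - s] == amplicon[-1 - s]:
        s += 1
    if p + s != len(cds):
        return None  # the sequences differ by more than one insertion
    return p, len(amplicon) - s


def is_canonical_intron(segment: str) -> bool:
    """GT at the 5' splice site and AG at the 3' splice site."""
    return segment.startswith("GT") and segment.endswith("AG")


cds = "ATGAAACCCTTTGGGTAA"                              # invented coding sequence
amplicon = "ATGAAACCC" + "GTAAGTTTTTCAG" + "TTTGGGTAA"  # same CDS with an invented intron
span = find_insertion(cds, amplicon)
print(span)  # (9, 22)
if span:
    print(is_canonical_intron(amplicon[span[0]:span[1]]))  # True
```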

As discussed above, transient expression of the melianol biosynthetic genes (AiOSC1, MaCYP71CD2 and MaCYP71BQ5) and MaCYP88A108 in N. benthamiana produces four peaks (8-11) with a mass equivalent to the sodium adduct of oxidised melianol (Figure 13A). However, when MaISOM1 is co-expressed with these genes, only one of these peaks (10) is detectable, and all other peaks with this mass (8, 9, 11) are absent (Figure 13A). This suggests that MaISOM1 is capable of steering the rearrangement of the MaCYP88A108 carbocation intermediate towards the formation of only one isomer (Figure 13B).

To determine the structure of the remaining oxidised melianol peak (10), AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108 and MaISOM1 were transiently expressed in N. benthamiana on a large scale. This enabled the product giving rise to peak (10) to be purified and its structure confirmed as 8-methyl-melian-14,15-ene-3,7-diol (Table S13, Figure 13B), here termed melianol B for simplicity. This structure resembles melianol, except that it has a mature limonoid internal scaffold rather than a protolimonoid internal arrangement. This characterisation therefore confirms that MaCYP88A108 and MaISOM1 together can convert the internal scaffold of protolimonoids to a limonoid-type scaffold relatively efficiently (Figure 13A). The likely mechanism begins with epoxidation of the C7 alkene by MaCYP88A108. MaISOM1 then protonates this epoxide, leading to a methyl shift from C14 to C8, elimination of hydrogen from C15, and formation of a C14-C15 alkene (Figure 13B).

MaISOM1 appears to protonate a novel substrate for this class of enzyme. Other plant sterol-8,7-isomerases (characterised from A. thaliana and Zea mays) do not require prior oxygenation to perform their isomerisations and instead use an alkene as a substrate (81, 82). When MaISOM1 is expressed in N. benthamiana in the absence of MaCYP88A108, melianol is not consumed and no new products are detected (Figure 15). This confirms that MaISOM1 does not function without prior oxidation by MaCYP88A108 and requires the epoxide, rather than the alkene, as a substrate.

Example 6 - Identification of enzymes capable of tailoring melianol-type scaffolds

MaSDR1 activity

MaSDR1 is annotated as a short-chain dehydrogenase/reductase (SDR; IPR002347). It was selected because it could potentially convert the C3 hydroxy group of protolimonoids to the C3 ketone commonly seen in true limonoids. In the absence of MaSDR1, expression of the melianol B biosynthetic genes (AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108 and MaISOM1) leads to accumulation of a small background peak (12) with a mass of 509.3254 (Figure 16A). This mass is equivalent to the sodium adduct of doubly dehydrogenated melianol B, and peak (12) is thought to be an N. benthamiana-derived modification of melianol B (10). When MaSDR1 is co-expressed with this combination of genes, melianol B (10) is almost completely consumed and a new peak (13) with a mass of 509.3254 is detected (Figure 16A).

The mass of peak (13), 509.3254, is equivalent to the sodium adduct of doubly dehydrogenated melianol B. The new peak is therefore believed to represent melianone B, formed by dehydrogenation of the C3 hydroxy group of melianol B (Figure 16C). However, mass spectra alone cannot establish which of the three hydroxy groups (C3, C7, C21) of the melianol B scaffold has been dehydrogenated, or whether MaSDR1 instead catalyses the formation of an alkene rather than a carbonyl. Based on the predicted pathway steps (Figure 1) and the relatively high conversion efficiency (Figure 16A), the C3 position seems the most likely target of MaSDR1. Further, MaSDR1 shares 44% protein identity with the promiscuous AtTHAR1 from A. thaliana (83), which performs this function on a range of triterpene scaffolds. MaSDR1 can also doubly dehydrogenate the melianol (4) scaffold (Figure 17), suggesting that it may be a tailoring enzyme able to function at multiple stages within this pathway.
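Percent identity figures such as the 44% quoted for AtTHAR1 and MaSDR1 depend on how the pairwise alignment is scored. A minimal sketch of one common convention is shown below: identical residues divided by aligned columns in which neither sequence has a gap. The text does not say which convention or alignment tool was used, and the toy alignment is invented.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Toy pre-aligned pair: 4 identities over 5 gap-free columns.
print(percent_identity("MKT-LA", "MRTALA"))  # 80.0
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, and can give a noticeably different value for the same alignment.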

MaBAHD1 activity

MaBAHD1 was selected because it was preliminarily identified as a possible ‘vinorine synthase’. Vinorine synthases are a specific enzyme type within the benzylalcohol acetyl-, anthocyanin-O-hydroxycinnamoyl-, anthranilate-N-hydroxycinnamoyl/benzoyl-, deacetylvindoline acetyltransferase (BAHD) superfamily, and transfer acetyl groups in monoterpenoid indole alkaloid biosynthetic pathways (84). Such acetyl decorations are common in both protolimonoid and limonoid structures.

When the melianol B (10) biosynthetic genes (AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108 and MaISOM1) are expressed without MaBAHD1, no peaks with a mass suggesting acetylation are observed (Figure 16.B). However, when MaBAHD1 is expressed in combination with these genes, the melianol B peak is partially consumed and a new broad peak with a mass of 553.3525 is detected, which appears to consist of three distinct peaks (14-16) (Figure 16.B). The mass of these new peaks (14-16) is consistent with a sodium adduct of melianol B acetate, suggesting that MaBAHD1 acetylates a hydroxy group on the melianol B scaffold (Figure 16.C). The occurrence of three peaks (14-16) could indicate that MaBAHD1 is capable of acetylating each of the three different hydroxy groups of the melianol B scaffold. The low level of consumption of melianol B (10) by MaBAHD1 (Figure 16.B) may suggest that melianol B (10) is not its preferred substrate and that in M. azedarach this enzyme acts later in the limonoid biosynthetic pathway. Acyl decorations are often added late in triterpene biosynthesis; for example, acylation of avenacin occurs at a late stage of its pathway (85). Co-expression of MaBAHD1 with the melianol biosynthetic genes (AiOSC1, MaCYP71CD2 and MaCYP71BQ5) alone indicated that MaBAHD1 can also acetylate the melianol (4) scaffold (Figure 17), again suggesting that it acts as a tailoring enzyme in this pathway.
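The mass assignments above can be cross-checked with simple arithmetic. Because peak (13) and peaks (14-16) are both reported as sodium adducts of modified melianol B, the adduct cancels when the two m/z values are subtracted, and the difference should equal an acetylation (+C2H2O) plus the hydrogen lost on dehydrogenation. The sketch below assumes that "doubly dehydrogenated" means loss of two hydrogen atoms (one H2, as in C3-OH to C3=O) and uses standard monoisotopic masses; the function and constant names are illustrative only.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def formula_mass(counts):
    """Sum monoisotopic masses for a composition given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in counts.items())


# Assumed mass changes relative to the parent scaffold.
DEHYDROGENATION = -formula_mass({"H": 2})              # loss of H2 (e.g. C3-OH -> C3=O)
ACETYLATION = formula_mass({"C": 2, "H": 2, "O": 1})   # R-OH -> R-O-C(=O)CH3 adds C2H2O


def ppm_error(observed, expected, reference_mz):
    """Mass error of a difference, expressed in ppm of a reference m/z."""
    return (observed - expected) / reference_mz * 1e6


# Observed [M+Na]+ values reported for peak 13 (MaSDR1) and peaks 14-16 (MaBAHD1).
dehydro_mz = 509.3254
acetate_mz = 553.3525

# Both ions carry the same Na+ adduct, so it cancels in the difference:
# acetate - dehydro = ACETYLATION - DEHYDROGENATION.
observed_delta = acetate_mz - dehydro_mz
expected_delta = ACETYLATION - DEHYDROGENATION

print(f"observed {observed_delta:.4f} Da")
print(f"expected {expected_delta:.4f} Da")
print(f"error    {ppm_error(observed_delta, expected_delta, acetate_mz):.1f} ppm")
```

With the values reported here the observed difference (about 44.027 Da) lies within roughly 2 ppm of the expected 44.026 Da, which is consistent with, but does not by itself prove, the proposed assignments.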

MaSDR1 and MaBAHD1 activity

Expression of both MaSDR1 and MaBAHD1 in combination with the melianol B biosynthetic genes (AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108 and MaISOM1) resulted in new peaks with masses equivalent to sodium adducts of doubly dehydrogenated and acetylated melianol B (Figure 18).

Example 7 - Production of Melianol and limonoids by stable transformation

Triterpenes have previously been produced using engineered transgenic plant lines (e.g. Arabidopsis, wheat). A series of Golden Gate vectors that allow the construction of multigene vectors, and the integration of an entire pathway into a single locus, has been reported (23; Engler, C., et al., A golden gate modular cloning toolbox for plants. ACS Synth Biol, 2014. 3(11): p. 839-43). These can be applied analogously to the present invention in the light of the disclosure herein.

Materials and Methods for Examples 1-6

Melia azedarach material

A young (<1 year) Melia azedarach plant was purchased from Crug Farm Plants (UK) in summer 2016 and maintained in a John Innes Centre greenhouse (24 °C, 16 h light, grown in John Innes Cereal mix). The individual’s provenance is Chikugogawa Prefectural Natural Park (Japan). Seeds were collected by Crug Farm Plants in autumn 2015 from an area of the park with no sampling restrictions.

Confirmation that the material was out of scope of the Nagoya protocol and Access and Benefit Sharing legislation was given by the National Focal Point of Japan.

Transcriptome assembly

Raw RNAseq reads from two studies of Azadirachta indica (12, 19, 22) and one study of Melia azedarach (23) were downloaded from NCBI-SRA (Table S1). Within each dataset, tissues were pooled and a reference transcriptome was assembled using the Trinity de novo assembler r20140717 (42) following a standard protocol (43) (Table S1). For protein annotation, Augustus V3.2.2 (44) was used in intron-less mode with an Arabidopsis thaliana training model and untranslated region (UTR) identification turned off.

Identification of oxidosqualene cyclases

Protein sequences of 83 functionally characterised oxidosqualene cyclases (OSCs) (11) were used as queries to identify candidate OSCs by BLAST+ V2.7.1 (32) searches of Trinity-assembled transcriptomes (Table S1) and a Citrus sinensis protein annotation (GCF_000317415) (25). Candidates were filtered based on the presence of the conserved triterpene synthase OSC motif (DCTAE) (26), length (between 700 and 1000 amino acids) and prediction of a protein coding sequence (Augustus V3.2.2 (44)). Five unique candidate OSCs were present in C. sinensis and two each in M. azedarach and A. indica. Protein sequences were aligned with MUSCLE V3.8.31 (45).
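The three filtering criteria above (DCTAE motif, 700-1000 amino acid length and a predicted coding sequence) can be expressed as a simple predicate. This is a minimal sketch assuming candidates have already been translated into protein sequences; the `Candidate` record and its fields are hypothetical rather than part of the original pipeline, and collapsing identical sequences is one plausible reading of "unique" candidates.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one BLAST hit after translation."""
    name: str
    protein: str              # one-letter amino acid sequence
    has_cds_prediction: bool  # True if a coding sequence was predicted (e.g. by Augustus)


OSC_MOTIF = "DCTAE"           # conserved triterpene synthase motif
MIN_LEN, MAX_LEN = 700, 1000  # accepted protein length range (amino acids)


def passes_osc_filter(c: Candidate) -> bool:
    """Apply the coding-sequence, length and motif criteria."""
    return (
        c.has_cds_prediction
        and MIN_LEN <= len(c.protein) <= MAX_LEN
        and OSC_MOTIF in c.protein
    )


def unique_osc_candidates(candidates):
    """Keep passing candidates, collapsing identical protein sequences."""
    seen = {}
    for c in candidates:
        if passes_osc_filter(c):
            seen.setdefault(c.protein, c)  # first name wins for duplicates
    return list(seen.values())


pad = "A" * 400
candidates = [
    Candidate("osc_a", "M" + pad + "DCTAE" + pad, True),     # 806 aa, motif present
    Candidate("osc_b", "M" + pad + "DCTAE" + pad, True),     # identical to osc_a
    Candidate("too_short", "M" + "DCTAE" + pad, True),       # 406 aa
    Candidate("no_motif", "M" + pad + "DCTAA" + pad, True),  # motif absent
    Candidate("no_cds", "M" + pad + "DCTAE" + pad, False),   # no coding sequence predicted
]
print([c.name for c in unique_osc_candidates(candidates)])  # ['osc_a']
```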

Characterisation of candidate oxidosqualene cyclases

Candidate OSC sequences from C. sinensis and A. indica were synthesised (Integrated DNA Technologies) in two fragments and recombined into the pYES2 vector (Thermo-Fisher Scientific). MaOSC1 was cloned from M. azedarach into pYES2, and the cloned gene for AtLUP5, a previously characterised OSC (27), was sourced through TAIR (stock U16880). Details of the cloning methods for OSC candidates (Table S2) are described below. Candidate OSCs were expressed in the yeast strain GIL77 (MATa/a gal2 hem3-6 erg7 ura3-176) (46) in 10 ml cultures. Triterpenes were extracted in hexane after saponification and analysed by GC-MS (see below). Purification of the AiOSC1 product is described below. In addition, candidate OSCs were cloned into pEAQ-HT-DEST1 (see below) to enable agroinfiltration of Nicotiana benthamiana, which was performed as previously described (35).

Azadirachta indica differential gene expression analysis

Using raw RNAseq data and the corresponding Trinity-assembled transcriptome of Azadirachta indica (19, 22), differential gene expression analysis was performed to identify a subset of differentially expressed genes (p<0.05). Within this subset, hierarchical clustering analysis identified a cluster of genes with expression patterns similar to that of AiOSC1 (see below). HMMSCAN (EMBL-EBI) was used to assign Pfam domains with an E-value of 1.

Melia azedarach limonoid and protolimonoid quantification

Freeze-dried Melia azedarach material was weighed (~10 mg) and homogenised using tungsten carbide beads (3 mm; Qiagen) with a TissueLyser (1000 rpm, 1 min). Samples were extracted in 550 µl 100% methanol (containing 10 µg/ml podophyllotoxin internal standard (Sigma-Aldrich)) and agitated at 18°C for 20 min. Supernatant (400 µl) was transferred and mixed with 140 µl ddH2O. De-fatting was performed by adding hexane (400 µl) and removing the upper phase (300 µl), in duplicate. The remaining solvent was evaporated to dryness and extracts were re-suspended in 100 µl of methanol. Extracts were filtered by centrifugation through Spin-X® centrifuge tube filters (pore size 0.22 µm, Corning® Costar®). Eluate (50 µl) was diluted in 50 µl of methanol and transferred to a glass insert placed inside a glass autosampler vial for LC-MS analysis (see below). LCMSsolutions V3 (Shimadzu) was used to analyse chromatograms and identify peaks. The internal standard (podophyllotoxin) was used to calculate an estimated concentration of each target compound in the starting material. Azadirachtin (Sigma-Aldrich), salannin (Greyhound Chromatography) and melianol (4) (see below) standards were used to confirm retention times and mass adducts.
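The internal-standard calculation can be sketched as follows. This assumes that the target compound and podophyllotoxin give the same detector response per unit mass (a response factor of 1), which is why the result is only an estimate; the function name, parameters and example numbers are illustrative and not taken from the original analysis.

```python
def estimated_content_ug_per_mg(area_target, area_is, is_conc_ug_per_ml,
                                extraction_vol_ml, dry_weight_mg,
                                response_factor=1.0):
    """Estimate target content (µg per mg dry weight) from peak areas.

    The internal standard (IS) is added with the extraction solvent, so losses
    during de-fatting, drying and filtration affect target and IS equally and
    cancel in the area ratio.
    """
    if area_is <= 0 or dry_weight_mg <= 0:
        raise ValueError("IS peak area and dry weight must be positive")
    is_amount_ug = is_conc_ug_per_ml * extraction_vol_ml  # total IS in the extract
    target_amount_ug = (area_target / area_is) * is_amount_ug / response_factor
    return target_amount_ug / dry_weight_mg


# Illustrative numbers only: target peak half the size of the IS peak.
print(estimated_content_ug_per_mg(area_target=5.0e5, area_is=1.0e6,
                                  is_conc_ug_per_ml=10.0, extraction_vol_ml=0.55,
                                  dry_weight_mg=10.0))
```

If an authentic standard is available, a measured `response_factor` (target response relative to the IS) can replace the default of 1.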

Identification of candidate cytochrome P450s in M. azedarach

Candidate cytochrome P450s (CYPs) were identified in M. azedarach transcriptome data by a BLAST+ V2.7.1 (32) search using Arabidopsis thaliana CYP protein sequences (http://www.p450.kvl.dk) as a query. A total of 1672 hits were identified, of which 103 represented unique, full-length (300-700 amino acids) protein-coding (Augustus V3.2.2 (44)) sequences with ‘cytochrome P450’ Pfam annotations (HMMSCAN (EMBL-EBI)). Protein sequences of the candidates were aligned with MUSCLE V3.8.31 (45) to CYP protein sequences from A. thaliana (http://www.p450.kvl.dk) and Cucumis sativus (http://drnelson.uthsc.edu/cytochromeP450.html). CYP clades were determined based on previous phylogenetic studies (33). Protein sequences of the 103 CYP candidates from M. azedarach were used as a BLAST+ V2.7.1 (32) query to identify homologs in A. indica and C. sinensis. CYPs were named following the convention of the Cytochrome P450 Nomenclature Committee (29).

Relative expression of candidate genes in M. azedarach

In the absence of an M. azedarach genome sequence, intron-spanning PCR primers were designed for MaActin, MaOSC1, MaCYP71CD2 and MaCYP71BQ5 (Table S10) by assuming intron patterning similar to that of the closely related species A. indica, based on alignment to the closest homologs in the A. indica draft genome (PRJNA176672; AMWY00000000.1) (21). LightCycler® 480 SYBR Green I Master mix (Roche) was used for quantitative real-time PCR (qRT-PCR), performed on a CFX96 real-time system with a C1000 Touch thermal cycler (Bio-Rad). R was used to calculate expression relative to MaActin using the ΔΔCq method (47).
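For reference, the ΔΔCq calculation normalises the target gene to the reference (actin) gene and then to a calibrator sample. The sketch below assumes 100% amplification efficiency (a base of 2) and uses the mean of technical replicate Cq values; the choice of calibrator and the example Cq values are assumptions for illustration, as the text does not specify them. It is written in Python rather than the R used in the original analysis.

```python
from statistics import mean


def relative_expression(cq_target, cq_ref, cq_target_cal, cq_ref_cal, base=2.0):
    """Relative expression by the ΔΔCq (Livak) method.

    cq_target / cq_ref: replicate Cq values for the target and reference gene
    in the sample of interest; *_cal: the same for the calibrator sample.
    base=2.0 assumes perfect doubling in every PCR cycle.
    """
    delta_sample = mean(cq_target) - mean(cq_ref)              # ΔCq, sample
    delta_calibrator = mean(cq_target_cal) - mean(cq_ref_cal)  # ΔCq, calibrator
    delta_delta = delta_sample - delta_calibrator              # ΔΔCq
    return base ** (-delta_delta)


# Hypothetical Cq values: relative to the reference gene, the target crosses
# the threshold three cycles earlier in the sample than in the calibrator.
fold = relative_expression(cq_target=[24.1, 23.9], cq_ref=[18.0, 18.0],
                           cq_target_cal=[27.0, 27.0], cq_ref_cal=[18.0, 18.0])
print(fold)  # 8.0
```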

Functional analysis of candidate genes from M. azedarach in N. benthamiana

Candidate genes from M. azedarach were expressed in N. benthamiana by agroinfiltration of Agrobacterium tumefaciens LBA4404 strains transformed with pEAQ-HT-DEST1 constructs (48) (pEAQ-HT-DEST1 was kindly provided by the Lomonossoff laboratory). Different combinations of strains were co-infiltrated to test combinations of genes. A feedback-insensitive HMG-CoA reductase (AstHMGR) was included in addition to the candidates because of its proven ability to boost triterpene yield in this system (35). Agroinfiltration and harvest of leaf discs were performed as described previously for combinatorial triterpene biosynthesis (35). The methods outlined above for extraction of limonoids and protolimonoids from M. azedarach were used for analysis of infiltrated N. benthamiana leaves. Hexane extracts from the de-fatting process were retained for GC-MS analysis. Extracts were analysed by GC-MS or LC-MS depending on the polarity of the products formed (see below). Purification of protolimonoid products of these enzymes is described below.

Materials and Methods for Figures 6-9, Tables S1 to S12

RNA extraction and cDNA synthesis from Melia azedarach. M. azedarach tissues were flash-frozen in liquid nitrogen and ground using a pestle and mortar. RNA was extracted from leaf tissues using the modified RNeasy Plant Mini Kit (Qiagen) protocol developed for extraction from woody plants (1). DNase treatment was performed ‘on-column’ using DNase (Promega). First-strand cDNA synthesis was performed using the GoScript™ Reverse Transcription System (Promega) following the manufacturer’s instructions.

Saccharomyces cerevisiae heterologous recombination and transformation.

Heterologous recombination was performed in the S. cerevisiae strain GIL77 (MATa/a gal2 hem3-6 erg7 ura3-176) (2). Sequences were amplified by PCR using primers with sequences homologous to the ends of the pYES2 vector linearised at the XbaI and HindIII restriction sites. The 5’ end of the forward primer overlapped with the GAL1 promoter and that of the reverse primer with the CYC1 terminator sequence. Coding sequences of oxidosqualene cyclase (OSC) candidates from Azadirachta indica and C. sinensis were synthesised by Integrated DNA Technologies (IDT) in two fragments. Fragment 1 was amplified with a pYES2-specific forward primer as described and a reverse primer whose 3’ end was complementary to the 5’ end of fragment 2. Fragment 2 was amplified using a forward primer whose 5’ end was complementary to the 3’ end of fragment 1 and a pYES2-specific reverse primer as described above. Coding sequences of OSC candidates from M. azedarach were amplified from cDNA by PCR with pYES2-specific primers. All PCR reactions were performed using Phusion polymerase (Promega) following the manufacturer’s instructions, and all primers (Table S10) were ordered from Sigma-Aldrich. The PCR fragments were co-transformed into GIL77 with linearised pYES2 vector following a standard protocol (Yeastmaker™ Yeast Transformation System 2, Clontech Laboratories). To confirm successful recombination and correct coding sequences of PCR fragments, plasmids were extracted using the Zymoprep™ Yeast Plasmid Miniprep (Zymo Research), transformed into Escherichia coli for propagation and extracted for sequencing. All plasmid purifications from E. coli were performed using the QIAprep Spin Miniprep Kit (Qiagen).

Expression of OSCs and cytochrome P450s (CYPs) in S. cerevisiae and triterpene extraction. S. cerevisiae strains GIL77 (2) and Y21900 (MATa/α ura3Δ0 leu2Δ0 his3Δ1 met15Δ0/MET15 LYS2/lys2Δ0 ERG7/ERG7::kanMX4) (EUROSCARF) were used for expression of candidate OSC and CYP genes. The strains have either partial (Y21900) or full (GIL77) loss of ERG7 function. All media used for the GIL77 strain were supplemented with 20 µg/mL ergosterol (Fluka), 13 µg/mL hemin (Sigma-Aldrich) and 5 mg/mL Tween 80 (Sigma-Aldrich). Selection media are listed in Table S12. Strains were grown in liquid culture at 30°C with shaking at 200 rpm. For expression of candidate genes, strains were first pre-cultured to saturation (~48 h) in SD + glucose (2% wt/vol) + [supplements]. Cells were then pelleted by centrifugation, washed in ddH2O, resuspended in SD + galactose (2% wt/vol) + [supplements] and cultured for a further 48 h before being pelleted for extraction.

Cells were saponified by resuspending in 250 µl of saponification reagent (20% (wt/vol) KOH in 50% (vol/vol) EtOH) and incubating for 2 h at 65°C. Triterpenes were then extracted in an equal volume of hexane and the hexane extracts were pooled and dried down.

Gateway® cloning of OSCs and genes from M. azedarach. The coding sequences of candidate genes were amplified by PCR with a forward primer containing a 5’ attB1 site and a reverse primer containing a 5’ attB2 site. Gel electrophoresis was used to confirm the sizes of PCR fragments, which were then purified using the QIAquick Gel Extraction Kit or QIAquick PCR Purification Kit (Qiagen). Gateway® technology (Invitrogen) was used following the manufacturer’s instructions. Briefly, purified PCR fragments were transferred into the donor vector pDONR207 by a BP recombination reaction, followed by transformation into E. coli (DH5α™ (ThermoFisher Scientific)). Plasmids were sequenced to check for successful recombination and correct coding sequence. Finally, an LR recombination reaction was performed to transfer the coding sequences of candidate genes from pDONR207 to the desired expression vector following the manufacturer’s instructions. For expression in yeast, pYES2-DEST52 (Thermo-Fisher Scientific), pAG423GAL and pAG425GAL (Addgene) were used as expression vectors. The pEAQ-HT-DEST1 vector (3) (kindly provided by the Lomonossoff laboratory) was used as the expression vector for Agrobacterium tumefaciens-mediated transient expression in Nicotiana benthamiana.

GC-MS analysis of triterpene extracts. Dried samples were resuspended in 200 µL of extraction solvent and 50 µL aliquots were dried down under N2 gas. Dried aliquots were then derivatized in 50 µL 1-(trimethylsilyl)imidazole-pyridine mixture (Sigma-Aldrich) and heated at 65°C for 30 min, before being transferred to glass inserts in glass autosampler vials. GC-MS analysis was performed using a 7890B GC (Agilent) and an electron-impact (EI) 5977A MSD (Agilent) fitted with a Zebron ZB5-HT Inferno column (Phenomenex) following a previously described method (4). Briefly, 1 µL of sample was injected (inlet 250°C) in pulsed splitless mode (pulse pressure 30 psi) with an oven program of 170°C for 2 min, a ramp of 20°C/min to 300°C and 11.5 min at 300°C. Detection was carried out in scan mode (60-800 mass units), set to 7.2, after a solvent delay of 8 min. Data analysis was undertaken using MassHunter Workstation software (Agilent).
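The oven program above can be expressed as a piecewise function of run time, which is handy for checking the total run length or the column temperature when the 8 min solvent delay ends. A minimal sketch; the function names are illustrative, and the default arguments simply restate the program given above.

```python
def oven_temperature(t_min, initial_c=170.0, initial_hold_min=2.0,
                     ramp_c_per_min=20.0, final_c=300.0, final_hold_min=11.5):
    """Oven temperature (°C) at time t_min for a hold-ramp-hold program."""
    ramp_min = (final_c - initial_c) / ramp_c_per_min
    total_min = initial_hold_min + ramp_min + final_hold_min
    if not 0.0 <= t_min <= total_min:
        raise ValueError(f"t={t_min} min is outside the {total_min} min run")
    if t_min <= initial_hold_min:
        return initial_c                                  # initial isothermal hold
    if t_min <= initial_hold_min + ramp_min:
        return initial_c + ramp_c_per_min * (t_min - initial_hold_min)  # linear ramp
    return final_c                                        # final isothermal hold


def total_run_time(initial_c=170.0, initial_hold_min=2.0, ramp_c_per_min=20.0,
                   final_c=300.0, final_hold_min=11.5):
    """Total program length in minutes."""
    return initial_hold_min + (final_c - initial_c) / ramp_c_per_min + final_hold_min


print(total_run_time())       # 20.0
print(oven_temperature(8.0))  # 290.0, when the solvent delay ends
```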

General considerations for NMR. NMR spectra were recorded in Fourier transform mode at a nominal frequency of 400 MHz for 1H NMR and 100 MHz for 13C NMR (unless specified otherwise), using the specified deuterated solvent. Chemical shifts were recorded in ppm and referenced to the residual solvent peak or to an internal TMS standard. Multiplicities are described as: s = singlet, d = doublet, dd = doublet of doublets, dt = doublet of triplets, t = triplet, q = quartet, quint = quintet, tquin = triplet of quintets, m = multiplet, br = broad, appt = apparent; coupling constants are reported in hertz as observed and not corrected for second order effects.

Purification of tirucalla-7,24-dien-3β-ol (AiOSC1 product). Two litres of GIL77 cells expressing AiOSC1 (pYES2) were cultured and pelleted, yielding 15.28 g of material. Saponification was performed in 100 ml of reagent and triterpenes were extracted three times with an equal volume of hexane, yielding 220 mg of dried crude extract. Fractionation using an Isolera™ Prime (Biotage) (Table S11) yielded 1 mg of purified tirucalla-7,24-dien-3β-ol, enabling structural confirmation by NMR.

Azadirachta indica differential gene expression (DGE) analysis and hierarchical clustering. DGE analysis was performed using the Trinity-assembled A. indica transcriptome (Ai1) (Table S1) as the reference sequence, with corresponding raw RNAseq reads from fruit, root, leaf, stem and flower tissues (5, 6). Transcript abundance was estimated using the ‘align and estimate abundance’ script provided within the Trinity de novo assembler package (7). Briefly, raw RNAseq reads for each tissue were aligned to the transcriptome (Bowtie V1.0.1 (8)) and abundance per gene was estimated (RSEM V1.3.0 (9)), using Trinity transcripts as a proxy for genes. The resulting estimated counts per gene were converted to integers, and genes scoring less than one count per million in two or more tissues were excluded from the analysis. Data were normalised for differences in library size using the trimmed mean of M-values (TMM) method (edgeR V3.22.5 (10)). Because the published dataset lacks replicates, a dispersion value could not be calculated and was therefore manually set to 0.05. A genewise negative binomial generalised linear model (edgeR V3.22.5 (10)) was used to identify differentially expressed genes (likelihood ratio test = 1 and p-value < 0.05). Log2-normalised read counts (DESeq2 V1.22.1 (11)) were used for hierarchical clustering analysis of differentially expressed genes. Correlation matrices for tissues and genes were calculated using the Spearman (12) and Pearson (13) methods, respectively, and converted to distance matrices, which were clustered using complete linkage. The dendrogram of clustered genes was cut at 0.08422892 (maximum height of tree/4.85) and visualised using Heatmap3 V1.1.1 (14).
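The low-expression filter described above can be sketched as follows. The original analysis used edgeR in R; this Python sketch only reproduces the rounding and counts-per-million (CPM) filter and does not implement TMM normalisation. Counts are held as a hypothetical mapping of gene to per-sample counts. With `max_low_samples=1` it mirrors the A. indica rule (genes below 1 CPM in two or more tissues are excluded); `max_low_samples=4` would mirror the M. azedarach rule described later (below 1 CPM in more than four samples).

```python
def counts_per_million(counts):
    """Convert {gene: [count per sample]} to CPM using each sample's library size."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [c / lib_sizes[j] * 1e6 for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_expression(estimated_counts, min_cpm=1.0, max_low_samples=1):
    """Round estimated counts to integers, then drop genes whose CPM is below
    min_cpm in more than max_low_samples samples."""
    counts = {g: [round(c) for c in row] for g, row in estimated_counts.items()}
    cpm = counts_per_million(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for v in cpm[gene] if v < min_cpm) <= max_low_samples
    }


# Toy RSEM-style estimated counts for five tissues.
toy = {
    "geneA": [500000.2, 400000.0, 600000.7, 500000.0, 450000.4],
    "geneB": [0.0, 0.0, 1.1, 0.0, 0.0],      # at least 1 CPM in only one tissue
    "geneC": [10.4, 0.0, 20.0, 15.2, 12.0],  # below 1 CPM in one tissue
}
print(sorted(filter_low_expression(toy)))  # ['geneA', 'geneC']
```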

LCMS analysis of triterpene extracts from N. benthamiana leaves infiltrated with M. azedarach CYPs. LC-MS was carried out based on a previously described method (15) using positive-mode electrospray LC-MS on a Nexera/Prominence UHPLC equipped with an ion-trap time-of-flight (IT-TOF) mass spectrometer (Shimadzu). Separation was on a 100 × 2.1 mm, 100 Å, 2.6 µm Kinetex EVO C18 column (Phenomenex) using 0.1% formic acid in water (A) versus methanol (B), run at 500 µL/min and 40°C, with the following gradient of solvent B: 32-60% from 0-3 min, 60-65% from 7-13 min, 65-90% from 13-13.5 min, 90% from 13.5-16.5 min, 90-40% from 16.5-17 min and 40% from 17-20 min. Full MS spectra were collected (m/z 200-2000) with a maximum ion accumulation time of 20 ms, and automatic sensitivity control set to a target of 70% optimal base peak intensity. The instrument also collected data-dependent MS2 spectra (m/z 50-2000) of the most abundant precursor ions, with an isolation width of m/z 3.0, 50% collision energy and 50% collision gas, and a fixed ion accumulation time of 10 ms. Spray chamber conditions were 300°C heat block, 250°C curved desolvation line, 1.5 L/min nebuliser gas, and drying gas ‘on’. The instrument was calibrated using sodium trifluoroacetate cluster ions according to the manufacturer’s instructions.

To improve separation of melianol and dihydroniloticin, the above gradient was modified as follows: 0.1% formic acid in water (A) versus methanol (B), run at 500 µL/min and 40°C, with the following gradient of solvent B: 70-95% from 0-10 min, 95% from 10-11 min, 95-70% from 11-11.1 min and 70% from 11.1-14.5 min. This new gradient is referred to as the ‘protolimonoid’ gradient and the former as the ‘limonoid’ gradient.
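The protolimonoid gradient can be represented as a list of (time, %B) breakpoints with linear interpolation between them, a convenient form for checking the solvent composition at a given retention time. The limonoid gradient is not encoded here because, as written above, it leaves 3-7 min unspecified. This is an illustrative sketch; the names are hypothetical.

```python
# (time in min, % solvent B) breakpoints of the 'protolimonoid' gradient.
PROTOLIMONOID_GRADIENT = [
    (0.0, 70.0),
    (10.0, 95.0),
    (11.0, 95.0),
    (11.1, 70.0),
    (14.5, 70.0),
]


def percent_b(t_min, program=PROTOLIMONOID_GRADIENT):
    """%B at time t_min, interpolating linearly between breakpoints."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError(f"t={t_min} min is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole program")


for t in (0.0, 5.0, 10.5, 12.0):
    print(t, percent_b(t))
```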

Purification of dihydroniloticin (the product of AiOSC1 and MaCYP71CD2).

Using the previously described vacuum infiltration method (4, 16), 144 N. benthamiana plants were agroinfiltrated with equal volumes of A. tumefaciens strains containing pEAQ-HT-DEST1 expression constructs for AstHMGR, AiOSC1 and MaCYP71CD2. Initial extraction was performed on dried leaf material (68.64 g) following the large-scale triterpene extraction protocol previously described (16). Successive rounds of fractionation were performed using an Isolera™ Prime (Biotage) as described in Table S11. To achieve final purification, the sample was dissolved in a minimal amount of ethanol and agitated (15 min) with activated charcoal (Sigma-Aldrich). This yielded 86 mg of dihydroniloticin, enabling structural confirmation by NMR.

Purification of tirucalla-7,24-dien-3β,21-diol (the product of AiOSC1 and MaCYP71BQ5). Five litres of S. cerevisiae Y21900 cells expressing AiOSC1 (pYES2), AtATR2 (pAG425GAL) and MaCYP71BQ5 (pAG423GAL) were cultured and pelleted. Saponification was carried out in 500 ml of reagent. Extraction was performed in 1.5 L of hexane, yielding 1.13 g of dried crude extract. Successive fractionation using an Isolera™ Prime (Biotage) (Table S11) yielded 4 mg of tirucalla-7,24-dien-3β,21-diol, enabling structural confirmation by NMR.

Purification of melianol (the product of AiOSC1, MaCYP71CD2 and MaCYP71BQ5). Using vacuum infiltration (4, 16), 160 N. benthamiana plants were agroinfiltrated with equal volumes of A. tumefaciens strains containing pEAQ-HT-DEST1 expression constructs of AstHMGR, AiOSC1, MaCYP71CD2 and MaCYP71BQ5. Initial extraction was performed on dried leaf material (198.5 g) following the large-scale triterpene extraction protocol previously described (16). Successive rounds of fractionation were performed using an Isolera™ Prime (Biotage) (Table S11). To achieve final purification, re-crystallisation was performed by dissolving the sample in a minimal volume of methanol (90°C), covering it and allowing crystals to form at room temperature. Crystals were washed in ice-cold methanol.

The re-crystallisation process was repeated, yielding ~6 mg of melianol. Structural confirmation was carried out by NMR.

Generation of an M. azedarach genome and transcriptome (Elv1). HMW gDNA was extracted from M. azedarach JPN11 leaves using a modified CTAB protocol that includes the addition of proteinase K and RNase A (Qiagen) (86). RNA was extracted from leaf, petiole (inclusive of rachis) and root tissues of M. azedarach individuals JPN11 and JPN02. All tissues were harvested on the same day, flash-frozen in liquid N2 and stored at -80°C prior to extraction. RNA extractions were performed in technical replicates (four separate extractions, each including one replicate of each tissue type).

A 20-30 kb PacBio shotgun HMW gDNA library and a high-throughput Illumina stranded RNA library (150 bp, paired-end) were prepared and sequenced on the PacBio Sequel system and the Illumina HiSeq 4000, respectively. The PacBio reads were assembled using HGAP-4 (SMRT Link V5.0.1.9585), and structural annotations for this assembly were generated using a specialised pipeline. Functional annotation of Elv1 was performed using the Assignment of Human Readable Descriptions (AHRD) V3.3.3 (87) tool. AHRD was provided with the results of BLAST V2.6.0 (32) searches (e-value = 1e-5) against reference proteins from TAIR (88) and the UniProt Swiss-Prot and TrEMBL datasets (89), along with InterProScan (90) results.

To generate expression data from the raw RNA-Seq reads relative to the Elv1 annotation, the basic methodology outlined in ‘Intro2RNAseq’ (produced by Weill Cornell Medical College (91)) was followed. All tools and quality control steps were run with the parameters specified in this protocol. The quality of all samples was assessed using FastQC V0.10.1 (92). STAR V2.5 (93) was used to align all reads (pooling all reads per replicate across read directions and lanes) to the Elv1 annotation.

Samtools V1.7 (94) was used to index the resulting alignments. The featureCounts tool of Subread V1.6.0 (95) was used to generate raw read counts by counting the number of reads overlapping with Elv1 genes in each alignment. Raw read counts were analysed in R using DESeq2 V1.22.1 (96). Genes with zero counts were removed from the analysis, normalisation was performed based on library size, and the normalised counts were log2 transformed with a pseudocount of one. The resulting library-normalised log2 read counts were used for downstream analyses.
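The DESeq2-based transformation described above (remove all-zero genes, scale by a per-sample size factor, then log2 with a pseudocount of one) can be sketched in Python. This assumes that DESeq2's default median-of-ratios size factors are what "normalisation based on library size" refers to, and it is an illustrative re-implementation rather than the DESeq2 code itself; the function names are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    Only genes with a non-zero count in every sample contribute, because the
    geometric mean would otherwise be zero.
    """
    n = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n)]
    for row in counts.values():
        if all(c > 0 for c in row):
            log_geo_mean = sum(math.log(c) for c in row) / n
            for j, c in enumerate(row):
                log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def log2_normalised(counts):
    """Drop all-zero genes, divide by size factors, then take log2(x + 1)."""
    kept = {g: row for g, row in counts.items() if any(row)}
    sf = size_factors(kept)
    return {
        g: [math.log2(c / sf[j] + 1) for j, c in enumerate(row)]
        for g, row in kept.items()
    }


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
toy = {"g1": [10, 20], "g2": [30, 60], "g3": [0, 0], "g4": [5, 10]}
out = log2_normalised(toy)
print(sorted(out))  # ['g1', 'g2', 'g4']
```

After normalisation each gene has the same value in both toy samples, because the only difference between them was sequencing depth.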

Identification of candidate enzymes for modification of the melianol scaffold.

Following the methodology and best practice given by ‘Intro2RNAseq’ (Weill Cornell Medical College) (91), edgeR V3.22.5 (97) was used to select genes from the newly generated Melia azedarach genome Elv1 (unpublished) that were differentially expressed (P-value < 0.05). Briefly, raw read counts were imported into an edgeR object and genes with low coverage (less than one count per million in more than four samples) were discarded. Normalisation (by library size) was performed using the ‘trimmed mean of M-values’ method. To identify differentially expressed genes, a genewise negative binomial generalised linear model (glmQLFit) was used with pairwise comparisons between all sample types. DESeq2 V1.22.1 (96) was used to produce read counts for hierarchical clustering by removing read counts of zero, normalising by library size and performing log2 transformation with a pseudocount of one. The log2 library-normalised read counts were used for hierarchical clustering and plotting as described for the A. indica analysis. Several criteria were used to capture the widest possible pool of co-expressed genes: genes clustering with the melianol biosynthetic genes (MaOSC1, MaCYP71CD2 and MaCYP71BQ5), and genes clustering with the latest melianol biosynthetic gene identified at the time of analysis (MaCYP71BQ5), in each case based either on read counts from all 28 RNA-Seq samples or on a mean value for each of the seven tissues. From this pool of co-expressed genes, candidates were manually selected based on their annotation.

Purification of melianol B (the product of AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108 and MaISOM1). Using vacuum infiltration (4, 16), 115 large N. benthamiana plants were agroinfiltrated with equal volumes of A. tumefaciens strains containing pEAQ-HT-DEST1 expression constructs of AstHMGR, AiOSC1, MaCYP71CD2, MaCYP71BQ5, MaCYP88A108 and MaISOM1. Initial extraction was performed on dried leaf material (159.9 g) following the large-scale triterpene extraction protocol previously described (16). Successive rounds of fractionation were performed using an Isolera™ Prime (Biotage) (Table S11).

To achieve final purification, semi-preparative UHPLC was performed on an Agilent Technologies 1290 Infinity II system equipped with an Agilent Technologies 1290 Infinity II Diode Array Detector (DAD), an Agilent 1260 Infinity Evaporative Light Scattering Detector (ELSD) and an Agilent 1260 Infinity II fraction collector (calibrated to the two detection units). Separation was performed on a 250 × 10 mm, S-5 µm, 12 nm Pack Pro C18 column (YMC) using water (A) versus 95% acetonitrile (B) run at 4 ml/min and 40°C, with the following gradient of solvent B: 68% from 0-30 min, 68-100% from 30-32 min, 100% from 32-37 min, 100-41% from 37-39 min and 41% from 39-44 min. The sample was dissolved in a minimal volume of acetonitrile and injected in 200 µl aliquots. The fraction collector was programmed to collect fractions between 22 and 25 min (with a maximum peak duration of 2 min), triggered by detection of a peak by either the DAD or the ELSD (with criteria of up slope 2, down slope 4, threshold 2.5 and upper threshold 5000). The DAD was set to collect signals at a wavelength of 205 nm with a bandwidth of 4 nm, and the ELSD to acquire signals with the following parameters: temperature 40°C, gas flow rate 1.6 SLM, data rate 80 Hz and LED intensity 100%.

Fractions collected over 11 runs were pooled and dried down, yielding 13.1 mg of product as a white powder. Structural characterisation was performed by NMR in deuterated chloroform.

N. benthamiana leaf disc-based M. sexta feeding assay. The assay was based on a published assay (98). In preparation for the assay, A. tumefaciens liquid cultures were inoculated with strains harbouring genes of interest. Infiltration of N. benthamiana leaves with A. tumefaciens and NeemAZAL T/S solution (diluted 1:200 in ddH2O to achieve the recommended concentration) was performed the following day. For acclimatisation, M. sexta eggs three days post-lay were transferred onto uninfiltrated N. benthamiana leaves in Petri dishes containing moistened blue roll. To perform the assay, three discs of filter paper (Whatman qualitative Grade 1, 15 mm) were placed in each well of a 24-well plate (CLS3738, Corning Costar) with the addition of 120 µl of sterile water. Discs of infiltrated N. benthamiana leaves (four days post-infiltration) were prepared using a 10 mm diameter leaf cutter and placed into wells following a computer-generated randomised experimental design.

The remaining leaf material was freeze-dried and quantitative metabolite analysis was performed as described. A paintbrush was used to transfer one acclimatised five-day post-lay (one day post-hatch) M. sexta larva into each well, ensuring no contact between the paintbrush and the leaf. Plates were sealed and incubated for 32-48 h. Images of plates were taken before and after the assay. Post-assay images were aligned to pre-assay images using the landmark correspondence plug-in in Fiji to generate comparable post-assay images. Python was used to run the ‘find green area differences’ script to calculate the percentage of green area remaining after the assay.
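The ‘find green area differences’ script itself is not reproduced here. As a hypothetical sketch of the calculation, assuming the pre- and post-assay images have already been aligned and read into flat lists of RGB pixels (for example with Pillow's `Image.getdata()`), the percentage of green area remaining could be computed by classifying pixels as green and comparing counts. The green-dominance rule and its `margin` threshold are assumptions, not the original script's criteria.

```python
def is_green(pixel, margin=20):
    """Hypothetical rule: the green channel exceeds red and blue by `margin`."""
    r, g, b = pixel[:3]
    return g > r + margin and g > b + margin


def percent_green_remaining(before, after, margin=20):
    """Percentage of pre-assay green pixels still green after feeding."""
    green_before = sum(1 for p in before if is_green(p, margin))
    if green_before == 0:
        raise ValueError("no green pixels found in the pre-assay image")
    green_after = sum(1 for p in after if is_green(p, margin))
    return 100.0 * green_after / green_before


leaf = (40, 160, 50)          # typical green leaf pixel
background = (200, 200, 200)  # exposed filter paper
before = [leaf] * 100 + [background] * 50
after = [leaf] * 60 + [background] * 90   # 40 leaf pixels eaten
print(percent_green_remaining(before, after))  # 60.0
```

In practice the threshold would need tuning against the imaging conditions, since lighting and leaf chlorosis both shift pixel colours.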

Table S1. Summary of Meliaceae transcriptomes assembled from previously generated RNAseq data.

Each transcriptome has been assigned an ID based on species and original data set: Azadirachta indica (Ai1 and Ai2) and Melia azedarach (Ma1). Details of the RNAseq data used to assemble the transcriptomes include BioSample and SRA identifiers and the tissue of origin. Basic statistics for the Trinity de novo (7) assemblies are also included: total number of transcripts, N50 and total number of bases assembled.

Table S2. Cloning and functional characterization of candidate OSCs.

Table S3. 13C and 1H δ assignments for tirucalla-7,24-dien-3β-ol.

Carbon numbering scheme and selected COSY and HMBC

C-10: δC 34.95 (quaternary); C-19: δC 13.12, δH 0.75 (3H, s)

NMR spectra were recorded using CDCl3 and referenced to TMS. Coupling constants are reported as observed and not corrected for second order effects. Assignments were made via a combination of 1H, 13C, DEPT-edited HSQC, HMBC and 2D NOESY experiments. Where signals overlap, the 1H δ is reported as the centre of the respective HSQC crosspeak. Assignments were consistent with previous literature assignments for tirucalla-7,24-dien-3β-ol (23).

Table S4. Candidate CYPs from M. azedarach, orthologs from A. indica and closest C. sinensis homologs.

Table S4 Continued

Candidate CYPs from the M. azedarach transcriptome (Ma1) are listed with their orthologs or closest homologs from the A. indica transcriptomes (Ai1, Ai2) and the C. sinensis protein annotation (GCF_000317415.1 (Cs)). For each candidate, the CYP nomenclature is listed (black) along with identifiers in datasets (grey). CYPs and fragments from A. indica and M. azedarach were named and assigned to clans by the Cytochrome P450 Nomenclature Committee following established convention (24). CYPs from C. sinensis had previously been identified from an alternative C. sinensis genome (25) and later assigned names by the Cytochrome P450 Nomenclature Committee (24). Instances where homologs were not identifiable in a dataset (grey box), were not full-length candidates (*), or were under 300 amino acids and therefore considered fragments (**) are indicated. Candidate selection was based on differential gene expression analysis performed on an A. indica dataset (A. indica DGE) (5, 6) or on occurrence in the unique CYP71 subclade (CYP71). The phylogeny of the CYP candidates is presented in Fig. 5A, with the exception of MaCYP72A720 and MaCYP88A108, which are phylogenetically distinct from the CYP71 family. Candidates from M. azedarach (Ma1) were cloned, and details of amplification and the number of single nucleotide polymorphisms (SNPs) are given.

Table S5. ¹³C & ¹H δ assignments for tirucalla-7-ene-24,25-epoxy-3β,23-diol (dihydroniloticin).

Carbon numbering scheme and selected COSY and HMBC

4 38.98 / 11 18.10 1.53 (2H, m)

10 34.96 / 19 13.12 0.75 (3H, s)

NMR spectra were recorded using CDCl3 and referenced to TMS. Coupling constants are reported as observed and not corrected for second-order effects. Assignments were made via a combination of ¹H, ¹³C, DEPT-edited HSQC, HMBC and 2D NOESY experiments. Where signals overlap, the ¹H δ is reported as the centre of the respective HSQC crosspeak. Assignments were consistent with previous literature assignments for dihydroniloticin (26).

Table S6. ¹³C & ¹H δ assignments for tirucalla-7,24-dien-3β,21-diol.

Carbon numbering scheme and selected COSY and HMBC


NMR spectra were recorded using CDCl3 and referenced to TMS. Coupling constants are reported as observed and not corrected for second-order effects. Assignments were made via a combination of ¹H, ¹³C, DEPT-edited HSQC, HMBC and 2D NOESY experiments. Where signals overlap, the ¹H δ is reported as the centre of the respective HSQC crosspeak.

Table S7. ¹³C δ comparison to the literature for tirucalla-7-ene-23,21,24,25-diepoxy-3β,21-diol (melianol), C21 epimeric mixture.

Carbon numbering scheme

NMR spectra were recorded in Fourier transform mode at a nominal frequency of 150 MHz using CDCl3 and referenced to TMS. Assignments were consistent with previous literature assignments for melianol (27).

Table S10. Primers

No. Primer Sequence Target

OSC candidate yeast recombination primers

1 AiOSC1_pYES2-F ACTACTAGCAGCTGTAATACGACTCACTATAGGGAATATTAATGTGGAAGCTGAAGATTG AiOSC1 frag1

2 AiOSC1Join-R TTCAGGATCCTCCACCCAACAAGCAAGCATACACAGC AiOSC1 frag1

3 AiOSC1Join-F GCTGTGTATGCTTGCTTGTTGGGTGGAGGATCCTGAA AiOSC1 frag2

4 AiOSC1_pYES2-R GAATGTAAGCGTGACATAACTAATTACATGATGCGGCCCTTTAATTAGGCAATGGAAC AiOSC1 frag2

MaOSC1

5 MaAZA1_pYES2-F ACTACTAGCAGCTGTAATACGACTCACTATAGGGAATATTAATGTGGAAGCTGAAGGTTGCAGAG MaOSC1

6 CsOSC1_pYES2-F ACTACTAGCAGCTGTAATACGACTCACTATAGGGAATATTAATGTGGAGGCTGAAGGTTGC CsOSC1 frag1

7 CsOSC1Join-R CGTTTGGATCTTCAACCCAACAAGCAAGCATACACAACG CsOSC1 frag1

8 CsOSC1Join-F CGTTGTGTATGCTTGCTTGTTGGGTTGAAGATCCAAACG CsOSC1 frag2

9 CsOSC1_pYES2-R GAATGTAAGCGTGACATAACTAATTACATGATGCGGCCCTTTAATTAGGCAGTGGAACTC CsOSC1 frag2

10 CsOSC2_pYES2-F ACTACTAGCAGCTGTAATACGACTCACTATAGGGAATATTAATGTGGAAGCTAAAGGTAGGAG CsOSC2 frag1

11 CsOSC2Join-R CTTCCAAAGCTCTGCATCTTCATTCCATCCTCAGCAACCC CsOSC2 frag1

12 CsOSC2join-F GGGTTGCTGAGGATGGAATGAAGATGCAGAGCTTTGGAAG CsOSC2 frag2

13 CsOSC2_pYES2-R GAATGTAAGCGTGACATAACTAATTACATGATGCGGCCCTTCAAGAAAGAGTAACCTGATG CsOSC2 frag2

14 CsOSC3_pYES2-F ACTACTAGCAGCTGTAATACGACTCACTATAGGGAATATTAATGTGGAGGCTTAAGATTGG CsOSC3 frag1

15 CsOSC3Join-R CCATTAGGATCTTCCGCCCAACAGGAGAGCATGTTCAGCG CsOSC3 frag1

16 CsOSC3_join-F CGCTGAACATGCTCTCCTGTTGGGCGGAAGATCCTAATGG CsOSC3 frag2

17 CsOSC3_pYES2-R GAATGTAAGCGTGACATAACTAATTACATGATGCGGCCCTTCAGAAAATCTTGGACGATTG CsOSC3 frag2

18 AtLUP5_pYES2-F ACTACTAGCAGCTGTAATACGACTCACTATAGGGAATATTAATGTGGAGGTTAAAGGTAG AtLUP5

19 AtLUP5_pYES2-R GAATGTAAGCGTGACATAACTAATTACATGATGCGGCCCTCTATAGATCTGCGTGATGT AtLUP5

OSC candidate Gateway cloning

20 AiOSC1_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGTGGAAGCTGAAGATTG AiOSC1

23 CsOSC1_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTCAATGTGGAGGCTGAAGGTTGC CsOSC1

24 CsOSC1_AttB2-R GGGGACCACTTTGTACAAGAAAGCTGGGTTTTAATTAGGCAGTGGAACTC CsOSC1

25 CsOSC2_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGTGGAAGCTAAAGGTAGGAG CsOSC2

26 CsOSC2_AttB2-R GGGGACCACTTTGTACAAGAAAGCTGGGTTTCAAGAAAGAGTAACCTGATG CsOSC2

27 CsOSC3_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGTGGAGGCTTAAGATTGG CsOSC3

28 CsOSC3_AttB2-R GGGGACCACTTTGTACAAGAAAGCTGGGTTTTCAGAAAATCTTGGACGATTG CsOSC3

Sequencing (plasmid-specific)

CYP candidate Gateway cloning

39 MaCYP72A720_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGGAGTTATCTCTGAAATCGG MaCYP72A720

40 MaCYP72A720_AttB1-R GGGGACCACTTTGTACAAGAAAGCTGGGTTTTATAATTTCTTTAAAATCAAG MaCYP72A720

41 MaCYP71BQ5_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGGAGTTCAGACTGCCTGTTC MaCYP71BQ5

42 MaCYP71BQ5_AttB1-R GGGGACCACTTTGTACAAGAAAGCTGGGTTTCACTTCTGAAAAGGAATACGAG MaCYP71BQ5

43 MaCYP88A108_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGGAGCTAAATTTCCTGTGG MaCYP88A108

44 MaCYP88A108_AttB1-R GGGGACCACTTTGTACAAGAAAGCTGGGTTTCAGAAGTTCTTGACCTTGATG MaCYP88A108

45 MaCYP71BQ6_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGATTCTTCAATATCCC MaCYP71BQ6

46 MaCYP71BQ6_AttB2-R GGGGACCACTTTGTACAAGAAAGCTGGGTATCACTTCTGTAAAGGAAAACG MaCYP71BQ6

47 MaCYP71D557_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAGGTTCAATTTGTTTCC MaCYP71D557

48 MaCYP71D557_AttB2-R GGGGACCACTTTGTACAAGAAAGCTGGGTATCATGGATGATCAGCCGG MaCYP71D557

49 MaCYP71CD2_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGAATCTCCAACTCGATTAC MaCYP71CD

50 MaCYP71CD2_AttB2-R GGGGACCACTTTGTACAAGAAAGCTGGGTATTAATTTTCCACCTCAATGTTG MaCYP71CD

51 MaCYP71BE124_AttB1-F GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAGTACCAACTTCCATC MaCYP71BE124

52 MaCYP71BE124_AttB2-R GGGGACCACTTTGTACAAGAAAGCTGGGTACTAGACTAGTTTGGAGTTATTG MaCYP71BE124

Post-melianol candidate Gateway cloning

61 MaCYP88A108_AttB1 GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAGCTAAATTTCCTGTGG MaCYP88A108

62 MaCYP88A108_AttB2 GGGGACCACTTTGTACAAGAAAGCTGGGTATCAGAAGTTCTTGACCTTGATG MaCYP88A108

63 MaISOM1_AttB1 GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAGAACAAAAGTGCATCG MaISOM1

64 MaISOM1_AttB2 GGGGACCACTTTGTACAAGAAAGCTGGGTATTAATTTAGTTTATCATTATCCATCC MaISOM1

65 MaSDR1_AttB1 GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGCGGATCATTTGAATGC MaSDR1

66 MaSDR1_AttB2 GGGGACCACTTTGTACAAGAAAGCTGGGTACTAAGCGAGACCACGATC MaSDR1

67 MaBAHD1_AttB1 GGGGACAAGTTTGTACAAAAAAGCAGGCTTAATGGAGCCTGAAATAATTTCC MaBAHD1

68 MaBAHD1_AttB2 GGGGACCACTTTGTACAAGAAAGCTGGGTATCACATGGGACTTGGG MaBAHD1

The nucleotide sequences of all primers used and their target genes are listed, including those used for cloning, sequencing and qRT-PCR.
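The two primer families in Table S10 follow simple sequence rules that can be checked programmatically. The Python sketch below is an illustration, not the authors' workflow. It assumes that each Join-F/Join-R pair is an exact reverse complement forming the overlap between two fragments, and that each Gateway primer is the standard attB1 or attB2 adapter followed by gene-specific sequence; the helper names `revcomp` and `gene_specific_part` are hypothetical. The primer strings are copied from the table above.

```python
# Sketch: sequence checks on Table S10 primers (illustrative, not the
# authors' workflow). Assumes Join-F/Join-R pairs are exact reverse
# complements and Gateway primers = standard attB adapter + gene sequence.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
ATTB1 = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"  # standard Gateway attB1 adapter
ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGT"  # standard Gateway attB2 adapter

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence, ignoring whitespace."""
    return "".join(seq.split()).translate(COMPLEMENT)[::-1]

def gene_specific_part(primer: str) -> str:
    """Strip a leading attB1/attB2 adapter and return the 3' remainder."""
    for adapter in (ATTB1, ATTB2):
        if primer.startswith(adapter):
            return primer[len(adapter):]
    raise ValueError("primer does not start with an attB1 or attB2 adapter")

# Primers 7 and 8: the CsOSC1 fragment overlap.
cs_osc1_join_r = "CGTTTGGATCTTCAACCCAACAAGCAAGCATACACAACG"
cs_osc1_join_f = "CGTTGTGTATGCTTGCTTGTTGGGTTGAAGATCCAAACG"
print(revcomp(cs_osc1_join_r) == cs_osc1_join_f)  # True

# Primer 20 (AiOSC1_AttB1-F): from the ATG onward, the gene part matches SEQ ID 1.
ai_osc1_attb1_f = "GGGGACAAGTTTGTACAAAAAAGCAGGCTTCATGTGGAAGCTGAAGATTG"
fwd = gene_specific_part(ai_osc1_attb1_f)
print(fwd)  # TCATGTGGAAGCTGAAGATTG
print("ATGTGGAAGCTGAAGATTGCAGAG".startswith(fwd[fwd.index("ATG"):]))  # True

# Primer 24 (CsOSC1_AttB2-R): the reverse complement of the gene part is the
# 3' end of the CsOSC1 CDS (SEQ ID 5) followed by one spacer-derived base.
cs_osc1_attb2_r = "GGGGACCACTTTGTACAAGAAAGCTGGGTTTTAATTAGGCAGTGGAACTC"
rev = revcomp(gene_specific_part(cs_osc1_attb2_r))
print(rev)  # GAGTTCCACTGCCTAATTAAA
```

The same two helpers apply to the remaining Join and AttB primer pairs; a mismatch would point either to a transcription error in the table or to a deliberately introduced change such as a spacer base.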

Table S11. Isolera™ Prime fractionation conditions.

Target Column Solvents Gradient Yield

AiOSC1 SNAP Ultra A: Hexane 0-2%(1 CV), 2-25%

product 10g B: EtOAc (10CV), 25-100% (3CV)

A:Hexane 0-100%(45CV),

KP-Sil 50g 50 mg

B: EtOAc 100%(11 CV)

AiOSC1

SNAP Ultra A:Hexane 20-50%

MaCYP71BQ5 10 mg

10g B: EtOAc (176CV),50%(5CV) product

SNAP Ultra A:Hexane

20%(176CV) 4 mg 10g B: EtOAc

SNAP KP-Sil A:Hexane

6-100% (10CV) 190 mg 25g B: EtOAc

SNAP Ultra A:Hexane 10%(1 CV), 10-

AiOSC1 170 mg

10g B: EtOAc 60%(10CV), 60%(5CV)

MaCYP71CD2

product SNAP Ultra A:Hexane 10% (3CV), 10-34%

130 mg

10g B: EtOAc (143CV)

SNAP Ultra A:Hexane

0-30%(181CV) 120 mg 10g B: EtOAc

SNAP KP-Sil A:Hexane

6-100% (13CV) 410 mg 100g B: EtOAc

AiOSC1 SNAP Ultra A:Hexane

20-53% (116CV) 220 mg

MaCYP71CD2 10g B: EtOAc

MaCYP71BQ5 SNAP Ultra A: Hexane

10-41 % (140CV) 34 mg product 10g B:DCM

SNAP Ultra A: Hexane 10-30% (87CV), 30%

4 mg

10g B:DCM (79CV)

A: Hexane

AiOSC1 SNAP Ultra 50g 6-100% (28CV) 200 mg

B: EtOAc

MaCYP71CD2

MaCYP71BQ5 A:DCM

SNAP KP-Sil 25g 0-10% (106CV) 130 mg

MaCYP88A108 B:MeOH

MaISOM1

product A:DCM

SNAP Ultra 10g 10-41 % (140CV) 40 mg

B:MeOH

Table S11: Details of the conditions used for Isolera™ Prime fractionation, including column, solvent system, percentage gradient of solvent B, number of column volumes (CV) and dry weight of the resulting extract (yield). All samples were dry-loaded onto the Isolera™ Prime (Biotage) using Celite® (Sigma-Aldrich).

Table S12. Media used for S. cerevisiae selection

Plasmid Selection Medium

pYES2 URA3 SD-URA

pAG423GAL HIS3 SD-HIS

pAG425GAL LEU2 SD-LEU

pYES2, pAG423GAL, pAG425GAL URA3, HIS3, LEU2 SD-URA-HIS-LEU

Synthetic drop-out (SD) media used to select for each S. cerevisiae plasmid. The nutrients omitted for selection are uracil (URA), histidine (HIS) and leucine (LEU).
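The pairing of plasmids, markers and media in Table S12 follows a single rule: each plasmid's auxotrophic marker is selected by omitting the matching nutrient, so a strain carrying several plasmids needs a medium lacking all of the corresponding nutrients. A minimal sketch of that rule, using a hypothetical helper `dropout_medium` and only the three plasmids listed in the table:

```python
# Hypothetical helper reproducing the Table S12 logic: each plasmid's
# auxotrophic marker is selected by dropping the matching nutrient.
PLASMID_MARKER = {"pYES2": "URA3", "pAG423GAL": "HIS3", "pAG425GAL": "LEU2"}
MARKER_NUTRIENT = {"URA3": "URA", "HIS3": "HIS", "LEU2": "LEU"}

def dropout_medium(plasmids: list[str]) -> str:
    """Name the SD drop-out medium that selects for every plasmid given."""
    omitted = [MARKER_NUTRIENT[PLASMID_MARKER[p]] for p in plasmids]
    return "SD-" + "-".join(omitted)

print(dropout_medium(["pYES2"]))                            # SD-URA
print(dropout_medium(["pYES2", "pAG423GAL", "pAG425GAL"]))  # SD-URA-HIS-LEU
```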

Table S13. ¹³C & ¹H δ assignments for melianol B (C21 epimeric mixture).

Carbon numbering scheme and selected COSY, HMBC and NOESY

Carbon # ¹³C δ (150 MHz) ¹H δ (600 MHz) Carbon # ¹³C δ (150 MHz) ¹H δ (600 MHz)

15 119.52 119.04 5.47 (1H, m) 5.46 (1H, m) 10 37.58 37.57 / 102.38 97.59 5.39 (1H, m) 5.38 (1H, m) 35.07 34.73 2.18 (2H, m) 2.12 (2H, m)

2.09 (1H, m) 2.02 (1H, m)

3 78.77 78.74 3.28 (1H, dd, J = 11.3, 4.5 Hz) 22 34.71 31.35

1.41 (1H, m) 1.74 (1H, m)

7 72.33 72.32 3.93 (1H, m) 28 27.67 0.99 (3H, s)

1.64 (1H, m)

25 58.09 57.27 / 2 27.14

1.58 (1H, m) 57.51 52.73 1.74 (1H, m) 2.01 (1H, m) 25.02 24.92 1.33 (3H, s) 1.32 (3H, s)

1.85 (1H, m)

20 47.80 45.48 2.39 (1H, m) 1.49 (1H, m) 6 23.68 23.66

1.74 (1H, m)

13 47.09 46.70 19.79 19.45 1.02 (3H, s) 1.09 (3H, s)

5 46.53 46.49 1.50 (1H, m) 1.48 (1H, m) 27 19.35 19.21 1.32 (3H, s)

9 41.78 41.74 1.93 (1H, m) 1.91 (1H, m) 29 15.45 0.79 (3H, s) 0.89 (3H, s)

NMR spectra were recorded using CDCl3 and referenced to TMS. Assignments were made via a combination of ¹H, ¹³C, DEPT-135, DEPT-edited HSQC, HMBC and 2D NOESY experiments. Where signals overlap, the ¹H δ is reported as the centre of the respective HSQC crosspeak.

Sequence tables

Table 1 - “Melianol-biosynthetic sequences” or “M-B sequences” characterised or identified in the present work


*Partial CDS and encoded polypeptide, see Table S4

** Predicted sequence

Table 2 - Other TDS sequences

Table 3 - Accessory enzymes

Table 4

Percentage identity matrices were generated using Clustal Omega (v1.2.0) with default parameters (clustalo --full --percent-id --distmat-out=output.distmat).
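To illustrate what a percentage identity value expresses, the sketch below computes pairwise identities over pre-aligned sequences. It uses one common convention (identical positions divided by positions where neither sequence has a gap), which is an assumption and may not match the calculation Clustal Omega performs internally. The example uses only the first 24 residues of SEQ ID 2, 4 and 6, so the values are not those of the full-length proteins.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical columns / columns where neither sequence
    has a gap ('-'). Clustal Omega may compute its values differently.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)

def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """All-against-all percent identities, keyed by (name1, name2)."""
    return {(m, n): percent_identity(aligned[m], aligned[n])
            for m in aligned for n in aligned}

# First 24 residues of SEQ ID 2, 4 and 6 (gap-free, so already aligned).
fragments = {
    "AiOSC1": "MWKLKIAEGDKNSPYISTTNNFVG",
    "MaOSC1": "MWKLKVAEGDKNSPYIFTTNNFVG",
    "CsOSC1": "MWRLKVAEGDKNSPYMFTTNNFVG",
}
for (m, n), pid in identity_matrix(fragments).items():
    if m < n:
        print(f"{m} vs {n}: {pid:.1f}%")
```

For these fragments, AiOSC1 and MaOSC1 share 22 of 24 positions. Identities over full-length proteins require a proper multiple alignment first, which is the step Clustal Omega performs.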

SEQ ID 1 - AiOSC1 coding sequence (nucleotide)

ATGTGGAAGCTGAAGATTGCAGAGGGTGACAAAAATAGCCCATATATTTCTACAACAAAC AATTTCGT

TGGAAGGCAAATATGGGAATTTGATCCGAACGCCGGAACTGCTGAAGAGCTTGCTGA AGTTGAAGAAG

CTCGTCAGAATTTCTACAAGAATCGCCATCAAGTCAAACCTGCTAGTGATCTTATTT TTCGTCTTCAG

TTTCTTAGAGAGAAAAACTTCAAGCAAACGATTCCTCAAGTGAAGGTTGAAGATGGG GAGGAGATCAC

ATATGACACTGCCACAGCAGCAATGAAGAGGGCTGCTCACTACTTCTCAGCAATTCA GGCTAGCGATG

GCCATTGGCCTGCTGAAAATTCTGGCCCTATGTATTTCCTTCCTCCATTTGTATTCT GCTTGTACATT

ACAGGACATCTTGATACTGTATTTACAGCTGCTCATCGCAGAGAAGTCCTTCGTTAC TTATACAATCA

TCAGCATGAAGATGGAGGATGGGGAATACACATAGAAGCGCCAAGCAGTATGTTTGG TACAGTTTACA

GTTATCTTACAATGCGTTTGCTAGGGTTAGGACCCAACGATGGTGAAAACAATGCCT GCGCCAGAGCT

AGAAAATGGATTCGTGATAATGGTGGTGTCACTTACATTCCCTCTTGGGGAAAGAAT TGGCTTTCGAT

TCTTGGTTTGTTTGAATGGGCTGGAACACACCCAATGCCCCCAGAGTTCTGGATGCT TCCTTCTCATT

TTCCACTTCATCCAGCCCAAATGTGGTGCTTCTGCCGGCTGGTTTACATGCCCTTGT GTTATTTATAC

GGCAAAAGATTTGTTGGTCCAATCACTCCACTTATCAAACAACTGAGAGAAGAACTT CATACAGAGCC

TTACGATAAAATCAACTGGAGGAAAGTTCGTCATCAATGTGCAAAGACTGATCTCTA CTACCCCCATC

CATTCGTACAAGAAGTTCTATGGGATACTCTATACTTTGCTACAGAGCCTCTGCTTA CTCGTTGGCCA

TTGAACAAGTATGTCAGAGAGAAGGCTTTGAAACAAACGATGAAGATCATTCATTAT GAAGACCAAAG

CAGTCGATATATTACTATTGGATGCGTCGAGAAGCCGCTGTGTATGCTTGCTTGTTG GGTGGAGGATC

CTGAAGGGGTTGCTTTCAAGAAGCATCTTGAGAGAATTGCTGATTTTATTTGGATTG GAGAAGATGGA

ATGAAAGTTCAGACATTTGGCAGTCAAACATGGGATACTGCTCTTGGACTTCAAGCT TTGCTTGCTTG

CAATATCGTTGATGAAATTGGACCTGCACTTGCTAAAGGACACGACTACTTGAAGAA AGCTCAGGTGA

GGGATAATCCAGTGGGTGATTATACAAGCAATTTCCGACACTTTTCCAAAGGAGCAT GGACTTTCTCT

GATCAAGATCATGGTTGGCAAGTTTCAGATTGTACTGCAGAAAGTTTGAAGTGCTGC CTGCATTTCTC

AATGCTGCCTCCAGAAATTGTTGGAGAGAAACATGATCCTGAGAGATTATATGAAGC TGTCAATTTCA

TACTCTCTCTTCAGGATAAAAATGGTGGAATAGCAGTTTGGGAGAAAGCTGGTGCCT CTTTGATGTTA

GAGTGGCTCAATCCTGTAGAGTTTCTGGAGGACCTTATTGTTGAGCATACTTACGTG GAATGCACTGC

TTCAGCAATCGAGGCATTTGTTATGTTCAAGAAATTATACCCACATCATCGCAAGAA GGAGATTGAAA

ATTTCCTCGTAAAAGCTGTACAGTACATTGAAAATGAACAAACTGCTGATGGTTCAT GGTATGGAAAC

TGGGGAGTTTGCTTCTTATATGGAACATGTTTTGCACTTGGAGGTTTACATGCTGCT GGAAAGACTTA

CAACAATTGTCTTGCCATTCGTAGAGCAGTTGAGTTTCTGCTTCAAGCACAGAGTGA TGATGGTGGTT

GGGGAGAGAGCTACAAATCTTGCCCTAGTAAGATATACGTACCTCTTGATGGGAAAA GATCAACTGTG

GTACACACTGCATTGGCTATTCTTGGTTTAATCCATGCTGGGCAGGCTGAAAGAGAC CCAACCCCTAT

TCATCGTGGTGTAAAATTGCTGATCAACTCTCAATTGGAGAATGGAGACTTCCCTCA ACAGGAAATTA

TGGGAGTTTTTATGAGAAACTGTATGTTACACTATGCTCAATACAGGAATATTTTTC CTTTGTGGGCT

TTAGCTGAATATAGAAGAAAAGTTCCATTGCCTAATTAA
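SEQ ID 2 is the conceptual translation of SEQ ID 1. A minimal sketch of that translation, assuming the standard genetic code (NCBI translation table 1) and ignoring the spaces and line breaks of the listing, is shown here. Applied to the first and last lines of SEQ ID 1, which both read in frame, it reproduces the start and end of SEQ ID 2. The function name `translate` is illustrative.

```python
# Standard genetic code (NCBI table 1), laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(cds: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence, ignoring listing whitespace."""
    seq = "".join(cds.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # 'X' for non-ACGT codons
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)

first_line = "ATGTGGAAGCTGAAGATTGCAGAGGGTGACAAAAATAGCCCATATATTTCTACAACAAAC AATTTCGT"
last_line = "TTAGCTGAATATAGAAGAAAAGTTCCATTGCCTAATTAA"
print(translate(first_line))  # MWKLKIAEGDKNSPYISTTNNF
print(translate(last_line))   # LAEYRRKVPLPN
```

Trailing bases that do not complete a codon are ignored, and translation halts at the first stop codon unless `to_stop=False`.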

SEQ ID 2 - AiOSC1 translated nucleotide sequence (peptide)

MWKLKIAEGDKNSPYISTTNNFVGRQIWEFDPNAGTAEELAEVEEARQNFYKNRHQVKPA SDLIFRLQ FLREKNFKQTIPQVKVEDGEEITYDTATAAMKRAAHYFSAIQASDGHWPAENSGPMYFLP PFVFCLYI TGHLDTVFTAAHRREVLRYLYNHQHEDGGWGIHIEAPSSMFGTVYSYLTMRLLGLGPNDG ENNACARA RKWIRDNGGVTYIPSWGKNWLSILGLFEWAGTHPMPPEFWMLPSHFPLHPAQMWCFCRLV YMPLCYLY GKRFVGPITPLIKQLREELHTEPYDKINWRKVRHQCAKTDLYYPHPFVQEVLWDTLYFAT EPLLTRWP LNKYVREKALKQTMKI IHYEDQSSRYITIGCVEKPLCMLACWVEDPEGVAFKKHLERIADFIWIGEDG MKVQTFGSQTWDTALGLQALLACNIVDEIGPALAKGHDYLKKAQVRDNPVGDYTSNFRHF SKGAWTFS DQDHGWQVSDCTAESLKCCLHFSMLPPEIVGEKHDPERLYEAVNFILSLQDKNGGIAVWE KAGASLML EWLNPVEFLEDLIVEHTYVECTASAIEAFVMFKKLYPHHRKKEIENFLVKAVQYIENEQT ADGSWYGN WGVCFLYGTCFALGGLHAAGKTYNNCLAIRRAVEFLLQAQSDDGGWGESYKSCPSKIYVP LDGKRSTV VHTALAILGLIHAGQAERDPTP IHRGVKLLINSQLENGDFPQQEIMGVFMRNCMLHYAQYRNIFPLWA LAEYRRKVPLPN

SEQ ID 3 - MaOSC1 coding sequence (nucleotide)

ATGTGGAAGCTGAAGGTTGCAGAGGGTGACAAAAATAGCCCATATATTTTTACAACAAAC AATTTCGT TGGAAGGCAAATATGGGAATTTGATCCGAATGCTGGAACTGCTGAAGAGCTTGCTGAAGT TGAAGAAG CTCGTCAGAATTTCTACAAGAATCGCCATCAAGTCAAGCCTGCTAGTGACCTTATTTTTC GTCTTCAG TTTCTTAGAGAGAAAAACTTCAAGCAAACGATTCCTCAAGTGAAGGTTGAAGATGGGGAG GAAATCAC ATATGACACTGCTACAGCAGCAATGAAGAGGGCTGCTCACTACTTCTCTGCAATTCAGGC TAGCGATG GCCATTGGCCTGCTGAAAATTCTGGCCCTATGTATTTCCTTCCTCCATTTATATTCAGCT TGTACATT ACAGGACATCTTGATACTGTATTTACAGCTGCTCATCGCAGAGAAGTCCTTCGTTACTTA TACAATCA TCAGCATGAAGATGGAGGATGGGGAATACACATAGAAGGACCAAGCAGTATGTTTGGTAC AGTTTACA GTTATCTTACAATGCGTTTGCTGGGGTTAGGACCCAACGATGGTGAAAACAATGCCTGTG CTAGAGCT AGAAAGTGGATTCGTGATAATGGTGGTGTCACTTACATTCCCTCTTGGGGAAAGAATTGG CTTTCGAT TCTTGGATTGTTTGAATGGGCTGGAACACACCCAATGCCCCCAGAGTTCTGGATGCTTCC TTCTTATT TTCCACTTCATCCAGCCCAAATGTGGTGCTTCTGCCGGCTGGTTTACATGCCCTTGTCTT ATTTATAC GGCAAAAGATTTGTTGGTCCAATCACTCCACTTATCAAACAACTCAGAGAAGAACTTCAC ACAGAGCC TTACGATCAAATCAACTGGAGGAAAGTTCGTCATCTATGTGCGAAGCCTGATCTGTACTA CCCACATC CATTCGTACAAGACGTTCTATGGGATACTCTATACTTGGCTACAGAGCCTCTGCTTACTC GTTGGCCA TTGAACAAGTATCTCAGAGAGAAGGCTTTGAAACAAACGATGAAGATCATTCATTATGAA GACCAAAG CAGTCGATACATTACTATTGGCTGCGTCGAGAAGCCGTTGTGTATGCTTGCTTGTTGGGT GGAGGATC CTGAGGGGGTTGCTTTCAAGAAGCATCTTGAGAGAATTGCTGATTTTATTTGGATTGGAG AAGATGGA ATGAAAGTTCAGACATTTGGTAGTCAAACATGGGATACTGCTCTTGGTCTTCAAGCTTTG CTTGCTTG CAATATCGTTGATGAAATTGGACCAGCACTTGCTAAAGGACACGACTACTTGAAGAAAGC TCAGGTGA GGGATAATCCAGTGGGTGATTATACAAGCAATTTCCGTCACTTTTCTAAAGGAGCATGGA CTTTCTCT GATCAAGATCATGGTTGGCAAGTTTCAGATTGTACTGCAGAAAGTTTGAAGTGCTGCTTG CATTTCTC AATGCTGCCTCGAGAAATTGTTGGAGAGAAACATGATCCTGAGAGATTATATGAAGGTGT CAATTTCA TACTCTCTCTTCAGGATAAAAATGGTGGATTAGCAGTTTGGGAGAAAGCTGGTGCCTCTT TGTTGTTA GAGTGGCTCAATCCTGTAGAGTTTCTGGAGGACCTTATTGTTGAGCATACTTATGTGGAA TGCACTGC TTCAGCAATCGAGGCATTTGTTATGTTCAAGAAATTATACCCACATCATCGCAAGAAGGA GATTGAAA ATTTCCTCGTAAAAGCTGTACAGTACATTGAAAATGAACAAACTGCTGATGGTTCATGGT ATGGAAAT TGGGGAGTTTGCTTCTTATATGGAACATGTTTTGCACTTGGAGGATTACATGCTGCTGGA AAGACTTA 
CAACAATTGTCTTGCCATTCGTAGAGCAGTTGAATTTCTGCTCCAAGCACAGAGTGATGA TGGTGGTT GGGGAGAGAGCTACAAATCCTGCCCTAGTAAGATATATATTCCTCTTGATGGAAAAAGAT CAAGTGTG GTACACACTGCATTGGCTGTTCTTGGTTTAATTCATGCTGGGCAGGCTGAAAGAGACCCA ACACCTAT TCATCGTGGTGTAAAATTGCTGATCAACTCTCAATTGGAGAATGGAGACTTCCCTCAACA GGAAATTA TGGGAGTTTTTATGAGAAACTCTATGTTACACTATGCTCAATACAGGAATATTTTTCCTT TGTGGGCT TTAGCCGAATATAGAAGAAAAGTTCCATTGCCTAATTAA

SEQ ID 4 - MaOSC1 translated nucleotide sequence (peptide)

MWKLKVAEGDKNSPYIFTTNNFVGRQIWEFDPNAGTAEELAEVEEARQNFYKNRHQVKPA SDLIFRLQ FLREKNFKQTIPQVKVEDGEEITYDTATAAMKRAAHYFSAIQASDGHWPAENSGPMYFLP PFIFSLYI TGHLDTVFTAAHRREVLRYLYNHQHEDGGWGIHIEGPSSMFGTVYSYLTMRLLGLGPNDG ENNACARA RKWIRDNGGVTYIPSWGKNWLSILGLFEWAGTHPMPPEFWMLPSYFPLHPAQMWCFCRLV YMPLSYLY GKRFVGPITPLIKQLREELHTEPYDQINWRKVRHLCAKPDLYYPHPFVQDVLWDTLYLAT EPLLTRWP LNKYLREKALKQTMKI IHYEDQSSRYITIGCVEKPLCMLACWVEDPEGVAFKKHLERIADFIWIGEDG MKVQTFGSQTWDTALGLQALLACNIVDEIGPALAKGHDYLKKAQVRDNPVGDYTSNFRHF SKGAWTFS DQDHGWQVSDCTAESLKCCLHFSMLPREIVGEKHDPERLYEGVNFILSLQDKNGGLAVWE KAGASLLL EWLNPVEFLEDLIVEHTYVECTASAIEAFVMFKKLYPHHRKKEIENFLVKAVQYIENEQT ADGSWYGN WGVCFLYGTCFALGGLHAAGKTYNNCLAIRRAVEFLLQAQSDDGGWGESYKSCPSKIYIP LDGKRSSV VHTALAVLGLIHAGQAERDPTP IHRGVKLLINSQLENGDFPQQEIMGVFMRNSMLHYAQYRNIFPLWA LAEYRRKVPLPN

SEQ ID 5 - CsOSC1 coding sequence (nucleotide)

ATGTGGAGGCTGAAGGTTGCAGAGGGTGACAAAAACAGCCCATATATGTTTACAACAAAC AATTTTGT

TGGAAGGCAAATATGGGAGTTTGATCCAAAGGCTGGCTCACCAGAAGAGCTTGCTGA AGTTGAAGAAG CTCGACAAAAATTTTACAAAAATCGACATAATGTGAAGCCGGCGGGTGATCTTCTTTGGC GCTTACAG

TTTCTGAGAGAGAAAAATTTCAAACAAAGAATTCCTCAAGTAAAGGTTAAGGATGGT GATGCAATCAC

ATATGAAACTGCCACCACTGCAATGAAGAGGGCTGCTCATTACTTTTCAGCTATACA AGCTAGCGACG

GCCATTGGCCCGCTGAAAATGCCGGCCCCATGTATTTTCTTCCTCCATTTGTCTTCT GCTTGTACATT

ACAGGTCATCTTAACACTGTCTTCACAGTTGAGCATCGCAGAGAAATTCTTCGCTAT TTGTACAACCA

TCAGCATGAAGATGGTGGCTGGGGAGTACATGTAGAGGCTCCAAGCAGCATGTTTGG TACAGTTTTCA

GCTACCTTTGCATGCGTTTACTGGGTCTAGGACCCAATGATGGAGAAAACAATGCTT GTGCTAGAGCT

CGTAAATGGATTCGCGACCATGGTGGTGTCACTTACATACCCTCTTGGGGAAAGAAT TGGCTTTCAAT

ACTTGGAATATTTGAATGGTCTGGCACCAACCCAATGCCCCCAGAGTTCTGGATCCT GCCTTCCTTTG

TTCCACTTCATCCATCAAAAATGTGGTGCTATTGTAGACTGGTTTACATGCCCGTGT CTTACTTATAC

GGGAAAAGGTTTGTTGGTCCAATCACTCCACTAATTCAACAATTGAGGGAAGAACTT CACACTCAGCC

TTACAATGAAATCAACTGGAGAAAAGTTCGCCATCTCTGCGCGAAGGAGGATCTCTA CTATCCCCATC

CTTTTGTACAGGAATTACTTTGGGATACGCTTTACTTAGCAAGCGAGCCTCTCCTAA CTCGTTGGCCT

TTGAACAAGCTTATCAGACAGAAGGCTTTAAAAGAAACAATGAAGTTCATTCATTAC GAGGACCACAA

CAGTCGGTACATTACCATAGGCTGTGTGGAAAAGCCGTTGTGTATGCTTGCTTGTTG GGTTGAAGATC

CAAACGGAATTGCATTTAAGAAGCATCTTAATAGGATTGCAGATTACATTTGGCTCG GGGAAGACGGA

ATGAAAGTTCAGACTTTCGGCAGCCAAACATGGGATACTGCTCTTGGTCTTCAAGCT TTGATGGCTTG

CAATATTGCTGACGAAGTTGAATCTGTACTCGGTAAAGGACACGACTACTTGAAGAA AGCTCAGATTA

GGGACAATCCAGTAGGTGACTACAAAGGCAATTTTCGACACTTTTCTAAAGGAGCAT GGACATTTTCG

GATCAAGATCACGGATGGCAAGTTTCAGATTGTACCGCAGAAGGTTTGAAGTGTGTC CTACAACTCTC

ACTGATGCCACCAGAAATTGTTGGCGAGAAAATGGAACCTGAGAGACTATATGATGC TGTTAATTTCT

TACTCTCTCTTCAGGATGAAAAAACTGGTGGATTAGCAGTTTGGGAGCGAGCTGGAG CCTCGTTGCTG

TTGGAGTGGCTAAATCCGGTGGAGTTTCTTGAGGACCTTATAGTAGAGCACACTTAC GTCGAATGCAC

TGCATCAGCAATCGAAGCATTTACGTTGTTTAAGAAATTATACCCTCATCACAGAAA AAAGGAGATTG

AAAATTTCATCGTGAAAGCTGTGCAGTACATCGAAGACGAACAAACTGCTGATGGTT CATGGTATGGA

AATTGGGGAATTTGCTTCATATATGGTACCTGTTTTGCTCTTGGAGGGCTGCAAGTT GCTGGCAAGAC

TTACAACAATTGTCTTGCAATTCGAAGAGCAGTTGATTTTCTACTAAATGCACAAAG TGATGATGGTG

GGTGGGGAGAGAGCTACAAATCATGCCCAAATAAGATATATACACCTCTTGAAGGAA AGAGATCAACA

GTGGTACACACCGCATTGGCAGTTCTTAGTTTAATTAGTGCTGGGCAGGCTGATAGA GACCCAACTCC

TATTCATCGTGGTGTGAAGTTGTTGATCAACTCACAATTGGAAAATGGAGACTTCCC GCAACAGGAAA

TTATGGGAGTGTTCATGAGAAACTGCATGTTACACTATGCAGAATACAGGAATATTT TCCCATTGAGG

GCTTTAGCAGAGTATCGCAAAAGAGTTCCACTGCCTAATTAA

SEQ ID 6 - CsOSC1 translated nucleotide sequence (peptide)

MWRLKVAEGDKNSPYMFTTNNFVGRQIWEFDPKAGSPEELAEVEEARQKFYKNRHNVKPA GDLLWRLQ

FLREKNFKQRIPQVKVKDGDAITYETATTAMKRAAHYFSAIQASDGHWPAENAGPMY FLPPFVFCLYI

TGHLNTVFTVEHRREILRYLYNHQHEDGGWGVHVEAPSSMFGTVFSYLCMRLLGLGP NDGENNACARA

RKWIRDHGGVTYIPSWGKNWLSILGIFEWSGTNPMPPEFWILPSFVPLHPSKMWCYC RLVYMPVSYLY

GKRFVGPITPLIQQLREELHTQPYNEINWRKVRHLCAKEDLYYPHPFVQELLWDTLY LASEPLLTRWP

LNKLIRQKALKETMKFIHYEDHNSRYITIGCVEKPLCMLACWVEDPNGIAFKKHLNR IADYIWLGEDG

MKVQTFGSQTWDTALGLQALMACNIADEVESVLGKGHDYLKKAQIRDNPVGDYKGNF RHFSKGAWTFS

DQDHGWQVSDCTAEGLKCVLQLSLMPPEIVGEKMEPERLYDAVNFLLSLQDEKTGGL AVWERAGASLL

LEWLNPVEFLEDLIVEHTYVECTASAIEAFTLFKKLYPHHRKKEIENFIVKAVQYIE DEQTADGSWYG

NWGICFIYGTCFALGGLQVAGKTYNNCLAIRRAVDFLLNAQSDDGGWGESYKSCPNK IYTPLEGKRST

VVHTALAVLSLISAGQADRDPTPIHRGVKLLINSQLENGDFPQQEIMGVFMRNCMLHY AEYRNIFPLR

ALAEYRKRVPLPN

SEQ ID 7 - MaCYP71CD2 coding sequence (nucleotide)

ATGAATCTCCAACTCGATTACTTCTCCATTACTAGCTTTCTTGTTTTTCTTGTGGTCTTG TTCAGAAT

AGTTTCGGATTGGAACAAGAAATCCACAAACCTCAGACTTCCTCCGGGGCCTTCCAA GCTACCGATTA

TCGGAAGTGTTCATCACATGATCGGTCTGGATGTTGATCTCCCTTATCATGCATTGA CTGATCTTGCC

AAGAAATACGGTCCTCTGATGCATCTACAGCTGGGACAAATGTCTCTTGTCGTTGCT TCATCGGCCAA

AATGTTTAAGGAGTTGATGAAGGAGAACGACCTCGCCATTTCTCAGAGGCCTGTGCC ATATGTCGCCA

GGGTCCTAAACGATGCCGGAAGAGATATTGCTTTTGTCCCCTACGGAGATTACTGGA GACAAATCAGG AAAATTTCCAGGATGGAGCTTTTCAGCGTCAGGAAAGTTCAGTCATTGTATTACATTCGT GAAGATCA

ATCAAACAAGATGATTGATGCCATTGGGGGATCAGCAGAAACAGTAATGAATCTAAG TAAAGCTGTTT

CGGATTACACGAGTACGGTTGTTGCAAGAGCGGCGTTCGGCAGCGGATGCAAGGATC AGGATAAGTTT

ATCAAGTTGTCCCTGGAAATGGTGGCGGCGGCTGGAGCTGTCAGTACTTTGCCGGAT ATGTTCCCTGC

TCTAGGGTTTATTCCTGTACTCAGCGGGAAGAAAGCTTTCTTGCAGAATATTCAGAA GGAAGCTGACA

AGATCTTGGATTACATCATTGATGAACATATTCAGAGAACCAAGAGCAAAGATTACG ACGGCAAGGAA

TCCGACAAGGAGGATATCGTCGATGTCCTTCTCAGGCTTGAGAAAACCGGCGAGCTT GAAATCCCCAT

CACCACTCCAGACATCAAAGCTGTGATTTGGAGTGTATTTGCTGGAGGAACGGATAC ATCATCAACAA

CAACATTATGGGCAATGTCAGAATTGATGAGAAATCCAAAAGTAATGGAGAAGGTGC AAGCTGAAGTA

AGAGAAAAGCTGAAGGGAAAGAAGGAAATTTTGGAGGCAGATATTCAGGATTTACCA TACATGAGAGC

AGTAATCAAAGAAACTCTAAGACTAAGAATTCCAGGTCCATTGTTACTCCCAAGAGA AACCATGGAAC

CAATCGAAGTTGACGGGTACGTAATTCCAGAGAAAACCAAAATTCTGTTCAATGCAT GGGCAGTAACA

AGAGATCCTGAACTCTGGGAAAATCCTGAGAGTTTCATTCCGGAGAGATTTATTGAA AAACAGATAGA

TTTCAAGGGAACGAATTATGAATTCACACCATTTGGATCAGGAAGAAGGATTTGTCC AGGGATGAATT

TTGGCATAGCAAATGTAGAACTTCCATTGGCTAAATTGCTCTACTACTTCAATTGGC AGCTTCCCCAT

GGGATGAAACCGGAAGACCTCGATATGACTGCAAAATTCGGTGTGGTTTGTGGAAGG AAGAATGACTT

GTTTTTGATTCCTACTCCTTACAACATTGAGGTGGAAAATTAA

SEQ ID 8 - MaCYP71CD2 translated nucleotide sequence (peptide)

MNLQLDYFSITSFLVFLVVLFRIVSDWNKKSTNLRLPPGPSKLP IIGSVHHMIGLDVDLPYHALTDLA KKYGPLMHLQLGQMSLVVASSAKMFKELMKENDLAISQRPVPYVARVLNDAGRDIAFVPYG DYWRQIR KISRMELFSVRKVQSLYYIREDQSNKMIDAIGGSAETVMNLSKAVSDYTSTVVARAAFGS GCKDQDKF IKLSLEMVAAAGAVSTLPDMFPALGFIPVLSGKKAFLQNIQKEADKILDYIIDEHIQRTK SKDYDGKE SDKEDIVDVLLRLEKTGELEIP ITTPDIKAVTWSVFAGGTDTSSTTTLWAMSELMRNPKVMEKVQAEV REKLKGKKEILEADIQDLPYMRAVIKETLRLRIPGPLLLPRETMEP IEVDGYVTPEKTKILFNAWAVT RDPELWENPESFIPERFIEKQIDFKGTNYEFTPFGSGRRICPGMNFGIANVELPLAKLLY YFNWQLPH GMKPEDLDMTAKFGVVCGRKNDLFLIPTPYNIEVEN

SEQ ID 9 - CsCYP71CD1 coding sequence (nucleotide)

ATGGAGCAACAATTTGATTACTTCACTGTTACTAGTCTCCTTGTCTTTTTAACGTTCCTG TTAAGATT GGTGTGGGGATGGAAAAAATCGAGTGATAAAATCAAAATCAGATTGCCTCCAGGGCCTTC AAAGCTGC CAATTATTGGAAGCCTACATCACTTGATCGGGCTCGATGTTGATTTGCCCTACTATGCAC TTACAGAT TTGGCCAACAAATATGGACCTTTGATGCATCTGCAACTGGGAAAAATGTCCCTAGTCGTC GCGTCGTC AGCTAAAATGTTTAAAGAGCTGATGAAGGAGAATGATCTAGCCATTTCGCAAAGGCCAGT GCCATACG TTGCCAGAGTCCTGGAAGATGCTGGAAGAGACATTGCCTTTGTTCCCTATGGTGATTACT GGAGACAA ATTCGGAAAATAAGCAGAATGGAGCTGTTCAGTGTCAAGAAAGTCCAGTCATTGCATTAT ATTCGTGA AGATCAATCCAGCAAGCTTGTTGAATCCATTAGGGGTCATGCAGGAACAGTAATGAATCT GAGCAAAG CAGTATCCGACTACACGAGTACTGTTGTTGCAAGAGCAGCTTTTGGTAGCGGATGTAAAG ACCAAGAT AAGTTTATAAGGCTATCCCTAGAAATGGTTGCGGCAGCAGGAGCCGTCAGTACTTTGCCA GATATGTT TCCTGCTCTAGGGTTTATTCCTATACTCAGCGGGAAGAAAGCCTTTTTGAAGAGCATTCA AACGGAAG CTGACAAGATCCTCGACGTCATTATTGATGAGCATATTCAGAAAACCAAGAGCAATGAAT ATGACGGA AAGGAATCAGACAAGGAAGACATAGTTGATGTTCTCCTAAGACTTGAGAAATCTGGCGAA CTTGAAAT CCCTATTACTACCCAAGACATCAAAGCTGTCATCTGGGTAAGATTATATGTCTGCCTTTG ATTTTGTC GATTTTTTTCCCTTTTTGGGGGCATATTTGGATTGACCTGTGTGCAGCAGTTAGCTTCTA ATTCACTG TTTGTTATTGATGGAATAAAATCAGTAACAGTCAAGTAATTAAAAATATTTTATTAAACA ATTAGGTT GCTGTGCTTCCGAAATTAAGCGGAAGCAAAACAATCACCAGCTAGTACTTGTAGATTTTA TATTCGTG TGTTGTCTTTCATTGCTTATCCTTTTTTTTAAAAAAATAAAACATTATATGTGCGATGCA TGTTGTTA TCATCTGCAGAGTGTATTTGCTGGTGGAACGGACACTTCATCAACTACAACATTATGGAC TATGTCGG AGTTGATGAGAAATCCTAAAGTGATGGAGAAGGTACAAGCTGAGATAAGAGAAAAGCTCA AGGGGAAG AAAGAAATCTATGAGTCAGATATTCAGGATTTACACTATATGAGGGCAGTCATCAAAGAA GCTCTAAG ACTAAGGATTCCAGGTCCCTTATTGCTCCCAAGAGAAACCATGGAGCCAATTGAGGTTGA TGGTTACG TAATTCCAGAAAGAACCAAAATACTTTTCAATGCTTGGGCAGTAACAAGAGATCCCCAAC TTTGGGAG AATCCTGAGAGCTTTATTCCAGAAAGGTTCATAGAAAATCCCCTGGATTACAAGGGAACT AACTATGA ATTCACACCATTTGGATCAGGAAGAAGGATTTGTCCAGGCATGAATTTTGGTATAGCCAA TGTAGAGC TTCCATTGGCTAAACTACTCTACTTCTTTAATTGGCAGCTCCCTCCTGGGATGCAACCAC ATGAACTT

GATATGACTGCAAAATTTGGTGTGGTTTGCGGTAGAAAGAATGACTTGTTTCTGATT CCTACTCCTTA

CAATAATATTCCGTGA

SEQ ID 10 - CsCYP71CD1 translated nucleotide sequence (peptide)

MEQQFDYFTVTSLLVFLTFLLRLVWGWKKSSDKIKIRLPPGPSKLP IIGSLHHLIGLDVDLPYYALTD LANKYGPLMHLQLGKMSLVVASSAKMFKELMKENDLAISQRPVPYVARVLEDAGRDIAFVP YGDYWRQ IRKISRMELFSVKKVQSLHYIREDQSSKLVESIRGHAGTVMNLSKAVSDYTSTVVARAAF GSGCKDQD KFIRLSLEMVAAAGAVSTLPDMFPALGFIP ILSGKKAFLKSIQTEADKILDVIIDEHIQKTKSNEYDG KESDKEDIVDVLLRLEKSGELEIP ITTQDIKAVTWSVFAGGTDTSSTTTLWTMSELMRNPKVMEKVQA EIREKLKGKKEIYESDIQDLHYMRAVIKEALRLRIPGPLLLPRETMEP IEVDGYVTPERTKILFNAWA VTRDPQLWENPESFIPERFIENPLDYKGTNYEFTPFGSGRRICPGMNFGIANVELPLAKL LYFFNWQL PPGMQPHELDMTAKFGVVCGRKNDLFLIPTPYNNIP

SEQ ID 11 - AiCYP71CD2 coding sequence (nucleotide)

ATGAATCTCCAACTTGATTACTTCTCCATTACTAGCTTTCTTGTTTTTCTTGTGGTCTTG TTTAGAAT

AGTTTCAGATTGGAAGAAGAAATCTACAAACCTCAGGCTCCCTCCAGGCCCCTCCAA GCTACCGATTA

TCGGAAGTGTTCATCACTTGATCGGTATGGATGTTGATCTCCCTTATCATGCATTCG CTGATCTTGCC

AAGAAATACGGTCCTCTGATGCATCTACAGCTGGGACAAATGTCTCTTGTCGTTGCT TCATCGGCCAA

AATGTTTAAGGAGTTGATGAAGGAGAACGACCTCGCCATTTCTCAGAGGCCTGTGCC GTACGTCGCCA

GGGTCCTGAACGATGCCGGAAGAGATATTGCCTTTGTCCCCTACGGAGATTACTGGA GACAAATCAGG

AAAATTTCCAGGATGGAGCTTTTCAGCGTCAGGAAAGTTCAGTCATTGTATTACATT CGCGAAGATCA

ATCAAACAAGATGATTGATGCCATTCGGGGATCATCAGAAACAGTAATGAATCTAAG TAAAGCTGTTT

CGGATTACACGAGTACGGTTGTTGCAAGAGCGGCGTTCGGCAGCGGATGCAAGGATC AGGATAAGTTT

ATCAAGTTGTCCCTGGAAATGGTGGCCGCGGCTGGAGCTGTCAGTACTTTGCCGGAT ATGTTCCCTGC

TCTAGGGTTTATTCCCATACTCAGCGGGAAGAAAGCTTTCTTGCAGAATATCCAGAA GGAAGCTGACA

AAATCTTGGATTACATCATTGATGAACATATTCAGAGAACCAAGAGCAAAGATTACG ACGGCAAGGAA

TCAGACAAGGAAGATATCGTCGATGTTCTTCTCAGGCTTGAGAAAACCGGCGAGCTT GAAATCCCCAT

CACCACTCAAGACATCAAAGCTGTGATTTGGAGTGTATTTGCCGGAGGAACGGATAC ATCATCAACAA

CAACATTATGGGCAATGTCAGAATTGATGAGAAATCCAAAAGTAATGGAGAAGGTGC AAGCAGAGGTA

AGAGAAAAGCTGAAGGGAAAGAAGGAAATTTTGGAGGCAGATATTCAGGATTTACCA TACATGAGAGC

AGTAATCAAAGAAACTCTAAGACTAAGAATTCCAGGTCCATTGTTACTCCCAAGAGA AACCATGGAAC

CAATCGAAGTTGATGGGTATGTAATTCCGGAGAAAACCAAAATTCTGTTCAATGCAT GGGCAGTAACA

AGAGATCCTGAACTCTGGGAAAATCCTGAGAGTTTCATTCCGGAGAGATTTATTGAA AAACAGATAGA

TTTCAAGGGAACGAATTATGAATTCACACCATTTGGATCAGGAAGAAGGATTTGTCC AGGGATGAATT

TTGGCATAGCAAATGTAGAACTTCCATTGGCTAAATTACTCTACTACTTCAATTGGC AGCTTCCCCAC

GGGATGAAACCAGAAGACCTCGACATGACTGCAAAATTCGGTGTTGTCTGTGGAAGG AAGAATGACTT

GTTTTTGATTCCTACTCCTTACAATATTGAGGGACAAAATTAA

SEQ ID 12 - AiCYP71CD2 translated nucleotide sequence (peptide)

MNLQLDYFSITSFLVFLVVLFRIVSDWKKKSTNLRLPPGPSKLP IIGSVHHLIGMDVDLPYHAFADLA KKYGPLMHLQLGQMSLVVASSAKMFKELMKENDLAISQRPVPYVARVLNDAGRDIAFVPYG DYWRQIR KISRMELFSVRKVQSLYYIREDQSNKMIDAIRGSSETVMNLSKAVSDYTSTVVARAAFGS GCKDQDKF IKLSLEMVAAAGAVSTLPDMFPALGFIP ILSGKKAFLQNIQKEADKILDYIIDEHIQRTKSKDYDGKE SDKEDIVDVLLRLEKTGELEIP ITTQDIKAVTWSVFAGGTDTSSTTTLWAMSELMRNPKVMEKVQAEV REKLKGKKEILEADIQDLPYMRAVIKETLRLRIPGPLLLPRETMEP IEVDGYVTPEKTKILFNAWAVT RDPELWENPESFIPERFIEKQIDFKGTNYEFTPFGSGRRICPGMNFGIANVELPLAKLLY YFNWQLPH GMKPEDLDMTAKFGVVCGRKNDLFLIPTPYNIEGQN

SEQ ID 13 - MaCYP71BQ5 coding sequence (nucleotide)

ATGGAGTTCAGACTGCCTGTTCTCTTATCCTTTCTTCTCTTCTTCTTGATGCTTGTTAGG CATTGGAA

GAGATCCAAGGGCCAAGGGAAGCCACCTCCGGGGCCGAAACCGCTGCCAATTCTGGG AAACTTGCATC

AGTTGGCAGATGGTTTACCACATTATGCTGTGACAAAATTGTGCAGGAAATATGGTC CTGTCATGAAG

CTAAAACTTGGTCAGCTTGATGCTGTGGTCATTTCGTCACCTGAAGCAGCCAAAGAG GTGTTGAAAAC

AAATGAGATCAAGTTTGCTCAGAGACCTGAAGTTTATGCTGTTGAAATCATGTCTTA TGATCATTCAA GTATTGTTTTCTCTCCCTATGGTGACTATTGGAGAGAAATGAGGAAGATATCCGTCCTGG AGCTGTTG AGTAATAGGCGTGTCACATCATTCAGATCAATAAGAGAAGATGAGGTATGGAATCTTGTT CAATTCAT TTCCGAAAATGAAGGATGTATCGTAAATCTTAGCGAGAGGATTTTTATCATGACGAATGA TATTGTTT CTCGAGCAGCCTTCGGTAATAAGTGCGATGATCAACATAATTTTACAGCATTGCTTGAGG AAATCCTG CAGATTGGTGCAGGCTTTGCCATTGCTGATTTGTATCCTTCCCTTACGTTTCTTCGCCCC TTAACTGG AATGAAACCTGCACTTGAGAGAATTCATAAAAAGATGGACAAGATTCTTGAACAAATTGT GACTGAAC ATCAGATAAAAAGAAAAGCTGCTGCCAAGAACAACACTAAGTTTGAAGAGGAAGATCTAG TTGACACA CTTCTCAATTATGCAGAGGCTAACAAGAATGAGTTTCATCTCACAACCGATCAGGTCAAA GCTGTCAC TCTGGACATTTTCTCAGCAGGGAGTGAAACATCTGCAACATCAATGGAATGGGCAATGTC AGAGCTGC TAAAGAACCCAAGAGTGATGAAGAAGGCACAAGAAGAGGTCCGACAAGCATGCAAAGGGA AGAGCAAA ATCAAAGAGACAGACATTCAAAACCTAGAGTACTTGAAATTAGTCATCAAGGAAACATTT AGATTACA CGCTCCGGGTCCTTTTACCCCAAGAGAAGCAAGGGAAACATGTGAGATTGGTGGATATAC AATACCAG CTAAGGCCAAAATCCTCATTAATCTTCATGCAATGGGGAGAGATCCAACAATCTGGAAGG ATCCCGAA TGCTTCCGACCAGAGAGATTTGAAGGTTCTTCCATCGACTTCAAAGGAAATCACTTTGAG TTGATTCC ATTTGGCGGAGGAAGGAGGATTTGTCCAGGCATATCATTTGCTACTGCAAATATTGAACT TGGACTTG CTCAGATGATGTACCATTTCGACTATAAACTTCCAAACGGGAAGAGCCTAGAAGACCTTG ATATGAAT GAGAATTTTGGAATGACATGTAGAAGAAAGGAGAATCTGCAGGTGATCGCCACCACTCGT ATTCCTTT TCAGAAGTGA

SEQ ID 14 - MaCYP71BQ5 translated nucleotide sequence (peptide)

MEFRLPVLLSFLLFFLMLVRHWKRSKGQGKPPPGPKPLPILGNLHQLADGLPHYAVTKLC RKYGPVMK LKLGQLDAVVISSPEAAKEVLKTNEIKFAQRPEVYAVEIMSYDHSSIVFSPYGDYWREMRK ISVLELL SNRRVTSFRSIREDEVWNLVQFISENEGCIVNLSERIFIMTNDIVSRAAFGNKCDDQHNF TALLEEIL QIGAGFAIADLYPSLTFLRPLTGMKPALERIHKKMDKILEQIVTEHQIKRKAAAKNNTKF EEEDLVDT LLNYAEANKNEFHLTTDQVKAVTLDIFSAGSETSATSMEWAMSELLKNPRVMKKAQEEVR QACKGKSK IKETDIQNLEYLKLVIKETFRLHAPGPFTPREARETCEIGGYTIPAKAKILINLHAMGRD PTIWKDPE CFRPERFEGSSIDFKGNHFELIPFGGGRRICPGISFATANIELGLAQMMYHFDYKLPNGK SLEDLDMN ENFGMTCRRKENLQVIATTRIPFQK

SEQ ID 15 - CsCYP71BQ4 coding sequence (nucleotide)

ATGGACATTACTACTACAGCAACCCAAATGCTTCATTTGCCATCTCTTCCTGTTCTCTTA TCTTTTCT CCTCTTCTTGTTGATGCTTATTAGATATTGGAAGAATTCCAATGGACAAGGCAAGCAGCC TCCCGGGC CAAAACCGCTGCCGATTCTGGGGAACTTGCATCAGTTGGCTGATGGTCAGCCACACCATG TTATGACA AAACTATGCAGGAAATATGGTCCTGTCATGAAACTAAAACTTGGCCAGCTTGATGCTGTG ATCATTTC ATCACCTGAAGCTGCCAAAGAGGTGCTGAAAACAAATGAGATCAAGTTTGCACAGAGGCC TGAAGTTT ATGCTGTCGAAATCATGTCATATGATCATTCAAGTATCGTGTTTGCTCCCTATGGTGATT ACTGGAGA GAAATGAGGAAGATATCGGTCTTGGAGCTTTTGAGTAACAAGCGTGTCCAGTCATTCAGA TCGATAAG AGAAGATGAGGTATGGGGTCTTGTTGAATTCATTTCCTCAAACCAGGGTCGTCCCATCAA TCTTAGCG AAAAGATCTTCACCATGACAAATGATATTATTGCTCGAGCAGCCTTTGGTAGGAAGAACA GCGACCAG CATAACTTCACTGTATTGCTAGAGGAGATCATGAAAATTGGTGCAGGCTTTGCCATTGCT GATTTGTA CCCTTCCCTTACGTTTCTTCGTCCCTTGACTGGAGTGAAGCCTGCTCTGATGAGAATTCA GAAAAAGA TGGACAAGATTCTTGAAGATATTGTAGCTGAACACAAGTTGAAAAGAAAAGCTGCTGCAA ACAACAAT GTTAAACTCGAAGAGGAAGATCTAGTTGACACACTTCTGAATTATGCGGAGGCTACTAAT AAGAATGA GTTTCATCTCACAATCGATCAGGTTAAAGCTGTCACTTTGGTAAGAGCTATACACCATGC CAAAATTC AGTTTCTATTATTGAAGTTTCCTATCAAATATGTATTAATAAGACCTATTATTGAAGATG ATTAACTT GATGAGACTTCAATATGTTTCCAGGACATTTTTTCAGCAGGTAGTGAGACTTCGGCAACA TCAATGGA ATGGGCTATGTCAGAGCTACTAAAGAACCCAAGAGTAATGAAGAAGGCACAAGAAGAGGT AAGGCAAG CATGCAAAGGAAAGAGCAAAATTCAAGAGGCAGACATTCAAAAGCTAGATTACTTGAAAT TAGTTATC AAGGAAACATTTAGATTACATGCTCCAGGTCCCTTTACCCCAAGAGAAGCAAGGGAAAAA TGTGAGAT TAGAGGATATACAATACCAGCCAAAGCCAAAATCCTCATTAATCTTCATGCAATAGGAAG AGATCCAA CAGTCTGGAAAGATCCTGAATGCTTTCAACCAGAGAGATTTGAAGGTTCTTCTACTGACT TTAAAGGA AATCACTTTGAGTTGATCCCATTTGGTGGAGGAAGGAGGATTTGCCCAGGCATATCATTT GCTACTGC TAATATTGAACTTGGACTTGCTCAACTAATGTACCATTTTGACTGGAAACTTGCAAATGG AGAAAGAC TAGAAGACCTTGATATGTCTGAGAATTTTGGAATGACAGCTAGAAGAAAGGAGAATCTGC AGGTGATC

GCCACCACTCGTATTCCTTTTGAGAAGTGA

SEQ ID 16 - CsCYP71BQ4 translated nucleotide sequence (peptide)

MDITTTATQMLHLPSLPVLLSFLLFLLMLIRYWKNSNGQGKQPPGPKPLP ILGNLHQLADGQPHHVMT KLCRKYGPVMKLKLGQLDAVIISSPEAAKEVLKTNEIKFAQRPEVYAVEIMSYDHSSIVF APYGDYWR EMRKISVLELLSNKRVQSFRSIREDEVWGLVEFISSNQGRPINLSEKIFTMTNDIIARAA FGRKNSDQ HNFTVLLEEIMKIGAGFAIADLYPSLTFLRPLTGVKPALMRIQKKMDKILEDIVAEHKLK RKAAANNN VKLEEEDLVDTLLNYAEATNKNEFHLTIDQVKAVTLDIFSAGSETSATSMEWAMSELLKN PRVMKKAQ EEVRQACKGKSKIQEADIQKLDYLKLVIKETFRLHAPGPFTPREAREKCEIRGYTIPAKA KILINLHA IGRDPTVWKDPECFQPERFEGSSTDFKGNHFELIPFGGGRRICPGISFATANIELGLAQL MYHFDWKL ANGERLEDLDMSENFGMTARRKENLQVIATTRIPFEK

SEQ ID 17 - AiCYP71BQ5 partial coding sequence (nucleotide)

GTTTATGCTGTTGAAATCATGTCTTATGATCATTCAAGTATTGTTTTCTCTCCCTATGGT GACTATTG

GAGAGAAATGAGGAAGATATCCGTCCTGGAGCTGTTAAGTAATAGGCGTGTCACGTC ATTCAGATCAA

TAAGAGAAGATGAGGTATGGAGTCTTGTTCAATTCATTTCCGAAAATGAGGGATGTA TCATAAATCTT

AGCGAGAGGATTTTTACCATGACGAATGATATTATTTCTCGAGCAGCCTTTGGTAAT AAGTGCGATGA

TCAACATAATTTTACAGCATTGCTTGAGGAAATCCTGCAGATTGGTGCAGGCTTTGC CATTGCTGATT

TGTACCCTTCCCTTACATTTCTTCGCCCCTTAACTGGAATGAAACCTGCACTTGAGA GAATTCATAAA

AAGATGGACAAGATTCTTGAAGAAATTGTGACTGAACATCAGATAAAAAGAAAAGCA GCTGCCAAGAA

CAACACTGAGTTTGAAGAGGAGGATCTAGTTGACACACTTCTCAATTATGCAGAGGC TAACAAGAATG

AGTTTCATCTCACAACCGATCAGGTCAAAGCTGTCACTCTGGACATTTTCTCAGCGG GGAGTGAAACA

TCTGCAACATCAATGGAATGGGCAATGTCAGAGCTGCTAAAGAACCCAAGAGTGATG AAGAAGGCACA

AGAAGAGGTCCGACAAGCATGCAAAGGGAAGAGCAAAATCAGAGAGGCAGACATTCA AAACCTAGAGT

ACTTGAAATTAGTCATCAAGGAAACATTTAGATTACACGCTCCGGGTCCTTTTACCC CGAGAGAAGCA

AGGGAGACATGTGAGATTGGTGGATATACAATACCAGCCAAGGCCAAAATCCTCATT AATCTTCATGC

AATGGGGAGAGATCCAACAATCTGGAAGGATCCCGAATGCTTCCAACCAGAGAGATT TGAAGGTTCTT

CCATCGACTTCAAAGGAAATCACTTTGAGTTGATTCCATTTGGCGGAGGAAGGAGGA TTTGTCCAGGC

ATATCATTTGCTACTGCAAACATTGAACTTGGACTTGCTCAGATGATGTACCATTTC GACTTTAAACT

TCCAAATGGGAAGAGCCTAGAAGACCTTGATATGGATGAGAATTTTGGAATGACATG TAGAAGAAAGG

AGAATCTGCAGGTGATCGCCACCACTCGTATTCCTTTTGAGAAGTGA

SEQ ID 18 - AiCYP71BQ5 translated partial nucleotide sequence (peptide)

VYAVEIMSYDHSSIVFSPYGDYWREMRKISVLELLSNRRVTSFRSIREDEVWSLVQFISE NEGCIINL

SERIFTMTNDIISRAAFGNKCDDQHNFTALLEEILQIGAGFAIADLYPSLTFLRPLT GMKPALERIHK

KMDKILEEIVTEHQIKRKAAAKNNTEFEEEDLVDTLLNYAEANKNEFHLTTDQVKAV TLDIFSAGSET

SATSMEWAMSELLKNPRVMKKAQEEVRQACKGKSKIREADIQNLEYLKLVIKETFRL HAPGPFTPREA

RETCEIGGYTIPAKAKILINLHAMGRDPTIWKDPECFQPERFEGSSIDFKGNHFELI PFGGGRRICPG

ISFATANIELGLAQMMYHFDFKLPNGKSLEDLDMDENFGMTCRRKENLQVIATTRIP FEK

SEQ ID 19 - Ailanthus altissima tirucalla-7,24-dien-3β-ol synthase of JP2005052009A coding sequence (nucleotide)

ATGTGGAGGCTTAAGATTGCAGAGGGTGACAAAAATAGCCCATACATTTTTACAACAAAC AATTTTGT

GGGAAGGCAAATATGGGAATTTGATCCAAATTATGCTGCCTCGCCGGAAGAGCTAGC TGAAGTTGAAG

AGGCTCGCCAGAAGTTTCACAAAAATCGCCACAAGGTCAAGCCTGCCAGTGATCTTA TGTGGCGGCTA

CAGTTCCTTAGAGAGAAAAACTTCAAGCAAACAATCCCTCCAGTAAAGGTTAAGGAT GAGGAGGAAAT

CACGTATGAAACGGCAACCAAGGCAGTGAAGAGGGCTGCTAGCTATTTTTCAGCCAT ACAGGCTAACG

ATGGCCACTGGCCTGCTGAAAATGCTGGCCCTATGTACTTCCTTCCTCCATTTGTCT TCTGCCTGTAC

ATTACAGGGCATCTTGATGCTGTATTTACAGCTGAGCACAAAAAAGAAATCCTTCGC TATTTGTACAA

TCATCAGCATGAAGATGGTGGGTGGGGAATACACATAGAAGGCCACAGCAGCATGTT TGGCACAGTTT

ACGGCTACATTACTATGCGTTTACTTGGATTAGGACCCAATGATGGTGAAAACAATG CTTGTGCAAGA

GCACGAAAATGGATTCGCGACAATGGTGGTGTCACATACATACCCTCCTGGGGAAAG AATTGGCTGTC

GATACTTGGATTGTTTGAATGGGCTGGAACCCACCCAATGCCGCCTGAGTTCTGGCT GCTCCCTTCTT ATTTTCCATTGCATCCAGCACAAATGTGGTGCTATTGCCGACTTGTTTACATGCCATTAT CTTATTTG

TATGGGAAAAGATTTGTTGGTCCAATCACTCCGCTTATTCAACAATTGAGGAATGAA CTTCACACTCA

GCCGTACAAGGAAATAAATTGGAGGAAAGTTCGTCATTTATGTGCAAAGCCGGATCT CTACTATCCCC

ACACCGTGGTACAGAACATACTTTGGGATGGTATGTACTTGGCAACAGAGCCTCTCC TAACTCGTTGG

CCTTTGAACAAGTATCTTAGACAGAAGGCTTTAAAAGAAACAATGAAGATCATTCAT TATGAAGACCA

AAGTAGTAGATACATTACCATAGGAAGTGTAGAAAAGCCTTTATGTATGCTTGCTTG TTGGGTTGAAG

ATCCCGATGGTGTTGCCTTTAAGAAGCATCTTGCTAGAGTTTCAGATTACTTCTGGC TTGGAGAAGAT

GGAATGAAAGCTCAGACTTTTGGAAGTCAAACATGGGATACTGCTCTTGGCCTTCAA GCTTTGCTTGC

TTGCGATCTCGTCGATGAAATTGCACCTACTCTTGCAAAAGGACACGACTACTTAAA GAAAGCTCAGG

TGAGGGATAATCCAATAGGCGATTATACAAGCAATTTCCGTCACTTTTCTAAAGGAG CGTGGACTTTC

TCTGATCAAGATCATGGATGGCAAGTTTCGGACTGTACAGCAGAAAGTTTGAAGTGT TGCCTAAATTT

CTCAATGATGTCACCTGAAATCGTTGGCGAGAAAATCGAACCTGAGAGGTTATATGA TGCTGTCAATT

TCATACTCTCTCTTCAGGACAAAACTACTGGTGGATTAGCAGTTTGGGAAAAAGCCG GTGCCTCGTTG

TTATTGGAGTGGCTCAATCCTGTCGAGTTTCTTGAGGACCTTATTGTCGAGCATACG TACGTCGAATG

CACTGCTTCAGCAATTGAAGCATTTGTTTTGTTTAGGAAATTATACCCACATCACCG AAAGAAGGAGA

TTGATAATTTCATTGTAAAAGCTGTACAGTATATTGAACACGAACAAACTGCCGATG GTTCATGGTAT

GGAAATTGGGGAATTTGCTTCCTATACGGTTCATGTTTTGCACTTGGAGGCTTGGCT GCTGCTGGCAA

AACTTACCACAATTGTGAAGCCATTCGTAGAGGAGTTGATTTTCTGCTAAAAGCACA AAGTGATGATG

GTGGCTGGGGAGAGAGCTACCAGTCATGCCCAAATAAGATATATACACCACTTGATG GGAAGAGATCA

ACCGTCGTACACACTGCATTGGCTGTTCTTGGTTTAATTCATGCCGGGCAGGCTGAG AGAGACGCCAC

TCCTATTCATCGCGGTGTCAAGTTTTTGATCAACTCTCATTTGGAAAATGGAGACTT CCCACAACAGG

AAATTATGGGAGTTTTCATGAGAAACTGCATGTTACATTATGCAGAATACAGGAATA TTTTTCCATTG

TGGGCTTTAGCTGAATACCGAAGGAAAGTTCCATTGCCTAATTGA

SEQ ID 20 - Ailanthus altissima tirucalla-7,24-dien-3β-ol synthase of JP2005052009A translated nucleotide sequence (peptide)

MWRLKIAEGDKNSPYIFTTNNFVGRQIWEFDPNYAASPEELAEVEEARQKFHKNRHKVKP ASDLMWRL QFLREKNFKQTIPPVKVKDEEEITYETATKAVKRAASYFSAIQANDGHWPAENAGPMYFL PPFVFCLY ITGHLDAVFTAEHKKEILRYLYNHQHEDGGWGIHIEGHSSMFGTVYGYITMRLLGLGPND GENNACAR ARKWIRDNGGVTYIPSWGKNWLSILGLFEWAGTHPMPPEFWLLPSYFPLHPAQMWCYCRL VYMPLSYL YGKRFVGP ITPLIQQLRNELHTQPYKEINWRKVRHLCAKPDLYYPHTVVQNILWDGMYLATEPLLTRW PLNKYLRQKALKETMKIIHYEDQSSRYITIGSVEKPLCMLACWVEDPDGVAFKKHLARVS DYFWLGED GMKAQTFGSQTWDTALGLQALLACDLVDEIAPTLAKGHDYLKKAQVRDNP IGDYTSNFRHFSKGAWTF SDQDHGWQVSDCTAESLKCCLNFSMMSPEIVGEKIEPERLYDAVNFILSLQDKTTGGLAV WEKAGASL LLEWLNPVEFLEDLIVEHTYVECTASAIEAFVLFRKLYPHHRKKEIDNFIVKAVQYIEHE QTADGSWY GNWGICFLYGSCFALGGLAAAGKTYHNCEAIRRGVDFLLKAQSDDGGWGESYQSCPNKIY TPLDGKRS TVVHTALAVLGLIHAGQAERDATP IHRGVKFLINSHLENGDFPQQEIMGVFMRNCMLHYAEYRNIFPL WALAEYRRKVPLPN

SEQ ID 21 - AsHMGR coding sequence (nucleotide)

ATGGCTGTGGAGGTTCACCGCCGGGCTCCCGCGCCCCATGGCCGGGGCACCGGGGAGAAG GGCCGCGT

GCAGGCCGGGGACGCGCTGCCGCTGCCGATCCGCCACACCAACCTCATCTTCTCGGC GCTCTTCGCCG

CCTCCCTCGCATACCTCATGCGCCGCTGGAGGGAGAAGATCCGCAACTCCACGCCGC TCCACGTCGTG

GGGCTCACCGAGATCTTCGCCATCTGCGGCCTCGTCGCCTCCCTCATCTACCTCCTC AGCTTCTTCGG

CATCGCCTTCGTGCAGTCCGTCGTATCCAACAGCGACGACGAGGACGAGGACTTCCT CATCGCGGCTG

CAGCATCCCAGGCCCCCCCGCCGCCCTCCTCCAAGCCCGCGCCGCAGCAGTGCGCCC TGCTGCAGAGC

GCCGGAGTCGCGCCCGAGAAAATGCCCGAGGAGGACGAGGAAATCGTCGCCGGGGTC GTCGCAGGGAA

GATCCCCTCCTACGTGCTCGAGACCAGGCTAGGCGACTGCCGCAGGGCAGCCGGGAT CCGCCGCGAGG

CGCTGCGCCGGATCACCGGCAGGGAGATCGACGGCCTTCCCCTCGACGGCTTCGACT ACGACTCGATT

CTCGGACAGTGCTGCGAGATGCCCGTCGGGTACGTGCAGCTGCCGGTCGGCGTCGCG GGGCCGCTCGT

CCTCGACGGCCGCCGCATATACGTCCCGATGGCCACCACGGAGGGCTGCCTAATCGC CAGCACCAACC

GCGGATGCAAGGCCATTGCCGAGTCCGGAGGCGCATCCAGCGTCGTGTACCGCGACG GGATGACCCGC

GCCCCCGTAGCCCGCTTCCCCTCCGCACGACGCGCCGCAGAGCTCAAGGGCTTCCTG GAGAATCCGGC CAACTACGACACCCTGTCCGTGGTCTTTAACAGATCAAGCAGATTTGCAAGGCTGCAGGG GGTCAAGT

GCGCCATGGCTGGGAGGAACTTGTACATGAGGTTCACCTGCAGCACCGGGGATGCCA TGGGGATGAAC

ATGGTCTCCAAGGGCGTCCAAAATGTGCTCGACTATCTGCAGGAGGACTTCCCTGAC ATGGACGTTGT

CAGCATCTCAGGCAACTTTTGTTCCGACAAGAAATCAGCTGCTGTAAACTGGATTGA AGGCCGTGGAA

AGTCCGTGGTTTGTGAGGCAGTAATCAGAGAGGAAGTTGTCCACAAGGTTCTCAAGA CCAACGTTCAG

TCACTCGTGGAGTTGAATGTGATCAAGAACCTTGCTGGCTCAGCAGTTGCTGGTGCT CTTGGGGGTTT

CAACGCCCACGCAAGCAACATCGTAACGGCTATCTTCATTGCCACTGGTCAGGATCC TGCACAGAATG

TGGAGAGCTCACAGTGTATCACTATGTTGGAAGCTGTAAATGATGGCAGAGACCTTC ACATCTCCGTT

ACAATGCCATCTATCGAGGTGGGCACAGTTGGTGGAGGCACGCAGCTGGCCTCACAG TCGGCCTGCTT

GGACCTACTGGGCGTCAAAGGCGCCAACAGGGAATCTCCGGGGTCGAACGCTAGGCT GCTGGCCACGG

TGGTGGCTGGTGCCGTCCTAGCTGGGGAGCTGTCCCTCATCTCCGCCCAAGCTGCCG GCCATCTGGTC

CAGAGCCACATGAAATACAACAGATCCAGCAAGGACATGTCCAAGATCGCCTGCTGA

SEQ ID 22 - AsHMGR translated nucleotide sequence (peptide)

MAVEVHRRAPAPHGRGTGEKGRVQAGDALPLP IRHTNLIFSALFAASLAYLMRRWREKIRNSTPLHVV GLTEIFAICGLVASLIYLLSFFGIAFVQSVVSNSDDEDEDFLIAAAASQAPPPPSSKPAP QQCALLQS AGVAPEKMPEEDEEIVAGVVAGKIPSYVLETRLGDCRRAAGIRREALRRITGREIDGLPLD GFDYDSI LGQCCEMPVGYVQLPVGVAGPLVLDGRRIYVPMATTEGCLIASTNRGCKAIAESGGASSV VYRDGMTR APVARFPSARRAAELKGFLENPANYDTLSVVFNRSSRFARLQGVKCAMAGRNLYMRFTCS TGDAMGMN MVSKGVQNVLDYLQEDFPDMDVVSISGNFCSDKKSAAVNWIEGRGKSVVCEAVIREEVVH KVLKTNVQ SLVELNVIKNLAGSAVAGALGGFNAHASNIVTAIFIATGQDPAQNVESSQCITMLEAVND GRDLHISV TMPSIEVGTVGGGTQLASQSACLDLLGVKGANRESPGSNARLLATVVAGAVLAGELSLIS AQAAGHLV QSHMKYNRSSKDMSKIAC

SEQ ID 23 - AstHMGR coding sequence (nucleotide)

ATGGCGCCCGAGAAAATGCCCGAGGAGGACGAGGAAATCGTCGCCGGGGTCGTCGCAGGG AAGATCCC

CTCCTACGTGCTCGAGACCAGGCTAGGCGACTGCCGCAGGGCAGCCGGGATCCGCCG CGAGGCGCTGC

GCCGGATCACCGGCAGGGAGATCGACGGCCTTCCCCTCGACGGCTTCGACTACGACT CGATTCTCGGA

CAGTGCTGCGAGATGCCCGTCGGGTACGTGCAGCTGCCGGTCGGCGTCGCGGGGCCG CTCGTCCTCGA

CGGCCGCCGCATATACGTCCCGATGGCCACCACGGAGGGCTGCCTAATCGCCAGCAC CAACCGCGGAT

GCAAGGCCATTGCCGAGTCCGGAGGCGCATCCAGCGTCGTGTACCGCGACGGGATGA CCCGCGCCCCC

GTAGCCCGCTTCCCCTCCGCACGACGCGCCGCAGAGCTCAAGGGCTTCCTGGAGAAT CCGGCCAACTA

CGACACCCTGTCCGTGGTCTTTAACAGATCAAGCAGATTTGCAAGGCTGCAGGGGGT CAAGTGCGCCA

TGGCTGGGAGGAACTTGTACATGAGGTTCACCTGCAGCACCGGGGATGCCATGGGGA TGAACATGGTC

TCCAAGGGCGTCCAAAATGTGCTCGACTATCTGCAGGAGGACTTCCCTGACATGGAC GTTGTCAGCAT

CTCAGGCAACTTTTGTTCCGACAAGAAATCAGCTGCTGTAAACTGGATTGAAGGCCG TGGAAAGTCCG

TGGTTTGTGAGGCAGTAATCAGAGAGGAAGTTGTCCACAAGGTTCTCAAGACCAACG TTCAGTCACTC

GTGGAGTTGAATGTGATCAAGAACCTTGCTGGCTCAGCAGTTGCTGGTGCTCTTGGG GGTTTCAACGC

CCACGCAAGCAACATCGTAACGGCTATCTTCATTGCCACTGGTCAGGATCCTGCACA GAATGTGGAGA

GCTCACAGTGTATCACTATGTTGGAAGCTGTAAATGATGGCAGAGACCTTCACATCT CCGTTACAATG

CCATCTATCGAGGTGGGCACAGTTGGTGGAGGCACGCAGCTGGCCTCACAGTCGGCC TGCTTGGACCT

ACTGGGCGTCAAAGGCGCCAACAGGGAATCTCCGGGGTCGAACGCTAGGCTGCTGGC CACGGTGGTGG

CTGGTGCCGTCCTAGCTGGGGAGCTGTCCCTCATCTCCGCCCAAGCTGCCGGCCATC TGGTCCAGAGC

CACATGAAATACAACAGATCCAGCAAGGACATGTCCAAGATCGCCTGCTGA

SEQ ID 24 - AstHMGR translated nucleotide sequence (peptide)

MAPEKMPEEDEEIVAGVVAGKIPSYVLETRLGDCRRAAGIRREALRRITGREIDGLPLDGF DYDSILG

QCCEMPVGYVQLPVGVAGPLVLDGRRIYVPMATTEGCLIASTNRGCKAIAESGGASS VVYRDGMTRAP

VARFPSARRAAELKGFLENPANYDTLSVVFNRSSRFARLQGVKCAMAGRNLYMRFTC STGDAMGMNMV

SKGVQNVLDYLQEDFPDMDVVSISGNFCSDKKSAAVNWIEGRGKSVVCEAVIREEVV HKVLKTNVQSL

VELNVIKNLAGSAVAGALGGFNAHASNIVTAIFIATGQDPAQNVESSQCITMLEAVN DGRDLHISVTM

PSIEVGTVGGGTQLASQSACLDLLGVKGANRESPGSNARLLATVVAGAVLAGELSLI SAQAAGHLVQS

HMKYNRSSKDMSKIAC

SEQ ID 25 - AsSQS coding sequence (nucleotide)

ATGGGGGCGCTGTCGCGGCCGGAGGAGGTGGTGGCGCTGGTCAAGCTGAGGGTGGCGGCG GGGCAGAT

CAAGCGCCAGATCCCGGCCGAGGAACACTGGGCCTTCGCCTACGACATGCTCCAGAA GGTCTCCCGCA

GCTTCGCGCTCGTCATCCAGCAGCTCGGACCCGAACTCCGCAATGCCGTGTGCATCT TCTACCTCGTG

CTCCGGGCCCTGGACACCGTCGAGGACGACACCAGCATCCCCAACGACGTGAAGCTG CCCATCCTTCG

GGATTTCTACCGCCATGTCTACAACCCCGACTGGCGTTATTCATGTGGAACAAACCA CTACAAGGTGC

TGATGGATAAGTTCAGACTCGTCTCCACGGCTTTCCTGGAGCTAGGCGAAGGATATC AAAAGGCAATT

GAAGAAATCACTAGGCGAATGGGAGCAGGAATGGCAAAATTTATATGCCAGGAGGTT GAAACGATTGA

TGACTATAATGAGTACTGCCACTATGTAGCAGGGCTAGTAGGCTATGGACTTTCCAG GCTCTTTCATG

CTGCTGGGACAGAAGATCTGGCTTCAGATCAACTTTCGAATTCAATGGGTTTGTTTC TTCAGAAAACC

AATATAATAAGGGATTATTTGGAGGATATAAATGAGATACCAAAGTGCCGTATGTTT TGGCCTCGAGA

AATATGGAGTAAATATGCAGATAAACTTGAGGACCTCAAGTATGAGGAAAATTCAGA AAAAGCAGTGC

AATGCTTGAATGATATGGTGACTAATGCTTTGGTCCACGCCGAAGACTGTCTTCAAT ACATGTCTGCG

TTGAAGGATAATACTAATTTTCGGTTTTGTGCAATACCTCAGATAATGGCAATTGGG ACATGTGCTAT

TTGCTACAATAATGTGAAAGTCTTTAGAGGAGTTGTTAAGATGAGGCGTGGGCTCAC TGCACGAATAA

TTGATGAGACAAAATCAATGTCAGATGTCTATTCTGCTTTCTATGAGTTCTCTTCAT TGCTAGAGTCA

AAGATTGACGATAACGACCCAAGTTCTGCACTAACACGGAAGCGTGTAGAGGCAATA AAGAGGACTTG

CAAGTCATCCGGTTTACTAAAGAGAAGGGGATACGACCTGGAAAAGTCAAAGTATAG GCATATGTTGA

TCATGCTTGCACTTCTGTTGGTGGCTATTATCTTCGGTGTACTGTACGCCAAGTGA

SEQ ID 26 - AsSQS translated nucleotide sequence (peptide)

MGALSRPEEVVALVKLRVAAGQIKRQIPAEEHWAFAYDMLQKVSRSFALVIQQLGPELRN AVCIFYLV LRALDTVEDDTSIPNDVKLP ILRDFYRHVYNPDWRYSCGTNHYKVLMDKFRLVSTAFLELGEGYQKAI EEITRRMGAGMAKFICQEVETIDDYNEYCHYVAGLVGYGLSRLFHAAGTEDLASDQLSNS MGLFLQKT NIIRDYLEDINEIPKCRMFWPREIWSKYADKLEDLKYEENSEKAVQCLNDMVTNALVHAE DCLQYMSA LKDNTNFRFCAIPQIMAIGTCAICYNNVKVFRGVVKMRRGLTARIIDETKSMSDVYSAFY EFSSLLES KIDDNDPSSALTRKRVEAIKRTCKSSGLLKRRGYDLEKSKYRHMLIMLALLLVAIIFGVL YAK*

SEQ ID 27 - AtATR2 coding sequence (nucleotide)

ATGAAAAACATGATGAATTATAAATTAAAACTCTGTTCTGTCTCAAAAAACTCAAAAGGA GTCTCTCT CTCACCTACACCACACCTAACCAAACCCCCTACGATTCACACAGAGAGAGATCTTCTTCT TCCTTCTT CTTCCTTCTTCTTTCTTCTTCTTTCTTCTTCTAGCTACAACATCTACAACGCCATGTCCT CTTCTTCT TCTTCGTCAACCTCCATGATCGATCTCATGGCAGCAATCATCAAAGGAGAGCCTGTAATT GTCTCCGA CCCAGCTAATGCCTCCGCTTACGAGTCCGTAGCTGCTGAATTATCCTCTATGCTTATAGA GAATCGTC AATTCGCCATGATTGTTACCACTTCCATTGCTGTTCTTATTGGTTGCATCGTTATGCTCG TTTGGAGG AGATCCGGTTCTGGGAATTCAAAACGTGTCGAGCCTCTTAAGCCTTTGGTTATTAAGCCT CGTGAGGA AGAGATTGATGATGGGCGTAAGAAAGTTACCATCTTTTTCGGTACACAAACTGGTACTGC TGAAGGTT TTGCAAAGGCTTTAGGAGAAGAAGCTAAAGCAAGATATGAAAAGACCAGATTCAAAATCG TTGATTTG GATGATTACGCGGCTGATGATGATGAGTATGAGGAGAAATTGAAGAAAGAGGATGTGGCT TTCTTCTT CTTAGCCACATATGGAGATGGTGAGCCTACCGACAATGCAGCGAGATTCTACAAATGGTT CACCGAGG GGAATGACAGAGGAGAATGGCTTAAGAACTTGAAGTATGGAGTGTTTGGATTAGGAAACA GACAATAT GAGCATTTTAATAAGGTTGCCAAAGTTGTAGATGACATTCTTGTCGAACAAGGTGCACAG CGTCTTGT ACAAGTTGGTCTTGGAGATGATGACCAGTGTATTGAAGATGACTTTACCGCTTGGCGAGA AGCATTGT GGCCCGAGCTTGATACAATACTGAGGGAAGAAGGGGATACAGCTGTTGCCACACCATACA CTGCAGCT GTGTTAGAATACAGAGTTTCTATTCACGACTCTGAAGATGCCAAATTCAATGATATAAAC ATGGCAAA TGGGAATGGTTACACTGTGTTTGATGCTCAACATCCTTACAAAGCAAATGTCGCTGTTAA AAGGGAGC TTCATACTCCCGAGTCTGATCGTTCTTGTATCCATTTGGAATTTGACATTGCTGGAAGTG GACTTACG TATGAAACTGGAGATCATGTTGGTGTACTTTGTGATAACTTAAGTGAAACTGTAGATGAA GCTCTTAG ATTGCTGGATATGTCACCTGATACTTATTTCTCACTTCACGCTGAAAAAGAAGACGGCAC ACCAATCA GCAGCTCACTGCCTCCTCCCTTCCCACCTTGCAACTTGAGAACAGCGCTTACACGATATG CATGTCTT TTGAGTTCTCCAAAGAAGTCTGCTTTAGTTGCGTTGGCTGCTCATGCATCTGATCCTACC GAAGCAGA ACGATTAAAACACCTTGCTTCACCTGCTGGAAAGGATGAATATTCAAAGTGGGTAGTAGA GAGTCAAA GAAGTCTACTTGAGGTGATGGCCGAGTTTCCTTCAGCCAAGCCACCACTTGGTGTCTTCT TCGCTGGA

GTTGCTCCAAGGTTGCAGCCTAGGTTCTATTCGATATCATCATCGCCCAAGATTGCT GAAACTAGAAT

TCACGTCACATGTGCACTGGTTTATGAGAAAATGCCAACTGGCAGGATTCATAAGGG AGTGTGTTCCA

CTTGGATGAAGAATGCTGTGCCTTACGAGAAGAGTGAAAACTGTTCCTCGGCGCCGA TATTTGTTAGG

CAATCCAACTTCAAGCTTCCTTCTGATTCTAAGGTACCGATCATCATGATCGGTCCA GGGACTGGATT

AGCTCCATTCAGAGGATTCCTTCAGGAAAGACTAGCGTTGGTAGAATCTGGTGTTGA ACTTGGGCCAT

CAGTTTTGTTCTTTGGATGCAGAAACCGTAGAATGGATTTCATCTACGAGGAAGAGC TCCAGCGATTT

GTTGAGAGTGGTGCTCTCGCAGAGCTAAGTGTCGCCTTCTCTCGTGAAGGACCCACC AAAGAATACGT

ACAGCACAAGATGATGGACAAGGCTTCTGATATCTGGAATATGATCTCTCAAGGAGC TTATTTATATG

TTTGTGGTGACGCCAAAGGCATGGCAAGAGATGTTCACAGATCTCTCCACACAATAG CTCAAGAACAG

GGGTCAATGGATTCAACTAAAGCAGAGGGCTTCGTGAAGAATCTGCAAACGAGTGGA AGATATCTTAG

AGATGTATGGTAA

SEQ ID 28 - AtATR2 translated nucleotide sequence (peptide)

MKNMMNYKLKLCSVSKNSKGVSLSPTPHLTKPPTIHTERDLLLPSSSFFFLLLSSSSYNI YNAMSSSS SSSTSMIDLMAAIIKGEPVIVSDPANASAYESVAAELSSMLIENRQFAMIVTTSIAVLIG CIVMLVWR RSGSGNSKRVEPLKPLVIKPREEEIDDGRKKVTIFFGTQTGTAEGFAKALGEEAKARYEK TRFKIVDL DDYAADDDEYEEKLKKEDVAFFFLATYGDGEPTDNAARFYKWFTEGNDRGEWLKNLKYGV FGLGNRQY EHFNKVAKVVDDILVEQGAQRLVQVGLGDDDQCIEDDFTAWREALWPELDTILREEGDTAV ATPYTAA VLEYRVSIHDSEDAKFNDINMANGNGYTVFDAQHPYKANVAVKRELHTPESDRSCIHLEF DIAGSGLT YETGDHVGVLCDNLSETVDEALRLLDMSPDTYFSLHAEKEDGTP ISSSLPPPFPPCNLRTALTRYACL LSSPKKSALVALAAHASDPTEAERLKHLASPAGKDEYSKWVVESQRSLLEVMAEFPSAKPP LGVFFAG VAPRLQPRFYSISSSPKIAETRIHVTCALVYEKMPTGRIHKGVCSTWMKNAVPYEKSENC SSAPIFVR QSNFKLPSDSKVPI IMIGPGTGLAPFRGFLQERLALVESGVELGPSVLFFGCRNRRMDFIYEEELQRF VESGALAELSVAFSREGPTKEYVQHKMMDKASDIWNMISQGAYLYVCGDAKGMARDVHRS LHTIAQEQ GSMDSTKAEGFVKNLQTSGRYLRDVW

SEQ ID 29 - AtLUP5 coding sequence (nucleotide)

ATGTGGAGGTTAAAGGTAGGAGAAGGAAAAGGAAAAGATCCTTATTTATTCAGCAGCAAC AACTTCGT GGGACGTCAAACATGGGAGTTTGACCCCAAAGCCGGCACACGGGAGGAACGAACCGCAGT CGAAGAAG CTCGCCGGAGTTTCTTCGACAACCGTTCTCGTGTTAAACCTTCCAGTGATCTATTGTGGA AAATGCAA TTTCTAAAAGAGGCAAAATTTGAGCAAGTGATTCCGCCGGTAAAAATTGACGGTGGGGAA GCCATAAC TTATGAAAAAGCGACAAATGCATTACGGCGAGGAGTTGCTTTCTTATCAGCTTTGCAAGC CTCCGACG GCCACTGGCCGGGAGAGTTCACCGGACCGCTCTGCATGCTTCCGCCATTGGTATTTTGTT TGTACATT ACTGGACACTTGGAAGAGGTATTCGATGCAGAGCATCGCAAAGAGATGCTTCGATATATC TATTGTCA CCAGAACGAAGATGGTGGATGGGGATTCCACATTGAGAGCAAAAGCATTATGTTCACTAC CACGCTGA ATTACATATGCTTGCGTATACTTGGAGTAGGTCCCGATGGAGGACTAGAAAACGCATGCA AACGGGCC AGGCAATGGATTCTTAGCCATGGCGGTGTGATTTATATTCCTTGTTGGGGAAAAGTTTGG CTCTCGGT ACTTGGAATCTATGATTGGTCTGGAGTCAACCCGATGCCTCCCGAGATTTGGTTGCTACC TTATTTCC TACCAATTCACCTAGGGAAAGCTTTTAGCTATACCCGGATAACATATATGCCCATCTCTT ATCTATAT GGCAAAAAATTCGTGGGTCAAATTACACCTCTTATTATGCAACTACGTGAAGAACTACAC TTACAACC TTATGAAGAAATCAACTGGAACAAAGCGCGACATCTATGCGCAAAGGAAGACAAGTACTA TCCCCATC CTCTAGTTCAAGATTTGATATGGGATGCTCTCCACACCTTCGTGGAGCCTTTGCTTGCAA GTTGGCCG ATAAACAAACTTGTAAGGAAAAAGGCTCTTCAGGTGGCAATGAAACACATACATTACGAG GACGAAAA CAGTCACTATATCACCATTGGATGTATTGAAAAGAATTTGTGCATGCTTGCTTGCTGGAT TGACAACC CGGACGGGAATCACTTTAAAAAGCATCTCTCTAGAATTCCGGACATGATGTGGGTAGCTG AAGATGGA ATGAAAATGCAGTGCTTTGGAAGTCAACTTTGGATGACGGGATTTGCAGTTCAGGCTTTA CTAGCAAG TGATCCACGCGATGAAACCTATGACGTGCTCAGGAGAGCACACGATTACATAAAGAAATC ACAGGTTA GAGACAACCCATCAGGTGACTTCAAGAGCATGTACCGCCACATCTCCAAAGGAGGATGGA CTCTTTCT GATCGAGATCATGGATGGCAAGTTTCAGATTGTACAGCTGAAGCTGCTAAGTGTTGCATG TTGCTTTC CACAATGCCAACTGATATCACTGGAGAGAAAATCAATCTTGAACAACTATACGATTCTGT TAATCTCA TGTTATCTCTACAAAGTGAAAATGGAGGTTTTACTGCATGGGAACCTGTTCGCGCCTATA AATGGATG GAATTGATGAATCCCACAGATTTGTTTGCTAATGCTATGACCGAGCGTGAATATACAGAA TGTACCTC

AGCTGTGTTACAAGCTTTGGTTATATTCAATCAACTATATCCGGATCATAGGACAAA AGAGATCACTA

AGTCGATTGAGAAAGCAGTGCAATTCATAGAAAGCAAACAATTGCGAGATGGTTCAT GGTACGGAAGC

TGGGGTATTTGTTTCACTTATGGGACATGGTTTGCTCTTTGCGGCCTAGCAGCGATT GGTAAGACATA

CAACAATTGTCTATCTATGCGCGACGGTGTACATTTCCTTCTTAATATACAAAATGA AGATGGGGGTT

GGGGTGAAAGCTATATGTCATGCCCTGAACAGAGATACATACCATTAGAGGGGAATA GATCAAACGTA

GTGCAAACCGCGTGGGCTATGATGGCTCTGATTCACGCTGGACAGGCTAAGAGAGAT CTTATACCTCT

ACATAGTGCTGCAAAATTTATTATCACGTCGCAACTGGAAAACGGAGATTTTCCTCA ACAGGAACTAT

TAGGAGCGTCTATGAGTACATGCATGCTACACTATTCTACATACAAAGACATCTTCC CACCATGGGCA

CTTGCAGAGTACCGGAAAGCTGCGTTCATACATCACGCAGATCTATAG

SEQ ID 30 - AtLUP5 translated nucleotide sequence (peptide)

MWRLKVGEGKGKDPYLFSSNNFVGRQTWEFDPKAGTREERTAVEEARRSFFDNRSRVKPS SDLLWKMQ FLKEAKFEQVIPPVKIDGGEAITYEKATNALRRGVAFLSALQASDGHWPGEFTGPLCMLP PLVFCLYI TGHLEEVFDAEHRKEMLRYIYCHQNEDGGWGFHIESKSIMFTTTLNYICLRILGVGPDGG LENACKRA RQWILSHGGVIYIPCWGKVWLSVLGIYDWSGVNPMPPEIWLLPYFLPIHLGKAFSYTRIT YMPISYLY GKKFVGQITPLIMQLREELHLQPYEEINWNKARHLCAKEDKYYPHPLVQDLIWDALHTFV EPLLASWP INKLVRKKALQVAMKHIHYEDENSHYITIGCIEKNLCMLACWIDNPDGNHFKKHLSRIPD MMWVAEDG MKMQCFGSQLWMTGFAVQALLASDPRDETYDVLRRAHDYIKKSQVRDNPSGDFKSMYRHI SKGGWTLS DRDHGWQVSDCTAEAAKCCMLLSTMPTDITGEKINLEQLYDSVNLMLSLQSENGGFTAWE PVRAYKWM ELMNPTDLFANAMTEREYTECTSAVLQALVIFNQLYPDHRTKEITKSIEKAVQFIESKQL RDGSWYGS WGICFTYGTWFALCGLAAIGKTYNNCLSMRDGVHFLLNIQNEDGGWGESYMSCPEQRYIP LEGNRSNV VQTAWAMMALIHAGQAKRDLIPLHSAAKFI ITSQLENGDFPQQELLGASMSTCMLHYSTYKDIFPPWA LAEYRKAAFIHHADL

SEQ ID 31 - AtPEN3 coding sequence (nucleotide)

ATGTGGAGGCTGAGGATCGGAGCTAAGGCAGGAGATGACCCTCACTTGTGCACCACCAAC AACTTCTT

GGGAAGGCAGATATGGGAGTTTGATGCCAACGCAGGCTCTCCAGCGGAACTCTCTGA GGTTGATCAGG

CTCGACAAAATTTCTCAAACAATAGGTCACAATACAAGGCTTGTGCCGATCTCCTTT GGCGTATGCAG

TTTCTAAGGGAGAAGAATTTCGAGCAAAAGATTCCACGAGTGAGAATAGAGGATGCC AAGAAAATAAC

ATTTGAAGACGCAAAAAATACACTGAGAAGAGGAATACATTATATGGCAGCGTTGCA ATCTGATGATG

GACATTGGCCTTCCGAAAACGCTGGTTGCATTTTCTTCAATGCCCCCTTTGTTATAT GTTTGTATATC

ACTGGCCATCTGGATAAAGTTTTCTCTGAAGAGCATCGGAAAGAGATGTTGCGTTAC ATGTACAACCA

TCAGAACGACGATGGTGGATGGGGAATAGACGTAGAAAGCCATAGTTTTATGTTTTG CACGGTCATCA

ACTACATCTGCCTACGAATCTTCGGAGTAGATCCCGATCATGATGGTGAAAGTGCTT GTGCAAGGGCT

CGTAAATGGATCATTGACCACGGTGGCGCTACCTATACGCCATTATTTGGAAAAGCC TGGCTTTCGGT

TCTTGGAGTATATGAATGGTCTGGTTGCAAACCCATACCCCCAGAGTTCTGGTTTTT TCCTTCCTATT

TTCCTATTAATGGAGGCACTCTCTGGATATATTTACGGGATACTTTCATGGCAATGT CCTATTTGTAT

GGTAAAAAATTTGTTGCTAAACCAACACCTCTCATTCTACAACTTCGTGAAGAACTT TATCCTCAACC

TTATGCTGAAATTGTTTGGAGCCAAGCTCGCAGTCGATGTGCGAAGGAAGATCTATA TTATCCACAAT

CATTGGTACAAGACTTGTTTTGGAAACTTGTTCACATGTTTTCGGAGAATATCTTAA ATCGATGGCCT

TTCAACAAGCTCATTAGAGAAAAAGCTATTCGAACGGCAATGGAACTCATTCACTAC CATGACGAAGC

CACCCGGTACATTACAGGTGGAGCAGTGCCAAAGGTGTTTCATATGCTTGCTTGTTG GGTTGAAGATC

CAGAGAGTGATTATTTTAAAAAACATCTTGCGCGAGTCTCTCATTTCATATGGATTG CGGAGGACGGC

TTGAAAATCCAGACTTTTGGTAGCCAAATATGGGATACAGCCTTCGTTCTCCAAGTC ATGTTAGCGGC

TGATGTTGACGATGAGATAAGGCCAACGCTCATAAAGGGATACTCTTACTTGAGGAA ATCCCAATTTA

CAGAGAATCCTCCCGGTGACTATATCAATATGTTTAGAGACATATCCAAAGGAGGGT GGGGCTATTCA

GACAAAGATCAAGGATGGCCTGTTTCAGATTGTATTTCTGAGAGTTTAGAGTGCTGC CTGATCTTTGA

GAGTATGTCATCCGAATTCATTGGTGAGAAAATGGAAGTGGAGAGGCTTTATGATGC CGTCAATATGC

TTCTCTATATGCAGAGCAGAAATGGAGGGATATCTATATGGGAAGCAGCGAGTGGGA AAAAATGGCTA

GAGTGGCTTAGTCCCATAGAGTTTATTGAAGACACTATCCTCGAGCATGAGTATCTA GAATGCACGGG

GTCAGCGATAGTGGTGTTGGCACGCTTCATGAAACAGTTTCCAGGGCATAGAACAGA AGAAGTCAAAA AATTTATAACAAAGGGAGTGAAATACATAGAAAGCTTACAAATTGCGGATGGTTCGTGGT ACGGAAAC

TGGGGAATATGTTTTATATATGGGACTTTCTTTGCTGTCCGAGGTTTAGTGGCCGCG GGAAACACTTA

CGATAACTGTGAGGCAATCCGTAGAGCAGTTCGATTCCTTCTTGATATACAAAACGG TGAAGGCGGTT

GGGGAGAGAGTTTTCTTTCTTGCCCCAACAAAAATTATATTCCTTTGGAGGGGAACA AGACCGATGTG

GTGAATACAGGACAAGCATTGATGGTTCTAATAATGGGTGGTCAGATGGATAGAGAT CCTTTACCGGT

TCACCGCGCTGCAAAAGTATTAATCAATTCACAAATGGATAACGGTGATTTTCCACA GCAGGAAATAA

GGGGTGTTTACAAAATGAATGTGATGCTAAATTTTCCAACCTTTAGAAACTCTTTCA CTCTTTGGGCA

CTAACACACTACACCAAGGCTATACGGTTGCTCCTTTGA

SEQ ID 32 - AtPEN3 translated nucleotide sequence (peptide)

MWRLRIGAKAGDDPHLCTTNNFLGRQIWEFDANAGSPAELSEVDQARQNFSNNRSQYKAC ADLLWRMQ FLREKNFEQKIPRVRIEDAKKITFEDAKNTLRRGIHYMAALQSDDGHWPSENAGCIFFNA PFVICLYI TGHLDKVFSEEHRKEMLRYMYNHQNDDGGWGIDVESHSFMFCTVINYICLRIFGVDPDHD GESACARA RKWIIDHGGATYTPLFGKAWLSVLGVYEWSGCKP IPPEFWFFPSYFPINGGTLWIYLRDTFMAMSYLY GKKFVAKPTPLILQLREELYPQPYAEIVWSQARSRCAKEDLYYPQSLVQDLFWKLVHMFS ENILNRWP FNKLIREKAIRTAMELIHYHDEATRYITGGAVPKVFHMLACWVEDPESDYFKKHLARVSH FIWIAEDG LKIQTFGSQIWDTAFVLQVMLAADVDDEIRPTLIKGYSYLRKSQFTENPPGDYINMFRDI SKGGWGYS DKDQGWPVSDCISESLECCLIFESMSSEFIGEKMEVERLYDAVNMLLYMQSRNGGISIWE AASGKKWL EWLSP IEFIEDTILEHEYLECTGSAIVVLARFMKQFPGHRTEEVKKFITKGVKYIESLQIADGSWY GN WGICFIYGTFFAVRGLVAAGNTYDNCEAIRRAVRFLLDIQNGEGGWGESFLSCPNKNYIP LEGNKTDV VNTGQALMVLIMGGQMDRDPLPVHRAAKVLINSQMDNGDFPQQEIRGVYKMNVMLNFPTF RNSFTLWA LTHYTKAIRLLL

SEQ ID 33 - MaCYP88A108 coding sequence (nucleotide)

ATGGAGCTAAATTTCCTGTGGTTGATTCTTGCCATTTTTCTTGGCACATATGTTGTTTTG TTTGGGTT CTTAAGAAAGGTAAACGATTGGTATTATGTTAGCAGATTGGGAGAGAAGAAGAAATCTCT CCCTCCAG GTGATATGGGTTGGCCATTATTGGGCAACATGTTGTCCTTCATCCGAGCTTTCCAATCCA GTGATCCT GATGCCTTCGTCTACAACTTAGTTGACAGATATGGTCGAACTGGCGTCTACAAGAGCCAT ATGTTCTG GAGCCCAAGTATTGTTGTGACCACTCCGGAAACATGCAGACGTGTGCTGATGGACAATGA GCAATTTG GGAGGGGAAATCCTGAATCAACCAAGGAATTATTAGGAAAGAAAACGCTTGCACTTTCGA ATGAAGAA CACAAGCGTCTACGCAAGCTAACTACAAATCCATTCAGAGGTGATAAGGCATTAACCATG TATGTCGG ATACATTGAGGACATCGTGGTCGATTTGTTAGATGAATGGGCTGGCATGAAAAAGCCGAT TGTTTTCT TGTTTGAGATGAGAAAACTTGCTTTCAAGGTCATTGGACACATTGTCTTTGGAACAACTA GTGATCAT CTTCTTGAGTTAATGGAGCAATACTACACTGATTTACTTCTTGGATTGAGATCTCCGGCC ATTAATAT CCCTGGTTTTGCTTTCTATCGAGCACTCAAGGCACGAAAATTGTTGGTGAAGCTCCTGCA AGATGTCC TTGAAGAAAGAAAGAAGATGGTGGGAATTGAGCAGCAAAAGGGGAACAGAGGCATGATTG ATTTGTTG ATAGAAGCTGAAGATGAGAATGGTAAAAAATTGGCCGATGAAAATATCATAGATTTACTG ATCATAAA CTTGTTAGCCGGACATGAAAGCTCTGCCCATGCTTCAATGTGGGCAGCACTCTATCTGTA TCAACATC CAGAAATGCTGCAAAAAGCCAAGGAAGAGCAAGAGCAGATTCTAAAGAGAAGACCATCTA CACAGAAA GGATTGACTCTTGAAGAAATTAAACAAATGGATTATCTTGCTAAGGTTATAGATGAAACG ATGCGTAG AACCAGTCTCTTCATACCAATTTTCCGAGAGGCAAAAGTAGATACTGACATCAATGGTTA CACAGTGC CAAAAGGATGGCAAGTTTTGGTATGGACTAGGGGTGTTCATATGGACCCAGAAGTTCATC CGAACCCA AAAGAATTTGATCCCTCAAGATGGGATAATCGAGCAAAACCAGGATCTTACATTCCATTT GGAGGTGG ACCATGGATTTGCCCTGGAGCTGATCTGACCAAACTTGAAATCTACATTTTCCTTCATTA CTTTCTCC TTTACTACAAGCTTGAGCTACAAAATCCTGACTGCCCAGTTGCATACTTACCTGTACCAA GGCCTTCT GACAATTGTATTGGAAAAGTCATCAAGGTCAAGAACTTCTGA

SEQ ID 34 - MaCYP88A108 translated nucleotide sequence (peptide)

MELNFLWLILAIFLGTYVVLFGFLRKVNDWYYVSRLGEKKKSLPPGDMGWPLLGNMLSFI RAFQSSDP DAFVYNLVDRYGRTGVYKSHMFWSPSIVVTTPETCRRVLMDNEQFGRGNPESTKELLGKK TLALSNEE HKRLRKLTTNPFRGDKALTMYVGYIEDIVVDLLDEWAGMKKP IVFLFEMRKLAFKVIGHIVFGTTSDH LLELMEQYYTDLLLGLRSPAINIPGFAFYRALKARKLLVKLLQDVLEERKKMVGIEQQKG NRGMIDLL IEAEDENGKKLADENI IDLLIINLLAGHESSAHASMWAALYLYQHPEMLQKAKEEQEQILKRRPSTQK GLTLEEIKQMDYLAKVIDETMRRTSLFIPIFREAKVDTDINGYTVPKGWQVLVWTRGVHM DPEVHPNP

KEFDPSRWDNRAKPGSYIPFGGGPWICPGADLTKLEIYIFLHYFLLYYKLELQNPDC PVAYLPVPRPS

DNCIGKVIKVKNF

SEQ ID 35 - MaISOM1 coding sequence [predicted] (nucleotide)

ATGAGCGACTCATCATCTGTTCCCGTGGATTTTGTGCTAAACTTCTCAACTGCCGCCTTG CATGCTTG

GAATGGCCTCAGTTTATTCTTAATCGTCTTCATCTCCTGGTTTATCTCCGGGTTGAC ACAGGCGAAAA

CAAAAATGGACAGAGTGGTATTATGCTGGTGGGCTCTCACTGGCCTTATTCATGTCT TTCAAGAGGGT

TATTATGTTTTCACTCCAGATTTATTTAAAGACGATTCTCCTAATTTTATGGCTGAA ATTTGGAAAGA

ATACAGCAAAGGTGATTCAAGATATGCAACAAGACACACTTCAGTTCTTACCATCGA ATCGATGGCTT

CAGTTGTTCTGGGACCTCTTAGCCTTCTAGCAGCGTATGCTTTAGCTAAAGCGAAGT CATACAACTAC

ATTCTTCAGTTTGGAGTCTCAATTGCGCAGCTGTATGGGGCTTGTCTATATTTCCTA AGTGCTTTCCT

GGAGGGGGATAATTTTGCTTCTTCTCCGTATTTTTACTGGGCATATTACGTTGGACA AAGTAGCATCT

GGGTTATAGTACCAGCACTCATAGCTATACGTTGCTGGAAAAAAATCAATGCTATTT GCTATCTTCAA

GACAAGAAGAACAAGACCAAAGTTCGCTGA

SEQ ID 36 - MaISOM1 cDNA [cloned] (nucleotide)

ATGAGCGACTCATCATCTGTTCCCGTGGATTTTGTGCTAAACTTCTCAACTGCCGCCTTG CATGCTTG GAATGGCCTCAGTTTATTCTTAATCGTCTTCATCTCCTGGTTTATCTCCGGTATGTCTGC TTATTAAT CTATTAAGTACACTTCGTATATAATTCTACCTCAATCATATGTAGTTTATTGTTTGACGT GTATATCA TATATCTACATATATATACGTTTGCATGAATTGATCATTGCTTGCAGGGTTGACACAGGC GAAAACAA AAATGGACAGAGTGGTATTATGCTGGTGGGCTCTCACTGGCCTTATTCATGTCTTTCAAG AGGGTTAT TATGTTTTCACTCCAGATTTATTTAAAGACGATTCTCCTAATTTTATGGCTGAAATTTGG AAAGAATA CAGCAAAGGTGATTCAAGATATGCAACAAGACACACTTCAGTTCTTACCATCGAATCGAT GGCTTCAG TTGTTCTGGGACCTCTTAGCCTTCTAGCAGCGTATGCTTTAGCTAAAGCGAAGTCATACA ACTACATT CTTCAGTTTGGAGTCTCAATTGCGCAGCTGTATGGGGCTTGTCTATATTTCCTAAGTGCT TTCCTGGA GGGGGATAATTTTGCTTCTTCTCCGTATTTTTACTGGGCATATTACGTTGGACAAAGTAG CATCTGGG TTATAGTACCAGCACTCATAGCTATACGTTGCTGGAAAAAAATCAATGCTATTTGCTATC TTCAAGAC AAGAAGAACAAGACCAAAGTTCGCTGA

SEQ ID 37 - MaISOM1 translated nucleotide sequence (peptide)

MSDSSSVPVDFVLNFSTAALHAWNGLSLFLIVFISWFISGLTQAKTKMDRVVLCWWALTGL IHVFQEG

YYVFTPDLFKDDSPNFMAEIWKEYSKGDSRYATRHTSVLTIESMASWLGPLSLLAAY ALAKAKSYNY

ILQFGVSIAQLYGACLYFLSAFLEGDNFASSPYFYWAYYVGQSSIWVIVPALIAIRC WKKINAICYLQ

DKKNKTKVR

SEQ ID 38 - MaSDR1 coding sequence (nucleotide)

ATGAACAGTTATTCATCCGCGGCTCCCGGAAAAAGATTAGAAGGCAAAGTAGCAATCATC ACTGGTGG

AGCAAGCGGGATTGGAGCAACCGCAGTGCAAATTTTCCATGATAATGGTGCCAAGGT TGTTATATCTG

ATGTCCAGGATAAACTTGGCCAAGCCCTTGCTGATAAGCTAGGGGAAGGTGTTAGCT ACATCCATTGC

GACATATCAAATGAAAACGATGTGATAAATCTTGTTGATACAACTGTGGCTAAATAT GGAAAGCTTGA

TATCATGTACAACAACGCAGGCGTTATTGACCGTAACTTCGGAAGCATTTTGGACAC CCCAAAGTCTG

ATCTAGAACGCTTGCTTTCTGTTAACACCATTGGTGGTTTCTTAGGAGCCAAACATG CTGCAAGAGTC

ATGGTACCAAAGCAAAAGGGTTGCATATTGTTTACAGCTAGTGCCTGTACAGAAATT GCAGGACTTGG

CTCTCCTGCTTATACAGTGTCCAAATATGGGGTGGTAGCACTAGTTAAGAGCTTGGC AGCAGAGCTTG

GACAGTATGGTATAAGAGTAAATTGCGTATCACCTTACGGATTGGCAACCGGAATGT CGACTGCTGGA

GTTGATCCAGCATTAATAGAGTCATCATTGAGCGAGATGGGTAATTTAAAAGGGCAA GTTCTGAAAAC

CGATGGCATTGCAAATGCTGCGCTTTACTTGGCTTGTGATGAAGCTAGTTATGTGAG CGGACAAAACC

TCGTAGTCGATGGAGGATTCAGCATCCTCAACCCTACCATCATGAAAGCTTATAATC TTATCAATTAA

SEQ ID 39 - MaSDR1 translated nucleotide sequence (peptide)

MNSYSSAAPGKRLEGKVAIITGGASGIGATAVQIFHDNGAKVVISDVQDKLGQALADKLG EGVSYIHC

DISNENDVINLVDTTVAKYGKLDIMYNNAGVIDRNFGSILDTPKSDLERLLSVNTIG GFLGAKHAARV

MVPKQKGCILFTASACTEIAGLGSPAYTVSKYGVVALVKSLAAELGQYGIRVNCVSP YGLATGMSTAG

VDPALIESSLSEMGNLKGQVLKTDGIANAALYLACDEASYVSGQNLVVDGGFSILNPT IMKAYNLIN

SEQ ID 40 - MaBAHD1 coding sequence (nucleotide)

ATGAATCTCCGAATCACTTCCTCTGAAATCATCAAACCGTCTTCTCCTACCCCTCAAAAC CTGAAATC

CTATAGGCTTTCTATCGTGGATCAGTTAACACCTAATGTTTACTTCTCCATCATTCT CTTGTACACGA

AAACAACAGAAAACCCCACCAAAACTTCTGATCACCTTAAAAAATCTCTCTCAGAAA CTTTAACCCGC

TACTATCCTTTAGCAGGGCAACTCAAATATGATCAACTTATTGTTGATTGCAATGAT CAAGGAGTTCC

CTTCGTTGAAGCTGACGTATCCAACCACATGTCTGAGCTTCTCAAACTACCAAACAT CGACGTTCTTG

AGCAACTGCTACCATTCAAGCCGCATGAAGGTTTTAATGCTGAACGTTCTAACGTGA CCGTTCAGGTT

AATTACTTCGGTTGTGGTGGGATGGCTATCGGTCTTTGCTTTAAGCATAAAGTTCTT GATGCAACGAC

TGCTGCATTCTTTGTTAAAAACTGGGGTGTGATTGCTCGCGGTGCTGGTGAAATCAA GGATGTGATTT

ACGATCAAGCATCACTGTTTCCTGCAAGAGATTTGTCGTTCTTGTCGAAGAGTGTAG ACGAAGAGTTT

CTGAAGGCAGAATCCGAGACAAAAAGGTTCGTTTTCGACGGTTCTGCTATAGCTTCT ATGAGAGAGAA

GTTTACACATTTGGGGAGGCGTCCAACACGTTTTGAGGTTGTATCCGCAGTTATTTT GGGTGCTTTGA

TAAGCGCAGCTAAAGAAAGCGAAGAACCTCCTGAAAGATTGGATACCATAATCTCAG TGAATCTTCGA

CAGAGAATGGTTCCACCATTTCCAGAACACTGCTTGGGGAATATAATCTCAGGAGGA TTAATATACTG

GCCATTGGAGAAGAAACTCGACCATGGAAGTTTAGCAGAGGAAATTCATCAGTCAAT AAAGAAGGTAG

ACGACCAATTTGCTAGGAAGTTTTATGGAGAGGCTGAGTTCTTGAACCTGCCAAGAC TTGGGGCTAAT

GAAGTAGTGAAGAAGAGGGAGTTTTGGGTTACCAGCTGGTGCAAAACGCCACTACAT CAGTCTGATTT

CGGATGGGGAAAGCCTAAATGGGCAGGAAATTCAATGAGGCTCAATGAAATTACTGT TCTGTTCGACA

CCAGTGATGGTGAAGGAATTGAAGCGTGGGTGGGATTGCCCAAAAAAGACATGGCTC GATTTGAACAA

GATGCTACCATCGTTGCTTATACTTCTCCTAATCCCACCATACTTTGA

SEQ ID 41 - MaBAHD1 translated nucleotide sequence (peptide)

MNLRITSSEI IKPSSPTPQNLKSYRLSIVDQLTPNVYFSI ILLYTKTTENPTKTSDHLKKSLSETLTR YYPLAGQLKYDQLIVDCNDQGVPFVEADVSNHMSELLKLPNIDVLEQLLPFKPHEGFNAE RSNVTVQV NYFGCGGMAIGLCFKHKVLDATTAAFFVKNWGVIARGAGEIKDVIYDQASLFPARDLSFL SKSVDEEF LKAESETKRFVFDGSAIASMREKFTHLGRRPTRFEVVSAVILGALISAAKESEEPPERLD TI ISVNLR QRMVPPFPEHCLGNIISGGLIYWPLEKKLDHGSLAEEIHQSIKKVDDQFARKFYGEAEFL NLPRLGAN EVVKKREFWVTSWCKTPLHQSDFGWGKPKWAGNSMRLNEITVLFDTSDGEGIEAWVGLPK KDMARFEQ DATIVAYTSPNPTIL

References for Examples 1-3

1. Morgan ED (2009) Azadirachtin, a scientific gold mine. Bioorganic & Medicinal Chemistry 17(12):4096-4105.

2. Tan Q-G & Luo X-D (2011) Meliaceous limonoids: chemistry and biological activities. Chemical Reviews 111(11):7437-7522.

3. Roy A & Saraf S (2006) Limonoids: Overview of significant bioactive triterpenes distributed in plants kingdom. Biological & Pharmaceutical Bulletin 29(2):191-201.

4. Zhang YY & Xu H (2017) Recent progress in the chemistry and biology of limonoids. RSC Advances 7(56):35191-35220.

5. Hasegawa S & Miyake M (1996) Biochemistry and biological functions of citrus limonoids. Food Reviews International 12(4):413-435.

6. Veitch GE, et al. (2007) Synthesis of azadirachtin: a long but successful journey. Angewandte Chemie International Edition 46(40):7629-7632.

7. Yamashita S, et al. (2015) Total synthesis of limonin. Angewandte Chemie International Edition 54(29):8538-8541.

8. Gualdani R, Cavalluzzi MM, Lentini G, & Habtemariam S (2016) The chemistry and pharmacology of citrus limonoids. Molecules 21(11):1530.

9. Akhila A, Srivastava M, & Rani K (1996) Production of radioactive azadirachtin in the seed kernels of Azadirachta indica (the Indian neem tree). Natural Product Letters 11(1):107-110.

10. Aarthy T, et al. (2018) Tracing the biosynthetic origin of limonoids and their functional groups through stable isotope labelling and inhibition in neem tree (Azadirachta indica) cell suspension. BMC Plant Biology 18(1):230.

11. Thimmappa R, Geisler K, Louveau T, O'Maille P, & Osbourn A (2014) Triterpene biosynthesis in plants. Annual Review of Plant Biology 65:225-257.

12. Pandreka A, et al. (2015) Triterpenoid profiling and functional characterization of the initial genes involved in isoprenoid biosynthesis in neem (Azadirachta indica). BMC Plant Biology 15(1):214.

13. Wang FS, et al. (2017) Identification of putative genes involved in limonoids biosynthesis in citrus by comparative transcriptomic analysis. Frontiers in Plant Science 8(1):782.

14. Narnoliya LK, Rajakani R, Sangwan NS, Gupta V, & Sangwan RS (2014) Comparative transcripts profiling of fruit mesocarp and endocarp relevant to secondary metabolism by suppression subtractive hybridization in Azadirachta indica (neem). Molecular Biology Reports 41(5):3147-3162.

15. Rajakani R, Narnoliya L, Sangwan NS, Sangwan RS, & Gupta V (2014) Subtractive transcriptomes of fruit and leaf reveal differential representation of transcripts in Azadirachta indica. Tree Genetics & Genomes 10(5):1331-1351.

16. Wang S, Zhang H, Li X, & Zhang J (2016) Gene expression profiling analysis reveals a crucial gene regulating metabolism in adventitious roots of neem (Azadirachta indica). RSC Advances 6(115):114889-114898.

17. Bhambhani S, et al. (2017) Genes encoding members of 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR) gene family from Azadirachta indica and correlation with azadirachtin biosynthesis. Acta Physiologiae Plantarum 39(1):65.

Kita M, et al. (2000) Molecular cloning and characterization of a novel gene encoding limonoid UDP-glucosyltransferase in Citrus. FEBS Letters 469(2-3):173-178.

Krishnan NM, et al. (2012) A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica. BMC Genomics 13(1):464.

Krishnan NM, Jain P, Gupta S, Hariharan AK, & Panda B (2016) An improved genome assembly of Azadirachta indica A. Juss. G3: Genes, Genomes, Genetics 6(7):1835-1840.

Kuravadi NA, et al. (2015) Comprehensive analyses of genomes, transcriptomes and metabolites of neem tree. PeerJ 3:e1066.

Krishnan NM, et al. (2011) De novo sequencing and assembly of Azadirachta indica fruit transcriptome. Current Science 101(12):1553-1561.

Wang Y, et al. (2016) Comparative analysis of the terpenoid biosynthesis pathway in Azadirachta indica and Melia azedarach by RNA-seq. SpringerPlus 5(1):1-9.

Bhambhani S, et al. (2017) Transcriptome and metabolite analyses in Azadirachta indica: identification of genes involved in biosynthesis of bioactive triterpenoids. Scientific Reports 7(1):5043.

Xu Q, et al. (2012) The draft genome of sweet orange (Citrus sinensis). Nature Genetics 45(1):59-66.

Racolta S, Juhl PB, Sirim D, & Pleiss J (2012) The triterpene cyclase protein family: a systematic analysis. Proteins: Structure, Function, and Bioinformatics 80(8):2009-2019.

Ebizuka Y, Katsube Y, Tsutsumi T, Kushiro T, & Shibuya M (2003) Functional genomics approach to the study of triterpene biosynthesis. Pure and Applied Chemistry 75(2-3):369-374.

Morlacchi P, et al. (2009) Product profile of PEN3: the last unexamined oxidosqualene cyclase in Arabidopsis thaliana. Organic Letters 11(12):2627-2630.

Nelson DR (2006) Cytochrome P450 nomenclature, 2004. Cytochrome P450 Protocols, eds Phillips IR & Shephard EA (Humana Press, Totowa, NJ), pp 1-10.

Koenen EJ, Clarkson JJ, Pennington TD, & Chatrou LW (2015) Recently evolved diversity and convergent radiations of rainforest mahoganies (Meliaceae) shed new light on the origins of rainforest hyperdiversity. New Phytologist 207(2):327-339.

Ekong DEU, Ibiyemi SA, & Olagbemi EO (1971) The meliacins (limonoids). Biosynthesis of nimbolide in the leaves of Azadirachta indica. Journal of the Chemical Society D: Chemical Communications (18):1117-1118.

Camacho C, et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10(1):421.

Bak S, et al. (2011) Cytochromes P450. The Arabidopsis Book 9(1):e0144.

Stephenson MJ, Reed J, Brouwer B, & Osbourn A (2018) Transient expression in Nicotiana benthamiana leaves for triterpene production at a preparative scale. JoVE (138):e58169.

Reed J, et al. (2017) A translational synthetic biology platform for rapid access to gram-scale quantities of novel drug-like molecules. Metabolic Engineering 42(1):185-193.

Zhao W-Y, et al. (2019) New tirucallane triterpenoids from Picrasma quassioides with their potential antiproliferative activities on hepatoma cells. Bioorganic Chemistry 84(1):309-318.

Nakanishi T, Inada A, & Lavie D (1986) A new tirucallane-type triterpenoid derivative, lipomelianol from fruits of Melia toosendan Sieb. et Zucc. Chemical and Pharmaceutical Bulletin 34(1):100-104.

Bevan C, Ekong D, Halsall T, & Toft P (1967) West African timbers. Part XX. The structure of turraeanthin, an oxygenated tetracyclic triterpene monoacetate. Journal of the Chemical Society C: Organic 1967:820-828.

Polonsky J, Varon Z, Rabanal RM, & Jacquemin H (1977) 21,20-Anhydromelianone and melianone from Simarouba amara (Simaroubaceae); carbon-13 NMR spectral analysis of Δ7-tirucallol-type triterpenes. Israel Journal of Chemistry 16(1):16-19.

Yuan C-M, et al. (2013) Bioactive limonoid and triterpenoid constituents of Turraea pubescens. Journal of Natural Products 76(6):1166-1174.

Grabherr MG, et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29(7):644-652.

Haas BJ, et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8(8):1494-1512.

Stanke M & Morgenstern B (2005) AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Research 33(Web Server issue):W465-W467.

Edgar R (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32(5):1792-1797.

Kushiro T, Shibuya M, Masuda K, & Ebizuka Y (2000) Mutational studies on triterpene synthases: engineering lupeol synthase into β-amyrin synthase. Journal of the American Chemical Society 122(29):6816-6824.

Livak KJ & Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2^−ΔΔCT method. Methods 25(4):402-408.

Sainsbury F, Thuenemann EC, & Lomonossoff GP (2009) pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnology Journal 7(7):682-693.

Lavie D, Jain MK, & Shpan-Gabrielith SR (1967) A locust phagorepellent from two Melia species. Chemical Communications (London) 1967(18):910-911.

Saxena N & Kumar Y (2008) Chemistry of azadirachtin and other bioactive isoprenoids from neem. Neem: A Treatise (I.K. International Publishing House Pvt. Ltd., New Delhi), pp 175-198.

Paal C (1884) Ueber die Derivate des Acetophenonacetessigesters und des Acetonylacetessigesters. Berichte der Deutschen Chemischen Gesellschaft 17(2):2756-2767.

Knorr L (1884) Synthese von Furfuranderivaten aus dem Diacetbernsteinsäureester. Berichte der Deutschen Chemischen Gesellschaft 17(2):2863-2870.

Siddiqui S, Mahmood T, Siddiqui BS, & Faizi S (1986) Isolation of a triterpenoid from Azadirachta indica. Phytochemistry 25(9):2183-2185.

Purushothaman KK, Duraiswamy K, Connolly JD, & Rycroft DS (1985) Triterpenoids from Walsura piscidia. Phytochemistry 24(10): 2349-2355.

Ayafor JF, Sondengam BL, Connolly JD, Rycroft DS, & Okogun JI (1981) Tetranortriterpenoids and related compounds. Part 26. Tecleanin, a possible precursor of limonin, and other new tetranortriterpenoids from Teclea grandifolia Engl. (Rutaceae). Journal of the Chemical Society, Perkin Transactions 1 (1):1750-1753.

56. Hasegawa S, Herman Z, Orme E, & Ou P (1986) Biosynthesis of limonoids in citrus: sites and translocation. Phytochemistry 25(12):2783-2785.

57. Ou P, Hasegawa S, Herman Z, & Fong CH (1988) Limonoid biosynthesis in the stem of Citrus limon. Phytochemistry 27(1): 115-118.

58. Hasegawa S & Herman Z (1985) Biosynthesis of obacunone from nomilin in Citrus limon. Phytochemistry 24(9): 1973-1974.

59. Price M, Dehal P, & Arkin A (2010) FastTree 2: approximately maximum-likelihood trees for large alignments. PLOS ONE 5(3):e9490.

60. Letunic I & Bork P (2016) Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Research 44(W1):W242-W245.

References for Figs. 6-9 and Tables S1 to S12

1. MacKenzie DJ, McLean MA, Mukerji S, & Green M (1997) Improved RNA extraction from woody plants for the detection of viral pathogens by reverse transcription-polymerase chain reaction. Plant Disease 81(2):222-226.

2. Kushiro T, Shibuya M, Masuda K, & Ebizuka Y (2000) Mutational studies on triterpene synthases: engineering lupeol synthase into β-amyrin synthase. Journal of the American Chemical Society 122(29):6816-6824.

3. Sainsbury F, Thuenemann EC, & Lomonossoff GP (2009) pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnology Journal 7(7):682-693.

4. Reed J, et al. (2017) A translational synthetic biology platform for rapid access to gram-scale quantities of novel drug-like molecules. Metabolic Engineering 42:185-193.

5. Krishnan NM, et al. (2011) De novo sequencing and assembly of Azadirachta indica fruit transcriptome. Current Science 101(12):1553-1561.

6. Krishnan NM, et al. (2012) A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica. BMC Genomics 13(1):464.

7. Haas BJ, et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8(8):1494-1512.

8. Langmead B, Trapnell C, Pop M, & Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10(1):R25.

9. Li B & Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12(1):323.

10. Robinson M, McCarthy D, & Smyth G (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139-140.

11. Love M, Huber W, & Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15(12):550.

Anonymous (2008) Spearman rank correlation coefficient. The Concise Encyclopedia of Statistics (Springer New York, New York, NY), pp 502-505.

Pearson K (1895) Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(1): 240-242.

Zhao S, Guo Y, Sheng Q, & Shyr Y (2014) Heatmap3: an improved heatmap package with more powerful and convenient features. BMC Bioinformatics 15(10):P16.

Pandreka A, et al. (2015) Triterpenoid profiling and functional characterization of the initial genes involved in isoprenoid biosynthesis in neem (Azadirachta indica). BMC Plant Biology 15:14.

Stephenson MJ, Reed J, Brouwer B, & Osbourn A (2018) Transient expression in Nicotiana benthamiana leaves for triterpene production at a preparative scale. JoVE 138(1):e58169.

Bak S, et al. (2011) Cytochromes P450. The Arabidopsis Book 9(1):e0144.

Price M, Dehal P, & Arkin A (2010) FastTree 2: approximately maximum-likelihood trees for large alignments. PLOS ONE 5(3):e9490.

Letunic I & Bork P (2016) Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Research 44(W1):W242-W245.

Sievers F, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology 7(1):539.

Wang Y, et al. (2016) Comparative analysis of the terpenoid biosynthesis pathway in Azadirachta indica and Melia azedarach by RNA-seq. SpringerPlus 5(1):1-9.

Ebizuka Y, Katsube Y, Tsutsumi T, Kushiro T, & Shibuya M (2003) Functional genomics approach to the study of triterpene biosynthesis. Pure and Applied Chemistry 75(2-3):369-374.

Morlacchi P, et al. (2009) Product profile of PEN3: the last unexamined oxidosqualene cyclase in Arabidopsis thaliana. Organic Letters 11(12):2627-2630.

Nelson DR (2006) Cytochrome P450 nomenclature, 2004. Cytochrome P450 Protocols, eds Phillips IR & Shephard EA (Humana Press, Totowa, NJ), pp 1-10.

Mittapelli SR, Maryada SK, Khareedu VR, & Vudem DR (2014) Structural organization, classification and phylogenetic relationship of cytochrome P450 genes in Citrus clementina and Citrus sinensis. Tree Genetics & Genomes 10(2):399-409.

Gray AI, Bhandari P, & Waterman PG (1988) New protolimonoids from the fruits of Phellodendron chinense. Phytochemistry 27(6):1805-1808.

Nakanishi T, Inada A, & Lavie D (1986) A new tirucallane-type triterpenoid derivative, lipomelianol from fruits of Melia toosendan Sieb. et Zucc. Chemical and Pharmaceutical Bulletin 34(1):100-104.

Zhao W-Y, et al. (2019) New tirucallane triterpenoids from Picrasma quassioides with their potential antiproliferative activities on hepatoma cells. Bioorganic Chemistry 84(1):309-318.

Mulholland DA, Kotsos M, Mahomed HA, & Taylor DAH (1998) Triterpenoids from Owenia cepiodora. Phytochemistry 49(8):2457-2460.

Wattanapiromsakul C & Waterman PG (2000) Flavanone, triterpene and chromene derivatives from the stems of Paramignya griffithii. Phytochemistry 55(3):269-273.

Fo ER, Fernandes JB, Vieira PC, & Da Silva MFDGF (1992) Isolation of secoisolariciresinol diesters from stems of Simaba cuneata. Phytochemistry 31(6):2115-2116.

Polonsky J, Baskevitch-Varon Z, & Das BC (1976) Triterpenes tetracycliques du Simarouba amara. Phytochemistry 15(2): 337-339.

Luo X-D, Wu S-H, Ma Y-B, & Wu D-G (2000) Tirucallane triterpenoids from Dysoxylum hainanense. Phytochemistry 54(8):801-805.

Liu H, Heilmann J, Rali T, & Sticher O (2001) New tirucallane-type triterpenes from Dysoxylum variabile. Journal of Natural Products 64(2): 159-163.

Kumar V, Niyaz NMM, Wickramaratne DBM, & Balasubramaniam S (1991) Tirucallane derivatives from Paramignya monophylla fruits. Phytochemistry 30(4):1231-1233.

Hisham A, Jayakumar G, Ajitha Bai MD, & Fujimoto Y (2004) Beddomeilactone: a new triterpene from Dysoxylum beddomei. Natural Product Research 18(4):329-334.

Gu J, et al. (2013) Chemical components of Dysoxylum densiflorum. Natural Products and Bioprospecting 3(2):66-69.

Grosvenor SNJ, Mascoll K, McLean S, Reynolds WF, & Tinto WF (2006) Tirucallane, apotirucallane, and octanorapotirucallane triterpenes of Simarouba amara. Journal of Natural Products 69(9): 1315-1318.

Mohamad K, et al. (1999) Tirucallane triterpenes from Dysoxylum macranthum. Phytochemistry 52(8):1461-1468.

Orisadipe AT, Adesomoju AA, D’Ambrosio M, Guerriero A, & Okogun JI (2005) Tirucallane triterpenes from the leaf extract of Entandrophragma angolense. Phytochemistry 66(19):2324-2328.

Chen J, et al. (2011) Cytotoxic triterpenoids from Azadirachta indica. Planta Medica 77(16):1844-1847.

Ragasa CY, et al. (2013) Glabretal-type triterpenoids from Dysoxylum mollissimum. Phytochemistry Letters 6(4):514-518.

Inada A, Konishi M, Murata H, & Nakanishi T (1994) Structures of a new limonoid and a new triterpenoid derivative from pericarps of Trichilia connaroides. Journal of Natural Products 57(10):1446-1449.

Vieira JI, et al. (2013) Hirtinone, a novel cycloartane-type triterpene and other compounds from Trichilia hirta L. (Meliaceae). Molecules 18(3):2589-2597.

Rodrigues VF, Carmo HM, Braz RF, Mathias L, & Vieira I (2010) Two new terpenoids from Trichilia quadrijuga (Meliaceae). Natural Product Communications 5(2):179-184.

Harding WW, Jacobs H, Lewis PA, McLean S, & Reynolds WF (2001) Cycloartanes, protolimonoids, a pregnane and a new ergostane from Trichilia reticulata. Natural Product Letters 15(4):253-260.

Ketwaru P, Klass J, Tinto WF, McLean S, & Reynolds WF (1993) Pregnane steroids from Trichilia schomburgkii. Journal of Natural Products 56(3):430-431.

Tinto WF, Jagessar PK, Ketwaru P, Reynolds WF, & McLean S (1991) Constituents of Trichilia schomburgkii. Journal of Natural Products 54(4):972-977.

Wang G-C, et al. (2016) Limonoids and triterpenoids as 11β-HSD1 inhibitors from Walsura robusta. Journal of Natural Products 79(4):899-906.

Liu J-Q, et al. (2012) Limonoids from the leaves of Toona ciliata var. yunnanensis. Phytochemistry 76(1):141-149.

Kishi K, Yoshikawa K, & Arihara S (1992) Limonoids and protolimonoids from the fruits of Phellodendron amurense. Phytochemistry 31(4):1335-1338.

Itokawa H, Kishi E, Morita H, & Takeya K (1992) Cytotoxic quassinoids and tirucallane-type triterpenes from the woods of Eurycoma longifolia. Chemical & Pharmaceutical Bulletin 40(4):1053-1055.

Saraiva RdCG, Pinto AC, Nunomura SM, & Pohlit AM (2006) Triterpenes and a canthinone alkaloid from the stems of Simaba polyphylla (Cavalcante) WW Thomas (Simaroubaceae). Quimica Nova 29(2):264-268.

Esimone CO, et al. (2008) Potential anti-respiratory syncytial virus lead compounds from Aglaia species. Die Pharmazie - An International Journal of Pharmaceutical Sciences 63(10):768-773.

Benosman A, et al. (1995) Tirucallane triterpenes from the stem bark of Aglaia leucophylla. Phytochemistry 40(5): 1485-1487.

Irungu BN, et al. (2015) Antiplasmodial and cytotoxic activities of the constituents of Turraea robusta and Turraea nilotica. Journal of Ethnopharmacology 174:419-425.

Wang J-R, et al. (2011) Protolimonoids and norlimonoids from the stem bark of Toona ciliata var. pubescens. Organic & Biomolecular Chemistry 9(22):7685-7696.

Ahsan M, Armstrong JA, Gray AI, & Waterman PG (1994) Boronialatenolide: a novel pentanortriterpene from the aerial parts of Boronia alata (Rutaceae). Australian Journal of Chemistry 47(9):1783-1787.

Ahsan M, Armstrong JA, Gray AI, & Waterman PG (1995) Terpenoids, alkaloids and coumarins from Boronia inornata and Boronia gracilipes. Phytochemistry 38(5):1275-1278.

Reegan AD, Gandhi MR, Paulraj MG, Balakrishna K, & Ignacimuthu S (2014) Effect of niloticin, a protolimonoid isolated from Limonia acidissima L. (Rutaceae) on the immature stages of dengue vector Aedes aegypti L. (Diptera: Culicidae). Acta Tropica 139(1):67-76.

Lien TP, Kamperdick C, Schmidt J, Adam G, & Van Sung T (2002) Apotirucallane triterpenoids from Luvunga sarmentosa (Rutaceae). Phytochemistry 60(7):747-754.

Kiplimo J, Islam S, & Koorbanally N (2012) Ring A, D-seco limonoids and flavonoid from the Kenyan Vepris uguenensis Engl and their antioxidant activity. Planta Medica 78(11):PI111.

Hong Z-L, et al. (2013) Tetracyclic triterpenoids and terpenylated coumarins from the bark of Ailanthus altissima (“tree of heaven”). Phytochemistry 86(1): 159-167.

Grieco PA, Haddad J, Piñeiro-Núñez MM, & Huffman JC (1999) Quassinoids from the twigs and thorns of Castela polyandra. Phytochemistry 50(4):637-645.

Wang J, Zhang Y, Luo J, & Kong L (2011) Complete 1H and 13C NMR data assignment of protolimonoids from the stem barks of Aphanamixis grandifolia. Magnetic Resonance in Chemistry 49(7):450-457.

Zhang X-Y, et al. (2010) Tirucallane-type alkaloids from the bark of Dysoxylum laxiracemosum. Journal of Natural Products 73(8):1385-1388.

Huang HL, et al. (2011) Tirucallane-type triterpenoids from Dysoxylum lenticellatum. Journal of Natural Products 74(10):2235-2242.

Hayasida W, Oliveira L, Ferreira A, & Lima M (2017) Ergostane steroids, tirucallane and apotirucallane triterpenes from Guarea convergens. Chemistry of Natural Compounds 53(2):312-317.

Jimenez A, et al. (1998) Limonoids from Swietenia humilis and Guarea grandiflora (Meliaceae) Taken in part from the PhD and MS theses of C. Villarreal and M. A. Jimenez, respectively. Phytochemistry 49(7):1981-1988.

Miguita CH, et al. (2015) 3β-O-tigloylmelianol from Guarea kunthiana: a new potential agent to control Rhipicephalus (Boophilus) microplus, a cattle tick of veterinary significance. Molecules 20(1):111.

Ntalli NG, et al. (2010) Cytotoxic tirucallane triterpenoids from Melia azedarach fruits. Molecules 15(9):5866-5877.

Han J, Lin W, Xu R, Wang W, & Zhao S (1991) Studies on the chemical constituents of Melia azedarach L. Acta pharmaceutica Sinica 26(6):426-429.

Coombes PH, Mulholland DA, & Randrianarivelojosia M (2005) Mexicanolide limonoids from the Madagascan Meliaceae Quivisia papinae. Phytochemistry 66(10):1100-1107.

Kaur R & Arora S (2009) Chemical constituents and biological activities of Chukrasia tabularis A. Juss. - a review. Journal of Medicinal Plants Research 3(4):196-216.

Basak S & Islam A (1970) DP Melianone from Swietenia mahagoni. Journal of the Indian Chemical Society 47(5):501-503.

Su R, et al. (1990) Triterpenoids from the fruits of Phellodendron chinense: the stereostructure of niloticin. Chemical and Pharmaceutical Bulletin 38(6):1616-1619.

Biavatti MW, et al. (2001) Chemistry and bioactivity of Raulinoa echinata Cowan, an endemic Brazilian Rutaceae species. Phytomedicine 8(2):121-124.

Liu J, Yang S-P, Ni G, Gu Y-C, & Yue J-M (2012) Triterpenoids from Aglaia odorata var. microphyllina. Journal of Asian Natural Products Research 14(10):929-939.

Bevan C, Ekong D, Halsall T, & Toft P (1967) West African timbers. Part XX. The structure of turraeanthin, an oxygenated tetracyclic triterpene monoacetate. Journal of the Chemical Society C: Organic 1967:820-828.

eisler, R. K. Hughes, F. Sainsbury, G. P. Lomonossoff, M. Rejzek, S. Fairhurst, C.-E. Olsen, M. S. Motawia, R. E. Melton, A. M. Hemmings, et al. Biochemical analysis of a multifunctional cytochrome P450 (CYP51) enzyme required for synthesis of antimicrobial triterpenes in plants. Proceedings of the National Academy of Sciences, 110(35):E3360-E3367, 2013.

Grebenok, T. E. Ohnmeiss, A. Yamamoto, E. D. Huntley, D. W. Galbraith, and D. DellaPenna. Isolation and characterization of an Arabidopsis thaliana C-8,7 sterol isomerase: functional and structural similarities to mammalian C-8,7 sterol isomerase/emopamil-binding protein. Plant Molecular Biology, 38(5):807-815, 1998.

ahier, S. Pierre, G. Riveill, and F. Karst. Identification of essential amino acid residues in a sterol 8,7-isomerase from Zea mays reveals functional homology and diversity with the isomerases of animal and fungal origin. Biochemical Journal, 414(2):247-259, 2008.

A. C. Huang, T. Jiang, Y.-X. Liu, Y.-C. Bai, J. Reed, B. Qu, A. Goossens, H.-W. Nützmann, Y. Bai, and A. Osbourn. A specialized metabolic network selectively modulates Arabidopsis root microbiota. Science, 364(6440):eaau6389, 2019.

A. Bayer, X. Ma, and J. Stöckigt. Acetyltransfer in natural product biosynthesis: functional cloning and molecular analysis of vinorine synthase. Bioorganic & Medicinal Chemistry, 12(10):2787-2795, 2004.

S. T. Mugford, X. Qi, S. Bakht, L. Hill, E. Wegel, R. K. Hughes, K. Papadopoulou, R. Melton, M. Philo, F. Sainsbury, et al. A serine carboxypeptidase-like acyltransferase is required for synthesis of antimicrobial compounds and disease resistance in oats. The Plant Cell, 21(8):2473-2484, 2009.

M. Giolai, P. Paajanen, W. Verweij, L. Percival-Alwyn, D. Baker, K. Witek, F. Jupe, G. Bryan, I. Hein, J. D. Jones, et al. Targeted capture and sequencing of gene-sized DNA molecules. Biotechniques, 61(6):315-322, 2016.

A. Hallab. Protein Function Prediction Using Phylogenomics, Domain Architecture Analysis, Data Integration, and Lexical Scoring. PhD thesis, Universitäts- und Landesbibliothek Bonn, 2015.

T. Z. Berardini, L. Reiser, D. Li, Y. Mezheritsky, R. Muller, E. Strait, and E. Huala. The Arabidopsis information resource: making and mining the 'gold standard' annotated reference plant genome. Genesis, 53(8):474-485, 2015.

UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Research, 43(D1):D204-D212, 2014.

P. Jones, D. Binns, H.-Y. Chang, M. Fraser, W. Li, C. McAnulla, H. McWilliam, J. Maslen, A. Mitchell, G. Nuka, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics, 30(9):1236-1240, 2014.

RNA-Seq analysis workshop course material (Weill Cornell Medical College). Published online; date accessed: 06.11.19. URL http://chagall.med.cornell.edu/RNASEQcourse/.

S. Andrews, F. Krueger, A. Segonds-Pichon, L. Biggins, C. Krueger, and S. Wingett. FastQC. Babraham Institute, 2012.

93. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, and T. R. Gingeras. STAR: ultrafast universal RNA-Seq aligner. Bioinformatics, 29(1):15-21, 2013.

H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16):2078-2079, 2009.

Y. Liao, G. K. Smyth, and W. Shi. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013.

M. I. Love, W. Huber, and S. Anders. Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biology, 15(12):550, 2014.

Robinson, D. J. McCarthy, and G. K. Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139-140, 2010.

iu, B. Khakimov, P. D. Cardenas, F. Cozzi, C. E. Olsen, K. R. Jensen, T. P. Hauser, and S. Bak. The cytochrome P450 CYP72A552 is key to production of hederagenin-based saponins that mediate plant defense against herbivores. New Phytologist, 222(3):1599-1609, 2019.

hao, Y. Guo, Q. Sheng, and Y. Shyr. Heatmap3: an improved heatmap package with more powerful and convenient features. BMC Bioinformatics, 15(S10):P16, 2014.