USE OF HETEROLOGOUS EXPRESSED POLYKETIDE SYNTHASE AND SMALL MOLECULE FOLDASES TO MAKE AROMATIC AND CYCLIC COMPOUNDS

Document Type and Number:

WIPO Patent Application WO/2016/198623

Kind Code:

Abstract:

A method for producing individual compounds or libraries of tri- to pentadecaketide-derived aromatic compounds of interest by heterologous expression of a polyketide synthase and an aromatase/cyclase in a recombinant host cell.

Inventors:

FRANDSEN RASMUS JOHN NORMAND (DK)
MORTENSEN UFFE HASBRO (DK)
COUMOU HILDE CORNELIJNE (DK)
KANNANGARA RUBINI MAYA (DK)
MADSEN BJØRN (DK)
NAFISI MAJSE (DK)
ANDERSEN-RANBERG JOHAN (DK)
KONGSTAD KENNETH THERMANN (DK)
OKKELS FINN THYGE (DK)
KHORSAND-JAMAL PAIMAN (DK)
STÆRK DAN (DK)

Application Number:

PCT/EP2016/063331

Publication Date:

December 15, 2016

Filing Date:

June 10, 2016

Assignee:

UNIV DANMARKS TEKNISKE (DK)
KØBENHAVNS UNIV (DK)

International Classes:

C12P7/26; C12N1/15; C12N1/19; C12N15/82; C12P7/40; C12P15/00; C12P17/06; C12P19/60

Other References:

GAGNE ET AL: "Identification of olivetolic acid cyclase from Cannabis sativa reveals a unique catalytic route to plant polyketides", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, U.S.A., vol. 109, no. 31, 31 July 2012 (2012-07-31), pages 12811 - 12816, XP002747517
AMES ET AL: "Structural and biochemical characterization of ZhuI aromatase/cyclase from the R1128 polyketide pathway", BIOCHEMISTRY, vol. 50, 26 August 2011 (2011-08-26), pages 8392 - 8406, XP002747579
WENJUN ZHANG ET AL: "Investigation of early tailoring reactions in the oxytetracycline biosynthetic pathway", THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 282, 31 August 2007 (2007-08-31), pages 25717 - 25725, XP002500433
ZHOU ET AL: "Cyclization of aromatic polyketides from bacteria and fungi", NATURAL PRODUCT REPORTS, vol. 27, 2014, pages 839 - 868, XP002747553
LE ZHANG ET AL: "Synthesis of unnatural small molecules by plant specific polyketide synthases", CHINESE JOURNAL OF ORGANIC CHEMISTRY, vol. 33, 2013, pages 2469 - 2484, XP002747593
JADHAV ET AL: "Polyketide synthesis in tobacco plants transformed with a Plumbago zeylanica type III hexaketide synthase", PHYTOCHEMISTRY, vol. 98, 2014, pages 92 - 100, XP002747554
KHO GO ET AL: "Synthetic polyketide enzymology: platform for biosynthesis of antimicrobial polyketides", ACS CATALYSIS, vol. 5, 27 May 2015 (2015-05-27), pages 4033 - 4042, XP008177920
HARTUNG JÖRGENSEN: "Fusarium graminearum PKS14 is involved in orsellinic acid and orcinol synthesis", FUNGAL GENETICS AND BIOLOGY, vol. 70, 8 July 2014 (2014-07-08), pages 24 - 31, XP002747518
HASHIMOTO ET AL: "Fungal type III polyketide synthases", NATURAL PRODUCT REPORTS, vol. 31, 2014, pages 1306 - 1317, XP002747580
DAYU YU ET AL: "Type III polyketide synthases in natural product biosynthesis", IUBMB LIFE, vol. 64, 2012, pages 285 - 295, XP002747467
IMAN A.M. ABDEL-RAHMAN ET AL: "In vitro formation of the anthranoid scaffold by cell-free extracts from yeast-extract-treated Cassia bicapsularis cell cultures", PHYTOCHEMISTRY, vol. 88, 8 February 2013 (2013-02-08), pages 15 - 24, XP002747594
SARAH ELIZABETH POSETH: "Cyclization modes in type III polyketide synthases and the synthetic compounds used as probes for mechanistic studies", THESIS, 2012, pages Cover page, i-xii - 1-88, XP002747581, Retrieved from the Internet [retrieved on 20151020]
VAGSTAD ET AL: "Characterization of a fungal thioesterase having Claisen cyclase and deacetylase activities in melanin biosynthesis", CHEMISTRY & BIOLOGY, vol. 19, 2012, pages 1525 - 1534, XP002760112
FANG YANG ET AL: "Biosynthesis of phloroglucinol compounds in microorganisms", APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, vol. 93, 2012, pages 487 - 495, XP035001278
KARPPINEN ET AL: "Octaketide-producing type III polyketide synthase from Hypericum perforatum is expressed in dark glands accumulating hypericins", THE FEBS JOURNAL, vol. 275, 2008, pages 4329 - 4342, XP002752481
J. SAMBROOK; E.F. FRITSCH; T. MANIATIS: "Molecular Cloning, A Laboratory Manual, 2nd edition", 1989, COLD SPRING HARBOR
R.D. FINN; A. BATEMAN; J. CLEMENTS; P. COGGILL; R.Y. EBERHARDT; S.R. EDDY; A. HEGER; K. HETHERINGTON; L. HOLM; J. MISTRY: "The Pfam protein families database", NUCLEIC ACIDS RESEARCH, vol. 42, 2014, pages D222 - D230
NEEDLEMAN; WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453
RICE ET AL., TRENDS GENET., vol. 16, 2000, pages 276 - 277
HUSSAM H. NOUR-ELDIN; BJARNE G. HANSEN; MORTEN H. H. NORHOLM; JACOB K. JENSEN; BARBARA A. HALKIER: "Advancing uracil-excision based cloning towards an ideal technique for cloning PCR fragments", NUCLEIC ACIDS RES., vol. 34, no. 18, 2006, pages E122
MIKKELSEN MD; BURON LD; SALOMONSEN B; OLSEN CE; HANSEN BG; MORTENSEN UH; HALKIER BA: "Microbial production of indolylglucosinolate through engineering of a multi-gene pathway in a versatile yeast expression platform", METAB ENG., vol. 14, 2012, pages 104 - 111
GIETZ, R.D.; SCHIESTL, R.H.: "High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method", NAT. PROTOC., vol. 2, 2007, pages 31 - 34
KARPPINEN ET AL.: "Octaketide-producing type III polyketide synthase from Hypericum perforatum is expressed in dark glands accumulating hypericins", FEBS, vol. 275, no. 17, 2008, pages 4329 - 4342
M RODRIGUEZ: "Plant Isoprenoids, Methods in Molecular Biology", vol. 1153, 2014, HUMANA PRESS, article BACH, S.S.; BASSARD, J.E.; ANDERSEN-RANBERG, J.; MOLDRUP, M.E.; SIMONSEN, H.T.; HAMBERGER, B.: "High-Throughput Testing of Terpenoid Biosynthesis Candidate Genes Using Transient Expression in Nicotiana benthamiana"
SAINSBURY, F.; THEUNEMANN, EC.; LOMONOSSOFF, GP.: "pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants", PLANT BIOTECHNOLOGY JOURNAL, vol. 7, no. 7, 2009, pages 682 - 693

Attorney, Agent or Firm:

GUARDIAN IP CONSULTING I/S (Building 381, 2800 Kgs. Lyngby, DK)

Claims:

1. A method of producing a polyketide-derived aromatic, polyaromatic, cyclic or polycyclic compound, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from 6 - 31 carbon atoms, comprising the steps of:
a. providing a recombinant cell comprising:
i. a transgene encoding a heterologous type III polyketide synthase capable of forming a linear non-reduced polyketide compound, wherein the carbon atom chain length of the polyketide backbone of the formed compound is selected from 6 - 31 carbon atoms; and
ii. a transgene encoding a first heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a linear non-reduced polyketide compound, wherein the carbon atom chain length of the polyketide backbone of the compound is one or more of 6 - 31 carbon atoms, wherein the 'small molecule foldase' enzyme is a bacterial or fungal enzyme, and wherein the genus from which said bacterial or fungal enzyme is derived is different from the genus from which said PKSIII enzyme is derived,
and wherein the recombinant cell is capable of producing a polyketide-derived aromatic, polyaromatic, cyclic or polycyclic compound, wherein the carbon atom chain length of the polyketide backbone of the compound is selected from among 6 - 31 carbon atoms; and
b. incubating and/or culturing the recombinant cell in a culture medium to support synthesis of the polyketide-derived aromatic, polyaromatic, cyclic or polycyclic compound.

2. A method of producing a library of polyketide-derived aromatic, polyaromatic, cyclic, and polycyclic compounds, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6 - 31 carbon atoms, comprising the steps of:
a. providing one or more heterogeneous populations of recombinant cells, wherein each cell in the one or more populations comprises:
i. a transgene encoding a heterologous type III polyketide synthase capable of forming a linear non-reduced polyketide compound, wherein the carbon atom chain length of the polyketide backbone of the formed compound is selected from 6 - 31 carbon atoms; and
ii. a transgene encoding a first heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a linear non-reduced polyketide compound, wherein the carbon atom chain length of the polyketide backbone of the compound is one or more of 6 - 31 carbon atoms, wherein the 'small molecule foldase' enzyme is a bacterial or fungal enzyme, and wherein the genus from which said bacterial or fungal enzyme is derived is different from the genus from which the PKSIII enzyme is derived,
and wherein the one or more populations of recombinant cells comprises cells capable of producing polyketide-derived aromatic, polyaromatic, cyclic, and/or polycyclic compounds, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6 - 31 carbon atoms; and
b. incubating and/or culturing the one or more heterogeneous populations of recombinant cells in a culture medium to support synthesis of the library of polyketide-derived aromatic, polyaromatic, cyclic, and polycyclic compounds.

3. The method of claim 2, further comprising a step of: c. screening the library of polyketide-derived aromatic, polyaromatic, cyclic, and polycyclic compounds, wherein each recombinant cell, or its clonal derivatives, present in the one or more heterogeneous populations of recombinant cells is grown individually on a solid support, or individually in a liquid culture.

4. The method of claim 2 or 3, further comprising the step of recovering the polyketide-derived aromatic, polyaromatic, cyclic, and polycyclic compounds produced by the one or more heterogeneous populations of recombinant cells or produced by one or more of the recombinant cell clones present in the one or more heterogeneous populations of recombinant cells.

5. The method according to any of claims 1-4, wherein the heterologous type III polyketide synthase is selected from the group consisting of:
a. Triketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to 2-PS (SEQ ID No.2) from Gerbera hybrida;
b. Tetraketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to PhlD (SEQ ID No.4) from Pseudomonas fluorescens;
c. Pentaketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of PCS (SEQ ID No.6) from Aloe arborescens, ORAS (SEQ ID No.8) from Neurospora crassa, and 1,3,6,8-tetrahydroxynaphthalene synthase (SEQ ID No.10) from Streptomyces fulvissimus;
d. Hexaketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of PinPKS (SEQ ID No.12) from Plumbago indica, DluHKS (SEQ ID No.14) from Drosophyllum lusitanicum, and PzPKS (SEQ ID No.16) from Plumbago zeylanica;
e. Heptaketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to ALS (SEQ ID No.18) from Rheum palmatum or AaPKS3 (SEQ ID No.20) from Aloe arborescens;
f. Octaketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of OKS (SEQ ID No.22), OKS2 (SEQ ID No.24), OKS3 (SEQ ID No.26) from Aloe arborescens or HpPKS2 (SEQ ID No.28) from Hypericum perforatum;
g. Nonaketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to PCS F80A/Y82A/M207G (SEQ ID No.29) from Aloe arborescens;
h. Decaketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to OKS N222G (SEQ ID No.30) from Aloe arborescens; and
i. Dodecaketide synthase polypeptide, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to OKS F66L/N222G (SEQ ID No.31) from Aloe arborescens.

6. The method of any one of claims 1-5, wherein the cell comprises one or more transgenes encoding a second, third and fourth heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a non-linear polyketide compound, and wherein the second, third and fourth 'small molecule foldase' enzymes are bacterial or fungal enzymes, and wherein the genus from which said bacterial or fungal enzymes are derived is different from the genus from which the PKSIII enzyme is derived.

7. The method of any one of claims 1-6, wherein one or more of said first, second, third and fourth heterologous 'small molecule foldase' enzyme has cyclase or aromatase catalytic activity and a corresponding structural domain selected from the group consisting of:
a. a pfam04199 cyclase superfamily domain;
b. a pfam10604 or pfam03364 SRPBCC superfamily domain;
c. a pfam07876 Dabb superfamily domain;
d. a pfam04673 Polyketide synthesis cyclase superfamily domain;
e. a pfam00753 Lactamase_B/MBL fold metallo-hydrolase superfamily domain;
f. a pfam07883 Cupin-2 superfamily domain;
g. Dissected Product template (TIGR04532) domains from type I iterative PKS from filamentous fungi.

8. The method of any one of claims 1-7, wherein said first heterologous 'small molecule foldase' enzyme is selected from one or more of the groups consisting of:
a. SRPBCC foldase, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of ZhuI (SEQ ID No.33) from Streptomyces sp. R1128, pdmD (SEQ ID No.35) from Actinomadura hibisca, sanI (SEQ ID No.37) from Streptomyces sp. SANK 61196, pnxD (SEQ ID No.39) from Streptomyces sp. TA-0256, llpCI (SEQ ID No.41) from Streptomyces tendae, ZhuI-1 (SEQ ID No.66) from Aspergillus nidulans or ZhuI-2 (SEQ ID No.69) from Aspergillus nidulans;
b. 2xSRPBCC foldase, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of gra-orf4 (SEQ ID No.43) from Streptomyces violaceoruber, schP4 (SEQ ID No.45) from Streptomyces fulvissimus DSM 40593, Erd4 (SEQ ID No.47) from uncultured soil bacterium V167, med-ORF19 (SEQ ID No.49) from Streptomyces sp. AM-7161, ssfY1 (SEQ ID No.51) from Streptomyces sp. SF2575, oxyK (SEQ ID No.53) from Streptomyces rimosus, Act_ARO-CYC (SEQ ID No.55) from Streptomyces coelicolor A3(2);
c. Dabb foldase, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of AOC-1 (SEQ ID No.71) from Aspergillus nidulans, AOC-2 (SEQ ID No.73) from Aspergillus nidulans, AOC-3 (SEQ ID No.75) from Aspergillus nidulans, AOC-4 (SEQ ID No.77) from Aspergillus nidulans, or AOC-5 (SEQ ID No.79) from Aspergillus nidulans; and
d. Dissected PT domain, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of wA-PT (SEQ ID No.59) from Aspergillus nidulans to form C7-C12 + C1-C10, BIK1-PT (SEQ ID No.6) from Fusarium fujikuroi, PGL1_PT (SEQ ID No.63) from Fusarium graminearum, mpdG_PT (SEQ ID No.65) from Aspergillus nidulans or curs2-PT (GenBank AGC95321.1 position 1270 to 1613) from Aspergillus.

9. The method of any one of claims 6 - 8, wherein one or more of said second, third or fourth heterologous 'small molecule foldase' enzymes is selected from one or more of the groups consisting of:
a. Cyclase foldase, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of ZhuJ (SEQ ID No.81) from Streptomyces sp. R1128, oxyN (SEQ ID No.83) from Streptomyces rimosus, jadI (SEQ ID No.85) from Streptomyces venezuelae, LndF (SEQ ID No.86) from Streptomyces globisporus, pgaF (SEQ ID No.89) from Streptomyces coelicoflavus, pnxK (SEQ ID No.95) from Streptomyces sp., llpCIII (SEQ ID No.101) from Streptomyces tendae, Act_CYC (SEQ ID No.91) from Streptomyces coelicolor A3(2), sanE (SEQ ID No.93) from Streptomyces ansochromogenes;
b. Cupin foldase, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of pnxL (SEQ ID No.95) from Streptomyces sp. TA-0256, llpCII (SEQ ID No.99) from Streptomyces tendae;
c. Cyclase foldase, wherein the amino acid sequence of the polypeptide has at least 70% sequence identity to a sequence selected from the group consisting of ZhuJ-1 (SEQ ID No.103) from Aspergillus nidulans, ZhuJ-2 (SEQ ID No.105) from Aspergillus nidulans, ZhuJ-3 (SEQ ID No.107) from Aspergillus nidulans, ZhuJ-4 (SEQ ID No.109) from Aspergillus nidulans.

10. The method of any one of claims 1-9, wherein the recombinant cell, or the recombinant cells in the one or more heterogeneous populations, is selected from among a bacterial cell, a filamentous fungal cell, a yeast cell and a plant cell.

11. The method according to claim 10, wherein the yeast cell is an Ascomycete selected from the group consisting of Ashbya, Botryoascus, Debaryomyces, Hansenula, Kluyveromyces, Lipomyces and Saccharomyces spp., and the filamentous fungal cell is selected from the group consisting of Acremonium, Aspergillus, Fusarium, Humicola, Mucor, Myceliophthora, Neurospora, Penicillium, Thielavia, Tolypocladium, and Trichoderma.

12. The method according to claim 10, wherein the bacterial cell is selected from the group consisting of: Bacillus, Streptomyces, Corynebacterium, Pseudomonas, lactic acid bacteria and an E. coli cell.

13. The method according to claim 10, wherein the recombinant host cell is a Nicotiana benthamiana or Arabidopsis thaliana plant cell.

14. A heterogeneous population of recombinant cells capable of producing a library of polyketide-derived aromatic, polyaromatic, cyclic, and/or polycyclic compounds, according to the method of any one of claims 2 - 5, 7, 8, 10-13, wherein each cell in the population comprises:
a. a transgene encoding a heterologous type III PKS capable of forming a polyketide-derived aromatic, polyaromatic, cyclic, and/or polycyclic compound, wherein the carbon atom chain length of the polyketide backbone of the formed compound is selected from 6 - 31 carbon atoms; and
b. a transgene encoding a first heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more specific intramolecular carbon-carbon bonds in a polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compound, wherein the carbon atom chain length of the polyketide backbone of the compound is one or more of 6 - 31 carbon atoms, wherein the 'small molecule foldase' enzyme is a bacterial or fungal enzyme, and wherein the genus from which said bacterial or fungal enzyme is derived is different from the genus from which the PKSIII enzyme is derived,
wherein the population of recombinant cells comprises cells capable of producing polyketide-derived aromatic, polyaromatic, cyclic, and/or polycyclic compounds, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6 - 31 carbon atoms.

15. The heterogeneous population of recombinant cells of claim 14, wherein each cell in the population further comprises one or more transgenes encoding a second, third and fourth heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a non-linear polyketide compound, and wherein the second, third and fourth 'small molecule foldase' enzymes are bacterial or fungal enzymes, and wherein the genus from which said bacterial or fungal enzymes are derived is different from the genus from which the PKSIII enzyme is derived.

Description:

USE OF HETEROLOGOUS EXPRESSED POLYKETIDE SYNTHASE AND SMALL MOLECULE FOLDASES TO MAKE AROMATIC AND CYCLIC COMPOUNDS

FIELD OF THE INVENTION

The present invention relates to a method for producing individual compounds and libraries of tri- to pentadecaketide-derived aromatic and cyclic compounds of interest by heterologous expression of a polyketide synthase and one or more aromatases/cyclases in a recombinant host cell.

BACKGROUND OF THE INVENTION

Small molecules of biological origin often include aromatic or cyclic groups that impact their physicochemical and biological properties. Although nature is rich in aromatic compounds with different carbon skeletons, there is an urgent need for biosynthetic systems capable of producing both natural and new-to-nature aromatic compounds. Areas of specific interest are the formation of carbon skeletons that can be used medicinally (e.g. new antibiotics), as chemical substitutes, as food ingredients, or as precursors for the formation of more complex compounds. Among the top 100 drugs developed, 60% are small molecules (excluding proteins), and of these 82% possess aromatic motifs. Complex aromatic compounds are produced via many different biosynthetic pathways in nature, either as part of primary or secondary metabolism. One of the most versatile biosynthetic schemes for producing aromatic compounds is via the non-reducing polyketide pathways, wherein two-carbon units (-CH₂-CO-), referred to as ketides or 'ketide units', are polymerized into linear chains called polyketides, which subsequently can fold into aromatic structures. The formation of polyketides is dependent on an enzyme class known as polyketide synthases (PKSs). All PKSs share the ability to catalyze Claisen condensation based fusion of acyl groups by the formation of carbon-carbon bonds coupled with the release of carbon dioxide. This reaction is catalyzed by a beta-ketosynthase domain (KS). In addition to this domain/active site, synthesis can also depend on the action of, among others, Acyl-Carrier-Protein (ACP), Acyl-transferase (AT), Starter-Acyl-Transferase (SAT), Product Template (PT), ThioEsterase (TE), Chain Length Factor (CLF, also known as KSβ), CLaisen CYClase (CL-CYC), Ketoreductase (KR), Dehydratase (DH), Enoyl Reductase (ER) and C-METhyl transferase (Cmet).

The substrates for polyketide synthesis are typically classified into starter and extender units, where the starter unit (e.g. but not exclusively acetyl-CoA) is the first unit of the growing polyketide chain, and the extender units (e.g. but not exclusively malonyl-CoA) are all subsequently added units. If the substrates are the standard starter (acetyl-CoA) and extender (malonyl-CoA) units, then the number of carbon atoms in the resulting polyketide chain will equal two times the number of ketide units, i.e. two carbons from the starter unit plus two carbons for each iteration/'condensation reaction' performed by the PKS enzyme. Thus, a heptaketide synthase will perform six condensation reactions joining one starter unit (two carbons) with six extender units (six times two carbons), resulting in a polyketide consisting of seven ketide units, made up of a total of fourteen carbon atoms. However, PKSs may use alternative starter and extender units, which can alter the number of carbon atoms in the final product. For example, a heptaketide synthase could use p-coumaric acid (nine carbons) as a starter unit and six methyl-malonyl-CoA (six times three carbons) as extender units, resulting in a heptaketide with twenty-seven carbon atoms. Each individual PKS, e.g. a heptaketide synthase, displays a different affinity for different starter and extender units, and can hence produce very different compounds which will all be categorized as heptaketides. The substrate availability in the host cell can also affect which product a given PKS produces, as its preferred substrate may be available only in very limited amounts, or not at all, compared to less preferred substrates, which will then outcompete the preferred substrate.
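The carbon arithmetic described above can be sketched as follows. This is a minimal illustration, not part of the disclosed method; the function name and parameters are hypothetical, and `extender_carbons` is assumed to mean the number of carbons each extender unit contributes to the chain (two for malonyl-CoA, three for methyl-malonyl-CoA, as in the examples in the text).

```python
def polyketide_carbons(n_ketide_units, starter_carbons=2, extender_carbons=2):
    """Carbon count for one starter unit plus (n - 1) extender units.

    Hypothetical helper: the defaults model an acetyl-CoA starter and
    malonyl-CoA extenders, each contributing two carbons to the chain.
    """
    if n_ketide_units < 1:
        raise ValueError("a polyketide needs at least one ketide unit")
    condensations = n_ketide_units - 1  # one condensation per extender unit
    return starter_carbons + condensations * extender_carbons


# Heptaketide from standard units: 2 + 6 * 2 carbons
print(polyketide_carbons(7))
# Heptaketide from a nine-carbon starter and three-carbon extenders: 9 + 6 * 3
print(polyketide_carbons(7, starter_carbons=9, extender_carbons=3))
```

The two calls reproduce the fourteen-carbon and twenty-seven-carbon heptaketide examples given in the paragraph above.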

The chain length of the polyketide product is thus the result of the number of condensation reactions the PKS performs, which covalently join one starter unit with one or more extender units in a head-to-tail manner. A PKS that performs one iteration/condensation will produce a diketide, one that performs two iterations/condensations will produce a triketide, one that performs three iterations/condensations will produce a tetraketide, and so forth. The number of carbon atoms in the resulting polyketide will in addition depend on which starter and extender units the enzyme utilizes.
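The naming convention above (n condensations give a polyketide of n + 1 ketide units) can be written as a small lookup. This is an illustrative sketch only; the table and function names are hypothetical, and the range is limited to the di- to pentadecaketide names used in this document.

```python
# Numerical prefixes for 2 to 15 ketide units (hypothetical lookup table)
KETIDE_PREFIXES = {
    2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa", 7: "hepta",
    8: "octa", 9: "nona", 10: "deca", 11: "undeca", 12: "dodeca",
    13: "trideca", 14: "tetradeca", 15: "pentadeca",
}


def ketide_name(condensations):
    """Name the product of a PKS performing the given number of condensations."""
    units = condensations + 1  # one starter unit plus one unit per condensation
    if units not in KETIDE_PREFIXES:
        raise ValueError(f"no prefix defined for {units} ketide units")
    return KETIDE_PREFIXES[units] + "ketide"


for n in (1, 2, 3, 6):
    print(n, "condensation(s) ->", ketide_name(n))
```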

At the primary sequence level (amino acid sequence), secondary structure level (local fold), tertiary structure level (overall fold) and quaternary structure level (protein-protein interactions), the PKSs display a very large diversity and are hence subdivided into different types.

Type I PKS systems are typically found in filamentous fungi and bacteria, where they are responsible for the formation of aromatic, polyaromatic and reduced polyketides. Members of the type I PKS family possess several active sites on the same polypeptide chain, and the individual enzyme is able to catalyze the repeated condensation of two-carbon units. The minimal set of domains in a type I PKS includes KS, AT and ACP. The type I PKSs are further subdivided into modular PKSs and iterative PKSs. Iterative PKSs possess only a single copy of each active site type and re-use these repeatedly until the growing polyketide chain has reached its predetermined length. Type I iterative PKSs that form aromatic and polyaromatic compounds typically rely on endogenous PT and CL-CYC domains to direct folding of the formed non-reduced polyketide chain. Dissected PT domains have been shown to work in trans with heterologous KS-AT-ACP fragments from type I iterative PKSs to form folded polyketide products. The PT domains typically promote the formation of several intramolecular bonds. Modular PKSs contain several copies of the same active sites, organized into repeated sequences of active sites called modules; each module is responsible for adding and modifying a single ketide unit. Each active site in the individual modules is used only once during synthesis of a single polyketide. Type I iterative PKSs are typically found in fungi, while type I modular PKSs are typically found in bacteria. Type I modular PKSs that form macrolide (macrocyclic) compounds include a terminal CL-CYC domain.

Type II PKS systems are responsible for formation of aromatic and polyaromatic compounds in bacteria. Type II PKSs are protein complexes where individual enzymes interact transiently to form the functional PKS enzyme. The involved enzymes include activities for KS, CLF and ACP. Type II PKSs form linear non-reduced polyketides that spontaneously fold into aromatic/cyclic compounds via the formation of intramolecular carbon-carbon and carbon-oxygen bonds. Type I modular (Im), type I iterative (Ii) and type II (II) PKSs are all dependent on one or more ACP domains, which are responsible for tethering the growing polyketide (acyl) chain to the enzyme during synthesis. In the ACP-dependent PKS types, the acyl group is transferred from the incoming Co-enzyme A (CoA) to the ACP domain and is subsequently condensed with another acyl group bound to the KS domain of the enzyme, resulting in a diketide bound to the ACP domain. The formed diketide is subsequently moved back to the KS domain, and another ACP-bound extender unit is loaded into the enzyme.

Type III PKSs generally consist only of a KS domain, referred to as a KASIII or Chalcone synthase domain, and they lack an ACP domain. Type III PKSs are self-contained enzymes that form homodimers. The single active site in each monomer catalyzes the priming and extension reactions iteratively to form polyketide products. Type III PKSs from bacteria, plants and fungi have been described. Type III PKSs (also known as Chalcone synthases) have long been known in plants, where they are responsible for formation of compounds such as flavonoids (pigments/anti-oxidants) and stilbenes, which are found in many different plant species. Formation of flavonoids and stilbenes depends on one p-coumaroyl-CoA starter unit and three malonyl-CoA extender units. The products of type III PKSs often spontaneously fold into complex aromatic/cyclic compounds, e.g. flavonoids in plants. Type III PKSs that use acetyl/malonyl-CoA as starter unit and malonyl-CoA as extender units, resulting in linear non-reduced polyketides, have also been described in plants. Type III enzymes do not have an 'acyl carrier protein' (ACP) functionality; instead they rely on Co-enzyme A linking for associating the growing polyketide chain with the enzyme during the multiple catalytic cycles. In type III PKSs, the incoming acyl group remains bound to the Co-enzyme A unit, and the condensation between the two acyl groups results in a diketide bound to the incoming Co-enzyme A. The formed diketide is subsequently moved back to the KS domain, and another Co-enzyme A-bound extender unit is loaded into the enzyme.

The above described unique functional and corresponding structural properties of the Type I, Type II or Type III PKS allow members of these three enzyme groups to be distinguished.
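As a rough illustration of how these distinguishing properties could be encoded, the sketch below assigns a PKS type from a set of domain labels and a flag for whether the activities sit on one polypeptide. It is a deliberately simplified toy under the assumptions stated in the comments; the function name and inputs are hypothetical, and real classification relies on sequence and structural analysis rather than a rule this coarse.

```python
def classify_pks(domains, single_polypeptide):
    """Toy classifier based only on the features described in the text.

    Assumptions (simplified from the description above):
    - type III: a KS domain and no ACP domain
    - type I:   KS, AT and ACP present on the same polypeptide chain
    - type II:  KS, CLF and ACP provided by separate, interacting enzymes
    """
    domains = set(domains)
    if "KS" not in domains:
        return "unknown"
    if "ACP" not in domains:
        return "type III"
    if single_polypeptide and {"AT", "ACP"} <= domains:
        return "type I"
    if not single_polypeptide and {"CLF", "ACP"} <= domains:
        return "type II"
    return "unknown"


print(classify_pks({"KS"}, single_polypeptide=True))
print(classify_pks({"KS", "AT", "ACP", "PT"}, single_polypeptide=True))
print(classify_pks({"KS", "CLF", "ACP"}, single_polypeptide=False))
```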

The subsequent folding and release of the polyketide chain produced by the different classes of PKS enzymes is either spontaneous, or may be catalyzed by several different enzyme families typically referred to as aromatases and/or cyclases, or by domain(s) within the PKS, such as PT and/or CL-CYC domains. Herein these are collectively referred to as 'small molecule foldases'. This group of enzymes is characterized by catalyzing the regiospecific formation of intramolecular carbon-carbon or carbon-oxygen bonds within a polyketide, resulting in the formation of aromatic or cyclic motifs. 'Small molecule foldases' acting on polyketides are found in bacteria, fungi and plants. Several examples exist where folding of the polyketide is a spontaneous process, e.g. flavonoids in plants. Though 'small molecule foldases' perform similar functions in polyketide biosynthetic pathways, they are very different at the primary sequence level, and can hence be categorized based on which structural and primary sequence motifs they contain. The group of 'small molecule foldases' that act on polyketides includes enzymes from the 'Cyclase', 'SRPBCC Cyclases/aromatase', 'DABB Cyclase/aromatase', 'Polyketide synthesis cyclase', 'Lactamase_B/MBL fold metallo-hydrolase', ketoreductase from Act cluster and 'Cupin_2' Superfamilies and, in addition, includes dissected PT and CL-CYC domains from type I iterative PKS from filamentous fungi.

Importantly, the Type I, Type II and Type III PKSs are further distinguished by the timing and mechanism by which the formed polyketide chain is folded into complex structures with cyclic and aromatic motifs. In Type I modular PKSs containing a CL-CYC domain, the polyketide chain remains attached to the enzyme's ACP domain, and the CL-CYC domain is responsible both for folding of the chain into a macrolide and for its simultaneous release from the ACP domain and thereby also from the enzyme. Type I iterative PKSs contain a PT domain and/or CL-CYC domain that catalyze the cyclization reactions and formation of aromatic groups in the polyketide chain. The PT domain acts on the polyketide that is bound to the enzyme's ACP domain, where the ACP domain influences the docking and positioning of the polyketide substrate in the active site of the PT domain and thereby the chain's folding pattern. The CL-CYC domains form cyclic structures and simultaneously release the ACP-bound product from the enzyme. In the case of type II PKSs, polyketide folding is a post-PKS, enzyme-guided and catalyzed process. In this case, the KS/CLF/ACP enzyme complex forms a polyketide chain of a predetermined length, which remains bound to the ACP enzyme while it is folded by aromatase(s) and cyclase(s).

In the case of type III PKSs, the formed linear polyketide chain is released, likely following hydrolysis of the linkage to Co-enzyme A, whereafter the chain undergoes spontaneous folding into a range of sterically stable folds.

SUMMARY OF THE INVENTION

The problem solved by the present invention relates to the provision of a suitable biosynthetic pathway that forms aromatic and cyclic compounds (e.g. C₅-C₃₁ polyaromatic compounds) and/or libraries of aromatic compounds of interest in vivo.

The present invention is based on experimental results disclosed herein, which demonstrate that in vivo heterologous co-expression of a Type III polyketide synthase (PKSIII) from plants/bacteria/fungi and one or more 'small molecule foldases' from fungi/bacteria, wherein the aromatase/cyclase is from a different genus than the PKSIII, in a recombinant host cell (e.g. a yeast cell or bacterial cell), provides a suitable biosynthetic pathway for the production of aromatic compounds. The in vivo heterologously-expressed PKSIII produces a non-reduced polyketide which is converted in vivo into cyclic and/or aromatic compounds of interest by the action of the one or more heterologously-expressed 'small molecule foldases'.

Recombinant host cells expressing the PKSIII and one or more 'small molecule foldases' collectively form a programmable system for the formation of aromatic compounds of any desirable length and fold. The natural systems do not offer such flexibility and predictability, and the present invention therefore represents a major technological advance compared to existing technologies available for the creation of biosynthetic pathways that are not found in nature. The recombinant host cells may be used in a method to produce specific aromatic and cyclic compounds (e.g. C₅-C₃₁ polyaromatic compounds) and/or libraries of aromatic compounds of interest in vivo.

Accordingly, a first aspect of the present invention relates to a method of producing a library of polyketide-derived aromatic and/or polyaromatic; cyclic and/or polycyclic compounds; or any combination thereof, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms, comprising the steps of:

a. providing one or more heterogeneous populations of recombinant cells, wherein each cell in the one or more populations comprises:

i. a transgene encoding a heterologous type III polyketide synthase capable of forming a linear non-reduced polyketide compound, wherein the carbon atom chain length of the polyketide backbone of the formed compound is selected from 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms; and

ii. a transgene encoding a first heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a linear non-reduced polyketide compound, wherein the carbon atom chain length of the polyketide backbone of the compound is one or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms, and

iii. optionally one or more transgene(s) encoding a second, third and fourth heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a non-linear polyketide,

wherein each of the first, second, third and fourth heterologous 'small molecule foldase' enzyme is a bacterial or fungal enzyme, and wherein the genus from which said bacterial or fungal enzyme is derived is different from the genus from which the PKSIII enzyme is derived, wherein the one or more populations of recombinant cells comprises cells capable of producing polyketide-derived aromatic and/or polyaromatic; cyclic and/or polycyclic compounds; or any combination thereof, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms; and

b. incubating and/or culturing the one or more heterogeneous populations of recombinant cells in a culture medium to support synthesis of the library of compounds.

A second aspect of the present invention relates to a heterogeneous population of recombinant cells capable of producing a library of polyketide-derived aromatic and/or polyaromatic; cyclic and/or polycyclic compounds; or any combination thereof, according to the method of the invention, wherein each cell in the population comprises:

a. a transgene encoding a heterologous type III PKS capable of forming a polyketide-derived aromatic, polyaromatic, cyclic or polycyclic compound, wherein the carbon atom chain length of the polyketide backbone of the formed compound is selected from among 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms; and

b. a transgene encoding a heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more specific intramolecular carbon-carbon bonds in a polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compound, wherein the carbon atom chain length of the polyketide backbone of the compound is one or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms, and

c. optionally one or more transgene(s) encoding a second, third and fourth heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a non-linear polyketide,

wherein each of the first, second, third and fourth heterologous 'small molecule foldase' enzyme is a bacterial or fungal enzyme, and wherein the genus from which said bacterial or fungal enzyme is derived is different from the genus from which the PKSIII enzyme is derived, wherein the population of recombinant cells comprises cells capable of producing polyketide-derived aromatic and/or polyaromatic; cyclic and/or polycyclic compounds; or any combination thereof, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms.

It is envisaged that individual heterologous host cells capable of producing an aromatic compound of interest may be identified as a result of the screening of the library of aromatic compounds produced by the one or more populations of heterologous host cells of the invention. This, or any individual heterologous host cell (or its clonal derivatives) of the invention may be used for the production of an aromatic compound.
Accordingly, a third aspect of the present invention relates to a method of producing a polyketide-derived aromatic, polyaromatic, cyclic or polycyclic compound, wherein the carbon atom chain length of the polyketide backbone of the compound is selected from 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms, comprising the steps of:

a. providing a recombinant cell comprising:

i. a transgene encoding a heterologous type III polyketide synthase capable of forming a linear non-reduced polyketide compound wherein the carbon atom chain length of the polyketide backbone of the formed compound is selected from 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms; and

ii. a transgene encoding a first heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a linear non-reduced polyketide compound, wherein the carbon atom chain length of the polyketide backbone of the compound is one or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms, and

iii. optionally one or more transgene(s) encoding a second, third and fourth heterologous 'small molecule foldase' enzyme capable of catalyzing the formation of one or more region-specific intramolecular carbon-carbon or carbon-oxygen bonds in a non-linear polyketide compound,

wherein each of the first, second, third and fourth heterologous 'small molecule foldase' enzyme is a bacterial or fungal enzyme, and wherein the genus from which said bacterial or fungal enzyme is derived is different from the genus from which the PKSIII enzyme is derived, wherein the recombinant cell is capable of producing a polyketide-derived aromatic, polyaromatic, cyclic or polycyclic compound, wherein the carbon atom chain length of the polyketide backbone of the compound is selected from 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms; and

b. incubating and/or culturing the recombinant cell in a culture medium to support synthesis of the polyketide-derived aromatic, polyaromatic, cyclic or polycyclic compound.

DEFINITIONS

All definitions of herein relevant terms are in accordance with what would be understood by the skilled person in relation to the herein relevant technical context.

The term "extender units" relates to the substrates that the PKS III adds to the starter unit and the growing polyketide chain. The extender units are delivered as acyl groups bound to Co-enzyme A, such as, but not exclusively, malonyl-CoA, methylmalonyl-CoA, hydroxymalonyl-CoA or ethylmalonyl-CoA.

The term "heterologous host" is here defined as the situation where a gene is expressed in a recombinant host cell that is taxonomically classified as belonging to a different genus than the organism from which the gene of interest was obtained.

The term "heterologous" with respect to an enzyme encoded by a transgene that is expressed in a recombinant cell of the invention, means that the enzyme is expressed in a cell that does not normally express that enzyme, since the gene encoding the enzyme is derived from (and naturally found in) a cell of a different genetic origin (e.g. species) than the cell in which it is expressed.

The term "the genus" describes the taxonomic classification of the organism from which a bacterial or fungal 'small molecule foldase' enzyme is derived, which is different from the genus from which the PKSIII enzyme is derived, which means that the 'small molecule foldase' enzyme and the PKSIII enzyme are derived from organisms that are classified to different genera.

The term "hybridizes" in relation to a polynucleotide which hybridizes under at least medium stringency conditions with (i) a nucleic acid molecule or (ii) a complementary strand of (i), relates to the nucleotide sequence hybridizing to a labeled nucleic acid probe corresponding to a nucleotide sequence disclosed herein, or its complementary strand, under medium to very high stringency conditions. Molecules to which the nucleic acid probe hybridizes under these conditions can be detected using e.g. X-ray film. Herein relevant hybridization stringency conditions are defined in J. Sambrook, E.F. Fritsch, and T. Maniatis, 1989, Molecular Cloning, A Laboratory Manual, 2d edition, Cold Spring Harbor, New York. According to the art, for long probes of at least 100 nucleotides in length, very low to very high stringency conditions are defined as prehybridization and hybridization at 42°C in 5X SSPE, 0.3% SDS, 200 μg/ml sheared and denatured salmon sperm DNA, and either 25% formamide for very low and low stringencies, 35% formamide for medium and medium-high stringencies, or 50% formamide for high and very high stringencies, following standard Southern blotting procedures for 12 to 24 hours optimally. For long probes of at least 100 nucleotides in length, the carrier material is finally washed three times each for 15 minutes using 2X SSC, 0.2% SDS preferably at least at 45°C (very low stringency), more preferably at least at 50°C (low stringency), more preferably at least at 55°C (medium stringency), more preferably at least at 60°C (medium-high stringency), even more preferably at least at 65°C (high stringency), and most preferably at least at 70°C (very high stringency).
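For orientation, the stringency levels in the definition above can be tabulated and looked up programmatically. The following is a minimal sketch only: the formamide percentages and wash temperatures are those stated above, while the dictionary layout, the function name `highest_stringency_met`, and the reading of both values as lower bounds are illustrative assumptions and not part of the disclosure.

```python
# Formamide percentage (hybridization) and minimum wash temperature (2X SSC, 0.2% SDS)
# for each stringency level, as listed in the definition of "hybridizes".
STRINGENCY = {
    "very low": {"formamide_pct": 25, "min_wash_temp_c": 45},
    "low": {"formamide_pct": 25, "min_wash_temp_c": 50},
    "medium": {"formamide_pct": 35, "min_wash_temp_c": 55},
    "medium-high": {"formamide_pct": 35, "min_wash_temp_c": 60},
    "high": {"formamide_pct": 50, "min_wash_temp_c": 65},
    "very high": {"formamide_pct": 50, "min_wash_temp_c": 70},
}


def highest_stringency_met(formamide_pct, wash_temp_c):
    """Return the most stringent level whose listed conditions are both met.

    Assumption (hypothetical reading): both values act as lower bounds.
    Returns None if not even the "very low" conditions are met.
    """
    met = None
    # Dict insertion order runs from least to most stringent.
    for level, cond in STRINGENCY.items():
        if (formamide_pct >= cond["formamide_pct"]
                and wash_temp_c >= cond["min_wash_temp_c"]):
            met = level
    return met
```

For instance, hybridization in 35% formamide followed by washes at 55°C falls under "medium" stringency in this sketch.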

The term "in vitro" (Latin: in glass) relates to studies that are conducted using components of an organism that have been isolated from their usual biological surroundings in order to permit a more detailed or more convenient analysis than can be done with whole organisms. Colloquially, these experiments are commonly called "test tube experiments". In contrast, in vivo studies are those that are conducted with living organisms in their normal intact state.

The term "in vivo" (Latin for "within the living") relates to experimentation using living cells or a whole living organism as opposed to a partial or dead cell or organism, or an in vitro ("within the glass", e.g., in a test tube or petri dish) controlled environment.

The term "ketide" refers to a single acyl unit added during a single condensation reaction step catalyzed by a PKS. If malonyl-CoA or methylmalonyl-CoA is used as the extender unit, then the ketide unit will be -CH₂-CO- or -C(CH₃)H-CO-, respectively.

The term "non-reduced polyketide" denotes a polyketide characterized by the presence of the original ketone groups in the ketides (e.g. -CH₂-CO- if malonyl-CoA has been used as the extender unit), originating from the starter or extender units, either as ketones or in the form of carbonyls in phenolic groups (-CH₂-CO- or its tautomeric form -CH=COH-). In the case of reduced polyketides, a single or all ketones have been reduced to alcohol (-CH₂-CHOH-) groups by e.g. a KR domain/enzyme, or further to an alkene group (-C=C-) by e.g. a DH domain/enzyme, or even further to an alkane group (-CH₂-CH₂-) by e.g. an ER domain/enzyme. Based on these chemical features of the formed products, the involved PKSs are categorized as either non-reducing PKSs or reducing PKSs.

The term "non-reducing PKS" or "non-reducing polyketide synthase" denotes a PKS which does not reduce the ketone groups in the formed polyketide chain. The lack of reductions can for instance be due to (I) a lack of the necessary keto-reductase (KR) active sites in the enzyme; and/or (II) lack of tailoring enzymes capable of catalyzing the keto-reduction reaction.

The term "nucleic acid construct" as used herein refers to a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or which is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature. The term nucleic acid construct is synonymous with the term "expression cassette" when the nucleic acid construct contains the control sequences required for expression of a coding sequence of the present invention. As known in the art, control sequences include all components that are necessary or advantageous for the expression of a polynucleotide encoding a polypeptide of the present invention. Each control sequence may be native or foreign to the nucleotide sequence encoding the polypeptide. Such control sequences include, but are not limited to, a leader, polyadenylation sequence, pro-peptide encoding sequence, promoter, signal peptide encoding sequence, and transcription terminator. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the nucleotide sequence encoding a polypeptide.

The numbering of the individual carbon atoms found in the polyketide backbone is counted from the carboxylic acid (-COOH) end of the molecule. A single or double carbon-carbon bond that links e.g. carbon atoms 5 and 12, counted from the carboxylic acid end of the polyketide, is represented as C5-C12.

The term "pentadeca" (Greek for "fifteen") denotes a polyketide chain consisting of fifteen ketide units, meaning that the polyketide backbone consists of 30 carbon atoms.
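The arithmetic behind this definition can be sketched as follows, assuming two backbone carbon atoms per ketide unit, as for the -CH2-CO- unit derived from malonyl-CoA. The prefix table and the function name `backbone_carbons` are illustrative only, and the sketch does not model starter units that change the backbone length.

```python
# Standard Greek numerical prefixes mapped to the number of ketide units.
KETIDE_PREFIXES = {
    "tri": 3, "tetra": 4, "penta": 5, "hexa": 6, "hepta": 7, "octa": 8,
    "nona": 9, "deca": 10, "undeca": 11, "dodeca": 12, "trideca": 13,
    "tetradeca": 14, "pentadeca": 15,
}


def backbone_carbons(prefix, carbons_per_ketide=2):
    """Backbone carbon count for a '<prefix>ketide'.

    Assumption: every ketide unit contributes the same number of backbone
    carbons (two for a -CH2-CO- unit).
    """
    return KETIDE_PREFIXES[prefix] * carbons_per_ketide


print(backbone_carbons("pentadeca"))  # 15 ketide units x 2 carbons
```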

The term "pfam####" refers to a specific motif in the Wellcome Trust Sanger Institute Protein-family (pfam) online database (http://pfam.xfam.org/) described in Finn et al. 2014 (R.D. Finn, A. Bateman, J. Clements, P. Coggill, R.Y. Eberhardt, S.R. Eddy, A. Heger, K. Hetherington, L. Holm, J. Mistry, E.L.L. Sonnhammer, J. Tate, M. Punta. (2014) The Pfam protein families database. Nucleic Acids Research (2014), Database Issue 42: D222-D230), that allows for the identification of conserved functional sequence motifs based on Hidden Markov Models and multiple sequence alignments.

The term "starter unit" relates to the first substrate that a PKS selects for incorporation into the growing polyketide chain, and hence the first ketide unit found in the polyketide chain originates from the starter unit. The starter unit is delivered as an acyl group bound to Coenzyme A, such as, but not exclusively, acetyl-CoA, malonyl-CoA, methylmalonyl-CoA, p-coumaroyl-CoA, phenylacetyl-CoA or benzoyl-CoA. Type III PKSs normally use malonyl-CoA as extender units, but can use the other starter units.

The term "recombinant expression vector" relates to recombinant expression vectors comprising a polynucleotide of the present invention, a promoter, and transcriptional and translational stop signals. The various nucleic acids and control sequences described above may be joined together to produce a recombinant expression vector which may include one or more convenient restriction sites to allow for insertion or substitution of the nucleotide sequence encoding the polypeptide at such sites.

The term "recombinant host cell" is a cell comprising a recombinant polynucleotide (e.g. DNA) molecule and a recombinant host cell will therefore not be understood as covering a natural wildtype cell as such. Recombinant polynucleotide (e.g. DNA) molecules are polynucleotide (e.g. DNA) molecules formed by laboratory methods of genetic recombination (such as molecular cloning) to bring together genetic material from multiple sources, creating sequences that would not otherwise be found in biological organisms.

The term 'small molecule foldases' relates to enzymes that are capable of catalyzing the formation of intra-molecular carbon-carbon or carbon-oxygen bonds within a molecule, resulting in the formation of aromatic or cyclic motifs within the molecule. These include members of the following enzyme families: pfam04199 (Cyclase superfamily domain), pfam10604 and pfam03364 (SRPBCC Cyclases/aromatases), pfam07876 (DABB Cyclases/aromatases), pfam04673 (Polyketide synthesis cyclase), pfam00753 (Lactamase_B/MBL fold metallo-hydrolase), ketoreductase from Act cluster, pfam07883 (Cupin_2) and, in addition, dissected PT domains from type I iterative PKS from filamentous fungi.

The term "Sequence Identity" relates to the relatedness between two amino acid sequences or between two nucleotide sequences. For purposes of the present invention, the degree of sequence identity between two amino acid sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 3.0.0 or later. The optional parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. The output of Needle labeled "longest identity" (obtained using the -nobrief option) is used as the percent identity and is calculated as follows:

(Identical Residues x 100)/(Length of Alignment - Total Number of Gaps in Alignment).

For purposes of the present invention, the degree of sequence identity between two nucleotide sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, supra) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, supra), preferably version 3.0.0 or later. The optional parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix. The output of Needle labeled "longest identity" (obtained using the -nobrief option) is used as the percent identity and is calculated as follows:

(Identical Deoxyribonucleotides x 100)/(Length of Alignment - Total Number of Gaps in Alignment).

As understood by the skilled person in the present context, for both "sequence identity between two nucleotide sequences" and "sequence identity between two amino acid sequences" - the term "Length of Alignment" should be understood as the actual length of alignment between the two sequences to be compared for sequence identity.

For instance, if a reference sequence is a specific SEQ ID of e.g. 100 amino acids and the other sequence is an identical sequence with 25 amino acids less at one end (i.e. the other sequence is of a length of 75 amino acids), then the "Length of Alignment" will be 75 amino acids and the percent identity will be 100%.

As another example, if a reference sequence is a specific SEQ ID of e.g. 100 amino acids and the other sequence is an identical sequence with 25 amino acids extra at one end (i.e. the other sequence is of a length of 125 amino acids), then the "Length of Alignment" will be 100 amino acids and the percent identity will be 100%.
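The percent identity formula and the two examples above can be checked with a short calculation. The sketch below is not the Needle program; it only applies the stated formula to two sequences that have already been aligned, with '-' marking gap positions. The function name `percent_identity` and the counting of every gap-containing column as one gap are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """(Identical residues x 100) / (alignment length - total number of gaps).

    Both inputs must be rows of the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))
    # A column counts as a gap if either sequence has a gap character there.
    gaps = sum(1 for x, y in columns if x == gap or y == gap)
    identical = sum(1 for x, y in columns if x == y and x != gap)
    compared = len(columns) - gaps
    if compared == 0:
        raise ValueError("alignment contains no ungapped columns")
    return identical * 100 / compared


# Reference of 100 residues against an identical sequence lacking 25 residues
# at one end: 75 identical residues over 100 - 25 = 75 compared positions.
reference = "ACDEFGHIKL" * 10
truncated = reference[:75] + "-" * 25
print(percent_identity(reference, truncated))
```

Under these assumptions the truncated-sequence example evaluates to 100%, in line with the first example above.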

The term "TIGRXXX" denotes a sequence motif in The Institute for Genomic Research's protein family database (http://www.jcvi.org/cgi-bin/tigrfams/Terms.cgi) that allows for the identification of conserved functional sequence motifs based on Hidden Markov Models and multiple sequence alignments.

The term "a dissected product template domain from type I iterative PKS" denotes an artificially constructed enzyme that only contains the Product Template (PT) portion of a type I non-reducing iterative PKS from fungi. The PT domain can be identified via the National Center for Biotechnology Information (NCBI) Conserved Domain Database (CDD) and the associated search tool (CD-Search), which is available via http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. In the CDD the PT domain has accession number "TIGR04532: PT_fungal_PKS". The artificial enzyme is designed by fusing the coding sequence of the PT domain with a 5' start codon (ATG) and a 3' stop codon (TGA, TAA or TAG).
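The design step described in this definition, fusing a PT domain coding sequence to a 5' start codon and a 3' stop codon, can be sketched as a simple string operation. The function name `build_dissected_pt_orf`, the in-frame checks and the six-nucleotide toy input are hypothetical and serve only as an illustration; real PT domain sequences are not reproduced here.

```python
START_CODON = "ATG"
STOP_CODONS = ("TGA", "TAA", "TAG")


def build_dissected_pt_orf(pt_domain_cds, stop_codon="TGA"):
    """Return start codon + PT domain coding sequence + stop codon.

    Assumptions (not stated in the definition): the PT domain sequence is
    supplied in frame, as DNA, and must not contain an in-frame stop codon.
    """
    seq = pt_domain_cds.upper()
    if stop_codon not in STOP_CODONS:
        raise ValueError("stop codon must be TGA, TAA or TAG")
    if len(seq) % 3 != 0:
        raise ValueError("PT domain sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("PT domain sequence contains an in-frame stop codon")
    return START_CODON + seq + stop_codon


# Toy two-codon input used purely for illustration.
print(build_dissected_pt_orf("gctgaa"))
```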

The term "triketide" (from the Greek prefix "tri", "three") denotes a polyketide chain consisting of three ketide units, meaning that the polyketide backbone consists of 6 carbon atoms. The term "ketide" refers to a -CH₂-CO- unit.

The term "Type III polyketide synthase (PKS)" denotes a self-contained enzyme that forms homodimers. The single active site in each monomer catalyzes the priming and extension to form polyketide products.

DRAWINGS

Figure 1: Example of a library of aromatic compounds made according to the invention (see working example herein). A population of PKSs (PKS1 to PKS_n) that produce different chain lengths is combined in individual cells with 'small molecule foldases' (Cyc1 to Cyc_n) that catalyze different folding patterns to form unique products.

Figure 2: The figure shows Extracted Ion Chromatograms (EIC) of the novel compounds synthesized by 5 different GMO strains of S. cerevisiae comprising transgenes expressing the following type III PKSs: the triketide synthase 2-PS from the plant Gerbera hybrida (GH2PS); the pentaketide synthase PCS from the plant Aloe arborescens (AaPCS); the hexaketide synthase HKS from the plant Drosophyllum lusitanicum (DluHKS); the heptaketide synthase PKS3 from the plant Aloe arborescens (AaPKS3); and the octaketide synthase OKS from Aloe arborescens (AaOKS), as compared to a parent control strain lacking these transgenes. Compounds that correspond to the molecular mass of the various detected compounds are: (A) Triacetic acid at EIC 127.0390 +/- 0.005 (including a triacetic acid lactone standard, TAL); (B) 5,7-dihydroxy-2-methylchromone (pentaketide pyrone) at EIC 193.0495 +/- 0.005; (C) 6-(2',4'-dihydroxy-6'-methylphenyl)-4-hydroxy-2-pyrone (hexaketide pyrone) at EIC 235.0601 +/- 0.005. Retention times, masses and compound names can be found in Table 1.

Figure 3: The figure shows Extracted Ion Chromatograms (EIC) of the novel compounds synthesized by 2 different GMO strains of S. cerevisiae comprising transgenes expressing the following type III PKSs: the octaketide synthase OKS from Aloe arborescens (AaOKS) or the heptaketide synthase PKS3 from the plant Aloe arborescens (AaPKS3), as compared to a parent control strain lacking these transgenes. Compounds that correspond to the molecular mass of the various detected compounds are: (A) Heptaketide pyrone (TW93a) at EIC 277.0707 +/- 0.005; (B) Aloesone at EIC 233.0808 +/- 0.005; (C) The compounds SEK4 / SEK4b at EIC 319.0709 and dehydrated SEK4 / SEK4b with an EIC of 319.0812. Retention times, masses and compound names can be found in Table 1.

Figure 4: Table showing an example of a library of aromatic compounds synthesized in vivo according to the invention (see working example herein). The introduction of a second 'small molecule foldase' (Cyclase a to Cyclase_n) into a system that already contains a PKS and a cyclase generates novel compounds.

Figure 5: The figure shows Extracted Ion Chromatograms (EIC) of a compound synthesized by 2 different GMO strains of S. cerevisiae comprising transgenes co-expressing: hexaketide synthase HKS, a type III PKS from the plant Drosophyllum lusitanicum (DluHKS), together with a dissected product template domain (small molecule foldase), either: BIK1-PT from Fusarium graminearum or mdpG-PT from Aspergillus nidulans, as compared to a control GMO strain expressing only the type III PKS, hexaketide synthase HKS from the plant Drosophyllum lusitanicum (DluHKS). The detected compound #1 corresponds to a molecular mass of 225.1120 m/z and elutes at 4.89 minutes. A) EIC at 225.1120 m/z for the 'DluHKS+BIK1-PT', 'DluHKS+mdpG-PT' and control 'DluHKS' strains. B) UV spectrum for compound #1 eluting at 4.89 minutes in the 'DluHKS+mdpG-PT' strain.

Figure 6: The figure shows Extracted Ion Chromatograms (EIC) for a compound synthesized by a GMO strain of S. cerevisiae comprising transgenes co-expressing: hexaketide synthase HKS, a type III PKS from the plant Drosophyllum lusitanicum (DluHKS), together with the cyclase (small molecule foldase) gra-orf4 from Streptomyces violaceoruber, as compared to a control GMO strain expressing only the type III PKS, hexaketide synthase HKS from the plant Drosophyllum lusitanicum (DluHKS). Compound #2, corresponding to a molecular mass of 191.0707 m/z and eluting at 3.95 minutes, is just detectable in the control strain and is produced in larger amounts in the strain co-expressing the type III PKS and the cyclase. A) EIC at 191.0707 m/z for the 'DluHKS+gra-orf4' and 'DluHKS' strains. B) UV spectra for compound #2 eluting at 4.89 minutes in the 'DluHKS+gra-orf4' strain.

Figure 7: The figure shows Extracted Ion Chromatograms (EIC) for two compounds synthesized by 2 different GMO strains of S. cerevisiae comprising transgenes co-expressing: hexaketide synthase HKS, a type III PKS from the plant Drosophyllum lusitanicum (DluHKS), together with a dissected product template domain (small molecule foldase), either: BIK1-PT from Fusarium graminearum or mdpG-PT from Aspergillus nidulans, as compared to a control GMO strain expressing only the type III PKS, hexaketide synthase HKS from the plant Drosophyllum lusitanicum (DluHKS). Detected compound #3, corresponding to a molecular mass (235.0608 m/z), elutes at 2.86 minutes and detected compound #4 (235.0608 m/z) elutes at 3.09 minutes. A) EIC at 235.0606 m/z for the 'DluHKS+BIK1-PT', 'DluHKS+mdpG-PT' and control 'DluHKS' strains. B) UV spectra for compound #3 eluting at 2.86 minutes in the 'DluHKS+BIK1-PT' strain. C) UV/VIS spectrum for compound #4 eluting at 3.8 minutes in the 'DluHKS+BIK1-PT' strain.

Figure 8: The figure shows Extracted Ion Chromatograms (EIC) for a compound synthesized by a GMO strain of S. cerevisiae comprising transgenes co-expressing: hexaketide synthase HKS, a type III PKS from the plant Drosophyllum lusitanicum (DluHKS), together with a dissected product template domain (small molecule foldase) mdpG-PT from Aspergillus nidulans, as compared to a control GMO strain expressing only the type III PKS, hexaketide synthase HKS from the plant Drosophyllum lusitanicum (DluHKS). The detected compound #5, corresponding to a molecular mass (237.0757 m/z) and eluting at 2.59 minutes, is just detectable in the control strain and is produced in larger amounts in the strain co-expressing the type III PKS and the mdpG-PT small molecule foldase. A) EIC at 237.0757 m/z for the 'DluHKS+mdpG-PT' and 'DluHKS' strains. B) UV spectra for compound #5 eluting at 2.59 minutes in the 'DluHKS+mdpG-PT' strain.

Figure 9: The figure shows Extracted Ion Chromatograms (EIC) for a compound synthesized by a GMO strain of S. cerevisiae comprising transgenes co-expressing: hexaketide synthase HKS, a type III PKS from the plant Drosophyllum lusitanicum (DluHKS), together with a cyclase ZhuI (small molecule foldase) from the bacterium Streptomyces sp. R1128, as compared to a control GMO strain expressing only the type III PKS, hexaketide synthase HKS from the plant Drosophyllum lusitanicum (DluHKS). The detected compound #6, corresponding to a molecular mass of 121.0649 m/z, elutes at 3.57 minutes. A) EIC at 121.0649 m/z for the 'DluHKS+ZhuI' and control 'DluHKS' strains. B) UV spectra for compound #6 eluting at 3.57 minutes in the 'DluHKS+ZhuI' strain.

Figure 10: The figure shows Extracted Ion Chromatograms (EIC) for compounds synthesized by GMO Nicotiana benthamiana lines co-expressing a type III polyketide synthase from Aloe arborescens (OKS), together with the cyclases/ketoreductase CYC, CYC_DH from the actinorhodin biosynthetic gene cluster in Streptomyces coelicolor A3 and a ketoreductase (KR) (cyclase superfamily), as compared to a control N. benthamiana expressing only the type III PKS (OKS). A) GMO plants expressing only type III PKS (OKS); B) GMO plants expressing type III PKS (OKS) and KR cyclase; C) GMO plants expressing type III PKS (OKS) and the cyclases/ketoreductase CYC, CYC_DH; D) GMO plants expressing type III PKS (OKS) and the cyclases/ketoreductase CYC and KR; E) GMO plants expressing type III PKS (OKS) and the cyclases/ketoreductase CYC, CYC_DH and KR.

Figure 11: The figure shows the structure of the heptaketides aloesone, aloesol and O-glucosylated derivatives thereof, synthesized by GMO N. benthamiana co-expressing a type III polyketide synthase (HpPKS2) together with several small molecule foldases.
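The extracted ion chromatograms referred to in the figure descriptions are traces of signal intensity within a narrow m/z window, e.g. +/- 0.005 around the target mass in Figures 2 and 3. A minimal sketch of such an extraction is given below. The data layout (a list of retention times, each with a list of m/z and intensity pairs), the function name `extracted_ion_chromatogram` and the toy scan values are assumptions for illustration and do not reproduce the instrument software or the measured data.

```python
def extracted_ion_chromatogram(scans, target_mz, tol=0.005):
    """Sum, per scan, the intensity of all peaks within target_mz +/- tol.

    scans: list of (retention_time_min, [(mz, intensity), ...]) tuples.
    Returns a list of (retention_time_min, summed_intensity) tuples.
    """
    trace = []
    for retention_time, peaks in scans:
        in_window = sum(
            intensity for mz, intensity in peaks if abs(mz - target_mz) <= tol
        )
        trace.append((retention_time, in_window))
    return trace


# Hypothetical toy scans; the intensities are made up for the example.
toy_scans = [
    (2.86, [(235.0603, 1000), (235.0700, 50)]),
    (3.09, [(193.0495, 10)]),
]
print(extracted_ion_chromatogram(toy_scans, target_mz=235.0601))
```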

DETAILED DESCRIPTION OF THE INVENTION

I A method for producing libraries of aromatic compounds

The invention provides a method of producing a library of polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compounds, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms. Alternatively, the carbon atom chain length of the polyketide backbone of the compounds is selected from six, eight, ten, twelve, fourteen, sixteen, eighteen, twenty, twenty-two, and twenty-four or twenty-eight of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms. The method employs recombinant cells transformed with different heterologous genes encoding enzymes in a biosynthetic pathway leading to the formation of the library of polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compounds. Surprisingly, the inventors have discovered that a recombinant cell that expresses a heterologous Type III polyketide synthase (PKS) and a heterologous 'small molecule foldase' derived from a fungal/bacterial source, where the aromatase/cyclase and the PKS are derived from different genera, is capable of producing a non-reduced polyketide which is then converted in vivo into an aromatic compound of interest. 'Small molecule foldases' of bacterial or fungal origin are only known to act on polyketides that are bound to ACP within the KS/CLF/ACP enzyme complex of type II PKS or type I PKS. The ability of 'small molecule foldases' of bacterial or fungal origin, which in nature act on polyketides tethered to PKSI or PKSII, to guide the folding of untethered non-reduced linear polyketide products of PKSIII enzymes derived from a different genus was therefore unexpected.

Depending on the specificity of both the PKS III and the small molecule foldase type expressed in a given recombinant cell, a wide range of aromatic compounds of interest can be produced. The inventors have further discovered that a population of heterologous recombinant cells, comprising individual host cells transformed with transgenes encoding different combinations of one type of heterologous Type III polyketide synthase (PKS) and at least one type of heterologous bacterial or fungal 'small molecule foldase', is capable of producing the library of polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compounds.

Ii Recombinantly expressed heterologous Type III polyketide synthases

Despite their structural simplicity, type III PKSs are thought to contribute to the biosynthesis of a wide array of compounds in nature, such as chalcones, pyrones, acridones, phloroglucinols, stilbenes, and resorcinolic lipids. The linear non-reduced polyketides produced by type III PKSs are characterized by the presence of ketone groups in the ketides (-CH₂-CO-), originating from the starter or extender units, either as ketones or in the form of carbonyls in phenolic groups (-CH₂-CO- or its tautomeric form -CH=COH-). A Type I PKS and/or a Type II PKS may be mutated to remove relevant elements (e.g. active sites) so that it is converted into a Type III PKS. A PKS that the skilled person would functionally consider to be a Type III PKS is herein understood to be a Type III PKS.

Preferably the individual type III PKS used produces products of a single chain length, i.e. only releases products after a fixed number of iterations. This will ensure that the individual recombinant cell in the library only produces one specific product, which is desirable as 1) it increases the yield of the specific product by reducing the amount of shunt products, and 2) it eases the identification of the active compound produced by the recombinant cell. Preferably 80% of the formed polyketides should be of the same chain length, more preferably 90% should be of the same chain length, even more preferably 95% should be of the same chain length and most preferably 99% of the formed product should be of the same chain length.
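The chain-length preference stated above can be illustrated with a minimal sketch. The input format (a mapping from backbone chain length to a measured product amount) and the function name are assumptions made for illustration only; the cut-offs of 80%, 90%, 95% and 99% are taken from the text.

```python
# Preference tiers from the text: share of products having one and the same chain length.
TIERS = [
    (0.99, "most preferred"),
    (0.95, "even more preferred"),
    (0.90, "more preferred"),
    (0.80, "preferred"),
]


def chain_length_fidelity(product_amounts):
    """Return (dominant_chain_length, fraction, tier) for one PKS product profile.

    product_amounts: dict mapping chain length (carbon atoms) to an amount in
    any consistent unit. This input format is hypothetical.
    """
    total = sum(product_amounts.values())
    if total <= 0:
        raise ValueError("no product detected")
    # The dominant chain length is the one with the largest measured amount.
    dominant, amount = max(product_amounts.items(), key=lambda item: item[1])
    fraction = amount / total
    # Pick the strictest tier whose cut-off is met.
    tier = next((label for cutoff, label in TIERS if fraction >= cutoff),
                "below preferred range")
    return dominant, fraction, tier
```

For example, a profile in which 96 of 100 units of product have a 16-carbon backbone would fall in the "even more preferred" tier under this sketch.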

A recombinant cell of the invention comprises a transgene encoding a heterologous Type III PKS, which may be an enzyme that is natively expressed in a bacterial, fungal or plant cell. If the encoded enzyme is of bacterial origin it is preferably selected from Pseudomonas or Streptomyces.

Alternatively, if the enzyme is of fungal origin it is preferably selected from the group consisting of: Neurospora, Fusarium, Aspergillus, and Monascus.

If the encoded enzyme is of plant origin, it is preferably selected from the group consisting of: Gerbera hybrida, Aloe arborescens, Drosophyllum lusitanicum, Plumbago zeylanica, Rheum palmatum, Hypericum perforatum and Plumbago indica.

Preferably, a recombinant cell of the invention comprises a transgene encoding a heterologous Type III polyketide synthase selected from the members of the groups listed below, or shares high amino acid sequence identity with a member of the group. Preferably the amino acid sequence of the heterologous Type III polyketide synthase shares at least 75, 80, 85, 90, 92, 94, 96, 98, 99 or 100% sequence identity with a member of the group. The GenBank ID numbers identifying the polypeptide sequence and corresponding native nucleotide sequence for each member of the groups of Type III polyketide synthases is given in the lists below. The nucleotide sequence of a transgene encoding any member of the group of Type III polyketide synthases may, however, need to be adapted to correspond to a codon usage required for optimal expression in the host recombinant cell.
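The sequence identity thresholds recited above can be checked with a short sketch such as the following. It assumes the two amino acid sequences have already been aligned pairwise (equal length, gaps written as "-") and uses one of several identity conventions in use, namely matches divided by all alignment columns that are not gap-gap. The function names and the toy sequences in the usage note are illustrative and are not taken from the sequence listing.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over a pairwise alignment given as two equal-length strings.

    Columns in which both sequences carry a gap are ignored; every other
    column counts toward the denominator (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=75.0):
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With the toy alignment "MGSYSSDD" versus "MGSYSADD", seven of eight columns match, giving 87.5% identity, which passes the 75, 80 and 85% thresholds but not the 90% threshold.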

Type III polyketide synthases selected for forming triketides are preferably: 2-PS [GenBank ID number Z38097.2 (nucleotide SEQ ID No: 1) and GenBank ID number P48391.2 (polypeptide SEQ ID No: 2)] from Gerbera hybrida.

Type III polyketide synthases selected for forming tetraketides are preferably: PhlD [GenBank ID number JN561597.1 position 2882 to 3970 (nucleotide SEQ ID No: 3) and GenBank ID number AEW67127.1 (polypeptide SEQ ID No: 4)] from Pseudomonas fluorescens.

Type III polyketide synthases selected for forming pentaketides are preferably: PCS [GenBank ID number AY823626 (nucleotide SEQ ID No: 5) and GenBank ID number AAX35541.1 (polypeptide SEQ ID No: 6)] from Aloe arborescens or ORAS [GenBank ID number XM_955334.2 position 582 to 1919 (nucleotide SEQ ID No: 7) and GenBank ID number EGZ68458 (polypeptide SEQ ID No: 8)] from Neurospora crassa or 1,3,6,8-tetrahydroxynaphthalene synthase [GenBank ID number CP005080 position 7775934 to 7776986 (nucleotide SEQ ID No: 9) and GenBank ID number AGK81780 (polypeptide SEQ ID No: 10)] from Streptomyces fulvissimus.

Type III polyketide synthases selected for forming hexaketides are preferably: PinPKS [GenBank ID number AB259100 (nucleotide SEQ ID No: 11) and GenBank ID number BAF44539 (polypeptide SEQ ID No: 12)] from Plumbago indica, DluHKS [GenBank ID number EF405822 (nucleotide SEQ ID No: 13) and GenBank ID number ABQ59603 (polypeptide SEQ ID No: 14)] from Drosophyllum lusitanicum or PzPKS [GenBank ID number JQ015381 (nucleotide SEQ ID No: 15) and GenBank ID number AEX86944 (polypeptide SEQ ID No: 16)] from Plumbago zeylanica.

Type III polyketide synthases selected for forming heptaketides are preferably: ALS [GenBank ID number AY517486 (nucleotide SEQ ID No: 17) and GenBank ID number AAS87170 (polypeptide SEQ ID No: 18)] from Rheum palmatum or AaPKS3 [GenBank ID number EF537574 (nucleotide SEQ ID No: 19) and GenBank ID number ABS72373 (polypeptide SEQ ID No: 20)] from Aloe arborescens.

Type III polyketide synthases selected for forming octaketides are preferably: OKS [GenBank ID number AY567707 (nucleotide SEQ ID No: 21) and GenBank ID number AAT48709.1 (polypeptide SEQ ID No: 22)] or OKS2 [GenBank ID number FJ536166 (nucleotide SEQ ID No: 23) and GenBank ID number ACR19997.1 (polypeptide SEQ ID No: 24)] or OKS3 [GenBank ID number FJ536167 (nucleotide SEQ ID No: 25) and GenBank ID number ACR19998.1 (polypeptide SEQ ID No: 26)] from Aloe arborescens or HpPKS2 [GenBank ID number HQ529467 (nucleotide SEQ ID No: 27) and GenBank ID number AEE69029 (polypeptide SEQ ID No: 28)] from Hypericum perforatum.

Type III polyketide synthases selected for forming nonaketides are preferably: PCS F80A/Y82A/M207G, a mutated polypeptide SEQ ID No: 29 (derived from GenBank ID number AAX35541.1) from Aloe arborescens, having the specified triple point mutation (F80A/Y82A/M207G), and encoded by a synthetic gene.

Type III polyketide synthases selected for forming decaketides are preferably: OKS N222G, a mutated polypeptide SEQ ID No: 30 (derived from GenBank ID number AAT48709.1) from Aloe arborescens, having the specified point mutation (N222G), and encoded by a synthetic gene.

Type III polyketide synthases selected for forming dodecaketides are preferably: OKS F66L/N222G, a mutated polypeptide SEQ ID No: 31 (derived from GenBank ID number AAT48709.1) from Aloe arborescens, having the specified double point mutation (F66L/N222G), and encoded by a synthetic gene.

In one embodiment, the population of heterologous recombinant cells comprises host cells, or their clonal derivatives, where each individual cell comprises a transgene capable of expressing a PKS selected from a triketide synthase, tetraketide synthase, pentaketide synthase, hexaketide synthase, heptaketide synthase, octaketide synthase, nonaketide synthase, decaketide synthase, undecaketide synthase, dodecaketide synthase, tridecaketide synthase, tetradecaketide synthase, and pentadecaketide synthase. Preferably the population of heterologous recombinant cells is capable of expressing at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13 members of this group.

Iii Biosynthetic properties of the recombinantly expressed heterologous Type III polyketide synthases
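As a rough guide, the thirteen ketide classes named above can be related to backbone chain length with the following sketch. It assumes that every ketide unit contributes two backbone carbon atoms, as is the case for an acetyl-CoA-derived starter extended with malonyl-CoA; other starter or extender units would shift the count, so the odd-numbered and 31-carbon chain lengths recited in Section I are not covered by this simplification. The names `KETIDE_UNITS` and `backbone_carbons` are illustrative.

```python
# Ketide class prefixes in the order used in the text (tri- to pentadecaketide).
PREFIXES = [
    "tri", "tetra", "penta", "hexa", "hepta", "octa", "nona",
    "deca", "undeca", "dodeca", "trideca", "tetradeca", "pentadeca",
]

# Number of ketide units per class: triketide = 3 ... pentadecaketide = 15.
KETIDE_UNITS = {prefix + "ketide": n for n, prefix in enumerate(PREFIXES, start=3)}

# Assumption: each ketide unit adds two carbon atoms to the backbone.
CARBONS_PER_KETIDE = 2


def backbone_carbons(ketide_class):
    """Backbone carbon count for a ketide class name such as 'octaketide'."""
    return KETIDE_UNITS[ketide_class.lower()] * CARBONS_PER_KETIDE
```

Under this assumption a triketide has a 6-carbon backbone, an octaketide a 16-carbon backbone and a pentadecaketide a 30-carbon backbone.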

The Type III polyketide synthase expressed by the host recombinant cell is capable of converting suitable starter and extender units into a non-reduced polyketide under suitable incubation conditions. Suitable starter units are acetyl-CoA or malonyl-CoA and suitable extender units are malonyl-CoA or methyl-malonyl-CoA. The biosynthesis of aromatic compounds (spontaneously folded polyketides of different chain length) by the host recombinant cell expressing a heterologous Type III polyketide synthase is exemplified in Example 1.

Iiii Recombinantly expressed heterologous small molecule foldases

In bacterial type II PKS systems the folding of polyketide backbones is most often assisted/directed by different classes of enzymes that act in trans (independently of the PKS enzyme) to promote a non-spontaneous fold. These enzyme classes are referred to herein as 'small molecule foldases', a group which includes aromatases and cyclases. In type II PKS systems, the formation of compounds with multiple aromatic rings typically relies on the successive action of multiple different 'small molecule foldases'. The 'small molecule foldases' can be divided into two groups based on the substrates they act on: the first group acts only on linear polyketide chains and catalyzes the formation of one or more aromatic/cyclic groups, whereas the second group accepts only substrates that already contain an aromatic or cyclic group (i.e. products of the first group of 'small molecule foldases') and catalyzes the formation of additional aromatic or cyclic groups.

Surprisingly, the inventors have discovered that a bacterial/fungal 'small molecule foldase' derived from PKSI enzymes or interacting with PKSII enzymes in nature, when co-expressed with a Type III PKS in a recombinant cell, is capable of promoting a non-spontaneous fold in a non-reduced linear polyketide synthesized by the Type III PKS, thereby preventing the spontaneous folding/aromatization that the polyketide would otherwise undergo in vivo. Accordingly, the 'small molecule foldase' enzyme has a trans-acting catalytic activity that allows in vivo conversion of the non-reduced polyketide into an aromatic compound of interest. The 'small molecule foldase' enzyme is heterologous with respect to the host cell in which it is expressed, and is derived from a different genus than that from which the PKS III is derived. The biosynthesis of a range of different aromatic compounds by the host recombinant cell co-expressing a heterologous Type III polyketide synthase and a heterologous bacterial/fungal small molecule foldase (where the genus from which the foldase is derived is different from the genus from which the PKSIII is derived) is exemplified in Examples 2, 3 and 4.

Preferably, a recombinant cell of the invention co-expresses a Type III PKS together with a 'small molecule foldase' that is an aromatase/cyclase belonging to a family selected from the group: Cyclase superfamily domain pfam04199; SRPBCC cyclase/aromatase superfamily pfam10604 and/or pfam03364, or DABB cyclase/aromatase superfamily pfam07876; Polyketide synthesis cyclase superfamily pfam04673; Lactamase_B/MBL fold metallo-hydrolase superfamily pfam00753; ketoreductase from Act cluster; Cupin_2 superfamily pfam07883; and a dissected product template domain from type I iterative PKS originating from filamentous fungi.

Preferably, a recombinant cell of the invention comprises at least one transgene encoding a heterologous 'small molecule foldase' selected from the members of the groups listed below, or shares high amino acid sequence identity with a member of the group. Preferably the amino acid sequence of the heterologous small molecule foldase shares at least 75, 80, 85, 90, 92, 94, 96, 98, 99 or 100% sequence identity with a member of the group. The GenBank ID numbers identifying the polypeptide sequence and corresponding native nucleotide sequence for each member of the groups of small molecule foldase is given in the lists below. The nucleotide sequence of a transgene encoding any member of the group of 'small molecule foldase' may, however, need to be adapted to correspond to a codon usage required for optimal expression in the host recombinant cell.
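The codon adaptation mentioned above can be illustrated, in its simplest form, by back-translating a polypeptide with one preferred codon per residue. The table below is a hypothetical, deliberately incomplete stand-in covering a handful of residues; a practical workflow would derive the table from the measured codon usage of the chosen host, and commercial optimization algorithms generally weigh further parameters. The names `PREFERRED_CODON`, `back_translate` and `translate` are illustrative.

```python
# Hypothetical preferred-codon table (one codon per residue, '*' = stop).
# It is NOT a measured codon usage table for any particular host.
PREFERRED_CODON = {
    "M": "ATG", "G": "GGT", "S": "TCT", "Y": "TAC",
    "D": "GAT", "V": "GTT", "*": "TAA",
}


def back_translate(protein, table=PREFERRED_CODON):
    """Encode a protein string using one preferred codon per residue."""
    missing = sorted(set(protein) - set(table))
    if missing:
        raise KeyError(f"no codon listed for residues: {missing}")
    return "".join(table[residue] for residue in protein)


def translate(dna, table=PREFERRED_CODON):
    """Decode DNA written with the table's codons (used as a round-trip check)."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    reverse = {codon: residue for residue, codon in table.items()}
    return "".join(reverse[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

The round trip `translate(back_translate(p)) == p` confirms that the adapted nucleotide sequence still encodes the same polypeptide, which is the property the adaptation must preserve.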

A 'first heterologous small molecule foldase' capable of acting on the linear polyketide product of the type III PKS to form a first ring (and capable of introducing a fold at the given positions in the chain) is preferably selected from the group consisting of:

• ZhuI (type: SRPBCC) [GenBank ID number AF293442.1 (nucleotide SEQ ID No: 32) and GenBank ID number AAG30197.1 (polypeptide SEQ ID No: 33)] from Streptomyces sp. R1128 to form a C7-C12 fold in the linear non-reduced polyketide chain;

• pdmD (type: SRPBCC) [GenBank ID number EF151801.1 Position 23865 to 24326 (nucleotide SEQ ID No: 34) and GenBank ID number ABM21750.1 (polypeptide SEQ ID No: 35)] from Actinomadura hibisca to form C9-C14+C7-C16 folds;

• sanl (type: SRPBCC) [GenBank ID number GU937384.1 position 11996 to 12451 (nucleotide SEQ ID No: 36) and GenBank ID number ADG86318.1 (polypeptide SEQ ID No: 37)] from Streptomyces sp. SANK 61196;

• pnxD (type: SRPBCC) [GenBank ID number AB469194.1 position 16730 to 17203 (nucleotide SEQ ID No: 38) and GenBank ID number BAJ52684.1 (polypeptide SEQ ID No: 39)] from Streptomyces sp. TA-0256;

• NpCI (type: SRPBCC) [GenBank ID number AM492533.1 position 8866 to 9333 (nucleotide SEQ ID No: 40) and GenBank ID number CAM34342.1 (polypeptide SEQ ID No: 41)] from Streptomyces tendae;

• gra-orf4 (type: 2xSRPBCC) [GenBank ID number AJ011500.1 position 32006 to 32980 (nucleotide SEQ ID No: 42) and GenBank ID number CAA09656.1 (polypeptide SEQ ID No: 43)] from Streptomyces violaceoruber to form a C9-C14 fold;

• schP4/SFUL_4006 (type: 2xSRPBCC) [GenBank ID number CP005080.1 Position 4477979 to 4478932 (nucleotide SEQ ID No: 44) and GenBank ID number AGK78908.1 (polypeptide SEQ ID No: 45)] from Streptomyces fulvissimus DSM 40593 to form a C7-C12 fold;

• Erd4 (bifunc) (type: 2xSRPBCC) [GenBank ID number FJ719113.1 Position 3913 to 4863 (nucleotide SEQ ID No: 46) and GenBank ID number ACX83620.1 (polypeptide SEQ ID No: 47)] from uncultured soil bacterium V167 to form a C7-C12 fold;

• med-ORF19 (type: 2xSRPBCC) [GenBank ID number AB103463.1 Position 13942 to 14898 (nucleotide SEQ ID No: 48) and GenBank ID number BAC79027.1 (polypeptide SEQ ID No: 49)] from Streptomyces sp. AM-7161 to form a C7-C12 fold;

• ssfYl (type: 2xSRPBCC) [GenBank ID number GQ409537.1 Position 9830 to 10774 (nucleotide SEQ ID No: 50) and GenBank ID number ADE34490.1 (polypeptide SEQ ID No: 51)] from Streptomyces sp. SF2575 to form a C7-C12 fold;

• oxyK (type: 2xSRPBCC) [GenBank ID number DQ143963.2 Position 11443 to 12396 (nucleotide SEQ ID No: 52) and GenBank ID number AAZ78334.2 (polypeptide SEQ ID No: 53)] from Streptomyces rimosus to form a C7-C12 fold;

• Act_ARO-CYC_actVII (type: 2xSRPBCC) [GenBank ID number AL939122.1 Position 162706 to 163656 (nucleotide SEQ ID No: 54) and GenBank ID number Q02055.1 (polypeptide SEQ ID No: 55)] from Streptomyces coelicolor A3(2) to form a C7-C12 fold;

• wA-PT (type: PT domain) [GenBank ID number None - synthetic (nucleotide SEQ ID No: 58) and GenBank ID number CAA46695 position 1276 to 1651 (polypeptide SEQ ID No: 59)] from Aspergillus nidulans to form C7-C12 + C1-C10 folds;

• BIK1-PT (type: PT domain) [GenBank ID number None - synthetic (nucleotide SEQ ID No: 60) and GenBank ID number CAB92399 Position 1252 to 1632 (polypeptide SEQ ID No: 61)] from Fusarium fujikuroi to form C7-C12 + C1-C10 + C12-C17 folds;

• PGL1_PT (type: PT domain) [GenBank ID number None - synthetic (nucleotide SEQ ID No: 62) and GenBank ID number EYB26831 position 1225 to 1655 (polypeptide SEQ ID No: 63)] from Fusarium graminearum to form C4-C9 + C2-C11 folds;

• mpdG_PT (type: PT domain) [GenBank ID number None - synthetic (nucleotide SEQ ID No: 64) and GenBank ID number XP_657754.1 position 1335 to 1739 (polypeptide SEQ ID No: 65)] from Aspergillus nidulans to form C6-C1 + C4-C13 + C2-C15 folds;

• ZhuI-1 (type: SRPBCC) [GenBank ID number ANIA_10642 (nucleotide SEQ ID No: 66) and GenBank ID number CBF80957.1 (polypeptide SEQ ID No: 67)] from Aspergillus nidulans;

• ZhuI-2 (type: SRPBCC) [GenBank ID number AN3000.2 (nucleotide SEQ ID No: 68) and GenBank ID number XP_660604.1 (polypeptide SEQ ID No: 69)] from Aspergillus nidulans;

• AOC-1 (type: Dabb) [GenBank ID number AN8584.2 (nucleotide SEQ ID No: 70) and GenBank ID number XP_681853.1 (polypeptide SEQ ID No: 71)] from Aspergillus nidulans;

• AOC-2 (type: Dabb) [GenBank ID number ANIA_01204 (nucleotide SEQ ID No: 72) and GenBank ID number CBF87939.1 (polypeptide SEQ ID No: 73)] from Aspergillus nidulans;

• AOC-3 (type: Dabb) [GenBank ID number ANIA_10997 (nucleotide SEQ ID No: 74) and GenBank ID number CBF79774.1 (polypeptide SEQ ID No: 75)] from Aspergillus nidulans;

• AOC-4 (type: Dabb) [GenBank ID number ANIA_11021 (nucleotide SEQ ID No: 76) and GenBank ID number CBF80167.1 (polypeptide SEQ ID No: 77)] from Aspergillus nidulans;

• AOC-5 (type: Dabb) [GenBank ID number AN1979.2 (nucleotide SEQ ID No: 78) and GenBank ID number XP_659583.1 (polypeptide SEQ ID No: 79)] from Aspergillus nidulans.
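The fold annotations given for the foldases listed above (e.g. "C7-C12 + C1-C10") can be parsed and compared against the backbone length of a polyketide with a short sketch. It rests on one stated assumption: a fold between carbon positions i and j can only occur in a backbone having at least max(i, j) carbon atoms. This is a necessary condition used for illustration, not a prediction that a given foldase will accept a given substrate. The function names are hypothetical.

```python
import re

# Matches annotations such as 'C7-C12'; optional spaces around the hyphen are tolerated.
FOLD_PATTERN = re.compile(r"C(\d+)\s*-\s*C(\d+)")


def parse_folds(fold_text):
    """Turn 'C7-C12 + C1-C10' into [(7, 12), (1, 10)]."""
    return [(int(a), int(b)) for a, b in FOLD_PATTERN.findall(fold_text)]


def min_backbone_length(fold_text):
    """Highest carbon position referenced by any fold in the annotation."""
    folds = parse_folds(fold_text)
    if not folds:
        raise ValueError("no fold annotation found")
    return max(max(pair) for pair in folds)


def compatible(fold_text, backbone_carbons):
    """True if the backbone is long enough to contain every annotated position."""
    return backbone_carbons >= min_backbone_length(fold_text)
```

For instance, the annotation "C7-C12 + C1-C10 + C12-C17" references position 17, so under this assumption it would not be paired with a 12-carbon backbone, whereas "C7-C12" is compatible with a 16-carbon backbone.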

Iiv. Additional populations of heterologous recombinant cells for producing a library of aromatic compounds

The inventors have further discovered that the diversity of aromatic compounds produced by the heterologous recombinant cells of the invention can be extended by transforming each cell of the first population of heterologous recombinant cells with a second, optionally also a third, and optionally also a fourth transgene, where each of the second, third and fourth transgenes encodes a different heterologous 'small molecule foldase'. The second 'small molecule foldase' is capable of acting on the aromatic polyketide product of the 'first small molecule foldase' to form one or more additional aromatic groups, while the third and fourth 'small molecule foldases' are capable of forming additional aromatic groups in an iterative synthesis (and capable of introducing a fold at the given positions in the chain). The biosynthesis of a range of different aromatic compounds by the host recombinant cell co-expressing a heterologous Type III polyketide synthase and one or more heterologous bacterial/fungal small molecule foldases (where the genus from which the foldase is derived is different from the genus from which the PKSIII is derived) is exemplified in Examples 3 and 4.

Preferably, each of the second, third, and fourth heterologous 'small molecule foldases' is selected from the members of the group listed below, or shares high amino acid sequence identity with a member of this group. Preferably the amino acid sequence of the second, third, and fourth heterologous 'small molecule foldase' shares at least 75, 80, 85, 90, 92, 94, 96, 98, 99 or 100% sequence identity with a member of this group. The GenBank ID numbers identifying the polypeptide sequence and corresponding native nucleotide sequence for each member of the group of 'small molecule foldases' are given in the list below. The nucleotide sequence of a transgene encoding any member of the group of 'small molecule foldases' may, however, need to be adapted to correspond to a codon usage required for optimal expression in the host recombinant cell. The second, third, and fourth heterologous 'small molecule foldases' are preferably selected from the group consisting of:

• ZhuJ (type: Cyclase) [GenBank ID number AF293442.1 (nucleotide SEQ ID No. 80) and GenBank ID number AAG30196.1 (polypeptide SEQ ID No. 81)] from Streptomyces sp. R1128 to form a C5-C14 fold;

• oxyN (type: Cyclase) [GenBank ID number DQ143963.2 position 14855 to 15628 (nucleotide SEQ ID No. 82) and GenBank ID number AAZ78337.1 (polypeptide SEQ ID No. 83)] from Streptomyces rimosus to form C5-C14+C3-C16 folds;

• jadI (type: Polyketide synthesis cyclase) [GenBank ID number AF126429.1 position 2020 to 2349 (nucleotide SEQ ID No. 84) and GenBank ID number AAD37852.1 (polypeptide SEQ ID No. 85)] from Streptomyces venezuelae to form C4-C17 folds;

• LndF (type: Polyketide synthesis cyclase) [GenBank ID number AY659997.1 (nucleotide SEQ ID No. 86) and GenBank ID number AAU04837.1 (polypeptide SEQ ID No. 87)] from Streptomyces globisporus to form C4-C17+C2-C19 folds;

• pgaF (type: Polyketide synthesis cyclase) [GenBank ID number AHGS01000054.1 position 6389 to 6724 (nucleotide SEQ ID No. 88) and GenBank ID number EHN79050.1 (polypeptide SEQ ID No. 89)] from Streptomyces coelicoflavus to form C2-C19 folds;

• Act_CYC (type: Lactamase) [GenBank ID number X63449.1 Position 3830 to 4723 (nucleotide SEQ ID No. 90) and GenBank ID number CAA45047.1 (polypeptide SEQ ID No. 91)] from Streptomyces coelicolor A3(2);

• sanE (type: None) [GenBank ID number AF228524.1 position 15 to 584 (nucleotide SEQ ID No. 92) and GenBank ID number AAF61923.1 (polypeptide SEQ ID No. 93)] from Streptomyces ansochromogenes;

• pnxK (type: Polyketide synthesis cyclase) [GenBank ID number AB469194.1 position 13057 to 13380 (nucleotide SEQ ID No. 94) and GenBank ID number BAJ52679.1 (polypeptide SEQ ID No. 95)] from Streptomyces sp. TA-0256;

• pnxL (type: Cupin_2) [GenBank ID number AB469194.1 position 13377 to 13901 (nucleotide SEQ ID No. 96) and GenBank ID number BAJ52680.1 (polypeptide SEQ ID No. 97)] from Streptomyces sp. TA-0256;

• NpCII (type: Cupin_2) [GenBank ID number AM492533.1 position 12120 to 12548 (nucleotide SEQ ID No. 98) and GenBank ID number CAM34346.1 (polypeptide SEQ ID No. 99)] from Streptomyces tendae;

• NpCIII (type: Polyketide synthesis cyclase) [GenBank ID number AM492533.1 position 12545 to 12880 (nucleotide SEQ ID No. 100) and GenBank ID number CAM34347.1 (polypeptide SEQ ID No.101)] from Streptomyces tendae;

• ZhuJ-1 (type: Cyclase) [GenBank ID number AN5060.2 (nucleotide SEQ ID No.102) and GenBank ID number XP_662664.1 (polypeptide SEQ ID No.103)] from Aspergillus nidulans;

• ZhuJ-2 (type: Cyclase) [GenBank ID number ANIA_11053 (nucleotide SEQ ID No.104) and GenBank ID number CBF74060.1 (polypeptide SEQ ID No.105)] from Aspergillus nidulans;

• ZhuJ-3 (type: Cyclase) [GenBank ID number ANIA_10146 (nucleotide SEQ ID No.106) and GenBank ID number CBF88175.1 (polypeptide SEQ ID No.107)] from Aspergillus nidulans;

• ZhuJ-4 (type: Cyclase) [GenBank ID number AN5068.2 (nucleotide SEQ ID No. 108) and GenBank ID number XP_662672.1 (polypeptide SEQ ID No. 109)] from Aspergillus nidulans.

Iv Aromatic compounds produced by the recombinant cells of the invention

In a preferred embodiment, the library of aromatic compounds may include aromatic compounds in the size range of C₆-C The library of aromatic compounds produced by the method of the invention will comprise two to 10⁵ different compounds.

Ivi A recombinant cell

The term "recombinant cell" used in the method of the invention may be a eukaryotic cell [e.g. filamentous fungal cell, a yeast cell or a plant cell] or a prokaryotic cell.

Preferably the cell is a yeast cell, which may be selected from the group consisting of Ascomycetes, Basidiomycetes and fungi imperfecti, more preferably an Ascomycete. Preferably, the Ascomycetes yeast cell is selected from the group consisting of Ashbya, Botryoascus, Debaryomyces, Hansenula, Kluyveromyces, Lipomyces, Saccharomyces spp. (e.g. Saccharomyces cerevisiae), Pichia spp. and Schizosaccharomyces spp. Most preferably, the yeast cell is selected from the group consisting of Saccharomyces spp. (e.g. Saccharomyces cerevisiae) and Pichia spp.

The recombinant host cell may be a filamentous fungal cell. Filamentous fungi include all filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al., 1995, supra). Preferably the filamentous fungal cell is a species of Acremonium, Aspergillus, Fusarium, Humicola, Mucor, Myceliophthora, Neurospora, Penicillium, Thielavia, Tolypocladium, or Trichoderma, or a teleomorph or synonym thereof. For example, the filamentous fungal cell may be an Aspergillus cell, in particular Aspergillus niger, Aspergillus oryzae or Aspergillus nidulans.

When the recombinant cell is a bacterial cell, it is preferably selected from the group consisting of: Bacillus, Streptomyces, Corynebacterium, Pseudomonas, lactic acid bacteria and an E. coli cell. A preferred Bacillus cell is B. subtilis, B. amyloliquefaciens or B. licheniformis. A preferred Streptomyces cell is S. setonii or S. coelicolor. A preferred Corynebacterium cell is C. glutamicum. A preferred Pseudomonas cell is P. putida or P. fluorescens.

Ivii Production of the library of aromatic compounds by the heterogeneous populations of recombinant cells

The one or more heterogeneous populations of recombinant cells are incubated and/or cultivated under conditions that support synthesis of the library of polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compounds. Suitable cultivation conditions depend on the nature of the host recombinant cell. When the host recombinant cell is a yeast, filamentous fungal or bacterial cell, the cultivation medium (aqueous liquid or solid medium) will comprise nutrients (carbon source, minerals, essential vitamins and substrates for polyketide biosynthesis, e.g. but not exclusively acetate and malonate) necessary for the biosynthetic activity of the host cell and for host cell growth. When the host cell is a plant cell, the cultivation medium may provide a source of water and light.

Iviii Screening the library of aromatic compounds

The method of producing a library of polyketide-derived, polyaromatic, cyclic and polycyclic compounds may include the step of screening the compounds produced by the population of heterologous recombinant cells, wherein each recombinant cell clone present in the one or more heterogeneous populations of recombinant cells is grown individually on a solid support, or individually in a liquid culture. Screening for compounds with antibiotic properties may be performed by growing the individual members of the recombinant cell library on a surface of bacteria and then observing the formation of clearing zones around the recombinant cells/colonies. Alternatively, the screen may be based on a light or color forming reaction that the formed compound promotes or inhibits. Alternatively, the screen may be performed using in-cell assays built into the recombinant host cells prior to construction of the libraries.

Iix Recovery of the library of aromatic compounds

The method of producing a library of polyketide-derived, polyaromatic, cyclic and polycyclic compounds, may include the step of recovering the polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compounds produced by the one or more heterogeneous populations of recombinant cells or produced by one or more of the recombinant cell clones present in the one or more heterogeneous populations of recombinant cells. Recovery may be performed by dilution plating or by re-streaking the population onto selective solid media.

II One or more populations of heterologous recombinant cells for production of a library of aromatic compounds

The invention provides one or more populations of heterologous recombinant cells, comprising cells capable of producing polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compounds, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms. Maintenance and replication of the individual cells, or clonal derivatives thereof, in the one or more populations are performed by methods that depend on the nature of the host recombinant cells and that are known in the art.

III A method for the construction of a population of recombinant host cells for production of a library of aromatic compounds

The following method illustrates one way of constructing population(s) of recombinant host cells capable of producing a library of polyketide-derived aromatic, polyaromatic, cyclic and polycyclic compounds, wherein the carbon atom chain length of the polyketide backbone of the compounds is selected from two or more of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms. Alternatively, the carbon atom chain length of the polyketide backbone of the compounds is selected from six, eight, ten, twelve, fourteen, sixteen, eighteen, twenty, twenty-two, and twenty-four or twenty-eight of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms. The method involves transforming each individual member of the host cell population with a transgene encoding a heterologous type III PKS and one or more transgenes each encoding a different heterologous 'small molecule foldase', as described in Section I. The method comprises the following steps:

(i) creating a library of transgenes encoding type III PKSs that is populated by different type III PKSs, where the individual type III PKS is responsible for forming a linear non-reduced polyketide chain of a specific length, wherein the carbon atom chain length of the polyketide backbone of the chain is selected from 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 and 31 carbon atoms.

(ii) Creating a library of transgenes encoding different types of 'small molecule foldase(s)' that is populated by different foldases that individually catalyze the formation of one or more specific intramolecular carbon-carbon bonds in linear non-reduced polyketides of variable length (from (i)), resulting in the formation of aromatic compounds with different and unique folding patterns.

(iii) Creating one or more libraries of transgenes encoding different types of 'small molecule foldase(s)' that are populated by different foldases that individually catalyze the formation of one or more specific intramolecular carbon-carbon bonds in non-reduced polyketides of variable length with one or more aromatic groups (from (ii)), resulting in the formation of aromatic compounds with different and unique folding patterns.

(iv) The libraries described in (i), (ii) and (iii) consist of transgenes where the sequences encoding the said genes are operationally linked to regulatory and cis-acting sequences that allow for transcription and translation in a recombinant host cell. The transgenes are preferably cloned into vectors, which can comprise one or more selection marker encoding genes, and the vectors may additionally include: i. sequences that allow for autosomal replication of the vector in the recombinant host cell, or ii. sequences that allow for targeted integration of the vector into the genome of the recombinant host cell, or iii. sequences that allow for transfer of the contents of the vectors to another organism by conjugation.

(v) Randomly combining the PKS type III library described in (i) with i. library (ii) or ii. library (ii) plus library (iii) or iii. library (ii) plus two or three members of library (iii).

(vi) Co-transformation of said libraries into a population of host cells, such that each individual cell comprises at least one transgene from library (i) and (ii) and optionally one or more additional transgenes from library (iii).

(vii) Optionally replicating the heterologous population of transformed cells produced in step (vi); and optionally storing the population, in a manner that each transformed cell produced in step (vi) and its clonal derivatives can be recovered individually.

(viii) Optionally isolating individual recombinant host cells from the population of host cells, to establish pure (isogenetic) cultures of the isolated recombinant host cell.

An alternative to the above-described method is as follows: each library of transgenes described in (i), (ii) and (iii), optionally cloned into vectors, is individually transformed into a population of host cells, such that each individual cell of the library comprises at least one transgene from library (i), or library (ii), or library (iii). The transgenes from library (ii), and optionally library (iii), transformed into the respective populations of host cells can be transferred to the host cell population comprising library (i) by conjugation, cell-cell fusion or crossing, such that each cell in the resulting population of heterologous host cells comprises at least one transgene encoding a Type III PKS and one or more transgenes encoding 'small molecule foldases'.
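The combinatorial logic of steps (i) to (vi) can be sketched as an enumeration of the designs that a library could in principle contain: one type III PKS, one first foldase, and zero to three later-acting foldases, with every foldase taken from a genus other than that of the PKS. The data layout ((name, genus) tuples), the function name and the shorthand "THNS" for the 1,3,6,8-tetrahydroxynaphthalene synthase are assumptions made for illustration; the sketch treats the order of later foldases as irrelevant and does not model the random combination or transformation itself.

```python
from itertools import combinations, product


def enumerate_designs(pks_library, first_foldases, later_foldases, max_later=3):
    """List (PKS, first foldase, *later foldases) name tuples.

    Every library entry is a (name, genus) tuple. Foldases sharing a genus
    with the PKS are skipped, reflecting the different-genus requirement.
    """
    designs = []
    for pks, first in product(pks_library, first_foldases):
        if first[1] == pks[1]:
            continue  # first foldase must come from a different genus than the PKS
        allowed = [f for f in later_foldases if f[1] != pks[1]]
        # Zero, one, two or up to max_later distinct later-acting foldases.
        for k in range(max_later + 1):
            for extra in combinations(allowed, k):
                designs.append((pks[0], first[0]) + tuple(f[0] for f in extra))
    return designs


if __name__ == "__main__":
    pks = [("OKS", "Aloe"), ("THNS", "Streptomyces")]
    first = [("ZhuI", "Streptomyces"), ("wA-PT", "Aspergillus")]
    later = [("ZhuJ", "Streptomyces"), ("ZhuJ-1", "Aspergillus")]
    for design in enumerate_designs(pks, first, later, max_later=2):
        print(" + ".join(design))
```

The number of designs grows multiplicatively with the size of each library, which is the reason a comparatively small set of transgenes can give rise to a large compound library.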

EXAMPLES

Example 1 - Library of PKSs that produce polyketides of different lengths in S. cerevisiae

This example aims to show how the expression of different type III PKSs in S. cerevisiae results in the formation of a range of different aromatic compounds in vivo. This concept is illustrated in Figure 1.

Methods

Five different type III polyketide synthases of variable origin were selected for heterologous expression in S. cerevisiae: the triketide synthase 2-PS from the plant Gerbera hybrida, the pentaketide synthase PCS from the plant Aloe arborescens, the hexaketide synthase HKS from the plant Drosophyllum lusitanicum, the heptaketide synthase PKS3 from Aloe arborescens, and the octaketide synthase OKS from Aloe arborescens. The genes were codon optimized for expression in S. cerevisiae using the GeneArt GeneOptimizer algorithm (LifeTechnologies). The de novo synthesized genes were delivered in shuttle vectors, and the coding sequences were amplified by PCR using the primers listed below.

Primer list:

Primers used for the construction process, where dU represents 2'-deoxyuridine:

Sc_Gh_2-PS-F 5'-ATCAACGGGdUAAAAATGGGTTCCTACTCTTCTGATGATGTTG-3' SEQ ID No. 110
Sc_Gh_2-PS-R 5'-CGTGCGAdUTTAGTTACCATTAGCAACAGCAGCAGTAACTC-3' SEQ ID No. 111
Sc_AaOKS-F 5'-ATCAACGGGdUAAAAATGAGTAGTTTATCAAATGCCAGTCAC-3' SEQ ID No. 112
Sc_AaOKS-R 5'-CGTGCGAdUTTACATCAATGGCAAGGAATGCAATAAG-3' SEQ ID No. 113
Sc_Aa_PCS-F 5'-ATCAACGGGdUAAAAATGTCCTCCTTGTCTAATTCCTTGC-3' SEQ ID No. 114
Sc_Aa_PCS-R 5'-CGTGCGAdUTTACATCAAAGGCAAAGAATGCA-3' SEQ ID No. 115
Sc_DluHKS-F 5'-ATCAACGGGdUAAAAATGGCTTTCGTTGAAGGTATGGGT-3' SEQ ID No. 116
Sc_DluHKS-R 5'-CGTGCGAdUTTAGTTGTTGATTGGGAAGGATCTCAAGA-3' SEQ ID No. 117
Sc_AaPKS3/ALS-F 5'-ATCAACGGGdUAAAAATGGGTTCCTTGTCTGATTCTACTCCA-3' SEQ ID No. 118
Sc_AaPKS3/ALS-R 5'-CGTGCGAdUTTAGACTGGTGGCAAAGAATGCAACA-3' SEQ ID No. 119
Promoter-F 5'-ACGTATCGCdUGTGAGTCGTATTACGGATCCTTG-3' SEQ ID No. 120
Promoter-R 5'-CGTGCGAdUGCCGCTTGTTTTATATTTGTTG-3' SEQ ID No. 121

Generation of plasmid constructs for expression in S. cerevisiae

The primers included 5' overhangs that allowed for directional cloning into the 2-micron pBOSAL1 vector by the Uracil-Specific Excision Reagent (USER) cloning technique described in Nour-Eldin et al. 2006 (Hussam H. Nour-Eldin, Bjarne G. Hansen, Morten H. H. Nørholm, Jacob K. Jensen, and Barbara A. Halkier. Advancing uracil-excision based cloning towards an ideal technique for cloning PCR fragments. Nucleic Acids Res. 2006, 34(18): e122). The PGK1 promoter was also PCR amplified from the vector pSP-G2, using the primers PGK1-d and PGK_F, as described in Mikkelsen et al. 2012 (Mikkelsen MD, Buron LD, Salomonsen B, Olsen CE, Hansen BG, Mortensen UH, Halkier BA. Microbial production of indolylglucosinolate through engineering of a multi-gene pathway in a versatile yeast expression platform. Metab Eng. 2012; 14: 104-111). The PCR amplicons were purified by 1% agarose gel electrophoresis and the Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare). The recipient vector, pCfB257, was digested with AsiSI and Nb.BsmI, and the restriction enzymes were subsequently heat inactivated. The individual purified coding sequences were combined with the digested recipient vector and the purified promoter element, treated with the USER enzyme mix (NEB) and transformed into chemically competent E. coli DH5-alpha cells, as described in Nour-Eldin et al. 2006. Directional cloning resulted in the creation of an expression cassette, as described in Mikkelsen et al. 2012. Transformants were selected on Luria-Bertani (LB) agar supplemented with ampicillin. Plasmid DNA from colonies was purified using the GenElute kit (Sigma-Aldrich), and the size and restriction enzyme digestion pattern were analyzed and compared to the theoretically expected sizes and patterns for the individual plasmids. Final verification of the five constructed plasmids consisted of two overlapping sequencing reactions.
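The structure of the USER primers can be illustrated with a short sketch that splits a primer at its single dU residue into the 5' tail used for uracil excision and the gene-specific annealing part. The function name split_user_primer is hypothetical, and treating everything up to and including dU as the tail is an assumption made for illustration; the two sequences are the Sc_Gh_2-PS primers from the primer list.

```python
def split_user_primer(primer):
    """Split a primer written with 'dU' into (5' tail incl. dU, annealing part).

    Assumption: the primer contains exactly one dU and everything 5' of it
    belongs to the USER tail.
    """
    seq = primer.replace(" ", "")  # tolerate spacing introduced by text extraction
    tail, sep, annealing = seq.partition("dU")
    if not sep:
        raise ValueError("primer contains no dU residue")
    return tail + "dU", annealing

fwd = "ATCAACGGGdUAAAAATGGGTTCCTACTCTTCTGATGATGTTG"  # Sc_Gh_2-PS-F
rev = "CGTGCGAdUTTAGTTACCATTAGCAACAGCAGCAGTAACTC"    # Sc_Gh_2-PS-R

fwd_tail, fwd_anneal = split_user_primer(fwd)
rev_tail, rev_anneal = split_user_primer(rev)
print(fwd_tail, fwd_anneal)
print(rev_tail, rev_anneal)
```

In this pair the forward annealing part begins with AAAAATG, so the start codon is preceded by a short A-rich stretch, and the reverse annealing part begins with TTA, the reverse complement of a TAA stop codon.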

The validated plasmids were digested with NotI to liberate the expression/targeting cassette from each of the five plasmids. The liberated expression cassettes were transformed into competent S. cerevisiae CEN.PK102-5B cells, mating type a, via the lithium acetate/single-stranded carrier DNA/polyethylene glycol transformation method (Gietz, R.D., Schiestl, R.H., 2007. "High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method". Nat. Protoc. 2, 31-34). Transformants were selected by culturing on SC-Leu agar plates as described in Mikkelsen et al. 2012. Correct transformants were identified by colony-PCR using the gene-specific primers.

Growth of S. cerevisiae, metabolite extraction and LC-MS/MS analysis

The verified S. cerevisiae strains, called Sc.CEN.PK::2m::2-PS, Sc.CEN.PK::2m::PCS, Sc.CEN.PK::2m::HKS, Sc.CEN.PK::2m::PKS3 and Sc.CEN.PK::2m::OKS, were cultured in 300 ml Erlenmeyer flasks with 100 ml of either liquid SC-Ura or Yeast-Peptone-Dextrose medium (REF). The cultures were grown for 3 days at 30 °C with orbital shaking at 150 rpm, after which the cells were harvested by centrifugation. The produced metabolites were extracted from the cells using isopropanol:ethyl acetate (1:3 v/v) with 1% formic acid, and from the medium using ethyl acetate. The solvents were evaporated and the analytes were resuspended in HPLC-grade methanol. The analytes were separated using a Dionex UltiMate 3000 UHPLC equipped with a diode array detector (DAD) and hyphenated to a Q-TOF mass spectrometer. The samples were analyzed with three different injection volumes: 1 μl, 5 μl and 10 μl. For separation in the UHPLC system, a reversed-phase Kinetex C18 column (100 mm × 2.1 mm, 2.6 μm) was used; the temperature was maintained at 40 °C and the flow rate at 400 μl/min. The mobile phases consisted of MilliQ water with 20 mM formic acid (A) and acetonitrile with 20 mM formic acid (B). The analytes were eluted using a gradient starting at 10% solvent B and increasing to 100% solvent B over a period of 15 minutes. The column was washed with 100% solvent B for 3 minutes and re-equilibrated for 2.4 minutes with 10% B before the next sample was injected. The analytes were detected via an online DAD (Dionex Ultimate 3000) from 200 to 600 nm and an online maXis 3G Qq-Oa-TOF (Bruker Daltonics GmbH). In the MS, the analytes were ionized by electrospray operating in positive mode, with the capillary voltage at 4.5 kV, nebulizer gas at 2.4 bar, drying gas flow at 12 ml/min and a drying temperature of 220 °C. The MS was used in full scan mode in the mass range of 100-1000 Da. The instrument was calibrated using sodium formate (HCOONa) (Fluka, analytical grade).
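The UHPLC elution program described above can be written as a piecewise function of time. The sketch below is illustrative only: it assumes that the ramp from 10% to 100% B is linear and that the return to 10% B after the wash is an instantaneous step, neither of which is stated explicitly; the function name percent_b is hypothetical.

```python
RAMP_END = 15.0              # min, end of the 10% -> 100% B ramp
WASH_END = RAMP_END + 3.0    # min, end of the 100% B wash
RUN_END = WASH_END + 2.4     # min, end of re-equilibration at 10% B

def percent_b(t_min):
    """Percent solvent B at time t_min (minutes after injection)."""
    if not 0.0 <= t_min <= RUN_END:
        raise ValueError("time outside the run")
    if t_min <= RAMP_END:
        # Assumed linear ramp from 10% to 100% B.
        return 10.0 + (100.0 - 10.0) * t_min / RAMP_END
    if t_min <= WASH_END:
        return 100.0
    # Assumed step change back to the starting conditions.
    return 10.0

for t in (0.0, 7.5, 15.0, 17.0, 19.0):
    print(t, percent_b(t))
```

Under these assumptions one injection cycle lasts 20.4 minutes, and a compound eluting at 4.89 minutes would leave the column at roughly 39% solvent B.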
The obtained data were processed and handled using Compass DataAnalysis v. 4.0 SP4 Build 281 (Bruker Daltonics). Bruker Daltonics Compass IsotopicPattern was used for calculating isotopic patterns of the pseudo-molecular ion and adducts. An in-house standard of triacetic acid lactone (the spontaneously folded triketide) was run under the same conditions to confirm the identity of the produced triketide. Identification of the other aromatic polyketides was performed via detection of the monoisotopic molecular mass ([M+H]⁺), supported by the maximal UV absorption wavelengths (nm) for the individual compounds as specified in figure 4 in Karppinen et al. 2008, Octaketide-producing type III polyketide synthase from Hypericum perforatum is expressed in dark glands accumulating hypericins, FEBS Journal 275(17): 4329-4342.
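Identification by monoisotopic mass relies on comparing an observed [M+H]⁺ with the value calculated from a molecular formula. A minimal sketch of that calculation is given below; the helper name mh_plus and the small element table are assumptions made for illustration, and the worked case is triacetic acid lactone, whose formula is C6H6O3.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647

def mh_plus(formula):
    """Calculated m/z of the singly protonated ion [M+H]+ for a simple
    formula such as 'C6H6O3' (no brackets, charges or isotope labels)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass + PROTON

def ppm_error(observed, calculated):
    """Relative deviation of an observed m/z from the calculated one, in ppm."""
    return (observed - calculated) / calculated * 1e6

print(round(mh_plus("C6H6O3"), 4))  # triacetic acid lactone
```

The calculated value for triacetic acid lactone is about 127.0390, so a standard run under the same conditions would be expected to give an [M+H]⁺ signal within a few ppm of that value on a calibrated Q-TOF instrument.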

Results:

Expression of the five PKSs in S. cerevisiae resulted in production of new metabolites not observed in the reference strain not expressing any of the five genes (Table 1, Figure 2 and Figure 3).

Table 1: Products produced from the heterologous expression of type III PKS in S. cerevisiae.

RT: retention time; [M+H]⁺: positive molecular ion mass; '+' indicates that the given compound was detected upon expression of the given PKS; 'nd' indicates that the compound was not detected in the sample.

Conclusion:

Heterologous expression of the five different type III PKSs in S. cerevisiae resulted in the production of novel compounds, representing spontaneously folded tri-, penta-, hexa-, hepta- and octaketides, in the individual strains. These results demonstrate that it is possible to functionally express type III PKSs in S. cerevisiae and obtain products similar to those reported in the literature for in vitro experiments with purified enzymes. The compounds that have previously been obtained in in vitro experiments are the result of spontaneous folding/cyclization of the formed linear non-reduced polyketides. The example shows that S. cerevisiae does not express any endogenous enzymes capable of preventing or altering the spontaneous folding/cyclization pattern. This demonstrates that S. cerevisiae does not contain any enzymatic activities that will interfere with attempts to control and direct folding of the formed linear non-reduced polyketide by introducing heterologous cyclases/aromatases.

Example 2 - Combining the PKS library with a library of 'small molecule foldases' in S. cerevisiae

This example aims to show how different combinations of PKSs and cyclases can result in the formation of a range of different aromatic compounds. This concept is illustrated in Figure 4.

Methods

Four different 'small molecule foldases', comprising two bacterial cyclases/aromatases and two product template (PT) domains dissected from fungal type I iterative polyketide synthases, were selected for heterologous expression in S. cerevisiae: ZhuI from the bacterium Streptomyces sp. R1128 (C7-C12), gra-orf4 from the bacterium Streptomyces violaceoruber (expected C9-C14), BIK1-PT from the fungus Fusarium graminearum (expected C2-C7) and mdpG-PT from the fungus Aspergillus nidulans (expected C6-C11).

The genes were codon optimized for expression in S. cerevisiae using the GeneArt GeneOptimizer algorithm (LifeTechnologies). The de novo synthesized genes were delivered in shuttle vectors, and the coding sequences were amplified by PCR using the primers listed below:

Primers used for the construction process, where dU represents 2'-deoxyuridine:

Sc_ZhuI-F 5'-AGCGATACGdUAAAAATGAGACACGTTGAACACACAGTTACCG-3' SEQ ID No. 122
Sc_ZhuI-R 5'-CACGCGAdUTTATTATGCAGTTACGGTACCAACACCAC-3' SEQ ID No. 123
Sc_BIK1-PT-F 5'-AGCGATACGUAAAAATGAGATTGTCCGATTCCGTTCACA-3' SEQ ID No. 124
Sc_BIK1-PT-R 5'-CACGCGAUTTAAATCAAACCAGAAGCTGAACCAACTG-3' SEQ ID No. 125
Sc_gra-orf4-F 5'-AGCGATACGdUAAAAATGGCTAGAACTGCTGCTTTGC-3' SEQ ID No. 126
Sc_gra-orf4-R 5'-CACGCGAdUTTAACCTGCTTCAGCAGCTTCAGC-3' SEQ ID No. 127
Sc_mdpG-PT-F 5'-AGCGATACGUAAAAATGTCTGGTTTGAGAACTTCCACCG-3' SEQ ID No. 144
Sc_mdpG-PT-R 5'-CACGCGAUTTAGACCAAAGCTTTAGCAGCAACTGAA-3' SEQ ID No. 145

The four 'small molecule foldases' encoding genes were cloned into the pCfB389 vector as described for the five type III PKS genes in Example 1. The vector allows for targeted integration into the XI-2 site in the genome of S. cerevisiae, as described in Mikkelsen et al. 2012. The expression cassettes were transformed into Sc.CEN.PK 111-61A, mating type alpha, and transformants were selected on SC-Ura plates. Correct transformants were identified by colony-PCR using the gene-specific primers. The verified strains are hereafter referred to as Sc.CEN.PK::XI-2::ZhuI, Sc.CEN.PK::XI-2::gra-orf4, Sc.CEN.PK::XI-2::BIK1-PT and Sc.CEN.PK::XI-2::mdpG-PT, respectively.

The S. cerevisiae strains Sc.CEN.PK::2m::HKS and Sc.CEN.PK::2m::OKS, described in Example 1, are used in the present example (Example 2) to exemplify a library of different type III PKSs that produce polyketides of different lengths. The four foldase strains were crossed with the type III PKS HKS-expressing strain Sc.CEN.PK::2m::HKS to form diploids, yielding four new combinatory strains each containing a PKS and a cyclase/aromatase. The Sc.CEN.PK::2m::OKS strain was crossed with Sc.CEN.PK::XI-2::ZhuI. Mating between the PKS-carrying strains (mating type a, Leu marker) and the foldase-carrying strains (mating type alpha, URA3 marker) was performed by co-inoculating the respective strain combinations on YPD agar plates. The plates were incubated at 30 °C for 8 hours, after which the cultures were replica plated onto SC-leu-ura, to select for diploids containing both selective markers, and incubated at 30 °C for four days. Colonies from the double selective plates were streaked onto fresh SC-leu-ura plates to purify them. Single colonies of the diploids containing both the PKS and a foldase were inoculated in shake flasks with 20 mL Delft Synthetic Minimal Medium lacking leucine and uracil, but with added histidine. The cultures were incubated at 30 °C with shaking for 4-5 days.
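The selection logic of the mating step can be sketched as follows: only diploids that inherit the marker of the PKS strain and the marker of the foldase strain grow on medium lacking both leucine and uracil. The dictionaries, the marker label "LEU" and the function names are illustrative assumptions; the crosses are those between the HKS strain and the four named foldase strains, plus OKS with ZhuI.

```python
# Marker assignments follow the text; the data structures are illustrative.
PKS_STRAINS = {"HKS": {"LEU"}, "OKS": {"LEU"}}            # mating type a
FOLDASE_STRAINS = {name: {"URA3"} for name in
                   ("ZhuI", "gra-orf4", "BIK1-PT", "mdpG-PT")}  # mating type alpha

def grows_on_sc_leu_ura(markers):
    """A strain grows without leucine and uracil only if it carries both markers."""
    return {"LEU", "URA3"} <= set(markers)

def cross(pks, foldase):
    """Return (name, markers) of the diploid formed by one a x alpha mating."""
    markers = PKS_STRAINS[pks] | FOLDASE_STRAINS[foldase]
    return f"{pks} x {foldase}", markers

# HKS is crossed with every foldase strain, OKS with ZhuI only.
planned = [("HKS", f) for f in FOLDASE_STRAINS] + [("OKS", "ZhuI")]
diploids = dict(cross(p, f) for p, f in planned)

for name, markers in diploids.items():
    print(name, sorted(markers), grows_on_sc_leu_ura(markers))
```

Neither haploid parent passes the double selection on its own, which is why replica plating onto SC-leu-ura enriches for diploids carrying both a PKS and a foldase.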

The production of novel metabolites was analyzed by UHPLC-HRMS as described in Example 1.

Results:

Combining DluHKS (type III PKS) with the dissected product template domain mdpG-PT or BIK1-PT resulted in the production of a novel compound with an [M+H]⁺ of 225.1120 m/z, which eluted at 4.89 minutes (Figure 5A). The UV spectrum of the compound (Figure 5B) shows that the compound includes a conjugated bond system (absorption maxima at 222 nm and 280 nm) similar to what is found in phenolic compounds with a single aromatic group.

Co-expression of DluHKS (type III PKS) and the cyclase gra-orf4 resulted in a 9-fold higher accumulation of a compound with an [M+H]⁺ of 191.0707 eluting at 3.95 minutes (Figure 6A), compared to when DluHKS was expressed alone. The absorption maxima at 222 nm and 290 nm (Figure 6B) support that the compound includes a conjugated bond system characteristic of aromatic compounds.

Expression of DluHKS (type III PKS) with the dissected product template (PT) domain mdpG-PT or BIK1-PT resulted in a significant increase in the concentrations of two compounds with an [M+H]⁺ of 235.0606, eluting at 2.86 minutes and 3.08 minutes (Figure 7A). The compound at 2.86 minutes has absorption maxima at 212 nm and 302 nm, while the compound at 3.08 minutes absorbs at 220 nm, 250 nm and 294 nm, supporting that the two compounds include aromatic conjugated bond systems (Figure 7B).

Combining DluHKS (type III PKS) with the dissected product template (PT) domain from mdpG resulted in a seven-fold increase in the concentration of a compound with an [M+H]⁺ of 237.0757 eluting at 2.58 minutes (Figure 8A). The compound absorbs at 218 nm and 276 nm (Figure 8B), indicative of an aromatic conjugated bond system.

Co-expression of DluHKS (type III PKS) with the cyclase ZhuI resulted in a six-fold increase in the concentration of a compound eluting at 3.57 minutes with an [M+H]⁺ of 121.0649 (Figure 9A). The compound exhibits absorption maxima at 222 nm and 278 nm, indicating that the compound includes an aromatic conjugated bond system (Figure 9B).

Conclusion:

These results show that co-expression of a type III PKS and a heterologous cyclase/aromatase, or a dissected product template domain from a type I iterative PKS, in the host cell Saccharomyces cerevisiae results in the formation of compounds not observed when the PKS is expressed alone. In several cases, co-expression resulted in a significant increase in the formation of aromatic compounds otherwise produced at low concentrations when the PKS is expressed alone. These results surprisingly show that 'small molecule foldases' originating from bacterial or fungal type I and type II PKS systems, which in nature act on ACP-bound polyketides, can act on free non-reduced linear polyketides produced by type III PKSs.

Example 3 - Introducing a type III polyketide synthase (OKS) together with cyclases/ketoreductase CYC, CYC_DH and KR (cyclase superfamily) into Nicotiana benthamiana (N. benthamiana).

This example illustrates how the introduction of cyclases/ketoreductases, together with the type III polyketide synthase OKS, in N. benthamiana can further increase the compound diversity. This concept is illustrated in Figure 4.

Methods

Generation of plasmid constructs for expression in N. benthamiana.

CYC (actIORF5) and CYC_DH (actIORF4) from the actinorhodin biosynthetic gene cluster in Streptomyces coelicolor A3(2) (Genbank accession: X63449.1) were codon optimized for N. benthamiana expression, whereas KR (Genbank accession: M19536) was codon optimized for E. coli expression. All three genes were purchased as synthetic DNA fragments from Genscript together with the native sequence of OKS from Aloe arborescens (Genbank accession: AY567707). All synthetic fragments were used as PCR templates with compatible deoxyuracil (dU)-containing primers (see table 1) to generate constructs that were cloned into pEAQ-HT-USER (Sainsbury et al., 2009) by USER technology. All pEAQ-HT-USER plasmid constructs were transformed into the Agrobacterium tumefaciens strain AGL-1 and infiltrated into leaves of N. benthamiana plants as described in Bach et al. (Bach, S.S., Bassard, J.E., Andersen-Ranberg, J., Møldrup, M.E., Simonsen, H.T., Hamberger, B. (2014). High-Throughput Testing of Terpenoid Biosynthesis Candidate Genes Using Transient Expression in Nicotiana benthamiana. In M Rodriguez Concepcion, ed, Plant Isoprenoids, Methods in Molecular Biology, Vol. 1153. Humana Press, New York.).

Primer sequences for amplification of different gene constructs.

Gene fragment / Primer sequence / SEQ ID No.

OKS-Forward 5'-GGCTTAA/dU/ATGAGTTCACTCTCCAACGCTTCCCATC-3' SEQ ID No. 130
OKS-Reverse 5'-GGTTTAA/dU/TTACATGAGAGGCAGGCTGTGGAGAAGGATAGT-3' SEQ ID No. 131
ZhuI-Forward 5'-GGCTTAA/dU/ATGAGGCATGTCGAGCAT-3' SEQ ID No. 132
ZhuI-Reverse 5'-GGTTTAA/dU/TTATGCCGTGACAGTTCCGACAC-3' SEQ ID No. 133
ZhuJ-Forward 5'-GGCTTAA/dU/ATGTCCGGACGTAAGACG-3' SEQ ID No. 134
ZhuJ-Reverse 5'-GGTTTAA/dU/TTAATCTTCCTCCTCCTGTTCAA-3' SEQ ID No. 135
CYC-Forward 5'-GGCTTAA/dU/ATGACTGTTGAAGTTCGT-3' SEQ ID No. 136
CYC-Reverse 5'-GGTTTAA/dU/TTAAGCCAAGCAAGTAGGAAGTT-3' SEQ ID No. 137
CYC_DH-Forward 5'-GGCTTAA/dU/ATGTCAAGACCTGGAGAA-3' SEQ ID No. 138
CYC_DH-Reverse 5'-GGTTTAA/dU/TTAGCTTGCCGGCCCAGC-3' SEQ ID No. 139
KR-Forward 5'-GGCTTAA/dU/ATGGCAACCCAGGATAGCGAAGTTGCAC-3' SEQ ID No. 140
KR-Reverse 5'-GGTTTAA/dU/TTAATAGTTGCCCAGACCACCACAAACATTCAG-3' SEQ ID No. 141
HpPKS2-Forward 5'-GGCTTAA/dU/ATGGGTTCCCTTGACAATGGT-3' SEQ ID No. 142
HpPKS2-Reverse 5'-GGTTTAA/dU/TTAGAGAGGCACACTTCGGAGAA-3' SEQ ID No. 143
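A simple consistency check on primer pairs of this kind is that the forward primer places a start codon directly after the /dU/ residue and that the reverse primer begins its gene-specific part with the reverse complement of a stop codon. The sketch below is an illustrative check only; the function name check_cds_primers is hypothetical, and the two sequences are the OKS primers from the list above.

```python
# Reverse complements of the three stop codons TAA, TAG and TGA.
STOP_RC = {"TTA", "CTA", "TCA"}

def check_cds_primers(forward, reverse):
    """Return True if the forward primer has ATG right after /dU/ and the
    reverse primer has a reverse-complemented stop codon right after /dU/."""
    fwd_anneal = forward.replace(" ", "").split("/dU/")[1]
    rev_anneal = reverse.replace(" ", "").split("/dU/")[1]
    return fwd_anneal.startswith("ATG") and rev_anneal[:3] in STOP_RC

oks_fwd = "GGCTTAA/dU/ATGAGTTCACTCTCCAACGCTTCCCATC"
oks_rev = "GGTTTAA/dU/TTACATGAGAGGCAGGCTGTGGAGAAGGATAGT"

print(check_cds_primers(oks_fwd, oks_rev))
```

Running the same check over every pair in the list is a quick way to catch forward and reverse primers that have been swapped or mislabelled during transcription.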

Metabolite extraction and LC-MS/MS analysis

Compounds produced when OKS was co-expressed with CYC, CYC_DH and KR were extracted from discs (Ø = 3 cm) of agroinfiltrated N. benthamiana leaves. Leaf discs, excised with a cork borer, were flash frozen in liquid nitrogen. 0.5 ml of extraction buffer (85% (v/v) methanol, 0.1% (v/v) formic acid), equilibrated to 50 °C, was added to each frozen leaf disc, followed by incubation for 1 hour at 50 °C with agitation at 600 rpm. The supernatant was isolated and passed through a MultiScreen HTS HV 0.45 μm filter plate (Merck Millipore). The filtered supernatant was subjected to LC-MS/MS analysis, which was performed on an Agilent 1200 HPLC coupled to a Bruker micrOTOF-Q II mass spectrometer equipped with an electrospray ionization source. Chromatographic separation was obtained on a Luna C18(2) column (150 × 4.6 mm, 3 μm, 100 Å, Phenomenex) maintained at 40 °C. The aqueous eluent (A) consisted of water/acetonitrile (95:5, v/v) and the organic eluent (B) consisted of water/acetonitrile (5:95, v/v); both were acidified with 0.1% formic acid. A linear gradient elution profile was used: 0 min, 0% B; 30 min, 100% B; 33 min, 100% B; 35 min, 0% B. The flow rate was maintained at 0.5 mL/min, with 10 min of equilibration.
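Because both eluents are water/acetonitrile mixtures, the actual acetonitrile content of the mobile phase differs from the nominal %B. The following sketch interpolates the stated gradient table linearly and converts %B into percent acetonitrile (v/v); the function names are hypothetical, and volume contraction on mixing and the formic acid content are ignored.

```python
# (time in min, %B) points of the gradient given in the text.
GRADIENT = [(0.0, 0.0), (30.0, 100.0), (33.0, 100.0), (35.0, 0.0)]

ACN_IN_A = 5.0    # % acetonitrile in eluent A (water/acetonitrile 95:5)
ACN_IN_B = 95.0   # % acetonitrile in eluent B (water/acetonitrile 5:95)

def percent_b(t_min):
    """Percent eluent B at time t_min by linear interpolation between points."""
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the gradient table")

def percent_acetonitrile(t_min):
    """Nominal acetonitrile content (% v/v) of the mixed mobile phase."""
    fraction_b = percent_b(t_min) / 100.0
    return ACN_IN_A * (1.0 - fraction_b) + ACN_IN_B * fraction_b

for t in (0.0, 15.0, 30.0, 34.0):
    print(t, percent_b(t), percent_acetonitrile(t))
```

The mobile phase therefore runs from 5% to 95% acetonitrile over the 30-minute ramp, rather than from 0% to 100%.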

Results:

Introduction and co-expression of OKS and KR together with CYC and/or CYC_DH in N. benthamiana resulted in the production of novel compounds with the masses and retention times shown in Table 2 and Figure 10.

Table 2: Novel compounds produced from the in vivo combination of OKS with cyclases/ ketoreductases.

+ indicates the combinations in which the polyketide synthase and foldases produced specific novel polyketide-derived compounds. LC-MS chromatograms in which the novel polyketide-derived compounds were identified from the different combinations (B-E) can be found in Figure 3. RT: retention time; m/z: mass-to-charge ratio; ESI+: positive electrospray ionisation.

Conclusion

The heterologous co-expression, also defined as combinations, of OKS from Aloe arborescens with foldases (CYC and CYC_DH) and KR from Streptomyces coelicolor A3(2) gives rise to the production of novel compounds, including polyketides of different chain lengths and derivatives thereof, in N. benthamiana.

Example 4 - Introducing a type III polyketide synthase (HpPKS2) together with cyclases/ketoreductase ZhuI, ZhuJ, CYC, CYC_DH and KR (cyclase superfamily) into N. benthamiana.

Methods

Generation of plasmid constructs for expression in N. benthamiana.

CYC (actIORF5) and CYC_DH (actIORF4) from the actinorhodin biosynthetic gene cluster in Streptomyces coelicolor A3(2) (Genbank accession: X63449.1), ZhuI (Genbank accession: AAG30197) and ZhuJ (Genbank accession: AAG30196) were codon optimized for N. benthamiana expression, whereas KR (Genbank accession: M19536) was codon optimized for E. coli expression. All five genes were purchased as synthetic DNA fragments from Genscript together with the native sequence of HpPKS2 from Hypericum perforatum (Genbank accession: HQ529467). All synthetic fragments were used as PCR templates with compatible deoxyuracil (dU)-containing primers (see table 1) to generate constructs that were cloned into pEAQ-HT-USER by USER technology. All pEAQ-HT-USER plasmid constructs were transformed into the Agrobacterium tumefaciens strain AGL-1 and infiltrated into leaves of N. benthamiana plants as described in Bach et al. (Bach, S.S., Bassard, J.E., Andersen-Ranberg, J., Møldrup, M.E., Simonsen, H.T., Hamberger, B. (2014). High-Throughput Testing of Terpenoid Biosynthesis Candidate Genes Using Transient Expression in Nicotiana benthamiana. In M Rodriguez Concepcion, ed, Plant Isoprenoids, Methods in Molecular Biology, Vol. 1153. Humana Press, New York.).

Metabolite extraction and LC-MS/MS analysis

The extraction protocol was as described in Example 3.

Results

The co-expression of the type III polyketide synthase HpPKS2 together with ZhuI, ZhuJ and/or KR in N. benthamiana resulted in the production of novel polyketide-derived compounds. Among these novel compounds, the heptaketide aloesone, aloesol and O-glucosylated varieties thereof were identified (Figure 11).

Conclusion

The heterologous co-expression, also defined as combinations, of HpPKS2 with foldases (ZhuI and ZhuJ) and KR from Streptomyces coelicolor A3(2) gives rise to the production of novel compounds, including polyketides of different chain lengths and derivatives thereof, in N. benthamiana.

REFERENCES

Bach, S.S., Bassard, J.E., Andersen-Ranberg, J., Møldrup, M.E., Simonsen, H.T., Hamberger, B. (2014). High-Throughput Testing of Terpenoid Biosynthesis Candidate Genes Using Transient Expression in Nicotiana benthamiana. In M Rodriguez Concepcion, ed, Plant Isoprenoids, Methods in Molecular Biology, Vol. 1153. Humana Press, New York.

Sainsbury, F., Thuenemann, E.C., Lomonossoff, G.P. (2009). pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnology Journal 7(7): 682-693.
