Title:
BIOSYNTHETIC POLYPEPTIDE AND METHODS OF USE
Document Type and Number:
WIPO Patent Application WO/2015/156739
Kind Code:
A1
Abstract:
The present disclosure provides for a polypeptide scaffold capable of identifying and developing polyketide synthase enzymes. Moreover, the polypeptide of the present disclosure is utilized in methods for screening to verify the activity and functionality of identified polyketide synthase enzymes. In addition, the polypeptide of the present disclosure is utilized in a method for producing polyketides from Acyl-Coenzyme A derivatives.

Inventors:
HO YING SWAN (SG)
WONG FONG TIAN (SG)
HOON SHAWN (SG)
ZHANG MINGZI (SG)
Application Number:
PCT/SG2015/050063
Publication Date:
October 15, 2015
Filing Date:
April 07, 2015
Assignee:
AGENCY SCIENCE TECH & RES (SG)
International Classes:
C12N9/10; C12P7/46; C12P17/08; C12Q1/48
Domestic Patent References:
WO2012058686A2 2012-05-03
Foreign References:
US20030068676A1 2003-04-10
US6117659A 2000-09-12
Other References:
HONG, H. ET AL.: "Chain initiation on type I modular polyketide synthases revealed by limited proteolysis and ion-trap mass spectrometry", THE FEBS JOURNAL, vol. 272, 2005, pages 2373 - 2387, XP055230465
BYCROFT, M. ET AL.: "Efficient purification and kinetic characterization of a bimodular derivative of the erythromycin polyketide synthase", EUROPEAN JOURNAL OF BIOCHEMISTRY, vol. 267, 2000, pages 520 - 526, XP055230471
CANE, D.E. ET AL.: "Erythromycin biosynthesis. Highly efficient incorporation of polyketide chain elongation intermediates into 6-deoxyerythronolide B in an engineered Streptomyces host", THE JOURNAL OF ANTIBIOTICS, vol. 48, 1995, pages 647 - 651, XP055230474
MENZELLA, H.G. ET AL.: "Combinatorial polyketide biosynthesis by de novo design and rearrangement of modular polyketide synthase genes", NATURE BIOTECHNOLOGY, vol. 23, 2005, pages 1171 - 1176, XP002546862
CRAWFORD, J.M. ET AL.: "Deconstruction of iterative multidomain polyketide synthase function", SCIENCE, vol. 320, 2008, pages 243 - 246, XP055230485
Attorney, Agent or Firm:
SPRUSON & FERGUSON (ASIA) PTE LTD (Robinson Road Post Office, Singapore 1, SG)
Claims:
1. A polypeptide comprising an amino acid sequence encoding at least an acyl carrier protein (ACP) domain and a thioesterase (TE) domain, and one or more further domains selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain (Hal), a sulfotransferase (ST) domain and enoyl reductase (ER) domain.

2. The polypeptide of claim 1, wherein the ACP domain and TE domain are located adjacent to each other.

3. The polypeptide of claim 2, wherein the ACP domain and TE domain form a didomain derived from a polyketide synthase (PKS).

4. The polypeptide of claim 3, wherein the PKS is a type I PKS, type II PKS or derived from fatty acid synthases.

5. The polypeptide of claim 4, wherein the PKS is a type I PKS.

6. The polypeptide of claim 4 or 5, wherein the type I PKS is 6-deoxyerythronolide synthase (DEBS).

7. The polypeptide of any of claims 1 to 6, wherein the one or more further domains selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain (Hal), a sulfotransferase (ST) domain and enoyl reductase (ER) domain are discrete from the polypeptide encoding at least an ACP domain and TE domain.

8. The polypeptide of any one of claims 1 to 6, wherein the ACP, TE and said one or more further domains are covalently linked via an amino acid sequence encoding a linker or scaffold encoding region.

9. The polypeptide of claim 8, wherein the C-terminus of said ACP domain is covalently linked to the N-terminus of the TE domain.

10. The polypeptide of claim 9, wherein the one or more further domains are covalently linked to the C-terminus of the TE domain or N-terminus of the ACP domain.

11. The polypeptide of claim 8, wherein the C-terminus of said TE domain is covalently linked to the N-terminus of the ACP domain.

12. The polypeptide of claim 11, wherein the one or more further domains are covalently linked to the N-terminus of the TE domain or C-terminus of the ACP domain.

13. The polypeptide of claim 8, wherein the one or more further domains are positioned in between the ACP and TE domains.

14. An isolated nucleic acid encoding a polypeptide of any one of claims 1-13.

15. A recombinant DNA expression vector or plasmid comprising the isolated nucleic acid of claim 14 operably linked to a promoter sequence.

16. The recombinant DNA expression vector or plasmid of claim 15, wherein said vector or plasmid is capable of replicating autonomously.

17. A method for producing a polyketide, said method comprising reacting a polypeptide comprising an amino acid sequence encoding at least an acyl carrier protein (ACP) domain and a thioesterase (TE) domain, and one or more further domains selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain, a sulfotransferase (ST) domain and enoyl reductase (ER) domain with an Acyl-Coenzyme A derivative of a polyketide in the presence of a phosphopantetheinyl transferase and necessary cofactors to form a polyketide derived from said Acyl-Coenzyme A derivative.

18. The method of claim 17, wherein the Acyl-Coenzyme A derivative is selected from the group consisting of Acetyl-CoA, Acetoacetyl-CoA, Malonyl-CoA, Succinyl-CoA, fatty acyl-CoA, phenylacetyl-CoA and Butyryl-CoA.

19. The method of claim 18, wherein the Acyl-Coenzyme A derivative is malonyl-CoA.

20. The method of any one of claims 17-19, wherein the polyketide is malonic acid.

21. The method of any one of claims 17 to 20, wherein the polypeptide and Acyl-Coenzyme A derivative of a polyketide are reacted at a temperature from about 1 to about 50°C, about 15 to about 40°C, or about 20 to about 40°C.

22. The method of claim 21, wherein the polypeptide and Acyl-Coenzyme A derivative of a polyketide are reacted at a temperature of about 4 to about 37°C.

23. The method of any one of claims 17-22, wherein the polypeptide and Acyl-Coenzyme A derivative of a polyketide are reacted for a period of at least 30 seconds.

24. The method of any one of claims 17-23, wherein the one or more further domains comprises an AT domain and wherein the C-terminus of the ACP domain is covalently linked to the N-terminus of the TE domain, and the C-terminus of the AT domain is covalently linked to N-terminus of the ACP domain.

25. The method of any one of claims 17-24, wherein the one or more further domains comprises an AT domain and wherein the AT domain is discrete from the polypeptide encoding at least an ACP domain and TE domain.

26. A method for screening for the activity of one or more PKS domains, comprising:

(a) transforming a microorganism with a vector or plasmid of claim 15 or 16, wherein the vector or plasmid further comprises a nucleic acid encoding for phosphopantetheinyl transferase;

(b) culturing the microorganism of step (a) to express a polypeptide according to any one of claims 1 to 13 and phosphopantetheinyl transferase;

(c) isolating the expressed polypeptide;

(d) incubating the polypeptide of step (c) with one or more substrates capable of reacting with one or more PKS domains to produce one or more metabolites derived from said substrate; and

(e) identifying and quantifying one or more metabolites derived from said substrate, wherein the identification and quantification of the one or more metabolites, relative to a control, indicates the activity of the one or more PKS domains.

27. The method of claim 26, wherein the one or more PKS domain in the polypeptide of any one of claims 1 to 13 is derived from genome mining of a bacterial genome for PKS genes.

28. The method of any one of claims 26 or 27, further comprising identifying and quantifying one or more intermediates covalently bound to the polypeptide, wherein the identification and quantification of one or more intermediates bound to the polypeptide, relative to a control, indicates that the polypeptide is active.

29. The method of any one of claims 26 to 28, wherein the step of identifying and quantifying comprises MS analysis.

30. The method of claim 29, wherein the MS analysis is LC-MS or MALDI-TOF MS.

Description:
Biosynthetic Polypeptide and Methods of Use

Cross-Reference to Related Applications

This application claims the benefit of priority of Singapore provisional application No. 10201401317Y, filed 7 April 2014, the contents of which are hereby incorporated by reference in their entirety for all purposes.

Technical Field

The present invention generally relates to the fields of microbiology, molecular biology, and biofuel technology. More specifically, the present application relates to biosynthetic polypeptide scaffolds, and methods for screening the activity of enzymatic domains, as well as methods for producing polyketides.

Background Art

Over the past decade, both academia and industry have increasingly turned to the use of microbes for the study of metabolic pathways and microbial biosynthesis of natural products, with applications in drug development or synthesis of biofuels through modified enzyme pathways. Microbial biosynthesis is advantageous as it affords green, complex stereochemical reactions at a lower cost and higher efficiency. Successful examples of microbial biosynthesis in drug development include the production of the plant-derived antimalarial drug artemisinin in yeast, and the production-scale biosynthesis of the cholesterol-lowering drug Zocor.

The characterization of biosynthetic pathways is a promising line of research for potentially leading to the development of chemical scaffolds with advantages in drug development and the discovery of enzymatic domains for biotechnological applications. As such, biosynthetic engineering of metabolic pathways may provide effective methods for the synthesis of drug compounds and biofuels.

The recent explosion of high-throughput sequencing efforts has led to the discovery of many biosynthetic enzyme gene families that could have biotechnological applications, depending on their activity and function. However, many difficulties still exist in the manipulation of these gene families. For example, these genes are often very large, making genetic engineering a challenging task even with current technologies. As a result, the journey from biosynthetic enzyme discovery to production and then to manipulation is a laborious, time-consuming and expensive process.

In this regard, there is a need in the art for adaptable and effective biosynthetic platforms to enable bioengineering of enzymatic pathways for drug discovery and development, and also the provision of biosynthetic pathways towards the production of biofuels and chemicals.

Summary of the Disclosure

In one embodiment, there is provided a polypeptide comprising an amino acid sequence encoding at least an acyl carrier protein (ACP) domain and a thioesterase (TE) domain, and one or more further domains selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain (Hal), a sulfotransferase (ST) domain and enoyl reductase (ER) domain.

In another embodiment, there is provided an isolated nucleic acid encoding a polypeptide as described herein.

In another embodiment, there is provided a recombinant DNA expression vector or plasmid comprising the isolated nucleic acid as described herein that is operably linked to a promoter sequence.

In another embodiment, there is provided a method for producing a polyketide, said method comprising reacting a polypeptide comprising an amino acid sequence encoding at least an acyl carrier protein (ACP) domain and a thioesterase (TE) domain, and one or more further domains selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain (Hal), a sulfotransferase (ST) domain and enoyl reductase (ER) domain with an Acyl-Coenzyme A derivative of a polyketide in the presence of a phosphopantetheinyl transferase to form a polyketide derived from said Acyl-Coenzyme A derivative.

In another embodiment, there is provided a method for screening for the activity of one or more PKS (polyketide synthase) domains, comprising:

(a) transforming a microorganism with a vector or plasmid as described herein, wherein the vector or plasmid further comprises a nucleic acid encoding for phosphopantetheinyl transferase;

(b) culturing the microorganism of step (a) to express a polypeptide as described herein and phosphopantetheinyl transferase;

(c) isolating the expressed polypeptide;

(d) incubating the polypeptide of step (c) with one or more substrates capable of reacting with one or more PKS domains to produce one or more metabolites derived from said substrate; and

(e) identifying and quantifying one or more metabolites derived from said substrate, wherein the identification and quantification of the one or more metabolites, relative to a control, indicates the activity of the one or more PKS domains.
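The comparison "relative to a control" in step (e) can be illustrated with a minimal Python sketch. It assumes that quantification yields an integrated peak area for the metabolite in both the enzyme-containing sample and the control. The function names, the `min_fold` cut-off of 3.0 and the `noise_floor` parameter are hypothetical choices made for illustration only; the disclosure does not specify a numerical criterion.

```python
def fold_over_control(sample_area: float, control_area: float,
                      noise_floor: float = 1.0) -> float:
    """Ratio of the metabolite peak area in the sample to that in the control.

    noise_floor (hypothetical) avoids division by zero when the control
    shows no detectable peak.
    """
    if sample_area < 0 or control_area < 0:
        raise ValueError("peak areas must be non-negative")
    if noise_floor <= 0:
        raise ValueError("noise_floor must be positive")
    return sample_area / max(control_area, noise_floor)


def is_active(sample_area: float, control_area: float,
              min_fold: float = 3.0, noise_floor: float = 1.0) -> bool:
    """Call a domain active when the sample exceeds the control by min_fold.

    min_fold is an illustrative threshold, not a value from the disclosure.
    """
    return fold_over_control(sample_area, control_area, noise_floor) >= min_fold
```

In practice the cut-off would be set from replicate controls rather than fixed in advance.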

Brief Description of Drawings

The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:

Fig. 1 shows (A) Polyketide examples. (B) Assembly-line arrangement of 6-deoxyerythronolide synthase (DEBS) for the production of 6-deoxyerythronolide (6DEB). KS: ketosynthase, AT: acyltransferase, ACP: acyl carrier protein, KR: ketoreductase, DH: dehydratase, ER: enoyl reductase, TE: thioesterase. (C) A comparison of the present disclosure with current standard genome mining strategies. The R and R' groups are potential points of modification.

Fig. 2 is a list of the top 50 domains seen in proteins containing both KS and ACP domains.

Fig. 3 shows protein expression of engineered proteins. The protein samples are taken from eluates of Ni-NTA purification. Lane 1. CurA Hal-Hal ACP, 2. CurA Hal-Hal ACP, 3. CurA Hal-Hal ACP (C-terminal his-tag), 4. CurA Hal-ACP6-TE, 5. CurM (ACP-ST-TE), 6. LmnJ MT-ACP6-TE, 7. Cur Hal-ACP-TE, 8. Bar MT-BarF ACP-ACP6-TE, 9. MelF MT-ACP6-TE. Hal: halogenase, ACP: acyl-carrier protein, TE: thioesterase, ST: sulfotransferase, MT: methyltransferase. Unless the domains are from DEBS, they are annotated. BarF (Lyngbya majuscula barbamide biosynthesis gene cluster), MelF (Melittangium lichenicola melithiazol gene cluster), LmnJ (Streptomyces atroolivaceus leinamycin biosynthetic gene cluster), Hal and ST and TE domains from curacin biosynthesis (Lyngbya majuscula).

Fig. 4 shows protein constructs and their functions. The underlined domains (i.e. ACP6-TE) are from DEBS. Other domains are annotated as before.

Fig. 5 is (A) a workflow for evaluation of PKS domains; (B) an LC-MS chromatogram showing the triketide lactone product peak (173 m/z) and (C) an MS/MS spectrum of the triketide lactone product.

Fig. 6 shows substrates and predicted products. A. O-methylation. B. C-methylation. C. Terminal alkene formation. D. Halogenation/Chlorination.

Fig. 7 is a chromatogram showing the chlorination product in control (no sfp enzyme) and enzyme-containing samples.

Fig. 8 is a phylogenetic tree of the native ACP domains for the CurA halogenation domain (ACP I, II, III) and the non-native DEBS ACP6 partner.

Fig. 9 is the proposed malonic acid pathway. (A) In vivo malonic acid pathway utilizing malonyl-CoA from fatty acid synthesis. (B) Proposed reactions of ACP-TE, sfp and AT for the formation of malonic acid. (C) In vitro reactions for malonic acid formation. Reaction 1 has a single turnover reaction whilst reaction 2 is predicted to have multiple turnovers in the presence of AT. ACP: acyl carrier protein, AT: acyltransferase, TE: thioesterase.

Fig. 10 is an example of MRM chromatograms for 13C-malonic acid (top) and malonic acid (bottom) in sample 1.

Fig. 11 is the ACP-TE protein expression. Flowthrough (lane 1), wash (lane 2) and elution (lane 3) of ACP-TE from Ni-NTA purification.

Fig. 12 shows the co-overexpression of PKS enzymes in malonyl-CoA overproducing E. coli cells. Coomassie (left panel) and His (right panel) stain of lysates from cells 1, 3 and 5 days after IPTG induction of PKS enzymes. Arrows indicate the expected sizes for sfp (30 kDa), AT (30 kDa) and ACP-TE (40 kDa).

Fig. 13 shows the MALDI-TOF analysis for O-methylation and C-methylation of MelF and LmnJ domains respectively. MALDI-TOF MS spectrum for MelF-ACP6-TE° with (A) no substrates compared with (B) acyl-CoA, sfp and SAM. (C) In vitro reaction, along with modifications on the ACP-serine shown. (D) Protein expression of MelF-ACP6-TE, BarF-ACP6-TE and LmnJ-ACP6-TE. (E) Observed masses for methylated products. ACP motif for all constructs is the same.

Fig. 14 shows LC-MS chromatogram with extracted ion peaks for O-methylation reaction product in control (no BarF-ACP-TE enzyme) and enzyme containing samples (m/z 187).

Description of Embodiments

The present disclosure is based on a combination of bioinformatics, protein engineering and mass spectrometric analysis to rapidly search and verify the activity of enzymatic domains for potential applications. In this regard, the present disclosure relates to an enzymatic polypeptide scaffold that aids in screening the activity and function of enzymatic domains, and that can also be manipulated for combinatorial engineering purposes and production of polyketides. The activity and function of enzymatic domains may be postulated based upon existing data and structural information, and subsequently verified by incorporating the enzymatic domains into the polypeptide scaffold. The activity of the identified enzymatic domains can be used to predict the activity and product of their native biosynthetic gene cluster. Moreover, the polypeptide scaffold is advantageous for the specific processing of intermediates and for providing higher effective substrate concentrations.

Standard strategies for elucidating the activity and function of enzymatic domains are illustrated in Figure 1C, whereby the present disclosure overcomes the associated disadvantages of these standard techniques, such as protein-interaction suitability, substrate library and protein solubility issues. As such, the present disclosure significantly cuts down on the enzymatic domain investigation process by allowing systematic rapid swapping of tailoring domains in a modular exchangeable system.

A particular biosynthetic family of enzymatic domains that may be applied in the polypeptide scaffold is the family of polyketide synthases (PKS). Similar to the fungal and discrete systems, polyketide synthases use simple carbon building blocks as feedstock to yield a rich repertoire of chemical modifications and scaffolds. Moreover, polyketide synthases produce stereochemically and structurally complex drug compounds, such as lovastatin, erythromycin and rifamycin. But more importantly, polyketide synthases accomplish their reactions in a modular assembly-line manner using simple carbon building blocks as feedstock, which allows for rational engineering and, more significantly, precise control over the end products. Accordingly, a focus of the present disclosure is on polyketide synthase domains for use in constructing a polypeptide scaffold.

Polyketide synthases are a family of multi-domain enzymes or enzyme complexes that produce polyketides and are classified into three groups: Type I polyketide synthases that are large, highly modular proteins; Type II polyketide synthases that are aggregates of monofunctional proteins; and Type III polyketide synthases that do not use acyl carrier protein (ACP) domains. Type I PKSs may be further subdivided into 1) iterative PKSs that re-use domains in a cyclic fashion; and 2) modular PKSs that contain a sequence of separate modules and do not repeat domains (with the exception of trans-acyltransferase (AT) domains). In this regard, a type I polyketide synthase module consists of several domains with defined functions that include Acyltransferase (AT); Acyl carrier protein (ACP); Ketosynthase (KS); Ketoreductase (KR); Dehydratase (DH); Enoylreductase (ER); Methyltransferase, O- or C- (α or β) (O-MT or C-MT); Sulfotransferase (ST); or Thioesterase (TE).

In addition to polyketide synthase domains for application in the polypeptide of the present disclosure, enzymatic domains derived from multi-domain enzyme families other than polyketide synthases may additionally be used. For example, domains from one or more nonribosomal peptide synthetases (NRPS) or one or more fatty acid synthases (FAS) may be applied to the polypeptide scaffold. For example, an NRPS domain such as halogenase (Hal) may be used in the present disclosure.

Accordingly, the present disclosure relates to a polypeptide constructed from one or more of an acyl carrier protein (ACP) domain, a thioesterase (TE) domain, an acyltransferase (AT) domain, a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain (Hal), a sulfotransferase (ST) domain and enoyl reductase (ER) domain.

In one embodiment, there is provided a polypeptide comprising an amino acid sequence encoding at least an acyl carrier protein (ACP) domain and a thioesterase (TE) domain. The polypeptide of the present disclosure may interchangeably be termed as a polypeptide scaffold or polypeptide platform comprising the ACP and TE.

Moreover, in one embodiment, there is provided a polypeptide comprising an amino acid sequence encoding at least an acyl carrier protein (ACP) domain and a thioesterase (TE) domain and one or more further domains selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain (Hal), a sulfotransferase (ST) domain and enoyl reductase (ER) domain.

The ACP, TE and one or more further domains may be "functional variants" that have a polypeptide sequence that is at least 70%, 75%, 80%, 85%, 90%, 95% or 99% identical to their reference amino acid sequence. Such variants may retain amino acid residues that are recognized as conserved for the enzyme, and may have non-conserved amino acid residues substituted, inserted or deleted, provided that the change does not affect, or has an insignificant effect on, the enzymatic activity as compared to the reference enzyme. The "functional variant" enzyme may be found in nature or be an engineered mutant thereof.
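As an illustration of the percent-identity criterion above, the following Python sketch computes identity between a reference domain and a candidate variant that have already been aligned (gaps written as "-"). The disclosure does not state how identity is calculated, so the convention used here, matches divided by the number of reference residues, is an assumption, and the function names are hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent identity of a query to a reference, given a pairwise alignment.

    Assumed convention: identical columns divided by the number of
    reference residues; columns where the reference has a gap are skipped.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_positions = 0
    matches = 0
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if ref_res == "-":
            continue  # insertion relative to the reference; not counted
        ref_positions += 1
        if ref_res == query_res:
            matches += 1
    if ref_positions == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * matches / ref_positions


def meets_identity_threshold(aligned_ref: str, aligned_query: str,
                             threshold: float = 70.0) -> bool:
    """True when the query reaches the chosen identity threshold (default 70%)."""
    return percent_identity(aligned_ref, aligned_query) >= threshold
```

Sequence identity alone does not establish that a variant is functional; the text above additionally requires that enzymatic activity is retained.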

In another embodiment, the one or more further domains comprise a "variant" of a PKS, NRPS or FAS domain as described herein that has a polypeptide sequence that is at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95% or 99% identical to a reference enzyme domain described herein. In particular, the one or more further domains may be derived from genome mining of enzymatic PKS, NRPS or FAS gene clusters and functionally annotated by a protein database, such as Pfam, to identify functional domains. As such, in one embodiment the one or more further domains comprise a variant of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain, a sulfotransferase (ST) domain or an enoyl reductase (ER) domain.

In one embodiment, the one or more further domains may comprise a tailoring domain. The term "tailoring domain" refers to one or more further domains that define an overall enzymatic function of the polypeptide disclosed herein, and may function in improving domain-domain interactions, modifying the hydrocarbon chain of the substrate and/or providing efficient substrate concentrations for enzymatic reactions. In this regard, the tailoring domain may be derived from a PKS, NRPS, or fatty acid synthase (FAS). In one embodiment, the tailoring domains are selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain (Hal), a sulfotransferase (ST) domain and enoyl reductase (ER) domain.

In one embodiment, the C-terminus of said ACP domain is covalently linked to the N-terminus of the tailoring domain or the C-terminus of said tailoring domain is covalently linked to the N-terminus of the ACP domain. In another embodiment, the tailoring domain is covalently linked to the N-terminus of the ACP domain. In another embodiment, the tailoring domain is positioned in between the ACP and TE domains. In one embodiment, the ACP, TE and tailoring domain may be arranged as X-ACP-TE or ACP-X-TE, wherein X represents the one or more further domains, and "-" refers to the linkage as disclosed herein. The linkage of the domains may be a covalent linkage via an amino acid sequence encoding a linker or scaffold encoding region. The linker may be a flexible linker that comprises a flexible polypeptide sequence, for example, a poly-alanine, poly-glycine or poly-glycine-serine amino acid sequence.

In one embodiment, the ACP domain and TE domain are located adjacent to each other, and arranged as ACP-TE, wherein "-" represents the linkage of the two domains. In one embodiment, the ACP and TE domains of the polypeptide may be covalently linked via an amino acid sequence encoding a linker or scaffold encoding region. The linker may be a flexible linker that comprises a flexible polypeptide sequence, for example, a poly-alanine, poly-glycine or poly-glycine-serine amino acid sequence.

In one embodiment, the ACP, TE and one or more further domains may be arranged as X-ACP-TE or ACP-X-TE, wherein X represents the one or more further domains, and "-" refers to the linkage as disclosed herein.
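The two arrangements can be represented as an ordered list of domain labels. The Python sketch below is illustrative only: the function names and the label strings are hypothetical, and the set of permitted further domains simply mirrors the list recited in this disclosure.

```python
# Further-domain abbreviations as recited in the disclosure.
ALLOWED_FURTHER_DOMAINS = {"AT", "KR", "DH", "O-MT", "C-MT", "Hal", "ST", "ER"}


def arrange(further_domains, layout="X-ACP-TE"):
    """Return the N- to C-terminal domain order for a scaffold construct.

    further_domains: labels of the one or more further domains (X).
    layout: "X-ACP-TE" (X upstream of ACP) or "ACP-X-TE" (X between ACP and TE).
    """
    x = list(further_domains)
    if not x:
        raise ValueError("at least one further domain is required")
    unknown = [d for d in x if d not in ALLOWED_FURTHER_DOMAINS]
    if unknown:
        raise ValueError(f"unrecognised further domain(s): {unknown}")
    if layout == "X-ACP-TE":
        return x + ["ACP", "TE"]
    if layout == "ACP-X-TE":
        return ["ACP"] + x + ["TE"]
    raise ValueError(f"unsupported layout: {layout}")


def describe(order):
    """Join domain labels with '-' to give the shorthand used in the text."""
    return "-".join(order)
```

For example, `describe(arrange(["ST"], "ACP-X-TE"))` gives `ACP-ST-TE`, the shorthand used for the CurM construct in Fig. 3.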

In one embodiment, the ACP, TE and one or more further domains may be derived from type I, type II, type III polyketide synthases (PKS) or fatty acid synthases (FAS), and preferably the modular type I polyketide synthases. In one embodiment, the polypeptide scaffold disclosed herein may be derived and deconstructed from module 6 of the prototypical 6-deoxyerythronolide synthase (DEBS) comprising an active centre, an acyl-carrier protein (ACP), and a thioesterase (TE).

In one embodiment, the ACP domain and TE domain form a didomain derived from a type I polyketide synthase (PKS), more specifically derived from module 6 of the prototypical 6-deoxyerythronolide synthase (DEBS).

In another embodiment, the one or more further domains selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase (Hal) domain, a sulfotransferase (ST) domain and enoyl reductase (ER) domain are positioned discrete from the polypeptide encoding at least an ACP domain and TE domain. In this embodiment, it will be appreciated that the one or more further domains are trans-acting on the polypeptide disclosed herein. In one embodiment, the one or more further domains are trans-acting on the ACP and TE didomain.

In one embodiment, the ACP, TE, or one or more further domains of the polypeptide may be derived from a bacterium, such as cyanobacteria, proteobacteria and actinobacteria. In addition, there is provided a recombinant nucleic acid that encodes the polypeptide described herein. The recombinant nucleic acid can be a double-stranded or single-stranded DNA, or RNA. The recombinant nucleic acid can encode an open reading frame (ORF) of the polypeptide. The recombinant nucleic acid can also comprise promoter sequences for transcribing the ORF in a suitable host cell and terminator sequences. The recombinant nucleic acid can also comprise sequences sufficient for having the recombinant nucleic acid stably replicate in a host cell. Accordingly, in one embodiment, there is provided an isolated nucleic acid encoding the polypeptide as described herein.

In another embodiment, there is provided a recombinant DNA expression vector or plasmid comprising the isolated nucleic acid as described herein operably linked to a promoter sequence. In one embodiment, the expression vector or plasmid is capable of expressing the polypeptide when cultured under suitable conditions. In one embodiment, the recombinant DNA expression vector or plasmid is capable of replicating autonomously. It will be appreciated by the person of skill in the art that many suitable plasmids or vectors exist and are included in the scope of the present disclosure, whereby the culture conditions for expressing the polypeptide would be readily understood by those skilled in the art depending on the selected plasmid or vector expression system. In one embodiment, the vector or plasmid comprises an expression control sequence operably linked to said isolated nucleic acid. The expression control sequence may be a nucleic acid fragment that promotes, enhances or represses expression of a gene or protein of interest. In one embodiment, the vector or plasmid is suitable for introduction into and transformation of a microorganism, such as E. coli. Those skilled in the art would readily understand suitable and commercially acceptable expression vectors that may be used, for example the pET21 or pET28 vectors.

As previously described herein, the polypeptide may comprise a didomain consisting of an acyl-carrier protein (ACP) and a thioesterase (TE), along with one or more further domains. In this regard, simple acyl-substrates can be incorporated into the polypeptide via phosphopantetheinyl transferase loading of the corresponding acyl-CoA. The acyl-CoA may be derived from carboxylic acids via the action of acyl-CoA synthase. In this regard, "loading" refers to the transfer of a 4'-phosphopantetheine (4'-PP) moiety from coenzyme A (CoA) to the acyl carrier protein (ACP). This allows a large variety of substrates to be used for testing the polypeptide disclosed herein for activity as well as for biosynthesis and production of polyketides. The thioesterase (TE) may be used to hydrolyze the product from the ACP to enable product analysis via a standard technique, such as LC-MS. Thus, the polypeptide disclosed herein may be used in a method for producing polyketides derived from acyl-substrates such as an Acyl-Coenzyme A derivative.

Polyketides, as described herein, refer to a class of secondary metabolites. In particular, polyketides can be divided into three classes: type I polyketides (for example, macrolides produced by multimodular megasynthases); type II polyketides (for example, aromatic molecules produced by the iterative action of dissociated enzymes); and type III polyketides (for example, small aromatic molecules produced by fungal species). Polyketide antibiotics, antifungals, cytostatics, anticholesteremics, antiparasitics, coccidiostats, animal growth promoters and natural insecticides may also be included.

Accordingly, in one embodiment there is provided a method for producing a polyketide, comprising reacting a polypeptide comprising an amino acid sequence encoding at least an acyl carrier protein (ACP) domain and a thioesterase (TE) domain, and one or more further domains selected from the group consisting of an acyltransferase (AT) domain; a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain, a sulfotransferase (ST) domain and enoyl reductase (ER) domain with an Acyl-Coenzyme A derivative of a polyketide in the presence of a phosphopantetheinyl transferase to form a polyketide derived from said Acyl-Coenzyme A derivative. In one embodiment, the phosphopantetheinyl transferase is surfactin phosphopantetheinyl transferase (sfp).

In one embodiment of the method, the Acyl-Coenzyme A derivative is selected from the group consisting of Acetyl-CoA, Acetoacetyl-CoA, Malonyl-CoA, Succinyl-CoA, fatty acyl-CoA, phenylacetyl-CoA and Butyryl-CoA. In another embodiment, the Acyl-Coenzyme A derivative is malonyl-CoA.

In one embodiment of the method, the polypeptide as described herein, the Acyl-Coenzyme A derivative and the phosphopantetheinyl transferase may be incubated and cultured under suitable conditions to produce the polyketide. In one embodiment, a culture medium is selected such that the polypeptide, the phosphopantetheinyl transferase and Coenzyme A derivative react to produce the polyketide. In one embodiment, the culture medium is a lysogeny broth (LB) culture medium.

In one embodiment of the method, the polypeptide, the phosphopantetheinyl transferase and the Acyl-Coenzyme A derivative are reacted at a temperature from about 1 to about 50°C, about 15 to about 40°C, or about 20 to about 40°C. In one embodiment, the polypeptide, the phosphopantetheinyl transferase and Acyl-Coenzyme A derivative of a polyketide are reacted at a temperature of about 4 to about 37°C.

In one embodiment, the polypeptide, the phosphopantetheinyl transferase and Acyl-Coenzyme A derivative are reacted for a period of at least 15 seconds, 30 seconds, 45 seconds, 1 minute, 5 minutes, 10 minutes, 15 minutes, 30 minutes, or 1 hour.

In one embodiment, the method further comprises isolating the polyketide following the reaction step. In this regard, isolation of the polyketide may comprise a standard drying technique known in the art for removal of the enzymes and other protein components.

In one embodiment, the method further comprises verifying the isolated polyketide by mass spectrometry analysis, such as LC-MS analysis. In another embodiment, the polyketide produced may be quantified by multiple reaction monitoring (MRM) scans. In one embodiment, the polyketide may be a polyketide intermediate, a chlorinated product, or a carboxylic acid.

One particular polyketide of interest for production is malonic acid (1,3-propanedioic acid). Malonic acid and its esters have many applications in the pharmaceutical, cosmetic, industrial and food industries. Malonic acid is used in Knoevenagel condensations for the synthesis of α,β-unsaturated carboxylic acids, including cinnamic and acrylic acids. Malonic acid and its esters are intermediates for syntheses of pharmaceuticals such as vitamins B1 and B6, barbiturates and non-steroidal anti-inflammatory drugs. In addition, malonic acid derivatives show promise as antineoplastic agents. Malonic acid is generally synthesized chemically by hydrolysis of chloroacetic acid. In nature, malonic acid is present in plant root tissues, and labeling experiments reveal acetate and oxaloacetate as precursors for its biosynthesis. Additionally, malonic acid is also formed in microorganisms as a degradation product of pyrimidines. Nonetheless, the enzymes of these proposed malonic acid biosynthetic pathways have not been elucidated. As such, an advantage of the method disclosed herein is the use of the polypeptide scaffold to yield malonic acid from malonyl-CoA via acylation and subsequent hydrolysis.

Accordingly, in one embodiment of the method the product is malonic acid (1,3-propanedioic acid), wherein the Acyl-Coenzyme A derivative is malonyl-CoA. In another embodiment of the method, the one or more further domains of the polypeptide comprises an AT domain, and the C-terminus of the AT domain is covalently linked to the N-terminus of the ACP domain. Accordingly, the AT domain may be cis-acting on the malonyl-CoA. Specifically, the ACP of the polypeptide will act as the active centre for the acylation of the malonyl moiety via the cis-acting malonyl-CoA specific acyltransferase (AT).

In another embodiment, the one or more further domains of the polypeptide comprises an AT domain, wherein the AT domain is discrete from the polypeptide encoding the ACP domain and TE domain. Accordingly, the AT domain may be trans-acting. Specifically, the ACP of the polypeptide will act as the active centre for the acylation of the malonyl moiety via the trans-acting malonyl-CoA specific acyltransferase (AT). In one embodiment of the method, the polypeptide may be expressed in a host cell or may have been previously isolated or purified. The host cell may be a eukaryotic or a prokaryotic cell. Suitable eukaryotic cells include yeast cells. In one embodiment, the host cell is Escherichia coli (E. coli). Specifically, the use of E. coli in producing malonic acid is advantageous, as malonyl-CoA is naturally synthesized in E. coli.

The polypeptide described herein may produce the polyketide in vivo (in a host cell) or in vitro (in a cell extract or where all necessary chemical components or starting materials are provided). The disclosure provides methods of producing the polyketide using any of these in vivo or in vitro means, via techniques and methods that are readily known in the art, as outlined by the examples described herein.

In another embodiment, there is provided a method for screening for the activity of the one or more further domains of the polypeptide disclosed herein. In this regard, the one or more further domains may be derived from genome mining of a bacterial genome for PKS, NRPS or FAS genes, as described herein. In one embodiment, the one or more further domains are variants of an acyltransferase (AT) domain, a ketoreductase (KR) domain, a dehydratase (DH) domain, an O-methyltransferase (O-MT) domain, a C-methyltransferase (C-MT) domain, a halogenase domain (Hal), a sulfotransferase (ST) domain or an enoyl reductase (ER) domain that are derived from PKS, NRPS or FAS. As indicated in the examples described herein, genome mining is an effective informational/computational tool known in the art and provides for identifying new molecules and biosynthetic machinery for screening. In brief summary, genome mining involves searching microbial genome sequences available in public databases for characteristic natural product biosynthetic genes or gene clusters. Accordingly, genome mining in the context of the present disclosure relates to the screening of bacterial genomes, using readily available genomic databases such as GenBank, for PKS genes and gene clusters as well as NRPS or FAS genes and gene clusters. In one embodiment, the one or more further domains are one or more PKS domains.

Accordingly, there is provided a method for screening for the activity of one or more PKS domains, comprising: (a) transforming a microorganism with a vector or plasmid as described herein, wherein the vector or plasmid further comprises a nucleic acid encoding for phosphopantetheinyl transferase; (b) culturing the microorganism of step (a) to express a polypeptide as described herein and phosphopantetheinyl transferase; (c) isolating the expressed polypeptide; (d) incubating the polypeptide of step (c) with one or more substrates capable of reacting with one or more PKS domains to produce one or more metabolites derived from said substrate; and (e) identifying and quantifying one or more metabolites derived from said substrate, wherein the identification and quantification of the one or more metabolites, relative to a control, indicates the activity of the one or more PKS domains.

In this regard, the one or more PKS domains may be derived from genome mining of a bacterial genome for PKS genes, as described herein. In one embodiment, the genome mining may screen for PKS gene clusters that contain both ketosynthase (KS) and acyl carrier protein (ACP) domains. Once the PKS genes or gene clusters are identified, the sequence of the deduced gene products is analyzed and the putative function of each gene is postulated. Accordingly, in one embodiment, the one or more PKS domains may be annotated with the Pfam database to identify functional domains and postulate a possible enzymatic function to screen for.
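By way of illustration only, the domain-based filter described above may be sketched as follows. The sketch assumes that domain annotation has already been performed and that its output is available as a mapping from protein identifier to a list of domain labels; the labels "KS" and "ACP", the protein identifiers and the function names are hypothetical placeholders and are not actual Pfam family names or accessions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical short labels; a real annotation would carry Pfam family names.
REQUIRED_DOMAINS: Set[str] = {"KS", "ACP"}


def select_candidate_pks(annotations: Dict[str, Iterable[str]],
                         required: Set[str] = REQUIRED_DOMAINS) -> List[str]:
    """Return the IDs of proteins annotated with every required domain."""
    return sorted(
        protein_id
        for protein_id, domains in annotations.items()
        if required.issubset(set(domains))
    )


def domain_distribution(annotations: Dict[str, Iterable[str]],
                        selected: Iterable[str]) -> Counter:
    """Count how many selected proteins carry each domain (once per protein)."""
    counts: Counter = Counter()
    for protein_id in selected:
        counts.update(set(annotations[protein_id]))
    return counts


if __name__ == "__main__":
    # Toy annotations, invented purely for this illustration.
    toy = {
        "protein_1": ["KS", "AT", "ACP"],
        "protein_2": ["KS", "AT"],
        "protein_3": ["KS", "ACP", "DUF2156"],
    }
    hits = select_candidate_pks(toy)
    print(hits)
    print(domain_distribution(toy, hits).most_common())
```

The most frequent entries of the resulting distribution correspond to the kind of ranking from which candidate domains may be chosen for further examination.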

In one embodiment, the step of transforming a microorganism comprises using standard techniques known in the art, for example electroporation. Moreover, the skilled artisan would readily be able to appreciate the culturing conditions and components that would be required in the context of the selected microorganism for protein expression of the polypeptide and the phosphopantetheinyl transferase. Moreover, the skilled artisan would readily be able to appreciate the conditions and components that would be required to react the polypeptide of step (c) with one or more substrates to produce one or more metabolites.

In one embodiment, the screening method further comprises identifying and quantifying one or more intermediates covalently bound to the polypeptide described herein, wherein the identification and quantification of one or more intermediates bound to the polypeptide, relative to a control, indicates that the polypeptide is active. In one embodiment, the step of identifying and quantifying comprises mass spectrometry (MS) analysis of the hydrolysed metabolite or mass of the digested polypeptide with metabolite attached if thioesterase is inactivated.

In another embodiment, the metabolite is identified by LC-MS and its expected isotopic mass. Moreover, the metabolite level is relatively quantified with respect to a "control" sample, which comprises all reactants except the enzymatic domain. Further analysis using the Multiple Reaction Monitoring (MRM) mode is then conducted on the LC-MS system. To ensure analytical consistency, 3 technical replicates of the sample may be run. The relative fold-change of the metabolite (over the control) is then calculated by averaging the 3 integrated peak area values obtained in the LC-MS runs. The relative standard deviation of the values obtained may be less than 10% to indicate the presence or absence of the activity of the PKS domain. For example, the examples described herein outline the step of identifying and quantifying the metabolites derived from the substrate using MS analysis, with respect to 1) triketide lactone product (Fig 5B, C); 2) chlorinated product (Fig 7); and 3) malonic acid (Fig 10). Accordingly, in one embodiment the MS analysis may comprise LC-MS or MALDI-TOF MS.
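The replicate check and fold-change calculation described above may be sketched as follows. This is a minimal illustration only: it assumes that the relative standard deviation is computed from the sample standard deviation of the replicate peak areas, and the function names and peak-area values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def relative_std_dev(areas: Sequence[float]) -> float:
    """Percent RSD: sample standard deviation divided by the mean, times 100."""
    return 100.0 * stdev(areas) / mean(areas)


def is_consistent(areas: Sequence[float], max_rsd: float = 10.0) -> bool:
    """True when the replicate peak areas have an RSD below the threshold."""
    return relative_std_dev(areas) < max_rsd


def fold_change(sample_areas: Sequence[float],
                control_areas: Sequence[float]) -> float:
    """Mean integrated peak area of the sample over that of the control."""
    return mean(sample_areas) / mean(control_areas)


if __name__ == "__main__":
    # Hypothetical integrated peak areas from 3 technical replicates each.
    sample = [200.0, 210.0, 190.0]
    control = [100.0, 95.0, 105.0]
    print(is_consistent(sample), is_consistent(control))
    print(fold_change(sample, control))
```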

The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms "comprising", "including", "containing", etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied therein herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.

The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.

Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.

Examples

Example 1

Materials and Methods

As it will be appreciated by those skilled in the art, the materials and methods described in this Example may be applied to those of Examples 2 and 3 below, and may be considered as general methods in the context of the technology and present disclosure that may be readily adapted or altered by the skilled artisan dependent upon the purpose and application of said materials and methods.

Reagents and Chemicals

SDS-PAGE gradient gels were obtained from Invitrogen. Ni-NTA agarose is from Roche. All other chemicals can be obtained from commercially available sources, such as Sigma.

Genome Mining

In developing the present invention, genome mining was used to identify enzymatic domains within gene clusters. Genome mining was applied to the various accessible databases and sources of deposited genomes in order to identify any novel domains. Any novel domains discovered from genome mining were then applied to a biosynthetic protein platform for screening of activity.

The purpose of the genome mining strategy is to discover novel domains and then to verify the activity and functionality of these domains, using the polypeptide scaffold disclosed herein to screen for enzyme domain activity.

Protein constructs

His-tagged sfp was expressed as described. The DEBS ACP6-TE construct, containing both N- and C-terminal His-tags, was amplified using primers 5'-AAAAAAGGATCCGGGGCCGTCCCCGCGGTGCA-3' (SEQ ID NO:1) and 5'-TTTTTTCTCGAGTGAATTCCCTCCGCCCAGCCAGGC-3' (SEQ ID NO:2) with EryAIII as template. The amplified DNA fragment was digested with BamHI/XhoI and cloned into a pET28 vector to yield p(ACP6-TE). Halogenase and methyltransferase genes were optimized for E. coli expression and synthesized by Genscript. The halogenase domain from the Curacin gene cluster was amplified using primers 5'-AAAAAAgctagcAGCGACCTGCAACAATACAAAAACAAG-3' (SEQ ID NO:3) and 5'-TTTTTTGGATCCACCAGAACCACCCGGTTGCTGCGTTTGCGGCG-3' (SEQ ID NO:4). Methyltransferase (LmnJ) was amplified using primers 5'-AAAAAAgctagcGAACAACTGACCCGTGTTATCGC-3' (SEQ ID NO:5) and 5'-TTTTTTggatccACCAGAACCACCGTGTTCCGCGCTCAGGTATTCA-3' (SEQ ID NO:6). The amplified DNA fragments were digested with NheI/BamHI and cloned into p(ACP6-TE) to yield p(Hal-ACP6-TE) and p(LmnJ-ACP6-TE). Sulfotransferase (ST) and thioesterase (TE) domains from Curacin (CurM) were also constructed. The ST-TE domains with their native ACP were amplified with 5'-AAAAAAGGATCCTCGGTGCTGGCAAAACAATTTAGCTCG-3' (SEQ ID NO:7) and 5'-TTTTTTCTCGAGCGAGGTCAGAATCAGCGAAGCC-3' (SEQ ID NO:8). The gene was digested with BamHI and XhoI and placed into pET28a to yield p(ACP-ST-TE). The ST-TE domain was amplified with 5'-TTTTTTCTCGAGCGAGGTCAGAATCAGCGAAGCC-3' (SEQ ID NO:9) and 5'-AAAAAAaagcttTCAGAAGACAACCTGGCGACCC-3' (SEQ ID NO:10). ACP6 from DEBS was amplified with 5'-TTTTTTaagcttGAGCTGCTGTCCTATGTGGTCGGCC-3' (SEQ ID NO:11) and 5'-AAAAAAggatccGGGGCCGTCCCCGCGGTGCA-3' (SEQ ID NO:12). The ST-TE domain was digested with HindIII and XhoI whilst the ACP6 fragment was digested with BamHI and HindIII. The two fragments were placed into pET28a to yield p(ACP6-ST-TE).
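As a simple consistency check on the cloning strategy above, each primer may be scanned for the recognition site of the restriction enzyme used in the corresponding digest. The following sketch is illustrative only; it covers just the four enzymes named in this section, using their well-known recognition sequences, and the function name is a hypothetical placeholder.

```python
from typing import Dict, List

# Recognition sequences of the four enzymes named in this section.
RECOGNITION_SITES: Dict[str, str] = {
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "NheI": "GCTAGC",
    "HindIII": "AAGCTT",
}


def sites_in_primer(primer: str) -> List[str]:
    """Return the names of the listed enzymes whose site occurs in the primer."""
    sequence = primer.upper()  # primers mix upper- and lower-case letters
    return sorted(
        enzyme for enzyme, site in RECOGNITION_SITES.items() if site in sequence
    )


if __name__ == "__main__":
    primers = {
        "SEQ ID NO:1": "AAAAAAGGATCCGGGGCCGTCCCCGCGGTGCA",
        "SEQ ID NO:2": "TTTTTTCTCGAGTGAATTCCCTCCGCCCAGCCAGGC",
        "SEQ ID NO:3": "AAAAAAgctagcAGCGACCTGCAACAATACAAAAACAAG",
        "SEQ ID NO:10": "AAAAAAaagcttTCAGAAGACAACCTGGCGACCC",
    }
    for name, sequence in primers.items():
        print(name, sites_in_primer(sequence))
```

The sketch searches only the strand as written; since all four recognition sequences are palindromic, this is sufficient for detecting the site in a double-stranded product.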

Protein expression and purification

Plasmids were introduced into E. coli BL21(DE3) by electroporation. The resulting transformant was grown in LB medium at 37 °C until the culture optical density reached 0.6. The culture was cooled to 18 °C, then induced with 0.1 mM isopropyl β-D-thiogalactopyranoside (IPTG), and grown for another 15 h. Cells were harvested by centrifugation (4,420 g, 15 min). The cell pellet was resuspended in lysis/wash buffer (50 mM phosphate, 10 mM imidazole, 500 mM NaCl, pH 7.6), and lysed by sonication (6 × 30 s, on ice). After centrifugation at 42,700 g for 45 minutes, the supernatant was incubated with Ni-NTA agarose for 1 h. The resin was washed with 10 column volumes of lysis/wash buffer, and the bound protein was eluted with 4 column volumes of elution buffer (50 mM phosphate, 150 mM imidazole, 300 mM NaCl, pH 7.6). The protein was exchanged to 20 mM Tris-Cl buffer (pH 7.2) for use in in vitro reactions. Yields of LmnJ-ACP6-TE and Hal-ACP6-TE were 15 mg/L culture volume each. DEBS ACP6-TE and sfp were expressed similarly.

In vitro halogenation reactions

Cur Hal-ACP6-TE (200 μM) was incubated with 20 μM sfp, 0.5 mM alpha-ketoglutarate, 0.5 mM ammonium iron sulphate, 30 mM NaCl and 60 mM MgCl2 at room temperature. 1 mM DL-3-hydroxy-3-methylglutaryl-CoA was added to start the reaction. The reaction proceeded at room temperature for 5 h.

In vitro methylation reactions

LmnJ-ACP6-TE (200 μM) was incubated with 20 μM sfp, 0.5 mM SAM, and 60 mM MgCl2 at room temperature. 1 mM DL-beta-hydroxybutyryl-CoA was added to start the reaction. The reaction proceeded at room temperature for 5 h. Similar conditions were also used for BarF-ACP6-TE and MelF-ACP6-TE.

General LC-MS analysis

The resulting products from the halogenation and methylation reactions were then analyzed using LC-MS. The solution containing the reaction components is first dried under vacuum at 4°C and reconstituted in an appropriate solvent (water, organic solvents such as methanol, or water-solvent mixtures) depending on the properties of the expected product. Similarly, an LC program is developed using water and organic solvents such as methanol or acetonitrile. The actual chromatographic column and method used are based on the properties of the expected product (e.g. a HILIC column for highly polar products and a C18 reversed phase column for organic products).

Electrospray ion mass spectrometry (ESI-MS) was then carried out on the predicted reaction product (mass calculated based on molecular formula) and the extracted ion chromatogram obtained. The next step involves conducting a MS scan to determine the fragment ions generated from the reaction product. Theoretical fragmentation of the product will also be conducted in parallel using the MassFrontier program (Thermo Scientific). Based on both experimental and theoretical evaluation of the reaction product fragmentation, key fragment ions will be selected for multiple reaction monitoring (MRM) scans on the MS. The selected parent/fragment ion pairs uniquely represent the product, therefore enabling the detection and relative quantification of the product in different samples. The chosen parent/fragment ion pairs and key optimized MS parameters for the MRM scans will be obtained prior to the analysis.

The integrated peak areas for each of the parent/fragment ion pairs are then obtained. To ensure LC-MS experimental consistency, at least 3 technical replicates of each sample are obtained and the integrated peak areas for each sample should have a relative standard deviation of less than 10%.

LC-MS for chlorination reaction

A product sample is first dried under vacuum at a temperature of 4°C and reconstituted in ultrapure water prior to analysis. The reaction product in the reconstituted sample was subsequently detected using an LC-MS system (Acquity UPLC-Xevo TQ-S system, Waters Corp). The separation was performed using a C18 reversed phase column (HSS T3 column, 2.1 x 100 mm in length, 2.5 μm particle size), with the following solvents - A) water with 0.1% formic acid (Sigma-Aldrich), B) methanol (Fisher Optima grade) with 0.1% formic acid. The UPLC gradient method is stated in Table 1 and the flow rate is constant at 0.4 mL/min. The column is subsequently washed using acetonitrile (Merck) with 0.1% formic acid and solvent A in a ratio of 98:2 for 3 min at a flow rate of 0.5 mL/min and equilibrated with 99.9% A and 0.1% B for a further 2.5 min at a flow rate of 0.4 mL/min.

[Table 1] UPLC gradient method used in reaction product analysis

Time (min)    %A
0             99.9
0.50          99.9
8.50          50
8.51          2
11.50         2
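For illustration, the solvent composition at any time point of the Table 1 gradient may be estimated as follows. The sketch assumes linear ramps between the tabulated time points (the curve type is not stated above) and reads the final time point as 11.50 min; the function name is a hypothetical placeholder.

```python
from typing import List, Tuple

# (time in min, %A) pairs from Table 1; the last time point is read as 11.50.
GRADIENT: List[Tuple[float, float]] = [
    (0.0, 99.9),
    (0.50, 99.9),
    (8.50, 50.0),
    (8.51, 2.0),
    (11.50, 2.0),
]


def percent_a(t: float, gradient: List[Tuple[float, float]] = GRADIENT) -> float:
    """Return %A at time t, assuming linear ramps between tabulated points."""
    if t <= gradient[0][0]:
        return gradient[0][1]
    for (t0, a0), (t1, a1) in zip(gradient, gradient[1:]):
        if t0 <= t <= t1:
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    return gradient[-1][1]  # hold the final composition after the last point


def percent_b(t: float) -> float:
    """%B is the remainder, since only solvents A and B are mixed."""
    return 100.0 - percent_a(t)


if __name__ == "__main__":
    for minute in (0.25, 4.5, 10.0):
        print(minute, percent_a(minute), percent_b(minute))
```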

Electrospray ion mass spectrometry (ESI-MS) was carried out on the reaction product (molecular formula: C6H9O5Cl), firstly to determine its key fragment ions in negative ion mode. Theoretical fragmentation of the product was also conducted in parallel using the MassFrontier program (Thermo Scientific). Based on both experimental and theoretical evaluation of the reaction product fragments, 3 key fragment ions were selected for multiple reaction monitoring scans on the MS. The parent/fragment ion pairs used in the MRM scans and key optimized MS parameters are summarised in Table 2.
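The parent ion mass expected in negative ion mode can be estimated from the molecular formula. The sketch below is illustrative and assumes that the parent ion is the deprotonated species [M-H]- built from the most abundant isotopes (including 35Cl); the function names are hypothetical and the isotope masses are standard reference values rounded to eight decimal places.

```python
import re
from typing import Dict

# Monoisotopic masses (u) of the most abundant isotopes; Cl refers to 35Cl.
MONOISOTOPIC_MASS: Dict[str, float] = {
    "C": 12.0,
    "H": 1.00782503,
    "O": 15.99491462,
    "Cl": 34.96885268,
}
PROTON_MASS = 1.00727647


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C6H9O5Cl'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


def mz_deprotonated(formula: str) -> float:
    """m/z of the singly charged [M-H]- ion (assumed parent ion species)."""
    return monoisotopic_mass(formula) - PROTON_MASS


if __name__ == "__main__":
    print(round(mz_deprotonated("C6H9O5Cl"), 1))
```

Under these assumptions the calculated value rounds to m/z 195.0 for the chlorinated product.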

[Table 2] Parent/fragment ion pairs used in the MRM scans and their optimized MS parameters

Product         Parent ion mass (m/z)    Fragment ion mass (m/z)    Collision energy (V)    Cone voltage (V)
C6H9O5Cl [6]    195.0                    56.6                       20                      14
C6H9O5Cl [6]    195.0                    74.8                       20                      14
C6H9O5Cl [6]    195.0                    84.8                       20                      14

The integrated peak areas for each of the parent/fragment ion pairs are then obtained. To ensure LC-MS experimental consistency, technical replicates of each sample are obtained and the integrated peak areas for each sample have a relative standard deviation of less than 10%.

Results

Genome mining

As an initial survey, ~2000 full-length bacterial genomes available in GenBank were screened for modular PKS genes by using constraints to search for proteins that contain both ketosynthase (KS) and acyl carrier protein (ACP) domains. Each PKS gene was then annotated with the Pfam database to identify functional domains. The distribution of the top 50 domains is shown in Fig. 2. From a preliminary investigation of several domains, an initial set of domains was examined in further detail in Table 3.

[Table 3] Domains for analysis

Pfam domains         Function                                   Similar domains (sequence identity)

From genome mining
Bac luciferase       Mono-oxygenase                             Novel architecture
DUF2156              Protein of unknown function                Leinamycin PKS (Tang, 2004) (52%)

From known proteins
Methyltransferase
Sulfotransferase*    Sulfotransferase toward terminal alkene    CurM (Gu, 2009)
Halogenase           Halogenase                                 BarB1, BarA, CytC3 (Vaillancourt, 2006)

* Sulfotransferase (ST) domain is combined with a thioesterase (TE) to produce the terminal alkene.

From initial mining and investigation, several interesting domains were identified, as shown in Table 3. An unknown protein domain (DUF2156) was found from initial mining results. Interestingly, a similar sequence (~52% sequence identity) was previously described in the leinamycin (LNM) polyketide synthase gene cluster. The activity and function of this Lnm domain was not characterized, possibly due to inappropriate substrates or perhaps an inactive gene. Subsequently, if the lnm gene is inactive, another possibility is that these genes might have evolved through horizontal gene transfer between bacterial groups. Thus, examining their ancestral sequences might yield active domains and even new catalytic reactions.

The second candidate is from the Bac luciferase family, which has previously been observed in non-ribosomal polypeptide synthases (NRPS). This monooxygenase-acting domain was found in the PKS portion of a PKS NRPS hybrid gene cluster, where it might be used to modify a PKS intermediate and would be an attractive tailoring domain for biosynthetic applications.

Among the initial candidate domains, previously reported domains with potentially valuable enzymatic activities were included. These proteins have not been used in a combinatorial manner and it would be important to determine if they can be engineered to do so, especially for biosynthetic engineering purposes. These enzymatic candidates were also used as positive controls to verify the discovery platform. These domains include sulfotransferases, methyltransferases (N-, O- and C-) and halogenases from NRPS. Interestingly, the sulfotransferase domain, along with its neighboring thioesterase, was a candidate from genome mining. The function of these domains has recently been elucidated, where it was found that they encode a decarboxylative chain termination mechanism to produce a terminal alkene via sequential sulfonation and hydrolysis reactions.

Protein engineering of a universal protein scaffold

To verify that the polypeptide platform can be utilized with a variety of reactions and domains from different PKSs, 5 different tailoring domains were used to build a series of hybrid constructs. These genes not only encode domains of different functions, they are also obtained from different bacteria, such as cyanobacteria, proteobacteria and actinomycetes. These domains include O-methylation domains from BarF (Lyngbya majuscula barbamide biosynthesis gene cluster) and MelF (Melittangium lichenicola melithiazol gene cluster), a C-methylation domain from LmnJ (Streptomyces atroolivaceus leinamycin biosynthetic gene cluster), and halogenase (Hal), sulfotransferase (ST) and thioesterase (TE) domains from Curacin biosynthesis (Lyngbya majuscula). Cur ST and TE function in sequence to produce terminal alkenes.

In the initial attempt at expressing constructs of the Cur halogenase domain, minimal protein expression of the domain with its native ACP partner was observed (Figure 3, lanes 1-3). Minimal protein expression was likewise observed in the presence of DEBS TE. With the ACP-TE scaffold, expression was greatly improved (Figure 3, lane 4).

In vitro analysis

The polyketide products obtained from each of the enzyme domains described in Figure 6 will be detected and identified using LC-MS. In an initial assay, we examine the formation of triketide lactone product with a purified recombinant type I bimodular PKS protein (DEBS3, module 5 and 6 from the DEBS PKS) in the presence of methylmalonyl CoA and NADPH. DEBS3 purified from E. coli was incubated overnight at room temperature (25 °C) with methylmalonyl CoA (1.7 mM) and NADPH (1.7 mM) as substrates. The reaction mixture was buffered using 100 mM phosphate solution (pH 7) with TCEP (2 mM) as a reducing agent. The resulting triketide lactone product (molecular weight of 173) was then extracted from the reaction mixture using ethyl acetate and analysed via LC-MS (Figure 5A). The product peak is shown in the LC-MS base peak profile in Figure 5B, with the corresponding MS/MS spectrum describing the key fragment ions obtained shown in Figure 5C.

To design the in vitro reactions for the engineered constructs, known substrates were utilized to verify the activity of the hybrid constructs. In several cases, new substrates were applied to test the substrate flexibility of the tailoring domains (Figure 6).

Chlorination

The chlorination reaction using Cur Hal-ACP6-TE and DL-3-hydroxy-3-methylglutaryl-CoA substrate (Figure 4) was analyzed with LC-MS for its expected product. The negative control reaction lacks sfp, which enables the loading of the substrate onto the enzymatic construct. MRM chromatograms obtained for the product in both the sample containing the chlorination enzyme and the control without the enzyme are shown in Figure 7. The integrated peak areas for each of the parent/fragment ion pairs are then obtained. To ensure LC-MS experimental consistency, 3 technical replicates of each sample are obtained and the integrated peak areas for each sample have a relative standard deviation of less than 10%. A summary of the results is shown in Table 4. The enzyme-containing sample in this case was found to contain an average of 4.5-fold more reaction product than the control (no enzyme) sample. From the product observation in comparison to the negative control, it appears that the hybrid halogenation construct is functional.

[Table 4] Relative amounts of reaction product in control and enzyme-containing samples

Parent/fragment    Control (Avg area)    Enzyme-containing (Avg area)    Ratio
195.0/56.6         1146                  5071                            4.42
195.0/74.8         8122                  37725                           4.64
195.0/84.8         814                   3762                            4.62
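The ratios in Table 4 can be reproduced directly from the average peak areas, as sketched below. The peak areas are taken from Table 4; the dictionary layout and function name are illustrative choices only.

```python
from statistics import mean
from typing import Dict, Tuple

# Average integrated peak areas from Table 4: (control, enzyme-containing).
TABLE_4: Dict[str, Tuple[float, float]] = {
    "195.0/56.6": (1146, 5071),
    "195.0/74.8": (8122, 37725),
    "195.0/84.8": (814, 3762),
}


def enrichment_ratios(table: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Ratio of enzyme-containing to control peak area for each ion pair."""
    return {pair: enzyme / control for pair, (control, enzyme) in table.items()}


if __name__ == "__main__":
    ratios = enrichment_ratios(TABLE_4)
    for pair, ratio in ratios.items():
        print(pair, round(ratio, 2))
    print("mean ratio:", round(mean(ratios.values()), 2))
```

The three transitions give closely agreeing ratios, with a mean of approximately 4.56, in line with the roughly 4.5-fold enrichment reported above.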

Discussion

A flexible protein scaffold

From the investigation into the construction of various hybrid proteins and the expression of the wild type native domain-ACP partner, a significant improvement in recombinant protein expression has been observed with the addition of the DEBS ACP-TE. Using the ACP6-TE as the core scaffold, the 4 attempts at combinatorial engineering of the hybrid tailoring domains with ACP6-TE yielded soluble proteins. The 100% success rate could be attributed to the solubility of ACP-TE, which has a yield of 40 mg/L culture volume. Another factor could be the dimerization of the TE protein. This dimerization could be used to stabilize the domains and perhaps even aid in activity, as it is predicted that dimeric modular PKSs operate across monomeric subunits. The flexibility of the ACP-TE scaffold would also be predicted from its relatively small size and from previous engineering studies of its larger module. Here, the flexibility of an ACP-TE didomain has been demonstrated for the placement of non-native tailoring domains.

Chlorination

From the LC-MS assay, increased production of DEBS ACP6 with the hybrid construct has been observed. As for production of DEBS ACP6 in the negative control, this can be attributed to the direct interaction of DL-3-hydroxy-3-methylglutaryl-CoA, as the terminal portion of the acyl moiety could interact with the active site, albeit at a lower efficiency. Interestingly, previous studies with the Hal domain and its ACP interaction demonstrate a strong preference for ACPII. ACPII is the second ACP domain in the ACP triplet domain. In the polypeptide disclosed herein, despite the use of a non-native partner, DEBS ACP6, chlorination is still possible (Figure 8). This could be attributed to the high effective concentration between the two domains, as they are now covalently linked. This bodes well for future hybrid constructs, where it is expected that the scaffold would aid in pushing reactions forward by significantly increasing the concentration at which the ACP active center is presented to the tailoring domain.

Example 2

Materials and Methods

Reagents and Chemicals

SDS-PAGE gradient gels were obtained from Invitrogen. Ni-NTA agarose was from Roche. All other chemicals were from Sigma.

Protein constructs

Acyltransferase from disorazole synthase (DSZS) was expressed from plasmid pFW3. His-tagged sfp was expressed as described. The DEBS ACP6-TE construct, containing both N- and C-terminal His-tags, was amplified using primers 5'-AAAAAAGGATCCGGGGCCGTCCCCGCGGTGCA-3' (SEQ ID NO:1) and 5'-TTTTTTCTCGAGTGAATTCCCTCCGCCCAGCCAGGC-3' (SEQ ID NO:2) with DEBS 3 as template. The amplified DNA fragment was digested with BamHI/XhoI and cloned into a pET28 vector.

Protein expression and purification

Plasmid pFW3 (DSZS AT) was introduced into E. coli BL21(DE3) by electroporation. The resulting transformant was grown in LB medium at 37 °C until the culture optical density reached 0.6. The culture was cooled to 18 °C, then induced with 0.1 mM isopropyl β-D-thiogalactopyranoside (IPTG), and grown for another 15 h. Cells were harvested by centrifugation (4,420 g, 15 min). The cell pellet was resuspended in lysis/wash buffer (50 mM phosphate, 10 mM imidazole, 500 mM NaCl, pH 7.6), and lysed by sonication (6 × 30 s, on ice). After centrifugation at 42,700 g for 45 minutes, the supernatant was incubated with Ni-NTA agarose for 1 h. The resin was washed with 10 column volumes of lysis/wash buffer, and the bound protein was eluted with 4 column volumes of elution buffer (50 mM phosphate, 150 mM imidazole, 300 mM NaCl, pH 7.6). The protein was exchanged to 20 mM Tris-Cl buffer (pH 7.2) for use in in vitro reactions. A typical yield of DSZS AT was 50 mg/L culture volume. DEBS ACP6-TE and sfp were expressed similarly.

In vitro reactions

DEBS ACP6-TE (100 μM) was incubated with 10 μM sfp and 60 mM MgCl2 at room temperature. 1.2 mM malonyl-CoA was added to start the reaction. The enzymatic reaction mixture included the addition of 1 μM DSZS AT.

Malonic acid analysis by LC-MS

Each sample is first spiked with a standard solution of 13C-malonic acid (malonic acid-2-13C, Sigma-Aldrich) of a known concentration, and dried under vacuum at a temperature of 4°C. The dried samples are then reconstituted in 100% methanol (Optima grade, ThermoFisher). This enables the removal of the enzymes and other protein components and extracts the malonic acid produced by the reaction.

Both the malonic acid product and the stable isotope 13C-malonic acid reference standard are subsequently detected using an LC-MS system (Acquity UPLC-Xevo TQ-S system, Waters Corp). The separation was performed using a hydrophilic interaction liquid chromatography (HILIC) column (Waters, XBridge HILIC column, 2.1 x 50 mm, 2.5 μm particle size), with the following solvents - A: water with 50 mM ammonium bicarbonate (Sigma-Aldrich), B: acetonitrile (Merck). The UPLC gradient method is stated in Table 5 and the flow rate is constant at 0.5 mL/min.

An electrospray ion source was then used in the MS to perform multiple reaction monitoring (MRM) scans in negative ion mode. These MRM scans utilize parent/fragment ion pairs for the identification of malonic acid and 13C-malonic acid respectively, which were obtained from individual MS/MS experiments carried out using the standard solution of each metabolite. Each parent/fragment ion pair uniquely represents its metabolite, thereby enabling the detection and quantification of both metabolites in the different samples. The parent/fragment ion pairs used in the MRM scans and the key optimized MS parameters are summarised in Table 6. An example of the MRM chromatograms obtained for both metabolites in sample 1 is also shown in Figure 10.

The integrated peak areas for each of the parent/fragment ion pairs are then obtained. As 13C-malonic acid is structurally similar to malonic acid, resulting in a similar ionization efficiency and retention time, the actual concentration of malonic acid in each sample can be obtained. This is calculated by direct comparison of the relative integrated peak areas of malonic acid and 13C-malonic acid, for which the concentration is already known. To ensure LC-MS experimental consistency, 4 technical replicates of each sample are obtained and the integrated peak areas for each sample have a relative standard deviation of less than 10%.
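
The peak-area comparison described above amounts to a single-point isotope-dilution calculation. The sketch below is a minimal illustration, assuming equal response for malonic acid and the 13C standard (as the similar ionization efficiency suggests); the function names and all numerical inputs are hypothetical and are not data from the disclosure.

```python
from statistics import mean, stdev


def malonic_acid_conc(area_analyte, area_standard, conc_standard):
    """Single-point isotope dilution: C_analyte = (A_analyte / A_standard) * C_standard."""
    if area_standard <= 0:
        raise ValueError("internal standard peak area must be positive")
    return area_analyte / area_standard * conc_standard


def relative_sd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical peak areas for four technical replicates of one sample.
analyte_areas = [2000.0, 2040.0, 1960.0, 2000.0]
standard_areas = [1000.0, 1000.0, 1000.0, 1000.0]
standard_conc = 50.0  # hypothetical spike concentration, arbitrary units

concs = [
    malonic_acid_conc(a, s, standard_conc)
    for a, s in zip(analyte_areas, standard_areas)
]
print("mean concentration:", mean(concs))
print("RSD below 10%:", relative_sd_percent(concs) < 10.0)
```

The result carries whatever concentration unit is used for the spiked standard.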

[Table 5] UPLC gradient method for malonic acid analysis

Time (min)    %B
0.50          99
0.60          60
2.00          60
2.10          50
4.50          50
4.60          99
6.50          99
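
To read off the solvent composition at an arbitrary time point, the gradient in Table 5 can be interpolated. The following sketch assumes linear ramps between the listed time points and assumes that the composition is held at the first and last listed values outside the tabulated range; neither assumption is stated in the disclosure, and the names `GRADIENT_B` and `percent_b` are hypothetical.

```python
# (time in min, %B) pairs as listed in Table 5.
GRADIENT_B = [
    (0.50, 99), (0.60, 60), (2.00, 60), (2.10, 50),
    (4.50, 50), (4.60, 99), (6.50, 99),
]


def percent_b(t, table=GRADIENT_B):
    """Linearly interpolate %B at time t (min); hold end values outside the table."""
    if t <= table[0][0]:
        return float(table[0][1])
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return float(table[-1][1])


for t in (0.55, 1.00, 3.00, 5.00):
    print(f"{t:.2f} min: {percent_b(t):.1f} %B")
```

The corresponding %A at any time is 100 minus the returned value.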

[Table 6] Parent/fragment ion pairs used in the MRM scans and their optimized MS parameters

Metabolite           Parent ion mass (m/z)   Fragment ion mass (m/z)   Collision energy (V)   Cone voltage (V)
Malonic acid         102.6                   59.2                      4                      14
Malonic acid-2-13C   103.6                   60.0                      4                      14

Co-expression of PKS enzymes in malonyl-CoA overproducing E. coli cells

Sfp, DSZS AT and ACP-TE, each with its own T7 promoter and terminator, were cloned into pET21a to obtain the PKS expression construct. This construct was electroporated into the malonyl-CoA overproduction strain (Zha, 2009). As a negative control, a construct expressing three genes involved in 2,3-butanediol biosynthesis (BudA, BudB and BudC, each with its own T7 promoter and terminator) was electroporated into the same malonyl-CoA overproduction strain.

The resulting transformants were picked into minimal medium and grown overnight at 37 °C. All media contained the appropriate antibiotics. The culture was then diluted 1:100 in the same medium and incubated at 37 °C until OD600 reached 0.5. The culture was then diluted into IPTG-containing medium for a final IPTG concentration of 0.1 mM and incubated at 28 °C for 1-5 days. At the indicated times after protein induction, 1 mL cultures were harvested by pelleting the cells and removing the supernatants. To prepare cell lysates, cell pellets were boiled in lysis buffer (4% SDS, 150 mM NaCl, 50 mM Tris pH 7.4, 1x concentration of Roche cOmplete protease inhibitor) for 1 min. The samples were cooled to room temperature and then treated with 1 μL benzonase for 15 min at room temperature prior to addition of 2-mercaptoethanol-containing Laemmli buffer and SDS-PAGE separation on a 4-20% Tris-HCl gel (Bio-Rad). The gels were stained with His stain (Pierce) according to the manufacturer's instructions to detect His-tagged proteins prior to Coomassie staining.

Results

Design of ACP-TE from DEBS

Analysis of the ACP structure (PDB: 2JU2) and an ACP sequence alignment led to the design of the ACP-TE didomain from module 6 of the prototypical DEBS gene cluster. The N- and C-terminal His-tagged apo-ACP-TE didomain was expressed in the E. coli BL21 strain and purified to near homogeneity using His-tag affinity purification.

In vitro reactions

In the initial attempt to verify the activity of the purified apo-ACP-TE, the in vitro reconstitution of the ACP-TE with sfp was examined. A functional apo-ACP-TE should be phosphopantetheinylated by sfp. Furthermore, by supplying malonyl-CoA instead of CoA, the malonyl moiety was expected to be loaded onto the ACP domain by sfp. Owing to the attached TE and the lack of competing reactions, this moiety should in turn be hydrolysed to produce the corresponding acid (Figure 9). In addition, the reaction yield was expected to increase on addition of a malonyl-CoA-specific trans-acting acyltransferase. The highly active DSZS AT was chosen for this purpose because of its ACP partner promiscuity. It would be expected to acylate the holo-ACP domain to form malonyl-ACP, which is then hydrolysed by the TE (Figure 9). To investigate these hypotheses, the reactions shown in Table 7 were set up. These reactions were left overnight at room temperature before malonic acid was extracted and quantified by MS. As predicted, acid formation was observed in the presence of ACP-TE and sfp. Product yield increased further in the presence of AT; however, this increase was less than two-fold, possibly because of the limiting malonyl-CoA substrate concentration. In the absence of enzymes, a much slower hydrolysis of malonyl-CoA was also observed.

[Table 7] In vitro reactions and malonic acid product concentration relative to the enzymatic reaction

Reactions                 % Malonic acid concentration
ACP-TE, AT, sfp, mCoA     100
ACP-TE, sfp, mCoA         65
mCoA                      12
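
The statement that AT raises the yield by less than two-fold can be checked directly from the relative values in Table 7. The sketch below is illustrative; subtracting the enzyme-free (mCoA only) value as a background is an assumption made here and is not a correction described in the disclosure, and the names `RELATIVE_YIELD` and `fold_change` are hypothetical.

```python
# Relative malonic acid concentrations (%) from Table 7.
RELATIVE_YIELD = {
    "ACP-TE, AT, sfp, mCoA": 100.0,
    "ACP-TE, sfp, mCoA": 65.0,
    "mCoA": 12.0,
}


def fold_change(with_at, without_at, background=0.0):
    """Ratio of yields, optionally after subtracting a common background."""
    return (with_at - background) / (without_at - background)


full = RELATIVE_YIELD["ACP-TE, AT, sfp, mCoA"]
no_at = RELATIVE_YIELD["ACP-TE, sfp, mCoA"]
blank = RELATIVE_YIELD["mCoA"]

raw = fold_change(full, no_at)                 # 100 / 65
corrected = fold_change(full, no_at, blank)    # 88 / 53 (assumed correction)
print(f"raw fold change: {raw:.2f}")
print(f"background-subtracted fold change: {corrected:.2f}")
```

Both ratios (about 1.5 and about 1.7) stay below two, consistent with the observation above.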

In vivo reactions

From these observations, the highest malonic acid yield came from the combination of ACP-TE, AT and sfp, as shown in Table 7. Subsequently, this combination of enzymes was tested for in vivo production. Since the main substrate is predicted to be malonyl-CoA, an E. coli strain optimized for malonyl-CoA overproduction was used. When induced with IPTG, these proteins were successfully co-expressed in the malonyl-CoA overproducing strain and protein expression was sustained for at least 5 days after induction (Figure 12).

Discussion

In the present disclosure, an advantageous pathway towards biosynthesis of malonic acid is presented. The novelty of this route includes not only the series of reactions leading to malonic acid but also the use of a truncated polyketide synthase for small molecule production in a bacterium.

Application of the truncated polyketide synthase for efficient production of small molecules presents a range of opportunities for utilizing modular PKSs in de novo biosynthetic pathways. In this example, the activity of the trans-acting malonyl-CoA-specific AT from DSZS with the ACP6-TE didomain from DEBS was investigated, but it is also possible to optimize these enzymes with variants. Variants of AT, ACP and TE can be found in other Type I or Type II polyketide synthases, or even in fatty acid synthases. An example is ACP-TE didomains from other polyketide synthases, especially those with a TE of more efficient hydrolysis activity and promiscuity. The didomain scaffold is also expected to be versatile and could be used for biosynthesis of modified diacid products. For example, a different extender unit can be loaded by using an acyltransferase specific for a different acyl-CoA, e.g. ethylmalonyl-CoA or methylmalonyl-CoA. Tailoring domains can also be added onto the didomain, for example for ketoreduction or O-methylation.

Example 3

Materials and Methods

Inactivated thioesterase construction

The QuikChange (Agilent) protocol was used to mutate the plasmids encoding a PKS domain covalently linked to the ACP-TE domains. Primers 5'-CGTTCGTGGTGGCCGGTCACgCCGCGGGGGCACTGATGG-3' (SEQ ID NO: 13) and 5'-CCATCAGTGCCCCCGCGGcGTGACCGGCCACCACGAACG-3' (SEQ ID NO: 14) were used to introduce a single base mutation that inactivates the active site serine of the thioesterase domain (TE°).

Digestion and clean-up of MT-ACP-TE° reactions

The methylation reactions (MT-ACP-TE°, 10 μL) were quenched after 2 hours at room temperature by addition of 1 unit each of endoproteinase Glu-C (Staphylococcus aureus V8, Sigma-Aldrich) and trypsin. The reaction was then left overnight at room temperature. The total peptides in the digests were fractionated using a ZipTipC18 (Millipore) and eluted with increasing concentrations of acetonitrile in a step gradient of 15%, 35%, 50% and 75%.

Mass spectrometry for digested peptides

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was performed using an AB Sciex TOF/TOF 5800. Mass spectra were acquired in positive-ion reflectron mode, using a laser intensity of 4000 V with a 1 kHz OptiBeam™ On-Axis Laser. External calibration was performed using 6 peptide CalMix mass standards (AB Sciex), with monoisotopic masses of 904.4681, 1296.6853, 1570.6774, 2093.0867, 2465.1989 and 3657.9294 Da. 0.5 μL of eluant from each acetonitrile percentage was spotted onto a polished MALDI-TOF MS target plate, mixed with 0.5 μL of 2,5-dihydroxybenzoic acid (DHB; 10 mg/ml in 50% v/v acetonitrile and 0.1% v/v TFA) and allowed to dry at room temperature. If homogeneous crystals were not formed, a small amount of acetonitrile was touched to each spot, which was then allowed to dry again. The mass spectra shown are the average of 10,000 laser scans collected from multiple locations on the target spot. Laser intensity was adjusted such that ion intensity did not exceed 3.4 x 10^6 counts. The data were processed using Data Explorer (AB Sciex).

LC-MS for O-methylation reaction

A product sample was first dried under vacuum at a temperature of 4 °C and reconstituted in ultrapure water prior to analysis. The reaction product in the reconstituted sample was subsequently detected using an LC-MS system (Acquity UPLC-LTQ Orbitrap system, Waters Corp and Thermo Scientific Inc. respectively). The separation was performed using a C18 reversed-phase column (HSS T3 column, 2.1 x 100 mm, 2.5 μm particle size), with the following solvents - A) water with 0.1% formic acid (Sigma-Aldrich), B) methanol (Fisher Optima grade) with 0.1% formic acid. The UPLC gradient method is stated in Table 8 and the flow rate is constant at 0.4 mL/min. The column is subsequently washed using acetonitrile (Merck) with 0.1% formic acid and solvent A in a ratio of 98:2 for 3 min at a flow rate of 0.5 mL/min and equilibrated with 99.9% A and 0.1% B for a further 2.5 min at a flow rate of 0.4 mL/min.

[Table 8] UPLC gradient method used in reaction product analysis

Time (min)    %A
0             99.9
0.50          99.9
8.50          50
8.51          2
11.50         2

Results

Methylation

To examine the covalently bound intermediate on the ACP, the thioesterase domain was first inactivated to prevent hydrolysis. The TE° constructs were then used for methylation. After incubation under methylation conditions, LmnJ-ACP6-TE° and MelF-ACP6-TE° were digested to produce a 1248 Da peptide containing the DSL motif of the ACP. The intermediate is bound to the serine of this conserved motif. This peptide is observed as a 1248.5841 Da peak in the digest of the unmodified MelF-ACP-TE°. After reaction with sfp and DL-β-hydroxybutyryl-CoA, a larger peak (1674.6188 Da) is observed whilst the unreacted peptide disappears. After further addition of SAM, a second peak (the methylated intermediate) is observed (Figure 13). MALDI-TOF analyses of the digested fragments thus verified the methylation reaction. The O-methylation reaction with BarF-ACP-TE was also verified by LC-MS (Figure 14). The specific structures of the products will be further verified by LC-MS and, if possible, 1D NMR analysis.
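
The mass shift between the two reported peptide peaks can be compared with the shift expected for loading of a hydroxybutyryl-phosphopantetheine arm. The sketch below is illustrative only. It assumes that the added group is a phosphopantetheinyl moiety (net C11H21N2O6PS) carrying a 3-hydroxybutyryl thioester (net C4H6O2) and that O-methylation adds a net CH2; these compositions and the names used are assumptions introduced here, and the script only reports the numbers without judging instrument tolerance.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.00307401,
    "O": 15.99491462, "P": 30.97376200, "S": 31.97207117,
}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


# Assumed net additions (not stated in the disclosure).
PPANT = {"C": 11, "H": 21, "N": 2, "O": 6, "P": 1, "S": 1}
HYDROXYBUTYRYL = {"C": 4, "H": 6, "O": 2}
METHYL = {"C": 1, "H": 2}

# Peak masses reported in the text.
unmodified_peak = 1248.5841
loaded_peak = 1674.6188

observed_shift = loaded_peak - unmodified_peak
expected_shift = mono_mass(PPANT) + mono_mass(HYDROXYBUTYRYL)

print(f"observed shift: {observed_shift:.4f} Da")
print(f"expected shift: {expected_shift:.4f} Da")
print(f"difference:     {observed_shift - expected_shift:+.4f} Da")
print(f"net mass added by one methylation: {mono_mass(METHYL):.4f} Da")
```

Under these assumptions the expected shift is about 426.12 Da, against an observed shift of about 426.03 Da.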

Sequence Listing Free Text

(SEQ ID NO: 1) 5'-AAAAAAGGATCCGGGGCCGTCCCCGCGGTGCA-3'

(SEQ ID NO: 2) 5'-TTTTTTCTCGAGTGAATTCCCTCCGCCCAGCCAGGC-3'

(SEQ ID NO: 3) 5'-AAAAAAgctagcAGCGACCTGCAACAATACAAAAACAAG-3'

(SEQ ID NO: 4) 5'-TTTTTTGGATCCACCAGAACCACCCGGTTGCTGCGTTTGCGGCG-3'

(SEQ ID NO: 5) 5'-AAAAAAgctagcGAACAACTGACCCGTGTTATCGC-3'

(SEQ ID NO: 6) 5'-TTTTTTggatccACCAGAACCACCGTGTTCCGCGCTCAGGTATTCA-3'

(SEQ ID NO: 7) 5'-AAAAAAGGATCCTCGGTGCTGGCAAAACAATTTAGCTCG-3'

(SEQ ID NO: 8) 5'-TTTTTTCTCGAGCGAGGTCAGAATCAGCGAAGCC-3'

(SEQ ID NO: 9) 5'-TTTTTTCTCGAGCGAGGTCAGAATCAGCGAAGCC-3'

(SEQ ID NO: 10) 5'-AAAAAAaagcttTCAGAAGACAACCTGGCGACCC-3'

(SEQ ID NO: 11) 5'-TTTTTTaagcttGAGCTGCTGTCCTATGTGGTCGGCC-3'

(SEQ ID NO: 12) 5'-AAAAAAggatccGGGGCCGTCCCCGCGGTGCA-3'

(SEQ ID NO: 13) 5'-CGTTCGTGGTGGCCGGTCACgCCGCGGGGGCACTGATGG-3'

(SEQ ID NO: 14) 5'-CCATCAGTGCCCCCGCGGcGTGACCGGCCACCACGAACG-3'
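
The two mutagenesis primers (SEQ ID NO: 13 and SEQ ID NO: 14) are expected to be complementary, with the mutated base marked in lowercase. The sketch below checks this. The helper names are hypothetical, and reading the lowercase letter as the mutated position is an assumption based on how the primers are written.

```python
FWD = "CGTTCGTGGTGGCCGGTCACgCCGCGGGGGCACTGATGG"  # SEQ ID NO: 13
REV = "CCATCAGTGCCCCCGCGGcGTGACCGGCCACCACGAACG"  # SEQ ID NO: 14

# Case-preserving complement table, so the lowercase marker is carried over.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def lowercase_positions(seq):
    """Return 0-based indices of lowercase (assumed mutated) bases."""
    return [i for i, base in enumerate(seq) if base.islower()]


print("primers are reverse complements:", reverse_complement(FWD) == REV)
print("marked position in SEQ ID NO: 13:", lowercase_positions(FWD))
print("marked position in SEQ ID NO: 14:", lowercase_positions(REV))
```

Both primers are 39 nucleotides long and carry a single marked base at mirrored positions, as expected for a complementary pair introducing one base change.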