METHODS AND STRAINS FOR THE PRODUCTION OF SARCINAXANTHIN AND DERIVATIVES THEREOF

Document Type and Number:

WIPO Patent Application WO/2011/151425

Abstract:

The present invention relates to a new strain of Micrococcus luteus, named Otnes7, which is superior to known strains in its ability to synthesise the carotenoid sarcinaxanthin and a method of producing sarcinaxanthin or a derivative thereof, said method comprising introducing into and expressing in a host cell one or more nucleic acid molecules encoding an activity in the sarcinaxanthin biosynthetic pathway.

Inventors:

NETZER ROMAN (NO)
BRAUTASET TRYGVE (NO)
BRUHEIM PER (NO)

Application Number:

PCT/EP2011/059159

Publication Date:

December 08, 2011

Filing Date:

June 01, 2011

Assignee:

PROMAR AS (NO)
NETZER ROMAN (NO)
BRAUTASET TRYGVE (NO)
BRUHEIM PER (NO)

International Classes:

C07K14/305; C12P1/04

Domestic Patent References:

WO2002041833A2	2002-05-30
WO1998008958A1	1998-03-05

Other References:

DATABASE EMBL [online] 19 June 2009 (2009-06-19), "Micrococcus luteus NCTC 2665, complete genome.", XP002657761, retrieved from EBI accession no. EM_PRO:CP001628 Database accession no. CP001628
ALTSCHUL, S.F. ET AL.: "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", NUCLEIC ACIDS RES., vol. 25, 1997, pages 3389 - 3402
BLATNY ET AL., PLASMID, vol. 38, 1997, pages 35 - 51
BLATNY ET AL., APPL. ENVIRON. MICROBIOL., vol. 63, no. 2, 1997, pages 370 - 379
BRAUTASET ET AL., METAB. ENG., vol. 2, no. 2, 2000, pages 104 - 114
BRAUTASET, T., LALE, R., VALLA, S.: "Positively regulated bacterial expression systems.", MICROBIAL BIOTECHNOLOGY, vol. 2, 2009, pages 15 - 30
CUNNINGHAM, F. X., JR., D. CHAMOVITZ ET AL.: "Cloning and functional expression in Escherichia coli of a cyanobacterial gene for lycopene cyclase, the enzyme that catalyzes the biosynthesis of beta-carotene.", FEBS LETT, vol. 328, no. 1-2, 1993, pages 130 - 8
CUNNINGHAM, F. X., JR., E. GANTT: "A portfolio of plasmids for identification and analysis of carotenoid pathway enzymes: Adonis aestivalis as a case study.", PHOTOSYNTH RES, vol. 92, no. 2, 2007, pages 245 - 59
CUNNINGHAM, F. X., JR., Z. SUN ET AL.: "Molecular structure and enzymatic function of lycopene cyclase from the cyanobacterium Synechococcus sp strain PCC7942.", PLANT CELL, vol. 6, no. 8, 1994, pages 1107 - 21
DAS, A., S.-H. YOON ET AL.: "An update on microbial carotenoid production: application of recent metabolic engineering tools.", APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, vol. 77, no. 3, 2007, pages 505 - 512
DOWER, W. J., J. F. MILLER ET AL.: "High efficiency transformation of E. coli by high voltage electroporation.", NUCLEIC ACIDS RES, vol. 16, no. 13, 1988, pages 6127 - 45
FANG, T. J., Y. S. CHENG: "Isolation of astaxanthin over-producing mutants of Phaffia rhodozyma and their fermentation kinetics.", ZHONGHUA MIN GUO WEI SHENG WU JI MIAN YI XUE ZA ZHI, vol. 25, no. 4, 1992, pages 209 - 22
FRASER, P. D., P. M. BRAMLEY: "The biosynthesis and nutritional uses of carotenoids.", PROG LIPID RES, vol. 43, no. 3, 2004, pages 228 - 65
HARKER, M., P. M. BRAMLEY: "Expression of prokaryotic 1-deoxy-D-xylulose-5-phosphate synthases in Escherichia coli increases carotenoid and ubiquinone biosynthesis.", FEBS LETT, vol. 448, no. 1, 1999, pages 115 - 9
HOLM, J. OF MOL. BIOLOGY, vol. 233, 1993, pages 123 - 38
HOLM, TRENDS IN BIOCHEMICAL SCIENCES, vol. 20, 1995, pages 478 - 480
HOLM, NUCLEIC ACIDS RESEARCH, vol. 26, 1998, pages 316 - 9
KAISER, P., P. SURMANN ET AL.: "A small-scale method for quantitation of carotenoids in bacteria and yeasts.", J MICROBIOL METHODS, vol. 70, no. 1, 2007, pages 142 - 9
KIM, D., J.S. LEE, Y.K. PARK, J.F. KIM, H. JEONG, T.K. OH, B.S. KIM, C.H. LEE: "Biosynthesis of antibiotic prodiginines in the marine bacterium Hahella chejuensis KCTC 2396", J. APPL. MICROBIOL., vol. 102, 2007, pages 937 - 944
KRUBASIK, P., M. KOBAYASHI ET AL.: "Expression and functional analysis of a gene cluster involved in the synthesis of decaprenoxanthin reveals the mechanisms for C50 carotenoid formation.", EUR J BIOCHEM, vol. 268, no. 13, 2001, pages 3702 - 8
KRUBASIK, P., G. SANDMANN: "A carotenogenic gene cluster from Brevibacterium linens with novel lycopene cyclase genes involved in the synthesis of aromatic carotenoids.", MOL GEN GENET, vol. 263, no. 3, 2000, pages 423 - 32
KRUBASIK, P., S. TAKAICHI ET AL.: "Detailed biosynthetic pathway to decaprenoxanthin diglucoside in Corynebacterium glutamicum and identification of novel intermediates.", ARCH MICROBIOL, vol. 176, no. 3, 2001, pages 217 - 23
KURUSU, Y., M. KAINUMA ET AL.: "Electroporation-transformation system for coryneform bacteria by auxotrophic complementation.", AGRIC BIOL CHEM, vol. 54, no. 2, 1990, pages 443 - 7
MERMOD ET AL., J. BACTERIOL., vol. 167, no. 2, 1986, pages 447 - 454
MYERS, E., MILLER, W.: "Optimal Alignments in Linear Space", CABIOS, vol. 4, 1988, pages 11 - 17
PEARSON, W.R., LIPMAN, D.J.: "Improved tools for biological sequence analysis", PNAS, vol. 85, 1988, pages 2444 - 2448
PEARSON, W.R.: "Rapid and sensitive sequence comparison with FASTP and FASTA", METHODS IN ENZYMOLOGY, vol. 183, 1990, pages 63 - 98
RAJA, R., S. HEMAISWARYA ET AL.: "Exploitation of Dunaliella for beta-carotene production.", APPL MICROBIOL BIOTECHNOL, vol. 74, no. 3, 2007, pages 517 - 23
RAMOS ET AL., FEBS LETT, vol. 226, no. 2, 1988, pages 241 - 246
REICHENBACH, H., W. KOHL, A. BÖTTGER-VETTER, H. ACHENBACH.: "Flexirubin-type pigments in flavobacterium", ARCH. MICROBIOL., vol. 126, 1980, pages 291 - 293
RODRIGUEZ-CONCEPCION, M., A. BORONAT: "Elucidation of the methylerythritol phosphate pathway for isoprenoid biosynthesis in bacteria and plastids. A metabolic milestone achieved through genomics", PLANT PHYSIOL, vol. 130, no. 3, 2002, pages 1079 - 89
SAMBROOK, J., E. F. FRITSCH ET AL.: "Molecular cloning: a Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS
SLETTA ET AL., APPL. ENV. MICROBIOL., vol. 70, no. 12, 2004, pages 7033 - 7039
SLETTA ET AL., APPL. ENV. MICROBIOL., vol. 73, no. 3, 2007, pages 906 - 912
STAFSNES MH, J. K., KILDAHL-ANDERSEN G, VALLA S, ELLINGSEN TE, BRUHEIM P.: "Isolation and characterization of marine pigmented bacteria from Norwegian coastal waters and screening for carotenoids with UVA-blue light absorbing properties", THE JOURNAL OF MICROBIOLOGY, vol. 48, no. 1, 2010, pages 16 - 23
TAO, L., H. YAO ET AL.: "Genes from a Dietzia sp. for synthesis of C40 and C50 beta- cyclic carotenoids.", GENE, vol. 386, no. 1-2, 2007, pages 90 - 7
THOMPSON, J. D ET AL.: "CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice", NUCLEIC ACIDS RES, vol. 22, 1994, pages 4673 - 4680
TRIPATHI, G., S. K. RAWAL: "Simple and efficient protocol for isolation of high molecular weight DNA from Streptomyces aureofaciens.", BIOTECHNOLOGY TECHNIQUES, vol. 12, no. 8, 1998, pages 629 - 631
VERTES, A. A., Y. ASAI ET AL.: "Transposon mutagenesis of coryneform bacteria.", MOL GEN GENET, vol. 245, no. 4, 1994, pages 397 - 405
WINTHER-LARSEN ET AL., METAB. ENG., vol. 2, 2000, pages 79 - 91
WINTHER-LARSEN ET AL., METAB. ENG., vol. 2, 2000, pages 92 - 103

Attorney, Agent or Firm:

DZIEGLEWSKA, Hanna (St Bride's House, 10 Salisbury Square, London Greater London EC4Y 8JD, GB)

Claims:

1. A method of producing sarcinaxanthin or a derivative thereof, said method comprising introducing into and expressing in a host cell one or more nucleic acid molecules encoding an activity in the sarcinaxanthin biosynthetic pathway, wherein said one or more nucleic acid molecules comprise:

(i) a nucleotide sequence as set forth in SEQ ID NO: 37 or a part thereof;

(ii) a nucleotide sequence with at least 90% sequence identity to SEQ ID NO: 37, or a part thereof; or

(iii) a nucleotide sequence complementary to (i) or (ii).

2. The method of claim 1, wherein said one or more nucleic acid molecules comprise:

(i) a nucleotide sequence as set forth in SEQ ID NO: 26 or a part thereof;

(ii) a nucleotide sequence with at least 90% sequence identity to SEQ ID NO: 26, or a part thereof; or

(iii) a nucleotide sequence complementary to (i) or (ii).

3. The method of claim 1 or 2, wherein said one or more nucleic acid molecules encode the sarcinaxanthin biosynthetic pathway.

4. The method of any one of claims 1 to 3, further comprising the step of isolating the sarcinaxanthin or derivative thereof from the host cell.

5. The method of any one of claims 1 to 4, wherein said method comprises introducing into and expressing in a host cell:

(a) one or more nucleic acid molecules comprising nucleotide sequences encoding one or more proteins capable of synthesising flavuxanthin; and

(b) one or more nucleic acid molecules comprising nucleotide sequences encoding one or more proteins having or contributing to C₅₀ carotenoid γ-cyclase activity, wherein said one or more proteins of (b) are capable of catalysing the conversion of flavuxanthin to sarcinaxanthin.

6. The method of claim 5, wherein said host cell is a lycopene-producing host cell, preferably wherein said lycopene-producing host cell is capable of producing lycopene at levels of at least 0.5 mg/g CDW, further preferably wherein the lycopene-producing host cell comprises the plasmid pAC-LYC.

7. The method of claim 6, wherein said one or more proteins of (a) are capable of catalysing the conversion of lycopene to flavuxanthin.

8. The method of claim 7, wherein said one or more proteins have lycopene elongase activity.

9. The method of any one of claims 5 to 8, wherein said one or more nucleic acid molecules of (b) comprise: (1) a nucleic acid molecule encoding a C₅₀ carotenoid γ-cyclase subunit and comprising:

(i) a nucleotide sequence as set forth in all or part of SEQ ID NO: 12 or SEQ ID NO: 2, or which is degenerate therewith, or which has at least 90% sequence identity to SEQ ID NO: 12 or 2; or

(ii) a nucleotide sequence encoding a protein having all or part of an amino acid sequence as set forth in SEQ ID NO: 13 or 3 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 13 or 3; and

(2) a nucleic acid molecule encoding a C₅₀ carotenoid γ-cyclase subunit and comprising:

(i) a nucleotide sequence as set forth in all or part of SEQ ID NO: 14 or 4, or which is degenerate therewith, or which has at least 90% sequence identity to SEQ ID NO: 14 or 4; or

(ii) a nucleotide sequence encoding a protein having all or part of an amino acid sequence as set forth in SEQ ID NO: 15 or 5 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 15 or 5.

10. The method of any one of claims 5 to 8, wherein said one or more nucleic acid molecules of (a) comprise:

(i) a nucleotide sequence as set forth in all or part of SEQ ID NO: 10, 6 or 7, or which is degenerate therewith, or which has at least 90% sequence identity to SEQ ID NO: 10, 6 or 7; or

(ii) a nucleotide sequence encoding a protein having all or part of an amino acid sequence as set forth in SEQ ID NO: 11, 8 or 9, or an amino acid sequence which is at least 90% identical to SEQ ID NO: 11, 8 or 9.

11. The method of any one of claims 5 to 8, wherein said one or more nucleic acid molecules comprise a nucleotide sequence encoding all or part of a protein having an amino acid sequence selected from the sequences as set forth in any one of SEQ ID NO: 11, 13 and 15 or an amino acid sequence which has at least 90% sequence identity to SEQ ID NO: 11, 13 or 15.

12. The method of claim 11, wherein said nucleotide sequence encodes a protein which when expressed in a lycopene-producing host cell together with each of the other said proteins results in at least 91% of the total carotenoids produced being sarcinaxanthin, or a nucleic acid molecule which comprises a nucleotide sequence which is the complement of any aforesaid sequence.

13. The method of claim 11 or 12, wherein said nucleotide sequence encodes a protein which when expressed in a lycopene-producing host cell together with each of the other said proteins results in sarcinaxanthin production to a level of at least 150 μg/g of cell dry weight (CDW).

14. The method of any one of claims 1 to 13, wherein said one or more nucleic acid molecules comprise:

(i) a nucleotide sequence selected from sequences as set forth in SEQ ID NO: 10, 12 and 14;

(ii) a nucleotide sequence which is degenerate with the sequence of any one of SEQ ID NOs: 10, 12 or 14;

(iii) a nucleotide sequence which has at least 90% sequence identity to any one of SEQ ID NOs: 10, 12 or 14;

(iv) a nucleotide sequence which is a part of the nucleotide sequence of any one of SEQ ID NOs: 10, 12 or 14 or of a nucleotide sequence which is degenerate therewith; or

(v) a nucleotide sequence which is complementary to any of (i) to (iv) above.

15. The method of claim 14, wherein said one or more nucleic acid molecules comprise a nucleotide sequence encoding a protein having lycopene elongase activity and an amino acid sequence as set forth in all or part of SEQ ID NO: 11 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 11, wherein said amino acid sequence comprises one or more of the following:

(a) alanine at position 8;

(b) valine at position 88;

or a nucleotide sequence which is the complement of any aforesaid sequence, wherein the position numbers are stated with reference to SEQ ID NO: 11, preferably wherein the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 10 or a part or variant thereof, or a complement thereof.

16. The method of claim 14, wherein said one or more nucleic acid molecules comprise a nucleotide sequence encoding a protein which contributes to C₅₀ carotenoid γ-cyclase activity and which has an amino acid sequence as set forth in all or part of SEQ ID NO: 13 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 13, wherein said amino acid sequence comprises one or more of the following:

(a) valine at position 44;

(b) valine at position 64;

(d) arginine at position 104;

(e) proline at position 111;

(f) glycine at position 117;

or a nucleotide sequence which is the complement of any aforesaid sequence, wherein the position numbers are stated with reference to SEQ ID NO: 13, preferably wherein the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 12 or a part or variant thereof, or a complement thereof.

17. The method of claim 14, wherein said one or more nucleic acid molecules comprise a nucleotide sequence encoding a protein which contributes to C₅₀ carotenoid γ-cyclase activity and which has an amino acid sequence as set forth in all or part of SEQ ID NO: 15 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 15, wherein said amino acid sequence comprises one or more of the following:

(a) a glycine residue at position 100;

(b) a glycine residue at position 103;

or a nucleotide sequence which is the complement of any aforesaid sequence, wherein the position numbers are stated with reference to SEQ ID NO: 15, preferably wherein the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 14 or a part or variant thereof, or a complement thereof.

18. The method of any one of claims 1 to 17 comprising the introduction of a further nucleic acid molecule into said host cell, wherein said nucleic acid molecule encodes an enzyme capable of glycosylating sarcinaxanthin.

19. The method of claim 18, wherein said further nucleic acid molecule encodes crtX from M. luteus or a functional equivalent thereof, preferably wherein the nucleic acid comprises:

(i) a nucleotide sequence as set forth in all or part of SEQ ID NO: 33 or 16, or which is degenerate therewith, or a nucleotide sequence with at least 70% sequence identity to SEQ ID NO: 33 or 16;

(ii) a nucleotide sequence which hybridizes to SEQ ID NO: 33 or 16 under non-stringent binding conditions of 6 x SSC/50% formamide at room temperature and washing under conditions of high stringency, e.g. 2 x SSC, 65°C, where SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.2; or

(iii) a nucleotide sequence encoding a protein having all or part of an amino acid sequence as set forth in SEQ ID NO: 34 or 17 or which comprises an amino acid sequence which is at least 70% identical to SEQ ID NO: 34 or 17.

20. The method of claim 19, wherein said further nucleic acid molecule comprises a nucleotide sequence encoding a protein having sarcinaxanthin glycosylase activity and an amino acid sequence as set forth in all or part of SEQ ID NO: 34 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 34, wherein said amino acid sequence comprises one or more of the following:

(a) histidine at position 62;

(b) serine at position 109;

(d) alanine at position 138;

(e) arginine at position 248;

(f) proline at position 251;

or a nucleotide sequence which is the complement of any aforesaid sequence, wherein the position numbers are stated with reference to SEQ ID NO: 34, preferably wherein the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 33 or a part or variant thereof, or a complement thereof.

21. The method of any one of claims 1 to 20, wherein the expression of one or more said nucleic acid molecules is inducible.

22. The method of any one of claims 1 to 21, wherein said host cell is a microorganism, particularly a bacterium.

23. The method of claim 22, wherein said bacterium is selected from Escherichia sp., Salmonella, Klebsiella, Proteus, Yersinia, Azotobacter sp., Pseudomonas sp., Xanthomonas sp., Agrobacterium sp., Alcaligenes sp., Bordetella sp., Haemophilus influenzae, Methylophilus methylotrophus, Rhizobium sp., Thiobacillus sp. and Clavibacter sp., preferably wherein the host cell is an Escherichia coli cell or a Corynebacterium glutamicum cell.

24. An isolated nucleic acid molecule comprising or consisting of all or a part of a nucleotide sequence as set forth in SEQ ID NO: 37 or which has at least 90% sequence identity to SEQ ID NO. 37, which molecule encodes one or more proteins having activity in the biosynthesis of sarcinaxanthin, and wherein any nucleic acid molecule which comprises a nucleotide sequence which is a part of SEQ ID NO. 37 or which is at least 90% identical to SEQ ID NO. 37 encodes proteins which are able to synthesise sarcinaxanthin at substantially the same level as the proteins encoded by SEQ ID NO: 37 when expressed in a host cell.

25. The nucleic acid molecule of claim 24, wherein said part of said nucleic acid molecule comprises or consists of all or a part of a nucleotide sequence as set forth in SEQ ID NO: 26 or which has at least 90% sequence identity to SEQ ID NO. 26, which molecule encodes one or more proteins having activity in the biosynthesis of sarcinaxanthin, and wherein any nucleic acid molecule which comprises a nucleotide sequence which is a part of SEQ ID NO. 26 or which is at least 90% identical to SEQ ID NO. 26 encodes proteins which are able to synthesise sarcinaxanthin at substantially the same level as the proteins encoded by SEQ ID NO: 26 when expressed in a host cell.

26. The nucleic acid molecule of claim 24 or 25, wherein said part of said nucleic acid molecule comprises a nucleotide sequence encoding all or part of a protein having an amino acid sequence as set forth in SEQ ID NO: 11 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 11 and wherein said nucleotide sequence encodes a lycopene elongase with a lycopene to flavuxanthin conversion efficiency of at least 30%, when expressed in a host cell, or a nucleic acid molecule which comprises a nucleotide sequence which is the complement of any aforesaid sequence.

27. The nucleic acid molecule of claim 26, wherein said part of said nucleic acid molecule comprises:

(i) a nucleotide sequence as set forth in SEQ ID NO: 10;

(ii) a nucleotide sequence which is degenerate with the sequence of SEQ ID NO: 10;

(iii) a nucleotide sequence which has at least 90% sequence identity to SEQ ID NO: 10;

(iv) a nucleotide sequence which is a part of the nucleotide sequence of SEQ ID NO: 10 or of a nucleotide sequence which is degenerate therewith; or

(v) a nucleotide sequence which is complementary to any of (i) to (iv) above.

28. The nucleic acid molecule of claim 24 or 25, wherein said part of said nucleic acid molecule comprises a nucleotide sequence encoding all or part of a protein having an amino acid sequence selected from the sequences as set forth in any one of SEQ ID NO: 11, 13 and 15 or an amino acid sequence which has at least 90% sequence identity to SEQ ID NO: 11, 13 or 15, and wherein said nucleotide sequence encodes a protein which when expressed in a lycopene-producing host cell together with each of the other said proteins results in at least 91% of the total carotenoids produced being sarcinaxanthin, or a nucleic acid molecule which comprises a nucleotide sequence which is the complement of any aforesaid sequence.

29. The nucleic acid molecule of claim 24 or 25, wherein said part of said nucleic acid molecule comprises a nucleotide sequence encoding all or part of a protein having an amino acid sequence selected from the sequences as set forth in any one of SEQ ID NO: 11, 13 and 15 or an amino acid sequence which has at least 90% sequence identity to SEQ ID NO: 11, 13 or 15, wherein said nucleotide sequence encodes a protein which when expressed in a lycopene-producing host cell together with each of the other said proteins results in sarcinaxanthin production to a level of at least 150 μg/g of cell dry weight (CDW).

30. The nucleic acid molecule of claim 28 or 29, wherein said nucleic acid molecule comprises:

(i) a nucleotide sequence selected from sequences as set forth in SEQ ID NO: 10, 12 and 14;

(ii) a nucleotide sequence which is degenerate with the sequence of any one of SEQ ID NOs: 10, 12 or 14;

(iii) a nucleotide sequence which has at least 90% sequence identity to any one of SEQ ID NOs: 10, 12 or 14;

(iv) a nucleotide sequence which is a part of the nucleotide sequence of any one of SEQ ID NOs: 10, 12 or 14 or of a nucleotide sequence which is degenerate therewith; or

(v) a nucleotide sequence which is complementary to any of (i) to (iv) above.

31. The nucleic acid molecule of claim 30, wherein said nucleic acid molecule comprises a nucleotide sequence encoding a protein having lycopene elongase activity and an amino acid sequence as set forth in all or part of SEQ ID NO: 11 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 11, wherein said amino acid sequence comprises one or more of the following:

(a) alanine at position 8;

(b) valine at position 88;

32. The nucleic acid molecule of claim 30, wherein said nucleic acid molecule comprises a nucleotide sequence encoding a protein which contributes to C₅₀ carotenoid γ-cyclase activity and which has an amino acid sequence as set forth in all or part of SEQ ID NO: 13 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 13, wherein said amino acid sequence comprises one or more of the following:

(a) valine at position 44;

(b) valine at position 64;

(d) arginine at position 104;

(e) proline at position 111;

(f) glycine at position 117;

or a nucleotide sequence which is the complement of any aforesaid sequence, wherein the position numbers are stated with reference to SEQ ID NO: 13, preferably wherein the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 12 or a part or variant thereof, or a complement thereof.

33. The nucleic acid molecule of claim 30, wherein said nucleic acid molecule comprises a nucleotide sequence encoding a protein which contributes to C₅₀ carotenoid γ-cyclase activity and which has an amino acid sequence as set forth in all or part of SEQ ID NO: 15 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 15, wherein said amino acid sequence comprises one or more of the following:

(a) a glycine residue at position 100;

(b) a glycine residue at position 103;

34. The nucleic acid molecule of claim 24, wherein said part of said nucleic acid molecule comprises a nucleotide sequence encoding all or part of a protein having an amino acid sequence as set forth in SEQ ID NO: 34 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 34 and wherein said nucleotide sequence encodes a sarcinaxanthin glycosylase enzyme, which activity results in the production of both sarcinaxanthin mono- and diglucosides, when expressed in a host cell, or a nucleic acid molecule which comprises a nucleotide sequence which is the complement of any aforesaid sequence.

35. The nucleic acid molecule of claim 34, wherein said part of said nucleic acid molecule comprises:

(i) a nucleotide sequence as set forth in SEQ ID NO: 33;

(ii) a nucleotide sequence which is degenerate with the sequence of SEQ ID NO: 33;

(iii) a nucleotide sequence which has at least 90% sequence identity to SEQ ID NO: 33;

(iv) a nucleotide sequence which is a part of the nucleotide sequence of SEQ ID NO: 33 or of a nucleotide sequence which is degenerate therewith; or

(v) a nucleotide sequence which is complementary to any of (i) to (iv) above.

36. The nucleic acid molecule of claim 35, wherein said nucleic acid molecule comprises a nucleotide sequence encoding a protein having sarcinaxanthin glycosylase activity and an amino acid sequence as set forth in all or part of SEQ ID NO: 34 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 34, wherein said amino acid sequence comprises one or more of the following:

(a) histidine at position 62;

(b) serine at position 109;

(c) arginine at position 129;

(d) alanine at position 138;

(e) arginine at position 248;

(f) proline at position 251;

37. A vector comprising the isolated nucleic acid molecule of any one of claims 24 to 36.

38. An isolated protein encoded by the nucleic acid molecule of any one of claims 24 to 36.

39. A strain of Micrococcus luteus as deposited under number DSM 23579 at the DSMZ, or a mutant or modified strain thereof which produces sarcinaxanthin or a derivative thereof.
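The claims above define variant sequences by a percentage sequence identity to a reference SEQ ID NO (90% in most claims, 70% in claim 19). The application cites BLAST, FASTA and CLUSTAL W among its references, but its exact identity definition is not part of this excerpt. The following is a minimal Python sketch that makes the threshold arithmetic concrete. It assumes the two sequences have already been aligned by such a tool, and it assumes one common convention: identical aligned positions divided by the ungapped length of the reference. The function names `percent_identity` and `meets_identity_threshold` are hypothetical illustrations, not part of the application.

```python
def percent_identity(reference: str, query: str, gap: str = "-") -> float:
    """Percent identity of `query` to `reference`, given two aligned rows.

    Assumed convention (hypothetical, for illustration): identical aligned
    positions divided by the number of residues in the reference row, gaps
    excluded. The alignment itself must come from a separate tool.
    """
    if len(reference) != len(query):
        raise ValueError("aligned rows must have equal length")
    ref_residues = 0  # residues of the reference sequence, gaps excluded
    identical = 0     # columns where query and reference carry the same residue
    for r, q in zip(reference.upper(), query.upper()):
        if r == gap:
            continue  # insertion in the query relative to the reference
        ref_residues += 1
        if q == r:    # a gap in the query therefore counts as a mismatch
            identical += 1
    if ref_residues == 0:
        raise ValueError("reference row contains no residues")
    return 100.0 * identical / ref_residues


def meets_identity_threshold(reference: str, query: str, threshold: float = 90.0) -> bool:
    """True if the query reaches the identity threshold (90% by default)."""
    return percent_identity(reference, query) >= threshold


if __name__ == "__main__":
    ref = "ATGCATGCAT"
    print(percent_identity(ref, "ATGCTTGCAT"))          # one substitution: 90.0
    print(meets_identity_threshold(ref, "ATGC-TGCAT"))  # one deletion: True
```

Other conventions, such as dividing by the alignment length including gaps or by the length of the shorter sequence, can give different percentages for the same pair, so the denominator is the main design choice to check against the application's own definition.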

Description:

Methods and strains for the production of sarcinaxanthin and derivatives thereof

The present invention relates to a new strain of Micrococcus luteus, named Otnes7, which is superior to known strains in its ability to synthesise the carotenoid sarcinaxanthin. The invention also relates to the identification and cloning of the gene cluster encoding the biosynthetic machinery for the synthesis of sarcinaxanthin, which includes the first known proteins responsible for the biosynthesis of a γ-cyclic C₅₀ carotenoid and, more particularly, the first C₅₀ carotenoid γ-cyclase to be identified. In particular, novel genes and their encoded polypeptides from the novel Otnes7 strain are identified and sequenced. The invention accordingly provides the novel nucleic acid molecules and proteins from said strain. The invention further relates to the use of nucleic acid molecules encoding the sarcinaxanthin biosynthetic enzyme system (as well as components thereof) in methods for the production of sarcinaxanthin through heterologous expression of said nucleic acids and proteins in host cells.

Pigmentation is widespread among bacteria, and the pigments found in marine heterotrophic bacteria include carotenoids, flexirubins, xanthomonadins and prodigiosin (Kim et al., 2007; Reichenbach et al., 1980). The carotenoids are considered to be the main and most abundant pigment group.

Carotenoids are natural pigments synthesized by bacteria, fungi, algae and plants and to date more than 750 different natural carotenoids have been isolated from natural sources. In addition to their importance as colouration pigments, carotenoids play a critical role in photosynthetic processes and exhibit protective properties against damage by oxygen and light. Due to their antioxidant properties, carotenoids have been proposed to reduce the risk of certain cancers, cardiovascular disease and Alzheimer's disease. The global market for carotenoids used as food colourants and nutritional supplements was estimated at some $935 million by 2005 (Fraser and Bramley 2004). Despite intensive research into microbial production of carotenoids, most commercial carotenoids are still produced by chemical synthesis and only large-scale microbial production of β-carotene (Raja, Hemaiswarya et al. 2007) and astaxanthin (Fang and Cheng 1992) has been reported to date. There is an increasing demand for natural carotenoids for nutritional, pharmaceutical and medical applications, and hence the microbial production of these molecules is of great importance.

More than 95% of all natural carotenoids are based on a symmetric C₄₀ phytoene backbone, and only a small number of C₃₀ and even fewer C₅₀ carotenoids have been discovered so far. Carotenoids carrying oxygen-containing functional groups, the cyclic or acyclic xanthophylls, have been shown to lack pro-oxidative activity completely and to display significantly stronger anti-oxidative properties than carotenoids without oxygen functionality (carotenes). Extending the system of conjugated double bonds has also been reported to increase the anti-oxidative potential of hydroxylated carotenoids and is assumed to be one of the most important determinants of radical-scavenging activity. Because C₅₀ carotenoids have a high number of conjugated double bonds and all known members of the class contain at least one hydroxyl group, this class of carotenoids has a high potential for excellent anti-oxidative properties. There is therefore interest in the production of carotenoids in this class.

In nature, C₅₀ carotenoids are synthesized by bacteria of the order Actinomycetales. The ε-cyclic C₅₀ carotenoid decaprenoxanthin (2,2'-bis-(4-hydroxy-3-methylbut-2-enyl)-ε,ε-carotene) has been found in Agromyces mediolanus, Arthrobacter glacialis and Aureobacterium sp., and the decaprenoxanthin biosynthetic pathway was proposed for Corynebacterium glutamicum (Krubasik and Sandmann 2000; Krubasik, Kobayashi et al. 2001). The β-cyclic C₅₀ carotenoid C.p.450 (2,2'-bis-(4-hydroxy-3-methylbut-2-enyl)-β,β-carotene) has been detected in Curtobacterium flaccumfaciens (formerly Corynebacterium poinsettiae), and recently the biosynthetic pathway in Dietzia sp. CQ4 was proposed (Tao, Yao et al. 2007). For both C₅₀ carotenoid pathways it was reported that the common precursor lycopene is synthesized via the methylerythritol 4-phosphate (MEP) pathway, which is present in most eubacteria (Rodriguez-Concepcion and Boronat 2002). Biosynthesis of lycopene from C₁₅ farnesyl pyrophosphate (FPP) has been well studied in many carotenogenic organisms. FPP is converted into C₂₀ geranylgeranyl pyrophosphate (GGPP) by GGPP synthase, after which two molecules of GGPP are condensed to C₄₀ phytoene by a phytoene synthase. Finally, phytoene is dehydrogenated to C₄₀ lycopene by a phytoene dehydrogenase. Heterologous production of lycopene has been performed successfully in non-carotenogenic organisms such as Escherichia coli and is being investigated intensively on an ongoing basis (Das, Yoon et al. 2007).

Using lycopene as the precursor, biosynthesis of cyclic C₅₀ carotenoids is catalyzed by lycopene elongase and carotenoid cyclases. Although most carotenoids in plants and microorganisms exhibit cyclic structures, cyclization reactions are predominantly known for C₄₀ pathways, where they are catalyzed by monomeric enzymes that have been isolated from plants and bacteria. In C. glutamicum, the genes crtYe, crtYf and crtEb were identified as being involved in the conversion of lycopene to the ε-cyclic C₅₀ carotenoid decaprenoxanthin. Sequential elongation of lycopene by two C₅ isoprenyl units to form the acyclic C₅₀ carotenoid flavuxanthin was catalyzed by a crtEb-encoded lycopene elongase. Subsequent cyclization to decaprenoxanthin was catalyzed by a heterodimeric C₅₀ carotenoid ε-cyclase encoded by crtYe and crtYf. Whilst the polypeptides encoded by crtYe and crtYf share primary sequence similarity with a new type of heterodimeric lycopene cyclase, CrtYc and CrtYd, involved in lycopene cyclization in B. linens and Mycobacterium aurum, the C. glutamicum crtYeYf genes encode two polypeptides constituting a carotenoid cyclase that uses C₄₅ and C₅₀ carotenoids as substrates (Krubasik, Kobayashi et al. 2001). The genetic and enzymatic basis for glycosylation of decaprenoxanthin in C. glutamicum is unknown.
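The route described above is, at the level of the carbon skeleton, simple arithmetic on C₅ units. The following minimal Python sketch tracks only carbon counts, with no chemistry or stereochemistry, along the corresponding route to sarcinaxanthin. The C₂₀ GGPP arises from C₁₅ FPP by addition of one C₅ unit. Two GGPP give C₄₀ phytoene. Dehydrogenation to lycopene leaves 40 carbons. Two elongase steps give C₅₀ flavuxanthin, and cyclization leaves 50 carbons. Assigning the final step to a C₅₀ carotenoid γ-cyclase follows claim 5. The names `Step` and `sarcinaxanthin_route` are hypothetical, and the C₄₅ intermediate is deliberately left unnamed.

```python
from dataclasses import dataclass

C5_UNIT = 5  # carbons added by one C5 isoprenyl unit


@dataclass(frozen=True)
class Step:
    product: str
    activity: str  # activity forming this product ("-" for the starting point)
    carbons: int


def sarcinaxanthin_route() -> list[Step]:
    """Carbon counts along the lycopene-based route to a cyclic C50 carotenoid."""
    fpp = 15                      # C15 farnesyl pyrophosphate
    ggpp = fpp + C5_UNIT          # GGPP synthase: C15 -> C20
    phytoene = 2 * ggpp           # phytoene synthase: 2 x C20 -> C40
    lycopene = phytoene           # dehydrogenation leaves the carbon count unchanged
    c45 = lycopene + C5_UNIT      # lycopene elongase, first C5 unit
    flavuxanthin = c45 + C5_UNIT  # lycopene elongase, second C5 unit
    cyclic = flavuxanthin         # cyclization leaves the carbon count unchanged
    return [
        Step("FPP", "-", fpp),
        Step("GGPP", "GGPP synthase", ggpp),
        Step("phytoene", "phytoene synthase", phytoene),
        Step("lycopene", "phytoene dehydrogenase", lycopene),
        Step("C45 intermediate", "lycopene elongase", c45),
        Step("flavuxanthin", "lycopene elongase", flavuxanthin),
        Step("sarcinaxanthin", "C50 carotenoid gamma-cyclase", cyclic),
    ]


if __name__ == "__main__":
    for step in sarcinaxanthin_route():
        print(f"C{step.carbons:<3} {step.product:<17} {step.activity}")
```

Replacing the last activity with the crtYe/crtYf-encoded ε-cyclase gives the decaprenoxanthin route of C. glutamicum. The carbon counts are identical because these C₅₀ routes differ in ring type, not in chain length.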

Recently, an analogous pathway was proposed for the biosynthesis of the β-cyclic C₅₀ carotenoid C.p.450 in Dietzia sp. CQ4 (Tao, Yao et al. 2007). Synthesis of C.p.450 from lycopene likewise requires lycopene elongase and C₅₀ carotenoid β-cyclase activities.

Whilst most cyclic carotenoids exhibit β-rings, ε-ring-containing pigments are common in higher plants. Carotenoids substituted only with γ-rings are rarely observed in plants and algae, in which only traces can be detected. Prior to the present invention, no biochemical pathway for γ-cyclic C₅₀ carotenoids had been identified.

Sarcinaxanthin is a γ-cyclic C₅₀ carotenoid which is known to be produced by Micrococcus luteus. Micrococcus luteus is a GC-rich Gram-positive bacterium belonging to the family Micrococcaceae within the order Actinomycetales. The carotenoids accumulated in this bacterium, including sarcinaxanthin, were identified and structurally elucidated decades ago. However, prior to the present invention, the biosynthetic machinery responsible for the synthesis of this molecule was unknown. As suggested above, the elucidation and functional characterization of the genes responsible for the biosynthesis of the γ-cyclic C₅₀ carotenoid sarcinaxanthin and its glycosylated derivatives is of great commercial importance and represents a significant contribution to knowledge of carotenoid biosynthesis. As discussed below, this has resulted in a much-needed advance in methods for the production of sarcinaxanthin and in the identification of a new class of cyclase, namely a C₅₀ carotenoid γ-cyclase, which will be useful in the synthesis of structurally different carotenoids.

As noted above and described below, the present invention is based on the identification, cloning and sequencing of a gene cluster for the biosynthesis of sarcinaxanthin, which had not previously been available. Furthermore, the present inventors have isolated a novel strain of M. luteus, named Otnes7, which produces sarcinaxanthin in greater quantities than other known strains. The identification, cloning and sequencing of the gene cluster for the biosynthesis of sarcinaxanthin from M. luteus strain NCTC2665 has allowed the identification and cloning of nucleic acids from the Otnes7 strain, which encode novel proteins whose expression results in increased sarcinaxanthin production compared with the proteins of the NCTC2665 strain. Heterologous expression of one or more of the sarcinaxanthin biosynthesis genes in a host cell has enabled a method for producing sarcinaxanthin efficiently and economically.

Analysis of the cloned genes has further allowed the elucidation of the biosynthetic pathway for sarcinaxanthin. Accordingly, it is now proposed that sarcinaxanthin synthesis normally begins with the synthesis of lycopene, as described above, which is converted first to nonaflavuxanthin and then to flavuxanthin by a lycopene elongase, encoded in M. luteus by the gene crtE2. The resultant flavuxanthin is cyclised by a heterodimeric C₅₀ γ-cyclase, encoded in M. luteus by crtYg and crtYh, yielding sarcinaxanthin (Figure 1). The sarcinaxanthin biosynthetic gene cluster also encodes at least one protein (CrtX) for the glycosylation of the synthesized molecules.
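The carbon arithmetic of this route can be checked with a short Python sketch. It only encodes carbon counts already stated in this section (C₁₅ FPP, C₂₀ GGPP, C₄₀ phytoene and lycopene, C₅₀ flavuxanthin and sarcinaxanthin); the C₄₅ count for nonaflavuxanthin follows from the one-C₅-unit-at-a-time elongation described above for C. glutamicum. The table layout and function name are illustrative assumptions, not part of the disclosed method.

```python
# Carbon bookkeeping for the sarcinaxanthin route described in the text.
# Each entry: (product, activity forming it, M. luteus gene(s), carbons in product).
PATHWAY = [
    ("FPP", None, None, 15),
    ("GGPP", "GGPP synthase", "crtE", 20),
    # Phytoene (C40) is formed from two C20 GGPP molecules, so this step gains 20 C.
    ("phytoene", "phytoene synthase", "crtB", 40),
    # Desaturation changes the number of double bonds, not the carbon count.
    ("lycopene", "phytoene desaturase", "crtI", 40),
    ("nonaflavuxanthin", "lycopene elongase, 1st C5 unit", "crtE2", 45),
    ("flavuxanthin", "lycopene elongase, 2nd C5 unit", "crtE2", 50),
    # Cyclization rearranges the chain ends without adding carbons.
    ("sarcinaxanthin", "C50 carotenoid gamma-cyclase", "crtYg + crtYh", 50),
]


def carbon_deltas(pathway):
    """Return (product, carbons gained) for each step after the starting FPP."""
    return [
        (step[0], step[3] - previous[3])
        for previous, step in zip(pathway, pathway[1:])
    ]


if __name__ == "__main__":
    for (product, gained), step in zip(carbon_deltas(PATHWAY), PATHWAY[1:]):
        print(f"{product:<17} +{gained:>2} C  via {step[1]} ({step[2]})")
```

Listing the steps this way makes explicit that only two kinds of step change the carbon count (GGPP formation/condensation and the two elongations), while desaturation and γ-cyclization do not.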

Since the chemical synthesis of compounds such as this is highly complex, a biosynthetic route is in practice needed, and accordingly the isolation or purification of the compounds from appropriate hosts, particularly heterologous hosts (that is, hosts transformed with one or more genes enabling the biosynthesis), is desirable. This also affords the opportunity to manipulate genes of the biosynthetic gene cluster in order to alter the biosynthesis and thereby improve yields and/or produce new or modified carotenoid compounds.

In this respect, there remains a need and desire to provide methods for the improved production of carotenoid compounds (for example to improve yield or production conditions, or to expand the range of available host cells), and the present invention is directed to these aims, based on the cloning and DNA sequencing of the sarcinaxanthin biosynthetic gene cluster. This provides the first characterisation of these carotenoid biosynthetic genes, as well as a tool for genetic manipulation in order to modify the expression levels or properties of sarcinaxanthin and/or the producing organism. Whilst the carotenoid sarcinaxanthin is known and the genome sequence of M. luteus strain NCTC2665 is available, it was not a straightforward matter to identify and clone the sarcinaxanthin gene cluster, given the plurality of carotenoid-based molecules synthesised in M. luteus, the corresponding plurality of biosynthetic genes necessary for their synthesis, and the relatively poor sequence homology between the sequences of the present invention and known carotenoid biosynthesis genes; considerable effort and ingenuity in sequence analysis were required. Furthermore, only after the identification and characterisation of the sarcinaxanthin gene cluster from M. luteus strain NCTC2665 was it possible to identify homologous genes from the novel Otnes7 strain of the invention, which, as discussed below, led to the identification of genes whose expression gave more efficient sarcinaxanthin production than the genes of the NCTC2665 strain.

The present inventors have isolated and purified sarcinaxanthin from a previously unknown source, bacterial isolate Otnes7, believed to be a novel strain of M. luteus (deposited in the name of the applicant under the deposit number DSM 23579, on 29 April 2010, at the Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ)), which was isolated from the surface microlayer along the central part of the Norwegian coast. The isolation of this novel microorganism has enabled the inventors to clone and sequence a novel sarcinaxanthin biosynthetic gene cluster, which shows improved activity compared with the gene clusters of known strains. The biosynthetic gene cluster contains 8 genes that encode proteins believed to be involved in the biosynthesis of the sarcinaxanthin molecule and derivatives thereof (see Table 1). Based on knowledge of the sequence, the inventors have been able to use various methods of genetic manipulation to confirm the activity of the proteins encoded by the gene cluster and to show that the sequences identified in the Otnes7 strain are indeed responsible for enhanced sarcinaxanthin biosynthesis.

The complete coding sequence for (i.e. the complete nucleotide sequence encoding) the sarcinaxanthin biosynthetic gene cluster from the NCTC2665 strain is shown in SEQ ID NO: 1. This has been shown to contain a number of genes or ORFs that are believed to encode all of the proteins and polypeptides required for normal sarcinaxanthin biosynthesis in M. luteus. The group of proteins and polypeptides encoded by the gene cluster as a whole is collectively referred to as the biosynthetic machinery for the biosynthesis of sarcinaxanthin.

In silico screening of the M. luteus strain NCTC2665 DNA sequence data (which has been deposited under accession number NC_012803) resulted in the initial identification of a putative carotenoid biosynthesis gene cluster consisting of six open reading frames, or1009 to or1014 (comprised within SEQ ID NO: 1). The deduced or1014 gene product displayed only 31% and 33% primary sequence identity to known CrtE proteins of C. glutamicum and Dietzia sp., respectively, both of which are geranyl geranyl pyrophosphate (GGPP) synthases. CrtE catalyzes the first reaction specific to the carotenoid branch of general isoprenoid metabolism, the conversion of farnesyl pyrophosphate (FPP) into GGPP. The or1014 gene was therefore designated crtE (SEQ ID NO: 18 and 19). The deduced or1013 gene product displayed only 41% and 48% primary sequence identity to the CrtB proteins of C. glutamicum and Dietzia sp., respectively, which are phytoene synthases catalyzing the condensation of two GGPP molecules to phytoene. The or1013 gene was therefore designated crtB (SEQ ID NO: 20 and 21). The deduced or1012 gene product displayed only 43% and 53% primary sequence identity to the CrtI proteins of C. glutamicum and Dietzia sp., respectively. These proteins are phytoene desaturases, which catalyse the conversion of phytoene to lycopene by stepwise desaturation reactions. The or1012 gene was therefore designated crtI (SEQ ID NO: 22 and 23). The deduced or1011 gene product displayed only 50% and 52% primary sequence identity to the lycopene elongases of C. glutamicum and Dietzia sp., respectively. In C. glutamicum this enzyme (encoded by crtEb) catalyses the conversion of lycopene into nonaflavuxanthin and flavuxanthin. Secondary structure analysis revealed six transmembrane helices for the M. luteus elongase, five for the C. glutamicum elongase and eight for the Dietzia sp. elongase, strongly indicating that all are transmembrane proteins.
The or1011 gene was designated crtE2 (SEQ ID NO: 6 and 8). The deduced or1010 and or1009 gene products displayed only 32% and 31% primary sequence identity, respectively, to the C₅₀ ε-cyclase subunits of C. glutamicum encoded by crtYe and crtYf. They also shared only 36% and 38% primary sequence identity with the corresponding proteins in Dietzia sp. In C. glutamicum, the crtYe and crtYf gene products are small polypeptides assumed to form a heterodimeric enzyme that catalyses the conversion of flavuxanthin into decaprenoxanthin. The or1010 and or1009 gene products each exhibit three transmembrane helices, and secondary structure analysis likewise revealed three transmembrane helices for each C₅₀ cyclase subunit from C. glutamicum and Dietzia sp. The or1010 and or1009 genes were designated crtYg (SEQ ID NO: 2 and 3) and crtYh (SEQ ID NO: 4 and 5), respectively.

Further analysis of the gene cluster revealed that immediately downstream of crtYh there is an ORF encoding a hypothetical protein (SEQ ID NO: 24 and 25), followed by or1007, which encodes a putative polypeptide sharing only 43% sequence identity with the putative glycosyl transferase CrtX from Dietzia sp., which has been suggested to be involved in the glycosylation of C.p.450 (Tao, Yao et al. 2007). The or1007 gene was therefore designated crtX (SEQ ID NO: 16 and 17).

Without wishing to be bound by any single hypothesis, it is believed, owing to the close proximity and common orientation of the genes, that the crtEIBE2YgYh genes are cotranscribed in M. luteus. Moreover, the assumed stop codons of crtB, crtI, crtE2 and crtYg overlap the start codon of the respective downstream gene, which may allow translational coupling to ensure equimolar expression and/or proper folding of the products. Whilst the genetic organization of the crt genes in M. luteus displays some similarities to the previously published biosynthetic gene clusters for the C₅₀ carotenoids C.p.450 in Dietzia sp. and decaprenoxanthin in C. glutamicum, in view of the differences in gene order and the relatively low sequence identity between the genes, it was only after experimental analysis, as discussed elsewhere herein, that the above-described gene cluster was confirmed as being involved in sarcinaxanthin biosynthesis.
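An overlap of this kind, in which the stop codon of an upstream gene shares nucleotides with the start codon of the next gene (for example the four-nucleotide ATGA overlap), can be checked mechanically from sequence coordinates. The Python sketch below is illustrative only: the function names are hypothetical, positions are 0-based on a single strand, and the example sequences are invented toy strings, not M. luteus data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# ATG is the usual bacterial start codon; GTG and TTG are also used.
START_CODONS = {"ATG", "GTG", "TTG"}


def codons_overlap(stop_pos: int, start_pos: int) -> bool:
    """True if 3-nt codons beginning at stop_pos and start_pos (0-based) share a base."""
    return max(stop_pos, start_pos) < min(stop_pos, start_pos) + 3


def is_overlapping_junction(seq: str, stop_pos: int, start_pos: int) -> bool:
    """True if seq has a stop codon at stop_pos and a start codon at start_pos
    (same strand) and the two codons overlap, as at a possible coupling site."""
    stop = seq[stop_pos:stop_pos + 3].upper()
    start = seq[start_pos:start_pos + 3].upper()
    return stop in STOP_CODONS and start in START_CODONS and codons_overlap(stop_pos, start_pos)


if __name__ == "__main__":
    # Toy sequences only (not M. luteus data).
    print(is_overlapping_junction("ATGA", stop_pos=1, start_pos=0))    # True: ATGA overlap
    print(is_overlapping_junction("TAATG", stop_pos=0, start_pos=2))   # True: TAATG overlap
    print(is_overlapping_junction("TGAATG", stop_pos=0, start_pos=3))  # False: adjacent only
```

The overlap test is a plain interval-intersection check on the two 3-nt codon spans; a real analysis would take the stop and start positions from the annotated ORF coordinates.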

As discussed above, the sarcinaxanthin biosynthetic gene cluster is a nucleic acid molecule which contains the various genetic elements, genes or ORFs that encode the proteins or polypeptides required for the biosynthesis of the sarcinaxanthin molecule or a sarcinaxanthin derivative. However, not all of the encoded proteins and polypeptides have yet been ascribed a role in the biosynthesis, and so it is thought that not all of them are essential for sarcinaxanthin biosynthesis. The various genes and ORFs may encode enzymes that catalyse one or more biochemical reactions, or proteins that do not have catalytic activity but are instead involved in other processes, such as the regulation of sarcinaxanthin synthesis or sarcinaxanthin transport.

Each sarcinaxanthin biosynthetic gene or ORF encodes a single polypeptide chain (which can alternatively be described as a protein; the terms "polypeptide" and "protein" are used interchangeably herein) that has or is believed to have a function in the biosynthesis of the sarcinaxanthin molecule or a derivative thereof. Eight such genes or ORFs have been identified (see Table 1). As shown in Figure 1, six of these are ascribed a direct role in the biosynthesis of sarcinaxanthin, whilst a seventh has been shown to have a role in the glycosylation of sarcinaxanthin to mono- and diglucoside forms and the eighth has not yet been ascribed a function.

However, as discussed further below, only two of the genes or ORFs are essential for the biosynthesis of sarcinaxanthin, namely those encoding the enzyme which catalyses the final step of the biosynthetic pathway, the conversion of flavuxanthin to sarcinaxanthin (crtYg and crtYh). The other genes may be replaced by genes encoding enzymes with equivalent functional activities, or with alternative activities that result in the production of flavuxanthin, the substrate for the C₅₀ carotenoid γ-cyclase encoded by crtYg and crtYh. In other words, for the production of sarcinaxanthin in a host cell it is not necessary to introduce the entire biosynthetic cluster from M. luteus into said cell (although this is contemplated by the present invention): introducing the genes encoding the enzyme that catalyses the final step of the pathway is sufficient, provided that the substrate for the sarcinaxanthin-synthesising C₅₀ carotenoid γ-cyclase, i.e. flavuxanthin, is present in said cell.

In particular, as described in the examples herein, it has been found that higher levels of sarcinaxanthin production may be obtained by recombinant expression of the sarcinaxanthin-producing enzymes (i.e. of the sarcinaxanthin biosynthetic machinery) in a heterologous host than in native M. luteus cells. Thus, for sarcinaxanthin production, recombinant expression is favoured over extraction from natural sources (i.e. over isolation of the product from cells in which it is naturally produced).

Thus in a very general sense, the present invention provides a method of producing sarcinaxanthin or a derivative thereof, said method comprising introducing into and expressing in a host cell one or more nucleic acid molecules encoding the sarcinaxanthin biosynthetic pathway.

By allowing the nucleic acid molecules to be expressed, the encoded biosynthetic machinery may act in the host cell to synthesise the sarcinaxanthin, which may be recovered from the host cell. Thus, in the method above, the sarcinaxanthin or derivative thereof is synthesised in the host cell, and the method may comprise the further step of isolating the sarcinaxanthin or derivative thereof from the host cell.

As noted above, it is not necessary to introduce the entire biosynthetic pathway into the host, as long as the host is capable of making an intermediate or substrate in the pathway (i.e. a sarcinaxanthin precursor). For example, a host already capable of synthesising lycopene and/or flavuxanthin may be used.

Thus, in a further broad sense, the invention may be seen as providing a method of producing sarcinaxanthin or a derivative thereof, said method comprising introducing into and expressing in a host cell one or more nucleic acid molecules encoding an activity in the sarcinaxanthin biosynthetic pathway. As noted above, such a host cell will be a cell which produces an appropriate substrate or substrates for the introduced activity or activities, for example a lycopene-producing host cell, or a flavuxanthin-producing host cell. Preferably the host cells do not endogenously contain all of the nucleic acid molecules required for the synthesis of sarcinaxanthin or a derivative thereof, i.e. do not naturally produce sarcinaxanthin, but may preferably comprise nucleic acid molecules encoding proteins required for the synthesis of sarcinaxanthin precursors, e.g. lycopene, nonaflavuxanthin or flavuxanthin. Such nucleic acid molecules may be present endogenously i.e. the host cell may be a native producer of lycopene, nonaflavuxanthin and/or flavuxanthin. In a particularly preferred embodiment the host cell is a cell or microorganism other than that from which the nucleic acid molecules were (or from which they may be) derived and in which the molecules are natively present.

As will be described in more detail below, the nucleic acid molecules which are introduced will preferably encode one or more of the biosynthetic proteins of the organism M. luteus. In other words, the nucleic acid molecules will be derived from, or will correspond to, the crt genes of M. luteus, as described herein. As noted above, and described in more detail below, in certain cases, for example in the case of proteins involved in the biosynthesis up to the intermediate flavuxanthin, nucleic acid molecules encoding equivalent proteins from other sources may be used.

More particularly, the method of the invention involves (or comprises) the introduction and expression of a nucleic acid molecule encoding a protein having C ₅₀ carotenoid γ-cyclase activity. Such a protein may be an enzyme which catalyses the conversion of flavuxanthin to sarcinaxanthin, and in particular such an enzyme which performs this reaction in M. luteus. Thus, the protein may correspond to the gene product of the crtYgYh genes of M. luteus. Such proteins are described further below.

As noted above, the gene cluster for the entire biosynthetic pathway for sarcinaxanthin has been cloned and identified in M. luteus. Whilst a nucleic acid molecule corresponding to the entire gene cluster of M. luteus may be used according to the invention, nucleic acid molecules based on genes encoding equivalent proteins from other sources may be used to provide the host cell with the proteins needed to synthesize a substrate, or intermediate, in the pathway. Thus, for example, host cells producing lycopene are known in the art, as are nucleic acid molecules encoding lycopene-synthesising enzymes, which may be used to engineer a host cell suitable for use according to the invention to produce lycopene. Similarly, a flavuxanthin-producing host cell may be used, or may be engineered to produce flavuxanthin.

Accordingly, one aspect of the invention provides a method of producing sarcinaxanthin or a derivative thereof, said method comprising introducing into and expressing in a host cell:

(a) one or more nucleic acid molecules comprising nucleotide sequences encoding one or more proteins capable of synthesising flavuxanthin; and

(b) one or more nucleic acid molecules comprising nucleotide sequences encoding one or more proteins having or contributing to C ₅₀ carotenoid γ-cyclase activity, for example proteins capable of catalysing the conversion of flavuxanthin to sarcinaxanthin.

A further, more particular, aspect of the invention thus provides a method of producing sarcinaxanthin or a derivative thereof, said method comprising introducing into and expressing in a lycopene-producing host cell:

(a) one or more nucleic acid molecules comprising nucleotide sequences encoding one or more proteins capable of catalysing the conversion of lycopene to flavuxanthin, or, alternatively viewed, having lycopene elongase activity; and

(b) one or more nucleic acid molecules comprising nucleotide sequences encoding one or more proteins having or contributing to C ₅₀ carotenoid γ-cyclase activity, or, alternatively viewed, capable of catalysing the conversion of flavuxanthin to sarcinaxanthin.

In the context above, the term "contributing" reflects the fact that the C₅₀ carotenoid γ-cyclase enzyme is heterodimeric: a single subunit on its own, e.g. as encoded by crtYg or crtYh alone, is not active. Both subunits are required for C₅₀ carotenoid γ-cyclase activity, but each subunit contributes to that activity.

More specific embodiments of these aspects of the invention are described further below. However, in general terms the nucleic acid molecules of (b) may be obtained or derived from M. luteus, e.g. they may correspond to or be derived from the nucleotide sequences from M. luteus encoding proteins having or contributing to C₅₀ carotenoid γ-cyclase activity, as described herein; more particularly, they may correspond to or be derived from the crtYg or crtYh genes of M. luteus as described herein. The nucleic acid molecules encoding proteins capable of synthesising flavuxanthin may be obtained or derived from other sources, for example from genes known to be efficient in encoding proteins for lycopene synthesis in other organisms (e.g. the crtEIB genes from Pantoea ananatis, which are particularly useful in this respect and are described below). By way of further example, nucleic acid molecules encoding proteins having lycopene elongase activity may be obtained or derived from organisms synthesising flavuxanthin, such as Corynebacterium glutamicum (crtEb) or M. luteus (crtE2).

Thus, more particularly the method of the invention may involve introducing into and expressing in a host cell one or more nucleic acid molecules comprising a nucleotide sequence encoding:

(i) a protein capable of catalysing the conversion of farnesyl pyrophosphate (FPP) into geranyl geranyl pyrophosphate (GGPP) (e.g. a protein as encoded by a crtE gene);

(ii) a protein capable of catalysing the condensation of GGPP to phytoene (e.g. a protein as encoded by a crtB gene);

(iii) a protein capable of catalysing the conversion of phytoene to lycopene, or alternatively put, a protein having phytoene dehydrogenase activity (e.g. a protein as encoded by a crtI gene);

(iv) a protein capable of catalysing the conversion of lycopene to flavuxanthin, or, alternatively viewed, having lycopene elongase activity (e.g. a protein as encoded by a crtE2 or a crtEb gene); and

(v) a protein having or contributing to C ₅₀ carotenoid γ-cyclase activity, or, alternatively viewed, capable of catalysing the conversion of flavuxanthin to sarcinaxanthin (e.g. proteins as encoded by a crtYg gene and a crtYh gene as described herein).

As noted above, in a preferred embodiment nucleic acid molecules encoding (iv) and (v) above are introduced into a lycopene-producing host.
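Which of activities (i) to (v) must be introduced depends on how far along the pathway the host already gets. A minimal Python sketch of that bookkeeping follows, assuming a host can be summarised by the most advanced pathway product it makes endogenously. The function names are hypothetical, nonaflavuxanthin is not listed separately because activity (iv) covers both elongation steps, and the heterodimeric γ-cyclase is modelled as active only when both subunits are present.

```python
# Activities (i) to (v) from the text, each paired with the product it forms.
ACTIVITIES = [
    ("i", "FPP -> GGPP, e.g. crtE", "GGPP"),
    ("ii", "GGPP -> phytoene, e.g. crtB", "phytoene"),
    ("iii", "phytoene -> lycopene, e.g. crtI", "lycopene"),
    ("iv", "lycopene -> flavuxanthin (elongase), e.g. crtE2 or crtEb", "flavuxanthin"),
    ("v", "flavuxanthin -> sarcinaxanthin (C50 gamma-cyclase), crtYg + crtYh", "sarcinaxanthin"),
]
PRODUCTS = ["FPP"] + [product for _, _, product in ACTIVITIES]


def activities_to_introduce(endogenous_product: str) -> list[str]:
    """Labels of the activities a host still lacks, given the most advanced
    pathway product it already makes ("FPP" if it makes none of the others)."""
    if endogenous_product not in PRODUCTS:
        raise ValueError(f"unknown pathway product: {endogenous_product!r}")
    # Every activity downstream of the endogenous product must be supplied.
    return [label for label, _, _ in ACTIVITIES[PRODUCTS.index(endogenous_product):]]


def gamma_cyclase_active(subunits: set[str]) -> bool:
    """The C50 gamma-cyclase is heterodimeric: neither subunit is active alone."""
    return {"CrtYg", "CrtYh"} <= subunits


if __name__ == "__main__":
    print(activities_to_introduce("lycopene"))
    print(gamma_cyclase_active({"CrtYg"}), gamma_cyclase_active({"CrtYg", "CrtYh"}))
```

For a lycopene-producing host the sketch returns `["iv", "v"]`, the combination identified above as preferred; for a flavuxanthin-producing host only `["v"]` remains, which requires both CrtYg and CrtYh.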

However, it is not precluded that the invention comprises the introduction of all the activities (i) to (v) set out above, and this may depend on the selected host, the particular nucleic acid molecules involved, etc. Thus, by way of representative example only, the method of the invention may comprise introducing into a host cell and expressing a nucleic acid molecule comprising the nucleotide sequence encoding the entire biosynthetic gene cluster, for example as obtained or derivable from a strain of M. luteus, e.g. as set forth in SEQ ID NO: 1, SEQ ID NO: 26 or SEQ ID NO: 37, or a sequence with at least 70% sequence identity to SEQ ID NO: 1, 26 or 37, or a part thereof, including particularly a part encoding the sarcinaxanthin biosynthetic pathway. In further embodiments, such a molecule may include a part of SEQ ID NO: 1, 26 or 37 which encodes one or more activities in the biosynthetic pathway, and more particularly a part which encodes a C₅₀ carotenoid γ-cyclase activity.

The nucleic acid molecule(s) which are introduced may be in the form of a single nucleic acid molecule or separate nucleic acid molecules. Thus a single nucleic acid molecule may comprise nucleotide sequences encoding all of the proteins/activities which are to be introduced, or the proteins/activities may be encoded by nucleotide sequences provided by (or on) more than one nucleic acid molecule.

The nucleic acid molecules for use in the method of the invention need not comprise the entire sarcinaxanthin biosynthetic gene cluster but may comprise a portion or part of it, more specifically a part encoding one or more proteins having a particular enzymic activity, and particularly a C ₅₀ carotenoid γ-cyclase activity, more particularly a lycopene elongase activity and a C ₅₀ carotenoid γ-cyclase activity.

A "sarcinaxanthin biosynthetic gene or ORF" refers to a gene or ORF which encodes a protein or polypeptide that is functional in the biosynthetic process of sarcinaxanthin or a sarcinaxanthin derivative. As noted above, this could be an enzyme that is involved in any step of the pathway, not only the final step of conversion of flavuxanthin to sarcinaxanthin, but also in the synthesis of lycopene or flavuxanthin or the precursors thereof, a protein that is involved in the modification of sarcinaxanthin to produce a sarcinaxanthin derivative (e.g. a glycosylated derivative) or a protein that is required for regulation or for transport of the molecule at any stage of its biosynthesis.

A nucleic acid molecule of the invention and for use in the method of the invention may be an isolated nucleic acid molecule (in other words isolated or separated from the components with which it is normally found in nature) or it may be a recombinant or a synthetic nucleic acid molecule.

The nucleic acid molecules may encode (or comprise a nucleotide sequence encoding) at least 1, or more, e.g. 2, 3, 4, 5, 6, 7 or 8, of the polypeptides or proteins that are involved in the biosynthesis of sarcinaxanthin or a sarcinaxanthin derivative. For example, the method may involve the introduction of a single nucleic acid molecule encoding, e.g., proteins having lycopene elongase and C₅₀ carotenoid γ-cyclase activity, for example crtE2, crtYh and crtYg (or genes encoding proteins with equivalent functional activity, e.g. crtEb in place of crtE2). Alternatively, it may comprise nucleic acid molecules corresponding to all of the ORFs/genes set out in Table 1 except any one or more of crtX and the gene encoding the hypothetical protein (ORF1).

Each of the nucleic acid molecules of the method of the invention thus encodes one or more polypeptides involved in the biosynthesis of, or having functional activity in, the synthesis of sarcinaxanthin or a sarcinaxanthin derivative. Such a molecule may encode not only the known proteins as they are found in nature, but also a functionally equivalent variant of such a native protein, that is, a protein which retains the activity of the native protein and which comprises one or more modifications in its amino acid sequence, for example an amino acid substitution, deletion and/or insertion. Thus, fragments (or parts) of proteins are included as long as they retain the activity of the parent protein. Also included are degenerate nucleic acid molecules, i.e. nucleic acid molecules in which the nucleotide sequence is varied with respect to the native sequence but which encode the same polypeptide. As defined above, the nucleic acid molecules of the invention may thus comprise functionally equivalent variants of SEQ ID NO: 1, SEQ ID NO: 26 or SEQ ID NO: 37, and such variants may include parts, degenerate sequences, or homologues defined by a % sequence identity to SEQ ID NO: 1. Such functionally equivalent variants encode proteins/polypeptides having functional activity as defined above. Furthermore, "parts" or "portions" as described herein may be functional equivalents. Preferably these portions satisfy the identity (relative to a comparable region) or hybridizing conditions mentioned herein.

Such functional activity may be enzymatic activity, e.g. an activity involved in the synthesis of sarcinaxanthin. Such activities, or proteins having such activities, are as defined above, and may be e.g. an activity corresponding to the activity of crtE, crtB, crtI, crtE2, crtYg and/or crtYh. Such functional activity may also be sarcinaxanthin glycosylase activity corresponding to the activity of crtX. As mentioned above, a number of genes and ORFs have been identified within SEQ ID NO: 1, SEQ ID NO: 26 and SEQ ID NO: 37, and parts or fragments which correspond to such genes or ORFs represent preferred "parts" or fragments of SEQ ID NO: 1, 26 or 37. These are tabulated in Table 1 below:

Table 1

As described in more detail below, further work has revealed the presence of additional genes within the gene cluster represented by SEQ ID NO: 26. Thus, although not shown in SEQ ID NO: 26, this gene cluster also includes a crtX gene, encoding a sarcinaxanthin glycosylase, the nucleotide and encoded amino acid sequences of which are shown in SEQ ID NOs: 33 and 34, respectively. The "full length" gene cluster of the Otnes7 strain is shown in SEQ ID NO: 37.

The sequences set out above thus represent sarcinaxanthin biosynthetic genes or ORFs. In other words, such genes/ORFs are found within the sarcinaxanthin biosynthetic gene cluster and encode proteins or polypeptides which have or are proposed to have a role in the biosynthesis of sarcinaxanthin in M. luteus. The term "sarcinaxanthin biosynthetic gene" or "sarcinaxanthin biosynthetic ORF" also includes genes and ORFs which encode proteins that share activity or function with the above proteins, and for example share high levels of sequence identity, as discussed elsewhere herein. They can alternatively be described as "functionally equivalent variants" or "functional equivalents".

In this respect, the sarcinaxanthin biosynthetic gene cluster has also been cloned from the novel Micrococcus luteus strain Otnes7, and the proteins encoded by its genes can be considered functional equivalents of the NCTC2665 sarcinaxanthin biosynthetic proteins. However, as discussed elsewhere herein, the Otnes7 strain produces higher levels of carotenoids than the NCTC2665 strain, e.g. 190 μg/g cell dry weight (CDW) and 145 μg/g CDW, respectively. This difference in sarcinaxanthin production is large enough for the two strains to be distinguished by visual inspection: the Otnes7 strain is visibly more intensely coloured than the NCTC2665 strain. Furthermore, when expressed in a heterologous host, the Otnes7 genes gave higher sarcinaxanthin production levels than the NCTC2665 genes. From experimental analysis of the Otnes7 biosynthetic gene cluster, the present inventors determined that the Otnes7 genes comprise specific sequence modifications compared with the genes from the NCTC2665 strain. It is unclear exactly why the Otnes7 genes result in increased production, and this may depend upon the host used for expression. However, it is possible that they encode proteins with enhanced catalytic activity (or substrate conversion efficiency) compared with those encoded by the NCTC2665 strain. Specifically, in the experiments in the examples described below, the CrtE2 protein from the Otnes7 strain shows a relative conversion efficiency of lycopene to nonaflavuxanthin and flavuxanthin of 79%, whereas the equivalent protein from the NCTC2665 strain has a conversion efficiency of only 23%.
Furthermore, when the nucleic acids from the Otnes7 strain encoding CrtE2, CrtYg and CrtYh are expressed in a heterologous host cell, at least 97% of the carotenoid produced is sarcinaxanthin, whereas expression of the same genes from NCTC2665 results in only about 90% of the carotenoid produced being sarcinaxanthin.
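The strain comparison above reduces to a few ratios. The Python sketch below works them through, taking the Otnes7 carotenoid content as 190 μg/g CDW; the variable and function names are illustrative, and the percentages are simply those stated in the text (97% being a lower bound and 90% approximate).

```python
# Strain comparison figures as reported in the text (Otnes7 vs NCTC2665).
CAROTENOID_UG_PER_G_CDW = {"Otnes7": 190, "NCTC2665": 145}
CRTE2_CONVERSION_PCT = {"Otnes7": 79, "NCTC2665": 23}
# Otnes7 value is a lower bound ("at least 97%"); NCTC2665 value is approximate.
SARCINAXANTHIN_SHARE_PCT = {"Otnes7": 97, "NCTC2665": 90}


def fold_change(values: dict) -> float:
    """Ratio of the Otnes7 value to the NCTC2665 value."""
    return values["Otnes7"] / values["NCTC2665"]


def other_carotenoid_pct(share_pct: float) -> float:
    """Percentage of total carotenoid that is not sarcinaxanthin."""
    return 100 - share_pct


if __name__ == "__main__":
    print(f"total carotenoid: {fold_change(CAROTENOID_UG_PER_G_CDW):.2f}x")  # 1.31x
    print(f"CrtE2 conversion: {fold_change(CRTE2_CONVERSION_PCT):.2f}x")     # 3.43x
    for strain, share in SARCINAXANTHIN_SHARE_PCT.items():
        print(f"{strain}: {other_carotenoid_pct(share)}% other carotenoids")
```

On these figures Otnes7 gives roughly 1.3-fold more total carotenoid, its CrtE2 shows about 3.4-fold higher relative conversion, and the non-sarcinaxanthin fraction falls from about 10% to at most 3%.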

Thus, in a further, and preferred, aspect the present invention also provides nucleic acid molecules which correspond to, or are based on or derived from, the Otnes7 genes (i.e. the sarcinaxanthin biosynthetic gene cluster of the Otnes7 strain). In one embodiment of this aspect the invention can be seen to provide a nucleic acid molecule comprising or consisting of all or a part of a nucleotide sequence as set forth in SEQ ID NO: 26 or 37, or which has at least 90% sequence identity to SEQ ID NO: 26 or 37, which molecule encodes one or more proteins having activity in the biosynthesis of sarcinaxanthin, and wherein any nucleic acid molecule which comprises a nucleotide sequence which is a part of SEQ ID NO: 26 or 37, or which is at least 90% identical to SEQ ID NO: 26 or 37, encodes proteins which are able to synthesise sarcinaxanthin at substantially the same level as the proteins encoded by SEQ ID NO: 26 or 37 when expressed in a host cell.
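The 90% identity criterion presupposes a definition of percent identity, which this section does not spell out. As a hedged illustration, the Python sketch below computes identity over two sequences that have already been aligned (for example with a BLAST-type tool, as cited among the document's references) and counts gap columns as mismatches. This is one common convention, assumed here; other conventions, such as normalising by the length of the shorter sequence, give different numbers. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences, with '-' marking gaps.

    Assumed convention: identical non-gap columns divided by all alignment
    columns, where gap columns count as mismatches and columns that are gaps
    in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if percent identity is at least `threshold` (90% in the text above)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy aligned fragments, not sequences from SEQ ID NO: 26 or 37.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(meets_identity_threshold("AC-TACGTAC", "ACGTACGTAC"))
```

In both toy examples one column in ten differs (a substitution or a gap), so each pair sits exactly at 90% identity and passes the threshold.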

Thus, such a nucleic acid molecule, comprising a part of SEQ ID NO: 26 or 37, or a variant of SEQ ID NO: 26 or 37 (or of a part thereof) having at least 90% sequence identity, may encode a particular protein or enzyme in the pathway, or a protein which is a constituent part of an enzyme in the pathway. When such a nucleic acid molecule is expressed, for example together with other nucleic acid molecules corresponding to parts of SEQ ID NO: 26 or 37 that encode other enzymes/proteins in the pathway, the level of sarcinaxanthin production is substantially the same as when SEQ ID NO: 26 or 37 is expressed in the host cell. In other words, a sequence variant or a part of SEQ ID NO: 26 or 37 will encode an activity, or a protein contributing to an activity, which is at the same or an equivalent level to the activity of the protein encoded by SEQ ID NO: 26 or 37. "Substantially the same level" may be taken to mean activity which is at least 90%, more particularly at least 91, 92, 93 or 94%, more preferably at least 95, 96, 97, 98 or 99% of the activity of the equivalent protein encoded by SEQ ID NO: 26 or 37. Thus the nucleic acid molecules of the invention encode proteins which are substantially as active as the native proteins encoded by SEQ ID NO: 26 or 37, i.e. they retain the improved properties of the Otnes7 genes.
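The "substantially the same level" criterion is a simple ratio against the activity of the protein encoded by SEQ ID NO: 26 or 37. The Python sketch below shows that comparison; the function names and assay values are hypothetical, and how the activity itself is measured is left open, as in the text.

```python
def relative_activity(variant_activity: float, reference_activity: float) -> float:
    """Express a variant's activity as a percentage of the reference (native) activity."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * variant_activity / reference_activity


def substantially_same_level(variant_activity: float, reference_activity: float,
                             threshold_percent: float = 90.0) -> bool:
    """True if the variant reaches at least threshold_percent of the reference activity."""
    return relative_activity(variant_activity, reference_activity) >= threshold_percent


# Hypothetical assay readings in arbitrary units; not data from this document.
native, variant = 50.0, 46.0
print(relative_activity(variant, native))                               # 92.0
print(substantially_same_level(variant, native))                        # True (90% floor)
print(substantially_same_level(variant, native, threshold_percent=95))  # False
```

The default of 90% mirrors the broadest threshold above; the stricter preferred thresholds are obtained by passing a different `threshold_percent`.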

It will be evident from the structure of the sarcinaxanthin biosynthetic gene cluster from M. luteus NCTC2665 described above that the sarcinaxanthin biosynthetic gene cluster from the Otnes7 strain may also comprise coding sequences in addition to those presented in SEQ ID NO: 26, i.e. the coding sequences presented in SEQ ID NO: 37. For instance, the sarcinaxanthin biosynthetic gene cluster from the Otnes7 strain also comprises a nucleic acid region encoding a protein with sarcinaxanthin glycosylase activity, i.e. a crtX gene. Hence, the present invention may also be seen to provide a nucleic acid molecule comprising or consisting of all or a part of a nucleotide sequence as set forth in SEQ ID NO: 37, or which has at least 90% sequence identity to SEQ ID NO: 37, which molecule encodes one or more proteins having activity in the biosynthesis of sarcinaxanthin, and wherein any nucleic acid molecule which comprises a nucleotide sequence which is a part of SEQ ID NO: 37, or which is at least 90% identical to SEQ ID NO: 37, encodes proteins which are able to synthesise sarcinaxanthin at substantially the same level as the proteins encoded by SEQ ID NO: 37 when expressed in a host cell. In a preferred aspect of the invention the nucleic acid molecule comprises or consists of all or a part of a nucleotide sequence as set forth in SEQ ID NO: 26, or which has at least 90% sequence identity to SEQ ID NO: 26, which molecule encodes one or more proteins having activity in the biosynthesis of sarcinaxanthin, and wherein any nucleic acid molecule which comprises a nucleotide sequence which is a part of SEQ ID NO: 26, or which is at least 90% identical to SEQ ID NO: 26, encodes proteins which are able to synthesise sarcinaxanthin at substantially the same level as the proteins encoded by SEQ ID NO: 26 when expressed in a host cell.

More particularly, the present invention also provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding all or part of a protein having an amino acid sequence as set forth in SEQ ID NO: 11 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 11, and wherein said nucleotide sequence encodes a lycopene elongase with a lycopene to flavuxanthin conversion efficiency of at least 30% when expressed in a host cell, or a nucleic acid molecule which comprises a nucleotide sequence which is the complement of any aforesaid sequence.

Preferably, the conversion efficiency is at least 40, 50, 60, 70, 75 or 80%.

A nucleic acid molecule as defined in this aspect of the invention may comprise or consist of:

(i) a nucleotide sequence as set forth in SEQ ID NO: 10;

(ii) a nucleotide sequence which is degenerate with the sequence of SEQ ID NO: 10;

(iii) a nucleotide sequence which has at least 90% sequence identity to SEQ ID NO: 10;

(iv) a nucleotide sequence which is a part of the nucleotide sequence of SEQ ID NO: 10 or of a nucleotide sequence which is degenerate therewith; or

(v) a nucleotide sequence which is complementary to any of (i) to (iv) above.
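The categories above can be made concrete: a "degenerate" sequence differs in codons but encodes the same amino acid sequence, and the complementary strand is obtained by base pairing. The Python sketch below checks both using the standard genetic code; the helper names and the short toy sequences are hypothetical and are not fragments of SEQ ID NO: 10. The complement is returned in the conventional 5'→3' orientation (the reverse complement), which is an assumption about how "complementary" is read here.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence codon by codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_with(candidate: str, reference: str) -> bool:
    """True if both sequences encode the same amino acid sequence."""
    return translate(candidate) == translate(reference)


def reverse_complement(dna: str) -> str:
    """Complementary strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


reference = "ATGGCTCTGTAA"   # toy ORF: Met-Ala-Leu-stop
candidate = "ATGGCCTTATAG"   # different codons, same protein
print(translate(reference))                       # MAL*
print(is_degenerate_with(candidate, reference))   # True
print(reverse_complement("ATGGCT"))               # AGCCAT
```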

Additionally, the present invention provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding all or part of a protein having an amino acid sequence selected from the sequences as set forth in any one of SEQ ID NOs: 11, 13 and 15, or an amino acid sequence which has at least 90% sequence identity to SEQ ID NO: 11, 13 or 15, and wherein said nucleotide sequence encodes a protein which, when expressed in a lycopene-producing host cell together with each of the other said proteins, results in at least 91% of the total carotenoids produced being sarcinaxanthin, or a nucleic acid molecule which comprises a nucleotide sequence which is the complement of any aforesaid sequence.

Preferably, at least 92, 93, 94, 95, 96, 97, 98 or 99% of the total carotenoids produced is sarcinaxanthin.
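The percentage criterion above is the sarcinaxanthin amount divided by the total carotenoid amount. A minimal Python sketch follows, assuming each carotenoid has already been quantified in a common unit (the quantification method is not specified in this passage); the names and numbers are hypothetical.

```python
from typing import Mapping


def sarcinaxanthin_share(carotenoids: Mapping[str, float]) -> float:
    """Percentage of the total carotenoid amount that is sarcinaxanthin."""
    total = sum(carotenoids.values())
    if total <= 0:
        raise ValueError("total carotenoid amount must be positive")
    return 100.0 * carotenoids.get("sarcinaxanthin", 0.0) / total


# Hypothetical amounts, all in the same unit (e.g. μg/g CDW); not data from this document.
profile = {"sarcinaxanthin": 194.0, "flavuxanthin": 4.0, "lycopene": 2.0}
share = sarcinaxanthin_share(profile)
print(share)          # 97.0
print(share >= 91.0)  # True: meets the 91% floor
```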

Furthermore, the present invention provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding all or part of a protein having an amino acid sequence selected from the sequences as set forth in any one of SEQ ID NOs: 11, 13 and 15, or an amino acid sequence which has at least 90% sequence identity to SEQ ID NO: 11, 13 or 15, wherein said nucleotide sequence encodes a protein which, when expressed in a lycopene-producing host cell together with each of the other said proteins, results in sarcinaxanthin production to a level of at least 150 μg/g of cell dry weight (CDW).

Preferably, sarcinaxanthin is produced to a level of at least 300, 500, 750, 1000, 2000 or 2500 μg/g CDW.
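The production levels above are specific contents, i.e. the mass of sarcinaxanthin per gram of cell dry weight. The Python sketch below converts a hypothetical measurement into μg/g CDW and reports the highest listed level reached; the function names and the example amounts are illustrative assumptions.

```python
from typing import Optional

PREFERRED_LEVELS_UG_PER_G = (150, 300, 500, 750, 1000, 2000, 2500)


def ug_per_g_cdw(sarcinaxanthin_ug: float, cell_dry_weight_mg: float) -> float:
    """Convert an absolute amount (μg) and a biomass (mg CDW) into μg per g CDW."""
    if cell_dry_weight_mg <= 0:
        raise ValueError("cell dry weight must be positive")
    return sarcinaxanthin_ug * 1000.0 / cell_dry_weight_mg


def highest_level_met(content_ug_per_g: float) -> Optional[int]:
    """Return the highest listed level reached, or None if below 150 μg/g CDW."""
    reached = [level for level in PREFERRED_LEVELS_UG_PER_G if content_ug_per_g >= level]
    return max(reached) if reached else None


# Hypothetical measurement: 12 μg sarcinaxanthin extracted from 50 mg CDW.
content = ug_per_g_cdw(12.0, 50.0)
print(content)                     # 240.0
print(highest_level_met(content))  # 150
print(highest_level_met(100.0))    # None
```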

More particularly, in these aspects of the invention as set out above, the protein of SEQ ID NO: 11 or of a part or sequence variant thereof has lycopene elongase activity, and the proteins of SEQ ID NOs: 13 and 15 or parts or sequence variants thereof have or contribute to C₅₀ carotenoid γ-cyclase activity (e.g. together have C₅₀ carotenoid γ-cyclase activity), or more particularly are capable of catalysing the conversion of flavuxanthin to sarcinaxanthin.

Included within these aspects of the invention is a nucleic acid molecule comprising or consisting of:

(i) a nucleotide sequence selected from sequences as set forth in SEQ ID NO: 10, 12 and 14;

(ii) a nucleotide sequence which is degenerate with the sequence of any one of SEQ ID NOs: 10, 12 or 14;

(iii) a nucleotide sequence which has at least 90% sequence identity to any one of SEQ ID NOs: 10, 12 or 14;

(iv) a nucleotide sequence which is a part of the nucleotide sequence of any one of SEQ ID NOs: 10, 12 or 14 or of a nucleotide sequence which is degenerate therewith; or

(v) a nucleotide sequence which is complementary to any of (i) to (iv) above.

Alternatively or additionally, the present invention also provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding a protein having lycopene elongase activity and an amino acid sequence as set forth in all or part of SEQ ID NO: 11, or an amino acid sequence which is at least 90% identical to SEQ ID NO: 11, wherein said amino acid sequence comprises one or more of the following:

(a) alanine at position 8;

(b) valine at position 88;

or a nucleotide sequence which is the complement of any aforesaid sequence.

The position numbers are stated with reference to SEQ ID NO: 11.

Preferably, the nucleic acid encodes a lycopene elongase with a conversion efficiency, or which enables sarcinaxanthin production, as defined above. More preferably, the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 10 or a part or variant thereof as defined above, or a complement thereof. Similarly, the invention provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding a protein which contributes to (or more particularly which is a subunit of a protein having) C₅₀ carotenoid γ-cyclase activity and which has an amino acid sequence as set forth in all or part of SEQ ID NO: 13, or an amino acid sequence which is at least 90% identical to SEQ ID NO: 13, wherein said amino acid sequence comprises one or more of the following:

(a) valine at position 44;

(b) valine at position 64;

(d) arginine at position 104;

(e) proline at position 111;

(f) glycine at position 117;

or a nucleotide sequence which is the complement of any aforesaid sequence.

The position numbers are stated with reference to SEQ ID NO: 13.

Preferably, the nucleic acid encodes a polypeptide that enables sarcinaxanthin production as defined above (i.e. at the levels defined above). More preferably, the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 12 or a part or variant thereof as defined above, or a complement thereof.

The present invention further provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding a protein which contributes to (or more particularly which is a subunit of a protein having) C₅₀ carotenoid γ-cyclase activity and which has an amino acid sequence as set forth in all or part of SEQ ID NO: 15, or an amino acid sequence which is at least 90% identical to SEQ ID NO: 15, wherein said amino acid sequence comprises one or more of the following:

(a) a glycine residue at position 100;

(b) a glycine residue at position 103;

or a nucleotide sequence which is the complement of any aforesaid sequence.

The position numbers are stated with reference to SEQ ID NO: 15.

Preferably, the nucleic acid molecule encodes a polypeptide that enables sarcinaxanthin production as defined above, e.g. at the levels defined above. More preferably, the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 14 or a part or variant thereof as defined above, or a complement thereof.
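The positional features listed above for SEQ ID NOs: 11, 13 and 15 (and those given below for SEQ ID NOs: 34 and 36) amount to checking residues at fixed 1-based positions. The Python sketch below assumes the candidate sequence is already numbered like the reference, i.e. it has no insertions or deletions relative to it; a sequence with indels would first need to be aligned to the reference SEQ ID NO. The toy sequence and helper names are hypothetical.

```python
from typing import Dict


def check_positions(sequence: str, required: Dict[int, str]) -> Dict[int, bool]:
    """For each 1-based position, report whether the sequence carries the listed residue."""
    report = {}
    for position, residue in required.items():
        if position < 1:
            raise ValueError("positions are 1-based")
        report[position] = (position <= len(sequence)
                            and sequence[position - 1].upper() == residue.upper())
    return report


# Residues listed above for SEQ ID NO: 11 (alanine at 8, valine at 88), one-letter codes.
SEQ11_FEATURES = {8: "A", 88: "V"}

# Toy 98-residue sequence built for illustration only; it is NOT SEQ ID NO: 11.
toy = "M" + "G" * 6 + "A" + "G" * 79 + "V" + "K" * 10
print(check_positions(toy, SEQ11_FEATURES))      # {8: True, 88: True}
variant = toy[:7] + "S" + toy[8:]                # alanine at position 8 replaced by serine
print(check_positions(variant, SEQ11_FEATURES))  # {8: False, 88: True}
```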

Additionally, the present invention also provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding all or part of a protein having an amino acid sequence as set forth in SEQ ID NO: 34, or an amino acid sequence which is at least 90% identical to SEQ ID NO: 34, and wherein said nucleotide sequence encodes a sarcinaxanthin glycosylase enzyme whose activity results in the production of both sarcinaxanthin mono- and diglucosides when expressed in a host cell, or a nucleic acid molecule which comprises a nucleotide sequence which is the complement of any aforesaid sequence.

A nucleic acid molecule as defined in this aspect of the invention may comprise or consist of:

(i) a nucleotide sequence as set forth in SEQ ID NO: 33;

(ii) a nucleotide sequence which is degenerate with the sequence of SEQ ID NO: 33;

(iii) a nucleotide sequence which has at least 90% sequence identity to SEQ ID NO: 33;

(iv) a nucleotide sequence which is a part of the nucleotide sequence of SEQ ID NO: 33 or of a nucleotide sequence which is degenerate therewith; or

(v) a nucleotide sequence which is complementary to any of (i) to (iv) above.

Alternatively or additionally the present invention also provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding a protein having sarcinaxanthin glycosylase activity and an amino acid sequence as set forth in all or part of SEQ ID NO: 34 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 34, wherein said amino acid sequence comprises one or more of the following:

(a) histidine at position 62;

(b) serine at position 109;

(d) alanine at position 138;

(e) arginine at position 248;

(f) proline at position 251;

or a nucleotide sequence which is the complement of any aforesaid sequence.

The position numbers are stated with reference to SEQ ID NO: 34.

Preferably, the nucleic acid encodes a sarcinaxanthin glycosylase which enables sarcinaxanthin mono- or diglucoside production, as defined elsewhere herein. More preferably, the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 33 or a part or variant thereof as defined above, or a complement thereof.

Hence, in one embodiment a sarcinaxanthin glycosylase, or a nucleic acid encoding a sarcinaxanthin glycosylase, as described herein may be used for the production of a sarcinaxanthin mono- or diglucoside. For instance, a nucleic acid encoding a sarcinaxanthin glycosylase may be introduced into a host cell capable of producing sarcinaxanthin to produce sarcinaxanthin mono- or diglucoside.

Additionally, the present invention also provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding all or part of a protein having an amino acid sequence as set forth in SEQ ID NO: 36 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 36 and wherein said nucleotide sequence encodes a protein of the sarcinaxanthin biosynthetic gene cluster, or a nucleic acid molecule which comprises a nucleotide sequence which is the complement of any aforesaid sequence.

A nucleic acid molecule as defined in this aspect of the invention may comprise or consist of:

(i) a nucleotide sequence as set forth in SEQ ID NO: 35;

(ii) a nucleotide sequence which is degenerate with the sequence of SEQ ID NO: 35;

(iii) a nucleotide sequence which has at least 90% sequence identity to SEQ ID NO: 35;

(iv) a nucleotide sequence which is a part of the nucleotide sequence of SEQ ID NO: 35 or of a nucleotide sequence which is degenerate therewith; or

(v) a nucleotide sequence which is complementary to any of (i) to (iv) above.

Alternatively or additionally the present invention also provides a nucleic acid molecule comprising (or consisting of) a nucleotide sequence encoding a protein of the sarcinaxanthin biosynthetic gene cluster and an amino acid sequence as set forth in all or part of SEQ ID NO: 36 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 36, wherein said amino acid sequence comprises one or more of the following:

(a) valine at position 3;

(b) leucine at position 7;

(d) glutamine at position 29;

(e) aspartic acid at position 33;

(f) methionine at position 34;

(g) threonine at position 41;

(h) threonine at position 50;

(i) serine at position 68;

(j) arginine at position 161;

(k) tyrosine at position 163;

(l) isoleucine at position 190;

(m) arginine at position 197;

(n) glutamic acid at position 199;

or a nucleotide sequence which is the complement of any aforesaid sequence.

The position numbers are stated with reference to SEQ ID NO: 36.

Preferably, the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO: 35 or a part or variant thereof as defined above, or a complement thereof.

The invention also extends to proteins or polypeptides encoded by the above-defined nucleic acids and to the use of the above-defined nucleic acids in the methods of the invention described elsewhere herein. In general, the term "gene" includes the ORF which encodes the protein, together with any regulatory sequences such as promoters, whereas the term "ORF" refers only to the part of the gene which is responsible for encoding the protein.

As referred to herein "functionally equivalent variants" or "functional equivalents" retain the activity of the entity to which they are related (or from which they are derived), e.g. encode or represent a protein with substantially the same properties, e.g. enzymatic or enzymatic subunit activity, and preferably retain the activity at substantially the same level as the parent entity. The properties or activities can be tested for using standard techniques that are known in the art. As used herein the term "substantially" can be taken to mean at least 90% and preferably at least 95, 96, 97, 98 or 99% of the activity of the parent entity.

A "part" of the nucleic acid molecule may contain at least 50%, more particularly at least 60, 70, 75, 80, 85, 90 or 95%, of the nucleotides of the molecule. Thus, by way of representative example, it may be at least 180 or at least 200 bases in length, or at least 250, 280, 300, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000 or 7000 bases. In the context of a nucleic acid molecule representing the entire gene cluster, the fragment lengths will be longer, whereas where molecules representing individual genes are concerned, representative part lengths will be shorter. As mentioned above, a number of genes and ORFs have been identified within SEQ ID NOs: 1, 26 and 37, and parts or fragments which comprise such genes or ORFs represent preferred "parts" or fragments of SEQ ID NOs: 1, 26 and 37. However, parts or fragments of the SEQ ID NOs representing the individual genes or ORFs are also encompassed.

Nucleotide or amino acid sequence identity may be assessed by any convenient method. For determining the degree of sequence identity between sequences, computer programs that make multiple alignments of sequences are useful, for instance Clustal W (Thompson, J. D. et al., 1994). Programs that compare and align pairs of sequences, such as ALIGN (Myers, E. and Miller, W. 1988), FASTA (Pearson, W. R. and Lipman, D. J. 1988 and Pearson, W. R. 1990) and gapped BLAST (Altschul, S. F., et al., 1997), are also useful for this purpose. Furthermore, the Dali server at the European Bioinformatics Institute offers structure-based alignments of protein sequences (Holm, 1993; Holm, 1995; Holm, 1998).

For example, nucleotide sequence identity may be determined using the BestFit program of the Genetics Computer Group (GCG) Version 10 Software package from the University of Wisconsin. The program uses the local homology algorithm of Smith and Waterman with the default values: Gap creation penalty = 50, Gap extension penalty = 3, Average match = 10.000, Average mismatch = -9.000.

Thus, for example, depending on the context, nucleotide sequence identity may be at least 70%, 75%, 85%, 87%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% to any nucleotide sequence (i.e. a nucleotide sequence of any SEQ ID NO) stated herein (i.e. within the constraints and confines stated herein). Nucleotide sequences meeting the % sequence identity criteria defined herein may be regarded as "substantially identical" sequences or as functionally equivalent or variant sequences.
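The BestFit comparison described above can be approximated with a local (Smith-Waterman) alignment using affine gap costs, followed by a percent-identity test. The Python sketch below is not the GCG implementation: it scores every identical pair with the quoted average match (10) and every non-identical pair with the average mismatch (-9), whereas BestFit uses a scoring matrix with those averages; it assumes a gap of length k costs creation + k × extension (here 50 + 3k); and it reports identity as identical columns divided by all aligned columns including gap columns, which is only one of several conventions in use. The function names and toy sequences are hypothetical.

```python
from operator import itemgetter
from typing import List, Optional, Tuple

NEG = -10**9  # effectively minus infinity for integer scores
first = itemgetter(0)


def local_align(a: str, b: str, match: int = 10, mismatch: int = -9,
                gap_create: int = 50, gap_extend: int = 3) -> Tuple[int, str, str]:
    """Smith-Waterman local alignment with affine gaps (Gotoh three-state recursion)."""
    n, m = len(a), len(b)
    opening = gap_create + gap_extend  # cost of the first position of a gap

    def grid(fill):
        return [[fill] * (m + 1) for _ in range(n + 1)]

    M, X, Y = grid(NEG), grid(NEG), grid(NEG)        # end in a pair / a[i-1] vs gap / b[j-1] vs gap
    tM, tX, tY = grid(None), grid(None), grid(None)  # predecessor state of each cell
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score, state = max((0, None), (M[i - 1][j - 1], "M"),
                               (X[i - 1][j - 1], "X"), (Y[i - 1][j - 1], "Y"), key=first)
            M[i][j], tM[i][j] = s + score, state  # state None marks the alignment start
            X[i][j], tX[i][j] = max((M[i - 1][j] - opening, "M"),
                                    (X[i - 1][j] - gap_extend, "X"),
                                    (Y[i - 1][j] - opening, "Y"), key=first)
            Y[i][j], tY[i][j] = max((M[i][j - 1] - opening, "M"),
                                    (Y[i][j - 1] - gap_extend, "Y"),
                                    (X[i][j - 1] - opening, "X"), key=first)
            if M[i][j] > best:
                best, best_i, best_j = M[i][j], i, j
    # Trace back from the best-scoring pair to the start of the local alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = best_i, best_j
    cur: Optional[str] = "M" if best > 0 else None
    while cur is not None:
        if cur == "M":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            cur = tM[i][j]
            i, j = i - 1, j - 1
        elif cur == "X":
            top.append(a[i - 1])
            bottom.append("-")
            cur = tX[i][j]
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            cur = tY[i][j]
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top: str, bottom: str) -> float:
    """Identical columns as a percentage of all alignment columns (gaps included)."""
    if not top:
        return 0.0
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


score, top, bottom = local_align("A" * 20 + "C" * 20, "A" * 20 + "GGGG" + "C" * 20)
print(score)                                    # 338: 40 matches (400) minus 50 + 3*4 for the gap
print(round(percent_identity(top, bottom), 2))  # 90.91: 40 identical of 44 columns
print(percent_identity(top, bottom) >= 90.0)    # True: meets a 90% identity threshold
```

The pure-Python double loop is quadratic in time and memory; that is adequate for gene-length toy comparisons but is not meant as a production aligner.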

Programs for determining amino acid sequence identity are mentioned above; for example, amino acid sequence identity or similarity may be determined using the BestFit program of the Genetics Computer Group (GCG) Version 10 Software package from the University of Wisconsin. The program uses the local homology algorithm of Smith and Waterman with the default values: Gap creation penalty = 8, Gap extension penalty = 2, Average match = 2.912, Average mismatch = -2.003.

Thus, for example, depending on the context, amino acid sequence identity may be at least 70%, 75%, 85%, 87%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% to any amino acid sequence (i.e. to an amino acid sequence of any SEQ ID NO) stated herein (i.e. within the constraints and confines stated herein). Amino acid sequences meeting the % sequence identity criteria defined herein may be regarded as "substantially identical" sequences or as functionally equivalent or variant sequences.

The polypeptide/protein of the invention may be an isolated, purified or synthesized polypeptide. As noted above, the term "polypeptide" is used herein interchangeably with the term "protein" and includes any amino acid sequence of two or more amino acids, i.e. both short peptides and longer lengths are included.

A "part" of any protein or amino acid sequence as defined herein may contain at least 50%, more particularly at least 60, 70, 75, 80, 85, 90 or 95%, of the amino acid residues of the molecule or sequence. A part may comprise at least 20 contiguous amino acids, preferably at least 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 150, 160, 170, 180, 190, 200, 210, 220, 240, 250, 260, 270 or 280 contiguous amino acids.
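The length criteria for a "part" can be expressed as a simple test. The Python sketch below treats a part as a single contiguous stretch of the parent sequence, which is an assumption (the percentage definition above does not itself require contiguity); the names and the toy sequence are hypothetical. The same test could be applied to the nucleotide "parts" defined earlier.

```python
def is_part(fragment: str, parent: str, min_fraction: float = 0.5,
            min_contiguous: int = 0) -> bool:
    """Hypothetical check that fragment is one contiguous stretch of parent that is long
    enough: at least min_fraction of the parent's length and min_contiguous residues."""
    if not parent:
        raise ValueError("parent sequence is empty")
    if fragment.upper() not in parent.upper():
        return False
    return len(fragment) >= min_contiguous and len(fragment) >= min_fraction * len(parent)


parent = "ACDEFGHIKLMNPQRSTVWY" * 3  # toy 60-residue protein, not a SEQ ID NO
print(is_part(parent[10:40], parent, min_contiguous=20))  # True: 30 residues = 50%
print(is_part(parent[10:35], parent))                     # False: 25 residues < 50%
print(is_part(parent[10:35], parent, min_fraction=0.4))   # True: 25 >= 40% of 60
```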

As noted above in relation to "functionally equivalent variants" or "functional equivalents", a part of a nucleic acid or protein molecule, or of a nucleotide or amino acid sequence, as referred to herein advantageously retains the activity of the entity to which it is related (or from which it is derived), e.g. encodes or represents a protein with substantially the same properties, e.g. enzymatic or enzymatic subunit activity, and preferably retains the activity at substantially the same level as the parent entity. The part may thus correspond to, or comprise, an active site or functional part of the protein.

The nucleotide sequences described herein provide important tools and information which can be utilised in a number of ways to manipulate sarcinaxanthin biosynthesis, particularly to produce high levels of sarcinaxanthin through the heterologous expression of the biosynthetic machinery in host cells. By sarcinaxanthin biosynthetic machinery is meant a group of proteins (e.g. encoded by a gene cluster) that comprises one or more proteins involved in the sarcinaxanthin biosynthetic pathway and that is functional in sarcinaxanthin synthesis, but which is not necessarily restricted to sarcinaxanthin biosynthetic enzymes or enzymatic domains such as the genes/proteins isolated from M. luteus strains. Thus, as noted above, certain proteins may be replaced with functionally equivalent counterparts from (e.g. derived from) other sources, that is, proteins which catalyse the same conversions or which exhibit the same or equivalent activity.

Although the nucleic acids used in the methods of the invention may correspond to native genes/ORFs or may encode native proteins, as noted above the respective nucleotide and/or amino acid sequences may be modified. The modification may take place by modifying one or more nucleotide sequences so as to cause the modification of one or more encoded proteins. This may alter enzyme activity, e.g. improve it, and consequently may enhance yields of sarcinaxanthin or derivatives thereof. Alternatively, such a modification may be desirable to facilitate the method, for example the construction of an expression vector or other manipulation of the nucleic acids, or it may improve expression or enable expression in a different host. Thus, by way of example, nucleic acid molecules of the invention may be utilised to manipulate or facilitate the biosynthetic process, for example by extending the host range or increasing yield or production efficiency.

As described in more detail below, recombinant expression of a nucleic acid molecule according to the invention may involve the introduction of one or more nucleic acid molecules into a host cell (e.g. a heterologous host cell) and the culturing (or growth) of that host cell under conditions which allow the nucleic acid molecule to be expressed and sarcinaxanthin or a derivative thereof to be produced (i.e. conditions which allow the expression product(s) of the nucleic acid molecule to synthesise sarcinaxanthin). In such a recombinant expression system, the nucleic acid molecule may be subject to modification before being introduced into the host cell and expressed.

In certain embodiments a host may be used which already contains some of the genes required to make precursors in the sarcinaxanthin pathway, e.g. a lycopene-producing host cell. In such a host, modification of the genes which are already present in the host may take place in situ. In other words, in a lycopene-producing host for example, the endogenous genes already present for lycopene production may be altered, for example to increase lycopene production, e.g. by gene replacement, the introduction of new regulatory sequences or mutagenesis.

In the methods of the invention, the nucleic acid molecules may be any of the nucleic acid molecules of the invention as defined herein, namely nucleic acid molecules containing nucleotide sequences corresponding to, or derived from, the Otnes7 genes. However, whilst this is preferred in certain aspects, particularly for the biosynthetic pathway from lycopene owing to the greater efficiency of these genes in sarcinaxanthin production, it is not mandatory, and nucleic acid molecules from or based on other sources may be used. Thus, for example, as noted above, lycopene is a common intermediate in a number of pathways and may be synthesised by a number of different organisms. Nucleic acid molecules based on known gene sequences for proteins involved in lycopene production may be used. In terms of the sarcinaxanthin biosynthesis pathway beyond lycopene, nucleic acid molecules corresponding to, or derived from, any M. luteus genes may be used, e.g. those corresponding to, or derived from, the crtE2 and/or crtYgYh genes of any strain of M. luteus, including in particular strain NCTC2665.

Thus, in one embodiment the method of the present invention may comprise introducing into a lycopene-producing host cell and expressing:

(a) a nucleic acid molecule encoding a protein capable of catalysing the conversion of lycopene to flavuxanthin, or alternatively put a protein having lycopene elongase activity;

(b) a nucleic acid molecule encoding a C₅₀ carotenoid γ-cyclase subunit and comprising:

(i) a nucleotide sequence as set forth in all or part of SEQ ID NO: 2 or SEQ ID NO: 12, or which is degenerate therewith, or which has at least 70% sequence identity to SEQ ID NO: 2 or 12;

(ii) a nucleotide sequence which hybridizes to SEQ ID NO: 2 or 12 under non-stringent binding conditions of 6 x SSC/50% formamide at room temperature and washing under conditions of high stringency, e.g. 2 x SSC, 65°C, where SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.2; or

(iii) a nucleotide sequence encoding a protein having all or part of an amino acid sequence as set forth in SEQ ID NO: 3 or 13 or an amino acid sequence which is at least 70% identical to SEQ ID NO: 3 or 13; and

(c) a nucleic acid molecule encoding a C₅₀ carotenoid γ-cyclase subunit and comprising:

(i) a nucleotide sequence as set forth in all or part of SEQ ID NO: 4 or 14, or which is degenerate therewith, or which has at least 70% sequence identity to SEQ ID NO: 4 or 14;

(ii) a nucleotide sequence which hybridizes to SEQ ID NO: 4 or 14 under non-stringent binding conditions of 6 x SSC/50% formamide at room temperature and washing under conditions of high stringency, e.g. 2 x SSC, 65°C, where SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.2; or

(iii) a nucleotide sequence encoding a protein having all or part of an amino acid sequence as set forth in SEQ ID NO: 5 or 15 or an amino acid sequence which is at least 70% identical to SEQ ID NO: 5 or 15.

Thus, in the context of (a), (b) and (c) above, the method may involve the introduction of a single nucleic acid molecule encoding, e.g. crtE2, crtYh and crtYg (or proteins with the equivalent functional activity) from either the NCTC2665 or preferably the Otnes7 strains of M. luteus. Alternatively, two or more separate molecules may be introduced. Preferably the nucleic acid molecules used in the invention comprise any combination of the nucleic acid molecules as defined herein.
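Because SSC is defined above by its 1 x composition, the stronger solutions quoted for binding (6 x) and washing (2 x) scale linearly. The Python sketch below does that arithmetic and, as an added convenience, computes weights for a 20 x stock; the molecular weights (NaCl 58.44 g/mol, trisodium citrate dihydrate 294.10 g/mol), the choice of the dihydrate salt and the 20 x stock itself are assumptions not stated in the text, and formamide is not modelled.

```python
from typing import Tuple

NACL_M_1X = 0.15              # mol/L NaCl in 1 x SSC (from the text)
CITRATE_M_1X = 0.015          # mol/L sodium citrate in 1 x SSC (from the text)
MW_NACL = 58.44               # g/mol
MW_NA3_CITRATE_2H2O = 294.10  # g/mol, assuming trisodium citrate dihydrate is used


def ssc_molarities(strength: float) -> Tuple[float, float]:
    """Return (NaCl, sodium citrate) molarities for an n x SSC solution."""
    return NACL_M_1X * strength, CITRATE_M_1X * strength


def grams_for_stock(strength: float, volume_l: float) -> Tuple[float, float]:
    """Grams of NaCl and trisodium citrate dihydrate for volume_l litres of n x SSC."""
    nacl_m, citrate_m = ssc_molarities(strength)
    return nacl_m * volume_l * MW_NACL, citrate_m * volume_l * MW_NA3_CITRATE_2H2O


for strength in (6, 2):  # binding and washing strengths quoted in the text
    nacl, citrate = ssc_molarities(strength)
    print(f"{strength} x SSC: {nacl:.2f} M NaCl, {citrate:.3f} M sodium citrate")
nacl_g, citrate_g = grams_for_stock(20, 1.0)
print(f"1 L of 20 x SSC: {nacl_g:.2f} g NaCl, {citrate_g:.2f} g citrate")
```

Run as written, this prints 0.90 M / 0.090 M for 6 x SSC, 0.30 M / 0.030 M for 2 x SSC, and 175.32 g NaCl with 88.23 g citrate for one litre of 20 x stock; the pH adjustment to 7.2 is outside the scope of the sketch.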

In one embodiment of the invention the method results in the production of sarcinaxanthin to a level of at least 150 μg/g of cell dry weight (CDW). Preferably, sarcinaxanthin is produced to a level of at least 300, 500, 750, 1000, 2000 or 2500 μg/g CDW.

In a further embodiment the method of the invention results in a host cell in which at least 91% of the total carotenoids produced is sarcinaxanthin. Preferably, at least 92, 93, 94, 95, 96, 97, 98 or 99% of the total carotenoids produced is sarcinaxanthin.

A lycopene-producing host cell may be any cell that is capable of producing lycopene, preferably in significant amounts, e.g. at least 0.5, 0.6, 0.7, 0.8, 1.0 or 1.5 mg/g CDW. In other words, a lycopene-producing cell comprises the biosynthetic machinery necessary to produce lycopene, wherein said machinery may be present naturally or endogenously as part of the host cell genome, or said machinery or parts thereof may be introduced into said host cell to enable said cell to produce lycopene. For example, the sarcinaxanthin biosynthetic machinery comprises genes encoding enzymes capable of producing lycopene, i.e. crtE, crtB and crtI. Thus, the method of the invention includes the introduction and expression of one or more nucleic acid molecules comprising a nucleotide sequence as set forth in all or part of any one of SEQ ID NOs: 18, 20, 22, 27, 29, 31 and 33, or which are degenerate therewith, or which are at least 70% identical to SEQ ID NOs: 18, 20, 22, 27, 29, 31 or 33, or which are otherwise related to SEQ ID NOs: 18, 20, 22, 27, 29, 31 or 33 by analogy to the definitions given above in relation to SEQ ID NOs: 2, 4, 12 or 14 or their corresponding amino acid sequences. Alternatively, the endogenous lycopene biosynthetic machinery of the host cell may be modified so as to enhance lycopene production in said host.

As mentioned above, the lycopene biosynthetic pathway has been extensively described and more than one pathway is known to exist, e.g. the MEP pathway described above and the carotenoid biosynthetic pathway of plants and cyanobacteria (see e.g. Cunningham et al., 1994). Hence, any combination of genes encoding enzymes that result in the production of lycopene in the host cell, whether endogenous or heterologously expressed, is encompassed by the present invention.

In a preferred aspect, the lycopene-producing host cell comprises genes encoding the CrtE, CrtI and CrtB proteins from Pantoea ananatis, or parts or functional equivalents thereof, wherein said genes are expressed. In other words, the host cell comprises genes encoding three enzymes for the biosynthesis of lycopene from isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP). Said genes may be integrated into the host genome or present in the form of a plasmid or equivalent thereof. Conveniently, the lycopene-producing host cell may comprise the plasmid pAC-LYC (Cunningham and Gantt, 2007). As discussed above, enzymes capable of catalysing the conversion of lycopene to flavuxanthin, i.e. lycopene elongases, are known in the art, e.g. crtEb from Corynebacterium glutamicum, and nucleic acid molecules encoding any enzymes with an equivalent functional activity may be used in the methods of the invention. In a preferred aspect of the present invention the nucleic acid molecule encoding a protein capable of catalysing the conversion of lycopene to flavuxanthin may be a nucleic acid molecule comprising:

(i) a nucleotide sequence as set forth in all or part of SEQ ID NO: 6, 7 or 10, or which is degenerate therewith, or which has at least 70% sequence identity to SEQ ID NO: 6, 7 or 10;

(ii) a nucleotide sequence which hybridizes to SEQ ID NO: 6, 7 or 10 under non-stringent binding conditions of 6 x SSC/50% formamide at room temperature and washing under conditions of high stringency, e.g. 2 x SSC, 65°C, where SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.2; or

(iii) a nucleotide sequence encoding a protein having all or part of an amino acid sequence as set forth in SEQ ID NO: 8, 9 or 11, or an amino acid sequence which is at least 70% identical to SEQ ID NO: 8, 9 or 11.

More preferably, the nucleic acid molecule which encodes an enzyme capable of catalysing the conversion of lycopene to flavuxanthin is a nucleic acid molecule of the invention as defined above.

A sarcinaxanthin derivative can be defined as any modification of the sarcinaxanthin molecule, e.g. the addition of further chemical groups, wherein said groups may or may not alter the functional properties of sarcinaxanthin. Such a derivative may, for example, be a glycosylated derivative carrying one or two glycosyl groups. As described in the examples, the sarcinaxanthin biosynthetic gene cluster encodes a sarcinaxanthin glycosylase enzyme whose activity results in the production of both sarcinaxanthin mono- and diglucosides. Thus, in a preferred embodiment of the invention, the method comprises the introduction of a further nucleic acid molecule into said host cell, wherein said nucleic acid molecule encodes an enzyme capable of glycosylating sarcinaxanthin. More preferably, said nucleic acid molecule encodes CrtX from M. luteus or a functional equivalent thereof. Most preferably, the nucleic acid comprises:

(i) a nucleotide sequence as set forth in all or part of SEQ ID NO: 16 or 33, or which is degenerate therewith, or a nucleotide sequence with at least 70% sequence identity to SEQ ID NO: 16 or 33;

(ii) a nucleotide sequence which hybridizes to SEQ ID NO: 16 or 33 under non-stringent binding conditions of 6 x SSC/50% formamide at room temperature and washing under conditions of high stringency, e.g. 2 x SSC, 65°C, where SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.2; or

(iii) a nucleotide sequence encoding a protein having all or part of an amino acid sequence as set forth in SEQ ID NO: 17 or 34, or which comprises an amino acid sequence which is at least 70% identical to SEQ ID NO: 17 or 34.
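The buffer strengths quoted in the hybridisation conditions follow directly from the 1 x SSC definition. A minimal sketch of the arithmetic, assuming a 20 x SSC stock for the dilution helper (the text does not state a stock concentration, and the function names are hypothetical):

```python
# 1 x SSC as defined in the text: 0.15 M NaCl, 0.015 M sodium citrate, pH 7.2.
SSC_1X_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_composition(multiple: float) -> dict:
    """Molar concentration of each component in `multiple` x SSC."""
    return {name: round(conc * multiple, 6) for name, conc in SSC_1X_MOLAR.items()}


def stock_volume_ml(final_multiple: float, final_volume_ml: float,
                    stock_multiple: float = 20.0) -> float:
    """Volume of concentrated SSC stock needed (C1*V1 = C2*V2).

    The 20 x stock is an assumption; the text does not specify one.
    """
    return final_multiple * final_volume_ml / stock_multiple


if __name__ == "__main__":
    print(ssc_composition(6))       # SSC component of the 6 x SSC/50% formamide binding buffer
    print(ssc_composition(2))       # high-stringency wash
    print(stock_volume_ml(2, 500))  # ml of 20 x stock for 500 ml of 2 x SSC
```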

Further preferably, the nucleic acid molecule comprises a nucleotide sequence encoding a protein having sarcinaxanthin glycosylase activity and an amino acid sequence as set forth in all or part of SEQ ID NO: 34 or an amino acid sequence which is at least 90% identical to SEQ ID NO: 34, wherein said amino acid sequence comprises one or more of the following:

(a) histidine at position 62;

(b) serine at position 109;

(d) alanine at position 138;

(e) arginine at position 248;

(f) proline at position 251;

or a nucleotide sequence which is the complement of any aforesaid sequence.

The position numbers are stated with reference to SEQ ID NO: 34.
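The residue criteria listed above can be screened on a candidate glycosylase variant with a simple position lookup. This sketch assumes the candidate is already numbered like SEQ ID NO: 34, i.e. it has no insertions or deletions relative to it. SEQ ID NO: 34 itself is not reproduced here, so the example uses a synthetic placeholder sequence, and the function names are hypothetical.

```python
# Residues named in the list above, keyed by 1-based position in SEQ ID NO: 34.
GLYCOSYLASE_MARKERS = {62: "H", 109: "S", 138: "A", 248: "R", 251: "P"}


def marker_residues(protein_seq: str, markers: dict = GLYCOSYLASE_MARKERS) -> dict:
    """Map each marker position to True if the candidate carries that residue.

    Assumes `protein_seq` uses the same numbering as SEQ ID NO: 34.
    Positions beyond the end of the sequence count as absent.
    """
    seq = protein_seq.upper()
    return {pos: pos <= len(seq) and seq[pos - 1] == aa for pos, aa in markers.items()}


def has_any_marker(protein_seq: str) -> bool:
    """The text requires 'one or more' of the listed residues."""
    return any(marker_residues(protein_seq).values())


if __name__ == "__main__":
    # Synthetic placeholder, not SEQ ID NO: 34: poly-glycine with H62 and S109.
    residues = ["G"] * 260
    residues[61], residues[108] = "H", "S"
    candidate = "".join(residues)
    print(marker_residues(candidate))
    print(has_any_marker(candidate))  # True
```

A real variant with indels relative to SEQ ID NO: 34 would first have to be aligned to it and its positions renumbered before this lookup is meaningful.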

Alternatively, sarcinaxanthin produced according to the invention may be glycosylated by glycosylase enzymes or other glycosylation mechanisms which are present in the host cell.

Further, the sarcinaxanthin produced according to the invention may be glycosylated in vitro according to procedures well known in the art.

Also included as part of the invention are cells into which a nucleic acid molecule has been introduced, namely a heterologous host cell, for example in accordance with any of the methods as hereinbefore defined, or cells into which a nucleic acid molecule of the invention has been introduced.

To enable heterologous expression of a nucleic acid molecule(s) of the invention, the invention also provides a vector, for example a cloning or preferably an expression vector, comprising a nucleic acid molecule of the invention. Said vector may then be introduced into the host cell for expression of said nucleic acid molecule and therefore production of sarcinaxanthin.

Generally speaking, to perform the methods of the invention an appropriate expression vector may include appropriate control sequences, for example translational control elements (e.g. start and stop codons, ribosomal binding sites) and transcriptional control elements (e.g. promoter-operator regions, transcription termination sequences), operably linked to the nucleic acid molecules required for performance of the method of the invention as described herein. Appropriate vectors may include plasmids and viruses (including, e.g., bacteriophage). Preferred vectors include bacterial expression vectors, e.g. pBAD vectors, pET vectors and pTrc vectors. The nucleic acid molecule may conveniently be fused with DNA encoding an additional polypeptide, e.g. glutathione-S-transferase, to produce a fusion protein on expression.

A range of vectors is possible and any convenient or desired vector may be used. A vast range of vectors and expression systems are known in the art and described in the literature, and any of these may be used or modified for use according to the present invention. Vectors may be used which are based on the broad-host-range RK2 replicon, into which an appropriate strong promoter may be introduced. For example, WO 98/08958 describes RK2-based plasmid vectors into which the Pm/xylS promoter system from a TOL plasmid has been introduced. Such vectors represent preferred vectors which may be used according to the present invention. Alternatively, any vector containing the Pm promoter may be used, whether in plasmid or any other form, e.g. a vector for chromosomal integration, for example a transposon-based vector.

Other vectors or expression systems which may be used include, for example, those based on the pET, pBT, pMyr, pSos, pTRG or pGen expression systems. Promoters that may be useful in the expression of the proteins according to the invention include, but are not limited to, the lac promoter, T7, Ptac, Ptrc, the T7 RNA polymerase promoter (PT7φ10), λPL and PBAD. The vectors may, as noted above, be in autonomously replicating form, typically plasmids, or may be designed for chromosomal integration. This may depend on the host organism used; for example, in the case of Bacillus sp. host cells, chromosomal integration systems are used industrially, but they are less widely used in other prokaryotes. Generally speaking, for chromosomal integration, transposon delivery vectors or suicide vectors (the latter integrating by homologous recombination) may be used. In bacteria, plasmids are generally most widely used for protein production.

Thus viewed from a further aspect, the present invention provides a vector, preferably an expression vector, comprising a nucleic acid molecule as defined above.

Other aspects of the invention include methods for preparing recombinant nucleic acid molecules according to the invention, comprising inserting nucleotide sequences encoding the polypeptides of the invention into vector nucleic acid.

Any suitable expression system may be used in the host cell and will be dependent on the nature of said cells. The vector may comprise any number of other genetic elements, e.g. for selection, for integration of the nucleic acids into the host genome, or for regulation of the expression of the nucleic acid molecules. The regulatory elements may be derived from various sources that are well known in the art. Such regulatory elements may result in constitutive or inducible expression of said nucleic acid molecules. As noted above, in a preferred embodiment of the invention, the nucleic acid molecules used in the methods discussed above are under the control of the Pm/xylS promoter system. The Pm/xylS promoter system has been shown to function in a wide range of Gram-negative bacterial species and has been found useful for over-expression of recombinant proteins (Mermod et al., 1986; Ramos et al., 1988; Blatny et al., 1997a). The uninduced expression level from Pm is low, and the level of induced expression can be regulated by using different effector compounds at various concentrations (Winther-Larsen et al., 2000a). Many of the inducers are low-cost compounds that enter the cell by passive diffusion.

The Pm/xylS expression system has been used in the construction of broad-host-range expression vectors based on the RK2 minimal replicon (Blatny et al., 1997b; Blatny et al., 1997a; WO98/08958). One of these vectors, pJB658, has proven useful for tightly regulated recombinant gene expression in several Gram-negative species (Blatny et al., 1997b; Blatny et al., 1997a; Brautaset et al., 2000; Winther-Larsen et al., 2000b). For example, this vector has been used for recombinant expression of a host-toxic single-chain antibody fragment (scFv), hGM-CSF and hIFN-α2b (Sletta et al., 2004; Sletta et al., 2007).

Introduction of a vector (e.g. a plasmid), or of more than one vector, comprising the nucleic acid molecules as defined herein into the appropriate host cell can be performed using routine methods in the art. This may ultimately result in the integration of the nucleic acid molecule(s) into the genome of the host cell, or said vector may exist as an autonomously replicating unit within the host cell.

The resultant modified host cell will therefore contain a sarcinaxanthin biosynthetic gene cluster, which encodes a sarcinaxanthin enzyme system. The sarcinaxanthin biosynthetic machinery will be expressed and thus synthesise sarcinaxanthin molecules.

A preferred embodiment of the present invention involves the isolation of genes from a native organism which synthesises sarcinaxanthin, e.g. M. luteus NCTC2665 or Otnes7, or from an organism which synthesises a sarcinaxanthin precursor such as lycopene or flavuxanthin, optionally modifying said genes, and introducing said genes into a host cell, i.e. an organism other than M. luteus, for expression and production of sarcinaxanthin and derivatives thereof.

Generally speaking, the nucleic acid molecule will be expressed in a host cell under conditions in which the biosynthetic machinery may be expressed. The host cell may be grown or cultured under conditions which allow the nucleic acid molecules and biosynthetic machinery to be expressed, and sarcinaxanthin or a derivative thereof to be synthesised.

Thus, the nucleic acid molecule may be expressed in any desired host cell, but preferably it will be expressed in a cell or microorganism other than that from which it was (or from which it may be) derived and in which the molecule is natively present.

The methods of the invention for producing sarcinaxanthin or a derivative thereof may further comprise the step of recovering (e.g. isolating or purifying) sarcinaxanthin, e.g. from the culture medium in which the host cell was grown or from the host cell. This can be isolated or purified from the cell culture medium into which it has been transported or secreted if appropriate, or otherwise from the host cell in which it has been produced. Thus, for example, the cells of the producing organism may be harvested, e.g. by centrifugation, and sarcinaxanthin or a derivative thereof may be extracted following cell lysis, for example with organic solvent(s) (e.g., methanol and acetone in a ratio of 7:3). The sarcinaxanthin or derivatives thereof may be recovered from such an extract, for example by precipitation or evaporation. Further purification of a crude product obtained in this way may include e.g. chromatography, e.g. HPLC.

As noted above, in one aspect the invention provides a host cell containing one or more nucleic acid molecules as defined above, wherein said molecule(s) has been introduced into said host cell.

By way of representative example, the crtE2YgYh region of M. luteus strain Otnes7 may be amplified from genomic DNA and inserted into an expression vector, e.g. pJBphOx. Said expression vector may then be introduced into a host cell, e.g. E. coli XL1 Blue containing the pAC-LYC plasmid (described above). The host cell may then be cultivated such that the proteins encoded by the pAC-LYC and expression vectors are expressed, thereby resulting in the production of sarcinaxanthin.

Alternatively, a host cell (e.g. a microorganism) which endogenously contains one or more nucleic acid molecules required for synthesis of a sarcinaxanthin precursor, e.g. lycopene or flavuxanthin, may be modified by introduction of one or more nucleic acid molecules which encode proteins capable of catalysing the conversion of lycopene to flavuxanthin and of flavuxanthin to sarcinaxanthin, for example by simple introduction of the nucleic acid molecule, or by e.g. gene replacement, for example to replace the gene encoding the flavuxanthin-converting activity in the host cell. Thus, for example, C. glutamicum cells may be modified to replace or supplement the crtYeYf genes with a nucleic acid molecule encoding a γ-cyclase activity, including any such molecule as defined herein.

The host cell for use in the methods of the invention may be any desired cell or organism, prokaryotic or eukaryotic, but generally it will be a microorganism, particularly a bacterium. More particularly, the host cell will be an Escherichia coli cell or a Corynebacterium glutamicum cell. Other representative host cells include both Gram-negative and Gram-positive bacteria. Suitable bacteria include Escherichia sp., Salmonella, Klebsiella, Proteus, Yersinia, Azotobacter sp., Pseudomonas sp., Xanthomonas sp., Agrobacterium sp., Alcaligenes sp., Bordetella sp., Haemophilus influenzae, Methylophilus methylotrophus, Rhizobium sp., Thiobacillus sp. and Clavibacter sp. In a particularly preferred embodiment, expression of the desired gene product occurs in E. coli. Eukaryotic host cells may include yeast cells or mammalian cell lines.

Preferably the host cells do not endogenously contain all of the nucleic acid molecules required for the synthesis of sarcinaxanthin or a derivative thereof, but may preferably comprise nucleic acid molecules encoding proteins required for the synthesis of sarcinaxanthin precursors, e.g. lycopene, nonaflavuxanthin or flavuxanthin. A suitable example is the E. coli XL1 Blue strain comprising the pAC-LYC plasmid (Cunningham and Gantt, 2007).

The novel isolated strain referred to above (isolate Otnes7), from which the gene cluster was also sequenced and which has been deposited under deposit number DSM 23579 at the DSMZ, may be used for the production of sarcinaxanthin, but is not a preferred host cell for the methods of the invention. However, this strain represents an important aspect of the present invention and a preferred source of the nucleic acid molecules for use in the methods of the invention, particularly nucleic acid molecules encoding the proteins CrtE2, CrtYg and CrtYh. The endogenous nucleic acid molecules of the sarcinaxanthin biosynthetic gene cluster of this strain may be modified as described herein (i.e. directly or indirectly) to identify nucleic acid molecules that encode proteins with further improved enzyme activity or substrate-to-product conversion efficiency. Alternatively, the Otnes7 strain may be mutagenised and screened to identify isolates with improved sarcinaxanthin production. Genes from the sarcinaxanthin gene cluster may then be isolated and used in the methods of the invention.

A further aspect of the present invention is thus a strain of Micrococcus luteus as deposited under number DSM 23579 at the DSMZ, or a mutant or modified strain thereof which produces sarcinaxanthin or a derivative thereof.

The sarcinaxanthin produced by the methods of the invention may be further modified for example by glycosylation or other derivatisation, in order to exhibit or improve activity, e.g. antioxidant activity. Methods for glycosylating carotenoids are generally known in the art; the glycosylation may be effected intracellularly by providing the appropriate glycosylation enzymes or may be effected in vitro using chemical synthetic means.

Mutations can be made to the native sequences using conventional techniques. The substrate for mutation can be an entire cluster of genes or only one or two of them; it may also be portions of one or more of these genes. Techniques for mutation are well known in the art and described in the literature. Such techniques include preparing synthetic oligonucleotides including the mutation(s) and inserting the mutated sequence into the gene using restriction endonuclease digestion. Alternatively, the mutations can be effected using a mismatched primer (generally 15-30 nucleotides in length) which hybridizes to the native nucleotide sequence at a temperature below the melting temperature of the mismatched duplex. The primer can be made specific by keeping primer length and base composition within relatively narrow limits and by keeping the mutant base centrally located. Primer extension is carried out using DNA polymerase, the product is cloned, and clones containing the mutated DNA, derived by segregation of the primer-extended strand, are selected. The technique is also applicable for generating multiple point mutations. PCR mutagenesis will also find use for effecting the desired mutations. The vectors used to perform the various operations described above may be chosen to contain control sequences operably linked to the resulting coding sequences such that expression of the coding sequences may be effected in the host. However, simple cloning vectors may be used as well.
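The mismatched-primer design rule described above (a primer of 15-30 nt with the mutant base centrally located) can be expressed as a short function. This is a hypothetical helper for illustration only. It does not check melting temperature, and base composition is reported but not constrained.

```python
def mutagenic_primer(template: str, position: int, new_base: str, length: int = 25) -> str:
    """Return a primer of `length` nt copied from `template` with one mismatch.

    `position` is 0-based in the template. The mismatch is placed at the
    centre of the primer (index length // 2), following the rule in the text.
    """
    if not 15 <= length <= 30:
        raise ValueError("primer length outside the 15-30 nt range given in the text")
    template = template.upper()
    new_base = new_base.upper()
    if template[position] == new_base:
        raise ValueError("new base is identical to the template base")
    half = length // 2
    start = position - half
    end = start + length
    if start < 0 or end > len(template):
        raise ValueError("mutation too close to a template end for a centred primer")
    primer = list(template[start:end])
    primer[half] = new_base  # the single mismatched base
    return "".join(primer)


def gc_content(seq: str) -> float:
    """Fraction of G+C, one measure of base composition."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    template = "ACGT" * 10  # hypothetical 40-nt template
    primer = mutagenic_primer(template, 20, "T", length=15)
    print(primer, gc_content(primer))
```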

The invention will now be described in more detail in the following non-limiting Examples with reference to the drawings in which:

Figure 1: Proposed biosynthetic pathway for the individual steps in the formation of sarcinaxanthin and its glucosides from lycopene. CrtEBI: GGPP synthase, phytoene synthase, phytoene desaturase; CrtE2: lycopene elongase; CrtYg+CrtYh: C₅₀ carotenoid γ-cyclase; CrtX: C₅₀ carotenoid glycosyl transferase.

Figure 2: HPLC elution profiles of carotenoids extracted from M. luteus strain Otnes7 (A) and from lycopene-producing E. coli XL1 Blue pAC-LYC transformed with pCRT-E2YgYh-07 (B), pCRT-E2YgYhX-07 (C) and pCRT-E2-07 (D). Peak 1, sarcinaxanthin diglucoside; peak 2, sarcinaxanthin monoglucoside; peak 3, sarcinaxanthin; peak 4, lycopene; peak 5, flavuxanthin; peak 6, nonaflavuxanthin. Peaks 4', 5' and 6' are the cis isomers of 4, 5 and 6, respectively.

Absorption spectra of carotenoids from peaks 1, 2 and 3 (solid line) and peaks 4, 5 and 6 (dashed line) are depicted in graph (E).

Figure 3: Carotenoid biosynthesis gene clusters from M. luteus, C. glutamicum and Dietzia sp. leading to the C₅₀ carotenoids sarcinaxanthin, decaprenoxanthin and C.p.450, respectively, and their glycosylated derivatives. Genes indicated in grey are suggested not to be involved in carotenoid biosynthesis.

Figure 4: The relative carotenoid abundance in extracts from E. coli pAC-LYC overexpressing crtE2YgYh genes from M. luteus strain Otnes7 and strain NCTC2665 cultivated in the presence of 0, 0.002, 0.01 and 0.5 mM m-toluate. The fractions of sarcinaxanthin, lycopene and intermediates are indicated by dark grey, white and light grey columns, respectively. Samples were analyzed after 48 h of cultivation. The total amount of extracted carotenoid was similar in the presented samples, and 100% carotenoid abundance corresponds to [x] ± [y] mg/g CDW total carotenoid.

EXAMPLES

Example 1

Materials and Methods

Bacteria, plasmids, standard DNA manipulations, and growth media

Bacterial strains and plasmids used in this work are listed in Table 2. Bacteria were cultivated in Luria-Bertani (LB) broth (Sambrook, Fritsch et al. 1989), and recombinant E. coli cultures were supplemented with ampicillin (100 μg/ml) and chloramphenicol (30 μg/ml). M. luteus and C. glutamicum strains were grown at 30°C with 225 rpm agitation, while E. coli strains were generally grown at 37°C with 225 rpm agitation. For heterologous production of carotenoids, 250 ml cultures of recombinant E. coli strains were grown at 28°C with 180 rpm agitation in 500 ml Erlenmeyer shake flasks for 24 h in the presence of 0.5 mM of the Pm inducer m-toluate, unless otherwise stated. Standard DNA manipulations were performed according to Sambrook et al. (1989), and total DNA was isolated from M. luteus strains as described elsewhere (Tripathi and Rawal 1998).

Vector constructions

pCRT-EBIE2YgYh-2665 and pCRT-EBI-2665: The complete crtEBIE2YgYh gene cluster of M. luteus NCTC2665 was PCR amplified from genomic DNA by using the primer pair crtE-F (5'-TTTTTCATATGGGTGAAGCGAGGACGGG-3') and crtYh-R (5'-TTTTTGCGGCCGCTCAGCGATCGTCCGGGTGGGG-3'). The crtEBI region of M. luteus NCTC2665 was PCR amplified from genomic DNA by using the primer pair crtE-F (see above) and crtI-R (5'-TTTTTGCGGCCGCTCATGTGCCGCTCCCCCCGG-3'). The resulting PCR products, crtEBIE2YgYh (5283 bp) and crtEBI (3693 bp), were end digested with NdeI and NotI (recognition sites CATATG and GCGGCCGC in the primer sequences) and ligated into the corresponding sites of pJBphOx (Sletta et al., 2004), yielding plasmids pCRT-EBIE2YgYh-2665 and pCRT-EBI-2665, respectively.
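The NdeI and NotI sites carried at the 5' ends of these primers can be located with a simple motif search. The recognition sequences below are the standard ones for the enzymes named in this Example; because all of them are palindromic, scanning one strand is sufficient. The function name is hypothetical.

```python
# Recognition sequences of the enzymes used in this Example (all palindromic).
RECOGNITION_SITES = {
    "NdeI": "CATATG",
    "NotI": "GCGGCCGC",
    "XbaI": "TCTAGA",
    "AgeI": "ACCGGT",
    "BlnI": "CCTAGG",
}


def find_sites(primer: str, sites: dict = RECOGNITION_SITES) -> dict:
    """Return {enzyme: [0-based start positions]} for each site found in `primer`.

    Because each site is palindromic, scanning the given strand also covers
    the complementary strand.
    """
    primer = primer.upper()
    hits = {}
    for enzyme, motif in sites.items():
        starts = [i for i in range(len(primer) - len(motif) + 1)
                  if primer.startswith(motif, i)]
        if starts:
            hits[enzyme] = starts
    return hits


if __name__ == "__main__":
    crtE_F = "TTTTTCATATGGGTGAAGCGAGGACGGG"
    crtYh_R = "TTTTTGCGGCCGCTCAGCGATCGTCCGGGTGGGG"
    print(find_sites(crtE_F))   # {'NdeI': [5]}
    print(find_sites(crtYh_R))  # {'NotI': [5]}
```

The five 5'-terminal T residues in both primers sit outside the recognition site, which is consistent with the common practice of adding a short clamp so the enzyme can cut near the end of a PCR product.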

pCRT-E2YgYh-2665 and pCRT-E2YgYh-07: The crtE2YgYh regions of M. luteus strains NCTC2665 and Otnes7 were PCR amplified from genomic DNA using primers crtE2-F (5'-TTTTTCATATGATCCGCACCCTCTTCTG-3') and crtYh-R (see above). The obtained 1615 bp PCR products were blunt-end ligated into the pGEM-T Easy Vector System (Promega, Madison, Wis.), and the resulting plasmids were digested with NdeI and NotI; the 1597 bp inserts were ligated into the corresponding sites of pJBphOx, yielding plasmids pCRT-E2YgYh-2665 and pCRT-E2YgYh-07, respectively.

pCRT-E2YgYhX-07: The crtE2YgYhX region of M. luteus strain Otnes7 was PCR amplified from genomic DNA using primers crtE2-F (see above) and crtYX-R (5'-TTTTTCCTAGGAGATGGCCGCGAACATCCTG-3'). The obtained PCR product was end digested with NdeI and BlnI (BlnI recognition site CCTAGG in the primer) and the resulting 3085 bp fragment was ligated into the corresponding sites of pJBphOx, resulting in pCRT-E2YgYhX-07.

pCRT-E2Yg-07 and pCRT-E2Yg-2665: The crtE2Yg coding regions of M. luteus strains NCTC2665 and Otnes7 were PCR amplified from chromosomal DNA using primers crtE2-F (see above) and crtYg-R (5'-TTTTTGCGGCCGCTCACCGGCTCCCCCGGTCGGTC-3'). The obtained PCR products were end digested with NdeI and NotI (sites CATATG and GCGGCCGC in the primer sequences), and the resulting 1247 bp fragments were ligated into the corresponding sites of pJBphOx, resulting in pCRT-E2Yg-2665 and pCRT-E2Yg-07, respectively.

pCRT-E2-07 and pCRT-E2-2665: The crtE2 genes of M. luteus strains NCTC2665 and Otnes7 were PCR amplified from chromosomal DNA using primers crtE2-F (see above) and crtE2-R (5'-TTTTTGCGGCCGCTCATGCCGCCGCCCCCCGGG-3'). The resulting PCR products were end digested with NdeI and NotI (sites CATATG and GCGGCCGC in the primer sequences), and the resulting 890 bp fragments were ligated into likewise treated pJB658phOx, resulting in pCRT-E2-2665 and pCRT-E2-07, respectively.

pCRT-YgYh-07 and pCRT-YgYh-2665: The crtYgYh regions of M. luteus strains NCTC2665 and Otnes7 were PCR amplified from genomic DNA by using primers crtYg-F (5'-TTTTTCATATGATCTACCTGCTGGCCCT-3') and crtYh-R (see above). The resulting 734 bp PCR products were end digested with NdeI and NotI (sites CATATG and GCGGCCGC in the primer sequences), and the resulting 716 bp fragments were ligated into the corresponding sites of pJB658phOx, resulting in pCRT-YgYh-2665 and pCRT-YgYh-07, respectively.

pCRT-E2YeYf-Hybrid: Based on the gene sequences of crtE2 in M. luteus Otnes7 and crtYeYf in C. glutamicum MJ233-MV10, four primers were used: crtE2-F (5'-TGACCAACGACCGGTAGCGGAG-3'), crtE2-i-R (5'-CCCATCCACTAAACTTAAACATCATGCCGCCGCCCCCCGG-3'), crtYe-i-F (5'-TGTTTAAGTTTAGTGGATGGGTTGATCCCTATCATCGATATTTCAC-3') and crtYf-R (5'-TTTTGCGGCCGCTTTTCCATCATGACTACGGCTTTTC-3'). Primers crtE2-i-R and crtYe-i-F carry complementary 21 bp extensions at their 5' ends (the first 21 nucleotides of each primer) that serve as linker sequences to allow crossover PCR. Primer pair crtE2-F and crtE2-i-R was used to amplify a 1227 bp fragment containing the crtE2 gene from M. luteus genomic DNA, and primer pair crtYe-i-F and crtYf-R was used to amplify an 885 bp crtYeYf-containing fragment from C. glutamicum genomic DNA. The resulting PCR fragments were used as template for PCR with primer pair crtE2-F and crtYf-R to amplify a 2090 bp hybrid DNA fragment containing crtE2 from M. luteus and crtYeYf from C. glutamicum connected by the 21 bp linker sequence. The resulting hybrid fragment was end digested with AgeI and NotI (sites ACCGGT and GCGGCCGC in the primer sequences), and the obtained 2070 bp DNA fragment was ligated into the corresponding sites of pJB658phOx, resulting in vector pCRT-E2YeYf-Hybrid.
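The crossover (overlap-extension) PCR step described above joins the two products through their shared linker: the final product is the left fragment followed by the right fragment minus one copy of the overlap. The sketch below models only this sequence logic, not the PCR itself, and uses short hypothetical sequences and a hypothetical 8-nt linker rather than the actual crtE2 and crtYeYf fragments. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_join(left: str, right: str, overlap: int) -> str:
    """Sequence of the crossover-PCR product of two fragments.

    `left` must end with, and `right` must begin with, the same `overlap`
    nucleotides (the linker introduced by the tailed internal primers).
    """
    if left[-overlap:] != right[:overlap]:
        raise ValueError("fragments do not share the stated overlap")
    return left + right[overlap:]


if __name__ == "__main__":
    linker = "TTAGGCAT"          # hypothetical 8-nt linker (21 nt in the text)
    left = "ATGCCC" + linker     # upstream gene fragment ending in the linker
    right = linker + "GAATAA"    # downstream fragment starting with the linker
    product = overlap_join(left, right, len(linker))
    print(product, len(product))
    # The internal reverse primer carries the reverse complement of the linker.
    print(reverse_complement(linker))
```

Under this model the product length is len(left) + len(right) - overlap. Actual fragment sizes also depend on where the outer primers anneal and on any non-template tails, so they need not match this simple sum exactly.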

pCRT-YeYfEb-MJ: The crtYeYfEb genes from C. glutamicum strain MJ-233C-MV10 were PCR amplified from genomic DNA using primers crtYe-F1 (5'-TGGCTATCTCTAGAAAGGCCTACCCCTTAGGCTTTATGCAACAGAAACAATAATAATGGAGTCATGAACATATGATCCCTATCATCGATATTTCAC-3') and crtYf-R (5'-TTTTGCGGCCGCCTGATCGGATAAAAGCAGAGTTATATC-3'). The resulting PCR product was digested with XbaI and NotI (sites TCTAGA and GCGGCCGC in the primer sequences), and the resulting 1789 bp DNA fragment was ligated into the corresponding sites of pJBphOx, yielding pCRT-YeYfEb-MJ.

All the constructed vectors were verified by DNA sequencing and transformed by electroporation (Dower, Miller et al. 1988) into E. coli strain XL1-blue or the lycopene-producing E. coli strain XL1-blue (pAC-LYC) (Cunningham, Sun et al. 1994).

Extraction of carotenoids from bacterial cell cultures

To extract carotenoids from M. luteus strains, cells were harvested, washed with deionized H₂O, treated with lysozyme (20 mg/ml) and lipase (Fluka Chemicals, Germany) as described by Kaiser, Surmann et al. (2007), and the pigments were extracted with a mixture of methanol and acetone (7:3). For recombinant E. coli strains, 50 ml aliquots of the cell cultures were centrifuged at 10,000 x g for 3 min, the pellets were washed with deionized H₂O, and the cells were frozen and thawed to facilitate extraction. Finally, the pigments were extracted with 4 ml methanol/acetone at 55°C for 15 min with thorough vortexing every 5 min. When necessary, up to three extraction cycles were performed to remove all colour from the cell pellet. When selective extraction of xanthophylls was desired, pure methanol was used. Butylhydroxytoluene (BHT, 0.05%) was added to the organic solvent to help stabilise the carotenoids. Samples for preparative HPLC were additionally partitioned into 50% diethyl ether in petroleum ether. The collected upper phase was evaporated to dryness and dissolved in methanol.
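For the E. coli extraction, the 4 ml of 7:3 methanol/acetone and the BHT addition translate into simple volumes and masses. This sketch assumes the 0.05% BHT is weight/volume (g per 100 ml), which the text does not state; the function name is hypothetical.

```python
def extraction_solvent(total_ml: float = 4.0, ratio: tuple = (7, 3),
                       bht_percent_w_v: float = 0.05) -> dict:
    """Volumes of methanol and acetone, and mass of BHT, for one extraction.

    Assumes the BHT percentage is weight/volume (g per 100 ml).
    """
    parts = sum(ratio)
    return {
        "methanol_ml": total_ml * ratio[0] / parts,
        "acetone_ml": total_ml * ratio[1] / parts,
        # g/100 ml -> g/ml, multiplied by the volume, then g -> mg
        "bht_mg": bht_percent_w_v / 100 * total_ml * 1000,
    }


if __name__ == "__main__":
    print(extraction_solvent())  # about 2.8 ml methanol, 1.2 ml acetone, 2 mg BHT
```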

Quantification of carotenoids in cell extracts

Carotenoids were quantified on the basis of peak area in the chromatographic analysis, using standard curves prepared from known concentrations of trans-β-apo-8'-carotenal and lycopene standards (Fluka). The exact concentrations of the standards were determined spectrophotometrically (Harker and Bramley 1999) using the specific extinction coefficients E1%1cm of 3450 for lycopene and 2590 for apo-carotenal. Standards were filtered through a 0.2 μm polypropylene syringe filter (Pall Gelman) and stored in amber glass vessels at -80°C under N₂ atmosphere if not analyzed immediately.
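The two quantification steps (spectrophotometric calibration of the standards, then conversion of chromatographic peak areas via a linear standard curve) can be sketched as follows. The calibration uses the specific absorbance relation A = E1%1cm × c × l, with c in g/100 ml. An ordinary least-squares line is assumed for the standard curve, since the text does not state the fitting method. The standard-curve data are hypothetical and the function names are illustrative.

```python
def conc_from_absorbance(absorbance: float, e1pct_1cm: float, path_cm: float = 1.0) -> float:
    """Concentration in ug/ml from A = E(1%, 1 cm) * c * l, with c in g/100 ml."""
    grams_per_100ml = absorbance / (e1pct_1cm * path_cm)
    return grams_per_100ml * 1e4  # 1 g/100 ml = 10,000 ug/ml


def fit_standard_curve(concs, areas):
    """Ordinary least-squares line: area = slope * conc + intercept."""
    n = len(concs)
    mean_x = sum(concs) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs, areas))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def conc_from_area(area: float, slope: float, intercept: float) -> float:
    """Invert the standard curve for an unknown peak."""
    return (area - intercept) / slope


E_LYCOPENE = 3450      # E(1%, 1 cm) values quoted in the text
E_APOCAROTENAL = 2590

if __name__ == "__main__":
    # Hypothetical: a lycopene standard reading A = 0.345 corresponds to about 1.0 ug/ml.
    print(conc_from_absorbance(0.345, E_LYCOPENE))
    slope, intercept = fit_standard_curve([1.0, 2.0, 4.0], [10.0, 20.0, 40.0])
    print(conc_from_area(30.0, slope, intercept))  # about 3.0 ug/ml
```

Converting such a concentration to the μg/g CDW values reported in the Examples would additionally require the extract volume and the cell dry weight of the sample.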

LC-MS analyses

LC-MS analyses were performed on an Agilent Ion Trap SL mass spectrometer equipped with an Agilent 1100 series HPLC system. The HPLC system was equipped with a diode array detector (DAD), which recorded UV/VIS spectra in the range 200-650 nm. Two HPLC protocols were used for the analysis in this work. A high-throughput protocol for fast quantitative determination of known carotenoids was as follows: the carotenoids were eluted isocratically in MeOH for 5 min on a Zorbax Rapid Resolution SB RP C18 column (2.1 x 30 mm). The column flow was kept at 0.4 mL/min and 10 μL extract was injected for each run. For detailed qualitative carotenoid separation, a Zorbax SB RP C18 column (2.1 x 150 mm) was used. The carotenoids were eluted isocratically in MeOH/acetonitrile (7:3) for 25 minutes. The column flow was 250 μL/min and 10 or 20 μL sample was injected depending on the concentration.

For determination of the molecular masses of carotenoids, mass spectrometry (MS) was performed under the following conditions. Analytes were ionized using a chemical ionization source with settings of 325°C dry temperature, 350°C vaporizer temperature, 50 psi nebulizer pressure and 5.0 L/min dry gas. The MS was operated in scan mode. For carotenoid identification, preparative HPLC was performed on an Agilent 1100 series preparative HPLC system equipped with two preparative HPLC pumps, a preparative autosampler and a preparative fraction collector. Mobile phases were methanol in channel 1 and acetonitrile in channel 2. Samples of 2 mL were injected at a flow rate of 20 mL/min onto a Zorbax RP C18 2.1 x 250 mm preparative LC column. On-line MS analysis was performed by splitting the flow 1:200 after the column using an Agilent LC flow splitter, and a make-up flow of 1 mL methanol/min was used to carry the analytes to the MS with less than 15 s delay. The diode array detector was used to trigger fraction collection.

Carotenoid structure determination by NMR

All NMR spectra were recorded on a Bruker Avance 600 MHz instrument fitted with a TCI cryoprobe, using CDCl₃ as solvent and TMS as internal reference. ¹H and ¹³C signals were unambiguously assigned with the aid of ip-COSY, HSQC, HMBC, NOESY and HSQC-TOCSY experiments.

Example 2

Analysis of carotenoids produced by M. luteus strains NCTC2665 and Otnes7

We initially characterised the major carotenoids synthesized by M. luteus, and the recently genome-sequenced strain M. luteus NCTC2665 was chosen as one model strain. Cell extracts from shake flask cultures were analyzed by LC-MS, and one major peak (peak 3) (Figure 2A) was identical to that of the sarcinaxanthin standard previously purified from M. luteus and structurally identified by NMR (Stafsnes et al., 2010). In addition, two minor peaks, peak 1 and peak 2, were identified with the same absorption spectra as that of sarcinaxanthin (Figure 2A). The retention time of peak 2 was equal to that of sarcinaxanthin monoglucoside identified earlier by NMR (Stafsnes et al., 2010), while peak 1 was more polar and was therefore predicted to represent sarcinaxanthin diglucoside (Table 3).

Several M. luteus strains from the sea surface microlayer of the mid-part of the Norwegian coast have previously been isolated and characterised for their sarcinaxanthin production capacities (Stafsnes et al., 2010). One selected isolate, designated Otnes7, forms bright yellow colonies on LB agar plates with a higher colour intensity than those of strain NCTC2665. Otnes7 was here classified as an M. luteus strain by 16S rRNA sequence analysis (93% identical to NCTC2665), and this strain was included as a second model strain.

Qualitative analysis of extracts confirmed that strain Otnes7 produces the same carotenoids as NCTC2665, while the total carotenoid level of Otnes7 cells (190 μg/g CDW) was higher than that of NCTC2665 cells (145 μg/g CDW). The latter result was in agreement with the different colour intensities of the respective bacterial colonies, and this was further investigated.

Example 3

Cloning and genetic characterisation of the M. luteus NCTC2665 crtEBIE2YgYh sarcinaxanthin biosynthetic gene cluster

The genome sequence of M. luteus strain NCTC2665 has been deposited in the databases (Accession number: NC_012803). In silico screening of the DNA sequence data resulted in the identification of a putative carotenoid biosynthesis gene cluster consisting of eight open reading frames: or1007, or1009-or1014 and ORF1. The genetic organisation of the crt genes in M. luteus displayed certain similarities to the previously published biosynthetic gene clusters for the C₅₀ carotenoids C.p.450 in Dietzia sp. (Tao, Yao et al. 2007) and decaprenoxanthin in C. glutamicum (Krubasik, Kobayashi et al. 2001) (Figure 3).

Example 4

Expression of the crtEBIE2YgYh genes resulted in production of non-glycosylated sarcinaxanthin in E. coli

To test experimentally whether the identified M. luteus gene cluster encodes an active sarcinaxanthin biosynthetic pathway, the crtEBIE2YgYh region from NCTC2665 was cloned in frame and under transcriptional control of the positively regulated Pm promoter in plasmid pJBphOx (Sletta et al., 2004). This expression vector has many favourable properties for regulated expression of genes and pathways at relevant levels in Gram-negative bacteria (for review, see Brautaset et al., 2009). The resulting plasmid pCRT-EBIE2YgYh-2665 was transformed into the non-carotenogenic E. coli host strain XL1-blue, and the recombinant strain was analysed for carotenoid production under induced conditions (0.5 mM m-toluic acid). LC-MS analysis of cell extracts revealed a small peak with the same retention time, absorption spectrum and relative molecular mass as the sarcinaxanthin identified in M. luteus strains. The recombinant E. coli strain produced small amounts of sarcinaxanthin (10 to 15 μg/g CDW), which were not present in plasmid-free cells, confirming that the identified gene cluster encodes a sarcinaxanthin biosynthetic pathway from farnesyl pyrophosphate (FPP).

Example 5

Sarcinaxanthin production levels can be increased up to 150-fold by expressing Otnes7 crtE2YgYh genes in a lycopene-producing E. coli host

To overcome the poor sarcinaxanthin production levels obtained above, a recombinant strain E. coli XL1 Blue (pCRT-EBI-2665) was established, expressing three enzymes catalysing the conversion of FPP into lycopene (Figure 1). Analysis of this recombinant strain under induced conditions confirmed that it produced lycopene. However, the production levels (8-12 μg/g CDW) remained low, analogous to the sarcinaxanthin levels obtained with E. coli XL1 Blue (pCRT-EBIE2YgYh-2665) (see above). Therefore, E. coli XL1-blue was transformed with plasmid pAC-LYC (Cunningham and Gantt 2007) harbouring the Pantoea ananatis crtEBI genes encoding three enzymes for biosynthesis of lycopene from IPP (isopentenyl pyrophosphate) and DMAPP (dimethylallyl pyrophosphate). LC-MS analysis confirmed that the resulting strain XL1-blue (pAC-LYC) accumulated significant amounts of lycopene (1.8 mg/g CDW) as its sole carotenoid. Therefore, all further carotenoid production experiments were performed using XL1-blue (pAC-LYC) as host.

We then established the strain XL1-blue (pAC-LYC) (pCRT-E2YgYh-2665). LC-MS analysis of its cell extracts revealed a total carotenoid accumulation of 2.3 mg/g CDW, of which about 90% was identified as sarcinaxanthin. These data demonstrate that the M. luteus NCTC2665 crtE2YgYh gene products can effectively convert lycopene into sarcinaxanthin in a lycopene-producing cell under these conditions. We also established and analysed the strain XL1-blue (pAC-LYC) (pCRT-EBIE2YgYh-2665), and the results were similar to those for strain XL1-blue (pAC-LYC) (pCRT-E2YgYh-2665). The latter result implies that the M. luteus crtEBI gene products are not efficient for lycopene production in E. coli; whether this is due to poor expression levels or low catalytic activities in this host remains unknown.
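The sarcinaxanthin content implied by these figures is simply total carotenoid multiplied by the sarcinaxanthin fraction. The short Python sketch below does this arithmetic for the NCTC2665-derived strain and, for comparison, the Otnes7-derived strain reported in the following paragraph. The helper name `component_content` is hypothetical, and the only inputs are the totals and percentages stated in this example; the results come to roughly 2.1 and 2.4 mg/g CDW, respectively.

```python
def component_content(total_mg_per_g: float, fraction: float) -> float:
    """Content of one carotenoid, given total carotenoid content and that carotenoid's fractional share."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return total_mg_per_g * fraction


# Values as stated in Example 5 (mg carotenoid per g cell dry weight, CDW).
REPORTED = {
    # strain: (total carotenoid in mg/g CDW, fraction identified as sarcinaxanthin)
    "XL1-blue (pAC-LYC) (pCRT-E2YgYh-2665)": (2.3, 0.90),
    "XL1-blue (pAC-LYC) (pCRT-E2YgYh-07)": (2.5, 0.97),
}

# Sarcinaxanthin content per strain, in mg/g CDW.
sarcinaxanthin = {strain: component_content(total, frac) for strain, (total, frac) in REPORTED.items()}

for strain, mg_per_g in sarcinaxanthin.items():
    print(f"{strain}: {mg_per_g:.3f} mg/g CDW sarcinaxanthin")

# Relative sarcinaxanthin content of the Otnes7-derived versus the NCTC2665-derived strain.
ratio = (sarcinaxanthin["XL1-blue (pAC-LYC) (pCRT-E2YgYh-07)"]
         / sarcinaxanthin["XL1-blue (pAC-LYC) (pCRT-E2YgYh-2665)"])
print(f"Otnes7/NCTC2665 ratio: {ratio:.2f}")
```

Because both inputs are single reported values, the ratio is only a point estimate; the inducer titration described below is the more robust comparison.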

An analogous strain, XL1-blue (pAC-LYC) (pCRT-E2YgYh-07), was established, and its total carotenoid production level (2.5 mg/g CDW) was slightly higher than that of XL1-blue (pAC-LYC) (pCRT-E2YgYh-2665). Sarcinaxanthin made up 97% of the total carotenoid produced by XL1-blue (pAC-LYC) (pCRT-E2YgYh-07), indicating efficient conversion of lycopene. It should also be noted that the sarcinaxanthin production levels obtained in this heterologous host were more than 10-fold higher than those obtained with the two M. luteus strains under such conditions (see above). To further compare the efficiency of the Otnes7- and NCTC2665-derived biosynthetic genes, production analyses were performed at different Pm inducer concentrations (Figure 4). Strain XL1-blue (pAC-LYC) (pCRT-E2YgYh-07) produced significantly more sarcinaxanthin than strain XL1-blue (pAC-LYC) (pCRT-E2YgYh-2665) under all conditions tested, confirming that the Otnes7 genes are preferable for efficient sarcinaxanthin production in an E. coli host. This result agrees with the higher sarcinaxanthin production levels of Otnes7 compared with NCTC2665 (see above). DNA sequence analysis of the cloned Otnes7 crtE2YgYh fragment revealed a total of 24 nucleotide substitutions compared with the corresponding NCTC2665 DNA sequence, resulting in three amino acid substitutions in CrtE2, six in CrtYg, and two substitutions plus one insertion in CrtYh. It is proposed that one or more of these sequence variations positively affect the expression levels or the catalytic properties of the respective proteins.
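Not every nucleotide substitution changes the encoded protein, which is how 24 nucleotide substitutions can correspond to only eleven amino acid substitutions. The minimal Python sketch below counts both kinds of difference for two coding sequences. It assumes gap-free, equal-length, in-frame sequences and the standard genetic code; the function names and the short fragments are hypothetical and are not the actual crt sequences. An insertion such as the one in CrtYh would first require a proper sequence alignment, which this sketch does not perform.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence ('*' marks stop codons); a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def compare_cds(reference: str, variant: str) -> tuple[int, int]:
    """Return (nucleotide substitutions, amino acid substitutions) for two
    gap-free, equal-length, in-frame coding sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned without gaps (indels are not handled)")
    nt = sum(a != b for a, b in zip(reference.upper(), variant.upper()))
    aa = sum(a != b for a, b in zip(translate(reference), translate(variant)))
    return nt, aa


# Hypothetical toy fragments (not the real crt sequences):
# GCT -> GCC is synonymous (Ala), GGT -> GAT changes Gly to Asp.
ref = "ATGGCTAAAGGT"
var = "ATGGCCAAAGAT"
print(compare_cds(ref, var))  # (2, 1)
```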

Example 6

Expression of crtE2 and crtE2Yg resulted in accumulation of nonaflavuxanthin and C₅₀ flavuxanthin

To elucidate the individual biosynthetic steps in the conversion of lycopene to sarcinaxanthin, the recombinant strain XL1-blue (pAC-LYC) (pCRT-E2-2665) was established and analysed for carotenoid production. Two carotenoids accumulated in the cells in addition to lycopene (Figure 2D), and all three compounds shared identical UV/Vis profiles. No sarcinaxanthin was detected. The minor carotenoid had a molecular mass of 620 Da, indicating a C₄₅ carotenoid, and the major carotenoid had a molecular mass of 704 Da, indicating a C₅₀ carotenoid. The major carotenoid was separated by preparative HPLC and analysed by NMR. Inspection of the ¹H, ¹³C and HSQC spectra revealed chemical shifts in agreement with reported data for the acyclic C₅₀ carotenoid flavuxanthin (Krubasik, Takaichi et al. 2001). The minor carotenoid was identified as nonaflavuxanthin on the basis of its UV/Vis profile and mass (Table 3). These results verify that the M. luteus crtE2 gene encodes a lycopene elongase catalyzing the sequential elongation of the C₄₀ carotenoid lycopene via the C₄₅ carotenoid nonaflavuxanthin to the C₅₀ carotenoid flavuxanthin. A similar analysis using the analogous strain XL1-blue (pAC-LYC) (pCRT-E2-07) led to the same conclusion. Interestingly, the relative conversion of lycopene was substantially higher in the latter strain (79% vs. 23%), in agreement with the generally higher sarcinaxanthin production levels obtained when expressing Otnes7 genes (see Figure 4).

We then constructed and analysed recombinant strains XL1-blue (pAC-LYC) (pCRT-E2Yg-07) and XL1-blue (pAC-LYC) (pCRT-E2Yg-2665). Both strains produced flavuxanthin, nonaflavuxanthin and lycopene, in relative abundances similar to those of strains XL1-blue (pAC-LYC) (pCRT-E2-07) and XL1-blue (pAC-LYC) (pCRT-E2-2665), respectively. Since expression of crtE2Yg did not yield sarcinaxanthin whereas expression of crtE2YgYh did, our data imply that the CrtYg and CrtYh polypeptides must function together as an active C₅₀ carotenoid cyclase catalyzing the cyclization of flavuxanthin to sarcinaxanthin in vivo. To our knowledge, this γ-type carotenoid cyclase has not previously been described. To determine whether this cyclase can also catalyse the cyclization of lycopene, we established and analysed recombinant strains XL1-blue (pAC-LYC) (pCRT-YgYh-07) and XL1-blue (pAC-LYC) (pCRT-YgYh-2665). HPLC analysis showed that both strains accumulated lycopene, confirming that the crtYgYh gene products cannot use lycopene as a substrate in vivo.

Example 7

The crtX gene encodes an active glycosyl transferase that can be used to produce monoglycosylated sarcinaxanthin in an E. coli host

Immediately downstream of crtYh there is an ORF encoding a hypothetical protein, followed by or1007, which encodes a putative polypeptide sharing 43% primary sequence identity with the putative glycosyl transferase CrtX (Figure 3) from Dietzia sp., suggested to be involved in the glycosylation of C.p. 450 (Tao, Yao et al. 2007). To our knowledge, no analogous gene has been found in the C. glutamicum genome sequence, yet this bacterium can synthesize glycosylated decaprenoxanthin (Krubasik, Takaichi et al. 2001). The or1007 gene was herein named crtX, and to unravel its biological function we constructed and analysed recombinant strain XL1-blue (pAC-LYC) (pCRT-E2YgYhX-07). The resulting HPLC profile (Figure 2C) revealed sarcinaxanthin as the major carotenoid (peak 3), but an additional, more polar carotenoid eluted earlier (peak 2) with a retention time and absorption spectrum identical to those of sarcinaxanthin monoglucoside from M. luteus Otnes7 (Figure 2C and E). Another minor peak was observed with the same retention time as that of sarcinaxanthin diglucoside; however, the detected amount was too low for a confident analysis of the mass and absorption spectrum. Interestingly, about 10% of the total sarcinaxanthin produced was glycosylated, both in M. luteus and when produced heterologously in E. coli. These results confirmed that crtX encodes an active glycosyl transferase that is necessary for the glycosylation of sarcinaxanthin under the conditions tested.

Based on all accumulated data, we could deduce the complete biosynthetic pathway of sarcinaxanthin and its glucosides from FPP via lycopene in M. luteus (Figure 1). To our knowledge, this represents the first experimentally confirmed biosynthetic pathway of a γ-cyclic C₅₀ carotenoid.
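The deduced pathway can be written as a small rule set and used to predict which compounds a given combination of expressed genes should yield. The Python sketch below is a qualitative model only, assuming each step proceeds whenever all its required gene products are present. The step list, gene labels, `HOST` set and `predicted_products` function are illustrative; the four phytoene desaturation steps are lumped into one, the pAC-LYC host is modelled simply as supplying crtE, crtB and crtI activities, and yields and incomplete conversion are ignored.

```python
# Qualitative model of the deduced sarcinaxanthin pathway (illustrative only).
# Each step: (gene products required, substrate, product).
STEPS = [
    (frozenset({"crtE"}), "FPP", "GGPP"),                                # GGPP synthase
    (frozenset({"crtB"}), "GGPP", "phytoene"),                           # phytoene synthase
    (frozenset({"crtI"}), "phytoene", "lycopene"),                       # phytoene desaturase (lumped)
    (frozenset({"crtE2"}), "lycopene", "nonaflavuxanthin"),              # first C5 elongation
    (frozenset({"crtE2"}), "nonaflavuxanthin", "flavuxanthin"),          # second C5 elongation
    (frozenset({"crtYg", "crtYh"}), "flavuxanthin", "sarcinaxanthin"),   # both subunits required
    (frozenset({"crtX"}), "sarcinaxanthin", "sarcinaxanthin monoglucoside"),
]


def predicted_products(expressed: set[str], start: str = "FPP") -> set[str]:
    """Return every compound reachable from `start` using only steps whose genes are all expressed."""
    made = {start}
    changed = True
    while changed:  # repeat until no new compound can be formed
        changed = False
        for genes, substrate, prod in STEPS:
            if genes <= expressed and substrate in made and prod not in made:
                made.add(prod)
                changed = True
    return made


# Hypothetical representation of the XL1-blue (pAC-LYC) host.
HOST = {"crtE", "crtB", "crtI"}

if __name__ == "__main__":
    for extra in (["crtE2"], ["crtE2", "crtYg"], ["crtYg", "crtYh"],
                  ["crtE2", "crtYg", "crtYh"], ["crtE2", "crtYg", "crtYh", "crtX"]):
        print("+".join(extra), "->", sorted(predicted_products(HOST | set(extra))))
```

Run over the gene combinations tested in Examples 5 to 7, the model reproduces the qualitative outcomes: no sarcinaxanthin without both crtYg and crtYh, no elongation products without crtE2, and the monoglucoside only when crtX is added.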

Table 2: Bacterial strains and plasmids used for heterologous production of sarcinaxanthin and other C₅₀ carotenoids

Strain/Plasmid	Relevant characteristics	Reference/source

Strain
E. coli DH5α	General cloning host	Gibco-BRL
E. coli XL1-blue	General cloning host	Stratagene
M. luteus NCTC2665		National Collection of Type Cultures
M. luteus Otnes7	Marine wild type isolate	This work
C. glutamicum MJ-233C-MV10	Tn31831 mutant of C. glutamicum MJ-233C; contains wild type crt gene cluster	Kurusu, Kainuma et al. 1990; Vertes, Asai et al. 1994; Krubasik, Takaichi et al. 2001

Plasmid
pGEM-T
pJBphOx
pAC-LYC	Cm^r, lycopene-producing plasmid containing crtEIB from P. ananatis, p15A ori	Cunningham, Chamovitz et al. 1993
pGEM-TcrtE2YgYh-07	Amp^r, pGEM-T with crtE2YgYh fragment from strain Otnes7	This work
pGEM-TcrtE2YgYh-2665	Amp^r, pGEM-T with crtE2YgYh fragment from strain NCTC2665	This work