


Title:
REGULATORY FACTORS CONTROLLING OIL BIOSYNTHESIS IN MICROALGAE AND THEIR USE
Document Type and Number:
WIPO Patent Application WO/2012/047970
Kind Code:
A1
Abstract:
The production of transgenic plant oils is described by transfecting plant host cells with heterologous transcription inducers. Such inducers are specific for endogenous biosynthetic plant oil genes, such that the inducers induce an overexpression of the plant oil genes. Nitrogen (N)-deprivation was used to induce triacylglycerol accumulation and changes in developmental programs such as gametogenesis. Comparative global analysis of transcripts under induced and non-induced conditions was applied as a first approach to studying molecular changes that promote or accompany triacylglycerol accumulation in cells encountering a new nutrient environment. N-deprivation led to a marked redirection of metabolism: the primary carbon source, acetate, was no longer converted to cell building blocks by the glyoxylate cycle and gluconeogenesis, but funneled directly into fatty acid biosynthesis. The data provided here represent a rich source for the exploration of the mechanism of oil accumulation in microalgae. The overexpressed gene product (i.e., for example, diacylglycerol transferase) thereby results in an increased intracellular production of a plant oil (i.e., for example, triacylglycerol). In one embodiment, the transfected plant host cell can be a plant algae species (i.e., for example, Chlamydomonas reinhardtii).

Inventors:
BENNING CHRISTOPH (US)
MILLER RACHEL (US)
Application Number:
PCT/US2011/054881
Publication Date:
April 12, 2012
Filing Date:
October 05, 2011
Assignee:
UNIV MICHIGAN STATE (US)
BENNING CHRISTOPH (US)
MILLER RACHEL (US)
International Classes:
C12N15/63; C12N1/13
Domestic Patent References:
WO2010075143A12010-07-01
Foreign References:
US20100192258A12010-07-29
US20070009942A12007-01-11
US6294328B12001-09-25
US5523089A1996-06-04
JP2009072081A2009-04-09
US20100243451A12010-09-30
US20090163729A12009-06-25
Other References:
DATABASE GENBANK [online] 6 December 2007 (2007-12-06), "CACW26989.fwd CACW Chlamydomonas reinhardtii Strain S1D2 normal Chlamydomonas reinhardtii cDNA clone CACW26989 5-, mRNA sequence.", retrieved from http://www.ncbi.nlm.nih.gov/nucest/FC099837 Database accession no. FC099837
DATABASE GENBANK [online] 6 December 2007 (2007-12-06), "CACW21197.fwd CACW Chlamydomonas reinhardtii Strain S1D2 normal Chlamydomonas reinhardtii cDNA clone CACW21197 5-, mRNA sequence.", retrieved from http://www.ncbi.nlm.nih.gov/nucest/FC088806 Database accession no. FC088806
DATABASE GENBANK [online] 25 July 2000 (2000-07-25), "925008G04.y1 C. reinhardtii CC-2290, normalized, Lambda Zap II Chlamydomonas reinhardtii cDNA, mRNA sequence.", retrieved from http://www.ncbi.nlm.nih.gov/nucest/BE441688 Database accession no. BE441688
DATABASE UNIPROT [online] 9 July 2010 (2010-07-09), PROCHNIK ET AL., retrieved from http://www.uniprot.org/uniprot/D8TIZ8 Database accession no. D8TIZ8
DATABASE GENBANK [online] 10 April 2007 (2007-04-10), "1024029G03.y2 C. reinhardtii CC-1690, normalized, Lambda Zap II Chlamydomonas reinhardtii cDNA, mRNA sequence", retrieved from http://www.ncbi.nlm.nih.gov/nucest/BG850913 Database accession no. BG850913
DATABASE GENBANK [online] 10 April 2007 (2007-04-10), "1024010H11.y2 C. reinhardtii CC-1690, normalized, Lambda Zap II Chlamydomonas reinhardtii cDNA, mRNA sequence.", retrieved from http://www.ncbi.nlm.nih.gov/nucest/BG845670 Database accession no. BG845670
DATABASE GENBANK [online] 8 May 2010 (2010-05-08), "Chlamydomonas reinhardtii C9 Chlamydomonas reinhardtii cDNA clone CM079c12_r 5-, mRNA sequence.", retrieved from http://www.ncbi.nlm.nih.gov/nucest/AV391942 Database accession no. AV391942
Attorney, Agent or Firm:
CARROLL, Peter, G. et al. (LLP703 Market Street,Suite 34, San Francisco CA, US)
Claims:
CLAIMS:

We claim:

1. A composition comprising a nucleic acid sequence encoding a biosynthetic oil gene transcription regulator.

2. The composition of Claim 1, wherein said nucleic acid sequence is in a vector.

3. The composition of Claim 1, wherein said composition further comprises a heterologous promoter sequence.

4. The composition of Claim 1, wherein said nucleic acid sequence comprises SEQ ID NO:13 (TFU1).

5. The composition of Claim 1, wherein said nucleic acid sequence comprises SEQ ID NO:16 (TFU2).

6. The composition of Claim 1, wherein said nucleic acid sequence comprises SEQ ID NO:19 (TFU3).

7. The composition of Claim 1, wherein said nucleic acid sequence comprises SEQ ID NO:22 (TFU4).

8. The composition of Claim 1, wherein said nucleic acid sequence comprises SEQ ID NO:25 (TFU5).

9. The composition of Claim 1, wherein said nucleic acid sequence is derived from a plant algae.

10. The composition of Claim 9, wherein said algae comprises Chlamydomonas reinhardtii.

11. A composition comprising an amino acid sequence comprising a biosynthetic oil gene transcription regulator.

12. The composition of Claim 11, wherein said amino acid sequence comprises SEQ ID NO:12 (TFU1).

13. The composition of Claim 11, wherein said amino acid sequence comprises SEQ ID NO:17 (TFU2).

14. The composition of Claim 11, wherein said amino acid sequence comprises SEQ ID NO:20 (TFU3).

15. The composition of Claim 11, wherein said amino acid sequence comprises SEQ ID NO:23 (TFU4).

16. The composition of Claim 11, wherein said amino acid sequence comprises SEQ ID NO:26 (TFU5).

17. The composition of Claim 11, wherein said amino acid sequence is derived from a plant algae.

18. The composition of Claim 17, wherein said algae comprises Chlamydomonas reinhardtii.

19. A method comprising:

a) providing;

i) a vector comprising 1) a nucleic acid sequence comprising a biosynthetic oil gene transcription regulator and 2) a promoter operably linked to said nucleic acid sequence;

ii) at least one plant cell comprising a biosynthetic oil gene;

b) transfecting said plant cell with said vector such that said nucleic acid sequence is expressed.

20. The method of Claim 19, wherein said method further comprises step (c) inducing said biosynthetic oil gene with said transcription regulator, wherein said biosynthetic oil gene expression is upregulated.

21. The method of Claim 20, wherein said gene expression is ectopic.

22. The method of Claim 20, wherein said biosynthetic oil gene is upregulated between 1.5 - 3 fold.

23. The method of Claim 20, wherein said biosynthetic oil gene is upregulated between 3.5 - 5 fold.

24. The method of Claim 20, wherein said biosynthetic oil gene is upregulated between 5.5 - 7 fold.

25. The method of Claim 20, wherein said biosynthetic oil gene is upregulated between 7.5 - 10 fold.

26. The method of Claim 20, wherein said biosynthetic oil gene encodes an enzyme.

27. The method of Claim 26, wherein said enzyme produces a fatty acid.

28. The method of Claim 27, wherein said fatty acid comprises triacylglycerol.

29. The method of Claim 28, wherein said method further comprises step (e) collecting the fatty acid, thereby forming a biosynthetic oil.

30. The method of Claim 20, wherein said plant cell comprises an algae cell.

31. The method of Claim 30, wherein said algae cell comprises a Chlamydomonas reinhardtii cell.

32. A transgenic plant cell line, wherein said line was transfected with a vector comprising a nucleic acid sequence encoding a transcription regulator.

33. The transgenic plant cell of Claim 32, wherein said sequence is selected from the group consisting of SEQ ID NOS: 13, 16, 19, 22, 25, 28, 31, 34, 37 and 40.

Description:
REGULATORY FACTORS CONTROLLING OIL BIOSYNTHESIS IN MICROALGAE AND THEIR USE

Statement of Governmental Support

This invention was made with government support awarded by: i) the US Air Force Office of Scientific Research (FA9550-08-1-0165 to C.B.); ii) the US National Science Foundation (MCB-0749634 and MCB-0929100 to S-H.S.); iii) a German Academic Exchange Program postdoctoral fellowship to S.Z.; and Michigan State Agricultural Experiment Station. The government has certain rights in the invention.

FIELD OF THE INVENTION

The present invention is related to biosynthetic oil compositions and methods of making thereof. The present invention contemplates using plant material (i.e., for example, algae) comprising recombinant transcription inducing factors for biosynthetic oil genes. In some embodiments, the inducing factors are transcription factor regulatory proteins.

BACKGROUND OF THE INVENTION

Plants have long been a commercially valuable source of oil. Traditionally, plant oils were used for nutritional purposes. Recently, however, attention has focused on plant oils as sources of industrial oils, for example as replacements for, or improvements on, mineral oils. Given that oil seeds of commercially useful crops such as Brassica napus contain a variety of lipids (Hildish & Williams, "Chemical Composition of Natural Lipids", Chapman Hall, London, 1964) [1], it is desirable to tailor the lipid composition to better suit other needs, for example use in recombinant DNA technology (Knauf, TIBtech, February 1987, 40-47) [2].

The production of commercially desirable specific oils in plants on a large scale is limited in two ways. Some plant species make oils with very high levels of essentially pure, specific fatty acids, but these species are unable to be grown in sufficient quantities and of sufficient yield to provide a commercially valuable product. Other plant species produce sufficient amounts of oil, but the oil has low levels of the specific desired fatty acids. Nevertheless, the field of oil modification in plants is wide and a number of different products have already been designed. Rape oil containing lauric acid has been marketed, and soybeans with modified levels of unsaturated fatty acids are available. In some cases, the production of specialty oils seems to be straightforward. In others, however, a number of unexpected complications have arisen which have hampered the production of plants capable of making some specific oils. For example, mutations in plant lipid synthesis genes are generally difficult to detect due to the pleiotropic effects of mutations on plant hardiness and yield. Even if detected, proteins involved in pathways of interest have proved difficult to isolate due to their biochemical instability.

Where regulation of such proteins has been successfully altered, results generally do not coincide with expectations, presumably due to the effect of multiple converging pathways. Examples of such problems relating to the production of Arabidopsis producing petroselinic acid are disclosed in Ohlrogge, 13th International Symposium on Plant Lipids, Seville, Spain: 219 & 801, (1998). Thus, there are considerable problems to be solved in achieving reliable, large-scale production of a range of commercially desirable oils. For example, what is needed is a high-throughput method to improve biosynthetic plant oil production by recombinant engineering of biosynthetic oil gene regulation.

The search for sustainable sources of biofuels has led to renewed interest in microalgae as potential feed stock and, consequently, a flurry of research has recently been initiated in microalgae (Wijffels and Barbosa, 2010) [3]. Microalgae accumulate large quantities of oils in the form of triacylglycerols (TAGs) when nutrient-deprived, and a thorough analysis of the underlying molecular mechanism is currently in its infancy (Hu et al, 2008) [4]. At this time, C. reinhardtii is the premier microalgal molecular model for this analysis. As such, the formation of lipid droplets following N-deprivation has recently been documented in detail by us and others (Moellering and Benning, 2010, Wang et al, 2009) [5, 6]. Although C. reinhardtii is not under direct consideration for the production of biomass as biofuel feed stock, the analysis of its metabolism and physiology is expected to provide basic insights into mechanisms of TAG accumulation relevant to other microalgae, at least of the green algal phylum.

A sustainable liquid fuel source that is compatible with current transportation technology is desirable for short term transition initiatives focused on off-setting increasing fuel consumption in the United States. Ideally, the production of the replacement fuel would be isolated from aspects of agricultural production of food, inherently reduce environmental emissions, and contribute to a reduction in greenhouse gas emissions (Dismukes et al, 2008) [7]. The increasing requirement for alternative liquid fuel sources is evident in the depletion of petroleum oil feed stocks, rising fuel costs, and current concerns relevant to national security.

It is already apparent that the production of biofuels such as ethanol will only be instrumental in temporarily off-setting the growing requirement for gasoline used in cars (Ohlrogge et al, 2009) [8]. However, in the long term, implementation of high energy density fuels such as biodiesel and biomass derived jet fuels (replacing JP-8) is necessary. Engineered plants that produce altered or increased seed oil content are on the horizon (Durrett et al, 2008) [9]. Although these and similar engineering strategies have the promise of promoting the production of seed oil in agronomically relevant crop systems, a multifaceted, diversified portfolio approach is desirable to provide oil on a scale that effectively off-sets increasing high energy liquid fuel demand. The nature of the portfolio is yet undefined as researchers engage this problem. However, it is clear that potentially viable options include the production of oil by microalgae (Dismukes et al, 2008, Wijffels and Barbosa, 2010) [3, 7]. Recent theoretical calculations taking into account different locations predict that microalgal photosynthesis can produce between 40,000 and 50,000 L ha⁻¹ year⁻¹, which is 5-to-6 times the yield observed for oil palm (Weyer et al, 2010) [10]. The Aquatic Species Project, a major research initiative focused on the identification of algal species that produce large quantities of oil, was implemented in the 1980s by the US Dept. of Energy (http://www1.eere.energy.gov/biomass/pdfs/biodiesel_from_algae.pdf). The initiative was successful in many areas of discovery, including the identification of species that have the propensity to produce and accumulate oil up to 60% of their cell mass, culture conditions that promote rapid growth and decrease the doubling time for specific species, and the development of facilities and infrastructure that support algal cultures. While this initiative did not address the improvement of microalgae by genetic engineering, this approach is clearly on the horizon and expected to be applied to the improvement of algal production strains in the future (Radakovits et al, 2010) [11].

Knowledge and understanding of biological model systems coupled with recent advances in genetic and genomic technologies have opened a wealth of new possibilities. These new advances promise rapid progress towards the development of algal systems that could potentially produce oil on a scale that might soon supplement and possibly one day replace fossil high energy density liquid fuels. The proposed work takes advantage of recent advances in genetics and genomics of the model microalga C. reinhardtii. While this alga might ultimately not give rise to a large-scale production strain, it provides the means to discover regulatory principles, genes, and proteins essential to the biosynthesis of TAGs in microalgae.

The peculiarities of C. reinhardtii lipid metabolism

The eukaryotic green alga C. reinhardtii is a well established model for many processes such as photosynthesis (Niyogi, 1999) [12], phototaxis and flagellar function (Silflow and Lefebvre, 2001) [13], post-transcriptional gene silencing (Wu-Scharf et al, 2000) [14], and nutrient acquisition (Davies et al, 1996) [15]. The recent large-scale EST projects (Grossman et al, 2003, Shrager et al, 2003) [16, 17] and sequencing of the C. reinhardtii genome (Merchant et al, 2007) [18], as well as the development of insertional mutagenesis (Tam and Lefebvre, 1993) [19], RNA interference (RNAi) methods (Fuhrmann et al, 2001, Sineshchekov et al, 2002) [20, 21], and a molecular map (Kathir et al, 2003) [22] make C. reinhardtii an attractive model in which to study gene function by both forward and reverse genetics. Based on these attributes, C. reinhardtii has been instrumental in studying the biosynthesis and physiological functions of non-phosphorus lipids such as the plastidic galacto- and sulfolipids, and the betaine lipid diacylglyceryl-N,N,N-trimethylhomoserine (Giroud and Eichenberger, 1989, Moore et al, 2001, Riekhof et al, 2005, Sato, 1988) [23-27], as well as phospholipids such as phosphatidylethanolamine (Yang et al, 2004) [28] or phosphatidylglycerol (El Maanni et al, 1998, Pineau et al, 2004, Seras et al, 1989) [29-31].

Based on genome annotation, the basic lipid biosynthetic pathways present in C. reinhardtii have been reconstructed (Riekhof and Benning, 2009, Riekhof et al, 2005) [25, 32]. In fact, the entire primary metabolic network of C. reinhardtii has been recently reconstructed and a flux balance analysis has been conducted (Boyle and Morgan, 2009) [33]. It should be noted that lipid metabolism in C. reinhardtii has key differences when compared to plants like Arabidopsis: it lacks enzymes for the synthesis of phosphatidylcholine, but has a betaine lipid synthase instead. As such, betaine lipid seems to replace phosphatidylcholine in this alga (Giroud et al, 1988) [34]. This difference is not trivial; many of the fatty acid modifications leading to the synthesis of TAGs and specific aspects, such as interorganelle lipid trafficking, involve phosphatidylcholine in plants (Benning, 2009, Durrett et al, 2008) [9, 35].

During the past year, research relevant to TAG metabolism in C. reinhardtii has gained momentum. Initial studies on the formation of lipid droplets in C. reinhardtii and on lipid droplet associated proteins were published (Moellering and Benning, 2010, Wang et al, 2009) [6, 36]. Moreover, it was shown that a defect in starch accumulation shifts carbon partitioning towards oil (Wang et al, 2009, Work et al, 2010) [6, 37]. Transcript profiling following nutrient deprivation by RNA-Seq has reached new levels, providing deep and likely complete gene coverage and permitting robust statistical analysis of events potentially leading to oil accumulation (Gonzalez-Ballester et al, 2010, Miller et al, 2010) [38, 39].

SUMMARY OF THE INVENTION:

The present invention is related to biosynthetic oil compositions and methods of making thereof. The present invention contemplates using plant material (i.e., for example, algae) comprising recombinant transcription inducing factors for biosynthetic oil genes. In some embodiments, the inducing factors are transcription factor regulatory proteins.

GENERAL DESCRIPTION OF THE INVENTION

Analysis of differentially regulated genes. Global transcript analysis under oil inducing conditions and standard growth conditions has yielded a large set of differentially regulated genes, providing candidates with a function in the regulation of lipid accumulation. One class will be particularly explored at the mechanistic level: genes that encode putative transcriptional regulators.

Differentially regulated transcription factors. One of the potentially most promising targets for uncoupling of the adverse effects of nutrient starvation from induction of oil accumulation would be the identification of a transcription factor specifically regulating oil biosynthesis. It would seem plausible that the expression of the gene encoding such a transcription factor itself might be induced following N-deprivation. Table 4 shows the most regulated transcription factors that we will pursue in this study.

Identifying transcription factors regulating oil accumulation. Candidates for transcription factors regulating oil biosynthesis in C. reinhardtii may present themselves as proteins encoded by genes disrupted in the mutants described above or as encoded by genes regulated during N-deprivation listed in Table 4. For the former, the genetic evidence already points towards a function of the protein in oil biosynthesis and corroborating evidence is sought. To confirm the involvement of a putative transcription factor, we will constitutively express the respective cDNAs following isolation by RT-PCR under the control of the PsaD promoter using the modified pJR38 vector (Neupert et al, 2009) [40] into which we inserted a multiple cloning site in place of the GFP coding sequence. We hypothesize that a positively acting DNA binding factor will induce genes required for oil biosynthesis, but its own gene may be turned off or down during growth on N-replete conditions. Thus, overriding the control of its own promoter by using the constitutive PsaD promoter may lead to the activation of genes involved in oil accumulation. Following the isolation of transgenic C. reinhardtii cell lines, we will first screen them for expression of the transgene using quantitative RT-PCR. Promising lines with elevated transgene expression will then be tested for oil accumulation under N-replete conditions using lipid and fatty acid profiling techniques as described for the characterization of the insertion mutants above. It is also possible that the respective factor acts as a negative regulator under N-replete conditions, in particular those encoded by genes down-regulated following N-deprivation (see Table 4). Thus we will also test the expression lines following N-deprivation. If the respective factors act as repressors, we would expect oil accumulation to be reduced in these transgenic lines following N-deprivation.

Characterization of identified transcription factors. Once a gene has been confirmed to encode a potential transcriptional regulator of oil biosynthesis based on the criteria discussed above, a series of experiments can be initiated to determine the target genes for this factor. Towards this end, we will conduct a sequencing based RNA profiling experiment comparing the wild type and transgenic lines mentioned above under N-replete and N-deprived growth conditions. Differences in the transcript profiles will point towards potential target genes that can be further studied. Moreover, such a global approach may reveal the extent of the transcriptional network governed by the factor under investigation. It seems possible that factors are acting broadly, affecting diverse processes such as gametogenesis, N-metabolism and oil biosynthesis. For future engineering purposes it would be most desirable to isolate a factor specifically regulating genes involved in oil biosynthesis as this would minimize side effects of a subsequent engineering effort.

As soon as potential target genes have been identified, the interaction of the respective protein (assuming it is a DNA binding protein) with the promoter sequence can be studied. Binding sites may be deduced using a comparative analysis of promoter sequences of all regulated genes or by in vivo deletion analysis of a specific promoter fused to a reporter gene. We are currently applying these techniques to the analysis of a plant transcription factor, WRI1, involved in the regulation of seed oil biosynthesis (Cernac and Benning, 2004) [41]. Thus, we are familiar with the underlying techniques, which are easily transferable to the analysis of C. reinhardtii transcription factors. During the course of the proposed study we anticipate that we will identify and corroborate at least one transcription factor involved in the regulation of oil biosynthesis in C. reinhardtii and will be able to begin a detailed analysis of its target genes and specific binding site.

In one embodiment, the present invention contemplates a composition comprising a nucleic acid sequence encoding a biosynthetic oil gene transcription regulator (e.g. a transcription factor described herein). In a preferred embodiment, said nucleic acid sequence is in a vector. In one embodiment, said vector is in a cell or organism. In one embodiment, said composition further comprises a heterologous promoter sequence (in operable combination with said sequence for said transcription factor). In one embodiment, said nucleic acid sequence comprises SEQ ID NO: 13 (TFU1). In another, said nucleic acid sequence comprises SEQ ID NO: 16 (TFU2). In another, said nucleic acid sequence comprises SEQ ID NO: 19 (TFU3). In yet another, said nucleic acid sequence comprises SEQ ID NO:22 (TFU4). In still another, said nucleic acid sequence comprises SEQ ID NO:25 (TFU5). In still others, said sequence is selected from the group consisting of SEQ ID NOS: 13, 16, 19, 22, 25, 28, 31, 34, 37, and 40. In a preferred embodiment, the nucleic acid sequence is derived from a plant algae (e.g. Chlamydomonas reinhardtii).

In other embodiments, the present invention contemplates a composition comprising an amino acid sequence comprising a biosynthetic oil gene transcription regulator. In one embodiment, said amino acid sequence comprises SEQ ID NO: 14 (TFU1). In another, said amino acid sequence comprises SEQ ID NO: 17 (TFU2). In still another, said amino acid sequence comprises SEQ ID NO:20 (TFU3). In yet another, said amino acid sequence comprises SEQ ID NO:23 (TFU4). In still others, said amino acid sequence comprises SEQ ID NO:26 (TFU5). In yet others, said amino acid sequence is selected from the group consisting of SEQ ID NOS: 29, 32, 35, 38 and 41. In one embodiment, said amino acid sequence is derived from a plant algae (e.g. Chlamydomonas reinhardtii).

The present invention also contemplates methods of making vectors and methods of transfecting with vectors, so as to generate useful oils. In one embodiment, the method comprises: a) providing i) a vector comprising 1) a nucleic acid sequence comprising a biosynthetic oil gene transcription regulator and 2) a promoter operably linked to said nucleic acid sequence; and ii) at least one plant cell comprising a biosynthetic oil gene; and b) transfecting said plant cell with said vector such that said nucleic acid sequence is expressed. In one embodiment, the method further comprises step (c) inducing said biosynthetic oil gene with said transcription regulator, wherein said biosynthetic oil gene expression is upregulated.

In one embodiment, said gene expression is ectopic. In one embodiment, said biosynthetic oil gene is upregulated between 1.5 - 3 fold or between 3.5 - 5 fold or between 5.5 - 7 fold or between 7.5 - 10 fold. In one embodiment, said biosynthetic oil gene encodes an enzyme. In one embodiment, said enzyme produces a fatty acid. In a preferred embodiment, said fatty acid comprises triacylglycerol. In one embodiment, said method further comprises step (d) collecting the fatty acid, thereby forming a biosynthetic oil. In a preferred embodiment, said plant cell comprises an algae cell (e.g. a Chlamydomonas reinhardtii cell or cells).

The present invention also contemplates, in one embodiment, a transgenic plant cell line, wherein said line was transfected with a vector comprising a nucleic acid sequence encoding a transcription regulator (e.g. one of the transcription factors described herein). In one embodiment, said sequence is selected from the group consisting of SEQ ID NOS: 13, 16, 19, 22, 25, 28, 31, 34, 37, and 40.

DESCRIPTION OF PREFERRED EMBODIMENTS

The search for sustainable sources of biofuels has led to renewed interest in microalgae as potential feed stock and, consequently, a flurry of research has recently been initiated in microalgae (Wijffels and Barbosa, 2010) [3]. Microalgae accumulate large quantities of oils in the form of triacylglycerols (TAGs) when nutrient-deprived, and a thorough analysis of the underlying molecular mechanism is currently in its infancy (Hu et al, 2008) [4]. At this time, Chlamydomonas reinhardtii is the premier microalgal molecular model for this analysis. As such, the formation of lipid droplets following N-deprivation has recently been documented in detail (Moellering and Benning, 2010; Wang et al, 2009). Although C. reinhardtii is not under direct consideration for the production of biomass as biofuel feed stock, the analysis of its metabolism and physiology is expected to provide basic insights into mechanisms of TAG accumulation relevant to other microalgae, at least of the green algal phylum.

The genome of C. reinhardtii is available (Merchant et al, 2007) and its annotation is currently at version 4 (http://genome.gi-psf.org/chlamy/chlamy.home.html). At this time, a number of microarrays have been used to interrogate changes in response to environmental factors, e.g. (Jamers et al, 2006; Ledford et al, 2004; Ledford et al, 2007; Mus et al, 2007; Mustroph et al, 2010; Nguyen et al, 2008; Simon et al, 2008; Yamano et al, 2008). These microarrays could not cover all genes in the genome, but more recently massively parallel cDNA sequencing approaches were applied to C. reinhardtii, overcoming the shortcomings of microarrays (Gonzalez-Ballester et al, 2010). Likewise, we have chosen a cDNA sequencing based approach using 454 and Illumina technologies in parallel that allow the generation of large numbers of expressed sequence tags of varying abundance, which can be counted to obtain a measure of gene expression (Weber et al, 2007).
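As a rough illustration of how counted sequence tags can serve as a measure of gene expression, the following Python sketch tallies tags per gene and scales the counts to tags per million. The gene names and tag assignments are made-up placeholders, and the per-million scaling is an assumed normalization chosen only for illustration; it is not presented as the procedure used in this study.

```python
from collections import Counter

def tags_per_million(tag_gene_ids):
    """Count tags per gene and scale the counts to tags per million mapped tags."""
    counts = Counter(tag_gene_ids)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Hypothetical tag-to-gene assignments (placeholders, not real data).
tags = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB", "geneA", "geneA"]
expression = tags_per_million(tags)
```

Scaling by the total number of mapped tags is one simple way to make counts from sequencing runs of different depth comparable before any statistical testing.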

The goal of this study was to determine major changes in gene expression following nitrogen (N)-deprivation, the nutrient condition established in our previous analysis of lipid droplet formation and TAG accumulation in C. reinhardtii (Moellering and Benning, 2010). Comparison of the transcript levels of induced, N-deprived C. reinhardtii cultures to those of uninduced, N-replete cultures was expected to reflect the metabolic changes leading to TAG accumulation. Of course, making inferences on metabolism based on gene expression levels has its caveats as gene expression does not necessarily directly translate into metabolic fluxes. To interrogate the meaningfulness of some of the transcript level changes we observed with regard to metabolism, we also performed labeling experiments using acetate as the precursor. Acetate is a typical carbon source provided to C. reinhardtii for photoheterotrophic growth enabling short doubling times, and it is readily incorporated into fatty acids, the main constituents of TAGs. Keeping in mind that these are clearly conditions optimized for an experimental laboratory system, we nevertheless expect to be able to make basic inferences that will be relevant to a broader understanding of the induction of TAG biosynthesis and lipid droplet accumulation in green algae.

RESULTS

Defining conditions for N-deprivation of C. reinhardtii

Ideally, one would like to use finely spaced time course experiments to distinguish rapid versus long-term changes in gene expression following N-deprivation. However, because our resources were limited, we decided to focus on two conditions, N-replete and N-deprived. Independent biological replicates allowed for statistically sound interpretations of the data. To determine the time point for N-deprivation most likely to provide an accurate snapshot of readjustment of transcript steady-state levels following N-deprivation, we first used Northern blot hybridization to compare expression of genes known or expected to be regulated in C. reinhardtii following N-deprivation. An ammonium transporter, AMT4, which has been previously shown to be activated by nitrogen deprivation (Mamedov et al, 2005), and two putative diacylglycerol acyltransferases, tentatively designated DGTT2 and DGTT3 (PIDs 184281 and 400751, genome version 4) were monitored to test the various conditions. RNA was isolated from cells grown in standard (10 mM NH₄⁺) and low nitrogen (0.5 mM NH₄⁺) TAP medium to impose N-limitation, as well as cells grown to mid-log phase in standard TAP medium, then transferred to either standard or no nitrogen (0 mM NH₄⁺) TAP, with samples taken at 24 and 48 hours, to accomplish more drastic N-deprivation (Figure 31A). For standardization, equal amounts of RNA were loaded and the 18S rRNA abundance was examined. Although C. reinhardtii ribosomes turn over following N-deprivation (Martin et al, 1976; Siersma and Chiang, 1971), their abundance drops no lower than 50% (see below). AMT4 mRNA was absent from the uninduced cells, and present at a high level in N-limited or -deprived cells under the conditions tested. DGTT2 mRNA was present at low levels in all conditions. DGTT3 RNA was present at low levels and increased slightly following N-deprivation. N-deprivation for 48 hours showed the greatest difference in RNA levels compared to the N-replete cultures. Based on this basic analysis and our previous time course study of lipid droplet formation and TAG accumulation (Moellering and Benning, 2010), a 48 hour period of N-deprivation was chosen to compare global transcript levels in N-replete and N-deprived cells.

Global characteristics of C. reinhardtii gene expression following N-deprivation

To determine the differential expression of genes in C. reinhardtii under N-replete and N-deprived conditions, two sequencing approaches, 454 and Illumina, were applied. Read length is longer for 454, but the number of reads per run is lower. As shown in Table 1, 60-85 fold more sequence tags were generated with Illumina than with 454 sequencing. Among the 454 reads, 78-80% mapped to the C. reinhardtii genome. For the Illumina data, we mapped reads in three different ways, with varying stringency depending on whether 3'-end read quality and exon-spanning reads were considered. Without filtering reads, a substantially smaller proportion of Illumina reads (63-68%) were mapped compared to 454 reads. Trimming low quality 3' regions of reads resulted in a further 2.7% decrease in the number of mapped reads. Despite the large number of unmapped Illumina reads, out of 16,710 C. reinhardtii gene models, 15,505 (92%) had >1 reads. In contrast, only 6372 gene models (38.1%) were supported by the 454 transcriptome data set. In addition, nearly all genes covered by 454 were also covered by Illumina. Therefore, our sequencing data cover most annotated genes, enabling us to interrogate differential expression under normal conditions and following N-deprivation. As expected, Illumina data provided better coverage of the gene space than 454 sequences.

To determine differential gene expression following N-deprivation, we modeled count data with a moderated negative binomial distribution (see Materials and Methods). Using thresholds of 5% False Discovery Rate (FDR) and >2-fold change for the Illumina dataset, 2,128 and 1,875 genes were categorized as up- and down-regulated, respectively, following N-deprivation. To see whether fold changes inferred from the 454 and Illumina datasets were consistent, we determined the statistical correlation in fold change between the two datasets (Figure 32) and found that it was rather weak (Pearson's correlation coefficient, r = 0.10, with p below a threshold on the order of 10⁻¹⁶). There was an apparent anomaly: 4313 genes (out of 6369 genes with ≥1 reads from both datasets) showed a high degree of up- or down-regulation that was observed only with the 454 dataset and not with the Illumina dataset (Figure 32). Most of these genes with extreme responses based on 454 had very low counts (<10 reads combined in both conditions or 0 reads in one of the conditions; Figure 32, red data points). As a result, high and likely inaccurate fold change values were assigned to those genes. In fact, if we considered only the 2,056 genes with >10 reads combined and >1 read in both conditions, the correlation between Illumina and 454 data was substantially improved (r² = 0.57, p < 2.2 × 10⁻¹⁶; Figure 32, blue data points).
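The effect of the low-count filter on the cross-platform comparison can be illustrated with a minimal Python sketch. It is not the analysis code used for Figure 32: the function names, the pseudocount of 1 used to avoid division by zero, and the choice to apply the filter to the 454 counts are all assumptions made for illustration.

```python
import math

def log2_fold_change(replete, deprived, pseudocount=1.0):
    """log2 ratio of N-deprived to N-replete counts (pseudocount is an assumption)."""
    return math.log2((deprived + pseudocount) / (replete + pseudocount))

def passes_count_filter(replete, deprived, min_combined=10, min_each=1):
    """Keep genes with >10 reads combined and >1 read in each condition."""
    return (replete + deprived) > min_combined and replete > min_each and deprived > min_each

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def filtered_correlation(counts_454, counts_illumina):
    """Correlate fold changes for genes whose 454 counts pass the filter.

    Both arguments map gene id -> (replete_count, deprived_count).
    Returns (sorted list of retained genes, Pearson r).
    """
    kept = sorted(
        gene for gene, (rep, dep) in counts_454.items()
        if gene in counts_illumina and passes_count_filter(rep, dep)
    )
    fc_454 = [log2_fold_change(*counts_454[g]) for g in kept]
    fc_illumina = [log2_fold_change(*counts_illumina[g]) for g in kept]
    return kept, pearson_r(fc_454, fc_illumina)
```

Genes with only a handful of 454 reads produce extreme ratios from tiny counts, so removing them before correlating is what allows the agreement between platforms to become visible.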

Approximately 7-14% of the Illumina reads mapped to "intergenic regions" (Table 1). We assembled Illumina reads into 42,574 transcribed fragments (transfrags). Among them, 17,095 transfrags did not map within, or within the vicinity (1855 bases, the 99th percentile of intron length) of, current gene models. With the same conservative criterion, transfrags were joined into 1828 "intergenic transcriptional units". Most importantly, 287 of these intergenic transcriptional units were up-regulated and 176 were down-regulated following N-deprivation. These transfrags are unannotated genes that require further analysis to establish their authenticity.
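The joining step can be pictured as a simple interval merge. The following Python sketch is illustrative only: the actual assembly was done with Cufflinks as described in Materials and Methods, the function name and tuple layout are hypothetical, and the prior exclusion of transfrags lying near annotated gene models is not shown.

```python
def join_transfrags(transfrags, max_gap=1855):
    """Join transfrags on the same chromosome that lie < max_gap bp apart.

    transfrags: iterable of (chromosome, start, end) tuples.
    Returns a list of (chromosome, start, end) transcriptional units.
    """
    units = []
    for chrom, start, end in sorted(transfrags):
        if units and units[-1][0] == chrom and start - units[-1][2] < max_gap:
            # Close enough to the previous unit: extend it.
            prev_chrom, prev_start, prev_end = units[-1]
            units[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            units.append((chrom, start, end))
    return units
```

With a 1855 bp threshold, two fragments separated by 1500 bp collapse into one unit, while a fragment 2600 bp further downstream starts a new one.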

Gene ontology (GO) annotation was used to coarsely identify major categories of genes involved in particular biological processes to assess trends in their transcriptional regulation following N-deprivation. We found multiple GO categories with significant enrichment in their numbers of differentially regulated genes (Table 2). Particularly, genes associated with lipid metabolism tend to be up-regulated, while those involved in photosynthesis and DNA replication initiation tend to be down-regulated.
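The text does not state which statistical test was used to call a GO category enriched. One common choice, shown here purely as an assumption, is a one-sided hypergeometric test: given the total number of genes, the number in the category, and the number of differentially regulated genes, it asks how likely an overlap at least as large as the observed one would be by chance. The function name and arguments are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, category_genes, de_genes, de_in_category):
    """Upper-tail hypergeometric p-value P(overlap >= de_in_category)."""
    upper = min(category_genes, de_genes)
    favorable = sum(
        comb(category_genes, k) * comb(total_genes - category_genes, de_genes - k)
        for k in range(de_in_category, upper + 1)
    )
    return favorable / comb(total_genes, de_genes)
```

In practice the resulting p-values would still need correction for the number of GO categories tested.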

Induction of gametogenesis and sexual reproduction

Because N-deprivation triggers gametogenesis (Kurvari et al., 1998; Martin and Goodenough, 1975), we examined several genes known to be involved in mating-type plus (mt+) gamete differentiation or sexual fusion in C. reinhardtii as internal controls for the induction state of the cells following N-deprivation. Following N-deprivation, cells had substantial increases in the abundance of transcripts of four of the six genes considered. These genes encode FUS1, which is a glycoprotein required for sex recognition, SAG1 (the mt+ agglutinin gene), Peptidase M gametolysin, which releases the gametes from the cell wall, and NSG13, which is a protein of unidentified function known to be expressed in gametes, as summarized by Harris (2009b). A second gametolysin gene (Peptidase M11) and GSP1, which is a gamete-specific transcription factor, did not show increased expression following N-deprivation, but perhaps that is because only a single time point was examined.

Effects on genes of N-metabolism and protein biosynthesis

Many genes involved in N-import and -assimilation are known to be induced following N-deprivation (Fernandez et al., 2009; Gonzalez-Ballester et al., 2004; Schnell and Lefebvre, 1993). Our analysis revealed >2-fold up-regulation for several genes, including those that encode NO3- and NO2- transporters and reductases, as well as transport systems for NH4+ and organic N-sources. Of the genes involved in assimilation of NH4+ by the glutamine synthetase-glutamate synthase cycle, only GLN3 was up-regulated. Similarly, most genes involved in amino acid biosynthesis did not show a >2-fold change. Thus, transcript abundance suggests that following N-deprivation, pathways for the acquisition of new N-sources are strongly up-regulated, whereas biosynthetic pathways that utilize the assimilated nitrogen remain relatively unaffected.

Decades ago, N-deprivation of C. reinhardtii was found to result in degradation and resynthesis of both cytoplasmic and chloroplast ribosomes (Martin et al., 1976; Siersma and Chiang, 1971). Both the rRNA and proteins of the ribosomes were turned over under the conditions of N-deprivation that also induce gamete differentiation. Hence, we expected that the mRNAs for the ribosomal proteins might show different steady-state levels in the comparison of logarithmically growing cells and cells that have been N-deprived for 48 hours. Indeed, following N-deprivation, abundance of transcripts encoding proteins of the chloroplast ribosomes consistently decreased to 30-50% of their levels of expression in logarithmically growing cells.

A subset of the cytosolic 80S ribosomal protein genes has been identified in the version 4.0 genome dataset. Among those that have been annotated, most are encoded by single copy genes, although a few have two copies (e.g., L7, L10, L13, L23). As these gene products are assembled into ribosomes, the respective genes have high levels of constitutive expression. The abundance of the transcripts in vegetative cells and N-deprived cells was fairly similar, although they are listed in Table 3 starting with those that have the most elevated expression in the latter.

The rpl22 ribosomal protein of the cytosolic ribosomes is encoded by a multi-gene family in C. reinhardtii. Of the 37 rpl22 genes in version 4.0 of the genome dataset, 13 appeared not to be expressed under either condition tested, and six had barely detectable levels of transcripts. Of the 18 remaining genes, two gave rise to the most predominant transcripts (yellow highlighting), and their transcripts did not change markedly in abundance. Four of the 18 genes were moderately expressed and their transcript levels doubled in the N-deprived cells. Six of the 18 genes had markedly lower levels of transcripts following N-deprivation, while six others showed relatively constant levels of expression. The rpl22 genes are scattered amongst at least six chromosomes, and no correlation was found between location and level of gene expression.

General Changes in Primary Metabolism

Changes in transcript abundance of genes encoding enzymes of primary metabolism are depicted in Figure 33. Transcripts encoding key enzymes of the glyoxylate cycle, gluconeogenesis and the photosynthetic carbon fixation cycle markedly decrease following N-deprivation. Transcript abundance for glyoxylate cycle enzymes isocitrate lyase and malate synthase decreased more than 16-fold. In addition, mRNA abundance of phosphoenolpyruvate carboxykinase, which catalyzes the committed reaction of gluconeogenesis, dropped to 25% of the levels in N-replete cells, as did transcripts encoding enzymes involved in carbon fixation and reduction, ribulose-bisphosphate carboxylase, sedoheptulose 1,7 bisphosphate aldolase and sedoheptulose-bisphosphatase. In contrast, transcript abundance for enzymes of the pentose phosphate cycle, glucose-6-phosphate 1-dehydrogenase and phosphogluconate dehydrogenase (decarboxylating), was increased under those conditions. The mRNA encoding one of the pyruvate decarboxylase subunits represented in the data set was also increased in abundance following N-deprivation. The pyruvate decarboxylase complex converts pyruvate to acetyl-CoA, which is a precursor of fatty acid biosynthesis.
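This paragraph mixes two ways of reporting a decrease: as a fold change ("more than 16-fold") and as a percentage of the N-replete level ("dropped to 25%"). The two are reciprocals of each other, as the small Python sketch below shows; the function names are hypothetical.

```python
def fold_decrease_to_percent_remaining(fold):
    """A 16-fold decrease leaves 100/16 = 6.25% of the original level."""
    return 100.0 / fold

def percent_remaining_to_fold_decrease(percent):
    """Dropping to 25% of the original level is a 4-fold decrease."""
    return 100.0 / percent
```

On this scale, a more than 16-fold decrease for isocitrate lyase and malate synthase means less than 6.25% of the N-replete transcript level remains, while the drop to 25% for phosphoenolpyruvate carboxykinase corresponds to a 4-fold decrease.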

To verify whether the changes observed in RNA abundance actually reflect changes in the activity of glyoxylate and gluconeogenic pathways, cells were grown in the presence of [U-13C]acetate. Cells grown in N-replete medium showed a higher degree of labeling in serine and glycine than did N-deprived cells (Figure 34). This reduced flow of carbon from acetate to these amino acids indicates reduced activity of the glyoxylate cycle in N-deprived cells, as these amino acids are derived from this pathway. Similarly, N-deprived cells had reduced label in carbohydrates and ribose. Since these molecules were formed essentially by the gluconeogenic pathway during growth in the medium employed, the N-deprived cells appeared to have much lower gluconeogenic activity. Thus, these biochemical data corroborated the transcript abundance data (Figure 33) that suggested a down-regulation of the glyoxylate and gluconeogenic pathways in N-deprived cells.

No appreciable change in transcripts for genes encoding components of the mitochondrial respiratory pathway was noted following N-deprivation. However, an 11-fold increase in the transcript abundance of an alternative oxidase gene (AOX1) was observed, while the AOX2 transcript was down-regulated 4-fold. These observations were consistent with previous observations of changes in gene expression of AOX1 and AOX2 (Baurain et al., 2003).

The candidate genes for peroxisomal beta oxidation showed an overall decrease in their transcript levels following N-deprivation, with acyl-CoA oxidase and 3-oxoacyl-CoA thiolase (ATO1) transcript abundance decreasing most drastically (>3-fold). The only exception was an enoyl-CoA oxidase/isomerase candidate gene (ECH1), which showed increased transcript levels. An apparent down-regulation of fatty acid oxidation is in line with the accumulation of TAGs under these conditions.

Reduced transcript abundance for most photosynthetic genes

In C. reinhardtii, photosynthetic efficiency decreases following N-deprivation, at least partially due to a reduction in abundance of light harvesting complexes (Peltier and Schmidt, 1991; Plumley and Schmidt, 1989) and selective degradation of the cytochrome b6f complex (Bulte and Wollman, 1992; Majeran et al., 2000). Likewise, the abundance of transcripts encoding photosynthesis-related proteins was substantially reduced following N-deprivation. This regulation was not restricted to light harvesting complexes and cytochromes, but extended to the two photosystems as well. Following N-deprivation, the steady-state level of all nucleus encoded PSI genes decreased by at least 6-fold, while abundance of transcripts from genes encoding the corresponding light harvesting proteins was decreased even further, resulting in a 19-to-43-fold decrease relative to N-replete conditions. Only four of the cytochrome subunits are encoded by the nuclear genome, and three of them showed a considerable down-regulation (6-fold) following N-deprivation. In contrast, the transcript levels of PetO were weakly increased (2-fold). This observation supports the hypothesis that this protein may have a regulatory role as opposed to being a functional cytochrome b6f subunit (Hamel et al., 2000), because the PetO protein is only loosely bound to the complex, and its function is not required for the oxidoreductase activity. Expression of all nuclear genes encoding PSII components also decreased following N-deprivation, although the two least abundant transcripts decreased only slightly. The PSII light harvesting complex encoding transcripts showed a comparable change in abundance. Most of the transcript levels were reduced, while the weakly expressed LHCB7 gene showed no alteration in transcript levels. The only two genes of the light-harvesting complex of PSII not following that pattern were PSBS1 and PSBS2. Their transcript levels were strongly increased following N-deprivation (119- and 52-fold, respectively). This result was confirmed by RT-PCR (Figure 31B).

Specific changes in gene expression related to general lipid metabolism

N-deprivation has been demonstrated to lead to the accumulation of TAG in specialized organelles as well as to structural changes and breakdown of the intracellular membrane systems such as the thylakoids and the ER (Martin et al., 1976; Moellering and Benning, 2010). Therefore, we expected this to be reflected in the expression of genes encoding enzymes of lipid metabolic pathways. However, changes in transcript levels of genes encoding fatty acid metabolism were modest (Figure 35). A 2-fold increase in transcript levels for ketoacyl-ACP synthetase was observed. This enzyme is part of the fatty acid synthase II complex that catalyzes the acyl-acyl carrier protein (acyl-ACP) dependent elongation steps from C4 to C14 in higher plants. The gene for acyl-ACP thioesterase (FAT1) also showed elevated transcript levels following N-deprivation (about 4-fold). Its reaction terminates fatty acid synthesis by cleaving the acyl chain from ACP. This reaction competes with the direct transacylation of ACP by glycerol-3-phosphate acyltransferases for the formation of phosphatidate. An increase in FAT1 activity could, therefore, be indicative of increased fatty acid export from the chloroplast to the ER, where TAG assembly occurs, as acyl-ACPs have to be hydrolyzed prior to export (Pollard and Ohlrogge, 1999).

A strong increase in transcript levels was observed for the gene encoding the committing step of TAG synthesis. Out of the five putative diacylglycerol acyltransferase genes identified in the version 4.0 genome dataset, only four were expressed under either or both growth conditions. One of these genes (DGTT1, PID 285889) was almost completely suppressed under N-replete conditions, but showed a large increase in transcript abundance following N-deprivation. However, its overall transcript abundance was too low to be detected by Northern blot compared to other genes encoding putative diacylglycerol acyltransferases, which were much less differentially expressed, consistent with the initial RNA-DNA hybridization analysis (Figure 31A).

Phosphatidic acid phosphatase takes part in the Kennedy pathway of glycerolipid and TAG synthesis (Figure 35). Both of the candidate genes for phosphatidic acid phosphatase in C. reinhardtii annotated in the version 4.0 dataset showed increased transcript levels following N-deprivation. Both are part of the PAP2 family, which is thought to have broad substrate specificity (Carman and Han, 2006). The increase in the expression of the presumed phosphatidic acid phosphatase genes is consistent with the notion that DAG is generated from phosphatidic acid for further TAG biosynthesis following N-deprivation.

Out of a total of 16 putative membrane-bound desaturase and hydroxylase encoding genes found in C. reinhardtii, only three showed a change in transcript abundance that is greater than 2-fold. Transcript abundance for microsomal Δ12-desaturase is more than 3-fold higher following N-deprivation, as is that for the plastidic acyl-ACP-Δ9-desaturase, which introduces the first double bond in an acyl chain. Other microsomal desaturase encoding transcripts, such as that encoding FAD13, an ω13/Δ5-desaturase, are also slightly increased in abundance, whereas the plastid desaturase encoding genes are not affected.

Of all lipid-related genes, those encoding putative lipases showed the strongest differences in transcript abundance between the two conditions tested. By searching for "lipase", "phospholipase", or "patatin" through the version 4.0 genome sequence data, 130 proteins containing the GXSXG lipase motif were identified. Among the respective genes, 35 (27%) showed increased and 11 (8.5%) decreased transcript levels by 2-fold or more following N-deprivation. In addition, many potential lipases may be considered constitutively expressed. 74 out of 130 (57%) lipase candidates were expressed at slightly higher levels following N-deprivation. Some of these genes may encode lipases that are important for the turnover or replacement of membranes during cell growth or gamete fusion.

Changes in RNA abundance for transcription factors

The Plant Transcription Factor Database (Perez-Rodriguez et al, 2010) was used to identify 386 genes encoding putative transcription factors and transcriptional regulators in the C. reinhardtii transcript data set, which could be sorted into 53 families (Table 3). Of the 368 genes, 83 showed a 2-fold or greater change in transcript abundance following N-deprivation, with 46 being up-regulated, and 37 being down-regulated.

To date, only a few of the putative transcription factors identified in the C. reinhardtii genome have a known function. Transcript abundance for the gene encoding NIT2, a transcription factor regulating nitrate metabolism, was increased 6-fold, while that for NAB1, a transcription factor regulating light harvesting proteins, was decreased 16-fold, consistent with previously described physiological changes in response to N-deprivation (Camargo et al., 2007; Mussgnug et al., 2005). The transcript level for the GSP1 mt+ gamete-specific transcription factor was decreased 3-fold at the 48 hour sample point. When looking at the changes in RNA abundance of putative transcription factor genes, no obvious trends emerged. However, transcripts falling into the AP2-EREBP and bHLH families were generally more abundant following N-deprivation, while those of the FHA family were generally decreased.

DISCUSSION

Microalgae such as C. reinhardtii undergo drastic changes in metabolism and ultimately development when N-deprived. Some of the most remarkable changes involve gametogenesis (Harris, 2009b) and metabolic changes that lead to the accumulation of TAGs (Hu et al., 2008). The former aspect has been studied since the seventies. However, focus on the latter has been largely motivated by the renewed interest in microalgae as biofuel feedstocks (Wijffels and Barbosa, 2010). As sequencing technology has become faster and more affordable, comparison of transcriptomic changes under different experimental conditions by massively parallel sequencing of cDNA libraries is a viable first approach towards identifying genes that define changes in response to N-deprivation or other nutrient stresses (Gonzalez-Ballester et al., 2010). With the goal of gaining a better understanding of the factors underlying or even controlling the process of TAG accumulation following N-deprivation, the focus has to be on metabolism and genes that encode enzymes of relevant pathways or regulatory factors.

The validity of making inferences on metabolism from transcriptome data in this study has been verified in different ways. Firstly, specific genes known to be induced following N-deprivation, such as genes involved in gametogenesis or ammonium transport, were found to be expressed as previously described. Secondly, major metabolic changes predicted by transcript analysis, such as the redirection of acetate from the glyoxylate cycle and gluconeogenesis to fatty acid biosynthesis following N-deprivation (Figure 33), were corroborated by labeling experiments (Figure 34).

By and large, gross changes in transcript abundance in response to N-deprivation follow expected themes: genes encoding enzymes directly involved in N-metabolism or N-compound uptake have to be induced, protein biosynthesis is reduced to adjust to the decreased availability of amino acids, and photosynthesis is down-regulated to adjust to the altered metabolic state of the cell. In cyanobacteria, N-deprivation led to the degradation of the highly abundant phycobilisome light harvesting proteins so that they could be used as a nitrogen source for protein synthesis (Collier and Grossman, 1992). In C. reinhardtii, there is evidence that the cytochrome subunits are degraded not in response to the low concentrations of N per se, but rather to the changed energy content of the cell. Thus, one possible advantage for the cells to decrease photosynthesis following N-deprivation is to prevent the accumulation of reactive oxygen species (Bulte and Wollman, 1992). At the same time, genes encoding proteins of the respiratory chain in mitochondria were only moderately affected following N-deprivation, except for those encoding alternative oxidases, which showed elevated transcript abundance. These enzymes are induced under a number of stress conditions that affect the redox environment of the cell, but these effects can be quite indirect and are often difficult to causally connect to the applied stress, in this case N-deprivation. It should also be pointed out here that only the expression of nuclear genes is probed in this study; the expression levels of genes for organelle-encoded proteins relevant to respiration or photosynthesis have not been examined.

A role for PSBS following N-deprivation?

One particular surprise was the strong up-regulation of PSBS following N-deprivation in C. reinhardtii. In Arabidopsis thaliana, PSBS has been shown to play a critical role in non-photochemical quenching (Li et al., 2000; Li et al., 2002), but previous studies of C. reinhardtii have not detected either of the two PSBS proteins in the thylakoids, and non-photochemical quenching was shown to be independent of these proteins (Bonente et al., 2008). Only rarely have ESTs been found for PSBS transcripts, with the exception of the cDNA stress collection II, which contains RNAs from different stress treatments, including the switch from ammonium to nitrate (Bonente et al., 2008; Shrager et al., 2003). Our results indicate that PSBS expression in C. reinhardtii is induced by ammonium deprivation, as was previously observed for the expression of the gene for alternative oxidase, AOX1 (Baurain et al., 2003).

Recycling of membrane lipids or de novo synthesis of TAGs?

The elevation of synthesis and export of fatty acids from the chloroplast following N-deprivation could indicate that TAG is assembled from fatty acids that are synthesized de novo. This step would require the activation of the fatty acids by a long chain acyl-CoA synthetase. In fact, increased abundance of RNA encoding a putative long chain acyl-CoA synthetase was observed and the respective protein has been identified in the lipid droplet proteome (Moellering and Benning, 2010). However, another enzyme that could contribute to the changing spectrum of fatty acids is a putative phospholipid/glycerol acyltransferase, for which the transcript level decreased during N-deprivation. Long chain acyl-CoA synthetases are likely to play a key role in determining the fate of fatty acids in the cell (Shockey et al., 2002). Regulation of the respective genes could be a major factor in controlling the flux of fatty acids towards glycerolipid synthesis and their degradation by beta oxidation.

Major intracellular changes occur following N-deprivation and these are likely accompanied by remodeling of membranes. Thus fatty acids in membrane lipids might be recycled into TAGs. In fact, some of the transcripts whose abundance changes the most were those encoding lipases. In general, lipases belong to a family of enzymes that deesterify carboxyl esters, such as TAGs and phospholipids. As TAGs accumulate following N-deprivation, TAG lipases would be expected to be down-regulated. However, classifying lipases with selective substrate specificity based solely on their primary sequences is challenging, a fact that needs to be taken into consideration when interpreting the current data set. A TAG lipase typically contains a Ser-Asp/Glu-His catalytic triad, with the serine catalytic center located in a GXSXG motif (Brady et al., 1990; Winkler et al., 1990). Some recently characterized TAG lipases in animals, yeast, and plants contain a patatin-like, iPLA2 family Ser-Asp catalytic dyad (Athenstaedt and Daum, 2005; Eastmond, 2006; Kurat et al., 2006; Zimmermann et al., 2004). Genes encoding lipases specific for membrane lipids would be expected to be up-regulated as they might mobilize fatty acids from membrane lipids into TAGs. Moreover, signaling pathways involving lipid products generated by lipases, such as diacylglycerol, may also control steady-state TAG levels (Kanoh et al., 1993). Further biochemical characterization of some of the most regulated lipase candidate genes will be necessary to determine their role in TAG accumulation following N-deprivation.

On the other hand, we recognize that lipase expression or activities may also be controlled at a post-transcriptional level, including translational regulation and post-translational modifications of the encoded proteins. In mammals, a hormone-sensitive lipase is phosphorylated by protein kinase A upon cAMP elevation, and consequently exhibits better accessibility to lipid droplets (Holm et al., 2000). Some of the C. reinhardtii TAG lipases may have a similar regulatory pattern, and hence not show significant transcriptional changes when cells are N-deprived. Reverse genetic studies on the lipase candidates and forward genetic screens for mutants with TAG deficiency phenotypes will disclose the bona fide TAG lipases and other lipases that impact TAG metabolism.

It should also be noted that during the analysis of the lipid gene data set, the annotation ambiguities of several fatty acid desaturases became obvious in the version 4.0 genomic sequence dataset: the C. reinhardtii genome harbors four presumed orthologs of FAD5 (named FAD5a-d). Based on our current prediction analysis (Emanuelsson et al., 1999) and previous reports (Riekhof and Benning, 2009; Riekhof et al., 2005), FAD5a and b are presumed to be targeted to the chloroplast, whereas FAD5c and d are likely located in the ER membrane. However, experimental corroboration is still needed.

Concluding remarks

Our interpretation of the current dataset places emphasis on TAG metabolism and potential regulatory factors, which are undoubtedly not yet completely identified. We expect that others will be able to mine this dataset taking into account different biological processes pertaining to N-deprivation. Cross querying this dataset with a lipid droplet proteomics data set (Moellering and Benning, 2010) should further narrow down possible candidates relevant for TAG accumulation. Likewise, a meta-analysis of this and other datasets, including those from other species, could facilitate the identification of genes most likely involved in TAG accumulation, as was recently done for low-oxygen stress (Mustroph et al., 2010). Thus, the current study represents only a first step of many towards gaining a molecular understanding of TAG accumulation and other cellular changes triggered by N-deprivation in C. reinhardtii.

MATERIALS AND METHODS

Strains and Growth Conditions

The C. reinhardtii strain used was dw15.1 (cw15, nit2, mt+), kindly provided by Arthur Grossman. The cells were grown in liquid cultures under continuous light (~80 μmol photons m⁻² s⁻¹). For N-replete growth, TAP medium (Harris, 2009a) with 10 mM NH4+ (TAP+N) was used. For preliminary experiments, N-deprivation was applied by two methods: continuous growth in TAP with 0.5 mM NH4+, or growth in TAP+N to 5×10⁶ cells/mL, followed by transfer to TAP with no NH4+ (TAP-N) for an additional 24 or 48 hours. For further experiments, N-deprivation was defined as growth in TAP+N to 5×10⁶ cells/mL, followed by transfer to TAP-N for 48 hours. For labeling studies, the cells were grown in 500 mL shake flasks with a culture volume of 50 mL with continuous shaking. For the N-deprivation experiment, cells were first grown in TAP medium with unlabeled acetate for at least 5 cell doublings to mid-logarithmic phase to reach a biomass equivalent to 0.3-0.4 g cell dry weight (CDW)/L. The cells were divided up and transferred to TAP media containing [U-13C]acetate (Isotec, a subsidiary of Sigma-Aldrich, Milwaukee, WI), either TAP+N or TAP-N.

Sequencing read processing

To generate material for high-throughput sequencing, cells were grown in 100 mL TAP+N to 5xl0 6 cells/mL. The cultures were split in half and cells were collected by centrifugation, with one pellet being resuspended in 50 mL TAP+N, and the other in 50,mL TAP-N. After 48 hours, the total RNA was harvested using a QIAGEN RNeasy Plant Mini kit (QIAGEN, Valencia, CA, USA). The RNA samples were treated with QIAGEN RNase- free DNase I during extraction.For 454 sequencing; full-length cDNA pools were generated with Clontech SMART cDNA library construction kit (Clontech, Mountain View, CA, USA). cDNA was synthesized using a modified cDNA synthesis primer

(5'TAGAGACCGAGGCGGCCGACATGTTTTGTTTTTTTTTCTTTTTTTTTTVN3 ') (SEQ ID NO: 1). Full- length cDNAs were amplified by PCR and pooled to increase their concentration. An Sfil digest was performed, followed by size fractionation. Fractions with the highest intensity and size distribution were pooled and purified. The resulting cDNA pools were then submitted to MSU- Research Technologies Service Facility (RTSF) for sequencing on a 454 GSFLX Titanium Sequencer (454 Life Sciences, Branford, CT, USA). For Illumina sequencing, total RNA was submitted directly to the MSU-RTSF for sequencing on an Illumina Genome Analyzer II (Illumina, Inc., San Diego, CA, USA). Default parameters were used to pass reads using 454 and IUumina quality control tools. The filtered sequence data were deposited in the NCBI Short Read Archive with accession numbers XXXXX and XXXXX for the 454 and the Illumina datasets, respectively. The filtered 454 sequencing reads were mapped to C. reinhardtii v4.0 assembly from the Joint Genome Institute with GMAP (Wu and Watanabe, 2005). In GMAP, the maximum intron length was set at 980bp which is at the 95 percentile of annotated C. reinhardtii intron lengths. The Illumina reads were mapped with Bowtie (Langmead et al, 2009) using parameters as follows: < 2 mismatches, sum of Phred quality values at all mismatched positions < 70, and excluding reads mapped to > 1 locations. Because the sequence qualities of Illumina reads degrade quickly toward the 3 '-end, an alternative mapping data was generated with reads- trimmed from the 3'-end (until the 3'-end most position with Phred-equivalent score was >20). Trimmed reads < 30bp were excluded from further analysis. In addition to sequence quality issue, some reads may span two exons and would not be mapped by Bowtie correctly. We used Tophat (Trapnell et al, 2009) to identify these exon spanning reads to generate another set of read mapping. 
The information from Tophat was used for assembling mapped reads into transcribed fragments (transfrags) with Cufflinks (Trapnell et al., 2010). In Cufflinks, the maximum intron length was set at 1855 bp (99th percentile of all intron lengths), with a 5% minimum isoform fraction and a 5% pre-mRNA fraction. Transfrags within 1855 bp of an existing C. reinhardtii v4.0 gene model were regarded as potential missing exons of annotated genes. The rest were regarded as intergenic exons, and adjacent transfrags < 1855 bp apart were joined into "transcriptional units".
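The 3'-end trimming rule and the joining of intergenic transfrags described above can be illustrated with a short sketch. It is not the pipeline that was actually used (which relied on Bowtie, Tophat and Cufflinks). The function names `trim_3prime` and `join_transfrags` are hypothetical, quality values are assumed to be a list of Phred-equivalent integers, and transfrags are assumed to be (start, end) coordinates on a single chromosome and strand.

```python
def trim_3prime(seq, quals, min_q=20, min_len=30):
    """Trim bases from the 3'-end until the last base has a score > min_q.

    Returns (seq, quals) after trimming, or None if fewer than
    min_len bases remain (such reads are excluded).
    """
    end = len(seq)
    while end > 0 and quals[end - 1] <= min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quals[:end]


def join_transfrags(transfrags, max_gap=1855):
    """Join adjacent transfrags that lie < max_gap bp apart.

    The gap is measured here as next start minus previous end,
    which is an assumption of this sketch.
    """
    units = []
    for start, end in sorted(transfrags):
        if units and start - units[-1][1] < max_gap:
            units[-1][1] = max(units[-1][1], end)
        else:
            units.append([start, end])
    return [tuple(u) for u in units]
```

The default thresholds (score > 20, minimum length 30 bp, 1855 bp gap) are taken from the text; everything else is illustrative.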

Northern blot analysis and RT-PCR

Total RNA was harvested from N-replete or N-deprived cells as described above, and 4 μg of each total RNA was separated on a 1% formaldehyde gel and transferred to a Hybond-N+ nylon membrane (GE Healthcare, Piscataway, NJ, USA). Probes were synthesized from cDNA and labeled with [32P] using the Amersham Megaprime labeling kit (GE Healthcare). The blots were hybridized with the labeled probes in Ambion ULTRAhyb (Applied Biosystems/Ambion, Austin, TX, USA) at 42°C overnight. The blots were washed twice for 5 min with low-stringency buffer (1X SSC, 0.1% SDS) at 60°C, and then twice for 5 min with high-stringency buffer (0.1X SSC, 0.1% SDS) at 60°C. The blots were exposed to a Molecular Dynamics phosphor screen (GE Healthcare) overnight, and visualized with a Storm 820 phosphorimager (GE Healthcare). Probes were synthesized from cDNA for AMT4 (5'GTATTGCGTCCGATCTGC3' (SEQ ID NO: 2), 5'CGTGGAAATGCTGTAGGG3' (SEQ ID NO: 3)), DGTT2 (5'TAAAGCACCGACAAATGTGC3' (SEQ ID NO: 4), 5'CATGATCTGGCATTCTGTGG3' (SEQ ID NO: 5)), and DGTT3 (5'GGTGGTGCTCTCCTACTGGA3' (SEQ ID NO: 6), 5'CCATGTACATCTCGGCAATG3' (SEQ ID NO: 7)).

For RT-PCR, RNA was extracted from N-replete and N-deprived cultures using TRIzol reagent (Invitrogen) and subjected to DNase treatment with the TURBO DNA-free Kit from Ambion. 1 μg of DNA-free RNA was used for cDNA synthesis with the Invitrogen M-MLV reverse transcriptase. 0.5 μg of oligo(dT)12-18 primer (Invitrogen) and 0.5 μg of random hexamer primers (Promega) were added to the RNA and the volume was adjusted to a final volume of 20 μL. After heating the samples at 70°C for 10 min they were incubated on ice for 5 min. 20 U of RNase Inhibitor (Applied Biosystems), 20 nmoles of dNTPs (Invitrogen), 4 μL of first strand buffer and 0.2 nmoles DTT were added to the reaction. The reaction mixture was incubated at 37°C for 10 min for primer annealing. 200 U of M-MLV reverse transcriptase were added, and the reaction was incubated at 37°C for 1 h, followed by deactivation at 70°C for 10 min. 1 μL of a 1:10 dilution of the respective cDNA was used as template for a PCR reaction using GoTaq polymerase (Promega). The reaction mixture (25 μL) contained 1X buffer, 5 nmoles dNTPs, 12.5 pmoles of each primer, and 1 U of polymerase. PCR cycle conditions were 3 min initial denaturation at 94°C, followed by 40 cycles of 30 s denaturation, 30 s annealing at 60°C and 3 min elongation at 72°C. Final elongation was performed for 10 min. The specific primers (5'ATGGCCATGACTCTGTCGAC3' (SEQ ID NO: 8), 5'TTAGGCGGACTCCTCGTCC3' (SEQ ID NO: 9)) amplify both PSBS1 and PSBS2. The IDA5 gene (5'GCCAGGTCTCTGCTCTGGTG3' (SEQ ID NO: 10), 5'ACTCGGACTTGGCGATCCA3' (SEQ ID NO: 11)) served as a control.
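The PCR reaction composition is given above as absolute amounts in a 25 μL reaction. As a small worked example, the amounts can be converted to final concentrations. The helper name `micromolar` is hypothetical, and the text does not say whether 5 nmoles of dNTPs is the total or the amount of each nucleotide, so the result is simply the concentration of whatever the 5 nmoles refers to.

```python
def micromolar(amount_pmol, volume_ul):
    """Return concentration in μM (1 pmol per μL equals 1 μmol per L)."""
    return amount_pmol / volume_ul


# 5 nmoles dNTPs = 5000 pmoles in a 25 μL reaction
dntp_um = micromolar(5 * 1000, 25)
# 12.5 pmoles of each primer in a 25 μL reaction
primer_um = micromolar(12.5, 25)

print(dntp_um, primer_um)  # 200.0 0.5
```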

Analysis of differential gene expression

Differential expression between C. reinhardtii cultured in N-replete and N-depleted medium was determined using the numbers of mapped reads overlapping annotated C. reinhardtii genes as inputs to EdgeR (Robinson et al., 2010). In the Joint Genome Institute database, multiple sets of C. reinhardtii v4.0 gene models are available. We used the "filtered" gene models, which contain the best gene model for each locus. Genes were regarded as differentially expressed if they had a ≥ 2-fold change between N-replete and N-deprived samples and a < 5% False Discovery Rate (FDR). Differentially expressed genes were regarded as up-regulated if their expression levels in N-deprived samples were significantly higher than those in N-replete samples. Conversely, down-regulated genes were those with significantly lower levels of expression following N-deprivation. Gene ontology (GO) annotation for the C. reinhardtii v4.0 genome was acquired from the Joint Genome Institute. Enrichment of differentially regulated genes in each GO category was determined using Fisher's exact test. To account for multiple testing, the p values from Fisher's exact tests were adjusted (Storey, 2003) and an FDR value of 5% was used as the threshold for enriched GO terms.

GC-MS Analysis

To quantify 13C-labeling patterns such as mass-isotopomer distributions and fractional 13C enrichment, samples were analyzed by gas chromatography-mass spectrometry (GC/MS) using an HP 6890 GC (Hewlett Packard, Palo Alto, CA, USA) equipped with a DB-5MS column (5% phenyl-methyl-siloxane-diphenylpolysiloxane; 30 m x 0.251 mm x 0.25 μm, Agilent, Waldbronn, Germany) and a quadrupole mass spectrometer (MS 5975, Agilent, Waldbronn, Germany). Electron ionization was carried out at 70 eV. The obtained mass spectrometric data were corrected for the natural abundance of the elements to give fractional 13C labeling.
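As an illustration of how a fractional 13C enrichment can be derived from a mass-isotopomer distribution, the sketch below uses the commonly used weighted-average formula. It assumes the distribution has already been corrected for natural abundance (the correction itself is not shown), and the function name `fractional_enrichment` is hypothetical rather than taken from the analysis software used here.

```python
def fractional_enrichment(mid):
    """Fractional 13C enrichment of a fragment with n carbons.

    mid[i] is the abundance of the M+i mass isotopomer (i = 0..n),
    assumed to be already corrected for natural abundance.
    """
    n = len(mid) - 1
    total = sum(mid)
    if n < 1 or total <= 0:
        raise ValueError("need at least two isotopomers with non-zero abundance")
    return sum(i * a for i, a in enumerate(mid)) / (n * total)
```

For a two-carbon fragment that is half unlabeled (M+0) and half fully labeled (M+2), the function returns 0.5.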

Sampling, Extraction and Analysis of Intracellular Amino Acids

Cells from TAP+N and TAP-N cultures were harvested after 24 hours. This time point was chosen for these labeling experiments, because at the required cell concentration (~0.2 g CDW/L) for metabolite extraction, acetate depletion occurs in cells grown in TAP+N at later time points due to the high initial inoculum.

The harvested cells (~25 mg CDW) were centrifuged at 3000 x g for 1 min, the supernatant was removed, and the cells were quenched with 5 mL cold 100% methanol (Winder et al., 2008). The metabolites were harvested by vortexing the cells. A second extraction was performed with 5 mL chloroform:methanol (1:2). The extracts were then pooled. Water was slowly added to the pooled extracts for phase separation. The polar metabolites, which include the amino acids, were present in the aqueous phase. The aqueous phase was then dried under N2 and converted to its t-BDMS derivative using MTBSTFA (Mawhinney et al., 1986). The GC and MS conditions for this analysis were as previously described (Deshpande et al., 2009).

Extraction and Analysis of Ribose

Ribose for analysis was obtained from the ribonucleic acid (RNA) as described (Boren et al., 2003). RNA was extracted from cells (0.1-0.2 g/L) at the same time point as the intracellular amino acids using the Tri reagent as described in the protocol (Molecular Research Center, Inc., Cincinnati, OH, USA). The RNA was acid-hydrolyzed to its monomers and dried under nitrogen. It was further analyzed using GC-MS by derivatizing it to its per-O-trimethylsilyl-O-ethyl oxime (MacLeod et al., 2001). The ions 481 to 486 (m/z), corresponding to the whole carbon backbone of the ribose molecule (C1-C5), were monitored using the SIM (Single Ion Monitoring) of the MS.

Extraction and Analysis of Carbohydrate

The carbohydrates in the cells were acid-hydrolyzed by 2 N HCl at 102°C and the monomeric compounds analyzed by GC-MS after the sample was dried under N2. The sample was then converted to its di-O-isopropylidene acetate derivative for analysis by GC-MS (Hachey et al., 1999). The ions 287 to 293 (m/z), corresponding to the whole carbon backbone of glucose (C1-C6), were monitored.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the features and advantages of the present invention, reference is now made to the detailed description of the invention along with the accompanying figures.

Figure 1 shows the genomic sequence (SEQ ID NO: 12) for TFU1.

Figure 2 shows the transcript sequence (SEQ ID NO: 13) for TFU1.

Figure 3 shows the amino acid sequence (SEQ ID NO: 14) for TFU1.

Figure 4 shows the genomic sequence (SEQ ID NO: 15) for TFU2.

Figure 5 shows the transcript sequence (SEQ ID NO: 16) for TFU2.

Figure 6 shows the amino acid sequence (SEQ ID NO: 17) for TFU2.

Figure 7 shows the genomic sequence (SEQ ID NO: 18) for TFU3.

Figure 8 shows the transcript sequence (SEQ ID NO: 19) for TFU3.

Figure 9 shows the amino acid sequence (SEQ ID NO: 20) for TFU3.

Figure 10 shows the genomic sequence (SEQ ID NO: 21) for TFU4.

Figure 11 shows the transcript sequence (SEQ ID NO: 22) for TFU4.

Figure 12 shows the amino acid sequence (SEQ ID NO: 23) for TFU4.

Figure 13 shows the genomic sequence (SEQ ID NO: 24) for TFU5.

Figure 14 shows the transcript sequence (SEQ ID NO: 25) for TFU5.

Figure 15 shows the amino acid sequence (SEQ ID NO: 26) for TFU5.

Figure 16 shows the genomic sequence (SEQ ID NO: 27) for TFD1.

Figure 17 shows the transcript sequence (SEQ ID NO: 28) for TFD1.

Figure 18 shows the amino acid sequence (SEQ ID NO: 29) for TFD1.

Figure 19 shows the genomic sequence (SEQ ID NO: 30) for TFD2.

Figure 20 shows the transcript sequence (SEQ ID NO: 31) for TFD2.

Figure 21 shows the amino acid sequence (SEQ ID NO: 32) for TFD2.

Figure 22 shows the genomic sequence (SEQ ID NO: 33) for TFD3.

Figure 23 shows the transcript sequence (SEQ ID NO: 34) for TFD3.

Figure 24 shows the amino acid sequence (SEQ ID NO: 35) for TFD3.

Figure 25 shows the genomic sequence (SEQ ID NO: 36) for TFD4.

Figure 26 shows the transcript sequence (SEQ ID NO: 37) for TFD4.

Figure 27 shows the amino acid sequence (SEQ ID NO: 38) for TFD4.

Figure 28 shows the genomic sequence (SEQ ID NO: 39) for TFD5.

Figure 29 shows the transcript sequence (SEQ ID NO: 40) for TFD5.

Figure 30 shows the amino acid sequence (SEQ ID NO: 41) for TFD5.

Figure 31 shows the RNA-DNA hybridization of standard and representative lipid genes. (A) Cultures were grown in TAP medium either N-replete (10 mM NH4+; 10), continual N-limited (0.5 mM NH4+; 0.5), or N-deprived (0 mM NH4+; 0) for 24 or 48 hours. The expression levels of AMT4, DGTT2 and DGTT3 were measured by RNA-DNA hybridization and rRNA was visualized as a loading control. (B) Cultures were grown for 48 hours in either N-replete or N-deprived conditions. The levels of PSBS transcript were measured using RT-PCR; the constitutive IDA5 gene served as a control.

Figure 32 shows the fold change correlation between the Illumina and 454 datasets for each C. reinhardtii gene. Only genes with > 454 and Illumina reads under either N-replete (+N) or N-deprived (-N) conditions were plotted. Fold change is determined by the number of reads following N-deprivation divided by the number of reads under N-replete conditions for each gene. For genes with > 2^10 or < 2^-10 fold changes, their fold change values were set to 10. Blue filled circles ("high" 454 read genes) indicate genes with > 10 reads (+N and -N combined) and > 1 read in either +N or -N. Red filled circles ("low" 454 read genes) indicate genes that did not satisfy either of the above criteria.

Figure 33 shows the regulation of genes involved in primary metabolism. The figure shows the central metabolic pathways of C. reinhardtii and the differential regulation of gene expression following N-deprivation. Symbols represent log2 fold change as follows: + + +, > 5; + +, > 2 and < 5; +, > 1; ±, < 1 and > -1; -, < -1; - -, < -2 and > -5; - - -, < -5.
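The symbol legend for Figure 33 amounts to a binning of log2 fold-change values, which can be sketched as follows. The function name `regulation_symbol` is hypothetical, and because the legend uses strict inequalities on both sides, the assignment of values lying exactly on a boundary (for example, exactly 1 or exactly -2) is an assumption of this sketch.

```python
def regulation_symbol(log2_fc):
    """Map a log2 fold change to the symbol used in the Figure 33 legend.

    Values exactly on a boundary fall into the lower bin; the legend
    itself does not specify this, so it is an assumption.
    """
    if log2_fc > 5:
        return "+++"
    if log2_fc > 2:
        return "++"
    if log2_fc > 1:
        return "+"
    if log2_fc > -1:
        return "±"
    if log2_fc > -2:
        return "-"
    if log2_fc > -5:
        return "--"
    return "---"
```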

Figure 34 shows the changes in labeling patterns reflecting changes in gene expression. Labeling of metabolites after 24 h of N-deprivation compared to N-replete cells is shown. The metabolites give an indication of the pathway activity. Intracellular serine and glycine were extracted by quick quenching and extraction with cold methanol. Ribose and glucose were obtained from the acid hydrolysis of RNA and cellular polysaccharides, respectively. The labeling was analyzed by GC-MS after derivatization. Open box, natural labeling; fine cross hatching, N-replete cells; coarse cross hatching, N-deprived cells.

Figure 35 shows the selected changes in glycerolipid metabolism transcript abundance. Numbers indicate log2 fold change of transcript abundance following N-deprivation. Enzymes labeled with an asterisk cannot be unequivocally assigned to a specific step in the metabolic pathway and are hypothetical.

Table 1 shows a summary of expression tags generated using two different sequencing methods.

Table 2 shows Gene Ontology categories significantly enriched in differentially regulated C. reinhardtii genes.
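The enrichment reported in Table 2 was determined with Fisher's exact test, as described in the methods. A minimal sketch of such a test for a single GO category is given below. The one-sided (over-representation) form, the function name `fisher_enrichment_p` and its argument names are assumptions made for illustration; the multiple-testing adjustment (Storey, 2003) is not shown.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: differentially regulated genes in the GO category
    n: all differentially regulated genes
    K: all genes annotated to the GO category
    N: all annotated genes
    Returns P(X >= k) under the hypergeometric distribution.
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    denom = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / denom
```

With 8 genes in total, 4 in the category, 4 differentially regulated and 3 of those in the category, the function returns 17/70 (about 0.243).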

Table 3 shows Illumina analysis of transcripts encoding transcription factors and regulators following N-deprivation.

Table 4 shows differentially-regulated transcription factors considered in the present invention.

DEFINITIONS

To facilitate the understanding of this invention a number of terms are defined below. Terms defined herein (unless otherwise specified) have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as "a", "an" and "the" are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not delimit the invention, except as outlined in the claims.

As used herein, terms defined in the singular are intended to include those terms defined in the plural and vice versa. The term "upregulated" as used herein, should be interpreted in the most general sense possible. For example, a special type of molecule (i.e., for example, a nucleic acid) may be "upregulated" in a cell if it is produced at a level significantly and detectably higher (i.e., for example, between 1.5- and 10-fold) than the natural expression rate. "Upregulation" of a molecule in a cell can be achieved via both traditional mutation and selection techniques and genetic manipulation methods.

The term "ectopic expression" as used herein, refers to any nucleic acid upregulation produced by an exogenous expression platform that is not natural to the plant cell (i.e., for example, a plant genome transfected by a vector).

To facilitate an understanding of the present invention, a number of terms and phrases as used herein are defined below:

The term "plant" is used in its broadest sense. It includes, but is not limited to, any species of woody, ornamental or decorative, crop or cereal, fruit or vegetable plant, and photosynthetic green algae (for example, Chlamydomonas reinhardtii). It also refers to a plurality of plant cells which are largely differentiated into a structure that is present at any stage of a plant's development. Such structures include, but are not limited to, a fruit, shoot, stem, leaf, flower petal, etc. The term "plant tissue" includes differentiated and undifferentiated tissues of plants including those present in roots, shoots, leaves, pollen, seeds and tumors, as well as cells in culture (for example, single cells, protoplasts, embryos, callus, etc.). Plant tissue may be in plants, in organ culture, tissue culture, or cell culture. The term "plant part" as used herein refers to a plant structure or a plant tissue.

The term "crop" or "crop plant" is used in its broadest sense. The term includes, but is not limited to, any species of plant or algae that is edible by humans, used as feed for animals, or otherwise used or consumed by humans, and any plant or algae used in industry or commerce. The term "oil-producing species" refers to plant species that produce and store triacylglycerol in specific organs, primarily in seeds. Such species include, but are not limited to, green algae (Chlamydomonas reinhardtii), soybean (Glycine max), rapeseed and canola (including Brassica napus and B. campestris), sunflower (Helianthus annuus), cotton (Gossypium hirsutum), corn (Zea mays), cocoa (Theobroma cacao), safflower (Carthamus tinctorius), oil palm (Elaeis guineensis), coconut palm (Cocos nucifera), flax (Linum usitatissimum), castor (Ricinus communis) and peanut (Arachis hypogaea). The group also includes non-agronomic species which are useful in developing appropriate expression vectors such as tobacco, rapid cycling Brassica species, and Arabidopsis thaliana, and wild species which may be a source of unique fatty acids.

The term "Chlamydomonas" refers to a plant or plants from the genus Chlamydomonas. Non-limiting examples of Chlamydomonas include plants from the species C. reinhardtii.

The term plant cell "compartments or organelles" is used in its broadest sense. The term includes, but is not limited to, the endoplasmic reticulum, Golgi apparatus, trans Golgi network, plastids, sarcoplasmic reticulum, glyoxysomes, mitochondrial, chloroplast, and nuclear membranes, and the like.

The term "host cell" refers to any cell capable of replicating and/or transcribing and/or translating a heterologous gene.

The terms "diacylglycerol" and "diglyceride" refer to a molecule comprising a glycerol backbone to which two acyl groups are esterified. Typically, the acyl groups are esterified to the sn-1 and sn-2 positions, although the acyl groups may also be esterified to the sn-1 and sn-3 positions, or to the sn-2 and sn-3 positions; the remaining position is unesterified and contains a hydroxyl group. This term may be represented by the abbreviation DAG. The terms "triacylglycerol" and "triglyceride" refer to a molecule comprising a glycerol backbone to which three acyl groups are esterified. This term may be represented by the abbreviation TAG.

The term "long chain triacylglycerol" refers to a triacylglycerol in which all three acyl groups are long chain, or in other words each chain is a linear aliphatic chain of 6 carbons or greater in length (an acyl group may be referred to by the letter C followed by the number of carbons in the linear aliphatic chain, as, for example, C6 refers to an acyl group of 6 carbons in length). This term may be represented by the abbreviation LcTAG.

The terms "acetyl glyceride" and "acetyl triacylglycerol" and the like refer to a triglyceride to which at least one acetyl or related group is esterified to the glycerol backbone. A particular acetyl glyceride is denoted by the position(s) to which an acetyl or related group is esterified; thus, "sn-3-acetyl glyceride" or "1,2-diacyl-3-acetin" refers to triacylglycerol with an acetyl group at the sn-3 position. These terms may be represented by the abbreviation AcTAG.

An "acetyl" or "related group", when used in reference to AcTAG, refers to an acyl moiety other than a long-chain acyl group esterified to TAG. The acyl moiety is any linear aliphatic chain of less than 6 carbons in length; it may or may not have side group chains or substituents. The acyl moiety may also be aromatic. Related group members include but are not limited to propionyl and butyryl groups, and aromatic groups such as benzoyl and cinnamoyl.
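The chain-length criteria in the definitions of LcTAG and AcTAG above can be summarized in a few lines. This is only a sketch for linear aliphatic acyl groups described by their carbon count: the function names `acyl_class` and `tag_class` are hypothetical, and aromatic related groups such as benzoyl or cinnamoyl, which the definition also covers, are not handled.

```python
def acyl_class(carbons):
    """Classify a linear aliphatic acyl group by its number of carbons."""
    if carbons < 1:
        raise ValueError("an acyl group needs at least one carbon")
    # 6 carbons or more: long chain; fewer than 6: acetyl or related group
    return "long-chain" if carbons >= 6 else "acetyl-or-related"


def tag_class(chain_lengths):
    """Classify a triacylglycerol from the carbon counts of its three acyl groups."""
    if len(chain_lengths) != 3:
        raise ValueError("a triacylglycerol has exactly three acyl groups")
    classes = [acyl_class(c) for c in chain_lengths]
    if all(c == "long-chain" for c in classes):
        return "LcTAG"
    # at least one acetyl or related group is present
    return "AcTAG"
```

For example, a molecule with C16, C18 and C18 acyl groups is classified as LcTAG, while replacing one of them with an acetyl group (C2) gives AcTAG.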

The term "diacylglycerol acyltransferase" refers to a polypeptide with the capacity to transfer an acyl group to a diacylglycerol substrate. Typically, a diacylglycerol acyltransferase transfers an acyl group to the sn-3 position of the diacylglycerol, though transfer to the sn-1 and sn-2 positions is also possible. The acyl substrate for the transferase is typically esterified to CoA; thus, the acyl substrate is typically acyl-CoA. The enzyme is therefore also referred to as a "diacylglycerol:acyl-CoA acyltransferase," and in some particular embodiments, as an "acyl-CoA:sn-1,2-diacylglycerol acyltransferase," and the like. The term may be referred to by the abbreviation DAGAT.

The term "diacylglycerol acetyltransferase" refers to a diacylglycerol acyltransferase polypeptide with a unique acyl group transfer specificity, such that the polypeptide is able to transfer an acetyl or related group to a diacylglycerol substrate, and such that the diacylglycerol acetyltransferase exhibits increased specificity for an acetyl or related group compared to a diacylglycerol acyltransferase obtained from a plant in which acetyl TAGs are not present, or are present in only trace amounts (in other words, less than about 1% of the total TAGs). The specificity may be determined by either in vivo or in vitro assays. From an in vivo assay, the specificity is the proportion of total TAGs that are AcTAGs, where the AcTAGs are synthesized by the presence of a heterologous diacylglycerol acetyltransferase. From an in vitro assay, the specificity is the activity of transfer of an acetyl or related group to a diacylglycerol, when the substrate is an acetyl-CoA or related group esterified to CoA. The increase in specificity of transferring an acetyl or related group for an AcDAGAT is at least about 1.5 times, or about 2 times, or about 5 times, or about 10 times, or about 20 times, or about 50 times, or about 100 times, or up to about 2000 times, the specificity of a DAGAT obtained from a plant in which acetyl TAGs are not present, or are present in only trace amounts. One standard DAGAT to which an AcDAGAT is compared, in order to determine specificity of transfer of an acetyl or related group, is a DAGAT obtained from Arabidopsis (AtDAGAT).

The acetyl or related group substrate of the transferase is typically esterified to CoA; thus, typical acetyl substrates include but are not limited to acetyl-CoA, propionyl-CoA, butyryl-CoA, benzoyl-CoA, or cinnamoyl-CoA, as described above. These CoA substrates are typically non-micellar acyl-CoAs, or possess high critical micelle concentrations (CMCs), in that they form micelles at relatively high concentrations when compared to the CMCs of long chain acyl-CoAs.

The diacylglycerol substrate of AcDAGAT is typically a long chain diacylglycerol, although other groups are also contemplated. The acyl (or other) groups are esterified to the sn-1 and sn-2 positions, although the acyl groups may also be esterified to the sn-1 and sn-3 positions, or to the sn-2 and sn-3 positions.

Thus, the enzyme is also referred to as a "diacylglycerol:acetyl-CoA acetyltransferase," or in particular embodiments, as an "acetyl-CoA:sn-1,2-diacylglycerol acetyltransferase" and the like. This term may be referred to by the abbreviation AcDAGAT, indicating an activity of increased specificity for transfer of acetyl or related groups.

The terms "Chlamydomonas" and "Chlamydomonas-like" when used in reference to a DAGAT refer to a DAGAT obtained from Chlamydomonas reinhardtii or with a substrate specificity that is similar to a DAGAT obtained from Chlamydomonas reinhardtii. The term may be referred to by the abbreviation, "ChDAGAT," indicating an enzyme obtained from Chlamydomonas reinhardtii, or from the genus Chlamydomonas, or from a closely related plant family, or an enzyme which has an amino acid sequence with a high degree of similarity to or identity with a DAGAT obtained from Chlamydomonas reinhardtii. By "high degree of similarity" it is meant that it is more closely related to ChDAGAT than to AtDAGAT by BLAST scores or other amino acid sequence comparison/alignment software programs.

The term "substrate specificity" refers to the range of substrates that an enzyme will act upon to produce a product.

The term "competes for binding" is used in reference to a first polypeptide with enzymatic activity which binds to the same substrate as does a second polypeptide with enzymatic activity, where the second polypeptide is a variant of the first polypeptide or a related or dissimilar polypeptide. The efficiency (for example, kinetics or thermodynamics) of binding by the first polypeptide may be the same as or greater than or less than the efficiency of substrate binding by the second polypeptide. For example, the equilibrium binding constants (KD) for binding to the substrate may be different for the two polypeptides.

The terms "protein" and "polypeptide" refer to compounds comprising amino acids joined via peptide bonds and are used interchangeably.

As used herein, "amino acid sequence" refers to an amino acid sequence of a protein molecule. "Amino acid sequence" and like terms, such as "polypeptide" or "protein," are not meant to limit the amino acid sequence to the complete, native amino acid sequence associated with the recited protein molecule. Furthermore, an "amino acid sequence" can be deduced from the nucleic acid sequence encoding the protein.

The term "portion" when used in reference to a protein (as in "a portion of a given protein") refers to fragments of that protein. The fragments may range in size from four amino acid residues to the entire amino acid sequence minus one amino acid.

The term "homology" when used in relation to amino acids refers to a degree of similarity or identity. There may be partial homology or complete homology (in other words, identity). "Sequence identity" refers to a measure of relatedness between two or more proteins, and is given as a percentage with reference to the total comparison length. The identity calculation takes into account those amino acid residues that are identical and in the same relative positions in their respective larger sequences. Calculations of identity may be performed by algorithms contained within computer programs.
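The identity calculation described above (identical residues in the same relative positions, expressed as a percentage of the total comparison length) can be sketched for two sequences that have already been aligned. The function name `percent_identity` and the use of "-" as the gap character are assumptions for illustration; producing the alignment itself is left to the alignment programs mentioned in the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Positions where both sequences carry the same residue count as
    identical; gap positions never count. The denominator is the total
    comparison (alignment) length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)
```

For example, the aligned pair "MKTAY" and "MKSAY" differs at one of five positions and therefore has 80% identity.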

The term "chimera" when used in reference to a polypeptide refers to the expression product of two or more coding sequences obtained from different genes, that have been cloned together and that, after translation, act as a single polypeptide sequence. Chimeric polypeptides are also referred to as "hybrid" polypeptides. The coding sequences include those obtained from the same or from different species of organisms.

The term "fusion" when used in reference to a polypeptide refers to a chimeric protein containing a protein of interest joined to an exogenous protein fragment (the fusion partner). The fusion partner may serve various functions, including enhancement of solubility of the polypeptide of interest, as well as providing an "affinity tag" to allow purification of the recombinant fusion polypeptide from a host cell or from a supernatant or from both. If desired, the fusion partner may be removed from the protein of interest after or during purification.

The term "homolog" or "homologous" when used in reference to a polypeptide refers to a high degree of sequence identity between two polypeptides, or to a high degree of similarity between the three-dimensional structure or to a high degree of similarity between the active site and the mechanism of action. In a preferred embodiment, a homolog has a greater than 60% sequence identity, and more preferably greater than 75% sequence identity, and still more preferably greater than 90% sequence identity, with a reference sequence.

The terms "variant" and "mutant" when used in reference to a polypeptide refer to an amino acid sequence that differs by one or more amino acids from another, usually related polypeptide. The variant may have "conservative" changes, wherein a substituted amino acid has similar structural or chemical properties (for example, replacement of leucine with isoleucine). More rarely, a variant may have "non-conservative" changes (for example, replacement of a glycine with a tryptophan). Similar minor variations may also include amino acid deletions or insertions (in other words, additions), or both. Guidance in determining which and how many amino acid residues may be substituted, inserted or deleted without abolishing biological activity may be found using computer programs well known in the art, for example, DNAStar software. Variants can be tested in functional assays. Preferred variants have less than 10%, and preferably less than 5%, and still more preferably less than 2% changes (whether substitutions, deletions, and so on).

The term "gene" refers to a nucleic acid (for example, DNA or RNA) sequence that comprises coding sequences necessary for the production of RNA, or a polypeptide or its precursor (for example, proinsulin). A functional polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence as long as the desired activity or functional properties (for example, enzymatic activity, ligand binding, signal transduction, etc.) of the polypeptide are retained. The term "portion" when used in reference to a gene refers to fragments of that gene. The fragments may range in size from a few nucleotides to the entire gene sequence minus one nucleotide. Thus, "a nucleotide comprising at least a portion of a gene" may comprise fragments of the gene or the entire gene.

The term "gene" also encompasses the coding regions of a structural gene and includes sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences which are located 5' of the coding region and which are present on the mRNA are referred to as 5' non-translated sequences. The sequences which are located 3' or downstream of the coding region and which are present on the mRNA are referred to as 3' non-translated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide. In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5' and 3' end of the sequences that are present on the RNA transcript. These sequences are referred to as "flanking" sequences or regions (these flanking sequences are located 5' or 3' to the non-translated sequences present on the mRNA transcript). The 5' flanking region may contain regulatory sequences such as promoters and enhancers that control or influence the transcription of the gene. The 3' flanking region may contain sequences that direct the termination of transcription, posttranscriptional cleavage and polyadenylation.

The term "heterologous gene" refers to a gene encoding a factor that is not in its natural environment (in other words, has been altered by the hand of man). For example, a heterologous gene includes a gene from one species introduced into another species. A heterologous gene also includes a gene native to an organism that has been altered in some way (for example, mutated, added in multiple copies, linked to a non-native promoter or enhancer sequence, etc.). Heterologous genes may comprise plant gene sequences that comprise cDNA forms of a plant gene; the cDNA sequences may be expressed in either a sense (to produce mRNA) or anti-sense orientation (to produce an anti-sense RNA transcript that is complementary to the mRNA transcript). Heterologous genes are distinguished from endogenous plant genes in that the heterologous gene sequences are typically joined to nucleotide sequences comprising regulatory elements such as promoters that are not found naturally associated with the gene for the protein encoded by the heterologous gene or with plant gene sequences in the chromosome, or are associated with portions of the chromosome not found in nature (for example, genes expressed in loci where the gene is not normally expressed).

The term "oligonucleotide" refers to a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, preferably more than three, and usually more than ten. The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof.

The term "an oligonucleotide having a nucleotide sequence encoding a gene" or "a nucleic acid sequence encoding" a specified polypeptide refers to a nucleic acid sequence comprising the coding region of a gene or in other words the nucleic acid sequence which encodes a gene product. The coding region may be present in either a cDNA, genomic DNA or RNA form. When present in a DNA form, the oligonucleotide may be single-stranded (in other words, the sense strand) or double-stranded. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the coding region utilized in the expression vectors of the present invention may contain endogenous enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc. or a combination of both endogenous and exogenous control elements.

The terms "complementary" and "complementarity" refer to polynucleotides (in other words, a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "A-G-T" is complementary to the sequence "T-C-A." Complementarity may be "partial," in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be "complete" or "total" complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods that depend upon binding between nucleic acids. The term "homology" when used in relation to nucleic acids refers to a degree of complementarity. There may be partial homology or complete homology (in other words, identity). "Sequence identity" refers to a measure of relatedness between two or more nucleic acids, and is given as a percentage with reference to the total comparison length. The identity calculation takes into account those nucleotide residues that are identical and in the same relative positions in their respective larger sequences. Calculations of identity may be performed by algorithms contained within computer programs such as "GAP" (Genetics Computer Group, Madison, Wis.) and "ALIGN" (DNAStar, Madison, Wis.). A partially complementary sequence that at least partially inhibits (or competes with) a completely complementary sequence from hybridizing to a target nucleic acid is referred to using the functional term "substantially homologous." The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (in other words, the hybridization) of a sequence that is completely homologous to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (in other words, selective) interaction. The absence of non-specific binding may be tested by the use of a second target which lacks even a partial degree of complementarity (for example, less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.
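
As an illustration only, the base-pairing rule and the percentage measures described above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the two sequences are already aligned, gap-free, and of equal length; programs such as GAP and ALIGN perform a true alignment first, which this sketch does not attempt.

```python
# Watson-Crick pairing rules for DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in seq.upper())


def percent_identity(a, b):
    """Percentage of positions holding identical residues.

    Assumption: a and b are pre-aligned, ungapped, and of equal length,
    so the comparison length is simply len(a).
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def percent_complementarity(a, b):
    """Percentage of positions at which a and b obey the pairing rules."""
    return percent_identity(complement(a), b)


if __name__ == "__main__":
    print(complement("AGT"))                      # the "A-G-T" example from the text
    print(percent_complementarity("AGT", "TCA"))  # complete complementarity
    print(percent_complementarity("AGT", "TCG"))  # partial complementarity
```

Under these assumptions, "complete" complementarity corresponds to a value of 100 and "partial" complementarity to any lower value.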

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, the term "substantially homologous" refers to any probe which can hybridize to either or both strands of the double-stranded nucleic acid sequence under conditions of low stringency as described infra.

Low stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 5x SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5x Denhardt's reagent [50x Denhardt's contains per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/ml denatured salmon sperm DNA followed by washing in a solution comprising 5x SSPE, 0.1% SDS at 42°C when a probe of about 500 nucleotides in length is employed. Numerous equivalent conditions may be employed to comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the concentration of the salts and other components (for example, the presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered and the hybridization solution may be varied to generate conditions of low stringency hybridization different from, but equivalent to, the above listed conditions. In addition, the art knows conditions that promote hybridization under conditions of high stringency (for example, increasing the temperature of the hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.).

High stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 5x SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5x Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA followed by washing in a solution comprising 0.1x SSPE, 1.0% SDS at 42°C when a probe of about 500 nucleotides in length is employed. The term "substantially homologous", when used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, refers to any probe that can hybridize to either or both strands of the double-stranded nucleic acid sequence under conditions of low to high stringency as described above.

The term "substantially homologous", when used in reference to a single-stranded nucleic acid sequence, refers to any probe that can hybridize (in other words, it is the complement of) the single-stranded nucleic acid sequence under conditions of low to high stringency as described above.

The term "hybridization" refers to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (in other words, the strength of the association between the nucleic acids) are affected by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids. A single molecule that contains pairing of complementary nucleic acids within its structure is said to be "self-hybridized." The term "Tm" refers to the "melting temperature" of a nucleic acid. The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The Tm of nucleic acids may be estimated by the equation: Tm = 81.5 + 0.41(% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl (see, for example, Anderson and Young, Quantitative Filter Hybridization (1985) in Nucleic Acid Hybridization). Other references include more sophisticated computations that take structural as well as sequence characteristics into account for the calculation of Tm.
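
By way of a worked example, the simple estimate Tm = 81.5 + 0.41(% G+C) can be computed as sketched below. The function names are hypothetical, the result is assumed to be in degrees Celsius, and the estimate applies only to the stated condition of aqueous solution at 1 M NaCl; it ignores hybrid length and the structural factors addressed by the more sophisticated computations mentioned above.

```python
def gc_percent(seq):
    """Percentage of G and C residues in a nucleic acid sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(base in "GC" for base in seq)
    return 100.0 * gc / len(seq)


def melting_temperature(seq):
    """Estimate Tm (assumed deg C) as 81.5 + 0.41 * (% G+C), at 1 M NaCl."""
    return 81.5 + 0.41 * gc_percent(seq)


if __name__ == "__main__":
    probe = "ATGCGCTAAGCTTGCA"  # made-up sequence, for illustration only
    print(f"% G+C = {gc_percent(probe):.1f}")
    print(f"Tm    = {melting_temperature(probe):.1f}")
```

Because the estimate depends only on base composition, two sequences of very different length but equal G+C content receive the same value, which is one reason the more detailed calculations exist.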

As used herein the term "stringency" refers to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. With "high stringency" conditions, nucleic acid base pairing will occur only between nucleic acid fragments that have a high frequency of complementary base sequences. Thus, conditions of "low" stringency are often required with nucleic acids that are derived from organisms that are genetically diverse, as the frequency of complementary sequences is usually less.

"Amplification" is a special case of nucleic acid replication involving template specificity. It is to be contrasted with non-specific template replication (in other words, replication that is template-dependent but not dependent on a specific template). Template specificity is here distinguished from fidelity of replication (in other words, synthesis of the proper polynucleotide sequence) and nucleotide (ribo- or deoxyribo-) specificity. Template specificity is frequently described in terms of "target" specificity. Target sequences are "targets" in the sense that they are sought to be sorted out from other nucleic acid. Amplification techniques have been designed primarily for this sorting out.

Template specificity is achieved in most amplification techniques by the choice of enzyme. Amplification enzymes are enzymes that, under the conditions in which they are used, will process only specific sequences of nucleic acid in a heterogeneous mixture of nucleic acid. For example, in the case of Qβ replicase, MDV-1 RNA is the specific template for the replicase (Kacian et al. (1972) Proc. Natl. Acad. Sci. USA, 69:3038) [42]. Other nucleic acid will not be replicated by this amplification enzyme. Similarly, in the case of T7 RNA polymerase, this amplification enzyme has a stringent specificity for its own promoters (Chamberlin et al. (1970) Nature, 228:227) [43]. In the case of T4 DNA ligase, the enzyme will not ligate the two oligonucleotides or polynucleotides where there is a mismatch between the oligonucleotide or polynucleotide substrate and the template at the ligation junction (Wu and Wallace (1989) Genomics, 4:560) [44]. Finally, Taq and Pfu polymerases, by virtue of their ability to function at high temperature, are found to display high specificity for the sequences bounded and thus defined by the primers; the high temperature results in thermodynamic conditions that favor primer hybridization with the target sequences and not hybridization with non-target sequences (H. A. Erlich (ed.) (1989) PCR Technology, Stockton Press) [45].

The term "amplifiable nucleic acid" refers to nucleic acids that may be amplified by any amplification method. It is contemplated that "amplifiable nucleic acid" will usually comprise "sample template."

The term "sample template" refers to nucleic acid originating from a sample that is analyzed for the presence of "target" (defined below). In contrast, "background template" is used in reference to nucleic acid other than sample template that may or may not be present in a sample. Background template is most often inadvertent. It may be the result of carryover, or it may be due to the presence of nucleic acid contaminants sought to be purified away from the sample. For example, nucleic acids from organisms other than those to be detected may be present as background in a test sample.

The term "primer" refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, (in other words, in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method. The term "polymerase chain reaction" ("PCR") refers to the method of K. B. Mullis U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, that describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification [46-48]. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. 
The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (in other words, denaturation, annealing and extension constitute one "cycle"; there can be numerous "cycles") to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the "polymerase chain reaction" (hereinafter "PCR"). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be "PCR amplified." With PCR, it is possible to amplify a single copy of a specific target sequence in genomic DNA to a level detectable by several different methodologies (for example, hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of 32P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In addition to genomic DNA, any oligonucleotide or polynucleotide sequence can be amplified with the appropriate set of primer molecules. In particular, the amplified segments created by the PCR process itself are, themselves, efficient templates for subsequent PCR amplifications.
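
The dependence of the amplified segment on primer position can be illustrated with the following sketch, which is not part of the described method. All names and sequences are hypothetical; primers are assumed to anneal only by exact match, and the copy-number function assumes an idealized doubling per cycle, which real reactions only approximate.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def amplicon(template, fwd_primer, rev_primer):
    """Return the segment of the top strand bounded by the two primers.

    Assumption: exact-match annealing. The forward primer matches the top
    strand directly; the reverse primer anneals to the top strand, so its
    site on that strand is its reverse complement. Returns None if either
    primer finds no site.
    """
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    site = reverse_complement(rev_primer)
    end = template.find(site, start)
    if end == -1:
        return None
    return template[start:end + len(site)]


def ideal_copies(initial_copies, cycles):
    """Copies after a number of cycles, assuming perfect doubling per cycle."""
    return initial_copies * 2 ** cycles


if __name__ == "__main__":
    template = "TTGACCTAAAACCCCGGGTACGTT"  # made-up target, illustration only
    product = amplicon(template, "GACCT", "CGTAC")
    print(product, len(product))
    print(ideal_copies(1, 30))
```

Moving either primer along the template changes the returned segment, which mirrors the statement that the amplified length is a controllable parameter.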

The terms "PCR product," "PCR fragment," and "amplification product" refer to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences.

The term "amplification reagents" refers to those reagents (deoxyribonucleotide triphosphates, buffer, etc.), needed for amplification except for primers, nucleic acid template, and the amplification enzyme. Typically, amplification reagents along with other reaction components are placed and contained in a reaction vessel (test tube, micro well, etc.).

The term "reverse-transcriptase" or "RT-PCR" refers to a type of PCR where the starting material is mRNA. The starting mRNA is enzymatically converted to complementary DNA or "cDNA" using a reverse transcriptase enzyme. The cDNA is then used as a "template" for a "PCR" reaction.

The term "RACE" refers to Rapid Amplification of cDNA Ends.

The term "gene expression" refers to the process of converting genetic information encoded in a gene into RNA (for example, mRNA, rRNA, tRNA, or snRNA) through "transcription" of the gene (in other words, via the enzymatic action of an RNA polymerase), and into protein, through "translation" of mRNA. Gene expression can be regulated at many stages in the process. "Up-regulation" or "activation" refers to regulation that increases the production of gene expression products (in other words, RNA or protein), while "down-regulation" or "repression" refers to regulation that decreases production. Molecules (for example, transcription factors) that are involved in up-regulation or down-regulation are often called "activators" and "repressors," respectively. The terms "in operable combination", "in operable order" and "operably linked" refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the transcription of a given gene and/or the synthesis of a desired protein molecule is produced. The term also refers to the linkage of amino acid sequences in such a manner so that a functional protein is produced.

The term "regulatory element" refers to a genetic element that controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element that facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements are splicing signals, polyadenylation signals, termination signals, etc.

The terms "promoter" and "enhancer" as used herein are examples of transcriptional control signals. Promoters and enhancers comprise short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription (Maniatis, et al., Science 236:1237, 1987) [49]. Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, algae, insect, mammalian and plant cells. Promoter and enhancer elements have also been isolated from viruses and analogous control elements, such as promoters, are also found in prokaryotes. The selection of a particular promoter and enhancer depends on the cell type used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types (for review, see Voss, et al., Trends Biochem. Sci., 11:287, 1986; and Maniatis, et al., supra, 1987) [49, 50].

The terms "promoter element," "promoter," or "promoter sequence" as used herein, refer to a DNA sequence that is located at the 5' end (in other words precedes) the protein coding region of a DNA polymer. The location of most promoters known in nature precedes the transcribed region. The promoter functions as a switch, activating the expression of a gene. If the gene is activated, it is said to be transcribed, or participating in transcription. Transcription involves the synthesis of mRNA from the gene. The promoter, therefore, serves as a transcriptional regulatory element and also provides a site for initiation of transcription of the gene into mRNA. Promoters may be tissue specific or cell specific.

The term "tissue specific" as it applies to a promoter refers to a promoter that is capable of directing selective expression of a nucleotide sequence of interest to a specific type of tissue (for example, seeds) in the relative absence of expression of the same nucleotide sequence of interest in a different type of tissue (for example, leaves). Tissue specificity of a promoter may be evaluated by, for example, operably linking a reporter gene to the promoter sequence to generate a reporter construct, introducing the reporter construct into the genome of a plant such that the reporter construct is integrated into every tissue of the resulting transgenic plant, and detecting the expression of the reporter gene (for example, detecting mRNA, protein, or the activity of a protein encoded by the reporter gene) in different tissues of the transgenic plant. The detection of a greater level of expression of the reporter gene in one or more tissues relative to the level of expression of the reporter gene in other tissues shows that the promoter is specific for the tissues in which greater levels of expression are detected.

The term "cell type specific" as applied to a promoter refers to a promoter that is capable of directing selective expression of a nucleotide sequence of interest in a specific type of cell in the relative absence of expression of the same nucleotide sequence of interest in a different type of cell within the same tissue. The term "cell type specific" when applied to a promoter also means a promoter capable of promoting selective expression of a nucleotide sequence of interest in a region within a single tissue. Cell type specificity of a promoter may be assessed using methods well known in the art, for example, immunohistochemical staining. Briefly, tissue sections are embedded in paraffin, and paraffin sections are reacted with a primary antibody that is specific for the polypeptide product encoded by the nucleotide sequence of interest whose expression is controlled by the promoter. A labeled (for example, peroxidase conjugated) secondary antibody that is specific for the primary antibody is allowed to bind to the sectioned tissue and specific binding detected (for example, with avidin/biotin) by microscopy.

The term "constitutive" when made in reference to a promoter means that the promoter is capable of directing transcription of an operably linked nucleic acid sequence in the absence of a stimulus (for example, heat shock, chemicals, light, etc.). Typically, constitutive promoters are capable of directing expression of a transgene in substantially any cell and any tissue. Exemplary constitutive plant promoters include, but are not limited to, 35S Cauliflower Mosaic Virus (CaMV 35S; see for example, U.S. Pat. No. 5,352,605 [51], incorporated herein by reference), mannopine synthase, octopine synthase (ocs), superpromoter (see for example, WO 95/14098), and ubi3 (see for example, Garbarino and Belknap (1994) Plant Mol. Biol. 24:119-127) promoters [52]. Such promoters have been used successfully to direct the expression of heterologous nucleic acid sequences in transformed plant tissue.

The term "regulatable" or "inducible", when made in reference to a promoter, refers to a promoter that is capable of directing a level of transcription of an operably linked nucleic acid sequence in the presence of a stimulus (for example, heat shock, chemicals, light, etc.) which is different from the level of transcription of the operably linked nucleic acid sequence in the absence of the stimulus.

An "endogenous" enhancer or promoter is one that is naturally linked with a given gene in the genome.

An "exogenous", "ectopic" or "heterologous" enhancer or promoter is one that is placed in juxtaposition to a gene by means of genetic manipulation (in other words, molecular biological techniques) such that transcription of the gene is directed by the linked enhancer or promoter. For example, an endogenous promoter in operable combination with a first gene can be isolated, removed, and placed in operable combination with a second gene, thereby making it a "heterologous promoter" in operable combination with the second gene. A variety of such combinations are contemplated (for example, the first and second genes can be from the same species, or from different species).

The presence of "splicing signals" on an expression vector often results in higher levels of expression of the recombinant transcript in eukaryotic host cells. Splicing signals mediate the removal of introns from the primary RNA transcript and consist of a splice donor and acceptor site (Sambrook, et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York, pp. 16.7-16.8) [53]. A commonly used splice donor and acceptor site is the splice junction from the 16S RNA of SV40. Efficient expression of recombinant DNA sequences in eukaryotic cells requires expression of signals directing the efficient termination and polyadenylation of the resulting transcript. Transcription termination signals are generally found downstream of the polyadenylation signal and are a few hundred nucleotides in length. The term "poly(A) site" or "poly(A) sequence" as used herein denotes a DNA sequence which directs both the termination and polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the recombinant transcript is desirable, as transcripts lacking a poly(A) tail are unstable and are rapidly degraded. The poly(A) signal utilized in an expression vector may be "heterologous" or "endogenous." An endogenous poly(A) signal is one that is found naturally at the 3' end of the coding region of a given gene in the genome. A heterologous poly(A) signal is one which has been isolated from one gene and positioned 3' to another gene. A commonly used heterologous poly(A) signal is the SV40 poly(A) signal. The SV40 poly(A) signal is contained on a 237 bp BamHI/BclI restriction fragment and directs both termination and polyadenylation (Sambrook, supra, at 16.6-16.7). 
The term "selectable marker" refers to a gene which encodes an enzyme having an activity that confers resistance to an antibiotic or drug upon the cell in which the selectable marker is expressed, or which confers expression of a trait which can be detected (for example luminescence or fluorescence). Selectable markers may be "positive" or "negative." Examples of positive selectable markers include the neomycin phosphotransferase (NPTII) gene that confers resistance to G418 and to kanamycin, and the bacterial hygromycin phosphotransferase gene (hyg), which confers resistance to the antibiotic hygromycin. Negative selectable markers encode an enzymatic activity whose expression is cytotoxic to the cell when grown in an appropriate selective medium. For example, the HSV-tk gene is commonly used as a negative selectable marker. Expression of the HSV-tk gene in cells grown in the presence of gancyclovir or acyclovir is cytotoxic; thus, growth of cells in selective medium containing gancyclovir or acyclovir selects against cells capable of expressing a functional HSV TK enzyme.

The term "vector" as used herein, refers to any nucleic acid molecule that transfers DNA segment(s) from one cell to another. The term "vehicle" is sometimes used interchangeably with "vector."

The terms "expression vector" or "expression cassette" as used herein, refer to a recombinant DNA molecule containing a desired coding sequence and appropriate nucleic acid sequences necessary for the expression of the operably linked coding sequence in a particular host organism. Nucleic acid sequences necessary for expression in prokaryotes usually include a promoter, an operator (optional), and a ribosome binding site, often along with other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and termination and polyadenylation signals.

The term "transfection", as used herein, refers to the introduction of foreign DNA into cells. Transfection may be accomplished by a variety of means known to the art including calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated transfection, glass beads, electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, viral infection, biolistics (in other words, particle bombardment) and the like.

The term "Agrobacterium" refers to a soil-borne, Gram-negative, rod-shaped phytopathogenic bacterium that causes crown gall. The term "Agrobacterium" includes, but is not limited to, the strains Agrobacterium tumefaciens (which typically causes crown gall in infected plants), and Agrobacterium rhizogenes (which causes hairy root disease in infected host plants). Infection of a plant cell with Agrobacterium generally results in the production of opines (for example, nopaline, agropine, octopine etc.) by the infected cell. Thus, Agrobacterium strains which cause production of nopaline (for example, strain LBA4301, C58, A208, GV3101) are referred to as "nopaline-type" Agrobacteria; Agrobacterium strains which cause production of octopine (for example, strain LBA4404, Ach5, B6) are referred to as "octopine-type" Agrobacteria; and Agrobacterium strains which cause production of agropine (for example, strain EHA105, EHA101, A281) are referred to as "agropine-type" Agrobacteria.

The terms "bombarding," "bombardment," and "biolistic bombardment" refer to the process of accelerating particles towards a target biological sample (for example, cell, tissue, etc.) to effect wounding of the cell membrane of a cell in the target biological sample and/or entry of the particles into the target biological sample. Methods for biolistic bombardment are known in the art (for example, U.S. Pat. No. 5,584,807 [54], the contents of which are incorporated herein by reference), and are commercially available (for example, the helium gas-driven microprojectile accelerator (PDS-1000/He, BioRad)). The term "microwounding" when made in reference to plant tissue refers to the introduction of microscopic wounds in that tissue. Microwounding may be achieved by, for example, particle bombardment as described herein.

The term "transgenic" when used in reference to a plant or fruit or seed (in other words, a "transgenic plant" or "transgenic fruit" or a "transgenic seed") refers to a plant or fruit or seed that contains at least one heterologous gene in one or more of its cells. The term "transgenic plant material" refers broadly to a plant, a plant structure, a plant tissue, a plant seed or a plant cell that contains at least one heterologous gene in one or more of its cells.

The terms "transformants" or "transformed cells" include the primary transformed cell and cultures derived from that cell without regard to the number of transfers. All progeny may not be precisely identical in DNA content, due to deliberate or inadvertent mutations. Mutant progeny that have the same functionality as screened for in the originally transformed cell are included in the definition of transformants.

The term "wild-type", "native", or "natural" when made in reference to a gene or protein refers to a gene or protein that has the characteristics of a gene or protein isolated from a naturally occurring source. The term "wild-type" when made in reference to a gene or protein product refers to a gene or protein product that has the characteristics of a gene or protein product isolated from a naturally occurring source. A wild-type gene or protein is that which is most frequently observed in a population and is thus arbitrarily designated the "normal" form. In contrast, the term "modified" or "mutant" when made in reference to a gene, gene product, or protein refers, respectively, to a gene, gene product, or protein which displays modifications in sequence and/or functional properties (in other words, altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally-occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene, gene product or protein. The term "antisense" refers to a deoxyribonucleotide sequence whose sequence of deoxyribonucleotide residues is in reverse 5' to 3' orientation in relation to the sequence of deoxyribonucleotide residues in a sense strand of a DNA duplex. A "sense strand" of a DNA duplex refers to a strand in a DNA duplex that is transcribed by a cell in its natural state into a "sense mRNA." Thus an "antisense" sequence is a sequence having the same sequence as the non-coding strand in a DNA duplex. The term "antisense RNA" refers to a RNA transcript that is complementary to all or part of a target primary transcript or mRNA and that blocks the expression of a target gene by interfering with the processing, transport and/or translation of its primary transcript or mRNA. The complementarity of an antisense RNA may be with any part of the specific gene transcript, in other words, at the 5' non-coding sequence, 3' non-coding sequence, introns, or the coding sequence. In addition, as used herein, antisense RNA may contain regions of ribozyme sequences that increase the efficacy of antisense RNA to block gene expression. "Ribozyme" refers to a catalytic RNA and includes sequence-specific endoribonucleases. "Antisense inhibition" refers to the production of antisense RNA transcripts capable of preventing the expression of the target protein.
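
For illustration, the relationship between a target transcript and an antisense RNA complementary to it can be sketched as follows. The function names and the example sequence are hypothetical, and only exact, full-length Watson-Crick pairing is modeled; the definition above also covers complementarity to only part of a transcript.

```python
# Watson-Crick pairing rules for RNA bases.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(target):
    """Return the RNA fully complementary to `target`, written 5' to 3'.

    The two strands pair in antiparallel orientation, so the result is
    the reverse complement of the target region.
    """
    return "".join(RNA_PAIRS[base] for base in reversed(target.upper()))


def is_antisense_to(candidate, target_region):
    """True if `candidate` pairs perfectly with `target_region`."""
    return candidate.upper() == antisense_rna(target_region)


if __name__ == "__main__":
    region = "AUGGCC"  # made-up stretch of a target mRNA
    anti = antisense_rna(region)
    print(anti)
    print(is_antisense_to(anti, region))
```

Applying `antisense_rna` twice returns the original region, reflecting that sense and antisense strands are mutually complementary.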

The term "siRNAs" refers to short interfering RNAs. In some embodiments, siRNAs comprise a duplex, or double-stranded region, of about 18-25 nucleotides long; often siRNAs contain from about two to four unpaired nucleotides at the 3' end of each strand. At least one strand of the duplex or double-stranded region of a siRNA is substantially homologous to or substantially complementary to a target RNA molecule. The strand complementary to a target RNA molecule is the "antisense strand;" the strand homologous to the target RNA molecule is the "sense strand," and is also complementary to the siRNA antisense strand. siRNAs may also contain additional sequences; non-limiting examples of such sequences include linking sequences, or loops, as well as stem and other folded structures. siRNAs appear to function as key intermediaries in triggering RNA interference in invertebrates and in vertebrates, and in triggering sequence-specific RNA degradation during posttranscriptional gene silencing in plants.

The term "target RNA molecule" refers to an RNA molecule to which at least one strand of the short double-stranded region of an siRNA is homologous or complementary. Typically, when such homology or complementarity is about 100%, the siRNA is able to silence or inhibit expression of the target RNA molecule. Although it is believed that processed mRNA is a target of siRNA, the present invention is not limited to any particular hypothesis, and such hypotheses are not necessary to practice the present invention. Thus, it is contemplated that other RNA molecules may also be targets of siRNA. Such targets include unprocessed mRNA, ribosomal RNA, and viral RNA genomes.
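
By way of illustration only, the degree of complementarity between an siRNA antisense strand and a target site may be estimated as sketched below. The function name is hypothetical, the example sequences are invented for the example, and only Watson-Crick pairs are counted (G:U wobble pairs are treated as mismatches), which is a simplifying assumption.

```python
# Illustrative sketch only; counts Watson-Crick pairs and ignores G:U wobble.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def percent_complementarity(antisense_5to3: str, target_site_5to3: str) -> float:
    """Percent of positions at which the antisense strand pairs with the target site.

    Both sequences are written 5' to 3' and must have equal length; the
    strands pair antiparallel, so the antisense strand is read in reverse.
    """
    if not target_site_5to3 or len(antisense_5to3) != len(target_site_5to3):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        1
        for a, t in zip(reversed(antisense_5to3), target_site_5to3)
        if RNA_COMPLEMENT.get(a) == t
    )
    return 100.0 * paired / len(target_site_5to3)

site = "AUGGCAUUCGGAACUGCAU"
perfect = "AUGCAGUUCCGAAUGCCAU"       # reverse complement of the site
one_mismatch = "CUGCAGUUCCGAAUGCCAU"  # first base altered
print(percent_complementarity(perfect, site))
print(round(percent_complementarity(one_mismatch, site), 1))
```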

The term "RNA interference" or "RNAi" refers to the silencing or decreasing of gene expression by siRNAs. It is the process of sequence-specific, post-transcriptional gene silencing in animals and plants, initiated by siRNA that is homologous in its duplex region to the sequence of the silenced gene. The gene may be endogenous or exogenous to the organism, present integrated into a chromosome or present in a transfection vector that is not integrated into the genome. The expression of the gene is either completely or partially inhibited. RNAi may also be considered to inhibit the function of a target RNA; the inhibition of the function of the target RNA may be complete or partial.

The term "posttranscriptional gene silencing" or "PTGS" refers to silencing of gene expression in plants after transcription, and appears to involve the specific degradation of mRNAs synthesized from gene repeats.

The term "overexpression" refers to the production of a gene product in transgenic organisms that exceeds levels of production in normal or non-transformed organisms. The term "cosuppression" refers to the expression of a foreign gene that has substantial homology to an endogenous gene resulting in the suppression of expression of both the foreign and the endogenous gene.

The term "altered levels" refers to the production of gene product(s) in transgenic organisms in amounts or proportions that differ from that of normal or non-transformed organisms.

The term "recombinant" when made in reference to a nucleic acid molecule refers to a nucleic acid molecule that is comprised of segments of nucleic acid joined together by means of molecular biological techniques.

The term "recombinant" when made in reference to a protein or a polypeptide refers to a protein molecule that is expressed using a recombinant nucleic acid molecule.

The terms "Southern blot analysis" and "Southern blot" and "Southern" refer to the analysis of DNA on agarose or acrylamide gels in which DNA is separated or fragmented according to size followed by transfer of the DNA from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized DNA is then exposed to a labeled probe to detect DNA species complementary to the probe used. The DNA may be cleaved with restriction enzymes prior to electrophoresis. Following electrophoresis, the DNA may be partially depurinated and denatured prior to or during transfer to the solid support. Southern blots are a standard tool of molecular biologists (J. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, NY, pp 9.31-9.58) [53].

The term "Northern blot analysis" and "Northern blot" and "Northern" as used herein refer to the analysis of RNA by electrophoresis of RNA on agarose gels to fractionate the RNA according to size followed by transfer of the RNA from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized RNA is then probed with a labeled probe to detect RNA species complementary to the probe used. Northern blots are a standard tool of molecular biologists (J. Sambrook, et al. (1989) supra, pp 7.39-7.52).

The terms "Western blot analysis" and "Western blot" and "Western" refer to the analysis of protein(s) (or polypeptides) immobilized onto a support such as nitrocellulose or a membrane. A mixture comprising at least one protein is first separated on an acrylamide gel, and the separated proteins are then transferred from the gel to a solid support, such as nitrocellulose or a nylon membrane. The immobilized proteins are exposed to at least one antibody with reactivity against at least one antigen of interest. The bound antibodies may be detected by various methods, including the use of radiolabeled antibodies.

The term "isolated" when used in relation to a nucleic acid, as in "an isolated oligonucleotide" refers to a nucleic acid sequence that is identified and separated from at least one contaminant nucleic acid with which it is ordinarily associated in its natural source. Isolated nucleic acid is present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids, such as DNA and RNA, are found in the state they exist in nature. For example, a given DNA sequence (for example, a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs that encode a multitude of proteins. However, isolated nucleic acid encoding a plant DAGAT includes, by way of example, such nucleic acid in cells ordinarily expressing a DAGAT, where the nucleic acid is in a chromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid or oligonucleotide may be present in single-stranded or double-stranded form. When an isolated nucleic acid or oligonucleotide is to be utilized to express a protein, the oligonucleotide will contain at a minimum the sense or coding strand (in other words, the oligonucleotide may be single-stranded), but may contain both the sense and anti-sense strands (in other words, the oligonucleotide may be double-stranded).

The term "purified" refers to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated. An "isolated nucleic acid sequence" is therefore a purified nucleic acid sequence. "Substantially purified" molecules are at least 60% free, preferably at least 75% free, and more preferably at least 90% free from other components with which they are naturally associated. The term "purified" or "to purify" also refers to the removal of contaminants from a sample. The removal of contaminating proteins results in an increase in the percent of polypeptide of interest in the sample. In another example, recombinant polypeptides are expressed in plant, bacterial, yeast, or mammalian host cells and the polypeptides are purified by the removal of host cell proteins; the percent of recombinant polypeptides is thereby increased in the sample.
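
By way of illustration only, the purity thresholds recited above may be applied as sketched below. The sketch assumes that "percent free" is read as the fraction of the sample, by amount, represented by the molecule of interest; that reading, the function names, and the example amounts are assumptions made for the example.

```python
# Illustrative sketch only; "percent free" is interpreted here as the share of
# the sample made up by the molecule of interest, which is an assumption.
def percent_free(target_amount: float, other_amount: float) -> float:
    """Percent of the sample represented by the molecule of interest."""
    if target_amount < 0 or other_amount < 0:
        raise ValueError("amounts must be non-negative")
    total = target_amount + other_amount
    if total == 0:
        raise ValueError("sample is empty")
    return 100.0 * target_amount / total

def purity_tier(pct: float) -> str:
    """Map a percentage onto the 60% / 75% / 90% thresholds recited above."""
    if pct >= 90:
        return "at least 90% free (more preferred)"
    if pct >= 75:
        return "at least 75% free (preferred)"
    if pct >= 60:
        return "at least 60% free (substantially purified)"
    return "below 60% free (not substantially purified)"

# Example: 8 mg of recombinant polypeptide with 2 mg of host cell proteins.
pct = percent_free(8.0, 2.0)
print(pct, purity_tier(pct))
```

Removing host cell proteins lowers other_amount and therefore raises the computed percentage, consistent with the increase in the percent of polypeptide of interest described above.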

The term "sample" is used in its broadest sense. In one sense it can refer to a plant cell or tissue. In another sense, it is meant to include a specimen or culture obtained from any source, as well as biological and environmental samples. Biological samples may be obtained from plants or animals (including humans) and encompass fluids, solids, tissues, and gases. Environmental samples include environmental material such as surface matter, soil, water, and industrial samples. These examples are not to be construed as limiting the sample types applicable to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is related to biosynthetic oil compositions and methods of making the same. The present invention contemplates using plant material (i.e., for example, algae) comprising recombinant transcription inducing factors for biosynthetic oil genes. In some embodiments, the inducing factors are transcription factor regulatory proteins. The presently contemplated invention addresses a widely recognized need for the development of biomass-based domestic production systems for high energy liquid transportation fuels. In one embodiment, the present invention contemplates inducing oil (i.e., for example, triacylglycerol) biosynthesis in microalgae. This novel inventive concept provides new insights that lay the foundation for rational engineering of algae-based production systems for high energy fuels. Initial efforts are focused on the unicellular model green alga Chlamydomonas reinhardtii with its abundance of genetic and genomic resources.

REFERENCES:

1. Hildish and Williams (1964) Chemical Composition of Natural Lipids, Chapman Hall, London.

2. Knauf, V. C. (1987) The application of genetic engineering to oilseed crops, Trends Biotechnol. 5, 40-47.

3. Wijffels, R. H. and Barbosa, M. J. (2010) An Outlook on Microalgal Biofuels, Science 329, 796-799.

4. Hu, Q. et al. (2008) Microalgal triacylglycerols as feedstocks for biofuel production: perspectives and advances, Plant J. 54, 621-639.

5. Moellering, E. R. and Benning, C. (2010) Phosphate Regulation of Lipid Biosynthesis in Arabidopsis Is Independent of the Mitochondrial Outer Membrane DGS1 Complex, Plant Physiol. 152, 1951-1959.

6. Wang, Z. T. et al. (2009) Algal Lipid Bodies: Stress Induction, Purification, and Biochemical Characterization in Wild-Type and Starchless Chlamydomonas reinhardtii, Eukaryot. Cell 8, 1856-1868.

7. Dismukes, G. C. et al. (2008) Aquatic phototrophs: efficient alternatives to land-based crops for biofuels, Curr. Opin. Biotechnol. 19, 235-240.

8. Ohlrogge, J. et al. (2009) Energy. Driving on Biomass, Science 324, 1019-1020.

9. Durrett, T. P., Benning, C., and Ohlrogge, J. (2008) Plant triacylglycerols as feedstocks for the production of biofuels, Plant J. 54, 593-607.

10. Weyer, K. et al. (2010) Theoretical Maximum Algal Oil Production, BioEnergy Research 3, 204-213.

11. Radakovits, R. et al. (2010) Genetic Engineering of Algae for Enhanced Biofuel Production, Eukaryot. Cell 9, 486-501.

12. Niyogi, K. K. (1999) Photoprotection revisited: Genetic and Molecular Approaches, Annu. Rev. Plant Physiol. Plant Mol. Biol. 50, 333.

13. Silflow, C. D. and Lefebvre, P. A. (2001) Assembly and Motility of Eukaryotic Cilia and Flagella. Lessons from Chlamydomonas reinhardtii, Plant Physiol. 127, 1500-1507.

14. Wu-Scharf, D. et al. (2000) Transgene and Transposon Silencing in Chlamydomonas reinhardtii by a DEAH-Box RNA Helicase, Science 290, 1159-1162.

15. Davies, J. P., Yildiz, F. H., and Grossman, A. (1996) Sac1, a putative regulator that is critical for survival of Chlamydomonas reinhardtii during sulfur deprivation, EMBO J. 15, 2150-2159.

16. Shrager, J. et al. (2003) Chlamydomonas reinhardtii Genome Project. A Guide to the Generation and Use of the cDNA Information, Plant Physiol. 131, 401-408.

17. Grossman, A. R. et al. (2003) Chlamydomonas reinhardtii at the Crossroads of Genomics, Eukaryot. Cell 2, 1137-1150.

18. Merchant, S. S. et al. (2007) The Chlamydomonas Genome Reveals the Evolution of Key Animal and Plant Functions, Science 318, 245-250.

19. Tam, L. W. and Lefebvre, P. A. (1993) Cloning of Flagellar Genes in Chlamydomonas reinhardtii by DNA Insertional Mutagenesis, Genetics 135, 375-384.

20. Fuhrmann, M. et al. (2001) The abundant retinal protein of the Chlamydomonas eye is not the photoreceptor for phototaxis and photophobic responses, J. Cell Sci. 114, 3857-3863.

21. Sineshchekov, O. A., Jung, K.-H., and Spudich, J. L. (2002) Two rhodopsins mediate phototaxis to low- and high-intensity light in Chlamydomonas reinhardtii, Proc. Natl. Acad. Sci. U. S. A. 99, 8689-8694.

22. Kathir, P. et al. (2003) Molecular Map of the Chlamydomonas reinhardtii Nuclear Genome, Eukaryot. Cell 2, 362-379.

23. Giroud, C. and Eichenberger, W. (1989) Lipids of Chlamydomonas reinhardtii. Incorporation of [14C]Acetate, [14C]Palmitate and [14C]Oleate into Different Lipids and Evidence for Lipid-Linked Desaturation of Fatty Acids, Plant Cell Physiol. 30, 121-128.

24. Moore, T. S., Du, Z., and Chen, Z. (2001) Membrane Lipid Biosynthesis in Chlamydomonas reinhardtii. In Vitro Biosynthesis of Diacylglyceryltrimethylhomoserine, Plant Physiol. 125, 423-429.

25. Riekhof, W. R., Sears, B. B., and Benning, C. (2005) Annotation of Genes Involved in Glycerolipid Biosynthesis in Chlamydomonas reinhardtii: Discovery of the Betaine Lipid Synthase BTA1Cr, Eukaryot. Cell 4, 242-252.

26. Sato, N. (1988) Dual Role of Methionine in the Biosynthesis of Diacylglyceryltrimethylhomoserine in Chlamydomonas reinhardtii, Plant Physiol. 86, 931-934.

27. Sato, N. and Murata, N. (1991) Transition of lipid phase in aqueous dispersions of diacylglyceryltrimethylhomoserine, Biochim. Biophys. Acta 1082, 108-111.

28. Yang, W. et al. (2004) Membrane lipid biosynthesis in Chlamydomonas reinhardtii: expression and characterization of CTP:phosphoethanolamine cytidylyltransferase, Biochem. J. 382, 51-57.

29. El Maanni, A. et al. (1998) Mutants of Chlamydomonas reinhardtii affected in phosphatidylglycerol metabolism and thylakoid biogenesis, Plant Physiology and Biochemistry 36, 609-619.

30. Pineau, B. et al. (2004) A single mutation that causes phosphatidylglycerol deficiency impairs synthesis of photosystem II cores in Chlamydomonas reinhardtii, Eur. J. Biochem. 271, 329-338.

31. Seras, M. et al. (1989) Lipid Biosynthesis in Cells of the Wild-Type and of 2 Photosynthesis Mutants of Chlamydomonas reinhardtii, Plant Physiology and Biochemistry 27, 393-399.

32. Riekhof, R. W. and Benning, C. (2009) Organellar and Metabolic Processes, Vol. 2, Second ed., Academic Press, Elsevier, Boston.

33. Boyle, N. R. and Morgan, J. A. (2009) Flux balance analysis of primary metabolism in Chlamydomonas reinhardtii, BMC Syst. Biol. 3, 4.

34. Giroud, C., Gerber, A., and Eichenberger, W. (1988) Lipids of Chlamydomonas reinhardtii. Analysis of Molecular Species and Intracellular Site(s) of Biosynthesis, Plant Cell Physiol. 29, 587-595.

35. Benning, C. (2009) Mechanisms of Lipid Transport Involved in Organelle Biogenesis in Plant Cells, Annu. Rev. Cell Dev. Biol. 25, 71-91.

36. Moellering, E. R. and Benning, C. (2010) RNA Interference Silencing of a Major Lipid Droplet Protein Affects Lipid Droplet Size in Chlamydomonas reinhardtii, Eukaryot. Cell 9, 97-106.

37. Work, V. H. et al. (2010) Increased Lipid Accumulation in the Chlamydomonas reinhardtii sta7-10 Starchless Isoamylase Mutant and Increased Carbohydrate Synthesis in Complemented Strains, Eukaryot. Cell 9, 1251-1261.

38. Gonzalez-Ballester, D. et al. (2010) RNA-Seq Analysis of Sulfur-Deprived Chlamydomonas Cells Reveals Aspects of Acclimation Critical for Cell Survival, Plant Cell 22, 2058-2084.

39. Miller, R. et al. (2010) Changes in transcript abundance in Chlamydomonas reinhardtii following nitrogen-deprivation predict diversion of metabolism, Plant Physiol., provisionally accepted.

40. Neupert, J., Karcher, D., and Bock, R. (2009) Generation of Chlamydomonas strains that efficiently express nuclear transgenes, Plant J. 57, 1140-1150.

41. Cernac, A. and Benning, C. (2004) WRINKLED1 encodes an AP2/EREB domain protein involved in the control of storage compound biosynthesis in Arabidopsis, Plant J. 40, 575-585.

42. Kacian, D. L. et al. (1972) A Replicating RNA Molecule Suitable for a Detailed Analysis of Extracellular Evolution and Replication, Proc. Natl. Acad. Sci. U. S. A. 69, 3038-3042.

43. Chamberlin, M., McGrath, J., and Waskell, L. (1970) New RNA Polymerase from Escherichia coli infected with Bacteriophage T7, Nature 228, 227-231.

44. Wu, D. Y. and Wallace, R. B. (1989) The ligation amplification reaction (LAR): Amplification of specific DNA sequences using sequential rounds of template-dependent ligation, Genomics 4, 560-569.

45. Erlich, H. A., (Ed.) (1989) PCR Technology: Principles and Applications for DNA Amplification, Stockton Press, New York.

46. Mullis, K. B. et al. "Process for amplifying, detecting, and/or cloning nucleic acid sequences," United States Patent 4,683,195 (published July 28, 1987).

47. Mullis, K. B. "Process for amplifying nucleic acid sequences," United States Patent 4,683,202 (published July 28, 1987).

48. Mullis, K. B. et al. "Process for amplifying, detecting, and/or cloning nucleic acid sequences using a thermostable enzyme," United States Patent 4,965,188 (published October 23, 1990).

49. Maniatis, T., Goodbourn, S., and Fischer, J. A. (1987) Regulation of inducible and tissue-specific gene expression, Science 236, 1237-1245.

50. Voss, S. D., Schlokat, U., and Gruss, P. (1986) The role of enhancers in the regulation of cell-type-specific transcriptional control, Trends Biochem. Sci. 11, 287-289.

51. Fraley, R. T., Horsch, R. B., and Rogers, S. G. "Chimeric genes for transforming plant cells using viral promoters," United States Patent 5,352,605 (published October 4, 1994).

52. Garbarino, J. E. and Belknap, W. R. (1994) Isolation of a ubiquitin-ribosomal protein gene (ubi3) from potato and expression of its promoter in transgenic plants, Plant Mol. Biol. 24, 119-127.

53. Sambrook, J., Fritsch, E. F., and Maniatis, T., (Eds.) (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York.

54. McCabe, D. E. "Gas driven gene delivery instrument," United States Patent 5,584,807 (published December 17, 1996).