Title:
ISOLATED GENES AND TRANSGENIC ORGANISMS FOR PRODUCING BIOFUELS
Document Type and Number:
WIPO Patent Application WO/2014/047103
Kind Code:
A2
Abstract:
The present invention relates to the production of lipids and biofuels with a culture of Candidatus Microthrix spp. grown on a medium such as wastewater or sewage sludge. The Candidatus Microthrix spp. may be cultured with additional microorganisms that contribute to the accumulation of lipids from the growth medium, such as Zoogloea spp., Rhizobacter spp., Blautia spp., Hydrolatea spp., and OD1 genera incertae sedis. The present invention is also directed to a transformed organism comprising genes isolated from Candidatus Microthrix parvicella. Additionally, lipids, fatty acids, or biofuels may be produced or processed in vitro by the protein products of the isolated genes.

Inventors:
MULLER EMILIE (LU)
WILMES PAUL (LU)
KEIM PAUL S (US)
GILLECE JOHN D (US)
SCHUPP JAMES M (US)
PRICE LANCE B (US)
ENGELTHALER DAVID M (US)
Application Number:
PCT/US2013/060285
Publication Date:
March 27, 2014
Filing Date:
September 18, 2013
Assignee:
TRANSLATIONAL GENOMICS RES INST (US)
UNIV LUXEMBOURG (LU)
MULLER EMILIE (LU)
WILMES PAUL (LU)
KEIM PAUL S (US)
GILLECE JOHN D (US)
SCHUPP JAMES M (US)
PRICE LANCE B (US)
ENGELTHALER DAVID M (US)
International Classes:
C12P7/64
Foreign References:
US20110250658A12011-10-13
US20120184003A12012-07-19
US20030106091A12003-06-05
Other References:
LEVANTESI ET AL.: 'Phylogeny, physiology and distribution of "Candidatus Microthrix calida", a new Microthrix species isolated from industrial activated sludge wastewater treatment plants.' ENVIRON MICROBIOL. vol. 8, no. 9, 2006, pages 1552 - 63
ROSSETTI ET AL.: '"Microthrix parvicella", a filamentous bacterium causing bulking and foaming in activated sludge systems: a review of current knowledge.' FEMS MICROBIOL REV. vol. 29, no. 1, 2005, pages 49 - 64
MULLER ET AL.: 'Genome Sequence of "Candidatus Microthrix parvicella" Bio17-1, a Long-Chain-Fatty-Acid-Accumulating Filamentous Actinobacterium from a Biological Wastewater Treatment Plant.' J BACTERIOL. vol. 194, no. 23, December 2012, pages 6670 - 1
Attorney, Agent or Firm:
FULLER, Rodney J. (PLC, 1255 West Rio Salado Parkway, Suite 21, Tempe, Arizona, US)
Claims:
CLAIMS

What is claimed is:

1. A method of producing a lipid comprising:

culturing Candidatus Microthrix spp. in a culture system suitable for maintaining the viability and proliferation of the Candidatus Microthrix spp. wherein the culture system provides a carbon source or lipid material in a growth medium;

isolating the Candidatus Microthrix spp. from the culture system; and

extracting the lipid from the Candidatus Microthrix spp.

2. The method of claim 1, further comprising producing a biofuel from the extracted lipid.

3. The method according to any one of the preceding claims, further comprising: measuring depletion of the lipid material in the growth medium prior to isolating the Candidatus Microthrix spp. from the culture system.

4. The method according to any one of the preceding claims, wherein the Candidatus Microthrix spp. are cultured at a temperature below 20 °C.

5. The method of claim 4, wherein the Candidatus Microthrix spp. are cultured at a temperature below 15 °C.

6. The method according to any one of the preceding claims, wherein the step of isolating the Candidatus Microthrix spp. from the culture system is selected from the group consisting of a filtration method, a sedimentation method, a centrifugation method, a mechanical collection method, and a combination thereof.

7. The method according to any one of the preceding claims, wherein the step of extracting the lipid from the Candidatus Microthrix spp. is selected from the group consisting of a solvent extraction method, a steam extraction method, a chemical extraction method, a mechanical extraction method, an enzymatic extraction method, and a combination thereof.

8. The method according to any one of the preceding claims, wherein the growth medium comprises wastewater or sewage sludge.

9. The method according to any one of the preceding claims, wherein the Candidatus Microthrix spp. is selected from the group consisting of Candidatus Microthrix parvicella and Candidatus Microthrix calida.

10. The method according to any one of the preceding claims, wherein the Candidatus Microthrix spp. is an environmental strain isolated from wastewater or sewer sludge.

11. The method of claim 10, wherein the Candidatus Microthrix parvicella is isolated with a method selected from the group consisting of laser microdissection, single cell encapsulation combined with flow cytometry, manipulation with optical tweezers, and segregation with a micromanipulator.

12. The method according to any one of the preceding claims, further comprising: culturing in the culture system with the Candidatus Microthrix spp. a microorganism selected from the group consisting of Zoogloea spp., Rhizobacter spp., Blautia spp., Hydrolatea spp., OD1 genera incertae sedis, and combinations thereof.

13. The method of claim 12, wherein the microorganism is Zoogloea spp.

14. The method of claim 12, wherein the microorganism is a combination of Rhizobacter spp., Blautia spp., Hydrolatea spp., and OD1 genera incertae sedis.

15. The method of claim 12, wherein the culture system is inoculated with a ratio of Candidatus Microthrix spp. cells to microorganism cells of from about 1 : 1 to about 100: 1.

16. The method of claim 15, wherein the culture system is inoculated with a ratio of Candidatus Microthrix spp. cells to microorganism cells of from about 5: 1 to about 20: 1.

17. A method of producing a lipid comprising:

transforming a microorganism with a nucleic acid encoding an enzyme selected from the group consisting of a long-chain-fatty-acid-CoA-ligase, an enoyl-CoA hydratase, a lipase, a 3- ketoacyl-CoA thiolase, an acyl-CoA thioesterase, a 3-hydroxyacyl-CoA dehydrogenase, and combinations thereof;

culturing the microorganism in a culture system suitable for maintaining the viability and proliferation of the microorganism wherein the culture system provides a carbon source or lipid material in a growth medium;

isolating the microorganism from the culture system; and

extracting the lipid from the microorganism.

18. The method of claim 17, further comprising producing a biofuel from the extracted lipid.

19. The method of claim 17, wherein the microorganism is selected from the group consisting of Candidatus Microthrix spp., Bacillus spp., Saccharomyces cerevisiae, Escherichia coli, a cyanobacterium, and an alga.

20. The method of claim 17, wherein the nucleic acid encoding the long-chain-fatty-acid-CoA-ligase comprises a nucleotide sequence selected from the group consisting of SEQ ID NO : 1, SEQ ID NO : 3, SEQ ID NO : 5, SEQ ID NO : 7, SEQ ID NO : 9, SEQ ID NO : 11, SEQ ID NO : 13, SEQ ID NO : 15, SEQ ID NO : 17, SEQ ID NO : 19, SEQ ID NO : 21, SEQ ID NO : 23, SEQ ID NO : 25, SEQ ID NO : 27, SEQ ID NO : 29, SEQ ID NO : 31, SEQ ID NO : 33, SEQ ID NO : 35, SEQ ID NO : 37, SEQ ID NO : 39, SEQ ID NO : 41, SEQ ID NO : 43, SEQ ID NO : 45, SEQ ID NO : 47, SEQ ID NO : 49, SEQ ID NO : 51, SEQ ID NO : 53, and SEQ ID NO : 55.

21. The method of claim 17, wherein the nucleic acid encoding the enoyl-CoA hydratase comprises a nucleotide sequence selected from the group consisting of SEQ ID NO : 57, SEQ ID NO : 59, SEQ ID NO : 61, SEQ ID NO : 63, SEQ ID NO : 65, SEQ ID NO : 67, SEQ ID NO : 69, SEQ ID NO : 71, SEQ ID NO : 73, SEQ ID NO : 75, SEQ ID NO : 77, SEQ ID NO : 79, SEQ ID NO : 81, SEQ ID NO : 83, SEQ ID NO : 85, SEQ ID NO : 87, and SEQ ID NO : 89.

22. The method of claim 17, wherein the nucleic acid encoding the lipase comprises a nucleotide sequence selected from the group consisting of SEQ ID NO : 91 and SEQ ID NO : 93.

23. The method of claim 17, wherein the nucleic acid encoding the 3-ketoacyl-CoA thiolase comprises a nucleotide sequence selected from the group consisting of SEQ ID NO : 95, SEQ ID NO : 97, SEQ ID NO : 99, SEQ ID NO : 101, SEQ ID NO : 103, SEQ ID NO : 105, SEQ ID NO : 107, and SEQ ID NO : 109.

24. The method of claim 17, wherein the nucleic acid encoding the acyl-CoA thioesterase comprises a nucleotide sequence of SEQ ID NO : 111.

25. The method of claim 17, wherein the nucleic acid encoding the 3-hydroxyacyl-CoA dehydrogenase comprises a nucleotide sequence of SEQ ID NO : 113.

26. The method according to any one of claims 17 to 25 wherein the growth medium comprises wastewater or sewage sludge.

27. The method according to claim 2 or claim 18, wherein the step of producing a biofuel from the extracted lipids comprises transesterification of the lipids.

28. An isolated strain of Candidatus Microthrix parvicella Bio 17-1.

29. An isolated peptide comprising a sequence having at least 80% homology to a sequence selected from the group consisting of: SEQ ID NO : 2, SEQ ID NO : 4, SEQ ID NO : 6, SEQ ID NO : 8, SEQ ID NO : 10, SEQ ID NO : 12, SEQ ID NO : 14, SEQ ID NO : 16, SEQ ID NO : 18, SEQ ID NO : 20, SEQ ID NO : 22, SEQ ID NO : 24, SEQ ID NO : 26, SEQ ID NO : 28, SEQ ID NO : 30, SEQ ID NO : 32, SEQ ID NO : 34, SEQ ID NO : 36, SEQ ID NO : 38, SEQ ID NO : 40, SEQ ID NO : 42, SEQ ID NO : 44, SEQ ID NO : 46, SEQ ID NO : 48, SEQ ID NO : 50, SEQ ID NO : 52, SEQ ID NO : 54, SEQ ID NO : 56, SEQ ID NO : 58, SEQ ID NO : 60, SEQ ID NO : 62, SEQ ID NO : 64, SEQ ID NO : 66, SEQ ID NO : 68, SEQ ID NO : 70, SEQ ID NO : 72, SEQ ID NO : 74, SEQ ID NO : 76, SEQ ID NO : 78, SEQ ID NO : 80, SEQ ID NO : 82, SEQ ID NO : 84, SEQ ID NO : 86, SEQ ID NO : 88, SEQ ID NO : 90, SEQ ID NO : 92, SEQ ID NO : 94, SEQ ID NO : 96, SEQ ID NO : 98, SEQ ID NO : 100, SEQ ID NO : 102, SEQ ID NO : 104, SEQ ID NO : 106, SEQ ID NO : 108, SEQ ID NO : 110, SEQ ID NO : 112, and SEQ ID NO : 114.

30. A method of producing or modifying a lipid or fatty acid comprising:

adding to a suitable feedstock the peptide of claim 29; and

collecting and purifying the resulting lipid or fatty acid product.

31. The method of claim 30, wherein the feedstock is enriched in lipid material.

32. An isolated nucleic acid comprising a nucleotide sequence having at least 80% homology to a sequence selected from the group consisting of SEQ ID NO : 1, SEQ ID NO : 3, SEQ ID NO : 5, SEQ ID NO : 7, SEQ ID NO : 9, SEQ ID NO : 11, SEQ ID NO : 13, SEQ ID NO : 15, SEQ ID NO : 17, SEQ ID NO : 19, SEQ ID NO : 21, SEQ ID NO : 23, SEQ ID NO : 25, SEQ ID NO : 27, SEQ ID NO : 29, SEQ ID NO : 31, SEQ ID NO : 33, SEQ ID NO : 35, SEQ ID NO : 37, SEQ ID NO : 39, SEQ ID NO : 41, SEQ ID NO : 43, SEQ ID NO : 45, SEQ ID NO : 47, SEQ ID NO : 49, SEQ ID NO : 51, SEQ ID NO : 53, SEQ ID NO : 55, SEQ ID NO : 57, SEQ ID NO : 59, SEQ ID NO : 61, SEQ ID NO : 63, SEQ ID NO : 65, SEQ ID NO : 67, SEQ ID NO : 69, SEQ ID NO : 71, SEQ ID NO : 73, SEQ ID NO : 75, SEQ ID NO : 77, SEQ ID NO : 79, SEQ ID NO : 81, SEQ ID NO : 83, SEQ ID NO : 85, SEQ ID NO : 87, SEQ ID NO : 89, SEQ ID NO : 91, SEQ ID NO : 93, SEQ ID NO : 95, SEQ ID NO : 97, SEQ ID NO : 99, SEQ ID NO : 101, SEQ ID NO : 103, SEQ ID NO : 105, SEQ ID NO : 107, SEQ ID NO : 109, SEQ ID NO : 111, and SEQ ID NO : 113.

33. An isolated 14-24 base pair nucleic acid sequence comprising a sequence complementary to the sense or antisense sequence of a sequential 14 base pair sequence present in a sequence selected from SEQ ID NO : 1, SEQ ID NO : 3, SEQ ID NO : 5, SEQ ID NO : 7, SEQ ID NO : 9, SEQ ID NO : 11, SEQ ID NO : 13, SEQ ID NO : 15, SEQ ID NO : 17, SEQ ID NO : 19, SEQ ID NO : 21, SEQ ID NO : 23, SEQ ID NO : 25, SEQ ID NO : 27, SEQ ID NO : 29, SEQ ID NO : 31, SEQ ID NO : 33, SEQ ID NO : 35, SEQ ID NO : 37, SEQ ID NO : 39, SEQ ID NO : 41, SEQ ID NO : 43, SEQ ID NO : 45, SEQ ID NO : 47, SEQ ID NO : 49, SEQ ID NO : 51, SEQ ID NO : 53, SEQ ID NO : 55, SEQ ID NO : 57, SEQ ID NO : 59, SEQ ID NO : 61, SEQ ID NO : 63, SEQ ID NO : 65, SEQ ID NO : 67, SEQ ID NO : 69, SEQ ID NO : 71, SEQ ID NO : 73, SEQ ID NO : 75, SEQ ID NO : 77, SEQ ID NO : 79, SEQ ID NO : 81, SEQ ID NO : 83, SEQ ID NO : 85, SEQ ID NO : 87, SEQ ID NO : 89, SEQ ID NO : 91, SEQ ID NO : 93, SEQ ID NO : 95, SEQ ID NO : 97, SEQ ID NO : 99, SEQ ID NO : 101, SEQ ID NO : 103, SEQ ID NO : 105, SEQ ID NO : 107, SEQ ID NO : 109, SEQ ID NO : 111, and SEQ ID NO : 113.

34. A transgenic cell comprising the nucleic acid of any one of claims 32 to 33.

Description:
ISOLATED GENES AND TRANSGENIC ORGANISMS FOR PRODUCING BIOFUELS

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of U.S. Provisional Application No. 61/702,659 filed on September 18, 2012, the content of which is hereby incorporated by reference in its entirety.

INCORPORATION-BY-REFERENCE OF MATERIAL ELECTRONICALLY FILED

Incorporated by reference in its entirety herein is a computer-readable nucleotide/amino acid sequence listing submitted concurrently herewith and identified as follows: One 315,413 byte ASCII (text) file named "Seq_List" created on September 18, 2013.

FIELD OF THE INVENTION

The present invention relates to biofuel production using cultures of Candidatus Microthrix spp. on a growth medium that may comprise wastewater or sewage sludge. Also provided in the present invention are isolated genes and peptides from a strain of Candidatus Microthrix parvicella Bio 17-1 that are associated with a lipid accumulating phenotype.

BACKGROUND OF THE INVENTION

There is an ever-increasing demand for renewable biofuels and bioenergy products as an alternative to fossil fuels. Biofuels are currently produced from, for example, various cellulosic materials and sugar-based plants, including sugarcane, beets, corn, rice, and potatoes (among others), as well as wood chips. While the process is straightforward, producing biofuels and bioenergy products from these materials is, overall, inefficient and expensive given the cost of the source materials, and tends to drive up the price of food. Further, the current raw material sources for production of biofuel will not be sufficient to meet the escalating demands.

The U.S. population generates around 8.6 million dry metric tons of sludge annually, that is, approximately 13 billion pounds (dry basis) of sludge. Disposal of this enormous amount of sludge without substantial impact on the environment is an ongoing challenge. Moreover, in the United States approximately 230 million tons (dry matter) of animal waste (manure) are generated every year. Unsafe and improper disposal of decomposable animal waste causes substantial environmental pollution, including surface and groundwater contamination, odors, dust, and methane and ammonia emission. Processing and/or disposal of municipal, industrial, and farm sewage waste (e.g., sludge) is costly, and has an enormous impact on the environment as well as on the public health.

Candidatus Microthrix are deeply branching filamentous actinobacteria occurring at the water-air interface of biological wastewater treatment plants where they are often responsible for foaming and bulking. Candidatus Microthrix are notoriously difficult to grow in culture owing to their slow growth rate and unique growth medium requirements. This has long delayed study of their genetic content (FEMS Microbiology Reviews 29 (2005) 49-64). In wastewater treatment plants, however, Candidatus Microthrix rapidly dominate the environment based on a competitive advantage which is likely conferred by their uptake of long-chain fatty acids (LCFA) that are accumulated as neutral lipids under anaerobic conditions and converted into phospholipids for cell division under aerobic conditions.

Lipid storing organisms are useful for the production of biofuel. The stored lipids may be extracted and processed directly into, for example, biodiesel as in algae biofuels. Additionally, organisms with complex lipid storing mechanisms, such as Candidatus Microthrix, may possess novel lipid processing and storage enzymes that can be introduced into transformed organisms for the direct production of biofuel or biofuel precursors.

Alternatively, proteins associated with lipid processing and storage could be used to produce biofuels in vitro. Suitable enzymes for commercial production of biofuels that can assemble and process alkyl chains into biofuel or biofuel precursors have been difficult to isolate.

Methods and systems are needed for recovering valuable components of sewage sludge to help satisfy energy needs, while simultaneously reducing the impact of such waste on the environment and the health of the population. Therefore, there is a need to increase the lipid production and storage from sewage sludge with an organism or community of organisms to serve as a suitable feedstock for biofuels production. Additionally, the isolation of novel lipid storage and processing proteins, and the genes encoding them, is desirable.

BRIEF SUMMARY OF THE INVENTION

The invention is directed to a method of producing a lipid comprising: culturing Candidatus Microthrix spp. in a culture system suitable for maintaining the viability and proliferation of the Candidatus Microthrix spp. wherein the culture system provides a carbon source or lipid material in a growth medium; isolating the Candidatus Microthrix spp. from the culture system; and extracting the lipid from the Candidatus Microthrix spp. In certain implementations, the method further comprises producing a biofuel from the extracted lipid.

In certain embodiments, the Candidatus Microthrix spp. are cultured at a temperature below 20 °C or below 15 °C. The growth medium may comprise wastewater or sewage sludge. In other embodiments, the Candidatus Microthrix spp. are cultured together with a microorganism selected from the group consisting of Zoogloea spp., Rhizobacter spp., Blautia spp., Hydrolatea spp., OD1 genera incertae sedis, and combinations thereof. These microorganisms as well as the Candidatus Microthrix spp. may be isolated from wastewater, sewer sludge, soil, or other environmental sources.

The invention also provides a method of producing a lipid comprising: transforming a microorganism with a nucleic acid encoding an enzyme selected from the group consisting of a long-chain-fatty-acid-CoA-ligase, an enoyl-CoA hydratase, a lipase, a 3-ketoacyl-CoA thiolase, an acyl-CoA thioesterase, a 3-hydroxyacyl-CoA dehydrogenase, and combinations thereof; culturing the microorganism in a culture system suitable for maintaining the viability and proliferation of the microorganism wherein the culture system provides a carbon source or lipid material in a growth medium; isolating the microorganism from the culture system; and extracting the lipid from the microorganism. In certain aspects, the method further comprises producing a biofuel from the extracted lipid.

In one embodiment, the present invention is directed to an isolated 'Candidatus Microthrix parvicella' strain Bio 17-1 and to 57 newly isolated genes and peptides identified as being involved in the lipid accumulating phenotype. These include:

28 genes encoding Long-chain-fatty-acid-CoA ligases (EC 6.2.1.3);

17 genes encoding Enoyl-CoA hydratases (EC 4.2.1.17);

2 genes encoding Lipases (EC 3.1.1.5 and/or EC 3.1.1.23);

8 genes encoding 3-ketoacyl-CoA thiolases (EC 2.3.1.16);

1 gene encoding an esterase (EC 3.1.2.-); and

1 gene encoding a 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35).
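By way of non-limiting illustration, the six enzyme classes listed above can be tallied in a short Python sketch. The nucleotide SEQ ID NO ranges follow claims 20 to 25. The names ENZYME_CLASSES, gene_counts, and enzyme_class_of are hypothetical, and the pairing of each odd-numbered nucleotide sequence with the next even-numbered peptide sequence is an assumption made for the example, not a statement of the disclosure.

```python
# Illustrative only: enzyme classes mapped to nucleotide SEQ ID NOs (odd numbers),
# taken from claims 20 to 25. All names here are hypothetical.
ENZYME_CLASSES = {
    "long-chain-fatty-acid-CoA ligase": range(1, 56, 2),    # SEQ ID NO 1..55
    "enoyl-CoA hydratase": range(57, 90, 2),                # SEQ ID NO 57..89
    "lipase": range(91, 94, 2),                             # SEQ ID NO 91, 93
    "3-ketoacyl-CoA thiolase": range(95, 110, 2),           # SEQ ID NO 95..109
    "acyl-CoA thioesterase": range(111, 112, 2),            # SEQ ID NO 111
    "3-hydroxyacyl-CoA dehydrogenase": range(113, 114, 2),  # SEQ ID NO 113
}


def gene_counts(classes=ENZYME_CLASSES):
    """Return the number of genes listed for each enzyme class."""
    return {name: len(ids) for name, ids in classes.items()}


def enzyme_class_of(seq_id, classes=ENZYME_CLASSES):
    """Look up the enzyme class for a SEQ ID NO.

    Assumption: an even (peptide) number n is paired with the odd
    (nucleotide) number n - 1.
    """
    nucleotide_id = seq_id if seq_id % 2 else seq_id - 1
    for name, ids in classes.items():
        if nucleotide_id in ids:
            return name
    raise ValueError(f"SEQ ID NO {seq_id} is outside the listed ranges")


if __name__ == "__main__":
    counts = gene_counts()
    for name, n in counts.items():
        print(f"{n:2d}  {name}")
    print("total:", sum(counts.values()))
```

The per-class counts produced by this sketch add up to the 57 genes stated above.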

The present invention is further directed to a system and method that utilizes these genes and the gene products to produce or process lipids, fatty acids, and biofuel. Additionally, the invention is directed to transgenic organisms that have one or more of the isolated genes. The transgenic organisms are useful in producing biofuel or biofuel precursors.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 depicts a phylogenetic tree of the 28 long-chain-fatty-acid-CoA ligase homologues. The phylogenetic tree was constructed by Bio-Neighbor Joining at all positions (including gaps) after ClustalW2 alignment with default parameters of the amino-acid sequences of 'Candidatus Microthrix parvicella' strain Bio 17-1 and of some reference sequences (GenBank GI number indicated after the organism name).

Figure 2 depicts a phylogenetic tree of the 17 enoyl-CoA hydratase homologues. The phylogenetic tree was constructed by Bio-Neighbor Joining at all positions (including gaps) after ClustalW2 alignment with default parameters of the amino-acid sequences of 'Candidatus Microthrix parvicella' strain Bio 17-1 and of some reference sequences (GenBank GI number indicated after the organism name).

Figure 3 depicts a workflow overview. 16 distinct samples were chosen for their temporal, spatial and ecological differences. Community structure and lipids were analyzed to choose one time point of particular interest. The chosen undivided sample underwent a comprehensive biomolecular extraction (DNA, RNA, proteins and metabolites as described in Roume et al. (2013) The ISME Journal 7: 110) and was analyzed by high-throughput technologies.

Figure 4 depicts lipid accumulating community dynamics. (a) 16S phylotyping at the phylum level of the community at 4 different time points for 4 biological replicates. (b) Dynamics of the 2 most abundant organisms and of a potential keystone species (see Figure 5) of the community. (c) Wastewater temperature, Simpson diversity and Pielou evenness over time. (d) Cellular accumulation of some long chain fatty acids, calculated by comparing the intracellular to the extracellular concentration (by GC-MS single ion monitoring).
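For purposes of illustration only, the two diversity measures named in panel (c) can be computed from a vector of taxon counts as sketched below in Python. The specification does not state which form of the Simpson index was used; the sketch assumes the Gini-Simpson form (one minus the sum of squared proportions), and the function names and input counts are hypothetical.

```python
import math


def simpson_diversity(counts):
    """Gini-Simpson diversity, 1 - sum(p_i^2), from raw taxon counts.

    Assumption: the Gini-Simpson form is used; other Simpson variants exist.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one taxon must have a positive count")
    return 1.0 - sum((c / total) ** 2 for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou evenness J = H / ln(S), with H the Shannon entropy (natural log)
    and S the number of taxa actually observed."""
    present = [c for c in counts if c > 0]
    if len(present) < 2:
        return 0.0  # evenness is undefined for a single taxon; 0.0 is used here
    total = sum(present)
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    return shannon / math.log(len(present))


if __name__ == "__main__":
    sample = [120, 30, 30, 20]  # hypothetical counts for four phyla
    print("Simpson diversity:", simpson_diversity(sample))
    print("Pielou evenness:", pielou_evenness(sample))
```

A perfectly even community gives an evenness of 1, and a community dominated by a single taxon gives values near 0 for both measures.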

Figure 5 depicts Zoogloea spp., a potential keystone species. (a) Spearman correlations of normalized abundances of conserved bacterial genera and metabolites. Node size is proportional to the average abundance of the taxa. p-value < 0.001. (b) 2 isolated Zoogloea strains observed in epifluorescence after Nile Red staining (non-polar granules are fluorescent). Scale bar: 10 μm.

Figure 6 depicts microbial community clustering into 9 composite genomes and identification. (a) Meta-genomic and meta-transcriptomic data treatment workflow. (b) Predicted protein identity between the 9 subsets and the Microthrix Bio 17-1 isolate genome. From outside to inside, concentric circles represent the percentage identity of subset 01 to subset 09.

Figure 7 depicts a summary of the composite genomes.

Figure 8 depicts the subset coverage and expression. (a) The inner blue plots show the DNA coverage and the outer blue plots represent the RNA expression levels (windows of 100 bp, except for subset 09, which uses 50 bp). (b) Normalized metagenome coverage. (c) Normalized metatranscriptome coverage.

Figure 9 depicts a workflow overview. (a) DNA and metabolites were extracted from 16 samples chosen to represent temporally, spatially and ecologically distinct samples (4 time points spanning both temperature extremes over a year). (b) All biomolecular fractions were obtained from a single unique subsample (in-house developed protocol; see Roume et al. (2013) The ISME Journal 7:110) and analyzed by high-throughput technologies.

Figure 10 depicts lipid accumulating community dynamics. (a) 16S phylotyping at the phylum level of the community at 4 different time points for 4 biological replicates. (b) Dynamics of the 2 most abundant organisms and of 4 potential keystone species (see Figure 11) of the community. (c) Wastewater temperature and alpha diversity over time. (d) Cellular accumulation of the 2 most abundant long chain fatty acids of the system, calculated by comparing the intracellular lipid concentration to the extracellular concentration.

Figure 11 depicts a 16S rRNA amplicon and intracellular/extracellular metabolites correlation network. Spearman correlations were calculated based on normalized abundances of conserved genera and metabolites. 4 species of low abundance (see Figure 10b), namely Rhizobacter, Blautia, Hydrotalea and OD1, show numerous edges with non-polar metabolites and particularly long chain fatty acids. The size of the nodes is proportional to the average abundance of the taxa across the 16 samples. Edges are shown for rho > 0.75 or rho < -0.75, with p < 0.001.
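As a non-limiting sketch of how such a correlation network may be assembled, the Python example below computes Spearman rank correlations between abundance profiles and keeps pairs whose coefficient exceeds 0.75 in absolute value. The function names and the toy profiles are hypothetical. For brevity the sketch applies only the rho threshold; the p < 0.001 significance filter described for the figure is not implemented here and would in practice be taken from a statistics package.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def strong_edges(profiles, threshold=0.75):
    """Return (name_a, name_b, rho) for every pair with |rho| above threshold."""
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = spearman_rho(profiles[a], profiles[b])
            if abs(rho) > threshold:
                edges.append((a, b, rho))
    return edges


if __name__ == "__main__":
    toy = {  # hypothetical normalized abundances over five samples
        "genus_A": [1, 2, 3, 4, 5],
        "metabolite_B": [2, 4, 6, 8, 10],
        "genus_C": [3, 1, 4, 2, 5],
    }
    for a, b, rho in strong_edges(toy):
        print(a, b, round(rho, 3))
```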

Figure 12 depicts the Microthrix parvicella Bio 17-1 genome with mapped variations as well as transcripts and protein abundances. The genome of the isolated strain Bio 17-1 was sequenced (Illumina and PacBio), assembled into 15 contigs (green) and automatically annotated through RAST. Orange bands represent the 25 homologs of long-chain-fatty-acid acyl-CoA ligase (named in orange) and black bands show other features annotated as belonging to the fatty acid/lipid metabolism subsystem. The size scale of contig 1 differs from that of the other contigs. The outer plot in blue represents the metatranscriptome (Illumina RNAseq reads mapped with bwa). Line height represents the average coverage (<= 20 bp windows, scale 0-500). The red dots show SNP frequencies in the metatranscriptome (from 0 to 1, 0 being closer to the chromosome, phred >= 20). The inner plot in red shows the peptide spectra mapping to the annotation feature (scale 0-250). Black rectangles highlight examples of long-chain-fatty-acid acyl-CoA ligase variant expression.

Figure 13 depicts an emergent self-organizing map of metagenomic contigs (training set). Tetranucleotide frequencies were determined in order to cluster together reads belonging to the same organism. The different bins are color coded according to the topography. The tetranucleotide frequencies of Microthrix parvicella Bio 17-1 reads are mapped as large violet dots on the training map. They all cluster with the yellow bin, which can be defined as the Microthrix spp. pangenome.
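By way of example only, a tetranucleotide frequency vector of the kind used as input to such a map may be computed as in the Python sketch below. The function name and the example sequence are hypothetical. The sketch counts overlapping 4-mers on the given strand only and discards windows containing characters other than A, C, G or T; whether the original analysis merged reverse complements or handled ambiguous bases differently is not stated in the source.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return a 256-element list of relative 4-mer frequencies.

    Overlapping windows are counted on the given strand; windows containing
    any character other than A, C, G or T are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = [counts[t] for t in TETRAMERS]  # Counter yields 0 for absent 4-mers
    total = sum(valid)
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [c / total for c in valid]


if __name__ == "__main__":
    freqs = tetranucleotide_frequencies("ACGTACGTNNACGT")  # hypothetical read
    top = max(range(256), key=lambda i: freqs[i])
    print(TETRAMERS[top], freqs[top])
```

Each contig or read then becomes a point in a 256-dimensional space, and sequences from the same organism tend to lie close together in that space, which is what allows a self-organizing map to bin them.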

Figure 14 depicts Microthrix parvicella Bio 17-1 long-chain-fatty-acid-CoA ligase homologs. The phylogenetic tree was constructed by Bio-Neighbor Joining after ClustalW2 alignment of the amino-acid sequences. Homologs detected by metaproteomics are indicated with orange stars.

Figure 15 depicts a typical organic composition of wastewater.

Figure 16 depicts lipid accumulating organisms present in wastewater and highlights the importance of the filamentous bacteria Microthrix spp. among such organisms.

Figure 17 depicts a model community of lipid accumulating organisms present in wastewater and the water and air temperatures in this community's environment over a period of about 9 months.

Figure 18 depicts the Candidatus Microthrix parvicella Bio 17-1 genome. Groups of genes are identified by their biological function, and their location in the genome is indicated with the various colors.

DETAILED DESCRIPTION OF THE INVENTION

Aspects and applications of the invention presented here are described below in the drawings and detailed description of the invention. Unless specifically noted, it is intended that the words and phrases in the specification and the claims be given their plain, ordinary, and accustomed meaning to those of ordinary skill in the applicable arts.

In the following description, and for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the various aspects of the invention. It will be understood, however, by those skilled in the relevant arts, that the present invention may be practiced without these specific details. It should be noted that there are many different and alternative configurations, devices and technologies to which the disclosed inventions may be applied. The full scope of the inventions is not limited to the examples that are described below.

The singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a step" includes reference to one or more of such steps.

As used herein, a "biofuel" refers to any fuel, fuel additive, aromatic, and/or aliphatic compound derived from a biomass starting material such as sewage sludge, wastewater, or the like.

The term "culture system" as used herein refers to a system of water retaining, filtering, heating/cooling, and circulating systems, and structures that are typically employed in the maintenance of a culture medium under conditions suitable for supporting the viability and reproduction of a desired organism(s).

The terms "culture medium" and "growth medium" as used herein refer to an aqueous or agar-based medium designed to support the growth of microorganisms.

The term "carbon source" as used herein refers to a nutrient (e.g., sugar) that provides carbon skeletons needed for synthesis of new organic molecules (i.e., anabolism).

The term "viability" as used herein refers to "capacity for survival" and is more specifically used to mean a capacity for living, developing, or reproducing under favorable conditions.

The terms "lipid" and "lipid material" as used herein refer to naturally occurring molecules which include fats, waxes, sterols, fat-soluble vitamins (such as vitamins A, D, E and K), monoglycerides, diglycerides, triglycerides, phospholipids, fatty acids and the like. The main biological functions of lipids include serving as energy storage, as structural components of cell membranes, and as important signaling molecules. Lipids may be broadly defined as hydrophobic or amphiphilic small molecules; the amphiphilic nature of some lipids allows them to form structures such as vesicles, liposomes, or membranes in an aqueous environment.

The term "chemical extraction method" as used herein refers to a use of chemicals other than organic solvents to isolate lipids from undesirable components. The Babcock, Gerber and Detergent methods are examples of non-solvent liquid extraction methods for isolation of lipid content, and are well known to one of skill in the art.

The term "enzymatic extraction method" as used herein refers to a method of isolating lipids from undesirable components using enzymes such as hydrolases, proteinases, lipases and the like to break down complexes of polysaccharides, proteins, and lipids to release the desired lipids from cells. Lipids can then be extracted using organic solvents, mechanical methods, or combinations thereof.

The term "mechanical extraction method" as used herein refers to the disruption of cells by physical methods such as homogenization, crushing, filtration, sedimentation, and the like, to extract the lipids from cells.

The term "solvent extraction method" as used herein refers to the isolation of lipids using organic solvents and centrifugation methods. The fact that lipids are soluble in organic solvents, but insoluble in water, provides a convenient method of separating the lipid components from water-soluble components, such as proteins, carbohydrates and minerals.

The term "steam extraction method" as used herein refers to heated water extraction, which is a technique based on the use of steam heat as an extractant, at elevated temperatures, and at a pressure high enough to convert and maintain lipids in a liquid state.

As used herein, "transesterify," "transesterifying," and "transesterification" refer to a process of exchanging an alkoxy group of an ester by another alcohol and more specifically, of converting a lipid, e.g. triglycerides, to biodiesel, e.g. fatty acid alkyl esters, and glycerol.

Transesterification can be accomplished by using traditional chemical processes such as acid or base catalyzed reactions, or by using enzyme-catalyzed reactions.
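As an illustrative worked example, the stoichiometry of transesterification is that one mole of triglyceride reacts with three moles of alcohol to give three moles of fatty acid alkyl ester and one mole of glycerol. The Python sketch below applies this ratio to estimate theoretical product masses. The choice of triolein and methanol as reactants, the approximate molar masses, and the function name are assumptions made for the example; actual yields depend on the lipid mixture, the catalyst, and the reaction conditions.

```python
# Approximate molar masses in g/mol. Triolein and methanol are hypothetical
# choices of triglyceride and alcohol, used only to illustrate the 1:3:3:1 ratio.
M_TRIOLEIN = 885.45
M_METHANOL = 32.04
M_METHYL_OLEATE = 296.49
M_GLYCEROL = 92.09


def theoretical_yields(triglyceride_g,
                       m_triglyceride=M_TRIOLEIN,
                       m_alcohol=M_METHANOL,
                       m_ester=M_METHYL_OLEATE,
                       m_glycerol=M_GLYCEROL):
    """Theoretical masses for complete conversion:
    1 triglyceride + 3 alcohol -> 3 fatty acid alkyl ester + 1 glycerol."""
    if triglyceride_g < 0:
        raise ValueError("mass must be non-negative")
    mol_triglyceride = triglyceride_g / m_triglyceride
    return {
        "alcohol_consumed_g": 3 * mol_triglyceride * m_alcohol,
        "ester_g": 3 * mol_triglyceride * m_ester,
        "glycerol_g": mol_triglyceride * m_glycerol,
    }


if __name__ == "__main__":
    for name, grams in theoretical_yields(1000.0).items():
        print(f"{name}: {grams:.1f}")
```

Because the ester and glycerol products together weigh as much as the triglyceride plus the alcohol consumed, the sketch also serves as a simple mass-balance check.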

The present invention arises from the discovery of a genus of filamentous bacteria, Candidatus Microthrix spp., that accumulate lipids and predominate among a community of microorganisms when cultured on wastewater or sewage sludge. An isolated strain, Candidatus Microthrix parvicella Bio 17-1, was genetically characterized and found to have an abundance of genes involved in lipid metabolism including a number of long-chain-fatty-acid-CoA-ligases, enoyl-CoA hydratases, lipases, and 3-ketoacyl-CoA thiolases along with at least one acyl-CoA thioesterase and 3-hydroxyacyl-CoA dehydrogenase.

The present invention is directed to a method of producing a biofuel in some embodiments. The method comprises one or more steps selected from culturing Candidatus Microthrix spp. in a culture system suitable for maintaining the viability and proliferation of the Candidatus Microthrix spp. wherein the culture system provides lipids in a growth medium; isolating the Candidatus Microthrix spp. from the culture system; extracting the lipids from the Candidatus Microthrix spp.; and/or producing a biofuel from the extracted lipids. In one embodiment, the method of producing a biofuel further comprises measuring depletion of the lipids in the growth medium prior to isolating the Candidatus Microthrix spp. from the culture system.

In certain aspects, the Candidatus Microthrix spp. culture used for producing biofuels is maintained at a temperature of about 25°C, about 24°C, about 23°C, about 22°C, about 21°C, about 20°C, about 19°C, about 18°C, about 17°C, about 16°C, about 15°C, about 14°C, about 13°C, about 12°C, about 11°C, about 10°C, about 9°C, about 8°C, about 7°C, about 6°C, about 5°C, about 4°C, about 3°C, about 2°C, or about 1°C.

In other embodiments, the temperature of the culture containing the Candidatus Microthrix spp. is maintained below about 25°C, about 24°C, about 23°C, about 22°C, about 21°C, about 20°C, about 19°C, about 18°C, about 17°C, about 16°C, about 15°C, about 14°C, about 13°C, about 12°C, about 11°C, about 10°C, about 9°C, about 8°C, about 7°C, about 6°C, about 5°C, or about 4°C.

In yet other embodiments, the Candidatus Microthrix spp. culture used for producing biofuels is preferably maintained at a temperature of 4°C to 25°C; e.g., any range within 4°C to 25°C such as 4°C to 10°C, 4°C to 15°C, 10°C to 15°C, 10°C to 20°C, 15°C to 25°C, 20°C to 25°C, 4°C to 20°C, 10°C to 25°C, etc.

In some embodiments, the growth medium for the culture may be wastewater or sewage sludge. The wastewater or sewage sludge may belong to a class selected from sanitary, commercial, industrial, agricultural and surface runoff. Wastewater from residences and institutions, carrying body wastes, washing water, food preparation wastes, laundry wastes, and other waste products of normal living, is classed as domestic or sanitary sewage. Liquid-carried wastes from stores and service establishments serving the immediate community, termed commercial wastes, are included in the sanitary or domestic sewage category if their characteristics are similar to household flows. Wastes that result from an industrial process or the production or manufacture of goods are classed as industrial wastewater. Their flows and strengths are usually more varied, intense, and concentrated than those of sanitary sewage. Surface runoff, also known as storm flow or overland flow, is that portion of precipitation that runs rapidly over the ground surface to a defined channel. Precipitation absorbs gases and particulates from the atmosphere, dissolves and leaches materials from vegetation and soil, suspends matter from the land, washes spills and debris from urban streets and highways, and carries all these pollutants as wastes in its flow to a collection point. Any of these types of wastewater or sewage may be used as growth medium in the present invention.

In one implementation, the Candidatus Microthrix spp. are cultured with one or more additional microorganisms that contribute to the production of biofuels by assisting with the accumulation and production of lipids. The Candidatus Microthrix spp. may be cultured in a culture system together with Zoogloea spp., Rhizobacter spp., Blautia spp., Hydrolatea spp., OD1 genera incertae sedis, Perlucidibaca spp., Brevibacterium spp., Mycobacterium spp., Nocardia spp., Rhodococcus spp., Micromonospora spp., Dietzia spp., Gordonia spp., Acinetobacter spp., Saccharomyces spp., Rhodotorula spp., Chlorella spp., and combinations thereof. In a preferred embodiment, the Candidatus Microthrix spp. are cultured with a microorganism selected from the group consisting of Zoogloea spp., Rhizobacter spp., Blautia spp., Hydrolatea spp., OD1 genera incertae sedis, and combinations thereof. In other embodiments, the Candidatus Microthrix spp. are cultured with a consortium of strains from wastewater or sewage sludge.

In yet other implementations, the Candidatus Microthrix spp. are environmental strains isolated from wastewater or sewage sludge. Various methods may be used to isolate Candidatus Microthrix spp. or other microorganisms from environmental sources including laser microdissection, single cell encapsulation combined with flow cytometry, manipulation with optical tweezers, and segregation with a micromanipulator (See Pham et al. (2012) Trends in Biotechnology 30:475). Laser microdissection involves visualization of the cells of interest via microscopy, transfer of laser energy to a thermolabile polymer with formation of a polymer-cell composite (IR system) or photovolatilization of cells surrounding a selected area (UV system), and removal of the cells of interest from a heterogeneous sample (See Frohlich et al. (2000) FEMS Microbiol. Rev. 24:567).

In single cell encapsulation combined with flow cytometry, gel microdroplets (GMDs) containing single cells are formed by emulsifying a mixture of diluted cell suspension and preheated agarose, and are then incubated in media. This method allows low nutrient flux into GMDs and creates proper conditions for slow-growing microorganisms. Then, GMDs containing colonies are separated from free-living cells and empty GMDs using a flow cytometer (See Alain et al. (2009) Extremophiles 13:583; Zengler et al. (2002) Proc. Natl. Acad. Sci. U.S.A. 99:15681; and Zengler et al. (2005) Methods Enzymol. 397:124).

Manipulation with optical tweezers uses a highly focused laser beam to trap and manipulate microscopic neutral objects such as microbial cells. Using this method, a single cell can be isolated from a mixture of cells (See Frohlich et al. (2000) FEMS Microbiol. Rev. 24:567; and Zhang et al. (2008) J. R. Soc. Interface 5:671).

Segregation of cells with a micromanipulator entails the use of a capillary tube or a microneedle to pick up a single cell out of a mixed community under visual control with an inverse microscope (See Frohlich et al. (2000) FEMS Microbiol. Rev. 24:567).

In some aspects, the culture system is inoculated with a ratio of Candidatus Microthrix spp. cells to one or more additional microorganism cells of from about 1,000:1 to about 1:1,000; e.g., any range within from about 1,000:1 to about 1:1,000 such as from about 750:1 to about 1:750, from about 500:1 to about 1:500, from about 250:1 to about 1:250, from about 100:1 to about 1:100, from about 1:1 to about 1:100, from about 5:1 to about 20:1, from about 50:1 to about 1:50, from about 25:1 to about 1:25, etc.
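The inoculation ratios above can be turned into cell counts with simple arithmetic. The following is a minimal sketch, assuming a known total number of cells in the inoculum; the function name `split_inoculum` and its parameters are hypothetical and are not part of the disclosure. The range check uses the outer bounds of about 1,000:1 to about 1:1,000 as hard limits purely for illustration.

```python
from fractions import Fraction

# Outer bounds of the stated inoculation range (about 1,000:1 to about 1:1,000).
# Treating "about" as an exact limit is a simplifying assumption of this sketch.
MAX_RATIO = Fraction(1000, 1)
MIN_RATIO = Fraction(1, 1000)


def split_inoculum(total_cells: int, microthrix_parts: int, other_parts: int) -> tuple[int, int]:
    """Split a total cell count into (Microthrix cells, other cells) for a given ratio.

    Hypothetical helper: `microthrix_parts:other_parts` is the desired ratio of
    Candidatus Microthrix spp. cells to additional microorganism cells.
    """
    if microthrix_parts <= 0 or other_parts <= 0:
        raise ValueError("ratio parts must be positive")
    ratio = Fraction(microthrix_parts, other_parts)
    if not (MIN_RATIO <= ratio <= MAX_RATIO):
        raise ValueError("ratio lies outside the range 1,000:1 to 1:1,000")
    # Integer division keeps the result in whole cells; the remainder goes to the other fraction.
    microthrix = total_cells * microthrix_parts // (microthrix_parts + other_parts)
    return microthrix, total_cells - microthrix
```

For example, `split_inoculum(1_010_000, 100, 1)` divides an inoculum of 1,010,000 cells at a 100:1 ratio.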

In other aspects, the Candidatus Microthrix spp. is Candidatus Microthrix parvicella or Candidatus Microthrix calida (See Levantesi et al. (2006) Environ. Microbiol. 8:1552-1563). Also included within the scope of the present invention are Candidatus Microthrix spp. strains that are phylogenetically closely related. The strain of Candidatus Microthrix parvicella may be Candidatus Microthrix parvicella, strain Bio 17-1; Candidatus Microthrix parvicella, strain Ben 43; Candidatus Microthrix parvicella, strain DAN 1-3; Candidatus Microthrix parvicella, strain RN1; Candidatus Microthrix parvicella, clone 17; or Candidatus Microthrix parvicella, clone 6.

In another aspect, the present invention is directed to a method of producing a biofuel comprising: transforming a microorganism with a nucleic acid encoding an enzyme selected from the group consisting of a long-chain-fatty-acid-CoA-ligase, an enoyl-CoA hydratase, a lipase, a 3-ketoacyl-CoA thiolase, an acyl-CoA thioesterase, a 3-hydroxyacyl-CoA dehydrogenase, and combinations thereof; culturing the microorganism in a culture system suitable for maintaining the viability and proliferation of the microorganism wherein the culture system provides lipids in a growth medium; isolating the microorganism from the culture system; extracting the lipids from the microorganism; and producing a biofuel from the extracted lipids. The microorganism may be any one of Candidatus Microthrix spp., Bacillus spp., Saccharomyces cerevisiae, Escherichia coli, a cyanobacterium, and an alga.

Other host cells useful for the methods and compositions described herein include archaeal, prokaryotic, or eukaryotic cells.

Suitable prokaryotic hosts include, but are not limited to, any of a variety of gram-positive, gram-negative, or gram-variable bacteria. Examples include, but are not limited to, cells belonging to the genera: Agrobacterium, Alicyclobacillus, Anabaena, Anacystis, Arthrobacter, Azotobacter, Bacillus, Brevibacterium, Chromatium, Clostridium, Corynebacterium, Enterobacter, Erwinia, Escherichia, Lactobacillus, Lactococcus, Mesorhizobium, Methylobacterium, Microbacterium, Phormidium, Pseudomonas, Rhodobacter, Rhodopseudomonas, Rhodospirillum, Rhodococcus, Salmonella, Scenedesmus, Serratia, Shigella, Staphylococcus, Streptomyces, Synechococcus, and Zymomonas. Examples of prokaryotic strains include, but are not limited to: Bacillus subtilis, Bacillus amyloliquefaciens, Brevibacterium ammoniagenes, Brevibacterium immariophilum, Clostridium beijerinckii, Enterobacter sakazakii, Escherichia coli, Lactococcus lactis, Mesorhizobium loti, Pseudomonas aeruginosa, Pseudomonas mevalonii, Pseudomonas putida, Rhodobacter capsulatus, Rhodobacter sphaeroides, Rhodospirillum rubrum, Salmonella enterica, Salmonella typhi, Salmonella typhimurium, Shigella dysenteriae, Shigella flexneri, Shigella sonnei, and Staphylococcus aureus. In a particular embodiment, the host cell is an Escherichia coli cell.

Suitable archaeal hosts include, but are not limited to, cells belonging to the genera: Aeropyrum, Archaeoglobus, Halobacterium, Methanococcus, Methanobacterium, Pyrococcus, Sulfolobus, and Thermoplasma. Examples of archaeal strains include, but are not limited to: Archaeoglobus fulgidus, Halobacterium sp., Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Thermoplasma acidophilum, Thermoplasma volcanium, Pyrococcus horikoshii, Pyrococcus abyssi, and Aeropyrum pernix.

Suitable eukaryotic hosts include, but are not limited to, fungal cells, algal cells, insect cells, and plant cells. In some embodiments, yeasts useful in the present methods include yeasts that have been deposited with microorganism depositories (e.g. IFO, ATCC, etc.) and belong to the genera Aciculoconidium, Ambrosiozyma, Arthroascus, Arxiozyma, Ashbya, Babjevia, Bensingtonia, Botryoascus, Botryozyma, Brettanomyces, Bullera, Bulleromyces, Candida, Citeromyces, Clavispora, Cryptococcus, Cystofilobasidium, Debaryomyces, Dekkera, Dipodascopsis, Dipodascus, Eeniella, Endomycopsella, Eremascus, Eremothecium, Erythrobasidium, Fellomyces, Filobasidium, Galactomyces, Geotrichum, Guilliermondella, Hanseniaspora, Hansenula, Hasegawaea, Holtermannia, Hormoascus, Hyphopichia, Issatchenkia, Kloeckera, Kloeckeraspora, Kluyveromyces, Kondoa, Kuraishia, Kurtzmanomyces, Leucosporidium, Lipomyces, Lodderomyces, Malassezia, Metschnikowia, Mrakia, Myxozyma, Nadsonia, Nakazawaea, Nematospora, Ogataea, Oosporidium, Pachysolen, Pachytichospora, Phaffia, Pichia, Rhodosporidium, Rhodotorula, Saccharomyces, Saccharomycodes, Saccharomycopsis, Saitoella, Sakaguchia, Saturnospora, Schizoblastosporion, Schizosaccharomyces, Schwanniomyces, Sporidiobolus, Sporobolomyces, Sporopachydermia, Stephanoascus, Sterigmatomyces, Sterigmatosporidium, Symbiotaphrina, Sympodiomyces, Sympodiomycopsis, Torulaspora, Trichosporiella, Trichosporon, Trigonopsis, Tsuchiyaea, Udeniomyces, Waltomyces, Wickerhamia, Wickerhamiella, Williopsis, Yamadazyma, Yarrowia, Zygoascus, Zygosaccharomyces, Zygowilliopsis, and Zygozyma, among others.

In some embodiments, the host microbe is Saccharomyces cerevisiae, Pichia pastoris, Schizosaccharomyces pombe, Dekkera bruxellensis, Kluyveromyces lactis (previously called Saccharomyces lactis), Kluyveromyces marxianus, Arxula adeninivorans, or Hansenula polymorpha (now known as Pichia angusta). In some embodiments, the host microbe is a strain of the genus Candida, such as Candida lipolytica, Candida guilliermondii, Candida krusei, Candida pseudotropicalis, or Candida utilis.

As used herein, "transform" and "transformation" refer to the transfer of a nucleic acid molecule into a host organism. The nucleic acid molecule may be a plasmid that replicates autonomously, for example, or, it may integrate into the genome of the host organism. Host organisms containing the transformed nucleic acid fragments are referred to as "transgenic" or "recombinant" or "transformed" or "transformant" organisms.

"Stable transformation" refers to the transfer of a nucleic acid fragment into the genome of a host organism, including both nuclear and organellar genomes, resulting in genetically stable inheritance (i.e., the nucleic acid fragment is "stably integrated"). In contrast, "transient transformation" refers to the transfer of a nucleic acid fragment into the nucleus, or DNA- containing organelle, of a host organism resulting in gene expression without integration or stable inheritance.

The terms "plasmid" and "vector" refer to an extrachromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of circular double-stranded DNA fragments. Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear or circular, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction that is capable of introducing an expression cassette(s) into a cell.

The term "expression cassette" refers to a fragment of DNA comprising the coding sequence of a selected gene and regulatory sequences preceding (5' non-coding sequences) and following (3' non-coding sequences) the coding sequence that are required for expression of the selected gene product. Thus, an expression cassette is typically composed of: 1) a promoter sequence; 2) a coding sequence ["ORF"]; and, 3) a 3' untranslated region (i.e., a terminator) that, in eukaryotes, usually contains a polyadenylation site. The expression cassette(s) is usually included within a vector, to facilitate cloning and transformation. Different expression cassettes can be transformed into different organisms including bacteria, yeast, plants and mammalian cells, as long as the correct regulatory sequences are used for each host.

Standard resource materials that are useful to make recombinant constructs describe, inter alia: 1) specific conditions and procedures for construction, manipulation and isolation of macromolecules, such as DNA molecules, plasmids, etc.; 2) generation of recombinant DNA fragments and recombinant expression constructs; and, 3) screening and isolation of clones. See, Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y. (1989) (hereinafter "Maniatis"); by Silhavy, T. J., Berman, M. L. and Enquist, L. W., Experiments with Gene Fusions, Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y. (1984); and by Ausubel, F. M. et al., Current Protocols in Molecular Biology, published by Greene Publishing Assoc. and Wiley-Interscience, Hoboken, N.J. (1987).

In general, the choice of sequences included in the construct depends on the desired expression products, the nature of the host cell and the proposed means of separating transformed cells versus non-transformed cells. The skilled artisan is aware of the genetic elements that must be present on the plasmid vector to successfully transform, select and propagate host cells containing the chimeric gene. Typically, however, the vector or cassette contains sequences directing transcription and translation of the relevant gene(s), a selectable marker and sequences allowing autonomous replication or chromosomal integration. Suitable vectors comprise a region 5' of the gene that controls transcriptional initiation, i.e., a promoter, the gene coding sequence, and a region 3' of the DNA fragment that controls transcriptional termination, i.e., a terminator. It is most preferred when both control regions are derived from genes from the transformed host cell, although they need not be derived from genes native to the production host.

Transcription initiation regions or promoters useful for driving expression of heterologous genes or portions of them in the desired host cell are numerous and well known. These control regions may comprise a promoter, enhancer, silencer, intron sequences, 3' UTR and/or 5' UTR regions, and protein and/or RNA stabilizing elements. Such elements may vary in their strength and specificity. Virtually any promoter, i.e., native, synthetic, or chimeric, capable of directing expression of these genes in the selected host cell is suitable, although transcriptional and translational regions from the host species are particularly useful. Expression in a host cell can occur in an induced or constitutive fashion. Induced expression occurs by inducing the activity of a regulatable promoter operably linked to the gene of interest, while constitutive expression occurs by the use of a constitutive promoter.

3' non-coding sequences encoding transcription termination regions may be provided in a recombinant construct and may be from the 3' region of the gene from which the initiation region was obtained or from a different gene. A large number of termination regions are known and function satisfactorily in a variety of hosts when utilized in both the same and different genera and species from which they were derived. Termination regions may also be derived from various genes native to the preferred hosts. The termination region is usually selected more for convenience rather than for any particular property.

Particularly useful termination regions for use in yeast are derived from a yeast gene, particularly Saccharomyces, Schizosaccharomyces, Candida, Yarrowia or Kluyveromyces. The 3'-regions of mammalian genes encoding γ-interferon and α-2 interferon are also known to function in yeast. The 3'-region can also be synthetic, as one of skill in the art can utilize available information to design and synthesize a 3'-region sequence that functions as a transcription terminator. A termination region may be unnecessary, but is highly preferred.

The vector may also comprise a selectable and/or scorable marker, in addition to the regulatory elements described above. Preferably, the marker gene is an antibiotic resistance gene such that treating cells with the antibiotic results in growth inhibition, or death, of untransformed cells and uninhibited growth of transformed cells. For selection of yeast transformants, any marker that functions in yeast is useful, with resistance to kanamycin, hygromycin and the aminoglycoside G418, and the ability to grow on media lacking uracil, lysine, histidine or leucine, being particularly useful.

Merely inserting a gene into a cloning vector does not ensure its expression at the desired rate, concentration, amount, etc. In response to the need for a high expression rate, many specialized expression vectors have been created by manipulating a number of different genetic elements that control transcription, RNA stability, translation, protein stability and location, oxygen limitation, and secretion from the host cell. Some of the manipulated features include: the nature of the relevant transcriptional promoter and terminator sequences, the number of copies of the cloned gene and whether the gene is plasmid-borne or integrated into the genome of the host cell, the final cellular location of the synthesized protein, the efficiency of translation and correct folding of the protein in the host organism, the intrinsic stability of the mRNA and protein of the cloned gene within the host cell and the codon usage within the cloned gene, such that its frequency approaches the frequency of preferred codon usage of the host cell. Each of these may be used in the methods and host cells described herein to further optimize expression of disclosed genes.
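The codon usage consideration mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming the coding sequence is supplied as an in-frame DNA or RNA string. The function names `codon_frequencies` and `usage_gap` are hypothetical, and the sum of absolute differences is only one simple way to compare a gene's codon usage with that of a host. A real host codon table would have to be obtained separately and is not provided in the disclosure.

```python
from collections import Counter


def codon_frequencies(cds: str) -> dict[str, float]:
    """Return the fraction of all codons in `cds` that each codon accounts for.

    Simplification: frequencies are computed over all codons, not per amino acid.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


def usage_gap(gene_freqs: dict[str, float], host_freqs: dict[str, float]) -> float:
    """Sum of absolute frequency differences over every codon seen in either table.

    0.0 means identical usage; larger values mean the gene departs further
    from the (hypothetical) host table passed in.
    """
    all_codons = set(gene_freqs) | set(host_freqs)
    return sum(abs(gene_freqs.get(c, 0.0) - host_freqs.get(c, 0.0)) for c in all_codons)
```

A gene whose `usage_gap` against the host table shrinks after synonymous codon substitution has moved closer to the host's preferred codon usage in the sense described above.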

For example, gene expression can be increased at the transcriptional level through the use of a stronger promoter (either regulated or constitutive) to cause increased expression, by removing/deleting destabilizing sequences from either the mRNA or the encoded protein, or by adding stabilizing sequences to the mRNA (U.S. Pat. No. 4,910,141). Alternatively, additional copies of the genes may be introduced into the recombinant host cells to thereby increase lipid production and accumulation, either by cloning additional copies of genes within a single expression construct or by introducing additional copies into the host cell by increasing the plasmid copy number or by multiple integration of the cloned gene into the genome.

After a recombinant construct is created comprising at least one chimeric gene comprising a promoter, an open reading frame ["ORF"], and a terminator, it is placed in a plasmid vector capable of autonomous replication in the host cell or is directly integrated into the genome of the host cell. Integration of expression cassettes can occur randomly within the host genome or can be targeted through the use of constructs containing regions of homology with the host genome sufficient to target recombination with the host locus. Where constructs are targeted to an endogenous locus, all or some of the transcriptional and translational regulatory regions can be provided by the endogenous locus.

When two or more genes are expressed from separate replicating vectors, each vector may have a different means of selection and should lack homology to the other construct(s) to maintain stable expression and prevent reassortment of elements among constructs. Judicious choice of regulatory regions, selection means and method of propagation of the introduced construct(s) can be experimentally determined so that all introduced genes are expressed at the necessary levels to provide for synthesis of the desired lipid products.

Constructs comprising the gene(s) of interest may be introduced into a host cell by any standard technique. These techniques include transformation, e.g., lithium acetate transformation (Methods in Enzymology, 194:186-187 (1991)), biolistic impact, electroporation, microinjection, vacuum filtration or any other method that introduces the gene of interest into the host cell.

For convenience, a host cell that has been manipulated by any method to take up a DNA sequence, for example, in an expression cassette, is referred to herein as "transformed" or "recombinant" or "transformant". The transformed host will have at least one copy of the expression construct and may have two or more, depending upon whether the gene is integrated into the genome, amplified, or is present on an extrachromosomal element having multiple copy numbers.

The transformed host cell can be identified by selection for a marker contained on the introduced construct. Alternatively, a separate marker construct may be co-transformed with the desired construct, as many transformation techniques introduce many DNA molecules into host cells.

Typically, transformed hosts are selected for their ability to grow on selective media, which may incorporate an antibiotic or lack a factor necessary for growth of the untransformed host, such as a nutrient or growth factor. An introduced marker gene may confer antibiotic resistance, or encode an essential growth factor or enzyme, thereby permitting growth on selective media when expressed in the transformed host. Selection of a transformed host can also occur when the expressed marker protein can be detected, either directly or indirectly. Additional selection techniques are described in U.S. Pat. No. 7,238,482 and U.S. Pat. No. 7,259,255.

Regardless of the selected host or expression construct, multiple transformants must be screened to obtain a strain displaying the desired expression level and pattern. Such screening may be accomplished by Southern analysis of DNA blots (Southern, J. Mol. Biol., 98:503 (1975)), Northern analysis of mRNA expression (Kroczek, J. Chromatogr. Biomed. Appl., 618(1-2):133-145 (1993)), Western analysis of protein expression, phenotypic analysis or GC analysis of the lipid products.

The lipids produced by the methods of the present invention may be converted into biofuels by transesterification. Transesterification of lipids yields long-chain fatty acid esters useful as biodiesel.

In particular embodiments, the present application describes genetically engineering strains with one or more exogenous genes. For example, cells that produce high levels of triacylglycerides (TAGs) suitable for biodiesel can be engineered to express a lipase, which can facilitate transesterification of TAGs. The lipase can optionally be expressed using an inducible promoter, so that the cells can first be grown to a desirable density in a culture system and then harvested, followed by induction of the promoter to express the lipase, optionally in the presence of sufficient alcohol to drive conversion of TAGs to fatty acid esters.

Some lipid is sequestered in cell membranes and other non-aqueous parts of the cell. Therefore, to increase the yield of the transesterification reaction, it can be beneficial to lyse the cells to increase the accessibility of the lipase to the lipid. Cell disruption can be performed, for example, mechanically, through addition of pressurized steam, or by employing a virus that lyses the cells, expressing a gene to produce a lytic protein in the cell, or treating the culture with an agent that lyses cells. Optionally, the lipase can be expressed in an intracellular compartment, where it remains separate from the majority of the lipid until transesterification. Generally, it is preferable to carry out transesterification after water has been substantially removed from the preparation and/or an excess of alcohol has been added. Lipases can use water, as well as alcohol, as a substrate in transesterification. With water, the lipid is conjugated to a hydroxyl moiety to produce a polar fatty acid, rather than an ester. With an alcohol, such as methanol, the lipid is conjugated to a methyl group, producing a non-polar fatty acid ester, which is typically preferable for a transportation fuel. To limit exposure of the lipase to intracellular lipids until conditions are suitable for transesterification to produce fatty acid esters, the lipase can be expressed, for example, in the chloroplast, mitochondria, or other cellular organelle. This compartmentalized expression results in sequestration of the lipase from the majority of the cellular lipid until after the cells have been disrupted.

In one embodiment, the invention provides an isolated strain comprising Candidatus Microthrix parvicella Bio 17-1 as described herein.

The invention also includes isolated nucleic acids that comprise a nucleotide sequence having at least 80% homology, more preferably at least 90% homology, and most preferably at least 95% homology to a sequence selected from the group consisting of: SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 35, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 41, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 47, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 53, SEQ ID NO: 55, SEQ ID NO: 57, SEQ ID NO: 59, SEQ ID NO: 61, SEQ ID NO: 63, SEQ ID NO: 65, SEQ ID NO: 67, SEQ ID NO: 69, SEQ ID NO: 71, SEQ ID NO: 73, SEQ ID NO: 75, SEQ ID NO: 77, SEQ ID NO: 79, SEQ ID NO: 81, SEQ ID NO: 83, SEQ ID NO: 85, SEQ ID NO: 87, SEQ ID NO: 89, SEQ ID NO: 91, SEQ ID NO: 93, SEQ ID NO: 95, SEQ ID NO: 97, SEQ ID NO: 99, SEQ ID NO: 101, SEQ ID NO: 103, SEQ ID NO: 105, SEQ ID NO: 107, SEQ ID NO: 109, SEQ ID NO: 111, and SEQ ID NO: 113. In a preferred embodiment, the sequence includes one or more of the isolated sequences.

The invention also encompasses a nucleic acid having a sequence that encodes a peptide having at least 80% homology, more preferably at least 90% homology, and most preferably at least 95% homology to a sequence selected from the group consisting of: SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 44, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 50, SEQ ID NO: 52, SEQ ID NO: 54, SEQ ID NO: 56, SEQ ID NO: 58, SEQ ID NO: 60, SEQ ID NO: 62, SEQ ID NO: 64, SEQ ID NO: 66, SEQ ID NO: 68, SEQ ID NO: 70, SEQ ID NO: 72, SEQ ID NO: 74, SEQ ID NO: 76, SEQ ID NO: 78, SEQ ID NO: 80, SEQ ID NO: 82, SEQ ID NO: 84, SEQ ID NO: 86, SEQ ID NO: 88, SEQ ID NO: 90, SEQ ID NO: 92, SEQ ID NO: 94, SEQ ID NO: 96, SEQ ID NO: 98, SEQ ID NO: 100, SEQ ID NO: 102, SEQ ID NO: 104, SEQ ID NO: 106, SEQ ID NO: 108, SEQ ID NO: 110, SEQ ID NO: 112, and SEQ ID NO: 114.
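The 80%, 90% and 95% thresholds above can be illustrated with a small calculation. The text does not specify how homology is to be computed, so the following minimal sketch uses percent identity over a pre-computed pairwise alignment as a stand-in; that choice, the gap handling, and the function names `percent_identity` and `homology_tier` are assumptions made here for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences that are already aligned.

    Both strings must have equal length, with `gap` marking alignment gaps.
    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def homology_tier(identity: float) -> str:
    """Map a percent value onto the three thresholds recited in the text."""
    if identity >= 95.0:
        return "at least 95%"
    if identity >= 90.0:
        return "at least 90%"
    if identity >= 80.0:
        return "at least 80%"
    return "below 80%"
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only covers the final scoring step.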

The invention also encompasses a transgenic cell and/or organism that comprises any of the nucleotide sequences set forth above. A method of producing a transformed biofuel or biofuel precursor-producing cell or organism is also included. The method comprises: selecting an organism or cell suitable for growth on a suitable medium; and transforming the cell or organism with a nucleic acid molecule above.

According to some embodiments, a method of producing biofuels comprises: contacting the transformed organism, comprising a nucleic acid molecule that comprises the nucleotide sequence of any one of SEQ ID NOs: 1-114, to a suitable medium; and, harvesting the lipids or fatty acids produced by the transformed organism.

In some embodiments, the organism suitable for growth on a medium is selected from the group consisting of: Candidatus Microthrix, Candidatus Microthrix parvicella Bio 17-1, Saccharomyces cerevisiae, Escherichia coli, a cyanobacterium, and an alga. In some embodiments, the suitable medium is wastewater. In some embodiments, the method further comprises: modifying the lipids by transesterification.

According to some embodiments, a method of producing biofuels comprises: growing an organism containing at least one of the nucleic acid molecules set forth herein; growing the organism on a suitable medium; and harvesting the lipids or fatty acids produced by the organism. In some embodiments, the organism is selected from: Candidatus Microthrix, Candidatus Microthrix parvicella Bio 17-1, Saccharomyces cerevisiae, Escherichia coli, a cyanobacterium, and an alga. The method preferably further comprises: modifying the lipids by transesterification.

In yet other embodiments, the Candidatus Microthrix spp. is selected from the group consisting of Candidatus Microthrix parvicella and Candidatus Microthrix calida. Candidatus Microthrix calida is described in Levantesi C et al. (2006) Environ. Microbiol. 8:1552.

According to some embodiments, an isolated peptide comprises a sequence having at least 80% homology, more preferably at least 90% homology, and most preferably at least 95% homology to a sequence selected from the group consisting of: SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, SEQ ID NO: 14, SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 30, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 36, SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 44, SEQ ID NO: 46, SEQ ID NO: 48, SEQ ID NO: 50, SEQ ID NO: 52, SEQ ID NO: 54, SEQ ID NO: 56, SEQ ID NO: 58, SEQ ID NO: 60, SEQ ID NO: 62, SEQ ID NO: 64, SEQ ID NO: 66, SEQ ID NO: 68, SEQ ID NO: 70, SEQ ID NO: 72, SEQ ID NO: 74, SEQ ID NO: 76, SEQ ID NO: 78, SEQ ID NO: 80, SEQ ID NO: 82, SEQ ID NO: 84, SEQ ID NO: 86, SEQ ID NO: 88, SEQ ID NO: 90, SEQ ID NO: 92, SEQ ID NO: 94, SEQ ID NO: 96, SEQ ID NO: 98, SEQ ID NO: 100, SEQ ID NO: 102, SEQ ID NO: 104, SEQ ID NO: 106, SEQ ID NO: 108, SEQ ID NO: 110, SEQ ID NO: 112, and SEQ ID NO: 114.

According to some embodiments, a method of producing or modifying lipids and fatty acids comprises: adding to a suitable feedstock, a peptide identified herein; and collecting and/or purifying the resulting lipid or fatty acid product.

The invention also encompasses an isolated 14-24 base pair nucleic acid sequence comprising a sequence complementary to the sense or antisense sequence of a sequential 14 base pair sequence present in a sequence selected from: SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, SEQ ID NO: 27, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 33, SEQ ID NO: 35, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 41, SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 47, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 53, SEQ ID NO: 55, SEQ ID NO: 57, SEQ ID NO: 59, SEQ ID NO: 61, SEQ ID NO: 63, SEQ ID NO: 65, SEQ ID NO: 67, SEQ ID NO: 69, SEQ ID NO: 71, SEQ ID NO: 73, SEQ ID NO: 75, SEQ ID NO: 77, SEQ ID NO: 79, SEQ ID NO: 81, SEQ ID NO: 83, SEQ ID NO: 85, SEQ ID NO: 87, SEQ ID NO: 89, SEQ ID NO: 91, SEQ ID NO: 93, SEQ ID NO: 95, SEQ ID NO: 97, SEQ ID NO: 99, SEQ ID NO: 101, SEQ ID NO: 103, SEQ ID NO: 105, SEQ ID NO: 107, SEQ ID NO: 109, SEQ ID NO: 111, and SEQ ID NO: 113.

In certain aspects, the present invention is directed to the production of biofuels and biofuel precursors in transformed organisms.

As described in U.S. Publication No. 2011/0223641 to Stephanopoulos et al., biofuel and biofuel precursor production using transformed, or engineered, microorganisms is known in the art. In Stephanopoulos, the transformed microorganism, Y. lipolytica, was modified by genetic manipulation to upregulate lipid production and accumulation; the manipulation also conferred increased resistance to feedstock toxicity.

According to the results of the present disclosure, the genes isolated from Bio 17-1 are related to a lipid-accumulating phenotype. Specifically, the novel genes identified from Bio 17-1 can be introduced in specific "cassettes" that allow any transformed organism to increase lipid production, modification, purification or storage. Using methods known in the art, these genes or cassettes of genes can be transformed into organisms such as Candidatus Microthrix, Candidatus Microthrix parvicella Bio 17-1, Saccharomyces cerevisiae, Escherichia coli, cyanobacteria, and algae as these organisms are capable of being grown in liquid-phase media in large batches with high density of cells. (See for example, Kawai et al., Transformation of Saccharomyces cerevisiae and other fungi: methods and possible underlying mechanism.) The use of such organisms allows for the production, modification, or purification of large amounts of biofuel or biofuel precursors at economical costs. Other organisms would be suitable as one having skill in the art would recognize from the present disclosure.

As can be determined by one having skill in the art from the present disclosure, some modification of the genes may be necessary to produce the desired function of the genes or gene products introduced into a transformed organism. The function of the gene products is disclosed above in each section. For example, the gene products of the odd-numbered sequences of SEQ ID NO: 1 - 55 (SEQ ID NO: 1, 3, 5 . . . 55) perform the function of catalyzing the reaction: ATP + a long-chain carboxylate + CoA = AMP + diphosphate + an acyl-CoA. The functions of the other genes and gene products are set forth herein.

To ensure that the gene product performs the same function in a transformed microorganism, an assay may need to be conducted that ensures that the gene product is enzymatically active. For example, the function of long-chain fatty acid:CoA ligases can be assayed using the method set forth in Bierbach, Studies on long chain fatty acid:CoA ligase from human small intestine. Gut. 1980 Aug;21(8) 689-954. Assays for the functionality of the other above-mentioned enzymes are routine and well-known in the art.

Modifications that might be necessary to ensure functionality in a transformed organism may include: changing the nucleotide sequence to produce a homologous peptide with a different codon sequence; adding, deleting, or modifying promoter and repressor sequences depending on the specific regulation mechanisms of the transformed organism, modifying the nucleic acid sequence for greater translational or transcriptional efficiency, adding a sequence encoding a poly-adenosine tail, adding a fluorescent marker or identifying sequence, or the like. Additionally, conservative amino acid encoding substitutions may be made to create a sequence that encodes a similar peptide but with slightly increased or decreased functionality as can be readily determined by the above-mentioned assay and others well-known in the art.
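By way of illustration only, the first modification listed above (changing the nucleotide sequence while still encoding the same peptide) can be checked computationally. The following minimal Python sketch translates two coding sequences with the standard genetic code and compares the resulting peptides. The function names and the short example sequences are hypothetical and are not taken from the Sequence Listing; a host that uses a different translation table would need a different table.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    peptide = []
    for i in range(0, len(dna), 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def encodes_same_peptide(original: str, recoded: str) -> bool:
    """True if both nucleotide sequences translate to the same peptide."""
    return translate(original) == translate(recoded)


# Hypothetical example: two different codon sequences for the peptide Met-Ala-Lys.
original = "ATGGCTAAATAA"
recoded = "ATGGCCAAGTGA"
print(translate(original), encodes_same_peptide(original, recoded))
```

Such a check confirms only that the encoded peptide is unchanged; whether the recoded gene is expressed and enzymatically active in the host would still be established by the assays referred to above.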

A modified nucleic acid would comprise a sequence having preferably 80% homology to a sequence located in the sequence listing, more preferably 95% homology, and most preferably 100% homology. A transgenic cell could be created according to the above disclosures using a sequence comprising any of these sequences. Such a transgenic cell may be useful for producing biofuel or biofuel precursors by selecting an organism based on the desired growth medium (for example nutrient broth, agar, wastewater, or any other desired media) and then transforming the organism with any of the above-mentioned sequences. Additionally, the biofuel or biofuel precursor could be collected by contacting the transformed organism with the desired media and then harvesting the lipids or fatty acids produced by the transformed organism. Methods of harvesting lipids or fatty acids from transformed organisms are well-known in the art, and are discussed in U.S. Publication No. 2011/0223641 to Stephanopoulos et al.
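The disclosure does not specify how percent homology is to be calculated. As one possible reading, the minimal Python sketch below treats it as percent identity over a pair of sequences that have already been aligned to equal length, with "-" marking gaps, and reports which of the 80%, 95% and 100% tiers a candidate sequence reaches. The function names, the gap convention, and the example sequences are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns; columns that are gaps in both are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def homology_tier(pct: float) -> str:
    """Map a percent value onto the tiers named in the text."""
    if pct >= 100.0:
        return "100%"
    if pct >= 95.0:
        return "95%"
    if pct >= 80.0:
        return "80%"
    return "below 80%"


# Hypothetical aligned sequences differing at one of ten positions.
pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(pct, homology_tier(pct))
```

In practice the alignment itself would come from an alignment tool, and the choice of scoring scheme and gap treatment can change the reported percentage.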

In addition to using a transformed cell, an organism containing the above-mentioned nucleotide sequences could also be used to produce biofuel. An organism containing the above-mentioned nucleotide sequences can be readily located through PCR, LAMP, or other DNA amplification and identification techniques based on the sequences disclosed herein. For example, PCR primers typically consist of 14-24 base pair sequences that are complementary to the 3' ends of the sense and antisense strands of the desired gene. Using the sequences set forth herein, one having skill in the art would readily be able to design primers to locate the genes herein disclosed and, additionally, genes having similar functionality based on sequence modifications. For example, other Candidatus Microthrix species may have similar genes that would be useful for the production of biofuels, and these genes could be readily isolated using the disclosed sequences. Additionally, nucleotide sequences that encode the peptide sequences described could also be used to design primers that will locate genes having a conserved amino acid sequence but containing conservative mutations.
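The primer geometry described above can be sketched as follows. This minimal Python example takes the sense strand of a target gene (written 5' to 3') and returns a forward primer that matches the start of the sense strand (and therefore anneals to the 3' end of the antisense strand) and a reverse primer that is the reverse complement of the end of the sense strand (and therefore anneals to its 3' end). The function names and the example sequence are hypothetical; practical primer design would also weigh melting temperature, GC content, and self-complementarity, none of which is modelled here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primer_pair(sense: str, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers of the given length, both written 5' to 3'."""
    if not 14 <= length <= 24:
        raise ValueError("primer length must be between 14 and 24 bases")
    if len(sense) < 2 * length:
        raise ValueError("target is too short for two non-overlapping primers")
    forward = sense[:length]                       # identical to the 5' end of the sense strand
    reverse = reverse_complement(sense[-length:])  # complementary to the 3' end of the sense strand
    return forward, reverse


# Hypothetical 42-base target, not taken from the Sequence Listing.
target = "ATGGCTAAACCGGTTCTGACCGGTAGCATTGACCTGAAATAA"
forward, reverse = design_primer_pair(target, length=14)
print(forward, reverse)
```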

In some embodiments, the present invention is directed to the production of biofuels using peptide enzymes. The peptide sequences disclosed herein (and the modified versions disclosed having 80%, 95%, and 100% homology) may be used in vitro to produce biofuel or biofuel precursors. In vitro biofuel production has several advantages over in vivo production, not least of which is not having to protect the in vivo organism from disease or deadly changes in environmental conditions. Therefore, the peptides and modified versions disclosed may be used to produce biofuel or biofuel precursors by adding them to a suitable feedstock (such as wastewater, a lipid or fatty acid-enriched medium, or other ligand, substrate, or product of the disclosed enzymes, or other such material containing a suitable chemical). Thereafter, the resulting product of the reaction may be collected, or collected and purified, to result in a biofuel or biofuel precursor.

Related publications containing teachings that may be used in practicing the present invention include U.S. Publication No. 2012/01599839 to Koskinen et al., U.S. Publication No. 2005/0112735 to Zappi et al., and U.S. Publication No. 2011/0223641 to Stephanopoulos et al.

U.S. Publication No. 2012/01599839 to Koskinen et al. discloses an integrated process for producing biofuels using various feedstock materials using microorganisms that may directly process the feedstock into biofuels or biofuel precursors, or may produce enzymes useful in converting feedstock into biofuels or biofuel precursors.

U.S. Publication No. 2005/0112735 to Zappi et al. describes a method for producing biofuels in large-scale commercial amounts using lipids extracted from sludge generated from wastewater treatment. The lipid extraction is performed in vitro by chemical extraction, and biofuel is generated by transesterification of the collected lipids.

U.S. Publication No. 2011/0223641 to Stephanopoulos et al. describes biofuel and biofuel precursor production using transformed, or engineered, microorganisms. The transformed microorganism, Y. lipolytica, was modified by genetic manipulation to upregulate lipid production and accumulation; the manipulation also conferred increased resistance to feedstock toxicity.

The present invention is further illustrated by the following examples that should not be construed as limiting. The contents of all references, patents, and published patent applications cited throughout this application, as well as the Figures, are incorporated herein by reference in their entirety for all purposes.

EXAMPLES

Example 1. Genome Sequence of "Candidatus Microthrix parvicella" Bio 17-1, a Long-Chain-Fatty-Acid-Accumulating Filamentous Actinobacterium from a Biological Wastewater Treatment Plant

Methods

A draft genome sequence of 'Candidatus M. parvicella' Bio 17-1 was produced and genes related to lipid storage and processing were identified according to the following method.

The Bio 17-1 strain was isolated from a Dutch wastewater treatment plant serving fish industries. First, two sequencing libraries were prepared with mean insert lengths of 350 base pairs (bp) (paired-end) or 2,750 bp (mated pairs) and sequenced on an Illumina Genome Analyzer II. Raw 100-bp reads were error corrected with Quake (Kelley et al. 2010. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11:R116). A total of 5.84 x 10^6 paired and 1.12 x 10^6 single-end reads with a minimum mean quality value (QV) of 30 and a minimum length of 70 bp were used for assemblies. Second, 24,031 SMRT sequence reads were obtained on a Pacific Biosciences PacBio RS using C1 chemistry. Error correction yielded 2,625 reads (232-1984 bp) from this run.
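The read-selection criteria stated above (minimum mean QV of 30 and minimum length of 70 bp) can be expressed as a simple filter. The following Python sketch is illustrative only: it assumes FASTQ-style quality strings in Phred+33 encoding, which the text does not state (older Illumina pipelines used other offsets), and the function names are hypothetical. It is not the filtering code used to produce the assemblies.

```python
def mean_quality(quality_string: str, offset: int = 33) -> float:
    """Mean Phred quality of a read, given its ASCII-encoded quality string."""
    if not quality_string:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def passes_filter(
    sequence: str,
    quality_string: str,
    min_mean_qv: float = 30.0,
    min_length: int = 70,
    offset: int = 33,
) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    if len(sequence) < min_length:
        return False
    return mean_quality(quality_string, offset) >= min_mean_qv


# Hypothetical reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(passes_filter("A" * 100, "I" * 100))  # long, high quality
print(passes_filter("A" * 60, "I" * 60))    # too short
print(passes_filter("A" * 100, "5" * 100))  # quality too low
```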

Using the Illumina sequence reads, two preliminary assemblies were obtained with Velvet (Zerbino et al. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18:821-829) and Edena (Hernandez et al. 2008. De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer. Genome Res.), and merged with the minimus2 utility (Treangen et al. 2011. Next generation sequence assembly with AMOS. Curr Protoc Bioinformatics Chapter 11:Unit 11.8). The resulting 27 contigs were scaffolded with SSPACE (Boetzer et al. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578-579) and gaps were filled with GapFiller (Boetzer et al. 2012. Toward almost closed genomes with GapFiller. Genome Biology 13:R56). Additional assemblies were obtained using SOAPdenovo (Li et al. 2009. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res.) (kmer values between 65 and 81, in steps of 2) and CABOG (Miller et al. 2008. Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 24:2818-2824). Error-corrected PacBio reads (Koren et al. 2012. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nature Biotechnology 30:693-700) and the additional assemblies were mapped onto the preliminary assemblies. Draft contigs were broken where discrepancies among assemblies or PacBio reads suggested misassemblies. Conversely, contigs were joined where contig ends overlapped with perfect identity for at least 500 bp. Manual curation and manipulation of the assemblies were performed using consed (Gordon et al. 1998. Consed: a graphical tool for sequence finishing. Genome Res. 8:195-202). Automatic annotation and draft metabolic reconstruction were performed by the RAST server (Aziz et al. 2008. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9:75). CRISPR loci were identified using CRISPRFinder (Grissa et al. 2007. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Research 35:W52-7).
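The contig-joining rule stated above (join two contigs where their ends overlap with perfect identity for at least 500 bp) can be sketched as follows. This Python example is a simplified illustration under stated assumptions: it considers only same-strand overlaps between the end of one contig and the start of another, uses a plain quadratic search, and its function names are hypothetical. The curation described above was performed manually in consed, not with this code.

```python
from typing import Optional


def exact_end_overlap(left: str, right: str, min_overlap: int = 500) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(left), len(right))
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_contigs(left: str, right: str, min_overlap: int = 500) -> Optional[str]:
    """Merge two contigs across a perfect end overlap, or return None if there is none."""
    k = exact_end_overlap(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


# Hypothetical contigs sharing an exact 500-base overlap.
shared = "ACGT" * 125
left_contig = "GATTACA" + shared
right_contig = shared + "CCCCC"
merged = join_contigs(left_contig, right_contig)
print(len(merged))
```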

Results

The draft assembly consists of 4,202,850 bp, arranged in 13/16 scaffolds/contigs, with a mean GC% of 66.4. Automated annotation identified 4,063 coding sequences, in addition to 1 rRNA operon and 46 tRNAs covering all amino acids. A complete pentose phosphate pathway and TCA cycle are encoded in the genome. As previously hypothesized for 'M. parvicella' strain RN1, a nitrate reductase is encoded by the genome, but no nitrite reductase appears to be present. The strain is also predicted to be a prototroph for all amino acids and to be able to polymerize/depolymerize polyhydroxybutyrate. No genes are annotated as related to photosynthesis, plasmid, prophage or phage elements. The draft sequence contains one CRISPR locus with 88 spacers.
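Summary statistics of the kind reported above (total assembly length and mean GC%) can be computed directly from the contig sequences. The short Python sketch below shows one way to do so; it assumes that only unambiguous bases (A, C, G, T) count toward the GC percentage, which is a common convention but is not stated in the text, and the function names and toy contigs are hypothetical.

```python
def assembly_length(contigs: list[str]) -> int:
    """Total number of bases across all contigs."""
    return sum(len(contig) for contig in contigs)


def gc_percent(contigs: list[str]) -> float:
    """GC percentage over all unambiguous bases (A, C, G, T) in the contigs."""
    gc = 0
    acgt = 0
    for contig in contigs:
        seq = contig.upper()
        gc_here = seq.count("G") + seq.count("C")
        gc += gc_here
        acgt += gc_here + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / acgt


# Hypothetical toy contigs, not the Bio 17-1 assembly.
toy_contigs = ["GGCC", "AT"]
print(assembly_length(toy_contigs), round(gc_percent(toy_contigs), 1))
```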

The ability of 'M. parvicella' Bio 17-1 to accumulate excessive amounts of fatty acids is also highlighted by its gene content. The genome encodes 28 homologs of long-chain-fatty-acyl-CoA ligase and 17 homologs of enoyl-CoA hydratase. The genetic inventory of 'M. parvicella' makes it of particular interest for future wastewater treatment strategies based around the comprehensive reclamation of nutrients and chemical energy-rich biomolecules.

The genome sequence of "Candidatus Microthrix parvicella" strain Bio 17-1 has been deposited at DDBJ/EMBL/GenBank under accession number AMPG00000000; the version described in this paper is the first version, AMPG01000000. A provisional annotation is available upon request. Raw sequence reads were deposited in the Sequence Read Archive under accession number SRA058866.

The following are classifications of the identified enzymes. Nucleic acid sequences are listed in the Sequence Listing and the resulting protein encoded by each nucleic acid sequence is listed immediately after the nucleic acid sequence which encodes it.
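The numbering convention described above, in which each nucleic acid sequence is followed immediately by the protein it encodes, implies that odd-numbered SEQ ID NOs are nucleic acids and the next even number is the corresponding protein. The Python sketch below makes that mapping explicit and checks that the odd-numbered SEQ ID NOs 1 through 55 account for 28 sequences. The function name is hypothetical, and the upper bound of 113 is taken from the highest odd-numbered SEQ ID NO recited herein.

```python
def protein_seq_id(nucleic_seq_id: int) -> int:
    """Return the SEQ ID NO of the protein encoded by an odd-numbered nucleic acid SEQ ID NO."""
    if nucleic_seq_id % 2 == 0 or not 1 <= nucleic_seq_id <= 113:
        raise ValueError("nucleic acid SEQ ID NOs are the odd numbers from 1 to 113")
    return nucleic_seq_id + 1


# Odd-numbered SEQ ID NOs 1-55, described herein as the long-chain-fatty-acid CoA ligase genes.
LIGASE_NUCLEIC_IDS = list(range(1, 56, 2))

print(len(LIGASE_NUCLEIC_IDS))
print([(n, protein_seq_id(n)) for n in LIGASE_NUCLEIC_IDS[:3]])
```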

Long-chain-fatty-acid-CoA ligase (EC 6.2.1.3)

According to the IUBMB nomenclature, these enzymes catalyze the reaction:

ATP + a long-chain carboxylate + CoA = AMP + diphosphate + an acyl-CoA

The 'Candidatus Microthrix parvicella' Bio 17-1 genome contains 28 homologues of this enzyme (Figure 1 and Table 1).

Table 1:

SEQ ID NO: 13    fig|340363.9.peg.1168    (Mparv_1168)
SEQ ID NO: 15    fig|340363.9.peg.1597    (Mparv_1597)
SEQ ID NO: 17    fig|340363.9.peg.1683    (Mparv_1683)
SEQ ID NO: 19    fig|340363.9.peg.1808    (Mparv_1808)
SEQ ID NO: 21    fig|340363.9.peg.1904    (Mparv_1904)
SEQ ID NO: 23    fig|340363.9.peg.2090    (Mparv_2090)
SEQ ID NO: 25    fig|340363.9.peg.2103    (Mparv_2103)
SEQ ID NO: 27    fig|340363.9.peg.2191    (Mparv_2191)
SEQ ID NO: 29    fig|340363.9.peg.2242    (Mparv_2242)
SEQ ID NO: 31    fig|340363.9.peg.2435    (Mparv_2435)
SEQ ID NO: 33    fig|340363.9.peg.2685    (Mparv_2685)
SEQ ID NO: 35    fig|340363.9.peg.2716    (Mparv_2716)
SEQ ID NO: 37    fig|340363.9.peg.2732    (Mparv_2732)
SEQ ID NO: 39    fig|340363.9.peg.2734    (Mparv_2734)
SEQ ID NO: 41    fig|340363.9.peg.2828    (Mparv_2828)
SEQ ID NO: 43    fig|340363.9.peg.2932    (Mparv_2932)
SEQ ID NO: 45    fig|340363.9.peg.2979    (Mparv_2979)
SEQ ID NO: 47    fig|340363.9.peg.3191    (Mparv_3191)
SEQ ID NO: 49    fig|340363.9.peg.3315    (Mparv_3315)
SEQ ID NO: 51    fig|340363.9.peg.3388    (Mparv_3388)
SEQ ID NO: 53    fig|340363.9.peg.3405    (Mparv_3405)
SEQ ID NO: 55    fig|340363.9.peg.3638    (Mparv_3638)

Enoyl-CoA hydratase (EC 4.2.1.17)

According to the IUBMB nomenclature, these enzymes catalyze the reaction:

(3S)-3-hydroxyacyl-CoA = trans-2(or 3)-enoyl-CoA + H2O

The 'Candidatus Microthrix parvicella' Bio 17-1 genome contains 17 homologues of this enzyme (Figure 2 and Table 2).

Table 2:

Lipase (EC 3.1.1.5 and/or EC 3.1.1.23)

According to the IUBMB nomenclature, these enzymes hydrolyze glycerol monoesters of long-chain fatty acids (Table 3).

Table 3:

3-ketoacyl-CoA thiolase (EC 2.3.1.16)

According to the IUBMB nomenclature, these enzymes catalyze the reaction:

acyl-CoA + acetyl-CoA = CoA + 3-oxoacyl-CoA

The 'Candidatus Microthrix parvicella' Bio 17-1 genome contains 8 homologues of this enzyme (Table 4).

Table 4:

Acyl-CoA thioesterase (EC 3.1.2.-)

This enzyme is predicted by automatic annotation to catalyze the reaction:

acyl-CoA = CoA + free fatty acid

The 'Candidatus Microthrix parvicella' Bio 17-1 genome contains 1 type of this enzyme (Table 5).

Table 5:

3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35)

According to the IUBMB nomenclature, this enzyme catalyzes the reaction:

(S)-3-hydroxyacyl-CoA + NAD+ = 3-oxoacyl-CoA + NADH + H+

The 'Candidatus Microthrix parvicella' Bio 17-1 genome contains 1 type of this enzyme (Table 6).

Table 6:

The protein products of the above mentioned genes were also determined. These proteins are listed in the Sequence Listing where each protein sequence follows immediately the nucleic acid sequence that encodes it.

Example 2. Systematic Molecular Measurements Reveal Key Microbial Populations Driving Community-Wide Phenotype

Introduction

Natural microbial communities are heterogeneous and dynamic. Therefore, a major consideration for multi-omic studies is sample-to-sample heterogeneity, which can lead to inconsistent results if the different biomolecular fractions are obtained from distinct subsamples. Conversely, systematic omic measurements, i.e. the standardized, reproducible and simultaneous measurement of multiple features from a single undivided sample, result in fully integrable datasets.

Our objective was to demonstrate the feasibility and benefits of such systematic measurements in the study of the respective contributions of different populations to a community-wide phenotype (here, lipid accumulation by the microbial community naturally present at the air-water interface of certain biological wastewater treatment systems).

Methods

The experimental methods used for this study are outlined in Figure 3.

Results

The lipid-accumulating community was studied in quadruplicate at 4 different time points in terms of composition (Figure 4a,b), diversity, evenness (Figure 4c) and long-chain fatty acid content. It was observed that biological replicates sampled on the same day are highly heterogeneous, and that community structure changes significantly over time, leading to a switch in the dominant organism. Biomass lipid accumulation increased when Microthrix spp. dominated the community (this also corresponded with low wastewater temperature). During the transition, there was a time point (25/01/11) when the diversity and the evenness of the community were maximal. This time point was chosen to show the benefits of systematic measurements for the understanding of a community-wide phenotype (see Figure 4).
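The text does not state which diversity and evenness indices were used. As an illustration of how such measures can be derived from population abundances, the following Python sketch computes the Shannon diversity index and Pielou's evenness, which are assumed here as common choices. The function names and the abundance counts are hypothetical and are not data from this study.

```python
import math


def shannon_diversity(counts: list[float]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over populations with non-zero abundance."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts: list[float]) -> float:
    """Pielou's evenness J' = H' / ln(S), where S is the number of populations present."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0
    return shannon_diversity(counts) / math.log(richness)


# Hypothetical abundance profiles: one dominated by a single population, one perfectly even.
dominated = [97, 1, 1, 1]
even = [25, 25, 25, 25]
print(round(pielou_evenness(dominated), 3), round(pielou_evenness(even), 3))
```

Under these definitions, a community dominated by one population scores low on both measures, whereas a community in transition between dominant organisms would be expected to score higher.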

The abundance of a population identified as Zoogloea spp. with a confidence above 80% correlated with the abundance of numerous nonpolar metabolites, some of them identified as long-chain fatty acids. Zoogloea species accumulate intracellular non-polar granules. Correlations between bacterial abundances and metabolite abundances allowed the identification of low abundance species playing an essential role in the community as so called "keystone species" (see Figure 5).
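The correlation analysis described above can be illustrated with a simple calculation. The Python sketch below computes the Pearson correlation coefficient between the abundance of one population and the abundance of one metabolite across samples. The choice of Pearson correlation is an assumption, since the text does not name the statistic used, and the function name and numerical values are hypothetical placeholders rather than measurements from this study.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-sample abundances (arbitrary units) for one population and one metabolite.
population_abundance = [1.0, 2.0, 3.0, 4.0]
metabolite_abundance = [2.1, 3.9, 6.2, 7.8]
print(round(pearson_r(population_abundance, metabolite_abundance), 3))
```

With many populations and metabolites, each pair would be tested in this way and the resulting coefficients corrected for multiple comparisons before any population is interpreted as a candidate keystone species.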

Subset 08 was identified as a Microthrix population following comparison of the protein sequence with the proteome sequences of an isolate genome (see Figures 6-8).

Example 3. Linking Mixed Microbial Community Phenotype to Individual Genotypes

Introduction

Biological wastewater treatment is arguably the most widely used biotechnological process on Earth. Wastewater also represents a valuable energy commodity that is currently not being harnessed comprehensively. Mixed microbial communities that naturally occur at the air-water interface of certain biological wastewater treatment systems accumulate excess long-chain fatty acids intracellularly (see Figures 15-17). This phenotypic trait may potentially be exploited for the transformation of wastewater into biodiesel (fatty acid methyl esters).

Using a molecular Eco-Systems Biology approach, we studied which organisms and genes contribute to the community-wide lipid accumulation phenotype and, thus, overall community function.

Methods

The experimental methods used for this study are outlined in Figure 9.

Results

It was observed that the community structure changes significantly over time, leading to a switch in the dominant organism, and biological replicates sampled the same day are highly heterogeneous. Biomass lipid accumulation was maximal when Microthrix spp. dominated the community (this corresponded with cold wastewater temperature) (see Figure 10).

Four species of low abundance (see Figure 10b) predominated among those producing non-polar metabolites and, in particular, long-chain fatty acids, namely Rhizobacter, Blautia, Hydrotalea and OD1 (see Figure 11).

Microthrix spp. were key players within the community in the lipid-accumulating phenotype (see Figure 13). Sequencing of the Microthrix parvicella Bio 17-1 genome uncovered 25 homologs of long-chain-fatty-acid acyl-CoA ligase (see Figures 11-12 and 18).

Conclusion

Functional meta-omic analyses offer exciting prospects for elucidating the genetic blueprints and the functional relevance of specific populations within microbial communities (if the biomolecular fractions used for high-throughput omics are coming from a single undivided sample). Connecting the overall community phenotype to specific genotypes allows much needed fundamental ecological understanding of microbial community and population dynamics, particularly in relation to environment-driven demography changes leading to tipping points and catastrophic bifurcations.

Unless defined otherwise, all technical and scientific terms herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials, similar or equivalent to those described herein, can be used in the practice or testing of the present invention, the preferred methods and materials are described herein. All publications, patents, and patent publications cited are incorporated by reference herein in their entirety for all purposes.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth and as follows in the scope of the appended claims.