METHOD FOR OBTAINING INTEGRATED GENOMIC, TRANSCRIPTOMIC, PROTEOMIC AND/OR METABOLOMIC INFORMATION FROM A SINGLE UNIQUE BIOLOGICAL SAMPLE

Document Type and Number:

WIPO Patent Application WO/2013/113921

Abstract:

The invention provides a method for the generation of integrated genomic, transcriptomic, proteomic and/or metabolomic information from single unique biological samples. It includes the separation and purification of biomolecular components including polar and non-polar metabolites, genomic DNA, RNA and proteins from said samples. The invention allows the identification of associations between different biomolecules and genetic variants in heterogeneous cell populations. It further allows the establishment and quantification of cell population profiles in heterogeneous cell populations and/or in populations where the genomic context is unknown.

Inventors:

WILMES PAUL (LU)
VLASSIS NIKOLAOS (LU)

Application Number:

PCT/EP2013/052134

Publication Date:

August 08, 2013

Filing Date:

February 04, 2013

Assignee:

UNIV LUXEMBOURG (LU)

International Classes:

G01N33/68; C12N15/10

Domestic Patent References:

WO2003058238A1	2003-07-17
WO2013029919A1	2013-03-07

Foreign References:

EP1327883A2	2003-07-16

Other References:

LAUGHARN JAMES ET AL: "Novel technology used in cancer sample preparation", PROCEEDINGS OF THE AMERICAN ASSOCIATION FOR CANCER RESEARCH, AACR, US, vol. 50, 22 April 2009 (2009-04-22), pages 798 - 799, XP009163540, ISSN: 0197-016X
ROUME HUGO ET AL: "A biomolecular isolation framework for eco-systems biology", ISME JOURNAL, vol. 7, no. 1, January 2013 (2013-01-01), pages 110 - 121, XP009168480, ISSN: 1751-7362, DOI: 10.1038/ismej.2012.72
GROSS V; CARLSON G; KWAN AT; SMEJKAL G; FREEMAN E; IVANOV AR ET AL.: "Tissue fractionation by hydrostatic pressure cycling technology: the unified sample preparation technique for systems biology studies", J BIOMOL TECH, vol. 3, 2008, pages 189 - 199

Attorney, Agent or Firm:

LECOMTE & PARTNERS (Luxembourg, LU)

Claims:

1. A method for obtaining integrated genomic, transcriptomic, proteomic and metabolomic data from a biological sample characterised by the steps of:

- providing a single unique sample comprising heterogeneous cell populations and biomolecular components, the biomolecular components comprising metabolites, genomic DNA, RNA and proteins;

- separating and purifying the biomolecular components from the biological sample, which separating and purifying comprises performing mechanical homogenization and partial lysis of the biological sample such that only a partial amount of the cells are lysed;

- performing genomic, transcriptomic, proteomic and metabolomic analyses to identify and/or quantify said biomolecular components, resulting in molecular data represented in at least one dataset;

- performing analysis and/or modeling of the molecular data.

2. The method as claimed in claim 1 characterised in that the step of separating and purifying biomolecular components comprises separating and purifying both polar and non-polar metabolites and/or a step of isolation of large and small RNA.

3. The method as claimed in claim 2 characterised in that the transcriptomic analyses include a sequencing of small RNA in order to obtain information including microRNA.

4. The method as claimed in any of claims 1 to 3 characterised in that the step of performing genomics includes obtaining epigenomic data.

5. The method as claimed in any of claims 1 to 4 characterised in that the step of performing genomics further includes performing SNP genotyping to obtain SNP patterns.

6. The method as claimed in any of claims 1 to 5 characterised in that the step of performing analysis and/or modeling of the molecular data includes a data preparation step, with preference said data preparation step includes at least one of (a) performing data filtering according to certain cut-offs, (b) performing missing value estimation (c) performing variable selection, and (d) any combination thereof.

7. The method as claimed in any of claims 1 to 6 characterised in that the step of performing analysis and/or modeling of the molecular data includes at least one of (a) calculating linear or non-linear correlations between one or more of the datasets and/or variables identified within the datasets with preference by use of one or more correlation coefficients; (b) performing clustering; (c) performing dimension reduction with methods selected from principal component analysis (PCA), independent component analysis (ICA), singular value decomposition (SVD) and/or multidimensional scaling (MDS); and (d) any combination thereof.

8. The method as claimed in any of claims 1 to 7 characterised in that the step of performing analysis and/or modeling of the molecular data includes determining relationships between one or more of the datasets and/or variables identified within the datasets using alternative dependence measures with preference selected from mutual information and/or partial correlation.

9. The method as claimed in any of claims 1 to 8 characterised in that the step of performing analysis and/or modeling of the molecular data includes determining relationships between one or more of the datasets and/or variables identified within the datasets using one or more of (a) probabilistic models for capturing the dependencies among sets of variables with preference including determining probabilistic graphical models and/or copulas, (b) estimation techniques selected from likelihood maximisation and convex programming.

10. The method as claimed in any of claims 1 to 9 characterised in that before the step of separating and purifying biomolecular components; the single biological sample is collected and immediately snap-frozen at a temperature of below -70°C and with preference at a preferred temperature of -196°C, and cryopreserved at said temperature until said step of separating and purifying biomolecular components.

11. The method as claimed in any of claims 1 to 10 characterised in that the mechanical homogenization and partial lysis in the step of separating and purifying of biomolecular components is followed by a second mechanical lysis step in which all remaining intact cells are lysed.

12. The method as claimed in any of claims 1 to 10 characterised in that the step of separating and purifying of biomolecular components from a single biological sample, comprises the following sub-steps:

a) performing a metabolite extraction on the homogenized single unique biological sample following the mechanical homogenization and partial lysis by addition of a phase separation solution, and centrifugation to form an upper phase, an interphase pellet and a lower phase; such that polar metabolites are in the upper phase, genomic DNA, RNA and proteins and the remaining cells not lysed by the mechanical lysis are in the interphase pellet, and non-polar metabolites are in the lower phase;

b) collecting separately the upper phase, the lower phase and the interphase pellet;

c) adding a lysis solution to the collected interphase pellet to perform a chemical lysis or a combined mechanical and chemical lysis in order to obtain a lysate;

d) performing a sequential isolation of genomic DNA, RNA and proteins on the lysate.

13. The method as claimed in any of claims 1 to 12 characterised in that the partial lysis is halted when about 30 to 80 % and preferably about 50 % of cells have been lysed.

14. The method as claimed in any of claims 1 to 13 characterised in that the steps of separating and purifying biomolecular components comprising metabolites, genomic DNA, RNA and proteins from a single biological sample; and the step of performing genomic, transcriptomic, proteomic and metabolomic analyses to identify and/or quantify said biomolecular components; are carried out taking several temporally and/or spatially-resolved samples.

15. The method as claimed in any of claims 1 to 14 characterised in that the heterogeneous cell populations are selected from tumour cells and/or mixed microbial communities.

16. The method as claimed in claim 15 characterised in that it further comprises the step of comparing molecular data to databases of biological profiles to establish and/or quantify cell population profiles.

Description:

The invention relates to methods for generating, analysing and/or modeling data in the framework of integrated molecular systems biology investigations of biological samples. The invention also relates to methods for generating data representative of metabolite, protein, RNA and DNA components from single unique biological samples.

Heterogeneous cell populations are found in all biological systems, e.g. in mixed microbial communities or in cancerous human tumours. At the present time, it is not possible to study such heterogeneous cell populations in an integrated way to fully understand the biology at the system-level or to establish multifactorial molecular profiles (biomarkers) of such heterogeneous cell populations.

An important consideration for such integrated analyses is the need to obtain comprehensive and representative biomolecular fractions of DNA, RNA, proteins and metabolites which can then be analysed using dedicated instrumentation, so that the resulting omic data can be meaningfully analysed and modelled to provide systems-level information on the role of individual cell populations in the original sample. The main consideration is then the need to obtain genomic, transcriptomic, proteomic and metabolomic data from single unique samples. Indeed molecular data obtained from different sub-samples from heterogeneous biological samples is hampered:

- firstly by the uncertainty of whether the resulting disjointed datasets actually represent the overall cellular states, and

- secondly by the uncertainty of whether the individual omic datasets actually are representative of the underlying genetic information.

A method for combined metabolomic, proteomic and transcriptomic analysis from one single sample and suitable statistical evaluation and correlation analysis is known from EP1327883. According to this method, at least two compound classes from the group of metabolites, proteins and RNA are extracted from the same sample, identified and quantified. The data obtained are then combined to perform network analysis. However, this document does not disclose how to provide concomitant data useful for genomic analysis from the same single biological sample. Access to concomitant genomic (genetic) information is of paramount importance when studying heterogeneous cell populations, as the genetic context is in general unknown in such systems. For example, the genomic information will be largely unknown for a mixed microbial community sampled from any eco-system. Such genomic information is often also unknown in cancer tumours, i.e. both the extent of genetic heterogeneity resulting from an accumulation of mutations and the specific types of mutations are often unknown.

The mere study of mRNA through transcriptomic techniques is not sufficient to establish good population profiles. Firstly, it is not sufficient because not all RNA variant transcripts show the same transcription levels. Consequently, it is not possible to get information on the relative importance of each cell population in the sample or of the different mutations. Secondly, it is not sufficient because merely studying the mRNA transcripts does not provide information about the regulation of transcription, including information about the abundance of regulatory elements such as microRNA, or mutations in regulatory genomic regions. Therefore, there is also a need for the person skilled in the art to get access to the small RNA fraction, including tRNA and small non-coding RNA including microRNA, to allow their integration.

Therefore, there is a need for the person skilled in the art to combine genomic, transcriptomic, proteomic and metabolomic analyses carried out on single unique samples. There is also a need for the person skilled in the art to establish and/or quantify cell population profiles particularly in samples where cell populations are heterogeneous and/or the genomic context is unknown.

The invention solves at least one problem encountered in the prior art by providing the skilled person with a method to obtain integrative molecular data from biomolecular components including genomic DNA, RNA, proteins and metabolites from single unique samples to identify meaningful associations between different biomolecules and/or genetic variants in heterogeneous cell populations.

According to a first embodiment, the invention relates to a method for obtaining integrated genomic, transcriptomic, proteomic and metabolomic data from a biological sample characterised by the steps of:

- providing a single unique sample comprising heterogeneous cell populations and biomolecular components, the biomolecular components comprising metabolites, genomic DNA, RNA and proteins;

- separating and purifying the biomolecular components from the biological sample, which separating and purifying comprises performing mechanical homogenization and partial lysis of the biological sample such that only a partial amount of the cells are lysed;

- performing genomic, transcriptomic, proteomic and metabolomic analyses to identify and/or quantify said biomolecular components, resulting in molecular data represented in at least one dataset;

- performing analysis and/or modeling of the molecular data.

Preferably, the step of separating and purifying biomolecular components comprises separating and purifying both polar and non-polar metabolites and/or a step of isolation of large and small RNA.

Preferably, the transcriptomic analyses include a sequencing of small RNA in order to obtain information including microRNA.

Preferably, the step of performing genomics includes obtaining epigenomic data.

Preferably, the step of performing genomics further includes performing SNP genotyping to obtain SNP patterns.

Preferably, the step of performing analysis and/or modeling of the molecular data includes a data preparation step, with preference said data preparation step includes at least one of (a) performing data filtering according to certain cut-offs, (b) performing missing value estimation (c) performing variable selection, and (d) any combination thereof.
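The data preparation step described above can be illustrated with a minimal sketch. It assumes a small feature-by-sample table in which missing measurements are stored as `None`. The function names, the cut-off value, and the choice of mean imputation and variance ranking are illustrative assumptions and are not prescribed by the method.

```python
from statistics import mean, pvariance


def filter_features(table, max_missing_fraction=0.5):
    """Drop features whose fraction of missing values exceeds the cut-off."""
    kept = {}
    for name, values in table.items():
        missing = sum(v is None for v in values)
        if missing / len(values) <= max_missing_fraction:
            kept[name] = values
    return kept


def impute_missing(table):
    """Replace each missing value by the mean of the observed values
    of the same feature (one simple missing value estimation)."""
    imputed = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        fill = mean(observed)
        imputed[name] = [fill if v is None else v for v in values]
    return imputed


def select_variables(table, top_k):
    """Keep the top_k features with the largest variance across samples."""
    ranked = sorted(table, key=lambda name: pvariance(table[name]), reverse=True)
    return {name: table[name] for name in ranked[:top_k]}


# Hypothetical abundances of four features measured in four samples.
raw = {
    "metabolite_A": [1.0, 2.0, None, 3.0],
    "protein_B": [5.0, 5.0, 5.0, 5.0],
    "transcript_C": [None, None, None, 4.0],
    "transcript_D": [0.0, 10.0, 0.0, 10.0],
}

prepared = select_variables(impute_missing(filter_features(raw)), top_k=2)
```

In this toy table, `transcript_C` is removed by the filter because three of its four values are missing, the gap in `metabolite_A` is filled with the mean of its observed values, and the constant `protein_B` is discarded by the variance-based selection.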

Preferably, the step of performing analysis and/or modeling of the molecular data includes at least one of (a) calculating linear or non-linear correlations between one or more of the datasets and/or variables identified within the datasets with preference by use of one or more correlation coefficients; (b) performing clustering;

(c) performing dimension reduction with methods selected from principal component analysis (PCA), independent component analysis (ICA), singular value decomposition (SVD) and/or multidimensional scaling (MDS); and (d) any combination thereof.
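As an illustration of option (a), the sketch below computes three common correlation coefficients between two variables taken from different datasets. The variable names and the example values are hypothetical. The Kendall variant shown is tau-a, which is one of several definitions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (linear association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical transcript and protein abundances across five samples,
# related monotonically but not linearly.
transcript = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [1.0, 8.0, 27.0, 64.0, 125.0]
```

For this monotone but non-linear pair, the rank-based coefficients (Spearman and Kendall) indicate a perfect association, while the Pearson coefficient stays below 1 because it only measures linear dependence.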

Preferably, the step of performing analysis and/or modelling of the molecular data includes determining relationships between one or more of the datasets and/or variables identified within the datasets using alternative dependence measures with preference selected from mutual information and/or partial correlation.
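The two dependence measures named above can be sketched as follows. This is a simplified illustration under stated assumptions: the partial correlation conditions on a single third variable (the first-order formula), and the mutual information estimate assumes variables that have already been discretised into categories. All names and data are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y given z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def mutual_information(x, y):
    """Mutual information in bits between two discrete variables,
    estimated from observed frequencies."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


# Hypothetical confounder z driving both x and y; the residual parts of
# x and y are constructed to be uncorrelated with each other and with z.
z = [1.0, 1.0, -1.0, -1.0]
x = [3.0, 1.0, -1.0, -3.0]
y = [3.0, 1.0, -3.0, -1.0]

# Hypothetical discretised states (e.g. low = 0, high = 1).
state_a = [0, 0, 1, 1]
state_b = [0, 0, 1, 1]
state_c = [0, 1, 0, 1]
```

Here `x` and `y` show a pairwise correlation of 0.8, yet their partial correlation given `z` is essentially zero, which is the confounding situation described above. `state_a` and `state_b` share one bit of information, while `state_a` and `state_c` share none.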

Preferably, the step of performing analysis and/or modeling of the molecular data includes determining relationships between one or more of the datasets and/or variables identified within the datasets using one or more of (a) probabilistic models for capturing the dependencies among sets of variables with preference including determining probabilistic graphical models and/or copulas, (b) estimation techniques selected from likelihood maximisation and convex programming.

Preferably, before the step of separating and purifying biomolecular components, the single biological sample is collected and immediately snap-frozen at a temperature of below -70°C and with preference at a preferred temperature of -196°C, and cryopreserved at said temperature until said step of separating and purifying biomolecular components.

Preferably, the mechanical homogenization and partial lysis in the step of separating and purifying of biomolecular components is followed by a second mechanical lysis step in which all remaining intact cells are lysed.

Preferably, the step of separating and purifying of biomolecular components from a single biological sample, comprises the following sub-steps:

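a) performing a metabolite extraction on the homogenized single unique biological sample following the mechanical homogenization and partial lysis by addition of a phase separation solution, and centrifugation to form an upper phase, an interphase pellet and a lower phase; such that polar metabolites are in the upper phase, genomic DNA, RNA and proteins and the remaining cells not lysed by the mechanical lysis are in the interphase pellet, and non-polar metabolites are in the lower phase;

b) collecting separately the upper phase, the lower phase and the interphase pellet;

c) adding a lysis solution to the collected interphase pellet to perform a chemical lysis or a combined mechanical and chemical lysis in order to obtain a lysate;

d) performing a sequential isolation of genomic DNA, RNA and proteins on the lysate.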

Preferably, the partial lysis is halted when about 30 to 80 % and preferably about 50 % of cells have been lysed.

Preferably, the steps of separating and purifying biomolecular components comprising metabolites, genomic DNA, RNA and proteins from a single biological sample; and the step of performing genomic, transcriptomic, proteomic and metabolomic analyses to identify and/or quantify said biomolecular components; are carried out taking several temporally and/or spatially-resolved samples.

Preferably, the heterogeneous cell populations are selected from tumour cells and/or mixed microbial communities.

Preferably, the method further comprises the step of comparing molecular data to databases of biological profiles to establish and/or quantify cell population profiles.

According to a second embodiment, the invention relates to a method for obtaining integrated genomic, transcriptomic, proteomic and/or metabolomic data from single unique biological samples that entails the steps of:

- separating and purifying biomolecular components including metabolites, genomic DNA, RNA and proteins from single unique biological samples;

- performing at least one of genomic, transcriptomic, proteomic and/or metabolomic analyses to identify and/or quantify said biomolecular components, resulting in molecular data dispatched in at least one dataset, with preference high-resolution molecular data;

- performing analysis and/or modeling of said molecular data, resulting in integrated genomic, transcriptomic, proteomic and/or metabolomic data.

With preference, biomolecular components include intracellular components, or both intracellular and extracellular components.

With preference, the method includes performing genomic, transcriptomic, proteomic and metabolomic analyses on a single unique biological sample to identify and/or quantify said biomolecular components, resulting in molecular data.

With preference, the step of separating and purifying biomolecular components comprises separating and purifying both polar and non-polar metabolites.

Preferably, the step of separating and purifying biomolecular components comprises a step of isolation of small RNA.

With preference, before the step of separating and purifying biomolecular components, the single biological sample is collected and immediately snap-frozen at a temperature of less than -70°C and with preference at a preferred temperature of -196 °C, and cryopreserved at said temperature until the step of separating and purifying biomolecular components.

Preferably, the step of separating and purifying biomolecular components from a single biological sample comprises the following sub-steps:

a) performing homogenisation and mechanical lysis of the single biological sample such that a part, and with preference the majority, of cells are lysed;

b) performing a metabolite extraction on the homogenised single unique biological sample from step (a) by addition of a phase separation solution, and centrifugation to form an upper phase, an interphase pellet and a lower phase; such that polar metabolites are in the upper phase, genomic DNA, RNA and proteins and the remaining cells not lysed by the mechanical lysis are in the interphase pellet, and non-polar metabolites are in the lower phase;

c) collecting separately the upper phase, the lower phase and the interphase pellet;

d) adding a lysis solution to the collected interphase pellet to perform a chemical lysis or a combined mechanical and chemical lysis in order to obtain a lysate;

e) performing a sequential isolation of genomic DNA, RNA and proteins on the lysate.

Preferably, the mechanical lysis of step (a) is halted when about 30 to 80 % and preferably about 50 % of cells have been lysed.

Preferably, the sequential isolation of genomic DNA, RNA and proteins of step (e) further comprises the steps of

(e-1) Mixing the lysate with a dipolar aprotic solvent or with a polar protic solvent such as ethanol to obtain a solution;

(e-2) Applying the solution of step (e-1) to a first chromatographic spin column under conditions for genomic DNA, large RNA and part of the proteins to bind, and for obtaining a flowthrough;

(e-3) Collecting the flowthrough which contains small RNA and a part of the proteins;

(e-4) Applying the flowthrough of step (e-3) to a second chromatographic spin column under conditions for small RNA to bind and for obtaining a flowthrough;

(e-5) Eluting the small RNA from the second chromatographic spin column using a dedicated buffer;

(e-6) Eluting sequentially genomic DNA and large RNA from the first chromatographic spin column using dedicated buffers;

(e-7) Collecting the flowthrough of step (e-4) and adjusting the pH, with preference to pH 3;

(e-8) Applying the pH-adjusted flowthrough of step (e-4) to the first chromatographic spin column;

(e-9) Eluting proteins from the first chromatographic spin column.

Alternatively, the step of separating and purifying biomolecular components comprises separating and purifying non-polar metabolites and/or a step of isolation of small RNA. With preference, the step of separating and purifying biomolecular components from a single biological sample comprises the following sub-steps:

a) submitting the single biological sample, combined in an unhomogenised form with an extraction and partitioning solution containing amphipathic organic solvents, to alternating low and ultra-high hydrostatic pressure such that the majority of cells are lysed, with preference more than 90 % of the cells are lysed;

b) performing extraction by centrifugation of the sample of step (a) to form an upper phase, an interphase pellet and a lower phase; such that non-polar metabolites are in the upper phase, genomic DNA and RNA are in the interphase pellet, and proteins are in the lower phase;

c) collecting separately the upper phase, the lower phase and the interphase pellet;

d) performing a sequential isolation of genomic DNA and RNA from the interphase pellet.

With preference, the amphipathic organic solvent is a fluorinated alcohol such as hexafluoroisopropanol.

Such an alternative extraction protocol is known to the man skilled in the art and described in "Gross V, Carlson G, Kwan AT, Smejkal G, Freeman E, Ivanov AR et al. (2008). Tissue fractionation by hydrostatic pressure cycling technology: the unified sample preparation technique for systems biology studies. J Biomol Tech 3:189-199."

With preference, the steps of separating and purifying biomolecular components comprising metabolites, genomic DNA, RNA and proteins are carried out on a single biological sample; and the steps of performing at least one of genomic, transcriptomic, proteomic and/or metabolomic analyses to identify and/or quantify said biomolecular components; are carried out taking several time- and/or spatially-resolved samples.

Preferably, a single or several biological samples comprise heterogeneous cell populations for which the genomic information and/or context is unknown. Heterogeneous cell populations comprise for example tumour cells and/or mixed microbial communities.

Preferably, several spatially and/or temporally resolved samples are collected from the same organism and/or environment.

According to a third embodiment, the invention relates to a method for profiling a biological system comprising the steps of:

a) providing a plurality of molecular datasets resulting from at least one of genomic, transcriptomic, proteomic and/or metabolomic analyses of biomolecular components carried out on at least one biological sample of a biological system;

b) performing analysis and/or modeling of the molecular data;

c) determining a profile for a state of the biological system based on the analysis and/or modeling performed.

With preference the plurality of datasets results from at least one of genomic, transcriptomic, proteomic and/or metabolomic analyses of biomolecular components separated and purified from at least two biological samples of a biological system.

With preference the biomolecular components include metabolites, genomic DNA, RNA and/or proteins from single unique biological samples. With preference again metabolites include both polar and non-polar metabolites.

Optionally, for first, second and third embodiments of the invention:

- The step of performing transcriptomic analyses includes the sequencing of small RNA in order to obtain among others microRNA data.

- The step of performing genomics may include the provision for obtaining epigenetic (epigenomic) information or data.

- The step of performing genomics may further include performing SNP genotyping to obtain SNP patterns.

- The step of performing transcriptomics may include performing analysis and quantification of post-transcriptional modifications of RNA transcripts.

- The step of performing proteomics may further include performing analysis and quantification of proteolysis and/or of post-translational modifications.

Preferably, for the first, second and third embodiments of the invention, the step of performing analysis and/or modeling of the molecular data includes one or more of the following approaches:

- performing a data preparation step that may involve filtering of the data according to certain cut-offs, estimating missing values and/or performing variable selection for identifying relevant features in the data;

- performing clustering;

- performing dimension reduction with methods such as principal component analysis (PCA), independent component analysis (ICA), singular value decomposition (SVD) and/or multidimensional scaling (MDS);

- calculating linear or non-linear correlations between one or more of the datasets and/or of variables identified within the datasets, with preference by use of one or more correlation coefficients, for example the Pearson correlation coefficient, the Spearman correlation coefficient and/or the Kendall correlation coefficient. It is recalled that correlation refers to any of a broad class of statistical relationships involving dependence between two or more variables, or two or more sets of data;

- determining relationships between one or more of the datasets and/or of variables identified within the datasets using alternative dependence measures, for example mutual information and partial correlation. It is recalled that mutual information measures the information that two variables share and can capture non-linear correlations between two variables, while partial correlation can capture dependency between two variables when conditioning on the rest of the variables, which is of utmost relevance when dealing with a plurality of biological data sources where pairwise correlations are often found to be due to confounding factors whose contribution needs to be explicitly considered in the course of the analysis;

- determining relationships between one or more of the datasets and/or of variables identified within the datasets using probabilistic models for capturing the dependencies among sets of variables, as for example probabilistic graphical models and copulas. It is recalled that probabilistic graphical models allow one to model the stochastic dependencies of a set of variables in the form of a graph, while copulas allow one to model and estimate the joint probability distribution of sets of variables by separately estimating the marginals of each variable and their dependence structure;

- determining relationships between one or more of the datasets and/or of variables identified within the datasets using estimation techniques selected from likelihood maximization and convex programming;

- establishing at least one of (a) linear correlation, (b) non-linear correlation, (c) mutual information, (d) partial correlation, (e) probabilistic graphical models, (f) copulas and (g) any combination thereof, between epigenomic data or a variable from epigenomic data and at least one variable selected from transcriptomic data, proteomic data and/or metabolomic data or any combination thereof;

- establishing at least one of (a) linear correlation, (b) non-linear correlation, (c) mutual information, (d) partial correlation, (e) probabilistic graphical models, (f) copulas and (g) any combination thereof, between post-transcriptional modification data and/or a variable from post-transcriptional modification data and at least one variable selected from genomic data, epigenomic data, transcriptomic data, proteomic data, metabolomic data or any combination thereof;

- establishing at least one of (a) linear correlation, (b) non-linear correlation, (c) mutual information, (d) partial correlation, (e) probabilistic graphical models, (f) copulas and (g) any combination thereof, between post-translational modification data or a variable from post-translational modification data and at least one variable selected from genomic data, epigenomic data, transcriptomic data, post-transcriptional data, proteomic data, metabolomic data or any combination thereof;

- establishing at least one of (a) linear correlation, (b) non-linear correlation, (c) mutual information, (d) partial correlation, (e) probabilistic graphical models, (f) copulas and (g) any combination thereof, between SNP patterns and one or more selected from transcriptomic data, proteomic data, metabolomic data or any combination thereof.
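The copula approach mentioned in the list above, in which marginals and dependence structure are estimated separately, can be sketched for two variables as follows. This is a minimal illustration that assumes a Gaussian copula, rank-based empirical marginals and data without ties. The function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pseudo_observations(values):
    """Map each value to rank / (n + 1), i.e. an empirical estimate of
    its marginal cumulative probability (assumes no tied values)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, index in enumerate(order, start=1):
        u[index] = rank / (n + 1)
    return u


def gaussian_copula_rho(x, y):
    """Estimate the Gaussian copula correlation parameter from the
    normal scores of the pseudo-observations."""
    standard_normal = NormalDist()
    zx = [standard_normal.inv_cdf(u) for u in pseudo_observations(x)]
    zy = [standard_normal.inv_cdf(u) for u in pseudo_observations(y)]
    return pearson(zx, zy)


# Hypothetical metabolite and transcript levels in five samples.
metabolite = [0.1, 0.4, 0.2, 0.9, 0.5]
expression = [1.1, 1.5, 1.2, 2.5, 1.6]

rho = gaussian_copula_rho(metabolite, expression)
```

Because the estimate depends only on ranks, it is unchanged by any strictly increasing transformation of either variable, which is the sense in which the dependence structure is separated from the marginals.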

With preference, the method further comprises the step of comparing molecular data to databases of biological information to establish and/or quantify cell population profiles.

Another object of the invention is an article of manufacture having computer-readable media with computer-readable instructions embodied therein for performing the method according to the first, second and/or third embodiment.

From this definition, it can be understood that, from a first aspect, the method subject of the present invention represents a clear improvement over prior art techniques in providing data about all biomolecular components including polar and non-polar metabolites, genomic DNA (including the ability to resolve epigenetic imprints), small and large RNA (including posttranscriptional modifications) and proteins for network analyses. From a second aspect, the method subject of the present invention provides data about all biomolecular components including non-polar metabolites, genomic DNA (including the ability to resolve epigenetic imprints), small and large RNA (including posttranscriptional modifications) and proteins for network analyses.

Since there was previously no possibility to extract concomitant genomic DNA, RNA, proteins and metabolites (in particular including both polar and non-polar metabolites) from single unique samples, it was not possible to identify associations de novo between different biomolecules in a comprehensive way. Methods used in the prior art require knowledge of the genetic information, and do not provide any information when such information is unknown and/or where the cell populations are heterogeneous.

Since cellular states may vary depending on the genotype, the data obtained by the method of the invention should facilitate meaningful data analysis and modelling. According to the method of the invention any biomolecular associations, e.g. between metabolite data, gene expression data and genetic information data, indeed reflect the actual processes in the cells under investigation. The generation of integrative biomolecular data is an essential requirement for obtaining comprehensive views of biomolecular networks and understanding how perturbations, e.g. diseases, impact such networks. Thus, the methodological framework according to the invention allows fine-scale patient stratification (an important requirement for personalised medicine) because of its ability to reflect all biomolecular information layers and may improve the discovery of new (multifactorial) biomarkers. The methodological framework according to the invention forms the basis for the development of meaningful molecular models as well as inter- and intra-system comparative analyses.

The invention also allows the study of small non-coding RNA, in particular microRNA, and allows their abundances to be related to the protein level and/or to the underlying genetic information. As microRNA is involved in the normal functioning of eukaryotic cells, the dysregulation of microRNA complements has recently been associated with disease. For example, certain microRNAs, e.g. miRNA-21, have been linked to some forms of cancer. It is therefore of interest to be able to study, from the same sample, microRNA data in combination with information on the genomic DNA, RNA, proteins and/or metabolites.

The invention also allows the study of epigenomic modifications in relation to protein and/or metabolite expression levels. Epigenetic modifications, which are heritable and reversible, have been found to play an important role in gene expression and regulation. Epigenetic modifications are also involved in numerous cellular processes such as cell differentiation and tumorigenesis.

The invention also allows the study of post-transcriptional modifications, i.e. modifications of eukaryotic mRNA, tRNA or other RNAs made after transcription. The invention also allows the study of post-translational modifications, i.e. the chemical modifications of proteins following translation.

Definitions

As used herein "cryo-milling" is equivalent to cryogenic grinding, freezer milling and/or freezer grinding and refers to the act of cooling or chilling a material and then reducing it to a small particle size.

As used herein "partial lysis" refers to a lysis process of cells conducted in order to be incomplete such that a significant fraction of the cells is not lysed.

As used herein "indiscriminate cell type" means that all cells are concerned independent of e.g. their morphology, identity, chemical composition, etc.

As used herein "phase separation solution" is a mixture comprising polar solvents and non-polar solvents.

As used herein "phase separation" is a process by which a single phase separates into two or more new phases, the new phases being liquid and/or solid.

As used herein "genomic DNA" refers to the genetic instruction given by the deoxyribonucleic acid chains in the cells.

As used herein "genome" refers to the entirety of an organism's hereditary information.

As used herein "epigenome" refers to the set of epigenetic modifications on the genetic material of a cell such as DNA methylations and/or histone modifications.

As used herein "single-nucleotide polymorphism" (SNP) is a DNA sequence variation occurring when a single nucleotide (A, T, C or G) in the genome (or other shared sequence) differs between members of a biological species or paired chromosomes in an individual.

As used herein "RNA" refers to any ribonucleic acids that can be present in cells contained in the biological source material, including viral RNA.

As used herein "small RNA" refers to RNA below 200 nucleotides such as microRNA (i.e. post-transcriptional regulators) and other RNA, e.g. tRNA (i.e. transfer RNA that intervenes within the synthesis of protein molecules).

As used herein "large RNA" refers to RNA above 200 nucleotides such as mRNA (i.e. transcripts from protein-encoding genes) and rRNA (i.e. forms part of ribosomes which are the site of protein synthesis in all living cells).

As used herein "total RNA" refers to a mixture of both small RNA (<200 nt) and large RNA (>200 nt).

As used herein "large RNA fraction" refers to an RNA fraction comprising mainly large RNA.

As used herein "transcriptome" refers to the set of all RNA molecules, including mRNA, rRNA, tRNA, and other non-coding RNA produced in one or a population of cells.

As used herein "proteins" refers to any molecules comprising amino acids connected via peptide bonds that are present in the biological source material.

As used herein "proteome" refers to the entire set of proteins expressed by a genome, cell, tissue or organism at a given time.

As used herein "metabolites" refers to any intermediate or product resulting from metabolism, i.e. from physical and chemical processes involved in the maintenance and reproduction of life in which nutrients are broken down to generate energy and simpler molecules which themselves may be used to form more complex molecules.

As used herein "polar metabolites" refers to all hydrophilic metabolites showing a capacity to interact with polar solvents, in particular with water or with other polar groups, such as sugars, amino acids, organic acids, etc.

As used herein "non-polar metabolites" refers to all hydrophobic metabolites having a tendency to dissolve in non-polar solvents, such as lipophilic compounds, lipids, waxes, chlorophyll, etc.

As used herein "metabolome" refers to the complete set of metabolites, such as metabolic intermediates or end products of metabolism, hormones and other signaling molecules to be found within a biological sample.

As used herein "integrated data" or "integrated information" refers to data or information resulting from omic analyses and further submitted to a step of analysis and/or modeling.

As used herein "biomolecular component" refers to biomolecules and is used as a synonym of "biomolecular complement".

The invention will now be described by way of examples with reference to accompanying figures in which:

- figure 1 shows a flowchart highlighting the methodological framework according to the invention;

- figure 2 shows a flowchart of the biomolecular isolation method according to the invention;

- figure 3 is a Pearson correlation network of microbial genera and detected metabolite features across 16 distinct microbial community samples obtained according to the method of the invention.

As illustrated in figure 1, the method for obtaining integrated genomic, transcriptomic, proteomic and/or metabolomic data from a biological sample according to the invention includes the steps of:

- separating and purifying biomolecular components comprising metabolites, genomic DNA, RNA and proteins from a single biological sample;

- performing analysis and/or modeling of the molecular data, resulting in integrated genomic, transcriptomic, proteomic and/or metabolomic data.

Step 1: separating and purifying biomolecular components

According to the invention, the method provides an access to all biomolecular components in a standardised way.

A biological sample is collected and is immediately snap-frozen at a temperature of -70°C or below, and preferably at -196°C, in order to stop inherent enzymatic activity. According to a preferred embodiment of the invention, the biological sample comprises heterogeneous cell populations, and is selected with preference from e.g. tumour tissues and/or mixed microbial communities; and/or a cell population for which the genomic information or context is unknown.

The sample is cryopreserved at said temperature until applying the biomolecular extraction protocol according to the invention. With reference to figure 1, the sample is then submitted to a cryogenic homogenisation and lysis step. With preference, this step is performed at -80°C at least, in an oscillating mill at a frequency of about 30 Hz for at least 2 min. Cells (for example microbial cells) are partially lysed by this cryogenic grinding. According to the invention, the cryo-milling step is performed such that about 30 to 60 %, or 30 to 70 %, or 30 to 80 %, and with preference about 50 %, of the cells are lysed indiscriminately. Preferably about 50 to 60 %, or 50 to 70 %, or 50 to 80 % of the cells are lysed indiscriminately.

Conducting the first lysis so that it is partial presents several advantages. Firstly, the mechanical nature of the first lysis step does not involve chemical products such as chaotropic agents that may affect non-polar metabolites. Secondly, halting the mechanical step before the lysis of the cells is complete avoids excessive milling and helps to preserve high quality and high molecular weight DNA, so the later isolated DNA is usable for genomic analyses. Thirdly, the partial nature of the lysis step allows obtaining a representative protein fraction by preserving a part of the cells intact, independent of their morphology or their identity, during the metabolite extraction. Indeed, some proteins of the lysed cells that are released with the first mechanical step and which show hydrophobic properties may dissolve in the non-polar solvents used during the metabolite extraction, and may therefore be lost for proteomic analyses. Preserving a significant fraction of the cells intact during the metabolite extraction step, and lysing them after the metabolite extraction, aids in obtaining a protein fraction for proteomics that includes hydrophobic proteins, e.g. membrane proteins. Fourthly, performing the first mechanical lysis step at very low temperature helps in preserving the respective biomolecules as a molecular snapshot of the time of sampling.

Metabolites are first extracted in a phase separation step. A phase separation solution comprising a mixture of methanol, chloroform and water in the proportion of one volume of methanol, one volume of water and two volumes of chloroform is added to the cryo-milled sample. The ratio of chloroform (or other non-polar solvents considered) in the mixture can be lowered by the skilled man if the sample is not particularly rich in lipids. Conversely, where the sample is expected to be rich in polar metabolites, the ratio of methanol (or other polar solvent) may be adjusted accordingly. The sample mixed with the solvent mixture is then homogenised and centrifuged. Polar and non-polar metabolites are extracted separately as they are solubilised either in the polar or in the non-polar phase. The centrifugation step allows a separation into three phases: an upper phase comprising polar metabolites, an interphase pellet comprising genomic DNA, large and small RNA, proteins and non-lysed cells, and a lower phase comprising non-polar metabolites. As described below, metabolomic studies are performed on the lower and upper phases. The addition of the phase separation solution and the following homogenisation step are performed at a temperature below 0°C in order to protect RNA and DNA from degradation.

A combined mechanical and chemical lysis step is performed on the interphase pellet, which contains all remaining cellular constituents, or components. The lysis solution comprises with preference Tris-EDTA and a lysis buffer. β-mercaptoethanol, for example at 10 μl per ml of lysis buffer, is added to the interphase together with the lysis solution to preserve RNA integrity. Following this, differential nucleic acid and protein isolation is carried out using chromatographic spin-columns.
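By way of non-limiting illustration, the solvent proportions stated above can be turned into working volumes with a short calculation. The following Python sketch is not part of the described method; the function names and the total solvent volume are assumptions chosen for illustration only, since the description states proportions (one volume of methanol, one of water, two of chloroform, and 10 μl of β-mercaptoethanol per ml of lysis buffer) but no absolute volumes.

```python
def phase_separation_volumes(total_ml, methanol=1.0, water=1.0, chloroform=2.0):
    """Split a total solvent volume (ml) according to volume parts.

    The default parts follow the 1:1:2 methanol:water:chloroform proportion.
    The total volume is an assumption supplied by the user; the description
    gives proportions only.
    """
    parts = {"methanol": methanol, "water": water, "chloroform": chloroform}
    total_parts = sum(parts.values())
    if total_ml <= 0 or min(parts.values()) < 0 or total_parts == 0:
        raise ValueError("volume must be positive and parts must be non-negative")
    return {name: total_ml * part / total_parts for name, part in parts.items()}


def mercaptoethanol_ul(lysis_buffer_ml, ul_per_ml=10.0):
    """Volume of beta-mercaptoethanol (in microlitres) for a given buffer volume."""
    if lysis_buffer_ml < 0:
        raise ValueError("buffer volume must be non-negative")
    return lysis_buffer_ml * ul_per_ml


if __name__ == "__main__":
    # Hypothetical 4 ml of phase separation solution at the stated proportion.
    print(phase_separation_volumes(4.0))
    # Lowered chloroform share, e.g. for a sample that is not rich in lipids.
    print(phase_separation_volumes(3.0, chloroform=1.0))
```

Passing a lower chloroform part, as in the second call, corresponds to the adjustment left to the skilled man for samples that are not particularly rich in lipids.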

In a first embodiment, the method involves for differential nucleic acid and protein isolation the commercially available "All-in-One Purification Kit" (Total RNA, microRNA, total proteins and genomic DNA) from Norgen Biotek Corp. This kit is based on the differential binding of biomolecules and their subsequent elution from chromatographic spin columns. The skilled man will use this kit because it provides the possibility to separately isolate small RNA (< 200 nt) comprising microRNA from the total RNA fraction. Access to the small RNA fraction is a requirement for full and comprehensive integrative omic measurements. Using this kit, the lysate is mixed with ethanol (for example 100 μl of pure ethanol) and is applied to a first mineral support (first column) under conditions for genomic DNA, large RNA and part of the proteins to bind. The enhanced quantity of ethanol added results in higher efficiency of nucleic acid binding. Large RNA, genomic DNA and a small part of the proteins bind to the column while the flowthrough fraction contains unbound proteins and small RNA. The flowthrough fraction is collected. The flowthrough is then applied to a second mineral support (second column) under conditions for small RNA to bind, allowing the purification of RNA smaller than 200 nt (small RNA fraction). The flowthrough of the second mineral support is collected since it contains the proteins. Genomic DNA and large RNA are sequentially washed off the first mineral support and eluted two times with dedicated solutions, resulting in large RNA and genomic DNA fractions. The flowthrough containing proteins collected from the second mineral support is adjusted to pH 3 and is applied to the first mineral support under conditions allowing the remaining proteins to bind together with the first bound proteins. The bound proteins are finally washed and eluted two times from the first mineral support, resulting in the protein fraction.

In another embodiment, the method involves for differential nucleic acid and protein isolation the commercially available AllPrep® DNA/RNA/Protein Mini kit (Qiagen, Valencia, CA). This kit is also based upon differential binding properties of biomolecules and their subsequent elution from chromatographic spin columns.

However, this procedure provides enrichment for mRNA since RNA smaller than 200 nt (small and microRNAs) is selectively excluded and, hence, the small RNA fraction cannot be retrieved. Using this kit, the lysate is first passed through a QIAshredder column which allows selective binding of genomic DNA and this is eluted in a further step using a dedicated buffer, resulting in the DNA fraction.

Ethanol (for example 400 μl for 600 μl of initially added buffer modified with β-mercaptoethanol) is then added to the flow-through to provide appropriate binding conditions for RNA onto the membrane of the subsequently used RNeasy spin column. The RNA is eluted using a dedicated buffer, resulting in the RNA fraction. An aqueous protein precipitation solution is added to the flow-through for the isolation of the total protein fraction. The protein pellet is then re-dissolved in the designated buffer, resulting in the protein fraction.

In another embodiment, the method involves for differential nucleic acid and protein isolation known methods using chromatographic spin-columns allowing the isolation of a total RNA fraction comprising small RNA and large RNA. Additionally a small RNA fraction can be obtained separately by any known isolation procedure.

With preference, the first and second mineral supports are porous or non-porous and comprised of metal oxides or mixed metal oxides, silica gel, silicon carbide resin, silica membrane, glass particles, powdered glass, quartz, alumina, zeolite, titanium dioxide, or zirconium dioxide.

To bind RNA to the column, ethanol is added to the lysate or to the flowthrough. However, in an embodiment of the method, ethanol can be replaced with a dipolar aprotic solvent selected from acetone, acetonitrile, tetrahydrofuran (THF), methyl ethyl ketone, N,N-dimethylformamide (DMF) and dimethyl sulfoxide.

In another embodiment, the lysis solution includes a chaotropic salt, a non-ionic detergent (i.e. non-ionic surfactant) and a reducing agent. With preference said chaotropic salt is guanidine HCl. Preferably said non-ionic detergent is selected from triethyleneglycol monolauryl ether, (octylphenoxy)polyethoxyethanol, sorbitan monolaurate, t-octylphenoxypolyethoxyethanol, or a combination thereof. Preferably the non-ionic detergent or combination thereof is in the range of 0.1 to 10 %. With preference the reducing agent is 2-aminoethanethiol, tris-carboxyethylphosphine (TCEP), or β-mercaptoethanol.

Step 2: performing at least one of genomic, transcriptomic, proteomic and/or metabolomic analyses to identify and/or quantify said biomolecular components.

Identification and quantification of the compounds of interest extracted from the sample can be done according to techniques well known in the prior art. Omic analyses result in several datasets. Each dataset is related to one of the fractions of biomolecular components separated and purified in step 1. Genomics is performed on the genomic DNA fraction to obtain the genomic DNA information and establish the genome of the cells. Genomic techniques include but are not limited to Whole Genome Sequencing (WGS) analysis, Restriction Fragment Length Polymorphism (RFLP) analysis followed by Southern Blot analysis, Polymerase Chain Reaction (PCR) followed optionally by amplicon sequencing, Short Tandem Repeats (STR) analysis, and Amplified Fragment Length Polymorphism (AmpFLP) analysis.

Epigenomics is performed on the genomic DNA fraction to obtain epigenetic information and establish the epigenome of the cells. For the study of histone modifications, epigenomic techniques include but are not limited to chromatin immunoprecipitation (ChIP) technology with DNA microarrays (termed ChIP-chip), coupled chromatin immunoprecipitation and serial analysis of gene expression (ChIP-SAGE), coupled chromatin immunoprecipitation and paired end ditag sequencing (ChIP-PET) and coupled chromatin immunoprecipitation and DNA sequencing (ChIP-seq). For establishing the methylation patterns, epigenomic techniques include but are not limited to Restriction Landmark Genome Scanning (RLGS) technique and bisulfite sequencing.

With preference genomic DNA information further includes obtaining SNP patterns. SNP genotyping techniques include but are not limited to enzyme-based methods such as Whole Genome Sequencing (WGS) analysis, Restriction Fragment Length Polymorphism (RFLP), primer extension and sequencing, PCR-based methods, Temperature Gradient Gel Electrophoresis (TGGE), etc.

Transcriptomic techniques are performed on the large and small RNA fractions in order to establish the transcriptome of the cells. Transcriptomic techniques include, but are not limited to, microarray-based approaches such as DNA microarray, differential display approaches, sequencing-based approaches such as RNAseq, Serial Analysis of Gene Expression (SAGE) or Massively Parallel Signature Sequencing (MPSS). Unlike the genome, which is roughly fixed for a given cell line or organism (excluding mutations), the transcriptome can vary with external environmental conditions. Because it includes all RNA transcripts in the cell, the transcriptome reflects the genes or non-coding regions that are actively being transcribed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation. The transcriptomes of cancer cells are of particular interest to classify tumours and, thus, for patient stratification. The information derived from gene expression profiling often has an impact on predicting the patient's clinical outcome. According to the invention, study of the transcriptome may include analysis and quantification of post-transcriptional modifications including methylations of RNA transcripts; and/or of post-transcriptional regulation processes including splicing and/or RNA editing. Characterization and quantification of RNA post-transcriptional modifications can be performed by known techniques using for example stable isotope labeling of RNA in conjunction with mass spectrometric analysis.

The protein fraction is submitted to proteomics. The techniques involved include, but are not limited to, liquid chromatography followed by mass spectrometry (MS), two-dimensional gel electrophoresis (2DE), SDS polyacrylamide gel electrophoresis (SDS-PAGE), Western blotting, MS, ion-exchange chromatography, etc. Proteomics allows measuring protein abundances as well as studying and establishing the structure and the function of the proteins. The variation seen within the proteome of a cell is generally larger than that of the corresponding transcriptome due to alternative splicing of genes and post-translational modifications.

According to the invention, study of the proteome may include analyzing and/or quantifying proteolysis and/or one or more of the post-translational modifications of the proteins including but not limited to: proteolytic cleavage, glycosylation, acetylation, alkylation, methylation, biotinylation, glutamylation, glycylation, isoprenylation, lipoylation, phosphopantetheinylation, phosphorylation, sulfation, selenation, C-terminal amidation and/or any combination thereof. The study of proteolysis and/or post-translational modifications is conducted by known methods using for example PROTOMAP technology, which allows the identification of post-translational changes to proteins that are manifested in altered migration patterns when subjected to, for example, one-dimensional SDS-PAGE, two-dimensional gel electrophoresis, or difference gel electrophoresis.

Metabolomics is performed on the polar and non-polar metabolite fractions obtained. Metabolomics allows the study of the chemical processes of cells involving small molecules (metabolites). Metabolites include, but are not limited to, sugars, amino acids, organic acids, peptides, steroids, lipids, lipophilic compounds, chlorophyll, etc. The metabolite fraction can also comprise pharmacophore and drug breakdown compounds. According to the invention both polar and non-polar metabolites can be extracted from the same unique sample and studied to establish the metabolomes of different cell populations. Analytical approaches include, but are not limited to, nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS), the latter being used to identify and to quantify metabolites after separation by, for example, gas chromatography (GC), high performance liquid chromatography (HPLC) or capillary electrophoresis (CE). Metabolites are the end products of cellular regulatory processes, and their levels can be regarded as the ultimate response of biological systems to genetic or environmental changes.

The invention provides information of genomic DNA, RNA, proteins and metabolites from a single unique sample. The invention is remarkable in that it facilitates the generation of data which is particularly suitable for analyses of biological samples comprising heterogeneous cell populations and/or a cell population for which the genomic information or context is unknown.

With preference, the steps of separating and purifying biomolecular components comprising metabolites, genomic DNA, RNA and proteins from a single unique biological sample; and the step of performing at least one of genomic, transcriptomic, proteomic and/or metabolomic analyses to identify and/or quantify said biomolecular components; are carried out on several samples. The data are obtained for each sample, so that variation of the genomic, transcriptomic, proteomic and/or metabolomic components over time, space or cell populations or any other known difference between the biological samples collected can be assessed. For example, the several samples can comprise samples collected at different time points for the same organism.

Step 3: performing analysis and/or modeling of the molecular data to obtain integrated data

The molecular data obtained through at least one of genomic, transcriptomic, proteomic and/or metabolomics analyses are prepared for data analysis and/or subsequent modeling.

Preparation of the data for subsequent analysis involves for example normalisation of the different datasets that are the result of genomic, epigenomic, transcriptomic, proteomic, and/or metabolomic analyses to bring them onto the same scale and/or the combination of the different datasets into a single file or the storage of the individual datasets in a centralised database. Preparation of the data may also involve filtering of the data according to certain cut-offs and/or estimating missing values. Variable selection for identifying relevant features in the data, such as minimum-redundancy-maximum-relevance or correlation feature selection, may be part of the data preparation phase.
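By way of non-limiting illustration, the data preparation described above (filtering according to a cut-off, estimation of missing values and normalisation onto a common scale) may be sketched as follows. The median substitution, the z-score scaling, the 50 % missing-value cut-off and all names in this Python sketch are assumptions chosen for illustration; the description does not prescribe any particular estimator, scale or threshold.

```python
import statistics


def prepare_features(table, max_missing_fraction=0.5):
    """Filter, impute and z-score a feature table.

    `table` maps a feature name (e.g. a metabolite, transcript or protein)
    to a list of measurements, one per sample, with None marking a missing
    value. All choices below are illustrative assumptions.
    """
    prepared = {}
    for feature, values in table.items():
        if not values:
            continue
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        # Filtering: drop features that are mostly missing or do not vary.
        if missing_fraction > max_missing_fraction or len(set(observed)) < 2:
            continue
        # Missing-value estimation: substitute the median of the observed values.
        fill = statistics.median(observed)
        filled = [fill if v is None else v for v in values]
        # Normalisation: centre on zero and scale to unit standard deviation,
        # so that datasets from different omic layers share one scale.
        mean = statistics.fmean(filled)
        sd = statistics.pstdev(filled)
        prepared[feature] = [(v - mean) / sd for v in filled]
    return prepared


if __name__ == "__main__":
    # Hypothetical toy values, not measured data.
    toy = {
        "metabolite_A": [1.0, 2.0, 3.0],
        "protein_B": [5.0, None, None],
        "transcript_C": [2.0, None, 4.0],
    }
    print(prepare_features(toy))
```

The prepared features can then be combined into a single file or stored in a centralised database, as set out above.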

The prepared data obtained are then analysed using methods that are appropriate for semi-quantitative, quantitative and qualitative data. The semi-quantitative and quantitative data represent the amount of features present in the sample either in absolute terms (e.g. weight or moles per weight sample) or in relative terms (i.e. normalised to a certain reference during the preparation step). Analyses carried out on the semi-quantitative and quantitative data involve suitable statistical, correlation and partial correlation analyses, calculating Pearson correlation coefficients, mutual information, establishing rank correlation coefficients such as Spearman's rank correlation coefficient or Kendall's rank correlation coefficient (which allow the identification of non-linear relationships between variables), etc. Additional analyses may include clustering and dimension reduction methods such as principal component analysis (PCA), independent component analysis (ICA), singular value decomposition (SVD) or multidimensional scaling (MDS). Probabilistic models may include ensembles of dependency trees and mixtures of copulas. Estimation techniques may include likelihood maximisation and convex programming.
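By way of non-limiting illustration, the Pearson correlation coefficient and Spearman's rank correlation coefficient mentioned above may be computed as in the following Python sketch. The function names are assumptions chosen for illustration, and Spearman's coefficient is obtained here as the Pearson coefficient of average ranks, which is one common formulation. In practice an established statistics package would normally be used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical monotonic but non-linear relationship.
    abundance = [1, 2, 3, 4, 5]
    response = [1, 4, 9, 16, 25]
    print(pearson(abundance, response), spearman(abundance, response))
```

For a monotonic but non-linear relationship such as the toy data above, the rank coefficient reaches 1 while the Pearson coefficient remains below 1, which is the sense in which rank correlation captures non-linear relationships.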

Evaluation of the profiling data from the genome, transcriptome, proteome and metabolome by bioinformatics is in progress. On the basis of the assumption that large sets of genes and proteins follow synchronized patterns, attempts at taming profiling data have been made using clustering algorithms that group raw data in an unbiased way. Clustering can be used to build groups of genes with related expression patterns (i.e. coexpressed genes).

According to the invention, it is possible to uncover associations between for example genetic information contained within genomes with information contained within the transcriptomes, proteomes and/or metabolomes. It is also possible to correlate epigenome information (epigenomic signatures, e.g. methylation patterns) with information from the transcriptome, proteome and/or metabolome.

The genome and proteome information taken together allow the identification of proteins associated with specific diseases. This information can be complemented by transcriptome information and in particular with levels and information about biomolecules showing gene regulatory effects such as those exhibited by microRNA. Understanding the proteome, the structure and function of each protein and the complexities of protein-protein interactions will be critical for developing the most effective diagnostic techniques and disease treatments. The invention thus allows developing personalized drugs that are more effective for an individual. The analyses performed can be completed by integration of metabolome information to give a more complete picture of the physiology of a cell. The method according to the invention allows the identification of meaningful correlations/relationships between, e.g., microRNA abundance and certain metabolites, microRNA abundances and protein expression, etc.

Given the possibility to access all biomolecular components within a single biological sample, the invention allows the identification of meaningful correlations between SNP patterns in genomic DNA and for example variant protein expression patterns, etc.
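By way of non-limiting illustration, an association between a SNP pattern and an expression pattern may be quantified by mutual information, one of the measures named in the method. The following Python sketch assumes that both variables are available as discrete labels per sample (for example an allele call and an expression level binned into "low" and "high"); the binning, the function name and the toy values are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two sequences of discrete labels."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty sequences of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical allele calls and binned protein expression per sample.
    alleles = ["A", "A", "G", "G"]
    linked = ["low", "low", "high", "high"]
    unlinked = ["low", "high", "low", "high"]
    print(mutual_information(alleles, linked))    # allele fully determines level
    print(mutual_information(alleles, unlinked))  # allele carries no information
```

Unlike a linear correlation coefficient, this measure does not require the genotype to be encoded numerically, which is why it is suited to categorical SNP data.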

As it provides the possibility to access all biomolecular fractions (or components) within a single biological sample, the invention allows the establishment and/or quantification of the cell population profiles of heterogeneous cell populations by comparing the measured data to e.g. databases of previously established profiles. For example it allows evaluating the extent of genetic heterogeneity in cancer tumours, the specific types of mutations in an individual, as well as the abundance of the cell populations showing such mutations, in order to optimise the treatment to be administered. It is understood that not all possible mutations may be found in existing databases of profiles and that the invention allows identifying new mutations and performing related post-genomic analyses.

Where the heterogeneous cell populations are mixed microbial communities, the method allows determining both type and abundance of the different organisms in the sample. The invention further allows identifying both known and unknown or unexpected organisms in mixed microbial communities.

Example

The methodological framework according to the invention has been used to establish a correlation network of microbial genera and detected metabolites. Figure 3 shows a Pearson correlation network of microbial genera (determined using polymerase chain reaction amplification from isolated genomic DNA and deep 454 pyrosequencing of the phylogenetically informative 16S rRNA gene) and detected metabolite features extracted across 16 unique microbial community samples dominated by lipid accumulating organisms (the correlation network was built using the following cut-offs: r > 0.8; statistical p-value ≤ 0.0001). In the network, nodes representing different genera are color-coded according to taxonomic affiliation (see insert), metabolite feature nodes are highlighted in dark grey and edges highlighting correlations between the nodes are color-coded in light grey. Distinct relationships are discernible from the network, e.g. the abundance of Methylibium spp. correlates strongly with a number of unknown nonpolar metabolites and, thus, Methylibium spp. may represent a keystone genus in this microbial community. These potential linkages are only discernible thanks to the integrative molecular systems biology approach subject of the present filing.
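By way of non-limiting illustration, the construction of such a correlation network may be sketched in Python as follows. This is not the procedure actually used to produce figure 3, which is not detailed in the present description: the function names and toy values are assumptions, and the sketch applies the r > 0.8 cut-off only. The p-value cut-off is deliberately omitted and would in practice be computed with an established statistics package.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(genera, metabolites, r_cutoff=0.8):
    """Return (genus, metabolite feature, r) for every pair with r > r_cutoff.

    `genera` and `metabolites` map a node name to its values across the same
    ordered set of samples. Only the correlation cut-off is applied here; the
    significance filter is left out of this sketch.
    """
    edges = []
    for genus, abundance in genera.items():
        for feature, intensity in metabolites.items():
            r = pearson(abundance, intensity)
            if r > r_cutoff:
                edges.append((genus, feature, r))
    return edges


if __name__ == "__main__":
    # Hypothetical toy values across four samples, not measured data.
    genera = {"genus_1": [1, 2, 3, 4], "genus_2": [4, 1, 3, 2]}
    metabolites = {"feature_x": [2, 4, 6, 8.5], "feature_y": [1, 1, 2, 1]}
    for genus, feature, r in correlation_edges(genera, metabolites):
        print(f"{genus} -- {feature}: r = {r:.3f}")
```

Each returned tuple corresponds to one edge between a genus node and a metabolite feature node of the network.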
