Title:
MEANS AND METHODS TO DIAGNOSE GUT FLORA DYSBIOSIS AND INFLAMMATION
Document Type and Number:
WIPO Patent Application WO/2022/073973
Kind Code:
A1
Abstract:
The present invention relates to the field of the human gut microbiome, more particularly to its effect on health and disease. Provided herein are means and methods to diagnose and treat or reduce the severity of gut flora dysbiosis as well as of gastro-intestinal inflammation and inflammation-associated disorders or conditions in a subject in need thereof.

Inventors:
RAES JEROEN (BE)
PROOST SEBASTIAN (BE)
FALONY GWEN (BE)
ARAUJO VIEIRA DA SILVA SARA MANUEL (BE)
TIMPSON NICHOLAS JOHN (GB)
HUGHES DAVID ALLEN (GB)
Application Number:
PCT/EP2021/077386
Publication Date:
April 14, 2022
Filing Date:
October 05, 2021
Assignee:
VIB VZW (BE)
KATHOLIEKE UNIV LEUVEN K U LEUVEN R&D (BE)
UNIV BRISTOL (GB)
International Classes:
C12Q1/6883; C12Q1/689; G01N33/68
Domestic Patent References:
WO2016174677A12016-11-03
WO2020087130A12020-05-07
WO2017109059A12017-06-29
WO2020201457A12020-10-08
Foreign References:
US20200281991A12020-09-10
Other References:
PARADA VENEGAS DANIELA ET AL: "Short Chain Fatty Acids (SCFAs)-Mediated Gut Epithelial and Immune Regulation and Its Relevance for Inflammatory Bowel Diseases", FRONTIERS IN IMMUNOLOGY, vol. 10, 1 January 2019 (2019-01-01), pages 277, XP055829804, Retrieved from the Internet DOI: 10.3389/fimmu.2019.00277
CHEN MICHAEL X. ET AL: "Metabolome analysis for investigating host-gut microbiota interactions", JOURNAL OF THE FORMOSAN MEDICAL ASSOCIATION, vol. 118, 1 March 2019 (2019-03-01), HK, pages S10 - S22, XP055874033, ISSN: 0929-6646, Retrieved from the Internet DOI: 10.1016/j.jfma.2018.09.007
SITTIPO PANIDA ET AL: "Microbial Metabolites Determine Host Health and the Status of Some Diseases", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 20, no. 21, 24 October 2019 (2019-10-24), pages 5296, XP055874049, DOI: 10.3390/ijms20215296
MOJSAK PATRYCJA ET AL: "The role of gut microbiota (GM) and GM-related metabolites in diabetes and obesity. A review of analytical methods used to measure GM-related metabolites in fecal samples with a focus on metabolites' derivatization step", JOURNAL OF PHARMACEUTICAL AND BIOMEDICAL ANALYSIS, vol. 191, 15 September 2020 (2020-09-15), AMSTERDAM, NL, pages 113617, XP055874274, ISSN: 0731-7085, Retrieved from the Internet DOI: 10.1016/j.jpba.2020.113617
C. CASÉN ET AL: "Deviations in human gut microbiota: a novel diagnostic test for determining dysbiosis in patients with IBS or IBD", ALIMENTARY PHARMACOLOGY & THERAPEUTICS., vol. 42, no. 1, 14 May 2015 (2015-05-14), GB, pages 71 - 83, XP055351398, ISSN: 0269-2813, DOI: 10.1111/apt.13236
DORIS VANDEPUTTE ET AL: "Quantitative microbiome profiling links gut community variation to microbial load", NATURE, 23 November 2017 (2017-11-23), London, XP055548220, ISSN: 0028-0836, DOI: 10.1038/nature24460
VIEIRA-SILVA SARA ET AL: "Quantitative microbiome profiling disentangles inflammation- and bile duct obstruction-associated microbiota alterations across PSC/IBD diagnoses", NATURE MICROBIOLOGY, NATURE PUBLISHING GROUP UK, LONDON, vol. 4, no. 11, 17 June 2019 (2019-06-17), pages 1826 - 1831, XP036914518, DOI: 10.1038/S41564-019-0483-9
AGUS ALLISON ET AL: "Gut microbiota-derived metabolites as central regulators in metabolic disorders", GUT MICROBIOTA, vol. 70, no. 6, 3 December 2020 (2020-12-03), UK, pages 1174 - 1182, XP055874048, ISSN: 0017-5749, Retrieved from the Internet DOI: 10.1136/gutjnl-2020-323071
WU YUANQI ET AL: "Identification of microbial markers across populations in early detection of colorectal cancer", NATURE COMMUNICATIONS, vol. 12, no. 1, 24 May 2021 (2021-05-24), XP055874017, Retrieved from the Internet DOI: 10.1038/s41467-021-23265-y
MISHIMA YOSHIYUKI ET AL: "Molecular Mechanisms of Microbiota-Mediated Pathology in Irritable Bowel Syndrome", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 21, no. 22, 17 November 2020 (2020-11-17), pages 8664, XP055874276, DOI: 10.3390/ijms21228664
MARCOS-ZAMBRANO LAURA JUDITH ET AL: "Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment", FRONTIERS IN MICROBIOLOGY, vol. 12, 19 February 2021 (2021-02-19), XP055834142, DOI: 10.3389/fmicb.2021.634511
ARUMUGAM ET AL., NATURE, vol. 479, 2011, pages 538 - 541
FALONY ET AL., SCIENCE, vol. 352, 2016, pages 560 - 564
VANDEPUTTE ET AL., NATURE, vol. 551, 2017, pages 507 - 511
VALLES-COLOMER ET AL., NAT MICROBIOL, vol. 4, 2019, pages 1826 - 1831
VIEIRA-SILVA ET AL., NATURE, vol. 581, 2020, pages 310 - 315
REYNDERS ET AL., ANN CLIN TRANSL NEUR, vol. 7, 2020, pages 406 - 419
BOONSTRA ET AL., J HEPATOL, vol. 56, 2012, pages 1181 - 1188
AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 1999, JOHN WILEY & SONS
COSTEA ET AL., NAT MICROBIOL, vol. 3, 2018, pages 8 - 16
CAS , no. 16534-24-0
LAZARIDIS ET AL., N ENGL J MED, vol. 375, 2016, pages 1161 - 1170
LINDOR ET AL., AM J GASTROENTEROL, vol. 110, 2015, pages 646 - 659
BOONSTRA ET AL., HEPATOLOGY, vol. 58, 2013, pages 1084 - 1093
ALABRABA ET AL., LIVER TRANSPL, vol. 15, 2009, pages 330 - 340
TABIBIAN ET AL., HEPATOLOGY, vol. 63, 2016, pages 185 - 196
LUNDER ET AL., GASTROENTEROLOGY, vol. 151, 2016, pages 660 - 669
SEBODE ET AL., J HEPATOL, vol. 60, 2014, pages 1010 - 1016
FRANCIOTTA ET AL., LANCET NEUROLOGY, vol. 7, 2008, pages 852 - 588
MS. LUBLIN ET AL., NEUROLOGY, vol. 83, 2014, pages 278 - 286
AMATO ET AL., J NEUROL, vol. 253, 2006, pages 1054 - 1059
CALABRESE ET AL., MULT SCLER, vol. 19, 2013, pages 904 - 911
WIESEL ET AL., EUR J GASTROENTEROL HEPATOL, vol. 13, 2001, pages 441 - 448
BERER ET AL., FEBS LETTERS, vol. 588, 2014, pages 4207 - 4013
OCHOA-REPARAZ ET AL., J IMMUNOL, vol. 183, 2009, pages 6041 - 6050
BERER ET AL., PROC NAT AC SC USA, 2017
HARRIES ET AL., BR MED J CLIN RES ED, vol. 284, 1982, pages 706
WILKS, MED TIMES GAZETTE, vol. 2, 8 January 1959 (1959-01-08), pages 264 - 265
GHIONE ET AL., AM J GASTROENTEROL, 2017
MOLODECKY ET AL., GASTROENTEROLOGY, vol. 142, 2012, pages 46 - 54
DE SOUZA ET AL., NAT REV GASTROENTEROL HEPATOL, vol. 13, 2016, pages 13 - 27
SMYTHIES ET AL., J CLIN INVEST, vol. 115, 2005, pages 66 - 75
KAMADA ET AL., J CLIN INVEST, vol. 118, 2008, pages 2269 - 2280
GERLACH ET AL., NAT IMMUNOL, vol. 15, 2014, pages 676 - 686
MAUL ET AL., GASTROENTEROLOGY, vol. 128, 2005, pages 1868 - 1878
GOMOLLON ET AL., J CROHNS COLITIS, vol. 11, 2017, pages 3 - 25
MANDREKAR, J THORAC ONCOL, vol. 5, 2010, pages 1315 - 1316
Attorney, Agent or Firm:
VIB VZW (BE)
Claims:
1. A method of detecting a disease or disorder in a subject, said method comprising: a. Measuring in a biological sample of said subject the level of at least one metabolic biomarker selected from Table 1; b. Comparing the measured level of the at least one biomarker of said subject sample to that of a control sample; and c. Determining that the subject suffers from a disease or disorder if the measured level of the at least one biomarker in the subject sample is increased or decreased relative to the level of that of the control sample.

2. The method according to claim 1, wherein the disease or disorder is selected from the list consisting of gut flora dysbiosis, an inflammatory disorder, obesity, diabetes type 2 and depression.

3. The method according to claim 2, wherein the inflammatory disorder is selected from the list consisting of spondyloarthritis, ankylosing spondylitis, reactive arthritis, psoriatic arthritis, enteropathic arthritis, undifferentiated spondyloarthritis, juvenile idiopathic arthritis, primary sclerosing cholangitis, multiple sclerosis and any gut inflammation associated therewith.

4. The method according to claim 2, wherein the inflammatory disorder is a gut inflammatory disorder selected from the list consisting of Crohn's disease, irritable bowel syndrome, inflammatory bowel disease, ulcerative colitis and celiac disease.

5. The method according to any of the preceding claims, wherein the difference in the measured level of the at least one biomarker in the subject sample is statistically different from that of the control sample.

6. The method according to any of the preceding claims, wherein the biological sample is selected from the list consisting of blood, serum and plasma.

7. The method according to any of the preceding claims, wherein the at least one metabolic biomarker is 1H-indole-7-acetic acid, 3-phenylpropionate or cinnamoylglycine.

8. The method according to any of the preceding claims, wherein the at least one metabolic biomarker is a group of biomarkers.

9. The method according to any of the preceding claims, wherein the group of biomarkers comprises or consists of at least 2 biomarkers selected from Table 6, at least 3 biomarkers selected from Table 7 or at least 4 biomarkers selected from Table 8.

10. The method according to claim 8, wherein said group of biomarkers comprises or consists of at least one biomarker selected from Table 5, at least 2 biomarkers selected from Table 6, at least 3 biomarkers selected from Table 7 or at least 4 biomarkers selected from Table 8 and one or more further metabolic biomarkers selected from Table 1, Table 6, Table 7, Table 8 and/or Table 9.

11. The method according to claim 8, wherein the group of biomarkers comprises or consists of the group of biomarkers listed in Table 9, Table 10 or Table 1.

12. A biomarker panel comprising or consisting of at least 2 biomarkers selected from Table 6, at least 3 biomarkers selected from Table 7, at least 4 biomarkers selected from Table 8 or comprising or consisting of at least one biomarker selected from Table 5 and at least 1 biomarker selected from Table 1, Table 6, Table 7, Table 8 or Table 9.

13. The biomarker panel according to claim 12, wherein said panel comprises or consists of the group of biomarkers listed in Table 9, Table 10 or Table 1.

14. The biomarker panel according to claim 12 or 13 for use in diagnosing gut flora dysbiosis, an inflammatory disorder, obesity, diabetes type 2 or depression in a subject.

15. The biomarker panel according to claim 12 or 13 for use according to claim 14, wherein the inflammatory disorder is selected from the list consisting of spondyloarthritis, ankylosing spondylitis, reactive arthritis, psoriatic arthritis, enteropathic arthritis, undifferentiated spondyloarthritis, juvenile idiopathic arthritis, primary sclerosing cholangitis, multiple sclerosis and any gut inflammation associated therewith.

16. The biomarker panel according to claim 12 or 13 for use according to claim 14, wherein the inflammatory disorder is a gut inflammatory disorder selected from the list consisting of Crohn's disease, irritable bowel syndrome, inflammatory bowel disease, ulcerative colitis and celiac disease.


Description:
MEANS AND METHODS TO DIAGNOSE GUT FLORA DYSBIOSIS AND INFLAMMATION

Field of the invention

The present invention relates to the field of the human gut microbiome, more particularly to its effect on health and disease. Provided herein are means and methods to diagnose and treat or reduce the severity of gut flora dysbiosis as well as of gastro-intestinal inflammation and inflammation-associated disorders or conditions in a subject in need thereof.

Background

The human gut is the natural habitat for a large and dynamic bacterial community. These human digestive-tract associated microbes are referred to as the gut microbiome. The human gut microbiome and its role in both health and disease have been the subject of extensive research. Imbalance of the normal gut microbiota, or gut flora dysbiosis, has been linked with gastrointestinal conditions such as inflammatory bowel disease (IBD) and irritable bowel syndrome (IBS), and with wider systemic manifestations of disease such as obesity, diabetes, depression and atopy. A problem in mapping the gut microbiome is that the majority of bacteria living in the gut cannot be identified by traditional culturing methods. Therefore, culturing-independent methods have been developed, such as 16S rRNA gene sequencing and shotgun sequencing. Based on the overall microbiota composition of a stool sample, bioinformatics analyses such as Dirichlet Multinomial Mixtures (DMM) modeling allow classifying the human gut microbiome in genera-driven clusters or enterotypes (Arumugam et al 2011 Nature 473:174-180; Falony et al 2016 Science 352:560-564). DMM community typing distinguishes a Ruminococcaceae (Rum or R), Prevotella (Prev or P), Bacteroides1 (Bact1 or B1), and Bacteroides2 (Bact2 or B2) enterotype. The latter is a recently described intestinal microbiota configuration embodying gut flora dysbiosis. It has also been demonstrated that Bacteroides2 is associated with systemic inflammation, inflammatory bowel disease, primary sclerosing cholangitis, obesity, depression and multiple sclerosis, and has a high prevalence in loose stools in humans (Vandeputte et al 2017 Nature 551: 507-511; Valles-Colomer et al 2019 Nat Microbiol 4: 623-632; Vieira-Silva et al 2019 Nat Microbiol 4: 1826-1831; Vieira-Silva et al 2020 Nature 581: 310-315; Reynders et al 2020 Ann Clin Transl Neur 7: 406-419). B2 is characterized by a high proportion of Bacteroides, a low proportion of Faecalibacterium and low microbial cell densities (Vandeputte et al 2017 Nature 551: 507-511). Its prevalence varies from 13% in a general population cohort to as high as 78% in patients with inflammatory bowel disease.

Given the negative correlation between the B2 enterotype and health, and given the complexity of B2 enterotype classification (i.e. combining microbiome profiling and flow cytometric enumeration of microbial cells), it would be advantageous to develop an easy and cheap diagnostic, preferably based on conventional biological samples used for diagnostic purposes, such as blood.

Summary

Previously, the inventors of the current application found that the Bacteroides2 enterotype represents gut flora dysbiosis and that it is predominantly present in patients with systemic and intestinal inflammation, indicating that the Bacteroides2 enterotype depicts a vulnerable microbial community associated with disease or pre-disease status. Diagnosing B2 at an early stage would thus be advantageous in order to intervene therapeutically before severe clinical complaints arise. The inventors of the current application have identified a narrow set of metabolites that allows predicting the B2 enterotype based on blood serum metabolomics.

Therefore, it is an object of the invention to provide a method of detecting or diagnosing gut flora dysbiosis and/or inflammation in a subject, said method comprising the steps of: measuring in a biological sample of said subject the level of at least one metabolic biomarker selected from Table 1; comparing the measured level of the at least one biomarker of said subject sample to that of a control sample; and determining that the subject suffers from gut flora dysbiosis and/or inflammation if the measured level of the at least one biomarker in the subject sample is increased or decreased relative to the level of that of the control sample and/or if the difference in the measured level of the at least one biomarker in the subject sample is statistically different from that of the control sample.
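
By way of illustration only, the measure, compare and determine steps can be sketched as follows. The sketch assumes that the control sample is represented by several control measurements, that these are summarised by their mean and sample standard deviation, and that a z-score cutoff of 1.96 marks a difference; the function name, the cutoff and the example values are hypothetical and are not taken from the application.

```python
from statistics import mean, stdev
from typing import Sequence


def compare_to_control(subject_level: float,
                       control_levels: Sequence[float],
                       z_cutoff: float = 1.96) -> str:
    """Classify a subject's biomarker level relative to control samples.

    Assumption (not from the application): the control group is summarised
    by its mean and sample standard deviation, and |z| > z_cutoff counts
    as a difference.
    """
    if len(control_levels) < 2:
        raise ValueError("need at least two control measurements")
    control_mean = mean(control_levels)
    control_sd = stdev(control_levels)
    if control_sd == 0:
        raise ValueError("control measurements show no variation")
    z_score = (subject_level - control_mean) / control_sd
    if z_score > z_cutoff:
        return "increased"
    if z_score < -z_cutoff:
        return "decreased"
    return "not different"


# Hypothetical control measurements of one metabolite (arbitrary units).
controls = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
print(compare_to_control(1.5, controls))
print(compare_to_control(1.1, controls))
```

Any other statistical comparison could take the place of the z-score; the point is only that the determination step reduces to a comparison of the measured level against a control reference.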

In particular embodiments, inflammation can be gut inflammation associated with for example Crohn's disease, irritable bowel syndrome, inflammatory bowel disease, ulcerative colitis or celiac disease, but inflammation can also be not related to the gut, for example primary sclerosing cholangitis, spondyloarthritis or multiple sclerosis. The same method steps can also be used for methods of detecting diabetes type 2 or depression in a subject. In other embodiments, the inflammatory disorder is characterized by a TH1, TH17, TH2 and/or TH9 response.

In one embodiment, the biological sample is selected from the list consisting of blood, serum and plasma. In a particular embodiment, the at least one metabolic biomarker selected from Table 1 is 1H-indole-7-acetic acid, 3-phenylpropionate or cinnamoylglycine. The at least one metabolic biomarker can also be a group of biomarkers. In a further particular embodiment, said group of biomarkers comprises or consists of at least 2 biomarkers selected from Table 6, at least 3 biomarkers selected from Table 7 or at least 4 biomarkers selected from Table 8. Said group of biomarkers can also comprise 1H-indole-7-acetic acid, 3-phenylpropionate or cinnamoylglycine and one or more metabolic biomarkers selected from Table 1, Table 6, Table 7, Table 8 and/or Table 9. In a most particular embodiment, said group of biomarkers comprises or consists of the group of biomarkers listed in Table 9, Table 10 or Table 1.

Given that the application discloses single metabolites or sets of metabolites for the purpose of diagnosing gut flora dysbiosis or the B2 enterotype and hence all diseases or disorders that are linked to B2, the application also provides biomarker panels. In one embodiment, these biomarker panels comprise at least 2 biomarkers selected from Table 6, at least 3 biomarkers selected from Table 7 or at least 4 biomarkers selected from Table 8 or comprise 1H-indole-7-acetic acid, 3-phenylpropionate or cinnamoylglycine and at least 1 biomarker selected from Table 1, 6, 7, 8 or 9. In a most particular embodiment, a biomarker panel is provided comprising or consisting of the group of biomarkers listed in Table 9 or Table 1.

These biomarker panels are also provided for use in diagnosing gut flora dysbiosis, an inflammatory disorder, obesity, diabetes type 2 or depression in a subject, wherein the inflammatory disorder can be selected from the list consisting of spondyloarthritis, ankylosing spondylitis, reactive arthritis, psoriatic arthritis, enteropathic arthritis, undifferentiated spondyloarthritis, juvenile idiopathic arthritis, primary sclerosing cholangitis, multiple sclerosis and any gut inflammation associated therewith. In particular embodiments, the inflammatory disorder is a gut inflammatory disorder selected from the list consisting of Crohn's disease, irritable bowel syndrome, inflammatory bowel disease, ulcerative colitis and celiac disease.

Brief description of the Figures

Figure 1 shows the correlations between the various metabolites detected to be important for predicting the B2 enterotype from blood serum metabolomics. Two distinct groups can be observed: one group (the cluster on the bottom right containing phenol sulfate and D-Urobilin among others) that is elevated in participants with the B2 enterotype, and one with metabolites decreased in B2 (the cluster on the top left containing hippurate and catechol sulfate among others). The numbers correspond to the compounds listed in Table 4.

Figure 2 shows the ROC curve when predicting the B2 enterotype vs non-B2 using an ADABoostClassifier taking all metabolites with significant differences (p < 0.05 after Bonferroni correction) into account. The shaded area around the curve is the 95% confidence interval.

Figure 3 shows the ROC curve when predicting the B2 enterotype vs non-B2 using an ADABoostClassifier taking the top 10 (based on feature importance of the ADABoost classifier) metabolites as listed in Table 9 into account. The shaded area around the curve is the 95% confidence interval.

Detailed description

Definitions

The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. Any reference signs in the claims shall not be construed as limiting the scope. The drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn on scale for illustrative purposes. Where the term "comprising" is used in the present description and claims, it does not exclude other elements or steps. Where an indefinite or definite article is used when referring to a singular noun e.g. "a" or "an", "the", this includes a plural of that noun unless something else is specifically stated.

Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.

The following terms or definitions are provided solely to aid in the understanding of the invention. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Practitioners are particularly directed to Michael R. Green and Joseph Sambrook, Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Plainview, New York (2012); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999), for definitions and terms of the art. The definitions provided herein should not be construed to have a scope less than understood by a person of ordinary skill in the art.

Central in this application is the Bacteroides2, B2 or Bact2 enterotype. The B2 enterotype is an intestinal microbiota configuration that is associated with systemic inflammation and has a high prevalence in loose stools in humans (Vandeputte et al 2017 Nature 551: 507-511). B2 is characterized by a high proportion of Bacteroides, a low proportion of Faecalibacterium and low microbial cell densities, and its prevalence varies from 13% in a general population cohort to as high as 78% in patients with inflammatory bowel disease (Vandeputte et al 2017 Nature 551: 507-511). The B2 enterotype represents gut flora dysbiosis.

A "high proportion of Bacteroides" refers to a "high relative fraction of the Bacteroides genus" and is defined herein as an at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100%, at least 1.5 fold, at least 2 fold, at least 3 fold, at least 5 fold or at least 10 fold higher relative abundance compared to the relative abundance of the Bacteroides genus in the stool sample of a healthy subject. "Bacteroides" as used herein refers to a genus of Gram-negative, obligate anaerobic bacteria. Bacteroides species are normally mutualistic, making up the most substantial portion of the mammalian gastrointestinal flora. The Bacteroides genus belongs to the family of Bacteroidaceae and a non-limiting example of a Bacteroides species is B. fragilis.

A "low proportion of Faecalibacterium" refers to a "low relative fraction of the Faecalibacterium genus" and is defined herein as an at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100%, at least 1.5 fold, at least 2 fold, at least 3 fold, at least 5 fold or at least 10 fold lower relative abundance compared to the relative abundance of the Faecalibacterium genus in the stool sample of a healthy subject. "Faecalibacterium" as used herein refers to a genus of bacteria of which the sole known species, Faecalibacterium prausnitzii, is Gram-positive, mesophilic, rod-shaped and anaerobic, and is one of the most abundant and important commensal bacteria of the human gut microbiota. It is non-spore forming and non-motile. These Faecalibacterium bacteria produce butyrate and other short-chain fatty acids through the fermentation of dietary fiber. "Relative fraction" or "relative abundance" as used herein refers to the fraction or abundance of a certain genus with respect to or compared to a plurality of other genera present in the stool sample.
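
As a minimal sketch of how these relative-abundance comparisons can be computed, the following example derives the relative fraction of each genus from per-genus counts and expresses a patient value as a percentage increase or a fold decrease relative to a healthy reference. The function names and all counts are hypothetical and serve illustration only; the same fold calculation applies to any quantity compared against a healthy reference.

```python
from typing import Dict


def relative_abundance(genus_counts: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each genus relative to all genera counted in the sample."""
    total = sum(genus_counts.values())
    if total <= 0:
        raise ValueError("sample contains no counts")
    return {genus: count / total for genus, count in genus_counts.items()}


def percent_higher(sample_value: float, reference_value: float) -> float:
    """Percentage by which sample_value exceeds reference_value."""
    return (sample_value - reference_value) / reference_value * 100.0


def fold_lower(sample_value: float, reference_value: float) -> float:
    """Factor by which sample_value lies below reference_value."""
    return reference_value / sample_value


# Hypothetical genus-level counts for a healthy reference and a patient.
healthy = relative_abundance(
    {"Bacteroides": 200, "Faecalibacterium": 150, "Prevotella": 650})
patient = relative_abundance(
    {"Bacteroides": 500, "Faecalibacterium": 50, "Prevotella": 450})

print(percent_higher(patient["Bacteroides"], healthy["Bacteroides"]))
print(fold_lower(patient["Faecalibacterium"], healthy["Faecalibacterium"]))
```

With these made-up counts the patient shows a Bacteroides fraction roughly 150% higher and a Faecalibacterium fraction roughly 3 fold lower than the reference, which would satisfy several of the thresholds listed above.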

"Low microbial cell densities" or "low microbial cell count" as used herein is a microbial cell count which is at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100%, at least 1.5 fold, at least 2 fold, at least 3 fold, at least 5 fold or at least 10 fold lower than the microbial count of a stool sample of a healthy subject.

"Cell count" as used herein refers to the sample cell density, in other words how many cells, more particularly microbial cells, are present in the sample, more particularly the stool sample. Multiple methods are known by the skilled person to quantify microbial cell count in a stool sample, which is typically presented as cells per gram of stool.

"Stool sample" and "fecal sample" are used interchangeably and refer to a sample or aliquot of the stool or feces of a subject, more particularly a mammal, even more particularly a human being, most particularly a patient. The stool sample as used herein comprises the gut microbiome from a human patient to be diagnosed. As used herein, the term "microflora" refers to the collective bacteria in an ecosystem of a host (e.g. an animal, such as a human) or in a single part of the host's body, e.g. the gut. An equivalent term is "microbiota". As used herein, the term "microbiome" refers to the totality of bacteria and their genetic elements (genomes) in a defined environment, e.g. within the gut of a host, the latter then being referred to as the "gut microbiome".

As used herein, the term "patient" or "individual" or "subject" typically denotes humans, but may also encompass reference to non-human animals, preferably warm-blooded animals, more preferably mammals, such as, e.g. non-human primates, rodents, canines, felines, equines, ovines, porcines, and the like.

As used herein, the term "gut" generally comprises the stomach, the colon, the small intestine, the large intestine, cecum and the rectum. In addition, regions of the gut may be subdivided, e.g. the right versus the left side of the colon may have different microflora populations due to the time required for digesting material to move through the colon, and changes in its composition in time. Synonyms of gut include the "gastrointestinal tract", or possibly the "digestive system", although the latter is generally also understood to comprise the mouth, esophagus, etc.

The term "gut microbiome composition" is equivalent to "gut microbiome profile" and these wordings are used interchangeably herein. A gut microbiome profile represents the presence, absence or abundance of one or more bacterial genera identified in a stool sample. The gut microbiome profile can be determined based on an analysis of amplification products of DNA and/or RNA of the gut microbiota, e.g. based on an analysis of amplification products of genes coding for one or more of small subunit rRNA, etc. and/or based on an analysis of proteins and/or metabolic products present in the biological sample. Gut microbiome profiles may be "compared" by any of a variety of statistical analytic procedures.

In microbiology, "16S sequencing" or "16S" refers to a sequence derived by characterizing the nucleotides that comprise the 16S ribosomal RNA gene(s). The bacterial 16S rRNA is approximately 1500 nucleotides in length and is used in reconstructing the evolutionary relationships and sequence similarity of one bacterial isolate to another using phylogenetic approaches.

For the current application "method to detect an inflammatory disorder" is equivalent to a "method to detect the presence or to assess the risk of developing an inflammatory disease".

The term "inflammation", "inflammatory disorder" or "inflammatory disease" refers to the complex biological response of body tissues to harmful stimuli, such as pathogens, damaged cells, or irritants, which is well known to the skilled person. Inflammation is not a synonym for infection, though. Infection describes the interaction between the action of microbial invasion and the reaction of the body's inflammatory response; the two components are considered together when discussing an infection, and the word is used to imply a microbial invasive cause for the observed inflammatory reaction. Inflammation on the other hand describes purely the body's immunovascular response, whatever the cause may be. Inflammation is a protective response involving immune cells, blood vessels, and molecular mediators. The function of inflammation is to eliminate the initial cause of cell injury, clear out necrotic cells and tissues damaged from the original insult and the inflammatory process, and to initiate tissue repair. The classical signs of inflammation are heat, pain, redness, swelling, and loss of function. Inflammation is a generic response, and therefore it is considered as a mechanism of innate immunity, as compared to adaptive immunity, which is specific for each pathogen. Inflammation can be classified as either acute or chronic. Acute inflammation is the initial response of the body to harmful stimuli and is achieved by the increased movement of plasma and leukocytes (especially granulocytes) from the blood into the injured tissues. A series of biochemical events propagates and matures the inflammatory response, involving the local vascular system, the immune system, and various cells within the injured tissue. Prolonged inflammation, known as chronic inflammation, leads to a progressive shift in the type of cells present at the site of inflammation, such as mononuclear cells, and is characterized by simultaneous destruction and healing of the tissue from the inflammatory process.

The term ROC or Receiver Operating Characteristic curve refers to a graphical plot that illustrates the diagnostic ability of a binary classifier system or, alternatively phrased, a probability curve. The area under the curve (often referred to simply as the AUC) refers to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. It thus tells how well the model is capable of distinguishing between classes. The higher the AUC, the better the prediction model is. In the ROC curve the sensitivity or True Positive Rate (TPR) is plotted against the False Positive Rate (FPR) or 1-specificity, where TPR is on the y-axis and FPR is on the x-axis.
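
The ranking interpretation of the AUC given above can be made concrete with a short sketch. It computes the AUC directly as the fraction of positive/negative pairs in which the positive instance receives the higher score (ties counted as one half), and it computes one (TPR, FPR) point of the ROC curve for a chosen threshold. The function names and scores are hypothetical; this is an illustration of the definition, not the software used by the inventors.

```python
from typing import Sequence, Tuple


def roc_auc(scores_positive: Sequence[float],
            scores_negative: Sequence[float]) -> float:
    """Probability that a random positive scores higher than a random negative."""
    if not scores_positive or not scores_negative:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_positive) * len(scores_negative))


def tpr_fpr(scores_positive: Sequence[float],
            scores_negative: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """One ROC point: instances scoring >= threshold are called positive."""
    tpr = sum(s >= threshold for s in scores_positive) / len(scores_positive)
    fpr = sum(s >= threshold for s in scores_negative) / len(scores_negative)
    return tpr, fpr


# Hypothetical classifier scores for B2 (positive) and non-B2 (negative) subjects.
b2_scores = [0.9, 0.8, 0.4]
non_b2_scores = [0.7, 0.3, 0.2]
print(roc_auc(b2_scores, non_b2_scores))       # 8 of 9 pairs ranked correctly
print(tpr_fpr(b2_scores, non_b2_scores, 0.5))  # one point on the ROC curve
```

An AUC of 0.5 corresponds to random ranking and an AUC of 1.0 to perfect separation of the two classes.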

A metabolic profile correlates with the B2 gut enterotype

The goal of the study that led to the current invention was to find a narrow set of metabolites that allows the prediction of the B2 enterotype based on blood serum metabolomics. To this end various machine learning approaches were used to select relevant metabolites. Starting with 1000 different metabolites that were measured in blood samples, the inventors of the current application found that the level of 59 metabolic analytes correlated with the B2 enterotype. A biomarker panel of 59 metabolic analytes depicted in Table 1 provided an excellent B2 prediction. Surprisingly, the list of metabolites could be narrowed down to the 10 biomarkers listed in Table 9 and even down to the 5 biomarkers listed in Table 10 while maintaining a highly reliable prediction (ROC AUC>0.8).
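
The description of Figure 2 refers to metabolites with significant differences at p < 0.05 after Bonferroni correction. A minimal sketch of such a filter is given below, assuming one raw p-value per metabolite and a correction for the total number of metabolites tested. The metabolite names and p-values are invented for illustration, and the statistical test that would produce the raw p-values is not specified here.

```python
from typing import Dict, List


def bonferroni_significant(p_values: Dict[str, float],
                           alpha: float = 0.05) -> List[str]:
    """Names of metabolites whose Bonferroni-adjusted p-value is below alpha.

    The adjusted p-value is the raw p-value multiplied by the number of
    tests performed, capped at 1.0.
    """
    n_tests = len(p_values)
    return sorted(
        name for name, p in p_values.items()
        if min(p * n_tests, 1.0) < alpha
    )


# Hypothetical raw p-values for four measured metabolites.
raw_p = {
    "metabolite_A": 0.00001,
    "metabolite_B": 0.02,
    "metabolite_C": 0.004,
    "metabolite_D": 0.5,
}
print(bonferroni_significant(raw_p))  # ['metabolite_A', 'metabolite_C']
```

With 1000 metabolites tested, the same rule would require a raw p-value below 0.05/1000 = 0.00005 for a metabolite to be retained.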

Therefore, in a first aspect a biomarker panel is provided comprising at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14 or at least 15 metabolic biomarkers selected from Table 1. In one particular embodiment, a biomarker panel is provided comprising at least 4, at least 5, at least 6, at least 7, at least 8 or 9 metabolic biomarkers selected from Table 9. In another particular embodiment, said biomarker panels comprise at least 3-phenylpropionate, isoursodeoxycholate and/or p-cresol sulfate.

In a most particular embodiment, a biomarker panel is provided comprising or consisting of the group of biomarkers listed in Table 9, Table 10 or Table 1.

Table 9. A biomarker panel consisting of a set of 10 metabolites which can be used to generate high quality predictions (ROC AUC> 0.8) for the B2 enterotype. (2) means isoform 2.

Table 10. A biomarker panel consisting of a set of 5 metabolites which can be used to generate high quality predictions (ROC AUC> 0.8) for the B2 enterotype.

In an attempt to restrict the number of metabolic biomarkers to still reliably predict the B2 enterotype, several combinations of two or more metabolites were tested. Surprisingly, acceptable predictions (ROC AUC>0.7) could be obtained from single metabolic markers (i.e. 1H-indole-7-acetic acid, 3-phenylpropionate and cinnamoylglycine), while several combinations of two or more other metabolites were found to be usable to predict the B2 enterotype as well (see Example 3).

Therefore, in yet another embodiment, a biomarker panel is provided comprising or consisting of at least 2 metabolic biomarkers selected from Table 6.

In another embodiment, a biomarker panel is provided comprising or consisting of at least 3 metabolic biomarkers selected from Table 7.

In another embodiment, a biomarker panel is provided comprising or consisting of at least 4 metabolic biomarkers selected from Table 8.

In yet another embodiment, a biomarker panel is provided comprising 1H-indole-7-acetic acid, 3-phenylpropionate or cinnamoylglycine and further comprising at least one additional metabolic biomarker. In a particular embodiment, said at least one additional metabolic biomarker is selected from Table 9, Table 10 or from Table 1. In yet another embodiment, a biomarker panel is provided comprising or consisting of 1H-indole-7-acetic acid, 3-phenylpropionate and/or cinnamoylglycine.

Table 6. A biomarker panel consisting of 13 metabolites. Combinations of at least 2 of these metabolites predict the B2 enterotype with a ROC AUC>0.7.

Table 7. A biomarker panel consisting of 5 metabolites. Combinations of at least 3 of these metabolites predict the B2 enterotype with a ROC AUC>0.7. [2] means isoform 2.

Table 8. A biomarker panel consisting of 16 metabolites. Combinations of at least 4 of these metabolites predict the B2 enterotype with a ROC AUC>0.7. [2] means isoform 2.

Several minimal combinations could be identified that rendered an ROC AUC>0.8. For example, this is the case for the combinations of at least 6 metabolites from Table 11 or of at least 11 from Table 12. Therefore, in yet another embodiment, a biomarker panel is provided comprising or consisting of at least 6, at least 7, at least 8, at least 9 or at least 10 metabolic biomarkers selected from Table 11. In yet another embodiment, a biomarker panel is provided comprising or consisting of at least 11, at least 12, at least 13, at least 14 or at least 15 metabolic biomarkers selected from Table 12.

Table 11. A biomarker panel consisting of a set of 15 metabolites. Combinations of at least 6 of these metabolites predict the B2 enterotype with a ROC AUC> 0.8. (2) means isoform 2.

Table 12. A biomarker panel consisting of a set of 21 metabolites. Combinations of at least 11 of these metabolites predict the B2 enterotype with a ROC AUC> 0.8. (2) or [2] means isoform 2.

From hereon, the biomarker panels disclosed above will be referred to as "one of the biomarker panels of the application" or as "any of the biomarker panels of the application".

In a second aspect, any of the biomarker panels of the application is provided for use in diagnosing a disease or disorder. Indeed, the B2 enterotype represents gut flora dysbiosis and is associated with health problems and several inflammatory disorders. People who have this dysbiotic enterotype have a higher blood concentration of C-reactive protein - a hallmark of inflammation - than do individuals who have other enterotypes (Costea et al 2018 Nat Microbiol 3: 8-16). More than 75% of individuals who have IBD have the B2 enterotype in contrast to fewer than 15% of people who do not have the disease (Vieira-Silva et al 2019 Nat Microbiol 4: 1826-1831). The B2 enterotype is also correlated to primary sclerosing cholangitis (Vieira-Silva et al 2019 Nat Microbiol 4: 1826-1831), multiple sclerosis (Reynders et al 2020 Ann Clin Transl Neur 7: 406-419), depression (Valles-Colomer et al 2019 Nat Microbiol 4: 623-632) and obesity (Vieira-Silva et al 2020 Nature 581: 310-315).

Any of the biomarker panels of the application are also provided for use in detecting in a subject a gut flora microbiome associated with or predictive for a disease or disorder. In one embodiment, said disease or disorder is gut flora dysbiosis and/or an inflammatory disorder in a subject. In another embodiment, said disease or disorder is obesity, diabetes type 2 or depression.

In a particular embodiment, said inflammatory disorder is selected from the list consisting of spondyloarthritis, ankylosing spondylitis, reactive arthritis, psoriatic arthritis, enteropathic arthritis, undifferentiated spondyloarthritis, juvenile idiopathic arthritis, primary sclerosing cholangitis, multiple sclerosis, a gut inflammatory disorder, inflammatory bowel disease (IBD), Crohn's disease (CD), ulcerative colitis (UC), irritable bowel syndrome (IBS), celiac disease and any combination thereof and any gut inflammation associated with one of the above listed inflammatory disorders. In another particular embodiment, said inflammatory disorder is characterized by a TH1, TH17, TH2 and/or TH9 response.

In another aspect, the use of any of the biomarker panels of the application is provided to classify, categorize or distinguish different gut flora microbiomes based on isolated biological samples. The use of any of the biomarker panels of the application is also provided to distinguish a B2 enterotype or a dysbiotic gut microbiome or a gut microbiome associated with gut flora dysbiosis and/or an inflammatory disorder from a non-B2 enterotype or a gut microbiome not associated with gut flora dysbiosis and/or an inflammatory disorder.

In a fourth aspect, methods of detecting a disease or disorder in a subject are provided comprising the following steps:

Measuring in a biological sample of said subject the level of at least one metabolic biomarker selected from any of the metabolic biomarker panels of the application;

Comparing the measured level of the at least one biomarker of said subject sample to that of a control sample; and

Determining that the subject suffers from a disease or disorder if the measured level of the at least one biomarker in the subject sample is increased or decreased relative to the level of the at least one biomarker in the control sample and/or if the difference between the measured level of the at least one biomarker in the subject sample and that of the control sample is statistically significant.

In one embodiment, said disease or disorder is gut flora dysbiosis and/or an inflammatory disorder. In another embodiment, said disease or disorder is obesity, diabetes type 2 or depression. In a particular embodiment, said inflammatory disorder is selected from the list consisting of spondyloarthritis, ankylosing spondylitis, reactive arthritis, psoriatic arthritis, enteropathic arthritis, undifferentiated spondyloarthritis, juvenile idiopathic arthritis, primary sclerosing cholangitis, multiple sclerosis, a gut inflammatory disorder, inflammatory bowel disease (IBD), Crohn's disease (CD), ulcerative colitis (UC), irritable bowel syndrome (IBS), celiac disease and any combination thereof and any gut inflammation associated with one of the above listed inflammatory disorders. In another particular embodiment, said inflammatory disorder is characterized by a TH1, TH17, TH2 and/or TH9 response.

In particular embodiments, the above disclosed method steps are provided for a method of detecting or diagnosing in a subject a gut microbiome associated with or predictive for gut flora dysbiosis and/or an inflammatory disorder. Even more particularly, said method steps are also provided for methods of distinguishing or predicting or diagnosing different gut flora microbiomes, more particularly a gut flora microbiome associated with gut flora dysbiosis or inflammation, most particularly a Bacteroides2 enterotype.

In one embodiment, the biological sample for the methods of current application is selected from the list consisting of blood, serum and plasma.

In one embodiment, said at least one metabolic biomarker selected from any of the metabolic biomarker panels of the application is 1H-indole-7-acetic acid (CAS No. 39689-63-9), 3-phenylpropionate (CAS No. 501-52-0, alternative names are hydrocinnamate and 3-phenylpropanoate) or cinnamoylglycine (CAS No. 16534-24-0). This is equivalent to saying that said at least one metabolic biomarker is selected from Table 5.

Table 5. A biomarker panel consisting of 1H-indole-7-acetic acid, 3-phenylpropionate and cinnamoylglycine.

In another embodiment, said at least one metabolic biomarker selected from any of the metabolic biomarker panels of the application is a group of metabolic biomarkers. In a more particular embodiment, said group of metabolic biomarkers comprises at least 2 metabolic biomarkers selected from Table 6, or at least 3 metabolic biomarkers selected from Table 7 or at least 4 metabolic biomarkers selected from Table 8 or at least 6 metabolic markers selected from Table 11 or at least 11 metabolic biomarkers selected from Table 12.

In yet another embodiment, said group of metabolic biomarkers comprises at least one metabolic biomarker selected from the list consisting of 1H-indole-7-acetic acid, 3-phenylpropionate (hydrocinnamate) and cinnamoylglycine and further comprises at least one, at least 2, at least 3 or at least 4 additional metabolic biomarker(s). In a further particular embodiment, said at least one, at least 2, at least 3 or at least 4 additional metabolic biomarker(s) is/are selected from Table 6, 7, 8, 9, 10, 11, 12 or 1.

In yet another embodiment, said group of metabolic biomarkers comprises at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14 or at least 15 metabolic biomarkers selected from Table 1. In yet another embodiment, said group of metabolic biomarkers comprises at least 4, at least 5, at least 6, at least 7, at least 8 or 9 metabolic biomarkers selected from Table 9. In yet another embodiment, said group of metabolic biomarkers comprises at least 3-phenylpropionate, isoursodeoxycholate and/or p-cresol sulfate.

In a most particular embodiment, said group of metabolic biomarkers comprises or consists of the group of biomarkers listed in Table 9, Table 10 or Table 1.

Blood serum metabolite levels can be measured in parallel using Liquid Chromatography paired with Mass Spectrometry (LC-MS), tandem mass-spectrometry (LC-MS/MS), gas chromatography paired with mass spectrometry (GC-MS), high performance liquid chromatography using UV or fluorescent detection, nuclear magnetic resonance (NMR) spectroscopy or combinations thereof. The skilled person is familiar with the measurements, as a plethora of platforms are commercially available to perform these measurements. Alternatively, when considering a limited set of key metabolic markers one or more targeted assays could also be used.

With access to samples and a labeled dataset, where the enterotype is known, a classifier can be trained on (a subset of) the measured metabolites. Any classifier that can predict a class label from one or more continuous features can be used. This includes, but is not limited to, Decision Trees, Random Forest Classifiers, Support Vector Classifiers, the Stochastic Gradient Descent Classifier and the AdaBoostClassifier. Various implementations of these classifiers are available in, among others, Scikit-learn (for the Python language) and Machine Learning for R (mlr library for R). Once trained on a labeled set, metabolite levels from patients' samples with an unknown enterotype can be provided to the trained classifier to obtain a predicted class (in this case B2 or non-B2).
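The train-then-predict workflow described above can be sketched as follows. To keep the example free of external dependencies, it implements a depth-one decision tree (a decision stump) in plain Python as a stand-in for the library classifiers named in the text; it is not the classifier used by the inventors. The metabolite levels are invented, and the function names `train_stump` and `predict` are hypothetical.

```python
def train_stump(samples, labels, positive="B2"):
    """Fit a depth-one decision tree: choose the feature, threshold and
    direction that give the fewest errors on the labeled training set."""
    best = None
    for f in range(len(samples[0])):
        values = sorted({row[f] for row in samples})
        # Candidate thresholds: midpoints between consecutive observed values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for below_is_positive in (True, False):
                errors = sum(
                    ((row[f] < t) == below_is_positive) != (y == positive)
                    for row, y in zip(samples, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, below_is_positive)
    if best is None:
        raise ValueError("no split possible: all feature values are identical")
    _, feature, threshold, below = best
    return {"feature": feature, "threshold": threshold,
            "below_is_positive": below}


def predict(stump, row, positive="B2", negative="non-B2"):
    """Predict the class label of one sample with a trained stump."""
    below = row[stump["feature"]] < stump["threshold"]
    return positive if below == stump["below_is_positive"] else negative


# Invented training data (arbitrary units).
# Columns: 3-phenylpropionate, isoursodeoxycholate.
train_X = [[0.20, 1.4], [0.30, 1.1], [0.25, 0.9],
           [0.90, 0.8], [1.10, 1.0], [1.30, 0.5]]
train_y = ["B2", "B2", "B2", "non-B2", "non-B2", "non-B2"]

stump = train_stump(train_X, train_y)

# Samples with unknown enterotype.
print(predict(stump, [0.28, 1.2]))
print(predict(stump, [1.00, 0.7]))
```

With a library such as Scikit-learn, the same two steps would typically map onto the `fit` and `predict` methods of the chosen classifier, with model performance assessed on held-out samples rather than on the training set.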

Alternatively, a set of logical rules, depending on upper and lower thresholds for key metabolic markers, could also be designed to characterize the B2 enterotype. Indeed, in the methods described herein, the determination step can be based on an increased or decreased level of the at least one metabolic biomarker in the subject sample compared to that in the control sample (see also Table 1). If 3-phenylpropionate is measured as one of the metabolic biomarkers then a decreased level is predictive for the disease or disorder (e.g. gut flora dysbiosis and/or an inflammatory disorder, obesity, diabetes type 2, depression) or for a gut microbiome associated with or predictive for said disease or disorder. For cinnamoylglycine a decreased level is predictive, for 5-hydroxyhexanoate a decreased level, for 5alpha-androstan-3beta,17alpha-diol disulfate a decreased level, for 4-hydroxycoumarin a decreased level, for hippurate a decreased level, for phenol sulfate an increased level, for glucuronide of C19H28O4 a decreased level, for isoursodeoxycholate an increased level, for imidazole propionate an increased level, for indolepropionylglycine a decreased level, for l-urobilinogen an increased level, for N-acetyl-cadaverine an increased level, for glycoursodeoxycholate an increased level, for D-urobilin an increased level, for 11-ketoetiocholanolone glucuronide a decreased level, for 7-alpha-hydroxy-3-oxo-4-cholestenoate (7-Hoca) an increased level, for glutarate (C5-DC) an increased level, for 1H-indole-7-acetic acid a decreased level, for carotene diol a decreased level, for ursodeoxycholate an increased level, for taurolithocholate 3-sulfate a decreased level, for indole-3-carboxylic acid a decreased level, for palmitoyl-linoleoyl-glycerol (16:0/18:2) an increased level, for N2,N5-diacetylornithine a decreased level, for glycolithocholate sulfate a decreased level, for beta-cryptoxanthin a decreased level, for phenylacetate a decreased level, for 3-(4-hydroxyphenyl)propionate an increased level, for 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0) a decreased level, for oleoyl-oleoyl-glycerol (18:1/18:1) [2] an increased level, for etiocholanolone glucuronide a decreased level, for palmitoyl-oleoyl-glycerol (16:0/18:1) [2] an increased level, for 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1) a decreased level, for 3-methyladipate a decreased level, for 1-oleoyl-GPE (18:1) an increased level, for palmitoyl sphingomyelin (d18:1/16:0) a decreased level, for carotene diol (2) a decreased level, for oleoyl-arachidonoyl-glycerol (18:1/20:4) [1] an increased level, for p-cresol sulfate a decreased level, for anthranilate a decreased level, for oleoyl-linoleoyl-glycerol (18:1/18:2) [2] an increased level, for guanidinosuccinate a decreased level, for 5-hydroxyindole sulfate a decreased level, for 2-acetamidophenol sulfate a decreased level, for glycosyl-N-tricosanoyl-sphingadienine (d18:2/23:0) a decreased level, for 4-hydroxyglutamate an increased level, for 4-ethylphenylsulfate a decreased level, for adenosine 5'-monophosphate (AMP) a decreased level and for glycochenodeoxycholate sulfate an increased level.
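A threshold-based rule set of the kind mentioned above could look like the following sketch. The direction of change per metabolite is taken from the list above. The cut-off values, the requirement that at least three rules fire, and the names `CUTOFF`, `fired_rules` and `classify` are all hypothetical and chosen purely for illustration.

```python
# Direction of change that is predictive for B2, as listed in the text.
DIRECTION = {
    "3-phenylpropionate": "decreased",
    "cinnamoylglycine": "decreased",
    "hippurate": "decreased",
    "isoursodeoxycholate": "increased",
    "imidazole propionate": "increased",
}

# HYPOTHETICAL cut-offs in arbitrary units: a lower threshold for markers
# that decrease in B2, an upper threshold for markers that increase.
CUTOFF = {
    "3-phenylpropionate": 0.5,
    "cinnamoylglycine": 0.6,
    "hippurate": 0.7,
    "isoursodeoxycholate": 1.5,
    "imidazole propionate": 1.4,
}


def fired_rules(levels):
    """Return the markers whose measured level crosses its cut-off
    in the B2-predictive direction. Unmeasured markers are skipped."""
    fired = []
    for name, direction in DIRECTION.items():
        if name not in levels:
            continue
        level = levels[name]
        if direction == "decreased" and level < CUTOFF[name]:
            fired.append(name)
        elif direction == "increased" and level > CUTOFF[name]:
            fired.append(name)
    return fired


def classify(levels, min_fired=3):
    """Call B2 when at least `min_fired` rules fire (assumed decision rule)."""
    return "B2" if len(fired_rules(levels)) >= min_fired else "non-B2"


# Invented measurements for two subjects.
sample_a = {"3-phenylpropionate": 0.3, "cinnamoylglycine": 0.4,
            "hippurate": 0.9, "isoursodeoxycholate": 2.0,
            "imidazole propionate": 1.0}
sample_b = {"3-phenylpropionate": 1.2, "cinnamoylglycine": 1.1,
            "hippurate": 1.0, "isoursodeoxycholate": 0.8,
            "imidazole propionate": 0.9}

print(classify(sample_a), fired_rules(sample_a))
print(classify(sample_b), fired_rules(sample_b))
```

In practice the cut-offs would have to be derived from a labeled cohort, for instance from the level distributions in subjects with a known enterotype.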

In a further particular embodiment, said differences in level of the measured metabolic biomarker between the subject sample and that of the control sample are statistically significant.

The term "statistically significant" or "statistically significantly" different is well known by the person skilled in the art. Statistical significance plays a pivotal role in statistical hypothesis testing. It is used to determine whether the null hypothesis should be rejected or retained. The null hypothesis is the default assumption that what one is trying to prove did not happen: it states that the results were obtained by chance and do not support a real change or difference between two data sets. In contrast, the alternative hypothesis states that the obtained results support the theory being investigated. For the null hypothesis to be rejected (and thus the alternative hypothesis to be accepted), an observed result has to be statistically significant, i.e. the observed p-value is less than the pre-specified significance level α. The p stands for probability: the p-value is the probability of obtaining a difference at least as large as the one observed if the null hypothesis were true, i.e. if the difference between data sets were purely due to chance. In most cases the significance level α is set at 0.05.
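As an illustration of comparing a p-value with the significance level, the sketch below runs an exact two-sided permutation test on the difference in mean metabolite level between two groups. The permutation test is one of many possible tests and is an assumption here; the application does not prescribe a specific test. The measurements and the function name `exact_permutation_p` are invented.

```python
from itertools import combinations

ALPHA = 0.05  # pre-specified significance level


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns the fraction of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point rounding
            extreme += 1
        count += 1
    return extreme / count


# Invented 3-phenylpropionate levels (arbitrary units).
b2_levels = [0.20, 0.30, 0.25, 0.35, 0.30]
control_levels = [0.90, 1.10, 0.80, 1.00, 1.20]

p_value = exact_permutation_p(b2_levels, control_levels)
significant = p_value < ALPHA
print(f"p = {p_value:.4f}, significant at alpha = {ALPHA}: {significant}")
```

Because the two invented groups do not overlap, only 2 of the 252 possible splits are as extreme as the observed one, which gives p = 2/252 and a rejection of the null hypothesis at α = 0.05. With very small groups (for example three subjects each) the smallest attainable p-value of this test already exceeds 0.05.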

In one embodiment, said control sample is representative of matched human subjects. In one embodiment, said control sample is a sample from a subject with a non-B2 enterotype or alternatively phrased a subject with a Bacteroides1, Prevotella or Ruminococcaceae enterotype. In another embodiment, said control sample is a sample from a subject with a gut microbiome that is not associated with or predictive for gut flora dysbiosis and/or inflammatory disorder or obesity or diabetes type 2 or depression. In other embodiments, said control sample is a negative control sample from a healthy individual, i.e. a comparable individual not suffering from or diagnosed with gut flora dysbiosis and/or inflammatory disorders or obesity or diabetes type 2 or depression or a comparable individual not having an enterotype or a gut microbiome associated with or predictive for gut flora dysbiosis and/or inflammatory disorders.

The application also provides methods to detect the presence or to assess the risk of developing a disease or disorder, or a gut microbiome associated with or predictive of a disease or disorder in a patient, comprising the steps of:

determining a metabolic profile from a biological sample obtained from said patient and comparing said profile to one or more metabolic reference profiles, wherein said one or more metabolic reference profiles comprise at least one of a positive metabolic reference profile based on results from control subjects with said disease or disorder or with a gut microbiome associated with or predictive of said disease or disorder, and a negative metabolic reference profile based on results from control subjects without said disease or disorder or without a gut microbiome associated with or predictive of said disease or disorder;

if said metabolic profile for said patient statistically significantly matches said positive metabolic reference profile, then concluding that said patient has or is at risk of developing said disease or disorder or has a gut microbiome associated with or predictive of said disease or disorder; and/or

if said metabolic profile for said patient statistically significantly matches said negative metabolic reference profile, then concluding that said patient does not have or is not at risk of developing said disease or disorder or does not have a gut microbiome associated with or predictive of said disease or disorder.
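The comparison of a patient profile against positive and negative reference profiles could be operationalised in many ways. The sketch below uses the Pearson correlation as a similarity measure, which is an assumption and not a method stated in the application, and it deliberately leaves out the significance assessment of the match. All profile values are invented standardized levels, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / (sx * sy)


def compare_to_references(patient, positive_ref, negative_ref):
    """Correlate the patient profile with both reference profiles over the
    markers present in all three, and report which reference is closer."""
    markers = sorted(set(patient) & set(positive_ref) & set(negative_ref))
    p = [patient[m] for m in markers]
    r_pos = pearson(p, [positive_ref[m] for m in markers])
    r_neg = pearson(p, [negative_ref[m] for m in markers])
    return {"r_positive": r_pos, "r_negative": r_neg,
            "closest": "positive" if r_pos > r_neg else "negative"}


# Invented standardized levels (z-scores relative to a hypothetical cohort).
positive_ref = {"3-phenylpropionate": -1.0, "cinnamoylglycine": -0.8,
                "hippurate": -0.6, "isoursodeoxycholate": 0.9,
                "imidazole propionate": 0.7}
negative_ref = {"3-phenylpropionate": 0.3, "cinnamoylglycine": 0.2,
                "hippurate": 0.2, "isoursodeoxycholate": -0.3,
                "imidazole propionate": -0.2}
patient = {"3-phenylpropionate": -0.7, "cinnamoylglycine": -0.9,
           "hippurate": -0.2, "isoursodeoxycholate": 1.1,
           "imidazole propionate": 0.4}

result = compare_to_references(patient, positive_ref, negative_ref)
print(result["closest"])
```

A complete implementation would additionally test whether the match is statistically significant before drawing a conclusion, for example with a permutation test on the correlation.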

In one embodiment, a positive metabolic reference profile is a metabolic reference profile from a subject with a B2 enterotype and a negative metabolic reference profile is a metabolic reference profile from a subject not having a B2 enterotype or alternatively phrased having a B1, R or P enterotype.

In one embodiment, said disease or disorder is gut flora dysbiosis and/or an inflammatory disorder. In another embodiment, said disease or disorder is obesity, diabetes type 2 or depression. In a particular embodiment, said inflammatory disorder is selected from the list consisting of spondyloarthritis, ankylosing spondylitis, reactive arthritis, psoriatic arthritis, enteropathic arthritis, undifferentiated spondyloarthritis, juvenile idiopathic arthritis, primary sclerosing cholangitis, multiple sclerosis, a gut inflammatory disorder, inflammatory bowel disease (IBD), Crohn's disease (CD), ulcerative colitis (UC), irritable bowel syndrome (IBS), celiac disease and any combination thereof and any gut inflammation associated with one of the above listed inflammatory disorders. In another particular embodiment, said inflammatory disorder is characterized by a TH1, TH17, TH2 and/or TH9 response.

In another embodiment, said metabolic profile is determined from a biological sample which can be blood, serum or plasma. In a more particular embodiment, said biological sample consists of blood, serum and plasma.

In another embodiment, said metabolic profile comprises an indication of the presence and/or abundance of at least one, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14 or at least 15 metabolic biomarkers selected from Table 1. In a more particular embodiment, said metabolic profile comprises an indication of the presence and/or abundance of at least one, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8 or 9 metabolic biomarkers selected from Table 9.

In another embodiment, said metabolic profile comprises an indication of the presence and/or abundance of 1H-indole-7-acetic acid, 3-phenylpropionate or cinnamoylglycine.

In another embodiment, said metabolic profile comprises an indication of the presence and/or abundance of at least 2 metabolic biomarkers selected from Table 6, or of at least 3 metabolic biomarkers selected from Table 7 or of at least 4 metabolic biomarkers selected from Table 8 or at least 6 metabolic biomarkers selected from Table 11 or at least 11 metabolic biomarkers selected from Table 12.

In yet another embodiment, said metabolic profile comprises an indication of the presence and/or abundance of at least one metabolic biomarker selected from the list consisting of 1H-indole-7-acetic acid, 3-phenylpropionate and cinnamoylglycine and further at least one, at least 2, at least 3 or at least 4 additional metabolic biomarker(s). In a further particular embodiment, said at least one, at least 2, at least 3 or at least 4 additional metabolic biomarker(s) is/are selected from Table 1, 6, 7, 8, 9, 10, 11 or 12.

In another embodiment, the metabolic profile is obtained by one of the metabolic biomarker panels disclosed in current application.

In a most particular embodiment, said metabolic profile comprises an indication of the presence and/or abundance of the biomarkers listed in Table 9, Table 10 or Table 1. By abundance is meant the quantification of the metabolic biomarkers. Said quantification can be absolute quantification or relative quantification compared to reference values.

In a fifth aspect, methods of diagnosing and treating an inflammatory disorder in a subject are provided. Said methods comprise the steps from the methods provided in the fourth aspect of current application, further comprising a step of administering an effective amount of anti-inflammatory drugs to the subject. This is equivalent to saying that methods are provided of diagnosing and treating an inflammatory disorder in a patient, comprising administering anti-inflammatory therapy to said patient if the blood, plasma or serum metabolic profile for said patient statistically significantly matches that of a Bacteroides2 enterotype. In a particular embodiment, said match is performed by using one of the biomarkers from the application.

In further embodiments, said inflammatory disorder is selected from the list consisting of spondyloarthritis, ankylosing spondylitis, reactive arthritis, psoriatic arthritis, enteropathic arthritis, undifferentiated spondyloarthritis, juvenile idiopathic arthritis, primary sclerosing cholangitis, multiple sclerosis, a gut inflammatory disorder, inflammatory bowel disease (IBD), Crohn's disease (CD), ulcerative colitis (UC), irritable bowel syndrome (IBS), celiac disease and any combination thereof and any gut inflammation associated with one of the above listed inflammatory disorders.

As used herein, the term "spondyloarthritis" or abbreviated "SpA" refers to a group of closely related, but clinically heterogeneous, inflammatory arthritis diseases with common features, including inflammation of the spine, eyes, skin, joints and gastrointestinal tract. This SpA group is also sometimes referred to as spondylitis and spondyloarthropathies. As used herein, SpA includes ankylosing spondylitis (including non-radiographic axial SpA, i.e. ankylosing spondylitis diagnosed using MRI), reactive arthritis, psoriatic arthritis, enteropathic arthritis (arthritis associated with inflammatory bowel disease or IBD related arthritis), undifferentiated spondyloarthritis, juvenile idiopathic arthritis and juvenile-onset SpA. Characteristics of these SpA diseases include inflammatory arthritis of the spine, peripheral arthritis that differs from rheumatoid arthritis, extra articular manifestations of inflammatory bowel disease, arthritis and uveitis, seronegativity for rheumatoid factor and some degree of heritability, including the presence of the gene HLA-B27. It is thus clear that in current application SpA is not rheumatoid arthritis.

"Primary sclerosing cholangitis" or "PSC" as used herein refers to a severe chronic liver disease characterized by progressive biliary inflammation and fibrosis. The development of multifocal bile duct strictures can lead to liver fibrosis and subsequent cirrhosis. Patients with PSC are usually asymptomatic and the diagnostic work up is triggered by incidental findings of altered liver enzymes. In symptomatic patients, fatigue, pruritus, abdominal pain and jaundice are the most reported symptoms (Lazaridis et al 2016 N Engl J Med 375:1161-1170). Following clinical suspicion and a suggestive biochemistry, magnetic resonance cholangiography or endoscopic retrograde cholangiopancreatography are used to establish the diagnosis. Presently, liver biopsy is reserved to diagnose suspected small duct PSC or to exclude other diagnoses (Lindor et al 2015 Am J Gastroenterol 110:646-659). It would thus be highly advantageous to develop presymptomatic diagnostic methods or non-invasive diagnostic methods. The diagnostic methods disclosed above solve this technical problem. Therefore, in a particular embodiment, the herein disclosed methods are provided of diagnosing primary sclerosing cholangitis, more particularly gut inflammation associated with primary sclerosing cholangitis.

A systematic review of the epidemiologic studies in PSC reported an incidence varying between 0 and 1.3 cases per 100 000 individuals and a prevalence of 0-16.2 cases per 100 000 individuals (Boonstra et al 2012 J Hepatol 56:1181-1188). Most commonly, PSC affects men at the age of 40 and the concomitant diagnosis of IBD is very common. Between 60 and 80% of the patients with PSC have concomitant IBD, most frequently UC, pointing towards the possible role of the colon in the pathogenesis of PSC (Boonstra et al 2013 Hepatology 58:2045-2055). This role is further evidenced by transplantation data showing that colectomy before liver transplantation is a protective factor for recurrence of PSC after liver transplantation (Alabraba et al 2009 Liver Transpl 15:330-340). Interestingly, the absence of intestinal microbiota is associated with increased severity of the disease in mouse models (Tabibian et al 2016 Hepatology 63:185-196). Therefore, intestinal microbiota may play an important role in the pathogenesis of PSC by modulating the gut-associated immune system to a more immunogenic or tolerogenic phenotype. In patients with IBD, the prevalence of PSC varies from 0.4 to 6.4%. However, in a recent study using magnetic resonance to diagnose PSC in patients with IBD the prevalence of PSC was 3-fold higher than previously reported, mainly due to subclinical PSC without symptoms or altered liver enzymes (Lunder et al 2016 Gastroenterology 151:660-669).

Genome-wide association studies suggested a role for immune-related pathways in the pathogenesis of PSC. Patients with PSC have a higher activity of TH17 cells. These lymphocytes help in the defence against bacteria and fungi by promoting inflammation and are involved in autoimmune diseases (Katt et al 2013 Hepatology 58:1084-1093). Moreover, Treg cells (CD4+CD25+FOXP3+CD127-), which suppress inflammation, are reduced in PSC (Sebode et al 2014 J Hepatol 60:1010-1016). Therefore, in a very particular embodiment, the inflammatory disorder as mentioned in the application refers to inflammatory disorders characterized by a TH17 response.

"Multiple sclerosis" or "MS" as used herein refers to a chronic inflammatory and neurodegenerative disease characterized by substantial clinical heterogeneity. Both genetic and immunologic factors, as well as environmental elements contribute to its aetiology. Most MS patients present with recurrent periods of relapses and remissions, with relapses thought to be provoked by the infiltration of adaptive immune cells into the central nervous system (CNS), hereby resulting in focal inflammation and myelin loss (Franciotta et al 2008 Lancet neurology 7:852-588). In a minority of patients, slow progression is observed from onset. Therefore, three clinical phenotypes can be distinguished: relapsing-remitting (RR), secondary progressive (SP) or primary progressive (PP) MS. Lublin et al (2014 Neurology 83:278- 286) further described these phenotypes as active, not active, and with or without progression. While not recognized as a separate phenotype, a subset of RRMS patients appears to have a mild course, often referred to as benign MS (BMS) (Amato et al 2006 J Neurol 253:1054-1059; Calabrese et al 2013 Mult Scler 19:904-911). Patients experience a wide variety of symptoms, ranging from physical and cognitive symptoms to even bowel dysfunction, with the latter being reported in more than 70% of cases (Wiesel et al 2001 Eur J Gastroenterol Hepatol 13:441-448). Studies in experimental allergic encephalomyelitis (EAE), a widely used mouse model for MS, have provided evidence for a substantial effect of gut microbiota on central nervous system (CNS)-specific autoimmune disease (Berer et al 2014 FEBS letters 588:4207-4013). The absence of gut microbes (germ-free conditions) or the alteration of the gut microbial flora composition with antibiotics resulted in a shift in T cell responses (decreased concentration of IL-17, increased number of regulatory T and B cells) and affected disease severity (Ochoa-Reparaz et al 2009 J Immunol 183:6041-6050). 
Additionally, mice raised in a germ-free environment were highly resistant to developing spontaneous EAE, unless exposed to specific pathogen-free condition-derived fecal material or a fecal transplant from MS twin-derived microbiota (Berer K et al 2011 Nature 479:538-541; Berer et al 2017 Proc Nat Ac Sc USA). Immune cells from mouse recipients of MS-twin samples produced less IL-10 than immune cells from mice colonized with healthy-twin samples. IL-10 may have a regulatory role in spontaneous CNS autoimmunity, as neutralization of the cytokine in mice colonized with healthy-twin fecal samples increased disease incidence. This evidence suggests that the microbiota may be capable of altering the individual at a phenotypic level and influence the onset, severity and progression of MS. Therefore, in a particular embodiment, the methods disclosed herein are provided for detecting multiple sclerosis or gut inflammation associated with multiple sclerosis.

The wording "gut inflammation" is equivalent to the wording "microscopic gut inflammation" as used herein and refers to an inflammatory response in the gut as defined above. The inflammation can affect the entire gastrointestinal tract, can be more limited to for example the small intestine or large intestine but can also be limited to specific components or structures such as the bowel walls.

As used herein, the term "inflammatory bowel disease" or abbreviated "IBD" refers to an umbrella term for inflammatory conditions of the gut under which both Crohn's disease and ulcerative colitis fall. In people with IBD, the immune system mistakes food, bacteria, or other materials in the gut for foreign substances and responds by sending white blood cells into the lining of the bowels. The result of the immune system's attack is chronic inflammation. Crohn's disease and ulcerative colitis are the most common forms of IBD. Less common IBDs include microscopic colitis, diverticulosis-associated colitis, collagenous colitis, lymphocytic colitis and Behçet's disease. In the case of CD, transmural inflammation commonly affects the terminal ileum, although any part of the gastrointestinal system can be affected. Discontinuous inflammation and the presence of non-caseating granulomas are also characteristic of the inflammation in patients with CD. In contrast, UC is characterized by continuous mucosal inflammation starting in the rectum and extending proximally until the caecum (Harries et al 1982 Br Med J Clin Res Ed, 284:706). These are chronic relapsing diseases originating mostly during adolescence and young adulthood and are characterized by chronic inflammation of the gastrointestinal tract leading to disabling symptoms of bloody diarrhea, weight loss and fatigue (Wilks 1859 Med Times Gazette 2:264-265). Recent epidemiologic data from France reported a mean incidence of 4.4 cases per 100 000 individuals (Ghione et al 2017 Am J Gastroenterol). Worldwide, the incidence and prevalence of CD range from 0.0-29.3 per 100 000 person-years and 0.6-318.5 per 100 000 persons, respectively. The incidence and prevalence of UC vary from 0.0-19.2 per 100 000 person-years and 2.42-298.5 per 100 000 persons, respectively (Molodecky et al 2012 Gastroenterology 142:46-54).

Several defects in innate and adaptive immunity have been described both in UC and CD (de Souza et al 2016 Nat Rev Gastroenterol Hepatol 13:13-27). In normal conditions, intestinal macrophages exhibit inflammatory anergy, which allows interaction with the commensal flora without inducing strong inflammatory responses (Smythies et al 2005 J Clin Invest 115:66-75). However, CD14+ intestinal macrophages are more abundant in patients with CD than in healthy individuals. These CD14+ intestinal macrophages produce more proinflammatory cytokines, such as interleukin (IL)-6, IL-23 and tumor necrosis factor (TNF)-α, than the common CD14- intestinal macrophages (Kamada et al 2008 J Clin Invest 118:2269-2280). Adaptive immunity also plays a role in the pathogenesis of IBD. T helper (TH) lymphocytes are cytokine-producing lymphocytes that potentiate or regulate immune responses by interacting with other immune cells such as macrophages, CD8+ T cells, eosinophils and basophils. Following an initial trigger (e.g. impaired barrier function by injury or exposure to xenobiotics), microbe-associated molecular patterns will induce the secretion of cytokines by dendritic cells, epithelial cells and macrophages, among others. Different cytokine milieus will induce TH1, TH2, TH17 or regulatory T-cell (Treg) subsets (de Souza et al 2016 Nat Rev Gastroenterol Hepatol 13:13-27). In susceptible individuals, an interplay between TH1 and TH17 immune responses seems to be linked with the inflammation associated with CD. On the other hand, UC has been described as a TH2-like condition with possible implication of the newly discovered TH9 lymphocytes (de Souza et al 2016 Nat Rev Gastroenterol Hepatol 13:13-27; Gerlach et al 2014 Nat Immunol 15:676-686). In both diseases, an insufficient Treg response seems to be involved in the impaired regulation of inflammatory responses (Maul et al 2005 Gastroenterology 128:1868-1878).
In active IBD, the immune system shows an increased response to bacterial stimulation, thereby contributing even further to the chronic inflammatory state. This inflammatory state also produces an increase in the intestinal permeability, allowing bacterial antigens to contact with the immune system, hereby perpetuating the inflammatory state.

In particular embodiments, said inflammation or inflammatory disorder as used in the methods of the fifth aspect is inflammation or an inflammatory disorder characterized by a TH1, TH17, TH2 and/or TH9 response. In even more particular embodiments, said inflammation or inflammatory disorder is characterized by a TH1 and/or TH17 response.

The therapeutic options of the inflammatory disorder diagnosed using the methods herein provided comprise the commonly used anti-inflammatory drugs such as inhibitors of cyclooxygenase activity (aspirin, celecoxib, diclofenac, diflunisal, etodolac, ibuprofen, indomethacin, ketoprofen, ketorolac, meloxicam, nabumetone, naproxen, oxaprozin, piroxicam, salsalate, sulindac, tolmetin, among others) or corticosteroids (prednisone, dexamethasone, hydrocortisone, methylprednisolone, among others) or in combination with commonly used analgesics (acetaminophen, duloxetine, paracetamol, among others) or in any combination thereof. In particular embodiments, said anti-inflammatory therapy includes a biological therapy, such as TNF-alpha blockers, anti-IL17A monoclonal antibodies, anti-CD20 antibodies.

The therapeutic options for CD or UC include corticosteroids, aminosalicylates, immunosuppressive agents and biological therapies. Due to the chronic relapsing and remitting disease-course of IBD, the goal of medical therapy is to induce (induction phase) and maintain remission (maintenance phase). The choice between the different medical therapies depends on several factors such as disease location and severity, medical and surgical history, age, co-morbidities, extra-intestinal manifestations and treatment availability (Gomollon et al 2017 J Crohns Colitis 11:3-25; Harbord et al 2017 J Crohns Colitis 2017).

An "effective amount" of a composition is equivalent to the dosage of the composition that leads to treatment, prevention or a reduction of the severity of inflammation status in a patient. Said inflammation can be gut inflammation for which several methods are known to the person skilled in the art to evaluate or thus to diagnose the severity of the inflammation.

Recently, Vieira-Silva et al. (2020 Nature 581: 310-315) reported that a higher prevalence of the B2 enterotype correlates with a higher body-mass index and obesity. Interestingly, the pattern of enterotypes found in the population of obese individuals differed significantly depending on whether people were taking cholesterol-lowering drugs called statins. Obese participants taking statins had a significantly lower prevalence of the B2 enterotype than did their obese counterparts not taking statins. Therefore, methods of diagnosing and treating gut flora dysbiosis are provided. Methods of diagnosing a gut microbiome associated with or predictive for gut flora dysbiosis and/or inflammatory disorder and changing said gut microbiome to a healthy or non-disease associated gut flora are also provided. Said methods comprise the steps from the methods provided in the fourth aspect of current application further comprising a step of administering an effective amount of a statin to the subject.

Said methods comprise the following steps:

Measuring in a biological sample of a subject the level of at least one metabolic biomarker selected from any of the metabolic biomarker panels of the application;

Comparing the measured level of the at least one biomarker of said subject sample to that of a control sample;

Treating the subject with an effective amount of a statin when the measured level of the at least one biomarker in the subject sample is increased or decreased relative to the level of the at least one biomarker in the control sample and/or if the difference between the measured level of the at least one biomarker in the subject sample and that of the control sample is statistically significant.

In one embodiment, the biological sample is selected from the list consisting of blood, serum and/or plasma.

Statins, also known as HMG-CoA reductase inhibitors, are a class of lipid-lowering medications that are often prescribed to reduce illness and mortality in those who are at high risk of cardiovascular disease. Statins are the most common cholesterol-lowering drugs. Non-limiting examples of statins are lovastatin, fluvastatin, pravastatin, rosuvastatin, pitavastatin, atorvastatin, simvastatin, cerivastatin and mevastatin.

An "effective amount" of a statin is equivalent to the dosage of the statin that leads to change in gut microbiome in a subject. Said change is a change from a B2 enterotype to a non-B2 enterotype or from a gut microbiome associated with gut flora dysbiosis and/or an inflammatory disorder to a healthy gut microbiome.

The following examples are intended to promote a further understanding of the invention. While the invention is described herein with reference to illustrated embodiments, it should be understood that the invention is not limited hereto. Those having ordinary skill in the art and access to the teachings herein will recognize additional modifications and embodiments within the scope thereof. Therefore, the invention is limited only by the claims attached herein.

Examples

Example 1: Selecting blood serum metabolites to predict the B2 enterotype

To identify and characterize major gut microbiome-associated variables, the Flemish Gut Flora Project (FGFP) initiated a large-scale cross-sectional fecal sampling effort in a confined geographic region (Flanders, Belgium). FGFP collection protocols combined rigorous sampling logistics, including frozen sample collection and cold chain monitoring, with exhaustive phenotyping through online questionnaires, standardized anamnesis and health assessment by general medical practitioners (GPs), and extended clinical blood profiling. Encompassing an equilibrated range of age, gender, health, and lifestyle, the FGFP cohort is representative of the average gut microbiota composition in a Western European population. From this cohort, blood serum metabolomics data and microbiome phylogenetic profiling based on 16S ribosomal RNA (rRNA) gene amplicon sequencing of stool samples were available for 2938 participants. In that dataset, 1031 participants had a Bacteroides 1 type, 503 were determined to be Bacteroides 2, 900 participants had a Ruminococcus type and 504 a Prevotella type. To balance the dataset and to develop a method that distinguishes B2 from non-B2 enterotypes, 500 participants with the B2 enterotype and 500 with another enterotype (i.e. B1, R or P) were randomly selected. From these randomly selected participants, blood serum metabolites were analysed. After removing uncharacterized metabolites, 1024 characterized metabolites were retained. Metabolite levels were centered around zero and the variation normalized using the StandardScaler implementation from scikit-learn.

To filter the full list of metabolites to a more manageable set for further analysis, feature selection (as statistical analysis tool) was used to pick up all metabolites with a significant variation between participants with a B2 enterotype and those with another enterotype after correcting for multiple testing using a Bonferroni correction. The resulting set of 59 metabolites can be found in Table 1. As co-linear metabolites could be selected using this approach, a correlation analysis was done to highlight (potential) dependencies between metabolites (Figure 1).

Table 1: The full list of blood serum metabolites picked up using feature selection as highly relevant for predicting the B2 enterotype. The column "B2 level" depicts whether the metabolite is decreased or elevated in subjects with a B2 enterotype. CAS, CAS number; HMDB, Human Metabolome Database identifier; round and square brackets with a single number indicate the metabolite is a structural isomer.

| METABOLITE | Score | p-value | p-value (corrected) | B2 level | Fold change (log2) | CAS | HMDB |
|---|---|---|---|---|---|---|---|
| 3-phenylpropionate (hydrocinnamate) | 212.4 | 9.23E-44 | 9.45E-41 | decreased | -1.36 | 501-52-0 | HMDB00764 |
| cinnamoylglycine | 113.1 | 4.24E-25 | 4.34E-22 | decreased | -1.13 | 16534-24-0 | HMDB11621 |
| 5-hydroxyhexanoate | 98.8 | 2.92E-22 | 2.99E-19 | decreased | -0.37 | 44843-89-2 | HMDB00525 |
| 5alpha-androstan-3beta,17alpha-diol disulfate | 87.0 | 6.80E-20 | 6.96E-17 | decreased | -1.02 | | |
| 4-hydroxycoumarin | 86.0 | 1.08E-19 | 1.11E-16 | decreased | -1.25 | 1076-38-6 | |
| hippurate | 81.8 | 7.66E-19 | 7.84E-16 | decreased | -0.75 | 495-69-2 | HMDB00714 |
| phenol sulfate | 71.4 | 1.04E-16 | 1.06E-13 | elevated | 0.87 | 937-34-8 | HMDB60015 |
| glucuronide of C19H28O4 (2) | 70.2 | 1.79E-16 | 1.83E-13 | decreased | -0.99 | | |
| isoursodeoxycholate | 69.6 | 2.42E-16 | 2.48E-13 | elevated | 1.75 | 78919-26-3 | HMDB00686 |
| imidazole propionate | 64.5 | 2.71E-15 | 2.77E-12 | elevated | 1.06 | 1074-59-5 | HMDB02271 |
| indolepropionylglycine | 58.5 | 4.84E-14 | 4.96E-11 | decreased | -0.93 | | |
| l-urobilinogen | 57.9 | 6.38E-14 | 6.53E-11 | elevated | 1.80 | 14684-37-8 | HMDB04157 |
| N-acetyl-cadaverine | 56.3 | 1.39E-13 | 1.42E-10 | elevated | 0.75 | 32343-73-0 | HMDB02284 |
| glycoursodeoxycholate | 45.6 | 2.41E-11 | 2.47E-08 | elevated | 1.11 | 64480-66-6 | HMDB00708 |
| D-urobilin | 45.1 | 3.16E-11 | 3.23E-08 | elevated | 1.08 | 3947-38-4 | HMDB04161 |
| 11-ketoetiocholanolone glucuronide | 41.1 | 2.24E-10 | 2.30E-07 | decreased | -0.85 | 17181-16-7 | |
| 7-alpha-hydroxy-3-oxo-4-cholestenoate (7-Hoca) | 40.9 | 2.48E-10 | 2.54E-07 | elevated | 0.27 | 115538-85-7 | HMDB12458 |
| glutarate (C5-DC) | 40.6 | 2.88E-10 | 2.94E-07 | elevated | 0.84 | 110-94-1 | HMDB00661 |
| 1H-indole-7-acetic acid | 39.4 | 5.05E-10 | 5.17E-07 | decreased | -0.82 | 39689-63-9 | |
| carotene diol (1) | 33.7 | 8.82E-09 | 9.03E-06 | decreased | -0.21 | | |
| ursodeoxycholate | 32.2 | 1.85E-08 | 1.89E-05 | elevated | 1.07 | 128-13-2 | HMDB00946 |
| taurolithocholate 3-sulfate | 31.7 | 2.29E-08 | 2.34E-05 | decreased | -0.48 | 64936-83-0 | HMDB02580 |
| indole-3-carboxylic acid | 28.8 | 9.99E-08 | 1.02E-04 | decreased | -0.27 | 771-50-6 | HMDB03320 |
| palmitoyl-linoleoyl-glycerol (16:0/18:2) [1] | 28.4 | 1.25E-07 | 1.28E-04 | elevated | 0.28 | | HMDB07103 |
| N2,N5-diacetylornithine | 27.5 | 1.91E-07 | 1.95E-04 | decreased | -0.30 | 39825-23-5 | |
| glycolithocholate sulfate | 26.7 | 2.92E-07 | 2.99E-04 | decreased | -0.41 | 15324-64-8 | HMDB02639 |
| beta-cryptoxanthin | 25.9 | 4.40E-07 | 4.51E-04 | decreased | -0.37 | 472-70-8 | HMDB33844 |
| phenylacetate | 25.6 | 5.02E-07 | 5.14E-04 | decreased | -0.40 | 103-82-2 | HMDB00209 |
| 3-(4-hydroxyphenyl)propionate | 25.5 | 5.28E-07 | 5.41E-04 | elevated | 0.82 | 501-97-3 | HMDB02199 |
| 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0) | 25.3 | 5.75E-07 | 5.89E-04 | decreased | -0.10 | | HMDB11206 |
| catechol sulfate | 25.2 | 6.16E-07 | 6.31E-04 | decreased | -0.33 | 4918-96-1 | HMDB59724 |
| palmitoyl-linoleoyl-glycerol (16:0/18:2) [2]* | 24.0 | 1.12E-06 | 1.14E-03 | elevated | 0.31 | | HMDB07103 |
| phenol glucuronide | 24.0 | 1.12E-06 | 1.15E-03 | elevated | 1.49 | 17685-05-1 | HMDB60014 |
| glucuronide of C14H22O4 (2) | 23.7 | 1.30E-06 | 1.33E-03 | decreased | -0.02 | | |
| dihydroferulic acid | 23.2 | 1.72E-06 | 1.76E-03 | elevated | 0.79 | 1135-23-5 | |
| N-acetylglucosamine conjugate of C24H40O4 bile acid | 22.7 | 2.20E-06 | 2.25E-03 | elevated | 1.68 | | |
| oleoyl-arachidonoyl-glycerol (18:1/20:4) [2] | 22.4 | 2.52E-06 | 2.58E-03 | elevated | 0.27 | | HMDB07228 |
| 4-hydroxyphenylacetate | 21.9 | 3.22E-06 | 3.29E-03 | elevated | 0.56 | 156-38-7 | HMDB00020 |
| 1-(1-enyl-palmitoyl)-2-linoleoyl-GPC (P-16:0/18:2) | 21.8 | 3.49E-06 | 3.58E-03 | decreased | -0.11 | | HMDB11211 |
| oleoyl-oleoyl-glycerol (18:1/18:1) [2] | 21.7 | 3.68E-06 | 3.77E-03 | elevated | 0.31 | | HMDB07218 |
| etiocholanolone glucuronide | 21.2 | 4.71E-06 | 4.82E-03 | decreased | -0.25 | 3602-09-3 | HMDB04484 |
| palmitoyl-oleoyl-glycerol (16:0/18:1) [2] | 20.7 | 5.99E-06 | 6.13E-03 | elevated | 0.41 | | HMDB07102 |
| 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1) | 20.4 | 7.08E-06 | 7.26E-03 | decreased | -0.13 | | |
| 3-methyladipate | 20.3 | 7.48E-06 | 7.66E-03 | decreased | -0.29 | 3058-01-03 | HMDB00555 |
| 1-oleoyl-GPE (18:1) | 20.0 | 8.47E-06 | 8.68E-03 | elevated | 0.16 | 89576-29-4 | HMDB11506 |
| palmitoyl sphingomyelin (d18:1/16:0) | 20.0 | 8.67E-06 | 8.88E-03 | decreased | -0.04 | 6254-89-3 | |
| carotene diol (2) | 19.4 | 1.19E-05 | 1.21E-02 | decreased | -0.22 | | |
| oleoyl-arachidonoyl-glycerol (18:1/20:4) [1] | 19.2 | 1.29E-05 | 1.33E-02 | elevated | 0.26 | | HMDB07228 |
| p-cresol sulfate | 19.1 | 1.37E-05 | 1.41E-02 | decreased | -0.29 | 3233-57-7 | HMDB11635 |
| anthranilate | 18.9 | 1.49E-05 | 1.53E-02 | decreased | -0.14 | 118-92-3 | HMDB01123 |
| oleoyl-linoleoyl-glycerol (18:1/18:2) [2] | 18.7 | 1.67E-05 | 1.71E-02 | elevated | 0.20 | 104346-53-4 | HMDB07219 |
| guanidinosuccinate | 18.3 | 2.12E-05 | 2.17E-02 | decreased | -0.16 | 6133-30-8 | HMDB03157 |
| 5-hydroxyindole sulfate | 18.2 | 2.14E-05 | 2.19E-02 | decreased | -0.31 | | |
| 2-acetamidophenol sulfate | 18.0 | 2.46E-05 | 2.52E-02 | decreased | -0.60 | 40712-60-5 | |
| glycosyl-N-tricosanoyl-sphingadienine (d18:2/23:0) | 17.9 | 2.53E-05 | 2.59E-02 | decreased | -0.17 | | |
| 4-hydroxyglutamate | 17.4 | 3.32E-05 | 3.40E-02 | elevated | 0.26 | 2485-33-8 | HMDB01344 |
| 4-ethylphenylsulfate | 17.2 | 3.59E-05 | 3.68E-02 | decreased | -0.99 | 123-07-9 | |
| adenosine 5'-monophosphate (AMP) | 17.1 | 3.81E-05 | 3.90E-02 | decreased | -0.21 | 149022-20-8 | HMDB00045 |
| glycochenodeoxycholate sulfate | 16.7 | 4.75E-05 | 4.86E-02 | elevated | 0.41 | | |
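As a consistency check on Table 1, the corrected p-values agree with a Bonferroni correction in which each raw p-value is multiplied by the number of tests and capped at 1. The sketch below assumes that the number of tests equals the 1024 characterized metabolites retained in Example 1; the function name `bonferroni` is illustrative and does not appear in the application.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

N_METABOLITES = 1024  # assumed number of tests (characterized metabolites)

# (metabolite, raw p-value, corrected p-value as listed in Table 1)
rows = [
    ("3-phenylpropionate (hydrocinnamate)", 9.23e-44, 9.45e-41),
    ("glycochenodeoxycholate sulfate", 4.75e-05, 4.86e-02),
]

for name, p_raw, p_listed in rows:
    p_adj = bonferroni(p_raw, N_METABOLITES)
    print(f"{name}: computed {p_adj:.2e}, Table 1 lists {p_listed:.2e}")
```

Under this assumption the first and last rows of Table 1 are reproduced to the precision shown.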

In a next step, machine learning tools were used to determine if the selected 59 metabolites can be used to predict the B2 enterotype and what the impact is on the prediction of each of the 59 metabolites. Therefore, the dataset was split into a training set (90% of samples after balancing) and a testing set (the remaining 10%). First, different individual classifiers were trained on the training set. Because the obtained results depend on the specific classifier used, an ensemble classifier - which combines predictions made by multiple individual classifiers - was created as well to obtain more robust, higher quality results. In a second step, the test set was used to generate a classification report for the ensemble classifier as well as the individual classifiers it encapsulates. The ensemble classifier resulted in a very good prediction of B2 versus non-B2 enterotype with a precision of 0.87, a recall of 0.87 and an F1-score of 0.87 (Table 2).

Table 2. Performance of the ensemble classifier when predicting B2 vs non-B2

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| B2 | 0.87 | 0.85 | 0.86 | 48 |
| non-B2 | 0.87 | 0.88 | 0.88 | 52 |
| accuracy | | | 0.87 | 100 |
| macro avg | 0.87 | 0.87 | 0.87 | 100 |
| weighted avg | 0.87 | 0.87 | 0.87 | 100 |

The ensemble classifier outperformed all individual classifiers, although the AdaBoostClassifier performed very close to it (Table 3). Since the ensemble classifier doesn't allow applying common evaluation metrics like the well-established ROC curves, the best individual classifier (i.e. the AdaBoostClassifier), which does support such evaluation metrics, was selected for the remaining parts of the application.

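Table 3. Performance of the AdaBoostClassifier when predicting B2 vs non-B2

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| B2 | 0.90 | 0.73 | 0.80 | 48 |
| non-B2 | 0.79 | 0.92 | 0.85 | 52 |
| accuracy | | | 0.83 | 100 |
| macro avg | 0.83 | 0.83 | 0.83 | 100 |
| weighted avg | 0.83 | 0.83 | 0.83 | 100 |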
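The per-class values of Table 3 are linked through the underlying confusion matrix. The counts used below (35 true positives, 13 false negatives, 4 false positives, 48 true negatives, with B2 as the positive class) are not reported in the application; they are an assignment inferred here from the supports of 48 and 52 that reproduces the rounded per-class rows and the accuracy. The helper `prf` is an illustrative name.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1-score for one class from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1

# Hypothetical counts, B2 as positive class (48 B2 and 52 non-B2 test samples).
tp, fn, fp, tn = 35, 13, 4, 48

b2 = [round(v, 2) for v in prf(tp, fp, fn)]
# For the non-B2 row the roles swap: B2 false negatives become false positives.
non_b2 = [round(v, 2) for v in prf(tn, fn, fp)]
accuracy = round((tp + tn) / (tp + fn + fp + tn), 2)

print("B2       precision/recall/f1:", b2)
print("non-B2   precision/recall/f1:", non_b2)
print("accuracy:", accuracy)
```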

An additional advantage of using the AdaBoostClassifier is that the most important features can be extracted from the classifier and ranked. This allows a further reduction of the number of metabolites required for the predictions. The complete list of metabolites and their individual impact on the decision process to distinguish the B2 enterotype from the others, for this particular classifier, is included in Table 4.
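A minimal sketch of such a ranking is given below, using the `feature_importances_` attribute of a fitted scikit-learn `AdaBoostClassifier`. The data are synthetic (generated with `make_classification`), the feature names are placeholders, and the number of estimators and random seeds are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic stand-in for the scaled metabolite matrix (not FGFP data).
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           random_state=0)
feature_names = [f"metabolite_{i}" for i in range(X.shape[1])]  # placeholders

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

# Sort features by decreasing importance; higher means more influence
# on the boosted decision process.
order = np.argsort(clf.feature_importances_)[::-1]
ranking = [(feature_names[i], float(clf.feature_importances_[i])) for i in order]

for name, importance in ranking[:5]:
    print(f"{name}: {importance:.3f}")
```

Truncating `ranking` to its first entries gives a reduced panel in the same way the top 10 metabolites are taken from Table 4.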

Table 4. Metabolites ranked by their individual impact on predicting the B2 enterotype using an AdaBoostClassifier. The top 10 (highlighted in grey) were used in the next section. The columns Test Score and ROC AUC (mean and Std.) report cumulative performance.

| METABOLITE | Importance | Count | Test Score (mean) | Std. | ROC AUC (mean) | Std. |
|---|---|---|---|---|---|---|
| 3-phenylpropionate (hydrocinnamate) | 0.05 | | 0.761 | 0.037 | 0.759 | 0.005 |
| isoursodeoxycholate | 0.05 | | 0.773 | 0.046 | 0.734 | 0.017 |
| p-cresol sulfate | 0.05 | | 0.760 | 0.067 | 0.779 | 0.010 |
| indole-3-carboxylic acid | 0.04 | | 0.746 | 0.060 | 0.777 | 0.009 |
| glycolithocholate sulfate | 0.04 | | 0.758 | 0.051 | 0.784 | 0.013 |
| 5-hydroxyhexanoate | 0.04 | | 0.773 | 0.050 | 0.776 | 0.016 |
| anthranilate | 0.03 | | 0.779 | 0.043 | 0.778 | 0.011 |
| adenosine 5'-monophosphate (AMP) | 0.03 | | 0.786 | 0.047 | 0.792 | 0.011 |
| glycoursodeoxycholate | 0.03 | | 0.798 | 0.048 | 0.767 | 0.012 |

To get an impression of how robust the method is with respect to the input data, a more advanced cross-validation scheme was used. Of the set of 500 B2 and 500 non-B2 samples, 100 samples were set aside for validation and for creating the ROC curves, while a 10-fold cross-validation (CV) strategy was used on the remaining 900 samples. For each cross-validation iteration, a new model was trained on 90% of the set and tested against the remaining 10%. Each model from the CV was stored along with the relevant metrics. Using all 59 metabolites picked up by the feature selection, the average test score was 0.818 (std. 0.046). Very similar results were obtained when using only the top 10 metabolites (ranked by feature impact from the classifier described in Table 4), more precisely an average test score of 0.798 (std. 0.040).
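The scheme can be sketched as follows with scikit-learn's `cross_validate`, which can return the fitted model of every fold. The data are synthetic, and several details are assumptions not stated in the application: the hold-out split is stratified, the test score is taken to be accuracy, and the folds are the default stratified folds used by `cv=10` for classifiers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_validate, train_test_split

# Synthetic stand-in for 1000 balanced samples (not FGFP data).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=0)

# Set 100 samples aside; they play no part in the cross-validation.
X_cv, X_holdout, y_cv, y_holdout = train_test_split(
    X, y, test_size=100, stratify=y, random_state=0)

# 10-fold CV on the remaining 900 samples: each fold trains on 810, tests on 90.
results = cross_validate(
    AdaBoostClassifier(n_estimators=50, random_state=0),
    X_cv, y_cv, cv=10, scoring="accuracy", return_estimator=True)

test_scores = results["test_score"]
models = results["estimator"]  # one fitted model per fold, kept for later use

print(f"mean test score {test_scores.mean():.3f} (std. {test_scores.std():.3f})")
```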

Next, ROC curves were generated using the cross-validation models and the 100 samples initially withheld. This allows the ROC curves to be drawn with a 95 percent confidence interval. The mean area under the curve was 0.86 when taking the full set of 59 metabolites into account (Figure 2) and 0.82 when only taking the top 10 metabolites into account (Figure 3). While an expected drop in performance can be observed between using the full set and using only the top 10 metabolites, classification methods with a ROC AUC >0.80 are considered excellent (Mandrekar 2010 J Thorac Oncol 5: 1315-1316).
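One way to obtain such a band is sketched below: every cross-validation model scores the same withheld samples, the resulting ROC curves are interpolated onto a common false-positive-rate grid, and the mean curve is drawn with an interval around it. The application does not state how the interval was computed; the mean plus or minus 1.96 standard deviations used here is an assumption, as are the synthetic data and all parameter values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=0)
X_cv, X_hold, y_cv, y_hold = train_test_split(
    X, y, test_size=100, stratify=y, random_state=0)

fpr_grid = np.linspace(0.0, 1.0, 101)  # common x-axis for all curves
tprs, aucs = [], []
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, _ in folds.split(X_cv, y_cv):
    model = AdaBoostClassifier(n_estimators=50, random_state=0)
    model.fit(X_cv[train_idx], y_cv[train_idx])
    scores = model.predict_proba(X_hold)[:, 1]   # probability of class 1
    fpr, tpr, _ = roc_curve(y_hold, scores)
    tprs.append(np.interp(fpr_grid, fpr, tpr))   # resample onto the grid
    aucs.append(auc(fpr, tpr))

tprs = np.array(tprs)
mean_tpr = tprs.mean(axis=0)
half_width = 1.96 * tprs.std(axis=0)             # assumed 95% band
lower = np.clip(mean_tpr - half_width, 0.0, 1.0)
upper = np.clip(mean_tpr + half_width, 0.0, 1.0)

print(f"mean AUC {np.mean(aucs):.2f} (std. {np.std(aucs):.2f})")
```

Plotting `mean_tpr` against `fpr_grid` with the region between `lower` and `upper` shaded gives a figure of the kind referred to as Figures 2 and 3.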

Example 2. Behaviour of selected metabolites in the B2 enterotype

For all the 59 metabolites selected above, the levels in the B2 and non-B2 enterotypes were inspected and checked for significant differences. Table 1 provides an overview of all selected metabolites and indicates if they are elevated or decreased in B2 compared to the non-B2 enterotypes (column "B2 level").

Example 3. Defining the minimal number of metabolites required for B2 prediction

Next, we studied whether the biomarker panel of 10 metabolites could further be reduced and still provide an accurate B2 prediction. For this analysis, a classifier is considered acceptable if a ROC AUC of 0.7 or higher is obtained using the testing scheme outlined in the methods. To avoid an excessive number of permutations at later stages, a heuristic was used to estimate how many additional metabolites would likely be required for a given metabolite to yield acceptable predictions. To this end, a classifier was generated using all metabolites, and feature weights were determined as well as the cumulative performance using only the first, the first two, the first three, ... metabolites (Table 4). The number of metabolites required to reach a ROC AUC of 0.7 was stored along with the metabolites required to reach that point. Next, the highest ranked feature was eliminated from the matrix and the same procedure was repeated until no metabolites remained in the list. For each metabolite, the smallest number of additional compounds required to reach a prediction with a ROC AUC>0.7 was determined. In the next step, where these findings are verified using random permutations with two, three or four metabolites, only metabolites which were likely to perform well in combination with one, two or three other metabolites were considered, to reduce the number of required permutations and the compute time.
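The heuristic can be sketched as below. This is an illustration on synthetic data rather than the implementation used for the application: it assumes that the features are re-ranked after every elimination, uses 5-fold cross-validation with a small AdaBoostClassifier to keep the run short, and the names `features_needed` and `min_size` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

AUC_THRESHOLD = 0.7

def features_needed(X, y, ranked, threshold=AUC_THRESHOLD):
    """Shortest prefix of `ranked` whose cross-validated ROC AUC reaches threshold."""
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]
        clf = AdaBoostClassifier(n_estimators=25, random_state=0)
        auc = cross_val_score(clf, X[:, cols], y, cv=5, scoring="roc_auc").mean()
        if auc >= threshold:
            return cols
    return None  # threshold not reached with the features left

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

remaining = list(range(X.shape[1]))
min_size = {}  # feature index -> smallest acceptable set size it was part of
while remaining:
    # Rank the remaining features by importance in a classifier using all of them.
    full = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X[:, remaining], y)
    ranked = [remaining[i] for i in np.argsort(full.feature_importances_)[::-1]]
    cols = features_needed(X, y, ranked)
    if cols is not None:
        for c in cols:
            min_size[c] = min(min_size.get(c, len(cols)), len(cols))
    remaining.remove(ranked[0])  # eliminate the top-ranked feature and repeat

print(min_size)
```

Features with `min_size` equal to 1, 2, 3 or 4 would then be the candidates for the single-marker, pair, triad and four-metabolite searches, respectively.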

By doing so, three classifiers were found based on individual metabolites: 1H-indole-7-acetic acid, 3-phenylpropionate (or hydrocinnamate) and cinnamoylglycine. All three metabolites are sufficiently decreased in the blood serum of participants with the B2 enterotype, compared to participants with another enterotype, that they can be used as a single marker with a reasonable prediction outcome. Next, from the metabolites for which the heuristic indicated that a combination of two metabolites would be sufficient, 5000 pairs were randomly selected and tested. Metabolites in pairs that were able to generate classifiers with a ROC AUC > 0.7 were selected. Thirteen metabolites (glutarate (C5-DC), glycolithocholate sulfate, isoursodeoxycholate, 5alpha-androstan-3beta,17alpha-diol disulfate, hippurate, phenol sulfate, catechol sulfate, 4-hydroxycoumarin, l-urobilinogen, p-cresol sulfate, N-acetyl-cadaverine, imidazole propionate and 5-hydroxyhexanoate) were confirmed to work in at least one pair. These thirteen metabolites were removed from the matrix, and 5000 triads of metabolites flagged by the heuristic as workable in combinations of three were created; models were made and their performance checked. This revealed that five metabolites (ursodeoxycholate, indole-3-carboxylic acid, phenol glucuronide, 7-alpha-hydroxy-3-oxo-4-cholestenoate (7-Hoca) and palmitoyl-oleoyl-glycerol (16:0/18:1) [2]) picked up by the initial heuristic all have potential combinations with three metabolites that generate acceptable predictions. The same method uncovered that combinations of 4 metabolites selected from a list of 16 metabolites lead to an acceptable B2 prediction. The metabolites picked up at this step are: glucuronide of C14H22O4 (2), indolepropionylglycine, N-acetylglucosamine conjugate of C24H40O4 bile acid, glycoursodeoxycholate, phenylacetate, 4-ethylphenylsulfate, glycochenodeoxycholate sulfate, anthranilate, etiocholanolone glucuronide, 1-oleoyl-GPE (18:1), 4-hydroxyphenylacetate, dihydroferulic acid, 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1), adenosine 5'-monophosphate (AMP), 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0) and palmitoyl sphingomyelin (d18:1/16:0).
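The verification of pairs can be illustrated as follows. The sketch draws random pairs from a candidate list, keeps every feature that takes part in at least one pair reaching a ROC AUC above 0.7, and passes the rest on to the next round. The data are synthetic, the candidate list and the number of draws (20 instead of 5000) are placeholders, and 5-fold cross-validation is an assumption made to keep the example fast.

```python
import random
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

candidates = list(range(X.shape[1]))   # placeholder for heuristic-flagged features
all_pairs = list(combinations(candidates, 2))

random.seed(0)
n_draws = min(20, len(all_pairs))      # the application drew 5000 pairs
pairs = random.sample(all_pairs, n_draws)

confirmed = set()
for pair in pairs:
    clf = AdaBoostClassifier(n_estimators=25, random_state=0)
    auc = cross_val_score(clf, X[:, list(pair)], y, cv=5, scoring="roc_auc").mean()
    if auc > 0.7:
        confirmed.update(pair)         # both members worked in at least one pair

# Features not confirmed in any pair move on to the search over triads.
remaining = [f for f in candidates if f not in confirmed]
print(sorted(confirmed), remaining)
```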

Materials and methods

Data acquisition

Enterotypes for participants from the Flemish Gut Flora Project were determined using a Dirichlet Multinomial Model using the data and methodology described in Falony et al. (2016 Science). Blood serum metabolite levels were determined using liquid chromatography paired with a mass spectrometer (LC-MS). Unknown metabolites were removed prior to analysis. Metabolite levels were scaled using the StandardScaler. The dataset was balanced using random under-sampling to ensure an equal number of participants for each category were present in the final set.
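Balancing and scaling can be sketched as below. The enterotype counts are those reported in Example 1, but the metabolite values are random numbers and the five columns are placeholders. The order of operations (under-sampling first, scaling second) is an assumption, since the description does not fix it.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Enterotype labels with the counts from Example 1; values are random stand-ins.
labels = np.array(["B1"] * 1031 + ["B2"] * 503 + ["R"] * 900 + ["P"] * 504)
X = rng.lognormal(size=(labels.size, 5))  # 5 placeholder metabolite columns

# Random under-sampling: 500 B2 and 500 non-B2 participants, without replacement.
is_b2 = labels == "B2"
n_per_class = 500
b2_idx = rng.choice(np.flatnonzero(is_b2), size=n_per_class, replace=False)
other_idx = rng.choice(np.flatnonzero(~is_b2), size=n_per_class, replace=False)
keep = np.concatenate([b2_idx, other_idx])

# Center every metabolite on zero and scale it to unit variance.
X_bal = StandardScaler().fit_transform(X[keep])
y_bal = is_b2[keep].astype(int)           # 1 = B2, 0 = non-B2

print(X_bal.shape, int(y_bal.sum()))
```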

Feature selection

The first selection of the most relevant metabolites to characterize the B2 enterotype, was done by computing the ANOVA F-value, p-value and corrected (Bonferroni) p-value for all metabolites and retaining those with a corrected p-value < 0.05 (n = 59). In practice, the f_classif function implemented in scikit-learn (version 0.23.1) was used.
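By way of illustration, the selection step may look as follows when applied to synthetic data. `f_classif` returns the ANOVA F-value and p-value per feature; the Bonferroni correction is written out here as a multiplication by the number of features, which is an assumption about how the correction was applied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Synthetic stand-in: 50 features, of which 5 carry signal (not FGFP data).
X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

# One-way ANOVA F-test of every feature against the class label.
f_values, p_values = f_classif(X, y)

# Bonferroni correction: multiply by the number of tests and cap at 1.
p_corrected = np.minimum(p_values * X.shape[1], 1.0)

# Retain features with a corrected p-value below 0.05.
selected = np.flatnonzero(p_corrected < 0.05)
print(f"{selected.size} of {X.shape[1]} features retained")
```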

Classification methods

An ensemble classifier (VotingClassifier) was created consisting of a DecisionTreeClassifier, a RandomForestClassifier (with 50 estimators), an AdaBoostClassifier (with 100 estimators), a Perceptron, a Support Vector Classifier, and a Stochastic Gradient Descent classifier, all with the default settings unless otherwise stated. All classifiers along with the VotingClassifier are implemented in scikit-learn.

Method evaluation

For testing the performance of individual classifiers, the full dataset was split into a training and a testing dataset in a 9/1 ratio. The ensemble classifier was trained on the training dataset, and the performance (precision, recall and F1-scores) was determined for each individual classifier as well as for the ensemble using the classification_report function from scikit-learn.
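The ensemble described under Classification methods and the 9/1 evaluation can be sketched as below. The data are synthetic, the estimator labels, random seeds and stratified split are assumptions, and hard (majority) voting is used because it is the scikit-learn default and several of the listed classifiers do not provide class probabilities in their default configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced, scaled dataset (not FGFP data).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)   # 9/1 split

ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=100, random_state=0)),
        ("perceptron", Perceptron(random_state=0)),
        ("svc", SVC()),
        ("sgd", SGDClassifier(random_state=0)),
    ],
    voting="hard",  # majority vote over predicted labels
)
ensemble.fit(X_train, y_train)

# Report for the ensemble, then accuracy of each classifier it encapsulates.
report = classification_report(y_test, ensemble.predict(X_test),
                               target_names=["non-B2", "B2"])
print(report)
for name, estimator in ensemble.named_estimators_.items():
    print(name, round(estimator.score(X_test, y_test), 2))
```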

Cross Validation and ROC curves

For further evaluation the dataset was converted to B2 and non-B2 enterotypes and balanced using random under-sampling. Using 10-fold cross validation, the performance was evaluated and ROC curves were created.