RICHMAN JESSICA (US)
ALMONACID DANIEL (US)
CAMEJO PAMELA (US)
WO2015089333A1 | 2015-06-18
WO2016168354A1 | 2016-10-20
KELLY M ROBINSON; JONATHAN CRABTREE; JOHN S A MATTICK; KATHLEEN E ANDERSON; JULIE C DUNNING HOTOPP: "Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data", MICROBIOME, vol. 5, no. 9, 25 January 2017 (2017-01-25), XP021241104
CLAIMS
I/we claim:
1. A method of detecting associations between the abundances of bacterial species comprising the steps:
a. detecting the existence of the bacteria;
b. obtaining the sequencing data of the bacteria;
c. normalizing the sequencing data obtained in step b.
2. The method of claim 1, further comprising analyzing the normalized data.
Automatic Standardization Pipeline For Data Analysis
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application No.
62/780,759, filed December 17, 2018, entitled “Automatic Standardization Pipeline for Data
Analysis”, which is incorporated by reference herein in its entirety.
BACKGROUND OF THE INVENTION
[0002] Differential abundance methods aim to detect associations between the abundances of bacterial species (and/or other suitable types of microorganisms and/or taxa; etc.). These methods use statistical analyses to indicate whether the abundance of microorganisms differs among two or more groups of samples. Statistical analysis of microbiome data has many challenges, including the sparsity and compositional nature of the data. Therefore, microbial abundances need to be normalized before any statistical method is applied.
DESCRIPTION OF THE EMBODIMENTS
[0003] The following description of the embodiments is not intended to limit the embodiments, but rather to enable any person skilled in the art to make and use them.
[0004] We developed a pipeline to standardize the normalization step of sequencing data and/or to identify differentially abundant microorganisms (e.g., of any suitable taxa; of same taxa across samples; of different taxa across samples; etc.) in sequenced samples. This pipeline combines one or more of the following steps (e.g., in any suitable combination, sequence, frequency, time; etc.): removal of zeros, normalization and/or statistical methods, and/or automating the analysis of sequencing data.
[0005] In an embodiment, the Differential Abundance pipeline compares the microbial population among samples representing multiple conditions (based on metadata) (however, samples can correspond to any suitable number and type of conditions). The provided results correspond to a preliminary inspection of the data, and therefore, further analyses might be required to confirm these findings.
[0006] The following are examples of descriptions of the analyses:
1. Prevalence filter: Only taxa present (abundance > 0) in at least 10% of samples were included in the analysis.
2. Statistical analysis of microbiome data has many challenges, including the sparsity and compositional nature of the data. To deal with these challenges, we transformed zero-abundance data using a ‘multiplicative replacement’ normalization step, to convert data from a constrained compositional space to a Euclidean space.
3. Centered Log Ratio (CLR) transformation of the data.
4. Differential Abundance: First, we generated a generalized linear model (GLM) for each taxon, based on a Gaussian distribution. Then, p-values were calculated using an analysis of variance (ANOVA) test of each model and were adjusted using the ‘bonferroni’ method. Microorganisms with a p-value < 0.05 were considered to be differentially abundant.
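As an illustration of steps 1 to 3 above, the following is a minimal sketch in plain Python, not the pipeline's actual implementation. The function names, the row-per-sample data layout, and the default replacement value `delta = (1/D)^2` (where D is the number of taxa) are assumptions made for this example; the description does not specify them.

```python
import math

def prevalence_filter(counts, taxa, min_prevalence=0.10):
    """Keep taxa with abundance > 0 in at least `min_prevalence` of samples.

    `counts` is a list of samples, each a list of per-taxon abundances
    (hypothetical layout chosen for this sketch).
    """
    n_samples = len(counts)
    keep = [
        j for j in range(len(taxa))
        if sum(1 for row in counts if row[j] > 0) / n_samples >= min_prevalence
    ]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, [taxa[j] for j in keep]

def multiplicative_replacement(row, delta=None):
    """Replace zeros by `delta` and rescale non-zero parts so the sample sums to 1."""
    total = sum(row)
    if total <= 0:
        raise ValueError("sample has no abundance left after filtering")
    props = [v / total for v in row]
    if delta is None:
        delta = (1.0 / len(row)) ** 2  # assumed default, not stated in the description
    n_zero = sum(1 for p in props if p == 0)
    scale = 1.0 - n_zero * delta  # shrink non-zero parts to make room for the deltas
    return [delta if p == 0 else p * scale for p in props]

def clr(composition):
    """Centered log ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Toy data: taxon "D" is absent from every sample and is filtered out.
counts = [[8, 0, 2, 0], [5, 5, 0, 0]]
taxa = ["A", "B", "C", "D"]
filtered, kept_taxa = prevalence_filter(counts, taxa)
clr_rows = [clr(multiplicative_replacement(row)) for row in filtered]
```

After zero replacement each sample still sums to 1, and each CLR-transformed sample sums to 0, which gives two quick sanity checks on the preprocessing.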
[0007] In variations, any suitable combination of the steps described above can be performed in any suitable sequence, frequency, and time.
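As one concrete variation of the differential abundance step, the sketch below assumes that the condition label is the only predictor in each per-taxon model. Under that assumption, the ANOVA test of a Gaussian GLM reduces to a classical one-way ANOVA F-test, which is what the code computes. All function names are hypothetical, and the F-distribution tail probability is computed with a textbook continued-fraction routine so that the example needs only the standard library; a production pipeline would more likely rely on a statistics package.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_survival(f, d1, d2):
    """P(F >= f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return _betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def anova_pvalue(values, conditions):
    """One-way ANOVA p-value for one taxon's CLR values grouped by condition."""
    groups = {}
    for v, c in zip(values, conditions):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    d1, d2 = k - 1, n - k
    if d1 < 1 or d2 < 1:
        raise ValueError("need at least two conditions and more samples than conditions")
    grand = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0.0:
        return 0.0 if ss_between > 0.0 else 1.0
    return f_survival((ss_between / d1) / (ss_within / d2), d1, d2)

def differential_abundance(clr_by_taxon, conditions, alpha=0.05):
    """Bonferroni-adjust per-taxon p-values and keep taxa below `alpha`."""
    n_tests = len(clr_by_taxon)
    adjusted = {t: min(1.0, anova_pvalue(v, conditions) * n_tests)
                for t, v in clr_by_taxon.items()}
    return {t: p for t, p in adjusted.items() if p < alpha}
```

For example, `differential_abundance({"X": [1.0, 1.1, 0.9, 5.0, 5.1, 4.9], "Y": [1, 2, 3, 1, 2, 3]}, ["a", "a", "a", "b", "b", "b"])` retains only taxon `"X"`, because the group means of `"Y"` are identical. The Bonferroni adjustment multiplies each p-value by the number of taxa tested and caps the result at 1.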
[0008] Examples of information that can be generated by this pipeline include the following:
a. PCA plots: PCA plots of CLR data where samples are colored by condition.
b. Differential Abundance: A table of microorganisms with differential abundance, including the average abundance, standard deviation, and number of samples analyzed for each condition. One table is generated for each comparison. If no microorganism is found to be differentially abundant, no table or plots are generated.
c. Box Plots: Box plots of CLR data for differentially abundant taxa at each condition. Only the 6 taxa with the lowest p-values are plotted.
d. Heatmaps: For each microorganism with differential abundance, a heatmap of CLR data divided by condition.
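The tabular output in item b and the taxon selection in item c can be sketched as follows. This is an illustrative outline only: the function names and dictionary-based table layout are hypothetical, the sample standard deviation is an assumption (the description does not say whether the sample or population form is used), and plotting itself is omitted.

```python
import statistics

def summarize_by_condition(values, conditions):
    """Mean, standard deviation, and sample count of one taxon per condition."""
    groups = {}
    for v, c in zip(values, conditions):
        groups.setdefault(c, []).append(v)
    return {
        c: {
            "mean": statistics.mean(g),
            # Sample SD (assumed); undefined for a single sample, so report None.
            "sd": statistics.stdev(g) if len(g) > 1 else None,
            "n": len(g),
        }
        for c, g in groups.items()
    }

def differential_abundance_table(values_by_taxon, conditions, adjusted_p, alpha=0.05):
    """Build the per-comparison table; return None when no taxon passes `alpha`."""
    hits = [t for t, p in adjusted_p.items() if p < alpha]
    if not hits:
        return None  # no table and no plots are generated in this case
    return {
        t: {"p_value": adjusted_p[t],
            "by_condition": summarize_by_condition(values_by_taxon[t], conditions)}
        for t in hits
    }

def taxa_for_box_plots(adjusted_p, max_taxa=6):
    """Return up to `max_taxa` taxa with the lowest p-values, lowest first."""
    ranked = sorted(adjusted_p.items(), key=lambda item: item[1])
    return [taxon for taxon, _ in ranked[:max_taxa]]
```

Returning `None` rather than an empty table mirrors the rule that nothing is generated when no microorganism is differentially abundant, so a caller can skip the plotting stage with a single check.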
[0009] In variations, any suitable combination of information (e.g., types of information described above) can be generated in any suitable sequence, frequency, and time.
[0010] Embodiments of the method and/or system can include every combination and permutation of the various system components and the various method processes, including any variants (e.g., embodiments, variations, examples, specific examples, figures, etc.), where portions of embodiments of the method and/or processes described herein can be performed asynchronously (e.g., sequentially), concurrently (e.g., in parallel), or in any other suitable order by and/or using one or more instances, elements, components of, and/or other aspects of the system and/or other entities described herein.
[0011] Any of the variants described herein (e.g., embodiments, variations, examples, specific examples, figures, etc.) and/or any portion of the variants described herein can be additionally or alternatively combined, aggregated, excluded, used, performed serially, performed in parallel, and/or otherwise applied.
[0012] Portions of embodiments of the method and/or system can be embodied and/or implemented at least in part as a machine configured to receive a computer-readable medium storing computer-readable instructions. The instructions can be executed by computer-executable components that can be integrated with the system. The instructions can be stored on any suitable computer-readable media such as RAMs, ROMs, flash memory, EEPROMs, optical devices (CD or DVD), hard drives, floppy drives, or any suitable device. The computer-executable component can be a general or application specific processor, but any suitable dedicated hardware or hardware/firmware combination device can alternatively or additionally execute the instructions. As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to embodiments of the method, system, and/or variants without departing from the scope defined in the claims.