Title:
AUTOMATIC STANDARDIZATION PIPELINE FOR DATA ANALYSIS
Document Type and Number:
WIPO Patent Application WO/2020/131923
Kind Code:
A1
Abstract:
This disclosure relates to methods of detecting associations between the abundances of bacterial species comprising normalization of the sequencing data.

Inventors:
APTE ZACHARY (US)
RICHMAN JESSICA (US)
ALMONACID DANIEL (US)
CAMEJO PAMELA (US)
Application Number:
PCT/US2019/066933
Publication Date:
June 25, 2020
Filing Date:
December 17, 2019
Assignee:
UBIOME INC (US)
International Classes:
C12Q1/04; C12Q1/68; G16B30/10
Domestic Patent References:
WO2015089333A1 2015-06-18
WO2016168354A1 2016-10-20
Other References:
P. CHOUVARINE, WIEHLMANN L., MORAN LOSADA P., DELUCA D. S., TÜMMLER B.: "Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples", PLOS ONE, vol. 11, no. 10, 19 October 2016 (2016-10-19), pages e0165015, XP055721863
KELLY M. ROBINSON; JONATHAN CRABTREE; JOHN S. A. MATTICK; KATHLEEN E. ANDERSON; JULIE C. DUNNING HOTOPP: "Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data", MICROBIOME, vol. 5, no. 9, 25 January 2017 (2017-01-25), XP021241104
Attorney, Agent or Firm:
CHAI, Deping et al. (US)
Claims:
CLAIMS

I/we claim

1. A method of detecting associations between the abundances of bacterial species comprising the steps:

a: detecting the existence of the bacteria;

b: obtaining the sequencing data of the bacteria;

c: normalizing the sequencing data obtained in step b.

2. The method of claim 1, further comprising analyzing the normalized data.

Description:
TITLE OF THE INVENTION

Automatic Standardization Pipeline For Data Analysis

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/780,759, filed December 17, 2018, entitled “Automatic Standardization Pipeline for Data Analysis”, which is incorporated by reference herein in its entirety.

BACKGROUND OF THE INVENTION

[0002] Differential abundance methods aim at detecting associations between the abundances of bacterial species (and/or other suitable types of microorganisms and/or taxa; etc.). These methods use statistical analyses to indicate whether the abundance of microorganisms is considered different among two or more groups of samples. Statistical analysis of microbiome data has many challenges, including sparsity and compositional nature of the data. Therefore, microbial abundances need to be normalized before using any statistical method.

DESCRIPTION OF THE EMBODIMENTS

[0003] The following description of the embodiments is not intended to limit the embodiments, but rather to enable any person skilled in the art to make and use the embodiments.

[0004] We developed a pipeline to standardize the normalization step of sequencing data and/or to identify differentially abundant microorganisms (e.g., of any suitable taxa; of same taxa across samples; of different taxa across samples; etc.) in sequenced samples. This pipeline combines one or more of the following steps (e.g., in any suitable combination, sequence, frequency, time; etc.): removal of zeros, normalization and/or statistical methods, and/or automatizing the analysis of sequencing data.

[0005] In an embodiment, the Differential Abundance pipeline compares the microbial population among samples representing multiple conditions (based on metadata); however, samples can correspond to any suitable number and type of conditions. The provided results correspond to a preliminary inspection of the data, and therefore, further analyses might be required to confirm these findings.

[0006] The following are examples of descriptions of the analyses:

1. Prevalence filter: Only taxa present (abundance > 0) in at least 10% of samples were included in the analysis.

2. Statistical analysis of microbiome data has many challenges, including the sparsity and compositional nature of the data. To deal with these challenges, we transformed zero-abundance data using a ‘multiplicative replacement’ normalization step, to convert data from a constrained compositional space to a Euclidean space.

3. Centered Log Ratio (CLR) transformation of data.

4. Differential Abundance: First, we generated a generalized linear model (GLM) for each taxon, based on a Gaussian distribution. Then p-values were calculated using an analysis of variance (ANOVA) test of each model and were adjusted using the ‘bonferroni’ method. Microorganisms with an adjusted p-value < 0.05 were considered to be differentially abundant.
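The four analysis steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosed pipeline. All function names are hypothetical. The zero-replacement value `delta` defaults to an assumed 1/N² for N taxa. The per-taxon model is assumed to have the condition label as its only predictor, in which case the ANOVA test of a Gaussian GLM reduces to a one-way ANOVA F-test. Every sample is assumed to retain at least one non-zero count after filtering.

```python
import math
from statistics import mean

def prevalence_filter(counts, min_fraction=0.10):
    """Keep taxa with abundance > 0 in at least min_fraction of samples.
    counts maps taxon -> list of per-sample abundances."""
    return {t: v for t, v in counts.items()
            if sum(x > 0 for x in v) >= min_fraction * len(v)}

def multiplicative_replacement(sample, delta=None):
    """Close one sample to proportions, set zeros to delta, and shrink the
    non-zero parts so that the proportions still sum to 1."""
    total = sum(sample)
    props = [x / total for x in sample]
    if delta is None:
        delta = (1.0 / len(props)) ** 2  # assumed default, not from the source
    n_zero = sum(p == 0 for p in props)
    return [delta if p == 0 else p * (1 - n_zero * delta) for p in props]

def clr(composition):
    """Centered log ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(p) for p in composition]
    center = mean(logs)
    return [value - center for value in logs]

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    guard = lambda v: v if abs(v) > 1e-300 else 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def anova_p_value(values, groups):
    """P-value of the F-test for a Gaussian linear model value ~ condition."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    if k < 2 or n <= k:
        return 1.0  # not enough data to test
    grand = mean(values)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in by_group.values())
    ss_within = sum((x - mean(v)) ** 2 for v in by_group.values() for x in v)
    if ss_within == 0:
        return 0.0 if ss_between > 0 else 1.0
    d1, d2 = k - 1, n - k
    f_stat = (ss_between / d1) / (ss_within / d2)
    # Upper tail of the F distribution with (d1, d2) degrees of freedom.
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f_stat))

def differential_abundance(counts, conditions, alpha=0.05):
    """Run steps 1-4; return {taxon: Bonferroni-adjusted p} for p_adj < alpha."""
    kept = prevalence_filter(counts)
    taxa = list(kept)
    clr_by_taxon = {t: [] for t in taxa}
    for j in range(len(conditions)):
        sample = [kept[t][j] for t in taxa]
        for t, value in zip(taxa, clr(multiplicative_replacement(sample))):
            clr_by_taxon[t].append(value)
    adjusted = {t: min(1.0, anova_p_value(clr_by_taxon[t], conditions) * len(taxa))
                for t in taxa}
    return {t: p for t, p in adjusted.items() if p < alpha}
```

Here `counts` is a hypothetical mapping from taxon name to a list of per-sample abundances, and `conditions` is the list of condition labels in the same sample order. The Bonferroni adjustment multiplies each p-value by the number of taxa tested and caps the result at 1.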

[0007] In variations, any suitable combination of the steps described above can be performed in any suitable sequence, frequency, and time.

[0008] Examples of information that can be generated by this pipeline include the following:

a. PCA plots: PCA plots of CLR data where samples are colored by condition.

b. Differential Abundance: A table of microorganisms with differential abundance, including the average abundance, standard deviation and number of samples analyzed for each condition. One table is generated for each comparison. If no microorganism was found to be differentially abundant, no table or plots are generated.

c. Box Plots: Box plot of CLR data for differentially abundant taxa at each condition. Only the 6 taxa with the lowest p-values are plotted.

d. Heatmaps: For each microorganism with differential abundance, a heatmap of CLR data divided by condition.
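The tabular output and the selection of taxa for box plots described above can be sketched as follows. This is a hedged illustration rather than the disclosed implementation. The function names, the dictionary layout of the report, and the choice to compute the summary statistics on whatever values are passed in (CLR or otherwise) are assumptions. The sketch handles a single comparison and omits the plotting itself.

```python
from statistics import mean, stdev

def summarize_by_condition(values, conditions):
    """Mean, standard deviation and sample count per condition for one taxon."""
    grouped = {}
    for v, c in zip(values, conditions):
        grouped.setdefault(c, []).append(v)
    return {c: {"mean": mean(v),
                "std": stdev(v) if len(v) > 1 else 0.0,
                "n": len(v)}
            for c, v in grouped.items()}

def build_report(values_by_taxon, adjusted_p, conditions, alpha=0.05, max_plots=6):
    """Return the differential-abundance table plus the taxa to box-plot,
    or None when no taxon is differentially abundant."""
    significant = sorted((t for t, p in adjusted_p.items() if p < alpha),
                         key=lambda t: adjusted_p[t])
    if not significant:
        return None  # no table and no plots are generated
    table = [{"taxon": t,
              "p_adj": adjusted_p[t],
              "stats": summarize_by_condition(values_by_taxon[t], conditions)}
             for t in significant]
    # Only the taxa with the lowest p-values (6 by default) are plotted.
    return {"table": table, "box_plot_taxa": significant[:max_plots]}
```

Returning `None` when nothing passes the threshold mirrors the rule that no table or plots are produced in that case. A caller handling several comparisons would invoke `build_report` once per comparison.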

[0009] In variations, any suitable combination of information (e.g., types of information described above) can be generated in any suitable sequence, frequency, and time.

[0010] Embodiments of the method and/or system can include every combination and permutation of the various system components and the various method processes, including any variants (e.g., embodiments, variations, examples, specific examples, figures, etc.), where portions of embodiments of the method and/or processes described herein can be performed asynchronously (e.g., sequentially), concurrently (e.g., in parallel), or in any other suitable order by and/or using one or more instances, elements, components of, and/or other aspects of the system and/or other entities described herein.

[0011] Any of the variants described herein (e.g., embodiments, variations, examples, specific examples, figures, etc.) and/or any portion of the variants described herein can be additionally or alternatively combined, aggregated, excluded, used, performed serially, performed in parallel, and/or otherwise applied.

[0012] Portions of embodiments of the method and/or system can be embodied and/or implemented at least in part as a machine configured to receive a computer-readable medium storing computer-readable instructions. The instructions can be executed by computer-executable components that can be integrated with the system. The instructions can be stored on any suitable computer-readable media such as RAMs, ROMs, flash memory, EEPROMs, optical devices (CD or DVD), hard drives, floppy drives, or any suitable device. The computer-executable component can be a general or application specific processor, but any suitable dedicated hardware or hardware/firmware combination device can alternatively or additionally execute the instructions. As a person skilled in the art will recognize from the previous detailed description and from the figures and claims, modifications and changes can be made to embodiments of the method, system, and/or variants without departing from the scope defined in the claims.