METHODS AND DEVICES FOR IDENTIFYING POPULATION CLUSTERS IN DATA

Document Type and Number:

WIPO Patent Application WO/2018/151680

Abstract:

Methods and devices for automated gating of flow cytometry data, including: sorting the flow cytometry data according to one or more parameters measured by a flow cytometer; determining a density model of the sorted flow cytometry data; calculating a derivative value at each of a plurality of points on the density model; determining, using the calculated derivative values: one or more peaks in the density model, and one or more points of inflection in the density model; and identifying one or more gating points based on the one or more peaks or the one or more points of inflection.

Inventors:

CHEN HAO (SG)
CHEN JINMIAO (SG)
POIDINGER MICHAEL (SG)
LARBI ANIS (SG)
CAMOUS XAVIER (SG)

Application Number:

PCT/SG2018/050073

Publication Date:

August 23, 2018

Filing Date:

February 15, 2018

Assignee:

AGENCY SCIENCE TECH & RES (SG)

International Classes:

G01N15/14; G01N35/00; G06F7/00

Foreign References:

US20020029235A1	2002-03-07
CN104200114A	2014-12-10
US20080221812A1	2008-09-11
US20050059046A1	2005-03-17

Other References:

MALEK M. ET AL.: "flowDensity: reproducing manual gating of flow cytometry data by automated density-based cell population identification", BIOINFORMATICS, vol. 31, no. 4, 16 October 2014 (2014-10-16), pages 606 - 607, XP055537251, [retrieved on 20180412]
GE Y. ET AL.: "flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding", BIOINFORMATICS, vol. 28, no. 15, 17 May 2012 (2012-05-17), pages 2052 - 2058, XP055073112, [retrieved on 20180412]

Attorney, Agent or Firm:

VIERING, JENTSCHURA & PARTNER LLP (SG)

Claims:

What is claimed is:

1. A method for automated gating of flow cytometry data by a processing device, the method comprising:

sorting the flow cytometry data according to one or more parameters measured by a flow cytometer;

determining a density model of the sorted flow cytometry data;

calculating a derivative value at each of a plurality of points on the density model;

determining, using the calculated derivative values:

one or more peaks in the density model; and/or

one or more points of inflection in the density model; and

identifying one or more gating points based on the one or more peaks, or the one or more points of inflection.

2. The method of claim 1, wherein, when multiple peaks are determined in the density model, the method further comprises identifying at least one gating point between two of the multiple peaks based on the calculated derivative values of the density model between the two said peaks.

3. The method of claim 2, further comprising identifying the at least one gating point by determining a minimum absolute value of the calculated derivative values between the two said peaks.

4. The method of claim 1, further comprising determining the plurality of points along a domain of the density model by using a fixed number of bins.

5. The method of claim 4, further comprising determining a size of each of the bins by dividing a difference of a maximum of the domain and a minimum of the domain by the fixed number of bins.

6. The method of claim 1, further comprising determining at least one of the one or more peaks by using a first calculated derivative value on one side of a point and a second calculated derivative value on a second side of the point.

7. The method of claim 1, further comprising determining the one or more points of inflection of the density model by determining a local maximum and/or a local minimum in the calculated derivative values.

8. The method of claim 1, further comprising determining the one or more points of inflection according to:

where j is a number of a point of the plurality of points on the density model, c is a point of inflection, d_j is a derivative value at j, and m is a determined peak.

9. The method of claim 1, wherein the one or more gating points, g, of the sorted flow cytometry data are identified according to:

where j is a number of a point of the plurality of points on the density model, c is a point of inflection, d_j is a derivative value at j, b_size is a fixed distance between each of the plurality of points on the density model, m is a determined peak, and l is an adjustable parameter configured to determine a distance between the gating point on a tail of a respective peak and the respective peak.

10. The method of claim 1, further comprising sorting the flow cytometry data according to two parameters measured by a flow cytometer.

11. The method of claim 10, further comprising determining a plurality of density tracks in the density model along either one of an x-axis or a y-axis of the density model.

12. The method of claim 11, wherein determining the plurality of density tracks comprises determining an "A" number of bins along the x-axis of the density model and a "B" number of bins along the y-axis of the density model.

13. The method of claim 12, further comprising determining a bin size dimension along the x-axis, b_xsize, by dividing a difference of a maximum of the density model along the x-axis and a minimum of the density model along the x-axis by "A."

14. The method of claim 13, further comprising determining a bin size dimension along the y-axis, b_ysize, by dividing a difference of a maximum of the density model along the y-axis and a minimum of the density model along the y-axis by "B."

15. The method of claim 14, further comprising calculating a density value of each of the bins selected along the x-axis or y-axis of the density model.

16. The method of claim 15, wherein determining the plurality of density tracks comprises transforming the calculated density values into the plurality of density tracks along the selected axis of the density model.

17. The method of claim 16, wherein calculating the derivative value at each of the plurality of points on the density model comprises calculating the derivative value at each of the plurality of points along each of the plurality of density tracks.

18. The method of claim 17, wherein one or more peaks and one or more points of inflection are determined on each of the plurality of density tracks.

19. The method of claim 18, wherein at least one gating point is identified on each of the plurality of density tracks, providing a multiplicity of gating points.

20. The method of claim 19, further comprising identifying a gating line comprising a linear regression of the multiplicity of gating points.

21. Machine-readable storage including machine-readable instructions, which when executed, implement a method as claimed in any one of claims 1-20.

22. A system for performing flow cytometry experiments, the system comprising:

a flow cytometer configured to measure one or more parameters of a sample comprising a plurality of particles; and

a processing device configured to:

obtain the measurements from the flow cytometer;

sort the measurements according to the one or more parameters;

determine a density model of the sorted measurements;

calculate a derivative value at each of a plurality of points on the density model;

determine, using the calculated derivative values:

one or more peaks in the density model; and/or

one or more points of inflection in the density model; and

identify one or more gating points based on the one or more peaks, or the one or more points of inflection.

23. The system of claim 22, the processing device further configured to, when multiple peaks are determined in the density model, identify at least one gating point between two of the multiple peaks based on the calculated derivative values of the density model between the two said peaks.

24. The system of claim 23, the processing device further configured to identify the at least one gating point by determining a minimum absolute value of the calculated derivative values between the two said peaks.

25. The system of claim 22, the processing device further configured to determine the one or more points of inflection according to:

where j is a number of a point of the plurality of points on the density model, c is a point of inflection, d_j is a derivative value at j, and m is a determined peak.

Description:

METHODS AND DEVICES FOR IDENTIFYING POPULATION CLUSTERS IN DATA

Cross-Reference to Related Application

[0001] This application claims priority to Singapore application No. 10201701209X, filed on February 15, 2017, which is incorporated by reference in its entirety herein.

Technical Field

[0002] Various aspects relate generally to methods and devices for identifying clusters in multidimensional data, including in particle analysis such as gating of flow cytometry data.

Background

[0003] Flow cytometry devices and other particle analyzers (e.g. mass cytometers) provide for the identification and characterization of particles based on certain predetermined parameters, e.g. optical parameters including light scatter and fluorescence. In flow cytometry, for example, particles or beads in a fluid suspension are passed through a detection region where the particles are subjected to light, typically from one or more lasers, and the light scattering and fluorescence properties of the particles are measured by sensors in the detection region. Particles are typically labeled with one or more fluorescent dyes of known properties in order to facilitate detection, and the sensors in the detection region are arranged to detect a plurality of different properties simultaneously, e.g. one for each of the fluorescent dyes used, and one or more light scattering properties such as forward-scattered light (FSC), side-scattered light (SSC), etc. These sensors, e.g. photodetectors, obtain the data for the particles in real time as they pass through the detection region, and transmit the data to computer-readable media for storage. The data obtained is multidimensional in nature: each particle may correspond to a point in a multidimensional space defined by the measured parameters. Populations, or clusters, of certain types of cells are identified based on their correlation to each other in this multidimensional space.

[0004] Manual gating plays a crucial role in flow cytometry data analysis due to its flexibility and intuitiveness; however, manual sequential gating to extract cell populations of interest becomes tedious and labor-intensive for larger files. And while automated gating mechanisms provide faster results, current automated gating methods do not provide the accuracy and reliability of manual gating. As particle analyzers continue to improve and become capable of gathering larger amounts of data, e.g. more than 40 channels in mass cytometry (each channel based on a measured stable isotope mass) or up to 50 characteristics per cell using a BD FACSymphony™ flow cytometer, improved automated gating methods are necessary in order to analyze data efficiently and accurately.

Summary of Invention

[0005] In some aspects, methods and devices for efficiently and effectively processing and analyzing data, including large sets of multidimensional data, are presented. These methods may include selecting 1D modalGate and/or 2D modalGate (together referred to as modalGate) depending on the data to be analyzed. modalGate efficiently and effectively identifies population clusters in data by determining a density model of the sorted data; calculating a derivative value at each of a plurality of points on the density model; determining, using the calculated derivative values, one or more peaks in the density model and/or one or more points of inflection in the density model; and identifying one or more gating points based on the one or more peaks or the one or more points of inflection, where the gating points are used to identify the population clusters.

Brief Description of the Drawings

[0006] In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the following drawings, in which:

FIG. 1 shows a basic configuration of a flow cytometer in some aspects;

FIG. 2 shows an internal configuration of a computer for data acquisition and analysis in some aspects;

FIG. 3 shows detection results using 1D modalGate in the left two charts (300) and detection results using 2D modalGate in the right two charts (350);

FIG. 4 shows gating results using modalGate compared to manual gating in some aspects;

FIGs. 5-9 show a comparison of the performance of modalGate with another automated gating method named flowDensity using expert manual gating as a benchmark in some aspects;

FIG. 10 is a box plot showing the F1 measure of modalGate and flowDensity in some aspects; and

FIG. 11 shows a flowchart describing a method in some aspects.

Description

[0007] The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the invention. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments. Various embodiments are described in connection with methods and various embodiments are described in connection with devices. However, it may be understood that embodiments described in connection with methods may similarly apply to the devices, and vice versa.

[0008] The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or designs.

[0009] The terms "at least one" and "one or more" may be understood to include any integer number greater than or equal to one, i.e. one, two, three, four, [... ], etc. The term "a plurality" may be understood to include any integer number greater than or equal to two, i.e. two, three, four, five, [... ], etc.

[0010] The phrase "at least one of" with regard to a group of elements may be used herein to mean at least one element from the group consisting of the elements. For example, the phrase "at least one of" with regard to a group of elements may be used herein to mean a selection of: one of the listed elements, a plurality of one of the listed elements, a plurality of individual listed elements, or a plurality of a multiple of listed elements.

[0011] The words "plural" and "multiple" in the description and the claims expressly refer to a quantity greater than one. Accordingly, any phrases explicitly invoking the aforementioned words (e.g., "a plurality of [objects]," "multiple [objects]") referring to a quantity of objects expressly refer to more than one of the said objects. The terms "group (of)," "set (of)," "collection (of)," "series (of)," "sequence (of)," "grouping (of)," etc., and the like in the description and in the claims, if any, refer to a quantity equal to or greater than one, i.e. one or more.

[0012] The term "data" as used herein may be understood to include information in any suitable analog or digital form, e.g., provided as a file, a portion of a file, a set of files, a signal or stream, a portion of a signal or stream, a set of signals or streams, a key and/or value used in a key-value (KV) database, and the like. Further, the term "data" may also be used to mean a reference to information, e.g., in form of a pointer.

[0013] The terms "circuit" or "circuitry" as used herein are understood as any kind of logic-implementing entity, which may include special-purpose hardware or a processor executing software. A circuit may thus be an analog circuit, digital circuit, mixed-signal circuit, logic circuit, processor, microprocessor, Central Processing Unit (CPU), Graphics Processing Unit (GPU), Digital Signal Processor (DSP), Field Programmable Gate Array (FPGA), integrated circuit, Application Specific Integrated Circuit (ASIC), etc., or any combination thereof. Any other kind of implementation of the respective functions which will be described below in further detail may also be understood as a "circuit." It is understood that any two (or more) of the circuits detailed herein may be realized as a single circuit with substantially equivalent functionality, and conversely that any single circuit detailed herein may be realized as two (or more) separate circuits with substantially equivalent functionality. Additionally, references to a "circuit" may refer to two or more circuits that collectively form a single circuit.

[0014] The term "processor" or "controller" as for example used herein may be understood as any kind of entity that allows handling data. The data may be handled according to one or more specific functions executed by the processor or controller. Further, a processor or controller as used herein may be understood as any kind of circuit, e.g., any kind of analog or digital circuit. The term "handle" or "handling" as for example used herein referring to data handling, file handling or request handling may be understood as any kind of operation, e.g., an I/O operation, as for example, storing (i.e. writing) and reading, or any kind of logic operation.

[0015] A processor or a controller may be or include an analog circuit, digital circuit, mixed-signal circuit, logic circuit, processor, microprocessor, Central Processing Unit (CPU), Graphics Processing Unit (GPU), Digital Signal Processor (DSP), Field Programmable Gate Array (FPGA), integrated circuit, Application Specific Integrated Circuit (ASIC), etc., or any combination thereof. Any other kind of implementation of the respective functions, which will be described below in further detail, may also be understood as a processor, controller, or logic circuit. It is understood that any two (or more) of the processors, controllers, or logic circuits detailed herein may be realized as a single entity with equivalent functionality or the like, and conversely that any single processor, controller, or logic circuit detailed herein may be realized as two (or more) separate entities with equivalent functionality or the like.

[0016] In current technologies, differences between software and hardware implemented data handling may blur, so that it has to be understood that a processor, controller, or circuit detailed herein may be implemented in software, hardware or as hybrid implementation including software and hardware.

[0017] The term "software" refers to any type of executable instruction, including firmware.

[0018] The term "system" (e.g., a storage system, measurement system, data analysis system, etc.) detailed herein may be understood as a set of interacting elements, wherein the elements can be, by way of example and not of limitation, one or more mechanical components, one or more electrical components, one or more instructions (e.g., encoded in storage media), one or more processors, and the like.

[0019] The term "storage" (e.g., a storage device, a primary storage, storage system, etc.) detailed herein may be understood as any suitable type of memory or memory device, e.g., one or more of a solid state drive (SSD), hard disk drive (HDD), redundant array of independent disks (RAID), direct-connected NVM device, etc., or any combination thereof.

[0020] As used herein, "memory," "memory device," and the like may be understood as a non-transitory computer-readable medium in which data or information can be stored for retrieval. It is appreciated that a single component referred to as "memory" or "a memory" may be composed of more than one different type of memory, and thus may refer to a collective component comprising one or more types of memory. It is readily understood that any single memory component may be separated into multiple collectively equivalent memory components, and vice versa. Furthermore, while memory may be depicted as separate from one or more other components (such as in the drawings), it is understood that memory may be integrated within another component, such as on a common integrated chip.

[0021] The terms "point of inflection," "inflection point," and "change point" as described herein are used interchangeably.

[0022] FIG. 1 shows a basic configuration of a flow cytometer 100. It is appreciated that FIG. 1 is exemplary in nature and may thus be simplified for purposes of this explanation.

[0023] Flow cytometry is a laser- or impedance-based technology employed in cell counting, cell sorting, biomarker detection, protein engineering, and similar fields, which suspends cells 150 in a stream of fluid (sheath fluid 152) and passes the cells through an electronic detection apparatus. A flow cytometer allows for simultaneous multi-parametric analysis of the physical and/or chemical characteristics of the particles at high flow rates, e.g. thousands of particles per second.

[0024] A flow cytometer has several main components: a flow cell 102, a measuring system (e.g. including one or more lasers 104), a detection region 110-116, an amplification system (not shown), and a computer 130 (including a processor and a memory) for analysis of the signals. The flow cell 102 has a liquid stream (sheath fluid) 152 which carries and aligns the cells 150 (only the cell passing through the laser beam is labeled) so that they pass in single file through one or more light beams, i.e. from one or more lasers 104, for sensing.

[0025] The measuring system may employ measurement of impedance and optical systems, including lamps, high power lasers, low power lasers, diode lasers, etc., which result in light signals which are detected by the sensors, 110-116, of the detection region. These sensors may include a forward-scatter light (FSC) detector, side-scatter light (SSC) detector, and one or more fluorescent (Fl) marker detectors, 114-116 (where N may be any integer greater than

[0026] A list of exemplary fluorescent markers is given in the table below:

[0027] The detection region may also include dichroic glass/mirrors, e.g. 120-122, for separating light of different wavelengths for the fluorescent (Fl) marker sensors. Each of the signals from the sensors may pass through analog-to-digital converters (ADC) to convert the analog measurements of the FSC, SSC, and/or dye-specific fluorescence signals into digital signals that can be processed by the computer 130, wherein the computer has a storage medium for data acquisition and one or more processors for data analysis. Data acquisition is performed by the computer physically connected to the flow cytometer, which includes software that handles the digital interface with the flow cytometer. The software may be configured to adjust parameters (e.g., voltage, compensation) for testing, and may also assist in displaying initial sample information while acquiring sample data to ensure that parameters are set correctly. In some aspects, this software may include executable instructions which, when retrieved by a processor of the computer 130, execute the processes described herein.

[0028] In order to distinguish particles, the signal data obtained from the sensors in the detection region may be stored in data storage 202 for analysis. This may include converting one or more signals corresponding to each particle into n-dimensional data, where n is an integer greater than or equal to 1. A processor 206 plots the n-dimensional data in a Cartesian coordinate system (e.g. (x, y) or (x, y, z)) as shown in FIGs. 3-9 for data analysis.

[0029] In some aspects, methods and devices for performing analysis of data sets, such as those obtained via flow cytometry or mass cytometry, are disclosed. The methods and devices presented herein provide unbiased, objective, fast, and fully automated methods for the analysis of population (modal) density data sets.

[0030] FIG. 2 shows an exemplary internal configuration of a computer 130 for data acquisition and analysis in some aspects. It is appreciated that FIG. 2 is exemplary in nature and may thus be simplified for purposes of this explanation.

[0031] Data storage 202 may be one or more memory devices configured to receive data from the sensors 110-116 of flow cytometer 100. Data storage 202 may be configured to store this data as raw FCS files. Data analyzer 204 may include a processor 206 and a memory 208. Processor 206 may be a single processor or multiple processors, and may be configured to retrieve and execute program code to perform the methods as described herein. Processor 206 may transmit and receive data over a software-level connection that is physically transmitted as signals. Memory 208 may be a non-transitory computer-readable medium storing instructions for subroutines 208a-208d, which may include executable instructions for performing the 1D modalGate and/or 2D modalGate methods described herein. These executable instructions may include, for example, instructions for determining a density model of sorted flow cytometry data 208a; calculating a derivative value at each of a plurality of points on the density model 208b; determining, using the calculated derivative values, one or more peaks in the density model and one or more points of inflection (POI) in the density model 208c; and identifying one or more gating points based on the one or more peaks or the one or more points of inflection 208d.

[0032] Computer 130 may further be configured with an interface to the one or more lasers (not shown).

[0033] It is recognized that not all clusters (i.e. populations) of data obtained by particle analyzers represent biologically meaningful cell populations. The performance of cell subset detection by clustering depends on data "cleanness." Most clustering methods work optimally on FCS files that are obtained after several steps of pre-gating, such as removing doublets, dead cells, lineage+ cells, etc. The disclosure herein automates these pre-gating steps so that clean T-cell, myeloid cell, B-cell, etc. data may automatically be obtained from raw flow cytometry standard (FCS) files.

[0034] The disclosure herein may include initially conducting quality control in order to remove low-quality cell events from the FCS files, e.g. using flowAI. flowAI provides two methods to clean FCS files of unwanted events (i.e. data occurrences): 1) an automatic method that adopts algorithms for the detection of anomalies, and 2) an interactive method with a graphical user interface implemented as an R Shiny application.

[0035] The general approach behind these two methods of flowAI includes three steps to check and remove suspected anomalies attributed to 1) abrupt changes in the flow rate, 2) instability of signal acquisition, and 3) outliers in the lower limit and margin events in the upper limit of the dynamic range. The first step (i.e. flow rate check) evaluates the steadiness of the flow rate of the analysis, which is reconstructed by reporting the number of cells acquired per unit of time. The second step (i.e. signal acquisition check) verifies the stability of the signal acquired over time, which may include, for example, verifying the quality of signal acquisition using Levey-Jennings-type graphs, where fluorescence is plotted against time. A stable signal acquisition should produce intensity values whose distribution is consistent throughout the course of the experiment. The third step (i.e. outlier check in the ranges) is performed at both the lower and upper limits of the dynamic range of the acquired data, and accounts for the occurrence of "margin events," i.e. measurements with a real value higher/lower than the upper/lower limits, respectively, which cause an accumulation of signals that is not comparable with the rest of the acquired data.
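The flow rate check described above can be illustrated with a minimal sketch. It is a hypothetical simplification, not flowAI's actual algorithm: events are counted per fixed time bin (the reconstructed flow rate), and bins whose count deviates from the median count by more than `n_mads` median absolute deviations are flagged. The function name `flag_unsteady_flow` and the `bin_width` and `n_mads` parameters are illustrative assumptions.

```python
from statistics import median


def flag_unsteady_flow(event_times, bin_width, n_mads=3.0):
    """Flag time bins whose event count is far from the median count.

    Hypothetical simplification of a flow rate check (not flowAI's algorithm).
    Returns the indices of the flagged time bins.
    """
    if not event_times:
        return []
    t0 = min(event_times)
    n_bins = int((max(event_times) - t0) // bin_width) + 1
    counts = [0] * n_bins
    for t in event_times:
        # Reconstruct the flow rate as the number of events per time bin.
        counts[int((t - t0) // bin_width)] += 1
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    scale = mad if mad > 0 else 1.0  # avoid a zero scale when most bins agree
    return [i for i, c in enumerate(counts) if abs(c - med) > n_mads * scale]
```

A robust median/MAD rule is used here only because it tolerates a few anomalous bins without letting them shift the reference rate; any other steadiness criterion could be substituted.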

[0036] After performing the automatic quality control, the gating methods (modalGate) described herein may be performed.

[0037] In some aspects, a flexible and efficient automated gating algorithm, herein referred to as modalGate, for analyzing populations of data is presented. With minimal user input, modalGate can efficiently detect the modals (i.e. populations or clusters) and determine the gating boundary by tracking the density changes of selected markers. In some aspects, this gating algorithm can be further customized to fulfill most gating purposes implemented by manual gating while achieving the data analysis speeds associated with automated gating. In some aspects, the algorithm disclosed herein may construct an automated gating pipeline by chaining multiple customized modalGates together to gate the data, e.g. flow cytometry data, in a hierarchical manner. The pipeline is data-driven and may automatically adjust to account for variation among samples, and may therefore be applied as a gating template to any FCS files with a similar staining panel. By applying the modalGate algorithm, a gating template may be constructed to automatically gate myeloid cells from raw flow cytometry data, whereby the output matches that of manual gating with high precision and outperforms current automated gating methods such as flowDensity.
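A hierarchical pipeline of chained gates can be sketched as follows, assuming each event is a dictionary of marker values and each gating step supplies its own threshold-finding function (for example, a modalGate-style gate). The function `run_gating_pipeline` and its `(marker, find_gate, keep_above)` step format are hypothetical. The point of the sketch is that every threshold is recomputed on the population surviving the previous step, which is what lets a template adapt to sample-to-sample variation.

```python
from statistics import median


def run_gating_pipeline(events, steps):
    """Apply gating steps in sequence (hierarchical gating).

    events: list of dicts mapping marker name -> measured value.
    steps:  list of (marker, find_gate, keep_above) tuples, where
            find_gate(values) returns a threshold computed from the
            population that survived the previous steps.
    """
    population = list(events)
    for marker, find_gate, keep_above in steps:
        if not population:
            break  # nothing left to gate
        values = [e[marker] for e in population]
        g = find_gate(values)  # data-driven: recomputed per sample and per step
        population = [e for e in population if (e[marker] > g) == keep_above]
    return population


# Example: statistics.median merely stands in for a real gate finder.
events = [{"A": a, "B": b} for a, b in [(1, 5), (2, 6), (8, 1), (9, 2), (10, 9)]]
print(run_gating_pipeline(events, [("A", median, True), ("B", median, False)]))
# prints [{'A': 9, 'B': 2}]
```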

[0038] In some aspects, the modalGate algorithms automatically detect gating boundaries on markers by tracking changes in density. Two versions of modalGate, 1D modalGate and 2D modalGate, automate practical one-dimensional and two-dimensional manual gating, respectively.

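[0039] 1D modalGate serves as a key component of modalGate. It tracks the slope change of a marker's distribution by screening the derivative values of the marker density plot in order to detect the main modals and the gating point. Specifically, for a given marker $X = (x_1, x_2, \ldots, x_n)$, we apply Gaussian kernel density estimation to estimate its probability density function $f(x)$ as shown by Equation (1):

$$f(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right), \qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \tag{1}$$

where $K$ is the Gaussian kernel and $h$ is the bandwidth, calculated using a data-driven bandwidth estimation method given by Equation (2):

$$h = \left[\frac{R(K)}{n\,\mu_2(K)^2\,\hat{S}(\alpha)}\right]^{1/5} \tag{2}$$

where, for appropriate functions $g$, $R(g) = \int g(u)^2\,du$ and $\mu_2(g) = \int u^2 g(u)\,du$; $\hat{S}(\alpha)$ is a kernel-based estimate of $R(f'')$ using an appropriate bandwidth $\alpha$; and $n$ is the sample size. $\hat{S}(\alpha)$ may be given by Equation (3):

$$\hat{S}(\alpha) = \frac{1}{n^2 \alpha^5}\sum_{i=1}^{n}\sum_{j=1}^{n} L^{(4)}\left(\frac{x_i - x_j}{\alpha}\right) \tag{3}$$

where $L$ is another symmetric density, not necessarily $K$, and the $i = j$ terms are included in the sum (to include the diagonals).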
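Equation (1) translates directly into code. The sketch below is a plain-Python illustration under stated assumptions: it evaluates the Gaussian kernel density estimate for a given bandwidth `h`, but, to stay short, it substitutes the simple normal-reference rule h = 1.06 * s * n^(-1/5) (s being the sample standard deviation) for the plug-in bandwidth method of Equations (2) and (3). Both function names are hypothetical.

```python
import math
from statistics import stdev


def gaussian_kde(data, h):
    """Return f(x) = 1/(n*h) * sum_i K((x - x_i)/h) with a Gaussian kernel K."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f


def normal_reference_bandwidth(data):
    """Stand-in bandwidth 1.06 * s * n**(-1/5); NOT the plug-in rule of Eqs. (2)-(3)."""
    return 1.06 * stdev(data) * len(data) ** (-0.2)


data = [0.0, 1.0, 2.0, 5.0, 6.0]
f = gaussian_kde(data, normal_reference_bandwidth(data))
```

The normal-reference rule tends to oversmooth multimodal data, which is exactly the case gating cares about; that is a reason to prefer a data-driven plug-in bandwidth such as the one described above in practice.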

[0040] Then, the range of $x$ is divided into a fixed number of bins $k$ with bin size $b_{size} = (\max(x) - \min(x))/k$. The derivative value $d_j$ of $f(x)$ at the edge point of each bin is calculated to track the slope change of the density using Equation (4):
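A minimal sketch of the binning step follows. Because Equation (4) is not reproduced in this text, the sketch assumes, as a hypothetical stand-in, a forward difference across each bin, d_j = (f(e_j) - f(e_{j-1})) / b_size for j = 1, ..., k, where e_j = min(x) + j * b_size are the bin edge points; the actual formula may differ. The function name `binned_derivatives` is illustrative. A larger k gives finer bins and therefore more sensitivity to small peaks and change points.

```python
def binned_derivatives(f, data, k):
    """Split [min(data), max(data)] into k bins and return (edges, d, b_size).

    Hypothetical stand-in for Equation (4): d_j is the forward difference of
    the density f across bin j, i.e. (f(e_j) - f(e_{j-1})) / b_size.
    """
    lo, hi = min(data), max(data)
    b_size = (hi - lo) / k
    edges = [lo + j * b_size for j in range(k + 1)]  # e_0 ... e_k
    d = [(f(edges[j]) - f(edges[j - 1])) / b_size for j in range(1, k + 1)]
    return edges, d, b_size
```

Any density function can be passed as `f`, for example a kernel density estimate evaluated pointwise.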

where $j = 1, 2, \ldots, k$. By screening the values of $d_j$, the positions of the density peaks, represented by $m$ in $f(x)$, can be easily detected using Equation (5):
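Peak detection from the derivative values can be sketched as a sign test on the two sides of each edge point, in the spirit of claim 6. Equation (5) itself is not reproduced here, so the exact condition is an assumption: the sketch takes `d[j - 1]` as the slope of the bin just left of edge point j and `d[j]` as the slope of the bin just right of it, and reports a peak where the density stops rising. `find_peaks` is a hypothetical name.

```python
def find_peaks(d):
    """Return edge indices j at which the density stops rising.

    d[j - 1] is assumed to be the slope of the bin just left of edge j and
    d[j] the slope of the bin just right of it, so a peak sits at edge j
    when d[j - 1] > 0 and d[j] <= 0.
    """
    return [j for j in range(1, len(d)) if d[j - 1] > 0 and d[j] <= 0]


print(find_peaks([1, 2, 1, -1, -2, -1, 1, 2, -1]))  # prints [3, 8]
```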

[0041] The local maximum and minimum points in $d_j$ represent the change points, i.e. points of inflection, of the density. A local maximum in $d_j$ is where the rate of increase of the density reaches its maximum in a modal; a local minimum in $d_j$ marks the point with the maximal rate of decrease of the density in a modal. Change points $c$, i.e. points of inflection, are identified using Equation (6):
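Locating change points as local extrema of the derivative values can be sketched as below. This is an illustrative reading of the paragraph, not Equation (6) itself (which is not reproduced here); the function name and the convention of attributing a flat run of equal values to its first bin are assumptions. Endpoints are not considered, since a local extremum needs a neighbour on each side.

```python
def find_change_points(d):
    """Return (rising, falling) bin indices of local maxima / minima in d.

    A local maximum of d marks the steepest rise of a modal, a local minimum
    its steepest fall. A flat run of equal values is attributed to its first bin.
    """
    rising, falling = [], []
    for i in range(1, len(d) - 1):
        if d[i] > d[i - 1] and d[i] >= d[i + 1]:
            rising.append(i)
        elif d[i] < d[i - 1] and d[i] <= d[i + 1]:
            falling.append(i)
    return rising, falling


print(find_change_points([1, 2, 1, -1, -2, -1, 1, 2, -1]))  # prints ([1, 7], [4])
```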

With the positions of the density peaks and points of inflection of the density curves, 1D modalGate can efficiently detect the gating position $g$ either at the minimal intersection point between any two adjacent peaks or at the cutting point along the tail of a specified peak, as shown by Equation (7):
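The two gating modes can be sketched as follows. Equation (7) is not reproduced in this text, so both functions are hypothetical illustrations. `gate_between_peaks` follows claim 3 and picks the bin with the smallest |d_j| between two peak edges. `gate_on_right_tail` is an assumed reading of the tail rule: it walks down the right flank of a peak, takes the bin of steepest descent as the change point c, and places the gate `l` bins beyond the centre of that bin. The parameter name `l` mirrors the adjustable parameter l in the text.

```python
def gate_between_peaks(d, edges, p1, p2):
    """Gate between peaks at edges p1 < p2: centre of the bin with minimal |d| (cf. claim 3)."""
    i = min(range(p1, p2), key=lambda b: abs(d[b]))  # bins lying between the two peak edges
    return (edges[i] + edges[i + 1]) / 2


def gate_on_right_tail(d, edges, peak, l=1):
    """Hypothetical tail gate: l bins beyond the steepest-descent change point right of a peak."""
    b_size = edges[1] - edges[0]
    c = i = peak
    while i < len(d) and d[i] <= 0:  # walk down the right flank of the peak
        if d[i] < d[c]:
            c = i  # steepest descent seen so far = change point
        i += 1
    return (edges[c] + edges[c + 1]) / 2 + l * b_size


d = [2, 1, -1, -2, -0.2, 0.3, 2, 1, -1, -2]
edges = [float(j) for j in range(11)]
print(gate_between_peaks(d, edges, 2, 8))  # prints 4.5
```

Raising `l` moves a tail gate further from its peak, trading purity of the gated population for yield.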

[0042] In some aspects, k and l are two adjustable parameters. The parameter k affects the sensitivity of detecting peaks and change points (points of inflection), and l determines the distance of the gating point on a tail from the peak. In some aspects, default values of k = 512 and l = 1 may be set, and a user can then adjust them to meet different gating requirements.
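The change-point and gating steps of paragraphs [0041]-[0042] can be sketched in the same hypothetical style. The sketch takes an array of derivative values d as input, one value per bin edge, so its length is set by k. The valley gate uses the minimum absolute derivative between two peaks (cf. Example 3). As an assumption, the search is restricted to the window between the inner points of inflection, so that the near-zero derivatives at the peak tops are not selected. Equation (7) is not reproduced in the text, so the tail rule below is a hypothetical stand-in. It moves l times the peak-to-inflection distance beyond the inflection point, and only illustrates how l (default 1) controls the distance between a tail gate and its peak.

```python
import numpy as np


def inflection_points(d):
    """Indices j whose derivative value d_j is a strict local maximum or
    minimum of the sequence d, i.e. the change points of the density."""
    d = np.asarray(d, dtype=float)
    mid = d[1:-1]
    is_max = (mid > d[:-2]) & (mid > d[2:])
    is_min = (mid < d[:-2]) & (mid < d[2:])
    return np.flatnonzero(is_max | is_min) + 1


def gate_between_peaks(d, left_peak, right_peak, inflections):
    """Valley gate: index of the smallest |d_j| between the inflection point just
    right of the left peak and the one just left of the right peak. The window
    (an assumption) skips the near-zero d_j at the peak tops themselves."""
    d = np.asarray(d, dtype=float)
    lo = min(c for c in inflections if c > left_peak)
    hi = max(c for c in inflections if c < right_peak)
    return lo + int(np.argmin(np.abs(d[lo:hi + 1])))


def gate_on_right_tail(peak, inflections, l=1.0):
    """HYPOTHETICAL tail rule (Equation (7) is not reproduced): from the first
    inflection point c right of peak m, move a further l * (c - m) bins outward,
    so a larger l places the gate farther from the peak."""
    c = min(j for j in inflections if j > peak)
    return int(round(c + l * (c - peak)))
```

A left-tail gate would mirror `gate_on_right_tail`, using the last inflection point to the left of the peak.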

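[0043] 2D modalGate extends the application of 1D modalGate to multi-dimensional, e.g. 2D, gating by implementing a bin-aligned density tracking. For a pair of markers, e.g. X = (x_1, . . . , x_n) and Y = (y_1, . . . , y_n), the kernel density is determined using a bivariate normal kernel model, as shown in Equation (8):

$$\hat{f}(x, y) = \frac{1}{n h_x h_y}\sum_{i=1}^{n} \phi\left(\frac{x - x_i}{h_x}\right)\phi\left(\frac{y - y_i}{h_y}\right) \tag{8}$$

where φ is the standard normal density. The bandwidths h_x and h_y are calculated using the same method as h described for 1D modalGate. The data is binned on x and y to a grid of k_x rows and k_y columns, with bin sizes b_xsize = (max(X) - min(X))/k_x and b_ysize = (max(Y) - min(Y))/k_y, respectively. For each bin on either x or y, the density values are calculated using Equation (9):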

[0044] In this way, the density track on the two-dimensional data is transformed into a density track on each bin of x or y, depending on the axis along which the density values are calculated. 1D modalGate is applied to detect the gating point on each bin. The gating points of all bins are then linked using a linear regression to obtain a gating line on the two-dimensional plot.
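A hypothetical NumPy sketch of the bin-aligned density tracking of paragraphs [0043]-[0044] follows. The following choices are assumptions, none of them taken from the text:

- The bandwidths default to a rule of thumb rather than the method of Equation (2).
- The density along each y-row of the grid serves as the density track (Equation (9) is not reproduced).
- Each track is gated at the lowest density between its two highest derivative-detected peaks, as a simplified stand-in for 1D modalGate.

The per-row gates are then fitted with a least-squares line x = slope · y + intercept.

```python
import numpy as np


def kde2d(x, y, gx, gy, hx, hy):
    """Product-Gaussian bivariate KDE (Equation (8) form) on the grid gx x gy.
    Returns shape (len(gy), len(gx)): row a is the density track along x at y = gy[a]."""
    kx = np.exp(-0.5 * ((gx[:, None] - x[None, :]) / hx) ** 2)  # (len(gx), n)
    ky = np.exp(-0.5 * ((gy[:, None] - y[None, :]) / hy) ** 2)  # (len(gy), n)
    return ky @ kx.T / (2.0 * np.pi * hx * hy * x.size)


def track_gate(track, positions):
    """Simplified 1D gate: lowest density between the two highest peaks of a track,
    with peaks found where the numerical derivative turns from + to -."""
    d = np.gradient(track, positions)
    peaks = np.flatnonzero((d[:-1] > 0) & (d[1:] < 0))
    if peaks.size < 2:
        return None  # no valley to gate on this track
    lo, hi = np.sort(peaks[np.argsort(track[peaks])[-2:]])
    return positions[lo + int(np.argmin(track[lo:hi + 1]))]


def modal_gate_2d(x, y, kx=100, ky=30, hx=None, hy=None):
    """Gate every y-row of a kx-by-ky binned density, then fit x = slope * y + b."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    if hx is None:
        hx = 1.06 * x.std(ddof=1) * n ** -0.2  # assumption: rule-of-thumb bandwidth
    if hy is None:
        hy = 1.06 * y.std(ddof=1) * n ** -0.2
    gx = np.linspace(x.min(), x.max(), kx + 1)  # bin edges along x
    gy = np.linspace(y.min(), y.max(), ky + 1)  # bin edges along y
    dens = kde2d(x, y, gx, gy, hx, hy)
    pairs = [(track_gate(row, gx), yv) for row, yv in zip(dens, gy)]
    gates = np.array([(g, yv) for g, yv in pairs if g is not None])
    slope, intercept = np.polyfit(gates[:, 1], gates[:, 0], 1)
    return gates, slope, intercept
```

The defaults of 100 x-bins and 30 y-bins mirror the example of FIG. 3. Regressing the gate's x-position on y matches the description of applying 1D modalGate "for each bin on y".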

[0045] FIG. 3 shows detection results using 1D modalGate in the left two charts, under 300, and detection results using 2D modalGate in the right two charts, under 350.

[0046] For 1D modalGate in 300, the top panel shows the derivative of the density plotted against x. The bottom panel shows the density plot of exemplary data, e.g. FSC, SSC, or any fluorescent marker data obtained from a flow cytometer. Two peaks are detected in the density plot and labeled with black circles (corresponding to zero values on the derivative curve in the top panel). The four vertical dashed lines labeled 302 represent the four detected points of inflection (i.e. change points) on the density plot (corresponding to maxima or minima on the derivative curve in the top panel). The two dashed lines labeled 304 and 306 represent the detected gating positions on the left tail of peak 1 and the right tail of peak 2, respectively. The dashed line labeled 308 represents the detected minimal intersection position between peak 1 and peak 2.

[0047] The graphs under 350 illustrate the application of 2D modalGate on two markers, x and y, of an exemplary set of data. The two markers may be, for example, FSC, SSC, or any fluorescent marker data obtained via a flow cytometer.

[0048] In the top graph under 350, x has been divided into 100 bins and y into 30 bins. For each bin on y, 1D modalGate is applied to find the gating point between the two peaks. In the density plot below, the gating point on each bin is shown by a dashed line, and the solid line is the linear regression of all the gating points.

[0049] FIG. 4 shows the gating results using modalGate 450A-450E compared to manual gating 400A-400E, in one exemplary comparison for pre-gating of myeloid cells from raw Flow Cytometry Standard (FCS) files. As shown in FIG. 4, the gating results of modalGate 450 closely match those of expert gating 400. It is appreciated that the methods described herein may be used for pre-gating other types of cells, e.g. T-cells, B-cells, or the like. They may also use other data obtained from flow cytometers (e.g. other fluorescent markers or light scatter data).

[0050] This pre-gating includes five sequential gates (A-E) that remove the unwanted cells step by step: beads and debris, doublets, CD45- cells, dead cells and Lineage-positive cells. These pre-gating steps are necessary for all myeloid cell analysis using flow cytometry. However, the data usually vary between different files and samples, which makes a static gating template unsuitable for batch FCS files. Flow cytometry experts therefore adjust the gates manually to capture the myeloid population in each file, using software such as FlowJo, as shown in 400A-400E.

[0051] In the bottom row, i.e. 450A-450E, gates are customized using the modalGate algorithm to automate the manual practice shown in 400A-400E. The construction of the automated gates 450A-450E is described as follows.

[0052] In 450A, the PBMC cells are gated. This gate takes two markers, FSC-A and SSC-A (where the A stands for area), as inputs. First, 1D modalGate is used to find the boundary between two populations (two modals) with low SSC-A values, denoted as y_1. Then, for cells with an SSC-A value higher than 100000, 1D modalGate is applied to find the boundary of the beads at the tail of the first modal on FSC-A, denoted as x_1. For cells with an SSC-A value lower than 100000, 1D modalGate is applied to find the boundary of the first modal, denoted as x_2. The PBMC gate is formed by connecting the resulting boundary points, plus the vertical line of (0, y_1).

[0053] In 450B, the single cells are gated. This gate works on the markers FSC-A and FSC-H. It calculates the ratio r = FSC-A/FSC-H. Cells whose ratio falls within a window bounded in terms of sd(r) are gated as single cells, where sd(r) is the standard deviation of r.
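The ratio gate of paragraph [0053] can be sketched as follows. The bounds of the single-cell window are not reproduced in the text, so the window median(r) ± n_sd · sd(r), with n_sd = 2, is purely a hypothetical choice for illustration. The function name is likewise illustrative.

```python
import numpy as np


def singlet_gate(fsc_a, fsc_h, n_sd=2.0):
    """Flag single cells from the ratio r = FSC-A / FSC-H.

    HYPOTHETICAL window: median(r) +/- n_sd * sd(r); the exact bounds used
    in the source are not given in the text.
    """
    r = np.asarray(fsc_a, dtype=float) / np.asarray(fsc_h, dtype=float)
    centre, spread = np.median(r), np.std(r, ddof=1)
    return (r >= centre - n_sd * spread) & (r <= centre + n_sd * spread)
```

Doublets pass through the laser as a longer event, so their area grows relative to their height. A high r is therefore what the window excludes.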

[0054] In 450C, the CD45+ cells (i.e. leukocyte groups) are gated. 1D modalGate is applied to find the tail of the first modal of marker CD45 on the positive side.

[0055] In 450D, the gating of live cells is performed. 2D modalGate is applied to the markers DAPI-A and SSC-A. DAPI-A is cut into 100 bins, while SSC-A is cut into 30 bins. For bins of SSC-A with total density above the 50th percentile, the minimal intersection points between the first and second modals are detected. These points are then fitted with a linear regression line to obtain the gate for live cells.
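The row-selection and line-fitting step of paragraph [0055] can be sketched as follows. The sketch assumes a density grid whose rows correspond to SSC-A bins (a hypothetical layout), and per-row gate positions already obtained with 1D modalGate. Both function names are illustrative.

```python
import numpy as np


def rows_above_percentile(density, q=50.0):
    """Boolean mask of grid rows whose total density exceeds the q-th percentile
    of all row totals (rows index the binned marker, e.g. SSC-A)."""
    totals = density.sum(axis=1)
    return totals > np.percentile(totals, q)


def gate_line(gate_x, row_y, keep):
    """Least-squares line x = slope * y + intercept through the kept rows' gates."""
    row_y, gate_x = np.asarray(row_y, dtype=float), np.asarray(gate_x, dtype=float)
    slope, intercept = np.polyfit(row_y[keep], gate_x[keep], 1)
    return slope, intercept
```

Dropping sparsely populated rows keeps poorly defined valleys, such as those at the extremes of SSC-A, from pulling the regression line.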

[0056] In 450E, the lineage-negative cells are gated. 1D modalGate is applied to find the boundary between the first modal and the second modal on marker Lineage.

[0057] Combining these five automated gates, 450A-450E, yields an automated gating template for extracting myeloid cells from raw FCS files. Before the raw FCS data is loaded for automated gating, a quality control is performed with the default parameters using an in-house developed toolkit.
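The sequential application of gates 450A-450E as a template can be sketched as follows. The structure is hypothetical. Each gate is any function that receives the events surviving the previous gates (as a dict mapping marker name to NumPy array) and returns a boolean mask over them. The threshold gates in the usage example are placeholders, not modalGate.

```python
import numpy as np


def apply_template(events, gates):
    """Apply gates in sequence. Each gate receives only the events kept so far
    (dict of marker name -> array) and returns a boolean mask over them."""
    n = len(next(iter(events.values())))
    keep = np.ones(n, dtype=bool)
    for gate in gates:
        idx = np.flatnonzero(keep)  # events surviving so far
        subset = {name: col[idx] for name, col in events.items()}
        keep[idx] = np.asarray(gate(subset), dtype=bool)  # drop rejected events
    return keep


# Usage with placeholder threshold gates standing in for automated gates.
events = {
    "CD45": np.array([1.0, 5.0, 6.0, 2.0, 7.0]),
    "DAPI": np.array([0.0, 0.0, 9.0, 0.0, 1.0]),
}
template = [
    lambda e: e["CD45"] > 3.0,  # stand-in for a CD45+ gate
    lambda e: e["DAPI"] < 5.0,  # stand-in for a live-cell (DAPI-low) gate
]
kept = apply_template(events, template)
```

Because each gate only sees the survivors of the previous gates, a density-based gate such as modalGate is fitted to the population that remains at that step, as in the sequential manual workflow.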

[0058] FIGS. 5-9 compare the performance of modalGate with another automated gating method, flowDensity. Expert manual gating is also shown and used as the benchmark.

[0059] flowDensity is designed to automate the traditional 2D gating scheme by choosing the best cut-off points using characteristics of the marker density distribution. Although modalGate and flowDensity both rely on density estimation of the obtained data, they differ in how they extract, from the density distribution, the characteristics used to determine the gating points. In flowDensity, the number of peaks is detected directly from the density distribution. Then the height and width of the peaks, the standard deviation of each peak, percentiles of the density distribution and the slope of the distribution curve are calculated as characteristics that aid the determination of the gating position. In modalGate, by contrast, the derivative values of the density distribution (used to determine the peaks and points of inflection) are the primary characteristics. They are used to track changes in the marker density distribution and, therefore, to determine the gating position.

[0060] Five FCS files stained with the same panel, labelled SNF240, SNF217, SNF172, SNF152 and SNF139, were selected for testing the robustness of the automated gating. A visual comparison of the gating results is plotted in FIGS. 5-9. The following table provides more details about the staining parameters and codes used for this comparison.

[0061] FIG. 5 shows a comparison between modalGate (middle row) and flowDensity (bottom row) for gating a peripheral blood mononuclear cell (PBMC) population from 5 different FCS files. Expert manual gating (top row) is used as the benchmark.

[0062] FIG. 6 shows a comparison between modalGate (middle row) and flowDensity (bottom row) for gating a single cell population from 5 different FCS files. Expert manual gating (top row) is used as the benchmark.

[0063] FIG. 7 shows a comparison between modalGate (middle row) and flowDensity (bottom row) for gating a CD45+ population from 5 different FCS files. Expert manual gating (top row) is used as the benchmark.

[0064] FIG. 8 shows a comparison between modalGate (middle row) and flowDensity (bottom row) for gating a live cell population from 5 different FCS files. Expert manual gating (top row) is used as the benchmark.

[0065] FIG. 9 shows a comparison between modalGate (middle row) and flowDensity (bottom row) for gating a lineage-negative cell population from 5 different FCS files. Expert manual gating (top row) is used as the benchmark.

[0066] The results show that flowDensity performs comparably to modalGate in the CD45+ gate, shown in FIG. 7. In each of the other gates, however, flowDensity fails to identify the proper gate in one or more files. modalGate, on the other hand, closely matches the expert manual gating and gates robustly in all five files. To statistically assess the performance of modalGate and flowDensity, the cell IDs from automated gating are matched to those from manual gating. The true positive (TP), false positive (FP), true negative (TN) and false negative (FN) events are then counted for each method in each gating step. From these counts, three metrics (precision, recall and F1) are calculated, as shown in Equations (10)-(12):

$$\text{precision} = \frac{TP}{TP + FP} \tag{10}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{11}$$

$$F1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{12}$$
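The precision, recall and F1 metrics of paragraph [0066] can be computed from matched cell IDs as in the following sketch. The manually gated IDs are treated as ground truth. TN is not needed for these three metrics. The function name is illustrative.

```python
def gating_scores(auto_ids, manual_ids):
    """Precision, recall and F1 of an automated gate against manual gating,
    treating the manually gated cell IDs as ground truth."""
    auto_ids, manual_ids = set(auto_ids), set(manual_ids)
    tp = len(auto_ids & manual_ids)   # gated by both
    fp = len(auto_ids - manual_ids)   # gated automatically only
    fn = len(manual_ids - auto_ids)   # gated manually only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, an automated gate keeping cells {1, 2, 3, 4} against a manual gate keeping {2, 3, 4, 5, 6} gives TP = 3, FP = 1 and FN = 2. That yields a precision of 0.75, a recall of 0.6 and an F1 of 2/3.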

[0067] FIG. 10 is a box plot chart 1000 comparing the F1 scores of modalGate (B) and flowDensity (A) in each of the five pre-gating steps of myeloid cells (i.e. PBMC, SingleCell, CD45pos, Alive, and LINneg).

[0068] Chart 1000 shows that modalGate is robust across different gates as well as different files, as indicated by small interquartile ranges and few outliers in the F1 scores. The median F1 scores of modalGate also approach 1 in all five gates. In contrast, flowDensity shows large interquartile ranges in several gates, e.g. in the PBMC, Alive cell and LINneg gates. This indicates that flowDensity does not perform as robustly as modalGate across different files. In several cases, e.g. in Alive cell gating, its F1 score is even lower than 0.5.

[0069] Table 1 shows a more detailed statistical assessment of modalGate and flowDensity in the five gates for the 5 FCS files described above.

TABLE 1

[0070] Overall, modalGate demonstrates high accuracy and robustness in the pre-gating of the myeloid population, outperforming flowDensity in the testing described above. While modalGate is shown as being used for pre-gating of myeloid data in some aspects, it is appreciated that it may be used as an algorithm for building automated gating templates, thereby facilitating automated and reproducible analysis of flow cytometry data.

[0071] FIG. 11 shows a flowchart 1100 for gating population clusters in a data set in some aspects of this disclosure. It is appreciated that flowchart 1100 is exemplary in nature and may thus be simplified for purposes of this explanation.

[0072] In 1102, the data is sorted according to one or more measured parameters. In 1104, a density model is determined based on the sorted data. In 1106, a derivative value is calculated at each of a plurality of points along the density model. In 1108, one or more peaks in the density model and/or one or more points of inflection in the density model are determined using the calculated derivative values. In 1110, one or more gating points are identified based on the one or more peaks or the one or more points of inflection; for example, both the peaks and the points of inflection may be used.
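Steps 1102-1110 of flowchart 1100 can be strung together in one compact, hypothetical NumPy function for a single parameter. The function makes three simplifications, which are assumptions rather than part of the described method:

- The bandwidth defaults to a rule of thumb.
- Only the valley gate between the two highest peaks is returned.
- The valley is located as the lowest density between those peaks rather than via the points of inflection.

The default k = 512 follows Example 16.

```python
import numpy as np


def gate_1d(values, k=512, bandwidth=None):
    """Sketch of steps 1102-1110 for one parameter; returns the valley gate
    between the two highest density peaks, or None if fewer than two peaks."""
    x = np.sort(np.asarray(values, dtype=float))           # 1102: sort the data
    n = x.size
    h = bandwidth
    if h is None:
        h = 1.06 * x.std(ddof=1) * n ** -0.2               # assumption: rule of thumb
    edges = np.linspace(x[0], x[-1], k + 1)                # k bins over the range
    u = (edges[:, None] - x[None, :]) / h
    phi = np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)
    f = phi.sum(axis=1) / (n * h)                          # 1104: density model
    d = -(u * phi).sum(axis=1) / (n * h * h)               # 1106: derivatives at edges
    peaks = np.flatnonzero((d[:-1] > 0) & (d[1:] < 0))     # 1108: rising -> falling
    if peaks.size < 2:
        return None
    lo, hi = np.sort(peaks[np.argsort(f[peaks])[-2:]])     # two highest peaks
    return edges[lo + int(np.argmin(f[lo:hi + 1]))]        # 1110: gating point
```

Sorting is not strictly required by the kernel sum. Here it simply makes the range endpoints x[0] and x[-1] directly available for the bin edges.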

[0073] The following examples pertain to further aspects of this disclosure:

[0074] In Example 1, a method for automated gating of flow cytometry data by a processing device, the method comprising sorting the flow cytometry data according to one or more parameters measured by a flow cytometer; determining a density model of the sorted flow cytometry data; calculating a derivative value at each of a plurality of points on the density model; determining, using the calculated derivative values: one or more peaks in the density model, and/or one or more points of inflection in the density model; and identifying one or more gating points based on the one or more peaks, or the one or more points of inflection.

[0075] In Example 2, the subject matter of Example 1 may include when multiple peaks are determined in the density model, the method further comprising identifying at least one gating point between two of the multiple peaks based on the calculated derivative values of the density model between the two said peaks.

[0076] In Example 3, the subject matter of Example 2 may include identifying the at least one gating point by determining a minimum absolute value of the calculated derivative values between the two said peaks.

[0077] In Example 4, the subject matter of Examples 1-3 may include wherein the one or more parameters measured by the flow cytometer is selected from the group consisting of: forward scattered light (FSC); side scattered light (SSC); and a fluorescent activated marker.

[0078] In Example 5, the subject matter of Examples 1-4 may include determining the density model by applying a Gaussian kernel density estimation.

[0079] In Example 6, the subject matter of Examples 1-5 may include determining the plurality of points along a domain of the density model by using a fixed number of bins.

[0080] In Example 7, the subject matter of Example 6 may include determining a size of each of the bins by dividing a difference of a maximum of the domain and a minimum of the domain by the fixed number of bins.

[0081] In Example 8, the subject matter of Examples 1-7 may include determining at least one of the one or more peaks by using a first calculated derivative value on one side of a point and a second calculated derivative value on a second side of the point.

[0082] In Example 9, the subject matter of Example 8 may include wherein the product of the first calculated derivative value and the second calculated derivative value is less than zero.

[0083] In Example 10, the subject matter of Example 9 may include wherein the first calculated derivative value is greater than zero.

[0084] In Example 11, the subject matter of Examples 1-10 may include determining the one or more points of inflection of the density model by determining a local maximum and/or a local minimum in the calculated derivative values.

[0085] In Example 12, the subject matter of Examples 1-11 may include determining the one or more points of inflection according to: where j is a number of a point of the plurality of points on the density model, c is a point of inflection, d_j is a derivative value at j, and m is a determined peak.

[0086] In Example 13, the subject matter of Examples 1-12 may include wherein the one or more gating points, g, of the sorted flow cytometry data are identified according to:

where j is a number of a point of the plurality of points on the density model, c is a point of inflection, d_j is a derivative value at j, b_size is a fixed distance between each of the plurality of points on the density model, m is a determined peak, and l is an adjustable parameter configured to determine a distance between the gating point on a respective tail and the respective peak.

[0087] In Example 14, the subject matter of Example 13 may include setting a default value of l to about 1.

[0088] In Example 15, the subject matter of Examples 1-14 may include adjusting a sensitivity of the determining of the peaks and points of inflection by setting a pre-determined value for the number of the plurality of points along the density model.

[0089] In Example 16, the subject matter of Example 15 may include setting the predetermined value to about 512.

[0090] In Example 17, the subject matter of Example 1 may include sorting the flow cytometry data according to two parameters measured by a flow cytometer.

[0091] In Example 18, the subject matter of Example 17 may include determining the density model by applying a bivariate normal kernel function to the sorted flow cytometry data.

[0092] In Example 19, the subject matter of Examples 17-18 may include determining a plurality of density tracks in the density model along either one of an x-axis or a y-axis of the density model.

[0093] In Example 20, the subject matter of Example 19 may include wherein determining the plurality of density tracks comprises determining an "A" number of bins along the x-axis of the density model and a "B" number of bins along the y-axis of the density model.

[0094] In Example 21, the subject matter of Example 20 may include determining a bin size dimension along the x-axis, b_xsize, by dividing a difference between a maximum of the density model along the x-axis and a minimum of the density model along the x-axis by "A."

[0095] In Example 22, the subject matter of Examples 20-21 may include determining a bin size dimension along the y-axis, b_ysize, by dividing a difference between a maximum of the density model along the y-axis and a minimum of the density model along the y-axis by "B."

[0096] In Example 23, the subject matter of Examples 20-22 may include calculating a density value of each of the bins selected along the x-axis or y-axis of the density model.

[0097] In Example 24, the subject matter of Example 23 may include wherein determining the plurality of density tracks comprises transforming the calculated density values into the plurality of density tracks along the selected axis of the density model.

[0098] In Example 25, the subject matter of Example 24 may include wherein calculating the derivative value at each of the plurality of points on the density model comprises calculating the derivative value at each of the plurality of points along each of the plurality of density tracks.

[0099] In Example 26, the subject matter of Example 25 may include wherein each of the plurality of points along each density track is an edge of a bin dimension along the non-selected axis.

[0100] In Example 27, the subject matter of Examples 25-26 may include wherein one or more peaks and one or more points of inflection are determined on each of the plurality of density tracks.

[0101] In Example 28, the subject matter of Example 27 may include wherein at least one gating point is identified on each of the plurality of density tracks, providing a multiplicity of gating points.

[0102] In Example 29, the subject matter of Example 28 may include identifying a gating line comprising a linear regression of the multiplicity of gating points.

[0103] In Example 30, machine-readable storage including machine-readable instructions which, when executed by a processor of a device, cause the device to implement a method as recited in any preceding Example.

[0104] In Example 31, a system for performing flow cytometry experiments, the system comprising: a flow cytometer configured to measure one or more parameters of a sample comprising a plurality of particles; and a processing device configured to: obtain the measurements from the flow cytometer; sort the measurements according to the one or more parameters; determine a density model of the sorted measurements; calculate a derivative value at each of a plurality of points on the density model; determine, using the calculated derivative values: one or more peaks in the density model, and/or one or more points of inflection in the density model; and identify one or more gating points based on the one or more peaks, or the one or more points of inflection.

[0105] In Example 32, the subject matter of Example 31 may include the processing device further configured to, when multiple peaks are determined in the density model, identify at least one gating point between two of the multiple peaks based on the calculated derivative values of the density model between the two said peaks.

[0106] In Example 33, the subject matter of Example 32 may include the processing device further configured to identify the at least one gating point by determining a minimum absolute value of the calculated derivative values between the two said peaks.

[0107] In Example 34, the subject matter of Examples 31-33 may include wherein the one or more parameters measured by the flow cytometer is selected from the group consisting of: forward scattered light (FSC); side scattered light (SSC); and a fluorescent activated marker.

[0108] In Example 35, the subject matter of Examples 31-34 may include the processing device further configured to determine the density model by applying a Gaussian kernel density estimation.

[0109] In Example 36, the subject matter of Examples 31-35 may include the processing device further configured to determine the plurality of points along a domain of the density model by using a fixed number of bins.

[0110] In Example 37, the subject matter of Example 36 may include the processing device further configured to determine a size of each of the bins by dividing a difference of a maximum of the domain and a minimum of the domain by the fixed number of bins.

[0111] In Example 38, the subject matter of Examples 31-37 may include the processing device further configured to determine at least one of the one or more peaks by using a first calculated derivative value on one side of a point and a second calculated derivative value on a second side of the point.

[0112] In Example 39, the subject matter of Example 38 may include wherein the product of the first calculated derivative value and the second calculated derivative value is less than zero.

[0113] In Example 40, the subject matter of Example 39 may include wherein the first calculated derivative value is greater than zero.

[0114] In Example 41, the subject matter of Examples 31-40 may include the processing device further configured to determine the one or more points of inflection of the density model by determining a local maximum and/or a local minimum in the calculated derivative values.

[0115] In Example 42, the subject matter of Examples 31-41 may include the processing device further configured to determine the one or more points of inflection according to:

where j is a number of a point of the plurality of points on the density model, c is a point of inflection, d_j is a derivative value at j, and m is a determined peak.

[0116] In Example 43, the subject matter of Examples 31-42 may include wherein the one or more gating points, g, of the sorted flow cytometry data are identified according to:

where j is a number of a point of the plurality of points on the density model, c is a point of inflection, d_j is a derivative value at j, b_size is a fixed distance between each of the plurality of points on the density model, m is a determined peak, and l is an adjustable parameter configured to determine a distance between the gating point on a respective tail and the respective peak.

[0117] In Example 44, the subject matter of Example 43 may include the processing device further configured to set a default value of l to about 1.

[0118] In Example 45, the subject matter of Examples 31-44 may include the processing device further configured to adjust a sensitivity of the determining of the peaks and points of inflection by setting a pre-determined value for the number of the plurality of points along the density model.

[0119] In Example 46, the subject matter of Example 45 may include the processing device further configured to set the pre-determined value to about 512.

[0120] In Example 47, the subject matter of Example 31 may include the processing device further configured to sort the flow cytometry data according to two parameters measured by a flow cytometer.

[0121] In Example 48, the subject matter of Example 47 may include the processing device further configured to determine the density model by applying a bivariate normal kernel function to the sorted flow cytometry data.

[0122] In Example 49, the subject matter of Examples 47-48 may include the processing device further configured to determine a plurality of density tracks in the density model along either one of an x-axis or a y-axis of the density model.

[0123] In Example 50, the subject matter of Example 49 may include wherein determining the plurality of density tracks comprises determining an "A" number of bins along the x-axis of the density model and a "B" number of bins along the y-axis of the density model.

[0124] In Example 51, the subject matter of Example 50 may include the processing device further configured to determine a bin size dimension along the x-axis, b_xsize, by dividing a difference between a maximum of the density model along the x-axis and a minimum of the density model along the x-axis by "A."

[0125] In Example 52, the subject matter of Examples 50-51 may include the processing device further configured to determine a bin size dimension along the y-axis, b_ysize, by dividing a difference between a maximum of the density model along the y-axis and a minimum of the density model along the y-axis by "B."

[0126] In Example 53, the subject matter of Examples 50-52 may include the processing device further configured to calculate a density value of each of the bins selected along the x-axis or y-axis of the density model.

[0127] In Example 54, the subject matter of Example 53 may include wherein determining the plurality of density tracks comprises transforming the calculated density values into the plurality of density tracks along the selected axis of the density model.

[0128] In Example 55, the subject matter of Example 54 may include wherein calculating the derivative value at each of the plurality of points on the density model comprises calculating the derivative value at each of the plurality of points along each of the plurality of density tracks.

[0129] In Example 56, the subject matter of Example 55 may include wherein each of the plurality of points along each density track is an edge of a bin dimension along the non-selected axis.

[0130] In Example 57, the subject matter of Examples 55-56 may include wherein one or more peaks and one or more points of inflection are determined on each of the plurality of density tracks.

[0131] In Example 58, the subject matter of Example 57 may include wherein at least one gating point is identified on each of the plurality of density tracks, providing a multiplicity of gating points.

[0132] In Example 59, the subject matter of Example 58 may include the processing device further configured to identify a gating line comprising a linear regression of the multiplicity of gating points.

[0133] While the above descriptions and connected figures may depict device components as separate elements, skilled persons will appreciate the various possibilities to combine or integrate discrete elements into a single element. Such may include combining two or more circuits to form a single circuit, mounting two or more circuits onto a common chip or chassis to form an integrated element, executing discrete software components on a common processor core, etc. Conversely, skilled persons will recognize the possibility to separate a single element into two or more discrete elements, such as splitting a single circuit into two or more separate circuits, separating a chip or chassis into discrete elements originally provided thereon, separating a software component into two or more sections and executing each on a separate processor core, etc.

[0134] It is appreciated that implementations of methods detailed herein are demonstrative in nature, and are thus understood as capable of being implemented in a corresponding device. Likewise, it is appreciated that implementations of devices detailed herein are understood as capable of being implemented as a corresponding method. It is thus understood that a device corresponding to a method detailed herein may include one or more components configured to perform each aspect of the related method.

[0135] All acronyms defined in the above description additionally hold in all claims included herein.

[0136] While the invention has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.
