

Title:
CURVE PROCESSOR ALGORITHM FOR THE QUALITY CONTROL OF (RT-)qPCR CURVES
Document Type and Number:
WIPO Patent Application WO/2011/131490
Kind Code:
A2
Abstract:
The invention is in the field of analytical technology and relates to an improved procedure for determining the concentration or activity of an analyte in a sample. Specifically the invention provides an automated algorithm for the quality control of (RT-)qPCR reactions. Plotting the fluorescence intensity of a reporter dye divided by the fluorescence intensity of a passive reference dye against the cycle number leads to a so-called sigmoid function which is characterized by a background phase, an exponential growth phase and a plateau phase. Since the fluorescence intensity as a function of cycles relates to the initial number of template molecules in the sample, qPCR curves can be used to quantify the amount of RNA or DNA fragments in the sample by determination of a so-called Cq value.

Inventors:
DARTMANN MAREIKE (DE)
WEBER KARSTEN (DE)
ALTMANN GABRIELA (DE)
FEDER INKE SABINE (DE)
ROPERS TANJA (DE)
ROTH CLAUDIA (DE)
Application Number:
PCT/EP2011/055406
Publication Date:
October 27, 2011
Filing Date:
April 07, 2011
Assignee:
SIEMENS HEALTHCARE DIAGNOSTICS (US)
DARTMANN MAREIKE (DE)
WEBER KARSTEN (DE)
ALTMANN GABRIELA (DE)
FEDER INKE SABINE (DE)
ROPERS TANJA (DE)
ROTH CLAUDIA (DE)
International Classes:
G06F19/00; G16B25/20; G16B40/10
Domestic Patent References:
WO2010025985A1, 2010-03-11
WO2007113622A2, 2007-10-11
Foreign References:
EP0686699B1, 2004-12-29
Other References:
ZHAO SHENG ET AL: "Comprehensive algorithm for quantitative real-time polymerase chain reaction", JOURNAL OF COMPUTATIONAL BIOLOGY, MARY ANN LIEBERT INC, US, vol. 12, no. 8, 1 October 2005 (2005-10-01), pages 1047-1064, XP002380886, ISSN: 1066-5277, DOI: 10.1089/CMB.2005.12.1047
Attorney, Agent or Firm:
MAIER, Daniel (München, DE)
Claims:

1. Method for determining the concentration or activity of an analyte in a sample, the method comprising: a) mixing a sample with at least one reagent, whereby an analyte-dependent amplification reaction is set in motion, wherein the amplification of the analyte is detectable by a signal; b) measuring a signal changing over time as a result of the analyte-dependent amplification reaction; c) mathematically fitting a curve to signal measurements, wherein said mathematical fitting comprises

(i) the use of an extended Gompertz function, given by:

f(x) = y0 + r · x + a · exp(-exp(-(x - n0)/b)),

where x denotes the time, f denotes the signal, and y0, r, a, n0 and b are parameters to be fitted,

(ii) regularization of at least one parameter or a mathematical combination of parameters such that signal curves not showing saturation within the observed time interval can be fitted robustly and with sufficient confidence,

(iii) and mathematically extracting a score value from said fitted curve; performing a quality control of said signal-time curve and determining the concentration or activity of said analyte, comprising the steps of

1) Determination if said signal-time curve is valid, wherein said signal-time curve is valid if said score value can be extracted reliably,

2) Determination if the initial number of template molecules is below a limit of detection (LOD), and

3) Determination of the concentration, wherein the score value relates to the initial number of analyte molecules for all curves which are valid,

wherein steps (1), (2), and (3) can be performed in any given order.

2. Method according to claim 1, wherein the analyte is a nucleic acid, in particular DNA or RNA, in particular mRNA.

3. Method according to claim 1, wherein the analyte-dependent amplification reaction is a PCR reaction, in particular a Reverse Transcriptase PCR (RT-PCR) reaction.

4. Method according to claim 1, wherein the amplification of the analyte is detectable by a fluorescence or optical signal.

5. Method according to claim 1, wherein an absolute concentration can be determined by use of an internal standard of known concentration.

6. Method according to claim 1, wherein said regularization is performed on parameter a.

7. Method according to claim 6, wherein said regularization is realized by a summand additional to the objective function used for fitting, where the summand is a weighted square of the z-transformed parameter a. Parameters of the z-transformation, i.e. mean and standard deviation, are empirical estimates from samples of signal curves showing saturation within the observed time interval which were known to be fitted robustly and with sufficient confidence.

8. Method according to claim 1, wherein parameters are constrained during fitting to ensure robust and confident estimation of parameters for curves showing no amplification behavior or amplification behavior in the very end only. Constraints may be uni- or multivariate, linear or non-linear.

9. Method according to claim 8, wherein a constraint is used comprising the following steps (linearityNorm):

(i) A linear model is fitted to the extended Gompertz model on the observed time interval. (Said extended Gompertz model is defined by some parameter set which may not be optimal.)

(ii) The deviation between the linear and the extended Gompertz model is calculated using some mathematical norm (e.g. Euclidean, Manhattan or max-norm) and based on the observed time interval.

(iii) Said deviation is compared to a suitable threshold such that a parameter set of the extended Gompertz model is said to be allowed if said deviation is above said threshold and the parameter set is said to be forbidden if said deviation is below said threshold.

Alternatively, instead of the extended Gompertz model one can use its Gompertz core only, which is defined as gompertzCore(x) = exp(-exp(-(x - n0)/b)).

10. Method according to claim 1, wherein the fitting of the extended Gompertz model is realized by a gradient-based or local optimization algorithm and wherein the starting point for said optimization algorithm and its configuration is chosen such that the resulting local optimum corresponds to a fitted model for which the technical interpretation of the curve corresponds to the meaning of the parameters: parameters y0 and r describe the beginning of the curve (background), parameters n0 and b describe the time point and velocity, respectively, of the amplification growth, and parameter a describes the height of the saturation level above background. In particular, parameters b and a must be positive.

11. Method according to claim 10, wherein the model fit is realized using a Euclidean distance measure, and wherein the optimization is nested by separating parameters: for fixed parameters b and n0, parameters y0, r, and a are optimized analytically by linear algebra operations, since the objective function (including regularization) is a quadratic form of these parameters; and parameters b and n0 are optimized non-linearly in an outer loop. This approach is advantageous because it needs fewer iterations, it is more robust, and starting values have to be defined only for parameters b and n0.

12. Method according to claim 1, wherein said quality control classification is realized by a decision tree, wherein each decision is based on at least one feature from the following list: said parameters (y0, r, a, b, n0), said score, a goodness-of-fit measure, the times of observation (in particular the bounds of the interval) and features from constraints according to claims 8 or 9 if used. Each decision is derived from empirical training data by a data-driven method, wherein training curves are classified into said quality control classes by manual inspection, commercially available software or a combination of both.

13. Method according to any of the claims above, wherein said observed time interval may be restricted prior to the described calculations in order to eliminate measurement outliers or parts of the curve showing behavior deviating from typical amplification behavior.

14. Method according to claims 1, 9, 11, 12, and 13, wherein said decision tree of claim 12 is degenerated to the following linear list of rules:

• Is firstCycle greater than or equal to 10? If yes, set classification to "Invalid".

• Does the absolute value of y0 exceed 20? If yes, set classification to "Invalid".

• Does the absolute value of r exceed 1? If yes, set classification to "Invalid".

• Is a greater than 25 or smaller than -15? If yes, set classification to "Invalid".

• Does b exceed 50? If yes, set classification to "Invalid".

• Is n0 greater than 65 or smaller than 15? If yes, set classification to "Invalid".

• Is the logarithm of linearityNorm smaller than or equal to -3.5? If yes, set classification to "Undetected".

• Is the logarithm of linearityNorm smaller than or equal to -1.5 and the logarithm of b greater than or equal to 2.2? If yes, set classification to "Undetected".

• Is score smaller than 34 and a smaller than or equal to 0.2? If yes, set classification to "Undetected".

• Is score smaller than 34 and a greater than 0.2 as well as smaller than 1? If yes, set classification to "Invalid".

• Is score greater than or equal to 34 and a smaller than or equal to 0.6? If yes, set classification to "Undetected".

• Is score greater than or equal to 40? If yes, set classification to "Undetected".

• Otherwise set classification to "Valid".

where

- measurements have been undertaken at times x=1, 2, 3, 4, ... (cycle numbers).

- firstCycle is related to the number of the first cycle where the measured signal is not an outlier with respect to the fitted model.

- linearityNorm is a measure of the linearity of the fitted model according to claim 9, using the max-norm and the Gompertz core. The linearityNorm constraint is defined by comparing the logarithm of the linearityNorm with a threshold.

- the score is calculated as: score = n0 - 0.12 · b.

15. Apparatus which is capable of automatically carrying out the method according to any of claims 1 to 14 for determining the activity or concentration of an analyte, comprising a) means for the determination of the signal changing over time as a result of the analyte-dependent amplification reaction, b) means for mathematically fitting a curve to signal measurements and mathematically extracting a score value from said fitted curve and storing a resultant score value, wherein said mathematical fitting comprises the use of a Gompertz function, c) means for performing a quality control of said signal-time curve, and d) means for determining a concentration or activity of the analyte according to said score value.

16. Computer program product for carrying out the method according to any of claims 1 to 14 for determining the activity or concentration of an analyte, comprising: i) means for mathematically fitting a curve to signal measurements and mathematically extracting a score value from said fitted curve and storing a resultant score value, wherein said mathematical fitting comprises the use of a Gompertz function, ii) means for performing a quality control of said signal-time curve, and iii) means for determining a concentration or activity of the analyte according to said score value.

Description:
Curve processor algorithm for the quality control of (RT-)qPCR curves

1. Field of the invention

The invention is in the field of analytical technology and relates to an improved procedure for determining the concentration or activity of an analyte in a sample.

Specifically the invention provides an automated algorithm for the quality control of quantitative PCR (qPCR) assays.

1.1. Polymerase Chain Reaction (PCR)

The Polymerase Chain Reaction (PCR), developed in 1984 by Kary Mullis, is a means to amplify the amount of DNA or mRNA fragments, e.g. of a specific gene, in a (patient) sample. If mRNA fragments shall be amplified, they first have to be transcribed to cDNA in a reverse transcription (RT) step. In this case, the reaction is called RT-PCR.

The PCR takes place in small reaction tubes in a thermal cycler. The reaction mix consists of

• the original DNA template which contains the region (target) that should be amplified

• two primers which tag the beginning of the region that should be amplified on the sense and anti-sense strand of the DNA, respectively

• the (Taq) polymerase which synthesizes new DNA strands

• deoxynucleoside triphosphates (dNTPs) which are the components of the DNA strands to be synthesized

• a buffer solution

• divalent magnesium or manganese cations and monovalent potassium cations.

1.1.1. Basic amplifying procedure

The PCR process is a sequence of ~20-50 cycles, each of them consisting of the following three steps:

1. Denaturation: The reaction mix is heated to a temperature of 94-96°C for 20-30 seconds. In the first cycle, this step can take up to 15 minutes (initialization). The purpose of these high temperatures is to break the hydrogen bonds between the two strands of the double-stranded DNA.

2. Primer annealing: The temperature is lowered for ca. 30 seconds to a temperature which is specific for the annealing of the primers to the single-stranded DNA (~50-65°C). Temperatures which are too high lead to excessive thermal movement, so the primers can't bind to the DNA. Temperatures which are too low promote unspecific binding of the primers to sequences of the DNA which are not entirely complementary.

3. Extension/Elongation: The temperature is increased again to a temperature at which the (Taq) polymerase works best (~70-80°C). The polymerase uses the dNTPs to synthesize new DNA strands which are complementary to those strands which are tagged by the primers. It starts at the 3'-end of the primer. If everything works as intended, the target DNA in the reaction mix is duplicated in each cycle.

If the number of target molecules in the reaction mix at the beginning of the reaction is c_0, the number of target molecules after n cycles is

(1) c_n = c_0 · (1 + E)^n

or equivalently

(2) log_(1+E)(c_n) = n + log_(1+E)(c_0),

where E denotes the efficiency of the PCR reaction.
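As a quick numeric illustration of equations (1) and (2), the following sketch checks the identity for invented example values (template count, efficiency and cycle number are not from the source):

```python
import math

c0 = 100   # initial number of target molecules (example value)
E = 0.9    # amplification efficiency; E = 1 would mean perfect doubling
n = 30     # number of cycles

# Equation (1): exponential growth of the target count
cn = c0 * (1 + E) ** n

# Equation (2): the same relation in logarithmic form
lhs = math.log(cn, 1 + E)
rhs = n + math.log(c0, 1 + E)

print(f"molecules after {n} cycles: {cn:.3e}")
print(f"log identity holds: {math.isclose(lhs, rhs)}")
```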

Ideally the DNA is duplicated in each cycle, and the efficiency equals one. Plotting log_(1+E)(c_n) against the cycle number, one would theoretically get a straight line with slope one and intercept log_(1+E)(c_0). Two different values of c_0 would lead to parallel straight lines, the one belonging to the higher value of c_0 lying above the one belonging to the lower value of c_0. This is not what happens in reality, because at the beginning and at the end of the PCR the process is inhibited for different reasons. During the first cycles there is just a small amount of template molecules; therefore, the portion of the fluorescence signal which is caused by the template molecules is negligible. The end of the reaction is characterized by decreasing reactant concentrations, so the reaction rate saturates; another problem that can occur is the deterioration of reactants.

The PCR can also be used to quantify the amount of DNA or mRNA fragments in a sample. In this case, a real-time Polymerase Chain Reaction (qPCR) has to be carried out which relies on the basic PCR. It takes place in a TaqMan (Applied Biosystems) or an MX3005 (Stratagene), for example.

1.1.2. Basic quantifying principle

The qPCR follows the same pattern as the basic PCR except that a probe has to be added to the reaction mix. This probe is labeled with two fluorophores, a reporter and a quencher, and has to be designed in such a way that it binds to the target DNA strands. This binding takes place during the primer annealing phase. If the probe is excited at a specific wavelength during this phase, the fluorescence of the reporter is suppressed due to the spatial vicinity of reporter dye and quencher dye, as the reporter releases its energy to the quencher. The underlying concept of this energy transfer is called FRET (fluorescence resonance energy transfer). During the elongation phase the polymerase degrades the probe, which is thereby released. Thus the distance between reporter dye and quencher dye increases and the reporter begins to fluoresce. The higher the number of templates, the higher the number of released reporter molecules; therefore the intensity of the fluorescence is a measure of the initial number of target molecules.

Plotting the fluorescence intensity after n cycles against the cycle number n leads to a so-called sigmoid function which consists of three different parts:

1. Background: The number of target molecules and therefore the fluorescence intensity is very small during the first cycles. The fluorescence appears to be constant in the beginning, because the intensity caused by the amplified template is dominated by the so-called background fluorescence. The background fluorescence might be caused by impurities and degenerated reactants in the well or the optical subsystem of the PCR machine.

2. Exponential growth: During this phase, the fluorescence can be well described by equation (2), where the initial fluorescence intensity is assumed to be proportional to c_0.

3. Plateau: Due to the consumption of available nucleotides and other limitations the synthesis of product slows down at some point. The intensity doesn't increase exponentially any more during the last cycles, but reaches a saturation phase.

To quantify the amount of DNA or mRNA fragments in a sample one now compares the fluorescence intensity to a pre-defined threshold and determines the cycle at which this threshold is reached for the first time (linear interpolation is used between subsequent cycles to obtain fractional cycle numbers). The (fractional) cycle at which the threshold is reached for the first time is called the Ct value. The earlier this takes place, the higher was the amount of initial target molecules in the reaction mix. It is important that the threshold is chosen in such a way that the Ct value is obtained during the exponential growth phase.
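A hedged sketch of this threshold-crossing computation (the function name and the exact interpolation convention are illustrative assumptions, not taken from the source):

```python
import numpy as np

def ct_value(rn, threshold):
    """Return the fractional cycle at which the curve first crosses
    `threshold`, using linear interpolation between cycles, or None
    if the threshold is never reached ("Undetermined").

    rn: array of normalized fluorescence values; rn[0] is cycle 1.
    """
    for i in range(1, len(rn)):
        if rn[i - 1] < threshold <= rn[i]:
            # Linear interpolation between cycle i and cycle i + 1
            frac = (threshold - rn[i - 1]) / (rn[i] - rn[i - 1])
            return i + frac  # cycles are 1-based
    return None

# Example on a synthetic sigmoid-like curve
cycles = np.arange(1, 41)
rn = 1.0 + 4.0 * np.exp(-np.exp(-(cycles - 25) / 2.0))
print(ct_value(rn, threshold=2.0))
```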

When using a TaqMan or an MX3005 to carry out the qPCR, the determination of the Ct values is done automatically by the associated software (SDS and MX Pro, respectively), except that the operator has to choose the threshold that shall be used when working with the TaqMan. If this threshold isn't reached until the end of the reaction, the Ct value is called "Undetermined" (SDS software language). Due to contaminations within the reaction mix, failures of the laser or the photo detector which measures the fluorescence intensity, or other problems occurring during the reaction, some qPCR curves show a behavior which deviates from the common sigmoid shape. These curves have to be filtered out by visual inspection of the operator, a process which is time-consuming and subjective.

The concept described in the second and third chapter of this document is a means to automate on the one hand the determination of Cq values and on the other hand the quality control of qPCR reactions carried out on any appropriate instrument. It has only been tested for reactions carried out on a TaqMan up to now. In this context, the term Cq value denotes - like the Ct value - a value which provides information about the initial amount of DNA or mRNA fragments in the sample and about the validity of a curve.

1.1.3. Summary of literature / prior art

1) A Flexible Sigmoid Function of Determinate Growth

Xinyou Yin, Jan Goudriaan, Egbert A. Lantinga, Jan Vos, Huub J. Spiertz

Annals of Botany, 2003

The paper works on the determination of a function which is well suited to describe the sigmoid pattern of determinate growth in agricultural crops. A new function, the so-called "beta growth function" which is characterized by the three parameters tm, te and wmax, is proposed and its advantages (and disadvantages) in comparison to the logistic function, Richards' function, the Gompertz function, the Weibull function and two expolinear equations are worked out.

Difference to our invention:

No regularization was used to fit the model to the data.

No constraint like the linearityNorm was introduced to ensure robust fitting in case of zero target molecules.

No optimization of starting point for fitting routine is described.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

Agricultural crops are investigated, not fluorescence data of qPCR reactions.

2) A new method for robust quantitative and qualitative analysis of real-time PCR

Eric B. Shain, John M. Clemens

Nucleic Acids Research, 2008

The paper presents a new method for the analysis of real-time PCR data, the so-called "maxRatio method". This method contains the following steps:

1) Calculation of

ratio(n) = fluorescence(n) / fluorescence(n - 1) - 1

for each cycle n.

2) Determination of the maximal value of the mapping n → ratio(n); this value is called MR.

3) Determination of the fractional cycle n for which ratio(n) = MR holds. This fractional cycle is called FCN.

4) Determination of the position of the point (FCN, MR) in a plot which depicts the MR value against the FCN value for a set of reference curves and thus classification into "normal reactive" (large MR, large FCN), "abnormal reactive" (medium MR, medium FCN) and "nonreactive" (small MR, small FCN).
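A minimal sketch of steps 1)-3) of this maxRatio computation (names are illustrative; the classification of step 4 needs reference curves and is omitted):

```python
import numpy as np

def max_ratio(fluorescence):
    """Return (MR, FCN) for a fluorescence curve:
    ratio(n) = F(n)/F(n-1) - 1, maximized over cycles n >= 2."""
    f = np.asarray(fluorescence, dtype=float)
    ratio = f[1:] / f[:-1] - 1.0
    i = int(np.argmax(ratio))
    mr = float(ratio[i])
    fcn = float(i + 2)   # cycle index of the maximum; the paper refines
    return mr, fcn       # this to a fractional cycle

# Example on a synthetic sigmoid curve
cycles = np.arange(1, 41)
f = 1.0 + 4.0 * np.exp(-np.exp(-(cycles - 25) / 2.0))
print(max_ratio(f))
```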

Difference to our invention:

No model is fitted to the fluorescence data; therefore neither regularization nor constraints nor an optimization of the starting value is worked on.

3) A new real-time PCR method to overcome significant quantitative inaccuracy due to slight amplification inhibition

Michele Guescini, Davide Sisti, Marco BL Rocchi, Laura Stocchi, Vilberto Stocchi

BMC Bioinformatics, 2008

The paper compares four different methods (Ct method, second derivative (Cp) method, sigmoidal curve fitting (SCF) method, and Cy0 method) for the analysis of qPCR data. Of these, the Cy0 method (based on a nonlinear regression of Richards' equation) is the most accurate and precise method even in suboptimal amplification conditions.

Difference to our invention:

No regularization was used to fit the model to the fluorescence data.

No constraint like the linearityNorm was introduced to ensure robust fitting in case of zero target molecules.

No optimization of starting point for fitting routine is described.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

4) Quantitative real-time RT-PCR based transcriptomics: Improvement of evaluation methods

Ales Tichopad

Dissertation, 2004

- Description of (RT-)qPCR reactions

- Model used to describe the data:

1) four-parametric sigmoid function (without background increase, based on logistic function)

2) four-parametric logistic model

- Quantification by second derivative maximum of the four-parametric sigmoid function (CP = n0 - 1.317·b) or Ct method

- Description of experiments

- Investigation of optimal quantification range

- Exact determination of efficiency (fitting of the exponentially behaving phase with an exponential model: f = gamma0 + alpha * epsilon^n)

- Algorithm for analysis of (RT-)qPCR data (quantification without reference gene or standardised quantification with reference gene)

Difference to our invention:

No regularization was used to fit the model to the fluorescence data.

No constraint like the linearityNorm was introduced to ensure robust fitting in case of zero target molecules.

No optimization of starting point for fitting routine is described.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

5) Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper - Excel-based tool using pair-wise correlations

Michael W. Pfaffl, Ales Tichopad, Christian Prgomet, Tanja P. Neuvians

Biotechnology Letters, 2004

The paper presents a software called "BestKeeper" intended to enhance standardization of RNA quantification results (as, for example, results gained by an RT-PCR). The tool chooses the best suited standards out of ten candidates and combines them into an index (as geometric mean) whose correlation to the expression levels of target genes can be computed. Used are Cp (crossing point) values which are gained by the "second derivative maximum" method as computed by the LightCycler software, or Ct values.

Difference to our invention:

No model is fitted to the fluorescence data; therefore neither regularization nor constraints nor an optimization of the starting value is worked on.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

6) Inhibition of real-time RT-PCR quantification due to tissue-specific contaminants

Ales Tichopad, Andrea Didier, Michael W. Pfaffl

Molecular and cellular probes, 2004

The paper works on the influence of unknown tissue-specific factors on amplification kinetics. Various methods of Cp value acquisition (first derivative and second derivative maximum of the four-parameter sigmoid model, "fit point method" and "second derivative maximum method" computed by the LightCycler software, "TaqMan threshold level" computation method) are analyzed for this purpose.

Difference to our invention:

No regularization was used to fit the model to the fluorescence data.

No constraint like the linearityNorm was introduced to ensure robust fitting in case of zero target molecules.

No optimization of starting point for fitting routine is described.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

7) Standardized determination of real-time PCR efficiency from a single reaction set-up

Ales Tichopad, Michael Dilger, Gerhard Schwarz, Michael W. Pfaffl

Nucleic Acids Research, 2003

The paper works on a computing method for the estimation of real-time PCR amplification efficiency. Instead of using serial dilution steps the following procedure is applied:

- Linear regression of the first three observations -> Is the last observation an outlier? No -> Linear regression of the first four observations, and so on -> The procedure is stopped when at least three subsequent observations are outliers -> The first of these three observations is regarded as the endpoint of the background phase and starting point of the exponential growth phase

- Determination of the endpoint of the exponential growth phase: Observation directly before the maximum of the second derivative (either as computed by the LightCycler software or from a four-parametric logistic model)

- Estimation of efficiency: Fitting of an exponential model (f = y0 + alpha * E^n) to the fluorescence data contained in the region of exponential growth

Difference to our invention:

No regularization was used to fit the model to the fluorescence data.

No constraint like the linearityNorm was introduced to ensure robust fitting in case of zero target molecules.

No optimization of starting point for fitting routine is described.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

8) Tissue-specific expression pattern of bovine prion gene: quantification using real-time RT-PCR

Ales Tichopad, Michael W. Pfaffl, Andrea Didier

Molecular and cellular probes, 2003

The quantification of the expression of the bovine prion (proteinaceous infectious particle) gene in different organs via real-time RT-PCR is described and it is shown how the organs are involved in pathogenesis. Quantification is carried out via the "Second Derivative Maximum Method" as implemented in the LightCycler software (the second derivative maximum within the exponential phase of the amplification curve is linearly related to the starting concentration of the template DNA).

Difference to our invention:

No model is fitted to the fluorescence data, therefore neither regularization nor constraints nor an optimization of the starting value is worked on.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

9) Improving quantitative real-time RT-PCR reproducibility by boosting primer-linked amplification efficiency

Ales Tichopad, Anamarija Dzidic, Michael W. Pfaffl

Biotechnology Letters, 2002

The paper works on the impact of primer selection on the performance of real-time PCR reactions. For this purpose, a four-parametric sigmoid model is fitted to the fluorescence data. Via ANOVA it is shown that most of the variance between b (slope) parameters (which is a measure for the efficiency of the primer itself and the variance of amplification efficiency) results from the use of different primers, not different tissues. Defining Cp as the maximum of the second derivative of the used four-parametric sigmoid model leads to CP = n0 - 1.317·b. Since the CP value is linearly related to b, the variability in CP values is linearly related to the amplification efficiency. It is mentioned that other sigmoid models can be used, too.

Difference to our invention:

No regularization was used to fit the model to the fluorescence data.

No constraint like the linearityNorm was introduced to ensure robust fitting in case of zero target molecules.

No optimization of starting point for fitting routine is described.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

10) Inhibition of Taq Polymerase and MMLV Reverse Transcriptase by Tea Polyphenols (+)-Catechin and (-)-Epigallocatechin-3-Gallate (EGCG)

Ales Tichopad, Jürgen Polster, Ladislav Pecen, Michael W. Pfaffl

Submitted, 2004

The effect of catechin and EGCG on the performance of (RT-)PCR reactions is investigated. This is done by fitting a mathematical model (the four-parametric sigmoid model) to the data and comparing the resulting fitting parameters for reactions with and without catechin and EGCG (ANOVA).

Quantification is achieved by computing the maximum of the second derivative of the four-parametric sigmoid model or by using the "second derivative maximum method" implemented in the LightCycler software.

Difference to our invention:

No regularization was used to fit the model to the fluorescence data.

No constraint like the linearityNorm was introduced to ensure robust fitting in case of zero target molecules.

No optimization of starting point for fitting routine is described.

No rules are proposed to decide if a curve really shows a sigmoid pattern or not.

11) Genomic Health Patent (WO 2006/014509 A2)

Difference to our invention:

No Gompertz function is used to fit a model to the normalized fluorescence data of a qPCR curve. Instead, a linear regression analysis is performed in adjacent (possibly overlapping) regions of the data. A score relying on these regression data represents the quality of the well and determines its Pass/Fail status. The computation of a quantification value does not rely on "overall estimated" parameters "inflexion point" and "slope", but on a localized (for example quadratic) regression of the data in a predefined region and subsequent comparison with a threshold.

No description of

- regularization

- constraints

- optimization of starting value

12) Automated Quality Control Method and System for Genetic Analysis

(Patent US 7398171 B2)

Difference to our invention:

Quality control metrics are used to determine the status of a well (e.g. empty well), but these metrics are not derived by fitting any models to the fluorescence data. The only exception is the derivation of the metric which tests if genetic material and probe dyes have been amplified. This metric fits a straight line to the amplification curve in order to get a baseline amplification curve.

13) WO 2010/025985

A linear regression is performed on the linear range of the fluorescence data of an RT-PCR reaction. Those values lying outside the linear range are compared to a threshold to determine if there is a signal at all. Afterwards, a (Gompertz) model is fitted to the background-subtracted data and rules relying on the fitting parameters are defined to classify a curve as valid or invalid. Quantification is achieved by using a cp value (maximum of the second derivative) or a bv value (intersection between background and tangent in the inflexion point).

Difference to our invention:

No regularization was used to fit the model to the fluorescence data.

No constraint like the linearityNorm was introduced to ensure robust fitting in case of zero target molecules.

No optimization of starting point for fitting routine is described.

The rules which are defined are only univariate; in our invention there are also bivariate rules.

2. Description of the invention

The invention provides an automated algorithm for the quality control of (RT-)qPCR reactions. Plotting the fluorescence intensity of a reporter dye divided by the fluorescence intensity of a passive reference dye against the cycle number leads to a so-called sigmoid function which is characterized by a background phase, an exponential growth phase and a plateau phase. Since the fluorescence intensity as a function of cycles relates to the initial number of template molecules in the sample, qPCR curves can be used to quantify the amount of RNA or DNA fragments in the sample by determination of a so-called Cq value. Due to contaminations within the reaction mix, unintended chemical reactions within wells, failures of the optical system of the PCR device which measures the fluorescence intensity, or other problems occurring during the reaction (e.g. air bubbles), some qPCR curves show a behavior which deviates from the common sigmoid shape. Information gained from these curves shouldn't be used for further analyses. Therefore, quality control of qPCR curves consists of three steps:

1) Determination if a curve is valid at all.

2) Determination if the initial number of template molecules is zero or very small (below some LOD).

3) Determination of a Cq value which relates to the initial number of template molecules for all curves which are valid.

A mathematical model (on the basis of the Gompertz function, which is suitable to describe sigmoid curves) is fit to the data in consideration of nonlinear constraints and regularization parameters. In detail, values for parameters y0, r, a, b and n0 are chosen in such a manner that the deviation (normalized sum of squared errors) between the data and the model

f(x) = y0 + r · x + a · exp(-exp(-(x - n0)/b))

is minimized, where x denotes the cycle and f(x) is fitted to the normalized fluorescence signal. Parameter b is forced to be positive (by considering exp(beta) instead of b) and a is regularized to be approximately 3.9110. Additionally, parameter combinations for which the largest absolute difference between the exponential term on the right-hand side of the equation (called Gompertz term) and a straight line gained by a linear regression of the Gompertz term (called linearityNorm) is smaller than 10^-3 are forbidden by a constraint. The parameters n0 and b are used for the definition of a so-called AIP value which is a measure of the amount of RNA fragments in the sample. The optimal AIP value was found to be

AIP = n0 - 0.72 · b

The six features (y0, r, a, b, n0, linearityNorm) and the AIP value are used to define a set of rules which identify a curve as "Numeric" (quantification by AIP value), "Invalid" (curve is not reliable and should be ignored for further processing) or "Undetected" (initial number of molecules zero or very small).

There are several inventive steps that have to be combined to yield the advantages:

We used regularization of parameter a to ensure robustness of the fitting of the Gompertz model.

We introduced the linearityNorm constraint to ensure robust fitting in case of zero target molecules.

We optimized the starting point and customized the numeric optimization algorithm to yield meaningful parameter values (intended local minimum of the objective function).

Based on real data we defined rules that rely on the features y0, r, a, b, n0, linearityNorm, AIP and that classify fluorescence curves into "Numeric", "Invalid" or "Undetected".
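Such a rule list translates directly into code. A minimal, hedged sketch using the thresholds of the degenerated decision tree recited in claim 14 above (the Fit container is an illustrative assumption, "logarithm" is read as the natural logarithm, and claim 14's "Valid" corresponds to "Numeric" here):

```python
import math
from dataclasses import dataclass

@dataclass
class Fit:
    """Fitted curve features (illustrative container, not from the patent)."""
    y0: float
    r: float
    a: float
    b: float
    n0: float
    first_cycle: int
    linearity_norm: float

def classify(f: Fit) -> str:
    """Apply the linear rule list of claim 14; score = n0 - 0.12*b."""
    score = f.n0 - 0.12 * f.b
    if f.first_cycle >= 10:
        return "Invalid"
    if abs(f.y0) > 20 or abs(f.r) > 1:
        return "Invalid"
    if f.a > 25 or f.a < -15 or f.b > 50 or f.n0 > 65 or f.n0 < 15:
        return "Invalid"
    if math.log(f.linearity_norm) <= -3.5:
        return "Undetected"
    if math.log(f.linearity_norm) <= -1.5 and math.log(f.b) >= 2.2:
        return "Undetected"
    if score < 34 and f.a <= 0.2:
        return "Undetected"
    if score < 34 and 0.2 < f.a < 1:
        return "Invalid"
    if score >= 34 and f.a <= 0.6:
        return "Undetected"
    if score >= 40:
        return "Undetected"
    return "Valid"  # i.e. "Numeric": quantify via the score

print(classify(Fit(y0=1.0, r=0.005, a=3.9, b=2.0, n0=25.0,
                   first_cycle=1, linearity_norm=0.3)))  # -> "Valid"
```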

The first advantage which arises from the present invention is the objectivity and reproducibility of the method. This is especially advantageous for curves for which the decision between "Numeric" and "Invalid" is questionable and differs when asking different operators. In addition, tests have shown that when analyzing triplicate measurements the number of outliers is smaller when using the curve processor algorithm instead of Ct values. Furthermore, Cq values do not depend on the choice of a certain threshold.

The second advantage is the saving of work time which currently is necessary to manually control fluorescence curves. Thus, costs are reduced.

The invention relates to a method for determining the concentration or activity of an analyte in a sample, the method comprising: a) mixing a sample with at least one reagent, whereby an analyte-dependent amplification reaction is set in motion, wherein the amplification of the analyte is detectable by a signal; b) measuring a signal changing over time as a result of the analyte-dependent amplification reaction; c) mathematically fitting a curve to signal measurements, wherein said mathematical fitting comprises

(i) the use of an extended Gompertz function, given by:

f(x) = y0 + r · x + a · exp(-exp(-(x - n0)/b)),

where x denotes the time, f denotes the signal, and y0, r, a, n0 and b are parameters to be fitted,

(ii) regularization of at least one parameter or a mathematical combination of parameters such that signal curves not showing saturation within the observed time interval can be fitted robustly and with sufficient confidence,

(iii) and mathematically extracting a score value from said fitted curve; performing a quality control of said signal-time curve and determining the concentration or activity of said analyte, comprising the steps of

1) Determination if said signal-time curve is valid, wherein said signal-time curve is valid if said score value can be extracted reliably,

2) Determination if the initial number of template molecules is below a limit of detection (LOD), and

3) Determination of the concentration, wherein the score value relates to the initial number of analyte molecules for all curves which are valid,

wherein steps (1), (2), and (3) can be performed in any given order.

In this context, "extracting a score value reliably" refers to the ability to generate a similar score value in a duplicate experiment. Said score value can be the above described Cq value.

When measuring a signal changing over time, time can be measured as real time (in seconds, minutes or hours) or, in the case of cyclic amplification reactions such as PCR, in terms of amplification cycles.

According to an aspect of the invention the analyte is a nucleic acid, in particular DNA or RNA, in particular mRNA.

According to an aspect of the invention the analyte-dependent amplification reaction is a PCR reaction, in particular a Reverse Transcriptase PCR (RT-PCR) reaction.

According to an aspect of the invention the amplification of the analyte is detectable by a fluorescence or optical signal.

According to an aspect of the invention an absolute concentration can be determined by use of an internal standard of known concentration.

According to an aspect of the invention said regularization is performed on parameter a. In particular, said regularization may be realized by a summand additional to the objective function used for fitting, where the summand is a weighted square of the z-transformed parameter a. Parameters of the z-transformation, i.e. mean and standard deviation, are empirical estimates from samples of signal curves showing saturation within the observed time interval which were known to be fitted robustly and with sufficient confidence.

According to an aspect of the invention parameters are constrained during fitting to ensure robust and confident estimation of parameters for curves showing no amplification behavior or amplification behavior in the very end only. Constraints may be uni- or multivariate, linear or non-linear. In particular, a constraint may be used comprising the following steps (linearityNorm):

(i) A linear model is fitted to the extended Gompertz model on the observed time interval. (Said extended Gompertz model is defined by some parameter set which may not be optimal.)

(ii) The deviation between the linear and the extended Gompertz model is calculated using some mathematical norm (e.g. Euclidean, Manhattan or max-norm) and based on the observed time interval.

(iii) Said deviation is compared to a suitable threshold such that a parameter set of the extended Gompertz model is said to be allowed if said deviation is above said threshold and the parameter set is said to be forbidden if said deviation is below said threshold.

Alternatively, instead of the extended Gompertz model one can use its Gompertz core only, which is defined as

gompertzCore(x) = exp(-exp(-(x - n0)/b)).
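A hedged sketch of the linearityNorm computation described above, using the max-norm and the Gompertz core (the sampling grid and the example parameter values are illustrative assumptions):

```python
import numpy as np

def gompertz_core(x, n0, b):
    """Gompertz core: exp(-exp(-(x - n0)/b))."""
    return np.exp(-np.exp(-(x - n0) / b))

def linearity_norm(x, n0, b):
    """Max-norm deviation between the Gompertz core and its best
    linear (least-squares) approximation on the observed cycles x."""
    g = gompertz_core(x, n0, b)
    slope, intercept = np.polyfit(x, g, 1)   # linear model fitted to the core
    return np.max(np.abs(g - (slope * x + intercept)))

# A parameter set is "forbidden" when the deviation falls below a threshold,
# i.e. the core is indistinguishable from a straight line.
x = np.arange(1.0, 41.0)
print(linearity_norm(x, n0=25.0, b=2.0))    # clearly sigmoid -> large norm
print(linearity_norm(x, n0=200.0, b=2.0))   # flat on [1, 40] -> tiny norm
```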

According to an aspect of the invention the fitting of the extended Gompertz model is realized by a gradient-based or local optimization algorithm, wherein the starting point for said optimization algorithm and its configuration is chosen such that the resulting local optimum corresponds to a fitted model for which the technical interpretation of the curve corresponds to the meaning of the parameters: parameters y0 and r describe the beginning of the curve (background), parameters n0 and b describe the time point and velocity, respectively, of the amplification growth, and parameter a describes the height of the saturation level above background. In particular, parameters b and a must be positive.

According to an aspect of the invention the model fit is realized using a Euclidean distance measure, and the optimization is nested by separating parameters: for fixed parameters b and n0, parameters y0, r, and a are optimized analytically by linear algebra operations, since the objective function (including regularization) is a quadratic form of these parameters; parameters b and n0 are optimized non-linearly in an outer loop. This approach is advantageous because it needs fewer iterations, it is more robust, and starting values have to be defined only for parameters b and n0.
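A minimal sketch of this nested optimization (hedged: the outer Nelder-Mead search and all helper names are illustrative, and the variance constants are the ones derived in section 2.3.2 below):

```python
import numpy as np
from scipy.optimize import minimize

A_BAR, VAR_FIT, VAR_A = 3.9110, 5.4674e-4, 2.9518  # constants from section 2.3.2

def inner_linear_fit(x, rn, n0, b):
    """For fixed (n0, b), the regularized objective is quadratic in
    (y0, r, a), so an augmented least-squares solve is exact."""
    g = np.exp(-np.exp(-(x - n0) / b))
    A = np.column_stack([np.ones_like(x), x, g]) / np.sqrt(VAR_FIT)
    y = rn / np.sqrt(VAR_FIT)
    A = np.vstack([A, [0.0, 0.0, 1.0 / np.sqrt(VAR_A)]])  # regularization row
    y = np.append(y, A_BAR / np.sqrt(VAR_A))               # pulls a toward A_BAR
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta, float(np.sum((A @ theta - y) ** 2))

def fit_nested(x, rn, n0_start=25.0, beta_start=0.7):
    """Outer non-linear search over (n0, beta); b = exp(beta) stays positive."""
    def outer(p):
        return inner_linear_fit(x, rn, p[0], np.exp(p[1]))[1]
    res = minimize(outer, [n0_start, beta_start], method="Nelder-Mead")
    n0, beta = res.x
    (y0, r, a), _ = inner_linear_fit(x, rn, n0, np.exp(beta))
    return y0, r, a, np.exp(beta), n0
```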

According to an aspect of the invention said quality control classification is realized by a decision tree, wherein each decision is based on at least one feature from the following list: said parameters (y0, r, a, b, n0), said score, a goodness-of-fit measure, the times of observation (in particular the bounds of the interval) and features from constraints according to claims 8 or 9 if used. Each decision is derived from empirical training data by a data-driven method, wherein training curves are classified into said quality control classes by manual inspection, commercially available software or a combination of both.

According to an aspect of the invention said observed time interval may be restricted prior to the described calculations in order to eliminate measurement outliers or parts of the curve showing behavior deviating from typical amplification behavior.

According to an aspect of the invention said decision tree is degenerated to the following linear list of rules:

• Is firstCycle greater than or equal to 10? If yes, set classification to "Invalid".

• Does the absolute value of y0 exceed 20? If yes, set classification to "Invalid".

• Does the absolute value of r exceed 1? If yes, set classification to "Invalid".

• Is a greater than 25 or smaller than -15? If yes, set classification to "Invalid".

• Does b exceed 50? If yes, set classification to "Invalid".

• Is n0 greater than 65 or smaller than 15? If yes, set classification to "Invalid".

• Is the logarithm of linearityNorm smaller than or equal to -3.5? If yes, set classification to "Undetected".

• Is the logarithm of linearityNorm smaller than or equal to -1.5 and the logarithm of b greater than or equal to 2.2? If yes, set classification to "Undetected".

• Is score smaller than 34 and a smaller than or equal to 0.2? If yes, set classification to "Undetected".

• Is score smaller than 34 and a greater than 0.2 as well as smaller than 1? If yes, set classification to "Invalid".

• Is score greater than or equal to 34 and a smaller than or equal to 0.6? If yes, set classification to "Undetected".

• Is score greater than or equal to 40? If yes, set classification to "Undetected".

• Otherwise set classification to "Valid".

where

- measurements have been undertaken at times x=1, 2, 3, 4, ... (cycle numbers).

- firstCycle is related to the number of the first cycle where the measured signal is not an outlier with respect to the fitted model.

- linearityNorm is a measure of the linearity of the fitted model according to claim 9, using the max-norm and the Gompertz core. The linearityNorm constraint is defined by comparing the logarithm of the linearityNorm with a threshold.

- the score is calculated as: score = n0 - 0.12 · b.

The invention further relates to an apparatus which is capable of automatically carrying out the method according to any of claims 1 to 14 for determining the activity or concentration of an analyte, comprising a) means for the determination of the signal changing over time as a result of the analyte-dependent amplification reaction, b) means for mathematically fitting a curve to signal measurements and mathematically extracting a score value from said fitted curve and storing a resultant score value, wherein said mathematical fitting comprises the use of a Gompertz function, c) means for performing a quality control of said signal-time curve, and d) means for determining a concentration or activity of the analyte according to said score value.

The invention further relates to a computer program product for carrying out the method according to the invention for determining the activity or concentration of an analyte, comprising: i) means for mathematically fitting a curve to signal measurements and mathematically extracting a score value from said fitted curve and storing a resultant score value, wherein said mathematical fitting comprises the use of a Gompertz function, ii) means for performing a quality control of said signal-time curve, and iii) means for determining a concentration or activity of the analyte according to said score value.

2.1. Overview

In the following, the invention is described in exemplary fashion using the illustrative example of a quantitative RT-PCR (RT-qPCR) as amplification reaction. The method of the invention may be applied to other amplification reactions yielding sigmoid growth curves.

The fluorescence data gained during an (RT-)qPCR reaction will be described and it will be explained how they are normalized in order to obtain so-called R_n values which will be used for further processing. Afterwards it will be illustrated how outliers are detected and eliminated from further analysis. The Gompertz function will be presented as an appropriate mathematical model to describe the curvature of the R_n values and it will be explained why concepts like regularization and the definition of constraints are necessary to achieve a good fit. It will be pointed out that there are many examples of curves which show an awkward behavior in the first cycles, which leads to the necessity of leaving an appropriate number of cycles out during the optimization routine. Technical details of this routine will be explained afterwards and it will be emphasized how important the choice of a suitable starting value is to force the routine to end up in the right minimum.

Afterwards it will be explained how the Cq value (score value) is defined and which rules are necessary to distinguish valid curves from curves with very low or zero target concentration and curves with an awkward behavior which can't be trusted and have to be sorted out from analysis.

2.2. Data preprocessing

Reference is made to the attached Figures.

2.2.1. Description of the data

As described above, plotting the fluorescence data of a reporter dye gained during an (RT-)qPCR reaction against the cycle number leads to a so-called sigmoid graph characterized by three different phases: background, exponential growth and plateau, as shown in Fig. 1.

Figure 1: Fluorescence intensity of reporter dye vs. cycle

2.2.2. Calculation of R_n values

Normalizing these data by the fluorescence data of a passive reference dye, which are almost constant, hardly changes the shape of the graph; only the scale of the y-axis is altered. These normalized fluorescence data are called R_n, as shown in Fig. 2:

R_n = fluorescence intensity of reporter dye / fluorescence intensity of passive reference dye

Figure 2: R_n vs. cycle

2.2.3. Outlier elimination

A question that arises is how to deal with repeated measurements of one cycle. Depending on the instrument that is used during the reaction and on the settings chosen by the user, the data arising from an experiment can differ in the number of measurements per cycle. For example, when using the TaqMan and the SDS software, three (or more) measurements are recorded for each of the 40 cycles; other protocols define different measurements, e.g. one measurement per cycle only for the MX3005 / MxPro™. In the case the repeated measurements are close together in each cycle (which means the difference between them can be explained by measurement noise only) one can just compute the mean of the three repeats (we use the SDS software language here) to obtain one R_n value per cycle. It also has to be defined what has to be done in the presence of one or more outlier(s), as shown in figure 3.

Figure 3: Outlier in first cycle

By computing the mean of the three measurements during the first cycle one would get an R_n value of approximately three for the first cycle, which does not suit the remaining curve. Therefore it would be better to detect the measurement resulting in an R_n value of approximately one as an outlier and leave it out of the analysis by computing the mean of the remaining two measurements during the first cycle. For this purpose, a method to detect outliers was established. This method consists of the following steps:

• The mean of the repeated measurements per cycle is computed:

cycleMean(i) = (1/3) · Σ_j R_n(i,j),

where i denotes the number of the cycle and R_n(i,j) denotes the normalized fluorescence data of the j-th repeat of the i-th cycle (assuming there are three repeated measurements per cycle). The same method can be applied for any other number of repeats per cycle, of course.

• The deviation of each repeat from the cycle mean is calculated:

res(i,j) = R_n(i,j) - cycleMean(i).

• The deviations are normalized according to the number of available repeats k_i per cycle, yielding normalized deviations res_norm(i,j), where k_i is the number of repeats successfully measured and not yet identified as outliers in cycle i.

• The total variance of all measurements in all cycles is computed from the normalized deviations:

variance = Σ_{i,j} res_norm(i,j)² / N,

where N is the total number of measurements.

• Those repeats for which the absolute normalized deviation from the cycle mean exceeds six times the total standard deviation are called outliers:

j-th repeat of i-th cycle is an outlier ⇔ |res_norm(i,j)| > 6 · √variance.

• As soon as the number of available repeats per cycle falls to one or zero, all repeats of this cycle are defined to be outliers, because an estimate of the true value becomes impossible.

• This procedure is repeated until no more outliers are found.

Coding missing repeat measurements as well as detected outliers as NaN (not a number) simplifies an implementation.
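A hedged Python sketch of this iterative outlier elimination (the normalization factor sqrt(k/(k-1)) is an assumption, since the exact formula is not reproduced above):

```python
import numpy as np

def eliminate_outliers(rn, max_iter=100):
    """Iteratively flag outlier repeats as NaN (NaN also codes missing data).

    rn: 2-D array of normalized fluorescence, shape (cycles, repeats).
    """
    rn = np.array(rn, dtype=float)
    for _ in range(max_iter):
        k = np.sum(~np.isnan(rn), axis=1)            # usable repeats per cycle
        mean = np.nanmean(rn, axis=1)                # cycleMean(i)
        res = rn - mean[:, None]                     # res(i, j)
        # Normalization by available repeats: an assumption, not from the source
        res_norm = res * np.sqrt(k / np.maximum(k - 1.0, 1.0))[:, None]
        variance = np.nanmean(res_norm ** 2)         # total variance
        outlier = np.abs(res_norm) > 6.0 * np.sqrt(variance)
        outlier |= (k < 2)[:, None] & ~np.isnan(rn)  # lone repeats are dropped
        if not outlier.any():
            break
        rn[outlier] = np.nan
    return rn

# Example: 40 cycles, 3 repeats, one gross outlier in the first cycle
rng = np.random.default_rng(0)
data = 1.0 + 0.01 * rng.standard_normal((40, 3))
data[0, 0] = 3.0
clean = eliminate_outliers(data)
print(np.isnan(clean[0, 0]))                         # True: outlier removed
print(np.nanmean(clean, axis=1)[:3])                 # one R_n value per cycle
```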

Applying this method to the data shown in figure 3 and computing the mean of each cycle afterwards leads to the R_n curve shown in Fig. 4.

Figure 4: Data of figure 3 after outlier elimination and averaging

There are three objectives that can now be defined:

1. Find a continuous model function characterized by preferably few parameters which describes the given data in a meaningful way.

2. Find a robust fitting procedure which determines for each given set of fluorescence data the best choice of parameters.

3. Define a Cq value which provides information about

- the validity of a fluorescence curve

- the amount of DNA or mRNA in the sample.

2.3. Calculation of parameters

2.3.1. The Gompertz function

Many different sigmoid functions can be found in the literature (e.g. logistic function, Gaussian error function, Gompertz function, ...) and one can't decide ad hoc which of them is the most appropriate means to describe the fluorescence data of (RT-)qPCR reactions. In detail, the aforementioned functions are of the form

(4) f_log(x) = 1 / (1 + exp(-(x - n0)/b)) (logistic function, also called Richards function)

(5) f_gauss(x) = (1/2) · (1 + erf((x - n0)/b)) (Gaussian error function)

(6) f_gomp(x) = exp(-exp(-(x - n0)/b)) (Gompertz function).

As can be seen immediately, all of these functions contain two parameters b and n0 which are necessary to describe the inflexion point and the sigmoid shape of the curve. Furthermore, a detailed analysis shows that the logistic function and the Gaussian error function are symmetric around their inflexion point, and all three functions fulfill the conditions

(7) lim_{x→∞} f(x) = 1 and (8) lim_{x→-∞} f(x) = 0

if b is greater than zero. Since fluorescence data of (RT-)qPCR reactions have a background that is different from zero and a plateau height which differs from one, the use of any of the three functions requires a transformation of the form

(9) f(x) → y0 + r · x + a · f(x)

with additional parameters y0, r, and a. Up to this stage of analysis all three functions seem comparably suited to describe the fluorescence data, because all of them are of sigmoid shape, need the same number of parameters (namely five) and show the appropriate convergence behavior. As mentioned before, the Gompertz function shows no symmetry, but this is no disadvantage because the fluorescence data don't indicate the need for symmetry. In order to find out which function to use, all of them were fitted to a set of example data and the residuals of the fits, i.e. the deviations between models and data, were examined. The results can be seen in Figure 5. While all fits are good in the background phase, the residuals of the Gompertz model are smaller than the residuals of the other two models in the plateau phase, where the logistic function and the Gaussian error function underestimate the data. Furthermore, the logistic function and the Gaussian error function seem to have a problem with the curvature of the data between the exponential growth and the plateau phase, see fig. 5.

Figure 5: Candidate models vs. experimental data

Thus, we chose the Gompertz function to describe the R_n values of (RT-)qPCR reactions:

(10) R_n(x) ≈ f(x) = y0 + r · x + a · exp(-exp(-(x - n0)/b))
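Equation (10) translates directly into code; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def extended_gompertz(x, y0, r, a, n0, b):
    """Extended Gompertz model of equation (10): linear background
    y0 + r*x plus a sigmoid of height a, centered at n0, width b."""
    return y0 + r * x + a * np.exp(-np.exp(-(x - n0) / b))

cycles = np.arange(1, 41)
rn_model = extended_gompertz(cycles, y0=1.0, r=0.005, a=3.9, n0=25.0, b=2.0)
```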

The meaning of the five parameters which characterize the Gompertz function can be described quite easily by geometric means, see fig. 6: The "background intercept" y0 and the "background slope" r both characterize the background phase of the fluorescence data. In detail, y0 is a measure for the background level and r indicates to what extent the background increases during the reaction. The "pedestal height" a estimates the height of the plateau over the background. The parameter n0 is the inflexion point of the curve. The "sigmoid width" b is a measure for the length of the sigmoid region.

Subtracting the background y0 + r · x from the R_n values and dividing by the pedestal height a leads to a curve which is bounded by zero from below and by one from above, still being of sigmoid shape, see fig. 7.

Figure 7: R_n (background-subtracted and normalized by pedestal height) vs. cycle

The next step is to determine those fractional cycles x_low and x_high which define the beginning and the end of the exponential growth phase, respectively. This can be done by resolving the equations

(12) exp(-exp(-(x_low - n0)/b)) = 0.001

and

(13) exp(-exp(-(x_high - n0)/b)) = 0.999.

Showing the solution for x_low exemplarily, one gets

(14) log(0.001) = -exp(-(x_low - n0)/b) and

(15) log(-log(0.001)) = -(x_low - n0)/b

by taking the natural logarithm twice, and thus

(16) x_low = n0 - log(-log(0.001)) · b.

Similarly, one gets

(17) x_high = n0 - log(-log(0.999)) · b.

Taking equations (16) and (17) together, one realizes that the width of the exponential growth phase is

(18) x_high - x_low = (-log(-log(0.999)) + log(-log(0.001))) · b = 8.8399 · b,

explaining why b is called "sigmoid width".
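A quick numerical check of the constant in equation (18) (a one-off verification, not part of the algorithm):

```python
import math

width_factor = -math.log(-math.log(0.999)) + math.log(-math.log(0.001))
print(round(width_factor, 4))   # 8.8399
```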

Since the width of the exponential growth phase should be positive, it is advisable to replace b by exp(β) and optimize β instead of b. Equation (10) is therefore replaced by

(19) R_n(x) ≈ f(x) = y0 + r · x + a · exp(-exp(-(x - n0)/exp(β))).

The objective of the fitting routine is thus the solution of the minimization problem

(20) min_c || f(x, c) - R_n(x) ||²  with

(21) c = (y0, r, a, β, n0)^T.

2.3.2. Regularization

Some more considerations have to be made on the pedestal height a: Given fluorescence data of experiments with low target concentrations where the plateau phase is not reached within the range of observed cycles, the three parameters characterizing the exponential growth and the plateau phase are not well determined by the data and the resulting fit lacks robustness. This situation is shown exemplarily in figure 8.

Figure 8: Illustration of uncertainty in fit

This problem can be solved by forcing the pedestal height to a predetermined value. Once the magnitude of a has been fixed, all remaining parameters can be estimated robustly, too. In optimization theory, this procedure is called regularization. Regularization of a raises the question which value is sensible for a. In order to figure this value out, the data of 26640 RT-qPCR reactions belonging to the HECOG0309 study were fitted by a preliminary fitting algorithm that will not be described here. The fitting algorithm resulted in a set of five parameters (background intercept, background slope, pedestal height, sigmoid width and inflexion point) and a mean squared error for each of the 26640 reactions. The results of 436 curves were sorted out, because the corresponding C_t value was either an outlier or the operator responsible for the experiments marked the curve as "Bad". A histogram of the values of the pedestal height for the remaining 26204 curves was plotted and is shown in figure 9.

Figure 9: Histogram of pedestal height for HECOG0309

As can be seen when taking a closer look at this histogram, the majority of a values is smaller than or equal to ten. It is assumed that those values larger than ten come from curves where most of the cycles belong to the background phase and which are thus not fitted robustly. Therefore they are sorted out as well, leading to a histogram which indicates that the pedestal height a should be forced to a value of approximately four, see fig. 10.

Figure 10: Histogram of pedestal height for HECOG0309 (values larger than ten sorted out)

Computing the median of the remaining a values leads to

(22) ā = 3.9110,

where ā denotes the value the pedestal height shall be regularized to.

A general regularization approach can be realized by expanding the minimization problem (20) by an additional summand:

(23) min_c || f(x, c) - R_n(x) ||^2 + (c - c̄)' · weights · (c - c̄)

The purpose of the matrix weights is to determine how strict the regularization shall be. The larger an entry of weights, the higher is the effort of the fitting routine to force the corresponding value of c to equal its counterpart in c̄, which consists of the parameter values we want to regularize to. Since we only want to regularize the parameter a in a univariate way, the matrix weights consists of only one non-zero entry. To balance both summands in problem (23) we normalize them by estimates of their variance:

(24) min_c || f(x, c) - R_n(x) ||^2 / (5.4674 · 10^-4) + (a - ā)^2 / 2.9518

The first denominator in equation (24) was derived as follows: A histogram of the values of the mean squared errors of the same 26204 HECOG0309 curves as used for the determination of ā was plotted and is shown in figure 11:

Figure 11: Histogram of mean squared error for HECOG0309

As can be seen when taking a closer look at this histogram, the majority of mean squared error values is smaller than or equal to 0.01. Only considering these curves leads to a median value for the mean squared error of 5.4674 · 10^-4, see fig. 12.

Figure 12: Histogram of mean squared error for HECOG0309 (values larger than 0.01 sorted out)
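Putting the pieces together, the regularized objective of equation (24) could be sketched as follows in MATLAB; "gompertz" is the model handle from the sketch above and the constants are the values just derived:

% Regularized least-squares objective of equation (24);
% p = [y0 r a b n0] as above, x and Rn are column vectors.
aBar   = 3.9110;     % regularization target for a, equation (22)
mseVar = 5.4674e-4;  % median mean squared error, see figure 12
aVar   = 2.9518;     % variance estimate for the pedestal height
objective = @(p, x, Rn) sum((gompertz(p, x) - Rn).^2)/mseVar ...
                      + (p(3) - aBar)^2/aVar;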

The usefulness of the regularization can be seen in the following example. Figure 13 shows the result of a fit without regularization. One can see that parameter a grows to an extremely unrealistic value of 307.6 without regularization. The fitting routine does not converge and the resulting parameters are not meaningful.

Figure 13: Example of a regression without regularization

Here, a regularization of the pedestal height leads to parameters which are more trustworthy. In addition, convergence of the fitting routine is achieved, see fig. 14.

Figure 14 : Data of figure 13 with regularization

Please note that, compared to the unregularized approach, the shape of the model curve is nearly unchanged but the parameter a is much more realistic.

2.3.3. Constraints

One class of (RT-)qPCR curves which is difficult to handle are those curves which mainly consist of background and are thus quite linear. A fitting concept that can be used to achieve good fitting results even for these fluorescence data is the application of a special constraint which forces the optimization routine to introduce a nonlinearity into the fitted curve. Once this constraint has been specified, the fitting routine is forced to find parameter values in such a way that the condition given by the constraint is fulfilled. Whenever the constraint is violated, a new set of parameters has to be chosen. In the current situation, the constraint is chosen as follows:

• The term

gomp(x_i) = exp(-exp(-(x_i - n_0)/b))

is calculated for each cycle x_i and the linear regression

X·c = y

with X = (x_i, 1) (one row per cycle), c = (slope, intercept)' and y = (gomp(x_i)) is solved.

• The linearity norm is defined as the largest absolute difference between regression line and data:

(25) linearityNorm = max_i | gomp(x_i) - (slope · x_i + intercept) |

It is a measure for the linearity of the mapping x_i ↦ gomp(x_i). The more linear the mapping, the smaller the value of the linearity norm.

• In order to introduce a nonlinearity one has to forbid a choice of parameters that leads to a very small value of the linearity norm. Therefore the constraint

linearityNorm >= 10^-3

is specified. This constraint is not very limiting: In all 28800 RT-qPCR reactions carried out for the HECOG0309 study the value of the linearity norm was greater than 6 · 10^-3. From this point of view, the introduction of the constraint on linearityNorm seems to be dispensable, but on the one hand one has to keep in mind that the bound on the linearity norm may have been reached during the optimization routine (thus preventing the optimizer from taking the wrong direction), and on the other hand it serves another purpose: As described above, curves characterized by a small value of the linearity norm are quite linear even in the exponential growth phase. Curves which are linear over the whole range of cycles have to be called "Undetected"; therefore the linearity norm will be used when defining the C_q value in paragraph 2.4.
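A minimal MATLAB sketch of the linearity norm of equation (25) could look as follows; x is the column vector of cycles used in the fit:

function ln = linearityNorm(b, n0, x)
% Largest absolute deviation of the sigmoid term from its own
% least-squares regression line, equation (25).
gomp = exp(-exp(-(x - n0)./b));   % pure sigmoid term, see above
X    = [x ones(size(x))];         % regression design matrix
c    = X \ gomp;                  % least-squares slope and intercept
ln   = max(abs(gomp - X*c));      % linearity norm
end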

2.3.4. Number of cycles used

The next question that arises is how many cycles should be used for the fitting procedure. Preferably all cycles should be used, of course, but since there are many curves which show an awkward behavior in the first cycles (immoderate increase or decrease of R_n values, waves), the first two cycles are left out of the fitting process for all curves. Examples of the aforementioned awkward behavior are shown in figures 15-17:

Figure 15: Immoderate increase of R_n values in first cycles

Figure 16: Immoderate decrease of R_n values in first cycles

Figure 17: Awkward behavior in first cycles (wave)

While the data shown in figure 15 seem to be normal after leaving out the first two cycles, the data in figure 16 and especially in figure 17 are abnormal even beyond the second cycle. One possibility to solve this problem would be to leave out more than two cycles from the start. This is not desirable, because for the majority of R_n curves the third cycle is usable and one should use as much data as possible for data fitting procedures. The solution used instead is to apply the fitting routine to the mean R_n values of all but the first two cycles and investigate afterwards if

• the deviation between model and data for the first cycle used is high compared to the mean squared error of the fit and

• the fit is good on the whole.

In other words,

(26) mse = (1/N) · Σ_i (f(x_i) - R_n(x_i))^2

and

(27) |f(x_1) - R_n(x_1)|

are computed, where f denotes the Gompertz function defined in equation (19), x_1 denotes the first cycle used in the corresponding fit (not the first cycle measured) and N denotes the number of cycles used. Whenever one of the conditions

(28) log(mse) > -4

and

(29) |f(x_1) - R_n(x_1)| / √mse exceeds a fixed threshold

is fulfilled, the fit is doubted and the whole procedure (model fit and investigation of goodness-of-fit) is repeated for all data but the first three cycles. This process is iterated (leaving out more and more of the first cycles) until either the fit is good enough or no data are left. For the data shown in the figures above, the fit is not rejected when beginning with cycle three (figure 15), five (figure 16) and eight (figure 17).
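The iterative exclusion of early cycles can be sketched as follows in MATLAB; "fitGompertz" stands for the fitting routine of paragraph 2.3.5 and the bound "tMax" for the threshold in condition (29), both of which are assumptions of this sketch:

% Iterative exclusion of awkward early cycles (paragraph 2.3.4).
firstCycle = 3;          % the first two cycles are always left out
tMax = 3;                % assumed threshold for condition (29)
while firstCycle <= numel(cycles)
    idx = firstCycle:numel(cycles);
    [f, mse] = fitGompertz(cycles(idx), Rn(idx));  % assumed helper
    dev = abs(f(cycles(firstCycle)) - Rn(firstCycle));
    if log(mse) <= -4 && dev/sqrt(mse) <= tMax
        break                                      % fit accepted
    end
    firstCycle = firstCycle + 1;                   % drop one more cycle
end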

2.3.5. Optimization procedure

To be able to fit the Gompertz function to each given set of normalized (RT-)qPCR fluorescence data and to obtain a set of parameters y_0, r, a, β, and n_0 which is really suited to describe the data, some more technical considerations have to be made. First of all, one has to decide if it is better to optimize all parameters at once or to start with, for example, the two parameters characterizing the background (y_0 and r) and optimize the parameters characterizing exponential growth and plateau phase (a, β, and n_0) later. Since there is no biological indication why one should fit background and exponential growth/plateau separately, the holistic approach should be preferred. Unfortunately, especially for curves representing experiments with low target concentrations where most of the cycles belong to the background phase, this approach fails because the regression often converges to a meaningless set of parameters or doesn't converge at all. In order to solve this problem and force the regression routine to converge to meaningful parameter values, one has to take a closer look at the minimization problem given in (24).

When inserting the Gompertz function for f one sees that the term that has to be minimized is a quadratic form of the parameters y_0, r, and a; thus these parameters can be obtained by a regularized linear regression for fixed β and n_0. The technical advantage is that for fixed β and n_0 the parameters y_0, r, and a can be optimized analytically by solving linear equations, see below. Taking this into account, the problem of fitting the Gompertz function to the R_n values can be reduced to a nonlinear optimization problem in only two parameters. In detail, the following is done: At first, the problem

(30) min_c || X·c - y ||^2 / (5.4674 · 10^-4) + (c - c̄)' · weights · (c - c̄)

with

(31) X = matrix whose i-th row is (1, x_i, exp(-exp(-(x_i - n_0)/exp(β)))),

(32) y = vector of the measured values R_n(x_i),

(33) c = (y_0, r, a)',

(34) c̄ = (0, 0, ā)'

and

(35) weights = diag(0, 0, 1/2.9518)

is solved for fixed β and n_0 by computing the first derivative of the term to be minimized and setting it to zero:

(36) 0 = (2 · X'·X·c_opt - 2 · X'·y) / (5.4674 · 10^-4) + 2 · weights · (c_opt - c̄).

This leads via

(37) 2 · X'·y / (5.4674 · 10^-4) + 2 · weights · c̄ = (2 · X'·X / (5.4674 · 10^-4) + 2 · weights) · c_opt

to

(38) c_opt = (y_0,opt, r_opt, a_opt)' = (X'·X / (5.4674 · 10^-4) + weights)^-1 · (X'·y / (5.4674 · 10^-4) + weights · c̄).

Afterwards, β and n_0 have to be chosen by the nonlinear fitting routine in such a manner that

(39) res(β, n_0) = || X·c_opt - y ||^2 / (5.4674 · 10^-4) + (c_opt - c̄)' · weights · (c_opt - c̄)

gets minimal.
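In MATLAB, the analytic inner step of equations (30)-(39) could be sketched as follows; the constants are those derived in section 2.3.2, and x and Rn are column vectors of the cycles used and the normalized signal:

function [cOpt, res] = innerFit(beta, n0, x, Rn)
% Regularized linear regression for fixed beta and n0,
% equations (38) and (39).
aBar = 3.9110;  mseVar = 5.4674e-4;  aVar = 2.9518;
g    = exp(-exp(-(x - n0)./exp(beta)));  % sigmoid column of X
X    = [ones(size(x)) x g];              % columns for y0, r and a
W    = diag([0 0 1/aVar]);               % regularize only a
cBar = [0; 0; aBar];
cOpt = (X'*X/mseVar + W) \ (X'*Rn/mseVar + W*cBar);   % equation (38)
res  = sum((X*cOpt - Rn).^2)/mseVar ...
     + (cOpt - cBar)'*W*(cOpt - cBar);                % equation (39)
end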

Another question that arises is which MATLAB function should be used to fit the Gompertz function to the fluorescence data. Since the function has to be able to handle the constraint described in paragraph 2.3.3, a routine called "fmincon" is chosen. It is a gradient-based method to find the minimum residual. Some more details on "fmincon" will be presented in paragraph 3.2. Among other things, the fitting options that have to be used in the current situation are set there. It shall be mentioned that without specifying the parameters "TypicalX", "RelLineSrchBnd" and "RelLineSrchBndDuration" there is the possibility of finding the wrong local minimum for res as defined in equation (39). This is illustrated exemplarily in figure 18.

Figure 18: Fit without (left) and with (right) specification of optimization parameters "TypicalX", "RelLineSrchBnd" and "RelLineSrchBndDuration"

2.3.6. Choice of starting value

One crucial step in a fitting optimization is the choice of the starting value that is needed for all local optimization procedures. If this choice is not made carefully, there is the possibility of ending in a wrong local minimum. The starting value for β can be set to -log(0.3) for each (RT-)qPCR curve; finding an appropriate starting value for n_0 is more challenging. Considering the fluorescence data presented in figure 19, the cycle with the largest step in R_n seems to be a good guess for the inflexion point n_0. One can't determine this cycle exactly by hand, but it is obvious that it is between 25 and 35, see fig. 19.

Figure 19: Fluorescence data

A closer look at the residuals of the given data for fixed β and n_0 shows that this choice of starting values is sensible, because the minimum to which the optimization routine converges is suited to describe the data, see fig. 20:

Figure 20: Starting point (dark gray circle) and point of convergence (light gray circle) for optimization of data in figure 19

Nevertheless, a problem becomes apparent: In addition to the minimum that specifies the best choice for β and n_0, there is another local minimum in which the optimization routine could converge if the starting value for n_0 were ten, for example. Converging in this second minimum is not desirable, because the resulting parameters would describe a Gompertz function with an inflexion point of approximately ten, which is implausible from a biological point of view - fluorescence data of (RT-)qPCR reactions don't reach the exponential phase at these early cycles. The next figure illustrates an example of fluorescence data for which the choice of starting values as described above leads to convergence in the wrong minimum, see fig. 21:

Figure 21: Starting point (dark gray circle) and point of convergence (light gray circle) in wrong minimum

This problem can be solved by slightly altering the definition of the starting value for n_0: The cycle with the largest step in R_n is replaced by the maximum of the aforementioned value and a constant value of 30. By forcing the algorithm to start with a value for n_0 that is greater than or equal to 30, the right minimum is found because it is not possible for the optimizer to cross the "mountains" between both minima, see fig. 22:

Figure 22: Altered starting point (dark gray circle) and point of convergence (light gray circle) in right minimum
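In MATLAB the starting-value heuristic amounts to a few lines; "cycles" and "Rn" are the cycle and signal vectors of the current well:

% Starting values for the nonlinear fit (paragraph 2.3.6).
beta0      = -log(0.3);               % fixed starting value for beta
[~, iStep] = max(diff(Rn));           % cycle with the largest step in Rn
n0Start    = max(cycles(iStep), 30);  % never start below 30
x0         = [beta0, n0Start];        % starting point for fmincon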

2.4. Definition of C_q value

The last step is to use the parameters gained by the optimization routine described in the last paragraphs to define a C_q value which provides information about the validity of an (RT-)qPCR curve and the amount of RNA or DNA in the sample. More precisely, there are four different classes the C_q value of each given example of fluorescence data can belong to, and these four classes have to be described by the parameters of the Gompertz function. In detail, a C_q value is either of

• "Numeric" - If a well is neither labeled "Undetected" nor "Invalid" nor "Missing", the C q value is numeric and should be a measure for the amount of RNA or DNA in the sample. The formula for calculating this value - in the text below denoted as AIP value - has to rely on the parameters gained from the optimization, too.

• "Undetected" - Labeling for a well with very low or zero target concentration. In this case all or most of the cycles belong to the background phase.

• "Invalid" - Labeling for a well resulting in a curve with an abnormal shape which can't be trusted. A well is also labeled "Invalid" in case the fitting procedure didn't work properly. Another reason for a well to be labeled "Invalid" is that any of the cycles used for the fitting process lacks of a valid R n value - due to missing measurements or due to all repeats being outliers.

• "Missing" - Labeling for a well which was completely unused or in which either reporter dye or passive reference dye was missing. A well is also labeled as "Missing" if either the cycle layout does not equal the default (e.g. three repeated measurement for 40 cycles if TaqMan was used or one measurement for 50 cycles if MX3005 was used) or any of the fluorescence data of reporter dye or passive reference dye is missing or invalid. (Obviously, in contrast to the other classes this one does not depend on the parameters but only on the raw fluorescence data.)

The first idea to calculate the AIP value was to determine the fractional cycle at which the tangent at the inflexion point crosses the background, as illustrated in figure 23:

Figure 23: First definition of AIP value

Given any function f, the formula for the tangent at any point x_0 is given by

(40) g(x) = f(x_0) + f′(x_0) · (x - x_0).

For being able to apply this to the inflexion point of the Gompertz function, one first has to compute the first derivative by twice using the chain rule:

(41) f′(x) = r + (a/b) · exp(-(x - n_0)/b) · exp(-exp(-(x - n_0)/b)).

This leads to

f′(n_0) = r + (a/b) · exp(-exp(0)) = r + a/(b · exp(1)),

and since the value of the Gompertz function at the inflexion point is

f(n_0) = y_0 + r · n_0 + a/exp(1),

the formula for the tangent at the inflexion point is given by

(42) g(x) = y_0 + r · n_0 + a/exp(1) + (r + a/(b · exp(1))) · (x - n_0).

For computing the AIP value one now has to equate equation (42) with the background:

(43) g(AIP) = y_0 + r · AIP
⇔ a/exp(1) + a/(b · exp(1)) · (AIP - n_0) = 0
⇔ AIP = n_0 - b.

This definition makes clear where the notation AIP value comes from: AIP is the abbreviation for "adjusted inflexion point". The question that will be answered below is: Is this definition of the AIP value the best choice or is there a better one?

The basic framework of the definition shall remain constant; this means the AIP value shall be computed by adjusting the inflexion point n_0, but perhaps the definition

AIP = n_0 - α · b

works better for some α ≠ 1. In order to determine the best value of α one has to create a measure for comparing different versions of an AIP value definition. For this purpose, the 26640 RT-qPCR reactions belonging to the HECOG0309 study which were already used for the determination of the regularization constants in section 2.3.2 were fitted by a preliminary fitting algorithm again. Of these 26640 reactions, three wells at a time contain the same material and form a so-called triplicate. This means, in each of these three reactions the same gene was measured for the same tissue sample, and therefore the results of these three measurements should be the same. Therefore, one could define that definition of the AIP value as best for which the sum of the standard deviations of the AIP values over all 8880 triplicates is minimal:

min_α Σ_{i=1}^{8880} std(AIP(α,i,1), AIP(α,i,2), AIP(α,i,3)),

where AIP(α,i,j) denotes the AIP value of the j-th member of the i-th triplicate when using α. This approach is not entirely satisfying, because each of the 8880 triplicates gets the same weight. That is not sensible, because there are triplicates which contain large amounts of RNA as well as others which only contain few RNA molecules. For the latter triplicates a higher standard deviation of the AIP values is acceptable, while for triplicates with much RNA the standard deviation has to be small. Therefore, a weighting factor has to be introduced. This weighting factor makes use of the C_t values (from the SDS software) because they are the only criterion at hand. It is defined as follows:

• In each triplicate, outliers in the C_t values are identified and excluded from further analysis.

• Those C_t values which are "Undetermined" are substituted by appropriate replacement values. Exemplarily, the first replacement value is gained by determining that value of x that minimizes the noise-weighted sum of the triplicate standard deviations

Σ_i std(Triplicate_i(x)) / Noise(i)

under the constraint x > 40, where

Triplicate_i(x) = (First Replicate, Second Replicate, x)'

and Noise(i) denotes the noise of the corresponding triplicate of C_t values - determined by the accepted noise model specified in equation (44). Here, only those triplicates of the 26640 HECOG0309 curves are used which consist of two "Numeric" replicates and one "Undetected" replicate.

• The mean of the C_t values, C̄_t, is calculated. If at least two C_t values of a triplicate were excluded, C̄_t is set to 40.

• The standard deviation of the inherent noise of a triplicate is calculated via a noise model given by

(44) Noise(C_t) = 0.11 + 0.83 · exp(0.36 · (C_t - 37)).

• The reciprocal noise serves as weighting factor: The higher the noise, the more acceptable is a large standard deviation in the AIP values.
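The weighting can be sketched in two lines of MATLAB; the noise model is that of equation (44):

% Noise model of equation (44) and the resulting weighting factor.
noise  = @(Ct) 0.11 + 0.83*exp(0.36*(Ct - 37));
weight = @(Ct) 1./noise(Ct);   % low-noise triplicates weigh more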

A total of 499 triplicates were additionally sorted out for different reasons, resulting in the following minimization task to optimize the AIP value:

min_α Σ_{i=1}^{8381} std(AIP(α,i,1), AIP(α,i,2), AIP(α,i,3)) / Noise(C̄_t(i))

The minimum was found at α = 0.72; therefore the AIP value is defined by

(45) AIP = n_0 - 0.72 · b.
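Once β and n_0 have been optimized, this score is obtained in two lines of MATLAB; b is recovered from the optimized β:

% AIP score of equation (45), computed from the fit results.
b   = exp(beta);        % sigmoid width, see equation (19)
AIP = n0 - 0.72*b;      % adjusted inflexion point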

The determination of α is illustrated in figure 24:

Figure 24: Determination of alpha for the definition of the AIP value

The next step is to decide for each fluorescence curve, on the basis of the parameters gained by the optimization routine, to which of the three classes - "Numeric", "Undetected", or "Invalid" - the corresponding C_q value belongs. As described above, the class "Missing" does not depend on the parameters but only on the raw fluorescence data. Whenever a well is classified as "Missing", it is actually unnecessary to perform the fit, because its results are not needed. Basis for the determination of the rules were the fitting results of the 28800 fluorescence curves belonging to the HECOG0309 study. Of these, 3600 curves were sorted out, because in the corresponding wells one of the genes ALCAM (SC312), MAP4 (SC083) or SPP1 (SC309) was measured and the results of these genes have to be doubted for some reason. The remaining 25200 curves were classified into three groups:

• those with a numeric C_t value less than 40 ("Good" curves),

• those with a C_t value greater than or equal to 40 ("Undetermined" curves), and

• those with a missing C_t value ("Bad" curves, sorted out by the operator manually because of e.g. unusual shape).

Scatter plots of the distributions of parameter values resulting from the fitting routines were created, depicting curves "Good" by C_t as dark gray crosses, "Undetermined" by C_t as light gray crosses, and "Bad" by C_t as black crosses. These scatter plots were used to define the rules classifying curves into "Numeric" C_q, "Undetected" C_q, and "Invalid" C_q. Obviously, it is desirable to classify "Good" C_t as "Numeric" C_q, "Undetermined" C_t as "Undetected" C_q, and as few curves as possible as "Invalid" C_q. Nevertheless it is evident as well that this will not work for all 25200 curves. In addition, there were also some rare cases where this equivalence between methods is not desirable, e.g. a curve with strongly increasing background, for which the C_t value is unreliable, but the AIP value is reliable.

For some of the wells for which the classification based on C_t values on the one hand and on optimization results on the other hand differed, we investigated whether or not this is problematic.

There were two kinds of rules that we defined - rough rules and fine rules. The purpose of the rough rules is to filter out curves for which at least one of the parameters is exceptionally high or low and classify them as "Invalid". The fine rules separate curves whose parameters are in the normal range into the three classes; their determination was more challenging than just setting the rough rules.

The first parameter that was investigated was the first cycle that was used for the Gompertz fit. As described in paragraph 2.3.4, this is the first cycle greater than two for which neither inequality (28) nor inequality (29) is fulfilled. The distribution of this parameter is shown in figure 25:

Figure 25: Distribution of "First cycle used"

As can be seen in the histogram, for the majority of curves the first cycle which is used is smaller than ten. In detail, this is true for 99.96% of all curves. Those curves for which the fit starts with the tenth cycle or even later are classified as "Invalid", because we evaluated these fits, relying on too few data, as not trustworthy:

(46) First cycle used >= 10 => C_q = "Invalid"

By this rule, nine curves are called "Invalid"; five of them are "Bad" curves, the rest are "Good" curves. One of the "Good" curves is exemplarily shown in figure 26. Since the behavior of the curve in the first 20 cycles is awkward, the classification based on rule (46) is sensible although it doesn't agree with the C_t value, see fig. 26.

Figure 26: C_q = "Invalid", C_t = 30.46

In the next step further rough rules are determined.

2.4.1. Background intercept vs. background slope

Figure 27: Background intercept vs. background slope

The second and third rough rules are based on the parameters characterizing the background of the fluorescence data:

(47) |y_0| > 20 => C_q = "Invalid"

(48) |r| > 1 => C_q = "Invalid"

In the training data both rules characterize exactly the same curve as "Invalid"; this curve was classified "Bad" by the C_t value as well. The rules are also biologically sensible, because a rapid increase or decrease of the background (as given by a large absolute value of the background slope) as well as initial R_n values far from zero (as given by a large absolute value of the background intercept) indicate that something went wrong during the reaction.

2.4.2. Pedestal height vs. sigmoid width

Figure 28: Pedestal height vs. sigmoid width

The rough rules for the pedestal height a provide an upper and a lower bound, while the rough rule for the sigmoid width b only has to provide an upper bound, since b is bounded below by zero due to its definition as exp(β):

(49) a > 25 => C_q = "Invalid"
     a < -15 => C_q = "Invalid"

(50) b > 50 => C_q = "Invalid"

Again, these rules are also biologically sensible: A negative value of the pedestal height indicates that the plateau lies below the background level. This means a decrease of RNA or DNA molecules during the (RT-)qPCR reaction. Since this is impossible due to the principle of such a reaction, a pedestal height which is even clearly negative has to be doubted. A sigmoid width b larger than 50 implies via equation (18) that the length of the exponential region exceeds 400 cycles. Such a curve has to be doubted, too. Exactly one curve fulfills rule (49); this curve has a "Good" C_t. As depicted in figure 29, this curve shows an awkward behavior in the first 20 cycles; therefore it is reasonable to call the curve "Invalid" and the deviation from the classification based on C_t values is not problematic, see fig. 29.

Figure 29: C_q = "Invalid", C_t = 38.57

No curve of the training data set is classified as "Invalid" by the rough rule depending on the sigmoid width.

2.4.3. Inflexion point vs. logarithm of linearity norm

Figure 30: Inflexion point vs. logarithm of linearity norm

As can be seen in fig. 30, there is no need for a rough rule on the logarithm of the parameter linearityNorm, because it is bounded from below by log(10^-3) due to the constraint during the fitting routine specified in paragraph 2.3.3. On the contrary, the inflexion point has to be bounded by a rough rule:

(51) n_0 > 65 => C_q = "Invalid"
     n_0 < 15 => C_q = "Invalid"

From a biological point of view this rule makes sense, too, because an inflexion point smaller than 15 would indicate that the reaction has reached the exponential phase at approximately cycle ten or even earlier. None of the 25200 HECOG0309 curves is caught by this rule. Taking together all rough rules and the rule on the first cycle which was used for the fitting routine, one gets a total of ten curves which are classified as "Invalid" so far. These curves will be left out for the derivation of the fine rules below.

2.4.4. Pedestal height vs. logarithm of linearity norm

Figure 31: Pedestal height vs. logarithm of linearity norm

As can be seen in fig. 31, plotting the pedestal height a against the logarithm of the linearity norm shows that the linearity norm is appropriate to classify a part of the "Undetermined" curves as "Undetected". That is not astonishing, because a small value of the linearity norm is equivalent to almost linear data, indicating that all cycles of the curve belong to the background region. The decision where to put the threshold depends on what shall be achieved with the rule. There are two possibilities:

• Choose a threshold which covers a preferably large amount of "Undetermined" curves (for example -2.9), having the disadvantage of classifying some curves as "Undetected" which really are "Good" curves. A further disadvantage is that a threshold of -2.9 would be very close to the main cluster of points, leading to a rule that is not very robust.

• Choose a threshold which covers only a small amount of "Undetermined" curves (for example -3.5), resulting in fewer "false" classifications.

Since there are more parameters which can help to classify a curve as "Undetected", the second option was chosen here:

(52) log(linearityNorm) < -3.5 => C_q = "Undetected"

A total of 31 curves are classified as "Undetected" by this rule. Of these, 29 have an "Undetermined" C_t; the other two fluorescence data plots are shown in figure 32 and figure 33:

Figure 32: C_q = "Undetected", C_t = "Bad"

For the curve in figure 32, the classification as "Undetected" is as sensible as the categorization into "Bad". On the one hand one can leave the first five cycles out, getting a curve which obviously consists of background only; on the other hand one can take all cycles into account and sort out the curve because its behavior in the first cycles is awkward.

Figure 33: C_q = "Undetected", C_t = 38.89

It is not wrong to classify this curve as "Undetected", because all cycles belong to the background phase and the beginning of the exponential region. In addition, computing the AIP value of this curve leads to 47.18; the curve in figure 33 would thus be classified as "Undetected" by rule (57), which will be presented below, even if the threshold for the logarithm of the linearity norm were smaller.

Those 31 curves being classified as "Undetected" by the rule on the logarithm of the linearity norm will be left out of the further analysis. Since figure 31 shows that the linearity norm is a suitable parameter for the classification of many "Undetermined" C_t curves, we investigated whether more curves can be separated in combination with the sigmoid width. To understand the plausibility of this consideration one has to keep in mind that a large value of b is equivalent to the fact that many cycles belong to the exponential region, resulting in a very drawn-out curve.

2.4.5. Logarithm of linearity norm vs. logarithm of sigmoid width

Figure 34: Logarithm of linearity norm vs. logarithm of sigmoid width

Figure 34 shows that "Undetermined" curves are characterized by a small linearity norm in combination with a large sigmoid width:

(53) log(linearityNorm) <= -1.5 and log(b) above a fixed bound => C_q = "Undetected"

Of the 829 curves which are classified as "Undetected" by this rule, 827 have a C_t value which is "Undetermined".

The other two curves are "Bad"; one of them is exemplarily shown in figure 35:

Figure 35: C_q = "Undetected", C_t = "Bad"

Obviously, both classifications are sensible for this curve, for the same reason as for the data shown in figure 32. The 829 curves which come within the last rule are sorted out for the further analysis again.

2.4.6. Pedestal height vs. adjusted inflexion point

Figure 36: Pedestal height vs. AIP

The scatter plot in fig. 36 shows that the combination of pedestal height and adjusted inflexion point is another means to characterize "Undetermined" curves. A high AIP value in coincidence with a low pedestal height leads to the classification "Undetected":

(54) AIP >= 34 and a <= 0.6 => C_q = "Undetected"

This is not surprising, because a low pedestal height indicates that the amount of molecules in the background phase and the plateau phase is not very different. This could mean that the exponential phase detected in the data doesn't imply that the curve really has a sigmoid shape. All cycles might as well belong to the background phase and the detection of the exponential phase might be pure chance. This situation is shown exemplarily in figure 37:

Figure 37: C_q = "Undetected", C_t = "Undetermined", a = 0.0366

A total of 247 curves is classified as "Undetected" by this rule; 245 of them belong to the class "Undetermined" C_t. The other two curves have C_t values of 39.5865 and 38.0806, respectively. One of the curves is shown exemplarily in figure 38. Obviously it is not wrong to classify this curve as "Undetected", because all cycles belong to the background phase and the beginning of the exponential region. In addition, computing the AIP value leads to 37.43, and it doesn't really make any difference if a curve is classified as "Undetected" or is characterized by a "Numeric" AIP value close to 40, see fig. 38.

Figure 38: C_q = "Undetected", C_t = 38.08, a = 0.5388

The question that remains is what to do with curves that are characterized by a small pedestal height and a small AIP value. A closer look at figure 36 reveals that they should be classified as "Undetected", too. Since in the HECOG0309 data there are almost no curves with an AIP value smaller than 34 and a concurrent pedestal height between 0.2 and one, curves characterized by this combination of parameters are classified as "Invalid". This leads to the following two rules:

(55) AIP < 34 and a <= 0.2 => C_q = "Undetected"

(56) AIP < 34 and 0.2 < a <= 1 => C_q = "Invalid"

Rule (55) classifies 361 curves as "Undetected". Of these, all except one curve have "Undetermined" C_t. The exceptional case is depicted in figure 39.

Figure 39: C_q = "Undetected", C_t = "Bad", a = -0.3550

Though a classification as "Invalid" would be better, it is maintainable to characterize it as "Undetected". A closer look at figure 36 also reveals that the curve shown in figure 39 has one of the lowermost pedestal heights. Therefore, when attempting to characterize "Undetected" curves by a small value of the parameter a, one cannot avoid a misclassification of this curve. Only five curves are classified as "Invalid" by rule (56); two of these curves are "Bad", two curves are "Undetermined" and one curve is "Good". They won't be analyzed in detail here. Those 613 curves being classified as "Undetected" or "Invalid" by the rules in this paragraph are left out of the further analysis again.

The last rule that has to be defined is an upper border for the AIP value. This border is set to 40:

(57) AIP > 40 => C_q = "Undetected"

A total of 355 curves were classified as "Undetected" by this rule. Only 103 of them belong to the class "Undetermined" C_t, and 17 of them are "Bad". As can be seen in figure 40, the majority of C_t values of those curves classified "Good" are close to 40; therefore rule (57) is sensible anyway, see fig. 40.

Figure 40: Histogram of C_t values for curves being classified as "Undetected" by rule (57)

2.5. Comparison of C_q values and C_t values

A comparison of the C_q values and the C_t values results in the following table:

This means that for a total of 96.79% of the curves both methods result in equivalent classifications. Please note that "Numeric" C_q and "Good" C_t should match, as well as "Undetected" C_q and "Undetermined" C_t should match; "Invalid" C_q and "Bad" C_t must not necessarily match, since the (subjective) criteria for a reliable C_t value are very different from the criteria for a reliable C_q value due to very different calculation approaches.

When taking a closer look at those wells which the C_q method as well as the C_t method classify as "Numeric" and "Good", respectively, one realizes that at the lower end of the scale the AIP values are greater by trend, while at the upper end the situation is vice versa, see fig. 41.

Figure 41: Analysis of all genes

Analyzing the correlation between both methods for separate genes reveals that for some genes the methods result in similar values, while for other genes there is a constant shift, and for the rest of the genes the relation between C_q values and C_t values is non-linear, see fig. 42.

Figure 42: Analysis of separate genes

3. Technical details

The purpose of this last chapter is to explicitly describe the details of the program that was implemented in the MATLAB language to export fluorescence data of TaqMan or MX3005 experiments and fit those data to a Gompertz model in order to determine C_q values as a means to evaluate the results of an experiment. The program consists of three steps:

1. Import of fluorescence data.

2. Fitting of R_n curves.

3. Computation of C_q values.

The MATLAB implementation codes missing numeric data as NaN (not-a-number), which is a special value of the numeric data type "double". In particular, NaN codes missing repeats, outliers within repeats, missing fluorescence values of single cycles (pure dyes as well as R_n values), "Invalid" or "Missing" C_q values, and "Bad" C_t values. "Undetected" C_q values and "Undetermined" C_t values are coded as Inf (positive infinity), which also is a special value of data type "double".

3.1. Import of fluorescence data

3.1.1. Using TaqMan and SDS software

Data export from SDS software

The program described in the next section expects txt-files of a certain structure which have to be exported from the SDS software manually. To be able to use the program, one has to use version 2.2, 2.2.2 or 2.3 of the SDS software. Other versions are not supported. To export the data from the SDS software, the first step is to push the "Analyze" button (marked by a green arrow).

Afterwards, one has to choose "Export" in the "File" menu. In the graphical user interface which pops up, three changes have to be made:

1. In "Suchen in:" (German for "search in") the directory in which the resulting txt-file shall be saved has to be chosen.

2. In the list box "Export:" the item "Multicomponent" has to be selected.

3. In the edit field "Dateiname:" (German for "file name") the characters "mc" (for multi-component) have to be added to the end of the proposed filename.

The last step is to push the "Export" button; afterwards the SDS software isn't needed anymore. The txt-file which is exported satisfies the following structure:

• The first line provides information about the SDS software which was used and about the form of export which was used (in this case "Multicomponent").

• The second line is a header for the different columns of the following lines, entitling them "Well", "Time", "Temp", "Cycle", "Step", "Repeat", "<dye>", "BKGND - <well name>" and "mse/chan". There can be more than one column entitled "<dye>", according to the number of reporter dyes, passive reference dyes and quencher dyes which were chosen by the operator who analysed the qPCR run. If, for example, the dye FAM is chosen as reporter, the dye ROX is chosen as passive reference and the dye TAMRA is chosen as quencher, there will be three columns instead of one, entitled "FAM", "ROX" and "TAMRA". An anomaly occurs when the Black Hole Quencher (BHQ) is chosen as quencher. In this case there is no column entitled "BHQ", since the Black Hole Quencher emits no fluorescence signal.

• The following lines (usually 120) contain information about the first well used (in general, this is the well in the upper left corner of the qPCR plate - called "1" or alternatively "A1"). All in all, a TaqMan plate consists of 384 wells, divided into 16 rows and 24 columns. The wells in the first row are denoted "A1", "A2" etc., the wells in the second row are denoted "B1", "B2" etc. and so on. When labelling the wells by a number from "1" to "384", well "A1" is assigned the "1", well "A2" is assigned the "2" and so on.

• The first column is constant for all (usually 120) lines; it denotes the number of the well. To understand the following columns one has to realise that a TaqMan run consists of 40 cycles and that during each cycle three intensity measurements are performed. The column "Repeat" indicates which cycle is regarded in the current line, and the columns "Time", "Temp" and "<dye>" inform about the time and temperature at which the measurement was made and about the fluorescence intensity of the corresponding dye. The other columns contain supplementary information which is not necessary to understand for the purposes of this documentation.

• The following lines can be skipped; the next line which is important is the header line for the block of intensity data for the next well.

Data import to MATLAB

The program which imports the data included in the txt-file to MATLAB is called "ImportSDSFileFromTXT". It is called with five input parameters and two output parameters:

[expressionData, msg] = ImportSDSFileFromTXT(basename, reporter, passiveReference, directory, outputFunction)

The parameters "msg" and "outputFunction" are optional parameters which are only needed when the program is started from a GUI (graphical user interface). They won't be considered here. All obligatory input parameters have to fulfill certain conditions:

• basename: Character array which denotes the plate that shall be processed. It has to be concordant with the name of the txt-file except that the two characters "mc" have to be missing.

• reporter: Character array or 1*n cell array of strings which provides information about the reporter dye(s) used throughout the analysis. One or more dyes can be chosen. Possible values are "FAM", "VIC", "JOE", "NED", "SYBR", "TAMRA", "TET" and "ROX". The program doesn't support the use of other dyes.

• passiveReference: Character array which provides information about the passive reference dye used throughout the analysis. Only one dye can be chosen. Possible values are "FAM", "VIC", "JOE", "NED", "SYBR", "TAMRA", "TET" and "ROX". The program doesn't support the use of other dyes.

• directory: Character array which denotes the directory where the txt-file that shall be processed is saved. The correct path of the directory has to be given.
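A hypothetical call could look as follows; the plate name, dye choices and directory are made-up examples:

% Example call with made-up arguments; the exported file would
% then have to be named "plate42mc.txt".
expressionData = ImportSDSFileFromTXT('plate42', 'FAM', 'ROX', ...
                                      'C:\qPCR\exports');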

The program first checks the input parameters; in detail the following checks are done:

• Is the input parameter "basename" a character array? If not, the output variable "expressionData" is set to [] and the error message

''basename'' has to be a string.

is printed on the screen.

• Is the input parameter "directory" a character array? If not, the output variable "expressionData" is set to [] and the error message

File <basename>: ''directory'' has to be a string.

is printed on the screen.

• Does the file "<basename>mc.txt" exist in the directory given by "directory"? If not, the message

Input file ''<basename>mc.txt'' does not exist in the directory <directory>.

is saved, but no error is reported so far.

• Is the input parameter "reporter" a character array or a 1*n cell array of strings? If not, the message

File <basename>: ''reporter'' has to be a string or a cell array of strings.

is saved, but no error is reported so far.

• Is the input parameter "passiveReference" a character array? If not, the message

File <basename>: ''passiveReference'' has to be a string.

is saved, but no error is reported so far.

• If up to now any messages are stored, "expressionData" is set to [] and these messages are thrown out as an error on the screen.

• Does any of the chosen reporter dyes or the passive reference dye deviate from the values which are allowed (see above)? If yes, "expressionData" is set to [] and the error message

File <basename>: Possible dyes are FAM, VIC, JOE, NED, SYBR, TAMRA, TET and ROX.

is printed on the screen.

Afterwards, the first fields of the structure "expressionData" are set: "expressionData.source" is defined as "SDS" and "expressionData.wellNames" becomes a 16*24 cell array consisting of the entries "A1", ..., "P24". In addition, "expressionData.basename", "expressionData.reporter" and "expressionData.passiveReference" are assigned the associated input parameters. When this is done, the txt-file "<basename>mc.txt" is opened and the function goes through the lines of the file and does the following:

• The first entry of the first line is saved as "expressionData.sdsVersion" and it is checked whether it equals "SDS 2.2", "SDS 2.2.2" or "SDS 2.3". If this is not fulfilled, it is tested whether the entry begins with "SDS" at all. In case it does not begin with "SDS", the message

File <basename>: File format inconsistency. Did you really export the multicomponent output?

is saved, but no error is reported so far. In case it begins with "SDS", the message

File <basename>: You chose the wrong version of the SDS software! Version 2.2 or 2.2.2 or 2.3 has to be used.

is saved, but no error is reported so far.

• The field "expressionData.wellsUsed" is set to a 16*24 logical array entirely consisting of "false" entries.

The following steps are repeated until the end of the txt-file:

• It is checked if the first entry of the next line equals the word "Well", in other words it is checked if the next line is the header line of an adjacent section of intensity data. If this is not the case, the message

File <basename>: File format inconsistency. Did you really export the multicomponent output?

is saved, but no error is reported so far.

• If up to now any messages are stored, these messages are thrown out as an error on the screen.

• Those dyes which occur in the header line are stored in the field "expressionData.dyes.names".

• The following lines are skipped whenever the value of the column entitled "Step" does not equal "1". In case it equals "1", a line is only skipped if already three different lines with the same values for "Well", "Repeat" and "Step" exist. If a line does not have to be skipped, the values found in the columns entitled "Repeat" and "<dye>" are saved in the variables "expressionData.dyes.<dye>.cycle" and "expressionData.dyes.<dye>.intensity", respectively. Both of these fields are 16*24 cell arrays, each cell being an n*3 numeric array where n denotes the largest value occurring in the column entitled "Repeat" for the well currently in progress. If there are no intensity data for a well, the corresponding cells of "expressionData.dyes.<dye>.cycle" and "expressionData.dyes.<dye>.intensity" are empty.

• Those lines following a section of intensity data are skipped until the next potential header line occurs.

When the end of the txt-file is reached, some amendments have to be made:

• For all dyes which are allowed, namely "FAM", "VIC", "JOE", "NED", "SYBR", "TAMRA", "TET" and "ROX", it is checked if they occur in the field "expressionData.dyes". This being the case for dye "<dye>", a 16*24 logical array "expressionData.dyes.<dye>.wellsUsed", entirely consisting of "false" entries, is defined. For those cells of "expressionData.dyes.<dye>.intensity" which are not empty, the corresponding entries of "expressionData.dyes.<dye>.wellsUsed" and "expressionData.wellsUsed" are set to "true".

• Those dyes stored in "expressionData.reporter" are processed and for each of them a field "expressionData.dyes.<dye>.quencher" is initialised as a 16*24 cell array. For each well, all dyes which were on the one hand used in the corresponding well and on the other hand are neither reporter dye nor passive reference dye are saved as the corresponding entry of "expressionData.dyes.<dye>.quencher". It has to be mentioned that in the case of more than one reporter dye the same quencher dyes are saved in "expressionData.dyes.<dye>.quencher" for all of them, because the multicomponent output of the SDS software provides no information about the association of reporter dyes and quencher dyes. An important fact one has to keep in mind is that the use of the Black Hole Quencher produces no column of intensity data in the multicomponent output; this means the field "expressionData.dyes.<dye>.quencher" never contains an entry "BHQ".

3.1.2. Using MX3005 and MX Pro software

Data export from MX Pro software

The program described in the next section expects xls-files of a certain structure which have to be exported from the MX Pro software manually. Version 4.01 should be used, but since the version is not explicitly specified in the xls-file, the use of the appropriate version cannot be checked by the program which imports the data included in the xls-file to MATLAB. Therefore other versions are supported, too, but their use is not recommended. To export the data from the MX Pro software, one has to choose "Export Instrument Data/ Export Instrument Data to Excel/ Format 1 -- Columnar" in the menu "File". The Excel file which pops up afterwards should be saved under the name of the corresponding MX Pro file, but with extension ".xls" instead of ".mxp". Afterwards the MX Pro software isn't needed anymore. The xls-file which is exported satisfies the following structure:

• The first line is just a header for the different columns of the following lines, entitling them "Segment", "Ramp/Plateau", "Ramp/Plateau #", "Well", "Dye", "Cycle #", "Fluorescence" and "Temperature".

• The following lines contain information about the intensity measurements in the different wells and cycles. Since an MX3005 plate comprises 96 wells (contained in 8 rows and 12 columns), each qPCR run consists of 50 cycles and during each cycle only one measurement is done, normally 4800 lines per dye are generated. Similar to the SDS software, the wells in the first row are denoted "A1", "A2" etc., the wells in the second row are denoted "B1", "B2" etc. and so on. In the column entitled "Well" the wells are labeled by a number from "1" to "96": well "A1" is assigned the "1", well "A2" is assigned the "2" and so on. The headers "Dye", "Cycle #" and "Fluorescence" are self-explanatory; the other columns contain supplementary information which is not necessary to understand for the purposes of this documentation.

3.2. Fitting of R_n curves and computation of C_q values

The program which fits the data and computes the C_q values is called "FitAndComputeBV". It is called with two input parameters and two output parameters:

[fitData, msg] = FitAndComputeBV(expressionData, outputFunction)

The parameters "msg" and "outputFunction" are optional parameters which are only needed when the program is started from a GUI (graphical user interface). They won't be considered here. The input parameter "expressionData" is the MATLAB structure generated by the program "ImportSDSFileFromTXT" or "ImportMXProFileFromXLS". The program first checks the validity of the input parameter "expressionData". When using one of the programs described in the previous section to gain the variable "expressionData", some of the following errors can't occur, but they are described here nevertheless. In particular, the following steps are done:

Does "expressionData . source" equal

Pro"? If not, the message File <basename>: Source <expressionData . source> unknown. is prepared, but no error is reported so far. In the case "expressionData . source = SDS" the number of rows, columns and cycles is set to 16, 24 and 40 respectively, while in the case

"expressionData . source = MX Pro" it is set to 8, 12 and 50 respectively.

• Was at least one of the reporter dyes specified by the operator really measured during the qPCR reaction? In other words: Is any of the reporter dyes specified in "expressionData.reporter" contained in "expressionData.dyes.names", too? If not, the message

File <basename>: At least one reporter dye has to be measured.

is prepared, but no error is reported so far.

• Was the passive reference dye specified by the operator really measured during the qPCR reaction? In other words: Is "expressionData.passiveReference" contained in "expressionData.dyes.names"? If not, the message

File <basename>: Passive reference has to be measured.

is prepared, but no error is reported so far.

• If up to now any messages are stored, "fitData" is set to [] and these messages are thrown out as an error on the screen.

The program loops over all reporter dyes specified in "expressionData.reporter" and does the following:

• Double arrays "fitData.<dye>.firstCycleUsed", "fitData.<dye>.y0", "fitData.<dye>.r", "fitData.<dye>.a", "fitData.<dye>.b", "fitData.<dye>.n0", "fitData.<dye>.converged", "fitData.<dye>.mse", "fitData.<dye>.absNorm" and "fitData.<dye>.rule" of the size "number of rows" * "number of columns" and a ("number of rows" * "number of columns") cell array "fitData.<dye>.cq" are initialised, all except the last consisting of NaNs only.

The next steps are performed for each well:

• It is checked if "expressionData.wellsUsed" is "true" for the current well. If not, the message

File <basename>: Well(s) not used: <well>.

is prepared, but no warning is printed on the screen yet. If more than one well is not used, the message is expanded to

File <basename>: Well(s) not used: <well 1>, <well 2>.

and so on.

• For the current well, it is checked if "expressionData.dyes.<expressionData.passiveReference>.wellsUsed" is "true". This not being the case, the message

File <basename>: Passive reference <expressionData.passiveReference> not used in well(s): <well>.

is prepared, but no warning is printed on the screen yet. If the passive reference is not used in more than one well, the message is expanded to

File <basename>: Passive reference <expressionData.passiveReference> not used in well(s): <well 1>, <well 2>.

and so on.

• It is checked if "expressionData.dyes.<dye>.wellsUsed" is "true" for the current well. If not, the message

File <basename>: Reporter dye <dye> not used in well(s): <well>.

is prepared, but no warning is printed on the screen yet. If the reporter is not used in more than one well, the message is expanded to

File <basename>: Reporter dye <dye> not used in well(s): <well 1>, <well 2>.

and so on.

• If either the whole current well or the passive reference dye or the current reporter dye was not used, the corresponding entry of "fitData.<dye>.cq" is set to "#NV" and the next well is investigated. Setting the C_q value to "#NV" is an abbreviation of calling it "Missing".

• It is checked if "expressionData.dyes.<expressionData.passiveReference>.cycle" equals the default cycle layout. This cycle layout depends on "expressionData.source". In the case "SDS" it is a 40*3 array, each row consisting of three identical integers, the first row being [1 1 1], the second row being [2 2 2] and so on. When using the MX Pro software, the default cycle layout is a 50*1 array consisting of integers from 1 to 50 in ascending order. If "expressionData.dyes.<expressionData.passiveReference>.cycle" differs from the default or any of the values stored in "expressionData.dyes.<expressionData.passiveReference>.intensity" equals NaN, the message

File <basename>: Invalid repeat (cycle) layout for dye <expressionData.passiveReference> in well(s): <well>.

is prepared, but no warning is printed on the screen yet. The corresponding entry of "fitData.<dye>.cq" is set to "#NV" and the next well is investigated. If the cycle layout of the passive reference dye differs from the default or any of the values stored in "expressionData.dyes.<expressionData.passiveReference>.intensity" equals NaN in more than one well, the message is expanded to

File <basename>: Invalid repeat (cycle) layout for dye <expressionData.passiveReference> in well(s): <well 1>, <well 2>.

and so on.

• It is checked if "expressionData.dyes.<dye>.cycle" equals the default cycle layout. This not being the case, the message

File <basename>: Invalid repeat (cycle) layout for dye <dye> in well(s): <well>.

is prepared, but no warning is printed on the screen yet. It is saved, too, when any of the values stored in "expressionData.dyes.<dye>.intensity" equals NaN. The corresponding entry of "fitData.<dye>.cq" is set to "#NV" and the next well is investigated. If the cycle layout of the current reporter dye differs from the default or any of the values stored in "expressionData.dyes.<dye>.intensity" equals NaN in more than one well, the message is expanded to

File <basename>: Invalid repeat (cycle) layout for dye <dye> in well(s): <well 1>, <well 2>.

and so on.

The variable "firstCycle" is set to three, meaning that the first two cycles of the qPCR reaction will not be considered during the following fitting procedure. As will be described below there is the possibility to increase "firstCycle" iteratively if this is necessary. The fluorescence data of the reporter dye are normalised by the fluorescence data of the passive reference dye, the resulting variable is called "Rn". Depending on "expressionData . source"

- "SDS" or "MX Pro" - it is a 40*3 or 50*1 double array, respectively. The first up to the

("firstCycle" - l) th row of "Rn" is deleted.

Provided that one cycle consists of three repeats and 40 cycles were measured (which is the case when the SDS software was used) , the next step is to eliminate possible outliers. This procedure is described in detail in chapter 3. Those entries which are outliers are set to NaN; cycles which consist of only one non-NaN entry (regardless of whether the other two entries were NaN from the beginning or were characterized as outliers) are set to [NaN NaN NaN] to completely exclude them from the fitting procedure. After having eliminated all outliers, the mean of the non-NaN repeats of each cycle is computed. If all repeats are NaN, the mean is set to NaN, too. What results is a (40 - "firstCycle" + 1)*1 double array consisting of one normalized fluorescence value for each cycle of the qPCR reaction in the current well. This array is called "Rn" again. The variable "regressionRange" is set to a

("number of cycles" - "firstCycle" + 1 * 1) double array consisting of integers ranging from "firstCycle" to "number of cycles" in ascending order .

Those values of "Rn" which are NaN and the corresponding entries of "regressionRange" are deleted temporarily.

The MATLAB function "fmincon" is used to fit the data to the Gompertz model which is described in detail in chapter 3. The purpose of "fmincon" is to find a local minimum of a constrained nonlinear multivariable function. We use it in the form

[x, fval, exitflag] = fmincon (fun, x_0, ...

[], [], [], [], [], [], nonlcon, options). with the following input parameters:

- fun: Handle to the function that has to be minimized. "fun" has to be a function that returns a scalar value when being evaluated at x; it is called the "objective function".

- x_0: Scalar, vector or matrix that specifies the starting value for "fmincon", that is, the value at which "fun" should be evaluated right at the beginning. In case there are many local minima, "x_0" should be close to the minimum that shall be found.

- nonlcon: Handle to a function implementing the nonlinear constraint(s). In our case "nonlcon" is a function which evaluates one nonlinear constraint "c" at "x" and returns a scalar, meaning that the minimum of "fun" has to be found under the constraint

(58)   c(x) ≤ 0.

- options: Structure that contains further information about some optimization parameters.

The output parameter "x" is the value that minimizes "fun" under the given constraints, "fval" is the result when evaluating "fun" at "x" and "exitflag" contains information about the performance of the fit.

Since the objective function (Gompertz fit with regularization) and the constraint (non-linearity constraint) require similar calculations (see below), we coded both in a common local function "loc_Gompertz":

fun = @(x) loc_Gompertz(x, regressionRange, Rn, 'objective')
nonlcon = @(x) loc_Gompertz(x, regressionRange, Rn, 'constraint')

As described in chapter 3, the parameters y0, r, and a are computed analytically for given β and n0; therefore the variable "x" and the starting value "x_0" are two-element arrays, the first entry containing information about β, the second entry belonging to n0. An appropriate starting value for n0 is found by computing the differences of subsequent values of "Rn" and searching for the cycle for which this difference is maximal. The maximum of this cycle number and 30 is used as starting value for n0. The starting value for β is -log(0.3). The fitting options have to be set carefully to assure that, on the one hand, the fit is successful for all curves and that, on the other hand, it does not take too much time. They are chosen as follows (a MATLAB sketch of the complete set-up follows the list):

"TypicalX" = [1, 10] : Provides information about the typical magnitude of the result of the minimization, β is assumed to be ~1, n 0 is assumed to be -10.

- "RelLineSrchBnd" = 0.1: Specifies a relative bound on the length a step in the search algorithm may have. If "RelLineSrchBnd" is too high, the possibility of tunnelling from one valley to another and therefore finding the wrong local minimum is given.

- "RelLineSrchBndDuration" = Inf: Sets the number of iterations during the search algorithm for which the bound defined in "RelLineSrchBnd" shall be active.

"LargeScale" = "off": Provides information about the type of search algorithm that should be used, in this case a medium-scale algorithm is chosen.

- "TolFun" = 10 ~4 : Specifies a lower bound for the change of the value of the objective function during the search algorithm, in other words: Whenever an evaluation of the objective function results in a value that differs less than "TolFun" from the value of the former evaluation the optimization ends.

- "TolX" = 10 ~4 : Specifies a lower bound for the step size during the search algorithm, this means: Whenever a step is smaller than "TolX" the optimization ends.

What remains to be described is, on the one hand, the objective function and, on the other hand, the nonlinear inequality constraint. Both are closely related and rely on a function called "loc_Gompertz" whose call is of the general form

[result, dummy] = loc_Gompertz(x, data_x, data_y, resultType).

As can be seen above, the objective function as well as the constraint function are called with an anonymous parameter "x", and "data_x" and "data_y" are given by "regressionRange" and "Rn", respectively. The parameter "resultType" has to be chosen as 'objective' when the objective function shall be evaluated and as 'constraint' when the nonlinear inequality constraint shall be computed. Given values for β and n0 (via "x"), it is first checked whether these values are finite. This not being the case, an error message of the form

loc_Gompertz: Parameters have to be finite.

is prepared, but not printed on the screen. Both parameters being finite, the remaining parameters y0, r, and a are determined by equation (38) using the variables and constants specified in chapter 3. Note that the computation of the inverse matrix should be done by the MATLAB function "pinv", which computes the Moore-Penrose pseudoinverse instead of the real inverse in case the matrix is singular. Knowing all five parameters which characterize a Gompertz function, one is now able to calculate "result" via equations (19) and (39). If "resultType" equals 'constraint', the nonlinear inequality constraint shall be computed; for this purpose the linearityNorm given by equation (25) has to be determined and subtracted from 10^-3. The result of this operation is stored in the output parameter "result", too. By specifying this constraint, all choices of parameters which result in a linearityNorm that is smaller than 10^-3 are forbidden. Because a nonlinear equality constraint (which is not needed here) has to be specified, too, a parameter "dummy" is created and set to [].
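A schematic MATLAB skeleton of "loc_Gompertz" may clarify the control flow. The Gompertz core "g" below uses one plausible parameterisation (with b = exp(β)), and the "linearityNorm" line is only a stand-in; the authoritative definitions are equations (19), (25), (38) and (39) in chapter 3:

function [result, dummy] = loc_Gompertz(x, data_x, data_y, resultType)
dummy = [];                                   % nonlinear equality constraint: none
if any(~isfinite(x))
    error('loc_Gompertz: Parameters have to be finite.');
end
beta = x(1);
n0 = x(2);
g = exp(-exp(-exp(beta) .* (data_x - n0)));   % assumed Gompertz core, b = exp(beta)
M = [ones(size(data_x)), data_x, g];          % the model is linear in y0, r and a
p = pinv(M) * data_y;                         % cf. eq. (38); pinv guards a singular M
residual = data_y - M * p;                    % deviation of the data from the fit
linearityNorm = norm(g - mean(g))^2;          % stand-in only; eq. (25) is authoritative
switch resultType
    case 'objective'
        result = sum(residual.^2);            % plus regularisation, eqs. (19), (39)
    case 'constraint'
        result = 1e-3 - linearityNorm;        % c(x) <= 0 forbids linearityNorm < 1e-3
end
end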

Whenever an error occurs in the call of the function "loc_Gompertz", it is checked whether this error is caused by one of the parameters β or n0 being not finite. If none of them is infinite, the function "FitAndComputeBV" is terminated and an error message is written in the MATLAB command window. If at least one of the parameters is infinite, we suspect an internal problem within the function "fmincon", and "fmincon" is called again with a slightly changed starting value "x_0". More precisely, the former starting value for β is multiplied by 1.0001 while the starting value for n0 remains constant. If β or n0 is still infinite after ten iterations, the parameters y0, r, a, β, n0, "is_converged", mse, and linearityNorm are set to NaN and the temporary message

an error occurred when trying to fit the data.

is prepared. If the problem can be fixed by changing the starting value, the temporary message

starting value for beta had to be changed to <value>.

is prepared and the mean squared error of the Gompertz regression is calculated according to equations (19) and (26). In the case that no error occurs in "loc_Gompertz", only the mean squared error of the regression is calculated.
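The retry behaviour might be implemented along these lines; the loop bound of ten, the factor 1.0001 and the message texts come from the description above, while the loop structure and the test on the error message are assumptions:

for attempt = 1:10
    try
        [x, fval, exitflag] = fmincon(fun, x_0, [], [], [], [], [], [], ...
            nonlcon, options);
        fitFailed = false;
        break                                 % the fit went through
    catch err
        fitFailed = true;
        if isempty(strfind(err.message, 'Parameters have to be finite'))
            rethrow(err);                     % unrelated error: terminate FitAndComputeBV
        end
        x_0(1) = x_0(1) * 1.0001;             % perturb the starting value for beta only
        % on a later success, the message "starting value for beta had to be
        % changed to <value>." is prepared
    end
end
if fitFailed
    % after ten failed attempts: y0, r, a, beta, n0, is_converged, mse and
    % linearityNorm are set to NaN and the message
    % "an error occurred when trying to fit the data." is prepared
end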

• It is checked if the temporary message "an error occurred when trying to fit the data." was prepared during the Gompertz regression. This being true, the message

File <basename>, well <well>: an error occurred when trying to fit the data.

is prepared, but no warning is printed on the screen yet. The appropriate entries of "fitData.<dye>.cq" and "fitData.<dye>.rule" are set to "x" and "1", respectively, and the next well is investigated. Setting the Cq value to "x" is an abbreviation of calling it "Invalid". If this situation occurs for more than one well, the message is expanded to

File <basename>, well <well 1>: an error occurred when trying to fit the data.
File <basename>, well <well 2>: an error occurred when trying to fit the data.

and so on.

• It is checked if a temporary message of the form "starting value for beta had to be changed to <value>." was prepared during the Gompertz regression. This being the case, the message

File <basename>, well <well>, first cycle used <firstCycle>: starting value for beta had to be changed to <value>.

is prepared, but no warning is printed on the screen yet. If this situation occurs for more than one well or more than one iteration of the same well, the message is expanded to

File <basename>, well <well 1>, first cycle used <firstCycle 1>: starting value for beta had to be changed to <value 1>.
File <basename>, well <well 2>, first cycle used <firstCycle 2>: starting value for beta had to be changed to <value 2>.

and so on.

It is checked if the first cycle used in the Gompertz regression - given by "firstCycle" - is an outlier compared to the rest of the model. For this purpose, the absolute difference between the first value of "Rn", "Rn(1)", and the value of the Gompertz model at the first cycle used is compared to the root mean squared error of the whole model (sqrt(mse)). Whenever the absolute difference does not exceed three times the root mean squared error, the cycle given by "firstCycle" is not considered an outlier. If this condition is not fulfilled or the logarithmized mean squared error is greater than or equal to -4, the validity of the Gompertz regression is doubted. In this case "firstCycle" is increased by one and the whole fitting procedure (beginning with the definition of the variable "Rn") is repeated.

There are two possibilities to stop this iteration. On the one hand, the Gompertz regression can be valid (this is the case if the first cycle used is no outlier and the mean squared error of the whole model is small enough); on the other hand, an interruption can be forced by the program because the variable "firstCycle" exceeds the total number of cycles (40 if the SDS software is used, 50 in the case "expressionData.source" = "MX Pro"). In the latter case, the variable "firstCycle" and the parameters y0, r, a, β, n0, "is_converged", mse, and linearityNorm are set to NaN. In the former case, all parameters can be used in the form they were gained from the Gompertz regression.
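A compact sketch of the acceptance test and the surrounding iteration follows; "fitFromCycle" and "gompertzAt" are hypothetical wrappers for the normalisation and fitting steps described above and for evaluating the fitted Gompertz model:

numberOfCycles = 40;                              % 50 for MX Pro
firstCycle = 3;
accepted = false;
while ~accepted && firstCycle <= numberOfCycles
    [params, mse, Rn] = fitFromCycle(firstCycle); % hypothetical: normalise, clean, fit
    deviation = abs(Rn(1) - gompertzAt(params, firstCycle));
    firstIsOutlier = deviation > 3 * sqrt(mse);
    if ~firstIsOutlier && log10(mse) < -4         % assuming a base-10 logarithm
        accepted = true;                          % the regression is considered valid
    else
        firstCycle = firstCycle + 1;              % doubt the fit and repeat everything
    end
end
if ~accepted
    firstCycle = NaN;                             % forced interruption: parameters become NaN
end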

The next step is to check if the normalized fluorescence of any of the cycles used has to be doubted. This is the case if either all repeats (three in the case "expressionData.source" = "SDS", one in the case "expressionData.source" = "MX Pro") were NaN from the beginning or outliers were found that resulted in an "Rn" value of [NaN NaN NaN]. If any of the cycles has to be doubted, the message

File <basename>: Whole cycle is an outlier for dye <dye> in well(s): <well>.

is prepared, but no warning is printed on the screen yet. The appropriate entries of "fitData.<dye>.cq" and "fitData.<dye>.rule" are set to "x" and "2", respectively, and the next well is investigated. If the normalised fluorescence of any of the cycles used has to be doubted for more than one well, the message is expanded to

File <basename>: Whole cycle is an outlier for dye <dye> in well(s): <well 1>, <well 2>.

and so on.

If the parameter "is_converged" is less than or equal to zero - indicating that the fitting procedure couldn't find an optimum - the message

File <basename>: No convergence for dye <dye> in well(s): <well>.

is prepared, but no warning is printed on the screen so far. If the convergence problem occurs for more than one well, the message is expanded to

File <basename>: No convergence for dye <dye> in well(s): <well 1>, <well 2>.

and so on.

The parameters y0, r, a, exp(β), n0, "is_converged", mse, and linearityNorm are saved in the suitable positions of "fitData.<dye>.y0", "fitData.<dye>.r", "fitData.<dye>.a", "fitData.<dye>.b", "fitData.<dye>.n0", "fitData.<dye>.converged", "fitData.<dye>.mse" and "fitData.<dye>.linearityNorm".

The AIP value is computed via equation (45) and saved in a variable called "cq". Another variable called "rule" is set to NaN.

The next step is to determine if any of the following rules is fulfilled and to change "cq" and "rule" suitably (a MATLAB sketch of the cascade follows the list):

• Is "firstCycle" greater than or equal to ten or even NaN? If yes, "cq" is set to "x" and "rule" is changed to "3".

• Does the absolute value of y0 exceed 20? If yes, "cq" is altered to "x" and "rule" becomes "4".

• Does the absolute value of r exceed one? If yes, "cq" is altered to "x" and "rule" becomes "5".

• Is a greater than 25 or smaller than -15? If yes, "cq" is set to "x" and "rule" is changed to "6".

• Does exp(β) exceed 50? If yes, "cq" is altered to "x" and "rule" becomes "7".

• Is n0 greater than 65 or smaller than 15? If yes, "cq" is set to "x" and "rule" is changed to "8".

• Is the logarithmized linearityNorm smaller than or equal to -3.5? If yes, "cq" is set to "Undetected" and "rule" is changed to "9".

• Is the logarithmized linearityNorm smaller than or equal to -1.5 and β greater than or equal to 2.2? If yes, "cq" is set to "Undetected" and "rule" is changed to "10".

• Is "cq" smaller than 34 and a smaller than or equal to 0.2? If yes, "cq" is called "Undetected" and "rule" becomes "11".

• Is "cq" smaller than 34 and a greater than 0.2 as well as smaller than 1? If yes, "cq" is altered to "x" and "rule" is changed to "12".

• Is "cq" greater than or equal to 34 and a smaller than or equal to 0.6? If yes, "cq" is set to "Undetected" and "rule" becomes "13". • Is "cq" greater than or equal to 40? If yes, "cq" is called "Undetected" and "rule" is changed to "14".

• The values for "cq" and "rule" are saved as entries of "fitData . <dye> . cq" and "fitData . <dye> . rule" in the appropriate positions.

After all these steps are done for all wells and all reporter dyes, all warning messages that were saved during the fitting process are printed as warnings on the screen.