

Title:
A METHOD OF ANALYSING DATA FROM CHEMICAL ANALYSIS
Document Type and Number:
WIPO Patent Application WO/2013/098169
Kind Code:
A1
Abstract:
The present invention relates to a method of analysing data obtained from a first sample and a second sample using the same chemical analysis technique, the data comprising a first data set from the first sample and a second data set from the second sample. The method takes into account both the relative difference of the amounts of the compound in both samples as well as the estimated absolute abundance of the same compound in both samples. The method is particularly applicable to data from a combined chromatography- mass spectrometry analysis, wherein the method is applied to data from the chromatography stage to select the chromatography peaks having the highest rank scores, and then the mass spectrometry data for the selected chromatography peaks is analysed further.

Inventors:
KNORR ARNO (DE)
Application Number:
PCT/EP2012/076244
Publication Date:
July 04, 2013
Filing Date:
December 19, 2012
Assignee:
PHILIP MORRIS PROD (CH)
KNORR ARNO (DE)
International Classes:
G01N30/86; A24B3/00; A24C5/34; G01N30/88; G06K9/62
Domestic Patent References:
WO2007012643A1 2007-02-01
Other References:
PÄR JONSSON ET AL: "High-Throughput Data Analysis for Detecting and Identifying Differences between Samples in GC/MS-Based Metabolomic Analyses", ANALYTICAL CHEMISTRY, vol. 77, no. 17, 1 September 2005 (2005-09-01), pages 5635 - 5642, XP055026079, ISSN: 0003-2700, DOI: 10.1021/ac050601e
CHEN P X ET AL: "Mainstream Smoke Chemical Analyses for 2R4F Kentucky Reference Cigarette", BEITRAEGE ZUR TABAKFORSCHUNG INTERNATIONAL, HAMBURG, DE, vol. 20, no. 7, 1 November 2003 (2003-11-01), pages 448 - 458, XP003024851, ISSN: 0173-783X
DALLUGE J ET AL: "Unravelling the composition of very complex samples by comprehensive gas chromatography coupled to time-of-flight mass spectrometry - Cigarette smoke", JOURNAL OF CHROMATOGRAPHY, ELSEVIER SCIENCE PUBLISHERS B.V, NL, vol. 974, no. 1-2, 18 October 2002 (2002-10-18), pages 169 - 184, XP004387548, ISSN: 0021-9673, DOI: 10.1016/S0021-9673(02)01384-5
V. V. MIHALEVA ET AL: "Automated procedure for candidate compound selection in GC-MS metabolomics based on prediction of Kovats retention index", BIOINFORMATICS, vol. 25, no. 6, 28 January 2009 (2009-01-28), pages 787 - 794, XP055020002, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/btp056
PIERCE ET AL: "Recent advancements in comprehensive two-dimensional separations with chemometrics", JOURNAL OF CHROMATOGRAPHY, ELSEVIER SCIENCE PUBLISHERS B.V, NL, vol. 1184, no. 1-2, 28 February 2008 (2008-02-28), pages 341 - 352, XP022503348, ISSN: 0021-9673, DOI: 10.1016/J.CHROMA.2007.07.059
Attorney, Agent or Firm:
MASCHIO, Antonio (Southampton Hampshire SO15 2ET, GB)
Claims:

1. A method of analysing data obtained from a first sample and a second sample using the same chemical analysis technique, the data comprising a first data set from the first sample and a second data set from the second sample, comprising the steps of:

(a) providing a first value L1n of a variable Ln from the first data set and a second value L2n of the same variable from the second data set;

(b) calculating an effect score En for the variable Ln based on the formula:

En = k[(L1n - L2n)/(L1n + L2n)] * 100

(c) calculating a rank score Rn for the variable Ln based on the formula:

Rn = k[En^3/1000] * [L1n + L2n]/2, wherein k is a constant;

(d) repeating steps (a) to (c) for a plurality n of variables; and

(e) comparing the rank scores for each of the variables.

2. The method of claim 1, wherein the data comprises a series of peaks at n positions along an axis, and the values L1n and L2n are measures of the magnitudes Ln of first and second peaks having the nth position on the axis in the first and second data respectively.

3. The method of claim 2, wherein L1n is the integrated area under a peak in the first data set and L2n is the integrated area under a peak in the second data set, for the variable Ln.

4. The method of any one of the preceding claims, comprising selecting the variables Ln to be analysed by comparing L1n and L2n, and calculating the effect score for those variables where | L1n - L2n | exceeds a threshold value.

5. The method of any one of claims 1 to 3, comprising prior to step (b), applying a t-test and excluding variables which are not significantly different based on a threshold value.

6. The method of any one of the preceding claims, wherein the peaks represent concentrations of chemical compounds in the samples, and L1n and L2n are concentrations of a particular chemical compound in the first and second samples respectively.

7. The method of any one of the preceding claims, wherein the chemical analysis is a chromatography stage in a combined chromatography- mass spectrometry analysis, and the method further comprises selecting the chromatography peaks having the highest rank scores, and analysing the mass spectrometry data for the selected chromatography peaks.

8. The method of claim 7, wherein analysing the mass spectrometry data comprises identifying the chemical compounds corresponding to the selected chromatography peaks.

9. The method of claim 7 or 8, wherein the data is two dimensional gas chromatography- time-of-flight mass spectrometry (GC-GC-TOF) data, or liquid chromatography-high resolution mass spectrometry (LC-HR-MS) data.

10. The method of any one of the preceding claims, comprising ranking the variables based on the rank scores.

11. The method of any one of the preceding claims, where the chemical analysis is a non-targeted assay of a sample comprising a mixture of chemical entities.

12. A computer program comprising code means which, when run on a computer processor, cause the processor to carry out the method of any one of the preceding claims.

Description:
A METHOD OF ANALYSING DATA FROM CHEMICAL ANALYSIS

Field of the Invention

The present invention relates to a method for comparing and analysing the results of a chemical analysis of two samples.

Background to the Invention

A number of different chemical analysis techniques are known for analysing chemical or biological samples. In spectroscopy, the interaction of a sample with electromagnetic radiation of various wavelengths is measured. In chromatography, a sample is separated into its component chemicals by their different mobility in a medium, such as gas chromatography, wherein the sample is vaporised and the components in gas phase travel at different speeds within the chromatographic matrix and are separated in a chromatography column. In mass spectrometry, a sample is treated to create ionised fragments of the component molecules, which are then separated by their mass-to-charge ratios and detected.

The results of such forms of chemical analysis are often displayed graphically in the form of a series of peaks. For example, in spectroscopy, the peaks will show absorption by the sample of radiation at particular wavelengths. In chromatography, the peaks will show concentrations or amounts of sample having particular retention times in the column. In mass spectrometry, the peaks will represent the number of fragments detected at specific mass-charge ratios.

Chemical analysis can be used in a targeted manner, to identify particular chemical compounds in a single sample. In this case, analysis of the data may require identification of peaks or troughs at particular points in the data spectrum.

However, such chemical analysis techniques may also be used in a comparative, non-targeted manner. For example, a chemical analysis technique may be used to compare two samples. These may be different samples of a product in which a change has been made in preparation, storage, treatment or a combination of the foregoing, but the effect of that change is previously unknown or uncharacterised. In this case, the aim is to focus on differences between the results represented in data sets obtained from products with the change and products without the change. The first stage might be to identify the most significant differences between the two data sets, with a second stage being to identify the particular chemical compounds that are responsible for those differences.

In cigarette product development, such comparative analysis is often carried out to determine the effect of a change in the product, such as use of a different manufacturing technique, different tobacco curing methods, storing the products at different environments for varying periods of time, use of a different blend of tobacco, the inclusion of an additive, a change in a cigarette component including material or its construction (e.g., a filter), heating the product at a different temperature over varying periods of time, or a combination of the foregoing, on the chemical composition of the smoke.

A particular type of chemical analysis commonly used is a combination of a chromatography stage and a mass spectrometry stage. Two particular examples are two dimensional gas chromatography- time of flight mass spectrometry (GCXGC-TOF) and liquid chromatography- high resolution mass spectrometry. A sample is first separated into its constituent chemicals in a chromatography stage, and each fraction is then subjected to a mass spectrometry analysis. In the case of two dimensional gas chromatography a second chromatography stage uses another column with a stationary phase of a different selectivity to further separate chemicals that elute from the first column at the same retention time.

Clearly, when a sample such as cigarette smoke is analysed, the results of the chemical analysis can be very complicated and include large amounts of data. Therefore, in a case when two samples are being compared, a method of analysing the data to determine the most significant and relevant differences is desirable.

Summary of the Invention

The present invention provides a method of analysing data obtained from a first sample and a second sample using the same chemical analysis technique, the data comprising a first data set from the first sample and a second data set from the second sample, comprising the steps of:

(a) determining a first value L1n of a variable Ln from the first data set and a second value L2n of the same variable from the second data set;

(b) calculating an effect score En for the variable Ln based on the formula:

En = k[(L1n - L2n)/(L1n + L2n)] * 100

(c) calculating a rank score Rn for the variable Ln based on the formula:

Rn = k[En^3/1000] * [L1n + L2n]/2, wherein k is a constant;

(d) repeating steps (a) to (c) for a plurality n of variables; and

(e) comparing the rank scores for each of the variables.

According to the invention, the Rank formula provides a reliable method of selecting the variables that have the most significant difference between the two data sets, and thus allows further analysis to concentrate on those particular variables. The formula takes into account the relative difference between the observed values of a variable, as reflected by the Effect score, and the abundance of the variable in question in the two samples, expressed as the "average value" or [L1n + L2n]/2. The variable may be any property that is being measured by the chemical analysis in both data sets. For example, in chromatography, the variable may be the magnitude of a peak at a specific retention time, and a plurality of variables can be the magnitudes of peaks at various selected retention times. In spectroscopy, the variable may be the magnitude or intensity of an absorption peak at a particular wavelength or frequency, and the plurality of variables are the magnitudes or intensities of peaks at different selected wavelengths or frequencies.

In particular the data set may comprise a series of peaks at n positions along an axis, and the values L1n and L2n are measures of the magnitudes Ln of first and second peaks having the nth position on the axis in the first data set and second data set respectively.

Thus the Rank formula allows the selection of peaks showing the most relevant difference(s) between the two data sets. The analysis can therefore identify the result of changes between the two samples. Preferably the values are the integrated areas under the peaks. In most forms of chemical analysis, particularly chromatography, the area under a peak most accurately reflects the concentration of the chemical entity responsible for the peak. The actual values from the data may be converted to concentrations by reference to a reference peak in the data, resulting from inclusion of a reference compound in the sample. Preferably the variables Ln to be analysed are first selected by comparing L1n and L2n, and calculating the effect score for those variables where | L1n - L2n | exceeds a threshold value. Preferably, the selection is made by applying a t-test for each pair of variables in the two data sets and excluding variables which result in a t score greater than a threshold value given a p value. This initial statistical filtering step removes variables that are not significantly different between the two samples.

Preferably the peaks represent concentrations of chemical compounds in the samples, and L1n and L2n are concentrations of a particular nth chemical compound in the first and second samples respectively. Preferably, the chemical analysis is a chromatography stage in a combined chromatography-mass spectrometry analysis, and the method further comprises selecting the chromatography peaks having the highest rank scores, and analysing the mass spectrometry data for the selected chromatography peaks. Further analysis can then focus on those peaks that constitute the most relevant and significant differences between the data sets. For example, in a combination chromatography-mass spectrometry technique, if the most significant differences can be identified in the chromatography data, then further analysis can be conducted only on the mass spectrometry data corresponding to the identified peaks in the chromatography data. Thus, in a combined chromatography-mass spectrometry analysis, the Rank formula can be used to select the peaks showing most difference between the chromatography data sets. Each peak corresponds to a chemical compound, which can be identified by reference to the mass spectrometry data. Therefore, the present invention reduces the amount of data that must be processed, and therefore increases speed of processing by focussing on differences in the chromatography spectra.

In a non-targeted differential screening assay, where the method of the invention can be applied to analyse the chemical differences between two samples, compounds from the two samples are ranked individually according to their relevance, considering the relative differences in abundance of each of the compounds as well as the quantitatively or semi-quantitatively determined abundance of each in the respective samples.

Brief Description of the Drawings

Examples of the invention will now be described with reference to the accompanying drawings, in which:

Figure 1 is a flow diagram of the method;

Figure 2 illustrates graphically the allocation of HIT values; and

Figure 3 shows the correlation of HIT values in GCXGC-TOF data by using the invented Rank procedure compared to the results from a common approach using PLS-DA.

Detailed Description

Figure 1 is a flow diagram illustrating the method of the present invention used to analyse data obtained from a chemical analysis technique, particularly with reference to a combined chromatography-mass spectrometry analysis technique such as two dimensional gas chromatography-time of flight mass spectrometry (GCXGC-TOF) or liquid chromatography-high resolution mass spectrometry (LC-HR-MS).

In data generation step 100, a first sample and a second sample are subjected to the same chemical analysis technique to obtain a first data set and a second data set.

In the specific example wherein the data is obtained from a combined chromatography-mass spectrometry analysis technique, each data set is represented as a series of chromatography peaks, representing amounts of chemical entities that are eluted at certain retention times. The data set further includes mass spectrometry data for the fractions which have been separated in the chromatography stage. The chromatography data may itself include two dimensions, in the case where the chromatography stage is two dimensional gas chromatography.

In the peak alignment step 200, the data from the two samples is compared and corresponding peaks aligned to produce a consistent data matrix. In the case of a combined chromatography-mass spectrometry data analysis, the chromatography data is compared and corresponding peaks in the two data sets aligned based on mass-spectral similarity and a chromatographic property, such as retention time or retention index. Various existing software packages can be used for this, such as ChromaTOF for GCXGC-TOF and MZmine for LC-HR-MS.

In peak integration step 300, the aligned peaks of each data set are integrated to calculate the area under each peak, which is a measure of the concentration of the chemical entity in the sample that contributes to the peak. In general, a chromatography peak represents the totality of any substances having the same retention time, and can thus represent a mixture of chemical compounds. However, in high precision chromatography techniques such as GCXGC, or LC-HC, a peak will represent a single chemical compound in the samples.

In peak normalisation step 400, the peaks are normalised. In chromatography, to determine concentrations of chemical compounds in a sample, the peaks corresponding to the chemical compounds may be compared to a peak from an internal standard compound which is included at a known concentration with both the samples. An example of such an internal standard is ds-isophorone.
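The normalisation against an internal standard can be sketched as follows. This is a minimal illustration, not the procedure used in the source: it assumes a response factor of 1 (equal detector response per unit concentration for analyte and standard), and the function name `semiquantify` and all numbers are hypothetical.

```python
def semiquantify(peak_area, standard_area, standard_conc):
    """Estimate a concentration from a peak area relative to an internal standard.

    Assumption: response factor of 1, i.e. analyte and standard give the
    same detector response per unit concentration.
    """
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return peak_area / standard_area * standard_conc


# Illustrative numbers only (not taken from the source).
areas = {"peak_17": 4.0e6, "peak_42": 1.0e6}
standard_area = 2.0e6   # integrated area of the internal standard peak
standard_conc = 5.0     # known concentration of the standard, e.g. in ug/cig.

concentrations = {
    name: semiquantify(area, standard_area, standard_conc)
    for name, area in areas.items()
}
print(concentrations)
```

Because the same standard is added at the same known concentration to both samples, values normalised this way can be compared directly between the two data sets.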

In data filtering step 500, the data is filtered statistically to remove corresponding peaks from both data sets that are not visibly or meaningfully different. This step comprises applying a statistical technique, for example the t-test, to compare the determined values of the same variable (or same peak representing a chemical entity) in the two samples. Thus, for each sample, multiple determinations of the value of each variable (or peak) are made. In the present invention, this means that each sample is subjected to the same chemical analysis technique multiple times, i.e., n1 and n2 times, to obtain replicated data sets for the same sample. The number of replications for each sample can be the same or different. A low number of repetitions n1 and n2, such as 3 to 5, is contemplated. The t-test is a statistical technique that will be well known to the person skilled in the art, but is outlined below. In the context of the explanation below, for sample 1 and sample 2, replicated data sets for each sample are generated experimentally. For each variable or peak, the replicated data from each sample are compared pairwise. For example, where the chemical analysis for each sample is made in triplicate, i.e., n1 = 3 and n2 = 3, and given a particular peak, the comparison involves two replicated sets of data each comprising three determined values. Such a matched pairwise comparison is made for every variable (or peak) in the data sets.

The t-test can be computed conveniently, for example by using the Microsoft Excel spreadsheet formula:

t-test (dataset L1, dataset L2, tails, type)

where:

tails = 2; two-tailed distribution

type = 3; heteroscedastic

L1: first paired data set comprising replicated data

L2: second paired data set comprising replicated data

To test if the means of both data sets are different, the t-test with the following test statistic was used:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

where

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

s1 and s2 are the standard deviations of the first data set and the second data set, and n1 and n2 are the numbers of observations for each data set. The test statistic was approximately t-distributed with the degrees of freedom (D.F.) calculated using:

$$\mathrm{D.F.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)^2/(n_1 - 1) + \left(s_2^2/n_2\right)^2/(n_2 - 1)}$$

A table of Student's t-distribution confidence intervals can be used to determine the significance level at which two distributions differ. Depending on the chemical analysis technique, a value of p at 0.05 or at 0.1 can be used to determine whether statistically the peaks from the two samples are not significantly different, and are thus excluded from further analysis. The remaining pairs of peaks may be numbered from 1 to N. The calculated concentration corresponding to the area of the nth peak can be represented as L1n in the first data set and L2n for the same peak in the corresponding nth position in the second data set.
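The filtering step can be sketched in a few lines of Python. This is an illustrative sketch of the unequal-variance (heteroscedastic) t-test only: the function names `welch_t` and `is_significant` are hypothetical, the replicate values are made up, and the critical value is assumed to be looked up by the caller in a Student's t table for the computed degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return (t, degrees_of_freedom) for the unequal-variance t-test.

    Note: raises ZeroDivisionError if both samples have zero variance.
    """
    n1, n2 = len(sample1), len(sample2)
    # Sample variances (n - 1 denominator) divided by the sample sizes.
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def is_significant(sample1, sample2, t_critical):
    """Keep a peak only if |t| exceeds the caller-supplied critical value."""
    t, _ = welch_t(sample1, sample2)
    return abs(t) > t_critical


# Triplicate determinations of one peak in each sample (made-up values).
first = [10.0, 11.0, 12.0]
second = [20.0, 21.0, 22.0]
t, df = welch_t(first, second)
print(round(t, 3), round(df, 3))
```

For these made-up triplicates the degrees of freedom come out as 4, for which the two-tailed critical value at p = 0.05 is 2.776, so `is_significant(first, second, 2.776)` keeps the peak. A library routine that returns the p value directly could be used instead.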

In rank score generation step 600, a rank score Rn is calculated for each remaining pair of peaks by application of an empirically developed RANK formula to the t-test filtered data:

Rn = k[En^3/1000] * [L1n + L2n]/2

where L1n and L2n are the concentrations for corresponding nth peaks in the first sample data and the second sample data respectively. The value of L1n and L2n can each be the mean of the replicates. The effect score En is calculated based on the formula:

En = [(L1n - L2n)/(L1n + L2n)] * 100

Thus the RANK formula considers both the relative difference of the abundance of the same compound in both samples as well as the estimated absolute abundance of the compound in both samples (i.e., the greater the difference and absolute abundance, the greater the relevance). The formula comprises a mathematical combination of the difference of the variable (or chemical entity) ("effect" %) and the abundance of the chemical entity (the average concentration = [L1n + L2n]/2). The constant k is simply a scaling factor which may vary depending on the units used for L1n and L2n. Ultimately, the aim is to compare Rank values to each other, so the values can be scaled in a linear way by any constant without affecting the resulting comparison. For example, k can be 1.

Once the Rank values Rn have been calculated for the various peaks, these can be ranked in order of the Rn values, wherein the higher the absolute value of Rn, the more significant the difference in the corresponding peaks between the two samples. The Rank value may be positive or negative, depending on the arbitrary choice of which data set is taken as the "first data set" and which is the "second data set". For example, if L1n is higher than L2n, Rn will be positive, but if L1n and L2n are reversed, Rn will have the same absolute value, but will be negative.
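As a worked illustration of the effect score and rank score, the following sketch evaluates the two formulas of the detailed description with k = 1. The function names and the input values are hypothetical; the inputs are assumed to be positive concentrations, so that L1n + L2n is never zero.

```python
def effect_score(l1, l2):
    """Relative difference between the two values, in percent."""
    return (l1 - l2) / (l1 + l2) * 100


def rank_score(l1, l2, k=1.0):
    """Rank score: cubed effect score scaled by the average abundance."""
    e = effect_score(l1, l2)
    return k * (e ** 3 / 1000) * (l1 + l2) / 2


# Same relative difference (effect score of 50 %), tenfold different abundance.
low = rank_score(3.0, 1.0)      # low-abundance compound
high = rank_score(30.0, 10.0)   # high-abundance compound
# Swapping the two samples flips the sign but keeps the magnitude.
swapped = rank_score(10.0, 30.0)
print(low, high, swapped)
```

With these inputs the scores are 250, 2500 and -2500: two compounds with an identical relative change are separated by their abundance, and reversing the roles of the samples changes only the sign.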

As mentioned above, in high precision chromatography techniques such as GCXGC, or high mass-resolution LC-MS, a peak (in the total ion current for GCxGC or in a well defined mass trace in the case of high resolution LC-MS) will represent a single chemical compound in the samples. Therefore, the RANK formula allows the recognition of chromatography peaks with the most relevant differences between two samples, wherein each of these peaks represents a chemical compound in the samples. Therefore, in selection step 700, a set of peaks is selected based on the Rank values, for further analysis. These might be, for example, the top N Rank values (absolute values), or they may be all peaks having an absolute Rank value higher than a threshold. The pairs of peaks may be allocated a HIT value, which is simply a ranking allocated by placing the Rank values in order. Therefore, the highest Rank value is allocated a HIT value of 1, the second highest is allocated a HIT value of 2, and so on. For negative Rank values, the negative value having the highest absolute value is allocated a HIT value of -1, the second highest absolute value is allocated a HIT value of -2, and so on. Figure 2 illustrates graphically the allocation of HIT values to pairs of peaks.
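The allocation of HIT values can be sketched as below. The names `assign_hit_values` and `select_top` are hypothetical, the rank scores are made up, and two choices are assumptions of this sketch rather than statements from the source: zero rank scores are omitted, and the selection keeps the top n peaks on each side (positive and negative) rather than the top N by absolute value overall.

```python
def assign_hit_values(rank_scores):
    """Map each peak name to a HIT value derived from its rank score.

    Positive scores are numbered 1, 2, ... from the largest downwards;
    negative scores are numbered -1, -2, ... from the largest absolute
    value downwards. Zero scores are left out in this sketch.
    """
    positives = sorted(
        (name for name, r in rank_scores.items() if r > 0),
        key=lambda name: rank_scores[name],
        reverse=True,
    )
    negatives = sorted(
        (name for name, r in rank_scores.items() if r < 0),
        key=lambda name: rank_scores[name],
    )
    hits = {name: i for i, name in enumerate(positives, start=1)}
    hits.update({name: -i for i, name in enumerate(negatives, start=1)})
    return hits


def select_top(hits, n):
    """Return the peaks whose HIT value lies within +/- n."""
    return sorted(name for name, h in hits.items() if abs(h) <= n)


ranks = {"A": 2500.0, "B": 250.0, "C": -40.0, "D": -900.0}
hits = assign_hit_values(ranks)
print(hits)
print(select_top(hits, 1))
```

Here peaks A and B receive HIT values 1 and 2, while D and C receive -1 and -2, so `select_top(hits, 1)` returns A and D for further mass spectrometric identification.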

In identification step 800, the peaks selected in the selection step 700 are chemically identified. This can be performed by reference to the results of a further chemical analysis step. In particular, in a combined chromatography- mass spectrometry system, the mass spectrometry data for the selected peaks can be analysed to identify the chemical compounds responsible for the chromatography peaks. This analysis of the mass spectrometry data may be performed manually, but preferably the analysis involves a computer program which matches the mass spectrometry data with reference data in a mass spectra library.

The method of the present invention is preferably implemented in the form of a computer program which may be run on a computer system.

Examples

The method of the invention was verified by analysing smoke samples from a combusted reference cigarette (2R4F) and by fortifying the same reference cigarette smoke samples with known amounts of 10 selected compounds. These fortified smoke samples contain different known absolute concentrations and different known relative differences in concentration for the selected compounds, among thousands of compounds that are present in the reference cigarette and remain unchanged. The non-targeted differential screening assay using GCxGC-TOF consists of 2 analytical methods, 1 for nonpolar compounds and 1 for polar compounds. The precision and accuracy of data acquisition, data processing, and data evaluation were determined by comparing the theoretical ranking, calculated from the fortified concentrations of the selected compounds, and the experimentally determined ranking. The Reference Cigarette 2R4F was purchased from the University of Kentucky, Kentucky Tobacco Research and Development Center. The cigarettes were conditioned following ISO standard 3402 (1999). The samples were generated on a 20-port Borgwaldt smoking machine RM20H according to ISO standard 3308 (2000). Total particulate matter (TPM) of a 2R4F sample was fortified with standard solutions containing 10 compounds each. The standard solutions for fortification were prepared in 4 different compositions (4 fortification mixtures), resulting in 4 different fortification levels on TPM for each of the 2 analytical methods. The 4 fortification levels covered a concentration range of 1 μg/cig. to 30 μg/cig. with a maximum difference in concentration of 30-fold. Fortification level 1 (TPM level 1) was compared against fortification level 2 (TPM level 2), and fortification level 3 (TPM level 3) was compared against fortification level 4 (TPM level 4) (Table 1).

The smoke samples were generated by trapping the TPM on a glass fiber filter, followed by an extraction with a dichloromethane:acetone mixture (80 : 20, v/v). Aliquots of these TPM extracts were used for the analyses. Every sample was analyzed in triplicate.

Table 1

GCxGC-TOF Nonpolar

Aliquots of the TPM extracts were fortified with internal standards. Water was added to the extracts in equal volume amounts, then the sample was shaken and centrifuged. The dichloromethane layer was separated, dried with sodium sulfate, and analyzed by GCxGC-TOF in full scan mode. Processing of the raw data was performed using the LECO ChromaTOF software. The processing included the following steps:

(1) Generation of the reference peak list for the comparison of the whole sample set

- Processing of a reference sample from group 1 (TPM level 1)
  o computing of the baseline
  o finding peaks above the baseline
  o identifying peaks by library search
  o integration of the peaks (area, height)
  o calculation of RI based on KOVATS Indices
  o define classification regions (exclusion of, e.g., bleed, high abundant compounds triacetine, nicotine, tailing of high abundant fatty acids)
  o check and update RI method and reprocess datafile with actual method
  o declare processed and flagged data as "calibration"
  o delete false positive peaks, e.g., tailing of peaks or column bleed
  o select quantitation parameters

- Processing of reference samples from group 2, group 3, and group 4 (TPM levels 2, 3, and 4) against so-called "calibration" prepared from group 1
  o computing of the baseline
  o finding peaks above the baseline
  o identifying peaks by library search
  o integration of the peaks (area, height)
  o calculation of retention index based on KOVATS Indices

- Accumulating reference peak list by adding all peaks that were not found in group 1 (so-called "unknowns")

- Preparing reference peak list for semiquantitative calibration
  o select calibration curve 1st order type with "force origin" for all peaks
  o select 1st/2nd dimension RI deviation window = 12/0.3 for all peaks
  o set concentration = 1 for all peaks
  o maximum valid concentration = 1000000, minimum valid conc. = 0
  o calculation of standard

(2) Comparison of the whole sample set against the reference peak list and export data

- Process all samples against reference peak list "calibration" with the following parameters:

  Method for Nonpolar Components
  o signal/noise = 100 and peakwidth = 0.06
  o signal/noise = 250 and peakwidth = 0.11
  o signal/noise = 250 and peakwidth = 0.20

  Method for Polar Components
  o signal/noise = 100 and peakwidth = 0.06
  o signal/noise = 250 and peakwidth = 0.11
  o signal/noise = 250 and peakwidth = 0.20

- Each processing block contains:
  o computing of the baseline
  o finding peaks above the baseline
  o calculation of retention index based on KOVATS Indices
  o quantifying all compounds in the reference peak list for each sample
  o export all peak information in ASCII CSV file format

Semiquantification of Compounds Using MS-EXCEL Software

The calculation of peak areas (integration) for large amounts of different compounds is a critical step due to the different chromatographic behaviour of individual compounds. In order to enhance the quality of the integration process, the samples were processed 3 times using different peak integration parameters as described, and the maximum value of the integration results for each component was calculated. For semiquantification, each compound was referred to 1 of the internal standards. Every internal standard was dedicated to a certain compound class.
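Taking the maximum integration result over the three processing runs can be sketched as follows. The function name `best_integration`, the compound labels and the areas are hypothetical, and treating a compound that is missing from a run as having area 0 is an assumption of this sketch.

```python
def best_integration(results_per_run):
    """Take, per compound, the maximum area over all processing runs.

    Each run is a dict mapping compound name to integrated peak area.
    A compound not found in a run counts as area 0.0 for that run.
    """
    compounds = set().union(*results_per_run)
    return {
        c: max(run.get(c, 0.0) for run in results_per_run)
        for c in compounds
    }


# Three runs with different peak integration parameters (made-up areas).
runs = [
    {"cmpd_1": 120.0, "cmpd_2": 40.0},
    {"cmpd_1": 150.0, "cmpd_2": 38.0},
    {"cmpd_1": 140.0},  # cmpd_2 not found with these settings
]
best_areas = best_integration(runs)
print(best_areas)
```

The resulting per-compound areas would then be referred to the internal standard assigned to the compound class in question.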

Extraction of Significantly Different Compounds Using MS-EXCEL Software

The extraction of significantly different compounds between the different groups was done by applying the t-test on the data set (2 groups, 3 replicates, i.e., 6 observations/variable).

t-test (dataset Lx, dataset Ly, tails, type)

tails = 2; two-tailed distribution

type = 3; heteroscedastic

Lx: Measured values of level or group to be compared with Ly

Ly: Measured values of level or group to be compared with Lx

Comparisons where p > 0.05 were not considered to be statistically significantly different. Therefore, these compounds were excluded from further analysis.

Sorting of Compounds by Relevance According to Rank Parameter

The sorting of significantly different compounds by their relevance was done by applying an empirically developed formula ("rank") on the t-test filtered data set. This RANK formula mathematically combines 2 criteria:

- difference of the variable ("effect" [%])

- abundance of the variable ("average concentration" [μg/cig.]).

RANK = [(Effect)^3/1000] x [Average Concentration]

Average Concentration = (Lx+Ly)/2

Effect = (Ly-Lx)/(Ly+Lx) * 100

The data set was divided for positive (Lx > Ly) and negative (Lx < Ly) rank values and sorted by increasing absolute rank values for the positive as well as the negative effect.

According to the procedure described, the theoretical rank values were calculated for the fortification matrix. Then, the compounds were sorted by relevance according to their theoretical rank values.

Table 2 shows the theoretical Rank values and theoretical HIT numbers for the GCxGC-TOF nonpolar method as the chemical analysis technique. The HIT values simply place the ten Rank values in order of magnitude, from -5 to +5.

Table 2

After applying the described procedure for the determination of chemical differences to the measured data sets, the result table gave the experimentally found HIT numbers for the fortified compounds. By comparing the experimentally found HIT numbers to the theoretical HIT numbers, the method's ability to perform an appropriate ranking of compounds in fortified matrix samples differing in fortified concentrations was verified. This was done by correlating the theoretical reciprocal HIT numbers against the experimentally found reciprocal HIT numbers. The correlation was done by using the Pearson correlation coefficient.

x: theoretical reciprocal HIT numbers

y: experimentally found reciprocal HIT numbers
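The correlation of reciprocal HIT numbers can be sketched as below, using the standard definition of the Pearson correlation coefficient. The function names and the HIT numbers are hypothetical and do not reproduce the data of the tables.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def reciprocal(hit_numbers):
    """Reciprocal HIT numbers; HIT values are never zero."""
    return [1.0 / h for h in hit_numbers]


# Made-up HIT numbers for five compounds; two neighbouring hits are swapped.
theoretical = [1, 2, 3, -1, -2]
experimental = [1, 3, 2, -1, -2]
r = pearson(reciprocal(theoretical), reciprocal(experimental))
print(round(r, 4))
```

Using reciprocals gives the top-ranked compounds (HIT values near ±1) the largest weight, so a swap among lower-ranked hits, as in this example, reduces r only slightly.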

Table 3 shows the results for the GCxGC-TOF nonpolar method as the chemical analysis technique.

Table 3

For this non-limiting example, the correlation coefficient for the reciprocal theoretical HIT numbers and the experimentally found reciprocal HIT numbers must be >0.98. Other thresholds may be applicable depending on the application.

The same procedure was applied to the samples using the polar method. The correlation coefficients for the reciprocal theoretical HIT numbers and the experimentally found reciprocal HIT numbers were:

- nonpolar method: r = 0.9931 with significance level p≤0.05

- polar method: r = 0.9569 with significance level p≤0.05; r = 0.9847 with significance level p≤0.1

The specificity and selectivity of the assay were shown by comparing the experimentally found HIT numbers of the relevant chemical differences to the theoretical HIT numbers. The specificity and selectivity of the assay were sufficient to extract the fortified compounds and correlate the theoretical reciprocal HIT numbers and the experimentally found reciprocal HIT numbers with a predefined correlation coefficient, >0.98 for the nonpolar method.

The polar method showed less specificity/selectivity, resulting in less significant results than the nonpolar method. The p-value of the t-test was adapted to p < 0.1 for the polar method to enable the extraction of the fortified compounds and the correlation of the theoretical reciprocal HIT numbers and the experimentally found reciprocal HIT numbers with a correlation coefficient of >0.98.

Figure 3 shows a correlation in HIT values in GCXGC-TOF data by using the invented Rank procedure compared to the results obtained from partial least square-discriminant analysis (PLS-DA), a commonly applied method for this type of analysis.

In summary, the RANK formula provides a method by which differences in two corresponding spectra from a chemical analysis can be compared, and the differences ranked by mathematical modelling. The model has been generated based on expert chemical knowledge and generates a numerical reflection of the relevance of a found difference within a comparative chemical assay.

The ability to quickly identify significant differences between two data sets allows further analysis to concentrate on those differences, thus increasing the speed of data analysis and processing.