

Title:
PREDICTIVE CHROMATOGRAPHY OF ORGANIC PLANT EXTRACTS
Document Type and Number:
WIPO Patent Application WO/2021/206570
Kind Code:
A1
Abstract:
The present disclosure describes a method and system for predicting the phytochemical composition of a plant extract, as would otherwise be determined through tedious laboratory procedures, from the time-series sensor data of the environmental conditions in which the plant grew and the laboratory conditions in which the extraction would take place. The efficient encoding of the relational patterns between the laboratory-determined chromatographic profile of an extract (the output) and the time-series sensor data of environmental conditions and laboratory specifications (the input) is necessary for the standardization of herbal formulations.

Inventors:
JUANICO DRANDREB EARL (PH)
BACONG JUNELLE REY (PH)
NONAT PAUL VINCENT (PH)
AGUEL NIEL JON CARL (PH)
Application Number:
PCT/PH2020/050004
Publication Date:
October 14, 2021
Filing Date:
April 05, 2020
Assignee:
JUANICO DRANDREB EARL (PH)
International Classes:
G16B40/00; G06Q50/02; G16B40/10; H04W84/18
Foreign References:
KR20130132214A 2013-12-04
US20040034477A1 2004-02-19
US20130006401A1 2013-01-03
Other References:
WANG ZHIGUANG, OATES TIM: "Imaging Time-Series to Improve Classification and Imputation", PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI 2015), 31 May 2015 (2015-05-31), pages 3939 - 3945, XP055865088, Retrieved from the Internet [retrieved on 20211124]
Claims:
CLAIMS

1. A method for predicting the phytochemical chromatographic profile of organic plant extracts comprising: continuously collecting datasets on two or more of environmental and laboratory parameters, relevant to the phytochemical composition of plant extracts, through a network system of remote sensing instruments; preprocessing of the dataset consisting of the environmental and laboratory data through a data embedding and encoding procedure (DEEP); and encoding of the relational patterns hidden within the dataset through a data-encodified learning type-algorithm (DELTA); wherein the predictive model resulting from DEEP and DELTA are integrated with the network system of remote sensing instruments in accordance with a processor and a memory.

2. The method of claim 1, wherein the two or more of environmental parameters are one or more of soil parameters, such as pH, moisture, salinity, nutrient content, and one or more of atmospheric parameters, such as temperature, humidity, ambient light, and carbon dioxide concentration.

3. The method of claim 1, wherein the two or more of laboratory parameters are two or more specifications for the extraction, liquid chromatography, gas chromatography, high-performance liquid chromatography, high-performance thin layer chromatography, ultraviolet spectroscopy, mass spectrometry, or other applicable detection techniques.

4. The method of claim 1, wherein the network system of remote sensing instruments comprises a plurality of interconnected sensors for the continuous and automated, in situ data collection of two or more environmental parameters across an area of vegetation, and two or more laboratory parameters within a laboratory setting.

SUBSTITUTE SHEETS (RULE 26)

5. The method of claim 1, wherein the network system of remote sensing instruments includes a front-end service, an instance of which is a web application, that provides users an interface for performing tasks remotely such as, but not limited to, the visualization and retrieval of historical data from the database server, and calibration of the sensors.

6. The method of claim 1, wherein DEEP comprises the steps of: preparing the sample pairs of time-series sensor data and laboratory-determined chromatographic profile; and converting, through a tempo-spatial transformation, the time-series data series into spatially organized information.

7. The method of claim 6, wherein the tempo-spatial transformation is a Gramian Angular Summation/Difference Field.

8. The method of claim 6, wherein the spatially organized information is a two-dimensional image that can be readily assimilated and processed by a DELTA.

9. The method of claim 1, wherein DELTA is an optimization of the accuracy of predicting a laboratory-determined chromatographic profile from the time-series sensor data.

10. The method of claim 1, wherein the integration of the network system of remote sensing instruments and the predictive model resulting from DEEP and DELTA is a computer program product comprising a non-transitory computer-readable storage medium for storing readable program code, which when executed, causes a processor to predict the phytochemical composition of an extract from an actual plant based only on the relevant time-series sensor data.

11. A system for the continuous and automated, in situ data collection of various environmental parameters across an area of vegetation and laboratory parameters in a laboratory setting, comprising: a plurality of interconnected global environmental parameter sensing instruments (GEPSI) and local environmental parameter sensing instruments (LEPSI), each comprising a variety of probes for sensing two or more environmental parameters and laboratory parameters, wherein the aggregated environmental and laboratory data from the linked instruments are sent to a database server through wired or wireless communication, and wherein a user interfaces with the information contained in the database server to perform tasks such as, but not limited to, visualization and retrieval of historical data, and remote calibration of sensors.

12. The system of claim 11, wherein the GEPSI comprises a variety of probes for sensing global environmental parameters, or those parameters that extend across the entire area in which a plurality of plants inhabit and in which the extraction process is conducted, such as, but not limited to, ambient light, temperature and humidity, or carbon dioxide.

13. The system of claim 11, wherein the LEPSI comprises a variety of probes for sensing local environmental parameters, or those parameters that can be measured from the substrate of the specific site within the diameter of an individual plant's root system such as, but not limited to, soil moisture, pH, nutrient content, and salinity.

14. The system of claim 11, wherein the aggregated environmental and laboratory sensor data from the linked instruments are sent to a database server through wired or wireless communication, thus forming a network system of remote sensing instruments that provides for the real-time collection of global and local environmental parameters.

15. A computer program product comprising a non-transitory computer readable storage medium for storing computer readable program code, which when executed, causes a processor to predict the phytochemical composition of an extract from an actual plant based only on the relevant time-series sensor data.

16. A computer program product comprising a non-transitory computer readable storage medium for storing computer readable program code, which when executed, causes a processor to send sensor data from GEPSI and LEPSI to a database server, manipulate and visualize the collected sensor data in the database server, retrieve historical data, and calibrate the sensors.

Description:
PREDICTIVE CHROMATOGRAPHY OF ORGANIC PLANT EXTRACTS

TECHNICAL FIELD OF INVENTION

The present invention relates primarily to the field of predicting the phytochemical composition of a plant extract, as would otherwise be determined through tedious laboratory steps, from the time-series sensor data of the environmental conditions in which the plant grew and the laboratory conditions in which the extraction would take place.

BACKGROUND ART

Natural products, including plants, have been the most successful source of medicines (Sardana, 2012; Nikam et al., 2012). Each plant species is similar to a factory that is capable of synthesizing highly complex and unusual substances for various medical and non-medical applications (Nikam et al., 2012; Kinghorn, 2002). In fact, there are at least 120 distinct plant-derived chemical substances being used as medicinal drugs in the world, while several other substances from natural products are being modified synthetically to produce even more medicinal drugs (Nikam et al., 2012; Farooqi, 2001).

There is an increasing demand for plant-derived, or herbal, medicines all over the world (British Medical Association, 1993; Rajani and Kanaki, 2012). The complex phytochemical mixtures in herbal medicines have been shown to have advantages over the single molecules that are isolated or synthetically modified from natural sources. For instance, herbal medicines produce fewer toxic side effects when used for general well-being, as well as for treating multifactorial diseases like diabetes, heart diseases, cancer and psychiatric disorders (Rajani and Kanaki, 2012). As a result, it has become essential to prescribe a set of standards, constant parameters, and definitive qualitative and quantitative values that carry an assurance of quality, efficacy, safety and reproducibility of herbal formulations (Kumari, 2016).

The methods of standardization should consider all aspects of quality-controlled herbal formulations, namely correct identification and physical evaluation of the raw herbal materials, sample preparation and phytochemical evaluation, microbial and toxicity testing, as well as testing of biological activity (Rajani and Kanaki, 2012). Of these aspects, the phytochemical evaluation is the most significant since it determines the phytochemical profile, or chromatogram, of the raw herbal material. The chromatograms can be used to identify and quantify the metabolites that would make up the proposed herbal formulation. The advancements in modern methods of chemical analysis like high-performance thin-layer chromatography (HPTLC), gas chromatography (GC), mass spectrometry (MS), high-performance liquid chromatography (HPLC), LC-MS, GC-MS, UV-Vis spectroscopy, LC-UV and GC-UV substantially improve the reliability and accuracy of phytochemical profiling.

Plants are complex mixtures of different compounds, and this complexity is responsible for the synergistic and therapeutic effects of herbal formulations, such as fewer side effects than single-molecule medicinal formulations (Rajani and Kanaki, 2012). Despite stringent quality controls and rigorous laboratory protocols, the standardization of herbal formulations remains a challenge. The difficulty arises from the complex nature of plants and the inherent variability of their phytochemistry. Due to their intrinsic plasticity, plants can adjust their responses to a multitude of biotic and abiotic stresses. Thus, environmental conditions such as temperature, humidity, sunlight, rainfall, and soil conditions, as well as diurnal and seasonal changes, can drive the variability in the phytochemical make-up of raw herbal materials. Therefore, the high degree of adaptability of plants is also the source of the difficulty in standardizing the herbal formulations derived from them.

Various inventions and related processes have already been explored to address the concerns in standardizing herbal formulations. A few of these address the phytochemical standardization of natural products using the data from advanced analytical techniques such as LC-MS, and the insights from other related technologies. US Pat. No. 7,144,740 and European Patent 1340071 disclosed a method of phytochemical profiling, chemical standardization and therapeutic standardization of organic molecules and organometallic molecules from plant, animal or natural or artificial materials for medicinal applications. The goal of that invention is to build a comprehensive and standardized database of phytochemical profiles that could be extracted from natural sources. Although the said prior arts offer valuable solutions to the different challenges of standardizing herbal formulations, their efficacy and reliability still hinge on the quantity and quality of phytochemical profiles, or chromatograms, that could be obtained from tedious compound-detection analyses such as LC-MS and HPLC.

The method of predicting the phytochemical profiles of organic plant extracts from the collected sensor data of various environmental parameters relevant to the phytochemical production in plants is partially disclosed in other prior arts. US Pat. No. 9,013,302 disclosed a method for remotely monitoring the environmental conditions in real time that automates the in situ monitoring and collection of environmental time-series data across an area of vegetation. However, the monitoring is linked to the control of an automated flow meter (US Pat. No. 10,113,895) and controllable irrigation pumps (US Pat. No. 10,473,505). In the present disclosure, the network system collects environmental data that will serve as an input to the machine learning framework that is also disclosed herein. The machine learning framework in the present invention also utilizes the phytochemical profiles gathered from laboratory analysis.

There are some related patents that employ machine learning to classify the phytochemical profiles of organic extracts according to their biological activities. Japan Patent Application JP2019527350A discloses the machine classification of chromatograms of tobacco samples according to their predetermined flavor profile. In the present disclosure, the machine learning method is used instead to predict the extracted phytochemical composition from the relevant time-series sensor data of the environmental conditions in which the plant grew and the laboratory conditions in which extraction would take place.

SUMMARY OF INVENTION

Embodiments described in this disclosure provide a method and system for the prediction of extracted phytochemical composition of a plant from relevant time-series sensor data of environmental and laboratory parameters.

In one exemplary embodiment, the present disclosure relates to a method for predicting the phytochemical chromatographic profile of organic plant extracts comprising: continuously collecting two or more of environmental and laboratory parameters relevant to the extracted phytochemical composition in plants by a network system of remote sensing instruments; preprocessing of the dataset consisting of the environmental and laboratory data through a data embedding and encoding procedure (DEEP); and encoding of the relational patterns hidden within the dataset through a data-encodified learning type-algorithm (DELTA); wherein the predictive model resulting from DEEP and DELTA are integrated with the network system of remote sensing instruments in accordance with a processor and a memory.

In another embodiment, the present disclosure relates to a system for the continuous and automated, in situ data collection of various environmental parameters across an area of vegetation and laboratory parameters in a laboratory setting, comprising a plurality of interconnected global environmental parameter sensing instruments (GEPSI) and local environmental parameter sensing instruments (LEPSI), each comprising a variety of probes for sensing two or more environmental parameters and laboratory parameters, wherein the aggregated environmental and laboratory data from the linked instruments are sent to a database server through wired or wireless communication, and wherein a user interfaces with the information contained in the database server to perform tasks such as, but not limited to, visualization and retrieval of historical data, and remote calibration of sensors.

In yet another embodiment, the present disclosure relates to a computer program product comprising a non-transitory computer readable storage medium for storing computer readable program code, which when executed, causes a processor to predict the phytochemical composition of an extract from an actual plant based only on the relevant time-series sensor data, send sensor data from GEPSI and LEPSI to a database server, manipulate and visualize the collected sensor data in the database server, retrieve historical data, and calibrate the sensors.

In this respect, before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting.

BRIEF DESCRIPTION OF DRAWINGS

Fig. 1 is the workflow of the invention

Fig. 2 is a fully-assembled view of a network system of remote sensing instruments

Fig. 3 is an exploded-view drawing of the Global Environmental Parameter Sensing Instrument (GEPSI)

Fig. 4 is an exploded-view drawing of the Local Environmental Parameter Sensing Instrument (LEPSI)

Fig. 5 is an instance of a webpage displaying the location of the remote sensing devices

Fig. 6 is an instance of a dashboard displaying the real-time measurements of different sensors

Fig. 7 shows instances of the front-end service application

Fig. 8 is an instance of a data table collected from the network system

Fig. 9 is an instance of a preprocessing method of environmental time-series data

Fig. 10 is an example of an image representation of the five environmental time-series data

Fig. 11 shows an analytical column and the gradient elution curve

Fig. 12 contains instances of raw and preprocessed chromatographic data obtained from LC-UV analysis

Fig. 13 is the framework for training the machine learning model

Fig. 14 compares instances of predicted chromatograms and corresponding actual chromatograms

Fig. 15 shows an instance of the evaluation of prediction accuracy

DESCRIPTION OF EMBODIMENTS

This section presents embodiments of the invention with further detail on the method and system for predicting the phytochemical composition of a plant extract from the time-series sensor data of the environmental conditions in which the plant grew and the laboratory conditions in which the extraction would take place. The description of the invention applies the following terms and definitions. A skilled reader should recognize that the definition of each term includes examples of possible alternative definitions and that other components may be incorporated into the definition.

“Input-output functional mapping” is the approximate representation of the unknown, hidden relationship between input information and output information, the use of which is to predict the output information given the input information.

“Network system” is an interconnected set of instruments linked together by a computer and communication network.

“Chromatogram” is a graph showing the molecular response of the plant extract obtained through a molecular detection technique, the peaks of which reveal the possible molecular components of the extract.

“Environmental time-series” is a graph of a variable representing a measurable environmental factor versus time.

“Machine learning” is a function approximation technique that seeks the most accurate model, using suitable optimization methods, for predicting output information from given input information.

With reference to Fig. 1, the method includes a series of steps that begins with the gathering of environmental time-series data 101, such as temperature and humidity, ambient light, and soil pH and moisture, using a network system of remote sensing instruments 200 shown in Fig. 2. The collected data undergo a plurality of preprocessing procedures 102 and are then converted to spatial data 103, such as an image. The aforementioned steps may be done in parallel with harvesting and extracting organic plant materials 104, analyzing the extracts 105 in the laboratory, and cleansing the chromatographic data 106 obtained from the analysis. The input acquisition procedure (steps 101, 102, 103) and the output acquisition procedure (steps 104, 105, 106) yield the required datasets for training a suitable machine learning model 107, which can be used for inference of the phytochemical profile 108 of a plant extract.

The environmental variables can be classified into two categories according to their spatial extent: global and local. Global variables are those that extend across the entire area which a plurality of plants inhabit and across the laboratory setting where extraction takes place. Local variables are those that can be measured from the substrate of the specific site within the diameter of an individual plant's root system. The collection of environmental data 101 is facilitated by two remote sensing instruments called the Global Environmental Parameter Sensing Instrument (GEPSI) 201 and the Local Environmental Parameter Sensing Instrument (LEPSI) 202.

The GEPSI 201 comprises a variety of probes for sensing global environmental parameters such as, but not limited to, ambient light, temperature, humidity, and carbon dioxide (CO2). An instance of a probe in GEPSI 201 is an ambient light probe 205, such as a photodetector, that measures light intensity within the surrounding area by converting light energy into an electrical signal. Another instance of a probe in GEPSI 201 is the temperature and humidity probe 204, comprising a capacitive hygroscopic dielectric material and a band gap temperature probe, that collects data on ambient temperature and humidity. Yet another instance of a probe in GEPSI 201 is a CO2 sensor 206 made of a non-dispersive infrared gas detection cell, which functions as a single-wavelength spectrophotometer that detects 4.2 μm infrared radiation and correlates it to the CO2 concentration.

The LEPSI 202 includes a variety of sensors that collect in situ data on local environmental parameters such as, but not limited to, soil pH and soil moisture. An instance of a probe in LEPSI 202 is a pH probe 208 that quantifies the soil pH through a potentiometric difference between two metal electrodes immersed in the soil. An amplifier circuit 401 transforms the signal collected by these metal electrodes into a waveform that is readable by the microcontroller 301. Another instance of a probe in LEPSI 202 is a soil moisture probe 207 that relates the dielectric permittivity of the surrounding medium to its moisture content using capacitive sensing.

A plurality of LEPSI 202 and GEPSI 201 are linked via an expansion port 203 which facilitates data transmission. The aggregated environmental data from the linked instruments are sent to a database server 210 through wired or wireless communication 209. The interconnectivity of LEPSI 202 and GEPSI 201 with the database server 210 forms a network system of remote sensing instruments that provides real-time monitoring of global and local environmental parameters.

A front-end service, an instance of which is a web application, is used as an interface to the information contained in the database server 210. Through this interface, a user can perform tasks such as, but not limited to, visualization and retrieval of historical data, and remote calibration of the sensors. An instance of a web homepage 500 is shown in Fig. 5, which displays the location of the remote sensing devices. Included in this homepage is a map pin 501 that locates the geographical position of the available instruments 502 in the network system. Selecting a specific instrument from the list 502 takes users to a dashboard 600. An instance of a dashboard 600 is shown in Fig. 6, which displays the real-time measurements 601, and visualizes the time-series of measurements 602, of different sensors in a selected device or instrument. In the same instance of the dashboard, a list of options 603 for both calibration and data download can be found.

Instances of a calibration and a download page 700 are shown in Fig. 7. The calibration procedure requires the specific sensor 701 to be calibrated, as well as the input for the real-time voltage measurement 702 of the sensor. The download page offers users two options: downloading historical data from a specific sensor of an instrument 703, and downloading historical data from all the sensors of an instrument 704. An instance of the historical data 800 collected from a GEPSI 201 is shown in Fig. 8. The data table 801 contains the log date and the sensor readings, which constitute the time-series data for each environmental parameter.

The preprocessing of raw time-series data 900 is illustrated in Fig. 9, wherein the collected raw data such as, but not limited to, the relative humidity 901 is transformed and normalized to a bounded dataset 902, typically between 0 and 1, or in percentage as 0 to 100%. An instance of this normalization procedure is the use of technical indicators applied in stock market chart analysis, such as Williams %R and stochastic oscillators. Data augmentation 903, which extends the size and variability of the normalized dataset, may be applied by resampling the raw data at different time scales Δt, among other methods.
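The normalization and augmentation steps described above can be sketched as follows. This is a minimal illustration and not the implementation of the disclosure: it assumes a single reading per time step (so the reading itself stands in for the high, low and close prices that the stock-market indicator normally uses), and the function names `stochastic_k` and `resample_mean`, the window length, and the block-averaging form of resampling are all hypothetical choices.

```python
def stochastic_k(series, window):
    """Rolling stochastic-oscillator value, bounded in [0, 1], per sample.

    Each reading is located within the min-max range of the most recent
    `window` readings (fewer at the start of the series).
    """
    bounded = []
    for t in range(len(series)):
        recent = series[max(0, t - window + 1): t + 1]
        lo, hi = min(recent), max(recent)
        # A flat window has no range; 0.5 is an arbitrary midpoint (assumption).
        bounded.append(0.5 if hi == lo else (series[t] - lo) / (hi - lo))
    return bounded


def resample_mean(series, step):
    """Average consecutive, non-overlapping blocks of `step` samples."""
    blocks = [series[i:i + step] for i in range(0, len(series), step)]
    return [sum(block) / len(block) for block in blocks]


# Hypothetical relative-humidity readings in percent.
humidity = [60.0, 62.0, 61.0, 65.0, 63.0, 58.0]
normalized = stochastic_k(humidity, window=3)
# Augmentation: the same raw series viewed at a coarser time scale.
coarse = resample_mean(humidity, step=2)
```

Multiplying the bounded values by 100 gives the percentage form; Williams %R is the same quantity measured from the top of the window, that is, -100 × (1 - value).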

The normalized temporal sequence 902 is encoded into spatially organized information such as an image 800 via tempo-spatial transformation. An instance of this tempo-spatial transformation is the Gramian Angular Summation/Difference Field (GASF/GADF), which preserves absolute temporal relations through a bijective map to polar coordinates. These images 800 serve as the training input for the machine learning model 107.
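For illustration, a Gramian Angular Field can be computed as sketched below, following the formulation in the cited Wang and Oates reference: each normalized value x is mapped to an angle φ = arccos(x), and the image entries are cos(φi + φj) for the summation field and sin(φi - φj) for the difference field. The function name `gramian_angular_fields` is hypothetical, and the sketch assumes the input has already been scaled to lie within [-1, 1] (values bounded between 0 and 1 satisfy this).

```python
import math


def gramian_angular_fields(x):
    """Return (GASF, GADF) as nested lists for a sequence scaled to [-1, 1]."""
    # Clamp to guard against values that fall marginally outside the
    # arccos domain because of floating-point rounding.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf


# A short, already-normalized sequence yields two 3 x 3 images.
gasf, gadf = gramian_angular_fields([0.0, 0.5, 1.0])
```

A sequence of n samples produces an n × n image, so in practice a long time series would likely be downsampled before encoding to keep the image size manageable.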

With reference to Fig. 1, the harvest and extraction of organic plant extract 104 include a variety of preparatory steps such as, but not limited to, washing the leaves or any part of the plant with water, drying using any heating apparatus like an oven, and grinding the dried plant parts prior to addition of the extracting solvent. The laboratory analysis of the extracts 105 is carried out using Ultra Performance Liquid Chromatography (UPLC) with detection methods such as, but not limited to, ultraviolet (UV) spectroscopy and mass spectrometry. The myriad of metabolites in the extracts are separated and resolved in the UPLC system by optimizing the combinations of the mobile phase with the stationary phase. A gradient of mobile phase mixture of decreasing polarity across an analytical column 1101 ensures that the relevant metabolites, which are indicated by chromatographic peaks or analyte bands in the column, are eluted in an order of decreasing polarity. The elution strength of the mobile phase depends upon its initial polarity and the steepness of its gradient elution curve 1102. System parameters are optimized to find a balance between the resolution of a compound from adjacent peaks and the total analysis time, while maintaining a good capacity/selectivity factor of the chromatographic system.

Liquid chromatography, an instance of which is UPLC, is a standard analysis method that is applicable to a wide variety of plant extracts. For instance, the extracts from Blumea balsamifera leaves were analyzed in a UPLC system equipped with a UV detector, to obtain the chromatogram 1201 consisting of a plurality of peaks. The peaks represent the maximum absorbance by the component metabolites in the extract of radiation at a specific wavelength within the UV-Vis spectrum. Each peak 1202 in the chromatogram 1201, therefore, represents a phytochemical compound, the relative concentration of which is quantifiable through the area under the curve (AUC) within the vicinity of the peak. An external chemical standard, which may or may not be a compound in the extract, but has chemical properties and structure related to the majority of the extract components, is used for the relative quantification of the extract components.
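As a small worked example of quantifying a peak through its AUC, the sketch below integrates a sampled chromatogram over a retention-time window with the trapezoidal rule. The function name `peak_area`, the window bounds, and the sample values are hypothetical; how the window around a peak is chosen is not specified in the disclosure and is left to the caller here.

```python
def peak_area(times, signal, start, end):
    """Trapezoidal area under `signal` over sample intervals inside [start, end]."""
    area = 0.0
    for i in range(1, len(times)):
        # Count only intervals that lie entirely within the window.
        if times[i - 1] >= start and times[i] <= end:
            width = times[i] - times[i - 1]
            area += 0.5 * (signal[i - 1] + signal[i]) * width
    return area


# Hypothetical retention times (minutes) and absorbance values.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
absorbance = [0.0, 1.0, 3.0, 1.0, 0.0]
peak = peak_area(times, absorbance, start=1.0, end=3.0)
total = peak_area(times, absorbance, start=0.0, end=4.0)
fraction = peak / total  # share of the total AUC attributed to this peak
```

Dividing a peak's area by the area of the external standard's peak, instead of by the total, would be one way to express the relative quantification mentioned above; that ratio is an assumption about how the standard is applied.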

The raw chromatogram 1203 obtained from the UPLC-UV analysis may contain unwanted linear trends coming from the shift in the polarity of the mobile phase, as well as unbounded UV absorbance units, to name a few, and thus the need for data cleansing. The cleaning of the raw chromatogram may involve normalization such that the total AUC is equal to 1, detrending, and removal of non-float type data, among others. The cleaned chromatogram 1204 serves as the training output for the machine learning model 107.
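The three cleaning operations named above (removal of non-float data, detrending, and normalization to a total AUC of 1) could look like the following sketch. It is illustrative only: the disclosure does not state how the linear trend is estimated, so subtracting a straight baseline drawn through the first and last samples, and clipping negative residuals to zero, are assumptions, as are the function names.

```python
def trapezoid_area(times, values):
    """Area under a sampled curve by the trapezoidal rule."""
    return sum(0.5 * (values[i - 1] + values[i]) * (times[i] - times[i - 1])
               for i in range(1, len(times)))


def clean_chromatogram(times, raw_signal):
    """Drop non-numeric samples, remove a linear baseline, scale total AUC to 1."""
    ts, ys = [], []
    for t, v in zip(times, raw_signal):
        try:
            t_f, v_f = float(t), float(v)
        except (TypeError, ValueError):
            continue  # skip entries that cannot be read as floats
        ts.append(t_f)
        ys.append(v_f)
    if len(ts) < 2:
        raise ValueError("need at least two numeric samples")

    # Assumed detrending: subtract the line joining the first and last samples.
    slope = (ys[-1] - ys[0]) / (ts[-1] - ts[0])
    flat = [max(0.0, y - (ys[0] + slope * (t - ts[0]))) for t, y in zip(ts, ys)]

    area = trapezoid_area(ts, flat)
    if area == 0:
        raise ValueError("chromatogram has no signal above the baseline")
    return ts, [y / area for y in flat]


# Hypothetical raw trace with one non-numeric entry and a rising baseline.
ts, cleaned = clean_chromatogram([0, 1, 2, 3, 4], [1.0, "n/a", 5.0, 3.0, 2.0])
```

After cleaning, `trapezoid_area(ts, cleaned)` is 1 up to rounding, so chromatograms from different runs become comparable as training outputs.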

With reference to Fig. 13, the framework for training the machine learning model 1300 is an iterative search for the optimal input-output mapping of the preprocessed raw dataset provided by the data embedding and encodification procedure, or DEEP 1302. The optimization focuses on the accuracy of providing the correct output for a given input, which is implemented by tuning the learning parameters in the data encodified learning type-algorithm, or DELTA 1303, according to the performance metrics 1304. DELTA 1303 can be any parametric or non-parametric optimization algorithm applied to machine learning. DEEP 1302 and DELTA 1303 therefore apply a variety of machine-learning techniques and optimization methods, and the combinations thereof 1301, to represent the functional mapping between the input and output dataset. An instance of a machine-learning methodology to implement DEEP 1302 is a convolutional neural network, which takes in images as inputs, followed by subsequent layers of weighted filters, nodes and connections linking the input to the output layer. An instance of DELTA 1303 is RMSProp, which is well known in the artificial-intelligence research literature.

In reference to Fig. 14, the optimal DEEP 1302 emerging from training 107 is used to infer the chromatogram of organic plant extracts from their raw time-series data. The true chromatograms 1401 resulting from tedious laboratory analysis are predicted well by the corresponding chromatograms 1402 inferred by the optimally trained model.

The accuracy of the optimal DEEP 1302 can be evaluated using a variety of statistical measures and assessment methods, an instance of which (Fig. 15) is through filtering the chromatograms using a peak detection threshold 1501, and defining a binary classification of peaks 1502 and not peaks 1503 in the chromatograms. A confusion matrix 1504 can be drawn from matching the relative peaks 1502 and not peaks 1503 in the true 1401 and the inferred 1402 chromatograms.

The results from the confusion matrix 1504 can be summarized by a statistic such as, but not limited to, the Matthews correlation coefficient, which indicates the accuracy of the predictor on a scale of -1 to +1, centered at 0. If the value of this statistic is -1, then the predictor is in total disagreement with the observation. A value of 0 for this statistic implies that the predictor is randomly guessing the peaks of the chromatogram. Finally, a value of +1 represents the ideal case in which the predictor is 100 percent accurate.
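The evaluation described above can be sketched as follows: both chromatograms are thresholded into peak and not-peak labels, the labels are tallied into confusion-matrix counts, and the counts are summarized by MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Labeling every sample point by a simple comparison against the threshold is an assumption, since the disclosure does not detail the peak-matching rule; the function names and sample values are likewise hypothetical.

```python
import math


def peak_mask(chromatogram, threshold):
    """Label each sample as peak (True) or not peak (False)."""
    return [value >= threshold for value in chromatogram]


def confusion_counts(true_mask, pred_mask):
    """Return (tp, tn, fp, fn) from two equal-length boolean masks."""
    tp = sum(t and p for t, p in zip(true_mask, pred_mask))
    tn = sum((not t) and (not p) for t, p in zip(true_mask, pred_mask))
    fp = sum((not t) and p for t, p in zip(true_mask, pred_mask))
    fn = sum(t and (not p) for t, p in zip(true_mask, pred_mask))
    return tp, tn, fp, fn


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical true and inferred chromatograms on a common time grid.
true_chrom = [0.0, 0.9, 0.1, 0.8, 0.0]
pred_chrom = [0.0, 0.7, 0.6, 0.2, 0.0]
counts = confusion_counts(peak_mask(true_chrom, 0.5), peak_mask(pred_chrom, 0.5))
score = matthews_cc(*counts)
```

In this toy case one peak is matched, one is missed and one is spurious, so the score is 1/6: positive, but far from the ideal value of +1.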
