METHOD AND SYSTEM FOR PREDICTING QUANTITATIVE MEASURES OF OIL ADULTERATION OF AN EDIBLE OIL SAMPLE

Document Type and Number:

WIPO Patent Application WO/2020/130947

Abstract:

According to embodiments, a method for predicting quantitative measures of oil adulteration of an edible oil sample is provided. The method includes receiving at least part of a spectral data of the edible oil sample; providing a single prediction model capable of generating a prediction of an adulterant in the edible oil sample, wherein the single prediction model is a non-linear model including a deep neural network; and processing the at least part of the spectral data using the single prediction model. According to further embodiments, a computer readable storage medium including computer readable instructions operable when executed by a computer to predict quantitative measures of oil adulteration of the edible oil sample is also provided. According to yet further embodiments, an apparatus or system for predicting quantitative measures of oil adulteration of the edible oil sample is provided.

Inventors:

LIM JUNLIANG KEVIN (SG)
ELEJALDE-OCHANDINAO UNTZIZU (SG)
NGUYEN THI KIM NGAN (SG)
KINI SHRUTHI GOPALKRISHNA (SG)

Application Number:

PCT/SG2019/050627

Publication Date:

June 25, 2020

Filing Date:

December 20, 2019

Assignee:

WILMAR INTERNATIONAL LTD (SG)

International Classes:

G01N21/35; G06N3/02; G01N21/65; G01N33/03

Foreign References:

CN102636454A	2012-08-15
CN105136735A	2015-12-09
CN103398970A	2013-11-20
CN101995392A	2011-03-30
CN103293234A	2013-09-11
US20180052144A1	2018-02-22

Other References:

GARCIA-GONZALEZ D. L. ET AL.: "Using 1H and 13C NMR techniques and artificial neural networks to detect the adulteration of olive oil with hazelnut oil", EUROPEAN FOOD RESEARCH AND TECHNOLOGY, vol. 219, 12 August 2004 (2004-08-12), pages 545 - 548, XP055721651
GROSELJ N. ET AL.: "The Use of FT-MIR Spectroscopy and Counter-Propagation Artificial Neural Networks for Tracing the Adulteration of Olive Oil", ACTA CHIMICA SLOVENICA, vol. 55, no. 4, 31 January 2008 (2008-01-31), pages 935 - 941, XP055721656
HIRRI A. ET AL.: "FTIR Spectroscopy and PLS-DA Classification and Prediction of Four Commercial Grade Virgin Olive Oils from Morocco", FOOD ANALYSIS METHODS, vol. 9, no. 4, 30 April 2016 (2016-04-30), pages 974 - 981, XP035955791, [retrieved on 20200210], DOI: 10.1007/s12161-015-0255-y
GALTIER O. ET AL.: "Comparison of PLS1-DA, PLS2-DA and SIMCA for classification by origin of crude petroleum oils by MIR and virgin olive oils by NIR for different spectral regions", VIBRATIONAL SPECTROSCOPY, vol. 5, no. 1, 18 January 2011 (2011-01-18), pages 132 - 140, XP027557794, Retrieved from the Internet [retrieved on 20200210]

Attorney, Agent or Firm:

AMICA LAW LLC (SG)

Claims:

CLAIMS

1. A method for predicting quantitative measures of oil adulteration of an edible oil sample, the method comprising:

receiving at least part of a spectral data of the edible oil sample;

providing a single prediction model capable of generating a prediction of an adulterant in the edible oil sample, wherein the single prediction model is a non-linear model comprising a deep neural network; and

processing the at least part of the spectral data using the single prediction model.

2. The method of claim 1, wherein the single prediction model is constructed from a matrix of features, wherein each feature corresponds to an intensity of the spectral data at a specific wave number.

3. The method of claim 1 or 2, wherein the edible oil sample is selected from the group consisting of a peanut oil sample, an olive oil sample, a corn oil sample, a coconut oil sample, a cottonseed oil sample, a palm oil sample, a canola oil sample, a safflower oil sample, a sesame oil sample, a soybean oil sample, a sunflower oil sample, a camellia seed oil sample, a linseed (flaxseed) oil sample, and a sample of oil with a relatively higher proportion of certain fatty acids, including high erucic acid rapeseed oil, low erucic acid rapeseed oil, or high oleic acid sunflower oil.

4. The method of claim 3, wherein the edible oil sample is the peanut oil sample.

5. The method of any one of the preceding claims, further comprising generating the prediction of at least two adulterants in the edible oil sample.

6. The method of claim 5, wherein the prediction of the at least two adulterants are substantially simultaneously generated.

7. The method of claim 5 or 6, wherein the prediction comprises a type and a percentage amount of each of the at least two adulterants.

8. The method of any one of the preceding claims, further comprising prior to receiving the at least part of the spectral data, obtaining the spectral data using Fourier transform near-infrared (FT-NIR), Fourier transform mid-infrared (FT-MIR) or Raman spectroscopy.

9. A computer readable storage medium comprising computer readable instructions operable when executed by a computer to predict quantitative measures of oil adulteration of an edible oil sample, the computer readable instructions configured to perform a method of any one of claims 1 to 8.

10. An apparatus or system comprising:

a receiving unit configured to receive at least part of a spectral data of an edible oil sample;

a memory for storing a single prediction model capable of generating a prediction of an adulterant in the edible oil sample, wherein the single prediction model is a non-linear model comprising a deep neural network; and

a processor configured to access the single prediction model stored in the memory to perform steps of a method of any one of claims 1 to 8 for generating a prediction of quantitative measures of oil adulteration in the edible oil sample.

Description:

METHOD AND SYSTEM FOR PREDICTING QUANTITATIVE MEASURES OF OIL ADULTERATION OF AN EDIBLE OIL SAMPLE

Cross-Reference To Related Application

[0001] This application claims the benefit of priority of Singapore patent application No. 10201811511R, filed 21 December 2018, the content of it being hereby incorporated by reference in its entirety for all purposes.

Technical Field

[0002] Various embodiments relate to a method for predicting quantitative measure of oil adulteration of an edible oil sample and an apparatus or system therefor.

Background

[0003] Prediction of oil adulteration has been widely published. Prior publications have described applications in, for example, extra virgin olive oil and sesame oil.

[0004] Oil adulteration may be determined using different techniques, for example, DNA-based marker screening, rapid screening using FT-NIR (Fourier transform near infrared), FT-MIR (Fourier transform mid infrared) and Raman spectroscopy, or prediction models.

[0005] In DNA-based marker screening, DNA from different plant origins is used to identify adulteration. The advantage of this technique is that DNA-level markers provide irrefutable evidence of adulteration. However, DNA-based marker screening has several drawbacks. For example, this screening technique is a destructive process, and thus samples used for screening cannot be recovered. Further, DNA extraction is a slow process and requires a large sample volume. Most importantly, the models constructed from this type of technology are predominantly qualitative, as it is difficult to quantify spiked oils from DNA.

[0006] Rapid screening using FT-NIR, FT-MIR and Raman spectroscopy uses light energy measured by either transmission or reflection to detect chemical signatures embedded within a sample. In contrast to DNA-based marker screening, this technique is a rapid and non-destructive process. Quantitative models may be built to mine insights from rich datasets. The main disadvantage is that features associated with FT-NIR, FT-MIR or Raman spectroscopy are not direct evidence that a particular compound exists. Thus, the levels of uncertainty and inaccuracy are relatively high.

[0007] Both DNA-marker-based and rapid screening technologies offer a technological platform that provides a feature-rich dataset. However, these techniques do not in themselves offer a way to discriminate or predict outcomes. Thus, technologies that provide data acquisition are paired with modeling techniques so that features may be correlated with the qualitative constitution or the quantitative composition of the samples.

[0008] The most common technique used for predicting oil adulteration is a linear computational method called partial least squares (PLS). For example, PLS1 refers to the situation where PLS is used to model a single quantitative outcome.

[0009] A prior publication has described the use of PLS1 on multiple singular outcomes to simulate prediction of different adulterants. In this publication, four models were built to predict four different adulterants.

[0010] The disadvantage of using PLS1 is that multiple predicted outcomes have to be corroborated, and interpreting the outcome is not straightforward. For example, suppose oil1 is adulterated at 10% into oil-A, and four models are used to predict adulteration by oil1, oil2, oil3 and oil4. Model 1 may give the correct 10% adulteration level, but models 2 to 4 may each also predict a non-zero adulterant level. It is therefore difficult to establish which adulterant was actually spiked, given that there are four models to evaluate.

[0011] The main reason why PLS generally works for a single oil outcome is that adulterating with one oil generates linear features in the spectral data. However, when more than one oil is used as an adulterant, or when there is more than one brand of oil, or when oils are obtained from different geographical locations or through different processing technologies, the signals generated may be non-linear, thus rendering PLS unsuitable for predicting more complex samples.

[0012] Developments have also focused on rapid technologies such as FT-NIR, FT-MIR and Raman spectroscopy, typically with qualitative determination of adulteration. Quantitative determination of adulteration may be modeled using partial least squares (e.g., PLS-DA and PLS1), but in a way that is specific to one adulterant. This process may require careful interpretation of several modeling results to decipher which specific adulterant is present. Errors aggregated over multiple models, together with the human intervention needed to interpret the model results, make calling of adulteration levels impractical in industry use.

[0013] Thus, there is a need for a method and system for predicting quantitative measures of oil adulteration and capturing invariant adulteration information, thereby addressing at least the problems mentioned hereinabove and meeting industry needs for predictions that generalize to blind samples and brands.

Summary

[0014] According to an embodiment, a method for predicting quantitative measures of oil adulteration of an edible oil sample is provided. The method may include receiving at least part of a spectral data of the edible oil sample; providing a single prediction model capable of generating a prediction of an adulterant in the edible oil sample, wherein the single prediction model is a non-linear model including a deep neural network; and processing the at least part of the spectral data using the single prediction model.

[0015] According to an embodiment, a computer readable storage medium including computer readable instructions operable when executed by a computer to predict quantitative measures of oil adulteration of an edible oil sample is provided. The computer readable instructions may be configured to perform the method according to various embodiments.

[0016] According to an embodiment, an apparatus or system is provided. The apparatus or system may include a receiving unit configured to receive at least part of a spectral data of an edible oil sample; a memory for storing a single prediction model capable of generating a prediction of an adulterant in the edible oil sample, wherein the single prediction model is a non-linear model including a deep neural network; and a processor configured to access the single prediction model stored in the memory to perform steps of the method in accordance with various embodiments for generating a prediction of quantitative measures of oil adulteration in the edible oil sample.

Brief Description of the Drawings

[0017] In the drawings, like reference characters generally refer to like parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the following drawings, in which:

[0018] FIG. 1A shows a flow chart illustrating a method for predicting oil adulteration of an edible oil sample, according to various embodiments.

[0019] FIG. 1B shows a schematic view of an apparatus or system for predicting oil adulteration of an edible oil sample, according to various embodiments.

[0020] FIGS. 2A to 2N show graphs representing the final PLS2 experimental results, which reflect the predicted distribution for peanut oil and the adulterants of soybean oil and sunflower oil in terms of amount percentages.

[0021] FIGS. 3A to 3N show graphs representing the final deep learning experimental results, which reflect the predicted distribution for peanut oil and the adulterants of soybean oil and sunflower oil in terms of amount percentages.

[0022] FIG. 4 shows a graphical representation of data in a linear latent space based on different brands test, in accordance with one embodiment.

[0023] FIG. 5 shows a graphical representation of data in a non-linear transformed space based on different brands test, in accordance with one embodiment.

[0024] FIGS. 6A to 6J show graphs representing the final PLS2 results for the blind brands test, which reflect the predicted distribution for the peanut oil and the adulterants of soybean oil and sunflower oil in terms of amount percentages.

[0025] FIGS. 7A to 7J show graphs representing the final deep learning results for the blind brands test, which reflect the predicted distribution for the peanut oil and the adulterants of soybean oil and sunflower oil in terms of amount percentages.

[0026] FIG. 8 shows a schematic representation of the deep learning model, in accordance with various embodiments.

[0027] FIGS. 9A to 9C show respective graphs illustrating the relationships between the actual adulteration and the predicted adulteration based on FT-NIR, Raman, and a combination of FT-NIR and Raman spectroscopy, in accordance with various embodiments.

[0028] FIG. 10 shows a graph illustrating the root-mean-square errors of cross validation (RMSECV) obtained from the analysis of FIGS. 9A to 9C.

Detailed Description

[0029] The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the invention. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.

[0030] Embodiments described in the context of one of the methods or devices/apparatus are analogously valid for the other methods or devices/apparatus. Similarly, embodiments described in the context of a method are analogously valid for a device, and vice versa.

[0031] Features that are described in the context of an embodiment may correspondingly be applicable to the same or similar features in the other embodiments. Features that are described in the context of an embodiment may correspondingly be applicable to the other embodiments, even if not explicitly described in these other embodiments. Furthermore, additions and/or combinations and/or alternatives as described for a feature in the context of an embodiment may correspondingly be applicable to the same or similar feature in the other embodiments.

[0032] In the context of various embodiments, the articles “a”, “an” and “the” as used with regard to a feature or element include a reference to one or more of the features or elements.

[0033] In the context of various embodiments, the phrase “substantially” may include “exactly” and a reasonable variance.

[0034] In the context of various embodiments, the term “about” or “approximately” as applied to a numeric value encompasses the exact value and a reasonable variance.

[0035] As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

[0036] As used herein, a phrase of the form “at least one of A or B” may include A or B or both A and B. Correspondingly, a phrase of the form “at least one of A or B or C”, or including further listed items, may include any and all combinations of one or more of the associated listed items.

[0037] Various embodiments may provide quantitative detection of oil adulterants using deep-learning models on FT-NIR, FT-MIR and Raman spectra.

[0038] Detection of oil adulterants is a difficult task. Various embodiments may provide an application integrating deep-learning and FT-NIR/FT-MIR/Raman spectroscopy technologies to detect whether multiple adulterants are present in a sample and, if so, the quantitative amount of each adulterant, thus addressing the practical aspect of the industry requirement. Another advantage of deep learning over, for example, partial least squares is that it is able to learn complex non-linear structures in the data, which is not possible with linear models such as partial least squares. This addresses the errors associated with constructing multiple PLS1 models.

[0039] In contrast to PLS, deep learning models may handle non-linear structures in the data well. Multiple adulterant oils may be handled as well. Deep learning requires training on datasets that are sufficiently large to at least minimize or avoid overfitting.

[0040] Various embodiments provide a combination of technologies including rapid testing by FT-NIR, FT-MIR or Raman spectroscopy and deep learning models to predict quantitative measures of adulteration, for example, to achieve accurate detection of the type of adulterated oil (qualitatively) in situations where unknown oils may be adulterated, and accurate quantification of the adulterated matter in situations where two or more adulterants are added. In other words, accurate prediction of oil adulteration by more than one adulterant may be achieved.

[0041] FIG. 1A shows a flow chart illustrating a method 100 for predicting quantitative measures of oil adulteration of an edible oil sample, according to various embodiments. In FIG. 1A, at Step 102, at least part of a spectral data of the edible oil sample is received. At Step 104, a single prediction model capable of generating a prediction of an adulterant in the edible oil sample is provided. At Step 106, the at least part of the spectral data is processed using the single prediction model.

[0042] In the context of various embodiments, the term“processed” or“processing” may mean analyzing, or mathematically evaluating.

[0043] The method 100 may be a method of predicting more than one adulterant in the edible oil sample, in other words, at least two adulterants in the edible oil sample.

[0044] Various embodiments may provide modeling multiple quantitative adulterant outcomes in one single model. This advantageously addresses the problem faced in using PLS1 where multiple models were applied in predicting the adulteration, and further human interpretation of the results was necessary.

[0045] For example, the single prediction model may be a linear model. A quantitative prediction of two variables may be realized using PLS2. PLS2 is less common than PLS1 and is apparently not known to have been used in predicting oil adulteration. Generally, PLS2 may enable prediction of outcomes that do not require further human corroboration and that may be more easily interpreted, as compared to PLS1.

[0046] In one embodiment, the single prediction model may be a non-linear model. The single prediction model may include a deep neural network (DNN). The DNN may have multiple layers between an input layer and an output layer. The DNN may be a feedforward network in which data flows from the input layer to the output layer without looping back. The input layer may receive one or multiple inputs. The intermediate layers may be specialized. For example, the intermediate layers may include convolution layers which automatically find feature maps (summarize data features) within the spectra. The output layer may provide one or multiple outputs. For example, the deep neural network may be supervised.

[0047] The single prediction model may be for modeling complex traits and accounting for different oil brands, processing techniques, geographical locations and other variables. It is believed that these traits cause the spectral data to have non-linear structures. Deep learning models are designed for non-linear pattern discovery, and thus provide a useful tool to analyze and assess the non-linearity of the data.

[0048] Various embodiments may provide the development of additional model training strategies by influencing what the model learns from, e.g., complete information from a particular oil brand to predict a random oil brand. As such, the method 100 in accordance with various embodiments may enable the single prediction model to meet industry needs, including the prediction of adulteration generalized over different brand specific traits.

[0049] In various embodiments, the single prediction model may be trained by a matrix of features, wherein each feature corresponds to the intensity of the spectral data at a specific wavenumber. The spectral data may be based on training examples, which may include a processing technique of the edible oil sample, or a brand of the edible oil sample, or a geographical location from which the edible oil sample is obtained, or any combination thereof.
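As a rough illustration of the matrix of features described above, the following sketch arranges spectra into a samples-by-wavenumbers array alongside a matrix of target proportions. The wavenumber step, the number of samples, the random intensities and the names `wavenumbers`, `X` and `Y` are assumptions made for illustration only and do not come from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical wavenumber axis; the 8-unit step is an assumed value.
wavenumbers = np.arange(4476, 9008 + 1, 8)   # shape (n_features,)

n_samples = 5
# Random intensities standing in for measured spectra: one row per spectrum,
# one column (feature) per wavenumber.
X = rng.normal(size=(n_samples, wavenumbers.size))

# Target matrix (made-up values): fractions of base oil and two adulterants,
# one row per spectrum, each row summing to 1.
Y = np.array([
    [1.00, 0.00, 0.00],
    [0.90, 0.10, 0.00],
    [0.80, 0.00, 0.20],
    [0.70, 0.20, 0.10],
    [0.50, 0.25, 0.25],
])

# Feature j of sample i is the intensity of spectrum i at wavenumbers[j].
i, j = 3, 10
print(f"sample {i}, wavenumber {wavenumbers[j]}: intensity {X[i, j]:.3f}")
```

Keeping the wavenumber axis as a separate array makes it possible to map any column of `X` back to the spectral position it represents.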

[0050] In various embodiments, the edible oil sample may be selected from the group consisting of a peanut oil sample, an olive oil sample, a corn oil sample, a coconut oil sample, a cottonseed oil sample, a palm oil sample, a canola oil sample, a safflower oil sample, a sesame oil sample, a soybean oil sample, a sunflower oil sample, a camellia seed oil sample, a linseed (flaxseed) oil sample, and a sample of oil with a relatively higher proportion of certain fatty acids, for example, high erucic acid rapeseed oil, low erucic acid rapeseed oil, or high oleic acid sunflower oil.

[0051] For example, the edible oil sample may be the peanut oil sample.

[0052] In various embodiments, the method 100 may further include generating the prediction of at least two adulterants in the edible oil sample. In other words, the output layer of the single prediction model may provide multiple outputs of the at least two adulterants or variables.

[0053] In the context of various embodiments, the term “generating” may mean “determining”.

[0054] The prediction of the at least two adulterants may be substantially simultaneously generated. No further corroboration of data and no human intervention are required.

[0055] The prediction may include at least a type or a percentage amount of each of the at least two adulterants.

[0056] In various embodiments, the method 100 may further include prior to receiving the at least part of the spectral data, obtaining the spectral data using FT-NIR, FT-MIR or Raman spectroscopy.

[0057] For example, the spectral data may include FT-NIR, FT-MIR or Raman spectral data. The spectral data may be representative of tocopherols/tocotrienols band at specific wave numbers, or acid values for fatty acids, or peroxide values, or polymer compounds, amongst others. The specific wave numbers may be from 4476 to 9008.

[0058] The spectral data may be preprocessed, for example, by correction of baseline shifts, by standard normal variate transformation for scaling the data within a sample, by multiplicative scatter correction for scaling the data across samples, or by a Savitzky-Golay filter. The filter may be further configured to emulate a smoother first- or second-order derivative signal.

[0059] While the method described above is illustrated and described as a series of steps or events, it will be appreciated that any ordering of such steps or events is not to be interpreted in a limiting sense. For example, some steps may occur in different orders and/or concurrently with other steps or events apart from those illustrated and/or described herein. In addition, not all illustrated steps may be required to implement one or more aspects or embodiments described herein. Also, one or more of the steps depicted herein may be carried out in one or more separate acts and/or phases.
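Two of the scaling steps named above can be sketched as follows. This is a minimal illustration on synthetic spectra, not the preprocessing pipeline used in the experiments: the function names `snv` and `msc`, the choice of the mean spectrum as the reference, and the synthetic gains and offsets are all assumptions. The Savitzky-Golay filter is not covered here.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=float)
    for k, row in enumerate(spectra):
        # Fit row ~ intercept + slope * reference, then undo the fitted offset and gain.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[k] = (row - intercept) / slope
    return corrected

# Synthetic spectra: one underlying shape with a different gain and offset per sample.
x = np.linspace(0.0, 1.0, 200)
shape = np.exp(-((x - 0.4) ** 2) / 0.01) + 0.5 * np.exp(-((x - 0.7) ** 2) / 0.02)
gains = np.array([0.8, 1.0, 1.3])[:, None]
offsets = np.array([0.05, 0.00, -0.02])[:, None]
raw = gains * shape + offsets

snv_out = snv(raw)   # each row now has zero mean and unit standard deviation
msc_out = msc(raw)   # each row is mapped onto the scale of the reference spectrum
```

In this synthetic case the rows differ only by gain and offset, so both corrections collapse them onto a common curve. Real spectra also differ in composition, which is the variation the corrections are meant to preserve.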

[0060] Various embodiments further provide a computer readable storage medium including computer readable instructions operable when executed by a computer to predict quantitative measures of oil adulteration of an edible oil sample. The computer readable instructions may be configured to perform the method 100, in accordance with various embodiments.

[0061] FIG. 1B shows a schematic view of an apparatus or system 120 for predicting oil adulteration of an edible oil sample, according to various embodiments. In FIG. 1B, the apparatus or system 120 includes a receiving unit 122 configured to receive at least part of a spectral data of an edible oil sample; a memory 124 for storing a single prediction model capable of generating a prediction of an adulterant in the edible oil sample; and a processor 126 configured to access the single prediction model stored in the memory 124 to perform steps of the method 100 (FIG. 1A) for generating a prediction of quantitative measures of oil adulteration in the edible oil sample, in accordance with various embodiments. The receiving unit 122, the memory 124 and the processor 126 may be in communication with one another, as depicted by lines 128, 130. The communication may be bi-directional. The single prediction model may be a non-linear model including a deep neural network.

[0062] The apparatus or system 120 may include the same or like elements or components as those described in the method 100 of FIG. 1A, and as such, the like elements may be as described in the context of the method 100 of FIG. 1A, and therefore the corresponding descriptions are omitted here.

[0063] Examples will be described below in forms of experiments conducted to provide a better understanding of the method 100 and the apparatus or system 120.

[0064] Samples and sample preparation

[0065] Pure peanut oil samples were spiked with soybean oil and sunflower oil as adulterants. The adulteration ratios ranged from 5% to 50% in 5% increments. Binary and ternary blends were allowed, which increased the complexity of the experiment but also made the dataset more feature-rich. For example, ternary blends permit up to the order of O(n ) combinations.

[0066] Each blend was measured in replicates, using up to two different glass vials and up to six different rotations of the glass vials to obtain example spectra for a given sample. Twenty peanut oil samples blended with soybean and sunflower oil were profiled, generating a total of 977 spectra. The spectra, more specifically FT-NIR spectra, were preprocessed using the first derivative and transformed using standard normal variate.
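The blend design described above can be enumerated with a short sketch. The 5% to 50% levels in 5% steps come from the text; the assumption that each adulterant level in a ternary blend is chosen independently, and the names `levels`, `binary`, `ternary` and `peanut_fraction`, are illustrative only. The actual set of blends prepared is not specified here.

```python
from itertools import product

# Adulteration levels in percent: 5% to 50% in 5% steps, as stated in the text.
levels = list(range(5, 55, 5))
adulterants = ("soybean", "sunflower")

# Binary blends: peanut oil plus exactly one adulterant.
binary = [{name: pct} for name in adulterants for pct in levels]

# Ternary blends: peanut oil plus both adulterants.
# Assumption: each adulterant takes any level independently; the text does not
# say whether the combined adulterant level was capped.
ternary = [dict(zip(adulterants, pair)) for pair in product(levels, repeat=2)]

def peanut_fraction(blend):
    """Remaining percentage of peanut oil in a blend."""
    return 100 - sum(blend.values())

print(len(levels), len(binary), len(ternary))
```

Under this independence assumption the number of binary blends grows linearly with the number of levels, while the number of ternary blends grows with its square. A real design would likely exclude combinations such as 50% plus 50%, which leave no peanut oil.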

[0067] Deep learning model

[0068] A deep learning model is constructed to model the quantitative adulteration of peanut oil by soybean and sunflower oil. The model includes multiple layers, and each layer is associated with a mathematical manipulation. The model finds a correct mathematical manipulation to turn an input (e.g., the spectra or spectral data) into an output. The model may identify the correct mathematical manipulation by going through the layers with the calculation of the probability of the output.

[0069] In a simple form, for mathematical manipulation, the deep learning model includes a matrix of features (f_1 to f_x), and each feature is manipulated by a weight. The weight may represent or may be associated with a level of influence on the output. The features may correspond to the intensity of the FT-NIR spectra at different wave numbers. It should be appreciated that other features, not mentioned herein, may also be considered.

[0070] More specifically, the deep learning model, interchangeably referred to as the deep neural network, accepts the spectra as input. Each neural network layer is represented by a number of nodes, initialized with random weights. Data inputs flow through this network by multiplication with the weights associated with a node, through a process of forward propagation. A suitable activation function transforms this result so that non-linearity may be modeled. The activation function may be a sigmoid function or a tanh function. Each layer may also be represented by specialized nodes, e.g., convolutional nodes which transform a spectrum into feature maps (i.e., regions within the spectrum with discriminatory signals). It should be appreciated that the ordering of layers is not to be taken in a limiting sense. This process may be repeated over the number of layers in the network until finally reaching the output layer, where the results are accumulated to represent the quantitative proportions of the oil and its adulterant(s).

FIG. 8 shows a schematic representation of the deep learning model 800, in accordance with various embodiments. In FIG. 8, the inputs, i_m, 802 include m features, e.g., the intensities of the spectra used for modelling. The hidden nodes, h_{r,k}, 804 are indexed by r and k, which denote the depth and width of the hidden nodes 804. Each hidden node 804 (as expanded and depicted in a dotted-line area 806) computes a weighted sum of its inputs using the weights, w, represented by Σ 808, and applies an activation function f 810 to estimate non-linear relationships in the data. If the hidden nodes belong to a layer whose type is convolutional, the weights correspond to a filter which, when applied to the inputs, creates feature maps. The outputs, o_s, 812 include s outputs to estimate, i.e., the quantification of the oil and the adulterants.

The initial estimates of the outputs generally have large errors, as the nodes (e.g., 804) of the network 800 are initially assigned random weights. A back propagation algorithm readjusts these weights, w, so that the errors in the outputs 812 may be reduced. This iterative process of forward and backward propagation changes the weights, w, in the deep neural network 800 until the error is sufficiently small or the errors have stopped decreasing. To prevent overfitting, a hold-out set of data is usually reserved. Forward propagation alone is run on this hold-out set, substantially simultaneously with training, to check whether the errors on an independent dataset that was not used to train the deep neural network 800 are also sufficiently small.
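The forward propagation, back propagation and hold-out check described above can be sketched with a very small network. This is a toy illustration under stated assumptions, not the network of FIG. 8: it uses a single hidden layer with a tanh activation, a linear output layer, a mean squared error loss, plain gradient descent, and random synthetic data in place of spectra. The layer width, learning rate and number of iterations are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra: 200 samples, 30 features, 3 outputs
# (e.g. proportions of the base oil and two adulterants). Not real data.
n, m, s = 200, 30, 3
X = rng.normal(size=(n, m))
true_W = rng.normal(size=(m, s))
Y = np.tanh(X @ true_W / np.sqrt(m))          # a non-linear target

# Hold-out split: the last 20% is never used to update the weights.
split = int(0.8 * n)
X_tr, Y_tr, X_ho, Y_ho = X[:split], Y[:split], X[split:], Y[split:]

# Random initial weights for one hidden layer of width k and the output layer.
k = 16
W1 = rng.normal(scale=0.1, size=(m, k)); b1 = np.zeros(k)
W2 = rng.normal(scale=0.1, size=(k, s)); b2 = np.zeros(s)

def forward(X):
    h = np.tanh(X @ W1 + b1)      # weighted sum followed by tanh activation
    return h, h @ W2 + b2         # linear output layer

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

lr = 0.05
history = []                      # (training error, hold-out error) per iteration
for epoch in range(300):
    # Forward propagation on the training data.
    h, out = forward(X_tr)
    err = out - Y_tr
    # Back propagation: gradients of the mean squared error.
    g_out = 2.0 * err / err.size
    g_W2 = h.T @ g_out; g_b2 = g_out.sum(axis=0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)      # derivative of tanh
    g_W1 = X_tr.T @ g_h; g_b1 = g_h.sum(axis=0)
    # Gradient-descent update of the weights.
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2
    # Forward propagation only on the hold-out set, to monitor overfitting.
    history.append((mse(out, Y_tr), mse(forward(X_ho)[1], Y_ho)))
```

Comparing the two columns of `history` over the iterations shows whether the hold-out error tracks the training error. A training error that keeps falling while the hold-out error rises is the usual sign of overfitting.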

[0071] In the event that the spectral data is not preprocessed, for example, as described hereinabove, it may be possible to include 1-dimensional convolutional neural network nodes at the front of the network so that a suitable filter may be learned to “automatically” preprocess the data for higher-quality discriminatory signals. It should be appreciated that the ordering of layers is not to be taken in a limiting sense.
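To illustrate why a 1-dimensional convolutional filter can stand in for a preprocessing step, the sketch below applies a fixed three-point filter to a synthetic spectrum and obtains a first-derivative signal. The filter weights here are hand-picked for illustration; in a trained network they would be learned, and nothing guarantees that a learned filter resembles a derivative. The function name `conv1d_valid` and the synthetic spectrum are assumptions.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide a filter along a spectrum (cross-correlation, no padding)."""
    n_out = signal.size - kernel.size + 1
    return np.array([signal[i:i + kernel.size] @ kernel for i in range(n_out)])

# Synthetic spectrum: one peak on a sloping baseline.
x = np.linspace(0.0, 1.0, 101)
spectrum = np.exp(-((x - 0.5) ** 2) / 0.005) + 0.3 * x

# A hand-picked central-difference filter; a trained layer would learn its own weights.
kernel = np.array([-0.5, 0.0, 0.5])
feature_map = conv1d_valid(spectrum, kernel)

# The resulting feature map equals the central-difference first derivative
# (per sample step) at the interior points of the spectrum.
reference = np.gradient(spectrum)[1:-1]
```

The feature map is two points shorter than the input because no padding is applied. A convolutional layer typically holds several such filters and produces one feature map per filter.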

[0072] PLS2 model

[0073] The PLS2 model is a linear model. PLS2 corresponds to the case where there are several dependent variables. This is different from PLS1, which corresponds to the case where there is only one dependent variable. For PLS1, features X are projected into a latent variable space T and one dependent variable y. For PLS2, both the features X and dependent variables Y are matrices, and are both projected into latent variable spaces T and U, where a regression model may be used to ensure that the covariances are maximized.
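
For reference, the usual textbook form of the PLS2 decomposition is sketched below. Only X, Y, T and U appear in the description above; the loadings P and Q, the residuals E and F, the weight vectors w and c, and the inner regression matrix B are standard notation assumed here for illustration.

```latex
% Assumed standard PLS2 notation; P, Q, E, F, w, c and B are not named in the source.
\begin{align}
X &= T P^{\top} + E, \\
Y &= U Q^{\top} + F, \\
(w, c) &= \arg\max_{\|w\| = \|c\| = 1} \operatorname{cov}(Xw,\, Yc), \\
U &\approx T B .
\end{align}
```

The first pair of score vectors is t = Xw and u = Yc, and the last line is the linear inner relation that links the two latent spaces.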

[0074] Experiments Part A

[0075] Each of the deep learning model and the PLS2 model was trained by cross-validation, performed in the following two ways:

(i) partitioning 90% of the data for training and 10% of the samples for blind testing;

(ii) using 90% of the oil brands for training and 10% of the blind brands for testing.
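
The difference between the two partitioning schemes above may be sketched as follows. The record layout (a `brand` key per sample), the function names and the synthetic data set are hypothetical; only the 90%/10% proportions come from the description.

```python
import random


def split_by_sample(samples, test_frac=0.1, seed=0):
    """Way (i): hold out a random fraction of individual samples."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    test_idx = set(idx[:max(1, round(test_frac * len(samples)))])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test


def split_by_brand(samples, test_frac=0.1, seed=0):
    """Way (ii): hold out every sample of a random fraction of brands."""
    rng = random.Random(seed)
    brands = sorted({s["brand"] for s in samples})
    rng.shuffle(brands)
    blind = set(brands[:max(1, round(test_frac * len(brands)))])
    train = [s for s in samples if s["brand"] not in blind]
    test = [s for s in samples if s["brand"] in blind]
    return train, test


# Hypothetical data set: 10 brands with 5 samples each.
samples = [{"brand": f"brand_{b}", "sample_id": i} for b in range(10) for i in range(5)]
train_i, test_i = split_by_sample(samples)
train_ii, test_ii = split_by_brand(samples)
```

In way (i) a blind sample may share its brand with training samples, whereas in way (ii) no blind brand is ever seen during training, which is the stricter test of brand invariance.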

[0076] FIGS. 2A to 2N show graphs representing the final PLS2 results (based on way (i) above), which reflect the predicted distribution for the peanut oil and the adulterants of soybean oil and sunflower oil in terms of amount percentages. The actual ratio for each case is presented in Table 1 below.

[0077] Table 1

[0078] FIGS. 3A to 3N show graphs representing the final deep learning results, which reflect the predicted distribution for the peanut oil and the adulterants of soybean oil and sunflower oil in terms of amount percentages. The actual ratio for each case is presented in Table 2 below.

[0079] Table 2

[0080] The final results from the two models were compared against each other to reflect the contrast in accuracy.

[0081] From the final PLS2 results, it can be observed that the prediction of some samples may not reflect the actual percentages correctly. For example, in FIG. 2B, some predicted values for actual 40% sunflower oil may exceed 55%, which is closer to the actual percentage of peanut oil (at 60%) instead. This may cause some uncertainty in the prediction of the type and amount of adulteration in the sample. However, the PLS2 model may be applicable for oil samples where the adulteration levels are comparatively low and where the distinction between different adulterants is not crucial.

[0082] In contrast, FIG. 3B shows consistent predicted values of about 40% close to the actual 40% sunflower oil using the deep learning model.

[0083] In accordance with the final results, the deep learning model predicts to a much higher accuracy. This is even more apparent when the amount of adulteration is comparable to the amount of the oil sample under test.

[0084] The deep learning model is purposed for non-linear pattern discovery and works on a non-linear transformed space, while the PLS2 model works on a linear latent space.

[0085] FIG. 4 shows a graphical representation of the data in a linear latent space (based on way (ii) above, more specifically, on the different brands test). The legend on the right side of FIG. 4 shows the respective percentage amounts of peanut oil to soybean oil to sunflower oil, which have been grouped in Groups a, b, and c to facilitate the identification of the different plot points in FIG. 4. The rectangular labels with white background represent the oil samples of different brands, which possibly involved different processing techniques and/or were possibly obtained from different geographical locations, amongst other variables. For example, the label SG may represent the oil brand obtained from Santa Gloria (Spain).

[0086] It is observed from Groups a, b, and c that the PLS2 model (which is a linear method) may be able to sufficiently distinguish the different levels of oil adulteration. However, it is further observed from Group a in FIG. 4 that the plot points representing the different oil brands containing low levels of adulterants form a large spread or distribution in the linear latent space. This suggests that the PLS2 model may not be applicable in determining the oil adulteration if the oil brand under test differs from the oil brand used to build the PLS2 model. In other words, in the linear latent space, the data may discriminate adulteration patterns, which is desirable. However, the effects of processing techniques, brands, geographical locations and other variables are not treated as an invariant pattern.

[0087] FIG. 5 shows a graphical representation of the data in a non-linear transformed space (based on way (ii) above, more specifically, on the different brands test). The legend on the right side of FIG. 5 shows the respective percentage amounts of peanut oil to soybean oil to sunflower oil, which have been grouped in Groups d, e, and f to facilitate the identification of the different plot points in FIG. 5. Similar to FIG. 4, the rectangular labels with white background in FIG. 5 represent the oil samples of different brands, which possibly involved different processing techniques and/or were possibly obtained from different geographical locations, amongst other variables.

[0088] It is observed from Groups d, e, and f that the deep learning model may be able to sufficiently distinguish the different levels of oil adulteration. More importantly, it is observed from Group d in FIG. 5 that the plot points of the different oil brands containing low levels of adulterants are consolidated in the non-linear transformed space. This suggests that the deep learning model is capable of modeling complex traits and accounting for different oil brands, processing techniques, geographical locations and other variables. In other words, in the non-linear transformed space, the effects of brands (geographical location and processing type) become invariant.

[0089] The deep learning model also accounts for blind brands which is not possible with the PLS2 model.

[0090] FIGS. 6A to 6J show graphs representing the final PLS2 results (based on way (ii) above, more specifically, the blind brands test), which reflect the predicted distribution for the peanut oil and the adulterants of soybean oil and sunflower oil in terms of amount percentages. In FIGS. 6A to 6J, each of the peanut oil and the adulterants is represented by three different groups of shadings 602, 604, 606. Each group 602, 604, 606 may be associated with a different brand, which may be representative of a different geographical location or a different processing type. The actual ratio for each case is presented in Table 3 below.

[0091] Table 3

[0092] FIGS. 7A to 7J show graphs representing the final deep learning results for the blind brands test, which reflect the predicted distribution for the peanut oil and the adulterants of soybean oil and sunflower oil in terms of amount percentages. In FIGS. 7A to 7J, each of the peanut oil and the adulterants is represented by three different groups of shadings 602, 604, 606. Each group 602, 604, 606 may be associated with a different brand, which may be representative of a different geographical location or a different processing type. The actual ratio for each case is presented in Table 4 below.

[0093] Table 4

[0094] The final results for the blind brands test from the two models were compared against each other to reflect the contrast in accuracy.

[0095] The deep learning model was found to be able to provide consistent results despite the oil being of a different brand, while the PLS2 model results show that a different oil brand affects the predicted values.

[0096] Table 5 shows a summary of the exemplary samples used in each of the three representative groups of shadings 602, 604, 606.

[0097] Table 5

HM: homemade peanut oil samples

[0098] The blending experiments above (Part A) were conducted to detect adulteration of oils (>=2 adulterants) by combining photonics methods and machine learning. The experimental results demonstrated the ability to detect 2 or more adulterants at about 5% sensitivity when blind tested. The errors presented in these blending experiments were significantly lower than the errors found when using conventional or existing methods or systems. Thus, the method or the apparatus/system in accordance with various embodiments provides an improved technical effect. Further, a conclusion drawn from the blending experiments above (Part A) is that the method or the apparatus/system in accordance with various embodiments is agnostic to brands of the main oil of interest (in this case, peanut oil).

[0099] The experiment has been modified as explained in Experiments (Part B) below, along with the results, to demonstrate the following additional points:

• agnostic to brands of adulterant oils;

• generalizability to an additional adulterant oil type (corn), which is more prevalently used to adulterate peanut oil;

• generalizability to other photonics technologies;

• ability to surpass the 5% detection limit as shown in Experiments (Part A) above.

[0100] Experiments - Part B

[0101] One brand of peanut oil was adulterated with 22 brands of corn oil. Adulteration was done in 1% increments from 85% to 99%. Three different types of data were profiled in accordance with the following photonics technologies: FT-NIR, FT-MIR (Fourier transform mid-infrared) and Raman spectroscopy.

[0102] The data from corresponding samples were modeled as described below.

[0103] Data pre-processing, machine learning and modeling were performed on each photonics technology separately as per the steps in Experiments (Part A), with the following additional differences:

• Data from the three photonics technologies was merged to test and determine if the model becomes more sensitive.

• Variable selection was performed using cross-validation on each technology before merging, in order to avoid or at least minimize overfitting from a large number of features.

• The pre-processed data is thus a data matrix with n rows of samples and m features (reduced from the original data, where m=x+y+z, x features came from FT-NIR, y features from FT-MIR and z features from Raman spectroscopy), from which the deep learning model (as described herein), learns parameters that are required to predict the adulterant corn oil.

• A blind data set was constructed based on 20% of the corn oil brands to ensure the model is agnostic to oil. Data from the remaining 80% of the corn oil brands adulterated into peanut oil was used to construct the deep learning model.
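
The merging step in the list above, where the reduced feature blocks are joined so that m = x + y + z, may be sketched as follows. The helper names, the column indices and the numeric values are made up for illustration; the description does not specify how the selected variables are stored, and the cross-validated selection itself is not reproduced here.

```python
def select_columns(block, keep):
    """Keep only the selected feature columns (indices assumed chosen beforehand)."""
    return [[row[j] for j in keep] for row in block]


def merge_modalities(nir, mir, raman):
    """Concatenate per-sample feature vectors from three technologies, row by row."""
    if not (len(nir) == len(mir) == len(raman)):
        raise ValueError("all blocks must describe the same n samples")
    return [list(a) + list(b) + list(c) for a, b, c in zip(nir, mir, raman)]


# Made-up numbers: n = 2 samples; 4 FT-NIR, 3 FT-MIR and 3 Raman features before selection.
nir = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
mir = [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]]
raman = [[2.0, 2.1, 2.2], [2.3, 2.4, 2.5]]

merged = merge_modalities(
    select_columns(nir, [0, 2]),       # x = 2 features kept
    select_columns(mir, [1]),          # y = 1 feature kept
    select_columns(raman, [0, 1, 2]),  # z = 3 features kept
)
```

The result is an n-by-m matrix with m = 2 + 1 + 3 = 6 columns, one row per sample, which is the shape of input the model is described as learning from.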

[0104] FIGS. 9A to 9C show graphs illustrating the relationships between the actual adulteration and the predicted adulteration based on FT-NIR, FT-MIR and Raman spectroscopy. The X and Y axes denote arbitrary values (ranging from 0 to 15) that respectively represent the adulteration between 85% and 99%.

[0105] From FIGS. 9A to 9C, data analysis suggested that each technology detects adulteration with significant sensitivity of about 5% margin of error, even when the physical adulteration was 1%. When data was combined, the model was able to lower this margin of error to below 5%, suggesting the utility of multi-modal photonics data for improving the detection and further increasing the technical effect. For example, the root mean square error of cross validation (RMSECV) for a single technology (or method) was 0.54 and 1.30 for FT-NIR and Raman spectroscopy, respectively. When the technologies were combined (i.e., FT-NIR and Raman spectroscopy), the RMSECV was lowered to 0.02 as shown in FIG. 10. The model was also agnostic to corn oil brands, thus suggesting that the model may be agnostic to both main oil brands and adulterant brands.
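
As an illustration of how an RMSECV figure of this kind is typically computed, the sketch below pools the squared errors of held-out predictions over k folds and takes the square root of their mean. The interleaved fold assignment, the value of k, the simple straight-line model and the synthetic data are assumptions for the example; they are not the models or data behind the values reported above.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def rmsecv(xs, ys, fit, predict, k=5):
    """Root mean square error over predictions made on held-out folds."""
    n = len(xs)
    sq_err = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved fold assignment
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)
        sq_err += sum((ys[i] - predict(model, xs[i])) ** 2 for i in test_idx)
    return math.sqrt(sq_err / n)


# Synthetic data: 15 levels, one exact linear response and one perturbed response.
xs = [float(i) for i in range(15)]
exact = [2.0 * x + 1.0 for x in xs]
noisy = [y + (0.5 if i % 2 else -0.5) for i, y in enumerate(exact)]

rmsecv_exact = rmsecv(xs, exact, fit_line, predict_line)
rmsecv_noisy = rmsecv(xs, noisy, fit_line, predict_line)
```

Because every prediction is made by a model that never saw the sample in question, a lower RMSECV indicates better expected accuracy on unseen samples, which is the sense in which the combined-technology model is compared with the single-technology ones.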

[0106] To analyze the effects of different oil samples, a separate set of samples containing camellia seed oil and the adulterant of high-oleic sunflower (HOSF) oil was subjected to the two models for a blind brands test. The results obtained showed a similar reduction of RMSECV when multimodal technologies were used and were agnostic to either camellia seed oil brands or high-oleic sunflower brands, leading to the same conclusion as for the peanut oil sample and the adulterants of soybean oil and sunflower oil, thereby reflecting the ability of the deep learning model to accurately predict the oil adulterants in different oil type samples and regardless of the oil brands.

[0107] While the invention has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.
