

Title:
A RETAIL CHECKOUT TERMINAL FRESH PRODUCE IDENTIFICATION SYSTEM
Document Type and Number:
WIPO Patent Application WO/2019/119047
Kind Code:
A1
Abstract:
Disclosed are systems and methods including starting with a first number of images, generating a second number of images by digital operations on the first number of images, extracting features from the second number of images, and generating a classification model by training a neural network on the second number of images wherein the classification model provides a percentage likelihood of an image's categorisation, embedding the classification model in a processor and receiving an image for categorisation, wherein the processor is in communication with a POS system, the processor running the classification model to provide output to the POS system of a percentage likelihood of the image's categorisation.

Inventors:
HERZ MARCEL (AU)
SAMPSON CHRISTOPHER (AU)
Application Number:
PCT/AU2018/051369
Publication Date:
June 27, 2019
Filing Date:
December 20, 2018
Assignee:
TILITER PTY LTD (AU)
International Classes:
G06T7/41; A47F9/04; G06Q30/06; G06V10/774; G07G1/12
Domestic Patent References:
WO2009091353A12009-07-23
Foreign References:
US20160328660A12016-11-10
US20140126773A12014-05-08
US6260023B12001-07-10
EP0685814A21995-12-06
Other References:
SHOTTON J. ET AL.: "Semantic Texton Forests for Image Categorization and Segmentation", 2008 CVPR, IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, June 2008 (2008-06-01), pages 1 - 8, XP031297061
NAIK S ET AL.: "Machine Vision based Fruit Classification and Grading - A Review", INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS ), vol. 170, no. 9, July 2017 (2017-07-01), pages 22 - 34, XP055621044, ISSN: 0975 - 8887
FARIA F.A. ET AL.: "Automatic Classifier Fusion for Produce Recognition", 25TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES, 2012, pages 252 - 259, XP032283154
BOLLE R.M. ET AL.: "VeggieVision: a produce recognition system", THIRD IEEE WORKSHOP ON APPLICATIONS OF COMPUTER VISION. WACV'96, 1996, pages 244 - 251, XP010206439, ISBN: 0-8186-7620-5
SA I ET AL.: "DeepFruits: A Fruit Detection System Using Deep Neural Networks", SENSORS, vol. 16, no. 8, 2016, pages 1222, XP055469740, DOI: 10.3390/s16081222
See also references of EP 3729375A4
Attorney, Agent or Firm:
BAXTER PATENT ATTORNEYS PTY LTD (AU)
Claims:
Claims

1. A method of image categorisation, comprising:

in pre-processing, starting with a first number of images, generating a second number of images by digital operations on the first number of images, extracting features from the second number of images, and generating a classification model by training a neural network on the second number of images wherein the classification model provides a percentage likelihood of an image’s categorisation; embedding the classification model in a processor; and

receiving an image for categorisation, wherein the processor is in communication with a POS system, the processor running the classification model to provide output to the POS system of a percentage likelihood of the image’s categorisation.

2. The method of claim 1 wherein pre-processing an image for categorisation comprises capturing the image, extracting the features and applying the extracted features to the neural network of the classification model to generate a percentage likelihood of the image’s categorisation.

3. The method of claim 1 wherein in pre-processing, a pre-trained Convolution Neural Network (CNN) trained on large unrelated or separate data sets used as a feature detector.

4. The method of claim 1 wherein the neural network comprises a Fully-Connected Neural Network.

5. The method of claim 3 wherein the feature extraction comprises:

a. the pre-trained CNN;

b. Colour space histogram

c. Texture features by numerical feature vector and

d. Dominant Colour Segmentation

6. The method of claim 1 wherein the POS system receives formatted communication of output by the classification model, the formatted output comprising a protocol for providing scores for the percentage likelihood of the image’s category to the POS system.

7. A method of a system external to a Point-of-Sale (POS) system wherein the external system comprises a processor and captures an image and runs a classification model embedded in the processor that provides as output scores for the percentage likelihood of the image’s category and the external system generates a formatted communication as output comprising a protocol to the POS system, wherein the POS system receives the formatted communication of the output by a classification model of the external systems.

8. The method of claim 7 wherein the model which is embedded in a processor in pre-processing, starts with a first number of images, generates a second number of images by augmentation of the first number of images, extracts features from the second number of images, and generates the classification model by processing the second number of images by a neural network to provide a percentage likelihood of the image’s categorisation.

9. The method of claim 8 wherein in pre-processing, the neural network comprises a pre-trained Convolution Neural Network (CNN) trained on large unrelated or separate data sets used as a shape and edge detector.

10. The method of claim 8 wherein the neural network comprises a Fully-Connected Neural Network.

11. The method of claim 9 wherein the feature extraction comprises:

e. The pre-trained CNN

f. Colour space histogram

g. Texture features by numerical feature vector and

h. Dominant Colour Segmentation

12. A method for categorising products, comprising:

populating with a first number of images, generating a second number of images by augmentation of the first number of images, feature extracting from the second number of images wherein the feature extracting comprises running a pre-trained Convolution Neural Network (CNN) as a high-level edge and shape identifier and then generating a classification model by processing the second number of images by a neural network wherein the classification model provides a percentage likelihood of an image’s categorisation.

13. The method of claim 12 wherein generating the classification model further comprises pre-processing feature extraction of the second number of images.

14. The method of claim 12 wherein the feature extraction comprises:

a. The pre-trained CNN

b. Colour space histogram

c. Texture features by numerical feature vector

d. Dominant Colour Segmentation

15. The method of claim 12 wherein the neural network comprises a Fully-Connected Neural Network.

16. The method of claim 12 wherein the classification model is embedded in a processor that is external to a POS system and which is in communication with the POS system.

17. The method of claim 16 wherein the POS system receives formatted communication of output by the classification model, the formatted output comprising a protocol for providing scores for the percentage likelihood of the image’s category to the POS system.

18. A method for expanding an image data set, comprising:

in pre-processing, starting with a first number of images, segmenting the first number of images, generating a second number of images by digital operations on the first number of images, extracting features from the second number of images and processing the second number of images by a neural network, and thereby generating a classification model for deployment, wherein segmentation of an image is not performed at deployment.

19. The method of claim 18 wherein at deployment, capturing the image, extracting the features and applying the extracted features to the neural network of the classification model to generate a percentage likelihood of the image’s categorisation.

20. The method of claim 18 wherein in pre-processing, the feature extraction comprises a pre-trained Convolution Neural Network (CNN) trained on large unrelated or separate data sets used as a shape and edge detector.

21. The method of claim 18 wherein the neural network comprises a Fully-Connected Neural Network.

22. The method of claim 20 wherein the feature extraction comprises:

a. The pre-trained CNN

b. Colour space histogram

c. Texture features by numerical feature vector and

d. Dominant Colour Segmentation

23. The method of claim 18 wherein for deployment, the classification model is embedded in a processor that is external to a POS system and which is in communication with the POS system.

24. The method of claim 23 wherein the POS system receives formatted communication of output by the classification model, the formatted output comprising a protocol for providing scores for the percentage likelihood of the image’s category to the POS system.

25. A method of image categorisation, comprising: starting with a first number of images, generating a second number of images by performing digital operations on the first number of images, extracting features from the second number of images in accordance with:

a. a pre-trained Convolution Neural Network (CNN) trained on large unrelated or separate data sets used as a feature detector;

b. Colour space histogram;

c. Texture features by numerical feature vector; and

d. Dominant Colour Segmentation

and training a neural network on the features extracted to generate a classification model wherein the classification model provides a percentage likelihood of an image’s categorisation.

26. The method of claim 25, further comprising:

embedding the classification model in a processor; and

receiving an image for categorisation, wherein the processor is in communication with a POS system, the processor running the classification model to provide output to the POS system of a percentage likelihood of the image’s categorisation.

27. The method of claim 26 wherein the POS system receives formatted communication of output by the classification model, the formatted output comprising a protocol for providing scores for the percentage likelihood of the image’s category to the POS system.

Description:
A RETAIL CHECKOUT TERMINAL FRESH PRODUCE IDENTIFICATION SYSTEM

Field of the Invention

[1] This invention relates to a retail checkout terminal fresh produce identification system which, more specifically, employs machine learning using a fresh produce learning set to visually classify a fresh produce type in use when presented with image data captured at the terminal. Furthermore, the present machine learning system is trained in a particular way to address the limitations inherent in fresh produce retail environments. Whereas the present system and methodology has wide application for differing types of retail checkout terminals, such will be described hereunder primarily with reference to self-service checkout terminals. However, it should be appreciated that the invention is not necessarily limited to this particular application within the purposive scope of the embodiments provided.

Background of the Invention

[2] Self-service checkouts are increasingly commonplace today wherein shoppers are able to scan items and make payment substantially autonomously.

[3] Whereas barcodes may be scanned on packaged goods, for fresh produce, such as fresh fruit, vegetables and the like, users are required to make a selection from an onscreen display.

[4] However, such an approach is inaccurate with fresh produce items often being misclassified, inadvertently or dishonestly.

[5] The present invention seeks to provide a way, which will overcome or substantially ameliorate at least some of the deficiencies of the prior art, or to at least provide an alternative.

[6] It is to be understood that, if any prior art information is referred to herein, such reference does not constitute an admission that the information forms part of the common general knowledge in the art, in Australia or any other country.

Summary of the Invention

[7] There is provided herein a retail checkout terminal fresh produce identification system for visual identification of fresh produce wherein the system is trained using machine learning. The system may integrate with a conventional checkout POS system so as to be able to output the one or more predicted fresh produce types to the checkout system for on-screen display for selection by a shopper.

[8] The imaging component of the system may comprise a mechanical jig comprising lighting (typically LED lighting), optionally a suitable homogenous background and a visible spectrum camera.

[9] The camera captures an image of the fresh produce presented so as to classify the type of produce.

[10] In embodiments, the system employs supervised machine learning by way of a neural network optimisation.

[11] As will be described in further detail below, the present system is trained in a particular manner so as to address certain limitations inherent in fresh produce identification systems while maintaining desirable or suitable detection accuracies.

[12] Specifically, detection accuracy may be increased utilising large datasets so as to account for uncontrolled environments across a large number of retailers. However, problematically, such large datasets may not be available, especially for retailers stocking relatively few fresh produce items.

[13] Moreover, where large datasets are available it has not been possible in alternative solutions to generalise performance to scenes where data has not been collected. The method described solves this problem by describing how to collect and expand data such that a model can be generated that will generalise to the wide variety of environments seen in the application of POS.

[14] Furthermore, the present system is ideally suited to minimise computational requirements in terms of processing and storage so as to allow the system to be built at low cost. The system may additionally address image resolution limitations (such as down to 740 x 480 pixels).

[15] Additionally, luminosity and lighting colour fluctuations at checkout terminals may affect imaging of presented fresh produce items.

[16] Specifically, in one or a preferred embodiment, the present system captures and utilises two feature vectors extracted from fresh produce image data comprising a colour histogram feature vector and Haralick texture features which are combined to form a full feature vector. The combination of these two feature vectors and the manner of their use was found during experimentation to provide sufficient accuracy and other advantages given the inherent limitations in retail produce identification systems.

[17] In embodiments, the histogram is banded into various bands of increased width to reduce the colour histogram feature vector length so as to allow high-performance models to be trained with smaller learning datasets. Such is an important consideration for small to medium fresh produce vendors where it is not practical to collect large numbers of training images. In this way, the performance can be improved on a small fixed size dataset by reducing the features available to the training model. In embodiments, the number of the bands may be optimised to optimise the accuracy of the neural network.

[18] Furthermore, in embodiments, 14 Haralick texture features are used comprising:

Angular Second Moment, Contrast, Correlation, Sum of Squares: Variance, Inverse Difference Moment, Sum Average, Sum Variance, Sum Entropy, Entropy, Difference Variance, Difference Entropy, Info. Measure of Correlation 1, Info. Measure of Correlation 2, Max. Correlation Coefficient.

[19] Again, a sub selection of these texture features may be selected when optimising the accuracy of the neural network again to address sample set and computational limitations.

[20] The neural network is trained using a fresh produce learning set which, as alluded to above, may be limited in number for particular grocers. For example, the sample set training data may be only those fresh produce items commonly stocked by a particular retail outlet.

[21] A full feature vector is calculated for each image which is then used to optimise the neural network, including the neural weightings and structure.

[22] Once trained, the system is deployed to capture images of unknown input fresh produce items belonging to a category of the learning data set presented at the checkout terminal. The deployed system similarly generates a full feature vector from the collected image and generates a prediction using the trained neural model. The prediction is then transferred to the checkout system for processing using the defined communication protocol.

[23] As such, in accordance with this arrangement, the consumer need not make selections from fresh produce items on-screen but may simply place the fresh produce item in front of the camera for identification. In embodiments, where the fresh produce item cannot be determined to a degree of accuracy, the deployed system may transmit a plurality of potential classifications to the checkout system which may then present a sub selection of fresh produce items on-screen for selection by the consumer.

[24] It should be noted that no specific information about the digital signature/colour histograms of each fresh produce category is contained in the optimised neural network model once deployed onto the deployment system. As such, the deployment system does not require large memory or significant computational power to identify the fresh produce category, conferring advantages in reducing computation and storage and therefore cost.

[25] EP 0685814 A2 (D1) discloses a produce recognition system. According to D1, a processed image is compared to reference images wherein an object is recognised when a match occurs. However, in contradistinction, the present system is able to avoid deployment of reference images to deployed systems, requiring only the provision of the trained neural network model, thereby reducing the computational storage of the deployed systems.

[26] Furthermore, while D1 does mention image features including colour and texture, D1 does not utilise a full feature vector comprising a combination of only a colour feature vector and a texture feature vector as does the present system.

[27] Furthermore, D1 does not seek to reduce computational requirements and therefore does not seek to band the colour histogram as does the present system, let alone optimising the number and widths of bands for optimising accuracy to address learning set limitations. Furthermore, D1 does not disclose sub selection of texture features for further addressing such limitations.

[28] As such, with the foregoing in mind, a plurality of embodiments is disclosed herein.

Disclosed is a method of and system for image categorisation including, in pre-processing, starting with a first number of images, generating a second number of images by digital operations on the first number of images, extracting features from the second number of images, and generating a classification model by training a neural network on the second number of images wherein the classification model provides a percentage likelihood of an image’s categorisation; embedding the classification model in a processor; and receiving an image for categorisation, wherein the processor is in communication with a POS system, the processor running the classification model to provide output to the POS system of a percentage likelihood of the image’s categorisation.

[29] In accordance with one embodiment there are provided retail checkout terminal fresh produce identification methods and systems comprising: at least one visible spectrum camera; a processor in operable communication with the visible spectrum camera, and a memory device for storing digital data, the memory device in operable communication with the processor across a system bus; and a checkout system interface wherein, in use: the system is trainable wherein: the processor is configured for receiving fresh produce image data from a fresh produce learning set using a visible spectrum camera; the memory device comprises a feature vector generation controller configured to generate a full feature vector for each fresh produce image, the full feature vector comprising a combination of: a colour histogram feature vector; a texture feature vector; a dominant colour segment vector; and a pre-trained convolutional neural network; the memory device comprises a neural network optimisation controller for optimising a neural network model, the neural network optimisation controller configured for optimising the neural network model utilising the full feature vector; and the optimised neural network model is deployed to the system, and the system is deployable to predict fresh produce classifications wherein: the processor is configured for receiving image data via a visible spectrum camera; the feature vector generation controller is configured for calculating a full feature vector for the image data comprising a colour histogram feature vector and a texture feature vector and inputting the full feature vector into a neural network optimised with the neural network model to output a fresh produce classification prediction; and the system outputs the fresh produce classification prediction via the checkout system interface.

[30] In another embodiment, disclosed are methods and systems for image categorisation, including starting with a first number of images, generating a second number of images by performing digital operations on the first number of images, extracting features from the second number of images in accordance with:

a. a pre-trained Convolution Neural Network (CNN) trained on a large data set used as a feature detector;

b. Colour space histogram;

c. Texture features by a numerical feature vector; and

d. Dominant Colour Segmentation

and training a neural network on the features extracted to generate a classification model wherein the classification model provides a percentage likelihood of an image’s categorisation.

[31] Also disclosed are methods and systems for categorising products, including populating with a first number of images, generating a second number of images by digital operations on the first number of images, feature extracting from the second number of images wherein the feature extracting comprises running a pre-trained Convolution Neural Network (CNN) as a feature extractor and then generating a classification model by processing the second number of images by a neural network wherein the classification model provides a percentage likelihood of an image’s categorisation.

[32] Furthermore disclosed are methods of a system external to a Point-of-Sale (POS) system wherein the external system comprises a processor and captures an image and runs a classification model embedded in the processor that provides as output scores for the percentage likelihood of the image’s category and the external system generates a formatted communication as output comprising a protocol to the POS system, wherein the POS system receives the formatted communication of the output by the classification model of the external system.
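The formatted communication between the external system and the POS system could look something like the following minimal sketch. The text does not define the protocol, so the JSON layout, the field names (`type`, `candidates`, `category`, `likelihood_percent`) and the function `format_for_pos` are all hypothetical choices made for illustration.

```python
import json


def format_for_pos(scores, top_k=3):
    """Serialise the top_k category scores as a JSON message.

    `scores` maps a category name to its percentage likelihood.
    The message layout is an assumption, not a defined protocol.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    message = {
        "type": "produce_prediction",
        "candidates": [
            {"category": name, "likelihood_percent": round(percent, 1)}
            for name, percent in ranked[:top_k]
        ],
    }
    return json.dumps(message)


if __name__ == "__main__":
    example = {"apple": 72.4, "pear": 20.1, "lemon": 5.0, "lime": 2.5}
    print(format_for_pos(example, top_k=2))
```

A text-based message of this kind keeps the classifier independent of any particular POS vendor; a real integration would follow whatever format the POS system actually accepts.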

[33] Moreover disclosed is a method and system for expanding an image data set, including in pre-processing, starting with a first number of images, segmenting the first number of images, generating a second number of images by digital operations on the first number of images, extracting features from the second number of images and processing the second number of images by a neural network, and thereby generating a classification model for deployment, wherein segmentation of an image is not performed at deployment.
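As an illustration of generating a second number of images by digital operations on the first, the sketch below expands a set of grayscale images held as nested lists. The particular operations (horizontal flip, 90 degree rotation, brightness scaling) and the function names are assumptions; the text does not list which digital operations are used.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def scale_brightness(image, factor, max_value=255):
    """Multiply every pixel by factor, clipping at max_value."""
    return [[min(max_value, int(round(v * factor))) for v in row] for row in image]


def augment(images):
    """Return each original image followed by four derived variants."""
    expanded = []
    for image in images:
        expanded.append(image)
        expanded.append(flip_horizontal(image))
        expanded.append(rotate_90(image))
        expanded.append(scale_brightness(image, 0.8))  # darker variant
        expanded.append(scale_brightness(image, 1.2))  # brighter variant
    return expanded


if __name__ == "__main__":
    first_set = [[[1, 2], [3, 4]]]
    second_set = augment(first_set)
    print(len(first_set), "->", len(second_set), "images")
```

Brightness variants are included here because lighting fluctuation at checkout terminals is one of the stated concerns; other operations could be substituted without changing the overall pattern.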

[34] Further features include that the colour histogram feature vector may be normalised to a scale.

[35] The scale may be between 0 and 1.

[36] The feature vector generation controller may be configured for banding the colour histogram feature vector into discrete bands.

[37] The neural network optimisation controller may be configured for optimising the number of discrete bands.

[38] The discrete bands may comprise between 5 and 100 bands.

[39] The discrete bands may comprise 10 bands.

[40] The texture feature vector may comprise a plurality of texture features.

[41] The texture features may comprise at least a subset of Angular Second Moment, Contrast, Correlation, Sum of Squares: Variance, Inverse Difference Moment, Sum Average, Sum Variance, Sum Entropy, Entropy, Difference Variance, Difference Entropy, Info. Measure of Correlation 1, Info. Measure of Correlation 2, Max. Correlation Coefficient.

[42] The neural network optimisation controller may be configured for selecting a subset of the plurality of texture features for optimising the accuracy of the neural network.

[43] The subset of the plurality of texture features may comprise between 8 and 12 texture features.

[44] The texture features may comprise 10 feature vectors.

[45] The neural network optimisation controller may be configured for optimising the number of neurons of the hidden layer.

[46] The number of neurons may be between 100 and 120.

[47] The number of neurons may be 116.
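To make the role of the hidden layer concrete, the following sketch runs a forward pass through a fully-connected network with one hidden layer of 116 neurons and converts the outputs to percentage likelihoods. The input length of 40 (three colour channels of 10 bands plus 10 texture features), the ReLU activation, the softmax output and the random untrained weights are all assumptions made for illustration; the text states only the band, feature and neuron counts.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply one fully-connected layer: weights . x + biases."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax_percent(logits):
    """Convert raw outputs to likelihoods that sum to 100."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def predict(x, hidden_layer, output_layer):
    """Forward pass: hidden layer with ReLU, then softmax output."""
    hidden = [max(0.0, h) for h in dense(x, hidden_layer)]
    return softmax_percent(dense(hidden, output_layer))


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_layer = init_layer(40, 116, rng)   # 40 input features (assumed)
    output_layer = init_layer(116, 5, rng)    # 5 produce categories (assumed)
    print(predict([0.5] * 40, hidden_layer, output_layer))
```

In a trained system the weights would come from optimisation on the learning set; only these weights, not the training images, need to be stored on the deployed device.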

[48] Other aspects of the invention are also disclosed.

Brief Description of the Drawings

[49] Notwithstanding any other forms which may fall within the scope of the present invention, preferred embodiments of the disclosure will now be described, by way of example only, with reference to the accompanying drawings in which:

[50] Figure 1 shows a retail checkout terminal fresh produce identification system 100 in accordance with an embodiment;

[51] Figures 2-6 show exemplary colour histogram vectors and associated banded colour histogram from different types of fruit;

[52] Figure 7 shows exemplary Haralick feature vectors for the same fruits of Figures 2-6;

[53] Figure 8 shows test results from hidden layer optimisation;

[54] Figure 9 shows test results from colour band optimisation;

[55] Figure 10 shows test results from texture feature optimisation; and

[56] Figure 11 shows test result detection accuracies.

[57] Figure 12 depicts the pre-processing of a first number of images to generate a second number of images.

[58] Figure 13 depicts the feature extraction of the second number of images and the learning process to generate a categorisation model based upon the second number of images.

[59] Figure 14 depicts the deployed categorisation model in use with a POS system.

Description of Embodiments

[60] Figure 1 shows a retail checkout terminal fresh produce identification system 100 which is trained to predict fresh produce classifications for self-service terminals.

[61] The system 102 comprises a processor 105 for processing digital data. In operable communication with the processor 105 across a system bus is a memory device 106. The memory device 106 is configured for storing digital data, including computer program code instructions and associated data. As such, in use, the processor 105 fetches these computer program code instructions and associated data from the memory 106 for interpretation and execution thereon.

[62] In embodiments shown in Figure 1, the computer program code instructions of the memory device 106 have been shown as being logically divided into various computer program code controllers which will be described in further detail below.

[63] In use, the system 102 is trained utilising a fresh produce learning set 101.

[64] In this regard, the system 102 may operate as a training system 106 and a deployed system 114 for respective operations which may comprise shared or separate componentry. As alluded to above, the deployed system 104 may take the form of a low-cost computing device having low processing and storage requirements.

[65] The system 102 may comprise a mechanical jig 115 for capturing image data. In this regard, the jig 115 may comprise a visible spectrum camera 104 and associated lighting and an appropriate homogenous surface background (not shown) for optimising the image capture process. As such, during training and during use, an item of fresh produce would be placed in front of the visible spectrum camera 104 for the capturing of image data therefrom, or alternatively the image data may be loaded from an image database.

[66] During the training process, the processor 105 stores an image data set 107 captured by the camera 104 from the fresh produce learning set 101 in the memory device 106. In embodiments, for a particular grocer, the grocer may provide a sample set of each produce type to the system for learning.

[67] The memory device 106 may comprise an image cropping and segmentation controller 108 for cropping and segmenting the portion of the image comprising the fresh produce from the homogenous background. In embodiments, the image segmentation may employ Otsu's algorithm.

[68] The image cropping and segmentation controller 108 isolates the fresh produce item image from the homogenous background for use for generating the full feature vector that contains only the fresh produce item on a preferably black background. This process minimizes any background interference and reduces the training required for the neural network model allowing smaller learning data sets 101 to be used to achieve good prediction performance.
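The segmentation step can be sketched as follows: Otsu's algorithm picks the grey level that best separates two pixel populations, and the background side is then set to black. This is a minimal pure-Python sketch on an 8-bit grayscale image held as nested lists; the function names and the `produce_is_brighter` flag are hypothetical, and whether the produce is brighter or darker than the background is an assumption that depends on the jig.

```python
def otsu_threshold(gray, levels=256):
    """Return the grey level t that maximises between-class variance."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # summed intensity of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def mask_background(gray, threshold, produce_is_brighter=True):
    """Set pixels on the background side of the threshold to black (0)."""
    def keep(value):
        return value > threshold if produce_is_brighter else value <= threshold
    return [[value if keep(value) else 0 for value in row] for row in gray]


if __name__ == "__main__":
    image = [[10, 10, 200], [12, 198, 202]]
    t = otsu_threshold(image)
    print(t, mask_background(image, t))
```

Because segmentation relies on the homogenous background, it suits the controlled capture of training images.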

[69] Thereafter, a luminosity correction controller 109 may adjust the luminosity of the image data.

[70] The luminosity may be corrected as part of the data collection and prediction process by normalising the average grayscale luminance of the RGB data image. The RGB images may be converted to grayscale equivalents using a standard colour to grayscale conversion. An average grayscale image luminance set point is chosen, typically half the dynamic range, to normalise all RGB images to have an approximately equal average grayscale luminance.

[71] The luminosity correction may comprise the following computational steps:

Let $I_{xyz}$ represent an image of $x \times y$ pixels, where $z = \{r, g, b\}$ represents each colour component of the RGB image.

Let $I_g$ represent the $x \times y$ pixel grayscale representation of $I_{xyz}$, and let $\bar{I}_g$ represent the average luminance of $I_g$.

Let $I_{sp}$ be the average grayscale luminance set point.

Where $L_c$ is the luminosity corrected image:

$$L_c = \frac{I_{sp}}{\bar{I}_g} \, I_{xyz}$$
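A minimal sketch of the luminosity correction follows, assuming the correction is a uniform scaling of every colour component by the ratio of the set point to the average grayscale luminance. The Rec. 601 grayscale weights, the 8-bit set point of 127.5 (half the dynamic range) and the function names are assumptions; the text says only that a standard colour to grayscale conversion is used.

```python
def mean_gray(image):
    """Average grayscale luminance of an RGB image (rows of (r, g, b))."""
    grays = [0.299 * r + 0.587 * g + 0.114 * b
             for row in image for (r, g, b) in row]
    return sum(grays) / len(grays)


def luminosity_correct(image, set_point=127.5, max_value=255.0):
    """Scale all channels so the average luminance meets set_point."""
    average = mean_gray(image)
    if average == 0:
        return [list(row) for row in image]  # all black: nothing to scale
    scale = set_point / average
    return [[tuple(min(max_value, c * scale) for c in pixel) for pixel in row]
            for row in image]


if __name__ == "__main__":
    dark = [[(50, 50, 50), (100, 100, 100)]]
    corrected = luminosity_correct(dark)
    print(corrected, mean_gray(corrected))
```

Clipping at the maximum value means very bright pixels can saturate, so the corrected average is only approximately equal to the set point, which matches the wording of the text.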

[72] The memory device 106 may further comprise a feature vector generation controller configured for generating a full feature vector 110 comprising a combination of a colour histogram feature vector 111 and a Haralick texture feature vector 112.

Colour histogram

[73] The colour histogram of each fresh produce item is used as a feature of the digital signature vector of the fresh produce item. A set of the colour histograms along with texture features is used to train the neural network model 113.

[74] Figures 2 - 6 show exemplary RGB colour histograms for differing types of apples.

[75] The colour histogram may be created by taking an RGB image and creating a histogram using all possible colour intensity values as the x-axis bins and collecting the frequency of occurrence for each intensity.

[76] The colour histogram may be normalised by scaling the maximum frequency of the banded colour histogram to 1. All other histogram values are linearly reduced by the same scaling factor such that all the colour histogram values are between 0 and 1.
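The histogram construction and maximum normalisation can be sketched as below. Images are rows of (r, g, b) tuples with integer intensities; the function names are hypothetical, and normalising the concatenated vector as a whole (rather than each channel separately) is an assumption.

```python
def colour_histogram(image, depth=256):
    """Count occurrences of each intensity per channel.

    Returns the concatenated vector (red bins, green bins, blue bins).
    """
    red = [0] * depth
    green = [0] * depth
    blue = [0] * depth
    for row in image:
        for r, g, b in row:
            red[r] += 1
            green[g] += 1
            blue[b] += 1
    return red + green + blue


def max_normalise(vector):
    """Scale so the largest value is 1 and all values lie in [0, 1]."""
    peak = max(vector)
    if peak == 0:
        return list(vector)
    return [value / peak for value in vector]


if __name__ == "__main__":
    # Tiny 2 x 2 image with a colour depth of 4 to keep the output short.
    image = [[(0, 1, 3), (0, 1, 2)], [(1, 1, 3), (0, 2, 3)]]
    full = colour_histogram(image, depth=4)
    print(full)
    print(max_normalise(full))
```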

[77] In a preferred embodiment, the histogram bin width is increased in a 'banding' process. This process reduces the colour histogram feature vector length allowing high performance models to be trained using smaller learning data sets.

[78] This is an important feature for small to medium fresh produce vendors where it is not practical to collect large numbers of images. The performance can be improved on a small fixed size data set by reducing the features available to the training model. The banding process is carried out by reducing the number of bins in the colour histogram. The bins in the full colour histogram may be allocated sequentially and distributed evenly to each larger bin. The larger bin frequencies may be calculated by averaging the frequency in the smaller bins allocated to it. The result is the banded colour histogram as shown in Figures 2B-4B.

[79] In embodiments, the number/widths of bins may be optimised when optimising the neural network.

[80] The calculation of the colour histogram and banded colour histogram may comprise:

Full histogram:

Let $I_{xyz}$ represent an image of $x \times y$ pixels, where $z \in \{r, g, b\}$ represents each colour component of the RGB image.

Iverson brackets are defined here as:

$$[P] = \begin{cases} 1, & \text{if } P \text{ is true} \\ 0, & \text{otherwise} \end{cases}$$

Where $n$ is the colour depth, histogram bins are calculated as:

$$z_i = \sum_{x} \sum_{y} [I_{xyz} = i], \qquad i = 0, 1, \ldots, n - 1$$

The three components of the histogram vector are then:

$$FR = (r_0, r_1, \ldots, r_{n-1})$$

$$FG = (g_0, g_1, \ldots, g_{n-1})$$

$$FB = (b_0, b_1, \ldots, b_{n-1})$$

Let the maximum normalised colour histogram vector be:

$$\hat{Z} = \frac{Z}{\max(Z)}$$

The full histogram vector is then constructed as:

$$FH = (FR, FG, FB)$$
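The full histogram calculation above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the applicant's implementation: the function name `full_colour_histogram` is hypothetical, the image is assumed to be an 8-bit RGB array of shape (height, width, 3), and normalising each colour component separately is one reading of the max-normalisation step.

```python
import numpy as np

def full_colour_histogram(image, n=256):
    """Return FH = (FR, FG, FB), each component scaled so its peak is 1.

    Assumes `image` has shape (height, width, 3) with integer
    intensities in the range 0 .. n-1.
    """
    components = []
    for z in range(3):  # z indexes the r, g and b colour components
        # counts[i] is the number of pixels whose intensity equals i,
        # i.e. the sum of the Iverson bracket over all pixel positions
        counts = np.bincount(image[:, :, z].ravel(), minlength=n).astype(float)
        peak = counts.max()
        components.append(counts / peak if peak > 0 else counts)
    return np.concatenate(components)

# Toy 2 x 2 image used purely as an illustration
image = np.array([[[0, 10, 255], [0, 10, 255]],
                  [[0, 20, 255], [5, 20, 254]]], dtype=np.uint8)
fh = full_colour_histogram(image)
```

For an 8-bit image the resulting vector has 3 x 256 = 768 entries, which is the length that the banding process described next is intended to reduce.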

Banded histogram:

Where b is the number of bands: the band width is calculated as: n =

The banded histogram bins are calculated as:

Note: b-1 will be an average of (m - 1) components if n is not an integer multiple of b.

The three components of the banded histogram vector are then:

$$BR = (r_0, r_1, \ldots, r_{b-1})$$

$$BG = (g_0, g_1, \ldots, g_{b-1})$$

$$BB = (b_0, b_1, \ldots, b_{b-1})$$

The banded histogram vector is then constructed as:

$$BH = (BR, BG, BB)$$
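The banding step can be illustrated as follows. This is a sketch under stated assumptions, not the patented formula: the function name `banded_histogram` is hypothetical, and the sequential, even allocation of fine bins to bands is implemented with `np.array_split`, which is one reading of the text that makes the later bands average one fewer component when n is not an integer multiple of b. Each band is the mean of its fine bins, and the result is scaled so that its maximum is 1.

```python
import numpy as np

def banded_histogram(hist, b):
    """Reduce one colour component's histogram of n bins to b bands.

    Fine bins are allocated sequentially and as evenly as possible;
    each band frequency is the mean of the fine bins allocated to it.
    """
    hist = np.asarray(hist, dtype=float)
    bands = np.array([chunk.mean() for chunk in np.array_split(hist, b)])
    peak = bands.max()
    return bands / peak if peak > 0 else bands

# Illustration: 10 fine bins reduced to 3 bands of 4, 3 and 3 bins
example = banded_histogram(np.arange(10.0), 3)

# A 256-bin component reduced to 10 bands, as in the exemplary test
ten_bands = banded_histogram(np.ones(256), 10)
```

With 10 bands per colour component the colour feature vector shrinks from 768 to 30 entries, which is the reduction the description relies on for small learning sets.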

[81] In embodiments, 14 texture features (e.g. Haralick texture features) are used comprising: Angular Second Moment, Contrast, Correlation, Sum of Squares: Variance, Inverse Difference Moment, Sum Average, Sum Variance, Sum Entropy, Entropy, Difference Variance, Difference Entropy, Info. Measure of Correlation 1, Info. Measure of Correlation 2, Max. Correlation Coefficient. The mathematical calculation of these features is common state of the art and is presented here.

[82] These 14 texture features are combined into a vector and used as a set of features for training the neural network predictor. Specifically, Figure 7 shows the texture feature vector collected for each of the apple varieties mentioned above.
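To make the texture features more concrete, the sketch below computes four of the 14 listed features (Angular Second Moment, Contrast, Inverse Difference Moment and Entropy) from a grey-level co-occurrence matrix. It is an illustration only: the function names, the single horizontal pixel offset, the symmetric matrix and the natural logarithm in the entropy term are all assumptions, and the remaining ten features are omitted.

```python
import numpy as np

def glcm(gray, levels, dx=1, dy=0):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset."""
    p = np.zeros((levels, levels), dtype=float)
    height, width = gray.shape
    for y in range(height - dy):
        for x in range(width - dx):
            i, j = gray[y, x], gray[y + dy, x + dx]
            p[i, j] += 1  # count the pair in both directions
            p[j, i] += 1
    return p / p.sum()

def texture_features(p):
    """Return [ASM, contrast, inverse difference moment, entropy]."""
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)
    contrast = np.sum((i - j) ** 2 * p)
    idm = np.sum(p / (1.0 + (i - j) ** 2))
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    return np.array([asm, contrast, idm, entropy])

# Two tiny 2-level images: a flat patch and a checkerboard
flat = texture_features(glcm(np.zeros((2, 2), dtype=int), levels=2))
checker = texture_features(glcm(np.array([[0, 1], [1, 0]]), levels=2))
```

The flat patch gives maximal uniformity and zero contrast, while the checkerboard gives the opposite, which is the kind of separation the texture vector contributes alongside colour.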

[83] The full feature vector 110 is then utilised to optimise the neural network model 113.

[84] The prediction output by the system 100 may be sent via an interface to a self-service terminal 119 for on-screen display of the predicted item on a screen 120. Such a prediction may be cross referenced with a produce database 122 for the purposes of checkout. As alluded to above, where an item cannot be predicted to a certain degree of accuracy, a sub selection interface 121 may be presented on screen 120 comprising the likely candidates from the set of produce items for selection by the user.

Neural network optimisation - exemplary test results

[85] During optimisation, the following parameters may be varied: number of neural network layers, number of neural network nodes in each layer, number of colour histogram bands, number of texture features.

[86] The system may automate the optimisation of these parameters to optimise the detection accuracy of the neural network model 113.

[87] For example, for the exemplary test results provided below, for which the colour histograms and texture feature vectors are provided in Figures 2 - 7, 2866 images were used over 6 categories comprising:

[88] 1. Apple - Granny Smith

[89] 2. Apple - Pink Lady

[90] 3. Apple - Red Delicious

[91] 4. Apple - Royal Gala

[92] 5. Mandarin - Imperial

[93] 6. Orange - Navel

[94] During training, as will be described in further detail below, the system optimised the neural network model 113 to comprise a single hidden layer having 137 hidden layer neurons and utilising all texture features and 10 bands for the banded colour histogram.

[95] Figures 8 - 10 provide various performance plots to illustrate the optimisation process.

[96] For each of the various model configurations, 22 models were developed using random selections of training, validation and test sets.

[97] The performance was then checked using a test set where the following are calculated and plotted using the 22 runs:

a. Mean

b. 95% confidence interval for the mean

c. Minimum.

[98] Selection of the best neural network model configuration is based on these parameters, wherein the best performing model has the largest mean, smallest confidence interval and a minimum close to the lower confidence interval. Model performance is compared based on these parameters. Selecting a model based on this approach provides a solution to finding optimal performance on a small dataset. The typical solution to increasing model performance is to increase the dataset size; however, as alluded to above, in particular for fresh produce markets, only a limited fresh produce learning set 101 may be available. As such, for the present application, it is not possible to adopt the conventional approach of increasing the size of the learning set for optimising accuracy.
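The three selection statistics can be sketched as below. This is an illustration under assumptions, not the applicant's procedure: the function name `summarise_runs` is hypothetical, the confidence interval uses a Student-t interval with the critical value for 22 runs hard-coded, the lexicographic ranking is just one way to combine the three criteria, and the accuracy lists are synthetic placeholders rather than results from the test.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 21 degrees of freedom (22 runs)
T_CRIT_95_DF21 = 2.080

def summarise_runs(accuracies, t_crit=T_CRIT_95_DF21):
    """Mean, 95% confidence interval for the mean, and minimum of the runs."""
    mean = statistics.mean(accuracies)
    half_width = t_crit * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return {"mean": mean,
            "ci_low": mean - half_width,
            "ci_high": mean + half_width,
            "min": min(accuracies)}

def rank_key(summary):
    """Prefer a larger mean, then a narrower interval, then a minimum near ci_low."""
    width = summary["ci_high"] - summary["ci_low"]
    gap = abs(summary["ci_low"] - summary["min"])
    return (summary["mean"], -width, -gap)

# Synthetic accuracies for two hypothetical configurations, 22 runs each
runs = {"config_a": [0.86, 0.88] * 11, "config_b": [0.70, 0.99] * 11}
summaries = {name: summarise_runs(values) for name, values in runs.items()}
best = max(summaries, key=lambda name: rank_key(summaries[name]))
```

A different number of runs would need a different critical value, so `t_crit` is left as a parameter.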

Hidden layer optimisation

[99] Figure 8 illustrates hidden layer optimisation wherein, as can be seen, although the mean performance is quite consistent, peaking at ~78% where 116 hidden neurons are used, using approximately 140 hidden neurons results in widely varying model performance.

Colour band optimisation

[100] Figure 9 additionally shows why colour band optimisation is important for achieving optimal performance on the small fresh produce learning set 101. Specifically, when the full 256 colour bands are used, performance peaks at 116 hidden layer neurons with a mean of ~78%. With 10 colour bands the performance peaks at ~87%.

Texture feature optimisation

[101] Figure 10 illustrates optimising the number of texture features.

[102] Although the models with 5, 10 and 14 texture features have mean performance of ~87%, models with 10 texture features yield more consistent performance and therefore would be the best choice for this particular application.

[103] Figure 11 illustrates the final performance of the system 100 according to this exemplary test.

[104] As discussed above, disclosed are a plurality of embodiments of methods and systems for image categorisation. Discussed in detail above is one embodiment which utilises two feature vectors extracted from fresh produce image data, comprising a colour histogram feature vector and Haralick texture features, which are combined to form a full feature vector. That is, methods and systems utilising two feature extraction processes of a feature vector generation controller are disclosed above. Alternatively, a feature vector generation controller is disclosed which processes an image data set according to the following feature extraction processes:

a. a pre-trained Convolution Neural Network (CNN) trained on large unrelated and/or separate data sets used as a feature detector;

b. Colour space histogram;

c. Texture features by a numerical feature vector; and

d. Dominant colour segmentation

[105] A feature vector generation controller to effect the execution of the categorization model may therefore be embedded in a processor utilized at deployment.

[106] As previously discussed, the deployment system 102, including a camera to capture an image at the place of deployment, may take the form of a low-cost computing device having low processing and storage requirements. The deployment system 102 can be installed external to a POS system or may be integrated into a POS system. Either way, external or integrated, in order for the presently described systems and methods to be relevant in a commercial setting, providing a percentage likelihood of an image’s categorization quickly, reliably and inexpensively is preferred. Current POS systems can cooperate with the presently described systems and methods externally. Thus, the presently described systems and methods are designed to communicate a percentage likelihood of an image’s categorization to a POS system using a disclosed protocol.

[107] As discussed above, in order to train the neural network on input images, a learning set is provided. In Figure 1, digital operations including image crop and segmentation 108 and luminosity correction 109 are performed on a learning set. Further detail of a training system is provided in Figure 12, wherein the image data set is a “2nd number of images” generated as a result of digital operations on a “1st number of images.”

[108] Figure 12 depicts the pre-processing of a 1st number of images to generate a 2nd number of images. A 1st number of images, for example 100 images of one type of fruit, may be provided at steps 150 and 152. The 1st number of images may further be processed by removing the background from the product images 154 to generate masked product images 156. The images 152 and 156 are then subjected to augmenting techniques 158 including flip, rotate, shear, zoom and skew, as well as augmenting to deal with lighting and product variation, including lighting colour and brightness shifting and colour variation noise. This augmenting process may expand the original image set by > 10x. This expanded product image set may be combined with a background set, including empty scenes where the classification model is to be deployed, by overlaying images randomly or non-randomly for scene simulation 160 to generate a 2nd expanded number of product images 162. The background set may contain > 10 scenes and may include various lighting conditions, to provide robustness in addition to the augmentation. To maximise performance across the expected deployment environments, the background set may be augmented as described, including lighting and colour variation. The background set may be an exhaustive representation of all environments expected in deployment. From the initial 1st number of images, for example 100 images, product image expansion and scene simulation may generate > 10000 images with sufficient quantitative variation (which may be segmented images or non-segmented images) to train the neural network, such that the neural network is robust to variation in lighting, background and natural product variation.
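The augmentation and scene simulation steps can be sketched with NumPy alone. This is a minimal sketch, not the disclosed pipeline: the function names, the brightness range of 0.8 to 1.2 and the uniformly random placement are assumptions, and shear, zoom, skew and colour noise are left out for brevity. The product image is assumed to come with a boolean foreground mask of the same height and width.

```python
import numpy as np

def augment(product, mask, rng):
    """Return a randomly rotated, flipped and brightness-shifted copy."""
    quarter_turns = int(rng.integers(0, 4))
    product = np.rot90(product, quarter_turns)
    mask = np.rot90(mask, quarter_turns)
    if rng.random() < 0.5:
        product, mask = np.fliplr(product), np.fliplr(mask)
    gain = rng.uniform(0.8, 1.2)  # assumed brightness range
    product = np.clip(product.astype(float) * gain, 0, 255).astype(np.uint8)
    return product, mask

def simulate_scene(background, product, mask, rng):
    """Overlay the masked product at a random position on an empty scene."""
    scene = background.copy()
    height, width = mask.shape
    top = int(rng.integers(0, background.shape[0] - height + 1))
    left = int(rng.integers(0, background.shape[1] - width + 1))
    region = scene[top:top + height, left:left + width]  # view into scene
    region[mask] = product[mask]  # copy only the foreground pixels
    return scene

rng = np.random.default_rng(0)
background = np.zeros((8, 8, 3), dtype=np.uint8)    # stand-in empty scene
product = np.full((3, 2, 3), 200, dtype=np.uint8)   # stand-in product image
mask = np.ones((3, 2), dtype=bool)                  # every pixel is foreground
augmented, augmented_mask = augment(product, mask, rng)
scene = simulate_scene(background, augmented, augmented_mask, rng)
```

Calling `augment` and `simulate_scene` repeatedly over every product image and background is what multiplies a set of around 100 images into the much larger training set the paragraph describes.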

[109] Other types of images can also be processed as described above to simulate conditions without the expensive process of data collection. For example, bag simulation on items using images of bag texture blended with masked produce images may be utilized to provide robust classification performance where products are placed in semi-translucent packaging. Furthermore, hand simulation with product, for example by using hand images combined with masks, may be utilized. For products that are not produce, such as loose bulk products, the same processes can be utilised. A benefit is that items without bar codes can be processed quickly at the site of deployment as if they had a bar code.

[110] Turning now to Figure 13, Figure 13 depicts the feature extraction of the 2nd number of images and the learning process to generate a categorisation model based upon the 2nd number of images. The feature extraction process may include utilising a pre-trained convolution neural network (CNN) 180a for high-level feature extraction. The CNN may be pre-trained on a large dataset, for example millions of images, and then truncated to deliver general feature extraction/detection. It is beneficial to choose a high performing, low computational cost architecture, e.g. the MobileNet architecture. This approach delivers much higher performance since the pre-trained CNN is able to identify high level features in various scenes. In combination with more general features such as colour and texture, an exceptional level of performance and generalisation can be achieved. Note that this approach is in contrast to the state of the art for product identification where, typically, a CNN architecture would be trained on the available dataset explicitly. While these architectures have proven to perform well in a wide variety of applications, without a significant number of images per category (>> 1000) training these models proves to be difficult and the models do not generalise.

[111] The generated 2nd expanded number of product images 162 are received so that quantitative features can be extracted. As discussed above, the feature extraction can include a pre-trained Convolution Neural Network (CNN) 180b, trained on a large data set separate to the 1st set of images and used as a feature detector; a colour space histogram 182, such as R, G, B colour histograms (wherein colour bands may be optimized); texture features by a numerical feature vector 184, such as Haralick texture features; and dominant colour segmentation 186, such as dominant colour segments obtained using K-Means colour segmentation.

[112] A fully-connected feed-forward neural network 188 trains on the features extracted from the input images 162. The feed-forward neural network may generate a score for each category to output a classification model 190 which can run on a feature vector generation controller to make predictions of images received at a deployment location. The classification model 190 may be embedded as a feature vector generation controller and incorporated into an inexpensive processor 105 of Figure 1. In keeping with the commercial aspects of the presently disclosed systems and processes, the benefit of the categorization model arrived at through the described processes is that running it requires little processing power for quick output at the deployment location and does not require storage of images or data signatures.
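As an illustration of how a fully-connected feed-forward network turns a combined feature vector into per-category percentage scores, the forward pass below uses random placeholder weights. It is a sketch only: the ReLU hidden activation and softmax output are assumptions (the text does not name them), the sizes simply mirror the exemplary test described earlier (30 banded colour values plus 14 texture values, 137 hidden neurons, 6 categories), and no CNN features are included.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_scores(features, w1, b1, w2, b2):
    """Single-hidden-layer forward pass returning percentage likelihoods."""
    hidden = np.maximum(0.0, features @ w1 + b1)  # ReLU is an assumption
    return 100.0 * softmax(hidden @ w2 + b2)

rng = np.random.default_rng(0)
colour = rng.random(30)    # stand-in for 3 x 10 banded histogram values
texture = rng.random(14)   # stand-in for 14 texture features
features = np.concatenate([colour, texture])

# Placeholder weights; a trained model would supply these
w1, b1 = rng.normal(size=(44, 137)) * 0.1, np.zeros(137)
w2, b2 = rng.normal(size=(137, 6)) * 0.1, np.zeros(6)
scores = predict_scores(features, w1, b1, w2, b2)
```

Only the weight arrays and this small amount of arithmetic are needed at deployment, which is consistent with the point that no images or data signatures have to be stored on the device.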

[113] Turning to Figure 14, Figure 14 depicts the deployed categorisation model in use with a POS system. That is, the disclosed methods and systems can include a system 194 external to a Point-of-Sale (POS) system 196 at the deployment location, wherein the external system comprises a processor 105, captures an image with a visible spectrum camera 198, for example of an unknown fresh produce item 200, and runs a classification model embedded in the processor 105 that provides as output scores for the percentage likelihood of the image’s category. The image feature extraction at the POS system deployment utilizes the same parameters and configuration as that used in training. Alternatively, the deployed feature extraction may include variations such as non-segmentation, discussed below. Training data is stored in the Cloud and not locally. Only a small trained categorization model is embedded in the processor 105, so that the deployed feature vector generation controller is a fraction of the size of the data. Training data for 100 categories, for example, may be > 40GB whereas the deployed model and code base is < 150MB.

[114] The external system 194 may generate a formatted communication as output comprising a protocol to the POS system, wherein the POS system 196 receives the formatted communication of the output by a classification model of the external system. Depending upon the pre-processing, and starting with a first number of images, segmentation of the first number of images may be processed wherein segmentation of an image received from the visible spectrum camera 198 is not performed at deployment. Alternatively, the visible spectrum camera 198 may be a 3-D camera so that segmentation is not performed at deployment, but rather achieved by depth thresholding. Various adjustments may be made to limit the amount of processing required at deployment so as to allow the processing to occur quickly. The present systems and methods are intended to operate quickly and for the hardware of the vision system 194 to be inexpensive.

[115] Segmentation (extraction of only the masked produce image in the foreground, without the background), as noted, may affect processing efficiency at the deployment location. As mentioned, prediction may be run on non-segmented or segmented images. For segmented images, background simulation is not required. Segmentation robustness depends on the approach:

a. Threshold background subtraction: create a model (e.g. Gaussian, KNN) of the background using >= 1 images, then compare it to the produce image to create the mask.

b. Stereoscopic imaging: obtain depth information and create a mask based on a known background depth.

For non-segmented images, scene simulation may be used to teach a system to recognize produce in various environments.
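Both segmentation approaches can be sketched as simple per-pixel tests. The code below is a simplified illustration: it swaps the Gaussian or KNN background model mentioned above for a plain per-pixel mean of the background images, and the function names, the intensity threshold of 30 and the depth tolerance of 0.02 are hypothetical values chosen for the example.

```python
import numpy as np

def background_mask(frame, background_frames, threshold=30.0):
    """Foreground mask by thresholded difference from a mean background."""
    model = np.mean(np.stack(background_frames).astype(float), axis=0)
    difference = np.abs(frame.astype(float) - model).max(axis=2)
    return difference > threshold

def depth_mask(depth, background_depth, tolerance=0.02):
    """Foreground mask for pixels closer to the camera than the background."""
    return depth < (background_depth - tolerance)

# Colour example: an empty scene and a frame with one bright pixel
empty = np.zeros((4, 4, 3), dtype=np.uint8)
frame = empty.copy()
frame[1, 2] = (200, 180, 40)
colour_mask = background_mask(frame, [empty, empty])

# Depth example: background at 1.0 (arbitrary units), one pixel at 0.9
depth = np.full((4, 4), 1.0)
depth[2, 1] = 0.9
stereo_mask = depth_mask(depth, background_depth=1.0)
```

Either mask can then be applied to the captured image so that only the foreground pixels feed the feature extraction.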

[116] In another embodiment, the deployed system may combine multiple viewing angles to increase statistical variance of features. Multiple cameras may be implemented by combining (stitching) images into a single image and running it through the previously discussed prediction process. Mirrors, used to enhance viewing angles and increase variance, may be used in the same stitching process. Operation without lights may be achieved with auto exposure adjustment or an HDR capable camera. The camera can be live calibrated using laser or external lighting in a scanner or dedicated calibration lighting/laser. The disclosed systems and methods may be implemented to only disable selection of non-barcoded items when a sufficient prediction score is not reached.

[117] The algorithmic instructions for feature extraction, including the pre-trained CNN, along with the trained neural network may be deployed on an external system with communication to the POS system, such as a low-cost single board computer, or deployed directly onto the POS machine where communication is facilitated virtually.

[118] As shown in FIG. 13, the communication with the POS system may be implemented as an HTTP server that sends a JSON document to provide a percentage likelihood of an image’s 200 categorization. It uses an Ethernet connection, but can be adapted for Wi-Fi, serial or other digital data transfer mechanisms.
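A minimal sketch of such an HTTP endpoint, using only the Python standard library, is given below. It is illustrative rather than the disclosed protocol: the handler name, the "label" field and the score values are placeholders, and the server is bound to the loopback address on an ephemeral port so that the example can fetch from itself once.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder scores for illustration only; not results from the source
LATEST = {"predictions": [
    {"label": "Apple - Pink Lady", "score": 90.0},
    {"label": "Apple - Royal Gala", "score": 10.0},
]}

class PredictionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the most recent prediction document as JSON
        body = json.dumps(LATEST).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch_once():
    """Start the server, request the document once, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), PredictionHandler)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        document = json.loads(response.read().decode("utf-8"))
    worker.join()
    server.server_close()
    return document

document = fetch_once()
```

In a deployment the server would instead call `serve_forever()` and the POS system would poll the endpoint over Ethernet or whichever transport is in use.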

[119] The disclosed systems and methods can provide for various prediction options: object detection using threshold assessment of a masked image; triggering by external input (fresh produce button pressed, scale stability); running prediction constantly, making the result always available to an external system when required; and/or using constant prediction to assess if produce is present and triggering the external system when sufficient certainty is reached. Certain categories may not be required when they are not active in the POS system. A minimum top score may need to be achieved before one or more results are displayed, and an optional cut-off score for ranked results may be provided.
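The filtering described at the end of this paragraph can be sketched as a small ranking function. The function name, parameter names and numeric thresholds below are hypothetical and chosen only to illustrate the three controls: dropping categories that are not active in the POS system, requiring a minimum top score, and applying a cut-off to the ranked results.

```python
def rank_predictions(scores, active, min_top_score=50.0, cutoff=5.0, max_results=5):
    """Return (label, score) pairs to display, best first.

    An empty list means no result is confident enough to show.
    """
    ranked = sorted(
        ((label, score) for label, score in scores.items() if label in active),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if not ranked or ranked[0][1] < min_top_score:
        return []
    return [(label, score) for label, score in ranked[:max_results] if score >= cutoff]

# Illustrative percentage scores, not measured results
scores = {"Apple - Pink Lady": 62.0, "Apple - Royal Gala": 30.0,
          "Orange - Navel": 6.0, "Mandarin - Imperial": 2.0}
active = {"Apple - Pink Lady", "Apple - Royal Gala", "Mandarin - Imperial"}

shown = rank_predictions(scores, active)
uncertain = rank_predictions({"Apple - Pink Lady": 40.0}, active)
```

Returning an empty list when the top score is too low leaves the caller free to fall back to manual selection.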

[120] The disclosed system includes a convenient way to communicate with the POS via a defined protocol. Predictions made by the device may be stored in a JavaScript Object Notation (JSON) file, allowing easy integration into most programming languages. The JSON file may be served periodically or when requested by the POS via an HTTP server or serial link. Other standard data structures, e.g. XML, may be used which allow formatting of the following information.

Table 1: JSON Structure

Contains a single JSON list "predictions".

The list is sorted by "score" from largest to smallest.
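A document with this shape might be produced as follows. Only the "predictions" list and the "score" key come from the description above; the "label" key and the function name are assumptions made for the example, and the scores are placeholders.

```python
import json

def build_prediction_document(scores):
    """Serialise scores as a JSON object with one sorted "predictions" list."""
    predictions = [{"label": label, "score": score}
                   for label, score in scores.items()]
    # Largest score first, as the structure requires
    predictions.sort(key=lambda entry: entry["score"], reverse=True)
    return json.dumps({"predictions": predictions})

# Placeholder percentage scores for illustration
json_text = build_prediction_document({
    "Apple - Royal Gala": 12.5,
    "Apple - Pink Lady": 80.0,
    "Apple - Red Delicious": 7.5,
})
parsed = json.loads(json_text)
```

Because the list is already ordered, a POS client can take the first entry as the top prediction without sorting.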

Table 2: Message types

[121] The disclosed systems and methods provide a scalable solution for identifying non-barcoded items such as fresh produce and buy-in-bulk items at a POS system using a camera. The disclosed systems and methods allow new items to be added. The disclosed systems and methods beneficially allow a general solution by learning high level feature relationships that can be taught to account for variation in lighting, background and seasonal variations. The disclosed systems and methods avoid expensive hardware and are therefore scalable due to low implementation costs. Also avoided are high internet bandwidth and server costs that could prohibit using a cloud API service.

[122] The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilise the invention and various embodiments with various modifications as are best suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.