Title:
METHOD AND COMPUTER PROGRAM PRODUCT FOR CHARACTERIZING A RETINA OF A PATIENT FROM AN EXAMINATION RECORD COMPRISING AT LEAST ONE IMAGE OF AT LEAST A PART OF THE RETINA
Document Type and Number:
WIPO Patent Application WO/2017/046378
Kind Code:
A1
Abstract:
Method comprising the steps consisting in: - automatically detecting image patterns on the image, - automatically associating image patterns with visual words, - automatically providing a distribution of visual words over at least a part of the image of the retina to be characterized, wherein elementary clusters each grouping an elementary number of training image patterns detected in each training image, and composite clusters each grouping at least two elementary clusters are defined, wherein visual words at different granularity levels are defined, a finest-grained visual word being assigned to each elementary cluster and a coarser-grained visual word being assigned to each composite cluster, and wherein, for each image pattern: - a corresponding cluster is selected among the elementary and composite clusters, - said image pattern is associated with the visual word of the corresponding cluster.

Inventors:
QUELLEC GWENOLÉ (FR)
LAMARD MATHIEU (FR)
CAZUGUEL GUY (FR)
Application Number:
PCT/EP2016/072050
Publication Date:
March 23, 2017
Filing Date:
September 16, 2016
Assignee:
INSERM (INSTITUT NAT DE LA RECH MÉDICALE) (FR)
UNIV BRETAGNE OCCIDENTALE (FR)
INST MINES-TELECOM (FR)
International Classes:
G06F19/00
Domestic Patent References:
WO2013184070A1 2013-12-12
Other References:
GWÉNOLÉ QUELLEC ET AL: "A multiple-instance learning framework for diabetic retinopathy screening", MEDICAL IMAGE ANALYSIS, vol. 16, no. 6, 1 August 2012 (2012-08-01), pages 1228 - 1240, XP055130437, ISSN: 1361-8415, DOI: 10.1016/j.media.2012.06.003
WEI BU ET AL: "Hierarchical Detection of Hard Exudates in Color Retinal Images", JOURNAL OF SOFTWARE, vol. 8, no. 11, 1 November 2013 (2013-11-01), XP055259695, ISSN: 1796-217X, DOI: 10.4304/jsw.8.11.2723-2732
Attorney, Agent or Firm:
CABINET PLASSERAUD (FR)
Claims:
CLAIMS

1. Method for characterizing a retina of a patient from an examination record comprising at least one image of at least a part of the retina, the method comprising the steps consisting in:

- automatically detecting image patterns on the image, each image pattern corresponding to a singularity of the retina to be characterized,

- automatically associating image patterns with visual words,

- automatically providing a distribution of visual words over at least a part of the image of the retina to be characterized,

the method being characterized in that, based on a plurality of training examination records each comprising at least one training image of at least a part of the retina of a patient, elementary clusters each grouping an elementary number of training image patterns detected in each training image, and composite clusters each grouping at least two elementary clusters are defined,

in that visual words at different granularity levels are defined, a finest-grained visual word being assigned to each elementary cluster and a coarser-grained visual word being assigned to each composite cluster, and

in that the step of automatically associating image patterns with visual words comprises, for each image pattern:

- selecting a corresponding cluster among the elementary and composite clusters,

- associating said image pattern with the visual word of the corresponding cluster.

2. Method according to claim 1, further comprising, before the step of automatically associating image patterns with visual words, a step consisting in automatically defining visual words at different granularity levels which comprises:

- providing a plurality of training examination records each comprising at least one training image of at least a part of the retina of a patient,

- detecting training image patterns in each training image,

- clustering the training image patterns into elementary clusters each grouping an elementary number of training image patterns, and into composite clusters each grouping at least two elementary clusters,

- assigning a finest-grained visual word to each elementary cluster and a coarser-grained visual word to each composite cluster.

3. Method according to claim 2, wherein the step of clustering the training image patterns comprises:

- characterizing the training image patterns of the training images, each training image pattern having a training feature vector describing the corresponding singularity and a surrounding of the corresponding singularity of the retina,

- calculating a central vector for each of the elementary and composite clusters based on the training feature vectors.

4. Method according to claim 3, wherein the step of automatically associating image patterns with visual words further comprises characterizing image patterns of the image of the retina to be characterized, each image pattern having a feature vector describing the corresponding singularity and a surrounding of the corresponding singularity of the retina,

and wherein the corresponding cluster selected for each image pattern is one of the elementary and composite clusters having the nearest central vector to the feature vector of said image pattern.

5. Method according to any of claims 2 to 4, wherein the step of clustering the training image patterns comprises:

- defining at least one index representative of a predetermined state of the retinas of the examination records,

- labelling elementary clusters extracted from training images of the examination records that present the predetermined state as positive, and elementary clusters extracted from training images of the examination records that do not present the predetermined state as negative,

- grouping positive elementary clusters and grouping negative elementary clusters to form composite clusters.

6. Method according to claim 5 when it depends on claim 3, wherein each elementary cluster extracted from training images of retinas is labelled either positive or negative in accordance with a diverse density score defined by, for an $i$-th cluster $C_i$ of central vector $c_i$, $i = 1, \dots, K$, $K$ being the total number of elementary clusters:

$$DD(C_i) = \prod_{j} \left[ 1 - \prod_{k} \left( 1 - e^{-\left\| g_{(j,k,+)} - c_i \right\|^2} \right) \right] \prod_{j} \prod_{k} \left( 1 - e^{-\left\| g_{(j,k,-)} - c_i \right\|^2} \right)$$

where $R_j^{(+)}$ denotes the $j$-th positive examination record,

$R_j^{(-)}$ denotes the $j$-th negative examination record,

$g_{(j,k,+)}$ denotes the training feature vector of the $k$-th training image pattern of the $j$-th positive examination record,

$g_{(j,k,-)}$ denotes the training feature vector of the $k$-th training image pattern of the $j$-th negative examination record,

$\delta$ stands for the label $+$ or $-$ in the generic notations $R_j^{(\delta)}$ and $g_{(j,k,\delta)}$.

7. Method according to claim 6, wherein the step of clustering the training image patterns into elementary clusters comprises determining an optimal total number $K_0$ of elementary clusters by:

- increasing the total number $K$ of elementary clusters until an increase of a criterion $\frac{1}{K} \sum_{i=1}^{K} DD(C_i)$ reaches a threshold,

- when the criterion reaches the threshold, assigning the value of $K$ to $K_0$.

8. Method according to any of claims 1 to 7, wherein each examination record comprises a plurality of images of at least one retina, the method further comprising a step of creating a composite image of said retina.

9. Method according to claim 8, further comprising:

- before the step consisting in providing the distribution of visual words, the step consisting in automatically dividing the composite image into at least two regions,

- during the step consisting in providing the distribution of visual words, providing the distribution of visual words over each region of the composite image.

10. Method according to any of claims 1 to 9, wherein each examination record further comprises contextual data relative to the patient,

wherein, based on the plurality of training examination records, frequent itemsets containing at least one individual semantic item related to the distribution of visual words and/or at least one individual contextual item related to the contextual data are defined, and wherein the method further comprises, after the step of providing the distribution of visual words, a step consisting in checking the occurrence of the frequent itemsets in the examination record of the retina to be characterized.

11. Method according to claim 10, further comprising a step consisting in defining frequent itemsets which comprises:

- for each training examination record:

defining the individual semantic items each describing in a binary manner a part of the distribution of visual words, a combination of the individual semantic items describing the whole distribution of visual words,

defining the individual contextual items each describing in a binary manner a part of the contextual data, a combination of the individual contextual items describing the whole contextual data,

- mining the training examination records to identify as frequent itemsets combinations of at least one individual semantic item and/or at least one individual contextual item having an occurrence above a determined confidence threshold.

12. Method according to any of claims 10 and 11 when it depends on claim 9, wherein the frequent itemsets further contain at least one individual semantic item related to the distribution of visual words and/or at least one individual contextual item related to the contextual data and/or at least one individual spatial item related to the region of the structure of interest.

13. Method according to claim 12 when it depends on claim 11, wherein the step of defining frequent itemsets further comprises:

- for each training examination record, defining individual spatial items each describing in a binary manner at least a part of one of the regions of the structure of interest, a combination of the individual spatial items describing the whole regions of the structure of interest,

- mining the training examination records to identify as frequent itemsets associations of at least one individual semantic item and at least one individual contextual item with at least one individual spatial item having an occurrence above the determined confidence threshold.

14. Method according to any of claims 10 to 13, further comprising a step of associating each frequent itemset with a criterion related to the retina.

15. Computer program product for characterizing a retina of a patient from an examination record comprising at least one image of at least a part of the retina, said computer program product being stored on a carrier readable by an electronic processing unit and comprising instructions operable to cause the electronic processing unit to implement the method according to any of claims 1 to 14.

Description:
Method and computer program product for characterizing a retina of a patient from an examination record comprising at least one image of at least a part of the retina

The invention relates to a method and a computer program product for characterizing a retina of a patient from an examination record comprising at least one image of at least a part of the retina.

The invention finds particular applications in automated detection and progression measurement of retinal pathologies.

Retinal pathologies are responsible for millions of blindness cases worldwide. According to the World Health Organization, 4.5 million people are blind due to glaucoma, 3.5 million people are blind due to age-related macular degeneration and 2 million people are blind due to diabetic retinopathy. Early diagnosis is the key to slowing down the progression of these diseases and therefore preventing the occurrence of blindness.

The leading modality for detecting retinal pathologies is color fundus photography. In recent years, many algorithms have been developed for the automatic analysis of color fundus photographs (R. Winder, P. Morrow, I. McRitchie, J. Bailie and P. Hart, "Algorithms for digital image processing in diabetic retinopathy", Comput Med Imaging Graph, vol. 33, no. 8, pp. 608-22, December 2009 and M. D. Abramoff, M. K. Garvin and M. Sonka, "Retinal imaging and image analysis", IEEE Rev Biomed Eng, vol. 3, pp. 169-208, 2010). The goal is usually to detect the first appearing signs of a target pathology, such as microaneurysms in the case of diabetic retinopathy (M. Niemeijer, B. van Ginneken, M. J. Cree et al., "Retinopathy online challenge: automatic detection of microaneurysms in digital color fundus photographs", IEEE Trans Med Imaging, vol. 29, no. 1, pp. 185-95, January 2010). Based on these detections, an automated diagnosis can be made: is the target pathology present in the patient or not? Such automated diagnoses have proven to be as efficient as a second expert opinion, compared to the initial expert opinion (M. D. Abramoff, J. M. Reinhardt, S. R. Russell, J. C. Folk, V. B. Mahajan, M. Niemeijer and G. Quellec, "Automated early detection of diabetic retinopathy", Ophthalmology, vol. 117, no. 6, pp. 1147-54, June 2010).

However, an automated diagnosis system that can only diagnose one pathology is not likely to replace a human expert: ophthalmologists would want to look at images themselves to make sure no other pathology is present. An ability to detect as many anomalies as a human expert is required to allow widespread use of an automated diagnosis system. In order to achieve this goal, one solution would be to define a specific detector for each possible anomaly. But due to the large range of retinal pathologies, this solution is too complex.

A few solutions have been presented in the past to characterize the appearance of abnormal examination records. These solutions typically rely on Multiple Instance Learning (MIL) (J. Amores, "Multiple instance classification: review, taxonomy and comparative study", Artif Intell, vol. 201, pp. 81-105, August 2013). The purpose of MIL is to identify a local image pattern that can explain a diagnosis assigned to the case as a whole. Extensions to the identification of multiple local patterns have been presented (J. R. Foulds and E. Frank, "Speeding up and boosting diverse density learning", in Proc DS, 2010, pp. 102-16 and G. Quellec, M. Lamard, M. D. Abramoff, E. Decenciere, B. Lay, A. Erginay, B. Cochener and G. Cazuguel, "A multiple-instance learning framework for diabetic retinopathy screening", Med Image Anal, vol. 16, no. 6, pp. 1228-40, August 2012). Recently, MIL has been applied to eye fundus examinations (X. Yu, W. Hsu, W. S. Lee and T. Lozano-Perez, "Abnormality detection in retinal images", MIT, http://hdl.handle.net/1721.1/3845, Tech. Rep., 2004, G. Quellec, M. Lamard, M. D. Abramoff, E. Decenciere, B. Lay, A. Erginay, B. Cochener and G. Cazuguel, "A multiple-instance learning framework for diabetic retinopathy screening", Med Image Anal, vol. 16, no. 6, pp. 1228-40, August 2012 and R. Venkatesan, P. Chandakkar, B. Li and H. K. Li, "Classification of diabetic retinopathy images using multi-class multiple-instance learning based on color correlogram features", in Proc IEEE EMBS, 2012, pp. 1462-5). Besides MIL, a few anomaly detectors have been presented for specific regions of the retina: the optic disc (G. Kavitha and S. Ramakrishnan, "Abnormality detection in retinal images using ant colony optimization and artificial neural networks", Biomed Sci Instrum, vol. 46, pp. 331-6, 2010 and H. Zhu, A. Poostchi, S. A. Vernon and D. P. Crabb, "Detecting abnormality in optic nerve head images using a feature extraction analysis", Biomed Opt Express, vol. 5, no. 7, pp. 2215-30, June 2014) and the retinal vasculature (G. Kavitha and S. Ramakrishnan, "Abnormality detection in retinal images using ant colony optimization and artificial neural networks", Biomed Sci Instrum, vol. 46, pp. 331-6, 2010). As an alternative, a few solutions have also been presented to characterize the appearance of normal cases. One promising solution is to build a statistical atlas describing the normal range of image intensities at every location of the retina (S. Lee, M. D. Abramoff and J. M. Reinhardt, "Retinal atlas statistics from color fundus images", in Proc SPIE Med Imaging, 2010, p. 762310). After registering an input image to this statistical atlas, anomalies can be detected by measuring the local deviation from the statistical atlas (S. Ali, K. M. Adal, D. Sidibe, T. P. Karnowski, E. Chaum and F. Meriaudeau, "Exudate segmentation on retinal atlas space", in Proc ISPA, 2013, pp. 700-4 and G. Quellec, K. Lee, M. Dolejsi, M. K. Garvin, M. D. Abramoff and M. Sonka, "Three-dimensional analysis of retinal layer texture: identification of fluid-filled regions in SD-OCT of the macula", IEEE Trans Med Imaging, vol. 29, no. 6, pp. 1321-30, June 2010).

The Bag-of-Visual-Words (BoVW) model (J. Sivic and A. Zisserman, "Video Google: a text retrieval approach to object matching in videos", in Proc IEEE ICCV, 2003, pp. 1470-7) is a known solution for characterizing an image based on its local appearance. Recently, the BoVW model has been applied to retinal images (A. Rocha, T. Carvalho, H. F. Jelinek, S. Goldenstein and J. Wainer, "Points of interest and visual dictionaries for automatic retinal lesion detection", IEEE Trans Biomed Eng, vol. 59, no. 8, pp. 2244-53, August 2012, M. J. J. P. van Grinsven, A. Chakravarty, J. Sivaswamy, T. Theelen, B. van Ginneken and C. I. Sanchez, "A Bag of Words approach for discriminating between retinal images containing exudates or drusen", in Proc IEEE ISBI, 2013, pp. 1444-7, R. Pires, H. F. Jelinek, J. Wainer, S. Goldenstein, E. Valle and A. Rocha, "Assessing the need for referral in automatic diabetic retinopathy detection", IEEE Trans Biomed Eng, vol. 60, no. 12, pp. 3391-8, December 2013 and R. Pires, H. F. Jelinek, J. Wainer, E. Valle and A. Rocha, "Advancing Bag-of-Visual-Words representations for lesion classification in retinal images", PLOS One, vol. 9, no. 6, p. e96814, June 2014).

Document WO 2013/184070 discloses a method for detecting a drusen lesion on an image of the retina of a patient. The method is of the type comprising the steps consisting in:

- automatically detecting image patterns on the image, each image pattern corresponding to a singularity of the retina to be characterized,

- automatically associating image patterns with visual words,

- automatically providing a distribution of visual words over at least a part of the image of the retina.

However, known methods, based on defined features corresponding to some well-known pathologies, do not enable an objective and global characterization of the retina to be obtained.

The invention aims to solve the above-mentioned problems.

To this end, according to a first aspect, the invention provides a method of the aforementioned type wherein based on a plurality of training examination records each comprising at least one training image of at least a part of the retina of a patient, elementary clusters each grouping an elementary number of training image patterns detected in each training image, and composite clusters each grouping at least two elementary clusters are defined,

wherein visual words at different granularity levels are defined, a finest-grained visual word being assigned to each elementary cluster and a coarser-grained visual word being assigned to each composite cluster, and

wherein the step of automatically associating image patterns with visual words comprises, for each image pattern:

- selecting a corresponding cluster among the elementary and composite clusters,

- associating said image pattern with the visual word of the corresponding cluster.

Hence, according to the invention, the Bag-of-Visual-Words (BoVW) model is implemented in the context of data mining, with visual words of multiple granularity levels, each having a semantic relevance, associated with each image pattern. The combination of data mining and of visual words of multiple granularity levels offers an enhanced modularity as regards the patterns, and their possible combinations, to be taken into account to provide an objective and global characterization of the retina.

Data mining offers a simple solution to define a general framework for characterizing the appearance of abnormal examination records or, alternatively, the appearance of the normal ones, based on a large dataset of eye fundus examination records. Specifically, it allows the identification of image patterns visible in examination records marked as abnormal but not in examination records marked as normal or, alternatively, it allows the identification of patterns characterizing examination records marked as normal. The visual words can be more or less vague to enable different diagnosis rules to be subsequently defined. For instance, some rules may focus on lesions in general, but some others may focus on particular subtypes of lesions (e.g. bright lesions) or even on particular lesions (e.g. exudates).

For improved performance, image patterns may also be of multimedia type, fusing heterogeneous information. In particular, the localization of image patterns as well as contextual information about the patient (structured demographic and clinical data) may be taken into account. The multimedia image pattern may denote either 1) a feature vector extracted locally from an image, 2) a conjunction of feature vectors extracted locally from one or several images or 3) a conjunction of local feature vectors and contextual information about the patient. In doing so, multimedia rules may be extracted, such as: "if the patient is a female younger than 40 and a pattern similar to this example was found within one millimeter of one of her optic disks and at least two patterns similar to that other example were found within 2 millimeters of one of her foveas, then she should be referred to an ophthalmologist".

The method may further comprise, before the step of automatically associating image patterns with visual words, a step consisting in automatically defining visual words at different granularity levels which comprises:

- providing a plurality of training examination records each comprising at least one training image of at least a part of the retina of a patient,

- detecting training image patterns in each training image,

- clustering the training image patterns into elementary clusters each grouping an elementary number of training image patterns, and into composite clusters each grouping at least two elementary clusters,

- assigning a finest-grained visual word to each elementary cluster and a coarser-grained visual word to each composite cluster.

The step of clustering the training image patterns may comprise:

- characterizing the training image patterns of the training images, each training image pattern having a training feature vector describing the corresponding singularity and a surrounding of the corresponding singularity of the retina,

- calculating a central vector for each of the elementary and composite clusters based on the training feature vectors.

The step of automatically associating image patterns with visual words may further comprise characterizing image patterns of the image of the retina to be characterized, each image pattern having a feature vector describing the corresponding singularity and a surrounding of the corresponding singularity of the retina. The corresponding cluster selected for each image pattern may be one of the elementary and composite clusters having the nearest central vector to the feature vector of said image pattern.

The step of clustering the training image patterns may comprise:

- defining at least one index representative of a predetermined state of the retinas of the examination records,

- labelling elementary clusters extracted from training images of the examination records that present the predetermined state as positive, and elementary clusters extracted from training images of the examination records that do not present the predetermined state as negative,

- grouping positive elementary clusters and grouping negative elementary clusters to form composite clusters.

When a central vector is calculated for each of the elementary and composite clusters, each elementary cluster extracted from training images of retinas may be labelled either positive or negative in accordance with a diverse density score defined by, for an $i$-th cluster $C_i$ of central vector $c_i$, $i = 1, \dots, K$, $K$ being the total number of elementary clusters:

$$DD(C_i) = \prod_{j} \left[ 1 - \prod_{k} \left( 1 - e^{-\left\| g_{(j,k,+)} - c_i \right\|^2} \right) \right] \prod_{j} \prod_{k} \left( 1 - e^{-\left\| g_{(j,k,-)} - c_i \right\|^2} \right)$$

where $R_j^{(+)}$ denotes the $j$-th positive examination record,

$R_j^{(-)}$ denotes the $j$-th negative examination record,

$g_{(j,k,+)}$ denotes the training feature vector of the $k$-th training image pattern of the $j$-th positive examination record,

$g_{(j,k,-)}$ denotes the training feature vector of the $k$-th training image pattern of the $j$-th negative examination record,

$\delta$ stands for the label $+$ or $-$ in the generic notations $R_j^{(\delta)}$ and $g_{(j,k,\delta)}$.

The step of clustering the training image patterns into elementary clusters may comprise determining an optimal total number $K_0$ of elementary clusters by:

- increasing the total number $K$ of elementary clusters until an increase of a criterion $\frac{1}{K} \sum_{i=1}^{K} DD(C_i)$ reaches a threshold,

- when the criterion reaches the threshold, assigning the value of $K$ to $K_0$.

Each examination record may comprise a plurality of images of at least one retina, the method further comprising a step of creating a composite image of said retina.

The method may further comprise:

- before the step consisting in providing the distribution of visual words, the step consisting in automatically dividing the composite image into at least two regions,

- during the step consisting in providing the distribution of visual words, providing the distribution of visual words over each region of the composite image.

The step of automatically dividing the composite image may comprise automatically detecting at least one structure of interest chosen between an optic disc structure of interest corresponding to the optic disc and a fovea structure of interest corresponding to the fovea, said structure of interest being divided into concentric regions:

$R_{L, R^-, R_i}$ and $R_{L, R_i, R^+}$,

where $R_{L, R^-, R_i}$ denotes a region centered on a center $L$ of the structure of interest, between radii $R^-$ and $R_i$,

$R_{L, R_i, R^+}$ denotes a region centered on a center $L$ of the structure of interest, between radii $R_i$ and $R^+$,

$R_i = R^- + i \, \frac{R^+ - R^-}{\Lambda_{spatial}}$, $\Lambda_{spatial}$ designating a total number of regions.

At least one of the concentric regions may be further divided into equally angularly distributed regions.

The step of automatically dividing the composite image may comprise automatically detecting a blood vessel structure of interest corresponding to the blood vessels, said structure of interest being divided into regions:

$R_{V, D^-, D_i}$ and $R_{V, D_i, D^+}$,

where $R_{V, D^-, D_i}$ denotes a region centered on a center $V$ of the structure of interest, between distances $D^-$ and $D_i$,

$R_{V, D_i, D^+}$ denotes a region centered on a center $V$ of the structure of interest, between distances $D_i$ and $D^+$,

$D_i = D^- + i \, \frac{D^+ - D^-}{\Lambda_{spatial}}$, $\Lambda_{spatial}$ designating a total number of regions.

Each examination record may further comprise contextual data relative to the patient. Based on the plurality of training examination records, frequent itemsets containing at least one individual semantic item related to the distribution of visual words and/or at least one individual contextual item related to the contextual data may be defined. The method may further comprise, after the step of providing the distribution of visual words, a step consisting in checking the occurrence of the frequent itemsets in the examination record of the retina to be characterized.

The method may further comprise a step consisting in defining frequent itemsets which comprises:

- for each training examination record:

defining the individual semantic items each describing in a binary manner a part of the distribution of visual words, a combination of the individual semantic items describing the whole distribution of visual words,

defining the individual contextual items each describing in a binary manner a part of the contextual data, a combination of the individual contextual items describing the whole contextual data,

- mining the training examination records to identify as frequent itemsets combinations of at least one individual semantic item and/or at least one individual contextual item having an occurrence above a determined confidence threshold.

When the composite image is divided into regions, the frequent itemsets may further contain at least one individual semantic item related to the distribution of visual words and/or at least one individual contextual item related to the contextual data and/or at least one individual spatial item related to the region of the structure of interest.

The step of defining frequent itemsets may further comprise:

- for each training examination record, defining individual spatial items each describing in a binary manner at least a part of one of the regions of the structure of interest, a combination of the individual spatial items describing the whole regions of the structure of interest,

- mining the training examination records to identify as frequent itemsets associations of at least one individual semantic item and at least one individual contextual item with at least one individual spatial item having an occurrence above the determined confidence threshold.

The method may further comprise a step of associating each frequent itemset with a criteria related to the retina.

According to a second aspect, the invention proposes a computer program product for characterizing a retina of a patient from an examination record comprising at least one image of at least a part of the retina, said computer program product being stored on a carrier readable by an electronic processing unit and comprising instructions operable to cause the electronic processing unit to implement the previously defined method.

Other objects and advantages of the invention will emerge from the following disclosure of a particular embodiment of the invention given as non-limitative example, the disclosure being made in reference to the enclosed drawings in which:

- Figure 1 is a flowchart illustrating an embodiment of a method for characterizing a retina of a patient from an examination record comprising at least one image of at least a part of the retina, visual words at different granularity levels being defined and image patterns corresponding to a singularity of the retina being detected and associated with the visual words at different granularity levels, occurrence of frequent itemsets in the examination record being checked,

- Figure 2 illustrates image operations of a preprocessing step of the method of Figure 1: a) images being cropped and resized, b) two mosaics each corresponding to one of the retinas of the patient being formed based on the images of the examination record, and retinal landmarks (OD: optic disk, F: fovea, V: blood vessels) being detected in each mosaic, c) regions being defined with respect to landmarks OD and F and d) regions being defined with respect to V,

- Figure 3 illustrates the step of detecting and characterizing image patterns of the method of Figure 1, Figure 3a being an original image, Figure 3b being an illumination-corrected intensity image used for detection, Figure 3c being an illumination-corrected color image used for characterization, Figure 3d showing image patterns detected using the BRISK detector,

- Figure 4 illustrates the step of defining visual words at different granularity levels of the method of Figure 1, training image patterns being detected on training images of a plurality of training examination records, pathological image patterns being represented by black pentagons, normal image patterns being represented by white stars and normal image patterns appearing only in healthy examination records being represented by gray diamonds, the training image patterns being then clustered into elementary clusters to each of which a finest-grained visual word is associated and into composite clusters to each of which a coarser-grained visual word is associated, a usual approach based on $K$-means clustering being illustrated on Figure 4a, an initialization of the approach according to the invention based on $K_0$-means clustering with the optimal $K_0$ value being illustrated on Figure 4b, and two iterations of a cluster grouping solution according to the invention ($\Lambda_{lexical} = 3$) being illustrated on Figure 4c, in which elementary clusters are circled with plain lines and corresponding finest-grained visual words are indicated by Arabic numbers, and composite clusters are circled with dashed lines and corresponding coarser-grained visual words are indicated by Greek numbers,

- Figure 5 illustrates the step of defining frequent itemsets of the method of Figure 1 through an extended Apriori algorithm for adaptive control of the granularity level.

In relation to the Figures, a method for characterizing a retina of a patient from an examination record comprising at least one image of at least a part of the retina is disclosed. The examination record generally comprises multiple images of each of the two retinas of the patient mixed together in one single directory. The examination record may, however, comprise one or multiple images of one of the retinas of the patient or one or multiple images of retinas of different patients. In the examination record, each retina can be either in part or fully imaged by any appropriate modality and especially by color fundus photography.

The image of each retina in the examination may be a composite image, called "mosaic", built from every image of the retina in the examination record as disclosed in the co-pending application by G. Quellec, M. Lamard and G. Cazuguel, entitled "Method and computer program product for processing an examination record comprising a plurality of images of at least parts of at least one retina of a patient".

Then, image patterns, each corresponding to a singularity of the retina to be characterized, are detected in an automated manner in each mosaic. Each image pattern may be associated in an automated manner with visual words at different granularity levels. In particular, based on a plurality of training examination records each comprising at least one training image of at least a part of the retina of a patient, elementary clusters each grouping an elementary number of training image patterns detected in each training image, and composite clusters each grouping at least two elementary clusters are defined. A finest-grained visual word is assigned to each elementary cluster and a coarser-grained visual word is assigned to each composite cluster. For each image pattern, a corresponding cluster among the elementary and composite clusters is selected and the image pattern is associated with the visual word of the corresponding cluster. The advantage of a lexical representation at multiple granularity levels is that more or less precise rules can be extracted through data mining.

A distribution of visual words over at least a part of the image of the retina to be characterized may be automatically provided. The retina or regions of this retina may be described by one histogram of visual words per level of lexical granularity.

Possibly, rules for predicting a diagnosis based on conjunctions of visual and contextual words can be extracted through data mining in a large dataset.

The method may be implemented in an automated manner by a computer through a computer program product stored on a carrier readable by an electronic processing unit and comprising instructions operable to cause the electronic processing unit to implement the method.

I. SPATIAL LOCALIZATION WITH RESPECT TO RETINAL LANDMARKS

Experts may interpret an image pattern differently depending on its location with respect to the main retinal landmarks: the optic disc, the fovea and the blood vessels in particular. For instance, lesions in the fovea likely affect the patient's sight more severely than the exact same lesions in the periphery of the retina. So, in order to reliably characterize an image pattern within a fundus photograph, it may be desirable to localize this pattern with respect to retinal landmarks. To achieve this goal, as shown on Figure 2, it is proposed to estimate a transformation from each image in the examination record to a common reference system and localize the main retinal landmarks in that reference system (sections I-A and I-B). Then, several regions may be defined with respect to the retinal landmarks (sections I-C to I-F).

A. Common Reference System

In order to simplify parameter selection, images may be first resized and cropped to 780 x 780 pixels, as explained in previous works (G. Quellec, M. Lamard, M. D. Abramoff, E. Decenciere, B. Lay, A. Erginay, B. Cochener and G. Cazuguel, "A multiple-instance learning framework for diabetic retinopathy screening", Med Image Anal, vol. 16, no. 6, pp. 1228-40, August 2012). Then, two mosaics, one per retina, can be jointly created using every image in the examination record, as described in the co-pending application by G. Quellec, M. Lamard and G. Cazuguel, entitled "Method and computer program product for processing an examination record comprising a plurality of images of at least parts of at least one retina of a patient". The proposed mosaicing algorithm relies on blood vessel segmentation and graph theory. Using this algorithm, a transformation from each image to a common reference system may be available for each retina.

B. Retinal Landmark Detection

The co-pending application also presents an algorithm for detecting two retinal landmarks robustly in each mosaic: the main vessel arch surrounding the macula and the optic disc. The laterality of each mosaic, namely which mosaic corresponds to the retina of the left eye and which mosaic corresponds to the retina of the right eye, can be determined from the relative location of these retinal landmarks.

Many algorithms for detecting the fovea have been presented in the literature (H. Li and O. Chutatape, "Automated feature extraction in color retinal images by a model based approach", IEEE Trans Biomed Eng, vol. 51, no. 2, pp. 246-54, February 2004, M. Niemeijer, M. D. Abramoff and B. van Ginneken, "Fast detection of the optic disc and fovea in color fundus photographs", Med Image Anal, vol. 13, no. 6, pp. 859-70, December 2009 and M. E. Gegundez-Arias, D. Marin, J. M. Bravo and A. Suero, "Locating the fovea center position in digital fundus images using thresholding and feature extraction techniques", Comput Med Imaging Graph, vol. 37, no. 5-6, pp. 386-93, July-September 2013). However, the fovea is not always visible in pathological retinas.

Indeed, due to severe structural changes that may occur in the macula, no algorithm can guarantee a successful detection of the fovea in pathological images.

A robust algorithm for detecting the fovea is presented below.

The location of the fovea center $C_F = \{x_F, y_F\}$ is estimated based on the location of the optic disc center $C_{OD} = \{x_{OD}, y_{OD}\}$ and the laterality of the mosaic.

In left eyes, $C_F$ is estimated by:

$$x_F = x_{OD} + D_F^{OD} \cos(-\theta_F^{OD}), \qquad y_F = y_{OD} + D_F^{OD} \sin(-\theta_F^{OD})$$

where $D_F^{OD}$ is an average distance between $C_{OD}$ and $C_F$ in images (for example, $D_F^{OD} = 340$ pixels) and $\theta_F^{OD}$ is an average angle between a horizontal axis and an OD/F axis in images (for example, $\theta_F^{OD} = 5.6°$) (K. Rohrschneider, "Determination of the location of the fovea on the fundus", Invest Ophthalmol Vis Sci, vol. 45, no. 9, pp. 3257-8, September 2004). By symmetry, in right eyes, $C_F$ is estimated by:

$$x_F = x_{OD} + D_F^{OD} \cos(\pi + \theta_F^{OD}), \qquad y_F = y_{OD} + D_F^{OD} \sin(\pi + \theta_F^{OD}) \qquad (7)$$

Then, if and only if a fovea-looking structure is found in a neighborhood of $C_F$, $C_F$ is updated to the center of this pattern. Otherwise, the initial estimation is retained. The radius $R_F$ of the neighborhood is set to four times a standard deviation of the optic disc/fovea distance in images (for example, $R_F = 4 \times 18 = 72$ pixels). In an illumination-corrected intensity image $I_{ICI}$ (see section II-A), the fovea looks like a 2-D Gaussian. So it can be detected in $I_{ICI}$ through template matching: the template $T$ is defined as a Gaussian filter of standard deviation $\sigma_F$. The standard deviation $\sigma_F$ is defined from the average full width at half maximum in images (for example, $\sigma_F = 24$ pixels). A linear model may be used:

$$I_{ICI}(W) = \alpha T + \beta \qquad (8)$$

where $W$ is a sliding window. The sliding window $W$ minimizing $\alpha$ is retained ($\alpha$ should be negative). If and only if $|\alpha| > \tau_\alpha$ in $W$, $C_F$ is updated to the center of $W$ (for example, $\tau_\alpha = 10$).
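As an illustration, the template-matching step of equation (8) could be implemented along the following lines in Python; this is a minimal sketch, assuming i_ici is the illumination-corrected intensity image as a NumPy array, c_f0 is the initial estimate of $C_F$, and the parameter values given above; all names are illustrative rather than taken from the patent.

import numpy as np

def detect_fovea(i_ici, c_f0, r_f=72, sigma_f=24, tau_alpha=10):
    # Refine the initial fovea estimate c_f0 = (x_F, y_F) by template
    # matching: in each sliding window W, fit I_ICI(W) = alpha*T + beta by
    # least squares and keep the window minimizing alpha (equation (8)).
    half = 2 * sigma_f
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    t = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_f ** 2))
    t_centered = t - t.mean()
    t_var = (t_centered ** 2).sum()
    x0, y0 = int(c_f0[0]), int(c_f0[1])
    best_alpha, best_center = 0.0, c_f0
    for dy in range(-r_f, r_f + 1):
        for dx in range(-r_f, r_f + 1):
            cx, cy = x0 + dx, y0 + dy
            if cx < half or cy < half:
                continue  # window would fall outside the image
            w = i_ici[cy - half:cy + half + 1, cx - half:cx + half + 1]
            if w.shape != t.shape:
                continue
            # Least-squares slope of W against T (the fovea is dark in
            # I_ICI, so alpha should be negative).
            alpha = ((w - w.mean()) * t_centered).sum() / t_var
            if alpha < best_alpha:
                best_alpha, best_center = alpha, (cx, cy)
    # Update C_F only if a sufficiently fovea-looking structure was found.
    return best_center if abs(best_alpha) > tau_alpha else c_f0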

C. Dividing the Retina in Regions

Based on the location of retinal landmarks into a common reference system, the composite image of each retina may be divided into a cascade of regions enabling a spatial granularity to be defined for the characterization of the retina.

D. Regions Relative to the Fovea and the Optic Disc

Two grids dividing the mosaic into two regions or more are defined in the proposed framework: one is centered on the fovea and one is centered on the optic disc (R. A. Hitchings, D. B. Brown and S. A. Anderton, "Glaucoma screening by means of an optic disc grid", Br J Ophthalmol, vol. 67, no. 6, pp. 352-5, June 1983). Their parameters are controlled by a data mining algorithm (see section III). Two operators are defined in order to recursively divide the retina into regions: a radial division operator and an angular division operator.

1) Radial Division Operator

Let $L \in \{F, OD\}$ denote one retinal landmark, where $F$ denotes the fovea center, and $OD$ denotes the optic disc center.

Let $R_{L, R^-, R^+}$ denote one region centered on $L$ and delimited by two radii: $R^-$ and $R^+$. The first operator defines $\Lambda_{spatial} - 1$ new radii:

$$R_i = R^- + i \, \frac{R^+ - R^-}{\Lambda_{spatial}} \qquad (1)$$

where $\Lambda_{spatial}$ designates a total number of regions, $i = 1, \dots, \Lambda_{spatial} - 1$.

Based on each of these new radii, two new regions are defined: $R_{L, R^-, R_i}$ and $R_{L, R_i, R^+}$. Initially, the whole retina is delimited by $R^- = 0$ and $R^+ = R_{MAX}$, where $R_{MAX}$ denotes a large enough radius to reach the whole retina from either the optic disc or the fovea (for example, $R_{MAX} = 1000$ pixels).

2) Angular Division Operator

The second operator divides a region $R$ into four quadrants: $R_T$, $R_I$, $R_N$ and $R_S$, where T, I, N and S stand for temporal, inferior, nasal and superior, respectively. This operator is terminal: the subregions cannot be divided further using either the radial division operator or the angular division operator.

E. Regions Relative to the Blood Vessels

Blood vessels can also be used to define regions: regions are defined based on the distance from a pixel to the nearest blood vessel. The distance to the nearest vessel can be computed very fast using the Maurer distance transform (C. R. Maurer Jr., R. Qi and V. Raghavan, "A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions", IEEE Trans Pattern Anal Mach Intell, vol. 25, no. 2, pp. 265-70, February 2003). Note that this distance transform may already have been computed to build the mosaics (G. Quellec, M. Lamard and G. Cazuguel, co-pending application entitled "Method and computer program product for processing an examination record comprising a plurality of images of at least parts of at least one retina of a patient").

Let $R_{V, D^-, D^+}$ denote one region centered on that landmark (denoted by $V$) and delimited by two distances: $D^-$ and $D^+$. A new operator, called the distance division operator, defines $\Lambda_{spatial} - 1$ new distances:

$$D_i = D^- + i \, \frac{D^+ - D^-}{\Lambda_{spatial}} \qquad (2)$$

where $\Lambda_{spatial}$ designates a total number of regions, $i = 1, \dots, \Lambda_{spatial} - 1$.

Based on each of these new distances, two new regions are defined: $R_{V, D^-, D_i}$ and $R_{V, D_i, D^+}$. Initially, the whole retina is delimited by $D^- = 0$ and $D^+ = D_{MAX}$, where $D_{MAX}$ denotes a large enough distance to reach the whole retina from the blood vessels (for example, $D_{MAX} = 150$ pixels).
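A minimal Python sketch of these division operators may clarify how regions are generated; it assumes NumPy arrays, a binary vessel mask, and illustrative function names, and uses SciPy's exact Euclidean distance transform in place of the Maurer implementation cited above.

import numpy as np
from scipy.ndimage import distance_transform_edt

def divide_interval(lower, upper, n_regions):
    # Equations (1) and (2): insert n_regions - 1 intermediate bounds
    # between lower and upper (n_regions plays the role of Lambda_spatial).
    step = (upper - lower) / n_regions
    bounds = [lower + i * step for i in range(n_regions + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

def radial_region_mask(shape, center, r_minus, r_plus):
    # Mask of R_{L, R-, R+}: pixels whose distance to the landmark
    # center = (x, y) lies in [r_minus, r_plus).
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    dist = np.hypot(xx - center[0], yy - center[1])
    return (dist >= r_minus) & (dist < r_plus)

def vessel_region_mask(vessel_mask, d_minus, d_plus):
    # Mask of R_{V, D-, D+}: pixels whose distance to the nearest blood
    # vessel lies in [d_minus, d_plus).
    dist = distance_transform_edt(~vessel_mask.astype(bool))
    return (dist >= d_minus) & (dist < d_plus)

In this reading, the intersection operator of section I-F below reduces to a logical AND of two such masks, e.g. radial_region_mask(...) & vessel_region_mask(...).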

F. Region Intersections

Finally, two regions defined with respect to different landmarks (F and OD, F and V, OD and V) can be combined using the intersection operator. The intersection operator is terminal: the region intersections cannot be divided further using either the radial division operator, the angular division operator or the distance division operator.

II. ADAPTING THE BAG-OF-VISUAL-WORDS MODEL FOR DATA MINING

Each of the aforementioned regions may be characterized using a variation on the Bag-of-Visual-Words (BoVW) model. It starts with extraction of localized image patterns in images (section II-B) and characterization of each of these image patterns individually (section II-C). The performance of image pattern detection and characterization may be improved through image preprocessing (II-A). Based on a collection of image pattern characterizations, several visual words are defined. Unlike the usual BoVW model, visual words at multiple granularity levels are defined (II-D to II-G). So, each image pattern is associated with multiple visual words: one per level of lexical granularity. Finally, each region can be characterized by one histogram of visual words per level of lexical granularity (II-H).

A. Enhancing Image Patterns

Image pattern enhancement is illustrated in Figure 3. Color information may be useful for characterizing image patterns (see section II-C). However, retinal structures and anomalies are best contrasted in the green channel, so image patterns may be detected in this channel only.

1) Illumination-Corrected Intensity Image

In order to compensate for illumination variations, intensity in the green channel is first normalized throughout the retina.

Let $I_G$ denote the green channel. The idea is to define a background image $I_b$, which captures the illumination, and divide $I_G$ by $I_b$, in order to obtain an illumination-corrected intensity image $I_{ICI}$. The background image is obtained using a large median filter of radius $r_{OD}$, where $r_{OD}$ denotes the typical radius of optic discs in images (for example, $r_{OD} = 75$ pixels). The image processing pipeline is detailed in G. Quellec, M. Lamard and G. Cazuguel, co-pending application entitled "Method and computer program product for processing an examination record comprising a plurality of images of at least parts of at least one retina of a patient".

2) Illumination-Corrected Color Image

Illumination may also be corrected in the color image for pattern characterization. An illumination-corrected color image $I_{ICC}$ can be obtained by dividing each color channel by the background image. In order to preserve the hue, the same background image ($I_b$ defined above) is used for all channels.
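By way of example, the two corrections could be sketched as follows in Python; this is a sketch, not the patented pipeline itself (the exact pipeline is detailed in the co-pending application): image is assumed to be an RGB photograph as a NumPy array, and scipy.ndimage.median_filter stands in for the large median filter.

import numpy as np
from scipy.ndimage import median_filter

def correct_illumination(image, r_od=75, eps=1e-6):
    # Background image I_b: slow illumination variations, captured by a
    # median filter whose radius matches the typical optic disc radius.
    green = image[..., 1].astype(float)
    background = median_filter(green, size=2 * r_od + 1)
    # I_ICI: green channel divided by the background (used for detection).
    i_ici = green / (background + eps)
    # I_ICC: every channel divided by the SAME background, so that hue is
    # preserved (used for characterization).
    i_icc = image.astype(float) / (background[..., None] + eps)
    return i_ici, i_icc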

B. Image Pattern Detection

Known image pattern detectors are SIFT, SURF, ORB (E. Rublee, V. Rabaud, K. Konolige and G. R. Bradski, "ORB: an efficient alternative to SIFT or SURF", in Proc ICCV, 2011, pp. 2564-71), FREAK (A. Alahi, R. Ortiz and P. Vandergheynst, "FREAK: fast retina keypoint", in Proc IEEE CVPR, 2012, pp. 510-7), BRISK (S. Leutenegger, M. Chli, and R. Siegwart, "BRISK: binary robust invariant scalable keypoints", in Proc ICCV, 2011, pp. 2548-55), the standard GFTT detector (J. Shi and C. Tomasi, "Good features to track", in Proc IEEE CVPR, 1994, pp. 593-600) and FRST, the Fast Radial Symmetry Transform (G. Loy and A. Zelinsky, "Fast radial symmetry for detecting points of interest", IEEE Trans Pattern Anal Mach Intell, vol. 25, no. 8, pp. 959-73, August 2003). The FRST has been found of particular relevance in the detection of small roundish lesions (microaneurysms, exudates, drusen, etc.). Image pattern detection is preferably performed in illumination-corrected intensity images ($I_{ICI}$).

C. Image Pattern Characterization

The step of automatically associating image patterns with visual words comprises characterizing image patterns of the image of the retina to be characterized. Each image pattern has a feature vector describing the corresponding singularity and a surrounding of the corresponding singularity of the retina.

Some of the detectors identified in section II-B, such as ORB, BRISK and FREAK, come with a pattern descriptor, which extracts the feature vector in the neighborhood of the detected patterns. As mentioned above, feature vectors are extracted from the illumination-corrected color images ($I_{ICC}$) since color information is certainly relevant for differentiating image patterns.
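For instance, with OpenCV, detection on $I_{ICI}$ and characterization on $I_{ICC}$ could be sketched as follows. Since OpenCV's BRISK descriptor operates on single-channel 8-bit images, color information is retained here by describing each channel separately and concatenating; this per-channel scheme is an assumption of this sketch, not the patent's exact procedure.

import cv2
import numpy as np

def to_u8(img):
    # OpenCV detectors expect 8-bit images; rescale the float image.
    return cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def detect_and_characterize(i_ici, i_icc):
    brisk = cv2.BRISK_create()
    # Detect patterns in the illumination-corrected intensity image.
    keypoints = brisk.detect(to_u8(i_ici), None)
    # Characterize each pattern in the illumination-corrected color image:
    # one descriptor per channel, concatenated into a single feature vector.
    descriptors = []
    for channel in range(3):
        keypoints, d = brisk.compute(to_u8(i_icc[..., channel]), keypoints)
        descriptors.append(d)
    return keypoints, np.hstack(descriptors)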

D. Defining Visual Words at Multiple Granularity Levels

The data mining algorithm disclosed in the remainder of the description (section III) relies on a dictionary of visual words and the occurrence of these visual words in various areas of the retina.

A step consisting in automatically defining visual words at different granularity levels may comprise:

- providing a plurality of training examination records each comprising at least one training image of at least a part of the retina of a patient,

- detecting training image patterns in each training image,

- clustering the training image patterns into elementary clusters each grouping an elementary number of training image patterns, and into composite clusters each grouping at least two elementary clusters,

- assigning a finest-grained visual word to each elementary cluster and a coarser-grained visual word to each composite cluster.

The step of clustering the training image patterns may comprise:

- characterizing the training image patterns of the training images, each training image pattern having a training feature vector describing the corresponding singularity and a surrounding of the corresponding singularity of the retina,

- calculating a central vector for each of the elementary and composite clusters based on the training feature vectors.

1) Usual Definition for Visual Words

As commonly done in the computer vision literature (J. Sivic and A. Zisserman, "Video Google: a text retrieval approach to object matching in videos", in Proc IEEE ICCV, 2003, pp. 1470-7), these visual words are built from a collection of image pattern characterizations extracted from a training dataset.

Let $N$ denote the number of features in image pattern characterizations. The usual strategy to define visual words is to sample image pattern characterizations from every training image, put all these characterizations together, and define multiple clusters of image pattern characterizations, for example, through $K$-means clustering in the $\mathbb{R}^N$ feature space. Finally, each cluster can be associated with one visual word, as illustrated in Figure 4a.

2) Visual Word Granularity for Data Mining

Choosing the number of clusters ($K$) is not straightforward. In standard BoVW implementations (J. Sivic and A. Zisserman, "Video Google: a text retrieval approach to object matching in videos", in Proc IEEE ICCV, 2003, pp. 1470-7), several $K$ values are tested and the one maximizing performance is retained. In a data mining context, however, it is desirable to have visual words more or less vague from one diagnosis rule to another. For instance, some rules may focus on lesions in general, but some others may focus on particular subtypes of lesions (e.g. bright lesions) or even on particular lesions (e.g. exudates). Therefore, multiple cluster sizes should be used in order to obtain a semantic representation of image patterns at multiple granularity levels.

3) Visual Words of Increasingly Coarse Granularity

This could be achieved by running $K$-means clustering multiple times, with different values for $K$. However, as $K$ decreases, the probability that normal and abnormal image patterns merge into the same clusters increases, as illustrated in Figure 4a. In order to avoid this phenomenon, $K$-means clustering may be run only once with a large value $K = K_0$, as illustrated in Figure 4b and explained in section II-F. This defines the finest-grained visual words. Then, coarser-grained visual words may be defined by grouping clusters together, in a hierarchical fashion, as illustrated in Figure 4c and explained in section II-G.

E. Semantic Indices for Clusters based on Multiple Instance Learning

In order to minimize the mixture of normal and abnormal patterns, at every granularity level, semantic indices are defined for each cluster. These indices are used to find the optimal number of clusters ($K_0$) in section II-F and to group clusters together in section II-G.

In particular, the step of clustering the training image patterns comprises:

- defining at least one index representative of a predetermined state of the retinas of the examination records,

- labelling elementary clusters extracted from training images of the examination records that present the predetermined state as positive, and elementary clusters extracted from training images of the examination records that do not present the predetermined state as negative,

- grouping positive elementary clusters and grouping negative elementary clusters to form composite clusters.

1) Multiple Instance Learning

A subset of weakly-supervised classification is Multiple Instance Learning (MIL) (J. Amores, "Multiple instance classification: review, taxonomy and comparative study", Artif Intell, vol. 201, pp. 81-105, August 2013). In MIL, labels are assigned to bags of instances. Each bag of instances is an examination record and the instances contained in this bag are the image patterns detected in it. A bag of instances is labeled positive if at least one of its instances is positive, otherwise it is labeled negative. The goal of MIL is to discover which instances are positive based on the global labels only, and use it to classify new bags of instances. Here, MIL is used to evaluate the discriminative power of each cluster of instances: two semantic indices are defined for each cluster as described hereafter.

2) Two-Way Multiple Instance Learning

Two definitions of "positive labels" may be used in order to define two different semantic indices. The first index is defined using pathological examination records as positive bags of instances ("positive = pathological"): an anomaly index will be obtained. The second index is defined using normal examination records as positive bags of instances ("positive = normal"): a normality index will be obtained. Although they are somehow inversely related, these indices are not redundant. A high normality index indicates image patterns that are more frequently seen in healthy retinas (a healthy optic disc, a healthy fovea, etc.); a low normality index indicates image patterns that are not, but pathological patterns are not guaranteed to have a lower normality score than patterns equally frequent in pathological and normal images (e.g. blood vessel patterns).

3) Diverse Density Scores of a Cluster

Diverse Density (DD) (O. Maron and T. Lozano-Perez, "A framework for multiple-instance learning", in Proc NIPS, 1998, pp. 570-576) measures an intersection of relevant bags minus the union of the irrelevant bags. Relevant instances in feature space are those maximizing DD.

Let $C_i$ denote the $i$-th cluster obtained through $K$-means clustering, $i = 1, \dots, K$, and let $c_i$ denote the corresponding central vector.

Let $R_j^{(+)}$, respectively $R_j^{(-)}$, denote the $j$-th positive, respectively negative, examination record in the training set.

Let $g_{(j,k,+)}$, respectively $g_{(j,k,-)}$, denote the $k$-th image pattern characterization extracted from that record.

The DD score of cluster $C_i$ can be expressed as follows:

$$DD(C_i) = \prod_{j} \left[ 1 - \prod_{k} \left( 1 - e^{-\left\| g_{(j,k,+)} - c_i \right\|^2} \right) \right] \prod_{j} \prod_{k} \left( 1 - e^{-\left\| g_{(j,k,-)} - c_i \right\|^2} \right) \qquad (3)$$

where $\delta$ stands for the class label ($+$ or $-$) in the generic notation $g_{(j,k,\delta)}$.

It should be noted that the squared Euclidean distances $\left\| g_{(j,k,\delta)} - c_i \right\|^2$ have already been computed for $K$-means clustering, so this criterion can be computed fast, particularly if an approximation of the $x \mapsto 1 - e^{-x}$ function is used.
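A direct NumPy transcription of equation (3) is given below as a sketch; positive_records and negative_records are assumed to be lists of 2-D arrays, one per examination record, each row being a training feature vector $g_{(j,k,\cdot)}$.

import numpy as np

def dd_score(c_i, positive_records, negative_records):
    score = 1.0
    # Noisy-OR over each positive record: at least one of its patterns
    # should lie close to the central vector c_i.
    for record in positive_records:
        d2 = ((record - c_i) ** 2).sum(axis=1)  # squared Euclidean distances
        score *= 1.0 - np.prod(1.0 - np.exp(-d2))
    # Each negative record: none of its patterns should lie close to c_i.
    for record in negative_records:
        d2 = ((record - c_i) ** 2).sum(axis=1)
        score *= np.prod(1.0 - np.exp(-d2))
    return score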

F. Finest-Grained Visual Words

As mentioned previously, visual words at the finest granularity level are associated with the $K_0$ clusters obtained through $K_0$-means clustering. A supervised strategy is proposed in order to find $K_0$, the optimal number of clusters.

Let $DD_p(C_i)$ and $DD_n(C_i)$ denote the two DD scores computed for cluster $C_i$, depending on the definition of "positive" ("pathological" or "normal"). In order to find $K_0$, the number of clusters $K$ is increased until the following criterion stops increasing:

$$\frac{1}{K} \sum_{i=1}^{K} DD(C_i) \qquad (4)$$

$K_0$ is the value of $K$ when the stopping criterion is reached.
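The supervised selection of $K_0$ could then be sketched as follows, reusing the dd_score helper above; scikit-learn's KMeans, the step size, the upper bound and the threshold tol are all illustrative assumptions, not values from the patent.

import numpy as np
from sklearn.cluster import KMeans

def find_k0(patterns, positive_records, negative_records,
            k_step=10, k_max=500, tol=1e-3):
    best_k, previous = None, -np.inf
    for k in range(k_step, k_max + 1, k_step):
        centers = KMeans(n_clusters=k, n_init=3).fit(patterns).cluster_centers_
        # Criterion (4): average DD score over the K elementary clusters.
        crit = np.mean([dd_score(c, positive_records, negative_records)
                        for c in centers])
        if crit - previous < tol:  # the increase has reached the threshold
            return best_k
        best_k, previous = k, crit
    return best_k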

G. Coarser-Grained Visual Words

Once the finest visual words have been defined (one cluster ⇔ one visual word), coarser-grained visual words are defined by grouping clusters together.

1) Semantically-Augmented Feature Space

Cluster grouping relies in part on the location of central vectors in feature space (N-D numerical vectors), and in part on the $\{DD_p(C_i), DD_n(C_i)\}$ indices defined above (2-D semantic vectors). A semantically-augmented feature space $\mathbb{R}^{N+2}$ is obtained after normalizing the energy of numerical and semantic indices and concatenating the normalized numerical and semantic vectors. Energy is normalized so that the average energy ratio between semantic and numerical vectors equals $\rho$ after normalization (parameter setting is explained in section III-G). This procedure ensures that coarser-grained visual words are more semantically correct: pathological patterns are more likely to be grouped together, and so are the patterns appearing more often in normal retinas.

2) Hierarchical Clustering

2) Hierarchical Clustering

Following the example of spatial granularity (see section I-C), the above semantically-augmented feature space may be divided hierarchically, for data mining purposes. $K$-means clustering, $K < K_0$, is also used for this purpose. But the input samples are no longer individual image pattern characterizations in $\mathbb{R}^N$, but rather the $K_0$ fine-grained cluster central vectors in the semantically-augmented feature space $\mathbb{R}^{N+2}$.

First, the $K_0$ clusters are distributed among $K = \lambda_{lexical}$ meta-clusters: each meta-cluster is associated with one coarse visual word. Then, each meta-cluster is further divided into $\lambda_{lexical}$ smaller meta-clusters associated with finer-grained visual words. And so on until the data mining algorithm converges or until each meta-cluster only contains one cluster (i.e. until the finest-grained level is reached).
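The recursive division can be sketched as follows, using scikit-learn's KMeans as a stand-in for the $K$-means step; the tree layout and node numbering are illustrative assumptions, and the convergence-based stopping condition is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_word_tree(centers, branching, node_id=0, tree=None):
    """Recursively split the K0 augmented central vectors into
    `branching` meta-clusters; each node corresponds to one visual
    word, coarser near the root, finest at the leaves."""
    if tree is None:
        tree = {}
    tree[node_id] = centers
    if len(centers) <= 1:
        return tree  # finest-grained level reached
    k = min(branching, len(centers))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(centers)
    for c in range(k):
        child = node_id * branching + c + 1  # heap-style node numbering
        tree = build_word_tree(centers[labels == c], branching, child, tree)
    return tree
```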

H. Visual Word Histogram of a Region

Once visual words have been defined, they are used to characterize an image or, in our case, multiple regions inside an image mosaic.

1) Visual Word Histogram

In the standard BoVW model (J. Sivic and A. Zisserman, "Video Google: a text retrieval approach to object matching in videos", in Proc IEEE ICCV, 2003, pp. 1470-7), the signature of an image $I$ is defined as a histogram $h^I$ of visual words. For each image pattern detected in $I$: 1) a characterization is extracted, 2) the cluster having the nearest central vector is selected in order to associate the image pattern with a visual word $w$, and 3) the histogram bin $h^I(w)$ corresponding to that visual word is incremented. These histograms are generally used to find similar images, given a distance metric between histograms, or to recognize an object through strongly supervised classification.
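A minimal sketch of this histogram computation, with nearest-center assignment done by brute force (names are illustrative):

```python
import numpy as np

def bovw_histogram(pattern_features, word_centers):
    """Standard BoVW signature: assign each image pattern
    characterization to the nearest central vector and count."""
    hist = np.zeros(len(word_centers))
    for x in pattern_features:
        sq_dists = np.sum((word_centers - x) ** 2, axis=1)
        hist[np.argmin(sq_dists)] += 1.0
    return hist
```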

2) Adaptation to Mosaics

In order to process undistorted image patterns, these patterns are preferably detected and characterized in individual images. Obviously, images forming the mosaic overlap, so the same pattern may be counted multiple times. To solve this issue, an overlap map $I_O$ can be built: $I_O(x, y)$ is the number of input images whose projection intersects the $(x, y)$ location in the mosaic image. Note that the black border around the camera's field of view may not be taken into account when forming the overlap map. This overlap map is used to measure by how much histogram bins should be incremented: for an image pattern whose projection into the mosaic image is $(x, y)$, histogram bins should be incremented by $1/I_O(x, y)$.

3) Adaptation to Multiple Regions

Multiple regions of the retina have been characterized independently (see section I).

The (x, y) projection of an image pattern in the mosaic image may be used to determine which regions are affected by this image pattern.

Let $w$ be the visual word associated with this image pattern: $h^{S_t}(w)$ is incremented by $1/I_O(x, y)$ for each region $S_t$ affected by this pattern.
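The two adaptations above can be sketched jointly as follows; the boolean-mask representation of regions and the `[y, x]` array indexing are assumptions made for illustration.

```python
def update_region_histograms(histograms, word, x, y, overlap_map, regions):
    """Increment the histogram of every region containing (x, y) by
    1 / Io(x, y), so a pattern seen in several overlapping images is
    not over-counted.  `regions` maps a region id to a boolean mask
    over the mosaic (an assumed representation); `histograms` maps a
    region id to its visual word histogram."""
    weight = 1.0 / overlap_map[y, x]      # Io(x, y) >= 1 by construction
    for region_id, mask in regions.items():
        if mask[y, x]:                    # region affected by the pattern
            histograms[region_id][word] += weight
```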

III. MINING DATA AT MULTIPLE GRANULARITY LEVELS

Patients' retinas have been described by visual words at multiple levels of spatial and lexical granularity. These spatial characterizations can be combined with contextual information in order to extract more or less precise diagnosis rules. The proposed data mining algorithm extends usual solutions for association rule learning in transactional databases (see section III-A). Data preparation for association rule learning is presented in section III-B. Apriori, the most popular association rule learning algorithm, is presented in section III-C. The proposed strategy for adaptive control of the granularity is presented in sections III-D and III-E. The use of association rules for automated diagnosis is presented in section III-F. The proposed framework relies on a few parameters that need to be tuned: the overall training procedure is presented in section III-G.

Each examination record may further comprise contextual data relative to the patient. Based on the plurality of training examination records, frequent itemsets containing at least one individual semantic item related to the distribution of visual words and/or at least one individual contextual item related to the contextual data can be defined. Then, after the step of providing the distribution of visual words, a step consisting in checking the occurrence of the frequent itemsets in the examination record of the retina to be characterized may be implemented.

Frequent itemsets may be defined as follows:

- for each training examination record:

  - defining the individual semantic items, each describing in a binary manner a part of the distribution of visual words, a combination of the individual semantic items describing the whole distribution of visual words,

  - defining the individual contextual items, each describing in a binary manner a part of the contextual data, a combination of the individual contextual items describing the whole contextual data,

- mining the training examination records to identify as frequent itemsets combinations of at least one individual semantic item and/or at least one individual contextual item having an occurrence above a determined confidence threshold.

The frequent itemsets may further contain at least one individual semantic item related to the distribution of visual words and/or at least one individual contextual item related to the contextual data and/or at least one individual spatial item related to the region of the structure of interest. Frequent itemsets may then further be defined by:

- for each training examination record, defining individual spatial items each describing in a binary manner at least a part of one of the regions of the structure of interest, a combination of the individual spatial items describing the whole regions of the structure of interest,

- mining the training examination records to identify as frequent itemsets associations of at least one individual semantic item and at least one individual contextual item with at least one individual spatial item having an occurrence above the determined confidence threshold.

A. Association Rule Learning

Association rule learning is a subset of data mining operating on databases containing transactions (e.g. collections of items bought by customers) (R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases", in Proc VLDB, 1994, pp. 487-99). Each transaction is seen as a set of binary attributes called items.

Let $I = \{i_1, i_2, \dots, i_n\}$ be a set of items (or an itemset for short). An association rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$.

Association rule learning is the problem of finding all rules with a large enough support and a large enough confidence, where:

- the support $\mathrm{supp}(X)$ of an itemset $X$ is defined as the proportion of transactions in the database which contain the itemset,

- the support $\mathrm{supp}(X \Rightarrow Y)$ of a rule $X \Rightarrow Y$ is defined as $\mathrm{supp}(X \cup Y)$,

- the confidence of a rule $X \Rightarrow Y$ is defined as:

$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} \quad (5)$$

It is an estimate of the conditional probability $P(Y \mid X)$.
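These definitions translate directly into code; a minimal sketch over transactions represented as Python sets, with a hypothetical toy database:

```python
def supp(itemset, transactions):
    """Support: proportion of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def conf(X, Y, transactions):
    """Confidence of the rule X => Y (equation (5)): estimate of P(Y | X)."""
    return supp(set(X) | set(Y), transactions) / supp(X, transactions)

# Hypothetical toy database of two examination records:
transactions = [{"age<20", "word42>0", "abnormal"}, {"age<20", "normal"}]
print(conf({"age<20"}, {"abnormal"}, transactions))  # 0.5
```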

B. Data Preparation for Association Rule Learning

Association rules have been defined for binary attributes (items). However, visual characterizations consist of continuous variables (histogram levels - see section II-H) and contextual attributes are either continuous, ordinal or categorical variables.

1) Representing Ordinal and Categorical Variables

Ordinal and categorical variables can easily be transformed into binary variables using equality tests (e.g. "eye color" is categorical but "eye color = blue" is binary).

2) Representing Continuous Variables at Multiple Levels of Quantification Granularity

Continuous variables can also be transformed into binary variables by defining intervals (e.g. "age" is continuous but "10 < age < 20" is binary), but defining intervals is not straightforward. Multiple binary observations can be generated from each observation of a continuous variable V, as described below.

Let $R_V \subset \mathbb{R}$ denote the range of possible values taken by $V$. Two binary variables ($2^n$, $n = 1$) are derived from $V$ by dividing $R_V$ into two equiprobable intervals. Four binary variables ($2^n$, $n = 2$) are derived from $V$ by dividing $R_V$ into four equiprobable intervals, etc. At each level $n$, one binary observation can be created by searching for the interval in which the continuous observation falls. Again, a multiple-granularity representation is used: one binary observation is created per level of quantification granularity.
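A sketch of this multi-level quantization, estimating the equiprobable intervals from empirical quantiles of the training observations (an assumed estimator), with hypothetical item names:

```python
import numpy as np

def quantization_items(value, training_values, var_name, max_level):
    """One binary item per quantification level: at level n, the range
    of the variable is split into 2**n equiprobable intervals."""
    items = []
    for n in range(1, max_level + 1):
        # 2**n + 1 interval edges at the empirical quantiles
        edges = np.quantile(training_values, np.linspace(0, 1, 2 ** n + 1))
        # index of the interval in which the observation falls
        idx = int(np.searchsorted(edges[1:-1], value, side="right"))
        items.append(f"{var_name}@L{n}=bin{idx}")
    return items

# e.g. quantization_items(34.0, ages, "age", 3)
# -> ["age@L1=bin0", "age@L2=bin1", "age@L3=bin2"]  (values illustrative)
```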

3) Null Observations

If no value is available for a given contextual variable in some examination record, then we simply do not generate any binary observation for this variable in that record. Similarly, if no instance of some visual word is observed in a given region of the retina, then no binary observations are generated for this visual word in that record.

C. Apriori Algorithm

Apriori is an association rule learning algorithm (R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases", in Proc VLDB, 1994, pp. 487-99): it starts by mining frequent itemsets (minimum support: $\varepsilon_{supp}$) and then learns association rules (minimum confidence: $\varepsilon_{conf}$). Faster itemset mining algorithms exist (e.g. Max-Miner - R. J. Bayardo Jr., "Efficiently mining long patterns from databases", in Proc ACM SIGKDD, 1998, pp. 85-93), but these algorithms only enumerate the "maximal" frequent itemsets, i.e. those that are not subsets of any other frequent itemset.

1) Frequent Itemset Mining

Apriori starts by identifying the frequent individual items in the database and then extends them to larger and larger itemsets, as long as those itemsets appear sufficiently often in the database ($\mathrm{supp}(X) \geq \varepsilon_{supp}$). In Apriori, frequent itemsets are extended one item at a time: candidate itemsets of length $k$ are generated from frequent itemsets of length $k - 1$. Then, candidates which have an infrequent sub-pattern are pruned. After that, frequent itemsets are determined among the remaining candidates by scanning the database.
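The level-wise procedure can be sketched as follows; this is a generic Apriori implementation over transactions represented as frozensets, not the optimized version referred to above.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_supp):
    """Level-wise Apriori: extend frequent itemsets one item at a time,
    pruning candidates that contain an infrequent sub-pattern."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]) for i in items
                if support(frozenset([i])) >= min_supp}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # join step: candidates of length k from frequent itemsets of length k-1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # database scan: keep the candidates that are frequent
        frequent = {c for c in candidates if support(c) >= min_supp}
        all_frequent |= frequent
        k += 1
    return all_frequent
```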

2) Association Rule Learning

The frequent itemsets determined by Apriori are then used to determine association rules: the support of each frequent itemset is stored, so the confidence of each candidate rule can be computed efficiently using equation (5).

D. Extending Apriori for Adaptive Control of the Granularity Level

Frequent itemset mining can be extended to data at multiple granularity levels (see Figure 5).

Before presenting the general case (section III-E), it is first assumed that granularity has a single dimension (e.g. the lexical dimension).

Following the example of Apriori, in which frequent itemsets are extracted by increasing size, the proposed algorithm extracts frequent itemsets by increasing level of granularity (i.e. from coarse to fine). Then, association rules are learned exactly as in the Apriori algorithm.

1) Initialization

First, the standard Apriori algorithm is run on the transactional database encoded using items at the lowest granularity level $G_0$ (the coarsest). Regarding the spatial dimension, there is a single region at level $G_0$: $\Omega$. Regarding the lexical dimension, there is a single visual word at level $G_0$: the number of detected image patterns is simply counted. Regarding the quantification dimension, a single interval is defined for each continuous variable ($R_V$): it is simply checked whether or not a value was assigned to this variable (e.g. is the patient age known? was at least one instance of this visual word observed?).

2) Mining Frequent Itemsets of Increasing Granularity Levels

Then, the algorithm mines more precise frequent itemsets, obtained by replacing some items at granularity level $G_{j-1}$ by more precise versions of these items at level $G_j$. The refinement procedure described below is used for mining frequent itemsets containing at least one item at granularity level $G_j$, $j \geq 1$, but no items at a higher granularity level. This process is repeated for increasing values of $j$ until no more frequent itemsets containing items at granularity level $G_j$ are found.

3) Simple Relationship between Itemsets at Different Granularity Levels

Let $X' \geq X$ denote a relationship between itemsets $X'$ and $X$ indicating that $X'$ contains the same items as $X$, or that one or more items in $X$ are replaced in $X'$ by a more precise item. If $X' \geq X$, then necessarily $\mathrm{supp}(X') \leq \mathrm{supp}(X)$. So, if itemset $X$ is not frequent enough (i.e. $\mathrm{supp}(X) < \varepsilon_{supp}$), then there is no need to search for frequent itemsets $X'$ such that $X' \geq X$.

4) Refinement Iteration for Mining Frequent Itemsets of Maximal Granularity Level $G_j$

This defines a simple way to generate candidate itemsets as the granularity level increases. To generate candidates during the $j$-th refinement iteration, each frequent itemset $X$ found during the $(j-1)$-th iteration is modified as follows. The algorithm generates every more precise version of $X$, obtained by replacing one or more items at granularity level $G_{j-1}$ in $X$ by a more precise version of that item at granularity level $G_j$. Note that only items at granularity level $G_{j-1}$ in $X$ need to be modified: those at any other level $G_l$, $l < j - 1$, do not. Like in Apriori, candidates are generated and evaluated by increasing length $k$, so that candidates of length $k$ which have an infrequent sub-pattern can be pruned.
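The candidate generation step can be sketched as follows; `children` and `level_of` are assumed helpers giving, respectively, the finer versions of an item at the next granularity level and the granularity level of an item.

```python
from itertools import combinations, product

def refined_candidates(itemset, children, level_of, j):
    """All more precise versions of `itemset` for the j-th refinement
    iteration: one or more items at level j-1 are replaced by one of
    their finer versions at level j; items at other levels are kept."""
    refinable = [i for i in itemset if level_of(i) == j - 1]
    fixed = [i for i in itemset if level_of(i) != j - 1]
    out = set()
    # choose a non-empty subset of refinable items to replace
    for r in range(1, len(refinable) + 1):
        for subset in combinations(refinable, r):
            kept = [i for i in refinable if i not in subset]
            # every combination of finer versions of the replaced items
            for repl in product(*(children(i) for i in subset)):
                out.add(frozenset(fixed + kept + list(repl)))
    return out
```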

5) Data Structure

In order to efficiently identify infrequent sub-patterns (for candidate itemset pruning) and to efficiently compute the confidence of candidate association rules, all frequent itemsets and their support can be stored into a hash tree. In order to efficiently generate candidates, a second hash tree is used to store specifically the frequent itemsets found during the $(j-1)$-th iteration.

E. Extension To Multiple Granularity Dimensions

In the disclosed embodiment, granularity has multiple dimensions: for example, $D_1$, $D_2$ and $D_3$ are spatial dimensions with respect to the three retinal landmarks (see section I), $D_4$ is the lexical dimension (see section II) and $D_5$ is the quantification dimension (see section III-B). The refinement strategy proposed above can be extended to multiple dimensions. First, the standard Apriori algorithm is run on the transactional database encoded using items at the lowest granularity level for all dimensions. Then, the result is improved by operating multiple refinement iterations along $D_1$ until convergence, followed by multiple refinement iterations along $D_2$ until convergence, etc. The order in which granularity dimensions are processed does not matter. Then, association rules are learned exactly as in the Apriori algorithm.

F. Using Association Rules for Automated Diagnosis

In an automated diagnosis context, there is a need to learn association rules whose right-hand side is a diagnosis. The method may then further comprise a step of associating each frequent itemset with a criterion related to the retina.

Binary diagnoses may be considered, i.e. normal versus abnormal. An examination record is considered abnormal if at least one pathology is present in at least one retina, which indicates that the patient should be referred to an ophthalmologist.

1) Normality and Abnormality Score of an Examination Record

Both a normality score and an abnormality score may be defined for each examination record. The normality score (respectively the abnormality score) is defined as the maximal confidence of association rules whose right-hand side is the 'normal' item (respectively the 'abnormal' item) and whose left-hand side contains items that are all observed in that record. In other words, the normality score is the specificity of the best normality detection rule and the abnormality score is the sensitivity of the best abnormality detection rule.
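A sketch of this score computation; rules are assumed to be stored as (left-hand side, right-hand side, confidence) triples, with left-hand sides and observed items represented as sets.

```python
def record_scores(rules, observed_items):
    """Normality / abnormality scores of an examination record: the
    maximal confidence of a rule concluding on 'normal' (resp.
    'abnormal') whose left-hand side is fully observed in the record."""
    scores = {"normal": 0.0, "abnormal": 0.0}
    for lhs, rhs, confidence in rules:
        if rhs in scores and lhs <= observed_items:
            scores[rhs] = max(scores[rhs], confidence)
    return scores["normal"], scores["abnormal"]
```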

2) Dual Receiver-Operating Characteristic (ROC) Curve

One ROC curve can be built by varying a cutoff on the normality scores. A second ROC curve can be built by varying a cutoff on the abnormality scores. Depending on the desired sensitivity (respectively specificity) for the detection system, either a cutoff on the normality score or a cutoff on the abnormality score is used, whichever maximizes the overall specificity (respectively sensitivity).

3) Extension to One Diagnosis per Retina

Because each fundus photograph has been assigned to one retina (see sections I-A and I-B), it should be noted that the proposed automated decision framework can also be applied separately to each retina, for improved performance.

G. Overall Training Procedure

The proposed framework relies on a few parameters that should be tuned: $\lambda_{spatial}$ (see section I-C), $\rho$ and $\lambda_{lexical}$ (see section II-G), $\varepsilon_{supp}$ and $\varepsilon_{conf}$ (see section III-C). Besides these numerical parameters, an optimal combination of image pattern detectors and image pattern descriptors may be determined (see section II-C).

First, the numerical parameters are tuned sequentially within a set of possible values, while using a default value for all other parameters. Each combination is evaluated by the area under the maximum of the two ROC curves. Then, each (detector $U$, descriptor $V$) pair is evaluated independently and the best pair, according to the above criterion, is retained.

Let $(U_0, V_0)$ denote the best pair. Then each tuple $\{(U_0, V_0), (U, V)\}$ is evaluated and the best tuple, denoted by $\{(U_0, V_0), (U_1, V_1)\}$, is retained. And so on until performance stops increasing.
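This greedy forward selection can be sketched as follows; `evaluate` is an assumed callback returning the area under the maximum of the two ROC curves for a given tuple of (detector, descriptor) pairs.

```python
def greedy_pair_selection(pairs, evaluate):
    """Greedy forward selection: start from the best single
    (detector, descriptor) pair, then repeatedly add the pair that
    most improves the evaluation criterion, stopping as soon as
    performance stops increasing."""
    selected = [max(pairs, key=lambda p: evaluate((p,)))]
    best = evaluate(tuple(selected))
    while True:
        candidates = [p for p in pairs if p not in selected]
        if not candidates:
            break
        p_best = max(candidates, key=lambda p: evaluate(tuple(selected + [p])))
        score = evaluate(tuple(selected + [p_best]))
        if score <= best:
            break  # performance stopped increasing
        selected.append(p_best)
        best = score
    return selected
```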