Login| Sign Up| Help| Contact|

Patent Searching and Data


Title:
A METHOD FOR AUTOMATIC SEARCH FOR ZONES OF INTEREST IN DIGITAL IMAGES OF BIOLOGICAL ORIGIN
Document Type and Number:
WIPO Patent Application WO/2017/072685
Kind Code:
A1
Abstract:
A method for the automatic search for zones of interest in a digital image obtained from biological samples. The method comprises the following steps: providing a non-linear, scattering transform coefficient representation of a zone of a digital image and highlighting, or not highlighting, the zone of the digital image based on an index of interest obtained using a wide-margin linear classifier trained in primal form as Support Vector Machine.

Inventors:
ROFFILLI MATTEO (IT)
Application Number:
PCT/IB2016/056458
Publication Date:
May 04, 2017
Filing Date:
October 27, 2016
Export Citation:
Click for automatic bibliography generation   Help
Assignee:
BIORETICS S R L (IT)
International Classes:
G06T7/00; G06V10/52
Other References:
AINA LÓPEZ TENZA: "Scattering Transform for Breast Cancer Detection", 11 June 2015 (2015-06-11), pages 1 - 62, XP055280471, Retrieved from the Internet [retrieved on 20160614]
JOAN BRUNA ET AL: "Invariant Scattering Convolution Networks", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE COMPUTER SOCIETY, USA, vol. 35, no. 8, 1 August 2013 (2013-08-01), pages 1872 - 1886, XP011515339, ISSN: 0162-8828, DOI: 10.1109/TPAMI.2012.230
ALAIN RAKOTOMAMONJY ET AL: "Scattering features for lung cancer detection in fibered confocal fluorescence microscopy images", ARTIFICIAL INTELLIGENCE IN MEDICINE, vol. 61, no. 2, 1 June 2014 (2014-06-01), NL, pages 105 - 118, XP055280749, ISSN: 0933-3657, DOI: 10.1016/j.artmed.2014.05.003
Attorney, Agent or Firm:
FANZINI, Valeriano (IT)
Download PDF:
Claims:
CLAIMS

1. A method for the automatic search for zones of interest in a digital image;

the method comprises the following steps:

- providing a representation of a zone of this digital image, this representation being a non-linear coefficient representation obtained by a scattering transform; and

- providing, for this digital image zone, starting from the representation of the same digital zone, a numerical value which is representative of the likelihood of containing a zone of interest, in particular using at least one linear classifier known as Support Vector Machine (SVM), trained in primal form, working on the scattering transform coefficient representation of the same zone of the digital image.

2. The method according to claim 1 , characterized in that the digital image zone is or is not shown on the basis of the numerical value.

3. The method according to either of the preceding claims, characterized in that it trains the linear classifier known as Support Vector Machine (SVM) through a series of samples of zones of interest.

4. The method according to any one of the preceding claims, characterized in that it comprises a step of selecting the windows with the highest likelihood of containing the object or zone of interest.

5. The method according to claim 4, characterized in that, in order to select the windows with the highest likelihood of containing the object or zone of interest, a group of neighbouring windows is defined for each window by selecting spatially proximate windows; and a candidate, that is to say the window with the highest likelihood of containing the object or zone of interest, is then chosen from among this group.

6. The method according to any one of the preceding claims, characterized in that the digital image represents breast tissue.

7. The method according to any one of the preceding claims, characterized in that the zones of interest represent tumour masses.

8. The method according to any one of the preceding claims, characterized in that the zones of interest represent microcalcifications or groups of microcalcifications.

9. The method according to any one of the preceding claims, characterized in that the digital image represents lung tissue.

10. The method according to any one of the preceding claims, characterized in that the digital image represents tissue from the digestive tract.

11. The method according to any one of the preceding claims, characterized in that the digital image represents biological material.

12. The method according to any one of the preceding claims, characterized in that the digital image is provided by any of the following diagnostic methods: mammography, Nuclear Magnetic Resonance Imaging, thermography, ultrasound, scintimammography, CAT, etc. of lungs, digestive tract; that is to say in the form of a corresponding digital photograph.

13. The method according to any one of the preceding claims, characterized in that it is designed to be implemented in an apparatus for processing and analysing digital images.

14. An apparatus designed to implement a method as claimed in the preceding claims.

Description:
DESCRIPTION

A METHOD FOR AUTOMATIC SEARCH FOR ZONES OF INTEREST IN DIGITAL IMAGES OF BIOLOGICAL ORIGIN

Technical field

This invention relates to a method for the automatic search for zones of interest in digital images preferably obtained from biological material.

Even if this description refers expressly to the detection of tumour masses in digital mammographic images, it is understood that the teachings of this invention can, with the necessary changes made, be applied to the analysis and processing of any digital image coming from any method of investigation, such as, for example, nuclear magnetic resonance imaging, thermography, ultrasound, scintimammography, CAT, etc. of lungs, digestive tract, etc., and images of biological tissue in general.

Background art

In many cases, the characteristics, such as shape and size, of objects of interest in a digital image subjected to analysis, are variable and not easy to define. Yet the operation of many of the algorithms used up to now to automatically detect objects in an image require those very characteristics to be well specified; this specification is a specific task performed by an expert in the field. To overcome the problems of defining the characteristics and avoid or reduce the work of the expert in the field, automatic detection systems have been developed in recent years which do not require the characteristics to be specified beforehand and which automatically learn from many samples of the objects to be classified.

The best state-of-the-art results from such "automatic learning systems" have been obtained from systems consisting of Support Vector Machine (SVM) or Relevance Vector Machine (RVM) classifiers trained to identify a class of objects using the original image, made up of the layers of grey or colour of all its pixels or a representation of it using transforms such as wavelets or ranklets, for example.

In this regard, reference is made to the following patents:

R. Campanini, M. Roffilli, N. Lanconelli: United States Patent Application 10/231 ,800 "Method, and corresponding apparatus, for automatic detection of regions of interest in digital images of biological tissue";

M. Roffilli, E. lampieri, E. Angelini, M. Masotti, Italian patent No. BO2006A200, "Metodo, e relativa apparecchiatura, per la ricerca automatica di zone di interesse in immagini digitali" ("Method and related apparatus for automatic search for zones of interest in digital images"). Another successful approach consists in directly applying to portions of digital images deep ANN - artificial neural networks of CNN (convolutional neural network) type which are capable of learning not only the decision making function but also filter banks for multi-layered representation.

In this regard, reference is made to the following article:

"Large-Scale Learning with SVM and Convolutional Nets for Generic Object Categorization" (Fu-Jie Huang andYann LeCun, Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1 , p. 284-291 ).

These systems, however, based as they are on the representation of objects in layers of grey or colour or on wavelet representation or even on convolution filters and hence on SVM or RVM or ANN classifiers, have several drawbacks, both general and specific to the case of images of biological samples.

There are problems connected with the representation of objects using grey or colour layers which are propagated also to the wavelet transform. These problems are due to the fact that some factors can cause variation in the grey layers of the images of the objects and thus in the coefficients values of their wavelet representation. Such factors may be, for example, the noise produced by the apparatus which generates the image or the values of the image capture parameters of the apparatus or the use of apparatuses which even just slightly differ from each other. These variations, however small in value, may lead to the object being classified differently by the classifiers.

Indeed, there may be variations between the digital images of the same breast obtained with different mammography units or between an image obtained with a digital mammography unit and one obtained with an analogue unit and then digitized. These variations in the grey layers and in the values of the wavelet coefficients may have negative consequences for the use of the above mentioned "automatic learning" methods.

One clarifying example is the following. To learn to identify mammographic images, the automatic classifier needs to be trained with a high number of masses taken from images diagnosed by expert radiologists. The acquisition and diagnosis of a large number of images involves massive effort and a very high cost. On the other hand, databases of images obtained from analogue mammographic units, diagnosed by expert radiologists and digitized, have been available on the Internet for some time. It would certainly be a considerable advantage to be able to use these databases to train a classifier to be used for automatic detection of masses in digital images produced by mammographic units of different kinds or in images which were originally analogue and subsequently digitized. The absence of this possibility means that, at present, all the diagnosed images used for training purposes must come from the same type of mammographic unit which will then be used to produce the images on which the classifier itself will work to aid in the detection of masses. Another drawback, again connected with grey layer representation and wavelet representation is due to the fact that it may be difficult for a single automatic detection system working in a clinical environment to work at constant levels of sensitivity and specificity if it has to analyse images from different types of mammographic units present in that clinical environment. With its absolute sort values, the ranklet transform partly solves the issues mentioned above in connection with pixels and wavelets but, at the same time, creates other issues due precisely to the exclusion of the real value of acquired numerical intensity, which may make originally very different images appear similar. To this must be added high computational complexity which limits its use in non real-time contexts.

Other specific problems in the biological field are due to the difficulty of obtaining a fair number of samples of the zones of interest. Let's think, for example, of anomalies in medical images: their availability is often limited by the objective difficulty of finding a large number of patients with such anomalies, which are often rare or very rare and impossible to reproduce experimentally not only for obvious ethical reasons but also on account of the complex nature of the biological processes which produced them. The same applies to any biological anomaly that is relatively uncommon but, precisely because of that, it is very important for it to be identified automatically. On the other hand, it is very easy to obtain samples of normal or healthy biological tissue because these represent the sampling norm. It is therefore necessary to have a classifier which is capable of generalizing well in identifying anomalies even with just a few samples of these but which is, at the same time, able to draw advantage from the millions of samples available in the norm.

CNN's, which integrate in a single algorithm both the representation and the classification of the images, seek to solve the problem of representation variability by shifting it towards the quantity of available images. In effect, the availability not of thousands but millions or even hundreds of millions of samples makes it possible to overcome by statistical means many of the problems of variability in sample distribution. Indeed, the cascade use of many layers of representation (hence the term "deep architecture") can mitigate variability while highlighting the significant differences, in a context of nonlinearity.

The con side of the size of the sample dataset is that it is difficult, and in many cases impossible, for the classifier to process the huge bulk of data involved.

ANN's, and hence CNN's, solve the problem using sequential processing algorithms such as Error Back Propagation which, thanks to their sequential nature, do not have limits on input data bulk but which, at the same time, do not always produce the same result, which is therefore not optimal but only approximated. In other words, by keeping unchanged the dataset and the order of presentation of the samples, but changing the optimize starting point, different solutions are generated and hence, different models, and their performance is, in many cases, nonuniform. RVM suffers from the same problem in that the algorithms used to solve it may be subject to local minima. Such behaviour is obviously undesirable in sensitive fields such as medicine. SVM, on the other hand, always gives the same result because the problem it solves is a convex one. With the computing resources currently available, ANN'S can process datasets of tens of millions of samples, SVM, in its nonlinear version, is limited to hundreds of thousands of samples, and RVM to tens of thousands.

More specifically, SVM is limited by the fact that the solution, in the nonlinear case, is obtained using a nonlinear transform known as kernel, usually in the Gaussian variant, which cannot be efficiently optimized beyond a certain limit. The result is that the optimization problem must be solved in its alternative form, known as dual (as opposed to the native form, known as primal), where the complexity of the problem is correlated with the number of samples instead of with the number of features (characteristics of the sample grouped in a numeric vector). Resulting from this are the limitations on the numbers of samples which can be used in training.

Disclosure of the invention

This invention therefore proposes a new integrated solution as an alternative to the solutions known up to now specifically for the biological field and, more specifically, proposes to overcome one or more of the above mentioned drawbacks or problems and/or to meet one or more of the needs in the field which, in particular, may be inferred from the above. Accordingly provided is a method for the automatic search for zones of interest in a digital image;

the method comprises the following steps:

- providing a representation of a zone of this digital image, this representation being a non-linear coefficient representation obtained by a scattering transform; and

- providing, for this digital image zone, starting from the representation of the same digital zone, a numerical value which is representative of the likelihood of containing a zone of interest, in particular using at least one linear classifier known as Support Vector Machine (SVM), trained in primal form, working on the scattering transform coefficient representation of the same zone of the digital image.

The advantageous choice of not using a nonlinear kernel, such as the

Gaussian one, for example, allows important computational optimizations, even in the SVM step, known as "test", making this method well suited to real-time contexts, that is, contexts normally present in the industrial field.

According to another aspect, also provided is a method for the automatic search for zones of interest in a digital image;

the method comprises the following steps:

- providing a representation of a zone of this digital image, this representation being a scattering transform coefficient representation; and

- providing, for this digital image zone, starting from the representation of the same digital zone, a numerical value which is representative of the likelihood of containing a zone of interest, in particular using at least one linear, maximum margin classifier known as Support Vector Machine, trained in primal form, working on the scattering transform coefficient representation of the same zone of the digital image;

- providing a selection of the numeric values which are representative of the likelihood of containing a zone of interest in order to highlight only the zones of most interest in the digital image.

More specifically, in the approach followed by this invention, the representation of the zones to be classified is accomplished by a nonlinear transform known as "scattering transform" implemented by a scattering network (see S. Mallat patent EP2577564A1 : Multiscale modulus filter bank and applications to pattern detection, clustering, classification and registration) and classification is preferably carried out by a linear, maximum margin SVM classifier trained in its primal variant.

Given an image, the scattering transform, using a cascaded wavelet transform and complex modulus operations, that is, the scattering network, extracts from the image a nonlinear representation which is invariant by translation and stable with respect to deformations. This representation is noteworthy because linearizing nonlinear characteristics allows effective use of linear classifiers.

In this regard, reference is made to the following article:

"Invariant scattering convolution networks", J Bruna, S Mallat, Pattern Analysis and Machine Intelligence, 2013.

Detailed below are the steps necessary for calculating the scattering transform by means of a scattering network.

Scattering networks construct features which are invariant to the action of groups such as translation, rotation or scaling through the convolution of the image which they are applied to, with pre-computed wavelet filters. Consequently, the features obtained from an image are very similar to those that would be obtained from a translated, rotated, scaled or weakly deformed version of it.

A scattering network has a tree-like structure, where each node receives an input, performs certain operations on it, generally convolution and modulus extraction operations, and returns two outputs. Of these, one is applied to the subsequent nodes of the network whilst the other forms part of the overall output of the network itself. More specifically, the following operations are performed:

· convolution with a complex, pre-computed shape wavelet (generally a Morlet wavelet): ¾</; (¾)= 2 2 ' ψ (2 J r ι) , con λ= 2 j r, r <3J , j ^Z, u ^lC .

• extraction of the phase from the result of the convolution by means of a complex modulus operation | . |,

° the result of the modulus operation is:

° applied as input to the nodes following the current one;

° further filtered by a convolution operation with low pass function, generally Gaussian, and then sent as output as part of the overall output to the network for that layer:

(p 2 j {u)= 2 21 φ (2 " ' u), dove J ^Z, u ^R 2

The root node and the nodes of the last layer are exceptions because they do not perform the filtering operation by means of wavelets but only the low-pass filtering operation.

The network implemented is described by four fundamental parameters, which are:

• M: is the number of active levels of the network, that is, the number of layers of the tree structure except the one containing the root node;

· J: is the maximum index by which the wavelets and filters are scaled. It should be noted that in this preferred implementation, the image is scaled and the filters remain fixed.

• L: is the number of rotations a wavelet is subjected to in order to obtain a plurality of different filters;

· Q: is the number of wavelets per frequency octave in the Fourier domain.

In the implementation which uses the multiresolution analysis, the parameters J and M are linked by the relation J >= M.

Each layer of the network thus implemented thus contributes to the overall output with a number of images equal to: (LxQ)"'( J ), 0 <m. <M

m

Each image forming part of the overall output from the network is scaled by a factor ^ ; that is to say, if the size of the network input image was ( xW) t then the output images will all be of size {Hli'x WIi')

As may be inferred from Figure 3, substantial differences between scattering networks and other "deep" architectures such as CNN's, for example, are:

· in scattering networks, the output is not concentrated in the last layer of the network but each layer contributes to the overall output;

• in scattering networks, the filters are pre-computed and must not therefore be constructed by training the network;

• scattering networks function as feature extractors and do not have any classification function like GNN's for example, so they must be coupled to a classifier such as an SVM, for example.

The numeric coefficients produced by the scattering transform for each resolution layer and for each scale are concatenated in a single feature vector.

Like other feature extraction methods, such as CNN's, it is a "deep" architecture, that is, one which involves a succession of nonlinear transformations. The depth of the architecture refers to the number of layers making up the nonlinear operations of the function learnt. Generally speaking, "deep" architectures have the following advantage: functions which can be represented in a compact manner by an architecture of depth k require an exponentially larger number of computational elements to be represented by an architecture of depth k-1.

Despite the success of CNN's, the properties and search methods of the optimum configuration of networks of this type are not yet well understood and there is a lack of mathematical models to explain their behaviour. The scattering transform answers these questions with a well-based mathematical and algorithmic framework.

The representation extracted by the scattering transform, that is, the numeric feature vector, is processed by a linear classifier targeted to finding a maximum margin geometric hyperplane (that is, the one with the strongest capability of generalization) in feature space, to separate the samples of one class from those of another.

More in detail, given a set of n samples in belonging to two classes (tt ^ * 1 ) we want to find the maximum margin hyperplane which separates the samples of one class from those of another. A hyperplane may be viewed as the set of points x which satisfies the equation ' ~ b = 0 . If the problem is linearly separable, we will have: w x[— h > 1 for the I of the first class;

for the %I of the second class. This may be rewritten as: y w- -b)≥ l fQr l<i≤n

Putting all this together, we obtain the minimization problem (in w , & ) of where y i (w-x i -b)≥ i For mathematical convenience, it is best to alter the preceding equation by substituting H H with 2 '' " . Thus, SVM solves the following optimization problem: If we introduce the multipliers " of KKT, the problem may be expressed as : n

argmin { →^max (a≥0 {- || || z - ^ t \y t (w · x - - 1]}

i = l

This is the problem tackled in the primal form and involves the estimation of parameters, correlated with the number of features. If we wanted to integrate a nonlinear representation, we would have to add a mapping function to the input space, for example by means of a Gaussian kernel and would, with major computational advantages, be led to solving the optimization problem in its dual form which involves the estimation of n parameters, correlated with the number of samples. Tackling the problem in the primal form is therefore convenient when the number of samples is greater than the number of features. For the primal form of linear SVM, highly optimized algorithmic solutions exist which are capable of processing hundreds of millions of samples, and even an infinite number of samples, in online mode.

This invention also has for an object an apparatus adapted to implement the method which is the main object of the invention.

Brief description of the drawings

The invention is described below with reference to the accompanying drawings, which illustrate a non-limiting embodiment of it and in which:

- Figure 1 illustrates the method for multiresolution scanning of an image; - Figure 2 is a block diagram representing an embodiment of an automatic search method according to this invention;

- Figure 3 shows the supports of the scattering transform implemented by a scattering network. Detailed description of preferred embodiments of the invention

In particular, in one particularly preferred embodiment of it, this invention provides a method for the automatic search of zones of interest in a digital image, the method essentially comprising the steps of representing zones of the digital image by means of a scattering transform and highlighting, or not highlighting, these zones based on a value of likelihood of interest obtained using at least one linear classifier known as support machine vector efficiently trainable in its primal form on more than hundreds of millions of samples.

In short, by associating a nonlinear representation obtained by means of a scattering transform with a linear SVM solved in primal form, we obtain a complete method which overcomes the limitations imposed by current technologies considered individually and which is particularly suitable for biological contexts where zones of interest containing rare diseases need to be identified.

As may be inferred from Figures 1 and 2, the system of this preferred embodiment works as follows: a window of fixed size scans the image from left to right and from top to bottom in steps of approximately 1/10 of its linear dimensions. For each window defined in this way, the portion of window contained in the window is extracted.

The portion of image is brought, by interpolation, to the reference dimensions set during training of the classifier, and represented using a scattering transformation implemented with a scattering network of preset parameters.

The values of the scattering transform are classified by a linear SVM classifier previously trained in primal form.

The numeric result of the classification is used to determine whether the zone is of interest or not.

The procedure is repeated after varying the size of the scanning window which is always brought to the reference dimensions by interpolation (upsampling or downsampling). Since both the size of the scanning window and the parameters of the scattering transform are fixed, the number of elements of all the vectors produced by all the windows is identical. This vector constitutes the input > for the SVM classifier.

The SVM classifier has thus been trained with a set of samples of zones of interest transformed by a scattering transform implemented with a scattering network of preset parameters.

In practice, in a binary classification context, the SVM classifier has been previously trained with a set of samples chosen so that it returns a numeric value which is positive if the input presented is a vector representing the first class, also called "positive class", corresponding, in the case of mammography, to tumour masses, and which is negative in the case of the second class or "negative class".

Thus, this SVM classifier assigns to the input vector a value of belonging to the zone of interest.

In another embodiment, the responses of a plurality of SVM classifiers can be used in suitable combination to assign the value of belonging to the input vector.

This is followed by a step of selecting the windows with the highest likelihood of containing the object or zone of interest: in this case, the tumour masses. For this purpose, a group of neighbouring windows is defined for each window by selecting windows which are spatially proximate within a preset Euclidean range r. A candidate, that is to say the window with the highest numeric value representing the highest likelihood is chosen from among this group.

All the windows chosen as candidates are stored in a list, while all the windows not chosen are discarded. The entire process is repeated for several different sizes of the scanning window to search for zones of interest or masses of different sizes. That way, a number of lists containing candidate windows is produced which is equal to the number of sizes searched. The windows contained in these lists are added to a single list, sorted in descending order based on the numeric values associated with each. This value is used as an estimate for likelihood-based sorting: that is to say, there is no likelihood value but a list sorted according to the likelihood value, although not known (ranking). Of these windows, the ones whose likelihood ranking is lower than a predetermined threshold f are discarded. At this point, the top n windows are chosen, where n is a preset value. These remaining windows are the windows which, for all search sizes, obtained the highest ranking of likelihood of containing a zone of interest or mass.

The next step is to select these remaining windows and to search for other windows, if any, around the geometrical boundary of each window. If the window under examination is isolated, it is discarded. If there are other windows, on the other hand, all of these plus the window under examination are grouped together in a single new window with the following associated with it: as centre, the centroid of the component windows; as size, the average size of the component windows; and as likelihood ranking, a combination of the likelihood rankings of the component windows.

These new windows are then added to a new list, sorted in descending order based on likelihood ranking. Lastly, only the top q windows of the new list, where q is a predetermined parameter, are displayed graphically overlaid on the original image. Overlaying the edge of the window makes it possible to visually grasp the centre and size of the zone of interest or mass identified, whilst the colour used for overlaying is associated with the likelihood ranking.

In practice, as may be well inferred from the flow diagram of Figure 2, a preferred embodiment of the method thus comprises the following steps:

- a) subjecting or having previously subjected the classifier to a corresponding training set,

- b) entering the image (input image), - c) setting the size of the image scanning window,

- d) cropping and resizing each scanned window from the image,

- e) performing the scattering transform of the crop or portion of the window,

- f) creating a corresponding vector of the scattering coefficients of the respective crop or portion,

- g) classifying the vector with the SVM (Support Vector Machine),

- h) repeating steps c) to f) with scanning windows of different sizes,

- i) selecting the crops or portions of interest for a certain scale,

- j) repeating steps b) to i) for all the scales of the image,

- k) selecting the crops or portions of interest for all the scales,

- I) creating an output image (output image with or without highlighted zones).