

Title:
METHOD AND MULTI-PURPOSE IMAGING SYSTEM FOR CHROMOSOMAL ANALYSIS BASED ON COLOR AND REGION INFORMATION
Document Type and Number:
WIPO Patent Application WO/2001/002800
Kind Code:
A2
Abstract:
The present invention relates to classification of chromosomes, and in particular to classification of multicolor images of combinatorially labeled probes. More particularly, the present invention relates to a multi-purpose imaging system for multicolor fluorescence in situ hybridization. The present invention provides a method and a system for automatically classifying chromosomes or portions thereof by combining color and region information. More particularly, the method comprises the steps of: selecting a first start pixel/voxel/region; selecting an adjacent pixel/voxel/region; determining dependent on a pre-selected similarity criteria whether said adjacent pixel/voxel/region is of similar direction; adding said adjacent pixel/voxel/region to said first pixel/voxel/region if said adjacent pixel/voxel/region is of similar direction so that a first region is grown; repeating steps b) to d) until no more adjacent pixels/voxels/regions are found that fulfil the selected similarity criteria; calculating a first color vector for said first grown region; repeating steps a) to f) with further start pixels to grow further regions until no more regions can be grown; and performing classification on the weighted region color vectors.

Inventors:
EILS ROLAND (DE)
Application Number:
PCT/EP2000/006138
Publication Date:
January 11, 2001
Filing Date:
June 30, 2000
Assignee:
UNIV RUPRECHT KARLS HEIDELBERG (DE)
EILS ROLAND (DE)
International Classes:
G01N15/14; G06K9/00; (IPC1-7): G01B11/00
Foreign References:
US5798262A1998-08-25
US5817462A1998-10-06
US5880473A1999-03-09
Other References:
SCHROECK E ET AL: "MULTICOLOR SPECTRAL KARYOTYPING OF HUMAN CHROMOSOMES" SCIENCE,AMERICAN ASSOCIATION FOR THE ADVANCEMENT OF SCIENCE,,US, vol. 273, 26 July 1996 (1996-07-26), pages 494-497, XP000952836 ISSN: 0036-8075
Attorney, Agent or Firm:
VOSSIUS & PARTNER (München, DE)
Claims:
1. Method for automatically classifying chromosomes or portions thereof, characterized by combining color and region information.
2. The method of claim 1, comprising the steps of: a) selecting a first start pixel/voxel/region; b) selecting an adjacent pixel/voxel/region; c) determining dependent on a preselected similarity criteria whether said adjacent pixel/voxel/region is of similar direction; d) adding said adjacent pixel/voxel/region to said first pixel/voxel/region if said adjacent pixel/voxel/region is of similar direction so that a first region is grown; e) repeating steps b) to d) until no more adjacent pixels/voxels/regions are found that fulfil the selected similarity criteria; f) calculating a first color vector for said first grown region; g) repeating steps a) to f) with further start pixels/voxels/regions to grow further regions until no more regions can be grown; and h) performing classification on the weighted region color vectors.
3. The method of claim 1, comprising the steps of: a) selecting a first start pixel/voxel/region ; b) selecting an adjacent pixel/voxel/region ; c) determining dependent on a preselected similarity criteria whether said adjacent pixel/voxel/region is of similar direction; d) adding said adjacent pixel/voxel/region to said first pixel/voxel/region if said adjacent pixel/voxel/region is of similar direction so that a first region is grown; e) calculating a first color vector for said first grown region; f) repeating steps b) to e), wherein in step e) the color vector is updated; g) repeating steps a) to f) with further start pixels to grow further regions until no more regions can be grown ; and h) performing classification on the weighted region color vectors.
4. The method of claim 1, 2 or 3, wherein said color vectors are calculated based on the color information of said pixels/voxels/regions with a weight that corresponds to the size of the region.
5. The method of claim 4, wherein similar direction of two vectors is determined by comparing their normalized projection with a predefined value.
6. The method of any of claims 1 to 5, wherein step a) is preceded by a step of removing background.
7. The method of any of claims 1 to 6, further comprising a step of adapting class vectors.
8. The method of any of claims 1 to 7, further comprising the step of performing an edge, intensity or region based segmentation on every single color image.
9. The method of claim 8, wherein said segmentation step precedes step a).
10. Method for automatically classifying chromosomes or portions thereof, comprising the steps of: a) segmenting regions of an image without use of color information; b) calculating a color vector for said region; c) classifying said region using said color vector.
11. A system for automatically classifying chromosomes or portions thereof, comprising: means for selecting a first start pixel/voxel/region ; means for iteratively selecting an adjacent pixel/voxel/region, determining dependent on a preselected similarity criteria whether said adjacent pixel/voxel/region is of similar direction, and adding said adjacent pixel/voxel/region to said first pixel/voxel/region if said adjacent pixel/voxel/region is of similar direction so that a first region is grown, until no more adjacent pixels/voxels are found that fulfil the selected similarity criteria ; means for calculating a first color vector for said first grown region; means for repeating said iterative selection, determination, and addition with further start pixels to grow further regions until no more regions can be grown; and means for performing classification on the weighted region color vectors.
Description:
Method and multi-purpose imaging system for chromosomal analysis based on color and region information

The present invention relates to classification of chromosomes, and in particular to classification of multicolor images of combinatorially labeled probes. More particularly, the present invention relates to a multi-purpose imaging system for multicolor fluorescence in situ hybridization.

Multicolor fluorescence in situ hybridization (MFISH) allows the identification of the twenty-four different human chromosomes in a metaphase spread by the simultaneous hybridization of a set of chromosome-specific DNA probes, each labeled with a different combination of fluorescent dyes. MFISH has been shown to readily identify both simple and complex chromosomal abnormalities.

The application of the new multicolor karyotyping techniques promises to revolutionize the analysis of complex karyotypes, with broad applications in pre- and postnatal diagnostics and tumor cytogenetics. Due to the limited cytogenetic resolution of any of the new 24-color hybridization techniques using chromosome-specific painting probes, hidden structural abnormalities, as they frequently occur in apparently normal metaphases from leukaemia and mental retardation, might be missed. These limitations can be overcome by disease-specific libraries of differentially labeled probes. The automated analysis of such data is difficult since the image data is dominated by background rather than object information. Hence a classification of chromosomal regions based on color information alone is not feasible. Instead, we have developed a system which combines a region-oriented analysis with a color analysis for accurate classification of both combinatorially labeled chromosomes and combinatorially labeled small regions. This method has been successfully applied to the analysis of (1) a 12-color telomere assay and (2) bar codes for nine different chromosomes. Notably, this method is not restricted to the analysis of two-dimensional metaphases but can also be applied to interphase cytogenetics, i.e. the analysis of combinatorially labeled chromosomal regions in three-dimensional cell nuclei.

Introduction

Multiplex-fluorescence in situ hybridization is a combinatorial staining technique for the simultaneous detection and discrimination of the 22 autosomes and the sex chromosomes as well as smaller biological probes. Each chromosome (or probe) is labeled in a unique combination of colors, which allows the identification of every chromosome by its spectral composition. For unique combinatorial labeling of all 24 different human chromosomes at least 5 fluorochromes are needed.

After image acquisition every pixel in the two- or three-dimensional image volume contains, besides its spatial information, additional spectral information. The dimension of the spectral information in every pixel depends on the number of fluorochromes and the image acquisition technique used and is at least five, as mentioned above. This information is used to classify every pixel according to its spectral components in order to detect the different biological probes, where the labeling scheme of the different probes represents the 'ideal' classes to which the pixels have to be classified. For correct cytogenetic diagnosis a very high degree of correctly classified pixels is obviously necessary.

Under experimental conditions, however, there are many obstacles that limit the rate of correct classification. 'Biochemical noise', inhomogeneous hybridization, different binding properties of fluorochromes to different biological specimens and the localization of the pixel/voxel/region within the probe cause considerable intensity variations of 'true information'. Pixels/voxels in the outer regions of a chromosome will in general have a lower intensity than those lying further inside. Crosstalk between different spectral channels can lead to very high background, which is especially troublesome where the 'true information' intensity in the image is weak.

Any classification algorithm has to take these variations into account, either in a preprocessing step followed by a subsequent classification procedure, or by using a classifier that depends on direction (albedo) rather than on intensity.

Classification

Every pixel in the image volume has to be assigned to one of the possible classes, which are defined by the labeling scheme. Therefore a distance measure or classifier has to be chosen to assign the pixels to their closest or most probable classes.

A classification algorithm for multicolor images that is based solely on intensity values in the greylevel images of the different color (spectral) channels could be as follows: First, a smoothing algorithm may be applied as a pre-processing step to every single image to compensate for the intensity variations within homogeneous regions of a chromosome. Then a threshold is set, either manually or automatically, for every single image by histogram analysis. Figure 1 shows the histogram of the FITC image of a multicolor labeled metaphase spread with a manually chosen threshold and the resulting image. Afterwards every pixel is assigned to the class to which it has the smallest (Euclidean) distance; additionally, binary images could be created before classification.
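The intensity-based procedure just outlined can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `classify_by_intensity`, the array layout (height, width, channels) and the use of binarized channels are choices made here, not details given in the text, and smoothing is omitted.

```python
import numpy as np

def classify_by_intensity(image, class_vectors, thresholds):
    """Threshold every channel, then assign each pixel to the nearest class.

    image:         (H, W, n) array of grey levels, one channel per fluorochrome
    class_vectors: (p, n) binary labeling scheme (assumed layout)
    thresholds:    n per-channel thresholds, e.g. chosen from the histograms
    Returns an (H, W) array of class indices, -1 marking background pixels.
    """
    image = np.asarray(image, dtype=float)
    class_vectors = np.asarray(class_vectors, dtype=float)
    # One binary image per channel: 1 where the channel exceeds its threshold.
    binary = (image > np.asarray(thresholds)).astype(float)
    # Squared Euclidean distance from every pixel to every class vector.
    diff = binary[..., None, :] - class_vectors[None, None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    labels = dist.argmin(axis=-1)
    # Pixels below threshold in all channels are treated as background.
    labels[binary.sum(axis=-1) == 0] = -1
    return labels
```

The sketch makes the weakness of this approach visible: the result depends entirely on `thresholds`, and a pixel that falls just below a threshold in one channel changes class.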

However, such a threshold is not always easy to find, even in images of excellent quality, especially if this is to be done automatically. There will always be a threshold range in the histogram and thus an uncertainty where a reasonable threshold could be set, and the need to take as many pixels as possible into account in order to avoid information loss is a serious conflict in such an approach.

Small probes or regions with only a few tens of pixels will hardly be detectable in the histogram if the intensity is not high enough to form a gap between the background distribution and its own pixels in the histogram, or if the greyvalues of its pixels/voxels do not lie within or above a detectable peak formed by larger structures. For example, a translocation will be detected if its greylevels lie within or above the peak formed by chromosomes.

Classification based on direction

If n colors are used, every pixel in the multicolor image can be regarded as a point in the n-dimensional Euclidean space where every axis corresponds to one of the colors used for labeling, and thus can be treated as an n-dimensional vector.

In the further discussion, this Euclidean space will be referred to as 'color space'.

In an MFISH experiment where, e.g., three different colors are used (n=3, RGB color space), two pixels having (255,200,220) and (128,100,110) intensities, respectively, should be classified to the same class (threefold painted probe). This information is of course distorted by different sources of noise that are not easy to quantify. Garini et al., however, presented in US-A-5 798 262 a color model and a valuable first step towards quantitative noise analysis of multicolor images. US-A-5 798 262 relates to a method for finding L internal reference vectors for classification of L chromosomes or portions of chromosomes of a cell, the L chromosomes or portions of chromosomes being painted with K different fluorophores or combinations thereof, wherein K basic chromosomes or portions of chromosomes of the L chromosomes or portions of chromosomes are each painted with only one of the K different fluorophores. The classification method according to US-A-5 798 262 comprises three techniques, namely (a) a multi-band collection device for measuring a spectral vector for each pixel; (b) a method for computing internal reference vectors for each chromosome (or portion thereof); and (c) classification of all pixels for all chromosomes using those internal reference vectors. In contrast to the classification method described in U.S. Ser. No. 08/635,820 of Garini et al., where the reference vectors were obtained from external reference libraries (which proved to be an "approach which is not always successful"; cf. US-A-5 798 262, col. 6, l. 55, 56), here these internal reference vectors are computed automatically from the data itself. In US-A-5 798 262, internal reference vectors are computed by using at least one pixel for each chromosome, i.e., the chromosomes are classified on a pixel basis. The classification method according to US-A-5 798 262 is performed in three steps. In a first step, a set of internal reference vectors is computed for the so-called basic chromosomes (preferably those 5 chromosomes labelled by only one fluorochrome). Importantly, these reference vectors are computed by using at least one basic pixel from each such basic chromosome. Thereafter those basic reference vectors are used to compute internal reference vectors for the remaining chromosome classes. In the third step the reference vectors are used for classification of all pixels.

Data structure

Figure 2 shows the data points in the FITC, Cy3, Cy3.5 space of a multicolor metaphase spread where five different fluorochromes have been used (n=5). Several clusters, pointing in different directions, can be observed reaching out of a high background; they belong to different color combinations of these three fluorochromes. The high 'background' results mainly from pixels that are labeled in combinations which use at least one of the other two colors as well. This will be shown in one of the following sections, where the color space is visualized again after classification based on direction, with excellent result, and where only those points are plotted which belong to chromosomes (or classes) that are labeled exclusively with combinations of the fluorochromes (color space) shown.

Though distorted by different noise sources, clustering into a specific direction (distorted cone) in color space can be seen for a chromosome with several hundreds of pixels in the image (Figure 2). The intensity of pixels/voxels in the inner regions of a chromosome will be high, decreasing towards the outer regions of the chromosome, where the signal-to-noise ratio becomes worse, which leads to a spreading of the cluster. If probes are small, with a pixel number of, say, below 50-100, clusters may not be detectable in the color space for these probes if noise is considerable, which is often the case in multicolor experiments.

For correct classification of these structures and to resolve overlaps of clusters in color space spatial information has to be taken into account. Hence if one desires to classify a multicolor image with painted chromosomes with translocations as well as other labeled small probes simultaneously in the same biological specimen, a region based classification algorithm based on direction seems to be a promising strategy for image analysis to yield robust results.

However, first the classification based on direction will be described, and its application to multicolor MFISH images of normal and aberrant metaphases will be shown.

Then classification will be extended to small probes, the problem of focal shifts will be addressed and a regional approach based on direction as classification algorithm will be developed. The results of classification with this algorithm on MFISH images of aberrant metaphase spreads and telomeres will be presented.

Smallest angle (or 'highest normalized projection') as distance measure

Let f be the angle between two vectors (pixels/voxels) a and b in the n-dimensional color space; then cos(f) is the normalized projection of these two vectors, calculated by their normalized scalar product.

Let a = (a_1, ..., a_n) and b = (b_1, ..., b_n), then

$$\cos(f) = \frac{a \cdot b}{|a|\,|b|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \qquad [1]$$

If a is a pixel/voxel/region and b one of the class-vectors, then a is assigned to the class-vector (probe) with which it has the highest normalized projection (smallest angle). Figure 3 shows the surface plots of the normalized scalar product for the class-vectors (128,0), (128,128) and (0,128).
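As a rough illustration of classification by direction, the following sketch computes the normalized scalar product of two color vectors and picks the class-vector with the highest value. The function names are hypothetical and chosen here for illustration only.

```python
import numpy as np

def normalized_projection(a, b):
    """Cosine of the angle between two color vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        # The normalized scalar product is undefined for zero-length vectors.
        raise ValueError("normalized projection undefined for zero intensity")
    return float(a @ b / (norm_a * norm_b))

def classify_by_direction(pixel, class_vectors):
    """Index of the class-vector with the highest normalized projection."""
    scores = [normalized_projection(pixel, c) for c in class_vectors]
    return int(np.argmax(scores))
```

With this measure the two pixels (255,200,220) and (128,100,110) have a normalized projection very close to one, so both fall into the same class regardless of their different intensities.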

If the classification is based on direction, a prior background correction and noise reduction has to be performed. The reasons are that the normalized scalar product is not defined for vectors of zero length (zero intensity in all color channels) and that it is very noise sensitive, and thus not a robust classifier, when intensity levels in all channels are low. Therefore pixels of low intensity are likely to be classified wrongly if classified by direction. However, pre-processing can improve classification results if the cluster structure is manipulated in such a way that a better conical clustering of the data points in color space is achieved. This will be discussed later.

Direction will change dramatically with noise at low intensities. This will lead to distortion of the cones expected in the ideal case.

Correction of class-vectors, iteration

The class-vectors to which the pixels/voxels are classified represent the differently labeled probes in the image volume. Therefore, the better the class-vectors represent the pixel/voxel/region sets of the different probes, the better the classification results will be.

The labeling scheme only represents the optimal case, where no background, crosstalk or other sources of noise are present. Hence, the class-vectors have to be adjusted to the information content of the multicolor image. Figure 4 shows the classification result on the labeling scheme vectors and Figure 4a shows the color space of the first three fluorochromes used, with only those data points plotted that have been classified to probes that are labeled only in these fluors.

The present invention suggests an iterative approach. The advantage is that it is dynamic and makes automated classification feasible.

First, all data points are classified to the class-vectors. In the next step new class-vectors are calculated based on the classified pixels. Again classification and calculation is done iteratively.

At the beginning the class-vectors correspond to the labeling scheme or are slightly modified, taking intensity differences of the spectral windows and background into account. Of course, any other wisely chosen start-vectors could be chosen as well.
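A minimal sketch of such an iteration is given below. It assumes that background pixels have already been removed and that a new class-vector is the mean of the pixels currently assigned to that class; the text does not specify the update rule, so the mean, the function name `adapt_class_vectors` and the parameter `n_iter` are assumptions made for illustration.

```python
import numpy as np

def adapt_class_vectors(pixels, start_vectors, n_iter=3):
    """Alternate classification by direction and class-vector update.

    pixels:        (N, n) color vectors, zero-intensity pixels removed
    start_vectors: (p, n) initial class-vectors, e.g. the labeling scheme
    Returns the adapted class-vectors and the final class index per pixel.
    """
    pixels = np.asarray(pixels, dtype=float)
    vectors = np.asarray(start_vectors, dtype=float).copy()
    unit_pixels = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)

    def classify(current):
        unit = current / np.linalg.norm(current, axis=1, keepdims=True)
        # Highest normalized projection = smallest angle.
        return (unit_pixels @ unit.T).argmax(axis=1)

    for _ in range(n_iter):
        labels = classify(vectors)
        for k in range(len(vectors)):
            members = pixels[labels == k]
            if len(members):  # classes without pixels keep their old vector
                vectors[k] = members.mean(axis=0)  # assumed update rule
    return vectors, classify(vectors)
```

Because every step depends on the previous assignment, a small cluster next to a large one can be absorbed when many iterations are run, which is why the number of iterations is kept as an explicit parameter here.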

Convergence of the iteration is not guaranteed. In particular, it can lead to wrong classification if many iteration steps are performed in cases where neighboring clusters overlap or where a very small cluster is very close to a large cluster. However, results can be improved by pre-processing algorithms that manipulate the clusters in color space, or by taking the spatial information into account, as is done in the region based approach according to the invention that is presented in a following section. It turned out that 2 to 3 iteration steps are enough to yield good results on well hybridised images. Figure 4b shows the improved classification after one iteration and Figure 4c again shows the color space of the first three fluorochromes used, with only those data points plotted that have been classified to probes that are labeled only in these fluors.

Note: the only pre-processing step was to define a segmentation mask by setting a threshold of 2*(standard deviation) in the DAPI image.

A high number of pixels and therefore a good clustering is necessary for this algorithm to give reliable results. Translocations in aberrant cells will be classified correctly, if the chromosome with the same color information is in the image as well. The cluster is then dominated by data points of the chromosome, not those of the translocation. The same algorithm as before has been performed on a metaphase spread with translocations with excellent result that can be seen in Figure 5. The DAPI image was used here for background removal.

Very small probes with only a few data points in color space have to be treated differently, as will be discussed in a following section.

Pre-processing

There are many ways of pre-processing images to enhance image contrast, signal-to-noise ratio and other image properties; they are reviewed extensively in the literature on image analysis. The following description focuses, by way of example, on only two approaches that seem to have good properties for multicolor image classification; again, these are not the only appropriate ones.

As mentioned before, pre-processing can improve classification results based on direction if the cluster structure is manipulated in such a way that a better conical clustering of the data points in color space is achieved. Two methods will be described here: nonlinear anisotropic diffusion filtering applied in image space, i.e. on every image separately, and a density gradient function method for manipulation in color space.

Nonlinear anisotropic diffusion filtering is an edge-preserving smoothing filter for noise reduction that is applied to every single color image. It has several control parameters, which have to be carefully adjusted to the image content in order to preserve as much image information at different scales as possible and remove noise at the same time. The effect on color space is a denser clustering of the data points of different probes. Thus, by use of spatial information, smoothing filters can improve classification results and may ease the separation of overlapping clusters in color space. Figures 6a and 6b show an example of how clustering in color space is (slightly) enhanced by diffusion filtering.

A direct way to manipulate the clustering in color space is the use of density gradient methods. This iterative algorithm forces points of a cluster to move together to form a denser one. However, it has not yet been implemented, so no results can be presented. There are a few control parameters that have to be set and, being an iterative approach, its effect on clusters in color space can be controlled and can be much greater than that of filtering.

Overlap of clusters

In the case where two clusters overlap considerably, a separation might be accomplished by applying both methods presented before classification is performed. Using only one of them may not be enough for larger overlaps, because applying a smoothing filter would at best enhance the separation only slightly by making use of spatial information, while density gradient methods could lead to a merging in such a case. By applying both, however, filtering serves as a first enhancement for the following density gradient algorithm to separate the overlapping clusters into two. If clusters overlap only slightly, each of these methods alone could lead to separation. Overlap of clusters in color space of multicolor images leads to the question of discrimination and resolution, which will be addressed in a following section.

Even if there may be cluster arrangements in color space of multicolor images in experiment that cannot be separated by these methods, this empirical view on cluster separation indicates that two clusters that overlap in color space could be separated, if the spatial information is taken into consideration as well.

Figures 12 and 13 demonstrate how overlaps of clusters are resolvable by use of spatial information. Figures 12a and 12b show a pixel classification image and its FITC, Cy3 and Cy7 color space of a multicolor labeled metaphase spread. Figures 13a and 13b show the classification result after application of the region color clustering algorithm, presented in a further section, and classification of region color vectors on the same class vectors used in Figures 12a and 12b. Figure 13b clearly shows how the overlap between the blue (chr. 9) and yellow (chr. X) clusters is resolved, leading to a correct classification that is not feasible using only color information in a pixel based algorithm. This means not only that separation of overlaps is feasible, but also that a higher resolution is attained, because two (or three) spatial dimensions with non-redundant information add to the n-dimensional color space when n fluorochromes are used.

Summary of the invention

The present invention provides a multi-purpose image analysis system which is designed for routine application in clinical diagnostics. The present invention provides a method which allows the fully automated analysis both for multicolour karyotyping and for the study of disease specific combinatorially labeled probes. The potential of this technique will be exemplified by the fully automated detection of cryptic translocations in leukaemias with apparently normal karyotypes.

According to the present invention, the pixel classification method is based on the direction, and is applicable to multicolor images of arbitrary dimension. No pre-processing of the images is required to achieve excellent classification results on well hybridized probes. However, as the results depend on image quality and thus on the distribution in color space, a further embodiment of this approach is disclosed that takes the spatial information into account and turns out to be more robust. It will be applicable for classification of chromosomes and small probes as well as detection of translocations in tumor cells simultaneously with very high accuracy.

Region color clustering

The classification method and system according to the present invention is based on region growing that takes spatial and color information into account. It tessellates the image volume into clusters of data points of similar color and spatial information. First, a background removal algorithm is preferably applied to speed up the algorithm and to get rid of regions of near zero intensity that are troublesome for direction. Then a first pixel/voxel/region is picked. If neighboring pixels are of similar direction, i.e., if the normalized projection is larger than a preset value (threshold), they are added, a color vector is calculated for these pixels, and the region is grown until no neighboring pixels are found that fulfil the chosen similarity criterion. A color vector for this region is calculated based on the color information of the pixels/voxels, with a weight that corresponds to the size of the region. The next region starts with the next pixel found and is grown alike. Again a weighted region color vector is calculated. This is done iteratively until no more new regions can be found. Alternatively, an updated color vector can be calculated during region growing, which will lead to slightly larger and therefore fewer regions.

Finally, the classification is performed on the weighted region color vectors in color space, instead of classifying individual pixels or voxels.
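The region growing described above can be sketched for a two-dimensional image as follows. This is a simplified illustration, not the implementation of the invention: it assumes 4-connected neighbors, compares each candidate with the start pixel of the region, uses the mean color of the member pixels as region color vector and returns the region size separately as its weight. The function name and both parameters are hypothetical.

```python
from collections import deque
import numpy as np

def region_color_clustering(image, cos_threshold=0.973, min_intensity=1e-6):
    """Grow regions of similar color direction in an (H, W, n) image.

    Returns (label_map, region_vectors, region_sizes); label -1 marks
    background pixels whose vector length is at most min_intensity.
    """
    image = np.asarray(image, dtype=float)
    h, w, _ = image.shape
    norms = np.linalg.norm(image, axis=2)
    labels = np.full((h, w), -1, dtype=int)
    vectors, sizes = [], []
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1 or norms[sy, sx] <= min_intensity:
                continue  # already assigned, or background
            region_id = len(vectors)
            labels[sy, sx] = region_id
            color_sum = image[sy, sx].copy()
            size = 1
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    if labels[ny, nx] != -1 or norms[ny, nx] <= min_intensity:
                        continue
                    # Assumption: similarity is tested against the start pixel.
                    cos = image[ny, nx] @ image[sy, sx]
                    cos /= norms[ny, nx] * norms[sy, sx]
                    if cos > cos_threshold:
                        labels[ny, nx] = region_id
                        color_sum += image[ny, nx]
                        size += 1
                        queue.append((ny, nx))
            vectors.append(color_sum / size)  # mean color of the region
            sizes.append(size)                # weight of the region
    return labels, np.array(vectors), np.array(sizes)
```

The returned region vectors can then be classified by direction in place of the individual pixels. Testing candidates against a running region color vector instead of the start pixel would correspond to the alternative with an updated color vector.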

Again, by having taken the spatial information into account, overlaps can now be resolved more easily, because every cluster of a probe in color space is now represented by a few weighted points (or vectors). If 'overlaps' of these region vectors still occur, gradient density methods may more easily be applied to these region vectors in color space.

The similarity criterion of 'similar direction' has to be set reasonably. It is a value between zero and one, defining the threshold for a pixel to be added to a region. As mentioned earlier, the cosine of the angle f between two vectors in color space, which is the normalized projection of these two vectors, is calculated by their normalized scalar product (equation [1]). Hence, two vectors can be regarded as being of similar direction if their normalized projection is larger than a pre-defined value, which corresponds to a certain angle in the color space. A small angle threshold (a large projection threshold) will lead to a large number of small regions, which is a safe strategy if merging of neighboring regions of different probes with similar color is to be avoided. A small projection threshold, corresponding to a large angle, will lead to a small number of larger regions.

The question of what a reasonable threshold of normalized projection (or angle) is will be discussed later. It depends on the number of fluorochromes used and the labeling scheme, as well as on image content. If all neighboring regions of different probes in the image volume are well separated in the color space, a wider angle can be chosen. Figure 7 shows a classification result of this algorithm, where the threshold for the cosine of the angle f between two vectors was set to cos(f) > 0.973, or f < 13.3°, which led to a few hundred regions. The DAPI image was not used here.
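Since the threshold can be stated either as a normalized projection or as an angle, a small helper for converting between the two may be convenient. The function names below are hypothetical.

```python
import math

def projection_threshold_to_angle(cos_threshold):
    """Angle in degrees that corresponds to a normalized projection threshold."""
    return math.degrees(math.acos(cos_threshold))

def angle_to_projection_threshold(angle_deg):
    """Normalized projection threshold that corresponds to an angle in degrees."""
    return math.cos(math.radians(angle_deg))
```

For the threshold quoted for Figure 7, `projection_threshold_to_angle(0.973)` gives roughly 13.3 degrees.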

The adaptation of the class-vectors can be performed either prior to region color clustering on a pixel/voxel/region basis, or after region growing on the region color vectors. The behavior of the class-vectors during iteration on region vectors has not yet been fully investigated. Preliminary results showed no significant difference from adaptive iteration on pixels. This, however, depends on the image set under study.

Discrimination and Resolution in multicolor images

Use of spatial information can help to separate overlaps as described above. It is therefore justified to say that a higher resolution for classification is attained, because two (or three) dimensions with non-redundant information add to the n-dimensional color space when n fluorochromes are used.

A quantitative analysis of resolution in full detail is not easy. There are sources of noise, of which biochemical noise seems to be particularly disturbing, that have not been quantified yet. However, some general remarks can be made on resolution.

The binary labeling scheme of multicolor experiments can be represented in color space as vectors. If n fluorochromes are used to label the different probes in different color combinations, the color space will be n-dimensional. For the sake of simplicity we will deal with normalized intensities in the further discussion, which simply means that all components (intensities) of a vector in color space are divided by 255, the maximum intensity of a gray level image. For example, if 3 fluorochromes were used (n=3) and one probe was labeled in all three colors, this would correspond to a vector (1,1,1) in the color space of normalized intensities.

Before continuing the discussion, the following parameters are introduced:

p: number of different probes in the image
n: number of fluorochromes used in a multicolor experiment, n<=p
m: maximum number of labels for a probe in the labeling scheme, m<=n
s: number of equal labels between two vectors a and b, s<m<=n
d: number of different labels between two vectors a and b, d<m<=n
n(a): number of labels of vector a, n(a)>=1
n(b): number of labels of vector b, n(b)>=1
f: angle between two vectors

If p different probes were to be labeled, the highest discrimination, or resolution, would be attained if every probe was labeled with a different, unique fluorochrome (n=p). For instance, p=n=3 would correspond to labeling vectors (1,0,0), (0,1,0) and (0,0,1). All labeling scheme vectors would be mutually orthogonal, i.e. their scalar products would vanish. This is the ideal case; due to the limited number of different fluorochromes, in most multicolor experiments n<p, which leads to a combinatorial labeling strategy. When n fluors are used, 2^n-1 different combinations are possible. In such a labeling scheme pairs of class-vectors exist that have at least one label in common and therefore are not orthogonal anymore.
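The counting argument can be checked with a short enumeration of binary label vectors. The sketch below is illustrative only; the function name `label_vectors` and its optional limit `m` on the number of simultaneous labels are assumptions introduced here.

```python
from itertools import combinations

def label_vectors(n, m=None):
    """All binary label vectors for n fluorochromes with 1..m labels each."""
    m = n if m is None else m
    vectors = []
    for k in range(1, m + 1):
        for chosen in combinations(range(n), k):
            vectors.append(tuple(1 if i in chosen else 0 for i in range(n)))
    return vectors

def scalar_product(a, b):
    """Scalar product of two label vectors; zero means orthogonal."""
    return sum(x * y for x, y in zip(a, b))
```

`len(label_vectors(n))` equals 2^n-1 for any n, and the vectors with a single label are mutually orthogonal, whereas any two vectors that share a label have a non-zero scalar product.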

The angle between any two labeling (or class) vectors decreases as the number s (s<n) of shared colors rises.

If n fluorochromes are used and the highest number of simultaneous colors used for a probe is m, the normalized scalar product with angle f between any two given vectors $a=(a_1,\dots,a_n)$ and $b=(b_1,\dots,b_n)$ with $a_i, b_i \in \{0,1\}$ and s (s<m<=n) the number of same labels, where n(a) and n(b) are the number of labels for a and b respectively and 1<=n(a), n(b), is

$$\cos f = \frac{a \cdot b}{|a|\,|b|} = \frac{s}{\sqrt{n(a)\,n(b)}}$$

The angle f is smallest between two vectors that differ in only one label, where one is labeled in m and the other in m-1 fluorochromes. The worst case is n(b)=m=n and n(a)=s=n(b)-1:

$$\cos f = \frac{n-1}{\sqrt{(n-1)\,n}} = \sqrt{\frac{n-1}{n}}$$

These considerations suggest two conclusions:

1. The main limiting parameter in multicolor images is the number of fluorochromes used.

2. For a given number of fluorochromes the combinatorial labeling scheme should be designed such that the angles between all label vectors are kept as large as possible (see examples 2 and 4 below).
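The dependence of the worst-case angle on the number of fluorochromes can be checked numerically. The following is a sketch under the definitions above (binary label vectors, angle from the normalized scalar product); the function names are illustrative assumptions.

```python
import math


def angle_between(a, b):
    """Angle in degrees between two binary label vectors a and b."""
    s = sum(x * y for x, y in zip(a, b))      # number of shared labels = scalar product
    cos_f = s / math.sqrt(sum(a) * sum(b))    # n(a) and n(b) are the squared vector lengths
    return math.degrees(math.acos(min(1.0, cos_f)))


def worst_case_angle(n):
    """Worst case for n >= 2: one vector carries all n labels, the other n - 1 of them."""
    b = (1,) * n
    a = (1,) * (n - 1) + (0,)
    return angle_between(a, b)


# The worst-case angle shrinks as more fluorochromes are combined on one probe
for n in range(2, 8):
    print(n, round(worst_case_angle(n), 1))
```

The vector-based result agrees with the closed form arccos(sqrt((n-1)/n)), which is why limiting the number of simultaneous labels per probe keeps the angles large.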

The best labeling strategy for a given number of fluorochromes n with p different probes and label vectors $a_i$, i=1..p, is achieved when the angles between all pairs of label vectors are as large as possible. In other words, with a limited number of fluorochromes the ratio between the number of same labels s and the number of different labels d between any two label vectors should be kept as small as possible, usually s/d<=1.

Examples

If 24 different chromosomes in a metaphase spread were to be analyzed (p=24), at least five different fluorochromes would have to be used to label a whole chromosome set of 22 pairs and two sex chromosomes, because the number of different combinations for 5 fluorochromes is $2^5-1=31$, whereas 4 fluorochromes give only $2^4-1=15$. A labeling scheme limited to a maximum of 3 simultaneous labels is possible.

Example 1: With p=24, n=5, n(b)=m=3, n(a)=s=2, equation [2] becomes $\cos f = 2/\sqrt{2 \cdot 3} = \sqrt{2/3}$, corresponding to an angle $f=\arccos(\sqrt{2/3})=35.3°$.

Example 2: With p=24, n=7, n(b)=m=2 and n(a)=s=1, where simultaneous labels higher than 2 can be avoided, $\cos f = 1/\sqrt{1 \cdot 2} = 1/\sqrt{2}$, corresponding to an angle $f=\arccos(1/\sqrt{2})=45°$. Thus discrimination could be enhanced significantly in comparison to example 1 by using more fluorochromes, enabling a reduction of simultaneous labels in the labeling scheme.

Example 3: With p=24, n=5, n(b)=m=4, n(a)=s=3, equation [2] becomes $\cos f = 3/\sqrt{3 \cdot 4} = \sqrt{3}/2$, corresponding to an angle $f=\arccos(\sqrt{3}/2)=30°$.

Example 4: Again p=24, n=5, n(b)=m=4, but n(a)=s=2, with only single, double and fourfold labeling, $\cos f = 2/\sqrt{2 \cdot 4} = 1/\sqrt{2}$, corresponding to an angle $f=\arccos(1/\sqrt{2})=45°$, which is the same result as in example 2.
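The four example angles can be reproduced directly from the counts s, n(a) and n(b). The sketch below assumes only the relation cos f = s / sqrt(n(a) n(b)) for binary label vectors; the function name `angle_from_counts` is illustrative.

```python
import math


def angle_from_counts(s, n_a, n_b):
    """Angle in degrees between two binary label vectors with s shared labels."""
    return math.degrees(math.acos(s / math.sqrt(n_a * n_b)))


# (s, n(a), n(b)) as given in examples 1 to 4
examples = {1: (2, 2, 3), 2: (1, 1, 2), 3: (3, 3, 4), 4: (2, 2, 4)}

for number, (s, n_a, n_b) in examples.items():
    print(f"Example {number}: f = {angle_from_counts(s, n_a, n_b):.1f} degrees")
# Example 1: f = 35.3 degrees
# Example 2: f = 45.0 degrees
# Example 3: f = 30.0 degrees
# Example 4: f = 45.0 degrees
```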

Similarity criteria

A normalized projection length is a similarity criterion in the region color algorithm presented above. It is set according to the smallest angle appearing in the labeling scheme. For an angle $f=\arccos(\sqrt{2/3})=35.3°$ (example 1), a threshold angle $f_{thresh}<f/2$ would be chosen, corresponding to a projection length greater than 0.953.

This angle has to be set such that a good balance between discrimination and number of regions is achieved. If it is set too small, region information will be lost, leading to pixel/voxel-based classification in the limit $f_{thresh} \to 0$.
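The relation between the threshold angle and the projection length can be sketched as follows. This assumes that the projection length is the cosine of the angle between two color vectors and that the threshold is taken at the limiting value f/2; the function names are hypothetical.

```python
import math


def projection_threshold(smallest_angle_deg):
    """Projection length for a threshold angle of half the smallest labeling angle."""
    return math.cos(math.radians(smallest_angle_deg / 2))


def normalized_projection(u, v):
    """Cosine of the angle between two color vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def is_similar(u, v, threshold):
    """Similarity test: the normalized projection must exceed the threshold."""
    return normalized_projection(u, v) > threshold


smallest_angle = math.degrees(math.acos(math.sqrt(2 / 3)))  # 35.3 degrees
threshold = projection_threshold(smallest_angle)
print(round(threshold, 3))  # 0.953

print(is_similar((0.9, 0.8, 0.1), (1, 1, 0), threshold))  # True: nearly the same direction
print(is_similar((1, 0, 0), (1, 1, 0), threshold))        # False: 45 degrees apart
```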

Small probes

The classification of small probes in multicolor images can be done in various ways. In the case of translocations, where the chromosome of a translocation is present, good classification results can be achieved by using solely color information (Figure 5). If, however, there were small probes with a unique label, this method would not yield reliable results. Figure 8 shows the data points in the color space of FITC, Cy3 and Cy3.5 of a multicolor image of labeled telomeres. A straightforward approach is to apply background correction and, optionally, pre-processing algorithms to enhance image quality. Then an edge or intensity based segmentation algorithm is applied to every single color image, binary images are created, and an overlay image of these is the classification result. For small probes, this method will yield very good results if probe intensities are high and segmentation is performed carefully, because color information is reliable only for a small part of a segmented region. The ring effect due to different segmented sizes of the same probe in different color images can easily be removed thereafter, if probes are well separated.
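The overlay of binary images can be illustrated with a toy example. The sketch below uses a fixed intensity threshold per color image as a stand-in for the edge or intensity based segmentation; the threshold values, the image data and the function names are assumptions made for illustration only.

```python
def binarize(image, threshold):
    """Binary image: 1 where the intensity exceeds the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def overlay_labels(channels, thresholds):
    """Overlay the binary images: each pixel gets a tuple of 0/1 flags, one per color image."""
    masks = [binarize(image, t) for image, t in zip(channels, thresholds)]
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[tuple(mask[r][c] for mask in masks) for c in range(cols)]
            for r in range(rows)]


# Two hypothetical 2x2 color images (e.g. two fluorochromes), background already corrected
channel_a = [[200, 10], [0, 180]]
channel_b = [[190, 0], [170, 5]]

labels = overlay_labels([channel_a, channel_b], thresholds=(100, 100))
print(labels)  # [[(1, 1), (0, 0)], [(0, 1), (1, 0)]]
```

Each tuple can then be compared with the binary label vectors of the labeling scheme, with (0, 0) denoting background.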

Whatever kind of segmentation is performed, it is the essential step and therefore has to be done carefully to isolate the color information from background, which can be very high compared to information content.

For multicolor images that contain small as well as large probes, the question of finding the right threshold for every single color image arises. Edge based segmentation algorithms do not promise very good results for large probes, due to high intensity variations within them, even after noise reduction and application of smoothing filters.

Thus, use of true color information for classification of small probes can be desirable, even if color for small probes is highly distorted. An approach could be to apply edge or intensity based segmentation on the maximum intensity image of the different color images after noise reduction and application of smoothing filters. Color vectors for the different segmented regions are then calculated, if they are spatially well separated, which is often the case for small probes.
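A possible realization of this approach is sketched below: the maximum intensity image is thresholded, connected regions are collected, and one color vector per region is computed. Noise reduction and smoothing are omitted, and the choice of 4-connectivity, the fixed threshold and the use of the mean intensity per channel as the region color vector are assumptions of this sketch, not statements of the text.

```python
from collections import deque


def max_intensity_image(channels):
    """Pixelwise maximum over all color images."""
    rows, cols = len(channels[0]), len(channels[0][0])
    return [[max(ch[r][c] for ch in channels) for c in range(cols)] for r in range(rows)]


def segment(image, threshold):
    """Intensity based segmentation: 4-connected regions of pixels above the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def region_color_vector(channels, region):
    """Mean intensity per color image over all pixels of the region."""
    return tuple(sum(ch[y][x] for y, x in region) / len(region) for ch in channels)


# Two hypothetical color images containing two well separated small probes
channel_a = [[200, 180, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 10, 20]]
channel_b = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 150, 170]]
channels = [channel_a, channel_b]

regions = segment(max_intensity_image(channels), threshold=50)
vectors = [region_color_vector(channels, region) for region in regions]
print(vectors)  # [(190.0, 0.0), (15.0, 160.0)]
```

Each region color vector can then be classified by its direction in color space instead of classifying single pixels.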

All sources of noise, and particularly biochemical noise and focal shifts, are troublesome for small probes, because such probes, with only a few tens of points in color space, have poor statistical properties, leading to high degrees of overlap between classes (pixels belonging to different probes). Thus clusters will rarely appear in color space, and one will have to deal with high intensity gradients within these regions, with directions heavily distorted by noise.

Therefore, use of spatial information either in a region growing algorithm or a segmentation algorithm and classification of regions by classification of single color vectors seems preferable to pixel classification using only color information.

Figure 9a shows a classification result of region color classification, where the regions have been segmented and a color vector has been calculated for each region. Figure 9b shows the data plot in the color space of FITC, Cy3 and Cy3.5 of all probes labeled only in these fluors. Obviously, a classification based solely on color information will not lead to a correct classification.

Focal shifts

Light of different wavelengths is focused into slightly different axial spots. The difference can be on the order of 100-200 nm. The point spread function, which describes the three-dimensional intensity distribution in the focal spot, is shifted. Hence different spot sizes are imaged for the different multicolor images.

Applications

1) Multi-color bar coding

M-FISH is capable of readily identifying both simple and complex chromosomal abnormalities and has by now been widely used in pre- and postnatal applications and tumor cytogenetics. Recently we increased the number of fluors for probe labeling from five to seven. Currently DEAC, FITC, Cy3, Cy3.5, Cy5, Cy5.5 and Cy7 are used for the labeling and DAPI serves as counterstain. In theory this would allow $2^7-1=127$ different combinations. More importantly, for 24-color karyotyping the use of seven dyes reduces probe complexity, because triple combinations of fluors can be avoided and image analysis is facilitated.
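That seven dyes make triple combinations unnecessary for 24 targets follows from a simple count, sketched below. The function name `combinations_up_to` is an illustrative assumption.

```python
from math import comb


def combinations_up_to(n, m):
    """Number of label combinations of n fluors using at most m fluors per probe."""
    return sum(comb(n, k) for k in range(1, m + 1))


print(2**7 - 1)                  # 127 combinations in theory with seven fluors
print(combinations_up_to(7, 2))  # 28: single and double labels suffice for 24 chromosomes
print(combinations_up_to(5, 2))  # 15: with five fluors double labels are not enough
print(combinations_up_to(5, 3))  # 25: five fluors require triple labels
```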

In cases where a chromosome with structural abnormalities does not contain DNA material from another chromosome, deleted or duplicated regions can be discerned by chromosome-specific multicolor bar codes. We have until now constructed multicolor bar codes consisting of multiple combinatorially labeled YAC clones for chromosomes 2, 3, 5, 7, 8, 9, 12, 15, 17 and 20. The above described imaging system was applied for an automated evaluation of the YAC signals. Fig. 11 demonstrates that, in particular in tumor cytogenetics, bar codes have an increased potential to unravel complex intrachromosomal rearrangements and are invaluable for accurate break-point mapping.

2) Analysis of hidden chromosome rearrangements

The discovery of hidden chromosome rearrangements, as they frequently occur in apparently normal leukaemic karyotypes or karyotypes from patients with mental retardation, provides more of a challenge, particularly as the cytogenetic resolution of M-FISH remains to be established. It is believed that a proportion of such apparently normal karyotypes may harbour cryptic translocations, particularly involving telomeric regions. To address this problem, we have developed a 12 colour FISH assay for subtelomeric rearrangements. This uses a "second generation" set of 41 chromosome-specific cosmid, PAC and P1 clones, all confirmed as within 500 kb of their respective chromosome end. Probes were combinatorially labelled using four fluorochromes, with both the short (p) and long (q) arms of each chromosome having the same labelling combination. This allows the identification of 12 pairs of chromosomes in one hybridisation, with a full survey of all telomeric regions possible in only two hybridisations. For analysis of the combinatorially labelled probes we applied the above described imaging technique. Fig. 9 shows that it is now possible to accurately analyze such metaphases in a fully automated way.

3) 3D

The methods described here for classification of multicolor images are not restricted to two-dimensional images. They can be applied to three-dimensional multicolor images in the same way. Interphase chromosomes in a cell nucleus, as imaged in three-dimensional multicolor images, are not as dense as metaphase chromosomes. This will lead to more diffuse clusters in the color space. However, appropriate filtering, segmentation and application of density gradient functions, in combination with region color clustering and classification on color direction, together form a powerful set of tools to overcome obstacles in the classification of three-dimensional multicolor images of interphase probes.

In this context, reference is made to Figs. 14 and 15.