HUETHWOHL PHILIPP (DE)
NEUMANN JENS TIMO (DE)
SRIKANTHA ABHILASH (DE)
DE102022101884A | 2022-01-27
US202117376664A | 2021-07-15
US11138507B2 | 2021-10-05
US20190370955A1 | 2019-12-05
MOSQUEIRA-REY EDUARDO ET AL: "Human-in-the-loop machine learning: a state of the art", ARTIFICIAL INTELLIGENCE REVIEW, vol. 56, no. 4, 17 August 2022 (2022-08-17), NL, pages 3005 - 3054, XP093041308, ISSN: 0269-2821, Retrieved from the Internet
K. WANG, D. ZHANG, Y. LI, R. ZHANG, L. LIN: "Cost-Effective Active Learning for Deep Image Classification", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, vol. 27, no. 12, 2017, pages 2591 - 2600
J. SHIM, S. KANG, S. CHO: "Active Learning of Convolutional Neural Network for Cost-Effective Wafer Map Pattern Classification", IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, vol. 33, no. 2, May 2020 (2020-05-01), pages 258 - 266, XP011786642, DOI: 10.1109/TSM.2020.2974867
Claims

1. A computer implemented method (28, 28') for the detection and classification of anomalies (15) in an imaging dataset (66) of a wafer comprising a plurality of semiconductor structures, the method comprising:
- Selecting a machine learning anomaly classification algorithm;
- Executing at least one outer iteration (40) comprising the following steps:
i. Determining a current detection of a plurality of anomalies (15) in the imaging dataset (66);
ii. Obtaining an unsupervised or semi-supervised clustering of the current detection of the plurality of anomalies (15);
iii. Executing multiple inner iterations (42), at least some of them comprising the following steps:
a. Using the anomaly classification algorithm to determine a current classification of the plurality of anomalies (15) in the imaging dataset (66);
b. Based on at least one decision criterion selecting at least one anomaly (15) of the current detection of the plurality of anomalies (15) by selecting at least one cluster of the clustering for presentation to a user via a user interface (236), the user interface (236) being configured to let the user assign one or more class labels of a current set of classes to each of the at least one cluster;
c. Re-training the anomaly classification algorithm based on anomalies (15) annotated by the user in an inner iteration (42) of the current or any previous outer iteration (40).

2. The method of claim 1, wherein multiple outer iterations (40) are executed, at least some of them comprising steps i., ii. and iii.

3. The method of claim 1 or 2, wherein determining a current detection of a plurality of anomalies (15) in the imaging dataset (66) in step i. comprises:
- selecting a machine learning anomaly detection algorithm;
- determining a current detection of a plurality of anomalies (15) in the imaging dataset (66).

4.
The method of claim 3, wherein the selected anomaly detection algorithm is trained comprising the following steps:
- selecting training data for the anomaly detection algorithm, the training data containing at least one subset of the imaging dataset (66) of the wafer and/or of an imaging dataset (66) of at least one other wafer and/or of an imaging dataset (66) of a wafer model;
- re-training the anomaly detection algorithm based on training data selected in the current or any previous outer iteration (40).

5. The method of claim 4, wherein the user interface (236) is configured to let the user define one or more interest-regions (11) in the imaging dataset (66), and the training data for the anomaly detection algorithm is selected only based on said interest-regions (11).

6. The method of claim 4 or 5, wherein the user interface (236) is configured to let the user define one or more exclusion-regions in the imaging dataset (66), and the training data for the anomaly detection algorithm does not contain data based on said exclusion-regions.

7. The method of any one of claims 3 to 6, wherein the anomaly detection algorithm comprises an autoencoder neural network, and the plurality of anomalies (15) are detected based on a comparison between an input tile of the imaging dataset (66) and a reconstructed representation thereof obtained by presenting the tile to the autoencoder neural network, the tile containing an anomaly (15) and a surrounding of the anomaly (15).

8. The method of any one of claims 1 to 7, wherein each anomaly (15) is associated with a feature vector, and the decision criterion is formulated with regard to the feature vectors associated with the plurality of anomalies (15).

9. The method of claim 8, wherein the feature vector associated with an anomaly (15) comprises the raw imaging data or pre-processed imaging data of said anomaly (15) or of a tile containing said anomaly (15).

10.
The method of claim 8 or 9, wherein the feature vector associated with an anomaly (15) comprises the activation of a layer, preferably the penultimate layer, of a pre-trained neural network when presented with said anomaly (15) as input.

11. The method of one of claims 8 to 10, wherein the feature vector associated with an anomaly (15) comprises a histogram of oriented gradients of said anomaly (15).

12. The method of any one of claims 1 to 11, wherein multiple anomalies (15) are selected for presentation to the user, and the at least one decision criterion comprises a similarity measure between the multiple anomalies (15).

13. The method of claim 12, further comprising selecting the multiple anomalies (15) to have a high similarity measure between each other.

14. The method of any one of claims 1 to 13, wherein the at least one decision criterion comprises a similarity measure of the selected at least one anomaly (15) and one or more further anomalies (15) that were selected in one or more previous iterations in step iii.b.

15. The method of claim 14, further comprising selecting the multiple anomalies (15) to have a low similarity measure with respect to the one or more further anomalies (15) that were selected in the one or more previous iterations in step iii.b.

16. The method of any one of claims 1 to 15, wherein the at least one decision criterion comprises a probability of an anomaly (15) for not belonging to the current set of classes.

17. The method of claim 16, wherein the anomaly classification algorithm is an open set classifier and the probability of the anomaly (15) for not belonging to the current set of classes is estimated by the open set classifier.

18. The method of any one of claims 1 to 17, wherein the at least one decision criterion comprises the selected at least one anomaly (15) being classified as a predefined class or a class from a predefined set of classes in the current classification.

19.
The method of any one of claims 1 to 18, wherein multiple anomalies (15) are selected for presentation to the user, and the at least one decision criterion comprises the multiple anomalies (15) being classified as the same class in the current anomaly classification.

20. The method of any one of claims 1 to 19, wherein the at least one decision criterion comprises a population of the one or more classes the at least one anomaly (15) is assigned to in the current classification.

21. The method of any one of claims 1 to 20, wherein multiple anomalies (15) are concurrently presented to the user, and the method further comprises grouping and/or sorting the multiple anomalies (15) for presentation to the user.

22. The method of any one of claims 1 to 21, wherein the at least one decision criterion comprises a context of the selected at least one anomaly (15) with respect to the semiconductor structures.

23. The method of any one of claims 1 to 22, wherein the at least one decision criterion implements at least one member selected from the group consisting of an explorative annotation scheme and an exploitative annotation scheme.

24. The method of any one of claims 1 to 23, wherein the at least one decision criterion differs for at least two iterations of the inner iterations (42).

25. The method of any one of claims 1 to 24, wherein one of the at least one decision criterion comprises selecting a cluster for presentation to the user according to a group novelty measure, such that the selected cluster is most dissimilar to one or more of the previously selected clusters.

26. The method of any one of claims 1 to 25, wherein one of the at least one decision criterion comprises selecting a cluster for presentation to the user according to a between group similarity measure, which measures the similarity between the selected cluster and one or more of the previously presented clusters.

27.
The method according to claim 26, wherein the between group similarity measure of the selected cluster lies above a threshold.

28. The method of any one of claims 1 to 27, wherein one of the at least one decision criterion comprises selecting a cluster for presentation to the user according to a between group dissimilarity measure, which measures the dissimilarity between the selected cluster and one or more of the previously presented clusters.

29. The method according to claim 28, wherein the between group dissimilarity measure of the selected cluster lies above a threshold.

30. The method according to any one of claims 1 to 29, wherein the user interface (236) is configured to present multiple clusters to the user, to let the user select one or more of the presented multiple clusters and to let the user assign one or more class labels of a current set of classes to the selected clusters.

31. The method according to any one of claims 1 to 30, wherein the clustering is obtained taking into account the current detection of anomalies and/or the current classification of anomalies of one or more previous outer or inner iterations.

32. The method according to any one of claims 1 to 31, wherein the at least one decision criterion comprises selecting a cluster for presentation to the user according to the size of the cluster and/or according to the distribution of the anomalies within the cluster.

33. The method of any one of claims 1 to 32, wherein the unsupervised or semi-supervised clustering is based on a hierarchical clustering method used to compute a cluster tree (194), wherein the root cluster (196) contains the detected plurality of anomalies (15), each leaf cluster (198, 200, 202) contains a single anomaly (15) of the detected plurality of anomalies (15) and for all internal clusters (204, 205) of the tree the following applies: for an internal cluster (204, 205) with n child clusters i ∈ {1, ..., n}, let αi, i ∈ {1, ..., n}, indicate the set of anomalies (15) of child cluster i; then {α1, ..., αn} is a partition of the set of anomalies (15) contained in the internal cluster (204, 205).

34. The method of claim 33, wherein the hierarchical clustering method comprises an agglomerative clustering method, where two clusters (201, 203, 206) are merged, starting from the leaves of the cluster tree (194), based on a cluster distance measure.

35. The method of claim 34, wherein the cluster distance measure comprises a function of pairwise distances, each between an anomaly (15) of the first and an anomaly (15) of the second cluster (201, 203, 206) of the two clusters (201, 203, 206).

36. The method of claim 34 or 35, wherein the function used for computing the cluster distance measure is Ward's minimum variance method.

37. The method of claim 33, wherein the hierarchical clustering method comprises a divisive clustering method, where a cluster (201, 203, 206) is iteratively split, starting from the root cluster (196) of the cluster tree (194), based on a dissimilarity measure between the anomalies (15) contained in the cluster (201, 203, 206).

38. The method of any one of claims 33 to 37, wherein the decision criterion comprises selecting a cluster (201, 203, 206) of the cluster tree (194) for presentation to the user.

39. The method of claim 38, the user interface (236) being configured to allow the user to select a cluster (201, 203, 206) suitable for annotation by iteratively moving from the current cluster (201, 203, 206) to its parent cluster or to one of its child clusters in the cluster tree (194).

40. The method of claim 38 or 39, wherein the user interface (236) is configured to display a section of the cluster tree (194) containing the currently selected cluster (201, 203, 206) and to let the user select one of the displayed clusters (201, 203, 206) of the section of the cluster tree (194) for annotation.

41.
The method of claim 40, wherein the section of the cluster tree (194) comprises the currently selected cluster (201, 203, 206) and one or more of its parent clusters and/or one or more of its child clusters.

42. The method of claim 40 or 41, wherein the user interface (236) is configured to let the user select the number of tree levels of the section of the cluster tree (194) displayed to the user.

43. The method of any one of claims 33 to 42, wherein one of the at least one decision criterion comprises selecting a cluster for presentation to the user according to the distance of the cluster from one or more of the previously selected clusters within the cluster tree (194).

44. The method of any one of claims 33 to 43, wherein one of the at least one decision criterion comprises selecting a cluster for presentation to the user according to the tree level of the cluster in the cluster tree (194).

45. The method of any one of claims 1 to 44, wherein multiple anomalies (15) are concurrently presented to the user and the user interface (236) is configured to batch annotate the multiple anomalies (15).

46. The method of claim 45, wherein batch annotation of the multiple anomalies (15) comprises batch assigning of a plurality of labels to the multiple anomalies (15) concurrently presented to the user.

47. The method of any one of claims 1 to 46, wherein the current set of classes is initialized as a predefined set of classes.

48. The method of any one of claims 1 to 47, wherein the annotation of the at least one anomaly (15) in step iii.b. comprises the option to add a new class to the current set of classes.

49. The method of claim 48, further comprising, upon adding a new class to the current set of classes, offering the user an option to assign previously labeled training data to the new class.

50. The method of claim 48 or 49, wherein the anomaly classification algorithm comprises an open set classifier.

51.
The method of any one of claims 1 to 50, wherein the current set of classes is organized hierarchically and this knowledge is included in the training of the anomaly classification algorithm.

52. The method of any one of claims 1 to 51, wherein the current set of classes contains at least one defect class and at least one nuisance class.

53. The method of any one of claims 1 to 52, wherein the current set of classes contains an unknown anomaly class.

54. The method of any one of claims 1 to 53, wherein the selection of a machine learning algorithm comprises selecting one or more of the following attributes:
- a model architecture;
- an optimization algorithm for carrying out the training;
- hyperparameters of the model and the optimization algorithm;
- an initialization of the parameters of the model;
- pre-processing techniques of the training data.

55. The method of claim 54, wherein one or more attributes of the machine learning algorithm are selected based on specific application knowledge.

56. The method of claim 54 or 55, the at least one outer iteration further comprising a modification step (90) containing an option to modify one or more attributes of the machine learning algorithm.

57. The method of any one of claims 1 to 56, wherein the imaging dataset (66) is a multibeam SEM image.

58. The method of any one of claims 1 to 57, wherein the imaging dataset (66) is a focused ion beam SEM image.

59. The method of any one of claims 1 to 58, further comprising determining one or more measurements based on the current classification of the plurality of anomalies (15).

60. The method of claim 59, wherein the user interface is configured to let the user define one or more interest-regions (11) in the imaging dataset (66), especially die regions or border regions, and wherein the one or more measurements are computed based on the current classification of the plurality of anomalies (15) within each of the one or more interest-regions (11) separately.

61.
The method of claim 60, further comprising automatically suggesting one or more new interest-regions (11) based on at least one selection criterion and presenting the suggested one or more interest-regions (11) to the user via the user interface (236).

62. The method of any one of claims 59 to 61, wherein the one or more measurements are selected from the group containing anomaly size, anomaly area, anomaly location, anomaly aspect ratio, anomaly morphology, number or ratio of anomalies, anomaly density, anomaly distribution, moments of an anomaly distribution, performance metrics, precision, recall, nuisance rate.

63. The method of claim 62, wherein the one or more measurements are selected from said group for a specific defect or a specific set of defects.

64. The method of any one of claims 59 to 63, further comprising controlling at least one wafer manufacturing process parameter based on the one or more measurements.

65. The method of any one of claims 59 to 64, further comprising assessing the quality of the wafer based on the one or more measurements and at least one quality assessment rule.

66. One or more machine-readable hardware storage devices comprising instructions that are executable by one or more processing devices (244) to perform operations comprising the method of any one of claims 1 to 65.

67. A system (234) for controlling the quality of wafers produced in a semiconductor manufacturing fab, the system comprising
- an imaging device (246) adapted to provide an imaging dataset (66) of said wafer;
- a graphical user interface (236) configured to present data to the user and obtain input data from the user;
- one or more processing devices (244);
- one or more machine-readable hardware storage devices comprising instructions that are executable by one or more processing devices (244) to perform operations comprising the method of claim 65.

68.
A system (234') for controlling the production of wafers in a semiconductor manufacturing fab, the system comprising
- means (248) for producing wafers (250) controlled by at least one manufacturing process parameter;
- an imaging device (246) adapted to provide an imaging dataset (66) of said wafers;
- a graphical user interface (236) configured to present data to the user and obtain input data from the user;
- one or more processing devices (244);
- one or more machine-readable hardware storage devices comprising instructions that are executable by one or more processing devices (244) to perform operations comprising the method of claim 64.
Tab 3: example termination criteria for aborting the outer and/or inner iterations

The imaging dataset could be generated by a SEM or mSEM, a Helium ion microscope (HIM) or a cross-beam device including FIB and SEM or any charged particle imaging device.

In a preferred implementation of the invention, the method can comprise determining one or more measurements based on the current classification of the plurality of anomalies. These measurements are the basis for the user to make decisions, e.g., if training can be terminated, if process parameters should be adapted, or if the currently inspected wafer should be declared as scrap.

In addition, the user interface could be configured to let the user define one or more interest-regions in the imaging dataset, especially die regions or border regions, and the one or more measurements can be computed based on the current classification of the plurality of anomalies within each of the one or more interest-regions separately. In this way, the wafer can be inspected locally, and defect distributions can also be computed locally and for each defect separately. The user could, for example, be interested in monitoring different defects depending on the region of the wafer.

The method could additionally comprise automatically suggesting new interest-regions based on at least one selection criterion and presenting the suggested interest-regions to the user via the user interface. The user could, for example, select a border or a die region. Then, based on a selection criterion comprising, e.g., a similarity measure between different regions of the imaging dataset of the wafer and/or prior knowledge on the spatial location of the target region on the wafer, further border or die regions could be proposed and displayed to the user. The user could then select one, several or all of them to add these to the interest-regions. In this way, the annotation effort for the user is reduced.
The one or more measurements can be selected from the group containing anomaly size, anomaly area, anomaly location, anomaly aspect ratio, anomaly morphology, number or ratio of anomalies, anomaly density, anomaly distribution, moments of an anomaly distribution, performance metrics, e.g., precision rate, capture rate, nuisance rate. The one or more measurements can be selected from said group for a specific defect or a specific set of defects. If one or more interest-regions have been selected by the user, these measurements can be computed locally with respect to the one or more of these interest-regions yielding, e.g., a local anomaly distribution, an average size of a specific defect within a specific region, the variance of the area of a specific defect within a specific region or a precision rate, nuisance rate or capture rate for a specific region, e.g., within border or die regions.

Based on the one or more measurements at least one wafer manufacturing process parameter can be controlled. After computing said measurements, it would be possible to determine the defect density for multiple regions of the wafer based on the result of the workflow. Different ones of these regions can be associated with different process parameters of a manufacturing process of the semiconductor structures. This can be in accordance with a Process Window Qualification sample. Then, the appropriate at least one process parameter can be selected based on the defect densities, by concluding which regions show best behavior.

Based on the one or more measurements and at least one quality assessment rule the quality of the wafer could be assessed. For example, the currently inspected wafer could be marked as scrap if a specific defect has been detected in the corresponding imaging dataset, or if a specified number of defects has been detected within a specific region of the imaging dataset.
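The region-wise measurements described above can be illustrated with a minimal Python sketch. All names (`Anomaly`, `region_measurements`, the rectangular region representation) are illustrative assumptions for this example only and are not part of the disclosed method:

```python
# Hypothetical sketch: per-interest-region measurements from a classified
# anomaly list. Regions are simplified to axis-aligned rectangles.
from dataclasses import dataclass

@dataclass
class Anomaly:
    x: float      # location in the imaging dataset
    y: float
    label: str    # class from the current set of classes
    area: float   # e.g., in nm^2

def in_region(a, region):
    """region = (x0, y0, x1, y1): axis-aligned interest-region."""
    x0, y0, x1, y1 = region
    return x0 <= a.x < x1 and y0 <= a.y < y1

def region_measurements(anomalies, region, defect_classes):
    """Count anomalies in the region and derive a local nuisance rate
    and a mean defect area, as examples of local measurements."""
    inside = [a for a in anomalies if in_region(a, region)]
    defects = [a for a in inside if a.label in defect_classes]
    nuisance = [a for a in inside if a.label not in defect_classes]
    n = len(inside)
    return {
        "count": n,
        "defects": len(defects),
        "nuisance_rate": len(nuisance) / n if n else 0.0,
        "mean_defect_area": (sum(a.area for a in defects) / len(defects)
                             if defects else 0.0),
    }
```

For instance, a region containing one defect and one nuisance detection would yield a local nuisance rate of 0.5, which could then feed a quality assessment rule of the kind described above.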
Based on the disclosed workflow, cold-starting is possible within reasonable periods of time due to a reduced use of prior knowledge and a reduced annotation effort. As a result, cold-starting a workflow on a 50mFoV dataset typically requires about 24 hours in total, distributed among the steps of the workflow as follows:
(1) 4h image acquisition under optimal conditions,
(2) 3h to draw regulative and/or semantic masks,
(3) 4h to train the anomaly detection algorithm,
(4) 4h to annotate the anomalies,
(5) 4h to train the anomaly classification algorithm,
(6) 5h for review and qualification.
This is possible using advanced compute infrastructure (6xV100 GPUs), 100TB fast file storage, efficient resource management using, e.g., Kubernetes and a robust software design (e.g., dedicated data layer, caching meta-data for display etc.).

In the following, advantageous exemplary embodiments of the invention are described and schematically shown in the figures.

Fig.1 shows a schematic cell structure of a mSEM image of a wafer without defects;
Fig.2 shows a defective cell structure containing six different types of defects;
Fig.3 shows the cell structure of Fig.2 with marked and classified defects;
Fig.4 shows a flow chart of a first embodiment of the computer implemented method for the detection and classification of anomalies;
Fig.5 shows a flow chart of a second embodiment of the computer implemented method for the detection and classification of anomalies;
Fig.6 shows a flow chart of the data selection routine in Fig.5;
Fig.7 shows a flow chart of the anomaly detection routine in Fig.5;
Fig.8 shows a flow chart of the annotation step in Fig.5;
Fig.9 shows a flow chart of the classification step in Fig.5;
Fig.10 shows a flow chart of the review routine in Fig.5;
Fig.11 shows a cluster tree obtained by a hierarchical clustering method;
Fig.12 shows a flow chart of a modified implementation of the annotation step based on hierarchical clustering;
Fig.13 shows an improved precision-recall curve based on the disclosed invention;
Fig.14 schematically illustrates a system for controlling the quality of wafers in a semiconductor manufacturing fab; and
Fig.15 schematically illustrates a system for controlling the production of wafers in a semiconductor manufacturing fab.

Fig.1 shows a schematic cell structure 10 of a mSEM image of a wafer 250. In this schematic, the cells 12 are identical and regularly distributed over the entire image without showing any defects. In real data, however, the cell structure 10 can show defects, i.e., deviations of the semiconductor structure from an a priori defined norm, as well as nuisance, i.e., variations due to, for example, imaging artefacts, image acquisition noise, varying imaging conditions, variations of the semiconductor structures within the norm, imperfect lithography, varying manufacturing conditions, varying wafer treatment or rare semiconductor structures. Automatic defect detection methods suffer from the problem that they cannot discriminate between defects and nuisance. Thus, most of the detections of these methods correspond to nuisance and only very few to defects, leading to a low precision rate. Therefore, a method able to discriminate between defects and nuisance is required. In addition, cold-starting is a common requirement in the semiconductor industry, i.e., training a system from scratch without prior knowledge of the imaging dataset 66 or the classes to be encountered. Due to the large size of the imaging datasets 66 this is only feasible if the user effort is kept as low as possible.

Fig.2 shows a schematic defective cell structure 14 containing a plurality of anomalies 15. An anomaly 15 is a localized deviation of the imaging dataset 66 from an a priori defined norm, here the deviation from a normed semiconductor structure. Fig.3 shows the anomalies 15 of Fig.2 classified as one of six defect types: open 16, puncture 18, merge 20, half-open 22, dwarf 24 and skid 26.
The precise detection and classification of such defects without requiring extensive prior knowledge or a high annotation effort from a user is the objective of this invention.

Fig.4 shows a flowchart of a first embodiment of the computer implemented method 28 for the detection and classification of anomalies 15 in an imaging dataset 66 of a wafer 250 comprising a plurality of semiconductor structures. In a data selection routine 30 a machine learning anomaly classification algorithm is selected, the selection including a model architecture, hyperparameters, an optimization algorithm, an initialization of the model and pre-processing techniques for the training data. For example, a deep learning algorithm based on the VGG16 neural network architecture together with adequate loss functions can be selected. Training can be carried out from scratch, or a pre-trained model can be loaded as initialization. Then, one or multiple outer iterations 40 are executed. At least one of these outer iterations 40 comprises the following steps: in an anomaly detection routine 32 a current detection of a plurality of anomalies 15 in the imaging dataset 66 is determined. The current detection of the plurality of anomalies 15 can be obtained by means of user annotation or automatically by using an algorithm, e.g., a pattern matching algorithm or a machine learning algorithm. The machine learning algorithm can contain an autoencoder neural network, which is trained on sample data from the imaging dataset 66 itself or on sample data from a CAD wafer file. Anomalies can be detected based on the difference between a tile of the imaging dataset 66 and a reconstruction of this tile computed by the autoencoder network. The larger the difference, the more likely the tile contains an anomaly. Based on the current detection of the plurality of anomalies, multiple inner iterations 42 are executed.
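The reconstruction-difference scoring just described can be sketched as follows. Since no trained autoencoder is available in a text example, `reconstruct` is a stand-in (a local mean filter) that mimics a model reproducing regular structure but not localized deviations; only the scoring and thresholding logic is meant to be illustrative of the described approach:

```python
# Illustrative sketch of reconstruction-based anomaly scoring.
import numpy as np

def reconstruct(tile: np.ndarray) -> np.ndarray:
    # Placeholder for a trained autoencoder's reconstruction of the tile.
    # A 3x3 local mean smooths localized deviations away, so they show up
    # in the difference image.
    padded = np.pad(tile.astype(float), 1, mode="edge")
    out = np.zeros(tile.shape, dtype=float)
    for i in range(tile.shape[0]):
        for j in range(tile.shape[1]):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

def anomaly_score(tile: np.ndarray) -> float:
    """Mean squared difference between the tile and its reconstruction;
    the larger the score, the more likely the tile contains an anomaly."""
    return float(np.mean((tile.astype(float) - reconstruct(tile)) ** 2))

def detect(tiles, threshold):
    """Indices of tiles whose reconstruction error exceeds the threshold."""
    return [k for k, t in enumerate(tiles) if anomaly_score(t) > threshold]
```

A tile of perfectly regular (here: constant) structure reconstructs almost exactly, while a tile with a localized deviation yields a large residual and is flagged.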
At least one of the inner iterations comprises the following steps: in an anomaly classification routine 34 the selected anomaly classification algorithm is used to determine a current classification of the plurality of anomalies 15 in the imaging dataset 66. In an annotation routine 36, based on at least one decision criterion, at least one anomaly 15 of the current detection of the plurality of anomalies 15 is selected for presentation to a user. The decision criterion can comprise computing a similarity measure or a dissimilarity measure between different samples. The decision criterion can alternatively or additionally comprise a hierarchical clustering of the anomalies 15 of the current detection of anomalies 15 (or of the tiles containing these anomalies 15) based on a cluster tree 194. The user assigns a class label of a current set of classes to each of the at least one anomaly 15 selected by the decision criterion. In the first outer iteration 40, the current set of classes can be empty, thus coping with cold-start scenarios without prior knowledge about defect classes in the imaging dataset. The current set of classes can also contain one or more different labels of defects 16, 18, 20, 22, 24, 26. The set of classes can also contain one or more nuisance classes in order to discriminate nuisance from defects, e.g., “imperfect lithography”, “contrast variation”, etc. The set of classes can also contain an “unknown” class, so new or unknown structures or structures with an unclear class affiliation can be assigned to this class and do not interfere with the classification of other samples. The current set of classes can be extended by adding new labels in each inner iteration 42, e.g., by using an open set classifier. In a re-training routine 38, based on anomalies 15 annotated by the user in an inner iteration 42 of the current or any previous outer iteration 40, the anomaly classification algorithm can be re-trained.
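As a minimal illustration of how such a cluster tree can be built, the sketch below performs agglomerative clustering of anomaly feature vectors, recording the merge history from leaf clusters (single anomalies) up to the root. Average linkage is used here purely for brevity; the claims also name Ward's minimum variance method as a cluster distance measure, and all function names are illustrative:

```python
# Minimal agglomerative clustering sketch: leaves are single anomalies,
# each merge produces an internal cluster that partitions its children,
# matching the cluster tree structure described for the decision criterion.
def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(c1, c2, feats):
    """Mean pairwise distance between anomalies of two clusters."""
    return sum(dist(feats[i], feats[j])
               for i in c1 for j in c2) / (len(c1) * len(c2))

def cluster_tree(feats):
    """Return the merge history as (cluster_a, cluster_b, merged) tuples,
    where clusters are tuples of anomaly indices."""
    clusters = [(i,) for i in range(len(feats))]  # leaf clusters
    merges = []
    while len(clusters) > 1:
        # merge the pair of clusters with minimal linkage distance
        a, b = min(((p, q) for i, p in enumerate(clusters)
                    for q in clusters[i + 1:]),
                   key=lambda pair: average_linkage(pair[0], pair[1], feats))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, a + b))
    return merges
```

Two nearby anomalies merge first into an internal cluster; the resulting tree can then be traversed by the decision criterion, for example to present a whole cluster to the user for batch annotation.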
Since all samples from inner iterations 42 within any previous outer iteration 40 can be re-used for training, the user is able to interactively adapt single building blocks of the system, e.g., by changing the machine learning architecture or hyperparameters of the anomaly detection and/or anomaly classification algorithm, and can still use all of the previously annotated training data for training of the anomaly classification algorithm. In this way, training is very effective.

Fig.5 shows a flowchart of a second embodiment 28' of the computer implemented method comprising six stages: a data selection routine 46, where the user provides semantic and/or regulative masks for the imaging dataset 66 of the wafer 250; an anomaly detection routine 48, where an anomaly detection algorithm is trained and applied to the masked region; alternatively, a pre-trained model can be loaded, the model could possibly be re-trained and applied; an annotation step 50, where the detected anomalies are manually assigned to the current set of classes; a classification step 52, where an anomaly classification algorithm is trained using the annotated anomalies and applied to the detected anomalies within the masked region; alternatively, a pre-trained model can be loaded in a skipping step 60, the model could possibly be re-trained and applied; a review routine 54, where the user can review the classification results, modify class labels, correct misclassified anomalies 15 or decide to refine stages of the workflow during an additional outer iteration 40; a report step 56, where performance metrics summarizing the incidence of various defect classes are compiled in a report. Based on this workflow, interactive defect detection and nuisance rate management can be implemented, which allows for cold-starting.
In detail: the second embodiment of the computer implemented method 28' for the detection and classification of anomalies 15 in an imaging dataset 66 of a wafer 250 comprising a plurality of semiconductor structures comprises the following. One or multiple outer iterations 40 are executed containing the data selection routine 46 and the anomaly detection routine 48.

In the data selection routine 46, interest-regions 11 of the imaging dataset 66 are selected, e.g., by drawing masks on the imaging dataset 66. The interest-regions 11 can be used to train the anomaly detection and/or the anomaly classification algorithm. The interest-regions 11 can also be used to indicate regions for evaluating the performance of the workflow. In this case, semantic masks can be of interest, i.e., masks containing a specific section of the wafer 250 such as border or die regions, to obtain region-specific measurements. The interest-regions 11 can be expanded or modified during further outer iterations 40 or further intermediate iterations 44 of the workflow. This enables the user to iteratively train the workflow on the entire dataset with minimal effort.

In the anomaly detection routine 48, an anomaly detection algorithm can be selected and trained based on the selected data. If the user is not satisfied with the detection results of the anomaly detection algorithm, the data selection routine 46 can be repeated in a further intermediate iteration 44. Based on modified interest-regions 11 and a re-training of the anomaly detection algorithm, the quality of the detection results can be improved. Based on the trained anomaly detection algorithm, a current detection of the plurality of anomalies 15 is determined within the one or more interest-regions 11.

Multiple inner iterations 42 are executed containing the annotation step 50, the anomaly classification routine 52 and, possibly, the review routine 54.
In the annotation step 50 the user annotates the plurality of anomalies 15 by assigning a class label to each of them or to a subset thereof. To reduce annotation effort, active learning can be applied by selecting specific samples from the plurality of anomalies 15 for presentation to the user, e.g., samples that are very similar and probably belong to the same class, or samples that are most dissimilar compared to the samples selected in a previous inner iteration 42. The user annotations can be skipped in a skipping step 60, for example by selecting a pre-trained anomaly classification algorithm and continuing with the anomaly classification routine 52.

In the anomaly classification routine 52 the anomaly classification algorithm can be trained based on the previously annotated anomaly samples. Here, samples from the current inner iteration 42 or from previous inner iterations 62 which were part of a previous outer iteration 40 can be used together. In this way, training can be carried out most effectively and with minimum user effort. Based on the trained anomaly classification algorithm, a current classification of the detected plurality of anomalies is determined, meaning that each anomaly of the plurality of anomalies is associated with one of the classes of the current set of classes.

In the review routine 54 the user can review the current classification computed in the anomaly classification routine 52. He can visualize and navigate through the current classification of the plurality of anomalies 15, determine measurements based on the current classification of the plurality of anomalies 15, e.g., by measuring sizes of one or more anomalies or by computing an anomaly density for a specific region of the imaging dataset 66 or for a specific class, e.g., a specific defect, or he can check performance metrics, modify class labels or correct misclassified anomalies.
Furthermore, the quality of the wafer 250 can be assessed based on measurements and at least one quality assessment rule. For example, the wafer 250 can be labeled as defective if a certain number of anomalies 15 classified as a certain defect is exceeded. If the user is satisfied with the results, he can move on to the report step 56, where information on the imaging dataset 66, interest-regions 11, the set of classes, defects, statistics and metrics can be exported for future reference, for example by saving the information to a file. Otherwise, if the user is not satisfied with the results, he can go back to the data selection routine 46 and repeat the whole cycle during one or more intermediate iterations 44.

By integrating data selection, anomaly detection and anomaly classification into a single workflow that allows the user to repeat and modify previous stages within an intermediate iteration 44, classification results of high quality can be obtained within a short period of time. The reason for this lies in the flexibility of the workflow: the user can directly visualize and thus react to the current classification results, not only by modifying the classification algorithm or its training data within the inner iterations 42, but also by modifying earlier steps such as the anomaly detection algorithm or the selection of interest-regions 11 within the outer iterations 40.

Fig. 6 is a flowchart illustrating an example implementation of the data selection routine 46 based on a given imaging dataset 66. In a decision step 68 the user selects whether the workflow has already been trained (positive answer 70) or whether cold-starting is required (negative answer 72). If the workflow has already been trained, the user might be interested in evaluating defect rates in different interest-regions 11 of the wafer 250, for example in die regions or border regions.
Therefore, the user can indicate semantic masks containing such specific regions in a semantic annotation step 74. Based on a selection criterion, the method can automatically suggest further interest-regions 11, e.g., based on their similarity to the user-indicated interest-regions 11. For example, the user could mark die regions and the workflow could automatically indicate further die regions to the user via the user interface 236, which the user could add to their data selection. To expedite the selection process, cut-copy-paste commands are available for mask selection. Further steps of the workflow, e.g., the anomaly detection routine 48, can then be carried out based on these semantic interest-regions 11.

Otherwise, if cold-starting is required (negative answer 72), the anomaly detection algorithm and the anomaly classification algorithm have to be learned from scratch. But their training can take prohibitively long for large datasets. Therefore, the user selects a representative subset of the imaging dataset 66 as interest-region 11. Said algorithms are then trained on the one or more interest-regions 11 with a human evaluator in the loop in the subsequent steps within reasonable turnaround times. With increasing confidence in said algorithms, the interest-regions 11 can be expanded to iteratively cover the entire dataset.

This process is implemented in the following way: in a regulatory annotation step 76 the user can indicate one or more interest-regions 11 in the imaging dataset 66, which are used for the training and/or application of the anomaly detection algorithm in the anomaly detection routine 48. These regions can be expanded or modified during further outer iterations 40 or further intermediate iterations 44 of the algorithm to include more regions of the imaging dataset 66 containing other defects or nuisances.
To make cold-starting possible, the user can start with a small interest-region 11, train the anomaly detection and the anomaly classification algorithm based on samples from this region, and later on expand the interest-region 11 or add further interest-regions 11 and retrain both algorithms. The selected interest-regions 11 are the input of the subsequent anomaly detection routine 48.

Fig. 7 is a flowchart illustrating an example implementation of the anomaly detection routine 48. The objective of this step is to highlight regions of the imaging dataset 66 that are outliers with respect to the expected patterns in the dataset. During training, the anomaly detection algorithm, preferably an autoencoder, is presented with imaging data 66 without (or with very few) defects. The parameters of the anomaly detection algorithm are tuned to reconstruct the imaging data 66 subject to an information bottleneck. Optionally, a search for the best model architecture can be manually or automatically performed. As a result, noise and defect-free images are perfectly reconstructed. On the other hand, image regions with defects are poorly reconstructed. Therefore, thresholding the difference between the input and the reconstructed input provides proposals for defects or anomalies 15. The workflow enables users to visualize the input and reconstruction images, adjust thresholds and analyze anomalies, e.g., location, size, morphology, etc. Should the model performance be unsatisfactory, the user can modify model parameters and/or input data to launch another iteration of training of the anomaly detection algorithm.

During evaluation of the workflow, the user can select a pre-trained model, which is applied to the imaging dataset 66 or to the one or more interest-regions 11, respectively. The resulting anomalies can be visualized by the user, and their properties can be analyzed.
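The reconstruction-and-threshold principle described above can be illustrated with a minimal sketch. A local-mean filter stands in for the trained autoencoder bottleneck (an assumption made purely for illustration): regular patterns survive the smoothing, while isolated outliers do not, so a large reconstruction error marks a candidate anomaly.

```python
def mean_reconstruct(signal, k=1):
    """Toy 'bottleneck': local-mean smoothing, which reproduces regular
    patterns well and outliers poorly (stand-in for an autoencoder)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - k):min(n, i + k + 1)]
        out.append(sum(window) / len(window))
    return out


def detect_anomalies(signal, threshold, reconstruct):
    """Reconstruction-error anomaly detection (sketch): flag every position
    where input and reconstruction differ by more than the threshold."""
    recon = reconstruct(signal)
    return [i for i, (x, r) in enumerate(zip(signal, recon))
            if abs(x - r) > threshold]
```

Raising or lowering `threshold` corresponds to the threshold adjustment offered to the user in the workflow; it trades capture rate against nuisance rate.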
The objective of the anomaly detection routine 48 in the workflow is to obtain a high capture rate, e.g., close to 100 %, meaning that almost all defects contained in the imaging dataset 66 are identified. This, however, will result in a very high nuisance rate, e.g., 99.99 %, meaning that only 1 of 10,000 detected anomalies actually relates to a defect. For this reason, the classification step 52 is added to the workflow.

The anomaly detection routine 48 can be implemented in the following way: in a first decision step 78 the user indicates whether he wants to use a pre-trained model (positive answer 80) or whether cold-starting is required (negative answer 88). In case a pre-trained model is used, the user selects the model in a model selection step 82. The term model means a machine learning algorithm including a model architecture, hyperparameters, an optimization algorithm, an initialization of the model parameters and/or data pre-processing methods. Instead of a machine learning algorithm, other anomaly detection algorithms such as, but not limited to, pattern matching algorithms can be used for anomaly detection. It is also possible to query the user to annotate anomalies in the dataset by hand. The model is applied to detect anomalies in the selected one or more interest-regions 11 in a model application step 84, yielding a current anomaly detection in a current detection step 86, e.g., by applying thresholds to probabilistic detections.

In case cold-starting is required (negative answer 88), the user selects an anomaly detection algorithm and parameters. In case a machine learning algorithm is selected, the user initializes the current model in a modification step 90 by selecting a model architecture, hyperparameters, an optimization algorithm and/or an initialization of the model parameters, e.g., the weights in case a neural network is selected. Alternatively, a pre-trained model can be selected and re-trained.
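The capture rate and nuisance rate used as objectives above can be computed as follows. This is a minimal sketch: representing detections and ground-truth defects as hashable identifiers is an assumption made for illustration.

```python
def capture_rate(detected, true_defects):
    """Fraction of true defects that appear among the detections."""
    hits = len(set(detected) & set(true_defects))
    return hits / len(true_defects) if true_defects else 1.0


def nuisance_rate(detected, true_defects):
    """Fraction of detections that do not correspond to a real defect."""
    if not detected:
        return 0.0
    false_pos = len(set(detected) - set(true_defects))
    return false_pos / len(detected)
```

With 10,000 detections of which a single one is a real defect, the capture rate is 100 % while the nuisance rate is 99.99 %, matching the numbers in the text.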
For anomaly detection, an autoencoder model is preferable. If training is required, the anomaly detection model is trained on sample data. In an analysis step 92 the user applies the anomaly detection algorithm to the selected one or more interest-regions 11 and analyzes the detection results. In a decision step 94 the user decides whether the quality of the results is satisfactory (positive answer 104) or not (negative answer 96). If the user is not satisfied, he decides in another decision step 98 whether he wants to modify the one or more interest-regions 11 (positive answer 100) by going back to the data selection routine 46. Otherwise (negative answer 102), the user can modify the anomaly detection algorithm by selecting a different algorithm, model or parameters and possibly re-training the model in steps 90, 92.

Once the user is satisfied with the anomaly detection results (positive answer 104), he can set thresholds in a threshold selection step 106. These thresholds can be applied to probabilistic outputs representing the uncertainty of the anomaly detection algorithm. Based on these thresholds, a binary decision can be taken for each pixel as to whether it belongs to an anomaly or not. In a saving step 108 the anomaly detection algorithm, including the selected model and parameters, is stored and can be reloaded as a pre-trained model in the model selection step 82 during further iterations of the workflow. Based on the anomaly detection algorithm and the selected thresholds, a current detection of anomalies is determined in the current detection step 86. The current detection of anomalies is the input of the annotation step 50.

Fig. 8 is a flowchart illustrating an example implementation of the annotation step 50. The anomalies detected by the anomaly detection algorithm contain outliers and can be over-shadowed by nuisance.
Examples of nuisance include image acquisition noise, imperfect lithography, varying manufacturing conditions, miscellaneous wafer treatment, secondary uninteresting defects, etc. The annotation step enables the user to discriminate anomalies from nuisance by assigning the anomalies to the current set of classes comprising defects (e.g., missing structure, broken structure, etc.) and nuisance.

As labeling individual samples requires large user effort and often results in poor labeling quality, the workflow provides for a group-wise annotation strategy. Here, anomalies 15 are pre-clustered into groups based on their similarity. In each inner iteration 42, the user is presented with an unlabeled anomaly-group, all of which might be binned into a single class by virtue of the pre-clustering. As a result, the user not only annotates multiple anomalies with a single click, but also gains an overview of intra-class variations, resulting in better annotation quality. The annotation process can be terminated when, e.g., (1) all anomalies are annotated, or (2) a certain termination criterion is reached, e.g., a maximum number of clicks, a total time for annotation, etc. In addition, human effort is optimized by enabling the user to allocate distinct class labels to mutually exclusive subsets within a single anomaly group. Further, querying the next anomaly group can be optimized for "novelty", in that each new anomaly-group should be visually different from the ones annotated before. It is to be noted that the novelty is evaluated on the group level, thereby making it robust to noise and outliers in practical scenarios.

It is assumed that all user-defined classes have a minimum number of samples, e.g., 10, so that sufficient data is available for training a robust anomaly classification algorithm.
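The group-level novelty criterion can be sketched as follows, assuming each cluster is summarized by a mean feature and that a distance function between such summaries is supplied by the caller; both are illustrative assumptions, not part of the disclosed method.

```python
def next_cluster_by_novelty(cluster_means, annotated, distance):
    """Group-novelty querying (sketch): among the clusters not yet annotated,
    pick the one farthest from the most recently annotated cluster, so each
    presented group is visually different from the previous one."""
    candidates = [i for i in range(len(cluster_means)) if i not in annotated]
    if not annotated:
        return candidates[0]                  # cold start: take any cluster
    last = cluster_means[annotated[-1]]
    return max(candidates, key=lambda i: distance(cluster_means[i], last))
```

Working on cluster summaries rather than individual samples is what makes the criterion robust to per-sample noise and outliers, as noted above.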
The annotation step can be implemented in the following way: input to the annotation step is a current detection of anomalies in the one or more interest-regions 11 obtained from the anomaly detection routine 48. In a first decision step 110 the user can decide whether he wants to train or re-train the anomaly classification algorithm (positive answer 114) or whether he wants to use a pre-trained model (negative answer 112). In the latter case, the workflow directly continues with the anomaly classification routine 52. If the anomaly classification algorithm needs to be trained or re-trained based on further samples (positive answer 114), active learning can be applied to reduce the annotation effort for the user and speed up the training.

For active learning, the plurality of anomalies of the current detection of anomalies is pre-clustered in a clustering step 116. Clustering the anomalies into groups reduces the annotation effort for the user, since groups of anomalies which are likely to be associated with the same class can be annotated simultaneously with a single or very few user interactions. To cluster the plurality of anomalies, each anomaly is extracted from the imaging dataset 66, usually together with a surrounding context of the anomaly. For clustering, the raw image data can be used as feature vector, or feature vectors can be computed for the plurality of anomalies. Such a feature vector can, for example, comprise the activation of the penultimate layer of a pre-trained neural network, e.g., the VGG16 network pre-trained on the ImageNet database, when presented with the anomaly as input. Clustering can be based on a similarity measure between the feature vectors of different anomalies, e.g., the cosine similarity measure. The more similar the feature vectors are, the more likely they belong to the same cluster.
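Grouping feature vectors by cosine similarity can be sketched as follows. The greedy single-pass grouping against a fixed threshold is a simplification chosen for illustration; it is not the clustering method of the disclosure, which can equally use hierarchical clustering.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_cluster(features, threshold):
    """Assign each feature vector to the first cluster whose representative
    is similar enough, otherwise open a new cluster (simple stand-in for
    the clustering step)."""
    clusters = []                # list of (representative, member indices)
    for i, f in enumerate(features):
        for rep, members in clusters:
            if cosine_similarity(rep, f) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((f, [i]))
    return [members for _, members in clusters]
```

In practice the feature vectors would be, e.g., penultimate-layer activations of a pre-trained network as described above; here plain numeric tuples suffice to show the mechanics.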
All the samples of a cluster can then be presented to the user simultaneously in a querying step 118, and the user can, in the optimal case, assign all of the samples to the same class with a single user interaction.

To speed up training, it can be advantageous to explore the variation of the anomalies as quickly as possible. To this end, the concept of group novelty can be applied in the querying step 118, meaning that the cluster which is most dissimilar from the previously presented cluster is selected for presentation to the user for annotation.

Since the clusters can contain samples from different classes, which cannot be annotated with a single user action, the user can assign different labels to different samples in the same cluster. To facilitate this process, hierarchical clustering is helpful. Based on hierarchical clustering, a cluster tree is built, which is further explained with respect to Fig. 11. Starting from a cluster selected from the cluster tree due to a decision criterion, the user can move up or down the cluster tree to modify the resolution of the clusters until he finds a cluster whose samples all belong to the same class. This process is further explained with respect to Fig. 12.

After selecting a cluster for presentation to the user based on the decision criterion in the querying step 118, the user decides in a decision step 120 whether he wants to terminate the labeling. In case of a positive answer 122, the workflow proceeds with the anomaly classification routine 52. In case the user wants to continue labeling (negative answer 124), in a visualization step 126 the samples belonging to the selected cluster are visualized via the user interface 236. In a decision step 128 the user decides whether a new class label is required for labeling the current cluster. If this is the case (positive answer 130), in a class update step 134 the current set of classes and the user interface 236 are updated to contain the new class label.
Otherwise, if no new class label is required for labeling (negative answer 132), the current set of classes does not change. In an allocation step 136 the user can assign one or more samples to one of the classes of the current set of classes. In a decision step 138 it is determined whether all samples of the selected cluster are labeled (positive answer 140) or not yet (negative answer 142). In the latter case, the labeling continues with the decision step 128, offering the user an option to add a new label. If all samples of the current cluster are labeled, the labeled dataset is saved in a saving step 144. Then the next cluster is selected in the querying step 118.

Fig. 9 is a flowchart illustrating an example implementation of the anomaly classification routine 52. The anomaly classification algorithm aims at segregating the anomalies into user-defined classes in order to manage nuisance. During training, the algorithm learns to match anomaly-crops to the current set of classes. The user can customize the model, e.g., include robustness against contrast variations, account for data imbalance, modify the model architecture, etc. Optionally, a search for the best model architecture for the given use-case can be manually or automatically performed. During evaluation of the workflow, all anomalies of the current detection of anomalies are input to the model to automatically generate inferred labels. The objective of the classification step 52 is to maintain the capture rate at a high level, e.g., close to 100 %, whereas the nuisance rate should be significantly reduced, e.g., to below 10 %.

The classification step 52 can be implemented in the following way: the input data to this step is a plurality of detected anomalies. If the labeling has not been skipped in the skipping step 60, the anomalies are also labeled for further training.
In a first decision step 146 the user decides whether he wants to use a pre-trained anomaly classification model (positive answer 148). In this case the user selects a pre-trained model for anomaly classification in a model selection step 152. Then the model is applied to the plurality of anomalies detected by the anomaly detection algorithm in the model application step 154, yielding a current classification of the plurality of anomalies.

If instead the user wants to train or re-train the anomaly classification model based on new sample data (negative answer 150), the user selects a pre-trained anomaly classification model or initializes a new model. In a pre-processing step 156 pre-processing can be applied to the annotated sample data, e.g., data augmentation, image enhancement or contrast removal. In a hyperparameter selection step 158 the user selects hyperparameters of the model for training. In a splitting step 160 the training data is split into a set of training data and a set of validation data. The training data is used for training the model in a training step 162, while the validation data is used to monitor the model's performance on unseen data samples in a validation step 164 in order to avoid over-adaptation to the training data. Finally, in an analysis step 166, performance metrics are computed.

Based on the classification of the detected anomalies, low nuisance rates can be achieved. The reason is that anomalies not containing relevant defects can be assigned to one or more nuisance classes and, thus, do not interfere with the detection of true defects.

Fig. 10 is a flowchart illustrating an example implementation of the review routine 54. In this step the user is able to visualize the classification results, which are overlaid on the dataset view.
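The splitting of labeled samples into training and validation sets described for the splitting step 160 can be sketched as a deterministic shuffled split; the 80/20 ratio, the fixed seed and the function name are illustrative assumptions.

```python
import random

def split_train_validation(samples, val_fraction=0.2, seed=0):
    """Deterministic train/validation split (sketch): shuffle a copy of the
    labeled samples, hold out a fraction for validation, train on the rest."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    shuffled = samples[:]              # do not mutate the caller's list
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Monitoring metrics on the held-out part after each training epoch is what reveals over-adaptation to the training data, as described above.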
The user can choose which classes to consider, navigate through scan fields of view (sFoVs), inspect images by zooming in and out, inspect details of the defects, e.g., the defect location in the global coordinate frame, the defect size, etc., and obtain overall defect statistics and, if available, classification performance metrics, e.g., capture rate and nuisance rate. If the user decides to retrain the classifier due to unsatisfactory classifier performance, because of mislabeling or due to false detections during the anomaly detection routine 48, he is directed to a refinement stage for re-training the classifier. In the refinement step, the user can select the size and composition of the dataset to be refined.

An objective of the review process is to increase the user's trust and confidence in the workflow within two or three iterations, after which the review process can be made optional. Samples annotated by the user in a previous iteration of the workflow will be retained as part of every following training step. Even though the user is not presented with these samples again, they are included in the training. If a user adds an additional class to the current set of classes, the user is given the opportunity to review and modify previous annotations again.

The review routine 54 can be implemented in the following way: first, a current classification of the plurality of anomalies based on the current set of classes is determined in a current classification step 168. In a muting step 172 the user can select classes to disregard, i.e., classes which are excluded from the review. This might be the case if the user is confident of some classes and wants to concentrate on the classification results of more difficult classes. The user can then visualize different types of information for assessing the quality of the trained workflow. In a defect visualization step 174 one or more defect instances can be visualized in the dataset.
To this end, the classification results are overlaid on the dataset for analysis. The user can choose which classes to consider, navigate through the scan field of view (sFoV) or inspect images by zooming in or out. In a metrology step 176 measurements of the defects can be computed, e.g., defect location or defect size. In addition, overall statistics can be computed, e.g., the number of defects per class or the average defect size. Spatial statistics can be computed based on selected interest-regions 11, e.g., the defect density within one or more interest-regions 11. In addition, performance metrics can be computed such as precision, nuisance rate and capture rate. In a semantic result step 178 classification results can be evaluated according to steps 174, 176 with respect to semantic masks indicated in the semantic annotation step 74, for example with respect to die regions or border regions only.

Based on the review, the user can judge the quality of the detection and classification model and decide on further steps for improving the workflow. In a first decision step 180 the user decides whether he is satisfied with the quality of the results. If this is the case (positive answer 182), the workflow continues with the report step 56. Otherwise (negative answer 184), the user decides in a subsequent decision step 186 whether the detected anomalies make sense. If this is not the case (negative answer 188), the workflow is repeated by carrying out a further outer iteration 40 starting from the data selection routine 46, so that the anomaly detection model can be improved based on further or different data samples. If the detected anomalies make sense (positive answer 190), the anomaly classification algorithm can be improved. To this end, the user selects another or an additional interest-region 11 for refinement of the classification algorithm in a refinement step 192 and goes back to the annotation step 50, carrying out a further inner iteration 42.
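A spatial statistic of the kind computed in the metrology step 176 can be sketched as follows; representing the interest-region mask as a point-membership callback and passing the region area explicitly are assumptions made for illustration.

```python
def defect_density(defect_positions, in_region, region_area):
    """Defect density (sketch): number of classified defects falling inside
    an interest-region, divided by the region's area. `in_region` stands in
    for the user-drawn mask of the interest-region."""
    inside = [p for p in defect_positions if in_region(p)]
    return len(inside) / region_area
```

Evaluating the same statistic per class, or per semantic mask such as die or border regions, gives the region-specific measurements mentioned above.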
In a subsequent report step 56 the user can save relevant information about the training and/or the model to a file for future reference, e.g., defect-level and dataset-level information, metrology details and statistics. The user can configure the level of detail to be preserved in the report, e.g., crops of defects stored in the report, high-level intensity histograms, etc. If available, metrics such as capture rate, nuisance rate and defect source analysis can be included in the report. The objective of the report step 56 is to capture high-level information on the datasets used to train the model and the underlying defect catalogue. Further, it should be easy for the user to investigate the reasons why a workflow exhibits reduced performance due to shifts in manufacturing or imaging conditions.

Fig. 11 illustrates a preferred implementation of the clustering step 116 in Fig. 8 based on hierarchical clustering. It shows a cluster tree 194 obtained by agglomerative or divisive hierarchical clustering of a set of samples belonging to six different classes: wavy line, star, triangle, square, rectangle, circle. The tree consists of a root cluster 196 at the top, leaf clusters 198, 200, 202 at the bottom and internal clusters 204, 205, 210 in between. The root cluster 196 contains the whole sample set, whereas the leaf clusters 198, 200, 202 each contain only a single sample of the sample set.

An agglomerative hierarchical clustering can for example be computed by means of the hierarchical agglomerative clustering (HAC) algorithm. This method initially assigns each sample to a leaf cluster 198, 200, 202. Based on a similarity measure, the similarity between the samples of each two different clusters is computed. For the two clusters with the highest similarity measure, a new parent cluster is added to the tree containing the samples from both clusters. For example, the internal clusters 206, 208 both contain similar rectangular structures, i.e., square and rectangle.
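The bottom-up merging of the HAC scheme just described can be sketched as follows. Highest similarity is modeled here as smallest distance, and single-linkage between clusters is an illustrative assumption; each tree node is a tuple of (members, left child, right child), with childless leaves.

```python
def agglomerative_tree(points, distance):
    """Build a binary cluster tree bottom-up (HAC sketch): start with one
    leaf per sample and repeatedly merge the two closest clusters under a
    new parent until only the root remains."""
    nodes = [((p,), None, None) for p in points]

    def cluster_dist(a, b):
        # single-linkage: distance between the closest pair of members
        return min(distance(x, y) for x in a[0] for y in b[0])

    while len(nodes) > 1:
        # find the pair of current clusters with the highest similarity
        # (i.e., the smallest distance)
        i, j = min(((i, j) for i in range(len(nodes))
                    for j in range(i + 1, len(nodes))),
                   key=lambda ij: cluster_dist(nodes[ij[0]], nodes[ij[1]]))
        a, b = nodes[j], nodes[i]
        nodes.pop(j)                       # remove higher index first
        nodes.pop(i)
        nodes.append((a[0] + b[0], a, b))  # new parent cluster
    return nodes[0]                        # the root cluster
```

For one-dimensional samples `[0, 1, 10]` with absolute difference as distance, the two nearby samples are merged first, and the root then joins them with the outlier, just as similar squares and rectangles end up under a common parent cluster in Fig. 11.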
Therefore, their similarity is high. A new parent cluster 210 is created containing the samples from both child clusters 206, 208. This process is repeated until one cluster contains all samples, which is the root cluster 196.

A divisive hierarchical clustering can be computed by means of the divisive analysis clustering (DIANA) algorithm (see above). This method initially assigns all samples to the root cluster 196. For each cluster, two child clusters are added to the tree, and the samples contained in the cluster are distributed between these child clusters based on a function that measures dissimilarities between the samples contained in the cluster. This process is continued until every sample belongs to a separate leaf cluster. The DIANA algorithm determines the sample with the maximum average dissimilarity, adds this sample to one of the child clusters and then moves to this child cluster all samples that are more similar to this child cluster than to the remainder. For example, the cluster 210 is split into two clusters by adding two child clusters 206, 208. The object with the maximum average dissimilarity is one of the rectangles. This is moved to one of the new child clusters, i.e., child cluster 208. Then all objects more similar to this new cluster are moved to this child cluster 208, i.e., the second rectangle is added to the child cluster 208. The remaining samples, that is the squares, are moved to the second new cluster, i.e., the child cluster 206.

Fig. 12 shows a preferred implementation of the annotation step 50' based on a cluster tree 194. A hierarchical cluster tree based annotation facilitates the annotation of the plurality of anomalies for the user by reducing the number of required user interactions. Fig. 12 differs in three aspects from the annotation step 50 in Fig. 8. First, the clustering step 116 is modified to a hierarchical clustering step 116'.
Second, the querying step 118 is modified to a hierarchical querying step 118'. Third, the allocation step 136 is modified to a hierarchical allocation step 136'.

In the hierarchical clustering step 116' a hierarchical clustering method is used to build a cluster tree 194 from the sample data containing the plurality of detected anomalies 15. In the hierarchical querying step 118', a cluster of the cluster tree is selected for presentation to the user based on a selection criterion, for example, the cluster with the highest dissimilarity measure compared to the cluster annotated in the previous iteration.

The hierarchical allocation step 136' allows the user to move through the cluster tree 194 in order to select a desired cluster resolution. If the cluster resolution is too low, samples from possibly many different classes are part of the current cluster. If the cluster resolution is too high, the cluster contains only samples from one class but is very small. In this case, parent clusters higher up in the cluster tree 194 may contain more samples of the same class and thus would be preferred for labeling by the user.

The hierarchical allocation step 136' comprises the following steps: in a decision step 212 the user decides whether he is satisfied with the resolution of the current cluster. In this case (positive answer 216), he proceeds with annotating one or more of the samples in the current cluster in the hierarchical annotation step 224 and continues as described above for Fig. 8. Otherwise (negative answer 214), the samples of a larger section of the cluster tree 194 containing the current cluster, e.g., the current cluster, its child clusters and its parent cluster, are displayed by the user interface 236 in a cluster display step 218. The user can inspect the clusters and select one of them in a cluster selection step 220, thereby adjusting the cluster resolution of the current cluster. The cluster resolution is higher if a child cluster is selected.
The cluster resolution is lower if the parent cluster is selected. The process can be repeated in one or more iterations 222 until a satisfying cluster resolution is achieved. Then the current cluster is annotated in the hierarchical annotation step 224.

For example, let the cluster 210 be the cluster selected in the hierarchical querying step 118’. Then the child clusters 206, 208 and the parent cluster 211 are displayed to the user. The child clusters 206, 208 have a higher resolution, each containing only samples from a single class, whereas the parent cluster 211 contains samples from three different classes and, thus, has a lower resolution. For the user it might be beneficial to move to one of the child clusters 206, 208 and annotate this cluster by means of a single user interaction. However, let the cluster 207 be the selected cluster in the hierarchical querying step 118’. Then the child clusters 201, 203 and the parent cluster 206 are displayed to the user. The child clusters 201, 203 have a higher resolution, each containing only one sample, whereas the parent cluster 206 has a lower resolution, containing four different samples of the same class. For the user it might be beneficial to move to the parent cluster 206 and annotate this cluster, thereby assigning a label to all four samples instead of only two of them by means of a single user interaction. The process can be repeated in one or more iterations 222, thereby moving through the clusters of the cluster tree 194, until a satisfying cluster resolution is achieved. Then the current cluster is annotated in the hierarchical annotation step 224. During the annotation of the clusters, new classes can be added to the current set of classes in the decision step 128 and the class update step 134.

Fig.13 illustrates an effect of the application of the methods described above.
It shows a conventional precision-recall curve 230 and an improved precision-recall curve 232 for defect detection based on the disclosed techniques. The precision axis 226 is the vertical axis and indicates various precision rates. The recall axis 228 is the horizontal axis and indicates various recall rates (i.e., capture rates). Based on conventional anomaly detection methods, the number of detected anomalies is very high, but of these only few are associated with real defects of the wafer 250. Therefore, the number of false positive detections, i.e., nuisance, is high, leading to a rather low precision rate of the conventional precision-recall curve 230. By combining anomaly detection and classification, real defects can be discriminated from nuisance, thereby strongly reducing the number of false positive detections. Thus, the precision rate and the recall rate of the improved precision-recall curve 232 are generally higher.

Fig.14 schematically illustrates a system 234, which can be used for controlling the quality of wafers 250 produced in a semiconductor manufacturing fab. The system 234 includes an imaging device 246 and a processing device 244. The imaging device 246 is coupled to the processing device 244. The imaging device 246 is configured to acquire imaging datasets 66 of the wafer 250. The wafer 250 can include semiconductor structures, e.g., transistors such as field effect transistors, memory cells, et cetera. An example implementation of the imaging device 246 would be a SEM or mSEM, a helium ion microscope (HIM), a cross-beam device including FIB and SEM, or any other charged particle imaging device. The imaging device 246 can provide an imaging dataset 66 to the processing device 244. The processing device 244 includes a processor 238, e.g., implemented as a CPU or GPU. The processor 238 can receive the imaging dataset 66 via an interface 242. The processor 238 can load program code from a memory 240.
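The precision and recall rates plotted in Fig.13 follow the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). Purely as an illustrative sketch with hypothetical detection counts (not values from this disclosure), the following shows why suppressing nuisance raises precision while recall is preserved:

```python
# Hypothetical counts: TP = real defects detected, FP = nuisance detections,
# FN = real defects missed. Illustration only, not measured data.
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

# Conventional anomaly detection: very many detections, few are real defects.
p_conv, r_conv = precision_recall(tp=90, fp=900, fn=10)   # precision ~ 0.09

# Anomaly detection combined with classification: most nuisance filtered out.
p_impr, r_impr = precision_recall(tp=90, fp=30, fn=10)    # precision = 0.75
```

With the same set of true defects found in both cases, recall stays at 0.9 while precision rises sharply, mirroring the shift from curve 230 to curve 232.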
The processor 238 can execute the program code. Upon executing the program code, the processor 238 performs techniques such as described herein, e.g.: executing an anomaly detection to detect one or more anomalies; training the anomaly detection; executing a classification algorithm to classify the anomalies into a set of classes, e.g., including defect classes, a nuisance class, and/or an unknown class; retraining the ML classification algorithm, e.g., based on an annotation obtained from a user upon presenting at least one anomaly to the user, e.g., via the respective user interface 236; computing a cluster tree 194 based on a hierarchical clustering method; and assessing the quality of the wafer 250. For example, the processor 238 can perform the computer implemented methods 28 or 28’ shown in Fig.4 or Fig.5, respectively, upon loading program code from the memory 240.

Fig.15 schematically illustrates a system 234’, which can be used for controlling the production of wafers 250 in a semiconductor manufacturing fab. The system comprises the same components as indicated in Fig.14, and what has been said above also applies for the respective components here. In addition, the system 234’ has means 248 for producing wafers 250 controlled by at least one wafer manufacturing process parameter. To this end, an imaging dataset 66 is provided to the processing device 244 by means of the imaging device 246. The processor 238 of the processing device 244 is configured to perform one of the disclosed methods comprising controlling the at least one wafer manufacturing process parameter based on one or more measurements of the current classification of anomalies in the imaging dataset of the wafer 250.
For example, detected bridge defects indicate insufficient etching, so the amount of etching is increased; detected line breaks indicate excessive etching, so the amount of etching is decreased; consistently occurring defects indicate a defective mask, so the mask must be checked; and detected missing structures hint at non-ideal material deposition, so the material deposition is modified.

Embodiments, examples and aspects of the invention can be described by the following clauses:

1. A computer implemented method (28, 28') for the detection and classification of anomalies (15) in an imaging dataset (66) of a wafer comprising a plurality of semiconductor structures, the method comprising:
- Selecting a machine learning anomaly classification algorithm;
- Executing at least one outer iteration (40) comprising the following steps:
i. Determining a current detection of a plurality of anomalies (15) in the imaging dataset (66);
ii. Executing multiple inner iterations (42), at least some of them comprising the following steps:
a. Using the anomaly classification algorithm to determine a current classification of the plurality of anomalies (15) in the imaging dataset (66);
b. Based on at least one decision criterion selecting at least one anomaly (15) of the current detection of the plurality of anomalies (15) for presentation to a user via a user interface (236), the user interface (236) being configured to let the user assign a class label of a current set of classes to each of the at least one anomaly (15);
c. Re-training the anomaly classification algorithm based on anomalies (15) annotated by the user in an inner iteration (42) of the current or any previous outer iteration (40).

2. The method of clause 1, wherein multiple outer iterations (40) are executed, at least some of them comprising steps i. and ii.

3. The method of clause 1 or 2, wherein determining a current detection of a plurality of anomalies (15) in the imaging dataset (66) in step i.
comprises:
- selecting a machine learning anomaly detection algorithm;
- training the anomaly detection algorithm;
- determining a current detection of a plurality of anomalies (15) in the imaging dataset (66).

4. The method of clause 3, wherein the training of the anomaly detection algorithm comprises at least one intermediate iteration (44) comprising the following steps:
- selecting training data for the anomaly detection algorithm, the training data containing at least one subset of the imaging dataset (66) of the wafer and/or of an imaging dataset (66) of at least one other wafer and/or of an imaging dataset (66) of a wafer model;
- re-training the anomaly detection algorithm based on training data selected in an intermediate iteration (44) of the current or any previous outer iteration (40).

5. The method of clause 4, wherein the user interface (236) is configured to let the user define one or more interest-regions (11) in the imaging dataset (66), and the training data for the anomaly detection algorithm is selected only based on said interest-regions (11).

6. The method of clause 4 or 5, wherein the user interface (236) is configured to let the user define one or more exclusion-regions in the imaging dataset (66), and the training data for the anomaly detection algorithm does not contain data based on said exclusion-regions.

7. The method of any one of clauses 3 to 6, wherein the anomaly detection algorithm comprises an autoencoder neural network, and the plurality of anomalies (15) are detected based on a comparison between an input tile of the imaging dataset (66) and a reconstructed representation thereof obtained by presenting the tile to the autoencoder neural network, the tile containing an anomaly (15) and a surrounding of the anomaly (15).

8.
The method of any one of clauses 1 to 7, wherein each anomaly (15) is associated with a feature vector, and the decision criterion is formulated with regard to the feature vectors associated with the plurality of anomalies (15).

9. The method of clause 8, wherein the feature vector associated with an anomaly (15) comprises the raw imaging data or pre-processed imaging data of said anomaly (15) or of a tile containing said anomaly (15).

10. The method of clause 8 or 9, wherein the feature vector associated with an anomaly (15) comprises the activation of a layer, preferably the penultimate layer, of a pre-trained neural network when presented with said anomaly (15) as input.

11. The method of any one of clauses 8 to 10, wherein the feature vector associated with an anomaly (15) comprises a histogram of oriented gradients of said anomaly (15).

12. The method of any one of clauses 1 to 11, wherein multiple anomalies (15) are selected for presentation to the user, and the at least one decision criterion comprises a similarity measure between the multiple anomalies (15).

13. The method of clause 12, further comprising selecting the multiple anomalies (15) to have a high similarity measure between each other.

14. The method of any one of clauses 1 to 13, wherein the at least one decision criterion comprises a similarity measure of the selected at least one anomaly (15) and one or more further anomalies (15) that were selected in one or more previous iterations in step ii.b.

15. The method of clause 14, further comprising selecting the multiple anomalies (15) to have a low similarity measure with respect to the one or more further anomalies (15) that were selected in the one or more previous iterations in step ii.b.

16. The method of any one of clauses 1 to 15, wherein the at least one decision criterion comprises a probability of an anomaly (15) for not belonging to the current set of classes.

17.
The method of clause 16, wherein the anomaly classification algorithm is an open set classifier and the probability of the anomaly (15) for not belonging to the current set of classes is estimated by the open set classifier.

18. The method of any one of clauses 1 to 17, wherein the at least one decision criterion comprises the selected at least one anomaly (15) being classified as a predefined class or a class from a predefined set of classes in the current classification.

19. The method of any one of clauses 1 to 18, wherein multiple anomalies (15) are selected for presentation to the user, and the at least one decision criterion comprises the multiple anomalies (15) being classified as the same class in the current anomaly classification.

20. The method of any one of clauses 1 to 19, wherein the at least one decision criterion comprises a population of the one or more classes the at least one anomaly (15) is assigned to in the current classification.

21. The method of any one of clauses 1 to 20, wherein multiple anomalies (15) are concurrently presented to the user, and the method further comprises grouping and/or sorting the multiple anomalies (15) for presentation to the user.

22. The method of any one of clauses 1 to 21, wherein the at least one decision criterion comprises a context of the selected at least one anomaly (15) with respect to the semiconductor structures.

23. The method of any one of clauses 1 to 22, wherein the at least one decision criterion implements at least one member selected from the group consisting of an explorative annotation scheme and an exploitative annotation scheme.

24. The method of any one of clauses 1 to 23, wherein the at least one decision criterion differs for at least two iterations of the inner iterations (42).

25.
The method of any one of clauses 1 to 24, the decision criterion further comprising selecting the at least one anomaly (15) based on an unsupervised or semi-supervised clustering of the detected plurality of anomalies (15).

26. The method of clause 25, wherein the unsupervised clustering is based on a hierarchical clustering method used to compute a cluster tree (194), wherein the root cluster (196) contains the detected plurality of anomalies (15), each leaf cluster (198, 200, 202) contains a single anomaly (15) of the detected plurality of anomalies (15), and for all internal clusters (204, 205) of the tree the following applies: for an internal cluster (204, 205) with n child clusters, let α_i, i ∈ {1, ..., n}, indicate the set of anomalies (15) of child cluster i; then {α_1, ..., α_n} is a partition of the set of anomalies (15) contained in the internal cluster (204, 205).

27. The method of clause 26, wherein the hierarchical clustering method comprises an agglomerative clustering method, where two clusters (201, 203, 206) are merged, starting from the leaves of the cluster tree (194), based on a cluster distance measure.

28. The method of clause 27, wherein the cluster distance measure comprises a function of pairwise distances, each between an anomaly (15) of the first and an anomaly (15) of the second cluster (201, 203, 206) of the two clusters (201, 203, 206).

29. The method of clause 27 or 28, wherein the function used for computing the cluster distance measure is Ward’s minimum variance method.

30. The method of clause 26, wherein the hierarchical clustering method comprises a divisive clustering method, where a cluster (201, 203, 206) is iteratively split, starting from the root cluster (196) of the cluster tree (194), based on a dissimilarity measure between the anomalies (15) contained in the cluster (201, 203, 206).

31.
The method of any one of clauses 26 to 30, wherein the decision criterion comprises selecting a cluster (201, 203, 206) of the cluster tree (194) for presentation to the user.

32. The method of clause 31, the user interface (236) being configured to allow the user to select a cluster (201, 203, 206) suitable for annotation by iteratively moving from the current cluster (201, 203, 206) to its parent cluster or to one of its child clusters (201, 203, 206) in the cluster tree (194).

33. The method of clause 31, wherein the user interface (236) is configured to display a section of the cluster tree (194) containing the currently selected cluster (201, 203, 206) and to let the user select one of the displayed clusters (201, 203, 206) of the section of the cluster tree (194) for annotation.

34. The method of any one of clauses 1 to 33, wherein multiple anomalies (15) are concurrently presented to the user and the user interface (236) is configured to batch annotate the multiple anomalies (15).

35. The method of clause 34, wherein batch annotation of the multiple anomalies (15) comprises batch assigning of a plurality of labels to the multiple anomalies (15) concurrently presented to the user.

36. The method of any one of clauses 1 to 35, wherein the current set of classes is initialized as a predefined set of classes.

37. The method of any one of clauses 1 to 36, wherein the annotation of the at least one anomaly (15) in step ii.b. comprises the option to add a new class to the current set of classes.

38. The method of clause 37, further comprising, upon adding a new class to the current set of classes, offering the user an option to assign previously labeled training data to the new class.

39. The method of clause 37 or 38, wherein the anomaly classification algorithm comprises an open set classifier.

40.
The method of any one of clauses 1 to 39, wherein the current set of classes is organized hierarchically and this knowledge is included in the training of the anomaly classification algorithm.

41. The method of any one of clauses 1 to 40, wherein the current set of classes contains at least one defect class and at least one nuisance class.

42. The method of any one of clauses 1 to 41, wherein the current set of classes contains an unknown anomaly class.

43. The method of any one of clauses 1 to 42, wherein the selection of a machine learning algorithm comprises selecting one or more of the following attributes:
- a model architecture;
- an optimization algorithm for carrying out the training;
- hyperparameters of the model and the optimization algorithm;
- an initialization of the parameters of the model;
- pre-processing techniques of the training data.

44. The method of clause 43, wherein one or more attributes of the machine learning algorithm are selected based on specific application knowledge.

45. The method of clause 43 or 44, the at least one outer iteration further comprising a modification step (90) containing an option to modify one or more attributes of the machine learning algorithm.

46. The method of any one of clauses 1 to 45, wherein the imaging dataset (66) is a multibeam SEM image.

47. The method of any one of clauses 1 to 46, wherein the imaging dataset (66) is a focused ion beam SEM image.

48. The method of any one of clauses 1 to 47, further comprising determining one or more measurements based on the current classification of the plurality of anomalies (15).

49. The method of clause 48, wherein the user interface is configured to let the user define one or more interest-regions (11) in the imaging dataset (66), especially die regions or border regions, and wherein the one or more measurements are computed based on the current classification of the plurality of anomalies (15) within each of the one or more interest-regions (11) separately.
50. The method of clause 49, further comprising automatically suggesting one or more new interest-regions (11) based on at least one selection criterion and presenting the suggested one or more interest-regions (11) to the user via the user interface (236).

51. The method of any one of clauses 48 to 50, wherein the one or more measurements are selected from the group containing anomaly size, anomaly area, anomaly location, anomaly aspect ratio, anomaly morphology, number or ratio of anomalies, anomaly density, anomaly distribution, moments of an anomaly distribution, performance metrics, precision, recall, and nuisance rate.

52. The method of clause 51, wherein the one or more measurements are selected from said group for a specific defect or a specific set of defects.

53. The method of any one of clauses 48 to 52, further comprising controlling at least one wafer manufacturing process parameter based on the one or more measurements.

54. The method of any one of clauses 48 to 53, further comprising assessing the quality of the wafer based on the one or more measurements and at least one quality assessment rule.

55. One or more machine-readable hardware storage devices comprising instructions that are executable by one or more processing devices (244) to perform operations comprising the method of any one of clauses 1 to 54.

56. A system (234) for controlling the quality of wafers produced in a semiconductor manufacturing fab, the system comprising:
- an imaging device (246) adapted to provide an imaging dataset (66) of said wafer;
- a graphical user interface (236) configured to present data to the user and obtain input data from the user;
- one or more processing devices (244);
- one or more machine-readable hardware storage devices comprising instructions that are executable by the one or more processing devices (244) to perform operations comprising the method of clause 54.

57.
A system (234’) for controlling the production of wafers in a semiconductor manufacturing fab, the system comprising:
- means (248) for producing wafers (250) controlled by at least one manufacturing process parameter;
- an imaging device (246) adapted to provide an imaging dataset (66) of said wafers;
- a graphical user interface (236) configured to present data to the user and obtain input data from the user;
- one or more processing devices (244);
- one or more machine-readable hardware storage devices comprising instructions that are executable by the one or more processing devices (244) to perform operations comprising the method of clause 53.

In summary, the invention relates to a computer implemented method 28, 28’ for the detection and classification of anomalies 15 in an imaging dataset 66 of a wafer comprising a plurality of semiconductor structures. The method comprises determining a current detection of a plurality of anomalies 15 in the imaging dataset 66 and obtaining an unsupervised or semi-supervised clustering of the current detection of the plurality of anomalies 15. Based on at least one decision criterion, at least one cluster of the clustering is selected for presentation to and annotation by a user via a user interface 236. An anomaly classification algorithm is re-trained based on the annotated anomalies 15. A system 234 for controlling the quality of wafers and a system 234’ for controlling the production of wafers are also disclosed.
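Purely as an illustrative sketch, and not as the claimed implementation, the summarized loop (detect anomalies, cluster them hierarchically, annotate whole clusters in one user interaction each, and re-train a classifier on the resulting labels) could be prototyped with standard tooling as follows. All data, class names and the nearest-centroid "re-training" step are hypothetical stand-ins; SciPy's Ward linkage corresponds to the agglomerative clustering of clauses 27 and 29:

```python
# Sketch of cluster-based batch annotation and re-training (illustration only).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical feature vectors of detected anomalies (e.g., tile embeddings):
# two well-separated groups standing in for real defects and nuisance.
defects = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
nuisance = rng.normal(loc=5.0, scale=0.1, size=(30, 2))
features = np.vstack([defects, nuisance])

# Build the cluster tree with Ward's minimum variance method; `linkage`
# encodes the full agglomerative merge history (the cluster tree 194).
tree = linkage(features, method="ward")

# Cut the tree at a resolution of two clusters. In the interactive method the
# user would instead move through the tree to pick a suitable resolution.
cluster_ids = fcluster(tree, t=2, criterion="maxclust")

# Batch annotation: one user interaction assigns a class label per cluster.
labels_per_cluster = {cluster_ids[0]: "defect", cluster_ids[-1]: "nuisance"}
annotations = np.array([labels_per_cluster[c] for c in cluster_ids])

# "Re-training" stand-in: fit a nearest-centroid classifier on the labels.
centroids = {lab: features[annotations == lab].mean(axis=0)
             for lab in set(annotations)}

def classify(x):
    """Assign the label of the nearest class centroid."""
    return min(centroids, key=lambda lab: np.linalg.norm(x - centroids[lab]))
```

Annotating two clusters here labels all 50 samples with two interactions instead of 50, which is the interaction saving the hierarchical annotation step 50’ is designed to achieve.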
Reference number list
10 cell structure
11 interest-region
12 cell
14 defective cell structure
15 anomaly
16 open
18 puncture
20 merge
22 half-open
24 dwarf
26 skid
28, 28’ computer implemented method
30 data selection routine
32 anomaly detection routine
34 anomaly classification routine
36 annotation routine
38 re-training routine
40 outer iteration
42 inner iteration
44 intermediate iteration
46 data selection routine
48 anomaly detection routine
50, 50’ annotation routine
52 anomaly classification routine
54 review routine
56 report step
60 skipping step
66 imaging dataset
68 decision step
70 positive answer
72 negative answer
74 semantic annotation step
76 regulatory annotation step
78 decision step
80 positive answer
82 model selection step
84 model application step
86 current detection step
88 negative answer
90 modification step
92 analysis step
94 decision step
96 negative answer
98 decision step
100 positive answer
102 negative answer
104 positive answer
106 threshold selection step
108 saving step
110 decision step
112 negative answer
114 positive answer
116 clustering step
116’ hierarchical clustering step
118 querying step
118’ hierarchical querying step
120 decision step
122 positive answer
124 negative answer
126 visualization step
128 decision step
130 positive answer
132 negative answer
134 class update step
136 allocation step
136’ hierarchical allocation step
138 decision step
140 positive answer
142 negative answer
144 saving step
146 decision step
148 positive answer
150 negative answer
152 model selection step
154 model application step
156 pre-processing step
158 hyper parameter selection step
160 splitting step
162 training step
164 inference step
166 analysis step
168 current classification step
172 muting step
174 defect visualization step
176 metrology visualization step
178 semantic result step
180 decision step
182 positive answer
184 negative answer
186 decision step
188 negative answer
190 positive answer
192 refinement step
194 cluster tree
196 root cluster
198, 200, 202 leaf cluster
204, 205 internal cluster
201, 203, 206, 207, 208, 210, 211 cluster
212 decision step
214 negative answer
216 positive answer
218 cluster display step
220 cluster selection step
222 iteration
224 hierarchical annotation step
226 precision axis
228 recall axis
230 conventional precision-recall curve
232 improved precision-recall curve
234, 234’ system
236 user interface
238 CPU
240 memory
242 interface
244 processing device
246 imaging device
248 means
250 wafer