Title:
MACHINE LEARNING-BASED CLASSIFICATION OF DEFECTS IN A SEMICONDUCTOR SPECIMEN
Document Type and Number:
WIPO Patent Application WO/2020/234863
Kind Code:
A1
Abstract:
There is provided a method of automated defects' classification, and a system thereof. The method comprises obtaining data informative of a set of defects' physical attributes usable to distinguish between defects of different classes among the plurality of classes; training a first machine learning model to generate, for the given defect, a multi-label output vector informative of values of the physical attributes, thereby generating for the given defect a multi-label descriptor; and using the trained first machine learning model to generate multi-label descriptors of the defects in the specimen. The method can further comprise obtaining data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of the physical attributes; and classifying defects in the specimen by matching respectively generated multi-label descriptors of the defects to the multi-label data sets.

Inventors:
SHAUBI OHAD (IL)
COHEN BOAZ (IL)
SAVCHENKO KIRILL (IL)
SHTALRID ORE (IL)
Application Number:
PCT/IL2020/050350
Publication Date:
November 26, 2020
Filing Date:
March 24, 2020
Assignee:
APPLIED MATERIALS ISRAEL LTD (IL)
International Classes:
G06K9/62
Foreign References:
US20180060702A1 2018-03-01
US6205239B1 2001-03-20
US5544256A 1996-08-06
US20160358041A1 2016-12-08
Attorney, Agent or Firm:
HAUSMAN, Ehud (IL)
Claims:
CLAIMS

1. A method of automated classifying defects in a semiconductor specimen into a plurality of classes, the method comprising, by a processing and memory circuitry (PMC): obtaining data informative of a set of defects’ physical attributes usable to distinguish between defects of different classes among the plurality of classes; and upon training a first machine learning model to process a sample comprising one or more images informative of a given defect so to generate for the given defect a multi-label output vector informative of values of the physical attributes from the set of physical attributes, thereby generating for the given defect a multi-label descriptor, using the trained first machine learning model to generate multi-label descriptors of the defects in the specimen, the descriptors being usable for classification.

2. The method of Claim 1 further comprising, by the PMC: obtaining data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of physical attributes from the set of physical attributes; and upon training a second machine learning model to provide a multi-label classification, using the trained second machine learning model to classify defects in the specimen by matching respectively generated multi-label descriptors of the defects to the multi-label data sets.

3. The method of Claim 1 further comprising, by the PMC: obtaining data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of physical attributes from the set of physical attributes; and analyzing the generated multi-label descriptors of the defects in the specimen to recognize new repeating multi-label data sets, thereby identifying new classes of the defects.

4. The method of Claim 1 further comprising analyzing the generated multi-label descriptors of the defects in the specimen to recognize multi-modal behavior of one or more classes.

5. The method of Claim 2, wherein classifying a defect includes defining a certainty threshold as a ratio between a number of values in the respectively generated multi-label descriptor that match to a given class and the total number of values in the multi-label data set indicative of the given class.

6. The method of Claim 5 further comprising using the certainty threshold to enable at least one of: a. optimizing of confidence levels of defects classification; b. identifying misclassified defects; c. setting purity requirements separately for each class and/or group of classes; d. setting accuracy requirements separately for each class and/or group of classes; and e. setting extraction requirements separately for each class and/or group of classes.

7. The method of Claim 1, wherein the physical attributes in the set of the defects’ physical attributes are informative of at least one of: physical location, shape, perimeter, sidewall angle, aspect ratio, orientation, symmetry, layer, texture, edges and chemical composition.

8. The method of Claim 1, wherein the plurality of classes comprises a “particle” class and a “bridge” class, and wherein the set of defects’ physical attributes comprises roughness of texture, clearness of edges, position in relation to a top of a pattern, and position in relation to two patterns.

9. The method of Claim 2, wherein the values of the physical attributes in the multi-label descriptors and the multi-label data sets are binary.

10. The method of Claim 2, wherein the values of the physical attributes in the multi-label descriptors and the multi-label data sets correspond to “Yes”, “No” and “Not relevant” with regard to respective physical attributes.

11. A system to classify defects in a semiconductor specimen into a plurality of classes, the system comprising a processing and memory circuitry (PMC) operatively connected to an input interface, wherein the input interface is configured to receive samples comprising images informative of the defects; and wherein the PMC is configured: to obtain data informative of a set of defects’ physical attributes usable to distinguish between defects of different classes among the plurality of classes; and upon training a first machine learning model to process a sample comprising one or more images informative of a given defect so to generate for the given defect a multi-label output vector informative of values of the physical attributes from the set of physical attributes, thereby generating for the given defect a multi-label descriptor, to use the trained first machine learning model to generate multi-label descriptors of the defects in the specimen, the descriptors being usable for classification.

12. The system of Claim 11, wherein the PMC is further configured: to obtain data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of the physical attributes from the set of physical attributes; and upon training a second machine learning model to provide a multi-label classification, to use the trained second machine learning model for classifying defects in the specimen by matching respectively generated multi-label descriptors of the defects to the multi-label data sets.

13. The system of Claim 11, wherein the PMC is further configured to: obtain data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of the physical attributes from the set of physical attributes; and analyze the generated multi-label descriptors of the defects in the specimen to recognize new repeating multi-label data sets, thereby identifying new classes of the defects.

14. The system of Claim 11, wherein the PMC is further configured to recognize multi-modal behavior of one or more classes by analyzing the generated multi-label descriptors of the defects in the specimen.

15. The system of Claim 13, wherein classifying a defect includes defining a certainty threshold as a ratio between a number of values in the respectively generated multi-label descriptor that match to a given class and the total number of values in the multi-label data set indicative of the given class.

16. The system of Claim 15, wherein the PMC is further configured to use the certainty threshold to enable at least one of: a. optimizing of confidence levels of defects classification; b. identifying misclassified defects; c. setting purity requirements separately for each class and/or group of classes; d. setting accuracy requirements separately for each class and/or group of classes; and e. setting extraction requirements separately for each class and/or group of classes.

17. The system of Claim 11, wherein the physical attributes in the set of the defects’ physical attributes are informative of at least one of: physical location, shape, perimeter, sidewall angle, aspect ratio, orientation, symmetry, layer, texture, edges and chemical composition.

18. The system of Claim 11, wherein the plurality of classes comprises a “particle” class and a “bridge” class, and wherein the set of defects’ physical attributes comprises roughness of texture, clearness of edges, position in relation to a top of a pattern, and position in relation to two patterns.

19. A non-transitory computer readable medium comprising instructions that, when executed by a computer, cause the computer to perform a method of classifying defects in a semiconductor specimen into a plurality of classes, the method comprising: obtaining data informative of a set of defects’ physical attributes usable to distinguish between defects of different classes among the plurality of classes; and upon training a first machine learning model to process a sample comprising one or more images informative of a given defect so to generate for the given defect a multi-label output vector informative of values of the physical attributes from the set of physical attributes, thereby generating for the given defect a multi-label descriptor, using the trained first machine learning model to generate multi-label descriptors of the defects in the specimen, the descriptors being usable for classification.

20. The non-transitory computer readable medium of Claim 19, wherein the method further comprises: obtaining data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of the physical attributes from the set of physical attributes; and upon training a second machine learning model to provide a multi-label classification, using the trained second machine learning model to classify defects in the specimen by matching respectively generated multi-label descriptors of the defects to the multi-label data sets.

Description:
MACHINE LEARNING-BASED CLASSIFICATION OF DEFECTS IN A SEMICONDUCTOR SPECIMEN

TECHNICAL FIELD

[001] The presently disclosed subject matter relates, in general, to the field of examination of a specimen, and more specifically, to automating the classification of defects in a specimen.

BACKGROUND

[002] Current demands for high density and performance associated with ultra large scale integration of fabricated devices require submicron features, increased transistor and circuit speeds, and improved reliability. Such demands require formation of device features with high precision and uniformity, which, in turn, necessitates careful monitoring of the fabrication process, including automated examination of the devices while they are still in the form of semiconductor wafers.

[003] The term “specimen” used in this specification should be expansively construed to cover any kind of wafer, masks, and other structures, combinations and/or parts thereof used for manufacturing semiconductor integrated circuits, magnetic heads, flat panel displays, and other semiconductor-fabricated articles.

[004] The term “examination” used in this specification should be expansively construed to cover any kind of metrology-related operations as well as operations related to detection and/or classification of defects in a specimen during its fabrication. Examination is provided by using non-destructive examination tools during or after manufacture of the specimen to be examined. By way of non-limiting example, the examination process can include runtime scanning (in a single or in multiple scans), sampling, reviewing, measuring, classifying and/or other operations provided with regard to the specimen or parts thereof using the same or different inspection tools. Likewise, examination can be provided prior to manufacture of the specimen to be examined, and can include, for example, generating an examination recipe(s) and/or other setup operations. It is noted that, unless specifically stated otherwise, the term “examination” or its derivatives used in this specification, are not limited with respect to resolution or size of an inspection area. A variety of non-destructive examination tools includes, by way of non-limiting example, scanning electron microscopes, atomic force microscopes, optical inspection tools, etc.

[005] By way of non-limiting example, run-time examination can employ a two phase procedure, e.g. inspection of a specimen followed by review of sampled locations of potential defects. During the first phase, the surface of a specimen is inspected at high-speed and relatively low-resolution. In the first phase, a defect map is produced to show suspected locations on the specimen having high probability of a defect. During the second phase, at least part of the suspected locations are more thoroughly analyzed with relatively high resolution. In some cases, both phases can be implemented by the same inspection tool, and, in some other cases, these two phases are implemented by different inspection tools.

[006] Examination processes are used at various steps during semiconductor fabrication to detect and classify defects on specimens. Effectiveness of examination can be increased by automatization of process(es) as, for example, Automatic Defect Classification (ADC), Automatic Defect Review (ADR), etc.

GENERAL DESCRIPTION

[007] In accordance with certain aspects of the presently disclosed subject matter, there is provided a method of automated classifying defects in a semiconductor specimen into a plurality of classes. The method is performed by a processing and memory circuitry (PMC) and comprises: obtaining data informative of a set of defects' physical attributes usable to distinguish between defects of different classes among the plurality of classes; and upon training a first machine learning model to process a sample comprising one or more images informative of a given defect so to generate for the given defect a multi-label output vector informative of values of the physical attributes from the set of physical attributes, thereby generating for the given defect a multi-label descriptor, using the trained first machine learning model to generate multi-label descriptors of the defects in the specimen, the descriptors being usable for classification.

[008] The method can further comprise: obtaining data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of physical attributes from the set of physical attributes; and upon training a second machine learning model to provide a multi-label classification, using the trained second machine learning model to classify defects in the specimen by matching respectively generated multi-label descriptors of the defects to the multi-label data sets. Alternatively or additionally, the generated multi-label descriptors of the defects in the specimen can be analyzed to recognize multi-modal behavior of one or more classes and/or to recognize new repeating multi-label data sets, thereby identifying new classes of the defects.
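By way of non-limiting illustration, recognizing new repeating multi-label data sets can be sketched as counting generated descriptors that match no known class. The sketch below is an assumption-laden example and not the disclosed implementation: the function name `find_candidate_new_classes`, the tuple encoding of a descriptor, and the `min_repeats` cut-off are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One binary value per physical attribute, in a fixed (assumed) order.
Descriptor = Tuple[int, ...]


def find_candidate_new_classes(
    descriptors: Iterable[Descriptor],
    known_data_sets: Dict[str, Descriptor],
    min_repeats: int = 3,  # hypothetical cut-off for what counts as "repeating"
) -> List[Tuple[Descriptor, int]]:
    """Return descriptors that match no known class but recur at least min_repeats times."""
    known = set(known_data_sets.values())
    counts = Counter(d for d in descriptors if d not in known)
    return [(d, n) for d, n in counts.most_common() if n >= min_repeats]


# Hypothetical class data sets and generated descriptors.
known = {"class_a": (1, 1, 1, 0), "class_b": (0, 0, 0, 1)}
observed = [
    (1, 1, 1, 0),
    (0, 1, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 0, 1),
    (0, 1, 0, 0),
    (1, 0, 0, 0),
]
candidates = find_candidate_new_classes(observed, known)
print(candidates)
```

In this toy data the descriptor (0, 1, 0, 0) recurs three times without matching a known class, so it is reported as a candidate new class, while the single occurrence of (1, 0, 0, 0) is not.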

[009] In accordance with other aspects of the presently disclosed subject matter, there is provided a system to classify defects in a semiconductor specimen into a plurality of classes. The system comprises a processing and memory circuitry (PMC) operatively connected to an input interface, wherein the input interface is configured to receive samples comprising images informative of the defects; and wherein the PMC is configured: to obtain data informative of a set of defects' physical attributes usable to distinguish between defects of different classes among the plurality of classes; and upon training a first machine learning model to process a sample comprising one or more images informative of a given defect so to generate for the given defect a multi-label output vector informative of values of the physical attributes from the set of physical attributes, thereby generating for the given defect a multi-label descriptor, to use the trained first machine learning model to generate multi-label descriptors of the defects in the specimen, the descriptors being usable for classification.

[0010] The PMC can be further configured: to obtain data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of the physical attributes from the set of physical attributes; and upon training a second machine learning model to provide a multi-label classification, to use the trained second machine learning model for classifying defects in the specimen by matching respectively generated multi-label descriptors of the defects to the multi-label data sets.

[0011] The PMC can be further configured to: obtain data informative of multi-label data sets, each data set being uniquely indicative of a respective class of the plurality of classes and comprising a unique set of values of the physical attributes from the set of physical attributes; and analyze the generated multi-label descriptors of the defects in the specimen to recognize new repeating multi-label data sets, thereby identifying new classes of the defects, and/or to recognize multi-modal behavior of one or more classes.

[0012] In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, classifying a defect can include defining a certainty threshold as a ratio between a number of values in the respectively generated multi-label descriptor that match to a given class and the total number of values in the multi-label data set indicative of the given class. By way of non-limiting example, the certainty threshold can enable at least one of: optimizing of confidence levels of defects classification; identifying misclassified defects; setting purity requirements separately for each class and/or group of classes; setting accuracy requirements separately for each class and/or group of classes; and setting extraction requirements separately for each class and/or group of classes.
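By way of non-limiting illustration, the ratio described above can be sketched as follows. The sketch makes several assumptions: it uses a simple rule-based comparison in place of the trained second machine learning model, it applies a separate threshold per class, and the attribute names and class values are hypothetical (the source names the attributes for the "particle" and "bridge" classes, but not these values).

```python
from typing import Dict, Optional, Tuple

AttributeValues = Dict[str, int]  # attribute name -> binary value


def certainty(descriptor: AttributeValues, class_data_set: AttributeValues) -> float:
    """Matching values divided by the total number of values in the class data set."""
    matching = sum(
        1 for name, value in class_data_set.items() if descriptor.get(name) == value
    )
    return matching / len(class_data_set)


def classify(
    descriptor: AttributeValues,
    class_data_sets: Dict[str, AttributeValues],
    thresholds: Dict[str, float],
) -> Tuple[Optional[str], float]:
    """Return the best-matching class, or None if its own threshold is not met."""
    scores = {name: certainty(descriptor, ds) for name, ds in class_data_sets.items()}
    best = max(scores, key=lambda name: scores[name])
    if scores[best] < thresholds[best]:
        return None, scores[best]
    return best, scores[best]


# Hypothetical attribute values, chosen only for illustration.
CLASS_DATA_SETS = {
    "particle": {
        "rough_texture": 1,
        "clear_edges": 1,
        "on_top_of_pattern": 1,
        "between_two_patterns": 0,
    },
    "bridge": {
        "rough_texture": 0,
        "clear_edges": 0,
        "on_top_of_pattern": 0,
        "between_two_patterns": 1,
    },
}
descriptor = {
    "rough_texture": 1,
    "clear_edges": 1,
    "on_top_of_pattern": 0,
    "between_two_patterns": 0,
}

lenient = classify(descriptor, CLASS_DATA_SETS, {"particle": 0.7, "bridge": 0.7})
strict = classify(descriptor, CLASS_DATA_SETS, {"particle": 0.9, "bridge": 0.9})
print(lenient, strict)
```

Here three of the four "particle" values match, giving a ratio of 0.75, so the defect is accepted under the lenient thresholds and left unclassified under the strict ones. Keeping thresholds per class is one possible way to set requirements separately for each class.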

[0013] In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, the physical attributes in the set of the defects' physical attributes can be informative of at least one of: physical location, shape, perimeter, sidewall angle, aspect ratio, orientation, symmetry, layer, texture, edges and chemical composition.

[0014] By way of non-limiting example, the plurality of classes can comprise a "particle" class and a "bridge" class, wherein the respective set of defects' physical attributes can comprise roughness of texture, clearness of edges, position in relation to a top of a pattern, and position in relation to two patterns.

[0015] In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, the values of the physical attributes in the multi-label descriptors and the multi-label data sets can be binary. Alternatively, the values of the physical attributes in the multi-label descriptors and the multi-label data sets can correspond to "Yes", "No" and "Not relevant" with regard to respective physical attributes.
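By way of non-limiting illustration, three-valued attributes can be encoded as an enumeration. The sketch below assumes that an attribute marked "Not relevant" in a class data set is simply skipped when a descriptor is compared with that data set; the source does not spell out this rule, and the names `AttrValue` and `matches` as well as the attribute names are hypothetical.

```python
from enum import Enum
from typing import Dict


class AttrValue(Enum):
    YES = "Yes"
    NO = "No"
    NOT_RELEVANT = "Not relevant"


def matches(
    descriptor: Dict[str, AttrValue], class_data_set: Dict[str, AttrValue]
) -> bool:
    """True if every attribute the class cares about agrees with the descriptor.

    Assumption: attributes marked NOT_RELEVANT in the class data set are ignored.
    """
    return all(
        descriptor.get(name) == expected
        for name, expected in class_data_set.items()
        if expected is not AttrValue.NOT_RELEVANT
    )


# Hypothetical class data set and descriptors.
class_x = {
    "rough_texture": AttrValue.YES,
    "clear_edges": AttrValue.NOT_RELEVANT,
    "between_two_patterns": AttrValue.NO,
}
first = {
    "rough_texture": AttrValue.YES,
    "clear_edges": AttrValue.NO,
    "between_two_patterns": AttrValue.NO,
}
second = {
    "rough_texture": AttrValue.NO,
    "clear_edges": AttrValue.NO,
    "between_two_patterns": AttrValue.NO,
}
print(matches(first, class_x), matches(second, class_x))
```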

[0016] Among advantages of certain embodiments of the presently disclosed subject matter is the capability of classifying defects using learned attributes with physical meaning, thus enabling physical understanding and debugging while retaining the high performance intrinsic to learned attributes.

[0017] Among further advantages of certain embodiments of the presently disclosed subject matter is the capability of classifying defects into unseen classes by defining such classes through attributes with physical meaning.
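By way of non-limiting illustration, defining an unseen class by its attribute values can be sketched as adding one more entry to a lookup of class data sets, with no retraining of the descriptor generator. The sketch assumes exact matching of binary descriptors; the `ClassRegistry` name and the example values are hypothetical. The uniqueness check mirrors the requirement that each data set be uniquely indicative of its class.

```python
from typing import Dict, Optional, Tuple

Descriptor = Tuple[int, ...]  # one binary value per physical attribute


class ClassRegistry:
    """Maps each unique set of attribute values to exactly one class name."""

    def __init__(self) -> None:
        self._by_values: Dict[Descriptor, str] = {}

    def define(self, name: str, values: Descriptor) -> None:
        # Each data set must be uniquely indicative of a single class.
        if values in self._by_values:
            raise ValueError(f"values already define class {self._by_values[values]!r}")
        self._by_values[values] = name

    def lookup(self, descriptor: Descriptor) -> Optional[str]:
        return self._by_values.get(descriptor)


registry = ClassRegistry()
registry.define("class_a", (1, 1, 1, 0))
registry.define("class_b", (0, 0, 0, 1))

before = registry.lookup((0, 1, 0, 1))    # not yet defined
registry.define("class_c", (0, 1, 0, 1))  # new class defined only by its attributes
after = registry.lookup((0, 1, 0, 1))
print(before, after)
```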

BRIEF DESCRIPTION OF THE DRAWINGS

[0018] In order to understand the invention and to see how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

[0019] Fig. 1 illustrates a functional block diagram of an examination system in accordance with certain embodiments of the presently disclosed subject matter;

[0020] Fig. 2 illustrates a generalized flow-chart of machine learning-based classification of defects in accordance with certain embodiments of the presently disclosed subject matter; and

[0021] Fig. 3 illustrates a generalized flow-chart of a setup step for classifying defects in accordance with certain embodiments of the presently disclosed subject matter.

DETAILED DESCRIPTION OF EMBODIMENTS

[0022] In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the presently disclosed subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the presently disclosed subject matter.

[0023] Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as "processing", "computing", "representing", "comparing", "generating", "training", or the like, refer to the action(s) and/or process(es) of a computer that manipulate and/or transform data into other data, said data represented as physical, such as electronic, quantities and/or said data representing the physical objects. The term "computer" should be expansively construed to cover any kind of hardware-based electronic device with data processing capabilities including, by way of non-limiting example, a FPEI (Fabrication Process Examination Information) system and respective parts thereof disclosed in the present application.

[0024] The terms "non-transitory memory" and "non-transitory storage medium" used herein should be expansively construed to cover any volatile or non-volatile computer memory suitable to the presently disclosed subject matter.

[0025] The term "defect" used in this specification should be expansively construed to cover any kind of abnormality or undesirable feature formed on or within a specimen.

[0026] The term "design data" used in the specification should be expansively construed to cover any data indicative of hierarchical physical design (layout) of a specimen. Design data can be provided by a respective designer and/or can be derived from the physical design (e.g. through complex simulation, simple geometric and Boolean operations, etc.). Design data can be provided in different formats as, by way of non-limiting examples, GDSII format, OASIS format, etc. Design data can be presented in vector format, grayscale intensity image format, or otherwise.

[0027] It is appreciated that, unless specifically stated otherwise, certain features of the presently disclosed subject matter, which are described in the context of separate embodiments, can also be provided in combination in a single embodiment. Conversely, various features of the presently disclosed subject matter, which are described in the context of a single embodiment, can also be provided separately or in any suitable sub-combination. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the methods and apparatus.

[0028] Bearing this in mind, attention is drawn to Fig. 1 illustrating a functional block diagram of an examination system in accordance with certain embodiments of the presently disclosed subject matter. The examination system 100 illustrated in Fig. 1 can be used for examination of a specimen (e.g. of a wafer and/or parts thereof) as part of the specimen fabrication process. The illustrated examination system 100 comprises computer-based system 103 capable of automatically determining metrology-related and/or defect-related information using images obtained during specimen fabrication. Such images are referred to hereinafter as fabrication process (FP) images. The system 103 is referred to hereinafter as an FPEI (Fabrication Process Examination Information) system. FPEI system 103 can be operatively connected to one or more low-resolution examination tools 101 and/or one or more high-resolution examination tools 102 and/or other examination tools. The examination tools are configured to capture FP images and/or to review the captured FP image(s) and/or to enable or provide measurements related to the captured image(s). The FPEI system can be further operatively connected to CAD server 110 and data repository 109.

[0029] FPEI system 103 comprises a processor and memory circuitry (PMC) 104 operatively connected to a hardware-based input interface 105 and to a hardware-based output interface 106. PMC 104 is configured to provide all processing necessary for operating FPEI system, as further detailed with reference to Figs. 2-3, and comprises a processor (not shown separately) and a memory (not shown separately). The processor of PMC 104 can be configured to execute several functional modules in accordance with computer-readable instructions implemented on a non-transitory computer-readable memory comprised in the PMC. Such functional modules are referred to hereinafter as comprised in the PMC. Functional modules comprised in PMC 104 include operatively connected Machine Learning Module for descriptors’ generation 111 (referred to hereinafter as a Descriptors’ Generator) and Machine Learning Module for defects’ classification 112 (referred to hereinafter as a Classifier).

[0030] Operation of FPEI system 103, PMC 104 and the functional blocks therein will be further detailed with reference to Figs. 2-3.

[0031] As will be further detailed with reference to Figs. 2-3, FPEI system is configured to receive, via input interface 105, FP input data. FP input data can include data (and/or derivatives thereof and/or metadata associated therewith) produced by the examination tools and/or data produced and/or stored in one or more data repositories 109 and/or in CAD server 110 and/or another relevant data depository. It is noted that FP input data can include images (e.g. captured images, images derived from the captured images, simulated images, synthetic images, etc.) and associated numeric data (e.g. metadata, hand-crafted attributes, etc.). It is further noted that image data can include data related to a layer of interest and/or to one or more other layers of the specimen. Optionally, for training purposes, FP input data can include the entire available FAB data or part thereof selected in accordance with certain criteria.

[0032] FPEI system is further configured to process at least part of the received FP input data and send, via output interface 106, the results (or part thereof) to a storage system 107, to examination tool(s), to a computer-based graphical user interface (GUI) 108 for rendering the results and/or to external systems (e.g. Yield Management System (YMS) of a FAB). GUI 108 can be further configured to enable user-specified inputs related to operating FPEI system 103.

[0033] By way of non-limiting example, a specimen can be examined by one or more low-resolution examination machines 101 (e.g. an optical inspection system, low-resolution SEM, etc.). The resulting data (referred to hereinafter as low-resolution image data 121), informative of low-resolution images of the specimen, can be transmitted - directly or via one or more intermediate systems - to FPEI system 103. Alternatively or additionally, the specimen can be examined by a high-resolution machine 102 (e.g. a subset of potential defect locations selected for review can be reviewed by a scanning electron microscope (SEM) or Atomic Force Microscopy (AFM)). The resulting data (referred to hereinafter as high-resolution image data 122), informative of high-resolution images of the specimen, can be transmitted - directly or via one or more intermediate systems - to FPEI system 103.

[0034] It is noted that images of a desired location on a specimen can be captured at different resolutions. By way of non-limiting example, so-called "defect images" of the desired location are usable to distinguish between a defect and a false alarm, while so-called "class images" of the desired location are obtained with higher resolution and are usable for defect classification. In some embodiments, images of the same location (with the same or different resolutions) can comprise several images registered therebetween (e.g. images captured from the given location and one or more reference images corresponding to the given location).

[0035] Upon processing the FP input data (e.g. low-resolution image data and/or high-resolution image data, optionally together with other data, as, for example, design data, synthetic data, etc.), FPEI system can send the results (e.g. instruction-related data 123 and/or 124) to any of the examination tool(s), store the results (e.g. defect attributes, defect classification, etc.) in storage system 107, render the results via GUI 108 and/or send them to an external system (e.g. to YMS).

[0036] Those versed in the art will readily appreciate that the teachings of the presently disclosed subject matter are not bound by the system illustrated in Fig. 1; equivalent and/or modified functionality can be consolidated or divided in another manner and can be implemented in any appropriate combination of software with firmware and/or hardware.

[0037] Without limiting the scope of the disclosure in any way, it should also be noted that the examination tools can be implemented as inspection machines of various types, such as optical imaging machines, electron beam inspection machines and so on. In some cases the same examination tool can provide low-resolution image data and high-resolution image data. In some cases at least one examination tool can have metrology capabilities.

[0038] Descriptors’ Generator 111 and Classifier 112 can be implemented as separate or combined Machine Learning Modules. For purpose of illustration only, the following description is provided for Machine Learning Modules (Descriptors’ Generator 111 and Classifier 112) implemented as Deep Neural Networks (DNNs). Those skilled in the art will readily appreciate that the teachings of the presently disclosed subject matter are, likewise, applicable to other suitable techniques based on Machine Learning.

[0039] Descriptors’ Generator 111 and Classifier 112 can comprise one or more DNN subnetworks each comprising a plurality of layers organized in accordance with the respective DNN architecture. Optionally, at least one of the DNN networks can have an architecture different from the others. By way of non-limiting example, the layers in respective DNN networks can be organized in accordance with Convolutional Neural Network (CNN) architecture, Recurrent Neural Network architecture, Recursive Neural Networks architecture, or otherwise. Optionally, at least part of the DNN subnetworks can have one or more common layers (e.g. final fuse layer, output fully-connected layers, etc.). Output of Descriptors’ Generator 111 can serve as input for Classifier 112.
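By way of non-limiting illustration, the chaining of the two modules, in which the output of the Descriptors’ Generator serves as input for the Classifier, can be sketched as function composition. Both stages below are toy stand-ins and not DNNs: `stub_generator` derives two made-up binary attributes from pixel statistics, and `stub_classifier` is a hypothetical lookup table. All names and values are assumptions.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]  # stand-in for one defect image (rows of pixels)
Sample = Sequence[Image]           # one or more images informative of a defect
Descriptor = List[int]             # multi-label output vector


def make_pipeline(
    generator: Callable[[Sample], Descriptor],
    classifier: Callable[[Descriptor], str],
) -> Callable[[Sample], str]:
    """Feed the generator's multi-label descriptor into the classifier."""

    def run(sample: Sample) -> str:
        return classifier(generator(sample))

    return run


def stub_generator(sample: Sample) -> Descriptor:
    # Placeholder for a trained model: two toy "attributes" from pixel statistics.
    pixels = [p for image in sample for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [int(mean > 0.5), int(contrast > 0.5)]


def stub_classifier(descriptor: Descriptor) -> str:
    # Placeholder for a trained classifier: hypothetical class data sets.
    table = {(1, 0): "class_a", (0, 1): "class_b"}
    return table.get(tuple(descriptor), "unknown")


pipeline = make_pipeline(stub_generator, stub_classifier)
bright_flat = [[[0.9, 0.8], [0.7, 0.9]]]
dark_contrasty = [[[0.0, 0.9], [0.1, 0.0]]]
print(pipeline(bright_flat), pipeline(dark_contrasty))
```

Keeping the two stages behind separate callables reflects that the modules can be implemented, and trained, separately or in combination.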

[0040] Each layer of a DNN network can include multiple basic computational elements (CE) typically referred to in the art as dimensions, neurons, or nodes. Computational elements of a given layer can be connected with CEs of a preceding layer and/or a subsequent layer. Each connection between the CE of a preceding layer and the CE of a subsequent layer is associated with a weighting value. A given CE can receive inputs from CEs of a previous layer via the respective connections, each given connection being associated with a weighting value which can be applied to the input of the given connection. The weighting values can determine the relative strength of the connections and thus the relative influence of the respective inputs on the output of the given CE. The given CE can be configured to compute an activation value (e.g. the weighted sum of the inputs) and further derive an output by applying an activation function to the computed activation. The activation function can be, for example, an identity function, a deterministic function (e.g., linear, sigmoid, threshold, or the like), a stochastic function, or other suitable function. The output from the given CE can be transmitted to CEs of a subsequent layer via the respective connections. Likewise, as above, each connection at the output of a CE can be associated with a weighting value which can be applied to the output of the CE prior to being received as an input of a CE of a subsequent layer. Further to the weighting values, there can be threshold values (including limiting functions) associated with the connections and CEs.

[0041] The weighting and/or threshold values of a deep neural network can be initially selected prior to training, and can be further iteratively adjusted or modified during training to achieve an optimal set of weighting and/or threshold values in the trained DNN module. After each iteration, a difference can be determined between the actual output produced by DNN module and the target output associated with the respective training set of data. The difference can be referred to as an error value. Training can be determined to be complete when a cost function indicative of the error value is less than a predetermined value or when a limited change in performance between iterations is achieved. Descriptors’ Generator 111 can be trained separately from Classifier 112. A set of input data used to train a respective machine learning model is referred to hereinafter as a training set. For DNN, the training set is used to adjust the weights/thresholds of the deep neural network.
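
For purpose of illustration only, the computation performed by a single computational element and the two training-completion criteria described above can be sketched as follows. The sketch is a non-limiting example under stated assumptions: a sigmoid activation function, a squared-error cost function and plain gradient descent are chosen here for simplicity, and the names `ce_output`, `train_ce`, `cost_threshold` and `min_improvement` are hypothetical and do not appear in the disclosure.

```python
import math


def ce_output(inputs, weights, threshold=0.0):
    """Output of one computational element (CE).

    The activation value is the weighted sum of the inputs minus an
    optional threshold; a sigmoid activation function (an assumption)
    is then applied to it.
    """
    activation = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return 1.0 / (1.0 + math.exp(-activation))


def train_ce(samples, weights, rate=0.5, cost_threshold=0.01,
             min_improvement=1e-9, max_iterations=10000):
    """Iteratively adjust the weights of a single CE.

    samples: list of (inputs, target) pairs, i.e. the training set.
    Training stops when the cost falls below cost_threshold or when the
    change in cost between iterations is smaller than min_improvement.
    Returns (weights, last computed cost, iterations performed).
    """
    weights = list(weights)
    previous_cost = float("inf")
    cost = previous_cost
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        cost = 0.0
        gradient = [0.0] * len(weights)
        for inputs, target in samples:
            out = ce_output(inputs, weights)
            error = out - target  # difference between actual and target output
            cost += 0.5 * error * error
            # Derivative of the squared error through the sigmoid.
            delta = error * out * (1.0 - out)
            for i, x in enumerate(inputs):
                gradient[i] += delta * x
        cost /= len(samples)
        if cost < cost_threshold or previous_cost - cost < min_improvement:
            break
        weights = [w - rate * g / len(samples)
                   for w, g in zip(weights, gradient)]
        previous_cost = cost
    return weights, cost, iteration
```

In this sketch the first input of each sample can be fixed to 1 so that the corresponding weight acts as a bias; a full DNN would repeat the same computation over many CEs and layers.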

[0042] It is noted that the teachings of the presently disclosed subject matter are not bound by the architecture of Descriptors’ Generator 111 and Classifier 112 (including the number and/or architecture of DNN networks). By way of nonlimiting example, Classifier 112 can operate in a manner disclosed in PCT application PCT/IL2019/050155 filed on February 7, 2019 incorporated herewith by reference in its entirety.

[0043] It is noted that the examination system illustrated in Fig. 1 can be implemented in a distributed computing environment, in which the aforementioned functional modules shown in Fig. 1 can be distributed over several local and/or remote devices, and can be linked through a communication network. It is further noted that in other embodiments at least part of examination tools 101 and/or 102, data repositories 109, storage system 107 and/or GUI 108 can be external to the examination system 100 and operate in data communication with FPEI system 103 via input interface 105 and output interface 106. FPEI system 103 can be implemented as stand-alone computer(s) to be used in conjunction with the examination tools. Alternatively, the respective functions of the FPEI system can, at least partly, be integrated with one or more examination tools.

[0044] Referring to Fig. 2, there is illustrated a generalized flow-chart of machine learning-based classification in accordance with certain embodiments of the presently disclosed subject matter. The process includes a setup step (201) comprising training (202) Descriptors’ Generator 111 to provide class-related descriptors for defects in respective images and training (203) Classifier 112 to provide multi-label classification of defects based on the respective descriptors. The setup step is further detailed with reference to Fig. 3.

[0045] During the runtime (204), the PMC of FPEI system uses the obtained trained Descriptors’ Generator 111 and Classifier 112 to process (205) an FP sample comprising one or more FP images. Thereby PMC obtains (206) classification-related data characterizing at least one of the images in the processed FP sample. When processing one or more FP images, PMC can also use predefined parameters and/or parameters received from other sources in addition to the training-based parameters characterizing Descriptors’ Generator 111 and Classifier 112 upon training.

[0046] FP images in the FP sample can arrive from different examination modalities (e.g. from different examination tools, from different channels of the same examination tool as, for example, bright field and dark field images, from the same examination tool using different operational parameters, or can be derived from design data, etc.).

[0047] For example, FP images can be selected from images of the specimen (e.g. the wafer or parts thereof) captured during the manufacturing process, derivatives of the captured images obtained by various pre-processing stages (e.g. images of a part of a wafer or a photomask captured by SEM or an optical inspection system, SEM images roughly centered around the defect to be classified by ADC, SEM images of larger regions in which the defect is to be localized by ADR, registered images of different examination modalities corresponding to the same mask location, segmented images, height map images, etc.), computer-generated design data-based images, etc. It is noted that FP images can comprise the images of a layer of interest and/or registered images of one or more other layers of the specimen. FP images of different layers are referred to hereinafter also as images received from the different modalities.

[0048] It is noted that in embodiments of the presently disclosed subject matter, characteristics of the images comprised in FP samples and/or corresponding training samples differ from the regular RGB images used in the general Deep Neural Networks known in the art. For example, electron-based imaging results in greyscale images with various effects such as non-uniform noise distribution, charging effects, large variability between sensors (different tools), and more. Further, the SEM image is usually composed of 5 different greyscale images, each image corresponding to a different perspective from which the image was taken (Top, Left, Right, Up, Down).
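
By way of non-limiting illustration only, an SEM image composed of five greyscale perspective images can be arranged as a single multi-channel sample before being provided to a machine learning model. The following sketch assumes that each perspective is given as a two-dimensional list of grey levels of the same size; the per-image min-max normalization is an assumption added here as one simple way of reducing variability between tools, and the name `stack_perspectives` is hypothetical.

```python
PERSPECTIVES = ("Top", "Left", "Right", "Up", "Down")


def stack_perspectives(images):
    """Stack five same-sized greyscale images into a channels-first sample.

    images: dict mapping a perspective name to a 2D list of grey levels.
    Returns a list of 5 channels, each min-max normalized to [0.0, 1.0].
    """
    missing = [p for p in PERSPECTIVES if p not in images]
    if missing:
        raise ValueError(f"missing perspectives: {missing}")
    shape = None
    channels = []
    for name in PERSPECTIVES:
        image = images[name]
        current = (len(image), len(image[0]))
        if any(len(row) != current[1] for row in image):
            raise ValueError(f"{name}: rows of unequal length")
        if shape is None:
            shape = current
        elif current != shape:
            raise ValueError(f"{name}: shape {current} differs from {shape}")
        lo = min(min(row) for row in image)
        hi = max(max(row) for row in image)
        span = (hi - lo) or 1  # a constant image maps to all zeros
        channels.append([[(v - lo) / span for v in row] for row in image])
    return channels
```

The fixed channel order given by `PERSPECTIVES` keeps training samples and FP samples consistent with each other.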

[0049] It is further noted that FP samples and/or corresponding training samples can include multiple data types of FP input data, such as, for example, images of different origin and resolution (e.g. defect images, class images, reference images, CAD images, etc.), different types of numeric data, as, for example, different types of data derived from the images (e.g. height map, defect mask, grades, segmentations, etc.), different types of metadata (e.g. imaging conditions, pixel-size, etc.), different types of hand-crafted attributes (e.g. defect size, orientation, background segment, etc.), and the like. The defects of a given class may arrive from one or more layers and/or one or more products in the FAB. A training set can be further enriched to include augmented and/or synthetic defects. By way of nonlimiting example, a training set can be enriched, as detailed in PCT Application No. PCT/IL2019/050150 filed on February 7, 2019 and incorporated herewith by reference in its entirety, and US Application No. 16/280,869 filed on February 20, 2019 and incorporated herewith by reference in its entirety, etc.

[0050] Referring to Fig. 3, there is illustrated a generalized flow-chart of setup step (201). In accordance with certain embodiments of the presently disclosed subject matter, the setup step includes defining (301) a set of defects’ physical attributes (e.g. up to 20-30 physical attributes) usable to distinguish between defects of different classes and further defining (302), for each given class, a set of values of these physical attributes, the set of values uniquely describing the given class. Accordingly, each class is uniquely associated (303) with a multi-label data set corresponding to the respective unique set of values of physical attributes characterizing defects in the given class. Operations (301) - (303) can be provided per product, per customer, and/or globally for a group of products/customers, and respective unique association of classes with multi-label data sets can be stored in PMC 104. Operations (301) - (303) can be provided manually or, at least partly, by a computer.

[0051] By way of non-limiting example, the physical attributes can characterize physical location, shape, perimeter, sidewall angle, aspect ratio, orientation, symmetry, etc. Likewise, physical attributes can characterize location of a defect on certain one or more layers, chemical composition (e.g. absence and/or presence of a certain material), etc. The values of physical attributes are derivable from the images in FP samples and/or respective training samples (e.g. by processing the images and/or derivatives thereof).

[0052] For example, a defect in a “particle” class can be characterized by a rough texture (small z variations of defect area) with clear edges, lying on top of the pattern, while a defect in a “bridge” class can be characterized by a non-rough texture without clear edges, connecting two patterns and lying on top. Thus, for these two classes, the set of defects’ physical attributes can be defined as follows:

- roughness of texture

- clearness of edges

- position in relation to the top of a pattern

- position in relation to two patterns

[0053] Optionally, but not necessarily so, the values of physical attributes can be defined in a binary form. Table 1 presents a non-limiting example of a set of binary values of physical attributes uniquely describing “particle” and “bridge” classes of defects.

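Table 1

Physical attribute                              “Particle”   “Bridge”

Roughness of texture                                1            0

Clearness of edges                                  1            0

Position in relation to the top of a pattern        1            1

Position in relation to two patterns                0            1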

[0054] Accordingly, for the above example, “Particle” class can be uniquely associated with multi-label binary data set [1 1 1 0] and “Bridge” class can be uniquely associated with multi-label binary data set [0 0 1 1].

[0055] It is noted that some attributes from the set of physical attributes may not be relevant for a specific defect class. Optionally, but not necessarily so, the values of physical attributes can be defined as “Yes”, “No” and “Not Relevant”.
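
For purpose of illustration only, the unique association of classes with multi-label data sets in the above example can be held in a simple lookup structure. In the following non-limiting sketch the attribute identifiers, the constant `CLASS_DATA_SETS` and the helpers `check_unique` and `describe` are hypothetical names introduced here; the order of values within each data set is assumed to follow the order in which the attributes are listed above.

```python
ATTRIBUTES = (
    "rough_texture",
    "clear_edges",
    "on_top_of_pattern",
    "connects_two_patterns",
)

# Multi-label binary data sets from the example: 1 = "Yes", 0 = "No".
CLASS_DATA_SETS = {
    "Particle": (1, 1, 1, 0),
    "Bridge": (0, 0, 1, 1),
}


def check_unique(data_sets, attributes=ATTRIBUTES):
    """Verify that every class has one value per attribute and that no two
    classes share the same data set."""
    seen = {}
    for name, values in data_sets.items():
        if len(values) != len(attributes):
            raise ValueError(f"{name}: expected {len(attributes)} values")
        if values in seen:
            raise ValueError(f"{name} and {seen[values]} share a data set")
        seen[values] = name


def describe(name, data_sets=CLASS_DATA_SETS, attributes=ATTRIBUTES):
    """Return the data set of a class as an attribute-to-value mapping."""
    return dict(zip(attributes, data_sets[name]))
```

A third value (for example `None`) could be used in a data set to represent “Not Relevant”; the uniqueness check would then need to compare only the relevant positions.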

[0056] PMC 104 trains Descriptors’ Generator 111 to generate (304), for each given defect, a multi-label output vector (referred to hereinafter also as “descriptor”) defining the values of attributes from the set of physical attributes. By way of nonlimiting example, the training procedure can be provided in a “structured prediction” manner (e.g. as detailed in the article “Predict and Constrain: Modeling Cardinality in Deep Structured Prediction”, Nataly Brukhim and Amir Globerson, published on February 13, 2018, arXiv:1802.04721v1; https://arxiv.org/pdf/1802.04721.pdf. The article is incorporated herewith by reference in its entirety).

[0057] Alternatively or additionally, PMC 104 uses data sets uniquely associated with respective classes to train Classifier 112 to provide (305) classification of defects in accordance with the multi-label output vectors (descriptors).

[0058] The training process yields the trained Descriptors’ Generator and the trained Classifier.

[0059] Referring back to Fig. 2, during runtime PMC 104 uses the trained Descriptors’ Generator to process an FP sample to obtain a descriptor informative of physical attributes of a respective defect, and further uses the trained Classifier to classify the defect in accordance with multi-label data sets uniquely associated with respective classes.
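
By way of non-limiting illustration only, the runtime matching of a generated descriptor to the multi-label data sets can be sketched as follows. The sketch assumes that the trained Descriptors’ Generator outputs one score in the range [0, 1] per physical attribute, that a fixed cutoff of 0.5 converts the scores into binary values, and that `None` represents a “Not Relevant” value; the trained Classifier is replaced here by a plain lookup, and all function names are hypothetical.

```python
NOT_RELEVANT = None

# Data sets of the "Particle"/"Bridge" example (1 = "Yes", 0 = "No").
CLASS_DATA_SETS = {
    "Particle": (1, 1, 1, 0),
    "Bridge": (0, 0, 1, 1),
}


def to_descriptor(scores, cutoff=0.5):
    """Binarize per-attribute scores (assumed model outputs in [0, 1])."""
    return tuple(1 if s >= cutoff else 0 for s in scores)


def matches(descriptor, data_set):
    """True if every relevant value of the data set equals the descriptor."""
    if len(descriptor) != len(data_set):
        raise ValueError("descriptor and data set differ in length")
    return all(expected is NOT_RELEVANT or value == expected
               for value, expected in zip(descriptor, data_set))


def classify(descriptor, data_sets=CLASS_DATA_SETS):
    """Return the single matching class, or None if zero or several match."""
    candidates = [name for name, data_set in data_sets.items()
                  if matches(descriptor, data_set)]
    return candidates[0] if len(candidates) == 1 else None
```

For example, `classify(to_descriptor([0.9, 0.8, 0.7, 0.1]))` yields the descriptor `(1, 1, 1, 0)` and therefore the “Particle” class, whereas a descriptor matching no data set is left unclassified.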

[0060] Classifying can also include a certainty procedure defining a certainty threshold as a ratio of the number of values matching a given class to the total number of attribute values in the multi-label data set. Such a certainty threshold enables optimization of confidence levels of defects classification, and identifying misclassified defects. By way of non-limiting example, such a threshold can be implemented in a technique of classifying defects comprising assigning each class to a classification group among three or more classification groups with different priorities and further setting purity, accuracy and/or extraction requirements separately for each class, and optimizing the classification results in accordance with per-class requirements. The technique is disclosed in US Application No. 2019/0096053 incorporated herewith by reference in its entirety.

[0061] Analyses of the generated descriptors can recognize new repeating attribute patterns, thereby enabling detection of new classes initially not included in the classification. Likewise, such analyses enable recognizing multi-modal behavior of one or more classes and defining respective sub-classes (e.g. corresponding to different clusters of attributes’ values characterizing the same class).

[0062] It is to be understood that the invention is not limited in its application to the details set forth in the description contained herein or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Hence, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for designing other structures, methods, and systems for carrying out the several purposes of the presently disclosed subject matter.
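
For purpose of illustration only, the certainty procedure of paragraph [0060] and the descriptor analyses of paragraph [0061] can be sketched as follows. The sketch is a non-limiting example under stated assumptions: the certainty of a class is computed as the ratio of matching attribute values to the total number of values and compared with a threshold, a new repeating pattern is taken to be any descriptor that equals no known data set and occurs at least `min_count` times, and all function and parameter names are hypothetical.

```python
from collections import Counter


def certainty(descriptor, data_set):
    """Ratio of matching attribute values to the total number of values."""
    matching = sum(1 for value, expected in zip(descriptor, data_set)
                   if value == expected)
    return matching / len(data_set)


def classify_with_certainty(descriptor, data_sets, threshold):
    """Return (class or None, certainty) for the best-matching class.

    The class is reported only if its certainty reaches the threshold;
    ties are resolved in favor of the class listed first.
    """
    best = max(data_sets, key=lambda name: certainty(descriptor, data_sets[name]))
    score = certainty(descriptor, data_sets[best])
    return (best if score >= threshold else None), score


def repeating_unknown_patterns(descriptors, data_sets, min_count):
    """Count descriptors that match no known data set and keep those that
    repeat at least min_count times (candidate new classes)."""
    known = set(data_sets.values())
    counts = Counter(d for d in descriptors if d not in known)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}
```

With the “Particle” and “Bridge” data sets of the earlier example, the descriptor `(1, 1, 0, 0)` matches three of the four “Particle” values, so it is classified as “Particle” for a threshold of 0.75 and left unclassified for a stricter threshold.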

[0063] It will also be understood that the system according to the invention may be, at least partly, implemented on a suitably programmed computer. Likewise, the invention contemplates a computer program being readable by a computer for executing the method of the invention. The invention further contemplates a non-transitory computer-readable memory tangibly embodying a program of instructions executable by the computer for executing the method of the invention.

[0064] Those skilled in the art will readily appreciate that various modifications and changes can be applied to the embodiments of the invention as hereinbefore described without departing from its scope, defined in and by the appended claims.