Title:
TRAINING A NEURAL NETWORK FOR DEFECT DETECTION IN LOW RESOLUTION IMAGES
Document Type and Number:
WIPO Patent Application WO/2019/191346
Kind Code:
A1
Abstract:
Methods and systems for training a neural network for defect detection in low resolution images are provided. One system includes an inspection tool that includes high and low resolution imaging subsystems and one or more components that include a high resolution neural network and a low resolution neural network. Computer subsystem(s) of the system are configured for generating a training set of defect images. At least one of the defect images is generated synthetically by the high resolution neural network using an image generated by the high resolution imaging subsystem. The computer subsystem(s) are also configured for training the low resolution neural network using the training set of defect images as input. In addition, the computer subsystem(s) are configured for detecting defects on another specimen by inputting the images generated for the other specimen by the low resolution imaging subsystem into the trained low resolution neural network.

Inventors:
BHASKAR KRIS (US)
KARSENTI LAURENT (IL)
RIES BRADLEY (US)
NICOLAIDES LENA (US)
YEOH RICHARD (SENG WEE) (SG)
HIEBERT STEPHEN (US)
Application Number:
PCT/US2019/024453
Publication Date:
October 03, 2019
Filing Date:
March 28, 2019
Assignee:
KLA TENCOR CORP (US)
International Classes:
H01L21/67; G03F7/20; H01L21/66
Foreign References:
US20170177997A1 2017-06-22
US20180075594A1 2018-03-15
JP2004354251A 2004-12-16
US20150262038A1 2015-09-17
US20030086081A1 2003-05-08
US20170193400A1 2017-07-06
US7570796B2 2009-08-04
US7676077B2 2010-03-09
US7782452B2 2010-08-24
US8664594B1 2014-03-04
US8692204B2 2014-04-08
US8698093B1 2014-04-15
US8716662B1 2014-05-06
US8126255B2 2012-02-28
US9222895B2 2015-12-29
US20170148226A1 2017-05-25
US20160209334A1 2016-07-21
US6891627B1 2005-05-10
US20170200265A1 2017-07-13
US6902855B2 2005-06-07
US7418124B2 2008-08-26
US7729529B2 2010-06-01
US7769225B2 2010-08-03
US8041106B2 2011-10-18
US8111900B2 2012-02-07
US8213704B2 2012-07-03
US20170140524A1 2017-05-18
US201862681073P 2018-06-05
US20170193680A1 2017-07-06
US20170194126A1 2017-07-06
US20170200260A1 2017-07-13
US20170200264A1 2017-07-13
US20170345140A1 2017-11-30
US20190073566A1 2019-03-07
US20190073568A1 2019-03-07
Other References:
KRIZHEVSKY ET AL.: "ImageNet Classification with Deep Convolutional Neural Networks", NIPS, 2012, pages 9
SUGIYAMA, MORGAN KAUFMANN, INTRODUCTION TO STATISTICAL MACHINE LEARNING, 2016, pages 534
JEBARA: "MIT Thesis", 2002, article "Discriminative, Generative, and Imitative Learning", pages: 212
HAND ET AL.: "Principles of Data Mining (Adaptive Computation and Machine Learning)", 2001, MIT PRESS, pages: 578
JIA ET AL.: "BIBE ' 14 Proceedings of the 2014 IEEE International Conference on Bioinformatics and Bioengineering", 10 November 2014, IEEE COMPUTER SOCIETY, article "A Novel Semi-supervised Deep Learning Framework for Affective State Recognition on EEG Signals", pages: 30 - 37
TORREY ET AL.: "Handbook of Research on Machine Learning Applications", 2009, IGI GLOBAL, article "Transfer Learning", pages: 22
YOSINSKI ET AL.: "How transferable are features in a deep neural network?", NIPS, 6 November 2014 (2014-11-06), pages 14
KINGMA ET AL.: "Semi-supervised Learning with Deep Generative Models", NIPS, 31 October 2014 (2014-10-31), pages 1 - 9
RASMUS ET AL.: "Semi-Supervised Learning with Ladder Networks", NIPS, 24 November 2015 (2015-11-24), pages 1 - 19
GOODFELLOW ET AL., GENERATIVE ADVERSARIAL NETS, 10 June 2014 (2014-06-10), pages 1 - 9
MAKHZANI ET AL.: "Adversarial Autoencoders", ARXIV:1511.05644V2, 25 May 2016 (2016-05-25), pages 16
SZEGEDY ET AL.: "Going Deeper with Convolutions", 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), June 2015 (2015-06-01), pages 9
ISOLA ET AL.: "Image-to-Image Translation with Conditional Adversarial Networks", ARXIV:1611.07004V2, 22 November 2017 (2017-11-22), pages 17
See also references of EP 3762961A4
Attorney, Agent or Firm:
MCANDREWS, Kevin et al. (US)
Claims:
WHAT IS CLAIMED IS:

1. A system configured to train a neural network for defect detection in low resolution images, comprising: an inspection tool comprising a high resolution imaging subsystem and a low resolution imaging subsystem, wherein the high and low resolution imaging subsystems comprise at least an energy source and a detector, wherein the energy source is configured to generate energy that is directed to a specimen, and wherein the detector is configured to detect energy from the specimen and to generate images responsive to the detected energy; one or more computer subsystems configured for acquiring the images of the specimen generated by the high and low resolution imaging subsystems; and one or more components executed by the one or more computer subsystems, wherein the one or more components comprise a high resolution neural network and a low resolution neural network; and wherein the one or more computer subsystems are further configured for: generating a training set of defect images, wherein at least one of the defect images is generated synthetically by the high resolution neural network using at least one of the images generated by the high resolution imaging subsystem; training the low resolution neural network using the training set of defect images as input; and detecting defects on another specimen by inputting the images generated for the other specimen by the low resolution imaging subsystem into the trained low resolution neural network.

2. The system of claim 1, wherein the training set of defect images comprises images of the specimen generated by more than one mode of the low resolution imaging subsystem.

3. The system of claim 2, wherein the more than one mode of the low resolution imaging subsystem comprises all of the modes of the low resolution imaging subsystem.

4. The system of claim 2, wherein the one or more computer subsystems are further configured for selecting one or more of the more than one mode of the low resolution imaging subsystem used for detecting the defects on the other specimen based on results of training the low resolution neural network with the images generated by the more than one mode of the low resolution imaging subsystem.

5. The system of claim 1, wherein the inspection tool is configured as a macro inspection tool.

6. The system of claim 1, wherein the defects detected on the other specimen are defects of a back end layer of the other specimen.

7. The system of claim 1, wherein the defects detected on the other specimen are defects of a redistribution layer of the other specimen.

8. The system of claim 1, wherein the defects detected on the other specimen are defects of a high noise layer of the other specimen.

9. The system of claim 1, wherein the defects detected on the other specimen are defects of a layer comprising metal lines of the other specimen.

10. The system of claim 1, wherein the other specimen on which the defects are detected is a post-dice specimen.

11. The system of claim 1, wherein the high and low resolution neural networks are configured for single image defect detection.

12. The system of claim 1, wherein the training set of defect images comprises one or more images of one or more programmed defects on the specimen, wherein the one or more computer subsystems are further configured for generating the one or more programmed defects by altering a design for the specimen to create the one or more programmed defects in the design, and wherein the altered design is printed on the specimen to create the one or more programmed defects on the specimen.

13. The system of claim 1, wherein the training set of defects comprises one or more images of one or more synthetic defects, and wherein the one or more computer subsystems are further configured for generating the one or more synthetic defects by altering a design for the specimen to create the one or more synthetic defects in the design, generating simulated high resolution images for the one or more synthetic defects based on the one or more synthetic defects in the design, and adding the simulated high resolution images to the training set.

14. The system of claim 13, wherein the one or more computer subsystems are further configured for generating the simulated high resolution images using the high resolution neural network, and wherein the high resolution neural network is configured as a deep generative model.

15. The system of claim 1, wherein the training set of defects comprises one or more images of one or more synthetic defects, wherein the one or more computer subsystems are further configured for generating the one or more images of the one or more synthetic defects by altering a design for the specimen to create the one or more synthetic defects in the design, and wherein the one or more computer subsystems are further configured for generating simulated low resolution images for the one or more synthetic defects based on the one or more synthetic defects in the design.

16. The system of claim 15, wherein the one or more computer subsystems are further configured for generating the simulated low resolution images using a deep generative model.

17. The system of claim 15, wherein generating the simulated low resolution images is performed with a generative adversarial network or a variational Bayesian method.

18. The system of claim 1, wherein the training set of defects comprises one or more synthetic defects, and wherein the one or more computer subsystems are further configured for generating the one or more synthetic defects by altering one or more of the images generated by the high resolution imaging subsystem and one or more of the images generated by the low resolution imaging subsystem to create a segmentation image, altering the one or more of the images generated by the high resolution imaging subsystem based on the segmentation image, and generating simulated low resolution images for the one or more synthetic defects based on the altered one or more images.

19. The system of claim 18, wherein generating the simulated low resolution images is performed with a generative adversarial network or a variational Bayesian method.

20. The system of claim 1, wherein the one or more computer subsystems are further configured for generating the at least one of the defect images synthetically by altering the at least one of the images generated by the high resolution imaging subsystem for the specimen to create high resolution images for known defects of interest.

21. The system of claim 1, wherein the training set of defect images comprises one or more images of one or more artificial defects on the specimen generated by performing a process on the specimen known to cause the one or more artificial defects on the specimen.

22. The system of claim 1, wherein the training set of defect images comprises one or more defects detected on the specimen in one or more of the images generated by the high resolution imaging subsystem.

23. The system of claim 22, wherein the one or more computer subsystems are further configured for detecting the defects on the specimen in the images generated by the high resolution imaging subsystem by single image detection.

24. The system of claim 22, wherein the one or more computer subsystems are further configured for detecting the defects on the specimen in the images generated by the high resolution imaging subsystem by die-to-database detection.

25. The system of claim 1, wherein the inspection tool is configured for scanning swaths on the specimen while detecting energy from the specimen, and wherein the one or more computer subsystems are further configured for acquiring and storing at least three of the swaths of the images generated by the high resolution imaging subsystem such that the at least three of the swaths are available for use in generating the training set of defect images.

26. The system of claim 1, wherein the one or more computer subsystems are further configured for training the high resolution neural network, and wherein training the high resolution neural network and training the low resolution neural network are performed using a generative adversarial network or a variational Bayesian method.

27. The system of claim 1, wherein the high resolution neural network is configured as a semi-supervised deep learning framework.

28. The system of claim 1, wherein the low resolution neural network is configured as a semi-supervised deep learning framework.

29. The system of claim 1, wherein the images generated by the low resolution imaging subsystem and acquired by the one or more computer subsystems comprise images taken through focus, wherein the one or more computer subsystems are further configured for mapping the images taken through focus to the images generated by the high resolution imaging subsystem, and wherein training the low resolution neural network is further performed based on the results of training the high resolution neural network and results of the mapping.

30. A non-transitory computer-readable medium, storing program instructions executable on one or more computer systems for performing a computer-implemented method for training a neural network for defect detection in low resolution images, wherein the computer-implemented method comprises: generating images for a specimen with high and low resolution imaging subsystems of an inspection tool, wherein the high and low resolution imaging subsystems comprise at least an energy source and a detector, wherein the energy source is configured to generate energy that is directed to the specimen, and wherein the detector is configured to detect energy from the specimen and to generate images responsive to the detected energy; wherein one or more components are executed by the one or more computer systems, and wherein the one or more components comprise a high resolution neural network and a low resolution neural network; generating a training set of defect images, wherein at least one of the defect images is generated synthetically by the high resolution neural network using at least one of the images generated by the high resolution imaging subsystem; training the low resolution neural network using the training set of defect images as input; and detecting defects on another specimen by inputting the images generated for the other specimen by the low resolution imaging subsystem into the trained low resolution neural network, wherein generating the training set, training the low resolution neural network, and detecting the defects are performed by the one or more computer systems.

31. A computer-implemented method for training a neural network for defect detection in low resolution images, comprising: generating images for a specimen with high and low resolution imaging subsystems of an inspection tool, wherein the high and low resolution imaging subsystems comprise at least an energy source and a detector, wherein the energy source is configured to generate energy that is directed to the specimen, and wherein the detector is configured to detect energy from the specimen and to generate images responsive to the detected energy; wherein one or more components are executed by one or more computer systems, and wherein the one or more components comprise a high resolution neural network and a low resolution neural network; generating a training set of defect images, wherein at least one of the defect images is generated synthetically by the high resolution neural network using at least one of the images generated by the high resolution imaging subsystem; training the low resolution neural network using the training set of defect images as input; and detecting defects on another specimen by inputting the images generated for the other specimen by the low resolution imaging subsystem into the trained low resolution neural network, wherein generating the training set, training the low resolution neural network, and detecting the defects are performed by the one or more computer systems.
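
Claim 4 recites selecting one or more modes of the low resolution imaging subsystem based on results of training the low resolution neural network with images from more than one mode. The claims do not say how those results are scored or compared, so the following Python sketch is only one possible reading: it assumes each mode yields a single numeric score (for example, a validation accuracy) and keeps the highest-scoring modes. The function name `select_modes`, the mode labels, and the scores are all hypothetical.

```python
from typing import Dict, List


def select_modes(scores_by_mode: Dict[str, float], keep: int = 1) -> List[str]:
    """Return the `keep` modes with the highest training-result score.

    Assumption: a higher score means the low resolution network trained
    with that mode's images performed better.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ranked = sorted(
        scores_by_mode,
        key=lambda mode: scores_by_mode[mode],
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical per-mode scores obtained after training with each mode.
scores = {"mode_a": 0.81, "mode_b": 0.93, "mode_c": 0.88}
print(select_modes(scores, keep=2))  # ['mode_b', 'mode_c']
```

Ranking by a single score is the simplest choice; a real selection could weigh several training results together.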

Description:
TITLE: TRAINING A NEURAL NETWORK FOR DEFECT DETECTION IN LOW RESOLUTION IMAGES

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention generally relates to methods and systems for training a neural network for defect detection in low resolution images.

[0002] The following description and examples are not admitted to be prior art by virtue of their inclusion in this section.

[0003] Fabricating semiconductor devices such as logic and memory devices typically includes processing a substrate such as a semiconductor wafer using a large number of semiconductor fabrication processes to form various features and multiple levels of the semiconductor devices. For example, lithography is a semiconductor fabrication process that involves transferring a pattern from a reticle to a resist arranged on a semiconductor wafer. Additional examples of semiconductor fabrication processes include, but are not limited to, chemical-mechanical polishing (CMP), etch, deposition, and ion implantation. Multiple semiconductor devices may be fabricated in an arrangement on a single semiconductor wafer and then separated into individual semiconductor devices.

[0004] Inspection processes are used at various steps during a semiconductor manufacturing process to detect defects on wafers to drive higher yield in the manufacturing process and thus higher profits. Inspection has always been an important part of fabricating semiconductor devices. However, as the dimensions of semiconductor devices decrease, inspection becomes even more important to the successful manufacture of acceptable semiconductor devices because smaller defects can cause the devices to fail.

[0005] Inspection methods have effectively not changed for more than 20 years. Inspection solutions essentially have the following characteristics: a substantially slow electron beam type system that allows a user to identify and verify problems for physical defects; and separately a substantially fast but coarse optical inspector that covers the entire wafer, but is often limited to a single layer of inspection. These two systems are typically separate. Some inspection systems have a high resolution camera on the same system as a scanning low resolution sensor, but they are not integrated effectively to leverage each other for providing ground truth information.

[0006] Conventional lithographic scaling (at 193 nm) has slowed. In addition, extreme ultraviolet (EUV) based scaling while progressing is also happening slowly. Newer applications such as driverless cars, sensors, deep learning (DL) training and inference have resulted in a new focus on computational architectures instead of relying on scaling. As an example, for both high performance computing (HPC) and DL systems, the overall system performance would benefit from a close proximity of memory and central processing unit (CPU) logic. So computer architects are focusing more on chip-to-chip interconnects, wafer scale integration, etc., and re-distribution layers (RDL). These layers are often reconstituted dice, hence the currently used align and subtract defect detection methods will fail as inspection methods for such layers. Currently used segmentation techniques also have become difficult because the amount of nuisance compared to defects of interest (DOIs) is significantly high.

[0007] For RDL layers, optical mode selection to suppress nuisance often takes 2 weeks because mode selection is done by manually examining what happens with a high resolution camera with inputs from the user. A typical wafer may only contain 10-20 events that represent DOI whereas the nuisance rate can be in the 100,000 to million range. Therefore, the current methods for selecting optical mode(s) for RDL layer inspection take a prohibitively long time. In addition, the scarcity of DOI, particularly compared to nuisance, available for selecting and setting up the optical modes for RDL layer inspection can further increase the time required for the optical mode selection. Furthermore, the limited number of DOIs available for optical mode selection can result in sub-optimal optical mode parameters being selected for RDL layer inspection, which can diminish the performance capability of such inspection.
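
To make the scale of the imbalance between DOI and nuisance concrete, the short Python calculation below works out how rare a DOI is at both ends of the quoted ranges. It is only an illustrative sketch: the function name `doi_fraction` is hypothetical, and the lower nuisance bound is read as 100,000, which is an assumption about the figure quoted above.

```python
from fractions import Fraction


def doi_fraction(doi_count: int, nuisance_count: int) -> Fraction:
    """Fraction of all detected events that are defects of interest (DOIs)."""
    if doi_count < 0 or nuisance_count < 0:
        raise ValueError("counts must be non-negative")
    total = doi_count + nuisance_count
    if total == 0:
        raise ValueError("at least one event is required")
    return Fraction(doi_count, total)


# Most favorable end of the ranges: many DOIs, few nuisance events.
best = doi_fraction(doi_count=20, nuisance_count=100_000)
# Least favorable end of the ranges: few DOIs, many nuisance events.
worst = doi_fraction(doi_count=10, nuisance_count=1_000_000)

print(f"best case:  1 DOI per {round(1 / best):,} events")   # 1 DOI per 5,001 events
print(f"worst case: 1 DOI per {round(1 / worst):,} events")  # 1 DOI per 100,001 events
```

Under these assumed figures, even the most favorable case leaves roughly one DOI among five thousand events.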

[0008] Accordingly, it would be advantageous to develop systems and methods for training a neural network for defect detection in low resolution images that do not have one or more of the disadvantages described above.

SUMMARY OF THE INVENTION

[0009] The following description of various embodiments is not to be construed in any way as limiting the subject matter of the appended claims.

[0010] One embodiment relates to a system configured to train a neural network for defect detection in low resolution images. The system includes an inspection tool that includes a high resolution imaging subsystem and a low resolution imaging subsystem. The high and low resolution imaging subsystems include at least an energy source and a detector. The energy source is configured to generate energy that is directed to a specimen. The detector is configured to detect energy from the specimen and to generate images responsive to the detected energy.

[0011] The system also includes one or more computer subsystems configured for acquiring the images of the specimen generated by the high and low resolution imaging subsystems. In addition, the system includes one or more components executed by the one or more computer subsystems. The component(s) include a high resolution neural network and a low resolution neural network.

[0012] The one or more computer subsystems are configured for generating a training set of defect images. At least one of the defect images is generated synthetically by the high resolution neural network using at least one of the images generated by the high resolution imaging subsystem. The computer subsystem(s) are further configured for training the low resolution neural network using the training set of defect images as input. The computer subsystem(s) are also configured for detecting defects on another specimen by inputting the images generated for the other specimen by the low resolution imaging subsystem into the trained low resolution neural network. The system may be further configured as described herein.

[0013] Another embodiment relates to a computer-implemented method for training a neural network for defect detection in low resolution images. The method includes generating images for a specimen with high and low resolution imaging subsystems of an inspection tool, which are configured as described above. One or more components are executed by one or more computer systems, and the one or more components include a high resolution neural network and a low resolution neural network. The method includes the generating, training, and detecting steps described above. The generating, training, and detecting steps are performed by the one or more computer systems.

[0014] Each of the steps of the method described above may be further performed as described further herein. In addition, the embodiment of the method described above may include any other step(s) of any other method(s) described herein. Furthermore, the method described above may be performed by any of the systems described herein.

[0015] Another embodiment relates to a non-transitory computer-readable medium storing program instructions executable on one or more computer systems for performing a computer-implemented method for training a neural network for defect detection in low resolution images. The computer-implemented method includes the steps of the method described above. The computer-readable medium may be further configured as described herein. The steps of the computer-implemented method may be performed as described further herein. In addition, the computer-implemented method for which the program instructions are executable may include any other step(s) of any other method(s) described herein.
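
The generating, training, and detecting steps summarized above can be sketched as a simple control flow. The Python below is a toy illustration under stated assumptions, not the disclosed implementation: images are plain lists of numbers, the callable `synthesize_defect` merely stands in for the high resolution neural network, and the "low resolution network" is reduced to a single brightness threshold. Every name in the sketch is hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List

Image = List[float]  # stand-in for image pixels (hypothetical representation)


@dataclass
class Example:
    pixels: Image
    is_defect: bool
    synthetic: bool = False


def generate_training_set(
    high_res_images: List[Image],
    real_examples: List[Example],
    synthesize_defect: Callable[[Image], Image],
) -> List[Example]:
    """Step 1: combine real examples with synthetically generated defect images."""
    synthetic = [
        Example(synthesize_defect(img), is_defect=True, synthetic=True)
        for img in high_res_images
    ]
    return real_examples + synthetic


def train_low_res_model(training_set: List[Example]) -> float:
    """Step 2: 'train' a toy model, here a single mean-brightness threshold."""
    defect = [mean(e.pixels) for e in training_set if e.is_defect]
    clean = [mean(e.pixels) for e in training_set if not e.is_defect]
    if not defect or not clean:
        raise ValueError("need both defect and non-defect examples")
    return (mean(defect) + mean(clean)) / 2


def detect_defects(threshold: float, low_res_images: List[Image]) -> List[int]:
    """Step 3: indices of images from another specimen flagged as defective."""
    return [i for i, img in enumerate(low_res_images) if mean(img) > threshold]


def add_bright_pixel(img: Image) -> Image:
    """Toy defect synthesis: brighten the first pixel of a copy of the image."""
    out = list(img)
    out[0] += 4.0
    return out


real = [Example([0.0, 0.0, 0.0, 0.0], is_defect=False)]
training_set = generate_training_set([[0.0, 0.0, 0.0, 0.0]], real, add_bright_pixel)
threshold = train_low_res_model(training_set)
flagged = detect_defects(threshold, [[0.0] * 4, [4.0, 0.0, 0.0, 0.0]])
print(threshold, flagged)  # 0.5 [1]
```

The point of the sketch is the ordering of the three steps and the presence of at least one synthetic image in the training set; any real system would replace the threshold with a trained network.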

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] Further advantages of the present invention will become apparent to those skilled in the art with the benefit of the following detailed description of the preferred embodiments and upon reference to the accompanying drawings in which:

Figs. 1 and 1a are schematic diagrams illustrating side views of embodiments of a system configured as described herein;

Fig. 2 is a flow chart illustrating steps that may be performed by the embodiments described herein; and

Fig. 3 is a block diagram illustrating one embodiment of a non-transitory computer-readable medium storing program instructions for causing computer system(s) to perform a computer-implemented method described herein.

[0017] While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are herein described in detail. The drawings may not be to scale. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0018] The terms “design,” “design data,” and “design information” as used interchangeably herein generally refer to the physical design (layout) of an IC or other semiconductor device and data derived from the physical design through complex simulation or simple geometric and Boolean operations. In addition, an image of a reticle acquired by a reticle inspection system and/or derivatives thereof can be used as a “proxy” or “proxies” for the design. Such a reticle image or a derivative thereof can serve as a substitute for the design layout in any embodiments described herein that use a design. The design may include any other design data or design data proxies described in commonly owned U.S. Patent Nos. 7,570,796 issued on August 4, 2009 to Zafar et al. and 7,676,077 issued on March 9, 2010 to Kulkarni et al., both of which are incorporated by reference as if fully set forth herein. In addition, the design data can be standard cell library data, integrated layout data, design data for one or more layers, derivatives of the design data, and full or partial chip design data.

[0019] In addition, the “design,” “design data,” and “design information” described herein refers to information and data that is generated by semiconductor device designers in a design process and is therefore available for use in the embodiments described herein well in advance of printing of the design on any physical specimens such as reticles and wafers.

[0020] Turning now to the drawings, it is noted that the figures are not drawn to scale. In particular, the scale of some of the elements of the figures is greatly exaggerated to emphasize characteristics of the elements. It is also noted that the figures are not drawn to the same scale. Elements shown in more than one figure that may be similarly configured have been indicated using the same reference numerals. Unless otherwise noted herein, any of the elements described and shown may include any suitable commercially available elements.

[0021] One embodiment relates to a system configured to train a neural network for defect detection in low resolution images. One embodiment of such a system is shown in Fig. 1. The system includes one or more computer subsystems (e.g., computer subsystems 36 and 102) and one or more components 100 executed by the one or more computer subsystems. The one or more components include high resolution neural network 104 and low resolution neural network 106, which are configured as described further herein.

[0022] The system includes inspection tool 10 that includes a high resolution imaging subsystem and a low resolution imaging subsystem. In some embodiments, the inspection tool is configured as an optical inspection tool. However, the inspection tool may be configured as another type of inspection tool described further herein.

[0023] The term “low resolution,” as used herein, is generally defined as a resolution at which all of the patterned features on the specimen cannot be resolved. For example, some of the patterned features on the specimen may be resolved at a “low” resolution if their size is large enough to render them resolvable. However, low resolution does not render all patterned features on the specimens described herein resolvable. In this manner, a “low resolution,” as that term is used herein, cannot be used to generate information about patterned features on the specimen that is sufficient for applications such as defect review, which may include defect classification and/or verification, and metrology. In addition, a “low resolution imaging subsystem” as that term is used herein generally refers to an imaging subsystem that has a relatively low resolution (e.g., lower than defect review and/or metrology systems) in order to have relatively fast throughput. In this manner, a “low resolution image” may also be commonly referred to as a high throughput or HT image. Different kinds of imaging subsystems may be configured for a low resolution. For example, in order to generate images at higher throughput, the e/p and the number of frames may be lowered, thereby resulting in lower quality scanning electron microscope (SEM) images.

[0024] The “low resolution” may also be “low resolution” in that it is lower than a “high resolution” described herein. A “high resolution” as that term is used herein can be generally defined as a resolution at which all patterned features of the specimen can be resolved with relatively high accuracy. In this manner, all of the patterned features on the specimen can be resolved at the high resolution regardless of their size. As such, a “high resolution,” as that term is used herein, can be used to generate information about patterned features of the specimen that is sufficient for use in applications such as defect review, which may include defect classification and/or verification, and metrology. In addition, a “high resolution” as that term is used herein refers to a resolution that is generally not used by inspection systems during routine operation, which are configured to sacrifice resolution capability for increased throughput. A “high resolution image” may also be referred to in the art as a “high sensitivity image,” which is another term for a “high quality image.” Different kinds of imaging subsystems may be configured for a high resolution. For example, to generate high quality images, the e/p, frames, etc., may be increased, which generates good quality SEM images but lowers the throughput considerably. These images are then “high sensitivity” images in that they can be used for high sensitivity defect detection.

[0025] In contrast to images and imaging subsystems, neural networks often are not classified or referred to as having any particular “resolution.” Instead, the terms high and low resolution neural networks are used herein to identify two different neural networks, one trained and used for high resolution images and another trained and used for low resolution images. In other words, the high resolution neural network may be trained and used to perform one or more functions (e.g., defect detection) using high resolution images generated by a high resolution imaging subsystem as input while the low resolution neural network may be trained and used to perform one or more functions (e.g., defect detection) using low resolution images generated by a low resolution imaging subsystem as input. Otherwise, the high and low resolution neural networks may be similarly or differently configured, with their parameter(s) determined and set by the various steps described further herein.

[0026] In one embodiment, the specimen is a wafer. The wafer may include any wafer known in the art. Although some embodiments may be described herein with respect to a wafer in particular, it is to be clear that none of the embodiments described herein are limited to wafers.

[0027] The high and low resolution imaging subsystems include at least an energy source and a detector. The energy source is configured to generate energy that is directed to a specimen. The detector is configured to detect energy from the specimen and to generate images responsive to the detected energy. Various configurations of the high and low resolution imaging subsystems are described further herein.
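
The definitions of “low resolution” and “high resolution” given above turn on whether all patterned features on the specimen can be resolved. The Python sketch below encodes that distinction in a deliberately simplified form: it assumes each feature has a single size and that a resolution can be summarized by one smallest resolvable size, neither of which is stated in the text. The function name and the sample numbers are hypothetical.

```python
from typing import Iterable


def resolution_class(
    feature_sizes_nm: Iterable[float],
    smallest_resolvable_nm: float,
) -> str:
    """Return 'high' if every patterned feature is resolvable, else 'low'."""
    sizes = list(feature_sizes_nm)
    if not sizes:
        raise ValueError("at least one patterned feature is required")
    all_resolved = all(size >= smallest_resolvable_nm for size in sizes)
    return "high" if all_resolved else "low"


features = [40.0, 120.0, 900.0]  # hypothetical feature sizes in nm
print(resolution_class(features, smallest_resolvable_nm=30.0))   # high
print(resolution_class(features, smallest_resolvable_nm=200.0))  # low
```

In the second call the 900 nm feature is still resolvable, which mirrors the statement that some large features may be resolved even at a low resolution.
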
[0028] In general, the high and low resolution imaging subsystems may share some image forming elements of the inspection tool or none of the image forming elements of the inspection tool. For example, the high and low resolution imaging subsystems may share the same energy source and detector, and one or more parameters of the energy source, detector, and/or other image forming elements of the inspection tool may be altered depending on whether the high resolution imaging subsystem or the low resolution imaging subsystem is generating images of the specimen. In another example, the high and low resolution imaging subsystems may share some image forming elements of the inspection tool such as the energy source and may have other non-shared image forming elements such as separate detectors. In a further example, the high and low resolution imaging subsystems may share no common image forming elements. In one such example, the high and low resolution imaging subsystems may each have their own energy source, detector(s), and any other image forming elements that are not used or shared by the other imaging subsystem.

[0029] In the embodiment of the system shown in Fig. 1, the high resolution imaging subsystem includes an illumination subsystem configured to direct light to specimen 12. The illumination subsystem includes at least one light source. For example, as shown in Fig. 1, the illumination subsystem includes light source 14. The illumination subsystem is configured to direct the light to the specimen at one or more angles of incidence, which may include one or more oblique angles and/or one or more normal angles. For example, as shown in Fig. 1, light from light source 14 is directed through optical element 16 to beam splitter 18. Beam splitter 18 directs the light from optical element 16 to lens 20, which focuses the light to specimen 12 at a normal angle of incidence. The angle of incidence may include any suitable angle of incidence, which may vary depending on, for instance, characteristics of the specimen.

[0030] The illumination subsystem may be configured to direct the light to the specimen at different angles of incidence at different times. For example, the inspection tool may be configured to alter one or more characteristics of one or more elements of the illumination subsystem such that the light can be directed to the specimen at an angle of incidence that is different than that shown in Fig. 1. In one such example, the inspection tool may be configured to use one or more apertures (not shown) to control the angle(s) at which light is directed from lens 20 to the specimen.

[0031] In one embodiment, light source 14 may include a broadband light source. In this manner, the light generated by the light source and directed to the specimen may include broadband light. However, the light source may include any other suitable light source such as a laser, which may include any suitable laser known in the art and may be configured to generate light at any suitable wavelength(s) known in the art. In addition, the laser may be configured to generate light that is monochromatic or nearly-monochromatic. In this manner, the laser may be a narrowband laser. The light source may also include a polychromatic light source that generates light at multiple discrete wavelengths or wavebands.

[0032] Light from beam splitter 18 may be focused onto specimen 12 by lens 20. Although lens 20 is shown in Fig. 1 as a single refractive optical element, it is to be understood that, in practice, lens 20 may include a number of refractive and/or reflective optical elements that in combination focus the light to the specimen. The illumination subsystem of the high resolution imaging subsystem may include any other suitable optical elements (not shown). Examples of such optical elements include, but are not limited to, polarizing component(s), spectral filter(s), spatial filter(s), reflective optical element(s), apodizer(s), beam splitter(s), aperture(s), and the like, which may include any such suitable optical elements known in the art. In addition, the inspection tool may be configured to alter one or more of the elements of the illumination subsystem based on the type of illumination to be used for imaging.

[0033] Although the high resolution imaging subsystem is described above as including one light source and illumination channel in its illumination subsystem, the illumination subsystem may include more than one illumination channel. One of the illumination channels may include light source 14, optical element 16, and lens 20 as shown in Fig. 1, and another of the illumination channels (not shown) may include similar elements, which may be configured differently or the same, or may include at least a light source and possibly one or more other components such as those described further herein. If the light from different illumination channels is directed to the specimen at the same time, one or more characteristics (e.g., wavelength, polarization, etc.) of the light directed to the specimen by the different illumination channels may be different such that light resulting from illumination of the specimen by the different illumination channels can be discriminated from each other at the detector(s). In another instance, the illumination subsystem may include only one light source (e.g., source 14 shown in Fig. 1) and light from the light source may be separated into different paths (e.g., based on wavelength, polarization, etc.) by one or more optical elements (not shown) of the illumination subsystem. Light in each of the different paths may then be directed to the specimen. Multiple illumination channels may be configured to direct light to the specimen at the same time or at different times (e.g., when different illumination channels are used to sequentially illuminate the specimen). In another instance, the same illumination channel may be configured to direct light to the specimen with different characteristics at different times. For example, in some instances, optical element 16 may be configured as a spectral filter and the properties of the spectral filter can be changed in a variety of different ways (e.g., by swapping out the spectral filter) such that different wavelengths of light can be directed to the specimen at different times. The illumination subsystem may have any other suitable configuration known in the art for directing the light having different or the same characteristics to the specimen at different or the same angles of incidence sequentially or simultaneously.

[0035] The inspection tool may also include a scanning subsystem configured to cause the light to be scanned over the specimen. For example, the inspection tool may include stage 22 on which specimen 12 is disposed during imaging. The scanning subsystem may include any suitable mechanical and/or robotic assembly (that includes stage 22) that can be configured to move the specimen such that the light can be scanned over the specimen. In addition, or alternatively, the inspection tool may be configured such that one or more optical elements of the high resolution imaging subsystem perform some scanning of the light over the specimen. The light may be scanned over the specimen in any suitable fashion such as in a serpentine-like path or in a spiral path.

[0036] The high resolution imaging subsystem further includes one or more detection channels. At least one of the one or more detection channels includes a detector configured to detect light from the specimen due to illumination of the specimen by the illumination subsystem and to generate output responsive to the detected light. For example, the high resolution imaging subsystem shown in Fig. 1 includes a detection channel, formed by lens 20, element 26, and detector 28. Although the high resolution imaging subsystem is described herein as including a common lens used for both illumination and collection/detection, the illumination subsystem and the detection channel may include separate lenses (not shown) for focusing in the case of illumination and collection in the case of detection. The detection channel may be configured to collect and detect light at different angles of collection. For example, the angles of light that are collected and detected by the detection channel may be selected and/or altered using one or more apertures (not shown) that are positioned in a path of the light from the specimen. The light from the specimen that is detected by the detection channel of the high resolution imaging subsystem may include specularly reflected light and/or scattered light. In this manner, the high resolution imaging subsystem shown in Fig. 1 may be configured for dark field (DF) and/or bright field (BF) imaging.

[0037] Element 26 may be a spectral filter, an aperture, or any other suitable element or combination of elements that can be used to control the light that is detected by detector 28. Detector 28 may include any suitable detector known in the art such as a photo-multiplier tube (PMT), charge coupled device (CCD), and time delay integration (TDI) camera. The detector may also include a non-imaging detector or imaging detector. If the detector is a non-imaging detector, the detector may be configured to detect certain characteristics of the scattered light such as intensity but may not be configured to detect such characteristics as a function of position within the imaging plane. As such, the output that is generated by the detector may be signals or data, but not image signals or image data. A computer subsystem such as computer subsystem 36 may be configured to generate images of the specimen from the non-imaging output of the detector. However, the detector may be configured as an imaging detector that is configured to generate imaging signals or image data. Therefore, the high resolution imaging subsystem may be configured to generate the images described herein in a number of ways.

[0038] The high resolution imaging subsystem may also include another detection channel. For example, light from the specimen that is collected by lens 20 may be directed through beam splitter 18 to beam splitter 24, which may transmit a portion of the light to optical element 26 and reflect another portion of the light to optical element 30. Optical element 30 may be a spectral filter, an aperture, or any other suitable element or combination of elements that can be used to control the light that is detected by detector 32. Detector 32 may include any of the detectors described above. The different detection channels of the high resolution imaging subsystem may be configured to generate different images of the specimen (e.g., images of the specimen generated with light having different characteristics such as polarization, wavelength, etc. or some combination thereof).

[0039] In a different embodiment, the detection channel formed by lens 20, optical element 30, and detector 32 may be part of the low resolution imaging subsystem of the inspection tool. In this case, the low resolution imaging subsystem may include the same illumination subsystem as the high resolution imaging subsystem, which is described in detail above (e.g., the illumination subsystem that includes light source 14, optical element 16, and lens 20). The high and low resolution imaging subsystems may therefore share a common illumination subsystem. The high and low resolution imaging subsystems may however include different detection channels, each of which is configured to detect light from the specimen due to illumination by the shared illumination subsystem. In this manner, the high resolution detection channel may include lens 20, optical element 26, and detector 28, and the low resolution detection channel may include lens 20, optical element 30, and detector 32. In this manner, the high and low resolution detection channels may share a common optical element (lens 20) but also have non-shared optical elements.

[0040] The detection channels of the high and low resolution imaging subsystems may be configured to generate high and low resolution specimen images, respectively, even though they share an illumination subsystem. For example, optical elements 26 and 30 may be differently configured apertures and/or spectral filters that control the portions of the light that are detected by detectors 28 and 32, respectively, to thereby control the resolution of the images generated by detectors 28 and 32, respectively. In a different example, detector 28 of the high resolution imaging subsystem may be selected to have a higher resolution than detector 32. The detection channels may be configured in any other suitable way to have different resolution capabilities.
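
As a rough illustration of how an aperture setting can translate into a resolution difference between two channels, the sketch below applies the standard Rayleigh criterion (smallest resolvable separation of about 0.61 × wavelength / numerical aperture). The criterion itself is textbook optics and is not stated in this document; the function name, the wavelength, and the two numerical aperture values are hypothetical and chosen only for illustration.

```python
def rayleigh_resolution_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Smallest resolvable separation by the Rayleigh criterion: 0.61 * wavelength / NA."""
    if numerical_aperture <= 0:
        raise ValueError("numerical aperture must be positive")
    return 0.61 * wavelength_nm / numerical_aperture


# Hypothetical settings, assumed for illustration only (not taken from the document).
WAVELENGTH_NM = 266.0
HIGH_RES_NA = 0.9   # wider aperture, finer resolution
LOW_RES_NA = 0.3    # narrower aperture, coarser resolution

high = rayleigh_resolution_nm(WAVELENGTH_NM, HIGH_RES_NA)
low = rayleigh_resolution_nm(WAVELENGTH_NM, LOW_RES_NA)
print(f"high-NA setting resolves ~{high:.0f} nm; low-NA setting resolves ~{low:.0f} nm")
```

Under these assumed numbers the smaller aperture resolves only coarser features, which is the qualitative trade-off the two detection channels are described as exploiting.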

[0041] In another embodiment, the high and low resolution imaging subsystems may share all of the same image forming elements. For example, both the high and low resolution imaging subsystems may share the illumination subsystem formed by light source 14, optical element 16, and lens 20. The high and low resolution imaging subsystems may also share the same detection channel or channels (e.g., one formed by lens 20, optical element 26, and detector 28 and/or another formed by lens 20, optical element 30, and detector 32). In such an embodiment, one or more parameters or characteristics of any of these image forming elements may be altered depending on whether high or low resolution images are being generated for the specimen. For example, a numerical aperture (NA) of lens 20 may be altered depending on whether high or low resolution images are being formed of the specimen.

[0042] In a further embodiment, the high and low resolution imaging subsystems may not share any image forming elements. For example, the high resolution imaging subsystem may include the image forming elements described above, which may not be shared by the low resolution imaging subsystem. Instead, the low resolution imaging subsystem may include its own illumination and detection subsystems. In one such example, as shown in Fig. 1, the low resolution imaging subsystem may include an illumination subsystem that includes light source 38, optical element 40, and lens 44. Light from light source 38 passes through optical element 40 and is reflected by beam splitter 42 to lens 44, which directs the light to specimen 12. Each of these image forming elements may be configured as described above. The illumination subsystem of the low resolution imaging subsystem may be further configured as described herein. Specimen 12 may be disposed on stage 22, which may be configured as described above to cause scanning of the light over the specimen during imaging. In this manner, even if the high and low resolution imaging subsystems do not share any image forming elements, they may share other elements of the inspection tool such as the stage, scanning subsystem, power source (not shown), housing (not shown), etc.

[0043] The low resolution imaging subsystem may also include a detection channel formed by lens 44, optical element 46, and detector 48. Light from the specimen due to illumination by the illumination subsystem may be collected by lens 44 and directed through beam splitter 42, which transmits the light to optical element 46. Light that passes through optical element 46 is then detected by detector 48. Each of these image forming elements may be further configured as described above. The detection channel and/or detection subsystem of the low resolution imaging subsystem may be further configured as described herein.

[0044] It is noted that Fig. 1 is provided herein to generally illustrate configurations of high and low resolution imaging subsystems that may be included in the inspection tool or that may generate images that are used by the systems or methods described herein. The configurations of the high and low resolution imaging subsystems described herein may be altered to optimize the performance of the high and low resolution imaging subsystems as is normally performed when designing a commercial inspection tool. In addition, the systems described herein may be implemented using an existing system (e.g., by adding functionality described herein to an existing system) such as the Altair series of tools that are commercially available from KLA, Milpitas, Calif. For some such systems, the embodiments described herein may be provided as optional functionality of the system (e.g., in addition to other functionality of the system). Alternatively, the inspection tool described herein may be designed “from scratch” to provide a completely new inspection tool. The high and low resolution imaging subsystems may be further configured as described in U.S. Patent No. 7,782,452 issued August 24, 2010 to Mehanian et al., which is incorporated by reference as if fully set forth herein.

[0045] The system also includes one or more computer subsystems configured for acquiring the images of the specimen generated by the high and low resolution imaging subsystems. For example, computer subsystem 36 coupled to (or included in) the inspection tool may be coupled to the detectors of the inspection tool in any suitable manner (e.g., via one or more transmission media, which may include “wired” and/or “wireless” transmission media) such that the computer subsystem can receive the output or images generated by the detectors for the specimen. Computer subsystem 36 may be configured to perform a number of functions described further herein using the output or images generated by the detectors.

[0046] The computer subsystems shown in Fig. 1 (as well as other computer subsystems described herein) may also be referred to herein as computer system(s). Each of the computer subsystem(s) or system(s) described herein may take various forms, including a personal computer system, image computer, mainframe computer system, workstation, network appliance, Internet appliance, or other device. In general, the term “computer system” may be broadly defined to encompass any device having one or more processors, which executes instructions from a memory medium. The computer subsystem(s) or system(s) may also include any suitable processor known in the art such as a parallel processor. In addition, the computer subsystem(s) or system(s) may include a computer platform with high speed processing and software, either as a standalone or a networked tool.

[0047] If the system includes more than one computer subsystem, then the different computer subsystems may be coupled to each other such that images, data, information, instructions, etc. can be sent between the computer subsystems. For example, computer subsystem 36 may be coupled to computer subsystem(s) 102 as shown by the dashed line in Fig. 1 by any suitable transmission media, which may include any suitable wired and/or wireless transmission media known in the art. Two or more of such computer subsystems may also be effectively coupled by a shared computer-readable storage medium (not shown).

[0048] Although the high and low resolution imaging subsystems are described above as being optical or light-based imaging subsystems, the high and low resolution imaging subsystems may also or alternatively include electron beam imaging subsystem(s) configured to generate electron beam images of the specimen. In one such embodiment, the electron beam imaging subsystem(s) may be configured to direct electrons to or scan electrons over the specimen and to detect electrons from the specimen. In one such embodiment shown in Fig. 1a, the electron beam imaging subsystem includes electron column 122 coupled to computer subsystem 124.

[0049] As also shown in Fig. 1a, the electron column includes electron beam source 126 configured to generate electrons that are focused to specimen 128 by one or more elements 130. The electron beam source may include, for example, a cathode source or emitter tip, and one or more elements 130 may include, for example, a gun lens, an anode, a beam limiting aperture, a gate valve, a beam current selection aperture, an objective lens, and a scanning subsystem, all of which may include any such suitable elements known in the art.

[0050] Electrons returned from the specimen (e.g., secondary electrons) may be focused by one or more elements 132 to detector 134. One or more elements 132 may include, for example, a scanning subsystem, which may be the same scanning subsystem included in element(s) 130.

[0051] The electron column may include any other suitable elements known in the art. In addition, the electron column may be further configured as described in U.S. Patent Nos. 8,664,594 issued April 4, 2014 to Jiang et al., 8,692,204 issued April 8, 2014 to Kojima et al., 8,698,093 issued April 15, 2014 to Gubbens et al., and 8,716,662 issued May 6, 2014 to MacDonald et al., which are incorporated by reference as if fully set forth herein.

[0052] Although the electron column is shown in Fig. 1a as being configured such that the electrons are directed to the specimen at an oblique angle of incidence and are returned from the specimen at another oblique angle, it is to be understood that the electron beam may be directed to and detected from the specimen at any suitable angles. In addition, the electron beam imaging subsystem may be configured to use multiple modes to generate images of the specimen as described further herein (e.g., with different illumination angles, collection angles, etc.). The multiple modes of the electron beam imaging subsystem may be different in any image generation parameters. The electron column shown in Fig. 1a may also be configured to function as high and low resolution imaging subsystems in any suitable manner known in the art (e.g., by changing one or more parameters or characteristics of one or more elements included in the electron column so that high or low resolution images can be generated for the specimen).

[0053] Computer subsystem 124 may be coupled to detector 134 as described above. The detector may detect electrons returned from the surface of the specimen thereby forming electron beam images of the specimen. The electron beam images may include any suitable electron beam images. Computer subsystem 124 may be configured to perform one or more functions described further herein for the specimen using output generated by detector 134. A system that includes the electron beam imaging subsystem shown in Fig. 1a may be further configured as described herein.

[0054] It is noted that Fig. 1a is provided herein to generally illustrate a configuration of an electron beam imaging subsystem that may be included in the embodiments described herein. As with the optical imaging subsystems described above, the electron beam imaging subsystem configuration described herein may be altered to optimize the performance of the imaging subsystem as is normally performed when designing a commercial imaging subsystem. In addition, the systems described herein may be implemented using an existing system (e.g., by adding functionality described herein to an existing system) such as the tools that are commercially available from KLA. For some such systems, the embodiments described herein may be provided as optional functionality of the system (e.g., in addition to other functionality of the system). Alternatively, the system described herein may be designed “from scratch” to provide a completely new system.

[0055] Although the imaging subsystems are described above as being light-based or electron beam-based imaging subsystems, the imaging subsystems may be ion beam-based imaging subsystems. Such an imaging subsystem may be configured as shown in Fig. 1a except that the electron beam source may be replaced with any suitable ion beam source known in the art. In addition, the imaging subsystems may be any other suitable ion beam-based imaging subsystem such as those included in commercially available focused ion beam (FIB) systems, helium ion microscopy (HIM) systems, and secondary ion mass spectroscopy (SIMS) systems.

[0056] Although the inspection tools are described above as including high and low resolution imaging subsystems that are either optical, electron beam, or charged particle beam based, the high and low resolution imaging subsystems do not necessarily have to use the same type of energy. For example, the high resolution imaging subsystem may be an electron beam type imaging subsystem while the low resolution imaging subsystem may be an optical type imaging subsystem. Imaging subsystems that use different types of energy may be combined into a single inspection tool in any suitable manner known in the art.

[0057] As noted above, the imaging subsystems may be configured for directing energy (e.g., light, electrons) to and/or scanning energy over a physical version of the specimen thereby generating actual images for the physical version of the specimen. In this manner, the imaging subsystems may be configured as “actual” imaging systems rather than “virtual” systems. For example, a storage medium (not shown) and computer subsystem(s) 102 shown in Fig. 1 may be configured as a “virtual” system. Systems and methods configured as “virtual” inspection systems are described in commonly assigned U.S. Patent Nos. 8,126,255 issued on February 28, 2012 to Bhaskar et al. and 9,222,895 issued on December 29, 2015 to Duffy et al., both of which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these patents.

[0058] As further noted above, the imaging subsystems may be configured to generate images of the specimen with multiple modes. In general, a “mode” can be defined by the values of parameters of an imaging subsystem used for generating images of a specimen or the output used to generate images of the specimen. Therefore, modes that are different may be different in the values for at least one of the imaging parameters of the imaging subsystem. For example, in an optical imaging subsystem, different modes may use different wavelength(s) of light for illumination. The modes may be different in the illumination wavelength as described further herein (e.g., by using different light sources, different spectral filters, etc.) for different modes. Both the high and low resolution imaging subsystems may be capable of generating output or images for the specimen with different modes.

[0059] The high and low resolution neural networks may have a variety of different configurations described further herein. The high and low resolution neural networks may be configured as a network of deep learning (DL) systems. The high resolution neural network may perform one or more functions for a specimen using high resolution images generated for the specimen by the high resolution imaging subsystem. The low resolution neural network may perform one or more functions for a specimen using low resolution images generated for the specimen by the low resolution imaging subsystem.
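
The definition of a “mode” given in paragraph [0058] (a set of imaging parameter values, with two modes being different when at least one value differs) can be sketched as a small data structure. This is only an illustrative sketch: the class name `ImagingMode`, its four fields, and the example values are hypothetical and are not drawn from the document.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ImagingMode:
    """Hypothetical set of imaging parameters; the field choice is an assumption."""
    wavelength_nm: float
    polarization: str
    incidence_angle_deg: float
    collection_angle_deg: float


def differing_parameters(a: ImagingMode, b: ImagingMode) -> list:
    """Names of the parameters whose values differ between two modes."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def are_different_modes(a: ImagingMode, b: ImagingMode) -> bool:
    """Two modes are different if at least one parameter value differs."""
    return len(differing_parameters(a, b)) > 0


# Illustrative values only: same optics, different illumination wavelength.
mode_a = ImagingMode(wavelength_nm=266.0, polarization="linear",
                     incidence_angle_deg=0.0, collection_angle_deg=0.0)
mode_b = ImagingMode(wavelength_nm=355.0, polarization="linear",
                     incidence_angle_deg=0.0, collection_angle_deg=0.0)
print(differing_parameters(mode_a, mode_b), are_different_modes(mode_a, mode_b))
```

Making the dataclass frozen keeps a mode immutable once defined, which suits its role as a fixed label for how an image was acquired.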

[0060] As described further herein, the high resolution neural network may be used to generate defect images that are used to train the low resolution neural network that is then used for defect detection on a specimen using low resolution images of the specimen. In this manner, the embodiments described herein may be configured as a generalized patch based hybrid inspector using a network of DL systems. For example, the embodiments described herein may be a kind of hybrid inspector that identifies and classifies design and process systematic defects in semiconductor manufacturing processes using a network of DL systems that combine optical and possibly SEM and design patches. The term “systematic defects” is generally defined in the art as defects that are caused by an interaction between a process performed on the specimen and a design formed on the specimen. Therefore, “systematic” defects may be formed at multiple, repeating locations across a specimen.

[0061] Each of the high and low resolution neural networks may be a deep neural network with a set of weights that model the world according to the data that it has been fed to train it. Neural networks can be generally defined as a computational approach which is based on a relatively large collection of neural units loosely modeling the way a biological brain solves problems with relatively large clusters of biological neurons connected by axons. Each neural unit is connected with many others, and links can be enforcing or inhibitory in their effect on the activation state of connected neural units. These systems are self-learning and trained rather than explicitly programmed and excel in areas where the solution or feature detection is difficult to express in a traditional computer program.

[0062] Neural networks typically consist of multiple layers, and the signal path traverses from front to back. The multiple layers perform a number of algorithms or transformations. In general, the number of layers is not significant and is use case dependent. For practical purposes, a suitable range of layers is from 2 layers to a few tens of layers. Modern neural network projects typically work with a few thousand to a few million neural units and millions of connections. The goal of the neural network is to solve problems in the same way that the human brain would, although several neural networks are much more abstract. The neural networks may have any suitable architecture and/or configuration known in the art. In some embodiments, the neural networks may be configured as a deep convolutional neural network (DCNN) as described in “ImageNet Classification with Deep Convolutional Neural Networks” by Krizhevsky et al., NIPS, 2012, 9 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference.
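
The statement that links between neural units can be enforcing or inhibitory can be illustrated with a single unit. In the minimal sketch below, a positive weight stands in for an enforcing link and a negative weight for an inhibitory one; the sigmoid activation, the function names, and the numeric values are assumptions made for illustration and are not specified by the document.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a weighted sum into an activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_activation(inputs, weights, bias=0.0):
    """Activation of one neural unit: sigmoid of the weighted sum of its inputs.

    A positive weight acts as an enforcing link (raises the activation);
    a negative weight acts as an inhibitory link (lowers it).
    """
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)


# Illustrative values only.
baseline = unit_activation([0.0], [2.0])     # no incoming signal
enforced = unit_activation([1.0], [2.0])     # enforcing link is active
inhibited = unit_activation([1.0], [-2.0])   # inhibitory link is active
print(inhibited < baseline < enforced)
```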

[0063] The neural networks described herein belong to a class of computing commonly referred to as machine learning. Machine learning can be generally defined as a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data. In other words, machine learning can be defined as the subfield of computer science that “gives computers the ability to learn without being explicitly programmed.” Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms overcome following strictly static program instructions by making data driven predictions or decisions, through building a model from sample inputs.
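
To make “building a model from sample inputs” concrete, the sketch below fits a very simple model (one mean feature vector per label, i.e., a nearest-centroid classifier) and uses it to make a data driven prediction. This technique is swapped in purely for illustration and is not the neural network approach of the embodiments; the two features, the labels, and all values are hypothetical.

```python
from collections import defaultdict


def fit_centroids(samples, labels):
    """Build a model from sample inputs: one mean feature vector per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical patch features: (mean intensity, local contrast).
samples = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
labels = ["defect", "defect", "nuisance", "nuisance"]
model = fit_centroids(samples, labels)
print(predict(model, (0.7, 0.75)))
```

No decision rule is written by hand here; the boundary between the two labels follows entirely from the sample inputs, which is the point the paragraph makes.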

[0064] The neural networks described herein may be further configured as described in “Introduction to Statistical Machine Learning,” by Sugiyama, Morgan Kaufmann, 2016, 534 pages; “Discriminative, Generative, and Imitative Learning,” Jebara, MIT Thesis, 2002, 212 pages; and “Principles of Data Mining (Adaptive Computation and Machine Learning),” Hand et al., MIT Press, 2001, 578 pages; which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these references.

[0065] The neural networks described herein may also or alternatively belong to a class of computing commonly referred to as DL. Generally speaking, “DL” (also known as deep structured learning, hierarchical learning or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high level abstractions in data. In a simple case, there may be two sets of neurons: ones that receive an input signal and ones that send an output signal. When the input layer receives an input, it passes on a modified version of the input to the next layer. In a DL based model, there are many layers between the input and output (and the layers are not made of neurons but it can help to think of it that way), allowing the algorithm to use multiple processing layers, composed of multiple linear and non-linear transformations.
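
The idea of multiple processing layers, each composed of a linear and a non-linear transformation and each passing a modified version of its input to the next layer, can be sketched as follows. The ReLU non-linearity, the function names, and the layer weights are assumptions chosen for illustration; the document does not prescribe them.

```python
def linear(x, weights, bias):
    """Linear transformation: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    """Non-linear transformation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def forward(x, layers):
    """Pass the input through each layer in order, front to back.

    Every layer (including the last one in this sketch) applies a linear
    transformation followed by ReLU and hands the result to the next layer.
    """
    for weights, bias in layers:
        x = relu(linear(x, weights, bias))
    return x


# Two hypothetical layers: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.0]),
]
print(forward([2.0, 1.0], layers))
```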

[0066] DL is part of a broader family of machine learning methods based on learning representations of data. An observation (e.g., an image) can be represented in many ways such as a vector of intensity values per pixel, or in a more abstract way as a set of edges, regions of particular shape, etc. Some representations are better than others at simplifying the learning task (e.g., face recognition or facial expression recognition). One of the promises of DL is replacing handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.

[0067] Research in this area attempts to make better representations and create models to learn these representations from large-scale unlabeled data. Some of the representations are inspired by advances in neuroscience and are loosely based on interpretation of information processing and communication patterns in a nervous system, such as neural coding which attempts to define a relationship between various stimuli and associated neuronal responses in the brain.

[0068] In one embodiment, the high resolution neural network is configured as a semi-supervised DL framework. In another embodiment, the low resolution neural network is configured as a semi-supervised DL framework. For example, a semi-supervised state of the networks can be used in the DL networks described herein. Such a DL framework may be configured for a two-level process using both supervised label information and unsupervised structure information to jointly make decisions on channel selection.
For example, label information may be used in feature extraction and unlabeled information may be integrated to regularize the supervised training. In this way, both supervised and unsupervised information may be used during the training process to reduce model variance. A generative model such as a Restricted Boltzmann Machine (RBM) may be used to extract representative features and reduce the data dimensionality, which can greatly diminish the impact of scarcity of labeled information. An initial channel selection procedure utilizing only unsupervised information may remove irrelevant channels with little structure information and reduce data dimensionality. Based on the results from the initial channel selection, a fine channel selection procedure can be used to handle noisy channel problems. Therefore, such a DL framework may be particularly useful for handling information that is very noisy, which may be the case for some of the specimens described further herein. The DL frameworks may be further configured as described in “A Novel Semi-supervised Deep Learning Framework for Affective State Recognition on EEG Signals,” by Jia et al., BIBE '14 Proceedings of the 2014 IEEE International Conference on Bioinformatics and Bioengineering, pp. 30-37, November 10-12, 2014, IEEE Computer Society, Washington, DC, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference.
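As a rough illustration of the initial, unsupervised channel selection step, the sketch below drops channels whose values carry little structure. Per-channel variance is used here only as an assumed stand-in for “structure information”, and the function name, channel names, and threshold are hypothetical; the cited reference may define the selection criterion differently.

```python
from statistics import pvariance
from typing import Dict, List


def select_channels(channels: Dict[str, List[float]],
                    min_variance: float = 1e-3) -> List[str]:
    """Keep channels whose population variance reaches min_variance.

    No labels are used, so this is a purely unsupervised filter.
    Variance is only an assumed proxy for structure information.
    """
    return [name for name, values in channels.items()
            if pvariance(values) >= min_variance]


# Hypothetical per-channel samples: one flat channel, two structured ones.
example_channels = {
    "mode_a": [0.50, 0.50, 0.50, 0.50],
    "mode_b": [0.10, 0.90, 0.20, 0.80],
    "mode_c": [0.40, 0.60, 0.45, 0.55],
}
kept = select_channels(example_channels)
```

A fine selection step using label information would then operate only on the channels in `kept`.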

[0069] The embodiments described herein may essentially divide and conquer the noise or nuisance suppression vs. defect of interest (DOI) detection problem. For example, the computer subsystem(s) described herein can perform a kind of iterative training in which training is first performed for nuisance suppression then DOI detection. “Nuisances” (which is sometimes used interchangeably with “nuisance defects”) as that term is used herein is generally defined as defects that a user does not care about and/or events that are detected on a specimen but are not really actual defects on the specimen. Nuisances that are not actually defects may be detected as events due to non-defect noise sources on the specimen (e.g., grain in metal lines on the specimen, signals from underlying layers or materials on the specimen, line edge roughness (LER), relatively small critical dimension (CD) variation in patterned features, thickness variations, etc.) and/or due to marginalities in the inspection subsystem itself or its configuration used for inspection.

[0070] The term “DOI” as used herein can be defined as defects that are detected on a specimen and are really actual defects on the specimen. Therefore, the DOIs are of interest to a user because users generally care about how many and what kind of actual defects are on specimens being inspected. In some contexts, the term “DOI” is used to refer to a subset of all of the actual defects on the specimen, which includes only the actual defects that a user cares about. For example, there may be multiple types of DOIs on any given wafer, and one or more of them may be of greater interest to a user than one or more other types. In the context of the embodiments described herein, however, the term “DOIs” is used to refer to any and all real defects on a wafer.

[0071] Generally, therefore, the goal of inspection is not to detect nuisances on specimens.
Despite substantial efforts to avoid such detection of nuisances, it is practically impossible to eliminate such detection completely. Therefore, it is important to identify which of the detected events are nuisances and which are DOIs such that the information for the different types of defects can be used separately, e.g., the information for the DOIs may be used to diagnose and/or make changes to one or more fabrication processes performed on the specimen, while the information for the nuisances can be ignored, eliminated, or used to diagnose noise on the specimen and/or marginalities in the inspection process or

[0072] It is far easier to tackle the nuisance suppression problem, be it based on modes (i.e., the image acquisition) or algorithms (i.e., the image processing), if one focuses on minimizing noise. Noise is present in abundance in the low resolution images of the specimens described herein. Grain, for example, is substantially susceptible to producing noise in low NA images whereas it tends to get washed out in high resolution imaging, which of course suffers from a much lower throughput compared to low resolution images. In particular, “grain” as that term is used herein refers to dislocations in the crystalline structure of a metal (such as aluminum or copper). As a result, when grain is present in a metal being imaged, instead of “seeing” a smooth surface, there are numerous discontinuities which at a relatively low NA tend to stand out. In contrast, at a relatively high NA, the discontinuities tend to get washed out (e.g., diminished).

[0073] In the various training steps described further herein, images (high or low resolution depending on the neural network being trained) as well as other information can be input to the neural network being trained. For example, the other information may include information for the design of the specimen (e.g., the design data itself or some other data relevant to the design) and process information, which may include any information for any of the processes performed on the specimen prior to imaging by the inspection tool. Using such additional information for training may be advantageous for a couple of reasons. For example, design information can be useful in reducing prior layer defects (i.e., defects that the user is not interested in for purposes of the current layer inspection). In some such instances, rules can be entered into the neural network via “rules defined by knowledge a priori” or “learned based on the segmentation information provided by design.” (“Segmentation” information as that term is used herein is generally defined as the information that is used to inspect different areas on the specimen differently, e.g., by separating images of the specimen into different segments, the inspection of which is determined based on design or other information for the segments.) In addition, re-distribution (RDL) layers in the back end of the semiconductor device fabrication process are somewhat simple (compared to the front end), e.g., they typically include 3 or 4 layers, and therefore “process” knowledge can also be added as an input for these layers both for identifying potential modes as well as inputs for the kinds of defects that are process induced. This information will therefore relate to the nature of the process knowledge as compared to design which is geometrical. The process information may be input as labels or rules or even text that gets merged with the DL network.

[0074] The computer subsystem(s) may be configured for training the high resolution neural network. Training the high resolution neural network may be performed in a supervised, semi-supervised, or unsupervised manner.
For example, in a supervised training method, one or more images of the specimen may be annotated with labels that indicate noise or noisy areas in the image(s) and quiet (non-noisy) areas in the image(s). The labels may be assigned to the image(s) in any suitable manner (e.g., by a user, using a ground truth method, or using a defect detection method or algorithm known to be capable of separating defects from noise in the high resolution images with relatively high accuracy). The image(s) and their labels may be input to the high resolution neural network for the training in which one or more parameters of the high resolution neural network are altered until the output of the high resolution neural network matches the training input.
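The supervised loop described above (alter parameters until the output matches the labels) can be sketched with a deliberately tiny model. This is a minimal illustration, not the network of the embodiments: a single-feature logistic classifier stands in for the high resolution neural network, and the per-patch “roughness” feature, the data values, and all function names are assumptions made for the example.

```python
import math
from typing import List, Tuple


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return 1 (noisy area) or 0 (quiet area) for one feature vector."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0)


def train(features: List[List[float]], labels: List[int],
          lr: float = 1.0, max_epochs: int = 5000) -> Tuple[List[float], float]:
    """Batch gradient descent on the logistic loss.

    Parameters are altered until every output matches its label,
    or until max_epochs is reached.
    """
    n = len(features)
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(max_epochs):
        if all(predict(w, b, x) == y for x, y in zip(features, labels)):
            break
        errors = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
                  for x, y in zip(features, labels)]
        for j in range(len(w)):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, features)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical annotated patches: one roughness value each, 1 = noisy, 0 = quiet.
patch_features = [[0.1], [0.2], [0.8], [0.9]]
patch_labels = [0, 0, 1, 1]
weights, bias = train(patch_features, patch_labels)
outputs = [predict(weights, bias, x) for x in patch_features]
```

Stopping as soon as the outputs match the labels mirrors the wording of the paragraph; a practical training run would normally also monitor a held-out validation set.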

[0075] In an unsupervised training method, unlabeled images may be input to the high resolution neural network for the training and the high resolution neural network may use the images to identify noise in the images. For example, due to the high resolution of the images input to the high resolution neural network, the high resolution images can act as a kind of ground truth information suitable for identifying nuisance areas on the specimen and non-nuisance areas on the specimen and/or by performing a defect detection and/or classification method that separates nuisances from defects, the nuisance and non-nuisance areas on the specimen can be identified. The training may then include altering one or more parameters of the high resolution neural network as described above.

[0076] In one embodiment, the images generated by the high resolution imaging subsystem used for training the high resolution neural network include images of the specimen generated by more than one mode of the high resolution imaging subsystem. The number of modes for which images are generated, acquired, and used in this and other steps described herein may vary depending on the various, possible configuration settings of the inspection tool and/or what is simply practical from a time and/or storage space consideration.

[0077] In some embodiments, the inspection tool is configured for scanning swaths on the specimen while detecting energy from the specimen, and the one or more computer subsystems are configured for acquiring and storing at least three of the swaths of the images generated by the high resolution imaging subsystem such that the at least three of the swaths are available for use in generating the training set of defect images. The inspection tool may scan the swaths on the specimen as described further herein, and the output or images generated by scanning a swath may be referred to as a swath of output or images. The embodiments described herein are capable of storing an entire row of data (i.e., swaths of images or data that cover entire dies in an entire row on the specimen) for multiple, e.g., 30, modes in both high and low resolution modes. If there is not sufficient storage to store all high resolution patches in the swaths, three swaths (e.g., top, center, and bottom) can be stored. The high resolution images may be scanned for at least three entire swaths and stored on the macro inspector version of the virtual inspector simultaneously as such a system can store the low resolution images described further herein as well.

[0078] The images from the multiple modes may be input to the high resolution neural network for training as described further herein. The images from different modes may be used separately or in combination for training the high resolution neural network. For example, images generated by different modes may be used as multiple channel inputs in the training step. The images from different modes may be used in combination to identify nuisances versus non-nuisances in the images and/or on the specimen.
The high resolution neural network parameter(s) may then be altered to suppress detection of such nuisances in the images by the high resolution neural network. In another example, images generated in one or more of the different modes may be used for training the high resolution neural network, which may be performed as described herein, and then the high resolution neural network may be re-trained with the images generated by another or others of the different modes. In this manner, the high resolution neural network may be trained using a kind of transfer learning from one or more modes to another one or more modes.

[0079] The one or more computer subsystems are configured for generating a training set of defect images. At least one of the defect images is generated synthetically by the high resolution neural network using at least one of the images generated by the high resolution imaging subsystem. Generating the training set may be performed as described herein. The defect images included in the training set may include various types of defect images described further herein. The at least one defect image may be generated synthetically as described further herein.

[0080] The one or more computer subsystems are configured for training the low resolution neural network using the training set of defect images as input. In this manner, the computer subsystem(s) may be configured for performing a type of transfer learning of the information produced by training the high resolution neural network to the low resolution neural network. For example, one advantage of the embodiments described herein is that they provide systems and methods for inspection of semiconductor devices using efficiently trainable neural networks with a limited training set. To this end, a series of transfer learning methods can be used to enable and accelerate the efficient training of neural networks in a principled manner.

[0081] Transfer learning can be generally defined as the improvement of learning a new task (or a target task) through the transfer of knowledge from a related task that has already been learned (one or more source tasks). In the embodiments described herein, therefore, training the high resolution neural network may involve learning the one or more source tasks, and training the low resolution neural network may be performed with the results of training the high resolution neural network to thereby transfer the knowledge from the source tasks (the high resolution neural network training) to the target task (the low resolution neural network training). In transfer learning, the agent knows nothing about a target task (or even that there will be a target task) while it is learning a source task. For instance, in the embodiments described herein, the high resolution neural network knows nothing about the low resolution neural network while it is being trained.

[0082] The transfer learning described herein may be performed in any suitable manner. For example, in an inductive learning task, the objective is to induce a predictive model from a set of training examples. Transfer in inductive learning works by allowing source-task knowledge to affect the target task's inductive bias. In an inductive transfer method, the target-task inductive bias is chosen or adjusted based on the source-task knowledge. The way this is done varies depending on which inductive learning algorithm is used to learn the source and target tasks.

[0083] Inductive transfer can be viewed as not only a way to improve learning in a standard supervised-learning task, but also as a way to offset the difficulties posed by tasks that involve relatively small datasets.
That is, if there are relatively small amounts of data or class labels for a task, treating it as a target task and performing inductive transfer from a related source task can lead to more accurate models. These approaches therefore use source-task data to enhance target-task learning despite the fact that the two datasets are assumed to come from different probability distributions.
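One simple way to picture “adjusting the target-task inductive bias based on source-task knowledge” is to fit a small target dataset while penalizing distance from a weight learned on the source task. The sketch below does this for a one-parameter linear model with a closed-form solution. The model, the penalty form, the data, and the name `fit_with_source_prior` are all assumptions chosen for illustration; the embodiments do not specify this particular method.

```python
from typing import List


def fit_with_source_prior(x: List[float], y: List[float],
                          w_source: float, strength: float) -> float:
    """Fit y ~ w * x while penalizing distance from a source-task weight.

    Minimizes sum((w*x_i - y_i)**2) + strength * (w - w_source)**2
    and returns the closed-form minimizer.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return (sxy + strength * w_source) / (sxx + strength)


w_source = 2.0            # weight assumed to come from the source task
target_x = [1.0, 2.0]     # small target-task dataset
target_y = [2.2, 3.8]
w_alone = fit_with_source_prior(target_x, target_y, w_source, strength=0.0)
w_transfer = fit_with_source_prior(target_x, target_y, w_source, strength=5.0)
```

With `strength=0.0` the fit uses the target data alone; a larger `strength` pulls the result toward the source-task weight, which is the sense in which the source task biases the target model.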

[0084] Transfer learning as described herein can be further performed as described in “Transfer Learning,” Torrey et al., Handbook of Research on Machine Learning Applications, published by IGI Global, edited by E. Soria, J. Martin, R. Magdalena, M. Martinez and A. Serrano, 2009, 22 pages, and “How transferable are features in a deep neural network?” Yosinski et al., NIPS 2014, November 6, 2014, 14 pages, which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these references.

[0085] In one embodiment, the training set of defect images includes images of the specimen generated by more than one mode of the low resolution imaging subsystem. For example, the images used for training the low resolution neural network may include low resolution images generated using 30 or more modes of the low resolution imaging subsystem. The computer subsystem(s) described herein are capable of storing such a high volume of low resolution image data. The multi-mode low resolution images may be generated as described herein and used for training as described herein. The multiple modes of the low resolution imaging subsystem whose images are used in the embodiments described herein may be configured and selected as described further herein.

[0086] In some embodiments, the computer subsystem(s) are configured for training the high resolution neural network, and training the high resolution neural network and training the low resolution neural network are performed using a generative adversarial network (GAN) or a variational Bayesian method. For example, a generative high resolution as well as low resolution neural network can be created by looking first at just the nuisance spaces. Such systems can be GANs or variational networks or the like. In particular, the training architecture used by the embodiments described herein is preferably designed to converge to the ground truth (for validation samples) with the minimum number of samples.

[0087] In one such embodiment, the one or more components include one or more additional components, the training of the high and/or low resolution neural networks is performed using the one or more additional components, and the one or more additional components include a common mother network, a grand common mother network, an adversarial network, a GAN, a deep adversarial generative network, an adversarial autoencoder, a Bayesian Neural Network, a component configured for a variational Bayesian method, a ladder network, or some combination thereof. For example, the transfer learning methods that may be used in the embodiments described herein include: using a common mother network for back end of line (BEOL) layers; using a grand common mother network for BEOL layers (will likely work on SEM); using an adversarial network to accelerate training; using a Bayesian Neural Network (Variational Bayes), which requires far fewer layers; and using the concept of the ladder network for training. The embodiments described herein may be configured for accelerating training by “legally amplifying” samples. These methods are also known as semi-supervised (a few examples are available, but the vast majority are not labeled by humans or ground truth).

[0088] The computer subsystem(s) can also use methods such as semi-supervised methods that combine Bayesian generative modeling to achieve their results in a minimum number of samples. Examples of such methods are described in U.S. Patent Application Publication No. 2017/0148226 published May 25, 2017 by Zhang et al., and “Semi-supervised Learning with Deep Generative Models,” Kingma et al., NIPS 2014, October 31, 2014, pp. 1-9, which are incorporated by reference as if fully set forth herein.
The embodiments described herein may be further configured as described in these references. In addition, the computer subsystem(s) may leverage ladder networks where supervised and unsupervised learning are combined in deep neural networks such as the ones proposed in “Semi-Supervised Learning with Ladder Networks,” Rasmus et al., NIPS 2015, November 24, 2015, pp. 1-19, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference. The computer subsystem(s) described herein may be further configured to train the low resolution neural network using a deep adversarial generative network of the type described in “Generative Adversarial Nets,” Goodfellow et al., June 10, 2014, pp. 1-9, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference. In addition or alternatively, the computer subsystem(s) described herein may be configured to train the low resolution neural network using an adversarial autoencoder (a method that combines a variational autoencoder (VAE) and a deep generative adversarial network (DGAN)) such as that described in “Adversarial Autoencoders,” Makhzani et al., arXiv:1511.05644v2, May 25, 2016, 16 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference. In some instances, the computer subsystem(s) may be configured to perform Bayesian Learning as described in “Bayesian Learning for Neural Networks,” Neal, Springer-Verlag New York, 1996, 204 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference. The computer subsystem(s) may also be configured to perform the variational Bayes method as described in “The Variational Bayes Method in Signal Processing,” Smidl, Springer-Verlag Berlin Heidelberg, 2006, 228 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference.

[0089] In another embodiment, the images generated by the low resolution imaging subsystem and acquired by the one or more computer subsystems include images taken through focus, the one or more computer subsystems are configured for mapping the images taken through focus to the images generated by the high resolution imaging subsystem, and training the low resolution neural network is performed based on the results of training the high resolution neural network and results of the mapping. For example, the computer subsystem(s) can exploit patches of image data taken through focus for the purpose of resolving the many-to-one mapping representation problem between low resolution (many images can represent) to high resolution (the ground truth). Identifying the high and low resolution images that correspond to the same area on the specimen (and therefore correspond to each other) may therefore be facilitated using the low resolution images taken at multiple focus settings.

[0090] Some such embodiments may be performed using volumetric inspection techniques. In general, volumetric inspection includes using an inspection tool to collect intensity data sets at a plurality of focus settings from each of a plurality of xy positions of the sample. A polynomial equation having a plurality of coefficients is extracted for each of the xy position's collected intensity data sets as a function of focus setting. Each of the coefficients' set of values for the plurality of xy positions is represented with a corresponding coefficient image plane. A target set of coefficient image planes and a reference set of coefficient image planes are then analyzed to detect defects on the sample. In this manner, a tuple of volumetric images can be transformed into a Fourier spatial domain for the purpose of separating signal from noise. Volumetric inspection may be further performed as described in U.S. Patent Application Publication No. 2016/0209334 by Chen et al. published on July 21, 2016, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this publication. In the embodiments described herein, the optical based output of the low resolution imaging subsystem may include volumetric stacks of optical images (e.g., between 3 to 5 z stacks) to enable a solution to the so called “one-to-many mapping” problem in optical space. The optical system tuple concept can also be extended to include other optical modes besides the z focus images such as different wavelengths and apertures.
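The coefficient-plane construction described above can be sketched as follows: for each xy position, a polynomial in the focus setting z is fit to the intensities of the through-focus stack, and each coefficient is written into its own image plane. This is a minimal sketch under stated assumptions. The quadratic degree, the least-squares fit via normal equations, the thresholded plane comparison in `flag_defects`, and all names are illustrative choices, not the method of the cited publication, and the Fourier-domain step mentioned in the text is not shown.

```python
from typing import List

Image = List[List[float]]  # image[y][x] = intensity at one focus setting


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(z: List[float], values: List[float], degree: int) -> List[float]:
    """Least-squares c[0] + c[1]*z + ...; needs more than `degree` distinct z."""
    k = degree + 1
    ata = [[sum(zi ** (i + j) for zi in z) for j in range(k)] for i in range(k)]
    atb = [sum(v * zi ** i for zi, v in zip(z, values)) for i in range(k)]
    return solve_linear(ata, atb)


def coefficient_planes(stack: List[Image], z: List[float], degree: int = 2) -> List[Image]:
    """Return one image plane per polynomial coefficient."""
    height, width = len(stack[0]), len(stack[0][0])
    planes = [[[0.0] * width for _ in range(height)] for _ in range(degree + 1)]
    for y in range(height):
        for x in range(width):
            coeffs = fit_polynomial(z, [img[y][x] for img in stack], degree)
            for k, c in enumerate(coeffs):
                planes[k][y][x] = c
    return planes


def flag_defects(target: List[Image], reference: List[Image],
                 threshold: float) -> List[List[bool]]:
    """Flag positions where any coefficient plane differs by more than threshold."""
    height, width = len(target[0]), len(target[0][0])
    return [[any(abs(t[y][x] - r[y][x]) > threshold for t, r in zip(target, reference))
             for x in range(width)] for y in range(height)]


# Hypothetical 1 x 2 pixel stacks taken at three focus settings.
focus = [-1.0, 0.0, 1.0]
reference_stack = [[[0.5, 0.5]], [[1.0, 1.0]], [[0.5, 0.5]]]
target_stack = [[[0.5, 0.1]], [[1.0, 1.0]], [[0.5, 0.9]]]
reference_planes = coefficient_planes(reference_stack, focus)
target_planes = coefficient_planes(target_stack, focus)
flags = flag_defects(target_planes, reference_planes, threshold=0.1)
```

In this toy data the second pixel of the target has the same in-focus intensity as the reference but a different linear focus response, so it is only distinguishable because the full stack is used.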

[0091] It is believed that a DL approach is superior particularly when combined with an optical volumetric stack of z images. For example, the neural network(s) described herein may have a one-to-many transformation problem. In particular, since virtually all optical systems can be modeled as a thin film stack, variations in the film stack along with variations in the wafer topography may cause a one-to-many mapping when going from high resolution imaging to low resolution imaging. All these variations could be learned, but they may also be a source of noise particularly if they occur locally (e.g., due to local color variations). There are a myriad of hand crafted algorithms to cope with these but none of them are totally effective. The volumetric stack of images can help to mitigate the one-to-many mapping problem and to “shore up” the signal. For example, the volumetric information captures “phase” information in addition to “intensity” information from the optical images. In contrast, normal optical based inspection only works from “intensity,” which is a cause of ambiguity (the “many” in the “one-to-many” mapping). Therefore, the embodiments described herein can exploit patches of image data taken through focus for the purpose of resolving the many-to-one mapping representation problem between low resolution (many images can represent) to high resolution (the ground truth).

[0092] It is quite easy to design a system that catches no defects - quite useless too. Fortunately, for the applications described herein, the nature of the key defects in the layers of the specimen being inspected, e.g., an RDL layer, is well understood. For example, the known DOIs may include opens, shorts, protrusions, and intrusions.

[0093] The training set of defect images may include a variety of information for the known DOIs including high resolution images.
For example, the training set may include design information (design patches, computer-aided design (CAD) design data, rendered design data, design context information) for each of the known DOIs. The training set may also include other images such as test images, reference images, difference images, segmentation images, etc. for each of the known DOIs. The training set may also include defect information such as defect classification, size, shape, location, etc. In general, the training set may include any information related to the known DOIs that may be input to the high and low resolution neural networks during training and/or runtime.

[0094] The known DOIs may include a number of different kinds of DOIs described herein from a number of different sources. In general, the known DOIs in the training set may include known DOIs identified by one or more methods or systems. The known DOIs preferably include (when possible) two or more examples of each type of known DOI (e.g., two or more open examples, two or more short examples, etc.).

[0095] The training may include inputting the information for the training set of known DOIs into the high and/or low resolution neural network and altering one or more parameters of the high and/or low resolution neural network until the output produced by the high and/or low resolution neural network for the known DOIs matches (or substantially matches) the information for the known DOIs in the training set. Training the high and/or low resolution neural network may also include a kind of re-training, which may include transferring all weights of some layers (e.g., convolutional layers) of the high and/or low resolution neural network and fine tuning weights of other layers (e.g., fully connected layers) of the high and/or low resolution neural network. Training may, however, include altering any one or more trainable parameters of the neural network. For example, the one or more parameters of the neural networks that are trained by the embodiments described herein may include one or more weights for any layer of the neural networks that has trainable weights. In one such example, the weights may include weights for convolution layers but not pooling layers.
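The re-training variant described above (transfer the convolutional weights unchanged, fine tune only the fully connected weights) can be illustrated with a toy one-dimensional model. This is a sketch under stated assumptions: the two edge-detecting kernels, the 1-D “line profile” signals, the logistic head, and every name in the code are hypothetical and far smaller than any network contemplated by the embodiments.

```python
import math
from typing import List, Tuple

Kernel = List[float]


def conv_features(signal: List[float], kernels: List[Kernel]) -> List[float]:
    """Valid 1-D correlation per kernel, then ReLU and global max pooling."""
    feats = []
    for kernel in kernels:
        responses = [sum(k * signal[i + j] for j, k in enumerate(kernel))
                     for i in range(len(signal) - len(kernel) + 1)]
        feats.append(max(0.0, max(responses)))
    return feats


def head_score(w: List[float], b: float, feats: List[float]) -> float:
    """Output of the fully connected head before thresholding."""
    return sum(wj * fj for wj, fj in zip(w, feats)) + b


def fine_tune_head(kernels: List[Kernel], w0: List[float], b0: float,
                   signals: List[List[float]], labels: List[int],
                   lr: float = 1.0, epochs: int = 2000) -> Tuple[List[float], float]:
    """Gradient descent on the head only; the transferred kernels stay frozen."""
    feats = [conv_features(s, kernels) for s in signals]  # kernels are only read
    w, b, n = w0[:], b0, len(signals)
    for _ in range(epochs):
        errs = [1.0 / (1.0 + math.exp(-head_score(w, b, f))) - y
                for f, y in zip(feats, labels)]
        w = [wj - lr * sum(e * f[j] for e, f in zip(errs, feats)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b


# Hypothetical kernels "transferred" from a previously trained network.
source_kernels = [[-1.0, 1.0], [1.0, -1.0]]
signals = [[0.5] * 6, [0.5, 0.5, 1.0, 1.0, 0.5, 0.5]]  # flat line, line with a bump
labels = [0, 1]
tuned_w, tuned_b = fine_tune_head(source_kernels, [0.0, 0.0], 0.0, signals, labels)
predictions = [int(head_score(tuned_w, tuned_b, conv_features(s, source_kernels)) > 0)
               for s in signals]
```

Because the pooling step has no weights, it is untouched in either training mode, which matches the remark that trainable weights may include those of convolution layers but not pooling layers.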

[0096] In some embodiments, the training set of defect images includes high resolution images that represent images of the specimen generated by more than one mode of the high resolution imaging subsystem. The more than one mode of the high resolution imaging subsystem corresponding to the images in the training set may include any of the modes described herein. The high resolution images in the training set may represent images of the specimen generated by all or only some (two or more) of the modes that the high resolution imaging subsystem is capable of using.

[0097] In some instances, as described herein, at least some of the images in the training set may be generated in a manner that does not necessarily involve the high resolution imaging subsystem. For example, one or more of the high resolution images in the training set may represent images of the known DOIs generated by more than one mode thereby corresponding to different high resolution images generated by different modes of the high resolution imaging subsystem. Different high resolution images may be simulated for different modes of the high resolution imaging subsystem thereby representing the high resolution images that would be generated by the different modes of the high resolution imaging subsystem for the known DOIs. In this manner, the high resolution images may include images that simulate, represent, or approximate images that would be generated by the high resolution imaging subsystem if a known DOI on a specimen were imaged by the high resolution imaging subsystem.

[0098] In one embodiment, the training set of defect images includes one or more images of one or more programmed defects on the specimen, the one or more computer subsystems are configured for generating the one or more programmed defects by altering a design for the specimen to create the one or more programmed defects in the design, and the altered design is printed on the specimen to create the one or more programmed defects on the specimen. “Programmed” defects as that term is used herein can be generally defined as one or more defects purposefully caused on a specimen by manipulation of the design information for the specimen.

[0099] In contrast to methods that involve creating synthetic but realistic images for training, printing a specimen with a design altered to include programmed defect(s) allows for the true entitlement capability of the system to be used because actual DOI (programmed defects printed on the specimen) are available in abundance. For users willing to create test wafers, reticles with defects programmed into the design by the computer subsystem(s) can be used to print the altered design on the test wafers much like standard direct step on wafer (DSW) wafers leveraged for decades in the front end of line (FEOL). Producing such test wafers using at least some of the same process steps used to produce product wafers will enable relatively high volumes of actual DOI images, which have the same optical properties as expected in real examples on product, to be collected for use in training a neural network to separate DOI from nuisance.

[00100] Generating the programmed defect(s) by altering the design for the specimen may be performed based on information about the known DOIs such as type, dimensions, locations, shape, etc., which may come from any appropriate source (e.g., prior design or process knowledge and/or defect detection results). Altering the design for the specimen may be performed using an electronic design automation (EDA) tool. In this manner, the embodiments described herein may have an added dimension of leveraging programmed design data generated with EDA CAD tools. The EDA tool may include any suitable commercially available EDA tool. In addition, the CAD work can be automated with a programmable/graphical EDA editor, which may include any suitable EDA software, hardware, system, or method. In some such embodiments, one or more of the computer subsystems described herein (e.g., computer subsystem(s) 102) may be configured as an EDA tool or may be a computer subsystem included in an EDA tool.

[00101] In one such embodiment, altering the design for the specimen to create the one or more programmed defects in the design may be performed using an inception module configured for altering the design to create the programmed defects in the design. For example, the neural networks described herein may be trained by a defect hallucination system such as those suggested by GoogLeNet inception for natural scene images. A traditional neural network that is pretrained on defects can then play these backwards to create new defect types on other geometry structures. Examples of systems and methods for performing GoogLeNet inception can be found in “Going Deeper with Convolutions,” Szegedy et al., 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, 9 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference.

[00102] The altered design may be printed on the specimen using a semiconductor fabrication subsystem configured to perform one or more fabrication processes on the specimen. The one or more fabrication processes may include making reticle(s) or mask(s) with the altered design and then processing wafers with those reticle(s) or mask(s). The one or more fabrication processes may include any suitable such processes known in the art. As shown in Fig. 1, the system may include semiconductor fabrication system 108, which may be coupled to computer subsystem(s) 102 and/or any other elements of the system described herein. The semiconductor fabrication system may include any semiconductor fabrication tool and/or chamber known in the art such as a lithography track, an etch chamber, a chemical mechanical polishing (CMP) tool, a deposition chamber, a stripping or cleaning chamber, and the like. Examples of suitable semiconductor fabrication tools that may be included in the embodiments described herein are described in U.S. Patent No. 6,891,627 to Levy et al. issued on May 10, 2005, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this patent.

[00103] After the specimen has been printed with the altered design, the locations of the programmed defect(s) on the specimen can be imaged using the high and low resolution imaging subsystems. The high and low resolution images of the programmed defect(s) can then be used in the training steps described herein. In this manner, the embodiments may use a hybrid approach involving empirical and programmed methods in combination, that includes generation of programmed layout defects as described above in design space and determining the empirical impact of those programmed layout defects on wafers by making masks with the modified design and processing wafers with those masks. In this manner, the neural networks can be trained using actual images of programmed defects.

[00104] In another embodiment, the training set of defects includes one or more images of one or more synthetic defects, and the one or more computer subsystems are configured for generating the one or more synthetic defects by altering a design for the specimen to create the one or more synthetic defects in the design, generating simulated high resolution images for the one or more synthetic defects based on the one or more synthetic defects in the design, and adding the simulated high resolution images to the training set. Generating the synthetic defect(s) by altering the design for the specimen may be further performed as described herein. As shown in Fig. 2, the one or more synthetic defects may include “injected” defects 208, which may be determined in any suitable manner. Information for the injected defects 208 may be used to alter design data 202, which may be CAD data or any other suitable type of design data described herein. The altered design data may then be used to generate simulated high resolution images for the injected defects, which may then be input to high resolution neural network 200 as part of the training set. The training set may then be used to train the high resolution neural network as described further herein.

[00105] Generating the simulated high resolution images may include simulating what the altered design would look like when printed on a specimen. For example, generating the simulated high resolution images may include generating a simulated representation of a specimen on which the synthetic defect(s) would be printed. One example of an empirically trained process model that may be used to generate a simulated specimen includes SEMulator 3D, which is commercially available from Coventor, Inc., Cary, NC. An example of a rigorous lithography simulation model is Prolith, which is commercially available from KLA-Tencor, and which can be used in concert with the SEMulator 3D product. However, the simulated specimen may be generated using any suitable model(s) of any of the process(es) involved in producing actual specimens from the design data. In this manner, the altered design (altered to include one or more synthetic defects) may be used to simulate what a specimen on which the altered design has been formed will look like in specimen space (not necessarily what such a specimen would look like to an imaging system). Therefore, the simulated representation of the specimen may represent what the specimen would look like in 2D or 3D space of the specimen.

[00106] The simulated representation of the specimen may then be used to generate the simulated high resolution images that illustrate how the specimen on which the synthetic defects are printed would appear in one or more actual images of the specimen generated by the high resolution imaging subsystem. The simulated high resolution images may be produced using a model such as WINsim, which is commercially available from KLA, and which can rigorously model the response of an inspector using an electromagnetic (EM) wave solver. Such simulations may be performed using any other suitable software, algorithm(s), method(s), or system(s) known in the art.

[00107] In one such embodiment, the one or more computer subsystems are configured for generating the simulated high resolution images using the high resolution neural network, and the high resolution neural network is configured as a deep generative model. For example, the computer subsystem(s) may use a deep generative model combined with a synthetic method of generating defects on design (EDA/CAD) data to produce realistic systematic and random defects on high resolution images to inject into the training set for use by any machine learning algorithm including but not limited to DL systems.

[00108] A “generative” model can be generally defined as a model that is probabilistic in nature. In other words, a “generative” model is not one that performs forward simulation or rule-based approaches and, as such, a model of the physics of the processes involved in generating an actual image or output (for which a simulated image is being generated) is not necessary. Instead, as described further herein, the generative model can be learned (in that its parameters can be learned) based on a suitable training set of data. The generative model may be configured to have a DL architecture, which may include multiple layers that perform a number of algorithms or transformations. The number of layers included in the generative model may be use case dependent. For practical purposes, a suitable range of layers is from 2 layers to a few tens of layers. Deep generative models that learn the joint probability distribution (mean and variance) between the high resolution images (image of actual wafer) and design (e.g., CAD or a vector representation of intended layout) can be used to generate the simulated high resolution images that are included in the training set. Defect artifacts such as opens, shorts, protrusions, intrusions, etc. could be inserted into the CAD and then fed into a generative model trained by a network as described in U.S. Patent Application Publication No. 2017/0148226 published May 25, 2017 by Zhang et al., and “Semi-supervised Learning with Deep Generative Models,” Kingma et al., NIPS 2014, October 31, 2014, pp. 1-9, which are incorporated by reference as if fully set forth herein, to create realistic defects. The embodiments described herein may be further configured as described in these references.

[00109] In an additional embodiment, the training set of defects includes one or more images of one or more synthetic defects, the one or more computer subsystems are configured for generating the one or more images of the one or more synthetic defects by altering a design for the specimen to create the one or more synthetic defects in the design, and the one or more computer subsystems are configured for generating simulated low resolution images for the one or more synthetic defects based on the one or more synthetic defects in the design. In this manner, the simulated low resolution images illustrate how the defects (e.g., known DOIs) appear in one or more actual images generated by the low resolution imaging subsystem. As such, the simulated image(s) may represent (e.g., correspond, simulate, or approximate) images that may be generated of the defects by the low resolution imaging subsystem.

[00110] Generating the one or more synthetic defects by altering a design for the specimen may be performed as described further herein. If design (CAD) is available, it is straightforward to inject legal defect examples. For example, DOIs such as opens, shorts, “mouse bites,” protrusions, etc. can be rendered (drawn) with various sizes, which could be automated based on descriptions of the DOIs. Using an EDA tool, these rendered DOIs can be located in “legal” places in the geometry instead of in random places. In one example, a short is a metal connection between two copper lines. For such a DOI, one could simply add a small short line at strategic pinch points in the design. Process patterning defects can also be drawn in the segmentation mode. Segmentation mode generally refers to a stage of the inspection in which images generated by the inspection tool are segmented with or without user input or design information. Process patterning defects refers to material that can be added, lifted, pinched off, etc. and usually happen in a manner that is somewhat independent of the geometry or design patterns formed on the specimen (although the geometry or design patterns may in actuality contribute to the formation of such defects). One or more examples of such process patterning defects can be hand drawn by a user in the segmented images and then injected into the neural networks described herein in that way during training.
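As a non-limiting sketch of placing a short at a pinch point rather than at a random place, the following assumes the design clip has been rasterized to a binary grid (1 = metal, 0 = space). The grid representation and the helper names `row_gap` and `inject_short` are hypothetical and are used here for illustration only; they do not represent any particular EDA tool.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # assumed rasterized design clip: 1 = metal, 0 = space


def row_gap(row: List[int]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first space run bounded by metal on both sides."""
    n = len(row)
    i = 0
    while i < n and row[i] == 0:  # skip leading space
        i += 1
    while i < n and row[i] == 1:  # skip the first metal run
        i += 1
    start = i
    while i < n and row[i] == 0:  # walk across the gap
        i += 1
    if start == n or i == n:      # no second metal run closes the gap
        return None
    return (start, i)


def inject_short(grid: Grid) -> Tuple[Grid, Tuple[int, int, int]]:
    """Bridge the narrowest gap (the pinch point) with metal in a copy of the grid."""
    gaps = [(r, row_gap(row)) for r, row in enumerate(grid)]
    gaps = [(r, g) for r, g in gaps if g is not None]
    if not gaps:
        raise ValueError("no legal location for a short in this clip")
    r, (start, end) = min(gaps, key=lambda item: item[1][1] - item[1][0])
    altered = [row[:] for row in grid]  # leave the original design untouched
    for c in range(start, end):
        altered[r][c] = 1
    return altered, (r, start, end)


layout = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],  # pinch point: the two lines are one pixel apart
    [1, 0, 0, 0, 1],
]
altered_layout, short_location = inject_short(layout)
```

In this toy clip the short is drawn in row 2, where the spacing between the two lines is smallest; the altered clip could then be passed to whichever image simulation or generative model is used.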

[00111] In one such example shown in Fig. 2, the one or more synthetic defects may include “injected” defects 208, which may be determined as described herein. Information for the injected defects 208 may be used to alter design data 202, which may be CAD data or any other suitable type of design data described herein. The altered design data may then be used to generate simulated low resolution images for the injected defects, which may then be input to low resolution neural network 206 for training, which may be performed as described further herein.

[00112] In one such embodiment, the one or more computer subsystems are configured for generating the simulated low resolution images using a deep generative model. For example, the computer subsystem(s) may use a deep generative model combined with a synthetic method of generating defects on design (EDA/CAD) data to produce realistic systematic and random defects on low resolution images to inject into the training set for use by any machine learning algorithm including but not limited to DL systems. The deep generative model may be configured as described herein.

[00113] In another such embodiment, generating the simulated low resolution images is performed with a generative adversarial network or a variational Bayesian method. For example, to leverage the design at its fullest, a rendering trick of GAN or variational Bayes may be used to create realistic looking low resolution images for training. The GAN or variational Bayesian method may be configured and/or performed as described further herein.

[00114] In a further embodiment, the training set of defect images includes one or more synthetic defects, and the one or more computer subsystems are configured for generating the one or more synthetic defects by altering one or more of the images generated by the high resolution imaging subsystem and one or more of the images generated by the low resolution imaging subsystem to create a segmentation image, altering the one or more of the images generated by the high resolution imaging subsystem based on the segmentation image, and generating simulated low resolution images for the one or more synthetic defects based on the altered one or more images. For example, when design (CAD) is not available, the high resolution images and low resolution images may be leveraged to create as perfect a segmentation (binary) image as possible. There are numerous representation networks that can be used to perform this segmentation. In particular, a high resolution image will generally have less noise than a low resolution image. So a segmentation algorithm can be used a priori to create effectively something that looks like the design (or it will at least be cleaner in the high resolution image than the low resolution image) and then can be transferred to the low resolution image (either with a simple geometric operation or a more complex neural network image-to-image translation) thereby producing a relatively good “pseudo CAD” for the image. Once we have this segmentation image, defects can be injected (drawn manually or automatically) and then simulated low resolution images can be rendered for the injected defects and used for training. The image-to-image translation may be performed as described in U.S. Patent Application Publication No. 2017/0200265 published July 13, 2017 by Bhaskar et al., which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this publication. Image segmentation and image-to-image translation may also be performed in this embodiment as described in “Image-to-Image Translation with Conditional Adversarial Networks,” by Isola et al., arXiv:1611.07004v2, November 22, 2017, 17 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this publication.
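As a non-limiting sketch of the “pseudo CAD” idea, the following assumes the segmentation is a plain gray-level threshold of the high resolution image and the “simple geometric operation” is a block-majority downsample onto the low resolution pixel grid. Both choices, and the names `segment` and `downsample`, are assumptions made for illustration; a representation network or image-to-image translation could be substituted for either step.

```python
from typing import List

Image = List[List[int]]


def segment(image: Image, threshold: int) -> Image:
    """Binary segmentation: 1 where the gray level is at or above the threshold."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def downsample(binary: Image, factor: int) -> Image:
    """Map a binary image onto a coarser grid by majority vote in each block."""
    height, width = len(binary), len(binary[0])
    out = []
    # Rows and columns that do not fill a whole block are dropped.
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [binary[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(1 if 2 * sum(block) >= len(block) else 0)
        out.append(row)
    return out


# Hypothetical 4x4 high resolution gray-level clip (bright = patterned structure).
high_res = [
    [200, 210, 20, 30],
    [190, 220, 10, 25],
    [15, 20, 205, 215],
    [30, 10, 195, 225],
]
pseudo_cad_hi = segment(high_res, threshold=128)
pseudo_cad_lo = downsample(pseudo_cad_hi, factor=2)
```

Defects can then be drawn into `pseudo_cad_hi` or `pseudo_cad_lo` before the simulated low resolution images are rendered.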

[00115] In one such embodiment, generating the simulated low resolution images is performed with a generative adversarial network or a variational Bayesian method. For example, the computer subsystem(s) may use a GAN or variational network to re-create low resolution images of the injected defects that are then used for training. The GAN or variational method may be configured and used as described further herein.

[00116] In some embodiments, the one or more computer subsystems are configured for generating the at least one of the defect images synthetically by altering the at least one of the images generated by the high resolution imaging subsystem for the specimen to create high resolution images for known DOIs. For example, for the known DOIs, the act of “painting” examples in a legal structure of the design rule can be used to enable a DL network to quite easily detect defects. In one such example as shown in Fig. 2, the at least one of the defect images may include “painted” defects 210. Based on information for the known DOIs, which may be acquired as described herein, information for how the known DOIs would appear in the high resolution images may be generated. Information for painted defects 210 may be used to alter high resolution images to create the high resolution images for the painted defects. In one particular example, based on information for a bridge defect, one or more high resolution images can be altered to show such a bridge between two patterned structures. The information for the bridge defect may include information such as how the defect type tends to appear in high resolution images and expected characteristics for the bridge defect such as the dimensions, materials, shape, texture, and the like that may have some effect on the high resolution images. The altered high resolution images may be input to high resolution neural network 200 as part of the training set and then used to train the high resolution neural network as described further herein.

[00117] In another embodiment, the training set of defect images includes one or more images of one or more artificial defects on the specimen generated by performing a process on the specimen known to cause the one or more artificial defects on the specimen. For example, as shown in Fig. 2, actual wafer data 204 may include defects that have been detected on one or more wafers (e.g., defective test wafers) and that have preferably (but not necessarily) been verified and/or classified as DOIs using a ground truth method (e.g., using a SEM defect review method, user verification or classification performed manually, etc.). Information for the detected defects may be input to high resolution neural network 200 as part of the training set and then used to train the high resolution neural network as described further herein.

[00118] In some such embodiments, the user can provide a defective test wafer that has examples of opens, shorts, and other types of DOIs. A process window qualification (PWQ) type DOI wafer can also be used as a defective test wafer to generate information for known DOIs that can be used to train the model so that real world examples of defects are made explicit by the user. An extreme process condition can be deliberately induced so that examples of such defects are produced and detected on the test specimen. The PWQ equivalent of RDL layers can be exploited.

[00119] The process known to cause the artificial defect(s) on the specimen may be performed with two or more different values of one or more parameters of the process. Such a process may be performed using a PWQ method. For example, designs of experiments (DOEs) such as PWQ may be used as a generator of systematic defects. In general, PWQ is a technique invented by KLA in the early 2000s for lithography focus and exposure process window characterization and is widely adopted in one form or another. The basis for PWQ is to create an inspector compatible wafer where there are nominal dice and modulated dice next to each other in a systematic fashion to maximize signal for the inspector. The one or more parameters of the process that are varied in the PWQ method may include focus and exposure (e.g., as in a focus-exposure PWQ process). PWQ methods may also be performed as described in U.S. Patent Nos. 6,902,855 to Peterson et al. issued on June 7, 2005, 7,418,124 to Peterson et al. issued on August 26, 2008, 7,729,529 to Wu et al. issued on June 1, 2010, 7,769,225 to Kekare et al. issued on August 3, 2010, 8,041,106 to Pak et al. issued on October 18, 2011, 8,111,900 to Wu et al. issued on February 7, 2012, and 8,213,704 to Peterson et al. issued on July 3, 2012, which are incorporated by reference as if fully set forth herein. The embodiments described herein may include any step(s) of any method(s) described in these patents and may be further configured as described in these patents. A PWQ wafer may be printed as described in these patents.

[00120] Such a process may also be performed using a focus exposure matrix (FEM) method. For example, DOEs such as FEM methods and/or wafers may be used as a generator of systematic defects. FEM methods generally involve printing a number of dies on a wafer at different combinations of focus and exposure parameter values of a lithography process. The different dies can then be inspected in any suitable manner to detect defects in the different dies. That information is then typically used to determine a process window for the focus and exposure of the lithography process. Therefore, a FEM method may be used to print such dies on a specimen, and the defects detected on such a specimen may be used to identify known DOIs.
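As a non-limiting sketch of a FEM layout, the following assigns each die one combination of focus and exposure values stepped symmetrically about a nominal condition, and then keeps the conditions whose dies are within a defect-count limit. The function names, units, step sizes, and defect counts are hypothetical values chosen for illustration only.

```python
from itertools import product
from typing import Dict, List, Tuple

Die = Tuple[int, int]               # (focus index, exposure index) on the wafer
Condition = Tuple[float, float]     # (focus, exposure) used to print that die


def fem_layout(focus_center: float, focus_step: float, n_focus: int,
               exposure_center: float, exposure_step: float,
               n_exposure: int) -> Dict[Die, Condition]:
    """Assign every die one combination of focus and exposure values."""
    def axis(center: float, step: float, n: int) -> List[float]:
        # n values stepped symmetrically about the nominal (center) value
        return [center + (i - (n - 1) / 2) * step for i in range(n)]

    focus = axis(focus_center, focus_step, n_focus)
    exposure = axis(exposure_center, exposure_step, n_exposure)
    return {(fi, ei): (f, e)
            for (fi, f), (ei, e) in product(enumerate(focus), enumerate(exposure))}


def process_window(layout: Dict[Die, Condition], defect_counts: Dict[Die, int],
                   max_defects: int) -> List[Condition]:
    """Conditions whose dies show no more than max_defects detected defects."""
    return sorted(layout[die] for die, count in defect_counts.items()
                  if count <= max_defects)


# Hypothetical 3 x 3 matrix: focus in nm about 0, exposure in mJ/cm^2 about 30.
layout = fem_layout(0.0, 50.0, 3, 30.0, 2.0, 3)
# Hypothetical inspection result: only the nominal die is defect free.
defect_counts = {die: (0 if die == (1, 1) else 5) for die in layout}
window = process_window(layout, defect_counts, max_defects=0)
# Dies outside the window are the ones that supply systematic defect examples.
doi_source_dies = sorted(die for die, count in defect_counts.items() if count > 0)
```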

[00121] As described above, therefore, one or more DOEs such as PWQ and FEM wafers may be used as generators of systematic defects. In this manner, the high resolution neural network may be trained using information generated from a PWQ or FEM wafer that can act as a generator of systematic defects. While PWQ and their “cousin” FEM wafers are primarily used for determining process margin today, they can be repurposed for training the neural networks described herein with real defects since they will occur in abundance on these wafers. These wafers and the information generated from them can then be used as training samples for the training described further herein. If such samples do not provide a complete set of possible defects, the information generated from such wafers may be complemented with other information such as that generated by synthetic defect generation, which may be performed in a number of different manners as described further herein.

[00122] Performing a process on the specimen known to cause the artificial defect(s) on the specimen may be advantageous when not all defect types, such as bottom bridges and metal residue, can be created using the design. Such defects can be induced by process out of window (where the process is performed using one or more parameters that are known to be outside of the process window for the process). The reticle may have RDL Comb / Meander Rs of different width. Different concentrations of metal glue layer removal can be experimented with to produce these types of defects. The locations of these process defects can be determined by measuring the chain resistance, since an infinite or zero chain resistance indicates an open or a short, respectively, and the defects can then be imaged for use in creating an optimal DL network.
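As a non-limiting sketch of locating process defects from chain resistance, the following labels each test structure as an open, a short, or acceptable and returns the sites to image. The text only states that infinite or zero resistance indicates an open or a short; the finite thresholds `short_below` and `open_above`, the site coordinates, and the measurement values are hypothetical assumptions added for illustration.

```python
import math
from typing import Dict, List, Tuple

Site = Tuple[int, int]  # assumed (row, column) of a test structure on the wafer


def classify_chain(resistance_ohms: float, short_below: float,
                   open_above: float) -> str:
    """Label a chain resistance measurement as 'open', 'short', or 'ok'."""
    if math.isinf(resistance_ohms) or resistance_ohms > open_above:
        return "open"   # effectively infinite resistance: the chain is broken
    if resistance_ohms < short_below:
        return "short"  # effectively zero resistance: lines are bridged
    return "ok"


def sites_to_image(measurements: Dict[Site, float], short_below: float = 1.0,
                   open_above: float = 1e6) -> List[Tuple[Site, str]]:
    """Return the sites flagged as opens or shorts, in site order."""
    flagged = []
    for site, resistance in sorted(measurements.items()):
        label = classify_chain(resistance, short_below, open_above)
        if label != "ok":
            flagged.append((site, label))
    return flagged


# Hypothetical measurements in ohms for three test structures.
measurements = {(0, 0): 120.0, (0, 1): math.inf, (1, 0): 0.0}
flagged = sites_to_image(measurements)
```

The flagged sites are the locations that would be imaged to collect defect examples for training.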

[00123] In an additional embodiment, the training set of defect images includes one or more defects detected on the specimen in one or more of the images generated by the high resolution imaging subsystem. For example, as shown in Fig. 2, actual wafer data 204 may include defects that have been detected on one or more wafers (e.g., defective test wafers) and that have preferably (but not necessarily) been verified and/or classified as DOIs using a ground truth method (e.g., using a SEM defect review method, user verification or classification performed manually, etc.). Information for the detected defects may be input to high resolution neural network 200 as part of the training set. The training set may then be used to train the high resolution neural network as described further herein.

[00124] In one such embodiment, the one or more computer subsystems are configured for detecting the defects on the specimen in the images generated by the high resolution imaging subsystem by single image detection (SID). For example, the high resolution imaging subsystem may be trained by a version of the SID algorithm. SID may be performed by the embodiments described herein as described in U.S. Patent Application Publication No. 2017/0140524 published May 18, 2017 by Karsenti et al., which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this publication.

[00125] In another such embodiment, the one or more computer subsystems are configured for detecting the defects on the specimen in the images generated by the high resolution imaging subsystem by die-to-database detection. For example, the computer subsystem(s) may leverage a machine learning algorithm or any die-to-database inspection algorithm as a ground truth trainer. Die-to-database detection may be performed by comparing the high resolution images to a reference such as design data for the specimen. The results of such comparing may therefore be difference images (as the reference may be subtracted from the high resolution test images). The difference images may then be used to identify possible defects in the difference images (e.g., by applying a threshold to the difference images).
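As a non-limiting sketch of the comparison described above, the following subtracts a reference image from a test image pixel by pixel and reports the pixels whose absolute difference exceeds a threshold. It assumes the reference has already been rendered from the design data and aligned to the test image; the gray levels, the threshold value, and the name `die_to_database` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def die_to_database(test: Image, reference: Image,
                    threshold: int) -> List[Tuple[int, int, int]]:
    """Subtract the reference from the test image and threshold the difference.

    Returns (row, column, difference) for every pixel whose absolute
    difference exceeds the threshold.
    """
    candidates = []
    for y, (test_row, ref_row) in enumerate(zip(test, reference)):
        for x, (t, r) in enumerate(zip(test_row, ref_row)):
            diff = t - r
            if abs(diff) > threshold:
                candidates.append((y, x, diff))
    return candidates


# Hypothetical reference rendered from design: one bright line on a dark field.
reference = [
    [0, 0, 0],
    [100, 100, 100],
    [0, 0, 0],
]
# Hypothetical high resolution test image with extra material below the line.
test_image = [
    [2, 1, 0],
    [98, 101, 99],
    [0, 60, 3],
]
candidates = die_to_database(test_image, reference, threshold=20)
```

Small gray-level differences caused by noise fall under the threshold, while the extra material at row 2, column 1 is reported as a possible defect.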

[00126] The training set of defect images that are used for training as described herein may therefore come from a couple of different sources including actual defects that just happen to be detected on an actual specimen or programmed, synthetic, and artificial defects that are intentionally caused on the specimen or in images rendered for the specimen. The training set of defect images can also include some combination of actual defects and programmed, synthetic, and/or artificial defects.

[00127] Using the programmed, synthetic, or artificial defects (possibly in combination with actual defects) may be advantageous for a couple of reasons. As described further herein, one of the applications that the embodiments described herein have been created and are particularly advantageous for is RDL. Although fine pitch RDL occupies a relatively small area of chip layouts, they are a known source of yield loss due to dense RDL patterns. To achieve high yield, inline defect inspection is deployed during formation of the RDL lines. Complicating matters is that killer defects for RDL formation are few and far between in the actual production environment. However, 100% capture rate of key killer defects is usually required in these fine pitch RDL inspections. Many iterations of inspection recipe modifications may have to be done to accommodate all killer defect types. Hence, a production worthy inspection recipe can take weeks or months to fine tune due to scarcity of small actual killer defect samples.

[00128] The challenge here is to optimize modes in the discovery phase with the shortest cycle time and the least iterations. For example, if substantially small RDL shorts of about 1 micron do not appear in the first 50 actual product wafers, the application engineer has to wait until the 51st wafer for the inspection recipe optimization process to complete and achieve 100% capture rate. The embodiments described herein, however, provide a systematic approach to generating systematic repeating critical size (e.g., 0.5 microns, 1.0 microns, and 2.0 microns) defects in fixed locations on a lithography mask. Lithography based killer defect types like RDL metal shorts and RDL metal opens can be reproduced in a systematic way by violating the mask pattern either by creating opaque or pin dot defects on the mask. A 3D optical lithography simulation tool can be employed to predict the printability of various reticle defect scenarios. Experimental data can be used to validate the 3D simulator by comparing modeling data to SEM measurements of wafers exposed with a reticle containing programmed clear pinhole and opaque pin dot defects.

The programmed, artificial, and synthetic defects used for the training described herein can also be designed in such a way that both the manufactured lithography defects and wet etch induced process defects can cause electrical failure. Meander or comb structures can be designed with daisy chain structures to measure chain resistance as a proxy for RDL shorts or breaks. Several new benefits are provided by this approach. For example, this technique can be used to build a calibration wafer with systematic lithography and wet etch process defects. In addition, the manufactured defects can be matched to electrical failures by mapping the physical defects detected by an inspection tool to real electrical failure data. The results of such mapping can be used to calibrate or alter the inspection process if the inspection process either over-detects or under-detects killer defects with their corresponding electrical failure sites. The embodiments also provide a robust way to bring a newly installed RDL inspector to its full integration capability entitlement with such an approach within a short period of time instead of weeks or months after installation as the full layer RDL stack represents each user process condition and margin for defect creation. Furthermore, the same systematic approach can be used for different RDL applications even if those RDL processes have a huge variety of lithography processing approaches (e.g., mask aligners, projection steppers, laser direct imaging, and laser ablation). The embodiments described herein can also cater to prior layer RDL noise that can be a major detection challenge for multiple RDL layers and build these into the specimens on which programmed and/or artificial defects are formed or for which the programmed and/or artificial defects are generated. In addition, design data for a specimen on which programmed and/or artificial defects are formed may be created or altered so that the systematic defects can be correlated with electrical test structures.

[00130] The one or more computer subsystems are further configured for training the low resolution neural network using the training set of defect images as input. In this manner, the DL based macro inspector can use the high resolution imaging subsystem as a de facto inspector to train a low resolution macro tool. The training of the low resolution neural network may be performed using one of the transfer learning techniques described further herein, e.g., by using the high resolution neural network as a mother network.

[00131] In addition or alternatively, the known DOIs and their locations in the high resolution images may be used to identify locations of the known DOIs in the low resolution images. For example, as described further herein, the computer subsystem(s) may be configured for generating high resolution images from low resolution images (or vice versa) using an image-to-image translation technique described herein. Therefore, high and low resolution images that correspond to each other (as being generated at the same location on the specimen) can be identified. In this way, at least one of the defect images in the training set may be generated synthetically by the high resolution neural network using at least one of the images generated by the high resolution imaging subsystem. The locations of the known DOIs in the high resolution images can then be used to identify the locations of the known DOIs in the low resolution images (e.g., by image coordinate translation or by overlaying corresponding images). The high resolution images generated at the locations of the known DOIs can then be used (with or without using the trained high resolution neural network for transfer learning) to train the low resolution neural network. Such training may be performed as described herein with respect to training the high resolution neural network.
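As a non-limiting sketch of image coordinate translation, the following maps a DOI location from a high resolution image to the low resolution pixel that contains it. It assumes both images share an origin at the same specimen location, are not rotated relative to each other, and have known pixel sizes given in whole nanometers; the function name and the pixel sizes are hypothetical.

```python
from typing import Tuple


def hi_to_lo(x_hi: int, y_hi: int, hi_pixel_nm: int,
             lo_pixel_nm: int) -> Tuple[int, int]:
    """Translate a high resolution pixel coordinate to the low resolution pixel
    that contains the same specimen location.

    Assumes both images share their origin on the specimen and are not rotated
    relative to each other; pixel sizes are given in whole nanometers.
    """
    x_nm = x_hi * hi_pixel_nm  # position on the specimen
    y_nm = y_hi * hi_pixel_nm
    return (x_nm // lo_pixel_nm, y_nm // lo_pixel_nm)


# Hypothetical pixel sizes: 10 nm (high resolution) and 100 nm (low resolution).
doi_hi = (250, 1040)
doi_lo = hi_to_lo(*doi_hi, hi_pixel_nm=10, lo_pixel_nm=100)
```

With these values, every 10 x 10 block of high resolution pixels maps to a single low resolution pixel, so nearby DOIs can share a low resolution location.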

[00132] In one embodiment, the training set of defect images includes images of the specimen generated by more than one mode of the low resolution imaging subsystem. In one such embodiment, the more than one mode of the low resolution imaging subsystem includes all of the modes of the low resolution imaging subsystem. For example, the low resolution images used for training the low resolution neural network may be generated for 30+ modes (or all of the modes of the low resolution imaging subsystem). The modes of the images used for training the low resolution neural network may include images that were actually generated using the low resolution imaging subsystem (e.g., by imaging an actual specimen using the modes of the low resolution imaging subsystem). In addition, or alternatively, images for some (one or more) or all of the modes of the low resolution imaging subsystem used for training the low resolution neural network may be generated by simulation or image-to-image translation, both of which may be performed as described herein. In this manner, the images that are used for training the low resolution neural network may include images for all of the modes generated using the low resolution imaging subsystem, images for all of the modes generated without using the low resolution imaging subsystem, or some combination thereof.

[00133] The neural networks described herein may also include networks that require minimal training samples. Examples of training neural networks with a limited training set are described in U.S. Patent Application Publication No. 2017/0193400 published July 6, 2017 by Bhaskar et al., which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this publication. The training that is performed herein may also include active learning schemes (ALS) such as those described in U.S. Patent Application Serial No. 62/681,073 filed June 5, 2018 by Zhang et al., which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this patent application.

[00134] In another such embodiment, the one or more computer subsystems are configured for selecting one or more of the more than one mode of the low resolution imaging subsystem used for detecting defects on another specimen (or other specimens) based on results of training the low resolution neural network with the images generated by the more than one mode of the low resolution imaging subsystem. For example, when combined with the nuisance suppression technique described above, the multiple modes of the low resolution images used for training enable determining which one or more (e.g., 3) diverse modes of the low resolution imaging subsystem are capable of catching all DOIs and suppressing nuisances. In one such example, one or more of the modes that provide the best combined performance for DOI detection and nuisance suppression may be selected for use in inspection of other specimens. In another such example, one or more of the modes that in combination provide the best combined performance (where one mode may compensate for another mode and/or images generated using more than one mode are used in combination) for DOI detection and nuisance suppression may be selected for inspection of other specimens. The one or more modes may be diverse in one or more parameters of the optical modes (e.g., different wavelengths, different polarizations, different pixel sizes (magnifications), etc.). These modes are then used to scan the entire wafer, and the trained low resolution neural network then uses images generated by scanning the wafer with these modes to detect DOIs. In this manner, the DL based macro inspector embodiments described herein can exploit the entire mode space of the inspector (e.g., wavelengths, apertures, BF vs. DF, etc.).
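
As a non-limiting illustration, the mode selection described above might be sketched as an exhaustive search over small combinations of modes. The sketch assumes that training has already produced, for each mode, the set of known DOIs that the mode catches and a count of nuisance events; the function name `select_modes`, the mode names, and the scoring rule (most DOIs caught, then fewest nuisances, then fewest modes) are assumptions made only for this example. Summing nuisance counts across modes is a deliberate simplification.

```python
from itertools import combinations


def select_modes(results, all_dois, max_modes=3):
    """Pick up to max_modes modes that together catch the most DOIs with the fewest nuisances.

    results maps a mode name to a tuple (set of DOI ids caught, nuisance count).
    """
    best_key, best_combo = None, None
    for k in range(1, max_modes + 1):
        for combo in combinations(sorted(results), k):
            # DOIs caught by at least one mode of the combination.
            caught = set().union(*(results[m][0] for m in combo))
            # Simplifying assumption: nuisance counts of the modes add up.
            nuisances = sum(results[m][1] for m in combo)
            key = (-len(caught & all_dois), nuisances, k)
            if best_key is None or key < best_key:
                best_key, best_combo = key, combo
    return best_combo


# Illustrative per-mode training results (hypothetical values).
results = {
    "mode_A": ({1, 2, 3}, 40),
    "mode_B": ({3, 4}, 5),
    "mode_C": ({1, 2}, 8),
    "mode_D": ({4, 5}, 12),
}
all_dois = {1, 2, 3, 4, 5}

selected = select_modes(results, all_dois)
selected_pair = select_modes(results, all_dois, max_modes=2)
```

With these illustrative values no single mode catches every DOI, so the search settles on a combination in which one mode compensates for another, which is the situation the paragraph above describes.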

[00135] The one or more computer subsystems are also configured for detecting defects on another specimen by inputting the images generated for the other specimen by the low resolution imaging subsystem into the trained low resolution neural network. In this manner, once the low resolution neural network has been trained, images generated by the low resolution imaging subsystem for other specimens (possibly with the one or more modes of the low resolution imaging subsystem selected as described herein) may be input to the low resolution neural network by the computer subsystem(s), and the low resolution neural network may detect defects in the input images and generate information for the detected defects, which may include any suitable output that can be generated by the low resolution neural network for the detected defects.

[00136] In one embodiment, the high and low resolution neural networks are configured for single image defect detection. For example, a die-to-die algorithm may not be invoked by the systems described herein for defect detection. Instead, the computer subsystem(s) may use SID for defect detection. Using such defect detection prevents any misalignment issues from affecting the defect detection results. SID may be performed as described further herein.

[00137] In one embodiment, the inspection tool is configured as a macro inspection tool. A macro inspection tool is suitable for inspection of relatively noisy BEOL layers such as RDL and post-dice applications to detect defects in the presence of enormous noise such as grain on metal lines. A macro inspection tool is defined herein as a system that is not necessarily diffraction limited and has a spatial resolution of about 200 nm to about 2.0 microns and above. Such spatial resolution means that the smallest defects that such systems can detect have dimensions of greater than about 200 nm, which is much larger than the smallest defects that the most advanced inspection tools on the market today can detect, hence the “macro” inspector designation. Such systems tend to utilize longer wavelengths of light (e.g., about 500 nm to about 700 nm) compared to the most advanced inspection tools on the market today. These systems may be used when the DOIs have relatively large sizes and possibly also when throughputs of 100 wafers per hour (wph) or more are required (wafer throughput here refers to number of 300 mm wafers inspected per hour).

[00138] The embodiments described herein provide a novel DL based macro inspector that suppresses nuisances in RDL and grainy (high noise) layers by co-optimizing the mode space as well as the algorithm detection space. For example, in some embodiments, the defects detected on the other specimen are defects of a BEOL layer of the other specimen. The BEOL layer may include any BEOL layer known in the art including those described herein. In a further embodiment, the defects detected on the other specimen are defects of a RDL layer of the other specimen. The RDL layer may have any suitable configuration known in the art.

[00139] In another embodiment, the defects detected on the other specimen are defects of a high noise layer of the other specimen. A “high noise” layer as that term is defined herein generally refers to a layer whose noise is the predominant obstacle in inspection of the layer. For example, while every wafer layer that is inspected by any inspection tool may exhibit more or less noise than other layers (and techniques for handling detection of such noise must in general be used in the inspection of every wafer layer), the primary obstacle in inspecting wafer layers successfully is most often the extremely small size of the defects that must be detected. In contrast, the embodiments described herein are particularly suitable for detecting relatively large (“macro”) defects of about 200 nm and above in size. Therefore, the primary obstacle in such inspection is not necessarily the size of the defects that must be detected (as many inspection tool configurations are capable of detecting such large defects on most layers). Instead, the layers described herein will in general exhibit such “high noise” levels in images generated for the layers that detecting defects of even such large sizes can be rendered difficult if not impossible. However, the embodiments described herein have been designed to handle such noise levels via the training (and optional mode selection) described herein, so that detecting defects on such high noise layers is rendered possible.

[00140] In an additional embodiment, the defects detected on the other specimen are defects of a layer that includes metal lines of the other specimen. For example, the BEOL and RDL layers described herein may include metal lines that form various elements of the devices being formed on the specimen. Such metal lines may produce a significant amount of “grain” noise, which is described further herein. However, the embodiments described herein are configured for enabling detection of defects on such layers despite the grain noise due to the various training methods described herein.

[00141] In some embodiments, the other specimen on which the defects are detected is a post-dice specimen. A “post-dice” specimen can be generally defined as a wafer or other substrate on which multiple devices have been formed (e.g., in different dies or dice) and then separated from each other in one of various ways. A “post-dice” specimen may also be a specimen that has been separated into multiple dies or dice, which have not yet entered the packaging process.

[00142] The defects that are detected on such layers and specimens may include, for example, RDL metal line defects (shorts/bridges, opens/broken lines, metal residues/bottom bridges), via/contact defects (photoresist residues/via scumming), bump defects, micro-bump defects, copper pillar defects, after-stacking-of-chips defects, after-chemical-mechanical processing (CMP) defects, and after-grinding defects. Therefore, the embodiments described herein can be used to monitor (and possibly correct) any of the processes that were performed on the specimen and resulted in such defects.

[00143] The embodiments described herein were designed to be particularly effective for detecting such defects for a number of different reasons. For example, such defects tend to be relatively difficult to detect because they tend to be located in a substantially noisy (e.g., grainy) background. In one such example, substantial noise can be detected by inspection due to within RDL metal line noise, which may be caused by excessive metal grain. In another such example, substantial noise can be detected by inspection due to inter-RDL metal layer noise caused by transparent dielectric polymer on or under the RDL layer. As such, the ratio of false events versus the real killer DOI that are reported by previously used inspection systems and methods can be substantially high. However, by training the low resolution neural network as described herein, which can be performed with a relatively high number of known DOIs by the embodiments described herein, the trained low resolution neural network can be used for detecting such DOI without detecting a huge amount of nuisances. In addition, using the SID method described herein for detecting such defects will reduce the die-to-die defect detection source of noise.

[00144] High-performance computing (HPC) applications such as AI networking chips and field programmable gate arrays (FPGAs) are increasingly being utilized, and advanced multi-chip packaging to integrate different functions may be a fast time-to-market and cost effective solution instead of silicon on chip (SOC). Accordingly, much denser die-to-die communication input/outputs (I/Os) for advanced packaging are needed. To fulfill this demand, relatively large numbers of registered routing lines between dies lead to a constant drive for miniaturization for die-to-die RDL among industry participants. To meet the future demand, RDL line width with 2 μm/2 μm line/space is about to go into volume production, and active development is commencing with RDL line width down to a submicron range (less than about 1 micron). Typically, die sizes of HPC chips are substantially large and notoriously low yield. Key yield loss areas are where fine pitch RDL lines are placed. For example, today’s fan-out packages range from 5 μm line and space (5-5 μm) and above, with 2-2 μm in the works. In research and development, some are working on high end fan-out technologies at 1-1 μm and below, including packages capable of supporting high-bandwidth memory (HBM). Targeted for networking/server applications, fan-out at 2-2 μm may appear soon, with 1-1 μm slated for around 2020. The embodiments described herein advantageously provide systems and methods for effective and efficient defect detection in such devices, thereby overcoming a significant obstacle in the successful production of such devices.

[00145] The embodiments described herein have, therefore, a number of advantages over other methods and systems for detecting defects on the specimens described herein, some of which are described above. In addition, the steps described herein can reduce the two weeks of data gathering currently needed to setup inspection recipes for the specimens described herein into eight hours of data gathering followed by about 1 to 2 hours of offline processing. At this phase, a network is trained that can run the whole wafer. In addition, the training described herein requires minimal user intervention. For example, a user may classify a maximum of 100 events or paint a maximum of 100 defects. The network will be hyper-tuned to catch the core defects and suppress the real noise. When real DOI are detected by this network, they can be weighted more heavily than the artificial defects described herein and used to fine tune the low resolution neural network for both DOI detection and nuisance rate suppression. Such fine-tuning may be performed in an active learning method or scheme, which may be performed as described further herein.
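
As a non-limiting illustration, weighting real DOI more heavily than artificial defects during fine-tuning might be sketched as a weighted binary cross-entropy loss. The function name `weighted_bce`, the weight value of 3.0, and the sample values are assumptions made only for this example; the embodiments described herein do not specify a particular loss function or weight.

```python
import math


def weighted_bce(probs, labels, is_real, real_weight=3.0):
    """Weighted binary cross-entropy in which real DOI samples count real_weight times.

    probs   -- predicted defect probabilities, one per sample
    labels  -- 1 for defect, 0 for non-defect
    is_real -- True for a real DOI sample, False for an artificial (synthetic) defect
    """
    eps = 1e-7  # keeps log() finite for probabilities of exactly 0 or 1
    total, weight_sum = 0.0, 0.0
    for p, y, real in zip(probs, labels, is_real):
        w = real_weight if real else 1.0
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# One well-predicted artificial defect and one poorly predicted real DOI (hypothetical values).
probs = [0.9, 0.2]
labels = [1, 1]
is_real = [False, True]

loss_equal = weighted_bce(probs, labels, is_real, real_weight=1.0)
loss_weighted = weighted_bce(probs, labels, is_real, real_weight=3.0)
```

Because the missed real DOI dominates the weighted loss, a fine-tuning step driven by this loss would be pushed harder toward catching the real DOI than toward fitting the artificial defect.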

[00146] The embodiments described herein may be further configured as described in commonly owned U.S. Patent Application Publication Nos. 2017/0140524 published May 18, 2017 by Karsenti et al., 2017/0148226 published May 25, 2017 by Zhang et al., 2017/0193400 published July 6, 2017 by Bhaskar et al., 2017/0193680 published July 6, 2017 by Zhang et al., 2017/0194126 published July 6, 2017 by Bhaskar et al., 2017/0200260 published July 13, 2017 by Bhaskar et al., 2017/0200264 published July 13, 2017 by Park et al., 2017/0200265 published July 13, 2017 by Bhaskar et al., 2017/0345140 published November 30, 2017 by Zhang et al., 2019/0073566 published March 7, 2019 by Brauer, and 2019/0073568 published March 7, 2019 by He et al., which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these publications. In addition, the embodiments described herein may be configured to perform any steps described in these publications.

[00147] All of the embodiments described herein may include storing results of one or more steps of the embodiments in a computer-readable storage medium. The results may include any of the results described herein and may be stored in any manner known in the art. The storage medium may include any storage medium described herein or any other suitable storage medium known in the art. After the results have been stored, the results can be accessed in the storage medium and used by any of the method or system embodiments described herein, formatted for display to a user, used by another software module, method, or system, etc. to perform one or more functions for the specimen or another specimen. Such functions include, but are not limited to, altering a process such as a fabrication process or step that was or will be performed on the specimen on which defects were detected in a feedback or feedforward manner, etc.
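
As a non-limiting illustration, storing results in a storage medium and later accessing them might be sketched as follows. The JSON file format, the field names, and the functions `store_results` and `load_results` are assumptions made only for this example; any suitable storage manner known in the art could be used instead.

```python
import json
import os
import tempfile


def store_results(path, results):
    """Write inspection results to a file on the storage medium."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(results, handle, indent=2)


def load_results(path):
    """Read previously stored inspection results back from the storage medium."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical results for one specimen; all field names are illustrative.
results = {
    "specimen_id": "example-wafer",
    "defects": [{"x": 24, "y": 28, "mode": "mode_B", "score": 0.97}],
}

path = os.path.join(tempfile.mkdtemp(), "inspection_results.json")
store_results(path, results)
restored = load_results(path)
```

The restored results could then be formatted for display or passed to another software module, for example one that adjusts a fabrication step in a feedback or feedforward manner.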

[00148] Each of the embodiments of each of the systems described above may be combined together into one single embodiment.

[00149] Another embodiment relates to a computer-implemented method for training a neural network for defect detection in low resolution images. The method includes generating images for a specimen with high and low resolution imaging subsystems of an inspection tool. The imaging subsystems and the inspection tool are configured as described further herein. One or more components are executed by one or more computer systems, and the one or more components include a high resolution neural network and a low resolution neural network. The one or more components, the one or more computer systems, and the high and low resolution neural networks are configured as described further herein. The method includes the steps of generating the training set of defect images, training the low resolution neural network, and detecting defects, each described further herein. These steps are performed by the one or more computer systems.

[00150] Each of the steps of the method may be performed as described further herein. The method may also include any other step(s) that can be performed by the system, computer system(s), and/or neural networks described herein. The computer system(s) may be configured according to any of the embodiments described herein, e.g., computer subsystem(s) 102. In addition, the method described above may be performed by any of the system embodiments described herein.

[00151] An additional embodiment relates to a non-transitory computer-readable medium storing program instructions executable on one or more computer systems for performing a computer-implemented method for training a neural network for defect detection in low resolution images. One such embodiment is shown in Fig. 3. In particular, as shown in Fig. 3, non-transitory computer-readable medium 300 includes program instructions 302 executable on computer system(s) 304. The computer-implemented method may include any step(s) of any method(s) described herein.

[00152] Program instructions 302 implementing methods such as those described herein may be stored on computer-readable medium 300. The computer-readable medium may be a storage medium such as a magnetic or optical disk, a magnetic tape, or any other suitable non-transitory computer-readable medium known in the art.

[00153] The program instructions may be implemented in any of various ways, including procedure-based techniques, component-based techniques, and/or object-oriented techniques, among others. For example, the program instructions may be implemented using ActiveX controls, C++ objects, JavaBeans, Microsoft Foundation Classes (“MFC”), SSE (Streaming SIMD Extension) or other technologies or methodologies, as desired.

[00154] Computer system(s) 304 may be configured according to any of the embodiments described herein.

[00155] Further modifications and alternative embodiments of various aspects of the invention will be apparent to those skilled in the art in view of this description. For example, methods and systems for training a neural network for defect detection in low resolution images are provided. Accordingly, this description is to be construed as illustrative only and is for the purpose of teaching those skilled in the art the general manner of carrying out the invention. It is to be understood that the forms of the invention shown and described herein are to be taken as the presently preferred embodiments. Elements and materials may be substituted for those illustrated and described herein, parts and processes may be reversed, and certain features of the invention may be utilized independently, all as would be apparent to one skilled in the art after having the benefit of this description of the invention. Changes may be made in the elements described herein without departing from the spirit and scope of the invention as described in the following claims.