

Title:
METHOD FOR DETERMINING A STOCHASTIC METRIC RELATING TO A LITHOGRAPHIC PROCESS
Document Type and Number:
WIPO Patent Application WO/2022/233546
Kind Code:
A1
Abstract:
A method of determining a stochastic metric, the method comprising: obtaining a trained model having been trained to correlate training optical metrology data to training stochastic metric data, wherein the training optical metrology data comprises a plurality of measurement signals relating to distributions of an intensity related parameter across a zero or higher order of diffraction of radiation scattered from a plurality of training structures, and the training stochastic metric data comprises stochastic metric values relating to said plurality of training structures, wherein the plurality of training structures have been formed with a variation in one or more dimensions on which said stochastic metric is dependent; obtaining optical metrology data comprising a distribution of the intensity related parameter across a zero or higher order of diffraction of radiation scattered from a structure; and using the trained model to infer a value of the stochastic metric from the optical metrology data.

Inventors:
BATISTAKIS CHRYSOSTOMOS (NL)
PISARENCO MAXIM (NL)
VAN KRAAIJ MARKUS (NL)
RUTIGLIANI VITO (NL)
MIDDLEBROOKS SCOTT (NL)
VERSCHUREN COEN (NL)
GEYPEN NIELS (NL)
Application Number:
PCT/EP2022/059781
Publication Date:
November 10, 2022
Filing Date:
April 12, 2022
Assignee:
ASML NETHERLANDS BV (NL)
International Classes:
G03F7/20; G06N20/00
Domestic Patent References:
WO2017117568A1 2017-07-06
WO2017148982A1 2017-09-08
Foreign References:
US20160003609A1 2016-01-07
US20210035833A1 2021-02-04
US3392106A
US20100201963A1 2010-08-12
US20170184981A1 2017-06-29
US20160282282A1 2016-09-29
US20060066855A1 2006-03-30
Other References:
"Computational EPE-driven SPC for semiconductor manufacturing processes", RESEARCH DISCLOSURE, KENNETH MASON PUBLICATIONS, HAMPSHIRE, UK, GB, vol. 665, no. 40, 1 September 2019 (2019-09-01), pages 939, XP007147735, ISSN: 0374-4353, [retrieved on 20190819]
Attorney, Agent or Firm:
ASML NETHERLANDS B.V. (NL)
Claims:
CLAIMS

1. A method of determining a stochastic metric relating to a structure, the method comprising: obtaining a trained model, the model having been trained to correlate training optical metrology data to training stochastic metric data, wherein the training optical metrology data comprises a plurality of measurement signals relating to a plurality of angularly resolved distributions of an intensity related parameter across a zero or higher order of diffraction comprised within radiation scattered from a plurality of training structures on a substrate, and the training stochastic metric data comprises stochastic metric values relating to said plurality of training structures, wherein the plurality of training structures have been formed with a variation in one or more dimensions on which said stochastic metric is dependent; obtaining optical metrology data comprising an angularly resolved distribution of the intensity related parameter across a zero or higher order of diffraction comprised within radiation scattered from a structure; and using the trained model to infer a value of the stochastic metric associated with the structure from the optical metrology data.

2. A method as claimed in claim 1, wherein each of said measurement signals further comprises spectrally resolved distributions of the intensity related parameter across a zero or higher order of diffraction comprised within radiation scattered from the plurality of training structures on the substrate.

3. A method as claimed in claim 1, wherein the parameter is diffraction efficiency.

4. A method as claimed in claim 1, wherein the training optical metrology data further comprises nominal informative metrology data relating to one or both of: non-defect measurements and/or simulations; and specific defect measurements or simulations.

5. A method as claimed in claim 1, wherein the model comprises a machine learning model, neural network or convolutional neural network.

6. A method as claimed in claim 1, wherein the variation in one or more dimensions is associated with a variation in one or more process parameters of a lithographic process used in applying the training structures to the training substrate.

7. A method as claimed in claim 6, wherein said training stochastic metric data describes an acceptable space or range of stochastic metric values or related dimensional metric values, and a corresponding acceptable space or range of values of the one or more process parameters.

8. A method as claimed in claim 6, wherein the one or more process parameters are one or more of: dose, focus.

9. A method as claimed in claim 1, further comprising the initial steps of: obtaining said training optical metrology data and stochastic metric data; and training said trained model on said training optical metrology data and stochastic metric data.

10. A method as claimed in claim 9, comprising obtaining high-resolution metrology data; and determining said stochastic metric data from said high-resolution metrology data.

11. A method as claimed in claim 10, wherein said high-resolution metrology data is obtained from scanning electron microscope metrology.

12. A method as claimed in claim 1, further comprising using the inferred value for the stochastic metric to decide where and/or when to perform further high-resolution metrology.

13. A method as claimed in claim 1, wherein the stochastic metric comprises one or more of: defect rate or other defect metric, line edge roughness, line width roughness, local critical dimension uniformity, circle edge roughness or edge placement error.

14. A computer program comprising program instructions operable to perform the method of any of claims 1 to 13, when run on a suitable apparatus.

15. A non-transient computer program carrier comprising the computer program of claim 14.

Description:
METHOD FOR DETERMINING A STOCHASTIC METRIC RELATING TO A LITHOGRAPHIC PROCESS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority of EP application 21172589.0, which was filed on 6 May 2021; EP application 21179403.7, which was filed on 15 June 2021; EP application 21214225.1, which was filed on 14 December 2021; and EP application 22156035.2, which was filed on 10 February 2022, and which is incorporated herein in its entirety by reference.

FIELD

[0002] The present invention relates to methods and apparatus for applying patterns to a substrate in a lithographic process.

BACKGROUND

[0003] A lithographic apparatus is a machine that applies a desired pattern onto a substrate, usually onto a target portion of the substrate. A lithographic apparatus can be used, for example, in the manufacture of integrated circuits (ICs). In that instance, a patterning device, which is alternatively referred to as a mask or a reticle, may be used to generate a circuit pattern to be formed on an individual layer of the IC. This pattern can be transferred onto a target portion (e.g. comprising part of, one, or several dies) on a substrate (e.g. a silicon wafer). Transfer of the pattern is typically via imaging onto a layer of radiation-sensitive material (resist) provided on the substrate. In general, a single substrate will contain a network of adjacent target portions that are successively patterned. Known lithographic apparatus include so-called steppers, in which each target portion is irradiated by exposing an entire pattern onto the target portion at one time, and so-called scanners, in which each target portion is irradiated by scanning the pattern through a radiation beam in a given direction (the “scanning” direction) while synchronously scanning the substrate parallel or anti-parallel to this direction. It is also possible to transfer the pattern from the patterning device to the substrate by imprinting the pattern onto the substrate.

[0004] In order to monitor the lithographic process, parameters of the patterned substrate are measured. Parameters may include, for example, the overlay error between successive layers formed in or on the patterned substrate and critical linewidth or critical dimension (CD) of developed photosensitive resist. This measurement may be performed on a product substrate and/or on a dedicated metrology target. There are various techniques for making measurements of the microscopic structures formed in lithographic processes, including the use of scanning electron microscopes and various specialized tools.

[0005] In performing lithographic processes, such as application of a pattern on a substrate or measurement of such a pattern, process control and/or quality monitoring methods may rely on a stochastic analysis of features formed by the lithographic processes. Such a stochastic analysis presently requires high resolution metrology, which may typically be implemented using a scanning electron microscope (SEM). However, SEM metrology is slow, and therefore unsuitable for high volume manufacturing.

SUMMARY OF THE INVENTION

[0006] It is an object of the invention to provide a method of stochastic metrology at faster speeds than presently possible using an SEM.

[0007] In a first aspect of the invention there is provided a method of determining a stochastic metric relating to a structure, the method comprising: obtaining a trained model, the model having been trained to correlate training optical metrology data to training stochastic metric data, wherein the training optical metrology data comprises a plurality of measurement signals relating to a plurality of angularly resolved distributions of an intensity related parameter across a zero or higher order of diffraction comprised within radiation scattered from a plurality of training structures on a substrate, and the training stochastic metric data comprises stochastic metric values relating to said plurality of training structures, wherein the plurality of training structures have been formed with a variation in one or more dimensions on which said stochastic metric is dependent; obtaining optical metrology data comprising an angularly resolved distribution of the intensity related parameter across a zero or higher order of diffraction comprised within radiation scattered from a structure; and using the trained model to infer a value of the stochastic metric associated with the structure from the optical metrology data.

[0008] By using a trained model as described, an accurate method to derive a stochastic metric is possible based on optical metrology data, which is less time-consuming to obtain than SEM metrology data.
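The train-then-infer flow of the first aspect can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the training data is synthetic, the function names (make_synthetic_training_set, train_ridge, infer) are hypothetical, and a simple ridge regression stands in for the trained model, whereas the claims mention a machine learning model, neural network or convolutional neural network. The sketch assumes one angularly resolved (pupil) image per training structure and a stochastic metric that depends on a dimension varied across the training structures.

```python
import numpy as np


def make_synthetic_training_set(n_structures=200, pupil_shape=(16, 16), seed=0):
    """Hypothetical stand-in for measured data: one pupil image per structure."""
    rng = np.random.default_rng(seed)
    # Assumed hidden dimension variation (e.g. a CD offset from dose/focus variation).
    cd_offset = rng.uniform(-1.0, 1.0, size=n_structures)
    # Assumed fixed pupil response whose amplitude scales with the CD offset.
    pattern = rng.normal(size=pupil_shape)
    noise = 0.05 * rng.normal(size=(n_structures, *pupil_shape))
    pupils = cd_offset[:, None, None] * pattern + noise
    # Assumed stochastic metric (e.g. log10 of a defect rate), linear in the offset.
    metric = -6.0 + 2.0 * cd_offset + 0.05 * rng.normal(size=n_structures)
    return pupils, metric


def train_ridge(pupils, metric, alpha=1.0):
    """Fit a ridge regression from flattened pupil images to metric values."""
    X = pupils.reshape(len(pupils), -1)
    x_mean, y_mean = X.mean(axis=0), metric.mean()
    Xc = X - x_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T (y - y_mean) for the weights w.
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (metric - y_mean))
    return {"w": w, "x_mean": x_mean, "y_mean": y_mean}


def infer(model, pupil):
    """Infer a stochastic metric value from a single pupil image."""
    x = pupil.reshape(-1)
    return float((x - model["x_mean"]) @ model["w"] + model["y_mean"])


if __name__ == "__main__":
    pupils, metric = make_synthetic_training_set()
    model = train_ridge(pupils[:150], metric[:150])      # training structures
    predictions = np.array([infer(model, p) for p in pupils[150:]])
    rmse = np.sqrt(np.mean((predictions - metric[150:]) ** 2))
    print(f"held-out RMSE: {rmse:.3f}")
```

In this sketch the reference metric values play the role that SEM-derived stochastic metric data plays in the method; once the model is fitted, only the optical measurement is needed at inference time.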

[0009] In a second aspect of the invention, there is provided a computing apparatus comprising a processor, and being configured to perform the method of the first aspect.

[0010] In a third aspect of the invention, there is provided a scanning electron microscopy inspection apparatus being operable to image a plurality of features on a substrate, and comprising the computing apparatus of the second aspect.

[0011] In a fourth aspect of the invention, there is provided a computer program comprising program instructions operable to perform the method of the first aspect when run on a suitable apparatus.

[0012] In a fifth aspect of the invention, there is provided an optical metrology device comprising: an optical system operable to obtain optical metrology data comprising at least one measurement signal relating to a structure having been exposed in a lithographic process; a non-transient data carrier comprising a trained model, the model having been trained to infer one or more stochastic metric values for said stochastic metric from optical metrology data, said trained model having been trained on training optical metrology data and training stochastic metric data, wherein the training optical metrology data comprises a plurality of measurement signals, each relating to scattered radiation having been scattered by a training structure of a plurality of training structures on a training substrate; and the training stochastic metric data comprises stochastic metric values relating to said training structures, wherein multiple instances of said training structures have been formed with a variation in one or more process parameters on which said stochastic metric is dependent; and a processor being operable to use the trained model to infer a value for said stochastic metric from said optical metrology data.

[0013] Further aspects, features and advantages of the invention, as well as the structure and operation of various embodiments of the invention, are described in detail below with reference to the accompanying drawings. It is noted that the invention is not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings in which:

Figure 1 depicts a lithographic apparatus together with other apparatuses forming a production facility for semiconductor devices;

Figure 2 schematically depicts two examples of stochastic variation: (a) line edge roughness (LER); and (b) line width roughness (LWR);

Figure 3 (a) is a schematic drawing of a first optical metrology apparatus operable to implement methods according to an embodiment; and (b) a target measurable using such a tool;

Figure 4 (a) is a schematic drawing of a second optical metrology apparatus operable to implement methods according to an embodiment in which EUV and/or SXR radiation is used; and (b) a diffraction pattern as may be detected using such a metrology device;

Figure 5 is a flow diagram describing a first method of training and using a machine learning model to infer stochastic related data from optical metrology data according to an embodiment of the invention;

Figure 6 is a plot of defect rate DR(SEM) as measured using a scanning electron microscope against defect rate DR(IDM) as measured using an optical metrology tool such as illustrated in Figure 3(a) or Figure 4(a); and

Figure 7 is a flow diagram describing a second method of training and using a machine learning model to infer stochastic related data from optical metrology data according to an embodiment of the invention.

DETAILED DESCRIPTION

[0015] Before describing embodiments of the invention in detail, it is instructive to present an example environment in which embodiments of the present invention may be implemented.

[0016] Figure 1 at 200 shows a lithographic apparatus LA as part of an industrial production facility implementing a high-volume, lithographic manufacturing process. In the present example, the manufacturing process is adapted for the manufacture of semiconductor products (integrated circuits) on substrates such as semiconductor wafers. The skilled person will appreciate that a wide variety of products can be manufactured by processing different types of substrates in variants of this process. The production of semiconductor products is used purely as an example which has great commercial significance today.

[0017] Within the lithographic apparatus (or “litho tool” 200 for short), a measurement station MEA is shown at 202 and an exposure station EXP is shown at 204. A control unit LACU is shown at 206. In this example, each substrate visits the measurement station and the exposure station to have a pattern applied. In an optical lithographic apparatus, for example, a projection system is used to transfer a product pattern from a patterning device MA onto the substrate using conditioned radiation. This is done by forming an image of the pattern in a layer of radiation-sensitive resist material.

[0018] The term “projection system” used herein should be broadly interpreted as encompassing any type of projection system, including refractive, reflective, catadioptric, magnetic, electromagnetic and electrostatic optical systems, or any combination thereof, as appropriate for the exposure radiation being used, or for other factors such as the use of an immersion liquid or the use of a vacuum. The patterning device MA may be a mask or reticle, which imparts a pattern to a radiation beam transmitted or reflected by the patterning device. Well-known modes of operation include a stepping mode and a scanning mode. As is well known, the projection system may cooperate with support and positioning systems for the substrate and the patterning device in a variety of ways to apply a desired pattern to many target portions across a substrate. Programmable patterning devices may be used instead of reticles having a fixed pattern. The radiation for example may include electromagnetic radiation in the deep ultraviolet (DUV) or extreme ultraviolet (EUV) wavebands. The present disclosure is also applicable to other types of lithographic process, for example imprint lithography and direct writing lithography, for example by electron beam.

[0019] The lithographic apparatus control unit LACU controls all the movements and measurements of various actuators and sensors to receive substrates W and reticles MA and to implement the patterning operations. LACU also includes signal processing and data processing capacity to implement desired calculations relevant to the operation of the apparatus. In practice, control unit LACU will be realized as a system of many sub-units, each handling the real-time data acquisition, processing and control of a subsystem or component within the apparatus.

[0020] Before the pattern is applied to a substrate at the exposure station EXP, the substrate is processed at the measurement station MEA so that various preparatory steps may be carried out. The preparatory steps may include mapping the surface height of the substrate using a level sensor and measuring the position of alignment marks on the substrate using an alignment sensor. The alignment marks are arranged nominally in a regular grid pattern. However, due to inaccuracies in creating the marks and also due to deformations of the substrate that occur throughout its processing, the marks deviate from the ideal grid. Consequently, in addition to measuring position and orientation of the substrate, the alignment sensor in practice must measure in detail the positions of many marks across the substrate area, if the apparatus is to print product features at the correct locations with very high accuracy. The apparatus may be of a so-called dual stage type which has two substrate tables, each with a positioning system controlled by the control unit LACU. While one substrate on one substrate table is being exposed at the exposure station EXP, another substrate can be loaded onto the other substrate table at the measurement station MEA so that various preparatory steps may be carried out. The measurement of alignment marks is therefore very time-consuming and the provision of two substrate tables enables a substantial increase in the throughput of the apparatus. If the position sensor IF is not capable of measuring the position of the substrate table while it is at the measurement station as well as at the exposure station, a second position sensor may be provided to enable the positions of the substrate table to be tracked at both stations. Lithographic apparatus LA may, for example, be of a so-called dual stage type which has two substrate tables and two stations, an exposure station and a measurement station, between which the substrate tables can be exchanged.

[0021] Within the production facility, apparatus 200 forms part of a “litho cell” or “litho cluster” that contains also a coating apparatus 208 for applying photosensitive resist and other coatings to substrates W for patterning by the apparatus 200. At an output side of apparatus 200, a baking apparatus 210 and developing apparatus 212 are provided for developing the exposed pattern into a physical resist pattern. Between all of these apparatuses, substrate handling systems take care of supporting the substrates and transferring them from one piece of apparatus to the next. These apparatuses, which are often collectively referred to as the track, are under the control of a track control unit which is itself controlled by a supervisory control system SCS, which also controls the lithographic apparatus via lithographic apparatus control unit LACU. Thus, the different apparatus can be operated to maximize throughput and processing efficiency. Supervisory control system SCS receives recipe information R which provides in great detail a definition of the steps to be performed to create each patterned substrate.

[0022] Once the pattern has been applied and developed in the litho cell, patterned substrates 220 are transferred to other processing apparatuses such as are illustrated at 222, 224, 226. A wide range of processing steps is implemented by various apparatuses in a typical manufacturing facility. For the sake of example, apparatus 222 in this embodiment is an etching station, and apparatus 224 performs a post-etch annealing step. Further physical and/or chemical processing steps are applied in further apparatuses, 226, etc. Numerous types of operation can be required to make a real device, such as deposition of material, modification of surface material characteristics (oxidation, doping, ion implantation etc.), chemical-mechanical polishing (CMP), and so forth. The apparatus 226 may, in practice, represent a series of different processing steps performed in one or more apparatuses. As another example, apparatus and processing steps may be provided for the implementation of self-aligned multiple patterning, to produce multiple smaller features based on a precursor pattern laid down by the lithographic apparatus.

[0023] As is well known, the manufacture of semiconductor devices involves many repetitions of such processing, to build up device structures with appropriate materials and patterns, layer-by-layer on the substrate. Accordingly, substrates 230 arriving at the litho cluster may be newly prepared substrates, or they may be substrates that have been processed previously in this cluster or in another apparatus entirely. Similarly, depending on the required processing, substrates 232 on leaving apparatus 226 may be returned for a subsequent patterning operation in the same litho cluster, they may be destined for patterning operations in a different cluster, or they may be finished products to be sent for dicing and packaging.

[0024] Each layer of the product structure requires a different set of process steps, and the apparatuses 226 used at each layer may be completely different in type. Further, even where the processing steps to be applied by the apparatus 226 are nominally the same, in a large facility, there may be several supposedly identical machines working in parallel to perform the step 226 on different substrates. Small differences in set-up or faults between these machines can mean that they influence different substrates in different ways. Even steps that are relatively common to each layer, such as etching (apparatus 222) may be implemented by several etching apparatuses that are nominally identical but working in parallel to maximize throughput. In practice, moreover, different layers require different etch processes, for example chemical etches, plasma etches, according to the details of the material to be etched, and special requirements such as, for example, anisotropic etching.

[0025] The previous and/or subsequent processes may be performed in other lithography apparatuses, as just mentioned, and may even be performed in different types of lithography apparatus. For example, some layers in the device manufacturing process which are very demanding in parameters such as resolution and overlay may be performed in a more advanced lithography tool than other layers that are less demanding. Therefore some layers may be exposed in an immersion type lithography tool, while others are exposed in a ‘dry’ tool. Some layers may be exposed in a tool working at DUV wavelengths, while others are exposed using EUV wavelength radiation.

[0026] In order that the substrates that are exposed by the lithographic apparatus are exposed correctly and consistently, it is desirable to inspect exposed substrates to measure properties such as overlay errors between subsequent layers, line thicknesses, critical dimensions (CD), etc. Accordingly, a manufacturing facility in which litho cell LC is located also includes a metrology system which receives some or all of the substrates W that have been processed in the litho cell. Metrology results are provided directly or indirectly to the supervisory control system SCS. If errors are detected, adjustments may be made to exposures of subsequent substrates, especially if the metrology can be done soon and fast enough that other substrates of the same batch are still to be exposed. Also, already exposed substrates may be stripped and reworked to improve yield, or discarded, thereby avoiding performing further processing on substrates that are known to be faulty. In a case where only some target portions of a substrate are faulty, further exposures can be performed only on those target portions which are good.

[0027] Also shown in Figure 1 is a metrology apparatus 240 which is provided for making measurements of parameters of the products at desired stages in the manufacturing process. A common example of a metrology station in a modern lithographic production facility is a scatterometer, for example a dark-field scatterometer, an angle-resolved scatterometer or a spectroscopic scatterometer, and it may be applied to measure properties of the developed substrates at 220 prior to etching in the apparatus 222. Using metrology apparatus 240, it may be determined, for example, that important performance parameters such as overlay or critical dimension (CD) do not meet specified accuracy requirements in the developed resist. Prior to the etching step, the opportunity exists to strip the developed resist and reprocess the substrates 220 through the litho cluster. The metrology results 242 from the apparatus 240 can be used to maintain accurate performance of the patterning operations in the litho cluster, by supervisory control system SCS and/or control unit LACU 206 making small adjustments over time, thereby minimizing the risk of products being made out-of-specification, and requiring re-work.

[0028] Another example of a metrology station is a scanning electron microscope (SEM), otherwise referred to as an electron beam (e-beam) metrology device, which may be included in addition to, or as an alternative to, a scatterometer. As such, metrology apparatus 240 may comprise an e-beam or SEM metrology device, either alone or in addition to a scatterometer. E-beam and SEM metrology devices have the advantage of measuring features directly (i.e., they directly image the features), rather than the indirect measurement techniques used in scatterometry (where parameter values are determined from reconstruction from and/or asymmetry in diffraction orders of radiation diffracted by the structure being measured). The main disadvantage with e-beam or SEM metrology devices is their measurement speed, which is much slower than scatterometry, limiting their potential application to specific offline monitoring processes.

[0029] Additionally, metrology apparatus 240 and/or other metrology apparatuses (not shown) can be applied to measure properties of the processed substrates 232, 234, and incoming substrates 230. The metrology apparatus can be used on the processed substrate to determine important parameters such as overlay or CD.

[0030] Lithographic projection apparatuses typically project a patterned (i.e., by a reticle) image at a point immediately above the substrate, and then ultimately into the resist. The projected image is called the aerial image, which comprises a distribution of light intensity as a function of spatial position in the image plane. The aerial image is the source of the information that is exposed into the resist, forming a gradient in dissolution rates that enables the three-dimensional resist image to appear during development.

[0031] Stochastic induced failure predictions are typically made based on one or more stochastic metrics. Such stochastic metrics may include stochastic measurements on the variation of one or more dimensional parameters, e.g. one or more of: CD (so-called local CD uniformity, LCDU), line edge position (so-called line edge roughness, LER), or linewidth (so-called linewidth roughness, LWR). Accurate measurement of the number of failures is cumbersome, as low failure rates (e.g., of the order of 1 per million to 1 per billion) can be expected in an optimized process.
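To make concrete why counting failures at such rates is cumbersome, the following sketch works through the arithmetic. It assumes, purely for illustration, that features fail independently with a fixed probability; the function names are hypothetical and the independence assumption is not taken from this disclosure.

```python
import math


def expected_failures(failure_rate, n_features):
    """Expected number of failed features among n_features inspected."""
    return failure_rate * n_features


def prob_at_least_one_failure(failure_rate, n_features):
    """P(at least one failure) = 1 - (1 - p)^N, evaluated stably for tiny p."""
    return -math.expm1(n_features * math.log1p(-failure_rate))


def features_needed(failure_rate, confidence=0.95):
    """Smallest N for which at least one failure is seen with the given confidence."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-failure_rate))


if __name__ == "__main__":
    for rate in (1e-6, 1e-9):
        n = features_needed(rate)
        print(f"failure rate {rate:g}: inspect about {n:.3g} features "
              f"to see at least one failure with 95% confidence")
```

Under these assumptions, the number of features that must be inspected scales inversely with the failure rate, so a rate near 1 per billion requires on the order of billions of individually measured features.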

[0032] Imaging, using a lithographic projection apparatus, will result in a stochastic variation in one or more parameters, such as pronounced line width roughness (LWR) and local CD variation in small two-dimensional features such as holes. The stochastic variation may be attributed to factors such as photon shot noise, photon-generated secondary electrons, photon absorption variation, and photon-generated acids in the resist. In the case of EUV lithography, the small sizes of features for which EUV is called for further compound this stochastic variation. The stochastic variation in smaller features is a significant factor in production yield and justifies inclusion in a variety of optimization processes of the lithographic projection apparatus.

[0033] Figure 2(a) schematically depicts a stochastic effect, line edge roughness LER. Assuming all conditions are identical in three exposures or simulations of exposure of an edge 903 of a feature on a design layout, the resist images 903A, 903B and 903C of the edge 903 may have slightly different shapes and locations. Locations 904A, 904B and 904C of the resist images 903A, 903B and 903C may be measured by averaging the resist images 903A, 903B and 903C, respectively. LER of the edge 903 may be a measure of the spatial distribution of the locations 904A, 904B and 904C. For example, the LER may be a 3σ of the spatial distribution (assuming the distribution is a normal distribution). The LER may be derived from many exposures or simulations of the edge 903.

[0034] Figure 2(b) schematically depicts LWR. Assuming all conditions are identical in three exposures or simulations of exposure of a long rectangle feature 910 with a width 911 on a design layout, the resist images 910A, 910B and 910C of the rectangle feature 910 may have slightly different widths 911A, 911B and 911C, respectively. LWR of the rectangle feature 910 may be a measure of the distribution of the widths 911A, 911B and 911C. For example, the LWR may be a 3σ of the distribution (assuming the distribution is a normal distribution). The LWR may be derived from many exposures or simulations of the rectangle feature 910. In the context of a short feature (e.g., a contact hole), the widths of its images are not well defined because long edges are not available for averaging their locations. A similar quantity, LCDU, may be used to characterize the stochastic variation. The LCDU is a 3σ of the distribution (assuming the distribution is a normal distribution) of measured CDs of images of the short feature.
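The 3σ definitions of LER, LWR and LCDU given above reduce to the same computation applied to different measured quantities. The sketch below is illustrative only: the helper name three_sigma is hypothetical, the sample standard deviation (ddof=1) is an assumed choice of estimator, and the input values are synthetic rather than measured.

```python
import numpy as np


def three_sigma(values):
    """Three times the sample standard deviation of the measured values."""
    return 3.0 * float(np.std(np.asarray(values, dtype=float), ddof=1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins (nm) for quantities gathered over many exposures:
    edge_locations = rng.normal(loc=0.0, scale=0.5, size=10_000)   # cf. 904A, 904B, ...
    line_widths = rng.normal(loc=20.0, scale=0.7, size=10_000)     # cf. 911A, 911B, ...
    contact_hole_cds = rng.normal(loc=25.0, scale=1.0, size=10_000)

    print(f"LER  = {three_sigma(edge_locations):.2f} nm")
    print(f"LWR  = {three_sigma(line_widths):.2f} nm")
    print(f"LCDU = {three_sigma(contact_hole_cds):.2f} nm")
```

The only difference between the three metrics in this sketch is which quantity is collected: averaged edge locations for LER, widths of a long feature for LWR, and CDs of a short feature for LCDU.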

[0035] Figure 3(a) illustrates an example of a metrology apparatus 100 suitable for use in embodiments of the invention disclosed herein. The principles of operation of this type of metrology apparatus and in-die metrology (IDM) techniques for which it may be used are explained in more detail in U.S. Patent Application Nos. US 2006-033921 and US 2010-201963 and in WO2017148982, which are incorporated herein in their entireties by reference. An optical axis, which has several branches throughout the apparatus, is represented by a dotted line O. In this apparatus, radiation emitted by source 110 (e.g., a xenon lamp) is directed onto substrate W via an optical system comprising: lens system 120, aperture plate 130, lens system 140, a partially reflecting surface 150 and objective lens 160. In an embodiment these lens systems 120, 140, 160 are arranged in a double sequence of a 4F arrangement. In an embodiment, the radiation emitted by radiation source 110 is collimated using lens system 120. A different lens arrangement can be used, if desired. The angular range at which the radiation is incident on the substrate can be selected by defining a spatial intensity distribution in a plane that presents the spatial spectrum of the substrate plane. In particular, this can be done by inserting an aperture plate 130 of suitable form between lenses 120 and 140, in a plane which is a back-projected image of the objective lens pupil plane. Different intensity distributions (e.g., annular, dipole, etc.) are possible by using different apertures. The angular distribution of illumination in radial and peripheral directions, as well as properties such as wavelength, polarization and/or coherency of the radiation, can all be adjusted to obtain desired results. For example, one or more interference filters 130 can be provided between source 110 and partially reflecting surface 150 to select a wavelength of interest in the range of, say, 400 - 900 nm or even lower, such as 200 - 300 nm. The interference filter may be tunable rather than comprising a set of different filters. A grating could be used instead of an interference filter. In an embodiment, one or more polarizers 170 can be provided between source 110 and partially reflecting surface 150 to select a polarization of interest. The polarizer may be tunable rather than comprising a set of different polarizers.

[0036] The target T is placed with substrate W normal to the optical axis O of objective lens 160. Thus, radiation from source 110 is reflected by partially reflecting surface 150 and focused into an illumination spot S (see Figure 3(b)) on target T on substrate W via objective lens 160. In an embodiment, objective lens 160 has a high numerical aperture (NA), desirably at least 0.9 or at least 0.95. An immersion metrology apparatus (using a relatively high refractive index fluid such as water) may even have a numerical aperture over 1.
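
By way of illustration only, the relation between numerical aperture and collection angle may be sketched using the standard definition NA = n·sin(θ), where n is the refractive index of the medium in front of the objective. The refractive index of 1.33 for water and the NA of 1.2 for the immersion case are assumptions of this example.

```python
import math

def max_half_angle_deg(numerical_aperture, refractive_index=1.0):
    """Largest half-angle (degrees) accepted by an objective, from NA = n * sin(theta)."""
    ratio = numerical_aperture / refractive_index
    if not 0.0 < ratio <= 1.0:
        raise ValueError("NA must be positive and cannot exceed the refractive index")
    return math.degrees(math.asin(ratio))

dry_angle = max_half_angle_deg(0.95)        # objective in air (n = 1)
wet_angle = max_half_angle_deg(1.2, 1.33)   # hypothetical immersion case, n = 1.33 assumed for water
print(f"NA 0.95 in air: {dry_angle:.1f} deg; NA 1.2 in water: {wet_angle:.1f} deg")
```

The check in the function reflects why an NA over 1 requires an immersion fluid: in air the ratio NA/n would exceed 1.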

[0037] Rays of illumination 170, 172 focused to the illumination spot from angles off the axis O give rise to diffracted rays 174, 176. It should be remembered that these rays are just one of many parallel rays covering an area of the substrate including target T. Each element within the illumination spot is within the field of view of the metrology apparatus. Since the aperture in plate 130 has a finite width (necessary to admit a useful quantity of radiation), the incident rays 170, 172 will in fact occupy a range of angles, and the diffracted rays 174, 176 will be spread out somewhat. According to the point spread function of a small target, each diffraction order will be further spread over a range of angles, not a single ideal ray as shown.

[0038] At least the zeroth order diffracted by the target on substrate W is collected by objective lens 160 and directed back through partially reflecting surface 150. An optical element 180 provides at least part of the diffracted beams to optical system 182 which forms a diffraction spectrum (pupil plane image) of the target T on sensor 190 (e.g. a CCD or CMOS sensor) using the zeroth and/or first order diffracted beams. In an embodiment, an aperture 186 is provided to filter out certain diffraction orders so that a particular diffraction order is provided to the sensor 190. In an embodiment, the aperture 186 allows substantially or primarily only zeroth order radiation to reach the sensor 190. In an embodiment, the sensor 190 may be a two-dimensional detector so that a two-dimensional angular scatter spectrum of a substrate target T can be measured. The sensor 190 may be, for example, an array of CCD or CMOS sensors, and may use an integration time of, for example, 40 milliseconds per frame. The sensor 190 may be used to measure the intensity of redirected radiation at a single wavelength (or narrow wavelength range), the intensity separately at multiple wavelengths or integrated over a wavelength range. Furthermore, the sensor may be used to separately measure the intensity of radiation with transverse magnetic- and/or transverse electric-polarization and/or the phase difference between transverse magnetic- and transverse electric-polarized radiation.

[0039] Optionally, optical element 180 provides at least part of the diffracted beams to measurement branch 200 to form an image of the target on the substrate W on a sensor 230 (e.g. a CCD or CMOS sensor). The measurement branch 200 can be used for various auxiliary functions such as focusing the metrology apparatus (i.e., enabling the substrate W to be in focus with the objective 160), and/or for dark field imaging where the image is formed with the zeroth order blocked, such that it is comprised only of a single diffraction order or complementary pair of diffraction orders.

[0040] In order to provide a customized field of view for different sizes and shapes of grating, an adjustable field stop 300 is provided within the lens system 140 on the path from source 110 to the objective lens 160. The field stop 300 contains an aperture 302 and is located in a plane conjugate with the plane of the target T, so that the illumination spot becomes an image of the aperture 302. The image may be scaled according to a magnification factor, or the aperture and illumination spot may be in 1:1 size relation. In order to make the illumination adaptable to different types of measurement, the aperture plate 300 may comprise a number of aperture patterns formed around a disc, which rotates to bring a desired pattern into place. Alternatively or in addition, a set of plates 300 could be provided and swapped, to achieve the same effect. Additionally or alternatively, a programmable aperture device such as a deformable mirror array or transmissive spatial light modulator can be used.

[0041] Typically, a target will be aligned with its periodic structure features running either parallel to the Y axis or parallel to the X axis. With regard to its diffractive behavior, a periodic structure with features extending in a direction parallel to the Y axis has periodicity in the X direction, while a periodic structure with features extending in a direction parallel to the X axis has periodicity in the Y direction. In order to measure the performance in both directions, both types of features are generally provided. While for simplicity there will be reference to lines and spaces, the periodic structure need not be formed of lines and spaces. Moreover, each line and/or space between lines may be a structure formed of smaller sub-structures. Further, the periodic structure may be formed with periodicity in two dimensions at once, for example where the periodic structure comprises posts and/or via holes.

[0042] Figure 3(b) illustrates a plan view of a typical target T, and the extent of illumination spot S in the apparatus of Figure 3(a). To obtain a diffraction spectrum that is free of interference from surrounding structures, the target T, in an embodiment, is a periodic structure (e.g., grating) larger than the width (e.g., diameter) of the illumination spot S. The width of spot S may be smaller than the width and length of the target. The target in other words is 'underfilled' by the illumination, and the diffraction signal is essentially free from any signals from product features and the like outside the target itself. This simplifies mathematical reconstruction of the target as it can be regarded as infinite.

[0043] The device depicted in Figure 3(a) may be used to determine the value of one or more variables of interest of a target pattern based on measurement data obtained using metrology. Radiation detected by the detector 190 provides a measured radiation distribution (or more generally an angularly resolved parameter distribution) for the target T.

[0044] Figure 4(a) depicts a schematic representation of a metrology apparatus 302 in which radiation in the wavelength range from 0.01 nm to 100 nm may be used to measure parameters of structures on a substrate. The metrology apparatus 302 presented in Figure 4(a) may be suitable for the hard X-ray, soft X-ray or EUV domain.

[0045] Figure 4(a) illustrates a schematic physical arrangement of a further metrology apparatus 302 usable in methods disclosed herein. This metrology apparatus 302 comprises a spectroscopic scatterometer using hard X-ray (HXR) and/or soft X-ray (SXR) and/or EUV radiation optionally in grazing incidence, purely by way of example. Such a device will be referred to herein as an SXR metrology device for performing SXR metrology, and the images obtained will be referred to as SXR images, regardless of the actual wavelengths used.

[0046] Inspection apparatus 302 comprises a radiation source (also called illumination source) 310, illumination system 312, substrate support 316, detection systems 318, 398 and metrology processing unit (MPU) 320.

[0047] An illumination source 310 in this example is for generation of EUV, hard X-ray or soft X-ray radiation. The illumination source 310 may be based on high harmonic generation (HHG) techniques as shown in Figure 4(a), and it may also be another type of illumination source, e.g. a liquid metal jet source, inverse Compton scattering (ICS) source, plasma channel source, magnetic undulator source or free electron laser (FEL) source.

[0048] For the example of an HHG source, as shown in Figure 4(a), main components of the radiation source are a pump radiation source 330 operable to emit the pump radiation and a gas delivery system 332. Optionally the pump radiation source 330 is a laser, optionally the pump radiation source 330 is a pulsed high-power infrared or optical laser. The pump radiation source 330 may be, for example, a fiber-based laser with an optical amplifier, producing pulses of infrared radiation that may last for example less than 1 ns (1 nanosecond) per pulse, with a pulse repetition rate up to several megahertz, as required. The wavelength of the infrared radiation may be for example in the region of 1 μm (1 micron). Optionally, the laser pulses are delivered as a first pump radiation 340 to the gas delivery system 332, where in the gas a portion of the radiation is converted to higher frequencies than the first radiation into an emitted radiation 342. A gas supply 334 supplies a suitable gas to the gas delivery system 332, where it is optionally ionized by an electric source 336. The gas delivery system 332 may be a cut tube. A gas provided by the gas delivery system 332 defines a gas target, which may be a gas flow or a static volume. The gas may be for example a noble gas such as neon (Ne), helium (He) or argon (Ar). N2, O2, Ar, Kr, Xe gases may all be considered. These may be selectable options within the same apparatus.

[0049] The emitted radiation may contain multiple wavelengths. If the emitted radiation were monochromatic, then measurement calculations (for example reconstruction) may be simplified, but it is easier to produce radiation with several wavelengths. An emission divergence angle of the emitted radiation may be wavelength dependent. Different wavelengths will, for example, provide different levels of contrast when imaging structures of different materials. For inspection of metal structures or silicon structures, for example, different wavelengths may be selected to those used for imaging features of (carbon-based) resist, or for detecting contamination of such different materials. One or more filtering devices 344 may be provided. For example, a filter such as a thin membrane of Aluminum (Al) or Zirconium (Zr) may serve to cut the fundamental IR radiation from passing further into the inspection apparatus. A grating (not shown) may be provided to select one or more specific wavelengths from among those generated. Optionally some or all of the beam path may be contained within a vacuum environment, bearing in mind that SXR and/or EUV radiation is absorbed when traveling in air. The various components of radiation source 310 and illumination optics 312 may be adjustable to implement different metrology ‘recipes’ within the same apparatus. For example, different wavelengths and/or polarization may be made selectable.

[0050] Depending on the materials of the structure under inspection, different wavelengths may offer a desired level of penetration into lower layers. For resolving the smallest device features and defects among the smallest device features, a short wavelength is likely to be preferred. For example, one or more wavelengths in the range 0.01-20 nm or optionally in the range 1-10 nm or optionally in the range 10-20 nm may be chosen. Wavelengths shorter than 5 nm may suffer from very low critical angle when reflecting off materials of interest in semiconductor manufacture. Therefore, choosing a wavelength greater than 5 nm may provide stronger signals at higher angles of incidence. On the other hand, if the inspection task is for detecting the presence of a certain material, for example to detect contamination, then wavelengths up to 50 nm could be useful.

[0051] From the radiation source 310, the filtered beam 342 enters an inspection chamber 350 where the substrate W including a structure of interest is held for inspection at a measurement position by substrate support 316. The structure of interest is labeled T. Optionally the atmosphere within inspection chamber 350 may be maintained near vacuum by vacuum pump 352, so that SXR and/or EUV radiation may pass without undue attenuation through the atmosphere. The illumination system 312 has the function of focusing the radiation into a focused beam 356, and may comprise for example a two-dimensionally curved mirror, or a series of one-dimensionally curved mirrors, as described in published US patent application US2017/0184981A1 (which content is incorporated herein by reference in its entirety), mentioned above. The focusing is performed to achieve a round or elliptical spot S under 10 μm in diameter, when projected onto the structure of interest. Substrate support 316 comprises for example an X-Y translation stage and a rotation stage, by which any part of the substrate W may be brought to the focal point of the beam in a desired orientation. Thus the radiation spot S is formed on the structure of interest. Alternatively, or additionally, substrate support 316 comprises for example a tilting stage that may tilt the substrate W at a certain angle to control the angle of incidence of the focused beam on the structure of interest T.

[0052] Optionally, the illumination system 312 provides a reference beam of radiation to a reference detector 314 which may be configured to measure a spectrum and/or intensities of different wavelengths in the filtered beam 342. The reference detector 314 may be configured to generate a signal 315 that is provided to processor 320, and the signal 315 may comprise information about the spectrum of the filtered beam 342 and/or the intensities of the different wavelengths in the filtered beam.

[0053] Reflected radiation 360 is captured by detector 318 and a spectrum is provided to processor 320 for use in calculating a property of the target structure T. The illumination system 312 and detection system 318 thus form an inspection apparatus. This inspection apparatus may comprise a hard X-ray, soft X-ray and/or EUV spectroscopic reflectometer of the kind described in US2016282282A1 which content is incorporated herein by reference in its entirety.

[0054] If the target Ta has a certain periodicity, the radiation of the focused beam 356 may be partially diffracted as well. The diffracted radiation 397 follows another path, at well-defined angles with respect to the angle of incidence, than the reflected radiation 360. In Figure 4(a), the diffracted radiation 397 is drawn in a schematic manner, and diffracted radiation 397 may follow many other paths than the drawn paths. The inspection apparatus 302 may also comprise further detection systems 398 that detect and/or image at least a portion of the diffracted radiation 397. In Figure 4(a) a single further detection system 398 is drawn, but embodiments of the inspection apparatus 302 may also comprise more than one further detection system 398 arranged at different positions to detect and/or image diffracted radiation 397 at a plurality of diffraction directions. In other words, the (higher) diffraction orders of the focused radiation beam that impinges on the target Ta are detected and/or imaged by one or more further detection systems 398. The one or more detection systems 398 generate a signal 399 that is provided to the metrology processor 320. The signal 399 may include information of the diffracted light 397 and/or may include images obtained from the diffracted light 397.
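
The well-defined diffraction angles referred to above may be illustrated with the standard grating equation, sin(θ_m) = sin(θ_i) + m·λ/p, with angles measured from the surface normal. The following is a minimal sketch only: the sign convention, the normal-incidence geometry and the example wavelengths and pitch are assumptions of this example and are not taken from the apparatus of Figure 4(a).

```python
import math

def diffraction_angles_deg(wavelength_nm, pitch_nm, incidence_deg, max_order=3):
    """Return {order: angle in degrees} for propagating orders.

    Uses sin(theta_m) = sin(theta_i) + m * wavelength / pitch; orders for
    which |sin(theta_m)| > 1 are evanescent and are omitted.
    """
    sin_in = math.sin(math.radians(incidence_deg))
    angles = {}
    for m in range(-max_order, max_order + 1):
        sin_out = sin_in + m * wavelength_nm / pitch_nm
        if abs(sin_out) <= 1.0:
            angles[m] = math.degrees(math.asin(sin_out))
    return angles

# Hypothetical values: 20 nm pitch, 13.5 nm (SXR/EUV) versus 400 nm (visible).
sxr_orders = diffraction_angles_deg(13.5, 20.0, 0.0)
visible_orders = diffraction_angles_deg(400.0, 20.0, 0.0)
print("13.5 nm:", sorted(sxr_orders))
print("400 nm:", sorted(visible_orders))
```

Under these assumed values, first orders propagate at the short wavelength, whereas only the zeroth order propagates at the visible wavelength.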

[0055] To aid the alignment and focusing of the spot S with desired product structures, inspection apparatus 302 may also provide auxiliary optics using auxiliary radiation under control of metrology processor 320. Metrology processor 320 may also communicate with a position controller 372 which operates the translation stage, rotation and/or tilting stages. Processor 320 receives highly accurate feedback on the position and orientation of the substrate, via sensors. Sensors 374 may include interferometers, for example, which may give accuracy in the region of picometers. In the operation of the inspection apparatus 302, spectrum data 382 captured by detection system 318 is delivered to metrology processing unit 320.

[0056] Figure 4(b) shows a diffraction image which may be obtained by measuring a target (e.g., a target such as illustrated in Figure 3(a)). The light diffracts and multiple orders are captured on a detector. In this diagram, the zeroth order 0th (specular reflection) and two first diffraction orders are shown. All orders, except for the specular reflection, are spectrally resolved (hence the 2D pattern formed by the first orders). Note the soft x-ray setup of Figure 4(a) measures the entire spectrum at once, in contrast to the metrology device of Figure 3(a) which measures multiple angles at once; i.e., the image of Figure 4(b) is spectrally resolved while the pupil images captured by the metrology device of Figure 3(a) are angularly resolved.

[0057] In either metrology apparatus, a substrate support may be provided to hold the substrate W during measurement operations. In an example where the metrology apparatus is integrated with a lithographic apparatus, the two devices may have the same substrate table. Coarse and fine positioners may be provided to accurately position the substrate in relation to a measurement optical system. Various sensors and actuators are provided for example to acquire the position of a target of interest, and to bring it into position under the objective lens. Typically many measurements will be made on targets at different locations across the substrate W. The substrate support can be moved in X and Y directions to acquire different targets, and in the Z direction to obtain a desired location of the target relative to the focus of the optical system. It is convenient to think and describe operations as if the objective lens is being brought to different locations relative to the substrate, when, for example, in practice the optical system may remain substantially stationary (typically in the X and Y directions, but perhaps also in the Z direction) and only the substrate moves. Provided the relative position of the substrate and the optical system is correct, it does not matter in principle which one of those is moving in the real world, or if both are moving, or a combination of a part of the optical system is moving (e.g., in the Z and/or tilt direction) with the remainder of the optical system being stationary and the substrate is moving (e.g., in the X and Y directions, but also optionally in the Z and/or tilt direction).

[0058] In an embodiment, the measurement accuracy and/or sensitivity of a target may vary with respect to one or more attributes of the beam of radiation provided onto the target, for example, the wavelength of the radiation beam, the polarization of the radiation beam, the intensity distribution (i.e., angular or spatial intensity distribution) of the radiation beam, etc. Thus, a particular measurement strategy can be selected that desirably obtains, e.g., good measurement accuracy and/or sensitivity of the target.

[0059] In order to monitor the patterning process (e.g., a device manufacturing process) that includes at least one pattern transfer step (e.g., an optical lithography step), the patterned substrate is inspected and one or more parameters of the patterned substrate are measured / determined. The one or more parameters may include, for example, overlay between successive layers formed in or on the patterned substrate, critical dimension (CD) (e.g., critical linewidth) of, for example, features formed in or on the patterned substrate, focus or focus error of an optical lithography step, dose or dose error of an optical lithography step, optical aberrations of an optical lithography step, placement error (e.g., edge placement error), etc. This measurement may be performed on a target of the product substrate itself and/or on a dedicated metrology target provided on the substrate. The measurement can be performed after-development of a resist but before etching or can be performed after-etch.

[0060] In an embodiment, a parameter obtained from a measurement process is a parameter derived from a parameter determined directly from the measurement process. As an example, a derived parameter obtained from a measurement parameter is edge placement error (EPE) for the patterning process. The edge placement error provides a variation in the location of an edge of a structure created by the patterning process. In an embodiment, the edge placement error is derived from an overlay value. In an embodiment, the edge placement error is derived from a combination of an overlay value and at least one stochastic metric. In an embodiment, the edge placement error is derived from a combination of an overlay value, at least one CD stochastic metric value (e.g., CDU, LCDU) and (optionally) also another stochastic metric (e.g., edge roughness, shape asymmetry, etc. of the individual structures). In an embodiment, the edge placement error comprises an extreme value (e.g., 3 standard deviations, i.e., 3σ) of overlay and CD errors combined. In an embodiment, the edge placement error has the following form (or comprises at least the first two of the following terms): wherein σ_overlay corresponds to the standard deviation of overlay, σ_CDUstructures corresponds to the standard deviation of the critical dimension uniformity (CDU) of structures created in the patterning process, σ_OPE,PBA corresponds to the standard deviation of optical proximity effects (OPE) and/or proximity bias average (PBA), which is a difference between CD at pitch to a reference CD, and σ_LER,LPE corresponds to the standard deviation of line edge roughness (LER) and/or local placement error (LPE). While the formulation above is in relation to standard deviation, it can be formulated in a different comparable statistical manner, such as variance.
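
As an illustration of how an extreme (3σ) value may be obtained from several standard deviations, the following minimal sketch assumes that the contributions are statistically independent and combines them in quadrature with equal weights. The equal weighting and the numerical values are assumptions made for this example only and are not taken from the expression referred to above.

```python
import math

def combined_three_sigma(sigmas_nm):
    """Return 3 * sqrt(sum of variances), assuming independent, equally weighted terms."""
    variance = sum(s * s for s in sigmas_nm)
    return 3.0 * math.sqrt(variance)

# Hypothetical standard deviations (nm) for the first two terms only.
sigma_overlay_nm = 0.6
sigma_cdu_structures_nm = 0.8
epe_3sigma_nm = combined_three_sigma([sigma_overlay_nm, sigma_cdu_structures_nm])
print(f"combined 3 sigma = {epe_3sigma_nm:.2f} nm")
```

Working with the summed variance before taking the square root corresponds to the remark above that the formulation may equally be expressed in terms of variance.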

[0061] There are various techniques for making measurements of the structures formed in the patterning process, including the use of a scanning electron microscope, an image-based measurement tool and/or various specialized tools. As discussed above, a fast and non-invasive form of specialized metrology tool is one in which a beam of radiation is directed onto a target on the surface of the substrate and properties of the scattered (diffracted/reflected) beam are measured. By evaluating one or more properties of the radiation scattered by the substrate, one or more properties of the substrate can be determined. This may be termed diffraction-based metrology. One such application of this diffraction-based metrology is in the measurement of feature asymmetry within a target. This can be used as a measure of overlay, for example, but other applications are also known. For example, asymmetry can be measured by comparing opposite parts of the diffraction spectrum (for example, comparing the -1st and +1st orders in the diffraction spectrum of a periodic grating). This can be done as described, for example, in U.S. patent application publication US2006-066855, which is incorporated herein in its entirety by reference. Another application of diffraction-based metrology is in the measurement of feature width (CD) within a target. Such techniques can use the apparatus and methods described above in respect of Figure 3 or Figure 4.

[0062] A target or structure measured by an apparatus such as depicted in Figure 3(a) or 4, and by methods disclosed herein may comprise one or a plurality of geometrically symmetric unit cells or features. As such, the target T or structure can comprise just a single physical instance of a unit cell or feature or can comprise a plurality of physical instances of the unit cells or features.

[0063] The target/structure can be a specially designed target. In an embodiment, the target is for a scribe lane. In an embodiment, the target can be an in-die target, i.e., the target is among the device pattern (and thus between the scribe lanes). In an embodiment, the target can have a feature width or pitch comparable to device pattern features. For example, the target feature width or pitches can be less than or equal to 300% of the smallest feature size or pitch of the device pattern, be less than or equal to 200% of the smallest feature size or pitch of the device pattern, be less than or equal to 150% of the smallest feature size or pitch of the device pattern, or be less than or equal to 100% of the smallest feature size or pitch of the device pattern.

[0064] The target or structure can be a device structure. For example, the target or structure can be a portion of a memory device (which often has one or more features that are, or can be, geometrically symmetric). Where the device structure is non-periodic or non-regular (e.g., logic structure), the target may be superficially similar to the logic structure, (e.g., of similar feature size and configuration) such that it comprises a regularized extraction of the logic structure, which mimics the exposure performance of the logic structure.

[0065] Desirably, for each structure, the physical instance of the unit cell/feature or the plurality of physical instances of the unit cells/features collectively fills a beam spot of the metrology apparatus. In that case, the measured results comprise essentially only information from the physical instance of the unit cell (or its plurality of instances). In an embodiment, the beam spot has a cross-sectional width of 50 microns or less, 40 microns or less, 30 microns or less, 20 microns or less, 15 microns or less, 10 microns or less, 5 microns or less, or 2 microns or less. Structure features may be of the scale of 20 nm pitch, and therefore each measurement or capture may comprise between 200 and 300 features (e.g., about 250 features) should the beam spot be 5 μm, for example. As such, each structure or target may comprise hundreds of features.
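
The feature count given above follows from dividing the spot width by the pitch. A minimal sketch of this arithmetic is given below; it counts periods along one direction only, which is an assumption of this example.

```python
def features_across_spot(spot_width_um, pitch_nm):
    """Number of pitch periods spanned by the beam spot width (one direction only)."""
    if spot_width_um <= 0 or pitch_nm <= 0:
        raise ValueError("spot width and pitch must be positive")
    return spot_width_um * 1000.0 / pitch_nm

# 20 nm pitch, for a few of the spot widths listed above.
counts = {width_um: features_across_spot(width_um, 20.0) for width_um in (2, 5, 10)}
for width_um, n_features in counts.items():
    print(f"{width_um} um spot: about {n_features:.0f} features across")
```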

[0066] The essence of stochastics relates to the absorbed dose, which fluctuates due to the limited number of (absorbed) photons and resist chemical noise. This is practically reflected in CD variations from feature-to-feature and/or over the length of a feature. As such, a stochastic metric in the context of this disclosure may comprise a defect rate or other defect metric, or e.g., an average or mean of: line edge roughness (LER), line width roughness (LWR), LCDU, contact hole LCDU, circle edge roughness (CER), edge placement error (EPE), or a combination thereof.

[0067] At present, failure rate may be determined by counting defects obtained from SEM (e.g., e-beam) images. Typically, this failure rate estimation is performed by collecting multiple metrology points, e.g. contact holes (CHs), and counting the number of failures within the sample. While e-beam metrology is accurate, it is time consuming, and therefore is not always practical and is non-ideal for massive defect metrology. Because mean CD strongly relates to the pattern defect rate, CD SEM helps in HVM as a yield-indicator. The mean CD, as calculated from e.g., a few hundred CHs, is sufficient to obtain a rough estimation of the failure rate. However, CH arrays with the same mean CD can have different failure rates, due to focus fluctuations.
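
The counting approach described above may be sketched as follows. The Wilson score interval is added here only to illustrate how the uncertainty of a counted failure rate depends on the number of inspected contact holes; it is an assumption of this example and not part of the method described, and the counts used are hypothetical.

```python
import math

def failure_rate(failures, inspected):
    """Fraction of inspected features counted as failures."""
    if inspected <= 0 or not 0 <= failures <= inspected:
        raise ValueError("require inspected > 0 and 0 <= failures <= inspected")
    return failures / inspected

def wilson_interval(failures, inspected, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = failure_rate(failures, inspected)
    denom = 1.0 + z * z / inspected
    centre = (p + z * z / (2.0 * inspected)) / denom
    half = z * math.sqrt(p * (1.0 - p) / inspected + z * z / (4.0 * inspected ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical count: 3 failed contact holes among 100000 inspected.
rate = failure_rate(3, 100000)
low, high = wilson_interval(3, 100000)
print(f"rate = {rate:.1e}, interval = [{low:.1e}, {high:.1e}]")
```

Even with 100000 inspected contact holes the interval remains wide for a rate of this order, which illustrates why counting-based estimation is time consuming.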

[0068] It is proposed herein to use optical metrology (e.g., scatterometer based metrology) pupil measurements for a fast stochastic defect-rate estimation across the wafer. Such metrology may use raw pupil data (e.g., In-Device Metrology (IDM) raw pupil data) or SXR images (e.g., 2D spectrally resolved images such as illustrated in Figure 4(b) obtained using a metrology tool such as illustrated in Figure 4(a)) as an input for a trained model, e.g., a machine learning model (e.g., a neural network model or convolutional neural network (CNN)), the model being trained to infer a defect rate prediction and/or other stochastic metric prediction from the raw pupil data or SXR spectrally resolved diffraction image data.

[0069] With regard to the IDM embodiment, each measurement may yield a respective angularly resolved measurement signal. For example, in-device metrology may be based on detection of a measurement signal comprising an angularly resolved distribution (e.g., an angularly resolved intensity and/or diffraction efficiency distribution) in a pupil plane from radiation scattered by a structure on a wafer following illumination of the structure. Diffraction efficiency (a dimensionless value) describes the relative intensities of the diffraction beams, and may comprise a ratio of diffracted versus incident light intensity. Such an angularly resolved distribution measured at the pupil plane will be referred to simply as a “pupil” or “pupil measurement” in the description below. The pupils used may, for example, be the raw pupil or unprocessed pupil (other than, optionally, any normalization).
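
The diffraction efficiency described above, i.e., a ratio of diffracted versus incident intensity, may be computed per pupil pixel as in the following minimal sketch. The availability of a per-pixel incident intensity map and the handling of unilluminated pixels are assumptions of this example.

```python
def diffraction_efficiency(diffracted, incident):
    """Per-pixel ratio of diffracted to incident intensity (dimensionless).

    Both inputs are 2D lists of equal shape. Pixels with no incident
    intensity are returned as None rather than dividing by zero.
    """
    return [
        [d / i if i > 0 else None for d, i in zip(d_row, i_row)]
        for d_row, i_row in zip(diffracted, incident)
    ]

# Hypothetical 2 x 2 pupil intensities in arbitrary units.
measured = [[2.0, 1.0], [0.5, 0.0]]
reference = [[4.0, 4.0], [1.0, 0.0]]
efficiency = diffraction_efficiency(measured, reference)
print(efficiency)
```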

[0070] The angularly resolved distribution may be obtained from only the zeroth order of the radiation scattered by the structure, from only one or more higher orders of the radiation scattered by the structure or a combination of the zeroth order and one or more higher orders of the radiation scattered by the structure. IDM metrology based on analysis of the zeroth order (intensity distribution) to infer overlay/CD at device resolution (given device features are periodic) is described in the aforementioned WO2017148982, for example.

[0071] With regard to the SXR embodiment, each measurement may yield a respective spectrally resolved measurement signal. For example, SXR metrology may be based on detection of a measurement signal comprising a measured bright-field image or spectrally resolved distribution (e.g., a spectrally resolved intensity and/or diffraction efficiency distribution or image) in one or more (e.g., conjugate) pupil planes from radiation scattered by a structure on a wafer following illumination of the structure. Diffraction efficiency (a dimensionless value) describes the relative intensities of the diffraction beams, and may comprise a ratio of diffracted versus incident light intensity. Such a spectrally resolved distribution measured at the pupil plane will be referred to simply as an “SXR image” or “SXR measurement” in the description below (regardless of the actual wavelengths used). The SXR images used may, for example, be the raw SXR images or unprocessed SXR images (other than, optionally, any normalization). Due to the small wavelength used in SXR metrology, it is possible to resolve features on the order of the device pitch. Therefore, a better correlation between SEM metrology values and SXR metrology may be expected compared to IDM.

[0072] IDM and/or SXR metrology may be performed on targets or structures comprising a feature size similar to or the same as product features, and/or directly on the product features provided that they are sufficiently regularized (e.g., periodic). The IDM/SXR metrology data may be obtained from pre-etch measurements of structures in resist (i.e., after-develop metrology ADI) and/or post-etch measurement of structures (i.e., after-etch metrology AEI).

[0073] Such a method can, for example, facilitate hybrid metrology techniques comprising a combination of optical metrology and SEM metrology, such that (for example) a fast optical wafer scan can be performed, and the results of this optical metrology used to guide the slower but more accurate (or at least higher resolution) SEM inspection towards a few critical locations.
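
One possible way of using the optical results to guide SEM inspection is sketched below. Ranking locations by predicted defect rate and inspecting the highest-ranked ones is an assumption of this example; the function name, the location identifiers and the values are hypothetical.

```python
def select_sem_sites(predicted_defect_rate, max_sites):
    """Return the locations with the highest predicted defect rate.

    predicted_defect_rate maps a location identifier to the value inferred
    from the optical measurement at that location.
    """
    ranked = sorted(predicted_defect_rate.items(), key=lambda item: item[1], reverse=True)
    return [location for location, _ in ranked[:max_sites]]

# Hypothetical predictions for four (x, y) field positions on a wafer.
predictions = {(0, 0): 1.0e-5, (0, 1): 4.0e-4, (1, 0): 2.0e-6, (1, 1): 9.0e-5}
sem_sites = select_sem_sites(predictions, max_sites=2)
print(sem_sites)
```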

[0074] In optical (e.g., IDM or SXR) metrology, the measurement resolution is insufficiently high to enable direct detection of individual stochastic variations and defects; e.g., which may occur on the order of a single defective feature for every 10000 or more good features (noting that the measurement spot may comprise over 100 individual features for a single measurement). However, the inventors have determined that such a measurement (IDM or SXR) may be used as an input for a suitably trained model, which is then able to provide a very accurate estimation of one or more stochastic metrics such as the total defect rate or other defect metric and/or LCDU; for example when tested over varied process parameters (e.g., varied dose and focus conditions).

[0075] Stochastic defects may result from both photon shot noise and resist chemical noise. Because of this, stochastic variability in resist is both aerial image dependent and resist dependent. The inventors have observed that the stochastic properties of a specific pattern correlate well with the average geometric and material properties of the pattern, this information being present in the IDM raw pupil or SXR diffraction image. For example, defect-rate and LCDU variations for a given pattern and resist vary with process parameter variations such as dose and/or focus variations. The IDM pupils or SXR diffraction images contain information on the averaged 3D profile geometry, which also varies with dose and/or focus variation. As such, by varying one or more process parameters (e.g., a focus-exposure matrix) and/or feature dimensions between training structures or targets on a training substrate, a machine learning model can be trained on the pupils/SXR diffraction images obtained from measurement of these structures. The process parameter (such as focus and/or dose) variations cause changes in geometric properties (sensitivity depends on resist characteristics). These geometric properties (measurable with an SEM/e-beam tool) correlate with the stochastic metric and can also be measured by optical (e.g., IDM/SXR) metrology.

[0076] It may be appreciated that focus effects on the printed pattern might not be captured by the e-beam tool, but may still be captured by the optical tool. As such, a variation in dose and/or focus may cause stochastic pattern changes that can be captured by the e-beam tool (e.g., failure rate and LCDU); however, the inventors have determined that some 3D changes which affect the failure rate may actually be better captured via optical metrology.

[0077] The machine learning model may be trained to a known or observed defect rate or other stochastic metric based process window which defines a process space comprising process parameter values which are expected to yield good or non-defective dies (at least in terms of an acceptable probability), and such that process parameter values outside of the process window may be expected to result in dies having an unacceptable probability of defect. For example, the machine learning model may be trained on such a defect rate based process window as measured by an e-beam/SEM tool or any other tool with sufficient resolution to directly measure the stochastic metric/defect rate. In a specific example, the process window may comprise a focus-exposure window, where focus and dose are the process parameters of interest which are varied over the structures measured by the e-beam/SEM tool so as to define the process window. Optical measurements, e.g., pupil measurements or SXR diffraction images, across all or part of the process window or focus-exposure window are obtained from the same wafers as measured by the e-beam/SEM tool. The optical measurements and corresponding SEM based defect rate data/process window may be used together as training input to the machine learning model. For example, each optical measurement may be labeled with its corresponding defect rate data and process parameter value(s) and used to train the machine learning model. In addition or alternatively to focus and/or dose, the process parameter may relate to a parameter of a reticle used in the exposure, e.g., a reticle feature dimension on which imaged feature dimensions such as CD are dependent. By varying this reticle feature dimension, CD can be deliberately varied across multiple structures on the training wafer, therefore providing a local CD stochastic metric (e.g., LCDU) on which the machine learning model may be trained so that it can map IDM pupils/SXR diffraction images to an LCDU prediction. As with the focus/dose examples, this LCD variation may be associated with a process window comprising LCDU values expected to yield with acceptable probability.
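
The labeling of each optical measurement with its defect rate and process parameter values can be sketched as follows. This is a minimal illustration only: the `LabeledMeasurement` record, the `label_measurements` helper, the use of a (focus, dose) pair as the matching key and all numeric values are assumptions made for the example, not details given in this disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Condition = Tuple[float, float]  # hypothetical (focus, dose) key, arbitrary units
Image = List[List[float]]        # pupil or SXR diffraction image as a 2D grid


@dataclass
class LabeledMeasurement:
    signal: Image
    focus: float
    dose: float
    defect_rate: float  # label measured by the e-beam/SEM tool


def label_measurements(optical: Dict[Condition, Image],
                       sem_defect_rate: Dict[Condition, float]
                       ) -> List[LabeledMeasurement]:
    """Pair each optical measurement with the SEM label for the same condition."""
    labeled = []
    for condition, signal in optical.items():
        if condition in sem_defect_rate:  # skip conditions without an SEM label
            focus, dose = condition
            labeled.append(LabeledMeasurement(signal, focus, dose,
                                              sem_defect_rate[condition]))
    return labeled


# Made-up example data: three optical measurements, two with SEM labels.
optical = {(0.0, 50.0): [[0.8, 0.7]],
           (20.0, 50.0): [[0.6, 0.5]],
           (40.0, 50.0): [[0.4, 0.3]]}
sem = {(0.0, 50.0): 1e-6, (20.0, 50.0): 3e-4}
training_set = label_measurements(optical, sem)
```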

[0078] In addition to pupil data or SXR diffraction image data from wafers corresponding to the SEM data, the training data may also comprise nominal informative signals (e.g., nominal informative pupils/SXR diffraction images) from a reference and/or a simulation. Such nominal informative signals may relate to non-defect structures/wafers (e.g., simulated pupils from perfectly formed structures) and/or structures/wafers with specific examples of a particular defect. In this way, the model can learn how to contrast the optically measured data against the nominal informative signals and better regress their difference to a given failure rate. As such, the training data may comprise a tensor containing the measured optical data (IDM pupils or SXR diffraction images from the exposed training wafer(s)) and the nominal informative signals (nominal measured or simulated IDM pupils or SXR diffraction images as described).
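
One possible way to assemble such a tensor is to stack the measured signal and a nominal signal as two channels of the same input. The sketch below assumes this channel layout, which is not specified in this disclosure; `stack_with_nominal` and the sample values are hypothetical.

```python
from typing import List

Image = List[List[float]]


def stack_with_nominal(measured: Image, nominal: Image) -> List[Image]:
    """Return a [2][rows][cols] nested list: channel 0 measured, channel 1 nominal."""
    same_rows = len(measured) == len(nominal)
    if not same_rows or any(len(m) != len(n) for m, n in zip(measured, nominal)):
        raise ValueError("measured and nominal signals must have the same shape")
    # Copy the rows so the stacked tensor does not alias the inputs.
    return [[row[:] for row in measured], [row[:] for row in nominal]]


measured_pupil = [[0.80, 0.75], [0.60, 0.20]]  # made-up measured values
nominal_pupil = [[0.80, 0.76], [0.61, 0.59]]   # made-up simulated reference
model_input = stack_with_nominal(measured_pupil, nominal_pupil)
```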

[0079] The trained machine learning model can then be used to infer defect failure rate and/or other stochastic metric based on pupil measurement input.

[0080] The machine learning model may be a CNN. More specifically, the CNN may comprise an input layer, an output layer and hidden layers therebetween. The hidden layers may comprise, for example, a number of repetitions of convolution layers, activation layers, and batch normalization layers, followed by one or more dropout layers and one or more fully connected layers. In an embodiment, the activation layers may impose a logarithmic activation function in order to linearly span the exponential range of defect rates.

[0081] Figure 5 is a flow diagram describing such a method. At step 400, a wafer is exposed using a scanner, with at least one process parameter being varied over the wafer. As such, the exposed wafer may comprise a plurality of training structures, each of which may comprise multiple instances of features. The training structures may all be similar, other than there may have been a variation in one or more process parameters used in their formation. Process parameters in this context may describe a parameter of the lithographic apparatus used to image the structure from a reticle (e.g., focus and/or dose) and/or a reticle parameter such as a reticle feature dimension (on which imaged feature dimensions such as CD are dependent). For example, the structures may be repeated for different values of focus and/or dose; e.g., in a similar manner as a focus-exposure matrix FEM, and/or with different values for CD.

[0082] The training structures used to train the model may be similar or essentially the same as the structures which will be measured to obtain optical metrology data on which the trained model will be used to infer a stochastic metric in a production setting or HVM setting. However, this is not necessarily essential, and some differences may be accommodated, with a possible impact on inference accuracy by the trained model.

[0083] At step 410, high-resolution metrology data is obtained from measurements of the structures on the wafer exposed at step 400 using a high-resolution metrology tool, e.g., having sufficient resolution to be able to individually image each feature or structure and/or directly determine defect rate. As such, a high-resolution metrology tool may have higher resolution than an optical metrology tool, and may comprise an SEM/e-beam tool. Based on this high-resolution metrology data, stochastic metric data may be determined, describing a process window for the process. A process window may describe a process space or range of process parameter values for which the number of defects/defect rate or other stochastic metric is acceptable, e.g., below a threshold. This is indicative of an acceptable probability of there being no defect provided that the process parameters remain within the process window, while outside of this window the number of defects/defect rate or other stochastic metric may be deemed to be unacceptable (i.e., indicative of an unacceptable probability of there being no defect).
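
A threshold-based process window of this kind can be sketched as follows, assuming the high-resolution tool has produced one defect rate per (focus, dose) condition. The `process_window` helper, the threshold and the measurement values are hypothetical.

```python
from typing import Dict, List, Tuple

Condition = Tuple[int, int]  # hypothetical (focus offset, dose) pair, arbitrary units


def process_window(defect_rate: Dict[Condition, float],
                   threshold: float) -> List[Condition]:
    """Return the conditions whose measured defect rate is below the threshold."""
    return sorted(cond for cond, rate in defect_rate.items() if rate < threshold)


# Made-up SEM-derived defect rates over a small focus-exposure matrix.
sem_defect_rate = {(-20, 50): 1e-3, (0, 50): 1e-6, (20, 50): 5e-4, (0, 52): 2e-6}
window = process_window(sem_defect_rate, threshold=1e-5)
```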

[0084] At step 420, the same wafers may be measured using an optical metrology tool, e.g., in a pupil plane to obtain pupil measurements or optical metrology data. As such, the wafer may have been imaged with structures or targets suitable for such optical metrology at step 400. In an embodiment, the optical metrology data may comprise an angularly resolved distribution (e.g., obtained via IDM metrology) or spectrally resolved distribution (e.g., obtained via SXR metrology) such as an angularly or spectrally resolved intensity distribution or angularly or spectrally resolved diffraction efficiency distribution. The intensity or diffraction efficiency may be normalized (more detail is provided below). The optical metrology data may relate to measurements of the same structure with different illumination conditions (e.g., combinations of one or more of illumination wavelengths, illumination polarizations and wafer orientations).

[0085] At step 430, a machine learning model such as a deep convolutional neural network is trained on the stochastic metric data and optical metrology data such that it is able to map or regress the optical metrology data (e.g., pupil images or SXR diffraction images) to the stochastic metric data (e.g., a particular failure rate or LCDU value). As has been mentioned, additional metrology data of a similar type as the optical metrology data (e.g., pupil images or SXR diffraction images) may also be included in the training data, the additional metrology data relating to nominal informative signals, e.g., relating to specific stochastic metric examples (e.g., zero defects or specific defect types/rates). This additional metrology data may comprise simulated and/or measured nominal informative signals.

[0086] As is well known, the training may include a validation step, such that the training data is divided into a train set and validation set. In an embodiment, the validation may be performed on a CDU wafer. A CDU wafer may be exposed at best dose and best focus (e.g., using the values known from the training set) but can also comprise dose and focus variations which the trained model has never seen before. This further trains the model to be able to “interpolate” between the dose and focus conditions used for the training set.
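
A split in which the validation set contains process conditions absent from the training set can be sketched as below. The sample layout (a condition paired with a defect rate) and the `split_by_condition` helper are assumptions for illustration; in practice each sample would also carry its optical measurement.

```python
from typing import List, Set, Tuple

Condition = Tuple[int, int]        # hypothetical (focus offset, dose) pair
Sample = Tuple[Condition, float]   # (condition, defect rate label)


def split_by_condition(samples: List[Sample],
                       validation_conditions: Set[Condition]
                       ) -> Tuple[List[Sample], List[Sample]]:
    """Hold out every sample whose condition is reserved for validation."""
    train = [s for s in samples if s[0] not in validation_conditions]
    validation = [s for s in samples if s[0] in validation_conditions]
    return train, validation


# Made-up samples; the condition (10, 51) never appears in the training set.
samples = [((0, 50), 1e-6), ((20, 50), 3e-4), ((10, 51), 2e-5), ((-20, 50), 5e-4)]
train, validation = split_by_condition(samples, {(10, 51)})
```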

[0087] At step 440, the trained model may be used to infer a stochastic metric value from one or more optical measurement pupils or SXR diffraction images (e.g., IDM or SXR metrology on targets/regular product structures) relating to a production wafer. This step may comprise inputting a pupil or SXR diffraction image to the trained model which can then infer a stochastic metric value from the input.

[0088] It can be shown that, compared to an intensity distribution, a diffraction efficiency distribution may show better prediction performance when the machine learning model has been validated on validation data comprising an individual defect rate value per target. An intensity distribution embodiment demonstrates a more acceptable performance when the validation data is averaged over multiple targets; e.g., the validation data comprises defect rates for multiple pupils or SXR diffraction images (for multiple targets) which are averaged (e.g., an average of the log of the defect rates).

[0089] Figure 6 is a plot of defect rate as obtained via high resolution (e.g., SEM) metrology DR(SEM) in a conventional manner against defect rate as obtained via optical (e.g., IDM) metrology DR(IDM) using a trained model according to the teachings of this disclosure. It can be seen that there is a near perfect correlation between the values obtained by both methods, such that the model inference from optical metrology performs essentially just as well as conventional SEM metrology for measuring defect rate. Testing has shown a very similar relationship between SEM measurement based mean LCDU and mean LCDU values obtained from optical metrology using a trained model and the teachings herein. It is expected that similar correlations will be found for other stochastic metrics.
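
The averaging of log defect rates over multiple targets, and a comparison of SEM-based against model-inferred values of the kind plotted in Figure 6, can be sketched as follows. The use of a base-10 logarithm and of the Pearson coefficient as the measure of correlation are assumptions, and all numbers are made up for illustration; they are not results from this disclosure.

```python
import math
from typing import List


def mean_log_defect_rate(rates: List[float]) -> float:
    """Average of log10(defect rate) over several targets."""
    return sum(math.log10(rate) for rate in rates) / len(rates)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-target defect rates for one field.
field_label = mean_log_defect_rate([1e-4, 1e-6])

# Made-up log10 defect rates: SEM reference versus model inference.
log_dr_sem = [-6.0, -5.0, -4.0]
log_dr_optical = [-5.9, -5.1, -4.0]
correlation = pearson(log_dr_sem, log_dr_optical)
```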

[0090] As stated, the training may be based on intensity optical metrology data or diffraction efficiency optical metrology data. There are some advantages in using diffraction efficiencies which include less dependence on the size of the training data set (i.e., better performance for smaller training sets, e.g., relating to fewer than 50, 40 or 35 targets per field) and better performance when using the trained model validated on single target measurements rather than averaged measurements over a number of targets. By contrast, an intensity based embodiment shows reliable inference when validated on averaged metrology data (e.g., more than 10 or more than 20 targets or more than 50 targets).

[0091] While the training of the model may ideally be performed over all process parameter values likely to be encountered in production, the inventors have determined that the trained model can predict, with good accuracy, a stochastic metric from a pupil relating to a process parameter value which was not used in training and which the model has therefore never encountered before. As such, the model is able to “interpolate” between (and possibly extrapolate beyond) the dose and focus conditions used for the training set.

[0092] Step 420 may comprise an optional normalization of the measured pupil intensities or diffraction efficiencies. The normalized intensity $\mathrm{Intensity\_norm}_{p,i,j,k}$ may be determined in a first step by first determining an intermediate normalized intensity $\mathrm{Intensity\_norm}'_{p,i,j,k}$:

$$\mathrm{Intensity\_norm}'_{p,i,j,k} = \frac{\mathrm{Intensity}_{p,i,j,k}}{\max_{p,i,j}\left(\mathrm{Intensity}_{p,i,j,k}\right)}$$

where $i$ is the field index for the training data, $i = 1, \dots, N_{\mathrm{fields\,train}}$; $j$ is the target index for the training data, $j = 1, \dots, N_{\mathrm{targets\,train}}$; $k$ is the channel index for the training data, $k = 1, \dots, N_{\mathrm{channels}}$ (where a channel may relate to a particular illumination condition (polarization) and orientation of the substrate); and $p$ is the pixel index for the training data, $p = 1, \dots, N_{\mathrm{pixels}}$ (where each pupil may comprise multiple pixels, each with a corresponding intensity (or diffraction efficiency) value $\mathrm{Intensity}_{p,i,j,k}$). The term $\max_{p,i,j}\left(\mathrm{Intensity}_{p,i,j,k}\right)$ describes the maximum intensity value over the dataset. The diffraction efficiency normalization may be performed in the same way.

[0093] An optional second step may comprise determining the normalized intensity $\mathrm{Intensity\_norm}_{p,i,j,k}$ as: […] for $k = 1, \dots, N_{\mathrm{channels}}$ and for $p = 1, \dots, N_{\mathrm{pixels}}$. The normalization may be performed in the same way for diffraction efficiencies. Alternatively, the intermediate normalized intensity $\mathrm{Intensity\_norm}'_{p,i,j,k}$ may be used. Alternatively, an idealized (reference) pupil based on, for example, simulation may be subtracted from the normalized intensity pupil.
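
The first normalization step, division by the maximum over the pixel, field and target indices, can be sketched in plain Python. The maximum is computed separately for each channel because the channel index k is not among the maximized indices. The nested-list layout `intensity[i][j][k][p]`, the function name and the sample values are assumptions for illustration, and intensities are assumed to be strictly positive so that the maximum is non-zero.

```python
from typing import List

Dataset = List[List[List[List[float]]]]  # intensity[i][j][k][p]: field, target, channel, pixel


def normalize_by_channel_max(intensity: Dataset) -> Dataset:
    """Divide every value by the maximum over (p, i, j) for its channel k."""
    n_channels = len(intensity[0][0])
    channel_max = [
        max(value for field in intensity for target in field for value in target[k])
        for k in range(n_channels)
    ]
    return [
        [
            [[value / channel_max[k] for value in channel]
             for k, channel in enumerate(target)]
            for target in field
        ]
        for field in intensity
    ]


# Made-up data: 1 field, 2 targets, 2 channels, 2 pixels per pupil.
data = [[[[1.0, 2.0], [10.0, 5.0]],
         [[4.0, 3.0], [20.0, 8.0]]]]
normalized = normalize_by_channel_max(data)
```

After this step the largest value in each channel equals 1, so channels acquired under different illumination conditions share a common scale.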

[0094] The above description is related to obtaining optical measurements such as pupils (e.g., imaged at a pupil plane using a tool such as illustrated in Figure 3(a)) or SXR diffraction images (e.g., imaged at a pupil plane using a tool such as illustrated in Figure 4(a)) and mapping them to a stochastic metric using a suitable trained model.

[0095] In a further embodiment, similar techniques will be applied to bright field images (different to the spectrally resolved SXR images already described) obtained using a bright field inspection tool. Bright field inspection tools are used in defect detection in integrated circuit manufacturing processes.

[0096] A bright field inspection (BFI) image may be obtained by illuminating a sample (e.g., structure on a substrate) with high angle incident light (e.g., 45-90 degrees with respect to the horizontal plane) producing a “bright” field of view, collecting the radiation reflected by the structure and imaging this reflected radiation on a camera at an image plane. By looking at the difference between an acquired BFI image and a reference BFI image of the same pattern, it is possible to detect (with limited accuracy) the existence of a defect.
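
The conventional difference-based check can be sketched as below, assuming both images are same-sized grids of brightness values. The helper names, the image values and the idea of reporting the single darkest pixel are illustrative assumptions rather than a description of any particular inspection tool.

```python
from typing import List, Tuple

Image = List[List[float]]


def difference_image(acquired: Image, reference: Image) -> Image:
    """Pixel-wise difference: negative values are darker than the reference."""
    return [[a - r for a, r in zip(row_a, row_r)]
            for row_a, row_r in zip(acquired, reference)]


def darkest_pixel(diff: Image) -> Tuple[int, int]:
    """Return (row, column) of the most negative difference value."""
    candidates = ((value, (y, x))
                  for y, row in enumerate(diff)
                  for x, value in enumerate(row))
    return min(candidates)[1]


# Made-up 2x2 images: one pixel is much darker than in the reference.
acquired = [[0.9, 0.9], [0.9, 0.2]]
reference = [[0.9, 0.9], [0.9, 0.9]]
diff = difference_image(acquired, reference)
candidate = darkest_pixel(diff)
```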

[0097] Typically, in a BFI image, a defect appears dark against a lighter background. However, a typical BFI image contains much more information than a single dark spot indicating a defect; it usually contains many more other candidate (usually smaller) dark spots, such that the background resembles a white noise image. Each of these other dark spots could be mistakenly identified as a defect, should a classification algorithm not be robust enough. The BFI image can be strongly affected by the surrounding pattern and stochastic profile variation can likely be detected.

[0098] It is expected that a BFI image will show sensitivity to line-edge roughness. However, present BFI image processing, which typically only classifies defects within a BFI image in order to extract the relevant pattern defect rate, does not use the valuable surrounding pattern information within the image.

[0099] As such, a machine learning model such as a deep convolutional neural network can be used in order to regress the full BFI image to the given pattern defect rate. Therefore, a set of images obtained via BFI may be mapped to a pattern failure rate as determined, e.g., by an SEM tool or similar, during a training phase. The input to the model may be a tensor comprising the measured bright field image(s). Similarly to the previous embodiment, the training data may also comprise one or more nominal informative bright field images from a reference and/or a simulation (e.g., relating to zero-defect images and/or particular defects). In this way, the model will learn how to contrast the measured calibration images and nominal calibration images and better regress their difference to a given failure rate.

[00100] The method can be augmented with uncertainty estimates of the defect rates, by combining predictions over multiple independent targets per field.

[00101] Figure 7 is a flow diagram describing such a method. At step 500, a wafer is exposed using a scanner, with at least one process parameter being varied over the wafer. As such, the exposed wafer may comprise a plurality of training structures, each of which may comprise multiple instances of features. The training structures may all be similar, other than there may have been a variation in one or more process parameters used in their formation. Process parameters in this context may describe a parameter of the lithographic apparatus used to image the structure from a reticle (e.g., focus and/or dose) and/or a reticle parameter such as a reticle feature dimension (on which imaged feature dimensions such as CD are dependent). For example, the structures may be repeated for different values of focus and/or dose; e.g., in a similar manner as a focus-exposure matrix FEM, and/or with different values for CD.

[00102] The training structures used to train the model may be similar or essentially the same as the structures which will be measured to obtain inspection image data (e.g., image plane data or bright field inspection image data such as BFI images) on which the trained model will be used to infer a stochastic metric in a production setting or HVM setting. However, this is not necessarily essential, and some differences may be accommodated, with a possible impact on inference accuracy by the trained model.

[00103] At step 510, high-resolution metrology data is obtained from measurements of the structures on the wafer exposed at step 500 using a high-resolution metrology tool, e.g., having sufficient resolution to be able to individually image each feature or structure and/or directly and accurately classify defects and thereby determine the pattern defect rate. As such, a high-resolution metrology tool may have higher resolution than an optical metrology tool, and may comprise an SEM/e-beam tool. Based on this high-resolution metrology data, stochastic metric data may be determined, describing a process window for the process. A process window may describe a process space or range of process parameter values for which the number of defects/defect rate or other stochastic metric is acceptable, e.g., below a threshold. This is indicative of an acceptable probability of there being no defect provided that the process parameters remain within the process window, while outside of this window the number of defects/defect rate or other stochastic metric may be deemed to be unacceptable (i.e., indicative of an unacceptable probability of there being no defect).

[00104] At step 520, the same wafers may be measured using an inspection image tool, e.g., a bright field inspection tool, which captures images in an image plane to obtain inspection image data. The inspection image data may relate to measurements of the same structure with different illumination conditions (e.g., combinations of one or more of illumination wavelengths, illumination polarizations and wafer orientations).

[00105] At step 530, a machine learning model such as a deep convolutional neural network is trained on the stochastic metric data and inspection image data such that it is able to map or regress the inspection image data to the stochastic metric data (e.g., a particular failure rate or LCDU value). As has been mentioned, additional metrology data of a similar type as the inspection image data (e.g., BFI images) may also be included in the training data, the additional metrology data relating to nominal informative images, e.g., relating to specific stochastic metric examples (e.g., zero defects or specific defect types/rates). This additional metrology data may comprise simulated and/or measured nominal informative images.

[00106] As with the previous embodiment, a validation step may be performed, which may be the same as the validation step previously described.

[00107] At step 540, the trained model may be used to infer a stochastic metric value from one or more inspection images (e.g., bright field inspection images) relating to a production wafer. This step may comprise inputting an inspection image to the trained model which can then infer a stochastic metric value from the input.

[00108] It can be shown that, compared to an intensity distribution, a diffraction efficiency distribution may show better prediction performance when the machine learning model has been validated on validation data comprising an individual defect rate value per target. An intensity distribution embodiment demonstrates a more acceptable performance when the validation data is averaged over multiple targets; e.g., the validation data comprises defect rates for multiple pupils or inspection images (for multiple targets) which are averaged (e.g., an average of the log of the defect rates).

[00109] The proposed method may combine predictions over a few independent targets/structures (e.g., where independent relates to a particular field of view) in order to obtain uncertainty estimates as well as more accurate predictions.
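
How the predictions are combined is not specified in this disclosure. One simple possibility, sketched below under that caveat, is to average the per-target predictions in log space and to report the sample standard deviation of the log values as the uncertainty estimate. The function name and the example predictions are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def combine_predictions(predicted_rates: List[float]) -> Tuple[float, float]:
    """Return (combined defect rate, spread in decades) for one field."""
    logs = [math.log10(rate) for rate in predicted_rates]
    mean_log = statistics.mean(logs)
    # A single prediction carries no spread information.
    spread = statistics.stdev(logs) if len(logs) > 1 else 0.0
    return 10 ** mean_log, spread


# Made-up model outputs for three independent targets in one field.
combined_rate, spread_decades = combine_predictions([1e-5, 1e-4, 1e-3])
```

A large spread would indicate that the targets disagree, which could be used to flag the field for follow-up inspection.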

[00110] The method disclosed herein can be applied to perform EPE metrology using an optical metrology tool/scatterometer. EPE (described above) is a composed metric, comprising both stochastics and systematics. In an embodiment, EPE may be determined from a determination of LCDU using a trained model and the method as disclosed herein and combining this with a conventional overlay measurement (e.g., diffraction based overlay or micro-diffraction based overlay method using an optical metrology tool).

[00111] It should be appreciated that any defect-rate prediction provided by the trained machine learning model is not binary (i.e., defect/no defect), but instead can provide a prediction of an expected defect rate for a particular pattern or multiple inspected patterns. For example, the machine learning model may be separately trained for each separate pattern/structure to obtain a dedicated model for that pattern/structure type. Such a model may be trained, for example, on critical patterns with a small process window. In another embodiment, it may be that a single model is trained to map pupils for multiple different features/targets to a single defect rate prediction or single “global” defect rate. In such an embodiment, training on only one pattern may be enough to teach the model to infer such a global defect rate.

[00112] Further embodiments of the invention are disclosed in the list of numbered clauses below:

1. A method of determining at least one stochastic metric relating to a lithographic process, the method comprising: obtaining a trained machine learning model, the machine learning model having been trained to infer one or more stochastic metric values for said stochastic metric from optical metrology data, said trained machine learning model having been trained on training optical metrology data and training stochastic metric data, wherein the training optical metrology data comprises a plurality of measurement signals, each relating to scattered radiation having been scattered by a training structure of a plurality of training structures on a training substrate; and the training stochastic metric data comprises stochastic metric values relating to said training structures, wherein multiple instances of said training structures have been formed with a variation in one or more process parameters on which said stochastic metric is dependent; obtaining optical metrology data comprising at least one measurement signal relating to a structure having been exposed in a lithographic process; and using the trained machine learning model to infer a value for said stochastic metric from said optical metrology data.

2. A method according to clause 1, wherein each of said measurement signals comprises an angularly resolved parameter distribution.

3. A method according to clause 2, wherein each angularly resolved parameter distribution comprises an angularly resolved intensity distribution or an angularly resolved diffraction efficiency distribution.

4. A method according to clause 2 or 3, wherein each angularly resolved parameter distribution comprises an angularly resolved parameter distribution obtained from a zeroth order of said scattered radiation.

5. A method according to clause 2, 3 or 4, wherein each angularly resolved parameter distribution comprises an angularly resolved parameter distribution obtained from one or more higher orders of said scattered radiation, a higher order comprising a diffraction order other than the zeroth order.

6. A method according to any of clauses 2 to 5, comprising the step of normalizing said angularly resolved parameter distribution.

7. A method according to clause 1, wherein each of said measurement signals comprises a spectrally resolved parameter distribution.

8. A method according to clause 7, wherein each angularly resolved parameter distribution comprises a spectrally resolved intensity distribution or a spectrally resolved diffraction efficiency distribution.

9. A method according to clause 7 or 8, wherein each angularly resolved parameter distribution is obtained from measurement radiation comprising wavelengths between 5nm and 30nm; or more specifically between 10nm and 20nm.

10. A method according to clause 7, 8 or 9, wherein each angularly resolved parameter distribution comprises a spectrally resolved parameter distribution obtained from at least one or more higher orders of said scattered radiation, a higher order comprising a diffraction order other than the zeroth order.

11. A method according to any of clauses 7 to 10, comprising the step of normalizing said angularly resolved parameter distribution.

12. A method according to any preceding clause, wherein the machine learning model comprises a convolutional neural network.

13. A method according to clause 12, wherein said convolutional neural network comprises one or more activation layers which imposes a logarithmic activation function.

14. A method according to any preceding clause, wherein said training stochastic metric data describes an acceptable space or range of stochastic metric values or related dimensional metric values, and a corresponding acceptable space or range of process parameter values for said one or more process parameters.

15. A method according to any preceding clause, comprising the initial steps of: obtaining said training optical metrology data and stochastic metric data; and training said trained machine learning model on said training optical metrology data and stochastic metric data.

16. A method according to clause 15, comprising obtaining high-resolution metrology data; and determining said stochastic metric data from said high-resolution metrology data.

17. A method according to clause 16, wherein said high-resolution metrology data is obtained from scanning electron microscope metrology.

18. A method according to any preceding clause, wherein the training optical metrology data further comprises nominal informative metrology data relating to one or both of: non-defect measurements and/or simulations; and specific defect measurements or simulations.

19. A method according to any preceding clause, comprising using the inferred value for the stochastic metric to decide where and/or when to perform further high-resolution metrology.

20. A method according to any preceding clause, wherein said one or more process parameters comprise one or both of focus and dose when forming said training structures.

21. A method according to any preceding clause, wherein said one or more process parameters comprises one or more feature dimensions on a patterning device used to expose said training structures.

22. A method according to any preceding clause, wherein the stochastic metric comprises one or more of: defect rate or other defect metric, line edge roughness, line width roughness, local critical dimension uniformity, circle edge roughness or edge placement error.

23. A processing apparatus comprising a processor, and being configured to perform the method of any preceding clause.

24. An optical inspection apparatus being operable to measure and obtain said optical metrology data, and comprising the computing apparatus of clause 23.

25. A computer program comprising program instructions operable to perform the method of any of clauses 1 to 22, when run on a suitable apparatus.

26. A non-transient computer program carrier comprising the computer program of clause 25.

27. An optical metrology device comprising: an optical system operable to obtain optical metrology data comprising at least one measurement signal relating to a structure having been exposed in a lithographic process; a non-transient data carrier comprising a trained machine learning model, the machine learning model having been trained to infer one or more stochastic metric values for said stochastic metric from optical metrology data, said trained machine learning model having been trained on training optical metrology data and training stochastic metric data, wherein the training optical metrology data comprises a plurality of measurement signals, each relating to scattered radiation having been scattered by a training structure of a plurality of training structures on a training substrate; and the training stochastic metric data comprises stochastic metric values relating to said training structures, wherein multiple instances of said training structures have been formed with a variation in one or more process parameters on which said stochastic metric is dependent; and a processor being operable to use the trained machine learning model to infer a value for said stochastic metric from said optical metrology data.

28. An optical metrology device according to clause 27, wherein each of said measurement signals comprises an angularly resolved parameter distribution.

29. An optical metrology device according to clause 28, wherein each angularly resolved parameter distribution comprises an angularly resolved intensity distribution or an angularly resolved diffraction efficiency distribution.

30. An optical metrology device according to clause 28 or 29, wherein each angularly resolved parameter distribution comprises an angularly resolved parameter distribution obtained from a zeroth order of said scattered radiation.

31. An optical metrology device according to any of clauses 28 to 30, wherein each angularly resolved parameter distribution comprises an angularly resolved parameter distribution obtained from one or more higher orders of said scattered radiation, a higher order comprising a diffraction order other than the zeroth order.

32. An optical metrology device according to clause 27, wherein each of said measurement signals comprises a spectrally resolved parameter distribution.

33. An optical metrology device according to clause 32, wherein each angularly resolved parameter distribution comprises a spectrally resolved intensity distribution or a spectrally resolved diffraction efficiency distribution.

34. An optical metrology device according to clause 32 or 33, wherein each angularly resolved parameter distribution is obtained from measurement radiation comprising wavelengths between 5nm and 30nm; or more specifically between 10nm and 20nm.

35. An optical metrology device according to any of clauses 32 to 34, wherein each angularly resolved parameter distribution comprises a spectrally resolved parameter distribution obtained from at least one or more higher orders of said scattered radiation, a higher order comprising a diffraction order other than the zeroth order.

36. An optical metrology device according to any of clauses 27 to 35, wherein the machine learning model comprises a convolutional neural network.

37. An optical metrology device according to clause 36, wherein said convolutional neural network comprises one or more activation layers being operable to impose a logarithmic activation function.

38. An optical metrology device according to any of clauses 27 to 37, wherein said training stochastic metric data describes an acceptable space or range of stochastic metric values or related dimensional metric values, and a corresponding acceptable space or range of process parameter values for said one or more process parameters.

39. An optical metrology device according to any of clauses 27 to 38, wherein said one or more process parameters comprise one or both of focus and dose when forming said training structures.

40. An optical metrology device according to any of clauses 27 to 39, wherein the stochastic metric comprises one or more of: defect rate or other defect metric, line edge roughness, line width roughness, local critical dimension uniformity, circle edge roughness or edge placement error.

41. A method of determining a stochastic metric relating to a structure, the method comprising: obtaining a trained machine learning model, the machine learning model having been trained to correlate training optical metrology data to training stochastic metric data, wherein the training optical metrology data comprises a plurality of measurement signals relating to radiation scattered from a plurality of training structures on a substrate and the training stochastic metric data comprises stochastic metric values relating to said plurality of training structures, wherein the plurality of training structures have been formed with a variation in one or more dimensions on which said stochastic metric is dependent; obtaining optical metrology data from a structure; and using the trained machine learning model to infer a value for said stochastic metric associated with the structure from said optical metrology data.

42. The method of clause 41, wherein the stochastic metric represents a defect probability or a CD variation at a small spatial scale, for example smaller than 1000 times the CD.

43. The method of clause 41 or 42, wherein the measurement signal is a zero order pupil intensity distribution of radiation after being scattered by the structure or training structure.

44. The method of any of clauses 41 to 43, wherein the training stochastic metric data is obtained using an e-beam metrology tool.

45. The method of any of clauses 41 to 44, wherein the variation in the one or more dimensions is associated with a variation in a process parameter of a lithographic apparatus, such as exposure dose and/or focus setting.

46. A computer program comprising program instructions operable to perform the method of any of clauses 41 to 45, when run on a suitable apparatus.

47. A non-transient computer program carrier comprising the computer program of clause 46.

48. A method of determining a stochastic metric relating to a structure, the method comprising: obtaining a trained machine learning model, the machine learning model having been trained to correlate training optical metrology data to training stochastic metric data, wherein the training optical metrology data comprises a plurality of measurement signals relating to radiation scattered from a plurality of training structures on a substrate and the training stochastic metric data comprises stochastic metric values relating to said plurality of training structures, wherein the plurality of training structures have been formed with a variation in one or more dimensions on which said stochastic metric is dependent; obtaining optical metrology data from a structure; and using the trained machine learning model to infer a value of the stochastic metric associated with the structure from the optical metrology data.
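By way of illustration only, the train-then-infer flow recited in the clauses above may be sketched as follows. The sketch is not the disclosed implementation: a ridge regression is substituted for the convolutional neural network, the training signals are synthetic stand-ins for measured pupil intensity distributions, and the form of the logarithmic activation is an assumption, since no formula is given here. All function names (`log_activation`, `make_training_set`, `to_features`, `fit_ridge`, `infer`) are hypothetical.

```python
import numpy as np


def log_activation(x):
    """Hypothetical symmetric logarithmic activation (the formula is an assumption)."""
    return np.sign(x) * np.log1p(np.abs(x))


def make_training_set(rng, n_structures=200, pupil_size=8):
    """Synthetic stand-in for training structures formed with focus/dose variation."""
    dose = rng.uniform(-1.0, 1.0, n_structures)
    focus = rng.uniform(-1.0, 1.0, n_structures)
    yy, xx = np.mgrid[0:pupil_size, 0:pupil_size] / (pupil_size - 1)
    # Toy "intensity distributions": one small 2-D signal per training structure.
    signals = (
        1.0
        + 0.5 * dose[:, None, None] * xx
        + 0.3 * focus[:, None, None] ** 2 * yy
        + 0.01 * rng.standard_normal((n_structures, pupil_size, pupil_size))
    )
    # Toy stochastic metric values, as if supplied by reference metrology.
    metric = -6.0 + 1.5 * dose + 2.0 * focus ** 2
    return signals, metric


def to_features(signals):
    """Apply the logarithmic activation and flatten each signal to a vector."""
    return log_activation(signals).reshape(len(signals), -1)


def fit_ridge(features, targets, penalty=1e-3):
    """Ridge regression with an unpenalised intercept (stand-in for the trained model)."""
    mean_x = features.mean(axis=0)
    mean_y = targets.mean()
    centred = features - mean_x
    gram = centred.T @ centred + penalty * np.eye(centred.shape[1])
    weights = np.linalg.solve(gram, centred.T @ (targets - mean_y))
    return weights, mean_y - mean_x @ weights


def infer(features, weights, bias):
    """Infer stochastic metric values from featurised measurement signals."""
    return features @ weights + bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_signals, train_metric = make_training_set(rng)
    weights, bias = fit_ridge(to_features(train_signals), train_metric)
    # "Obtaining optical metrology data from a structure" and inferring its metric.
    new_signals, reference = make_training_set(rng, n_structures=5)
    inferred = infer(to_features(new_signals), weights, bias)
    print(np.column_stack([reference, inferred]))
```

The printed columns pair the synthetic reference value with the inferred value for each new structure. In practice the model, the signal dimensions and the reference metric source would be chosen according to the embodiments described.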

[00113] Additional further embodiments of the invention are disclosed in the list of numbered clauses below:

1. A method of determining at least one stochastic metric relating to a lithographic process, the method comprising: obtaining a trained model, the model having been trained to infer one or more stochastic metric values for said stochastic metric from inspection image data, said trained model having been trained on training inspection image data and training stochastic metric data, wherein the training inspection image data comprises a plurality of inspection images, each relating to reflected radiation having been reflected by a training structure of a plurality of training structures on a training substrate; and the training stochastic metric data comprises stochastic metric values relating to said training structures, wherein multiple instances of said training structures have been formed with a variation in one or more process parameters on which said stochastic metric is dependent; obtaining inspection image data comprising at least one inspection image relating to a structure having been exposed in a lithographic process; and using the trained model to infer a value for said stochastic metric from said inspection image data.

2. A method according to clause 1, wherein each of said inspection images comprises an image of a respective structure captured at an image plane, or conjugate thereof, of an inspection imaging device used to obtain said inspection images.

3. A method according to clause 1 or 2, wherein each of said inspection images comprises a bright field inspection image.

4. A method according to any preceding clause, wherein the model comprises a machine learning model, neural network or convolutional neural network.

5. A method according to clause 4, wherein said convolutional neural network comprises one or more activation layers which impose a logarithmic activation function.

6. A method according to any preceding clause, wherein said training stochastic metric data describes an acceptable space or range of stochastic metric values or related dimensional metric values, and a corresponding acceptable space or range of process parameter values for said one or more process parameters.

7. A method according to any preceding clause, comprising the initial steps of: obtaining said training inspection image data and stochastic metric data; and training said trained model on said training inspection image data and stochastic metric data.

8. A method according to clause 7, comprising obtaining high-resolution metrology data; and determining said stochastic metric data from said high-resolution metrology data.

9. A method according to clause 8, wherein said high-resolution metrology data is obtained from scanning electron microscope metrology.

10. A method according to any preceding clause, wherein the training inspection image data further comprises nominal informative metrology data relating to one or both of: non-defect inspections and/or simulations; and specific defect inspections or simulations.

11. A method according to any preceding clause, comprising using the inferred value for the stochastic metric to decide where and/or when to perform further high-resolution metrology.

12. A method according to any preceding clause, wherein said one or more process parameters comprise one or both of focus and dose when forming said training structures.

13. A method according to any preceding clause, wherein said one or more process parameters comprises one or more feature dimensions on a patterning device used to expose said training structures.

14. A method according to any preceding clause, wherein the stochastic metric comprises one or more of: defect rate or other defect metric, line edge roughness, line width roughness, local critical dimension uniformity, circle edge roughness or edge placement error.

15. A processing apparatus comprising a processor, and being configured to perform the method of any preceding clause.

16. An optical inspection apparatus being operable to measure and obtain said inspection image data, and comprising the computing apparatus of clause 15.

17. A computer program comprising program instructions operable to perform the method of any of clauses 1 to 16, when run on a suitable apparatus.

18. A non-transient computer program carrier comprising the computer program of clause 17.

19. An inspection imaging device comprising: an imaging system operable to obtain inspection image data comprising at least one inspection image relating to a structure having been exposed in a lithographic process; a non-transient data carrier comprising a trained model, the model having been trained to infer one or more stochastic metric values for said stochastic metric from inspection image data, said trained model having been trained on training inspection image data and training stochastic metric data, wherein the training inspection image data comprises a plurality of inspection images, each relating to reflected radiation having been reflected by a training structure of a plurality of training structures on a training substrate; and the training stochastic metric data comprises stochastic metric values relating to said training structures, wherein multiple instances of said training structures have been formed with a variation in one or more process parameters on which said stochastic metric is dependent; and a processor being operable to use the trained model to infer a value for said stochastic metric from said inspection image data.

20. An inspection imaging device according to clause 19, comprising a camera at an image plane, or conjugate thereof, of the inspection imaging device, for capturing said inspection images.

21. An inspection imaging device according to clause 19 or 20, wherein said inspection imaging device comprises a bright field imaging device and each inspection image comprises a bright field inspection image.

22. An inspection imaging device according to any of clauses 19 to 21, wherein the model comprises a machine learning model, neural network or convolutional neural network.

23. An inspection imaging device according to clause 22, wherein said convolutional neural network comprises one or more activation layers being operable to impose a logarithmic activation function.

24. An inspection imaging device according to any of clauses 19 to 23, wherein said training stochastic metric data describes an acceptable space or range of stochastic metric values or related dimensional metric values, and a corresponding acceptable space or range of process parameter values for said one or more process parameters.

25. An inspection imaging device according to any of clauses 19 to 24, wherein said one or more process parameters comprise one or both of focus and dose when forming said training structures.

26. An inspection imaging device according to any of clauses 19 to 25, wherein the stochastic metric comprises one or more of: defect rate or other defect metric, line edge roughness, line width roughness, local critical dimension uniformity, circle edge roughness or edge placement error.

27. A method of determining a stochastic metric relating to a structure, the method comprising: obtaining a trained model, the model having been trained to correlate training inspection image data to training stochastic metric data, wherein the training inspection image data comprises a plurality of inspection images relating to radiation reflected from a plurality of training structures on a substrate and the training stochastic metric data comprises stochastic metric values relating to said plurality of training structures, wherein the plurality of training structures have been formed with a variation in one or more dimensions on which said stochastic metric is dependent; obtaining inspection image data from a structure; and using the trained model to infer a value for said stochastic metric associated with the structure from said inspection image data.

28. The method of clause 27, wherein the stochastic metric represents a defect probability or a CD variation at a small spatial scale, for example smaller than 1000 times the CD.

29. The method of clause 27 or 28, wherein the inspection image is a bright field inspection image of the structure or training structure.

30. The method of any of clauses 27 to 29, wherein the training stochastic metric data is obtained using an e-beam metrology tool.

31. The method of any of clauses 27 to 30, wherein the variation in the one or more dimensions is associated with a variation in a process parameter of a lithographic apparatus, such as exposure dose and/or focus setting.

32. A computer program comprising program instructions operable to perform the method of any of clauses 27 to 31, when run on a suitable apparatus.

33. A non-transient computer program carrier comprising the computer program of clause 32.

34. A method of determining a stochastic metric relating to a structure, the method comprising: obtaining a trained model, the model having been trained to correlate training inspection image data to training stochastic metric data, wherein the training inspection image data comprises a plurality of inspection images relating to radiation reflected from a plurality of training structures on a substrate and the training stochastic metric data comprises stochastic metric values relating to said plurality of training structures, wherein the plurality of training structures have been formed with a variation in one or more dimensions on which said stochastic metric is dependent; obtaining inspection image data from a structure; and using the trained model to infer a value of the stochastic metric associated with the structure from the inspection image data.
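By way of illustration only, the step of determining stochastic metric data from high-resolution metrology data, as recited in the clauses above, may be sketched as follows. The sketch assumes that edge positions and local critical dimensions have already been extracted from, for example, scanning electron microscope images, and it assumes the commonly used three-sigma convention for roughness and uniformity metrics; that convention is not stated here. All function names are hypothetical.

```python
import numpy as np


def three_sigma(values):
    """Three times the population standard deviation (assumed convention)."""
    return 3.0 * float(np.std(np.asarray(values, dtype=float)))


def line_edge_roughness(edge_positions):
    """Roughness of a single edge sampled along the length of a line."""
    return three_sigma(edge_positions)


def line_width_roughness(left_edges, right_edges):
    """Roughness of the line width, taken as right edge minus left edge."""
    widths = np.asarray(right_edges, dtype=float) - np.asarray(left_edges, dtype=float)
    return three_sigma(widths)


def local_cd_uniformity(local_cds):
    """Spread of the critical dimension over neighbouring features."""
    return three_sigma(local_cds)


if __name__ == "__main__":
    # Toy edge positions in nanometres at four positions along one line.
    left = [0.0, 0.0, 0.0, 0.0]
    right = [9.0, 11.0, 9.0, 11.0]
    print(line_edge_roughness(left))          # 0.0
    print(line_width_roughness(left, right))  # 3.0
    print(local_cd_uniformity([20.0, 20.0, 20.0]))  # 0.0
```

Values computed in this way for each training structure could serve as the training stochastic metric data paired with the corresponding inspection images.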

[00114] The terms “radiation” and “beam” used in relation to the lithographic apparatus encompass all types of electromagnetic radiation, including ultraviolet (UV) radiation (e.g., having a wavelength of or about 365, 355, 248, 193, 157 or 126 nm) and extreme ultra-violet (EUV) radiation (e.g., having a wavelength in the range of 5-20 nm), as well as particle beams, such as ion beams or electron beams.

[00115] The term “lens”, where the context allows, may refer to any one or combination of various types of optical components, including refractive, reflective, magnetic, electromagnetic and electrostatic optical components.

[00116] The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description by example, and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

[00117] The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.