DETAILED ACTION
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. 
Regarding claims 1-20, 35 U.S.C. 112(f) is not invoked.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4, 7-18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhang et al. (US 2018/0107928 A1).
Regarding claim 1, Zhang discloses a computerized method for defect detection on a specimen, the method performed by a processor and memory circuitry (PMC), the method comprising (the preamble of claim 1 does not serve to limit claim 1):
obtaining a runtime (via fig. 1: “Computer subsystem”, three times) image (via fig. 4:400: “Imaging tool”) representative of at least a portion of the specimen (or “specimen” as shown in fig. 1:14);

processing the runtime image using an unsupervised (said “semi-supervised detect or region detection”) model component (via fig. 4:442: “Best Model 2”) to obtain a second output (via an arrow between fig. 4:418: “Data and Labels” and said fig. 4:420: “Data Partition”) indicative of estimated (said via a “crude… approximate”) presence (said via fig. 4:432: “Detection” corresponding to “defect to be detected”) of second (via fig. 4:438: “Cropped Image”) defects on the runtime image (said via “defects in the image”), wherein the unsupervised model is trained (via fig. 4:440: “Model Training 2”) using a second training set (“train…based on cropped images”) including a plurality of second (cropped) images each representative of at least a portion of the specimen, each second (cropped) image being a reference (or information or data) image of (used to indicate association) a first (un-cropped) image; and 
combining the first output and the second output (resulting in “a combination of…images” represented as said fig. 4:420: “Data Partition”, a merging of said arrows) using one or more optimized parameters (or “fine tune parameters”) to obtain a defect detection result (or “light result…at the detector”) of the specimen (via:
“[0023] One embodiment relates to a system configured to perform diagnostic functions for a deep learning model.  Some embodiments described herein are configured as systems with optional visualization capability for causal understanding and guided training of a deep learning model for semiconductor applications such as inspection and metrology.  For example, the embodiments described herein provide a system configured to perform quality assurance and causal understanding for a deep learning model.  In particular, as described further herein, the embodiments are configured for generating causal information (e.g., causal image/vector) through several possible methods and/or algorithms.  In addition, by using the causal information, the embodiments can quantitatively determine the model performance.  Furthermore, the systems can use the information gained by quality assurance and/or causal understanding to perform one or more functions such as providing guidance on data augmentation and/or fine-tuning the process to further improve the accuracy of the deep learning model.  In other words, by using causal information (e.g., causal image/vector) in augmentation, the embodiments can improve the deep learning model further.  Moreover, the embodiments described herein provide semi-supervised detect or region detection, which can advantageously reduce manual labeling efforts.”

“[0027] In one embodiment, the imaging tool is configured as an optical based imaging tool.  In this manner, in some embodiments, the images are generated by an optical based imaging tool.  In one such example, in the embodiment of the system shown in FIG. 1, optical based imaging tool 10 includes an illumination subsystem configured to direct light to specimen 14.  The illumination subsystem includes at least one light source.  For example, as shown in FIG. 1, the illumination subsystem includes light source 16.  In one embodiment, the illumination subsystem is configured to direct the light to the specimen at one or more angles of incidence, which may include one or more oblique angles and/or one or more normal angles.  For example, as shown in FIG. 1, light from light source 16 is directed through optical element 18 and then lens 20 to specimen 14 at an oblique angle of incidence.  The oblique angle of incidence may include any suitable oblique angle of incidence, which may vary depending on, for instance, characteristics of the specimen.”

“[0029] In some instances, the imaging tool may be configured to direct light to the specimen at more than one angle of incidence at the same time.  For example, the illumination subsystem may include more than one illumination channel, one of the illumination channels may include light source 16, optical element 18, and lens 20 as shown in FIG. 1 and another of the illumination channels (not shown) may include similar elements, which may be configured differently or the same, or may include at least a light source and possibly one or more other components such as those described further herein.  If such light is directed to the specimen at the same time as the other light, one or more characteristics (e.g., wavelength, polarization, etc.) of the light directed to the specimen at different angles of incidence may be different such that light resulting from illumination of the specimen at the different angles of incidence can be discriminated from each other at the detector(s).”

“[0067] In a further such embodiment, the deep learning model includes one or more fully connected layers configured for classifying defects on the specimen.  A "fully connected layer" may be generally defined as a layer in which each of the nodes is connected to each of the nodes in the previous layer.  The fully connected layer(s) may perform classification based on the features extracted by convolutional layer(s), which may be configured as described further herein.  The fully connected layer(s) are configured for feature selection and classification.  In other words, the fully connected layer(s) select features from a feature map and then classify the defects in the image(s) based on the selected features.  The selected features may include all of the features in the feature map (if appropriate) or only some of the features in the feature map.”

“[0070] The features determined by the deep learning model may include any suitable features described further herein or known in the art that can be inferred from the input described herein (and possibly used to generate the output described further herein).  For example, the features may include a vector of intensity values per pixel.  The features may also include any other types of features described herein, e.g., vectors of scalar values, vectors of independent distributions, joint distributions, or any other suitable feature types known in the art.”

“[0077] An input to a deep learning model can include a combination of: a) images defined by x(h, w, c, t, ...), which is an N-dimensional tensor of images with height=h and width=w across other dimensions, e.g., channel c, time t, etc. (In semiconductor applications, x can be an optical image, an electron beam image, a design data image (e.g., CAD image), etc. under different tool conditions.); and b) feature vector v(m), which is a 1-dimensional vector (The dimension can be generalized to be more than 1.).”

“[0087] In a further embodiment, the diagnostic component is configured for determining the one or more causal portions by global average pooling.  As described by Lin et al. in "Network In Network," arXiv:1312.4400, which is incorporated by reference as if fully set forth herein, the global average pooling (GAP) is introduced and defined.  GAP provides crude pixel-level causal region information, which can be approximately interpreted as causal image/vector.  The embodiments described herein may be further configured as described in the above reference.”

“[0106] In one such example, the causal information may be generated for an input image and if the relevant region in the causal information matches the defect to be detected (in the case of defect detection or classification), the diagnostic component may determine that no augmentation needs to be performed as the model predicted correctly.  However, if the relevant region only matches part of a defect or does not match a defect at all (in the case of defect detection or classification), the diagnostic component may determine that an augmentation method may be advantageous and may request input from a user for a possible augmentation method.  The user may then, for example, specify one or more attention portions and/or one or more ignore regions in the input image via bounding boxes, locations, etc. The information for these user-specified portions can be sent to the augmentation step to alter the input image, for example, by randomly perturbing the ignore portion(s) by zeroing or adding noise and/or randomly transforming the attention portion(s).”

“[0115] For example, as shown in FIG. 4, detection 432 may generate ROI 434, which may include information for any one or more causal portions identified as ROIs, which may be used for cropping 436 of the original image to the candidate patch along with the output (e.g., class prediction) from best model 1.  In particular, the original image may be cropped to eliminate portion(s) of the original image that do not correspond to the ROI(s).  In one such example, cropped image(s) 438 generated by cropping 436 may be output to data partition 420, which may then use the cropped images to generate additional training data 422, which may replace the original training data.  The new training data may then be used to tune best model 1.  For example, the new training data may be input to model training 1 424, which may be used to tune or fine tune parameters of best model 1, which may output results to model selection 426.  Model selection may produce best model 1 428, which would be a modified version of the best model 1 originally produced.  The new best model 1 may then be evaluated as described above and used for detection of ROI(s), which can be used to generate still further training data, which can be used to re-tune the best model 1 again.  In this manner, the embodiments described herein provide a system for iteratively tuning a deep learning model based on ROI(s) determined by previous versions of the deep learning model.”
“[0116] In some embodiments, the one or more functions include identifying the one or more causal portions as one or more ROIs in the image, which may be performed as described herein, and training an additional deep learning model based on the one or more ROIs.  For example, causal back propagation or another of the causal portion determination methods described herein may be used as semi-supervised ROI detection to train a second "more accurate" deep learning model based on cropped images.  The reason this is called "semi-supervised" is that the labeling process for best model 1 does not require labeling exactly the bounding box for each object.  As shown in FIG. 4, for example, cropped image 438 may be provided to model training 2 440.  Model training 2 may be performed as described herein, but using a different deep learning model than that trained in model training 1 424.  Results of model training 2 may produce best model 2 442, which may then be provided to model deployment 444, which may be performed as described further herein.”).
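
For illustration only, the global average pooling and ROI cropping steps quoted above from paragraphs [0087] and [0115] can be sketched as follows. This minimal pure-Python sketch assumes the common class-activation-map formulation, in which a single linear layer follows GAP so that a class score is a weighted mean of the feature maps; the function names, the threshold, and the toy feature maps are hypothetical and are not drawn from Zhang or from the present application.

```python
from typing import List, Tuple

Map = List[List[float]]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), inclusive


def global_average_pool(feature_maps: List[Map]) -> List[float]:
    """GAP: average each of the K feature maps over all H x W positions."""
    return [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]


def class_score(feature_maps: List[Map], class_weights: List[float]) -> float:
    """Score of one class from a (hypothetical) linear layer after GAP, bias omitted."""
    return sum(w * g for w, g in zip(class_weights, global_average_pool(feature_maps)))


def activation_map(feature_maps: List[Map], class_weights: List[float]) -> Map:
    """Per-pixel weighted sum of the feature maps using the same class weights."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fm[i][j] for wk, fm in zip(class_weights, feature_maps))
             for j in range(w)] for i in range(h)]


def roi_box(act: Map, threshold: float) -> Box:
    """Bounding box of all pixels whose activation reaches the threshold."""
    hits = [(i, j) for i, row in enumerate(act) for j, v in enumerate(row) if v >= threshold]
    if not hits:
        raise ValueError("no pixel reaches the threshold")
    rows, cols = [i for i, _ in hits], [j for _, j in hits]
    return min(rows), min(cols), max(rows), max(cols)


def crop(image: Map, box: Box) -> Map:
    """Keep only the part of the original image inside the ROI box."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


# Toy example: feature map 0 responds to a 2 x 2 blob, feature map 1 is flat.
f0 = [[0, 0, 0, 0], [0, 4, 4, 0], [0, 4, 4, 0], [0, 0, 0, 0]]
f1 = [[1, 1, 1, 1] for _ in range(4)]
weights = [1.0, 0.5]
act = activation_map([f0, f1], weights)
box = roi_box(act, threshold=2.0)
image = [[float(10 * i + j) for j in range(4)] for i in range(4)]
print(box, crop(image, box))  # (1, 1, 2, 2) [[11.0, 12.0], [21.0, 22.0]]
```

Because the class score is linear in the pooled features, the mean of the activation map equals the class score (bias omitted), which is one way to read GAP as yielding “crude pixel-level causal region information.”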

Regarding claim 2, Zhang discloses the computerized method according to claim 1, wherein the one or more optimized parameters are obtained during training using a third training set (via “additional training images” via:
“[0092] In some embodiments, the one or more functions include altering one or more parameters of the deep learning model based on the determined one or more causal portions.  For example, the diagnostic component may determine if the one or more causal portions are the correct causal portion(s) of the image, which may be performed as described further herein.  If the one or more causal portions are incorrect, the diagnostic component may be configured to fine-tune or re-train the deep learning model to thereby alter one or more parameters of the deep learning model, which may include any of the parameters described herein.  The fine-tuning or re-training of the deep learning model may include inputting additional training images to the deep learning model, comparing the output generated for the training images to known output for the training images (e.g., defect classification(s), segmentation region(s), etc.), and altering one or more parameters of the deep learning model until the output generated for the additional training images by the deep learning model substantially matches the known output for the additional training images.  In addition, the diagnostic component may be configured to perform any other method and/or algorithm to alter one or more parameters of the deep learning model based on the determined one or more causal portions.”).
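
The fine-tuning loop described in paragraph [0092] (comparing the output for additional training images to the known output and altering parameters until the two substantially match) can be illustrated with the minimal sketch below. A logistic-regression classifier stands in for the deep learning model, and the learning rate, tolerance, epoch limit, and toy data are hypothetical assumptions rather than anything disclosed by Zhang.

```python
import math
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Sigmoid output of a linear model (stand-in for the deep learning model)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(weights: Sequence[float], bias: float,
              samples: Sequence[Sequence[float]], labels: Sequence[int],
              lr: float = 0.5, tol: float = 0.05,
              max_epochs: int = 1000) -> Tuple[List[float], float, int]:
    """Alter parameters until every output is within `tol` of its known label."""
    weights = list(weights)
    for epoch in range(max_epochs):
        # Stop once the generated output "substantially matches" the known output.
        if all(abs(predict(weights, bias, x) - y) < tol for x, y in zip(samples, labels)):
            return weights, bias, epoch
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # gradient of cross-entropy w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias, max_epochs


samples = [[-2.0], [-1.0], [1.0], [2.0]]  # hypothetical 1-D feature per image
labels = [0, 0, 1, 1]                     # known output: 1 = defect
w, b, epochs = fine_tune([0.0], 0.0, samples, labels)
print(epochs < 1000, [round(predict(w, b, x)) for x in samples])  # True [0, 0, 1, 1]
```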

Regarding claim 3, Zhang discloses the computerized method according to claim 2, 
wherein the first output is a first grade map (via “the image (i.e., a feature map)”) representative of estimated (said via a “crude… approximate”) probabilities (via a “probabilistic” “model”) of the first defects on the runtime image, and the second output is a second grade map (said via “the image (i.e., a feature map)”) representative of estimated (said via a “crude… approximate”) probabilities (said via a “probabilistic” “model”) of the second defects on the runtime image; and 
wherein the combining is performed using a segmentation model component (or a “segmentation…proposal network”) operatively connected to the supervised and unsupervised model components (via “the deep learning model includes one…segmentation…proposal network”), to obtain a composite grade map (via said resulting in “a combination of…images” represented as said fig. 4:420: “Data Partition”, a merging of said arrows) indicative of estimated (said via a “crude… approximate”) probabilities (said via a “probabilistic” “model”) of the first defects and the second defects on the specimen, and 
wherein the segmentation model component is trained (given that said “the deep learning model includes one…segmentation…proposal network”) using the third training set (for “re-training” cited in the rejection of claim 2) based on outputs of the supervised model and unsupervised model (for said “re-training” via:
“[0060] In some embodiments, the deep learning model is a generative model.  A "generative" model can be generally defined as a model that is probabilistic in nature.  In other words, a "generative" model is not one that performs forward simulation or rule-based approaches.  Instead, as described further herein, the generative model can be learned (in that its parameters can be learned) based on a suitable training set of data.  In one embodiment, the deep learning model is configured as a deep generative model.  For example, the model may be configured to have a deep learning architecture in that the model may include multiple layers, which perform a number of algorithms or transformations.”
“[0069] In some embodiments, the information determined by the deep learning model includes features of the images extracted by the deep learning model.  In one such embodiment, the deep learning model includes one or more convolutional layers.  The convolutional layer(s) may have any suitable configuration known in the art and are generally configured to determine features for an image as a function of position across the image (i.e., a feature map) by applying a convolution function to the input image using one or more filters.  In this manner, the deep learning model (or at least a part of the deep learning model) may be configured as a convolution neural network (CNN).  For example, the deep learning model may be configured as a CNN, which is usually stacks of convolution and pooling layers, to extract local features.  The embodiments described herein can take advantage of deep learning concepts such as a CNN to solve the normally intractable representation inversion problem.  The deep learning model may have any CNN configuration or architecture known in the art.  The one or more pooling layers may also have any suitable configuration known in the art (e.g., max pooling layers) and are generally configured for reducing the dimensionality of the feature map generated by the one or more convolutional layers while retaining the most important features.”
“[0072] In another embodiment, the information determined by the deep learning model includes one or more segmentation regions generated from the image.  In one such embodiment, the deep learning model includes a proposal network configured for identifying the segmentation region(s) (based on features determined for the image) and generating bounding boxes for each of the segmentation regions.  The segmentation regions may be detected based on the features (determined for the images by the deep learning model or another method or system) to thereby separate regions in the images based on noise (e.g., to separate noisy regions from quiet regions), to separate regions in the images based on specimen features located therein, to separate regions based on geometric characteristics of the output, etc. The proposal network may use features from a feature map, which may be generated or determined as described further herein, to detect the segmentation region(s) in the image based on the determined features.  The proposal network may be configured to generate bounding box detection results.  In this manner, the deep learning model may output bounding boxes, which may include a bounding box associated with each segmentation region or more than one segmentation region.  The deep learning model may output bounding box locations with each bounding box.  The results of the segmentation region generation can also be stored and used as described further herein.”).  
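
Paragraph [0072] describes identifying segmentation regions and generating a bounding box for each region. As a rough illustration only, the sketch below thresholds a grade map and groups above-threshold pixels into 4-connected regions by breadth-first flood fill. This classical connected-component step is plainly a substitute for Zhang's learned proposal network, and the threshold and toy map are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), inclusive


def segment_regions(grade_map: List[List[float]], threshold: float) -> List[Box]:
    """Label 4-connected regions of pixels >= threshold; return one box per region."""
    h, w = len(grade_map), len(grade_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for i in range(h):
        for j in range(w):
            if seen[i][j] or grade_map[i][j] < threshold:
                continue
            # Breadth-first flood fill from this seed pixel.
            queue = deque([(i, j)])
            seen[i][j] = True
            r0, c0, r1, c1 = i, j, i, j
            while queue:
                r, c = queue.popleft()
                r0, c0, r1, c1 = min(r0, r), min(c0, c), max(r1, r), max(c1, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w and not seen[nr][nc]
                            and grade_map[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((r0, c0, r1, c1))
    return boxes


grade = [
    [0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.6, 0.6],
]
print(segment_regions(grade, 0.5))  # [(0, 0, 1, 1), (2, 2, 3, 3)]
```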

Regarding claim 4, Zhang discloses the computerized method according to claim 2, 
wherein the first output is a first grade map (said via “the image (i.e., a feature map)”) representative of estimated probabilities of the first defects on the runtime image, and the second output is a second grade map (said via “the image (i.e., a feature map)”) representative of estimated probabilities of the second defects on the runtime image; and 
wherein the combining comprises combining the first grade map and the second grade map with respective global weights (“a set of weights that model the world”) to generate a composite grade map (via said resulting in “a combination of…images” represented as said fig. 4:420: “Data Partition”, a merging of said arrows) indicative of estimated (said via a “crude… approximate”) probabilities (said via a “probabilistic” “model”) of the first defects and the second defects on the specimen, wherein the respective global weights are optimized (said “fine tune parameters”) during training using the third training set (via:
“[0061] In another embodiment, the deep learning model is configured as a neural network.  In a further embodiment, the deep learning model may be a deep neural network with a set of weights that model the world according to the data that it has been fed to train it.  Neural networks can be generally defined as a computational approach which is based on a relatively large collection of neural units loosely modeling the way a biological brain solves problems with relatively large clusters of biological neurons connected by axons.  Each neural unit is connected with many others, and links can be enforcing or inhibitory in their effect on the activation state of connected neural units.  These systems are self-learning and trained rather than explicitly programmed and excel in areas where the solution or feature detection is difficult to express in a traditional computer program.”).  
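
The combination recited in claim 4 (two grade maps merged with respective global weights that are optimized using a third training set) can be sketched as follows for illustration. The sketch assumes that the two weights sum to one and are chosen by a simple grid search maximizing mean pixel accuracy at a 0.5 threshold; the grid, threshold, objective, and toy data are hypothetical and are not taken from Zhang or from the application.

```python
from typing import List, Sequence, Tuple

Map = List[List[float]]


def composite(g1: Map, g2: Map, w1: float, w2: float) -> Map:
    """Pixel-wise weighted sum of two grade maps with global (image-wide) weights."""
    return [[w1 * a + w2 * b for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


def pixel_accuracy(pred: Map, label: Map, threshold: float = 0.5) -> float:
    """Fraction of pixels whose thresholded prediction equals the 0/1 label."""
    hits = sum((p >= threshold) == bool(y)
               for pr, lr in zip(pred, label) for p, y in zip(pr, lr))
    return hits / (len(label) * len(label[0]))


def optimize_global_weights(train: Sequence[Tuple[Map, Map, Map]],
                            grid: Sequence[float]) -> Tuple[float, float]:
    """Grid-search w1 (with w2 = 1 - w1) maximizing mean accuracy; first best wins ties."""
    best, best_score = (0.5, 0.5), -1.0
    for w1 in grid:
        w2 = 1.0 - w1
        score = sum(pixel_accuracy(composite(g1, g2, w1, w2), y)
                    for g1, g2, y in train) / len(train)
        if score > best_score:
            best, best_score = (w1, w2), score
    return best


# One hypothetical training triple: supervised map, unsupervised map, label map.
g1 = [[0.9, 0.1], [0.2, 0.8]]
g2 = [[0.4, 0.9], [0.9, 0.3]]
label = [[1, 0], [0, 1]]
best = optimize_global_weights([(g1, g2, label)], grid=[0.0, 0.25, 0.5, 0.75, 1.0])
print(best)  # (0.75, 0.25)
```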

Regarding claim 7, Zhang discloses the computerized method according to claim 1, wherein the supervised model component is trained by processing each first image to generate (said via fig. 4:400: “Imaging tool”) a corresponding first grade map (said via “the image (i.e., a feature map)” at fig. 4:424: “Model Training 1”) representative of estimated probabilities (said via a “probabilistic” “model”) of the first defects on the first image, and optimizing (via said “fine tune parameters”) the supervised model component based on the label data corresponding to the first image (said via fig. 4:400: “Imaging tool”).  
Regarding claim 8, Zhang discloses the computerized method according to claim 1, wherein the unsupervised model component is trained by processing each second image to generate (said via fig. 4:400: “Imaging tool”) a corresponding second grade map (said via “the image (i.e., a feature map)” at fig. 4:440: “Model Training 2”) representative of estimated probabilities (said via a “probabilistic” “model”) of the second defects on the second image, and optimizing (via said “fine tune parameters”) the unsupervised network based on the second grade map (said via “the image (i.e., a feature map)” at fig. 4:440: “Model Training 2”) in relation to the second image (said via fig. 4:400: “Imaging tool”).  
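
As a point of contrast between claims 7 and 8, an unsupervised component needs no label data: it can be fit to reference images alone and then score how far a new image deviates, producing a grade map. The minimal sketch below uses a per-pixel Gaussian (mean and standard deviation) fitted in closed form, a deliberately simple stand-in for a trained unsupervised network. The z-score-to-grade mapping, the epsilon, and the toy images are hypothetical assumptions.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

Map = List[List[float]]


def fit_reference_model(references: Sequence[Map]) -> Tuple[Map, Map]:
    """Learn a per-pixel mean and standard deviation from reference images (no labels)."""
    h, w = len(references[0]), len(references[0][0])
    mean = [[fmean([r[i][j] for r in references]) for j in range(w)] for i in range(h)]
    std = [[pstdev([r[i][j] for r in references]) for j in range(w)] for i in range(h)]
    return mean, std


def grade_map(image: Map, mean: Map, std: Map, eps: float = 1e-6) -> Map:
    """Map each pixel's z-score into [0, 1): 0 = matches references, near 1 = deviates."""
    return [[1.0 - math.exp(-0.5 * ((x - m) / (s + eps)) ** 2)
             for x, m, s in zip(row, mrow, srow)]
            for row, mrow, srow in zip(image, mean, std)]


references = [[[10, 10], [10, 10]], [[12, 12], [12, 12]]]  # hypothetical references
mean, std = fit_reference_model(references)
g = grade_map([[11, 11], [11, 17]], mean, std)
print([[round(v, 3) for v in row] for row in g])  # [[0.0, 0.0], [0.0, 1.0]]
```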

Regarding claim 9, Zhang discloses the computerized method according to claim 1, wherein the first training set further includes, for each first image, corresponding design data (or “ ‘design,’ ‘design data,’ and ‘design information’ as used interchangeably herein”), and/or at least one reference image, and the obtaining further comprises obtaining (via “derived from…simulation”) design data and/or at least one reference image of the runtime image (via:
“[0020] The terms "design," "design data," and "design information" as used interchangeably herein generally refer to the physical design (layout) of an IC and data derived from the physical design through complex simulation or simple geometric and Boolean operations.  In addition, an image of a reticle acquired by a reticle inspection system and/or derivatives thereof can be used as a "proxy" or "proxies" for the design.  Such a reticle image or a derivative thereof can serve as a substitute for the design layout in any embodiments described herein that use a design.  The design may include any other design data or design data proxies described in commonly owned U.S. Pat. No. 7,570,796 issued on Aug. 4, 2009 to Zafar et al. and U.S. Pat. No. 7,676,077 issued on Mar. 9, 2010 to Kulkarni et al., both of which are incorporated by reference as if fully set forth herein.  In addition, the design data can be standard cell library data, integrated layout data, design data for one or more layers, derivatives of the design data, and full or partial chip design data.”).  
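
Paragraph [0077], quoted in the rejection of claim 1, describes a model input x(h, w, c, ...) whose channels may include an optical image and a design data image. Purely as an illustration of how a runtime image could be paired with its design data and/or reference image, the sketch below stacks same-sized images as channels of an h x w x c input. The function name, the channel order, and the toy rasterized design clip are hypothetical.

```python
from typing import List, Optional

Map = List[List[float]]


def stack_channels(runtime: Map, reference: Optional[Map] = None,
                   design: Optional[Map] = None) -> List[List[List[float]]]:
    """Build an h x w x c input: runtime image first, then any reference/design channels."""
    layers = [m for m in (runtime, reference, design) if m is not None]
    h, w = len(runtime), len(runtime[0])
    for m in layers:
        if len(m) != h or any(len(row) != w for row in m):
            raise ValueError("all inputs must share the runtime image's height and width")
    return [[[m[i][j] for m in layers] for j in range(w)] for i in range(h)]


runtime = [[0.2, 0.9], [0.1, 0.3]]
design = [[0.0, 1.0], [0.0, 0.0]]  # hypothetical rasterized layout clip
x = stack_channels(runtime, design=design)
print(x[0][1])  # [0.9, 1.0]
```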

Regarding claim 10, Zhang discloses the computerized method according to claim 1, wherein the second training set further includes, for each second image, corresponding design data (said or “ ‘design,’ ‘design data,’ and ‘design information’ as used interchangeably herein”), and the obtaining further comprises obtaining (said via “derived from…simulation”) design data of (“of” used to indicate association) the runtime image.  

Regarding claim 11, Zhang discloses the computerized method according to claim 1, wherein the supervised model component and the unsupervised model component are trained separately (as shown in fig. 4:424: “Model Training 1” and fig. 4:440: “Model Training 2”).  
Regarding claim 12, Zhang discloses the computerized method according to claim 1, further comprising obtaining, during runtime, one or more new first images (said via “additional training images”) each with label data (via fig. 4:406: “Data labeling”) indicative of presence (said via fig. 4:432: “Detection”) of one or more new classes (or “extra…classes…for further training”) of defects (said via “defects in the image”), and retraining (via said “re-training” and “further training”) the supervised model component using the new first images (via:
“[0101] In a further embodiment, the one or more functions include determining one or more characteristics of the one or more causal portions and determining, based on the one or more characteristics of the one or more causal portions, if additional images for the specimen should be collected from the imaging tool and used for additional training of the deep learning model.  For example, the diagnostic component or visualization 336 may be added after model evaluation as shown in FIG. 3 and it may fall back to data collection 302, if causal assurance failed on a) considerable samples of one type or class; and/or b) considerable samples of several types or classes.  If this path is selected, extra data for the error types or classes are collected from imaging tool 300 for further training.  For example, as shown in FIG. 3, visualization 336 may send output such as instructions for additional data collection to data collection 302 step, which may be performed using imaging tool 300.  The additional data collection may be performed using the same specimens that were used for initial data collection and/or different specimens not previously used for data collection.”).
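
Paragraph [0101] describes falling back to data collection when causal assurance fails on “considerable samples” of one or several classes. Zhang does not quantify “considerable,” so the sketch below assumes, hypothetically, a per-class failure fraction threshold (default 0.2) to decide which classes should receive additional images for retraining. The function name, threshold, and class names are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def classes_needing_more_data(results: Iterable[Tuple[str, bool]],
                              min_fail_fraction: float = 0.2) -> List[str]:
    """Return classes whose causal-assurance failure rate reaches min_fail_fraction."""
    total: Counter = Counter()
    failed: Counter = Counter()
    for defect_class, passed in results:
        total[defect_class] += 1
        if not passed:
            failed[defect_class] += 1
    return sorted(c for c in total if failed[c] / total[c] >= min_fail_fraction)


results = [("bridge", True), ("bridge", False), ("particle", True),
           ("particle", True), ("scratch", False)]
print(classes_needing_more_data(results))  # ['bridge', 'scratch']
```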

Regarding claim 13, Zhang discloses the computerized method according to claim 1, wherein the runtime image is a review (via a “review…inspection”) image generated by a review (via a “review…inspection” or “semiconductor…inspection”, cited in the rejection of claim 1 or “a reticle inspection system”, cited in the rejection of claim 9) tool (said via fig. 4:400: “Imaging tool” via:
“[0005] Defect review typically involves re-detecting defects detected as such by an inspection process and generating additional information about the defects at a higher resolution using either a high magnification optical system or a scanning electron microscope (SEM).  Defect review is therefore performed at discrete locations on specimens where defects have been detected by inspection.  The higher resolution data for the defects generated by defect review is more suitable for determining attributes of the defects such as profile, roughness, more accurate size information, etc.”).

Regarding claim 14, Zhang discloses the computerized method according to claim 1, further comprising 
processing the runtime image using one or more additional (relative to fig. 4:426: “Model Selection”) supervised and/or unsupervised model components (said via fig. 4:442: “Best Model 2”) to obtain one or more additional (said relative to fig. 4:426: “Model Selection”) outputs (output arrow of said via fig. 4:442: “Best Model 2”) indicative of estimated presence (via said fig. 4:432: “Detection”) of additional (said relative to fig. 4:426: “Model Selection”) defects on the runtime image (said via “defects in the image”), 
wherein the one or more additional (said relative to fig. 4:426: “Model Selection”) supervised and/or unsupervised model components (said via fig. 4:442: “Best Model 2”) are trained (via said fig. 4:440: “Model Training”) using one or more additional training sets (said via “additional training images”) including training images from different layers (via a “design…image of…one or more layers”, cited in the rejection of claim 9) of the specimen and/or from different specimens (“such as reticles and wafers” via:
“[0021] In addition, the "design," "design data," and "design information" described herein refers to information and data that is generated by semiconductor device designers in a design process and is therefore available for use in the embodiments described herein well in advance of printing of the design on any physical specimens such as reticles and wafers.”).  

Regarding claim 15, claim 15 is rejected on the same grounds as claim 1, and the arguments presented for claim 1 are equally applicable to claim 15. Accordingly, Zhang discloses, as recited in claim 15, a computerized system of defect detection on a specimen, the system comprising a processor and memory circuitry (PMC) configured to: 
obtain a runtime image representative of at least a portion of the specimen;
process the runtime image using a supervised model component to obtain a first output indicative of estimated presence of first defects on the runtime image, wherein the supervised model component is trained using a first training set including at least a plurality of first images each representative of at least a portion of the specimen and corresponding label data indicative of first defect distribution on the first images;
process the runtime image using an unsupervised model component to obtain a second output indicative of estimated presence of second defects on the runtime image, wherein the unsupervised model is trained using a second training set including a plurality of second images each representative of at least a portion of the specimen, each second image being a reference image of a first image; and 
combine the first output and the second output using one or more optimized parameters to obtain a defect detection result of the specimen.  

Regarding claim 16, claim 16 is rejected on the same grounds as claim 2, and the arguments presented for claim 2 are equally applicable to claim 16. Accordingly, Zhang discloses, as recited in claim 16, the computerized system according to claim 15, wherein the one or more optimized parameters are obtained during training using a third training set.  
Regarding claim 17, claim 17 is rejected on the same grounds as claim 3, and the arguments presented for claim 3 are equally applicable to claim 17. Accordingly, Zhang discloses, as recited in claim 17, the computerized system according to claim 16, wherein the first output is a first grade map representative of estimated probabilities of the first defects on the runtime image, and the second output is a second grade map representative of estimated probabilities of the second defects on the runtime image; and wherein the PMC is configured to combine the first output and the second output using a segmentation model component operatively connected to the supervised and unsupervised model components, to obtain a composite grade map indicative of estimated probabilities of the first defects and the second defects on the specimen, and wherein the segmentation model component is trained using the third training set based on outputs of the supervised model and unsupervised model.  

Regarding claim 18, claim 18 is rejected on the same grounds as claim 4; the reasoning presented for claim 4 applies equally to claim 18. Accordingly, Zhang discloses, as recited in claim 18, the computerized system according to claim 16, wherein the first output is a first grade map representative of estimated probabilities of the first defects on the runtime image, and the second output is a second grade map representative of estimated probabilities of the second defects on the runtime image; and wherein the PMC is configured to combine the first output and the second output by combining the first grade map and the second grade map with respective global weights to generate a composite grade map indicative of estimated probabilities of the first defects and the second defects on the specimen, wherein the respective global weights are optimized during training using the third training set.
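For illustration only, the combination recited in claims 4 and 18 (combining a first grade map and a second grade map with respective global weights into a composite grade map) can be sketched as a pixel-wise weighted sum. This is a minimal sketch under stated assumptions: the function name combine_grade_maps, the 2x2 maps, and the weight values 0.6 and 0.4 are hypothetical and are not taken from Zhang or from applicant’s disclosure, and a weighted sum is only one possible way to combine grade maps with global weights.

```python
from typing import List

GradeMap = List[List[float]]  # per-pixel estimated defect probabilities


def combine_grade_maps(g1: GradeMap, g2: GradeMap, w1: float, w2: float) -> GradeMap:
    """Return the pixel-wise weighted sum w1*g1 + w2*g2 (hypothetical global weights)."""
    if len(g1) != len(g2) or any(len(r1) != len(r2) for r1, r2 in zip(g1, g2)):
        raise ValueError("grade maps must have the same shape")
    return [[w1 * a + w2 * b for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


# Hypothetical 2x2 grade maps from a supervised and an unsupervised component
first_grade_map = [[0.9, 0.1], [0.2, 0.0]]
second_grade_map = [[0.5, 0.7], [0.1, 0.0]]
composite_grade_map = combine_grade_maps(first_grade_map, second_grade_map, w1=0.6, w2=0.4)
print(composite_grade_map)
```

If both inputs are probabilities in [0, 1], choosing nonnegative weights that sum to 1 keeps the composite values in [0, 1]; that constraint is likewise an assumption, not a claim requirement.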

Regarding claim 20, claim 20 is rejected on the same grounds as claims 1 and 15; the reasoning presented for claims 1 and 15 applies equally to claim 20. Accordingly, Zhang discloses, as recited in claim 20, a non-transitory computer readable storage medium (via fig. 5:500: “Computer-readable medium”) tangibly embodying a program of instructions that, when executed by a computer, cause the computer to perform a method of defect detection on a specimen, the method comprising:
obtaining a runtime image representative of at least a portion of the specimen;
processing the runtime image using a supervised model component to obtain a first output indicative of estimated presence of first defects on the runtime image, wherein the supervised model component is trained using a first training set including at least a plurality of first images each representative of at least a portion of the specimen and corresponding label data indicative of first defect distribution on the first images;
processing the runtime image using an unsupervised model component to obtain a second output indicative of estimated presence of second defects on the runtime image, wherein the unsupervised model is trained using a second training set including a plurality of second images each representative of at least a portion of the specimen, each second image being a reference image of a first image; and 
combining the first output and the second output using one or more optimized parameters to obtain a defect detection result of the specimen.  




Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Regarding inquiry 4, see Suggestions.
Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928 A1) in view of Tuohy (US Patent App. Pub. No.: US 2007/0177135 A1) and Pathangi et al. (US Patent App. Pub. No.: US 2020/0161081).

Regarding claim 5, Zhang teaches the computerized method according to claim 2, 
wherein the processing of the runtime image using a supervised model component comprises generating a first grade map (as previously cited, “the image (i.e., a feature map)”) representative of estimated probabilities (as previously cited, a “probabilistic” “model”) of the first defects on the runtime image and applying a first threshold to the first grade map to obtain a first defect map (i.e., the “defects” “map” cited in the rejection of claim 1 at [0067]);
wherein the processing of the runtime image using a unsupervised model component comprises generating a second grade map (as previously cited, “the image (i.e., a feature map)”) representative of estimated probabilities (as previously cited, a “probabilistic” “model”) of the second defects on the runtime image, and applying a second threshold to the second grade map to obtain a second defect map (i.e., said “defects” “map” cited in the rejection of claim 1 at [0067]), the first threshold and the second threshold being optimized during training using the third training set, and
wherein the combining comprises combining the first defect map and the second defect map to generate a composite defect map.
However, Zhang does not teach the following claimed limitations:
a)	a first threshold;
b)	a second threshold; and
c)	the first threshold and the second threshold being optimized during training using the third training set, and 
wherein the combining comprises combining the first defect map and the second defect map to generate a composite defect map.
Accordingly, Tuohy teaches the following limitations of claim 5:
a)	a first (via fig. 1b: “1. filter”) threshold (or “a certain size…threshold…such as size, shape, position within a predefined area, number of defects per unit area and the like”);
b)	a second (via fig. 1b: “2. filter”) threshold (or “a certain size…threshold…such as size, shape, position within a predefined area, number of defects per unit area and the like”); and
c)	the first threshold and the second threshold being optimized during training using the third training set (not taught by Tuohy; addressed below via Pathangi), and 
wherein the combining comprises combining (via fig. 1a:130: “correlation unit”) the first defect map (i.e., “defects…substrate map 155A”) and the second defect map (i.e., “defect…substrate map 156A” or “defects…substrate map 157A”) to generate a composite defect map (via:

“[0021] The process of removing less relevant data from a given measurement data set may be accomplished by using appropriate measurement data, i.e., data having reduced noise, which may be considered as reference data, and combining or merging the filtered measurement data with the reference data to determine, for instance, a degree of correlation, a die loss and the like for the set of measurement data that has been filtered on the basis of a predefined filter criterion.  For example, if the filtered measurement data may exhibit a significantly increased correlation with respect to the reference data compared to the non-filtered data, the respective filter criterion used may be identified as an appropriate filter criterion and may be used to obtain data of increased statistical significance for the measurement process under consideration.  In other illustrative embodiments, the filtering process may be performed in a progressive manner, i.e., the filtering process may be performed on the basis of progressively restricted filter criteria so that a plurality of differently, i.e., progressively, filtered measurement data is available, for which respective degrees of correlation may be determined.  In other embodiments, the correlation may be used as a "quality monitor" of the measurement data, from which a die loss may be calculated for every filtering step to select an appropriate filtering process on the basis of the calculated die loss.  In some illustrative embodiments, the term "progressively filtering" may indicate a filtering process in which the initial measurement data are filtered with respect to the same filter criterion but with an increasingly restrictive filter behavior.  In other illustrative embodiments, the term "progressively filtering" may include a plurality of consecutive filtering processes, wherein a different filter criterion may be applied to a filtered measurement data set that has previously been filtered by a different criterion.  
For example, in the former case, a filter criterion may be selected, such as the size or area of a defect detected by optical inspection, the number of defects per unit area and the like, wherein, in each filtering step, the corresponding filtering action or range may be set more restrictively.  That is, it may be assumed that the influence of a defect may increase with its size, thereby rendering the corresponding larger defects more relevant compared to a smaller defect.  Consequently, during the progressive filtering process, the filter arrangement may be set so as to detect defects at or above a certain size, while neglecting effects below the threshold.  In the latter case, different filter criteria, such as size, shape, position within a predefined area, number of defects per unit area and the like, may be successively applied in order to reduce the noise in the original measurement data, thereby providing the potential for identifying appropriate filter "threads" that may be used in a corresponding manufacturing environment for obtaining measurement data of increased relevance.”; and

“[0029] Next, the measurement data 152A may be subjected to a first filtering process, for instance on the basis of a filter criterion determining a minimum defect size, below which a defect is considered as being not present.  Consequently, after re-processing the measurement data 152A according to the respective filter criterion and the setting of the filter criterion in the first step by selecting an appropriate minimum size, a filtered substrate map 154A may be obtained, wherein, for instance, 10 dies may be considered clean, while 86 die are still evaluated as defective die.  In a next filter step, a more restrictive range for the specified criterion, that is, an even increased minimum size of the defects, may be selected so that a further substrate map 155A may be generated.  For example, the minimum size in each of the filtering steps may be obtained as a multiple of the initial minimum defect size detectable by the corresponding inspection tool.  It should be appreciated, however, that any other value for the restricted range in the first, second and further filter step may be used.  The resulting filter process may yield 19 clean dies and thus 77 defective die.  Similarly, in a third filter step having a further increased restriction with respect to the corresponding filter criterion, such as the defect size, a further filtered set of measurement data represented by a substrate map 156A may be created.  Hereby, it may be assumed that 60 clean die are obtained, while 36 defective die are detected.  In a next filtering step, an even increased restriction, i.e., only defects having a size above a threshold higher than a threshold of any of the filter processes performed before, may be performed and may yield a corresponding set of filtered data represented by a substrate map 157A, wherein it may be assumed that 77 clean die are detected and thus 19 defective die are still present.  It should be appreciated that the above sequence of filtering steps is of illustrative nature only and other filter criteria in combination with respective increasingly restricted filter ranges may be used to obtain progressively filtered data sets.”).

Thus, one of ordinary skill in the art of maps can modify Zhang’s teaching of said “defects” “map” cited in the rejection of claim 1 at [0067] with Tuohy’s teaching of said fig. 1b: “1. filter” by:
a)	providing multiple instances of Zhang’s fig. 4:400 at the position of Zhang’s fig. 4:400;
b)	inserting Tuohy’s fig. 1a:100 at the output of each of Zhang’s fig. 4:400; and
c)	recognizing that the modification is predictable or anticipated because the modification allows one to monitor measurement quality via a “ ‘quality monitor’ of the measurement data” from each of said Zhang’s fig. 4:400 via Tuohy, cited above.
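As an illustration of the progressive filtering quoted from Tuohy [0021] and [0029] (an increasingly restrictive minimum defect size, with defects below the threshold treated as not present), the following minimal sketch counts the dies that remain defective at each filter step. The function name, die names, defect sizes, and threshold values are hypothetical and are not Tuohy’s data.

```python
from typing import Dict, List


def progressive_filter(die_defect_sizes: Dict[str, List[float]],
                       min_sizes: List[float]) -> List[int]:
    """For each increasingly restrictive minimum defect size, count the dies that
    still contain at least one defect at or above that size (hypothetical sketch)."""
    counts = []
    for min_size in sorted(min_sizes):
        defective = sum(
            1 for sizes in die_defect_sizes.values()
            if any(size >= min_size for size in sizes)  # smaller defects are neglected
        )
        counts.append(defective)
    return counts


# Hypothetical defect sizes detected on four dies
dies = {
    "die_a": [0.1, 0.3],
    "die_b": [1.2],
    "die_c": [],
    "die_d": [0.6, 2.5],
}
print(progressive_filter(dies, [0.1, 0.5, 1.0, 2.0]))  # prints [3, 2, 2, 1]
```

As in Tuohy’s example, the count of defective dies can only stay the same or fall as the minimum size is raised.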

The combination does not teach the remaining limitation of:
c)	“the first threshold and the second threshold being optimized during training using the third training set” 
	
Accordingly, Pathangi teaches:
c)	the first (“corresponding”) threshold (i.e., “threshold (Thr)”) and the second (“corresponding”) threshold (i.e., said “threshold (Thr)”, via fig. 1:102: “Quantify a number of pixels in the image that exceed a corresponding threshold in the matrix”) being optimized (i.e., “to provide optimal detection results”, corresponding to a “tuned” “threshold” that obtains “optimum performance”) during training (via “the heat map of FIG. 4…used to train”, as shown in figures 5-13, which use parts of said heat map, i.e., the bolded squares of fig. 4) using the third training set (i.e., said heat map, via:
“[0050] A nuisance rate can be tuned to required levels using the detection threshold parameter.  Thus, the threshold can be tuned depending on the application or desired sensitivity.”

wherein “tuned” is defined via Dictionary.com:
BRITISH DICTIONARY DEFINITIONS FOR TUNE
tune
verb
14	(tr often foll by up) to make fine adjustments to (an engine, machine, etc) to obtain optimum performance; and

“[0064] FIGS. 3-13 illustrate tuning a nuisance rate to required levels using the detection threshold parameter.  Depending on the average CD of one population of contact holes, the threshold (Thr) is different to provide optimal detection results.”

“[0067] In an SEM images of the three dies in the thick black border in the heat map of FIG. 4 (mean diameters of 16.7, 17.2, and 17.2), all the individual contact holes that are smaller than 10% of the mean critical dimension of the 100 contact holes from the corresponding image were used to train the deep learning model to be identified as defective.  Using this deep learning model, the SEM images from the three individual dies marked with a thick border (mean diameters of 15.7, 17.1, and 19.4) were used as verification images to assess the performance of the deep learning model in identifying defective contact holes.”).




Thus, one of ordinary skill in the art of metrology, defects, defect thresholding, and heat maps, as indicated by Zhang’s teaching of a “heatmap” via:
“[0084] In some embodiments, the diagnostic component is configured for determining the one or more causal portions by causal back propagation performed using a deconvolution heatmap algorithm.  A deconvolution heatmap can be viewed as a specific implementation of causal backpropagation.  For example, as described by Zeiler et al., "Visualizing and understanding convolutional networks," ECCV, 2014, pp.  818-833, which is incorporated by reference as if fully set forth herein, the causal image can be computed via mapping activation from the deep learning model's output back to the pixel/feature (i.e., x and y) space through a backpropagation rule.  The embodiments described herein may be further configured as described in this reference.”

can modify Zhang’s said “defects” “map”, as modified by Tuohy, with Pathangi’s teaching of a “threshold (Thr)” by:
a)	configuring each instance of Zhang’s fig. 4:400: “Imaging tool”, as modified by Tuohy, as Pathangi’s fig. 14:200;
b)	using the tuned thresholds of Pathangi as Tuohy’s thresholds at said fig. 1b: “1. filter” and fig. 1b: “2. filter”; and
c)	recognizing that the modification is predictable or anticipated because the modification is used “to provide optimal detection results” via Pathangi, cited above.
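The threshold tuning relied on from Pathangi ([0050]: “the threshold can be tuned depending on the application or desired sensitivity”) can be illustrated, under stated assumptions, as a search over candidate thresholds on labeled training data. In this hypothetical sketch, the lowest candidate threshold (i.e., the most sensitive one) whose nuisance rate does not exceed a target is selected. The function names, scores, labels, candidate thresholds, target nuisance rate, and the definition of nuisance rate as the fraction of flagged candidates that are not true defects are all assumptions and are not taken from Pathangi.

```python
from typing import Optional, Sequence


def nuisance_rate(scores: Sequence[float], is_defect: Sequence[bool], threshold: float) -> float:
    """Fraction of flagged candidates (score >= threshold) that are not true defects."""
    flagged = [label for score, label in zip(scores, is_defect) if score >= threshold]
    if not flagged:
        return 0.0  # nothing flagged, so no nuisance detections
    return sum(1 for label in flagged if not label) / len(flagged)


def tune_threshold(scores: Sequence[float], is_defect: Sequence[bool],
                   candidates: Sequence[float], max_nuisance: float) -> Optional[float]:
    """Return the lowest (most sensitive) candidate whose nuisance rate meets the target."""
    for threshold in sorted(candidates):
        if nuisance_rate(scores, is_defect, threshold) <= max_nuisance:
            return threshold
    return None  # no candidate satisfies the target


# Hypothetical labeled training data: detection scores and ground-truth defect labels
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
is_defect = [True, True, False, True, False, False]
print(tune_threshold(scores, is_defect, [0.5, 0.65, 0.75, 0.85], max_nuisance=0.25))  # prints 0.65
```

Run separately on the scores of each model component, the same search would produce a first and a second threshold.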

Regarding claim 19, claim 19 is rejected on the same grounds as claim 5; the reasoning presented for claim 5 applies equally to claim 19. Accordingly, Zhang as combined with Tuohy and Pathangi teaches, as recited in claim 19, the computerized system according to claim 16, wherein the PMC is configured to process the runtime image using a supervised model component by generating a first grade map representative of estimated probabilities of the first defects on the runtime image and applying a first threshold to the first grade map to obtain a first defect map;
wherein the PMC is configured to process the runtime image using a unsupervised model component by generating a second grade map representative of estimated probabilities of the second defects on the runtime image and applying a second threshold to the second grade map to obtain a second defect map, the first threshold and the second threshold being optimized during training using the third training set, and   
wherein the PMC is configured to combine the first output and the second output by combining the first defect map and the second defect map to generate a composite defect map.  
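For illustration only, the steps recited in claims 5 and 19 (applying a first threshold to a first grade map, applying a second threshold to a second grade map, and combining the resulting defect maps into a composite defect map) might be sketched as follows. The pixel-wise logical OR used to combine the defect maps, the grade-map values, and the thresholds 0.5 and 0.6 are assumptions; the claims do not specify how the two defect maps are combined.

```python
from typing import List

GradeMap = List[List[float]]   # per-pixel estimated defect probabilities
DefectMap = List[List[bool]]   # per-pixel defect / no-defect decisions


def to_defect_map(grade_map: GradeMap, threshold: float) -> DefectMap:
    """Mark a pixel as defective when its grade is at or above the threshold."""
    return [[value >= threshold for value in row] for row in grade_map]


def combine_defect_maps(d1: DefectMap, d2: DefectMap) -> DefectMap:
    """Composite defect map: a pixel is defective if either input flags it (assumed OR rule)."""
    return [[a or b for a, b in zip(r1, r2)] for r1, r2 in zip(d1, d2)]


first_grade_map = [[0.9, 0.2], [0.4, 0.1]]   # hypothetical supervised output
second_grade_map = [[0.3, 0.8], [0.2, 0.1]]  # hypothetical unsupervised output
composite_defect_map = combine_defect_maps(
    to_defect_map(first_grade_map, threshold=0.5),
    to_defect_map(second_grade_map, threshold=0.6),
)
print(composite_defect_map)  # prints [[True, True], [False, False]]
```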

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928 A1) in view of Zhou et al. (US Patent App. Pub. No.: US 2019/0104940 A1).
Regarding claim 6, Zhang teaches the computerized method according to claim 4, wherein the global weights are obtained using a non-gradient optimization function during training using the third training set.
Zhang does not teach claim 6 as a whole.
Accordingly, Zhou teaches the limitations of claim 6:
the global weights (i.e., weighting coefficients, via fig. 3:230: “Calculate change in error as a function of change in the network coefficients”) are obtained using a non-gradient optimization function (i.e., “a non-gradient descent optimization algorithm like simulated annealing or a genetic algorithm”) during training (via fig. 1A:130: “Network Training”) using the third training set (i.e., fig. 1A:115: “Noisy/Artifact Data” and fig. 1A:120: “Optimized Data”, via:
“[0067] In step 230 of step 130, a change in the error as a function of the change in the network can be calculated (e.g., an error gradient), and this change in the error can be used to select a direction and step size for a subsequent change to the weights/coefficients of the DL network 135.  Calculating the gradient of the error in this manner is consistent with certain implementations of a gradient descent optimization method.  In certain other implementations, as would be understood by one of ordinary skill in the art, this step can be omitted and/or substituted with another step in accordance with another optimization algorithm (e.g., a non-gradient descent optimization algorithm like simulated annealing or a genetic algorithm).”).





Thus, one of ordinary skill in the art of noise, as indicated by Zhang’s teaching of “noise” and “noisy regions” via Zhang:
“[0072] In another embodiment, the information determined by the deep learning model includes one or more segmentation regions generated from the image.  In one such embodiment, the deep learning model includes a proposal network configured for identifying the segmentation region(s) (based on features determined for the image) and generating bounding boxes for each of the segmentation regions.  The segmentation regions may be detected based on the features (determined for the images by the deep learning model or another method or system) to thereby separate regions in the images based on noise (e.g., to separate noisy regions from quiet regions), to separate regions in the images based on specimen features located therein, to separate regions based on geometric characteristics of the output, etc. The proposal network may use features from a feature map, which may be generated or determined as described further herein, to detect the segmentation region(s) in the image based on the determined features.  The proposal network may be configured to generate bounding box detection results.  In this manner, the deep learning model may output bounding boxes, which may include a bounding box associated with each segmentation region or more than one segmentation region.  The deep learning model may output bounding box locations with each bounding box.  The results of the segmentation region generation can also be stored and used as described further herein.”

can modify Zhang’s teaching of “a set of weights that model the world” with Zhou’s teaching of said fig. 3:230: “Calculate change in error as a function of change in the network coefficients” by:
a)	inserting Zhou’s fig. 1A:110 before Zhang’s fig. 4:428: “Best Model 1” and fig. 4:442: “Best Model 2”; and
b)	recognizing that the modification is predictable or anticipated because the modification enables one “to produce images resembling the high-image-quality images from…noisy…images” via Zhou:




“[0031] The process 110 of method 100 performs offline training of the DL network 135.  In step 130 of process 110, noisy data 115 and optimized data 120 are used as training data to train a DL network, resulting in the DL network being output from step 130.  More generally, data 115 can be referred to as defect-exhibiting data, for which the "defect" can be any undesirable characteristic that can be affected trough image processing (e.g., noise or an artifact).  Similarly, data 120 can be referred to as defect-reduced data, defect-minimized data, or optimize data, for which the "defect" is less than in the data 115.  In an example using reconstructed images for data 115 and 120, the offline DL training process 110 trains the DL network 135 using a large number of noisy reconstructed images 115 that are paired with corresponding high-image-quality images 120 to train the DL network 135 to produce images resembling the high-image-quality images from the noisy reconstructed images.”
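Zhou [0067] names simulated annealing as one non-gradient alternative to gradient descent. As a hypothetical illustration of how a global weight of the kind recited in claim 6 could be obtained with such a non-gradient optimization on a training set, the sketch below anneals a single weight w (applied as w to the first grade and 1 - w to the second grade) to minimize mean squared error against ground-truth labels. The parameterization, loss function, step size, cooling schedule, function names, and sample data are all assumptions and are not taken from Zhou, Zhang, or applicant’s disclosure.

```python
import math
import random
from typing import List, Tuple

# (first grade, second grade, ground-truth label) for one pixel of a hypothetical training set
Sample = Tuple[float, float, float]


def loss(w: float, samples: List[Sample]) -> float:
    """Mean squared error of the composite grade w*g1 + (1 - w)*g2 against the labels."""
    return sum((w * g1 + (1 - w) * g2 - y) ** 2 for g1, g2, y in samples) / len(samples)


def anneal_weight(samples: List[Sample], steps: int = 2000, start_temp: float = 1.0,
                  step_size: float = 0.1, seed: int = 0) -> Tuple[float, float]:
    """Simulated annealing over w in [0, 1]; no gradient of the loss is computed."""
    rng = random.Random(seed)
    w = 0.5
    current = loss(w, samples)
    best_w, best = w, current
    for k in range(steps):
        temp = start_temp * (1 - k / steps) + 1e-9  # linear cooling schedule
        candidate = min(1.0, max(0.0, w + rng.gauss(0.0, step_size)))
        cand_loss = loss(candidate, samples)
        # Always accept improvements; accept worse moves with probability exp(-delta / temp)
        if cand_loss <= current or rng.random() < math.exp((current - cand_loss) / temp):
            w, current = candidate, cand_loss
            if current < best:
                best_w, best = w, current
    return best_w, best


training_samples = [(0.9, 0.2, 1.0), (0.8, 0.6, 1.0), (0.1, 0.7, 0.0), (0.2, 0.3, 0.0)]
best_w, best_loss = anneal_weight(training_samples)
print(best_w, best_loss)
```

Because the loop only evaluates the loss and never differentiates it, the same search would also work with a non-differentiable objective, such as a count of misdetected pixels.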

Suggestions

Obvious difference: applicant’s disclosure of fig. 2:208: “Combining the first output and the second output…”. Thus, applicant’s disclosure thereof, such as [0070]-[0072], is an indication of non-obviousness in view of the cited art.
Applicant’s disclosure states in [006]: “the goal …is…high sensitivity”, i.e., high detection, perception, recognition, rate of identification, or electromagnetic extraction, directed to the last limitation of claim 1’s “optimized…defect detection”. Thus, the corresponding disclosure, such as said [0070]-[0072], is an indication of non-obviousness in view of the cited art.
Note that these suggestions are not provided with respect to overcoming rejections under 35 U.S.C. 101, 112, 102, and/or 103. These suggestions are mainly provided to seek out advantages in the disclosure regardless of 35 U.S.C. 101, 112, 102, and/or 103.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397.  The examiner can normally be reached on Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571)272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/DENNIS ROSARIO/Examiner, Art Unit 2667 

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667