DETAILED ACTION
Response to Amendment
The amendment was received 11/19/21. Claims 1-20 are pending.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
35 USC 112(f) is not invoked in claims 1-20.

The following definitions are “taken” via MPEP 2111.01 III (“‘PLAIN MEANING’ REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART”), 3rd paragraph (emphasis added):
“It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).”

The claimed “reference” (as in “each second image of the plurality of second images being a defect-free reference image of a first image” in claim 1) is interpreted in light of applicant’s disclosure ([0031]: “reference images corresponding to”) and the definition thereof from Dictionary.com, wherein “relation” is “taken” as the meaning of the claimed “reference” via MPEP 2111.01 III:
reference
noun
12	relation, regard, or respect:
all persons, without reference to age.

The claimed “-free” (as in “each second image of the plurality of second images being a defect-free reference image” in claim 1) is interpreted in light of applicant’s disclosure and the definition thereof from Dictionary.com, wherein “not having, containing, subject to, or affected by something unwanted” is “taken” as the meaning of the claimed “-free” via MPEP 2111.01 III:
-free
1	a combining form with the general sense “free of or from something specified,” and typically meaning “not having, containing, subject to, or affected by something unwanted, burdensome, etc.”:
error-free;
gluten-free;
tax-free;
germfree.

Response to Arguments
Applicant's arguments filed 11/19/21 have been fully considered but they are not persuasive:
Response to Rejections under 35 USC 102(a)(1)
Applicant states on pages 10-11:
“In rejecting originally filed claim 1, the Office Action cites Fig. 4 of Zhang and its corresponding description. Zhang further discloses "FIG. 4 illustrates... steps that may be performed to develop or create a deep learning model" via "data collection 402," "data labeling 406," "data partition 420," "model training 1 424," "model selection 426," "model evaluation 430," etc. (Zhang, [0110]-[0113].) As shown in FIG. 4, data labeling 406, data partition 420, model training 1 424, model selection 426, and model evaluation 430 are all processing stages prior to model deployment 444. (Zhang, FIG. 4.) Zhang discloses model deployment as "[b]est model 1 428 may also be sent to model deployment 444 in which best model 1 may be sent to imaging tool 400 for use in a production or runtime mode (post-training mode)." (Zhang, [0113].) Therefore, the steps for developing the deep learning model in FIG. 4 of Zhang correspond to a training phase of the deep learning model prior to its runtime deployment.”
	
	In response, this remark is directed to the claimed “the supervised model component is previously trained”. 
The examiner also notes that “runtime” appears to be a term of art: Zhang et al. (US 2018/0107928 A1) discloses “normal runtime mode” in [0094], 5th S, and there is no dictionary definition of “runtime” as one word. The claimed “runtime” is in adjective form; there is no noun form of “runtime” in claim 1. If claim 1 did claim the noun form of “runtime” (such as “during runtime”: applicant’s disclosure, page 5, bullet “(xii)”), then claim 1 would have to be reviewed again with “runtime” in noun form (such as “during runtime”) in relation to Zhang’s disclosure of “runtime”.
	
Applicants state on page 11 (emphasis added; issue identified):
“Nevertheless, Zhang does not disclose at least the above-emphasized features with respect to runtime defect detection of amended claim 1. Zhang teaches a training phase of the deep learning model prior to its runtime deployment instead of a "computerized method for runtime defect detection on a specimen...comprising: obtaining a runtime image," and "processing the runtime image," "to obtain a runtime defect detection result of the specimen," as recited in amended claim 1. Zhang's training phase does not disclose "processing the runtime image using a supervised model component...wherein the supervised model component is previously trained" and "separately processing the runtime image using an unsupervised model component...wherein the unsupervised model component is previously trained," which take place in a runtime/production phase after the relevant models are trained and deployed, as recited in amended claim 1. Zhang's steps of FIG. 4 disclose training a model instead of ""processing the runtime image using a... model component... [that] is previously trained," as recited in amended claim 1.” 

	The examiner respectfully disagrees because Zhang (US 2018/0107928 A1) discloses, as set forth in the Office action of 8/19/2021 (starting on page 3), receiving images via a tool (fig. 1:10: light-optics), wherein the images are processed via a computer (fig. 1:36: “Computer system”) to obtain a detector light result (fig. 1: dashed lines to fig. 1:100, 102: “Computer subsystem(s)”: “Component(s) executed by computer subsystem(s)”).
	Thus, Zhang discloses:	
computerized method for runtime defect detection on a specimen...comprising: obtaining a runtime image (via fig. 4:402: “Data collection” based on said fig. 1:10: light-optics), and
processing (via said computer system, represented as the input/output arrows in fig. 4) the runtime image to obtain (via said arrows) a runtime defect detection result (or a result, such as the output of fig. 4:424: “Model Training 1” for defect detection, based on said detector light result represented as the output arrow of fig. 4:400: “Imaging tool”) of the specimen.

Applicants further argue, in the same passage on page 11 quoted above, that “Zhang teaches a training phase of the deep learning model prior to its runtime deployment.”

In response, the corresponding portion of the Office action of 8/19/21 (page 4, lines 1-12) does not mention a “training phase”; instead, it states “fig. 4:422:bottom-left: ‘Training Data’.” Zhang is silent regarding a “phase.” Applicants imply that all of fig. 4 except fig. 4:444: “Model Deployment” constitutes the training phase. Nevertheless, “trained” or “previously trained” is implicit in fig. 4:424: “Model Training 1,” wherein the “-ing” of “Training” expresses the action or the result of the verb “train”; the result of training is a trained model. Indeed, Zhang teaches “trained” in “trained model …sent… to model selection 426” ([0113], 3rd S).
Thus, Zhang teaches:
processing (via said computer and input and output arrows) the runtime image using a supervised model component (said fig. 4:424: “Model Training 1,” which is made of components, namely a supervised part and an unsupervised part used in combination)...wherein the supervised model component is previously trained (via said fig. 4:424: “Model Training 1” expressing the result of the verb “train,” i.e., “trained,” represented as the output of fig. 4:424: “Model Training 1”).
In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., that the claimed processing steps “take place in a runtime/production phase after the relevant models are trained and deployed,” as argued on page 11 of the remarks quoted above) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Applicants further argue, in the same passage on page 11 quoted above, that “Zhang's steps of FIG. 4 disclose training a model instead of ‘processing the runtime image using a... model component... [that] is previously trained,’ as recited in amended claim 1.”

	The examiner respectfully disagrees since Zhang teaches:
processing the runtime image (represented as any arrow in fig. 4) using a... model component (said fig. 4:424: “Model Training 1,” made of said combined supervised and unsupervised parts, as discussed in detail in the next response to applicant’s remarks)... [that] is previously trained (via said “Training,” which expresses “trained,” represented as said output arrow of fig. 4:424: “Model Training 1”).

Applicants state on page 11 (emphasis added):
“In addition, Zhang is silent as to ‘processing the runtime image using a supervised model component’ and ‘separately processing the runtime image using an unsupervised model component,’ as recited in amended claim 1.
The Office action alleges that ‘Best Model 1 428’ of FIG. 4 discloses ‘supervised model component’ of originally filed claim 1 and that ‘Best Model 2 442’ of FIG. 4 discloses ‘unsupervised model component’ of originally filed claim 1. (Office action, page 4.) Applicant respectfully disagrees. Zhang discloses that best model 1 428 and best model 442 are ‘semi-supervised’ since ‘the labeling process for best model 1 does not require labeling exactly the bounding box for each object.’ (Zhang, [0114] and [0116].) Therefore, Zhang's best model 1 428 and best model 2 442 do not teach or suggest ‘supervised model component’ and ‘unsupervised model component,’ as recited in amended claim 1. Zhang further only discloses training best model 1 428 and best model 442 and does not disclose ‘processing the runtime image using’ best model 1 428 and ‘separately processing the runtime image using’ best model 2 442 ‘to obtain a runtime defect detection result.’ ”

The examiner respectfully disagrees because, to one of ordinary skill in the art, Zhang’s “semi-supervised” encompasses both supervised and unsupervised learning, as explained by extrinsic evidence relied upon under MPEP 2131.01:

2131.01 Multiple References 35 USC 102 Rejections
Normally, only one reference should be used in making a rejection under 35 U.S.C. 102. However, a 35 U.S.C. 102  rejection over multiple references has been held to be proper when the extra references are cited to:

(A) Prove the primary reference contains an "enabled disclosure;"
(B) Explain the meaning of a term used in the primary reference; or
(C) Show that a characteristic not disclosed in the reference is inherent.

See subsections I-III below for more explanation of each circumstance.

II.    TO EXPLAIN THE MEANING OF A TERM USED IN THE PRIMARY REFERENCE
   Extra References or Other Evidence Can Be Used to Show Meaning of a Term Used in the Primary Reference

Extrinsic evidence may be used to explain but not expand the meaning of terms and phrases used in the reference relied upon as anticipatory of the claimed subject matter. In re Baxter Travenol Labs., 952 F.2d 388, 21 USPQ2d 1281 (Fed. Cir. 1991) (Baxter Travenol Labs. invention was directed to a blood bag system incorporating a bag containing DEHP, an additive to the plastic which improved the bag’s red blood cell storage capability. The examiner rejected the claims over a technical progress report by Becker which taught the same blood bag system but did not expressly disclose the presence of DEHP. The report, however, did disclose using commercial blood bags. It also disclosed the blood bag system as "very similar to [Baxter] Travenol’s commercial two bag blood container." Extrinsic evidence (depositions, declarations and Baxter Travenol’s own admissions) showed that commercial blood bags, at the time Becker’s report was written, contained DEHP. Therefore, one of ordinary skill in the art would have known that "commercial blood bags" meant bags containing DEHP. The claims were thus held to be anticipated.).

	Thus, the following references, dated at the time Zhang (US 2018/0107928 A1) was written or published (April 19, 2018) or before the filing of applicant’s invention, are cited to explain the meaning of “semi-supervised”:
	
	NOONE et al. (US 2020/0166909 A1), filed Nov. 19, 2019, explains “semi-supervised”:
“[0242] Semi-supervised learning algorithms: In the context of the present disclosure, semi-supervised learning algorithms are algorithms that make use of both labeled and unlabeled object classification or manufacturing process data for training (typically using a relatively small amount of labeled data with a large amount of unlabeled data).”

Zejda et al. (US 2020/0410354 A1), filed Jun 27, 2019, explains “semi-supervised”:
“[0033] There are multiple ways in which weights can be trained. One method is called supervised learning. In supervised learning, all training samples are labeled, so that inputting each training sample into a neural network produces a known result. Another method is called unsupervised learning, where the training samples are not labeled and training aims to find a structure in the data or clusters in the data. Semi-supervised learning falls between supervised and unsupervised learning. In semi-supervised learning, a subset of training data is labeled. The unlabeled data can be used to define cluster boundaries and the labeled data can be used to label the clusters.”
	
Bhaskar et al. (US 2019/0303717 A1), filed Mar. 25, 2019, explains “semi-supervised”:
“[0071] In one embodiment, the high resolution neural network is configured as a semi-supervised DL framework. In another embodiment, the low resolution neural network is configured as a semi-supervised DL framework. For example, a semi-supervised state of the networks can be used in the DL networks described herein. Such a DL framework may be configured for a two-level process using both supervised label information and unsupervised structure information to jointly make decisions on channel selection. For example, label information may be used in feature extraction and unlabeled information may be integrated to regularize the supervised training. In this way, both supervised and unsupervised information may be used during the training process to reduce model variance. A generative model such as a Restricted Boltzmann Machine (RBM) may be used to extract representative features and reduce the data dimensionality, which can greatly diminish the impact of scarcity of labeled information. An initial channel selection procedure utilizing only unsupervised information may remove irrelevant channels with little structure information and reduce data dimensionality. Based on the results from the initial channel selection, a fine channel selection procedure can be used to handle noisy channel problems. Therefore, such a DL framework may be particularly useful for handling information that is very noisy, which may be the case for some of the specimens described further herein. The DL frameworks may be further configured as described in “A Novel Semi-supervised Deep Learning Framework for Affective State Recognition on EEG Signals,” by Jia et al., BIBE '14 Proceedings of the 2014 IEEE International Conference on Bioinformatics and Bioengineering, pp. 30-37, Nov. 10-12, 2014, IEEE Computer Society, Washington, D.C., which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference.”

“[0091] The computer subsystem(s) can also use methods such as semi-supervised methods that combine Bayesian generative modeling to achieve their results in a minimum number of samples. Examples of such methods are described in U.S. Patent Application Publication No. 2017/0148226 published May 25, 2017 by Zhang et al, and “Semi-supervised Learning with Deep Generative Models,” Kingma et al, NIPS 2014, Oct. 31, 2014, pp. 1-9, which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these references. In addition, the computer subsystem(s) may leverage ladder networks where supervised and unsupervised learning are combined in deep neural networks such as the ones proposed in “Semi-Supervised Learning with Ladder Networks,” Rasmus et al., NIPS 2015, Nov. 24, 2015, pp. 1-19, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference. The computer subsystem(s) described herein may be further configured to train the low resolution neural network using a deep adversarial generative network of the type described in “Generative Adversarial Nets” Goodfellow et al., Jun. 10, 2014, pp. 1-9, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference. In addition or alternatively, the computer subsystem(s) described herein may be configured to train the low resolution neural network using an adversarial autoencoder (a method that combines a variational autoencoder (VAE) and a deep generative adversarial network (DGAN)) such as that described in “Adversarial Autoencoders,” Makhzani et al., arXiv:1511.05644v2, May 25, 2016, 16 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference.
“In some instances, the computer subsystem(s) may be configured to perform Bayesian Learning as described in “Bayesian Learning for Neural Networks,” Neal, Springer-Verlag New York, 1996, 204 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference. The computer subsystem(s) may also be configured to perform the variational Bayes method as described in “The Variational Bayes Method in Signal Processing,” Šmídl, Springer-Verlag Berlin Heidelberg, 2006, 228 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference.”

	Sugaya (WO 2020/217957 A1), with priority document 2019-085763 (26.04.2019) (the priority document itself is not translated), explains “semi-supervised” via the English translation of WO 2020/217957 A1 obtained via SEARCH, section “[1-4. Data judgement processing]”, 9th paragraph:
“In addition, semi-supervised learning is a mixture of supervised learning and unsupervised learning. After learning features in supervised learning, a huge amount of training data is given in unsupervised learning, and features are automatically created. This is a method of repeatedly learning while calculating the amount.”

	YAMAGUCHI (US 2018/0101924 A1), filed Oct. 5, 2017, explains “semi-supervised”:
“[0030] In unsupervised learning, only input data are fed into the machine learning device 2 in large amounts to learn the distribution of the input data to, e.g., compress, classify, and shape the input data without corresponding teacher output data, unlike “supervised learning.” This allows, e.g., clustering of features seen in sets of input data into similar features, and on the basis of the obtained result, any norm may be defined and outputs are allocated to optimize it, thus predicting output. “Unsupervised learning” in this specification refers to broadly-defined “unsupervised learning” including, e.g., an intermediate between “supervised learning” and “unsupervised learning,” called “semi-supervised learning.”

	Fujii et al. (US Patent App. Pub. No.: US 2017/0308049 A1), filed Apr. 21, 2017, explains “semi-supervised”:
“[0047] Note that, as problem setting intermediate between unsupervised learning and supervised learning, there is one referred to as semi-supervised learning. This corresponds to a case, for example, in which there is a set of data on input and output only in some, and there is data on input alone in the remainder. In the present embodiment, by using data (simulation data and the like) that can be obtained without actually operating the cell control apparatus in unsupervised learning, it is possible to perform learning efficiently.”

	Thus, based on the above extrinsic evidence, one of ordinary skill in the art would understand “semi-supervised,” as used in Zhang, to mean both supervised and unsupervised learning, i.e., an intermediate between, a linking of, or a combination or mixture of supervised and unsupervised learning.
Applicants further argue, in the same passage on page 11 quoted above, that “Zhang further only discloses training best model 1 428 and best model 442 and does not disclose ‘processing the runtime image using’ best model 1 428 and ‘separately processing the runtime image using’ best model 2 442 ‘to obtain a runtime defect detection result.’”

	The examiner respectfully disagrees because Zhang teaches that “data and labels 418 is separated” ([0112], 1st S). Thus, fig. 4:424: “Model Training 1” is developed using the separation of data and labels 418. Applicants suggest that the processing of fig. 4:428: “Best Model 1” is separate from the processing of fig. 4:442: “Best Model 2.” However, the claimed “separately processing” is broad enough to encompass separating the data from labels 418 via Zhang’s fig. 4:420: “Data Partition.” Thus, the claimed “separately processing the runtime image” is mapped to the input/output arrows in fig. 4 (representing the claimed “processing”) and to the data (fig. 4:404: “Raw data”) that is separated from the labels of fig. 4:418: “Data and Labels.”
	Applicants imply that supervised is separate from unsupervised (in contrast to the above extra references cited via said MPEP 2131.01 II); however, claim 1 does not make explicit that the claimed “supervised” is separate from the claimed “unsupervised.” Thus, the claimed “separately processing” in claim 1 has broad scope.

Applicants state on page 12:
“Zhang's training of best model 1 428 using training data 422 and training of best model 2 442 using cropped version of images used to train best model 1 428 are not analogous to the above cited features of amended claim 1.”

	In response, based on the examiner’s responses above, Zhang corresponds to claim 1. Zhang does appear to teach that fig. 4’s best model 2 is dependent on, or processed (represented as arrows in fig. 4) based on, the processing of fig. 4’s best model 1; however, the claimed “separately processing” has already been addressed above in the discussion of processing data that is separated or partitioned from labels.
In addition, Zhang discloses that fig. 4:442: “Best Model 2” is “a different deep learning model than that trained in model training 1 424” ([0016], penultimate S). Thus, the processing of fig. 4:428: “Best Model 1” is separate from that of fig. 4:442: “Best Model 2,” because the associated processing or training (fig. 4:424: “Model Training 1” and fig. 4:440: “Model Training 2”) is based on different, separate models.
	Further, Zhang discloses that “Model training 2 may be…different…than model training 1 424” ([0016], penultimate S). Thus, the processing of fig. 4:424: “Model Training 1” is different from, or separate from, that of fig. 4:440: “Model Training 2,” corresponding to claim 1’s “separately processing.”

Applicant argues on page 12 of the remarks:
“Moreover, Zhang does not disclose "the supervised model component is previously trained using a first training set comprising a plurality of first images each representative of the specimen and corresponding label data indicative of first defect distribution on the plurality of first images" and "the unsupervised model component is previously trained using a second training set including a plurality of second images each representative of the specimen, each second image of the plurality of second images being a defect-free reference image of a first image of the plurality of first images," as recited in amended claim 1…
The Office action alleges that cropped image 438 is "a plurality of second images each representative of the specimen, each second image of the plurality of second images being a defect-free reference image of a first image of the plurality of first images," as recited in amended claim 1. Applicant respectfully disagrees. Zhang discloses that ROI refer to "causal portion(s) [that] specify the important pixels or higher order features responsible for the output generation (e.g., classification or prediction)." (Zhang, [0114], [0115].) Therefore, Zhang's cropped image 438 includes portions of defective pixels or features of the original image that lead to defect classification/prediction instead of "being a defect-free reference image," as recited in amended claim 1.”

Applicant’s arguments, filed 11/19/21, with respect to the rejection(s) of claim(s) 1-4, 7-14, 15-18, and 20 under 35 USC 102(a)(1) have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made under 35 USC 103 in view of Gupta et al. (US Patent App. Pub. No.: US 2018/0293721 A1), which teaches simulating “a defect free version of the patterns in the images” ([0061], 3rd S) and fig. 7:700: a simulation “design…used as the reference” (Gupta: [0095], 2nd S), corresponding to the claimed “defect-free reference image.”

Applicant's arguments filed 11/19/21 have been fully considered but they are not persuasive. Applicants state on page 13 (emphasis added):
“Furthermore, Zhang does not disclose "combining the first output and the second output using one or more optimized parameters to obtain a runtime defect detection result of the specimen," where the first output is obtained by "processing the runtime image using a supervised model component" and the second output is obtained by "separately processing the runtime image using an unsupervised model component," as recited in amended claim 1.”
“The Office action alleges that the merging of arrows at data partition 420 of FIG. 4 discloses "combining the first output and the second output," as recited in amended claim 1. (Office action, page 5.) Applicant respectfully disagrees. Zhang discloses that "cropped image(s) 438 generated by cropping 436 may be output to data partition 420, which may then use the cropped images to generate additional training data 422, which may replace the original training data," where the "new training data may then be used to tune best model 1." (Zhang, [0115].) Therefore, Zhang's merging arrows refer to an interactive tuning process of the learning model using new training data in a training phase, which does not disclose "combining the first output and the second output using one or more optimized parameters to obtain a runtime defect detection result of the specimen," (where the first output and the second output are two runtime processing outputs other than two different training data) as recited in amended claim 1.”

The examiner respectfully disagrees. Under the broadest reasonable interpretation of “combining the first output and the second output” in light of applicant’s disclosure (e.g., page 19, [0063], 2nd S: “The combination can be performed in various ways in order to optimize the detection results”), Zhang discloses:
combining (via “a combination of”, [0077]: 1st S, an image “x” with vector “v” as the input of fig. 4:424: “Model Training 1” or as for example “(x, v)” in the disclosure: [0078], last S) the first output and the second output (or cropped images via fig. 4:438: “Cropped Image” that outputs cropped region of interest, ROI, images “x” to fig. 4:420: “Data Partition” to be combined again with vector “v”, wherein the combining is the input of fig. 4:424: “Model Training 1”) using one or more optimized parameters (via “tune…parameters”: [0092]: 3rd S) to obtain a runtime defect detection result (via output arrows of fig. 4 based on said fig. 4:424: “Model Training 1”) of the specimen.
In addition, the examiner anticipates that applicant may amend the claims to recite “combining with each other (“operatively connected to each other”: 35 USC 112(a) support: applicant’s disclosure [0027]: last S) the first output and the second output”. Zhang is silent regarding “combining with each other the first output and the second output”, i.e., combining a cropped ROI image “x” with another cropped ROI image “x” such that a combination (x, x) is input to said fig. 4:424: “Model Training 1”. Thus, a new reference, such as HUBAUX et al. (WO 2020/156769 A1), claiming priority to EP 19209695.8 filed 18 November 2019 (18.11.2019), would be required to teach “combining with each other the first output and the second output”, as shown in HUBAUX, fig. 8: “SL” (Supervised Learning) combined with “UL” (Unsupervised Learning) and “P” (Prediction).
This remark relates to the Suggestions in the Office action of 8/19/21, page 34, which point out a clear difference under 35 USC 103 in view of applicant’s disclosure. The clear difference is what exactly is being combined with each other, much as clear water and orange powder are mixed together to create water that looks orange. Claim 1 does not recite this combination with each other as disclosed in applicant’s disclosure.






In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “uses a supervised model and an unsupervised model in combination” in applicant’s remarks, page 13) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Instead, the last limitation of claim 1 recites:
“combining the first output and the second output using one or more optimized parameters”.
Response to Rejections under 35 USC 103
In view of applicant’s remarks on page 14 regarding the 35 USC 103 rejection of claims 5, 6, and 19 and the 35 USC 102 rejection of claims 1 and 15 (including claim 20), the examiner relies upon Zhang to teach the limitations of claims 1 and 15 (including claim 20), as discussed above, except for the claimed “defect-free”, which requires a new reference, said Gupta et al. (US Patent App. Pub. No.: US 2018/0293721 A1), to teach “defect-free”.






Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness. Regarding inquiry 4, see Suggestions.
Claims 1-4, 7-14, 15-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928 A1) in view of Gupta et al. (US Patent App. Pub. No.: US 2018/0293721 A1), with the following extra evidence to explain “semi-supervised”, as detailed above, via MPEP:
2131.01 Multiple References 35 USC 102 Rejections
 II.    TO EXPLAIN THE MEANING OF A TERM USED IN THE PRIMARY REFERENCE 
Extra References or Other Evidence Can Be Used to Show Meaning of a Term Used in the Primary Reference

NOONE et al. (US 2020/0166909 A1), filed Nov. 19, 2019,  
Zejda et al. (US 2020/0410354 A1), filed Jun 27, 2019,  
Bhaskar et al. (US 2019/0303717 A1), filed Mar. 25, 2019, 
Sugaya (WO 2020/217957 A1), with priority document of 2019-085763 (26.04.2019): the priority document itself is not translated, 
YAMAGUCHI (US 2018/0101924 A1), filed Oct. 5, 2017, and
Fujii et al. (US Patent App. Pub. No.: US 2017/0308049 A1), filed Apr. 21, 2017.
Regarding claim 1, Zhang teaches a computerized method for runtime defect detection on a specimen, the computerized method being performed by a processor (or “processor”, [0041], 2nd S) and memory (or “memory medium”, id.) circuitry (via said processor) (PMC), the computerized method comprising:
obtaining a runtime (via fig. 1: “Computer subsystem”, three times with “runtime mode”, [0094]:5th S) image (via fig. 4:400: “Imaging tool”) representative of at least a portion of the specimen (or “specimen” as shown in fig. 1:14);
processing (said via fig. 1: “Computer subsystem”, three times) the runtime image using a supervised (“semi-supervised detect or region detection”, cited below [0023], wherein “semi-supervised” is explained via said MPEP 2131.01 II) model component (or half component, comprised by fig. 4:428:bottom-right: “Best Model 1”, via Dictionary.com: “semi- A prefix that means…half” via said “semi-supervised”) to obtain a first output (or arrow between fig. 4:438:top-right: “Cropped Image” and fig. 4:420:bottom-left: “Data Partition”) indicative of estimated (via a “crude… approximate”, cited below: [0087]) presence (via fig. 4:432: bottom-right: “Detection” corresponding to “defect to be detected”, cited below: [0106]) of first defects on the runtime image (via “defects in the image”: [0067]), wherein the supervised model component is previously trained (via fig. 4:422:bottom-left: “Training Data” wherein “ing” of “Training Data” is expressing the result of the verb “train” as “trained”) using a first training set comprising a plurality of first images;
separately (via “data and labels 418 is separated into training data 422”, [0112], represented in fig. 4:422: “Training Data” and via a “different deep learning model”, [0116]: penultimate S) processing (represented as arrows in fig. 4 output from said “different deep learning model” such that fig. 4:424: “Model Training 1”:output arrow is separate from fig. 4:440: “Model Training 2”:output arrow and also inputs said separated data and labels 418) the runtime image using an unsupervised (comprised by “semi-supervised detect or region detection”) model component (via fig. 4:428: “Best Model 1”) to obtain a second (via “iteratively tuning a deep learning model”, [0115], last S) output (via said arrow between fig.4:438:top-right: “Cropped Image” and fig. 4:420:bottom-left: “Data Partition”) indicative of estimated (said via a “crude… approximate”) presence (said via fig. 4:432: “Detection” corresponding to “defect to be detected”) of second (via fig. 4:438: “Cropped Image”) defects on the runtime image (said via “defects in the image”), wherein the unsupervised model component is previously trained (via fig. 4:424: “Model Training 1” wherein “ing” of “Model Training 1” is expressing the result of the verb “train”, i.e., trained) using a second (via said “iteratively tuning a deep learning model”) training set (via said arrow between fig.4:438:top-right: “Cropped Image” and fig. 4:420:bottom-left: “Data Partition”) including a plurality of second (cropped) images each representative of each second (cropped) image of the plurality of second images being a defect-free reference (or information or data) image of (used to indicate association) a first (un-cropped) image of the plurality of images; and 


combining the first output and the second output (resulting in “a combination of…images” represented as said fig. 4:420: “Data Partition”, a merging of said arrows to be inputted into the deep learning model of fig. 4:424: “Model Training 1” given that “An input to a deep learning model can include a combination of: a) images defined by x…and b) feature vector v(m)”) using one or more optimized parameters (or “fine tune parameters”, [0115], 5th S) to obtain a runtime defect detection result (or “light result…at the detector” represented as the arrows in fig. 4, such as the output arrow of fig. 4:432: “Detection” based on the detector and is with said runtime mode) of the specimen (via:
“[0023] One embodiment relates to a system configured to perform diagnostic functions for a deep learning model.  Some embodiments described herein are configured as systems with optional visualization capability for causal understanding and guided training of a deep learning model for semiconductor applications such as inspection and metrology.  For example, the embodiments described herein provide a system configured to perform quality assurance and causal understanding for a deep learning model.  In particular, as described further herein, the embodiments are configured for generating causal information (e.g., causal image/vector) through several possible methods and/or algorithms.  In addition, by using the causal information, the embodiments can quantitatively determine the model performance.  Furthermore, the systems can use the information gained by quality assurance and/or causal understanding to perform one or more functions such as providing guidance on data augmentation and/or fine-tuning the process to further improve the accuracy of the deep learning model.  In other words, by using causal information (e.g., causal image/vector) in augmentation, the embodiments can improve the deep learning model further.  Moreover, the embodiments described herein provide semi-supervised detect or region detection, which can advantageously reduce manual labeling efforts.”













“[0027] In one embodiment, the imaging tool is configured as an optical based imaging tool.  In this manner, in some embodiments, the images are generated by an optical based imaging tool.  In one such example, in the embodiment of the system shown in FIG. 1, optical based imaging tool 10 includes an illumination subsystem configured to direct light to specimen 14.  The illumination subsystem includes at least one light source.  For example, as shown in FIG. 1, the illumination subsystem includes light source 16.  In one embodiment, the illumination subsystem is configured to direct the light to the specimen at one or more angles of incidence, which may include one or more oblique angles and/or one or more normal angles.  For example, as shown in FIG. 1, light from light source 16 is directed through optical element 18 and then lens 20 to specimen 14 at an oblique angle of incidence.  The oblique angle of incidence may include any suitable oblique angle of incidence, which may vary depending on, for instance, characteristics of the specimen.”

“[0029] In some instances, the imaging tool may be configured to direct light to the specimen at more than one angle of incidence at the same time.  For example, the illumination subsystem may include more than one illumination channel, one of the illumination channels may include light source 16, optical element 18, and lens 20 as shown in FIG. 1 and another of the illumination channels (not shown) may include similar elements, which may be configured differently or the same, or may include at least a light source and possibly one or more other components such as those described further herein.  If such light is directed to the specimen at the same time as the other light, one or more characteristics (e.g., wavelength, polarization, etc.) of the light directed to the specimen at different angles of incidence may be different such that light resulting from illumination of the specimen at the different angles of incidence can be discriminated from each other at the detector(s).”

“[0067] In a further such embodiment, the deep learning model includes one or more fully connected layers configured for classifying defects on the specimen.  A "fully connected layer" may be generally defined as a layer in which each of the nodes is connected to each of the nodes in the previous layer.  The fully connected layer(s) may perform classification based on the features extracted by convolutional layer(s), which may be configured as described further herein.  The fully connected layer(s) are configured for feature selection and classification.  In other words, the fully connected layer(s) select features from a feature map and then classify the defects in the image(s) based on the selected features.  The selected features may include all of the features in the feature map (if appropriate) or only some of the features in the feature map.”







“[0070] The features determined the deep learning model may include any suitable features described further herein or known in the art that can be inferred from the input described herein (and possibly used to generate the output described further herein).  For example, the features may include a vector of intensity values per pixel.  The features may also include any other types of features described herein, e.g., vectors of scalar values, vectors of independent distributions, joint distributions, or any other suitable feature types known in the art.”

“[0077] An input to a deep learning model can include a combination of: a) images defined by x(h, w, c, t, . . . ), which is an N-dimensional tensor of images with height=h and width=w across other dimensions, e.g., channel c, time t, etc. (In semiconductor applications, x can be an optical image, an electron beam image, a design data image (e.g., CAD image), etc. under different tool conditions.); and b) feature vector v(m), which is a 1-dimensional vector (The dimension can be generalized to be more than 1.).”

“[0087] In a further embodiment, the diagnostic component is configured for determining the one or more causal portions by global average pooling.  As described by Lin et al., in "Network In Network," arXiv:1312.4400, which is incorporated by reference as if fully set forth herein, the global average pooling (GAP) is introduced and defined.  GAP provides crude pixel-level causal region information, which can be approximately interpreted as causal image/vector.  The embodiments described herein may be further configured as described in the above reference.”

“[0106] In one such example, the causal information may be generated for an input image and if the relevant region in the causal information matches the defect to be detected (in the case of defect detection or classification), the diagnostic component may determine that no augmentation needs to be performed as the model predicted correctly.  However, if the relevant region only matches part of a defect or does not match a defect at all (in the case of defect detection or classification), the diagnostic component may determine that an augmentation method may be advantageous and may request input from a user for a possible augmentation method.  The user may then, for example, specify one or more attention portions and/or one or more ignore regions in the input image via bounding boxes, locations, etc. The information for these user-specified portions can be sent to the augmentation step to alter the input image, for example, by randomly perturbing the ignore portion(s) by zeroing or adding noise and/or randomly transforming the attention portion(s).”









“[0115] For example, as shown in FIG. 4, detection 432 may generate ROI 434, which may include information for any one or more causal portions identified as ROIs, which may be used for cropping 436 of the original image to the candidate patch along with the output (e.g., class prediction) from best model 1.  In particular, the original image may be cropped to eliminate portion(s) of the original image that do not correspond to the ROI(s).  In one such example, cropped image(s) 438 generated by cropping 436 may be output to data partition 420, which may then use the cropped images to generate additional training data 422, which may replace the original training data.  The new training data may then be used to tune best model 1.  For example, the new training data may be input to model training 1 424, which may be used to tune or fine tune parameters of best model 1, which may output results to model selection 426.  Model selection may produce best model 1 428, which would be a modified version of the best model 1 originally produced.  The new best model 1 may then be evaluated as described above and used for detection of ROI(s), which can be used to generate still further training data, which can be used to re-tune the best model 1 again.  In this manner, the embodiments described herein provide a system for iteratively tuning a deep learning model based on ROI(s) determined by previous versions of the deep learning model.
[0116] In some embodiments, the one or more functions include identifying the one or more causal portions as one or more ROIs in the image, which may be performed as described herein, and training an additional deep learning model based on the one or more ROIs.  For example, causal back propagation or another of the causal portion determination methods described herein may be used as semi-supervised ROI detection to train a second "more accurate" deep learning model based on cropped images.  The reason this is called "semi-supervised" is that the labeling process for best model 1 does not require labeling exactly the bounding box for each object.  As shown in FIG. 4, for example, cropped image 438 may be provided to model training 2 440.  Model training 2 may be performed as described herein, but using a different deep learning model than that trained in model training 1 424.  Results of model training 2 may produce best model 2 442, which may then be provided to model deployment 444, which may be performed as described further herein.”







Zhang does not teach:
“each second image of the plurality of second images being a defect-free reference image”.
Gupta teaches:
each second image (or “an image”, [0061], 3rd S) of the plurality of second images (via said “an image” being any image) being (“being”: used as a copula to connect the subject with its predicate adjective, or predicate nominative, in order to describe, identify, or amplify the subject, Dictionary.com) a defect-free reference image (via said “an image…can be…the design”, fig. 7:700: an image, “for generating” “a defect free version”, fig. 7:706: an image, “based on a design”, said [0061], 3rd S, where “the design is used as the reference”, [0095], 2nd S).
Thus, one of ordinary skill in the art of image simulations as indicated in Zhang:
“[0020] The terms “design,” “design data,” and “design information” as used interchangeably herein generally refer to the physical design (layout) of an IC and data derived from the physical design through complex simulation or simple geometric and Boolean operations.”

can modify Zhang’s teaching of the cropped region of interest, ROI, images (Zhang, fig. 4:438: “Cropped Image” based on fig. 4:434: “ROI”) with Gupta’s teaching of said “an image” by:
a)	making each of Zhang’s cropped ROI images be as Gupta’s “an image”;

b)	making each of said best models 1 and 2 be as Gupta’s fig. 7:704:712: “Trained First Learning Based Model”: “Trained Second Learning Based Model”; and

c)	recognizing that the modification is predictable or expected because each said “an image”, used for generating a respective image (Gupta’s fig. 7:706), corresponds to “simpler and/or more robust ways for generating simulated contours that can be used to generate much more precise estimations of the simulated contours.” (Gupta, [0063], last S).
Regarding claim 2, Zhang as combined teaches the computerized method according to claim 1, wherein the one or more optimized parameters are obtained during training using a third training set (via “additional training images” via:
“[0092] In some embodiments, the one or more functions include altering one or more parameters of the deep learning model based on the determined one or more causal portions.  For example, the diagnostic component may determine if the one or more causal portions are the correct causal portion(s) of the image, which may be performed as described further herein.  If the one or more causal portions are incorrect, the diagnostic component may be configured to fine-tune or re-train the deep learning model to thereby alter one or more parameters of the deep learning model, which may include any of the parameters described herein.  The fine-tuning or re-training of the deep learning model may include inputting additional training images to the deep learning model, comparing the output generated for the training images to known output for the training images (e.g., defect classification(s), segmentation region(s), etc.), and altering one or more parameters of the deep learning model until the output generated for the additional training images by the deep learning model substantially matches the known output for the additional training images.  In addition, the diagnostic component may be configured to perform any other method and/or algorithm to alter one or more parameters of the deep learning model based on the determined one or more causal portions.”).












Regarding claim 3, Zhang as combined teaches the computerized method according to claim 2, 
wherein the first output is a first grade map (via “the image (i.e., a feature map)”) representative of estimated (said via a “crude… approximate”) probabilities (via a “probabilistic” “model”) of the first defects on the runtime image, and the second output is a second grade map (said via “the image (i.e., a feature map)”) representative of estimated (said via a “crude… approximate”) probabilities (said via a “probabilistic” “model”) of the second defects on the runtime image; [[and]] 
the combining of the first output and the second output is performed using a segmentation model component (or a “segmentation…proposal network”) operatively connected to the supervised model component and the unsupervised model component[[s]] (via “the deep learning model includes one…segmentation…proposal network”), to obtain a composite grade map (via said resulting in “a combination of…images” represented as said fig. 4:420: “Data Partition”, a merging of said arrows) indicative of estimated (said via a “crude… approximate”) probabilities (said via a “probabilistic” “model”) of the first defects and the second defects on the specimen; [[,]] and 
component and the unsupervised model component (for said “re-training” via:





“[0060] In some embodiments, the deep learning model is a generative model.  A "generative" model can be generally defined as a model that is probabilistic in nature.  In other words, a "generative" model is not one that performs forward simulation or rule-based approaches.  Instead, as described further herein, the generative model can be learned (in that its parameters can be learned) based on a suitable training set of data.  In one embodiment, the deep learning model is configured as a deep generative model.  For example, the model may be configured to have a deep learning architecture in that the model may include multiple layers, which perform a number of algorithms or transformations.”
“[0069] In some embodiments, the information determined by the deep learning model includes features of the images extracted by the deep learning model.  In one such embodiment, the deep learning model includes one or more convolutional layers.  The convolutional layer(s) may have any suitable configuration known in the art and are generally configured to determine features for an image as a function of position across the image (i.e., a feature map) by applying a convolution function to the input image using one or more filters.  In this manner, the deep learning model (or at least a part of the deep learning model) may be configured as a convolution neural network (CNN).  For example, the deep learning model may be configured as a CNN, which is usually stacks of convolution and pooling layers, to extract local features.  The embodiments described herein can take advantage of deep learning concepts such as a CNN to solve the normally intractable representation inversion problem.  The deep learning model may have any CNN configuration or architecture known in the art.  The one or more pooling layers may also have any suitable configuration known in the art (e.g., max pooling layers) and are generally configured for reducing the dimensionality of the feature map generated by the one or more convolutional layers while retaining the most important features.”
“[0072] In another embodiment, the information determined by the deep learning model includes one or more segmentation regions generated from the image.  In one such embodiment, the deep learning model includes a proposal network configured for identifying the segmentation region(s) (based on features determined for the image) and generating bounding boxes for each of the segmentation regions.  The segmentation regions may be detected based on the features (determined for the images by the deep learning model or another method or system) to thereby separate regions in the images based on noise (e.g., to separate noisy regions from quiet regions), to separate regions in the images based on specimen features located therein, to separate regions based on geometric characteristics of the output, etc. The proposal network may use features from a feature map, which may be generated or determined as described further herein, to detect the segmentation region(s) in the image based on the determined features.  The proposal network may be configured to generate bounding box detection results.  In this manner, the deep learning model may output bounding boxes, which may include a bounding box associated with each segmentation region or more than one segmentation region.  The deep learning model may output bounding box locations with each bounding box.  The results of the segmentation region generation can also be stored and used as described further herein.”).  
Regarding claim 4, Zhang as combined teaches the computerized method according to claim 2, 
wherein the first output is a first grade map (said via “the image (i.e., a feature map)”) representative of estimated probabilities of the first defects on the runtime image, and the second output is a second grade map (said via “the image (i.e., a feature map)”) representative of estimated probabilities of the second defects on the runtime image; [[and]] 
the combining of the first output and the second output comprises combining the first grade map and the second grade map with respective global weights (“a set of weights that model the world”) to generate a composite grade map (via said resulting in “a combination of…images” represented as said fig. 4:420: “Data Partition”, a merging of said arrows) indicative of estimated (said via a “crude… approximate”) probabilities (said via a “probabilistic” “model”) of the first defects and the second defects on the specimen; [[,]] and
the respective global weights are optimized (said “fine tune parameters”) during the training using the third training set (via:
“[0061] In another embodiment, the deep learning model is configured as a neural network.  In a further embodiment, the deep learning model may be a deep neural network with a set of weights that model the world according to the data that it has been fed to train it.  Neural networks can be generally defined as a computational approach which is based on a relatively large collection of neural units loosely modeling the way a biological brain solves problems with relatively large clusters of biological neurons connected by axons.  Each neural unit is connected with many others, and links can be enforcing or inhibitory in their effect on the activation state of connected neural units.  These systems are self-learning and trained rather than explicitly programmed and excel in areas where the solution or feature detection is difficult to express in a traditional computer program.”).  
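For illustration only, the following minimal Python sketch shows one way the claim 4 limitation can be read: a first and a second per-pixel grade map are combined with respective global weights into a composite grade map, and the weights are selected against a labeled third training set. It is not code from Zhang, Gupta, or applicant’s disclosure. The function names combine_grade_maps and optimize_global_weights, the convex weighting (w2 = 1 - w1), the 0.5 decision threshold, the grid size, and the pixel-accuracy criterion are all hypothetical assumptions chosen for the example.

```python
import numpy as np


def combine_grade_maps(first_map, second_map, w1, w2):
    """Hypothetical composite grade map: w1 * first_map + w2 * second_map."""
    first = np.asarray(first_map, dtype=float)
    second = np.asarray(second_map, dtype=float)
    if first.shape != second.shape:
        raise ValueError("grade maps must have the same shape")
    return w1 * first + w2 * second


def optimize_global_weights(tuning_set, threshold=0.5, n_steps=10):
    """Hypothetical weight optimization by grid search.

    Tries w1 in {0, 1/n_steps, ..., 1} with w2 = 1 - w1 (an assumed convex
    weighting) and keeps the pair with the highest pixel accuracy over
    (first_map, second_map, label_mask) triples. Ties keep the smallest w1.
    """
    best_w1, best_acc = 0.0, -1.0
    for i in range(n_steps + 1):
        w1 = i / n_steps
        w2 = 1.0 - w1
        correct, total = 0, 0
        for first_map, second_map, label_mask in tuning_set:
            composite = combine_grade_maps(first_map, second_map, w1, w2)
            predicted = composite >= threshold  # pixels flagged as defective
            label = np.asarray(label_mask, dtype=bool)
            correct += int(np.sum(predicted == label))
            total += label.size
        accuracy = correct / total
        if accuracy > best_acc:
            best_w1, best_acc = w1, accuracy
    return best_w1, 1.0 - best_w1, best_acc


# Toy tuning set: the first (supervised) map is right, the second is inverted.
first = np.array([[0.9, 0.1], [0.1, 0.9]])
second = np.array([[0.2, 0.8], [0.8, 0.2]])
labels = np.array([[1, 0], [0, 1]])
w1, w2, acc = optimize_global_weights([(first, second, labels)])
print(w1, w2, acc)  # prints: 0.5 0.5 1.0
```

A grid search is used here only because it is the simplest way to show two global weights being tuned against labeled data; gradient-based fitting of the same two weights would be an equally plausible reading of “optimized”.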


Regarding claim 7, Zhang as combined teaches the computerized method according to claim 1, wherein the supervised model component is trained by processing each first image of the plurality of first images to generate (said via fig. 4:400: “Imaging tool”) a corresponding first grade map (said via “the image (i.e., a feature map)” at fig. 4:424: “Model Training 1”) representative of estimated probabilities (said via a “probabilistic” “model”) of the first defects on the first image, and optimizing (via said “fine tune parameters”) the supervised model component based on the label data corresponding to the first image (said via fig. 4:400: “Imaging tool”).  
Regarding claim 8, Zhang as combined teaches the computerized method according to claim 1, wherein the unsupervised model component is trained by processing each second image of the plurality of second images to generate (said via fig. 4:400: “Imaging tool”) a corresponding second grade map (said via “the image (i.e., a feature map)” at fig. 4:440: “Model Training 2”) representative of estimated probabilities (said via a “probabilistic” “model”) of the second defects on the plurality of second images, and optimizing (via said “fine tune parameters”) the unsupervised model component plurality of second images (said via fig. 4:400: “Imaging tool”).  





Regarding claim 9, Zhang as combined teaches the computerized method according to claim 1, wherein the first training set further includes, for each first image of the plurality of first images, corresponding design data (or “ ‘design,’ ‘design data,’ and ‘design information’ as used interchangeably herein”), and/or at least one reference image, and the obtaining further comprises obtaining (via “derived from…simulation”) at least one of design data [[and/]]or at least one reference image of the runtime image (via:
“[0020] The terms "design," "design data," and "design information" as used interchangeably herein generally refer to the physical design (layout) of an IC and data derived from the physical design through complex simulation or simple geometric and Boolean operations.  In addition, an image of a reticle acquired by a reticle inspection system and/or derivatives thereof can be used as a "proxy" or "proxies" for the design.  Such a reticle image or a derivative thereof can serve as a substitute for the design layout in any embodiments described herein that use a design.  The design may include any other design data or design data proxies described in commonly owned U.S.  Pat.  No. 7,570,796 issued on Aug.  4, 2009 to Zafar et al. and U.S.  Pat.  No. 7,676,077 issued on Mar.  9, 2010 to Kulkarni et al., both of which are incorporated by reference as if fully set forth herein.  In addition, the design data can be standard cell library data, integrated layout data, design data for one or more layers, derivatives of the design data, and full or partial chip design data.”).  

Regarding claim 10, Zhang as combined teaches the computerized method according to claim 1, wherein the second training set further includes, for each second image of the plurality of second images, corresponding design data (said or “ ‘design,’ ‘design data,’ and ‘design information’ as used interchangeably herein”), and the obtaining further comprises obtaining (said via “derived from…simulation”) design data of (“of” used to indicate association) the runtime image.  

Regarding claim 11, Zhang as combined teaches the computerized method according to claim 1, wherein the supervised model component and the unsupervised model component are trained separately (as shown in fig. 4:424:440: “Model Training”).  
Regarding claim 12, Zhang as combined teaches the computerized method according to claim 1, further comprising obtaining, during runtime, one or more new first images (said via “additional training images”) each with label data (via fig. 4:406: “Data labeling”) indicative of presence (said via fig. 4:432: “Detection”) of one or more new classes (or “extra…classes…for further training”) of defects (said via “defects in the image”), and retraining (via said “re-training” and “further training”) the supervised model component using the new first images (via:
“[0101] In a further embodiment, the one or more functions include determining one or more characteristics of the one or more causal portions and determining, based on the one or more characteristics of the one or more causal portions, if additional images for the specimen should be collected from the imaging tool and used for additional training of the deep learning model.  For example, the diagnostic component or visualization 336 may be added after model evaluation as shown in FIG. 3 and it may fall back to data collection 302, if causal assurance failed on a) considerable samples of one type or class; and/or b) considerable samples of several types or classes.  If this path is selected, extra data for the error types or classes are collected from imaging tool 300 for further training.  For example, as shown in FIG. 3, visualization 336 may send output such as instructions for additional data collection to data collection 302 step, which may be performed using imaging tool 300.  The additional data collection may be performed using the same specimens that were used for initial data collection and/or different specimens not previously used for data collection.”).

Regarding claim 13, Zhang as combined teaches the computerized method according to claim 1, wherein the runtime image is a review (via a “review…inspection”) image generated by a review (via a “review…inspection” or “semiconductor…inspection”, cited in the rejection of claim 1 or “a reticle inspection system”, cited in the rejection of claim 9) tool (said via fig. 4:400: “Imaging tool” via:
“[0005] Defect review typically involves re-detecting defects detected as such by an inspection process and generating additional information about the defects at a higher resolution using either a high magnification optical system or a scanning electron microscope (SEM).  Defect review is therefore performed at discrete locations on specimens where defects have been detected by inspection.  The higher resolution data for the defects generated by defect review is more suitable for determining attributes of the defects such as profile, roughness, more accurate size information, etc.”).

Regarding claim 14, Zhang as combined teaches the computerized method according to claim 1, further comprising 
processing the runtime image using one or more additional (relative to fig. 4:426: “Model Selection”) components of supervised or 
wherein the one or more additional (said relative to fig. 4:426: “Model Selection”)  at least one of different layers (via a “design…image of…one or more layers”, cited in the rejection of claim 9) of the specimen [[and/]]or from different specimens (“such as reticles and wafers” via:
“[0021] In addition, the "design," "design data," and "design information" described herein refers to information and data that is generated by semiconductor device designers in a design process and is therefore available for use in the embodiments described herein well in advance of printing of the design on any physical specimens such as reticles and wafers.”).  

Regarding claim 15, claim 15 is rejected on the same grounds as claim 1. Thus, the argument presented for claim 1 is equally applicable to claim 15. Accordingly, Zhang as combined, as shown in the rejection of claim 1, teaches claim 15 of a computerized system of runtime defect detection on a specimen, the computerized system comprising a processor and memory circuitry (PMC) configured to: 
obtain a runtime image representative of at least a portion of the specimen;
process the runtime image using a supervised model component to obtain a first output indicative of estimated presence of first defects on the runtime image, wherein the supervised model component is previously trained using a first training set comprising a plurality of first images;
separately process the runtime image using an unsupervised model component to obtain a second output indicative of estimated presence of second defects on the runtime image, wherein the unsupervised model component is previously trained using a second training set including a plurality of second images each representative of of the plurality of second images being a defect-free reference image of a first image of the plurality of first images; and 
combine the first output and the second output using one or more optimized parameters to obtain a runtime defect detection result of the specimen.  

Regarding claim 16, claim 16 is rejected on the same grounds as claim 2. Thus, the argument presented for claim 2 is equally applicable to claim 16. Accordingly, Zhang as combined teaches claim 16 of the computerized system according to claim 15, wherein the one or more optimized parameters are obtained during training using a third training set.  
Regarding claim 17, claim 17 is rejected on the same grounds as claim 3. Thus, the argument presented for claim 3 is equally applicable to claim 17. Accordingly, Zhang as combined teaches claim 17 of the computerized system according to claim 16, wherein: 
the first output is a first grade map representative of estimated probabilities of the first defects on the runtime image, and the second output is a second grade map representative of estimated probabilities of the second defects on the runtime image; [[and]]
model component and the unsupervised model component[[s]], to obtain a composite grade map indicative of estimated probabilities of the first defects and the second defects on the specimen;[[,]] and 
the segmentation model component is trained using the third training set based on outputs of the supervised model component and the unsupervised model component.  

Regarding claim 18, claim 18 is rejected on the same grounds as claim 4. Thus, the argument presented for claim 4 is equally applicable to claim 18. Accordingly, Zhang as combined teaches claim 18 of the computerized system according to claim 16, wherein: 
the first output is a first grade map representative of estimated probabilities of the first defects on the runtime image, and the second output is a second grade map representative of estimated probabilities of the second defects on the runtime image; and 
; and 
the respective global weights are optimized during the training using the third training set.  

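The following Python sketch assumes that the "respective global weights" of claim 18 each scale one grade map and that the scaled maps are summed pixel by pixel. This is an assumption, since the exact combining step is not reproduced here. In practice the weights would come from the training on the third training set. The function name is hypothetical.

```python
from typing import List

GradeMap = List[List[float]]


def combine_with_global_weights(first: GradeMap, second: GradeMap,
                                w_first: float, w_second: float) -> GradeMap:
    """Pixel-wise weighted sum of two grade maps, one global weight per map."""
    if len(first) != len(second) or any(len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("grade maps must have the same shape")
    return [[w_first * a + w_second * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]
```

Each weight is a single scalar applied to a whole map. The weights therefore set how much the runtime result relies on each component overall, rather than pixel by pixel.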
Regarding claim 20, claim 20 is rejected on the same grounds as claims 1 and 15. Thus, the argument presented for claims 1 and 15 is equally applicable to claim 20. Accordingly, Zhang as combined above in the rejection of claim 1 teaches claim 20 of a non-transitory computer readable storage medium (via fig. 5:500: “Computer-readable medium”) tangibly embodying a program of instructions that, when executed by a computer, cause the computer to perform a method of runtime defect detection on a specimen, the method comprising: 
obtaining a runtime image representative of at least a portion of the specimen;
processing the runtime image using a supervised model component to obtain a first output indicative of estimated presence of first defects on the runtime image, wherein the supervised model component is previously trained using a first training set comprising a plurality of first images;
separately processing the runtime image using an unsupervised model component to obtain a second output indicative of estimated presence of second defects on the runtime image, wherein the unsupervised model component is previously trained using a second training set including a plurality of second images each representative of of the plurality of second images being a defect-free reference image of a first image of the plurality of first images; and 
combining the first output and the second output using one or more optimized parameters to obtain a runtime defect detection result of the specimen.  
Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928 A1) in view of Gupta et al. (US Patent App. Pub. No.: US 2018/0293721 A1) with the following extrinsic evidence to explain “semi-supervised”, as detailed above, via MPEP:
2131.01 Multiple References 35 USC 102 Rejections
 II.    TO EXPLAIN THE MEANING OF A TERM USED IN THE PRIMARY REFERENCE 
Extra References or Other Evidence Can Be Used to Show Meaning of a Term Used in the Primary Reference

NOONE et al. (US 2020/0166909 A1), filed Nov. 19, 2019,  
Zejda et al. (US 2020/0410354 A1), filed Jun. 27, 2019,  
Bhaskar et al. (US 2019/0303717 A1), filed Mar. 25, 2019, 
Sugaya (WO 2020/217957 A1), with priority document of 2019-085763 (26.04.2019): the priority document itself is not translated, 
YAMAGUCHI (US 2018/0101924 A1), filed Oct. 5, 2017, and
Fujii et al. (US Patent App. Pub. No.: US 2017/0308049 A1), filed Apr. 21, 2017
as applied above in the rejection of claims 1-4, 7-14, 15-18, and 20, and further in view of Tuohy (US Patent App. Pub. No.: US 2007/0177135 A1) and Pathangi et al. (US Patent App. Pub. No.: US 2020/0161081).
Regarding claim 5, Zhang as combined teaches the computerized method according to claim 2, wherein:
the processing of the runtime image using [[a]] the supervised model component comprises generating a first grade map (said via “the image (i.e., a feature map)”) representative of estimated probabilities (said via a “probabilistic” “model”) of the first defects on the runtime image and applying a first threshold to the first grade map to obtain a first defect map (or “defects” “map” cited in the rejection of claim 1:[0067]); 
separately processing of the runtime image using [[a]] the unsupervised model component comprises generating a second grade map (said via “the image (i.e., a feature map)”) representative of estimated probabilities (said via a “probabilistic” “model”) of the second defects on the runtime image, and applying a second threshold to the second grade map to obtain a second defect map (said or “defects” “map” cited in the rejection of claim 1:[0067]), the first threshold and the second threshold being optimized during the training using the third training set; [[,]] and 
 the combining of the first output and the second output comprises combining the first defect map and the second defect map to generate a composite defect map.
Thus, Zhang does not teach, as indicated in bold above, the claimed:
a)	a first threshold;
b)	a second threshold; and
c)	the first threshold and the second threshold being optimized during the training using the third training set, and 
the combining of the first output and the second output comprises combining the first defect map and the second defect map to generate a composite defect map.

Tuohy teaches claim 5 of:
a)	a first (via fig. 1b: “1. filter”) threshold (or “a certain size…threshold…such as size, shape, position within a predefined area, number of defects per unit area and the like”);
b)	a second (via fig. 1b: “2. filter”) threshold (or “a certain size…threshold…such as size, shape, position within a predefined area, number of defects per unit area and the like”); and
c)	the first threshold and the second threshold being optimized during the training using the third training set, and 
the combining of the first output and the second output comprises combining (via fig. 1a:130: “correlation unit”) the first defect map (or “defects…substrate map 155A”) and the second defect map (or “defect…substrate map 156A” or “defects…substrate map 157A”) to generate a composite defect map (via:
“[0021] The process of removing less relevant data from a given measurement data set may be accomplished by using appropriate measurement data, i.e., data having reduced noise, which may be considered as reference data, and combining or merging the filtered measurement data with the reference data to determine, for instance, a degree of correlation, a die loss and the like for the set of measurement data that has been filtered on the basis of a predefined filter criterion.  For example, if the filtered measurement data may exhibit a significantly increased correlation with respect to the reference data compared to the non-filtered data, the respective filter criterion used may be identified as an appropriate filter criterion and may be used to obtain data of increased statistical significance for the measurement process under consideration.  In other illustrative embodiments, the filtering process may be performed in a progressive manner, i.e., the filtering process may be performed on the basis of progressively restricted filter criteria so that a plurality of differently, i.e., progressively, filtered measurement data is available, for which respective degrees of correlation may be determined.  In other embodiments, the correlation may be used as a "quality monitor" of the measurement data, from which a die loss may be calculated for every filtering step to select an appropriate filtering process on the basis of the calculated die loss.  In some illustrative embodiments, the term "progressively filtering" may indicate a filtering process in which the initial measurement data are filtered with respect to the same filter criterion but with an increasingly restrictive filter behavior.  In other illustrative embodiments, the term "progressively filtering" may include a plurality of consecutive filtering processes, wherein a different filter criterion may be applied to a filtered measurement data set that has previously been filtered by a different criterion.  
For example, in the former case, a filter criterion may be selected, such as the size or area of a defect detected by optical inspection, the number of defects per unit area and the like, wherein, in each filtering step, the corresponding filtering action or range may be set more restrictively.  That is, it may be assumed that the influence of a defect may increase with its size, thereby rendering the corresponding larger defects more relevant compared to a smaller defect.  Consequently, during the progressive filtering process, the filter arrangement may be set so as to detect defects at or above a certain size, while neglecting effects below the threshold.  In the latter case, different filter criteria, such as size, shape, position within a predefined area, number of defects per unit area and the like, may be successively applied in order to reduce the noise in the original measurement data, thereby providing the potential for identifying appropriate filter "threads" that may be used in a corresponding manufacturing environment for obtaining measurement data of increased relevance.”; and

“[0029] Next, the measurement data 152A may be subjected to a first filtering process, for instance on the basis of a filter criterion determining a minimum defect size, below which a defect is considered as being not present.  Consequently, after re-processing the measurement data 152A according to the respective filter criterion and the setting of the filter criterion in the first step by selecting an appropriate minimum size, a filtered substrate map 154A may be obtained, wherein, for instance, 10 dies may be considered clean, while 86 die are still evaluated as defective die.  In a next filter step, a more restrictive range for the specified criterion, that is, an even increased minimum size of the defects, may be selected so that a further substrate map 155A may be generated.  For example, the minimum size in each of the filtering steps may be obtained as a multiple of the initial minimum defect size detectable by the corresponding inspection tool.  It should be appreciated, however, that any other value for the restricted range in the first, second and further filter step may be used.  The resulting filter process may yield 19 clean dies and thus 77 defective die.  Similarly, in a third filter step having a further increased restriction with respect to the corresponding filter criterion, such as the defect size, a further filtered set of measurement data represented by a substrate map 156A may be created.  Hereby, it may be assumed that 60 clean die are obtained, while 36 defective die are detected.  In a next filtering step, an even increased restriction, i.e., only defects having a size above a threshold higher than a threshold of any of the filter processes performed before, may be performed and may yield a corresponding set of filtered data represented by a substrate map 157A, wherein it may be assumed that 77 clean die are detected and thus 19 defective die are still present.  It should be appreciated that the above sequence of filtering steps is of illustrative nature only and other filter criteria in combination with respective increasingly restricted filter ranges may be used to obtain progressively filtered data sets.”).

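In Tuohy's progressive filtering, as quoted above, the minimum defect size becomes more restrictive at each step and the dies that remain clean are counted after each step. The following is a minimal Python sketch of that counting. It assumes each die is represented only by a list of detected defect sizes, which is a simplification. The function name and sample data are hypothetical and do not reproduce Tuohy's substrate maps 154A-157A.

```python
from typing import Dict, List


def progressive_clean_counts(defect_sizes_by_die: Dict[str, List[float]],
                             min_sizes: List[float]) -> List[int]:
    """Count clean dies for each progressively larger minimum defect size.

    A defect smaller than the current minimum size is treated as not present,
    so a die is clean when none of its defects reach that size.
    """
    counts = []
    for min_size in sorted(min_sizes):
        clean = sum(1 for sizes in defect_sizes_by_die.values()
                    if not any(size >= min_size for size in sizes))
        counts.append(clean)
    return counts
```

Take the hypothetical dies {"d1": [], "d2": [0.5], "d3": [1.5], "d4": [0.5, 3.0]} with minimum sizes [1.0, 2.0, 4.0]. The clean-die counts are [2, 3, 4]. The count can only grow as the filter becomes more restrictive, which follows the trend in Tuohy's example.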
Thus, one of ordinary skill in the art of maps can modify Zhang’s teaching of said “defects” “map” cited in the rejection of claim 1:[0067] with Tuohy’s teaching of said fig. 1b: “1. filter” by:
a)	providing multiple instances of Zhang’s fig. 4:400 in place of Zhang’s single fig. 4:400;
b)	inserting Tuohy’s fig. 1a:100 at the output of each instance of Zhang’s fig. 4:400; and
c)	recognizing that the modification is predictable or looked forward to because the modification allows one to monitor measurement quality via a “ ‘quality monitor’ of the measurement data” from each of said Zhang’s fig. 4:400 via Tuohy, cited above.

The combination does not teach, as indicated in bold above, the remaining limitation of:
c)	“the first threshold and the second threshold being optimized during the training using the third training set” 
	
However, Pathangi teaches:
c)	the first (“corresponding”) threshold (or “threshold (Thr)”) and the second (“corresponding”) threshold (said or “threshold (Thr)” via fig. 1:102: “Quantify a number of pixels in the image that exceed a corresponding threshold in the matrix”) being optimized (“to provide optimal detection results” corresponding to a “tuned” “threshold” comprising “optimum performance”) during the training (via “the heat map of FIG. 4…used to train” as shown in figures 5-13 that uses parts, fig. 4: bolded squares, of said heatmap) using the third training set (said heatmap via:
“[0050] A nuisance rate can be tuned to required levels using the detection threshold parameter.  Thus, the threshold can be tuned depending on the application or desired sensitivity.”

wherein “tuned” is defined via Dictionary.com:
BRITISH DICTIONARY DEFINITIONS FOR TUNE
tune
verb
14	(tr often foll by up) to make fine adjustments to (an engine, machine, etc) to obtain optimum performance; and

“[0064] FIGS. 3-13 illustrate tuning a nuisance rate to required levels using the detection threshold parameter.  Depending on the average CD of one population of contact holes, the threshold (Thr) is different to provide optimal detection results.”

“[0067] In an SEM images of the three dies in the thick black border in the heat map of FIG. 4 (mean diameters of 16.7, 17.2, and 17.2), all the individual contact holes that are smaller than 10% of the mean critical dimension of the 100 contact holes from the corresponding image were used to train the deep learning model to be identified as defective.  Using this deep learning model, the SEM images from the three individual dies marked with a thick border (mean diameters of 15.7, 17.1, and 19.4) were used as verification images to assess the performance of the deep learning model in identifying defective contact holes.”).

	Thus, one of ordinary skill in the art of metrology, defects, thresholding thereof, and heatmaps, as indicated in Zhang’s teaching of “heatmap” via:
“[0084] In some embodiments, the diagnostic component is configured for determining the one or more causal portions by causal back propagation performed using a deconvolution heatmap algorithm.  A deconvolution heatmap can be viewed as a specific implementation of causal backpropagation.  For example, as described by Zeiler et al., "Visualizing and understanding convolutional networks," ECCV, 2014, pp.  818-833, which is incorporated by reference as if fully set forth herein, the causal image can be computed via mapping activation from the deep learning model's output back to the pixel/feature (i.e., x and y) space through a backpropagation rule.  The embodiments described herein may be further configured as described in this reference.”

can modify Zhang’s said “defects” “map” as modified via the combination of Tuohy with Pathangi’s teaching of “threshold (Thr)” by:
a)	making each instance of Zhang’s fig. 4:400: “Imaging tool”, as modified via the combination with Tuohy, be Pathangi’s fig. 14:200; 
b)	placing the tuned thresholds of Pathangi at Tuohy’s thresholds at said fig. 1b: “1. filter” and fig. 1b: “2. filter”; and 
c)	recognizing that the modification is predictable or looked forward to because the modification is used “to provide optimal detection results” via Pathangi, cited above.	

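The following is a minimal Python sketch of the claim 5 arrangement as mapped above. Each grade map is thresholded separately, the two defect maps are merged into a composite defect map, and both thresholds are selected on a third training set. The sketch assumes the third training set carries pixel-level ground-truth masks. It also assumes the composite is a pixel-wise logical OR, and it uses an exhaustive grid search maximizing summed F1 as the optimization. Neither assumption is Pathangi's tuning procedure or applicant's, and all names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

GradeMap = List[List[float]]
Mask = List[List[bool]]


def apply_threshold(grade: GradeMap, threshold: float) -> Mask:
    """Binarize a grade map: a pixel is flagged when its grade reaches the threshold."""
    return [[value >= threshold for value in row] for row in grade]


def composite_defect_map(first: Mask, second: Mask) -> Mask:
    """Combine two defect maps by pixel-wise logical OR (an assumed combining rule)."""
    return [[a or b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(first, second)]


def f1_score(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 of a predicted defect map against a ground-truth mask."""
    tp = fp = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            tp += p and t
            fp += p and not t
            fn += (not p) and t
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_thresholds(samples: Sequence[Tuple[GradeMap, GradeMap, Mask]],
                    candidates: Sequence[float]) -> Tuple[float, float]:
    """Pick the (first, second) thresholds that maximize summed F1 over the samples."""
    best, best_score = (candidates[0], candidates[0]), -1.0
    for t1, t2 in product(candidates, repeat=2):
        score = sum(
            f1_score(composite_defect_map(apply_threshold(g1, t1),
                                          apply_threshold(g2, t2)), truth)
            for g1, g2, truth in samples)
        if score > best_score:
            best, best_score = (t1, t2), score
    return best
```

Grid search is used here only because two scalar thresholds make exhaustive search cheap. Any search that reaches the required sensitivity could take its place, such as tuning a threshold to a required nuisance rate as Pathangi describes.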
Regarding claim 19, claim 19 is rejected on the same grounds as claim 5. Thus, the argument presented for claim 5 is equally applicable to claim 19. Accordingly, Zhang as combined above teaches claim 19 of the computerized system according to claim 16, wherein: 
the PMC is configured to process the runtime image using [[a]] the supervised model component by generating a first grade map representative of estimated probabilities of the first defects on the runtime image and applying a first threshold to the first grade map to obtain a first defect map;
separately process the runtime image using [[a]] the unsupervised model component by generating a second grade map representative of estimated probabilities of the second defects on the runtime image and applying a second threshold to the second grade map to obtain a second defect map, the first threshold and the second threshold being optimized during the training using the third training set; [[,]] and   
.  

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US 2018/0107928 A1) in view of Gupta et al. (US Patent App. Pub. No.: US 2018/0293721 A1) with the following extrinsic evidence to explain “semi-supervised”, as detailed above, via MPEP:
2131.01 Multiple References 35 USC 102 Rejections
 II.    TO EXPLAIN THE MEANING OF A TERM USED IN THE PRIMARY REFERENCE 
Extra References or Other Evidence Can Be Used to Show Meaning of a Term Used in the Primary Reference

NOONE et al. (US 2020/0166909 A1), filed Nov. 19, 2019,  
Zejda et al. (US 2020/0410354 A1), filed Jun. 27, 2019,  
Bhaskar et al. (US 2019/0303717 A1), filed Mar. 25, 2019, 
Sugaya (WO 2020/217957 A1), with priority document of 2019-085763 (26.04.2019): the priority document itself is not translated, 
YAMAGUCHI (US 2018/0101924 A1), filed Oct. 5, 2017, and
Fujii et al. (US Patent App. Pub. No.: US 2017/0308049 A1), filed Apr. 21, 2017
as applied above in the rejection of claims 1-4, 7-14, 15-18, and 20, and further in view of Zhou et al. (US Patent App. Pub. No.: US 2019/0104940 A1).
Regarding claim 6, Zhang as combined teaches the computerized method according to claim 4, wherein the respective global weights are obtained using [[an]] a non-gradient optimization function during the training using the third training set.  
Zhang does not teach “a non-gradient optimization function”.
Zhou teaches claim 6 of:
the respective global weights (or weighting coefficients via fig. 3:230: “Calculate change in error as a function of change in the network coefficients”) are obtained using [[an]] a non-gradient optimization function (or “a non-gradient descent optimization algorithm like simulated annealing or a genetic algorithm”) during the training (via fig. 1A:130: “Network Training”) using the third training set (or fig. 1A:115: “Noisy/Artifact Data” and fig. 1A:120: “Optimized Data” via:
“[0067] In step 230 of step 130, a change in the error as a function of the change in the network can be calculated (e.g., an error gradient), and this change in the error can be used to select a direction and step size for a subsequent change to the weights/coefficients of the DL network 135.  Calculating the gradient of the error in this manner is consistent with certain implementations of a gradient descent optimization method.  In certain other implementations, as would be understood by one of ordinary skill in the art, this step can be omitted and/or substituted with another step in accordance with another optimization algorithm (e.g., a non-gradient descent optimization algorithm like simulated annealing or a genetic algorithm).”).

Thus, one of ordinary skill in the art of noise, as indicated in Zhang’s teaching of “noise” and “noisy regions” via Zhang:
“[0072] In another embodiment, the information determined by the deep learning model includes one or more segmentation regions generated from the image.  In one such embodiment, the deep learning model includes a proposal network configured for identifying the segmentation region(s) (based on features determined for the image) and generating bounding boxes for each of the segmentation regions.  The segmentation regions may be detected based on the features (determined for the images by the deep learning model or another method or system) to thereby separate regions in the images based on noise (e.g., to separate noisy regions from quiet regions), to separate regions in the images based on specimen features located therein, to separate regions based on geometric characteristics of the output, etc. The proposal network may use features from a feature map, which may be generated or determined as described further herein, to detect the segmentation region(s) in the image based on the determined features.  The proposal network may be configured to generate bounding box detection results.  In this manner, the deep learning model may output bounding boxes, which may include a bounding box associated with each segmentation region or more than one segmentation region.  The deep learning model may output bounding box locations with each bounding box.  The results of the segmentation region generation can also be stored and used as described further herein.”

can modify Zhang’s teaching of “a set of weights that model the world” with Zhou’s teaching of said fig. 3:230: “Calculate change in error as a function of change in the network coefficients” by:
a)	inserting Zhou’s fig. 1A:110 before Zhang’s fig. 4:428: “Best Model 1” and fig. 4:442: “Best Model 2”; and
b)	recognizing that the modification is predictable or looked forward to because the modification enables one “to produce images resembling the high-image-quality images from…noisy…images” via Zhou: 

“[0031] The process 110 of method 100 performs offline training of the DL network 135.  In step 130 of process 110, noisy data 115 and optimized data 120 are used as training data to train a DL network, resulting in the DL network being output from step 130.  More generally, data 115 can be referred to as defect-exhibiting data, for which the "defect" can be any undesirable characteristic that can be affected trough image processing (e.g., noise or an artifact).  Similarly, data 120 can be referred to as defect-reduced data, defect-minimized data, or optimize data, for which the "defect" is less than in the data 115.  In an example using reconstructed images for data 115 and 120, the offline DL training process 110 trains the DL network 135 using a large number of noisy reconstructed images 115 that are paired with corresponding high-image-quality images 120 to train the DL network 135 to produce images resembling the high-image-quality images from the noisy reconstructed images.”

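Zhou names simulated annealing as one non-gradient optimization algorithm. The following minimal Python sketch illustrates how the respective global weights of claim 6 could be obtained without any gradient. It anneals the two weights against the pixel error of the thresholded, weighted grade map on a third training set. Several details are assumptions: the fixed 0.5 decision threshold, the Gaussian proposal step, the linear cooling schedule, and all names. None of this is Zhou's, Zhang's, or applicant's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

GradeMap = List[List[float]]
Mask = List[List[bool]]
Sample = Tuple[GradeMap, GradeMap, Mask]


def pixel_error(weights: Tuple[float, float], samples: Sequence[Sample],
                threshold: float = 0.5) -> float:
    """Fraction of pixels misclassified by the thresholded weighted grade map."""
    wrong = total = 0
    for g1, g2, truth in samples:
        for row1, row2, row_t in zip(g1, g2, truth):
            for a, b, t in zip(row1, row2, row_t):
                wrong += (weights[0] * a + weights[1] * b >= threshold) != t
                total += 1
    return wrong / total


def anneal_global_weights(samples: Sequence[Sample], steps: int = 2000,
                          temp0: float = 1.0, seed: int = 0):
    """Simulated annealing over (w1, w2) in [0, 1]^2; uses no gradient information."""
    rng = random.Random(seed)
    current = (0.5, 0.5)
    current_err = pixel_error(current, samples)
    best, best_err = current, current_err
    for k in range(steps):
        temp = temp0 * (1.0 - k / steps) + 1e-9  # linear cooling toward zero
        candidate = tuple(min(1.0, max(0.0, w + rng.gauss(0.0, 0.1))) for w in current)
        err = pixel_error(candidate, samples)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if err <= current_err or rng.random() < math.exp((current_err - err) / temp):
            current, current_err = candidate, err
            if err < best_err:
                best, best_err = candidate, err
    return best, best_err
```

The acceptance test compares only error values, so the same loop works for objectives with no usable gradient, such as the thresholded pixel error used here. A genetic algorithm, Zhou's other named example, could be substituted.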
Suggestions

Obvious difference: applicant’s disclosure of fig. 2:208: “Combining the first output and the second output…”. Thus, applicant’s disclosure thereof, such as [0070]-[0072], is an indication of non-obviousness in view of the art applied in the 35 U.S.C. 103 rejection.
Applicant’s disclosure states in [006]: “the goal …is…high sensitivity”, i.e., a high detection, perception, or recognition rate, a high rate of identification, or high electromagnetic extraction, directed to the last limitation of claim 1’s “optimized…defect detection”. The claimed optimization in claims 1, 4, 5, 6, 7, and 8 is directed to this sensitivity, which applicant’s disclosure illustrates with multiple examples reflected in said claims. The first example is applicant’s fig. 4:408: a dot pushing the envelope toward the upper left as the optimization. This pushing of the envelope, i.e., the disclosed dot, is not apparent in claims 1, 4, 5, 6, 7, and 8. Similarly, said Hubaux (WO 2020/156769) teaches pushing the envelope, as shown in fig. 9 by the dotted line relative to the solid line.
Note that these suggestions are not provided with respect to overcoming rejections under 35 U.S.C. 101, 112, 102, and/or 103. These suggestions are mainly provided to identify advantages in the disclosure regardless of 35 U.S.C. 101, 112, 102, and/or 103.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397. The examiner can normally be reached Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENNIS ROSARIO/	Examiner, Art Unit 2667

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667