Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
	This Office Action is in response to applicant’s response filed on May 2, 2022, under which claims 1-9, 11-22, 24-40 and 42 are pending and under consideration.

Response to Arguments
	The previous § 112(b) rejection of claim 41 has been withdrawn due to the cancellation of this claim.  
35 U.S.C. § 103 Rejections
Applicant’s arguments with regard to the § 103 rejections, which pertain to the limitation of “wherein the one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image, and wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system” in claim 1, have been fully considered but are not persuasive for the following reasons. 
Applicant first argues that Freixenet fails to teach “one or more causal portions” and instead only teaches segmentation results. These arguments are represented by the following parts of remarks quoted below:
Although Freixenet discloses determining if segmentation results are correct, the segmentation results are not causal portion(s) as presently claimed. For example, Freixenet contains no teaching or suggestion that the segmentation algorithms disclosed and evaluated by Freixenet determine which portion(s) of an image are causal portion(s). …

Freixenet, therefore, discloses simply determining the correctness of segmentation results without regard to any causal portion(s) of the image….

(Applicant’s response, pages 17-18) (emphasis added).
	 
These arguments are not persuasive because Freixenet was not relied upon to teach the feature of “causal portions” (i.e., “one or more causal portions that resulted in the information being determined”). This feature is instead taught by Karlinsky. 
Therefore, in response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
As stated above, Freixenet was not relied upon to teach the “causal portions” of the instant claim. Instead, Freixenet was relied upon to teach the technique of determining whether a certain portion of an image is the correct portion. Although Freixenet was not cited to teach image regions that are specifically “causal portions,” the segmented image regions in Freixenet are nonetheless analogous to the “causal portions” of the claim because they are determined by an algorithm (as stated in the previous Office Action) and the determination of such regions is subject to verification of correctness. Given that Freixenet was not relied upon to teach the “causal portions” but was instead relied upon to teach a general technique that is applicable to such causal portions, applicant cannot overcome the rejection by merely showing that Freixenet fails to teach the “causal portions.” The combination of references teaches the above-quoted limitations of claim 1. Primary reference Karlinsky teaches a “defect bounding box,” as generally described in [0066], Table 1 (near paragraph [0068]), [0090] and [0118], which corresponds to “one or more causal portions” recited in the instant claim. While Karlinsky does not explicitly teach determining the correctness of the bounding box, Freixenet’s technique can be used for the purpose of determining whether a specific portion is the correct portion, and one of ordinary skill in the art would have been motivated to apply Freixenet’s technique in such a manner, as further addressed in the comments below. 
Applicant next argues the following:
In addition, Applicants believe that it is also fair to say that since the algorithms, methods, and systems of Freixenet do not determine or result in any causal portions (since in the segmentation algorithms disclosed and evaluated by Freixenet, each pixel in an image is necessarily or inherently causal), Freixenet does not contain any teaching or suggestion equivalent to determining if one or more causal portions of an image that resulted in the segmentation results being determined are correct one or more causal portions of the image. Consequently, Freixenet does not teach or suggest a deep learning model configured for determining information from an image generated for a specimen by an imaging tool and a diagnostic component configured for determining if one or more causal portions of the image that resulted in the information being determined are correct one or more causal portions of the image, as recited in claims 1, 33-35, and 38.

(Applicant’s response, page 18) (emphasis added).

	Applicant appears to be arguing that because Freixenet teaches segmented portions rather than causal portions, Freixenet does not suggest determining the correctness of other types of regions such as causal portions. This argument is not persuasive because the fact that Freixenet teaches segmentation regions rather than causal portions does not negate the fact that its teachings are nonetheless applicable to causal portions and would have suggested performing the same techniques on causal portions.
The segmentation results in Freixenet (which are also referred to as segmentation regions) and the causal portions in the instant claim are both regions of an image. Furthermore, the claim limitation at issue does not require any special technique for determining whether the causal portions are correct. Therefore, in the absence of additional claim limitations that require a specialized technique to determine correctness, a technique that is used for one can also be used for the other. Thus, a person of ordinary skill would have reasonably expected that correctness determination can be used on causal portions just as it is taught to be used on segmentation regions.
Additionally, the segmentation regions in Freixenet and the causal portions in the instant claim are both computer-determined regions of an image. Therefore, both the segmentation regions in Freixenet and the causal portions present the problem of determining whether the computer-determined region is correct. Since segmentation regions and causal portions are both computer-determined regions of an image that should be evaluated for accuracy, one of ordinary skill would have had a motivation to apply the technique of determining correctness of a region (as taught in Freixenet) to the particular feature of a causal portion, so as to have arrived at the claimed limitation at issue. Thus, one of ordinary skill in the art would have been motivated to evaluate the accuracy or quality of the causal portions, in the same manner that he or she would have been motivated to evaluate the accuracy or quality of a different type of image region. 
It is also well established that “familiar items may have obvious uses beyond their primary purposes” (KSR International Co. v. Teleflex Inc., 550 U.S. 398, 402 (2007)). Therefore, even if it can be said that the primary purpose of the determination of the correctness of regions in Freixenet is for segmentation regions, the same techniques may have obvious uses for other purposes, such as determining the correctness of image portions besides segmentation regions.
In summary, applicant has only pointed out that the correctness determination technique in Freixenet was not used for the exact feature of “causal portions,” but has not explained why there would have been no motivation or reasonable expectation of success in combining the teachings of Freixenet with the teachings of the other references. Since prima facie obviousness is established when there would have been a motivation to combine the teachings of the references in a manner that arrives at the claimed invention, together with a reasonable expectation of success in doing so, applicant’s arguments do not overcome the present obviousness rejection. 
Finally, applicant argues the following:
Consequently, Freixenet does not teach or suggest a deep learning model configured for determining information from an image generated for a specimen by an imaging tool and a diagnostic component configured for determining if one or more causal portions of the image that resulted in the information being determined are correct one or more causal portions of the image, as recited in claims 1, 33-35, and 38. As such, Freixenet does not teach or suggest all limitations of claims 1, 33-35, and 38 and cannot be combined with Karlinsky and Zintgraf to overcome deficiencies contained therein.

(Applicant’s response, page 18) (emphasis added).
	The above arguments are not persuasive. The rejection is based on the combined teachings of multiple references. Therefore, Freixenet need not teach or suggest all limitations of claims 1, 33-35, and 38, including the specific limitation of “causal portions.” 
	In response to applicant’s argument that Freixenet does not teach a “deep learning model,” this limitation is already taught by primary reference Karlinsky. Furthermore, the fact that Freixenet does not teach a deep learning model does not mean that its teachings are not applicable to a system that uses a deep learning model, as instantly claimed. Therefore, the combination of the cited references, which include Karlinsky and Freixenet, teaches the limitations of the claim.
In response to applicant's argument that Freixenet “cannot be combined” with the other references, the test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference; nor is it that the claimed invention must be expressly suggested in any one or all of the references.  Rather, the test is what the combined teachings of the references would have suggested to those of ordinary skill in the art.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981). Therefore, applicant's argument that Freixenet “cannot be combined” with the other references does not reflect the applicable standard for obviousness.
	The Examiner notes that the rejection is not bodily incorporating the segmentation regions of Freixenet into the structure of the primary reference. Instead, the rejection asserts that one of ordinary skill would have been motivated to apply the generally applicable techniques of Freixenet in determining correctness of a region to the teachings of the other references. As explained above, one of ordinary skill in the art would have had a motivation and reasonable expectation of success in doing so. Therefore, applicant’s arguments are not persuasive, and the rejection has been maintained.
Examiner’s Suggestions
	The Examiner recognizes that the instant application generally teaches a visualization system for causal understanding and guided training of a deep learning classifier for semiconductor inspection and metrology. In particular, the specification of Provisional Application No. 62/408,402 teaches that the advantages of such a system include “visualizing the causal of the prediction on a particular input by a deep learning classifier; and allowing [a] user to reinforce the correct causal relationship to be learned by a deep learning classifier” (§ 6 of the provisional specification). In connection with these advantages, the provisional specification teaches that “when the visualization system is enabled, the causal images are generated and displayed for selected input data [and] are checked by domain experts, crowd sourcing or comparison algorithms against region labels (if available)” (§ 4 of the provisional specification), and further teaches the features of “feedback the user knowledge to the classifier development workflow in a systematic way” and “understand why deep learning classifier makes the prediction via causal back-propagation (CBP)” (§ 9 of the provisional specification). 
To advance prosecution and distinguish over the applied art, the Examiner suggests that the applicant focus on the above aspects described in the present application by amending the independent claims to specify (using appropriate claim language) that “the correct one or more causal portions” are received from the user after the image, the determined information, and the one or more causal portions have been displayed to the user by the system. These features relate to the aspect that the user feeds back knowledge to improve or fine-tune the model by verifying the causal images, an aspect that is not reflected in the current claim language, which does not require user involvement with regard to the “correct one or more causal portions.”
If applicant amends all the independent claims to incorporate the above features, the current art rejections would be overcome. The Examiner cannot, at this time, determine whether or not such amendments would make the claims allowable. However, they would move prosecution forward. The Examiner is available for an interview at Applicant's convenience to discuss the above suggestions or alternate ideas.
	
Priority
	Applicant’s election to continue prosecution of claims 17 and 19 based on the filing date of the non-provisional application is acknowledged. The determination of priority of these claims, as set forth in the Office Action of November 16, 2020, remains applicable under the current Office Action. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 1-9, 24-31, 33-37, and 39-40 are rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky et al. (US 2017/0177997A1) (“Karlinsky”) in view of Zintgraf et al., “A New Method to Visualize Deep Neural Networks,” arXiv:1603.02518v2 [cs.CV] 9 Jun 2016 (“Zintgraf”) and Freixenet et al., “Yet Another Survey on Image Segmentation: Region and Boundary Information Integration,” A. Heyden et al. (Eds.): ECCV 2002, LNCS 2352, pp. 408–422, 2002 (“Freixenet”).
As to claim 1, Karlinsky teaches a system configured to perform diagnostic functions for a deep learning model [[0086] teaches diagnostic functions such as providing results of initial model training to a user and receiving user feedback.], comprising: 
one or more computer subsystems having one or more processors that execute instructions from a memory medium; [[0040]: “FPEI system 103 comprises a processor and memory block (PMB) 104…The processor of PMB 104 can be configured to execute several functional modules in accordance with computer-readable instructions implemented on a non-transitory computer-readable memory comprised in PMB.” as shown in FIG. 1.] and 
one or more components executed by the one or more computer subsystems and stored on a non-transitory computer-readable medium, [[0021]: “non-transitory computer readable medium comprising instructions”; see also [0040], quoted above] and wherein the one or more components comprise: 
a deep learning model configured for determining information [Abstract: “Deep Neural Network (DNN) trained for a given examination-related application within a semiconductor fabrication process.” The DNN may have convolutional layers ([0053]), so as to be a convolutional neural network. Examples of information determined using the DNN include defect classification ([0055]), segmentation of process images ([0056]), constructing images from one or more original images ([0060]), and other outputs shown in Table 1 near paragraph [0069]] from an image generated for a specimen by an imaging tool; [[0008]-[0009]: the DNN receives, as input, a fabrication process (FP) image. [0064]: Such images may be “images of a part of a wafer or a photomask captured by SEM or an optical inspection system.” Note that Karlinsky generally relates to “examining a semiconductor specimen” (abstract)]; and
a diagnostic component configured for determining one or more causal portions of the image that resulted in the information being determined [A “defect bounding box,” as generally described in [0066], Table 1 (near paragraph [0068]), [0090] and [0118] corresponds to “one or more causal portions.” Paragraph [0066], Table 1, and [0118] teach that the bounding box can be a “DNN output” ([0068]), thereby teaching that the DNN performs an act of “determining” the defect bounding box. Furthermore, [0099] teaches that a bounding box can be supplied via user feedback. Since this feedback is then utilized to further train the neural network, as described in [0093], the act of “determining” is alternatively met by receipt of user feedback containing a bounding box.] and for performing one or more functions based on the determined one or more causal portions of the image, wherein the one or more functions comprise determining one or more characteristics of the one or more causal portions [The determination of a bounding box constitutes determining one or more characteristics thereof, since a bounding box has a position on the image. This is also described in Table 1: “Defect bounding box coordinate.” That is, the position of the bounding box corresponds to “one or more characteristics of the one or more causal portions.”].
Karlinsky does not specifically teach: 
(1)	“wherein determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information”; and
(2)	“wherein the one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image, and wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” 
Zintgraf, in an analogous art, teaches limitation (1) listed above. Zintgraf relates to a “method to visualize deep neural networks,” and is therefore in the same field of endeavor (artificial intelligence, including deep learning). Zintgraf generally relates to determining and visualizing the importance of various parts of an input image. See § 1, paragraph 4: “We present a novel visualization method, exemplified for DCNNs, that finds and highlights the regions in image space that activate the nodes (hidden and output) in the neural network.”
In particular, Zintgraf teaches “wherein determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information” [§ 3, paragraph 1: “For a given input image, the method will allow us to estimate the importance of each pixel by assigning it a relevance value.” § 3.2: “an individual pixel’s relevance is obtained by taking the average relevance obtained from the different patches it was in.” With respect to the limitation of “importance…in contributing to said determining the information,” note that importance/relevance in Zintgraf refers to relevance to predicting a class. See § 3.1, paragraph 3 (“For a feature to become relevant when using conditional sampling, it now has to satisfy two conditions: being relevant to predict the class of interest, and be hard to predict from the neighboring pixels.”) and § 4.1, paragraph 1 (“Red areas indicate evidence for the class, while blue indicates evidence against the class.”). With respect to the limitation of “qualitatively,” § 4.1, paragraph 1, quoted above, teaches the qualitative determinations of “for the class” and “against the class.” See also FIG. 1, as described in the caption: “The colors in the visualizations have the following meaning: red stands for evidence for the predicted class; blue regions are evidence against it. Transparent regions do not have an influence on the decision.” The Examiner notes that a color copy of this reference, showing red and blue regions, can be downloaded from the link given in the full citation of this reference in Form PTO-892 (Notice of References cited). In general, the qualitative determination of “for” or “against” the class depends on the sign. See third page, top paragraph: “The resulting relevance vector has positive and negative entries. A positive value means that the corresponding feature has contributed towards the class of interest. A negative value on the other hand means that the feature value was actually evidence against the class.” While this specific description characterizes an earlier work, Zintgraf is built upon this work (§ 3, first sentence) and uses the same general methodology of computing the WE (weight of evidence) as a difference of two odds (see Algorithm 1, 4th-to-last line), and teaches that its calculation is “signed” (i.e., positive/negative) (§ 4.4, paragraph 2). With respect to the limitation of “quantitatively,” the “relevance value” is a numerical value computed by an algorithm. Therefore, the determination thereof qualifies as being performed “quantitatively.” For example, in FIG. 4, the degree of redness/blueness indicates quantitative evaluation.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky and Zintgraf by modifying the system of Karlinsky such that “determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information.” The motivation for doing so would have been to identify areas in an image that provide evidence in favor of or against choosing a certain class (Zintgraf, abstract: “For image data for instance our method will highlight areas that provide evidence in favor of, and against choosing a certain class.”), particularly in a way that enables estimation of the importance of each pixel (Zintgraf, § 3: “allow us to estimate the importance of each pixel by assigning it a relevance value”).  
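For illustration only, and not as a reproduction of Zintgraf’s algorithm, the patch-averaging step quoted from § 3.2 and the sign-based “for”/“against” reading can be sketched as follows. The function names and the patch_relevance callback (a stand-in for whatever per-patch score, such as a weight of evidence, is computed) are hypothetical and do not appear in any cited reference.

```python
def average_patch_relevance(height, width, k, patch_relevance):
    """Turn per-patch scores into a per-pixel relevance map.

    Each pixel receives the mean score of every k-by-k patch that
    contains it. `patch_relevance(top, left)` is a hypothetical callback
    returning a signed score for the patch whose top-left corner is at
    (top, left). Assumes k <= height and k <= width.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in range(height - k + 1):
        for left in range(width - k + 1):
            score = patch_relevance(top, left)
            for r in range(top, top + k):
                for c in range(left, left + k):
                    total[r][c] += score
                    count[r][c] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]


def qualitative_label(value):
    """Read the sign of a relevance value as a qualitative category."""
    if value > 0:
        return "for"
    if value < 0:
        return "against"
    return "neutral"


# Toy example: one patch scores +1, the other three score -1.
relevance = average_patch_relevance(
    3, 3, 2, lambda top, left: 1.0 if (top, left) == (0, 0) else -1.0
)
```

In this sketch the numerical value of each entry is the quantitative aspect, while its sign gives the qualitative “for” or “against” reading.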
Freixenet, in an analogous art, teaches the remaining limitation (2) listed above. Freixenet generally relates to methods of “image segmentation…, a relevant research area in Computer Vision” (abstract, first sentence). Therefore, Freixenet is in the same field of endeavor as the claimed invention, namely image analysis using artificial intelligence models, and is also reasonably pertinent to the problems of image analysis.
In particular, Freixenet teaches “wherein the one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image” [§ 4.1: “…boundary-based and region-based performance evaluation schemes are proposed. The boundary-based approach evaluates segmentation in terms of both localization and shape accuracy of extracted regions, while the region-based approach assesses the segmentation quality in terms of both size and location of the segmented regions.” That is, extracted/segmented “regions” are first determined by algorithms, such as algorithms disclosed in § 2.1. A ground truth is also acquired, as further described below. Then, the regions are assessed for correctness by comparing them to a ground truth (corresponding to the “correct one or more causal portions” in the claim) using various methods to determine whether they match the ground truth. These regions are analogous to the “one or more causal portions” in the claim, since they are computer-determined regions of an image, and their techniques are applicable to the “one or more causal portions” of the claim (Examiner’s Note: The feature of “one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined” is already taught by Karlinsky, as set forth above.). In Freixenet, one comparison method is described in § 4.1, paragraphs 2-3 (heading: “Boundary-Based Evaluation”): “The boundary-based scheme is intended to evaluate segmentation quality in terms of the precision of the extracted region boundaries. Let B represent the boundary point set derived from the segmentation and GB the boundary ground truth…A distance distribution signature from a set B1 to a set B2 of boundary points, denoted by DB2B1, is a discrete function whose distribution characterizes the discrepancy, measured in distance, from B1 and B2…As a rule, a DB2B1 with a near-zero mean and a small standard deviation indicates high quality of the image segmentation.” Another method is described in § 4.1, paragraphs 4-7 (heading: “Region-Based Evaluation”): “The region-based scheme evaluates the segmentation accuracy in the number of regions, the locations and the sizes. Let the segmentation be S and the corresponding ground truth be GS. The goal is to quantitatively describe the degree of mismatch between them… A region-based performance measure based on normalized Hamming distance is defined… The smaller the degree of mismatch…”] and “wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” [§ 4.1, paragraph 1: “The evaluation of image segmentation is performed with several quantitative measures.” Since the ground truth is inputted into a quantitative comparison method, it is “acquired from a method.” Note that “method” in this instance does not require a specific technique of determining the one or more causal portions, but covers any method of acquiring the portions.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky and Zintgraf with the teachings of Freixenet by modifying the one or more functions to further comprise “determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image, and wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” The motivation would have been to evaluate the accuracy or quality of the determined regions, as suggested by Freixenet (§ 4.1, paragraph 1: “The boundary-based approach evaluates segmentation in terms of both localization and shape accuracy of extracted regions, while the region-based approach assesses the segmentation quality in terms of both size and location of the segmented regions.”).
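For illustration only, the general idea of checking a computer-determined region against a ground truth can be sketched as follows. This is a simplified stand-in and not the specific measures defined in Freixenet: the pixel-wise mismatch fraction below is not the normalized Hamming distance measure quoted above, and the nearest-point distance statistics only approximate the quoted distance distribution signature. All function names and the toy data are hypothetical.

```python
import math


def mismatch_fraction(region, ground_truth):
    """Fraction of pixels on which two same-sized binary masks disagree."""
    pairs = [
        (a, b)
        for row_a, row_b in zip(region, ground_truth)
        for a, b in zip(row_a, row_b)
    ]
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def boundary_distance_stats(boundary, gt_boundary):
    """Mean and standard deviation of the distance from each boundary
    point to its nearest ground-truth boundary point."""
    dists = [min(math.dist(p, q) for q in gt_boundary) for p in boundary]
    mean = sum(dists) / len(dists)
    variance = sum((d - mean) ** 2 for d in dists) / len(dists)
    return mean, math.sqrt(variance)


# Toy data: a 2x2 mask that differs from the ground truth in one pixel,
# and two boundary points, one of which lies 2 units from the ground truth.
region_mask = [[1, 1], [0, 0]]
truth_mask = [[1, 0], [0, 0]]
mismatch = mismatch_fraction(region_mask, truth_mask)
mean_dist, std_dist = boundary_distance_stats([(0, 0), (0, 3)], [(0, 0), (0, 1)])
```

Under this sketch, a smaller mismatch fraction, or a near-zero mean distance with a small standard deviation, would indicate closer agreement between the determined region and the ground truth.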
 
As to claim 2, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the information comprises a classification for a defect detected on the specimen. [Karlinsky, [0055]: “defect classification”; also described in, e.g., Table 1, [0096]]

As to claim 3, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the information comprises features of the image extracted by the deep learning model. [Since the DNN processes the image, each layer of the DNN extracts features of the image. For example, Karlinsky, [0102]: “PMB trains (504) the DNN to extract classification-related features.”]

As to claim 4, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the information comprises a simulated image generated from the image. [Karlinsky, [0060]: “reconstructing an image from one or more images from a different examination modality,” also described in FIG. 9 and [0127]-[0133].]

As to claim 5, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the information comprises one or more segmentation regions generated from the image. [Karlinsky, [0056]: “segmentation of the fabrication process image including partitioning of FP image into segments,” also described in FIG. 6 and [0107]-[0112]]

As to claim 6, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the information comprises a multi-dimensional output generated from the image. [Karlinsky, [0060]: “reconstructing an image from one or more images from a different examination modality,” also described in FIG. 9 and [0127]-[0133]; and/or [0056]: “segmentation of the fabrication process image including partitioning of FP image into segments,” also described in FIG. 6 and [0107]-[0112].]

As to claim 7, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the deep learning model is a trained deep learning model. [Karlinsky, abstract, teaching that the DNN has been “trained.”]

As to claim 8, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the deep learning model is further configured as a neural network. [Karlinsky, abstract, teaching a deep neural network (DNN).]

As to claim 9, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the one or more functions further comprise altering one or more parameters of the deep learning model based on the determined one or more causal portions. [Karlinsky, [0086] teaches “the illustrated training process can be cyclic, and can be repeated several times until the DNN is sufficiently trained.” See also [0093]: “PMB can adjust the next training cycle… Adjusting can include at least one of: updating the training set (e.g. updating ground truth data and/or augmentation algorithms, obtaining additional first training samples and/or augmented training samples, etc.)…” Since the augmented images in Karlinsky are used in subsequent training cycles, the limitation of “altering one or more parameters of the deep learning model” is taught by the combination of references.]

As to claim 24, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the one or more functions further comprise generating a data augmentation method for application to additional images input to the deep learning model. [Karlinsky, [0077]: “Generating (403) training set of images can include augmenting (411) at least part of the first training samples and including the augmented training samples in the generated training set…An augmented training sample is derived from a first training sample by augmenting one or more images in the first training sample.” Note that the input for training, including the augmented training samples, may be adjusted for each subsequent training cycle, as described in [0093]. Images input for subsequent training cycles may be regarded as “additional images input to the deep learning model.”]

As to claim 25, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the one or more functions further comprise identifying the one or more causal portions as one or more regions of interest in the image [The “defect bounding box” as taught in Karlinsky ([0066], Table 1 (near paragraph [0068]), [0090] and [0118]) constitutes a “region of interest” because a bounding box is a region on the image.] and tuning the deep learning model based on the one or more regions of interest. [Karlinsky, [0086] teaches “the illustrated training process can be cyclic, and can be repeated several times until the DNN is sufficiently trained.” See also [0093]: “PMB can adjust the next training cycle… Adjusting can include at least one of: updating the training set (e.g. updating ground truth data and/or augmentation algorithms, obtaining additional first training samples and/or augmented training samples, etc.)…” Since the augmented images in the combination of Karlinsky are used in subsequent training cycles to refine the model, the limitation of “tuning the deep learning model” is taught by the combination of references.]

As to claim 26, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the one or more functions further comprise identifying the one or more causal portions as one or more regions of interest in the image [The “defect bounding box” as taught in Karlinsky ([0066], Table 1 (near paragraph [0068]), [0090] and [0118]) constitutes a “region of interest” because a bounding box is a region on the image.] and training an additional deep learning model based on the one or more regions of interest. [Karlinsky, [0086] teaches “the illustrated training process can be cyclic, and can be repeated several times until the DNN is sufficiently trained.” See also [0093]: “PMB can adjust the next training cycle… Adjusting can include at least one of: updating the training set (e.g. updating ground truth data and/or augmentation algorithms, obtaining additional first training samples and/or augmented training samples, etc.)…” Since the augmented images in the combination of Karlinsky are used in subsequent training cycles to refine the model, the limitation of “training an additional deep learning model” is taught by the combination of references. It is noted that a subsequent version of a deep learning model constitutes an “additional” deep learning model to the extent required by the claim.]

As to claim 27, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the imaging tool is configured as an inspection tool. [Karlinsky, [0064]: “images of a part of a wafer or a photomask captured by SEM or an optical inspection system.” See also [0042]: “a low-resolution examination machine 101 (e.g. an optical inspection system, low-resolution SEM, etc.)….a high-resolution machine 102 (e.g. a subset of potential defect locations selected for review can be reviewed by a scanning electron microscope (SEM) or Atomic Force Microscopy (AFM)).” See also [0046].] 

As to claim 28, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the imaging tool is configured as a metrology tool. [Karlinsky, [0064]: “images of a part of a wafer or a photomask captured by SEM or an optical inspection system.” See also [0042]: “a low-resolution examination machine 101 (e.g. an optical inspection system, low-resolution SEM, etc.)….a high-resolution machine 102 (e.g. a subset of potential defect locations selected for review can be reviewed by a scanning electron microscope (SEM) or Atomic Force Microscopy (AFM)).” See also [0046]. Note that in the present context, the aforementioned inspection tools are considered to be “metrology tools” because they are used to inspect wafers.]

As to claim 29, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the imaging tool is configured as an electron beam based imaging tool. [Karlinsky, [0064]: “images of a part of a wafer or a photomask captured by SEM or an optical inspection system.” See also [0042]: “a low-resolution examination machine 101 (e.g. an optical inspection system, low-resolution SEM, etc.)….a high-resolution machine 102 (e.g. a subset of potential defect locations selected for review can be reviewed by a scanning electron microscope (SEM) or Atomic Force Microscopy (AFM)).” See also [0046].]

As to claim 30, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the imaging tool is configured as an optical based imaging tool. [Karlinsky, [0064]: “images of a part of a wafer or a photomask captured by SEM or an optical inspection system.” See also [0042]: “a low-resolution examination machine 101 (e.g. an optical inspection system, low-resolution SEM, etc.).” See also [0046].]

As to claim 31, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein the specimen is a wafer. [Karlinsky, [0064]: “images of a part of a wafer or a photomask captured by SEM or an optical inspection system.”]

As to claim 33, Karlinsky teaches a system configured to perform diagnostic functions for a deep learning model, [[0086] teaches diagnostic functions such as providing results of initial model training to a user and receiving user feedback.] comprising: 
an imaging tool configured for generating images of a specimen; [FIG. 1: low-resolution examination tool 101 and/or high-resolution examination tool 102, as described in [0042]: “a low-resolution examination machine 101 (e.g. an optical inspection system, low-resolution SEM, etc.)….a high-resolution machine 102 (e.g. a subset of potential defect locations selected for review can be reviewed by a scanning electron microscope (SEM) or Atomic Force Microscopy (AFM)).” See also [0046].]
one or more computer subsystems configured for acquiring the images; [FIG. 1: FPEI (Fabrication Process Examination Information) system 103 comprising a processor (block 104), as described in [0040]. As shown in figure 1, the imaging tool (low-resolution examination tool 101 and/or high-resolution examination tool 102), which may also be referred to as an inspection system, provides image data (inspection image data) to the FPEI system.]
The remaining limitations of “one or more components executed by the one or more computer subsystems, wherein the one or more components comprise: a deep learning model configured for determining information from an image generated for the specimen by the imaging tool; and a diagnostic component configured for determining one or more causal portions of the image that resulted in the information being determined and for performing one or more functions based on the determined one or more causal portions of the image, wherein the one or more functions comprise determining one or more characteristics of the one or more causal portions, wherein determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information, wherein the one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image, and wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system” are the same or substantially the same as the corresponding limitations recited in claim 1. Therefore, the rejection of claim 1 based on a combination of Karlinsky, Zintgraf, and Freixenet is applied to these remaining limitations. It is noted that the feature of “an image generated for the specimen by the imaging tool” (which differs from claim 1 with respect to the underlined portions) is taught by Karlinsky because the “one or more computer systems” acquires the image of the specimen obtained by the imaging tool.  

As to claim 34, Karlinsky teaches a non-transitory computer-readable medium, storing program instructions executable on one or more computer systems [[0021]: “non-transitory computer readable medium comprising instructions”] for performing a computer-implemented method for performing diagnostic functions for a deep learning model, [[0086] teaches diagnostic functions such as providing results of initial model training to a user and receiving user feedback.] wherein the computer-implemented method comprises: 
determining information from an image generated for a specimen by an imaging tool [[0008]-[0009]: a fabrication process (FP) image. [0064]: Such images may be “images of a part of a wafer or a photomask captured by SEM or an optical inspection system.” Note that Karlinsky generally relates to “examining a semiconductor specimen” (abstract). The limitation of “information” is addressed below.] by inputting the image to a deep learning model; [Abstract: a “Deep Neural Network (DNN) trained for a given examination-related application within a semiconductor fabrication process.” The DNN may have convolutional layers ([0053]), so as to be a convolutional neural network. Examples of information determined using the DNN include defect classification ([0055]), segmentation of process images ([0056]), and constructing images from one or more original images ([0060]). Furthermore, in the context of the instant claim, any output of any layer of the DNN may be considered to correspond to the instantly recited “information.” It is noted that the types of information determined by a trained model (as described in [0054]) may also be determined by the model during training (see description of training, [0071]-[0112]).]
determining one or more causal portions of the image that resulted in the information being determined by inputting the information to a diagnostic component; [A “defect bounding box,” as generally described in [0066], Table 1 (near paragraph [0068]), [0090] and [0118] corresponds to “one or more causal portions.” Paragraph [0066], Table 1, and [0118] teach that the bounding box can be a “DNN output” ([0068]), thereby teaching that the DNN performs an act of “determining” the defect bounding box. Furthermore, [0099] teaches that a bounding box can be supplied via user feedback. Since this feedback is then utilized to further train the neural network, as described in [0093], the act of “determining” is alternatively met by receipt of user feedback containing a bounding box.]
performing one or more functions based on the determined one or more causal portions of the image with the diagnostic component, wherein the one or more functions comprise determining one or more characteristics of the one or more causal portions; [The determination of a bounding box constitutes determining one or more characteristics thereof, since a bounding box has a position on the image. This is also described in Table 1: “Defect bounding box coordinate.” That is, the position of the bounding box corresponds to “one or more characteristics of the one or more causal portions.”]
wherein the deep learning model and the diagnostic component are included in one or more components executed by the one or more computer systems. [FIG. 1: FPEI (Fabrication Process Examination Information) system 103 comprising a processor (block 104), as described in [0040]]
Karlinsky does not teach: 
(1)	“wherein determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information”; and
(2) 	“wherein the one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image, wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” 
Zintgraf, in an analogous art, teaches limitation (1) listed above. Zintgraf relates to a “method to visualize deep neural networks,” and is therefore in the same field of endeavor (artificial intelligence, including deep learning). Zintgraf generally relates to determining and visualizing the importance of various parts of an input image. See § 1, paragraph 4: “We present a novel visualization method, exemplified for DCNNs, that finds and highlights the regions in image space that activate the nodes (hidden and output) in the neural network.”
In particular, Zintgraf teaches “wherein determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information” [§ 3, paragraph 1: “For a given input image, the method will allow us to estimate the importance of each pixel by assigning it a relevance value.” § 3.2: “an individual pixel’s relevance is obtained by taking the average relevance obtained from the different patches it was in.” With respect to the limitation of “importance…in contributing to said determining the information,” note that importance/relevance in Zintgraf refers to relevance to predicting a class. See § 3.1, paragraph 3 (“For a feature to become relevant when using conditional sampling, it now has to satisfy two conditions: being relevant to predict the class of interest, and be hard to predict from the neighboring pixels.”) and § 4.1, paragraph 1 (“Red areas indicate evidence for the class, while blue indicates evidence against the class.”). With respect to the limitation of “qualitatively,” § 4.1, paragraph 1, quoted above, teaches the qualitative determinations of “for the class” and “against the class.” See also FIG. 1, as described in the caption: “The colors in the visualizations have the following meaning: red stands for evidence for the predicted class; blue regions are evidence against it. Transparent regions do not have an influence on the decision.” The Examiner notes that a color copy of this reference, showing red and blue regions, can be downloaded from the link given in the full citation of this reference in Form PTO-892 (Notice of References Cited). In general, the qualitative determination of “for” or “against” the class depends on the sign. See third page, top paragraph: “The resulting relevance vector has positive and negative entries. 
A positive value means that the corresponding feature has contributed towards the class of interest. A negative value on the other hand means that the feature value was actually evidence against the class.” While this specific description characterizes an earlier work, Zintgraf is built upon this work (§ 3, first sentence) and uses the same general methodology of computing the WE (weight of evidence) as a difference of two odds (see Algorithm 1, 4th-to-last line), and teaches that its calculation is “signed” (i.e., positive/negative) (§ 4.4, paragraph 2). With respect to the limitation of “quantitatively,” the “relevance value” is a numerical value computed by an algorithm. Therefore, the determination thereof qualifies as being performed “quantitatively.” For example, in FIG. 4, the degree of redness/blueness indicates quantitative evaluation.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky and the teaching of Zintgraf by modifying Karlinsky such that “determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information.” The motivation for doing so would have been to identify areas in an image that provide evidence in favor of or against choosing a certain class (Zintgraf, abstract: “For image data for instance our method will highlight areas that provide evidence in favor of, and against choosing a certain class.”), particularly in a way that enables estimation of the importance of each pixel (Zintgraf, § 3: “allow us to estimate the importance of each pixel by assigning it a relevance value”).
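For illustration only, the per-pixel relevance computation cited from Zintgraf above can be sketched as follows. This is a minimal sketch and not Zintgraf's implementation: it assumes the weight of evidence takes a log-odds form (the log-odds of the class with a feature minus the log-odds without it), and the function names `log_odds`, `weight_of_evidence`, and `average_patch_relevance` are hypothetical. The sampling step by which Zintgraf estimates the class probability without a patch is omitted; the signed patch relevance values are simply supplied as inputs.

```python
import math


def log_odds(p):
    """Log-odds (base 2) of a probability strictly between 0 and 1."""
    return math.log2(p / (1.0 - p))


def weight_of_evidence(p_with, p_without):
    """Signed relevance: positive when the feature raised the class
    probability, negative when it lowered it (assumed log-odds form)."""
    return log_odds(p_with) - log_odds(p_without)


def average_patch_relevance(patch_relevance, height, width, k):
    """Average the relevance of every k-by-k patch that covers a pixel.

    patch_relevance maps the (row, col) of a patch's top-left corner to
    that patch's signed relevance value (hypothetical input format).
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (r0, c0), value in patch_relevance.items():
        for r in range(r0, r0 + k):
            for c in range(c0, c0 + k):
                total[r][c] += value
                count[r][c] += 1
    # Pixels covered by no patch are given zero relevance.
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]


# Two overlapping 2x2 patches on a 2x3 image, one "for" and one "against".
relevance = average_patch_relevance({(0, 0): 1.0, (0, 1): -1.0}, 2, 3, 2)
```

In this sketch the sign of each entry gives the qualitative reading (evidence for or against the class) and its magnitude gives the quantitative reading.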
Freixenet, in an analogous art, teaches the remaining limitation (2) listed above. Freixenet generally relates to methods of “image segmentation…, a relevant research area in Computer Vision” (abstract, first sentence). Therefore, Freixenet is in the same field of endeavor as the claimed invention, namely image analysis using artificial intelligence models, and is also reasonably pertinent to the problems of image analysis.
In particular, Freixenet teaches “wherein the one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image” [§ 4.1: “…boundary-based and region-based performance evaluation schemes are proposed. The boundary-based approach evaluates segmentation in terms of both localization and shape accuracy of extracted regions, while the region-based approach assesses the segmentation quality in terms of both size and location of the segmented regions.” That is, extracted/segmented “regions” are first determined by algorithms, such as algorithms disclosed in § 2.1. A ground truth is also acquired, as further described below. Then, the regions are assessed for correctness by comparing them to a ground truth (corresponding to the “correct one or more causal portions” in the claim) using various methods to determine whether they match the ground truth. These regions are analogous to the “one or more causal portions” in the claim, since they are computer-determined regions of an image, and Freixenet’s evaluation techniques are applicable to the “one or more causal portions” of the claim (Examiner’s Note: The feature of “one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined” is already taught by Karlinsky, as set forth above.). One comparison method is described in § 4.1, paragraphs 2-3 (heading: “Boundary-Based Evaluation”): “The boundary-based scheme is intended to evaluate segmentation quality in terms of the precision of the extracted region boundaries. 
Let B represent the boundary point set derived from the segmentation and GB the boundary ground truth…A distance distribution signature from a set B1 to a set B2 of boundary points, denoted by DB2B1, is a discrete function whose distribution characterizes the discrepancy, measured in distance, from B1 and B2…As a rule, a DB2B1 with a near-zero mean and a small standard deviation indicates high quality of the image segmentation.” Another method is described in § 4.1, paragraphs 4-7 (heading: “Region-Based Evaluation”): “The region-based scheme evaluates the segmentation accuracy in the number of regions, the locations and the sizes. Let the segmentation be S and the corresponding ground truth be GS. The goal is to quantitatively describe the degree of mismatch between them… A region-based performance measure based on normalized Hamming distance is defined… The smaller the degree of mismatch…”] and “wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” [§ 4.1, paragraph 1: “The evaluation of image segmentation is performed with several quantitative measures.” Since the ground truth is inputted into a quantitative comparison method, it is “acquired from a method.” Note that “method” in this instance does not require a specific technique of determining the one or more causal portions, but covers any method of acquiring the portions.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky and Zintgraf with the teachings of Freixenet by modifying the one or more functions to further comprise “determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image, wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” The motivation would have been to evaluate the accuracy or quality of the determined regions, as suggested by Freixenet (§ 4.1, paragraph 1: “The boundary-based approach evaluates segmentation in terms of both localization and shape accuracy of extracted regions, while the region-based approach assesses the segmentation quality in terms of both size and location of the segmented regions.”).
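For illustration only, the two ground-truth comparisons cited from Freixenet above can be sketched as follows. This is a minimal sketch under stated assumptions, not Freixenet's measures as published: the boundary comparison takes, for each extracted boundary point, the Euclidean distance to the nearest ground-truth boundary point, and the region comparison is simplified to the fraction of pixels on which two binary masks disagree rather than the normalized Hamming distance referenced in the quotation. The function names `distance_signature`, `mean_and_std`, and `mismatch_fraction` are hypothetical.

```python
import math


def distance_signature(boundary, truth_boundary):
    """Distance from each extracted boundary point to the nearest
    ground-truth boundary point (points are (row, col) tuples)."""
    return [min(math.dist(p, q) for q in truth_boundary) for p in boundary]


def mean_and_std(values):
    """Mean and population standard deviation of a list of distances."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def mismatch_fraction(mask, truth_mask):
    """Fraction of pixels where two equally sized binary masks differ
    (a simplified stand-in for a region-based mismatch measure)."""
    differing = 0
    total = 0
    for row, truth_row in zip(mask, truth_mask):
        for value, truth_value in zip(row, truth_row):
            differing += int(value != truth_value)
            total += 1
    return differing / total


# Boundary comparison: one point matches exactly, one is one pixel away.
signature = distance_signature([(0, 0), (0, 1)], [(0, 0), (0, 3)])
boundary_mean, boundary_std = mean_and_std(signature)

# Region comparison: one of four pixels disagrees with the ground truth.
region_mismatch = mismatch_fraction([[1, 0], [1, 1]], [[1, 0], [0, 1]])
```

A mean and standard deviation near zero, or a mismatch fraction near zero, would correspond to a determined region that matches the ground truth.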

As to claim 35, this claim is directed to a computer-implemented method for performing diagnostic functions for a deep learning model comprising operations that are the same or substantially the same as those recited in claim 34. Therefore, the rejection of claim 34 is applied to claim 35.

As to claim 36, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein qualitatively identifying the importance of said each pixel comprises assigning a positive value for the importance of any of said each pixel that positively contributes to said determining the information. [As discussed in the rejection of claim 1, in Zintgraf, the qualitative determination of “for” or “against” the class depends on the sign. See third page, top paragraph: “The resulting relevance vector has positive and negative entries. A positive value means that the corresponding feature has contributed towards the class of interest. A negative value on the other hand means that the feature value was actually evidence against the class.” While this specific description characterizes an earlier work, Zintgraf is built upon this work (Zintgraf, § 3, first sentence) and uses the same general methodology of computing the WE (weight of evidence) as a difference of two odds (see Zintgraf, Algorithm 1, 4th-to-last line), and teaches that its calculation is “signed” (i.e., positive/negative) (Zintgraf, § 4.4, paragraph 2). Therefore, Zintgraf also uses positive and negative values for the relevance.]

As to claim 37, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein qualitatively identifying the importance of said each pixel comprises assigning a negative value for the importance of any of said each pixel that negatively contributes to said determining the information. [As discussed in the rejection of claim 1, in Zintgraf, the qualitative determination of “for” or “against” the class depends on the sign. See third page, top paragraph: “The resulting relevance vector has positive and negative entries. A positive value means that the corresponding feature has contributed towards the class of interest. A negative value on the other hand means that the feature value was actually evidence against the class.” While this specific description characterizes an earlier work, Zintgraf is built upon this work (Zintgraf, § 3, first sentence) and uses the same general methodology of computing the WE (weight of evidence) as a difference of two odds (see Zintgraf, Algorithm 1, 4th-to-last line), and teaches that its calculation is “signed” (i.e., positive/negative) (Zintgraf, § 4.4, paragraph 2). Therefore, Zintgraf also uses positive and negative values for the relevance.]

As to claim 39, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein determining if the one or more causal portions are the correct one or more causal portions of the image comprises comparing the determined one or more causal portions to the correct one or more causal portions of the image to determine if the determined one or more causal portions match the correct one or more causal portions. [As noted in the rejection of claim 1, Freixenet teaches comparing regions to a ground truth. Freixenet, § 4.1 paragraphs 2-3 (heading: “Boundary-Based Evaluation”): “The boundary-based scheme is intended to evaluate segmentation quality in terms of the precision of the extracted region boundaries. Let B represent the boundary point set derived from the segmentation and GB the boundary ground truth…A distance distribution signature from a set B1 to a set B2 of boundary points, denoted by DB2B1, is a discrete function whose distribution characterizes the discrepancy, measured in distance, from B1 and B2…As a rule, a DB2B1 with a near-zero mean and a small standard deviation indicates high quality of the image segmentation.” That is, a zero mean and zero standard deviation indicates the situation in which “one or more causal portions match the correct one or more causal portions.” Thus, the comparison determines “if” (i.e., whether or not) such a match occurs. The instant limitation is also met in the region-based evaluation taught in Freixenet, § 4.1 paragraphs 4-7 (heading: “Region-Based Evaluation”): “The region-based scheme evaluates the segmentation accuracy in the number of regions, the locations and the sizes. Let the segmentation be S and the corresponding ground truth be GS. 
The goal is to quantitatively describe the degree of mismatch between them… A region-based performance measure based on normalized Hamming distance is defined… The smaller the degree of mismatch…” Note that a low degree of mismatch corresponds to the situation in which “one or more causal portions match the correct one or more causal portions.”]

As to claim 40, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, wherein determining if the one or more causal portions are the correct one or more causal portions of the image comprises comparing the determined one or more causal portions to the correct one or more causal portions of the image to determine differences between the determined one or more causal portions and the correct one or more causal portions. [As noted in the rejection of claim 1, Freixenet teaches comparing regions to a ground truth. Freixenet, § 4.1 paragraphs 2-3 (heading: “Boundary-Based Evaluation”): “The boundary-based scheme is intended to evaluate segmentation quality in terms of the precision of the extracted region boundaries. Let B represent the boundary point set derived from the segmentation and GB the boundary ground truth…A distance distribution signature from a set B1 to a set B2 of boundary points, denoted by DB2B1, is a discrete function whose distribution characterizes the discrepancy, measured in distance, from B1 and B2…As a rule, a DB2B1 with a near-zero mean and a small standard deviation indicates high quality of the image segmentation.” The instant limitation is also met in the region-based evaluation taught in Freixenet, § 4.1 paragraphs 4-7 (heading: “Region-Based Evaluation”): “The region-based scheme evaluates the segmentation accuracy in the number of regions, the locations and the sizes. Let the segmentation be S and the corresponding ground truth be GS. The goal is to quantitatively describe the degree of mismatch between them… A region-based performance measure based on normalized Hamming distance is defined… The smaller the degree of mismatch…”]

2.	Claim 16 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet, and further in view of Samek et al., “Evaluating the visualization of what a Deep Neural Network has learned,” arXiv: 1509.06321, September 21, 2015, 13 pages (“Samek”) (cited by applicant).
As to claim 16, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach that the diagnostic component is further configured for determining the one or more causal portions by “causal back propagation using a layer-wise relevance propagation.”
Samek teaches “causal back propagation performed using a layer-wise relevance propagation.” Samek generally relates to visualization of what a Deep Neural Network has learned (title and abstract). Therefore, Samek is in the field of machine learning and is also pertinent to image analysis using machine learning models. 
In particular, Samek teaches “causal back propagation performed using a layer-wise relevance propagation” [§ III, paragraph 4: “we review three recent methods for computing heatmaps, all of them performing a backward propagation pass on the network: 1) a sensitivity analysis based on neural network partial derivatives; 2) the so-called deconvolution method; and 3) the LRP algorithm.” Note that LRP refers to “Layer-wise relevance propagation (LRP)” as described in § II.C (“Relevance Heatmaps”).]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, and Freixenet with the further teachings of Samek by further configuring the diagnostic component for determining the one or more causal portions by causal back propagation performed using a layer-wise relevance propagation, in order to provide a better explanation of what made a DNN arrive at a particular classification decision, as suggested by Samek (abstract, near end: “layer-wise relevance propagation algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method”).
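For illustration only, a backward relevance pass of the kind referred to above can be sketched for a single linear layer. This is a minimal sketch using one commonly described propagation rule (an epsilon-stabilized rule without bias terms); the passages quoted from Samek do not state which rule variant is used, so the choice of rule is an assumption, and the function name `lrp_epsilon` and its parameters are hypothetical.

```python
def lrp_epsilon(activations, weights, relevance_out, eps=1e-9):
    """Redistribute output relevance to the inputs of one linear layer.

    activations:   input activations a_i
    weights:       weights[i][j] connects input i to output j
    relevance_out: relevance R_j assigned to each output
    Each input receives a share of R_j proportional to its contribution
    a_i * w_ij to the pre-activation z_j (assumed epsilon rule, no bias).
    """
    n_in = len(activations)
    n_out = len(relevance_out)
    # Pre-activation of each output neuron.
    z = [
        sum(activations[i] * weights[i][j] for i in range(n_in))
        for j in range(n_out)
    ]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # Small stabilizer with the sign of z_j avoids division by zero.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            share = activations[i] * weights[i][j] / denom
            relevance_in[i] += share * relevance_out[j]
    return relevance_in


# Two inputs, two outputs; the output relevance is set to the pre-activations.
input_relevance = lrp_epsilon([1.0, 2.0], [[1.0, -1.0], [0.5, 1.0]], [2.0, 1.0])
```

In this sketch the total relevance is approximately conserved from the outputs to the inputs, and applying the same step layer by layer back to the input image would yield a per-pixel relevance heatmap.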

3.	Claims 11-12 are rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet, and further in view of Harada et al. (US 2013/0294680A1) (“Harada”).
As to claim 11, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, and teaches the limitations of “at least the image, the determined information, and the determined one or more causal portions” as set forth in the rejection of claim 1, above. 
However, Karlinsky, Freixenet, and Zintgraf do not specifically teach, before the priority date of the claimed invention, the detail of a “visualization component for displaying…to a user” of the instant claim.
Harada, in an analogous art, teaches a “visualization component for displaying…to the user or an additional user.” Harada generally relates to “classifying an image picked up of a defect on a semiconductor wafer” (abstract) involving neural network classification ([0040]). Therefore, Harada is in the field of machine learning, particularly the application of machine learning models to semiconductor manufacturing applications.
In particular, Harada teaches a “visualization component” [Graphical user interfaces, as shown in FIGS. 10-12. See [0027]: “FIG. 11 illustrates another example of GUI exemplifying a defect area according to Embodiment 1 of this invention.” See [0045]: “a GUI (graphic user interface) which enables the modification of parameters. An example of GUI that can ascertain and modify the image processing parameters is shown in FIG. 12. In FIG. 12 are shown a list 1201 of observation apparatuses to be selected; a list 1202 for selecting a defect image; a window 1203 for displaying a perfect image corresponding a selected defect image; a window 1204 for displaying a selected defect image; a window 1205 for displaying the extracted defect image obtained with the preset parameters; and an interface 1206 for adjusting the values of parameters.”] “for displaying…to the user or an additional user” information relevant to wafer image analysis. Furthermore, such information may include an “image” [Defect image, shown in FIGS. 10-12], “determined information” [e.g., classified defects with more than one parameter, as shown in FIG. 11, or classified defects shown in FIG. 12], and information analogous to the “determined one or more causal portions” [Extracted defect areas, as shown in FIG. 11 using a drawn border, and portions of the image with defects, as shown in FIG. 12].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, and Freixenet with the teachings of Harada by implementing a graphical user interface as taught by Harada, such that the one or more components further comprise a visualization component configured for displaying at least the image, the determined information, and the determined one or more causal portions to the user or an additional user. The motivation would have been to present relevant information to a user in a visual manner as suggested by Harada (e.g., FIG. 11), and/or to enable user interaction as suggested by Harada ([0045]: “GUI that can ascertain and modify the image processing parameters”).

As to claim 12, the combination of Karlinsky, Freixenet, Zintgraf, and Harada teaches the system of claim 11, wherein the one or more components further comprise a user interface component [Harada: Graphical user interfaces as shown in FIGS. 10-12] configured to receive input from the user or the additional user after said displaying [For example, as shown in FIG. 12 of Harada, the user may select a defect image], and wherein the one or more functions performed by the diagnostic component are determined based on the input from the user or the additional user. [Harada teaches displaying the selected defect image in windows 1203-1205 in [0045]: “a list 1202 for selecting a defect image; a window 1203 for displaying a perfect image corresponding a selected defect image; a window 1204 for displaying a selected defect image.” The Examiner notes that the instant claim language does not require a specific function of the “one or more functions” or specific operation of the function to be determined based on said user input, but only requires a function to be determined by the user input. Furthermore, “based on” does not require a specific relationship. Therefore, an input of selecting a defect image reads on the instant limitation of being a basis for a functionality.]
The motivation for combining the teachings of Karlinsky, Zintgraf, and Freixenet with the teachings of Harada given for parent claim 11 also applies to the instant claim.

4.	Claim 13 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet and further in view of Simonyan et al., “Deep inside convolutional networks: Visualising image classification models and saliency maps,” presented at International Conference on Learning Representations (ICLR) Workshop 2014, April 19, 2014, 8 pages (“Simonyan”) (cited by applicant).
As to claim 13, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach that the diagnostic component is further configured for determining the one or more causal portions by “computing a local sensitivity.”
Simonyan, in an analogous art, teaches determining the one or more causal portions by “computing a local sensitivity.” Simonyan generally pertains to visualization of image classification models. Therefore, Simonyan is in the field of machine learning, and is pertinent to image classification applications of machine learning.
In particular, Simonyan teaches determining the one or more causal portions by “computing a local sensitivity” [Abstract: “We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image.” In particular, the derivative “w” of Sc with respect to the image I at the point (image) I0 may be computed as a local sensitivity, as shown in eq. (4) on page 2. Then, as described in § 3.1, a class salience value is derived for each pixel based on “w”.]. 
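For illustration of the type of computation referred to above, the following is a minimal sketch of a gradient-based local sensitivity for a single-channel image. It is not taken from Simonyan: the score function toy_score, the weights WEIGHTS, and the finite-difference estimate of the derivative “w” are assumptions introduced here in place of a trained network and backpropagation.

```python
import numpy as np

def numerical_gradient(score_fn, image, eps=1e-5):
    """Central-difference estimate of d(score)/d(image) at the given image."""
    grad = np.zeros_like(image, dtype=float)
    for idx in np.ndindex(image.shape):
        step = np.zeros_like(image, dtype=float)
        step[idx] = eps
        grad[idx] = (score_fn(image + step) - score_fn(image - step)) / (2 * eps)
    return grad

def saliency_map(score_fn, image):
    """Per-pixel saliency taken as the magnitude of the local derivative w."""
    return np.abs(numerical_gradient(score_fn, image))

# Hypothetical toy class score: only the centre pixel of a 3x3 image matters.
WEIGHTS = np.zeros((3, 3))
WEIGHTS[1, 1] = 2.0

def toy_score(image):
    return float(np.sum(WEIGHTS * image))

image0 = np.ones((3, 3))           # the point I0 at which sensitivity is evaluated
sal = saliency_map(toy_score, image0)
```

In this toy case only the centre pixel receives a non-zero saliency value, which is the sense in which the derivative localizes the portion of the image that influences the score.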
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, and Freixenet with the further teachings of Simonyan by further configuring the diagnostic component for “determining the one or more causal portions by computing a local sensitivity,” in order to visualize image classification models, particularly to compute a saliency map specific to a given image and class, as suggested by Simonyan (abstract).

5.	Claims 14-15 are rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet, and further in view of Zeiler et al., “Visualizing and understanding convolutional networks,” European Conference on Computer Vision (ECCV) 2014, September 2014, pp. 818-833 (“Zeiler”) (cited by applicant).
As to claim 14, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach the further limitations of the instant claim.
Zeiler, in an analogous art, teaches the further limitations of claim 14. Zeiler generally relates to convolutional networks used for image classification (abstract). Therefore, Zeiler is analogous art in being in the field of machine learning, and is also pertinent to image classification applications of machine learning.
In particular, Zeiler teaches wherein the diagnostic component is further configured for determining the one or more causal portions by causal back propagation. [Zeiler, § 2.1: “We present a novel way to map these activities back to the input pixel space, showing what input pattern originally caused a given activation in the feature maps. We perform this mapping with a Deconvolutional Network (deconvnet).” It is noted that page 29, lines 9-10 of the instant application states that “deconvolution heatmap can be viewed as a specific implementation of causal backpropagation.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, and Freixenet with the teachings of Zeiler by modifying the combination of Karlinsky, Zintgraf, and Freixenet such that the diagnostic component is further configured for determining the one or more causal portions by causal back propagation. One of ordinary skill in the art would have been motivated to do so in order to obtain an understanding of the operation of the layers of the model for diagnosis of the model and to improve or adjust the model based on such understanding, as suggested by Zeiler (§ 1 paragraph 2: “a visualization technique that reveals the input stimuli that excite individual feature maps at any layer in the model. It also allows us to observe the evolution of features during training and to diagnose potential problems with the model”; see also abstract and § 6 of Zeiler).

As to claim 15, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach the further limitations of the instant claim.
Zeiler, in an analogous art, teaches the further limitations of claim 15. Zeiler generally relates to convolutional networks used for image classification (abstract). Therefore, Zeiler is analogous art in being in the field of machine learning, and is also pertinent to image classification applications of machine learning.
In particular, Zeiler teaches wherein the diagnostic component is further configured for determining the one or more causal portions by causal back propagation performed using a deconvolution heatmap algorithm. [Zeiler, § 2.1: “We present a novel way to map these activities back to the input pixel space, showing what input pattern originally caused a given activation in the feature maps. We perform this mapping with a Deconvolutional Network (deconvnet).” It is noted that page 29, lines 9-10 of the instant application states that “deconvolution heatmap can be viewed as a specific implementation of causal backpropagation.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, and Freixenet with the teachings of Zeiler by modifying the combination of Karlinsky, Zintgraf, and Freixenet such that the diagnostic component is further configured for determining the one or more causal portions by causal back propagation performed using a deconvolution heatmap algorithm. One of ordinary skill in the art would have been motivated to do so in order to obtain an understanding of the operation of the layers of the model for diagnosis of the model and to improve or adjust the model based on such understanding, as suggested by Zeiler (§ 1 paragraph 2: “a visualization technique that reveals the input stimuli that excite individual feature maps at any layer in the model. It also allows us to observe the evolution of features during training and to diagnose potential problems with the model”; see also abstract and § 6 of Zeiler).

6.	Claim 17 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet and further in view of Shrikumar et al., “Not Just A Black Box: Learning Important Features through propagating activation differences,” arXiv: 1605.01713, April 11, 2017, 6 pages (“Shrikumar”) (cited by applicant).
As to claim 17, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach that the diagnostic component is further configured for determining the one or more causal portions specifically by “causal back propagation performed using a deep lift algorithm.” 
Shrikumar, in an analogous art, teaches “causal back propagation performed using a deep lift algorithm.” Shrikumar generally relates to analyzing the activation of neural network outputs, with applications to image analysis (see abstract). Therefore, Shrikumar is in the field of machine learning and is also pertinent to image analysis using machine learning models.
In particular, Shrikumar teaches “causal back propagation performed using a deep lift algorithm” [Abstract: “we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference.” With respect to the limitation of “causal back propagation,” § 2.3 (“Backpropagation Rules”) teaches backpropagation to find contribution (i.e., causal) scores: “The computation is reminiscent of the chain rule used during gradient backpropagation, as equation 2 makes it possible to start with contribution scores of later layers and use them to find the contribution scores of preceding layers.”]
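As a simplified illustration of assigning contribution scores according to the difference from a reference, the sketch below handles only a single linear unit. It is an assumption-laden toy, not the full backpropagation rules of Shrikumar: the names linear_unit and linear_contributions, and the weights, bias, and all-zero reference, are hypothetical.

```python
import numpy as np

def linear_unit(weights, bias, x):
    """Output of a single linear unit."""
    return float(np.dot(weights, x) + bias)

def linear_contributions(weights, x, reference):
    """Contribution of each input: weight times difference from the reference input."""
    return weights * (x - reference)

weights = np.array([2.0, -1.0, 0.5])   # hypothetical weights
bias = 0.1                             # hypothetical bias
x = np.array([1.0, 3.0, 4.0])
reference = np.zeros_like(x)           # assumed reference input

contrib = linear_contributions(weights, x, reference)
delta_out = linear_unit(weights, bias, x) - linear_unit(weights, bias, reference)
```

For this linear case the contributions sum to the difference between the output and the reference output, so each score can be read as that input's share of the change.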
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, and Freixenet with the further teachings of Shrikumar by further configuring the diagnostic component for determining the one or more causal portions by causal back propagation performed using a deep lift algorithm, in order to efficiently and effectively compute importance scores in a neural network, as suggested by Shrikumar (abstract).

7. 	Claim 18 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet and further in view of Lin et al., “Network In Network,” arXiv: 1312.4400, March 4, 2014, 10 pages (“Lin”) (cited by applicant).
As to claim 18, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach that the diagnostic component is further configured for determining the one or more causal portions by “global average pooling.”
Lin, in an analogous art, teaches “global average pooling.” Lin generally pertains to deep network structures (abstract), with applicability to image processing (§ 4.2). Therefore, Lin is in the field of machine learning, particularly image analysis using machine learning models.
In particular, Lin teaches “global average pooling” [§ 3.2 (“Global Average Pooling”): “we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN….One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer…” (§ 3.2, paragraph 3)]. 
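The pooling operation quoted above can be sketched as follows. This is a minimal illustration only; the array shapes, the example values, and the helper names global_average_pooling and softmax are assumptions, with one feature map per category.

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Average each (H, W) feature map to one value; input shape is (C, H, W)."""
    return feature_maps.mean(axis=(1, 2))

def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Two hypothetical 2x2 feature maps, one per category.
feature_maps = np.zeros((2, 2, 2))
feature_maps[0] = [[1.0, 3.0], [5.0, 7.0]]   # mean 4.0
feature_maps[1] = [[0.0, 0.0], [0.0, 4.0]]   # mean 1.0

pooled = global_average_pooling(feature_maps)  # one value per category, no parameters
probs = softmax(pooled)
```

Because each pooled value depends on exactly one feature map, that map can be inspected directly as a confidence map for the corresponding category.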
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Karlinsky, Zintgraf, and Freixenet such that the diagnostic component is further configured for determining the one or more causal portions by global average pooling, in order to implement a functionality that enables feature maps to be easily interpreted as categories confidence maps, as suggested by Lin (§ 3.2, paragraph 3, quoted above).

8.	Claim 19 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet and further in view of Sundararajan et al., “Axiomatic Attribution for Deep Networks,” arXiv: 1703.01365, June 13, 2017, 11 pages (“Sundararajan”) (cited by applicant).
As to claim 19, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach that the diagnostic component is further configured for determining the one or more causal portions by “computing a path integral on gradients.”
Sundararajan, in an analogous art, teaches “computing a path integral on gradients.” Sundararajan pertains to techniques in attributing the prediction of a deep network to its input features (abstract), wherein its techniques may be applied to object recognition networks (§ 6.1). Therefore, Sundararajan is in the field of machine learning, particularly image analysis using machine learning models.
In particular, Sundararajan teaches “computing a path integral on gradients” [§ 3 (“3. Our Method: Integrated Gradients”): “We consider the straightline path (in Rn) from the baseline x0 to the input x, and compute the gradients at all points along the path.” See Eq. (1).]
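A minimal numerical sketch of accumulating gradients along the straight-line path from a baseline to an input is given below. It approximates the integral with a midpoint Riemann sum; the model toy_model, its analytic gradient toy_grad, the all-zero baseline, and the step count are assumptions made for illustration and are not taken from Sundararajan.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of the path integral of gradients."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoints in (0, 1)
    grads = np.array([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    avg_grad = grads.mean(axis=0)                       # average gradient along the path
    return (x - baseline) * avg_grad

# Hypothetical model F(x) = sum(x^2) with analytic gradient 2x.
def toy_model(x):
    return float(np.sum(x ** 2))

def toy_grad(x):
    return 2.0 * x

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(toy_grad, x, baseline)
```

For this toy model the attributions sum to the difference between the model output at the input and at the baseline, and the computation only queries gradients without modifying the model.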
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Karlinsky, Zintgraf, and Freixenet such that the diagnostic component is further configured for determining the one or more causal portions by computing a path integral on gradients, in order to implement a functionality that enables attributing the prediction of a deep network to its input features in a method that requires no modification to the original network and is simple to implement, as suggested by Sundararajan (abstract). 

9.	Claims 20-21 are rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet and further in view of Friedman, “Greedy Function Approximation: A Gradient Boosting Machine,” The Annals of Statistics, 29(5): 1189-1232, May 1999 (“Friedman”) (cited by applicant).
As to claim 20, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach that the diagnostic component is further configured for determining the one or more causal portions by “computing a partial dependence plot.”
Friedman, in an analogous art, teaches “computing a partial dependence plot.” Friedman teaches mathematical techniques that “can be used to help interpret models produced by any ‘black box’ prediction method, such as neural networks, support vector machines, nearest neighbors, radial basis functions, etc.” (page 1220, bottom paragraph). Therefore, Friedman is in the field of machine learning or is pertinent to applications that use machine learning models such as neural networks.
In particular, Friedman teaches “computing a partial dependence plot” [§ 8.2 (“Partial dependence plots”) at page 1219. In particular, eq. (51) on page 1220 defines the partial dependence function, and examples of partial dependence plots are shown in FIG. 8 (p. 1224), and FIG. 9 (p. 1225).] Friedman teaches that “Partial dependence functions (51) can be used to help interpret models produced by any ‘black box’ prediction method, such as neural networks, support vector machines, nearest neighbors, radial basis functions, etc.” (page 1220, bottom paragraph).
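One common way to estimate a partial dependence function from data, sketched here under stated assumptions, is to fix the feature of interest at each grid value and average the model output over the remaining features of the data set. The model toy_model, the data, and the grid are hypothetical, and the sketch is not a reproduction of Friedman’s equations.

```python
import numpy as np

def partial_dependence(model_fn, data, feature_index, grid):
    """Average model output over the data with one feature fixed at each grid value."""
    values = []
    for v in grid:
        modified = data.copy()
        modified[:, feature_index] = v      # hold the chosen feature fixed
        values.append(float(np.mean(model_fn(modified))))
    return np.array(values)

# Hypothetical black-box model of two features.
def toy_model(X):
    return 3.0 * X[:, 0] + X[:, 1] ** 2

data = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])
grid = np.array([0.0, 1.0, 2.0])
pd_values = partial_dependence(toy_model, data, 0, grid)
```

Plotting pd_values against grid yields the partial dependence plot for the first feature; here it is a straight line of slope 3, which reflects how the model depends on that feature.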
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Karlinsky, Zintgraf, and Freixenet such that the diagnostic component is further configured for determining the one or more causal portions by computing a partial dependence plot, in order to implement a functionality that helps interpret neural networks, as suggested by Friedman (page 1220, bottom paragraph). 

As to claim 21, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach that the diagnostic component is further configured for determining the one or more causal portions by “computing a partial dependence plot with path integral.”
Friedman, in an analogous art, teaches “computing a partial dependence plot with path integral.” Friedman teaches mathematical techniques that “can be used to help interpret models produced by any ‘black box’ prediction method, such as neural networks, support vector machines, nearest neighbors, radial basis functions, etc.” (page 1220, bottom paragraph). Therefore, Friedman is in the field of machine learning or is pertinent to applications that use machine learning models such as neural networks.
In particular, Friedman teaches “computing a partial dependence plot with path integral” [§ 8.2 (“Partial dependence plots”) at page 1219. In particular, eq. (51) on page 1220 defines the partial dependence function, and examples of partial dependence plots are shown in FIG. 8 (p. 1224), and FIG. 9 (p. 1225). With respect to the limitation of “path integral,” it is noted that eq. (51) of Friedman teaches an integral. Since this integral would be evaluated over some range of values for the variable of integration, Friedman is considered to teach computation with a path integral, given that the claim does not require, for example, any particular characteristics of the “path” or a particular manner of selecting the path. ] Friedman teaches that “Partial dependence functions (51) can be used to help interpret models produced by any ‘black box’ prediction method, such as neural networks, support vector machines, nearest neighbors, radial basis functions, etc.” (page 1220, bottom paragraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Karlinsky, Zintgraf, and Freixenet such that the diagnostic component is further configured for determining the one or more causal portions by computing a partial dependence plot with path integral, in order to implement a functionality that helps interpret neural networks through visualization, as suggested by Friedman (§ 8.2 (page 1219), first paragraph, and page 1220, bottom paragraph). 

10.	Claim 22 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet and further in view of Bhaskar et al. (US 2009/0080759 A1) (“Bhaskar”).
As to claim 22, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, and the elements of “the one or more characteristics of the one or more causal portions” [as discussed above in the rejection of claim 1] and “additional images…for additional training of the deep learning model” [In Karlinsky, the input for training, including the augmented training samples, may be adjusted for each subsequent training cycle, as described in [0093]. Images input for subsequent training cycles may be regarded as “additional images…for additional training of the deep learning model.”].
However, the thus-far combination of references does not teach the limitation that the one or more functions further comprise “determining…if additional images for the specimen should be collected from the imaging tool and used for” the additional training of the deep learning model, wherein the determining is “based on” the one or more characteristics of the one or more causal portions.
Bhaskar, in an analogous art, teaches or suggests the above limitations. Bhaskar generally pertains to inspection-related functions for wafer data. Therefore, Bhaskar is in the same field of endeavor. 
In particular, Bhaskar teaches “determining based on” analysis results, “if additional images for the specimen should be collected from the imaging tool and used” [[0111]: “the set of processor nodes is configured to perform processing of the image data stored in the arrays of the storage media and to use results of the processing to determine if additional image data for the wafer is to be acquired by scanning the wafer or from the arrays of the storage media. For example, the set of processor nodes may be configured to use results of the local and non-local image processing to determine if more data is to be acquired, as shown in step 33 in FIG. 2.” See also [0112] (“decide whether another scan should be performed on the wafer”). Acquiring additional images allows further information on wafer characteristics (e.g., defects) to be obtained before generating a final result, as disclosed in [0112]. The Examiner notes that “the one or more characteristics” is not particularly defined, and the term “based on” in this context does not require a particular relationship. Therefore, Bhaskar, in teaching that the determination is based on the results of the processing, is deemed to suggest the instant limitation when Bhaskar is combined with the teachings of Karlinsky.]. Bhaskar furthermore suggests the images being used “for additional training of the deep learning model.” [As noted above, Bhaskar teaches that the additional images can be used to generate a final result of inspection, which is analogous to inputting the additional images to the system of the combination of Karlinsky, Zintgraf, and Freixenet, which functions to inspect wafer images.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, and Freixenet with the teachings of Bhaskar by modifying the one or more functions to further comprise “determining, based on the one or more characteristics of the one or more causal portions, if additional images for the specimen should be collected from the imaging tool and used for additional training of the deep learning model,” in order to implement a functionality that would retrieve additional images to improve wafer inspection, as suggested by Bhaskar.   

11.	Claim 32 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet, and further in view of Ma et al. (US 2018/0060702A1) (“Ma”).
As to claim 32, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, but does not teach “wherein the specimen is a reticle.”
Ma, in an analogous art, teaches “wherein the specimen is a reticle.” Ma generally relates to “defect classifications for wafer or reticle inspection” ([0002]). The classification may involve a convolutional neural network to classify images ([0032]-[0033]). Therefore, Ma is in the field of machine learning, specifically the use of machine learning techniques to perform defect analysis of wafers and reticles.
In particular, Ma teaches that the specimen for which analysis is performed is a “reticle” [[0016]: “An inspection can be performed on a wafer or a reticle (‘target specimen’) to generate the defect record…The defect records can include defect images, for example.” See also [0015] (“Wafer or reticle defect inspection systems with capability of defect classification have been widely used in semiconductor manufacturing”). Similar to the features of the instant claim, such images are input into the convolutional neural network ([0032]).].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the system of the combination of Karlinsky, Zintgraf, and Freixenet to a specimen that is a reticle, so as to result in the instant limitation of “wherein the specimen is a reticle,” in order to implement the result of inspecting reticle defects in semiconductor manufacturing, as suggested by Ma ([0003]). Moreover, since Ma teaches that wafers and reticles may be interchangeably used as the subject of inspection, the instant claim would have been obvious for being a simple substitution of one known element (reticle) for another (wafer) to obtain predictable results.

12. 	Claim 38 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Ardizzone et al., “Saliency Based Image Cropping,” ICIAP 2013, Part I, LNCS 8156, pp. 773–782, 2013 (“Ardizzone”), Zintgraf, and Freixenet.
As to claim 38, Karlinsky teaches A system configured to perform diagnostic functions for a deep learning model, comprising: one or more computer subsystems having one or more processors that execute instructions from a memory medium; and one or more components executed by the one or more computer subsystems and stored on a non-transitory computer-readable medium, wherein the one or more components comprise: a deep learning model configured for determining information from an image generated for a specimen by an imaging tool; and a diagnostic component configured for determining one or more causal portions of the image that resulted in the information being determined and for performing one or more functions based on the determined one or more causal portions of the image, wherein the one or more functions comprise determining one or more characteristics of the one or more causal portions, [The foregoing limitations are the same or substantially the same as the corresponding limitations recited in claim 1. Therefore, these limitations are taught by Karlinsky for the reasons given in the rejection of claim 1.] wherein the one or more computer subsystems are configured for inputting a generated augmented image into the deep learning model. [In general, [0086] teaches “the illustrated training process can be cyclic, and can be repeated several times until the DNN is sufficiently trained.” [0093]: “PMB can adjust the next training cycle based on the received feedback. Adjusting can include at least one of: updating the training set (e.g. updating ground truth data and/or augmentation algorithms, obtaining additional first training samples and/or augmented training samples, etc.)…” With respect to the aspect of a “generated augmented image,” while Karlinsky does not teach the particular method of generating the augmented image, Karlinsky teaches the use of augmented images in general. 
Karlinsky, [0016]: “Augmenting at least part of the first training samples can be provided, for example, by geometrical warping, planting a new defect in an image, amplifying a defectiveness of a pre-existing defect in an image, removing a pre-existing defect from an image and disguising a defect in an image.” See also [0077]. Furthermore, in Karlinsky, the subsequent training cycles can use augmented images, since [0093] teaches “augmentation algorithms” for updating the training set.]
Karlinsky does not specifically teach:
(1)	“altering the image based on the one or more characteristics of the one or more causal portions to thereby generate an augmented image” “wherein altering the image comprises de-emphasizing one or more non-causal portions of the image”; 
(2)	“wherein determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information”; and
(3)	“wherein the one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image, and wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” 
Ardizzone, in an analogous art, teaches limitation (1) listed above. Ardizzone relates to saliency based image cropping (see title). Ardizzone is analogous because it is in the field of image processing, and is also reasonably pertinent to the problem of processing images. It is noted that the concept of saliency described in Ardizzone is consistent with the teachings of Karlinsky, which teaches the use of a bounding box to localize information.
In particular, Ardizzone teaches “altering the image based on the one or more characteristics of the one or more causal portions to thereby generate an augmented image” “wherein altering the image comprises de-emphasizing one or more non-causal portions of the image.” [§ 4, paragraph 1: “Each saliency map is then binarized using different threshold values (see section 5) and then the bounding box of all the pixels, which values are above the threshold, is selected and used to crop the photo (fig. 2).” That is, the photo before and after the cropping operation corresponds to the “image” and “augmented image,” and cropping based on the “bounding box” (which is illustrated in FIG. 2 of the reference) constitutes cropping based on one or more characteristics (in this case, the position) of the one or more causal portions. Furthermore, the “cropping” operation corresponds to “altering” by “de-emphasizing,” particularly in light of applicant’s disclosure, which exemplifies cropping to a region of interest as an example of de-emphasizing (see page 41, middle paragraph, of applicant’s specification).]
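The binarize, bound, and crop sequence quoted above can be sketched as follows. This is an illustrative sketch only: the function name crop_to_salient_region, the example saliency values, and the threshold of 0.5 are assumptions and do not reproduce the saliency methods or threshold values of Ardizzone.

```python
import numpy as np

def crop_to_salient_region(image, saliency, threshold):
    """Binarize the saliency map, take the bounding box of the pixels above the
    threshold, and crop the image to that box."""
    mask = saliency > threshold
    if not mask.any():
        return image                       # nothing salient: leave the image unchanged
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return image[top:bottom + 1, left:right + 1]

image = np.arange(25).reshape(5, 5)        # hypothetical 5x5 image
saliency = np.zeros((5, 5))
saliency[1, 2] = 0.9                       # two hypothetical salient pixels
saliency[3, 3] = 0.8
cropped = crop_to_salient_region(image, saliency, threshold=0.5)
```

The cropped result retains rows 1 through 3 and columns 2 through 3 of the original image, and the remaining, less salient, area is discarded.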
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky and Ardizzone such that the one or more functions further comprises “altering the image based on the one or more characteristics of the one or more causal portions to thereby generate an augmented image” wherein “altering the image comprises de-emphasizing one or more non-causal portions of the image,” such that the augmented image takes the role of the updated training data in Karlinsky (so as to satisfy the limitation of “…configured for inputting the generated augmented image into the deep learning model”). The motivation for doing so would have been to process the image in a way that selects the most relevant areas of the image, discarding less relevant areas, as suggested by Ardizzone, abstract (“Image cropping is a technique that is used to select the most relevant areas of an image, discarding the useless ones”). 
Zintgraf, in an analogous art, teaches limitation (2) listed above. Zintgraf relates to a “method to visualize deep neural networks,” and is therefore in the same field of endeavor (artificial intelligence, including deep learning models). Zintgraf generally relates to determining and visualizing the importance of various parts of an input image. See § 1, paragraph 4: “We present a novel visualization method, exemplified for DCNNs, that finds and highlights the regions in image space that activate the nodes (hidden and output) in the neural network.”
In particular, Zintgraf teaches “wherein determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information” [The foregoing limitations are the same or substantially the same as the corresponding limitations recited in claim 1. Therefore, these limitations are taught by Zintgraf for the reasons given in the rejection of claim 1.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky and Ardizzone with the teachings of Zintgraf by modifying the system of the combination of Karlinsky and Ardizzone such that “determining the one or more characteristics comprises qualitatively and quantitatively identifying an importance of each pixel of the image input to the deep learning model in contributing to said determining the information,” so as to arrive at each and every limitation of the claim. The motivation for doing so would have been to identify areas in an image that provide evidence in favor or against choosing a certain class (Zintgraf, abstract: “For image data for instance our method will highlight areas that provide evidence in favor of, and against choosing a certain class.”), particularly in a way that enables estimation of the importance of each pixel (Zintgraf, § 3: “allow us to estimate the importance of each pixel by assigning it a relevance value”). 
Freixenet, in an analogous art, teaches the remaining limitation (3) listed above. Freixenet generally relates to methods of “image segmentation…, a relevant research area in Computer Vision” (abstract, first sentence). Therefore, Freixenet is in the same field of endeavor as the claimed invention, namely image analysis using artificial intelligence models, and is also reasonably pertinent to the problems of image analysis.
In particular, Freixenet teaches “wherein the one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image” [§ 4.1: “…boundary-based and region-based performance evaluation schemes are proposed. The boundary-based approach evaluates segmentation in terms of both localization and shape accuracy of extracted regions, while the region-based approach assesses the segmentation quality in terms of both size and location of the segmented regions.” That is, extracted/segmented “regions” are first determined by algorithms, such as algorithms disclosed in § 2.1. A ground truth is also acquired, as further described below. Then, the regions are assessed for correctness by comparing them to a ground truth (corresponding to the “correct one or more causal portions” in the claim) using various methods to determine whether they match the ground truth. These regions are analogous to the “one or more causal portions” in the claim, since they are computer-determined regions of an image, and their techniques are applicable to the “one or more causal portions” of the claim (Examiner’s Note: The feature of “one or more functions further comprise determining if the one or more causal portions that resulted in the information being determined” is already taught by Karlinsky, as set forth above.). One comparison method is described in § 4.1, paragraphs 2-3 (heading: “Boundary-Based Evaluation”): “The boundary-based scheme is intended to evaluate segmentation quality in terms of the precision of the extracted region boundaries. 
Let B represent the boundary point set derived from the segmentation and GB the boundary ground truth…A distance distribution signature from a set B1 to a set B2 of boundary points, denoted by D_{B1}^{B2}, is a discrete function whose distribution characterizes the discrepancy, measured in distance, from B1 and B2…As a rule, a D_{B1}^{B2} with a near-zero mean and a small standard deviation indicates high quality of the image segmentation.” Another method is described in § 4.1, paragraphs 4-7 (heading: “Region-Based Evaluation”): “The region-based scheme evaluates the segmentation accuracy in the number of regions, the locations and the sizes. Let the segmentation be S and the corresponding ground truth be GS. The goal is to quantitatively describe the degree of mismatch between them… A region-based performance measure based on normalized Hamming distance is defined… The smaller the degree of mismatch…”] and “wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” [§ 4.1, paragraph 1: “The evaluation of image segmentation is performed with several quantitative measures.” Since the ground truth is inputted into a quantitative comparison method, it is “acquired from a method.” Note that “method” in this instance does not require a specific technique of determining the one or more causal portions, but covers any method of acquiring the portions.]
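For illustration only, the two kinds of comparison quoted above (a boundary-based comparison against a boundary ground truth and a region-based comparison against a region ground truth) might be sketched as follows. This is a minimal sketch and not code from Freixenet: the function names are hypothetical, the boundary measure assumes the distance from each point of B1 to its nearest point of B2, and the region measure is a simplified per-pixel mismatch fraction standing in for the normalized Hamming distance measure, whose exact definition is not reproduced in the quoted passages.

```python
import math
from statistics import mean, pstdev


def distance_signature(b1, b2):
    """Distance from each boundary point in b1 to its nearest point in b2.

    Assumption: Euclidean distance between (row, col) points.
    """
    return [min(math.dist(p, q) for q in b2) for p in b1]


def boundary_score(segmented_boundary, ground_truth_boundary):
    """Mean and standard deviation of the distance signature.

    Values near zero suggest the extracted boundary lies close to the
    ground-truth boundary.
    """
    distances = distance_signature(segmented_boundary, ground_truth_boundary)
    return mean(distances), pstdev(distances)


def region_mismatch(segmented_mask, ground_truth_mask):
    """Fraction of pixels whose label differs between the two masks.

    Simplified stand-in for a normalized region-based mismatch measure;
    0.0 means the masks agree everywhere.
    """
    total = 0
    differing = 0
    for seg_row, gt_row in zip(segmented_mask, ground_truth_mask):
        for seg_label, gt_label in zip(seg_row, gt_row):
            total += 1
            if seg_label != gt_label:
                differing += 1
    return differing / total
```

Under these assumptions, a smaller boundary mean and standard deviation, or a smaller region mismatch, corresponds to a determined region that more closely matches the ground truth.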
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Ardizzone and Zintgraf with the teachings of Freixenet by modifying the one or more functions to further comprise “determining if the one or more causal portions that resulted in the information being determined are correct one or more causal portions of the image, and wherein the correct one or more causal portions are acquired from a method, another system, or a user of the system.” The motivation would have been to evaluate the accuracy or quality of the determined regions, as suggested by Freixenet (§ 4.1, paragraph 1: “The boundary-based approach evaluates segmentation in terms of both localization and shape accuracy of extracted regions, while the region-based approach assesses the segmentation quality in terms of both size and location of the segmented regions.”).

13.	Claim 42 is rejected under 35 U.S.C. § 103 as being unpatentable over Karlinsky in view of Zintgraf and Freixenet and further in view of Le Rudulier et al., (US 2017/0149690 A1) (“Le Rudulier”) and Bhaskar.
As to claim 42, the combination of Karlinsky, Zintgraf, and Freixenet teaches the system of claim 1, and the elements of “results of determining if the one or more causal portions that resulted in the information being determined are the correct one or more causal portions of the image” [The “determining” operation is taught by Freixenet as discussed in the rejection of claim 1, above. Furthermore, Freixenet, § 4.2 (“The Results”) teaches that results are computed using the evaluation techniques described in this reference.] and “additional images…for training of the deep learning model” [In Karlinsky, the input for training, including the augmented training samples, may be adjusted for each subsequent training cycle, as described in [0093]. Images input for subsequent training cycles may be regarded as “additional images input to the deep learning model.”].
However, the thus-far combination of references does not teach the limitation that the one or more functions further comprise “determining…if additional images for the specimen should be collected from the imaging tool and used for additional training of the deep learning model,” wherein the determining is “based on” the results of determining if the one or more causal portions that resulted in the information being determined are the correct one or more causal portions of the image.
Le Rudulier, in an analogous art, teaches the limitation of the determining being “based on” the results of determining if the one or more causal portions that resulted in the information being determined are the correct one or more causal portions of the image, and training samples to be “used for additional training of the deep learning model.” Le Rudulier pertains to “classification systems” (see title), implemented by models such as “neural networks” ([0077]). Therefore, Le Rudulier is in the same field of endeavor as the claimed invention, namely image analysis using artificial intelligence models.
In particular, Le Rudulier teaches “determining, based on the results of determining if the one or more causal portions that resulted in the information being determined are the correct one or more causal portions of the image, if additional images […] should be collected […] and used for additional training of the deep learning model.” [[0080]: “If the accuracy of the classifier does not satisfy the desired accuracy, at 714, then the classifier may be trained using additional training data, at 704.”] 
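For illustration only, the accuracy-conditioned retraining described in the quoted passage of Le Rudulier might be sketched as follows. This is a minimal sketch under stated assumptions and not code from any cited reference: `train_step`, `evaluate`, and `additional_batches` are hypothetical placeholders for a training routine, an accuracy measurement, and a supply of additional training data, respectively.

```python
def train_until_accurate(train_step, evaluate, additional_batches, desired_accuracy):
    """Train on additional data only while accuracy is below the target.

    train_step(batch): hypothetical callable that trains on one batch.
    evaluate(): hypothetical callable returning the current accuracy.
    Returns the number of additional batches used and the final accuracy.
    """
    batches_used = 0
    accuracy = evaluate()
    for batch in additional_batches:
        if accuracy >= desired_accuracy:
            break  # desired accuracy satisfied; no further training data needed
        train_step(batch)
        batches_used += 1
        accuracy = evaluate()
    return batches_used, accuracy
```

In this sketch, the decision to use additional training data depends solely on the result of the accuracy check, which mirrors the conditional step described in the quoted paragraph.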
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, and Freixenet with the teachings of Le Rudulier by modifying the one or more functions to further comprise “determining, based on results of determining if the one or more causal portions that resulted in the information being determined are the correct one or more causal portions of the image, if additional images […] should be collected […] and used for additional training of the deep learning model.”
The thus-far combination of references does not teach the limitation that the additional images are “additional images for the specimen…collected from the imaging tool.”
Bhaskar, in an analogous art, teaches or suggests the above limitations. Bhaskar generally pertains to inspection-related functions for wafer data. Therefore, Bhaskar is in the same field of endeavor as the claimed invention.
In particular, Bhaskar teaches “additional images for the specimen…collected from the imaging tool” [[0111]: “the set of processor nodes is configured to perform processing of the image data stored in the arrays of the storage media and to use results of the processing to determine if additional image data for the wafer is to be acquired by scanning the wafer or from the arrays of the storage media. For example, the set of processor nodes may be configured to use results of the local and non-local image processing to determine if more data is to be acquired, as shown in step 33 in FIG. 2.” See also [0112] (“decide whether another scan should be performed on the wafer”). Acquiring additional images allows further information on wafer characteristics (e.g., defects) to be obtained before generating a final result, as disclosed in [0112].]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Karlinsky, Zintgraf, Freixenet, and Le Rudulier with the teachings of Bhaskar by modifying the additional images to be additional images for the specimen collected from the imaging tool, so as to satisfy each and every limitation of the instant claim. The motivation would have been to retrieve additional images when such images are desired.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following document depicts the state of the art.
US-20150227816-A1 teaches detecting salient regions and subsequently adjusting the detected regions.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.D.H./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124