DETAILED ACTION
Response to Amendment
The amendment was received 4/16/21. Claims 1-22 are pending.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. In the present application, 35 U.S.C. 112(f) is NOT invoked. Accordingly, the claim terms discussed below are given their broadest reasonable interpretation:

Regarding claim 10, and likewise claim 12, claim 10 is “treated as a Markush claim” because it recites “alternatively usable members” via the claimed “at least one of: … and”, in which “and” means “otherwise”; claim 12 is treated similarly. Accordingly, the claimed “number of convolution layers”, as amended, has not been fully “searched”; that is, “the full scope of the Markush grouping has” not “been searched”. See MPEP:
803.02 2 Elected Species in Proper Markush Grouping Allowable over the Prior Art, 2nd paragraph:
“In the interest of compact prosecution, the examiner should ensure that the record is clear as to which species have been searched and have been found allowable over the prior art. The examiner should indicate that the provisional election of species requirement has been modified if additional species beyond the elected species have been searched and determined to be allowable over the prior art. The examiner should indicate that the provisional election of species requirement has been withdrawn if the full scope of the Markush grouping has been searched and been determined to be allowable over the prior art. Note that the examiner can only make or maintain any restriction requirement if there would be serious burden. Clarity of the record with regard to the provisional election of species requirement is critical to proper application of 35 U.S.C. 121  in later divisional applications.”; and

“2117    Markush Claims [R-10.2019]
Treatment of claims reciting alternatives is not governed by the particular format used (e.g., alternatives may be set forth as "a material selected from the group consisting of A, B, and C" or "wherein the material is A, B, or C"). See, e.g., the Supplementary Examination Guidelines for Determining Compliance with 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications ("Supplementary Guidelines"), 76 Fed. Reg. 7162 (February 9, 2011). Claims that set forth a list of alternatives from which a selection is to be made are typically referred to as Markush claims, after the appellant in Ex parte Markush, 1925 Dec. Comm’r Pat. 126, 127 (1924). Although the term "Markush claim" is used throughout the MPEP, any claim that recites alternatively usable members, regardless of format, should be treated as a Markush claim. Inventions in metallurgy, refractories, ceramics, chemistry, pharmacology and biology are most frequently claimed under the Markush formula, but purely mechanical features or process steps may also be claimed by using the Markush style of claiming. See, e.g., Fresenius USA, Inc. v. Baxter Int’l, Inc., 582 F.3d 1288, 1297-98 (Fed. Cir. 2009)(claim to a hemodialysis apparatus required "at least one unit selected from the group consisting of (i) a dialysate-preparation unit, (ii) a dialysate-circulation unit, (iii) an ultrafiltrate-removal unit, and (iv) a dialysate-monitoring unit" and a user/machine interface operably connected thereto); In re Harnisch, 631 F.2d 716, 206 USPQ 300 (CCPA 1980)(defining alternative moieties of a chemical compound with Markush groupings).”

MPEP 2111.01 III, 3rd paragraph, emphasis added:
“It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).”

The claimed “halo” (as in “each tile in the grid of tiles comprises a halo” in claim 10) is interpreted in light of applicant’s disclosure and definition thereof via Dictionary.com, wherein definitions 1 and 2 are “taken”:
halo
noun, plural ha·los, ha·loes.
1	Also called nimbus. a geometric shape, usually in the form of a disk, circle, ring, or rayed structure, traditionally representing a radiant light around or above the head of a divine or sacred personage, an ancient or medieval monarch, etc.

wherein “nimbus” is defined:
nimbus
noun, plural nim·bi [nim-bahy], nim·bus·es.
2	a cloud, aura, atmosphere, etc., surrounding a person or thing:
The candidate was encompassed with a nimbus of fame.
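For orientation only, the following sketch shows one hypothetical way a grid of tiles in which each tile comprises a halo might be realized in image processing: each tile's core region is extended by a margin of neighboring cells that surrounds it, in keeping with the “surrounding” sense of the nimbus definition above. The function name, tile size and halo width are illustrative assumptions; they are not taken from the claims, the specification or the cited art.

```python
def tiles_with_halo(grid, tile, halo):
    """Split a 2-D grid into tile x tile core regions, each extended by a
    surrounding halo of `halo` cells (clipped at the grid boundary).
    Hypothetical helper for illustration only."""
    rows, cols = len(grid), len(grid[0])
    tiles = {}
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Extend the core region by the halo on every side, clipped to the grid.
            r_lo, r_hi = max(r0 - halo, 0), min(r0 + tile + halo, rows)
            c_lo, c_hi = max(c0 - halo, 0), min(c0 + tile + halo, cols)
            tiles[(r0 // tile, c0 // tile)] = [row[c_lo:c_hi] for row in grid[r_lo:r_hi]]
    return tiles


# Hypothetical 6 x 6 grid, 3 x 3 tile cores, 1-cell halo.
grid = [[6 * r + c for c in range(6)] for r in range(6)]
tiles = tiles_with_halo(grid, tile=3, halo=1)
print(len(tiles))        # 4
print(tiles[(1, 1)][0])  # [14, 15, 16, 17]
```

Tiles at the grid boundary are clipped rather than padded; padding would be an equally plausible design choice.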

Claim 11’s “if” limitation is interpreted via MPEP 2111.04 II. CONTINGENT LIMITATIONS.
Claim 12’s “a distance criterion or a difference in dose energy” is interpreted in light of applicant’s disclosure [0031].


Response to Arguments
Applicant’s arguments, see remarks, page 6, filed 4/16/21, with respect to the claim objection have been fully considered and are persuasive.  The claim objection of claims 1-15 has been withdrawn. 
Applicants state on page 6:
“In response to the Examiner's comments on claim interpretation, claim 10 has been amended to clarify the definition of halo.”

In response, the meaning of “halo” is “taken” as discussed and shown above via MPEP 2111.01 III, 3rd paragraph.
Applicant's arguments, page 7, emphasis added:
“Applicant respectfully argues that Ha and Mei, alone or in combination, do not teach or suggest the limitations of claim 1 as amended. Ha teaches multi-modality image registration for semiconductor devices by transforming images and comparing differences between the images (e.g., Ha c.7, ll. 24-36; c.23, ll. 18-54). Mei teaches processing images returned from an unmanned aerial vehicle to determine probabilities that the targets in candidate circumscribed frames belong to preset categories (e.g., Mei Abstract). The Examiner argues that it would be obvious to modify Ha with Mei's teachings since Mei's filters have "a relatively small calculation amount" and thus processes data faster than other size filters. However, although both Ha and Mei use deep learning networks, Applicant argues that it would not be obvious for one of ordinary skill in the art to apply the neural network parameters from one deep learning architecture to another or from one field to another. For example, Ha teaches an autoencoder, but Mei only uses convolution layers (no deconvolution layers) which is not an autoencoder, so it would not be obvious for one of ordinary skill in the art to modify Ha's autoencoder with Mei's filters. Furthermore, Mei's filters for determining the probabilities of targets in candidate circumscribed frames cannot simply be inserted into the algorithm for aligning semiconductor device images in Ha. Each deep learning algorithm has calculations that are specific to its data and application, and thus the parameters for tuning the learning must be tailored for that situation.”

Applicant's arguments filed 4/16/21 have been fully considered but they are not persuasive.

	The examiner respectfully disagrees, since Ha teaches “This configuration of the learning based model is non-limiting, however, in that other learning based models may also be used in this embodiment and the parameters of the DCAE described above may be altered as necessary or desired.” at c.22, ll. 24-40 (also cited in the Office action of 1/8/21, page 8):
“In the embodiment shown in FIG. 3, the learning based model may be a regression model or any of the learning based models described herein.  In one such example, the learning based model may be in the form of a deep convolution autoencoder (DCAE).  The encoder portion of the learning based model may include, for example, five convolutional layers with kernel sizes of, for example, 5×5, a stride of 2, and no zero padding.  Each convolutional layer may be followed by a leaky rectified linear unit.  The decoder portion of the learning based model may have a similar architecture as the encoder, but uses deconvolutional layers.  The decoder may have separate weights, which are trainable to have more freedom to reconstruct design images.  This configuration of the learning based model is non-limiting, however, in that other learning based models may also be used in this embodiment and the parameters of the DCAE described above may be altered as necessary or desired.”.
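The DCAE configuration quoted above (five convolutional layers, 5×5 kernels, a stride of 2 and no zero padding) can be made concrete with the conventional output-size formula for an unpadded strided convolution. The following is a minimal sketch for illustration only: the 256×256 input size is a hypothetical value not stated in Ha, and the formula is the standard one, not one recited by Ha.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Spatial size after one convolution: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def encoder_sizes(n, layers=5, kernel=5, stride=2, padding=0):
    """Feature-map side length after each encoder convolution layer."""
    sizes = []
    for _ in range(layers):
        n = conv_output_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


# Hypothetical 256 x 256 input image (the input size is an assumption).
print(encoder_sizes(256))  # [126, 61, 29, 13, 5]
```

Each layer roughly halves the spatial extent, which is the sense in which such an encoder reduces dimensionality; the channel count per layer is not addressed here.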

Thus, one of skill in the art of learning models would have reasonably looked to Mei’s teaching at fig. 2:103 (“…a deep learning network model”, detailed in Mei’s fig. 5:300) in view of Ha’s statement that “This configuration of the learning based model is non-limiting, however, in that other learning based models may also be used in this embodiment,” in order to use another learning based model, such as that shown in Mei’s fig. 5, in Ha’s embodiment, and to combine the references as set forth in the 35 USC 103 rejection such that “the parameters of the DCAE described above may be altered as necessary or desired”.

Applicant’s arguments, see remarks, page 8, emphasis added:
“Even if Ha and Mei were combined, it would not be obvious for one of ordinary skill in the art to derive a "pre-determined set of convolution layers, including a kernel size and filter size" "wherein the pre-determined set of convolution layers are tuned for increased accuracy of the encoded shape data based on design rules for the set of electronic designs" as recited in claim 1 of the present application. This specific tailoring of the convolution layers to increase accuracy of compressed shape data based on design rules for electronic designs provides unique benefits as explained in the present specification, for example:”

Applicant’s arguments, see remarks, page 8, filed 4/16/21, with respect to the rejection(s) of claim(s) 1, 2, 4, 8, 11 and 15 under 35 USC 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made under 35 USC 103 in view of Lin et al. (Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection), which teaches transfer learning, or fine-tuning, via fig. 12 (“Finetune conv fc fc”), resulting in improved accuracy of data.
Applicant’s arguments, see remarks, page 9, regarding claims 10 and 10’, filed 4/16/21, with respect to the rejection(s) of claim(s) 10 and 10’ under 35 USC 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made under 35 USC 103 in view of CAMMAROTA et al. (US 2020/0073636 A1), which teaches the value of a padding layer, as shown in fig. 5A:522 (1 layer) and fig. 5B:564 (2 layers), as a function of “filter kernel size” ([0061]).
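The dependence of padding depth on filter kernel size attributed to CAMMAROTA above can be illustrated with the conventional stride-1 “same” padding rule, p = (k - 1)/2 for an odd kernel size k. This is a general assumption offered for illustration; the kernel sizes actually used in CAMMAROTA's figs. 5A and 5B are not stated in this action.

```python
def same_padding(kernel_size):
    """Zero-padding per side that keeps a stride-1 convolution's output
    the same size as its input (assumes an odd kernel size)."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric 'same' padding requires an odd kernel size")
    return (kernel_size - 1) // 2


def conv_output_size(n, kernel, stride=1, padding=0):
    """Standard convolution output-size formula."""
    return (n + 2 * padding - kernel) // stride + 1


for k in (3, 5):
    p = same_padding(k)
    # Prints kernel size, padding depth, and the preserved 32-pixel output size.
    print(k, p, conv_output_size(32, k, padding=p))
```

Under this rule a 3×3 kernel needs one layer of padding and a 5×5 kernel needs two to preserve the input size.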

Applicant's arguments filed 4/16/21 have been fully considered but they are not persuasive. Applicants state on pages 9 and 10, emphasis added:
“Claim 12 as amended, dependent from claim 1, recites that "the error value is based on a distance criterion or difference in dose energy to manufacture the set of shape data on a surface, wherein the distance criterion or difference in dose energy are based on the design rules." The Office Action (p. 11) cites Ha as teaching a distance criterion (Fig. 4:"L2-loss") or difference in dose energy "during a semiconductor manufacturing process." Applicant respectfully argues that Ha does not teach an error value based on a "difference in dose energy." Ha merely teaches that "optical and electron beam tools may be configured for directing energy (e.g., light, electrons) to and/or scanning energy over a physical version of the specimen thereby generating actual (i.e., not simulated) output and/or images for the physical version of the specimen" (c.12, ll. 4-10; see also c. 14, ll. 10-17). Ha does not teach using energy to determine an error value determined in the learning algorithms. Furthermore, Ha does not teach using either "a distance criterion or difference in dose energy" as an error value "based on design rules" as set forth in claim 12.”

In response, claim 12, like claim 10, is treated as a Markush claim; thus, the full scope of the Markush grouping has not been searched, given that Ha teaches the other alternative (the “distance criterion”), as shown in the rejection of claim 12 under the broadest reasonable interpretation of claim 12.

Further, the examiner respectfully disagrees, since Ha teaches claim 12 as follows:

Regarding claim 12, Ha as combined teaches the method of claim 11 wherein the error value (said via fig. 4:“L2-loss”) is based on a distance criterion (said fig. 4:“L2-loss”) or a difference in dose energy to manufacture (via “during a semiconductor manufacturing process” cited in the rejection of claim 1) the set of shape data (said or “images…which may have…shape”) on a surface (said semiconductor), wherein the distance criterion (said fig. 4:“L2-loss”) or the difference in dose energy are based on the design rules (said “design rules” represented in fig. 4:400 via said “designers” making designs as shown in fig. 4:400).
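As an illustration of an L2-loss used as a distance criterion between shape data, the sketch below compares a small hypothetical binary raster of a rectangle with a copy shifted by one pixel. The excerpts of Ha quoted in this action do not give the normalization of its “L2-loss”; a plain sum of squared differences is assumed here, and a mean or a square root (Euclidean distance) would be equally common choices.

```python
def l2_loss(a, b):
    """Sum of squared per-pixel differences between two equally sized 2-D rasters."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical 4 x 4 raster of a rectangle, and the same rectangle shifted down by one pixel.
shape = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]

print(l2_loss(shape, shifted))  # 4
```

For binary rasters the sum of squared differences simply counts the mismatched pixels; its square root, 2, is the Euclidean distance between the rasters viewed as vectors.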

Applicant's arguments filed 4/16/21 have been fully considered but they are not persuasive. Applicants state on page 10, emphasis added:
“Claim 16 as amended recites "a set of parameters including a set of convolution layers for a convolutional autoencoder, wherein the set of parameters are determined using design rules for the set of electronic designs," "encoding the set of shape data to compress the set of shape data," and "adjusting the set of parameters based on the calculated loss, wherein the set of parameters are tuned for increased accuracy of the encoded shape data based on the design rules for the set of electronic designs." These limitations are similar to those in claim 1, and thus the above arguments for claim 1 apply to claim 16. In particular, Ha teaches methods for aligning images from different modalities by comparing differences between images. Ha does not teach compressing shape data using convolution layers having parameters based on design rules for electronic designs as set forth in claim 16. Therefore, claim 16 and its dependent claims are not anticipated by Ha.”

	The examiner respectfully disagrees since Ha teaches, as shown in the 35 USC 102 rejection of claim 16, a method for compression of shape data for a set of electronic designs, the method comprising:
inputting a set of shape data (said or “images…which may have…shape”), wherein the set of shape data represents a set of shapes (said or “regions of particular shape”) for a device fabrication process (said or “fabricating semiconductor devices”); 
inputting a set of parameters (or “the parameters…may be altered” cited in the rejection of claim 1) including a set of convolution layers (said via fig. 5:512: “Fully connected layer(s)”) for a convolutional autoencoder (said via figs. 4:404, 5:502,506, 7:702: “Encoder”), wherein the set of parameters (said “the parameters…may be altered” cited in the rejection of claim 1) are determined (via “as necessary or desired”) using design rules (said “design rules”) for the set (said fig. 3:300,306) of electronic designs (via said “designers”); 
encoding (said via figs. 4:404, 5:502,506, 7:702: “Encoder”) the set of shape data to compress (said via “dimensionality reduction”) the set of shape data, using the set of convolution layers (said via fig. 5:512: “Fully connected layer(s)”) of the convolutional autoencoder (said via figs. 4:404, 5:502,506, 7:702: “Encoder”), to create a set of encoded shape data; 
decoding (said via figs. 4:408,7:706: “Decoder”) the set of encoded shape data into decoded data using the convolutional autoencoder; 
calculating a loss (said via fig. 4:“L2-loss”) by comparing the decoded data (said via fig. 4:408: “Decoder” and fig. 7:706: “Decoder”) with the input set of shape data (said or “images…which may have…shape”); and 
adjusting the set of parameters (said “the parameters…may be altered”) based on the calculated loss (said via fig. 4:“L2-loss”), wherein the set of parameters (said “the parameters…may be altered” cited in the rejection of claim 1) are tuned (via “parameters to tune”) for increased accuracy (“tune” comprising “superior” “accuracy”) of the encoded shape data (said or “images…which may have…shape”) based on the design rules (said “design rules”) for the set of electronic designs (via said “designers” via c.26,ll. 23-28:
“The end-to-end learning based model approaches described herein are different from the currently used methods in that, in these embodiments, the whole registration process is carried out in a single feedforward network.  These embodiments therefore are simpler, require fewer parameters to tune, run much faster and thus increase throughput.”).
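For readers less familiar with the terminology, the encode, decode, calculate-loss and adjust-parameters steps recited in claim 16 (and mapped to Ha above) correspond to an ordinary autoencoder training loop. The sketch below is a hypothetical, deliberately simplified stand-in: it uses a single fully connected linear encoder and decoder rather than convolution layers, plain gradient descent, and made-up sample data; none of the function names, the learning rate or the data come from Ha or from the application.

```python
import random


def encode(w_enc, x):
    """Compress a d-dimensional sample x to a k-dimensional code z = W_enc x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in w_enc]


def decode(w_dec, z):
    """Reconstruct a d-dimensional sample from its code: x_hat = W_dec z."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in w_dec]


def squared_error(a, b):
    """L2 loss: sum of squared differences between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_linear_autoencoder(data, k, steps=500, lr=0.05, seed=0):
    """Encode, decode, compute the loss, then adjust the parameters by gradient descent."""
    rng = random.Random(seed)
    d, n = len(data[0]), len(data)
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(d)]
    history = []  # mean loss measured before each parameter update
    for _ in range(steps):
        g_enc = [[0.0] * d for _ in range(k)]
        g_dec = [[0.0] * k for _ in range(d)]
        total = 0.0
        for x in data:
            z = encode(w_enc, x)              # encoding (compression d -> k)
            x_hat = decode(w_dec, z)          # decoding
            total += squared_error(x_hat, x)  # loss against the input
            r = [xh - xi for xh, xi in zip(x_hat, x)]
            # dL/dW_dec[j][m] = 2 * r[j] * z[m]
            for j in range(d):
                for m in range(k):
                    g_dec[j][m] += 2 * r[j] * z[m]
            # Back-propagate to the code z, then to the encoder weights.
            for m in range(k):
                dz = sum(2 * r[j] * w_dec[j][m] for j in range(d))
                for i in range(d):
                    g_enc[m][i] += dz * x[i]
        history.append(total / n)
        # Adjust the parameters based on the calculated loss (mean gradient step).
        for m in range(k):
            for i in range(d):
                w_enc[m][i] -= lr * g_enc[m][i] / n
        for j in range(d):
            for m in range(k):
                w_dec[j][m] -= lr * g_dec[j][m] / n
    return w_enc, w_dec, history


# Hypothetical 2-D samples lying on the line y = 2x, compressed to a 1-D code.
samples = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
_, _, losses = train_linear_autoencoder(samples, k=1)
print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.6f}")
```

Because the samples lie on a line, a one-number code suffices in principle, so the reconstruction loss should fall toward zero as the parameters are adjusted.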

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Regarding inquiry 4, see Suggestions.
Claims 1,2,4,8,11,12,13 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ha et al. (US Patent 10,733,744) in view of Mei (US Patent 10,740,607) and Lin et al. (Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection).

Regarding claim 1, Ha teaches a method for compression of shape data for a set of electronic designs, the method comprising: 
inputting a set of shape data (or “images…which may have…shape”), wherein the set of shape data (said or “images…which may have…shape”) represents a set of shapes (or “regions of particular shape” as indicated in fig. 2:200,202,212) for a device fabrication process (or “fabricating semiconductor devices”); 
using a convolutional autoencoder (via figs. 4:404, 5:502,506 and 7:702: “Encoder”) on the set of shape data (said “images…which may have…shape”), wherein the convolutional autoencoder (said figs. 4:404, 5:502,506 and 7:702: “Encoder”) has a pre-determined set of convolution layers (via fig. 5:512: “Fully connected layer(s)”), including a kernel size (or “kernel sizes”, i.e. dimensions, l x h x w, of a filter) and filter size (i.e. # of filters) for each convolution layer (said fig. 5:512: “Fully connected layer(s)”); and 
encoding (via said fig. 5:502,506: “Encoder”) the set of shape data (said “images…which may have…shape”) to compress (via “dimensionality reduction”) the set of shape data (said “images…which may have…shape”), using the pre-determined set of convolution layers (said fig. 5:512: “Fully connected layer(s)”) of the convolutional autoencoder (said fig. 5:502,506: “Encoder”), to create a set of encoded shape data, wherein the pre-determined set of convolutional layers (or “convolutional layers”) are tuned (or “tune”) for increased accuracy (via said “tune” comprising “superior” “accuracy”) of the encoded (via figs. 4:404, 5:502,506 and 7:702: “Encoder”) shape data (said “images…which may have…shape”) based on design rules (or “design rules”) for the set (via fig. 2:200 and 202: two things of similar design) of electronic designs (via “designers”); 
wherein the set of shape data (said “images…which may have…shape”) comprises a scanning electron microscope (SEM) image (via “a scanning electron microscope (SEM)” as represented in fig. 1:10), and a mask (via a photo-“lithography…pattern…etch”) defect is identified in (via “defect detection…in…data”) the set of encoded shape data (said “images…which may have…shape”) 

c.1,ll. 17-50:
“Fabricating semiconductor devices such as logic and memory devices typically includes processing a substrate such as a semiconductor wafer using a large number of semiconductor fabrication processes to form various features and multiple levels of the semiconductor devices.  For example, lithography is a semiconductor fabrication process that involves transferring a pattern from a reticle to a resist arranged on a semiconductor wafer.  Additional examples of semiconductor fabrication processes include, but are not limited to, chemical-mechanical polishing (CMP), etch, deposition, and ion implantation. Multiple semiconductor devices may be fabricated in an arrangement on a single semiconductor wafer and then separated into individual semiconductor devices.
Inspection processes are used at various steps during a semiconductor manufacturing process to detect defects on specimens to drive higher yield in the manufacturing process and thus higher profits.  Inspection has always been an important part of fabricating semiconductor devices.  However, as the dimensions of semiconductor devices decrease, inspection becomes even more important to the successful manufacture of acceptable semiconductor devices because smaller defects can cause the devices to fail.
Defect review typically involves re-detecting defects detected as such by an inspection process and generating additional information about the defects at a higher resolution using either a high magnification optical system or a scanning electron microscope (SEM). Defect review is therefore performed at discrete locations on specimens where defects have been detected by inspection.  The higher resolution data for the defects generated by defect review is more suitable for determining attributes of the defects such as profile, roughness, more accurate size information, etc.”;

c.2,ll. 56-67:
“Both heuristic rendering and physics-based approaches have, however, a number of disadvantages.  For example, the disadvantages of the currently used heuristic rendering approaches come from their heuristic nature.  Most of the challenges in multi-modality image registration are substantially hard to solve completely using heuristic rendering methods.  For example, missing computer aided design (CAD) layer issues, context dependent optical proximity correction (OPC) errors, non-uniformity, etc. are difficult to solve using heuristic methods.  As design rules continue shrinking, these challenges will become more and more severe.”

c.5,ll. 26-32:
“In addition, the "design," "design data," and "design information" described herein refers to information and data that is generated by semiconductor device designers in a design process and is therefore available for use in the embodiments described herein well in advance of printing of the design on any physical specimens such as reticles and wafers.”


c.5,ll. 43-58:
“In general, the embodiments described herein are configured as robust learning based approaches that substantially accurately align images across different modalities, which may have some combination of varying length scales, frequency spreads, differing structures, and large shape distortions.  One embodiment relates to a system configured to align images for a specimen acquired with different modalities.  One such embodiment is shown in FIG. 1.  The system may include optical tool 10, also referred to herein as an "optical imaging system." In general, the optical tool is configured for generating optical images of a specimen by directing light to (or scanning light over) and detecting light from the specimen.  In one embodiment, the specimen includes a wafer.  The wafer may include any wafer known in the art.  In another embodiment, the specimen includes a reticle.  The reticle may include any reticle known in the art.”;

c.18,ll. 29-40:
“Deep learning is part of a broader family of machine learning methods based on learning representations of data.  An observation (e.g., an image) can be represented in many ways such as a vector of intensity values per pixel, or in a more abstract way as a set of edges, regions of particular shape, etc. Some representations are better than others at simplifying the learning task (e.g., face recognition or facial expression recognition).  One of the promises of deep learning is replacing handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.”;

c.20,ll. 4-51:
“An autoencoder, autoassociator or Diabolo network is an artificial neural network used for unsupervised learning of efficient codings.  The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.  Recently, the autoencoder concept has become more widely used for learning regression models of data.  Architecturally, the simplest form of an autoencoder is a feedforward, non-recurrent neural network very similar to the multilayer perceptron (MLP)--having an input layer, an output layer and one or more hidden layers connecting them--, but with the output layer having the same number of nodes as the input layer, and with the purpose of reconstructing its own inputs (instead of predicting the target value given inputs).  Therefore, autoencoders are unsupervised learning models.  An autoencoder always consists of two parts, the encoder and the decoder.  Various techniques exist to prevent autoencoders from learning the identity function and to improve their ability to capture important information and learn richer representations.  The autoencoder may include any suitable variant of autoencoder such as a Denoising autoencoder, sparse autoencoder, variational autoencoder, and contractive autoencoder.
In a denoising autoencoder, the input (e.g., SEM) image may be considered as a noisy version of its corresponding (e.g., CAD) image.  Denoising autoencoders are generally configured to take a partially corrupted input while training to recover the original undistorted input.  This technique has been introduced with a specific approach to good representation.  A good representation is one that can be obtained robustly from a corrupted input and that will be useful for recovering the corresponding clean input.  This definition contains the following implicit assumptions: The higher level representations are relatively stable and robust to the corruption of the input; and It is necessary to extract features that are useful for representation of the input distribution.  A denoise convolutional autoencoder is generally defined as a denoising autoencoder that includes convolutional layers.  Denoising autoencoders constructed using convolutional layers have better image denoising performance for their ability to exploit strong spatial correlations.  The denoise convolutional autoencoders included in the embodiments described herein may be further configured as described by Gondara in "Medical Image Denoising Using Convolutional Denoising Autoencoders," arXiv:1608.04667v2, Sep. 18, 2016, 6 pages, which is incorporated by reference as if fully set forth herein.  The embodiments described herein may be further configured as described in this reference.”; 

c.22,ll. 24-40:
“In the embodiment shown in FIG. 3, the learning based model may be a regression model or any of the learning based models described herein.  In one such example, the learning based model may be in the form of a deep convolution autoencoder (DCAE).  The encoder portion of the learning based model may include, for example, five convolutional layers with kernel sizes of, for example, 5×5, a stride of 2, and no zero padding.  Each convolutional layer may be followed by a leaky rectified linear unit.  The decoder portion of the learning based model may have a similar architecture as the encoder, but uses deconvolutional layers.  The decoder may have separate weights, which are trainable to have more freedom to reconstruct design images.  This configuration of the learning based model is non-limiting, however, in that other learning based models may also be used in this embodiment and the parameters of the DCAE described above may be altered as necessary or desired.”; and

c.26,ll. 23-28:
“The end-to-end learning based model approaches described herein are different from the currently used methods in that, in these embodiments, the whole registration process is carried out in a single feedforward network.  These embodiments therefore are simpler, require fewer parameters to tune, run much faster and thus increase throughput.”

wherein “tune” is defined:

BRITISH DICTIONARY DEFINITIONS FOR TUNE
tune
14	(tr often foll by up) to make fine adjustments to (an engine, machine, etc) to obtain optimum performance

wherein “fine” is defined:
BRITISH DICTIONARY DEFINITIONS FOR FINE (1 OF 4)
fine1
adjective
1	excellent or choice in quality; very good of its kind: a fine speech

wherein “excellent” is defined:

BRITISH DICTIONARY DEFINITIONS FOR EXCELLENT
excellent
adjective
1	exceptionally good; extremely meritorious; superior

wherein “adjustment” is defined:

BRITISH DICTIONARY DEFINITIONS FOR ADJUSTMENT
adjustment
noun
1	the act of adjusting or state of being adjusted

wherein “adjusting” is defined:
BRITISH DICTIONARY DEFINITIONS FOR ADJUST
adjust
verb
1	(tr) to alter slightly, esp to achieve accuracy; regulate: to adjust the television; 

and

c.29,ll. 33-43:
	“The embodiments described herein may also be used for die-to-database defect detection, in which inspection images are aligned to design data for a specimen so that they can be used in combination to detect defects on the specimen.  Such defect detection can provide increased sensitivity to detect pattern defects (e.g., missing patterned features, dummy defects, bridge defects, etc.).  Although such applications may only require coarse alignment between defect images and design, the embodiments described herein can provide the appropriate alignment for die-to-database defect detection.”).
	
	Thus, Ha does not teach, as indicated in bold above, the claimed:
A.	“filter size for each convolution layer”; and
B.	“the pre-determined set of convolutional layers are tuned”.

Accordingly, Mei teaches:
A.	filter size (or “a filter quantity of…4” via fig. 5:2nd box down: “4” of “4;3X3”) for each convolution layer (as shown in fig. 5:2nd, 4th, 6th, 8th, 10th, 12th, 14th and 15th boxes down via c.10,ll. 28-35:
“Optionally, except the eighth convolutional layer, a filter quantity of a next convolutional layer is two times a filter quantity of a previous convolutional layer, and a filter quantity of the seventh convolutional layer is equal to that of the eighth convolutional layer.  Referring to FIG. 5, if a filter quantity of the first convolutional layer is 4, filter quantities of subsequent convolutional layers are sequentially 8, 16, 32, 64, 128, 256, and 256.”).
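
As an illustrative sketch only, the filter-quantity rule quoted from Mei (each convolutional layer has twice the filters of the previous layer, except that the eighth layer repeats the seventh) can be expressed as follows; the function name is hypothetical.

```python
def mei_filter_quantities(first: int = 4, num_layers: int = 8) -> list[int]:
    """Filter quantity per convolutional layer under the doubling rule quoted from Mei."""
    quantities = [first]
    for layer in range(2, num_layers + 1):
        if layer == num_layers:
            # The eighth (last) layer keeps the seventh layer's filter quantity.
            quantities.append(quantities[-1])
        else:
            # Every other layer doubles the previous layer's filter quantity.
            quantities.append(quantities[-1] * 2)
    return quantities


print(mei_filter_quantities())  # [4, 8, 16, 32, 64, 128, 256, 256]
```

Layers four through seven of this sequence give 32, 64, 128 and 256, the values relied on in the rejection of claim 6.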

Thus, one of ordinary skill in the art of convolution layers can modify Ha’s said fig. 5:512: “Fully connected layer(s)” with Mei’s teaching of fig. 5: “4” of “4;3X3” and recognize that the modification is predictable or looked forward to because Mei’s filters have “a relatively small calculation amount” and thus a computer processes data faster than with larger filters, such as a filter of 100,000,000,000 x 100,000,000,000 pixels, via Mei, c.10,ll. 42-44:
“Optionally, a filter of 3.times.3 pixels may be adopted for each convolutional layer, and the filter of 3.times.3 pixels has a relatively small calculation amount.”
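
To illustrate why a 3x3 filter has “a relatively small calculation amount,” the multiply-accumulate (MAC) count of a convolution layer scales with the square of the kernel size. The layer dimensions below are hypothetical and serve only to compare a 3x3 filter with a 5x5 filter under otherwise identical conditions.

```python
def conv_macs(out_h: int, out_w: int, kernel: int, in_channels: int, out_channels: int) -> int:
    """Multiply-accumulate operations for one convolution layer (bias ignored)."""
    return out_h * out_w * kernel * kernel * in_channels * out_channels


# Hypothetical layer: 64x64 output, 4 input filters, 8 output filters.
macs_3x3 = conv_macs(64, 64, 3, 4, 8)
macs_5x5 = conv_macs(64, 64, 5, 4, 8)
print(macs_3x3, macs_5x5)  # 1179648 3276800
```

Under these assumptions the 5x5 filter needs 25/9, or roughly 2.8 times, as many operations as the 3x3 filter.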

	Accordingly, Lin teaches the claimed:
B.	the pre-determined set of convolutional layers are tuned (via “finetune the successive layers” as shown in fig. 12: “Finetune conv fc fc” via page 1905, left column:
“Typical transfer learning scheme for neural networks fixes the first several layers of the model trained for another domain and finetune the successive layers with data from the target domain. The first several layers usually extract general features, which are considered to be similar between the source and the target domains, while the successive layers are classifiers or regressors that need to be adjusted. Fig. 12 shows an example of the transfer learning scheme. We first train a model with source domain data and then use the source domain model as the starting point for the training of the target domain. During the training for the target domain, the first k layers are fixed, while the rest layers are finetuned. We denote this scheme as TFk, shortened from “Transfer and Fix,” where k is the parameter for the number of fixed layers.”).
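
A minimal sketch of the TFk scheme quoted from Lin, in which the first k layers of a source-domain model are fixed and the remaining layers are finetuned on target-domain data, is shown below. The layer names loosely follow fig. 12 (“conv … fc fc”) but are hypothetical, and no particular deep-learning library is assumed.

```python
def transfer_and_fix(layer_names: list[str], k: int) -> dict[str, str]:
    """Label each layer as 'fixed' (first k layers) or 'finetune' (remaining layers)."""
    if not 0 <= k <= len(layer_names):
        raise ValueError("k must be between 0 and the number of layers")
    return {
        name: ("fixed" if index < k else "finetune")
        for index, name in enumerate(layer_names)
    }


# Hypothetical four-layer model; TF2 fixes the two convolutional layers.
print(transfer_and_fix(["conv1", "conv2", "fc1", "fc2"], k=2))
# {'conv1': 'fixed', 'conv2': 'fixed', 'fc1': 'finetune', 'fc2': 'finetune'}
```

In a gradient-based framework, a “fixed” layer would typically correspond to excluding that layer's parameters from gradient updates during target-domain training.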

Thus, one of ordinary skill in transfer learning can modify Ha’s teaching of said “convolutional layers” and “fine-tuning or transfer learning” via Ha, c.17,ll. 12-23:
“In general, the embodiments described herein use a learning based approach that is generic and can be applied to any imaging mode, imaging tool, and specimen.  However, as it is a data-driven method, when it is being applied to specific imaging modes, data may be collected from these imaging modes and the learning based models may be trained with this data.  However, in most of the cases, the learning based model does not need to be trained from scratch.  Instead, the learning that the model has done can be transferred from different imaging modes to speed up the training process.  In the machine learning literature, this is often referred to as fine-tuning or transfer learning.”

with Lin’s teaching of said “finetune the successive layers” as shown in fig. 12: “Finetune conv fc fc” and recognize that the modification is predictable or looked forward to because the combination “provides…additional accuracy improvement” via Lin, page 1908, left column, section B. Knowledge Transfer From N10 to N7, last paragraph:
“In Fig. 15(c), we enable transfer learning plus active learning, which provides 7% to 11% additional accuracy improvement for 10% to 40% amount of training data from the target domain.”

Regarding claim 2, Ha as combined teaches the method of claim 1 wherein the encoding (via said fig. 5:502,506: “Encoder”) with the convolutional autoencoder (via said fig. 5:502,506: “Encoder”) comprises a flattening (via “dimensionality reduction” comprising a “not thick” dimension via Dictionary.com) step (via fig. 9:902: “Program instructions”) followed by an embedding (via “mapping” via fig. 2:204: “alignment”) step (via fig. 9:902: “Program instructions”), the embedding (said “mapping” via fig. 2:204: “alignment”) step (said fig. 9:902: “Program instructions”) involving a fully-connected embedding layer (said fig. 5:512: “Fully connected layer(s)”) which outputs a one-dimensional vector (or “a vector of intensity values” cited in the rejection of claim 1 wherein the vector has a magnitude or a size dimension via:
c.2,ll.29-43:
“Using different types of information for a specimen in combination therefore requires some mapping of one type of information to another.  Oftentimes, currently, such mapping may be performed by aligning different images generated for a specimen to each other (e.g., using alignment features in the images and/or on the specimen and/or aligning the different images to a common reference (e.g., design)).  However, due to differences between the different types of information (e.g., different resolutions, pixel sizes, imaging methods (such as optical vs.  electron beam), etc.), alignment of one type of information to another to establish a mapping between the different types of information can be relatively difficult and is susceptible to errors in the alignment method and/or algorithm and noise sources on the specimen (e.g., color variation).”; and

Dictionary.com, emphasis added
reduce
14	to thin or dilute:
to reduce paint with oil or turpentine.
thin
adjective, thin·ner, thin·nest.
1	having relatively little extent from one surface or side to the opposite; not thick:
thin ice.
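
For illustration, the claimed flattening step followed by a fully-connected embedding layer that outputs a one-dimensional vector can be sketched with NumPy. The feature-map shape and the 256-element output size are hypothetical choices (the latter mirroring claim 3), and the random weights stand in for trained values.

```python
import numpy as np


def flatten_and_embed(feature_maps: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Flatten convolutional feature maps, then apply a fully-connected embedding layer."""
    flat = feature_maps.reshape(-1)  # flattening step: (C, H, W) -> (C*H*W,)
    return weights @ flat + bias     # embedding step: one-dimensional output vector


rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(4, 5, 5))    # hypothetical encoder output (C, H, W)
weights = rng.normal(size=(256, 4 * 5 * 5))  # hypothetical embedding weights
bias = np.zeros(256)

embedding = flatten_and_embed(feature_maps, weights, bias)
print(embedding.shape)  # (256,)
```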

Regarding claim 4, Ha as combined teaches the method of claim 1 wherein the set of convolution layers (said fig. 5:512: “Fully connected layer(s)”) comprises at least four convolution layers (or “five convolutional layers” cited in the rejection of claim 1).
Regarding claim 8, Ha as combined teaches the method of claim 1, further comprising decoding (via fig. 4:408: “Decoder” and fig. 7:706: “Decoder”) the set of encoded shape data (said via figs. 4:404, 5:502,506 and 7:702: “Encoder”) into decoded data using the convolutional autoencoder (said via figs. 4:404, 5:502,506 and 7:702: “Encoder”).
Regarding claim 11, Ha as combined teaches the method of claim 1, further comprising: 
determining an error value (via fig. 4:“L2-loss”) for the set of encoded shape data (said via figs. 4:404, 5:502,506 and 7:702: “Encoder”); and
 outputting the input set of shape data instead of the set of encoded shape data if the error value of the set of encoded shape data is greater than a pre-determined threshold (thus, the broadest reasonable interpretation of claim 11 has been met via MPEP 2111.04 II. CONTINGENT LIMITATIONS).
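
The contingent output of claim 11 can be sketched as follows, assuming for illustration that the error value is an L2 (Euclidean) distance between the input shape data and its reconstruction; the function names, data and threshold are hypothetical.

```python
import numpy as np


def l2_error(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Euclidean distance between the input data and its reconstruction."""
    return float(np.linalg.norm(original - reconstructed))


def select_output(original, encoded, reconstructed, threshold: float):
    """Return the input data when the encoding error exceeds the threshold, else the encoding."""
    if l2_error(original, reconstructed) > threshold:
        return original
    return encoded


shape_data = np.array([[0.0, 1.0], [1.0, 0.0]])
encoded = np.array([0.5])                        # hypothetical compressed form
good_recon = np.array([[0.0, 0.9], [1.0, 0.0]])  # error of about 0.1
bad_recon = np.zeros((2, 2))                     # error of sqrt(2)

print(select_output(shape_data, encoded, good_recon, threshold=0.5))  # [0.5]
print(select_output(shape_data, encoded, bad_recon, threshold=0.5))   # prints the 2x2 input data
```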

Regarding claim 12, Ha as combined teaches the method of claim 11 wherein the error value (said via fig. 4:“L2-loss”) is based on a distance criterion (said fig. 4:“L2-loss”) or a difference in dose energy to manufacture (via “during a semiconductor manufacturing process” cited in the rejection of claim 1) the set of shape data (said or “images…which may have…shape”) on a surface (said semiconductor), wherein the distance criterion (said fig. 4:“L2-loss”) or the difference in dose energy are based on the design rules (said “design rules” represented in fig. 4:400 via said “designers” making designs as shown in fig. 4:400).
Regarding claim 13, Ha as combined teaches the method of claim 1 wherein the device fabrication process (said or “fabricating semiconductor devices”) is a semiconductor fabrication process (said via “during a semiconductor manufacturing process” cited in the rejection of claim 1).
Regarding claim 15, Ha as combined teaches the method of claim 1 wherein the SEM image (said via “a scanning electron microscope (SEM)” as represented in fig. 1:10) further comprises a simulated (via “complex simulation”) mask (via said “etch”) image (or “reticle image” that “can serve as” the “complex simulation” via c.5,ll. 8-25:
“The terms "design," "design data," and "design information" as used interchangeably herein generally refer to the physical design (layout) of an IC and data derived from the physical design through complex simulation or simple geometric and Boolean operations.  In addition, an image of a reticle acquired by a reticle inspection system and/or derivatives thereof can be used as a "proxy" or "proxies" for the design, Such a reticle image or a derivative thereof can serve as a substitute for the design layout in any embodiments described herein that use a design.  The design may include any other design data or design data proxies described in commonly owned U.S.  Pat.  No. 7,570,796 issued on Aug.  4, 2009 to Zafar et al. and U.S.  Pat.  No. 7,676,077 issued on Mar.  9, 2010 to Kulkarni et al., both of which are incorporated by reference as if fully set forth herein.  In addition, the design data can be standard cell library data, integrated layout data, design data for one or more layers, derivatives of the design data, and full or partial chip design data.”).

Claims 5-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ha et al. (US Patent 10,733,744) in view of Mei (US Patent 10,740,607) and Lin et al. (Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection) as applied above further in view of Liang et al. (US Patent App. Pub. No.: US 2019/0171223 A1).
Regarding claim 5, Ha as combined teaches the method of claim 4 wherein the set of convolution layers (said fig. 5:512: “Fully connected layer(s)”) comprises: 
a first convolution layer (via said fig. 5:512: “Fully connected layer(s)”) using a first 5x5 kernel (or “kernel sizes of, for example, 5.times.5” cited in the rejection of claim 1); 
a second convolution layer (via said fig. 5:512: “Fully connected layer(s)”) following the first convolution layer (via said fig. 5:512: “Fully connected layer(s)”) and using a second 5x5 kernel (said “kernel sizes of, for example, 5.times.5” cited in the rejection of claim 1); 
a third convolution layer (via said fig. 5:512: “Fully connected layer(s)”) following the second convolution layer (via said fig. 5:512: “Fully connected layer(s)”) and using a first 3x3 kernel; and 
a fourth convolution layer (via said fig. 5:512: “Fully connected layer(s)”) following the third convolution layer (via said fig. 5:512: “Fully connected layer(s)”) and using a second 3x3 kernel.
	Thus, Ha as combined does not teach, as indicated in bold above, the claimed “3X3”, twice.

Accordingly, Liang teaches the claimed 3X3 twice (via “the last two layers have kernel size 3.times.3” via:
[0031] Real operating images are images of real-world scenes related to the autonomous device during operation of the device.  For example, a real operating image 408 may be obtained or captured by a camera that is associated with the autonomous device.  In the case of a road vehicle, the camera may be a front-mounted camera that captures images of the road facing the vehicle.  The fake virtual image 410 corresponds to a canonical representation of the real operating image in the virtual domain.  As used herein a "canonical representation" refers to a pixel level representation that includes reliable prediction information and a minimum of non-essential background information.
[0032] The one or more fake virtual images 410 are input to the predictor 404.  “The predictor 404 includes five convolutional layers and three fully connected layers.  The first three convolutional layers have kernel size 5.times.5 and stride size 3, while the last two layers have kernel size 3.times.3 and stride size 1.  No padding is used.  The last convolutional layer is flattened and immediately followed by four fully connected layers with output size 100, 50, and 1 respectively.  All layers use ReLU activation.  In an alternate configuration, the predictor 404 includes four fully connected layers with output size 100, 50, 10 and 1 respectively.  The predictor 404 may be based on the network architecture used in DAVE-2 system described in M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al. End to end learning for self-driving cars.  arXiv preprint arXiv: 1604.07316, 2016.”).

	Thus, said one of ordinary skill in the art of convolutional layers can modify Ha’s layers with Liang’s teaching of the two 3X3 kernels, as shown in Liang’s fig. 4B:404, and recognize that the modification is predictable or looked forward to because the two 3X3 kernels result in “reliable prediction information and a minimum of non-essential background information” (Liang, [0031]).

Regarding claim 6, Ha as combined teaches the method of claim 5 wherein the first, second, third and fourth convolutional layers (via said fig. 5:512: “Fully connected layer(s)”) use filter sizes (or said layers as modified via the combination) of 32, 64, 128 and 256 (or “32” as shown in Mei’s fig. 5, as combined, starting at the 4th “Convolutional layer 32, 3X3” and going to “256” as shown in fig. 5), respectively.
Regarding claim 7, Ha as combined teaches the method of claim 5 wherein a stride of 2 (or “a stride of 2” cited in the rejection of claim 1) is used in each of the four convolution layers (via said fig. 5:512: “Fully connected layer(s)”).
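
For illustration only, the combined configuration relied on for claims 5-7 (two 5x5 kernels followed by two 3x3 kernels, filter sizes of 32, 64, 128 and 256 per Mei, and a stride of 2 per Ha) can be traced layer by layer. The 64x64 input size and the absence of zero padding are assumptions made for this example; the claims do not specify them.

```python
def conv_output_size(size: int, kernel: int, stride: int) -> int:
    """Spatial output size of one convolution layer with no zero padding."""
    return (size - kernel) // stride + 1


# (kernel size, filter size) for the four claimed convolution layers.
LAYERS = [(5, 32), (5, 64), (3, 128), (3, 256)]
STRIDE = 2


def trace_shapes(input_size: int) -> list[tuple[int, int, int]]:
    """Return the (filters, height, width) output shape after each layer."""
    shapes = []
    size = input_size
    for kernel, filters in LAYERS:
        size = conv_output_size(size, kernel, STRIDE)
        shapes.append((filters, size, size))
    return shapes


print(trace_shapes(64))
# [(32, 30, 30), (64, 13, 13), (128, 6, 6), (256, 2, 2)]
```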

Claims 3 and 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ha et al. (US Patent 10,733,744) in view of Mei (US Patent 10,740,607) and Lin et al. (Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection) as applied above further in view of Keisler et al. (US Patent 10,248,663).
Regarding claim 3, Ha as combined teaches the method of claim 2 wherein the one-dimensional vector (said or “a vector of intensity values” cited in the rejection of claim 1 wherein the vector has a magnitude or a size dimension) comprises 256 elements.
Thus, Ha as combined does not teach, as indicated in bold above, the claimed “256 elements”.
Accordingly, Keisler teaches the claimed:
256 elements (or “256-bit” via c.31,ll. 1-13:
“As one example, 64-bit (and 256-bit) binary codes may be generated using, for example, the following two example techniques:  
1.  Autoencoder--In some embodiments, an autoencoder is trained in TensorFlow.  In some embodiments, the encoding layer has 64 nodes with rectified linear (relu) activation.  In some embodiments, this layer is binarized based on zero/non-zero activation. 
 2.  PCA (principal component analysis)--In some embodiments, the top 64 principal components are obtained, the 2048-dimensional feature vectors are projected onto them, and binarized, for example, on positive/negative coefficients.”).
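
The two binarization techniques quoted from Keisler can be sketched with NumPy as follows: an encoding layer's ReLU activations are binarized as zero/non-zero, and PCA projections are binarized by sign. Centering the features before PCA, the random data, and the choice of 256 components (to mirror the claimed 256 elements) are assumptions for illustration.

```python
import numpy as np


def binarize_activations(activations: np.ndarray) -> np.ndarray:
    """Autoencoder technique: 1 where a ReLU activation is non-zero, else 0."""
    return (activations != 0).astype(np.uint8)


def binarize_pca(features: np.ndarray, n_components: int) -> np.ndarray:
    """PCA technique: project onto the top principal components, then keep the sign."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[:n_components].T
    return (projections > 0).astype(np.uint8)


rng = np.random.default_rng(0)
relu_codes = binarize_activations(np.maximum(0.0, rng.normal(size=(10, 64))))
pca_codes = binarize_pca(rng.normal(size=(300, 2048)), n_components=256)  # hypothetical data
print(relu_codes.shape, pca_codes.shape)  # (10, 64) (300, 256)
```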

Thus, one of ordinary skill in the art of autoencoders can modify Ha’s encoder with Keisler’s 256-bit binary codes and recognize that the modification is predictable or looked forward to for the same reasons as discussed regarding claim 9, below.

Regarding claim 9, Ha as combined teaches the method of claim 1 wherein: 
the set of shape data (said or “images…which may have…shape”) comprises a grid of tiles decomposed from a larger image; and 
the encoding (via said fig. 5:502,506: “Encoder”) comprises encoding the grid of tiles on a tile-by-tile basis.
	Thus, Ha as combined does not teach, as indicated in bold above, the claimed:
A.	a grid of tiles decomposed from a larger image; and
B.	the grid of tiles on a tile-by-tile basis.

Accordingly, Keisler teaches:
A.	a grid of tiles (via fig. 2A:202: “Tiling Function with 2 Overlapping Tile Grids” detailed in fig. 2B, wherein fig. 2B:242, of maximum overlap of pixels, is haloed by fig. 2B:234,236,232,240,238, of lesser overlap of pixels) decomposed from a larger image; and
B.	the grid of tiles (said via fig. 2A:202: “Tiling Function with 2 Overlapping Tile Grids”) on a tile-by-tile basis (as indicated in fig. 2A:204: “Tile(s)” and fig. 2A:212 “Tile Key” and fig. 2C:270: “each tile” wherein “each” comprises one tile by one tile).
	Thus, one of ordinary skill in the art of image shapes can modify Ha’s teaching of said “images…which may have…shape” with Keisler’s teaching of fig. 2A:202: “Tiling Function with 2 Overlapping Tile Grids” and fig. 2C by creating tiles from the larger images containing said shapes and recognize that the modification is predictable or looked forward to because Keisler’s teaching is “computationally efficient” via Keisler, c.6,ll. 29-36:
“Thus, as shown in the example described above, extracting features may include extracting features from the input, raw image space, to a higher level, but lower dimensionality space.  This may provide various performance benefits and improvements in computation and memory usage (e.g., a smaller amount of data used to store the visual information in an image tile, where processing on the smaller amounts of data is more computationally efficient as well).”
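
A minimal sketch of decomposing a larger image into a grid of tiles and encoding the grid on a tile-by-tile basis is shown below. For simplicity it uses a single non-overlapping grid rather than Keisler’s two overlapping tile grids, and the `encode` argument is a hypothetical stand-in for the convolutional autoencoder.

```python
import numpy as np


def tile_grid(image: np.ndarray, tile: int) -> list[list[np.ndarray]]:
    """Split a 2-D image into a grid of tile x tile blocks (row-major)."""
    height, width = image.shape
    if height % tile or width % tile:
        raise ValueError("image dimensions must be multiples of the tile size")
    return [
        [image[r:r + tile, c:c + tile] for c in range(0, width, tile)]
        for r in range(0, height, tile)
    ]


def encode_tile_by_tile(image: np.ndarray, tile: int, encode) -> list[list]:
    """Apply `encode` to each tile independently, preserving the grid layout."""
    return [[encode(block) for block in row] for row in tile_grid(image, tile)]


image = np.arange(64, dtype=float).reshape(8, 8)  # hypothetical larger image
codes = encode_tile_by_tile(image, 4, encode=lambda t: float(t.mean()))
print(codes)  # [[13.5, 17.5], [45.5, 49.5]]
```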

Claim 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ha et al. (US Patent 10,733,744) in view of Mei (US Patent 10,740,607) and Lin et al. (Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection) as applied above further in view of Keisler et al. (US Patent 10,248,663) as applied above further in view of CAMMAROTA et al. (US Patent App. Pub. No. US 2020/0073636 A1).
Regarding claim 10, Ha as combined teaches the method of claim 9, wherein each tile in the grid of tiles (via said or “images…which may have…shape” as modified via the combination) comprises a halo (via said or “images…which may have…shape”, as modified via the combination, comprised by a tile being a geometric shape via a “square” or “other shape”), the halo (via said or “images…which may have…shape”, as modified via the combination) being a region of neighboring pixels surrounding the tile (via said or “images…which may have…shape” as modified via the combination), the halo (via said or “images…which may have…shape”, as modified via the combination) having a size chosen based on at least one of: the number of convolution layers  and the kernel size of the convolution layers.
Thus, Ha as combined does not teach, when claim 10 is considered as a whole and as indicated in bold above, the claimed “the halo having a size chosen based on at least one of: the number of convolution layers and the kernel size of the convolution layers”.

Accordingly, Cammarota teaches:
the halo (via figure 5A: “Padded input Feature 520” and fig. 5B:564: “constant padding” and fig. 5B:562: “Padded Input feature”) having a size chosen based on at least one of: the number of convolution layers and (i.e., otherwise) the kernel size (via “padding layers is selected based on the filter kernel size (e.g., padding layer(s)=(filter kernel size-1)/2)”) of the convolution layers (via:
“[0028] In convolutional neural networks, padding is a layer pre-processing technique for which padding values are known.  Hence, a portion of the computationally intensive MAC operations (e.g., the multiplications) can use the known information about the padding values to speed up multiplications, and overall MAC operations--by a single MAC operation or multiple simultaneous MAC operations.  For example, in applications that involve a hardware accelerator, information about known values can be exploited to speed up operations by specializing the circuitry.”; and

“[0061] FIG. 5B is a block diagram 550 illustrating multilayer padding of a padded input feature 560 to maintain an input feature size during multiply-accumulate (MAC) operations using a 5.times.5 filter kernel 590, according to aspects of the present disclosure.  In this example, a padded input feature 560 is composed of input feature values 562 (i1_1, i1_2, .  . . , i3_3) and the constant padding values 564, which illustrate a multilayer (e.g., =2 layer) constant padding type.  The constant padding values 564 may be added at an input of a convolutional neural network or layer by layer in the neural network.  Although shown using the constant padding type, it should be recognized that other padding types are considered, including zero padding type, reflective mirror padding type, symmetric mirror padding type, and edge mirror padding type.  For example, the mirror padding types may be beneficial in style transfer applications.  As shown in FIGS. 5A and 5B, symmetrical padding may involve an odd sized filter kernel, in which the number of padding layers is selected based on the filter kernel size (e.g., padding layer(s)=(filter kernel size-1)/2).”).

Thus, one of ordinary skill in the art of convolutional neural networks can modify Ha’s teaching of said “images…which may have…shape”, as modified via the combination, with Cammarota’s teaching of said figure 5A: “Padded Input Feature 520” and fig. 5B:564: “constant padding” and fig. 5B:562: “Padded Input feature” by applying padding to said Ha’s teaching of said “images…which may have…shape”, as modified via the combination, and recognize that the modification is predictable or looked forward to because the modification is used “to speed up operations” (Cammarota, [0028], cited above).
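
For illustration, Cammarota’s relation for the number of padding layers, (filter kernel size - 1)/2 for an odd kernel, can be extended to a stack of convolution layers by summing it over the layers, which gives the halo a tile needs so that each output pixel in the tile sees its full neighborhood. This extension assumes stride-1 layers sharing one odd kernel size; strided layers would require a larger halo. The function names and image data are hypothetical.

```python
import numpy as np


def padding_layers(kernel_size: int) -> int:
    """Padding per side for an odd kernel: (kernel size - 1) / 2."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel size must be odd")
    return (kernel_size - 1) // 2


def halo_size(num_layers: int, kernel_size: int) -> int:
    """Halo width for a stack of stride-1 layers sharing one odd kernel size."""
    return num_layers * padding_layers(kernel_size)


def tile_with_halo(image: np.ndarray, row: int, col: int, tile: int, halo: int) -> np.ndarray:
    """Cut a tile plus its halo; pixels outside the image are constant (zero) padded."""
    padded = np.pad(image, halo, mode="constant")
    # Original pixel (row, col) sits at (row + halo, col + halo) in the padded image.
    return padded[row:row + tile + 2 * halo, col:col + tile + 2 * halo]


halo = halo_size(num_layers=2, kernel_size=5)
patch = tile_with_halo(np.ones((8, 8)), 0, 0, tile=4, halo=halo)
print(halo, patch.shape)  # 4 (12, 12)
```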

Claim 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ha et al. (US Patent 10,733,744) in view of Mei (US Patent 10,740,607) and Lin et al. (Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection) as applied above further in view of Chuang et al. (US 2015/0255730 A1).
Regarding claim 14, Ha as combined teaches the method of claim 1 wherein the device fabrication process (said or “fabricating semiconductor devices”) is a flat-panel display fabrication process.
Thus, Ha as combined does not teach, as indicated in bold above, the claimed “a flat-panel display”. Accordingly, Chuang teaches “new flat-panel displays” via:
“[0003] An electroluminescent device is a semiconductor device capable of converting electrical energy into light with high conversion efficiency, which is commonly used as the luminous elements in indication lights, display panels, and optical reading/writing heads, etc. The electroluminescent device, having characteristics such as free viewing angle, simple fabrication process, low production cost, fast response, wide operation temperature range, and full color display, etc., is expected to become the mainstream of new flat-panel displays.”

	Thus, one of ordinary skill in the art of semiconductor fabrication can modify Ha’s fabrication of semiconductors with the new flat-panel displays comprising an “electroluminescent…semiconductor device” (Chuang, cited above: figs. 3-5) and recognize that the modification is predictable or looked forward to because Chuang’s electroluminescent device has a “simple fabrication process, low production cost” and “is expected to become the mainstream of new flat-panel displays” (Chuang, cited above).

Claim 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ha et al. (US Patent 10,733,744) in view of Mei (US Patent 10,740,607) and Lin et al. (Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection) as applied above further in view of TSUTSUI et al. (US Patent App. Pub. No.: US 2020/0184137 A1).
	Regarding claim 21, Ha as combined teaches the method of claim 1, wherein the design rules (said “design rules”) comprise a minimum line width (via “line width, thickness, etc.”) or a minimum line-to-line (said via “line width, thickness, etc.”) spacing (via c.1,ll. 51 to c.2,l.2:
“Metrology processes are also used at various steps during a semiconductor manufacturing process to monitor and control the process.  Metrology processes are different than inspection processes in that, unlike inspection processes in which defects are detected on specimens, metrology processes are used to measure one or more characteristics of the specimens that cannot be determined using currently used inspection tools.  For example, metrology processes are used to measure one or more characteristics of specimens such as a dimension (e.g., line width, thickness, etc.) of features formed on the specimens during a process such that the performance of the process can be determined from the one or more characteristics.  In addition, if the one or more characteristics of the specimens are unacceptable (e.g., out of a predetermined range for the characteristic(s)), the measurements of the one or more characteristics of the specimens may be used to alter one or more parameters of the process such that additional specimens manufactured by the process have acceptable characteristic(s).”).

	Thus, Ha as combined does not teach, as indicated in bold above, the claimed “a minimum line width or a minimum line-to-line spacing”.

Accordingly, TSUTSUI teaches the claimed “a minimum line width or a minimum line-to-line spacing” via “minimum…width of a wiring” and “minimum…interval between a wiring” via TSUTSUI, [0108]:
“[0108] A design rule, which is a restriction on layout design, refers to the minimum values or the like of the size of components (a semiconductor layer, a conductor layer, and the like) of each element and the interval therebetween.  Examples of a design rule include the maximum values and the minimum values of the width of a wiring, the interval between a wiring and an adjacent wiring, the interval between a wiring and an adjacent element, the size of a contact hole, and the like.”
	
Thus, one of ordinary skill in the art of designs can modify Ha’s teaching of the “design rules” and the “line width, thickness, etc.” with TSUTSUI’s minimums and recognize that the modification is predictable or looked forward to because TSUTSUI’s minimums are used to satisfy rules regarding the width or spacing of a line or a wire, as indicated in TSUTSUI’s fig. 6:S29: “Is design rule satisfied?”.
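
A minimal sketch of checking a minimum line width and a minimum line-to-line spacing, in the sense of TSUTSUI’s “the minimum values of the width of a wiring” and “the interval between a wiring and an adjacent wiring,” is shown below for parallel lines along one axis. The coordinates and rule values (in arbitrary units) are hypothetical.

```python
def check_width_and_spacing(lines, min_width, min_spacing):
    """Return design-rule violations for parallel lines given as (start, end) intervals.

    Each violation is (rule, index, measured value); for spacing, the index is
    that of the left-hand line of the offending pair.
    """
    ordered = sorted(lines)
    violations = []
    for i, (start, end) in enumerate(ordered):
        width = end - start
        if width < min_width:
            violations.append(("min_width", i, width))
        if i + 1 < len(ordered):
            spacing = ordered[i + 1][0] - end
            if spacing < min_spacing:
                violations.append(("min_spacing", i, spacing))
    return violations


# Hypothetical lines (start, end) and rule values.
lines = [(0, 20), (30, 45), (50, 80)]
print(check_width_and_spacing(lines, min_width=18, min_spacing=8))
# [('min_width', 1, 15), ('min_spacing', 1, 5)]
```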

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 16-19 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ha et al. (US Patent 10,733,744).
Regarding claim 16, claim 16 is rejected the same as claims 1, 8 and 11. Thus, the argument presented regarding claims 1, 8 and 11 is equally applicable to claim 16. Accordingly, Ha discloses a method for compression of shape data for a set of electronic designs, the method comprising:
inputting a set of shape data (said or “images…which may have…shape”), wherein the set of shape data represents a set of shapes (said or “regions of particular shape”) for a device fabrication process (said or “fabricating semiconductor devices”); 
inputting a set of parameters (or “the parameters…may be altered” cited in the rejection of claim 1) including a set of convolution layers (said via fig. 5:512: “Fully connected layer(s)”) for a convolutional autoencoder (said via figs. 4:404, 5:502,506, 7:702: “Encoder”), wherein the set of parameters (said “the parameters…may be altered” cited in the rejection of claim 1) are determined  (via “as necessary or desired”) using design rules (said “design rules”) for the set (said fig. 3:300,306 or fig. 5:504,508) of electronic designs (via said “designers”); 
encoding (said via figs. 4:404, 5:502,506, 7:702: “Encoder”) the set of shape data  to compress (said via “dimensionality reduction”) the set of shape data, using the set of convolution layers (said via fig. 5:512: “Fully connected layer(s)”) of the convolutional autoencoder (said via figs. 4:404, 5:502,506, 7:702: “Encoder”), to create a set of encoded shape data; 
decoding (said via figs. 4:408,7:706: “Decoder”) the set of encoded shape data  into decoded data using the convolutional autoencoder; 
calculating a loss (said via fig. 4:“L2-loss” as mentioned regarding claim 11) by comparing the decoded data (said via fig. 4:408: “Decoder” and fig. 7:706: “Decoder”) with the input set of shape data (said or “images…which may have…shape”); and
adjusting the set of parameters (said “the parameters…may be altered”) based on the calculated loss (said via fig. 4:“L2-loss”), wherein the set of parameters (said “the parameters…may be altered” cited in the rejection of claim 1) are tuned (via “parameters to tune”) for increased accuracy (via said “tune” comprising “superior” “accuracy”) of the encoded (said via figs. 4:404, 5:502,506, 7:702: “Encoder”) shape data (said or “images…which may have…shape”) based on the design rules (said “design rules”) for the set of electronic designs (via said “designers” via c.26,ll. 23-28:
“The end-to-end learning based model approaches described herein are different from the currently used methods in that, in these embodiments, the whole registration process is carried out in a single feedforward network.  These embodiments therefore are simpler, require fewer parameters to tune, run much faster and thus increase throughput.”).
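
The encode, decode, calculate-loss and adjust-parameters steps of claim 16 can be illustrated with a deliberately small linear autoencoder trained by plain gradient descent on an L2 (mean squared error) loss. This is a NumPy sketch under stated assumptions: the linear (non-convolutional) layers, synthetic low-rank data, learning rate and step count are all hypothetical simplifications, not Ha’s convolutional model.

```python
import numpy as np


def train_linear_autoencoder(x, code_size, lr=0.05, steps=300, seed=0):
    """Encode, decode, compute an L2 loss and adjust the parameters, repeatedly."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(scale=0.1, size=(n_features, code_size))  # encoder parameters
    w_dec = rng.normal(scale=0.1, size=(code_size, n_features))  # decoder parameters
    losses = []
    for _ in range(steps):
        code = x @ w_enc        # encoding (compression to code_size values)
        recon = code @ w_dec    # decoding
        err = recon - x         # compare decoded data with the input
        losses.append(float(np.mean(err ** 2)))  # L2 (mean squared error) loss
        # Gradients of the mean squared error with respect to both weight matrices.
        grad_recon = 2.0 * err / err.size
        grad_dec = code.T @ grad_recon
        grad_enc = x.T @ (grad_recon @ w_dec.T)
        # Adjust the parameters based on the calculated loss.
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses


rng = np.random.default_rng(1)
data = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 16))  # hypothetical rank-4 data
_, _, losses = train_linear_autoencoder(data, code_size=4)
print(losses[0], losses[-1])
```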

Regarding claim 17, Ha discloses the method of claim 16, wherein the set of parameters (said or “the parameters…may be altered” cited in the rejection of claim 1) comprises at least one of: a kernel size (said “kernel sizes of, for example, 5.times.5”), a stride value (said “a stride of 2”) and a filter size for each convolution layer.
Regarding claim 18, Ha discloses the method of claim 17, further comprising determining a vector size (or “vector…dimension”) for the set of encoded (said via figs. 4:404, 5:502,506, 7:702: “Encoder”) shape data (said or “images…which may have…shape” via c.25,ll. 10-24:
“In embodiments described herein in which the feature space is used as the common space for image alignment or registration, the feature space of each imaging modality can be different.  It is driven by the data that is used to train the model.  The training process will determine what are the best features to describe the images from each image modality (e.g., by minimizing the cost functions).  Specifically, the deep features of the first image and the deep features of the second image are two output column vectors from the two encoders shown in FIG. 5.  The two feature vectors do not need to have the same dimensions.  Also, meanings of elements in each feature vector may be totally different.  They are driven by data through the training process.”).
Regarding claim 19, Ha discloses the method of claim 16, further comprising initializing (via “an initial learning rate”) the set of parameters (said “the parameters…may be altered” during learning) for the convolutional autoencoder prior to the inputting (after training) of the set of convolution layers (via c.26,l. 62 to c.27,l.19:
“FIG. 8 shows one such embodiment.  This embodiment provides an iterative method for automatically sampling data for training based on reconstruction errors.  Reconstruction errors are the differences between the reconstructed image and the ground truth.  As shown in FIG. 8, data pool for training 800 may be input to initial sampling for training 802.  The data pool for training may include any suitable training data known in the art.  The initial sampling for training may be performed in any suitable manner known in the art, e.g., manually.  The initial sampling step generates training data 804, which is used for training 806 of an encoder of one of the learning based models described herein.  Training of the encoder may be performed as described further herein.  In another example, training may be performed using an Adam optimization solver with a mini-batch size of 10 and an initial learning rate of 0.01 (although this is just one non-limiting example of how the training may be performed).  After training of the encoder of the model, the trained encoder may be tested in testing step 808 performed using test data 810.  Test data may include any suitable test data known in the art.  For example, the test data may include data in data pool 800 that was not selected to be part of training data 804 by initial sampling step 802.  Testing of the trained encoder may be performed in any suitable manner known in the art.”).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Ha et al. (US Patent 10,733,744) in view of Keisler et al. (US Patent 10,248,663).
Regarding claim 20, claim 20 is rejected for the same reasons as claim 9. Thus, the argument presented for claim 9 is equally applicable to claim 20.
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Ha et al. (US Patent 10,733,744) in view of TSUTSUI et al. (US Patent App. Pub. No.: US 2020/0184137 A1).
Regarding claim 22, claim 22 is rejected for the same reasons as claim 21. Thus, the argument presented for claim 21 is equally applicable to claim 22.

Suggestions
Applicant’s disclosure states:
“[0023] The autoencoder 200 generates compressed data 208 through training, by comparing the decoded mask image 212 to the input 202 and calculating a loss value. The loss value is a cost function, which is an average of the losses from multiple data points. For example, a loss may be calculated for each data point, then the average of these losses corresponds to the cost (loss value). In some embodiments, batch gradient descent may be used where for one training cycle, "n" losses for "n" training instances is calculated, but only one cost is used in determining the parameter update. In some embodiments, stochastic gradient descent may be used, where the parameter update is calculated after each loss (and thus the loss effectively corresponds to the cost). The encoded compressed data 208 retains only information needed to reproduce the original input, within a pre-determined threshold, using decoder 210. For example, the autoencoder may set parameters to weight more important information, such that training allows the neural network to learn what information to keep based on those weights. Retaining only information that is needed to reproduce the original input can reduce calculation time and therefore improve processing efficiency.”
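The two update schemes described in paragraph [0023] can be contrasted with a minimal sketch. For brevity it assumes a toy one-parameter linear model with a squared loss rather than the disclosed autoencoder; the model, data, learning rates, and function names are all hypothetical. Batch descent averages the n per-sample losses into one cost per update, while stochastic descent updates after every single loss.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y) for a toy model y ~ w * x


def squared_loss(w: float, x: float, y: float) -> float:
    return (w * x - y) ** 2


def grad(w: float, x: float, y: float) -> float:
    """Derivative of the squared loss with respect to w."""
    return 2 * (w * x - y) * x


def cost(w: float, data: List[Sample]) -> float:
    """Cost = average of the per-sample losses."""
    return sum(squared_loss(w, x, y) for x, y in data) / len(data)


def batch_gd_step(w: float, data: List[Sample], lr: float) -> float:
    # n losses are computed, but only their average drives a single update.
    g = sum(grad(w, x, y) for x, y in data) / len(data)
    return w - lr * g


def sgd_epoch(w: float, data: List[Sample], lr: float, rng: random.Random) -> float:
    # One update per sample: each single loss effectively plays the role of the cost.
    samples = list(data)
    rng.shuffle(samples)
    for x, y in samples:
        w -= lr * grad(w, x, y)
    return w


data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # noiseless targets from y = 3x
w_batch, w_sgd = 0.0, 0.0
rng = random.Random(0)
for _ in range(50):
    w_batch = batch_gd_step(w_batch, data, lr=0.1)
    w_sgd = sgd_epoch(w_sgd, data, lr=0.05, rng=rng)
print(round(w_batch, 3), round(w_sgd, 3))  # 3.0 3.0
```

Both schemes reach the same parameter here because the toy data are noiseless; with real training data, stochastic updates trade a noisier path for more frequent parameter changes.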

Claim 16’s setting or weighting via the claimed adjusting of “parameters” (i.e., retaining “important information” as claimed in claims 21 and 22) is directed to the disclosed benefit to “reduce calculation time”, as already highlighted in applicant’s remarks of 4/16/21, page 8. In contrast, claim 1 does not claim “parameters”.
By comparison, Ha (US 10,733,744) teaches capturing important information rather than retaining important information. Thus, applicant’s disclosed retaining solution to the problem of calculation time is an indication of non-obviousness in view of Ha.
Claim 10’s full Markush grouping or extent is not anticipated or obvious in the art of record or from the search of claim 10.
Note that these suggestions are not provided with respect to overcoming rejections under 35 USC 101, 112, 102 and/or 103. These suggestions are mainly provided to seek out advantages in the disclosure regardless of 35 USC 101, 112, 102 and/or 103.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chen et al. (Design and Acceleration of Convolutional Neural Networks on Modern Architectures; the publication date (2018?) could not be clearly established by the examiner, and the reference is thus given today’s date: May 8, 2021) is pertinent as teaching the claimed:
the halo (via “padding surrounds” or “zero…Padding”) having a size (“P”) chosen (via “select”) based on at least one of: the number of convolution layers and (i.e., otherwise) the kernel size (via “FxF filter…filter size”) of the convolution layers (via page 4:
“The chosen architecture is based on the VGG-16 model [12], a well known CNN architecture. The architecture follows the VGG-16's pattern of two convolutional/ activation layers followed by a pooling layer. Where filter sizes remain 3x3 and the number of filters doubles after each pooling layer. We did not implement the VGG-16 architecture because it was designed for 244x244 images and the CIFAR-10 images are 32x32. Using input images of size 32x32 with the VGG-16 architecture would create a runtime error due to attempting to create an image of negative size after three series of two convolutional layers followed by a pooling layer. Also due to input image size, this architecture utilizes what is known as "same padding" on every other convolutional layer starting with the first convolutional layer. Same padding surrounds the input data with zeros such that the output after the convolution operation is the same size as the input. Thus, allowing the model to be deeper despite the small input image dimensions. The pooling operation by convention is a 2x2 max pooling filter with a stride of two and no padding. As its name suggests this filter selects the maximum value present of the four pixel values in its scope. The term stride refers to the number of rows or columns to move the filter. Stride can be thought of as the filters step size. Padding refers to the number of zeros to encircle the input with prior to filtering. Conventionally pooling layers effectively reduce the size of the data by two. The formula below [1] is used to calculate output size for a given NxN input image and an FxF filter where S = stride and P = padding. This formula is also used to select the correct padding given filter size and stride to implement "same padding".
$$\frac{N + 2P - F}{S} + 1$$”).
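Chen's formula, and its use to select "same padding", can be checked numerically with a short sketch. The halo_size helper goes beyond Chen and reflects only an assumption about how a halo could be sized from the number of convolution layers and the kernel size: L stacked stride-1 FxF convolutions have a receptive field of L(F-1)+1 pixels, so a tile needs L(F-1)/2 extra pixels on each side for odd F. All function names are hypothetical.

```python
def output_size(n: int, f: int, s: int, p: int) -> int:
    """Chen's formula (N + 2P - F) / S + 1, floored when S does not divide evenly."""
    return (n + 2 * p - f) // s + 1


def same_padding(f: int) -> int:
    """Padding P that keeps the output size equal to N at stride 1 (odd F only)."""
    if f % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd filter size")
    return (f - 1) // 2  # solved from N + 2P - F + 1 = N


def halo_size(num_layers: int, f: int) -> int:
    """Assumed per-side halo for num_layers stacked stride-1 FxF convolutions."""
    if f % 2 == 0:
        raise ValueError("this sketch assumes an odd filter size")
    # Receptive field is num_layers * (f - 1) + 1 pixels, i.e. this many per side.
    return num_layers * (f - 1) // 2


assert output_size(32, 3, 1, same_padding(3)) == 32  # 3x3 "same" padding keeps 32x32
assert output_size(32, 2, 2, 0) == 16                # 2x2 max pool, stride 2, no padding
print(halo_size(num_layers=4, f=3))                  # 4
```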
Ma et al. (A Machine Learning Based Morphological Classification of 14,245 Radio AGNs Selected from the Best–Heckman Sample) is pertinent regarding claim 10 for teaching “We select…padding…size” via page 7:
“Once an input is given, the corresponding output of a convolutional layer is uniquely determined by its kernels and biases and the choices of stride and zero-padding modes. For simplicity, in this work we always scale the images, as well as the kernels, as square matrices. The initial values assigned to the elements in the kernels are randomly generated with truncated normal Gaussian distribution, and corresponding biases are initialized with constants (Abadi et al. 2015). We select half-padding (i.e., same padding) $p = k/2$, where k is the size of the squared kernel, to keep the shape and set the size of the input. We also set the stride, the step length to slide the kernel, to s = 2, which leads to a linear pooling (i.e., subsampling or downsampling), to decrease the number of free parameters and meanwhile extract the most remarkable features.”
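Ma's combination of half padding with a stride of 2 acts as a downsampling step. The sketch below assumes p is taken as the integer floor(k/2) for odd kernel sizes (Ma writes p = k/2) and checks that each such layer then maps an n x n input to ceil(n/2) x ceil(n/2); the helper names are hypothetical.

```python
import math


def half_padding(k: int) -> int:
    # Ma et al. write p = k/2; for an odd kernel this sketch assumes the integer floor(k/2).
    return k // 2


def conv_out(n: int, k: int, s: int, p: int) -> int:
    """Output size floor((n + 2p - k) / s) + 1 for a square n x n input."""
    return (n + 2 * p - k) // s + 1


# With half padding and stride s = 2, each layer maps n to ceil(n / 2): a learned downsampling.
for n in (32, 33, 100):
    for k in (3, 5, 7):
        assert conv_out(n, k, 2, half_padding(k)) == math.ceil(n / 2)
```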

Lin et al. (Machine Learning for Yield Learning and Optimization), which corresponds to the above applied Lin et al. (Data Efficient Lithography Modeling With Transfer Learning and Active Data Selection), is pertinent for also teaching fine-tuning, as shown in fig. 9: “Finetune conv fc fc”, and for teaching an optical proximity correction (OPC) autoencoder in fig. 17: “Neural network architecture of GAN-OPC”, which could be used to create data for the transfer learning/fine-tuning of fig. 8: “Data Augmentation” under 35 USC 103 if these two references were combined.
Wetteland (Classification of histological images of bladder cancer using deep learning) is pertinent as teaching the same “same padding” equation quoted above from Chen (Wetteland, page 16, equation (2): “Output size”), which is used “To avoid any architectural problems” regarding “Autoencoder hyperparameters” (Wetteland, page 38):
“Autoencoder hyperparameters
The convolutional layers could also have been used to reduce the size if the parameters (stride, filter kernel etc.) had been chosen accordingly. However, the program developed in this thesis is made to automatically test a multitude of different models with different hyperparameters. To avoid any architectural problems, zero-padding is added to the convolutional layers, and only pooling and fully-connected layers is used to reduce the size. Both stride and kernel size is kept constant at 1 and 3 respectively.”

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397.  The examiner can normally be reached on Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571)272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENNIS ROSARIO/Examiner, Art Unit 2667 

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667