DETAILED ACTION
 
Notice of Pre-AIA or AIA Status
         The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Status of Claims
         The following claim(s) is/are pending in this Office action: 1-20.
         Claim(s) 1-20 are rejected.  This rejection is NON-FINAL.
 
Information Disclosure Statement
         The information disclosure statements (IDSs) submitted on January 30, 2018 and May 29, 2018 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
 
Claim Objections
            Claim 11 stands objected to because of the following informalities:  a conjunction is missing between the two wherein clauses. The examiner suggests amending claim 11 to recite “wherein the adjusting the weight function includes using an overall cost function, and wherein the overall cost function is a sum of the error index and the regularization item.”
 

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
 
The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
 
            Claims 9-11 stand rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
(a)         Claim 9: Claim 9 recites an “error index for the convergence threshold,” but the specification does not describe, to the extent possible, what an error index is or what the relation is between an error index and the claimed convergence threshold.  Neither does the specification describe how an error index functions or is used, let alone “for the convergence threshold” as claimed. For example, ¶¶ 19, 26, 62, 88 describe “an error index” in a nearly identical manner without providing any additional details (e.g., ¶ [0088] describes, in its entirety, “The processor may be further configured to set an error index for the convergence threshold.”). ¶ 64 merely describes that “W_opt=Argmax[Accuracy (W)], where accuracy is defined by the error index”, again without providing any further description for an “error index”, much less its relation with a “convergence threshold”.
The examiner notes: “The written description requirement is not necessarily met when the claim language appears in ipsis verbis in the specification. ‘Even if a claim is supported by the specification, the language of the specification, to the extent possible, must describe the claimed invention so that one skilled in the art can recognize what is claimed. The appearance of mere indistinct words in a specification or a claim, even an original claim, does not necessarily satisfy that requirement.’” See MPEP § 2163.03(V) (citations omitted). Therefore, claim 9 is rejected under 35 U.S.C. § 112(a) for failing to meet the written description requirement.
(b)       Claims 10-11 depend from claim 9 and are thus rejected accordingly due to at least their dependency from claim 9.
 
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
 
 
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
 
Claims 1-20 stand rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
(a)         Claim(s) 1 and 15:
(1) the limitation “the weight coefficients” is indefinite because it is unclear whether these weight coefficients are the same as or different from the “one or more weight coefficients” respectively recited in claims 1 and 15. For purpose of examination, this limitation is interpreted as the same as “the one or more weight coefficients”.  Claims 2-14 and 16-20 respectively depend from claims 1 and 15 and are thus rejected accordingly due to at least their respective dependency from claims 1 and 15.
(2) The limitation “performing a regression on reference spectra” is not interrelated to, and thus has no bearing on, any other claimed limitation and thus renders the scope of claims 1 and 15 indefinite. Further, a regression analysis is generally known to estimate the relationship(s) between a dependent variable (e.g., a spectral signal or an output of a model) and one or more independent variables (e.g., a critical parameter, a floating parameter, etc.). That is, a regression analysis may be performed on reference data (e.g., ground truths) to establish the relations between the output of a model and its independent variables so that the regression analysis result may be further utilized as a basis for comparison (e.g., comparison to or with predicted results).  Nonetheless, neither the claims nor the present disclosure explains or provides any reasons or description that such a regression analysis needs to be performed more than once for reference data such as ground truths. Therefore, the missing link between the claimed limitation “performing a regression on reference spectra” and the remainder of these claims further renders the scope of these claims unclear.  For purpose of examination, this limitation is interpreted as repeated or iterative performance of a regression analysis on the same reference data despite the fact that this limitation does not appear to have any bearing on the remainder of the claims.
(3) The limitation “determining a root-mean-square error between the critical parameters and the reference spectra” is indefinite because a root mean square error denotes the deviation of two parallel values such as a sample value and a population value. Nonetheless, spectra and a critical parameter are not parallel values.  More particularly, ¶ [0053] of the present disclosure appears to describe that spectra are generated or measured “for critical parameters”.  That is, the present disclosure clearly distinguishes spectra from critical parameters. Therefore, it is unclear how these two distinct entities with different characteristics can possibly be compared to determine a root mean square error. For purpose of examination, this limitation is interpreted as determining a root-mean-square error between the critical parameters and corresponding critical parameters for the reference spectra. Applicant shall note that the above interpretation is based on the Examiner’s understanding of regression analyses and RMSE determination, upon which the present disclosure remains absolutely silent. Nonetheless, Applicant is respectfully requested to clarify the claim language.
(b)         Claims 2-14 and 16-20 respectively depend from claims 1 and 15 and are thus rejected accordingly due to at least their respective dependency from claims 1 and 15, the same rationale applying.
(c)        Claim(s) 4 and 17:
(1) Claim 4 recites the limitation “a single layer neural network” that is indefinite because a neural network generally includes an input layer, an output layer, and one or more hidden layers between the input and output layers. In addition, a single-layer neural network may be defined to include one layer of nodes.  Nonetheless, in this latter definition, this single layer of nodes receiving some inputs sends weighted inputs to another layer or another neural network having one or more receiving nodes for further processing or output, and thus the functioning nevertheless requires multiple layers of nodes.  For the purpose of examination, the claimed “single layer neural network” is interpreted as a single hidden layer in a neural network.
(2) Claim 17 recites a substantially similar limitation and is thus rejected accordingly.
(d)       Claim 7: the limitation “the reference spectra are synthetic” is indefinite because, as far as metrology is concerned, spectra are generated by a metrology tool that, in a nutshell, collects the scattered photons from measured structure(s) after the measured structure(s) is hit with incident beams (e.g., optical light beams or electron beams) so as to generate spectra by, for example, correlating the intensities of collected photons with wavelengths.  That is, spectra are synthesized by correlating, for example, observed intensities with wavelengths and are thus synthetic.  Moreover, the present disclosure (e.g., ¶¶ 18, 51, 53, 55, and 51) also fails to provide additional information for synthetic reference spectra. Therefore, it is unclear what the claimed limitation means by “the reference spectra are synthetic”.
(e)       Claims 9-11 and 20:
(1) Claim 9 recites the limitation “error index” in “setting an error index for the convergence threshold” that is indefinite.  More specifically, it is unclear what an error index is or what the relation is between an error index and the claimed convergence threshold.  Neither does the specification describe how an error index functions or is used, let alone “for the convergence threshold” as claimed. For purpose of examination, the claimed “error index” is interpreted as an error or an accuracy measure that also reflects errors.
(2) Claims 10-11 depend from claim 9 and are thus rejected accordingly due to at least their dependency from claim 9.
(3) Claim 20 recites a substantially similar limitation and is thus rejected accordingly.
(f)        Claims 10-11:
(1) The limitation “a wavelength direction” is indefinite. More specifically, a wavelength is known to denote the distance between two corresponding points in the same phase in the direction of a transverse wave.  A wavelength direction can be at best understood as a direction pertaining to wavelength. Nonetheless, a wavelength may refer to the wavelength of the incident light beam, the wavelength of scattered light (photons) from a semiconductor substrate after being hit by the incident light beam, or wavelengths that are used to plot the intensities of scattered photons in spectra signals, etc. The examiner notes that these various distinct definitions of a wavelength to which a “wavelength direction” pertains render the scope of the claims indefinite.  For purpose of examination, the claimed “wavelength direction” is interpreted as any direction pertaining to incident light having a first wavelength or scattering light having a second wavelength.
(2) The limitation “regularization item” is indefinite. More specifically, regularization is generally known to add information to a machine learning model (e.g., a neural network) to solve an ill-posed problem or to prevent overfitting (see Wikipedia – “regularization”). Nonetheless, neither the claims nor the specification provides any details about what a regularization item is, what function the claimed regularization item serves, what the target of regularization is, where the claimed “regularization item” applies, or what the relationships are between the “regularization item” and the other claimed limitations. For purpose of examination, the claimed regularization item is interpreted as any item that regularizes any part of a neural network.
(3) The limitation “autocorrelation length” is indefinite because autocorrelation length carries several different meanings such as a surface roughness parameter providing spatial information of surface topography, a distance in a direction over which an autocorrelation function decays to a given value, the length between two correlated temporal signals (e.g., the time series signals measured by a metrology tool), etc. The examiner notes that such distinct, variable definitions of “autocorrelation length” render the scope of the claims uncertain and thus indefinite. For purpose of examination, an “autocorrelation length” is interpreted as a finite entity (e.g., a generalized distance) that limits the bandwidth of a signal representing an autocorrelation function and having a timescale of the inverse of the autocorrelation length.
(4) Claim 11 depends from claim 10 and is thus rejected accordingly due to at least its dependency from claim 10.
(f)        Claim(s) 11-13: the limitation “the weight function” in claims 11-13 is indefinite and further lacks proper antecedent basis. More specifically, it is unclear what the relationship is between “the weight function” recited in claims 11-13 and the “one or more weight coefficients” recited in base claim 1. For purpose of examination, this claimed “weight function” is interpreted as any function that pertains to the output of a neural network (e.g., an activation function, a kernel (e.g., a weight vector or matrix), etc.).
(g)       Claim 12: The limitation “wherein the weight function is equal to noise” is indefinite. More specifically, the claimed “noise” is interpreted as, according to its broadest reasonable interpretation, irregular fluctuations in an electrical signal of a measured spectral response in a metrology tool, and the claimed “weight function”, according to claim 12 from which claim 13 depends, is a configurable function.  Nonetheless, it is unclear how a configurable function, which is generally defined as an expression involving one or more variables, can equal irregular fluctuations, which represent values, of an electrical signal.  For purpose of examination, the claimed “weight function” is interpreted as a kernel or a weight matrix, and the claimed “noise” is interpreted as a function pertaining to noise.
(h)       Claim 13: The limitation “the noise is continuous along a wavelength or parameter direction” is indefinite. More specifically, “continuous” means occurring in time or space without interruption or extending without break or irregularity (see WolframAlpha).  Nonetheless, the claim language “the weight function is equal to noise” seems to suggest that the claimed “noise” is a form of function.  A continuous function is differentiable by definition, but “occurring in time or space without interruption” does not necessarily mean “the noise is continuous”. The examiner notes that such multiple, different definitions render the scope of claim 13 indefinite. For purpose of examination, continuous is interpreted as extending without break or irregularity. Clarification is nevertheless required.
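
For illustration of the interpretation adopted in item (a)(3) above, a root-mean-square error is defined only over paired values of the same kind, such as predicted critical-parameter values and corresponding reference critical-parameter values. The following is a minimal sketch under that assumption; it is not taken from the present disclosure or from the cited art, and the function name `rmse` and the sample values are hypothetical.

```python
import math

def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(predicted) != len(reference):
        # Unpaired inputs (e.g., 3 parameter values vs. a 200-point spectrum)
        # have no element-wise deviation, so the RMSE is not defined.
        raise ValueError("RMSE needs paired values of the same length")
    if not predicted:
        raise ValueError("RMSE is undefined for empty input")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical critical-parameter values (arbitrary units).
predicted_cd = [30.0, 32.0, 31.0]
reference_cd = [30.0, 30.0, 30.0]
print(rmse(predicted_cd, reference_cd))  # sqrt(5/3), about 1.29
```

The length check mirrors the concern stated above: a set of critical parameters and a spectrum are not parallel values, so some mapping to a common quantity would be needed before an RMSE could be computed.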

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
 
Claims 1-3, 6-7, 14-16, and 18 stand rejected under 35 U.S.C. 103 as being unpatentable over Ferns et al. USPGPub 20120022836 with publication date of Jan. 26, 2012 (hereinafter Ferns) in view of Liu, X. USPGPub 20180314163 with EFD of Dec. 15, 2014 (hereinafter Liu).
With respect to claim 1, Ferns teaches a method comprising: initializing a model that includes a Jacobian matrix using a processor, wherein the initializing includes spectra fitting; (Ferns, ¶ [0004]: “At operation 152, an initial scatterometry model is accessed. A scatterometry user may define an initial model of the expected sample structure by selecting one or more of the material files to assemble a stack of materials corresponding to those present in the periodic grating features to be measured.” ¶ [0030]: “In an embodiment, measured spectral information is received and a scatterometry model having a plurality (N) of model parameters floating is accessed. A Jacobian matrix (Jacobian) of the measured spectral information is calculated and, based on a precision metric determined from the Jacobian and spectral covariance matrix for each model parameter in a plurality of parameter combinations, a set of model parameters to be fixed at a predetermined parameter value is determined.” ¶ [0068]: “The exemplary computer system 1000 includes a processor 1002”.
The examiner notes that Ferns’ processor 1002 teaches a processor, and that Ferns’ initial scatterometry model or subsequently revised scatterometry model teaches a model. The examiner further notes that Ferns’ constructing an initial scatterometry model by assembling a stack of materials corresponding to those present in the features whose scatterometry is to be measured so that the photons (e.g., spectra) resulting from the incident beam hitting these same materials exhibit substantially similar behaviors and hence can be used as reference for comparison teaches spectra fitting.  Therefore, the examiner asserts that Ferns teaches the above limitation in its entirety.)
 
constraining critical parameters, using the processor, with at least one floating parameter and one or more weight coefficients; and (Ferns, ¶ [0030]: “In further embodiments, a regression is run on the measured spectral information with the revised scatterometry model. The values of fixed model parameters having a relative precision metric that is sufficiently high that they cannot be reasonably floated in the scatterometry model are verified and updated from the nominal values, if significantly different. The fixed/float determination may then be looped to re-calculate the Jacobian matrix and re-select the fixed/floated parameter sets.” The examiner notes that Ferns’ fixed model parameter having a sufficiently high precision teaches a critical parameter. 
The examiner further notes that Ferns’ floated parameter teaches at least one floating parameter, and that Ferns’ determining which parameter can be floated in its determination of fixed parameter(s) and floating parameter(s) teaches that a critical parameter (e.g., Ferns’ fixed parameter having sufficiently high precision) is constrained with at least one floating parameter.  Furthermore, according to FIG. 3 and its description of the present disclosure, W(n, pn) teaches a weighting for a Jacobian component.  Therefore, a Jacobian component in FIG. 3 teaches a weight coefficient. The examiner thus notes that Ferns’ calculating the Jacobian components for all parameters for the Jacobian matrix in Ferns’ determination of which parameters are to be fixed and are thus critical is based on at least one weight coefficient. Therefore, Ferns teaches the above limitation.)
 
training, using the processor, a neural network to use the model, wherein the training includes: (Ferns, ¶ [0037]: “Once the preprocessing operation 203 is completed, the revised model is then input to the model refining operation 204 in which regression intensive techniques may be employed to consider additional factors, such as additional spectral information characterizing within sample variation, etc.” ¶ [0072]: “The machine-accessible storage medium 1031 may also be used to store or train a neural network, and/or a software library containing methods that train or call a neural network meta-model and/or a user interface of the neural network meta-model.” The examiner notes that Ferns’ “neural network” teaches a neural network that uses Ferns’ scatterometry model, and that Ferns’ iteratively revising an initial scatterometry and/or subsequent, iterative model refining process at least by re-calculating the weight coefficient(s) teaches training the neural network to use the aforementioned model.)
 
adjusting at least one of the weight coefficients; (Ferns, ¶ [0029]: “In an embodiment, measured spectral information is received and a scatterometry model having a plurality (N) of model parameters floating is accessed. A Jacobian matrix (Jacobian) of the measured spectral information is calculated and, based on a precision metric determined from the Jacobian and spectral covariance matrix for each model parameter in a plurality of parameter combinations, a set of model parameters to be fixed at a predetermined parameter value is determined.” ¶ [0030]: “The fixed/float determination may then be looped to re-calculate the Jacobian matrix and re-select the fixed/floated parameter sets.”
The examiner notes that a Jacobian component for each parameter (e.g., the partial derivative of the output with respect to the parameter) in Ferns’ Jacobian matrix teaches a weight coefficient (also see citations and rationale for “weight coefficients”, supra), and that Ferns’ re-calculating a Jacobian component of a Jacobian matrix teaches adjusting at least one of the weight coefficients.)
 
performing a regression on reference spectra; (Ferns, ¶ [0006]: “The matching simulated spectra and/or associated optimized profile model can then be utilized at operation 157 to generate a set of simulated diffraction spectra by perturbing the values of the parameterized final profile model. The resulting set of simulated diffraction spectra may then be employed by a scatterometry measurement system operating in a production environment to determine whether subsequently measured grating structures have been fabricated according to specifications.” ¶ [0007]: “During the regression operation 156, simulated spectra from a set of model parameters for a hypothetical profile are fit to the measured sample spectra. With each regression performed to arrive at the next simulated spectra, a decision on which of the model parameters are to be allowed to float (i.e., to vary) and which are to be fixed is needed.”
The examiner notes that Ferns’ “matching simulated spectra,” “simulated diffraction spectra,” and/or “simulated spectra” teach reference spectra, and that Ferns’ performing each regression to determine the next simulated spectra for fitting measured spectra teaches performing a regression on reference spectra.)
 
determining an error between the critical parameters and the reference spectra; and (Ferns, ¶ [0038]: “With the parameterization so specified, a stable model with a reduced set (M of N) of floatable parameters is input to the model processor 250 to execute the model refinement operation 204 during which additional parameters may be fixed to output a final model having L of M parameters floating based on a best error estimate of parameters deemed critical”. The examiner notes that Ferns’ model refinement operation teaches a part of a training process of the model, and that Ferns’ determining the best error estimate of parameters that are deemed critical teaches determining an error between the critical parameters and reference critical parameter values (and hence reference spectra as claimed).)
 
repeating the adjusting, the performing, and the determining until the error is less than a convergence threshold. (Ferns, ¶ [0007]: “With each regression performed to arrive at the next simulated spectra, a decision on which of the model parameters are to be allowed to float (i.e., to vary) and which are to be fixed is needed.” ¶ [0061]: “If the normalized difference is not greater than a configurable difference threshold, the method 960 iterates to the next parameter until all parameters are checked. With all parameters determined to have normalized differences below the threshold, the method 960 returns to operation 390 of FIG. 3. In the event that the normalized difference is greater than a configurable threshold value for a first parameter, the method 960 returns to operation 301 (FIG. 3) to re-calculate the Jacobian J0 using the revised model.”
The examiner notes that Ferns’ configurable difference threshold teaches a convergence threshold, that Ferns’ difference of a parameter value teaches an error, and that Ferns’ iterative determination of whether to terminate re-calculation of the Jacobian based on whether the parameter value difference is less than the configurable threshold value (and hence iterative performance of a regression analysis and determination of an error) teaches repeating the adjusting, the performing, and the determining until the error is less than a convergence threshold.)
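
For illustration of the claimed control flow only (repeating the adjusting, the performing, and the determining until the error is less than a convergence threshold), the loop can be sketched generically as below. This is an assumption-laden example and not the method of Ferns or of the present application: the one-weight linear model, the gradient-descent update, and all names (`fit_until_converged`, `step`, `max_iter`) are hypothetical, and the regression step is collapsed into a single gradient update.

```python
import math

def _rmse(w, xs, ys):
    """RMSE between the model outputs w * x and the reference values y."""
    residuals = [w * x - y for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def fit_until_converged(xs, ys, threshold, step=0.01, max_iter=10000):
    """Adjust one weight w in y ~ w * x until the RMSE drops below threshold."""
    w = 0.0
    for iteration in range(max_iter):
        # "Determining": compute the error for the current weight.
        error = _rmse(w, xs, ys)
        if error < threshold:
            return w, error, iteration
        # "Adjusting": one gradient-descent step on the mean squared error.
        grad = 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= step * grad
    # Iteration cap reached without meeting the threshold.
    return w, _rmse(w, xs, ys), max_iter

# Hypothetical reference data generated by y = 2 * x.
w, error, iterations = fit_until_converged([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 1e-6)
print(w, error, iterations)
```

The iteration cap is a design choice of the sketch; it guards against a threshold that is never reached and is not something the cited passages are asserted to require.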
 
Ferns does not appear to explicitly teach the error is a root-mean-square error.
Liu does teach, however, that a root-mean square error (Liu, ¶ [0044]: “The optimization process boils down to a process of finding a set of parameters (design variables) of the system that minimizes the cost function. The cost function can have any suitable form depending on the goal of the optimization. For example, the cost function can be weighted root mean square (RMS) of deviations of certain characteristics (evaluation points) of the system with respect to the intended values (e.g., ideal values) of these characteristics; the cost function can also be the maximum of these deviations.” ¶ [0053]: “therefore, minimizing the weighted RMS of fp(z1, z2, …, zN) is equivalent to minimizing the cost function                         
                            C
                            F
                            
                                
                                    
                                        
                                            z
                                        
                                        
                                            1
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            z
                                        
                                        
                                            2
                                        
                                    
                                    ,
                                    …
                                    ,
                                    
                                        
                                            z
                                        
                                        
                                            N
                                        
                                    
                                
                            
                            =
                             
                            
                                
                                    ∑
                                    
                                        p
                                        =
                                        1
                                    
                                    
                                        P
                                    
                                
                                
                                    
                                        
$w_p f_p^2(z_1, z_2, \ldots, z_N)$
, defined in Eq. 1. Thus the weighted RMS of fp(z1, z2, . . . , zN) and Eq. 1 may be utilized interchangeably for notational simplicity herein.” ¶ [0057]: “The predetermined termination condition may include various possibilities, i.e.[,] the cost function may be minimized or maximized, as required by the numerical technique used, the value of the cost function has been equal to a threshold value or has crossed the threshold value, the value of the cost function has reached within a preset error limit, or a preset number of iterations is reached. If either of the conditions in step 306 is satisfied, the method ends.”
The examiner notes that Liu’s RMS error representing the deviation of a characteristic from its target value teaches the claimed root-mean-square error. The examiner further notes that Liu’s termination condition (e.g., when the value of the RMS error is less than or equal to Liu’s threshold value) at which Liu’s process for minimizing the RMS error is terminated teaches a convergence threshold.)
 
Ferns and Liu are analogous art because both references pertain to using neural networks to improve processes for semiconductors. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns with Liu’s training a neural network by determining an RMS error between a critical parameter and reference data (Liu, supra).  The modification formulates any optimization goal in terms of a cost function for any suitable characteristics of select characteristics with respect to the corresponding intended or ideal values of these select characteristics (e.g., root mean square of deviations) by finding local minimums of the cost function as well as using neural networks to accelerate the optimization for superior, global minimums of such select characteristics (Liu, ¶ [0044]: “The optimization process boils down to a process of finding a set of parameters (design variables) of the system that minimizes the cost function. The cost function can have any suitable form depending on the goal of the optimization. For example, the cost function can be weighted root mean square (RMS) of deviations of certain characteristics (evaluation points) of the system with respect to the intended values (e.g., ideal values) of these characteristics”. ¶ [0052]: “The cost function may represent any suitable characteristics of the lithographic projection apparatus or the substrate, for instance, focus, CD, image shift, image distortion, image rotation, etc.” ¶ [0060]: “The methods of FIG. 3 and FIG. 4 focus on local minimums of the cost function and thus may miss superior global minimums. Monte-Carlo algorithms may be used to find global minimums but would be computationally expensive because they involve finding local minimums (e.g. using the methods of FIG. 3 and FIG. 4) around many randomly selected starting positions in the space of the design variables. 
Machine learning algorithms may be useful to accelerate Monte-Carlo algorithms by filtering the randomly selected starting positions Monte-Carlo algorithms generate before trying to find local minimums around them.”)
 
With respect to claim 2, Ferns modified by Liu teaches the method of claim 1, and Ferns further teaches:
wherein the constraining uses a linear function. (Ferns, ¶ [0048]: “FIG. 5 is a flow diagram illustrating an exemplary combinatoric method 510 for determining the largest number of floatable parameters such that the P/T ratio of the combination is kept below a threshold (e.g., 1 as depicted in FIG. 2B) to ensure a good set of floatable parameters. There are $\binom{N}{k}$ such combinations” ¶ [0049]: “At operation 511 the test matrix (Jacobian) JT is assembled by taking all of the possible combinations of columns from J0 such that JT has k columns.”
The examiner notes that Ferns’ constraining critical parameters (e.g., Ferns’ fixed parameters having or requiring higher precision) by determining a set of floating parameters (e.g., the above floatable parameters that have lower precision(s)) and a weight coefficient (e.g., Ferns’ Jacobian component in the Jacobian matrix) teaches constraining critical parameters with at least one floating parameter and one or more weight coefficients (see citations and rationale for claim 1, supra). The examiner further notes that Ferns’ determining a set of floating parameters in FIG. 5 cited above uses a combinatorial method that selects combinations of possible columns (e.g., one column, two columns, …, all columns) from the total number of columns of a matrix, that such selection is combinatorics and is thus linear (see e.g., Wikipedia, combinatorial matrix theory), and that the combinatorial method to select k parameters from a total of N parameters thus teaches a linear function that is used in determining floating parameters and hence constraining critical parameters.)
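For illustration only, the enumeration of k-column combinations quoted from Ferns ¶¶ [0048]-[0049] may be sketched as follows. The function name and the example sizes (N = 5, k = 2) are assumptions introduced here and do not appear in Ferns; the sketch only counts and lists column-index combinations and does not compute any precision or P/T metric.

```python
from itertools import combinations
from math import comb


def column_combinations(num_columns, k):
    """Return every k-column index combination of a matrix with num_columns columns."""
    return list(combinations(range(num_columns), k))


# Hypothetical sizes: N = 5 columns in J0, k = 2 columns in each test matrix JT.
N, k = 5, 2
combos = column_combinations(N, k)

# The number of combinations equals the binomial coefficient "N choose k".
assert len(combos) == comb(N, k)
print(len(combos), combos[:3])
```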
 
With respect to claim 3, Ferns modified by Liu teaches the method of claim 1, and Ferns further teaches:
wherein the constraining uses a nonlinear function. (Ferns, ¶ [0049]: “Next, at operation 515, the precisions for the parameters in each combination are determined using equations (8) and (9). At operation 520, the max P/T ranking metric is determined for each combination and the combination for which the max P/T is minimized is identified. If the identified combination has a max P/T less than the P/T threshold (e.g., 1), that combination is stored to memory at operation 525.” ¶ [0042]: “FIG. 4B is a flow diagram illustrating an exemplary method 421 for determining a precision metric for each parameter from a Jacobian matrix, such as the test matrix JT which was assembled at operation 411. The method begins with a determination of spectral noise covariance, S, for all N model parameters, at operation 422. The model parameters are estimated by comparing the measured spectral information and simulated spectral information computed by a model, using a quadratic norm: $x^2 = \frac{1}{N_{eff}} (f - m)^T Q (f - m)$ (1)”.
The examiner notes that Ferns’ constraining critical parameters (e.g., Ferns’ fixed parameters having or requiring higher precision) by determining a set of floating parameters (e.g., the above floatable parameters that have lower precision(s)) and a weight coefficient (e.g., Ferns’ Jacobian component in the Jacobian matrix) teaches constraining critical parameters with at least one floating parameter and one or more weight coefficients (see citations and rationale for claim 1, supra). The examiner further notes that Ferns’ subsequent determining of the precisions for the parameters in each combination uses Eqns. (8)-(9), which “begin[] with a determination of spectral noise covariance” by using the quadratic Eq. (1), which is nonlinear due to its quadratic nature.  Therefore, the examiner asserts that Ferns’ constraining critical parameters with at least one floating parameter uses a nonlinear function.) 
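For illustration only, the quadratic norm of Ferns’ Eq. (1) may be sketched as follows. The function name, the use of plain Python lists, and the example values of f, m, Q, and N_eff are assumptions introduced here and are not taken from Ferns. The sketch shows the nonlinearity relied upon above: doubling the residual f - m quadruples the value of the norm.

```python
def quadratic_norm(f, m, Q, n_eff):
    """Evaluate (1 / n_eff) * (f - m)^T Q (f - m) for plain Python lists."""
    r = [fi - mi for fi, mi in zip(f, m)]          # residual f - m
    Qr = [sum(Q[i][j] * r[j] for j in range(len(r))) for i in range(len(r))]
    return sum(r[i] * Qr[i] for i in range(len(r))) / n_eff


# Hypothetical three-point spectra and an identity weighting matrix Q.
m = [0.5, 2.5, 2.0]
f = [1.0, 2.0, 3.0]
Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

base = quadratic_norm(f, m, Q, n_eff=3)

# Double the residual: f2 - m = 2 * (f - m).
f2 = [mi + 2.0 * (fi - mi) for fi, mi in zip(f, m)]
doubled = quadratic_norm(f2, m, Q, n_eff=3)

print(base, doubled)  # the second value is four times the first
```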
 
With respect to claim 6, Ferns modified by Liu teaches the method of claim 1, further comprising obtaining the one or more weight coefficients from a database. (Ferns, ¶ [0072]: “The machine-accessible storage medium 1031 may also be used to store or train a neural network, and/or a software library containing methods that train or call a neural network meta-model and/or a user interface of the neural network meta-model. The machine-accessible storage medium 1031 may further be used to store one or more additional components. While the machine-accessible storage medium 1031 is shown in an exemplary embodiment to be a single medium, the term “machine-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.”
The examiner notes that Ferns’ Jacobian components in a Jacobian matrix for parameters teach one or more weight coefficients (see citations and rationale for claim 1, supra). The examiner further notes that Ferns’ storing and training neural networks with a database teaches storing the aforementioned Jacobian components and/or the Jacobian matrix in a database and hence teaches the above limitation.)
 
With respect to claim 7, Ferns modified by Liu teaches the method of claim 1, wherein the reference spectra are synthetic. (Ferns, ¶ [0005]: “From a parameterized model, simulated spectra for a given set of grating model parameter values may be computed using rigorous diffraction modeling algorithms, such as the Rigorous Coupled Wave Analysis (RCWA) method. Regression analysis is then performed at operation 156 until the parameterized model converges on a set of model parameter values characterizing a final profile model that corresponds to a simulated spectrum which matches the measured diffraction spectra to a predefined matching criterion.”
The examiner notes that Ferns’ simulated spectra teach reference spectra. The examiner further notes that Ferns’ generating the simulated spectra with its parameterized model teaches that the simulated spectra are synthetic.)
 
With respect to claim 9, Ferns modified by Liu teaches the method of claim 1, further comprising setting an error index for the convergence threshold. (Ferns, ¶ [0007]: “With each regression performed to arrive at the next simulated spectra, a decision on which of the model parameters are to be allowed to float (i.e., to vary) and which are to be fixed is needed. Generally, each model parameter allowed to float will render all other floating model parameters less precise and floating too many model parameters that cannot be precisely determined by the spectra may cause regression algorithms to become unstable.”
The examiner notes that Ferns’ deciding which parameters are allowed to float (and/or which parameters are to be fixed) during each regression renders one or more other parameters less precise and floating and therefore defines an error or accuracy which in turn affects the stability of the iterative regression process and hence a convergence threshold for the iterative regression process.  Therefore, Ferns’ deciding which parameters are allowed to float or to be fixed or the result of such a decision (e.g., the links to the parameters allowed to float), under the broadest reasonable interpretation, teaches an error index for a convergence threshold as claimed.)
 
Ferns, Liu, and Jin are analogous art because all three references pertain to training neural networks for spectral analyses. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns in view of Liu to incorporate Jin’s setting an error index for a convergence threshold (Jin, supra).  The modification not only addresses the problem having no solution for determining the number of neurons in a neural network to fit a nonlinear function but also provides an incremental solution with an optimized number of neurons that provide a specified accuracy (Jin, ¶ [0058]: “Guessing the optimal number of neurons in a feed-forward neural network to fit a nonlinear function is an NP-complete problem (i.e., a class of problems that have no known solution with polynomial-time complexity). Thus, in accordance with an embodiment of the present invention, and as described in more detail below, a fast method of optimization includes gradually increasing the number of neurons in the network during the training until an optimized number of neurons is determined that will provide a specified accuracy.”)
  
With respect to claim 14, Ferns teaches a computer program product comprising a non-transitory computer readable storage medium having computer readable program embodied therewith, the computer readable program configured to carry out the method of claim 1. (Ferns, ¶ [0064]: “This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.”)
 
With respect to claim 15, Ferns teaches a system comprising: (Ferns, ¶ [0068]: “The exemplary computer system 1000 includes a processor 1002”.)
a processor in electronic communication with an electronic data storage unit and a wafer metrology tool, (Ferns, ¶ [0068]: “The exemplary computer system 1000 includes a processor 1002, a main memory 1004 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 1006 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory 1018 (e.g., a data storage device), which communicate with each other via a bus 1030.” ¶ [0066]: “A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., a computer) readable storage medium (e.g., read only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices, etc.), a machine (e.g., computer) readable transmission medium (electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.)), etc.”)
 
wherein the processor is configured to: initialize a model in a manner that includes spectra fitting, wherein the model includes a Jacobian matrix; (Ferns, ¶ [0004]: “At operation 152, an initial scatterometry model is accessed. A scatterometry user may define an initial model of the expected sample structure by selecting one or more of the material files to assemble a stack of materials corresponding to those present in the periodic grating features to be measured.” ¶ [0030]: “In an embodiment, measured spectral information is received and a scatterometry model having a plurality (N) of model parameters floating is accessed. A Jacobian matrix (Jacobian) of the measured spectral information is calculated and, based on a precision metric determined from the Jacobian and spectral covariance matrix for each model parameter in a plurality of parameter combinations, a set of model parameters to be fixed at a predetermined parameter value is determined.” ¶ [0068]: “The exemplary computer system 1000 includes a processor 1002”.
The examiner notes that Ferns’ processor 1002 teaches a processor, and that Ferns’ initial scatterometry model or subsequently revised scatterometry model teaches a model. The examiner further notes that Ferns’ constructing an initial scatterometry model by assembling a stack of materials corresponding to those present in the features whose scatterometry is to be measured so that the photons (e.g., spectra) resulting from the incident beam hitting these same materials exhibit substantially similar behaviors and hence can be used as reference for comparison teaches spectra fitting.  Therefore, the examiner asserts that Ferns teaches the above limitation in its entirety.)
 
constrain critical parameters with at least one floating parameter and one or more weight coefficients; and (Ferns, ¶ [0030]: “In further embodiments, a regression is run on the measured spectral information with the revised scatterometry model. The values of fixed model parameters having a relative precision metric that is sufficiently high that they cannot be reasonably floated in the scatterometry model are verified and updated from the nominal values, if significantly different. The fixed/float determination may then be looped to re-calculate the Jacobian matrix and re-select the fixed/floated parameter sets.” The examiner notes that Ferns’ fixed model parameter having a sufficiently high precision teaches a critical parameter. 
The examiner further notes that Ferns’ floated parameter teaches at least one floating parameter, and that Ferns’ determining which parameter can be floated in its determination of fixed parameter(s) and floating parameter(s) teaches that a critical parameter (e.g., Ferns’ fixed parameter having sufficiently high precision) is constrained with at least one floating parameter.  Furthermore, according to FIG. 3 and its description of the present disclosure, W(n, pn) teaches a weighting for a Jacobian component.  Therefore, a Jacobian component in FIG. 3 teaches a weight coefficient. The examiner thus notes that Ferns’ calculating the Jacobian components for all parameters for the Jacobian matrix in Ferns’ determination of which parameters are to be fixed and are thus critical is based on at least one weight coefficient. Therefore, Ferns teaches the above limitation.)
 
train a neural network to use the model, wherein the training includes: (Ferns, ¶ [0037]: “Once the preprocessing operation 203 is completed, the revised model is then input to the model refining operation 204 in which regression intensive techniques may be employed to consider additional factors, such as additional spectral information characterizing within sample variation, etc.” ¶ [0072]: “The machine-accessible storage medium 1031 may also be used to store or train a neural network, and/or a software library containing methods that train or call a neural network meta-model and/or a user interface of the neural network meta-model.” The examiner notes that Ferns’ “neural network” teaches a neural network that uses Ferns’ scatterometry model, and that Ferns’ iteratively revising an initial scatterometry model and/or subsequent, iterative model refining process at least by re-calculating the weight coefficient(s) teaches training the neural network to use the aforementioned model.)
 
adjusting at least one of the weight coefficients; (Ferns, ¶ [0029]: “In an embodiment, measured spectral information is received and a scatterometry model having a plurality (N) of model parameters floating is accessed. A Jacobian matrix (Jacobian) of the measured spectral information is calculated and, based on a precision metric determined from the Jacobian and spectral covariance matrix for each model parameter in a plurality of parameter combinations, a set of model parameters to be fixed at a predetermined parameter value is determined.” ¶ [0030]: “The fixed/float determination may then be looped to re-calculate the Jacobian matrix and re-select the fixed/floated parameter sets.”
The examiner notes that a Jacobian component for each parameter (e.g., the partial derivative of the output with respect to the parameter) in Ferns’ Jacobian matrix teaches a weight coefficient (also see citations and rationale for “weight coefficients”, supra), and that Ferns’ re-calculating a Jacobian component of a Jacobian matrix teaches adjusting at least one of the weight coefficients.)
 
performing a regression on reference spectra; (Ferns, ¶ [0006]: “The matching simulated spectra and/or associated optimized profile model can then be utilized at operation 157 to generate a set of simulated diffraction spectra by perturbing the values of the parameterized final profile model. The resulting set of simulated diffraction spectra may then be employed by a scatterometry measurement system operating in a production environment to determine whether subsequently measured grating structures have been fabricated according to specifications.” ¶ [0007]: “During the regression operation 156, simulated spectra from a set of model parameters for a hypothetical profile are fit to the measured sample spectra. With each regression performed to arrive at the next simulated spectra, a decision on which of the model parameters are to be allowed to float (i.e., to vary) and which are to be fixed is needed.”
The examiner notes that Ferns’ “matching simulated spectra,” “simulated diffraction spectra,” and/or “simulated spectra” teach reference spectra, and that Ferns’ performing each regression to determine the next simulated spectra for fitting measured spectra teaches performing a regression on reference spectra.)
 
determining an error between the critical parameters and the reference spectra; and (Ferns, ¶ [0038]: “With the parameterization so specified, a stable model with a reduced set (M of N) of floatable parameters is input to the model processor 250 to execute the model refinement operation 204 during which additional parameters may be fixed to output a final model having L of M parameters floating based on a best error estimate of parameters deemed critical”. The examiner notes that Ferns’ model refinement operation teaches a part of a training process of the model, and that Ferns’ determining the best error estimate of parameters that are deemed critical teaches determining an error between the critical parameters and reference critical parameter values (and hence reference spectra as claimed).)
 
repeating the adjusting, the performing, and the determining until the error is less than a convergence threshold. (Ferns, ¶ [0007]: “With each regression performed to arrive at the next simulated spectra, a decision on which of the model parameters are to be allowed to float (i.e., to vary) and which are to be fixed is needed.” ¶ [0061]: “If the normalized difference is not greater than a configurable difference threshold, the method 960 iterates to the next parameter until all parameters are checked. With all parameters determined to have normalized differences below the threshold, the method 960 returns to operation 390 of FIG. 3. In the event that the normalized difference is greater than a configurable threshold value for a first parameter, the method 960 returns to operation 301 (FIG. 3) to re-calculate the Jacobian J0 using the revised model.”
The examiner notes that Ferns’ configurable difference threshold teaches a convergence threshold, that Ferns’ difference of a parameter value teaches an error, and that Ferns’ iterative determination of whether to terminate re-calculation of the Jacobian based on whether the parameter value difference is less than the configurable threshold value (and hence iterative performance of a regression analysis and determination of an error) teaches repeating the adjusting, the performing, and the determining until the error is less than a convergence threshold.)
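For illustration only, the per-parameter threshold check quoted from Ferns ¶ [0061] may be sketched as follows. Ferns’ quoted passage does not state how the difference is normalized; dividing the absolute difference by a per-parameter scale is an assumption made here, as are the function name, the parameter names, and all numeric values.

```python
def parameters_exceeding(old, new, scale, threshold):
    """Return the parameters whose normalized difference is greater than threshold.

    The normalization |new - old| / scale is an assumed form, not taken from Ferns.
    """
    return [
        name
        for name in old
        if abs(new[name] - old[name]) / scale[name] > threshold
    ]


# Hypothetical parameter values before and after a regression pass.
old = {"cd": 50.0, "height": 100.0}
new = {"cd": 50.5, "height": 104.0}
scale = {"cd": 1.0, "height": 2.0}

exceeding = parameters_exceeding(old, new, scale, threshold=1.0)

# An empty list corresponds to all parameters being below the threshold;
# otherwise the Jacobian would be re-calculated with the revised model.
print(exceeding)
```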
 
Ferns does not appear to explicitly teach a root-mean-square error.
Liu does teach, however, a root-mean-square error (Liu, ¶ [0044]: “The optimization process boils down to a process of finding a set of parameters (design variables) of the system that minimizes the cost function. The cost function can have any suitable form depending on the goal of the optimization. For example, the cost function can be weighted root mean square (RMS) of deviations of certain characteristics (evaluation points) of the system with respect to the intended values (e.g., ideal values) of these characteristics; the cost function can also be the maximum of these deviations.” ¶ [0053]: “therefore, minimizing the weighted RMS of fp(z1, z2, …, zN) is equivalent to minimizing the cost function $CF(z_1, z_2, \ldots, z_N) = \sum_{p=1}^{P} w_p f_p^2(z_1, z_2, \ldots, z_N)$, defined in Eq. 1. Thus the weighted RMS of fp(z1, z2, . . . , zN) and Eq. 1 may be utilized interchangeably for notational simplicity herein.” ¶ [0057]: “The predetermined termination condition may include various possibilities, i.e.[,] the cost function may be minimized or maximized, as required by the numerical technique used, the value of the cost function has been equal to a threshold value or has crossed the threshold value, the value of the cost function has reached within a preset error limit, or a preset number of iterations is reached. If either of the conditions in step 306 is satisfied, the method ends.”
The examiner notes that Liu’s RMS error representing the deviation of a characteristic from its target value teaches the claimed root-mean-square error. The examiner further notes that Liu’s termination condition (e.g., when the value of the RMS error is less than or equal to Liu’s threshold value) at which Liu’s process for minimizing the RMS error is terminated teaches a convergence threshold.)
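For illustration only, the cost function of Liu’s Eq. 1 and a threshold-based termination condition of the kind quoted from Liu ¶ [0057] may be sketched as follows. The 1/P normalization inside the weighted RMS, the gradient-descent update, the choice of deviations f_p = z_p - target_p, and all names and numeric values are assumptions introduced here; Liu’s quoted passages do not specify a numerical technique.

```python
import math


def cost_function(weights, deviations):
    """CF = sum over p of w_p * f_p**2, where f_p is the deviation at point p."""
    return sum(w * f * f for w, f in zip(weights, deviations))


def weighted_rms(weights, deviations):
    """Weighted RMS of the deviations, assumed here to be sqrt(CF / P)."""
    return math.sqrt(cost_function(weights, deviations) / len(deviations))


def minimize(z, targets, weights, threshold=1e-6, max_iterations=1000, step=0.1):
    """Iterate until CF <= threshold or a preset number of iterations is reached."""
    for iteration in range(max_iterations):
        deviations = [zi - ti for zi, ti in zip(z, targets)]
        if cost_function(weights, deviations) <= threshold:
            return z, iteration, True          # threshold condition satisfied
        # Gradient step: d(CF)/d(z_p) = 2 * w_p * f_p for this toy deviation.
        z = [zi - step * 2.0 * w * f for zi, w, f in zip(z, weights, deviations)]
    return z, max_iterations, False            # iteration cap reached first


# Hypothetical design variables, target values, and weights.
z_final, iterations_used, converged = minimize(
    z=[0.0, 0.0], targets=[1.0, -2.0], weights=[1.0, 2.0]
)
print(converged, iterations_used)
```

Because the square root is monotonic, the set of design variables that minimizes CF in this sketch also minimizes the weighted RMS, which is consistent with the equivalence stated in the quoted ¶ [0053].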
 
Ferns and Liu are analogous art because both references pertain to using neural networks to improve processes for semiconductors. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns with Liu’s training a neural network by determining an RMS error between a critical parameter and reference data (Liu, supra).  The modification formulates any optimization goal in terms of a cost function for any suitable characteristics of select characteristics with respect to the corresponding intended or ideal values of these select characteristics (e.g., root mean square of deviations) by finding local minimums of the cost function as well as using neural networks to accelerate the optimization for superior, global minimums of such select characteristics (Liu, ¶ [0044]: “The optimization process boils down to a process of finding a set of parameters (design variables) of the system that minimizes the cost function. The cost function can have any suitable form depending on the goal of the optimization. For example, the cost function can be weighted root mean square (RMS) of deviations of certain characteristics (evaluation points) of the system with respect to the intended values (e.g., ideal values) of these characteristics”. ¶ [0052]: “The cost function may represent any suitable characteristics of the lithographic projection apparatus or the substrate, for instance, focus, CD, image shift, image distortion, image rotation, etc.” ¶ [0060]: “The methods of FIG. 3 and FIG. 4 focus on local minimums of the cost function and thus may miss superior global minimums. Monte-Carlo algorithms may be used to find global minimums but would be computationally expensive because they involve finding local minimums (e.g. using the methods of FIG. 3 and FIG. 4) around many randomly selected starting positions in the space of the design variables. 
Machine learning algorithms may be useful to accelerate Monte-Carlo algorithms by filtering the randomly selected starting positions Monte-Carlo algorithms generate before trying to find local minimums around them.”)
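For illustration only, the weighted RMS cost function and threshold-based termination discussed above can be sketched as follows. This is a minimal sketch, not Liu’s or Ferns’ implementation: the function names, the step size, and the simple relaxation update are hypothetical and chosen only to show an RMS error being compared against a convergence threshold.

```python
import math


def weighted_rms(values, targets, weights):
    """Weighted root-mean-square deviation of values from their targets."""
    total_weight = sum(weights)
    squared = sum(w * (v - t) ** 2 for v, t, w in zip(values, targets, weights))
    return math.sqrt(squared / total_weight)


def minimize_until_converged(values, targets, weights, threshold,
                             step=0.5, max_iter=1000):
    """Hypothetical update loop: stop once the RMS error <= threshold."""
    values = list(values)
    for iteration in range(max_iter):
        error = weighted_rms(values, targets, weights)
        if error <= threshold:  # termination condition (convergence threshold)
            return values, error, iteration
        # Assumed toy update: move each value part of the way to its target.
        values = [v - step * (v - t) for v, t in zip(values, targets)]
    return values, weighted_rms(values, targets, weights), max_iter


if __name__ == "__main__":
    final_values, final_error, iterations = minimize_until_converged(
        [4.0], [0.0], [1.0], threshold=0.1
    )
    print(final_values, final_error, iterations)
```

The loop stops at the first iteration whose RMS error is at or below the threshold, which is the role a convergence threshold plays in the mapping above.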
 
 
With respect to claim 16, it is substantially similar to claim 2 or, as explicitly recited in the alternative in claim 16, substantially similar to claim 3 and is thus rejected accordingly, the same rationale applying.
 
With respect to claim 18, it is substantially similar to claim 6 and is thus rejected accordingly, the same rationale applying.
 
With respect to claim 20, it is substantially similar to claim 9 and is thus rejected accordingly, the same rationale applying.
 
Claims 4-5 and 17 stand rejected under 35 U.S.C. 103 as being unpatentable over Ferns et al. USPGPub 20120022836 with publication date of Jan. 26, 2012 (hereinafter Ferns) in view of Liu, X. USPGPub 20180314163 with EFD of Dec. 15, 2014 (hereinafter Liu) and further in view of Castillo et al. Functional Networks with Applications A Neural-Based Paradigm, pp. 29-130 (1999) (hereinafter Castillo).
 
With respect to claim 4, Ferns modified by Liu teaches the method of claim 3 but does not appear to explicitly teach:
wherein the constraining is performed with a single layer neural network. 
 
Castillo does, however, teach:
wherein the constraining is performed with a single layer neural network. (Castillo, p. 35, § 1.9.2, ¶ 1: “As we have seen before, a feed forward neural network with at least one hidden layer can approximate any nonlinear function to a given degree of accuracy.” p. 124, § 4.10, ¶ 1: “In this section we consider the case of one-layer functional networks, that is, functional networks with a single layer of neurons and no intermediate storing layers.” ¶ 2: “In fact, any functional network can be reduced to a one-layer functional network, by an adequate selection of the neural functions.”
The examiner notes that Castillo’s one-layer neural network teaches a single layer neural network.  The examiner further notes that Castillo’s reducing any functional network (e.g., the cited two-layer perceptron neural network for approximating any non-linear function such as Ferns’ quadratic function cited for claim 3, supra) to a one-layer neural network, when combined with Ferns’ quadratic function for determining floating parameters and hence for constraining critical parameters cited for claim 3, supra, teaches constraining critical parameters with a single layer neural network as claimed.)
Ferns, Liu, and Castillo are analogous art because all three references pertain to training neural networks. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns in view of Liu to incorporate Castillo’s modeling a nonlinear function with a single layer neural network (Castillo, supra).  The modification provides the capability of reducing any functional neural network to a simple yet powerful one-layer neural network while still allowing for multiple dimensions and multiple arguments (Castillo, p. 124, § 4.10, ¶ 1: “In this section we consider the case of one-layer functional networks, that is, functional networks with a single layer of neurons and no intermediate storing layers.” ¶ 2: “One-layer functional networks are simple yet powerful because the neural functions are allowed to be multidimensional and multiargument.”)
 
With respect to claim 5, Ferns modified by Liu teaches the method of claim 3 but does not appear to explicitly teach:
wherein the constraining is performed with a multi-layered neural network. 
 
Castillo does, however, teach:
wherein the constraining is performed with a multi-layered neural network. (Castillo, p. 35, § 1.9.2, ¶ 1: “As we have seen before, a feed forward neural network with at least one hidden layer can approximate any nonlinear function to a given degree of accuracy.” p. 35, § 1.9.2, ¶ 3: “We have used a two-layers perceptron with different number of hidden units to approximate the noisy data.”
The examiner notes that Castillo’s neural network having the two-layer perceptron teaches a multi-layered neural network. The examiner further notes that Castillo’s approximating any nonlinear function (e.g., Ferns’ quadratic function for determining floating parameters and hence for constraining critical parameters cited for claim 3, supra) with such a neural network, when combined with Ferns’ nonlinear function for constraining critical parameters, teaches constraining critical parameters with a multi-layered neural network.)
Ferns, Liu, and Castillo are analogous art because all three references pertain to training neural networks. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns in view of Liu to incorporate Castillo’s modeling a nonlinear function with a multilayered neural network (Castillo, supra).  The modification provides an explicit algorithm for approximating any set of functions to a given accuracy with a neural network having at most two hidden layers as well as the capability of approximating any continuous function with only a single hidden layer (Castillo, p. 29, § 1.8, ¶ 3: “It has been shown that any set of functions Fi (Xl, ... , Xn) can be approximated to a given accuracy by (1.22) when considering, at most, two hidden layers. When these functions are continuous then a single hidden layer is enough (see Cybenko (1989)).”)
 
With respect to claim 17, it is substantially similar to claim 4 or, as explicitly recited in the alternative in claim 17, substantially similar to claim 5 and is thus rejected accordingly, the same rationale applying.
 
Claims 8 and 19 stand rejected under 35 U.S.C. 103 as being unpatentable over Ferns et al. USPGPub 20120022836 with publication date of Jan. 26, 2012 (hereinafter Ferns) in view of Liu, X. USPGPub 20180314163 with EFD of Dec. 15, 2014 (hereinafter Liu) and further in view of Kaushal et al. USPGPub 20100138026 published on June 3, 2010 (hereinafter Kaushal).
 
With respect to claim 8, Ferns modified by Liu teaches the method of claim 1 but does not appear to explicitly teach:
wherein the reference spectra are obtained from a semiconductor wafer. 
 
Kaushal does, however, teach:
wherein the reference spectra are obtained from a semiconductor wafer. (Kaushal, ¶ [0060]: “In the process conducted by tool system 310, sensors and probes comprising sensor component 325 can collect data (e.g., data assets)”; “Such techniques can include, but are not limiting to including, X-ray diffraction, transmission electron microscopy (TEM), scanning electron microscopy (SEM), mass spectrometry, light-exposure assessment, magnetoelectric transport measurements (e.g., Hall measurements), optical properties measurements (e.g., photoluminescence spectra, optical absorption spectra, time-resolved photoluminescence, time-resolved optical absorption), and so on. Additional data assets that are relevant to a product (e.g., a semiconductor substrate) include development inspection (DI) critical dimension (CD), and final inspection (FI) CD.” ¶ [0076]: “Autonomous biologically based learning system 360 can process the canonical data and the associated results (e.g., statistics about important parameters, observed drift in one or more parameters, predictive functions relating tool parameters, and so on) can be stored by self-awareness component 550 and employed for comparison to data supplied as information input 358; e.g., production process data or test run data.”
            The examiner notes that Kaushal’s dataset such as photoluminescence spectra, optical absorption spectra, etc. collected by its sensor component (325) as well as storing and employing the processing results of the aforementioned dataset for comparison with production process data or test run data teaches reference spectra as claimed. The examiner further notes that Kaushal’s dataset is relevant to semiconductor substrates and is thus obtained from a semiconductor wafer.)
 
Ferns, Liu, and Kaushal are analogous art because all three references pertain to training neural networks. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns in view of Liu to incorporate Kaushal’s reference spectra obtained from a semiconductor wafer (Kaushal, supra).  The modification of using reference spectra obtained from semiconductor wafers enables the finding of a significant level of tool system degradation and provides an advance hint for process, goal, or contextual adjustment (Kaushal, ¶ [0076]: “If a difference between generated, learnt results of the canonical data and the device process run-data is small, then the manufacturing system degradation can be considered to be low. Alternatively, if the difference between stored learnt results of the canonical data and the sample process data is large, then there can be a significant level of tool system (e.g., semiconductor manufacturing system) degradation. A significant level of degradation can lead to a process, or goal, contextual adjustment.”)
 
With respect to claim 19, it is substantially similar to claim 8 and is thus rejected accordingly, the same rationale applying.
 
Claim 10 stands rejected under 35 U.S.C. 103 as being unpatentable over Ferns et al. USPGPub 20120022836 with publication date of Jan. 26, 2012 (hereinafter Ferns) in view of Liu, X. USPGPub 20180314163 with EFD of Dec. 15, 2014 (hereinafter Liu) and further in view of Hielscher et al. USPGPub 20130338496 published on Dec. 19, 2013 (hereinafter Hielscher).
With respect to claim 10, Ferns modified by Liu teaches the method of claim 9 but does not appear to explicitly teach:
defining a regularization item,  
wherein the regularization item is an inverse of an autocorrelation length, and  
wherein the autocorrelation length is one of the weight coefficients along a wavelength direction. 
 
Hielscher does, however, teach:
defining a regularization item, (Hielscher, ¶ [0055]: “The weight wj are calculated as:
$$ w_j = w_{norm} \cdot \zeta_{exp}(d_{i,j}; R), \quad \text{for } j \in N(i) \qquad (11) $$
$$ \zeta_{exp}(d; R) = e^{-3 (d/R)^{n}}, \quad 0 < n \le 2 $$
$$ w_{norm} = \left[ \sum_{j \in N(i)} \zeta_{exp}(i,j) \right]^{-1} $$”
¶ [0056]: “Where $\zeta_{exp}(d; R)$ is the radial basis function (RBF), $w_{norm}$ is a normalized weight to ensure that each neighbor has equal influence on regularization, d is a distance from the node from its neighboring point j, and R is the correlation length used to control the influence of $\zeta_{exp}(d_{i,j}; R)$.”
The examiner notes that Hielscher’s defining the aforementioned radial basis function (ζexp) for modifying a normalized weight (wnorm) in computing each weight (wj) of a weight matrix teaches defining a regularization process, and each component in the radial basis function, including 1/R, teaches a regularization item.)
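For illustration only, the weight calculation quoted from Hielscher, w_j = w_norm · ζ_exp(d_ij; R) with ζ_exp(d; R) = exp(−3 (d/R)^n) and 0 < n ≤ 2, can be sketched as follows. This is a minimal sketch under assumptions: the function names and the example distances are hypothetical and are not taken from Hielscher. It shows how the correlation length R enters only through the ratio d/R, i.e., through the factor 1/R.

```python
import math


def zeta_exp(d, R, n=2.0):
    """Radial basis function exp(-3 * (d / R) ** n) for 0 < n <= 2."""
    if not 0 < n <= 2:
        raise ValueError("n must satisfy 0 < n <= 2")
    if R <= 0:
        raise ValueError("correlation length R must be positive")
    return math.exp(-3.0 * (d / R) ** n)


def neighbor_weights(distances, R, n=2.0):
    """Weights w_j = w_norm * zeta_exp(d_ij; R) over the neighbors of a node."""
    kernel = [zeta_exp(d, R, n) for d in distances]
    w_norm = 1.0 / sum(kernel)  # normalization: inverse of the kernel sum
    return [w_norm * k for k in kernel]


if __name__ == "__main__":
    # Hypothetical neighbor distances, evaluated for two correlation lengths.
    print(neighbor_weights([0.0, 0.5, 1.0], R=1.0))
    print(neighbor_weights([0.0, 0.5, 1.0], R=10.0))
```

With this normalization the weights of a node’s neighbors sum to one, and a larger R yields more nearly uniform weights.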
 
wherein the regularization item is an inverse of an autocorrelation length, and (Hielscher, ¶¶ [0055]-[0056], supra. ¶ [0046]: “where $\psi(\vec{r}, \Omega', \omega)$ is the complex-valued radiance in unit [W/cm2/sr.], μa and μs are the absorption and scattering coefficients, respectively, in units of [cm−1], ω is the external source modulation frequency and c is the speed of light inside the medium, Φ(Ω,Ω′) is the scattering phase function that describes scattering from incoming direction Ω′ into scattering direction Ω.” 
The examiner notes that Hielscher’s correlation length (R) teaches an autocorrelation length.  The examiner further notes that Hielscher’s use of the inverse of the autocorrelation length (1/R) as a regularization item in regularizing its neural network teaches that the regularization item is an inverse of an autocorrelation length.)
 
wherein the autocorrelation length is one of the weight coefficients along a wavelength direction. (Hielscher, ¶¶ [0055]-[0056], supra. ¶ [0046]: “As a forward model this procedure employs by the frequency-domain equation of radiative transfer (FD-ERT)
[Equation image (media_image1.png), FD-ERT equation, not reproduced]”
 
The examiner notes that the autocorrelation length (e.g., Hielscher’s correlation length, R) is multiplied by the normalized weight (wnorm) for each weight component (wj).  Therefore, for all the weights, the autocorrelation length is a weight coefficient because the autocorrelation length is a part of the radial basis function (ζexp) that is multiplied by the normalized weight (wnorm) in computing the weight component wj.  The examiner further notes that Hielscher derives Eq. (11) by starting with Eq. (1) cited above that models the radiance of incident light for predicting optical properties P (see e.g., Hielscher, ¶¶ [0046]-[0057]).  Therefore, Hielscher’s correlation length R pertains to the direction (Ω) of the incident light having a first wavelength and/or the direction(s) of scattered light (Ω′) having a second direction.  As such, the examiner asserts that Hielscher’s autocorrelation length pertains to the aforementioned incident or scattering direction(s) and is thus along a wavelength direction.)
 
Ferns, Liu, and Hielscher are analogous art because all three references pertain to training neural networks for spectral analyses. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns in view of Liu to incorporate Hielscher’s defining a regularization item as the inverse of an autocorrelation length along a wavelength direction (Hielscher, supra).  The modification provides three-dimensional image reconstructions and uses both forward and inverse variables to simultaneously solve both the forward and inverse problems and also eliminates the undesirable grid effects in image reconstruction from scanning (Hielscher, ¶ [0046]: “According to an embodiment, three dimensional image reconstructions can be performed using the PDE-constrained reduced Hessian SQP method that solves the forward and inverse problems simultaneously.” ¶ [0056]: “As a result, the operator as described in (10)-(11) has the same Smoothing effect regardless of the local grid density, which is desirable to eliminate “grid effects' due to variation in the cell size of unstructured grids.”)
 
Claim 11 stands rejected under 35 U.S.C. 103 as being unpatentable over Ferns et al. USPGPub 20120022836 with publication date of Jan. 26, 2012 (hereinafter Ferns) in view of Liu, X. USPGPub 20180314163 with EFD of Dec. 15, 2014 (hereinafter Liu) and Hielscher et al. USPGPub 20130338496 published on Dec. 19, 2013 (hereinafter Hielscher) and further in view of Cunha et al. Estimating the redshift distribution of photometric galaxy samples – II. Applications and tests of a new method (2009) (hereinafter Cunha).
 
With respect to claim 11, Ferns in view of Liu and Hielscher teaches the method of claim 10 but does not appear to teach:
wherein the adjusting the weight function includes using an overall cost function, 
wherein the overall cost function is a sum of the error index and the regularization item.  
 
Cunha does, however, teach:
wherein the adjusting the weight function includes using an overall cost function, (Cunha, p. 2388, right-hand column, ¶ 3 – p. 2389, left-hand column, ¶ 2: “If we define
[Equation image (media_image2.png) defining E0, not reproduced]
then the deconvolution can be stated as the problem of minimizing E0 with respect to N(z). To incorporate the prior, we define E = E0 + S		(15)”. 
The examiner notes that Cunha’s minimization target, E, teaches an overall cost function, and that Cunha’s minimizing the aforementioned overall cost function while incorporating weights teaches adjusting the weight function includes using an overall cost function.)
 
wherein the overall cost function is a sum of the error index and the regularization item.  (Cunha, p. 2388, right-hand column, ¶ 3 – p. 2389, left-hand column, ¶ 2, supra. The examiner notes that Cunha’s E0 teaches an error and hence an error index, that Cunha’s minimization target (E) teaches an overall cost function, and that Cunha’s parameter, λ, teaches a regularization item because Cunha regularizes its neural network data with the aforementioned parameter λ.  Therefore, the examiner asserts that Cunha’s Eq. (15) teaches that its overall cost function (e.g., the aforementioned objective function, E) is a sum of an error index (e.g., E0 defined by Cunha’s Eq. (14)) and a regularization item (e.g., Cunha’s regularization parameter λ), and that Cunha, when combined with Hielscher’s teaching of a regularization item that is an inverse of an autocorrelation length, thus teaches the above limitation.) 
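For illustration only, the claimed relationship as mapped above (an overall cost function that is the sum of an error index and a regularization item, where the regularization item is the inverse of an autocorrelation length) can be sketched as follows. This is a minimal sketch under assumptions: the function name and the numeric values are hypothetical, and the code reproduces neither Cunha’s Eq. (15) nor Hielscher’s regularization operator.

```python
def overall_cost(error_index, autocorrelation_length):
    """Sum of an error index and a regularization item equal to 1 / length."""
    if autocorrelation_length <= 0:
        raise ValueError("autocorrelation length must be positive")
    regularization_item = 1.0 / autocorrelation_length
    return error_index + regularization_item


if __name__ == "__main__":
    # Hypothetical values: the same error index with a short and a long
    # autocorrelation length; the shorter length is penalized more.
    print(overall_cost(0.25, 0.5))
    print(overall_cost(0.25, 4.0))
```

Under this form, minimizing the overall cost favors both a small error index and a long autocorrelation length.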
 
Ferns, Liu, Hielscher, and Cunha are analogous art because all four references pertain to training neural networks for spectral analyses. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns in view of Liu and Hielscher to incorporate Cunha’s summing an error index and a regularization item as an overall cost function (Cunha, supra).  The modification provides a better estimate for the probability distribution (e.g., P(Zphot | Zspec)) of having certain photons in a specific bin and improves the deconvolution estimate of spectral distribution (e.g., N(Zspec)) in estimating the spectral distribution of photometric samples using observables and weight sampling from spectroscopic samples as well as simulation and real data (Cunha, p. 2389, left-hand column, ¶ 2: “The preceding discussion summarizes the ‘standard’ photo-z deconvolution method for estimating the redshift distribution. The weighting method can provide a better estimate of P(zphot|zspec)ij for the photometric sample, reducing the need for regularization and thereby improving the deconvolution estimate of N(zspec).” Abstract: “In Lima et al. we presented a new method for estimating the redshift distribution, N(z), of a photometric galaxy sample, using photometric observables and weighted sampling from a spectroscopic subsample of the data. In this paper, we extend this method and explore various applications of it, using both simulations and real data”.)
 
Claims 12-13 stand rejected under 35 U.S.C. 103 as being unpatentable over Ferns et al. USPGPub 20120022836 with publication date of Jan. 26, 2012 (hereinafter Ferns) in view of Liu, X. USPGPub 20180314163 with EFD of Dec. 15, 2014 (hereinafter Liu) and further in view of An et al., The Effects of Adding Noise During Backpropagation Training on A Generalization Performance (April 1, 1996) (hereinafter An).
With respect to claim 12, Ferns modified by Liu teaches the method of claim 1 but does not appear to explicitly teach:  
wherein the adjusting the weight function is configured to avoid over-fitting. 
An does, however, teach:
wherein the adjusting the weight function is configured to avoid over-fitting. (An, p. 656, § 4.2 “Weight Noise Added to the Hidden Layer”, ¶ 3: “We see from equation 4.13 that hidden-layer noise penalizes large derivatives and large weights at the output layer as well as large derivatives at the hidden layer. Large derivatives at the output layer are already penalized by the output-layer weight; hidden-layer noise thus reinforces such a penalty.” p. 656, last paragraph: “In summary, the main effects of weight noise are (1) reducing the number of hidden units, and (2) encouraging sigmoidal units, especially in the output layer, to operate in the saturation states, i.e., firmly on or off. The first effect limits the number of hidden units in a network and hence can prevent overfitting.”
The examiner notes that a weight function is interpreted as any expression or relation that pertains to weight adjustment.  The examiner further notes that An’s adding weight noise to the hidden layer to penalize large derivatives at the hidden layer teaches adjusting weights of the hidden layers and thus teaches a weight function.  The examiner also notes that An’s explicit teaching that adding weight noise prevents overfitting teaches that adjusting the weight function is configured to avoid over-fitting as claimed.)
Ferns, Liu, and An are analogous art because all three references pertain to training neural networks for spectral analyses. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns in view of Liu to incorporate An’s adjusting weights to avoid overfitting (An, supra).  The modification not only injects layer noise to produce a neural network that is insensitive to noise in the data but also minimizes a cost function simply by training the neural network with data noise (An, p. 648, ¶ 2: “Thus, noise injection could prevent the neural network from overfitting the training set and may result in neural networks that are insensitive to noise in the data.” p. 648, ¶ 3: “We see that training with data noise effectively minimizes a cost function that differs from the standard error function E(w); the injected noise implicitly alters the training objective function.”)
 
With respect to claim 13, Ferns modified by Liu and An teaches the method of claim 12, and An further teaches:
 
wherein the weight function is equal to noise, and (An, p. 657, § 5, ¶ 2: “The weight update rule for training with Langevin noise reads (Hertz et al. 1989; Rognvaldsson 1994; Guillerm and Cotter 1991)
wt+1 = wt -  E(w) +         (5.1)
where the noise is a gaussian random variable with mean zero and unit standard deviation. Let t =  be small and constant. Equation 5.1 can be viewed as a discretised version of the following continuous-time Langevin equation (Gillespie 1992; Seung et al. 1992):
dw = -  E(w)dt +   (5.2)”
The examiner notes that An’s noise () pertaining to adjustment of weight(s) of a neural network is interpreted as the weight function being equal to noise. The examiner thus asserts that An’s noise () is used in adjusting the weights of the weight matrix (w) as taught in Eqns. (5.1) and (5.2) and thus teaches a weight function. Therefore, An teaches that the weight function is equal to noise ().)
 
wherein the noise is continuous along a wavelength or parameter direction. (An, p. 657, § 5, ¶ 2, supra.  The examiner notes that a weight function pertaining to noise is interpreted as the weight function being equal to noise. The examiner also notes that An’s noise () is used to adjust weights (w) and thus teaches a weight function that is equal to noise.  The examiner further notes that An’s noise (a Gaussian random variable) with mean zero and unit standard deviation expressed in the continuous-time equation (5.1) is continuous with respect to at least the weight parameters in the weight matrix (w) and the temporal parameter because the components of a continuous function are also continuous. Therefore, the examiner notes that An teaches that the noise is continuous along a wavelength or parameter direction.)
Ferns, Liu, and An are analogous art because all three references pertain to training neural networks for spectral analyses. 
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Ferns in view of Liu to incorporate An’s using noise continuous in the wavelength or parameter direction as weights (An, supra).  The modification enables direct injection of An’s noise to the weight changes (e.g., An’s Eq. 2.7) while bypassing the neural network and the error function and further trains a neural network with Brownian dynamics that achieves both training and minimizing errors to achieve a global minimum error. (An, p. 657, § 5, ¶ 1: “Both the data and weight noise affect the dynamics of training through e(z,w) during the evaluation of Aw. In contrast, the Langevin noise bypasses the neural network and the error function; it is directly injected to the weight changes as indicated in equation 2.7.” p. 658, ¶ 3: “Training by Brownian dynamics with annealing likewise minimizes E(w), the end configuration of the training being the global minimum of E(w).”)
 
Conclusion
       The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
(a)            Jin et al., USPGPub 20120226644, published on Sept. 6, 2012, teaches setting an error index for a convergence threshold (e.g., Jin’s determined number of neurons). Jin explains that guessing the optimal number of neurons in a feed-forward neural network to fit a nonlinear function is an NP-complete problem (i.e., a class of problems that have no known solution with polynomial-time complexity), and hence that a fast method of optimization includes gradually increasing the number of neurons in the network during training until an optimized number of neurons is determined that will provide a specified accuracy.
(b)           NASA, The Basics of Spectral Fitting (Nov. 25, 2004), teaches that the instrumental response of a spectrometer can be obtained by selecting a model spectrum that can be described in terms of parameters and matching or fitting the selected model to data obtained by the spectrometer to compare the predicted count spectrum to the observed count spectrum. A best-fit model may then be determined by finding parameter values that produce the best-fit statistics.
(c)            Hermans et al., Memory in linear recurrent neural networks in continuous time (2009), teaches an autocorrelation function, R(t) = exp(-|t|), which describes a signal which is limited in bandwidth by the finite autocorrelation length, , and where the signal timescale -1 for reservoir computing using continuous-time neural networks.
 
 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERICH C. TZOU whose telephone number is (571)272-9852. The examiner can normally be reached Monday-Friday 6:00AM-5:00PM PST with alternate Fridays off.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/E.C.T./Examiner, Art Unit 2126         
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126