DETAILED ACTION
This is the response to applicant’s amendment action regarding application number 16/361,915, filed March 22, 2019.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed August 10, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 16/361,915, which include: Amendments to the Claims, and Remarks containing Applicant’s amendments.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Applicant has amended Claim 11. Claims 1-20 remain pending in the application.

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/361,915, which include: Remarks containing Applicant’s arguments.
Regarding Applicant’s Remarks for the identified specification objections regarding undefined subscript terms ‘v0’ and ‘PS’, Examiner acknowledges Applicant’s arguments and has considered them. Applicant explicitly indicates in their Remarks that these subscripts provided in Applicant’s specification paragraph [0032] are example representations that do not require any specific definition. Examiner points out that these subscripts appear to be more specific than just indicating a generic identifier ‘A’ or ‘B’ to denote a general subscript, and hence providing respective definitions for those subscripts would aid in the understanding of how those subscript terms are relevant and are applied in the context of the specification. However, given that Applicant explicitly acknowledges these subscripts represent example identifiers that are not directed to any specific definition, and that these example identifiers are not recited in any existing claim limitation that would cause indefiniteness issues, Examiner withdraws the respective specification objections previously set forth in the Non-Final Office Action mailed February 10, 2022.
Regarding Applicant’s Remarks for Claims 8-14 under 35 U.S.C. 101, Examiner acknowledges Applicant’s arguments, has considered them, and has found them to be not persuasive.
Regarding Applicant’s Remarks:
“In support of such an assertion, the Office Action points to the Miriam-Webster definition of the term "engine" as being software only. Applicant respectfully submits that such an interpretation of the term "engine" is improper, since the Specification expressly describes engines as a combination of hardware and programming implemented via local resources of a computing system. (See paragraph [0018] of the Specification). As such, the engines recited in claims 8-14 are not directed to software per se and instead recite patent-eligible subject matter. Withdrawal of the §101 rejections to claims 8-14 is respectfully requested.”
Examiner notes that Applicant’s above argument points to Applicant’s specification paragraph [0018] to indicate that the specification “expressly describes engines as a combination of hardware and programming implemented via local resources of a computing system”. However, Examiner points out that Applicant’s specification paragraph [0018] does not provide an explicit definition for the terms “model training engine” and “volume diagnosis adjustment engine” as being limited to a combination containing hardware components. Applicant’s specification paragraph [0018] merely states an example where a computing system “may implement a model training engine 108 and volume diagnosis adjustment engine 110 (and components thereof) in various ways, for example as hardware and programming implemented via local resources of the computing system”. A person having ordinary skill in the art would understand that the above statement does not limit the model training engine or volume diagnosis adjustment engine to a specific implementation or embodiment that contains hardware components and programming, and instead indicates that there are various ways of implementing those specified engines. One of these various ways may include an implementation representing only software programming. Examiner also points out that this latter engine implementation representing only software programming is consistent with the usage of the term “engine” as a term of art, such that a person having ordinary skill in the art would recognize this term “engine” as referring to computer software. This definition of “engine” is also consistent with the general definition of the term provided in the Merriam-Webster dictionary (merriam-webster.com/dictionary/engine, retrieved on 2/7/2022, where an engine is defined as “computer software that performs a fundamental function especially of a larger program”). 
Furthermore, Applicant’s specification paragraph [0068] suggests that “The systems, methods, devices, and logic described above, including the model training engine 108 and the volume diagnosis adjustment engine 110, may be implemented in many different ways in many different combinations of hardware, logic, circuitry, and executable instructions stored on a machine-readable medium. For example, the model training engine 108, volume diagnosis adjustment engine 110, or both, may include circuitry in a controller, a microprocessor, or an application specific integrated circuit (ASIC), or may be implemented with discrete logic or components, or a combination of other types of analog or digital circuitry …”. Examiner points out that the recitation of the term “components” for implementing the model training engine or volume diagnosis adjustment engine also does not restrict the component to being a hardware component, since the “components” of an engine may be implemented in many different ways in many different combinations, where one of such options includes executable instructions (i.e., a software program). Hence this paragraph also does not definitively indicate that an engine implementation must include hardware components, as asserted by the Applicant. Given that Applicant’s specification does not expressly describe engines as being limited to embodiments that contain hardware, as asserted by the Applicant, Applicant’s arguments are not persuasive, and the existing 101 rejection is maintained. Examiner advises that if Applicant intends to claim a specific embodiment of an engine that includes hardware, Applicant must positively recite hardware in the claim language to overcome the existing 101 rejection.
Regarding Applicant’s Remarks for Claims 1-5, 8-12, and 15-18 under 35 U.S.C. 103 as being unpatentable over Cheng et al., Volume Diagnosis Data Mining, 2017 22nd IEEE European Test Symposium (ETS) [hereafter referred to as Cheng], in view of Benware et al., Determining a Failure Root Cause Distribution From a Population of Layout-Aware Scan Diagnosis Results, IEEE Design & Test of Computers, 2012 [hereafter referred to as Benware]; and for Claims 6-7, 13-14, and 19-20 under 35 U.S.C. 103 as being unpatentable over Cheng in view of Benware as applied to Claims 1, 8, and 15; in even further view of Rajski et al., U.S. PGPUB 2006/0066339, Determining and Analyzing Integrated Circuit Yield and Quality, published 3/30/2006 [hereafter referred to as Rajski], Examiner acknowledges Applicant’s arguments, has considered them, and has found them to be not persuasive. Examiner points out each argument from the Applicant and addresses each of them in the following paragraphs with respect to the existing prior art mapping.
Regarding Applicant’s Remarks:
“Claim 1 recites method features performed by a computing system, including: 
accessing a diagnosis report for a given circuit die that has failed scan testing; 
computing, through a local phase of a volume diagnosis procedure, a probability distribution for the given circuit die from the diagnosis report, wherein the probability distribution specifies probabilities for different root causes as having caused the given circuit die to fail;
adjusting the probability distribution into an adjusted probability distribution using a supervised learning model, the supervised learning model trained with a training set comprising training probability distributions computed from training dies through the local phase of the volume diagnosis procedure, each training probability distribution labeled with an actual root cause that caused a given training die to fail; and
providing the adjusted probability distribution for the given circuit die as an input to a global phase of the volume diagnosis procedure to determine a global root cause distribution for multiple circuit dies that have failed the scan testing.
(Emphasis added). Accordingly, claim 1 expressly recites adjusting a probability distribution into an adjusted probability distribution using a supervised learning model. Claim 1 further recites that the supervised learning model is trained with a training set comprising training probability distributions labeled with an actual root cause that caused a given training die to fail. Applicant submits the Cheng-Benware combination fails to teach at least the above-emphasized features of claim 1.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner points out that Applicant’s argument is directed to the recited limitations: “adjusting the probability distribution into an adjusted probability distribution using a supervised learning model, the supervised learning model trained with a training set comprising training probability distributions computed from training dies through the local phase of the volume diagnosis procedure …”. Examiner points out that the above limitations actually contain two aspects, each of which will be addressed and clarified in the following paragraphs. 
Regarding the aspect associated with the limitation “… the supervised learning model trained with a training set comprising training probability distributions computed from training dies through the local phase of the volume diagnosis procedure …”, this limitation broadly recites generating training sets containing probability distributions, and using these training sets to train a supervised learning model. As indicated in the Non-Final Office Action mailed February 10, 2022, Cheng teaches using maximum-likelihood estimation (MLE) to compute and solve the probability parameters represented in the volume diagnosis Bayesian network model. Cheng further teaches that the MLE method is used for improving the accuracy of these probability parameters, and that a combination of supervised machine learning, deep learning techniques, and domain knowledge is used to generate proper training data to build a supervised learning model, where the process of improving the accuracy of these probability parameters represents a form of adjustment of the probability distribution, and where this generation of proper training data includes using the domain knowledge information related to each of the probability parameters (where this domain knowledge is found in the scan diagnosis reports associated with individual failed ICs, thus forming a set of training data for each defective IC). Cheng additionally teaches applying cross-validation techniques to a total sampled data set, separating the sampled data set into training data and test data sets, where the training data set was used in a training phase to perform the MLE method to find the most likely distribution for the probability parameters identified in the volume diagnosis Bayesian network model (Cheng p.2 col.1 last paragraph-col.2 4th paragraph (Section II. Volume Diagnosis Model); p.6 col.1 2nd paragraph-col.2 2nd paragraph (Section V. Volume Diagnosis Practical Uses); and p.6 col.2 4th paragraph-p.7 col.1 4th paragraph).
The probability parameters being learned from the Bayesian network are the conditional probability distributions (e.g., P(r|f), P(f|d), P(d|c)) representing the diagnosis reports. Examiner points out that this interpretation is consistent with Applicant’s specification paragraphs [0037] and Figure 2, where the training probability distributions are an output of a local phase of a volume diagnosis procedure, which are the diagnosis reports shown in Applicant’s Figure 2 (element 220). Cheng  teaches that the computed probability of a sampled diagnosis report containing a specific root cause is the product of the above-mentioned conditional probabilities, where each of these conditional probabilities is based on known data (faults, defects, physical features/root causes) that is present in each diagnosis report. Hence, each diagnosis report represents the conditional probabilities associated with the known data in each report, where this known data also represents labeled data found in each report. To provide additional clarification, Examiner points out that Cheng uses a card-game analogy to explain the creation of a Bayesian network model for computing volume diagnosis probability distributions, where in the card-game, the goal is to determine the probability of a number n drawn from a specific deck d, where each card contains known information about the card such as the number of the card, and each deck d is labeled with an indicator A, B, and C, where card samples are drawn from the set of decks (Cheng p.2 col.2 3rd paragraph: “… This card game has similar statistical characteristics as volume diagnosis.”; and pp.2-3 Section III. Card Game, in particular p.3 col.1 Figure 2 and p.3 2nd-4th paragraphs: “… As mentioned above in the setting the information of numbers on all cards included in each card deck is known. … An example of the card game is shown in Figure 2. There are 3 card decks. Deck A has card numbers 1, 2, 3, 4, 5, 6. Deck B has card numbers 1,3, 5. 
Deck C has card numbers 1, 2, 3. Four cards are drawn by the genie … Using MLE, the problem is to identify the probability distribution of card deck selection by the genie such that the probability of drawing these 4 numbers has the maximum value. …”). Cheng further teaches performing limited sampling by drawing enough cards from a specific deck to determine the MLE to overcome biased data present in each card deck, where in each of these cases, the drawn cards represent a training sample containing labeled data such as the card number (Cheng p.6 col.2 3rd-4th paragraphs: “… drawing a card from this deck repeatedly for 100 times, one can get the following sampled numbers … As in volume diagnosis, P(r|f), P(f|d), and P(d|c) can be correct based on unlimited diagnosis reports. Same as P(n|d), they may not be true for each fault, each defect and each root cause when only limited diagnosis reports are used …”; and p.7 col.1 1st paragraph-4th paragraphs: “… MLE distribution needs to include more card decks to get the maximum value on the biased data … In volume diagnosis, over-fitting problem in general adds extra root causes in the MLE distribution. Over-fitting can be alleviated by increasing sample data size … There are several popular machine learning techniques to deal with the over-fitting problem … cross-validation was used in DDYA … to divide the total sampled data set into N parts randomly. N-1 parts are used as training data and the remaining 1 part is used as test data. MLE finds the most likely distribution of training data and applied this distribution on testing data to measure its fitness.”). 
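For illustration only, the card game quoted above can be sketched in a few lines of Python. The sketch is not taken from Cheng or from the record. The deck contents follow the quoted example, but the four drawn cards (elided in the quotation), the assumption that each card in a deck is equally likely to be drawn, and all function names are hypothetical. The likelihood is maximized here with an expectation-maximization style iteration, which is one common way to fit mixture weights and is not asserted to be the procedure used in Cheng.

```python
# Hypothetical sketch of the card game: estimate the deck-selection
# distribution P(d) by maximum likelihood. Deck contents follow the quoted
# example; the draws and the uniform-draw assumption are illustrative only.

DECKS = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [1, 3, 5],
    "C": [1, 2, 3],
}


def p_number_given_deck(number, deck):
    """P(n|d), assuming every card in a deck is equally likely (assumption)."""
    cards = DECKS[deck]
    return cards.count(number) / len(cards)


def likelihood(draws, weights):
    """Probability of the observed draws: product over draws of sum_d P(n|d) * P(d)."""
    result = 1.0
    for n in draws:
        result *= sum(weights[d] * p_number_given_deck(n, d) for d in DECKS)
    return result


def estimate_deck_distribution(draws, iterations=200):
    """Iteratively re-estimate P(d) so that the likelihood of the draws increases."""
    weights = {d: 1.0 / len(DECKS) for d in DECKS}  # start from a uniform guess
    for _ in range(iterations):
        totals = {d: 0.0 for d in DECKS}
        for n in draws:
            # Share of this draw attributed to each deck under the current weights.
            joint = {d: weights[d] * p_number_given_deck(n, d) for d in DECKS}
            norm = sum(joint.values())
            for d in DECKS:
                totals[d] += joint[d] / norm
        weights = {d: totals[d] / len(draws) for d in DECKS}
    return weights


# Hypothetical draws; the quoted passage does not state the four numbers.
HYPOTHETICAL_DRAWS = [1, 3, 5, 5]
estimated = estimate_deck_distribution(HYPOTHETICAL_DRAWS)
```

For these hypothetical draws, nearly all of the estimated weight falls on Deck B, because every drawn number appears in Deck B and Deck B assigns each of those numbers the highest per-card probability.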
In the context of applying this card game analogy to the volume diagnosis scenario and the process of producing cross-validation training and test sets to train and evaluate a model, a person having ordinary skill in the art would understand through Cheng’s detailed teaching of the card game analogy that one must perform sufficient sampling of diagnosis reports and apply techniques that address over-fitting to overcome the data bias present in a limited sample. Hence, this sampling of diagnosis reports containing known/labeled information represents the generated sampled training data based on the generated diagnosis reports (representing the training probability distributions) containing known or labeled data. Taking into account Cheng’s earlier statement about using supervised machine learning techniques to derive these parameters based on good training data, a person having ordinary skill in the art would understand that Cheng provides sufficient teachings that explain how to produce training samples through the sampling of diagnosis reports (where each report represents a probability distribution of conditional probabilities of known data). Hence, these training samples containing the known/labeled information (e.g., faults, defects, root causes) correspond with the good training data that is used to perform the supervised machine learning techniques to train a supervised learning model. Hence, given the above evidence, Applicant’s arguments are not persuasive, and the existing prior art rejection is maintained.
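For illustration only, the cross-validation split quoted from Cheng (“divide the total sampled data set into N parts randomly. N-1 parts are used as training data and the remaining 1 part is used as test data”) can be sketched as follows. The function name, the seed parameter, and the use of integers as stand-ins for diagnosis reports are hypothetical and are not drawn from Cheng or from the record.

```python
import random


def n_fold_splits(samples, n_parts, seed=0):
    """Yield (training, test) pairs: N-1 parts for training, 1 part for testing."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # random assignment of samples to parts
    parts = [items[i::n_parts] for i in range(n_parts)]
    for k in range(n_parts):
        test = parts[k]
        train = [x for j, part in enumerate(parts) if j != k for x in part]
        yield train, test


# Hypothetical stand-ins for ten sampled diagnosis reports.
splits = list(n_fold_splits(range(10), n_parts=5))
```

Each sample appears in exactly one test part, so a distribution fitted on the training parts is always measured against data it was not fitted on.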
Regarding the aspect associated with the limitation “adjusting the probability distribution into an adjusted probability distribution using a supervised learning model …”, this limitation broadly recites using a trained supervised learning model to produce an output representing an adjusted probability distribution. As indicated in the Non-Final Office Action mailed February 10, 2022, Cheng teaches using supervised learning techniques to build a deep learning model to learn the conditional probability parameters P(r|f), P(f|d), P(d|c), where the training of this deep learning model using supervised learning techniques results in a trained supervised learning model that learned these probability parameters modelled in a Bayesian network (Cheng p.6 col.2 2nd paragraph: “… an alternative is to use supervised machine learning techniques to derive these parameters based on good training data. With more aggressive deep learning techniques it is possible that a new model can be created to replace the Bayesian network. Some domain knowledge is still needed to get proper training data.”). As established earlier, these conditional probability parameters P(r|f), P(f|d), P(d|c) are defined as the conditional probability of a report when a specific fault in this report is true; the conditional probability of a fault if a specific defect is true; and the conditional probability of a defect if one of its root cause is true, respectively. In other words, these faults, defects, and root causes are present in each diagnosis report and represent known or labeled data in each diagnosis report (Cheng p.1 Section I. 
Introduction: “Recent advancements in scan diagnosis technologies include use of more physical information … This extra information not only improves diagnosis resolution to smaller and smaller defect locations, but also identifies physical features associated with each defect more precisely … With physical defect features reported by diagnosis tools, several papers [16-24] have proposed to use volume (large amount of) diagnosis reports with appropriate statistical analysis to automatically identify a common physical default feature such that yield can be improved by fixing this common physical defect feature. … Typically, in one diagnosis report, there can be several logic faults each of which can cause the failures observed at testers. Also reported are several physical defects each of which can cause one fault. Also reported are several physical features each of which is responsible to cause one defect. In other words, a diagnosis report can be caused by several possible physical features with various probability of each. …”). Cheng further teaches P(r) as a product of the above conditional probabilities to represent the probability of one diagnosis report. A person having ordinary skill in the art would understand that in statistics, a product of conditional probabilities represents a sequence of probabilities, where the sequence of probabilities represents a probability distribution (Cheng p.2 col.1 last paragraph-col.2 4th paragraph: “As mentioned above, in each diagnosis report, there can be several faults each of which matches the failures observed at testers. There are several defects each of which can cause one specific fault. There are several physical features each of which can be responsible to trigger one specific defect. It is possible that one physical feature can trigger two different defects at two different physical locations. … Based on Bayesian network we assume that P(v) = ∏ P(r). That is the probability of all sampled volume diagnosis reports is equal to the product of the probability of each diagnosis report if all diagnosis reports are independent. P(r) = ∑ (P(r|f) * P(f)) if all faults are mutually exclusive and independent. … P(r|f) is a conditional probability of report r if a specific fault is true … P(f|d) is a conditional probability of fault f if a specific defect is true … P(d|c) is a conditional probability of defect d if a specific root cause c is true. Combining all these equations we get P(v) = ∏ ∑ (P(r|f) * (∑ (P(f|d) * ∑ P(d|c) * P(c)))). The accuracy of this Bayesian network depends on the accuracy of P(r|f), P(f|d), and P(d|c).”). Cheng additionally teaches that each of the above conditional probabilities represents adjusted values, where the adjustments broadly recite the changes in the relationships between the known data (e.g., faults, defects, and root causes) determined based on score systems used in the diagnosis tools, where this relationship and score information are also present and differ in each diagnosis report (Cheng p.6 Section V. Volume Diagnosis Practical Uses 1st-3rd paragraphs: “… Another alternative is to choose P(r|f) to be less than 1 for faults included in the diagnosis report to indicate they are close but not exact. To get correct P(r|f) requires the understanding of the score system used in diagnosis tools … P(f|d) … this parameter requires the understanding of the relationship among the logic faults and physical defects used in diagnosis tools … Similar to P(r|f), adjustment is needed to get correct P(f|d) … P(d|c) … It can be calculated based on how many defects can be caused by this root cause. This information can be derived from layout and defect behavior of each root cause. The correlation among these root causes and the defects should be used to get accurate P(d|c)…”). As established earlier in the preceding argument through Cheng’s detailed teaching of the card game analogy, for the volume diagnosis scenario, one must perform sufficient sampling of diagnosis reports and apply techniques that address over-fitting to overcome the data bias present in a limited sample. Hence, this sampling of diagnosis reports containing labeled fault, defect, and root cause information and their correlating scores/relationships represents generating sampled training data based on the generated diagnosis reports containing known and labeled data. 
These generated training samples containing this known and labeled information represent the “good domain knowledge” and “good training data” that the Cheng reference refers to when it is training this deep learning model using supervised learning techniques (Cheng p.6 col.2 2nd paragraph). A person having ordinary skill in the art would understand that a supervised learning method represents a method for learning the changes in data that is present in a given training set to perform a prediction, and hence the presence of these known faults, defects, root causes, and their relationships (modelled as a probability distribution present in the diagnosis reports) is learned through the sampled training data taught in Cheng, resulting in a learned model that performs a prediction based on received input data, where this prediction represents an adjusted probability distribution output based on the received input data (i.e., the known faults, defects, root causes and their relationships expressed as conditional probability distributions) found in the diagnosis reports. Examiner points out that this interpretation is consistent with Applicant’s specification paragraph [0055] and Figure 3, where the adjusted probability distribution output shown in Figure 3 (elements 331, 332) represents the estimated probability distributions learned by the supervised learning model for a given diagnosis report. Hence, given the above evidence, Applicant’s arguments are not persuasive, and the existing prior art rejection is maintained.
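For illustration only, the relationships quoted from Cheng, in which the probability of one report sums over faults, defects, and root causes and the probability of all sampled reports is the product over reports, can be sketched as follows. The dictionary layout, the function names, and every numeric value are hypothetical and are not taken from Cheng or from the record. The product over reports is computed as a sum of logarithms, which is a numerical convenience and not a feature of the quoted equations.

```python
import math


def report_probability(p_r_given_f, p_f_given_d, p_d_given_c, p_c):
    """P(r) = sum_f P(r|f) * sum_d P(f|d) * sum_c P(d|c) * P(c)."""
    # P(d) for each defect: sum over root causes of P(d|c) * P(c).
    p_d = {d: sum(p * p_c[c] for c, p in by_c.items())
           for d, by_c in p_d_given_c.items()}
    # P(f) for each fault: sum over defects of P(f|d) * P(d).
    p_f = {f: sum(p * p_d[d] for d, p in by_d.items())
           for f, by_d in p_f_given_d.items()}
    # P(r): sum over faults of P(r|f) * P(f).
    return sum(p * p_f[f] for f, p in p_r_given_f.items())


def volume_log_probability(reports, p_c):
    """log P(v) = sum over reports of log P(r), assuming independent reports."""
    return sum(
        math.log(report_probability(r["p_r_given_f"], r["p_f_given_d"],
                                    r["p_d_given_c"], p_c))
        for r in reports
    )


# Hypothetical values chosen only to make the arithmetic easy to follow.
root_cause_distribution = {"c1": 0.75, "c2": 0.25}
example_report = {
    "p_r_given_f": {"f1": 0.5},
    "p_f_given_d": {"f1": {"d1": 1.0, "d2": 0.5}},
    "p_d_given_c": {"d1": {"c1": 0.5, "c2": 0.0}, "d2": {"c1": 0.0, "c2": 1.0}},
}
p_r = report_probability(example_report["p_r_given_f"],
                         example_report["p_f_given_d"],
                         example_report["p_d_given_c"],
                         root_cause_distribution)
```

With these hypothetical values, P(d1) = 0.375 and P(d2) = 0.25, so P(f1) = 0.5 and P(r) = 0.25.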
Regarding Applicant’s Remarks:
“Both Cheng and Benware generally relate to determination of root cause distributions, and teaching doing so through unsupervised learning techniques. As specifically described in Cheng and Benware, root cause distribution ("RCD" for short) techniques utilize Bayesian networks and Maximum Likelihood Estimation ("MLE") to determine probability parameters for specifying probabilities of certain defect types causing a chip defect. Note that all of the specific types of processing described in Cheng and Benware for RCD and volume diagnosis are unsupervised learning techniques (including Bayesian networks and MLE). As described in Cheng and Benware, none of the input data processed by the Bayesian networks and MLE is labeled with the actual root cause, which makes sense as MLE and Bayesian networks operate as unsupervised learning methods. Accordingly, any reliance upon MLE and Bayesian networks by the Office Action to teach the supervised learning model of claim 1 is misplaced.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner first points out that “RCD” as defined in the Benware reference is an acronym for “root cause deconvolution”, where RCD is a learning and inference process (Benware p.9 col.2 2nd paragraph: “… The second section presents the details of the complete learning and inference process, which we refer to as root cause deconvolution (RCD).”). This definition is consistent with Applicant’s definition of the same acronym “RCD” defined in Applicant’s specification paragraph [0021]: “… various ML-based volume diagnosis adjustment features are described with reference root cause deconvolution (“RCD”).”. Examiner further points out that part of Applicant’s above argument is directed towards the assertion that both references teach using unsupervised machine learning methods such as Bayesian networks and maximum-likelihood estimation to determine root cause distributions. However, Examiner points out that the Cheng and Benware references do not use the term “unsupervised” when describing the RCD analysis that uses the Bayesian network model or machine learning techniques applied to the model for determining root cause distributions. A person having ordinary skill in the art would understand that a Bayesian network broadly indicates a probabilistic graphical model that represents a set of variables and conditional probabilities or dependencies between these variables, and therefore is not strictly categorized as an unsupervised machine learning model. Examiner also points out that Applicant appears to characterize the maximum-likelihood estimation (MLE) technique as being a strictly unsupervised learning technique. 
However, Examiner points out that ScienceDirect.com defines maximum-likelihood estimation (https://www.sciencedirect.com/topics/mathematics/maximum-likelihood-estimation) as a general statistical estimation procedure “whereby the parameters of a model are optimized by maximizing the joint probability or probability density of observed measures based on an assumed distribution of those measurements”. The Oxford Reference also defines maximum-likelihood estimation (https://www.oxfordreference.com/view/10.1093/acref/9780191816826.001.0001/acref-9780191816826-e-0233?rskey=wq6HZf&result=10) as a “statistical estimation procedure (an alternative to ordinary least squares, OLS) for finding the value of one or more parameters for a given statistic which makes the known likelihood distribution a maximum. MLE chooses as the estimates of the parameters the values for which the probability of the observed scores is the highest. MLE is an integral part of structural equation modelling and generalized linear model algorithms.”. The above definitions are also consistent with the definition provided in the Cheng reference (Cheng p.2 col.1 4th paragraph: “In statistical analyses, maximum-likelihood estimation (MLE) [25] is a method of estimating the distribution for observed data.”). Hence, all three definitions of MLE merely indicate that the procedure is a common statistical procedure for determining a maximum probability distribution, and do not describe nor restrict this statistical method as a method applicable only for unsupervised machine learning. Hence, given the above evidence, Applicant’s arguments are not persuasive, and the existing prior art rejection is maintained.
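For illustration only, the following sketch shows the same maximum-likelihood computation applied once to observations without labels and once to observations that carry labels. It is not taken from Cheng, Benware, or the record; the function names and sample values are hypothetical. It relies on the standard result that the maximum-likelihood estimate of a categorical distribution is the relative frequency of each observed value.

```python
from collections import Counter, defaultdict


def categorical_mle(observations):
    """MLE of a categorical distribution: relative frequency of each value."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def conditional_mle(labeled_pairs):
    """MLE of P(value | label) from (label, value) pairs, one estimate per label."""
    by_label = defaultdict(list)
    for label, value in labeled_pairs:
        by_label[label].append(value)
    return {label: categorical_mle(values) for label, values in by_label.items()}


# Hypothetical observations without labels.
unlabeled_estimate = categorical_mle(["a", "a", "b", "a"])

# Hypothetical observations that each carry a label.
labeled_estimate = conditional_mle([("x", "a"), ("x", "b"), ("y", "a")])
```

The estimation step is identical in both cases; only the presence of labels in the input differs.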
Regarding Applicant’s Remarks:
“For example, the Cheng-Benware combination fails to teach or suggest the claimed feature of "the supervised learning model trained with a training set comprising training probability distributions computed from training dies through the local phase of the volume diagnosis procedure, each training probability distribution labeled with an actual root cause that caused a given training die to fail". There is no teaching of training a learning model with labeled input data, much less training probability distributions labeled with an actual root cause that caused a given training die to fail. The Office Action cites to p.14, col. 1 - p.15, col 1 of Benware in alleging the Cheng-Benware combination teaches labeled training probability distributions. However, the cited portions of Benware merely teach simulations to test the accuracy of probability parameters derived from RCD techniques utilizing unsupervised learning techniques. The Office Action can only assert such simulation results "are a form of labeling" (Office Action, p. 8). But even if this were true (which Applicant does not concede), Benware is silent as to further processing of the simulated data, let alone providing such simulation results as training data for supervised learning model. That is, both Cheng and Benware are wholly silent as to the claimed feature of ""the supervised learning model trained with a training set comprising training probability distributions computed from training dies through the local phase of the volume diagnosis procedure, each training probability distribution labeled with an actual root cause that caused a given training die to fail" and thus fail to teach or suggest all of the features of claim 1.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner notes that Applicant’s above argument contains several sub-arguments, each of which will be addressed in the following paragraphs.
Regarding Applicant’s sub-argument that asserts that Cheng does not teach generating training sets and training a supervised learning model with the training set, Examiner finds this sub-argument to be not persuasive. Examiner points out that this sub-argument has been addressed in an earlier preceding argument involving the limitation “… the supervised learning model trained with a training set comprising training probability distributions computed from training dies through the local phase of the volume diagnosis procedure …” and Examiner refers Applicant to the above response to that preceding argument as reference. Hence, based on the same response addressed to the preceding argument, Applicant’s arguments are not persuasive, and the existing prior art rejection is maintained.
Regarding Applicant’s sub-argument that asserts that Benware does not teach a training probability distribution labeled with an actual root cause that caused a given training die to fail, Examiner finds the sub-argument to be not persuasive.  As established earlier in response to Applicant’s preceding arguments (and also shown in Applicant’s specification paragraph [0037] and Figure 2), the training probability distributions are represented by the diagnosis reports. Furthermore the term “actual” as defined in Merriam-Webster dictionary broadly indicates something that exists in fact or reality, and hence merely indicates that the root cause is a root cause that is present or exists (i.e., a valid root cause). Hence this limitation (“… each training probability distribution labeled with an actual root cause that caused a given training die to fail”) broadly recites that the diagnosis reports are specified with a valid root cause. As indicated in the Non-Final Office Action mailed February 10, 2022, Benware teaches analyzing diagnosis reports, where the diagnosis reports include the specifying of a single root cause, where the specifying of this single root cause with each diagnosis report corresponds to a form of labeling. Benware also defines a set of possible root causes, where this indication of possible root causes indicates that these root causes are valid root causes that can be specified in each diagnosis report in a population of diagnosis reports (Benware p.14 Results from simulated defect experiments: “Experiments based on simulated defect responses in an IC have been carried out to evaluate the accuracy of RCD. In each experiment, the following steps are followed. 1) Specify a root cause distribution. …In every experiment performed, there is a consistent set of root cause models used as the complete set of possible root causes. 
The full set in these experiments include: short critical area model for each metal layer; open critical area model for each metal layer; open via macro count model for each via macro defined in the layout; cell-type count model for each library cell; one cell area model … In each experiment, a population of diagnosis reports is created and analyzed with RCD for a root cause distribution with only a single root cause specified. Only a subset of possible root causes was used as the injected root cause, however, each root cause model type (e.g., critical area shorts) is represented in the results.”). Examiner also notes that Applicant tries to link the term “actual” recited in the claims as being related to physical devices rather than the simulated experiments performed in Benware. While Benware indicates these experiments are based on simulated defect responses in an IC to perform the RCD analysis, these simulations only refer to the simulation of the tester datalog (representing the diagnosis reports). As indicated in the earlier citation to the Benware reference, the information provided in the tester datalog simulation is still based on defects that occur on physical layouts and hence represents real (and hence valid) defects and root causes that can be observed from a tester to produce these diagnosis reports (Benware p.9 col.2 last paragraph-p.10 col.1 1st paragraph (Layout-aware diagnosis): “Layout-aware diagnosis [8] is the process that analyzes scan failures observed on the test for a defective die to produce a list of suspects that potentially contain the real defect. … The diagnosis process uses a logic level model of the design along with physical layout information to perform this analysis. Three basic types of suspects are produced by layout-aware diagnosis: opens, bridges, and cell internal. These three correspond to interconnect open defects, interconnect short defects, and defects inside library cell boundaries.”). 
Additionally, Benware also teaches applying the same RCD analysis on the lots manufactured on a 28-nm yield ramp, thus indicating that this analysis is not restricted to simulations only and is applicable to defects and root cause data found on physical devices (Benware pp.15-16 Results from 28-nm yield ramp: “… RCD was performed on all cores for four lots manufactured on a 28-nm bulk process … the data were processed with RCD as independent populations. A separate population of failing device was created for each layout configuration and each manufactured lot … After all the analyses were completed total root cause estimates per lot were obtained from each layout configuration …”). Hence, given the above evidence, Applicant’s arguments are not persuasive, and the existing prior art rejection is maintained.
Regarding Applicant’s Remarks:
“Importantly, the only mention of any supervised learning in all of the Cheng-Benware combination is a snippet in Cheng which states:
Without good domain knowledge, an alternative is to use supervised machine learning techniques to derive these parameters based on good training data. With more aggressive deep learning techniques it is possible that a new model can be created to replace the Bayesian network. Some domain knowledge is still needed to get proper training data.
(Cheng, p. 6, col 2 (emphasis added)). However, this portion of Cheng (also cited by the Office Action) actually teaches away from the claimed invention. Indeed, Cheng expressly teaches use of supervised machining learning techniques as a replacement to the parameter determination through Bayesian networks via probability distributions and diagnostic reports. That would require what Cheng refers to as "good training data" in the form of actual circuits and actual root cases instead of diagnosis reports as used in RCD through Bayesian networks and MLE to compute probability distributions. Put another way, the above paragraph in Cheng teaches determining correlation between root causes and defects through supervised machine learning instead of and without probability distributions and diagnosis reports processed via Bayesian networks and MLE.”
Examiner has considered this argument and finds the argument to be not persuasive. Examiner points out that Applicant’s above argument does acknowledge Cheng teaches providing good training data to a supervised learning model, but does not acknowledge that this supervised learning model is relevant and integrated with the context of the Cheng reference. Examiner reminds Applicant that MPEP 2145(X)(D-1) indicates that “…"the prior art’s mere disclosure of more than one alternative does not constitute a teaching away from any of these alternatives because such disclosure does not criticize, discredit, or otherwise discourage the solution claimed…." In re Fulton, 391 F.3d 1195, 1201, 73 USPQ2d 1141, 1146 (Fed. Cir. 2004)”. As established earlier in the responses to Applicant’s preceding arguments, Cheng teaches generating and using training data to train a supervised learning model. Cheng also teaches the identification of known faults, defects, and root causes in a set of volume diagnosis reports representing a set of failed IC circuits. Examiner also points out that Applicant’s arguments appear to assert that the Cheng reference requires the “good training data” to be in the form of actual circuits and actual root causes instead of diagnosis reports. Examiner points out that the context of the Cheng reference is based on deriving and estimating probabilities based on data mining volume diagnosis reports and building models based on the information provided by the volume diagnosis reports (Cheng p.1 Abstract and pp.1-2 Section I. Introduction). In other words, these diagnosis reports contain valid information found from physical devices being tested and scanned during a semiconductor manufacturing process. 
Examiner points out that Applicant’s assertion that the volume diagnosis reports do not contain “actual circuits” and “actual root causes” information is not persuasive, since Cheng indicates that these diagnosis reports are based on scanning physical devices to identify faults, defects, and root causes, where these scan diagnosis reports have been successfully used to guide physical failure analysis (PFA) and improve PFA success rates with reduced turnaround time and cost (Cheng p.1 Section I. Introduction 1st paragraph: “Scan diagnosis also called logic diagnosis [1-6] is used to determine the defect locations and defect mechanism for a given failing device and the scan test patterns used. Scan diagnosis reports have been successfully used to guide physical failure analysis (PFA) to focus on a small area, and thus improve PFA success rate with reduced turnaround time and cost.”; and p.2 col.2 2nd-4th paragraphs: “… diagnosis reports depend on design behavior and test patterns used … Because of design complexity and limited and limited test patterns used, diagnosis reports often have multiple faults and multiple defects and multiple physical features …”). Hence, there is nothing in the Cheng reference that suggests that the information provided in the volume diagnosis reports represents randomly generated information that has no basis in valid defect and root cause information found on physical devices. Furthermore, based on the above citations from the Cheng reference, a person having ordinary skill in the art would understand that diagnosis reports are a reliable source of input data containing valid information found on physical devices, and hence they can form the basis of the “good training data” that is used to generate the training data to train a supervised machine learning model. 
This understanding is also consistent with Applicant’s own specification, since Applicant’s specification paragraph [0037] also indicates that “the training probability distributions may represent an output of a local phase of a volume diagnosis procedure performed for the training dies 210”, where in the context of this paragraph and Applicant’s Figure 2, the output from element 210 is element 220, which is a set of diagnosis reports, where the data from these diagnosis reports are used to generate training sets for training a supervised learning model. Hence, Applicant’s arguments are not persuasive, and the Cheng reference is within scope of the Applicant’s claimed invention, and hence the existing prior art rejection is maintained.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 8-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the system recited in independent Claim 8 (and inherited in the associated dependent Claims 9-14) is directed to software per se, which is not one of the four categories of patent eligible subject matter recited in 35 U.S.C. 101 (process, machine, manufacture, or composition of matter). The terms “model training engine” and “volume diagnosis adjustment engine” recited in Claim 8 do not invoke 112(f) claim interpretation since the term “engine” is defined by the Merriam-Webster dictionary (merriam-webster.com/dictionary/engine, retrieved on 2/7/2022) as “computer software that performs a fundamental function especially of a larger program”, and hence the term “engine” is not considered as a nonce term/generic placeholder (as it fails to meet the 112(f) three-prong test at Step A). 
Examiner also points out that Applicant’s specification does not provide an explicit definition for the terms “model training engine” and “volume diagnosis adjustment engine” as being limited to a combination containing hardware components. Applicant’s specification paragraph [0018] merely states an example where a computing system “may implement a model training engine 108 and volume diagnosis adjustment engine 110 (and components thereof) in various ways, for example as hardware and programming implemented via local resources of the computing system”. A person having ordinary skill in the art would understand that the above statement does not limit the model training engine or volume diagnosis adjustment engine to a specific implementation or embodiment that contains hardware components and programming, and instead indicates that there are various ways of implementing those specified engines. One of these various ways may include an implementation representing only software programming. Furthermore, Applicant’s specification paragraph [0068] suggests that “The systems, methods, devices, and logic described above, including the model training engine 108 and the volume diagnosis adjustment engine 110, may be implemented in many different ways in many different combinations of hardware, logic, circuitry, and executable instructions stored on a machine-readable medium. For example, the model training engine 108, volume diagnosis adjustment engine 110, or both, may include circuitry in a controller, a microprocessor, or an application specific integrated circuit (ASIC), or may be implemented with discrete logic or components, or a combination of other types of analog or digital circuitry …”. 
Examiner points out that the recitation of the term “components” for implementing the model training engine or volume diagnosis adjustment engine also does not restrict the component as being a hardware component, since the “components” of an engine may be implemented in many different ways in many different combinations, where one of such options includes executable instructions stored on a machine-readable medium (i.e., a software program). Given that the Applicant’s specification does not expressly describe the terms “model training engine” and “volume diagnosis adjustment engine” as being limited to embodiments that contain hardware components and instead allows other embodiments such as software programs, these terms “model training engine” and “volume diagnosis adjustment engine” applied in independent Claim 8 without the explicit recitation of hardware components correspond to an embodiment representing software programs, and thus Claim 8 (and its dependent claims) is interpreted as being directed to a software per se implementation. Applicant is advised to positively recite hardware as part of this system identified in independent Claim 8 (i.e., a computer processor and memory/non-transitory machine-readable medium) in order to resolve the 101 rejection to allow eligibility of independent Claim 8 and its associated dependent Claims 9-14.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-5, 8-12, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng et al., Volume Diagnosis Data Mining, 2017 22nd IEEE European Test Symposium (ETS), 10 pages [hereafter referred to as Cheng], in view of Benware et al., Determining a Failure Root Cause Distribution From a Population of Layout-Aware Scan Diagnosis Results, IEEE Design & Test of Computers, 2012, pp.8-18 [hereafter referred to as Benware].
Regarding original Claim 1, 
Cheng teaches
(Original) A method comprising: by a computing system (Examiner’s note: Cheng teaches accessing diagnosis reports containing multiple fault, defect, physical feature information, and creating and training a volume diagnosis Bayesian network model to determine probability distributions of faults and associated probable root causes, and using cross validation techniques to produce training and test data based on the domain knowledge information provided in the volume diagnosis reports, all of which require use of a computing system containing a computer processor executing instructions, where the instructions are stored on a computer-readable medium (Cheng p.2 Section II. Volume Diagnosis Model; pp.6-8 Section V. Volume Diagnosis Practical Usages, Figures 5(a),(b), and Figures 6(a)-(g)).): 
accessing a diagnosis report for a given circuit die that has failed scan testing (Examiner’s note: Cheng teaches a diagnosis driven yield analysis (DDYA) method where diagnosis reports are inspected for defect, physical feature, and fault information, where the physical feature represents probable root causes, and where the diagnosis reports are produced as a result of scan diagnosis used for determining defect locations on semiconductor devices (Cheng p.1 Abstract; p.1 col.1 Section I. Introduction 1st-3rd paragraphs: “Scan diagnosis … is used to determine the defect locations and defect mechanism for a given failing device and the scan test patterns used. … With physical defect features reported by diagnosis tools, several papers have proposed to use volume (large amount of) diagnosis reports with appropriate statistical analysis to automatically identify a common physical defect feature …”; p.1 col.2 Section I. Introduction 2nd-6th paragraphs; and p.2 col.1 Section I. Introduction 4th-5th paragraphs: “… With volume diagnosis reports, DDYA uses statistical method to identify the correct distribution of these diagnosis reports … DDYA in this paper is based on MLE. So DDYA problem has two parts: how to get correct volume diagnosis likelihood model and how to ensure the distribution identified by MLE is correct with limited diagnosis reports. … In this paper, a Bayesian network [26] is used to model volume diagnosis reports …”).); 
computing, through a local phase of a volume diagnosis procedure, a probability distribution for the given circuit die from the diagnosis report, wherein the probability distribution specifies probabilities for different root causes as having caused the given circuit die to fail (Examiner’s note: In light of applicant’s specification paragraph [0010], a “local phase of a volume diagnosis procedure” is defined as a phase involving individual failed ICs (i.e., semiconductor devices). Cheng teaches creating a Bayesian network model by first determining a probability P(r) (i.e., the probability of one diagnosis report) based on a distribution of probabilities including identifying all mutually exclusive and independent root causes P(c) that are associated with specific defects and various identified faults for an individual failed IC associated with the single diagnosis report (Cheng p.2 col.1 last paragraph-col.2 4th paragraph (Section II. Volume Diagnosis Model): “… Based on Bayesian network we assume that P(v) = ∏ P(r). That is the probability of all sampled volume diagnosis reports is equal to the product of the probability of each diagnosis report if all diagnosis reports are independent. P(r) = ∑ (P(r|f) * P(f)) if all faults are mutually exclusive and independent. … Combining all these equations we get P(v) = ∏ ∑ (P(r|f) * (∑ (P(f|d) * ∑ (P(d|c) * P(c))))).”).); 
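For illustration only, the product-of-sums expression quoted from Cheng above can be evaluated with a minimal sketch. The data layout, the function names `report_probability` and `volume_probability`, the root cause labels, and all numeric values are hypothetical assumptions chosen for readability; they are not taken from Cheng, Benware, or Applicant's specification. Each report is modeled as a list of faults, each fault as a list of defects, and each defect as a mapping from root cause to P(d|c).

```python
import math

def report_probability(report, p_c):
    """P(r) = sum over f of P(r|f) * sum over d of P(f|d) * sum over c of P(d|c) * P(c)."""
    total = 0.0
    for p_r_f, defects in report:
        fault_term = 0.0
        for p_f_d, causes in defects:
            # Innermost sum: P(d) = sum over root causes c of P(d|c) * P(c).
            defect_term = sum(p_d_c * p_c[c] for c, p_d_c in causes.items())
            fault_term += p_f_d * defect_term
        total += p_r_f * fault_term
    return total

def volume_probability(reports, p_c):
    """P(v) = product of P(r) over all reports, assuming independent reports."""
    return math.prod(report_probability(r, p_c) for r in reports)

# Hypothetical root cause distribution P(c) (assumed values).
p_c = {"open_M1": 0.7, "short_M2": 0.3}

# Hypothetical reports: [(P(r|f), [(P(f|d), {root cause: P(d|c)})])].
report_a = [(0.9, [(0.8, {"open_M1": 0.5, "short_M2": 0.1})])]
report_b = [
    (0.6, [(0.5, {"short_M2": 0.4})]),
    (0.4, [(1.0, {"open_M1": 0.2})]),
]

p_report_a = report_probability(report_a, p_c)
p_report_b = report_probability(report_b, p_c)
p_volume = volume_probability([report_a, report_b], p_c)
```

Under these assumed values, the per-report probability corresponds to the local phase (one failed die, one report), while the product over reports corresponds to the volume-level quantity that MLE would maximize over P(c).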
adjusting the probability distribution into an adjusted probability distribution using a supervised learning model (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using a trained supervised learning model to produce an output representing an adjusted probability distribution. Cheng teaches determining conditional probabilities represented in a Bayesian network model that models a set of diagnosis reports. These conditional probability parameters P(r|f), P(f|d), P(d|c) are defined as the conditional probability of a report when a specific fault in this report is true; the conditional probability of a fault if a specific defect is true; and the conditional probability of a defect if one of its root causes is true, respectively, where these faults, defects, and root causes are present in each diagnosis report and represent known or labeled data in each diagnosis report. In other words, the conditional probability parameters being learned from the Bayesian network (e.g., P(r|f), P(f|d), P(d|c)) represent the known and labeled data and their relationships found in the diagnosis reports (Cheng p.1 Section I. Introduction: “… in one diagnosis report, there can be several logic faults each of which can cause the failures observed at testers. Also reported are several physical defects each of which can cause one fault. Also reported are several physical features each of which is responsible to cause one defect. In other words, a diagnosis report can be caused by several possible physical features with various probability of each. …”). As indicated earlier, Cheng teaches P(r) as a product of the above conditional probabilities to represent the probability of one diagnosis report. 
A person having ordinary skill in the art would understand that in statistics, a product of conditional probabilities represents a sequence of probabilities, where each sequence of probabilities represents a probability distribution (Cheng p.2 col.1 last paragraph-col.2 4th paragraph). Additionally, Cheng teaches that each of the above conditional probabilities represent adjusted values, where the adjustments broadly recite changes in the relationships between the known data (e.g., faults, defects, and root causes) based on score systems used in the diagnosis tools, where this relationship and score information are also present and known in the diagnosis reports (Cheng p.6 Section V. Volume Diagnosis Practical Uses 1st-3rd paragraphs: “… To get correct P(r|f) requires the understanding of the score system used in diagnosis tools … P(f|d) … this parameter requires the understanding of the relationship among the logic faults and physical defects used in diagnosis tools … Similar to P(r|f), adjustment is needed to get correct P(f|d) … P(d|c) … This information can be derived from layout and defect behavior of each root cause. The correlation among these root causes and the defects should be used to get accurate P(d|c).”). Cheng additionally teaches using supervised learning techniques to build a deep learning model to learn the conditional probability parameters P(r|f), P(f|d), P(d|c), where the training of this deep learning model using supervised learning techniques results in a trained supervised learning model that learned these probability parameters modelled in a Bayesian network (Cheng p.6 col.2 2nd paragraph: “… an alternative is to use supervised machine learning techniques to derive these parameters based on good training data. With more aggressive deep learning techniques it is possible that a new model can be created to replace the Bayesian network. 
Some domain knowledge is still needed to get proper training data.”; and p.7 col.1 1st paragraph-4th paragraphs: “… In volume diagnosis, over-fitting problem in general adds extra root causes in the MLE distribution. Over-fitting can be alleviated by increasing sample data size … There are several popular machine learning techniques to deal with the over-fitting problem … cross-validation was used in DDYA … to divide the total sampled data set into N parts randomly. N-1 parts are used as training data and the remaining 1 part is used as test data. MLE finds the most likely distribution of training data and applied this distribution on testing data to measure its fitness.”). A person having ordinary skill in the art would understand that a supervised learning method represents a method for learning the changes in data that is present in a given training set to perform a prediction, and hence the presence of these known faults, defects, root causes, and their relationships modelled as a probability distribution present in the diagnosis reports is learned through the sampled training data taught in Cheng, resulting in a learned model that performs a prediction based on received input data, where this prediction represents an adjusted probability distribution output based on received input data.) … 
… the supervised learning model trained with a training set comprising training probability distributions computed from training dies through the local phase of the volume diagnosis procedure (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites generating training sets containing probability distributions, and using these training sets to train a supervised learning model. As indicated earlier, Cheng teaches using supervised learning to build a deep learning model to learn the conditional probability parameters P(r|f), P(f|d), P(d|c) that describe a Bayesian network of volume diagnosis, where this deep learning model based on supervised learning is a supervised learning model (Cheng p.6 col.2 2nd paragraph). As indicated earlier, Cheng teaches that the computed probability of a sampled diagnosis report containing a specific root cause is the product of the above-mentioned conditional probabilities, where each of these conditional probabilities is based on known data (faults, defects, physical features/root causes) that is present in each diagnosis report, such that these conditional probability parameters (e.g., P(r|f), P(f|d), P(d|c)) represent the known and labeled data and their relationships found in the diagnosis reports. To clarify the generation of training data in the context of generating cross-validation training and test data, Cheng uses a card-game analogy to explain the creation of a Bayesian network model for computing volume diagnosis probability distributions by performing limited sampling through drawing cards from a deck, and applies this card-game analogy to the volume diagnosis scenario to generate a sampled set of training data from the set of diagnosis reports (i.e., the card number and deck identifier) (Cheng p.2 col.2 3rd paragraph; Cheng p.6 col.2 3rd-4th paragraphs; and p.7 col.1 1st paragraph-4th paragraphs). 
A person having ordinary skill in the art would understand from Cheng’s detailed teaching of the card-game analogy that, for the volume diagnosis scenario, one must perform sufficient sampling of diagnosis reports and apply techniques that mitigate over-fitting to overcome the data bias present in a limited sample. Hence, this sampling of diagnosis reports containing labeled information represents the generated sampled training data based on the generated diagnosis reports containing known or labeled data. Taking into account Cheng’s earlier statement about using supervised machine learning techniques to derive these parameters based on good training data, a person having ordinary skill in the art would understand that Cheng provides sufficient teachings that explain how to produce training samples through the sampling of diagnosis reports, where these training samples containing labeled information (e.g., faults, defects, root causes) correspond to the good training data that is used to perform the supervised machine learning techniques to train a supervised learning model.) …
… providing the adjusted probability distribution for the given circuit die as an input … to determine a … distribution for multiple circuit dies that have failed the scan testing (Examiner’s note: As indicated earlier, Cheng further teaches applying cross-validation techniques to a total sampled data set representing a set of limited diagnosis reports, separating the sampled data set into training data and test data sets, where the test data set was used to evaluate the Bayesian network model and measure the fitness of the model by using the most likely distribution for the probability parameters that were identified and computed during the training phase (Cheng p.6 col.2 4th paragraph-p.7 col.2 2nd paragraph: “… As in volume diagnosis, P(r|f), P(f|d) and P(d|c) can be correct based on unlimited diagnosis reports … Over-fitting can be alleviated by increasing sample data size … There are several popular machine learning techniques to deal with the over-fitting problem … cross validation was used in DDYA … to divide the total sampled data set into N parts randomly. N-1 parts are used as training data and the remaining 1 part is used as test data. MLE finds the most likely distribution of training data and applied this distribution on testing data to measure its fitness. … The fitting distribution which is most similar to underlying distribution fits best in test data as shown in Figure 6(c) and 6(f).”).).  
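For illustration only, and not as part of the cited teaching, the N-part cross-validation procedure quoted above from Cheng can be sketched as follows. The sketch assumes that each sampled diagnosis report has been reduced to a single categorical root cause identifier. The function names, the `smoothing` constant (added so that the held-out log-likelihood stays finite, which departs slightly from a pure MLE), and the use of average log-likelihood as the fitness measure are hypothetical choices for this sketch and are not taken from Cheng.

```python
import math
import random
from collections import Counter


def mle_distribution(samples, categories, smoothing=1e-6):
    """Estimate a categorical distribution from observed samples.

    The smoothing term is an assumption of this sketch; it keeps every
    category at a non-zero probability so test log-likelihoods are finite.
    """
    counts = Counter(samples)
    total = len(samples) + smoothing * len(categories)
    return {c: (counts[c] + smoothing) / total for c in categories}


def average_log_likelihood(dist, samples):
    """Fitness of a distribution on held-out samples (higher is better)."""
    return sum(math.log(dist[s]) for s in samples) / len(samples)


def cross_validate(samples, categories, n_parts, seed=0):
    """Randomly divide the samples into n_parts parts; for each part, fit
    on the other n_parts - 1 parts and score the fit on the held-out part."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    scores = []
    for held_out in range(n_parts):
        test = parts[held_out]
        train = [s for i, part in enumerate(parts) if i != held_out for s in part]
        dist = mle_distribution(train, categories)
        scores.append(average_log_likelihood(dist, test))
    return scores


if __name__ == "__main__":
    # Hypothetical sampled root cause labels, one per diagnosis report.
    reports = ["c1"] * 6 + ["c2"] * 3 + ["c3"] * 1
    print(cross_validate(reports, ["c1", "c2", "c3"], n_parts=5))
```

A fitted distribution that scores poorly on the held-out part relative to its training data would be one indication of the over-fitting discussed in the quoted passage.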
While Cheng teaches building a supervised learning model with proper training data to further improve the probability parameters represented in the volume diagnosis Bayesian network, Cheng does not explicitly teach
… each training probability distribution labeled with an actual root cause that caused a given training die to fail …
… providing the adjusted probability distribution … as an input to a global phase of the volume diagnosis procedure to determine a global root cause distribution …
Benware teaches 
… each training probability distribution labeled with an actual root cause that caused a given training die to fail (Examiner’s note: Benware teaches performing experiments based on simulated defect responses using a root cause deconvolution (RCD) method involving the creation of a Bayesian network model based on injected root cause defect information provided in scan diagnosis reports (Benware p.10 Figure 1), where the experiments involve creating a population of diagnosis reports and analyzing them using the RCD method for a root cause distribution with only a single root cause specified, where this specifying of a single root cause for a root cause distribution associated with a diagnosis report (containing defect information associated with possible root causes based on physical layouts) is a form of labeling (Benware p.14 Results from simulated defect experiments: “Experiments based on simulated defect responses in an IC have been carried out to evaluate the accuracy of RCD. In each experiment, the following steps are followed. 1) Specify a root cause distribution. …In every experiment performed, there is a consistent set of root cause models used as the complete set of possible root causes. The full set in these experiments include: short critical area model for each metal layer; open critical area model for each metal layer; open via macro count model for each via macro defined in the layout; cell-type count model for each library cell; one cell area model … In each experiment, a population of diagnosis reports is created and analyzed with RCD for a root cause distribution with only a single root cause specified. 
Only a subset of possible root causes was used as the injected root cause, however, each root cause model type (e.g., critical area shorts) is represented in the results.”; p.9 col.2 last paragraph-p.10 col.1 1st paragraph (Layout-aware diagnosis): “Layout-aware diagnosis [8] is the process that analyzes scan failures observed on the test for a defective die to produce a list of suspects that potentially contain the real defect. … The diagnosis process uses a logic level model of the design along with physical layout information to perform this analysis. Three basic types of suspects are produced by layout-aware diagnosis: opens, bridges, and cell internal. These three correspond to interconnect open defects, interconnect short defects, and defects inside library cell boundaries.”; and pp.15-16 Results from 28-nm yield ramp).) …
… providing the adjusted probability distribution … as an input to a global phase of the volume diagnosis procedure to determine a global root cause distribution (Examiner’s note: In light of applicant’s specification paragraph [0010], a “global phase of a volume diagnosis procedure” is defined as a phase involving a population of failed ICs (i.e., semiconductor devices). As indicated earlier, Benware teaches performing experiments based on simulated defect responses using a root cause deconvolution (RCD) method involving the creation of a Bayesian network model (Benware p.10 Figure 1; p.11 Figure 2; pp.11-p.12 Creating the diagnosis Bayes net; and p.14 col.1-p.15 col.1 (Results from simulated defect experiments).), where this RCD analysis involves performing an expectation-maximization (EM) algorithm to learn and adjust the probability parameters identified in the Bayesian network equations $c_{i,l}^{(t)}$, $\theta_{l}^{(t+1)}$, and $P(RC_{i} = rc_{l} \mid \theta^{(t+1)})$ (Benware pp.12-14 Learning the parameter values, in particular p.12 col.2 3rd paragraph: “The primary objective of the learning phase is to determine the unknown parameter values of the model, in this case 𝛉, which leads to an understanding of the root cause distribution. … if one could accurately compute the likelihood of each root cause for each symptom, one could again determine the parameter 𝛉 by summing the likelihood values for each root cause and thus determining P(RC). … using the well-known expectation-maximization (EM) algorithm.”). Benware further teaches applying the RCD analysis (and hence the Bayesian network model and its adjusted probability distribution) on all cores for four lots manufactured on a 28-nm bulk process, where separate populations of failing devices were created for each layout configuration and each manufactured lot to produce a plurality of populations for RCD analysis (Benware pp.15-16 col.2 1st paragraph-p.16 1st paragraph (Results from 28-nm yield ramp): “… this section presents the results from applying the methodology to the early stages of a 28-nm yield ramp. … RCD was performed on all cores for four lots manufactured on a 28-nm bulk process … the data were processed with RCD as independent populations. A separate population of failing devices was created for each layout configuration and each manufactured lot, making for a total of 24 populations for RCD analysis. After all the analyses were completed, total root cause estimates per lot were obtained by summing the results obtained from each layout configuration.”).) …
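For illustration only, the iterative learning described in the quoted Benware passage (computing the likelihood of each root cause for each symptom, then summing those likelihood values per root cause to determine P(RC)) can be sketched with the standard EM update for mixture proportions. This sketch is an assumption about the general form of such an update and does not reproduce Benware’s exact equations. The function name, the `likelihoods` table (one row per failing die, one column per candidate root cause, assumed to be supplied by the diagnosis model), and the iteration count are hypothetical.

```python
def em_root_cause_distribution(likelihoods, iterations=200):
    """Estimate the root cause distribution theta over a population of dies.

    likelihoods[i][l] is an assumed, precomputed likelihood of the symptoms
    observed on die i given root cause l. Each row must contain at least
    one non-zero entry.
    """
    n_dies = len(likelihoods)
    n_causes = len(likelihoods[0])
    # Start from a uniform root cause distribution.
    theta = [1.0 / n_causes] * n_causes
    for _ in range(iterations):
        # E-step: per-die posterior over root causes under the current theta.
        posteriors = []
        for row in likelihoods:
            weighted = [theta[l] * row[l] for l in range(n_causes)]
            norm = sum(weighted)
            posteriors.append([w / norm for w in weighted])
        # M-step: sum the per-die posteriors for each root cause and normalize.
        theta = [
            sum(posteriors[i][l] for i in range(n_dies)) / n_dies
            for l in range(n_causes)
        ]
    return theta


if __name__ == "__main__":
    # Hypothetical population: three dies favor cause 0, one favors cause 1.
    table = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
    print(em_root_cause_distribution(table))
```

The returned list plays the role of a population-level (global) root cause distribution in this sketch, while each per-die posterior row corresponds to a die-level distribution.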
Both Cheng and Benware are analogous art since they both teach using scan diagnosis reports to create a Bayesian network model to estimate and adjust root cause distributions.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the scan diagnosis reports taught in Cheng and label them with a specified root cause taught in Benware as a way to improve the estimations of root cause distributions produced by the Bayesian network model. The motivation to combine is taught in Benware: limiting the number of identified root causes to the most relevant ones limits the complexity of the model/algorithm performing the root cause analysis and reduces the occurrence of the training data inadvertently overfitting the model, effectively improving the memory storage requirements of the model as well as the accuracy of the predictions generated by the model/algorithm (Benware p.9 col.1 1st-3rd paragraph; p.13 col.2 3rd paragraph).
Regarding original Claim 2, 
Cheng in view of Benware teaches
(Original) The method of claim 1, wherein the volume diagnosis procedure utilizes an unsupervised learning model to compute the probability distribution for the given circuit die (Examiner’s note: Under its broadest reasonable interpretation, an unsupervised learning model is a model that learns and groups patterns from received data. As indicated earlier, Cheng teaches creating a Bayesian network model by first determining a probability P(r) (i.e., the probability of one diagnosis report) based on a distribution of probabilities including identifying all mutually exclusive and independent root causes P(c) that are associated with specific defects and various identified faults for an individual failed IC associated with the single diagnosis report, where these distributions of probabilities represent groupings of similar features/characteristics associated with the specific defects and various identified faults, leading to the identification of patterns associated with the physical faults found in the diagnosis reports (Cheng p.1 1st paragraph-p.2 1st paragraph and p.2 Figure 1 (Section I. Introduction); and Cheng p.2 col.1 last paragraph-col.2 4th paragraph (Section II. Volume Diagnosis Model).), 
determine the global root cause distribution for multiple circuit dies, or both.  
Regarding original Claim 3, 
Cheng in view of Benware teaches
(Original) The method of claim 1, wherein the global phase of the volume diagnosis procedure is performed using a root cause deconvolution (RCD) model (Examiner’s note: As indicated earlier, Benware teaches simulating defect responses for a population of created diagnosis reports by injecting a root cause (such as a critical area short), and analyzing these reports using a root cause deconvolution method, including applying an EM algorithm to learn and adjust parameter values in a Bayesian network (Benware p.10 Figure 1; p.11 Figure 2; pp.11-p.12 Creating the diagnosis Bayes net; pp.12-13 Learning the parameter values; and p.14 col.1-p.15 col.1 (Results from simulated defect experiments).). As indicated earlier, Benware further teaches applying the RCD analysis (and hence the Bayesian network model and its adjusted probability distribution) on all cores for four lots manufactured on a 28-nm bulk process, where separate populations of failing devices were created for each layout configuration and each manufactured lot to produce a plurality of populations for RCD analysis (Benware pp.15-16 col.2 1st paragraph-p.16 1st paragraph (Results from 28-nm yield ramp)).).  
Regarding original Claim 4, 
Cheng in view of Benware teaches
(Original) The method of claim 1, further comprising generating the supervised learning model, including by: 
accessing the training dies, wherein each training die has been injected with a given root cause to actually cause a scan test failure (Examiner’s note: As indicated earlier, Benware teaches simulating defect responses for a population of created diagnosis reports by injecting a root cause (such as a critical area short), and analyzing these reports using a root cause deconvolution (RCD) method, where each created diagnosis report contains failure information detected by scan testing on a tester for a defective die (Benware p.9 col.2 5th paragraph-p.10 col.1 1st paragraph (Layout-aware diagnosis) and p.14 col.1-p.15 col.1 (Results from simulated defect experiments).); 
generating diagnosis reports for each of the training dies (Examiner’s note: As indicated earlier, Benware teaches simulating defect responses for a population of created diagnosis reports, where each created diagnosis report contains failure information detected by scan testing on a tester for a defective die (Benware p.9 col.2 5th paragraph-p.10 col.1 1st paragraph (Layout-aware diagnosis) and p.14 col.1-p.15 col.1 (Results from simulated defect experiments).); 
computing, through the local phase of a volume diagnosis procedure, the training probability distributions from the diagnosis reports, each of the training probability distributions respectively corresponding to one of the training dies (Examiner’s note: As indicated earlier, Cheng teaches creating a Bayesian network model by first determining a probability P(r) (i.e., the probability of one diagnosis report) based on a distribution of probabilities including identifying all mutually exclusive and independent root causes P(c) that are associated with specific defects and various identified faults for an individual failed IC associated with the single diagnosis report (Cheng p.2 col.1 last paragraph-col.2 4th paragraph (Section II. Volume Diagnosis Model)). This computing and creation of a Bayesian network model taught in Cheng is functionally analogous to the computing and creation of a Bayesian network model taught in Benware, where Benware also teaches modelling the Bayesian network through determination of probabilities for the root causes and associated suspect defects as part of the root cause deconvolution method involving an EM algorithm performing a maximum-likelihood estimation, where the defect information for determining these probabilities is also retrieved from the created diagnosis reports (Benware p.10 Figure 1; p.11 Figure 2; and p.11 col.1-p.12 col.2 (Creating the diagnosis Bayes net)).); 
labeling each of the training probability distributions with the given root cause for the training die corresponding to the training probability distribution, the given root cause indicative of the actual root cause for the training probability distribution (Examiner’s note: As indicated earlier, Benware teaches performing experiments based on simulated defect responses using the RCD method (Benware p.10 Figure 1), where the experiments involve creating a population of diagnosis reports and analyzing them using the RCD method for a root cause distribution with only a single root cause specified, where this specifying of a single root cause for a root cause distribution associated with a diagnosis report (containing defect information associated with possible root causes) is a form of labeling (Benware p.14 col.1-p.15 col.1 (Results from simulated defect experiments)).); and 
providing, as the training set, the labeled training probability distributions to train the supervised learning model (Examiner’s note: As indicated earlier, Benware teaches simulating defect responses for a population of created diagnosis reports by injecting a root cause, and analyzing these reports using the RCD method, where using the created Bayesian network model to analyze these reports to learn the probability parameters in the Bayesian network model is a part of the RCD analysis (Benware p.9 col.2 5th paragraph-p.10 col.1 1st paragraph (Layout-aware diagnosis) and Benware p.14 col.1-p.15 col.1 (Results from simulated defect experiments)).).  
Regarding original Claim 5, 
Cheng in view of Benware teaches
(Original) The method of claim 4, wherein the training dies are generated via 
simulation (Examiner’s note: Cheng teaches using simulated defects to get failure files for the experiments (Cheng p.2 col.1 2nd paragraph: “In [16-24] to verify the effectiveness of these techniques, handful of silicon defects and lots of simulated defects are used to get failure files for the experiments.”), where one of the cited references, [19], is the Benware reference, and where Benware teaches that the defect responses are simulated for a population of diagnosis reports, where each diagnosis report is based on scan diagnosis results reporting failure information for an individual IC (Benware p.8 col.2 2nd paragraph and p.14 col.1-p.15 col.1 (Results from simulated defect experiments): “Experiments based on simulated defect responses in an IC have been carried out to evaluate the accuracy of RCD. … In each experiment, a population of diagnosis reports is created and analyzed with RCD for a root cause distribution with only a single root cause specified. Only a subset of possible root causes was used as the injected root cause, however, each root cause model type (e.g., critical area shorts) is represented in the results.”).), emulation, or a combination of both.  
Regarding original Claim 8, 

Claim 8 recites a system comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 1, and hence is rejected under similar rationale and motivations provided by Cheng and Benware as indicated in Claim 1. In addition, Cheng teaches accessing diagnosis reports containing multiple fault, defect, and physical feature information, and creating and training a volume diagnosis Bayesian network model to determine probability distributions of faults and associated probable root causes, and using cross-validation techniques to produce training and test data based on the domain knowledge information provided in the volume diagnosis reports, all of which require use of a computing system containing a computer processor executing instructions, where the instructions are stored on a computer-readable medium. A person having ordinary skill in the art would understand that a computing system executing these instructions, which relate to training the supervised learning model and applying it to cross-validation data to further train and evaluate the supervised machine learning model, corresponds to the computing system functioning as a model training engine and a volume diagnosis adjustment engine (Cheng p.2 Section II. Volume Diagnosis Model; pp.6-8 Section V. Volume Diagnosis Practical Usages, Figures 5(a),(b), and Figures 6(a)-(g)). 
Regarding original Claim 9, 
Claim 9 recites the system of claim 8, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 2, and hence is rejected under similar rationale provided by Cheng in view of Benware as indicated in Claim 2, in view of the rejections applied to Claim 8.  
Regarding original Claim 10, 
Claim 10 recites the system of claim 8, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 3, and hence is rejected under similar rationale provided by Cheng in view of Benware as indicated in Claim 3, in view of the rejections applied to Claim 8.  
Regarding amended Claim 11, 
Claim 11 recites the system of claim 8, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 4, and hence is rejected under similar rationale provided by Cheng in view of Benware as indicated in Claim 4, in view of the rejections applied to Claim 8.  
Regarding original Claim 12, 
Claim 12 recites the system of claim 11, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 5, and hence is rejected under similar rationale provided by Cheng in view of Benware as indicated in Claim 5, in view of the rejections applied to Claim 11.  
Regarding original Claim 15, 
Claim 15 recites a non-transitory machine-readable medium comprising processor-executable instructions on a computing system, where those instructions comprise claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 1, and hence is rejected under similar rationale and motivations provided by Cheng and Benware as indicated in Claim 1. In addition, as indicated earlier, Cheng teaches accessing diagnosis reports containing multiple fault, defect, and physical feature information, and creating and training a volume diagnosis Bayesian network model to determine probability distributions of faults and associated probable root causes, and using cross-validation techniques to produce training and test data based on the domain knowledge information provided in the volume diagnosis reports, all of which require use of a computing system containing a computer processor executing instructions, where the instructions are stored on a computer-readable medium (Cheng p.2 Section II. Volume Diagnosis Model; pp.6-8 Section V. Volume Diagnosis Practical Usages, Figures 5(a),(b), and Figures 6(a)-(g)).
Regarding original Claim 16, 
Claim 16 recites the non-transitory machine-readable medium of claim 15, where the non-transitory machine-readable medium further comprises instructions that include claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 2, and hence is rejected under similar rationale provided by Cheng in view of Benware as indicated in Claim 2, in view of the rejections applied to Claim 15.  
Regarding original Claim 17, 
Claim 17 recites the non-transitory machine-readable medium of claim 15, where the non-transitory machine-readable medium further comprises instructions that include claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 4, and hence is rejected under similar rationale provided by Cheng in view of Benware as indicated in Claim 4, in view of the rejections applied to Claim 15.  
Regarding original Claim 18, 
Claim 18 recites the non-transitory machine-readable medium of claim 17, where the non-transitory machine-readable medium further comprises instructions that include claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 5, and hence is rejected under similar rationale provided by Cheng in view of Benware as indicated in Claim 5, in view of the rejections applied to Claim 17.  
Claims 6-7, 13-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over 
Cheng et al., Volume Diagnosis Data Mining, 2017 22nd IEEE European Test Symposium (ETS), 10 pages [hereafter referred to as Cheng], in view of Benware et al., Determining a Failure Root Cause Distribution From a Population of Layout-Aware Scan Diagnosis Results, IEEE Design & Test of Computers, 2012, pp.8-18 [hereafter referred to as Benware] as applied to Claims 1, 8, and 15; in even further view of Rajski et al., U.S. PGPUB 2006/0066339, Determining and Analyzing Integrated Circuit Yield and Quality, published 3/30/2006 [hereafter referred to as Rajski].
Regarding original Claim 6, 
Cheng in view of Benware as applied to Claim 1 teaches
(Original) The method of claim 1.
While Cheng in view of Benware teaches a supervised learning model, Cheng in view of Benware does not explicitly teach
wherein the supervised learning model comprises a linear function that linearly adjusts the probability distribution computed for the given circuit die.  
Rajski teaches
wherein the supervised learning model comprises a linear function that linearly adjusts the probability distribution computed for the given circuit die (Examiner’s note: Rajski Figure 32 teaches a defect computation step for computing feature fail probabilities involving a linear-regression-based method that partitions the design into smaller blocks and relates the fail rate of each block to the defect features contained within the block. Rajski further teaches that once the design has been partitioned into blocks, the fail probabilities for each block are estimated using an iterative procedure based on a linear regression model defined by the linear equation shown in Rajski Equation 12, where the estimates of $p_{fail}(f_{i})$ can be generated using well-known regression techniques (Rajski [0277], [0284]-[0291]). Rajski further teaches a data calibration step that iteratively reduces the estimation error caused by equivalent classes (represented by the failure probability parameters shown in Rajski Equation 12) (Rajski [0303]-[0309], in particular [0303]: “The predicted distribution of yield loss mechanisms are desirably calibrated such that, in the statistical sense, the estimation error caused by equivalent classes can be reduced. As shown in Fig. 32, data calibration (22.2) can be performed in an iterative fashion with diagnostic results computation (22.1).”).).  
Both Cheng in view of Benware and Rajski are analogous art since they both teach analysis of fail test result data using probabilistic analysis methods to identify failure probabilities associated with a subset of physical feature defects.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the supervised learning model taught in Cheng in view of Benware and use the linear regression analysis technique taught in Rajski as a way to improve the analysis and computation of the feature fail probabilities. The motivation to combine is taught in Rajski, by showing that the estimation of the feature fail probabilities can be approximated through a linear equation, where the linear equation is easily solvable through performing calculations on a generic computer, hence improving the computational efficiency of the system (Rajski [0290]-[0291]).
Regarding original Claim 7, 
Cheng in view of Benware, in even further view of Rajski teaches
(Original) The method of claim 6, 
wherein the linear function comprises an adjustment matrix that linearly adjusts an input probability distribution (Examiner’s note: As indicated earlier, Rajski teaches a data calibration step for iteratively reducing the estimation error computed in the data defect computation step, where the data calibration step can be summarized in the form of a linear equation shown in Rajski Equation 16, which is an alternate representation of Rajski Equation 15 that defines the probability distribution $P(O_i)$ that a defect is predicted by diagnosis as class i, where the matrix 𝚪 shown in Rajski Equation 16 represents the conditional probability $P(O_i \mid D_j)$ that is adjusted according to the data calibration step (Rajski [0305]).); and
wherein the adjustment matrix has dimensions of 'N' x 'N', wherein 'N' is a number of different root causes in probability distributions computed by the local phase of a volume diagnosis procedure (Examiner’s note: As indicated earlier, Rajski teaches a data calibration step using Rajski Equation 16 (which contains a matrix 𝚪). Rajski further teaches performing data calibration based on assumptions of no ambiguity between different classes and ambiguity between different classes, where for the case of ambiguity between different classes, Rajski Equation 16 will be calibrated based on the identified equation shown in Rajski Equation 17, where matrix $\Gamma^{-1}$ represents the inverse matrix of 𝚪, indicating that the matrix is a square matrix with equal dimensions i=j, where both i and j represent the number of defect classes, where these defect classes represent identified root causes (Rajski [0306]; [0154]: “… Process (21.1) is performed to try to identify the defect, respectively the class or subclass of the defect, which can best explain the failing behavior of the integrated circuit.”; and [0216]: “… each defect has an ID indicating which class it belongs to, in the event that all candidates fall into the same class …”).).
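For illustration only, the relationship described in the Examiner’s note above can be sketched numerically. The sketch below is not reproduced from Rajski. It assumes a hypothetical case of N = 3 defect classes, an assumed matrix `gamma` whose entry in row i and column j holds $P(O_i \mid D_j)$, and an assumed diagnosed distribution `p_observed`; every number and the function name `calibrate` are made up for this example. It shows that an N x N matrix relates the two N-entry distributions linearly, and that a square, invertible matrix allows the underlying distribution to be recovered.

```python
import numpy as np


def calibrate(gamma: np.ndarray, p_observed: np.ndarray) -> np.ndarray:
    """Recover the underlying class distribution P(D) from the diagnosed
    distribution P(O), assuming P(O) = gamma @ P(D) with gamma square and
    invertible. Equivalent to applying the inverse of gamma to P(O)."""
    n_rows, n_cols = gamma.shape
    if n_rows != n_cols:
        raise ValueError("gamma must be an N x N matrix")
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(gamma, p_observed)


# Hypothetical values, N = 3 classes. gamma[i, j] stands for P(O_i | D_j);
# each column sums to 1 because every defect is diagnosed as some class.
gamma = np.array([
    [0.8, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.2, 0.9],
])

# Assumed distribution of diagnosed classes, P(O).
p_observed = np.array([0.35, 0.40, 0.25])

# Linearly adjusted (calibrated) estimate of P(D).
p_calibrated = calibrate(gamma, p_observed)
```

In this sketch the matrix has one row and one column per defect class, which is why its dimensions are N x N. With arbitrary inputs the solved vector is not guaranteed to be non-negative, so a practical procedure would likely need additional constraints that are outside the scope of this illustration.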
Regarding original Claim 13, 
Claim 13 recites the system of claim 8, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 6, and hence is rejected under similar rationale and motivations provided by Cheng in view of Benware and Rajski as indicated in Claim 6, in view of the rejections applied to Claim 8.
Regarding original Claim 14, 
Claim 14 recites the system of claim 13, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 7, and hence is rejected under similar rationale provided by Cheng in view of Benware, in further view of Rajski as indicated in Claim 7, in view of the rejections applied to Claim 13.
Regarding original Claim 19, 
Claim 19 recites the non-transitory machine-readable medium of claim 15, where the non-transitory machine-readable medium further comprises instructions that include claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 6, and hence is rejected under similar rationale and motivations provided by Cheng in view of Benware and Rajski as indicated in Claim 6, in view of the rejections applied to Claim 15.
Regarding original Claim 20, 
Claim 20 recites the non-transitory machine-readable medium of claim 19, where the non-transitory machine-readable medium further comprises instructions that include claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 7, and hence is rejected under similar rationale provided by Cheng in view of Benware, in further view of Rajski as indicated in Claim 7, in view of the rejections applied to Claim 19.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121                                                                                                                                                                                                        



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121