DETAILED ACTION
This is the response to applicant’s amendment action regarding application number 16/545,708, filed August 20, 2019.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed December 9, 2021 has been entered. Examiner acknowledges receipt of Amendments to Application 16/545,708, which include: Amendments to the Claims pp.2-9, and Remarks pp.10-14 (containing applicant’s amendments). 
Regarding applicant’s Remarks on p.10, Examiner has acknowledged that all original Claims 1-20 have been cancelled, and new Claims 21-40 have been added. Examiner notes that the new Claim 28 has been incorrectly labeled as “(Currently Amended)” in the new Amendments to the Claims (since Claim 28 was not previously presented with the original set of Claims 1-20). Claims 21-40 remain pending in the application. 
Regarding applicant’s Remarks on p.10, Examiner has acknowledged that original Claims 9-10 are cancelled, and therefore the respective claim objections previously set forth in the Non-Final Office Action mailed October 1, 2021 are withdrawn. Examiner further notes that while original Claims 9-10 have been cancelled, certain aspects from those claim limitations are found in the newly-added independent Claims 21, 32 and 37, and dependent Claim 28, which contain the necessary corrections to the earlier identified objections.
Regarding applicant’s Remarks on p.10, Examiner acknowledges that original Claims 6-8 and 17-18 are cancelled, and therefore the respective §112(b) rejections previously set forth in the Non-Final Office Action mailed October 1, 2021 are withdrawn. 

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/545,708, which include: Remarks pp.10-14 (containing applicant’s arguments). 
Regarding Applicant's Remarks on pp.10-11 for Claims 1-20 under 35 U.S.C. 101, Examiner acknowledges that original Claims 1-20 have been cancelled, and therefore the respective §101 rejections previously set forth in the Non-Final Office Action mailed October 1, 2021 are withdrawn. Examiner further notes that Applicant’s remaining arguments are directed to the newly-added independent Claims 21, 32, and 37, which necessitates further examination and re-evaluation as these new claims were not previously presented.
Regarding Applicant’s Remarks on pp.11-13 for Claims 1-5, 10-12, 14, 16, 19, and 20 under 35 U.S.C. 103 as being unpatentable over Cheong et al., U.S. PGPUB 2019/0304849, filed 3/26/2019 [hereafter referred as Cheong], in view of Bilenko et al., U.S. PGPUB 2014/0337096, published 11/13/2014 [hereafter referred as Bilenko], in further view of Graefe, Goetz, Query Evaluation Techniques for Large Databases, ACM Computing Surveys, Vol.25, No.2, June 1993 [hereafter referred as Graefe]; for Claims 6 and 17 under 35 U.S.C. 103 as being unpatentable over Cheong in view of Bilenko, in further view of Graefe as applied to Claims 1 and 16, in even further view of Liu et al., Isolation Forest, 2008 Eighth IEEE International Conference on Data Mining, IEEE 2008 [hereafter referred as Liu]; for Claims 7-8 and 18 under 35 U.S.C. 103 as being unpatentable over Cheong in view of Bilenko, in further view of Graefe, in even further view of Liu as applied to Claims 6 and 17, in even further view of Chen, Hongge, Novel Machine Learning Approaches for Modeling Variations in Semiconductor Manufacturing (Masters Thesis), Massachusetts Institute of Technology June 2017, 96 pages [hereafter referred as Chen]; for Claim 9 under 35 U.S.C. 103 as being unpatentable over Cheong in view of Bilenko, in further view of Graefe as applied to Claim 1, in even further view of Kida et al., Luis Sergio, U.S. PGPUB 2019/0065989, published 2/28/2019 [hereafter referred as Kida]; for Claim 13 under 35 U.S.C. 103 as being unpatentable over Cheong in view of Bilenko, in further view of Graefe as applied to Claim 1, in even further view of Won et al., Random Forest Model for Silicon-to-SPICE Gap and FinFET design Attribute Identification, IEIE Transactions on Smart Processing and Computing, Vol.5, No.5, October 2016, pp.358-365 [hereafter referred as Won]; and for Claim 15 under 35 U.S.C. 103 as being unpatentable over Cheong in view of David et al., U.S. PGPUB 2019/0064253, published 2/28/2019 [hereafter referred as David], Examiner acknowledges that all original Claims 1-20 have been cancelled, and the bulk of the Applicant’s arguments are directed to the new Claims 21-40 which have not been previously presented, and therefore, these new claims necessitate further examination and re-evaluation. However, Examiner has noted Applicant’s arguments contain certain broad assertions, which will be addressed in the following paragraphs.
Regarding applicant’s Remarks on p.12:
“The cited portions and Liu in general teach to randomly and recursively partition a set of points to isolate the points. Applicant notes that random partitioning of a set of data is entirely unrelated to randomizing the set of data. For example, random partitioning involves dividing a set of data into separate partitions, as shown in Figures 1(a)-(b) of Liu. In contrast, claim 21 recites a method whereby a reduced training set of data is randomized into a plurality of randomized reduced training sets of data, and the plurality of randomized data sets are subsequently used in a classification algorithm. 
Making this even more clear, Liu is silent on using the plurality of randomized reduced training sets of data in decision trees to determine a relationship between labels and features of the electronic items, in the manner recited. Applicant notes that the OA had previously not given this feature patentable weight as allegedly reciting an intended use. Applicant has amended this limitation to recite an active step, and respectfully submits that these features should be given patentable weight.”
	Examiner has considered this argument, and finds the argument to be not persuasive. The Liu reference was used to teach the following claim limitations recited in original dependent Claims 6 and 17: “randomizing … to obtain a plurality of randomized reduced training sets of data, wherein each of the randomized reduced training set of data is suitable to be used in a different decision tree of the classification algorithm, for determining a relationship between the at least one label and the features, based on an output of the different decision trees”, where these original dependent Claims 6 and 17 are now cancelled, but aspects from those original claims are recited in new Claims 21, 32, and 37. Under its broadest reasonable interpretation, “a plurality of randomized reduced training sets of data” refers to a plurality of groups of training sets of data that were generated through a randomization process. Hence, the claim limitations do not recite a specific randomization method that would further limit its scope. Liu teaches a randomization process that recursively partitions a set of instances at random (Liu pp.414-415 Section 2 Isolation and Isolation Trees, including p.414 col.1 6th paragraph: “In this paper, the term isolation means ‘separating an instance from the rest of the instances’. … In a data-induced random tree, partitioning of instances are repeated recursively until all instances are isolated.”; and p.415 col.1 2nd paragraph: “… Since each partition is randomly generated, individual trees are generated with different sets of partitions.”). The end result from the process taught in Liu is a plurality of “isolated” partitions of instances, where these isolated multiple partitions were generated through a randomization process, such that each partition of instances is a representation of a randomized training set of data. 
Hence, applying this randomization method taught in the Liu reference to the generated reduced training set of data as taught in the Bilenko method would produce a plurality of randomized reduced training sets of data as recited by the original set of claim limitations. As indicated in the Non-Final Office Action mailed October 1, 2021, the motivation to combine is taught in Liu (identified in Liu p.413 Abstract, as this method provides linear time complexity with a low constant and low memory requirement, thus providing a computationally efficient and memory efficient algorithm for a machine learning system). Therefore, the prior art rejection from those earlier claims is relevant and appropriate based on the broadest reasonable interpretation of the claims.
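For illustration only, the random recursive partitioning quoted from Liu above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Liu: the function name `isolate`, the restriction to one-dimensional points, and the toy values are all hypothetical.

```python
import random

def isolate(points, rng, depth=0):
    """Recursively split 1-D points at random values until each is isolated.

    Returns (point, depth) pairs, where depth counts the random splits
    that were needed to separate the point from the others.
    """
    if len(points) <= 1 or min(points) == max(points):
        return [(p, depth) for p in points]
    split = rng.uniform(min(points), max(points))
    left = [p for p in points if p < split]
    right = [p for p in points if p >= split]
    if not left:  # split landed exactly on the minimum; draw again
        return isolate(points, rng, depth)
    return isolate(left, rng, depth + 1) + isolate(right, rng, depth + 1)

# Hypothetical toy data; a different seed yields a different set of partitions.
points = [1.0, 2.0, 5.0, 9.0]
result = isolate(points, random.Random(0))
```

Each run with a different seed produces a different set of partitions, which is the property the quoted passage at Liu p.415 describes.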
	Examiner further notes that Applicant has now changed the claim language recited in newly-added independent Claims 21, 32, and 37 to eliminate the intended use that was previously recited in earlier claims (“wherein each of the randomized reduced training set of data is suitable to be used in a different decision tree of the classification algorithm, for determining a relationship between the at least one label and the features …”). Since these new Claims 21-40 were not previously presented, these new claims necessitate further examination and re-evaluation. 

Claim Objections
Claim 27 is objected to because of the following informality: 
A conjunction word is missing between the two limitations recited in this claim:
“randomizing the aggregated representation of at least one label value associated with the aggregated set of data is based on a probability distribution characterizing probability to obtain different values for the aggregated representation of the at least one label value; <missing conjunction word>
randomizing data representative of a number of the one or more sets of data of the group is based on a probability distribution characterizing probability to obtain different values of this data representative of a number of the one or more sets of data of the group.”. 
Claim 27 is recited as a dependent claim of Claim 26, where Claim 26 recites the randomizing is performed for at least one of the aggregated representation of at least one label value associated with the aggregated set of data, and data representative of a number of the one or more sets of data of the group. Given that Claim 27 recites limitations that are further limiting the scope of the limitations recited in Claim 26, and given that Claim 26 recites the claim limitations in the context of “at least one of” the two limitations, in order to preserve the same “at least one of” aspect inherited from Claim 26, Claim 27 should use the conjunction word “or” between the two recited claim limitations. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 21-25, 28-29, 31-34, 36-38 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Honda et al., U.S. PGPUB 2019/0277913, filed 3/8/2019 [hereafter referred as Honda], in view of Bilenko et al., U.S. PGPUB 2014/0337096, published 11/13/2014 [hereafter referred as Bilenko], in further view of Graefe, Goetz, Query Evaluation Techniques for Large Databases, ACM Computing Surveys, Vol.25, No.2, June 1993 [hereafter referred as Graefe], in even further view of Brownlee, Jason, Bagging and Random Forest Ensemble Algorithms for Machine Learning, retrieved from web.archive.org dated June 25, 2019 (http://web.archive.org/web/20190625001106/https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/), 9 pages [hereafter referred as Brownlee].
Regarding new Claim 21, 
Honda teaches
(New) A method comprising, by a processing unit and a memory coupled to a non-transitory computer-readable memory medium (Examiner’s note: Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline, and where the outputs include predictions produced from a machine learning model (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).):
obtaining a training set of data comprising a plurality of sets of data each representative of an electronic item, each set comprising feature values for a plurality of features, and for at least one label (Examiner’s note: Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]).); 
building a reduced training set of data which comprises an aggregated representation of the training set of data (Examiner’s note: As indicated earlier, Honda teaches a machine learning pipeline for generating and storing input data as training data for the one or more predictive models, where the machine learning pipeline includes performing feature data cleansing, feature selection, and feature engineering for datasets used to train a machine learning model, where the feature generation pipe performing the feature engineering uses statistical methods on the feature data (minimum, maximum and 10-90 percentile range) to perform feature engineering conversion of data, and where multidimensional analysis in the data cleansing step may involve removing unwanted data and features from the dataset using dimensionality reduction (Honda Figure 3, elements 310,320, 330; and [0035]-[0041], [0047]-[0051]; Figure 7 and [0059]; [0123]-[0130]).), …
randomizing the reduced training set of data in order to obtain a plurality of randomized reduced training sets of data (Examiner’s note: Honda teaches applying machine learning techniques such as random oversampling and bagging on a dataset to avoid having a skewed dataset that exhibits a bias towards a majority class. A person having ordinary skill in the art would understand the term “bagging” to refer to “bootstrap aggregation”, which is a process that involves randomizing a dataset with replacement to produce a plurality of randomized training sets (Honda [0132]-[0135]).); …
… using the plurality of randomized reduced training sets of data in a classification algorithm implementing a plurality of decision trees for determining a relationship between the at least one label and the features of the electronic items (Examiner’s note: As indicated earlier, Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]). Honda Figures 12-14 further teach providing input data collected from the PCM, WS, FT testing stages into various model architectures to generate predictive models using decision trees, where the input data is processed by the machine learning pipeline (Honda Figure 3, elements 310, 320, 330; and [0035]-[0041], [0047]-[0051]; Figure 7 and [0059]) through dimensionality reduction and oversampling methods (including bagging) to produce a plurality of randomized reduced training sets of data to determine classification of different types of RMA’ed chips, where this classification method using decision trees represents a determination of relationships between at least one label and features of the electronic items (Honda [0140]-[0144]; and [0123]-[0137]).) …
wherein each randomized reduced training set of data is used in … the classification algorithm for determining the relationship between the at least one label and the features of the electronic items (Examiner’s note: As indicated earlier, Honda teaches a classification method using decision trees and input data from a semiconductor manufacturing process, where this classification method using decision trees and the input data based on computed and measured feature data from WAT/PCM, WS, CP, FT testing represents a determination of the relationship between at least one label and features of the electronic items (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]; Figures 12-14 and [0140]-[0144]; and [0123]-[0137]).) …
… storing an output of the classification algorithm in the non-transitory computer-readable memory medium (Examiner’s note: As indicated earlier, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline, and where the outputs include predictions produced from a machine learning model (Honda Figures 12-14; Figure 3 and [0035]-[0041]; Figure 7 and [0059]).).
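For illustration only, the ordinary meaning of “bagging” (bootstrap aggregation) relied upon in the mapping above can be sketched as sampling a dataset with replacement to obtain several randomized training sets. This is a minimal sketch, not code from Honda: the function name `bootstrap_samples`, the seed, and the toy rows are hypothetical.

```python
import random

def bootstrap_samples(rows, n_sets, seed=0):
    """Draw n_sets bootstrap resamples (sampling with replacement) from rows."""
    rng = random.Random(seed)
    # Each resample has the same size as the original dataset, but a row
    # may appear several times or not at all.
    return [[rng.choice(rows) for _ in rows] for _ in range(n_sets)]

# Hypothetical toy data: (feature_value, pass_fail_label) pairs.
rows = [(0.1, "pass"), (0.4, "pass"), (0.7, "fail"), (0.9, "fail")]
samples = bootstrap_samples(rows, n_sets=3)
```

In a bagged ensemble, each resample would typically be used to fit a separate decision tree.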
While Honda teaches data cleansing (a form of data pre-processing), feature selection, feature engineering, and bagging techniques to produce a plurality of randomized reduced sets of training data, Honda does not explicitly teach
… the building comprising:
dividing the sets of data into a plurality of groups, wherein all sets of data, for which feature values meet at least one similarity criterion, are in the same group,
storing in the reduced training set of data, for each group, at least one aggregated set of data comprising: 
an aggregated representation of feature values for the one or more sets of data of the group, 
an aggregated representation of at least one label value of the one or more sets of data of the group, …
Bilenko teaches
… the building comprising:
dividing the sets of data into a plurality of groups, wherein all sets of data, for which feature values meet at least one similarity criterion, are in the same group (Examiner’s note: Bilenko teaches a master data set containing a plurality of training examples, with each training example containing event data and an associated label (“a plurality of sets of data”, Bilenko [0032]-[0033]). A training system uses a partitioning process to produce training set instances that represent a cluster or partition (Bilenko [0049]). The partitioning process divides the original dataset (corresponding to “… wherein all sets of data …”) into multiple partitions according to attribute values identified within each training example as aspect values (Bilenko [0033]), where an aspect value is identified with the event data within each training example (corresponding to “dividing the sets of data into a plurality of groups, …”), and combinations of aspect values can be used to produce partitions consisting of subsets of the master dataset. The features within each partition are associated with each other based on similar rates, frequency, or common aspect values (corresponding to “… for which feature values meet at least one similarity criterion, are in the same group”) (Bilenko Figure 1, elements 110, 112; [0036]-[0037]: “… a partitioning process 112 produces a plurality of partitions (also referred to as bins) for each aspect under consideration. Each partition is associated with a set of aspect values. … The partitioning process performs a similar partitioning process for other aspects, include both aspects associated with single attributes and aspects associated with combinations of attributes.”; and [0049]-[0053]: “… the training system produces clusters of aspect values, where the aspect values in each cluster exhibit similar label-conditioned statistical profiles. For example, the training system can identify the click-through rates associated with different individual user IDs. The training system can then form clusters of user IDs that have similar click-through rates. … In other implementations, the training system can use other partitioning strategies to produce the partitions. In one alternative technique, the training system can group aspect values based on a frequency measure. … In another case, the training system can group aspect values on the basis of shared aspect values. For example, the training system can ensure that all entries in a particular partition have at least one common aspect value (such as a particular user ID).”).),
storing in the reduced training set of data, for each group, at least one aggregated set of data (Examiner’s note: Bilenko teaches each of the identified partitions (based on aspect values) is further processed by an aggregation process, which generates additional associated statistical feature information based on the collected event data for each training example (with the statistical feature information corresponding to “… an aggregated representation …”), and is stored as additional feature information with the associated partition (with each partition identified as tables as shown in Bilenko Figure 3). These generated statistical feature information are associated with the corresponding data set instance as additional features such that representative instances for each cluster/partition can be identified as shown in Bilenko Figure 4 (Bilenko Figure 1, element 114; Figure 3; Figure 5, element 514; [0037]-[0038]: “Next, in a filtering and aggregation process 114, the training system identifies plural subsets of data, selected from the master dataset 110, for the different respective partitions. The training system then forms plural instances of statistical information based on the respective subsets of data for the respective partitions. … More specifically, different implementations may generate different kinds of statistical measures. In one case, the training system can form a count of the label values associated with the training examples. … The statistical information can also include various averages, ratios, etc.”).) comprising: 
an aggregated representation of feature values for the one or more sets of data of the group (Examiner’s note: Bilenko teaches examples of statistical feature information include averages/ratios of event data (Bilenko Figure 1, element 114; Figure 3; Figure 5, element 514; [0038]: “… More specifically, different implementations may generate different kinds of statistical measures. … The statistical information can also include various averages, ratios, etc.”; and [0076]: “… As described in Section A, the aggregation module 514 can form different statistical values. … The aggregation module 514 can also form various other statistical measures, such as normalized counts, averages, standard deviations, click-through rates, other ratios (e.g. ln(N/N)), etc.”).), 
an aggregated representation of at least one label value of the one or more sets of data of the group (Examiner’s note: Bilenko teaches examples of statistical feature information include counts of label values (Bilenko Figure 1, element 114; Figure 3; Figure 5, element 514; [0038]: “… More specifically, different implementations may generate different kinds of statistical measures. In one case, the training system can form a count of the label values associated with the training examples. … The statistical information can also include various averages, ratios, etc.” and Bilenko [0076]: “… As described in Section A, the aggregation module 514 can form different statistical values. For example, the aggregation module 514 can form a count ($N_x^{+}$) of the number of click events for the aspect value x in question, as well as a count ($N_x^{-}$) for the number of non-click events.”).), …
data representative of a number of the one or more sets of data of the respective group (Examiner’s note: Under its broadest reasonable interpretation, the term “data representative of a number of the one or more sets of data of the respective group” is interpreted as a count value for one or more sets of data of a respective group. Bilenko teaches using a statistical measure such as a normalized count (which is a ratio of a count of a feature divided over the total number of features, and where a count represents “data representative of a number of the one or more sets of data of the respective group”) (Bilenko [0076]: “… The aggregation module 514 can also form various other statistical measures, such as normalized counts …”).) …
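For illustration only, the partitioning and aggregation steps mapped to Bilenko above can be sketched as grouping rows by a similarity criterion and keeping one aggregated record per group. This is a minimal sketch, not Bilenko’s implementation: the function name `build_reduced_set`, the fixed-width binning used as the similarity criterion, and the toy rows are hypothetical.

```python
from collections import defaultdict

def build_reduced_set(rows, bin_width=0.5):
    """Group rows whose feature falls in the same bin; keep one aggregate per group."""
    groups = defaultdict(list)
    for feature, label in rows:
        # Assumed similarity criterion: same fixed-width feature bin.
        groups[int(feature // bin_width)].append((feature, label))
    reduced = []
    for key in sorted(groups):
        members = groups[key]
        reduced.append({
            "mean_feature": sum(f for f, _ in members) / len(members),
            "fail_count": sum(1 for _, lab in members if lab == "fail"),
            "group_size": len(members),
        })
    return reduced

# Hypothetical toy data: (feature_value, pass_fail_label) pairs.
rows = [(0.1, "pass"), (0.3, "fail"), (0.7, "fail"), (0.9, "fail")]
reduced = build_reduced_set(rows)
```

Here four rows collapse into two aggregated records, each carrying an aggregated feature value, an aggregated label value, and the number of rows in its group.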

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the data cleansing/data pre-processing, feature selection, and feature engineering components taught in Honda and include data partitioning and aggregation functions to perform the partitioning and aggregation steps taught in Bilenko as a way to generate a reduced training set of data comprising aggregated representations of the original training set of data. The motivation to combine is taught in Bilenko, as a way to produce an aggregated and reduced training data set that represents an optimized training data set that minimizes the loss of predictive accuracy. It is desirable to perform data cleansing/data pre-processing and feature engineering techniques to generate aggregated and reduced training data sets that minimize the loss of predictive accuracy for a machine learning model, since these reduced and aggregated training data sets reduce the amount of memory storage needed to store the data as well as the computation time to process the data in a machine learning model, and, coupled with the benefit of minimizing loss of predictive accuracy, allow the reduced training data set to provide prediction results without significantly sacrificing prediction accuracy (Bilenko [0048]-[0049]: “The training system may use different partitioning strategies to define partitions. In one approach, the training system can define partitions in a manner that satisfies an objective relating to loss of predictive accuracy. … Predictive accuracy refers to an extent to which the statistical information accurately represents the labels associated with individual training examples which contribute to the statistical information. … The training system minimizes the loss of predictive accuracy by performing clustering in such a manner that the instances of statistical information accurately characterize the members in the respective clusters, while minimizing, overall, the loss of descriptive information pertaining to specific members of the clusters.”).
While Honda in view of Bilenko teaches aggregated representations using a statistical measure such as a normalized count (which is a ratio of a count of a feature divided over the total number of features, and where a count represents “data representative of a number of the one or more sets of data of the group”), Honda in view of Bilenko does not explicitly teach
… wherein for a plurality of the groups which comprise a plurality of sets of data, a number of aggregated set of data is less than a number of the sets of data of the group … 
Graefe teaches
… wherein for a plurality of the groups which comprise a plurality of sets of data, a number of aggregated set of data is less than a number of the sets of data of the group (Examiner’s note: Graefe teaches aggregating functions including generating sums and counts (without normalization) to represent a set of items, with these sums and counts corresponding to “data representative of a number of the one or more sets of data of the group” (Graefe p.98 col.2 Section 4 Aggregation and Duplicate Removal 1st paragraph: “Aggregation is a very important statistical concept to summarize information about large amounts of data. The idea is to represent a set of items by a single value or to classify items into groups and determine one value per group. Most database systems support aggregate functions for minimum, maximum, sum, count, and average (arithmetic mean).”). Graefe further teaches that, for situations where aggregating data produces identical sample sets, the duplicated sample set is removed, thus making the aggregated set of data less than the original sets of data in the group (Graefe p.98 col.2 last paragraph – p.99 col.1 1st paragraph (Section 4 Aggregation and Duplicate Removal): “Algorithms for aggregate functions require grouping … This grouping process is very similar to duplicate removal in which equal data items must be brought together, compared, and removed. Thus, aggregate functions and duplicate removal are typically implemented in the same module. … in duplicate removal, items are compared on all their attributes, but only on the attributes in the by-list of aggregate functions. Second, an identical item is immediately dropped from further consideration in duplicate removal whereas in aggregate functions some computation is performed before the second item of the same group is dropped. Both differences can easily be dealt with using a switch in an actual algorithm implementation. Because of their similarity, duplicate removal and aggregation are described and used interchangeably here.”).) … 
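For illustration only, the observation quoted from Graefe that duplicate removal and aggregation differ by “a switch in an actual algorithm implementation” can be sketched as a single grouping pass with a flag. This is a minimal sketch, not code from Graefe: the function name `group_rows`, the `aggregate` flag, the choice of a count as the aggregate, and the toy rows are hypothetical.

```python
def group_rows(rows, key_columns, aggregate=False):
    """One grouping pass serving both duplicate removal and count aggregation."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen[key] = 1
        elif aggregate:
            seen[key] += 1  # fold the later item into the group's count
        # Otherwise: duplicate removal drops the repeated item immediately.
    if aggregate:
        return [key + (count,) for key, count in seen.items()]
    return list(seen)

# Hypothetical toy rows of (name, value).
rows = [("A", 1), ("A", 1), ("B", 2)]
deduplicated = group_rows(rows, key_columns=[0, 1])            # compare on all columns
counted = group_rows(rows, key_columns=[0], aggregate=True)    # compare on the by-list only
```

In both modes the output has fewer records than the input whenever a group contains more than one row.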
	Both Honda in view of Bilenko and Graefe are analogous art since they both teach managing and evaluating data sets containing sets of values using data aggregation techniques involving mathematical functions.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the teachings of Honda in view of Bilenko and enhance them to support other aggregation functions such as summation and counting, and perform additional reduction techniques such as duplicate removal taught in Graefe as a way to apply other techniques to generate a reduced training set of data comprising aggregated representations of the original training set of data. The motivation to combine is taught in Graefe, as aggregation techniques form the foundation of data processing techniques such as hashing and sorting, both of which help to optimize data storage when dealing with large amounts of data, thus allowing the system to be more memory efficient, and also allow the concise representations of data to be stored and partitioned for further pipelining and parallelization operations in the system, in order to further optimize the performance and efficiency of the system (Graefe p.158 col.1 2nd-3rd paragraphs (Summary and Outlook): “A large set of query processing algorithms has been developed for relational systems. Sort- and hash-based techniques have been used for physical storage design, for associative index structures, for algorithms for unary and binary matching operations such as aggregation, duplicate removal, join, intersection, and division, and for parallel query processing using hash- or range partitioning. … Many of the existing algorithms will continue to be useful for extensible and object-oriented systems, and many can easily be generalized from sets of tuples to more general pattern-matching functions. … it allows algebraic optimizations of requests, i.e., optimizing transformations of algebra expressions and cost-sensitive translations of logical into physical expressions. Finally, it permits pipelining between operators to exploit parallel computer architectures and partitioning of stored data and intermediate results for most operators, in particular, for operators on sets but also for other bulk types such as arrays, lists, and time series.”).
While Honda in view of Bilenko, in further view of Graefe teaches bagging/bootstrap aggregation techniques and model architectures utilizing decision tree/random forest classifiers for performing classification, Honda in view of Bilenko, in further view of Graefe does not explicitly teach
… wherein each randomized reduced training set of data is used in a different decision tree of the plurality of decision trees of the classification algorithm … based on respective outputs of the plurality of decision trees …
Brownlee teaches
… wherein each randomized reduced training set of data is used in a different decision tree of the plurality of decision trees of the classification algorithm … based on respective outputs of the plurality of decision trees (Examiner’s note: Brownlee teaches bootstrap aggregation as an ensemble method, where the bootstrap aggregation method applies samples that were randomly selected (with replacement) to a plurality of different decision trees (which can be CART trees or random forest trees), where each decision tree is used to perform classification, and the number of decision trees used for bagging corresponds to the number of samples generated, and the respective outputs from the plurality of decision trees are used to determine a final prediction (Brownlee p.1 2nd bullet: “After reading this post you will know about: … The Bootstrap Aggregation algorithm for creating multiple different models from a single training dataset.”; p.2 Bootstrap Aggregation (Bagging): “Bootstrap Aggregation (or Bagging for short) is a simple and very powerful ensemble method … Bootstrap Aggregation is a general procedure that can be used to reduce the variance for those algorithm that have high variance. … Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees. … Bagging of the CART algorithm would work as follows. 1. Create many (e.g., 100) random sub-samples of our dataset with replacement. 2. Train a CART model on each sample. 3. Given a new dataset, calculate the average prediction from each model. … if we had 5 bagged decision trees that made the following class predictions for an input sample: blue, blue, red, blue and red, we would take the most frequent class and predict blue. … The only parameters when bagging decision trees is the number of samples and hence the number of trees to include.”; and p.2 Random Forest).) …
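For illustration only, the three bagging steps quoted from Brownlee (sample with replacement, train one model per sample, combine the outputs) can be sketched as follows. This is a minimal sketch under stated assumptions: `train_stump` is a hypothetical one-split stand-in for a CART tree, and the dataset, seed, and function names are not taken from any cited reference.

```python
import random
from collections import Counter

def bootstrap_sample(dataset, rng):
    """Draw len(dataset) examples at random, with replacement."""
    return [rng.choice(dataset) for _ in dataset]

def majority_label(labels):
    """Return the most frequent label (the first-seen label wins ties)."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(sample):
    """Hypothetical stand-in for a CART tree: split once at the mean feature value."""
    threshold = sum(x for x, _ in sample) / len(sample)
    all_labels = [label for _, label in sample]
    left = [label for x, label in sample if x <= threshold]
    right = [label for x, label in sample if x > threshold]
    left_label = majority_label(left or all_labels)
    right_label = majority_label(right or all_labels)
    return lambda x: left_label if x <= threshold else right_label

def bagged_predict(models, x):
    """Combine the per-tree class predictions by taking the most frequent class."""
    return majority_label([model(x) for model in models])

# Hypothetical (feature, label) pairs.
dataset = [(1, "blue"), (2, "blue"), (3, "blue"), (8, "red"), (9, "red"), (10, "red")]
rng = random.Random(0)
# One model per bootstrap sample, so the number of trees equals the number of samples.
models = [train_stump(bootstrap_sample(dataset, rng)) for _ in range(5)]
prediction = bagged_predict(models, 1)
```

Applied to the five class predictions in the quoted passage, `majority_label(["blue", "blue", "red", "blue", "red"])` returns `"blue"`.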
Both Honda in view of Bilenko, in further view of Graefe and Brownlee are analogous art since both teach machine learning techniques involving the bagging/bootstrap aggregation method.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the bagging/bootstrap aggregation techniques taught in Honda in view of Bilenko, in further view of Graefe and apply these techniques to a random forest model taught in Brownlee as a way to determine an estimated performance of the random forest model as well as measuring variable importance in the random forest model. The motivation to combine is taught in Brownlee, since determining estimated performance on out-of-bag samples (i.e., those samples that were not randomly selected) and determining variable importance are useful in providing error estimates that determine the accuracy of the bagged model, as well as identifying important variables that lead to a particular prediction or outcome, in order to identify subsets of input variables that may be most or least relevant to the problem, as a way to suggest further tuning of the model by removing certain features from the dataset in order to improve the performance and accuracy of the random forest model (Brownlee p.2 Estimated Performance: “…The performance of each model on its left out samples when averaged can provide an estimated accuracy of the bagged models. This estimated performance is often called the OOB estimate of performance. These performance measures are reliable test error estimate and correlate well with cross validation estimates.”; and p.2 Variable Importance: “As the Bagged decision trees are constructed, we can calculate how much the error function drops for a variable at each split point … These drops in error can be averaged across all decision trees and output to provide an estimate of the importance of each input variable. The greater the drop when the variable was chosen, the greater the importance. 
These outputs can identify subsets of input variables that may be most or least relevant to the problem and suggest at possible feature selection experiments you could perform where some features are removed from the dataset.”).
Regarding new Claim 22, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(New) The method of claim 21, wherein, 
for a feature, feature values of different sets of data meet the similarity criterion if at least one of (a), (b), (c) and (d) is met (Examiner’s note: Under its broadest reasonable interpretation, this claim limitation in a method claim recites a contingent clause that effectively renders the subsequent claim language to not be performed because the condition precedent (“if at least one of (a), (b), (c), and (d) is met”) is not required to be met, and the claimed invention can be practiced without the condition occurring. See MPEP 2111.04(II). Applicant is advised to amend the claim to positively cite the condition as being fulfilled, since no patentable weight is given for the subsequent claim language following a contingent clause that does not require the condition to be fulfilled for practicing the claimed invention. However, for the purposes of examination, this contingent clause will be treated as if the condition were fulfilled.): 
(a) the feature values are equal (Examiner’s note: Bilenko teaches aspect values in a training set are attribute values (with each aspect corresponding to “a feature”, and the respective aspect values corresponding to “feature values”) and grouping common aspect values is a form of comparing those inspected aspect values to ensure they are equal to each other (Bilenko [0053]: “… the training system can group aspect values on the basis of shared aspect values. For example, the training system can ensure that all entries in a particular partition have at least one common aspect value (such as a particular user ID).”).);
(b) the feature values do not differ one from the other more than a threshold; 
(c) the feature values are equal after the feature values have been approximated; 
(d) at least one of (a), (b), (c) is met and label values of the different sets of data are similar according to a second similarity criterion.  
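For illustration only, the four alternatives (a) through (d) recited above can be sketched as a pair of predicates. This is a minimal sketch assuming numeric feature values and numeric labels; the names `features_similar` and `similar_with_labels`, the use of rounding as the approximation in (c), and the use of a label threshold as the second similarity criterion in (d) are assumptions and are not drawn from the claims or the cited references.

```python
def features_similar(x, y, threshold=None, ndigits=None):
    """Return True if feature values x and y meet (a), (b), or (c)."""
    if x == y:                                    # (a) the values are equal
        return True
    if threshold is not None and abs(x - y) <= threshold:
        return True                               # (b) the values differ by no more than a threshold
    if ndigits is not None and round(x, ndigits) == round(y, ndigits):
        return True                               # (c) the values are equal after approximation
    return False

def similar_with_labels(x, y, label_x, label_y, label_threshold, **kwargs):
    """(d): one of (a)-(c) is met and the labels meet a second (assumed) criterion."""
    labels_similar = abs(label_x - label_y) <= label_threshold
    return features_similar(x, y, **kwargs) and labels_similar
```

For example, `features_similar(1.0, 1.05, threshold=0.1)` is satisfied under (b), while `similar_with_labels(1.0, 1.0, 0, 1, label_threshold=0)` is not satisfied because the label values differ.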
Regarding new Claim 23, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(New) The method of claim 21, wherein the features correspond to manufacturing data of the electronic item (Examiner’s note: As indicated earlier, Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]).) and 
the at least one label corresponds to at least one quality attribute of the electronic item (Examiner’s note: As indicated earlier, Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]).).  
Regarding new Claim 24, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(New) The method of claim 21, wherein for a majority of the groups comprising a plurality of sets of data, a number of aggregated set of data is less than a number of the sets of data of the group by a magnitude of at least ten (Examiner’s note: Under its broadest reasonable interpretation, the term “magnitude” means size or extent, and hence the term “a number of aggregated set of data is less than a number of the sets of data of the group by a magnitude of at least ten” is interpreted to mean that the number of aggregated set of data is at least ten less than the number of sets of data. Bilenko teaches selecting a representative data set instance within a cluster representing a collection of data set instances with a common aspect value (corresponding to “a plurality of sets of data”), where the selection of the representative data set corresponds to “wherein for a majority of the groups comprising a plurality of sets of data, a number of aggregated set of data is less than a number of the sets of data of the group…”. As shown in Bilenko Figure 4, the representative data set instance (represented by a black dot) is among at least a group of 11 or 12 other representative data set instances, hence corresponding to “a number of aggregated set of data is less than … by a magnitude of at least ten”).  
Regarding new Claim 25, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(New) The method of claim 21, 
wherein the aggregated representation of at least one label value of the one or more sets of data of the group comprises a sum of all label values of the one or more sets of data of the group (Examiner’s note: Bilenko teaches examples of statistical feature information that include counts of label values (corresponding to “an aggregated representation of at least one label value of the one or more sets of data of the group comprises a sum of all label values of the one or more sets of data of the group”) (Bilenko Figure 1, element 114; Bilenko Figure 3; Bilenko Figure 5, element 514; [0038]: “… More specifically, different implementations may generate different kinds of statistical measures. In one case, the training system can form a count of the label values associated with the training examples.”).).  
Regarding new Claim 28, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(New) The method of claim 21, 
wherein the reduced training set of data comprises a plurality of aggregated sets of data (Examiner’s note: Bilenko teaches a received data set containing a plurality of training examples, with each training example containing event data and an associated label (Bilenko [0032]-[0033]). The partitioning process divides the received data set into multiple partitions according to attribute values identified within each training example as aspect values (Bilenko [0033]), where an aspect value is identified with the event data within each training example, and combinations of aspect values can be used to produce partitions consisting of subset of the received data set (with each of these partitions corresponding to a “reduced training set of data”), with the features within each partition being associated with each other based on similar rates, frequency, or common aspect values (Bilenko [0036]-[0037]; [0049]-[0053]). Each of the identified partitions based on aspect values is further processed by an aggregation process, which generates additional associated statistical feature information based on the collected event data for each training example (with the statistical feature information corresponding to “… an aggregated representation …”, Bilenko [0037]-[0038]), and is stored as additional feature information with the associated partition, with each partition containing one or more such statistical measures (corresponding to “the reduced training set of data comprises a plurality of aggregated sets of data”) (Bilenko Figure 1, elements 110, 112; [0040]: “… Each partition, in turn, provides statistical information generated for that partition, which may comprise one or more statistical measure that correspond to features”).), 
wherein the number of aggregated sets of data in the reduced training set of data does not increase if the training set of data is expanded with at least one set of data comprising feature values which are similar to feature values already present in the training set of data for at least one set of data, according to the similarity criterion (Examiner’s note: Under its broadest reasonable interpretation, this claim limitation in a method claim recites a contingent clause that effectively renders the subsequent claim language to not be performed because the condition precedent (“if the training set of data is expanded with at least one set of data …”) is not required to be met, and the claimed invention can be practiced without the condition occurring. See MPEP 2111.04(II). Applicant is advised to amend the claim to positively cite the condition as being fulfilled, since no patentable weight is given for the subsequent claim language following a contingent clause that does not require the condition to be fulfilled for practicing the claimed invention. However, for the purposes of examination, this contingent clause will be treated as if the condition were fulfilled. 
Graefe teaches introducing new sources of data through a read-ahead operation, and performing a merge operation that involves sorting and duplicate removal on a set of aggregated data receiving additional data, where the read-ahead operation is interpreted as “if the training set of data is expanded with at least one set of data”, and where the sorting and duplicate removal operations are interpreted as identifying “at least one set of data comprising feature values which are similar to feature values already present in the training set of data for at least one set of data, according to the similarity criterion”, with the duplicate removal producing a result in which “the number of aggregated sets of data in the reduced training set of data does not increase …”, i.e., the overall size of the reduced training data set does not grow (Graefe p.100 col.1 Section 4.2 Aggregation Algorithms Based on Sorting 1st – 3rd paragraphs: “Sorting will bring equal items together, and duplicate removal will then be easy. The cost of duplicate removal is dominated by the sort cost, and the cost of this naive duplicate removal algorithm based on sorting can be assumed to be that of the sort operation. For aggregation, items are sorted on their grouping attributes. This simple method can be improved by detecting and removing duplicates as early as possible, easily implemented in the routines that write run files during sorting. With such "early" duplicate removal or aggregation, a run file can never contain more items than the final output (because otherwise it would contain duplicates!), which may speed up the final merges significantly [Bitton and De Witt 1983]. … the operations discussed in the section on sorting, namely read-ahead using forecasting, merge optimizations, large cluster sizes, and reduced final fan-in for binary consumer operations, are fully applicable when sorting is used for aggregation and duplicate removal.”).).  
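For illustration only, the sort-based aggregation described in the quoted Graefe passage, and its effect when a set of data with already-present feature values is added, can be sketched as follows. This is a minimal sketch that uses exact equality of feature tuples as the similarity criterion; the name `sort_and_aggregate` and the sample rows are hypothetical.

```python
from itertools import groupby

def sort_and_aggregate(rows):
    """Sort rows on their feature tuple, then collapse each run of equal
    feature tuples into one row: (features, count, sum of label values)."""
    ordered = sorted(rows, key=lambda row: row[0])
    aggregated = []
    for features, group in groupby(ordered, key=lambda row: row[0]):
        labels = [label for _, label in group]
        aggregated.append((features, len(labels), sum(labels)))
    return aggregated

# Hypothetical training set: (feature tuple, label value) pairs.
base = [((1, 2), 1), ((3, 4), 0), ((1, 2), 0)]
# Expand the training set with feature values that are already present.
expanded = base + [((3, 4), 1)]

reduced_base = sort_and_aggregate(base)
reduced_expanded = sort_and_aggregate(expanded)
```

In this sketch the expanded training set still reduces to two aggregated rows; only the count and label sum of the `(3, 4)` group change.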
Regarding new Claim 29, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(New) The method of claim 21, comprising, by a processing unit (Examiner’s note: As indicated earlier, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).): 
providing at least one set of data comprising a plurality of feature values representative of at least one electronic item, for which at least one label value is to be predicted (Examiner’s note: As indicated earlier, Honda Figures 12-14 teaches providing input data collected from the PCM, WS, FT testing stages into various model architectures to generate prediction results, where the model architectures represent single level and multiple-level models implemented with machine learning algorithms such as decision trees and random forests (Honda [0140]-[0144]).), and
predicting, based on the relationship, the label value associated with the set of data, thereby allowing prediction for the at least one electronic item (Examiner’s note: As indicated earlier, Honda Figures 12-14 teaches providing input data collected from the PCM, WS, FT testing stages into various model architectures to generate prediction results, where the model architectures represent single level and multiple-level models implemented with machine learning algorithms such as decision trees and random forests, and where the prediction result represents the prediction of Returned Merchant Authorizations (RMAs) for packaged electronic chips, expressed as a probability. Honda further teaches that only chips that passed the FT testing are provided to chip users, and hence the prediction of RMAs for packaged electronic chips represents a prediction of whether that associated FT testing label is accurate or not (Honda [0087]-[0089]; [0140]-[0144]). Examiner notes that the claim language “… thereby allowing prediction for the at least one electronic item” recites an intended use of predicting the label value associated with the set of data, where this language is already reflected in the earlier claim limitations “providing at least one set of data comprising a plurality of feature values representative of at least one electronic item, for which at least one label value is to be predicted” and “predicting, based on the relationship, the label value associated with the set of data …”, and therefore is considered as redundant claim language that does not further limit the claim limitation.). 
Regarding new Claim 31, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(New) The method of claim 21, wherein:
the training set data is collected from at least operational data collected from at least a manufacturing line of one or more electronic items (Examiner’s note: Under its broadest reasonable interpretation, the term “operational data collected from at least a manufacturing line” is interpreted as data collected through a normal routine process, i.e., routine testing conducted during a manufacturing process. As indicated earlier, Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]).), 
wherein the method further comprises updating the relationship between the at least one label and the features of the electronic items based on an update of the operational data during manufacturing (Examiner’s note: Honda teaches a machine learning pipeline containing a feature selection pipe, where this feature selection pipe performs determinations as to which sensors that provide the collected input data and/or manufacturing steps may not be providing useful data for training the ML model, and to remove these steps/sensors from training the model (effectively identifying and removing the attributes/features from the manufacturing process that are not considered relevant to the model). Honda Figures 4-6 further teach various processes involving this feature selection functionality, where Figure 4 teaches a scenario where sensors (and their associated collected feature data) that provide the best cross-validation accuracy are retained, while those with least cross-validation accuracy are removed, and where Figures 5 and 6 teach scenarios where sensors that identify key variables (i.e., relevant variables) … (Honda Figures 4-6; and [0041], [0051]-[0054]).).  
Regarding new Claim 32,
Claim 32 recites a system, where the system comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 21, and hence is rejected under similar rationale and motivations provided by Honda, Bilenko, Graefe, and Brownlee as indicated in Claim 21. In addition, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline, and where the outputs include predictions produced from a machine learning model (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).
Regarding new Claim 33,
Claim 33 recites the system of claim 32, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 22, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee as indicated in Claim 22, in view of the rejections applied to Claim 32.
Regarding new Claim 34,
Claim 34 recites the system of claim 32, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 23, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee as indicated in Claim 23, in view of the rejections applied to Claim 32.
Regarding new Claim 36,
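Claim 36 recites the system of claim 32, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 29, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee as indicated in Claim 29, in view of the rejections applied to Claim 32.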
Regarding new Claim 37,
Claim 37 recites a non-transitory storage device readable by a machine, where the non-transitory storage device embodies a program of instructions executable by a machine to perform operations comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 21, and hence is rejected under similar rationale and motivations provided by Honda, Bilenko, Graefe, and Brownlee as indicated in Claim 21. In addition, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline, and where the outputs include predictions produced from a machine learning model (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).
Regarding new Claim 38,
Claim 38 recites the non-transitory storage device of claim 37, where the non-transitory storage device further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 22, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee as indicated in Claim 22, in view of the rejections applied to Claim 37.
Regarding new Claim 40,
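Claim 40 recites the non-transitory storage device of claim 37, where the non-transitory storage device further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 29, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee as indicated in Claim 29, in view of the rejections applied to Claim 37.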
Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over 
Honda et al., U.S. PGPUB 2019/0277913, filed 3/8/2019 [hereafter referred as Honda], in view of Bilenko et al., U.S. PGPUB 2014/0337096, published 11/13/2014 [hereafter referred as Bilenko], in further view of Graefe, Goetz, Query Evaluation Techniques for Large Databases, ACM Computing Surveys, Vol.25, No.2, June 1993 [hereafter referred as Graefe], in even further view of Brownlee, Jason, Bagging and Random Forest Ensemble Algorithms for Machine Learning, retrieved from web.archive.org dated June 25, 2019 (http://web.archive.org/web/20190625001106/https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/), 9 pages [hereafter referred as Brownlee] as applied to Claim 21; in even further view of Won et al., Random Forest Model for Silicon-to-SPICE Gap and FinFET Design Attribute Identification, IEIE Transactions on Smart Processing and Computing, Vol.5 No.5, October 2016, pp.358-365 [hereafter referred as Won].
Regarding new Claim 30, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(New) The method of claim 21 … by a processing unit …
an importance of one or more features (Examiner’s note: Brownlee teaches applying bagging/bootstrap aggregation techniques to a plurality of decision tree/random forest trees to identify and determine important variables that lead to a particular prediction or outcome, in order to identify subsets of input variables that may be most or least relevant to the problem (Brownlee p.2 Variable Importance: “As the Bagged decision trees are constructed, we can calculate how much the error function drops for a variable at each split point … These drops in error can be averaged across all decision trees and output to provide an estimate of the importance of each input variable. The greater the drop when the variable was chosen, the greater the importance. These outputs can identify subsets of input variables that may be most or least relevant to the problem and suggest at possible feature selection experiments you could perform where some features are removed from the dataset.”).) …
While Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches applying bagging/bootstrap aggregation techniques to a plurality of decision tree/random forest trees to determine variable importances, Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee does not explicitly teach 
further comprising … based on the relationship, determining at least one of:
an importance … with respect to the at least one label, the importance being representative of a level of contribution of the features in the at least one label; and 
an impact of one or more features with respect to the at least one label, the impact being representative of whether the one or more features increase or decrease the at least one label.
Won teaches
 further comprising … based on the relationship, determining at least one of:
an importance … with respect to the at least one label, the importance being representative of a level of contribution of the features in the at least one label (Examiner’s note: Won teaches calculating an importance index based on a ratio of the sum of nodes at which a particular design attribute is used to split the S2S gap data (representing a prediction result located at a particular terminal node) into the next nodes, and the sum of all nodes in the random forest model except for terminal nodes (Won p.363 Equation 8), where larger importance values indicate which design attributes have a larger contribution towards determining the selection of a particular S2S gap data in the model (Won p.363 col.1 3rd paragraph and col.2 3rd paragraph (Section 3.3 Significant Design Attributes 1st paragraph); and p.363 Figure 12).); and 
an impact of one or more features with respect to the at least one label, the impact being representative of whether the one or more features increase or decrease the at least one label (Examiner’s note: Won teaches calculating an impact index for each design attribute based on the mean values shifts in the path from one node to a next left node during random forest node traversal (Won p.363 col.2 Equations 9 and 10), where larger minus or plus impact values indicate which design attributes have more power (influence) to drive (effect) the selection of a particular S2S gap data (Won p.363 col.2 2nd and 4th paragraphs (Section 3.3. Significant Design Attributes); and p.364 Figure 13).).  
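For illustration only, the importance ratio as characterized in the Examiner's note above (the number of nodes at which an attribute is used to split, divided by the number of all non-terminal nodes in the forest) can be sketched as follows. This is one reading of that description and not a reproduction of Won Equation 8; the forest representation, the attribute names, and the function `importance_index` are hypothetical. The impact index is not sketched because Won Equations 9 and 10 are not reproduced in this action.

```python
from collections import Counter

def importance_index(forest):
    """Return, per attribute, the share of non-terminal nodes that split on it.

    `forest` is a list of trees, where each tree is given as the list of
    attribute names used at its non-terminal (split) nodes.
    """
    split_counts = Counter(attr for tree in forest for attr in tree)
    total_splits = sum(split_counts.values())
    return {attr: count / total_splits for attr, count in split_counts.items()}

# Hypothetical forest of two trees with four split nodes in total.
forest = [
    ["gate_length", "fin_pitch", "gate_length"],
    ["gate_length"],
]
importance = importance_index(forest)
```

In this sketch, `gate_length` is used at three of the four split nodes and therefore receives an importance of 0.75, while `fin_pitch` receives 0.25.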
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee and Won are analogous art since they both teach processing semiconductor manufacturing data using bootstrap aggregation techniques.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the bootstrap aggregation techniques taught in Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee and apply learning data related to semiconductor design attributes and S2S gap data taught in Won to further analyze and determine importance and influence of certain design attributes/features (represented by the nodes in a decision tree). The motivation to combine is taught in Won, since metrics such as importance and impact allow process engineers to identify those design attributes/features that contribute most to the S2S gap decision output, where this S2S gap is used as a measure of quality for improving chip yield. By identifying the most relevant or contributing design attributes through this analysis, process engineers can focus on either minimizing their influence (if they have a negative impact on the final output result or prediction) or maximizing their influence (if they have a positive impact on the final output result or prediction), and as such, the use of this bootstrap method to identify importance variables and their associated impact provides valuable diagnostic information to improve overall chip yield in a manufacturing system (Won p.358 Section 1. Introduction 1st paragraph: “To accelerate product yield ramp-up, it is important to characterize a silicon device accurately by measuring a device-under-test (DUT) designed exactly the same as in real production chips. …”; p.359 col.1 1st paragraph (Section 1 Introduction): “S2S gap may come from incorrect modeling for particular design layouts, high layout sensitivity to process fluctuation or defects in layouts, etc. Finding design attributes that result in a large S2S gap and fixing the causes related to the design attributes, such as layout features, are crucial for timely yielding of ramp-up. 
But the number of design attributes is increasing significantly in the recent technology node, and the impacts of design attributes are sometimes interdependent. So it becomes more and more difficult to accurately analyze the impact of individual design attributes …”; and p.364 col.1 2nd paragraph-col.2 2nd paragraph: “As importance indicates, the S2S gap is clearly classified by the identified significant design attributes … This means that the design attributes identified by importance have an important role in determining the S2S gap. … As impact indicates, the S2S gap is verified to show a clear trend of a larger minus S2S gap under the following design attribute conditions … This means that the design attributes and values (i.e., conditions) identified by the minus value of impact, surely drive the S2S gap into the minus direction. Conversely, the design attributes and values identified by the plus value of impact have a driving force into the plus direction, as well.”).
Claims 26-27, 35, and 39 are rejected under 35 U.S.C. 103 as being unpatentable over 
Honda et al., U.S. PGPUB 2019/0277913, filed 3/8/2019 [hereafter referred as Honda], in view of Bilenko et al., U.S. PGPUB 2014/0337096, published 11/13/2014 [hereafter referred as Bilenko], in further view of Graefe, Goetz, Query Evaluation Techniques for Large Databases, ACM Computing Surveys, Vol.25, No.2, June 1993 [hereafter referred as Graefe], in even further view of Brownlee, Jason, Bagging and Random Forest Ensemble Algorithms for Machine Learning, retrieved from web.archive.org dated June 25, 2019 (http://web.archive.org/web/20190625001106/https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/), 9 pages [hereafter referred as Brownlee] as applied to Claims 21, 32, and 37; in even further view of Chen, Hongge, Novel Machine Learning Approaches for Modeling Variations in Semiconductor Manufacturing (Masters Thesis), Massachusetts Institute of Technology June 2017, 96 pages [hereafter referred as Chen].
Regarding new Claim 26, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee as applied to Claim 21 teaches
(New) The method of claim 21.
However Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee does not explicitly teach
wherein the randomizing comprises randomizing, for each aggregated set of data of the reduced training set of data, at least one of: 
the aggregated representation of at least one label value associated with the aggregated set of data, and 
data representative of a number of the one or more sets of data of the group.  
Chen teaches
wherein the randomizing comprises randomizing, for each aggregated set of data of the reduced training set of data, at least one of: 
the aggregated representation of at least one label value associated with the aggregated set of data (Examiner’s note: Chen teaches determining an expected number of good packages by randomly packaging the dies into packages and estimating the ‘pass’ count through a binomial distribution, where the ‘pass’ count is associated with a label that is associated with the test result of a chip die, and the expected number of good packages represents an aggregated representation for the label (Chen pp.48-50 Section 4.4 Mathematical Formulation: “With a large number of testing dies, we can estimate the underlying probability by the relative frequency. We denote “positive” or fail by 1, and “negative” or pass by 0. … Without any classifiers, if we randomly package the dies into packages or stacks with s die in the stack, the failure rate of the packages is $p_{\text{package fail}} = 1 - p(H=0)^{s}$. … Using the law of total expectation, the expected number of good packages in this case is given by $\mathbb{E}[m_2(s)] = \mathbb{E}[\mathbb{E}[m_2(s) \mid k]] = \mathbb{E}[k\,p(H=0 \mid y=0)^{s}] = (n/s)\,p(H=0 \mid y=0)^{s}\,p(y=0)$, where k is the number of stacks packaged as high end products and k is subject to a binomial distribution B(n/s, p(y=0)).”).), and
data representative of a number of the one or more sets of data of the group.  
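As a worked illustration of the two expressions quoted from Chen above (the package failure rate under random packaging, and the expected number of good packages), the following minimal sketch evaluates them for hypothetical inputs. The function names and all numeric values are assumptions chosen for illustration and do not come from Chen.

```python
def package_fail_rate(p_die_pass: float, s: int) -> float:
    """Failure rate of a package of s dies: 1 - p(H=0)^s."""
    return 1.0 - p_die_pass ** s


def expected_good_packages(n: int, s: int,
                           p_pass_given_neg: float, p_neg: float) -> float:
    """Expected number of good packages: (n/s) * p(H=0|y=0)^s * p(y=0)."""
    return (n / s) * p_pass_given_neg ** s * p_neg


# Hypothetical inputs: 64 dies, 4 dies per stack.
fail_rate = package_fail_rate(0.9, 2)               # approximately 0.19
expected = expected_good_packages(64, 4, 0.9, 0.5)  # approximately 5.2488
```

The second function applies the law of total expectation in the form quoted above: the mean of k under B(n/s, p(y=0)) is (n/s)·p(y=0), and each of the k stacks is good with probability p(H=0|y=0)^s.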
Both Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee and Chen are analogous art since they both teach techniques for processing semiconductor manufacturing data to predict chip yield.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the reduced training set of data taught in Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee and further perform the steps of randomly selecting a package die pass/fail value (where the package die pass/fail value represents a label) based on a binomial distribution taught in Chen as a way to further generate a plurality of randomized training sets of data. The motivation to combine is taught in Chen, as a way to linearly approximate an overall yield improvement for a number of dies in a package that closely matches the predicted values when a classifier is trained on a training set of data (see Chen p.56 Figure 4-9), thus providing a reliable way to quickly estimate the expected yield improvement (Chen p.57 1st paragraph: “Recalling Equation (4.18), the expected yield improvement is a function of TPR and FPR. However, FPR and TPR are constrained by the ROC curve of the classifier. Then the optimal point (FPR*, TPR*) is where the contour plot is tangent to the ROC curve and the corresponding optimal threshold y* is determined for future prediction. Figure 4-9 gives the ROC curve and contour plots of expected yield improvement $\mathbb{E}[m_2(s) - m_1(s)]/(n/s)$ with different s (number of dies in a package). From the contour plot we can see that our linearization in Equation (4.19) is a good approximation even at s = 16, which can be used as a fast estimation of the expected yield improvement.”). 
Regarding new Claim 27, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Chen teaches
(New) The method of claim 26, wherein: 
randomizing the aggregated representation of at least one label value associated with the aggregated set of data is based on a probability distribution characterizing probability to obtain different values for the aggregated representation of the at least one label value (Examiner’s note: As indicated earlier in the Claim Objections section, Claim 27 is a dependent claim of Claim 26, and thus also inherits the “at least one of” aspect recited in Claim 26 for the two claim limitations that are identified and being further limited in scope in Claim 27, resulting in the interpretation of these two limitations in Claim 27 as also having an exclusive “or” relationship (such that the presence of either claim limitation is sufficient for the method). As indicated earlier, Chen teaches determining an expected number of good packages by randomly packaging the dies into packages and estimating the ‘pass’ count through a binomial distribution, where a binomial distribution is a probability distribution characterizing probability to obtain different values representing the packaged die pass/fail value (Chen pp.48-50 Section 4.4 Mathematical Formulation).); or
randomizing data representative of a number of the one or more sets of data of the group is based on a probability distribution characterizing probability to obtain different values of this data representative of a number of the one or more sets of data of the group.  
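By way of example only, randomizing a value based on a probability distribution, as discussed for Claim 27 above, can be pictured as drawing a pass count from a binomial distribution. The sketch below is an assumption-laden illustration and is not taken from the claims or from Chen; the function names, the die count of 8, the pass probability of 0.75, and the seed are all hypothetical.

```python
import random
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k passes among n independent dies."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def sample_pass_count(n: int, p: float, rng: random.Random) -> int:
    """Draw one randomized pass count as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(0)  # fixed seed so the draws are repeatable
draws = [sample_pass_count(8, 0.75, rng) for _ in range(5)]
pmf = [binomial_pmf(k, 8, 0.75) for k in range(9)]
```

Here `pmf` characterizes the probability of obtaining each possible value of the count, and `draws` holds randomized values obtained from that distribution.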
Regarding new Claim 35,
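Claim 35 recites claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 26, and hence is rejected under similar rationale and motivations provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee and Chen as indicated in Claim 26, in view of the rejections applied to Claim 32.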
Regarding new Claim 39,
Claim 39 recites the non-transitory storage device of claim 37, where the non-transitory storage device further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 26, and hence is rejected under similar rationale and motivations provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee and Chen as indicated in Claim 26, in view of the rejections applied to Claim 37. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Won et al., Statistical Design Attribute Identification for FinFET Outlier and Silicon-to-SPICE Gap, 2016 29th IEEE International System-On-Chip Conference (SOCC), September 2016, pp.35-40, where this paper teaches performing Monte-Carlo random sampling to compute semiconductor attribute commonality, where attribute commonality is identified as a metric to determine impact of device characteristics, and where attribute commonality analysis can be used with decision tree methods to determine relationships between the S2S gap and semiconductor attribute features.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
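A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.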
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121                                                                                                                                                                                                        

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121