DETAILED ACTION
The applicant’s request for continued examination regarding application number 16/545,708, filed August 20, 2019 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on April 13, 2022 has been entered. 

Response to Amendments
The amendment filed April 13, 2022 has been entered, which references the amendment submission filed March 31, 2022. Examiner acknowledges receipt of Amendments to Application 16/545,708, which include: Amendments to the Claims, and Remarks containing Applicant’s amendments. 
Regarding Applicant’s Remarks, Examiner acknowledges Claims 21, 26-27, 32, 35, 37, and 39 have been amended, with Claims 1-20 previously cancelled. Claims 21-40 remain pending in the application. 
Regarding Applicant’s Remarks, Examiner acknowledges Applicant’s Amendments to the Claims have resolved the objection identified in Claim 27, and therefore the identified claim objection previously set forth in the Final Office Action mailed March 9, 2022 is withdrawn. 

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/545,708, which include: Remarks containing Applicant’s arguments. 
Regarding Applicant’s Remarks on the rejections of Claims 21-25, 28-29, 31-34, 36-38, and 40 under 35 U.S.C. 103 as being unpatentable over Honda et al., U.S. PGPUB 2019/0277913, filed 3/8/2019 [hereafter referred as Honda], in view of Bilenko et al., U.S. PGPUB 2014/0337096, published 11/13/2014 [hereafter referred as Bilenko], in further view of Graefe, Goetz, Query Evaluation Techniques for Large Databases, June 1993 [hereafter referred as Graefe], in even further view of Brownlee, Jason, Bagging and Random Forest Ensemble Algorithms for Machine Learning, retrieved from web.archive.org dated June 25, 2019 [hereafter referred as Brownlee]; of Claim 30 under 35 U.S.C. 103 as being unpatentable over Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee as applied to Claim 21, in even further view of Won et al., Random Forest Model for Silicon-to-SPICE Gap and FinFET Design Attribute Identification, October 2016 [hereafter referred as Won]; and of Claims 26-27, 35, and 39 under 35 U.S.C. 103 as being unpatentable over Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee as applied to Claims 21, 32, and 37, in even further view of Chen, Hongge, Novel Machine Learning Approaches for Modeling Variations in Semiconductor Manufacturing (Masters Thesis), June 2017 [hereafter referred as Chen]: Examiner acknowledges Applicant’s arguments, has considered them, and finds them not persuasive. Examiner notes that the majority of Applicant’s arguments are directed to new limitations in the amended claims which have not been previously presented, and thus necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to Applicant’s amended claims are provided in the relevant sections indicated below. However, Examiner has noted that Applicant’s arguments contain certain broad assertions, which are addressed in the following paragraphs.
Regarding applicant’s Remarks:
“… The OA relies on paragraphs [0132]-[0135] of Honda and pp. 48-50 of Chen as teaching related features of previously presented claims 21, 26 and 27. Applicant respectfully disagrees.
The cited paragraphs and Honda in general teach to avoid bias in a skewed dataset (i.e., one with a small minority of defective parts) by oversampling the minority class. Applicant submits that this is entirely unrelated to the quoted features of claim 21, wherein a plurality of randomized reduced training sets of data is produced by randomizing the number of sets of data in each group a plurality of times, in the manner recited. The oversampling method of Honda describes a process whereby a minority class (the set of defective parts) is sampled at a higher rate than a majority class (the set of non-defective parts). In other words, a proportionately larger subset of the minority class is sampled than of the majority class. Applicant submits that this has no relation to the methods of claim 21, which describe an aggregation process of building groups from an entire training set of data. In other words, claim 21 is completely unrelated to sampling or oversampling processes, as a training set of data is not sampled, but rather is obtained and processed, in the manner recited, to obtain a plurality of randomized reduced training sets of data.”
	Examiner has considered this argument, and finds the argument to be not persuasive. Examiner identifies the following limitations from independent Claim 21 according to the Final Office Action mailed March 9, 2022:
	“… randomizing the reduced training set of data in order to obtain a plurality of randomized reduced training sets of data; 
using the plurality of randomized reduced training sets of data in a classification algorithm implementing a plurality of decision trees for determining a relationship between the at least one label and the features of the electronic items, wherein each randomized reduced training set of data is used in a different decision tree of the plurality of decision trees of the classification algorithm for determining the relationship between the at least one label and the features of the electronic items based on respective outputs of the plurality of decision trees …”
	Examiner points out that Applicant’s assertions regarding randomizing the number of sets of data in each group a plurality of times are directed to Applicant’s newly amended claim limitations that were not presented earlier, which require further examination and re-evaluation, and therefore will not be addressed here. Hence, Examiner will address the argument based on Applicant’s following assertion: “… The oversampling method of Honda … Applicant submits that this has no relation to the methods of claim 21, which describe an aggregation process of building groups from an entire training set of data. In other words, claim 21 is completely unrelated to sampling or oversampling processes, as a training set of data is not sampled, but rather is obtained and processed, in the manner recited, to obtain a plurality of randomized reduced training sets of data”, where Applicant asserts that the Honda reference does not teach a randomization method for training sets of data. Examiner points out to Applicant that MPEP 2111 requires that, during patent examination, the pending claims be given their broadest reasonable interpretation consistent with the specification, and that an Examiner construe claim terms in the broadest reasonable manner during prosecution in an effort to establish a clear record of what applicant intends to claim. Hence, referring back to the above recited limitations from the Final Office Action mailed March 9, 2022, under their broadest reasonable interpretation, those limitations recite randomizing a reduced training set of data to obtain a plurality of randomized reduced training sets of data.
As provided in the recited claim limitations, there are several aspects associated with generating a randomized reduced training set of data: 1) reducing a training set of data to generate a reduced training set of data; 2) randomizing the reduced training set of data to generate a randomized reduced training set of data, where the randomized reduced training set of data is used in a classification algorithm containing a plurality of decision trees to determine an association (relationship) between features and a label; and 3) applying each set of randomized reduced training set of data in a different decision tree. The following paragraphs discuss each aspect in detail with respect to the prior art.
Regarding the first aspect (“… the reduced training set of data …”), Examiner points out this first aspect broadly recites reducing a training set of data to generate a reduced training set of data, where this concept of obtaining a reduced training set of data is taught in Honda, where Honda Figure 3 and [0035]-[0041], [0047]-[0051]; Figure 7 and [0059], and [0123]-[0130] teach a machine learning pipeline in which various preprocessing, data cleaning, feature engineering, and feature selection steps are applied to a dataset to remove unwanted data and features from a dataset and to aggregate data fields, with additional dimensionality reduction techniques to further reduce the number of features in the dataset, resulting in a reduced training set of data ([0035]-[0041]: “… The FDC pipeline 300 includes a first pipe 310 configured for cleansing the dataset … in particular, to remove unwanted data and features from the dataset … In a second pipe 320, the training dataset is converted into useful target features … time series data can be converted using customized complex feature engineering … These feature engineering techniques could include, but are not limited to:1) a statistics-based feature, like minimum, maximum, and 10-90 percentile range … A third pipe 330 implement a method for coherent and efficient feature selection …”; [0051]: “… The feature selection pipe 330 can be implemented … by making an early determination as to which sensors and/or manufacturing steps may not be providing useful data for training the ML model, and to remove them from training the model. …”; [0124]: “The dataset can be prepared for modeling by assigning to each chip the raw measurement fields from PCM, WS and FT as well as augmenting the raw fields with the engineered/enriched features … A large number of features can degrade the performance of some modeling techniques … This issue is mitigated by performing dimensionality reduction. …”). 
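For illustration, the statistics-based feature engineering quoted from Honda [0035]-[0041] (minimum, maximum, and 10-90 percentile range) can be sketched as follows. This is a minimal sketch only: the function name summarize_trace and the sample values are hypothetical and do not appear in Honda, and reading "10-90 percentile range" as the 90th percentile minus the 10th percentile is an assumption.

```python
import statistics


def summarize_trace(values):
    """Reduce one raw measurement series to three statistics-based features.

    Hypothetical helper: name, signature, and output keys are assumptions
    made for illustration, not taken from any cited reference.
    """
    # quantiles(n=10) returns the nine decile cut points; the first is the
    # 10th percentile and the last is the 90th percentile.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        # Assumed reading of "10-90 percentile range": p90 minus p10.
        "p10_p90_range": deciles[-1] - deciles[0],
    }


# Hypothetical series of 101 evenly spaced sensor readings.
features = summarize_trace(list(range(101)))
```

A long series is thereby replaced with a small, fixed number of values, which is the sense in which such feature engineering reduces the data carried forward into training.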
Furthermore, as indicated in the Final Office Action mailed March 9, 2022, the additional limitations reciting certain techniques for dividing and aggregating the data to build a reduced training set of data are further taught in the Bilenko and Graefe references, where those corresponding claim mappings and motivations are further detailed in the Final Office Action, and thus will not be additionally restated here. Hence, given the above evidence, the Honda, Bilenko, and Graefe references are within scope of Applicant’s claimed invention and properly teach the first aspect of reducing a training set of data into a reduced training set of data, and as such, Applicant’s argument is not persuasive, and the prior art is maintained.
Regarding the second aspect (“randomizing … to obtain a plurality of randomized reduced training sets of data; using the plurality of randomized reduced training sets of data in a classification algorithm implementing a plurality of decision trees for determining a relationship between the at least one label and the features of the electronic items”), Examiner points out this second aspect broadly recites applying a randomization method on the generated training set of data to generate a plurality of randomized training sets of data, where the randomized training sets of data are used in a classification algorithm containing a plurality of decision trees to determine an association between features and a label. Examiner additionally points out that this concept of applying a randomization method on a generated training set of data (corresponding to a reduced training set of data) to generate a plurality of randomized training sets of data is also taught in Honda, where Honda [0132]-[0135] lists several methods for randomizing training sets of data ([0133]-[0135]: “1. Random oversampling. 2. Synthetic minority oversampling technique (SMOTE). 3. Bagging.”). Examiner notes that while Applicant’s argument is based on Applicant’s assertion that random oversampling is not within the scope of the above recited limitations, Applicant does not explicitly address the “bagging” (i.e., bootstrap aggregation) method taught in Honda (and also cited in the Final Office Action) as one of the randomization methods for generating randomized training sets of data to reduce bias in the training set (where bias is also interpreted as a form of variance in a data set).
Honda further teaches applying the training sets after the randomization method into various model architectures implementing various machine learning algorithms (including random forests) to generate classification results for the RMA’ed chips (representing failures during SLT) that correspond to failure codes indicating the subsystem/stage in which the failure occurred, where the random forest algorithm represents a classification algorithm containing a plurality of decision trees, and with the resulting classification based on the processed training sets (containing feature data from each WAT/PCM, WS, CP, FT subsystem/stage) that identifies an association between RMA’ed chips and failure codes (labels) (Honda [0123]-[0137]; Figures 12-14, [0140]-[0144]: “… One possible single-level model architecture … inputs the raw and engineered features into a machine learning model … Decision trees, Random forests … are examples of non-parametric models that can be used …”). Examiner additionally points out that a person having ordinary skill in the art would understand the term “bagging” taught in the Honda reference to be an abbreviation for the term “bootstrap aggregation”, which is a known term of art, that refers to a randomization method that involves creating a plurality of random samples from a dataset and applying each random sample to each decision tree of a plurality of decision trees. Examiner also notes that this definition (i.e., a dataset is randomly sampled multiple times to generate a plurality of samples to be used in a plurality of decision trees, where the plurality of decision trees can be a random forest) is also consistent with the definition of “bagging” provided in the Brownlee reference that is used to teach another limitation (Brownlee p.2 1st-6th paragraphs: “Bootstrap Aggregation (or Bagging for short) is a simple and very powerful ensemble method. 
… Bootstrap Aggregation is a general procedure that can be used to reduce the variance for those algorithm that have high variance. … Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees. … Let’s assume we have a sample dataset of 1000 instances (x) … Bagging of the CART algorithm would work as follows. 1. Create many (e.g., 100) random sub-samples of our dataset with replacement. 2. Train a CART model on each sample. 3. Given a new dataset, calculate the average prediction from each model.”; and p.2 Random Forest: “Random Forests are an improvement over bagged decision trees. … It is a simple tweak. … The random forest algorithm changes this procedure so that the learning algorithm is limited to a random sample of features of which to search. …”). Examiner also notes that the application of the bootstrap aggregation/bagging method is also consistent with the Applicant’s original specification, where the “bootstrap” method is indicated as a form of performing randomization of training sets of data (p.25 2nd paragraph: “… random sampling (also called “bootstrap” …) can be performed on the raw training set of data …”; and p.27 2nd paragraph: “… other randomization method can be used to randomize the reduced training set of data (e.g., boot strap, etc.).”). Hence, Applicant’s assertion that the Honda reference does not teach a method that produces a randomized reduced training set of data is not persuasive, and the prior art rejection is maintained.
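For illustration, the three bagging steps quoted from Brownlee above (create random sub-samples with replacement, train a model on each sample, combine the predictions) can be sketched as follows. This is a minimal sketch under stated assumptions: a one-split decision stump stands in for a full CART tree, a majority vote stands in for averaging because the toy labels are categorical, and all function names and data values are hypothetical rather than taken from Honda, Brownlee, or the application.

```python
import random
from collections import Counter


def bootstrap_samples(rows, n_samples, seed=0):
    """Step 1: draw n_samples random sub-samples of rows, with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(rows) for _ in rows] for _ in range(n_samples)]


def train_stump(sample):
    """Step 2: fit a one-split stump on (feature_value, label) pairs.

    A stump is an assumed stand-in for a full decision tree.
    """
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagged_predict(stumps, x):
    """Step 3: combine the respective outputs of all stumps by majority vote."""
    votes = Counter(stump(x) for stump in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature value, pass/fail label) per item.
data = [(0.1, "pass"), (0.2, "pass"), (0.3, "pass"), (0.4, "pass"),
        (0.6, "fail"), (0.7, "fail"), (0.8, "fail"), (0.9, "fail")]

samples = bootstrap_samples(data, n_samples=25, seed=42)
stumps = [train_stump(sample) for sample in samples]  # one sample per stump
low_value_label = bagged_predict(stumps, 0.0)
high_value_label = bagged_predict(stumps, 1.0)
```

In this sketch each randomized sample is used to train exactly one stump, and the final label is derived from the respective outputs of all stumps.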
Additionally, Examiner points out that the third aspect (“… wherein each randomized reduced training set of data is used in a different decision tree of the plurality of decision trees … based on respective outputs of the plurality of decision trees”) broadly recites applying a plurality of randomized training sets of data into a plurality of decision trees, where each randomized training set of data is applied to a different decision tree to determine an association/relationship between at least one label output and associated features. This aspect of applying the plurality of randomized training sets of data is taught in the Brownlee reference, where the application of the bagging algorithm (which was established in the response to the preceding argument as performing random sampling of the original dataset to produce a plurality of randomized data sets) into a plurality of decision trees in a random forest results in respective outputs that can be further analyzed to determine the importance of an input variable and associated feature, and where a relationship between the label outputs of each decision tree and the features associated with the input is measured by a determination of importance of an input variable and associated feature with respect to the decision tree output values (Brownlee p.2 1st-6th paragraphs; p.2 Random Forest; and p.2 Variable Importance: “As the Bagged decision trees are constructed, we can calculate how much the error function drops for a variable at each split point … These drops in error can be averaged across all decision trees and output to provide an estimate of the importance of each input variable. … These output can help identify subsets of input variables that may be most or least relevant to the problem and suggest at possible feature selection experiments you could perform where some features are removed from the dataset.”).
Examiner also notes that this aspect is consistent with Applicant’s original specification, where the relationships between the features and label output from the classification algorithm are used to determine an importance of one or more features (p.30 1st-2nd paragraphs: “… the relationship between the features and the label which was built using the classification algorithm can be used to determine an importance of one or more features with respect to at least one label … A more specific definition … of the importance of a feature is provided in the article “… Random Forest Model for Silicon-to-SPICE Gap and FinFET Design Attribute Identification …”), and where the mentioned article (identified as the Won reference) teaches a bootstrap sampling method to randomly sample learning data and apply those samples to random forest trees to maximize variance reduction and determine importance indices between the outputs of the decision trees and the associated features (Won Section 2.1 and Won Section 3.2). As established in the Final Office Action mailed March 9, 2022, the motivation to combine the Honda, Bilenko, Graefe, and Brownlee references is provided in Brownlee, since determining variable importance is useful in identifying subsets of input variables that may be most or least relevant to the problem, thus allowing a way to identify and tune the model by removing certain features from the dataset, resulting in improved performance and accuracy of the classification model (Brownlee p.2 Variable Importance). Hence, given the above evidence in light of the recited limitations, the combination of the Honda, Bilenko, Graefe, and Brownlee references is within the scope of Applicant’s claimed invention and teaches the limitations as recited under their broadest reasonable interpretation, and as such, Applicant’s argument is not persuasive, and the prior art rejection is maintained.
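For illustration, the variable importance estimate described in the Brownlee passage cited above (error drops recorded at each split point, then averaged across all decision trees) can be sketched as follows. This is a minimal sketch only: the function name average_importance, the feature names, and the error-drop values are hypothetical, and the per-tree split records are assumed to have already been produced by some tree-building procedure that is not shown.

```python
from collections import defaultdict


def average_importance(per_tree_drops):
    """Average the summed error drops per input variable across all trees.

    per_tree_drops: one list per tree of (feature_name, error_drop) records,
    one record per split point. Hypothetical input format for illustration.
    """
    totals = defaultdict(float)
    for splits in per_tree_drops:
        for feature, drop in splits:
            totals[feature] += drop
    n_trees = len(per_tree_drops)
    # A variable never used by a given tree contributes zero for that tree.
    return {feature: total / n_trees for feature, total in totals.items()}


# Hypothetical split records for two bagged trees.
per_tree_drops = [
    [("leakage_current", 0.5), ("threshold_voltage", 0.25)],
    [("leakage_current", 0.25)],
]
importance = average_importance(per_tree_drops)
ranked = sorted(importance, key=importance.get, reverse=True)
```

Ranking the variables by this average gives the kind of ordering that can suggest which features are most or least relevant to the label.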
As noted above, Applicant’s amended claim limitations that were not presented earlier necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to Applicant’s amended claims are provided in the relevant sections indicated below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 21-25, 28-29, 31-34, 36-38, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Honda et al., U.S. PGPUB 2019/0277913, filed 3/8/2019 [hereafter referred as Honda], in view of Bilenko et al., U.S. PGPUB 2014/0337096, published 11/13/2014 [hereafter referred as Bilenko], in further view of Graefe, Goetz, Query Evaluation Techniques for Large Databases, June 1993 [hereafter referred as Graefe], in even further view of Brownlee, Jason, Bagging and Random Forest Ensemble Algorithms for Machine Learning, retrieved from web.archive.org dated June 25, 2019 [hereafter referred as Brownlee], in even further view of Kaempf, Ulrich, The Binomial Test: A Simple Tool to Identify Process Problems, May 1995 [hereafter referred as Kaempf].
Regarding amended Claim 21, 
Honda teaches
(Currently Amended) A method comprising, by a processing unit and a memory coupled to a non-transitory computer-readable memory medium (Examiner’s note: Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline, and where the outputs include predictions produced from a machine learning model (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).): …
… obtaining a training set of data comprising a plurality of sets of data each representative of an electronic item, each set comprising feature values for a plurality of features, and for at least one label (Examiner’s note: Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]).) …
… building a reduced training set of data which comprises an aggregated representation of the training set of data (Examiner’s note: As indicated earlier, Honda teaches a machine learning pipeline for generating and storing input data as training data for the one or more predictive models, where the machine learning pipeline includes performing feature data cleansing, feature selection, and feature engineering for datasets used to train a machine learning model, where the feature generation pipe performing the feature engineering uses statistical methods on the feature data (minimum, maximum and 10-90 percentile range) to perform feature engineering conversion of data, and where multidimensional analysis in the data cleansing step may involve removing unwanted data and features from the dataset using dimensionality reduction (Honda Figure 3, elements 310,320, 330; and [0035]-[0041], [0047]-[0051]; Figure 7 and [0059]; [0123]-[0130]).) …
… an aggregated representation of feature values for the one or more sets of data of the group (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a data value representing an aggregated representation of feature values. As indicated earlier, Honda teaches binning chips into numeric bins ranging from hardbin=1 to hardbin=n at the WS and FT levels, where each bin contains a count of respective chips, and each bin represents a label indicating pass (hardbin=1) or failure (hardbin > 1), where these associated bin counts and labels are further aggregated at the lot level at WS and FT subsystem/stages, such that the aggregated bin counts for each hardbin label are treated as aggregated features for each lot, with associated FT pass/fail labels (hence representing data corresponding to an aggregated representation of feature values) (Honda [0102]-[0109], [0116]-[0119]).) …
… an aggregated representation of at least one label value of the one or more sets of data of the group (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a data value representing an aggregated representation of at least one label value, such as a count of a label value. As indicated earlier, Honda teaches binning chips into numeric bins ranging from hardbin=1 to hardbin=n at the WS and FT levels, where each bin contains a count of respective chips, and each bin represents a label indicating pass (hardbin=1) or failure (hardbin > 1), where these associated bin counts and labels are further aggregated at the lot level at WS and FT subsystem/stages, such that the aggregated bin counts for each hardbin label are treated as aggregated features for each lot, with associated FT pass/fail labels (hence representing data corresponding to an aggregated representation of at least one label value) (Honda [0102]-[0109], [0116]-[0119]).) …
… data representative of a number of sets of data of the respective group (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a data value representing a number of sets of data, such as a count value. Honda teaches chip counts per wafer at both WS and FT stages, where these chip counts represent the number of chips at each stage, and thus represent data representative of a number of sets of data of the respective group (Honda [0098]-[0099] and [0112]-[0113]).) …
… randomizing … in order to obtain a plurality of randomized reduced training sets of data (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites applying a randomization method on the generated training set of data to generate a plurality of randomized training sets of data. Honda teaches applying machine learning techniques such as random oversampling and bagging on a dataset to avoid having a skewed dataset that exhibits a bias towards a majority class. A person having ordinary skill in the art would understand the term “bagging” to refer to “bootstrap aggregation”, which is a known term of art, and is a process that involves randomizing a dataset with replacement to produce a plurality of randomized training sets, where this aspect of performing random sampling with replacement within the bagging algorithm produces a plurality of randomized reduced training sets of data ([0132]-[0135]: “… Oversampling methods can be the following: 1. Random oversampling. 2. Synthetic minority oversampling technique (SMOTE). 3. Bagging.”).) …
… using the plurality of randomized reduced training sets of data in a classification algorithm implementing a plurality of decision trees for determining a relationship between the at least one label and the features of the electronic items (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites using a plurality of randomized reduced training sets of data in a classification algorithm implementing a plurality of decision trees to determine an association between at least one label and associated features of the electronic items. As indicated earlier, Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]; Honda Figure 3, elements 310,320, 330; and [0035]-[0041], [0047]-[0051]; Figure 7 and [0059]; [0123]-[0130]). As indicated earlier, Honda additionally teaches applying machine learning techniques including bagging (i.e., bootstrap aggregation), which is a process that involves randomizing a dataset with replacement to produce a plurality of randomized training sets. 
Honda Figures 12-14 further teach applying the processed training sets into various model architectures implementing various machine learning algorithms (including random forests) to generate classification results for the RMA’ed chips (representing failures during SLT) that correspond to failure codes indicating the subsystem/stage in which the failure occurred, where the random forest algorithm represents a classification algorithm containing a plurality of decision trees, with the resulting classification based on the processed training sets (containing feature data from each WAT/PCM, WS, CP, FT subsystem/stage) that identifies an association between those features present in the RMA’ed chips and associated failure codes (labels). Hence, Honda teaches a process that corresponds to using a plurality of randomized reduced training sets of data in a classification algorithm to determine a relationship between at least one label and features of electronic items (Honda [0123]-[0137], in particular [0137]: “An RMA can be viewed as a failure of an independent SLT … Some of the failures can be captured from PCM, WS and FT data … It therefore can be necessary to classify the RMA’ed chips into failure codes indicating the subsystem that failed …”; Figures 12-14, [0140]-[0144]: “… One possible single-level model architecture … inputs the raw and engineered features into a machine learning model … Decision trees, Random forests … are examples of non-parametric models that can be used …”).) …
… wherein each randomized reduced training set of data is used in … the classification algorithm for determining the relationship between the at least one label and the features of the electronic items (Examiner’s note: As indicated earlier, Honda teaches a classification method using decision trees and input data from a semiconductor manufacturing process, where this classification method based on computed and measured feature data from WAT/PCM, WS, CP, FT testing represents a determination of the relationship between at least one label and features of the electronic items (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]; [0123]-[0137]; and Figures 12-14, [0140]-[0144]).) …
… storing an output of the classification algorithm in the non-transitory computer-readable memory medium (Examiner’s note: As indicated earlier, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory contains computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, and where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline in order to produce the outputs for a machine learning model (with the outputs marked as “predictions” in Honda Figures 12-14; Figure 3 and [0035]-[0041]; Figure 7 and [0059]).).
While Honda teaches data cleansing (a form of data pre-processing), feature selection, feature engineering, and bagging techniques to produce a plurality of randomized reduced sets of training data, Honda does not explicitly teach
… the building comprising: dividing the sets of data into a plurality of groups, wherein all sets of data, for which feature values meet at least one similarity criterion, are in the same group …
… storing in the reduced training set of data, for each group, at least one aggregated set of data … 
Bilenko teaches
… the building comprising: dividing the sets of data into a plurality of groups, wherein all sets of data, for which feature values meet at least one similarity criterion, are in the same group (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites the creation of a plurality of groups of data from a larger training set, where each group contains feature values that meet at least one similarity criterion. Bilenko teaches a master data set containing a plurality of training examples, with each training example containing event data and an associated label (“a plurality of sets of data”, Bilenko [0032]-[0033]). Bilenko further teaches a partitioning process to produce training set instances that represent a cluster or partition, where the partitioning process divides the original dataset into multiple partitions according to attribute values identified within each training example as aspect values. Bilenko additionally teaches an aspect value is identified with the event data within each training example, and combinations of aspect values can be used to produce partitions consisting of subsets of the master dataset, such that features within each partition are associated with each other based on similar rates, frequency, or common aspect values, and hence this partitioning process corresponds to “dividing the sets of data into a plurality of groups, wherein all sets of data, for which feature values meet at least one similarity criterion, are in the same group” (Bilenko Figure 1, elements 110, 112; [0033], [0036]-[0037]: “… a partitioning process 112 produces a plurality of partitions (also referred to as bins) for each aspect under consideration. Each partition is associated with a set of aspect values. 
… The partitioning process performs a similar partitioning process for other aspects, include both aspects associated with single attributes and aspects associated with combinations of attributes.”; and [0049]-[0053]: “… the training system produces clusters of aspect values, where the aspect values in each cluster exhibit similar label-conditioned statistical profiles. … the training system can identify the click-through rates associated with different individual user IDs. The training system can then form clusters of user IDs that have similar click-through rates. … the training system can use other partitioning strategies to produce the partitions. In one alternative technique, the training system can group aspect values based on a frequency measure. … the training system can group aspect values on the basis of shared aspect values. For example, the training system can ensure that all entries in a particular partition have at least one common aspect value (such as a particular user ID).”; and [0049]).) …
… storing in the reduced training set of data, for each group, at least one aggregated set of data (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites creating a reduced training set of data by aggregating sets of data in the training set. Bilenko teaches each of the identified partitions (based on aspect values) is further processed by an aggregation process, which generates additional associated statistical feature information based on the collected event data for each training example (with the statistical feature information corresponding to “… an aggregated representation …”), and is stored as additional feature information with the associated partition (with each partition identified as tables as shown in Bilenko Figure 3). This generated statistical feature information is associated with the corresponding data set instance as additional features such that representative instances for each cluster/partition can be identified as shown in Bilenko Figure 4 (Bilenko Figure 1, element 114; Figure 3; Figure 5, element 514; [0037]-[0038]: “… in a filtering and aggregation process 114, the training system identifies plural subsets of data, selected from the master dataset 110, for the different respective partitions. The training system then forms plural instances of statistical information based on the respective subsets of data for the respective partitions. … different implementations may generate different kinds of statistical measures. … the training system can form a count of the label values associated with the training examples. … The statistical information can also include various averages, ratios, etc.”; [0076]: “… the aggregation module 514 can form different statistical values. … the aggregation module 514 can form a count ($N_x^{+}$) of the number of click events for the aspect value x in question, as well as a count ($N_x^{-}$) for the number of non-click events … The aggregation module 514 can also form various other statistical measures, such as normalized counts, averages, standard deviations, click-through rates, other ratios (e.g. $\ln(N^{+}/N^{-})$), etc.”).) … 
Honda and Bilenko are analogous art because both teach data cleansing (a form of data pre-processing), feature selection, and feature engineering on an original training set of data, analyzing each set of data to produce a reduced training set of data.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the data cleansing/data pre-processing, feature selection, and feature engineering components taught in Honda and include data partitioning and aggregation functions to perform the partitioning and aggregation steps taught in Bilenko as a way to generate a reduced training set of data comprising aggregated representations of the original training set of data. The motivation to combine is taught in Bilenko, as a way to produce an aggregated and reduced training data set that represents an optimized training data set that minimizes the loss of predictive accuracy. It is desirable to perform data cleansing/data pre-processing and feature engineering techniques to generate aggregated and reduced training data sets that minimize the loss of predictive accuracy for a machine learning model, since these reduced and aggregated training data sets reduce both the amount of memory needed to store the data and the computation time needed to process the data in a machine learning model. Coupled with the benefit of minimizing loss of predictive accuracy, this allows the reduced training data set to provide prediction results without significantly sacrificing prediction accuracy (Bilenko [0048]-[0049]: “The training system may use different partitioning strategies to define partitions. In one approach, the training system can define partitions in a manner that satisfies an objective relating to loss of predictive accuracy. … Predictive accuracy refers to an extent to which the statistical information accurately represents the labels associated with individual training examples which contribute to the statistical information. 
… The training system minimizes the loss of predictive accuracy by performing clustering in such a manner that the instances of statistical information accurately characterize the members in the respective clusters, while minimizing, overall, the loss of descriptive information pertaining to specific members of the clusters.”).
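For illustration only, the partitioning and aggregation steps attributed to Bilenko above can be sketched as follows. This is a minimal sketch and not code from any cited reference: the function name `aggregate_by_aspect`, the field names `user_id` and `label`, and the sample data are hypothetical, and the counts `n_pos` and `n_neg` merely mirror the click and non-click counts quoted from Bilenko [0076].

```python
from collections import defaultdict

def aggregate_by_aspect(examples, aspect):
    """Group training examples by one aspect value and count labels per group.

    Hypothetical helper: each partition holds all examples sharing the same
    aspect value, and is summarized by positive/negative label counts.
    """
    groups = defaultdict(lambda: {"n_pos": 0, "n_neg": 0})
    for ex in examples:
        key = ex[aspect]
        if ex["label"] == 1:
            groups[key]["n_pos"] += 1
        else:
            groups[key]["n_neg"] += 1
    # One aggregated row per partition stands in for the raw examples.
    return {k: dict(v, count=v["n_pos"] + v["n_neg"]) for k, v in groups.items()}

# Assumed toy data: four labeled examples spread over two aspect values.
examples = [
    {"user_id": "u1", "label": 1},
    {"user_id": "u1", "label": 0},
    {"user_id": "u1", "label": 1},
    {"user_id": "u2", "label": 0},
]
reduced = aggregate_by_aspect(examples, "user_id")
```

In this sketch the four original examples are reduced to two aggregated rows, one per partition, each carrying the number of examples it represents.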
While Honda in view of Bilenko teaches aggregated representations using a statistical measure such as a normalized count (which is a ratio of a count of a feature divided over the total number of features, and where a count represents “data representative of a number of the one or more sets of data of the group”), Honda in view of Bilenko does not explicitly teach
… wherein for a plurality of the groups which comprise a plurality of sets of data, a number of aggregated set of data is less than the number of sets of data of the group … 
Graefe teaches
… wherein for a plurality of the groups which comprise a plurality of sets of data, a number of aggregated set of data is less than the number of sets of data of the group (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites further determining a plurality of sets of data for each of the plurality of groups, where the number of aggregated sets of data is less than the number of sets of data. Graefe teaches aggregating functions including generating sums and counts (without normalization) to represent a set of items, with these sums and counts corresponding to “data representative of a number of sets of data of the group” (Graefe p.98 col.2 Section 4 Aggregation and Duplicate Removal 1st paragraph: “Aggregation is a very important statistical concept to summarize information about large amounts of data. The idea is to represent a set of items by a single value or to classify items into groups and determine one value per group. Most database systems support aggregate functions for minimum, maximum, sum, count, and average (arithmetic mean).”). Graefe further teaches that, where aggregating data produces identical sample sets, the identified identical sample sets are removed, thus making the aggregated set of data smaller than the original sets of data in the group (Graefe p.98 col.2 last paragraph – p.99 col.1 1st paragraph (Section 4 Aggregation and Duplicate Removal): “Algorithms for aggregate functions require grouping … This grouping process is very similar to duplicate removal in which equal data items must be brought together, compared, and removed. … aggregate functions and duplicate removal are typically implemented in the same module. 
… in duplicate removal, items are compared on all their attributes, but only on the attributes in the by-list of aggregate functions … an identical item is immediately dropped from further consideration in duplicate removal whereas in aggregate functions some computation is performed before the second item of the same group is dropped … Because of their similarity, duplicate removal and aggregation are described and used interchangeably here.”).) … 
Honda in view of Bilenko, and Graefe, are analogous art because both teach managing and evaluating data sets containing sets of values using data aggregation techniques involving mathematical functions.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the data cleansing/data pre-processing, feature selection, and feature engineering components taught in Honda in view of Bilenko and enhance them to support other aggregation functions such as summation and counting, and to perform additional reduction techniques such as duplicate removal taught in Graefe, as a way to apply other techniques to generate a reduced training set of data comprising aggregated representations of the original training set of data. The motivation to combine is taught in Graefe, as aggregation techniques form the foundation of data processing techniques such as hashing and sorting, both of which help to optimize data storage when dealing with large amounts of data, thus allowing the system to be more memory efficient, and also allowing concise representations of data to be stored and partitioned for further pipelining and parallelization operations, in order to further optimize the performance and efficiency of the system (Graefe p.158 col.1 2nd-3rd paragraphs (Summary and Outlook): “A large set of query processing algorithms has been developed for relational systems. Sort- and hash-based techniques have been used for physical storage design, for associative index structures, for algorithms for unary and binary matching operations such as aggregation, duplicate removal, join, intersection, and division, and for parallel query processing using hash- or range partitioning. … Many of the existing algorithms will continue to be useful for extensible and object-oriented systems, and many can easily be generalized from sets of tuples to more general pattern-matching functions. … it allows algebraic optimizations of requests, i.e., optimizing transformations of algebra expressions and cost-sensitive translations of logical into physical expressions. 
Finally, it permits pipelining between operators to exploit parallel computer architectures and partitioning of stored data and intermediate results for most operators, in particular, for operators on sets but also for other bulk types such as arrays, lists, and time series.”).
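For illustration only, the distinction drawn from Graefe between duplicate removal (items compared on all attributes) and aggregation (items compared only on the by-list attributes, with one value computed per group) can be sketched as follows. This is a minimal sketch under assumed data: the function names, the tuple layout, and the sample rows are hypothetical and are not taken from Graefe.

```python
def remove_duplicates(rows):
    """Drop rows that are identical on all attributes, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def aggregate(rows, by, value):
    """Group rows on the by-list columns only and keep (sum, count) per group."""
    totals = {}
    for row in rows:
        key = tuple(row[i] for i in by)
        running_sum, running_count = totals.get(key, (0, 0))
        totals[key] = (running_sum + row[value], running_count + 1)
    return totals

# Assumed toy rows: (group, feature, measurement).
rows = [("A", 1, 5), ("A", 1, 5), ("A", 2, 7), ("B", 1, 3)]
unique_rows = remove_duplicates(rows)
per_group = aggregate(rows, by=(0,), value=2)
```

In both cases the output has fewer entries than the input: three unique rows remain from four, and two aggregated entries represent the four original rows.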
While Honda in view of Bilenko, in further view of Graefe teaches bagging/bootstrap aggregation techniques and model architectures utilizing decision tree/random forest classifiers for performing classification, Honda in view of Bilenko, in further view of Graefe does not explicitly teach
… each randomized reduced training set of data is used in a different decision tree of the plurality of decision trees … based on respective outputs of the plurality of decision trees …
Brownlee teaches
… each randomized reduced training set of data is used in a different decision tree of the plurality of decision trees … based on respective outputs of the plurality of decision trees (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites applying a plurality of randomized training sets of data into a plurality of decision trees, where each randomized training set of data is applied to a different decision tree to determine an association/relationship between at least one label output and associated features. Brownlee teaches bootstrap aggregation as an ensemble method, where the bootstrap aggregation method applies samples that were randomly selected (with replacement) to a plurality of different decision trees (which can be a random forest, with each decision tree in the random forest performing classification), with the number of decision trees corresponding to the number of generated random samples. Brownlee further teaches the respective outputs/predictions from the plurality of decision trees after applying the random samples are used to determine variable importances between a label output/prediction and the features, such that these variable importances represent relationship between the label output and the features (Brownlee p.1 2nd bullet: “… The Bootstrap Aggregation algorithm for creating multiple different models from a single training dataset.”; p.2 1st-6th paragraphs: “Bootstrap Aggregation (or Bagging for short) is a simple an very powerful ensemble method … Bootstrap Aggregation is a general procedure that can be used to reduce the variance for those algorithm that have high variance. … Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees. … Bagging of the CART algorithm would work as follows. 1. Create many (e.g., 100) random sub-samples of our dataset with replacement. 2. Train a CART model on each sample. 3. 
Given a new dataset, calculate the average prediction from each model. … if we had 5 bagged decision trees that made the following class predictions for an input sample: blue, blue, red, blue and red, we would take the most frequent class and predict blue. … The only parameters when bagging decision trees is the number of samples and hence the number of trees to include.”; p.2 Random Forest: “Random Forests are an improvement over bagged decision trees. … It is a simple tweak. … The random forest algorithm changes this procedure so that the learning algorithm is limited to a random sample of features of which to search. …”; and p.2 Variable Importance: “As the Bagged decision trees are constructed, we can calculate how much the error function drops for a variable at each split point … These drops in error can be averaged across all decision trees and output to provide an estimate of the importance of each input variable. The greater the drop when the variable was chosen, the greater the importance. These outputs can identify subsets of input variables that may be most or least relevant to the problem and suggest at possible feature selection experiments you could perform where some features are removed from the dataset.”).) …
Both Honda in view of Bilenko, in further view of Graefe and Brownlee are analogous art since both teach machine learning techniques involving the bagging/bootstrap aggregation method.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the bagging/bootstrap aggregation techniques taught in Honda in view of Bilenko, in further view of Graefe and apply these techniques to a random forest model taught in Brownlee as a way to identify and determine important or relevant variables and features for further tuning and improvement to the classification model. The motivation to combine is taught in Brownlee, since determining variable importances is useful in identifying subsets of input variables that may be most or least relevant to the problem, thus providing a way to tune the model by removing certain features from the dataset, resulting in improved performance and accuracy of the classification model (Brownlee p.2 Variable Importance).
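For illustration only, the bagging procedure quoted from Brownlee (random sub-samples drawn with replacement, one model trained per sample, and the most frequent class taken as the prediction) can be sketched as follows. This is a minimal sketch, not Brownlee's or Honda's implementation: a one-split decision stump is substituted for a full CART tree to keep the example short, and all function names, the seed, and the toy data are assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Return the most frequent class among the predictions."""
    return Counter(predictions).most_common(1)[0][0]

def train_stump(sample):
    """Toy stand-in for a decision tree: one threshold on one numeric feature."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        left_label = majority_vote(left)
        right_label = majority_vote(right) if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

def fit_bagged_stumps(data, n_models, seed=0):
    """Train one stump per bootstrap sample, so each model sees a different sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def predict(models, x):
    """Combine the respective outputs of all models by majority vote."""
    return majority_vote([model(x) for model in models])

# Assumed toy data: (feature value, label).
data = [(1, "pass"), (2, "pass"), (3, "pass"), (8, "fail"), (9, "fail"), (10, "fail")]
models = fit_bagged_stumps(data, n_models=25)
```

The vote in `majority_vote` matches the example in the quoted passage: predictions of blue, blue, red, blue and red yield blue.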
While Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches determining FT failure probabilities through a supervised learning step of using the chip count and hardbin WS data as features and FT pass/fail as a label to improve wafer yields (Honda [0005], [0109]), Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee does not explicitly teach
… randomizing the number of sets of data of each of the plurality of groups of the reduced training set of data a plurality of times … wherein randomizing the number of sets of data is performed based on a respective probability distribution characterizing a respective probability to obtain different values of the data representative of the number of sets of data of the respective group …
Kaempf teaches
… randomizing the number of sets of data of each of the plurality of groups of the reduced training set of data a plurality of times … wherein randomizing the number of sets of data is performed based on a respective probability distribution characterizing a respective probability to obtain different values of the data representative of the number of sets of data of the respective group (Examiner’s note: Under its broadest reasonable interpretation in light of Applicant’s specification p.25 last paragraph-p.26 5th paragraph, this limitation broadly recites determining a probability on a data value representative of the number of sets of data (where this data value is interpreted as a count value), where some randomness is introduced to obtain different values representing the data count value from a probability distribution (such as applying a random number generator). Kaempf teaches performing binomial distributions for different numbers of dice per wafer and different mean yields, where the binomial distribution applied to wafer yields represents a probability distribution that models the type and behavior of various defect sources in a wafer manufacturing process, as it approximates the evenly and randomly distributed, repetitive behavior of various types of defect sources (Kaempf p.160 Abstract and pp.160-161 Section I. Three Types of Defect Environments). 
Kaempf further teaches applying the binomial distribution to compute and predict the number of wafer lots containing good die based on a number of wafers (where this number of wafers containing good die represents a count value), as well as applying the binomial test to estimate and fit the wafer yield distribution, where the different count values of wafers are selected within a particular distribution range of the binomial distribution curve (such as two sigma limits of a binomial area) that is associated with a particular confidence level, where the confidence level represents a probability of occurrence on the binomial distribution curve. Kaempf additionally teaches that software-based random number generation is used to simulate these patterns and distributions, thus providing the randomness aspect to the binomial probability distribution, and as such, Kaempf teaches a process for determining probabilities on the wafer yield count to obtain different count values of the number of wafers by applying a probability distribution, such that these selected different wafer count values still represent the defect behavior and features present on the wafer according to a particular probability or confidence level based on the binomial probability distribution (Kaempf p.161 col.1 Section II. The Binomial Fit of Wafer Yield Distributions; p.161 col.1-col.2 Section III. Examples of Binomial Distributions; pp.163-164 Figures 4, 5, 6, and p.164 col.1-col.2 Section VI. Confidence Levels of the Binomial Test: “… the level of confidence in the Binomial Test needs to be assessed. … yield improvement during unit process development is measured with test wafers. … Guidelines to determine the number of wafers required to assess of the results from Binomial Test with a sufficient level of confidence are presented below. 
… It is plausible that all wafers randomly drawn from a binomial distributed Type-A population fall within a small portion of the binomial area … the probability that all N wafers belonging to a binomial population fall within the one-sigma limits is: $P_{1\sigma} = (0.68)^{N}$ … To declare Type-B behavior using the criteria that the sampled wafers fall within the two-sigma limits (95.4% of the binomial area), 24 wafers provide a 68% confidence level. To obtain a 95% confidence level, a minimum of 63 wafers is required. … With most Type-C yield distributions, it is possible to select a subset of wafers that can be fitted into a binomial distribution around the mean yield of the subset …”; and p.165 col.2 4th paragraph: “… Simulations and graphs shown in this paper have been generated with Hewlett-Packard’s RMB BASIC software. Random number generation has been used for the simulations of patterns and distributions.”).) …
Both Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee and Kaempf are analogous art since both teach methods for predicting probabilities to improve wafer yield in semiconductor manufacturing environments.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the randomized reduced training set of data taught in Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee and apply the binomial test technique taught in Kaempf as a way to estimate and predict the probability of wafer yields with a different number of test wafers. The motivation to combine is taught in Kaempf, since the binomial distribution provides a good estimation and fit for modeling the evenly and randomly distributed nature of several types of defect sources that are present on a wafer. Furthermore, Kaempf teaches that the number of wafers available in a unit test environment is limited, and producing more wafer samples for unit testing imposes additional overhead on the facility, and hence applying binomial distribution tests to identify a minimum set of wafers that still represent the defect behavior and features found in each wafer according to a particular probability or confidence level provides a more cost-effective and efficient technique to predict the probabilities of wafer yield without sacrificing accuracy (Kaempf p.164 col.1-col.2 Section VI. Confidence Levels of the Binomial Test).
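For illustration only, the arithmetic behind the confidence figures quoted from Kaempf, and the drawing of a randomized count from a binomial distribution, can be sketched as follows. This is a minimal sketch under stated assumptions: the relation "confidence = 1 - coverage^N" is an inference that approximately reproduces the quoted 68% and 95% figures and is not a formula stated in the quoted passage; the function names, the seed, and the parameters of `draw_binomial_count` are hypothetical.

```python
import random

def prob_all_within(coverage, n_wafers):
    """Probability that all n_wafers fall within limits covering `coverage` of the area."""
    return coverage ** n_wafers

def confidence(coverage, n_wafers):
    """Assumed relation: chance that at least one of n_wafers would fall outside the limits."""
    return 1.0 - prob_all_within(coverage, n_wafers)

def draw_binomial_count(n_trials, p_success, rng):
    """Draw one randomized count from Binomial(n_trials, p_success) via Bernoulli trials."""
    return sum(1 for _ in range(n_trials) if rng.random() < p_success)

# Two-sigma limits cover 95.4% of the binomial area (figure quoted from Kaempf).
conf_24 = confidence(0.954, 24)
conf_63 = confidence(0.954, 63)

# Assumed illustration: re-draw a group's count several times around its observed value.
rng = random.Random(0)
randomized_counts = [draw_binomial_count(100, 0.8, rng) for _ in range(5)]
```

Under the assumed relation, `conf_24` and `conf_63` come out close to the 68% and 95% confidence levels in the quoted passage, and each entry of `randomized_counts` is a different plausible count drawn from the same distribution.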
Regarding previously presented Claim 22, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf teaches
(Previously presented) The method of claim 21, wherein, 
for a feature, feature values of different sets of data meet the similarity criterion if at least one of (a), (b), (c) and (d) is met (Examiner’s note: Under its broadest reasonable interpretation, this claim limitation in a method claim recites a contingent clause that effectively renders the subsequent claim language to not be performed because the condition precedent (“if at least one of (a), (b), (c), and (d) is met”) is not required to be met, and the claimed invention can be practiced without the condition occurring. See MPEP 2111.04(II). Applicant is advised to amend the claim to positively recite the condition as being fulfilled, since no patentable weight is given for the subsequent claim language following a contingent clause that does not require the condition to be fulfilled for practicing the claimed invention. However, for the purposes of examination, this contingent clause will be treated as if the condition were fulfilled.): 
(a) the feature values are equal (Examiner’s note: Bilenko teaches aspect values in a training set are attribute values (with each aspect corresponding to “a feature”, and the respective aspect values corresponding to “feature values”) and grouping common aspect values is a form of comparing those inspected aspect values to ensure they are equal to each other (Bilenko [0053]: “… the training system can group aspect values on the basis of shared aspect values. … the training system can ensure that all entries in a particular partition have at least one common aspect value (such as a particular user ID).”).);
(b) the feature values do not differ one from the other more than a threshold; 
(c) the feature values are equal after the feature values have been approximated; 
(d) at least one of (a), (b), (c) is met and label values of the different sets of data are similar according to a second similarity criterion.  
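For illustration only, criteria (a) through (d) recited above can be sketched as simple predicates on a pair of feature values. This is a minimal sketch, not the claimed implementation: rounding is assumed as the form of approximation in (c), a label-difference threshold is assumed as the second similarity criterion in (d), and all function and parameter names are hypothetical.

```python
def values_equal(a, b):
    """Criterion (a): the feature values are equal."""
    return a == b

def within_threshold(a, b, threshold):
    """Criterion (b): the values differ by no more than a threshold."""
    return abs(a - b) <= threshold

def equal_after_rounding(a, b, ndigits):
    """Criterion (c), assuming rounding as the approximation."""
    return round(a, ndigits) == round(b, ndigits)

def similar_with_labels(a, b, label_a, label_b, threshold, ndigits, label_threshold):
    """Criterion (d): any of (a)-(c) holds and the labels are also similar.

    The second similarity criterion is assumed here to be a label-difference threshold.
    """
    features_similar = (
        values_equal(a, b)
        or within_threshold(a, b, threshold)
        or equal_after_rounding(a, b, ndigits)
    )
    return features_similar and abs(label_a - label_b) <= label_threshold

# Assumed toy values: two measurements that are close but not equal.
same_group = similar_with_labels(1.2, 0.9, 1, 1, threshold=0.1, ndigits=0, label_threshold=0)
```

Here the two values fail (a) and (b) but satisfy (c) after rounding to whole numbers, and the labels match, so the pair would fall in the same group under (d).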
Regarding previously presented Claim 23, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf teaches
(Previously presented) The method of claim 21, wherein the features correspond to manufacturing data of the electronic item (Examiner’s note: As indicated earlier, Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]).) and 
the at least one label corresponds to at least one quality attribute of the electronic item (Examiner’s note: As indicated earlier, Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]).).  
Regarding previously presented Claim 24, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf teaches
(Previously presented) The method of claim 21, wherein for a majority of the groups comprising a plurality of sets of data, a number of aggregated set of data is less than a number of the sets of data of the group by a magnitude of at least ten (Examiner’s note: Under its broadest reasonable interpretation, the term “magnitude” means size or extent, and hence the term “a number of aggregated set of data is less than a number of the sets of data in the group by a magnitude of at least ten” is interpreted to mean that the number of aggregated set of data is at least ten less than the number of sets of data. Bilenko teaches selecting a representative data set instance within a cluster representing a collection of data set instances with a common aspect value (“a plurality of sets of data”), where the selection of the representative data set corresponds to “wherein for a majority of the groups comprising a plurality of sets of data, a number of aggregated set of data is less than a number of the sets of data of the group …”. As shown in Bilenko Figure 4, the representative data set instance (represented by a black dot) is among at least a group of 11 or 12 other representative data set instances, hence corresponding to “a number of aggregated set of data is less than … by a magnitude of at least ten”).  
Regarding previously presented Claim 25, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf teaches
(Previously presented) The method of claim 21, 
wherein the aggregated representation of at least one label value of the one or more sets of data of the group comprises a sum of all label values of the one or more sets of data of the group (Examiner’s note: Bilenko teaches examples of statistical feature information that include counts of label values, with the statistical feature information that includes counts of label values corresponding to “an aggregated representation of at least one label value of the one or more sets of data of the group comprises a sum of all label values of the one or more sets of data of the group” (Bilenko Figure 1, element 114; Bilenko Figure 3; Bilenko Figure 5, element 514; [0038]: “… different implementations may generate different kinds of statistical measures. … the training system can form a count of the label values associated with the training examples.”).).  
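For illustration only, the per-group label count described in the note above can be sketched as follows. This sketch is not taken from Bilenko; the function name `label_sums_by_group`, the `(aspect value, label)` record layout, and the sample values are hypothetical, and labels are assumed to be 0/1 so that a count of positive labels equals their sum.

```python
from collections import defaultdict

def label_sums_by_group(examples):
    """Sum the 0/1 label values of all examples sharing the same aspect value."""
    sums = defaultdict(int)
    for aspect_value, label in examples:
        sums[aspect_value] += label
    return dict(sums)

# Hypothetical training examples: (aspect value, label)
examples = [("lot-1", 1), ("lot-1", 0), ("lot-2", 1), ("lot-1", 1)]
group_sums = label_sums_by_group(examples)
```

Under these assumptions, each group is represented by a single aggregated label value regardless of how many examples the group contains.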
Regarding previously presented Claim 28, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf teaches
(Previously presented) The method of claim 21, 
wherein the reduced training set of data comprises a plurality of aggregated sets of data (Examiner’s note: Bilenko teaches a received data set containing a plurality of training examples, with each training example containing event data and an associated label (Bilenko [0032]-[0033]). The partitioning process divides the received data set into multiple partitions according to attribute values identified within each training example as aspect values (Bilenko [0033]), where an aspect value is identified with the event data within each training example, and combinations of aspect values can be used to produce partitions consisting of subset of the received data set (with each of these partitions corresponding to a “reduced training set of data”), with the features within each partition being associated with each other based on similar rates, frequency, or common aspect values (Bilenko [0036]-[0037]; [0049]-[0053]). Each of the identified partitions based on aspect values is further processed by an aggregation process, which generates additional associated statistical feature information based on the collected event data for each training example (with the statistical feature information corresponding to “… an aggregated representation …”, Bilenko [0037]-[0038]), and is stored as additional feature information with the associated partition, with each partition containing one or more such statistical measures (corresponding to “the reduced training set of data comprises a plurality of aggregated sets of data”) (Bilenko Figure 1, elements 110, 112; [0040]: “… Each partition, in turn, provides statistical information generated for that partition, which may comprise one or more statistical measure that correspond to features”).), 
wherein the number of aggregated sets of data in the reduced training set of data does not increase if the training set of data is expanded with at least one set of data comprising feature values which are similar to feature values already present in the training set of data for at least one set of data, according to the similarity criterion (Examiner’s note: Under its broadest reasonable interpretation, this claim limitation in a method claim recites a contingent clause that effectively renders the subsequent claim language to not be performed because the condition precedent (“if the training set of data is expanded with at least one set of data …”) is not required to be met, and the claimed invention can be practiced without the condition occurring. See MPEP 2111.04(II). Applicant is advised to amend the claim to positively cite the condition as being fulfilled, since no patentable weight is given for the subsequent claim language following a contingent clause that does not require the condition to be fulfilled for practicing the claimed invention. However, for the purposes of examination, this contingent clause will be treated as if the condition were fulfilled. 
Graefe teaches introducing new sources of data through a read-ahead operation, and performing a merge operation that involves sorting and duplicate removal on a set of aggregated data receiving additional data, where the read-ahead operation is interpreted as “if the training set of data is expanded with at least one set of data”, and where the sorting and duplicate removal operations are interpreted as identifying “at least one set of data comprising feature values which are similar to feature values already present in the training set of data for at least one set of data, according to the similarity criterion”, with the duplicate removal producing a result in which “the number of aggregated sets of data in the reduced training set of data does not increase …”, i.e., the overall size of the reduced training data set does not grow (Graefe p.100 col.1 Section 4.2 Aggregation Algorithms Based on Sorting 1st – 3rd paragraphs: “Sorting will bring equal items together, and duplicate removal will then be easy. The cost of duplicate removal is dominated by the sort cost, and the cost of this naive duplicate removal algorithm based on sorting can be assumed to be that of the sort operation. For aggregation, items are sorted on their grouping attributes. This simple method can be improved by detecting and removing duplicates as early as possible, easily implemented in the routines that write run files during sorting. With such "early" duplicate removal or aggregation, a run file can never contain more items than the final output (because otherwise it would contain duplicates!), which may speed up the final merges significantly [Bitton and De Witt 1983]. … the operations discussed in the section on sorting, namely read-ahead using forecasting, merge optimizations, large cluster sizes, and reduced final fan-in for binary consumer operations, are fully applicable when sorting is used for aggregation and duplicate removal.”).).  
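For illustration only, the sort-based aggregation quoted above can be sketched as follows. This is a simplified in-memory sketch, not Graefe's run-file implementation; the function name `aggregate_sorted`, the `(grouping key, label)` record layout, and the sample records are hypothetical. It shows that appending a record whose grouping key is already present leaves the number of aggregated records unchanged.

```python
from itertools import groupby

def aggregate_sorted(records):
    """Sort records on the grouping key, then collapse each run of equal keys
    into one aggregated record holding the sum of its label values."""
    ordered = sorted(records, key=lambda r: r[0])
    return [
        (key, sum(label for _, label in run))
        for key, run in groupby(ordered, key=lambda r: r[0])
    ]

# Hypothetical records: (grouping key, label)
base = [("A", 1), ("B", 0), ("A", 0)]
before = aggregate_sorted(base)
# Expand the data with a record whose key "A" already exists.
after = aggregate_sorted(base + [("A", 1)])
```

In this sketch the aggregated label for key "A" changes, while the count of aggregated records stays at two.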
Regarding previously presented Claim 29, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(Previously presented) The method of claim 21, comprising, by a processing unit (Examiner’s note: As indicated earlier, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).): 
providing at least one set of data comprising a plurality of feature values representative of at least one electronic item, for which at least one label value is to be predicted (Examiner’s note: As indicated earlier, Honda Figures 12-14 teaches providing input data collected from the PCM, WS, FT testing stages into various model architectures to generate prediction results, where the model architectures represent single level and multiple-level models implemented with machine learning algorithms such as decision trees and random forests (Honda [0140]-[0144]).), and
predicting, based on the relationship, the label value associated with the set of data, thereby allowing prediction for the at least one electronic item (Examiner’s note: As indicated earlier, Honda Figures 12-14 teaches providing input data collected from the PCM, WS, FT testing stages into various model architectures to generate prediction results, where the model architectures represent single level and multiple-level models implemented with machine learning algorithms such as decision trees and random forests, and where the prediction result represents the prediction of Returned Merchant Authorizations (RMAs) for packaged electronic chips, expressed as a probability. Honda further teaches that only chips that passed the FT testing are provided to chip users, and hence the prediction of RMAs for packaged electronic chips represents a prediction of whether that associated FT testing label is accurate or not (Honda [0087]-[0089]; [0140]-[0144]). Examiner notes that the claim language “… thereby allowing prediction for the at least one electronic item” recites an intended use of predicting the label value associated with the set of data, where this language is already reflected in the earlier claim limitations “providing at least one set of data comprising a plurality of feature values representative of at least one electronic item, for which at least one label value is to be predicted” and “predicting, based on the relationship, the label value associated with the set of data …”, and therefore is considered as redundant claim language that does not further limit the claim limitation.). 
Regarding previously presented Claim 31, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee teaches
(Previously presented) The method of claim 21, wherein:
the training set data is collected from at least operational data collected from at least a manufacturing line of one or more electronic items (Examiner’s note: Under its broadest reasonable interpretation, the term “operational data collected from at least a manufacturing line” is interpreted as data collected through a normal routine process, i.e., routine testing conducted during a manufacturing process. As indicated earlier, Honda teaches obtaining input data for a semiconductor manufacturing process, where this input data comes from the results of testing semiconductor wafers, and consists of computed and measured feature data from WAT/PCM, WS, CP, FT testing, where one of the FT test results includes a label indicating pass/fail (Honda Figure 1; [0022]-[0025]; [0088]-[0089]; and [0091]-[0122]).), 
wherein the method further comprises updating the relationship between the at least one label and the features of the electronic items based on an update of the operational data during manufacturing (Examiner’s note: Honda teaches a machine learning pipeline containing a feature selection pipe, where this feature selection pipe performs determinations as to which sensors that provide the collected input data and/or manufacturing steps may not be providing useful data for training the ML model, and to remove these steps/sensors from training the model (effectively identifying and removing the attributes/features from the manufacturing process that are not considered relevant to the model). Honda Figures 4-6 further teach various processes involving this feature selection functionality, where Figure 4 teaches a scenario where sensors (and their associated collected feature data) that provides the best cross-validation accuracy are retained, while those with least cross-validation accuracy are removed, and where Figures 5 and 6 teach scenarios where sensors that identify key variables (i.e., relevant variables) in a model with the highest accuracy are retained, while those that do not identify key variables are removed (Honda Figures 4-6; and [0041], [0051]-[0054]).).  
Regarding amended Claim 32,
Claim 32 recites a system, where the system comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 21, and hence is rejected under similar rationale and motivations provided by Honda, Bilenko, Graefe, Brownlee, and Kaempf as indicated in Claim 21. In addition, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline, and where the outputs include predictions produced from a machine learning model (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).
Regarding previously presented Claim 33,
Claim 33 recites the system of claim 32, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 22, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf as indicated in Claim 22, in view of the rejections applied to Claim 32.
Regarding previously presented Claim 34,
Claim 34 recites the system of claim 32, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 23, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf as indicated in Claim 23, in view of the rejections applied to Claim 32.
Regarding previously presented Claim 36,
Claim 36 recites the system of claim 32, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 29, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf as indicated in Claim 29, in view of the rejections applied to Claim 32.
Regarding amended Claim 37,
Claim 37 recites a non-transitory storage device readable by a machine, where the non-transitory storage device embodies a program of instructions executable by a machine to perform operations comprising claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 21, and hence is rejected under similar rationale and motivations provided by Honda, Bilenko, Graefe, Brownlee, and Kaempf as indicated in Claim 21. In addition, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline, and where the outputs include predictions produced from a machine learning model (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).
Regarding previously presented Claim 38,
Claim 38 recites the non-transitory storage device of claim 37, where the non-transitory storage device further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 22, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf as indicated in Claim 22, in view of the rejections applied to Claim 37.
Regarding previously presented Claim 40,
Claim 40 recites the non-transitory storage device of claim 37, where the non-transitory storage device further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 29, and hence is rejected under similar rationale provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf as indicated in Claim 29, in view of the rejections applied to Claim 37.
Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over 
Honda et al., U.S. PGPUB 2019/0277913, filed 3/8/2019 [hereafter referred as Honda], in view of Bilenko et al., U.S. PGPUB 2014/0337096, published 11/13/2014 [hereafter referred as Bilenko], in further view of Graefe, Goetz, Query Evaluation Techniques for Large Databases, June 1993 [hereafter referred as Graefe], in even further view of Brownlee, Jason, Bagging and Random Forest Ensemble Algorithms for Machine Learning, retrieved from web.archive.org dated June 25, 2019  [hereafter referred as Brownlee], in even further view of Kaempf, Ulrich, The Binomial Test: A Simple Tool to Identify Process Problems, May 1995 [hereafter referred as Kaempf] as applied to Claim 21; in even further view of Won et al., Random Forest Model for Silicon-to-SPICE Gap and FinFET Design Attribute Identification, October 2016 [hereafter referred as Won].
Regarding previously presented Claim 30, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf as applied to Claim 21 teaches
(Previously presented) The method of claim 21, further comprising …
… by a processing unit (Examiner’s note: As indicated earlier, Honda Figures 3 and 7 teach a process and a machine learning pipeline for processing input data collected from data sources during a production run for a semiconductor manufacturing process, and generating multiple predictive models for failure detection and classification, where the input data is for training the multiple predictive models used in different model architectures. A person having ordinary skill in the art would understand that performing the set of process steps and machine learning pipeline requires a computing system containing a processor and associated memory (e.g., RAM and disk storage) coupled to each other, where the associated memory stores computer instructions representing these process steps and associated machine learning pipeline to execute the process steps, where the process steps include storing the input data as well as all outputs resulting from the process steps and machine learning pipeline, and where the outputs include predictions produced from a machine learning model (Honda Figure 3 and [0035]-[0041]; Figure 7 and [0059]; and Figures 12-14).) …
… an importance of one or more features (Examiner’s note: As indicated earlier, Brownlee teaches applying bagging/bootstrap aggregation techniques to a plurality of decision tree/random forest trees to identify and determine important variables that lead to a particular prediction or outcome, in order to identify subsets of input variables that may be most or least relevant to the problem (Brownlee p.2 Variable Importance).) …
While Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf teaches applying bagging/bootstrap aggregation techniques to a plurality of decision tree/random forest trees to determine variable importances, Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf does not explicitly teach 
… based on the relationship, determining at least one of:
… an importance … with respect to the at least one label, the importance being representative of a level of contribution of the features in the at least one label; and 
… an impact of one or more features with respect to the at least one label, the impact being representative of whether the one or more features increase or decrease the at least one label.
Won teaches
… based on the relationship, determining at least one of:
… an importance … with respect to the at least one label, the importance being representative of a level of contribution of the features in the at least one label (Examiner’s note: Won teaches calculating an importance index based on a ratio of the sum of nodes at which a particular design attribute is used to split the S2S gap data (representing a prediction result located at a particular terminal node) into the next nodes, and the sum of all nodes in the random forest model except for terminal nodes (Won p.363 Equation 8), where larger importance values indicate which design attributes have a larger contribution towards determining the selection of a particular S2S gap data in the model (Won p.363 col.1 3rd paragraph and col.2 3rd paragraph (Section 3.3 Significant Design Attributes 1st paragraph); and p.363 Figure 12).); and
… an impact of one or more features with respect to the at least one label, the impact being representative of whether the one or more features increase or decrease the at least one label (Examiner’s note: Won teaches calculating an impact index for each design attribute based on the mean value shifts in the path from one node to a next left node during random forest node traversal (Won p.363 col.2 Equations 9 and 10), where larger minus or plus impact values indicate which design attributes have more power (influence) to drive (effect) the selection of a particular S2S gap data (representing a prediction result located at a particular terminal node) in the minus or plus directions (Won p.363 col.2 2nd and 4th paragraphs (Section 3.3. Significant Design Attributes); and p.364 Figure 13).).  
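For illustration only, the importance index as characterized in the note above (the number of split nodes using a given design attribute, divided by the number of all non-terminal nodes in the forest) can be sketched as follows. This sketch is not Won's Equation 8 itself; the function name `importance_index`, the representation of each tree as a list of the attributes used at its split nodes, and the attribute names are hypothetical.

```python
def importance_index(forest, attribute):
    """Fraction of non-terminal (split) nodes in the forest that split on `attribute`.

    `forest` is a list of trees; each tree is a list naming the attribute
    used at each of its non-terminal nodes (terminal nodes are omitted).
    """
    total_split_nodes = sum(len(tree) for tree in forest)
    if total_split_nodes == 0:
        return 0.0
    used = sum(tree.count(attribute) for tree in forest)
    return used / total_split_nodes

# Hypothetical two-tree forest with made-up design attribute names.
forest = [["fin_pitch", "gate_length", "fin_pitch"], ["gate_length"]]
fin_pitch_importance = importance_index(forest, "fin_pitch")
```

Under this simplified representation, an attribute used at more split nodes receives a larger index, matching the interpretation that larger importance values indicate a larger contribution.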
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf and Won are analogous art since they both teach processing semiconductor manufacturing data using bootstrap aggregation techniques.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the bootstrap aggregation techniques taught in Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf and apply learning data related to semiconductor design attributes and S2S gap data taught in Won to further analyze and determine importance and influence of certain design attributes/features (represented by the nodes in a decision tree). The motivation to combine is taught in Won, since metrics such as importance and impact allow process engineers to identify those design attributes/features that have the most contribution to the S2S gap decision output, where this S2S gap is used as a measure of quality for improving chip yield. By identifying the most relevant or contributing design attributes through this analysis, process engineers can focus on either minimizing their influence (if it has a negative impact on the final output result or prediction) or maximizing their influence (if it has a positive impact on the final output result or prediction), which in turn provides valuable diagnostic information to improve overall chip yield in a manufacturing system (Won p.358 Section 1. Introduction 1st paragraph: “To accelerate product yield ramp-up, it is important to characterize a silicon device accurately by measuring a device-under-test (DUT) designed exactly the same as in real production chips. …”; p.359 col.1 1st paragraph (Section 1 Introduction): “S2S gap may come from incorrect modeling for particular design layouts, high layout sensitivity to process fluctuation or defects in layouts, etc. Finding design attributes that result in a large S2S gap and fixing the causes related to the design attributes, such as layout features, are crucial for timely yielding of ramp-up. 
But the number of design attributes is increasing significantly in the recent technology node, and the impacts of design attributes are sometimes interdependent. So it becomes more and more difficult to accurately analyze the impact of individual design attributes …”; and p.364 col.1 2nd paragraph-col.2 2nd paragraph: “As importance indicates, the S2S gap is clearly classified by the identified significant design attributes … This means that the design attributes identified by importance have an important role in determining the S2S gap. … As impact indicates, the S2S gap is verified to show a clear trend of a larger minus S2S gap under the following design attribute conditions … This means that the design attributes and values (i.e., conditions) identified by the minus value of impact, surely drive the S2S gap into the minus direction. Conversely, the design attributes and values identified by the plus value of impact have a driving force into the plus direction, as well.”).
Claims 26-27, 35, and 39 are rejected under 35 U.S.C. 103 as being unpatentable over 
Honda et al., U.S. PGPUB 2019/0277913, filed 3/8/2019 [hereafter referred as Honda], in view of Bilenko et al., U.S. PGPUB 2014/0337096, published 11/13/2014 [hereafter referred as Bilenko], in further view of Graefe, Goetz, Query Evaluation Techniques for Large Databases, June 1993 [hereafter referred as Graefe], in even further view of Brownlee, Jason, Bagging and Random Forest Ensemble Algorithms for Machine Learning, retrieved from web.archive.org dated June 25, 2019 [hereafter referred as Brownlee], in even further view of Kaempf, Ulrich, The Binomial Test: A Simple Tool to Identify Process Problems, May 1995 [hereafter referred as Kaempf] as applied to Claims 21, 32, and 37; in even further view of Chen, Hongge, Novel Machine Learning Approaches for Modeling Variations in Semiconductor Manufacturing (Masters Thesis), June 2017 [hereafter referred as Chen].
Regarding amended Claim 26, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf as applied to Claim 21 teaches
(Currently Amended) The method of claim 21.
However Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf does not explicitly teach
… wherein the randomizing further comprises randomizing, for each aggregated set of data of the reduced training set of data, the aggregated representation of at least one label value associated with the aggregated set of data.
Chen teaches
… wherein the randomizing further comprises randomizing, for each aggregated set of data of the reduced training set of data, the aggregated representation of at least one label value associated with the aggregated set of data (Examiner’s note: Chen teaches determining an expected number of good packages by randomly packaging the dies into packages and estimating the ‘pass’ count through a binomial distribution, where the ‘pass’ count is associated with a label that is associated with the test result of a chip die, and the expected number of good packages represents an aggregated representation for the label (Chen pp.48-50 Section 4.4 Mathematical Formulation: “With a large number of testing dies, we can estimate the underlying probability by the relative frequency. We denote “positive” or fail by 1, and “negative” or pass by 0. … Without any classifiers, if we randomly package the dies into packages or stacks with s die in the stack, the failure rate of the packages is $p_{\text{package fail}} = 1 - p(H=0)^{s}$. … Using the law of total expectation, the expected number of good packages in this case is given by $\mathbb{E}[m_2(s)] = \mathbb{E}[\mathbb{E}[m_2(s) \mid k]] = \mathbb{E}[k\,p(H=0 \mid y=0)^{s}] = (n/s)\,p(H=0 \mid y=0)^{s}\,p(y=0)$, where k is the number of stacks packaged as high end products and k is subject to a binomial distribution B(n/s, p(y=0)).”).).  
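For illustration only, the two quantities quoted from Chen above, namely the package failure rate under random packaging, 1 - p(H=0)^s, and the expected number of good packages, (n/s) p(H=0|y=0)^s p(y=0), can be evaluated with the following sketch. The function and parameter names are hypothetical, and the numeric inputs are made-up values rather than data from Chen.

```python
def package_fail_rate(p_h0, s):
    """Failure rate of a randomly assembled package of s dies,
    where p_h0 is the probability p(H=0) that a single die is good."""
    return 1.0 - p_h0 ** s

def expected_good_packages(n, s, p_h0_given_y0, p_y0):
    """Expected number of good packages, (n/s) * p(H=0|y=0)^s * p(y=0),
    for n dies packaged s at a time."""
    return (n / s) * (p_h0_given_y0 ** s) * p_y0

# Made-up inputs for illustration.
fail_rate = package_fail_rate(0.9, 2)                   # 1 - 0.9**2
good_packages = expected_good_packages(1600, 16, 1.0, 0.5)
```

The sketch evaluates the closed-form expressions directly; it does not draw samples from the binomial distribution B(n/s, p(y=0)) that the quoted passage uses to derive the expectation.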
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf and Chen are analogous art since they both teach techniques for processing semiconductor manufacturing data to predict chip yield.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the reduced training set of data taught in Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf and further perform the steps of randomly selecting a package die pass/fail value (where the package die pass/fail value represents a label) based on a binomial distribution taught in Chen as a way to further generate a plurality of randomized training sets of data. The motivation to combine is taught in Chen, as a way to linearly approximate an overall yield improvement for a number of dies in a package that closely matches the predicted values when a classifier is trained on a training set of data (see Chen p.56 Figure 4-9), thus providing a reliable way to perform an aggregated approximation for a label and improving the robustness of the training set (Chen p.57 1st paragraph: “Recalling Equation (4.18), the expected yield improvement is a function of TPR and FPR. However, FPR and TPR are constrained by the ROC curve of the classifier. Then the optimal point (FPR*, TPR*) is where the contour plot is tangent to the ROC curve and the corresponding optimal threshold y* is determined for future prediction. Figure 4-9 gives the ROC curve and contour plots of expected yield improvement $\mathbb{E}[m_2(s) - m_1(s)]/(n/s)$ with different s (number of dies in a package). From the contour plot we can see that our linearization in Equation (4.19) is a good approximation even at s = 16, which can be used as a fast estimation of the expected yield improvement.”). 
Regarding amended Claim 27, 
Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf, in even further view of Chen teaches
(Currently Amended) The method of claim 26, wherein: 
randomizing the aggregated representation of at least one label value associated with the aggregated set of data is based on a probability distribution characterizing probability to obtain different values for the aggregated representation of the at least one label value (Examiner’s note: As indicated earlier in the Claim Objections section, Claim 27 is a dependent claim of Claim 26, and thus also inherits the “at least one of” aspect recited in Claim 26 for the two claim limitations that are identified and being further limited in scope in Claim 27, resulting in the interpretation of these two limitations in Claim 27 as also having an exclusive “or” relationship (such that the presence of either claim limitation is sufficient for the method). As indicated earlier, Chen teaches determining an expected number of good packages by randomly packaging the dies into packages and estimating the ‘pass’ count through a binomial distribution, where a binomial distribution is a probability distribution characterizing probability to obtain different values representing the packaged die pass/fail value (Chen pp.48-50 Section 4.4 Mathematical Formulation).).  
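For illustration only, the binomial randomization of an aggregated pass/fail label discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Chen or from the application: the function names, the seed, and the example values (64 dies, 16 dies per package, pass probability 0.9) are hypothetical. It draws the number of passing packages k from B(n/s, p) and repeats the draw to produce several randomized label sets.

```python
import random


def sample_pass_count(num_packages, p_pass, rng):
    """Draw k ~ B(num_packages, p_pass) as a sum of Bernoulli trials."""
    return sum(1 for _ in range(num_packages) if rng.random() < p_pass)


def randomized_label_sets(n_dies, dies_per_package, p_pass, num_sets, seed=0):
    """Build num_sets randomized package-level label lists (1 = pass, 0 = fail).

    All parameter names are hypothetical; p_pass stands in for the
    per-package pass probability of the binomial distribution.
    """
    rng = random.Random(seed)
    num_packages = n_dies // dies_per_package  # n/s packages
    label_sets = []
    for _ in range(num_sets):
        k = sample_pass_count(num_packages, p_pass, rng)
        labels = [1] * k + [0] * (num_packages - k)
        rng.shuffle(labels)  # randomize which packages carry the pass label
        label_sets.append(labels)
    return label_sets


if __name__ == "__main__":
    # Assumed example values, not taken from the cited references.
    for labels in randomized_label_sets(64, 16, 0.9, num_sets=3):
        print(labels)
```

Each returned list has one label per package, so the sketch yields a plurality of randomized label assignments from a single probability distribution; fixing the seed makes the draws repeatable.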
Regarding amended Claim 35,
Claim 35 recites the system of claim 32, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 26, and hence is rejected under similar rationale and motivations provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf and Chen as indicated in Claim 26, in view of the rejections applied to Claim 32.
Regarding amended Claim 39,
Claim 39 recites the non-transitory storage device of claim 37, where the non-transitory storage device further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 26, and hence is rejected under similar rationale and motivations provided by Honda in view of Bilenko, in further view of Graefe, in even further view of Brownlee, in even further view of Kaempf and Chen as indicated in Claim 26, in view of the rejections applied to Claim 37. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121