DETAILED ACTION
The applicant’s request for continued examination regarding application number 16/254,033, filed January 22, 2019 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 27, 2022 has been entered.
 
Response to Amendments
The amendment filed January 27, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 16/254,033, which include: Amendments to the Claims, and Remarks containing Applicant’s amendments. 
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges that Claims 1, 3, 6-10, 13, 15, and 18-21 have been amended and that new Claims 22-25 have been added, with Claims 2 and 14 being previously cancelled and Claims 5, 12, and 17 being newly cancelled. Claims 1, 3-4, 6-11, 13, 15-16, and 18-25 remain pending in the application. Examiner notes that new Claim 22 raises a claim objection, which is identified in the Claim Objections section below.
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner has acknowledged that although Applicant contests the identified typographical error in Claim 6 as being an objection (“MP pipeline” should be corrected as “ML pipeline”), Examiner has noted that Claim 6 has 

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 16/254,033, which include: Remarks containing Applicant’s arguments.
Regarding Applicant’s Remarks for Claims 1, 3, 6-10, 13, 15, and 18-21 under 35 U.S.C. 103 as being unpatentable over Ghanta et al., U.S. PGPUB 2020/0034665, filed 7/30/2018 [hereafter referred as Ghanta] in view of Maag et al., U.S. PGPUB 2017/0220403, published 8/3/2017 [hereafter referred as Maag], in further view of Abeysooriya et al., U.S. Patent 9,336,483, issued 5/10/2016 [hereafter referred as Abeysooriya]; for Claims 4, 11, and 16 under 35 U.S.C. 103 as being unpatentable over Ghanta in view of Maag, in further view of Abeysooriya as applied to Claims 3, 10, and 15; in even further view of Dirac et al., U.S. PGPUB 2015/0379424, published 12/31/2015 [hereafter referred as Dirac '424]; and for Claims 5, 12, and 17 under 35 U.S.C. 103 as being unpatentable over Ghanta in view of Maag as applied to Claims 1, 9, and 13; in further view of Dirac et al., U.S. PGPUB 2015/0379430, published 12/31/2015 [hereafter referred as Dirac '430], Examiner acknowledges Applicant’s arguments, has considered them, and has found them not persuasive. During the examiner interview, the Examiner identified potential issues with the amended claim limitations of independent Claim 1 presented at the interview, and no agreement was reached at that time on overcoming the existing rejections. Examiner notes that the Applicant has since introduced considerable amendments in the independent and dependent claims that were not previously presented, and thus the scope of the claims as a whole has changed such that it necessitates further examination and re-evaluation of the amended, original, and new claims. 
Examiner further notes that the remainder of Applicant’s arguments are directed to the newly added claim limitations and new claims not previously presented, where these new claim limitations and new claims necessitate further examination and re-evaluation of the amended and related original claims. 

Claim Objections
Claim 22 is objected to because of the following informality: 
A typographical error (missing word) in the following claim limitation: “wherein determining that applying the ML model building phase of the ML [pipeline] would result in a deficient ML model comprises: …”.  Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 7 and 8 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. 
The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding Claim 7,
The amended claim recites a newly-added limitation “… wherein the ML pipeline model building phase includes a utility validation phase …”, but the specification fails to disclose an embodiment where the ML pipeline model building phase includes a utility validation phase. Examiner notes that Applicant’s specification only indicates that an ML model building pipeline can include both a model building phase and a utility validation phase, but does not indicate that the utility validation phase is included in an ML model building phase ([117]: “… ML training includes two phases: data pre-processing phase 620 and build model phase 624. … the ML trainer 610 may include a model utility validation phase …”; and [146]-[149]: “The ML model building pipeline 1010 additionally includes a model utility validation phase (module utility validation phase 1030).  … The ML model building pipeline 1010 additionally includes an ML model building phase (build model phase 1040).”). The specification must describe and support the claims such that the public is informed of the boundaries of what constitutes infringement of the patent, and such that it can be determined whether the claimed invention meets the criteria for patentability by distinctly claiming the subject matter which the inventor regards as the invention. See MPEP 2163. Given that the specification provides no support for this amended claim limitation, this amended claim limitation fails to comply with the written description requirement. For the purposes of examination, this claim limitation will be interpreted according to the support provided in the Applicant’s specification paragraphs [0146]-[0149] (“… wherein the ML pipeline ”), indicating that the ML pipeline includes a utility validation phase.
Regarding Claim 8,
The amended claim recites the newly-added limitation: “wherein applying the utility validation phase of the ML model building phase to the conditioned data set comprises causing the processor to terminate the ML pipeline in response to determining, based on comparing the predictive ability of the first ML model and the second ML model, that the predictive ability of the first ML model fails to exceed the predictive ability of the second ML model by more than a threshold amount”, but the specification fails to disclose an embodiment that teaches the phrase “applying the utility validation phase of the ML model building phase”, where this phrase has the same lack of written description issue identified earlier in Claim 7 (i.e., Applicant’s specification only discloses that an ML model building pipeline can include a model building phase and a utility validation phase, but fails to disclose that a utility validation phase is included in an ML model building phase). The specification must describe and support the claims such that the public is informed of the boundaries of what constitutes infringement of the patent. See MPEP 2163. Given that the specification provides no support for this amended claim limitation, this amended claim limitation fails to comply with the written description requirement. For the purposes of examination, this claim limitation will be interpreted according to the support provided in the Applicant’s specification paragraphs [0146]-[0149] (“applying the utility validation phase ”), to maintain consistency with the similar claim limitation identified in amended Claim 7.
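For illustration only, the termination condition quoted from amended Claim 8 can be read as a single comparison between two scores. The following Python sketch is not drawn from Applicant’s specification or from any cited reference; the function name, the score arguments, and the example values are hypothetical, and it assumes that a higher score means better predictive ability.

```python
def should_terminate_pipeline(first_score: float,
                              second_score: float,
                              threshold: float) -> bool:
    """Return True when the first model fails to exceed the second
    model's predictive ability by more than `threshold`.

    Assumption (not from the application): both scores are on the same
    scale and a higher score means better predictive ability.
    """
    # "Fails to exceed by more than a threshold amount" is read here as:
    # the margin (first - second) is less than or equal to the threshold.
    return (first_score - second_score) <= threshold


# Hypothetical usage: a margin of 0.25 against a threshold of 0.5
# would terminate the pipeline under this reading.
terminate = should_terminate_pipeline(0.75, 0.5, 0.5)
```

Under this reading, a margin exactly equal to the threshold also results in termination, because the first model has not exceeded the second by more than the threshold amount.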

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider 
Claims 1, 7-9, 13, and 19-23 are rejected under 35 U.S.C. 103 as being unpatentable over Ghanta et al., U.S. PGPUB 2020/0034665, filed 7/30/2018 [hereafter referred as Ghanta] in view of Dirac et al., U.S. PGPUB 2015/0379430, published 12/31/2015 [hereafter referred as Dirac '430], in further view of Bianco et al., A Practical and Effective Sampling Selection Strategy for Large Scale Deduplication, September 2015 [hereafter referred as Bianco].
Regarding amended Claim 1, 
Ghanta teaches
(Currently amended) A system comprising:
a memory (Examiner’s note: Ghanta teaches a machine learning system/ML management apparatus containing a processor, volatile memory, and non-volatile computer readable storage medium, where the computer readable storage medium contains computer readable program instructions (program code) for modules implementing the described functions, as well as operational data for the modules, collected as data sets (Ghanta [0018]-[0021], [0025], [0029], [0033]; and Figure 1, element 104 and [0043]-[0044]).) containing:
a target data set (Examiner’s note: Under its broadest reasonable interpretation, a “target” data set broadly recites any identified data set that is used in a machine learning system. Ghanta teaches using training, validation, test, error, and inference data sets in a machine learning system/ML management apparatus containing training, orchestration/management (policy), and inference pipelines, where these data sets are representative of types of a target data set (Ghanta Figure 2, elements 202, 204, 206a-c; Figure 1, element 104 and [0036]-[0037]: “… a machine learning system may involve various components, pipelines, data sets and/or the like – such as training pipelines, orchestration/management pipelines, inference pipelines, and/or the like … the ML management apparatus 104 provides an improvement for machine learning systems by training a first or primary machine learning model for a first/primary machine learning algorithm using a training data set, validating the first machine learning model using a validation data set, the output of which is an error data set that describes the accuracy of the first machine learning model for a second/auxiliary machine learning algorithm using the error data set. …”).); and
instructions defining a software application configured to apply a machine learning (ML) pipeline (Examiner’s note: As indicated earlier, Ghanta teaches a non-volatile computer readable storage medium containing computer readable program instructions (program code) for modules implementing the described functions on a machine learning system/ML management apparatus (Ghanta [0018]-[0021], [0025], [0029], [0033]; and Figure 1, element 104 and [0043]-[0044]).), 
wherein the ML pipeline includes … an ML model building phase (Examiner’s note: Ghanta teaches the policy, training, and inference machine learning pipelines implementing various machine learning operations including algorithm training/inference operations, where the training machine learning pipeline is used for generating machine learning models (Ghanta Figure 2A, elements 202, 204, 206a-c and [0053]: “… machine learning pipelines 202, 204, 206a-c comprise various machine learning features, components, objects, modules, and/or the like to perform various machine learning operations such as algorithm training/inference, feature engineering, validations, scoring, and/or the like.”; and [0056]: “… the machine learning system 200 includes physical and/or logical groupings of the machine learning pipelines 202, 204, 206a-c … the ML management apparatus 104 may select a training pipeline 204 for generating a machine learning model configured for the desired objective and one or more inference pipelines 206a-c that are configured to analyze the desired objective …”). Ghanta further teaches the ML management apparatus contains a primary training module, a primary validation module, a secondary training module, a secondary validation module, an analysis module, and an action module, where the primary and secondary training modules performing machine learning training represent ML model building phases (Ghanta Figure 3 and [0075]-[0077]: “… the apparatus 300 includes an embodiment of an ML management apparatus 104 … includes one or more of a primary training module 302, a primary validation module 304, a secondary training module 306, a secondary validation module 308, an analysis module 310, and an action module 312 … the primary training module 302 is configured to train a first machine learning model … using a training data set. 
… the primary training module 302 may receive, read, access and/or the like a training data set and provide the training data set to a training pipeline 204 to train the machine learning model. ”; and [0085]: “… the secondary training module 306 is configured to train a second machine learning model for a second machine learning algorithm using the error data set described above.”).); and 
a processor configured to execute the instructions to perform operations (Examiner’s note: As indicated earlier, Ghanta teaches a machine learning system/ML management apparatus containing a processor, volatile memory and non-volatile computer readable storage medium, where the computer readable storage medium contains computer readable program instructions (program code) for modules implementing the described functions (Ghanta [0018]-[0021], [0025], [0029], [0033]; and Figure 1, element 104 and [0043]-[0044]).) comprising:
obtaining, from the memory, the target data set (Examiner’s note: Under its broadest reasonable interpretation, a “target” data set broadly recites any identified data set that is used in a machine learning system. As indicated earlier, Ghanta teaches using training, validation, test, error, and inference data sets in a machine learning system/ML management apparatus containing training, orchestration/management (policy), and inference pipelines, where these data sets are representative of types of a target data set, and these data sets are considered forms of operational data stored in memory (Ghanta [0019], [0036]-[0037]). Ghanta further teaches the ML management apparatus contains a primary training module, a primary validation module, a secondary training module, a secondary validation module, an analysis module, and an action module, where the primary training module is configured to train a first machine learning model using a training data set (Ghanta Figure 3 and [0075]-[0077]).);
… generating a conditioned data set based on the target data set (Examiner’s note: Under its broadest reasonable interpretation, the term “conditioned data set” broadly recites a data set that is based on an identified “target” data set (where an identified “target” data set broadly recites any identified data set that is used in a machine learning system). As indicated earlier, Ghanta teaches the policy, training, and inference machine learning pipelines implementing various machine learning operations, including feature engineering operations. A person having ordinary skill in the art would understand that the term “feature engineering” represents a process for analyzing and performing conversions on an identified data set, where the output result of the feature engineering performed on an identified data set represents a conditioned data set (Ghanta [0053]: “… machine learning pipelines 202, 204, 206a-c comprise various machine learning features, components, objects, modules, and/or the like to perform various machine learning operations such as algorithm training/inference, feature engineering, validations, scoring, and/or the like.”).) …
… determining that applying the ML model building phase of the ML pipeline would result in a deficient ML model (Examiner’s note: Under its broadest reasonable interpretation, the term “deficient” as defined in the Merriam-Webster dictionary describes something that is lacking in some necessary quality, or not up to a normal standard. As indicated earlier, Ghanta teaches the ML management apparatus containing a primary training module that is configured to train a first machine learning model using a training data set (Ghanta Figure 3 and [0075]-[0077]). Ghanta further teaches applying this training data set to generate the first machine learning model, where this first machine learning model is then validated using a validation data set, and an error data set is generated from the output of the validation, where the error data set is used in a secondary training module configured to train a second machine learning model to predict a suitability of the first machine learning model, and where the suitability of the first machine learning model is based on a score describing the efficacy, accuracy, or effectiveness of the predictions of the first model. Ghanta additionally teaches this score is further analyzed in a secondary validation module to produce suitability metrics to analyze the suitability of the second machine learning algorithm, where an analysis module further uses these suitability metrics produced from both the first and second machine learning models to determine whether these suitability metrics satisfy thresholds to eventually determine whether the first machine learning model is a good fit for generating accurate predictions for the inference set. 
Hence, this analysis to determine whether these thresholds are being satisfied represents a suitability analysis for the first machine learning model, and as such, the determination where these thresholds are not being satisfied represents an unsuitability condition concluding that a first machine learning model lacks efficacy, accuracy, or effectiveness (thus representing a deficient ML model) (Ghanta [0078]-[0079]; [0085]: “… the secondary training module 306 is configured to train a second machine learning model … using the error data set described above. The second machine learning algorithm may be configured to predict a suitability of the first machine learning algorithm/model … the suitability may comprise a value such as a health score that describes the efficacy, accuracy, effectiveness, … of the predictions that the first machine learning algorithm/model generates for the inference data set.”; [0089]; Table 1 and [0092]-[0093]; and [0098]-[0099]: “… the analysis module 310 may determine whether the suitability score based on the metrics/health scores in Table 1 satisfies a threshold to determine (1) whether the second machine learning algorithm/module is a good fit for validating the predictive performance of the first machine learning algorithm/model, and if so (2) whether the first machine learning algorithm/model is a good fit for generating accurate predictions for the inference data set. … the ML management apparatus 104 can predict, in real time, the efficacy of a trained model on generating predictions for an inference data set … and if it determines that the trained model is not generating accurate predictions, the ML management apparatus can react accordingly as described below with reference to the action module 312.”).); and
providing an indication of inadequacy of the conditioned data set (Examiner’s note: Under its broadest reasonable interpretation, the term “inadequacy” as defined in the Merriam-Webster dictionary refers to a state or condition of not being adequate, not enough, or not good enough. As indicated earlier, Ghanta teaches the analysis module determining suitability of the first machine learning model for generating accurate predictions, where the determination of unsuitability (e.g., the model is not generating accurate predictions) identifies a condition that the first machine learning model is a deficient ML model. Ghanta further teaches the analysis module, in response to determining that the first machine learning model is deficient based on not satisfying predetermined suitability thresholds, can request the action module to perform various actions associated with the first machine learning algorithm, where these additional actions include retraining the first machine learning model using a different training data set, switching the first machine learning model to a different machine learning model for training, or recommending a different first machine learning algorithm to be used to analyze the inference data set, and where the request from the analysis module to the action module to perform any of the above actions represents an indication of inadequacy with regards to the first machine learning model (Ghanta [0098]; and Figure 4, element 410 and [0100]-[0102]: “… the action module 312 is configured to trigger an action associated with the first machine learning algorithm, dynamically in real time, in response to the predicted suitability of the first machine learning algorithm/model for analyzing the inference data set not satisfying a predetermined suitability threshold. 
… the action comprises retraining the first machine learning model … using a different training data set … the action comprises switching the first machine learning model to a different machine learning model trained on different training data … the action comprises recommending one or more different first machine learning algorithms for analyzing the inference data set …”; and Figure 5, elements 514, 516, 518, 520).).
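For illustration only, the threshold-based flow described in the Examiner’s note above (a suitability score is compared against a predetermined threshold, and a failure to satisfy it triggers one of several actions) can be sketched in Python. This sketch is not code from Ghanta; the function name, the action labels, and the reading of “satisfying” a threshold as greater-than-or-equal are assumptions made solely for this example.

```python
from typing import Tuple

# Hypothetical labels for the three actions summarized in the note above.
CANDIDATE_ACTIONS: Tuple[str, ...] = (
    "retrain_on_different_training_data",
    "switch_to_different_model",
    "recommend_different_algorithm",
)


def actions_for_score(suitability_score: float,
                      suitability_threshold: float) -> Tuple[str, ...]:
    """Return the candidate actions when the score does not satisfy the
    threshold, or an empty tuple when it does.

    Assumption: a score satisfies the threshold when it is greater than
    or equal to it.
    """
    if suitability_score >= suitability_threshold:
        return ()  # model treated as suitable; nothing is triggered
    return CANDIDATE_ACTIONS  # a non-empty result acts as the indication
```

In this sketch, a non-empty return value plays the role of the request sent to the action module.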
While Ghanta teaches machine learning pipelines that perform various machine learning operations such as algorithm training/inference and feature engineering, and that are associated with third party analytic engines for performing machine learning numeric computations and analysis (Ghanta [0053], [0055]), Ghanta does not explicitly teach
… wherein the ML pipeline includes a data pre-processing phase …
… applying the data pre-processing phase of the ML pipeline to the target data set before applying the ML model building phases of the ML pipeline; …
Dirac ‘430 teaches
… wherein the ML pipeline includes a data pre-processing phase (Examiner’s note: Dirac ‘430 teaches a machine learning service processing a model creation request that contains a training data set and recipes/constraint instructions for a feature processing manager, where the feature processing manager uses these recipes and constraints to generate a candidate set of feature processing transformations by scheduling one or more feature processing jobs, and using the processed (and pruned) training data set to execute the scheduled model training/re-training jobs, where this feature processing manager represents a data pre-processing phase that performs various validation checks on the data (Dirac ‘430 Figure 9a, element 904 and [0127]-[0128], [0130]), and the processed (and pruned) training data set represents a conditioned data set based on the training data set (Dirac ‘430 Figure 42, elements 4210, 4220, 4227, 4080, 4255, 4261 and [0239]-[0240]: “… The model creation request 4210 may indicate … one or more training sets 4220 … one or more feature processing recipes 4226 … a client may also optionally indicate one or more constraints 4227, such as a mandatory feature processing transformation… The FP manager 4080 may generate a candidate set of feature processing transformations … a number of different jobs may be generated and scheduled during this process, including … one or more feature processing jobs 5244, … and/or one or more training or re-training jobs 4261. … The FP manager may consult the MLS’s knowledge base of best practices to identify candidate transformations … based on the problem domain being addresse[d] by the model to be created or trained … once a candidate set of FPTs (feature processing transformations) is identified, some subset of the transformations may be removed or pruned from the set in each of several optimization iterations, and different variants of the model may be trained … using the pruned FPT sets.”).) …
… applying the data pre-processing phase of the ML pipeline to the target data set before applying the ML model building phases of the ML pipeline (Examiner’s note: Under its broadest reasonable interpretation, a “target” data set broadly recites any identified data set that is used in a machine learning system. As indicated earlier, Dirac ‘430 teaches a machine learning service that performs feature processing to generate a processed (and pruned) training set representing a conditioned data set, where this conditioned data set is used to execute the scheduled model training/re-training jobs, such that this flow of performing feature processing to generate a conditioned data set that is used for the scheduled model training/re-training jobs represents a flow of applying the data pre-processing phase of the ML pipeline to a target data set before applying the ML model building phase (Dirac ‘430 Figure 42, elements 4210, 4220, 4227, 4080, 4255, 4261 and [0239]-[0240]).); …
	Both Ghanta and Dirac ‘430 are analogous art since both teach performing data/feature processing on training data sets.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the data/feature processing step taught in Ghanta and incorporate the feature processing manager taught in Dirac ‘430 as a way to further define prediction quality metrics and run-time goals to improve the prediction performance of the ML pipeline. The motivation to combine is taught in Dirac ‘430, since the feature processing manager can process the values in the training set and apply the associated feature constraints and feature processing transformations based on corresponding quality metrics and run-time goals to improve the performance of the machine learning pipeline, thus making the ML pipeline more computationally efficient while at the same time satisfying the required training goals in the system (Dirac ‘430 [0232]).
	While Ghanta in view of Dirac ‘430 teaches data/feature processing in a ML pipeline, Ghanta in view of Dirac ‘430 does not explicitly teach
… identifying one or more duplicate data entries within the target data set; …
… wherein the conditioned data set includes each entry of the target data set except for the identified duplicate entries; …
… determining that the conditioned data set comprises less than a threshold quantity of unique entries; …
	Bianco teaches
… identifying one or more duplicate data entries within the target data set (Examiner’s note: Bianco teaches a T3S process for deduplicating data, where Sig-Dedup filters are used to perform an initial pre-processing deduplication stage based on prefix, length, positional, suffix filtering to identify a set of candidate matching pairs of records from a dataset (to remove initial duplicate records), as well as a first stage that performs sorting and ranking of these candidate pairs of records according to similarity values, where the rank represents different sample levels of similarity, with the lowest level [0.0-0.1] representing a large number of candidate pairs, and the highest level [0.9-1.0] representing a large number of matching pairs, where the process of identifying a set of duplicate data entries as candidate matching pairs and grouping them into different levels based on similarity represents a process for identifying one or more duplicate data entries within a target data set (Bianco p.2308, Figure 1. T3S steps overview; p.2307 col.1 Section 3.1 Signature-Based Deduplication (Sig-Dedup) 1st-5th paragraphs; p.2308 col.2 Section 4.1 Identifying the Approximate Blocking Threshold 1st-3rd paragraphs; p.2309 col.1 4th paragraph; p.2309 col.2 2nd-3rd paragraphs (Section 4.2 First Stage: Sample Selection Strategy): “The first stage of T3S adopts the concept of levels to allow each sample to have a similar diversity to that of the full set of pairs. The ranking, created by the blocking step, is fragmented into 10 levels … by using the similarity value of each candidate pair. … level [0.0-0.1] is composed of a large number of non-matching pairs (i.e., highly dissimilar records) while level [0.9-1.0] has matching pairs only.”).); …
… wherein the conditioned data set includes each entry of the target data set except for the identified duplicate entries (Examiner’s note: As indicated earlier, Bianco teaches a T3S process for deduplicating data, containing an initial pre-processing deduplication stage based on Sig-Dedup filtering to remove initial duplicate records, and a first stage that sorts and ranks the remaining candidate pairs (Bianco p.2310 Figure 2; p.2309 col.2 Section 4.3 Second Stage: Redundancy Removal, 2nd paragraph: “The second stage of T3S aims at incrementally removing the non-informative or redundant pairs inside each sample level by using the SSAR … active learning method [21]. By redundant, we mean pairs carrying very similar information; the inclusion of a redundant pair does not contribute with useful information for the learning process. … The purpose of SSAR is to select for labeling only the most informative pairs required to maximize the training size diversity …”; p.2310 Algorithm 1 and 2nd paragraph: “Details of SSAR are shown in Algorithm 1. At each round, an unlabeled pair u_i is used as a filter to remove irrelevant features and examples from D. … The objective of this procedure is to select the most dissimilar unlabeled pair by making a comparison with the current training set. … If the most dissimilar pair is not already present in the training set, it is labeled by the user and inserted into the training set D. … The idea is that this pair is the best “representative” of the information contained in the collection.”; and p.2311 col.1 3rd paragraph: “Our sample selection strategy incrementally invokes SSAR by using each level and the current training set as input. … As Fig.2 show, by using the incremental active selection, the redundant pairs at the levels [0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5, and 0.5-0.6] can be removed, reducing the labeling effort.”).); …
… determining that the conditioned data set comprises less than a threshold quantity of unique entries (Examiner’s note: Under its broadest reasonable interpretation, this limitation broadly recites a comparison step against a threshold, where the threshold is based on a number of records that is present after a deduplication step. As indicated earlier, Bianco teaches a T3S process for deduplicating data, containing an initial pre-processing deduplication stage based on Sig-Dedup filtering to remove initial duplicate records, a first stage that sorts and ranks the candidate pairs according to different similarity Bianco p.2309 col.1 1st-2nd paragraphs: “… A random subset is selected from the dataset that is matched by using a variable threshold which varies in fixed ranges. The stopping criterion specifies that the number of pairs needed to satisfy the Sig-Dedup filters must be lower than the subset size. When compared with the entire dataset, the random subset naturally decreases the number of true matching pairs. … When the threshold value is incrementally increased, fewer tokens in the sorted record are index, thus reducing the number of candidate pairs. On the other hand, a high threshold value selects few tokens in the sorted record and a lot of matching pairs can be pruned out. The stopping criterion produces a threshold that avoids both: a large generation of candidate pairs and recall degradation …”; p.2313 col.2 Section 5.4 Identifying the Initial Threshold 1st paragraph and p.2314 col.1 1st-2nd paragraphs).); …
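For illustration only, the three limitations mapped above (identifying duplicate entries, forming a conditioned data set that excludes them, and comparing the number of unique entries against a threshold) can be sketched as follows. This is a minimal sketch that assumes exact-match duplicates; the approximate, similarity-based matching of Bianco's T3S process is not reproduced here. The function names, the sample records, and the threshold value are hypothetical and do not come from the claims or the cited references.

```python
from typing import Hashable, Iterable, List, Tuple


def condition_data_set(
    target: Iterable[Hashable],
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split a target data set into unique entries and identified duplicates.

    Assumption: two entries are duplicates only when they compare equal.
    """
    seen = set()
    conditioned = []  # first occurrence of each entry, in original order
    duplicates = []   # every later repeat of an already-seen entry
    for entry in target:
        if entry in seen:
            duplicates.append(entry)
        else:
            seen.add(entry)
            conditioned.append(entry)
    return conditioned, duplicates


def has_too_few_unique_entries(conditioned: List[Hashable], threshold: int) -> bool:
    """Return True when the conditioned set holds fewer than `threshold` entries."""
    return len(conditioned) < threshold


# Hypothetical records; the threshold of 4 is an arbitrary illustrative value.
records = [("ann", 31), ("bob", 27), ("ann", 31), ("cy", 40), ("bob", 27)]
conditioned, duplicates = condition_data_set(records)
too_small = has_too_few_unique_entries(conditioned, threshold=4)
```

In this sketch the five records reduce to three unique entries, so a threshold of four would flag the conditioned data set as too small.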
Both Ghanta in view of Dirac ‘430 and Bianco are analogous art since both teach performing data/feature processing on training data sets.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the data/feature processing steps taught in Ghanta in view of Dirac ‘430 and incorporate the data deduplication steps taught in Bianco as a way to perform further data transformation and data validation on an input dataset. The motivation to combine is taught in Bianco, since detecting and removing duplicates in data sets results in considerable improvements in data quality, where the deduplication will also result in producing a training data set that contains more relevant information in which to perform additional machine learning training and analysis, thus making a system that trains on (Bianco p.2305 col.1-col.2 Introduction: “… data quality can be degraded mostly due to the presence of duplicate pairs with misspellings, abbreviations, conflicting data, and redundant entities, among other problems … a system designed to collect scientific publications on the Web to create a central repository … may suffer a lot in the quality of its provided services, e.g., search or recommendation may not produce results as expected by the end user due to the large number of replicated or near-replicated publications dispersed on the Web (e.g., a query response composed mostly by duplicates may be considered as having low informative value). The ability to check whether a new collected object already exists in the data repository (or a close version of it) is an essential task to improve data quality. 
… Considerable improvements in data quality can be obtained by detecting and removing duplicates.”; p.2317 Figure 6 and col.1-col.2 (Section 5.7 T3S vs ALIAS, ALD, and Christen (2008)): “We present experiments with the real-world and one synthetic datasets … It can be seen, that T3S-[NGram and SVM] converge very quickly, producing good effectiveness with only a few manually labeled pairs … Note that T3S clearly outperforms ALIAS with a reduced labeling effort in both real datasets. … T3S-SVM requires only 103 and 31 labeled pairs (a reduction of 21 and 78 percent), reaching a statistically significant gain of 3 percent …”).
Regarding amended Claim 7,
 Ghanta in view of Dirac ‘430, in further view of Bianco teaches
(Currently amended) The system of claim 1, 
wherein the operations comprise applying the ML model building phase to generate an ML model, wherein the ML pipeline (Examiner’s note: As indicated earlier, this claim limitation exhibits a 112(a) lack of written description issue, and hence for purposes of examination, this limitation will be interpreted according to the support provided in the Applicant’s specification (“… wherein the ML pipeline includes a utility validation phase …”). As indicated earlier, Ghanta teaches the policy, training, and inference machine learning pipelines implementing various machine learning operations including algorithm training/inference operations, where the training machine learning pipeline is used for generating machine learning models (Ghanta Figure 2A, elements 202, 204, 206a-c and [0053]; and [0056]). As indicated earlier, Ghanta further teaches the ML Ghanta [0075]; [0078]: “… the primary validation module 304 is configured to validate the first machine learning algorithm/model using a validation data set.”; and [0088]: “… the secondary validation module 308 is configured to determine a suitability of the second machine learning algorithm for predicting the suitability of the first machine learning algorithm.”).), 
wherein the utility validation phase comprises causing the processor to: 
generate first and second ML models from the conditioned data set, wherein the first ML model corresponds to the ML model generated during the ML model building phase, and wherein the second ML model has fewer trainable parameters than the first ML model (Examiner’s note: Under its broadest reasonable interpretation, this claim limitation broadly recites validating first and second ML models using a conditioned data set, where the term “conditioned data set” broadly recites a data set based on a target data set (where an identified “target” data set broadly recites any identified data set that is used in a machine learning system). As indicated earlier, Ghanta teaches using training, validation, test, error, and inference data sets (“target data set”) in a machine learning system/ML management apparatus containing training, orchestration/management (policy), and inference pipelines (Ghanta [0019], [0036]-[0037]). As indicated earlier, Ghanta teaches the policy, training, and inference machine learning pipelines implementing various machine learning operations, including feature engineering operations, where a person having ordinary skill in the art would understand that “feature engineering” analyzes and performs conversions on a data set, where the output result of the feature engineering processing performed on a data set represents a conditioned data set (Ghanta [0053]). 
Ghanta further teaches a primary validation module performing validation of a first machine learning model using a validation data set, where this first machine learning model was generated from a ML training pipeline, as well as teaching a secondary validation module performing validation of a second machine learning model based on a statistical analysis of training statistics generated by the second machine learning model (where this second machine learning model was generated from a ML training pipeline by the secondary training module using a different machine learning algorithm, using the subset of data (error data set) produced Ghanta [0075]-[0077]; [0079]: “… The resulting output of the validation of the first machine learning algorithm/model, in one embodiment, comprises an error data set. The error data set … includes values indicating the prediction error of the first machine learning algorithm/model on the validation data set (e.g., a rate, a score, or other value that indicates how often the first machine learning algorithm/model accurately predicted a label …”; [0085]-[0087]: “… the secondary training module 306 is configured to train a second machine learning model for a second machine learning algorithm using the error data set described above. The second machine learning algorithm may be configured to predict a suitability of the first machine learning algorithm/model … the second machine learning algorithm is different than the first machine learning algorithm … the secondary training module 306 enhances the error data set by including additional data to supplement the prediction error data … the secondary training module 306 may include data for additional features such as features of the data set itself (e.g., the secondary training module 306 may select all or a subset of the available features of the error data set itself) …”).), 
compare a predictive ability of the first ML model and the second ML model (Examiner’s note: As indicated earlier, Ghanta teaches an analysis module that uses the identified suitability metrics produced from the validation modules for both the first and second machine learning models to determine whether these suitability metrics (i.e., confidence metrics, accuracy metrics, precision metrics, etc., where these metrics represent metrics used for determining predictive ability) satisfy thresholds for determining whether the first machine learning model is a good fit for generating accurate predictions for the inference set, such that this analysis to determine whether these thresholds are being satisfied represents a suitability analysis for the first machine learning model, and as such, the determination where these thresholds are not being satisfied represents an unsuitability condition concluding that a first machine learning model lacks efficacy, accuracy, or effectiveness, where this suitability analysis involving output produced from the first and second machine learning model represents a process for comparing predictive ability of the first ML model and the second ML model (Ghanta [0078]-[0079]; [0085]-[0087]: “… the secondary training module 306 may include … statistical signature scores for each sample in the data set, prediction values from the first machine learning algorithm … confidence metrics …”; [0089]: “… the secondary validation module 308 analyzes other statistics, such as training statistics, to determine suitability of the second machine learning algorithm in accurately assessing the effectiveness of the first machine learning algorithm … confidence metrics, accuracy metrics, precision metrics, and/or the like …”; Table 1 and [0092]-[0093]: “The analysis module 310 … is configured to determine whether the first machine learning algorithm/model is a suitable algorithm/model for generating predictions for the inference data set based on the
predictions that the second machine learning algorithm generates. … the analysis module 310 may determine whether the various metrics/health scores each satisfy a threshold value, if a percentage of the metrics/health scores satisfy threshold values, of if a calculated combination of various health scores (e.g., an average) satisfies a threshold. If so, then the analysis module 310 may determine that the first machine learning algorithm/model is generating accurate predictions for the inference data set. In some embodiments, the health scores/values may include prediction confidence values, data deviation values, AB testing values, canary values, and/or the like.”; and [0098]-[0099]: “… the analysis module 310 may determine whether the suitability score based on the metrics/health scores in Table 1 satisfies a threshold to determine (1) whether the second machine learning algorithm/module is a good fit for validating the predictive performance of the first machine learning algorithm/model, and if so (2) whether the first machine learning algorithm/model is a good fit for generating accurate predictions for the inference data set.”).).  
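For illustration only, the comparison recited in Claim 7 (a first ML model and a second ML model with fewer trainable parameters, both generated from the same conditioned data set and compared on predictive ability) can be sketched as follows. This is a minimal sketch under stated assumptions: the first model is a least-squares line (two parameters), the second is a constant predictor (one parameter), and predictive ability is measured as mean squared error on held-out entries. None of these choices, names, or data values is taken from the claims, Ghanta, or the other cited references.

```python
from statistics import mean
from typing import Callable, Sequence

Model = Callable[[float], float]


def fit_linear_model(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """First model: slope and intercept, i.e. two trainable parameters."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def fit_constant_model(ys: Sequence[float]) -> Model:
    """Second model: a single trainable parameter (the mean of the targets)."""
    constant = mean(ys)
    return lambda x: constant


def mean_squared_error(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Lower values indicate better predictive ability on the given entries."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


# Hypothetical conditioned data set, split into training and held-out entries.
train_x, train_y = [0, 1, 2, 3], [1, 3, 5, 7]
holdout_x, holdout_y = [4, 5], [9, 11]

first_model = fit_linear_model(train_x, train_y)
second_model = fit_constant_model(train_y)

first_error = mean_squared_error(first_model, holdout_x, holdout_y)
second_error = mean_squared_error(second_model, holdout_x, holdout_y)
first_is_more_predictive = first_error < second_error
```

The simpler model acts as a baseline: if the larger model cannot beat it on held-out entries, the data may not support the larger model.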
Regarding amended Claim 8, 
Ghanta in view of Dirac ‘430, in further view of Bianco teaches
(Currently amended) The system of claim 7, 
wherein applying the utility validation phase of the ML pipeline model building phase to the conditioned data set comprises causing the processor to 
terminate the ML pipeline in response to determining, based on comparing the predictive ability of the first ML model and the second ML model, that the predictive ability of the first ML model fails to exceed the predictive ability of the second ML model by more than a threshold amount (Examiner’s note: As indicated earlier, Ghanta teaches an analysis module that uses the identified suitability metrics produced from the validation modules for both the first and second machine learning models to determine Ghanta [0078]-[0079]; [0085]-[0087]; [0089]; Table 1 and [0092]-[0093]; and [0098]-[0099]). As indicated earlier, Ghanta teaches these additional actions include retraining the first machine learning model using different training data set, switching the first machine learning model to a different machine learning model for training, or recommending a different first machine learning algorithm to be used to analyze the inference data set. A person having ordinary skill in the art would understand that any of these above actions/recommendations (changing a different training set, switching different machine learning models for training, recommending different learning algorithms for inference) would require the current running ML pipeline to be terminated before these additional actions are applied to a machine learning pipeline (Ghanta Figure 5, elements 514, 516, 518, 520; and Figure 4, element 410 and [0100]-[0102]; [0109]: “…the analysis module 310 determines 514 whether the predicted suitability of the first machine learning algorithm/model satisfies a predetermined suitability threshold. If so, the method 500 ends. Otherwise, the action module 312 triggers one or more actions associated with the first machine learning algorithm. 
For instance, the action module 312 may trigger retraining 516 the first machine learning model with different training data, may trigger switching 518 the first machine learning model to a different machine learning model that is trained using different training data, may recommend 520 different machine learning algorithms for analyzing the inference data set, may update 522 suitability thresholds, and/or the like, and the method 500 ends.”).).  
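For illustration only, the termination condition recited in Claim 8 can be sketched as a single comparison. This is a minimal sketch that assumes predictive ability is expressed as a score where higher is better (for example, an accuracy value); the function name, the scores, and the margin are hypothetical and are not drawn from the claims or from Ghanta.

```python
def should_terminate_pipeline(
    first_score: float, second_score: float, margin: float
) -> bool:
    """Return True when the first model fails to exceed the second by more than `margin`.

    Assumption: scores are "higher is better". The pipeline continues only
    when first_score - second_score is strictly greater than margin.
    """
    return (first_score - second_score) <= margin


# Hypothetical scores: the first model barely improves on the simpler one.
terminate_small_gain = should_terminate_pipeline(0.82, 0.80, margin=0.05)
# Hypothetical scores: the first model improves by a clear margin.
terminate_large_gain = should_terminate_pipeline(0.90, 0.80, margin=0.05)
```

With these values the first call indicates termination and the second does not.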
Regarding amended Claim 9, 
Claim 9 recites a method comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 1, and hence is rejected under similar rationale and motivations provided by Ghanta, Dirac ‘430, and Bianco as indicated in Claim 1.
Regarding amended Claim 13, 
Claim 13 recites an article of manufacture including a non-transitory computer-readable medium having stored thereon program instructions that, upon execution by a computing system, cause the computing system to perform operations comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 1, and hence is rejected under similar rationale and motivations provided by Ghanta, Dirac ‘430, and Bianco as indicated in Claim 1. As indicated earlier, Ghanta teaches a non-volatile computer readable storage medium containing computer readable program instructions (program code) for modules implementing the described functions on a machine learning system/ML management apparatus (Ghanta [0018]-[0021], [0025], [0029], [0033]; and Figure 1, element 104 and [0043]-[0044]), including examples of a CD-ROM, DVD, and memory stick containing these program instructions, where such examples of a computer readable storage medium containing computer readable program instructions represent articles of manufacture.
Regarding amended Claim 19, 
Claim 19 recites the article of manufacture of claim 13, where the article of manufacture further comprises operations comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 7, and hence is rejected under similar rationale provided by Ghanta in view of Dirac ‘430, in further view of Bianco as indicated in Claim 7, in view of rejections from Claim 13.
Regarding amended Claim 20, 
Claim 20 recites the article of manufacture of claim 19, where the article of manufacture further comprises operations comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 8, and hence is rejected under similar rationale provided by Ghanta in view of Dirac ‘430, in further view of Bianco as indicated in Claim 8, in view of rejections from Claim 19.
Regarding amended Claim 21, 
Ghanta in view of Dirac ‘430, in further view of Bianco teaches
The system of claim 1, wherein, the operations comprise, in response to determining that the conditioned data set associated with the target data set is inadequate, 
applying the data pre-processing phase of the ML pipeline to an additional target data set (Examiner’s note: Under its broadest reasonable interpretation, the ‘or’ identified in this claim limitation is interpreted as an exclusive ‘or’ defining a list of alternatives, indicating that only one of the identified limitations (“apply the ML pipeline to an additional target data set”, “change a criterion used to select the target data set”, or “both”) is required for the claimed invention. As indicated earlier, Ghanta teaches the analysis module, in response to determining that the first machine learning model is unsuitable (deficient) based on not satisfying predetermined suitability thresholds, can request the action module to perform various actions associated with the first machine learning algorithm, where these additional actions include retraining the first machine learning model using different training data set, switching the first machine learning model to a different machine learning model for training, or recommending a different first machine learning algorithm to be used to analyze the inference data set. 
A person having ordinary skill in the art would understand that any of these above actions/recommendations (changing to a different training set, switching different machine learning models for training, recommending different learning algorithms for inference) would require the current running ML pipeline to be terminated before these additional actions are applied to a machine learning pipeline, and that any re-training or switching/replacing the first machine learning algorithm/model would involve applying the feature processing and data processing methods in the training and inference pipelines using additional training, validation, test, error, and inference data sets (representing additional target data sets) (Ghanta Figure 5, elements 514, 516, 518, 520; and Figure 4, element 410 and [0100]-[0102], [0109]; Figure 2A, elements 202, 204, 206a-c and [0053], [0056]; Figure 3 and [0075]-[0077], [0079], [0085]; and Dirac ‘430 Figure 42, elements 4210, 4220, 4227, 4080, 4255, 4261 and [0239]-[0240]).), 
changing a criterion used to select the target data set, or both.
Regarding new Claim 22, 
Ghanta in view of Dirac ‘430, in further view of Bianco teaches
(New) The system of claim 1, 
wherein determining that applying the ML model building phase of the ML [pipeline] would result in a deficient ML model (Examiner’s note: As indicated earlier, Ghanta teaches an analysis module Ghanta [0078]-[0079]; [0085]; [0089]; Table 1 and [0092]-[0093]; and [0098]-[0099]).) comprises:
determining one or more failure metrics associated with the conditioned data set (Examiner’s note: Under its broadest reasonable interpretation, the term “failure” as defined in the Merriam-Webster dictionary indicates a lack of success or a falling short (a deficiency), such that the term “failure metrics” broadly recites any set of metrics that are used to determine a deficient result. As indicated earlier, Ghanta teaches an analysis module that uses the identified suitability metrics (e.g., health scores, confidence metrics, data deviation values, accuracy metrics, precision metrics, etc.) produced from both the first and second machine learning models to perform additional analysis to determine whether these suitability metrics satisfy thresholds for determining whether the first machine learning model is a good fit (or deficient) for generating accurate predictions, such that these identified suitability metrics are also a representation of failure metrics, according to whether or not their comparison results against thresholds meet or fall below the threshold amount (Ghanta [0078]-[0079]; [0085]-[0087]: “… the secondary training module 306 may include … statistical signature scores for each sample in the data set, prediction values from the first machine learning algorithm … confidence metrics …”; [0089]: “… the secondary validation module 308 analyzes other statistics, such as training statistics, to determine suitability of the second machine learning algorithm in accurately assessing the effectiveness of the first machine learning algorithm … confidence metrics, accuracy metrics, precision metrics, and/or the like …”; Table 1 and [0092]-[0093]: “The analysis module 310 … is configured to determine whether the first machine learning algorithm/model is a suitable algorithm/model for generating predictions for the inference data set based on the predictions that the second machine learning algorithm generates. 
… the analysis module 310 may determine whether the various metrics/health scores each satisfy a threshold value, if a percentage of the metrics/health scores satisfy threshold values, of if a calculated combination of various health scores (e.g., an average) satisfies a threshold. If so, then the analysis module 310 may determine that the first machine learning algorithm/model is generating accurate predictions for the inference data set. In some embodiments, the health scores/values may include prediction confidence values, data deviation values, AB testing values, canary values, and/or the like.”; and [0098]-[0099]: “… the analysis module 310 may determine whether the suitability score based on the metrics/health scores in Table 1 satisfies a threshold to determine (1) whether the second machine learning algorithm/module is a good fit for validating the predictive performance of the first machine learning algorithm/model, and if so (2) whether the first machine learning algorithm/model is a good fit for generating accurate predictions for the inference data set.”).); and
terminating the ML pipeline, in response to determining that at least one of the one or more failure metrics exceeds a respective threshold value (Examiner’s note: Under its broadest reasonable interpretation, the term “failure” as defined in the Merriam-Webster dictionary indicates a lack of success or a falling short (a deficiency), such that the term “failure metrics” broadly recites any set of metrics that are used to determine a deficient result. As indicated earlier, Ghanta teaches an analysis module that uses the identified suitability metrics (e.g., health scores, confidence metrics, data deviation values, accuracy metrics, precision metrics, etc.) produced from both the first and second machine learning models to perform additional analysis to determine whether these suitability metrics satisfy thresholds for determining whether the first machine learning model is a good fit (or deficient) for generating accurate predictions, and where the analysis module triggers additional actions performed by an action module based on the earlier comparison result determining that the thresholds are not satisfied, where these actions (changing to a different training set, switching different machine learning models for training, recommending different learning algorithms for inference) represent the termination of a current ML pipeline before re-initiating the next pipeline (Ghanta [0078]-[0079]; [0085]-[0087]; Table 1 and [0092]-[0093]; and [0098]-[0099]). A person having ordinary skill in the art would understand that using and comparing suitability metrics (indicating success) against threshold values representing a minimum ).
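For illustration only, the two limitations of Claim 22 (determining failure metrics and terminating when at least one exceeds its respective threshold) can be sketched as follows. This is a minimal sketch that follows the claim wording, in which a larger metric value is worse; it does not reproduce the suitability-metric analysis of Ghanta. The metric names, values, and thresholds are hypothetical.

```python
from typing import Dict, List


def exceeded_failure_metrics(
    metrics: Dict[str, float], thresholds: Dict[str, float]
) -> List[str]:
    """Return the names of metrics whose value exceeds their respective threshold.

    Metrics without a configured threshold are ignored (an assumption).
    """
    return sorted(
        name
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    )


def pipeline_decision(metrics: Dict[str, float], thresholds: Dict[str, float]) -> str:
    """Terminate when at least one failure metric exceeds its threshold."""
    return "terminate" if exceeded_failure_metrics(metrics, thresholds) else "continue"


# Hypothetical failure metrics computed on a conditioned data set.
metrics = {"missing_value_rate": 0.40, "duplicate_rate": 0.05}
thresholds = {"missing_value_rate": 0.25, "duplicate_rate": 0.10}

exceeded = exceeded_failure_metrics(metrics, thresholds)
decision = pipeline_decision(metrics, thresholds)
```

Here only the hypothetical `missing_value_rate` exceeds its threshold, which is sufficient for the sketch to return a termination decision.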
Regarding new Claim 23,
 Ghanta in view of Dirac ‘430, in further view of Bianco teaches
(New) The system of claim 22, wherein determining the one or more failure metrics 
occurs at one or more sub-phases of the ML pipeline (Examiner’s note: Under its broadest reasonable interpretation, the term “sub-phase” broadly recites any decision point or step performed in the ML pipeline, including decisions or steps within defined ML pipeline phases. As indicated earlier, Ghanta teaches an analysis module that uses the identified suitability metrics (e.g., health scores, confidence metrics, data deviation values, accuracy metrics, precision metrics, etc.) produced from both the first and second machine learning models to perform additional analysis to determine whether these suitability metrics satisfy thresholds for determining whether the first machine learning model is a good fit (or deficient) for generating accurate predictions, such that these identified suitability metrics are also a representation of failure metrics, according to whether or not their comparison results against thresholds meet or fall below the threshold amount, with the generation of these scores/metrics done within the primary and secondary validation modules, and the comparison of these scores/metrics against thresholds and resulting actions done in an analysis and action module following the secondary validation module (Ghanta [0078]-[0079]; [0085]-[0087]; Table 1 and [0092]-[0093]; and [0098]-[0099]).).
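For illustration only, determining failure metrics at one or more sub-phases of a pipeline, as recited in Claim 23, can be sketched as a loop that checks metrics after each step and stops at the first violation. This is a minimal sketch; the sub-phase names, metric names, and thresholds are hypothetical, and the structure is not taken from Ghanta or the other cited references.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A sub-phase is a name plus a callable that returns its failure metrics.
SubPhase = Tuple[str, Callable[[], Dict[str, float]]]


def run_pipeline(
    sub_phases: Sequence[SubPhase], thresholds: Dict[str, float]
) -> Tuple[List[str], Optional[str]]:
    """Run sub-phases in order and stop at the first threshold violation.

    Returns the names of the sub-phases that completed and, if the pipeline
    was terminated, the name of the sub-phase that triggered termination.
    Metrics without a configured threshold never trigger termination.
    """
    completed: List[str] = []
    for name, compute_metrics in sub_phases:
        metrics = compute_metrics()
        violated = any(
            value > thresholds.get(key, float("inf"))
            for key, value in metrics.items()
        )
        if violated:
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical sub-phases with fixed metric values for demonstration.
sub_phases = [
    ("deduplicate", lambda: {"duplicate_rate": 0.02}),
    ("check_columns", lambda: {"empty_column_rate": 0.60}),
    ("build_model", lambda: {"validation_error": 0.10}),
]
thresholds = {"duplicate_rate": 0.10, "empty_column_rate": 0.50, "validation_error": 0.20}

completed, terminated_at = run_pipeline(sub_phases, thresholds)
```

In this run the second sub-phase exceeds its threshold, so the model-building sub-phase is never reached.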
Claims 3, 10, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over 
Ghanta et al., U.S. PGPUB 2020/0034665, filed 7/30/2018 [hereafter referred as Ghanta] in view of Dirac et al., U.S. PGPUB 2015/0379430, published 12/31/2015 [hereafter referred as Dirac '430], in further view of Bianco et al., A Practical and Effective Sampling Selection Strategy for Large Scale Deduplication, September 2015 [hereafter referred as Bianco] as applied to Claims 1, 9, and 13; in even further view of Maag et al., U.S. PGPUB 2017/0220403, published 8/3/2017 [hereafter referred as Maag].
Regarding amended Claim 3, 
Ghanta in view of Dirac ‘430, in further view of Bianco as applied to Claim 1 teaches
 The system of claim 1, 
… wherein the operations comprise applying the data pre-processing phase of the ML pipeline (Examiner’s note: As indicated earlier, Dirac ‘430 teaches a machine learning service that performs feature processing to generate a processed (and pruned) training set representing a conditioned data set, where this conditioned data set is used to execute the scheduled model training/re-training jobs, such that this flow of performing feature processing to generate a conditioned data set that is used for the scheduled model training/re-training jobs represents a flow of applying the data pre-processing phase of the ML pipeline to a target data set before applying the ML model building phase (Dirac ‘430 Figure 42, elements 4210, 4220, 4227, 4080, 4255, 4261 and [0239]-[0240]).) …
While Ghanta in view of Dirac ‘430, in further view of Bianco teaches feature processing and data pre-processing applied to target data sets (represented by training, validation, test, error, inference data sets) to produce conditioned data sets, Ghanta in view of Dirac ‘430, in further view of Bianco does not explicitly teach
wherein the conditioned data set is arranged in columns and rows, wherein the columns define fields of the conditioned data set and the rows define entries in the conditioned data set, and
… to determine that the conditioned data set is inadequate based on, for a particular column of the columns of the conditioned data set, at least one of 
(i) the particular column being empty;
(ii) more than a threshold amount of the entries in the particular column being empty; 
(iii) fewer than the threshold amount of the entries in the particular column not being empty; 
(iv) the particular column containing a single unique value; or 
(v) values of the particular column being skewed beyond the threshold amount of the entries in the particular column.  
Maag teaches
wherein the conditioned data set is arranged in columns and rows, wherein the columns define fields of the conditioned data set and the rows define entries in the conditioned data set (Examiner’s note: Under its broadest reasonable interpretation, the term “conditioned data set” broadly recites a data set based on an identified “target” data set (where an identified “target” data set broadly recites any  Maag teaches data pipelines performing data transformations on data sets obtained from data sources, where these data transformations involve mapping data elements in a source data format to a target data format in a data pipeline before providing the transformed data to a corresponding data sink, with these data sources being any source of data storing one or more datasets, and where the datasets are collections of data that are sent to a data sink that further feed into a machine learning model such as a classifier for training (Maag [0175], [0179]-[0180]). Maag further teaches a data source can be arranged as a relational database in tabular format that provides rows of data, where rows of data in a relational database represent a record or entry, and columns corresponding to data fields (Maag [0086]; [0089]-[0090]; [0099]: “Each of the data sources 320 may provide different data, possibly even in different data formats. As just one simple example, one data source 320 (e.g., 320A) may be a relational database server that provides rows of data … ”; and [0164]: “… relational database schemas typically define tables of data, where each table is defined to include a number of columns (or fields), each tied to a specific type of data, such as strings, integers, doubles, floats, bytes, and so forth.”).), and 
… to determine that the conditioned data set is inadequate based on, for a particular column of the columns of the conditioned data set, at least one of 
(i) the particular column being empty (Examiner’s note: Under its broadest reasonable interpretation, the phrase “at least one of … or” is interpreted to define a list of alternatives, indicating that a minimum of one of the listed options is required to be identified for the claimed invention. Maag teaches performing schema validation tests on an identified data set, including indicating a fault/warning for cases where columns which are defined as non-NULL contain NULL values, where the presence of NULL values in a column indicates an empty column, and where the corresponding indicated fault/warning reporting the issue represents an indication of an inadequacy present in a data set (Maag [0167]: “Configuration points for schema validation tests may include the schema that should be compared against the pre-transformation and/or post-transformation data, the pipeline and/or data sets from which to collect the data, how often the tests 700 should be performed, criteria for determining whether a violation is a "fault" or "potential fault" (or "warning"), valid values for certain columns/fields (e.g. ensuring columns which are defined as non-NULL do not contain NULL values, that non-negative columns do not contain numbers which are negative, etc.) and so forth.”).); 
(ii) more than a threshold amount of the entries in the particular column being empty; 
(iii) fewer than the threshold amount of the entries in the particular column not being empty; 
(iv) the particular column containing a single unique value; or 
(v) values of the particular column being skewed beyond the threshold amount of the entries in the particular column.  
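For illustration only, the alternative conditions (i)-(v) recited above can be pictured as simple column-level checks. The following is a minimal sketch and is not taken from the claims, the specification, or Maag; the function name `column_issues`, the two threshold parameters, and the reading of “skewed” as one value dominating the non-empty entries are all assumptions made for this example. Conditions (ii) and (iii) are treated together as a single empty-fraction test.

```python
from collections import Counter


def column_issues(values, max_empty_frac=0.5, max_mode_frac=0.9):
    """Return labels for the inadequacy conditions a single column triggers.

    `None` stands for an empty entry. Both thresholds are hypothetical.
    """
    present = [v for v in values if v is not None]
    if not present:
        # (i) every entry in the column is empty
        return ["empty_column"]

    issues = []
    empty_frac = 1 - len(present) / len(values)
    if empty_frac > max_empty_frac:
        # (ii)/(iii) too many empty entries, i.e. too few non-empty ones
        issues.append("too_many_empty")

    counts = Counter(present)
    if len(counts) == 1:
        # (iv) the column holds a single unique value
        issues.append("single_unique_value")
    else:
        # (v) assumed reading of "skewed": the most common value dominates
        mode_frac = counts.most_common(1)[0][1] / len(present)
        if mode_frac > max_mode_frac:
            issues.append("skewed")
    return issues
```

Under this sketch, a column would be flagged as inadequate whenever the returned list is non-empty, which mirrors the “at least one of” construction of the limitation.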
Both Ghanta in view of Dirac ‘430, in further view of Bianco and Maag are analogous art since both teach data pipelines for machine learning systems.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the feature processing performed in the ML pipeline taught in Ghanta in view of Dirac ‘430, in further view of Bianco and additionally incorporate a data transformation phase taught in Maag as a way to perform data transformation and validation on an identified data set as it moves through a ML pipeline. The motivation to combine is taught in Maag, since a data transformation phase provides functionality to not only transform input data into a particular desired format, but also the functionality to validate the transformed data before it is further processed downstream by another entity (e.g., by a model building phase), ensuring that it meets certain schema requirements, thus improving the quality of the target data set for use in a machine learning pipeline to build more accurate machine learning models (Maag paragraph [0164]: “Schema validation is the process of inspecting data to ensure that the data actually adheres to the format defined by the schema. Schemas in relational database may also define other constructs as well, such as relationships, views, indexes, packages, procedures, functions, queues, triggers, types, sequences, and so forth. However, schemas other than relational database schemas also exist, such as XML schemas. In some embodiments, the schema(s) indicating the format of the data stored by the data sources 320 and the schema representing the data format expected by the data sinks 330 are used to implement the transformations performed by the pipelines 410. For instance, the logic defined by each pipeline may represent the steps or algorithm required to transform data from the data source format into the data sink format. If the transformation is performed properly, the data after transformation should be able to pass validation with respect to the schema of the data sink. 
However, if errors occur during the transformation, the validation might fail if the transformed data is improperly formatted.”).
Regarding amended Claim 10, 
Claim 10 recites the method of claim 9, where the method comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 3, and hence is rejected under similar rationale and motivations provided by Ghanta in view of Dirac ‘430, in further view of Bianco and Maag as indicated in Claim 3, in view of rejections from Claim 9.
Regarding amended Claim 15, 
Claim 15 recites the article of manufacture of claim 13, where the article of manufacture further comprises operations comprising claim limitations that are similar in scope to corresponding claim limitations in Claim 3, and hence is rejected under similar rationale and motivations provided by Ghanta in view of Dirac ‘430, in further view of Bianco and Maag as indicated in Claim 3, in view of rejections from Claim 13.
Claims 4, 11, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over 
Ghanta et al., U.S. PGPUB 2020/0034665, filed 7/30/2018 [hereafter referred as Ghanta] in view of Dirac et al., U.S. PGPUB 2015/0379430, published 12/31/2015 [hereafter referred as Dirac '430], in further view of Bianco et al., A Practical and Effective Sampling Selection Strategy for Large Scale Deduplication, September 2015 [hereafter referred as Bianco], in even further view of Maag et al., U.S. PGPUB 2017/0220403, published 8/3/2017 [hereafter referred as Maag] as applied to Claims 3, 10, and 15; in even further view of Dirac et al., U.S. PGPUB 2015/0379424, published 12/31/2015 [hereafter referred as Dirac '424].
Regarding previously presented Claim 4, 
Ghanta in view of Dirac ‘430, in further view of Bianco, in even further view of Maag as applied to Claim 3 teaches
(Previously presented) The system of claim 3.
However, Ghanta in view of Dirac ‘430, in further view of Bianco, in even further view of Maag does not teach
… wherein the particular column contains at least one of 
(i) word vectors that describe, in a semantically-encoded vector space, a meaning of respective words, or 
(ii) paragraph vectors that describe, in a semantically-encoded vector space, a meaning of respective multi-word samples of text.  
Dirac ‘424 teaches
wherein the particular column contains at least one of 
(i) word vectors that describe, in a semantically-encoded vector space, a meaning of respective words (Examiner’s note: Under its broadest reasonable interpretation, the phrase “at least one of … or” is interpreted to define a list of alternatives, indicating that a minimum of one of the listed options is required to be identified for the claimed invention. Dirac ‘424 teaches performing data pre-processing operations on input data containing data records containing variables of data types such as text, and using natural language processing to perform feature processing (Dirac ‘424 [0035]: “Some machine learning workflows, which may correspond to a sequence of API requests from a client 164, may include the extraction and cleansing of input data records from raw data repositories 130 (e.g., repositories indicated in data source definitions 150) by input record handlers 160 of the MLS, as indicated by arrow 114. … The input data may comprise data records that include variables of any of a variety of data types, such as, for example text, … The output produced by the input record handlers may be fed to feature processors 162 (as indicated by arrow 115), where a set of transformation operations may be performed 162 in accordance with recipes 152 using another set of resources from pool 185. Any of a variety of feature processing approaches may be used depending on the problem domain: e.g., … natural language processing …”). 
Dirac ‘424 further teaches performing the text data transformations by determining the root words to be included in an n-gram for use in a machine learning algorithm, where this transformation process represents transforming “word vectors that describe, in a semantically-encoded vector space, the meaning of respective words” (Dirac ‘424 [0079]: “… a recipe language defined by the MLS enables users to easily and concisely specify transformations to be performed on specified sets of data records to prepare the records for use for model training and prediction. … In at least one embodiment, a pipeline of successive transformations to be performed starting with a given input data set may be indicated within a single recipe. In one embodiment, the MLS may perform parameter optimization for one or more recipes---e.g., the MLS may automatically vary such transformation properties as the sizes of quantile bins or the number of root words to be included in an n-gram in an attempt to identify a more useful set of independent variables to be used for a particular machine learning algorithm.”).), or 
(ii) paragraph vectors that describe, in a semantically-encoded vector space, a meaning of respective multi-word samples of text.  
Both Ghanta in view of Dirac ‘430, in further view of Bianco, in even further view of Maag and Dirac ‘424 are analogous art since both teach data pre-processing phases in machine learning systems.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the data pre-processing phase of Ghanta in view of Dirac ‘430, in further view of Bianco, in even further view of Maag and incorporate the text pre-processing steps of Dirac ‘424 as a way to handle text pre-processing in an input data set. The motivation to combine is taught in Dirac ‘424, as text data transformations that convert the data into root word vector representations/n-grams allow the machine learning system to perform automated parameter explorations for text data, thus improving upon the functionality of the machine learning system (Dirac ‘424 [0093]: “For many types of feature processing transformation operations, such as creating quantile bins for numeric data attributes, generating ngrams, or removing sparse or infrequent words from documents being analyzed, parameters may typically have to be selected, such as the sizes/boundaries of the bins, the lengths of the ngrams, the removal criteria for sparse words, and so on. The values of such parameters (which may also be referred to as hyper-parameters in some environments) may have a significant impact on the predictions that are made using the recipe outputs. Instead of requiring MLS users to manually submit requests for each parameter setting or each combination of parameter settings, in some embodiments the MLS may support automated parameter exploration.”; and [0094]: “Automated parameter exploration may also be used for selection dimensionality values for a vector representation of a text document (e.g., in accordance with the Latent Dirichlet Allocation (LDA) technique) or other natural language processing techniques. In some cases, the client may also indicate the criteria to be used to terminate exploration of the parameter value space, e.g., to arrive at acceptable parameter values. 
In at least some embodiments, the client may be given the option of letting the MLS decide the acceptance criteria to be used-such an option may be particularly useful for non-expert users. In one implementation, the client may indicate limits on resources or execution time for parameter exploration. In at least one implementation, the default setting for an auto-tune setting for at least some output transformations may be "true", e.g., a client may have to explicitly indicate that auto-tuning is not to be performed in order to prevent the MLS from exploring the parameter space for the transformations.”.).
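As a point of reference for the n-gram transformation discussed in the Dirac ‘424 passages cited above, the following minimal sketch shows how word n-grams of a chosen length can be produced from a token list. It is illustrative only and is not Dirac ‘424's implementation; the function name `ngrams` is hypothetical, and no root-word extraction or automated parameter exploration is performed here.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple.

    The length n is the kind of parameter that the cited passages
    describe as being tuned; here the caller simply supplies it.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, with `n=2` the tokens `["data", "quality", "check"]` yield the pairs `("data", "quality")` and `("quality", "check")`.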
Regarding previously presented Claim 11, 
Claim 11 recites the method of claim 10, where the method comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 4, and hence is rejected under similar rationale and motivations provided by Ghanta in view of Dirac ‘430, in further view of Bianco, in even further view of Maag and Dirac ‘424 as indicated in Claim 4, in view of rejections from Claim 10.
Regarding previously presented Claim 16, 
Claim 16 recites the article of manufacture of claim 15, where the article of manufacture further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 4, and hence is rejected under similar rationale and motivations provided by Ghanta in view of Dirac ‘430, in further view of Bianco, in even further view of Maag and Dirac ‘424 as indicated in Claim 4, in view of rejections from Claim 15.
Claims 6, 18, and 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over 
Ghanta et al., U.S. PGPUB 2020/0034665, filed 7/30/2018 [hereafter referred as Ghanta] in view of Dirac et al., U.S. PGPUB 2015/0379430, published 12/31/2015 [hereafter referred as Dirac '430], in further view of Bianco et al., A Practical and Effective Sampling Selection Strategy for Large Scale Deduplication, September 2015 [hereafter referred as Bianco] as applied to Claims 1, 13, and 22; in even further view of Schelter et al., Automating Large-Scale Data Quality Verification, 2018 [hereafter referred as Schelter].
Regarding amended Claim 6, 
Ghanta in view of Dirac ‘430, in further view of Bianco as applied to Claim 1 teaches
(Currently amended) The system of claim 1, 
… wherein applying the ML model building phase comprises generating an ML model to predict a particular column of the conditioned data set (Examiner’s notes: Under its broadest reasonable interpretation, the term “conditioned data set” broadly recites a data set that is based on an identified “target” data set (where an identified “target” data set broadly recites any identified data set that is used in a machine learning system). Ghanta teaches training pipeline and inference pipelines training a machine learning model on a training data set, and performing inferences on an inference data set, where the training data set contains three columns of feature data (Age, Sex, Height), and the inference data set contains only two columns of data (Age, Height), such that the inference data set infers an output corresponding to the information from the third column (Sex; M/F) (Ghanta Figure 2A, elements 200, 206a-c and [0040]: “In certain embodiments of machine learning systems 200, there is a training phase, for generating the machine learning model, and an inference phase for analyzing an inference data set using the machine learning model. The output from the inference phase may be one or more predictive "labels" determined as a function of one or more features of the inference data set. For example, if the training data set comprises three columns of feature data-Age, Sex, and Height-that are used to train the machine learning model, and the inference data comprises two columns of feature data-Age and Height-the output from an inference pipeline 206 using the machine learning model may be a "label" describing the predicted Sex (M/F) based on the given inference data.”).) …
While Ghanta in view of Dirac ‘430, in further view of Bianco teaches feature processing and data-preprocessing applied to target data sets (represented by training, validation, test, error, inference data sets) to produce conditioned data sets, Ghanta in view of Dirac ’430, in further view of Bianco does not explicitly teach
… wherein the conditioned data set is arranged in columns and rows, wherein the columns define fields of the conditioned data set and the rows define entries in the conditioned data set (Examiner’s note: Under its broadest reasonable interpretation, the term “conditioned data set” broadly recites a data set that is based on an identified “target” data set (where an identified “target” data set broadly recites any identified data set that is used in a machine learning system). Schelter teaches performing data quality verification tests on datasets that are used in determining predictions for a machine learning model (Schelter p.1787 col.2 2nd paragraph), where the data can be arranged in tables in a relational database system, and where the verification tests are based on computing and applying metrics to identify data constraints to a number of records (“hasSize”, “satisfies”, “satisfiesIf” ) as well as applying various other metrics to identify data constraints on a set of numerical or categorical columns (“isComplete”, “hasCompleteness”, “isInRange”, “isUnique”, “hasUniqueness”, “hasDistinctness”) (Schelter p.1783 col.2 1st paragraph: “… errors in the data might cause unexpected errors in downstream system that consume the data. In many cases these errors may be hard to detect, e.g., they might cause regressions in the prediction quality of a machine learning model, which makes assumptions about the shape of particular features computed from the input data [34].”; p.1782 col.1-col.2 Section 2. Data Quality Dimensions: “… The quality of data can refer to the extension of the data (i.e., data values), or to the intension of the data (i.e., the schema) [4] … Completeness refers to the degree to which an entity includes data required to describe a real-world object. In tables in relational database system, completeness can be measured the by presence of null values, which is usually interpreted as a missing value. 
… Consistency is defined as the degree to which a set of semantic rules are violated. Intra-relation constraints define a range of admissible values, such as a specific data type, an interval for a numerical column, or a set of values for a categorical column. … Inter-relation constraints may involve columns from multiple tables. … Accuracy is the correctness of the data and can be measured in two dimensions: syntactic and semantic. …”; p.1783 Table 1 Constraints available for composing user-defined data quality checks, refer to “isComplete”, “hasCompleteness”, “isUnique”, “hasUniqueness”, “hasDistinctness”, “isInRange”, “satisfies”, “satisfiesIf”, “hasSize”; and p.1784 Table 2 Computable metrics to base constraints on).) … 
… determining that values of the particular column are skewed beyond a threshold amount (Examiner’s note: Under its broadest reasonable interpretation, the phrase “values … skewed beyond a threshold amount” broadly recites values that are outside a specified interval range. As indicated earlier, Schelter teaches performing data quality verification tests on datasets, where the verification tests can involve applying constraints (“isInRange”) to a range of admissible values analyzed over an interval for a numerical column, which requires the use of a corresponding computed “ValueRange” metric to identify (Schelter p.1782 col.1-col.2 Section 2. Data Quality Dimensions; p.1783 Table 1 Constraints available for composing user-defined data quality checks, refer to “isInRange”; p.1784 Table 2 Computable metrics to base constraints on; p.1786 col.1 Section 3.3 Constraint Suggestion 1st paragraph: “The benefits of our system to users heavily depend on the richness and specificity of the checks and constraints, which the users define and for which our system will regular compute data quality metrics. … we provide machinery to automatically suggest constraints and identify data types for datasets … Such suggestion functionality can then be integrated into ingestion pipelines and can also be used during exploratory data analysis.”; and p.1786 col.2 2nd paragraph, 6th bullet: “ … If the number of distinct values in a column is below a particular threshold, we interpret the column as categorical and suggest an isInRange constraint that checks whether future values are contained in the set of already observed values.”).).
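The constraint-suggestion rule quoted above from Schelter p.1786 (treating a column with few distinct values as categorical and checking future values against the observed set) can be pictured with the following minimal sketch. It is not Schelter's code; the function names `suggest_is_in_range` and `violations` and the `max_distinct` threshold value are assumptions chosen for illustration.

```python
def suggest_is_in_range(sample, max_distinct=10):
    """Suggest an allowed-value set for a column, or None.

    If the sample has fewer than `max_distinct` distinct non-empty values,
    the column is treated as categorical and the observed values become
    the admissible set. The threshold value is hypothetical.
    """
    observed = frozenset(v for v in sample if v is not None)
    if len(observed) >= max_distinct:
        return None  # too many distinct values to treat as categorical
    return observed


def violations(values, allowed):
    """Return the non-empty values that fall outside the allowed set."""
    return [v for v in values if v is not None and v not in allowed]
```

In this sketch a later batch of data would fail the suggested constraint whenever `violations` returns a non-empty list.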
	Both Ghanta in view of Dirac ‘430, in further view of Bianco and Schelter are analogous art since both teach data/feature processing on training data sets.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take data/feature processing steps taught in Ghanta in view of Dirac ‘430, in further view of Bianco and incorporate the data verification tests taught in Schelter as a way to detect and identify data quality issues (missing values, out-of-range values, etc.) that would impact the performance of a machine learning model. The motivation to combine is taught in Schelter, as a way to provide data quality checks to identify these data quality issues that would potentially result in failures when the data containing these issues are ingested by downstream processes (such as a machine learning model) within an automated machine learning system. Hence providing a way to generate an alert of these data quality issues when they are detected through automated data verification tests on the data sets allows a machine learning system to take further action to correct these issues on large datasets, thus improving the overall usability, scalability, and robustness of the machine learning system (Schelter p.1781 col.2 (Section 1 Introduction): “… there is a trend across different industries towards more automation of business processes with machine learning (ML) techniques. These techniques are often highly sensitive on input data, as the deployed models rely on strong assumptions about the shape of their inputs [43], and subtle errors introduced by changes in data can be very hard to detect [34]. At the same time, there is ample evidence that the volume of data available for training is often a decisive factor for a model’s performance [17], [44] … Many such data sources do not support integrity con[s]traints and data quality checks … Such issues potentially result in failures of the ingestion process. Even if the ingestion process still works, the errors in the data might cause unexpected errors in downstream systems that consume the data. 
… We therefore postulate that there is a need for increased automation of data validation. We present a system that we built for this task and that meets the demands of production use cases. …”; p.1784 col.2 5th paragraph; and p.1791 col.2 last paragraph – p.1792 col.1 1st paragraph (Section 6 Learnings): “… users highlighted the fact that our data quality library runs on Spark, which they experience as a fast, scalable way to do data processing … Our system helped reduce manual and ad-hoc analysis on their data … Instead such check can now be run in an automated way as part of ingestion pipelines. Additionally, data producers can leverage our system to halt their data publishing pipelines when they encounter cases of data anomalies. By that, they can ensure that downstream data processing, which often includes training ML models, is only working with vetted data.”).  
Regarding amended Claim 18, 
Claim 18 recites the article of manufacture of claim 13, where the article of manufacture further comprises claim limitations that are similar in scope to corresponding claim limitations in Claim 6, and hence is rejected under similar rationale and motivations provided by Ghanta in view of Dirac ‘430, in further view of Bianco and Schelter as indicated in Claim 6, in view of rejections from Claim 13.
Regarding new Claim 24,
Ghanta in view of Dirac ‘430, in further view of Bianco as applied to Claim 22 teaches
(New) The system of claim 22, …
… determining the one or more failure metrics (Examiner’s note: Under its broadest reasonable interpretation, the term “failure” as defined in the Merriam-Webster dictionary indicates something that lacks success or falling short (a deficiency), such that the term “failure metrics” broadly recites any set of metrics that are used to determine a deficient result. As indicated earlier, Ghanta teaches an analysis (Ghanta [0078]-[0079]; [0085]-[0087]; [0089]; Table 1 and [0092]-[0093]; and [0098]-[0099]).); …
While Ghanta in view of Dirac ‘430, in further view of Bianco teaches the one or more failure metrics include data deviation metrics, Ghanta in view of Dirac ‘430, in further view of Bianco does not explicitly teach
determining that the conditioned data set comprises more than a threshold number of empty values in a particular column, in a particular field, or both.
Schelter teaches
determining that the conditioned data set comprises more than a threshold number of empty values in a particular column (Examiner’s note: Under its broadest reasonable interpretation, the ‘or’ identified in this claim limitation is interpreted as an exclusive ‘or’ defining a list of alternatives, indicating that only one of the identified limitations (“in a particular column”, “in a particular field”, or “both”) is required for the claimed invention. Furthermore, under its broadest reasonable interpretation, the term “conditioned data set” broadly recites a data set that is based on an identified “target” data set (where an identified “target” data set broadly recites any identified data set that is used in a machine learning system). As indicated earlier, Schelter teaches performing data quality verification tests on datasets, where the verification tests can involve applying constraints (“isComplete”, indicating that a particular column contains all non-NULL values (no empty/missing values), or “hasCompleteness”, indicating that some of the values in a column are NULL based on a custom validation from a user), both of which require the use of a corresponding computed “Completeness” metric to identify whether to apply an “isComplete” or “hasCompleteness” constraint (where the “hasCompleteness” constraint further identifies an associated lower bound start value of the Schelter p.1782 col.1-col.2 Section 2. Data Quality Dimensions; p.1783 Table 1 Constraints available for composing user-defined data quality checks, refer to “isComplete” and “hasCompleteness”; p.1784 Table 2 Computable metrics to base constraints on; p.1786 col.1 Section 3.3 Constraint Suggestion 1st paragraph: “The benefits of our system to users heavily depend on the richness and specificity of the checks and constraints, which the users define and for which our system will regular compute data quality metrics. 
… we provide machinery to automatically suggest constraints and identify data types for datasets … Such suggestion functionality can then be integrated into ingestion pipelines and can also be used during exploratory data analysis.”; and p.1786 col.2 2nd paragraph, 1st-2nd bullets: “ … If the column is complete in the sample at hand, we suggest an isComplete (not null) constraint. … If a column is incomplete in the sample at hand, we suggest a hasCompleteness constraint. We model the fact whether a value is present or not as a Bernoulli-distributed random variable, estimate a confidence interval for the corresponding probability, and return the start value of the interval as lower bound for the completeness in the data.”).), in a particular field, or both.
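The completeness rule quoted above from Schelter p.1786 (model presence of a value as a Bernoulli variable, estimate a confidence interval, and return the start of the interval as a lower bound) can be sketched as follows. This is an illustration only, not Schelter's code: the cited passage does not state which interval estimator is used, so the normal-approximation interval, the `z=1.96` default, and the function names below are assumptions.

```python
import math


def completeness(values):
    """Fraction of entries that are non-empty (None marks an empty entry)."""
    return sum(v is not None for v in values) / len(values)


def suggest_completeness_constraint(values, z=1.96):
    """Suggest a constraint name and a lower bound on completeness.

    A fully populated sample yields ("isComplete", 1.0). Otherwise the
    lower end of an assumed normal-approximation interval for the
    Bernoulli presence probability is returned with "hasCompleteness".
    """
    p = completeness(values)
    if p == 1.0:
        return ("isComplete", 1.0)
    n = len(values)
    lower = max(0.0, p - z * math.sqrt(p * (1 - p) / n))
    return ("hasCompleteness", lower)
```

Under these assumptions, a sample of 100 entries with 90 non-empty values gives a suggested lower bound of roughly 0.84, so a later data set whose completeness falls below that bound would exceed the tolerated share of empty values.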
	Both Ghanta in view of Dirac ‘430, in further view of Bianco and Schelter are analogous art since both teach data/feature processing on training data sets.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take data/feature processing steps taught in Ghanta in view of Dirac ‘430, in further view of Bianco and incorporate the data verification tests taught in Schelter as a way to detect and identify data quality issues (missing values, out-of-range values, etc.) that would impact the performance of a machine learning model. The motivation to combine is taught in Schelter, as provided in the prior art claim mapping of Claim 6 recited above.
Regarding new Claim 25, 
Ghanta in view of Dirac ‘430, in further view of Bianco as applied to Claim 22 teaches
(New) The system of claim 22, 
… the indication of inadequacy of the conditioned data set … caused the ML pipeline to be terminated (Examiner’s note: Under its broadest reasonable interpretation, the term “inadequacy” as defined in the Merriam-Webster dictionary is representing a state or condition of not being adequate, not enough, or not good enough. As indicated earlier, Ghanta teaches the analysis module determining suitability of the first machine learning model for generating accurate predictions, where the determination of unsuitability (e.g., the model is not generating accurate predictions) identifies a condition that the first machine learning model is a deficient ML model. Ghanta further teaches the analysis module, in response to determining that the first machine learning model is deficient based on not satisfying predetermined suitability thresholds, can request the action module to perform various actions associated with the first machine learning algorithm, where these additional actions include retraining the first machine learning model using different training data set, switching the first machine learning model to a different machine learning model for training, or recommending a different first machine learning algorithm to be used to analyze the inference data set, and where the trigger to perform any of the above actions represents an indication of inadequacy with regards to the first machine learning model. 
A person having ordinary skill in the art would understand that any of these above actions/recommendations (changing to a different training set, switching to a different machine learning model for training, recommending different learning algorithms for inference) would require the currently running ML pipeline to be terminated before these additional actions are applied to a machine learning pipeline (Ghanta Figure 5, elements 514, 516, 518, 520; and Figure 4, element 410 and [0098], [0100]-[0102]; and [0109]: “…the analysis module 310 determines 514 whether the predicted suitability of the first machine learning algorithm/model satisfies a predetermined suitability threshold. If so, the method 500 ends. Otherwise, the action module 312 triggers one or more actions associated with the first machine learning algorithm. For instance, the action module 312 may trigger retraining 516 the first machine learning model with different training data, may trigger switching 518 the first machine learning model to a different machine learning model that is trained using different training data, may recommend 520 different machine learning algorithms for analyzing the inference data set, may update 522 suitability thresholds, and/or the like, and the method 500 ends.”).) …
While Ghanta in view of Dirac ‘430, in further view of Bianco teaches providing an indication of inadequacy of a conditioned data set through triggering of actions that result in termination of a ML pipeline, Ghanta in view of Dirac ‘430, in further view of Bianco does not explicitly teach
… identifies one or more particular failure metrics …
Schelter teaches
… identifies one or more particular failure metrics (Examiner’s note: Schelter teaches that the system performing the data quality verification tests on datasets reports which constraints succeeded and failed, including additional information on the predicates used for the corresponding metrics, and the values that made the constraint fail, where this detailed information represents an identification of the one or more particular failure metrics, and where this detailed information can be used as an indication to halt a machine learning pipeline (to ensure that the downstream training of ML models is not being trained with this failed data) (Schelter p.1784 col.2 5th paragraph: “Output. After execution of the data quality verification, our system reports which constraints succeeded and which failed, including information on the predicate applied to the metric into which the constraint was translated, and the value that made a constraint fail.”; p.1785 Listing 2; and p.1792 col.1 1st paragraph (Section 6 Learnings): “… data producers can leverage our system to halt their data publishing pipelines when they encounter cases of data anomalies. By that, they can ensure that downstream data processing, which often includes training ML models, is only working with vetted data.”).) …
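The reporting-and-halting behavior described in the Schelter passages cited above can be pictured with the following minimal sketch, in which each failed constraint is reported together with the metric value that caused the failure, and any failure stops the pipeline before training. It is illustrative only and does not reproduce Schelter's Listing 2; the names `DataQualityError`, `verify`, and `run_pipeline`, and the tuple layout of a constraint, are assumptions.

```python
class DataQualityError(Exception):
    """Raised to halt the pipeline; carries the list of failed constraints."""

    def __init__(self, failures):
        super().__init__(f"{len(failures)} data quality constraint(s) failed")
        self.failures = failures


def verify(metrics, constraints):
    """Check each (name, metric_key, predicate) against computed metrics.

    Returns one record per failed constraint, naming the constraint, the
    metric it was evaluated on, and the value that made it fail.
    """
    failures = []
    for name, metric_key, predicate in constraints:
        value = metrics[metric_key]
        if not predicate(value):
            failures.append(
                {"constraint": name, "metric": metric_key, "value": value}
            )
    return failures


def run_pipeline(metrics, constraints, train):
    """Run `train` only when every constraint holds; otherwise halt."""
    failures = verify(metrics, constraints)
    if failures:
        raise DataQualityError(failures)
    return train()
```

In this sketch the `failures` attribute of the raised exception plays the role of the indication that identifies the particular failure metrics.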
	Both Ghanta in view of Dirac ‘430, in further view of Bianco and Schelter are analogous art since both teach data/feature processing on training data sets.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take data/feature processing steps taught in Ghanta in view of Dirac ‘430, in further view of Bianco and incorporate the data verification tests taught in Schelter as a way to detect and identify data quality issues (missing values, out-of-range values, etc.) that would impact the performance of a machine learning model. The motivation to combine is taught in Schelter, as provided in the prior art claim mapping of Claim 6 recited above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121