DETAILED ACTION
1.	This communication is in response to Application No. 16/563,036 filed on September 6, 2019 in which claims 1-20 are presented for examination.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
3.	The information disclosure statement submitted on 09/06/2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


5.	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (hereinafter Goel) (US PG-PUB 20200210721), in view of Yan et al. (hereinafter Yan) (“HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition”).
	Regarding Claim 1, Goel teaches a method comprising: 
receiving a data set for training one or more machine learning (ML) models, wherein the data set comprises labeled exemplars for a plurality of classes (Goel, Par. [0050], “In some examples, the training data for different ML models may vary depending on the general classification (i.e., the candidate classification of the parent with which a sub-class ML model is associated) and/or candidate classifications with which they are associated. For example, a first sub-class ML model associated with the general classification “signage” may be trained on training data that comprises various signs. In some examples, the training data for the first sub-class ML model may exclusively comprise training data that includes at least one sign, although in additional or alternative examples, the training data may comprise data that does not include signs to train the sub-class ML model negatively as well (e.g. what is not a sign). Whereas, a second sub-class ML model associated with the general classification “pedestrian” may be trained on training data that may or may not contain signs, but does contain objects related to rare classifications such as “individual in wheelchair”, etc. The first ML model may be more broadly trained to differentiate between general classes, but, in some examples, not to differentiate between sub-classes. For example, first ML model may be trained using ground truths that indicate “pedestrian”, “vehicle”, “sign”, etc., but not “pedestrian holding object”, “individual in wheelchair”, “four-wheeled vehicle”, “stop sign”, “yield sign”, “speed sign”, etc.”, thus, training data is received for each different ML model. Further, the training data set comprises ground truth data which are labelled for a plurality of classes); 
training a first ML model using the training set (Goel, Par. [0053], “In some examples, the first ML model 202 may be trained first and sub-class ML models may be trained once the first ML model 202 has reached a sufficient accuracy. For example, the first ML model 202 may be trained until the first ML model 202 outputs a classification that meets or exceeds a probability threshold. In an additional or alternate example, the first ML model 202 may be trained simultaneously with one or more layers of sub-class ML models.”, therefore, the first ML model is trained using the training data set); 
evaluating, using the first testing set, a quality of the first ML model with respect to each class of the plurality of classes (Goel, Par. [0018], “In some examples, the techniques may additionally or alternatively comprise training the selected sub-class ML model and/or the first ML model by backpropagating loss through the sub-class ML model and/or the first ML model (e.g., for the classification associated with the sub-class ML model). In some examples, the loss may be backpropagated for the sub-classification and/or probability generated by the sub-class ML model and/or the classification, probability, one or more feature maps, and/or ROI generated by the first ML model. In some examples, a first loss may be calculated for the first ML model (e.g., based at least in part on ground truth that specifies an ROI and/or a classification) and a second loss may be calculated for a sub-class ML model (e.g., based at least in part on ground truth that specifies a classification and/or a sub-classification). In those examples where multiple models are used, the sub-class ML model may be trained using the second loss and/or the first ML model may be trained based at least in part on backpropagating the first loss and/or the second loss (i.e., training the model end-to-end). Backpropagating the second loss to train the first ML model may further refine the accuracy of the first ML model. In other examples where a single model (e.g., neural network) is used, the model may use one or more losses and propagate the losses back for refinement.”, therefore, loss may be calculated for the first ML model, in order to evaluate the quality with respect to each class/classification compared to ground truth); and 
upon determining that quality of the first ML model is below a predefined threshold with respect to a first class and a second class of the plurality of classes (Goel, Par. [0053-0054], “In some examples, the first ML model 202 may be trained first and sub-class ML models may be trained once the first ML model 202 has reached a sufficient accuracy. For example, the first ML model 202 may be trained until the first ML model 202 outputs a classification that meets or exceeds a probability threshold. In an additional or alternate example, the first ML model 202 may be trained simultaneously with one or more layers of sub-class ML models. In some examples, loss calculated for a sub-classification ML model may be backpropagated through the sub-classification and/or any parent ML model(s) up to and including the first ML model. In such examples, e.g., where one or more models are used, the model may be referred to as being trained “end-to-end.”, thus, the first ML model is trained until it outputs a classification that meets or exceeds a threshold. Hence, the training would continue if the first ML model is below a predefined threshold with respect to classification): 
identifying a subset of the training set, wherein each exemplar in the subset corresponds to either the first class or the second class (Goel, Par. [0011], “The techniques may comprise receiving the classification and selecting, based at least in part on the classification, a sub-class ML model from among a plurality of sub-class ML models. Selecting the sub-class ML model may comprise determining a subset of one or more feature maps to provide to the sub-class ML model as input. In some examples, each sub-class ML model may be associated with a different classification, although it is understood that, in additional or alternate examples, two sub-class ML models may share a common classification. For example, a first sub-class ML model may be associated with a “pedestrian” classification, a second sub-class ML model may be associated with a “vehicle” classification, and so on. Therefore, if the first ML model outputs a “pedestrian” classification, the techniques may include selecting the first sub-class ML model. In some examples, a selection component may provide a first subset to a first sub-class ML model based at least in part on a first classification and a second subset to a second sub-class ML model based at least in part on a second classification. Such selection may be by logical statements (e.g., switch, if-then, etc.), as part of a pooling calculation in a model, as another subnetwork, or otherwise.”, thus, subsets of the training data set are identified and exemplars either correspond to a first or second classification); and 
training a second ML model using the subset of the training set (Goel, Par. [0050], “In some examples, the training data for different ML models may vary depending on the general classification (i.e., the candidate classification of the parent with which a sub-class ML model is associated) and/or candidate classifications with which they are associated. For example, a first sub-class ML model associated with the general classification “signage” may be trained on training data that comprises various signs. In some examples, the training data for the first sub-class ML model may exclusively comprise training data that includes at least one sign, although in additional or alternative examples, the training data may comprise data that does not include signs to train the sub-class ML model negatively as well (e.g. what is not a sign). Whereas, a second sub-class ML model associated with the general classification “pedestrian” may be trained on training data that may or may not contain signs, but does contain objects related to rare classifications such as “individual in wheelchair”, etc. The first ML model may be more broadly trained to differentiate between general classes, but, in some examples, not to differentiate between sub-classes. For example, first ML model may be trained using ground truths that indicate “pedestrian”, “vehicle”, “sign”, etc., but not “pedestrian holding object”, “individual in wheelchair”, “four-wheeled vehicle”, “stop sign”, “yield sign”, “speed sign”, etc.”, thus, a second ML model using the subset of training data is trained)

Goel does not explicitly disclose partitioning the data set into a training set and a first testing set. However, Yan teaches partitioning the data set into a training set and a first testing set (Yan, Pg. 4325, 7.2. CIFAR100 Dataset, “The CIFAR100 dataset consists of 100 classes of natural images. There are 50K training images and 10K testing images. We follow [11] to preprocess the datasets (e.g. global contrast normalization and ZCA whitening). Randomly cropped and flipped image patches of size 26 x 26 are used for training. We adopt a NIN network1 with three stacked layers [21]. We denote it as CIFAR100-NIN which will be the HD-CNN building block. Fine category components share preceding layers from conv1 to pool1 which accounts for 6% of the total parameters and 29% of the total floating point operations. The remaining layers are used as independent layers. For building the category hierarchy, we randomly choose 10K images for the training set as held-out set.”, hence, the dataset is split into a training set and a first testing set). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of training a first machine learning model, using training data comprising a plurality of classes, and subsequently training a second machine learning model based on quality of the first machine learning model, as disclosed by Goel to include the partitioned testing set, as disclosed by Yan. One of ordinary skill in the art would have been motivated to make this modification to ensure that the first and second machine learning models are tested using an adequate amount of data and provide improved classification accuracy (Yan, Pg. 4324, 6. HD-CNN Testing, “As we add fine category components into the HD-CNN, the number of parameters, memory footprint and execution time in rear layers, all scale linearly in the number of coarse categories. To ensure HD-CNN is scalable for large-scale visual recognition, we develop conditional execution and layer parameter compression techniques”).
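For illustration only, the sequence of steps recited in Claim 1 (partition, train a first model, evaluate per-class quality, and train a second model on the subset of exemplars belonging to the under-performing classes) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Goel, Yan, or the application: the nearest-centroid classifier, the function names, the 20% test fraction, and the 0.9 threshold are all hypothetical.

```python
import random
from collections import defaultdict


def partition(data, test_fraction=0.2, seed=0):
    """Split labeled exemplars into (training set, first testing set)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_centroids(train_set):
    """Hypothetical stand-in for an ML model: one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for features, label in train_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(model, features):
    """Return the class whose centroid is closest to the input."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(model[label], features))
    return min(model, key=squared_distance)


def per_class_accuracy(model, test_set):
    """Quality of the model with respect to each class in the testing set."""
    correct, total = defaultdict(int), defaultdict(int)
    for features, label in test_set:
        total[label] += 1
        if predict(model, features) == label:
            correct[label] += 1
    return {label: correct[label] / total[label] for label in total}


def train_and_refine(train_set, test_set, threshold=0.9):
    """Train a first model; train a second model only on the weak classes."""
    first = train_centroids(train_set)
    quality = per_class_accuracy(first, test_set)
    weak = sorted(label for label, q in quality.items() if q < threshold)
    second = None
    if len(weak) >= 2:  # e.g. a first class and a second class below threshold
        subset = [(f, label) for f, label in train_set if label in weak]
        second = train_centroids(subset)
    return first, second, weak
```

In this sketch the second model sees only exemplars of the classes that fell below the threshold, which mirrors the "subset of the training set" limitation; any real classifier could replace the centroid stand-in.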

Regarding Claim 2, Goel in view of Yan teaches the method of claim 1, wherein partitioning the data set further comprises partitioning the data set into a second testing set (Yan, Pg. 4328, Section 7.3.2 VGG-16-layer Building Block Net, “We follow the training and testing protocols as in [26]. For training, we first sample a size S from the range [256, 512] and resize the image so that the length of short edge is S. Then a randomly cropped and flipped patch of size 224 x 224 is used for training. For testing, dense evaluation is performed on three scales {256, 384, 512} and the averaged prediction is used as the final prediction. Please refer to [26] for more training and testing details.”, thus, the data set is partitioned into a second testing set), the method further comprising: 
evaluating a quality of the second ML model with respect to the first and second classes; and upon determining that the second ML model is satisfactory with respect to the first and second classes (Goel, Par. [0046], “In some examples, based at least in part on determining one of the sub-classifications to output, the sub-class ML model 304(p) may determine whether the output sub-classification meets or exceeds a probability threshold 324. For example, even though the sub-classification may be associated with a maximum probability of all the probabilities in the probability distribution 320, the probability may still be too low to be relied upon (e.g., less than 95%, less than 90%, less than 80%, less than 70%). If the probability associated with the output sub-classification is less than the probability threshold 324, the sub-class ML model 304(p) may output the classification received from the first ML model 202 instead of the sub-classification. However, if the probability meets or exceeds the probability threshold, the sub-class ML model 304(p) may output the sub-classification. In an additional or alternate example, the sub-class ML model 304(p) may output the sub-classification in addition to the general classification, even though the sub-classification is associated with a probability below the probability threshold, although in some examples, the sub-class ML model 304(p) may additionally or alternatively output an indication that the sub-classification is associated with a probability less than the probability threshold 324.”, therefore, loss is calculated for the sub-class ML model, which evaluates the quality of the second ML model with respect to the classifications. Further, this is evaluated against a probability threshold which would determine if the second ML model is satisfactory with respect to the classification): 
creating a hierarchical ML model comprising the first and second ML models (Goel, Par. [0044], “In other words, the example architecture 300 may comprise a hierarchical structure of parent ML models and child ML models associated by classification where a child ML model is trained to output sub-classifications associated with a classification generated by a parent ML model. The classifications and/or sub-classifications generated by the ML models discussed herein may reflect the hierarchical structure of the ML models. For example, a “yield sign” may be indicated by the second sub-class ML model as “signage:traffic sign:yield sign”.”, therefore, as also illustrated in Figure 3, a hierarchical ML model comprising the first and second ML models is created).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 3, Goel in view of Yan teaches the method of claim 2, wherein creating the hierarchical ML model comprises linking an input of the second ML model to output of the first ML model (Goel, Figure 3, depicts the hierarchical ML model with label 300. Further, it is apparent that the first ML model, label 202, is linked to the sub-class ML models, label 304, in which the output of the first ML model is fed as input to the sub-class ML models), such that when the first ML model classifies input data as belonging to the first or second class (Goel, Figure 4, depicts that a classification is generated by the first ML model in step 404), the input data is forwarded to the second ML model for final classification (Goel, Figure 4, depicts in steps 406-414 that once the input data is forwarded to the sub-class ML models a final classification is generated based on satisfying the probability threshold).

Regarding Claim 4, Goel in view of Yan teaches the method of claim 1, wherein evaluating the quality of the first ML model comprises: 
Goel does not explicitly disclose wherein evaluating the quality of the first ML model comprises: generating a confusion matrix by processing the first testing set using the first ML model; and determining a precision of the first ML model with respect to each class of the plurality of classes, based on the confusion matrix. However, Yan teaches generating a confusion matrix (Yan, Pg. 4323, 4. Learning a Category Hierarchy, “We randomly sample a held-out set of images with balanced class distribution from the training set. The rest of the training set is used to train a building block net. We obtain a confusion matrix F by evaluating the net on the held-out set”, thus, a confusion matrix is generated) by processing the first testing set using the first ML model (Yan, Pg. 4325, 7.1 Overview, “We evaluate HD-CNN on the benchmark datasets CIFAR100 [17] and ImageNet [4]. HD-CNN is implemented on the widely deployed Caffe [15] software. The network is trained by back propagation [18]. We run all the testing experiments on a single NVIDIA Tesla K40c card”, therefore, the first ML model (which may be a convolutional neural network (CNN) as described in Goel Par. [0030]) processes the first testing set); and determining a precision of the first ML model with respect to each class of the plurality of classes, based on the confusion matrix (Yan, Pg. 4323, 4. Learning a Category Hierarchy, “We randomly sample a held-out set of images with balanced class distribution from the training set. The rest of the training set is used to train a building block net. We obtain a confusion matrix F by evaluating the net on the held-out set. A distance matrix D is derived as D = 1 - F and its diagonal entries are set to be zero. D is further transformed by D = 0.5 * (D + D^T) to be symmetric. The entry D_ij measures how easy it is to discriminate categories i and j. Spectral clustering is performed on D to cluster fine categories into K coarse categories. The result is a two-level category hierarchy representing a many-to-one mapping Pd : [1,C] -> [1,K] from fine to coarse categories. Here, the coarse categories are disjoint”, thus, based on the confusion matrix, the precision of the first ML model with respect to each classification/category is determined). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of training a first and second machine learning model as disclosed by Goel in view of Yan in Claim 1, to include the generation of a confusion matrix using the first ML model and determination of a precision based on the confusion matrix, as disclosed by Yan. One of ordinary skill in the art would have been motivated to make this modification to better understand and correct the errors produced by the machine learning classification model and increase its accuracy (Yan, Pg. 4323, 4. Learning a Category Hierarchy, “With disjoint coarse categories, the overall classification depends heavily on the coarse category classifier. If an image is routed to an incorrect fine category classifier, then the mistake can not be corrected as the probability of ground truth label is implicitly set to zero there. Removing the separability constraint between coarse categories can make the HD-CNN less dependent on the coarse category classifier” & Table 1 which depicts the testing errors on CIFAR100 dataset).
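For illustration only, the relationship between a confusion matrix, a per-class precision, and the distance matrix described in the quoted passage of Yan (D = 1 - F, zero diagonal, then symmetrized) may be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, rows are taken to be true classes and columns predicted classes, and F is assumed to be the row-normalized confusion matrix, which the quoted passage does not state.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with rows = true class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def per_class_precision(matrix):
    """Precision of class j = correct predictions of j / all predictions of j."""
    n = len(matrix)
    precisions = []
    for j in range(n):
        column_total = sum(matrix[i][j] for i in range(n))
        precisions.append(matrix[j][j] / column_total if column_total else 0.0)
    return precisions


def distance_matrix(matrix):
    """D = 1 - F with zero diagonal, then D = 0.5 * (D + D^T).

    Assumption: F is the confusion matrix normalized so each row sums to 1.
    """
    n = len(matrix)
    f = []
    for row in matrix:
        row_total = sum(row)
        f.append([value / row_total if row_total else 0.0 for value in row])
    d = [[0.0 if i == j else 1.0 - f[i][j] for j in range(n)] for i in range(n)]
    return [[0.5 * (d[i][j] + d[j][i]) for j in range(n)] for i in range(n)]
```

A small entry in the symmetrized matrix indicates a pair of classes that the model frequently confuses, which is the kind of pair a dedicated second-level classifier would target.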

Regarding Claim 5, Goel in view of Yan teaches the method of claim 1, the method further comprising: 
grouping the plurality of classes based on a required accuracy for each class of the plurality of classes (Yan, Pg. 4323, “Our goal of building a category hierarchy is grouping confusing fine categories into the same coarse category for which a dedicated fine category classifier will be trained. We employ a top down approach to learn the hierarchy from the training data.”, therefore, classes/categories are grouped based on accuracy); 
training a classifier ML model for each group of classes (Yan, Pg. 4325, 6. HD-CNN Testing, “In HD-CNN, the number of parameters in rear layers of fine category classifiers grows linearly in the number of coarse categories. Thus we compress the layer parameters at test time to reduce memory footprint”, thus, a coarse category classifier is trained and tested for each group of categories/classes); and 
training a group ML model to assign input to one of the classifier ML models (Yan, Pgs. 4321-4322, “In this paper, we propose a generic and principled hierarchical architecture, Hierarchical Deep Convolutional Neural Network (HD-CNN), that decomposes an image classification task into two steps. An HD-CNN first uses a coarse category CNN classifier to separate easy classes from one another. More challenging classes are routed downstream to fine category classifiers that focus on confusing classes.”, therefore, a group ML model is trained based on the coarse category classifier. This is further illustrated by Figure 1, which depicts the various coarse categories and how they are used in training a group ML model).
The reasons of obviousness have been noted in the rejection of Claims 1 and 4 above and are applicable herein.

Regarding Claim 6, Goel in view of Yan teaches the method of claim 1, the method further comprising: 
receiving a first input (Goel, Par. [0059], “At operation 402, example process 400 may comprise receiving sensor data, according to any of the techniques discussed herein.”, thus, first input data is received as shown in Figure 4, label 402); 
processing the first input using the first ML model (Goel, Par. [0060], “At operation 404, example process 400 may comprise generating, by a first ML model, an ROI, classification associated with an object, and/or one or more feature maps, according to any of the techniques discussed herein. In some examples, according to the architecture discussed herein, the first ML model may be a parent (and/or a subnetwork) to one or more child ML models (which may be different ML model(s) and/or additional subnetworks of a same ML model as the first ML model), where each child ML model (i.e., sub-class ML model) corresponds to a different candidate classification that the first ML model is trained to generate a probability distribution for. In an additional or alternate example, the classifications with which the child ML models are associated may overlap, such that two or more sub-class models may generate a sub-class and/or classification probability based at least in part on a classification output by the first ML model. In some examples, the classification may be associated with the ROI and/or otherwise associated with a representation of the object in the sensor data.”, thus, the first input is processed by the first ML model, as shown in Figure 4 label 404); and 
upon determining, based on output of the first ML model, that the first input corresponds to either the first class or the second class (Goel, Par. [0061], “At operation 406, example process 400 may comprise selecting, based at least in part on the classification generated by the first ML model, a sub-class ML model from among a plurality of sub-class ML models, according to any of the techniques discussed herein. In some examples, operation 406 may further comprise identifying which sub-class ML models are associated as children to the first ML model. However, in an additional or alternate example, the sub-class ML models may be communicatively coupled (e.g., by hardware and/or by a software switch) to the first ML model such that an output of the first ML model is transmitted directly to a sub-class ML model that corresponds to the classification indicated by the output of the first ML model. Regardless, selecting the sub-class ML model may comprise determining that the sub-class ML model is associated with the classification generated by the first ML model. In an additional or alternate example, first ML model and the sub-class ML model(s) may be sub-portions of a same neural network. In such an example, operation 406 may be omitted.”, therefore, Figure 4, label 406, shows the determination of the classification based on the first input into the first ML model): 
processing the first input using the second ML model (Goel, Par. [0064], “At operation 408, example process 400 may comprise generating, by the selected sub-class ML model, a sub-classification and/or a sub-classification probability associated with the object, according to any of the techniques discussed herein. For example, generating the sub-classification and/or the probability may be based at least in part on the portion(s) of the feature map determined above and/or input to the first ML model, the ROI determined by the first ML model, and/or the classification determined by the first ML model. Although, once trained, in some examples, the sub-class ML model may not receive the classification and/or the ROI determined by the first ML model since the sub-class ML model may be trained to generate the sub-classification and/or probability based at least in part on the portion(s) of the feature map determined to correspond to the object. In some examples, a selection component may ensure that portion(s) are routed to the correct sub-class ML model that corresponds to the classification generated by the first ML model. For example, the selection component may comprise a layer of a neural network designed to transmit data from an output node of a parent ML model to an input node of a child ML model and/or a hardware and/or software switch.”, therefore, as shown in Figure 4 label 408, the first input is processed by the second ML model and evaluated against the threshold, as shown in label 410); and 
returning output of the second ML model (Goel, Figure 4, labels 412 and 414, which show that the output is returned based on the classification of the second ML model).

Regarding Claim 7, Goel in view of Yan teaches the method of claim 6, the method further comprising: 
receiving a second input (Goel, Par. [0059], “At operation 402, example process 400 may comprise receiving sensor data, according to any of the techniques discussed herein.”, thus, second input data is received as shown in Figure 4, label 402); 
processing the second input using the first ML model (Goel, Par. [0060], “At operation 404, example process 400 may comprise generating, by a first ML model, an ROI, classification associated with an object, and/or one or more feature maps, according to any of the techniques discussed herein. In some examples, according to the architecture discussed herein, the first ML model may be a parent (and/or a subnetwork) to one or more child ML models (which may be different ML model(s) and/or additional subnetworks of a same ML model as the first ML model), where each child ML model (i.e., sub-class ML model) corresponds to a different candidate classification that the first ML model is trained to generate a probability distribution for. In an additional or alternate example, the classifications with which the child ML models are associated may overlap, such that two or more sub-class models may generate a sub-class and/or classification probability based at least in part on a classification output by the first ML model. In some examples, the classification may be associated with the ROI and/or otherwise associated with a representation of the object in the sensor data.”, thus, the second input is processed by the first ML model, as shown in Figure 4 label 404); and 
upon determining, based on output of the first ML model, that the second input corresponds to a third class of the plurality of classes, returning the output of the first ML model (Goel, Par. [0069], “In an additional or alternate example, if the sub-class ML model is related as a parent to other sub-class ML models, the sub-class ML model may transmit the sub-classification and/or portion(s) of the feature map(s) to a child sub-class ML model that corresponds to the sub-classification. The process described above at operations 408-414 may be repeated at the child sub-class ML model. However, if a sub-classification generated by the child sub-class ML model is less than a second probability threshold (e.g., which may equal, exceed, or be less than the probability threshold), instead of associating the classification with the object, the example process 400 may comprise associating the sub-classification of the sub-class ML model with the object. In some examples, an ROI, classification, and/or probability may be output by the first ML model and a sub-classification and/or probability may additionally or alternatively be output by any sub-class ML model that the feature data reaches (e.g., by selection/forwarding).”, thus, the process 400 may be repeated based on the output of the first ML model, where the second input corresponds to a third class).
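For illustration only, the routing behavior recited in Claims 3, 6 and 7 (forward the input to the second model when the first model outputs the first or second class, and otherwise return the output of the first model) may be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the representation of each model as a callable returning a class label, and the `routed_classes` parameter are hypothetical and are not drawn from Goel, Yan, or the application.

```python
def classify_hierarchical(first_model, second_model, routed_classes, features):
    """Return the final classification for one input.

    first_model, second_model: callables mapping features to a class label
        (hypothetical stand-ins for trained ML models).
    routed_classes: the classes for which the first model was found
        unsatisfactory, e.g. the first class and the second class.
    """
    coarse_label = first_model(features)
    if coarse_label in routed_classes:
        # Output of the first model is linked to the input of the second
        # model: the same input is forwarded for final classification.
        return second_model(features)
    # Any other class (e.g. a third class): return the first model's output.
    return coarse_label
```

The sketch omits the probability-threshold fallback described in the quoted passages of Goel, under which the general classification may be output when the sub-classification probability is too low.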

Regarding Claim 8, Goel in view of Yan teaches a computer-readable storage medium containing computer program code that, when executed by operation of one or more computer processors, performs an operation (Goel, Par. [0100], “A non-transitory computer-readable medium storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations”, thus, a computer-readable storage medium containing computer program code to perform an operation is disclosed) comprising: 
receiving a data set for training one or more machine learning (ML) models, wherein the data set comprises labeled exemplars for a plurality of classes (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a computer-readable storage medium, therefore it is rejected under the same rationale); 
partitioning the data set into a training set and a first testing set (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a computer-readable storage medium, therefore it is rejected under the same rationale); 
training a first ML model using the training set (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a computer-readable storage medium, therefore it is rejected under the same rationale); 
evaluating, using the first testing set, a quality of the first ML model with respect to each class of the plurality of classes (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a computer-readable storage medium, therefore it is rejected under the same rationale); and 
upon determining that quality of the first ML model is below a predefined threshold with respect to a first class and a second class of the plurality of classes (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a computer-readable storage medium, therefore it is rejected under the same rationale): 
identifying a subset of the training set, wherein each exemplar in the subset corresponds to either the first class or the second class (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a computer-readable storage medium, therefore it is rejected under the same rationale); and 
training a second ML model using the subset of the training set (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a computer-readable storage medium, therefore it is rejected under the same rationale).
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
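For illustration only, the operation recited in Claim 8 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Goel, Yan, or the application: the nearest-centroid classifier, the per-class recall used as the "quality" measure, the fixed-stride partition, the 0.8 threshold, and all function names are hypothetical choices made to keep the example self-contained.

```python
from collections import Counter, defaultdict


def partition(data, stride=4):
    """Split labeled exemplars into a training set and a first testing set.
    Assumption: every stride-th exemplar is held out; real code would shuffle."""
    test = data[::stride]
    train = [ex for i, ex in enumerate(data) if i % stride != 0]
    return train, test


def train_centroid_model(examples):
    """Train a toy nearest-centroid classifier on (feature, label) pairs."""
    sums, counts = defaultdict(float), Counter()
    for x, label in examples:
        sums[label] += x
        counts[label] += 1
    centroids = {label: sums[label] / counts[label] for label in counts}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


def per_class_recall(model, test_set):
    """Evaluate quality of the model with respect to each class in the test set."""
    hits, totals = Counter(), Counter()
    for x, label in test_set:
        totals[label] += 1
        hits[label] += int(model(x) == label)
    return {label: hits[label] / totals[label] for label in totals}


def train_with_refinement(data, threshold=0.8):
    train, test = partition(data)
    first = train_centroid_model(train)
    quality = per_class_recall(first, test)
    # Classes for which the first model falls below the predefined threshold.
    weak = [label for label, q in quality.items() if q < threshold]
    second = None
    if len(weak) >= 2:
        # Subset of the training set limited to the low-quality classes.
        subset = [(x, y) for x, y in train if y in weak]
        second = train_centroid_model(subset)
    return first, second, quality, weak


data = [(float(i), "a") for i in range(4)] + [(100.0 + i, "b") for i in range(4)]
first, second, quality, weak = train_with_refinement(data)
print(quality, weak, second is None)
```

With the well-separated toy data above, no class falls below the threshold, so no second model is trained; overlapping classes would trigger the refinement branch.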

Claim 9 recites substantially the same limitations as Claim 2, in the form of a computer-readable storage medium, therefore it is rejected under the same rationale.

Claim 10 recites substantially the same limitations as Claim 3, in the form of a computer-readable storage medium, therefore it is rejected under the same rationale.

Claim 11 recites substantially the same limitations as Claim 4, in the form of a computer-readable storage medium, therefore it is rejected under the same rationale.

Claim 12 recites substantially the same limitations as Claim 5, in the form of a computer-readable storage medium, therefore it is rejected under the same rationale.

Claim 13 recites substantially the same limitations as Claim 6, in the form of a computer-readable storage medium, therefore it is rejected under the same rationale.

Claim 14 recites substantially the same limitations as Claim 7, in the form of a computer-readable storage medium, therefore it is rejected under the same rationale.

Regarding Claim 15, Goel in view of Yan teaches a system comprising: 
one or more computer processors (Goel, Par. [0090], “A system comprising: one or more processors;”, thus, one or more processors are disclosed); and 
a memory containing a program which when executed by the one or more computer processors performs an operation (Goel, Par. [0090], “memory storing processor-executable instructions that, when executed by the one or more processors, cause the system to perform operations”, therefore, a memory containing a program to be executed by the one or more processors to perform an operation is disclosed), the operation comprising: 
receiving a data set for training one or more machine learning (ML) models, wherein the data set comprises labeled exemplars for a plurality of classes (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a system, therefore it is rejected under the same rationale); 
partitioning the data set into a training set and a first testing set (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a system, therefore it is rejected under the same rationale); 
training a first ML model using the training set (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a system, therefore it is rejected under the same rationale); 
evaluating, using the first testing set, a quality of the first ML model with respect to each class of the plurality of classes (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a system, therefore it is rejected under the same rationale); and 
upon determining that quality of the first ML model is below a predefined threshold with respect to a first class and a second class of the plurality of classes (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a system, therefore it is rejected under the same rationale):
identifying a subset of the training set, wherein each exemplar in the subset corresponds to either the first class or the second class (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a system, therefore it is rejected under the same rationale); and 
training a second ML model using the subset of the training set (See Claim 1 – recites substantially the same limitations as Claim 1 in the form of a system, therefore it is rejected under the same rationale).
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Claim 16 recites substantially the same limitations as Claims 2 and 3, in the form of a system, therefore it is rejected under the same rationale.

Claim 17 recites substantially the same limitations as Claim 4, in the form of a system, therefore it is rejected under the same rationale.

Claim 18 recites substantially the same limitations as Claim 5, in the form of a system, therefore it is rejected under the same rationale.

Claim 19 recites substantially the same limitations as Claim 6, in the form of a system, therefore it is rejected under the same rationale.

Claim 20 recites substantially the same limitations as Claim 7, in the form of a system, therefore it is rejected under the same rationale.

Conclusion
6.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Perrey et al. (US PG-PUB 20180240551) disclosed systems and methods relating to hierarchical machine learning models.
Acharya et al. (US PG-PUB 20180232658) disclosed methods, systems, and computer readable media for performing a hierarchical topic machine learning operation on training data.
Goto et al. (US PG-PUB 20190087384) disclosed a classifier of machine learning in which transformed data is generated and processed by a classifier.
Khapali et al. (US PG-PUB 20210034960) disclosed processors for training a set of machine learning models and ranking each model based on thresholds.
Merler et al. (US Patent 9928448) disclosed methods utilizing two or more classifiers to calculate probability scores in a classification hierarchy. 
Vaughan et al. (US PG-PUB 20190019581) disclosed a diagnostic module comprising machine learning models and classifiers.
Verma et al. (US Patent 10977711) disclosed training a first and second machine learning model.

7.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Devika S Maharaj whose telephone number is 571-272-0829. The examiner can normally be reached Monday - Thursday 7:30am - 4:30pm.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/D.S.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123