Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after August 19, 2019, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
2A Prong 1: The limitation of analyzing the input data corresponding to a root level node of the model tree classifier to generate a level node classification and a confidence score corresponding to the classification is a mental process, because the limitation recites a process of classifying input data and estimating how confident one is in the classification result, which can be done with the aid of pen and paper.
The limitation of for each level in the hierarchy of nodes after the root level node: determining a next level node based on a generated classification output of a previous level node is a mental process, as it recites deciding which node comes next based on the previous classification result, which can be done in the human mind.
The limitation of analyzing the input data to generate a level node classification output and a level node confidence score corresponding to the classification is a mental process, because it merely recites classifying the input data for each node in the decision tree, which can be done with the aid of pen and paper.
The limitation of determining whether each level node classification output is aligned with a previous level node classification output is a mental process, as it recites determining if each classification results from each node matches or not, which can be done in human mind.
The limitation of based on determining that each level node classification output is aligned with a previous level node classification output, determining whether a confidence score corresponding to at least one level node classification output is greater than a specified threshold is a mental process, because it recites calculating a confidence score for the classification result and determining whether the score is greater than a threshold based on the classification results of each node.
The limitation of generating a final classification for the input data based on determining that a confidence score corresponding to the at least one level node classification output is greater than the specified threshold, the final classification comprising the level node classification output of the last level node in the hierarchy of nodes is also a mental process, because it merely recites outputting final classification result if the confidence score satisfies the threshold.
2A Prong 2: This judicial exception is not integrated into a practical application. The limitation of receiving input data for classification is an insignificant extra-solution activity.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitation of receiving input data for classification is mere data gathering (MPEP 2106.05(g)). A model tree classifier, the first machine learning model, a machine learning model corresponding to each level in a hierarchy of nodes, and at a server system merely recite a field of use and technological environment (MPEP 2106.05(h)).
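For orientation only, the sequence of steps recited in claim 1 and analyzed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names Node, classify_tree, PARENT, and aligned_with_parent are hypothetical, each model is replaced by a fixed-output stand-in, and "aligned" is assumed to mean that the lower-level label is a listed child of the higher-level label.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for a per-level machine learning model:
# any callable that returns (label, confidence) for the input data.
Model = Callable[[object], Tuple[str, float]]


@dataclass
class Node:
    model: Model
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify_tree(root: Node, data: object, threshold: float,
                  is_aligned: Callable[[str, str], bool]) -> Optional[str]:
    """Walk the tree top-down; return the last level's label, or None."""
    outputs = []
    node: Optional[Node] = root
    while node is not None:
        label, confidence = node.model(data)   # classify at this level
        outputs.append((label, confidence))
        node = node.children.get(label)        # next node follows this output
    # Check that every level's label is aligned with the level before it.
    aligned = all(is_aligned(prev[0], cur[0])
                  for prev, cur in zip(outputs, outputs[1:]))
    # Require at least one confidence score above the threshold.
    confident = any(conf > threshold for _, conf in outputs)
    return outputs[-1][0] if aligned and confident else None


# Toy example with fixed outputs in place of trained models (assumed data).
PARENT = {"dog": "animal", "truck": "vehicle"}
leaf = Node(model=lambda x: ("dog", 0.8))
root = Node(model=lambda x: ("animal", 0.9), children={"animal": leaf})
aligned_with_parent = lambda prev, cur: PARENT.get(cur) == prev

print(classify_tree(root, "img", 0.85, aligned_with_parent))  # dog
print(classify_tree(root, "img", 0.95, aligned_with_parent))  # None
```

In the second call no confidence score exceeds the threshold, so no final classification is produced.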

Regarding claim 9, the limitation of a system comprising: a memory that stores instructions; and one or more processors configured by the instructions to perform operations is a generic computer component. Claim 9 is a system claim having similar limitation to the method claim 1. Therefore, it is rejected with the same rationale as claim 1 above.

Regarding claim 17, the limitation of a non-transitory computer-readable medium comprising instructions stored thereon that are executable by at least one processor to cause a computing device to perform operations is a generic computer component. Claim 17 is a non-transitory computer-readable medium claim having similar limitation to the method claim 1. Therefore, it is rejected with the same rationale as claim 1 above.

Regarding claim 2, the limitation of based on determining that each level node classification output is not aligned with a previous level node classification output based on determining at first level node classification is not aligned with a previous second level node classification, generating the final classification for the input data based on determining that a confidence score corresponding to the at least one level node classification output is greater than the specified threshold, the final classification comprising the previous second level node classification is a mental process, because the limitation merely recites a process of figuring out whether the first classification and the second classification match, and outputting a classification result only if the calculated confidence score is greater than the specified threshold, which can be done with the aid of pen and paper.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim 10 is a system claim having similar limitation to the method claim 2. Therefore, it is rejected with the same rationale as claim 2 above.
Claim 18 is a non-transitory computer-readable medium claim having similar limitation to the method claim 2. Therefore, it is rejected with the same rationale as claim 2 above.

Regarding claim 3, the limitation of not generating the final classification based on determining that there is no confidence score corresponding to a level node classification that is greater than the specified threshold is a mental process, as it recites a process of not classifying the input data if you are not confident with the result (confidence score less than a threshold).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim 11 is a system claim having similar limitation to the method claim 3. Therefore, it is rejected with the same rationale as claim 3 above.
Claim 19 is a non-transitory computer-readable medium claim having similar limitation to the method claim 3. Therefore, it is rejected with the same rationale as claim 3 above.

Regarding claim 4, the limitation of determining that a number of levels of nodes that are aligned are less than a specified threshold number of levels is a mental process, as it merely recites a method of counting the number of nodes that are aligned.
The limitation of not generating the final classification based on the determination that the number of levels of nodes that are aligned is less than the specified threshold number of levels is also a mental process, as it recites a method of not classifying the input data if the number of aligned nodes is less than the threshold, which can be done in the human mind.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim 12 is a system claim having similar limitation to the method claim 4. Therefore, it is rejected with the same rationale as claim 4 above.
Claim 20 is a non-transitory computer-readable medium claim having similar limitation to the method claim 4. Therefore, it is rejected with the same rationale as claim 4 above.
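The counting step discussed for claim 4 can be illustrated with a short sketch. It is hedged throughout: the function names, the PARENT lookup table, and the choice to count adjacent level pairs as "aligned levels" are assumptions made for illustration and are not taken from the claims.

```python
from typing import Callable, Sequence


def count_aligned_levels(labels: Sequence[str],
                         is_aligned: Callable[[str, str], bool]) -> int:
    """Count adjacent level pairs whose labels are aligned (assumed reading)."""
    return sum(1 for prev, cur in zip(labels, labels[1:])
               if is_aligned(prev, cur))


def withhold_classification(labels: Sequence[str],
                            is_aligned: Callable[[str, str], bool],
                            min_aligned: int) -> bool:
    """True when too few levels are aligned to issue a final classification."""
    return count_aligned_levels(labels, is_aligned) < min_aligned


# Hypothetical hierarchy: child label -> parent label.
PARENT = {"dog": "animal", "puppy": "dog", "truck": "vehicle"}
child_of = lambda prev, cur: PARENT.get(cur) == prev

print(count_aligned_levels(["animal", "dog", "puppy"], child_of))        # 2
print(withhold_classification(["animal", "truck", "puppy"], child_of, 2))  # True
```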

Regarding claim 5, the limitation of based on determining that each level node classification output is not aligned with a previous level node classification output based on determining at first level node classification is not aligned with a previous second level node classification, determining that a number of levels of nodes that are aligned is less than a specified threshold number of levels is a mental process, because the limitation merely recites a process of figuring out whether the outputs of the classifiers match, which can be done in the human mind. The limitation of based on determining that a confidence score is greater than a higher specified threshold, generating the final classification for the input data, the final classification comprising the previous second level node classification is also a mental process, as it recites classifying the input data if the calculated confidence score is greater than the threshold, which can be done with the aid of pen and paper.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim 13 is a system claim having similar limitation to the method claim 5. Therefore, it is rejected with the same rationale as claim 5 above.

Regarding claim 6, the limitation of wherein the input data is at least one of an image, a document, text, video, or audio is merely a form of a field of use and technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim 14 is a system claim having similar limitation to the method claim 6. Therefore, it is rejected with the same rationale as claim 6 above.

Regarding claim 7, the limitation of wherein the first machine learning model is a different type of machine learning model than the machine learning model corresponding to a next level node of the model tree classifier merely recites a form of a field of use and technological environment (MPEP 2106.05(h)), as it merely recites using different types of machine learning models.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim 15 is a system claim having similar limitation to the method claim 7. Therefore, it is rejected with the same rationale as claim 7 above.

Regarding claim 8, the limitation of wherein the first machine learning model is a less processing-intense machine learning model and generates a less precise classification and the machine learning model corresponding to a next level node of the model tree classifier is a more processing-intense machine learning model and generates a more precise classification merely recites a form of a field of use and technological environment (MPEP 2106.05(h)), as it merely recites two machine learning models producing different outputs.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claim 16 is a system claim having similar limitation to the method claim 8. Therefore, it is rejected with the same rationale as claim 8 above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 6-11, and 14-19 are rejected under 35 U.S.C. 103 over Roy (Roy et al, 2018, “Tree-CNN: A Deep Convolutional Neural Network for Lifelong Learning”) in view of Silla (Silla et al, 2011, “A survey of hierarchical classification across different application domains”), and further in view of Bhattacharjee (US 20160055262 A1).

Regarding claim 1, Roy teaches a computer-implemented method ([Roy, Abstract] “In recent years, Convolutional Neural Networks (CNNs) have shown remarkable performance in many computer vision tasks such as object recognition and detection. However, complex training issues, such as “catastrophic forgetting” and hyper-parameter tuning, make incremental learning in CNNs a difficult challenge. In this paper, we propose a hierarchical deep neural network, with CNNs at multiple levels, and a corresponding training method for lifelong learning”, shows that Roy performed an experiment using a computer) comprising: 
receiving, at a server system, input data for classification by a model tree classifier comprising a machine learning model corresponding to each level in a hierarchy of nodes in the model tree classifier ([Roy, page 4, right column, paragraph 4.1.2; page 4, Fig.2] “For ease of reference, we label this network as Tree-CNN A. The root node is a DCNN with two output nodes. It will classify the input image as either “Animals” or “Vehicles”. Each child node has a DCNN that does finer classification. The description of the layers in each of these sub-networks is given in Tables 1 and 2. Fig. 2 and Fig. 4 a) depict the initial model of Tree-CNN A”, discloses the first model (root model) within a tree structure (Tree-CNN). [Roy, page 5, left column, line 2-7; page 4, Fig.2] “The network at the root node is trained to classify the images as “Animals” or “Vehicles”. For this, the 30,000 training images belonging to the 6 classes are re-labeled as “Animals” or “Vehicles”. The root node is then trained for 300 epochs. The learning rate is kept at 0.1 for first 200 epochs, then reduced by 10 times every 50 epochs” and the 32x32x3 image of Figure 2 disclose the first instance of input to a first model. A server system is a generic computer, and it would have been obvious that Roy receives its input from a generic computer);
analyzing the input data using a first machine learning model corresponding to a root level node of the model tree classifier to generate a level node classification ([Roy, page 5, left column, line 2-7; page 4, Fig.2] “The network at the root node is trained to classify the images as “Animals” or “Vehicles”. For this, the 30,000 training images belonging to the 6 classes are re-labeled as “Animals” or “Vehicles”. The root node is then trained for 300 epochs. The learning rate is kept at 0.1 for first 200 epochs, then reduced by 10 times every 50 epochs” and the 32x32x3 image of Figure 2 disclose the first instance of input to a first model. [Roy, page 3, left column, last paragraph, line 2-4 - right column line 1-6; page 3, Fig. 1] “First, we describe how the network predicts the class of an input image. A recursive algorithm moves along the hierarchies … At each node, beginning with the top node, the image is fed to the DCNN associated with that node. The output node with the highest classification probability is the next node the algorithm moves to. If it is a leaf node, then the class associated with that node is the predicted class”);
for each level in the hierarchy of nodes after the root level node in the model tree classifier: determining a next level node of the model tree classifier based on a generated classification output of a previous level node ([Roy, page 3, left column, last paragraph, line 2-4 - page 3, right column line 1-7; page 3, Fig. 1] “First, we describe how the network predicts the class of an input image. A recursive algorithm moves along the hierarchies … At each node, beginning with the top node, the image is fed to the DCNN associated with that node. The output node with the highest classification probability is the next node the algorithm moves to. If it is a leaf node, then the class associated with that node is the predicted class. Else, the algorithm feeds the image to the DCNN of that node”, the IMAGE of the Fig 1 shows the input image fed into both second model, which corresponds to the branch nodes, and first model, which corresponds to the root node after identification of the branch model); and 
analyzing the input data to generate a level node classification output ([Roy, page 5, left column, line 8-14; page 6, Fig.4] “At the second level, each of the two branch nodes are separately trained. “Animals” branch node is trained with 15,000 training images from the 3 classes. This node further classifies the image into dog, cat, and horse. It is trained for 300 epochs and the learning rate is same as used for the top node. Similarly, the branch node labeled “Vehicles” is trained to identify the 3 distinct vehicles, ship, truck, and automobile”, discloses the input processing of the branch node, which corresponds to the second level classification);
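The top-down inference procedure quoted from Roy above (feed the image to the node's network, follow the output with the highest probability, stop at a leaf) can be sketched as below. This is an illustrative sketch only: the dictionary layout, the function name predict, and the fixed probability tables standing in for trained DCNNs are assumptions and do not reproduce Roy's code.

```python
def predict(node, image):
    """Recursively follow the highest-probability output until a leaf."""
    probs = node["model"](image)        # dict: output label -> probability
    best = max(probs, key=probs.get)    # output with the highest probability
    child = node["children"].get(best)
    if child is None:                   # leaf: its class is the prediction
        return best
    return predict(child, image)        # otherwise descend one level


# Fixed probabilities stand in for the DCNN at each node (assumed values).
animals = {"model": lambda img: {"dog": 0.1, "cat": 0.7, "horse": 0.2},
           "children": {}}
vehicles = {"model": lambda img: {"ship": 0.2, "truck": 0.5, "automobile": 0.3},
            "children": {}}
tree = {"model": lambda img: {"Animals": 0.8, "Vehicles": 0.2},
        "children": {"Animals": animals, "Vehicles": vehicles}}

print(predict(tree, "32x32x3 image"))  # cat
```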
Roy does not specifically teach generating and analyzing a level node confidence score corresponding to the classification, determining whether each level node classification output is aligned with a previous level node classification output, based on determining that each level node classification output is aligned with a previous level node classification output, determining whether a confidence score corresponding to at least one level node classification output is greater than a specified threshold, generating a final classification for the input data based on determining that a confidence score corresponding to the at least one level node classification output is greater than the specified threshold, the final classification comprising the level node classification output of the last level node in the hierarchy of nodes in the model tree classifier.
Silla teaches generating and analyzing a level node confidence score ([Silla, page 42, second paragraph, line 2-6] “In order for a class to be assigned to a test example, the probabilities for the predicted class were used. In the first method, they use a boolean condition where the posterior probability of the classes at the first and second levels must be higher than a user specified threshold, in the case of a two level class hierarchy.”), and determining whether each level node classification output is aligned with a previous level node classification output ([Silla, page 42, second paragraph, line 1-6] “In Dumais and Chen (2000) the authors propose two class-membership inconsistency correction methods based on thresholds. In order for a class to be assigned to a test example, the probabilities for the predicted class were used. In the first method, they use a boolean condition where the posterior probability of the classes at the first and second levels must be higher than a user specified threshold, in the case of a two level class hierarchy”, Silla reference measures whether the confidence values of two different levels are higher than threshold or not. [Silla, page 41, 2nd paragraph, line 8-15] “In the case of single-label (per level) problems one can enforce the prediction of a single class label per level by assigning to a new test example just the class predicted with the greatest confidence among all classifiers at a given level—assuming classifiers output a confidence measure of their prediction. This approach has, however, a disadvantage. Considering the example of Fig. 4 it would be possible, using this approach, to have an output like class 1=false and class 1.2=true (since the classifiers for nodes 1 and 1.2 are independently trained), which leads to an inconsistency in class predictions across different levels”, teaches a classifier measures its confidence); 
Silla teaches determining that each level node classification output is aligned with a previous level node classification output ([Silla, page 41, last paragraph, line 4-5 - page 42, first paragraph, line 1-3] “For example, if the output for the binary classifier of class 2 is true, and the outputs of the binary classifiers for classes 2.1 and 2.2 are false, then this approach would ignore the answer of all the lower level classifiers predicting classes that are descendant of classes 2.1 and 2.2 and output the class 2 to the user”, teaches determining output result of root class 2 and child classes 2.1, and 2.2 are aligned or not), determining whether a confidence score corresponding to at least one level node classification output is greater than a specified threshold ([Silla, page 42, second paragraph, line 1-15] “In Dumais and Chen (2000) the authors propose two class-membership inconsistency correction methods based on thresholds. In order for a class to be assigned to a test example, the probabilities for the predicted class were used. In the first method, they use a boolean condition where the posterior probability of the classes at the first and second levels must be higher than a user specified threshold, in the case of a two level class hierarchy. The second method uses a multiplicative threshold that takes into account the product of the posterior probability of the classes at the first and second levels. For example, let us suppose that, for a given test example, the posterior probability for each class in the first two levels in Fig. 4 were: p(c1) = 0.6, p(c2) = 0.2, p(c1.1) = 0.55, p(c1.2) = 0.1, p(c2.1) = 0.2, p(c2.2) = 0.3. Considering a threshold of 0.5, by using the boolean rule the classes predicted for that test example would be class 1 and class 1.1 as both classes have a posterior probability higher than 0.5. 
By using the multiplicative threshold, the example would be assigned to class 1 but not class 1.1, as the posterior probability of class 1×the posterior probability of class 1.1 is 0.33, which is below the multiplicative threshold of 0.5 ”. [Silla, page 41, 2nd paragraph, line 8-15] “In the case of single-label (per level) problems one can enforce the prediction of a single class label per level by assigning to a new test example just the class predicted with the greatest confidence among all classifiers at a given level—assuming classifiers output a confidence measure of their prediction. This approach has, however, a disadvantage. Considering the example of Fig. 4 it would be possible, using this approach, to have an output like class 1=false and class 1.2=true (since the classifiers for nodes 1 and 1.2 are independently trained), which leads to an inconsistency in class predictions across different levels”, teaches a classifier measures its confidence); and 
generating a final classification for the input data based on determining that a confidence score corresponding to the at least one level node classification output is greater than the specified threshold, the final classification comprising the level node classification output of the last level node in the hierarchy of nodes in the model tree classifier ([Silla, page 42, second paragraph, line 6-15] “… The second method uses a multiplicative threshold that takes into account the product of the posterior probability of the classes at the first and second levels. For example, let us suppose that, for a given test example, the posterior probability for each class in the first two levels in Fig. 4 were: p(c1) = 0.6, p(c2) = 0.2, p(c1.1) = 0.55, p(c1.2) = 0.1, p(c2.1) = 0.2, p(c2.2) = 0.3. Considering a threshold of 0.5, by using the boolean rule the classes predicted for that test example would be class 1 and class 1.1 as both classes have a posterior probability higher than 0.5. By using the multiplicative threshold, the example would be assigned to class 1 but not class 1.1, as the posterior probability of class 1×the posterior probability of class 1.1 is 0.33, which is below the multiplicative threshold of 0.5”, Silla measures whether the confidence values (posterior probabilities) of both levels are higher than the threshold, and makes the final decision using both probabilities).
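The two threshold rules in the Silla passage quoted above can be checked with the numbers given there (p(c1) = 0.6, p(c1.1) = 0.55, threshold 0.5). The sketch below is a worked example only; the function names are hypothetical and are not taken from Silla or from Dumais and Chen.

```python
def boolean_rule(p_first: float, p_second: float, threshold: float) -> bool:
    """Both levels must individually exceed the threshold."""
    return p_first > threshold and p_second > threshold


def multiplicative_rule(p_first: float, p_second: float, threshold: float) -> bool:
    """The product of the two posterior probabilities must exceed the threshold."""
    return p_first * p_second > threshold


p_c1, p_c1_1, threshold = 0.6, 0.55, 0.5

print(boolean_rule(p_c1, p_c1_1, threshold))         # True: 0.6 > 0.5 and 0.55 > 0.5
print(multiplicative_rule(p_c1, p_c1_1, threshold))  # False: 0.6 * 0.55 is about 0.33
```

Under the boolean rule the example is assigned to class 1 and class 1.1; under the multiplicative rule the second-level assignment is withheld, matching the quoted passage.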
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Roy and Silla, to apply Silla's computation of confidence values at two level nodes to the hierarchical classification method of Roy. The suggestion and/or motivation for doing so is to enhance the accuracy of the classification system, as calculating and comparing the confidence values of prediction results from different levels is useful for catching classification errors.
Roy in view of Silla does not explicitly disclose, based on determining that data is aligned with other data, determining whether a score corresponding to the output is greater than a specified threshold.
Bhattacharjee teaches, based on determining that data is aligned with other data, determining whether a score corresponding to the data output is greater than a specified threshold ([Bhattacharjee, 0037] “The method may be carried out for each candidate string representing a function signature in the call stack of a prior crash dump. Each candidate string can be compared to a string representing the current function signature. The method may include calculating the approximate string match (402) and the exact string match (408) between the candidate function signature and the current function signature. In many cases, simply combining the approximate and exact string matching results arithmetically may not produce the best results. Therefore, some embodiments may weight these two string matching results before combining. The method may also include calculating a weighted approximate string match result (404) as well as calculating a weighted exact string match result (410). These results may be compared to individual threshold values (406, 412) to determine whether each of the results individually meet a predetermined criteria … If both of the weighted results pass their individual threshold, then they can be combined to calculate a combined string match score (416). This combined score can then be compared to a threshold to determine whether the combined score indicates a sufficiently similar match (418). If the combined score does not exceed the threshold, then the method can continue on to the next candidate function signature (414)”, Bhattacharjee teaches calculating string matches and, based on the string matches, calculating a combined string match score and comparing it with a threshold).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Roy, Silla, and Bhattacharjee, to apply Bhattacharjee's technique of determining whether a score corresponding to the data output is greater than a specified threshold, based on determining that data is aligned with other data, to the hierarchical classification method of Roy and Silla. The suggestion and/or motivation to do so is to improve the accuracy of the classification process, as both checking whether the classifications of the level nodes match and calculating the confidence scores of the nodes must be done to determine which classification result is more accurate.
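The two-stage thresholding described in the quoted paragraph of Bhattacharjee (weighted individual results checked first, then a combined score checked against its own threshold) might be sketched as follows. Everything specific here is an assumption for illustration: the function name, the weights and thresholds, the use of difflib.SequenceMatcher as the approximate matcher, and summation as the combining operation are not taken from Bhattacharjee.

```python
from difflib import SequenceMatcher


def is_similar(candidate: str, current: str,
               w_approx: float = 0.5, w_exact: float = 0.5,
               t_approx: float = 0.3, t_exact: float = 0.0,
               t_combined: float = 0.6) -> bool:
    """Two-stage check: individual thresholds first, then a combined threshold."""
    # Weighted approximate match (SequenceMatcher ratio is an assumed stand-in).
    approx = w_approx * SequenceMatcher(None, candidate, current).ratio()
    # Weighted exact match: 1.0 only when the strings are identical.
    exact = w_exact * (1.0 if candidate == current else 0.0)
    # Stage 1: each weighted result must pass its own threshold.
    if approx < t_approx or exact < t_exact:
        return False
    # Stage 2: the combined score (assumed to be a sum) must exceed its threshold.
    return approx + exact > t_combined


print(is_similar("abc", "abc"))  # True: 0.5 + 0.5 exceeds 0.6
print(is_similar("abc", "xyz"))  # False: the approximate result fails stage 1
```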

Regarding claim 9, Roy in view of Silla and further in view of Bhattacharjee teaches a system comprising: a memory that stores instructions; and one or more processors configured by the instructions to perform operations ([Roy, Abstract] “In recent years, Convolutional Neural Networks (CNNs) have shown remarkable performance in many computer vision tasks such as object recognition and detection. However, complex training issues, such as “catastrophic forgetting” and hyper-parameter tuning, make incremental learning in CNNs a difficult challenge. In this paper, we propose a hierarchical deep neural network, with CNNs at multiple levels, and a corresponding training method for lifelong learning”, shows that Roy performed an experiment using a computer. Every computer has at least one memory to store instructions and processors to perform instructions). 
Claim 9 is a system claim having similar limitation to the method claim 1. Therefore, it is rejected with the same rationale as claim 1 above.

Regarding claim 17, Roy in view of Silla and further in view of Bhattacharjee teaches a non-transitory computer-readable medium comprising instructions stored thereon that are executable by at least one processor to cause a computing device to perform operations ([Roy, Abstract] “In recent years, Convolutional Neural Networks (CNNs) have shown remarkable performance in many computer vision tasks such as object recognition and detection. However, complex training issues, such as “catastrophic forgetting” and hyper-parameter tuning, make incremental learning in CNNs a difficult challenge. In this paper, we propose a hierarchical deep neural network, with CNNs at multiple levels, and a corresponding training method for lifelong learning”, shows that Roy performed an experiment using a computer. Every computer has at least one processor to perform some types of operations).
Claim 17 is a non-transitory computer-readable medium claim having similar limitation to the method claim 1. Therefore, it is rejected with the same rationale as claim 1 above.

Regarding claim 2, Roy in view of Silla and further in view of Bhattacharjee teaches based on determining that each level node classification output is not aligned with a previous level node classification output based on determining at first level node classification is not aligned with a previous second level node classification, generating the final classification for the input data based on determining that a confidence score corresponding to the at least one level node classification output is greater than the specified threshold, the final classification comprising the previous second level node classification ([Silla, page 42, 2nd paragraph, line 6-15] “The second method uses a multiplicative threshold that takes into account the product of the posterior probability of the classes at the first and second levels. For example, let us suppose that, for a given test example, the posterior probability for each class in the first two levels in Fig. 4 were: p(c1) = 0.6, p(c2) = 0.2, p(c1.1) = 0.55, p(c1.2) = 0.1, p(c2.1) = 0.2, p(c2.2) = 0.3. Considering a threshold of 0.5, by using the boolean rule the classes predicted for that test example would be class 1 and class 1.1 as both classes have a posterior probability higher than 0.5. By using the multiplicative threshold, the example would be assigned to class 1 but not class 1.1, as the posterior probability of class 1×the posterior probability of class 1.1 is 0.33, which is below the multiplicative threshold of 0.5”, teaches figuring out if at least one level node output is greater than the threshold, and final classification comprising the previous classification, as it is multilevel classification and the 2nd level classification involves the 1st level classification. 
[Silla, page 41, 2nd paragraph, line 8-15] “In the case of single-label (per level) problems one can enforce the prediction of a single class label per level by assigning to a new test example just the class predicted with the greatest confidence among all classifiers at a given level—assuming classifiers output a confidence measure of their prediction. This approach has, however, a disadvantage. Considering the example of Fig. 4 it would be possible, using this approach, to have an output like class 1=false and class 1.2=true (since the classifiers for nodes 1 and 1.2 are independently trained), which leads to an inconsistency in class predictions across different levels”, teaches a classifier measures its confidence and inconsistency between different level nodes).
Claim 10 is a system claim having limitations similar to those of method claim 2. Therefore, it is rejected under the same rationale as claim 2 above.
Claim 18 is a non-transitory computer-readable medium claim having limitations similar to those of method claim 2. Therefore, it is rejected under the same rationale as claim 2 above.
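For reference, the two threshold rules quoted from Silla in the claim 2 rationale can be checked numerically. The following is only an illustrative sketch and is not taken from any cited reference: the function names `boolean_rule` and `multiplicative_rule`, the helper `parent`, and the dictionary layout are assumptions. Only the probabilities and the 0.5 threshold come from the quoted passage.

```python
# Posterior probabilities quoted from Silla, page 42 (Fig. 4 example).
P = {"1": 0.6, "2": 0.2, "1.1": 0.55, "1.2": 0.1, "2.1": 0.2, "2.2": 0.3}
THRESHOLD = 0.5


def parent(label):
    """Return the parent class label, or None for a first-level class."""
    return label.rsplit(".", 1)[0] if "." in label else None


def boolean_rule(probs, threshold):
    """Keep a class when it and its parent (if any) each exceed the threshold."""
    kept = []
    for label, p in probs.items():
        par = parent(label)
        if p > threshold and (par is None or probs[par] > threshold):
            kept.append(label)
    return sorted(kept)


def multiplicative_rule(probs, threshold):
    """Keep a second-level class only when parent * child exceeds the threshold."""
    kept = []
    for label, p in probs.items():
        par = parent(label)
        score = p if par is None else probs[par] * p
        if score > threshold:
            kept.append(label)
    return sorted(kept)


print(boolean_rule(P, THRESHOLD))         # classes 1 and 1.1
print(multiplicative_rule(P, THRESHOLD))  # class 1 only
```

Under these assumptions the boolean rule keeps class 1 and class 1.1, while the multiplicative rule keeps only class 1 because 0.6 × 0.55 = 0.33 is below 0.5, which matches the result stated in the quoted passage.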

Regarding claim 3, Roy in view of Silla and further in view of Bhattacharjee teaches not generating the final classification based on determining that there is no confidence score corresponding to a level node classification that is greater than the specified threshold ([Silla, page 46, 4.4 Non-mandatory leaf node prediction and the blocking problem, first paragraph, line 5-9] “A simple way to deal with the NMLNP problem is to use a threshold at each class node, and if the confidence score or posterior probability of the classifier at a given class node—for a given test example—is lower than this threshold, the classification stops for that example”, [Silla, page 46, 4.4 Non-mandatory leaf node prediction and the blocking problem, second paragraph, line 2-8] “As briefly mentioned in Sect. 4.1, blocking occurs when, during the top-down process of classification of a test example, the classifier at a certain level in the class hierarchy predicts that the example in question does not have the class associated with that classifier. In this case the classification of the example will be “blocked”, i.e., the example will not be passed to the descendants of that classifier. For instance, in Fig. 1 blocking could occur, say, at class node 2, which would mean that the example would not be passed to the classifiers that are descendants of that node”, teaches the classifier stops classification if the classification result is not aligned).
Claim 11 is a system claim having limitations similar to those of method claim 3. Therefore, it is rejected under the same rationale as claim 3 above.
Claim 19 is a non-transitory computer-readable medium claim having limitations similar to those of method claim 3. Therefore, it is rejected under the same rationale as claim 3 above.
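The per-node threshold and blocking behavior quoted from Silla section 4.4 in the claim 3 rationale can be sketched as follows. This is a hypothetical illustration only: the function name `classify_top_down` and the list-of-pairs input format are assumptions, not an implementation from Silla or from the application.

```python
def classify_top_down(scores_by_level, threshold):
    """Walk levels from the root downward; stop at the first low-confidence node.

    scores_by_level: list of (label, confidence) pairs, one per level,
    ordered from the root level downward (hypothetical input format).
    Returns the labels accepted before classification was blocked.
    """
    accepted = []
    for label, confidence in scores_by_level:
        if confidence < threshold:
            break  # blocked: the example is not passed to descendant nodes
        accepted.append(label)
    return accepted


# Root passes the threshold, the second level does not: stop after the root.
print(classify_top_down([("1", 0.6), ("1.1", 0.4)], 0.5))
# Root itself is below the threshold: nothing is accepted.
print(classify_top_down([("1", 0.3), ("1.1", 0.9)], 0.5))
```

In this sketch an empty result corresponds to the case where no classification is produced because no accepted node met the threshold.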

Regarding claim 6, Roy in view of Silla and further in view of Bhattacharjee teaches wherein the input data is at least one of an image, a document, text, video, or audio ([Roy, page 3, left column, paragraph 3.2 Algorithm, line 8-11] “The root node will be trained to classify the images into the highest level of super-classes. The branch nodes further down the tree are trained to classify the images into finer classes”, Roy classifies image data).
Claim 14 is a system claim having limitations similar to those of method claim 6. Therefore, it is rejected under the same rationale as claim 6 above.

Regarding claim 7, Roy in view of Silla and further in view of Bhattacharjee teaches wherein the first machine learning model is a different type of machine learning model than the machine learning model corresponding to a next level node of the model tree classifier ([Roy, page 5, left column, line 2-7; page 4, Fig.2] “The network at the root node is trained to classify the images as “Animals” or “Vehicles”. For this, the 30,000 training images belonging to the 6 classes are re-labeled as “Animals” or “Vehicles”. The root node is then trained for 300 epochs. The learning rate is kept at 0.1 for first 200 epochs, then reduced by 10 times every 50 epochs” and the 32x32x3 image of Figure 2 disclose the first instance of input to a first model, [Roy, page 5, left column, line 8-14; page 6, Fig.4] “At the second level, each of the two branch nodes are separately trained. “Animals” branch node is trained with 15,000 training images from the 3 classes. This node further classifies the image into dog, cat, and horse. It is trained for 300 epochs and the learning rate is same as used for the top node. Similarly, the branch node labeled “Vehicles” is trained to identify the 3 distinct vehicles, ship, truck, and automobile”, discloses the input processing of the branch node, which corresponds to the second level classification. The first level classifier of Roy classifies the images into broader categories, and the second level classifiers classify the images into narrower categories. Therefore, the first and the second classifiers of Roy are different machine learning models).
Claim 15 is a system claim having limitations similar to those of method claim 7. Therefore, it is rejected under the same rationale as claim 7 above.

Regarding claim 8, Roy in view of Silla and further in view of Bhattacharjee teaches wherein the first machine learning model is a less processing-intense machine learning model and generates a less precise classification and the machine learning model corresponding to a next level node of the model tree classifier is a more processing-intense machine learning model and generates a more precise classification ([Roy, page 5, left column, line 2-7; page 4, Fig.2] “The network at the root node is trained to classify the images as “Animals” or “Vehicles”. For this, the 30,000 training images belonging to the 6 classes are re-labeled as “Animals” or “Vehicles”. The root node is then trained for 300 epochs. The learning rate is kept at 0.1 for first 200 epochs, then reduced by 10 times every 50 epochs” and the 32x32x3 image of Figure 2 disclose the first instance of input to a first model, [Roy, page 5, left column, line 8-14; page 6, Fig.4] “At the second level, each of the two branch nodes are separately trained. “Animals” branch node is trained with 15,000 training images from the 3 classes. This node further classifies the image into dog, cat, and horse. It is trained for 300 epochs and the learning rate is same as used for the top node. Similarly, the branch node labeled “Vehicles” is trained to identify the 3 distinct vehicles, ship, truck, and automobile”, discloses the input processing of the branch node, which corresponds to the second level classification. The first level classifier of Roy classifies the images into broader categories (less-precise classification), and the second level classifiers classify the images into narrower categories (more-precise classification)).
Claim 16 is a system claim having limitations similar to those of method claim 8. Therefore, it is rejected under the same rationale as claim 8 above.

Claims 4-5, 12-13, and 20 are rejected under 35 U.S.C. 103 over Roy (Roy et al, 2018, “Tree-CNN: A Deep Convolutional Neural Network for Lifelong Learning”) in view of Silla (Silla et al, 2011, “A survey of hierarchical classification across different application domains”), in view of Bhattacharjee (US 20160055262 A1), and further in view of Kumhyr (US 20030131095 A1).

Regarding claim 4, Roy in view of Silla and further in view of Bhattacharjee teaches determining confidence score of nodes that are not aligned are less than a specified threshold ([Silla, page 42, second paragraph, line 1-6] “In Dumais and Chen (2000) the authors propose two class-membership inconsistency correction methods based on thresholds. In order for a class to be assigned to a test example, the probabilities for the predicted class were used. In the first method, they use a boolean condition where the posterior probability of the classes at the first and second levels must be higher than a user specified threshold, in the case of a two-level class hierarchy”, in this case, the threshold number can be interpreted as the total number of nodes in the first and the second level of the classifier for two class-membership inconsistency correction, because all first and second level must align. [Silla, page 46, 4.4 Non-mandatory leaf node prediction and the blocking problem, second paragraph, line 2-8] “As briefly mentioned in Sect. 4.1, blocking occurs when, during the top-down process of classification of a test example, the classifier at a certain level in the class hierarchy predicts that the example in question does not have the class associated with that classifier. In this case the classification of the example will be “blocked”, i.e., the example will not be passed to the descendants of that classifier. For instance, in Fig. 1 blocking could occur, say, at class node 2, which would mean that the example would not be passed to the classifiers that are descendants of that node”, teaches the classifier stops classification if the classification result is not aligned, which also can be interpreted as the threshold number of level node is 1); and 
not generating the final classification based on the determination that the confidence score of nodes that are not aligned is less than the specified threshold ([Silla, page 46, 4.4 Non-mandatory leaf node prediction and the blocking problem, first paragraph, line 5-9] “A simple way to deal with the NMLNP problem is to use a threshold at each class node, and if the confidence score or posterior probability of the classifier at a given class node—for a given test example—is lower than this threshold, the classification stops for that example”, teaches the classification stops when it does not satisfy the threshold, [Silla, page 46, 4.4 Non-mandatory leaf node prediction and the blocking problem, second paragraph, line 2-8] “As briefly mentioned in Sect. 4.1, blocking occurs when, during the top-down process of classification of a test example, the classifier at a certain level in the class hierarchy predicts that the example in question does not have the class associated with that classifier. In this case the classification of the example will be “blocked”, i.e., the example will not be passed to the descendants of that classifier. For instance, in Fig. 1 blocking could occur, say, at class node 2, which would mean that the example would not be passed to the classifiers that are descendants of that node”, teaches the classifier stops classification if the classification result is not aligned).
Neither Roy nor Silla explicitly discloses determining the number of items aligned (matched) and performing an operation if the number of items aligned is less than a threshold number of levels.
Kumhyr teaches determining the number of items aligned (matched) and performing an operation if the number of items aligned is less than a threshold number of levels ([Kumhyr, Claim 13-15] “13. The program product of claim 12 wherein said instructions for step of determining if said advertisement is displayed includes the instructions for performing the steps of: determining a number of matched key items in content of said page; and determining if said number of matched key items is less than a predetermined lower threshold.  14. The program product of claim 13 wherein said advertisement is displayed if said number of matched key items is less that said predetermined lower bound.  15. The program product of claim 13 further comprising instructions for, if said number of matched key items is not less that said predetermined lower threshold, performing the step of determining if said number of matched key items is not less than a predetermined upper threshold, and wherein said advertisement does not display if said number of matched key items is not less than said predetermined upper threshold”).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Roy, Silla, Bhattacharjee, and Kumhyr, to use Kumhyr's method of determining the number of items aligned (matched) and performing an operation if the number of items aligned is less than a threshold number of levels in implementing the hierarchical classification system of Roy, Silla, and Bhattacharjee. The suggestion and/or motivation to do so is to avoid inconsistency in classification results between a parent node and its child nodes.
Claim 12 is a system claim having limitations similar to those of method claim 4. Therefore, it is rejected under the same rationale as claim 4 above.
Claim 20 is a non-transitory computer-readable medium claim having limitations similar to those of method claim 4. Therefore, it is rejected under the same rationale as claim 4 above.
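The count-and-compare logic quoted from Kumhyr claims 13-15 can be sketched as follows. This is an illustrative reading only: the function names `matched_count` and `display_decision` and the returned strings are assumptions, and the quoted claims do not state what happens when the count falls between the lower and upper thresholds, so the sketch reports that case as unspecified rather than guessing.

```python
def matched_count(key_items, page_words):
    """Count how many key items appear in the page content."""
    return sum(1 for item in key_items if item in page_words)


def display_decision(count, lower, upper):
    """Apply the lower and upper threshold tests from the quoted claims."""
    if count < lower:
        return "display"         # claim 14: fewer matches than the lower bound
    if count >= upper:
        return "do_not_display"  # claim 15: not less than the upper threshold
    return "unspecified"         # not addressed by the quoted claim text


# Hypothetical example: one of two key items appears on the page.
count = matched_count({"tree", "classifier"}, {"tree", "node", "level"})
print(count, display_decision(count, lower=2, upper=5))
```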

Regarding claim 5, Roy in view of Silla and further in view of Bhattacharjee teaches based on determining that each level node classification output is not aligned with a previous level node classification output based on determining at first level node classification is not aligned with a previous second level node classification ([Silla, page 41, 2nd paragraph, line 8-15] “In the case of single-label (per level) problems one can enforce the prediction of a single class label per level by assigning to a new test example just the class predicted with the greatest confidence among all classifiers at a given level—assuming classifiers output a confidence measure of their prediction. This approach has, however, a disadvantage. Considering the example of Fig. 4 it would be possible, using this approach, to have an output like class 1=false and class 1.2=true (since the classifiers for nodes 1 and 1.2 are independently trained), which leads to an inconsistency in class predictions across different levels”, teaches a classifier measures its confidence), determining confidence score of nodes that are not aligned is less than a specified threshold ([Silla, page 42, second paragraph, line 1-6] “In Dumais and Chen (2000) the authors propose two class-membership inconsistency correction methods based on thresholds. In order for a class to be assigned to a test example, the probabilities for the predicted class were used. In the first method, they use a boolean condition where the posterior probability of the classes at the first and second levels must be higher than a user specified threshold, in the case of a two-level class hierarchy”, in this case, the threshold number is the total number of nodes in the first and the second level of the classifier for two class-membership inconsistency correction, because all first and second level must align. 
[Silla, page 46, 4.4 Non-mandatory leaf node prediction and the blocking problem, second paragraph, line 2-8] “As briefly mentioned in Sect. 4.1, blocking occurs when, during the top-down process of classification of a test example, the classifier at a certain level in the class hierarchy predicts that the example in question does not have the class associated with that classifier. In this case the classification of the example will be “blocked”, i.e., the example will not be passed to the descendants of that classifier. For instance, in Fig. 1 blocking could occur, say, at class node 2, which would mean that the example would not be passed to the classifiers that are descendants of that node”, teaches the classifier stops classification if the classification result is not aligned, which also can be interpreted as the threshold number of level node is 1); and 
based on determining that a confidence score is greater than a higher specified threshold, generating the final classification for the input data, the final classification comprising the previous second level node classification ([Silla, page 42, second paragraph, line 6-15] “… The second method uses a multiplicative threshold that takes into account the product of the posterior probability of the classes at the first and second levels. For example, let us suppose that, for a given test example, the posterior probability for each class in the first two levels in Fig. 4 were: p(c1) = 0.6, p(c2) = 0.2, p(c1.1) = 0.55, p(c1.2) = 0.1, p(c2.1) = 0.2, p(c2.2) = 0.3. Considering a threshold of 0.5, by using the boolean rule the classes predicted for that test example would be class 1 and class 1.1 as both classes have a posterior probability higher than 0.5. By using the multiplicative threshold, the example would be assigned to class 1 but not class 1.1, as the posterior probability of class 1×the posterior probability of class 1.1 is 0.33, which is below the multiplicative threshold of 0.5”, Silla reference measures confidence values (posterior probability) of both levels are higher than threshold or not, and make final decision using both probabilities. [Silla, page 41, 2nd paragraph, line 8-15] “In the case of single-label (per level) problems one can enforce the prediction of a single class label per level by assigning to a new test example just the class predicted with the greatest confidence among all classifiers at a given level—assuming classifiers output a confidence measure of their prediction. This approach has, however, a disadvantage. Considering the example of Fig. 4 it would be possible, using this approach, to have an output like class 1=false and class 1.2=true (since the classifiers for nodes 1 and 1.2 are independently trained), which leads to an inconsistency in class predictions across different levels”, teaches a classifier measures its confidence).
Neither Roy nor Silla explicitly discloses performing a specific operation when the number of objects aligned (matched) is less than the specified threshold number.
Kumhyr teaches performing a specific operation when the number of objects aligned (matched) is less than the specified threshold number ([Kumhyr, Claim 13-15] “13. The program product of claim 12 wherein said instructions for step of determining if said advertisement is displayed includes the instructions for performing the steps of: determining a number of matched key items in content of said page; and determining if said number of matched key items is less than a predetermined lower threshold.  14. The program product of claim 13 wherein said advertisement is displayed if said number of matched key items is less that said predetermined lower bound.  15. The program product of claim 13 further comprising instructions for, if said number of matched key items is not less that said predetermined lower threshold, performing the step of determining if said number of matched key items is not less than a predetermined upper threshold, and wherein said advertisement does not display if said number of matched key items is not less than said predetermined upper threshold”, it displays the advertisement if the number of matched items is less than the predetermined lower threshold (lower bound)).
Claim 13 is a system claim having limitations similar to those of method claim 5. Therefore, it is rejected under the same rationale as claim 5 above.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Regarding tree data structure classifiers:
Dumais, 2000, “Hierarchical Classification of Web Content”
US 9928448 B1
US 7349917 B2
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached on 7:30 AM - 5:30 PM. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/JUN KWON/
Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/
Supervisory Patent Examiner, Art Unit 2127