Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the amendments and remarks filed on 02/09/2022.
Claims 1-9 and 11-21 are pending.
Claims 1, 13, and 20 have been amended.

Response to Arguments
Applicant offers no arguments or amendments with respect to the objection to claim 13; the objection to claim 13 is therefore maintained.

Applicant’s arguments, with respect to the rejection(s) of claim(s) 1, 13, and 20 under 35 U.S.C. 103, have been considered but they are not persuasive. More specifically, the applicant argues that no art of record teaches the amended claim limitations (or analogously claimed limitations of claims 13 and 20), since “the clusters in Phoha are not selected the same way as…claimed” (i.e. “comparing” and “based at” steps of claim 1), “the clusters…are not trained random decision trees nor are they ‘results/outputs’ of trained random decision trees”, and “clusters are computed during a training phased of a trained random decision tree”. The examiner respectfully disagrees. 
Phoha, col. 6, lines 11-30: “create clusters of the dataset using k-means, and then organizes each identified k-means cluster into a decision tree using the ID3 algorithm…Each cluster is then subjected to the ID3 decision tree algorithm to impose a fine structure on each cluster” during training. Further, Col. 5. lines 8-13: The algorithm is executed on a computer [processor] having inputs, outputs and databases; Col. 6, lines 27-47: “This ID3 decision tree's characterization is compared against the associated cluster's characterization [comparing the values of features associated with the sensor data element with parameters of a first level of clusters], and the first conformity between the two characterizations (i.e. examine conformance with the closest cluster [first level of clusters], and if no conformance, move to the next closest cluster [first level of clusters], etc, repeat until conformance is obtained between cluster characterization and cluster ID3 decision tree data point characterization) is that characterization assigned to the unknown data point” [comparing the values of features associated with the sensor data element with parameters of a first level of clusters]; and Col. 6, lines 27-47: “This ID3 decision tree's characterization is compared against the associated cluster's characterization, and the first conformity between the two characterizations (i.e. examine conformance with the closest cluster, and if no conformance, move to the next closest cluster, etc, repeat until conformance is obtained between cluster characterization and cluster ID3 decision tree data point characterization [based at least on the comparing, selecting a cluster from the first level of clusters that includes parameters corresponding to the values of the features associated with the sensor data element]) is that characterization assigned to the unknown data point”. 
Additionally, the argued feature that the clusters are “‘results/outputs’ of trained random decision trees” is not explicitly recited in the claims.
Further, see the 35 U.S.C. 103 section below for the full mapping of the claim limitations, as necessitated by Applicant’s amendments.

Claim Objections
Claim 13 is objected to because of the following informalities:
Claim 13 is missing punctuation at the end of the limitation “training a cluster-specific random decision tree using the selected one of the plurality of clusters”. An optional amendment to overcome this objection would be as follows: “…clusters;”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 7, 9, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Phoha et al (US 7,792,770 B1) hereinafter Phoha, in view of Criminisi et al. (US 9,519,868 B2) hereinafter Criminisi.
Regarding claim 1, Phoha teaches a machine learning system comprising:
a memory storing at least one trained random decision tree and parameters of a plurality of clusters, the plurality of clusters being computed during a training phase of the trained random decision tree (col. 5. lines 8-13 teaches “The algorithm is executed on a computer having inputs, outputs and databases. The results from the training set, e.g. cluster identification [cluster parameters] and ID3 decision tree structure for each cluster, can be stored [memory] for later use on an data set to be identified, or can be computed on a run by run basis of the cascaded learning techniques on unknown data”. Further, Col. 4, line 54-Col. 5, line 13 teach “build[ing] a decision tree from the [training data] set” in two stages [training phase of the trained random decision tree], wherein a first stage includes “k-Means clustering is performed on training instances {Xi} to obtain k disjoint clusters” [the plurality of clusters being computed during a training phase of the trained random decision tree], and the second stage includes “each cluster of learning instances is further characterized using the known ID3 decision tree learning” [alternative the plurality of clusters being computed during a training phase of the trained random decision tree]); and
a processor programed to (Col. 5. lines 8-13: The algorithm is executed on a computer [processor] having inputs, outputs and databases):
push a sensor data element through the trained random decision tree to compute a prediction and to obtain values of features associated with the sensor data element (Col. 5. lines 8-13: The algorithm is executed on a computer [processor] having inputs, outputs and databases; Col. 6, lines 27-47: “unknown point will be…examined for closeness to the clusters, and for the closest clusters [and to obtain values of features associated with the sensor data element]” the unknown data will be “characterized by each cluster's ID3 decision tree” learning [push a sensor data element through the trained random decision tree to compute a prediction and to obtain values of features associated with the sensor data element]); 
comparing the values of features associated with the sensor data element with parameters of a first level of clusters (Col. 5. lines 8-13: The algorithm is executed on a computer [processor] having inputs, outputs and databases; Col. 6, lines 27-47: “This ID3 decision tree's characterization is compared against the associated cluster's characterization [comparing the values of features associated with the sensor data element with parameters of a first level of clusters], and the first conformity between the two characterizations (i.e. examine conformance with the closest cluster [first level of clusters], and if no conformance, move to the next closest cluster [first level of clusters], etc, repeat until conformance is obtained between cluster characterization and cluster ID3 decision tree data point characterization) is that characterization assigned to the unknown data point” [comparing the values of features associated with the sensor data element with parameters of a first level of clusters]);  
based at least on the comparing, selecting a cluster from the first level of clusters that includes parameters corresponding to the values of the features associated with the sensor data element (Col. 6, lines 27-47: “This ID3 decision tree's characterization is compared against the associated cluster's characterization, and the first conformity between the two characterizations (i.e. examine conformance with the closest cluster, and if no conformance, move to the next closest cluster, etc, repeat until conformance is obtained between cluster characterization and cluster ID3 decision tree data point characterization [based at least on the comparing, selecting a cluster from the first level of clusters that includes parameters corresponding to the values of the features associated with the sensor data element]) is that characterization assigned to the unknown data point”);
training at least one cluster-specific random decision tree using data from the selected cluster (col. 5. lines 8-13, Col. 6, lines 27-47, and claim 1: ID3 trees stored in memory; and used for characterizing unknown data to match to a specific cluster [selected cluster]; col. 4 lines 66-67 and claim 1: In the second stage of dataset characterization, each cluster of learning instances [selected cluster] is further characterized using the known ID3 decision tree learning. In ID3 characterization, the ID3 algorithm builds a decision tree from the set [training at least one cluster-specific random decision tree using data from the selected cluster].); 
pushing the prediction and the sensor data element through the cluster-specific random decision tree to compute another prediction (col. 15, lines 22-59 and Fig. 2: ID3 trees are trained on specific k-means clusters; test instance Z_i goes through a particular ID3 tree; instance is given an anomaly score [prediction]); 

However, while Phoha teaches determining a tree’s decision as “represented graphically as a path through the tree which the object follows…based on the terminal node on the path” [Col. 5, lines 20-29], Phoha does not explicitly teach wherein the plurality of clusters group together sensor data elements which give rise to pathways of a specified metric when pushed through the trained random decision tree.
Criminisi teaches wherein the plurality of clusters group together sensor data elements which give rise to pathways of a specified metric when pushed through the trained random decision tree (col. 4, lines 13-24 and col. 6 lines 35-37 teach “clusters of training elements are accumulated at leaf nodes”, and the training elements accumulated at a given leaf node thus have pathways that are deemed similar by virtue of ending at the same leaf node [specified metric], wherein training data includes “sensor data”).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Criminisi’s teachings of training data clustering in decision tree leaf nodes into Phoha’s teaching of clustering algorithm for decision tree development in order to optimize decision tree prediction accuracy with data focused tree leaves (Criminisi col. 3 lines 58-Col. 4, line 24).
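For illustration only, the conformance search quoted above from Phoha (examine the closest cluster first, then the next closest, until the cluster's characterization and its ID3 tree's characterization agree) may be sketched as follows. This is a minimal sketch and not Phoha's implementation: the Python names (`characterize`, `name`, `centroid`, `label`, `tree`), the two example clusters, the stand-in one-test "trees", and the `None` result when no cluster conforms are all assumptions introduced here.

```python
import math

def characterize(point, clusters):
    """Assign a characterization to an unknown data point.

    Clusters are visited from closest to farthest centroid (Euclidean
    distance). For each cluster, the cluster-specific tree characterizes
    the point; the first cluster whose own characterization matches the
    tree's characterization determines the result.
    """
    ordered = sorted(clusters, key=lambda c: math.dist(point, c["centroid"]))
    for cluster in ordered:
        tree_label = cluster["tree"](point)
        if tree_label == cluster["label"]:
            return cluster["name"], tree_label
    # Assumption: the quoted passage does not say what happens when no
    # cluster conforms, so this sketch simply reports no assignment.
    return None, None

# Hypothetical stored training results: a centroid, a cluster-level
# characterization, and a stand-in for the cluster's ID3 tree.
clusters = [
    {"name": "A", "centroid": (0.0, 0.0), "label": "normal",
     "tree": lambda p: "normal" if p[0] < 1.0 else "anomaly"},
    {"name": "B", "centroid": (10.0, 10.0), "label": "anomaly",
     "tree": lambda p: "anomaly" if p[0] >= 1.0 else "normal"},
]

result_near_a = characterize((0.5, 0.0), clusters)
result_fallback = characterize((2.0, 0.0), clusters)
```

In the second call, the point is closest to cluster A, but A's stand-in tree characterizes it as "anomaly", which does not conform to A's "normal" characterization, so the search moves on to cluster B.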

Regarding claim 2, the combination of Phoha and Criminisi teaches all the claim limitations of claim 1 above; and further teaches wherein the plurality of clusters also group together any one or more of: the values of features that are within a specified range of one another, values of the prediction that are within a specified range of one another (Phoha, col. 4 lines 62-65: Each k-means cluster represents a region of similar instances, ‘similar’ in terms of a chosen metric, such as Euclidean distances between the instances and the cluster “center” or center tendency, such as the centroid. Therefore, with the instances being within a certain distance of the centroid, the instances are, by the triangle inequality, within at most twice that distance of each other).

Regarding claim 7, the combination of Phoha and Criminisi teaches all the claim limitations of claim 1 above; and further teaches where the at least one trained random decision tree is part of a forest stored at the memory and wherein the processor pushes the sensor data element through a trained random decision forest to compute the prediction and to obtain the values of features associated with the sensor data element (Criminisi, col. 3 lines 3-8: A random decision forest is a plurality of random decision trees each having a root node, a plurality of split nodes and a plurality of leaf nodes. The root nodes, leaf nodes and split nodes may be represented using data structures in memory; col. 8 lines 57-63: The test data point is pushed 1006 through the tree until it reaches a leaf node and the cluster representation associated with that leaf during training is stored 1008 [feature value]. This is repeated 1010 for the other trees in the forest. The cluster representations are aggregated, for example by computing an average to obtain at least one cluster for the test data point. The test data point is then associated with a class label with a confidence [prediction]), and wherein the cluster-specific random decision tree is part of a cluster-specific random decision forest stored at the memory, and where the processor is configured to push the prediction through the cluster-specific random decision forest to compute the other prediction (Criminisi, col. 4 lines 32-38: forests, as ensembles of trees, behave the same way as trees except the ability of the system to generalize is improved…[and] deal well with new examples).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Criminisi’s teachings of decision tree forest operations into Phoha’s teaching of clustering algorithm for decision tree development in order to improve the system’s ability to generalize, that is to deal with new examples that differ from training data (Criminisi col. 4 lines 33-35).

Regarding claim 9, the combination of Phoha and Criminisi teaches all the claim limitations of claim 1 above; and further teaches a training logic which computes the plurality of clusters by clustering sensor data elements for which pathways have been observed during passing of the sensor data elements through a random decision forest (Criminisi, Abstract: In examples, a training objective [training logic] is used which seeks to cluster the observations based on the labels and similarity of the observations; col. 6 lines 35-37: plurality of clusters are accumulated at a leaf node, meaning their pathways are observed).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Criminisi’s teachings of training data clustering in decision tree leaf nodes and a training objective into Phoha’s teaching of clustering algorithm for decision tree development in order to optimize decision tree prediction accuracy with data focused tree leaves and image analysis (Criminisi, abstract, summary, and col. 6 lines 35-37).

Regarding claim 11, the combination of Phoha and Criminisi teaches all the claim limitations of claim 9 above; and further teaches wherein the training logic is configured to train the cluster-specific random decision tree (Criminisi, col. 6 lines 44-46: As mentioned above a training objective function is used to train a semi-supervised random decision forest which comprises a plurality of randomly trained trees).
Phoha and Criminisi are combinable for the same rationale as set forth above with respect to claims 1 and 9.

Claims 3, 4, 5, 13-18, and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Phoha et al (US 7,792,770 B1) hereinafter Phoha, in view of Criminisi et al. (US 9,519,868 B2) hereinafter Criminisi, in view of Ainslie et al (US Patent 8316019) hereinafter Ainslie.
Regarding claim 3, the combination of Phoha and Criminisi teaches all the claim limitations of claim 1 above. However, the combination does not explicitly teach wherein the pathways of a specified metric are found by computing a metric which takes into account a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree.
Ainslie teaches wherein the pathways of a specified metric are found by computing a metric which takes into account a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree (col. 10 lines 23-26: For example, the relevance score can be directly proportional to the level of the lowest matching node (e.g., the matching node having the largest depth) in the profile tree, and asymptotically approaches a maximum value as the number of matching nodes increases).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include tree depth metrics in calculations as taught by Ainslie in order to more accurately reflect the relevance of a particular data element to a specific cluster, since a metric that is inversely proportional to depth becomes smaller as the depth of the deepest common node increases (Ainslie col. 10 lines 20-44).
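For illustration only, the depth of the deepest node common to a pair of pathways may be sketched as follows. The sketch assumes that each pathway is recorded as the sequence of node identifiers visited from the root (depth 0) to a leaf; the function name `deepest_common_depth`, the node identifiers, and this representation are hypothetical and are not taken from the claims or from Ainslie.

```python
def deepest_common_depth(path_a, path_b):
    """Return the depth of the deepest node shared by two pathways.

    Each pathway is a sequence of node identifiers starting at the root,
    which is taken to be at depth 0. Returns -1 when the pathways do not
    even share a root (for example, pathways from different trees).
    """
    depth = -1
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break  # the pathways diverge here
        depth += 1
    return depth

# Two pathways that share the root and its left child, then diverge.
path_1 = ["root", "L", "LL"]
path_2 = ["root", "L", "LR"]
shared_depth = deepest_common_depth(path_1, path_2)
```

Because both pathways start at the root of the same tree, they share a common prefix, and the last node of that prefix is the deepest common node.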

Regarding claim 4, the combination of Phoha and Criminisi teaches all the claim limitations of claim 1 above. However, the combination does not explicitly teach wherein the pathways of a specified metric are found by computing a metric which is inversely related to a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree.
Ainslie teaches wherein the pathways of a specified metric are found by computing a metric which is inversely related to a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree (col. 10 lines 37-44: For example, the personalized relevance score can be inversely proportional to the child counts the lowest matching node (e.g., the matching node having the largest depth) found in different branches of the profile tree).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include tree depth metrics in calculations as taught by Ainslie in order to more accurately reflect the relevance of a particular data element to a specific cluster, since a metric that is inversely proportional to depth becomes smaller as the depth of the deepest common node increases (Ainslie col. 10 lines 20-44).

Regarding claim 5, the combination of Phoha and Criminisi teaches all the claim limitations of claim 1 above. However, the combination does not explicitly teach wherein the pathways of a specified metric are found by computing a metric which is inversely related to two to a power of a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree, where the depth of the deepest node is expressed as an integer number of layers of a random decision forest.
Ainslie teaches wherein the pathways of a specified metric are found by computing a metric which is inversely related to two to a power of a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree, where the depth of the deepest node is expressed as an integer number of layers of a random decision forest (col. 10 lines 20-44: a relevance score that is inversely proportional to a positive integer value [i.e. depth] is also inversely related to two to the power of that value, since two to the power of the depth increases monotonically with the depth).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include tree depth metrics in calculations as taught by Ainslie in order to more accurately reflect the relevance of a particular data element, since a metric that is inversely proportional to depth becomes smaller as the depth of the deepest common node increases (Ainslie col. 10 lines 20-44).
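For illustration only, one metric that is inversely related to two to the power of the depth of the deepest common node is 1/2^depth. The sketch below assumes the depth has already been determined as a non-negative integer number of layers; the function name `pathway_distance` is hypothetical, and the formula is an example of such a metric rather than a quotation from the claims or from any cited reference.

```python
def pathway_distance(common_depth: int) -> float:
    """Example metric inversely related to 2 ** common_depth.

    `common_depth` is the depth, in whole layers, of the deepest node
    shared by a pair of pathways (root = 0). Pathways that diverge
    deeper in the tree are treated as closer together.
    """
    if common_depth < 0:
        raise ValueError("common_depth must be a non-negative integer")
    return 1.0 / (2 ** common_depth)

# The metric halves with each additional shared layer.
distances = [pathway_distance(depth) for depth in range(5)]
```

Under this example metric, two pathways that diverge at the root are at distance 1.0, and each additional shared layer halves the distance.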

Regarding claim 13, Phoha teaches a computer-implemented method of operation of a machine learning system comprising:
during a training phase of a first level random decision tree (Col. 4, line 54-Col. 5, line 13 and col. 5. lines 8-13 teach utilizing a “training set” and “build[ing] a decision tree from the [training data] set” in two stages, wherein a first stage includes “k-Means clustering is performed on training instances {Xi} to obtain k disjoint clusters”, and the second stage includes “each cluster of learning instances is further characterized using the known ID3 decision tree learning” in at least one layer of the decision tree since decision trees are well known in the art to include at least two layers and the operations would be repeated [during a training phase of a first level random decision tree]): 
receiving a sensor data element (col. 15, lines 22-59, and Figs. 1-2: test instance Z_i is fed through system); and 
processing, using a processor, the sensor data element through the first level of the random decision tree to obtain values of features associated with the sensor data element (Col. 4, line 54-Col. 5, line 13, col. 5. lines 8-13, col. 15, lines 22-59, and Fig. 2: The algorithm is executed on a computer [processor] having inputs, outputs and databases; Col. 6, lines 27-47: “unknown point will be…examined for closeness to the clusters, and for the closest clusters [to obtain values of features associated with the sensor data element]” the unknown data will be “characterized by each cluster's ID3 decision tree” learning; col. 7 lines 40-45: attributes of each data point is evaluated to find nearest cluster [values of features] in at least one layer of the decision tree since decision trees are well known in the art to include at least two layers and the operations would be repeated); 
computing a plurality of clusters using a similarity metric that takes into account similarity of pathways in the first level random decision tree (col. 5. lines 8-13 teaches “The algorithm is executed on a computer having inputs, outputs and databases. The results from the training set, e.g. cluster identification [cluster parameters] and ID3 decision tree structure for each cluster, can be stored [memory] for later use on an data set to be identified, or can be computed on a run by run basis of the cascaded learning techniques on unknown data”. Further, Col. 4, line 54-Col. 5, line 13 teach “build[ing] a decision tree from the [training data] set” in two stages [training phase of the trained random decision tree], wherein a first stage includes “k-Means clustering is performed on training instances {Xi} to obtain k disjoint clusters” [computing a plurality of clusters using a similarity metric that takes into account similarity of pathways in the first level random decision tree], and the second stage includes “each cluster of learning instances is further characterized using the known ID3 decision tree learning”; wherein the training operations are performed in at least one layer of the decision tree since decision trees are well known in the art to include at least two layers and the operations would be repeated [alternative computing a plurality of clusters using a similarity metric that takes into account similarity of pathways in the first level random decision tree]); 
comparing the values of features associated with the received sensor data element with parameters and the plurality of a first level of clusters (Col. 5. lines 8-13: The algorithm is executed on a computer [processor] having inputs, outputs and databases; Col. 6, lines 27-47: “This ID3 decision tree's characterization is compared against the associated cluster's characterization [comparing the values of features associated with the sensor data element with parameters of a first level of clusters], and the first conformity between the two characterizations (i.e. examine conformance with the closest cluster [first level of clusters], and if no conformance, move to the next closest cluster [first level of clusters], etc, repeat until conformance is obtained between cluster characterization and cluster ID3 decision tree data point characterization) is that characterization assigned to the unknown data point” [comparing the values of features associated with the sensor data element with parameters of a first level of clusters]); 
based at least on the comparing, selecting a cluster from the first level of clusters that includes parameters corresponding to the values of the features associated with the sensor data element (Col. 6, lines 27-47: “This ID3 decision tree's characterization is compared against the associated cluster's characterization, and the first conformity between the two characterizations (i.e. examine conformance with the closest cluster, and if no conformance, move to the next closest cluster, etc, repeat until conformance is obtained between cluster characterization and cluster ID3 decision tree data point characterization [based at least on the comparing, selecting a cluster from the first level of clusters that includes parameters corresponding to the values of the features associated with the sensor data element]) is that characterization assigned to the unknown data point”);
training a cluster-specific random decision tree using the selected one of the plurality of clusters (col. 5. lines 8-13, Col. 6, lines 27-47, and claim 1: ID3 trees stored in memory; and used for characterizing unknown data to match to a specific cluster [selected cluster]; col. 4 lines 66-67 and claim 1: In the second stage of dataset characterization, each cluster of learning instances [selected cluster] is further characterized using the known ID3 decision tree learning. In ID3 characterization, the ID3 algorithm builds a decision tree from the set [training at least one cluster-specific random decision tree using data from the selected cluster].)
computing a prediction by passing the received sensor data element and the sensor data element through the cluster-specific random decision tree (col. 11 lines 49-55, col. 15, lines 22-59, and Fig. 2: test data is pushed through ID3 tree [cluster-specific tree] to predict anomaly of the test; col. 15, lines 22-59 and Fig. 2: ID3 trees are trained on specific k-means clusters; test instance Z_i goes through a particular ID3 tree; instance is given an anomaly score [prediction]), 

However, while Phoha teaches determining a tree’s decision as “represented graphically as a path through the tree which the object follows…based on the terminal node on the path” [Col. 5, lines 20-29], Phoha does not explicitly teach wherein the plurality of clusters group together sensor data elements on a basis of observed pathways when the sensor data elements are process by the trained random decision tree, and wherein the pathways of a specified metric are found by computing a metric which takes into account a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree.
Criminisi teaches wherein the plurality of clusters group together sensor data elements on a basis of observed pathways when the sensor data elements are process by the trained random decision tree (col. 4, lines 13-24 and col. 6 lines 35-37 teach “clusters of training elements are accumulated at leaf nodes”, and the training elements accumulated at a given leaf node thus have pathways that are deemed similar by virtue of ending at the same leaf node, wherein training data includes “sensor data”; and col. 2, lines 58-59: trees can be used to cluster medical images).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Criminisi’s teachings of training data clustering in decision tree leaf nodes into Phoha’s teaching of clustering algorithm for decision tree development in order to optimize decision tree prediction accuracy with data focused tree leaves (Criminisi col. 3 lines 58-Col. 4, line 24).
However, Criminisi does not explicitly teach wherein the pathways of a specified metric are found by computing a metric which takes into account a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree.
Ainslie teaches wherein the pathways of a specified metric are found by computing a metric which takes into account a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree (col. 10 lines 37-44: a relevance score that is inversely proportional to a positive integer value [i.e. depth] is also inversely related to two to the power of that value, since two to the power of the depth increases monotonically with the depth; and col. 10 lines 23-26: For example, the relevance score can be directly proportional to the level of the lowest matching node (e.g., the matching node having the largest depth) in the profile tree, and asymptotically approaches a maximum value as the number of matching nodes increases).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include tree depth metrics in calculations as taught by Ainslie in order to more accurately reflect the relevance of a particular data element, since a metric that is inversely proportional to depth becomes smaller as the depth of the deepest common node increases (Ainslie col. 10 lines 20-44).

Regarding claim 14, the combination of Phoha, Criminisi, and Ainslie teaches all the claim limitations of claim 13 above; and further teaches wherein computing the plurality of clusters further comprises clustering sensor data elements for which behavior has been observed during passing of the sensor data elements through a random decision forest (Criminisi, col. 8 lines 57-59: test data points are observed as to which leaf node they end up in, which determines the particular cluster to which each data point belongs).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Criminisi’s teachings of training data clustering in decision tree leaf nodes into Phoha’s teaching of clustering algorithm for decision tree development in order to optimize decision tree prediction accuracy with data focused tree leaves (Criminisi col. 3 lines 58-Col. 4, line 24).

Regarding claim 15, the combination of Phoha, Criminisi, and Ainslie teach all the claim limitations of claim 13 above; and further teach wherein computing the plurality of clusters further comprises, for pairs of sensor data elements, computing a metric which takes into account a depth of a deepest node of the first level random decision tree which is common to a pathway of each sensor data element of the pair through the first level random decision tree (Ainslie, col. 10 lines 23-26: For example, the relevance score can be directly proportional to the level of the lowest matching node (e.g., the matching node having the largest depth) in the profile tree, and asymptotically approaches a maximum value as the number of matching nodes increases in at least one layer of the tree since these trees are well known in the art to include at least two layers and the operations would be repeated).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include tree depth metrics in calculations as taught by Ainslie in order to more accurately reflect the relevance of a particular data element to a specific cluster, since a metric that is inversely proportional to depth becomes smaller as the depth of the deepest common node increases (Ainslie col. 10 lines 20-44).
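For illustration only, the quantity recited in claim 15 (the depth of the deepest node common to the pathways of a pair of data elements) can be sketched as follows. This is a hedged sketch and not taken from Ainslie or the application: the path representation as a list of node identifiers, the convention that the root has depth 0, and the function name `deepest_common_depth` are assumptions.

```python
def deepest_common_depth(path_a, path_b):
    """Return the depth of the last node shared by two root-to-leaf paths.

    Each path is a list of node identifiers starting at the root.
    Assumed convention: the root has depth 0. Returns -1 when the
    paths do not even share a root.
    """
    depth = -1
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break  # the pathways diverge here
        depth += 1
    return depth


# Two hypothetical pathways that diverge after node "n1".
path_a = ["root", "n1", "n3", "leaf7"]
path_b = ["root", "n1", "n4", "leaf9"]
shared_depth = deepest_common_depth(path_a, path_b)
```

Under the assumed convention, the two example pathways share "root" (depth 0) and "n1" (depth 1), so the deepest common node lies at depth 1.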

Regarding claim 16, the combination of Phoha, Criminisi, and Ainslie teach all the claim limitations of claim 13 above; and further teach wherein computing the plurality of clusters further comprises, for pairs of sensor data elements, computing a metric which is inversely related to a depth of a deepest node of the first level random decision tree which is common to a pair of pathways through the first level random decision tree (Ainslie, col. 10 lines 37-44: For example, the personalized relevance score can be inversely proportional to the child counts the lowest matching node (e.g., the matching node having the largest depth) found in different branches of the profile tree in at least one layer of the tree since these trees are well known in the art to include at least two layers and the operations would be repeated).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include tree depth metrics in calculations as taught by Ainslie in order to more accurately reflect the relevance of a particular data element to a specific cluster, since a metric that is inversely proportional to depth becomes smaller as the depth of the deepest common node increases (Ainslie col. 10 lines 20-44).

Regarding claim 17, the combination of Phoha, Criminisi, and Ainslie teach all the claim limitations of claim 13 above; and further teach training the cluster-specific random decision tree using data from the selected cluster (Phoha, col. 10 lines 8-10: For each training group cluster, the ID3 technique is employed to build a decision tree for that cluster, organized along a pre-selected subset of datapoint attribute).

Regarding claim 18, the combination of Phoha, Criminisi, and Ainslie teach all the claim limitations of claim 13 above; and further teach where the trained random decision tree is part of a forest and wherein the cluster-specific random decision tree is part of a cluster-specific random decision forest, and wherein the prediction is computed using the forests (Criminisi, col. 4 lines 32-38: forests, as ensembles of trees, behave the same way as trees except the ability of the system to generalize is improved…[and] deal well with new examples).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Criminisi’s teachings of decision tree forest operations into Phoha’s teaching of clustering algorithm for decision tree development in order to improve the system’s ability to generalize, that is to deal with new examples that differ from training data (Criminisi col. 4 lines 33-35).
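For illustration only, computing a prediction from a forest rather than a single tree, as discussed for claim 18, can be sketched with simple majority voting. Majority voting is an assumption chosen here for brevity; the aggregation actually used in Criminisi or in the application may differ. The three stand-in "trees" and the name `forest_predict` are hypothetical.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the label predicted by the most trees (simple majority vote)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical one-test "trees", each a function mapping a point to a label.
trees = [
    lambda x: "pos" if x[0] > 0.5 else "neg",
    lambda x: "pos" if x[1] > 0.5 else "neg",
    lambda x: "pos" if x[0] + x[1] > 0.8 else "neg",
]

label = forest_predict(trees, (0.7, 0.2))
```

For the example point, the first and third stand-in trees vote "pos" and the second votes "neg", so the ensemble prediction is "pos" even though one tree disagrees.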

Regarding claim 20, Phoha teaches a medical image analysis apparatus comprising: 
a memory storing at least one trained random decision tree and parameters of a plurality of clusters, the plurality of clusters being computed during a training phase of the trained random decision tree (col. 5. lines 8-13 and col. 19 lines 3-4 teach “The algorithm is executed on a computer having inputs, outputs and databases. The results from the training set, e.g. cluster identification [cluster parameters] and ID3 decision tree structure for each cluster, can be stored [memory] for later use on an data set to be identified, or can be computed on a run by run basis of the cascaded learning techniques on unknown data”; and that the computer is used for medical applications on “CT images, X-rays, MRI images, etc.” [medical image analysis apparatus]. Further, Col. 4, line 54-Col. 5, line 13 teach “build[ing] a decision tree from the [training data] set” in two stages [training phase of the trained random decision tree], wherein a first stage includes “k-Means clustering is performed on training instances {Xi} to obtain k disjoint clusters” [the plurality of clusters being computed during a training phase of the trained random decision tree], and the second stage includes “each cluster of learning instances is further characterized using the known ID3 decision tree learning” [alternatively, the plurality of clusters being computed during a training phase of the trained random decision tree]);
a processor programed to (Col. 5. lines 8-13: The algorithm is executed on a computer [processor] having inputs, outputs and databases):
 push a medical image element through the trained random decision tree to compute a prediction of a class label of a class of objects of which the medical image element depicts (Col. 5. lines 8-13: The algorithm is executed on a computer [processor] having inputs, outputs and databases; col. 19 lines 3-4: data used can be CT images and X-rays [medical image element]; Col. 6, lines 27-47: “unknown point will be…examined for closeness to the clusters, and for the closest clusters [compute a prediction of a class label of a class of objects of which the medical image element depicts]” the unknown data will be “characterized by each cluster's ID3 decision tree” learning [compute a prediction of a class label of a class of objects of which the medical image element depicts]), 
obtain values of features associated with a sensor data element (Col. 6, lines 27-47: “unknown point will be…examined for closeness to the clusters, and for the closest clusters [obtain values of features associated with the sensor data element]” the unknown data will be “characterized by each cluster's ID3 decision tree” learning [obtain values of features associated with the sensor data element]);
compare the values of features associated with the sensor data element with parameters of a first level of clusters (Col. 5. lines 8-13: The algorithm is executed on a computer [processor] having inputs, outputs and databases; Col. 6, lines 27-47: “This ID3 decision tree's characterization is compared against the associated cluster's characterization [comparing the values of features associated with the sensor data element with parameters of a first level of clusters], and the first conformity between the two characterizations (i.e. examine conformance with the closest cluster [first level of clusters], and if no conformance, move to the next closest cluster [first level of clusters], etc, repeat until conformance is obtained between cluster characterization and cluster ID3 decision tree data point characterization) is that characterization assigned to the unknown data point” [comparing the values of features associated with the sensor data element with parameters of a first level of clusters]); and
based at least on the comparing, selecting a cluster from the first level of clusters that includes parameters corresponding to the values of the features associated with the sensor data element (Col. 6, lines 27-47: “This ID3 decision tree's characterization is compared against the associated cluster's characterization, and the first conformity between the two characterizations (i.e. examine conformance with the closest cluster, and if no conformance, move to the next closest cluster, etc, repeat until conformance is obtained between cluster characterization and cluster ID3 decision tree data point characterization [based at least on the comparing, selecting a cluster from the first level of clusters that includes parameters corresponding to the values of the features associated with the sensor data element]) is that characterization assigned to the unknown data point”);
the memory further storing at least one cluster-specific random decision tree, which has been trained using data from the selected cluster (col. 5. lines 8-13: ID3 trees stored in memory; col. 4 lines 66-67: In the second stage of dataset characterization, each cluster of learning instances is further characterized using the known ID3 decision tree learning. In ID3 characterization, the ID3 algorithm builds a decision tree from the set.); 
the processor further programmed to push the prediction and the sensor data element through the cluster-specific random decision tree to compute another prediction (col. 15, lines 22-59 and Fig. 2: ID3 trees are trained on specific k-means clusters; test instance Z_i goes through a particular ID3 tree; instance is given an anomaly score [prediction]); 



However, while Phoha teaches determining a tree’s decision as “represented graphically as a path through the tree which the object follows…based on the terminal node on the path” [Col. 5, lines 20-29], Phoha does not explicitly teach and wherein the plurality of clusters group together medical image elements which give rise to pathways of a specified metric when pushed through the trained random decision tree, and wherein the pathways of a specified metric are found by computing a metric which is inversely related to two to a power of a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree, where the depth of the deepest node is expressed as an integer number of layers of a random decision forest.
Criminisi teaches and wherein the plurality of clusters group together medical image elements which give rise to pathways of a specified metric when pushed through the trained random decision tree (col. 4, lines 13-24 and col. 6 lines 35-37 teach “clusters of training elements are accumulated at leaf nodes” and thus have pathways that are deemed similar by virtue of being in the same leaf node [specified metric], wherein training data includes “sensor data” and col. 2, lines 58-59: trees can be used to cluster medical images).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Criminisi’s teachings of training data clustering in decision tree leaf nodes into Phoha’s teaching of clustering algorithm for decision tree development in order to optimize decision tree prediction accuracy with data focused tree leaves (Criminisi col. 3 lines 58-Col. 4, line 24).
However Criminisi does not explicitly teach and wherein the pathways of a specified metric are found by computing a metric which is inversely related to two to a power of a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree, where the depth of the deepest node is expressed as an integer number of layers of a random decision forest.
Ainslie teaches and wherein the pathways of a specified metric are found by computing a metric which is inversely related to two to a power of a depth of a deepest node of the random decision tree which is common to a pair of pathways through the trained random decision tree, where the depth of the deepest node is expressed as an integer number of layers of a random decision forest (col. 10 lines 37-44: a relevance score that is inversely proportional to a positive integer value [i.e. depth] is automatically inversely proportional to two to the power of that value).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include tree depth metrics in calculations as taught by Ainslie in order to more accurately reflect the relevance of a particular data element, since a metric that is inversely proportional to depth becomes smaller as the depth of the deepest common node increases (Ainslie col. 10 lines 20-44).
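For illustration only, a metric of the form recited in claim 20 (inversely related to two to a power of an integer depth) can be sketched as below. The specific form 1 / 2^d, the convention that depth is a non-negative integer number of layers, and the function name `pathway_metric` are assumptions for this sketch and are not drawn from Ainslie or the application.

```python
def pathway_metric(depth):
    """Return 1 / 2**depth for a non-negative integer depth.

    The value shrinks as the deepest common node of two pathways
    lies deeper in the tree.
    """
    if not isinstance(depth, int) or depth < 0:
        raise ValueError("depth must be a non-negative integer number of layers")
    return 1.0 / (2 ** depth)


# Metric values for deepest-common-node depths 0 through 3.
values = [pathway_metric(d) for d in range(4)]
```

With these assumptions, the metric halves with each additional shared layer, so pairs of data elements whose pathways diverge late in the tree receive smaller values than pairs that diverge near the root.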

Regarding claim 21, the combination of Phoha, Criminisi, and Ainslie teach all the claim limitations of claim 20 above; and further teach wherein the specific metric is a specific range of magnitude between which a computed difference of the values of features and the parameters is defined (Phoha, col. 4 lines 62-65: Each k-means cluster represents a region of similar instances, ‘similar’ in terms of a chosen metric, such as Euclidean distances between the instances [feature value] and the cluster “center” or center tendency, such as the centroid [parameter]; col. 7 lines 42-45: specified range is defined by the closest cluster to the instance).
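For illustration only, the Euclidean comparison between an instance's feature values and cluster centroids referenced for claim 21 can be sketched as follows. The centroid values, cluster names, and the function name `nearest_cluster` are hypothetical, and this is not presented as Phoha's implementation.

```python
import math


def nearest_cluster(x, centroids):
    """Return (cluster_name, distance) for the centroid closest to x.

    Distance is the Euclidean distance between the feature values of x
    and each cluster's centroid (the cluster "parameter" in this sketch).
    """
    distances = {name: math.dist(x, c) for name, c in centroids.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical centroids for two clusters.
centroids = {"c0": (0.0, 0.0), "c1": (3.0, 4.0)}
name, distance = nearest_cluster((3.0, 3.0), centroids)
```

Here the example instance lies one unit from the centroid of "c1" and farther from "c0", so "c1" is selected as the closest cluster.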

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Phoha et al (US 7,792,770 B1) hereinafter Phoha, in view of Criminisi et al. (US 9,519,868 B2) hereinafter Criminisi, in view of Nagamine et al (“Statistical prediction of protein–chemical interactions based on chemical structure and mass spectrometry data”, 2007) hereinafter Nagamine.
Regarding claim 6, the combination of Phoha and Criminisi teach all the claim limitations of claim 1 above; and further teach where the pathways of a specified metric are computed using a metric which takes into account distance between the values of features of a pair of sensor data elements expressed as vectors  (paragraphs 0 teach).
However Phoha does not explicitly teach and concatenated with an associated prediction.
Nagamine teaches and concatenated with an associated prediction (section 2.4: protein-chemical bindings can be represented by concatenating their feature vectors as one feature vector; note that the protein is the data and the associated chemical binding is the prediction).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include vector concatenation as taught by Nagamine in order to improve functional efficiency by simplifying the representation of a data element and its prediction, and to allow for convenient vector calculation through vector concatenation (Nagamine Section 2.4).
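For illustration only, a distance between feature vectors that have each been concatenated with an associated prediction, as discussed for claim 6, can be sketched as follows. The vector contents, the one-hot encoding of the prediction, and the function names are assumptions for this sketch and are not taken from Nagamine.

```python
import math


def concat_with_prediction(features, prediction):
    """Join a feature vector and its prediction vector into one vector."""
    return list(features) + list(prediction)


def concatenated_distance(features_a, pred_a, features_b, pred_b):
    """Euclidean distance between two feature vectors, each concatenated
    with its associated prediction."""
    a = concat_with_prediction(features_a, pred_a)
    b = concat_with_prediction(features_b, pred_b)
    return math.dist(a, b)


# Two hypothetical elements with identical features but different predictions
# (predictions encoded as one-hot vectors, an assumption).
d = concatenated_distance([1.0, 2.0], [0.0, 1.0], [1.0, 2.0], [1.0, 0.0])
```

Because the features match in this example, the entire distance comes from the differing prediction components, which shows how concatenation lets one vector calculation account for both the data element and its prediction.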

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Phoha et al (US 7,792,770 B1) hereinafter Phoha, in view of Criminisi et al. (US 9,519,868 B2) hereinafter Criminisi, in view of Winn et al. (US Pub 20080075367) hereinafter Winn.
Regarding claim 8, the combination of Phoha and Criminisi teach all the claim limitations of claim 7 above. However the combination does not explicitly teach wherein the sensor data element is an image or part of an image and the prediction is a class label of a class of object that the image is predicted to depict.
Winn teaches wherein the sensor data element is an image or part of an image and the prediction is a class label of a class of object that the image is predicted to depict (paragraph 0030: FIG. 4 is a high level schematic diagram of an apparatus 40 for object detection and recognition. It takes an image 44 as input which may be a digital photograph, video still or any suitable type of image . . . the apparatus produces as output 45 an object label map comprising a label for each image element specifying which object class, and optionally class instance, that image element is assigned to; 0034: In one example, the classifier comprises a plurality of decision trees, also referred to as a decision forest, each trained on a random subset of the training data.).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include classifying image data via decision forests as taught by Winn in order to train a decision tree on specific image data for more accurate predictions and since applying decision trees for object detection in images is known (Winn paragraphs 0002, 0030, and 0034).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Phoha et al (US 7,792,770 B1) hereinafter Phoha, in view of Criminisi et al. (US 9,519,868 B2) hereinafter Criminisi, in view of Chen et al. (US Patent 8374974) hereinafter Chen.
Regarding claim 12, the combination of Phoha and Criminisi teach all the claim limitations of claim 9 above; claim 12 states wherein the memory stores at least one second level cluster-specific random decision tree and parameters of a plurality of second clusters (It is noted that neither Phoha nor Criminisi explicitly disclose a second level cluster-specific random decision tree and parameters of second clusters. However, the above limitation is a mere duplication of a first level cluster-specific random decision tree and parameters of first-level clusters, which is disclosed by Phoha (see rejection of claim 1), and further decision trees are well known in the art to include at least two layers and the operations would be repeated. The mere duplication of parts has no patentable significance unless a new and unexpected result is produced (see MPEP 2144.04(VI)(B)). A second level of cluster-specific random decision trees would function similar to a first level of cluster-specific random decision trees, in that both are trees trained on a particular cluster and used to output a prediction based on data fed through the tree. Thus, the limitation does not produce a new unexpected result and thus has no patentable significance.
Nonetheless, Chen, Col. 3, line 56-Col. 4, line 43 and Fig. 1 teach using “a second level clustering” in the decision tree stored in “memory” and cluster “attributes”).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, to include decision tree “second level clustering” operations as taught by Chen in order to increase tree operation accuracy with the two-step clustering process (Chen, Col. 3, line 56-Col. 4, line 43 and Fig. 1).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Phoha et al (US 7,792,770 B1) hereinafter Phoha, in view of Criminisi et al. (US 9,519,868 B2) hereinafter Criminisi, in view of Ainslie et al (US Patent 8316019) hereinafter Ainslie, in view of Chen et al. (US Patent 8374974) hereinafter Chen.
Regarding claim 19, the combination of Phoha, Criminisi, and Ainslie teach all the claim limitations of claim 18 above; claim 19 states further comprising using at least one second level cluster-specific random decision tree and parameters of a plurality of second-level clusters (It is noted that neither Phoha, Criminisi, nor Ainslie explicitly disclose a second level cluster-specific random decision tree and parameters of second clusters. However, the above limitation is a mere duplication of a first level cluster-specific random decision tree and parameters of first-level clusters, which is disclosed by Phoha (see rejection of claim 1/18), and further decision trees are well known in the art to include at least two layers and the operations would be repeated. The mere duplication of parts has no patentable significance unless a new and unexpected result is produced (see MPEP 2144.04(VI)(B)). A second level of cluster-specific random decision trees would function similar to a first level of cluster-specific random decision trees, in that both are trees trained on a particular cluster and used to output a prediction based on data fed through the tree. Thus, the limitation does not produce a new unexpected result and thus has no patentable significance.
Nonetheless, Chen, Col. 3, line 56-Col. 4, line 43 and Fig. 1 teach using “a second level clustering” in the decision tree stored in “memory” and cluster “attributes”).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the clustering algorithm for decision tree development as taught by Phoha, as modified by training data clustering in decision tree leaf nodes as taught by Criminisi, as modified by tree depth metrics in calculations as taught by Ainslie, to include decision tree “second level clustering” operations as taught by Chen in order to increase tree operation accuracy with the two-step clustering process (Chen, Col. 3, line 56-Col. 4, line 43 and Fig. 1).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.M./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123