DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The following claims are pending in this office action: 1-8, 10-16, and 18-22
The following claims are amended: 1-8, 10-16, and 18-20
The following claims are new: 21-22
The following claims are cancelled: 9 and 17
The following claims are rejected: 1-8, 10-16, and 18-22
Response to Arguments
Applicant’s arguments with respect to claims 1-8, 10-16, and 18-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US 20160342903 A1 to Shumpert (hereinafter, “Shumpert”), in view of U.S. Pub. No. US 20190205828 A1 to O’Hara, et al. (hereinafter, “O’Hara”)
	As per claim 1, Shumpert teaches a computer-implemented method for training a machine learning classifier, the method comprising:
performing unsupervised machine learning to identify a plurality of clusters in training data, the training data comprising multi-dimensional machine metrics generated by a plurality of computing devices; (Shumpert, Paras. [0081]-[0087] disclose “An example unsupervised learning approach…” and Para. [0088] discloses “This approach assumes that all data is already available; for instance, a large batch of sensor data was collected during a waiting period. When this algorithm has been executed completely—cluster assignment is repeated over and over until the assignments stabilize or converge—it will produce k clusters of sensor data.” and Figs. 1 and 2 disclose training data sets (Note that Para. [0053] discloses that sensor data “…includes…a reading metric (e.g., voltage, current, etc.), and a reading value”))
	assigning a cluster label to the selected cluster (Shumpert, Para. [0089] discloses “In addition, without labeled training data, the standard algorithm cannot predict anomalies. The clusters merely represent groupings of the data, and it is unknown whether or not a given cluster is anomalous (e.g., the clusters could just represent different normal operating states). This can be addressed by collecting labeled training data and using it to classify the clusters, and the following pseudo-code example shows how the standard k-means algorithm can be alternatively adapted for supervised learning and prediction using labeled data:”)
Shumpert fails to explicitly teach:
determining, for each of the plurality of clusters and independent of user input specifying a label to be used in labeling the cluster, whether the cluster is complete or incomplete
selecting a cluster of the plurality of clusters
responsive to determining the cluster is incomplete, splitting the selected cluster into multiple clusters
and responsive to determining the selected cluster is complete, [[assigning a cluster label to the selected cluster]]
However, O’Hara (O’Hara addresses applying classification models to clusters) discloses:
determining, for each of the plurality of clusters and independent of user input specifying a label to be used in labeling the cluster, whether the cluster is complete or incomplete (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Each cluster gets selected and subsequently evaluated))
selecting a cluster of the plurality of clusters (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Each cluster gets selected and subsequently evaluated))
responsive to determining the cluster is incomplete, splitting the selected cluster into multiple clusters (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Incomplete clusters are flagged for splitting))
and responsive to determining the selected cluster is complete, [[assigning a cluster label to the selected cluster]] (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Clusters that are complete don’t get flagged for further splitting))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the plurality of clusters generated as disclosed by Shumpert to select a cluster and split it further if it is incomplete, as disclosed by O’Hara. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of the machine learning classifier. If clusters are deemed to be incomplete, they are not in an optimal state and should be further modified to bring them closer to an optimal state, thereby allowing the classification performed by the model to be improved.
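For illustration only, the combined flow discussed above for claim 1 (evaluate each cluster, split a cluster that is not compact, and label a cluster that is) can be sketched as follows. This is a minimal sketch under stated assumptions and is not a characterization of the actual implementation of Shumpert, O’Hara, or the claimed invention: the compactness test (mean distance to the centroid against a hypothetical `max_spread` threshold), the two-way split, and all function names are assumptions introduced here.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def mean_spread(points):
    """Average distance from each point to the cluster centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def split_in_two(points, iterations=10):
    """Two-way k-means style split, seeded with the smallest and largest point."""
    a, b = min(points), max(points)
    left, right = list(points), []
    for _ in range(iterations):
        left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not left or not right:
            break
        a, b = centroid(left), centroid(right)
    return left, right


def label_or_split(clusters, max_spread):
    """Split clusters that are not compact (assumed meaning of "incomplete");
    assign a generated label to clusters that are compact ("complete")."""
    labeled = {}
    pending = list(clusters)
    while pending:
        cluster = pending.pop()
        if len(cluster) > 1 and mean_spread(cluster) > max_spread:
            left, right = split_in_two(cluster)
            if left and right:
                pending.extend([left, right])
                continue
        # No user-supplied label is needed; the label is generated automatically.
        labeled[f"cluster_{len(labeled)}"] = cluster
    return labeled


if __name__ == "__main__":
    data = [[(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]]
    for name, members in label_or_split(data, max_spread=1.0).items():
        print(name, members)
```

In this sketch a single loose cluster containing two well-separated groups is split once, after which each resulting cluster satisfies the threshold and receives a label.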
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of O’Hara, and further in view of U.S. Patent No. US 10135863 B2 to Dennison, et al. (hereinafter, “Dennison”)
As per claim 2, the combination of Shumpert and O’Hara as shown above teaches the computer-implemented method of claim 1, O’Hara further teaches:
[[further comprising generating a completeness score for]] the selected cluster, and wherein determining whether each cluster of the plurality of clusters is complete or incomplete is performed based, at least in part, on [[the completeness score]] (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Each cluster gets selected and subsequently evaluated))
The combination of Shumpert and O’Hara fails to explicitly teach:
further comprising generating a completeness score for [[the selected cluster, and wherein determining whether each cluster of the plurality of clusters is complete or incomplete is performed based, at least in part, on]] the completeness score
However, Dennison (Dennison addresses the issue of malicious software detection) teaches:
further comprising generating a completeness score for [[the selected cluster, and wherein determining whether each cluster of the plurality of clusters is complete or incomplete is performed based, at least in part, on]] the completeness score (Dennison, Col. 2, Line 34 discloses “The computer processors can be configured to execute the one or more software modules in order to cause the computer system to score the generated data item cluster” (Scoring to be done by using the selected cluster as disclosed by O’Hara above))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shumpert as modified to use a score for the cluster as disclosed by Dennison. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of the determination as to whether selected clusters are complete or incomplete. Using a score for the determination allows for an objective determination rather than a subjective one.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of O’Hara, and further in view of Dennison, and further in view of U.S. Pub. No. US 20140143186 A1 to Bala (hereinafter, “Bala”)
As per claim 3, the combination of Shumpert, O’Hara, and Dennison as shown above teaches the computer-implemented method of claim 2, Dennison further teaches:
[[a distance between instances of training data in the selected cluster are approximately similar in length to a median non-zero distance between the instances of the training data]] (Dennison, Col. 2, Line 34 discloses “The computer processors can be configured to execute the one or more software modules in order to cause the computer system to score the generated data item cluster” (Scoring to be done by using the selected cluster as disclosed by O’Hara above))
The combination of Shumpert, O’Hara, and Dennison fails to explicitly teach:
[[wherein generating the completeness score for the cluster comprises determining]] a distance between instances of training data in the selected cluster are approximately similar in length to a median non-zero distance between the instances of the training data
	However, Bala (Bala addresses the issue of hybrid clustering) teaches:
[[wherein generating the completeness score for the cluster comprises determining]] a distance between instances of training data in the selected cluster are approximately similar in length to a median non-zero distance between the instances of the training data (Bala, Para. [0049] discloses “Once the centroid is determined, the Euclidean distance between each positive data point and the centroid is calculated and the values are used to estimate the average Dintra, the intra cluster distance…” (Using the intra cluster distance, the completeness score, as disclosed by Dennison above, may be computed using the intra cluster distance))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shumpert as modified to determine if .
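For illustration only, the distance-based scoring discussed for claim 3 can be sketched as follows. The sketch computes an average intra-cluster distance to the centroid in the manner quoted from Bala, and separately computes a hypothetical completeness score as the fraction of pairwise distances within a cluster that fall near the median non-zero pairwise distance of the training data. The `tolerance` parameter, the scoring formula, and the function names are assumptions introduced here; none of the cited references is asserted to define a score this way.

```python
import math
from itertools import combinations
from statistics import median


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def intra_cluster_distance(cluster):
    """Average Euclidean distance from each point to the cluster centroid."""
    c = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
    return sum(math.dist(p, c) for p in cluster) / len(cluster)


def completeness_score(cluster, training_data, tolerance=0.5):
    """Fraction of in-cluster pairwise distances that lie within
    tolerance * reference of the reference distance, where the reference is
    the median non-zero pairwise distance over all training data.
    Both the tolerance and this formula are hypothetical."""
    nonzero = [d for d in pairwise_distances(training_data) if d > 0]
    within = pairwise_distances(cluster)
    if not nonzero or not within:
        return 1.0  # nothing to compare; treat as trivially complete
    reference = median(nonzero)
    close = sum(1 for d in within if abs(d - reference) <= tolerance * reference)
    return close / len(within)


if __name__ == "__main__":
    training = [(0, 0), (1, 0), (2, 0), (3, 0)]
    print(completeness_score([(0, 0), (1, 0), (2, 0)], training))
    print(completeness_score([(0, 0), (3, 0)], training))
```

A score near 1 would indicate that distances inside the cluster resemble the typical distance between training instances, and a score near 0 would indicate that they do not; a threshold on the score could then drive the complete-or-incomplete determination.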

Claims 4-5 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of O’Hara, and further in view of U.S. Patent No. US 9110984 B1 to Lewis, et al. (hereinafter, “Lewis”)
As per claim 4, the combination of Shumpert and O’Hara as shown above teaches the computer-implemented method of claim 1, the combination of Shumpert and O’Hara fails to explicitly teach:
determining whether each of the plurality of clusters have been assigned a cluster label;
and responsive to determining all clusters of the plurality of clusters have been assigned a cluster label, merging two or more of the plurality of clusters into a single cluster
However, Lewis (Lewis addresses the issue of hierarchical clustering) teaches:
determining whether each of the plurality of clusters have been assigned a cluster label; (Lewis, Col. 12, Line 57 discloses “Thereafter, the labels may be reviewed using the label manager 130 to determine if two or more clusters within a specific level 106A-106C of hierarchy 106 are sufficiently related, such that the two or more related clusters may be merged to form a single cluster.” and Col. 12, Lines 25-35 disclose “Thereafter, a label for each cluster in each level of the hierarchy 106 is determined 218…The label manager 130 may then iterate through each cluster of hierarchy 106 to determine the label for each cluster in hierarchy 106.” (Col. 12, Lines 35-56 of Lewis disclose assigning labels to clusters, after which clusters are merged. The clusters are reviewed to determine that they have been assigned a label))
and responsive to determining all clusters of the plurality of clusters have been assigned a cluster label, merging two or more of the plurality of clusters into a single cluster (Lewis, Col. 12, Line 57 discloses “Thereafter, the labels may be reviewed using the label manager 130 to determine if two or more clusters within a specific level 106A-106C of hierarchy 106 are sufficiently related, such that the two or more related clusters may be merged to form a single cluster.” (Col. 12, Lines 35-56 of Lewis disclose assigning labels to clusters whereby clusters are merged afterwards))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shumpert as modified to merge clusters into a single cluster as disclosed by Lewis. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the versatility of the data and the accuracy of the classifier. Two clusters that are deemed similar to one another may be merged into a single cluster, which also reduces overhead.
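For illustration only, the label-then-merge ordering discussed for claim 4 can be sketched as follows. The sketch merges clusters only once every cluster carries a label, and it treats clusters sharing the same label as "sufficiently related"; that relatedness test, the dictionary layout, and the function name are assumptions introduced here and are not taken from Lewis.

```python
def merge_when_all_labeled(clusters):
    """clusters: list of {"label": str or None, "members": list}.

    Returns the input unchanged while any cluster is still unlabeled.
    Once all clusters are labeled, clusters with the same label are
    merged into a single cluster (hypothetical relatedness test)."""
    if any(c["label"] is None for c in clusters):
        return clusters
    merged = {}
    for c in clusters:
        merged.setdefault(c["label"], []).extend(c["members"])
    return [{"label": label, "members": members} for label, members in merged.items()]


if __name__ == "__main__":
    labeled = [
        {"label": "normal", "members": [1, 2]},
        {"label": "anomalous", "members": [9]},
        {"label": "normal", "members": [3]},
    ]
    print(merge_when_all_labeled(labeled))
```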

As per claim 5, the combination of Shumpert, O’Hara, and Lewis as shown above teaches the computer-implemented method of claim 4, Shumpert further teaches:
(Shumpert, Fig. 1 discloses training a supervised model using training data and Para. [0095] discloses assigning labels to unlabeled data, which may be adapted for supervised learning (Supervised learning uses labeled training data))

As per claim 7, the combination of Shumpert, O’Hara, and Lewis as shown above teaches the computer-implemented method of claim 5, Shumpert further teaches further comprising:
deploying the machine learning classifier to a production environment for use in identifying production machine metrics as indicating anomalies (Shumpert, Fig. 4 discloses S402, which receives and transforms a data instance, and discloses predicting anomalies (Data instances are from machine sensors. The machine learning classifier is in a production environment and reads metric data for classifying anomalies))
receiving data from the machine learning classifier indicating an instance of the production machine metrics indicates an anomaly; (Shumpert, Fig. 1 and Fig. 2, S108 and S206, disclose predicting supervised anomalies)
presenting data identifying the instance of the production machine metrics indicating an anomaly-to-incident likelihood in a user interface; (Shumpert, Para. [0029] discloses “An operator is alerted in response to a predicted anomaly” and Para. [0031] discloses “The resulting model is able to detect and recognize repeat problems (step S306), while still discovering new problems and routing them to domain experts for review (step S308)…”)
and receiving an indication, by way of the user interface, that the instance of the production machine metrics indicates or does not indicate an incident. (Shumpert, Para. [0061] discloses “An expert classification about the anomaly is received (step S418). If the anomaly is confirmed as being new…The expert might confirm the instance as new type of anomaly or they might classify it as a new normal operating state.”)

As per claim 8, the combination of Shumpert, O’Hara, and Lewis as shown above teaches the computer-implemented method of claim 7, Shumpert further teaches:
further comprising retraining the machine learning classifier based, at least in part, on the indication received by way of the user interface. (Shumpert, Fig. 4 discloses updating the shared model at 408 after receiving expert classification at S418, and Para. [0152] discloses “The shared model is retrieved from the model store 508 and retrained”)

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of O’Hara, further in view of Lewis, and further in view of U.S. Patent No. US 9349103 B2 to Eberhardt, et al. (hereinafter, “Eberhardt”)
As per claim 6, the combination of Shumpert, O’Hara, and Lewis as shown above teaches the computer-implemented method of claim 5, the combination of Shumpert, O’Hara, and Lewis fails to explicitly teach:
further comprising assigning incident probability inferences to the plurality of clusters by performing Bayesian learning on the cluster-labeled training data
However, Eberhardt (Eberhardt addresses the issue of machine learned Bayesian networks) teaches:
further comprising assigning incident probability inferences to the plurality of clusters by performing Bayesian learning on the cluster-labeled training data (Eberhardt, Abstract discloses “According to one embodiment, in response to a set of data for anomaly detection, a Bayesian belief network (BBN) model is applied to the data set…” and Col. 1 Line 58 discloses “typically BBNs are used to predict the probability of known events. However, in some situations, there is a need to predict certain unknown events that are dissimilar to certain known events. For example, there is a need to detect new virus or malware that has not been detected before…” and Fig. 1 discloses training module 105 used to train BBNs and Col. 4, Line 21 discloses “The BBN models 106 are then utilized by detection module 107 to provide an estimate of probability that the set of data is similar or dissimilar to a known event.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shumpert as modified to use a Bayesian network to generate probabilities as disclosed by Eberhardt. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “predict the probability of known events…” along with “predicting certain unknown events that are dissimilar to certain known events” (Eberhardt, Col. 1, Line 60)

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of O’Hara, further in view of U.S. Pub. No. US 20150150011 A1 to Fischetti, et al. (hereinafter, “Fischetti”)
As per claim 10, the combination of Shumpert and O’Hara as shown above teaches the computer-implemented method of claim 1, the combination of Shumpert and O’Hara fails to explicitly teach:
wherein splitting the selected cluster into multiple clusters is performed using a plurality of worker computing devices operating in parallel 
However, Fischetti (Fischetti addresses the issue of parallel computation) discloses:
wherein splitting the selected cluster into multiple clusters is performed using a plurality of worker computing devices operating in parallel (Fischetti, Para. [0004] discloses “Parallel computation requires splitting a job among a set of processing units called “workers.” The computation is generally performed by a set of one or more master workers that split the workload into chunks and distribute them to a set of slave workers; master and slave workers can coincide in some implementations or variants.” (Using parallel computation to split the selected cluster into multiple clusters as disclosed by O’Hara))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shumpert as modified to use parallel computation as disclosed by Fischetti. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve processing power and efficiency. Using parallel computation speeds up execution of the application and allows larger problems to be solved in less time.
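For illustration only, distributing cluster-splitting work across parallel workers, as discussed for claim 10, can be sketched as follows. The sketch uses a thread pool from the Python standard library as a stand-in for the "worker computing devices" of the claim, and a simple split at the median as a stand-in for whatever splitting algorithm is actually used; both substitutions, and the function names, are assumptions introduced here and are not drawn from Fischetti or O’Hara.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_median(cluster):
    """Stand-in splitter: sort the cluster and cut it into two halves."""
    ordered = sorted(cluster)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def split_in_parallel(clusters, max_workers=4):
    """Hand each cluster to a worker; collect the non-empty halves in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        halves = list(pool.map(split_at_median, clusters))
    return [half for pair in halves for half in pair if half]


if __name__ == "__main__":
    print(split_in_parallel([[3, 1, 2, 4], [10, 30, 20]]))
```

Because each cluster is split independently of the others, the work divides naturally into chunks that a master process can distribute, which is the arrangement the quoted passage of Fischetti describes in general terms.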
Claims 11 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of Eberhardt
As per claim 11, Shumpert teaches a computer-implemented method for training a machine learning classifier, the method comprising:
performing unsupervised machine learning to generate cluster-labeled training data from training data independent of user input specifying a label to be used in assigned a cluster label to the cluster-labeled training data, the training data comprising multi-dimensional machine metrics generated by a plurality of computing devices; (Shumpert, Para. [0053] discloses that sensor data “…includes…a reading metric (e.g., voltage, current, etc.), and a reading value” and Para. [0095] discloses performing unsupervised learning with training data to generate labeled clustered data (Training data comprises sensor data, which contains metrics from various devices))
training the machine learning classifier by performing supervised machine learning on the cluster-labeled training data (Shumpert, Fig. 1 discloses training a supervised model using training data)
and classifying instances of production metrics as indicating anomalies using the machine learning classifier and [[the incident probability inferences]] (Shumpert, Figs. 1 and 2 disclose predicting anomalies using supervised models)
the cluster-labeled training data; (Shumpert, Para. [0095] discloses performing unsupervised learning with training data to generate labeled clustered data)
Shumpert fails to explicitly teach:
[[the cluster-labeled training data]] by performing Bayesian learning on the cluster-labeled training data to assign incident probability inferences to the clusters 
the incident probability inferences
	However, Eberhardt teaches:
assigning incident probability inferences to individual clusters included in [[the cluster-labeled training data]] by performing Bayesian learning on the cluster-labeled training data to assign incident probability inferences to the clusters (Eberhardt, Abstract discloses “According to one embodiment, in response to a set of data for anomaly detection, a Bayesian belief network (BBN) model is applied to the data set…” and Col. 1 Line 58 discloses “typically BBNs are used to predict the probability of known events. However, in some situations, there is a need to predict certain unknown events that are dissimilar to certain known events. For example, there is a need to detect new virus or malware that has not been detected before…” and Fig. 1 discloses training module 105 used to train BBNs and Col. 4, Line 21 discloses “The BBN models 106 are then utilized by detection module 107 to provide an estimate of probability that the set of data is similar or dissimilar to a known event.”)
the incident probability inferences (Eberhardt, Abstract discloses “According to one embodiment, in response to a set of data for anomaly detection, a Bayesian belief network (BBN) model is applied to the data set…” and Col. 1 Line 58 discloses “typically BBNs are used to predict the probability of known events. However, in some situations, there is a need to predict certain unknown events that are dissimilar to certain known events. For example, there is a need to detect new virus or malware that has not been detected before…” and Fig. 1 discloses training module 105 used to train BBNs and Col. 4, Line 21 discloses “The BBN models 106 are then utilized by detection module 107 to provide an estimate of probability that the set of data is similar or dissimilar to a known event.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the training data, classification, and supervised machine learning as disclosed by Shumpert to use Bayesian learning to generate probability inferences as disclosed by Eberhardt. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “predict the probability of known events…” along with “predicting certain unknown events that are dissimilar to certain known events” (Eberhardt, Col. 1, Line 60)
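For illustration only, assigning a probability of incident to each labeled cluster by Bayesian updating, as discussed for claim 11, can be sketched as follows. The sketch uses a Beta-Bernoulli posterior mean per cluster, which is a much simpler technique than the Bayesian belief network of Eberhardt and is substituted here purely for brevity. The uniform prior, the input format, and the function name are assumptions introduced here.

```python
def incident_probabilities(observations, prior_incidents=1.0, prior_non_incidents=1.0):
    """observations: iterable of (cluster_label, was_incident) pairs.

    Returns the posterior mean probability of an incident per cluster
    under a Beta(prior_incidents, prior_non_incidents) prior
    (hypothetical prior; Beta(1, 1) is uniform)."""
    counts = {}
    for label, was_incident in observations:
        hits, total = counts.get(label, (0, 0))
        counts[label] = (hits + int(was_incident), total + 1)
    return {
        label: (prior_incidents + hits) / (prior_incidents + prior_non_incidents + total)
        for label, (hits, total) in counts.items()
    }


if __name__ == "__main__":
    history = [("a", True), ("a", True), ("a", False), ("b", False)]
    print(incident_probabilities(history))
```

The resulting per-cluster probabilities could then accompany the classifier's anomaly predictions, so that an anomaly falling in a cluster with a high posterior is treated as more likely to correspond to an incident.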

As per claim 15, the combination of Shumpert and Eberhardt as shown above teaches the computer-implemented method of claim 11, Shumpert further teaches further comprising:
receiving data from the machine learning classifier indicating an instance of the production metrics indicates an anomaly; (Shumpert, Fig. 1 and Fig. 2, S108 and S206, disclose predicting supervised anomalies)
presenting data identifying the instance of the production metrics indicating an anomaly-to-incident likelihood in a user interface; (Shumpert, Para. [0029] discloses “An operator is alerted in response to a predicted anomaly” and Para. [0031] discloses “The resulting model is able to detect and recognize repeat problems (step S306), while still discovering new problems and routing them to domain experts for review (step S308)…”)
(Shumpert, Para. [0061] discloses “An expert classification about the anomaly is received (step S418). If the anomaly is confirmed as being new…The expert might confirm the instance as new type of anomaly or they might classify it as a new normal operating state.”)
retraining the machine learning classifier based, at least in part, on the indication received by way of the user interface. (Shumpert, Fig. 4 discloses updating the shared model at 408 after receiving expert classification at S418 and Para. [0152] discloses “The shared model is retrieved from the model store 508 and retrained”)

Claims 12, 16, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of Eberhardt, further in view of O’Hara
As per claim 12, the combination of Shumpert and Eberhardt as shown above teaches the computer-implemented method of claim 11, Shumpert further teaches wherein the unsupervised machine learning comprises:
identifying a plurality of clusters of the training data (Shumpert, Paras. [0081]-[0087] disclose “An example unsupervised learning approach…” and Para. [0088] discloses “This approach assumes that all data is already available; for instance, a large batch of sensor data was collected during a waiting period. When this algorithm has been executed completely—cluster assignment is repeated over and over until the assignments stabilize or converge—it will produce k clusters of sensor data.” and Figs. 1 and 2 disclose training data sets)
(Shumpert, Para. [0089] discloses “In addition, without labeled training data, the standard algorithm cannot predict anomalies. The clusters merely represent groupings of the data, and it is unknown whether or not a given cluster is anomalous (e.g., the clusters could just represent different normal operating states). This can be addressed by collecting labeled training data and using it to classify the clusters, and the following pseudo-code example shows how the standard k-means algorithm can be alternatively adapted for supervised learning and prediction using labeled data:”)
O’Hara further teaches:
selecting a cluster of the plurality of clusters; (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Clusters get selected for evaluation))
 determining if the selected cluster is a candidate for splitting; independent of user input specifying a label to be used in assigning a cluster label to the selected cluster (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Determining if clusters are complete or incomplete))
responsive to determining the selected cluster is a candidate for splitting, determining whether the selected cluster is complete or incomplete; (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.”)
responsive to determining the cluster is incomplete, performing unsupervised machine learning on training data in the selected cluster to split the selected cluster into multiple clusters; (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.”)
and responsive to determining the selected cluster is complete, [[assigning a cluster label to the selected cluster]]. (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.”)
Same motivation to combine Shumpert and O’Hara as claim 1

As per claim 16, Shumpert teaches a computer-implemented method for training a machine learning classifier, the method comprising:
(Shumpert, Para. [0034] discloses “Processing resources include at least one processor…”)
a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by the processor, cause the processor to: (Shumpert, Para. [0153] discloses “It also will be appreciated that the techniques described herein may be accomplished by having at least one processor execute instructions that may be tangibly stored on a non-transitory computer readable storage medium.”)
generate cluster-labeled training data from training data, the training data comprising multi-dimensional machine metrics generated by a plurality of computing devices by processing the training data using an unsupervised machine learning algorithm configured to: (Shumpert, Para. [0053] discloses that sensor data “…includes…a reading metric (e.g., voltage, current, etc.), and a reading value” and Para. [0089] discloses generating labeled training data (Training data comprises sensor data, which contains metrics from various devices))
perform supervised machine learning on the cluster-labeled training data to train the machine learning classifier (Shumpert, Fig. 1 discloses training a supervised model using training data)
and classify instances of production metrics as indicating anomalies using the machine learning classifier and [[the incident probability inferences]] (Shumpert, Figs. 1 and 2 disclose predicting anomalies using supervised models)
identify a plurality of clusters of the training data (Shumpert, Para. [0081]-[0087] discloses “An example unsupervised learning approach…” and Para. [0088] discloses “This approach assumes that all data is already available; for instance, a large batch of sensor data was collected during a waiting period. When this algorithm has been executed completely—cluster assignment is repeated over and over until the assignments stabilize or converge—it will produce k clusters of sensor data.” and Figs. 1 and 2 disclose training data sets)
label each of [[the complete clusters and the at least one merged cluster]] independent of user input (Shumpert, Para. [0089] discloses assigning labels to clusters)
the cluster labeled training data (Shumpert, Para. [0089] discloses generating cluster-labeled training data)
	Shumpert fails to explicitly teach:
	the complete clusters and the at least one merged cluster
classify each of the plurality of clusters as complete or incomplete independent of user input specifying a label to be used in labeling the cluster
	generate at least one merged cluster using a subset of the plurality of clusters classified as incomplete
However, O’Hara teaches:
the complete clusters and the at least one merged cluster (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Clusters are flagged for further splitting; otherwise they are complete))
classify each of the plurality of clusters as complete or incomplete independent of user input specifying a label to be used in labeling the cluster (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.”)
	generate at least one merged cluster using a subset of the plurality of clusters classified as incomplete (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.”)
Same motivation to combine Shumpert and O’Hara as claim 11
Shumpert fails to explicitly teach:
perform Bayesian learning on [[the cluster-labeled training data]] to assign incident probability inferences to the clusters
the incident probability inferences
	However, Eberhardt teaches:
perform Bayesian learning on [[the cluster-labeled training data]] to assign incident probability inferences to the clusters (Eberhardt, Abstract discloses “According to one embodiment, in response to a set of data for anomaly detection, a Bayesian belief network (BBN) model is applied to the data set…” and Col. 1 Line 58 discloses “typically BBNs are used to predict the probability of known events. However, in some situations, there is a need to predict certain unknown events that are dissimilar to certain known events. For example, there is a need to detect new virus or malware that has not been detected before…” and Fig. 1 discloses training module 105 used to train BBNs and Col. 4, Line 21 discloses “The BBN models 106 are then utilized by detection module 107 to provide an estimate of probability that the set of data is similar or dissimilar to a known event.”)
the incident probability inferences (Eberhardt, Abstract discloses “According to one embodiment, in response to a set of data for anomaly detection, a Bayesian belief network (BBN) model is applied to the data set…” and Col. 1 Line 58 discloses “typically BBNs are used to predict the probability of known events. However, in some situations, there is a need to predict certain unknown events that are dissimilar to certain known events. For example, there is a need to detect new virus or malware that has not been detected before…” and Fig. 1 discloses training module 105 used to train BBNs and Col. 4, Line 21 discloses “The BBN models 106 are then utilized by detection module 107 to provide an estimate of probability that the set of data is similar or dissimilar to a known event.”)
Same motivation to combine Shumpert and Eberhardt as claim 11

As per claim 20, the combination of Shumpert, O’Hara, and Eberhardt as shown above teaches the computing system of claim 16, Shumpert further teaches wherein the computer-executable instructions further cause the processor to:
receive data from the machine learning classifier indicating an instance of the production metrics indicates an anomaly; (Shumpert, Fig. 1 and Fig. 2, S108 and S206, disclose predicting supervised anomalies)
present data identifying the instance of the production metrics indicating an anomaly-to-incident likelihood in a user interface; (Shumpert, Para. [0029] discloses “An operator is alerted in response to a predicted anomaly” and Para. [0031] discloses “The resulting model is able to detect and recognize repeat problems (step S306), while still discovering new problems and routing them to domain experts for review (step S308)…”)
and receive an indication, by way of the user interface, that the instance of the machine metrics indicates or does not indicate an incident. (Shumpert, Para. [0061] discloses “An expert classification about the anomaly is received (step S418). If the anomaly is confirmed as being new…The expert might confirm the instance as new type of anomaly or they might classify it as a new normal operating state.”)
retrain the machine learning classifier based, at least in part, on the indication received by way of the user interface. (Shumpert, Fig. 4 discloses updating the shared model at 408 after receiving expert classification at S418 and Para. [0152] discloses “The shared model is retrieved from the model store 508 and retrained”)

	As per claim 21, the combination of Shumpert, O’Hara, and Eberhardt as shown above teaches the computing system of claim 16, Shumpert further teaches:
	wherein the computer-executable instructions further cause the processor to assign a remedial action to each of the plurality of clusters having an [[assigned incident probability inferences that satisfies a threshold]] (Shumpert, Para. [0034] discloses “in response to a classification of the respective instance being a normal instance type, use the data in the respective instance to train the retrieved model; in response to a classification of the respective instance being an anomalous instance type that is not new, determine from the knowledgebase an action to be taken and take the determined action…” and Para. [0148] is an example action record)
	Eberhardt further teaches:
assigned incident probability inferences that satisfies a threshold (Eberhardt, Abstract discloses “According to one embodiment, in response to a set of data for anomaly detection, a Bayesian belief network (BBN) model is applied to the data set…” and Col. 1 Line 58 discloses “typically BBNs are used to predict the probability of known events. However, in some situations, there is a need to predict certain unknown events that are dissimilar to certain known events. For example, there is a need to detect new virus or malware that has not been detected before…” and Fig. 1 discloses training module 105 used to train BBNs and Col. 4, Line 21 discloses “The BBN models 106 are then utilized by detection module 107 to provide an estimate of probability that the set of data is similar or dissimilar to a known event.” And Col. 10, Line 30 discloses “In one embodiment, an output of BBN model 604 is compared to a predetermined threshold associated with BBN model 604. If the output of BBN model 604 is greater than the predetermined threshold determined at block 605, it means that data set 601 is more likely considered as part of malware.” And Col. 9, Line 47 discloses “Thresholds can be set for each output (classification, malware similarity, and benign similarity) to enhance these discriminatory capabilities.”)
Same motivation to combine Shumpert and Eberhardt as claim 11

As per claim 22, the combination of Shumpert, O’Hara, and Eberhardt as shown above teaches the computing system of claim 21, Shumpert further teaches:
wherein the remedial action comprises at least one of a device restoration to a recent healthy state, a device reboot, or a device reconfiguration (Shumpert, Para. [0148] discloses an Example Action Record where a RemediationType is “e.g. repair, replace, reburbish etc.”)

Claims 13 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of O’Hara, further in view of Eberhardt, and further in view of Dennison.
As per claim 13, the combination of Shumpert, O’Hara, and Eberhardt as shown above teaches the computer-implemented method of claim 12, O’Hara further teaches:
[[further comprising generating a completeness score for]] the selected cluster, and determining whether the selected cluster is complete or incomplete is based, at least in part, on [[the completeness score]] (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Clusters get selected and are determined to be complete or incomplete))
The combination of Shumpert, O’Hara, and Eberhardt fails to explicitly teach:
further comprising generating a completeness score for [[the selected cluster, and determining whether the selected cluster is complete or incomplete is based, at least in part, on]] the completeness score
However, Dennison teaches:
further comprising generating a completeness score for [[the selected cluster, and determining whether the selected cluster is complete or incomplete is based, at least in part, on]] the completeness score (Dennison, Col. 2, Line 34 discloses “The computer processors can be configured to execute the one or more software modules in order to cause the computer system to score the generated data item cluster”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shumpert as modified to use a score for a cluster as disclosed by Dennison. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of determining whether selected clusters are complete or incomplete. Using a score for the determination allows for an objective determination rather than a subjective one.

As per claim 18, the combination of Shumpert, O’Hara, and Eberhardt as shown above teaches the computing system of claim 16, O’Hara further teaches:
[[wherein the computer-executable instructions further cause the processor to generate a completeness score for]] each of the plurality of clusters and classify each of the plurality of clusters as complete or incomplete is based, at least in part, on [[the completeness score]] (O’Hara, Para. [0038] discloses “Traditional cluster splitting/merging algorithms may be applied in the re-calibrating process beginning at 610, where the quality of the clusters, such as that resulting from process detailed in FIG. 5 (labeled “B”), is computed. If it is determined that the quality of clustering can be improved by splitting or merging, a flag is set to true at 620 indicating that the clusters are not compact, and the change is accepted at 630 (e.g., to split or merge clusters). In one example, clusters that are not compact may be split into two clusters. In another example, if the number of records in a cluster is less than a certain percentage (“x %”) of the total number of scheduled delivery items, that cluster may be merged with the cluster nearest to it.” (Clusters get selected and are determined to be complete or incomplete))
The combination of Shumpert, O’Hara, and Eberhardt fails to explicitly teach:
wherein the computer-executable instructions further cause the processor to generate a completeness score for [[each of the plurality of clusters and classify each of the plurality of clusters as complete or incomplete is based, at least in part, on]] the completeness score
However, Dennison teaches:
wherein the computer-executable instructions further cause the processor to generate a completeness score for [[each of the plurality of clusters and classify each of the plurality of clusters as complete or incomplete is based, at least in part, on]] the completeness score (Dennison, Col. 2, Line 34 discloses “The computer processors can be configured to execute the one or more software modules in order to cause the computer system to score the generated data item cluster”)
Same motivation to combine Shumpert, O’Hara, Eberhardt, and Dennison as claim 13

Claims 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shumpert, in view of O’Hara, further in view of Eberhardt, further in view of Dennison, and further in view of Bala.
As per claim 14, the combination of Shumpert, O’Hara, Eberhardt, and Dennison as shown above teaches the computer-implemented method of claim 13, Dennison further teaches:
wherein generating the completeness score for the selected cluster comprises determining [[a distance between instances of training data in the selected cluster are approximately similar in length to a median non-zero distance between the instances of the training data]] (Dennison, Col. 2, Line 34 discloses “The computer processors can be configured to execute the one or more software modules in order to cause the computer system to score the generated data item cluster”)
The combination of Shumpert, O’Hara, Eberhardt, and Dennison fails to explicitly teach:
[[wherein generating the completeness score for the selected cluster comprises determining]] a distance between instances of training data in the selected cluster are approximately similar in length to a median non-zero distance between the instances of the training data
	However, Bala teaches:
[[wherein generating the completeness score for the selected cluster comprises determining]] a distance between instances of training data in the selected cluster are approximately similar in length to a median non-zero distance between the instances of the training data (Bala, Para. [0049] discloses “Once the centroid is determined, the Euclidean distance between each positive data point and the centroid is calculated and the values are used to estimate the average Dintra, the intra cluster distance…” (Using the intra cluster distance, the completeness score, as disclosed by Dennison above, may be computed using the intra cluster distance))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Shumpert as modified to determine whether data within a cluster are similar in distance to a center as disclosed by Bala. The combination would have been obvious because a person of ordinary skill in the art would be motivated to further improve the accuracy of determining whether selected clusters are complete or incomplete based on distances within the cluster. This allows for a more objective determination as to whether a selected cluster is complete or incomplete.

As per claim 19, the combination of Shumpert, O’Hara, Eberhardt, and Dennison as shown above teaches the computing system of claim 18, Dennison further teaches:
wherein the completeness score for the cluster is generated based, at least in part, upon a determination as to whether [[a distance between instances of training data in the cluster are approximately similar in length to a median non-zero distance between the instances of the training data]] (Dennison, Col. 2, Line 34 discloses “The computer processors can be configured to execute the one or more software modules in order to cause the computer system to score the generated data item cluster”)
The combination of Shumpert, O’Hara, Eberhardt, and Dennison fails to explicitly teach:
[[wherein the completeness score for the cluster is generated based, at least in part, upon a determination as to whether]] a distance between instances of training data in the cluster are approximately similar in length to a median non-zero distance between the instances of the training data
	However, Bala teaches:
[[wherein the completeness score for the cluster is generated based, at least in part, upon a determination as to whether]] a distance between instances of training data in the cluster are approximately similar in length to a median non-zero distance between the instances of the training data (Bala, Para. [0049] discloses “Once the centroid is determined, the Euclidean distance between each positive data point and the centroid is calculated and the values are used to estimate the average Dintra, the intra cluster distance…” (Using the intra cluster distance, the completeness score, as disclosed by Dennison above, may be computed using the intra cluster distance))
Same motivation to combine Shumpert, O’Hara, Eberhardt, Dennison, and Bala as claim 14
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hummel, et al. (U.S. Pub. No. US 20160217201 A1) discloses performing clustering algorithms to document clusters to produce cluster labels
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached on M-TR from 7:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at telephone number 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

/H.R.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123