DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 1 is objected to because of the following informalities:
Claim 1 recites “feature data…the feature data comprising a plurality of feature labels associated with characteristics the plurality of test instances….” However, it should read “feature data…the feature data comprising a plurality of feature labels associated with characteristics of the plurality of test instances….”
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

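Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites a method, comprising: receiving, at a client device, a performance report including performance information for a machine learning system, wherein the performance information comprises: a plurality of outputs of the machine learning system for a plurality of test instances; accuracy data of the plurality of outputs, wherein the accuracy data includes identified errors between outputs from the plurality of outputs and associated ground truth data corresponding to the plurality of test instances; feature data associated with the plurality of test instances, the feature data comprising a plurality of feature labels associated with characteristics the plurality of test instances; and providing, via a graphical user interface, one or more performance views based on the performance information, the one or more performance views including a plurality of graphical elements associated with a plurality of feature clusters, wherein the plurality of feature clusters include subsets of test instances from the plurality of test instances based on associated feature labels, and wherein the one or performance views includes an indication of the accuracy data corresponding to at least one feature cluster from the plurality of feature clusters.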
The limitation of a plurality of outputs of the machine learning system for a plurality of test instances as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. This limitation explains the data wherein the data is observation type data that is used by a mental process. 
The limitation of accuracy data of the plurality of outputs, wherein the accuracy data includes identified errors between outputs from the plurality of outputs and associated ground truth data corresponding to the plurality of test instances as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. This limitation explains the data wherein the data is observation type data that is used by a mental process.

The limitation of providing, via a graphical user interface, one or more performance views based on the performance information, the one or more performance views including a plurality of graphical elements associated with a plurality of feature clusters, wherein the plurality of feature clusters include subsets of test instances from the plurality of test instances based on associated feature labels, and wherein the one or performance views includes an indication of the accuracy data corresponding to at least one feature cluster from the plurality of feature clusters as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “via a graphical user interface”, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, but for the “via a graphical user interface”, “providing” in the context of the claim encompasses a mental process which involves an observation of data/opinion. 


If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
The judicial exception is not integrated into a practical application. In particular, the claim recites additional elements – the machine learning system and a graphical user interface. The machine learning system and the graphical user interface are recited at a high level of generality (i.e., as a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computing model. Further, the claim recites the act of receiving (receiving, at a client device, a performance report including…) and the act of displaying data (via a graphical user interface), which are considered to be insignificant extra-solution activity. The receiving step is recited at a high level of generality and amounts to mere data gathering, which is a form of insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
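The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of the machine learning system and a graphical user interface amount to no more than mere instructions to apply the exception using a generic computing component. Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Mere instructions to apply an exception using generic computing components cannot provide an inventive concept. Further, the receiving step is considered to be an extra-solution activity in Step 2A Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that “Receiving or transmitting data over a network, e.g., using the Internet to gather data” is a well-understood, routine, conventional function when it is claimed in a merely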
generic manner (as it is in the present claim). Thereby, a conclusion that the claimed receiving step is well understood, routine, conventional activity is supported under Berkheimer. Further, the act of displaying data on a display is considered to be an extra-solution activity in Step 2A Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well understood, routine, conventional activity in the field. Daniel et al. (U.S. Pub. No. US 20100080389 A1) discloses in Para. [0017] that “…displaying either a text message and/or a visual display on the display element. The display element may be a liquid crystal display ("LCD") or light emitting diode ("LED") type, plasma, touch screen or other types of displays that are well known and used in the arts.” Thereby, a conclusion that the act of displaying data via a graphical user interface is well understood, routine, conventional activity is supported under Berkheimer. Further, MPEP 2106.04(a)(2) states “In contrast, claims do recite a mental process when they contain limitations that can practically be performed in the human mind, including 
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 2 recites the method of claim 1, further comprising:  - 56 -FILED ELECTRONICALLYDocket No. 406506-US-NP detecting a selection of a graphical element from the plurality of graphical elements associated with a combination of one or more feature labels; and providing a visualization of the accuracy data associated with a subset of outputs from the plurality of outputs corresponding to a subset of test instances corresponding to the combination of one or more feature labels.
	The limitation of providing a visualization of the accuracy data associated with a subset of outputs from the plurality of outputs corresponding to a subset of test instances corresponding to the combination of one or more feature labels, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “providing” in the context of the claim encompasses a user drawing a visualization of a plurality of data and presenting it to a second user.
The judicial exception is not integrated into a practical application. In particular, the claim recites one additional element – detecting a selection of a graphical element…. Detecting a selection of a graphical element is recited at a high level of generality (i.e., as a generic model performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computing model. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of detecting a selection of a graphical element amounts to no more than mere instructions to apply the exception using generic components. Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). The claim limitation merely suggests a field of use or technological environment in which to apply the exception such that it amounts to no more than mere linking.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 3 recites the method of claim 1, wherein the plurality of graphical elements comprises a list of selectable features corresponding to the plurality of feature clusters, wherein the selectable features are ranked within the list based on measures of correlation between the plurality of feature clusters and identified errors from the accuracy data.
The judicial exception is not integrated into a practical application. In particular, the claim recites one additional element – the plurality of graphical elements comprises a list of selectable features…. The plurality of graphical elements comprising a list of selectable features is recited at a high level of generality (i.e., as a generic model performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computing model. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of the plurality of graphical elements comprising a list of selectable features amounts to no more than mere instructions to apply the exception using generic components. Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). The claim limitation merely suggests a field of use or technological environment in which to apply the exception such that it amounts to no more than mere linking.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 4 recites the method of claim 1, wherein providing the one or more performance views comprises providing a global performance view for the plurality of feature clusters, the global performance view including a visual representation of the accuracy data with respect to multiple feature clusters of the plurality of feature clusters, and wherein the plurality of graphical elements includes selectable portions of the global performance view associated with the multiple feature clusters.
	The limitation of wherein providing the one or more performance views comprises providing a global performance view for the plurality of feature clusters, the global performance view including a visual representation of the accuracy data with respect to multiple feature clusters of the plurality of feature clusters, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind.
The judicial exception is not integrated into a practical application. In particular, the claim recites one additional element – the plurality of graphical elements includes…. The recitation that the plurality of graphical elements includes selectable portions of the global performance view is recited at a high level of generality (i.e., as a generic model performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computing model. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of the plurality of graphical elements including selectable portions of the global performance view amounts to no more than mere instructions to apply the exception using generic components. Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). The claim limitation merely suggests a field of use or technological environment in which to apply the exception such that it amounts to no more than mere linking.
This claim is not patent eligible under 35 U.S.C. 101.

	The limitation of wherein providing the one or more performance views comprises providing a cluster performance view for the first feature cluster, the cluster performance view comprising a visualization of the accuracy data for a first subset of outputs from the plurality of outputs associated with the first feature cluster, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “providing” in the context of the claim encompasses a user visually presenting data to a second user.
The judicial exception is not integrated into a practical application. In particular, the claim recites one additional element – detecting a selection of a graphical element…. Detecting a selection of a graphical element is recited at a high level of generality (i.e., as a generic model performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computing model. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of detecting a selection of a graphical element amounts to no more than mere instructions to apply the exception using generic components. Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). The claim limitation merely suggests a field of use or technological environment in which to apply the exception such that it amounts to no more than mere linking.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 6 recites the method of claim 5, wherein the cluster performance view comprises a multi-branch visualization of the accuracy data for the plurality of outputs, wherein the multi-branch visualization comprises: a first branch including an indication of the accuracy data associated with the first subset of outputs from the plurality of outputs associated with the first feature cluster; and a second branch including an indication of the accuracy data associated with a second subset of outputs from the plurality of outputs not associated with the first feature cluster.
	The limitation of a first branch including an indication of the accuracy data associated with the first subset of outputs from the plurality of outputs associated with the first feature cluster, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, the limitation encompasses a user presenting data in the form of a visualized tree containing multiple branches and nodes wherein the nodes are associated with data, thus allowing for an observation of data.
	The limitation of a second branch including an indication of the accuracy data associated with a second subset of outputs from the plurality of outputs not associated with the first feature cluster as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, the limitation encompasses a user presenting data in the form of a visualized tree containing multiple branches and nodes wherein the nodes are associated with data thus allowing for an observation of data.
The judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 7 recites the method of claim 6, further comprising: detecting a selection of the first branch; detecting a selection of an additional graphical element corresponding to a second feature cluster from the plurality of feature clusters; and providing a third branch including an indication of the accuracy data associated with a third subset of outputs associated with a combination of feature labels shared by the first cluster and the second feature cluster.
	The limitation of providing a third branch including an indication of the accuracy data associated with a third subset of outputs associated with a combination of feature labels shared by the first cluster and the second feature cluster, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “providing” in the context of the claim encompasses a user visually presenting a tree containing multiple branches wherein the nodes are associated with data, allowing for an observation of such data.
The judicial exception is not integrated into a practical application. In particular, the claim recites additional elements – detecting a selection of the first branch and detecting a selection of an additional graphical element…. Detecting a selection of the first branch and detecting a selection of an additional graphical element are recited at a high level of generality (i.e., as a generic model performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computing model. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional elements of detecting a selection of the first branch and detecting a selection of an additional graphical element amount to no more than mere instructions to apply the exception using generic components.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 8 recites the method of claim 7, wherein the multi-branch visualization of the accuracy data for the plurality of outputs comprises: a root node representative of the plurality of outputs for the plurality of test instances; a first level including a first node representative of the first subset of outputs and a second node representative of the second subset of outputs; and a second level including a third node representative of the third subset of outputs. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. This limitation merely describes how the data is visualized in a tree form containing nodes that are associated with data.
The judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
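For illustration only, the tree arrangement recited in claims 6-8 (a root node, a first level with two nodes, and a second level with a third node) can be sketched as a small data structure. The following Python sketch is not drawn from the application or from Nushi; the function name build_branch_tree, the dictionary keys, and the sample test instances are hypothetical and are assumed solely to show how the three subsets of outputs relate to one another.

```python
def build_branch_tree(test_instances, first_label, second_label):
    """Arrange outputs into a root, two first-level nodes, and one second-level node.

    Hypothetical illustration: each test instance is a dict with the assumed
    keys "output", "ground_truth", and "feature_labels".
    """
    def node(name, subset):
        # An identified error is an output that differs from its ground truth.
        errors = sum(1 for i in subset if i["output"] != i["ground_truth"])
        return {"name": name, "count": len(subset), "errors": errors, "children": []}

    # First subset: instances in the first feature cluster.
    in_first = [i for i in test_instances if first_label in i["feature_labels"]]
    # Second subset: instances not in the first feature cluster.
    not_first = [i for i in test_instances if first_label not in i["feature_labels"]]
    # Third subset: instances sharing both feature labels.
    in_both = [i for i in in_first if second_label in i["feature_labels"]]

    root = node("all", test_instances)
    first = node(first_label, in_first)
    second = node("not " + first_label, not_first)
    third = node(first_label + " & " + second_label, in_both)
    first["children"].append(third)   # second level hangs off the first branch
    root["children"] = [first, second]
    return root


# Assumed sample data, for illustration only.
instances = [
    {"output": "cat", "ground_truth": "cat", "feature_labels": ["indoor"]},
    {"output": "dog", "ground_truth": "cat", "feature_labels": ["indoor", "blurry"]},
    {"output": "dog", "ground_truth": "dog", "feature_labels": ["outdoor"]},
    {"output": "cat", "ground_truth": "dog", "feature_labels": ["blurry"]},
]
tree = build_branch_tree(instances, "indoor", "blurry")
```

In this sketch each node carries only a count of outputs and a count of identified errors, which is the kind of information a person could tabulate and draw by hand.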
	Claim 9 recites the method of claim 1, wherein providing the one or more performance views further comprises providing an instance view associated with a selected feature cluster, wherein the instance view comprises a display of a test instance, a display of an output from the machine learning system for the test instance, and a display of at least a portion of the ground truth data for the test instance. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, but for the “machine learning system” language, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, but for the “machine learning system” language, “providing” in the context of the claim encompasses a user visually presenting data to a second user for observation.
The judicial exception is not integrated into a practical application. In particular, the claim recites an additional element – the machine learning system. The machine learning system is recited at a high level of generality (i.e., as a generic model performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computing model. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of the machine learning system amounts to no more than mere instructions to apply the exception using a generic computing component. Mere instructions to apply an exception using generic computing components cannot provide an inventive concept. Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) and generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). The claim limitation merely suggests a field of use or technological environment in which to apply the exception such that it amounts to no more than mere linking.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 10 recites the method of claim 1, further comprising: providing, via the graphical user interface of the client device, a selectable option to provide failure information to a training system, the failure information comprising an indication of one or more feature labels from the plurality of feature labels associated with a threshold rate of identified errors from the accuracy data; and providing the failure information to the training system including instructions for refining the machine learning system based on selectively identified training data associated with the one or more feature labels.
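The judicial exception is not integrated into a practical application. In particular, the claim recites additional elements – providing, via the graphical user interface of the client device, a selectable option…, and providing the failure information to the training system…. These elements are recited at a high level of generality (i.e., as a generic model performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computing model. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional elements of providing, via the graphical user interface of the client device, a selectable option…, and providing the failure information to the training system amount to no more than mere instructions to apply the exception using generic components. Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). The claim limitation merely suggests a field of use or technological environment in which to apply the exception such that it amounts to no more than mere linking.
This claim is not patent eligible under 35 U.S.C. 101.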
	Claim 14 recites the system of claim 13, wherein providing the one or more performance views further comprises providing an instance view associated with the first 
The judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 15 recites the system of claim 11, further comprising instructions being executable by the one or more processors to cause the server device to: receive an indication of one or more feature labels associated with a threshold rate of identified errors from the accuracy data; and cause a training system to refine the machine learning system based on a plurality of training instances associated with the one or more feature labels.
The judicial exception is not integrated into a practical application. In particular, the claim recites an additional element – cause a training system to refine the machine learning system…, which is recited at a high level of generality (i.e., as a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computing model. Further, the claim recites the act of receiving (receive an indication of one or more feature labels…), which amounts to mere data gathering, a form of insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of cause a training system to refine the machine learning system amounts to no more than mere instructions to apply the exception using a generic computing component. Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) and generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). Mere instructions to apply an exception using generic computing components cannot provide an inventive concept. Further, the receiving step is considered to be an extra-solution activity in Step 2A Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that “Receiving or transmitting data over a network, e.g., using the Internet to gather data” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed receiving step is well understood, routine, conventional activity is supported under Berkheimer.
This claim is not patent eligible under 35 U.S.C. 101.
Claims 11-13 are rejected on the same grounds as claims 1, 2, and 5, respectively.
Claims 16-20 are rejected on the same grounds as claims 1, 2, 5, 9, and 10, respectively.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-9 and 16-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure” to Nushi et al. (hereinafter, “Nushi”).
As per claim 1, Nushi teaches a method, comprising: 
receiving, at a client device, a performance report including performance information for a machine learning system, wherein the performance information comprises: (Nushi, Page 2, 1st Col., 2nd Para. discloses “Pandora provides multiple views to highlight different relationships between input data, system execution and system errors. Along one dimension, the views choose the type of data being used for error analysis: signals drawn from content being analyzed or signals collected from component execution” (Error analysis of data and signals collected from machine learning execution of a system))
a plurality of outputs of the machine learning system for a plurality of test instances; (Nushi, Page 2, 1st Col., 2nd Para. discloses “Content-based views use detailed ground truth or automatically detected content (i.e., input data) features to learn common situations associated with poor performance. For instance, a face recognizer could report that the system may make more mistakes in recognizing faces of old men wearing eyeglasses.” And Page 4, 1st Col., 3rd Para. discloses “The end-to-end process is focused on a sample evaluation dataset chosen by the system designer” (Evaluation dataset being the test instances))
accuracy data of the plurality of outputs, wherein the accuracy data includes identified errors between outputs from the plurality of outputs and associated ground truth data corresponding to the plurality of test instances; (Nushi, Page 2, 1st Col., 2nd Para. discloses “Content-based views use detailed ground truth or automatically detected content (i.e., input data) features to learn common situations associated with poor performance. For instance, a face recognizer could report that the system may make more mistakes in recognizing faces of old men wearing eyeglasses” and Page 6, Tables 2 and 4 disclose displaying accuracy data (Determination of the degree of accuracy of the outputs of the machine learning model))
feature data associated with the plurality of test instances, the feature data comprising a plurality of feature labels associated with characteristics the plurality of test instances, (Nushi, Page 6, Tables 2 and 4 disclose objects which contain feature labels associated with the test instances)
and providing, via a graphical user interface, one or more performance views based on the performance information, the one or more performance views including a plurality of graphical elements associated with a plurality of feature clusters, (Nushi, Page 8, Figures 3-5 disclose displaying performance views associated with clusters obtained via clustering, and Page 8, 1st Col., 2nd Para. discloses “Figure 3 visualizes the decision tree for a content view from crowd data for the cluster kitchen.”)
wherein the plurality of feature clusters include subsets of test instances from the plurality of test instances based on associated feature labels, (Nushi, Page 4, 1st Col., 3rd Para. discloses “Pandora can create generic as well as clustered reports for a given performance view. Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters.” (Distinct semantic clusters being subsets of test instances)) 
and wherein the one or performance views includes an indication of the accuracy data corresponding to at least one feature cluster from the plurality of feature clusters. (Nushi, Page 4, 1st Col., 3rd Para. discloses “Pandora can create generic as well as clustered reports for a given performance view. Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters.” (Clustered reports indicate performance data corresponding to cluster which are a subset of test instances))
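For illustration only, the mapping above (test instances grouped into feature clusters, with accuracy data reported per cluster) can be pictured with a minimal sketch. Nothing in it is taken from Nushi or from the application: the record keys `output`, `truth`, and `labels`, and the function name `cluster_error_rates`, are assumptions made for this example.

```python
from collections import defaultdict

def cluster_error_rates(instances):
    """Return {feature_label: (error_count, total, error_rate)}.

    Each instance is a dict with hypothetical keys:
      'output' - the system's output for the test instance
      'truth'  - the ground truth for the test instance
      'labels' - feature labels describing the test instance
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [errors, total]
    for inst in instances:
        is_error = inst["output"] != inst["truth"]
        for label in inst["labels"]:
            counts[label][1] += 1
            if is_error:
                counts[label][0] += 1
    return {label: (e, n, e / n) for label, (e, n) in counts.items()}

# Hypothetical evaluation data, invented for the example.
instances = [
    {"output": "cat", "truth": "cat", "labels": ["indoor", "kitchen"]},
    {"output": "dog", "truth": "cat", "labels": ["kitchen"]},
    {"output": "dog", "truth": "dog", "labels": ["outdoor"]},
    {"output": "cat", "truth": "dog", "labels": ["kitchen", "outdoor"]},
]
report = cluster_error_rates(instances)
```

Under these assumptions, each entry of `report` corresponds to one feature cluster (the subset of test instances sharing a label) together with an indication of its accuracy data.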
As per claim 2, Nushi as shown above teaches the method of claim 1, Nushi further teaches further comprising:  - 56 -FILED ELECTRONICALLYDocket No. 406506-US-NP 
detecting a selection of a graphical element from the plurality of graphical elements associated with a combination of one or more feature labels; (Nushi, Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Selection of clusters containing feature labels))
and providing a visualization of the accuracy data associated with a subset of outputs from the plurality of outputs corresponding to a subset of test instances corresponding to the combination of one or more feature labels. (Nushi, Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 5, 1st Col., 2nd Para. discloses “For more extensive explorations, the system designer can intentionally leave a feature out of the tree to investigate other failure conditions in depth or generate feature rankings, as described next” (Visualization disclosed corresponding to subset of the combination on the leaf nodes))

As per claim 3, Nushi as shown above teaches the method of claim 1, Nushi further teaches:
wherein the plurality of graphical elements comprises a list of selectable features corresponding to the plurality of feature clusters, (Nushi, Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Selection of clusters containing feature labels))
wherein the selectable features are ranked within the list based on measures of correlation between the plurality of feature clusters and identified errors from the accuracy data. (Nushi, Page 5, 1st Col., 2nd Para. discloses “For more extensive explorations, the system designer can intentionally leave a feature out of the tree to investigate other failure conditions in depth or generate feature rankings, as described next “ and “We compute the mutual information between the feature and system performance (i.e. human satisfaction) as the ranking criterion. The same criterion is used for splitting nodes in the decision trees. Mutual information not only captures the correlation between two variables but also other statistical dependencies that can be useful for failure prediction.” (Capturing relationship among data))
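For illustration only, the ranking criterion quoted above (mutual information between a feature and system performance) can be sketched as follows. This is not Nushi's implementation: the function names, the binary feature columns, and the error indicators are hypothetical, and the sketch assumes discrete-valued features.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

def rank_features(feature_columns, errors):
    """Sort feature names by mutual information with the error indicator, highest first."""
    scores = {name: mutual_information(col, errors)
              for name, col in feature_columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: 1 marks an identified error for a test instance.
errors = [1, 1, 0, 0]
feature_columns = {
    "outdoor": [1, 0, 1, 0],      # independent of the errors
    "eyeglasses": [1, 1, 0, 0],   # perfectly aligned with the errors
}
ranking = rank_features(feature_columns, errors)
```

In this toy data the `eyeglasses` feature shares one full bit of information with the error indicator and `outdoor` shares none, so `eyeglasses` is ranked first.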

	As per claim 4, Nushi as shown above teaches the method of claim 1, Nushi further teaches:
wherein providing the one or more performance views comprises providing a global performance view for the plurality of feature clusters, the global performance view including a visual representation of the accuracy data with respect to multiple feature clusters of the plurality of feature clusters, and wherein the plurality of graphical elements includes selectable portions of the global performance view associated with the multiple feature clusters (Nushi, Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters. As we show in the experimental evaluation, although there is value in extracting generic failure information, clustered reports are more predictive for system performance and they discover cluster-specific errors, which cannot be identified via generic views.” And Page 4, 2nd Col. 1st Para. discloses “View generation is a two-step process: 1) clustering the evaluation dataset based on content signals, and 2) detailed reporting globally and per cluster. The process generalizes to all views created in Pandora”)

As per claim 5, Nushi as shown above teaches the method of claim 1, Nushi further teaches further comprising:
detecting a selection of a graphical element corresponding to a first feature cluster from the plurality of feature clusters; (Nushi, Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Selection of clusters containing feature labels))
and wherein providing the one or more performance views comprises providing a cluster performance view for the first feature cluster, the cluster performance view comprising a visualization of the accuracy data for a first subset of outputs from the plurality of outputs associated with the first feature cluster. (Nushi, Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters. As we show in the experimental evaluation, although there is value in extracting generic failure information, clustered reports are more predictive for system performance and they discover cluster-specific errors, which cannot be identified via generic views.” And Page 4, 2nd Col. 1st Para. discloses “View generation is a two-step process: 1) clustering the evaluation dataset based on content signals, and 2) detailed reporting globally and per cluster. The process generalizes to all views created in Pandora”)

As per claim 6, Nushi as shown above teaches the method of claim 5, Nushi further teaches wherein the cluster performance view comprises a multi-branch visualization of the accuracy data for the plurality of outputs, wherein the multi-branch visualization comprises:
a first branch including an indication of the accuracy data associated with the first subset of outputs from the plurality of outputs associated with the first feature cluster; (Nushi, Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters. As we show in the experimental evaluation, although there is value in extracting generic failure information, clustered reports are more predictive for system performance and they discover cluster-specific errors, which cannot be identified via generic views.” And Page 4, 2nd Col. 1st Para. discloses “View generation is a two-step process: 1) clustering the evaluation dataset based on content signals, and 2) detailed reporting globally and per cluster. The process generalizes to all views created in Pandora” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters…” (Visualized tree contains separate branches wherein each leaf node corresponds to distinct feature clusters))
and a second branch including an indication of the accuracy data associated with a second subset of outputs from the plurality of outputs not associated with the first feature cluster. (Nushi, Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters. As we show in the experimental evaluation, although there is value in extracting generic failure information, clustered reports are more predictive for system performance and they discover cluster-specific errors, which cannot be identified via generic views.” And Page 4, 2nd Col. 1st Para. discloses “View generation is a two-step process: 1) clustering the evaluation dataset based on content signals, and 2) detailed reporting globally and per cluster. The process generalizes to all views created in Pandora” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters…” (Visualized tree contains separate branches wherein each leaf node corresponds to distinct feature clusters))

As per claim 7, Nushi as shown above teaches the method of claim 6, Nushi further teaches further comprising:
detecting a selection of the first branch; (Nushi, Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters…” and Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” (Selection and zooming into branches of the visualized tree))
detecting a selection of an additional graphical element corresponding to a second feature cluster from the plurality of feature clusters; (Nushi, Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters…” and Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Selection and zooming into branches of the visualized tree wherein one may select a second feature cluster))
and providing a third branch including an indication of the accuracy data associated with a third subset of outputs associated with a combination of feature labels shared by the first cluster and the second feature cluster. (Nushi, Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters…” and Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Visualized tree displaying branches and accuracies wherein a third branch may subsequently further display accuracies associated with the shared feature clusters))

As per claim 8,  Nushi as shown above teaches the method of claim 7, Nushi further teaches wherein the multi-branch visualization of the accuracy data for the plurality of output comprises:
a root node representative of the plurality of outputs for the plurality of test instances; (Nushi, Page 8, Figures 3-5 disclose a visualized tree containing a root node pertaining to outputs of test instances)
a first level including a first node representative of the first subset of outputs and a second node representative of the second subset of outputs; (Nushi, Page 8, Figures 3-5 disclose a visualized tree containing two leaf nodes at the first level pertaining to outputs of test instances)
and a second level including a third node representative of the third subset of outputs. (Nushi, Page 8, Figure 5 discloses a visualized tree containing a second level containing nodes pertaining to outputs of test instances)
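For illustration only, the tree structure recited in claims 6 through 8 (a root node, a first level with two nodes, and a second level with a third node) can be sketched as a nested data structure. The sketch is not taken from Nushi or from the application. The record keys, the function name `build_branches`, and the reading of the third subset as the instances carrying both selected labels are assumptions.

```python
def build_branches(instances, first_label, second_label):
    """Build a root / first-level / second-level summary tree of error counts."""
    def summarize(name, subset):
        errors = sum(1 for i in subset if i["output"] != i["truth"])
        return {"name": name, "count": len(subset), "errors": errors, "children": []}

    with_first = [i for i in instances if first_label in i["labels"]]
    without_first = [i for i in instances if first_label not in i["labels"]]
    # Assumed reading: the third subset carries both selected labels.
    both = [i for i in with_first if second_label in i["labels"]]

    root = summarize("all", instances)                       # root node
    first = summarize(first_label, with_first)               # first branch
    second = summarize("not " + first_label, without_first)  # second branch
    third = summarize(first_label + " & " + second_label, both)  # third branch
    first["children"].append(third)
    root["children"] = [first, second]
    return root

# Hypothetical evaluation data, invented for the example.
instances = [
    {"output": "cat", "truth": "cat", "labels": ["indoor", "kitchen"]},
    {"output": "dog", "truth": "cat", "labels": ["kitchen"]},
    {"output": "dog", "truth": "dog", "labels": ["outdoor"]},
    {"output": "cat", "truth": "dog", "labels": ["kitchen", "outdoor"]},
]
tree = build_branches(instances, "kitchen", "outdoor")
```

Each node records how many outputs it covers and how many of them are identified errors, which is the kind of per-branch accuracy indication the claims recite.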

As per claim 9, Nushi as shown above teaches the method of claim 1, Nushi further teaches:
(Nushi, Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 5, 1st Col., 2nd Para. discloses “For more extensive explorations, the system designer can intentionally leave a feature out of the tree to investigate other failure conditions in depth or generate feature rankings, as described next” (Visualized decision tree may be zoomed into to further analyze it wherein the tree displays information associated with leaf nodes containing data regarding clusters))

As per claim 16, Nushi teaches a non-transitory computer readable storage medium storing instructions thereon that, when executed by one or more processors, causes a client device to:
receive, at a client device, a performance report including performance information for a machine learning system, wherein the performance information comprises: (Nushi, Page 2, 1st Col., 2nd Para. discloses “Pandora provides multiple views to highlight different relationships between input data, system execution and system errors. Along one dimension, the views choose the type of data being used for error analysis: signals drawn from content being analyzed or signals collected from component execution” (Error analysis of data and signals collected from machine learning execution of a system))
a plurality of outputs of the machine learning system for a plurality of test instances; (Nushi, Page 2, 1st Col., 2nd Para. discloses “Content-based views use detailed ground truth or automatically detected content (i.e., input data) features to learn common situations associated with poor performance. For instance, a face recognizer could report that the system may make more mistakes in recognizing faces of old men wearing eyeglasses.” And Page 4, 1st Col., 3rd Para. discloses “The end-to-end process is focused on a sample evaluation dataset chosen by the system designer” (Evaluation dataset being the test instances))
accuracy data of the plurality of outputs, wherein the accuracy data includes identified errors between outputs from the plurality of outputs and associated ground truth data corresponding to the plurality of test instances; (Nushi, Page 2, 1st Col., 2nd Para. discloses “Content-based views use detailed ground truth or automatically detected content (i.e., input data) features to learn common situations associated with poor performance. For instance, a face recognizer could report that the system may make more mistakes in recognizing faces of old men wearing eyeglasses” and Page 6, Tables 2 and 4 disclose displaying accuracy data (Determination of the degree of accuracy of the outputs of the machine learning model))
feature data associated with the plurality of test instances, the feature data comprising a plurality of feature labels associated with characteristics the plurality of test instances, evidential information provided by the machine learning system, and contextual information from the plurality of test instances; (Nushi, Page 6, Tables 2 and 4 disclose objects which contain feature labels associated with the test instances)
and providing, via a graphical user interface, one or more performance views based on the performance information, the one or more performance views including a plurality of graphical elements associated with a plurality of feature clusters, (Nushi, Page 8, Figures 3-5 disclose displaying performance views associated with clusters obtained via clustering and Page 8, 1st Col., 2nd Para. discloses “Figure 3 visualizes the decision tree for a content view from crowd data for the cluster kitchen.”)
wherein the plurality of feature clusters include subsets of test instances from the plurality of test instances based on associated feature labels, (Nushi, Page 4, 1st Col., 3rd Para. discloses “Pandora can create generic as well as clustered reports for a given performance view. Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters.” (Distinct semantic clusters being subsets of test instances)) 
and wherein the one or performance views includes an indication of the accuracy data corresponding to at least one feature cluster from the plurality of feature clusters. (Nushi, Page 4, 1st Col., 3rd Para. discloses “Pandora can create generic as well as clustered reports for a given performance view. Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters.” (Clustered reports indicate performance data corresponding to cluster which are a subset of test instances))

As per claim 17, Nushi as shown above teaches the non-transitory computer readable storage medium of claim 16, Nushi further teaches further comprising instructions that, when executed by the one or more processors, causes the client device to:
(Nushi, Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Selection of clusters containing feature labels))
and provide a visualization of the accuracy data associated with a subset of outputs from the plurality of outputs corresponding to a subset of test instances corresponding to the combination of one or more feature labels. (Nushi, Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 5, 1st Col., 2nd Para. discloses “For more extensive explorations, the system designer can intentionally leave a feature out of the tree to investigate other failure conditions in depth or generate feature rankings, as described next” (Visualization disclosed corresponding to subset of the combination on the leaf nodes))

As per claim 18, Nushi as shown above teaches the non-transitory computer readable storage medium of claim 16, Nushi further teaches further comprising instructions that, when executed by the one or more processors, causes the client device to:
detect a selection of a graphical element corresponding to a first feature cluster from the plurality of feature clusters; (Nushi, Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Selection of clusters containing feature labels))
 (Nushi, Page 4, 1st Col, 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters. As we show in the experimental evaluation, although there is value in extracting generic failure information, clustered reports are more predictive for system performance and they discover cluster-specific errors, which cannot be identified via generic views.” And Page 4, 2nd Col. 1st Para. discloses “View generation is a two-step process: 1) clustering the evaluation dataset based on content signals, and 2) detailed reporting globally and per cluster. The process generalizes to all views created in Pandora”)

As per claim 19, Nushi as shown above teaches the non-transitory computer readable storage medium of claim 16, Nushi further teaches:
wherein providing the one or more performance views further comprises providing an instance view associated with a selected feature cluster, wherein the instance view comprises a display of a test instance, a display of an output from the machine learning system for the test instance, and a display of at least a portion of the ground truth data for the test instance. (Nushi, Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 5, 1st Col., 2nd Para. discloses “For more extensive explorations, the system designer can intentionally leave a feature out of the tree to investigate other failure conditions in depth or generate feature rankings, as described next” (Visualized decision tree may be zoomed into to further analyze it wherein the tree displays information associated with leaf nodes containing data regarding clusters))
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Nushi.
As per claim 10, Nushi as shown above teaches the method of claim 1, Nushi further teaches further comprising: 
providing, via the graphical user interface of the client device, a selectable option to provide failure information to a training system, the failure information comprising an indication of one or more feature labels from the plurality of feature labels associated with a threshold rate of identified errors from the accuracy data; (Nushi, Page 6, Tables 2-4 disclose satisfaction and satisfaction accuracy thresholds, and Abstract discloses “Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and for specifying appropriate human oversight and engagement” and Page 1, 2nd Col, 2nd Para. discloses “Finally, system designers can use detailed error analyses to make informed decisions on next steps for system improvement.”)
and providing the failure information to the training system including instructions for refining the machine learning system based on selectively identified training data associated with the one or more feature labels. (Nushi, Page 6, Tables 2-4 disclose satisfaction and satisfaction accuracy thresholds, and Abstract discloses “Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and for specifying appropriate human oversight and engagement” and Page 1, 2nd Col, 2nd Para. discloses “Finally, system designers can use detailed error analyses to make informed decisions on next steps for system improvement.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the performance information visualization system as disclosed by Nushi to incorporate providing failure information to a training system. The combination would have been obvious because a person of ordinary skill in the art would be motivated to further improve accuracy of the machine learning system. Nushi already discloses a visualization system which provides performance information regarding a machine learning system, thus it would be a straightforward modification to further include instructions for retraining based off the performance information displayed. The performance information discloses criteria associated with the machine learning system which may then subsequently be used to further refine the model based off the modification.

As per claim 20, Nushi as shown above teaches the non-transitory computer readable storage medium of claim 16, Nushi further teaches further comprising instructions that, when executed by the one or more processors, causes the client device to:
providing, via the graphical user interface of the client device, a selectable option to provide failure information to a training system, the failure information comprising an indication of one or more feature labels from the plurality of feature labels associated with a threshold rate of identified errors from the accuracy data; (Nushi, Page 6, Tables 2-4 disclose satisfaction and satisfaction accuracy thresholds, and Abstract discloses “Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and for specifying appropriate human oversight and engagement” and Page 1, 2nd Col, 2nd Para. discloses “Finally, system designers can use detailed error analyses to make informed decisions on next steps for system improvement.”)
and providing the failure information to the training system including instructions for refining the machine learning system based on selectively identified training data associated with the one or more feature labels. (Nushi, Page 6, Tables 2-4 disclose satisfaction and satisfaction accuracy thresholds, and Abstract discloses “Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and for specifying appropriate human oversight and engagement” and Page 1, 2nd Col, 2nd Para. discloses “Finally, system designers can use detailed error analyses to make informed decisions on next steps for system improvement.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the performance information visualization system as disclosed by Nushi to incorporate providing failure information to a training system. The combination would have been obvious because a person of ordinary skill in the art would be motivated to further improve accuracy of the machine learning system. Nushi already discloses a visualization system which provides performance information regarding a machine learning system, thus it would be a straightforward modification to further include instructions for retraining based off the performance information displayed. The performance information discloses criteria associated with the machine learning system which may then subsequently be used to further refine the model based off the modification.

Claims 11-15 are rejected under 35 U.S.C. 103 as being unpatentable over Nushi in view of U.S. Patent No. US 10025950 B1 to Avasarala, et al. (hereinafter, “Avasarala”).
As per claim 11, Nushi teaches a system comprising:
generate a performance report including performance information for a machine learning system, wherein the performance information comprises: (Nushi, Page 2, 1st Col., 2nd Para. discloses “Pandora provides multiple views to highlight different relationships between input data, system execution and system errors. Along one dimension, the views choose the type of data being used for error analysis: signals drawn from content being analyzed or signals collected from component execution” (Error analysis of data and signals collected from machine learning execution of a system))
a plurality of outputs of the machine learning system for a plurality of test instances; (Nushi, Page 2, 1st Col., 2nd Para. discloses “Content-based views use detailed ground truth or automatically detected content (i.e., input data) features to learn common situations associated with poor performance. For instance, a face recognizer could report that the system may make more mistakes in recognizing faces of old men wearing eyeglasses.” And Page 4, 1st Col., 3rd Para. discloses “The end-to-end process is focused on a sample evaluation dataset chosen by the system designer” (Evaluation dataset being the test instances))
accuracy data of the plurality of outputs, wherein the accuracy data includes identified errors between outputs from the plurality of outputs and associated ground truth data corresponding to the plurality of test instances; (Nushi, Page 2, 1st Col., 2nd Para. discloses “Content-based views use detailed ground truth or automatically detected content (i.e., input data) features to learn common situations associated with poor performance. For instance, a face recognizer could report that the system may make more mistakes in recognizing faces of old men wearing eyeglasses” and Page 6, Tables 2 and 4 disclose displaying accuracy data (Determination of the degree of accuracy of the outputs of the machine learning model))
feature data associated with the plurality of test instances, the feature data comprising a plurality of feature labels associated with characteristics the plurality of test instances, evidential information provided by the machine learning system, and contextual information from the plurality of test instances; (Nushi, Page 6, Tables 2 and 4 disclose objects which contain feature labels associated with the test instances)
(Nushi, Page 8, Figures 3-5 disclose displaying performance views associated with clusters obtained via clustering and Page 8, 1st Col., 2nd Para. discloses “Figure 3 visualizes the decision tree for a content view from crowd data for the cluster kitchen.”)
and an indication of the accuracy data corresponding to at least one feature cluster from the plurality of feature clusters. (Nushi, Page 4, 1st Col., 3rd Para. discloses “Pandora can create generic as well as clustered reports for a given performance view. Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters.” (Clustered reports indicate performance data corresponding to cluster which are a subset of test instances))
Nushi fails to explicitly teach:
one or more processors
memory in electronic communication with the one or more processors
and instructions stored in the memory, the instructions being executable by the one or more processors to cause a server device to
However, Avasarala teaches:
one or more processors (Avasarala, Para. [20] discloses “one or more processors”)
memory in electronic communication with the one or more processors (Avasarala, Para. [20] discloses “a memory readable by the one or more processors and instructions stored in the memory.”)
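and instructions stored in the memory, the instructions being executable by the one or more processors to cause a server device to (Avasarala, Para. [20] discloses “The instructions, when read by the one or more processors, direct the one more processors”)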
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the performance information generation as disclosed by Nushi to use the system hardware as disclosed by Avasarala. The combination would have been obvious because a person of ordinary skill in the art would be motivated to perform the embodiments of the claimed invention via a functional computing system.

As per claim 12, the combination of Nushi and Avasarala as shown above teaches the system of claim 11. Nushi further teaches the system further comprising instructions being executable by the one or more processors to cause the server device to:
detect a selection of a graphical element from the plurality of graphical elements associated with a feature cluster from the plurality of feature clusters; (Nushi, Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Selection of clusters containing feature labels))
and provide a visualization of the accuracy data associated with a subset of outputs from the plurality of outputs corresponding to the feature cluster. (Nushi, Page 4, 1st Col., 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters. As we show in the experimental evaluation, although there is value in extracting generic failure information, clustered reports are more predictive for system performance and they discover cluster-specific errors, which cannot be identified via generic views.” and Page 4, 2nd Col., 1st Para. discloses “View generation is a two-step process: 1) clustering the evaluation dataset based on content signals, and 2) detailed reporting globally and per cluster. The process generalizes to all views created in Pandora”)

As per claim 13, the combination of Nushi and Avasarala as shown above teaches the system of claim 11. Nushi further teaches the system further comprising instructions being executable by the one or more processors to cause the server device to:
detecting a selection of a graphical element corresponding to a first feature cluster from the plurality of feature clusters; (Nushi, Page 4, 2nd Col., 4th Para. discloses “Clustered variants aggregate satisfaction over selected clusters” and Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” (Selection of clusters containing feature labels))
and wherein providing the one or more performance views comprises providing a cluster performance view for the first feature cluster, the cluster performance view comprising a visualization of the accuracy data for a first subset of outputs from the plurality of outputs associated with the first feature cluster. (Nushi, Page 4, 1st Col., 3rd Para. discloses “Generic reports analyze the evaluation dataset as a whole, while clustered reports decompose the analysis according to distinct semantic clusters. As we show in the experimental evaluation, although there is value in extracting generic failure information, clustered reports are more predictive for system performance and they discover cluster-specific errors, which cannot be identified via generic views.” and Page 4, 2nd Col., 1st Para. discloses “View generation is a two-step process: 1) clustering the evaluation dataset based on content signals, and 2) detailed reporting globally and per cluster. The process generalizes to all views created in Pandora”)

As per claim 14, the combination of Nushi and Avasarala as shown above teaches the system of claim 13, Nushi further teaches:
wherein providing the one or more performance views further comprises providing an instance view associated with the first feature cluster, wherein the instance view comprises a display of a test instance from the first feature cluster and associated accuracy for the test instance. (Nushi, Page 4, 2nd Col., 45h Para. discloses “Moreover, system designers can zoom into the decision tree explore the concrete instances classified in each leaf” and Page 5, 1st Col., 2nd Para. discloses “For more extensive explorations, the system designer can intentionally leave a feature out of the tree to investigate other failure conditions in depth or generate feature rankings, as described next” (The visualized decision tree may be zoomed into for further analysis, wherein the tree displays information associated with leaf nodes containing data regarding clusters))

As per claim 15, the combination of Nushi and Avasarala as shown above teaches the system of claim 11. Nushi further teaches the system further comprising instructions being executable by the one or more processors to cause the server device to:
receive an indication of one or more feature labels associated with a threshold rate of identified errors from the accuracy data; (Nushi, Page 6, Tables 2-4 disclose satisfaction and satisfaction accuracy thresholds, and Abstract discloses “Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and for specifying appropriate human oversight and engagement” and Page 1, 2nd Col., 2nd Para. discloses “Finally, system designers can use detailed error analyses to make informed decisions on next steps for system improvement.”)
and cause a training system to refine the machine learning system based on a plurality of training instances associated with the one or more feature labels. (Nushi, Page 6, Tables 2-4 disclose satisfaction and satisfaction accuracy thresholds, and Abstract discloses “Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and for specifying appropriate human oversight and engagement” and Page 1, 2nd Col., 2nd Para. discloses “Finally, system designers can use detailed error analyses to make informed decisions on next steps for system improvement.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the performance information visualization system as disclosed by Nushi to incorporate providing failure information to a training system. The combination would have been obvious because a person of ordinary skill in .
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Lokare et al. (U.S. Patent No. US10311368) discloses a system for interpretability and improvement of a machine learning model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached on M-TR from 7:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at telephone number 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
/H.R.M./Examiner, Art Unit 2123                   

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123