Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-18 are pending.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 1, the claim recites “second NN running at a higher level of precision than the first NN.”
It is unclear what constitutes a “higher level” of precision. The term “higher” is a relative term.
For the purpose of examination, the claim is interpreted as: the second NN is more accurate than the first NN.
It is also unclear what constitutes “sending to the remote DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates.” It is unclear how updated parameters of an updated version can be sent when the computing device is not disclosed as having the first neural network or the updated neural network from which to generate the parameters. What constitutes sending an updated version based on the second set of estimates? Are the estimates compared? Are the two networks similar? Can the performance of two different networks be compared or used to generate updated parameters for another network?
For the purpose of examination, the claim is interpreted as: the networks are similar, with the second neural network being a superior version of the first neural network, and the updated parameters are generated based on comparing the estimates.
Claims 2-10 depend on claim 1 and therefore inherit the same deficiency.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 7-13, 15-16, and 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1,
2A Prong 1: The limitation of produce a second set of behavioral estimates as outputs in response to the performance metrics is a mental process, as it merely recites making a prediction based on the performance of something, which can be done in the human mind as an evaluation. The limitation of the second NN running at a higher level of precision than the first NN is a mental process, as it merely recites that the second NN has better accuracy than the first NN.
2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of using the first NN and the second NN. The first and second neural networks are recited at a high level of generality (i.e., as generic neural networks performing the generic neural network function of making predictions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. The limitation of receiving performance metrics of the DSA and a first set of behavioral estimates generated by a first neural network (NN) running on the DSA operating on the performance metrics is insignificant extra-solution activity. The limitation of sending to the remote DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates is insignificant extra-solution activity, as the neural network is merely an additional element.
2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using the first and second NNs to make predictions amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The limitation of receiving performance metrics of the DSA and a first set of behavioral estimates generated by a first neural network (NN) running on the DSA operating on the performance metrics is well-understood, routine, and conventional activity, such as mere data gathering (MPEP 2106.05(d)). The limitation of sending to the remote DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates is well-understood, routine, and conventional activity, because it is directed to mere data gathering (MPEP 2106.05(d)). The limitation of operating a second NN on the computing device with the received performance metrics as inputs is a field of use or technological environment (MPEP 2106.05(h)).

Regarding claim 7, the limitation of wherein the method further includes the computing device is directed to generic hardware. The limitation of monitoring storage performance of a plurality of remote DSAs including the remote DSA by maintaining a pair of NNs for each of the plurality of remote DSAs is a mental process, as it merely recites monitoring the performance of remote DSAs, which can be done in the human mind or using pen and paper.
The limitation of each pair of NNs including a full NN and a reduced-precision NN, the full NN running at a higher level of precision than the reduced-precision NN is a field of use or technological environment (MPEP 2106.05(h)), as it merely recites maintaining neural networks in the devices.
This judicial exception is not integrated into a practical application. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 8, the limitation of wherein the behavioral estimates include estimates of a data reduction ratio achieved by the remote DSA is a mathematical concept, as it merely recites using a data reduction ratio to calculate a behavioral estimate, which is a mathematical function.
This judicial exception is not integrated into a practical application. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 9, the limitation of wherein the behavioral estimates include estimates of a data compression ratio achieved by the remote DSA is a mathematical concept, as it merely recites using a data compression ratio to calculate a behavioral estimate, which is a mathematical function.
This judicial exception is not integrated into a practical application. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 10, the limitation of wherein operating the second NN on the computing device includes operating the second NN on special-purpose processing circuitry configured to operate NNs in an accelerated manner compared to general-purpose processing circuitry is a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

	Regarding claim 11,
2A Prong 1: The limitation of produce a first set of behavioral estimates as outputs in response to the performance metrics is a mental process, as it merely recites making a prediction based on the performance of something, which can be done in the human mind. The limitation of the second NN running at a higher level of precision than the first NN is a mental process, as it merely recites the second NN having higher accuracy than the first NN. The limitation of updating the first NN with the received updated parameters and operating the updated first NN on the apparatus to produce additional behavioral estimates is a mental process, as the limitation merely recites updating the parameters using the received data.
2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of neural networks and a computerized apparatus for monitoring storage performance of the apparatus. The neural networks and the computerized apparatus in the steps are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. The limitation of receiving updated parameters of the first NN from the remote computing device in response to the remote computing device updating the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates is an insignificant extra-solution activity. The limitation of sending the performance metrics and the first set of behavioral estimates to a remote computing device configured to run a second NN, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics is an insignificant extra-solution activity.
2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a computerized apparatus and neural networks to perform the sending and receiving of parameters amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The limitation of operating a first neural network (NN) on the apparatus with performance metrics of the apparatus as inputs is a field of use or technological environment (MPEP 2106.05(h)). The limitation of sending the performance metrics and the first set of behavioral estimates to a remote computing device configured to run a second NN, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics is well-understood, routine, and conventional activity, such as mere data gathering (MPEP 2106.05(d)). The limitation of receiving updated parameters of the first NN from the remote computing device in response to the remote computing device updating the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates is well-understood, routine, and conventional activity, such as mere data gathering (MPEP 2106.05(d)).

Regarding claim 12, the limitation of wherein the first NN is configured with nodes and synapses connecting some of the nodes, the synapses all having unity weight is a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 13, the limitation of wherein the first NN includes fewer nodes than the second NN is a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 15, the limitation of wherein the first set of behavioral estimates includes a confidence value, the confidence value estimating how accurate the behavioral estimates are in comparison to the second set of behavioral estimates is a mental process, as it merely recites estimating the behavior using a confidence value, which can be done in the human mind or with the aid of pen and paper. The limitation of wherein the method further comprises: evaluating whether the confidence value exceeds a threshold is a mental process, as it merely recites comparing the confidence value and a threshold. The limitation of in response to evaluating, selectively: for a first set of performance metrics for which the confidence value exceeds the threshold, utilizing the first set of behavioral estimates is a mental process, as it recites a process of selectively evaluating the estimates that have a confidence value larger than the threshold. The limitation of refraining from utilizing the first set of behavioral estimates is a mental process.
This judicial exception is not integrated into a practical application. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitation of for a second set of performance metrics for which the confidence value does not exceed the threshold: requesting the second set of behavioral estimates from the remote computing device is a field of use or technological environment (MPEP 2106.05(h)). The limitation of utilizing the second set of behavioral estimates as received from the remote computing device is a field of use or technological environment (MPEP 2106.05(h)), as it recites making a second prediction using the data from the computing device.
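The selective use of estimates recited in claim 15 can be sketched as follows. This is only an illustration of the recited decision logic, assuming a single scalar confidence value per set of estimates; the names select_estimates and request_remote are hypothetical and do not appear in the application or in the cited references.

```python
from typing import Callable, List, Tuple


def select_estimates(
    local_estimates: List[float],
    confidence: float,
    threshold: float,
    request_remote: Callable[[], List[float]],
) -> Tuple[List[float], str]:
    """Return the estimates to utilize and a label naming their source."""
    if confidence > threshold:
        # Confidence exceeds the threshold: utilize the first (local) set.
        return local_estimates, "local"
    # Otherwise refrain from utilizing the first set and request the second
    # set from the remote computing device. request_remote is a stand-in
    # callable (an assumption), not an interface from the application.
    return request_remote(), "remote"
```

In this sketch the remote request is made only when the confidence value does not exceed the threshold, so the local estimates are used without any communication in the high-confidence case.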

Regarding claim 16, the limitation of wherein utilizing the first set of behavioral estimates includes informing a user of the apparatus of values of the first set of behavioral estimates is a field of use or technological environment (MPEP 2106.05(h)), as it recites displaying the result to the user.
This judicial exception is not integrated into a practical application. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

	Regarding claim 18,
2A Prong 1: The limitation of produce a first set of behavioral estimates as outputs in response to the performance metrics is a mental process, as it merely recites making a prediction based on a performance metric of something. The limitation of update the first NN with the received updated parameters and operate the updated first NN on that DSA to produce additional behavioral estimates is a mental process, as it merely recites updating a parameter using the received data, which can be done with the aid of pen and paper. The limitation of the second NN running at a higher level of precision than the first NN is a mental process, as it merely recites the second NN having better accuracy than the first NN.
2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of the first and second neural networks, a plurality of computerized data storage apparatuses (DSAs), and a remote computing device remote from the DSAs. The neural networks, DSAs, and remote computing device in the steps are recited at a high level of generality (i.e., as a generic neural network performing a generic neural network function of making predictions, and generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. The limitation of send the performance metrics and the first set of behavioral estimates to the remote computing device is an insignificant extra-solution activity. The limitation of receive updated parameters of the first NN from the remote computing device is an insignificant extra-solution activity. The limitation of receive the performance metrics and the first set of behavioral estimates from that DSA is an insignificant extra-solution activity. The limitation of send to that DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates is an insignificant extra-solution activity.
2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using the first and second neural networks, DSAs, and remote computing device to perform the claimed operating, sending, and receiving steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The limitation of operate a first neural network (NN) on that DSA with performance metrics of that DSA as inputs is a field of use or technological environment (MPEP 2106.05(h)). The limitation of operate a second NN for that DSA with the received performance metrics as inputs, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics is a field of use or technological environment (MPEP 2106.05(h)). The limitation of send the performance metrics and the first set of behavioral estimates to the remote computing device is well-understood, routine, and conventional activity, such as mere data gathering (MPEP 2106.05(d)). The limitation of receive updated parameters of the first NN from the remote computing device is well-understood, routine, and conventional activity, such as mere data gathering (MPEP 2106.05(d)). The limitation of receive the performance metrics and the first set of behavioral estimates from that DSA is well-understood, routine, and conventional activity, such as mere data gathering (MPEP 2106.05(d)). The limitation of send to that DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates is well-understood, routine, and conventional activity, such as mere data gathering (MPEP 2106.05(d)).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7, 10-13, 15-16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jia (US 20180173971 A1) in view of Frey (Frey et al., 2015, “Cloud Storage Prediction with Neural Networks”), and further in view of Li (US 20160078339 A1).

Regarding claim 1, Jia teaches a method performed by a computing device of monitoring performance, the method comprising ([Jia, 0046] “The sensor subsystems 132 provide input sensor data 155 to an on-board neural network subsystem 134. The input sensor data 155 can include multiple channels of data, where each channel represents a different characteristic of reflected electromagnetic radiation. Thus, multiple channels of input sensor data 155 can be generated from measurements from the same sensor.”): 
receiving performance metrics of the DSA and a first set of estimates generated by a first neural network (NN) running on the DSA operating on the performance metrics ([Jia, 0047] “The sensor subsystems 132, the on-board neural network subsystem 134, or some combination of both, transform raw sensor data into the multiple channels of input sensor data 155. To do so, the on-board system 130 can project the various characteristics of the raw sensor data into a common coordinate system. The various characteristics of the raw sensor data, and their respective representations, will be discussed in more detail below with reference to FIG. 3.” The on-board system 130 corresponds to the DSA, and the on-board neural network subsystem 134 corresponds to the first NN.
[Jia, 0051] “The on-board neural network subsystem 134 uses the input sensor data 155 to generate one or more object detection predictions 165. The on-board neural network subsystem 134 can provide the one or more object detection predictions 165 to a planning subsystem 136, a user interface subsystem 138, or both.” The sensor data of the on-board neural network subsystem corresponds to the performance metrics of the DSA. The on-board neural network subsystem 134 receives the sensor data and generates the first set of estimates.
[Jia, 0054] “The on-board neural network subsystem 134 can also use the input sensor data 155 to generate training data 123. The training data 123 can include the projected representations of the different channels of input sensor data. The on-board system 130 can provide the training data 123 to the training system 110 in offline batches or in an online fashion, e.g., continually whenever it is generated.” 
[Jia, 0059] “The neural network subsystem 114 can receive training examples 123 as input. The training examples 123 can include auto-labeled training data, human-labeled training data, or some combination of the two. Each of the training examples 123 includes a representation of the different channels of input sensor data as well as one or more labels that indicate the location of objects within regions of space represented by the input sensor data...” The paragraph discloses that the second neural network receives the performance metrics (sensor data). The combination of paragraphs 0054 and 0059 teaches that the second NN (training neural network subsystem 114) receives the output of the first NN (running on the on-board system 130).); 
operating a second NN on the computing device with the received performance metrics as inputs, the second NN configured to produce a second set of estimates as outputs in response to the performance metrics ([Jia, 0054] “The on-board neural network subsystem 134 can also use the input sensor data 155 to generate training data 123. The training data 123 can include the projected representations of the different channels of input sensor data. The on-board system 130 can provide the training data 123 to the training system 110 in offline batches or in an online fashion, e.g., continually whenever it is generated.” 
[Jia, 0059] “The neural network subsystem 114 can receive training examples 123 as input. The training examples 123 can include auto-labeled training data, human-labeled training data, or some combination of the two. Each of the training examples 123 includes a representation of the different channels of input sensor data as well as one or more labels that indicate the location of objects within regions of space represented by the input sensor data...” The paragraph discloses that the second neural network receives the performance metrics (sensor data). The combination of paragraphs 0054 and 0059 teaches that the second NN (training neural network subsystem 114) receives the output of the first NN (running on the on-board system 130).
[Jia, 0060] “The training neural network subsystem 114 can generate, for each training example 123, one or more object detection predictions 135...” This paragraph teaches that the second neural network generates predictions (the second set of estimates) using the received training examples.); 
sending the updated parameters of an updated version of a NN based at least in part on the performance metrics ([Jia, 0060] “The training neural network subsystem 114 can generate, for each training example 123, one or more object detection predictions 135. A training engine 116 analyzes the object detection predictions 135 and compares the object detection predictions to the labels in the training examples 123. The training engine 116 then generates updated model parameter values 145 by using an appropriate updating technique, e.g., backpropagation. The training engine 116 can then update the collection of model parameters 170 using the updated model parameter values 145.” The paragraph teaches that the update of the NN parameters is based on a performance metric (comparing the predictions with the labels).
[Jia, 0061] “After training is complete, the training system 110 can provide a final set of model parameter values 171 to the on-board system 130 for use in making fully autonomous or semi-autonomous driving decisions. The training system 110 can provide the final set of model parameter values 171 by a wired or wireless connection to the on-board system 130.” The paragraph teaches sending the updated NN parameters to another device.).
Although Jia discloses the first NN and the second NN, sending the updated parameters to a different device, and performance monitoring of a vehicle or a device, Jia does not specifically teach monitoring storage performance of a remote data storage apparatus (DSA), the second NN running at a higher level of precision than the first NN, and sending the updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates.
Frey teaches monitoring storage performance of a remote data storage apparatus (DSA) ([Frey, page 54, left column, the last paragraph – right column, first paragraph] “In the simulation, the impact of regulatory mechanisms on the following key performance indicators was considered: * Free memory amount: providing an optimal amount of memory by the control logic. * Response time: compliance with the KPI response time by adjusting the storage medium. * Backup Media: proposal of a suitable backup medium. For this, the used Neural Network consisted of 11 different input neurons. Table II lists the used input neurons and describes the used input factors. As output neurons, there is one neuron that gives the expected used memory amount for the next simulation step, a neuron that determines the amount of memory to be added or removed, as well as other neurons that recommend the optimal backup medium.” The cloud storage corresponds to the remote data storage apparatus.).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention and having the teachings of Jia and Frey, to use the method of Frey of sending and receiving behavioral estimates of a DSA to implement the machine learning system of Jia. The suggestion and/or motivation for doing so is to improve the accuracy of the system: because the method has to estimate the storage performance, the process of monitoring the storage is essential.
Jia in view of Frey does not specifically teach the second NN running at a higher level of precision than the first NN, or sending the updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates.
Li teaches the second NN running at a higher level of precision than the first NN ([Li, 0016] “Various aspects of the technology described herein are generally directed to among other things, systems, methods, and computer-readable media for providing a first DNN model of reduced size for deployment on devices by “learning” the first DNN from a second DNN with larger capacity (number of hidden nodes). To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used for training the smaller “student” DNN”),
sending the updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates ([Li, 0043] “… Embodiments that determine the KL divergence provide an advantage over other alternatives such as regression because minimizing the KL divergence is equivalent to minimizing the cross entropy of the distributions, as further described in method 500 of FIG. 5. If the output distribution 351 of student DNN 301 has converged with the output distribution 352 of teacher DNN 302, then the student DNN is deemed to be trained. However, if the output has not converged, and in some embodiments the output still appears to be converging, then the student DNN 301 is trained based on the error. For example, as shown at 370, using back propagation the weights of student DNN 301 are updated using the error signal.” The output of the student and the teacher model corresponds to the first and second sets of behavioral estimates. The error rate corresponds to the performance metrics.).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention and having the teachings of Jia, Frey, and Li, to use the method of Li of sending updated parameters to a remote device based on the performance metrics and the behavioral estimates to implement the machine learning system of Jia and Frey. The suggestion and/or motivation for doing so is to improve the efficiency of the system, as sending all of the trained first neural network to the remote device would waste computational resources.
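The teacher-student update described in Li [0043] can be illustrated with a minimal sketch. The sketch is not taken from Li or from the application: the function names, the learning rate, and the convergence tolerance are assumptions, and for brevity the gradient step is applied directly to the student's output logits rather than back-propagated through network weights.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient step on the student logits toward the teacher output.

    For a softmax output, the gradient of KL(teacher || student) with
    respect to logit i is (student_i - teacher_i). The lr value is an
    assumption chosen for the illustration.
    """
    student_probs = softmax(student_logits)
    return [
        z - lr * (s - t)
        for z, s, t in zip(student_logits, student_probs, teacher_probs)
    ]


def distill(student_logits, teacher_probs, tol=1e-4, max_steps=1000):
    """Update the student until its output converges with the teacher's."""
    for step in range(max_steps):
        if kl_divergence(teacher_probs, softmax(student_logits)) < tol:
            return student_logits, step  # student deemed trained
        student_logits = distill_step(student_logits, teacher_probs)
    return student_logits, max_steps
```

With a hypothetical teacher distribution such as [0.7, 0.2, 0.1] and a student starting from uniform logits, the loop stops once the KL divergence falls below the tolerance, which corresponds to the point at which Li deems the student DNN trained.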

Regarding claim 7, Jia teaches maintaining a pair of NNs for each of the plurality of devices ([Jia, 0073] “The system processes the input using a high precision object detection neural network (240). As described above, the input processed by using the high precision object detection neural network is generated based on the output of multiple high-recall object detection neural networks, e.g., the high-recall object detection neural networks 305A and 305B. The components of the input, e.g., the projected laser image, the camera image patch, and the feature vector, are then processed using a high precision object detection neural network.” Fig. 1 discloses the neural networks in each of the on-board system and the training system. Fig. 3A discloses the structure of the neural networks that process the sensor data 301 using a pair of neural networks 305A and 305B. A pair of neural networks is distributed in each of the systems 110 and 130.).
	Jia does not specifically teach wherein the method further includes the computing device monitoring storage performance of a plurality of remote DSAs, each pair of NNs including a full NN and a reduced-precision NN, the full NN running at a higher level of precision than the reduced-precision NN.
Frey teaches wherein the method further includes the computing device monitoring storage performance of a plurality of remote DSAs ([Frey, page 53, left column, second paragraph] “Cloud storage resources are usually multi tenant, which means for the provider that it can economically be very important to distribute the storage as efficiently as possible ... In practice, this can lead to problems because the memory usage of clients can vary greatly and therefore SLA violations can happen easily. For this, a method shall be found that allows to determine the needed amount of memory close to the optimum and allocate it ahead of time ...” This paragraph discloses that the cloud storage resources are multi-tenant, which involves a plurality of remote storage devices.
[Frey, page 54, left column, the last paragraph – right column, first paragraph] “In the simulation, the impact of regulatory mechanisms on the following key performance indicators was considered: * Free memory amount: providing an optimal amount of memory by the control logic. * Response time: compliance with the KPI response time by adjusting the storage medium. * Backup Media: proposal of a suitable backup medium. For this, the used Neural Network consisted of 11 different input neurons. Table II lists the used input neurons and describes the used input factors. As output neurons, there is one neuron that gives the expected used memory amount for the next simulation step, a neuron that determines the amount of memory to be added or removed, as well as other neurons that recommend the optimal backup medium.” The cloud storage corresponds to the remote data storage apparatus.). 
However, Jia in view of Frey does not specifically teach each pair of NNs including a full NN and a reduced-precision NN, the full NN running at a higher level of precision than the reduced-precision NN.
Li teaches each pair of NNs including a full NN and a reduced-precision NN, the full NN running at a higher level of precision than the reduced-precision NN ([Li, 0016] “Various aspects of the technology described herein are generally directed to among other things, systems, methods, and computer-readable media for providing a first DNN model of reduced size for deployment on devices by “learning” the first DNN from a second DNN with larger capacity (number of hidden nodes). To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used for training the smaller “student” DNN”).

Regarding claim 10, Jia in view of Frey and further in view of Li teaches wherein operating the second NN on the computing device includes operating the second NN on special-purpose processing circuitry configured to operate NNs in an accelerated manner compared to general-purpose processing circuitry ([Jia, 0055] “The training system 110 is typically hosted within a data center 112, which can be a distributed computing system having hundreds or thousands of computers in one or more locations.”
[Jia, 0056] “The training system 110 includes a training neural network subsystem 114 … The training neural network subsystem 114 includes a plurality of computing devices having software or hardware modules that implement the respective operations of each layer of the neural network according to an architecture of the neural network.”
[Jia, 0057] “The training neural networks generally have the same architecture as the on-board neural networks. However, the training system 110 need not use the same hardware to compute the operations of each layer. In other words, the training system 110 can use CPUs only, highly parallelized hardware, or some combination of these.”).

Regarding claim 11, Jia teaches a method performed by a computerized apparatus of monitoring performance ([Jia, 0046] “The sensor subsystems 132 provide input sensor data 155 to an on-board neural network subsystem 134. The input sensor data 155 can include multiple channels of data, where each channel represents a different characteristic of reflected electromagnetic radiation. Thus, multiple channels of input sensor data 155 can be generated from measurements from the same sensor.”), the method comprising: 
operating a first neural network (NN) on the apparatus with performance metrics of the apparatus as inputs, the first NN configured to produce a first set of estimates as outputs in response to the performance metrics ([Jia, 0047] “The sensor subsystems 132, the on-board neural network subsystem 134, or some combination of both, transform raw sensor data into the multiple channels of input sensor data 155. To do so, the on-board system 130 can project the various characteristics of the raw sensor data into a common coordinate system. The various characteristics of the raw sensor data, and their respective representations, will be discussed in more detail below with reference to FIG. 3.”
[Jia, 0048] “The on-board neural network subsystem 134 implements the operations of each layer of a set of neural networks that are trained to make predictions related to object detection, i.e., related to detecting objects in the environment surrounding the vehicle.”); 
sending the performance metrics and the first set of estimates to a remote computing device configured to run a second NN, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics ([Jia, Fig 1] Subsystem 114 corresponds to the second NN, and subsystem 134 corresponds to the first NN. System 110 is the remote computing device that receives the first set of estimates and the performance metrics.
[Jia, 0054] “The on-board neural network subsystem 134 can also use the input sensor data 155 to generate training data 123. The training data 123 can include the projected representations of the different channels of input sensor data. The on-board system 130 can provide the training data 123 to the training system 110 in offline batches or in an online fashion, e.g., continually whenever it is generated.” The paragraph teaches that the first NN generates the first set of predictions, which are then input to the second NN. 
[Jia, 0059] “The neural network subsystem 114 can receive training examples 123 as input. The training examples 123 can include auto-labeled training data, human-labeled training data, or some combination of the two. Each of the training examples 123 includes a representation of the different channels of input sensor data as well as one or more labels that indicate the location of objects within regions of space represented by the input sensor data...” The paragraph discloses the second neural network receives the performance metric (sensor data).
[Jia, 0060] “The training neural network subsystem 114 can generate, for each training example 123, one or more object detection predictions 135...” This paragraph teaches that the second NN generates predictions (second set of estimates) using the training examples received from the on-board system 130.); 
receiving updated parameters of the first NN from the remote computing device in response to the remote computing device updating the first NN ([Jia, 0054] “The on-board neural network subsystem 134 can also use the input sensor data 155 to generate training data 123. The training data 123 can include the projected representations of the different channels of input sensor data. The on-board system 130 can provide the training data 123 to the training system 110 in offline batches or in an online fashion, e.g., continually whenever it is generated.” 
[Jia, 0059] “The neural network subsystem 114 can receive training examples 123 as input. The training examples 123 can include auto-labeled training data, human-labeled training data, or some combination of the two. Each of the training examples 123 includes a representation of the different channels of input sensor data as well as one or more labels that indicate the location of objects within regions of space represented by the input sensor data ...” The paragraph discloses that the first NN of the on-board system sends the output generated by the first NN. The combination of 0054 and 0059 teaches that the second NN receives the data and training result from the first NN. Even though it does not specifically teach sending the parameters of the first NN, Jia teaches sending the parameters of the second NN to the on-board system in paragraph 0061. 
[Jia, 0049] “The on-board neural network subsystem 134 can implement the operations of each layer of a neural network by loading a collection of model parameters 172 that are received from the training system 110.” This paragraph teaches the process of updating the first NN in on-board system.);
updating the first NN with the received updated parameters and operating the updated first NN on the apparatus to produce additional estimates ([Jia, 0061] “After training is complete, the training system 110 can provide a final set of model parameter values 171 to the on-board system 130 for use in making fully autonomous or semi-autonomous driving decisions. The training system 110 can provide the final set of model parameter values 171 by a wired or wireless connection to the on-board system 130.” This paragraph teaches the process of sending the parameters to the on-board system (remote).
[Jia, 0049] “The on-board neural network subsystem 134 can implement the operations of each layer of a neural network by loading a collection of model parameters 172 that are received from the training system 110.” [Jia, 0051] “The on-board neural network subsystem 134 uses the input sensor data 155 to generate one or more object detection predictions 165.” The On-Board Neural Network Subsystem 134 comprises the first neural network, receives the updated parameters and generates new estimates.).
Jia does not specifically teach a method performed by a computerized apparatus of monitoring storage performance of the apparatus, and the second NN running at a higher level of precision than the first NN.
Frey teaches a method of monitoring storage performance of the apparatus ([Frey, page 54, left column, the last paragraph – right column, first paragraph] “In the simulation, the impact of regulatory mechanisms on the following key performance indicators was considered: * Free memory amount: providing an optimal amount of memory by the control logic. * Response time: compliance with the KPI response time by adjusting the storage medium. * Backup Media: proposal of a suitable backup medium. For this, the used Neural Network consisted of 11 different input neurons. Table II lists the used input neurons and describes the used input factors. As output neurons, there is one neuron that gives the expected used memory amount for the next simulation step, a neuron that determines the amount of memory to be added or removed, as well as other neurons that recommend the optimal backup medium.” The cloud storage corresponds to the remote data storage apparatus.).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention and having the teachings of Jia and Frey, to use Frey's method of monitoring storage performance of a DSA to implement the machine learning system of Jia. The suggestion and/or motivation to do so is to improve the accuracy of the system: because the method has to estimate the storage performance, the process of monitoring the storage is essential.
Jia in view of Frey does not specifically teach the second NN running at a higher level of precision than the first NN.
Li teaches updating the first NN based at least in part on the performance metrics and the first and second sets of estimates ([Li, 0043] “… Embodiments that determine the KL divergence provide an advantage over other alternatives such as regression because minimizing the KL divergence is equivalent to minimizing the cross entropy of the distributions, as further described in method 500 of FIG. 5. If the output distribution 351 of student DNN 301 has converged with the output distribution 352 of teacher DNN 302, then the student DNN is deemed to be trained. However, if the output has not converged, and in some embodiments the output still appears to be converging, then the student DNN 301 is trained based on the error. For example, as shown at 370, using back propagation the weights of student DNN 301 are updated using the error signal.” The output of the student and the teacher model corresponds to the first and second sets of behavioral estimates. The error rate corresponds to the performance metrics.); 
the second NN running at a higher level of precision than the first NN ([Li, 0016] “Various aspects of the technology described herein are generally directed to among other things, systems, methods, and computer-readable media for providing a first DNN model of reduced size for deployment on devices by “learning” the first DNN from a second DNN with larger capacity (number of hidden nodes). To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used for training the smaller “student” DNN”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention and having the teachings of Jia, Frey, and Li, to use Li's method of running the second NN at a higher level of precision than the first NN to implement the machine learning system of Jia and Frey. The suggestion and/or motivation to do so is to improve the efficiency of the system, as training the low-precision network using the high-precision network helps reduce the use of computational resources by reducing the amount of memory needed to store the network.
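For illustration only, the teacher-student training quoted from Li, 0043 can be sketched as follows. This is a minimal sketch under stated assumptions, not Li's implementation: the function names (softmax, kl_divergence, cross_entropy, entropy, distill_step), the three-class example logits, the learning rate, and the choice to update the student's logits directly rather than network weights are all hypothetical. It shows that the KL divergence between the teacher and student output distributions differs from their cross entropy only by the teacher's entropy, which does not depend on the student, and that stepping the student along the error signal drives its output toward the teacher's.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions p (teacher) and q (student).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i log q_i
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    # H(p) = -sum_i p_i log p_i; constant with respect to the student.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def distill_step(student_logits, teacher_probs, lr=0.5):
    # Gradient of KL(teacher || softmax(student_logits)) with respect to the
    # student logits is (student_probs - teacher_probs): the "error signal".
    student_probs = softmax(student_logits)
    return [z - lr * (s - t)
            for z, s, t in zip(student_logits, student_probs, teacher_probs)]

# Hypothetical teacher output distribution and an untrained student.
teacher_probs = softmax([2.0, 0.5, -1.0])
student_logits = [0.0, 0.0, 0.0]

kl_before = kl_divergence(teacher_probs, softmax(student_logits))
for _ in range(500):
    student_logits = distill_step(student_logits, teacher_probs)
kl_after = kl_divergence(teacher_probs, softmax(student_logits))
```

In this sketch the student is deemed trained once kl_after is close to zero, which mirrors the convergence check described in the quoted paragraph.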

Regarding claim 12, Jia in view of Frey and further in view of Li teaches wherein the first NN is configured with nodes and synapses connecting some of the nodes, the synapses all having unity weight ([Jia, 0011] “In addition, each convolutional network layer can have neurons in a three-dimensional arrangement, with depth, width, and height dimensions. The width and height dimensions correspond to the two-dimensional features of the layer's input. The depth-dimension includes one or more depth sublayers of neurons. Generally, convolutional neural networks employ weight sharing so that all neurons in a depth sublayer have the same weights. This provides for translation invariance when detecting features in the input.”).

Regarding claim 13, Jia in view of Frey and further in view of Li teaches wherein the first NN includes fewer nodes than the second NN ([Li, 0016] “Various aspects of the technology described herein are generally directed to among other things, systems, methods, and computer-readable media for providing a first DNN model of reduced size for deployment on devices by “learning” the first DNN from a second DNN with larger capacity (number of hidden nodes). To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used for training the smaller “student” DNN”).

Regarding claim 15, Jia teaches wherein the first set of behavioral estimates includes a confidence value, the confidence value estimating how accurate the behavioral estimates are in comparison to the second set of behavioral estimates ([Jia, 0022] “In some implementations, the method further includes: obtaining a camera image of the environment; and processing the camera image using a first high-recall object detection neural network. The first object detection neural network is configured to: receive the camera image; and process the camera image to generate: (i) data defining a plurality of bounding boxes in the camera image; and (ii) for each of the plurality of bounding boxes, a respective first confidence score that represents a likelihood that an object belonging to an object category from a second set of one or more object categories is present in the region of the environment shown in the bounding box.”); and 
wherein the method further comprises: evaluating whether the confidence value exceeds a threshold ([Jia, 0118] “In this example, the high-recall object detection neural network 305B is trained to identify objects that are a part of the object category “PEDESTRIAN.” Because the confidence score computed for the camera patch A is lower than a threshold, the high-recall object detection neural network 305B disregards the white vehicle as not being included in the “PEDESTRIAN” class, and therefore only generates a bounding box for the individual exiting the truck as illustrated.”); and 
in response to evaluating, selectively: for a first set of performance metrics for which the confidence value exceeds the threshold, utilizing the first set of behavioral estimates ([Jia, 0016] “… The system can use techniques to improve the accuracy of the predictions. The system can also process sensor data through specific neural network subsystems to reduce computational resources required to generate accurate object predictions. For example, confidence scores of object scores for predicted objects using a lower precision neural network can be used to determine if additional processing is required. In response determining that the values of the confidence scores satisfy a threshold value, a higher-precision neural network can be used to further process sensor data to improve object prediction accuracy” 
[Jia, 0105] “In the example depicted in FIG. 4A, the high-recall object detection neural network 305A identifies five bounding boxes within the processed laser projected image 404 corresponding to detected objects labelled as “A,” “B,” “C,” “D,” and “E” in the figure. The high-recall object detection neural network 305A also computes confidence scores for each object for a “CYCLIST” object category, which are included in table 408. As depicted, high-recall object detection neural network 305A computes the highest confidence score for objects “C” and “B,” indicating that these objects are most likely to represent a cyclist in the vicinity of the vehicle. In contrast, the high-recall object detection neural network 305A computes the lowest confidence score for object “D,” indicating that this object is either an object associated with a different object category, e.g., a pedestrian or a vehicle, a falsely detected object.”); and 
for a second set of performance metrics for which the confidence value does not exceed the threshold: requesting the second set of behavioral estimates from the remote computing device ([Jia, 0118] “In the example depicted, of the two objects initially detected, the high-recall object detection neural network 305B only identifies a bounding box within the processed camera image 454 for the camera image patch A of the individual exiting the parked truck, but not for the camera image patch B of the white vehicle. In this example, the high-recall object detection neural network 305B is trained to identify objects that are a part of the object category “PEDESTRIAN.” Because the confidence score computed for the camera patch A is lower than a threshold, the high-recall object detection neural network 305B disregards the white vehicle as not being included in the “PEDESTRIAN” class, and therefore only generates a bounding box for the individual exiting the truck as illustrated.” Training of the second neural network 305B is requested because the confidence score is lower than the threshold.); 
refraining from utilizing the first set of behavioral estimates ([Jia, Fig 1] According to Fig. 1, the prediction generated from the first neural network 135 is not used in the ON-BOARD SYSTEM 130, which comprises the second neural network 134.); and 
utilizing the second set of behavioral estimates as received from the remote computing device ([Jia, 0048] “The on-board neural network subsystem 134 implements the operations of each layer of a set of neural networks that are trained to make predictions related to object detection, i.e., related to detecting objects in the environment surrounding the vehicle. Thus, the on-board neural network subsystem 134 includes one or more computing devices having software or hardware modules that implement the respective operations of each layer of the neural networks according to an architecture of the neural networks. The object detection neural networks are described in more detail below with reference to FIGS. 2-4.”).
Jia does not specifically teach a behavioral estimate of a remote computing device.
Jia in view of Frey and further in view of Li teaches the behavioral estimate of a remote computing device ([Frey, page 54, left column, the last paragraph – right column, first paragraph] “In the simulation, the impact of regulatory mechanisms on the following key performance indicators was considered: * Free memory amount: providing an optimal amount of memory by the control logic. * Response time: compliance with the KPI response time by adjusting the storage medium. * Backup Media: proposal of a suitable backup medium. For this, the used Neural Network consisted of 11 different input neurons. Table II lists the used input neurons and describes the used input factors. As output neurons, there is one neuron that gives the expected used memory amount for the next simulation step, a neuron that determines the amount of memory to be added or removed, as well as other neurons that recommend the optimal backup medium.” The cloud storage corresponds to the remote data storage apparatus.).
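For illustration only, the selective use of estimates recited in claim 15 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Jia, Frey, or Li: the function name select_estimates, its parameters, and the request_remote callback standing in for the remote computing device are all hypothetical. The first set of estimates is used when the confidence value exceeds the threshold; otherwise the first set is not used and the second set is requested from the remote device.

```python
from typing import Callable, List, Sequence, Tuple

def select_estimates(
    local_estimates: Sequence[float],
    confidence: float,
    threshold: float,
    request_remote: Callable[[], Sequence[float]],
) -> Tuple[List[float], str]:
    # Confidence exceeds the threshold: utilize the first (local) estimates.
    # The remote device is not contacted in this branch.
    if confidence > threshold:
        return list(local_estimates), "local"
    # Otherwise refrain from using the local estimates and request the
    # second set of estimates from the (hypothetical) remote device.
    return list(request_remote()), "remote"

# Usage with a stub in place of the remote computing device.
def stub_remote() -> Sequence[float]:
    return [0.42, 0.58]

high = select_estimates([0.4, 0.6], confidence=0.95, threshold=0.8,
                        request_remote=stub_remote)
low = select_estimates([0.4, 0.6], confidence=0.50, threshold=0.8,
                       request_remote=stub_remote)
```

A confidence value exactly equal to the threshold is treated here as not exceeding it, so the remote estimates are requested in that case.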

Regarding claim 16, Jia in view of Frey and further in view of Li teaches wherein utilizing the first set of behavioral estimates includes informing a user of the apparatus of values of the first set of behavioral estimates ([Jia, 0077] “The system processes the alternate representations of the projected laser image, the camera image patch, and the feature vector through a combining sub-neural network (250). For example, as illustrated in FIG. 3A, the alternative representations 304A, 304B, and 304C are provided as input to a combining sub-neural network 310D. The combining sub-neural network 310D can include one or more visual combining neural network layers to generate visual combined representations of the processed project laser image and the processed the camera image patch. The visual combined representation and the alternative representation of the feature vector are then processed by one or more final combining layers to generate a final combined representation, which is used to generate object scores for objects detected within the visual combined representation.”).

Regarding claim 18, Jia teaches a system including: wherein each DSA is configured to: operate a first neural network (NN) on that DSA with performance metrics of that DSA as inputs, the first NN configured to produce a first set of behavioral estimates as outputs in response to the performance metrics ([Jia, 0051] “The on-board neural network subsystem 134 uses the input sensor data 155 to generate one or more object detection predictions 165. The on-board neural network subsystem 134 can provide the one or more object detection predictions 165 to a planning subsystem 136, a user interface subsystem 138, or both.” The on-board NN corresponds to the first NN. The input sensor data corresponds to the performance metrics.); 
send the performance metrics and the first set of behavioral estimates to the remote computing device ([Jia, 0054] “The on-board neural network subsystem 134 can also use the input sensor data 155 to generate training data 123. The training data 123 can include the projected representations of the different channels of input sensor data. The on-board system 130 can provide the training data 123 to the training system 110 in offline batches or in an online fashion, e.g., continually whenever it is generated.” The paragraph teaches that the first NN generates the first set of predictions, which are then input to the second NN. 
[Jia, 0059] “The neural network subsystem 114 can receive training examples 123 as input. The training examples 123 can include auto-labeled training data, human-labeled training data, or some combination of the two. Each of the training examples 123 includes a representation of the different channels of input sensor data as well as one or more labels that indicate the location of objects within regions of space represented by the input sensor data. For example, the training examples 123 can include input sensor data for reference objects that are predetermined to be associated with different object categories, e.g., pedestrians, cyclists. In some implementations, training examples 123 can include multiple objects for each object category.” The training examples 123 from the first NN (134) are transferred from the on-board system to the remote computing device (training system 110). The auto-labeled training data is the first set of behavioral estimates, and the input sensor data is the performance metrics.); 
receive updated parameters of the first NN from the remote computing device ([Jia, 0049] “The on-board neural network subsystem 134 can implement the operations of each layer of a neural network by loading a collection of model parameters 172 that are received from the training system 110. Although illustrated as being logically separated, the model parameters 170 and the software or hardware modules performing the operations may actually be located on the same computing device or, in the case of an executing software module, stored within the same memory device.” ); 
update the first NN with the received updated parameters and operate the updated first NN on that DSA to produce additional behavioral estimates ([Jia, 0049] “The on-board neural network subsystem 134 can implement the operations of each layer of a neural network by loading a collection of model parameters 172 that are received from the training system 110. Although illustrated as being logically separated, the model parameters 170 and the software or hardware modules performing the operations may actually be located on the same computing device or, in the case of an executing software module, stored within the same memory device.” The first neural network 134 on the DSA 130 is updated using the updated parameter received from the 110.); and 
wherein the remote computing device is configured to, for each DSA: receive the performance metrics and the first set of behavioral estimates from that DSA ([Jia, 0054] “The on-board neural network subsystem 134 can also use the input sensor data 155 to generate training data 123. The training data 123 can include the projected representations of the different channels of input sensor data. The on-board system 130 can provide the training data 123 to the training system 110 in offline batches or in an online fashion, e.g., continually whenever it is generated.” As seen in Fig. 1, the training data 123 is transferred from the on-board system to the remote computing system 110. The input sensor data corresponds to the performance metrics and the generated training data corresponds to the first set of behavioral estimates from the DSA (on-board system).
[Jia, 0059] “The neural network subsystem 114 can receive training examples 123 as input. The training examples 123 can include auto-labeled training data, human-labeled training data, or some combination of the two. Each of the training examples 123 includes a representation of the different channels of input sensor data as well as one or more labels that indicate the location of objects within regions of space represented by the input sensor data. For example, the training examples 123 can include input sensor data for reference objects that are predetermined to be associated with different object categories, e.g., pedestrians, cyclists. In some implementations, training examples 123 can include multiple objects for each object category.” The training examples 123 from the first NN (134) are transferred from the on-board system to the remote computing device (training system 110). The auto-labeled training data is the first set of behavioral estimates, and the input sensor data is the performance metrics.); 
operate a second NN for that DSA with the received performance metrics as inputs, the second NN configured to produce a second set of behavioral estimates as outputs in response to the performance metrics ([Jia, 0054] “The on-board neural network subsystem 134 can also use the input sensor data 155 to generate training data 123. The training data 123 can include the projected representations of the different channels of input sensor data. The on-board system 130 can provide the training data 123 to the training system 110 in offline batches or in an online fashion, e.g., continually whenever it is generated.”
[Jia, 0055] “The training system 110 is typically hosted within a data center 112, which can be a distributed computing system having hundreds or thousands of computers in one or more locations.”
[Jia, 0056] “The training system 110 includes a training neural network subsystem 114 that can implement the operations of each layer of a neural network that is designed to make object detection predictions from input sensor data.”); and 
send to that DSA updated parameters of an updated version of the first NN based at least in part on the performance metrics and the first and second sets of behavioral estimates ([Jia, 0049] “The on-board neural network subsystem 134 can implement the operations of each layer of a neural network by loading a collection of model parameters 172 that are received from the training system 110. Although illustrated as being logically separated, the model parameters 170 and the software or hardware modules performing the operations may actually be located on the same computing device or, in the case of an executing software module, stored within the same memory device.”
[Jia, 0061] “After training is complete, the training system 110 can provide a final set of model parameter values 171 to the on-board system 130 for use in making fully autonomous or semi-autonomous driving decisions. The training system 110 can provide the final set of model parameter values 171 by a wired or wireless connection to the on-board system 130.” The on-board system, which corresponds to the DSA, receives the updated first neural network parameters 171 from the remote computing device 110. The parameter values are updated based on the prediction results 135 and the performance metrics 123.).
Jia does not specifically teach a plurality of computerized data storage apparatuses (DSAs); and a remote computing device remote from the DSAs, and the second NN running at a higher level of precision than the first NN.
Frey teaches a plurality of computerized data storage apparatuses (DSAs); and a remote computing device remote from the DSAs ([Frey, page 53, left column, 5th line of 1st paragraph - 2nd paragraph] “For storage, typical KPIs stated in an SLA can be the read- and write-speed, storage capacity, random input/outputs per second (IOPS) and bandwidth. Since cloud computing resources can be allocated dynamically at runtime additional, dynamic service level objectives arise … Cloud storage resources are usually multi tenant, which means for the provider that it can economically be very important to distribute the storage as efficiently as possible. This means allocating as close to the minimum guaranteed amount of storage as possible. In practice, this can lead to problems because the memory usage of clients can vary greatly and therefore SLA violations can happen easily. For this, a method shall be found that allows to determine the needed amount of memory close to the optimum and allocate it ahead of time. With such an efficient provisioning method it would be possible for providers to maximize the usability of their infrastructure while at the same time guarantee customers a high quality service.”); 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention and having the teachings of Jia and Frey, to use Frey's DSA apparatus and its sending and receiving of behavioral estimates of the DSA to implement the machine learning system of Jia. The suggestion and/or motivation to do so is to improve the accuracy of the system: because the method has to estimate the storage performance, the process of monitoring the storage is essential.
Jia in view of Frey does not specifically teach the second NN running at a higher level of precision than the first NN.
Li teaches the second NN running at a higher level of precision than the first NN ([Li, 0016] “Various aspects of the technology described herein are generally directed to among other things, systems, methods, and computer-readable media for providing a first DNN model of reduced size for deployment on devices by “learning” the first DNN from a second DNN with larger capacity (number of hidden nodes). To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used for training the smaller “student” DNN”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention and having the teachings of Jia, Frey, and Li, to use Li's method of running the second NN at a higher level of precision than the first NN to implement the machine learning system of Jia and Frey. The suggestion and/or motivation to do so is to improve the efficiency of the system, as training the low-precision network using the high-precision network helps reduce the use of computational resources by reducing the amount of memory needed to store the network.

Claims 2-6 are rejected under 35 U.S.C. 103 as being unpatentable over Jia (US 20180173971 A1) in view of Frey (Frey et al, 2015, “Cloud Storage Prediction with Neural Networks”) in view of Li (US 20160078339 A1), and further in view of Brothers (US 20160358070 A1).

Regarding claim 2, Jia in view of Frey and further in view of Li teaches the method of claim 1 wherein generating the updated version of the first NN ([Jia, 0061] “After training is complete, the training system 110 can provide a final set of model parameter values 171 to the on-board system 130 for use in making fully autonomous or semi-autonomous driving decisions. The training system 110 can provide the final set of model parameter values 171 by a wired or wireless connection to the on-board system 130.” This paragraph teaches the process of sending the parameter to the on-board system (remote).
[Jia, 0049] “The on-board neural network subsystem 134 can implement the operations of each layer of a neural network by loading a collection of model parameters 172 that are received from the training system 110.” [Jia, 0051] “The on-board neural network subsystem 134 uses the input sensor data 155 to generate one or more object detection predictions 165.” The On-Board Neural Network Subsystem 134 comprises the first neural network, receives the updated parameters and generates new estimates.).
Jia in view of Frey and further in view of Li does not specifically teach reducing a level of precision of synapse weights from the second NN for use in the updated version of the first NN; and eliminating synapse weights in the updated version of the first NN having reduced-precision weights of zero.
Brothers teaches reducing a level of precision of synapse weights from the second NN for use in the updated version of the first NN ([Brothers, 0074] “In another arrangement, pruning may include reducing the precision of a numerical format of one or more numerical values of the neural network. For example, the neural network analyzer can analyze one or more weights in a number of layers of the neural network to determine whether the precision of the numerical format used for the weights may be reduced. By reducing the precision of numerical formats used, lower precision arithmetic hardware may in turn be used. Lower precision arithmetic hardware can be more power efficient and can be built more densely than higher precision arithmetic hardware. For purposes of illustration, a neural network that uses the minimum number of bits required to represent the range and precision of the parameters can achieve higher performance (e.g., faster runtime and/or lower power consumption) than a neural network using a greater number of bits than required.”); and 
eliminating synapse weights in the updated version of the first NN having reduced-precision weights of zero ([Brothers, 0074] “In another arrangement, pruning may include reducing the precision of a numerical format of one or more numerical values of the neural network. For example, the neural network analyzer can analyze one or more weights in a number of layers of the neural network to determine whether the precision of the numerical format used for the weights may be reduced. By reducing the precision of numerical formats used, lower precision arithmetic hardware may in turn be used. Lower precision arithmetic hardware can be more power efficient and can be built more densely than higher precision arithmetic hardware. For purposes of illustration, a neural network that uses the minimum number of bits required to represent the range and precision of the parameters can achieve higher performance (e.g., faster runtime and/or lower power consumption) than a neural network using a greater number of bits than required.” The pruning process comprises the process of eliminating weights.).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Jia, Frey, Li, and Brothers, to use the method of reducing the precision of the synapse weights of Brothers to implement the machine learning system of Jia, Frey, and Li. The suggestion and/or motivation to do so is to improve the efficiency of the system, as reducing the precision of the synapse weights reduces the amount of memory space required to store the network.
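As an illustrative sketch only, the two limitations of claim 2 (reducing the precision of synapse weights, then eliminating weights whose reduced-precision value is zero) can be pictured as follows. The fixed quantization step, the dictionary layout, and all names and values are assumptions made for illustration; they are not taken from Brothers or from the application.

```python
def quantize(weight, step=0.25):
    """Round a weight to the nearest multiple of `step` (a coarser numeric grid)."""
    return round(weight / step) * step

def reduce_and_prune(synapses, step=0.25):
    """`synapses` maps (source, target) -> full-precision weight.

    Returns the reduced-precision synapses, dropping any that quantize to zero.
    """
    reduced = {edge: quantize(w, step) for edge, w in synapses.items()}
    return {edge: w for edge, w in reduced.items() if w != 0}

# Hypothetical weights taken from the higher-precision (second) network.
second_nn = {
    ("a", "h1"): 0.93,
    ("b", "h1"): 0.04,
    ("a", "h2"): -0.51,
    ("b", "h2"): -0.09,
}

# Reduced-precision, sparser weights for the updated first network.
updated_first_nn = reduce_and_prune(second_nn)
```

In this example the two small weights round to zero and are removed, so the updated network stores two synapses instead of four.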

Regarding claim 3, Jia in view of Frey in view of Li and further in view of Brothers teaches wherein generating the updated version of the first NN further includes eliminating, from the first updated version of the first NN, nodes of the second NN whose input synapses have all been eliminated in the updated version of the first NN ([Brothers, Claim 11] “11. The method of claim 10, wherein pruning the portion of the first neural network comprises performing an operation selected from the group consisting of using a different numerical format for weights of the first neural network, removing a feature map of a layer of the first neural network, zeroing a convolution kernel of the first neural network, and removing a layer of the first neural network.” The result of the pruning process can be interpreted as the second neural network.).
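As an illustrative sketch only, the limitation of claim 3 (eliminating nodes whose input synapses have all been eliminated) can be pictured as a single pass over a pruned network. The graph representation and all names are assumptions made for illustration, and the sketch does not repeat the pass to handle nodes that become input-less as a consequence of an earlier removal.

```python
def prune_dead_nodes(nodes, synapses, input_nodes):
    """Drop non-input nodes that have no surviving incoming synapse.

    `synapses` maps (source, target) -> weight. Synapses leaving a removed
    node are dropped as well, since their source no longer exists.
    """
    has_input = {target for (_source, target) in synapses}
    kept = [n for n in nodes if n in input_nodes or n in has_input]
    kept_set = set(kept)
    surviving = {
        edge: w
        for edge, w in synapses.items()
        if edge[0] in kept_set and edge[1] in kept_set
    }
    return kept, surviving

# Hypothetical pruned network: every synapse into "h2" was already eliminated.
nodes = ["a", "b", "h1", "h2", "out"]
synapses = {("a", "h1"): 1.0, ("h1", "out"): 0.5, ("h2", "out"): 0.25}

kept_nodes, kept_synapses = prune_dead_nodes(nodes, synapses, input_nodes={"a", "b"})
```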

Regarding claim 4, Jia in view of Frey in view of Li and further in view of Brothers teaches wherein reducing the level of precision of the synapse weights includes discretizing the synapse weights to zero or one by rounding using a threshold value ([Brothers, 0062] “The neural network analyzer may scan the layers of the neural network to identify one or more convolution kernels that are candidates for convolution kernel substitution. In one example, the replacement convolution kernel may be a modified version of the convolution kernel selected for substitution. The neural network analyzer may evaluate the selected convolution kernel and, for weights of the selected convolution kernel that are less than a threshold value, change the weights to zero. In that case, the neural network analyzer generates a replacement convolution kernel that includes more zero weights than the selected convolution kernel thereby reducing the number of computations necessary for executing the modified neural network compared to the first neural network.” The process of rounding the weight value to zero by using a threshold value corresponds to the discretizing process.).
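As an illustrative sketch only, discretizing weights to zero or one by rounding against a threshold, as recited in claim 4, can be pictured as follows. The threshold value and the function name are assumptions made for illustration; the quoted Brothers passage describes only setting weights below a threshold to zero.

```python
def discretize(weights, threshold=0.5):
    """Map each weight to 1 if it meets the threshold, otherwise to 0."""
    return [1 if w >= threshold else 0 for w in weights]

# Hypothetical full-precision weights.
binary_weights = discretize([0.1, 0.5, 0.9, -0.3])
```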

Regarding claim 5, Jia in view of Frey in view of Li and further in view of Brothers teaches wherein generating the updated version of the first NN further includes, prior to reducing the level of precision and eliminating synapse weights, replacing the second NN with an updated version of the second NN, based at least in part on the performance metrics and the first and second sets of behavioral estimates ([Brothers, Claim 11] “11. The method of claim 10, wherein pruning the portion of the first neural network comprises performing an operation selected from the group consisting of using a different numerical format for weights of the first neural network, removing a feature map of a layer of the first neural network, zeroing a convolution kernel of the first neural network, and removing a layer of the first neural network.” The result of the pruning process can be interpreted as the second neural network.
[Brothers, 0050-0051] “… In one embodiment, the neural network analyzer performs validation by forward propagating validation test sets. The neural network analyzer may be configured to perform validation automatically ... In block 335, the neural network analyzer may determine whether the performance of the second neural network is acceptable. In one arrangement, the neural network analyzer may compare the performance of the second neural network with the performance of the first neural network. For example, the neural network analyzer may automatically run the same validation test sets on the first neural network and the second neural network. The neural network analyzer may compare any of the metrics described herein such as the prediction value, power consumption, and/or runtime between the two neural networks to compute the difference. If the difference is greater than a desired threshold, e.g., as specified by one or more of the performance requirements, then the performance of the modified neural network is not acceptable.” The validation is done by propagating validation test sets, which corresponds to the first and second output, and performance metric corresponds to the output of the neural network analyzer such as power consumption and runtime.).

Regarding claim 6, Jia in view of Frey in view of Li and further in view of Brothers teaches wherein generating the updated version of the first NN further includes adding nodes and synapses to the updated version of the first NN yielding a confidence estimate as an output value of the updated version of the first NN ([Brothers, Claim 11] “11. The method of claim 10, wherein pruning the portion of the first neural network comprises performing an operation selected from the group consisting of using a different numerical format for weights of the first neural network, removing a feature map of a layer of the first neural network, zeroing a convolution kernel of the first neural network, and removing a layer of the first neural network.” The result of the pruning process can be interpreted as the second neural network.
[Brothers, 0050-0051] “… In one embodiment, the neural network analyzer performs validation by forward propagating validation test sets. The neural network analyzer may be configured to perform validation automatically ... In block 335, the neural network analyzer may determine whether the performance of the second neural network is acceptable. In one arrangement, the neural network analyzer may compare the performance of the second neural network with the performance of the first neural network. For example, the neural network analyzer may automatically run the same validation test sets on the first neural network and the second neural network. The neural network analyzer may compare any of the metrics described herein such as the prediction value, power consumption, and/or runtime between the two neural networks to compute the difference. If the difference is greater than a desired threshold, e.g., as specified by one or more of the performance requirements, then the performance of the modified neural network is not acceptable.” The validation is done by propagating validation test sets, which corresponds to the first and second output, and performance metric corresponds to the output of the neural network analyzer such as power consumption and runtime.).

Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Jia (US 20180173971 A1) in view of Frey (Frey et al, 2015, “Cloud Storage Prediction with Neural Networks”), in view of Li (US 20160078339 A1), and further in view of Harnik (US 20170199895 A1).

Regarding claim 8, Jia in view of Frey and further in view of Li teaches the method of claim 1. 
Jia in view of Frey and further in view of Li does not specifically teach wherein the behavioral estimates include estimates of a data reduction ratio achieved by the remote DSA.
Harnik teaches wherein the behavioral estimates include estimates of a data reduction ratio achieved by the remote DSA ([Harnik, 0062] “Find a weighted duplication frequency histogram x′ under the constraint that $\sum_{i=1}^{\mathrm{size}(x)} i \cdot x_i = N \cdot CR$ (5) [0063] such that the distance between T(x′) and the observed weighted duplication frequency histogram y is minimal. [0064] The estimated number of chunks required to store the entire compressed and deduplicated dataset is then $C = \sum_{i=1}^{\mathrm{size}(x')} x'_i$ (6) [0065] and the estimated data reduction ratio is $R = \frac{C}{N}$.”).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Jia, Frey, Li, and Harnik, to use behavioral estimates that include estimates of a data reduction ratio achieved by the remote DSA, as taught by Harnik, to implement the machine learning system of Jia, Frey, and Li. The suggestion and/or motivation to do so is to improve the efficiency of the system, as reducing the synapse weights reduces the amount of memory space required to store the network, and validating the reduction ratio is important for measuring the weight reduction performance of the system.
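As an illustrative sketch only, the final step of the quoted Harnik passage (summing the entries of the fitted histogram x′ to obtain the estimated chunk count C, then dividing by N to obtain the data reduction ratio R) can be worked through numerically. The histogram entries and the value of N are hypothetical, the function name is an assumption, and the sketch does not perform the constrained histogram fit that precedes this step.

```python
def data_reduction_ratio(fitted_histogram, n):
    """Return (C, R), where C is the sum of the histogram entries and R = C / N."""
    chunks_needed = sum(fitted_histogram)
    return chunks_needed, chunks_needed / n

# Hypothetical fitted histogram x' and hypothetical N.
C, R = data_reduction_ratio([40, 20, 10], n=140)
```

With these values C is 70 and R is 0.5, meaning the reduced dataset is estimated to need half as many chunks as N.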

Regarding claim 9, Jia in view of Frey in view of Li and further in view of Harnik teaches wherein the behavioral estimates include estimates of a data compression ratio achieved by the remote DSA ([Harnik, 0047] “Finally, in a determination step 78, processor 24 determines a deduplication ratio based on the identified optimal histogram in a determination step 78, and the method ends. In some embodiments, the deduplication ratio indicates a first space savings for implementing deduplication on the sample number of chunks 34. In additional embodiments, processor 24 can estimate a compression ratio for each of the sample chunks, and can determine a second space savings based on the compression ratios and the deduplication ratio.”).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Jia (US 20180173971 A1) in view of Frey (Frey et al, 2015, “Cloud Storage Prediction with Neural Networks”), in view of Li (US 20160078339 A1), and further in view of TURAKHIA (US 20180164866 A1).

Regarding claim 14, Jia teaches wherein the first NN has a first size, the first NN being stored entirely within a memory of the apparatus ([Jia, 0058] “The training neural network subsystem 114 can compute the operations of each layer of the neural network using current values of parameters 115 stored in a collection of model parameters 170. Although illustrated as being logically separated, the model parameters 170 and the software or hardware modules performing the operations may actually be located on the same computing device or on the same memory device.” The memory device corresponds to the cache of the apparatus, and the training neural network subsystem 114 corresponds to the first NN.); 
Jia does not specifically teach wherein the second NN has a second size larger than the first size, the second NN exceeding a size of the cache of the apparatus.
Jia in view of Frey, and further in view of Li teaches wherein the second NN has a second size larger than the first size, the second NN exceeding a size of the cache of the apparatus ([Li, 0016] “Various aspects of the technology described herein are generally directed to among other things, systems, methods, and computer-readable media for providing a first DNN model of reduced size for deployment on devices by “learning” the first DNN from a second DNN with larger capacity (number of hidden nodes). To learn a DNN with a smaller number of hidden nodes, a larger size (more accurate) “teacher” DNN is used for training the smaller “student” DNN.” If the first NN matches the cache size and the second NN is bigger than the first NN, it is obvious that the second NN exceeds the size of the cache of the apparatus.).
Even though Jia in view of Frey and further in view of Li teaches the NN stored in the memory, Jia in view of Frey, and further in view of Li does not specifically teach the NN being stored entirely within a cache of the apparatus.
TURAKHIA teaches the NN being stored entirely within a cache of the apparatus ([TURAKHIA, 0040] “The operand storage 308 may be a memory or a cache for storing operands that are to be loaded to the multipliers of the computation units 314. In one configuration, for each pair of operands, the first operand may be a weight of the neural network, and the second operand may be an activation of the neural network.”).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Jia, Frey, Li, and TURAKHIA, to use the NN being stored entirely within a cache of the apparatus, as taught by TURAKHIA, to implement the machine learning system of Jia, Frey, and Li. The suggestion and/or motivation to do so is to improve the efficiency of the system, as storing the neural network in the cache memory improves the speed of the neural network operation.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Jia (US 20180173971 A1) in view of Frey (Frey et al, 2015, “Cloud Storage Prediction with Neural Networks”), in view of Li (US 20160078339 A1), and further in view of Golden (US 11144638 B1).

Regarding claim 17, Jia in view of Frey and further in view of Li teaches the method of claim 15, wherein utilizing the first set of behavioral estimates ([Jia, 0047] “The sensor subsystems 132, the on-board neural network subsystem 134, or some combination of both, transform raw sensor data into the multiple channels of input sensor data 155. To do so, the on-board system 130 can project the various characteristics of the raw sensor data into a common coordinate system. The various characteristics of the raw sensor data, and their respective representations, will be discussed in more detail below with reference to FIG. 3.”
[Jia, 0048] “The on-board neural network subsystem 134 implements the operations of each layer of a set of neural networks that are trained to make predictions related to object detection, i.e., related to detecting objects in the environment surrounding the vehicle.”).
Jia in view of Frey and further in view of Li does not specifically teach throttling intake of write commands based in part on a data reduction ratio of the first set of behavioral estimates.
Golden teaches throttling intake of write commands based in part on a data reduction ratio of the first set of behavioral estimates ([Golden, Claim 17] “17. The storage system of claim 14, wherein the one or more storage system controllers operatively coupled to determine comprises the one or more storage system controllers operatively coupled to: detect a write of data that compresses or deduplicates to a data reduction amount differing from historical compression or deduplication in the storage system.”).
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teachings of Jia, Frey, Li, and Golden, to use the throttling of the intake of write commands based in part on a data reduction ratio of the first set of behavioral estimates, as taught by Golden, to implement the machine learning system of Jia, Frey, and Li. The suggestion and/or motivation to do so is to improve the efficiency of the system, as throttling the intake of write commands based on a data reduction ratio reduces the use of unnecessary write commands.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Regarding monitoring storage performance of data storage apparatuses:
US 20190065901 A1
Chen, 1993, “Storage Performance-Metrics and Benchmarks”
US 7505949 B2
US 5737519 A
US 20150193697 A1
US 20190303980 A1
US 20180275667 A1
US 20200252682 A1
US 20200125956 A1
US 10127234 B1
Mishra & Marr, 2017, “APPRENTICE: USING KNOWLEDGE DISTILLATION TECHNIQUES TO IMPROVE LOW-PRECISION NETWORK ACCURACY”
Wang et al, 2018, “Privacy Preserving Distributed Deep Learning and Its Application in Credit Card Fraud Detection”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached on 7:30 AM - 5:30 PM. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/JUN KWON/
Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/
Supervisory Patent Examiner, Art Unit 2127