DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 09/12/2016.
This action is in response to arguments and/or remarks filed on 01/14/2021. In the current amendments, claims 1-2, 10, 12-13, 15-16 and 21 have been amended and claims 1-22 are pending and have been examined. 



Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/16/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 01/14/2021 has been entered.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 10-15 and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”; hereinafter: Han) in view of Han et al. (“Learning both Weights and Connections for Efficient Neural Networks”; hereinafter: Han2) and further in view of Gao et al. (“DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples”; hereinafter: Gao).
Regarding claim 1 (Currently Amended)
Han teaches a method, comprising: identifying one or more parameters representative of one or more redundant nodes  (pg. 2 third paragraph “To achieve this goal, we present “deep compression”: a three stage pipeline (Figure 1) to reduce the storage required by neural network in a manner that preserves the original accuracy. First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”) from a plurality of layers of a neural network model, (pg. 3 section 3 “For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy”)
including retrieving one or more parameters of a set of parameters (pg. 2 section 2 “Pruning reduced the number of parameters by 9× and 13× for AlexNet and VGG-16 model”) representative of a neural network model at least in part by identifying and removing one or more parameters representative of one or more redundant nodes from one or more layers of the neural network model, (pg. 2 third paragraph “To achieve this goal, we present “deep compression”: a three stage pipeline (Figure 1) to reduce the storage required by neural network in a manner that preserves the original accuracy. First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)
…
and storing the compressed set of parameters (pg. 2 section 2 “Pruning reduced the number of parameters by 9× and 13× for AlexNet and VGG-16 model”) representative of the neural network model in the at least one memory of the at least one computing device. (Pg. 9 section 6.3 third paragraph “For the pruned sparse layer, we stored the sparse matrix in in CSR format, and used cuSPARSE CSRMV kernel, which is optimized for sparse matrix-vector multiplication on GPU[corresponds to computing device with memory]”)
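As an illustrative sketch only, the CSR (compressed sparse row) storage named in the cited passage of Han can be pictured as follows. The function names `to_csr` and `csr_matvec` and the example matrix are hypothetical and are not taken from Han or from cuSPARSE; the sketch is plain Python rather than a GPU kernel.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) into CSR arrays."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:              # only surviving (unpruned) weights are stored
                values.append(w)
                col_indices.append(j)
        row_ptr.append(len(values))   # marks where the next row starts
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Sparse matrix-vector product y = A @ x using the CSR arrays."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


# Hypothetical pruned 3x4 weight matrix (zeros are pruned connections).
pruned = [
    [0.0, 1.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [-2.0, 0.0, 0.0, 1.0],
]
values, col_indices, row_ptr = to_csr(pruned)
y = csr_matvec(values, col_indices, row_ptr, [1.0, 2.0, 3.0, 4.0])
```

Only three of the twelve weights are stored in this example, together with their column indices and the row pointer array.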
Han does not teach …including retrieving one or more parameters of the set of parameters representative of the neural network model from at least one memory of the at least one computing device,
combining one or more parameters representative of one or more nodes of one or more mask layers with the set of parameters representative of the neural network model including inserting a first mask layer of the one or more mask layers between first and second layers of the plurality of layers of the neural network model,
and training the one or more parameters representative of the one or more nodes of the one or more mask layers via one or more machine operations;
generating a compressed set of parameters representative of the neural network model at least in part by removing the one or more parameters representative of the one or more redundant nodes from the one or more layers of the neural network model;
Han2 teaches …including retrieving one or more parameters of the set of parameters representative of the neural network model (pg. 3 section 3 “The final step retrains the network to learn the final weights[corresponds to parameters] for the remaining sparse connections. This step is critical. If the pruned network is used without retraining, accuracy is significantly impacted.”) from at least one memory of the at least one computing device (pg. 4 section 4 “We carried out the experiments on Nvidia TitanX and GTX980 GPUs[corresponds to computing device].”)
combining one or more parameters representative of one or more nodes of one or more mask layers with the set of parameters representative of the neural network model… and training the one or more parameters representative of the one or more nodes of the one or more mask layers via one or more machine operations; (Under its broadest reasonable interpretation, Examiner notes that the mask applied to the filter corresponds to a mask layer and a value of mask(at any given position) is a parameter of a mask node. See pg. 4 section 3.5 “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining[corresponds to training] phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons[corresponds to node] will be automatically removed during retraining.”)
generating a compressed set of parameters (Examiner notes that pruning reduces the number of weights and therefore yields a compressed set; see pg. 4-5 “Table 1 shows pruning saves  parameters on these networks. For each layer of the network the table shows (left to right) the original number of weights, the number of floating point operations to compute that layer’s activations, the average percentage of activations that are non-zero, the percentage of non-zero weights after pruning, and the percentage of actually required floating point operations.”) representative of the neural network model at least in part by removing the one or more parameters representative of the one or more redundant nodes from the one or more layers of the neural network model; (Examiner notes that Han2 teaches a method that prunes redundant connections, as described in the abstract: “Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections.” Section 3.5 further describes removing one or more parameters of the redundant connections [corresponds to nodes] from one or more layers; see pg. 4 section 3.5 “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons[corresponds to nodes] will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining.”)
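As an illustrative sketch only, the condition quoted from Han2 section 3.5 (a neuron with zero input connections or zero output connections may be pruned) can be expressed as an explicit check. Han2 describes this removal as occurring automatically during retraining; the explicit pass below, the function name `prune_dead_neurons`, and the example weights are assumptions made for illustration.

```python
def prune_dead_neurons(w_in, w_out):
    """Drop hidden neurons that have no input or no output connections.

    w_in:  rows are hidden neurons, columns are inputs.
    w_out: rows are outputs, columns are hidden neurons.
    Returns the reduced matrices and the indices of the neurons kept.
    """
    keep = []
    for j in range(len(w_in)):
        has_input = any(w != 0.0 for w in w_in[j])
        has_output = any(row[j] != 0.0 for row in w_out)
        if has_input and has_output:
            keep.append(j)
    new_w_in = [w_in[j] for j in keep]
    new_w_out = [[row[j] for j in keep] for row in w_out]
    return new_w_in, new_w_out, keep


# Hypothetical weights after connection pruning (0.0 marks a pruned connection).
w_in = [
    [0.2, 0.0],    # neuron 0: has an input connection
    [0.0, 0.0],    # neuron 1: zero input connections
    [0.1, -0.3],   # neuron 2: has inputs but, below, zero output connections
]
w_out = [
    [0.5, 0.7, 0.0],
    [0.4, 0.0, 0.0],
]
new_w_in, new_w_out, kept = prune_dead_neurons(w_in, w_out)
```

In this example only neuron 0 survives, so the parameters of neurons 1 and 2 are removed from both layers.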
Han and Han2 are analogous art because they are both directed to convolutional neural networks.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han to incorporate the teaching of Han2 to include an efficient neural network method “to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections” as disclosed by Han2 (abstract).
Han in view of Han2 does not teach …the neural network model including inserting a first mask layer of the one or more mask layers between first and second layers of the plurality of layers of the neural network model.
Gao teaches …the neural network model including inserting a first mask layer of the one or more mask layers between first and second layers of the plurality of layers of the neural network model. (Pg. 2 section 3 “To remove those unnecessary features, we insert a mask layer in a DNN model right before the linear layer handling classification. The mask layer serves as a selector, which will keep the necessary features and remove the unnecessary features by setting them to 0. The size of the feature vector is kept unchanged. The proposed structure is shown in Figure 1”)
Han, Han2 and Gao are analogous art because they are all directed to neural networks.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Han2 to incorporate the teaching of Gao in order to address a classifier that extracts many unnecessary features in the training process and is thereby exposed to adversarial attacks. 
One of ordinary skill in the art would have been motivated to make this modification in order to include a method for removing “unnecessary features for a DNN model by directly modifying its network structure” to enhance computation efficiency as disclosed by Gao (abstract).
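As an illustrative sketch only, a mask layer inserted between a first layer and a second layer, acting as a selector that sets unnecessary features to 0 while keeping the feature vector size unchanged, can be pictured as follows. The function names, weights, and mask values are hypothetical and are not taken from Gao.

```python
def linear(weights, x):
    """Apply a linear layer: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mask_layer(mask, features):
    """Element-wise selector: keeps a feature where mask is 1, zeroes it where mask is 0."""
    return [m * f for m, f in zip(mask, features)]


def forward(w_first, mask, w_second, x):
    """First layer -> inserted mask layer -> second (classification) layer."""
    features = linear(w_first, x)
    selected = mask_layer(mask, features)   # same length as features
    return linear(w_second, selected)


# Hypothetical two-layer model with a mask layer inserted between the layers.
w_first = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_second = [[1.0, 1.0, 1.0]]
mask = [1, 0, 1]                            # the second feature is removed
x = [2.0, 3.0]

masked_out = forward(w_first, mask, w_second, x)
unmasked_out = forward(w_first, [1, 1, 1], w_second, x)
```

The masked feature contributes nothing to the second layer, which is why its parameters become candidates for removal.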

Regarding claim 2 
Han in view of Han2 with Gao teaches claim 1.
Han further teaches wherein the set of parameters representative of the neural network model comprises one or more parameters (pg. 6 section 5.2 “We further examine the performance of Deep Compression on the ImageNet ILSVRC-2012 dataset, which has 1.2M training examples and 50k validation examples. We use the AlexNet Caffe model as the reference model, which has 61 million parameters and achieved a top-1 accuracy of 57.2% and a top-5 accuracy of 80.3%.”) representative of one or more convolutional layers of the neural network model. (Pg. 6 section 5.3 “LeNet-300-100 is a fully connected network with two hidden layers, with 300 and 100 neurons each, which achieves 1.6% error rate on Mnist. LeNet-5 is a convolutional network that has two convolutional layers and two fully connected layers, which achieves 0.8% error rate on Mnist.”)

Regarding claim 13
	Claim 13 is an apparatus claim corresponding to method claim 2 and is rejected for the same reasons as given in the rejection of claim 2. 

Regarding claim 3 
Han in view of Han2 with Gao teaches claim 1.
Han2 further teaches wherein the generating the compressed set of parameters representative of the neural network model further includes removing the one or more parameters representative of the one or more nodes of the one or more mask layers from the set of parameters representative of the neural network model. (Under its broadest reasonable interpretation, Examiner notes that the mask applied to the filter corresponds to a mask layer and a value of the mask (at any given position) is a parameter of a mask node. See pg. 4 section 3.5 “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining.”)
Han, Gao and Han2 are analogous art because they are all directed to neural networks.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Gao to incorporate the teaching of Han2 to include an efficient neural network method “to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections” as disclosed by Han2 (abstract).

Regarding claim 14
	Claim 14 is an apparatus claim corresponding to method claim 3 and is rejected for the same reasons as given in the rejection of claim 3. 

Regarding claim 4 
Han in view of Han2 with Gao teaches claim 3.
Han2 further teaches wherein the generating the compressed set of parameters representative of the neural network model further includes retraining the set of parameters representative of the neural network model responsive at least in part to the removing the one or more parameters representative of the one or more nodes of the one or more mask layers from the set of parameters representative of the neural network model. (Under its broadest reasonable interpretation, Examiner notes that the mask applied to the filter corresponds to a mask layer and a value of the mask (at any given position) is a parameter of a mask node. See pg. 4 section 3.5 “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining.”)
Han, Gao and Han2 are analogous art because they are all directed to neural networks.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Gao to incorporate the teaching of Han2 to include an efficient neural network method “to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections” as disclosed by Han2 (abstract).


Regarding claim 15
	Claim 15 is an apparatus claim corresponding to method claim 4 and is rejected for the same reasons as given in the rejection of claim 4. 

Regarding claim 10 (Currently Amended)
Han in view of Han2 with Gao teaches claim 4. 
Han2 further teaches wherein the combining the one or more parameters representative of the one or more nodes of the one or more mask layers with the set of parameters representative of the neural network model further includes adding one or more mask layers for respective individual layers of the plurality of layers of the neural network model except for input and/or output layers. (Under its broadest reasonable interpretation, Examiner notes that the mask applied to the filter corresponds to a mask layer and a value of the mask (at any given position) is a parameter of a mask node. See pg. 4 section 3.5 “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining[corresponds to training] phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons[corresponds to node] will be automatically removed during retraining.”)
Han, Gao and Han2 are analogous art because they are all directed to neural networks.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Gao to incorporate the teaching of Han2 to include an efficient neural network method “to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections” as disclosed by Han2 (abstract).

Regarding claim 21
	Claim 21 is an apparatus claim corresponding to method claim 10 and is rejected for the same reasons as given in the rejection of claim 10. 

Regarding claim 11 
Han in view of Han2 with Gao teaches claim 10.
Han further teaches the method of claim 10 wherein the storing the compressed set of parameters representative of the neural network model in the at least one memory of the at least one computing device includes storing the compressed set of parameters representative of the neural network model in a non-sparse format. (Pg. 3 section 3 second paragraph “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights.” Examiner notes that storing the full set of parameters corresponds to storing in a non-sparse format in memory; see pg. 5 section 5 “The compression pipeline saves network storage by 35× to 49× across different networks without loss of accuracy. The total size of AlexNet decreased from 240MB to 6.9MB, which is small enough to be put into on-chip SRAM, eliminating the need to store the model in energy-consuming DRAM memory.”)
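As an illustrative sketch only, the weight sharing described in the cited passage of Han (each weight stored as a small index into a table of shared weights) can be pictured as follows. The nearest-value assignment used here stands in for the clustering step, and the function names, codebook values, and the 32-bit width assumed for each shared value are assumptions made for illustration.

```python
import math


def assign_to_codebook(weights, codebook):
    """Replace each weight by the index of the nearest shared value."""
    return [
        min(range(len(codebook)), key=lambda k: abs(w - codebook[k]))
        for w in weights
    ]


def reconstruct(indices, codebook):
    """Look each index up in the table of shared weights."""
    return [codebook[i] for i in indices]


def storage_bits(num_weights, num_shared, bits_per_value=32):
    """Bits needed for the per-weight indices plus the codebook itself."""
    index_bits = math.ceil(math.log2(num_shared))
    return num_weights * index_bits + num_shared * bits_per_value


# Hypothetical codebook with 4 bins, as in the 4 x 4 example of the cited passage.
codebook = [-1.0, 0.0, 1.5, 2.0]
weights = [2.1, -0.9, 1.4, 0.1]
indices = assign_to_codebook(weights, codebook)
shared = reconstruct(indices, codebook)

dense_bits = 16 * 32                  # 16 weights stored as 32-bit values
shared_bits = storage_bits(16, 4)     # 16 two-bit indices plus 4 shared values
```

Every weight position is still stored, only as an index rather than a full value, which is the sense in which the layout is dense rather than sparse.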

Regarding claim 22
	Claim 22 is an apparatus claim corresponding to method claim 11 and is rejected for the same reasons as given in the rejection of claim 11. 

Regarding claim 12 (Currently Amended)
Han teaches an apparatus, comprising: at least one processor of at least one computing device, (pg. 9 section 6.3 third paragraph “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor. To run the benchmark on GPU, we used cuBLAS GEMV for the original dense layer”)
a set of parameters (pg. 2 section 2 “Pruning reduced the number of parameters by 9× and 13× for AlexNet and VGG-16 model”) representative of the neural network model to identify one or more parameters representative of one or more redundant nodes from one or more layers of the neural network model, (pg. 2 third paragraph “To achieve this goal, we present “deep compression”: a three stage pipeline (Figure 1) to reduce the storage required by neural network in a manner that preserves the original accuracy. First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)
the processor further to store the compressed set of parameters representative of the neural network model in the at least one memory of the at least one computing device. (Pg. 9 section 6.3 third paragraph “For the pruned sparse layer, we stored the sparse matrix in in CSR format, and used cuSPARSE CSRMV kernel, which is optimized for sparse matrix-vector multiplication on GPU[corresponds to computing device with memory]”)
Han does not teach …retrieve a set of parameters of the set of parameters representative of the neural network model from at least one memory of the at least one computing device,
combining one or more parameters representative of one or more nodes of one or more mask layers with the set of parameters representative of the neural network model including inserting a first mask layer of the one or more mask layers between first and second layers of the plurality of layers of the neural network model,

generating a compressed set of parameters representative of the neural network model at least in part by removing the one or more parameters representative of the one or more redundant nodes from the one or more layers of the neural network model. 
Han2 teaches …including retrieving one or more parameters of the set of parameters representative of the neural network model (pg. 3 section 3 “The final step retrains the network to learn the final weights[corresponds to parameters] for the remaining sparse connections. This step is critical. If the pruned network is used without retraining, accuracy is significantly impacted.”) from at least one memory of the at least one computing device (pg. 4 section 4 “We carried out the experiments on Nvidia TitanX and GTX980 GPUs[corresponds to computing device].”)
combining one or more parameters representative of one or more nodes of one or more mask layers with the set of parameters representative of the neural network model and training the one or more parameters representative of the one or more nodes of the one or more mask layers via one or more machine operations; (Under its broadest reasonable interpretation, Examiner notes that the mask applied to the filter corresponds to a mask layer and a value of mask(at any given position) is a parameter of a mask node. See pg. 4 section 3.5 “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining[corresponds to training] phase automatically arrives at the result where dead neurons will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons[corresponds to node] will be automatically removed during retraining.”)
generating a compressed set of parameters (Examiner notes that pruning reduces the number of weights and therefore yields a compressed set; see pg. 4-5 “Table 1 shows pruning saves  parameters on these networks. For each layer of the network the table shows (left to right) the original number of weights, the number of floating point operations to compute that layer’s activations, the average percentage of activations that are non-zero, the percentage of non-zero weights after pruning, and the percentage of actually required floating point operations.”) representative of the neural network model at least in part by removing the one or more parameters representative of the one or more redundant nodes from the one or more layers of the neural network model; (Examiner notes that Han2 teaches a method that prunes redundant connections, as described in the abstract: “Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections.” Section 3.5 further describes removing one or more parameters of the redundant connections [corresponds to nodes] from one or more layers; see pg. 4 section 3.5 “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The retraining phase automatically arrives at the result where dead neurons[corresponds to nodes] will have both zero input connections and zero output connections. This occurs due to gradient descent and regularization. A neuron that has zero input connections (or zero output connections) will have no contribution to the final loss, leading the gradient to be zero for its output connection (or input connection), respectively. Only the regularization term will push the weights to zero. Thus, the dead neurons will be automatically removed during retraining.”)
Han and Han2 are analogous art because they are both directed to convolutional neural networks.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han to incorporate the teaching of Han2 to include an efficient neural network method “to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections” as recognized by Han2 (abstract).
Han in view of Han2 does not teach …the neural network model including inserting a first mask layer of the one or more mask layers between first and second layers of the plurality of layers of the neural network model.
Gao teaches …the neural network model including inserting a first mask layer of the one or more mask layers between first and second layers of the plurality of layers of the neural network model. (Pg. 2 section 3 “To remove those unnecessary features, we insert a mask layer in a DNN model right before the linear layer handling classification. The mask layer serves as a selector, which will keep the necessary features and remove the unnecessary features by setting them to 0. The size of the feature vector is kept unchanged. The proposed structure is shown in Figure 1”)
Han, Han2 and Gao are analogous art because they are all directed to neural networks.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Han2 to incorporate the teaching of Gao in order to address a classifier that extracts many unnecessary features in the training process and is thereby exposed to adversarial attacks. 
One of ordinary skill in the art would have been motivated to make this modification in order to include a method for removing “unnecessary features for a DNN model by directly modifying its network structure” to enhance computation efficiency as disclosed by Gao (abstract).

Claims 5-9 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”; hereinafter: Han) in view of Han et al. (“Learning both Weights and Connections for Efficient Neural Networks”; hereinafter: Han2) in view of Gao et al. (“DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples”; hereinafter: Gao) and further in view of Yang et al. (“Object Detection and Viewpoint Estimation with Auto-masking Neural Network”; hereinafter: Yang).
Regarding claim 5 
Han in view of Han2 with Gao teaches claim 4.
Han further teaches …and a parameter representative of a floating point value. (pg. 2 second paragraph “Energy consumption is dominated by memory access. Under 45nm CMOS technology, a 32 bit floating point add consumes 0.9pJ, a 32bit SRAM cache access takes 5pJ, while a 32bit DRAM memory access takes 640pJ, which is 3 orders of magnitude of an add operation.”)
Han in view of Han2 and Gao does not teach wherein the one or more nodes of the one or more mask layers individually comprise a parameter representative of a Boolean variable.
Yang teaches wherein the one or more nodes of the one or more mask layers individually comprise a parameter representative of a Boolean variable (pg. 444 first paragraph “The mask layer is fully-connected to the output of CNNM with bounded rectified linear neurons, the outputs of which are… where mi ∈ [0, 1][corresponds to Boolean variable] is the response of a node in the mask layer”)
Han, Han2, Gao and Yang are analogous art because they are all directed to neural networks.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Han2 and Gao to incorporate the teaching of Yang to include mask layers that can select the most discriminative features from the input and pass them to the next level which also help deal with multiple tasks such as object detection and viewpoint estimation simultaneously as disclosed by Yang (pg. 442 fifth paragraph).
Regarding claim 16
	Claim 16 is an apparatus claim corresponding to method claim 5 and is rejected for the same reasons as given in the rejection of claim 5. 

Regarding claim 6 
Han in view of Han2 with Gao and Yang teaches claim 5. 
Han further teaches … iteratively assigning values (pg. 3 section 3 “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration.”)
Yang further teaches wherein the training the one or more parameters representative of the one or more nodes of the one or more mask layers comprises… assigning values to the Boolean variables of the one or more nodes of the one or more mask layers (pg. 444 first paragraph “The mask layer is fully-connected to the output of CNNM with bounded rectified linear neurons, the outputs of which are… where mi ∈ [0, 1][corresponds to Boolean variable] is the response of a node in the mask layer”) depending at least in part on the parameters representative of the floating point values of the one or more nodes of the one or more mask layers. (Examiner notes that Table 2 shows MPPE with floating point values for car dataset including 97.0, 95.3 and 99.9 see pg. 451 “MPPE obtained by seven models on the EPFL car dataset”)
Han, Han2, Gao and Yang are analogous art because they are all directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Han2 with Gao to incorporate the teaching of Yang to include mask layers that can select the most discriminative features from the input and pass them to the next level, which also helps deal with multiple tasks such as object detection and viewpoint estimation simultaneously, as recognized by Yang (pg. 442 fifth paragraph).

Regarding claim 17
	Claim 17 is an apparatus claim corresponding to method claim 6 and is rejected for the same reasons as given in the rejection of claim 6. 

Regarding claim 7 
Han in view of Han2 with Gao and Yang teaches claim 6. 
Han further teaches … iteratively assigning values (pg. 3 section 3 “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration.”)
Yang further teaches …the Boolean variables of the one or more nodes of the one or more mask layers further comprises assigning the values depending at least in part on a specified threshold parameter. (pg. 444 first paragraph “The mask layer is fully-connected to the output of CNNM with bounded rectified linear neurons, the outputs of which are… where mi ∈ [0, 1][corresponds to Boolean variable] is the response of a node in the mask layer” also Yang teaches threshold parameter on pg. 448 fifth paragraph “Then, the average of the bounding boxes of the objects in the same viewpoint θ is defined as fB(θ). For a training image patch containing an object in viewpoint θ, if a ratio T2 is larger than a threshold, then the patch is regarded as a positive sample; otherwise a negative sample.”)
Han, Han2, Gao and Yang are analogous art because they are all directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Han2 with Gao to incorporate the teaching of Yang to include mask layers that can select the most discriminative features from the input and pass them to the next level, which also helps deal with multiple tasks such as object detection and viewpoint estimation simultaneously, as recognized by Yang (pg. 442 fifth paragraph).
Regarding claim 18
	Claim 18 is an apparatus claim corresponding to method claim 7 and is rejected for the same reasons as given in the rejection of claim 7. 
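
For illustration only, and not as part of the grounds of rejection, the limitation discussed for claims 6 and 7 (assigning values to the Boolean variables depending at least in part on the floating point values and on a specified threshold parameter) might be sketched as below. The function name `assign_boolean_mask` and the default threshold of 0.5 are hypothetical; they are assumptions made for this sketch and do not appear in Han, Han2, Gao or Yang.

```python
def assign_boolean_mask(float_values, threshold=0.5):
    """Assign one Boolean per mask node from its floating point value.

    Hypothetical helper: a node is kept (True) only when its floating
    point value is strictly greater than the assumed threshold.
    """
    return [value > threshold for value in float_values]


# Example: four mask nodes with assumed floating point values.
mask = assign_boolean_mask([0.9, 0.1, 0.5, 0.7])
```

In this sketch a value equal to the threshold is assigned False; whether the comparison is strict is a design choice not addressed by the cited passages.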

Regarding claim 8 
Han in view of Han2 with Gao and Yang teaches claim 7. 
Han further teaches wherein …further comprises iteratively applying a weight decay parameter to the floating point values of the one or more nodes of the one or more mask layers, (Examiner notes that Han teaches iteratively pruning based on the weights that is based on the real value see pg. 2 section 2 and see the floating points on pg. 2 third paragraph “Energy consumption is dominated by memory access. Under 45nm CMOS technology, a 32 bit floating point add consumes 0.9pJ, a 32bit SRAM cache access takes 5pJ, while a 32bit DRAM memory access takes 640pJ, which is 3 orders of magnitude of an add operation”) wherein the weight decay parameter is increased for successive iterations. (pg. 11 fifth paragraph “Network pruning has been used both to reduce network complexity and to reduce over-fitting. An early approach to pruning was biased weight decay (Hanson & Pratt, 1989). Optimal Brain Damage (LeCun et al., 1989) and Optimal Brain Surgeon (Hassibi et al., 1993) prune networks to reduce the number of connections based on the Hessian of the loss function and suggest that such pruning is more accurate than magnitude-based pruning such as weight decay.”)
Yang further teaches …training the one or more parameters representative of the one or more nodes of the one or more mask layers (pg. 443-444 first paragraph “The whole network is trained automatically with the input HOG images and the target ground truth information… The mask layer is fully-connected to the output of CNNM with bounded rectified linear neurons, the outputs of which are… where mi ∈ [0, 1] is the response of a node in the mask layer, xi is the response of a node in the output of CNNM, wij is the weight between nodes i and j of the two layers, and N is the size of the mask”)
Han, Han2, Gao and Yang are analogous art because they are all directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Han2 with Gao to incorporate the teaching of Yang to include mask layers that can select the most discriminative features from the input and pass them to the next level, which also helps deal with multiple tasks such as object detection and viewpoint estimation simultaneously, as recognized by Yang (pg. 442 fifth paragraph).

Regarding claim 19
	Claim 19 is an apparatus claim corresponding to method claim 8 and is rejected for the same reasons as given in the rejection of claim 8. 

Regarding claim 9
Han in view of Han2 with Gao and Yang teaches claim 8. 
Han further teaches wherein the iteratively assigning the …and the iteratively applying the weight decay parameter continues until the training the one or more parameters representative of …is determined to not maintain a specified measure of neural network model accuracy. (Pg. 11 fifth paragraph “Network pruning has been used both to reduce network complexity and to reduce over-fitting. An early approach to pruning was biased weight decay (Hanson & Pratt, 1989). Optimal Brain Damage (LeCun et al., 1989) and Optimal Brain Surgeon (Hassibi et al., 1993) prune networks to reduce the number of connections based on the Hessian of the loss function and suggest that such pruning is more accurate than magnitude-based pruning such as weight decay.”) 
Yang further teaches ...values to the Boolean variables of the one or more nodes of the one or more mask layers (pg. 444 first paragraph “The mask layer is fully-connected to the output of CNNM with bounded rectified linear neurons, the outputs of which are… where mi ∈ [0, 1][corresponds to Boolean variable] is the response of a node in the mask layer” also Yang teaches threshold parameter on pg. 448 fifth paragraph “Then, the average of the bounding boxes of the objects in the same viewpoint θ is defined as fB(θ). For a training image patch containing an object in viewpoint θ, if a ratio T2 is larger than a threshold, then the patch is regarded as a positive sample; otherwise a negative sample.”)
Han, Han2, Gao and Yang are analogous art because they are all directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Han2 with Gao to incorporate the teaching of Yang to include mask layers that can select the most discriminative features from the input and pass them to the next level, which also helps deal with multiple tasks such as object detection and viewpoint estimation simultaneously, as disclosed by Yang (pg. 442 fifth paragraph).
Regarding claim 20
	Claim 20 is an apparatus claim corresponding to method claim 9 and is rejected for the same reasons as given in the rejection of claim 9.
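
For illustration only, and not as part of the grounds of rejection, the combination of limitations discussed for claims 8 and 9 (iteratively applying a weight decay parameter that is increased for successive iterations, and continuing until a specified measure of accuracy is no longer maintained) might be sketched as below. Every name and constant here (`train_mask`, `evaluate`, `decay_growth`, the toy accuracy function) is hypothetical and assumed for this sketch; none is taken from Han, Han2, Gao, Yang or the application.

```python
def train_mask(float_values, evaluate, min_accuracy, threshold=0.5,
               decay=0.01, decay_growth=2.0, max_iters=100):
    """Iteratively decay mask values and re-assign Booleans.

    Stops as soon as the candidate mask would fall below `min_accuracy`
    and returns the last mask that still maintained it.
    """
    values = list(float_values)
    mask = [v > threshold for v in values]
    for _ in range(max_iters):
        # Apply weight decay: shrink every floating point value toward zero.
        candidate_values = [v * (1.0 - decay) for v in values]
        # Re-assign the Boolean variables from the decayed values.
        candidate_mask = [v > threshold for v in candidate_values]
        if evaluate(candidate_mask) < min_accuracy:
            break  # accuracy not maintained; keep the previous mask
        values, mask = candidate_values, candidate_mask
        # Increase the decay parameter for the next iteration (capped at 1).
        decay = min(decay * decay_growth, 1.0)
    return mask, values


# Toy stand-in for model accuracy: the fraction of nodes still enabled.
def toy_accuracy(mask):
    return sum(mask) / len(mask)


final_mask, final_values = train_mask([0.9, 0.6, 0.55], toy_accuracy, 0.5)
```

The toy accuracy function only stands in for a real validation measure; in practice `evaluate` would run the masked network on held-out data.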


Response to Arguments
Applicant’s arguments with respect to claim(s) 1-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion
The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, is listed below:
Han et al. (“EIE: Efficient Inference Engine on Compressed Deep Neural Network”) teaches an efficient inference engine, a specialized accelerator that performs customized sparse matrix vector multiplication and handles weight sharing with no loss of efficiency.
Kaiming He (“Deep Residual Learning for Image Recognition”) teaches a residual learning framework that eases the training of networks and reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Tianxing He et al. (“Reshaping Deep Neural Network for Fast Decoding By Node-Pruning”) teaches that fully trained DNNs are pruned with a certain importance function and the reshaped DNN is retuned using back-propagation. Their approach requires no modification of code and can directly save computational costs during decoding.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598.  The examiner can normally be reached on Mon - Fri 8:00-5:00pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/V.M./Examiner, Art Unit 2126    
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126