DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The filing date of the present invention is 07/29/2019. 
This action is in response to the amendment and/or remarks filed on 09/29/2022. In the current amendments, claims 1-2, 4-6, 8 and 11-18 have been amended and claims 20-22 have been added. Claims 1-22 are currently pending and have been examined.
In response to the amendments and/or arguments filed on 09/29/2022, the claim objections made in the previous Office Action have been withdrawn.
In response to the amendments and/or arguments filed on 09/29/2022, the 35 U.S.C. 112(f) claim interpretation made in the previous Office Action has been withdrawn.
In response to the amendments and/or arguments filed on 09/29/2022, the 35 U.S.C. 112(b) rejections made in the previous Office Action have been withdrawn.


Response to Arguments
Applicant’s arguments with respect to claim(s) 1-19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 6-7, 9-10, 12-14, 16-18 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Abbasi-Asl et al. (“Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning”, hereinafter: Abbasi-Asl).
Regarding claim 1 (Currently Amended)
Han teaches a method of compressing an artificial neural network, (abstract “we introduce “deep compression”, a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35× to 49× without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9× to 13×; Quantization then reduces the number of bits that represent each connection from 32 to 5.”)
the method comprising: obtaining an initial value (section 6.3 “Fully connected layer dominates the model size (more than 90%) and got compressed the most by Deep Compression (96% weights pruned in VGG-16). In state-of-the-art object detection algorithms such as fast R-CNN (Girshick, 2015), upto 38% computation time is consumed on FC layers on uncompressed model. So it’s interesting to benchmark on FC layers, to see the effect of Deep Compression on performance and energy. Thus we setup our benchmark on FC6, FC7, FC8 layers of AlexNet and VGG-16.”) of a task accuracy for an inference task processed by the artificial neural network; (pg. 2 third paragraph “Our goal is to reduce the storage and energy required to run inference on such large networks so they can be deployed on mobile devices. To achieve this goal, we present “deep compression”: a three stage pipeline (Figure 1) to reduce the storage required by neural network in a manner that preserves the original accuracy. First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored. Finally, we apply Huffman coding to take advantage of the biased distribution of effective weights.” also see section 3 “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights.”)
compressing the artificial neural network by adjusting weights of connections among layers of the artificial neural network included in information regarding the connections; (pg. 2 “First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)
…
re-compressing the compressed artificial neural network according to the compression rate; (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1: “The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.”)
and generating, based on an input provided to the recompressed artificial neural network, an inference output (pg. 3 “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix… For example, Figure 3 shows the weights of a single layer neural network with four input units and four output units. There are 4×4 = 16 weights originally but there are only 4 shared weights: similar weights are grouped together to share the same value.”).
Han does not teach determining, based on the initial value of the task accuracy and a task accuracy of the compressed artificial neural network, a compression rate for the compressed artificial neural network.
Abbasi-Asl teaches determining, based on the initial value of the task accuracy and a task accuracy of the compressed artificial neural network, a compression rate for the compressed artificial neural network; (pg. 6 “Figure 2: Performance of compression for LeNet. The top figure shows the overall classification accuracy of LeNet when the first convolutional layer is compressed. The bottom figure shows the classification accuracy when the second convolutional layer is compressed. The classification accuracy of uncompressed network[corresponds to initial value] is shown with a dashed red line. The purple curve shows the classification accuracy of our proposed CAR compression algorithm for various compression ratios. The accuracy for the fine tuned (retrained) CAR compression is shown in blue.”)
Han and Abbasi-Asl are analogous art because they are both directed to compressing neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network compression of Han with the determination, based on the initial value of the task accuracy and a task accuracy of the compressed artificial neural network, of a compression rate as taught by Abbasi-Asl.
One of ordinary skill in the art would have been motivated to make this modification in order to allow “for better data adaptation and improves compression performance” using the greedy process with fine-tuning at each iteration, as disclosed by Abbasi-Asl (pg. 4 “One can also compress based on variants of our algorithm. One possibility is to avoid greedy process and remove several filters with lowest importance indices in one pass. This compression is faster, however the performance of the compressed network is worse than Algorithm 1 in the examples we tried. The greedy process with fine-tuning in each iteration seems to allow for a better data adaptation and improves compression performance.”).
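For illustration only, and not as a construction of the claims or a characterization of either reference, an accuracy-driven determination of a compression rate followed by re-compression can be sketched as follows. The function names (`prune_by_fraction`, `adjust_rate`, `compress_session`), the step size, the tolerance, and the policy of raising the rate while accuracy holds and lowering it otherwise are all hypothetical assumptions; `evaluate` stands in for any task-accuracy measurement.

```python
from typing import Callable, List, Tuple


def prune_by_fraction(weights: List[float], fraction: float) -> List[float]:
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = round(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def adjust_rate(rate: float, initial_acc: float, compressed_acc: float,
                step: float = 0.1, tolerance: float = 0.0) -> float:
    """Hypothetical policy: raise the rate while accuracy holds, else lower it."""
    if compressed_acc >= initial_acc - tolerance:
        return min(round(rate + step, 6), 1.0)
    return max(round(rate - step, 6), 0.0)


def compress_session(weights: List[float],
                     evaluate: Callable[[List[float]], float],
                     rate: float = 0.1, rounds: int = 5,
                     tolerance: float = 0.0
                     ) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Compress, evaluate, pick a new rate, and re-compress for a few rounds."""
    initial_acc = evaluate(weights)          # initial value of the task accuracy
    best = list(weights)                     # most recent network within tolerance
    history: List[Tuple[float, float]] = []  # (rate used, accuracy measured)
    for _ in range(rounds):
        candidate = prune_by_fraction(weights, rate)
        acc = evaluate(candidate)
        history.append((rate, acc))
        if acc >= initial_acc - tolerance:
            best = candidate
        rate = adjust_rate(rate, initial_acc, acc, tolerance=tolerance)
    return best, history
```

In this sketch the rate for each round depends only on the initial accuracy and the accuracy measured after the previous compression; any other policy could be substituted in `adjust_rate` without changing the loop.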
 
Regarding claim 11
Claim 11 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1. 
Regarding claim 2 (Currently Amended) 
Han in view of Abbasi-Asl teaches the method of claim 1.
Abbasi-Asl further teaches wherein the determining of the compression rate comprises: increasing, in response to the task accuracy of the compressed artificial neural network being less than the initial value, (pg. 3-4 “Again, the filters are pruned while the classification accuracy is in the range of relative 5% from the accuracy of uncompressed network, while increasing the compression ratio (for weights) from 1.4 to 42 for either layer – a stunning 30 fold increase. That is, further weight compression boosts the weight compression ratio by sparsifying weights of the kept filters, although the number of filters is the same as the CAR compression.”)
an initial compression rate, (algorithm 1 shows initial compression ratio at input stage)
corresponding to the compressing of the artificial neural network, to increase the task accuracy of the compressed artificial neural network (pg. 8 first paragraph “If we fine-tune or retrain the CAR-compressed network, the compression ratio is increased to 1.66 or 1.67 with the same number of filters (to maintain the same 54% classification accuracy). We have also reported the classification accuracy for the case when both layer one and two are compressed together with a small decrease of classification accuracy to 51% (an absolute 3% or a relative 5%). The network is fine tuned or retrained on the same classification task after CAR-compressing both layers.”).
Han and Abbasi-Asl are analogous art because they are both directed to compressing neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network compression of Han with the determination, based on the initial value of the task accuracy and a task accuracy of the compressed artificial neural network, of a compression rate as taught by Abbasi-Asl.
One of ordinary skill in the art would have been motivated to make this modification in order to allow “for better data adaptation and improves compression performance” using the greedy process with fine-tuning at each iteration, as disclosed by Abbasi-Asl (pg. 4 “One can also compress based on variants of our algorithm. One possibility is to avoid greedy process and remove several filters with lowest importance indices in one pass. This compression is faster, however the performance of the compressed network is worse than Algorithm 1 in the examples we tried. The greedy process with fine-tuning in each iteration seems to allow for a better data adaptation and improves compression performance.”).

Regarding claim 3
Han in view of Abbasi-Asl teaches the method of claim 1.
Han further teaches the method further comprising: performing a compression-evaluation operation to determine a task accuracy of the re-compressed artificial neural network (pg. 2 “We build on top of that approach. As shown on the left side of Figure 1, we start by learning the connectivity via normal network training. Next, we prune the small-weight connections: all connections with weights below a threshold are removed from the network. Finally, we retrain the network to learn the final weights for the remaining sparse connections. Pruning reduced the number of parameters by 9× and 13× for AlexNet and VGG-16 model.”)
and a compression rate for the re-compressed artificial neural network. (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1 and also see pg. 5 section 5 “Pruning is implemented by adding a mask to the blobs to mask out the update of the pruned connections. Quantization and weight sharing are implemented by maintaining a codebook structure that stores the shared weight, and group-by-index after calculating the gradient of each layer. Each shared weight is updated with all the gradients that fall into that bucket.”)
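The implementation passage quoted from Han (a mask that blocks updates to pruned connections, and gradients grouped by codebook index so that each shared weight receives every gradient in its bucket) can be sketched as below. This is a minimal sketch under assumptions: the function name, the flat list layout, and the plain gradient-descent step on summed gradients are illustrative and are not taken from Han's code.

```python
from typing import List


def update_shared_weights(codebook: List[float], indices: List[int],
                          grads: List[float], mask: List[bool],
                          lr: float) -> List[float]:
    """Return an updated codebook after one masked, grouped gradient step.

    codebook: shared weight values, one per bucket
    indices:  bucket index of each connection
    grads:    gradient of each connection
    mask:     False for pruned connections, whose gradients are ignored
    """
    summed = [0.0] * len(codebook)
    for idx, grad, keep in zip(indices, grads, mask):
        if keep:                      # pruned connections never contribute
            summed[idx] += grad       # group-by-index accumulation
    # Assumed update rule: subtract the learning rate times the bucket sum.
    return [c - lr * s for c, s in zip(codebook, summed)]
```

Because the update is applied to the codebook rather than to individual connections, all connections that share a bucket continue to share one value after the step.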

Regarding claim 4 (Currently Amended)
Han in view of Abbasi-Asl teaches the method of claim 3.
Han further teaches the method of claim 3, further comprising performing, based on an accuracy loss threshold and the task accuracy of the re-compressed artificial neural network, the operation. (Under the broadest reasonable interpretation, Examiner notes that the codebook retraining steps compare against an accuracy loss threshold and quantize the weights of the codebook; see “Figure 1: The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.”)

Regarding claim 6 (Currently Amended)
Han in view of Abbasi-Asl teaches the method of claim 1.
Han further teaches wherein the re-compressing of the artificial neural network comprises: determining, based on the compression rate and a task accuracy for a task processed by the compressed artificial neural network, (see pg. 3 section 3 “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy. To calculate the compression rate, given k clusters, we only need log2(k) bits to encode the index. In general, for a network with n connections and each connection is represented with b bits, constraining the connections to have only k shared weights will result in a compression rate…”)
a compression ratio of the compressed artificial neural network; and re-compressing, based on the determined compression ratio, the artificial neural network (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1 and see pg. 5 section 5 “Pruning is implemented by adding a mask to the blobs to mask out the update of the pruned connections. Quantization and weight sharing are implemented by maintaining a codebook structure that stores the shared weight, and group-by-index after calculating the gradient of each layer. Each shared weight is updated with all the gradients that fall into that bucket.”).
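As a worked illustration of the weight-sharing storage arithmetic described in the quoted Han passages (n connections of b bits each, k shared weights, and an index of log2(k) bits per connection), the sketch below counts bits before and after sharing. The quoted passage elides the formula itself, so the accounting used here (one index per connection plus a codebook of k values at b bits each) is an assumption, as are the function names.

```python
from typing import Tuple


def index_bits(k: int) -> int:
    """Bits needed to index k shared weights, i.e. ceil(log2(k))."""
    return (k - 1).bit_length()


def storage_bits(n: int, b: int, k: int) -> Tuple[int, int]:
    """Return (bits without sharing, bits with sharing) under the assumed accounting."""
    original = n * b                       # every connection stores a full weight
    shared = n * index_bits(k) + k * b     # per-connection index plus the codebook
    return original, shared


def compression_rate(n: int, b: int, k: int) -> float:
    """Ratio of unshared storage to shared storage."""
    original, shared = storage_bits(n, b, k)
    return original / shared
```

For the single-layer example quoted from Han (16 weights, 4 shared weights) and 32-bit weights, `storage_bits(16, 32, 4)` gives 512 bits against 160 bits, a ratio of 3.2 under this accounting.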

Regarding claim 7
Han in view of Abbasi-Asl teaches the method of claim 6.
Han further teaches wherein the re-compressing of the artificial neural network comprises re-compressing the artificial neural network by adjusting weights from among nodes belonging to different layers from among layers of the compressed artificial neural network according to the compression ratio. (Figure 1: “The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.” also see pg. 2 “First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)

Regarding claim 9
Han in view of Abbasi-Asl teaches the method of claim 1.
Han further teaches wherein the artificial neural network comprises a trained artificial neural network. (Section 3.1 “We use k-means clustering to identify the shared weights for each layer of a trained network, so that all the weights that fall into the same cluster will share the same weight.”)
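The quoted use of k-means clustering to identify shared weights for a layer of a trained network can be sketched in one dimension as follows. This is a minimal sketch under assumptions: the function names, the evenly spaced initial centroids, and the fixed iteration count are illustrative choices and are not asserted to match Han's implementation.

```python
from typing import List, Tuple


def nearest(value: float, centroids: List[float]) -> int:
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def share_weights(weights: List[float], k: int,
                  iters: int = 20) -> Tuple[List[float], List[int]]:
    """Cluster trained weights into k shared values; return (codebook, indices)."""
    lo, hi = min(weights), max(weights)
    if k == 1:
        centroids = [sum(weights) / len(weights)]
    else:
        # Assumed initialization: centroids evenly spaced over the weight range.
        centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        indices = [nearest(w, centroids) for w in weights]
        for j in range(k):
            members = [w for w, idx in zip(weights, indices) if idx == j]
            if members:               # leave an empty cluster where it is
                centroids[j] = sum(members) / len(members)
    indices = [nearest(w, centroids) for w in weights]
    return centroids, indices


def decode(codebook: List[float], indices: List[int]) -> List[float]:
    """Rebuild the layer's weights from the codebook and per-weight indices."""
    return [codebook[i] for i in indices]
```

After clustering, only the codebook and the per-weight indices need to be kept; `decode` shows that every weight in a cluster takes that cluster's shared value.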

Regarding claim 10
Han in view of Abbasi-Asl teaches the method of claim 1.
Han further teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1. (Pg. 9 section 6.3 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.”)

Regarding claim 12 (Currently Amended)
Han in view of Abbasi-Asl teaches the method of claim 1.
Abbasi-Asl further teaches wherein the processor is further configured to increase, in response to the task accuracy of the compressed artificial neural network being less than the initial value, (pg. 3-4 “Again, the filters are pruned while the classification accuracy is in the range of relative 5% from the accuracy of uncompressed network, while increasing the compression ratio (for weights) from 1.4 to 42 for either layer – a stunning 30 fold increase. That is, further weight compression boosts the weight compression ratio by sparsifying weights of the kept filters, although the number of filters is the same as the CAR compression.”)
an initial compression rate, (algorithm 1 shows initial compression ratio at input stage)
corresponding to the compressing of the artificial neural network, to increase task accuracy (pg. 8 first paragraph “If we fine-tune or retrain the CAR-compressed network, the compression ratio is increased to 1.66 or 1.67 with the same number of filters (to maintain the same 54% classification accuracy). We have also reported the classification accuracy for the case when both layer one and two are compressed together with a small decrease of classification accuracy to 51% (an absolute 3% or a relative 5%). The network is fine tuned or retrained on the same classification task after CAR-compressing both layers.”).
Han and Abbasi-Asl are analogous art because they are both directed to compressing neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network compression of Han with the determination, based on the initial value of the task accuracy and a task accuracy of the compressed artificial neural network, of a compression rate as taught by Abbasi-Asl.
One of ordinary skill in the art would have been motivated to make this modification in order to allow “for better data adaptation and improves compression performance” using the greedy process with fine-tuning at each iteration, as disclosed by Abbasi-Asl (pg. 4 “One can also compress based on variants of our algorithm. One possibility is to avoid greedy process and remove several filters with lowest importance indices in one pass. This compression is faster, however the performance of the compressed network is worse than Algorithm 1 in the examples we tried. The greedy process with fine-tuning in each iteration seems to allow for a better data adaptation and improves compression performance.”).

Regarding claim 13 (Currently Amended)
Han in view of Abbasi-Asl teaches the method of claim 11.
Han further teaches wherein the processor (pg. 9 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.”) is further configured to perform a compression-evaluation operation to determine a task accuracy of the re-compressed artificial neural network (pg. 2 “We build on top of that approach. As shown on the left side of Figure 1, we start by learning the connectivity via normal network training. Next, we prune the small-weight connections: all connections with weights below a threshold are removed from the network. Finally, we retrain the network to learn the final weights for the remaining sparse connections. Pruning reduced the number of parameters by 9× and 13× for AlexNet and VGG-16 model.”)
and a compression rate for the re-compressed artificial neural network. (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1 and also see pg. 5 section 5 “Pruning is implemented by adding a mask to the blobs to mask out the update of the pruned connections. Quantization and weight sharing are implemented by maintaining a codebook structure that stores the shared weight, and group-by-index after calculating the gradient of each layer. Each shared weight is updated with all the gradients that fall into that bucket.”)

Regarding claim 14 (Currently Amended)
Han in view of Abbasi-Asl teaches the method of claim 11.
Han further teaches wherein the processor (pg. 9 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.”) is further configured to determine, based on an accuracy loss threshold and the task accuracy of the re-compressed artificial neural network, whether to perform the compression-evaluation operation. (Under the broadest reasonable interpretation, Examiner notes that the codebook retraining steps compare against an accuracy loss threshold and quantize the weights of the codebook; see “Figure 1: The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.”)

Regarding claim 16 (Currently Amended)
Han in view of Abbasi-Asl teaches the method of claim 11.
Han further teaches wherein the processor (pg. 9 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.”) is further configured to: determine, based on the compression rate and a task accuracy for a task processed by the compressed artificial neural network, (see pg. 3 section 3 “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy. To calculate the compression rate, given k clusters, we only need log2(k) bits to encode the index. In general, for a network with n connections and each connection is represented with b bits, constraining the connections to have only k shared weights will result in a compression rate…”)
a compression ratio of the compressed artificial neural network; and re-compress the artificial neural network based on the determined compression ratio (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1 and see pg. 5 section 5 “Pruning is implemented by adding a mask to the blobs to mask out the update of the pruned connections. Quantization and weight sharing are implemented by maintaining a codebook structure that stores the shared weight, and group-by-index after calculating the gradient of each layer. Each shared weight is updated with all the gradients that fall into that bucket.”).

Regarding claim 17 (Currently Amended)
Han in view of Abbasi-Asl teaches the method of claim 14.
Han further teaches wherein the processor is further configured to re-compress the artificial neural network by adjusting weights from among nodes belonging to different layers from among layers of the compressed artificial neural network according to the compression ratio. (Figure 1: “The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.” also see pg. 2 “First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)

Regarding claim 18 (Currently Amended)
Han in view of Abbasi-Asl teaches the apparatus of claim 14.
Han further teaches wherein the processor is further configured to re-compress the artificial neural network by adjusting weights with a value lesser than a threshold from among nodes of the compressed artificial neural network according to the compression ratio. (Figure 1 “The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.” also see pg. 2 “First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized [corresponds to adjust] so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)

Regarding claim 20 (New)
Han in view of Abbasi-Asl teaches the method of claim 1.
Han further teaches wherein the adjusting of the weights of the connections includes one of: removing the weight of connections that are equal or below a threshold, (pg. 2 section 2 “Next, we prune the small-weight connections: all connections with weights below a threshold are removed from the network. Finally, we retrain the network to learn the final weights for the remaining sparse connections. Pruning reduced the number of parameters by 9× and 13× for AlexNet and VGG-16 model.”) setting the weight to zero, or ignoring the weight during inference of the artificial neural network.  
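The three alternatives recited above (removing connections at or below a threshold, setting the weight to zero, or ignoring the weight during inference) can be illustrated with a small mask-based sketch. The function names are hypothetical, and comparing the magnitude of each weight against the threshold is an assumption made for illustration; the sketch does not construe the claim or reproduce Han's code.

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def prune_mask(weights: Matrix, threshold: float) -> Mask:
    """True where a connection is kept: magnitude strictly above the threshold."""
    return [[abs(w) > threshold for w in row] for row in weights]


def zero_pruned(weights: Matrix, mask: Mask) -> Matrix:
    """Set every pruned connection's weight to zero."""
    return [[w if keep else 0.0 for w, keep in zip(row, mrow)]
            for row, mrow in zip(weights, mask)]


def forward_ignoring_pruned(weights: Matrix, mask: Mask,
                            x: List[float]) -> List[float]:
    """Compute each output as a weighted sum that skips pruned connections."""
    return [sum(w * xj for w, keep, xj in zip(row, mrow, x) if keep)
            for row, mrow in zip(weights, mask)]
```

In this sketch, skipping masked connections at inference time and multiplying by zeroed weights give the same layer output, which is why the two treatments are often interchangeable in practice.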

Regarding claim 21 (New)
Han in view of Abbasi-Asl teaches the method of claim 1.
Han further teaches wherein the compressing of the artificial neural network includes pruning, …weights of connections that are equal to or less than a threshold (pg. 2 section 2 “As shown on the left side of Figure 1, we start by learning the connectivity via normal network training. Next, we prune the small-weight connections: all connections with weights below a threshold are removed from the network.”);
Abbasi-Asl further teaches based on the initial value of the task accuracy (FIG. 2 “The classification accuracy of uncompressed network is shown with a dashed red line. The purple curve shows the classification accuracy of our proposed CAR compression algorithm for various compression ratios.”).


Claims 5, 15 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Abbasi-Asl et al. (“Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning”, hereinafter: Abbasi-Asl) and further in view of Luo et al. (“ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression”, hereinafter: Luo).
Regarding claim 5 
Han in view of Abbasi-Asl teaches the method of claim 3. 
Han in view of Abbasi-Asl does not teach the method further comprising: comparing the compression rate of the re-compressed artificial neural network with a lower threshold; and determining whether to terminate a current compression session and to start a compression session after the compression rate is set to an initial reference value, based on a result of the comparison.  
Luo teaches the method further comprising: comparing the compression rate of the re-compressed artificial neural network with a lower threshold; (Examiner notes that Algorithm 1 shows comparing training items under the for loop inside the if condition on lines 7-8; see section 4.2 “Hence, there are in total 100,000 training samples used for finding the optimal channel subset via Algorithm 1. We compared several different choices of image and location number, and found that the current choice (10 images per class and 10 locations per image) is enough for neuron importance evaluation.”)
and determining whether to terminate a current compression session and to start a compression session after the compression rate is set to an initial reference value, based on a result of the comparison (Examiner notes that Algorithm 1 shows comparing training items under the for loop inside the if condition on lines 7-8; see section 4.2 “Hence, there are in total 100,000 training samples used for finding the optimal channel subset via Algorithm 1. We compared several different choices of image and location number, and found that the current choice (10 images per class and 10 locations per image) is enough for neuron importance evaluation.” Also see pg. 5063 “To explore the limits of ThiNet, we prune VGG-16 with a larger compression rate 0.25, achieving 16× parameters reduction in convolutional layers. The conv5 layers are also pruned to get a smaller model. As for conv5-3, which is directly related to the final feature representation, we only prune half of the filters for accuracy consideration. Using these smaller compression ratios, we train a very small model. Denoted as “ThiNet-Tiny” in Table 1, it only takes 5.05MB disk space (1MB=2^20 bytes) but still has AlexNet-level accuracy (the top-1/top-5 accuracy of AlexNet is 57.2%/80.3%, respectively).”).
Han, Abbasi-Asl and Luo are analogous art because they are all directed to compressing neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network compression of Han in view of Abbasi-Asl with the filter-level pruning for deep neural network compression of Luo.
One of ordinary skill in the art would have been motivated to make this modification in order to provide “an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages” as disclosed (Luo abstract “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries.”). 
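For illustration only, the claim 5 limitation of comparing the compression rate with a lower threshold and deciding whether to restart the session can be sketched as follows. The names `SessionDecision`, `decide_session`, `lower_threshold` and `initial_rate` are hypothetical and appear in neither the claims nor the cited references. The sketch also assumes that a restart is triggered when the rate falls below the threshold, a direction the claim language itself does not fix.

```python
from dataclasses import dataclass


@dataclass
class SessionDecision:
    terminate_current: bool  # whether to end the current compression session
    next_rate: float         # compression rate to use going forward


def decide_session(rate: float, lower_threshold: float,
                   initial_rate: float) -> SessionDecision:
    """Compare the rate with a lower threshold and decide on a restart."""
    if rate < lower_threshold:
        # Assumed direction: a rate below the threshold ends this session,
        # and a new session starts from the initial reference value.
        return SessionDecision(terminate_current=True, next_rate=initial_rate)
    # Otherwise the current session continues with its current rate.
    return SessionDecision(terminate_current=False, next_rate=rate)
```

Under these assumptions, `decide_session(0.05, 0.1, 0.5)` signals termination and resets the rate to 0.5, while `decide_session(0.2, 0.1, 0.5)` leaves the session and its rate unchanged.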
Regarding claim 15
Claim 15 recites analogous limitations to claim 5 and therefore is rejected on the same ground as claim 5. 

Regarding claim 22 (New)
Han in view of Abbasi-Asl teaches the method of claim 1. 
Han in view of Abbasi-Asl does not teach wherein the compression rate is a degree to which compression is to be performed with respect to a time for performing one compression.
Luo teaches wherein the compression rate is a degree to which compression is to be performed with respect to a time for performing one compression (pg. 5061 right col “Solving Eq. 6 is still NP hard, thus we use a greedy strategy (illustrated in algorithm 1). We add one element to T at a time, and choose the channel leading to the smallest objective value in the current iteration. Obviously, this greedy solution is sub-optimal. But the gap can be compensated by fine-tuning.” Also see pg. 5062 “Following the pruning strategy in Section 3.3, all the FC layers in VGG-16 are removed, and replaced with a global average pooling layer, and fine-tuned on new datasets. Starting from this fine-tuned model, we then prune the network layer by layer with different compression rate. Each pruning is followed by one epoch fine-tuning, and 12 epochs are performed in the final layer to improve accuracy. This procedure is repeated several times with different channel selection strategies.”).
Han, Abbasi-Asl and Luo are analogous art because they are all directed to compressing neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network compression of Han in view of Abbasi-Asl with the filter-level pruning for deep neural network compression of Luo.
One of ordinary skill in the art would have been motivated to make this modification in order to provide “an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages” as disclosed (Luo abstract “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries.”). 
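For illustration only, the greedy strategy quoted from Luo for claim 22 (adding one element to T at a time and choosing the channel that yields the smallest objective value in the current iteration) can be sketched as follows. The function name `greedy_select`, its parameters, and the squared-sum objective are assumptions made for this sketch; Eq. 6 of Luo is not reproduced in this action, so the objective below merely stands in for it.

```python
from typing import List, Sequence


def greedy_select(samples: Sequence[Sequence[float]], num_select: int) -> List[int]:
    """Greedily build the set T, adding one channel index per iteration."""
    num_channels = len(samples[0])
    if num_select > num_channels:
        raise ValueError("cannot select more channels than exist")
    selected: List[int] = []
    partial = [0.0] * len(samples)  # per-sample sum over channels already in T
    for _ in range(num_select):
        best_channel, best_value = -1, float("inf")
        for c in range(num_channels):
            if c in selected:
                continue
            # Assumed objective: sum over samples of the squared channel sum
            # that would result if channel c were added to T.
            value = sum((p + row[c]) ** 2 for p, row in zip(partial, samples))
            if value < best_value:
                best_channel, best_value = c, value
        selected.append(best_channel)
        partial = [p + row[best_channel] for p, row in zip(partial, samples)]
    return selected
```

Each outer iteration adds exactly one channel, so the result is a greedy approximation rather than the optimal subset, which matches the passage's remark that the greedy solution is sub-optimal.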

Claims 8 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Abbasi-Asl et al. and further in view of Ko et al. (“Adaptive Weight Compression for Memory-Efficient Neural Networks”).
Regarding claim 8
Han in view of Abbasi-Asl teaches the apparatus of claim 6. 
Han in view of Abbasi-Asl does not teach wherein the compression ratio is determined to reduce a degree of loss of a task accuracy with respect to a compression rate.  
Ko teaches wherein the compression ratio is determined to reduce a degree of loss of a task accuracy with respect to a compression rate. (Pg. 203 left col “Table 1 shows compression ratio achieved by different approaches at 1% loss of normalized accuracy. For the MNIST dataset, the proposed adaptive QF control approach achieves 42.4X compression, which is significantly higher than other approaches. When combined with the bit truncation, the weight size reduces by 273X, from 446KB to 1.6 KB.”) 
Han, Abbasi-Asl and Ko are analogous art because they are all directed to compressing neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network compression of Han in view of Abbasi-Asl with the adaptive weight compression for memory-efficient neural networks of Ko.
One of ordinary skill in the art would have been motivated to make this modification in order to adaptively control the quantization factor of the JPEG algorithm. Doing so provides the advantage of compressing the more error-sensitive weight blocks less, which preserves accuracy, as disclosed (Ko abstract “To minimize the loss of accuracy due to JPEG encoding, we propose to adaptively control the quantization factor of the JPEG algorithm depending on the error-sensitivity (gradient) of each weight. With the adaptive compression technique, the weight blocks with higher sensitivity are compressed less for higher accuracy. The adaptive compression reduces memory requirement, which in turn results in higher performance and lower energy of neural network hardware. The simulation for inference hardware for multilayer perceptron with the MNIST dataset shows up to 42X compression with less than 1% loss of recognition accuracy, resulting in 3X higher effective memory bandwidth and ~19X lower system energy.”). 
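For illustration only, the principle quoted from the Ko abstract (weight blocks with higher sensitivity are compressed less) can be sketched as follows. The function `compression_strengths`, the linear mapping, and the `min_strength` and `max_strength` bounds are hypothetical; they are not Ko's quantization-factor control, whose details are not reproduced in this action.

```python
from typing import List, Sequence


def compression_strengths(sensitivities: Sequence[float],
                          min_strength: float = 0.1,
                          max_strength: float = 0.9) -> List[float]:
    """Map per-block sensitivity to a compression strength in [min, max]."""
    lo, hi = min(sensitivities), max(sensitivities)
    if hi == lo:
        # All blocks are equally sensitive: use the midpoint for every block.
        return [(min_strength + max_strength) / 2] * len(sensitivities)
    span = max_strength - min_strength
    # Assumed linear mapping: the most sensitive block receives min_strength
    # and the least sensitive block receives max_strength.
    return [max_strength - span * (s - lo) / (hi - lo) for s in sensitivities]
```

With this mapping, strength decreases monotonically as sensitivity increases, which is the only property the sketch is meant to show.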

Regarding claim 19
Claim 19 recites analogous limitations to claim 8 and therefore is rejected on the same ground as claim 8. 



Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598. The examiner can normally be reached Mon - Fri 8:00-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/V.M./Examiner, Art Unit 2126    
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126