Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This office action is in response to application 16/273,592, which was filed 02/12/19. In a preliminary amendment filed 04/02/21, claims 1 and 6 were amended. Claims 1-6 are pending and have been considered.

Allowable Subject Matter
Claims 1-6 are allowed.
The following is an examiner’s statement of reasons for allowance: 

The closest prior art to independent claim 1 is Han et al. (“DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING”, Cornell University, 15 Feb 2016, arXiv:1510.00149v5 [cs.CV], 14 pages). Han discloses a method comprising: 
instantiating a convolutional neural network on a computing system, the convolutional neural network including a plurality of layers (Abstract discloses a convolutional neural network), wherein instantiating the convolutional neural network comprises: 
training the convolutional neural network using a first loss function until a first classification accuracy is reached, wherein the first loss function calculates a classification error of the convolutional neural network, wherein training the convolutional neural network with the first loss function comprises optimizing, for a first one of the layers, a first set of F filters and a first set of F biases so as to minimize the first loss function, wherein each of the F filters is formed from K kernels, wherein each of the kernels has N-dimensions, and wherein each of the biases is scalar (Section 2. Network pruning discloses training a convolutional neural network. Fig. 1 shows the training step in the Pruning box, "Train Connectivity"); 
clustering the set of F x K kernels of the first layer into a set of C clusters, wherein each of the clusters is characterized by a centroid, thereby the C clusters being characterized by C centroids, wherein each of the centroids has N-dimensions, and wherein C is less than F x K (Section 3.1. Weight sharing discloses clustering a set of F x K weights, i.e. n weights, into a set of C clusters, i.e. k clusters, wherein n > k); 
training the convolutional neural network using the first loss function until a second classification accuracy is reached (Section 3.3. Feed-forward and back-propagation discloses that the centroids are retrained. Fig. 1 shows the retraining step in the Quantization box, "Retrain Code Book"); 
creating a dictionary which maps each of the centroids to a corresponding centroid identifier (Section 3.3. Feed-forward and back-propagation discloses that a dictionary is created to map centroids to centroid identifiers, "...an index into the shared weight table is stored for each connection...". Fig. 1 shows the compression step in the Quantization box, "Quantize the Weights with Code Book"); 
quantizing and compressing the F filters of the first layer by, for each of the F x K kernels, replacing the kernel with a centroid identifier that identifies a centroid that is closest to the kernel (Sections 3.1. Weight sharing and 3.3. Feed-forward and back-propagation disclose quantizing and compressing the filters); 
storing the F quantized and compressed filters of the first layer in a memory of the computing system, the F quantized and compressed filters comprising F x K centroid identifiers (Sections 3.1. Weight sharing and 3.3. Feed-forward and back-propagation disclose storing the quantized and compressed filters); and 
storing the F biases of the first layer in the memory (the convolutional neural network described in Section 1. Introduction also comprises biases, which must be stored); and 
classifying data received by the convolutional neural network (Section 5. Experiments discloses classifying data with the convolutional neural network), wherein the classification comprises: 
retrieving the F quantized and compressed filters of the first layer from the memory, the F quantized and compressed filters comprising F x K centroid identifiers (in order to classify the data in Section 5. Experiments, the quantized and compressed filters must be retrieved); 
decompressing, using the dictionary, the F quantized and compressed filters of the first layer into F quantized filters by mapping the F x K centroid identifiers into F x K corresponding quantized kernels, the F x K corresponding quantized kernels forming the F quantized filters (in order to classify the data in Section 5. Experiments, the quantized and compressed filters must be decompressed); 
retrieving the F biases of the first layer from the memory (in order to classify the data in Section 5. Experiments, the biases must be retrieved); and 
for the first layer, computing a convolution of the received data or data output from a layer previous to the first layer with the F quantized filters and the F biases (in order to classify the data in Section 5. Experiments, a convolution must be computed with the quantized filters and the biases).
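For readers less familiar with the technique, the claimed pipeline (clustering whole kernels, building a dictionary, replacing each kernel with a scalar centroid identifier, decompressing by lookup, then convolving) can be sketched as follows. This is an illustrative sketch only, not the applicant's or Han's implementation: the function names (kmeans, compress, decompress, conv_valid), the plain k-means with initialization from distinct kernels, and the stride-1, no-padding convolution are all hypothetical choices made for the example.

```python
import random
from typing import Dict, List, Sequence, Tuple

Kernel = Tuple[float, ...]  # one N-dimensional kernel, e.g. a flattened 3x3 (N = 9)
Filter = List[Kernel]       # K kernels, one per input channel


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_id(kernel: Kernel, dictionary: Dict[int, Kernel]) -> int:
    # Centroid identifier whose centroid is closest (squared Euclidean) to the kernel.
    return min(dictionary, key=lambda cid: sq_dist(kernel, dictionary[cid]))


def kmeans(kernels: List[Kernel], c: int, iters: int = 20, seed: int = 0) -> Dict[int, Kernel]:
    """Hypothetical plain k-means over whole kernels; returns {centroid_id: centroid}."""
    distinct = list(dict.fromkeys(kernels))
    if c > len(distinct):
        raise ValueError("need at least C distinct kernels")
    rng = random.Random(seed)
    dictionary = dict(enumerate(rng.sample(distinct, c)))  # initialize from distinct kernels
    for _ in range(iters):
        groups: Dict[int, List[Kernel]] = {cid: [] for cid in dictionary}
        for k in kernels:
            groups[nearest_id(k, dictionary)].append(k)
        for cid, members in groups.items():
            if members:  # keep the old centroid if a cluster empties
                dim = len(members[0])
                dictionary[cid] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return dictionary


def compress(filters: List[Filter], dictionary: Dict[int, Kernel]) -> List[List[int]]:
    # F x K kernels -> F x K scalar centroid identifiers.
    return [[nearest_id(k, dictionary) for k in f] for f in filters]


def decompress(compressed: List[List[int]], dictionary: Dict[int, Kernel]) -> List[Filter]:
    # F x K identifiers -> F x K quantized kernels via dictionary lookup.
    return [[dictionary[cid] for cid in f] for f in compressed]


def conv_valid(inp: List[List[List[float]]], filt: Filter, bias: float,
               ksize: int = 3) -> List[List[float]]:
    """Stride-1, no-padding cross-correlation (as CNN layers compute it) of a
    K-channel input with one filter of K row-major ksize x ksize kernels, plus bias."""
    h, w = len(inp[0]), len(inp[0][0])
    out = []
    for r in range(h - ksize + 1):
        row = []
        for col in range(w - ksize + 1):
            acc = bias
            for ch, kernel in enumerate(filt):
                for i in range(ksize):
                    for j in range(ksize):
                        acc += kernel[i * ksize + j] * inp[ch][r + i][col + j]
            row.append(acc)
        out.append(row)
    return out
```

For example, with F = 3 filters of K = 2 kernels built from only two distinct 3x3 patterns and C = 2, decompress(compress(filters, d), d) reproduces the filters exactly; with real trained kernels the decompressed filters would only approximate the originals.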
However, Han does not disclose or suggest that each of the kernels consists of nine parameters, that each of the centroids consists of nine parameters, replacing the nine parameters of the kernel with one of the scalar centroid identifiers from the dictionary, or that the F x K corresponding quantized kernels each have nine parameters and form the F quantized filters. Han's weight sharing clusters individual scalar weights, so each index stored in Han stands for a single weight rather than for a whole nine-parameter kernel. In other words, the deep neural network compression recited in claim 1 is more particular and specific than that taught by Han, and there is no particular reason why one of ordinary skill in the art before the effective filing date of the claimed invention would have modified Han to arrive at the limitations of claim 1. 
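The structural difference noted above can be illustrated with a rough storage count. The sketch below is hypothetical arithmetic, not taken from Han or from the application: it assumes a layer with F = 64 filters of K = 64 kernels of nine parameters, C = 256 clusters, fixed-width indices, and 32-bit floating-point centroids, and it ignores Huffman coding and biases.

```python
import math


def storage_bits(num_indices: int, clusters: int, centroid_dim: int,
                 float_bits: int = 32) -> int:
    """Fixed-width index storage plus a float codebook of `clusters` centroids."""
    index_bits = num_indices * math.ceil(math.log2(clusters))
    codebook_bits = clusters * centroid_dim * float_bits
    return index_bits + codebook_bits


F, K, N, C = 64, 64, 9, 256  # hypothetical layer size; 3x3 kernels give N = 9

# Scalar weight sharing (Han-style): one index per weight, scalar centroids.
scalar = storage_bits(F * K * N, C, 1)
# Kernel-level clustering as recited: one scalar identifier per nine-parameter kernel.
kernel = storage_bits(F * K, C, N)
uncompressed = F * K * N * 32

print(uncompressed, scalar, kernel)  # 1179648 303104 106496
```

Under these assumptions the scalar scheme stores F x K x 9 indices, while the recited scheme stores only F x K identifiers at the cost of a larger, nine-dimensional codebook.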

Dependent claims 2-6 are allowable because they further limit allowable parent claim 1. 

Other relevant prior art:

Jafri et al. (“MOCHA: Morphable locality and compression aware architecture for convolutional neural networks”. 2017 IEEE International Parallel and Distributed Processing Symposium, pages 276-286) discloses an accelerator for CNNs that compresses input kernels and automatically interleaves and cascades various optimizations.

Cheng et al. (“Model Compression and Acceleration for Deep Neural Networks”. arXiv:1710.09282v1 [cs.LG] 23 Oct 2017) discloses parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation techniques for compressing deep NN models.

US 10417525 Ji et al. disclose object recognition with reduced neural network weight precision.

US 20160217369 Annapureddy et al. disclose compressing a neural network by replacing one layer with compressed layers to produce the compressed network. The compressed network is fine-tuned by updating weight values in the compressed layer.

US 10373050 Lin et al. disclose quantizing a floating point neural network to obtain a fixed point neural network using a quantizer, by selecting a moment of an input distribution of the floating point neural network and determining quantizer parameters for quantizing values of the floating point neural network based on the selected moment.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 9:00 AM - 4:30 PM. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders can be reached on 571/272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571/270-6135. 

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Jesse S Pullias/
Primary Examiner, Art Unit 2655                                                 05/24/22