DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The filing date of the present invention is 07/29/2019. 
This action is in response to amendment and/or remarks filed on 01/28/2022. In the current amendments, claims 1 and 11 have been amended. Claims 1-19 are currently pending and have been examined. 


Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/28/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
Claims 12-13 and 16 are objected to because of the following informalities: 
Claims 12-13 and 16 each recite “the apparatus of claim 9…”; they should instead recite “the method of claim 9…” because the corresponding parent claim 9 is a method claim. Appropriate correction is required.

Claim Interpretation
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “controller configured to…”, in claims 11 and 14-18.


Per ¶70 of the specification, “The controller 320 may control all operations for driving the neural network apparatus 3. [0071] For example, the controller 320 may execute programs stored in the memory 310 of the apparatus 300 to control all operations of the apparatus 300. The controller 320 includes at least one of the apparatuses described with reference to FIGS. 3 and 4 or performs at least one of the methods described with reference to FIGS. 1, 2, and 4 through 9. The controller 320 refers to a data processing device configured as hardware with a circuitry in a physical structure to execute desired operations. For example, the desired operations may include codes or instructions included in a program.” And ¶ [0078] “The controller 320 may evaluate a trained artificial neural network, thereby obtaining an initial value of a task accuracy. The task accuracy may include a mean squared error (MSE), which indicates an error between a result of inference by the artificial neural network and an expected result.”

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.  
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 12-13 and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 12-13 and 16 recite the limitation “the controller” in line 1. There is insufficient antecedent basis for this limitation in the claims. In addition, claims 12-13 and 16 each recite “the apparatus of claim 9…”, but the corresponding parent claim 9 is a method claim; therefore, there is insufficient antecedent basis.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1-4, 6-7, 9-14 and 16-18 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han). 
Regarding claim 1 (Currently Amended)
Han teaches a method of compressing an artificial neural network, (abstract “we introduce “deep compression”, a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35× to 49× without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9× to 13×; Quantization then reduces the number of bits that represent each connection from 32 to 5.”)
the method comprising: obtaining an initial value of a task accuracy for a task processed by the artificial neural network; (section 3 “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights.”)
compressing the artificial neural network by adjusting weights of connections among layers of the artificial neural network included in information regarding the connections; (pg. 2 “First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)
determining a compression rate for the compressed artificial neural network based on the initial value of the task accuracy and a task accuracy of the compressed artificial neural network, (see pg. 3 section 3 “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy. To calculate the compression rate, given k clusters, we only need log2(k) bits to encode the index. In general, for a network with n connections and each connection is represented with b bits, constraining the connections to have only k shared weights will result in a compression rate…”)
re-compressing the compressed artificial neural network according to the compression rate. (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1: “The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.”)
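
For illustration only, the weight sharing described in the passage of Han quoted above (a 4 × 4 weight matrix quantized to 4 bins, with each weight stored as a small index into a table of shared weights) can be sketched as follows. This is a minimal Python sketch and not Han's implementation: the weight values, the function names `index_bits` and `share_weights`, the evenly spaced starting centroids, and the iteration count are all assumptions made for the example.

```python
import math

def index_bits(k):
    """Bits needed to address one of k shared weights."""
    return math.ceil(math.log2(k))

def share_weights(values, k, iters=20):
    """Cluster scalars into k shared values with plain 1-D k-means (k >= 2)."""
    lo, hi = min(values), max(values)
    # Assumption: centroids start evenly spaced between the min and max weight.
    codebook = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - codebook[c]))

    for _ in range(iters):
        indices = [nearest(v) for v in values]
        for c in range(k):
            members = [v for v, i in zip(values, indices) if i == c]
            if members:  # leave an empty cluster's centroid unchanged
                codebook[c] = sum(members) / len(members)
    indices = [nearest(v) for v in values]
    return codebook, indices

# Hypothetical 4 x 4 weight matrix, flattened row by row.
flat = [2.0, -1.0, 1.5, 0.1,
        0.0, -0.1, -1.1, 2.1,
        -0.9, 1.9, 0.0, -1.0,
        1.8, 0.1, 1.6, 1.4]

codebook, indices = share_weights(flat, k=4)
shared = [codebook[i] for i in indices]  # every weight replaced by its bin's value

print("codebook:", codebook)
print("indices:", indices)
print("bits per index:", index_bits(4))
```

Only the codebook and the per-weight indices need to be kept. The same `index_bits` helper reproduces the bit widths quoted from Han: 256 shared weights need 8 bits per index and 32 shared weights need 5.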
 
Regarding claim 11
Claim 11 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1. 

Regarding claim 2 
Han teaches the method of claim 1.
Han further teaches wherein the determining of the compression rate comprises determining the compression rate to increase the task accuracy of the compressed artificial neural network, (Examiner notes that Han teaches increasing compression rate from 9x-13x and 27x-31x as clearly shown in “Figure 1: The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×.”)
in response to the task accuracy of the compressed artificial neural network being less than the initial value. (Pg. 6 section 5.3 “The VGG16 network as a whole has been compressed by 49×. Weights in the CONV layers are represented with 8 bits, and FC layers use 5 bits, which does not impact the accuracy. The two largest fully-connected layers can each be pruned to less than 1.6% of their original size.”)

Regarding claim 3
Han teaches the method of claim 1.
Han further teaches the method further comprising: performing a compression-evaluation operation to determine a task accuracy of the re-compressed artificial neural network (pg. 2 “We build on top of that approach. As shown on the left side of Figure 1, we start by learning the connectivity via normal network training. Next, we prune the small-weight connections: all connections with weights below a threshold are removed from the network. Finally, we retrain the network to learn the final weights for the remaining sparse connections. Pruning reduced the number of parameters by 9× and 13× for AlexNet and VGG-16 model.”)
and a compression rate for the re-compressed artificial neural network. (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1 and also see pg. 5 section 5 “Pruning is implemented by adding a mask to the blobs to mask out the update of the pruned connections. Quantization and weight sharing are implemented by maintaining a codebook structure that stores the shared weight, and group-by-index after calculating the gradient of each layer. Each shared weight is updated with all the gradients that fall into that bucket.”)
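
For illustration only, the threshold pruning and masked retraining described in the passages of Han quoted above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not Han's implementation: “weights below a threshold” is read here as absolute value below the threshold, and the function names, weights, gradients, threshold, and learning rate are hypothetical.

```python
def prune_by_magnitude(weights, threshold):
    """Return (pruned_weights, mask); mask is 0 where |w| < threshold, else 1."""
    mask = [0 if abs(w) < threshold else 1 for w in weights]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask

def masked_sgd_step(weights, grads, mask, lr):
    """One gradient step in which the mask blocks updates to pruned connections."""
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]

# Hypothetical weights and gradients for six connections.
weights = [0.80, -0.05, 0.02, -0.60, 0.30, 0.01]
grads = [0.5, 0.5, -0.5, 0.5, -0.5, 0.5]

pruned, mask = prune_by_magnitude(weights, threshold=0.1)
updated = masked_sgd_step(pruned, grads, mask, lr=0.1)
kept = sum(mask)

print("mask:", mask)
print("updated:", updated)
print("connections kept:", kept, "of", len(weights))
```

Because the mask multiplies the gradient, a connection removed by pruning stays at zero during retraining while the surviving connections continue to be fine-tuned.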

Regarding claim 4
Han teaches the method of claim 3.
Han further teaches the method further comprising performing the compression-evaluation operation based on an accuracy loss threshold and the task accuracy of the re-compressed artificial neural network. (Under its broadest reasonable interpretation, Examiner notes that the codebook retraining step compares against an accuracy loss threshold and quantizes the weights of the codebook; see “Figure 1: The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.”)

Regarding claim 6
Han teaches the method of claim 1.
Han further teaches wherein the re-compressing of the artificial neural network comprises: determining a compression ratio of the compressed artificial neural network based on the compression rate and a task accuracy for a task processed by the compressed artificial neural network; (see pg. 3 section 3 “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy. To calculate the compression rate, given k clusters, we only need log2(k) bits to encode the index. In general, for a network with n connections and each connection is represented with b bits, constraining the connections to have only k shared weights will result in a compression rate…”)
and re-compressing the artificial neural network based on the determined compression ratio. (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1 and see pg. 5 section 5 “Pruning is implemented by adding a mask to the blobs to mask out the update of the pruned connections. Quantization and weight sharing are implemented by maintaining a codebook structure that stores the shared weight, and group-by-index after calculating the gradient of each layer. Each shared weight is updated with all the gradients that fall into that bucket.”)
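
For illustration only, the shared-weight update described in the passages of Han quoted above (gradients grouped by bin, summed, multiplied by the learning rate, and subtracted from the shared centroids) can be sketched as follows. This is a minimal Python sketch, not Han's implementation: the function name `update_centroids` and all numeric values are hypothetical.

```python
def update_centroids(centroids, indices, grads, lr):
    """Sum the gradients per shared-weight index, scale by lr, subtract from that centroid."""
    summed = [0.0] * len(centroids)
    for idx, g in zip(indices, grads):
        summed[idx] += g  # group-by-index accumulation
    return [c - lr * s for c, s in zip(centroids, summed)]

# Hypothetical codebook of 4 shared weights and 6 connections that index into it.
centroids = [-1.0, 0.0, 1.5, 2.0]
indices = [3, 0, 2, 1, 1, 0]
grads = [0.2, -0.1, 0.4, 0.0, 0.3, -0.3]

new_centroids = update_centroids(centroids, indices, grads, lr=0.5)
print("updated centroids:", new_centroids)
```

In this sketch the two connections assigned to index 0 contribute a summed gradient of -0.4, so that centroid moves from -1.0 to -0.8. Every connection sharing a bin therefore moves together.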

Regarding claim 7
Han teaches the method of claim 6.
Han further teaches wherein the re-compressing of the artificial neural network comprises re-compressing the artificial neural network by adjusting weights from among nodes belonging to different layers from among layers of the compressed artificial neural network according to the compression ratio. (Figure 1 “The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.” also see pg. 2 “First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)

Regarding claim 9
Han teaches the method of claim 1.
Han further teaches wherein the artificial neural network comprises a trained artificial neural network. (Section 3.1 “We use k-means clustering to identify the shared weights for each layer of a trained network, so that all the weights that fall into the same cluster will share the same weight.”)

Regarding claim 10
Han teaches the method of claim 1.
Han further teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1. (Pg. 9 section 6.3 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.”)

Regarding claim 12
Han teaches the method of claim 9.
Han further teaches wherein the controller is further configured to determine the compression rate to increase task accuracy, (Examiner notes that Han teaches increasing compression rate from 9x-13x and 27x-31x as clearly shown in “Figure 1: The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×.”)
in response to the task accuracy of the compressed artificial neural network being less than the initial value. (Pg. 6 section 5.3 “The VGG16 network as a whole has been compressed by 49×. Weights in the CONV layers are represented with 8 bits, and FC layers use 5 bits, which does not impact the accuracy. The two largest fully-connected layers can each be pruned to less than 1.6% of their original size.”)

Regarding claim 13 
Han teaches the method of claim 9.
Han further teaches wherein the controller is further configured to perform a compression-evaluation operation to determine a task accuracy of the re-compressed artificial neural network (pg. 2 “We build on top of that approach. As shown on the left side of Figure 1, we start by learning the connectivity via normal network training. Next, we prune the small-weight connections: all connections with weights below a threshold are removed from the network. Finally, we retrain the network to learn the final weights for the remaining sparse connections. Pruning reduced the number of parameters by 9× and 13× for AlexNet and VGG-16 model.”)
and a compression rate for the re-compressed artificial neural network. (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1 and also see pg. 5 section 5 “Pruning is implemented by adding a mask to the blobs to mask out the update of the pruned connections. Quantization and weight sharing are implemented by maintaining a codebook structure that stores the shared weight, and group-by-index after calculating the gradient of each layer. Each shared weight is updated with all the gradients that fall into that bucket.”)

Regarding claim 14
Han teaches the apparatus of claim 11.
Han further teaches the method further comprising performing the compression-evaluation operation based on an accuracy loss threshold and the task accuracy of the re-compressed artificial neural network. (Under its broadest reasonable interpretation, Examiner notes that the codebook retraining step compares against an accuracy loss threshold and quantizes the weights of the codebook; see “Figure 1: The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.”)

Regarding claim 16
Han teaches the apparatus of claim 9. 
Han further teaches wherein the controller is further configured to determine a compression ratio of the compressed artificial neural network based on the compression rate and a task accuracy for a task processed by the compressed artificial neural network, (see pg. 3 section 3 “During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration. For pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy. To calculate the compression rate, given k clusters, we only need log2(k) bits to encode the index. In general, for a network with n connections and each connection is represented with b bits, constraining the connections to have only k shared weights will result in a compression rate…”)
and re-compressing the artificial neural network based on the determined compression ratio. (Examiner notes that the three-stage compression pipeline includes a repeated step/update, shown by an arrow going back to the previous step, which corresponds to “re-compression”; see Figure 1 and see pg. 5 section 5 “Pruning is implemented by adding a mask to the blobs to mask out the update of the pruned connections. Quantization and weight sharing are implemented by maintaining a codebook structure that stores the shared weight, and group-by-index after calculating the gradient of each layer. Each shared weight is updated with all the gradients that fall into that bucket.”)

Regarding claim 17
Han teaches the apparatus of claim 14.
Han further teaches wherein the re-compressing of the artificial neural network comprises re-compressing the artificial neural network by adjusting weights from among nodes belonging to different layers from among layers of the compressed artificial neural network according to the compression ratio. (Figure 1 “The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.” also see pg. 2 “First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)

Regarding claim 18
Han teaches the apparatus of claim 14.
Han further teaches wherein the controller is further configured to re-compress the artificial neural network by adjusting weights with a value lesser than a threshold from among nodes of the compressed artificial neural network according to the compression ratio. (Figure 1 “The three stage compression pipeline: pruning, quantization and Huffman coding. Pruning reduces the number of weights by 10×, while quantization further improves the compression rate: between 27× and 31×. Huffman coding gives more compression: between 35× and 49×. The compression rate already included the meta-data for sparse representation. The compression scheme doesn’t incur any accuracy loss.” also see pg. 2 “First, we prune the networking by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized [corresponds to adjusting] so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored.”)

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over 
Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Luo et al. (“ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression”, hereinafter: Luo). 
Regarding claim 5 
Han teaches the method of claim 3. 
Han does not teach the method further comprising: comparing the compression rate of the re-compressed artificial neural network with a lower threshold; and determining whether to terminate a current compression session and to start a compression session after the compression rate is set to an initial reference value, based on a result of the comparison.  
Luo teaches the method further comprising: comparing the compression rate of the re-compressed artificial neural network with a lower threshold; (Examiner notes that Algorithm 1 shows a comparison of training items within the if condition under the for loop on lines 7-8; see section 4.2 “Hence, there are in total 100,000 training samples used for finding the optimal channel subset via Algorithm 1. We compared several different choices of image and location number, and found that the current choice (10 images per class and 10 locations per image) is enough for neuron importance evaluation.”)
and determining whether to terminate a current compression session and to start a compression session after the compression rate is set to an initial reference value, based on a result of the comparison. (Examiner notes that Algorithm 1 shows a comparison of training items within the if condition under the for loop on lines 7-8; see section 4.2 “Hence, there are in total 100,000 training samples used for finding the optimal channel subset via Algorithm 1. We compared several different choices of image and location number, and found that the current choice (10 images per class and 10 locations per image) is enough for neuron importance evaluation.” Also see pg. 5063 “To explore the limits of ThiNet, we prune VGG-16 with a larger compression rate 0.25, achieving 16× parameters reduction in convolutional layers. The conv5 layers are also pruned to get a smaller model. As for conv5-3, which is directly related to the final feature representation, we only prune half of the filters for accuracy consideration. Using these smaller compression ratios, we train a very small model. Denoted as “ThiNet-Tiny” in Table 1, it only takes 5.05MB disk space (1MB=2^20 bytes) but still has AlexNet-level accuracy (the top-1/top-5 accuracy of AlexNet is 57.2%/80.3%, respectively).”)
Han and Luo are analogous art because they are both directed to compressing neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network compression of Han with the filter level pruning for deep neural network compression of Luo.
One of ordinary skill in the art would have been motivated to make this modification in order to provide “an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages” as disclosed (Luo abstract “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages. We focus on the filter level pruning, i.e., the whole filter would be discarded if it is less important. Our method does not change the original network structure, thus it can be perfectly supported by any off-the-shelf deep learning libraries.”). 
Regarding claim 15
Claim 15 recites analogous limitations to claim 5 and therefore is rejected on the same ground as claim 5. 

Claims 8 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over 
Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Ko et al. (“Adaptive Weight Compression for Memory-Efficient Neural Networks”, hereinafter: Ko).
Regarding claim 8
Han teaches the method of claim 6. 
Han does not teach wherein the compression ratio is determined to reduce a degree of loss of a task accuracy with respect to a compression rate.  
Ko teaches wherein the compression ratio is determined to reduce a degree of loss of a task accuracy with respect to a compression rate. (Pg. 203 left col “Table 1 shows compression ratio achieved by different approaches at 1% loss of normalized accuracy. For the MNIST dataset, the proposed adaptive QF control approach achieves 42.4X compression, which is significantly higher than other approaches. When combined with the bit truncation, the weight size reduces by 273X, from 446KB to 1.6 KB.”) 
Han and Ko are analogous art because they are both directed to compressing neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the neural network compression of Han with the weight compression for memory-efficient neural networks of Ko.
One of ordinary skill in the art would have been motivated to make this modification in order to adaptively control the quantization factor of the JPEG algorithm. Doing so provides the advantage of compressing less for higher accuracy, as disclosed (Ko abstract “To minimize the loss of accuracy due to JPEG encoding, we propose to adaptively control the quantization factor of the JPEG algorithm depending on the error-sensitivity (gradient) of each weight. With the adaptive compression technique, the weight blocks with higher sensitivity are compressed less for higher accuracy. The adaptive compression reduces memory requirement, which in turn results in higher performance and lower energy of neural network hardware. The simulation for inference hardware for multilayer perceptron with the MNIST dataset shows up to 42X compression with less than 1% loss of recognition accuracy, resulting in 3X higher effective memory bandwidth and ~19X lower system energy.”). 

Regarding claim 19
Claim 19 recites analogous limitations to claim 8 and therefore is rejected on the same ground as claim 8. 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598. The examiner can normally be reached Mon - Fri 8:00-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/V.M./Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126