Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
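Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.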
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
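For illustration only, the three-prong test and the two presumptions described above can be sketched as a small decision function. This is a simplified sketch and not a statement of how the determination is actually made; the class name `ClaimLimitation`, its fields, and the function `invokes_112f` are hypothetical and are not part of the MPEP or of the application.

```python
from dataclasses import dataclass


@dataclass
class ClaimLimitation:
    """Hypothetical record of the facts an examiner would assess for one limitation."""
    uses_means_or_step: bool            # prong (A): literal "means" or "step"
    uses_generic_placeholder: bool      # prong (A): nonce term substituting for "means"
    modified_by_function: bool          # prong (B): e.g. "means for", "configured to"
    recites_sufficient_structure: bool  # prong (C) is met only when this is False


def invokes_112f(lim: ClaimLimitation) -> bool:
    """Return True when all three prongs of the test are met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.modified_by_function
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# "means for compressing ..." with no structure recited in the claim itself
means_for_compressing = ClaimLimitation(True, False, True, False)
# a limitation that recites the structure performing the function
structural_limitation = ClaimLimitation(False, False, True, True)
```

In this sketch the presumptions appear only as the starting value of prong (A); whether a given term is a generic placeholder or recites sufficient structure remains a judgment made in light of the specification.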
Regarding claim 11
Claim 11 invokes 35 U.S.C. 112(f) by “means for compressing at least one of activations or weights in at least one layer of the neural network based at least in part on a compression ratio and a system event to produce at least one of compressed activations or compressed weights”, which is supported in paragraphs [0057] and [0058] of the specification: “In some aspects, the compression ratio may be set based on the type of layer in the computational network (e.g., DCN 350), or the sensitivity to noise for a given layer. For example, a lower ratio of compression may be applied in fully-connected layers or convolution layers and a greater ratio of compression may be applied for pooling layer. The compressor unit 408 may also  
Claim 11 also invokes 35 U.S.C. 112(f) by “means for operating neural network based on the at least one of the compressed activations or the compressed weights”, which is supported in paragraph [0059] of the specification: “When the compressed weights or activations are to be read, a decompressor unit 416 [from Figure 4 shown below] uses the compression metadata 412 to decompress the compressed activations/weights 414 (e.g., using a complement to the compression algorithm). The decompressed activations and weights may then be supplied to consumer unit 418. The consumer may comprise for example a hardware accelerator, a CPU, or other processing element. Of course, in some aspects, the consumer unit 418 may be the same block as the producer unit 406. For example, a hardware accelerator may consume activations that it produced during the processing of the preceding layer. Thus, the hardware accelerator may serve as both the producer unit 406 and the consumer unit 418. In some aspects, the consumer unit 418 may consume or utilize compressed data from multiple producers. For instance, a hardware accelerator may consume activations that it produced itself while also consuming weights that were produced by the CPU.”
Regarding claim 13
Claim 13 invokes 35 U.S.C. 112(f) by “means for adapting the compression ratio mid-layer in response to a change in at least one of a power condition, a debug condition, a 

[Image: media_image1.png]

Regarding claim 15
Claim 15 invokes 35 U.S.C. 112(f) by “means for determining the compression ratio using a compression map, the compression map being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified bandwidth, a specified power level, a debug state, or a thermal profile” which is supported in Specifications paragraphs [0055] and [0056] and Figure 4.  “The producer units 406 may comprise, for example, a CPU (e.g., CPU 102), or other processing unit (e.g., GPU 104) of the computational network.   The producer unit 406 generates an output (e.g., memory bandwidth) including weights and activations for the next layer of the computational network that are supplied to the compressor unit 408. Although one producer unit 406 is shown in FIGURE 4, this is merely exemplary and not limiting and any number of producers may be included in the system architecture 400. The compressor unit 408 applies a compression algorithm based on a layer-wise compression map from the compression map unit 402 to compress the memory bandwidth (e.g., activations and/or weight). Applying the compression algorithm, the compressor unit may leverage existing sparsity in a layer and/or increase the sparsity in a layer to reduce the computations to be performed in the compressed layer. In one example, the compression algorithm may be a rectifier which sets weight values (or activations) that are less than 0.5 to zero. As such, computations using such value may be skipped thereby reducing the computations performed in a layer. In some aspects, the compression map may specify a compression ratio   (e.g., 60%) to be applied at a particular layer of the computational network.  The compression map unit 402 may generate the compression map based on one or more system events, for instance. The system events may include but are not limited to a bandwidth condition, a thermal condition, a debug condition, or a power condition. For 
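As an illustration of the rectifier example in the quoted paragraph (values less than 0.5 are set to zero so that the corresponding computations may be skipped), a minimal sketch follows. The function names `rectify`, `sparsity`, and `sparse_dot` and the sample values are hypothetical; the specification does not disclose code, and this is not presented as the applicant's implementation.

```python
def rectify(values, threshold=0.5):
    """Set every value that is less than `threshold` to zero."""
    return [v if v >= threshold else 0.0 for v in values]


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


def sparse_dot(weights, inputs):
    """Dot product that skips multiplications where the weight is zero.

    Returns the result and the number of multiplications actually performed.
    """
    total, multiplies = 0.0, 0
    for w, x in zip(weights, inputs):
        if w == 0.0:
            continue  # skipped computation
        total += w * x
        multiplies += 1
    return total, multiplies


weights = [0.9, 0.1, 0.6, 0.3, 0.0, 0.75]
compressed = rectify(weights)
result, multiplies = sparse_dot(compressed, [1.0] * len(compressed))
```

With these sample values, half of the entries are zero after rectification, so only three of the six multiplications are performed.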

[Image: media_image2.png]

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-15 are rejected under 35 U.S.C. 103 as being unpatentable over Lou et al. (hereafter Lou), “ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression”, in view of Ko et al. (hereafter Ko), “Adaptive Weight Compression for Memory-Efficient Neural Networks”.
Regarding claim 1
A method of operating a neural network, comprising:
Lou teaches compressing at least one of activations or weights in at least one layer of the neural network ([Title] “ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression” and [Abstract] “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages” and [pg. 3, Col 1, lines 27-30] “Given a pre-trained model, it would be pruned layer by layer with a predefined compression rate” and [pg. 6, Col 1, lines 16-34] “For example, if we removed 60% filters in conv 1-1 using the small weight criterion, the top-1 accuracy is only 40.99% (for the fine-tuning), while random criterion is 51.26%.  By contrast, our method (ThiNet w/o w) can reach 68.24% and even 70.75% with least squares (ThiNet).  The accuracy loss of weight sum is large that fine-tuning cannot completely recover it from the drop.  In contrast, our method shows much higher and robust results.  The least squares approach does indeed aid to get a better weight initialization for fine-tuning, especially when the compression rate is relatively high”.  The examiner notes “compress CNN models in both training and inference stages” teaches “compressing”, and “it would be pruned layer by layer” teaches “in at least one layer”, and “Deep Neural Network Compression” teaches “of the neural network”, and “if we removed 60% filters in conv1-1 using the small weight criterion” teaches “at least one of activations or weights”).
based at least in part on a compression ratio ([Title] “ThiNet: A filter Level Pruning Method for Deep Neural Network Compression” and [Abstract] “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages” and [pg. 3, Col 1, lines 27-30] “Given a pre-trained model, it would be pruned layer by layer with a predefined  compression rate”, the examiner notes “with a predefined compression rate” teaches “based at least in part on a compression ratio”).
to produce at least one of compressed activations or compressed weights; ([pg. 3, Col 2, Lines 6-12] “Weak channels in layer (i + 1)’s input and their corresponding filters in layer i would be pruned away, leading to a much smaller model.  Note that, the pruned network has exactly the same structure but with fewer filters and channels.  In other words, the original wide network is becoming much thinner”.  The examiner notes “the pruned network has exactly the same structure but with fewer filters and channels” teaches “at least one of compressed activations or compressed weights”, where the filters and channels include the “compressed activations or compressed weights”). 
and operating the neural network to compute an inference based on the at least one of the compressed activations or the compressed weights.  ([pg. 6, Col 1, Lines 30-34] “our method shows much higher and robust results.  The least squares approach does indeed aid to get a better weight initialization for fine-tuning, especially when the compression rate is relatively high.” and [pg. 6, Col 2, lines 33-40] “We summarize the performance of the ThiNet approach in Table 1.  Here, “ThiNet-Conv” refers to the model in which only the first 10 convolutional layers are pruned with compression rate (i.e., half of the filters are removed in each layer til conv4-3) as stated above”.  The examiner notes “summarize the performance of the ThiNet approach” teaches “operating the neural network to compute an inference”, and “the least squares approach does indeed aid to get a better weight initialization for fine-tuning, especially when the compression rate is relatively high” teaches “based on the at least one of the compressed activations or the compressed weights”).
Lou does not teach a system event.
Ko teaches a system event.  ([pg. 2, col 1, lines 6-12] “We present the design of an inference engine with a JPEG decoder embedded in the memory controller.  We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead”, the examiner notes “We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead” teaches “a system event”, as defined in specifications paragraph [0056] “The system events may include but are not limited to a bandwidth condition, a thermal condition, a debug condition, or a power condition”, and “memory requirement, performance,” teaches “debug condition” as described in Specifications paragraph [0067] “debugging conditions (e.g., memory errors)”).
Lou and Ko are analogous art because both are in the field of neural network compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lou to incorporate the teaching of Ko to use system requirements as per Ko [pg. 2, col 1, lines 6-12] to guide weight compression to improve neural network performance.
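For illustration only, the combination described for claim 1 (a compression ratio selected in view of a system condition, applied layer-wise to the weights, followed by inference on the compressed weights) can be sketched as below. Everything here is an assumption made for the sketch: the table `RATIO_BY_EVENT`, the event names, and the functions are hypothetical, and the ranking rule is the simple "small weight" (L1 norm) criterion mentioned in the Lou quotation rather than the ThiNet selection method itself or any implementation disclosed by Ko or the applicant.

```python
# Hypothetical mapping from a system event to the fraction of filters removed.
RATIO_BY_EVENT = {"normal": 0.0, "low_power": 0.5, "thermal_limit": 0.75}


def prune_filters(filters, ratio):
    """Drop the `ratio` fraction of filters with the smallest L1 norm."""
    n_drop = int(len(filters) * ratio)
    ranked = sorted(range(len(filters)),
                    key=lambda i: sum(abs(w) for w in filters[i]))
    keep = sorted(ranked[n_drop:])  # preserve the original filter order
    return [filters[i] for i in keep]


def layer_output(filters, inputs):
    """One fully connected layer with ReLU, one output per remaining filter."""
    return [max(0.0, sum(w * x for w, x in zip(f, inputs))) for f in filters]


def run_layer(filters, inputs, event):
    """Select the ratio from the event, compress, then compute the layer."""
    compressed = prune_filters(filters, RATIO_BY_EVENT[event])
    return layer_output(compressed, inputs)


filters = [[1.0, -1.0], [0.1, 0.1], [2.0, 0.5], [0.0, 0.2]]
inputs = [1.0, 2.0]
low_power_out = run_layer(filters, inputs, "low_power")
normal_out = run_layer(filters, inputs, "normal")
```

Under the hypothetical "low_power" event, the two filters with the smallest L1 norm are removed and the layer produces two outputs instead of four.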
Regarding claim 2
The combination of Lou and Ko teaches claim 1.
Lou teaches wherein the compression is performed during computation of an inference ([pg. 1, Abstract, lines 1-3] “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages”, the examiner notes “simultaneously accelerate and compress CNN models in both training and inference stages” teaches “wherein the compression is performed during computation of an inference”).
Regarding claim 3
The combination of Lou and Ko teaches claim 1.
Lou teaches wherein the compression ratio is adapted mid-layer, ([Figure 1] In Figure 1, shown below, the compression is adapted to the neural network at the level of an individual layer, and teaches “wherein the compression ratio is adapted mid-layer”.  [pg. 3, Col 1, lines 30-34] – [pg. 3, Col 2, lines 1-20] “if we can use a subset of channels in layer (i + 1)’s input to approximate the output in layer i +1, the other channels can be safely removed from the input of layer i +1. (…) Weak channels in layer (i +1)’s input and their corresponding filters in layer i would be pruned away, leading to a much smaller model.” and [pg. 5, Col 2, lines 26-28] “Starting from this fine-tuned model, we then prune the network layer by layer with different compression rate”).

[Image: media_image3.png (Lou, Figure 1)]


Lou does not teach in response to a change in at least one of a power condition, a bandwidth condition, a debug condition, or a thermal condition.
Ko teaches in response to a change in at least one of a power condition, a bandwidth condition, a debug condition, or a thermal condition.  ([pg. 2, col 1, lines 6-12] “We present the design of an inference engine with a JPEG decoder embedded in the memory controller.  We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead”, the examiner notes “We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead” teaches “at least one of a power condition, a bandwidth condition, a debug condition, or a thermal condition”, where “energy consumption of a hardware system for inference” teaches both “a power condition” and “a thermal condition”, and “memory requirement, performance,” teaches “debug condition” as described in Specifications paragraph [0067] “debugging conditions (e.g., memory errors)”).
Lou and Ko are analogous art because both are in the field of neural network compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lou to incorporate the teaching of Ko to use system requirements as per Ko [pg. 2, col 1, lines 6-12] to guide weight compression to improve neural network performance.
Regarding claim 4
The combination of Lou and Ko teaches claim 1.
Lou does not teach wherein the system event comprises at least one of a bandwidth condition, a power condition, a debug condition, or a thermal condition.
Ko teaches wherein the system event comprises at least one of a bandwidth condition, a power condition, a debug condition, or a thermal condition.  ([pg. 2, col 1, lines 6-12] “We present the design of an inference engine with a JPEG decoder embedded in the memory controller.  We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead”, the examiner notes “We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead” teaches “at least one of a power condition, a bandwidth condition, a debug condition, or a thermal condition”, where “energy consumption of a hardware system for inference” teaches both “a power condition” and “a thermal condition”, and “memory requirement, performance,” teaches “debug condition” as described in Specifications paragraph [0067] “debugging conditions (e.g., memory errors)”).
Lou and Ko are analogous art because both are in the field of neural network compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lou to incorporate the teaching of Ko to use system requirements as per Ko [pg. 2, col 1, lines 6-12] to guide weight compression to improve neural network performance.
Regarding claim 5
The combination of Lou and Ko teaches claim 4.
Lou does not teach wherein the compression ratio is determined using a compression map, the compression map being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified bandwidth, power or thermal profile.
Ko teaches wherein the compression ratio is determined using a compression map, the compression map being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified bandwidth, power or thermal profile. ([pg. 3, Col 2, lines 6-12] “Overall encoding and decoding process is described in Figure 4.  First, the weight values are scaled to [0,256] to make them grayscale pixel values for an JPEG encoder.  Then, every 64 elements of each row of the 2-D weight matrix are reshaped into 8x8 sub-blocks.” and [pg. 5, Col 2, lines 2-12] “As compression reduces the memory size required to store the trained weights, memory access delay and energy during inference will decrease accordingly.  However, the proposed compression approach requires JPEG decoding, which incurs additional computation time and energy.  Therefore, to achieve the advantage in system throughput and energy, compression ratio should be high enough so that the reduction in memory access time and energy can compensate for decoding overhead.” and [pg. 6, Col 1, lines 2-41] “In the baseline system without compression, the effective bandwidth is limited by the off-chip memory access time of a block [Figure 8(a)].  As access time for a block decreases due to the adaptive JPEG compression, the effective memory throughput increases even considering the decoding time [Figure 8(b) and Figure 9(a)].  Note that pipelining fetch and JPEG decoding stages helps improve the performance.  As compression ratio increases, reduced block access time enhances the throughput until the decoding time becomes the throughput bottleneck [Figure 8(c)].  Therefore, the system throughput saturates at higher compression as illustrated in [Figure 9(a)].  (…) To analyze the energy advantage, we illustrate the sum of memory access and decoding energy for given compression ratio [Figure 9(b)].  For all memory types, compression leads to significant reduction in energy because of reduced memory access energy”.  The examiner notes “As compression ratio increases” and “adaptive JPEG compression” teach “the compression ratio is determined using a compression map”, where “adaptive JPEG compression” and “JPEG encoder” teach “compression map”, and “to achieve the advantage in system throughput and energy, compression ratio should be high enough so that the reduction in memory access time and energy can compensate for decoding overhead” teaches “the compression map being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified bandwidth, power or thermal profile”.  Figure 9(a) and Figure 9(b) show Effective bandwidth and Energy per block versus Compression ratio, which teach “sparsity estimates for a specified bandwidth, power or thermal profile” because both bandwidth and energy requirements are factored into the compression ratio and adaptation of the JPEG encoder).

[Image: media_image4.png]

Lou and Ko are analogous art because both are in the field of neural network compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lou to incorporate the teaching of Ko to use system requirements and a JPEG decoder as per Ko [pg. 5, Col 2, lines 2-12] to guide weight compression to improve bandwidth and energy efficiency.
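For illustration only, the pre-processing that the Ko quotation describes before JPEG encoding (scaling weight values to grayscale pixel values, then reshaping every 64 elements of a row into an 8x8 sub-block) can be sketched as follows. This is a minimal sketch under stated assumptions: an 8-bit pixel range of 0 to 255 is assumed, the function names `to_pixels` and `to_blocks` are hypothetical, and the JPEG encoding step itself is omitted. It is not Ko's implementation.

```python
def to_pixels(row, lo, hi):
    """Linearly map weights in [lo, hi] to integer grayscale levels 0..255.

    The 0..255 range is an assumption made for this sketch.
    """
    span = (hi - lo) or 1.0  # avoid division by zero for a constant row
    return [round((w - lo) / span * 255) for w in row]


def to_blocks(pixels):
    """Split one row into 8x8 sub-blocks, 64 elements at a time."""
    if len(pixels) % 64 != 0:
        raise ValueError("row length must be a multiple of 64")
    blocks = []
    for start in range(0, len(pixels), 64):
        chunk = pixels[start:start + 64]
        blocks.append([chunk[r * 8:(r + 1) * 8] for r in range(8)])
    return blocks


row = [i / 127 for i in range(128)]  # sample weights spanning 0.0 .. 1.0
pixels = to_pixels(row, lo=0.0, hi=1.0)
blocks = to_blocks(pixels)
```

Each resulting 8x8 block would then be handed to an encoder; on the inference side the decoded block would be reshaped back into a 64-element row vector, as the quoted passage from Ko [pg. 4, Col 1, lines 1-12] describes.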
Regarding claim 6
Lou teaches compress at least one of activations or weights in at least one layer of the neural network based at least in part on a compression ratio, ([Title] “ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression” and [Abstract] “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages” and [pg. 3, Col 1, lines 27-30] “Given a pre-trained model, it would be pruned layer by layer with a predefined compression rate” and [pg. 6, Col 1, lines 16-34] “For example, if we removed 60% filters in conv 1-1 using the small weight criterion, the top-1 accuracy is only 40.99% (for the fine-tuning), while random criterion is 51.26%.  By contrast, our method (ThiNet w/o w) can reach 68.24% and even 70.75% with least squares (ThiNet).  The accuracy loss of weight sum is large that fine-tuning cannot completely recover it from the drop.  In contrast, our method shows much higher and robust results.  The least squares approach does indeed aid to get a better weight initialization for fine-tuning, especially when the compression rate is relatively high”.  The examiner notes “compress CNN models in both training and inference stages” teaches “compressing”, and “it would be pruned layer by layer” teaches “in at least one layer”, and “Deep Neural Network Compression” teaches “of the neural network”, and “if we removed 60% filters in conv1-1 using the small weight criterion” teaches “at least one of activations or weights”, and “with a predefined compression rate” teaches “based at least in part on a compression ratio”).
([pg. 3, Col 2, Lines 6-12] “Weak channels in layer (i + 1)’s input and their corresponding filters in layer i would be pruned away, leading to a much smaller model.  Note that, the pruned network has exactly the same structure but with fewer filters and channels.  In other words, the original wide network is becoming much thinner”.  The examiner notes “the pruned network has exactly the same structure but with fewer filters and channels” teaches “at least one of compressed activations or compressed weights”). 
Lou does not teach “An apparatus of operating a neural network, comprising: a memory; and at least one processor coupled to the memory, the at least one processor being configured to:” or “a system event”.
Ko teaches An apparatus of operating a neural network, comprising: a memory; and at least one processor coupled to the memory, the at least one processor being configured to: ([pg. 4, Col 1, lines 1-12] “At the hardware device for neural network inference, one block of compressed weights is loaded into the on-chip buffer, together with the QF information.  The block is decoded by a JPEG decoder and reshaped into the original form of a 64-element row vector, which is fed into a MAC unit.  Here we assume the operations (off-chip memory access, JPEG decoding, and MAC operations) are pipelined in a unit of an 8x8 block, to improve the throughput.  The major overhead added to the inference device for adaptive image compression is the JPEG decoder.  Note the time/energy complexity of JPEG decoder is much lower than encoder as in common image encoding algorithms.”  The examiner notes “at the hardware device for neural network inference” teaches “An apparatus of operating a neural network”, and “one block of compressed weights is loaded into the on-chip buffer,” teaches “a memory”, and “MAC unit” teaches one processor, and [Figure 4] (shown below) “Inference engine” and “off-chip memory” in subfigure (b), and “144 MAC units” in subfigure (c) teach “at least one processor coupled to the memory”).

[Image: media_image5.png (Ko, Figure 4)]

and a system event ([pg. 2, col 1, lines 6-12] “We present the design of an inference engine with a JPEG decoder embedded in the memory controller.  We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead”, the examiner notes “weight compression” teaches “compression ratio”, and “JPEG decoder” teaches “compression map”, and “considering memory access energy / latency” teaches “being specified based at least in part on at least one of sparsity estimates or a loss threshold for a specified (…) power or thermal profile” because “considering memory access energy” addresses both the power required for memory access and thermal considerations, since the author uses the term “energy”, which encompasses both power for processing and thermal waste energy).
Lou and Ko are analogous art because both are in the field of neural network compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lou to incorporate the teaching of Ko to use a hardware device and system requirements as per Ko [pg. 2, col 1, lines 6-12] to guide weight compression and improve neural network performance.
Regarding claim 7
The combination of Lou and Ko teaches claim 6.
Lou teaches perform the compression during computation of an inference.  ([Abstract] “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages”).
Lou does not teach wherein the at least one processor is further configured.
Ko teaches wherein the at least one processor is further configured to: ([pg. 4, Col 1, lines 1-12] “At the hardware device for neural network inference, one block of compressed weights is loaded into the on-chip buffer, together with the QF information.  The block is decoded by a JPEG decoder and reshaped into the original form of a 64-element row vector, which is fed into a MAC unit.  Here we assume the operations (off-chip memory access, JPEG decoding, and MAC operations) are pipelined in a unit of an 8x8 block, to improve the throughput.  The major overhead added to the inference device for adaptive image compression is the JPEG decoder.  Note the time/energy complexity of JPEG decoder is much lower than encoder as in common image encoding algorithms.”  The examiner notes “at the hardware device for neural network inference” teaches “An apparatus of operating a neural network”, and “one block of compressed weights is loaded into the on-chip buffer,” teaches “a memory”, and “MAC unit” teaches one processor, and [Figure 4] (shown above) “Inference engine” and “off-chip memory” in subfigure (b), and “144 MAC units” in subfigure (c) teach “at least one processor”).
Lou and Ko are analogous art because both are in the field of neural network compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lou to incorporate the teaching of Ko to use a processor as per Ko [pg. 2, col 1, lines 6-12] to perform weight compression and improve neural network performance.
Regarding claim 8
The combination of Lou and Ko teaches claim 6.
Dependent Claim 8 incorporates substantially all the limitations of Claim 4 in a system and is rejected under the same rationale.
Regarding claim 9
The combination of Lou and Ko teaches claim 8.
Dependent Claim 9 incorporates substantially all the limitations of Claim 5 in a system and is rejected under the same rationale.
Regarding claim 10
The combination of Lou and Ko teaches claim 6.
Dependent Claim 10 incorporates substantially all the limitations of Claim 3 in a system and is rejected under the same rationale.
Regarding claim 11
Lou teaches means for compressing at least one of activations or weights in at least one layer of the neural network ([Title] “ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression” and [Abstract] “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages” and [pg. 3, Col 1, lines 27-30] “Given a pre-trained model, it would be pruned layer by layer with a predefined compression rate” and [pg. 6, Col 1, lines 16-34] “For example, if we removed 60% filters in conv 1-1 using the small weight criterion, the top-1 accuracy is only 40.99% (for the fine-tuning), while random criterion is 51.26%.  By contrast, our method (ThiNet w/o w) can reach 68.24% and even 70.75% with least squares (ThiNet).  The accuracy loss of weight sum is large that fine-tuning cannot completely recover it from the drop.  In contrast, our method shows much higher and robust results.  The least squares approach does indeed aid to get a better weight initialization for fine-tuning, especially when the compression rate is relatively high”.  The examiner notes “compress CNN models in both training and inference stages” teaches “compressing”, and “it would be pruned layer by layer” teaches “in at least one layer”, and “Deep Neural Network Compression” teaches “of the neural network”, and “if we removed 60% filters in conv1-1 using the small weight criterion” teaches “at least one of activations or weights”).
based at least in part on a compression ratio, ([Title] “ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression” and [Abstract] “We propose an efficient and unified framework, namely ThiNet, to simultaneously accelerate and compress CNN models in both training and inference stages” and [pg. 3, Col 1, lines 27-30] “Given a pre-trained model, it would be pruned layer by layer with a predefined compression rate”, the examiner notes “with a predefined compression rate” teaches “based at least in part on a compression ratio”).
to produce at least one of compressed activations or compressed weights; ([pg. 3, Col 2, Lines 6-12] “Weak channels in layer (i + 1)’s input and their corresponding filters in layer i would be pruned away, leading to a much smaller model.  Note that, the pruned network has exactly the same structure but with fewer filters and channels.  In other words, the original wide network is becoming much thinner”.  The examiner notes “the pruned network has exactly the same structure but with fewer filters and channels” teaches “at least one of compressed activations or compressed weights”). 
and means for operating the neural network to compute an inference based on the at least one of the compressed activations or the compressed weights ([pg. 6, Col 1, Lines 30-34] “our method shows much higher and robust results.  The least squares approach does indeed aid to get a better weight initialization for fine-tuning, especially when the compression rate is relatively high.” and [pg. 6, Col 2, lines 33-40] “We summarize the performance of the ThiNet approach in Table 1.  Here, “ThiNet-Conv” refers to the model in which only the first 10 convolutional layers are pruned with compression rate (i.e., half of the filters are removed in each layer til conv4-3) as stated above”.  The examiner notes “summarize the performance of the ThiNet approach” teaches “operating the neural network to compute an inference”, and “The least squares approach does indeed aid to get a better weight initialization for fine-tuning, especially when the compression rate is relatively high” teaches “based on the at least one of the compressed activations or the compressed weights”).
Lou does not teach the limitations “An apparatus of operating a neural network, comprising:” and “a system event”.
Ko teaches An apparatus of operating a neural network, comprising: ([pg. 4, Col 1, lines 1-12] “At the hardware device for neural network inference, one block of compressed weights is loaded into the on-chip buffer, together with the QF information.  The block is decoded by a JPEG decoder and reshaped into the original form of a 64-element row vector, which is fed into a MAC unit.  Here we assume the operations (off-chip memory access, JPEG decoding, and MAC operations) are pipelined in a unit of an 8x8 block, to improve the throughput.  The major overhead added to the inference device for adaptive image compression is the JPEG decoder.  Note the time/energy complexity of JPEG decoder is much lower than encoder as in common image encoding algorithms.”  The examiner notes “At the hardware device for neural network inference” teaches “An apparatus of operating a neural network”).
and a system event ([pg. 2, col 1, lines 6-12] “We present the design of an inference engine with a JPEG decoder embedded in the memory controller.  We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead”.  The examiner notes “We study the impact of weight compression on the memory requirement, performance, and energy consumption of a hardware system for inference, by considering memory access energy / latency and decoder overhead” teaches “a system event”, as defined in paragraph [0056] of the specification, “The system events may include but are not limited to a bandwidth condition, a thermal condition, a debug condition, or a power condition”, and “memory requirement, performance,” teaches “debug condition” as described in paragraph [0067] of the specification, “debugging conditions (e.g., memory errors)”).
Lou and Ko are analogous art because each works in the field of neural network compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Lou to incorporate the teaching of Ko to use a hardware device and system requirements, as per Ko [pg. 2, col 1, lines 6-12], to guide weight compression and thereby improve neural network performance.
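For illustration only, the filter-level pruning discussed in the cited passages can be sketched as follows. This is a minimal sketch of the "small weight criterion" that Lou mentions as a baseline (removing the filters whose weights have the smallest absolute sum); it is not Lou's own selection method and is not taken from either reference. The function name `prune_filters_small_weight` and the parameter `remove_fraction` are hypothetical names introduced for this example, and the weight layout `(out_channels, in_channels, kh, kw)` is an assumption.

```python
import numpy as np


def prune_filters_small_weight(weights, remove_fraction):
    """Remove the filters of one layer that have the smallest L1 norm.

    weights: array of shape (out_channels, in_channels, kh, kw) (assumed layout).
    remove_fraction: fraction of filters to remove, in [0, 1) (hypothetical parameter).
    Returns the pruned weights and the indices of the filters that were kept.
    """
    if not 0.0 <= remove_fraction < 1.0:
        raise ValueError("remove_fraction must be in [0, 1)")

    n_filters = weights.shape[0]
    # Always keep at least one filter so the layer stays usable.
    n_remove = min(int(round(n_filters * remove_fraction)), n_filters - 1)

    # Score each filter by the sum of the absolute values of its weights.
    scores = np.abs(weights).reshape(n_filters, -1).sum(axis=1)

    # Drop the n_remove lowest-scoring filters; keep the rest in original order.
    keep = np.sort(np.argsort(scores)[n_remove:])
    return weights[keep], keep


if __name__ == "__main__":
    # Ten filters whose magnitudes increase with the filter index.
    w = np.stack([np.full((3, 3, 3), i + 1.0) for i in range(10)])
    pruned, kept = prune_filters_small_weight(w, remove_fraction=0.6)
    print(pruned.shape, kept)
```

The pruned layer keeps the same structure with fewer filters, which corresponds to the "same structure but with fewer filters and channels" language quoted above. In a full network, the matching input channels of the next layer would also have to be removed; that step is omitted here.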
Regarding claim 12
The combination of Lou and Ko teaches claim 11.
Dependent Claim 12 incorporates substantively all the limitations of Claim 2 in a system and is rejected under the same rationale.
Regarding claim 13
The combination of Lou and Ko teaches claim 11.
Dependent Claim 13 incorporates substantively all the limitations of Claim 3 in a system and is rejected under the same rationale.
Regarding claim 14
The combination of Lou and Ko teaches claim 11.
Dependent Claim 14 incorporates substantively all the limitations of Claim 4 in a system and is rejected under the same rationale.
Regarding claim 15
The combination of Lou and Ko teaches claim 14.
Dependent Claim 15 incorporates substantively all the limitations of Claim 5 in a system and is rejected under the same rationale.
Claims 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lou in view of Ko, and in further view of Annapureddy et al. (hereafter Annapureddy) (US 2016/0217369).
Regarding claim 16
Independent Claim 16 incorporates substantively all the limitations of Claim 11 in a computer program product.
The combination of Lou and Ko teaches substantively all the limitations except for the non-transitory computer program product storing instructions; however, Annapureddy teaches these.
Annapureddy teaches A non-transitory computer-readable medium storing computer executable code for operating a neural network, comprising code to: ([pg. 15, col 1, lines 7-14] “A non-transitory computer readable medium having encoded thereon program code for compressing a neural network, the program code being executed by a processor and comprising:”).
Lou, Ko, and Annapureddy are analogous art because each works in the field of neural network compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Lou and Ko to incorporate the teaching of Annapureddy to use a non-transitory computer-readable medium storage as per Annapureddy [pg. 15, col 1, lines 7-14] to store program code for compressing neural networks.
Regarding claim 17
The combination of Lou, Ko, and Annapureddy teaches claim 16.
Dependent Claim 17 incorporates substantively all the limitations of Claim 2 in a system and is rejected under the same rationale.
Regarding claim 18
The combination of Lou, Ko, and Annapureddy teaches claim 16.
Dependent Claim 18 incorporates substantively all the limitations of Claim 4 in a system and is rejected under the same rationale.
Regarding claim 19
The combination of Lou, Ko, and Annapureddy teaches claim 18.
Dependent Claim 19 incorporates substantively all the limitations of Claim 5 in a system and is rejected under the same rationale.
Regarding claim 20
The combination of Lou, Ko, and Annapureddy teaches claim 16.
Dependent Claim 20 incorporates substantively all the limitations of Claim 3 in a system and is rejected under the same rationale.
The prior art made of record and not relied upon that is considered pertinent to applicant's disclosure is listed below:
Yao (NPL: “DeepIoT: Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework”): teaches using device memory considerations to determine the compression ratio.
McDanel (NPL: “Incomplete Dot Products for Dynamic Computation Scaling in Neural Network Inference”): teaches adjusting the number of input channels in each layer of a convolutional neural network during inference.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN S WALKER whose telephone number is (303)297-4479.  The examiner can normally be reached on Monday - Friday 0730-1700 (MT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANN LO, can be reached on (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-

/BENJAMIN WALKER/Examiner, Art Unit 2126      
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126