DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 09/29/2017. 
This action is in response to arguments and/or amendments filed on 06/22/2021. In the current amendments, claims 1, 15 and 23 have been amended and claims 1-25 are currently pending and have been examined. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/22/2021 has been entered.
 

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 6, 9-11, 15-18, 20, 23 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (“Weighted-Entropy-based Quantization for Deep Neural Networks”, hereinafter: Park) in view of Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) and further in view of Choi et al. (“Towards the limit of Network Quantization”, hereinafter: Choi).
Regarding claim 1 (Currently Amended)
Park teaches an apparatus comprising: logic circuitry (pg. 5456 right col “memory capacity requirements of neural networks during the inference by exploiting the benefits of dedicated hardware accelerators, e.g. NVIDIA P40 and P4 [2][corresponds to processor] which support 8-bit integer arithmetic or Stripes [14] which provides execution time and energy consumption proportional to the bitwidth.” Also see pg. 5461 section 5.1 “In the cases of GoogLeNet and ResNet, the batch size is limited due to insufficient GPU memory capacity; this may increase overall accuracy loss. We use ILSVRC2012 data set, which contains 1.28M images for training and 50K images for testing. During the six epochs of fine-tuning, we first set the initial learning rate to 0.001 and decrease it by 10 times every two epochs.”) to compress one or more activation functions for a convolutional network based on non-uniform quantization, (Examiner notes that each activation is quantized differently, which makes the quantization of the network non-uniform; see pg. 5460 “We integrate the proposed weight/activation quantization into the conventional training algorithm for neural networks. Since weights do not change during each minibatch, weight quantization can be simply applied by quantizing the weights at the end of each mini-batch after the weight update. Note that we use full-precision weights during the weight update as in other previous work [19, 25]. On the other hand, activation quantization has to be applied to every forward/backward pass as each pass has its own set of activations. For each layer, we first perform the forward pass and apply the ordinary ReLU (without LogQuant). The resulting activations are fed into our algorithm for LogQuant parameter search.”)
wherein the non-uniform quantization for each layer of the convolutional network is to be performed offline, (Examiner interprets offline as “pre-trained network” see pg. section 4.2 “Activation quantization needs a different approach from weight quantization. While weights are fixed after the training, activations change at inference time according to the input data. This makes activations less suitable to be quantized by clustering-based approaches, which require a stable distribution of values.”)
wherein an activation function for a specific layer of the convolutional network is to be quantized during runtime, (pg. 5460 section 4.3 “For each layer, we first perform the forward pass and apply the ordinary ReLU (without LogQuant). The resulting activations are fed into our algorithm for LogQuant parameter search. The best base/offset combination from the algorithm is then used to quantize the activations by using WEIGHTEDLOGQUANTRELU. The quantized activations are passed to the next layer to perform the same process for the rest of the layers in the network.”)
wherein, to determine a next layer of the convolutional network, (pg. 5460 right col “The best base/offset combination from the algorithm is then used to quantize the activations by using WEIGHTEDLOGQUANTRELU. The quantized activations are passed to the next layer to perform the same process for the rest of the layers in the network.”)
Park does not teach wherein, during runtime, only indexes, corresponding to the quantized activation function, are stored to reduce memory bandwidth usage, …one or more indexes are first to be translated by an index-to-value function.
Han teaches wherein, during runtime, only indexes, corresponding to the quantized activation function, (Pg. 3 section 3 second paragraph “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights.”) are stored to reduce memory bandwidth usage. (Abstract “Pruning, reduces the number of connections by 9× to 13×;Quantization then reduces the number of bits that represent each connection from 32 to 5. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35×, from 240MB to 6.9MB, without loss of accuracy. Our method reduced the size of VGG-16 by 49× from 552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model into on-chip SRAM cache rather than off-chip DRAM memory. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained.”)
Park and Han are analogous art because they are both directed to compressing data.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park to incorporate the teaching of Han to include a method or system that uses pruning and quantization for training neural networks.
One of ordinary skill in the art would have been motivated to make this modification in order to reduce the storage requirement of neural networks without affecting their accuracy as disclosed by Han (Abstract “Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9× to 13×; Quantization then reduces the number of bits that represent each connection from 32 to 5.”).
Park in view of Han does not teach one or more indexes are first to be translated by an index-to-value function.
Choi teaches one or more indexes are first to be translated by an index-to-value function. (Pg. 4 section 3 “Figure 2(a) illustrates an example of network quantization. For network quantization, network parameters are grouped into clusters. Parameters in the same cluster share their quantized value, which is the representative value (i.e., cluster center) of the cluster they belong to. After clustering, lossless binary encoding follows to encode quantized parameters into binary codewords to store instead of actual parameter values. Either fixed-length binary encoding or variable-length binary encoding, e.g., Huffman coding, can be employed to this end. Note that one also needs to keep a lookup table for decoding quantized values from their binary encoded codewords as shown in Figure 2(b).” Examiner notes that Choi teaches the need to look up a lookup table for decoding quantized values from their binary encoded codewords as shown in Figure 2(b).)
Park, Han and Choi are analogous art because they are all directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Han to incorporate the teaching of Choi to include a method or system for network quantization that uses network compression techniques employed to reduce the redundancy of deep neural networks.
One of ordinary skill in the art would have been motivated to make this modification in order to address two issues that conventional quantization methods using k-means clustering cannot properly handle, namely “quantizing the network parameters of all layers in a neural network together at once by taking Hessian-weight into account” and quickly determining the impact of quantization error across layers, as disclosed by Choi (pg. 8 section 4.5 “We propose quantizing the network parameters of all layers in a neural network together at once by taking Hessian-weight into account. Layer-by-layer quantization was examined in the previous work (Gong et al., 2014; Han et al., 2015a). However, e.g., in Han et al. (2015a), a larger number of bits (a larger number of clusters) are assigned for convolutional layers than fully-connected layers, which implies that they heuristically treat convolutional layers more importantly.”).
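For illustration only, the combination discussed above (storing only small indexes, then translating each index back to a value through a lookup table before the next layer consumes it) can be sketched as follows. This is a minimal sketch and is not taken from Park, Han, Choi, or the application: the codebook values and the names CODEBOOK, value_to_index, and index_to_value are hypothetical, and nearest-value assignment is assumed in place of any particular quantizer from the references.

```python
from bisect import bisect_left

# Hypothetical 2-bit codebook: four non-uniformly spaced representative values.
CODEBOOK = [0.0, 0.25, 1.0, 4.0]


def value_to_index(x, codebook=CODEBOOK):
    """Return the index of the codebook entry nearest to x (codebook must be sorted)."""
    pos = bisect_left(codebook, x)
    if pos == 0:
        return 0
    if pos == len(codebook):
        return len(codebook) - 1
    # Choose the closer of the two neighbouring entries; ties go to the lower one.
    if x - codebook[pos - 1] <= codebook[pos] - x:
        return pos - 1
    return pos


def index_to_value(i, codebook=CODEBOOK):
    """Translate a stored index back to its representative value via table lookup."""
    return codebook[i]


activations = [0.1, 0.3, 0.9, 3.2, 7.5, 0.0]

# Only these small integer indexes would be written to memory.
stored = [value_to_index(a) for a in activations]

# The lookup is applied before the values are used to compute the next layer.
restored = [index_to_value(i) for i in stored]

print(stored)    # [0, 1, 2, 3, 3, 0]
print(restored)  # [0.0, 0.25, 1.0, 4.0, 4.0, 0.0]
```

With a four-entry codebook each stored index needs only 2 bits, which is the sense in which keeping indexes rather than full-precision values reduces the amount of data moved between layers.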
Regarding claim 15
Claim 15 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1.
Regarding claim 23
Claim 23 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1.

Regarding claim 2 
Park in view of Han with Choi teaches claim 1.
Han further teaches the system further comprising memory to store the indexes corresponding to the quantized activation function during runtime. (Pg. 3 section 3 second paragraph “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights.”)
Park, Choi and Han are analogous art because they are all directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Choi to incorporate the teaching of Han to include a method or system that uses pruning and quantization for training neural networks.
One of ordinary skill in the art would have been motivated to make this modification in order to reduce the storage requirement of neural networks without affecting their accuracy as disclosed by Han (Abstract “Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9× to 13×; Quantization then reduces the number of bits that represent each connection from 32 to 5.”).
Regarding claim 16
Claim 16 recites analogous limitations to claim 2 and therefore is rejected on the same ground as claim 2.

Regarding claim 3 
Park in view of Han with Choi teaches claim 2.
Choi further teaches wherein the indexes are stored in a lookup table. (Pg. 5 “which is only a function of the number of clusters, i.e., k, assuming that N and b are given; here, we note that it is not necessary to store k binary codewords in a lookup table for fixed-length codes since they can be implicitly known, e.g., if quantized values are encoded into binary numbers ranging from 0 to k − 1 in increasing order and are stored in a lookup table in the same order.”)
Park, Han and Choi are analogous art because they are all directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Han to incorporate the teaching of Choi to include a method or system for network quantization that uses network compression techniques employed to reduce the redundancy of deep neural networks.
One of ordinary skill in the art would have been motivated to make this modification in order to address two issues that conventional quantization methods using k-means clustering cannot properly handle, namely “quantizing the network parameters of all layers in a neural network together at once by taking Hessian-weight into account” and quickly determining the impact of quantization error across layers, as disclosed by Choi (pg. 8 section 4.5 “We propose quantizing the network parameters of all layers in a neural network together at once by taking Hessian-weight into account. Layer-by-layer quantization was examined in the previous work (Gong et al., 2014; Han et al., 2015a). However, e.g., in Han et al. (2015a), a larger number of bits (a larger number of clusters) are assigned for convolutional layers than fully-connected layers, which implies that they heuristically treat convolutional layers more importantly.”).
Regarding claim 17
Claim 17 recites analogous limitations to claim 3 and therefore is rejected on the same ground as claim 3.

Regarding claim 4
Park in view of Han with Choi teaches claim 1.
Park further teaches wherein compression of the one or more activation functions is to reduce memory bandwidth usage for processing information between layers of the convolutional network. (Pg. 5461 section 5.1.1 “For AlexNet, the best quantization configurations that use the fewest bits while satisfying the 1% top-5 accuracy loss constraint are (3,6), (4,4), (4,5) and (4,6). For example, (4,4) reduces the bitwidths of both weights and activations by 87.5% (= 1−4/32) with less than 1% loss of top5 accuracy. Moreover, our approach provides much lower iso-accuracy bitwidth compared to previous work.”)
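As a worked check of the figure quoted from Park above (87.5% = 1 − 4/32), the arithmetic can be sketched as below. The function names and the example layer shape (256 channels of 13 × 13 activations) are hypothetical and chosen only to make the numbers concrete; they are not drawn from Park or from the application.

```python
def bitwidth_reduction(bits, full_precision_bits=32):
    """Fractional reduction in bits per value relative to full precision."""
    return 1 - bits / full_precision_bits


def activation_bytes(num_values, bits):
    """Bytes needed to hold num_values values of the given bitwidth, rounded up."""
    return (num_values * bits + 7) // 8


# Hypothetical layer output: 256 feature maps of 13 x 13 activations.
num_values = 256 * 13 * 13

reduction = bitwidth_reduction(4)
full_size = activation_bytes(num_values, 32)
quantized_size = activation_bytes(num_values, 4)

print(reduction)       # 0.875
print(full_size)       # 173056
print(quantized_size)  # 21632
```

Under these assumptions, the data passed from this layer to the next shrinks by the same factor of eight as the per-value bitwidth.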
Regarding claim 18
Claim 18 recites analogous limitations to claim 4 and therefore is rejected on the same ground as claim 4.

Regarding claim 6 
Park in view of Han with Choi teaches claim 1.
Park further teaches wherein distribution of each layer of the convolutional network is determined offline. (Examiner interprets offline as “pre-trained network” see pg. section 4.2 “Activation quantization needs a different approach from weight quantization. While weights are fixed after the training, activations change at inference time according to the input data. This makes activations less suitable to be quantized by clustering-based approaches, which require a stable distribution of values.”)
Regarding claim 20
Claim 20 recites analogous limitations to claim 6 and therefore is rejected on the same ground as claim 6.
Regarding claim 25
Claim 25 recites analogous limitations to claim 6 and therefore is rejected on the same ground as claim 6.
 
Regarding claim 9
Park in view of Han with Choi teaches claim 1.
Park further teaches wherein the convolutional network is to assist in image processing. (pg. 5461 “For image classification tasks, we evaluate the proposed method by quantizing two widely used CNNs for ImageNet tasks [6]: AlexNet [15] GoogLeNet [21] (both from Caffe framework [13]) and ResNet3 [11]. In order to apply our quantization scheme into these networks, we perform finetuning combined with our weight/activation quantization schemes under the batch size of 256 (for AlexNet), 64 (for GoogLeNet), or 16 (for ResNet-50/101)”)



Regarding claim 10
Park in view of Han with Choi teaches claim 1.
Park further teaches wherein the convolutional network is to comprise a Convolutional Neural Network (CNN) (pg. section 5.1 “For image classification tasks, we evaluate the proposed method by quantizing two widely used CNNs for ImageNet tasks”) or a Deep Convolutional Network (DCN).  

Regarding claim 11 
Park in view of Han with Choi teaches claim 1.
Park further teaches wherein a processor comprises the logic circuitry. (pg. 5456 right col “memory capacity requirements of neural networks during the inference by exploiting the benefits of dedicated hardware accelerators, e.g. NVIDIA P40 and P4 [2][corresponds to processor] which support 8-bit integer arithmetic or Stripes [14] which provides execution time and energy consumption proportional to the bitwidth.” Also see pg. 5461 section 5.1 “In the cases of GoogLeNet and ResNet, the batch size is limited due to insufficient GPU memory capacity; this may increase overall accuracy loss. We use ILSVRC2012 data set, which contains 1.28M images for training and 50K images for testing. During the six epochs of fine-tuning, we first set the initial learning rate to 0.001 and decrease it by 10 times every two epochs.”)



Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (“Weighted-Entropy-based Quantization for Deep Neural Networks”, hereinafter: Park) in view of Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han), in view of Choi et al. (“Towards the limit of Network Quantization”, hereinafter: Choi) and further in view of Hubara et al. (“Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations”, hereinafter: Hubara).
Regarding claim 5
Park in view of Han with Choi teaches claim 1. 
Park in view of Han with Choi does not teach wherein compression of the one or more activation functions is to reduce representation size for each of the one or more activation functions to 4 bits.  
Hubara teaches wherein compression of the one or more activation functions is to reduce representation size for each of the one or more activation functions to 4 bits. (pg. 12 section 4.4 second paragraph “While AlexNet can be compressed rather easily, compressing GoogleNet is much harder due to its small number of parameters. When using vanilla BNNs, we observed a large degradation in the top-1 results. However, by using QNNs with 4-bit weights and activation, we were able to achieve 66.5% top-1 accuracy (only a 5.5% drop in performance compared to the 32-bit floating point architecture), which is the current state-of-the-art-compression result over GoogleNet.”)
Park, Han, Choi and Hubara are analogous art because they are all directed to vector quantization.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Han and Choi to incorporate the teaching of Hubara to include activation functions that reduce representation size to 4 bits.
One of ordinary skill in the art would have been motivated to make this modification in order to substantially increase speed of DNNs at run-time as disclosed by Hubara (pg. 2 second paragraph “The most common approach is to compress a trained (full precision) network. HashedNets (Chen et al., 2015) reduce model sizes by using a hash function to randomly group connection weights and force them to share a single parameter value. Gong et al. (2014) compressed deep convnets using vector quantization, which resulted in only a 1% accuracy loss. However, both methods focused only on the fully connected layers. A recent work by Han and Dally (2015) successfully pruned several state-of-the-art large scale networks and showed that the number of parameters could be reduced by an order of magnitude”).
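To make concrete what a 4-bit representation means for storage, the sketch below packs two 4-bit indexes into each byte and unpacks them again. It is offered only as an illustration under stated assumptions: the names pack_4bit and unpack_4bit and the low-nibble-first layout are hypothetical, and nothing here is taken from Hubara or from the application.

```python
def pack_4bit(indexes):
    """Pack 4-bit indexes (0..15) two per byte, low nibble first."""
    if any(not 0 <= i <= 15 for i in indexes):
        raise ValueError("every index must fit in 4 bits (0..15)")
    packed = bytearray()
    for k in range(0, len(indexes), 2):
        low = indexes[k]
        # An odd-length input leaves the final high nibble as zero padding.
        high = indexes[k + 1] if k + 1 < len(indexes) else 0
        packed.append(low | (high << 4))
    return bytes(packed)


def unpack_4bit(packed, count):
    """Recover the first `count` 4-bit indexes from packed bytes."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble
        out.append(byte >> 4)    # high nibble
    return out[:count]


indexes = [3, 15, 0, 7, 9]
packed = pack_4bit(indexes)
recovered = unpack_4bit(packed, len(indexes))

print(len(packed))  # 3
print(recovered)    # [3, 15, 0, 7, 9]
```

Five values occupy three bytes here, compared with twenty bytes if each were held as a 32-bit value.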
Regarding claim 19
Claim 19 recites analogous limitations to claim 5 and therefore is rejected on the same ground as claim 5.
 
Claims 7 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (“Weighted-Entropy-based Quantization for Deep Neural Networks”, hereinafter: Park) in view of Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han), in view of Choi et al. (“Towards the limit of Network Quantization”, hereinafter: Choi) and further in view of Yeo et al. (“Grayscale Medical Image Compression Using Feedforward Neural Networks”, hereinafter: Yeo).
Regarding claim 7 
Park in view of Han with Choi teaches claim 1.
Park in view of Han with Choi does not teach wherein the quantized activation function is decompressed during runtime.  
Yeo teaches wherein the quantized activation function (pg. 635 section 4 “This network has three hidden layers with the nodes in each layer using linear activation function. While the inner hidden layer takes advantage of the interpixel redundancy within each block, the outer layer exploits the interblock redundancy to achieve the most efficient representation of the image.”) is decompressed during runtime. (Pg. 635 left col “During compression, the coupling weights will remain the same throughout the process and the obtained activation values (coefficients of the orthogonal basis function in the new vector space) of the hidden layer will be kept as the compressed image file. Later in the decompression stage, the image can be rebuilt by using the activation values and coupling weights.”)
Park, Han, Choi and Yeo are analogous art because they are all directed to vector quantization and neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Han with Choi to incorporate the teaching of Yeo to include decompression during runtime using a compression algorithm.
One of ordinary skill in the art would have been motivated to make this modification in order to achieve compression by “using smaller number of hidden neurons as compare to the number of image pixels due to lesser information being stored” which reduces computation time as disclosed by Yeo (abstract “After training with sufficient sample images, the compression process will be carried out on the target image. The coupling weights and activation values of each neuron in the hidden layer will be stored after training. Compression is then achieved by using smaller number of hidden neurons as compare to the number of image pixels due to lesser information being stored. Experimental results show that the FFN is able to achieve comparable compression performance to popular existing medical image compression schemes such as JPEG2000 and JPEG-LS.”).
Regarding claim 21
Claim 21 recites analogous limitations to claim 7 and therefore is rejected on the same ground as claim 7.

Claims 8, 22 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (“Weighted-Entropy-based Quantization for Deep Neural Networks”, hereinafter: Park) in view of Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han), in view of Choi et al. (“Towards the limit of Network Quantization”, hereinafter: Choi) and further in view of Cai et al. (“Deep Learning with Low Precision by Half-wave Gaussian Quantization”, hereinafter: Cai).
Regarding claim 8 
Park in view of Han with Choi teaches claim 1. 
Park further teaches the logic…  (pg. 5456 right col “memory capacity requirements of neural networks during the inference by exploiting the benefits of dedicated hardware accelerators, e.g. NVIDIA P40 and P4 [2][corresponds to processor] which support 8-bit integer arithmetic or Stripes [14] which provides execution time and energy consumption proportional to the bitwidth.” Also see pg. 5461 section 5.1 “In the cases of GoogLeNet and ResNet, the batch size is limited due to insufficient GPU memory capacity; this may increase overall accuracy loss. We use ILSVRC2012 data set, which contains 1.28M images for training and 50K images for testing. During the six epochs of fine-tuning, we first set the initial learning rate to 0.001 and decrease it by 10 times every two epochs.”)
Park in view of Han with Choi does not teach wherein …is to compress the one or more activation functions without retraining the convolutional network. 
Cai teaches wherein …is to compress the one or more activation functions without retraining the convolutional network. (Examiner notes that “pre-trained” indicates that the model has already been trained and does not indicate any “retraining of the model,” because no weights are updated to adjust the model; see pg. 7 right col section 5.4 “Table 3 shows the performance achieved by the three networks under the different approximations. In all cases, weights were binarized and the HWGQ was used as forward approximator (quantizer). “no-opt” refers to the quantization of activations [corresponds to compressing the one or more activation function] of pre-trained BW networks.”)
Park, Han, Choi and Cai are analogous art because they are all directed to vector quantization.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Choi with Han to incorporate the teaching of Cai to include compressing the one or more activation functions without retraining the convolutional neural network.
One of ordinary skill in the art would have been motivated to make this modification in order to reduce model size through quantization, reducing each weight to fewer bits at a marginal cost in classification accuracy, as disclosed by Cai (pg. 1 right col first paragraph “through the use of quantization [3, 28, 26], low-rank matrix factorization [19, 6], pruning [11, 10], architecture design [27, 17], etc. Recently, it has been shown that weight compression by quantization can achieve very large savings in memory, reducing each weight to as little as 1 bit, with a marginal cost in classification accuracy [3, 28]. However, it is less effective along the computational dimension, because the core network operation, implemented by each of its units, is the dot-product between a weight and an activation vector. On the other hand, complementing binary or quantized weights with quantized activations allows the replacement of expensive dot-products by logical and bitcounting operations. Hence, substantial speed ups are possible if, in addition to the weights, the inputs of each unit are binarized or quantized to low-bit.”).
Regarding claim 22
Claim 22 recites analogous limitations to claim 8 and therefore is rejected on the same ground as claim 8.
Regarding claim 24
Claim 24 recites analogous limitations to claim 8 and therefore is rejected on the same ground as claim 8.

Claims 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (“Weighted-Entropy-based Quantization for Deep Neural Networks”, hereinafter: Park) in view of Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han), in view of Choi et al. (“Towards the limit of Network Quantization”, hereinafter: Choi) and further in view of NVIDIA (“NVIDIA TESLA: A UNIFIED GRAPHICS AND COMPUTING ARCHITECTURE”).
Regarding claim 12
Park in view of Han with Choi teaches claim 11.
Park further teaches wherein the processor comprises a Graphics Processing Unit (GPU) (pg. 5456 right col “memory capacity requirements of neural networks during the inference by exploiting the benefits of dedicated hardware accelerators, e.g. NVIDIA P40 and P4 [2][corresponds to processor] which support 8-bit integer arithmetic or Stripes [14] which provides execution time and energy consumption proportional to the bitwidth.” Also see pg. 5461 section 5.1 “In the cases of GoogLeNet and ResNet, the batch size is limited due to insufficient GPU memory capacity; this may increase overall accuracy loss. We use ILSVRC2012 data set, which contains 1.28M images for training and 50K images for testing. During the six epochs of fine-tuning, we first set the initial learning rate to 0.001 and decrease it by 10 times every two epochs.”) or a General-Purpose GPU (GPGPU),
Park in view of Han with Choi does not teach wherein the GPU or the GPGPU comprises one or more graphics processing cores.  
NVIDIA teaches wherein the GPU or the GPGPU comprises one or more graphics processing cores. (pg. 19 “The Tesla architecture is based on a scalable processor array. Figure 1 shows a block diagram of a GeForce 8800 GPU with 128 streaming-processor (SP) cores organized as 16 streaming multiprocessors (SMs) in eight independent processing units called texture/processor clusters (TPCs).”)
Park, Han, Choi and NVIDIA are analogous art because they are directed to data automation. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Han with Choi to incorporate the teaching of NVIDIA to include a multicore processor.
One of ordinary skill in the art would have been motivated to make this modification in order to enable “high-performance parallel computing applications written in the C language using the Compute Unified Device Architecture (CUDA2–4) parallel programming model” for use in software development tools for faster speed as disclosed by NVIDIA (pg. 1 left col second paragraph “NVIDIA’s Tesla architecture, introduced in November 2006 in the GeForce 8800 GPU, unifies the vertex and pixel processors and extends them, enabling high-performance parallel computing applications written in the C language using the Compute Unified Device Architecture (CUDA2–4) parallel programming model and development tools. The Tesla unified graphics and computing architecture is available in a scalable family of GeForce 8-series GPUs and Quadro GPUs for laptops, desktops, workstations, and servers. It also provides the processing architecture for the Tesla GPU computing platforms introduced in 2007 for high-performance computing.”).

Regarding claim 13
Park in view of Han with Choi teaches claim 11.
Park in view of Han with Choi does not teach wherein the processor comprises one or more processor cores.
NVIDIA teaches wherein the processor comprises one or more processor cores. (Pg. 19 “The Tesla architecture is based on a scalable processor array. Figure 1 shows a block diagram of a GeForce 8800 GPU with 128 streaming-processor (SP) cores organized as 16 streaming multiprocessors (SMs) in eight independent processing units called texture/processor clusters (TPCs).”)
Park, Choi, Han and NVIDIA are analogous art because they are directed to data automation. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Han with Choi to incorporate the teaching of NVIDIA to include a multicore processor. 
One of ordinary skill in the art would have been motivated to make this modification in order to enable “high-performance parallel computing applications written in the C language using the Compute Unified Device Architecture (CUDA2–4) parallel programming model” for use with software development tools to achieve faster processing, as disclosed by NVIDIA (pg. 1 left col second paragraph “NVIDIA’s Tesla architecture, introduced in November 2006 in the GeForce 8800 GPU, unifies the vertex and pixel processors and extends them, enabling high-performance parallel computing applications written in the C language using the Compute Unified Device Architecture (CUDA2–4) parallel programming model and development tools. The Tesla unified graphics and computing architecture is available in a scalable family of GeForce 8-series GPUs and Quadro GPUs for laptops, desktops, workstations, and servers. It also provides the processing architecture for the Tesla GPU computing platforms introduced in 2007 for high-performance computing.”).

Regarding claim 14 
Park in view of Han with Choi teaches claim 1.
Park in view of Han with Choi does not explicitly teach wherein one or more of a processor, the logic circuitry, and memory are on a single integrated circuit die.  
NVIDIA teaches wherein one or more of a processor, the logic circuitry, and memory are on a single integrated circuit die. (“Figure 1. Tesla unified graphics and computing GPU architecture. TPC: texture/processor cluster; SM: streaming multiprocessor; SP: streaming processor; Tex: texture, ROP: raster operation processor.” Also see pg. 19 right col “At the highest level, the GPU’s scalable streaming processor array (SPA) performs all the GPU’s programmable calculations. The scalable memory system consists of external DRAM control and fixed-function raster operation processors (ROPs) that perform color and depth frame buffer operations directly on memory”)
Park, Han, Choi and NVIDIA are analogous art because they are directed to data automation. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Park in view of Han with Choi to incorporate the teaching of NVIDIA to include a multicore processor. 
One of ordinary skill in the art would have been motivated to make this modification in order to enable “high-performance parallel computing applications written in the C language using the Compute Unified Device Architecture (CUDA2–4) parallel programming model” for use with software development tools to achieve faster processing, as disclosed by NVIDIA (pg. 1 left col second paragraph “NVIDIA’s Tesla architecture, introduced in November 2006 in the GeForce 8800 GPU, unifies the vertex and pixel processors and extends them, enabling high-performance parallel computing applications written in the C language using the Compute Unified Device Architecture (CUDA2–4) parallel programming model and development tools. The Tesla unified graphics and computing architecture is available in a scalable family of GeForce 8-series GPUs and Quadro GPUs for laptops, desktops, workstations, and servers. It also provides the processing architecture for the Tesla GPU computing platforms introduced in 2007 for high-performance computing.”).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Tu et al. (“Reducing the Model Order of Deep Neural Networks Using Information Theory”) teaches a method to compress deep neural networks by using the Fisher Information metric, which is estimated through a stochastic optimization method that keeps track of second order information in the network. 
Ji et al. (US 2016/0086078 A1) teaches object recognition with reduced neural network weight precision. 
Miyashita et al. (“Convolutional Neural Networks using Logarithmic Data Representation”) teaches that weights and activations in a trained network naturally have non-uniform distributions, and uses a non-uniform, base-2 logarithmic representation to encode weights. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598.  The examiner can normally be reached Mon - Fri, 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/V.M./Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126