DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 09/22/2017.
This action is in response to arguments and/or remarks filed on 05/20/2022. In the current amendments, claims 6 and 20 have been amended. Claims 4-5, 12, 14 and 19 have been cancelled. Claims 1-3, 6-11, 13, 15-18 and 20 are currently pending and have been examined.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/04/2022 has been entered.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-3, 6-9, 11, 13, 15-18 and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 11, 13 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Kulkarni et al. (US 2016/0140425 A1) and further in view of Zhang et al. (“Bilinear Vector Quantization”, hereinafter: Zhang). 
Regarding claim 1 (Currently Amended)
Han teaches an information processing apparatus comprising: one or more processors (pg. 9 section 6.3 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.”)
and one or more memories, (pg. 5 “The total size of AlexNet decreased from 240MB to 6.9MB, which is small enough to be put into on-chip SRAM, eliminating the need to store the model in energy-consuming DRAM memory”) 
wherein the one or more processors performs, by executing programs stored in the one or more memories: (pg. 5 “The total size of AlexNet decreased from 240MB to 6.9MB, which is small enough to be put into on-chip SRAM, eliminating the need to store the model in energy-consuming DRAM memory”)
determining a plurality of blocks in which a weight parameter between a L layer and a layer next to the L layer of a neural network is divided, (Examiner notes that the weight parameters are divided into four different colors see pg. 3 section 3 “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights. During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration.”)
Han does not teach wherein the plurality of blocks are blocks in which a feature channel of the weight parameter having a number of channels based on a number of feature channels in the L layer and a number of feature channels in the next layer of the L layer of the neural network is divided by an integer value; and encoding the weight parameter by approximating the plurality of blocks respectively by a linear sum of (a) two or more different codebook vectors, selected from a set of codebook vectors having a same channel size as the block, wherein the two or more different codebook vectors are part of a set of codebook vectors, and (b) codebook coefficients, wherein codebook vectors selected from a set of codebook vectors are selected a predetermined number based on an absolute value.
Kulkarni teaches wherein the plurality of blocks are blocks in which a feature channel of the weight parameter having a number of channels based on a number of feature channels in the L layer (para [0038] “In FIG. 2, each layer, represented by a box, is labeled with the size R_l × C_l × K_l of its output in equation (3). The K_l kernels at layer l have dimension n_l × n_l × K_{l-1}. The layer index l (respectively, kernel spatial dimension n_l) is indicated below (above) the box for each layer. The input image is assumed normalized to size 224 × 224 × 3, and 4× down-sampling is applied during the first layer.”)
and a number of feature channels in the next layer of the L layer of the neural network is divided by an integer value; (para [0039] “The convolutional layers (l=1, 4, 7-9) first compute the spatial convolution of the input with K_l kernels of size n_l × n_l × K_{l-1} and then apply entry-wise Rectified Linear Units (ReLUs) max(0, z). The normalization layers (l=2, 5) normalize each xe{x'} at the input using what can be seen as a generalization of the l2 norm consisting of dividing each entry X, of X…” also see para [0061] “Accordingly, the number n, of sub-vectors x; that x is divided into can vary from layer to layer,”)
selected from a set of codebook vectors having a same channel size as the block, (para [0040] “The fully connected layers (l=11-13) can be seen as convolutional layers with kernels having the same size as the layer's input data. The last layer (l=13) uses a softmax non linearity instead of the ReLU (Rectified Linear Unit) non linearity used in other layers and acts as a multi-class classifier, having as many outputs as there are classes targeted by the system.”)
wherein the two or more different codebook vectors are part of a set of codebook vectors, (para [0059] “In yet another embodiment of the present principles, a multi-layer, block-diagonal constrained architecture is described. Similarly to (9a), one can constrain the matrices MP to be block-diagonal and composed of sub-matrices M7 when learning the deep architecture defined by (11a-c). The size of the matrices M can vary from one layer to the other so as to incorporate dependencies between different sub-vectors from previous layers.”)
and (b) codebook coefficients; (Para [0058] “An image is given as input to the DCNN. The output of each layer (l=1-4) is concatenated to form one large feature vector x0 (xj in equation 13 below). The superscript associated with the variables indicates the horizontal adaptation layer it belongs to. In the diagram, g represents the number of coefficients in the given block. For example, go represents the number of coefficients in X1. In the case of block matrix M1, go is number of input coefficients and g is number of output coefficients.”)
Han and Kulkarni are analogous art because they are both directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han to incorporate the teaching of Kulkarni to include a method and system for image classification with joint feature adaptation and classifier learning.
One of ordinary skill in the art would have been motivated to make this modification in order to have “an automated image classification system which can perform the task of retrieving the relevant images, based on the user query” for the purpose of avoiding manually mining image data from a database for assessment as disclosed by Kulkarni (para [0003] “In the era of Big Data, image classification systems have become an area of increased interest, with application in many real world scenarios. Provided an image as an input to an image classification system, the task of the system is to identify the visual concept present in the image. For example, in landscape assessment or planning, one needs to classify landscape images into classes such as forest, water or agriculture. Since the number of landscape images in a database might be very large, it becomes difficult for a user to mine the required relevant images manually from a database for assessment.”).
Han in view of Kulkarni does not teach and encoding the weight parameter by approximating the plurality of blocks respectively by a linear sum of (a) two or more different codebook vectors,
…
and (b) codebook coefficients, wherein codebook vectors selected from a set of codebook vectors are selected a predetermined number based on an absolute value.
Zhang teaches and encoding the weight parameter by approximating the plurality of blocks respectively by a linear sum of (a) two or more different codebook vectors, (pg. 2 section 3 “To improve the accuracy of the reconstructed vector, we propose a bilinear vector quantization (BVQ) to encode a vector by a linear combination of two vectors. Suppose x is the vector to be encoded, which is encoded by two linearly independent codewords vi and vj”)
…
and (b) codebook coefficients, wherein codebook vectors selected from a set of codebook vectors are selected a predetermined number based on an absolute value. (Under its broadest reasonable interpretation (BRI), Examiner notes that the vectors in the codebook are all zero-mean and normalized, which means that negative values are not wanted, hence an absolute value, as evidenced by pg. 3 left col “respectively. If the vectors in the codebook are all zero-mean and normalized: µi = µj = 0 and σi² = σj² = 1…” and also see pg. 3 right col “To encode a vector x in BVQ, we need only compute H for all the pairwise combinations of vectors in the codebook, then the maximum H is selected and its corresponding parameters s, t, and o are computed. It is unnecessary to compute s, t, and o for each combination.”)
Han, Kulkarni and Zhang are analogous art because they are all directed to codebooks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Kulkarni to incorporate the teaching of Zhang to include a useful data compression method using vector quantization. 
One of ordinary skill in the art would have been motivated to make this modification in order to optimize the codebook using a vector quantization algorithm and to help a computer system process large codebooks, which can eventually reduce computation time, as disclosed by Zhang (right col second paragraph “Although the accuracy of the reconstructed blocks in VQ can be partly improved by increasing the number of codewords or by optimizing the codebook, both of them are not the essential solutions. It cannot achieve the required effectiveness and efficiency by increasing the number of codewords because a larger codebook brings a bigger burden to the system resources to store, read and handle it. Moreover, the accuracy increases slowly when the number of the codewords gets a large value. Similarly, it is not an essential solution by optimizing the codebook, and the optimization of codebook itself is a difficult problem. Some optimization methods to construct the codebook of VQ have been proposed, such as the K-means clustering algorithm [6]. However, an optimized codebook with a fixed number of vectors is still hard to meet the accuracy requirement.”).
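For illustration only, and not as part of any cited disclosure, the encoding of a vector by a linear combination of two codewords relied upon from Zhang can be sketched as follows. The sketch is an assumption-laden simplification: it searches all codeword pairs by plain least squares, omits the offset parameter o that Zhang computes alongside s and t, and does not use Zhang's H criterion. The function names fit_pair and encode_block are hypothetical.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fit_pair(x, u, v):
    """Least-squares coefficients (s, t) minimizing ||x - (s*u + t*v)||^2.

    Returns None when u and v are (nearly) linearly dependent.
    """
    uu, vv, uv = dot(u, u), dot(v, v), dot(u, v)
    det = uu * vv - uv * uv
    if abs(det) < 1e-12:
        return None
    xu, xv = dot(x, u), dot(x, v)
    s = (xu * vv - xv * uv) / det
    t = (xv * uu - xu * uv) / det
    return s, t


def encode_block(x, codebook):
    """Brute-force search over codeword pairs (assumed, not Zhang's H test).

    Returns (error, i, j, s, t) for the pair with the smallest squared error,
    or None if no linearly independent pair exists.
    """
    best = None
    for i, j in combinations(range(len(codebook)), 2):
        coeffs = fit_pair(x, codebook[i], codebook[j])
        if coeffs is None:
            continue
        s, t = coeffs
        err = sum(
            (xk - (s * uk + t * vk)) ** 2
            for xk, uk, vk in zip(x, codebook[i], codebook[j])
        )
        if best is None or err < best[0]:
            best = (err, i, j, s, t)
    return best
```

Under these assumptions, a block is stored as two codebook indices plus two coefficients rather than as its full set of elements.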
Regarding claim 11
Claim 11 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1.
Regarding claim 13
Claim 13 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1.

Regarding claim 2 
Han in view of Kulkarni with Zhang teaches claim 1.  
Han further teaches wherein the one or more processors (pg. 9 section 6.3 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.”)
divides the weight parameter into the plurality of groups after aligning the weight parameter by a predetermined method. (Examiner notes that the weight parameters are divided into four different colors see pg. 3 section 3 “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights. During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration.”)
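For illustration only, the weight sharing described in the cited passage of Han (weights quantized to 4 bins, each weight stored as a small index into a table of shared values) can be sketched as below. This is a minimal sketch under stated assumptions: a one-dimensional k-means with evenly spaced initial centroids is assumed, the fine-tuning of centroids by summed gradients is not shown, and the function names and the sample 4 × 4 matrix are hypothetical.

```python
def nearest(w, centroids):
    """Index of the centroid closest to weight w."""
    return min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))


def share_weights(weights, k=4, iters=20):
    """Cluster the entries of a 2-D weight matrix into k shared values.

    Returns (centroids, index_matrix); index_matrix has the shape of
    `weights` and holds, per weight, an index into `centroids`.
    Assumes k >= 2 and a non-empty rectangular matrix.
    """
    flat = [w for row in weights for w in row]
    lo, hi = min(flat), max(flat)
    # Assumed initialisation: centroids evenly spaced between min and max.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        assign = [nearest(w, centroids) for w in flat]
        for c in range(k):
            members = [w for w, a in zip(flat, assign) if a == c]
            if members:  # keep the old centroid if a bin is empty
                centroids[c] = sum(members) / len(members)
    assign = [nearest(w, centroids) for w in flat]
    cols = len(weights[0])
    index_matrix = [assign[r * cols:(r + 1) * cols] for r in range(len(weights))]
    return centroids, index_matrix


def rebuild(centroids, index_matrix):
    """Reconstruct the quantized weight matrix from the shared table."""
    return [[centroids[i] for i in row] for row in index_matrix]


# Hypothetical 4 x 4 weight matrix used only to exercise the sketch.
example = [
    [2.09, -0.98, 1.48, 0.09],
    [0.05, -0.14, -1.08, 2.12],
    [-0.91, 1.92, 0.0, -1.03],
    [1.87, 0.0, 1.53, 1.49],
]
table, indices = share_weights(example, k=4)
```

With 4 shared values, each of the 16 weights needs only a 2-bit index plus the 4-entry table, which is the storage saving the cited passage describes.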

Regarding claim 18 
Han in view of Kulkarni with Zhang teaches claim 1. 
Han further teaches wherein the one or more processors divides the weight parameter into the plurality of groups such that the weight parameter after division is equal in size. (Pg. 3 “For example, Figure 3 shows the weights of a single layer neural network with four input units and four output units. There are 4×4 = 16 weights originally but there are only 4 shared weights: similar weights are grouped together to share the same value.”)
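For illustration only, the claimed division of a weight parameter into groups of equal size can be sketched as a split of a channel dimension by an integer value. The sketch reflects the claim language rather than any cited reference, and the function name split_channels is hypothetical.

```python
def split_channels(vector, num_blocks):
    """Split a list of channel values into num_blocks equal-size blocks.

    Raises ValueError when the channel count is not divisible by num_blocks,
    since the blocks would then not be equal in size.
    """
    if num_blocks <= 0 or len(vector) % num_blocks != 0:
        raise ValueError("channel count must be divisible by num_blocks")
    size = len(vector) // num_blocks
    return [vector[i * size:(i + 1) * size] for i in range(num_blocks)]
```

For example, 8 channels divided by the integer 4 yield four blocks of 2 channels each.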

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Kulkarni et al. (US 2016/0140425 A1) in view of Zhang et al. and further in view of Courbariaux et al. (“BinaryConnect: Training Deep Neural Networks with binary weights during propagations”, hereinafter: Courbariaux).
Regarding claim 3
Han in view of Kulkarni with Zhang teaches claim 1.
Han in view of Kulkarni with Zhang does not teach wherein the weight parameter has elements of a binary value or a ternary value.  
Courbariaux teaches wherein the weight parameter has elements of a binary value or a ternary value. (Abstract “Binary weights, i.e., weights which are constrained to only two possible values (e.g. -1 or 1), would bring great benefits to specialized DL hardware by replacing many multiply-accumulate operations by simple accumulations, as multipliers are the most space and power-hungry components of the digital implementation of neural networks…” also see section 2.4 “Since the binarization operation is not influenced by variations of the real-valued weights w when its magnitude is beyond the binary values ±1, and since it is a common practice to bound weights (usually the weight vector) in order to regularize them, we have chosen to clip the real-valued weights within the [−1, 1] interval right after the weight updates, as per Algorithm 1.”)
Han, Kulkarni, Zhang and Courbariaux are analogous art because they are all directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Kulkarni with Zhang to incorporate the teaching of Courbariaux to include binary weights during propagation in deep neural networks.
One of ordinary skill in the art would have been motivated to make this modification in order to achieve faster computation at both training and test time using a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights as disclosed by Courbariaux (abstract “In the past, GPUs enabled these breakthroughs because of their greater computational speed. In the future, faster computation at both training and test time is likely to be crucial for further progress and for consumer applications on low-power devices. As a result, there is much interest in research and development of dedicated hardware for Deep Learning (DL). Binary weights, i.e., weights which are constrained to only two possible values (e.g. -1 or 1), would bring great benefits to specialized DL hardware by replacing many multiply-accumulate operations by simple accumulations, as multipliers are the most space and power-hungry components of the digital implementation of neural networks. We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated. Like other dropout schemes, we show that BinaryConnect acts as regularizer and we obtain near state-of-the-art results with BinaryConnect on the permutation-invariant MNIST, CIFAR-10 and SVHN.”).
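For illustration only, the binary weights and the clipping of real-valued weights to the [−1, 1] interval described in the cited passages of Courbariaux can be sketched as follows. The sketch assumes a deterministic sign-based binarization and a plain gradient step; the function names and the learning rate are hypothetical, and the stochastic binarization variant is not shown.

```python
def binarize(real_weights):
    """Map each real-valued weight to +1.0 or -1.0 by its sign (0 maps to +1.0)."""
    return [1.0 if w >= 0 else -1.0 for w in real_weights]


def clip(weights, lo=-1.0, hi=1.0):
    """Clip each weight into the interval [lo, hi]."""
    return [max(lo, min(hi, w)) for w in weights]


def update_real_weights(real_weights, grads, lr):
    """One assumed gradient step on the stored real-valued weights.

    `grads` stands for gradients computed with the binarized weights; the
    step is applied to the real-valued copy, which is then clipped right
    after the update.
    """
    stepped = [w - lr * g for w, g in zip(real_weights, grads)]
    return clip(stepped)
```

The binarized copy is what would be used during propagation, while the clipped real-valued copy retains the precision in which updates are accumulated.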

Claims 6-7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Kulkarni et al. (US 2016/0140425 A1) in view of Zhang et al. and further in view of Wang et al. (“Small-Footprint High-Performance Deep Neural Network-Based Speech Recognition using Split-VQ”).
Regarding claim 6 (Currently Amended)
Han teaches an information processing apparatus comprising: one or more processors and one or more memories, (pg. 9 section 6.3 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.” pg. 5 “The total size of AlexNet decreased from 240MB to 6.9MB, which is small enough to be put into on-chip SRAM, eliminating the need to store the model in energy-consuming DRAM memory”)
wherein the one or more processors performs, by executing programs stored in the one or more memories: (pg. 5 “The total size of AlexNet decreased from 240MB to 6.9MB, which is small enough to be put into on-chip SRAM, eliminating the need to store the model in energy-consuming DRAM memory”)
determining a plurality of blocks in which a weight parameter between a L layer and a layer next to the L layer of a neural network is divided, (Examiner notes that the weight parameters are divided into four different colors see pg. 3 section 3 “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons, the weight is a 4 × 4 matrix. On the top left is the 4 × 4 weight matrix, and on the bottom left is the 4 × 4 gradient matrix. The weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights. During update, all the gradients are grouped by the color and summed together, multiplied by the learning rate and subtracted from the shared centroids from last iteration.”)
Han does not teach wherein the plurality of blocks are blocks in which a feature channel of the weight parameter having a number of channels based on a number of feature channels in the L layer and a number of feature channels in the next layer of the L layer of the neural network is divided by an integer value; encoding the weight parameter by approximating the plurality of blocks respectively by a linear combination of (a) two or more different codebook vectors, selected from a set of codebook vectors having a same channel size as the block, wherein the two or more different codebook vectors are part of a set of codebook vectors, and (b) codebook coefficients; and reconstructing the weight parameter by a linear sum of a codebook coefficient determined by the one or more processors and a corresponding codebook vector that corresponds to the codebook coefficient, wherein a weight coefficient is determined by optimizing a loss function including a loss term of approximation accuracy of the weight parameter of the neural network and a loss term as a sparse term of the weight coefficient.
Kulkarni teaches wherein the plurality of blocks are blocks in which a feature channel of the weight parameter having a number of channels based on a number of feature channels in the L layer (para [0038] “In FIG. 2, each layer, represented by a box, is labeled with the size R_l × C_l × K_l of its output in equation (3). The K_l kernels at layer l have dimension n_l × n_l × K_{l-1}. The layer index l (respectively, kernel spatial dimension n_l) is indicated below (above) the box for each layer. The input image is assumed normalized to size 224 × 224 × 3, and 4× down-sampling is applied during the first layer.”)
and a number of feature channels in the next layer of the L layer of the neural network is divided by an integer value; (para [0039] “The convolutional layers (l=1, 4, 7-9) first compute the spatial convolution of the input with K_l kernels of size n_l × n_l × K_{l-1} and then apply entry-wise Rectified Linear Units (ReLUs) max(0, z). The normalization layers (l=2, 5) normalize each xe{x'} at the input using what can be seen as a generalization of the l2 norm consisting of dividing each entry X, of X…” also see para [0061] “Accordingly, the number n, of sub-vectors x; that x is divided into can vary from layer to layer,”)
selected from a set of codebook vectors having a same channel size as the block, (para [0040] “The fully connected layers (l=11-13) can be seen as convolutional layers with kernels having the same size as the layer's input data. The last layer (l=13) uses a softmax non linearity instead of the ReLU (Rectified Linear Unit) non linearity used in other layers and acts as a multi-class classifier, having as many outputs as there are classes targeted by the system.”)
wherein the two or more different codebook vectors are part of a set of codebook vectors, (para [0059] “In yet another embodiment of the present principles, a multi-layer, block-diagonal constrained architecture is described. Similarly to (9a), one can constrain the matrices MP to be block-diagonal and composed of sub-matrices M7 when learning the deep architecture defined by (11a-c). The size of the matrices M can vary from one layer to the other so as to incorporate dependencies between different sub-vectors from previous layers.”)
and (b) codebook coefficients; (Para [0058] “An image is given as input to the DCNN. The output of each layer (l=1-4) is concatenated to form one large feature vector x0 (xj in equation 13 below). The superscript associated with the variables indicates the horizontal adaptation layer it belongs to. In the diagram, g represents the number of coefficients in the given block. For example, go represents the number of coefficients in X1. In the case of block matrix M1, go is number of input coefficients and g is number of output coefficients.”)
Han and Kulkarni are analogous art because they are both directed to neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han to incorporate the teaching of Kulkarni to include a method and system for image classification with joint feature adaptation and classifier learning.
One of ordinary skill in the art would have been motivated to make this modification in order to have “an automated image classification system which can perform the task of retrieving the relevant images, based on the user query” for the purpose of avoiding manually mining image data from a database for assessment as disclosed by Kulkarni (para [0003] “In the era of Big Data, image classification systems have become an area of increased interest, with application in many real world scenarios. Provided an image as an input to an image classification system, the task of the system is to identify the visual concept present in the image. For example, in landscape assessment or planning, one needs to classify landscape images into classes such as forest, water or agriculture. Since the number of landscape images in a database might be very large, it becomes difficult for a user to mine the required relevant images manually from a database for assessment.”).
Han in view of Kulkarni does not teach encoding the weight parameter by approximating the plurality of blocks respectively by a linear combination of (a) two or more different codebook vectors,
…
wherein a weight coefficient is determined by optimizing a loss function including a loss term of approximation accuracy of the weight parameter of the neural network and a loss term as a sparse term of the weight coefficient.
Zhang teaches encoding the weight parameter by approximating the plurality of blocks respectively by a linear combination of (a) two or more different codebook vectors, (pg. 2 section 3 “To improve the accuracy of the reconstructed vector, we propose a bilinear vector quantization (BVQ) to encode a vector by a linear combination of two vectors. Suppose x is the vector to be encoded, which is encoded by two linearly independent codewords vi and vj”)
Han, Kulkarni and Zhang are analogous art because they are all directed to codebooks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Kulkarni to incorporate the teaching of Zhang to include a useful data compression method using vector quantization. 
One of ordinary skill in the art would have been motivated to make this modification in order to optimize the codebook using a vector quantization algorithm and to help a computer system process large codebooks, which can eventually reduce computation time, as disclosed by Zhang (right col second paragraph “Although the accuracy of the reconstructed blocks in VQ can be partly improved by increasing the number of codewords or by optimizing the codebook, both of them are not the essential solutions. It cannot achieve the required effectiveness and efficiency by increasing the number of codewords because a larger codebook brings a bigger burden to the system resources to store, read and handle it. Moreover, the accuracy increases slowly when the number of the codewords gets a large value. Similarly, it is not an essential solution by optimizing the codebook, and the optimization of codebook itself is a difficult problem. Some optimization methods to construct the codebook of VQ have been proposed, such as the K-means clustering algorithm [6]. However, an optimized codebook with a fixed number of vectors is still hard to meet the accuracy requirement.”).
Han in view of Kulkarni with Zhang does not teach wherein a weight coefficient is determined by optimizing a loss function including a loss term of approximation accuracy of the weight parameter of the neural network and a loss term as a sparse term of the weight coefficient.
Wang teaches wherein a weight coefficient is determined by optimizing a loss function including a loss term of approximation accuracy of the weight parameter of the neural network and a loss term as a sparse term of the weight coefficient. (Pg. 4985 “The weight matrices A(l)’s and the bias vectors b(l)’s of the DNN can be estimated by minimizing the following cross entropy based loss function… where Xtr = (x1, . . . , xt) is a set of training feature vectors; st is the senone label of xt. The optimization is usually done by backpropagation using stochastic gradient descent. For example, given a mini-batch of training feature vectors, Xmb, and the corresponding labels, the weight matrix A(l) is updated using.” Pg. 4986 further explains how the codebook can be fine-tuned using the loss function; see pg. 4986 “When an aggressive quantization is used, a significant WER increase will be observed. In this case, the codebook can be fine-tuned to minimize the cross entropy-based loss function in Eq. (6). The gradient of the loss function with respect to the codeword mk can be obtained using the chain rule”)
Han, Kulkarni, Zhang and Wang are analogous art because they are all directed to data automation. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Kulkarni with Zhang to incorporate the teaching of Wang to include weight coefficients in a quantized matrix structure that can save a portion of the computation time in computing a neural network model as disclosed by Wang (pg. 4987 right col section 5 “An alternative is to quantize and fine-tune one layer at a time. Another limitation of our method is the use of Forbenius norm as the error function. A better way is to consider the distribution of each layer’s input and minimize the expected errors of the output. Though the proposed quantized matrix structure can potentially save a portion of computation, the nearby weight coefficients are no longer stored in an adjacent order. This may reduce the cache hit rate and thus slow down the computation. An efficient runtime implementation on mobile devices is needed to realize potential to speed up the neural network computation under the proposed weight matrix quantization scheme.”).
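For illustration only, the claimed loss function, which combines a loss term of approximation accuracy with a sparse term on the coefficients, can be sketched as below. The sketch follows the claim language and is not Wang's cross-entropy loss: a squared-error approximation term and an L1 sparsity term are assumed, and the weighting factor lam and the function names are hypothetical.

```python
def linear_sum(coeffs, codebook):
    """Reconstruct a block as sum_k coeffs[k] * codebook[k]."""
    dim = len(codebook[0])
    return [sum(c * v[d] for c, v in zip(coeffs, codebook)) for d in range(dim)]


def sparse_approximation_loss(block, coeffs, codebook, lam=0.1):
    """Assumed loss: squared reconstruction error plus lam * L1 norm of coeffs."""
    approx = linear_sum(coeffs, codebook)
    approximation_term = sum((b - a) ** 2 for b, a in zip(block, approx))
    sparse_term = sum(abs(c) for c in coeffs)
    return approximation_term + lam * sparse_term
```

Under these assumptions, lowering the loss trades reconstruction accuracy against the number and magnitude of nonzero coefficients, so a larger lam favors sparser coefficient vectors.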
Regarding claim 20
Claim 20 recites analogous limitations to independent claim 6 and therefore is rejected on the same ground as independent claim 6.

Regarding claim 7 
Han in view of Kulkarni with Zhang and Wang teaches claim 6. 
Han further teaches the one or more processors reads (pg. 9 section 6.3 “We compare three different off-the-shelf hardware: the NVIDIA GeForce GTX Titan X and the Intel Core i7 5930K as desktop processors (same package as NVIDIA Digits Dev Box) and NVIDIA Tegra K1 as mobile processor.”)
and uses different codebook sets depending on a layer of the neural network which is a reconstruction target of the weight parameter. (Pg. 3 second paragraph “To compress further, we store the index difference instead of the absolute position, and encode this difference in 8 bits for conv layer and 5 bits for fc layer. When we need an index difference larger than the bound, we the zero padding solution shown in Figure 2: in case when the difference exceeds 8, the largest 3-bit (as an example) unsigned number, we add a filler zero.” Examiner notes that, for each layer, under quantizing the weights with a code book, Han teaches retraining the code book and looping back, which means that different codebook sets are used for different layers.)

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Kulkarni et al. (US 2016/0140425 A1) in view of Zhang et al. in view of Wang and further in view of Courbariaux et al. (“BinaryConnect: Training Deep Neural Networks with binary weights during propagations”, hereinafter: Courbariaux).
Regarding claim 8 (Previously Presented)
Han in view of Kulkarni with Zhang and Wang teaches claim 6. 
Han in view of Kulkarni with Zhang and Wang does not teach wherein at least one of the weight coefficient and the codebook vector has a binary value or a ternary value as an element.
Courbariaux teaches wherein at least one of the weight coefficient and the codebook vector has a binary value or a ternary value as an element. (Abstract “Binary weights, i.e., weights which are constrained to only two possible values (e.g. -1 or 1), would bring great benefits to specialized DL hardware by replacing many multiply-accumulate operations by simple accumulations, as multipliers are the most space and power-hungry components of the digital implementation of neural networks…” Also see pg. section 2.4 “Since the binarization operation is not influenced by variations of the real-valued weights w when its magnitude is beyond the binary values ±1, and since it is a common practice to bound weights (usually the weight vector) in order to regularize them, we have chosen to clip the real-valued weights within the [−1, 1] interval right after the weight updates, as per Algorithm 1.”)
Han, Kulkarni, Zhang, Wang and Courbariaux are analogous art because they are all directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Kulkarni with Zhang and Wang to incorporate the teaching of Courbariaux to include binary weight during propagation in deep neural networks.
One of ordinary skill in the art would have been motivated to make this modification in order to achieve faster computation at both training and test time using a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights, as disclosed by Courbariaux (abstract “In the future, faster computation at both training and test time is likely to be crucial for further progress and for consumer applications on low-power devices. As a result, there is much interest in research and development of dedicated hardware for Deep Learning (DL). Binary weights, i.e., weights which are constrained to only two possible values (e.g. -1 or 1), would bring great benefits to specialized DL hardware by replacing many multiply-accumulate operations by simple accumulations, as multipliers are the most space and power-hungry components of the digital implementation of neural networks. We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated.”).
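For illustration only, the binary weights and the clipping of real-valued weights to the [−1, 1] interval described in the cited passages of Courbariaux can be sketched as follows. This is a minimal sketch under stated assumptions, not Courbariaux's Algorithm 1: the function names are hypothetical, binarization is taken to be the sign of the stored weight (with zero mapped to +1), and a plain gradient step stands in for the weight update.

```python
def binarize(weights):
    """Map each stored real-valued weight to -1 or +1 (assumed sign rule)."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def clip(weights, low=-1.0, high=1.0):
    """Bound the stored real-valued weights to the [low, high] interval."""
    return [max(low, min(high, w)) for w in weights]


def update_real_weights(real_weights, grads, lr):
    """Apply a plain gradient step to the stored real-valued weights,
    then clip them right after the update, as in the quoted passage."""
    updated = [w - lr * g for w, g in zip(real_weights, grads)]
    return clip(updated)


def accumulate_only_dot(binary_weights, inputs):
    """Dot product with binary weights using only additions and
    subtractions, i.e. no multiplications."""
    total = 0.0
    for wb, x in zip(binary_weights, inputs):
        total = total + x if wb > 0 else total - x
    return total


binary = binarize([0.3, -0.7, 0.0])
stepped = update_real_weights([0.9, -0.95], [-1.0, 1.0], lr=0.5)
output = accumulate_only_dot(binary, [2.0, 3.0, 4.0])
```

The stored weights stay real-valued between updates in this sketch; only the copies used for the dot product are binary, which corresponds to the quoted statement that precision of the stored weights is retained.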

Claims 9 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, hereinafter: Han) in view of Kulkarni et al. in view of Zhang et al. in view of Wang et al. and further in view of Lane et al. (“DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices”, hereinafter: Lane). 
Regarding claim 9
Han in view of Kulkarni with Zhang and Wang teaches claim 6. 
Han in view of Kulkarni with Zhang and Wang does not teach wherein the one or more processors further function as allowing a user to instruct a constraint condition on a learning parameter.
Lane teaches wherein the one or more processors further function as allowing a user to instruct a constraint condition on a learning parameter. (Pg. 9 left col “In comparison to these baselines, DeepX is free to use any supported unit, and has constrained use of RLC; specifically we only set ℰ𝑇𝐻 to allow expected accuracy drops of < 5%. To validate the accuracy drop, we use the original datasets used to train the respective models and run a large number of offline experiments with varying parametric settings used for RLC and DAD (See Algorithm 1).” Also see pg. 4 left col “The mobile CPU (or another constrained processor) supports initial model layers that have been compacted to meet its memory and computational limits. The remaining majority of model layers are then completed by GPU computation. Note, the model is compressed only where needed by resource constraints, instead of compression being applied across all layers”)
Han, Kulkarni, Zhang, Wang and Lane are analogous art because they are all directed to data automation. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Kulkarni with Zhang and Wang to incorporate the teaching of Lane to provide a software accelerator for low-power deep learning inference on mobile devices.
One of ordinary skill in the art would have been motivated to make this modification in order to improve a method or system that can “automatically decompose a deep model across available processors to maximize energy-efficiency and execution time, within fluctuating mobile resource constraints such as computation and memory” as disclosed by Lane (pg. 1 right col third paragraph “This accelerator dramatically lowers resource overhead by leveraging a mix of heterogeneous processors (e.g., GPUs, LPUs) present, but seldom utilized for sensor processing, in mobile SoCs. Each computational unit provides distinct resource efficiencies when executing different inference phases of deep models. DeepX allows non-expert developers to exploit these benefits by simply specifying a deep model to run. But beyond just using various local processors, DeepX amplifies the advantages they offer through two inference-time resource control algorithms, namely: (1) Runtime Layer Compression (RLC) and (2) Deep Architecture Decomposition (DAD). Through these runtime algorithms, DeepX can automatically decompose a deep model across available processors to maximize energy-efficiency and execution time, within fluctuating mobile resource constraints such as computation and memory.”).

Regarding claim 15 
Han in view of Kulkarni with Zhang, Wang and Lane teaches claim 9. 
Lane further teaches wherein the one or more processors (abstract “The foundation of DeepX is a pair of resource control algorithms, designed for the inference stage of deep learning, that: (1) decompose monolithic deep model network architectures into unit-blocks of various types, that are then more efficiently executed by heterogeneous local device processors (e.g., GPUs, CPUs);”) perform learning such that the constraint condition instructed by the instruction unit is satisfied (pg. 6 right col “But due to the large number of units and layers that comprise typical deep models, a large variety of potential decompositions exist. Consequently, the search for this plan must balance the speed and efficiency it identifies the plan, along with the need to satisfy user performance goals.”)
and then encodes the weight parameter based on a result of the learning. (pg. 6 right col “Algorithm 1 details the approach by DAD to cope with these competing concerns. Three specific techniques are employed, each narrow the search space by encoding an understanding of the deep learning algorithms and how they execute on within the resource limits presented by hardware. First, the architecture of each deep learning model includes a series of dependencies based on factors such as layer type, which determines the units must be computed in series. This limits groups of layers (Algorithm 1, line 2−7) and units that can be packed together to maximize desirable properties like parallel execution. Second, hardware resource limits dictate if a unit-block of the model is viable or not (line 5 and 7).”)
Han, Kulkarni, Zhang, Wang and Lane are analogous art because they are all directed to data automation. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Kulkarni with Zhang and Wang to incorporate the teaching of Lane to provide a software accelerator for low-power deep learning inference on mobile devices.
One of ordinary skill in the art would have been motivated to make this modification in order to improve a method or system that can “automatically decompose a deep model across available processors to maximize energy-efficiency and execution time, within fluctuating mobile resource constraints such as computation and memory” as disclosed by Lane (pg. 1 right col third paragraph “This accelerator dramatically lowers resource overhead by leveraging a mix of heterogeneous processors (e.g., GPUs, LPUs) present, but seldom utilized for sensor processing, in mobile SoCs. Each computational unit provides distinct resource efficiencies when executing different inference phases of deep models. DeepX allows non-expert developers to exploit these benefits by simply specifying a deep model to run. But beyond just using various local processors, DeepX amplifies the advantages they offer through two inference-time resource control algorithms, namely: (1) Runtime Layer Compression (RLC) and (2) Deep Architecture Decomposition (DAD). Through these runtime algorithms, DeepX can automatically decompose a deep model across available processors to maximize energy-efficiency and execution time, within fluctuating mobile resource constraints such as computation and memory.”).
Regarding claim 16 
Han in view of Kulkarni with Zhang, Wang and Lane teaches claim 15. 
Han further teaches …and wherein the one or more processors encode the weight parameter such that the weight parameter after compression coding becomes able to be stored into the memory. (Examiner notes that, after compression is done, Han teaches storing the index difference; see pg. 3 “To compress further, we store the index difference instead of the absolute position, and encode this difference in 8 bits for conv layer and 5 bits for fc layer. When we need an index difference larger than the bound, we the zero padding solution shown in Figure 2: in case when the difference exceeds 8, the largest 3-bit (as an example) unsigned number, we add a filler zero.”)
Lane further teaches wherein the one or more processors (pg. 4 right col section 4.1 “The matrix operations can be efficiently computed using, e.g., a GPU, while applying new vectorization techniques [21].”)
receives, from the user, (Examiner notes that DeepX is run by user and the intent of a user see pg. 5 “There are two key components to RLC. First, a dimensionality reduction process (§IV-A) used to lower the computations required as one layer feeds into the next. Second, an estimator (§IV-B) that regulates the level of dimensionality reduction to be applied before model accuracy is effected beyond the intent of the DeepX user.”) an instruction of the constraint condition regarding a memory, (pg. 4 right col “Selection of a decomposition plan is strongly influenced by the current available resources. Via OS hooks DeepX receives current resource usage levels before performing an inference. But better decisions can be made using accurate predictions of resource load, and planning for predicted levels”)
Han, Kulkarni, Zhang, Wang and Lane are analogous art because they are all directed to data automation. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han in view of Kulkarni with Zhang and Wang to incorporate the teaching of Lane to provide a software accelerator for low-power deep learning inference on mobile devices.
One of ordinary skill in the art would have been motivated to make this modification in order to improve a method or system that can “automatically decompose a deep model across available processors to maximize energy-efficiency and execution time, within fluctuating mobile resource constraints such as computation and memory” as disclosed by Lane (pg. 1 right col third paragraph “This accelerator dramatically lowers resource overhead by leveraging a mix of heterogeneous processors (e.g., GPUs, LPUs) present, but seldom utilized for sensor processing, in mobile SoCs. Each computational unit provides distinct resource efficiencies when executing different inference phases of deep models. DeepX allows non-expert developers to exploit these benefits by simply specifying a deep model to run. But beyond just using various local processors, DeepX amplifies the advantages they offer through two inference-time resource control algorithms, namely: (1) Runtime Layer Compression (RLC) and (2) Deep Architecture Decomposition (DAD). Through these runtime algorithms, DeepX can automatically decompose a deep model across available processors to maximize energy-efficiency and execution time, within fluctuating mobile resource constraints such as computation and memory.”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Han et al. (“EIE: Efficient Inference Engine on Compressed Deep Neural Network”) teaches an energy efficient inference engine (EIE) that performs inference on a compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. 
Xie et al. (“Sparse deep feature learning for facial expression recognition”) teaches a feature sparseness-based regularization that learns deep features with better generalization capability. 
Chen et al. (“Compressing Convolutional Neural Networks in the Frequency Domain”) teaches a network architecture, Frequency-Sensitive Hashed Nets (FreshNets), which exploits inherent redundancy in both convolutional layers and fully-connected layers of a deep learning model, leading to dramatic savings in memory and storage consumption. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598. The examiner can normally be reached Mon - Fri 8:00-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/V.M./ Examiner, Art Unit 2126
/LUIS A SITIRICHE/ Primary Examiner, Art Unit 2126