DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION. —The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 4 and 5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

In regards to claims 4 and 5. The claims recite the limitation of “beginning and end of values”. Examiner is unclear what is meant by beginning and end of values of a matrix. For examination purposes the claims will be interpreted as a matrix having patterns of multiple kernel weight values.

In regards to claim 5. The claim recites the limitation of “full dimension”. Examiner is unclear what is meant by full dimension of the matrix. For examination purposes the claim will be interpreted as a matrix having patterns of multiple kernel weight values across a dimension.







Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-8, and 10-15 are rejected under 35 U.S.C. 103 as being unpatentable over Watanabe et al. (hereinafter Watanabe) US 20160246603 A1, in view of Ge et al. (hereinafter Ge) “Optimized Product Quantization”, in view of Xu et al. (hereinafter Xu) “Refinable Kernels”.
In regard to claim 1. Watanabe discloses a neural network processor operating to receive data and to classify that data comprising:
“an input register holding received data for classification” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the reception unit);
“codebook storage memory holding data” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the difference vector storage unit which is being interpreted as the codebook storage memory);
“a codeword memory holding codeword data” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the base codeword storage unit which is being interpreted as the codeword memory);
“arithmetic circuitry communicating with the input register, the codebook storage memory, and the codeword memory” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the arithmetic control unit). 

Watanabe does not explicitly disclose:
“permitting a mapping of individual codeword values to patterns of multiple kernel weight values related to a kernel of a neural network trained to provide a set of classifications”;
“provide generation of a reconstructed kernel by indexing the codebook storage memory with codeword data”.
However, Ge discloses:
“permitting a mapping of individual codeword values to patterns of multiple kernel weight values related to a kernel of a neural network trained to provide a set of classifications” (Ge in at least § 4.3 discloses a product quantization (PQ) technique “PQ was introduced as a way of compacting image representations for image retrieval. In this scenario, the local descriptors of an image are first aggregated as a high-dimensional (often thousands of dimensions) vector. The aggregation methods include the Fisher Kernel”; in at least § 2 discloses quantization distortion “A variety of ANN methods, including k-means [14], product quantization [1], and iterative quantization (ITQ) [9], can be formulated within a framework of vector quantization”; in at least § 2.1 discloses vector quantization “A vector quantization system [7] maps a vector $x \in \mathbb{R}^{D}$ to a codeword $c$ in a codebook $C = \{c(i)\}$ with $i$ in a finite index set. The mapping, termed as a quantizer … Given a codebook $C$, an encoder that minimizes the distortion $E$ must satisfy the first Lloyd’s condition [7]: the encoder $i(x)$ should map any $x$ to its nearest codeword in the codebook $C$”; in at least the introduction § discloses “A query is quantized into a codeword and then compared with a short list of data which have the same or similar codewords. In the exhaustive search, the data are quantized into codewords [1], [8], [9]; the distances of vectors are approximated by the distances of codewords”; and in at least § 4.2 “Offline, each codeword has been assigned a short list that contains all the data vectors belonging to this codeword (i.e., nearest to it). Online, a query will find a number of nearest codewords and retrieve all their short lists”);
“provide generation of a reconstructed kernel by indexing the codebook storage memory with codeword data” (Ge in at least § 2.1 discloses vector quantization “A vector quantization system [7] maps a vector $x \in \mathbb{R}^{D}$ to a codeword $c$ in a codebook $C = \{c(i)\}$ with $i$ in a finite index set. The mapping, termed as a quantizer … Given a codebook $C$, an encoder that minimizes the distortion $E$ must satisfy the first Lloyd’s condition [7]: the encoder $i(x)$ should map any $x$ to its nearest codeword in the codebook $C$”).
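For illustration only, the vector quantization quoted from Ge § 2.1 can be sketched as below: an encoder maps each vector to the index of its nearest codeword, and an approximation is rebuilt by indexing the codebook with the stored indices. The function names, the codebook, and the sample vectors are hypothetical and are not taken from Watanabe, Ge, or the claims.

```python
# Illustrative sketch only. All names and values are hypothetical.

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def encode(x, codebook):
    """Return the index i(x) of the codeword nearest to x."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(x, codebook[i]))

def reconstruct(indices, codebook):
    """Rebuild approximate vectors by indexing the codebook with codewords."""
    return [codebook[i] for i in indices]

# Hypothetical codebook C = {c(0), c(1), c(2)} in R^2.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
vectors = [(0.9, 1.2), (0.1, -0.2), (-0.8, 1.7)]

codes = [encode(v, codebook) for v in vectors]   # stored codeword data
approx = reconstruct(codes, codebook)            # reconstructed vectors
```

In this sketch only the small index list `codes` and the codebook need to be stored; the original vectors are recovered approximately by lookup.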
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe and Ge. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. One of ordinary skill would have been motivated to combine Watanabe and Ge to optimize the product quantization by minimizing quantization distortions (Ge Abstract and § 4.3).
Watanabe and Ge do not explicitly disclose:
“generate output representing a dot product between the received data and the reconstructed kernel to classify the received data according to the set of classifications”. 
However, Xu discloses:
“generate output representing a dot product between the received data and the reconstructed kernel to classify the received data according to the set of classifications” (Xu in at least introduction § “Let X be a prescribed set which is called in the theory of learning an input space” and “A kernel $K$ on $X$ corresponds to a Hilbert space $H_{K} := \overline{\operatorname{span}}\{K(\cdot, y) : y \in X\}$ of functions on $X$ with an inner product determined by $(K(\cdot, y), K(\cdot, x))_{H_{K}} = K(x, y),\ x, y \in X$. $H_{K}$ is a reproducing kernel Hilbert space (RKHS)”). The broadest reasonable interpretation is that the inner product determination provides the kernel $K(x, y),\ x, y \in X$ for classification.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe, Ge, and Xu. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. Xu teaches reproducing kernels. One of ordinary skill would have been motivated to combine Watanabe, Ge, and Xu to incorporate the computational advantages of efficiently updating the kernel, improvement of the predictor, and efficiency in setting up the coefficient matrix (Xu § 7).

In regards to claim 6. Watanabe, Ge, and Xu disclose the neural network processor of claim 1 (as mentioned above) wherein:
Ge further discloses:
“the reconstructed kernel is a scalar compression of the kernel of the neural network trained to provide the set of classifications” (Ge in at least § 4.1.1 “TC (Transform Coding [8]): this is a scalar quantization (SQ) method. SQ is a special case of PQ that each dimension forms a subspace. TC uses the principal components as the subspaces. It assigns each principal component with an adaptive number of bits. A similar method was also concurrently proposed in [33]” and “We notice that TC performs clearly better than $\mathrm{PQ}_{\mathrm{RO}}$ and $\mathrm{PQ}_{\mathrm{RR}}$ in the GIST1M set. But TC is inferior to our methods in all data sets. This is because TC is scalar quantization, while our method quantizes multi-dimensional subspaces. Further, TC assigns an adaptive number of bits to each eigenvalue, while our method assigns the eigenvalues to each subspace. Since bit numbers are discrete but eigenvalues are continuous, it is easier for our method to achieve balance”).

In regards to claim 7. Watanabe, Ge, and Xu disclose the neural network processor of claim 6 (as mentioned above) wherein:
Ge further discloses:
“the scalar compression replaces a range of kernel data values with a predetermined scalar value” (Ge in at least § 4.1.1 “TC (Transform Coding [8]): this is a scalar quantization (SQ) method. SQ is a special case of PQ that each dimension forms a subspace. TC uses the principal components as the subspaces. It assigns each principal component with an adaptive number of bits. A similar method was also concurrently proposed in [33]” and “We notice that TC performs clearly better than $\mathrm{PQ}_{\mathrm{RO}}$ and $\mathrm{PQ}_{\mathrm{RR}}$ in the GIST1M set. But TC is inferior to our methods in all data sets. This is because TC is scalar quantization, while our method quantizes multi-dimensional subspaces. Further, TC assigns an adaptive number of bits to each eigenvalue, while our method assigns the eigenvalues to each subspace. Since bit numbers are discrete but eigenvalues are continuous, it is easier for our method to achieve balance”).
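For illustration only, scalar quantization in the general sense discussed above can be sketched as replacing each value with the nearest of a small set of predetermined levels, so that every value within a range collapses to one scalar. The levels and weight values below are hypothetical, and the sketch is not the Transform Coding method of Ge § 4.1.1, which additionally allocates bits per principal component.

```python
# Illustrative sketch only. Levels and weights are hypothetical.

def scalar_quantize(values, levels):
    """Replace each value with the nearest predetermined scalar level."""
    return [min(levels, key=lambda q: abs(v - q)) for v in values]

levels = [-0.5, 0.0, 0.5]            # predetermined scalar values
weights = [0.41, -0.07, 0.2, -0.6]   # example kernel data values

quantized = scalar_quantize(weights, levels)
```

Here every weight in the range nearest to 0.0 is stored as 0.0, every weight nearest to 0.5 is stored as 0.5, and so on.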
	
In regards to claim 8. Watanabe, Ge, and Xu disclose the neural network processor of claim 1 (as mentioned above) wherein:
Ge further discloses:
“the patterns of multiple kernel weight values are a product quantization of vectors of the kernel of the neural network trained to provide a set of classifications” (Ge in at least § 4.3 “PQ was introduced as a way of compacting image representations for image retrieval. In this scenario, the local descriptors of an image are first aggregated as a high-dimensional (often thousands of dimensions) vector. The aggregation methods include the Fisher Kernel [11] and VLAD [3], [35]. The aggregated vector is normalized and compressed by PCA. The compressed vector is then compacted into a short code by PQ for retrieval”). 

In regards to claim 10. Watanabe, Ge, and Xu disclose the neural network processor of claim 1 (as mentioned above) wherein:
Watanabe further discloses:
“the arithmetic circuitry employs the data of the codebook storage memory to precompute a set of multiplications between the received data and data of the codebook storage memory” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the arithmetic control unit). 
Ge further discloses:
“populate a first lookup table and repeatedly uses the precomputed set of multiplications according to data of the codeword memory in generation of the output representing a dot product between the received data and the reconstructed kernel” (Ge in at least § 1, § 2.2.2, § 2.3, § 4.1, and § 4.2 “Both ways of distance computation are efficient using lookup tables. For SDC, the distances between any two sub-codewords in a subspace are pre-computed and stored in a k-by-k lookup table. For ADC, the distances between the sub-vector of the query and the sub-codewords in a subspace are pre-computed online and stored in a 1-by-k lookup table. The distance in the original space is simply the sum of the distances computed from the M subspaces”).
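For illustration only, the lookup-table reuse described in the ADC passage quoted above can be sketched for inner products rather than distances. This adaptation is an assumption made here for clarity; Ge's quoted text concerns distance tables. The products between each input sub-vector and each sub-codeword are computed once, and the dot product with any kernel expressed as codeword indices is then a sum of table entries. All names and values are hypothetical.

```python
# Illustrative sketch only. All names and values are hypothetical.

def build_table(x, sub_codebooks):
    """table[m][j] = dot(x_m, sub_codebooks[m][j]) for each subspace m."""
    table = []
    offset = 0
    for codewords in sub_codebooks:
        d = len(codewords[0])              # sub-vector length in this subspace
        x_m = x[offset:offset + d]
        table.append([sum(a * b for a, b in zip(x_m, c)) for c in codewords])
        offset += d
    return table

def dot_from_codes(table, codes):
    """Dot product of the input with the kernel encoded by `codes`."""
    return sum(table[m][j] for m, j in enumerate(codes))

def reconstruct(codes, sub_codebooks):
    """Concatenate the indexed sub-codewords into a full kernel vector."""
    return [w for m, j in enumerate(codes) for w in sub_codebooks[m][j]]

# Two subspaces (M = 2), two sub-codewords each (k = 2).
sub_codebooks = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(2.0, 2.0), (-1.0, 1.0)],
]
x = [3.0, 4.0, 1.0, 2.0]      # received data
table = build_table(x, sub_codebooks)

kernel_a = [0, 1]             # codeword indices for one kernel
kernel_b = [1, 0]             # codeword indices for another kernel
result_a = dot_from_codes(table, kernel_a)
result_b = dot_from_codes(table, kernel_b)
```

The same `table` serves both kernels, so the multiplications are performed once per input and reused, which is the efficiency the quoted passage attributes to lookup tables.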

In regards to claim 11. Watanabe, Ge, and Xu disclose the neural network processor of claim 10 (as mentioned above) wherein:
Watanabe further discloses:
“arithmetic circuit” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the arithmetic control unit).
Ge further discloses:
“a second lookup table and further operates to populate the second lookup table with sums formed using the data of the codeword storage memory and the first lookup table and uses the second lookup table to form the dot product” (Ge in at least § 1, § 2.2.2, § 2.3, § 4.1, and § 4.2 “Both ways of distance computation are efficient using lookup tables. For SDC, the distances between any two sub-codewords in a subspace are pre-computed and stored in a k-by-k lookup table. For ADC, the distances between the sub-vector of the query and the sub-codewords in a subspace are pre-computed online and stored in a 1-by-k lookup table. The distance in the original space is simply the sum of the distances computed from the M subspaces”).

In regards to claim 12. Watanabe, Ge, and Xu disclose the neural network processor of claim 11 (as mentioned above) wherein:
Watanabe further discloses:
“the codebook storage memory holding [data]” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the difference vector storage unit which is being interpreted as the codebook storage memory).
Ge further discloses:
“storage memory holds a scalar-quantized codebook that can be reconstructed into a product-quantized codebook using scalar-quantized codewords of the codeword memory where in the product-quantized codebook can be reconstructed into a kernel using product-quantized codewords stored in the codeword memory” (Ge in at least § 4.1.1 “TC (Transform Coding [8]): this is a scalar quantization (SQ) method. SQ is a special case of PQ that each dimension forms a subspace. TC uses the principal components as the subspaces. It assigns each principal component with an adaptive number of bits. A similar method was also concurrently proposed in [33]” and “We notice that TC performs clearly better than $\mathrm{PQ}_{\mathrm{RO}}$ and $\mathrm{PQ}_{\mathrm{RR}}$ in the GIST1M set. But TC is inferior to our methods in all data sets. This is because TC is scalar quantization, while our method quantizes multi-dimensional subspaces. Further, TC assigns an adaptive number of bits to each eigenvalue, while our method assigns the eigenvalues to each subspace. Since bit numbers are discrete but eigenvalues are continuous, it is easier for our method to achieve balance”, § 4.2 “Offline, each codeword has been assigned a short list that contains all the data vectors belonging to this codeword (i.e., nearest to it). Online, a query will find a number of nearest codewords and retrieve all their short lists”, and § 4.3 “PQ was introduced as a way of compacting image representations for image retrieval. In this scenario, the local descriptors of an image are first aggregated as a high-dimensional (often thousands of dimensions) vector. The aggregation methods include the Fisher Kernel [11] and VLAD [3], [35]. The aggregated vector is normalized and compressed by PCA. The compressed vector is then compacted into a short code by PQ for retrieval”).
 



In regards to claim 13. Watanabe, Ge, and Xu disclose the neural network processor of claim 11 (as mentioned above) wherein:
Watanabe further discloses:
“the codebook storage memory holding [data]” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the difference vector storage unit which is being interpreted as the codebook storage memory).
Ge further discloses:
“the data of the first and second lookup tables are stored for reuse between successive received data separated by a convolution of the kernel on an input data set” (Ge in at least § 1, § 2.2.2, § 2.3, § 4.1, and § 4.2 “Both ways of distance computation are efficient using lookup tables. For SDC, the distances between any two sub-codewords in a subspace are pre-computed and stored in a k-by-k lookup table. For ADC, the distances between the sub-vector of the query and the sub-codewords in a subspace are pre-computed online and stored in a 1-by-k lookup table. The distance in the original space is simply the sum of the distances computed from the M subspaces”).

In regards to claim 14. Watanabe, Ge, and Xu disclose the neural network processor of claim 1 (as mentioned above) wherein:
Watanabe further discloses:
“input register, codebook storage memory, codeword memory, and arithmetic circuit are held on a single integrated circuit substrate” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses input register, codebook storage memory, codeword memory, and arithmetic circuit are held on a single integrated circuit substrate).





In regard to claim 15. Watanabe discloses a method of operating a neural network processor receiving data and having:
“an input register holding received data for classification” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the reception unit);
“codebook storage memory holding data” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the difference vector storage unit which is being interpreted as the codebook storage memory);
“a codeword memory holding codeword data” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the base codeword storage unit which is being interpreted as the codeword memory);
“arithmetic circuitry communicating with the input register, the codebook storage memory, and the codeword memory” (Watanabe in at least Fig. 7, Fig. 9, Fig. 10, ¶ [0052], and ¶ [0057]- [0064] discloses the arithmetic control unit).

Watanabe does not explicitly disclose:
“permitting a mapping of individual codeword values to patterns of multiple kernel weight values related to a kernel of a neural network trained to provide a set of classifications”;
“permit this generation of a reconstructed kernel by indexing the codebook storage memory with codeword data”;
“training a neural network having weight values”;
“decomposing the weight values into vectors compressing the vectors to create a codebook mapping individual codeword values to patterns of multiple kernel weight values and corresponding codeword data allowing reconstruction of the weight values”;
“loading the codebook and the codeword data into the codebook storage memory and codeword memory respectively”;



However, Ge discloses:
“permitting a mapping of individual codeword values to patterns of multiple kernel weight values related to a kernel of a neural network trained to provide a set of classifications” (Ge in at least § 4.3 discloses a product quantization (PQ) technique “PQ was introduced as a way of compacting image representations for image retrieval. In this scenario, the local descriptors of an image are first aggregated as a high-dimensional (often thousands of dimensions) vector. The aggregation methods include the Fisher Kernel”; in at least § 2 discloses quantization distortion “A variety of ANN methods, including k-means [14], product quantization [1], and iterative quantization (ITQ) [9], can be formulated within a framework of vector quantization”; in at least § 2.1 discloses vector quantization “A vector quantization system [7] maps a vector $x \in \mathbb{R}^{D}$ to a codeword $c$ in a codebook $C = \{c(i)\}$ with $i$ in a finite index set. The mapping, termed as a quantizer … Given a codebook $C$, an encoder that minimizes the distortion $E$ must satisfy the first Lloyd’s condition [7]: the encoder $i(x)$ should map any $x$ to its nearest codeword in the codebook $C$”; in at least the introduction § discloses “A query is quantized into a codeword and then compared with a short list of data which have the same or similar codewords. In the exhaustive search, the data are quantized into codewords [1], [8], [9]; the distances of vectors are approximated by the distances of codewords”; and in at least § 4.2 “Offline, each codeword has been assigned a short list that contains all the data vectors belonging to this codeword (i.e., nearest to it). Online, a query will find a number of nearest codewords and retrieve all their short lists”);
“permit this generation of a reconstructed kernel by indexing the codebook storage memory with codeword data” (Ge in at least § 2.1 discloses vector quantization “A vector quantization system [7] maps a vector $x \in \mathbb{R}^{D}$ to a codeword $c$ in a codebook $C = \{c(i)\}$ with $i$ in a finite index set. The mapping, termed as a quantizer … Given a codebook $C$, an encoder that minimizes the distortion $E$ must satisfy the first Lloyd’s condition [7]: the encoder $i(x)$ should map any $x$ to its nearest codeword in the codebook $C$”); and
“training a neural network having weight values” (Ge in at least § 3.1; see Algorithm 1);
“decomposing the weight values into vectors compressing the vectors to create a codebook mapping individual codeword values to patterns of multiple kernel weight values and corresponding codeword data allowing reconstruction of the weight values” (Ge in at least § 4.3 discloses a product quantization (PQ) technique “PQ was introduced as a way of compacting image representations for image retrieval. In this scenario, the local descriptors of an image are first aggregated as a high-dimensional (often thousands of dimensions) vector. The aggregation methods include the Fisher Kernel”; in at least § 2 discloses quantization distortion “A variety of ANN methods, including k-means [14], product quantization [1], and iterative quantization (ITQ) [9], can be formulated within a framework of vector quantization”; in at least § 2.1 discloses vector quantization “A vector quantization system [7] maps a vector $x \in \mathbb{R}^{D}$ to a codeword $c$ in a codebook $C = \{c(i)\}$ with $i$ in a finite index set. The mapping, termed as a quantizer … Given a codebook $C$, an encoder that minimizes the distortion $E$ must satisfy the first Lloyd’s condition [7]: the encoder $i(x)$ should map any $x$ to its nearest codeword in the codebook $C$”; in at least the introduction § discloses “A query is quantized into a codeword and then compared with a short list of data which have the same or similar codewords. In the exhaustive search, the data are quantized into codewords [1], [8], [9]; the distances of vectors are approximated by the distances of codewords”; and in at least § 4.2 “Offline, each codeword has been assigned a short list that contains all the data vectors belonging to this codeword (i.e., nearest to it). Online, a query will find a number of nearest codewords and retrieve all their short lists”);
“loading the codebook and the codeword data into the codebook storage memory and codeword memory respectively” (Ge in at least § 2.1 discloses vector quantization “A vector quantization system [7] maps a vector $x \in \mathbb{R}^{D}$ to a codeword $c$ in a codebook $C = \{c(i)\}$ with $i$ in a finite index set. The mapping, termed as a quantizer … Given a codebook $C$, an encoder that minimizes the distortion $E$ must satisfy the first Lloyd’s condition [7]: the encoder $i(x)$ should map any $x$ to its nearest codeword in the codebook $C$”). The broadest reasonable interpretation is that the set $C$ has elements $c(i)$, where $i$ is the index and $c(i)$ corresponds to the codeword associated with index $i$; the set $C = \{c(i)\}$ represents indexing the codebook with codeword data $c$ indexed by $i$ to generate the reconstructed kernel.
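For illustration only, the vector quantization mapping quoted from Ge § 2.1 can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from Ge, Watanabe, or the application: the function names `encode` and `decode`, the two-element sub-vector length, and the sample values are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def encode(x: Vector, codebook: List[Vector]) -> int:
    """Return the index i(x) of the codeword nearest to x (first Lloyd's condition)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(x, codebook[i]))


def decode(indices: List[int], codebook: List[Vector]) -> List[float]:
    """Rebuild an approximate vector by looking up c(i) for each stored index i."""
    reconstructed: List[float] = []
    for i in indices:
        reconstructed.extend(codebook[i])
    return reconstructed


# Hypothetical codebook C = {c(i)} with three two-element codewords.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]

# Hypothetical weight values, split into two-element sub-vectors before encoding.
weights = [0.9, 1.1, -0.8, 1.2, 0.1, -0.1]
sub_vectors = [weights[k:k + 2] for k in range(0, len(weights), 2)]

indices = [encode(v, codebook) for v in sub_vectors]   # stored codeword data
reconstructed = decode(indices, codebook)              # approximate weights
```

In this sketch only `indices` and `codebook` need to be stored; the reconstructed values are an approximation of the original weights, not an exact copy.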
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe and Ge. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. One of ordinary skill would have been motivated to combine Watanabe and Ge to optimize the product quantization by minimizing quantization distortions (Ge Abstract and § 4.3).

Watanabe and Ge do not explicitly disclose:
“generate output representing a dot product between the received data and the reconstructed kernel to classify the received data according to the set of classifications”;
“operating the neural network processor to receive data for classification and to generate data representing a dot product between the received data and a kernel reconstructed from data of the codebook storage memory and codeword memory to classify the received data according to the set of classifications”.
However, Xu discloses:
“generate output representing a dot product between the received data and the reconstructed kernel to classify the received data according to the set of classifications” (Xu in at least introduction § “Let X be a prescribed set which is called in the theory of learning an input space” and “A kernel $K$ on $X$ corresponds to a Hilbert space $\mathcal{H}_K := \overline{\operatorname{span}}\{K(\cdot, y) : y \in X\}$ of functions on $X$ with an inner product determined by $(K(\cdot, y), K(\cdot, x))_{\mathcal{H}_K} = K(x, y)$, $x, y \in X$. $\mathcal{H}_K$ is a reproducing kernel Hilbert space (RKHS)”). The broadest reasonable interpretation is that the inner product determination provides the kernel $K(x, y)$, $x, y \in X$, for classification.
“operating the neural network processor to receive data for classification and to generate data representing a dot product between the received data and a kernel reconstructed from data of the codebook storage memory and codeword memory to classify the received data according to the set of classifications” (Xu in at least introduction § “Let X be a prescribed set which is called in the theory of learning an input space” and “A kernel $K$ on $X$ corresponds to a Hilbert space $\mathcal{H}_K := \overline{\operatorname{span}}\{K(\cdot, y) : y \in X\}$ of functions on $X$ with an inner product determined by $(K(\cdot, y), K(\cdot, x))_{\mathcal{H}_K} = K(x, y)$, $x, y \in X$. $\mathcal{H}_K$ is a reproducing kernel Hilbert space (RKHS)”). The broadest reasonable interpretation is that $(K(\cdot, y), K(\cdot, x))$ is the kernel operation on the input space from a finite set of training data, and that the inner product, commonly known as the “dot product”, gives the reconstructed kernel of the input space belonging to $X$, namely $K(x, y)$, $x, y \in X$. The inner product determination thus provides the kernel $K(x, y)$, $x, y \in X$, for classification.
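For illustration only, the quoted limitation of generating a dot product between received data and a reconstructed kernel, and classifying on that basis, can be sketched as below. This is a simplified sketch under stated assumptions and is not drawn from Xu or from the application: the names `dot` and `classify`, the class labels, the choice of the highest score as the classification, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Ordinary dot (inner) product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(p * q for p, q in zip(a, b))


def classify(x: List[float], kernels: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Score x against each kernel by dot product and return the top-scoring label."""
    scores = {label: dot(x, kernel) for label, kernel in kernels.items()}
    best_label = max(scores, key=lambda label: scores[label])
    return best_label, scores


# Hypothetical kernels, assumed to have already been reconstructed from a codebook.
kernels = {
    "class_a": [1.0, 1.0, -1.0, 1.0],
    "class_b": [0.0, 0.0, 1.0, 1.0],
}

received = [0.5, 0.25, -1.0, 0.0]  # hypothetical received data
label, scores = classify(received, kernels)
```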

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe, Ge, and Xu. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. Xu teaches reproducing kernels. One of ordinary skill would have been motivated to combine Watanabe, Ge, and Xu to incorporate the computational advantages of efficiently updating the kernel, improving the predictor, and efficiently setting up the coefficient matrix (Xu § 7).

Claims 2-5 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Watanabe et al. (hereinafter Watanabe) US 20160246603 A1, in view of Ge et al. (hereinafter Ge) “Optimized Product Quantization”, in view of Xu et al. (hereinafter Xu) “Refinable Kernels”, in view of Cuthbert et al. (hereinafter Cuthbert) US 8779950 B2.



In regards to claim 2. Watanabe, Ge, and Xu disclose the neural network processor of claim 1 (as mentioned above) wherein:
Watanabe, Ge, and Xu do not explicitly disclose:
“the patterns of multiple kernel weight values are contiguous weight values within a matrix of the kernel”. 
However, Cuthbert discloses:
“the patterns of multiple kernel weight values are contiguous weight values within a matrix of the kernel” (Cuthbert in at least Col. 2 lines 7-19 “losslessly translate between a first set of bytes and a plurality of pathways in a reproducible array of byte values, and losslessly translate between the plurality of pathways in the reproducible array of byte values and a second set of bytes”; Col. 8 lines 3-16 “Array 306 may be any suitable array of nodes configured to hold byte values, where a node is a location within the array that is addressable using coordinates. Array 306 may have one or more dimensions. For example, array 306 may be a cube having three dimensions, which will be referred to as X, Y, and Z dimensions. In a three-dimensional array 306, therefore, the location of any given point or node in the array can be described using its X, Y, and Z coordinates. Array 306 may have more or fewer dimensions. In some embodiments, array 306 is four-dimensional. Array 306 may also be described as having a size. In this context, the size of an array may be delineated by the magnitude of each dimension. For example, array 306 may be a three dimensional cube of size 36 by 36 by 36”; Col. 8 lines 17-36 “a certain series of byte values from first set of bytes 302 may have an equivalent series of values located along a pattern consisting of a contiguous pathway 308 of nodes within array … In this context, a contiguous pathway may be meant as a series of nodes wherein any given sequential pair of nodes is contiguous. The term contiguous is used in the sense that within array 306, each of two nodes may touch the other. More specifically, for a three-dimensional array, two nodes may be considered contiguous if each of the respective X, Y, and Z coordinates of one node differs by no more than one unit from the X, Y, and Z coordinates of a second node”; Col. 8 line 56 to Col. 9 line 3; Col. 9 lines 40-43 “Each node may be configured as a storage location for a byte value”).
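For illustration only, the contiguity test quoted from Cuthbert Col. 8 lines 17-36 can be sketched as below. This is a minimal sketch of the quoted definition, not code from Cuthbert: the function names are hypothetical, and the sketch assumes integer coordinates and does not separately exclude a node from being contiguous with itself, since the quoted passage does not address that case.

```python
from typing import Sequence, Tuple

Node = Tuple[int, ...]


def nodes_contiguous(a: Node, b: Node) -> bool:
    """True if every coordinate of a differs from b by no more than one unit."""
    return len(a) == len(b) and all(abs(p - q) <= 1 for p, q in zip(a, b))


def is_contiguous_pathway(nodes: Sequence[Node]) -> bool:
    """True if every sequential pair of nodes in the pathway is contiguous."""
    return all(nodes_contiguous(a, b) for a, b in zip(nodes, nodes[1:]))


# Hypothetical pathways through a three-dimensional array (X, Y, Z coordinates).
touching_path = [(0, 0, 0), (1, 1, 0), (1, 2, 1)]
broken_path = [(0, 0, 0), (2, 0, 0)]

touching_ok = is_contiguous_pathway(touching_path)
broken_ok = is_contiguous_pathway(broken_path)
```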

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe, Ge, Xu, and Cuthbert. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. Xu teaches reproducing kernels. Cuthbert teaches the data structure for the matrix (array) elements for storing in the memory. One of ordinary skill would have been motivated to combine Watanabe, Ge, Xu, and Cuthbert to determine the optimum combination of pathway length and command representation size, with an overall goal of data compression (Cuthbert Col. 13 lines 4-10).

In regards to claim 3. Watanabe, Ge, Xu, and Cuthbert disclose the neural network processor of claim 2 (as mentioned above) wherein: 
Cuthbert further discloses:
“the patterns of multiple kernel weight values are contiguous weight values along a predetermined dimension of the kernel data” (Cuthbert in at least Col. 2 lines 7-19 “losslessly translate between a first set of bytes and a plurality of pathways in a reproducible array of byte values, and losslessly translate between the plurality of pathways in the reproducible array of byte values and a second set of bytes”; Col. 8 lines 3-16 “Array 306 may be any suitable array of nodes configured to hold byte values, where a node is a location within the array that is addressable using coordinates. Array 306 may have one or more dimensions. For example, array 306 may be a cube having three dimensions, which will be referred to as X, Y, and Z dimensions. In a three-dimensional array 306, therefore, the location of any given point or node in the array can be described using its X, Y, and Z coordinates. Array 306 may have more or fewer dimensions. In some embodiments, array 306 is four-dimensional. Array 306 may also be described as having a size. In this context, the size of an array may be delineated by the magnitude of each dimension. For example, array 306 may be a three dimensional cube of size 36 by 36 by 36”; Col. 8 lines 17-36 “a certain series of byte values from first set of bytes 302 may have an equivalent series of values located along a pattern consisting of a contiguous pathway 308 of nodes within array … In this context, a contiguous pathway may be meant as a series of nodes wherein any given sequential pair of nodes is contiguous. The term contiguous is used in the sense that within array 306, each of two nodes may touch the other. More specifically, for a three-dimensional array, two nodes may be considered contiguous if each of the respective X, Y, and Z coordinates of one node differs by no more than one unit from the X, Y, and Z coordinates of a second node”; Col. 8 line 56 to Col. 9 line 3; Col. 9 lines 40-43 “Each node may be configured as a storage location for a byte value”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe, Ge, Xu, and Cuthbert. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. Xu teaches reproducing kernels. Cuthbert teaches the data structure for the matrix (array) elements for storing in the memory. One of ordinary skill would be motivated to combine Watanabe, Ge, Xu, and Cuthbert to determine the optimum combination of pathway length and command representation size, with an overall goal of data compression (Cuthbert Col. 13 lines 4-10).
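For illustration only, the contiguity test quoted from Cuthbert Col. 8 lines 17-36 can be sketched as follows. This is a minimal sketch of the quoted definition and not code taken from Cuthbert; the names `nodes_contiguous` and `is_contiguous_pathway` are hypothetical, and the sketch assumes nodes are given as integer coordinate tuples of equal length. Under this literal reading, a node also counts as contiguous with itself.

```python
from typing import Sequence, Tuple

# A node is addressed by its integer coordinates, e.g. (X, Y, Z).
Node = Tuple[int, ...]


def nodes_contiguous(a: Node, b: Node) -> bool:
    """Return True if every coordinate of a differs from b by at most one unit."""
    if len(a) != len(b):
        raise ValueError("nodes must have the same number of dimensions")
    return all(abs(p - q) <= 1 for p, q in zip(a, b))


def is_contiguous_pathway(path: Sequence[Node]) -> bool:
    """Return True if each sequential pair of nodes in the path is contiguous."""
    return all(nodes_contiguous(a, b) for a, b in zip(path, path[1:]))
```

In this sketch, the pathway (0, 0, 0), (1, 1, 0), (1, 2, 1) would be contiguous, while a step from (0, 0, 0) to (2, 0, 0) would not, because the X coordinate differs by two units.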

In regards to claim 4. Watanabe, Ge, Xu, and Cuthbert disclose the neural network processor of claim 3 (as mentioned above) wherein: 
Cuthbert further discloses:
“the patterns of multiple kernel weight values have beginning and end of values aligned within the matrix of the kernel” (Cuthbert in at least Col. 2 lines 7-19 “losslessly translate between a first set of bytes and a plurality of pathways in a reproducible array of byte values, and losslessly translate between the plurality of pathways in the reproducible array of byte values and a second set of bytes”; Col. 8 lines 3-16 “Array 306 may be any suitable array of nodes configured to hold byte values, where a node is a location within the array that is addressable using coordinates. Array 306 may have one or more dimensions. For example, array 306 may be a cube having three dimensions, which will be referred to as X, Y, and Z dimensions. In a three-dimensional array 306, therefore, the location of any given point or node in the array can be described using its X, Y, and Z coordinates. Array 306 may have more or fewer dimensions. In some embodiments, array 306 is four-dimensional. Array 306 may also be described as having a size. In this context, the size of an array may be delineated by the magnitude of each dimension. For example, array 306 may be a three dimensional cube of size 36 by 36 by 36”; Col. 8 lines 17-36 "a certain series of byte values from first set of bytes 302 may have an equivalent series of values located along a pattern consisting of a contiguous pathway 308 of nodes within array … In this context, a contiguous pathway may be meant as a series of nodes wherein any given sequential pair of nodes is contiguous. The term contiguous is used in the sense that within array 306, each of two nodes may touch the other. More specifically, for a three-dimensional array, two nodes may be considered contiguous if each of the respective X, Y, and Z coordinates of one node differs by no more than one unit from the X, Y, and Z coordinates of a second node"; Col. 8 line 56 to Col. 9 line 3; Col. 9 lines 40-43 “Each node may be configured as a storage location for a byte value”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe, Ge, Xu, and Cuthbert. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. Xu teaches reproducing kernels. Cuthbert teaches the data structure for the matrix (array) elements for storing in the memory. One of ordinary skill would be motivated to combine Watanabe, Ge, Xu, and Cuthbert to determine the optimum combination of pathway length and command representation size, with an overall goal of data compression (Cuthbert Col. 13 lines 4-10).
In regards to claim 5. Watanabe, Ge, Xu, and Cuthbert disclose the neural network processor of claim 3 (as mentioned above) wherein: 
Cuthbert further discloses:
“the beginning and end values extend at least a full dimension of the matrix of the kernel” (Cuthbert in at least Col. 2 lines 7-19 “losslessly translate between a first set of bytes and a plurality of pathways in a reproducible array of byte values, and losslessly translate between the plurality of pathways  in the reproducible array of byte values and a second set of bytes”; Col. 8 lines 3-16 “Array 306 may be any suitable array of nodes configured to hold byte values, where a node is a location within the array that is addressable using coordinates. Array 306 may have one or more dimensions. For example, array 306 may be a cube having three dimensions, which will be referred to as X, Y, and Z dimensions. In a three-dimensional array 306, therefore, the location of any given point or node in the array can be described using its X, Y, and Z coordinates. Array 306 may have more or fewer dimensions. In some embodiments, array 306 is four-dimensional. Array 306 may also be described as having a size. In this context, the size of an array may be delineated by the magnitude of each dimension. For example, array 306 may be a three dimensional cube of size 36 by 36 by 36”; Col. 8 lines 17-36 "a certain series of byte values from first set of bytes 302 may have an equivalent series of values located along a pattern consisting of a contiguous pathway 308 of nodes within array … In this context, a contiguous pathway may be meant as a series of nodes wherein any given sequential pair of nodes is contiguous.  The term contiguous is used in the sense that within array 306, each of two nodes may touch the other.  More specifically, for a three-dimensional array, two nodes may be considered contiguous if each of the respective X, Y, and Z coordinates of one node differs by no more than one unit from the X, Y, and Z coordinates of a second node"; Col. 8 line 56 to Col. 9 line 3; Col. 9 lines 40-43 “Each node may be configured as a storage location for a byte value”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe, Ge, Xu, and Cuthbert. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. Xu teaches reproducing kernels. Cuthbert teaches the data structure for the matrix (array) elements for storing in the memory. One of ordinary skill would be motivated to combine Watanabe, Ge, Xu, and Cuthbert to determine the optimum combination of pathway length and command representation size, with an overall goal of data compression (Cuthbert Col. 13 lines 4-10).
In regards to claim 9. Watanabe, Ge, Xu, and Cuthbert disclose the neural network processor of claim 8 (as mentioned above) wherein:
Ge further discloses:
“the product quantization provides multiple individual codeword values” (Ge in at least § 4.3 discloses a product quantization (PQ) technique “PQ was introduced as a way of compacting image representations for image retrieval. In this scenario, the local descriptors of an image are first aggregated as a high-dimensional (often thousands of dimensions) vector. The aggregation methods include the Fisher Kernel”; in at least § 2 discloses quantization distortion “A variety of ANN methods, including k-means [14], product quantization [1], and iterative quantization (ITQ) [9], can be formulated within a framework of vector quantization”; in at least § 2.1 discloses vector quantization “A vector quantization system [7] maps a vector $x \in \mathbb{R}^{D}$ to a codeword $c$ in a codebook $C = \{c(i)\}$ with $i$ in a finite index set”).
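For illustration only, the mapping quoted from Ge § 2.1 (a vector x mapped to a codeword c(i) in a codebook C) and the idea of a product quantizer yielding multiple individual codeword values can be sketched as follows. This is a minimal sketch under stated assumptions and not Ge's implementation: the function names `quantize` and `product_quantize` are hypothetical, the nearest codeword is chosen by squared Euclidean distance, and the vector is assumed to be split into equal-length contiguous sub-vectors, one per sub-codebook.

```python
from typing import List, Sequence


def quantize(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index i of the codeword c(i) nearest to x (squared Euclidean distance)."""
    def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))


def product_quantize(
    x: Sequence[float], sub_codebooks: Sequence[Sequence[Sequence[float]]]
) -> List[int]:
    """Split x into equal contiguous sub-vectors and quantize each with its own codebook."""
    m = len(sub_codebooks)
    if m == 0 or len(x) % m != 0:
        raise ValueError("len(x) must be a positive multiple of the number of sub-codebooks")
    step = len(x) // m
    # One codeword index per contiguous sub-vector (assumed equal split).
    return [quantize(x[j * step:(j + 1) * step], sub_codebooks[j]) for j in range(m)]
```

Under these assumptions, a four-dimensional vector with two sub-codebooks is represented by two codeword indices, one for each contiguous half of the vector.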

Watanabe, Ge, and Xu do not explicitly disclose:
“values associated with different but contiguous patterns of multiple kernel weights”. 
However, Cuthbert discloses:
“values associated with different but contiguous patterns of multiple kernel weights” (Cuthbert in at least Col. 2 lines 7-19 “losslessly translate between a first set of bytes and a plurality of pathways in a reproducible array of byte values, and losslessly translate between the plurality of pathways  in the reproducible array of byte values and a second set of bytes”; Col. 8 lines 3-16 “Array 306 may be any suitable array of nodes configured to hold byte values, where a node is a location within the array that is addressable using coordinates. Array 306 may have one or more dimensions. For example, array 306 may be a cube having three dimensions, which will be referred to as X, Y, and Z dimensions. In a three-dimensional array 306, therefore, the location of any given point or node in the array can be described using its X, Y, and Z coordinates. Array 306 may have more or fewer dimensions. In some embodiments, array 306 is four-dimensional. Array 306 may also be described as having a size. In this context, the size of an array may be delineated by the magnitude of each dimension. For example, array 306 may be a three dimensional cube of size 36 by 36 by 36”; Col. 8 lines 17-36 "a certain series of byte values from first set of bytes 302 may have an equivalent series of values located along a pattern consisting of a contiguous pathway 308 of nodes within array … In this context, a contiguous pathway may be meant as a series of nodes wherein any given sequential pair of nodes is contiguous.  The term contiguous is used in the sense that within array 306, each of two nodes may touch the other.  More specifically, for a three-dimensional array, two nodes may be considered contiguous if each of the respective X, Y, and Z coordinates of one node differs by no more than one unit from the X, Y, and Z coordinates of a second node"; Col. 8 line 56 to Col. 9 line 3; Col. 9 lines 40-43 “Each node may be configured as a storage location for a byte value”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Watanabe, Ge, Xu, and Cuthbert. Watanabe teaches a codeword generation system. Ge teaches a quantization system for a large number of codewords using Fisher Kernel. Xu teaches reproducing kernels. Cuthbert teaches the data structure for the matrix (array) elements for storing in the memory. One of ordinary skill would be motivated to combine Watanabe, Ge, Xu, and Cuthbert to determine the optimum combination of pathway length and command representation size, with an overall goal of data compression (Cuthbert Col. 13 lines 4-10).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIRUMALE KRISHNASWAMY RAMESH whose telephone number is (571)272-4605. The examiner can normally be reached by phone.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amir Mehrmanesh, can be reached by phone at 571-270-3351. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TIRUMALE K RAMESH/Examiner, Art Unit 4163
/VIKER A LAMARDO/Primary Examiner, Art Unit 2126