DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/17/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
New corrected drawings in compliance with 37 CFR 1.121(d) are required in this application because elements 302 and 306 of Fig 3 and element 306 of Fig 5 are unreadable due to the values in the black squares being obscured. Applicant is advised to employ the services of a competent patent draftsperson outside the Office, as the U.S. Patent and Trademark Office no longer prepares new drawings. The corrected drawings are required in reply to the Office action to avoid abandonment of the application. The requirement for corrected drawings will not be held in abeyance.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-2 and 4-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea and does not integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
	Regarding independent claim 1, it recites:…generate a plurality of compressed matrix representations each corresponding to one of a plurality of global weight matrices, wherein each of the plurality of compressed matrix representations comprises a centroid index matrix and a centroid table, each element of the centroid index matrix corresponding to an element of the corresponding one of the plurality of global weight matrices and comprising an index into the centroid table, each element of the centroid table comprising a centroid value; and transfer at least one of the plurality of compressed matrix representations…. All of these limitations can be performed in the human mind through the use of observations, evaluations, judgments, and thus, claim 1 recites a mental process and is an abstract idea.
This judicial exception is not integrated into a practical application because it only recites the following additional elements: the parameter server and a plurality of training workers. All of these additional elements are recited at a high level of generality (i.e., as a generic server and distributed computer system performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f).
	Furthermore, claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements of the parameter server and a plurality of training workers amount to no more than a recitation of the words "apply it" (or an equivalent) and/or are no more than mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f).
	Regarding dependent claim 2, it recites:…generating the compressed matrix representations…. All of these limitations can be performed in the human mind through the use of observations, evaluations, judgments, and thus, claim 2 recites a mental process and is an abstract idea.
This judicial exception is not integrated into a practical application because it only recites the following additional element: according to a clustering algorithm. The additional element of using a clustering algorithm recites only the idea of a solution or outcome, i.e., the element fails to recite details of how a solution to a problem is accomplished, with no restriction on how the result is accomplished and no description of the mechanism for accomplishing the result. See MPEP 2106.05(f)(1).
Furthermore, claim 2 does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional element of according to a clustering algorithm amounts to no more than a recitation of the words "apply it" (or an equivalent) and/or is no more than mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f). Thus, the dependent claim is ineligible.
Regarding dependent claim 4, it recites:…generating a plurality of partial sums, each partial sum comprising the sum of the elements of the at least one input matrix that correspond to a common centroid value as indicated by the corresponding elements of the centroid index matrix; generating a set of products by multiplying each partial sum by its corresponding centroid value in the centroid table; and generating an activation result by summing the products of the set of products, the gradient matrices based at least in part on the activation result. All of these limitations deal with mathematical relationships, mathematical formulas or equations, mathematical calculations and thus, claim 4 recites a mathematical concept and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible. 
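For illustration only, the three steps recited in claim 4 (partial sums grouped by centroid index, one product per centroid, and a final sum) can be sketched as follows. This is a minimal sketch, not the applicant's implementation: the function names, the single-row layout, and the sample values are all assumptions introduced here.

```python
def activation_from_compressed(inputs, index_row, centroid_table):
    """Compute a dot product from a compressed weight row (hypothetical helper)."""
    # Step 1: one partial sum per centroid, accumulating the inputs whose
    # weight position points at that centroid.
    partial_sums = [0.0] * len(centroid_table)
    for x, idx in zip(inputs, index_row):
        partial_sums[idx] += x
    # Step 2: multiply each partial sum by its centroid value.
    products = [s * c for s, c in zip(partial_sums, centroid_table)]
    # Step 3: sum the products to obtain the activation result.
    return sum(products)


def activation_direct(inputs, index_row, centroid_table):
    """Reference: decompress each weight, then take the ordinary dot product."""
    return sum(x * centroid_table[idx] for x, idx in zip(inputs, index_row))


# Assumed sample values, chosen only for illustration.
inputs = [1.0, 2.0, 3.0, 4.0]
index_row = [0, 1, 0, 2]
centroid_table = [0.5, -1.0, 2.0]

result = activation_from_compressed(inputs, index_row, centroid_table)
reference = activation_direct(inputs, index_row, centroid_table)
```

Under these assumptions the grouped form needs one multiplication per centroid rather than one per weight, while producing the same value as the decompressed dot product.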
Regarding dependent claim 5, it recites: wherein the activation result is the input of the next layer of the DNN. All of these limitations can be performed in the human mind through the use of observations, evaluations, judgments, and thus, claim 5 recites a mental process and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible.
Regarding dependent claim 6, it recites: wherein the activation result is used to backpropagate a measure of output error of the DNN. All of these limitations can be performed in the human mind through the use of observations, evaluations, judgments, and thus, claim 6 recites a mental process and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible.
Regarding independent claim 7, it recites:… receiving a compressed representation of a weight matrix and an input matrix…generating a plurality of partial sums, each partial sum comprising the sum of input values of the input matrix that correspond to a common weight value of a set of common weight values included in the compressed representation;  generating a set of products based on the plurality of partial sums and the set of common weight values; and generating the activation result by summing the products of the set of products. All of these limitations deal with mathematical relationships, mathematical formulas or equations, mathematical calculations and thus, claim 7 recites a mathematical concept and is an abstract idea.
This judicial exception is not integrated into a practical application because it only recites the following additional element of: the input matrix having input elements that are input values to at least part of the DNN layer. This additional element is insignificant extra-solution activity since the limitation amounts to post-solution activity to train a machine learning model. See MPEP 2106.05(g)(3).
Furthermore, claim 7 does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional element of the input matrix having input elements that are input values to at least part of the DNN layer amounts to no more than data gathering for post-solution activity. See MPEP 2106.05(g)(3).
Regarding dependent claim 8, it recites:…receiving a centroid index matrix and a centroid table, the centroid index matrix comprising a plurality of entries containing centroid index values, each centroid index value comprising an index into the centroid table, and the centroid table comprising a plurality of centroid values that are the common weight values. All of these limitations deal with mathematical relationships, mathematical formulas or equations, mathematical calculations and thus, claim 8 recites a mathematical concept and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible. 
Regarding dependent claim 9, it recites:… generating each partial sum by selecting a centroid index value of the centroid values, and summing the input elements of the input matrix having corresponding entries in the centroid index matrix that contain the selected centroid index value. All of these limitations deal with mathematical relationships, mathematical formulas or equations, mathematical calculations and thus, claim 9 recites a mathematical concept and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible. 
Regarding dependent claim 10, it recites:… multiplying each partial sum of the plurality of partial sums by the centroid value in the centroid table having the centroid index value selected for generation of the partial sum. All of these limitations deal with mathematical relationships, mathematical formulas or equations, mathematical calculations and thus, claim 10 recites a mathematical concept and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible. 
Regarding dependent claim 11, it recites: wherein the activation result is the input of the next layer of the DNN. All of these limitations can be performed in the human mind through the use of observations, evaluations, judgments, and thus, claim 11 recites a mental process and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible.
Regarding dependent claim 12, it recites: wherein the activation result is used to backpropagate a measure of output error of the DNN. All of these limitations can be performed in the human mind through the use of observations, evaluations, judgments, and thus, claim 12 recites a mental process and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible.
Regarding dependent claim 13, it recites: wherein the activation result is used to determine a gradient matrix for the DNN. All of these limitations can be performed in the human mind through the use of observations, evaluations, judgments, and thus, claim 13 recites a mental process and is an abstract idea. None of the limitations integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Thus, the dependent claim is ineligible.
Referring to independent claim 14, it is rejected on the same basis as independent claim 7 since they are analogous claims.
	Referring to dependent claims 15-20, they are rejected on the same basis as dependent claims 8-13 since they are analogous claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over Cho et al. US 2019/0087723 A1 (“Cho”) in view of Han et al., "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding." arXiv preprint arXiv:1510.00149 (2015) (“Han”).
Regarding claim 1, Cho teaches a distributed training system for training a deep neural network ("DNN") including a parameter server and a plurality of training workers configured to iteratively generate global DNN weights until the weights converge, the system comprising: the parameter server configured to: generate a plurality of compressed matrix representations each corresponding to one of a plurality of global weight matrices (Cho, paras. 0107-0112, see also figs. 4, 6, 7, 8, 10A, 10B, 10C, “Suppose that the training is at a stage where weight vector 315 (old [P]) associated with the nodes of the instances of model 311A. Old [P] could be the initial weights at the beginning of the training or the previous weights from a previous iteration in the training…[a]pplication 802A computes a gradient vector corresponding to I0 using opcodes 412 and R 410. Application 802A computes an overall gradient vector from the computed gradient vectors of each worker machine. Application 802A optimizes R 410 to produce R' 610 as in FIG. 6. Using O 412 and R' 610, application 802A computes I' 612 at PS 802. Using I' 612 and R' 610, application 802A computes approximated overall gradient vector G+ 714 as in FIG. 7 at PS 802 [i.e. generate a plurality of compressed matrix representations each corresponding to one of a plurality of global weight matrices].”); and transfer at least one of the plurality of compressed matrix representations to each of a plurality of training workers (Cho, paras. 0107-0112, see also figs. 4, 6, 7, 8, 10A, 10B, 10C, “Application 802A passes G+ 714 to update module 317. Update module 317 uses old [P] 317 with G+ 714 to compute new weight vector new [P] 319 in FIG. 3. Application 802A also transmits I' 612 and R' 610 to worker application 804A in W0 (and worker application 806A in W1 and worker application 808A in W2) [i.e. transfer at least one of the plurality of compressed matrix representations to each of a plurality of training workers]. Worker application 804A computes G+ 714 locally at W0 using I' 612 and R' 610. Worker application 804A updates the weights of model 311A instance in W0 with the locally computed G+ 714. Updated model 311A is now ready for another iteration of the training.”).
However, Cho does not teach: wherein each of the plurality of compressed matrix representations comprises a centroid index matrix and a centroid table, each element of the centroid index matrix corresponding to an element of the corresponding one of the plurality of global weight matrices and comprising an index into the centroid table, each element of the centroid table comprising a centroid value. 
However, Han teaches: wherein each of the plurality of compressed matrix representations comprises a centroid index matrix and a centroid table, each element of the centroid index matrix corresponding to an element of the corresponding one of the plurality of global weight matrices and comprising an index into the centroid table, each element of the centroid table comprising a centroid value(Han, pgs. 3-5, see also fig. 3, Weight sharing is illustrated by fig. 3 which has been reproduced herein: 
	
[Image: Han, fig. 3 (media_image1.png)]

Fig. 3 details a table titled centroids [i.e. centroid table], a matrix titled cluster index [i.e. the centroid index matrix] where each element of the cluster index corresponds to an element of the 4x4 weight matrix and is an index into the centroids’ table, which contains centroid values).1
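For illustration only, the relationship between a centroid table and a centroid index matrix described above can be sketched as follows. This is a minimal sketch: the numeric values are assumptions chosen for the example and are not taken from Han's Fig. 3, and `decompress` is a hypothetical helper name.

```python
# Assumed centroid table: four shared weight values.
centroid_table = [2.0, 1.5, 0.0, -1.0]

# Assumed 4x4 centroid index matrix: each entry is an index into the table
# and stands in for the weight at the same position in the weight matrix.
centroid_index_matrix = [
    [3, 0, 2, 1],
    [1, 1, 0, 3],
    [0, 3, 1, 0],
    [3, 1, 2, 2],
]


def decompress(index_matrix, table):
    """Rebuild the full weight matrix by looking up each index in the table."""
    return [[table[idx] for idx in row] for row in index_matrix]


weights = decompress(centroid_index_matrix, centroid_table)
```

In this sketch the compressed form stores sixteen small indices plus four values in place of sixteen full-precision weights.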
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Cho with the teachings of Han, the motivation being to reduce the memory bandwidth when fetching and/or updating the weights of a deep neural network (DNN) (Han, pg. 2, “Running large neural networks require a lot of memory bandwidth to fetch the weights and a lot of computation to do dot products…[o]ur main insight is that, pruning and trained quantization are able to compress the network without interfering each other, thus lead to surprisingly high compression rate. It makes the required storage so small (a few megabytes) that all weights can be cached on chip instead of going to off-chip DRAM.”).
Regarding claim 2, Cho in view of Han teaches the distributed training system of claim 1, wherein said generating the plurality of compressed matrix representations comprises: generating the compressed matrix representations according to a clustering algorithm (Han, pg. 4, as detailed by equation 2, which has been reproduced herein:

[Image: Han, equation 2 (media_image2.png)]

Which details the within-cluster sum of squares (WCSS) minimization, where the original weights $W = \{w_1, w_2, \ldots, w_n\}$ are partitioned into $k$ clusters $C = \{c_1, c_2, \ldots, c_k\}$, $n \gg k$).
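For illustration only, a clustering of scalar weights that reduces the within-cluster sum of squares can be sketched with a one-dimensional Lloyd's (k-means) iteration. This is a minimal sketch under stated assumptions, not Han's implementation: the function names, the linearly spaced initialization, the stopping rule, and the sample weights are all introduced here for the example.

```python
def kmeans_1d(weights, k, max_iterations=100):
    """Cluster scalar weights into k centroids (hypothetical helper)."""
    lo, hi = min(weights), max(weights)
    # Assumed initialization: centroids spaced linearly between min and max.
    centroids = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each weight goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: (w - centroids[c]) ** 2)
            for w in weights
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so centroids are stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [w for w, a in zip(weights, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignment


def wcss(weights, centroids, assignment):
    """Within-cluster sum of squares for a given clustering."""
    return sum((w - centroids[a]) ** 2 for w, a in zip(weights, assignment))


# Assumed sample weights, chosen only for illustration.
sample_weights = [-1.0, -0.9, 0.0, 0.1, 1.0, 1.1]
centroids, assignment = kmeans_1d(sample_weights, k=3)
objective = wcss(sample_weights, centroids, assignment)
```

Here `centroids` plays the role of a centroid table and `assignment` the role of the per-weight indices into it.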
Regarding claim 3, Cho in view of Han teaches the distributed training system of claim 1, wherein the parameter server is further configured to: provide to each training worker of the plurality of training workers at least one input matrix, each training worker calculating gradient matrices directly from the at least one of the plurality of compressed matrix representations based on the at least one input matrix; receive gradient matrices from each of the plurality of training workers (Cho, paras. 0107-0112, see also figs. 4, 6, 7, 8, 10A, 10B, 10C, “The operations described with respect to W0 apply similarly with respect to W1 and W2. Model 311A in W0 is provided training inputs [i.e. provide to each training worker of the plurality of training workers at least one input matrix]. Application 804A computes or receives the gradients for old [P] weights of model 311A. Application 804A constructs a gradient vector. Application 804A transforms the gradient vector into an ISA vector (I0) using opcodes 412 and R 410 [i.e. each training worker calculating gradient matrices directly from the at least one of the plurality of compressed matrix representations based on the at least one input matrix]. Application 804A transmits I0 to PS 802 [i.e. receive gradient matrices from each of the plurality of training workers].”);
generate updated global weight matrices based at least in part on the received gradient matrices; generate a compressed matrix representation of each updated global weight matrix; and transfer at least one compressed matrix representation of each updated global weight matrix and at least one additional input matrix to each of the plurality of training workers for calculation of gradient matrices thereby (Cho, paras. 0107-0112, see also figs. 4, 6, 7, 8, 10A, 10B, 10C, “Application 802A receives I0 from W0 (and I1 and I2 from W1 and W2, respectively)…[u]sing I' 612 and R' 610, application 802A computes approximated overall gradient vector G+ 714 as in FIG. 7 at PS 802. Application 802A passes G+ 714 to update module 317. Update module 317 uses old [P] 317 with G+ 714 to compute new weight vector new [P] 319 in FIG. 3 [i.e. generate updated global weight matrices based at least in part on the received gradient matrices]. Application 802A also transmits I' 612 and R' 610 to worker application 804A in W0 (and worker application 806A in W1 and worker application 808A in W2). Worker application 804A computes G+ 714 locally at W0 using I' 612 and R' 610. Worker application 804A updates the weights of model 311A instance in W0 with the locally computed G+ 714. Updated model 311A is now ready for another iteration of the training [i.e. generate a compressed matrix representation of each updated global weight matrix and transfer at least one compressed matrix representation of each updated global weight matrix and at least one additional input matrix to each of the plurality of training workers for calculation of gradient matrices thereby].”).

Claims 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over Cho et al. US 2019/0087723 A1 (“Cho”) in view of Han et al., "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding." arXiv preprint arXiv:1510.00149 (2015) (“Han”) and further in view of Chen, Wenlin, et al. "Compressing neural networks with the hashing trick." International conference on machine learning. PMLR, (2015) (“Chen”).
Regarding claim 4, Cho in view of Han teaches the distributed training system of claim 3 but does not teach, wherein calculating gradient matrices directly from the at least one of the plurality of compressed matrix representations based on the at least one input matrix comprises: generating a plurality of partial sums, each partial sum comprising the sum of the elements of the at least one input matrix that correspond to a common centroid value as indicated by the corresponding elements of the centroid index matrix; generating a set of products by multiplying each partial sum by its corresponding centroid value in the centroid table; and generating an activation result by summing the products of the set of products, the gradient matrices based at least in part on the activation result.  
However, Chen teaches: wherein calculating gradient matrices directly from the at least one of the plurality of compressed matrix representations based on the at least one input matrix comprises: generating a plurality of partial sums, each partial sum comprising the sum of the elements of the at least one input matrix that correspond to a common centroid value as indicated by the corresponding elements of the centroid index matrix (Chen, pgs. 3-4, right column, see also fig. 1, “We will denote the input activation as $a = a^{l} \in \mathbb{R}^{m}$…[b]oth w and $\phi_i(a)$ are K-dimensional, where K is the number of hash buckets in this layer. The hash mapping function $\phi_i$ is defined as follows. The $k^{th}$ element of $\phi_i(a)$, i.e. $[\phi_i(a)]_k$, is the sum of variables hashed into bucket k: $[\phi_i(a)]_k = \sum_{j : h(i, j) = k} a_j$.” Chen teaches: $\sum_{j : h(i, j) = k} a_j$ [i.e. generating a plurality of partial sums, each partial sum comprising the sum of the elements of the at least one input matrix that correspond to a common centroid value as indicated by the corresponding elements of the centroid index matrix]);
generating a set of products by multiplying each partial sum by its corresponding centroid value in the centroid table; and generating an activation result by summing the products of the set of products, the gradient matrices based at least in part on the activation result (Chen, pg. 4, left-column, see also fig. 1, “$z_i = \sum_{k=1}^{K} w_k [\phi_i(a)]_k = \sum_{k=1}^{K} w_k \sum_{j:h(i,j)=k} a_j$ … [i]f we substitute Eq. (7) into the error term we obtain: $\delta_j^l = \left( \sum_{i=1}^{n^{l+1}} \xi^l(i,j)\, w^l_{h^l(i,j)}\, \delta_i^{l+1} \right) f'(z_j^l)$ … $\frac{\partial L}{\partial V_{ij}^l} = a_j^l \delta_i^{l+1}$ ….” Chen teaches: $\sum_{k=1}^{K} w_k \sum_{j:h(i,j)=k} a_j$ [i.e. generating a set of products by multiplying each partial sum by its corresponding centroid value in the centroid table and generating an activation result by summing the products of the set of products] $\frac{\partial L}{\partial V_{ij}^l} = a_j^l \delta_i^{l+1}$ [i.e. the gradient matrices based at least in part on the activation result]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Han in view of Chen. The motivation to do so would be to use a hashing function to compress neural networks for the beneficial mathematical properties that hashing provides (Chen, pg. 2, right-column, “In addition to memory savings, the hashing trick has the appealing property of being sparsity preserving, fast to compute and storage-free. The most important property of the hashing trick is, arguably, its (approximate) preservation of inner product operations… the hashing trick can be used to learn multiple classifiers within the same hashed space.”).
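For illustration only, the equivalence relied upon in the cited passage of Chen, $z_i = \sum_{j} V_{ij} a_j = \sum_{k=1}^{K} w_k \sum_{j:h(i,j)=k} a_j$, can be sketched numerically as follows. This is a minimal sketch and not code from Chen, Han, or the application: the function name, the array shapes, and the values in `index_matrix`, `centroid_table`, and `a` are hypothetical, and the sign factor $\xi(i,j)$ that Chen uses elsewhere is omitted.

```python
import numpy as np

def activation_from_partial_sums(index_matrix, centroid_table, a):
    """Compute z_i = sum_k w_k * sum_{j: h(i,j)=k} a_j for every output i."""
    n, _ = index_matrix.shape
    K = centroid_table.shape[0]
    z = np.zeros(n)
    for i in range(n):
        # Partial sums: add up the inputs a_j whose index h(i, j) equals k.
        partial = np.bincount(index_matrix[i], weights=a, minlength=K)
        # One multiplication per shared value w_k, then sum the products.
        z[i] = np.dot(centroid_table, partial)
    return z

# Hypothetical example data (assumptions, not taken from any reference).
rng = np.random.default_rng(0)
index_matrix = rng.integers(0, 4, size=(3, 5))      # h(i, j) in {0, 1, 2, 3}
centroid_table = np.array([-1.0, -0.5, 0.5, 2.0])   # shared values w_k
a = rng.normal(size=5)                              # input activations a_j

z_grouped = activation_from_partial_sums(index_matrix, centroid_table, a)
# Dense reference: expand V_ij = w_{h(i,j)} and multiply directly.
z_dense = centroid_table[index_matrix] @ a
```

Under these assumptions the grouped form performs one multiplication per shared value (K per output) instead of one per input element, and both forms yield the same activation.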
Regarding claim 5, Cho in view of Han and in view of Chen teaches the distributed training system of claim 4, wherein the activation result is the input of the next layer of the DNN (Chen, pgs. 3-4, see also fig. 1, “This section focuses on a single layer throughout and to simplify notation we will drop the super-scripts $l$. We will denote the input activation as $a = a^l \in \mathbb{R}^m$ of dimensionality $m = n^l$. We denote the output as $z = z^{l+1} \in \mathbb{R}^n$ with dimensionality $n = n^{l+1}$. To facilitate weight sharing within a feed forward neural network, we can simply substitute Eq. (3) into Eq. (2): $z_i = \sum_{j=1}^{m} V_{ij} a_j$” Chen teaches: $z_i = \sum_{j=1}^{m} V_{ij} a_j$ [i.e. wherein the activation result is the input of the next layer of the DNN]).  
Regarding claim 6, Cho in view of Han and in view of Chen teaches the distributed training system of claim 4, wherein the activation result is used to backpropagate a measure of output error of the DNN (Chen, pg. 4, “Let L denote the loss function for training the neural network, e.g. cross entropy or the quadratic loss…[f]urther, let $\delta_j^l$ denote the gradient of L over activation j in layer l, also known as the error term… [i]f we substitute Eq. (7) into the error term we obtain: $\delta_j^l = \left( \sum_{i=1}^{n^{l+1}} \xi^l(i,j)\, w^l_{h^l(i,j)}\, \delta_i^{l+1} \right) f'(z_j^l)$”).  

Claims 7-20 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al., "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding." arXiv preprint arXiv:1510.00149 (2015) (“Han”) in view of Chen, Wenlin, et al. "Compressing neural networks with the hashing trick." International conference on machine learning. PMLR, (2015) (“Chen”).
Regarding claim 7, Han teaches a method for generating an activation result for at least part of a deep neural network ("DNN") layer, comprising: receiving a compressed representation of a weight matrix and an input matrix, the input matrix having input elements that are input values to at least part of the DNN layer (Han, pg. 3, see also fig. 3, “Weight sharing is illustrated in Figure 3. Suppose we have a layer that has 4 input neurons and 4 output neurons…[t]he weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights… [f]or pruned AlexNet, we are able to quantize to 8-bits (256 shared weights) for each CONV layers, and 5-bits (32 shared weights) for each FC layer without any loss of accuracy.” Han teaches: the weights are quantized to 4 bins (denoted with 4 colors), all the weights in the same bin share the same value, thus for each weight, we then need to store only a small index into a table of shared weights [i.e. receiving a compressed representation of a weight matrix] suppose we have a layer that has 4 input neurons [i.e. and an input matrix, the input matrix having input elements that are input values to at least part of the DNN layer]).
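For illustration only, the index-plus-table representation described in the quoted passage of Han (4 input neurons, 4 output neurons, 4 bins) can be sketched as follows. This is a minimal sketch, not code from Han: the numerical values in `centroid_table` and `index_matrix` are hypothetical and are not the values shown in Han's Figure 3.

```python
import numpy as np

# Hypothetical table of 4 shared weights (one value per bin).
centroid_table = np.array([2.0, 1.5, 0.0, -1.0])

# Hypothetical 4x4 index matrix: each entry selects one bin for one weight.
index_matrix = np.array([
    [0, 1, 2, 3],
    [3, 2, 1, 0],
    [0, 0, 3, 3],
    [1, 2, 2, 1],
])

# Look up the shared value for every weight position to recover the
# quantized weight matrix from the compressed representation.
weights = centroid_table[index_matrix]

# Each index needs only ceil(log2(number of bins)) bits of storage.
bits_per_index = int(np.ceil(np.log2(len(centroid_table))))
```

With 4 bins each stored index needs 2 bits, which is consistent with Han's statement that only a small index into the table of shared weights is stored per weight.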
Han does not teach: generating a plurality of partial sums, each partial sum comprising the sum of input values of the input matrix that correspond to a common weight value of a set of common weight values included in the compressed representation;  generating a set of products based on the plurality of partial sums and the set of common weight values; and generating the activation result by summing the products of the set of products.
However, Chen teaches: generating a plurality of partial sums, each partial sum comprising the sum of input values of the input matrix that correspond to a common weight value of a set of common weight values included in the compressed representation (Chen, pgs. 3-4, right column, see also fig. 1, “We will denote the input activation as $a = a^{l} \in \mathbb{R}^{m}$…[b]oth w and $\phi_i(a)$ are K-dimensional, where K is the number of hash buckets in this layer. The hash mapping function $\phi_i$ is defined as follows. The $k^{th}$ element of $\phi_i(a)$, i.e. $[\phi_i(a)]_k$, is the sum of variables hashed into bucket k: $[\phi_i(a)]_k = \sum_{j:h(i,j)=k} a_j$.”);
generating a set of products based on the plurality of partial sums and the set of common weight values; and generating the activation result by summing the products of the set of products (Chen, pg. 4, left-column, see also fig. 1, “$z_i = \sum_{k=1}^{K} w_k [\phi_i(a)]_k = \sum_{k=1}^{K} w_k \sum_{j:h(i,j)=k} a_j$”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Han in view of Chen. The motivation to do so would be to use a hashing function to compress neural networks for the beneficial mathematical properties that hashing provides (Chen, pg. 2, right-column, “In addition to memory savings, the hashing trick has the appealing property of being sparsity preserving, fast to compute and storage-free. The most important property of the hashing trick is, arguably, its (approximate) preservation of inner product operations… the hashing trick can be used to learn multiple classifiers within the same hashed space.”).
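For illustration only, the regrouping of a weighted sum into per-bucket partial sums can be checked on a small hypothetical instance. The values below are assumptions chosen for this example and do not appear in Han or Chen: $m = 4$ inputs, $K = 2$ buckets, and a bucket assignment $h(i,\cdot)$ for a single output $i$.

```latex
\begin{align*}
a &= (1,\ 2,\ 3,\ 4), \qquad w = (0.5,\ -1), \qquad h(i,\cdot) = (1,\ 2,\ 1,\ 2) \\
[\phi_i(a)]_1 &= a_1 + a_3 = 4, \qquad [\phi_i(a)]_2 = a_2 + a_4 = 6 \\
z_i &= w_1 [\phi_i(a)]_1 + w_2 [\phi_i(a)]_2 = 0.5 \cdot 4 + (-1) \cdot 6 = -4 \\
\sum_{j=1}^{4} w_{h(i,j)}\, a_j &= 0.5 \cdot 1 - 1 \cdot 2 + 0.5 \cdot 3 - 1 \cdot 4 = -4
\end{align*}
```

Both routes give the same value: the grouped form uses one multiplication per bucket, while the ungrouped form uses one per input.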
Regarding claim 8, Han in view of Chen teaches the method of claim 7, wherein said receiving a compressed representation of a weight matrix and an input matrix comprises: receiving a centroid index matrix and a centroid table, the centroid index matrix comprising a plurality of entries containing centroid index values, each centroid index value comprising an index into the centroid table, and the centroid table comprising a plurality of centroid values that are the common weight values(Han, pgs. 3-5, see also fig. 3, Weight sharing is illustrated by fig. 3 which has been reproduced herein: 
	
[Figure: Han, Fig. 3 (weight sharing), reproduced as media_image1.png, greyscale]

Fig. 3 details a table titled centroids [i.e. centroid table], a matrix titled cluster index [i.e. the centroid index matrix] where each element of the cluster index corresponds to an element into the centroids’ table, which contains centroid values).2  
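As a minimal sketch of the relationship between a centroid table and a centroid index matrix described above, the following Python fragment rebuilds a dense weight matrix by table lookup. The array values and the names `centroid_table`, `centroid_index`, and `decompress` are hypothetical and chosen for illustration; they are not taken from Han or from the application.

```python
import numpy as np

# Hypothetical 4x4 example; the values are illustrative only.
centroid_table = np.array([-1.0, 0.0, 1.5, 2.0])  # K = 4 shared weight values
centroid_index = np.array([[3, 0, 2, 1],
                           [1, 1, 0, 3],
                           [0, 3, 1, 0],
                           [3, 1, 2, 2]])          # each entry indexes the table


def decompress(index_matrix, table):
    """Rebuild the dense weight matrix by looking up each index in the table."""
    return table[index_matrix]


weights = decompress(centroid_index, centroid_table)
```

Under these assumptions, only the small integer indices and the K table entries need to be stored; the dense matrix is recoverable on demand.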
Regarding claim 9, Han in view of Chen teaches the method of claim 8, wherein said generating a plurality of partial sums comprises: generating each partial sum by selecting a centroid index value of the centroid values, and summing the input elements of the input matrix having corresponding entries in the centroid index matrix that contain the selected centroid index value (Chen, pgs. 3-4, right column, see also fig. 1, “We will denote the input activation as $a = a^{l} \in \mathbb{R}^{m}$…[b]oth w and $\phi_i(a)$ are K-dimensional, where K is the number of hash buckets in this layer. The hash mapping function $\phi_i$ is defined as follows. The $k^{th}$ element of $\phi_i(a)$, i.e. $[\phi_i(a)]_k$, is the sum of variables hashed into bucket k: $[\phi_i(a)]_k = \sum_{j:h(i,j)=k} a_j$.”).
Regarding claim 10, Han in view of Chen teaches the method of claim 9, wherein said generating a set of products based on the plurality of partial sums and the set of common weight values comprises: multiplying each partial sum of the plurality of partial sums by the centroid value in the centroid table having the centroid index value selected for generation of the partial sum (Chen, pg. 4, left-column, see also fig. 1, “$z_i = \sum_{k=1}^{K} w_k [\phi_i(a)]_k = \sum_{k=1}^{K} w_k \sum_{j:h(i,j)=k} a_j$”).
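The steps recited in claims 9 and 10 can be sketched as follows, assuming the compressed representation is a centroid index matrix plus a centroid table. This is an illustrative sketch, not the applicant's or either reference's implementation; the function name `activation_from_compressed` and its parameters are hypothetical.

```python
import numpy as np


def activation_from_compressed(index_matrix, table, x):
    """Compute W @ x without materializing the dense weight matrix W.

    index_matrix: (n_out, n_in) non-negative ints indexing into `table`
    table:        (K,) centroid (shared weight) values
    x:            (n_in,) input values
    """
    k = table.shape[0]
    out = np.empty(index_matrix.shape[0], dtype=float)
    for i, row in enumerate(index_matrix):
        # Partial sums: add up the inputs whose weights share a centroid index.
        partial = np.bincount(row, weights=x, minlength=k)
        # Multiply each partial sum by its centroid value, then sum the products.
        out[i] = np.dot(table, partial)
    return out
```

In this sketch each output element needs at most K multiplications rather than one per input element, which is the saving the grouped form of the sum provides.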
Regarding claim 11, Han in view of Chen teaches the method of claim 7, wherein the activation result is the input of the next layer of the DNN (Chen, pgs. 3-4, see also fig. 1, “This section focuses on a single layer throughout and to simplify notation we will drop the super-scripts $l$. We will denote the input activation as $a = a^{l} \in \mathbb{R}^{m}$ of dimensionality $m = n^{l}$. We denote the output as $z = z^{l+1} \in \mathbb{R}^{n}$ with dimensionality $n = n^{l+1}$. To facilitate weight sharing within a feed forward neural network, we can simply substitute Eq. (3) into Eq. (2): $z_i = \sum_{j=1}^{m} V_{ij} a_j$”).
Regarding claim 12, Han in view of Chen teaches the method of claim 7, wherein the activation result is used to backpropagate a measure of output error of the DNN (Chen, pg. 4, “Let L denote the loss function for training the neural network, e.g. cross entropy or the quadratic loss…[f]urther, let $\delta_j^{l}$ denote the gradient of L over activation j in layer l, also known as the error term… [i]f we substitute Eq. (7) into the error term we obtain:                         
                            
                                
                                    δ
                                
                                
                                    j
                                
                                
                                    l
                                
                            
= \left( \sum_{i=1}^{n^{l+1}} \xi^{l}(i,j)\, w^{l}_{h^{l}(i,j)}\, \delta_i^{l+1} \right) f'(z_j^l)”).
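The summation quoted from Chen above can be illustrated with a minimal sketch in plain Python. It assumes the hash function h^l is precomputed as a lookup table `h`, and that the sign factor ξ^l is stored as a table `xi`. The function name, argument names, and all toy values are hypothetical and do not appear in Chen or in the application.

```python
def hashed_layer_delta(w, h, xi, delta_next, z, f_prime):
    """Error term of layer l for a hashed layer (illustrative sketch only).

    delta[j] = (sum_i xi[i][j] * w[h[i][j]] * delta_next[i]) * f_prime(z[j])
    """
    n_out = len(delta_next)  # n^{l+1}: number of units in layer l+1
    n_in = len(z)            # number of units in layer l
    delta = []
    for j in range(n_in):
        # The virtual weight for connection (i, j) is xi[i][j] * w[h[i][j]].
        s = sum(xi[i][j] * w[h[i][j]] * delta_next[i] for i in range(n_out))
        delta.append(s * f_prime(z[j]))
    return delta


# Toy values, all hypothetical.
w = [0.5, -1.0]                  # shared weights (two hash buckets)
h = [[0, 1, 0], [1, 1, 0]]       # assumed precomputed hash table h[i][j]
xi = [[1, -1, 1], [1, 1, -1]]    # assumed sign table xi[i][j]
delta_next = [2.0, 3.0]          # error terms of layer l+1
z = [0.1, -0.2, 0.3]             # pre-activations of layer l


def relu_prime(v):
    # Derivative of ReLU, used here only as an example activation.
    return 1.0 if v > 0 else 0.0


delta = hashed_layer_delta(w, h, xi, delta_next, z, relu_prime)
```

The sketch only shows that every connection (i, j) reads its weight from a shared bucket rather than from a dense weight matrix. It is not offered as Chen's implementation.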
Regarding claim 13, Han in view of Chen teaches the method of claim 7, wherein the activation result is used to determine a gradient matrix for the DNN (Chen, pgs. 4-5, “To compute the gradient of $L$ with respect to a weight $w_k^l$ we need the two gradients… [c]ombining these two, we obtain $\frac{\partial L}{\partial w_k^l} = \sum_{i,j} \frac{\partial L}{\partial V_{ij}^l} \frac{\partial V_{ij}^l}{\partial w_k^l} = \sum_{i=1}^{n^{l+1}} \sum_{j} a_j^l \, \delta_i^{l+1} \, \xi^l(i,j) \, \delta_{h^l(i,j)=k}$”).
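The gradient expression quoted from Chen above can be read as an accumulation: each connection (i, j) adds its contribution to the shared weight that its hash value selects. The following is a minimal sketch in plain Python under that reading. It assumes the hash function and the sign factor are available as precomputed tables `h` and `xi`. The function name, argument names, and toy values are hypothetical.

```python
def hashed_weight_gradient(a, delta_next, h, xi, num_buckets):
    """Gradient of the loss with respect to each shared weight w_k (sketch).

    grad[k] = sum_i sum_j a[j] * delta_next[i] * xi[i][j] * [h[i][j] == k]
    """
    grad = [0.0] * num_buckets
    for i in range(len(delta_next)):
        for j in range(len(a)):
            # Indexing by h[i][j] plays the role of the indicator term:
            # only the bucket k equal to h[i][j] receives this contribution.
            grad[h[i][j]] += a[j] * delta_next[i] * xi[i][j]
    return grad


# Toy values, all hypothetical.
a = [1.0, 2.0, 3.0]              # activations of layer l
delta_next = [2.0, 3.0]          # error terms of layer l+1
h = [[0, 1, 0], [1, 1, 0]]       # assumed precomputed hash table h[i][j]
xi = [[1, -1, 1], [1, 1, -1]]    # assumed sign table xi[i][j]

grad = hashed_weight_gradient(a, delta_next, h, xi, num_buckets=2)
```

With these toy values, bucket 0 receives contributions from connections (0,0), (0,2), and (1,2), and bucket 1 receives the remaining three. The gradient vector therefore has one entry per shared weight rather than one per connection.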
Referring to independent claim 14, it is rejected on the same basis as independent claim 7 since they are analogous claims.
Referring to dependent claims 15-20, they are rejected on the same basis as dependent claims 8-13 since they are analogous claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 11196800 B2 (details systems and methods for communication-efficient distributed mean estimation using variable-length encoding).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

        1 Examiner takes notice that fig. 3 of Han contains the exact same figure and values found in fig. 3 of Applicant’s black and white drawings submitted on 09/26/2019.
        2 Examiner takes notice that fig. 3 of Han contains the exact same figure and values found in fig. 3 of Applicant’s black and white drawings submitted on 09/26/2019.