Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the application filed 4/16/2020. Claims 21-40 are pending. Priority date: 9/26/2016.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21, 31 and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Dean et al. (“Large Scale Distributed Deep Networks”, NIPS, 2012, pages: 9) (cited in IDS), hereinafter Dean, in view of Alistarh et al. (US 20180075347) (cited in IDS), hereinafter Alistarh.

21. A client computing device, comprising: 
at least one processor; and at least one non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform operations (Dean: e.g., page 4, Fig 2, where a server is an example of a processor with storage and a model replica is an example of a client), the operations comprising: 
obtaining global values for a set of parameters of a machine-learned model (Dean: e.g., page 4, sec 3.1, par 3, obtaining, by a model replica, a copy of model parameters from a parameter server, where a model replica is an example of a client computing device); 
training the machine-learned model based at least in part on a local dataset to obtain an update matrix that is descriptive of updated values for the set of parameters of the machine-learned model (Dean: e.g., page 4, sec 3.1, par 3, computing a parameter gradient of a model based on a batch of data locally by a model replica, where Figs 1-2, the gradients computed over nodes of neural networks are an example of an update matrix that is descriptive of updated parameter values of the model), 
wherein the local dataset is stored locally by the client computing device (Dean: e.g., Fig 2 (left), a data shard attached to the model replica interprets the local dataset is stored locally by the client computing device); 

communicating the update to a server computing device (Dean: e.g., page 4, sec 3.1, par 2-3, sending, by the model replica, the gradient update to the parameter server interprets communicating updates to a server computing device).
Dean does not expressly disclose, but Alistarh discloses, “encoding the update matrix to obtain an encoded update” and “encoded update” in “communicating the encoded update to a server computing device” (Alistarh: e.g., [0017], [0071], compressing the gradients (or weights), encoding the gradients or by setting gradients into quantization levels in consideration of their magnitudes interprets encoding the update matrix to obtain an encoded update, where quantizing is an example of encoding, where [0040], iteratively computing gradient for backpropagation by individual nodes of neural networks is an example of update matrix, and where [0017], [0054], the quantized gradients are an example of encoded updates). Nonetheless, quantization in data encoding is well known. It would have been obvious for one of ordinary skill in the art, having Dean and Alistarh before the effective filing date, to combine Alistarh with Dean in order to improve the communication efficiency, an objective of Dean.
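For illustration only, the client-side sequence recited in claim 21 (obtain global values, train locally to obtain an update matrix, encode the update, communicate the encoded update) can be sketched as follows. The sketch is not taken from Dean or Alistarh. The function names (local_update, encode_sign, decode_sign, client_round), the single gradient step, and the one-bit sign encoding with a shared scale are all assumptions chosen for brevity.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def local_update(global_w: Matrix, grad: Matrix, lr: float) -> Matrix:
    """Take one gradient step locally and return (local values - global values)."""
    local_w = [[w - lr * g for w, g in zip(w_row, g_row)]
               for w_row, g_row in zip(global_w, grad)]
    return [[lw - w for lw, w in zip(l_row, w_row)]
            for l_row, w_row in zip(local_w, global_w)]


def encode_sign(update: Matrix) -> Dict[str, object]:
    """Hypothetical encoder: one sign bit per entry plus one shared magnitude."""
    mags = [abs(v) for row in update for v in row]
    scale = sum(mags) / len(mags) if mags else 0.0
    bits = [[1 if v >= 0 else 0 for v in row] for row in update]
    return {"scale": scale, "bits": bits}


def decode_sign(encoded: Dict[str, object]) -> Matrix:
    """Server-side reconstruction of an approximate update matrix."""
    scale = encoded["scale"]
    return [[scale if b else -scale for b in row] for row in encoded["bits"]]


def client_round(global_w: Matrix, grad: Matrix, lr: float,
                 send: Callable[[Dict[str, object]], None]) -> Matrix:
    """Compute the update, encode it, and hand the encoded update to `send`."""
    update = local_update(global_w, grad, lr)
    send(encode_sign(update))
    return update
```

In this sketch only the sign bits and one scale value leave the client, which is the kind of reduction in communicated data that the rationale above relies on.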

31. The claim is substantially the same as claim 21 and is therefore rejected for the same reason. In addition, Alistarh discloses a non-transitory computer-readable medium (e.g., [0065], non-transitory medium).

32. The claim is substantially the same as claim 21 and is therefore rejected for the same reason.

Claims 22, 23, 33 and 34 are rejected under 35 U.S.C. 103 as being unpatentable over Dean, in view of Alistarh, further in view of Simard et al. (US 7016529), hereinafter Simard.

22 and 33, of claim 21 and claim 32, respectively, wherein the combination of Dean and Alistarh does not expressly disclose, but Simard discloses, encoding the update matrix comprises subsampling the update matrix to obtain the encoded update (Simard: e.g., col 15, lines 39-41, “the feature map sub-sampling every other position” interprets subsampling the parameters). Nonetheless, subsampling is one of the basic functions of a deep neural network. Dean teaches complex interconnected and distributed neural networks. It would have been obvious for one of ordinary skill in the art, having Simard before the effective filing date, to combine Simard with Dean in view of Alistarh to improve the server and client communication by reducing the message quantities of Dean.

23 and 34, of claim 22 and claim 33, respectively, wherein subsampling the update matrix comprises: generating a parameter mask that specifies a portion of the set of parameters to be sampled; and subsampling the update matrix according to the parameter mask (Simard: e.g., col 15, lines 39-41, “the feature map sub-sampling every other position of at least a portion of the output pattern with the weighted set of trainable parameters”, where a portion of the output interprets the parameter mask).  
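As a hedged illustration of the limitation discussed for claims 23 and 34, the following sketch generates a parameter mask and subsamples an update matrix according to it. It is not drawn from Simard. The names make_mask, subsample and expand, the random mask, and the idea that the receiver regenerates the mask from a shared seed are assumptions made only for this example.

```python
import random
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def make_mask(rows: int, cols: int, keep_fraction: float, seed: int) -> Mask:
    """Mark each entry as sampled with probability keep_fraction.

    A fixed seed makes the mask reproducible, so a receiver that knows the
    seed (an assumption of this sketch) can rebuild the same mask.
    """
    rng = random.Random(seed)
    return [[rng.random() < keep_fraction for _ in range(cols)]
            for _ in range(rows)]


def subsample(update: Matrix, mask: Mask) -> List[float]:
    """Keep only the entries selected by the mask, in row-major order."""
    return [v for row, m_row in zip(update, mask)
            for v, keep in zip(row, m_row) if keep]


def expand(values: List[float], mask: Mask) -> Matrix:
    """Place the kept values back at their masked positions; zeros elsewhere."""
    it = iter(values)
    return [[next(it) if keep else 0.0 for keep in m_row] for m_row in mask]
```

Only the kept values (and, under the stated assumption, the seed) would need to be communicated, rather than the full matrix.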

Claims 25, 26, 36 and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Dean, in view of Alistarh, further in view of Courbariaux et al. (“BinaryConnect: Training Deep Neural Networks with binary weights during propagations”, NIPS, 2015, pages: 9), hereinafter Courbariaux.

25 and 36, of claim 21 and claim 32, respectively, wherein the combination of Dean and Alistarh does not expressly disclose, but Courbariaux discloses, encoding the update matrix comprises probabilistically quantizing one or more values included in the update matrix (Courbariaux: e.g., page 3, EQ (2), assign the maximum or minimum to a weight value based on a probability of a functional output of the weight value interprets probabilistically quantizing one or more values included in the update matrix). Nonetheless, limiting the dynamic range is common in data compression. Dean teaches complex interconnected and distributed neural networks. Binary quantization expedites the derivation of the gradient elements. It would have been obvious for one of ordinary skill in the art, having Courbariaux before the effective filing date, to combine Courbariaux with the extended Dean to improve the computation and communication speed for the distributed neural networks of Dean.

26 and 37, of claim 21 and claim 32, respectively, wherein the combination of Dean and Alistarh does not expressly disclose, but Courbariaux discloses, encoding the update matrix comprises performing probabilistic binary quantization for one or more values included in the update matrix to change each of the one or more values to a maximum value included in the update matrix or a minimum value included in the update matrix (Courbariaux: e.g., page 3, EQ (2), assign the maximum or minimum to a weight value based on a probability of a functional output of the weight value). Nonetheless, limiting the dynamic range is common in data compression. Dean teaches complex interconnected and distributed neural networks. Binary quantization expedites the derivation of the gradient elements. It would have been obvious for one of ordinary skill in the art, having Courbariaux before the effective filing date, to combine Courbariaux with the extended Dean to improve the computation and communication speed for the distributed neural networks of Dean.
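To make the claim language of claims 26 and 37 concrete, a minimal sketch of probabilistic binary quantization to the maximum or minimum value is given below. It is one possible reading of the limitation, not the formula of Courbariaux EQ (2). The function name binary_quantize and the linear choice of probability (which makes the expected quantized value equal the original value) are assumptions of this sketch.

```python
import random
from typing import List


def binary_quantize(values: List[float], rng: random.Random) -> List[float]:
    """Replace each value by the overall max or min, chosen at random.

    A value v becomes the max with probability (v - min) / (max - min),
    otherwise the min, so the expected result equals v.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; nothing to quantize.
        return list(values)
    out = []
    for v in values:
        p_hi = (v - lo) / (hi - lo)
        out.append(hi if rng.random() < p_hi else lo)
    return out
```

After quantization each entry can be represented by a single bit, together with the two values max and min.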

Claims 27 and 38 are rejected under 35 U.S.C. 103 as being unpatentable over Dean, in view of Alistarh, further in view of Li et al. (“Ternary Weight Networks”, May, 2016, https://arxiv.org/pdf/1605.04711v1.pdf , pages: 9), hereinafter Li.

27 and 38, of claim 21 and claim 32, respectively, wherein the combination of Dean and Alistarh does not expressly disclose, but Li discloses, encoding the update matrix comprises: defining a plurality of intervals between a maximum value included in the update matrix and a minimum value included in the update matrix (Li: e.g., page 3-4, EQ (6), defining the intervals between the maximum, 1, and the minimum, -1); and probabilistically changing each of one or more values included in the update matrix to a local interval maximum or a local interval minimum (Li: e.g., page 3-4, EQ (6), EQ (9), probabilistically changing the weight value to one of the ternary values based on EQ (9) according to the probability distribution of W). Nonetheless, the technique of quantization in data encoding is well known. Dean teaches computation and communication aspects of complex interconnected and distributed neural networks. It would have been obvious for one of ordinary skill in the art, having Li before the effective filing date, to combine Li with the extended Dean to improve the computation and communication speed while maintaining the accuracy for the distributed neural networks of Dean.
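The two steps recited in claims 27 and 38 (defining intervals between the maximum and minimum, then probabilistically moving each value to an endpoint of its interval) can be sketched as follows. This is an illustrative reading only and does not reproduce Li EQ (6) or EQ (9). The function name interval_quantize, the use of equal-width intervals, and the linear probability are assumptions of this sketch.

```python
import random
from typing import List


def interval_quantize(values: List[float], num_intervals: int,
                      rng: random.Random) -> List[float]:
    """Round each value at random to an endpoint of its enclosing interval.

    The range [min, max] is split into num_intervals equal-width intervals.
    A value moves to the upper endpoint with probability proportional to
    its distance from the lower endpoint, otherwise to the lower endpoint.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    width = (hi - lo) / num_intervals
    out = []
    for v in values:
        # Index of the interval containing v; the max falls in the last one.
        idx = min(int((v - lo) / width), num_intervals - 1)
        left = lo + idx * width
        right = hi if idx == num_intervals - 1 else lo + (idx + 1) * width
        p_right = (v - left) / (right - left)
        out.append(right if rng.random() < p_right else left)
    return out
```

With num_intervals set to 1 the sketch reduces to binary quantization between the overall minimum and maximum.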

Claims 28 and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Dean, in view of Alistarh, further in view of Garimella (US 9400955), hereinafter Garimella.

28 and 39, of claim 21 and claim 32, respectively, wherein the combination of Dean and Alistarh does not expressly disclose, but Garimella discloses, encoding the update matrix comprises multiplying a vector of the update matrix by a rotation matrix to obtain a rotated update (Garimella: e.g., Abstract, utilizing a random rotation matrix to reduce dynamic range of low-rank matrices interprets multiplying a vector of the update matrix by a rotation matrix to obtain a rotated update). Nonetheless, limiting the dynamic range is common in data compression. Reduction of data dynamic range benefits the calculation of the parameters, including weight and gradient elements, in the complex interconnected and distributed neural networks. Reduction with rotation matrix further expedites the encoding process. It would have been obvious for one of ordinary skill in the art, having Garimella before the effective filing date, to combine Garimella with the extended Dean to improve the efficiency of computation and communication in determining values of parameters for the large scale distributed neural networks of Dean.
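As a small illustration of how multiplying by a rotation matrix can reduce dynamic range, the sketch below rotates a two-element vector and then undoes the rotation with the transpose. It is not Garimella's method. The fixed 2x2 rotation and the names rotation_2d, matvec and transpose are assumptions for this example; a practical system would likely use a larger random rotation.

```python
import math
from typing import List

Matrix = List[List[float]]


def rotation_2d(theta: float) -> Matrix:
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    """Transpose; for a rotation matrix this is also its inverse."""
    return [list(col) for col in zip(*m)]


# The vector [1, 0] has a spread (max - min) of 1. Rotating it by 45 degrees
# gives two nearly equal entries, so a later quantization step has a much
# smaller range to cover.
rot = rotation_2d(math.pi / 4)
rotated = matvec(rot, [1.0, 0.0])
restored = matvec(transpose(rot), rotated)
```

Because a rotation preserves vector length, the receiver can apply the transpose to recover the original vector up to floating-point error.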

Claim Objections
Claims 24, 29-30, 35 and 40 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. E.g., Nedic et al. teach computing distributed sub-gradients operating over a time-varying topology under constraints that multiple distributed processing elements store and communicate quantized information.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LiWu Chang whose telephone number is (571)270-3809, email: li-wu.chang@uspto.gov. The examiner can normally be reached M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda M Huang, can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LI WU CHANG/
Primary Examiner, Art Unit 2124
September 20, 2022