DETAILED ACTION
This action is in response to the Applicant Response filed 02 December 2021 for application 16/204,770 filed 29 November 2018.
Claims 1-5, 7, 9-12, 15-20 are currently amended.
Claims 21-23 are new.
Claims 6, 8, 14 are cancelled.
Claims 1-5, 7, 9-13, 15-23 are pending.
Claims 1-5, 7, 9-13, 15-23 are rejected.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments regarding the objections to the specification have been fully considered and, in light of the amendments to the specification, are persuasive.

Applicant’s arguments regarding the objections to the drawings have been fully considered and, in light of the amendments to the specification, are persuasive.

Applicant's arguments regarding the 35 U.S.C. 101 rejections of claims 1-4, 7, 9-12, 15-20 have been fully considered and, in light of the amendments to the claims, are persuasive. The 35 U.S.C. 101 rejections of claims 1-4, 7, 9-12, 15-20 have been withdrawn.

Applicant’s arguments regarding the 35 U.S.C. 102(a)(1) rejections of claims 1, 3-5, 9, 11-13, 15, 17-20 and applicant’s arguments regarding the 35 U.S.C. 103 rejections of claims 2, 7, 10, 16 have been fully considered but are not persuasive.
It is noted that, while the Examiner may appreciate differences between the applied art and features described in the originally filed specification, any such features must be explicitly recited in the claims themselves and/or definitively and comprehensively defined in the specification in order to be considered and to affect the broadest reasonable interpretation (BRI) of the metes and bounds of the claim terms. Applicant is respectfully reminded that, during examination, the BRI of the claim terms consistent with the specification applies. Applicant is therefore encouraged to amend the claims or to point to the portion(s) of the originally filed specification that preclude the BRI of the claim terms (MPEP 2173.01) that enables correspondence to the applied art.
Applicant argues that the references fail to teach the added features of claim 1 (similarly claims 9, 15), particularly:
...
a pointer component that identifies one or more compressed gradient weights, from one or more second learning entities of a distributed machine learning system, not present in a customized first concatenated compressed gradient weight for a first learning entity of the distributed machine learning system that was previously sent to the first learning entity; 
a compression component that computes a customized second concatenated compressed gradient weight for the first learning entity based on the one or more compressed gradient weights to update a weight of the first learning entity; and 
a transmit component that transmits, via a network, to the first learning entity, the customized second concatenated compressed gradient weight to initiate the first learning entity to update the weight of the first learning entity using the customized second concatenated compressed gradient weight.
(emphasis added by applicant). Specifically, applicant argues that Wen merely discloses each worker computer calculating local gradients, quantizing the gradients and sending the gradients to the parameter server but is silent as to the above emphasized features. Applicant further argues that Zhang fails to cure the deficiencies of Wen regarding the above recited features.
t (Wen, section 3.1). Wen further teaches that the parameter server averages the gradients received at iteration t from all of the workers and sends the averaged gradients back to the workers to update each of the workers (Wen, section 3.1). Therefore, Wen teaches identifying compressed gradient weights from the workers, including one or more second learning entities, at iteration t. Because these gradients were identified at iteration t, they were not present in the first concatenated compressed gradient weight sent to the workers, including the first learning entity, at iteration t-1. Once the gradients for the workers at iteration t are identified, Wen teaches averaging the identified gradients. Because these weights are associated with the specific global model and, therefore, the specific local models, these averaged weights are customized for a particular model. Further, as noted in the specification, a concatenated compressed gradient weight is calculated using, for example, a hardsync protocol (¶0052), where a hardsync protocol averages the gradients from all of the learners (¶¶0056-0058; Figure 2, equations 202, 204). Wen further teaches sending the customized second concatenated compressed gradient weight to all of the workers, including the first learning entity, to update the local models, including that of the first learning entity. Therefore, Wen does, in fact, teach a pointer component that identifies one or more compressed gradient weights, from one or more second learning entities of a distributed machine learning system, not present in a customized first concatenated compressed gradient weight for a first learning entity of the distributed machine learning system that was previously sent to the first learning entity; a compression component that computes a customized second 
Therefore, claim 1 is rejected under 35 U.S.C. 102(a)(1) as anticipated by Wen. For similar reasons, claims 9, 15 are also rejected as anticipated by Wen. Additionally, the rejections of claims 1, 9, 15 apply to all claims that depend on claims 1, 9, 15, including claims 3, 5, 11, 13, 17, 19 which are also anticipated by Wen; claims 2, 7, 10, 16, 20-23 which are unpatentable over Wen in view of Zhang; and claims 4, 12, 18 which are unpatentable over Wen in view of Lim.

Claim Objections
Claims 7, 20, 22 are objected to because of the following informalities:
Claim 7, lines 2-3, “an asynchronous machine learning system, or an asynchronous stochastic gradient descent system” should read “an asynchronous machine learning system or an asynchronous stochastic gradient descent system” [comma removed]
Claim 20, lines 2-3, “an asynchronous machine learning system, or an asynchronous stochastic gradient descent system” should read “an asynchronous machine learning system or an asynchronous stochastic gradient descent system” [comma removed]
Claim 22, lines 2-3, “an asynchronous machine learning system, or an asynchronous stochastic gradient descent system” should read “an asynchronous machine learning system or an asynchronous stochastic gradient descent system” [comma removed]
Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3, 5, 9, 11, 13, 15, 17, 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wen et al. (TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, hereinafter referred to as "Wen").

Regarding claim 1 (Currently Amended), Wen teaches a system, comprising: 
a memory that stores computer executable components (Wen, Appendix B – teaches CPU/GPU based deep learning systems with distributed TensorFlow on a cluster of 4 machines, each of which had 4 GTX 1080 GPUs); and 
a processor that executes the computer executable components stored in the memory (Wen, Appendix B – teaches CPU/GPU based deep learning systems with distributed TensorFlow on a cluster of 4 machines, each of which had 4 GTX 1080 GPUs), wherein the computer executable components comprise: 
a pointer component that identifies one or more compressed gradient weights, from one or more second learning entities of a distributed machine learning system, not present in a customized first concatenated compressed gradient weight for a first learning entity of the distributed machine learning system that was previously sent to the first learning entity (Wen, section 3.1 – teaches that at iteration t, each worker computer generates local gradients, ternarizes [compresses] the gradients, and sends them to the parameter server [the ternarized local gradients are the compressed gradient weights; because these gradients are identified at iteration t, they are not present in the previously calculated averaged gradients at t-1 (first concatenated compressed gradient weights). Further, because gradients are received from multiple workers and averaged gradients are then sent back to the multiple workers, the gradients identified from the first and one or more second learning entities are averaged, and the averaged gradients are returned to the first and one or more second learning entities]); 
a compression component that computes a customized second concatenated compressed gradient weight for the first learning entity based on the one or more compressed gradient weights to update a weight of the first learning entity (Wen, section 3.1 – teaches that at iteration t, each worker computes local ternarized gradients [compressed gradient weights] and sends them to the parameter server, where the parameter server averages [concatenates] the gradients from all the workers and sends the averaged gradients [second concatenated compressed gradient weight] back to the workers to update the workers [learning entities]; see also Wen, Figure 1, Algorithm 1 [As discussed in the specification, a concatenated compressed gradient weight is calculated using, for example, a hardsync protocol (¶0052), where a hardsync protocol averages the gradients from all of the learners (¶¶0056-0058; Figure 2, equations 202, 204)]); and 
a transmit component that transmits, via a network, to the first learning entity (Wen, section 3.1 – teaches transmitting the averaged gradients for iteration t [second concatenated compressed gradient weight] to the workers [including first learning entity] for update), the customized second concatenated compressed gradient weight to initiate the first learning entity to update the weight of the first learning entity using the customized second concatenated compressed gradient weight (Wen, section 3.1 – teaches transmitting the averaged gradients for iteration t [second concatenated compressed gradient weight] to the workers [including the first entity] for update).
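
For illustration only, the per-iteration flow relied upon above (workers compress local gradients, the parameter server averages them, and the averaged gradients are returned for the weight update) may be sketched as follows. The sketch is not reproduced from Wen. The function names ternarize, server_average, and run_iteration are hypothetical, and the stochastic ternarization shown (each value mapped to -s, 0, or +s, with s the largest magnitude in the gradient) is an assumption about how the compression step could be realized.

```python
import random


def ternarize(grad, rng):
    """Stochastically map each gradient value to -s, 0, or +s, where s = max |g|.

    Assumed compression step for illustration; not copied from the reference.
    """
    scale = max(abs(g) for g in grad)
    if scale == 0.0:
        return [0.0] * len(grad)
    out = []
    for g in grad:
        # Keep the value with probability |g| / s, otherwise zero it out.
        keep = rng.random() < abs(g) / scale
        sign = 1.0 if g > 0 else -1.0
        out.append(sign * scale if keep else 0.0)
    return out


def server_average(worker_grads):
    """Element-wise average of the gradients received from all workers."""
    n = len(worker_grads)
    return [sum(vals) / n for vals in zip(*worker_grads)]


def run_iteration(local_grads, weights, lr, rng):
    """One iteration t: compress on workers, average on server, update weights."""
    compressed = [ternarize(g, rng) for g in local_grads]
    averaged = server_average(compressed)
    # Every worker receives the same averaged gradient and applies the same update.
    new_weights = [w - lr * a for w, a in zip(weights, averaged)]
    return compressed, averaged, new_weights
```

Under these assumptions, the averaged gradient produced at iteration t is built only from gradients received at iteration t, which is the property the mapping above relies on when distinguishing it from the aggregate sent at iteration t-1.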

Regarding claim 3 (Currently Amended), Wen teaches all of the limitations of the system of claim 1 as noted above. Wen further teaches wherein the compression component computes the customized first concatenated compressed gradient weight based on one or more second compressed gradient weights of respective learning entities of the distributed machine learning system (Wen, section 3.1 – teaches that at iteration t, each worker computes local gradients and sends them to the parameter server, where the parameter server averages the gradients from all the workers and sends the averaged gradients back to the workers; see also Wen, Figure 1, Algorithm 1 [Because this happens at each iteration, the averaged gradients at t-1 (first concatenated compressed gradient weight) were based on local gradients at t-1 (second compressed gradient weights)]).

Regarding claim 5 (Currently Amended), Wen teaches all of the limitations of the system of claim 1 as noted above. Wen further teaches wherein the customized second concatenated compressed gradient weight comprises a windowed concatenated compressed gradient weight having only the one or more compressed gradient weights (Wen, section 3.1 – teaches calculating average gradients for the iteration [second concatenated compressed gradient weight] using only the local gradients for each learner for that particular iteration [only the one or more compressed gradient weights]), thereby facilitating at least one of: 
improved processing efficiency associated with the processor (Wen, section 5 – teaches improved processing efficiency); or 
reduced storage consumption associated with the memory.

Regarding claim 9 (Currently Amended), it is the computer-implemented method embodiment of claim 1 with similar limitations to claim 1 and is rejected using the same reasoning found in claim 1.

Regarding claim 11 (Currently Amended), the rejection of claim 9 is incorporated herein. Further, the limitations in this claim are taught by Wen for the reasons set forth in the rejection of claim 3.

Regarding claim 13 (Original), Wen teaches all of the limitations of the method of claim 9 as noted above. Wen further teaches wherein the computing comprises, computing, by the system, a windowed concatenated compressed gradient weight having only the one or more compressed gradient weights (Wen, section 3.1 – teaches calculating average gradients for the iteration [second concatenated compressed gradient weight] using only the local gradients for each learner for that particular iteration [only the one or more compressed gradient weights]), thereby facilitating improved processing efficiency associated with the processor (Wen, section 5 – teaches improved processing efficiency).

Regarding claim 15 (Currently Amended), it is the computer program product embodiment of claim 1 with similar limitations to claim 1 and is rejected using the same reasoning found in claim 1. Wen further teaches the following additional limitations:
a computer program product facilitating a gradient weight compression process (Wen, section 3.1 – teaches ternary compression of gradients), the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor (Wen, Appendix B – teaches CPU/GPU based deep learning systems with distributed TensorFlow on a cluster of 4 machines, each of which had 4 GTX 1080 GPUs) ...

Regarding claim 17 (Currently Amended), the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Wen for the reasons set forth in the rejection of claim 3.

Regarding claim 19 (Currently Amended), Wen teaches all of the limitations of the computer program product of claim 15 as noted above. Wen further teaches wherein the customized second concatenated compressed gradient weight comprises a windowed concatenated compressed gradient weight having only the one or more compressed gradient weights (Wen, section 3.1 – teaches calculating average gradients for the iteration [second concatenated compressed gradient weight] using only the local gradients for each learner for that particular iteration [only the one or more compressed gradient weights]).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
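3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.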


Claims 2, 7, 10, 16, 20-23 are rejected under 35 U.S.C. 103 as being unpatentable over Wen in view of Zhang et al. (Staleness-aware Async-SGD for Distributed Deep Learning, hereinafter referred to as "Zhang").

Regarding claim 2 (Currently Amended), Wen teaches all of the limitations of the system of claim 1 as noted above. However, Wen does not explicitly teach wherein the pointer component identifies the one or more compressed gradient weights based on a first timestamp corresponding to the customized first concatenated compressed gradient weight and one or more second timestamps corresponding respectively to the one or more compressed gradient weights.
Zhang teaches wherein the pointer component identifies the one or more compressed gradient weights based on a first timestamp corresponding to the customized first concatenated compressed gradient weight and one or more second timestamps corresponding respectively to the one or more compressed gradient weights (Zhang, section 2.4 – teaches that the weights are updated once a given number of gradients, e.g., 30 in the reference, has been received from any of the learners [This demonstrates that the system has to have a timestamp of the last update (first timestamp) and a timestamp of the incoming gradients (second timestamps) to make sure the new gradients are identified after the last update. Further, while Zhang sends updated weights back to the learners, it would be obvious to a person having ordinary skill in the art, especially in light of Wen, that the gradients could be sent to the learners.]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wen with the teachings of Zhang in order to accelerate training of large-
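
For illustration only, the timestamp comparison discussed in the rejection of claim 2 may be sketched as follows. The sketch is not taken from Wen or Zhang. The GradientRecord type, its fields, the use of an integer logical clock as the timestamp, and the function select_new_gradients are all hypothetical names assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class GradientRecord:
    learner_id: str   # hypothetical identifier of the sending learner
    timestamp: int    # hypothetical logical clock value assigned on receipt
    values: list      # compressed gradient values


def select_new_gradients(records, last_sent_timestamp, target_learner):
    """Return gradients from other learners that are newer than the last
    aggregate sent to the target learner (second timestamp > first timestamp)."""
    return [
        r for r in records
        if r.learner_id != target_learner and r.timestamp > last_sent_timestamp
    ]
```

For example, if the last aggregate was sent to learner "A" at timestamp 5, only records from other learners carrying a timestamp greater than 5 would be selected for the next aggregate.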

Regarding claim 7 (Currently Amended), Wen teaches all of the limitations of the system of claim 1 as noted above. However, Wen does not explicitly teach wherein the distributed machine learning system comprises at least one of an asynchronous machine learning system, or an asynchronous stochastic gradient descent system.
Zhang teaches wherein the distributed machine learning system comprises at least one of an asynchronous machine learning system, or an asynchronous stochastic gradient descent system (Zhang, section 2.2 – teaches implementing an asynchronous SGD).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wen with the teachings of Zhang in order to accelerate training of large-scale deep networks in a distributed environment compared to SSGD and conventional ASGD algorithms 

Regarding claim 10 (Currently Amended), the rejection of claim 9 is incorporated herein. Further, the limitations in this claim are taught by Wen in view of Zhang for the reasons set forth in the rejection of claim 2.

Regarding claim 16 (Currently Amended), the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Wen in view of Zhang for the reasons set forth in the rejection of claim 2.

Regarding claim 20 
Regarding claim 21 (New), Wen teaches all of the limitations of the computer program product of claim 15 as noted above. However, Wen does not explicitly teach encode, by the processor, a timestamp on the customized second concatenated compressed gradient weight.
Zhang teaches encode, by the processor, a timestamp on the customized second concatenated compressed gradient weight (Zhang, section 2.1 – teaches encoding a timestep counter with each gradient/weight transfer).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wen with the teachings of Zhang in order to accelerate training of large-scale deep networks in a distributed environment compared to SSGD and conventional ASGD algorithms in the field of distributed deep learning (Zhang, Abstract - "Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD (ASGD) has been widely adopted for accelerating the training of large-scale deep networks in a distributed computing environment. However, in practice it is quite challenging to tune the training hyperparameters (such as learning rate) when using ASGD so as achieve convergence and linear speedup, since the stability of the optimization algorithm is strongly influenced by the asynchronous nature of parameter updates. In this paper, we propose a variant of the ASGD algorithm in which the learning rate is modulated according to the gradient staleness and provide theoretical guarantees for convergence of this algorithm. Experimental verification is performed on commonly-used image classification benchmarks: CIFAR10 and Imagenet to demonstrate the superior effectiveness of the proposed approach, compared to SSGD (Synchronous SGD) and the conventional ASGD algorithm.").
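
For illustration only, encoding a timestep counter on a transferred gradient, and using it to modulate the learning rate according to staleness as summarized in the quoted abstract, may be sketched as follows. The sketch is not Zhang's implementation. The JSON message layout, the function names encode_update, decode_update, and modulated_lr, and the choice to divide the base learning rate by the staleness are assumptions made for the example.

```python
import json


def encode_update(averaged_gradient, timestep):
    """Attach a timestep counter to an outgoing gradient message (assumed layout)."""
    return json.dumps({"timestep": timestep, "gradient": averaged_gradient})


def decode_update(payload):
    """Recover the timestep counter and the gradient from a received message."""
    msg = json.loads(payload)
    return msg["timestep"], msg["gradient"]


def modulated_lr(base_lr, staleness):
    """Scale the learning rate down as staleness grows.

    Dividing by the staleness is one simple choice assumed here; a staleness
    of zero or one leaves the base learning rate unchanged.
    """
    return base_lr / max(1, staleness)
```

A receiver could compute staleness as the difference between its current timestep and the decoded timestep before applying the update.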

Regarding claim 22 (New), the rejection of claim 9 is incorporated herein. Further, the limitations in this claim are taught by Wen in view of Zhang for the reasons set forth in the rejection of claim 7.

Regarding claim 23 (New), the rejection of claim 9 is incorporated herein. Further, the limitations in this claim are taught by Wen in view of Zhang for the reasons set forth in the rejection of claim 21.

Claims 4, 12, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wen in view of Lim et al. (US 2018/0336076 A1 – Parameter-Sharing Apparatus and Method, hereinafter referred to as “Lim”).

Regarding claim 4 (Currently Amended), Wen teaches all of the limitations of the system of claim 3 as noted above. However, Wen does not explicitly teach wherein the transmit component transmits to the respective learning entities of the distributed machine learning system respective sizes of the one or more second compressed gradient weights.
Lim teaches wherein the transmit component transmits to the respective learning entities of the distributed machine learning system respective sizes of the one or more second compressed gradient weights (Lim, ¶¶0087, 0114-0115 – teaches a distributed system [multiple learning entities] with a central parameter server receiving parameter information, including parameter size, such as size of memory needed to store the parameter, when transferring parameter values [gradient weights]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wen with the teachings of Lim in order to accelerate training of distributed deep learning systems in the field of distributed deep learning (Lim, ¶0015 – “Due to these features, it is 

Regarding claim 12 (Currently Amended), the rejection of claim 11 is incorporated herein. Further, the limitations in this claim are taught by Wen in view of Lim for the reasons set forth in the rejection of claim 4.

Regarding claim 18 (Currently Amended), the rejection of claim 17 is incorporated herein. Further, the limitations in this claim are taught by Wen in view of Lim for the reasons set forth in the rejection of claim 4.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARSHALL WERNER whose telephone number is (469) 295-9143. The examiner can normally be reached Monday – Thursday, 7:30 AM – 4:30 PM ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/MARSHALL L WERNER/
Examiner, Art Unit 2125

/BRIAN M SMITH/
Primary Examiner, Art Unit 2122