DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
This action is in response to amendments filed on 01 February 2021. As per applicant’s request, claims 16, 26-27 have been amended. Claims 16, 22-27, 29-35 are pending in the application.

Response to Arguments
Applicant’s arguments regarding the objections to the claims have been fully considered and, in light of the amendments to the claims, are persuasive.

Applicant’s arguments regarding the 35 U.S.C. 101 rejection of claim 26 have been fully considered and, in light of the amendments made to the claims, are persuasive. The 35 U.S.C. 101 rejection of claim 26 has been withdrawn.

Applicant’s arguments regarding the 35 U.S.C. 112(a) rejection of claim 26 have been fully considered and, in light of the amendments made to the claims, are persuasive. The 35 U.S.C. 112(a) rejection of claim 26 has been withdrawn.




Examiner’s Amendment
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment to the claims was given by Scott Krueger (Reg. No. 72633) in a follow-up interview with applicant on 16 February 2021.
The claims have been amended as follows:
16. (Currently Amended) A method comprising: 
for each of a plurality of nodes:
determining, by a respective node of the plurality of nodes, a respective stripe of a plurality of stripes of gradient matrices of a computational model of an optimization problem for a first minibatch of training data; 
quantizing the respective stripe of the plurality of stripes of gradient matrices for the first minibatch using corresponding stored error matrices stored by the respective node; 
updating the stored error matrices stored by the respective node for the first minibatch using the corresponding quantized stripe of gradient matrices; 
exchanging the quantized respective stripe of the plurality of stripes of gradient matrices for the first minibatch synchronously with the plurality of nodes, wherein the exchanging comprises:
partitioning the quantized respective stripe of gradient matrices into a plurality of data stripes;
providing, to each of the plurality of nodes, a respective [[ones ]]data stripe of the plurality of data stripes [[to ]]of the respective node during a first phase of the exchanging;
receiving a respective data stripe[[s]] from each of the other nodes during the first phase of the exchanging;
aggregating the received data stripes into aggregated stripe data; 
transmitting, to each of the plurality of nodes, the aggregated stripe data [[to ]]of the respective node during a second phase of the exchanging;
receiving other aggregated stripe data from each of the other nodes during the second phase of the exchanging; and 
recovering the full set of gradients for the computational model from the aggregated stripe data and the other aggregated stripe data;
determining a respective stripe of a plurality of stripes of gradient matrices of the computational model for a second minibatch of the training data while exchanging the quantized gradient matrices for the first minibatch
repeating the determining, quantizing, updating, and exchanging steps for each of a plurality of minibatches of the computational model, the plurality of minibatches including the first and the second minibatches.


for each of a plurality of nodes:
determining, by a respective node of the plurality of nodes, a respective stripe of a plurality of stripes of gradient matrices of a computational model of an optimization problem for a first minibatch of training data; 
quantizing the respective stripe of the plurality of stripes of the gradient matrices for the first minibatch using corresponding stored error matrices stored by the respective node; 
updating the stored error matrices stored by the respective node for the first minibatch using the corresponding quantized stripe of gradient matrices; 
exchanging the quantized respective stripe of the plurality of stripes of gradient matrices for the first minibatch synchronously with the plurality of nodes, wherein the exchanging comprises:
partitioning the quantized respective stripe of gradient matrices into a plurality of data stripes;
providing, to each of the plurality of nodes, a respective [[ones ]]data stripe of the plurality of data stripes [[to ]]of the respective node during a first phase of the exchanging;
receiving a respective data stripe[[s]] from each of the other nodes during the first phase of the exchanging;
aggregating the received data stripes into aggregated stripe data; 
transmitting, to each of the plurality of nodes, the aggregated stripe data [[to ]]of the respective node during a second phase of the exchanging;
receiving other aggregated stripe data from each of the other nodes during the second phase of the exchanging; and 
recovering the full set of gradients for the computational model from the aggregated stripe data and the other aggregated stripe data; 
determining a respective stripe of a plurality of stripes of gradient matrices of the computational model for a second minibatch of the training data while exchanging the quantized gradient matrices for the first minibatch
repeating the determining, quantizing, updating, and exchanging steps for each of a plurality of minibatches of the computational model, the plurality of minibatches including the first and the second minibatches.

27. (Proposed Examiner Amendment) A system comprising: 
one or more computer-readable media having thereon a plurality of modules and a computational model of an optimization problem; and 
a plurality of nodes, each including at least one processing unit, each processing unit operably coupled to at least one of the computer-readable media, the processing units adapted to intercommunicate and to execute modules of the plurality of modules comprising: 
an update-determining module configured to determine modification values of the computational model for a first minibatch of training data, the modification values comprising, for each node of the plurality of nodes, a respective stripe of a plurality 
a quantization module configured to quantize, for each node of the plurality of nodes, the respective stripe of the gradient matrices determined modification values for the first minibatch using stored error values stored by the respective node and to update the stored error values stored by the respective node for the first minibatch using the determined modification values and the quantized stripe of the gradient matrices determined modification values; 
a transferring module configured to transmit at least some of the quantized stripes of the gradient matrices determined modification values for the first minibatch synchronously to at least one other of the plurality of nodes, wherein the transferring module is configured to: 
partition each of the quantized stripes of the gradient matrices determined modification values for the first minibatch into a plurality of data stripes;
provide, to each of the plurality of nodes, a respective [[ones ]]data stripe of the plurality of data stripes [[to ]]of the respective node during a first phase of transferring;
receive, by each of the plurality of nodes, a respective data stripe[[s]] from each of the other nodes during the first phase of the transferring;
aggregate, for each of the plurality of nodes, the received data stripes into aggregated stripe data; 
transmit, to each of the plurality of nodes, the aggregated stripe data [[to ]]of the respective node during a second phase of the transferring;
receive, by each of the plurality of nodes, other aggregated stripe data from each of the other nodes during the second phase of the transferring; and 
recover, by each of the plurality of nodes, the full set of gradients for the computational model from the aggregated stripe data and the other aggregated stripe data; and 
an updating module configured to modify the stored computational model according to the recovered full set of gradients
wherein the update-determining module is further configured to determine modification values of the computational model for a second minibatch of the training data while the transferring module is transmitting the at least some of the quantized stripes of the gradient matrices determined modification values for the first minibatch to the at least one other of the plurality of nodes.

29. (Proposed Examiner Amendment) The system of claim 27, wherein each node includes a respective memory coupled to the respective at least one processing unit and configured to store a respective private quantization state including the stored error values.

32. (Proposed Examiner Amendment) The system of claim 27, further including a crossbar communicatively connecting the nodes, wherein the nodes are configured to execute the transferring module to transmit the at least some of the quantized stripes of the gradient matrices determined modification values via the crossbar in parallel with executing the update-determining module.

34. (Proposed Examiner Amendment) The system of claim 33, wherein the transferring module is configured to transfer second quantized stripes of the gradient matrices determined modification values from the GPGPU to the CPU in parallel with transferring the at least some of the quantized stripes of the gradient matrices determined modification values to the at least one other of the plurality of nodes.

35. (Proposed Examiner Amendment) The system of claim 27, the quantization module further configured to reconstruct modification values using the transferred quantized stripes of the gradient matrices determined modification values and the updating module configured to modify the stored computational model according to the reconstructed modification values.
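
For illustration only, the overlap recited in claims 16 and 27 (determining gradients for a second minibatch while the exchange for the first minibatch is still in progress) can be sketched as a simple two-stage pipeline. This is a minimal single-process sketch, not the applicant's implementation: `train_pipelined`, `compute_gradient`, and `exchange` are hypothetical names, and a background thread stands in for the inter-node transfer.

```python
from concurrent.futures import ThreadPoolExecutor


def train_pipelined(minibatches, compute_gradient, exchange):
    """Overlap gradient computation for minibatch i+1 with the exchange for minibatch i.

    compute_gradient and exchange are caller-supplied callables (hypothetical
    placeholders for the determining and exchanging steps of the claims).
    Returns the exchange results in minibatch order.
    """
    results = []
    # One worker thread models a single in-flight exchange at a time.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for batch in minibatches:
            # This computation runs while the previous exchange is in flight.
            grad = compute_gradient(batch)
            if pending is not None:
                # Wait for the previous minibatch's exchange to finish.
                results.append(pending.result())
            # Start exchanging this minibatch's gradient in the background.
            pending = pool.submit(exchange, grad)
        if pending is not None:
            results.append(pending.result())
    return results


if __name__ == "__main__":
    out = train_pipelined([1, 2, 3], lambda b: b * 2, lambda g: g + 1)
    print(out)
```

In a sketch of this kind, the gradient for a given minibatch is computed before the previous minibatch's exchange result has been applied, so the model used for that computation lags by one update.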

Allowable Subject Matter
Claims 16, 22-27, 29-35 are allowed.

The following is an examiner’s statement of reasons for allowance: in view of claims 16 and 26-27 and further search, claims 16 and 26-27 are considered allowable because, when the claims are read in light of the specification, as per MPEP 2111.01, none of the references of record, either alone or in combination, fairly discloses or suggests the combination of limitations specified in the independent claims, including at least:


Claim 16:
...
exchanging the quantized respective stripe of the plurality of stripes of gradient matrices for the first minibatch synchronously with the plurality of nodes, wherein the exchanging comprises: 
partitioning the quantized respective stripe of gradient matrices into a plurality of data stripes; 
providing, to each of the plurality of nodes, a respective data stripe of the plurality of data stripes of the respective node during a first phase of the exchanging; 
receiving a respective data stripe from each of the other nodes during the first phase of the exchanging; 
aggregating the received data stripes into aggregated stripe data; 
transmitting, to each of the plurality of nodes, the aggregated stripe data of the respective node during a second phase of the exchanging; 
receiving other aggregated stripe data from each of the other nodes during the second phase of the exchanging; and 
recovering the full set of gradients for the computational model from the aggregated stripe data and the other aggregated stripe data;
...
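
For illustration only, the two-phase exchange quoted above can be simulated in a single process. This is a minimal sketch under stated assumptions: `two_phase_exchange` is a hypothetical name, each node's contribution is represented as a flat list of already-dequantized values, aggregation is taken to be an elementwise sum, and the quantization step is omitted for brevity.

```python
from typing import List


def two_phase_exchange(node_grads: List[List[float]]) -> List[List[float]]:
    """Simulate the two-phase exchange among K nodes.

    node_grads[k] is the local gradient vector held by node k (assumed
    equal length on every node). Returns, for each node, the recovered
    full aggregated gradient.
    """
    k = len(node_grads)
    n = len(node_grads[0])
    # Partition: data stripe j covers indices bounds[j] .. bounds[j + 1] - 1.
    bounds = [(j * n) // k for j in range(k + 1)]

    # First phase: every node provides data stripe j to node j,
    # and node j aggregates the stripes it receives.
    aggregated = []
    for j in range(k):
        lo, hi = bounds[j], bounds[j + 1]
        received = [node_grads[src][lo:hi] for src in range(k)]
        aggregated.append([sum(vals) for vals in zip(*received)])

    # Second phase: every node transmits its aggregated stripe to all
    # nodes, and each node concatenates the stripes to recover the full set.
    recovered = []
    for _ in range(k):
        full: List[float] = []
        for j in range(k):
            full.extend(aggregated[j])
        recovered.append(full)
    return recovered


if __name__ == "__main__":
    print(two_phase_exchange([[1, 2, 3, 4], [10, 20, 30, 40]]))
```

With K nodes, each node in this sketch aggregates only 1/K of the vector in the first phase, which is the work-sharing property the partitioning step makes possible.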





Claim 26:
...
exchanging the quantized respective stripe of the plurality of stripes of gradient matrices for the first minibatch synchronously with the plurality of nodes, wherein the exchanging comprises: 
partitioning the quantized respective stripe of gradient matrices into a plurality of data stripes; 
providing, to each of the plurality of nodes, a respective data stripe of the plurality of data stripes of the respective node during a first phase of the exchanging; 
receiving a respective data stripe from each of the other nodes during the first phase of the exchanging; 
aggregating the received data stripes into aggregated stripe data; 
transmitting, to each of the plurality of nodes, the aggregated stripe data of the respective node during a second phase of the exchanging; 
receiving other aggregated stripe data from each of the other nodes during the second phase of the exchanging; and 
recovering the full set of gradients for the computational model from the aggregated stripe data and the other aggregated stripe data;
...





Claim 27:
...
a transferring module configured to transmit at least some of the quantized stripes of the gradient matrices determined modification values for the first minibatch synchronously to at least one other of the plurality of nodes, wherein the transferring module is configured to: 
partition each of the quantized stripes of the gradient matrices determined modification values for the first minibatch into a plurality of data stripes; 
provide, to each of the plurality of nodes, a respective data stripe of the plurality of data stripes of the respective node during a first phase of transferring; 
receive, by each of the plurality of nodes, a respective data stripe from each of the other nodes during the first phase of the transferring; 
aggregate, for each of the plurality of nodes, the received data stripes into aggregated stripe data; 
transmit, to each of the plurality of nodes, the aggregated stripe data of the respective node during a second phase of the transferring; 
receive, by each of the plurality of nodes, other aggregated stripe data from each of the other nodes during the second phase of the transferring; and 
recover, by each of the plurality of nodes, the full set of gradients for the computational model from the aggregated stripe data and the other aggregated stripe data; and 
...

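Regarding the cited limitations of claims 16, 26-27, which do not appear to be taught by the prior art: Strom teaches determining gradients of a computational model for a minibatch, quantizing the gradients using stored errors, updating the errors and exchanging the gradients with other nodes. Strom does not, however, appear to teach the two-phase exchange set out in the cited limitations, in which each quantized stripe is partitioned into data stripes that are provided, aggregated, retransmitted, and then used to recover the full set of gradients.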
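
For illustration only, quantizing gradients using stored errors and then updating those errors can be sketched as follows. This is a minimal sketch under stated assumptions: `quantize_with_error_feedback` is a hypothetical name, and the one-bit quantizer with a fixed `step` is an assumed example, since neither the claims nor this action specifies the quantizer.

```python
from typing import List, Tuple


def quantize_with_error_feedback(
    grad: List[float], error: List[float], step: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Quantize each gradient value to +step or -step using a stored error.

    The stored error from the previous minibatch is added before
    quantizing, and the new quantization error is returned so it can be
    stored for the next minibatch. The one-bit quantizer is an assumption.
    """
    quantized = []
    new_error = []
    for g, e in zip(grad, error):
        corrected = g + e                      # add back the stored error
        q = step if corrected >= 0 else -step  # assumed one-bit quantizer
        quantized.append(q)
        new_error.append(corrected - q)        # error carried to next minibatch
    return quantized, new_error


if __name__ == "__main__":
    q, err = quantize_with_error_feedback([0.25, -0.75], [0.0, 0.0], step=0.5)
    print(q, err)
```

Because each stored error is the difference between the corrected value and its quantized value, no gradient information is discarded in this sketch; it is deferred to later minibatches.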

Therefore, the cited/applied prior art fails to teach or suggest the combination of features recited in each of the independent claims 16, 26-27.
When taken in context, the claims as a whole were not uncovered in the prior art. All dependent claims which depend from claims 16, 26-27 are allowed because they depend upon an allowable independent claim.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance."

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARSHALL WERNER, whose telephone number is (469) 295-9143. The examiner can normally be reached Monday – Thursday, 7:30 AM – 4:30 PM CST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.


/MARSHALL L WERNER/
Examiner, Art Unit 2125

/KAMRAN AFSHAR/
Supervisory Patent Examiner, Art Unit 2125