DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.


Double Patenting
The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A non-statutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on non-statutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).


Claims 1-20 are provisionally rejected on the ground of non-statutory double patenting as being unpatentable over claims 1-20 of copending Application No. 16/577,779 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other.
This is a provisional non-statutory double patenting rejection because the patentably indistinct claims have not in fact been patented.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
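A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.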

Claims 1-5, 7-12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sridharan et al. (U.S. 2019/0205745) and further in view of de Vangel et al. (U.S. 2020/0226458).
Sridharan was cited on the IDS filed 23 December 2020.

With respect to claim 1, Sridharan teaches a system, comprising: a parameter server (the nodes are directly communicating with the parameter server – see Sridharan, Fig. 22, element 2220; pages 22-23, paragraph 220) communicatively connected to a target device (multiple sets of worker nodes 2216A-2216B, 2236A-2236B – see Sridharan, page 23, paragraph 223), the parameter server (the nodes are directly communicating with the parameter server – see Sridharan, Fig. 22, element 2220; pages 22-23, paragraph 220) comprises: a data manager configured to store a master copy of an artificial intelligence (AI) model (in data parallelism 1904, the different nodes of the distributed network have a complete instance of the model and each node receives a different portion of the data. The results from the different nodes are then combined. Data parallel training approaches all require a technique of combining results and synchronizing the model parameters between each node. Exemplary approaches to combining data include parameter averaging and update based data parallelism. Parameter averaging trains each node on a subset of the training data and sets the global parameters (e.g., weights, biases) to the average of the parameters from each node. Parameter averaging uses a central parameter server that maintains the parameter data. Update based data parallelism is similar to parameter averaging except that instead of transferring parameters from the nodes to the parameter server, the updates to the model are transferred – see Sridharan, pages 18-19, paragraph 190); a transmitter configured to transmit a portion of the AI model (each layer of a neural network can be trained by a different processing node of the distributed system – see Sridharan, page 18, paragraph 189)
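Sridharan does not explicitly teach a batch manager configured to determine a batch size suitable for the target device.
However, de Vangel teaches determining a batch size suitable for the target device (de Vangel, Fig. 7; page 5, paragraph 68).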

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridharan in view of de Vangel in order to enable a batch manager configured to determine a batch size suitable for the target device. One would be motivated to do so in order to optimize artificial neural network (ANN) computations based on automatic determination of a batch size (de Vangel, page 1, paragraph 1).
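For illustration only, the two data-parallel approaches quoted above from Sridharan, paragraph 190, can be sketched in Python. This is a minimal sketch under stated assumptions, not code from Sridharan or from the present application: the one-feature linear model, the SGD training loop, and all function names (`local_train`, `parameter_averaging`, `update_based`) are hypothetical. It shows that when every node starts from the same global parameters, averaging the nodes' parameters and applying the average of their updates give the same result, which is consistent with the reference describing the two approaches as similar.

```python
from typing import List, Tuple

Params = List[float]               # [w, b] of a hypothetical toy model y = w * x + b
Shard = List[Tuple[float, float]]  # (x, y) training samples held by one node


def local_train(params: Params, shard: Shard, lr: float = 0.01, steps: int = 5) -> Params:
    """Hypothetical node-side training: a few full-batch SGD steps on mean squared error."""
    w, b = params
    n = len(shard)
    for _ in range(steps):
        gw = sum(2 * ((w * x + b) - y) * x for x, y in shard) / n
        gb = sum(2 * ((w * x + b) - y) for x, y in shard) / n
        w, b = w - lr * gw, b - lr * gb
    return [w, b]


def parameter_averaging(global_params: Params, shards: List[Shard]) -> Params:
    """Nodes return trained parameters; the server sets each global parameter to their mean."""
    node_params = [local_train(list(global_params), s) for s in shards]
    return [sum(p[i] for p in node_params) / len(node_params) for i in range(len(global_params))]


def update_based(global_params: Params, shards: List[Shard]) -> Params:
    """Nodes return only their updates (trained minus starting parameters);
    the server applies the mean update to the global parameters."""
    updates = [
        [new - old for new, old in zip(local_train(list(global_params), s), global_params)]
        for s in shards
    ]
    mean_update = [sum(u[i] for u in updates) / len(updates) for i in range(len(global_params))]
    return [g + d for g, d in zip(global_params, mean_update)]


shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
start = [0.0, 0.0]
print(parameter_averaging(start, shards))
print(update_based(start, shards))  # same values, up to floating-point rounding
```

The only difference between the two functions is what travels from the nodes to the server (full parameters versus deltas); the server-side result is the same in this setup.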

With respect to claim 2, the combination of Sridharan and de Vangel teaches the invention described in claim 1, including the system wherein the weight updater is configured to perform reduction of parameters by: receiving gradients from the target device, the gradients being generated by the target device executing the set of microbatches of the training dataset on the second subportion (Sridharan, page 20, paragraph 202) at the target device (Sridharan, Fig. 19, element 1904; pages 18-19, paragraph 190); and generating an average of the received gradients (Sridharan, page 20, paragraph 202).

With respect to claim 3, the combination of Sridharan and de Vangel teaches the invention described in claim 2, including the system wherein the weight updater is further configured to update the AI model with the average of the received gradients (Sridharan, pages 18-19, paragraph 190).
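A minimal sketch of the reduction recited in claims 2 and 3, assuming gradients arrive as equal-length lists of floats and that the master copy is updated by a plain SGD step. The function names and the learning rate are hypothetical and are not taken from Sridharan or de Vangel.

```python
from typing import List


def reduce_gradients(worker_grads: List[List[float]]) -> List[float]:
    """Element-wise average of the gradient vectors received from the target devices."""
    if not worker_grads:
        raise ValueError("no gradients received")
    n = len(worker_grads)
    return [sum(column) / n for column in zip(*worker_grads)]


def apply_update(master: List[float], avg_grad: List[float], lr: float) -> List[float]:
    """Update the master copy with the averaged gradient (plain SGD is an assumption)."""
    return [w - lr * g for w, g in zip(master, avg_grad)]


master = [1.0, -2.0]
received = [[0.5, 1.0], [1.5, -1.0]]  # gradients from two target devices
avg = reduce_gradients(received)       # [1.0, 0.0]
master = apply_update(master, avg, lr=0.1)
print(master)
```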

With respect to claim 4, the combination of Sridharan and de Vangel teaches the invention described in claim 2, including the system wherein the set of microbatches comprises a plurality of microbatches that are configured to be executed in sequential order (Sridharan, page 20, paragraph 202), the set of microbatches forming a minibatch that comprises a number of samples per update for training of the AI model (Sridharan, page 20, paragraph 202).
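The relationship recited in claim 4, in which microbatches executed in sequential order together form one minibatch per update, can be illustrated with gradient accumulation. In this hypothetical sketch (toy model y = w * x, mean squared error, equal-sized microbatches, invented function names), averaging the microbatch gradients reproduces the gradient of the whole minibatch, so only one update is applied per minibatch.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) for a hypothetical toy model y = w * x


def grad_mse(w: float, samples: List[Sample]) -> float:
    """Gradient of the mean squared error with respect to w over the given samples."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)


def minibatch_grad_by_microbatches(w: float, minibatch: List[Sample], micro_size: int) -> float:
    """Execute equal-sized microbatches in sequential order, accumulate their gradients,
    and average them into a single gradient for the whole minibatch."""
    if micro_size <= 0 or len(minibatch) % micro_size:
        raise ValueError("sketch assumes the minibatch splits evenly into microbatches")
    total, count = 0.0, 0
    for start in range(0, len(minibatch), micro_size):
        total += grad_mse(w, minibatch[start:start + micro_size])
        count += 1
    return total / count


minibatch = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0), (4.0, 9.0)]
for m in (1, 2, 4):
    print(m, minibatch_grad_by_microbatches(1.0, minibatch, m), grad_mse(1.0, minibatch))
```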

With respect to claim 5, the combination of Sridharan and de Vangel teaches the invention described in claim 2, including the system wherein the microbatch size is configurable based on a rate of executing the set of microbatches at the target device and a rate of communication between the target device and the parameter server (de Vangel, Fig. 7; page 5, paragraph 68).
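To illustrate how the two rates recited in claim 5 could enter the choice of a microbatch size, the following sketch uses an invented heuristic; it is not the method of de Vangel or of the present application. The function name, the candidate sizes, and the throughput and bandwidth figures are all hypothetical. The heuristic picks the smallest candidate whose compute time is at least the time needed to send one set of gradients, so that communication could overlap with computing the next microbatch.

```python
from typing import Sequence


def choose_microbatch_size(candidates: Sequence[int], samples_per_sec: float,
                           grad_bytes: float, bytes_per_sec: float) -> int:
    """Hypothetical heuristic combining an execution rate and a communication rate."""
    comm_time = grad_bytes / bytes_per_sec       # seconds per gradient transfer
    for m in sorted(candidates):
        if m / samples_per_sec >= comm_time:     # seconds to execute one microbatch
            return m
    return max(candidates)                       # link too slow: fall back to the largest size


# 40 MB of gradients over a 1 GB/s link takes 0.04 s; 64 samples at 1000 samples/s take 0.064 s.
print(choose_microbatch_size([8, 16, 32, 64], samples_per_sec=1000,
                             grad_bytes=40e6, bytes_per_sec=1e9))  # 64
```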

With respect to claim 7, the combination of Sridharan and de Vangel teaches the invention described in claim 1, including the system wherein the transmitter is further configured to transmit another portion of the AI model to another target device; and the weight updater is further configured to receive gradients from the another target device to perform reduction of parameters for the another portion of the AI model (Sridharan, pages 18-19, paragraph 190).
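The arrangement recited in claim 7, in which different portions of the AI model are sent to different target devices and reduced separately, might be sketched as follows. The class `ToyParameterServer`, its methods, the portion names, and the SGD update are hypothetical and are not taken from the references.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class ToyParameterServer:
    """Hypothetical parameter server whose master copy is split into named portions,
    each of which may be sent to a different target device."""

    def __init__(self, master: Dict[str, List[float]], lr: float = 0.1) -> None:
        self.master = master
        self.lr = lr
        self.owner: Dict[str, str] = {}  # portion name -> target device it was sent to
        self.pending: DefaultDict[str, List[List[float]]] = defaultdict(list)

    def transmit(self, portion: str, device: str) -> List[float]:
        """Record which device received the portion and return a copy of its weights."""
        self.owner[portion] = device
        return list(self.master[portion])

    def receive_gradients(self, portion: str, device: str, grad: List[float]) -> None:
        """Accept gradients only from the device that was sent this portion."""
        if self.owner.get(portion) != device:
            raise ValueError(f"{device} was not sent portion {portion!r}")
        self.pending[portion].append(grad)

    def reduce(self, portion: str) -> None:
        """Average the gradients received for one portion and update only that portion."""
        grads = self.pending.pop(portion)
        avg = [sum(col) / len(grads) for col in zip(*grads)]
        self.master[portion] = [w - self.lr * g for w, g in zip(self.master[portion], avg)]


server = ToyParameterServer({"layer1": [1.0, 1.0], "layer2": [0.0]})
server.transmit("layer1", "device0")
server.transmit("layer2", "device1")  # another portion to another target device
server.receive_gradients("layer1", "device0", [1.0, 2.0])
server.receive_gradients("layer2", "device1", [-1.0])
server.reduce("layer1")
server.reduce("layer2")
print(server.master)
```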

With respect to claim 8, Sridharan teaches a method implemented in a parameter server (the nodes are directly communicating with the parameter server – see Sridharan, Fig. 22, element 2220; pages 22-23, paragraph 220), comprising: storing a master copy of an artificial intelligence (AI) model (in data parallelism 1904, the different nodes of the distributed network have a complete instance of the model and each node receives a different portion of the data. The results from the different nodes are then combined. Data parallel training approaches all require a technique of combining results and synchronizing the model parameters between each node. Exemplary approaches to combining data include parameter averaging and update based data parallelism. Parameter averaging trains each node on a subset of the training data and sets the global parameters (e.g., weights, biases) to the average of the parameters from each node. Parameter averaging uses a central parameter server that maintains the parameter data. Update based data parallelism is similar to parameter averaging except that instead of transferring parameters from the nodes to the parameter server, the updates to the model are transferred – see Sridharan, pages 18-19, paragraph 190); transmitting a portion of the AI model (each layer of a neural network can be trained by a different processing node of the distributed system – see Sridharan, page 18, paragraph 189)
Sridharan does not explicitly teach determining a batch size suitable for the target device.
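However, de Vangel teaches determining a batch size suitable for the target device (de Vangel, Fig. 7; page 5, paragraph 68).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridharan in view of de Vangel in order to enable determining a batch size suitable for the target device. One would be motivated to do so in order to optimize artificial neural network (ANN) computations based on automatic determination of a batch size (de Vangel, page 1, paragraph 1).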


With respect to claim 15, Sridharan teaches a computer program product comprising a computer-readable storage device having computer program logic recorded thereon that when executed by a processor-based computer system causes the processor-based system to perform a method, the method comprising: storing a master copy of an artificial intelligence (AI) model (in data parallelism 1904, the different nodes of the distributed network have a complete instance of the model and each node receives a different portion of the data. The results from the different nodes are then combined. Data parallel training approaches all require a technique of combining results and synchronizing the model parameters between each node. Exemplary approaches to combining data include parameter averaging and update based data parallelism. Parameter averaging trains each node on a subset of the training data and sets the global parameters (e.g., weights, biases) to the average of the parameters from each node. Parameter averaging uses a central parameter server that maintains the parameter data. Update based data parallelism is similar to parameter averaging except that instead of transferring parameters from the nodes to the parameter server, the updates to the model are transferred – see Sridharan, pages 18-19, paragraph 190) at a parameter server (the nodes are directly communicating with the parameter server – see Sridharan, Fig. 22, element 2220; pages 22-23, paragraph 220); transmitting a portion of the AI model (each layer of a neural network can be trained by a different processing node of the distributed system – see Sridharan, page 18, paragraph 189) to the target device (the different nodes of the distributed network have a complete instance of the model and each node receives a different portion of 
Sridharan does not explicitly teach determining a batch size suitable for the target device.
However, de Vangel teaches determining a batch size suitable for the target device (de Vangel, Fig. 7; page 5, paragraph 68).
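It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridharan in view of de Vangel in order to enable determining a batch size suitable for the target device. One would be motivated to do so in order to optimize artificial neural network (ANN) computations based on automatic determination of a batch size (de Vangel, page 1, paragraph 1).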

Claims 9-12, 14, and 16-20 do not teach or define any new limitations beyond those of claims 2-5 and 7 and are therefore rejected for similar reasons.

Allowable Subject Matter
Claims 6 and 13 are objected to as being dependent upon rejected base claims, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Alicia Baturay whose telephone number is (571) 272-3981. The examiner can normally be reached at 7am – 4pm, Mondays – Thursdays, Eastern Time.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wing Chan can be reached on (571) 272-7493. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).




/Alicia Baturay/
Primary Examiner, Art Unit 2441

June 3, 2021