DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the amendment filed 09 September 2021.
Claims 1, 6, 8, 13, and 15 were amended.
Claims 1-20 are pending in this Office Action.


Response to Amendment
The rejection of claims 1-20 under 35 U.S.C. § 101 regarding omitting essential elements was addressed and is withdrawn.
Applicants’ amendments and arguments with respect to claims 1-20 filed on 09 September 2021 have been fully considered but they are deemed to be moot in view of the new grounds of rejection.


Response to Arguments
Applicants’ arguments filed 09 September 2021 have been fully considered, but they are not persuasive for the reasons set forth below. 

Applicants Argue: Sridharan fails to teach a worker node comprising an integrated circuit chip having an on-chip memory size less than an entirety of the training model, as would be needed to teach or suggest claim 1. In fact, Sridharan provides no mention of the worker node having an integrated circuit chip let alone an integrated circuit chip having an on-chip memory size less than an entirety of the training model. 
Sridharan teaches away from this aspect of claim 1, stating in para. [190] that “the different nodes of the distributed network have a complete instance of the model and each node receives a different portion of the data.” Thus, Sridharan implicitly discloses that its different nodes each store an entirety of the model, whereas, in claim 1, the target device comprises an integrated circuit chip having on-chip memory size less than an entirety of the AI model. Thus, for at least this reason, Sridharan fails to teach or suggest each and every feature of claim 1.

In Response: The examiner respectfully submits that the combination of Sridharan, Jackey, and de Vangel teaches the target device comprises an integrated circuit chip having on-chip memory size less than an entirety of the AI model.
The target device (multiple sets of worker nodes 2216A-2216B, 2236A-2236B – see Sridharan, page 23, paragraph 223) comprising an integrated circuit chip having an on-chip (the system 100 is a processing platform incorporated within a system-on-a-chip (SoC) integrated circuit – see Sridharan, Figs. 11B-14B; page 2, paragraphs 47-48) memory size less than an entirety of (a device on which the model executes has insufficient memory to support the model – see Jackey, col. 14, lines 19-36) the model (a distributed implementation can interact with model applications, such as, but not limited to, MATLAB® and Mathcad. Embodiments further interact with graphical modeling environments, such as, but not limited to LabView® – see Jackey, col. 14, line 62 – col. 15, line 19).
This renders the rejection proper, and thus the rejection stands.


Double Patenting
The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A non-statutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on non-statutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.

Claims 1-20 are provisionally rejected on the ground of non-statutory double patenting as being unpatentable over claims 1-20 of copending Application No. 16/577,779 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other.
This is a provisional non-statutory double patenting rejection because the patentably indistinct claims have not in fact been patented.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sridharan et al. (U.S. 2019/0205745) in view of Jackey et al. (U.S. 8,780,114) and further in view of de Vangel et al. (U.S. 2020/0226458).
Sridharan was cited on the IDS filed 23 December 2020.

With respect to claim 1, Sridharan teaches a system, comprising: a parameter server (the nodes are directly communicating with the parameter server – see Sridharan, Fig. 22, element 2220; pages 22-23, paragraph 220) communicatively connected to a target device (multiple sets of worker nodes 2216A-2216B, 2236A-2236B – see Sridharan, page 23, paragraph 223), the parameter server (the nodes are directly communicating with the parameter server – see Sridharan, Fig. 22, element 2220; pages 22-23, paragraph 220) comprises: a data manager configured to store a master copy of an artificial intelligence (AI) model (in data parallelism 1904, the different nodes of the distributed network have a complete instance of the model and each node receives a different portion of the data. The results from the different nodes are then combined. Data parallel training approaches all require a technique of combining results and synchronizing the model parameters between each node. Exemplary approaches to combining data include parameter averaging and update based data parallelism. Parameter averaging trains each node on a subset of the training data and sets the global parameters (e.g., weights, biases) to the average of the parameters from each node. Parameter averaging uses a central parameter server that maintains the parameter data. Update based data parallelism is similar to parameter averaging except that instead of transferring parameters from the nodes to the parameter server, the updates to the model are transferred – see Sridharan, pages 18-19, paragraph 190); a transmitter configured to transmit a portion of the AI model (each layer of a neural network can be trained by a different processing node of the distributed system – see Sridharan, page 18, paragraph 189) to the target device (the different nodes of the distributed network have a complete instance of the model and each node receives a different portion
Sridharan does not explicitly teach the target device having a memory size less than an entirety of the model.
However, Jackey teaches the target device memory size less than an entirety of (Jackey, col. 14, lines 19-36) the model (Jackey, col. 14, line 62 – col. 15, line 19).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridharan in view of Jackey in order to enable the target device having a memory size less than an entirety of the model. One would be motivated to do so in order to enable distributed implementations that may distribute processing across multiple types of processing logic connected by a network (Jackey, col. 13, lines 23-28).	
The combination of Sridharan and Jackey does not explicitly teach a batch manager configured to determine a batch size suitable for the target device.
However, de Vangel teaches a batch manager configured to determine a batch size suitable for the target device (de Vangel, Fig. 7; page 5, paragraph 68).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sridharan and Jackey in view of de Vangel in order to enable a batch manager configured to determine a batch size suitable for the target device. One would be motivated to do so in order to optimize artificial neural network (ANN) computations based on automatic determination of a batch size (de Vangel, page 1, paragraph 1).

With respect to claim 2, the combination of Sridharan, Jackey, and de Vangel teaches the invention described in claim 1, including the system wherein the weight updater is configured to perform 
The combination of references is made under the same rationale as claim 1 above.

With respect to claim 3, the combination of Sridharan, Jackey, and de Vangel teaches the invention described in claim 2, including the system wherein the weight updater is further configured to update the AI model with the average of the received gradients (Sridharan, pages 18-19, paragraph 190).
The combination of references is made under the same rationale as claim 1 above.

With respect to claim 4, the combination of Sridharan, Jackey, and de Vangel teaches the invention described in claim 2, including the system wherein the set of microbatches comprises a plurality of microbatches that are configured to be executed in sequential order (Sridharan, page 20, paragraph 202), the set of microbatches forming a minibatch that comprises a number of samples per update for training of the AI model (Sridharan, page 20, paragraph 202).
The combination of references is made under the same rationale as claim 1 above.

With respect to claim 5, the combination of Sridharan, Jackey, and de Vangel teaches the invention described in claim 2, including the system wherein the microbatch size is configurable based on a rate of executing the set of microbatches at the target device and a rate of communication between the target device and the parameter server (de Vangel, Fig. 7; page 5, paragraph 68).
The combination of references is made under the same rationale as claim 1 above.

With respect to claim 7, the combination of Sridharan, Jackey, and de Vangel teaches the invention described in claim 1, including the system wherein the transmitter is further configured to transmit another portion of the AI model to another target device; and the weight updater is further configured to receive gradients from the another target device to perform reduction of parameters for the another portion of the AI model (Sridharan, pages 18-19, paragraph 190).
The combination of references is made under the same rationale as claim 1 above.

With respect to claim 8, Sridharan teaches a method implemented in a parameter server (the nodes are directly communicating with the parameter server – see Sridharan, Fig. 22, element 2220; pages 22-23, paragraph 220), comprising: store a master copy of an artificial intelligence (AI) model (in data parallelism 1904, the different nodes of the distributed network have a complete instance of the model and each node receives a different portion of the data. The results from the different nodes are then combined. Data parallel training approaches all require a technique of combining results and synchronizing the model parameters between each node. Exemplary approaches to combining data include parameter averaging and update based data parallelism. Parameter averaging trains each node on a subset of the training data and sets the global parameters (e.g., weights, biases) to the average of the parameters from each node. Parameter averaging uses a central parameter server that maintains the parameter data. Update based data parallelism is similar to parameter averaging except that instead of transferring parameters from the nodes to the parameter server, the updates to the model are transferred – see Sridharan, pages 18-19, paragraph 190); transmitting a portion of the AI model (each layer of a neural network can be trained by a different processing node of the distributed system – see Sridharan, page 18, paragraph 189) to the target device (the different nodes of the distributed network have a complete instance of the model and each node receives a different portion of the data – see Sridharan, Fig. 19, element 1904; pages 18-19, paragraph 190), the target device (the different nodes of the
Sridharan does not explicitly teach the target device having a memory size less than an entirety of the model.
However, Jackey teaches the target device memory size less than an entirety of (Jackey, col. 14, lines 19-36) the model (Jackey, col. 14, line 62 – col. 15, line 19).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridharan in view of Jackey in order to enable the target device having a memory size less than an entirety of the model. One would be motivated to do so in order to enable distributed implementations that may distribute processing across multiple types of processing logic connected by a network (Jackey, col. 13, lines 23-28).	
The combination of Sridharan and Jackey does not explicitly teach determining a batch size suitable for the target device.
However, de Vangel teaches determining a batch size suitable for the target device (de Vangel, Fig. 7; page 5, paragraph 68).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sridharan and Jackey in view of de Vangel in order to enable determining a batch size suitable for the target device. One would be motivated to do so in order to optimize artificial neural network (ANN) computations based on automatic determination of a batch size (de Vangel, page 1, paragraph 1).

With respect to claim 15, Sridharan teaches a computer program product comprising a computer-readable storage device having computer program logic recorded thereon that when executed by a processor-based computer system causes the processor-based system to perform a method, the method 
Sridharan does not explicitly teach the target device having a memory size less than an entirety of the model.
However, Jackey teaches the target device memory size less than an entirety of (Jackey, col. 14, lines 19-36) the model (Jackey, col. 14, line 62 – col. 15, line 19).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridharan in view of Jackey in order to enable the target device having a 
The combination of Sridharan and Jackey does not explicitly teach determining a batch size suitable for the target device.
However, de Vangel teaches determining a batch size suitable for the target device (de Vangel, Fig. 7; page 5, paragraph 68).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sridharan and Jackey in view of de Vangel in order to enable determining a batch size suitable for the target device. One would be motivated to do so in order to optimize artificial neural network (ANN) computations based on automatic determination of a batch size (de Vangel, page 1, paragraph 1).

Claims 9-12, 14, and 16-20 do not teach or define any new limitations above claims 2-5 and 7 and therefore are rejected for similar reasons.

Allowable Subject Matter
Claims 6 and 13 are allowable over the prior art of record.


Conclusion
Applicants' amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. Applicants are reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Alicia Baturay whose telephone number is (571) 272-3981. The examiner can normally be reached at 7am – 4pm, Mondays – Thursdays, Eastern Time.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wing Chan, can be reached at (571) 272-7493. The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.






/Alicia Baturay/
Primary Examiner, Art Unit 2441

October 21, 2021