DETAILED ACTION


Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-16 and 21-36 are pending.

Claim Objections

Claim 1 is objected to because of the following informalities:
In claim 1, line 7, -- is configured -- should be -- is configured to --.
Appropriate correction is required.


Specification

The abstract of the disclosure is objected to because of the following:
Delete -- [figure 9] -- from the end of the abstract.
Correction is required.  See MPEP § 608.01(b).


Drawings

The drawings are objected to because of the following:
Disconnected nodes should be deleted in fig. 6.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.




Claims 1-16 and 21-36 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or joint inventor regards as the invention.

The following terms lack proper antecedent basis:

-- the pre-compiled synchronization barrier -- in claim 2 line 4.

The following claim language is not clearly understood:

Claim 1 line 4 recites “a gateway device connected to the at least one processing unit”. It is unclear if the gateway device is within the processing node or outside the processing node.

Claim 1 line 18 recites “with processing nodes of a further of the at least two different sets of processing nodes”. It is unclear what “a further of” refers to, i.e., which processing nodes are being considered within the two sets. Similarly, claim 6 line 4 recites “each further set”. It is unclear how many further sets there are.

Claim 1 line 4 recites “processing unit” and later, in line 6, recites “processor” of each processing node. It is unclear whether the “processing unit” and the “processor” are the same entity or different entities. A similar deficiency exists in claim 30.

Claim 2 line 2 recites “part of one of a plurality of further sets”. It is unclear whether the further sets exclude or include the two different sets of processing nodes recited in claim 1 line 5.

Claim 2 line 4 recites “pre-compiled” synchronization barrier. It is unclear what is meant by pre-compiled.

Claim 5 line 5 recites “transfer elements of the array of the data items”. It is unclear which array is being transferred e.g. array recited in claim 1 line 3 or generated array recited in claim 5 line 4.

Claim 7 lines 12-13 recite “unload to storage”. It is unclear whether the storage is part of the processing node or of the data processing system.

Claim 7 line 4 recites “multi-stage process” and lines 5-6 recite “reduce-scatter collective is performed following the completion of multi-stage process”. It is unclear whether the multi-stage process is a reduce-scatter process, a load/store process, or a load/compute process.

Claim 9 recites “some”, which is indefinite, i.e., it is unclear which of the further sets “some” refers to.

Claim 16 lines 5-6 recite “the one or more reduce-scatter collectives comprises a plurality of reduce-scatter collectives”. It is unclear what is meant by one or more collectives comprising a plurality of collectives (i.e., whether these are different types of collectives or different implementations of the same collectives).

Claims 21 and 30 recite elements of claim 1 and have deficiencies similar to those of claim 1. Therefore, they are rejected for the same rationale. The remaining dependent claims are also rejected due to their dependency on the rejected independent claims.



Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1, 15-16, 21 and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Vaidyanathan et al. (US 2021/0109888 A1, hereafter Vaidyanathan) in view of Zhang et al. (US 2020/0051201 A1, hereafter Zhang).

As per claim 1, Vaidyanathan teaches the invention substantially as claimed including a data processing system comprising a plurality of first processing nodes ([0010] parallel computer system, processing nodes, exchange data [0016] fig 1 computer system 100 multiple processing nodes 102), each of the plurality of first processing nodes comprising at least one memory ([0019] processing nodes 102 include memory 150) configured to store an array of data items ([0019] memory, store data and instructions [0010] processing nodes, store indexed input data array), wherein each of the plurality of first processing nodes (fig. 1 102) comprises: 
at least one processing unit ([0019] processing nodes 102 include multiple processing cores 140); and 
a gateway device connected to the at least one processing unit, wherein each of the plurality of first processing nodes belongs to at least two different sets of processing nodes (fig. 3 each processing node 102-n belongs to more than one set of nodes, e.g., node P0 belongs to the row P0, P4, P5, P1 and also to the set P0, P2 [0024] nodes in the super nodes, super nodes grouped to form mesh), wherein at least one processor of each of the plurality of first processing nodes is configured (fig. 1 processing core 140 processing node 102):

take part in one or more reduce-scatter collectives using the respective array of data items to obtain a reduced subset of an array of data items ([0010] parallel processing nodes, collective, parallel processing operation, reduce-scatter [0011] processing nodes, reduce-scatter operation, data array, reduction operation, output data scattered, across processing nodes), wherein each of the one or more reduce-scatter collectives is performed between processing nodes of a different one of the respective at least two different sets of processing nodes ([0010] parallel processing nodes, collective parallel processing operation, reduce-scatter [0022] fig. 3 collective operation, reduce-scatter [0024] processing nodes, exchange data, same super nodes, nodes in the super nodes, super nodes grouped to form mesh), wherein taking part in one or more reduce-scatter collectives is performed by the at least one processing unit of the respective first processing node ([0022] fig. 3 collective, operation, reduce-scatter [0024] processing node exchanging data with three other processing nodes fig. 1 processing core 140 processing node 102 [0019] cores perform collective operations);

subsequently, exchange the respective reduced subset of the array of data items by participating in an all-reduce collective with processing nodes of a further of the at least two different sets of processing nodes to which the respective processing node belongs to obtain a further reduced subset of the array of data items ([0010] parallel processing nodes, collective parallel processing operation, reduce-scatter [0022] fig. 3 collective operation, reduce-scatter, HRS based stages, Rabenseifner algorithm based stages [0024] processing nodes, exchange data, same super nodes, nodes in the super nodes, super nodes grouped to form mesh, processing nodes of one super node exchanging data with the processing nodes of the other super node 370 [0033] collective parallel processing operation, all-reduce operations fig. 4B 360 370), wherein exchanging the respective reduced subset of the array of data items by participating in an all-reduce collective is performed by the gateway device of the respective first processing node ([0022] fig. 3 collective, operation, reduce-scatter [0024] processing node exchanging data with three other processing nodes, HRS stages 360 processing nodes of one super node exchanging data with the processing nodes of the other super node 370 [0033] collective parallel processing operation, all-reduce operations fig. 4B 360 370); and

subsequently, take part in one or more all-gather collectives using the further reduced subset of the array of data items to obtain a reduced array of data items ([0033] collective, all-gather operations [0024] processing node, exchanging data, other processing nodes of other super node [0022] fig. 3 collective operation, HRS based stages, Rabenseifner algorithm based stages, fig 4B 360 370), wherein each of the one or more all-gather collectives is performed between processing nodes of one of the different ones of the respective at least two different sets of processing nodes ([0033] collective parallel operations, all-gather operations [0024] processing node exchanging data with three other processing nodes, HRS stages processing nodes of one super node exchanging data with the processing nodes of the other super node 370 [0022] fig. 3 collective operation, HRS based stages, Rabenseifner algorithm based stages), wherein taking part in one or more all-gather collectives is performed by the at least one processing unit of the respective first processing node ([0033] collective parallel operations, all-gather operations [0022] fig. 3 collective operation, HRS based stages, Rabenseifner algorithm based stages [0019] cores perform collective operations, fig. 1 processing core 140 processing node 102).  

Vaidyanathan doesn’t specifically teach a gateway device connected to the at least one processing unit, wherein all-reduce collective is performed by the gateway device of the respective first processing node.

Zhang, however, teaches a gateway device connected to the at least one processing unit ([0022] system 100, server, switch 101 GPU 114), wherein all-reduce collective is performed by the gateway device of the respective first processing node ([0032] fig. 2 GPU topology aware all-reduce operation, switch 201 is connected to other switches in the system [0042] switch, performing, allReduce [0022] switch, connected to other switches and environment in other racks).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Vaidyanathan with the teachings of Zhang, namely a server including multiple GPUs and a switch, with an all-reduce operation performed using the switch, in order to improve efficiency and to provide, in the system of Vaidyanathan, a gateway device connected to the at least one processing unit, wherein the all-reduce collective is performed by the gateway device of the respective first processing node, as in the instant invention.
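
For orientation, the three-phase sequence mapped above for claim 1 (reduce-scatter within one set of nodes, all-reduce of the reduced subset across a further set, then all-gather within the first set) can be sketched as follows. This is a minimal single-process simulation under stated assumptions, not an implementation taken from Vaidyanathan, Zhang, or the application: the nodes are assumed to form a rows-by-columns grid, the two sets of each node are assumed to be its row and its column, the array length is assumed to be divisible by the number of columns, and all function names are hypothetical. The sketch does not model which device (processing unit or gateway) performs each phase.

```python
from typing import List

Vector = List[int]


def reduce_scatter(arrays: List[Vector]) -> List[Vector]:
    """Member i of the group ends with chunk i of the element-wise sum."""
    n = len(arrays)
    chunk = len(arrays[0]) // n  # assumes the length is divisible by n
    total = [sum(vals) for vals in zip(*arrays)]
    return [total[i * chunk:(i + 1) * chunk] for i in range(n)]


def all_reduce(arrays: List[Vector]) -> List[Vector]:
    """Every member of the group ends with the full element-wise sum."""
    total = [sum(vals) for vals in zip(*arrays)]
    return [list(total) for _ in arrays]


def all_gather(chunks: List[Vector]) -> List[Vector]:
    """Every member of the group ends with the concatenation of all chunks."""
    full = [x for chunk in chunks for x in chunk]
    return [list(full) for _ in chunks]


def hierarchical_all_reduce(grid: List[List[Vector]]) -> List[List[Vector]]:
    """grid[r][c] is the input array held by the node at row r, column c."""
    rows, cols = len(grid), len(grid[0])
    # Phase 1: reduce-scatter within each row (first set of each node).
    partial = [reduce_scatter(row) for row in grid]
    # Phase 2: all-reduce of each reduced subset across the column (further set).
    for c in range(cols):
        reduced = all_reduce([partial[r][c] for r in range(rows)])
        for r in range(rows):
            partial[r][c] = reduced[r]
    # Phase 3: all-gather within each row to rebuild the full reduced array.
    return [all_gather(row) for row in partial]
```

In this sketch, after phase 1 each node holds only 1/cols of the row-reduced array, so phase 2 exchanges correspondingly less data between rows than an all-reduce of the full array would.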



As per claim 15, Vaidyanathan teaches a data processing system as claimed in claim 1, wherein for each of the first processing nodes (fig 1 102):
the respective at least two different sets of processing nodes consists of two sets of processing nodes (fig. 3 stage 3 {M0 M1}…{M6, M7}); 
the one or more reduce-scatter collectives includes a reduce-scatter collective between processing nodes of a first of the respective two sets of processing nodes ([0027] stages 380, processing nodes 102 of the meshes exchange data and perform corresponding reduction operations); 
the all-reduce collective is between processing nodes of a second of the respective two sets of processing nodes (fig. 3 stage 3 {M0 M1}…{M6, M7} [0027] stages 380/382/384, processing nodes 102 of the meshes exchange data and perform corresponding reduction operations [0033] collective operations, all-reduce operations ); and 
the one or more all-gather collectives includes an all-gather collective between processing nodes of the first of the respective two sets of processing nodes (fig. 3 stage 3 {M0 M1}…{M6, M7} [0027] stages 380/382/384, processing nodes 102 of the meshes exchange data and perform corresponding reduction operations, [0033] collective operations, all-gather operations).  

As per claim 16, Vaidyanathan teaches wherein for each of the first processing nodes: 
the respective at least two different sets of processing nodes comprises more than two sets of processing nodes (fig. 3 stage 3 {M0 M1}…{M6, M7}); 
the one or more reduce-scatter collectives comprises a plurality of reduce-scatter collectives ([0002] type of collective operation, reduce-scatter operation [0011] mathematical addition [0049] first phase, second phase); and 
the one or more all-gather collectives comprises a plurality of all-gather collectives ([0033] collective parallel processing operations, all-gather operations).  

Claim 21 recites a method for elements of claim 1. Therefore, it is rejected for the same rationale.

Claim 30 recites a non-transitory computer readable medium storing a computer program comprising computer readable instructions for the elements of claim 1. Therefore, it is rejected for the same rationale.



Claims 2-14, 22-29 and 31-36 are rejected under 35 U.S.C. 103 as being unpatentable over Vaidyanathan in view of Zhang, as applied to the claims above, and further in view of Chetlur et al. (US 2021/0133583 A1, hereafter Chetlur).

As per claim 2, Vaidyanathan teaches wherein each of the first processing nodes is part of one of a plurality of further sets of one or more processing nodes of the data processing system (fig. 1 processing nodes 102 fig. 3 processing nodes at different stages), wherein at least one processing node of each of the further set of processing nodes comprises at least one processor (fig 1 processing node 102 processing cores 140) configured to, prior to the pre-compiled synchronization barrier, generate the respective array of data items in dependence upon a different set of input data ([0011] reduction, mathematical addition, output data array, eight element data array, respective element, summation of respective elements of all the input data arrays).
Vaidyanathan and Zhang, in combination, do not specifically teach prior to the pre-compiled synchronization barrier. 
Chetlur, however, teaches prior to the pre-compiled synchronization barrier ([0455] programming models provide single construct for synchronizing cooperating threads, barrier, programmers, define groups of threads).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Vaidyanathan and Zhang with the teachings of Chetlur, namely defining barrier synchronization for a group of threads, in order to improve efficiency and to allow the generating to occur prior to the pre-compiled synchronization barrier in the method of Vaidyanathan and Zhang, as in the instant invention.


As per claim 3, Vaidyanathan teaches wherein each of the further sets of one or more processing nodes consists of a pair of processing nodes (fig. 3 each processing node 102-n belongs to more than one set of nodes, e.g., node P0 belongs to the pair P0, P4, to the pair P0, P2, and to the pair P0, P6 [0024]).

As per claim 4, Vaidyanathan teaches wherein each of the further sets of processing nodes comprises two or more processing nodes (fig. 3 each processing node 102-n belongs to more than one set of nodes, e.g., node P0 belongs to the pair P0, P4, to the pair P0, P2, and to the pair P0, P6 [0024]) comprising one of the plurality of first processing nodes and at least one additional processing node (fig. 3 each processing node 102-n belongs to more than one set of nodes, e.g., node P0 belongs to the row P0, P4, P5, P1 and also to the set P0, P2).

As per claim 5, Vaidyanathan teaches wherein at least one processor of at least one of the processing nodes in each further set of one or more of the processing nodes is configured to: 
perform the generating the respective array of data items ([0011] reduction, mathematical addition, output data array, eight element data array, respective element, summation of respective elements of all the input data arrays); and 
transfer elements of the array of data items to another of the processing nodes in the respective further set of one or more of the processing nodes ([0011] elements of the output data array are equally scattered/distributed, across the processing nodes).  

As per claim 6, Vaidyanathan teaches wherein the steps of generating the respective array of data items, and transferring elements of the array of data items to another of the processing nodes is performed by at least one processor of the at least one additional processing node in each further set ([0011] elements of the output data array are equally scattered/distributed, across the processing nodes, e.g. each processing node may store the respective elements of the output data array fig 1 processing node 102 processing cores 140), wherein the another of the processing nodes to which the elements of the array of data items are transferred is the first processing node of the respective further set ([0011] elements of the output data array are equally scattered/distributed, across the processing nodes fig. 3 each processing node 102-n belongs to more than one set of nodes, e.g., node P0 belongs to the pair P0, P4, to the pair P0, P2, and to the pair P0, P6 [0024]).

As per claim 7, Vaidyanathan teaches a data processing system as claimed in claim 2, wherein for each of the further sets of processing nodes: 
[Application No: 16/928,782; Docket No. 58424.73US01 (417546US); Preliminary Amendment; Customer No. 27683]
the generating the respective array of data items is performed as part of a stage of a multi-stage process over a plurality of time periods ([0012] collective operation, stages, in stage, pairs of processing nodes exchange half of their data and reduce the exchange data, processing continues in one or multiple subsequent stages, until each processing node stored an element of the resulting output data array [0015] overall time to perform collective operation); and 
the step of taking part in the one or more reduce-scatter collectives is performed following the completion of the multi-stage process, 
wherein, for each of the further sets of processing nodes (fig. 1 102), during each of some of the plurality of time periods ([0015] overall time to perform collective operation): 
at least one processor of at least one of the processing nodes of the respective further set (fig 1 102 140) is configured to perform calculations for generating one or more elements of the respective array of data items ([0011] processing nodes, exchange data, apply reduction operations, output data array); and 
at least one other of the processing nodes of the respective further set is configured (fig 1 102 140) to unload to storage one or more elements of the respective array of data items that was calculated during a preceding one of the plurality of time periods ([0011] conclusion, reduce-scatter operation, elements of the output data arrays, equally distributed across the processing nodes).

Vaidyanathan doesn’t specifically teach the step of taking part in the one or more reduce-scatter collectives is performed following the completion of the multi-stage process, during each of some of the plurality of time periods.

Zhang, however, teaches during each of some of the plurality of time periods ([0030] each step takes an amount of time T according to equation D/N/BW=T).

Vaidyanathan and Zhang, in combination, do not specifically teach the step of taking part in the one or more reduce-scatter collectives is performed following the completion of the multi-stage process.

Chetlur, however, teaches the step of taking part in the one or more reduce-scatter collectives is performed following the completion of the multi-stage process ([0354] memory load/store operations i.e. multistage process).

 

As per claim 8, Vaidyanathan teaches wherein, for each of the further sets of processing nodes, the at least one other of the processing nodes comprises the first processing node of the respective further set (fig. 3 {P0, P2} {P2, P6} {P6 P7} {P6 P4}).  

As per claim 9, Vaidyanathan teaches wherein, for each of some of the further sets of processing nodes (fig. 3 {P0, P2} {P2, P6} {P6 P7} {P6 P4}), during each of at least some of the plurality of time periods ([0015] overall time to perform collective operation): 
at least one processor of the respective first processing node of the further set of processing nodes is configured (fig 1 102 140)  to generate one or more elements of the respective array of data items ([0011] processing nodes, exchange data, apply reduction operations, output data array); and 
at least one processor of the respective at least one additional processing node is configured (fig 1 102 140) to unload one or more elements of the respective array of data items generated in a preceding of the time periods ([0011] exchange data, apply reduction operations, elements of the output data array are equally scattered/distributed, across the processing nodes, e.g. each processing nodes may store the respective elements of the output data array ), 
wherein, for each of the some of the further sets of processing nodes (fig. 3 {P0, P2} {P2, P6} {P6 P7} {P6 P4}), during each of at least others of the plurality of time periods ([0015] overall time to perform collective operation): 
at least one processor of the respective at least one additional processing node of the further set of processing nodes is configured to (fig 1 102 140) generate one or more elements of the respective array of data items ([0011] processing nodes, exchange data, apply reduction operations, output data array); and 
at least one processor of the respective further sets of processing node is configured to (fig 1 102 140) unload one or more elements of the respective array of data items generated in a preceding of the time periods ([0011] exchange data, apply reduction operations, elements of the output data array are equally scattered/distributed, across the processing nodes, e.g. each processing nodes may store the respective elements of the output data array). 
Zhang teaches the remaining claim element of during each of some of the plurality of time periods ([0030] each step takes an amount of time T according to equation D/N/BW=T).
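
The per-step time relation cited from Zhang [0030], D/N/BW = T, can be illustrated with a short calculation. The sketch below is for illustration only and rests on assumptions that the citation above does not spell out: D is read as the total data size, N as the number of participating devices, and BW as the link bandwidth. The function and parameter names are hypothetical.

```python
def step_time(data_size_bytes: float, num_devices: int, bandwidth_bytes_per_s: float) -> float:
    """Time T for one step, following the cited relation D / N / BW = T.

    Assumed meanings (not confirmed by the citation): D is the total data
    size, N is the number of devices, BW is the link bandwidth.
    """
    if num_devices <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("device count and bandwidth must be positive")
    return data_size_bytes / num_devices / bandwidth_bytes_per_s
```

Under these assumptions, for example, 8e9 bytes split across 8 devices over a 1e9 bytes-per-second link gives a step time of 1.0 second.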


As per claim 10, Vaidyanathan teaches wherein each of the plurality of first processing nodes is configured to (fig 1 102), prior to taking part in the one or more reduce-scatter collectives ([0002] nodes, collective operations, reduce-scatter operation), load the elements of the respective array of data items that were unloaded to storage by a processing node of the further set to which the respective first processing node belongs.  
Chetlur teaches the remaining claim elements of prior to taking part ([0093] input/data used, during forward propagation of input/output data), load the elements of the respective array of data items that were unloaded to storage by a processing node of the further set to which the respective first processing node belongs ([0093] load information into processor ALU based on architecture of a neural network to which code corresponds).

As per claim 11, Vaidyanathan teaches wherein each array of data items comprises an item from a list consisting of ([0011] input data array [0028] each input data array is <1, 2, 3, 4, 5, 6, 7, 8>).
Chetlur teaches remaining claim elements of weight updates for a neural network ([0067] set of nodes, associated, set of weights 202 neural network); and 
weights for a neural network derived from weight updates for the neural network ([0069] updates weights, all-gathered, workers, neural network).  

As per claim 12, Vaidyanathan teaches a data processing system as claimed in claim 2, wherein each array of data items comprises an item from a list consisting of ([0011] input data array [0028] each input data array is <1, 2, 3, 4, 5, 6, 7, 8>): 
weight updates for a neural network; and 
weights for a neural network derived from weight updates for the neural network, wherein at least one processor of at least one processing node of each of the further sets of processing nodes is configured (fig 1 102 140) to generate the weight updates using a different set of training data ([0012] collective operation, stages, in stage, pairs of processing nodes exchange half of their data and reduce the exchange data, processing continues in one or multiple subsequent stages, until each processing node stored an element of the resulting output data array).  
Chetlur teaches remaining claim elements of weight updates for a neural network ([0067] set of nodes, associated, set of weights 202 neural network); and 
weights for a neural network derived from weight updates for the neural network ([0069] updates weights, all-gathered, workers, neural network), generate the weight using set of training data ([0078] set of training data).


As per claim 13, Vaidyanathan teaches wherein the at least one processor of each of the first processing nodes in the data processing system is configured to (fig. 1 102 140):
following a reduce-scatter collective of the all-reduce collective ([0011] reduce-scatter operation) and perform operations on each stored element of the array of data items stored by the respective first processing node to modify data of the stored elements of the array of data items ([0011] conclusion of the reduce-scatter operation, the elements of output data arrays are equally scattered across the processing nodes).
Chetlur teaches remaining claim elements of prior to an all-gather collective of the all-reduce collective ([0056] all-reduce, combination of reduce-scatter and all-gather).

As per claim 14, Vaidyanathan teaches each of the stored elements comprises weight updates for a neural network ([0028] input data array <1, 2, 3, 4, 5, 6, 7, 8>); and 
the operations to modify data comprise providing updated weights of the neural network using the weight updates (fig. 4B stage 1 360 [0011] reduction, mathematical addition), 
the modified data of the stored elements comprises the updated weights (fig. 4B stage 2 370).  
Chetlur teaches remaining claim elements of weight updates for a neural network ([0067] set of nodes, associated, set of weights 202 neural network), 
updated weights of the neural network using the weight updates ([0069] updates weights, all-gathered, workers, neural network),
stored elements comprises the updated weights ([0071] updated portions are distributed among workers so that each worker has a complete updated set of weights). 
As per claim 35, Vaidyanathan teaches wherein the at least one processor of each of the plurality of first processing nodes is configured to (fig. 1 102 140) execute compute instructions during a compute phase ([0011] reduction, mathematical addition) and, following a precompiled synchronization barrier, enter at least one exchange phase ([0011] exchange their data and apply reduction operations), wherein each of the plurality of first processing nodes is configured to (fig. 1 102): 
take part in the one or more reduce-scatter collectives during the at least one exchange phase ([0011] reduce-scatter operations, exchange their data and apply reduction operations); 
perform the exchange of the respective reduced subset of the array of data items by participating in the all-reduce collective during the at least one exchange phase ([0011] reduce-scatter operations, exchange their data and apply reduction operations [0033] all-reduce operations); and
take part in the one or more all-gather collectives during the at least one exchange phase ([0011] reduce-scatter operations, exchange their data and apply reduction operations [0033] all-gather operations).  

Chetlur teaches the remaining claim elements of a pre-compiled synchronization barrier ([0455] programming models provide single construct for synchronizing cooperating threads, barrier, programmers, define groups of threads).

Claim 22 recites a method for elements of claim 2. Therefore, it is rejected for the same rationale.
Claim 23 recites a method for elements of claim 5. Therefore, it is rejected for the same rationale.
Claim 24 recites a method for elements of claim 5. Therefore, it is rejected for the same rationale.
Claim 25 recites a method for elements of claim 7. Therefore, it is rejected for the same rationale.
Claim 26 recites a method for elements of claim 11. Therefore, it is rejected for the same rationale.
Claim 27 recites a method for elements of claim 12. Therefore, it is rejected for the same rationale.
Claim 28 recites a method for elements of claim 13. Therefore, it is rejected for the same rationale.
Claim 29 recites a method for elements of claim 14. Therefore, it is rejected for the same rationale.
Claim 36 recites a method for elements of claim 35. Therefore, it is rejected for the same rationale.

Claim 31 recites the non-transitory computer readable medium for elements of claim 11. Therefore, it is rejected for the same rationale.
Claim 32 recites the non-transitory computer readable medium for elements of claim 13. Therefore, it is rejected for the same rationale.
Claim 33 recites the non-transitory computer readable medium for elements of claim 14. Therefore, it is rejected for the same rationale.
Claim 34 recites the non-transitory computer readable medium for elements of claim 35. Therefore, it is rejected for the same rationale.


Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 


Howard; Kevin D. (US-20130067443-A1) Parallel Processing Development Environment Extensions.
Kalamkar; Dhiraj D. (US-20180293492-A1) Abstraction Library To Enable Scalable Distributed Machine Learning.
Langer; Akhil (US-20190042527-A1) Techniques For Collective Operations In Distributed Systems.
LEE; Jinho (US-20200311016-A1) Method for Flexible, Fast All-Reduce on Arbitrary Tree Topology.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABU ZAR GHAFFARI whose telephone number is (571)270-3799. The examiner can normally be reached Monday-Thursday 9:00 - 17:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai AN can be reached on 571-272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

ABU ZAR GHAFFARI
Primary Examiner
Art Unit 2195


/ABU ZAR GHAFFARI/Primary Examiner, Art Unit 2195