DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 8, and 9 have been amended.
Claims 1-9 have been examined.
The § 112 rejections in the previous Office Action have been addressed and are withdrawn.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-9 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or, for pre-AIA, the applicant, regards as the invention.
Claim 1 recites, at lines 18-19, “the time cost = ∑Ti/min.” This language renders the scope of the claim indefinite because the meaning of “min” is ambiguous in the context of the claim. As commonly used, the abbreviation “min” in a denominator means minute, as in revolutions per minute. The Examiner notes that the claim previously recited minute explicitly. However, the specification appears to contradict this meaning. For example, ¶ [0114] of the Application as filed indicates that “min(Wm,n)” specifies a bandwidth. Given this conflict between the plain meaning of the claim language and the written description, and because the claim recites only “min,” omitting both the parameters used in the written description to specify which bandwidth is meant and any other textual clarification, a person having ordinary skill in the art would be unable to definitely determine the metes and bounds of the claim.
Claims 2-7 are rejected as depending from rejected base claims and failing to cure the indefiniteness of those base claims.
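For illustration only, the following sketch contrasts the two readings of “∑Ti/min” discussed above. All values are hypothetical, and the pairwise bandwidth table stands in for the min(Wm,n) notation of ¶ [0114] as an assumption. Under the specification's reading, dividing a data amount by a minimum bandwidth yields a time; under the plain-meaning reading, dividing a data amount by one minute yields a transfer rate, not a time.

```python
# Hypothetical illustration of the two readings of "time cost = ∑Ti/min".
# All numbers are invented; none are taken from the record.

# Ti: amount of data transferred by each arithmetic processor, in bytes.
transfer_bytes = [4_000_000, 4_000_000, 8_000_000]
total_bytes = sum(transfer_bytes)  # ∑Ti = 16,000,000 bytes

# Reading 1 (specification, ¶ [0114]): "min" is a minimum bandwidth, i.e., the
# smallest of the pairwise bandwidths between processors, in bytes/second.
pairwise_bandwidth = {(0, 1): 8e9, (0, 2): 2e9, (1, 2): 4e9}
min_bandwidth = min(pairwise_bandwidth.values())  # 2e9 bytes/s
time_cost_seconds = total_bytes / min_bandwidth   # bytes / (bytes/s) = seconds

# Reading 2 (plain meaning): "min" is the unit "minute".
# Dividing an amount of data by a unit of time yields a rate, not a time.
rate_bytes_per_minute = total_bytes / 1           # bytes per minute

print(f"Reading 1: {time_cost_seconds} s")
print(f"Reading 2: {rate_bytes_per_minute} bytes/min")
```

The units make the ambiguity concrete: the same expression produces seconds under one reading and bytes per minute under the other.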

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent No. 10,325,343 by Zhao et al. (hereinafter referred to as “Zhao”) in view of US Publication No. 2015/0215379 by Tamano (as cited by Applicant and hereinafter referred to as “Tamano”), and further in view of US Publication No. 2014/0164600 by Archer et al. (hereinafter referred to as “Archer”).
Regarding claims 1, 8, and 9, and taking claim 1 as representative, Zhao discloses:
a system comprising: a plurality of information processing devices each including a group of arithmetic processors, the plurality of information processing devices being configured to perform parallel processing… (Zhao discloses, at col. 1, lines 48-57, a system having a plurality of GPU servers, which are information processing devices. As disclosed at col. 5, line 52 to col. 6, line 6, the GPU servers each include one or more GPU devices, which are arithmetic processors configured to perform processing.), wherein 
at least one of the plurality of information processing devices includes: a memory configured to store bandwidth information indicating a communication bandwidth with which an arithmetic processor included in the groups of arithmetic processors communicates with another arithmetic processor included in the groups of arithmetic processors (Zhao discloses, at col. 5, line 52 to col. 6, line 6, that the GPU servers include performance metric tables, i.e., memory that stores, as disclosed at col. 11, lines 13-28, interconnection (bandwidth) information.), and 
a processor coupled to the memory and configured to, for a source arithmetic processor that is any one of the groups of arithmetic processors, determine a destination arithmetic processor that is in one of the groups of arithmetic processors to which…data of the source arithmetic processor is to be transferred, based on the bandwidth information stored in the memory (Zhao discloses, at col. 5, line 52 to col. 6, line 6, that the GPU servers include a GPU grouping and provisioning system (which is a processor). As disclosed at col. 9, lines 49-55, the grouping and provisioning system groups GPU devices, i.e., determines a destination arithmetic processor, based on interconnect information, i.e., bandwidth, as disclosed at col. 10, lines 28-36.);
determine minimum bandwidth between pairs of GPUs (Zhao discloses, at col. 13, lines 48-56 and Figure 6B, determining the bandwidths between pairs of GPUs; determining the bandwidth of every pair necessarily identifies the minimum bandwidth among them. As disclosed at col. 12, line 60 to col. 13, line 4, lower scores indicate higher bandwidth, which corresponds to faster transmission speed, and higher scores indicate lower bandwidth, which corresponds to slower transmission speed. Thus, the highest score in table 610 represents the minimum bandwidth.).
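The inverse relationship between Zhao's scores and bandwidth can be sketched as follows. The GPU labels and score values are hypothetical and are not taken from table 610; the sketch assumes only that a lower score indicates a higher bandwidth, as stated at col. 12, line 60 to col. 13, line 4.

```python
# Hypothetical pairwise interconnect scores (lower score = higher bandwidth).
# Values are invented for illustration; they are not Zhao's table 610.
scores = {
    ("GPU0", "GPU1"): 1,
    ("GPU0", "GPU2"): 3,
    ("GPU1", "GPU2"): 5,
}

# Because the score is inversely related to bandwidth, the pair with the
# highest score is the pair with the minimum bandwidth.
slowest_pair = max(scores, key=scores.get)
print(slowest_pair, scores[slowest_pair])
```

Once every pairwise score is known, identifying the minimum-bandwidth pair requires only a single maximum over the table.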
Zhao does not explicitly disclose that the parallel processing involves using calculation result data of the groups of arithmetic processors included in the plurality of information processing devices, that the data transmitted to the selected destination device includes the calculation result data, that the aforementioned processor is configured to perform All-reduced processing, or that the processor is configured to calculate a time cost for the All-reduced processing, where the time cost = ∑Ti/min and Ti is an amount of data transfer per each of the arithmetic processors. 
However, in the same field of endeavor (e.g., distributed processing) Tamano discloses:
using calculation result data of the groups of processors and that the data transmitted includes the calculation result data (Tamano discloses, at ¶ [0040], transmitting result data among distributed processes.); and
performing all-reduce processing (Tamano discloses, at ¶ [0064], performing an AllReduce.).

Also in the same field of endeavor (e.g., collective operations) Archer discloses:
calculating a time cost for collective operations, the time cost = ∑Ti…where Ti is an amount of data transfer per each of the arithmetic processors (Archer discloses, at Table 1, a calculated execution time (time cost) for collective operations based on amount of data transferred and configuration. Calculating execution time based on amount of data implicitly discloses utilizing the bandwidth. That is, if the amount of data being transferred is known, calculating the amount of time taken to transfer the data utilizes the speed or rate of transfer, i.e., the bandwidth.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s GPU resources to include Archer’s method of calculating performance metrics and selecting the best performing configuration to implement collective operations because doing so enables the best performing configuration to be used. See Archer, ¶ [0069]. 

Regarding claim 2, Zhao, as modified, discloses the elements of claim 1, as discussed above. Zhao also discloses:
the processor determines the destination arithmetic processor so as to reduce a first time taken for the calculation result data of each of the groups of arithmetic processors to be shared among the groups of arithmetic processors (Zhao discloses, at col. 10, lines 36-42, grouping GPUs together to provide faster communication, which reduces the time taken to share data among the GPUs.).

Regarding claim 3, Zhao, as modified, discloses the elements of claim 2, as discussed above. Zhao does not explicitly disclose wherein: a step is defined as transfer of the calculation result data, whose data volume is determined according to a predetermined algorithm, between each pair of arithmetic processors among part or all of the groups of arithmetic processors; and the processor is configured to: obtain a number of the steps taken for the calculation result data of each of the groups of arithmetic processors to be shared among the groups of arithmetic processors, and obtain a transfer data amount in each of the steps, determine, for each of the steps, a set of source-destination patterns each indicating a combination of the source arithmetic processors and the destination arithmetic processors, calculate, for each of a plurality of source-destination pattern combinations, the first time, based on the bandwidth information and the transfer data amount in each of the steps, each of the plurality of source-destination pattern combinations being a combination of source-destination patterns that are respectively selected from the sets of source-destination patterns determined for the respective steps, and select at least one source-destination pattern combination for which the calculated first time is shortest, from the plurality of source-destination pattern combinations.
However, in the same field of endeavor (e.g., distributed processing) Tamano discloses:
wherein: a step is defined as transfer of the calculation result data, whose data volume is determined according to a predetermined algorithm, between each pair of arithmetic processors among part or all of the groups of arithmetic processors (Tamano discloses, at ¶ [0071], a number of steps that each involve transferring calculation result data between each pair of processes. The data volume varies depending on the step. In a first step, the volume transferred is half of the volume transferred in a subsequent step (i.e., D1D2 in a first step, D1D2D3D4 in a subsequent step.). The algorithm that specifies the volume is given by the selected collective communication method, e.g., Recursive halving and doubling, as disclosed at ¶ [0066].); and  
the processor is configured to: obtain a number of the steps taken for the calculation result data of each of the groups of arithmetic processors to be shared among the groups of arithmetic processors,… (Tamano discloses, at ¶ [0066], selecting which collective communication algorithm to use, which determines the number of steps, as described at ¶ [0071].).
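As a hedged illustration of how a selected collective algorithm fixes both the number of steps and the per-step transfer volume, the following sketch assumes a recursive-doubling exchange, i.e., the doubling phase of the recursive halving and doubling method referred to at Tamano, ¶ [0066]. The function name, block size, and process count are hypothetical, and the sketch is not asserted to reproduce Tamano's exact procedure. For a power-of-two number of processes, the number of steps is log2 of the process count, and the volume each process sends doubles at every step.

```python
def recursive_doubling_schedule(num_procs, block_bytes):
    """Per-step schedule of a recursive-doubling exchange (hypothetical sketch).

    Assumes num_procs is a power of two and each process starts with one
    block of block_bytes. At step k (0-based), process r exchanges with
    r XOR 2**k and sends all 2**k blocks it has gathered so far.
    Returns a list of (step_number, pairs, bytes_sent_per_process).
    """
    if num_procs < 2 or num_procs & (num_procs - 1):
        raise ValueError("num_procs must be a power of two >= 2")
    num_steps = num_procs.bit_length() - 1  # log2(num_procs)
    schedule = []
    for k in range(num_steps):
        distance = 1 << k
        # List each exchanging pair once, lower rank first.
        pairs = [(r, r ^ distance) for r in range(num_procs) if r < r ^ distance]
        schedule.append((k + 1, pairs, distance * block_bytes))
    return schedule


for step, pairs, nbytes in recursive_doubling_schedule(4, 100):
    print(f"step {step}: pairs {pairs}, {nbytes} bytes sent per process")
```

With four processes, the sketch yields two steps, and the second step sends twice the data of the first, which mirrors the doubling progression (D1D2, then D1D2D3D4) read in Tamano, ¶ [0071].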

Also in the same field of endeavor (e.g., collective operations) Archer discloses:
obtain a transfer data amount in each of the steps and determine, for each of the steps, a set of source-destination patterns each indicating a combination of the source arithmetic processors and the destination arithmetic processors (Archer discloses, at ¶ [0070], determining message size (data transfer amount) to be used in each step, and at ¶ [0069], a plurality of system configurations (set of source-destination patterns indicating a combination of source and destination processors).); 
calculate, for each of a plurality of source-destination pattern combinations, the first time, based on the bandwidth information and the transfer data amount in each of the steps, each of the plurality of source-destination pattern combinations being a combination of source-destination patterns that are respectively selected from the sets of source-destination patterns determined for the respective steps (Archer discloses, at Table 1, a calculated execution time for each of a plurality of configurations. As described at ¶ [0069], the performance metrics determine how well each configuration performs. That is, execution time depends on the nodes participating and network topology (which indicates various bandwidths, as disclosed at ¶ [0060]) and the message size (transfer data amount).); and 
select at least one source-destination pattern combination for which the calculated first time is shortest, from the plurality of source-destination pattern combinations (Archer discloses, at ¶ [0069], selecting the configuration which best satisfies a desired metric, such as shortest execution time.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s GPU resources to include Archer’s method of calculating performance metrics and selecting the best performing configuration to implement collective operations because doing so enables the best performing configuration to be used. See Archer, ¶ [0069].

Regarding claim 4, Zhao, as modified, discloses the elements of claim 3, as discussed above. Zhao does not explicitly disclose wherein for calculation of the first time in each of the plurality of source-destination pattern combinations, the processor uses a minimum communication bandwidth that is smallest one of communication bandwidths between arithmetic processors in each of the steps.
However, in the same field of endeavor (e.g., collective operations) Archer discloses:
wherein for calculation of the first time in each of the plurality of source-destination pattern combinations, the processor uses a minimum communication bandwidth that is smallest one of communication bandwidths between arithmetic processors in each of the steps (Archer discloses, at ¶ [0069], selecting the lowest amount of network traffic (bandwidth).).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s GPU resources to include Archer’s method of calculating performance metrics and selecting the best performing configuration to implement collective operations because doing so enables the best performing configuration to be used. See Archer, ¶ [0069].
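The calculation recited in claims 3 and 4 can be sketched, under stated assumptions, as an exhaustive search: each step's time is the transfer amount divided by the smallest bandwidth among the pairs used in that step, the per-step times are summed into the first time, and the combination with the shortest first time is selected. The four-GPU topology, bandwidth values, pattern labels A-C, and the restriction that the two steps use different pairings (which suffices for a two-step exchange among four GPUs) are all hypothetical. This is not presented as Applicant's, Tamano's, or Archer's implementation.

```python
from itertools import permutations

# Hypothetical pairwise bandwidths between four GPUs (GB/s); invented values.
BANDWIDTH = {
    frozenset({0, 1}): 10.0, frozenset({2, 3}): 10.0,
    frozenset({0, 2}): 2.0, frozenset({1, 3}): 2.0,
    frozenset({0, 3}): 5.0, frozenset({1, 2}): 5.0,
}

# The three ways to split four GPUs into exchanging pairs; each is a
# candidate source-destination pattern for any single step.
PATTERNS = {
    "A": ((0, 1), (2, 3)),
    "B": ((0, 2), (1, 3)),
    "C": ((0, 3), (1, 2)),
}


def step_time(pattern, gigabytes):
    """Time for one step, limited by the slowest (minimum-bandwidth) pair."""
    slowest = min(BANDWIDTH[frozenset(pair)] for pair in pattern)
    return gigabytes / slowest


def first_time(names, step_gigabytes):
    """Sum of per-step times for one combination of patterns."""
    return sum(step_time(PATTERNS[n], gb) for n, gb in zip(names, step_gigabytes))


def select_fastest(step_gigabytes):
    """Evaluate every combination (a different pattern per step) and
    return the one with the shortest estimated first time."""
    combos = permutations(PATTERNS, len(step_gigabytes))
    return min(combos, key=lambda names: first_time(names, step_gigabytes))


# Two steps; the second step moves twice as much data as the first.
best = select_fastest([1.0, 2.0])
print(best, first_time(best, [1.0, 2.0]))
```

With these invented values, the search assigns pattern C to the smaller first step and the fastest pairing, pattern A, to the larger second step, for an estimated first time of 0.4 seconds.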

Regarding claim 5, Zhao, as modified, discloses the elements of claim 3, as discussed above. Zhao does not explicitly disclose wherein in determination of the set of source-destination patterns, the processor determines, for each of the steps and for each of a plurality of data-sharing algorithms, the set of source-destination patterns with which the calculation result data is transferred in the part or all of the plurality of arithmetic processors.
However, in the same field of endeavor (e.g., distributed processing) Tamano discloses:
wherein in determination of the set of source-destination patterns, the processor determines, for each of the steps and for each of a plurality of data-sharing algorithms, the set of source-destination patterns with which the calculation result data is transferred in the part or all of the plurality of arithmetic processors (Tamano discloses, at ¶¶ [0065] and [0071], determining, for each step in the selected algorithm, the processes between which data is shared.).


Regarding claim 6, Zhao, as modified, discloses the elements of claim 5, as discussed above. Zhao does not explicitly disclose that each of the groups of arithmetic processors included in the plurality of information processing devices is used for learning processing to learn weight coefficients in a predetermined neural network; and each of the groups of arithmetic processors divides the calculation result data into a predetermined number of subdivided pieces in All-Reduced processing in the learning processing, assigns one of the set of source-destination patterns to each of the subdivided pieces of the calculation result data, and transmits the subdivided pieces of the calculation result data to the destination arithmetic processors, in parallel, based on the assigned source-destination patterns.
However, in the same field of endeavor (e.g., distributed processing) Tamano discloses:
each of the groups of arithmetic processors included in the plurality of information processing devices is used for learning processing to learn weight coefficients in a predetermined neural network (Tamano discloses, at ¶¶ [0180]-[0183], learning weight values.); and 
each of the groups of arithmetic processors divides the calculation result data into a predetermined number of subdivided pieces in All-Reduced processing in the learning processing, assigns one of the set of source-destination patterns to each of the subdivided pieces of the calculation result data, and transmits the subdivided pieces of the calculation result data to the destination arithmetic processors, in parallel, based on the assigned source-destination patterns (Tamano discloses, at ¶ [0064], performing an AllReduce that involves, as disclosed at ¶ [0071], dividing the results data between the processes and the processes sharing the results data with specified other processes in parallel based on the method selected for the collective communication.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s GPU resources to include transferring result data according to various algorithms, as disclosed by Tamano, because machine learning and data mining are increasingly prevalent, using result data is integral to collective operations such as MPI AllReduce, and such collective operations are likewise integral to machine learning. As disclosed at ¶ [0009] of Tamano, preventing such collective operations from becoming a performance bottleneck is important for enhancing processing speed, which is likewise important as data sets continue to grow, as described at Tamano, ¶ [0003].

Regarding claim 7, Zhao, as modified, discloses the elements of claim 1, as discussed above. Zhao does not explicitly disclose that each of the groups of arithmetic processors included in the plurality of information processing devices is used for learning processing to learn weight coefficients in a predetermined neural network; and the processor obtains the bandwidth information before the learning processing to learn the weight coefficients is performed, and determines the destination arithmetic processor, based on the obtained bandwidth information.
However, in the same field of endeavor (e.g., distributed processing) Tamano discloses:
each of the groups of arithmetic processors included in the plurality of information processing devices is used for learning processing to learn weight coefficients in a predetermined neural network (Tamano discloses, at ¶¶ [0180]-[0183], learning weight values.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s GPU resources to include transferring result data according to various algorithms, as disclosed by Tamano, because machine learning and data mining are increasingly prevalent, using result data is integral to collective operations such as MPI AllReduce, and such collective operations are likewise integral to machine learning. As disclosed at ¶ [0009] of Tamano, preventing such collective operations from becoming a performance bottleneck is important for enhancing processing speed, which is likewise important as data sets continue to grow, as described at Tamano, ¶ [0003].
Also in the same field of endeavor (e.g., collective operations) Archer discloses:
the processor obtains the bandwidth information before the learning processing to learn the weight coefficients is performed, and determines the destination arithmetic processor, based on the obtained bandwidth information (Archer discloses, at ¶ [0071], determining a preferred configuration, which is based on bandwidth, so that the preferred configuration can be used in performing subsequent collective operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s GPU resources to include Archer’s method of calculating performance metrics and selecting the best performing configuration to implement collective operations because doing so enables the best performing configuration to be used. See Archer, ¶ [0069].

Response to Arguments
On page 8 of the response filed February 19, 2021 (“response”), the Applicant argues, “The Office Action asserts on page 4 that Zhao discloses calculating a time cost in column 13, lines 12-15. Initially, it is noted that an Office Action dated July 6, 2020, states on page 3 that "Zhao does not explicitly disclose...that the aforementioned processor is configured to...calculate a time cost." If it was acknowledge [sic] that Zhao did not disclose calculating a time cost before, how can Zhao now disclose this feature?”
Though fully considered, the Examiner respectfully disagrees. Based on the current amendments, Zhao is not interpreted as disclosing the limitations in question, i.e., calculating a time cost. Therefore, the Applicant’s arguments are moot. However, the Examiner notes that in the amendments of November 6, 2020, the Applicant amended calculating a time cost to include dividing an amount of data by a unit of time, which is the definition of bandwidth. As calculating bandwidth is explicitly disclosed by Zhao, the claims in that form read on Zhao, whereas the claims prior to that amendment did not include those features and did not read on Zhao. 


On page 8 of the response the Applicant argues, “As stated on page 3 of the Office Action, "[t]he specification recites, at paragraphs [0114] et seq. calculating a time cost. However, the calculation uses a formula that divides the sum of an amount of information by a minimum bandwidth" (underlined emphasis added). As acknowledged on page 3 of the Office Action, the formula uses bandwidth to calculate a time cost; therefore, "calculating an actual operating bandwidth" cannot be deemed to be calculating a time cost when the time cost calculation includes using the bandwidth in the calculation.”
Though fully considered, the Examiner respectfully disagrees. As an initial matter, the Examiner notes that the amendment of the claims to more closely correspond to the specification has overcome the interpretation that Zhao discloses calculating a time cost as claimed. However, the Examiner notes that the previous rejection was based on the fact that despite what was described in the specification, the claims previously did not recite the calculation dividing an amount of information by a bandwidth. Instead, the claims recited features that conflicted with the description in the specification. Specifically, the claims 
The Examiner notes that, as indicated in the § 112(b) rejection above, the amended language is still unclear regarding the scope of the claims, as “min” is a commonly used abbreviation for minute, and the claims lack any other limitations that would conclusively inform a reader of the metes and bounds of the term “min” as used in the claims. While the specification indicates, though not with abundant clarity, that min(Wm,n) corresponds to a minimum bandwidth, the Applicant has chosen to recite only min, rather than min(Wm,n), and has not provided any text that would clarify that min corresponds to the minimum bandwidth referred to as min(Wm,n) in the specification, and not to minute, as previously recited and commonly used. 

On page 9 of the response the Applicant argues that the remaining claims are allowable for similar reasons.
Though fully considered, the Examiner respectfully disagrees. The reasons set forth in the remarks and rejections presented above, including those regarding the independent claims, are applicable to these claims.

Conclusion
The following prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
US 20160269247 by Chadrakar discloses that throughput for all-reduce operations is dictated by the slowest link.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
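A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.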
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee J. Li can be reached on 571-272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/
Examiner, Art Unit 2183