Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of the Application
This Office Action is in response to Applicant’s Amendment filed on 7/13/2022.
Claims 1-13 are pending for this examination.
Claims 10-13 were amended.

Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Archer et al. (US 8,752,051), herein referred to as Archer ‘051.
Referring to claim 1, Archer ‘051 teaches a compute system (see Fig. 1, parallel computer 100) comprising:
a plurality of parallel processing units organized into a plurality of subsets of parallel processing units (see Fig. 1, multiple compute nodes 102 that can be arranged into operational groups 132, i.e. plurality of compute nodes that can be arranged into a plurality of subsets of nodes; see Fig. 2, wherein a compute node 152 comprises multiple processing cores 164), wherein each parallel processing unit in each subset is coupled to each of the other parallel processing units in the same subset of parallel processing units by two bi-directional communication links (see Fig. 3A, wherein a compute node 152 can be arranged with a point-to-point adapter 180 for connecting to other compute nodes that can be arranged into a torus network 107 or mesh 108, see Fig. 4; see Col. 9, lines 15-24, wherein the communication links of a point-to-point adapter are bidirectional links +x 181, -x 182, +y 183, -y 184, +z 185, and -z 186), and each parallel processing unit is coupled to a corresponding parallel processing unit of a corresponding other of the subset of parallel processing units by one bi-directional communication link (see Fig. 3A, wherein a compute node 152 can be arranged with a point-to-point adapter 180 for connecting to other compute nodes that can be arranged into a torus network 107 or mesh 108, see Fig. 4, wherein the nodes communicate with each other through at least one communication link).
Examiner points out that Archer ‘051 only teaches the usage of bidirectional communication links to connect compute nodes in a point-to-point manner, but does not specifically indicate that nodes within any specific organizational group are connected to each other using two bidirectional links as being claimed.  
Examiner points out that a person of ordinary skill in the art would recognize that having multiple buses or communication links connecting two devices / nodes would allow for higher speed, as multiple active communication links between nodes would allow for larger throughput of a singular transfer, or for multiple separate commands / data to be transmitted in parallel at any point in time, both causing higher overall speeds in data transfer between nodes of a network. It would thereby have been obvious to a person of ordinary skill in the art at the time of invention to want to design the computing devices in a network to include as many communication links between nodes as allowable for connection, to increase the amount of throughput and parallelism between the nodes. However, this would be a simple matter of design choice that would not be particularly patentably distinct from other network systems such as Archer ‘051, which already teaches connecting compute nodes together using bidirectional links, as increasing the number of communication links between two specific nodes would allow for increased speeds at a higher cost of manufacture, since having two active communication links would require two hardware interfaces, which may be undesirable for some computer system designers. As such, Examiner points out that having one or more bidirectional communication links between nodes / processors would just be a matter of design choice that is an obvious variant to those of ordinary skill in the art, depending on whether the designer wants advantages in speed / throughput with disadvantages in costs / hardware needed or not.
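The speed versus hardware trade-off described above can be illustrated with a minimal sketch. It assumes an idealized model, not taken from Archer ‘051 or from the claims, in which a single transfer is striped evenly across identical links; the function name, the payload size, and the per-link bandwidth are hypothetical values chosen only for illustration.

```python
def transfer_time(payload_bytes: float, link_bandwidth: float, num_links: int) -> float:
    """Idealized time (seconds) to move a payload striped evenly over identical links.

    Assumption: links are independent and share the payload equally; real
    interconnects add protocol overhead and contention that this ignores.
    """
    if num_links < 1:
        raise ValueError("at least one link is required")
    return payload_bytes / (link_bandwidth * num_links)


# Hypothetical figures: a 1 GB payload over links of 25 GB/s each.
one_link = transfer_time(1e9, 25e9, 1)
two_links = transfer_time(1e9, 25e9, 2)
```

Under this model a second link halves the transfer time, at the cost of a second hardware interface per node pair, which is the design trade-off the rejection relies on.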
As to claim 2, Archer ‘051 teaches the compute system of Claim 1, wherein the parallel processing units and the bi-directional communication links are configured to compute an All_Reduce function (see Abstract; also see Fig. 6, step 600).

Referring to claim 10, Archer ‘051 teaches one or more non-transitory computing device readable media having instructions stored thereon that when executed by one or more processing units to perform a method (see Abstract; see Col. 21, lines 24-52) comprising:
configuring communication links for a cluster of eight parallel processing units organized into two subsets (see Fig. 1, multiple compute nodes 102 that can be arranged into operational groups 132, i.e. plurality of compute nodes that can be arranged into a plurality of subsets of nodes; see Fig. 2, wherein a compute node 152 comprises multiple processing cores 164; Examiner points out that the specific number of processing units / nodes would be a matter of design choice that is not patentably distinct in this instance, considering that Archer ‘051 teaches multiple nodes arranged in a torus / mesh network, see Fig. 4);
configuring each parallel processing unit in the same subset to be coupled to each of the other parallel processing units in the same subset by two bi-directional communication links (see Fig. 3A, wherein a compute node 152 can be arranged with a point-to-point adapter 180 for connecting to other compute nodes that can be arranged into a torus network 107 or mesh 108, see Fig. 4; see Col. 9, lines 15-24, wherein the communication links of a point-to-point adapter are bidirectional links +x 181, -x 182, +y 183, -y 184, +z 185, and -z 186); and
configuring each parallel processing unit to be coupled to a corresponding parallel processing unit in the other subset by one bi-directional communication link (see Fig. 3A, wherein a compute node 152 can be arranged with a point-to-point adapter 180 for connecting to other compute nodes that can be arranged into a torus network 107 or mesh 108, see Fig. 4, wherein the nodes communicate with each other through at least one communication link).
Examiner points out that Archer ‘051 only teaches the usage of bidirectional communication links to connect compute nodes in a point-to-point manner, but does not specifically indicate that nodes within any specific organizational group are connected to each other using two bidirectional links as being claimed.  
Examiner points out that a person of ordinary skill in the art would recognize that having multiple buses or communication links connecting two devices / nodes would allow for higher speed, as multiple active communication links between nodes would allow for larger throughput of a singular transfer, or for multiple separate commands / data to be transmitted in parallel at any point in time, both causing higher overall speeds in data transfer between nodes of a network. It would thereby have been obvious to a person of ordinary skill in the art at the time of invention to want to design the computing devices in a network to include as many communication links between nodes as allowable for connection, to increase the amount of throughput and parallelism between the nodes. However, this would be a simple matter of design choice that would not be particularly patentably distinct from other network systems such as Archer ‘051, which already teaches connecting compute nodes together using bidirectional links, as increasing the number of communication links between two specific nodes would allow for increased speeds at a higher cost of manufacture, since having two active communication links would require two hardware interfaces, which may be undesirable for some computer system designers. As such, Examiner points out that having one or more bidirectional communication links between nodes / processors would just be a matter of design choice that is an obvious variant to those of ordinary skill in the art, depending on whether the designer wants advantages in speed / throughput with disadvantages in costs / hardware needed or not.

Allowable Subject Matter
Claims 3-6 and 11-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
As to claim 3, Examiner finds that prior art does not specifically teach the specifics of this dependent claim, with regards to the intermediate data and the broadcasting a corresponding group of sum data.
As to claims 4-6, Examiner points out that prior art does not specifically teach the usage of specifically eight parallel processing units that are divided into two subsets of four processing units each, with two bi-directional communication links coupling each parallel processing unit to each of the other three parallel processing units in the same subset of parallel processing units, and one bi-directional communication link coupling each parallel processing unit to a corresponding parallel processing unit in the other subset of parallel processing units, as claimed.
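The arrangement recited for claims 4-6 can be sketched as a small link table. This is an illustrative model only: the unit numbering (0-3 in one subset, 4-7 in the other), the pairing of unit i with unit i+4 as "corresponding" units, and all names are assumptions made for the example and are not drawn from the application or from Archer ‘051.

```python
from itertools import combinations

NUM_UNITS = 8      # eight parallel processing units
SUBSET_SIZE = 4    # two subsets of four units each


def build_links() -> dict:
    """Map each unordered pair of units to its number of bi-directional links."""
    subsets = [list(range(start, start + SUBSET_SIZE))
               for start in range(0, NUM_UNITS, SUBSET_SIZE)]
    links = {}
    # Two bi-directional links between every pair of units in the same subset.
    for subset in subsets:
        for a, b in combinations(subset, 2):
            links[(a, b)] = 2
    # One bi-directional link between corresponding units of the two subsets
    # (assumed pairing: position i in subset 0 with position i in subset 1).
    for a, b in zip(subsets[0], subsets[1]):
        links[(a, b)] = 1
    return links


def links_at(unit: int, links: dict) -> int:
    """Total number of links terminating at one unit."""
    return sum(count for pair, count in links.items() if unit in pair)


links = build_links()
per_unit = [links_at(u, links) for u in range(NUM_UNITS)]
total_links = sum(links.values())
```

In this model every unit terminates 2 x 3 links to its three subset peers plus one link to its counterpart in the other subset.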
As to claims 11-13, Examiner finds that prior art does not specifically teach reducing the input data along 2x3 of the bi-directional communication links in parallel on both subsets of parallel processing units; reducing data between corresponding parallel processing units in the two subsets of parallel processing units; and broadcasting data along 2x3 of the bi-directional communication links in parallel on both subsets of the parallel processing units.
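The three steps listed for claims 11-13 can be modeled, at the level of data movement only, with the sketch below. It is one plausible reading and not the applicant's implementation: it assumes each unit's input is split into four chunks, that each unit in a subset is responsible for one chunk, and that the reduction is a sum. The function name and data layout are hypothetical, and the sketch does not model which physical links carry which chunk.

```python
SUBSETS = [[0, 1, 2, 3], [4, 5, 6, 7]]  # assumed numbering of the eight units


def all_reduce(inputs):
    """Sum-reduce four chunks across eight units in three phases.

    inputs[u][c] is chunk c held by unit u; returns one result list per unit.
    """
    # Phase 1: reduce within each subset; the unit at position `slot`
    # ends up holding the subset-wide sum of chunk `slot`.
    partial = {}
    for subset in SUBSETS:
        for slot, owner in enumerate(subset):
            partial[owner] = sum(inputs[u][slot] for u in subset)

    # Phase 2: reduce between corresponding units of the two subsets,
    # so each pair holds the global sum of its chunk.
    total = {}
    for a, b in zip(*SUBSETS):
        total[a] = total[b] = partial[a] + partial[b]

    # Phase 3: broadcast within each subset so every unit holds all four sums.
    return [[total[owner] for owner in subset]
            for subset in SUBSETS for _ in subset]


# Hypothetical input: unit u holds chunks u*10 + 0 .. u*10 + 3.
inputs = [[u * 10 + c for c in range(4)] for u in range(8)]
result = all_reduce(inputs)
expected = [sum(row[c] for row in inputs) for c in range(4)]
```

After the third phase every unit holds the same four global sums, which is the defining property of an All_Reduce.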


Claims 7-9 are indicated as allowable subject matter.  

The following is a statement of reasons for the indication of allowable subject matter:  
Prior art teaches systems and methods for connecting and communicating between multiple processor cores or multiple nodes in a parallel computing environment and performing reduction operations; however, the prior art does not fairly teach or suggest, individually or in combination, a compute method where communication links for a cluster of eight parallel processing units are organized into two subsets, each processing unit within a subset connected to another processing unit within the same subset using two bi-directional links, and each processing unit coupled to other processing units in the other subset using one bi-directional link, where input data is reduced along 2x3 of the bi-directional links in parallel on both subsets, data is reduced between corresponding parallel processing units in the two subsets, and data is then broadcast along 2x3 of the bi-directional communication links in parallel as claimed. Examiner finds that prior art systems teach having multiple processing units / nodes connected to each other using bidirectional links and performing reduction operations on data between these parallel processing units, but finds the specific arrangement of eight parallel processing units arranged into two subsets, with a different number of bidirectional links between parallel processing units depending on whether it is between units of the same subset or not, and the usage of 2x3 bidirectional links as claimed for operations, to be specific enough to be different from other prior art systems. The prior art of record neither anticipates nor renders obvious the above recited combination.

As allowable subject matter has been indicated, applicant's reply must either comply with all formal requirements or specifically traverse each requirement not complied with.  See 37 CFR 1.111(b) and MPEP § 707.07(a).

Response to Arguments
Applicant’s arguments, mailed 7/13/2022, have been fully considered but they are not deemed to be persuasive.

Applicant’s amendments to address the 101 rejections (see Page 8) are deemed acceptable to overcome the 101 rejections. Examiner has therefore withdrawn these rejections in view of the amendments.

Applicant’s arguments that Archer ‘051 needs to be considered in its entirety as those of ordinary skill in the art would understand the communication links of Figs. 5 and 5 need to be combined along with memory busses 154/168, processing cores 164 and RAM 156 to have a complete picture of the processing units and communication links of Archer ‘051, wherein when the buses, point-to-point network unidirectional nodes, and global combining network bidirectional node connections of all the compute nodes 102 are properly considered, Applicants assert that those of ordinary skill in the art would appreciate that Archer ‘051 does not teach or suggest “a plurality of parallel processing units organized into a plurality of subsets of parallel processing units, wherein each processing unit in each subset is coupled to each of the other parallel processing units in the same subset of parallel processing units by two bi-directional communication links, and each parallel processing unit is coupled to a corresponding processing unit of a corresponding other of the subset of parallel processing units by one bi-directional communication link” as claimed in independent claim 1 and subsequently with independent claim 10 (see Pages 8-12) are deemed to be unpersuasive. Examiner points out that a compute node 102 in Archer ‘051 comprises multiple processing cores 164 as a part of that individual node, see Fig. 2, in a system comprising a plurality of compute nodes in a parallel computer system 100, see Fig. 1, which can be considered as an individual node being a subset, or multiple nodes arranged into an operational group 132 which could also be considered as a subset of the total number of nodes.
As to the limitation wherein each parallel processing unit in each subset is coupled to each of the other parallel processing units in the same subset of parallel processing units by two bi-directional communication links, Examiner points out that Archer ‘051 teaches that each compute node in the computer system includes bidirectional links for connecting to the other compute nodes, through a point-to-point adapter 180 with bidirectional links, see Col. 9, lines 15-24, where it specifically mentions the point-to-point adapter having six bidirectional links to communicate with other compute nodes in the torus / mesh network of compute nodes. This likewise reads on the final limitation of each parallel processing unit being coupled to a corresponding parallel processing unit of a corresponding other of the subset of parallel processing units by one bi-directional communication link, i.e. connections between nodes of one organizational group and connections between two different organizational groups of compute nodes. As for the rejection, Examiner noted that Archer ‘051 only teaches the usage of bidirectional communication links to connect compute nodes in a point-to-point manner, but does not specifically indicate that nodes within any specific organizational group are connected to each other (other specific organized groups) using two bidirectional links, but pointed out that a person of ordinary skill in the art would recognize that the multiple buses or communication links between compute nodes like those of Archer ‘051 could be designed to include as many communication links between nodes as allowable for connection, with the reasoning that the number of connections between nodes / processing units would be a simple matter of design choice that would not be particularly patentably distinct from other network systems such as Archer ‘051, which already teaches connecting compute nodes together using bidirectional links, see Fig. 3A and Col. 9, lines 15-24.
As such, Examiner had indicated in the previous rejection that having one or more bidirectional communication links between nodes / processors would just be a matter of design choice that is an obvious variation of the Archer ‘051 system to those of ordinary skill in the art.  As such, Examiner believes that Archer ‘051 does properly teach the claim language of independent claims 1 and 10, and does not find Applicant’s arguments to be persuasive.

In summary, Archer ‘051 teaches the claimed invention as set forth above.

Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Blocksome et al. (US 8,458,244) teaches a parallel computer system that does local reduction operations on data, where the compute nodes can be arranged into operational groups.
Matam et al. (US 2020/0294181) teaches a SoC system implementing a parallel processor with multiple parallel processing units and connecting the processors with bidirectional links between the multi-core processors and GPUs.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL SUN whose telephone number is (571)270-1724.  The examiner can normally be reached on Monday-Friday 8am-4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on 571-272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL SUN/Primary Examiner, Art Unit 2183